
ACCELEROMETER BASED CALCULATOR FOR VISUALLY-IMPAIRED PEOPLE USING MOBILE DEVICES

by Doğukan Erenel B.S. in Computer Engineering, Istanbul Technical University, 2008

Submitted to the Institute for Graduate Studies in Science and Engineering in partial fulfillment of the requirements for the degree of Master of Science

Graduate Program in Computer Engineering Boğaziçi University 2011


ACCELEROMETER BASED CALCULATOR FOR VISUALLY-IMPAIRED PEOPLE USING MOBILE DEVICES

APPROVED BY:

Assoc. Prof. Haluk Bingöl (Thesis Supervisor)
Prof. Lale Akarun
Assoc. Prof. Zehra Çataltepe


DATE OF APPROVAL: 20.07.2011


ACKNOWLEDGEMENTS

This work is partially supported by Boğaziçi University Research Fund Project BAP-2011-6017.

ABSTRACT

ACCELEROMETER BASED CALCULATOR FOR VISUALLY-IMPAIRED PEOPLE USING MOBILE DEVICES

With the popularity of new interface devices such as accelerometer-based game controllers and touch-screen smartphones, the need for new accessibility options for these interfaces has become urgent. Previous studies suggested using an accelerometer-based gesture recognition system on touch-screen smartphones with a built-in accelerometer as a new interface that lets visually-impaired people use touch-screen keyboards. However, almost all studies with high accuracy results use user-dependent classification or very limited gesture sets. In this study, our aim is to find an alternative approach to classify 20 different gestures captured by the iPhone 3GS's built-in accelerometer and to achieve high accuracy in user-independent classification. Our method is based on Dynamic Time Warping (DTW) with dynamic warping window sizes. Within this work, the design of an accelerometer-based recognition system is given, together with its implementation as a gesture-recognition-based talking calculator and experimental results. The first experimental result, obtained from the collected data set, gives a 96.7% accuracy rate over 20 gestures with 1062 gesture samples in total. The second experimental result, obtained from 4 visually-impaired people using the implemented calculator as an end-user test, gives a 95.5% accuracy rate over 15 gestures with 720 gesture samples in total.

ÖZET

ACCELEROMETER BASED CALCULATOR FOR VISUALLY-IMPAIRED PEOPLE ON MOBILE DEVICES

With accelerometer-equipped game controllers and touch-screen smartphones gaining popularity, the need for accessibility options for these new interfaces is becoming increasingly important. Previous studies have suggested accelerometer-based hand gesture recognition systems as a new interface through which visually-impaired people can use the touch-screen keyboards of accelerometer-equipped touch-screen phones. However, almost all studies with high recognition rates perform user-dependent recognition or use a limited gesture set. Our aim in this study is to offer an alternative solution for the recognition of 20 hand gestures using the accelerometer of the Apple iPhone 3GS mobile device, with a high user-independent recognition rate. Our method is based on dynamic time warping together with dynamically sized warping windows. In the first experiment, performed on the collected hand gesture data, 1062 samples were collected for 20 different hand gestures and a recognition accuracy of 96.7% was obtained. With this work, a simple talking calculator capable of the four basic operations is also presented as a practical application of the system, together with its experimental results. In the second experiment, the system was tested by 4 visually-impaired people using this calculator; a total of 720 samples were taken for the 15 different hand gestures collected, and an average recognition accuracy of 95.5% was achieved.

TABLE OF CONTENTS

ACKNOWLEDGEMENTS
ABSTRACT
ÖZET
LIST OF FIGURES
LIST OF TABLES
LIST OF SYMBOLS
LIST OF ACRONYMS/ABBREVIATIONS
1. INTRODUCTION
2. BACKGROUND
   2.1. Mathematical Notations
   2.2. Acceleration
   2.3. Accelerometer
      2.3.1. Built-in Accelerometer of iPhone 3GS
      2.3.2. Usage of Accelerometer
   2.4. Gesture
   2.5. Gesture Recognition Systems
      2.5.1. Brief History
      2.5.2. Gesture Recognition
      2.5.3. Accelerometer Based Gesture Recognition
   2.6. Digital Signal Processing
      2.6.1. Digital Filter
         2.6.1.1. Low-Pass Filter
      2.6.2. Sampling
      2.6.3. Downsampling
   2.7. Dynamic Time Warping
      2.7.1. General DTW Algorithm
         2.7.1.1. Example of DTW Algorithm
         2.7.1.2. Conditions of DTW Algorithm
      2.7.2. DTW Algorithm in Thesis
      2.7.3. LBKeogh Distance
   2.8. Statistical Information
      2.8.1. Basic Information
      2.8.2. Cross Validation
      2.8.3. Confusion Matrix
   2.9. Previous Works
3. PROPOSED METHOD
4. CONCLUSION AND FUTURE WORK
REFERENCES
APPENDIX A: EFFECTS OF PROCEDURES
APPENDIX B: FINDINGS FROM TRAINING PART
APPENDIX C: ALGORITHMS
APPENDIX D: CASE STUDY

LIST OF FIGURES

Figure 2.1. Acceleration example.
Figure 2.2. Acceleration movement characteristics.
Figure 2.3. iPhone axes orientation.
Figure 2.4. Example gesture data illustration.
Figure 2.5. Example of signal sampling.
Figure 2.6. DTW matching example.
Figure 2.7. Example adjustment window conditions.
Figure 2.8. LB Keogh Distance definitions.
Figure 2.9. Gesture set from Jie Yang and Yangsheng Xu's work.
Figure 2.10. Gesture set from Keskin's work.
Figure 2.11. Relevant coordinate systems from Mizell's work.
Figure 2.12. Gesture set from Kela's work.
Figure 2.13. Gesture set from Thomas Schlömer's work.
Figure 2.14. Gesture set from Jiahui Wu's work.
Figure 2.15. Gesture set from Tian Seng Leong's work.
Figure 2.16. Gesture set from Ahmad Akl and Shahrokh Valaee's work.
Figure 3.1. Gestures needed for the 4-operations calculator.
Figure 3.2. Gesture set used in this project.
Figure 3.3. Legend used in template generation and recognition processes.
Figure 3.4. Template generating processes.
Figure 3.5. Template generation algorithm.
Figure 3.6. Gesture recognition processes.
Figure 3.7. Gesture recognition algorithm.
Figure 3.8. Low-Pass Filter algorithm.
Figure 3.9. Illustration of preprocess effects.
Figure 3.10. Downsampling algorithm.
Figure 3.11. General DTW Training algorithm.
Figure 3.12. Forward search algorithm.
Figure 3.13. Finding Closest Class Index algorithm.
Figure 3.14. First Experiment's Result.
Figure 3.15. Second Experiment's Result.
Figure A.1. Effects of procedures on classification accuracy.
Figure C.1. Simple DTW Distance with Dynamic Warping Window Size algorithm.
Figure C.2. Best Warping Window Size algorithm.
Figure C.3. DTW evaluation algorithm.
Figure C.4. Finding Best Warping Window Sizes algorithm.
Figure C.5. LBKeogh Distance calculation algorithm.
Figure C.6. DTW Distance calculation algorithm.
Figure D.1. Gesture types used in case study.
Figure D.2. Raw gesture data sequences.
Figure D.3. Filtered gesture data sequences.
Figure D.4. Preprocessed gesture data sequences.
Figure D.5. Sampled gesture data sequences.
Figure D.6. Template data sequences with upper and lower bounds.
Figure D.7. Raw, Filtered, Sampled validation data for Class A and Class B.
Figure D.8. Template Matched data.


LIST OF TABLES

Table 2.1. Mathematical notations used in thesis.
Table 2.2. Proper acceleration of rightward movement.
Table 2.3. Proper acceleration of downward movement.
Table 2.4. Simple confusion matrix with 2 class labels.
Table 2.5. Comparison table of the previous works and proposed system.
Table 3.1. Calculations used in second experiment.
Table 3.2. Confusion matrix of experimental result.
Table A.1. Effects of procedures on classification accuracy.
Table B.1. DTW distance threshold values.
Table B.2. Warping window sizes of x axis.
Table B.3. Warping window sizes of y axis.
Table B.4. Warping window sizes of z axis.
Table D.1. Template matching result with confusion matrix.

LIST OF SYMBOLS

C                    Candidate gesture data
FN                   The number of False Negative classifications
FND                  DTW distance between the falsely predicted and actual class
G                    Gesture data matrix
G                    Gesture set containing all gestures collected
Gm                   Gesture set containing all gestures of gesture type m
{\vec{g}(k)}_{k=1}^{K}   Gesture data definition
\vec{g}(k)           Gesture data value at time k
I                    The number of maximum iterations in DTW training
K                    Length of a single discrete-time data sequence
L                    Lower limit row vector used in LB Keogh distance calculation
M                    Number of gesture types (gesture classes)
N                    Number of samples
P                    Number of people who participated in gesture data collection
Pr                   Precision rate from confusion matrix
Q                    Template (Query) Library (matrix)
R                    Range limit used in LB Keogh distance calculation
Rc                   Recall rate from confusion matrix
T                    Training Data Set
TL                   Template Library (Set)
TP                   The number of True Positive classifications
TPD                  DTW distance between the accurately predicted and actual class
U                    Upper limit row vector used in LB Keogh distance calculation
W                    Warping window size matrix

LIST OF ACRONYMS/ABBREVIATIONS

DTW    Dynamic Time Warping
HMM    Hidden Markov Model

1. INTRODUCTION

With the popularity of new interface devices such as accelerometer-based game controllers and touch-screen smartphones, the need for new accessibility options for these interfaces has become urgent. However, the accessibility options served by touch-screen smartphones for text editing are insufficient for visually-impaired people; in particular, touch-screen keyboards are still in need of improvement for visually-impaired users. This work aims to present an alternative solution for this problem.

Several research works on accelerometer-based gesture recognition systems in computer science, and on the usage of accelerometer-based devices in the medical area, gave the idea of a new interface for accessibility [1-5]. For example, some mobile devices with just touch screens offer features such as text-to-speech, speech-to-text and a magnifier for handicapped people, and some games are designed for the rehabilitation of people with physical or mental disabilities; the Wii controller, for instance, is used by patients recovering from strokes, broken bones, surgery and even combat injuries through specific games [6-8], so the integration of handicapped people into society is much improved. However, there are limited research works on such interfaces for visually-impaired people on mobile devices.

In this work, we aim to provide a reliable, fast and simple gesture recognition model and its implementation as a new interface, which is strongly needed by visually-impaired people for writing text with touch-screen smartphones. The main contribution of this work is a new interface for writing text by capturing the accelerometer data of hand gestures on touch-screen smartphones with a built-in accelerometer. Our motivation is the ability to offer a new interface to the current smartphone industry without making any hardware changes. Our core approach (the training part) is based on the technique originally proposed in Ratanamahatana and Keogh's work [9]. Among previous studies, there are few research works tested by end-users, while the others use a limited gesture set.

In this study, our proposed system is capable of classifying 20 different gestures, and it has also been tested by end-users on a handheld device, the iPhone 3GS. The proposed system is separated into two parts, training (template generation) and classification (recognition process), and works as follows. In the training part, each gesture movement is captured by the iPhone built-in accelerometer and saved into a dataset. Then each individual gesture data in the dataset is validated and preprocessed, and the system is trained to generate key data, called template data, for each gesture type, together with a warping window size array and threshold values. In the classification part, after the templates are ready, a candidate gesture data is matched against each template using the warping window size array and the threshold values, and the closest template is returned as the classification result. The matching of two gesture data is based on the Dynamic Time Warping (DTW) and LBKeogh algorithms, which determine the distance between two gesture data as a similarity metric [10, 11]. All operations except the data collection are carried out on a PC because of its computational power, while the iPhone 3GS is used in the classification part with the findings from the training part.

The proposed recognition system has been implemented as an iPhone application and tested with two experiments. In the first experiment, we collected training data for 20 different gestures from 15 participants, with 55 samples for each gesture type. The data was used to train and validate our system; the resulting user-independent gesture recognition accuracy rate is 96.7%. In the second experiment, based on the result of the first experiment, we built a gesture-based talking calculator with 15 different gestures using the proposed system and the first experiment's training data. The calculator was tested by 4 visually-impaired people with 40 different calculations covering all simple operations ('+', '−', '×', '/') and numbers. This experiment was repeated day by day for an adaptation period of 7 days; on the last day of the experiment (7th day), the average accuracy rate reached 95.5%. Based on the user-independent classification accuracy rates and the number of gesture types, our proposed method seems to be one of the best among previous works (see: Comparison Table 2.5).

The remainder of this thesis is organized as follows. Chapter 2 gives the related background information on gesture recognition and preprocessing steps. It also gives an account of previous works on different gesture recognition approaches, the relevant academic publications, and their gesture sets and results. The proposed recognition system is described in Chapter 3, with the data collection part in Section 3.1, while our results are given in Section 3.11. Finally, Chapter 4 gives conclusions for the project and ideas for future work. In the appendices, the effects of the procedures on classification accuracy, the findings from the training part, the algorithms used in the thesis and a case study with a simple gesture set are given with informative figures and tables.


2. BACKGROUND

This chapter gives an overview of, and related works on, gesture recognition systems as well as their experimental results. Mathematical notations are given in Section 2.1. From Section 2.2 to Section 2.4, the definitions of acceleration, accelerometer and gesture, as well as the usage of the accelerometer, are explained in order to introduce the field and future expectations. Then brief information about gesture recognition systems is given in Section 2.5, digital signal processing in Section 2.6 with detailed DTW explanations in Section 2.7, and statistical information on some related topics in Section 2.8. Finally, the gesture data sets used in previous research works and their experimental results are given in Section 2.9 to allow a comparison with previous works.

2.1. Mathematical Notations

The mathematical notations used in this thesis are given in the table below.

Table 2.1. Mathematical notations used in thesis.

Notation              Meaning of the Notation
X                     Boldface capitalized characters denote matrices.
X                     Capitalized characters (non-bold, non-italic) denote sets.
\vec{x}               Boldface lowercase characters with an arrow on top denote vectors.
x                     Boldface lowercase characters denote time series.
x(k)                  Value of time series x at time k.
X(k)                  k-th column of matrix X.
{x(k)}_{k=1}^{T}      A sequence definition of type x and size T.
x ≜ y                 The equality operator with a small triangle denotes a definition; here, x is defined as y.

2.2. Acceleration

Acceleration is defined to be the rate of change of velocity over time. Acceleration is a vector in 3D, represented by (x(t), y(t), z(t)) at any given time t, where x(t), y(t), z(t) are the x, y, z coordinates of the acceleration vector. In general, velocity is measured in meters per second (m/s), so acceleration is measured in meters per second squared (m/s²), which can be positive or negative. For motion along an axis, the average acceleration a_avg over a time interval Δt is:

\[ a_{\mathrm{avg}} \triangleq \frac{v_2 - v_1}{t_2 - t_1} = \frac{\Delta v}{\Delta t} \tag{2.1} \]

where an object has velocity v₁ at time t₁ and v₂ at time t₂ (t₂ > t₁) [12]. In addition, the instantaneous acceleration a, which is what is commonly meant by "acceleration", is the derivative of the velocity with respect to time, where velocity is the change in displacement with time at a given instant:

\[ a \triangleq \frac{dv}{dt} = \frac{d}{dt}\left(\frac{dx}{dt}\right) = \frac{d^2 x}{dt^2} \tag{2.2} \]

As a special case, constant acceleration a_const is the acceleration when the average and instantaneous accelerations are equal:

\[ a_{\mathrm{const}} \triangleq a = a_{\mathrm{avg}} = \frac{v - v_0}{t - 0} \tag{2.3} \]

where v₀ is the velocity at time t = 0 and v is the velocity at any later time t. In addition to these essential definitions about acceleration, free-fall acceleration is the constant acceleration of an object while it is in free fall, and it is caused by the gravitational force. The amplitude of the free-fall acceleration is represented by g.

Its value near Earth's surface is 9.8 m/s². Note that free-fall acceleration is negative, which shows the direction of the acceleration: downward along the y axis and towards Earth's center, so that

\[ a = -g = -9.8\,\mathrm{m/s^2} \tag{2.4} \]

Lastly, the acceleration of an object moving under an external force, relative to an observer whose net acceleration is zero, is called proper acceleration, and it is what an accelerometer measures. The illustrations given in Figure 2.1 are example movements that show proper acceleration values. Here, the big circle with a small mark on the bottom is a glass sphere containing an object that shows the effect of the acceleration. Note that there are two frames of reference: one is the inertial frame of reference, which is stationary and not accelerated, and the other is the frame of reference of the glass sphere (or accelerometer). The acceleration vector values are given with respect to the accelerometer frame of reference. The acceleration values of the sphere are given in Table 2.2 and Table 2.3, while the velocity and instantaneous acceleration are shown in Figure 2.2.

Figure 2.1. Example movements of a glass sphere containing an object. (a) Rightward movement. (b) Downward movement.

In the rightward movement, the glass sphere is moved to the right by increasing its velocity with a constant acceleration of 0.2 m/s² until time t = 2 sec, then slowing with the same acceleration value and stopping at time t = 4 sec. Similarly, in the downward movement the glass sphere is moved downward by increasing its velocity with a constant acceleration of 0.2 m/s² until time t = 2 sec, then slowing with the same acceleration value and stopping at time t = 4 sec. Note that the directions of velocity and acceleration are parallel to the movement directions.

Table 2.2. Proper acceleration values of the sphere with rightward movement (acceleration on the x, y and z axes, in m/s², at times 0 to 4 sec).

Table 2.3. Proper acceleration values of the sphere with downward movement (acceleration on the x, y and z axes, in m/s², at times 0 to 4 sec).

Figure 2.2. Acceleration movement characteristics: (a) shows the velocity values, while (b) gives the acceleration values at a given time t, for the movements in Figure 2.1.

2.3. Accelerometer

An accelerometer is a sensor device which measures the proper acceleration of the device. In general, an accelerometer has a test mass in order to measure the force per unit of mass, so this kind of accelerometer gives an output in units of g (gravity), which is equal to approximately 9.8 m/s². If there is no change in velocity (no acceleration), the measured value is g. (Note that falling is a case where the proper acceleration changes, since it tends toward zero.) Accelerometers can measure vibrations, shocks, tilt, impacts and the motion of an object, producing a time series for each axis in 3D, (x(t), y(t), z(t)), for each clock cycle.

2.3.1. Built-in Accelerometer of iPhone 3GS

In this project, we used the iPhone 3GS and its built-in accelerometer [13]. (At the time of writing, the iPhone 4 had been newly released, but it was not used in our research.) Although all technical details are described in the LIS302DL guide, the accelerometer has a sensitivity of approximately 0.02g and a range of ±2g [14]. It is capable of working at 10 Hz to 100 Hz and gives values on the x, y and z axes along with time stamp data t. Figure 2.3 gives an idea of the axes of the built-in accelerometer of the iPhone.

Figure 2.3. Orientation of the iPhone built-in accelerometer axes (image from Klingmann [2]).

The acceleration data are specified as g-force values, where the value 1.0 corresponds to the normal acceleration caused by gravity at the Earth's surface. If the device is lying still with its back on a horizontal surface, the acceleration values (in 3D) will be approximately x = 0, y = 0 and z = −1. In brief, at each clock time (set by the user, for example 100 Hz), the accelerometer produces three (double precision) data values for the acceleration on the x, y and z axes, together with time information, giving four data sequences from index 1 to K:

\[ x = \{x(k)\}_{k=1}^{K}, \quad y = \{y(k)\}_{k=1}^{K}, \quad z = \{z(k)\}_{k=1}^{K}, \quad \tau = \{\tau(k)\}_{k=1}^{K} \tag{2.5} \]

where τ contains the exact time in seconds from the starting time, with k = 1 and τ(1) = 0, and K shows the end of the movement (the length of a single discrete-time data sequence).

In Figure 2.4, three data sequences can be seen as an example of a circle gesture movement, with time on the x axis and amplitude values on the y axis. Here, the user makes a circle movement in the clockwise direction starting from the top point. The sphere containing an object represents the accelerometer, and the object shows the effects of the acceleration. At time t = 0 the accelerometer produces the constant (gravitational) acceleration as the proper acceleration; then the user starts to make the movement and stops at time t = 1.5.

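As a rough illustration of the data structure behind Equation 2.5, the sketch below holds one gesture recording as parallel x, y, z and τ sequences; the class and method names are hypothetical and are not taken from the thesis implementation.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class GestureRecording:
    """Gesture data as the sequences of Eq. 2.5 (values in g, times in seconds)."""
    x: List[float] = field(default_factory=list)
    y: List[float] = field(default_factory=list)
    z: List[float] = field(default_factory=list)
    tau: List[float] = field(default_factory=list)

    def add_sample(self, ax: float, ay: float, az: float, t: float) -> None:
        if not self.tau:
            self._t0 = t              # remember the first timestamp so tau(1) = 0
        self.x.append(ax)
        self.y.append(ay)
        self.z.append(az)
        self.tau.append(t - self._t0)

    def length(self) -> int:
        return len(self.tau)          # K, the length of the discrete-time sequence
```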
Figure 2.4. Example of circle gesture data captured by the iPhone 3GS accelerometer at 60 Hz and its illustration at given times. (a) Data sequences for the x, y and z axes combined. (b)-(k) A full circle movement, where (b) is the starting position at time t = 0 sec and (k) is the end position of the sphere at time t = 1.5 sec.

2.3.2. Usage of Accelerometer

Accelerometers have a wide variety of application areas, from industry to education, such as triggering airbag deployments or monitoring nuclear reactors. In general, they are used to measure static acceleration (gravity), dynamic acceleration (acceleration of the device itself), orientation, vibration and tilt of an object, as well as shock to an object. Nowadays, accelerometers are becoming more and more ubiquitous, such that cell phones, computers and washing machines now contain accelerometers [15]. A couple of different usages in the research area are given below:

• Acceleration to Text. The PhonePoint Pen work shows a possible usage of 3D accelerometer data [16], which can be considered as writing in the air with a mobile phone. Within this research work, the data are converted to an image in order to give the user the ability to note down small pieces of information quickly and ubiquitously.

• Rehabilitation. Judith E. Deutsch et al. used the Wii for the rehabilitation of an adolescent with cerebral palsy, which is a group of disorders of brain and nervous system functions affecting a person's ability to move and to maintain balance and posture [6, 17]. Here, improvements in visual-perceptual processing, postural control and functional mobility were measured after training.

• Gesture Recognition. Juha Kela et al. used accelerometer-based gesture controls for a design environment, and their research work gives the experimental result of controlling the basic functionality of three different home appliances: TV, VCR and lighting [18]. This work shows possible implementations in our daily devices, controlled through an accelerometer-based interface.

2.4. Gesture

A gesture is a movement of the body or a part of the body, such as the hands, arms or face, used to express an idea or emotion [19]. Gestures are used to convey meaningful information as well as to interact with the environment. The movement could include a wave, a handshake, shaking of the fist, a head nod, etc.
In addition, gestures are often language and culture specific. There are mainly two kinds of gestures: a posture, which is a static configuration without movement of the body part, and a gesture, which is a dynamic movement of the body part. There are mainly three types of gestures: hand and arm gestures, head and face gestures, and body gestures. In hand and arm gestures, the static part of the body is the arm, while the dynamic part is the hand, as in sign language or hand shaking. In this thesis, we used hand and arm gestures, which are captured by the iPhone built-in accelerometer.

2.5. Gesture Recognition Systems

2.5.1. Brief History

In 1963, light-pen gestures were used by Sketchpad and the RAND tablet, and one year later, in 1964, the first trainable gesture recognizer was developed by Teitelman, together with the RAND tablet's early demonstration [20, 21]. At that time, it was quite common for gesture recognition to be included in light-pen-based systems, for example in the AMBIT/G system [22, 23]. In 1969, Michael Coleman at CMU developed a gesture-based text editor using proof-reading symbols [24]. Commercially, CAD systems have used gesture recognition since the 1970s, and it came to universal notice with the Apple Newton in 1992 [25]. From that time on, university and commercial research on gesture recognition systems has been carried out together with research in various other disciplines.

2.5.2. Gesture Recognition

Gesture recognition is a topic in computer science and language technology with the goal of interpreting human gestures via mathematical algorithms. There are different tools for gesture recognition, based on approaches ranging from statistical modeling, computer vision and pattern recognition to image processing and connectionist systems. Most of the problems have been addressed based on statistical modeling, such as PCA, HMM, FFT, DTW, SVM, Kalman filtering, and more advanced particle filtering and condensation algorithms [10, 19, 26-32]. There are mainly two types of gesture recognition: one of them is computer-vision-based gesture recognition and the other is accelerometer-based gesture recognition.

In brief, these two types are separated from each other by the input type. In vision-based recognition, the input is some kind of image (2D, 3D, etc.), whereas in accelerometer-based recognition, the input is an accelerometer-based controller, whether a basic accelerometer or an advanced accelerometer with extra sensor information such as a gyroscope.

2.5.3. Accelerometer Based Gesture Recognition

Accelerometer-based gesture recognition is a subset of pattern recognition that uses the acceleration data of the user's controller as input and interprets the symbols using feature extraction, clustering and classification methods, as well as acceleration-data-specific operations. The main input is a controller having at least one accelerometer, but there may be additional sensors such as a gyroscope or camera. The Wii Remote controller and the PlayStation Move motion controller, which are used as game controllers in games [33-35], are famous examples. They have built-in accelerometers, and the game console uses this data directly, or after processing, in a gesture recognition system. The controller acts as an extension of the body, so that when the user performs a gesture, the acceleration of the motion is generated and identified by the recognition system.

In this thesis, the gesture G is a sequence of 3D discrete-time data captured from the iPhone 3GS built-in accelerometer. For example, (a) in Figure 2.4 shows example gesture data, where the user makes a circle movement in the clockwise direction starting from the top point, while holding the phone facing the user. The gesture data is defined as:

\[ G \triangleq \{\vec{g}(k)\}_{k=1}^{K} \tag{2.6} \]

where \(\vec{g}(k)\) is

\[ \vec{g}(k) \triangleq \begin{bmatrix} x(k) \\ y(k) \\ z(k) \end{bmatrix} \tag{2.7} \]

Note that G can be interpreted as a 3 × K matrix. In brief, the accelerometer-based gesture recognition system takes gesture data such as that given in Figure 2.4 and produces the symbol of a circle.

2.6. Digital Signal Processing

Digital Signal Processing (DSP) is the science of using computers to understand data represented by discrete-time signals, and of processing these signals [36]. The signals come from numerous sources: radar and sonar echoes, voltages generated by the heart and brain, seismic vibrations, images from remote space probes, and countless other applications. DSP has a wide variety of goals: filtering, speech recognition, image enhancement, data compression, neural networks, and much more.

2.6.1. Digital Filter

A digital filter is a system that performs mathematical operations on a discrete-time signal in order to enhance or reduce certain aspects of a digital signal [37]. Digital filtering is a topic related to electronics, computer science and mathematics, which means that we need to know all about the signal characteristics: how the signal is produced (electronic component details), which aspects need to be reduced or enhanced (requirements), what we need to do mathematically (mathematics) and how well we represent the algorithm as software (computer science). In this thesis, we used a simple low-pass filter, explained below.
2.6.1.1. Low-Pass Filter. A filter that reduces the amplitude of (attenuates) signals with frequencies higher than the cutoff frequency but passes low-frequency signals is called a low-pass filter [38]. The cutoff frequency is the boundary frequency of the filter, a constant value defined by the user. Low-pass filters exist in many different forms, such as digital filters for smoothing sets of data (like the blurring of images) or electronic circuits for acoustic barriers. In this project, a simple low-pass filter is used which has the same logic as an analog low-pass RC filter, given by Equation 2.8 below:

\[ y(i) = \alpha x(i) + (1 - \alpha)\, y(i-1) \tag{2.8} \]

where x is the input signal, y is the filtered signal and α is the smoothing factor, which is defined as:

\[ \alpha = \frac{F_{\mathrm{cutoff}}}{F_{\mathrm{cutoff}} + F_{\mathrm{update}}} \tag{2.9} \]

Here, F_cutoff and F_update are the cutoff and update frequencies, respectively, where the update frequency is the accelerometer update frequency.

2.6.2. Sampling

Sampling is the process of reducing a continuous signal to a discrete signal with a given sampling rate. The sampling rate determines the number of samples (the discrete sequence length) in one second; for example, if the rate is 20 Hz, the sampler produces 20 data values in one second. In this work, the accelerometer gives the data already in sampled form, with a sampling rate of 60 Hz, so we do not need sampling, but downsampling, as defined below. The example in Figure 2.5a shows a continuous time function starting at time t = 0 and ending at time t = 29, and Figure 2.5b gives the sampled version of this continuous time function at 1 Hz.

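As a minimal sketch of the filter in Equations 2.8-2.9: the 60 Hz update rate matches the accelerometer used in this work, but the 10 Hz cutoff below is only an assumed illustration value, and the initialization of y(1) (seeded with the first sample here) is a choice the equations leave open.

```python
def low_pass_filter(signal, f_cutoff=10.0, f_update=60.0):
    """Apply the RC-style low-pass filter of Eq. 2.8 to one axis of a gesture."""
    if not signal:
        return []
    alpha = f_cutoff / (f_cutoff + f_update)          # smoothing factor (Eq. 2.9)
    filtered = [signal[0]]                            # seed y(1) with x(1)
    for value in signal[1:]:
        filtered.append(alpha * value + (1.0 - alpha) * filtered[-1])   # Eq. 2.8
    return filtered

# Example: smooth one noisy axis of a recording.
x_filtered = low_pass_filter([0.02, 0.10, 0.55, 0.60, 0.20, -0.30])
```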
The new function is named f′(k), where k is the time index between 0 and 29. Then, the sampled time series is down-sampled at a 0.4 Hz downsampling rate, while the discrete signal was previously sampled at 1 Hz; therefore the downsampling factor is 0.4/1 = 0.4 (12/30). The down-sampled time series is called f′′(n), where n is the time index between 0 and 11. Here, t is the time value in continuous time, and k and n are the time indices in the discrete time series.

Figure 2.5. Example of signal sampling. (a) Continuous time function f(t), segmented from t = 0 to t = 29. (b) Sampled function (discrete time series) at a 1 Hz sampling rate. (c) Down-sampled sequence with scaling factor 0.4.
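The sketch below illustrates this down-sampling step (and the average downsampling used in the next subsection): the input sequence is split into as many selection groups as the desired output length and each group is replaced by its mean. It assumes the target length does not exceed the input length; the function name is ours, not the thesis implementation.

```python
def average_downsample(signal, target_length):
    """Down-sample `signal` to `target_length` points by averaging selection groups."""
    n = len(signal)
    assert 0 < target_length <= n, "sketch assumes target_length <= len(signal)"
    groups = [[] for _ in range(target_length)]
    for i, value in enumerate(signal):
        groups[i * target_length // n].append(value)   # factor = target_length / n
    return [sum(g) / len(g) for g in groups]

f_prime = list(range(30))                      # the 1 Hz sampled sequence of Figure 2.5b
f_double_prime = average_downsample(f_prime, 12)   # 12 values, factor 0.4
gesture_axis = average_downsample([0.1] * 70, 30)  # any gesture axis reduced to 30 points
```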
2.6.3. Downsampling

Downsampling is the process of reducing the sampling rate of a discrete signal to a given downsampling rate. This process is similar to sampling, but sampling operates on a continuous signal whereas downsampling operates on a discrete signal (sequence). In addition, in sampling the values are directly selected from the given signal at the sampling rate, whereas in downsampling the values are selected from intervals of values (selection groups) of the given discrete signal according to the downsampling factor. The downsampling factor is calculated by dividing the downsampling rate by the sampling rate of the given discrete signal. For example, if a discrete-time signal was previously sampled at 5 Hz and needs to be downsampled to 2 Hz, the downsampling factor is 2/5 = 0.4.

In order to down-sample a discrete sequence, the given discrete signal is first split into groups of values, called selection groups, whose length is defined by the downsampling factor; then, for each selection group, a value is selected or calculated to generate the down-sampled sequence. There are a number of methods for determining the value of a selection group: simple downsampling, average downsampling, median downsampling, min downsampling and max downsampling. In this thesis, we used average downsampling, in which the value of a selection group is calculated by averaging the values in the group. In addition, in this study the downsampling factor is changed for each gesture data so that every gesture data ends up with the same length, while the downsampling rate is kept constant at 30 Hz and the sampling rate is 60 Hz. For example, if one gesture data has 40 data points (captured at 60 Hz) and another one has 60 data points, after downsampling each gesture data has exactly 30 data points, with downsampling factors of 30/40 = 0.75 and 30/60 = 0.5, respectively.

2.7. Dynamic Time Warping

In this study, we have 20 different gestures and we need to classify any unknown gesture data captured by the iPhone 3GS. Because of the computational power and memory limitations of the iPhone 3GS, we need a fast classification method. In addition, we collected 55 gesture data (see: Data Acquisition in Section 3.1) for each gesture type, which is not a big data set compared with previous works; therefore the classification method should not be too dependent on the variety of the training data set, otherwise classification
accuracy becomes very low in the end-user test. Moreover, the classification method is expected to work user-independently, which means that it should produce high-accuracy recognition results in user-independent use. In order to work user-independently, our classification method must handle problems such as the lengths of the gesture data not being the same, or one part of the motion being faster than another. For these reasons, we chose to use Dynamic Time Warping (DTW), which is an algorithm for finding the optimal match between two given sequences (e.g. time series) that may vary in time or speed, under certain criteria [39]. In general, DTW matches two time signals by computing a temporal transformation, called the warping function, so that the signals are optimally aligned in the sense that the cumulative distance between the aligned samples is minimized. DTW is also used for measuring the similarity distance between two sequences after finding the optimal match.

2.7.1. General DTW Algorithm

DTW matches two time sequences and calculates the minimum distance of the warped sequences from the matching result [10]. Let q = {q(i)}_{i=1}^{I} be the known sequence, called the query (or template), with length I, and let c = {c(j)}_{j=1}^{J} be the unknown sequence, called the candidate, with length J. DTW is implemented by calculating the matching cost with a dynamic programming algorithm over a two-dimensional matrix D, where D holds the matching cost of each point pair and d is the distance between two points of the given sequences. The matching cost is computed based on dynamic programming using the following formulation:

\[ D(i,j) \triangleq d(q(i), c(j)) + \min\big( D(i-1, j-1),\ D(i, j-1),\ D(i-1, j) \big) \tag{2.10} \]

where the boundary conditions are:

\[ D(0,0) = 0, \qquad \{D(i,0)\}_{i=1}^{I} = \infty, \qquad \{D(0,j)\}_{j=1}^{J} = \infty \tag{2.11} \]

The distance calculation is generally the same as the Euclidean distance, which for two points p and q is the length of the line segment connecting them: d(q(i), c(j)) ≜ |q(i) − c(j)| if the sequences are one-dimensional, d(q(i), c(j)) ≜ √((q(i)₁ − c(j)₁)² + (q(i)₂ − c(j)₂)²) if the sequences are two-dimensional, and so on. In this thesis, we used one-dimensional sequences (see: DTW Algorithm in Thesis in Section 2.7.2), so our distance calculation is:

\[ d(q(i), c(j)) \triangleq |q(i) - c(j)| \tag{2.12} \]

In summary, the value of the matching cost matrix D for given points i, j of the two sequences is calculated by:

\[ D(i,j) \triangleq \begin{cases} \infty, & \text{if } i = 0,\ j > 0 \\ \infty, & \text{if } i > 0,\ j = 0 \\ 0, & \text{if } i = 0,\ j = 0 \\ |q(i) - c(j)| + \min\big( D(i-1,j-1),\ D(i,j-1),\ D(i-1,j) \big), & \text{otherwise} \end{cases} \tag{2.13} \]

where i ≤ I and j ≤ J. In addition, the DTW distance, which is the similarity distance between the two sequences after finding the optimal match between the query (template) and candidate data sequences, is the final point of the matching cost matrix and is computed as:

\[ DTW(q, c) \triangleq D(I, J) \tag{2.14} \]

where I is the length of sequence q and J is the length of sequence c.

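The following is a straightforward O(I·J) illustration of the recurrence in Equations 2.10-2.14 for one-dimensional sequences; it is a sketch for clarity, not the optimized implementation used in the thesis.

```python
import math

def dtw_distance(query, candidate):
    """DTW distance between two 1-D sequences (Eqs. 2.10-2.14)."""
    I, J = len(query), len(candidate)
    # Row 0 and column 0 are infinite except D[0][0] = 0 (Eq. 2.11 / 2.13).
    D = [[math.inf] * (J + 1) for _ in range(I + 1)]
    D[0][0] = 0.0
    for i in range(1, I + 1):
        for j in range(1, J + 1):
            d = abs(query[i - 1] - candidate[j - 1])                      # Eq. 2.12
            D[i][j] = d + min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])  # Eq. 2.10
    return D[I][J]                                                        # Eq. 2.14

print(dtw_distance([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0: the repeated 2 is absorbed by warping
```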
2.7.1.1. Example of DTW Algorithm. Figure 2.6 gives an example of the usage of the DTW algorithm. After taking the query (template) sequence and the candidate sequence, DTW arranges the two given sequences on the sides of a grid, with the unknown sequence on the bottom and the template on the left side. In the example in Figure 2.6, our template is the word "SPEECH" and our candidate is "SsPEEChH"; the value of the lexical order is given as a subscript of each letter. For each cell we determine a distance measure comparing the corresponding elements of the two sequences and write it inside the corresponding cell. In general, the distance refers to the Euclidean distance in DTW algorithms, but here it means the distance in lexical ordering in the Latin alphabet, without considering upper or lower case; for example, the distance between 'B' and 'a' is 1. The aim is to find the best match between the two given sequences, which is the same as finding the path through the grid that minimizes the total distance. This path should start from the lower-left corner (the initial sequence points) and end at the upper-right corner (the last sequence points).

Figure 2.6. DTW matching example. (a) shows the Euclidean distances, while (b) gives the matching cost and the optimal path between the "SPEECH" and "SsPEEChH" sequences according to the lexical order of the letters. The subscript of each character shows the lexical ordering index, and each cell shows the calculated matching cost as well as the optimal path for the minimum DTW distance.

In the given example, the first elements 'S' of the two sequences match together, while the second element of the candidate, 's', also matches best against the first element of the template, because the distance is 0. This corresponds to a section of the template being stretched in the input.
Moving on, we can see that the second to fourth elements of the template, 'PEE', are matched with the third to fifth elements of the candidate, 'PEE'. Then the sixth element of the candidate, 'h', matches the sixth element of the template, 'H', and in order to minimize the distance we pass over the fifth element of the template, 'C'; this means that this section of the template has been compressed in the input sequence. With these operations, an overall best path has been found for this template, and the total distance between the two sequences is calculated by adding the distance of each cell pair on the optimal path; here it is found to be 2.

Observations on the nature of good paths through the grid give the idea for the major optimizations to the DTW algorithm, which are explained briefly below.

2.7.1.2. Conditions of DTW Algorithm. In the DTW algorithm there are some conditions that we assume, and we need some limitations on the optimal path in order to speed up the DTW distance calculation; these are defined in Sakoe and Chiba's work [10]. Here, q and c are the two given data sequences with lengths I and J respectively, i and j are any index points where 0 < i ≤ I and 0 < j ≤ J, and k is any time index along the path.

• Monotonic Condition. In the warping function (optimal path), the i and j indexes either stay the same or increase; they can't turn back on themselves:

\[ i_{k-1} \le i_k, \qquad j_{k-1} \le j_k \tag{2.15} \]

• Continuity Condition. In the warping function (optimal path), the i and j indexes can only increase by at most 1 on each step along the path, which means that the warping function advances one step at a time:

\[ i_k - i_{k-1} \le 1, \qquad j_k - j_{k-1} \le 1 \tag{2.16} \]

• Sum of Monotonic and Continuity conditions. For any point (i_k, j_k) on the optimal path, the previous point may be (i_{k-1}, j_k), (i_k, j_{k-1}) or (i_{k-1}, j_{k-1}):

\[ path(k-1) \in \{ (i_{k-1}, j_k),\ (i_k, j_{k-1}),\ (i_{k-1}, j_{k-1}) \} \tag{2.17} \]

In other words, there are three choices at each step of the optimal path, called insertion, deletion and matching. Insertion (path(k−1) = (i_k, j_{k−1})) uses the element from the template once more while stepping in time on the candidate sequence; deletion (path(k−1) = (i_{k−1}, j_k)) uses the element from the candidate once more while stepping in time on the template sequence; and matching (path(k−1) = (i_{k−1}, j_{k−1})) uses an element from both the template and the candidate (most probably a match) and steps in time on both sequences.

• Boundary Condition. The warping function (optimal path) starts at the lower-left and ends at the upper-right. Let the optimal path in DTW be path = {path(k)}_{k=1}^{K}, where K is the length of the optimal path, whose maximum value is max(I, J), and let any point of the optimal path be path(k) = (i, j). Then:

\[ i_1 = 1, \qquad j_1 = 1, \qquad i_{\mathrm{final}} = I, \qquad j_{\mathrm{final}} = J \tag{2.18} \]

As a consequence, the distances in the 0th row and column of the cost matching matrix are infinite.

• Warping Window Condition. In general, the warping function (optimal path) can't wander very far from the diagonal. There are several approaches for satisfying this condition (see: Figure 2.7):
(i) Sakoe-Chiba Band. A constant distance r is defined as a window length in order to restrict the possible paths to a diagonal band [10]:

\[ |i_k - j_k| \le r \tag{2.19} \]

The restriction path of the Sakoe-Chiba band is shown in Figure 2.7a.

(ii) Itakura Parallelogram. A global constraint r is defined as a window length for each given index [40]. The difference from the Sakoe-Chiba band is that the band width changes with the given index, and the window shape is a symmetric parallelogram, as seen in Figure 2.7b.

(iii) Ratanamahatana-Keogh Band (R-K Band). A variable w is defined as the warping window for each index, according to the classification result, in order to find the highest accuracy [9]:

\[ |i_k - j_k| \le w(k) \tag{2.20} \]

An example R-K band can be seen in Figure 2.7c.

• Slope constraint condition. The path should stay within a range of slope angles to prevent it from being too steep or too shallow, and to avoid matching very short sequences with very long ones. There should be a ratio i/j as a slope constraint; according to this condition, after j steps in x we must make a step in y, and vice versa.

In this study, the R-K band is used in order to increase the classification accuracy, so our system is trained to find the best warping window size (R-K band) for each gesture type.

Figure 2.7. Example adjustment window conditions. (a) Sakoe-Chiba band. (b) Itakura parallelogram. (c) R-K band.

2.7.2. DTW Algorithm in Thesis

In this thesis, we used the DTW distance satisfying all the conditions defined above. The R-K band, from Ratanamahatana's work [9], has been selected as the warping window condition, and a locality constraint in the form of a warping window size matrix W has been defined. Here, a template gesture data Q, a candidate gesture data C and the warping window size array W are all matrices of the same size, 3 × N. Note that, as given in the R-K band definition, there is a warping window size for each axis of the gesture data G, and after downsampling all sequences become of the same length (see: Downsampling in Section 2.6.3); N is the length of each sequence. The matrices are:

\[ Q \triangleq \{\vec{q}(n)\}_{n=1}^{N}, \qquad C \triangleq \{\vec{c}(n)\}_{n=1}^{N}, \qquad W \triangleq \{\vec{w}(n)\}_{n=1}^{N} \tag{2.21} \]

where the n-th columns \(\vec{q}(n)\), \(\vec{c}(n)\) and \(\vec{w}(n)\) are:

\[ \vec{q}(n) \triangleq \begin{bmatrix} x(n) \\ y(n) \\ z(n) \end{bmatrix}, \qquad \vec{c}(n) \triangleq \begin{bmatrix} x(n) \\ y(n) \\ z(n) \end{bmatrix}, \qquad \vec{w}(n) \triangleq \begin{bmatrix} x(n) \\ y(n) \\ z(n) \end{bmatrix} \tag{2.22} \]

Therefore, in order to calculate the DTW distance between Q and C, the matching cost is calculated for each dimension (x, y and z) and the results are added together. The matching cost for the x-axis sequence is given below, together with the DTW distance calculation:

\[ D_x(i,j) \triangleq \begin{cases} d(q_x(i), c_x(j)) + M_x, & \text{if } |i - j| \le w_x(\max(i,j)) \\ \infty, & \text{otherwise} \end{cases} \tag{2.23} \]

where M_x is the minimum distance coming from the previous route and is defined as:

\[ M_x \triangleq \min\big( D_x(i-1, j-1),\ D_x(i, j-1),\ D_x(i-1, j) \big) \tag{2.24} \]

The DTW distance calculation then becomes:

\[ DTW(Q, C) \triangleq D_x(N, N) + D_y(N, N) + D_z(N, N) \tag{2.25} \]

Here, the distance calculation d is the same as the one-dimensional Euclidean distance, d(q(i), c(j)) = |q(i) − c(j)|, as given in the general DTW algorithm. Note that although there are three dimensions, the overall distance is not the three-dimensional Euclidean distance but a simple addition. In our experiments, we tested the three-dimensional Euclidean distance, a weighted addition (multiplying each dimension by a weight and summing) and simple addition, and found that the simple addition gives a higher accuracy rate in classification. The pseudocode of the DTW distance calculation is given in Appendix C, in Figure C.6.

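The sketch below illustrates the per-axis, R-K-banded form of Equations 2.23-2.25; it assumes both gesture matrices have already been downsampled to the same length N, and the dictionary representation of the 3 × N matrices is ours, not the thesis data structure.

```python
import math

def dtw_axis(q, c, w):
    """Banded DTW for one axis: w[k] is the R-K window size at index k (Eq. 2.23)."""
    N = len(q)
    D = [[math.inf] * (N + 1) for _ in range(N + 1)]
    D[0][0] = 0.0
    for i in range(1, N + 1):
        for j in range(1, N + 1):
            if abs(i - j) > w[max(i, j) - 1]:
                continue                                   # outside the band: stays infinite
            m = min(D[i - 1][j - 1], D[i][j - 1], D[i - 1][j])   # Eq. 2.24
            D[i][j] = abs(q[i - 1] - c[j - 1]) + m
    return D[N][N]

def dtw_3d(Q, C, W):
    """Sum of per-axis banded DTW distances (Eq. 2.25); Q, C, W map axis name -> length-N list."""
    return sum(dtw_axis(Q[a], C[a], W[a]) for a in ("x", "y", "z"))
```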
2.7.3. LBKeogh Distance

In this thesis, we mainly used the DTW distance defined above to match two data sequences. However, in order to decrease the time consumption of the recognition process when using the DTW distance, we used one more distance metric before calculating the DTW distance. The Lower-Bound Keogh (LBKeogh) distance is a tool for lower-bounding various time series distance measures [11], and it is used for indexing DTW in a fast way. Here, we define two more time series for each template, together with a constant value R that determines the lower and upper bound range. The lower and upper bounds are defined in order to make indexing fast: a protective envelope is built around one dimension of the template Q with the upper bound U and the lower bound L, which are:

\[ L \triangleq \{\vec{l}(n)\}_{n=1}^{N}, \qquad U \triangleq \{\vec{u}(n)\}_{n=1}^{N} \tag{2.26} \]

where the n-th columns \(\vec{l}(n)\) and \(\vec{u}(n)\) are:

\[ \vec{l}(n) \triangleq \begin{bmatrix} x(n) \\ y(n) \\ z(n) \end{bmatrix}, \qquad \vec{u}(n) \triangleq \begin{bmatrix} x(n) \\ y(n) \\ z(n) \end{bmatrix} \tag{2.27} \]

and the i-th values of the lower and upper bounds for the x axis are:

\[ l(i)_x \triangleq \min\{ q_x(j) \mid j \in [\max\{1, i-R\},\ \min\{N, i+R\}] \}, \qquad u(i)_x \triangleq \max\{ q_x(j) \mid j \in [\max\{1, i-R\},\ \min\{N, i+R\}] \} \tag{2.28} \]

As an example, consider a sequence with indices {1, ..., k−R, k−R+1, ..., k, k+1, ..., k+R, k+R+1, ..., N}, where k is an arbitrary index (0 < k ≤ N), N is the length of the sequence and R is the range value of the lower and upper bounds. The upper and lower bounds at index k are calculated as the maximum and minimum values in the subsequence {k−R, ..., k, ..., k+R}, respectively. An illustration is given in Figure 2.8.

Figure 2.8. LB Keogh Distance definitions, from Keogh's work [11].

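A minimal sketch of the envelope of Equation 2.28 is given below; the 0-based slicing mirrors the 1-based interval [max{1, i−R}, min{N, i+R}], and the function name is ours rather than the thesis's.

```python
def keogh_envelope(template_axis, R):
    """Lower and upper LB_Keogh bounds (Eq. 2.28) for one axis of a template."""
    N = len(template_axis)
    lower, upper = [], []
    for i in range(N):
        window = template_axis[max(0, i - R): min(N, i + R + 1)]
        lower.append(min(window))
        upper.append(max(window))
    return lower, upper

# Example: envelope of one axis of a template with range R = 2.
L, U = keogh_envelope([0.0, 0.2, 0.5, 0.4, 0.1, -0.2], R=2)
```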
The LBKeogh distance calculation for one dimension (the x axis) between a candidate sequence C and a template sequence Q is given below:

\[ LB_{Keogh}(Q_x, C_x) \triangleq \sum_{i=1}^{N} \begin{cases} (c_x(i) - u_x(i))^2, & \text{if } c_x(i) > u_x(i) \\ (c_x(i) - l_x(i))^2, & \text{if } c_x(i) < l_x(i) \\ 0, & \text{otherwise} \end{cases} \tag{2.29} \]

In this study, we used 3-dimensional sequences as described before, so our distance definition becomes:

\[ LB_{Keogh}(Q, C) \triangleq LB_{Keogh}(Q_x, C_x) + LB_{Keogh}(Q_y, C_y) + LB_{Keogh}(Q_z, C_z) \tag{2.30} \]

Note that the pseudocode of the LB Keogh distance calculation is given in Appendix C, in Figure C.5.
2.8. Statistical Information

2.8.1. Basic Information

The mean, indicated by μ, is a term equivalent to the average value of a signal, as given below [36]:

\[ \mu \triangleq \frac{1}{N} \sum_{i=1}^{N} x_i \tag{2.31} \]

The variance, indicated by v, is a term equivalent to the average squared difference between the mean and the sequence points:

\[ v \triangleq \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2 \tag{2.32} \]

2.8.2. Cross Validation

Cross validation (CV) is a popular technique for selecting a model or estimating the risk of an estimator [41]. The main idea of cross validation is to split the data into two parts (once, or several times until all data have been used in the validation set once) in order to estimate how accurately a predictive model will perform in practice. One split of the data, called the training set, is used for training the algorithm, and the remaining part of the data, called the validation set or testing set, is used for estimating the risk of the algorithm. Depending on the type of CV, validation may be performed multiple times using different partitions, and the results are averaged in order to reduce variability; CV then selects the algorithm with the smallest estimated risk [42]. The most common types of cross validation are leave-one-out validation (LOOV), K-fold cross validation and random sub-sampling. In this thesis, we used LOOV. In LOOV, the system is trained with all samples except for a single observation, and the procedure is repeated until each sample has been used for validation once; the single observation is used as the validation data, whereas all the others are used as training data.

2.8.3. Confusion Matrix

A confusion matrix is a matrix containing all classification results of the actual and predicted classes produced by a classification system [43]. It shows the predicted and the actual classification results. Table 2.4 is an example with 2 different class labels.

Table 2.4. Simple confusion matrix with 2 class labels.

Actual \ Predicted | Negative | Positive
Negative           | TN       | FP
Positive           | FN       | TP

There are some definitions related to the confusion matrix and used in this project, given below:

• True Positive (TP): The number of correct predictions that an instance is positive.
• True Negative (TN): The number of correct predictions that an instance is negative.
• False Positive (FP): The number of incorrect predictions that an instance is positive.
• False Negative (FN): The number of incorrect predictions that an instance is negative.
• Accuracy: The proportion of predictions that were correct to the total number of predictions:

Accuracy = \frac{TP + TN}{TP + TN + FP + FN}   (2.33)

• Precision (Pr): The proportion of the predicted positive cases that were correct to the total number of predictions of the given class:

Pr = \frac{TP}{TP + FP}   (2.34)

• True Positive Rate (Recall or Rc): The proportion of positive cases that were correctly identified to the total number of actual positive cases:

Rc = \frac{TP}{TP + FN}   (2.35)
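Once the four counts are available, the three measures of Eqs. 2.33 to 2.35 reduce to a few lines; a small sketch:

    def accuracy(tp, tn, fp, fn):
        return (tp + tn) / (tp + tn + fp + fn)   # Eq. 2.33

    def precision(tp, fp):
        return tp / (tp + fp)                    # Eq. 2.34

    def recall(tp, fn):
        return tp / (tp + fn)                    # Eq. 2.35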

2.9. Previous Works

Until this section, general background information has been given in order to clarify the field of this research project. Now we look at the previous works related to our proposed method (see: Chapter 3) and their results, in order to determine our method and our expectations from it. Here, some gesture collections used in previous works are given together with their experimental results, in chronological order.

In 1994, Jie Yang and Yangsheng Xu made a gesture recognition system by using an HMM classifier with a mouse controller as gesture input [44]. There were 9 gestures (see: Figure 2.9) and they collected 100 training and 50 validation samples for each gesture. The general accuracy rate was 99.78%.

Figure 2.9. Gesture set from Jie Yang and Yangsheng Xu's research [44].

In 1995, Donald O. Tanguay, Jr. worked on hand gesture recognition for identifying some basic letters by using HMM with a mouse controller as gesture input

(working at 20 Hz) [45]. He used five letters, 'a', 'b', 'c', 'd' and 'e', to be distinguished from each other. The training set contains 5 examples for each gesture as well as 5 examples for validation. He made some experiments in which the number of HMM states and the number of codebook clusters were changed, and found a maximum of 88% accuracy on the validation data.

In 1995, Thad Eugene Starner used a camera (5 frames/sec) and colored gloves to identify sentences from American Sign Language with an HMM classifier [46]. He used 395 sentences as training data and 99 sentences as validation, with 6000 samples in total, and found 97% accuracy with a strong grammar and 91% accuracy without grammar.

In 2000, Mantyla et al. used acceleration sensors for recognizing static and dynamic gestures with an HMM classifier [3]. The static gestures are SG1 'display up', SG2 'display down', SG3 'on the left/right ear' and SG4 'in the pocket', whereas the dynamic gestures are DG1 'checking the phone from pocket and turning back to the same position', DG2 'checking the phone and bringing the phone to the right ear', DG3 'reverse in time of DG2', DG4 'checking the phone from desk and turning back to the same position', DG5 'checking the phone from desk and bringing the phone to the right ear' and DG6 'reverse in time of DG5'. There were 4050 training and 6 testing samples for each static gesture, and 100 training samples (from 1 person) and 100 testing samples (50 from the same person, 50 from another person) for each dynamic gesture. The average accuracy is 96.46% for static gestures and 96% for dynamic gestures from the second person.

In 2003, Keskin et al. tried to identify 8 different 3D gestures (see: Figure 2.10) captured by cameras, reconstructing the 2D images (from two cameras) into a 3D image [47]. There were 469 elements in the training set and 160 elements in the test set. The average accuracy is 98.75% using an HMM classifier.

In the same year as Keskin's work, 2003, Tracy Westeyn et al. gave an idea about using multiple sensors in gesture recognition with the 'Georgia Tech Gesture Toolkit' [48, 49]. There was a 5-gesture set of the words {'my', 'computer', 'helps', 'me', 'talk'} as vocabulary and 72 annotated sequences of signed sentences as test data. The indepen-

dent user recognition rate was 90.48% and the user-dependent rate was 94.26% by combining accelerometer and vision information.

Figure 2.10. Gesture set from Keskin's work [47].

In addition to gesture recognition systems, in 2004, David Mizell proposed a solution for the acceleration axis calculation problem by estimating the constant gravity vector from the average of accelerometer samples [50].

Figure 2.11. Relevant coordinate systems, image from Mizell's work [50].

In addition to accelerometer based gesture recognition systems, in 2004, Ralph Niels made a handwriting recognition system by using DTW with two different approaches [51]. The gesture set contains 62 Latin letters (all upper case [A-Z], lower case [a-z] and number [0-9] characters) with 37.683 lowercase letters, 16.451 uppercase letters and 9.195 digits. The user-independent classification accuracy rate was 88.28% for uppercase letters, 90.20% for lowercase letters and 90.16% for digits with the 'Mergesamples' prototype set, and the user-dependent rate was 95.64% for uppercase letters, 94.32% for lowercase letters and 94.26% for digits with the 'Resample and Average' prototype set.

Chotirat Ann Ratanamahatana and Eamonn Keogh proposed a dynamic window distance calculation, called Ratanamahatana-Keogh Bands (or R-K bands), as an addition to the Sakoe-Chiba Band and the Itakura Parallelogram for improving the accuracy (see: Section 2.7.1) [9]. In their accuracy experiments, they used 3 different gesture data sets ('Gun' contains 2 different classes with 100 samples for each class, 'Trace' contains 4 different classes with 50 samples for each class, and 'Word Spotting' contains 4 classes with 272 samples in total) and found 99.5% accuracy for the 'Gun' data set, 100% accuracy for the 'Trace' data set and 99.67% accuracy for the 'Word Spotting' data set.

In 2006, Juha Kela et al. used acceleration data for controlling some daily appliances, such as TV, VCR and lighting [18]. In this work, they used 8 gestures (see: Figure 2.12), which were determined according to questionnaire results. All gestures were captured by a wireless sensor device equipped with accelerometers. The training set contains 240 samples for each gesture, while 28 samples per gesture are used for the validation set, and the experiment is repeated with different training set sizes. A recognition rate of over 90% was achieved with four training vectors by using an HMM classifier.

Figure 2.12. Gesture set from Kela's work [18].

In 2008, Thomas Schlömer et al. made a research on a gesture recognition system with a Wii controller using an HMM classifier [8]. There were 5 different gestures (see: Figure 2.13) with 90 samples for each gesture from 6 participants. The classifier was HMM and the average recognition accuracies by gesture were about 86% for circle, 88% for square, 84% for roll, 94% for z and 94% for tennis.

In the same year, 2008, Zoltan Prekopcsak proposed a real-time hand gesture recognition system by using two different classifiers, HMM and SVM [52]. The experimental results showed that the two classifiers reach 97.4% (with HMM) and 96% (with SVM) accuracy on a personalized gesture set containing 10 dif-

ferent gestures with 400 samples in total, captured by a Sony-Ericsson W910i mobile phone which has a built-in three-dimensional accelerometer accessible from the Java environment.

Figure 2.13. Gesture set from Thomas Schlömer's work [8].

In 2009, Jiahui Wu et al. proposed an acceleration-based gesture recognition approach called Frame-based Descriptor and multi-class SVM (FDSVM) [1]. There were 3360 gesture samples of 12 gestures (see: Figure 2.14) captured by a wearable 3-dimensional accelerometer. The experimental results showed that the proposed system reached 95.21% for all 12 gestures in the user-dependent case and 89.29% in the user-independent case, which is higher than the compared classifiers DTW, Naïve Bayes, C4.5 and HMM.

Figure 2.14. Gesture set from Jiahui Wu's work [1].

In addition to FDSVM, in 2009, uWave was proposed, an efficient personalized gesture recognizer based on a 3-D accelerometer which allows users to define personalized gestures with a single sample [53]. The gesture set was the same as in Kela's work from 2006, given in Figure 2.12. The data were captured using the 3-D accelerometer of the Wiimote. According to the experimental results, there were 4480 gestures from 8 participants, and uWave achieves an accuracy of 98.6% in the user-dependent case and 75.4% in the user-independent case.

In the same year, 2009, Tian Seng Leong et al. proposed an accelerometer-based recognition algorithm using a DTW classifier [54]. There were 10 different gestures, given in Figure 2.15, with 500 samples of each gesture from 50 participants. The recognition rate was 97%

using 3 or more examples in the training part.

Figure 2.15. Gesture set from Tian Seng Leong's work [54].

In 2010, Ahmad Akl and Shahrokh Valaee proposed a gesture recognition system based primarily on a single 3-D accelerometer [55]. There were 18 gestures (see: Figure 2.16) with 3.700 samples in total from 7 participants. The proposed recognizer achieves 99.79% in the user-dependent case and about 93.5% in the user-independent case.

Figure 2.16. Gesture set from Ahmad Akl and Shahrokh Valaee's work [55].

In summary, the comparison table (see: Table 2.5) shows the features of the previous works and of the proposed recognition system. The compared features are the number of data in the training set and in the validation set, the number of symbols used in the system, and the user-dependent and user-independent classification results as true positive rate (recall).

Table 2.5. Comparison table of the previous works and the proposed system.

Method | Data Collection | # of Symbols | User-Dependent Rc | User-Independent Rc
Jie Yang et al. [44] | 100 training and 50 validation samples of each gesture | 9 | 99.78% | -
Donald O. Tanguay [45] | 5 training and 5 validation samples of each gesture | 5 | 88% | -
Thad Eugene Starner [46] | 395 five-word sentences for training and 99 five-word sentences for validation in total | 40 | 97% with grammar, 91% without grammar | -
Mantyla et al. [3] | 4.050 training and 6 testing samples of each static gesture; 100 training and 100 testing samples of each dynamic gesture | 4 static, 6 dynamic | 96.46% for static, 96% for dynamic (50% user-independent) | -
Keskin et al. [47] | 469 training and 160 testing samples in total | 8 | 98.75% | -
Tracy Westeyn et al. [48] | 72 annotated sequences of signed sentences as test data | 5 | 94.26% | 90.48%
Ralph Niels [51] | 37.683 lowercase letters, 16.451 uppercase letters and 9.195 digits for training in total | 62 (in total) | 95.64% uppercase, 94.32% lowercase, 94.26% digits | 88.28% uppercase, 90.20% lowercase, 90.16% digits
Ratanamahatana et al. [9] | 100 samples of each gesture ('Gun'), 50 samples of each gesture ('Trace'), 272 samples in total ('Word Spotting') | 2, 4, 4 (respectively) | 99.5%, 100%, 99.67% (respectively) | -
Juha Kela et al. [18] | 240 training and 28 validation samples of each gesture | 8 | over 90% | -
Thomas Schlömer et al. [8] | 90 training samples of each gesture | 5 | approx. 84-94% per gesture | -
Zoltan Prekopcsak [52] | 400 training samples in total | 10 | 97.4% with HMM, 96% with SVM | -
Jiahui Wu et al. [1] | 3360 training samples in total | 12 | 95.21% | 89.29%
Tian Seng Leong et al. [54] | 500 training samples of each gesture | 10 | 97% | -
Ahmad Akl et al. [55] | 3700 training samples in total | 18 | 99.79% | approx. 93.5% for 18 gestures, 96.89% for 8 gestures
Proposed System | 1062 training samples in total | 20, 15 | - | 96.7% (Experiment 1), 95.5% (Experiment 2, end-user test)

Experiment 1 has 1062 test samples and Experiment 2 has 720 test samples (see: Experimental Information in Section 3.11.2).

3. PROPOSED METHOD

In this chapter, our proposed method is explained. Firstly, data acquisition is described in order to understand our data set; then the template generation and recognition processes are stated briefly. Next, variables, matrices and constant values are defined together with the given assumptions and initialization values. After this initial information, our proposed method is presented from the template training part to the gesture recognition part by following the data flow given in Figures 3.4 and 3.6.

3.1. Data Acquisition

A simple 4-operations calculator requires 16 gestures in total: 10 gestures for the digits [0-9], 4 gestures for the 4 operations ('+', '-', '×', '/'), 1 gesture for the 'equal' operator (to calculate and give the output) and 1 gesture to clear the last digit entered. In this project, we gathered 22 gestures covering almost all the gestures defined in previous works (see: Section 2.9) and covering the simple 4-operations calculator gestures.

Figure 3.1. (a) shows gestures for digits, (b) shows gestures for operations and (c) shows gestures for equal and clear operations. Here the meaning of each gesture is given on top of each gesture movement (each gesture's movement is in the xy plane).

The gesture set used in this thesis is given in Figure 3.2. In the figure, each gesture type's class number, meaning and usage in the calculator are given within a box shape. The gesture types were determined in order to satisfy the usage of a simple 4-operations calculator, plus some additional gestures using the z axis. All gestures' movements, except for the gestures [18-20], are in the xy plane; the gestures [18-20] have movement in the xz plane.

All gesture data were gathered with an iPhone 3GS at 60 Hz within the implemented application. In data capturing, the user holds the iPhone facing him/her and makes the movements given in Figure 3.2 while touching the screen and holding it until the movement is finished. The data contain the full time series of the x, y and z axes with a timestamp value for each clock tick of the accelerometer, plus additional meta information on each participant: age, sex, profession, disability (yes or no), hand information (right-handed or left-handed) and the hand used for the gesture data. All gesture data were taken with both hands from all participants, in order to introduce various noise into our system for testing the accuracy of our model. However, in the experiments the additional information was not considered, and every single gesture data collected from the participants was used.

Among the 22 collected gestures, 2 of them were very close to each other and produced lower recognition accuracy than expected (around a 90% accuracy rate), so we used 20 gestures, which cover the simple 4-operations calculator, with the implemented application.

Figure 3.2. Gesture set used in this project.

In order to access the built-in accelerometer of the iPhone, the iPhone operating system (iOS) provides the UIAccelerometer class under the UIKit framework. As the iPhone moves, the built-in accelerometer reports linear acceleration changes along the primary axes in three-dimensional space. The update frequency of the accelerometer can be configured, and 3D acceleration data with a timestamp can be received by using the shared UIAccelerometer object.

3.2. General Model

We propose an accelerometer based gesture recognition system which consists of two parts: a training (template generation) part and a classification (recognition) part. In the training part, key sequences, called templates, are generated, therefore this part is called template generation. Here, each gesture movement is captured by

the iPhone built-in accelerometer and saved into a dataset. Then each individual gesture data in the dataset is validated, filtered, down-sampled and trained in order to find the best warping window sizes and threshold values. In the classification part, a candidate gesture data is recognized according to the trained templates, therefore this part is called the recognition process. Here, the user makes a movement with the iPhone; it is captured by the iPhone built-in accelerometer and saved temporarily as candidate data. Then the candidate gesture data is matched with each template using the warping window sizes and threshold values, and the closest template is selected as the classification result. The matching of two gesture data is based on the Dynamic Time Warping (DTW) and LBKeogh algorithms, which determine the distance between two gesture data as a similarity metric [10, 11].

In brief, in the training part, the collected data is used to generate the template library, warping window sizes and threshold values of each gesture type. The warping window sizes contain the adjusted window size of each gesture type for each time t, and they are calculated by considering the classification accuracy result. The threshold values determine the minimum and maximum DTW distances as a range for each template. All operations, except for the data collection, are made by a PC because of its computational power; template generation is made once by the PC to produce the data essential for recognition, and the recognition process is done by the users at their choice. All operations in the recognition process are done with the iPhone 3GS, as a new interface for visually-impaired users.

As introductory information, the legend of the template generation and recognition process figures is given in Figure 3.3.

Figure 3.3. Legend used in template generation and recognition processes.

3.2.1. Template Generation

In the template generation processes, the collected data is validated, filtered, preprocessed, down-sampled and trained, respectively, in order to generate the key data, called template data, of each gesture type. The templates are the key sequences for each gesture type used to check the similarity

distance (DTW distance) with the candidate data and to determine the closest template for the candidate. There are two data flows in the system: one of them is used for generating the template library (T1), and the other one, called the validation data set (T2), is used for generating the warping window sizes and threshold values. In the first data flow, with the training data set, the system generates the template library of each gesture type; then, in the second data flow, with the validation data set, the system uses the template library to generate the warping window and threshold values. The difference between the training and validation sets is the preprocessing step: the gesture type information is needed to preprocess each data (see: Preprocessing, Section 3.6), therefore the validation data or any candidate data is not preprocessed. In addition, the templates, warping window sizes and threshold values are stored on the iPhone and used in the recognition processes.

Template generation is done once on a desktop machine. Figure 3.4 is an abstraction of our model. Each procedure, given with boxes in Figure 3.4, gets one gesture data at a time, except for the preprocessing and DTW training procedures. The general algorithm of template generation is given below.

Figure 3.4. Template generating processes.

Figure 3.5. Template generation algorithm.

3.2.2. Recognition Processes

In the recognition processes, a single candidate gesture data is classified by checking the DTW distances to the template gestures of the template library. Here, the captured user gesture data is used as the candidate gesture data. The distance is a metric of similarity, so the closest template gesture data is the most similar gesture data to our candidate.

Recognition processes are done on the iPhone. The iPhone application has the templates, warping window sizes and threshold values found previously in template generation, so it is used for data collection and candidate data classification. The DTW template matching step takes the down-sampled candidate data, the DTW template library and the warping window size values in order to find the closest template. The threshold control procedure takes the closest class information together with the threshold values, controls whether it is in range or not, and then gives the output as the classification result. Figure 3.6 is an abstraction of our recognition procedures.

Figure 3.6. Gesture recognition processes.

The general algorithm is given below.

Figure 3.7. Gesture recognition algorithm.

3.3. Theory

In this section, all definitions used in our model are given with their meanings and initial values, as well as the assumptions.

3.3.1. Definitions

A captured gesture data has x(k), y(k) and z(k) components, where k ∈ {1, ..., K} and K is the end time of the captured user movement. These 3 time series can be represented as a 3 × K matrix G. Therefore the gesture data is G = \{g(k)\}_{k=1}^{K}, where the k-th column is

g(k) = [x(k), y(k), z(k)]^T   (3.1)

and the first, second and third rows correspond to x(k), y(k) and z(k), respectively.
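For illustration, a captured gesture together with its label could be kept in a small container like the following; the names are ours and not part of the thesis implementation.

    from typing import Optional
    from dataclasses import dataclass
    import numpy as np

    @dataclass
    class GestureData:
        samples: np.ndarray     # 3 x K array holding the x, y and z time series
        label: Optional[int]    # gesture type m in {1, ..., 20}, None for unlabelled candidates

        @property
        def length(self) -> int:
            return self.samples.shape[1]   # K, the number of accelerometer readings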

In this study, we have 1,091 gesture data with known gesture types. There are M = 20 different gesture classes, given in Figure 3.2. We define G = \{G_1, ..., G_{1091}\} as the set of gestures and M = \{1, 2, ..., 20\} as the set of symbols. We defined a function φ : G → M, G ↦ m, where m is the gesture type corresponding to gesture G. Here,

G^m := \{G ∈ G \mid φ(G) = m\}   (3.2)

is the set of all gestures that correspond to gesture type m, so

G = \bigcup_{m=1}^{20} G^m   (3.3)

which means that the collected gesture set is the union of the gestures of each gesture type. Note that 48 ≤ |G^m| ≤ 55 for all m ∈ M, where |G| = 1,091; the 1,091 gesture data are thus almost evenly distributed over the classes. The class with the smallest number of gestures has 48, while the one with the largest number has 55.

3.3.2. Assumptions

• Assumption 1. All users start their gesture movement by touching the screen and hold it until the movement finishes. This assumption is made because our proposed method does not contain any segmentation procedure, so the captured data should already be segmented as a single gesture symbol. In addition, we assumed that a visually-impaired user would like to use the recognition system on demand, so an instant continuous recognition system might add extra difficulty to the usage of the system.

• Assumption 2. All users hold the device in the same position, and the device orientation does not change dramatically while the gesture movement continues.

• Assumption 3. All users do not move and they stand on the ground without any external force which may add noise to the accelerometer while the data is captured. Here, the frame of reference of the user is stationary, which means that the

user is not in accelerated motion while the data is collected.

• Assumption 4. The built-in accelerometer of the iPhone 3GS produces acceleration data with the same time interval Δt. In the actual data, Δt changes slightly because of some operating system delays.

3.4. Data Validation

The aim of data validation is to eliminate some error conditions before we start to extract the DTW templates, in order to increase the classification accuracy and template quality. There are three types of data validation used in this project. The first one is the data validation during gesture data collection: we noted the wrong gesture movements while the participants produced the data with the iPhone. This step is done before training starts, and the wrong movements are deleted from the data set. The other two validate each gesture data in the data set, according to the gesture length and the average amplitude, respectively.

3.4.1. Gesture Length Data Validation

A gesture data G may be too short or too long. Any gesture data shorter than K_min = 30 or longer than K_max = 205 is rejected in the validation step:

LengthValidation(G) = valid, if K_min < K_G < K_max;  invalid, otherwise   (3.4)

where K_G represents the length of a gesture data. Here, K_min = 30 is determined by the downsampling sample size N, and K_max = 205 is determined by multiplying the maximum gesture data length among all gestures in the training set by 1.1 (110%). According to gesture length data validation, if a gesture data G is marked as invalid then it is deleted from the data set by the system. This procedure is applied for all gesture data in the data set and each candidate gesture data entering the system.

3.4.2. Average Amplitude Data Validation

A gesture data G could be too weak or too strong for classification, which means that a user may make the gesture movement too slowly or too fast. Here, the amplitude is used for measuring the weakness or strength of the gesture. The amplitude of gesture data G at time k is

a(k) = \sqrt{x(k)^2 + y(k)^2 + z(k)^2}   (3.5)

and the average of the amplitude values of a gesture data G is defined as

A_G = \frac{1}{K} \sum_{k=1}^{K} a(k)   (3.6)

where K is the length of gesture data G. Any gesture data whose average amplitude is lower than A_min = 0.95 or greater than A_max = 2.10 is rejected in the validation step:

AmplitudeValidation(G) = valid, if A_min < A_G < A_max;  invalid, otherwise   (3.7)

where A_G represents the average amplitude value of a gesture data. Here, A_min = 0.95 is determined by multiplying the minimum average amplitude among all gesture data in the training set by 0.9 (90%), and A_max = 2.10 is determined by multiplying the maximum average amplitude among all gesture data in the training set by 1.1 (110%). According to average amplitude data validation, if a gesture data G is marked as invalid then it is deleted from the data set by the system. This procedure is applied for all gesture data in the data set and each candidate gesture data entering the system.
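Both validation rules can be sketched as simple predicates over a 3 × K gesture array; the constants are the ones quoted above, and the helper names are illustrative.

    import numpy as np

    K_MIN, K_MAX = 30, 205        # gesture length limits (Eq. 3.4)
    A_MIN, A_MAX = 0.95, 2.10     # average amplitude limits (Eq. 3.7)

    def length_valid(g):
        return K_MIN < g.shape[1] < K_MAX

    def amplitude_valid(g):
        a = np.sqrt((g ** 2).sum(axis=0))   # a(k) = sqrt(x^2 + y^2 + z^2), Eq. 3.5
        return A_MIN < a.mean() < A_MAX     # average amplitude, Eq. 3.6

    def is_valid(g):
        return length_valid(g) and amplitude_valid(g)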

3.5. Filtering Accelerometer Data

The aim of filtering is to eliminate the high frequency components, caused by the user's hand shaking or other noisy hand movements, from our gesture data and to smooth the data, so we used simple low-pass filtering (see: Low-Pass Filtering, Section 2.6). Some definitions used in the low-pass filtering are given below:

F_update = 60: the filtering update frequency, which is equal to the frequency used in gesture capturing.
F_cutoff = 10: the cutoff frequency used in filtering, which was determined after some trials.
α = F_cutoff / (F_cutoff + F_update): the filtering constant (smoothing factor).
ω = (x, y, z): the filtered gesture data at time t, which is used as a temporary variable in filtering.

The algorithm of the low-pass filter is given below. This procedure is applied for all gesture data in the data set and each candidate gesture data entering the system.

Figure 3.8. Low-Pass Filter algorithm.
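A sketch of a simple first-order low-pass filter with the constants above; it does not reproduce the exact update rule of Figure 3.8, only the usual exponential smoothing form under these assumptions.

    F_UPDATE = 60.0                              # accelerometer update frequency (Hz)
    F_CUTOFF = 10.0                              # cutoff frequency (Hz)
    ALPHA = F_CUTOFF / (F_CUTOFF + F_UPDATE)     # smoothing factor

    def low_pass(samples):
        """samples: list of (x, y, z) tuples; returns the smoothed sequence."""
        filtered = [samples[0]]
        for x, y, z in samples[1:]:
            px, py, pz = filtered[-1]
            filtered.append((px + ALPHA * (x - px),
                             py + ALPHA * (y - py),
                             pz + ALPHA * (z - pz)))
        return filtered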

3.6. Preprocess

The aim of preprocessing is to bring the gesture data of the same gesture type closer to each other. Gesture data belonging to the same gesture type may differ in three ways under the assumptions given at the beginning: i) a user may make a gesture movement slower or faster than others, which affects the amplitude of the gesture sequences; ii) a user holds the iPhone at a certain angle in the x, y and z axes even if he/she tries to hold it facing him/her; iii) a user may change the iPhone position while the gesture is captured. All three properties change the shape of the gesture data sequences, which affects the quality of the templates generated from all gesture data of the same type. The position information at time t is not known, therefore we cannot eliminate the third problem; however, we can eliminate the first and the second situations by preprocessing.

There are two types of preprocessing in this project: one of them is mean adjustment and the other one is variance adjustment of the gesture data of the same class, in order to increase the quality of the templates (and thereby the accuracy rate). Firstly, the average mean value is found among the gesture data of the same type, and each gesture data is shifted to have the same mean value as the average one. This procedure eliminates the differences among gesture data of the same type caused by the initial phone position (see: Figure 3.9b). Secondly, after the shift, the oscillation around the mean is tuned with the average variance value so that each gesture data of the same type has the same oscillation rate. This procedure eliminates the differences among gesture data of the same type caused by the sharpness of the user's hand movement (see: Figure 3.9c). Figure 3.9 shows the effects of the preprocessing procedures.

Figure 3.9. Illustration of preprocess effects. (a) Three raw gesture data in the x axis. (b) After alignment in terms of the average mean value. (c) After alignment in terms of the average variance value. (d) After alignment in terms of the mean and average variance value.

3.6.1. Mean Adjustment

In mean adjustment, we make all gesture data of the same gesture type in the training set have the same mean value. In order to equalize the mean values of gestures of the same gesture type, the average mean value of the given gesture type is calculated. Then the difference between the average mean value and the current mean value of the given gesture is added to the given gesture data.

The mean value of a gesture data is defined as

µ = [µ_x, µ_y, µ_z]^T   (3.8)

where the mean value of each dimension is calculated as

µ_x = \frac{1}{K} \sum_{i=1}^{K} x(i),  µ_y = \frac{1}{K} \sum_{i=1}^{K} y(i),  µ_z = \frac{1}{K} \sum_{i=1}^{K} z(i)   (3.9)

where K is the length of each time series. The average mean of all gesture data of the same gesture type m in the training set is defined as

µ^m = [µ^m_x, µ^m_y, µ^m_z]^T   (3.10)

where the average mean value of each dimension is defined as

µ^m_x = \frac{1}{|G^m|} \sum_{G ∈ G^m} µ_{G,x},  µ^m_y = \frac{1}{|G^m|} \sum_{G ∈ G^m} µ_{G,y},  µ^m_z = \frac{1}{|G^m|} \sum_{G ∈ G^m} µ_{G,z}   (3.11)

where |G^m| is the number of gesture data of gesture type m in the training set. After defining the mean of a gesture and the average mean of all gestures of the same gesture type, the mean adjustment operation for any gesture data G of gesture type m is defined as

G^* = G - µ_G + µ^m   (3.12)

where G^* is the mean adjusted gesture data.

3.6.2. Variance Adjustment

In variance adjustment, we make all gesture data of the same gesture type in the training set have the same variance value. In order to equalize the variance values of gestures of the same gesture type, the average variance value of the given gesture type is calculated. Then the average variance value is divided by the current variance value of the given gesture, the square root of this ratio is multiplied by the difference between the gesture data points and the mean value, and the mean value is added back, so that the gestures of the same gesture type have the same variance.

The variance value of a gesture data is defined as

v = [v_x, v_y, v_z]^T   (3.13)

where the variance value of each dimension is calculated as

v_x = \frac{1}{K} \sum_{i=1}^{K} (G_x - µ_x)^2,  v_y = \frac{1}{K} \sum_{i=1}^{K} (G_y - µ_y)^2,  v_z = \frac{1}{K} \sum_{i=1}^{K} (G_z - µ_z)^2   (3.14)

where K is the length of each time series and µ is the mean value of the given gesture on the given axis. The average variance of all gesture data of the same gesture type m in the training set is defined as

v^m = [v^m_x, v^m_y, v^m_z]^T   (3.15)

where the average variance value of each dimension is defined as

v^m_x = \frac{1}{|G^m|} \sum_{G ∈ G^m} v_{G,x},  v^m_y = \frac{1}{|G^m|} \sum_{G ∈ G^m} v_{G,y},  v^m_z = \frac{1}{|G^m|} \sum_{G ∈ G^m} v_{G,z}   (3.16)

where |G^m| is the number of gesture data of gesture type m in the training set. After defining the variance of a gesture and the average variance of all gestures of the same gesture type, the variance adjustment operation for any gesture data G of gesture type m is defined as

G^{**} = \sqrt{\frac{v^m}{v_G}} \times (G - µ_G) + µ_G   (3.17)

where G^{**} is the variance adjusted gesture data.
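A minimal sketch of both preprocessing steps over all gestures of one type, assuming the variance adjustment of Eq. 3.17 is applied to the already mean-adjusted data (so its per-gesture mean is the type average) and that no gesture has zero variance:

    import numpy as np

    def mean_variance_adjust(gestures):
        """gestures: list of 3 x K arrays belonging to one gesture type.
        Returns mean- and variance-adjusted copies (Eqs. 3.8-3.17)."""
        mus = [g.mean(axis=1, keepdims=True) for g in gestures]   # per-gesture means, each 3 x 1
        vs = [g.var(axis=1, keepdims=True) for g in gestures]     # per-gesture variances, each 3 x 1
        mu_m = np.mean(np.concatenate(mus, axis=1), axis=1, keepdims=True)   # average mean of the type
        v_m = np.mean(np.concatenate(vs, axis=1), axis=1, keepdims=True)     # average variance of the type
        adjusted = []
        for g, mu, v in zip(gestures, mus, vs):
            g_star = g - mu + mu_m                                     # mean adjustment (Eq. 3.12)
            g_star2 = np.sqrt(v_m / v) * (g_star - mu_m) + mu_m        # variance adjustment (Eq. 3.17)
            adjusted.append(g_star2)
        return adjusted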

3.7. Downsampling

The aim of downsampling is to obtain gesture data of the same length, without dependency on time: if one user makes a gesture slowly whereas another user makes the same gesture quickly, we cannot overlap the two data and cannot generate quality templates that increase the accuracy. In this study, we use average downsampling (see: Section 2.1). In the data collection part, the gesture data is sampled at 60 Hz, which is the running frequency of the iPhone built-in accelerometer. Here, the downsampling rate is 30 Hz, which is a popular frame rate used in high definition video and was determined intuitively, and the segmentation is assumed to be 1 second, so the sample size becomes N = 30.

The selection group length is defined as D = K/N, where K is the length of a gesture data G on any axis. After defining the selection group length, we divide each sequence of the gesture data in the x, y and z axes into groups. Then we take the average of each group and put it into a new sequence, which is the down-sampled sequence. Let us define the average (mean) value of a gesture data with given start and end times, where start < end, start > 0 and end ≤ K, as

µ(start, end) = [µ_x(start, end), µ_y(start, end), µ_z(start, end)]^T   (3.18)

where the segmented mean value of each dimension is defined as

µ_x(start, end) = \frac{1}{end - start} \sum_{i=start}^{end} x(i),  µ_y(start, end) = \frac{1}{end - start} \sum_{i=start}^{end} y(i),  µ_z(start, end) = \frac{1}{end - start} \sum_{i=start}^{end} z(i)   (3.19)

After defining the mean value of a gesture data with given start and end times, the downsampling algorithm is defined as given below.

Figure 3.10. Downsampling algorithm.
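A compact sketch of average downsampling to N = 30 columns, assuming the gesture has already passed length validation (K ≥ N); the group boundaries are an illustrative way of realizing the selection group length D = K/N.

    import numpy as np

    N_SAMPLES = 30   # target length after downsampling

    def downsample(g, n=N_SAMPLES):
        """Average-downsample a 3 x K gesture to 3 x n by averaging groups of columns."""
        k = g.shape[1]
        edges = np.linspace(0, k, n + 1).astype(int)   # boundaries of the n groups
        return np.stack([g[:, edges[i]:edges[i + 1]].mean(axis=1) for i in range(n)], axis=1)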

3.8. DTW Training

The aim of DTW training is to learn the template sequences (called queries or templates), the threshold values and the warping window size values from the training data set which give the highest accuracy on the validation data. DTW training takes two data sets: one of them is the preprocessed training data set and the other one is the validation data set (see: Figure 3.4). The general algorithm and detailed definitions are given below.

Figure 3.11. General DTW Training algorithm.

3.8.1. DTW Training Processes

3.8.1.1. Template Library Generation. In template library generation, we calculate the mean value of the gestures of the same gesture type in the training set T1 for each time index. Then, for each gesture type, a gesture data called the template is generated from the mean values for each time index. Note that, after downsampling, our time limit is now N, which is the sample size. A template is represented by Q and defined as

Q = \{q(n)\}_{n=1}^{N}   (3.20)

where the template data Q at any time index n is

q(n) = [x(n), y(n), z(n)]^T   (3.21)

The value of template Q of gesture type m for each axis at any time n is defined as

q_x(n) = \frac{1}{|G^m|} \sum_{G ∈ G^m} G_x(n),  q_y(n) = \frac{1}{|G^m|} \sum_{G ∈ G^m} G_y(n),  q_z(n) = \frac{1}{|G^m|} \sum_{G ∈ G^m} G_z(n)   (3.22)

where |G^m| is the number of gesture data of gesture type m in the training set. According to the template Q definition at any time index n, we can generate our DTW template library TL by calculating the template values for each gesture type m:

TL = [Q^1, Q^2, ..., Q^M]^T   (3.23)

where the total number of gesture types is M = 20.
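With equal-length (down-sampled) gestures, the per-type template of Eq. 3.22 is just an element-wise mean; a minimal sketch with illustrative names:

    import numpy as np

    def build_template_library(gestures_by_type):
        """gestures_by_type: dict mapping gesture type m to a list of 3 x N arrays.
        Returns a dict mapping m to its 3 x N template Q^m (Eq. 3.22)."""
        return {m: np.mean(np.stack(gs, axis=0), axis=0)
                for m, gs in gestures_by_type.items()}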

After defining the DTW template library, we can now define the lower limit L and upper limit U matrices for each model m, which are used in the LBKeogh distance calculations (see: LBKeogh distance in Section 2.7.3):

L = \{l(n)\}_{n=1}^{N},  U = \{u(n)\}_{n=1}^{N}   (3.24)

where

l(n) = [l^1(n), ..., l^M(n)]^T,  u(n) = [u^1(n), ..., u^M(n)]^T   (3.25)

and each lower and upper bound has three time series, l^m(i) = (x(i), y(i), z(i)) and u^m(i) = (x(i), y(i), z(i)). The lower and upper bounds for gesture type m on the x axis are obtained from the template q as

l^m(i)_x = \min\{q_x(j) \mid j \in [\max\{1, i-R\}, \min\{N, i+R\}]\},  u^m(i)_x = \max\{q_x(j) \mid j \in [\max\{1, i-R\}, \min\{N, i+R\}]\}   (3.26)

where the range is R = 3, which was determined after some trials.

3.8.1.2. Best Warping Window Size. The validation data set T2 is used to find the warping window sizes which produce the highest accuracy rate in classification. In this project, we used the dynamic warping window sizes defined in Ratanamahatana and Keogh's work [11]. Firstly, we define the warping window array as

W = \{w(n)\}_{n=1}^{N}   (3.27)

and initialize all its values to 1. Then we calculate the distance metric and save it. The distance metric is given below:

DistanceMetric = (TPD / FND) \times (FN / TP)   (3.28)

where TPD is the total DTW distance of the true positive classifications, FND is the total DTW distance of the false negative classifications, FN is the number of false negative classifications and TP is the number of true positive classifications, obtained by checking each gesture data in the training set and finding its closest template.

Secondly, we increment the values in the warping window array W by 1, calculate the distance metric and compare it with the previous one. If the distance metric is lower than the previous one, it means that our system has improved and we can continue to increase the warping size until reaching the maximum limit N, which is the same as the sample size. However, if the distance metric is greater than the previous one, then we divide the warping window array into two parts and follow the same procedure for each part. This is a recursive procedure called forward search, and the division of the warping size array continues until the length of a part becomes 1 (lower bound) or there is no further improvement (upper bound). However, defining the minimum limit as 1 produces a danger of over-fitting, so we define a lower bound threshold limit of 3 for the warping array part length (3.29), which was found empirically. In addition, in order to decrease the computational time for finding the best warping window size array, the maximum number of iterations for each forward search is defined as I = 3; after 3 iterations, the best warping window array did not change in our experiments. Moreover, after finishing one forward search we can save the warping window size array and restart the algorithm from the saved array. Figure 3.12 gives an illustration of the forward search for finding the best warping window sizes; h(1) and h(2) represent the distance metrics of the two warping array conditions. The general algorithm is given in Appendix C in Figure C.2, together with the other algorithms used.

The LBKeogh distance, defined in Section 2.7.3, is used while comparing

two gesture data. In our study, the LBKeogh distance is calculated by adding the distances of each time series (each dimension) (see: Appendix C, Figure C.5):

LB_Keogh(G1, G2) = \sum_{i=1}^{3} LB_Keogh(G1_i, G2_i)   (3.30)

where i is the index of the dimensions x, y and z, respectively. In addition, the DTW distance, defined in Section 2.7.2, is used while calculating the distance between two gesture data. The DTW distance is likewise calculated by adding the distances of each time sequence (each dimension) (see: Appendix C, Figure C.6):

DTWDistance(G1, G2) = \sum_{i=1}^{3} DTWDistance(G1_i, G2_i)   (3.31)

where i is again the index of the dimensions x, y and z.

Figure 3.12. Forward search algorithm from Ratanamahatana and Keogh's work [11].

if there isn’t any template in the range the unknown gesture data is marked as unclassified gesture. Then the closest template in the given threshold range for each dimension x. called misclassification problem. recognition processes (see: Figure 3. Recognition Processes In order to classify an unknown gesture data. which is not a general condition. Finding Closest Class with Threshold Control The closest gesture type is found using DTW template library. low quality template set. 3. 3. { M min (m)}m=1 is the minimum DTW distance among all training data in gesture type m with template max (m) is the maximum DTW distance among all training data in gesture type m with template Qm . Firstly.9. y and z becomes the prediction (classification) result. This problem. It can not be eliminated directly but reduced by defining a range of DTW distance values for each gesture type m. Here. given in Figure 3.9.2. The algorithm is given below: 3. Threshold Control An unknown gesture data C may be misclassified if there are some other closer templates than the actual one. warping window sizes and threshold values found in the template generation. which are called threshold values.6) are followed. noisy candidate data etc.7.9. Qm and .1.60 where i is the index of each dimension x. can be caused because of various different reasons such as. . y and z respectively.{ M max (m)}m=1 are defined according to DTW min (m) . an unknown gesture as candidate gesture data C is taken and calculated all DTW distances between each template in DTW template library. The minimum threshold values and maximum threshold values max min distances between each validation data and the corresponding template.

Figure 3.13. Finding Closest Class Index algorithm.

According to the threshold control,

ThresholdControl(Q^m, C) = \begin{cases} Invalid, & \text{if } DTW(Q^m, C) < V_{tolerance} \times δ_{min}(m) \\ Invalid, & \text{if } DTW(Q^m, C) > (1 + V_{tolerance}) \times δ_{max}(m) \\ Valid, & \text{otherwise} \end{cases}   (3.32)

where V_tolerance = 0.5 is the tolerance value, which was determined after some trials. Here, if the DTW distance between the unknown gesture data C and the template Q^m is not in range, then that template is not used in classification. This procedure is applied for all gesture data given to the system for classification.
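Putting the pieces together, a candidate is assigned to the nearest template whose DTW distance also passes the threshold test of Eq. 3.32. The following sketch reuses the helper sketches given earlier (dtw_distance, lb_keogh) and illustrative data structures; it is not the thesis implementation.

    V_TOLERANCE = 0.5

    def classify(candidate, templates, windows, thresholds):
        """templates: {m: 3 x N array}, windows: {m: length-N window array},
        thresholds: {m: (d_min, d_max)}.  Returns the predicted type or None."""
        best, best_dist = None, float("inf")
        for m, q in templates.items():
            if lb_keogh(q, candidate) >= best_dist:     # cheap lower bound prunes this template
                continue
            d = dtw_distance(q, candidate, windows[m])
            d_min, d_max = thresholds[m]
            if d < V_TOLERANCE * d_min or d > (1 + V_TOLERANCE) * d_max:
                continue                                 # outside the allowed range (Eq. 3.32)
            if d < best_dist:
                best, best_dist = m, d
        return best                                      # None means the gesture stays unclassified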

3.10. Parameter Tuning

In this study, all the parameters can be configured dynamically for parameter tuning. Firstly, minimum and maximum range values, as well as an increment value, are defined intuitively for each parameter used in this project. Secondly, all parameters are set to their minimum range values. Then, for each increment of each parameter, the classification accuracy is checked in order to find the maximum classification accuracy; in effect, we made a brute-force search for the parameters giving the highest accuracy. This operation is very costly in terms of CPU time, so some parameters are defined over a very small range or with a large increment. In this thesis, the algorithms that produced the best classification accuracy are given with their optimized parameter values.

3.11. Experimental Results

3.11.1. Implementation Information

All algorithms given in this thesis were implemented and packed as a framework. In addition to the algorithms given in this study, there are numerous algorithms in the framework which are not mentioned in the thesis. All implemented algorithms are given below:

• Filtering Algorithms: Low-pass, high-pass and band-pass filtering algorithms [56].
• Dimensional Reduction: Fast Fourier Transformation (FFT) [28], fixed amplitude, dynamic amplitude and dynamic split clustering, dynamic trajectory angle quantization [57-59].
• Sampling Algorithms: Simple, average, median and middle down-samplers.
• Clustering Algorithms: K-Mean, K-Medoid.
• Classifier Models: Markov model (observable HMM as MM), HMM, 3D HMM and DTW models [11, 29, 60].
• Cross Validation: K-fold, random subsampling and leave-one-out algorithms [61].

The implementation of our system consists of two parts: one of them is the computer based part and the other one is the iPhone application. Our proposed

system has been implemented as a framework called 'GestureRecognition' in order to reuse the code in the iPhone application². The application for the computer, which uses the 'GestureRecognition' framework, finds the optimal template library, warping window sizes and threshold values. Then the application produces an XML file containing the optimum values for the gesture recognition system, which is used by our iPhone application. Gestures can be freely collected with the iPhone application and should then be saved on the computer. After this point, we can put the XML file into our iPhone application and test the system by making a gesture. However, finding the optimal values may take longer than expected; in our case, it took 2 days on a 2.53 GHz Core 2 Duo, 4 GB RAM Mac Pro for 1062 gesture data with 20 different gestures.

3.11.2. Experimental Information

In this study, two experiments were made to validate our recognition system. In the first experiment, the training set was collected for 20 different gestures from 15 participants, who do not have any disability, with 55 samples for each gesture type (1100 gesture samples in total). In the collection stage, we taught the gesture set to each participant before data capturing and did not interrupt during data capturing. However, some participants made wrong gesture movements involuntarily; these were noted at the time and deleted from the gesture set. After this simple validation operation, 1091 samples remained in the data set. In the data validation step, the system eliminated 29 more gesture data, so 1062 gesture data were used in total. After data collection, our system was trained and validated with leave-one-out cross validation.

In the second experiment, we made a gesture based talking calculator with 15 different gestures by using the proposed system (the implemented framework) and the first experiment's training results. The Flite text-to-speech library and its wrapper by Sam Foster were used [62, 63]. In order to test our system with the calculator, 40 calculations, containing 180 characters in total, were defined, as given in Table 3.1. Then 4 visually-impaired people tested the system by making the defined calculations using our calculator over an adaptation period of 7 days.

² The thesis web page (http://soslab.cmpe.boun.edu.tr/doku.php/wiki/theses/abc) contains the framework and the details of this thesis.

Here, the number of days was determined intuitively. On each day, all participants' calculation data were collected in order to check the learning curve and the accuracy of the classifications.

Table 3.1. Calculations used in the second experiment.

6+9=    1/50=   2+0=    58/9=   72-4=
3×8=    6-7=    31×4=   7+9=    30/5=
4+5=    6/13=   1-24=   8×6=    8-7=
90×2=   5+1=    40/2=   67+4=   5/2=
8-9=    76×3=   1-9=    3×80=   90+7=
8/24=   30+4=   2/8=    3-6=    1×5=
1-5=    67×9=   8+3=    10/4=   81+9=
2/6=    56-2=   7×9=    30-5=   4×7=

3.11.3. Results

In the first experiment, 20 different gestures were collected and classified with an average recognition accuracy of 96.7%, which is a user-independent classification result. The data set contains 1062 data in total and leave-one-out cross validation was used for validation. The confusion matrix of the first experiment is given in Table 3.2. Moreover, the minimum and maximum DTW distances (threshold values) and the warping window sizes of each gesture type are given in Appendix B.

Figure 3.14. Accuracy rate per symbol from experiment 1.

In the second experiment, we obtained 95.5% recognition accuracy on the last day, which is a user-independent classification result. The data set contains 15 different gestures with

720 data in total (180 samples from each of the 4 visually-impaired people). The experimental data was recorded day by day, and the daily average recognition accuracy (learning curve), the daily recognition accuracy per person and the daily recognition accuracy per symbol are given in Figure 3.15. In addition to all the findings of the experiments, the effect of each procedure in the system on the classification result was calculated and is given in Appendix A, with Table A.1 and Figure A.1. Lastly, and interestingly, all participants of experiment 2 had grapheme-color synesthesia, a neurologically-based condition in which letters or numbers are perceived as inherently colored, and they were unaware of it during the experiment [64].

Table 3.2. Confusion matrix of the experimental result.

Figure 3.15. Second experiment's results. (a) Daily average accuracy. (b) Daily average accuracy per person. (c) Daily average accuracy per symbol.

4. CONCLUSION AND FUTURE WORK

We have introduced an optimized accelerometer-based gesture recognition system on touch-screen smartphones with an accelerometer for visually-impaired people, as well as its implementation as a gesture based talking calculator. In our experiments, the system is totally user-independent and it does not have limitations on gesture collection. In this study, some assumptions were made on gesture collection in order to increase our accuracy rate, such as holding the phone facing the user. The only and main limitation is that the system does not check the data while the user is making the gesture movement: a gesture data should be captured first and only then classified, so a participant should press the screen to save the accelerometer data until lifting the finger from the screen.

This kind of gesture recognition system can be used anywhere human-computer interaction occurs. It may be used in games or in simulations in order to obtain and analyze information about the interactions of humans with humans and with computers; for example, a war game designed around hand gestures can give a better understanding of how gestures work under stress [65].

In order to increase the accuracy rate, a gesture set can be determined by increasing the dissimilarity of the gestures from each other while considering user ability and comfort during writing. In addition, data from additional sensors (e.g. a gyroscope) can be added to the system while preprocessing the data in order to obtain more accurate results. In this study, templates are generated directly from the simply preprocessed training data set and the preprocessing steps are defined intuitively; therefore, a template quality metric may be defined by considering gesture data features and possible noise in the data. Making our proposed system a continuous-gesture recognition system, which segments the data instantly and tries to classify each segment, may also be a good goal for future work.

In summary, our proposed gesture recognition system provides outstanding performance in terms of user-independent recognition accuracy, end-user experimental results and the variety of its gesture set when compared to other systems in the literature.

APPENDIX A: EFFECTS OF PROCEDURES

Figure A.1. Effects of procedures on classification accuracy. (a) Effects of procedures on classification accuracy ordered by combination number. (b) Effects of procedures on classification accuracy ordered by accuracy rate. Here, the grey boxes show the procedures that take place (on), while the white boxes show the procedures that do not take place (off).

82% 93.58% 93.82% 94.79% 90.1.35% 94.54% 90.05% 89.97% 94.71 Table A.48% 88.61% 94.63% 93.35% 94.63% 93.06% 96.01% 91.51% 90.05% 90.72% 94.01% 88. Combination No Procedure Name 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON OFF ON \ Filtering Preprocess Mean Adjustment OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON OFF OFF ON ON Preprocess Variance Adjustment OFF OFF OFF OFF ON ON ON ON OFF OFF OFF OFF ON ON ON ON OFF OFF OFF OFF ON ON ON ON OFF OFF OFF OFF ON ON ON ON OFF OFF OFF OFF OFF OFF OFF OFF ON ON ON ON ON ON ON ON OFF OFF OFF OFF OFF OFF OFF OFF ON ON ON ON ON ON ON ON Threshold Control Warping Window Sizes Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Constant : 3 Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic Dynamic 88.48% 88. Effects of procedures on classification accuracy.79% 89.97% 94.83% 90.63% 94.92% 90.97% 94.01% 91.86% 89.79% 90.70% Accuracy .35% 96.78% 94.

00674 1.7967 7.6344 13.48257 7.542985 0.05376 10.74777 6.70625 0.982116 1.43468 8.306786 0.02641 1.610638 0.08345 0.9138 11.943864 0.12 7.02938 5.614656 0.76474 11.61127 2.5646 13.30977 1.44584 0.140728 0.1071 9.968705 1.1.36975 0.1798 0.0865 9.4934 6.5179 9.72 APPENDIX B: FINDINGS FROM TRAINING PART Table B.977618 1.5803 9.88993 .3523 9.567924 0.21046 8.580739 0.704822 1.523203 0.58284 7.520953 0.84511 6.44041 5.4402 6.5253 10.8662 6.24038 Max DTW Distance on z axis 18.726587 0.96132 10.6204 9.5169 8.47047 7.86949 0.30249 7. Gesture Class Threshold 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 \ Min DTW Distance on x axis 0. DTW distance threshold values of each template gesture type on each axis.652959 0.880651 0.08159 1.488112 0.513074 0.52301 9.40113 7.660745 0.45951 17.26738 0.12837 9.75498 11.7932 8.366963 0.55948 0.759 5.2111 0.552789 0.2703 6.0276 11.36223 10.781868 0.628353 0.20961 9.19809 8.627831 0.892143 1.766398 Max DTW Distance on x axis 4.788986 1.82795 9.33565 12.29786 1.30562 7.36951 Min DTW Distance on y axis 0.1234 8.577131 0.09326 0.02519 1.87538 Min DTW Distance on z axis 0.76258 9.57316 8.595201 Max DTW Distance on y axis 15.49621 7.584787 0.09508 8.249236 0.79466 11.650224 0.2732 8.15434 8.46075 4.621314 0.543717 0.684935 0.51399 6.787927 0.870725 0.357632 0.253275 0.975159 0.3236 0.21406 6.8274 6.554388 0.

Warping window sizes of all gesture types for x axis (m is gesture type.2. m\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 2 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 5 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 6 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 7 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 8 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 9 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 3 29 10 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 3 29 11 29 29 29 29 29 29 29 29 29 29 29 29 3 29 29 29 29 29 29 29 12 29 29 29 29 29 29 29 29 29 29 29 29 3 29 29 29 29 29 29 29 13 29 29 29 29 6 29 29 29 29 4 29 29 29 5 29 29 29 4 29 29 14 29 29 29 29 6 29 29 29 29 4 29 29 29 5 29 29 29 4 29 29 15 29 29 29 29 6 29 29 29 29 4 29 29 29 5 29 29 29 4 29 29 16 29 29 29 29 29 29 29 29 29 29 29 29 29 4 29 29 29 1 29 29 17 29 29 29 29 29 29 29 29 29 29 29 29 29 4 29 29 29 1 29 29 18 29 6 29 29 29 29 4 29 29 29 5 29 29 29 29 29 4 29 29 29 19 29 6 29 29 29 29 4 29 29 29 5 29 29 29 29 29 4 29 29 29 20 29 4 29 29 29 5 29 29 7 4 5 29 29 4 29 29 29 29 3 29 21 29 4 29 29 29 5 29 29 7 4 5 29 29 4 29 29 29 29 3 29 22 29 29 29 29 5 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 23 29 29 29 29 5 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 24 29 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 25 29 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 26 29 29 3 29 29 29 5 7 29 29 29 29 5 29 29 29 29 29 29 29 27 29 29 3 29 29 29 5 7 29 29 29 29 5 29 29 29 29 29 29 29 28 29 29 29 29 29 5 29 29 29 7 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 5 29 29 29 7 29 29 29 29 29 29 29 29 29 29 30 29 29 29 29 29 5 29 29 29 7 29 29 29 29 29 29 29 29 29 29 .Table B. n is time).

Table B. Warping window sizes of all gesture types for y axis (m is gesture type.3. n is time). m\n 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 1 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 2 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 4 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 5 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 1 29 6 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 1 29 7 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 8 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 9 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 3 29 29 29 10 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 3 29 29 29 11 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 3 1 29 12 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 3 1 29 13 29 29 29 29 29 3 29 29 29 29 29 29 29 29 4 29 29 29 0 3 14 29 29 29 29 29 3 29 29 29 29 29 29 29 29 4 29 29 29 0 3 15 29 29 29 29 29 3 29 29 29 29 29 29 29 29 4 29 29 29 0 3 16 29 29 29 29 29 29 4 5 3 3 29 29 29 29 29 29 29 3 29 29 17 29 29 29 29 29 29 4 5 3 3 29 29 29 29 29 29 29 3 29 29 18 29 29 29 29 29 5 29 29 3 29 3 29 3 8 4 29 29 29 29 29 19 29 29 29 29 29 5 29 29 3 29 3 29 3 8 4 29 29 29 29 29 20 29 29 29 29 29 3 29 29 29 3 29 29 29 29 3 29 29 29 0 29 21 29 29 29 29 29 3 29 29 29 3 29 29 29 29 3 29 29 29 0 29 22 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 29 29 23 29 29 29 29 29 3 29 29 29 29 29 29 29 29 29 29 29 29 29 29 24 29 8 29 29 29 3 29 29 29 3 29 29 3 29 29 6 4 29 29 4 25 29 8 29 29 29 3 29 29 29 3 29 29 3 29 29 6 4 29 29 4 26 29 3 29 29 29 3 3 29 29 29 29 29 29 29 29 29 29 29 29 29 27 29 3 29 29 29 3 3 29 29 29 29 29 29 29 29 29 29 29 29 29 28 29 29 29 29 29 29 29 29 29 29 29 29 3 29 3 29 3 29 2 29 29 29 29 29 29 29 29 29 29 29 29 29 29 3 29 3 29 3 29 2 29 30 29 29 29 29 29 29 29 29 29 29 29 29 3 29 3 29 3 29 2 29 .

Table B.4. Warping window sizes of all gesture types for z axis (m is gesture type, n is time).

m\n  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20
 1  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 2  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 3  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 4  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 5  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 6  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 7  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 8  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
 9  29 29 29 29 29 29 29 29 29 29 29 29  3 29 29 29  3 29 29  3
10  29 29 29 29 29 29 29 29 29 29 29 29  3 29 29 29  3 29 29  3
11  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
12  29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29 29
13  29 29 29 29 29 29 29 29  3 29 29 29 29 29 29  3 29 29  3 29
14  29 29 29 29 29 29 29 29  3 29 29 29 29 29 29  3 29 29  3 29
15  29 29 29 29 29 29 29 29  3 29 29 29 29 29 29  3 29 29  3 29
16  29  3 29 29 29 29 29 29 29  3 29 29  4 29  5 29 29 29 29 29
17  29  3 29 29 29 29 29 29 29  3 29 29  4 29  5 29 29 29 29 29
18  29 29 29 29 29 29  8 29 29  3 29 29 29 29 29 29  3 29 29 29
19  29 29 29 29 29 29  8 29 29  3 29 29 29 29 29 29  3 29 29 29
20  29 29 29 29 29 29 29  5 29 29 29 29 29 29 29  3 29 29  4  5
21  29 29 29 29 29 29 29  5 29 29 29 29 29 29 29  3 29 29  4  5
22  29  3 29 29 29 29 29 29 29 29 29  6  3 29  4 29  3 29 29 29
23  29  3 29 29 29 29 29 29 29 29 29  6  3 29  4 29  3 29 29 29
24  29 29 29 29 10 29 29 29 29 29  3  5 29  7 29 29 29 29 29  3
25  29 29 29 29 10 29 29 29 29 29  3  5 29  7 29 29 29 29 29  3
26  29 29 29 29  3 29 29 29  3 29 29 29 29 29 29  4 29 29  3 29
27  29 29 29 29  3 29 29 29  3 29 29 29 29 29 29  4 29 29  3 29
28  29 29 29 29 29 29 29 29 29 29 29 29 29  4  4 29 29 29 29 29
29  29 29 29 29 29 29 29 29 29 29 29 29 29  4  4 29 29 29 29 29
30  29 29 29 29 29 29 29 29 29 29 29 29 29  4  4 29 29 29 29 29

APPENDIX C: ALGORITHMS

Figure C.1. Simple DTW Distance with Dynamic Warping Window Size algorithm.
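The algorithm figure itself is not reproduced here. As an illustration only, the following Python sketch shows a DTW distance restricted to a warping window (a Sakoe-Chiba-style band); the function name, the squared-difference local cost, and the final square root are assumptions, not necessarily the exact formulation of Figure C.1. The per-gesture window sizes would come from the tables in Appendix B.

import numpy as np

def dtw_distance_windowed(a, b, window):
    # DTW distance between 1-D sequences a and b, with the warping path
    # constrained to |i - j| <= window (illustrative sketch).
    n, m = len(a), len(b)
    w = max(window, abs(n - m))  # the band must at least cover the length difference
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(max(1, i - w), min(m, i + w) + 1):
            d = (a[i - 1] - b[j - 1]) ** 2
            cost[i, j] = d + min(cost[i - 1, j],      # insertion
                                 cost[i, j - 1],      # deletion
                                 cost[i - 1, j - 1])  # match
    return np.sqrt(cost[n, m])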

Figure C.2. Best Warping Window Size algorithm.

Figure C.3. DTW evaluation algorithm.

Figure C.4. Finding Best Warping Window Sizes algorithm.
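Since the figure is not reproduced, the sketch below only illustrates the general shape of such a search: candidate window sizes are tried in increasing order and the smallest window whose total DTW cost to the training samples stays close to the unconstrained cost is kept. The 5% tolerance and the candidate range 1-29 (29 appears to be the default window in the Appendix B tables) are assumptions, not the criterion actually used in the thesis; the sketch reuses dtw_distance_windowed from above.

def best_window_size(template, training_samples, candidates=range(1, 30)):
    # Total DTW cost with an effectively unconstrained window.
    full = sum(dtw_distance_windowed(template, s, max(len(template), len(s)))
               for s in training_samples)
    for w in candidates:  # smallest candidate window first
        cost = sum(dtw_distance_windowed(template, s, w) for s in training_samples)
        if cost <= 1.05 * full:  # within 5% of the unconstrained cost (illustrative)
            return w
    return max(candidates)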

Figure C.5. LBKeogh Distance calculation algorithm.
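The LB_Keogh bound itself is standard; a minimal sketch follows, assuming the template's upper and lower envelopes have already been generated (as in Figure D.6) and that the query has been resampled to the same length.

import numpy as np

def lb_keogh(query, upper, lower):
    # Lower bound of the DTW distance between 'query' and the template
    # whose envelope is (upper, lower); all arrays have equal length.
    q = np.asarray(query, dtype=float)
    above = np.clip(q - np.asarray(upper, dtype=float), 0.0, None)
    below = np.clip(np.asarray(lower, dtype=float) - q, 0.0, None)
    return np.sqrt(np.sum(above ** 2) + np.sum(below ** 2))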

Figure C.6. DTW Distance calculation algorithm.

APPENDIX D: CASE STUDY

Let us assume there are 2 gesture classes and 1 participant, and that 4 gesture data for training and 5 gesture data for validation of each gesture class were collected under the same conditions defined in the Experimental Information section (see: Section 3.2). The first gesture type, called ‘Class A’, and the second gesture type, called ‘Class B’, which were chosen among the gesture set used in this thesis, are shown in Figure D.1. All main procedures and results are explained below.

• Template Generation. In order to generate templates, the procedures described in the Template Generation section are followed using the training data set of the case study. The raw, filtered, preprocessed, and sampled training data, together with the generated template sequences with upper and lower bounds, are given in Figures D.2 to D.6.

• Recognition Processes. After the DTW training step, which produces the warping window sizes and threshold values, the system is ready to classify each gesture of the validation data set by following the general recognition algorithm explained in the Recognition Processes section (see: Section 3.2). The raw, filtered, validated and sampled validation data are given in Figure D.7.

• Result of case study. After getting the results from template matching with threshold control, a confusion matrix is generated in order to see all classification results. The template matching result and the generated confusion matrix are shown in Table D.1. In the data validation step, 2 data sequences are eliminated, so there are 2 non-classified sequences; the classification accuracy is 100% for the 2 classes with the remaining 8 validation data.

Figure D.1. Gesture types used in case study.

Figure D.2. Raw gesture data sequences. (a) Raw Class A. (b) Raw Class B.

Figure D.3. Filtered gesture data sequences. (a) Filtered Class A. (b) Filtered Class B.

Figure D.4. Preprocessed gesture data sequences. (a) Preprocessed Class A. (b) Preprocessed Class B.

Figure D.5. Sampled gesture data sequences. (a) Sampled Class A. (b) Sampled Class B.

Figure D.6. Template data sequences. (a) Template sequences of Class A with upper and lower bounds. (b) Template sequences of Class B with upper and lower bounds.

Figure D.7. Raw, filtered, and sampled validation data for Class A and Class B. (a) Raw validation data. (b) Validation data after Data Validation. (c) Filtered validation data. (d) Down-sampled validation data.
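For reference, a minimal sketch of the kind of preprocessing pipeline shown in Figure D.7: a simple moving-average low-pass filter followed by down-sampling to a fixed length. The filter type, the 5-sample window, and the target length of 30 are assumptions for illustration, not the exact parameters used in the thesis.

import numpy as np

def moving_average(x, k=5):
    # Simple low-pass filter: k-point moving average (illustrative).
    kernel = np.ones(k) / k
    return np.convolve(np.asarray(x, dtype=float), kernel, mode="same")

def downsample(x, target_len=30):
    # Resample a sequence to 'target_len' points by linear interpolation.
    x = np.asarray(x, dtype=float)
    old = np.linspace(0.0, 1.0, num=len(x))
    new = np.linspace(0.0, 1.0, num=target_len)
    return np.interp(new, old, x)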

Figure D.8. Template Matched data.

Table D.1. Template Matching result with confusion matrix.
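To make the template matching with threshold control described above concrete, here is a hedged sketch of the recognition step: the LB_Keogh bound prunes clearly distant templates, DTW with the learned window size gives the final distance, and a gesture whose distance exceeds every template's threshold is left non-classified. The template dictionary layout and the exact pruning logic are illustrative assumptions; the sketch reuses dtw_distance_windowed and lb_keogh from Appendix C above.

def classify(sample, templates):
    # templates: list of dicts with keys 'label', 'seq', 'upper', 'lower',
    # 'window', 'threshold' (illustrative layout). Returns a label or None.
    best_label, best_dist = None, float("inf")
    for t in templates:
        # Cheap lower bound first: skip templates that can neither beat the
        # best distance so far nor pass their own threshold.
        if lb_keogh(sample, t["upper"], t["lower"]) >= min(best_dist, t["threshold"]):
            continue
        d = dtw_distance_windowed(sample, t["seq"], t["window"])
        if d < t["threshold"] and d < best_dist:
            best_label, best_dist = t["label"], d
    return best_label  # None means the gesture is left non-classified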
