Babeş-Bolyai University of Cluj-Napoca
Faculty of Mathematics and Computer Science
Major in Intelligent Systems
Master’s Thesis
Recognizing Physical Exercises
Supervisor: Assoc. Prof. Ph.D. Simona Motogna
Author: Paul V. Borza
June 2010
Contents
Acronyms iii
1 Introduction 1
1.1 Research problem and objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Collaboration with University of Siena . . . . . . . . . . . . . . . . . . . . . . . . 3
2 Human–computer interaction 4
2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Historical roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Present trends and likely future developments . . . . . . . . . . . . . . . . . . . . 5
2.4 Personalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4.1 Prof. Ph.D. Lawrence R. Rabiner . . . . . . . . . . . . . . . . . . . . . . . 5
2.4.2 Assoc. Prof. Ph.D. Thad Starner . . . . . . . . . . . . . . . . . . . . . . . 6
4 Pattern recognition 32
4.1 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Low-pass filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 High-pass filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Inverse discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Hidden Markov models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.1 Definition of Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Definition of hidden Markov models . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 Definition of continuous density hidden Markov models . . . . . . . . . . 39
4.3.4 The evaluation problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.5 The decoding problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.6 The learning problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.7 Issues and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
6 Results 48
6.1 Recognition of activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1.1 Comparison to other approaches . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Computation of repetition counts . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
7 Conclusion 59
7.1 Summary of the work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 Critical evaluation of own work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
Bibliography 61
Index 64
Acronyms
ACC accelerometer. 2, 7–10, 33, 34, 36, 37, 43, 46, 54, 59, 60
EMG electromyogram. 8, 60
IDFT inverse discrete Fourier transform. 2, 35, 37, 44, 46, 54, 59
Chapter 1
Introduction
The advent of low-cost motion sensors allows industrial designers and developers to create context-aware applications that enrich user interaction. With two such custom devices in place, one attached to the wrist of the non-dominant arm and the other attached to the opposite ankle, a three-phase approach that keeps track of physical exercises and repetition counts is proposed.
other developers a framework capable of defining and detecting medium-sized sets of activities.
The immediate objective of this paper is to showcase a health-related application that keeps
track of physical exercises and counts the number of repetitions.
Given two devices, each consisting of a three-axis accelerometer and a three-axis gyroscope,
the first attached to the wrist of the non-dominant arm and the second attached to the opposite
ankle, the problem translates into detecting and correctly classifying segments of continuous
streams of data into classes. The data volume remains an issue, as sensor frequencies of about 70–100 Hz are required in such problems.
Human activities have three aspects of signal characteristics that make them particularly
difficult to recognize: noisy input, segmentation ambiguity, and spatio-temporal variability. The
noisy input refers to the inherent noisy nature of the wearable inertial sensors, hence noise
reduction needs to be done ahead of time. Segmentation ambiguity refers to not knowing the
boundaries, and therefore reference patterns have to be matched with all possible segments of
input signals. Spatio-temporal variability refers to the fact that each repetition of the same
gesture varies dynamically in shape and duration, even for the same performer.
The lack of a standardized dataset for detecting human activities is a significant problem that leads many researchers to focus on other open problems. Thus, a dataset was recorded at the University of Siena with the help of 10 volunteers. All candidates performed 20 physical exercises, each with up to 12 repetitions, in about 15 minutes while the sessions were videotaped; during the course, each volunteer also performed other activities that could be mistaken for the monitored exercises, with the sole purpose of thoroughly measuring the reliability of the proposed approach.
1.2 Contribution
The problem of successfully recognizing classes of a medium-sized vocabulary of 20 human ac-
tivities given a continuous stream of collected samples, in a constrained time interval, poses a
trade-off between processing speed and recognition accuracy. The work presented in this paper
employs state-of-the-art signal filters, transformations, and machine learning techniques. The first phase applies a fast low-pass filter to remove the small variations that all sensors are susceptible to, then converts the smoothed input to the frequency domain with a discrete Fourier transform, discards low-amplitude frequencies, and translates the result back to the time domain with an inverse discrete Fourier transform. Once the streams are noise-free, the second
phase splits and classifies each half-overlapping sliding window of two and a half seconds into
one of the classes; two variants are presented: k-means with discrete hidden Markov models and
continuous density hidden Markov models of multivariate Gaussian mixtures. The final phase
detects the number of repetitions, either with a time- or peak-based heuristic, once a continuous block of windows that contain activities is detected.
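The half-overlapping sliding-window segmentation of the second phase can be sketched as follows (a minimal Python illustration; the 2.5 s window length is from the text, while the function name and the 100 Hz rate are illustrative assumptions):

```python
import numpy as np

def sliding_windows(samples, rate_hz=100, window_s=2.5):
    """Split a stream into half-overlapping windows of window_s seconds.

    samples: array of shape (n_samples, n_channels).
    Returns (start, end) index pairs; consecutive windows overlap by
    half their length, as described in the text.
    """
    size = int(window_s * rate_hz)        # 250 samples at 100 Hz
    step = size // 2                      # half-overlap -> 125 samples
    return [(s, s + size) for s in range(0, len(samples) - size + 1, step)]

# Example: one minute of six-channel data at 100 Hz.
stream = np.zeros((6000, 6))
windows = sliding_windows(stream)
```

Each window would then be passed to the classifier independently, which is what makes the segmentation-ambiguity problem tractable.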
The two variants of hidden Markov models are compared in terms of computational and
storage efficiency, but the most relevant characteristic remains the recognition accuracy which is
measured in three different scenarios: subject-dependent, subject-independent, and mixed k-fold
cross-validation.
The author participated in the 2008 Google Summer of Code and developed an application in C, for Openmoko's second-generation mobile phone, capable of recognizing 12 motion gestures in real time; his bachelor's thesis was on the same subject and used a Nintendo Wii Remote as input for controlling various applications on a host computer. The decision to implement the core part of this framework in MATLAB, with the consumer endpoints in C and C#, is based on MATLAB's code generation capabilities for C, C++, FORTRAN, Java, and C#.
Chapter 2
Human–computer interaction
2.1 Definition
The Association for Computing Machinery, Special Interest Group on Computer Human In-
teraction [11] defines human–computer interaction as “a discipline concerned with the design,
evaluation, and implementation of interactive computing systems for human use and with the
study of major phenomena surrounding them”.
HCI in the large is an interdisciplinary area and emerges as a specialty concern within several
disciplines, each with different emphases: computer science (application design and engineering of
human interfaces), psychology (the application of theories of cognitive processes and the empirical
analysis of user behavior), sociology and anthropology (interactions between technology, work,
and organization), and industrial design (interactive products).
Human–computer interaction is concerned with:
2.4 Personalities
Due to the multidisciplinary nature of human–computer interaction, people with different back-
grounds contribute to its success. HCI is also referred to as man–machine interaction or computer–human interaction.
AT&T Labs in 1996 as Director of the Speech and Image Processing Services Research Lab, and
was promoted to Vice President of Research in 1998 where he managed a broad research program
in communications, computing, and information sciences technologies. Dr. Rabiner retired from
AT&T at the end of March 2002 and is now a Professor of Electrical and Computer Engineer-
ing at Rutgers University, and the Associate Director of the Center for Advanced Information
Processing at Rutgers.
Dr. Rabiner is co-author of the books "Theory and Application of Digital Signal Processing" (Prentice-Hall, 1975), "Digital Processing of Speech Signals" (Prentice-Hall, 1978), "Multirate Digital Signal Processing" (Prentice-Hall, 1983), and "Fundamentals of Speech Recognition" (Prentice-Hall, 1993).
Dr. Rabiner is a member of Eta Kappa Nu, Sigma Xi, Tau Beta Pi, the National Academy of
Engineering, the National Academy of Sciences, and a Fellow of the Acoustical Society of Amer-
ica, the IEEE, Bell Laboratories, and AT&T. He is a former President of the IEEE Acoustics,
Speech, and Signal Processing Society, a former Vice-President of the Acoustical Society of Amer-
ica, a former editor of the ASSP Transactions, and a former member of the IEEE Proceedings
Editorial Board.
Education
• Ph.D. in Electrical Engineering, Massachusetts Institute of Technology, 1967
• MS in Electrical Engineering, Massachusetts Institute of Technology, 1964
• BS in Electrical Engineering, Massachusetts Institute of Technology, 1964
2.4.2 Assoc. Prof. Ph.D. Thad Starner

Education
• Ph.D. in Media Arts and Sciences, Massachusetts Institute of Technology, 1999
• MS in Media Arts and Sciences, Massachusetts Institute of Technology, 1995
Chapter 3

Gestures and activities
performed with support vector machines on data collected from 67 subjects. The best average recognition result for 17 complex gestures is 87.36%, achieved with the wavelet-based method, while DCT and FFT produce accuracies of 85.16% and 86.92%, respectively.
Liu et al. [15] present an efficient recognition algorithm for interaction based on gestures
also using a single three-axis accelerometer. The approach requires a single training sample
for each gesture pattern and allows users to employ personalized gestures. The algorithm, a
modified dynamic time warping, is evaluated against a large gesture library with over 4,000
samples for eight gesture patterns collected from eight users over one month. This achieves
98.6% accuracy, competitive with statistical methods that require significantly more training
samples. Applications of gesture-based user authentication and interaction with 3D mobile user
interfaces are also showcased.
Tapia et al. [30] present a real-time algorithm for automatic recognition of not only physical
activities, but also, in some cases, their intensities, using five triaxial wireless accelerometers
and a wireless heart rate monitor. The algorithm was evaluated with datasets consisting of
30 physical gymnasium activities collected from a total of 21 people at two different laborato-
ries. On these activities, recognition accuracies of 94.6% using subject-dependent training and
56.3% using subject-independent training were obtained. The addition of heart rate data improves subject-dependent recognition accuracy by 1.2% and subject-independent recognition by 2.1%. When recognizing activity types without differentiating intensity levels, a subject-
independent performance of 80.6% is obtained.
Wilson et al. [32] create a novel wireless sensor package that enables styles of natural interac-
tion with intelligent environments. For example, a user is able to point the wand at a device and
control it using simple gestures. The proposed system leverages the intelligence of the environ-
ment to best determine the user’s intention. Details on the hardware device, signal processing
algorithms to recover position and orientation, gesture recognition techniques, and multimodal
(wand and speech) computational architecture are also presented. Furthermore, a preliminary
user study on pointing performance under conditions of tracking availability and audio feedback
is examined.
Xu et al. [34] describe a novel hand gesture recognition system that utilizes both multichannel
surface electromyogram sensors and three-axis accelerometers. Signal segments of meaningful
gestures are determined from the continuous EMG signal inputs. Multistream hidden Markov
models consisting of EMG and acceleration streams are used to classify 18 gestures, each trained
with 10 repetitions; the average recognition accuracy obtained was 91.7%. This paper also
presents a virtual Rubik’s cube game that is controlled with hand gestures.
Figure 3.1: The annotating application is capable of working with an entire session composed
of several recorded data files and an associated movie file. The video takes the top part of the
screen, while the bottom part presents charts with previously saved accelerometer and gyroscope
values. The application allows the researcher to annotate a portion of the data stream with the
start of an exercise (i.e. marker with yellow background) or with an end of a repetition (i.e.
marker with black background). It also gives the option of choosing which data streams (x-, y-,
or z-axis of the ankle/wrist ACC or GYRO) to view at any given time. The screenshot depicts
the start of the arm spreading exercise and its eight repetitions.
Figure 3.2: Two devices were attached to each volunteer's non-dominant arm and opposite ankle. This approach removes scenarios with an increased chance of recognition failure, such as writing on a piece of paper, where the subject would intensively use the dominant wrist.
the same path and activities three times; the second and third iterations contained five and nine repetitions per physical exercise, respectively.
Figure 3.3: Walking on a corridor of about 200 meters in an almost straight line; at the end of
the corridor, the subject had to open a door. Other segments of the path included walking inside
a room around a desk, and on other corridors.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.4: Running on a distance of about 50 meters in a narrow corridor; at the end of the
exercise, the subject opened the door to exit the corridor. This exercise required the performer
to run at a slow to medium speed and followed immediately after walking.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.5: Backward running on the same corridor of about 50 meters but in the opposite
direction; at the end of the exercise, the subject had to open a door. Prior to this physical
exercise, the subject took a short break, doing other activities such as tying a shoelace.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.6: Stair climbing involved climbing up 22 steps. The stairway had three small flat surfaces that separated the steps, which required the user to walk short distances. When the subject reached the top of the stairwell, they rested on a chair for a couple of seconds.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.7: Jumping rope is an exercise that made the subject skip a virtual rope passing under their feet and over their head. Some subjects jumped with both feet at a time, while others alternated, jumping with one foot at a time.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.8: Toe touching required the subject to spread their feet apart, lean forward toward one leg, and touch the foot with the opposite hand. Before performing this physical exercise, the user walked around and opened a trash bin.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.9: Walking downstairs involved climbing down 22 steps. The stairway had three small flat surfaces that separated the steps, which required the user to walk short distances. When the subject reached the bottom of the stairwell, they walked to the nearest available room.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.10: Limbs spreading is a common warm-up physical exercise where the subject has to jump with their feet and arms alternately spread apart. Some subjects carried out the activity by spreading both limbs at the same time.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.11: Arm signaling is a simple stretching exercise that involves a horizontal to vertical
movement of the forearm. Prior to this activity, the subject had to walk around some chairs
inside the room for a couple of seconds.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.12: Broomstick twisting involves rotating the torso at the waist to one side and then to the other. This is achieved with a stick held across the back of the shoulders. Before performing this exercise, the subject rested on a chair and moved a trash bin.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.13: Right crossed running on a distance of about 15 meters inside a room. Subjects had to alternately place the left foot in front of and behind the right foot while running in a right lateral direction. Some subjects placed the left foot only in front of the right foot.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.14: Left crossed running on a distance of about 15 meters inside a room. Subjects had to alternately place the right foot in front of and behind the left foot while running in a left lateral direction. Some subjects placed the right foot only in front of the left foot.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.15: Right lateral running on a distance of about 15 meters inside a room. Subjects had to jump with their feet and arms spread apart in a right lateral direction. Some volunteers held their arms above their heads when jumping, while others kept them at rest.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.16: Left lateral running on a distance of about 15 meters inside a room. Subjects had to jump with their feet and arms spread apart in a left lateral direction. Some volunteers held their arms above their heads when jumping, while others kept them at rest.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.17: High-knees running in place requires the subject to run in place while raising their knees as high as possible. Before performing this exercise, the subject rested on a chair for a couple of seconds and answered the mobile phone.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.18: High-knees running on a distance of about 15 meters inside a room. This activity requires the subject to run while raising their knees as high as possible. Prior to this, the subject had to run in place while raising their knees as high as possible.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.19: Chest rotation running on a short distance. This exercise involves holding both
arms horizontally and rotating the chest from one side to the other while running at a slow
speed. Before this, the subject had to move a chair from one part of the room to the other.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.20: Arm spreading is a typical warm-up physical exercise that involves movement of the upper body only. Subjects had to fully spread their arms and then cross them twice in front of their chest. Some volunteers crossed their arms just once.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.21: Arm alternating is another typical warm-up exercise that requires performers to alternately raise one arm and lower the other at the same time, reaching full vertical extension. Some volunteers made a second, smaller extension in the same repetition.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Figure 3.22: Ground touching involves spreading the legs to a comfortable position and touching
the ground with both hands once or twice in a repetition. Before performing this physical
exercise, subjects had to walk back to the starting point of the path.
[Plots omitted: ankle accelerometer (g) and ankle gyroscope (°/s), x/y/z streams vs. samples.]
Chapter 4
Pattern recognition
4.1 Filters
4.1.1 Low-pass filter
To detect the posture (current orientation) of a device, the portion of the acceleration data that
is caused by gravity has to be filtered out from the portion that is caused by motion of the
device. To accomplish this, a low-pass filter that reduces the influence of sudden changes on the accelerometer data has to be applied (see Figure 4.1). The resulting filtered values would
then reflect the more constant effects of gravity [4].
Equation 4.1 shows a simplified and fast version of a low-pass filter. This approach uses a
low filtering factor to generate a value that uses 10% of the unfiltered acceleration data and
90% of the previously filtered value. The previous values are stored in the accelX , accelY , and
accelZ variables. Because acceleration data comes in regularly, these values settle out quickly
and respond slowly to sudden but short-lived changes in motion.
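The simplified filter described above can be sketched as follows (a minimal Python illustration of the 10%/90% exponential smoothing; the thesis implements the core in MATLAB, so the names here are illustrative):

```python
def low_pass(samples, factor=0.1):
    """Simple low-pass filter: each output uses `factor` of the new
    sample and (1 - factor) of the previously filtered value, so the
    output settles toward the constant (gravity) component and responds
    slowly to short-lived changes in motion."""
    filtered = []
    prev = samples[0]  # seed with the first reading
    for x in samples:
        prev = x * factor + prev * (1.0 - factor)
        filtered.append(prev)
    return filtered

# A step input settles slowly toward the new level.
out = low_pass([0.0, 0.0, 1.0, 1.0, 1.0])
```

In practice the same update is applied independently to the x, y, and z channels, with the previous values kept in the accelX, accelY, and accelZ variables mentioned above.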
Figure 4.1: The low-pass filter, when applied on accelerometer values with a filtering factor of 0.1, smooths the data streams. This shows a comparison of the unfiltered and low-pass filtered wrist and ankle accelerometer signals recorded during the running exercise.
[Plots omitted: wrist and ankle accelerometer streams, unfiltered and low-pass filtered, x/y/z in g vs. samples.]
Figure 4.2: The high-pass filter when applied on accelerometer values emphasizes sudden changes in the data streams. This shows a
comparison of the unfiltered and high-pass filtered wrist and ankle accelerometer signals recorded during the running exercise.
[Plots omitted: wrist and ankle accelerometer streams, unfiltered and high-pass filtered, x/y/z in g vs. samples.]
4.2 Transforms
4.2.1 Discrete Fourier transform
The discrete Fourier transform provides the means of transforming discrete data streams de-
fined in the time domain into functions defined in the frequency domain [31]. The fast Fourier
transform is a class of special algorithms which implement the discrete Fourier transform with
considerable savings in computational time.
Equation 4.3 describes how the complex numbers Xk that represent the Fourier coefficients of
the different sinusoidal components of the input function xn are calculated. As seen in figure 4.3,
noise is characterized by frequencies that have low amplitudes, hence these have to be discarded
in order to obtain noise-free data streams.
X_k = \sum_{n=0}^{N-1} x_n \, e^{-\frac{2\pi i}{N} k n}, \qquad k = 0, \ldots, N-1 \quad (4.3)
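The denoising procedure built on this transform (forward FFT, discarding low-amplitude frequencies, inverse transform) can be sketched with NumPy as follows (a minimal illustration; the amplitude threshold is an assumption, since the text does not state its value):

```python
import numpy as np

def fft_denoise(x, threshold=0.1):
    """Zero out Fourier coefficients whose normalized amplitude falls
    below `threshold`, then transform back to the time domain."""
    X = np.fft.fft(x)
    X[np.abs(X) / len(x) < threshold] = 0.0   # discard low-amplitude bins
    return np.fft.ifft(X).real

# A clean sinusoid plus a small high-frequency component: only the
# dominant frequency survives the thresholding.
n = np.arange(256)
signal = np.sin(2 * np.pi * 8 * n / 256)
noisy = signal + 0.01 * np.cos(2 * np.pi * 60 * n / 256)
clean = fft_denoise(noisy)
```

The fast Fourier transform makes this step cheap enough to run on every sliding window.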
Figure 4.3: The frequency domain of the input functions is calculated with the help of a fast Fourier transform. This shows the
amplitudes of the frequencies that exist in the wrist and ankle accelerometer signals recorded during the running exercise.
[Plots omitted: wrist and ankle accelerometer streams (x/y/z in g vs. samples) and their frequency-domain amplitudes vs. frequencies.]
Figure 4.4: Once the data streams are converted to the frequency domain, frequencies with low amplitudes are removed. The inverse
discrete Fourier transform is used to recover the noise-free wrist and ankle accelerometer signals recorded during the running exercise.
[Plots omitted: wrist and ankle accelerometer streams, original and after the discrete Fourier transform and its inverse, x/y/z in g vs. samples.]
The random variables Xk are said to form a first-order Markov chain, provided that equation 4.6 holds. This equation is also known as the Markov assumption. Formula 4.8 describes how the Markov chain can be used to model time-invariant events if the time index i is discarded.
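A homogeneous first-order Markov chain, in which the next state depends only on the current one through a fixed transition matrix, can be illustrated as follows (the two-state chain is a made-up example, not from the text):

```python
import random

def simulate_chain(P, start, steps, seed=0):
    """Simulate a homogeneous first-order Markov chain: the next state
    depends only on the current state, via the fixed row-stochastic
    transition matrix P (the Markov assumption)."""
    rng = random.Random(seed)
    state, path = start, [start]
    for _ in range(steps):
        r, acc = rng.random(), 0.0
        for nxt, p in enumerate(P[state]):
            acc += p
            if r < acc:
                state = nxt
                break
        path.append(state)
    return path

# Two states with a 0.9 probability of staying put.
P = [[0.9, 0.1],
     [0.1, 0.9]]
path = simulate_chain(P, start=0, steps=100)
```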
• B = {bj (k)}, the output probability matrix, where bj (k) is the probability of emitting
symbol ok when state j is entered, calculated as in equation 4.10.
\pi_i = P(s_1 = i) \quad (4.11)
In addition, the variables that define a hidden Markov model need to satisfy the constraints
exemplified in equation 4.12.
\sum_{j=1}^{N} a_{ij} = 1, \qquad \sum_{k=1}^{M} b_j(k) = 1, \qquad \sum_{i=1}^{N} \pi_i = 1 \quad (4.12)
For convenience, the notation Φ = (A, B, π) will be used throughout this paper.
The mixture gains have to satisfy the stochastic constraint from equation 4.14 so that the
probability density function is properly normalized, i.e. equation 4.15 holds.
\sum_{k=1}^{M} c_{jk} = 1, \qquad c_{jk} \ge 0 \quad (4.14)
\int_{-\infty}^{+\infty} b_j(x) \, dx = 1 \quad (4.15)
\alpha_{t+1}(j) = \left[ \sum_{i=1}^{N} \alpha_t(i) \, a_{ij} \right] b_j(x_{t+1}) \quad (4.18)
P(X \mid \Phi) = \sum_{i=1}^{N} \alpha_T(i) \quad (4.19)
Strictly speaking, only the forward part of the forward-backward procedure is needed to solve the evaluation problem. However, the backward part of the procedure is introduced because it is used to help solve the learning problem.
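The forward recursion of equations 4.18 and 4.19 can be sketched for a discrete hidden Markov model as follows (the toy parameters are illustrative, not taken from the trained models in this thesis):

```python
import numpy as np

def forward(A, B, pi, obs):
    """Evaluation problem: compute P(X | Phi) with the forward variable.

    A: (N, N) transition matrix, B: (N, M) emission matrix,
    pi: (N,) initial distribution, obs: sequence of symbol indices.
    """
    alpha = pi * B[:, obs[0]]                      # initialization
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]              # induction, eq. 4.18
    return alpha.sum()                             # termination, eq. 4.19

# Toy two-state model over a binary alphabet.
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
p = forward(A, B, pi, [0, 1, 0])
```

The recursion costs O(N²T) instead of the O(NᵀT) of enumerating every state sequence, which is what makes per-window evaluation feasible.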
In a similar manner, a backward variable βt (i) is considered in formula 4.20 as the probability
of the partial observation sequence xt+1 xt+2 · · · xT , given state i at time t and model Φ.
\beta_T(i) = 1 \quad (4.21)
\beta_t(i) = \sum_{j=1}^{N} a_{ij} \, b_j(x_{t+1}) \, \beta_{t+1}(j) \quad (4.22)
To actually retrieve the state sequence, the argument which maximized the previous quantity
needs to be tracked for each t and i; this is done via the variable Ψt (i). These computations can
be performed inductively: the initialization is described in equation 4.24, the induction steps in
4.25, and the final solution in 4.26; the sequence of states has to be backtracked as in formula 4.27.
\delta_1(i) = \pi_i \, b_i(x_1), \qquad \Psi_1(i) = 0 \quad (4.24)
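The decoding recursion and its backtracking (equations 4.24 to 4.27) can be sketched as follows (probabilities are kept in the linear domain for brevity; practical implementations usually work with logarithms to avoid underflow, and the toy parameters are illustrative):

```python
import numpy as np

def viterbi(A, B, pi, obs):
    """Decoding problem: most likely state sequence via delta and Psi."""
    N, T = len(pi), len(obs)
    delta = pi * B[:, obs[0]]                      # initialization, eq. 4.24
    psi = np.zeros((T, N), dtype=int)
    for t in range(1, T):                          # induction, eq. 4.25
        scores = delta[:, None] * A                # scores[i, j] = delta(i) * a_ij
        psi[t] = scores.argmax(axis=0)             # best predecessor of each j
        delta = scores.max(axis=0) * B[:, obs[t]]
    states = [int(delta.argmax())]                 # termination, eq. 4.26
    for t in range(T - 1, 0, -1):                  # backtracking, eq. 4.27
        states.append(int(psi[t][states[-1]]))
    return states[::-1]

A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
pi = np.array([0.5, 0.5])
path = viterbi(A, B, pi, [0, 0, 1, 1])
```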
γt (i) is defined as the probability of being in state i at time t, given the observation sequence
and the model; hence, γt (i) relates to ξt (i, j) by summing over j, yielding formula 4.30.
\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j) \quad (4.30)
The summation of γt (i) over time index t (please see equation 4.31) represents the expected
number of times that state i is visited, or equivalently, the expected number of transitions made
from state i. Similarly, summation of ξt (i, j) over t (see equation 4.32) can be interpreted as the
expected number of transitions made from state i to state j.
\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from state } i \quad (4.31)
\sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from state } i \text{ to state } j \quad (4.32)
Therefore, formulas 4.33, 4.34, and 4.35 can be used for re-estimation of the parameters of a
discrete hidden Markov model.
\hat{a}_{ij} = \frac{\sum_{t=1}^{T-1} \xi_t(i, j)}{\sum_{t=1}^{T-1} \gamma_t(i)} \quad (4.34)

\hat{b}_j(k) = \frac{\sum_{t=1}^{T} \gamma_t(j) \text{ s.t. } x_t = o_k}{\sum_{t=1}^{T} \gamma_t(j)} \quad (4.35)
Deleted interpolation
A problem associated with training HMM parameters via re-estimation methods is that the
observation sequence used for training is, of necessity, finite. Thus, there is often an insufficient
number of occurrences of different model events to give good estimates of the model parameters.
A solution to this problem is to interpolate one set of parameter estimates with another set of
parameter estimates from a model for which an adequate amount of training data exists. The
estimates of the parameters of the first model λ = (A, B, π) can be mixed with the parameters of the second model λ′ = (A′, B′, π′) to reach the interpolated model λ̄ = (Ā, B̄, π̄). Equation 4.40 shows how the interpolation is obtained; ε represents the weighting of the parameters of the first model and (1 − ε) represents the weighting of the parameters of the second model. Hence, the re-estimated hidden Markov model will approach the global maximum more slowly.

λ̄ = ελ + (1 − ε)λ′ (4.40)
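The interpolation itself is a simple convex combination. A sketch in illustrative Python follows; the representation of a model as a dict of equally shaped nested lists is an assumption made here for brevity, not the thesis's data layout.

```python
def interpolate(lmbda, lmbda_prime, eps):
    """lambda_bar = eps * lambda + (1 - eps) * lambda_prime (equation 4.40).

    Both models are dicts (e.g. keys 'A', 'B', 'pi') of equally shaped
    nested lists; the mix is applied elementwise, recursively.
    """
    def mix(p, q):
        if isinstance(p, list):
            return [mix(a, b) for a, b in zip(p, q)]
        return eps * p + (1 - eps) * q
    return {k: mix(lmbda[k], lmbda_prime[k]) for k in lmbda}
```

Because each mixed row is a convex combination of two stochastic rows, the interpolated parameters remain valid probability distributions.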
Chapter 5

Software Architecture, Design, and Implementation
Figure 5.1: The architecture of the framework relies on the well-known pipes and filters archi-
tectural style. Yellow shapes represent sensors while blue circles represent filters. This depicts
the entire flow of the process, starting with the sensors and ending with the recognition results:
id/name of the exercise and the associated number of performed repetitions.
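The pipes-and-filters style amounts to pushing each sample through a chain of processing stages. A minimal illustrative Python sketch follows (the actual framework is MATLAB/C/C#, and the filter names in the usage note are hypothetical stand-ins):

```python
def pipeline(source, *filters):
    """Feed every sample from the source through the chain of filters, in order."""
    for sample in source:
        for f in filters:
            sample = f(sample)
        yield sample
```

A chain such as `pipeline(sensor_samples, low_pass, dft, classify)` would then mirror the figure's sensor-to-result flow; each stage stays independent and testable, which is the main appeal of the style.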
5.2 Design
5.2.1 Challenges
A simple calculation based on the hardware specifications [1] [3] of the two identical devices shows that each second 2 devices × (3 accelerations + 3 rotations) × 4 bytes × 100 Hz = 4800 bytes ≈ 4.69 KB are sent over for processing. Moreover, it can be seen that five days of continuous logging in binary mode accounts for as much as 2 GB. Thus, the task of classifying a medium-sized vocabulary of human activities involves designing a technique that not only has a good recognition rate, but is also optimized as much as possible [17].
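The calculation can be checked mechanically; a small Python sanity check (illustrative only):

```python
# Sanity-check the data-rate calculus from the hardware specifications:
# 2 devices, 6 channels each (3 accelerations + 3 rotations), 4-byte
# samples, at 100 Hz.
bytes_per_second = 2 * (3 + 3) * 4 * 100
assert bytes_per_second == 4800               # ~4.69 KB each second

# Five days of continuous binary logging.
five_days = bytes_per_second * 60 * 60 * 24 * 5
assert round(five_days / 2**30, 2) == 1.93    # ~2 GB of logs
```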
implemented the Microsoft Speech Recognition Engine on vocabularies with more than 60,000
words; IBM’s approach was similar [13].
Each of the 20 presented activities was associated with one hidden Markov model with eight states, where each state is characterized by a single multivariate Gaussian mixture [24]. In addition to these, an extra class was added to model anything else that is not considered an exercise. Given that the starting and finishing points of the physical exercises are not determined, the HMMs were chosen to be ergodic rather than left-to-right [20]. Experimental results showed that approximating the inverse of the covariance matrix by inverting only its diagonal values not only gives better recognition results, but is also much faster than computing the full inverse. Each of the 21 hidden Markov models was tested against all sliding windows, and the one with the highest logarithmic probability was considered the winner.
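The winner-takes-all decision and the diagonal-covariance shortcut can be sketched as follows. This Python fragment is illustrative only: the models here are stand-ins for the 21 trained HMMs, not the thesis's MATLAB code.

```python
import math

def classify(window, models):
    """Return the id of the model with the highest log-probability for the window.

    models maps an exercise id to a callable returning the log-probability of
    the window under that model; in the framework these would be the 21 HMMs.
    """
    return max(models, key=lambda name: models[name](window))

def diag_gaussian_logpdf(x, mean, var):
    """Log-density of a multivariate Gaussian with diagonal covariance.

    Inverting a diagonal covariance reduces to elementwise division by the
    variances, which is why this is so much cheaper than a full inverse.
    """
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))
```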
5.3 Implementation
The statistical part of the framework was written in MATLAB, and its compiler was used to generate C function libraries. On top of these, another stack of C wrapper functions was added for interfacing with the MATLAB-specific C functions. The visual interface of the tool is written in C# and invokes the exported wrapper functions from the unmanaged dynamic-link libraries. With the help of this tool, users can record, annotate, train, test, and validate sessions in subject-dependent, subject-independent, and mixed k-fold cross-validation scenarios.
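The k-fold scenario amounts to partitioning the session indices into disjoint folds and rotating the held-out fold. An illustrative Python sketch (the tool's actual session handling is in C#):

```python
def k_fold_splits(sessions, k):
    """Partition session indices into k folds; yield (train, test) per fold."""
    folds = [sessions[i::k] for i in range(k)]   # round-robin assignment
    for i, test in enumerate(folds):
        train = [s for j, f in enumerate(folds) if j != i for s in f]
        yield train, test
```

Subject-dependent and subject-independent validation are just different groupings of the same idea: folds drawn from one subject's sessions versus folds that hold out all sessions of a subject.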
Chapter 6
Results
Figure 6.1: The subject-dependent validation of the activities was computed against five of the subjects who completed the path three times. For each subject, two of their sessions were used in training and the remaining one in testing. In this scenario, the DHMMs were set to work with eight states while the k-means clustering algorithm transformed the continuous domain into 32 discrete centroids. The CDHMMs were also configured to work with eight states, but each state was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a maximum of 25 iterations to reach a convergence threshold of less than 10^−5. The chart shows that discrete hidden Markov models reach a recognition rate as high as 86.70% for the third subject, while continuous density hidden Markov models score as high as 97.08% for the fifth subject.
[Bar chart: per-subject recognition accuracy (subjects 1–5); legend values: DHMM 0.8259, CDHMM 0.9379.]
Figure 6.2: The subject-independent validation of the activities was computed against five of the subjects who completed the path three times. For each subject, all other 12 sessions were used in training and all three of that subject's sessions in testing. In this scenario, the DHMMs were set to work with eight states while the k-means clustering algorithm transformed the continuous domain into 32 discrete centroids. The CDHMMs were also configured to work with eight states, but each state was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a maximum of 25 iterations to reach a convergence threshold of less than 10^−5. The chart shows that discrete hidden Markov models reach a recognition rate as high as 77.01% for the second subject, while continuous density hidden Markov models score as high as 96.15% for the first subject.
[Bar chart: per-subject recognition accuracy (subjects 1–5); legend values: DHMM 0.7196, CDHMM 0.9151.]
Figure 6.3: The 5-fold cross-validation of the activities was computed against the entire set of 22 sessions. In this scenario, the DHMMs were set to work with eight states while the k-means clustering algorithm transformed the continuous domain into 32 discrete centroids. The CDHMMs were also configured to work with eight states, but each state was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a maximum of 20 iterations to reach a convergence threshold of less than 10^−5. The chart shows that discrete hidden Markov models reach a recognition rate as high as 82.68% for the fifth fold, while continuous density hidden Markov models score as high as 95.67% for the fourth fold.
[Bar chart: per-fold recognition accuracy (folds 1–5); legend values: DHMM 0.7811, CDHMM 0.8537.]
Figure 6.4: The running times of the activities were computed on a Lenovo ThinkPad X61 tablet with an Intel Core 2 Duo L7500 CPU running at 1.6 GHz. The DHMMs were set to work with eight states while the k-means clustering algorithm transformed the continuous domain into 32 discrete centroids. The CDHMMs were also configured to work with eight states, but each state was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a maximum of 25 iterations to reach a convergence threshold of less than 10^−5. This shows that discrete hidden Markov models need 0.64 s to adjust their parameters on one real second of data, while continuous density hidden Markov models need only 0.31 s to achieve the same. Recognizing one real second of data requires 0.53 s for DHMMs and 0.15 s for CDHMMs.
[Bar chart: training and testing times for DHMM and CDHMM.]
Table 6.1: The confusion matrix of one subject's exercises is presented. The total accuracy for this particular volunteer is 96.68%. However, the table shows that walking was sometimes misclassified as other activities, running was misclassified once as stair climbing, and walking downstairs has a high chance of being detected as plain walking. All other CDHMMs scored a perfect 100%.
(Columns, left to right: the 21 activity classes, in the same order as the rows; the rotated header labels include other activities, walking, running, toe touching, jumping rope, arm signaling, stair climbing, arm spreading, limbs spreading, arm alternating, and ground touching.)
other activities 254 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 100%
walking 8 128 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 94%
running 0 0 4 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 80%
left crossed running 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 0 0 100%
right lateral running 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 0 0 0 100%
left lateral running 0 0 0 0 0 0 0 0 0 0 0 0 0 0 4 0 0 0 0 0 0 100%
high knees running in place 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 0 0 100%
high knees running 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 0 0 0 100%
chest rotation running 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 8 0 0 0 100%
arm spreading 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 6 0 0 100%
arm alternating 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 7 0 100%
ground touching 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 5 100%
Table 6.2: Comparison of the proposed DHMM- and CDHMM-based approaches to other methods found in recent literature. Some authors reach high recognition accuracies because of a small vocabulary or because their set of monitored gestures contains very different activities. Tapia et al. [30] use the configuration and testing scenario most similar to ours and reach a recognition rate of 94.60% with a naive Bayes classifier; this is comparable to the proposed LPF + DFT + IDFT + CDHMM approach.
Figure 6.5: The subject-dependent validation of the repetition counts was computed against five of the subjects who completed the path three times. For each subject, two of their sessions were used in training and the remaining one in testing. These values show little difference between the two heuristics, as both score as high as 97.31% for the second subject.
[Bar chart: per-subject accuracy of the repetition counts (subjects 1–5); legend values: TBH 0.9546, PBH 0.9520.]
Given that these heuristics compute coefficients that relate directly to the number of repetitions, the time-based heuristic requires only 0.17 ms to come up with a result for a data stream of one real second, as indicated in figure 6.8.
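The chapter does not restate the heuristics' formulas, but the idea behind a peak-based counter can be sketched as follows; the threshold and the input signal are illustrative assumptions made here, not the tuned parameters of the PBH.

```python
def count_peaks(signal, threshold):
    """Count local maxima above a threshold as repetitions (illustrative only).

    A sample counts as a peak when it exceeds the threshold, is at least as
    large as its left neighbor, and strictly larger than its right neighbor.
    """
    return sum(1 for i in range(1, len(signal) - 1)
               if signal[i] > threshold
               and signal[i] >= signal[i - 1]
               and signal[i] > signal[i + 1])
```

Counting peaks is a single linear pass over the window, which is consistent with the sub-millisecond running times reported for both heuristics.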
Figure 6.6: The subject-independent validation of the repetition counts was computed against five of the subjects who completed the path three times. For each subject, all other 12 sessions were used in training and all three of that subject's sessions in testing. These values show little difference between the two heuristics, but the time-based heuristic scores as high as 97.28% for the second subject, while the peak-based heuristic scores as high as 97.60%, also for the second subject.
[Bar chart: per-subject accuracy of the repetition counts (subjects 1–5); legend values: TBH 0.9107, PBH 0.9086.]
Figure 6.7: The 10-fold cross-validation of the repetition counts was computed against the entire set of 22 sessions. This shows that the time-based heuristic scores as high as 95.97% for the seventh fold, while the peak-based heuristic scores as high as 96.50% for the second fold.
[Bar chart: per-fold accuracy of the repetition counts (folds 1–10); legend values: TBH 0.9254, PBH 0.9227.]
Figure 6.8: The running times of the repetition counts were computed on a Lenovo ThinkPad X61 tablet with an Intel Core 2 Duo L7500 CPU running at 1.6 GHz. This shows that the time-based heuristic needs 0.19 ms to adjust its parameters on one real second of data, while the peak-based heuristic needs 0.22 ms to achieve the same. Recognizing one real second of data requires 0.17 ms for TBH and 0.20 ms for PBH.
[Bar chart: training and testing times for TBH and PBH.]
Chapter 7
Conclusion
Bibliography
[1] ADXL330 small, low power, 3-axis ±3 g iMEMS accelerometer. Technical report, Analog Devices, Norwood, Massachusetts 02062, 09 2006.
[2] Integration patterns: Pipes and filters. Technical report, Microsoft Corp., Redmond, Wash-
ington 98052, 06 2006.
[3] IDG-650 dual-axis gyro product specification. Technical report, InvenSense Inc., Sunnyvale, California 94089, 05 2010.
[4] iPhone application programming guide. Technical report, Apple Inc., Cupertino, California 95014, 03 2010.
[5] Ling Bao. Physical activity recognition from acceleration data under semi-naturalistic con-
ditions, 08 2003.
[6] Thomas G. Dietterich. Machine learning for sequential data: A review. Technical report,
Oregon State University, Corvallis, Oregon 97331, 2002.
[7] Adrian Flanagan, Jani Mantyjarvi, Kalle Korpiaho, and Johanna Tikanmaki. Recognizing
movements of a handheld device using symbolic representation and coding of sensor signals.
12 2002.
[8] David Garlan and Mary Shaw. An introduction to software architecture. Advances in
Software Engineering and Knowledge Engineering, 1, 01 1993.
[9] Zhenyu He, Lianwen Jin, Lixin Zhen, and Jiancheng Huang. Gesture recognition based on 3D accelerometer for cell phone interaction. Institute of Electrical and Electronics Engineers, 2008.
[10] Mi hee Lee, Jungchae Kim, Kwangsoo Kim, Inho Lee, Sun Ha Jee, and Sun Kook Yoo.
Physical activity recognition using a single tri-axis accelerometer. volume 1, 10 2009.
[11] Hewett, Baecker, Card, Carey, Gasen, Mantei, Perlman, Strong, and Verplank. Human-
computer interaction. Association for Computing Machinery, 07 2009.
[12] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. Spoken Language Processing: A Guide
to Theory, Algorithm and System Development. Prentice Hall, Upper Saddle River, New
Jersey 07458, 2001.
[13] Frederick Jelinek. Statistical Methods for Speech Recognition (Language, Speech, and Com-
munication). The MIT Press, Cambridge, Massachusetts 02142, 1998.
[14] Alan L. Liu, Jun Yang, and Peter Pal Boda. Poster abstract: Gesture recognition via
continuous maximum entropy training on accelerometer data. Association for Computing
Machinery, 04 2009.
[15] Jiayang Liu, Lin Zhong, Jehan Wickramasuriya, and Venu Vasudevan. uWave: accelerometer-based personalized gesture recognition and its applications. Pervasive and Mobile Computing, 5, 2009.
[16] Marcel Michelmann. Context-aware applications, computing in context. Technical report, Ludwig-Maximilians-Universität München, Munich, Germany 80539, 04 2007.
[17] Gerrit Niezen. The optimization of gesture recognition techniques for resource-constrained
devices, 04 2008.
[18] Tamara M.E. Nijsen, Pierre J.M. Cluitmans, Paul A.M. Griep, and Ronald M. Aarts. Short-time Fourier and wavelet transform for accelerometric detection of myoclonic seizures. 12 2006.
[19] Zoltán Prekopcsák. Accelerometer-based real-time gesture recognition. 05 2008.
[20] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in speech recognition. Institute of Electrical and Electronics Engineers, 77(2), 02 1989.
[21] Lawrence R. Rabiner. Biography. http://www.caip.rutgers.edu/~lrr/lrr%20info/lrr_
biography_rutgers_2002.pdf, 2002.
[22] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity
recognition from accelerometer data. American Association for Artificial Intelligence, 2005.
[23] Matthias Rehm, Nikolaus Bee, and Elisabeth André. Wave like an Egyptian: accelerometer-based gesture recognition for culture-specific interactions. 2008.
[33] Daniel Wilson and Andy Wilson. Gesture recognition using the XWand. Technical report, Carnegie Mellon University, Robotics Institute, Assistive Intelligent Environments Group and Microsoft Research, Redmond, Washington 98052, 2002.
[34] Zhang Xu, Chen Xiang, Wang Wen-hui, Yang Ji-hai, Vuokko Lantz, and Wang Kong-qiao. Hand gesture recognition and virtual game control based on 3D accelerometer and EMG sensors. Association for Computing Machinery, 02 2009.
[35] Jhun-Ying Yang, Jeen-Shing Wang, and Yen-Ping Chen. Using acceleration measurements
for activity recognition: An effective learning algorithm for constructing neural classifiers.
Pattern Recognition Letters, 29, 2008.