
Universitatea Babeş-Bolyai,

Cluj-Napoca
Facultatea de Matematică şi Informatică
Specializarea Sisteme Inteligente

Lucrare de Disertaţie
Recunoaşterea Exerciţiilor Fizice

Conducător ştiinţific: Conf. Dr. Simona Motogna
Absolvent: Paul V. Borza

iunie 2010
Babeş-Bolyai University of
Cluj-Napoca
Faculty of Mathematics and Computer Science
Major in Intelligent Systems

Master’s Thesis
Recognizing Physical Exercises

Supervisor: Assoc. Prof. Ph.D. Simona Motogna
Author: Paul V. Borza

June 2010
Contents

Acronyms iii

1 Introduction 1
1.1 Research problem and objectives . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Contribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.3 Collaboration with University of Siena . . . . . . . . . . . . . . . . . . . . . . . . 3

2 Human–computer interaction 4
2.1 Definition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
2.2 Historical roots . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 Present trends and likely future developments . . . . . . . . . . . . . . . . . . . . 5
2.4 Personalities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.4.1 Prof. Ph.D. Lawrence R. Rabiner . . . . . . . . . . . . . . . . . . . . . . . 5
2.4.2 Assoc. Prof. Ph.D. Thad Starner . . . . . . . . . . . . . . . . . . . . . . . 6

3 Gestures and activities 7


3.1 Overview of current literature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
3.2 Dataset construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
3.3 Specification of activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
3.4 Possible applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

4 Pattern recognition 32
4.1 Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.1 Low-pass filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.1.2 High-pass filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32
4.2 Transforms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.1 Discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.2.2 Inverse discrete Fourier transform . . . . . . . . . . . . . . . . . . . . . . 35
4.3 Hidden Markov models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.1 Definition of Markov chains . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.3.2 Definition of hidden Markov models . . . . . . . . . . . . . . . . . . . . . 38
4.3.3 Definition of continuous density hidden Markov models . . . . . . . . . . 39
4.3.4 The evaluation problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.5 The decoding problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3.6 The learning problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.3.7 Issues and limitations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42


5 Software architecture, design, and implementation 44


5.1 Pipes and filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Design . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.1 Challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.2 Data preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.3 Recognition of activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.2.4 Computation of repetition counts . . . . . . . . . . . . . . . . . . . . . . . 47
5.3 Implementation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

6 Results 48
6.1 Recognition of activities . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
6.1.1 Comparison to other approaches . . . . . . . . . . . . . . . . . . . . . . . 54
6.2 Computation of repetition counts . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

7 Conclusion 59
7.1 Summary of the work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
7.2 Critical evaluation of own work . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
7.3 Future work . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Bibliography 61

Index 64

Acronyms

ACC accelerometer. 2, 7–10, 33, 34, 36, 37, 43, 46, 54, 59, 60

CDHMM continuous density hidden Markov model. 2, 42, 46, 48–54, 59

DCT discrete cosine transform. 7, 8, 54


DFT discrete Fourier transform. 2, 35, 46, 54, 59
DHMM discrete hidden Markov model. 2, 38, 42, 46, 48–52, 54, 59
DTW dynamic time warping. 8, 46, 54

EMG electromyogram. 8, 60

FFT fast Fourier transform. 7, 8, 35, 36, 46

GYRO gyroscope. 2, 7, 9, 10, 43, 46, 54, 59, 60

HCI human–computer interaction. 4–7, 59


HMM hidden Markov model. 2, 8, 38, 39, 41–44, 47, 54, 60
HPF high-pass filter. 32, 34

IDFT inverse discrete Fourier transform. 2, 35, 37, 44, 46, 54, 59

KM k-means. 2, 42, 46, 49–52, 54, 59


KNN k-nearest neighbor. 7, 46, 54

LPF low-pass filter. 2, 32, 33, 44, 46, 54, 59

MGM multivariate Gaussian mixture. 2, 39, 44, 47, 49–52, 59

NB naive Bayes. 7, 46, 54

PBH peak-based heuristic. 2, 47, 54, 56–59

SVM support vector machine. 8, 46, 54

TBH time-based heuristic. 47, 54–59

WLT wavelet transform. 7, 46, 54

Chapter 1

Introduction

The advent of low-cost motion sensors allows industrial designers and developers to create context-aware applications that enhance and enrich user interaction. With two such custom devices in place, one attached to the wrist of the non-dominant arm and the other attached to the opposite ankle, a three-phase approach that keeps track of physical exercises and repetition counts is proposed.

Chapter 2 provides a brief overview of the interdisciplinary area of human–computer interaction, which is further refined in chapter 3. Due to the lack of a publicly available standardized dataset, the author developed means to record, videotape, and annotate fitness sessions. This dataset of 20 activities was constructed at the University of Siena with the help of 10 volunteers.
Chapter 4 includes the mathematical foundation that lies behind the state-of-the-art signal processing filters, transformations, and machine learning techniques which are combined in a pipeline fashion, while chapter 5 details the middle- and low-level implementation decisions. The first phase applies a fast low-pass filter, to remove the small variations that all sensors are susceptible to, then converts the smoothed input to the frequency domain with a discrete Fourier transform, discards low-amplitude frequencies, and translates the result back to the time domain with an inverse discrete Fourier transform. Once the streams are noise-free, the second phase splits the data into half-overlapping sliding windows and classifies each window into one of the classes; two variants are presented: k-means with discrete hidden Markov models, and continuous density hidden Markov models of multivariate Gaussian mixtures. The final phase counts the repetitions, with either a time- or a peak-based heuristic, once a continuous block of windows containing activities is detected.
Chapter 6 validates the proposed approach, which achieves accuracies as high as 97.08% with subject-dependent training and 96.15% with subject-independent training in recognizing the set of activities, while an average confidence of 92.54% is obtained in a 10-fold cross-validation of the repetition counts. The last chapter, 7, outlines a couple of possible extensions and concludes this research.

1.1 Research problem and objectives


Michelmann [16] states that context is defined by four qualities: location, identity, activity, and time. Three of these (location, identity, and time) can be considered solved, while the problem of detecting human activities remains open. The ultimate objective of this paper is to provide other developers with a framework capable of defining and detecting medium-sized sets of activities. The immediate objective of this paper is to showcase a health-related application that keeps track of physical exercises and counts the number of repetitions.
Given two devices, each consisting of a three-axis accelerometer and a three-axis gyroscope, the first attached to the wrist of the non-dominant arm and the second attached to the opposite ankle, the problem translates into detecting and correctly classifying segments of continuous streams of data into classes. The data volume remains an issue, as sensor frequencies of about 70–100 Hz are required in such problems.
Human activities have three aspects of signal characteristics that make them particularly
difficult to recognize: noisy input, segmentation ambiguity, and spatio-temporal variability. The
noisy input refers to the inherent noisy nature of the wearable inertial sensors, hence noise
reduction needs to be done ahead of time. Segmentation ambiguity refers to not knowing the
boundaries, and therefore reference patterns have to be matched with all possible segments of
input signals. Spatio-temporal variability refers to the fact that each repetition of the same
gesture varies dynamically in shape and duration, even for the same performer.
The lack of a standardized dataset for detecting human activities is a crucial problem that leads most researchers to focus on other open problems. Thus, a dataset was recorded at the University of Siena with the help of 10 volunteers. All candidates performed 20 physical exercises, each with up to 12 repetitions, in about 15 minutes while the sessions were videotaped; along the way, each volunteer also performed other activities that could be mistaken for the monitored exercises, with the sole purpose of thoroughly measuring the reliability of the proposed approach.

1.2 Contribution
The problem of successfully recognizing classes from a medium-sized vocabulary of 20 human activities, given a continuous stream of collected samples and a constrained time interval, poses a trade-off between processing speed and recognition accuracy. The work presented in this paper employs state-of-the-art signal filters, transformations, and machine learning techniques. The first phase applies a fast low-pass filter, to remove the small variations that all sensors are susceptible to, then converts the smoothed input to the frequency domain with a discrete Fourier transform, discards low-amplitude frequencies, and translates the result back to the time domain with an inverse discrete Fourier transform. Once the streams are noise-free, the second phase splits the data into half-overlapping sliding windows of two and a half seconds and classifies each window into one of the classes; two variants are presented: k-means with discrete hidden Markov models, and continuous density hidden Markov models of multivariate Gaussian mixtures. The final phase counts the repetitions, with either a time- or a peak-based heuristic, once a continuous block of windows containing activities is detected.
The two variants of hidden Markov models are compared in terms of computational and
storage efficiency, but the most relevant characteristic remains the recognition accuracy which is
measured in three different scenarios: subject-dependent, subject-independent, and mixed k-fold
cross-validation.
The author participated in the 2008 Google Summer of Code and developed an application in
C, for Openmoko’s second-generation mobile phone, capable of recognizing 12 motion gestures in
real-time; his bachelor’s thesis was also on the same subject and used one Nintendo Wii Remote
as input for controlling various applications on a host computer. The decision to implement the core part of this framework in MATLAB, and the consumer endpoints in C and C#, is based on MATLAB's code generation capabilities for C, C++, FORTRAN, Java, and C#.


1.3 Collaboration with University of Siena


Prof. Antonio Rizzo, head of the University of Siena's Interaction Design Department, contacted
the author and invited him to present his work to fellow professors and students in April 2009.
Following the initial visit of three days, the author was invited for the second time in October
2009 for a two-week visit to hold a more focused hands-on workshop. On the third and last
one-week visit in February 2010, the dataset was constructed and the sessions were videotaped.
Therefore, the author would like to give special thanks to Prof. Antonio Rizzo and the entire
team from University of Siena for their help and permission to use the dataset in his research.

Chapter 2

Human–computer interaction

2.1 Definition
The Association for Computing Machinery, Special Interest Group on Computer–Human Interaction [11] defines human–computer interaction as “a discipline concerned with the design,
evaluation, and implementation of interactive computing systems for human use and with the
study of major phenomena surrounding them”.
HCI in the large is an interdisciplinary area and emerges as a specialty concern within several
disciplines, each with different emphases: computer science (application design and engineering of
human interfaces), psychology (the application of theories of cognitive processes and the empirical
analysis of user behavior), sociology and anthropology (interactions between technology, work,
and organization), and industrial design (interactive products).
Human–computer interaction is concerned with:

• The joint performance of tasks by humans and machines;


• The structure of communication between human and machine;
• Human capabilities to use the machine, including learnability of the interfaces;

• Algorithms and programming of the interface itself;


• Engineering concerns that arise in designing and building interfaces;
• The process of specification, design, and implementation of interfaces;
• Design trade-offs.

Because human–computer interaction studies a human and a machine in communication, it


draws from supporting knowledge on both the machine and the human side. On the machine
side, techniques in machine learning, operating systems, and programming languages are relevant.
On the human side, communication theory, graphic and industrial design disciplines, linguistics,
social sciences, cognitive psychology, and human performance are relevant. And, of course,
engineering and design methods are also relevant.


2.2 Historical roots


Human–computer interaction arose as a field from intertwined roots in computer graphics, operating systems, human factors, ergonomics, industrial engineering, cognitive psychology, and
the systems part of computer science. Computer graphics was born from the use of cathode ray
tubes and pen devices very early in the history of computers. This led to the development of
several human–computer interaction techniques. Many techniques date from Sutherland’s [28]
Sketchpad Ph.D. thesis in 1963 that essentially marked the beginning of computer graphics as a
discipline.

2.3 Present trends and likely future developments


Human–computer interaction is, in the first instance, affected by the forces shaping the nature
of future computing. These forces include:
• Decreasing hardware costs leading to larger memories and faster systems;
• Miniaturization of hardware leading to portability;
• Reduction in power requirements leading to portability;
• New display technologies leading to the packaging of computational devices in new forms;
• Assimilation of computation into new environments like microwave ovens, televisions etc.
• Specialized hardware leading to new functions like rapid text search, video decoding etc.
• Increased development of network communication and distributed computing;
• Increasingly widespread use of computers, especially by people who are outside of the
computing profession;
• Increasing innovation in input techniques like voice, gesture, and pen, combined with lowering cost, leading to rapid adoption by people who have never used computers before;
• Wider social concerns leading to improved access to computers by currently disadvantaged
groups like young children, physically disabled etc.

2.4 Personalities
Due to the multidisciplinary nature of human–computer interaction, people with different backgrounds contribute to its success. HCI is also referred to as man–machine interaction or computer–human interaction.

2.4.1 Prof. Ph.D. Lawrence R. Rabiner


From 1962 through 1964, Dr. Rabiner [21] participated in the cooperative program in Electrical Engineering at AT&T Bell Laboratories, Whippany and Murray Hill, New Jersey. During
this period Dr. Rabiner worked on designing digital circuitry, issues in military communications
problems, and problems in binaural hearing. Dr. Rabiner joined AT&T Bell Labs in 1967 as
a Member of the Technical Staff. He was promoted to Supervisor in 1972, Department Head
in 1985, Director in 1990, and Functional Vice President in 1995. He joined the newly created AT&T Labs in 1996 as Director of the Speech and Image Processing Services Research Lab, and
was promoted to Vice President of Research in 1998 where he managed a broad research program
in communications, computing, and information sciences technologies. Dr. Rabiner retired from
AT&T at the end of March 2002 and is now a Professor of Electrical and Computer Engineering at Rutgers University, and the Associate Director of the Center for Advanced Information
Processing at Rutgers.
Dr. Rabiner is co-author of the books “Theory and Application of Digital Signal Processing” (Prentice-Hall, 1975), “Digital Processing of Speech Signals” (Prentice-Hall, 1978), “Multirate Digital Signal Processing” (Prentice-Hall, 1983), and “Fundamentals of Speech Recognition” (Prentice-Hall, 1993).
Dr. Rabiner is a member of Eta Kappa Nu, Sigma Xi, Tau Beta Pi, the National Academy of Engineering, the National Academy of Sciences, and a Fellow of the Acoustical Society of America, the IEEE, Bell Laboratories, and AT&T. He is a former President of the IEEE Acoustics, Speech, and Signal Processing Society, a former Vice-President of the Acoustical Society of America, a former editor of the ASSP Transactions, and a former member of the IEEE Proceedings Editorial Board.

Education
• Ph.D. in Electrical Engineering, Massachusetts Institute of Technology, 1967
• MS in Electrical Engineering, Massachusetts Institute of Technology, 1964
• BS in Electrical Engineering, Massachusetts Institute of Technology, 1964

2.4.2 Assoc. Prof. Ph.D. Thad Starner


Dr. Starner [27] is an Associate Professor at Georgia Institute of Technology's School of Interactive Computing. He was perhaps the first to integrate a wearable computer into his everyday
life as an intelligent personal assistant. Starner’s work as a Ph.D. student would help found
the field of Wearable Computing. His group’s prototypes and patents on mobile MP3 players,
mobile instant messaging and e-mail, gesture-based interfaces, and mobile context-based search
foreshadowed now commonplace devices and services.
Dr. Starner has authored over 100 scientific publications with over 100 co-authors on mobile
human–computer interaction, pattern discovery, human power generation for mobile devices, and
gesture recognition, and he is a founder and current co-chair of the IEEE Technical Committee
on Wearable Information Systems. His work is discussed in public forums such as CNN, NPR,
the BBC, CBS’s 60 Minutes, The New York Times, Nikkei Science, The London Independent,
The Bangkok Post, and The Wall Street Journal.

Education
• Ph.D. in Media Arts and Sciences, Massachusetts Institute of Technology, 1999
• MS in Media Arts and Sciences, Massachusetts Institute of Technology, 1995

• BS in Computer Science, Massachusetts Institute of Technology, 1991


• BS in Brain and Cognitive Science, Massachusetts Institute of Technology, 1991

Chapter 3

Gestures and activities

3.1 Overview of current literature


The most popular technique for recognizing gestures is to use visual gesture data from a camera-based sensor [17]. This method is fairly precise, but also the most demanding in terms of infrastructure setup, hardware maintenance, and algorithmic complexity. This has
consequences on the applicability of such systems: motion capture and tracking platforms are
rarely used beyond computer-aided design animation or medical purposes. Image-based gesture
recognition techniques have resource demands that can make gesture recognition particularly
difficult on resource-constrained devices. Requirements such as previously positioned cameras
and good lighting conditions make this approach unsuitable for unsupervised environments.
The availability of three-axis accelerometers and gyroscopes allows for the design of an inexpensive gesture recognition system suited for unsupervised environments. Wearable inertial
sensors are a low-cost, low-power solution to track gestures and, more generally, activities of a
person.
Sensor-based gesture recognition techniques can be divided into two classes: discrete and continuous. Discrete gesture recognition is performed on one gesture at a time, while continuous
gesture recognition is performed on a sequence of gestures within a contiguous block of data.
Discrete gesture recognition uses an explicit command from the user to indicate the start and
stop of the gesture (e.g. with the help of a button), while continuous gesture recognition is
performed on a continuous flow of gestural input data.
Bao [5] develops algorithms to detect physical activities from data acquired using five small accelerometers worn simultaneously on different parts of the body. Acceleration data was collected
from 20 subjects in both laboratory and semi-naturalistic environments. For semi-naturalistic
data, subjects were asked to perform a sequence of everyday tasks outside of the laboratory.
Mean, energy, frequency-domain entropy, and correlation of the acceleration data were calculated over six-second sliding windows. Decision table, k-nearest neighbor, decision tree, and naive
Bayes classifiers were tested on these features. Classification results using individual training and
leave-one-subject-out validation were compared. Leave-one-subject-out validation with decision
tree classifiers showed the best performance recognizing everyday activities such as walking,
watching TV, and vacuuming with an overall accuracy rate of 89%.
He et al. [9] propose a gesture recognition system based on a single three-axis accelerometer
mounted in a cell phone for human–computer interaction. Three feature extraction methods,
namely discrete cosine transform, fast Fourier transform, and a hybrid approach which combines
wavelet transform with fast Fourier transform are proposed. Recognition of the gestures is performed with support vector machines on data collected from 67 subjects. The best average
recognition result for 17 complex gestures is 87.36%, achieved with wavelet-based method, while
DCT and FFT produce an accuracy of 85.16% and 86.92% respectively.
Liu et al. [15] present an efficient recognition algorithm for interaction based on gestures
also using a single three-axis accelerometer. The approach requires a single training sample
for each gesture pattern and allows users to employ personalized gestures. The algorithm, a
modified dynamic time warping, is evaluated against a large gesture library with over 4,000
samples for eight gesture patterns collected from eight users over one month. This achieves
98.6% accuracy, competitive with statistical methods that require significantly more training
samples. Applications of gesture-based user authentication and interaction with 3D mobile user
interfaces are also showcased.
Tapia et al. [30] present a real-time algorithm for automatic recognition of not only physical
activities, but also, in some cases, their intensities, using five triaxial wireless accelerometers
and a wireless heart rate monitor. The algorithm was evaluated with datasets consisting of
30 physical gymnasium activities collected from a total of 21 people at two different laboratories. On these activities, recognition accuracies of 94.6% using subject-dependent training and 56.3% using subject-independent training were obtained. The addition of heart rate data improves subject-dependent recognition accuracy by 1.2% and subject-independent recognition by 2.1%. When recognizing activity types without differentiating intensity levels, a subject-independent performance of 80.6% is obtained.
Wilson et al. [32] create a novel wireless sensor package that enables styles of natural interaction with intelligent environments. For example, a user is able to point the wand at a device and control it using simple gestures. The proposed system leverages the intelligence of the environment to best determine the user's intention. Details on the hardware device, signal processing
algorithms to recover position and orientation, gesture recognition techniques, and multimodal
(wand and speech) computational architecture are also presented. Furthermore, a preliminary
user study on pointing performance under conditions of tracking availability and audio feedback
is examined.
Xu et al. [34] describe a novel hand gesture recognition system that utilizes both multichannel
surface electromyogram sensors and three-axis accelerometers. Signal segments of meaningful
gestures are determined from the continuous EMG signal inputs. Multistream hidden Markov
models consisting of EMG and acceleration streams are used to classify 18 gestures, each trained
with 10 repetitions; the average recognition accuracy obtained was 91.7%. This paper also
presents a virtual Rubik’s cube game that is controlled with hand gestures.

3.2 Dataset construction


To overcome the lack of a standardized dataset, an in-house application (please see figure 3.1)
for annotating and synchronizing video with the two streams of data was developed, such that
supervised machine learning algorithms could easily be trained, tested, and validated.
The experiment was conducted with the help of 10 volunteers at the University of Siena,
in Italy. Each candidate performed 20 physical exercises in about 15 minutes while the session
was videotaped; during each course, the volunteer also performed other activities like moving a
chair or trash bin around the room, answering the mobile phone, lowering the window blinds
etc. Before the start of each session, the instructor attached the two devices to the volunteer’s
non-dominant wrist and opposite ankle [29], as seen in figure 3.2, and informed them of the activities they had to perform along the path. All candidates performed three repetitions of
each exercise on their first iteration of the path. In addition, four of the volunteers completed the same path and activities three times; the second and third iterations contained five and nine repetitions per physical exercise, respectively.

Figure 3.1: The annotating application is capable of working with an entire session composed of several recorded data files and an associated movie file. The video occupies the top part of the screen, while the bottom part presents charts with previously saved accelerometer and gyroscope values. The application allows the researcher to annotate a portion of the data stream with the start of an exercise (i.e. a marker with a yellow background) or with the end of a repetition (i.e. a marker with a black background). It also gives the option of choosing which data streams (x-, y-, or z-axis of the ankle/wrist ACC or GYRO) to view at any given time. The screenshot depicts the start of the arm spreading exercise and its eight repetitions.

Figure 3.2: Two devices were attached to each volunteer's non-dominant arm and opposite ankle. This approach removes scenarios with an increased chance of recognition failure, such as writing on a piece of paper, where the subject would make intensive use of the dominant wrist.

3.3 Specification of activities


The medium-sized vocabulary of 20 activities comprises walking (please see figure 3.3),
running (see figure 3.4), backward running (see figure 3.5), stair climbing (see figure 3.6), jumping
rope (see figure 3.7), toe touching (see figure 3.8), walking downstairs (see figure 3.9), limbs
spreading (see figure 3.10), arm signaling (see figure 3.11), broomstick twisting (see figure 3.12),
right crossed running (see figure 3.13), left crossed running (see figure 3.14), right lateral running
(see figure 3.15), left lateral running (see figure 3.16), high knees running in place (see figure 3.17),
high knees running (see figure 3.18), chest rotation running (see figure 3.19), arm spreading (see
figure 3.20), arm alternating (see figure 3.21), and ground touching (see figure 3.22). Each
figure consists of a short description of how the exercise was carried out, six photos to better
understand the exercise, and a sample of the accelerometer and gyroscope outputs recorded
during the process.


Figure 3.3: Walking on a corridor of about 200 meters in an almost straight line; at the end of
the corridor, the subject had to open a door. Other segments of the path included walking inside
a room around a desk, and on other corridors.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.4: Running on a distance of about 50 meters in a narrow corridor; at the end of the
exercise, the subject opened the door to exit the corridor. This exercise required the performer
to run at a slow to medium speed and followed immediately after walking.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.5: Backward running on the same corridor of about 50 meters but in the opposite
direction; at the end of the exercise, the subject had to open a door. Prior to this physical
exercise, the subject took a short break, doing other activities such as tying a shoelace.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.6: Stair climbing involved climbing up 22 steps. The stairway had three small flat surfaces separating the steps, which required the subject to walk short distances between them. When the subject reached the top of the stairwell, he or she rested on a chair for a couple of seconds.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.7: Jumping rope is an exercise in which the subject skips a virtual rope that passes under the feet and over the head. Some subjects jumped with both feet at a time, while others alternately jumped with one foot at a time.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.8: Toe touching required the subject to spread the feet apart, lean forward toward one leg, and touch the foot with the opposite hand. Before performing this physical exercise, the subject walked around and opened a trash bin.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.9: Walking downstairs involved climbing down 22 steps. The stairway had three small flat surfaces separating the steps, which required the subject to walk short distances between them. When the subject reached the bottom of the stairwell, he or she walked to the nearest available room.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.10: Limbs spreading is a common warm-up physical exercise where the subject has to jump with the feet and arms alternately spread apart. Some subjects carried out the activity by spreading both limbs at the same time.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.11: Arm signaling is a simple stretching exercise that involves a horizontal to vertical
movement of the forearm. Prior to this activity, the subject had to walk around some chairs
inside the room for a couple of seconds.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.12: Broomstick twisting involves rotating the torso through the waist to one side then
to the opposite. This is achieved with a stick that is positioned on the back of the shoulders.
Before performing this exercise, the subject rested on a chair and moved a trash bin.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.13: Right crossed running on a distance of about 15 meters inside a room. Subjects had to alternately place the left foot in front of and behind the right foot while running in a right lateral direction. Some subjects placed the left foot only in front of the right foot.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.14: Left crossed running on a distance of about 15 meters inside a room. Subjects had to alternately place the right foot in front of and behind the left foot while running in a left lateral direction. Some subjects placed the right foot only in front of the left foot.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.15: Right lateral running on a distance of about 15 meters inside a room. Subjects had to jump with their feet and arms spread apart in a right lateral direction. Some volunteers held their arms above their heads when jumping, while others kept them in a resting position.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.16: Left lateral running on a distance of about 15 meters inside a room. Subjects had to jump with their feet and arms spread apart in a left lateral direction. Some volunteers held their arms above their heads when jumping, while others kept them in a resting position.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.17: High knees running in place requires the subject to run in place while raising the knees as high as possible. Before performing this exercise, the subject rested on a chair for a couple of seconds and answered the mobile phone.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.18: High knees running on a distance of about 15 meters inside a room. This activity requires the subject to run while raising the knees as high as possible. Prior to this, the subject had to run in place while raising the knees as high as possible.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.19: Chest rotation running on a short distance. This exercise involves holding both
arms horizontally and rotating the chest from one side to the other while running at a slow
speed. Before this, the subject had to move a chair from one part of the room to the other.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.20: Arm spreading is a typical warm-up physical exercise that involves movement of the upper body only. Subjects had to fully spread their arms and then cross them twice in front of their chest. Some volunteers crossed their arms just once.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.21: Arm alternating is another typical warm-up exercise that requires performers to alternately raise one arm and lower the other at the same time, reaching full vertical extension. Some volunteers made a second, smaller extension within the same repetition.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

Figure 3.22: Ground touching involves spreading the legs to a comfortable position and touching
the ground with both hands once or twice in a repetition. Before performing this physical
exercise, subjects had to walk back to the starting point of the path.

[Plots: wrist and ankle accelerometer (g) and gyroscope (°/s) x, y, z signals against sample index.]

3.4 Possible applications


For the specific problem of interest, detecting human activities, a number of fitness metrics such as steps taken, miles traveled, and calories burned could be computed on a daily basis. If the system were to synchronize with a nutrition website, the consumed calories could also be measured. In addition, activity levels such as sedentary, lightly active, fairly active, and very active could be estimated with a simple heuristic. During the night, the application could track the time when the user went to bed, the time to fall asleep, the number of times awakened, the total time in bed, and the actual sleep time.
Context-aware applications that track human activities could adapt user interfaces, tailor the set of application-relevant data, increase the precision of information retrieval, discover services, make the user interaction implicit, or build smart environments. One consequence of such developments is that computing systems will appear to partially dissolve into the environment and become much more intimately associated with their users' activities.

Chapter 4

Pattern recognition

4.1 Filters
4.1.1 Low-pass filter
To detect the posture (current orientation) of a device, the portion of the acceleration data that
is caused by gravity has to be filtered out from the portion that is caused by motion of the
device. To accomplish this, a low-pass filter that reduces the influence of sudden changes on the accelerometer data has to be applied (please see figure 4.1). The resulting filtered values would then reflect the more constant effects of gravity [4].
Equation 4.1 shows a simplified and fast version of a low-pass filter. This approach uses a
low filtering factor to generate a value that uses 10% of the unfiltered acceleration data and
90% of the previously filtered value. The previous values are stored in the accelX , accelY , and
accelZ variables. Because acceleration data comes in regularly, these values settle out quickly
and respond slowly to sudden but short-lived changes in motion.

accelX = (rawX * k) + (accelX * (1 - k))
accelY = (rawY * k) + (accelY * (1 - k))        (4.1)
accelZ = (rawZ * k) + (accelZ * (1 - k))
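The filter itself is part of the MATLAB core of the framework; purely as an illustration, a minimal Python/NumPy sketch of equation 4.1 is given below, using the filtering factor k = 0.1 mentioned in the text (the function and variable names are illustrative, not taken from the framework).

```python
import numpy as np

def low_pass_filter(raw, k=0.1):
    """Exponential smoothing from equation 4.1.

    raw: array of shape (T, 3) holding x, y, z accelerometer samples.
    k:   filtering factor; 0.1 keeps 10% of the new sample and 90% of
         the previously filtered value.
    """
    filtered = np.empty_like(raw, dtype=float)
    filtered[0] = raw[0]                                # seed with the first sample
    for t in range(1, len(raw)):
        filtered[t] = raw[t] * k + filtered[t - 1] * (1.0 - k)
    return filtered
```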

4.1.2 High-pass filter


To detect just the instant motion of a device, sudden changes in movement have to be isolated
from the constant effect of gravity (please see figure 4.2). This is accomplished with a high-pass
filter [4].
Equation 4.2 shows a simplified high-pass filter computation. The acceleration values from the
previous event are stored in the accelX, accelY, and accelZ variables. This approach computes the low-pass filter value and then subtracts it from the current raw value to obtain just the instantaneous component of motion.

accelX = rawX - ((rawX * k) + (accelX * (1 - k)))
accelY = rawY - ((rawY * k) + (accelY * (1 - k)))        (4.2)
accelZ = rawZ - ((rawZ * k) + (accelZ * (1 - k)))
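Continuing the illustrative Python sketch from section 4.1.1 (not the MATLAB code used in the thesis), equation 4.2 reduces to subtracting the low-pass estimate from the raw samples:

```python
def high_pass_filter(raw, k=0.1):
    """Equation 4.2: isolate instantaneous motion by removing the
    low-pass (gravity) component estimated by low_pass_filter."""
    return raw - low_pass_filter(raw, k)
```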

Figure 4.1: The low-pass filter, when applied to accelerometer values with a filtering factor of 0.1, smooths the data streams. This shows a comparison of the unfiltered and low-pass filtered wrist and ankle accelerometer signals recorded during the running exercise.

[Plots: wrist and ankle accelerometer signals (g) against sample index, unfiltered and with the low-pass filter applied.]
Figure 4.2: The high-pass filter, when applied to accelerometer values, emphasizes sudden changes in the data streams. This shows a comparison of the unfiltered and high-pass filtered wrist and ankle accelerometer signals recorded during the running exercise.

[Plots: wrist and ankle accelerometer signals (g) against sample index, unfiltered and with the high-pass filter applied.]

4.2 Transforms
4.2.1 Discrete Fourier transform
The discrete Fourier transform provides the means of transforming discrete data streams defined in the time domain into functions defined in the frequency domain [31]. The fast Fourier transform is a class of special algorithms which implement the discrete Fourier transform with considerable savings in computational time.
Equation 4.3 describes how the complex numbers Xk that represent the Fourier coefficients of
the different sinusoidal components of the input function xn are calculated. As seen in figure 4.3,
noise is characterized by frequencies that have low amplitudes, hence these have to be discarded
in order to obtain noise-free data streams.
X_k = \sum_{n=0}^{N-1} x_n e^{-\frac{2\pi i}{N} kn}, \qquad k = 0, \ldots, N-1    (4.3)

4.2.2 Inverse discrete Fourier transform


Once the relevant frequencies are kept, the inverse discrete Fourier transform presented in equation 4.4 converts the function back to its time domain, as shown in figure 4.4.

x_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k e^{\frac{2\pi i}{N} kn}, \qquad n = 0, \ldots, N-1    (4.4)
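A minimal sketch of this denoising step, written in Python/NumPy rather than the MATLAB used by the framework: the stream is moved to the frequency domain with an FFT, coefficients whose normalized amplitude falls below a threshold are zeroed, and the inverse transform recovers the cleaned time-domain stream. The threshold value is illustrative, not the one used in the thesis.

```python
import numpy as np

def fft_denoise(signal, amplitude_threshold=0.2):
    """Remove low-amplitude frequency components (equations 4.3 and 4.4).

    signal: 1-D array holding one axis of one sensor stream.
    """
    spectrum = np.fft.fft(signal)                     # discrete Fourier transform
    amplitudes = np.abs(spectrum) / len(signal)       # normalized amplitudes
    spectrum[amplitudes < amplitude_threshold] = 0.0  # discard weak frequencies
    return np.real(np.fft.ifft(spectrum))             # back to the time domain
```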

Figure 4.3: The frequency domain of the input functions is calculated with the help of a fast Fourier transform. This shows the
amplitudes of the frequencies that exist in the wrist and ankle accelerometer signals recorded during the running exercise.

[Plots: wrist and ankle accelerometer signals in the time domain (g against sample index) and the amplitudes of their frequency components in the frequency domain.]
Figure 4.4: Once the data streams are converted to the frequency domain, frequencies with low amplitudes are removed. The inverse
discrete Fourier transform is used to recover the noise-free wrist and ankle accelerometer signals recorded during the running exercise.

[Plots: wrist and ankle accelerometer signals (g) against sample index, unfiltered and after applying the discrete Fourier transform and its inverse with low-amplitude frequencies removed.]

4.3 Hidden Markov models


The hidden Markov model is a powerful statistical method of characterizing the observed data
samples of a discrete time series. Three basic problems of interest have to be addressed in
real-world applications: the evaluation, decoding, and learning problem.

4.3.1 Definition of Markov chains


A Markov chain models a class of random processes that incorporates a minimum amount of
memory without being completely memoryless [12]. Let X = X_1, X_2, \cdots, X_n be a sequence of random variables from a finite discrete alphabet O = \{o_1, o_2, \cdots, o_M\}; based on Bayes' rule, equation 4.5 holds, where X_1^{i-1} = X_1, X_2, \cdots, X_{i-1}.

P(X_1, X_2, \cdots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i | X_1^{i-1})    (4.5)

The random variables Xk are said to form a first-order Markov chain, provided that equation
4.6 holds. This equation is also known as the Markov assumption.

P(X_i | X_1^{i-1}) = P(X_i | X_{i-1})    (4.6)


This assumption uses very little memory to model dynamic data sequences: the probability
of the random variable at a given time depends only on the value at the preceding time. As a
consequence, for the first-order Markov chain, the new equation is shown in 4.7.
P(X_1, X_2, \cdots, X_n) = P(X_1) \prod_{i=2}^{n} P(X_i | X_{i-1})    (4.7)

Formula 4.8 describes how the Markov chain can be used to model time-invariant events if
time index i is discarded.

P(X_i = s | X_{i-1} = s') = P(s | s')    (4.8)


If X_i is associated with a state, then the Markov chain can be represented by a finite state process with transitions between states specified by the probability function P(s | s'). Using this
finite state representation, the Markov assumption is translated to the following: the probability
that the Markov chain will be in a particular state at a given time depends only on the state of
the Markov chain at the previous time.
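To make the finite-state view concrete, the sketch below samples a first-order Markov chain in Python with an illustrative two-state transition matrix: each new state depends only on the previous one, exactly as the Markov assumption in equation 4.6 requires.

```python
import numpy as np

# Illustrative two-state chain; row i holds P(next state | current state i).
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])

def sample_chain(P, initial_state, length, seed=0):
    """Generate a state sequence under the first-order Markov assumption."""
    rng = np.random.default_rng(seed)
    states = [initial_state]
    for _ in range(length - 1):
        states.append(int(rng.choice(len(P), p=P[states[-1]])))
    return states

print(sample_chain(P, initial_state=0, length=10))
```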

4.3.2 Definition of hidden Markov models


A hidden Markov model is a collection of states connected by transitions. Each state is characterized by two sets of probabilities: a transition probability, and either a discrete output probability distribution or a continuous output probability density function which, given the state, defines the conditional probability of emitting each output symbol from a finite alphabet or a continuous random vector.
The HMM can be viewed as a double-embedded stochastic process with an underlying stochastic process (the state sequence) that is not directly observable. This underlying process can only be probabilistically associated with another observable stochastic process producing the sequence of observable features. Formally speaking [12], a discrete hidden Markov model is defined by:


• O = {o1 , o2 , · · · , oM }, the output observation alphabet. These observation symbols correspond to the physical output of the system being modeled.
• Ω = {1, 2, · · · , N }, the set of states representing the state sequence. These states are hidden
and for many practical applications there is often some physical significance attached to
the states of the model.
• A = {aij }, the transition probability matrix, where aij is the probability of taking a
transition from state i to state j, calculated as in equation 4.9.

aij = P (st+1 = j|st = i) (4.9)

• B = {bj (k)}, the output probability matrix, where bj (k) is the probability of emitting
symbol ok when state j is entered, calculated as in equation 4.10.

bj (k) = P (Xt = ok |st = j) (4.10)

• π = {πi }, the initial state distribution, as presented in equation 4.11.

πi = P (s1 = i) (4.11)

In addition, the variables that define a hidden Markov model need to satisfy the constraints
exemplified in equation 4.12.
\sum_{j=1}^{N} a_{ij} = 1, \qquad \sum_{k=1}^{M} b_j(k) = 1, \qquad \sum_{i=1}^{N} \pi_i = 1    (4.12)
For convenience, the notation Φ = (A, B, π) will be used throughout this paper.
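The definition maps directly onto three arrays. The toy numbers below are made up purely for illustration (N = 2 states, M = 3 output symbols) and simply show Φ = (A, B, π) together with the stochastic constraints of equation 4.12; they are not parameters of any model trained in this work.

```python
import numpy as np

# Toy discrete HMM Phi = (A, B, pi) with N = 2 states and M = 3 symbols.
A = np.array([[0.7, 0.3],        # a_ij: transition probabilities
              [0.4, 0.6]])
B = np.array([[0.5, 0.4, 0.1],   # b_j(k): output probabilities per state
              [0.1, 0.3, 0.6]])
pi = np.array([0.6, 0.4])        # initial state distribution

# Constraints from equation 4.12: each row of A and B, and pi, sums to 1.
assert np.allclose(A.sum(axis=1), 1.0)
assert np.allclose(B.sum(axis=1), 1.0)
assert np.isclose(pi.sum(), 1.0)
```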

4.3.3 Definition of continuous density hidden Markov models


If the observations do not come from a finite set, but from a continuous space, the discrete output distribution discussed in section 4.3.2 needs to be modified. The difference between the discrete and the continuous HMM lies in the different form of the output probability functions.
In choosing continuous output probability density functions b_j(x), the first candidate is the multivariate Gaussian mixture density function, because it can approximate any continuous probability density function. Equation 4.13 shows a Gaussian probability density function with M mixtures; x is the vector being modeled, c_jk is the mixture coefficient for the kth mixture in state j, and N denotes a single Gaussian density function with mean vector µ_jk and covariance matrix Σ_jk for the kth mixture component in state j.
b_j(x) = \sum_{k=1}^{M} c_{jk} N(x, \mu_{jk}, \Sigma_{jk}) = \sum_{k=1}^{M} c_{jk} b_{jk}(x)    (4.13)

The mixture gains have to satisfy the stochastic constraint from equation 4.14 so that the
probability density function is properly normalized, i.e. equation 4.15 holds.
\sum_{k=1}^{M} c_{jk} = 1, \qquad c_{jk} \geq 0    (4.14)


\int_{-\infty}^{+\infty} b_j(x)\, dx = 1    (4.15)
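As an illustration of equation 4.13, the sketch below evaluates a Gaussian mixture output density b_j(x) for a single state using SciPy's multivariate normal density; the mixture weights, means, and covariances are made-up toy values, not parameters estimated from the dataset.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mixture_density(x, c, mu, sigma):
    """b_j(x) = sum_k c_jk N(x; mu_jk, Sigma_jk), as in equation 4.13."""
    return sum(c_k * multivariate_normal.pdf(x, mean=mu_k, cov=sigma_k)
               for c_k, mu_k, sigma_k in zip(c, mu, sigma))

# Toy two-component mixture over three-dimensional observations.
c = np.array([0.6, 0.4])              # mixture coefficients, sum to 1
mu = [np.zeros(3), np.ones(3)]        # component mean vectors
sigma = [np.eye(3), 0.5 * np.eye(3)]  # component covariance matrices
print(mixture_density(np.array([0.2, -0.1, 0.4]), c, mu, sigma))
```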

4.3.4 The evaluation problem


Given the observation sequence x = x1 , x2 , · · · , xT and a model Φ = (A, B, π), how can P (X|Φ),
the probability of the observation sequence, be efficiently computed [20]?

The forward-backward procedure


The forward variable αt (i) is defined in formula 4.16 and represents the probability of the partial
observation sequence x1 x2 · · · xt , and state i at time t, given the model Φ.

αt (i) = P (x1 x2 · · · xt , st = i|Φ) (4.16)


αt (i) can be solved inductively: the initialization is presented in equation 4.17, the induction
steps in 4.18, and the final solution in 4.19.

α1 (i) = πi bi (x1 ) (4.17)

\alpha_{t+1}(j) = \Big[ \sum_{i=1}^{N} \alpha_t(i)\, a_{ij} \Big] b_j(x_{t+1})    (4.18)

P(X|\Phi) = \sum_{i=1}^{N} \alpha_T(i)    (4.19)

Strictly speaking, only the forward part of the forward-backward procedure is needed to solve
the evaluation problem. However, the backward part of the procedure is introduced because it’s
used to help solve the learning problem.
In a similar manner, a backward variable βt (i) is considered in formula 4.20 as the probability
of the partial observation sequence xt+1 xt+2 · · · xT , given state i at time t and model Φ.

βt (i) = P (xt+1 xt+2 · · · xT |st = i, Φ) (4.20)


Again, βt (i) can be solved inductively: the initialization is shown in equation 4.21 and the
induction steps in 4.22.

\beta_T(i) = 1, \qquad 1 \leq i \leq N    (4.21)

\beta_t(i) = \sum_{j=1}^{N} a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)    (4.22)
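A compact Python/NumPy sketch of the forward recursion (equations 4.17 to 4.19) for the discrete model of section 4.3.2 is given below; A, B, and pi follow the notation above, and x is a sequence of observation symbol indices. For brevity the sketch omits the scaling that is normally needed to avoid numerical underflow on long sequences.

```python
import numpy as np

def forward(A, B, pi, x):
    """Compute P(x | Phi) with the forward procedure."""
    N, T = A.shape[0], len(x)
    alpha = np.zeros((T, N))
    alpha[0] = pi * B[:, x[0]]                     # initialization (4.17)
    for t in range(1, T):                          # induction (4.18)
        alpha[t] = (alpha[t - 1] @ A) * B[:, x[t]]
    return alpha[-1].sum()                         # termination (4.19)
```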

4.3.5 The decoding problem


Given the observation sequence x = x_1 x_2 \cdots x_T and a model Φ = (A, B, π), how can the corresponding state sequence Q = s_1 s_2 \cdots s_T, which is optimal in some meaningful sense (best explains the observations), be computed [20]?


The Viterbi procedure


To find the single best state sequence Q = s1 s2 · · · sT for the given observation sequence x =
x_1 x_2 \cdots x_T, a quantity is defined in equation 4.23 which denotes the best score (highest probability) along a single path, at time t, which accounts for the first t observations and ends in state i.

\delta_t(i) = \max_{s_1, s_2, \cdots, s_{t-1}} P(s_1 s_2 \cdots s_{t-1}, s_t = i, x_1 x_2 \cdots x_t \,|\, \Phi)    (4.23)

To actually retrieve the state sequence, the argument which maximized the previous quantity
needs to be tracked for each t and i; this is done via the variable Ψt (i). These computations can
be performed inductively: the initialization is described in equation 4.24, the induction steps in
4.25, and the final solution in 4.26; the sequence of states has to be backtracked as in formula 4.27.

\delta_1(i) = \pi_i\, b_i(x_1), \qquad \Psi_1(i) = 0    (4.24)

\delta_t(j) = \max_{1 \leq i \leq N} [\delta_{t-1}(i)\, a_{ij}]\, b_j(x_t), \qquad \Psi_t(j) = \arg\max_{1 \leq i \leq N} [\delta_{t-1}(i)\, a_{ij}]    (4.25)

P^* = \max_{1 \leq i \leq N} [\delta_T(i)], \qquad s_T^* = \arg\max_{1 \leq i \leq N} [\delta_T(i)]    (4.26)

s_t^* = \Psi_{t+1}(s_{t+1}^*), \qquad t = T-1, T-2, \ldots, 1    (4.27)
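The corresponding Viterbi sketch (equations 4.24 to 4.27), again an illustrative Python/NumPy version with the same conventions as the forward example; in practice log probabilities would be used to avoid underflow, which is omitted here for brevity.

```python
import numpy as np

def viterbi(A, B, pi, x):
    """Return the single best state sequence for observation indices x."""
    N, T = A.shape[0], len(x)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, x[0]]                     # initialization (4.24)
    for t in range(1, T):
        scores = delta[t - 1][:, None] * A         # delta_{t-1}(i) * a_ij
        psi[t] = scores.argmax(axis=0)             # induction (4.25)
        delta[t] = scores.max(axis=0) * B[:, x[t]]
    states = [int(delta[-1].argmax())]             # termination (4.26)
    for t in range(T - 1, 0, -1):                  # backtracking (4.27)
        states.append(int(psi[t][states[-1]]))
    return list(reversed(states))
```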

4.3.6 The learning problem


The most difficult problem of hidden Markov models is to determine a method to adjust the
model parameters (A, B, π) to maximize the probability of the observation sequence. There is no
known way to analytically solve for the model which maximizes the probability of the observation
sequence. In fact, given any finite observation sequence as training data, there is no optimal way
of estimating the model parameters. However, Φ = (A, B, π) can be chosen such that P (x|Φ) is
locally maximized using an iterative procedure such as the Baum-Welch method [20].

The Baum-Welch procedure


In order to describe the procedure for re-estimation (iterative update and improvement) of HMM
parameters, ξt (i, j) needs to be defined in equation 4.28 to represent the probability of being in
state i at time t, and state j at time t + 1, given the model and the observation sequence.

ξt (i, j) = P (st = i, st+1 = j|x, Φ) (4.28)


ξ_t(i, j) can also be written as in equation 4.29, where the numerator term is just P(s_t = i, s_{t+1} = j, x | Φ) and the division by P(x | Φ) gives the desired probability measure.

\xi_t(i, j) = \frac{\alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}{P(x|\Phi)} = \frac{\alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}{\sum_{i=1}^{N} \sum_{j=1}^{N} \alpha_t(i)\, a_{ij}\, b_j(x_{t+1})\, \beta_{t+1}(j)}    (4.29)

γt (i) is defined as the probability of being in state i at time t, given the observation sequence
and the model; hence, γt (i) relates to ξt (i, j) by summing over j, yielding formula 4.30.


\gamma_t(i) = \sum_{j=1}^{N} \xi_t(i, j)    (4.30)

The summation of γt (i) over time index t (please see equation 4.31) represents the expected
number of times that state i is visited, or equivalently, the expected number of transitions made
from state i. Similarly, summation of ξt (i, j) over t (see equation 4.32) can be interpreted as the
expected number of transitions made from state i to state j.
\sum_{t=1}^{T-1} \gamma_t(i) = \text{expected number of transitions from state } i    (4.31)

\sum_{t=1}^{T-1} \xi_t(i, j) = \text{expected number of transitions from state } i \text{ to state } j    (4.32)

Therefore, formulas 4.33, 4.34, and 4.35 can be used for re-estimation of the parameters of a
discrete hidden Markov model.

π̂i = γ1 (i) (4.33)

âij = Σ_{t=1}^{T−1} ξt(i, j) / Σ_{t=1}^{T−1} γt(i) (4.34)

b̂j(k) = [Σ_{t=1, s.t. xt = ok}^{T} γt(j)] / [Σ_{t=1}^{T} γt(j)] (4.35)
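The following MATLAB sketch applies the re-estimation formulas 4.33–4.35 under the assumption that the occupation probabilities γ and the transition probabilities ξ have already been computed with the forward-backward procedure; it is an illustration, not the thesis implementation.

```matlab
function [piHat, aHat, bHat] = reestimate_discrete(gamma, xi, x, K)
% Sketch of equations 4.33-4.35 for a discrete HMM. gamma: N-by-T, xi:
% N-by-N-by-(T-1), both assumed precomputed with forward-backward; x is the
% 1-by-T sequence of observation symbol indices and K the codebook size.
[N, T] = size(gamma);
piHat = gamma(:, 1);                                   % (4.33)
aHat = sum(xi, 3) ./ sum(gamma(:, 1:T-1), 2);          % (4.34), row-wise normalization
bHat = zeros(N, K);
for k = 1:K
    bHat(:, k) = sum(gamma(:, x == k), 2) ./ sum(gamma, 2);   % (4.35)
end
end
```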

The Baum-Welch procedure for continuous density hidden Markov models


For continuous density hidden Markov models whose states are modeled with M-component Gaussian mixtures, the mixture occupation probability γt(j, k) and the re-estimation formulas for the mixture coefficients, mean vectors, and covariance matrices are given in equations 4.36 through 4.39.

γt(j, k) = [αt(j) βt(j) / Σ_{j=1}^{N} αt(j) βt(j)] · [cjk N(xt, µjk, Σjk) / Σ_{m=1}^{M} cjm N(xt, µjm, Σjm)] (4.36)

ĉjk = Σ_{t=1}^{T} γt(j, k) / Σ_{t=1}^{T} Σ_{k=1}^{M} γt(j, k) (4.37)

µ̂jk = Σ_{t=1}^{T} γt(j, k) xt / Σ_{t=1}^{T} γt(j, k) (4.38)

Σ̂jk = Σ_{t=1}^{T} γt(j, k) (xt − µjk)(xt − µjk)ᵀ / Σ_{t=1}^{T} γt(j, k) (4.39)
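As an illustration of the mixture updates 4.37–4.39, the MATLAB sketch below re-estimates the mixture coefficients, means, and covariances from precomputed mixture occupation probabilities γt(j, k); the data layout (a T-by-N-by-M array and a T-by-D observation matrix) is an assumption made for the sketch.

```matlab
function [cHat, muHat, SigmaHat] = reestimate_mixtures(gammaJK, X)
% Sketch of equations 4.37-4.39. gammaJK: T-by-N-by-M mixture occupation
% probabilities (as in 4.36), X: T-by-D observation matrix. Illustration
% only; the layouts are assumptions and this is not the thesis code.
[T, N, M] = size(gammaJK);
cHat = zeros(N, M);
muHat = cell(N, M);                                  % each cell: 1-by-D mean vector
SigmaHat = cell(N, M);                               % each cell: D-by-D covariance matrix
for j = 1:N
    denom = sum(reshape(gammaJK(:, j, :), [], 1));   % sum over t and k for state j
    for k = 1:M
        g = reshape(gammaJK(:, j, k), T, 1);         % T-by-1 weights
        cHat(j, k) = sum(g) / denom;                 % (4.37)
        mu = (g' * X) / sum(g);                      % (4.38)
        muHat{j, k} = mu;
        Xc = X - mu;                                 % observations centered on the new mean
        SigmaHat{j, k} = (Xc' * (Xc .* g)) / sum(g); % (4.39)
    end
end
end
```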

4.3.7 Issues and limitations


Initial estimates
In theory, the Baum-Welch re-estimation algorithm of the hidden Markov models should reach
a local maximum for the likelihood function. The problem of choosing the right initial estimates
of the HMM parameters so that the local maximum becomes the global maximum remains
open. Furthermore, for continuous density hidden Markov models, good initial estimates for the
Gaussian density functions are essential [21] and techniques such as k-means clustering should
be employed.


Left-to-right vs. ergodic topologies


Motion is a time-evolving non-stationary signal. Each hidden Markov model state has the ability
to capture some quasi-stationary segment in the non-stationary motion signal. A left-to-right
topology is a natural candidate to model the accelerometer and gyroscope signals as long as the
boundaries of the exercises are determined. When the quasi-stationary motion segment evolves,
the left-to-right transition enables a natural progression of such an evolution. This topology is, in
fact, one of the most popular HMM structures used in state of the art speech recognition systems
[12]. On the other hand, when working with half-overlapping sliding windows, the boundaries of
the activity are not clearly defined and the ergodic model represents a viable alternative.

Deleted interpolation
A problem associated with training HMM parameters via re-estimation methods is that the
observation sequence used for training is, of necessity, finite. Thus, there is often an insufficient
number of occurrences of different model events to give good estimates of the model parameters.
A solution to this problem is to interpolate one set of parameter estimates with another set of
parameter estimates from a model for which an adequate amount of training data exists. The
estimates of the parameters of the first model λ = (A, B, π) can be mixed with the parameters
of the second model λ′ = (A′, B′, π′) to reach the interpolated model λ̄ = (Ā, B̄, π̄). Equation
4.40 shows how the interpolation is obtained; ε represents the weighting of the parameters of the
first model and (1 − ε) the weighting of the parameters of the second model. Hence, the
re-estimated hidden Markov model will approach the global maximum more slowly.

λ̄ = ελ + (1 − ε)λ′ (4.40)
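A small MATLAB sketch of equation 4.40, applied here to the transition matrices only (the same convex combination applies to B and π); the matrices and the value of ε are made up for the example.

```matlab
% Sketch of deleted interpolation (equation 4.40) on transition matrices.
epsilon = 0.7;                              % example weighting of the first model
A1 = [0.9 0.1; 0.2 0.8];                    % transitions estimated from sparse data
A2 = [0.5 0.5; 0.5 0.5];                    % transitions of a well-trained (or uniform) model
Abar = epsilon * A1 + (1 - epsilon) * A2;   % interpolated transition matrix
```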

Chapter 5

Software architecture, design, and implementation

5.1 Pipes and filters


The approach follows the pipes and filters architectural style which connects computational com-
ponents (filters) through connectors (pipes) such that computations are concurrently performed
in a streamlike fashion. Also known as the data flow architecture, it’s an architecture that sepa-
rates computation from control [2]; while the former is represented by algorithms formulated in
a programming language, the latter is formulated in terms of components which carry out the
computations and connectors which transport data from one component to the other.
There are a number of advantages for making use of the pipes and filters architectural pat-
tern, especially in applications of signal processing, such as [8]: natural support for concurrent
execution and increased support for reuse, as any two filters can be hooked together, assuming
these agree on the data contract. Systems can also be easily maintained and enhanced because
new filters can be added to existing systems and old filters can be replaced by improved ones.
Furthermore, the designer is allowed to understand the overall input/output behavior of the
framework as a simple composition of the behaviors of the individual filters.
However, because of their transformational character, pipes and filters systems are not good at
handling interactive applications. Another problem occurs when filters are required to maintain
correspondences between two separate but related streams of data.
The framework is composed of a stream synchronizer, a window splitter, a noise remover, an
activity detector, and a repetition counter, as reflected in figure 5.1. The first component is a
consolidation filter that synchronizes the 12 streams of data into a linear sequence of samples.
The second component is a batching filter that splits the sequence of samples into half-overlapping
sliding windows of 256 frames each [6]. The next part applies a low-pass filter on the accelerations
to remove the inadvertent variations of the sensors. The second step of this filter converts the
signal to frequencies and discards the ones with low amplitudes; the relevant frequencies that
created the input functions are transformed back to the time domain with an inverse discrete
Fourier transform. Each window is then classified into an exercise with either a discrete or
continuous hidden Markov model of multivariate Gaussian mixtures. The final component counts
the number of repetitions that were performed in a continuous block of windows detected as
activities with two heuristics, one being time-based and the other relying on the number of
peaks that exist in the streams.
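The sketch below illustrates the pipes-and-filters composition in MATLAB with stand-in anonymous functions in place of the real filters, so that the chain itself runs end to end; the stand-ins are assumptions and do not reproduce the actual filters described above.

```matlab
% Pipes-and-filters sketch with stand-in filters so the chain runs end to end.
syncStreams  = @(s) s;                       % stand-in: merge the 12 streams
splitWindows = @(s) buffer(s, 256, 128);     % half-overlapping 256-frame windows
                                             % (buffer is in the Signal Processing Toolbox)
removeNoise  = @(w) w - mean(w, 1);          % stand-in: crude per-window de-trending

pipeline = {syncStreams, splitWindows, removeNoise};
data = randn(1, 1000);                       % fake single-channel sensor stream
for k = 1:numel(pipeline)
    data = pipeline{k}(data);                % each pipe feeds the next filter
end
size(data)                                   % 256 frames per column, one column per window
```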


Figure 5.1: The architecture of the framework relies on the well-known pipes and filters archi-
tectural style. Yellow shapes represent sensors while blue circles represent filters. This depicts
the entire flow of the process, starting with the sensors and ending with the recognition results:
id/name of the exercise and the associated number of performed repetitions.

[Diagram: the wrist and ankle devices (an accelerometer and a gyroscope each) feed five filters in sequence: sync streams (consolidation filter that merges the 12 streams of data into a sequence of samples), split in windows (batching filter that creates half-overlapping sliding windows of 256 frames), remove noise (low- or high-pass filter, as well as a discrete Fourier transform and its inverse), detect activities (discrete hidden Markov models with a k-means codebook, or continuous density hidden Markov models of multivariate Gaussian mixtures; outputs the id/name of the detected exercise), and count repetitions (time-based or peak-based heuristic; outputs the number of repetitions).]


5.2 Design
5.2.1 Challenges
A simple calculation based on the hardware specifications [1] [3] of the two identical devices shows
that each second 2 devices × (3 accelerations + 3 rotations) × 4 bytes × 100 Hz = 4,800 bytes = 4.68 KB are
sent over for processing. Moreover, it can be seen that five days of continuous logging in binary
mode accounts for as much as 2 GB. Thus, the task of classifying a medium-sized vocabulary of
human activities involves designing a technique that not only has a good recognition rate, but
is also optimized as much as possible [17].
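A quick back-of-the-envelope check of these figures:

```matlab
% Back-of-the-envelope check of the data-rate and storage figures above.
bytesPerSecond = 2 * (3 + 3) * 4 * 100;                  % 2 devices x 6 channels x 4 B x 100 Hz
fprintf('%.2f KB per second\n', bytesPerSecond / 1024);  % about 4.69 KB/s
fprintf('%.2f GB for five days\n', bytesPerSecond * 86400 * 5 / 2^30);   % about 1.93 GB
```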

5.2.2 Data preprocessing


Given the inherent noise that comes with hardware sensors, ±0.05g for accelerometers [1] and
±20◦ /s for gyroscopes [3], using a simple and fast low-pass filter tends to remove the small
vibrations that each sensor is susceptible to. This approach alone is not enough to come up
with an entirely noise-free signal, and more advanced techniques like discrete Fourier transforms
have to be employed. The size of each sliding window, 256 frames, is large enough for the fast
Fourier transform to capture the periodicity of the repetitions; using fewer frames per window is
not advised, as the relevant frequencies can then no longer be determined correctly.
Once the amplitudes and phases of the frequencies are computed, these are rounded toward zero
with a chosen factor of 50; thus, frequencies with low amplitudes, usually generated by noise,
are discarded. The inverse discrete Fourier transform performs the final preprocessing step of
converting the correct frequencies back to the time domain in a clean signal. Nijsen et al. [18] use
other transforms such as the wavelet transform as a preprocessing step in detecting myoclonic
seizures.
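The sketch below walks through the preprocessing chain on a synthetic window; the first-order exponential low-pass filter and its smoothing factor are assumptions made for the example, while the 256-frame window and the thresholding factor of 50 follow the text.

```matlab
% Preprocessing sketch on a synthetic 256-frame window.
fs = 100;                                   % sampling rate in Hz
t  = (0:255) / fs;                          % one sliding window of 256 frames
x  = sin(2*pi*1.5*t) + 0.1*randn(1, 256);   % periodic motion plus sensor noise

alpha = 0.2;                                % assumed smoothing factor of the low-pass filter
y = filter(alpha, [1, alpha - 1], x);       % y(n) = alpha*x(n) + (1 - alpha)*y(n-1)

Y = fft(y);                                 % amplitudes and phases of the frequencies
Y(abs(Y) < 50) = 0;                         % discard low-amplitude (noise) components
clean = real(ifft(Y));                      % clean signal back in the time domain
```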

5.2.3 Recognition of activities


Flanagan et al. [7] devise a one-of-a-kind approach that uses self-organizing maps, an unsupervised
machine learning technique, for discriminating between walking, walking downstairs, and
walking upstairs. This approach clusters similar feature vectors together to translate the continuous
domain of the problem into a discrete one; afterwards, a dynamic time warping algorithm
is applied on the test and reference vectors. Others use decision tables, decision trees, k-nearest
neighbor [26], support vector machines [9], naive Bayes [22] [23], neural network [35], and fuzzy c-means
[10] classifiers; in addition, Liu et al. [14] utilize genetic algorithms as a tool for selecting the
most significant features.

Discrete hidden Markov models


Schlomer et al. [25] handcraft a codebook with 18 centroids, placed on a sphere, to serve as input
for their discrete hidden Markov models. An alternative approach was taken in this paper where
32 centroids were computed with k-means clustering for transforming the continuous samples into
a sequence of discrete centroid identifiers. The use of a codebook will degrade results significantly,
but will speed up the running time considerably if only a few centroids are considered [20].
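A short sketch of the codebook construction and of the quantization of a window follows; the 12-dimensional frame layout and the synthetic data are assumptions, and kmeans and pdist2 require MATLAB's Statistics and Machine Learning Toolbox.

```matlab
% Sketch of codebook construction and quantization of one sliding window.
rng(0);                                      % reproducible synthetic data
frames = randn(5000, 12);                    % stand-in for 12-dimensional training frames
[~, codebook] = kmeans(frames, 32);          % 32-by-12 matrix of centroids

window = randn(256, 12);                     % one sliding window to quantize
[~, symbols] = min(pdist2(window, codebook), [], 2);   % nearest-centroid identifiers, 256-by-1
```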

Continuous density hidden Markov models


Used in state of the art speech recognition systems, continuous density hidden Markov models
[33] [32] perform exceptionally well on large vocabularies. Huang et al. [12] designed and


implemented the Microsoft Speech Recognition Engine on vocabularies with more than 60,000
words; IBM’s approach was similar [13].
Each of the 20 presented activities was associated with one hidden Markov model with eight
states, where each state is characterized by a single multivariate Gaussian mixture [24]. In
addition to these, an extra class was added to model anything else that is not considered an
exercise. Given that the starting and finishing point of the physical exercises is not determined,
the HMMs were chosen to be ergodic rather than left-to-right [20]. Experimental results showed
that inverting only the diagonal of the covariance matrix (a diagonal covariance approximation) not only gives better
recognition results, but is also much faster than computing the full inverse. Each of the 21 hidden
Markov models was tested against all sliding windows and the one with the highest logarithmic
probability was considered to be the winner.
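The winner-takes-all step can be sketched as below for the discrete case, with a scaled forward procedure as the scoring function; the model structure (fields pi0, A, B) is an assumption made for the sketch and does not mirror the framework's data structures.

```matlab
function [bestClass, logLiks] = classify_window(symbols, models)
% Winner-takes-all sketch: the quantized window is scored under every class
% model and the highest log-likelihood wins. "models" is assumed to be a
% cell array of structs with fields pi0 (N-by-1), A (N-by-N), B (N-by-K).
logLiks = zeros(1, numel(models));
for c = 1:numel(models)
    logLiks(c) = forward_loglik(models{c}, symbols);
end
[~, bestClass] = max(logLiks);
end

function logP = forward_loglik(m, x)
% Scaled forward procedure returning log P(x | model) for a discrete HMM.
T = numel(x);
alpha = m.pi0(:) .* m.B(:, x(1));
logP = log(sum(alpha));
alpha = alpha / sum(alpha);
for t = 2:T
    alpha = (m.A' * alpha) .* m.B(:, x(t));
    c = sum(alpha);                 % scaling factor against underflow
    logP = logP + log(c);
    alpha = alpha / c;
end
end
```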

5.2.4 Computation of repetition counts


Two heuristics compute the number of repetitions performed in a contiguous block of activities.
The first heuristic, TBH, looks at the number of frames that make up the detected block and
divides it by the average number of samples required to execute a single repetition. The second
heuristic, PBH, counts the number of peaks that exist in the low-pass filtered streams of data
and uses that to calculate the number of repetitions.
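Both heuristics reduce to a few lines; the sketch below uses a synthetic low-pass filtered stream, an assumed average repetition length, and findpeaks from the Signal Processing Toolbox.

```matlab
% Sketches of the two repetition-count heuristics on a synthetic block.
fs = 100;                                    % sampling rate in Hz
t = (0:1499) / fs;                           % a 15-second block detected as one activity
s = sin(2*pi*0.8*t);                         % stand-in for a low-pass filtered stream

framesPerRep = 125;                          % assumed average samples per repetition
tbhCount = round(numel(s) / framesPerRep);   % time-based heuristic (TBH): 12 repetitions

pks = findpeaks(s);                          % peak-based heuristic (PBH)
pbhCount = numel(pks);                       % also 12 repetitions for this signal
```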

5.3 Implementation
The statistical part of the framework was written in MATLAB and its compiler was used to gen-
erate C function libraries. On top of these, a layer of C wrapper functions was added for interfacing
with the MATLAB-generated C functions. The visual interface of the tool is written in C#
and invokes the exported wrapper functions from the unmanaged dynamic link libraries. With
the help of this tool, users can record, annotate, train, test, and validate sessions in subject-
dependent, subject-independent, and mixed k-fold cross-validation scenarios.

Chapter 6

Results

6.1 Recognition of activities


The recognition rate of the proposed approaches was calculated in three distinct scenarios to thor-
oughly measure the reliability of the algorithms. The first scenario involved a subject-dependent
validation performed on five subjects. As seen in figure 6.1, the highest obtained accuracy is
97.08% for the case of continuous density hidden Markov models and 86.70% for the case of
discrete hidden Markov models. The second scenario involved a subject-independent validation
performed on the same five subjects. As figure 6.2 shows, the highest obtained recognition
rate is 96.15% for CDHMMs, and 77.01% for DHMMs. The final mixed 5-fold cross validation,
presented in figure 6.3, measures accuracies as high as 95.67% for CDHMMs and 82.68% for
DHMMs.
Figure 6.4 depicts that the system is able to successfully process one half-overlapping sliding
window of 256 frames, accounting for two and a half real seconds of data, in 0.38s if continuous
density hidden Markov models are used. While a number of iterations less than five keeps the
recognition rate at a satisfactory level and provides fast adjustment of the parameters, using more
than 15 iterations is likely to significantly increase the accuracy, especially in subject-dependent
scenarios.
The confusion matrix from table 6.1 provides an analysis of the misclassifications of one
subject’s activities. In that particular case, walking was sometimes incorrectly classified as
other activities, running as stair climbing, and walking downstairs as walking. In a more
general sense, the exercises with the highest chance of being incorrectly detected are the ones
that occur in everyday life, such as walking, running, backward running, stair climbing, and
walking downstairs. The CDHMMs attached to all other activities score a near-perfect 100%.
22 continuous density hidden Markov models require 229,424 bytes = 224.04 KB to store
the initial state vectors, transition matrices, mean vectors, covariance matrices, and mixture
coefficients, while 22 discrete hidden Markov models require a mere 59,360 bytes = 57.96 KB to
store the centroids, transition matrices, and emission probabilities, an amount almost four
times smaller than what the CDHMMs need.


Figure 6.1: The subject-dependent validation of the activities was computed against five of the
subjects who completed the path three times. For each subject, two sessions were used in
training and the remaining one in testing. In this scenario, the DHMMs were set to work with
eight states while the k-means clustering algorithm transformed the continuous domain into 32
discrete centroids. The CDHMMs were also configured to work with eight states, but each state
was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a
maximum of 25 iterations to reach a convergence threshold of less than 10⁻⁵. The chart depicts
that discrete hidden Markov models manage to reach a recognition rate as high as 86.70% for
the third subject, while continuous density hidden Markov models score as high as 97.08% for
the fifth subject.

[Chart: recognition accuracy per subject (1–5); average accuracy: 0.8259 for DHMM, 0.9379 for CDHMM.]


Figure 6.2: The subject-independent validation of the activities was computed against five of the
subjects who completed the path three times. For each subject, the other 12 sessions were used
in training and all three of the subject’s sessions in testing. In this scenario, the DHMMs were
set to work with eight states while the k-means clustering algorithm transformed the continuous
domain into 32 discrete centroids. The CDHMMs were also configured to work with eight states,
but each state was modeled with a single multivariate Gaussian mixture. Both approaches were
trained in a maximum of 25 iterations to reach a convergence threshold of less than 10⁻⁵. The
chart depicts that discrete hidden Markov models manage to reach a recognition rate as high as
77.01% for the second subject, while continuous density hidden Markov models score as high as
96.15% for the first subject.

[Chart: recognition accuracy per subject (1–5); average accuracy: 0.7196 for DHMM, 0.9151 for CDHMM.]


Figure 6.3: The 5-fold cross-validation of the activities was computed against the entire set
of 22 sessions. In this scenario, the DHMMs were set to work with eight states while the k-
means clustering algorithm transformed the continuous domain into 32 discrete centroids. The
CDHMMs were also configured to work with eight states, but each state was modeled with
a single multivariate Gaussian mixture. Both approaches were trained in a maximum of 20
iterations to reach a convergence threshold of less than 10⁻⁵. The chart depicts that discrete hidden
Markov models manage to reach a recognition rate as high as 82.68% for the fifth fold, while
continuous density hidden Markov models score as high as 95.67% for the fourth fold.

[Chart: recognition accuracy per fold (1–5); average accuracy: 0.7811 for DHMM, 0.8537 for CDHMM.]


Figure 6.4: The running times of the activities were computed on a Lenovo ThinkPad X61 tablet
with an Intel Core2 Duo CPU L7500 running at 1.6GHz. The DHMMs were set to work with
eight states while the k-means clustering algorithm transformed the continuous domain into 32
discrete centroids. The CDHMMs were also configured to work with eight states, but each state
was modeled with a single multivariate Gaussian mixture. Both approaches were trained in a
maximum of 25 iterations to reach a convergence threshold of less than 10⁻⁵. This shows that
discrete hidden Markov models need 0.64s to adjust their parameters on one real second of
data, while continuous density hidden Markov models need only 0.31s to achieve the same thing.
Recognizing one real second of data requires 0.53s for DHMMs, and 0.15s for CDHMMs.

[Chart: training and testing times in seconds for DHMM and CDHMM.]

Table 6.1: The confusion matrix of one subject’s exercises is presented. The total accuracy for this particular volunteer is 96.68%.
However, the table shows that walking was sometimes incorrectly classified as other activities, running was misclassified once as stair
climbing, and walking downstairs has a high chance of being detected just as walking. All other CDHMMs scored a perfect 100%.

(rows: true activity; counts are sliding windows; all remaining off-diagonal cells of the full 21-by-21 matrix are zero)

activity                        correct   misclassified             accuracy
other activities                    254   none                          100%
walking                             128   8 as other activities          94%
running                               4   1 as stair climbing            80%
backward running                      9   none                          100%
stair climbing                       15   none                          100%
jumping rope                          7   none                          100%
toe touching                          8   none                          100%
walking downstairs                    3   8 as walking                   27%
limbs spreading                       5   none                          100%
arm signaling                         5   none                          100%
broomstick twisting                   7   none                          100%
right crossed running                 5   none                          100%
left crossed running                  4   none                          100%
right lateral running                 3   none                          100%
left lateral running                  4   none                          100%
high knees running in place           6   none                          100%
high knees running                    3   none                          100%
chest rotation running                8   none                          100%
arm spreading                         6   none                          100%
arm alternating                       7   none                          100%
ground touching                       5   none                          100%

Table 6.2: Comparison of the proposed DHMM- and CDHMM-based approaches to other meth-
ods found in recent literature. Some authors reach high recognition accuracies because of a small
vocabulary or because of the designed set of monitored gestures which contains very different
activities. Tapia et al. [30] have the most similar configuration and testing scenario to ours and
reach a recognition rate of 94.60% with a naive Bayes classifier; this is comparable with the
proposed LPF + DFT + IDFT + CDHMM approach.

method                            # of classes   sensors             accuracy
SVM [29]                          12             5 ACCs              88.13%
SVM [19]                          10             1 ACC               96.00%
DCT + SVM [9]                     17             1 ACC               85.16%
DFT + SVM [9]                     17             1 ACC               86.92%
WLT + SVM [9]                     17             1 ACC               87.36%
NB [29]                           12             5 ACCs              84.46%
NB [30]                           30             5 ACCs              94.60%
KNN [26]                          5              1 ACC               86.80%
DHMM [25]                         5              1 ACC               89.70%
CDHMM [19]                        10             1 ACC               97.40%
DTW [15]                          8              1 ACC               98.60%
LPF + DFT + IDFT + KM + DHMM      20             2 ACCs + 2 GYROs    71.96–82.59%
LPF + DFT + IDFT + CDHMM          20             2 ACCs + 2 GYROs    91.51–93.79%

6.1.1 Comparison to other approaches


Table 6.2 shows how the proposed discrete and continuous methods compare, in terms
of subject-dependent and subject-independent validation, to other authors’ approaches. Tapia et
al. [30] create a vocabulary of 30 physical exercises and design a naive Bayes classifier to reach a
94.60% recognition accuracy. Compared to most authors’ methods, the LPF + DFT + IDFT +
CDHMM approach ranks second (considering only studies on at least 15 activities) and is able
to score an average of 91.51% in subject-independent validation and an average of 93.79% in
subject-dependent validation.
Prekopcsak [19] compares support vector machines to hidden Markov models and concludes
that HMMs have a higher recognition rate but a slower running time than support vector ma-
chines. However, the main advantage of HMMs over SVMs is their time-dependent analysis, such
that hidden Markov models are able to filter out more than 95% of noise movements.

6.2 Computation of repetition counts


The two heuristics, namely the time-based heuristic and the peak-based heuristic, are also com-
pared in three different scenarios. The output of the functions was forced to be an integer, thus
rounded when necessary. The first scenario which involves a subject-dependent validation is pre-
sented in figure 6.5. It shows that the average recognition rate is 95.46% for TBH and 95.20%
for PBH. The second scenario, presented in figure 6.6, displays recognition rates of up to 97.28%
for TBH and 97.60% for PBH in a subject-independent validation. The third scenario is a 10-fold
cross-validation which measures averages of 92.65% for TBH and 92.27% for PBH, as seen in
figure 6.7.


Figure 6.5: The subject-dependent validation of the repetition counts was computed against five
of the subjects who completed the path three times. For each subject, two sessions were
used in training and the remaining one in testing. These values show little difference between
the two heuristics, as both score as high as 97.31% for the second subject.

[Chart: accuracy of the repetition counts per subject (1–5); average accuracy: 0.9546 for TBH, 0.9520 for PBH.]

Given that these heuristics compute coefficients that relate directly to the number of repeti-
tions, the time-based heuristic requires only 0.17ms to come up with a result for a data stream
of one real second, as indicated in figure 6.8.


Figure 6.6: The subject-independent validation of the repetition counts was computed against
five of the subjects who completed the path three times. For each subject, the other 12 sessions
were used in training and all three of the subject’s sessions in testing. These values show little
difference between the two heuristics, but the time-based heuristic scores as high as 97.28% for
the second subject, while the peak-based heuristic scores as high as 97.60% also for the second
subject.

[Chart: accuracy of the repetition counts per subject (1–5); average accuracy: 0.9107 for TBH, 0.9086 for PBH.]


Figure 6.7: The 10-fold cross-validation of the repetition counts was computed against the entire
set of 22 sessions. This depicts that the time-based heuristic scores as high as 95.97% for the
seventh fold, while the peak-based heuristic scores as high as 96.50% for the second fold.

[Chart: accuracy of the repetition counts per fold (1–10); average accuracy: 0.9254 for TBH, 0.9227 for PBH.]


Figure 6.8: The running times of the repetition counts were computed on a Lenovo ThinkPad X61
tablet with an Intel Core2 Duo CPU L7500 running at 1.6GHz. This shows that the time-based
heuristic needs 0.19ms to adjust its parameters on one real second of data, while the peak-based
heuristic needs 0.22ms to achieve the same thing. Recognizing one real second of data requires
0.17ms for TBH and 0.20ms for PBH.

[Chart: training and testing times in milliseconds for TBH and PBH.]

Chapter 7

Conclusion

7.1 Summary of the work


This research provides two contributions to the field of human–computer interaction. First, it
was shown that 20 fitness activities can be optimally tracked in an unsupervised environment
with two custom devices made up of an accelerometer and a gyroscope that should be worn on
the non-dominant wrist and on the opposite ankle. This provides not only a low-cost personal
trainer that keeps track of exercises and repetition counts, but also completes Michelmann’s [16]
definition of context being determined by location, identity, activity, and time. The presented
context-aware framework is written in MATLAB, C, and C#, hence it’s interoperable enough to
be consumed from C, C++, FORTRAN, Java, or C#.
Second, the proposed three-phase approach combines state of the art filters, transforma-
tions, and machine learning techniques that measure accuracies as high as 97.08% with subject-
dependent training and 96.15% with subject-independent training in recognizing the set of activ-
ities. The first phase applies a fast low-pass filter, to remove the small variations that all sensors
are susceptible to, and then converts the smoothed input to the frequency domain with a
discrete Fourier transform, discards low-amplitude frequencies, and then translates them back
to the time domain with an inverse discrete Fourier transform. Once the streams are noise-free,
the second phase splits and classifies each half-overlapping sliding window of two and a half sec-
onds into one of the classes; two variants were presented: k-means with discrete hidden Markov
models and continuous density hidden Markov models of multivariate Gaussian mixtures. The
final phase detects the number of repetitions, either with a time- or peak-based heuristic, once
a continuous block of windows which contain activities is detected. The time-based heuristic
obtains an average confidence of 92.54% in a 10-fold cross-validation of the repetition counts.
The presented problem of recognizing physical exercises and counting the number of per-
formed repetitions shows that the possible applications of context awareness are virtually endless:
these could adapt user interfaces, tailor the set of application-relevant data, increase the
precision of information retrieval, discover services, or make the user interaction implicit. With
such developments, computing systems will appear to partially dissolve into the environment
and become much more intimately associated with their users’ activities.


7.2 Critical evaluation of own work


To overcome the lack of a publicly available dataset, one had to be created at the University
of Siena with the help of 10 volunteers. However, to fully test the validity of the proposed
approaches, the number of recorded hours should be increased from 5.5h to a full week of
168h. This is left as future work, as the objective of this research was to evaluate whether a
medium-sized set of activities could be optimally tracked in semi-naturalistic conditions.
A well-defined and sufficiently constrained recognition problem will lead to a more compact
pattern recognition representation and increased computational efficiency. It is therefore of
importance that the activities to be recognized are sufficiently different from one another in
order to create large interclass variations such that parts of large vocabularies could be pruned.
Unfortunately, there were no resources to design better physical exercises or to monitor more
than 20 activities in 22 sessions.

7.3 Future work


Besides adjusting the parameters of the subject’s hidden Markov models, another complete set
of perfectly executed HMMs could be kept for reference. The latter of the models would be
trained on data recorded by world-renowned athletes and would be constantly compared to the
ones of the subject. This would enable the system to provide advice on improving the
subject’s overall performance. Furthermore, common mistakes in executing the activities would
be translated into separate HMMs and matched with the user’s once classification takes place.
To the author’s knowledge, such extensions have never been proposed before.
Xu et al. [34] add EMG sensors in addition to the accelerometer and conclude that the
EMG data increases the recognition accuracy of physical exercises significantly.
Therefore, the author is also interested in working with pedometers, blood pressure, and heart
rate monitors besides accelerometers and gyroscopes.

Bibliography

[1] ADXL330 small, low power, 3-axis ±3 g iMEMS accelerometer. Technical report, Analog
Devices, Norwood, Massachusetts 02062, 09 2006.

[2] Integration patterns: Pipes and filters. Technical report, Microsoft Corp., Redmond, Wash-
ington 98052, 06 2006.
[3] IDG-650 dual-axis gyro product specification. Technical report, InvenSense Inc., Sunnyvale,
California 94089, 05 2010.

[4] iPhone application programming guide. Technical report, Apple Inc., Cupertino, California
95014, 03 2010.
[5] Ling Bao. Physical activity recognition from acceleration data under semi-naturalistic con-
ditions, 08 2003.

[6] Thomas G. Dietterich. Machine learning for sequential data: A review. Technical report,
Oregon State University, Corvallis, Oregon 97331, 2002.
[7] Adrian Flanagan, Jani Mantyjarvi, Kalle Korpiaho, and Johanna Tikanmaki. Recognizing
movements of a handheld device using symbolic representation and coding of sensor signals.
12 2002.

[8] David Garlan and Mary Shaw. An introduction to software architecture. Advances in
Software Engineering and Knowledge Engineering, 1, 01 1993.
[9] Zhenyu He, Lianwen Jin, Lixin Zhen, and Jiancheng Huang. Gesture recognition based on 3D
accelerometer for cell phones interaction. Institute of Electrical and Electronics Engineers,
2008.

[10] Mi hee Lee, Jungchae Kim, Kwangsoo Kim, Inho Lee, Sun Ha Jee, and Sun Kook Yoo.
Physical activity recognition using a single tri-axis accelerometer. volume 1, 10 2009.
[11] Hewett, Baecker, Card, Carey, Gasen, Mantei, Perlman, Strong, and Verplank. Human-
computer interaction. Association for Computing Machinery, 07 2009.

[12] Xuedong Huang, Alex Acero, and Hsiao-Wuen Hon. Spoken Language Processing: A Guide
to Theory, Algorithm and System Development. Prentice Hall, Upper Saddle River, New
Jersey 07458, 2001.
[13] Frederick Jelinek. Statistical Methods for Speech Recognition (Language, Speech, and Com-
munication). The MIT Press, Cambridge, Massachusetts 02142, 1998.


[14] Alan L. Liu, Jun Yang, and Peter Pal Boda. Poster abstract: Gesture recognition via
continuous maximum entropy training on accelerometer data. Association for Computing
Machinery, 04 2009.
[15] Jiayang Liu, Lin Zhong, Jehan Wickramasuriya, and Venu Vasudevan. uWave:
Accelerometer-based personalized gesture recognition and its applications. Pervasive and
Mobile Computing, 5, 2009.
[16] Marcel Michelmann. Context-aware applications, computing in context. Technical report,
Ludwig Maximilians Universitat Munchen, Munich, Germany 80539, 04 2007.
[17] Gerrit Niezen. The optimization of gesture recognition techniques for resource-constrained
devices, 04 2008.
[18] Tamara M.E. Nijsen, Pierre J.M. Cluitmans, Paul A.M. Griep, and Ronald M. Aarts. Short
time Fourier and wavelet transform for accelerometric detection of myoclonic seizures. 12
2006.
[19] Zoltan Prekopcsak. Accelerometer based real-time gesture recognition. 05 2008.

[20] Lawrence R. Rabiner. A tutorial on hidden Markov models and selected applications in
speech recognition. Institute of Electrical and Electronics Engineers, 77(2), 02 1989.
[21] Lawrence R. Rabiner. Biography. http://www.caip.rutgers.edu/~lrr/lrr%20info/lrr_
biography_rutgers_2002.pdf, 2002.

[22] Nishkam Ravi, Nikhil Dandekar, Preetham Mysore, and Michael L. Littman. Activity
recognition from accelerometer data. American Association for Artificial Intelligence, 2005.
[23] Matthias Rehm, Nikolaus Bee, and Elisabeth Andre. Wave like an Egyptian - accelerometer
based gesture recognition for culture specific interactions. 2008.

[24] David Marshall Rouse. Estimation of finite mixture models, 2005.


[25] Thomas Schlomer, Benjamin Poppinga, Niels Henze, and Susanne Boll. Gesture recognition
with a Wii controller. 02 2008.
[26] Kai-Tai Song and Yao-Qing Wang. Remote activity monitoring of the elderly using a two-
axis accelerometer. 11 2005.

[27] Thad Starner. Biography. http://www.ic.gatech.edu/people/thad-starner, 2010.


[28] Ivan E. Sutherland. Sketchpad, a Man-Machine Graphical Communication System. PhD
thesis, Massachusetts Institute of Technology, Lincoln Laboratory, 1963.
[29] Emmanuel Munguia Tapia. Activity recognition from accelerometer data for videogame
applications. Technical report, Massachusetts Institute of Technology, Cambridge, Mas-
sachusetts 02142, 12 2003.
[30] Emmanuel Munguia Tapia, Stephen S. Intille, William Haskell, Kent Larson, Julie Wright,
Abby King, and Robert Friedman. Real-time recognition of physical activities and their
intensities using wireless accelerometers and a heart rate monitor. Institute of Electrical
and Electronics Engineers, 2007.


[31] Jan Verschelde. Introduction to symbolic computation. Technical report, University of
Illinois, Department of Mathematics, Statistics, and Computer Science, Urbana, Illinois
61801, 04 2007.
[32] Andrew Wilson and Steven Shafer. XWand: UI for intelligent spaces. Association for
Computing Machinery, 04 2003.

[33] Daniel Wilson and Andy Wilson. Gesture recognition using the XWand. Technical report,
Carnegie Mellon University, Robotics Institute, Assistive Intelligent Environments Group
and Microsoft Research, Redmond, Washington 98052, 2002.
[34] Zhang Xu, Chen Xiang, Wang Wen-hui, Yang Ji-hai, Vuokko Lantz, and Wang Kong-qiao.
Hand gesture recognition and virtual game control based on 3D accelerometer and EMG
sensors. Association for Computing Machinery, 02 2009.
[35] Jhun-Ying Yang, Jeen-Shing Wang, and Yen-Ping Chen. Using acceleration measurements
for activity recognition: An effective learning algorithm for constructing neural classifiers.
Pattern Recognition Letters, 29, 2008.

Index

accelerometer, 7, 46
activities, 10
    arm alternating, 29
    arm signaling, 19
    arm spreading, 28
    backward running, 13
    broomstick twisting, 20
    chest rotation running, 27
    ground touching, 30
    high knees running, 26
    high knees running in place, 25
    jumping rope, 15
    left crossed running, 22
    left lateral running, 24
    limbs spreading, 18
    right crossed running, 21
    right lateral running, 23
    running, 12
    stair climbing, 14
    toe touching, 16
    walking, 11
    walking downstairs, 17
activity recognition, 46, 49–53
Baum-Welch procedure, 41
confusion matrix, 48, 53
context, 2
continuous density hidden Markov model, 39
continuous gesture recognition, 7
dataset, 2, 10
deleted interpolation, 43
discrete Fourier transform, 35
discrete gesture recognition, 7
discrete hidden Markov model, 38
ergodic topology, 43
fast Fourier transform, 35
forward-backward procedure, 40
gyroscope, 7, 46
hidden Markov model, 38
high-pass filter, 32, 34
human–computer interaction, 4
initial estimate, 42
inverse discrete Fourier transform, 35
k-fold cross-validation, 48, 51, 57
k-means clustering, 42, 46
left-to-right topology, 43
low-pass filter, 32, 33
Markov chain, 38
multivariate Gaussian mixture, 39
noise removal, 46
peak-based heuristic, 47
pipes and filters architectural style, 44, 45
repetition count, 47, 55–58
running time, 52, 58
storage efficiency, 48
subject-dependent validation, 48, 49, 55
subject-independent validation, 48, 50, 56
time-based heuristic, 47
Viterbi procedure, 41
