Masaryk University
Faculty of Informatics
Master's Thesis
Michal Balážia
Brno, 2013
Declaration
Hereby I declare that this thesis is my original authorial work, which I have
worked out on my own. All sources, references, and literature used or excerpted
during the elaboration of this work are properly cited and listed with complete
reference to the due source.
Michal Balážia
Acknowledgement
First and foremost, I thank the colleagues who motivated me to produce this work:
Honza Sedmidubský, for bringing up the topic and providing many helpful
insights and much encouragement, and, together with Kuba Valčík, for their beneficial
input and inspiring work environment; and Professor Pavel Zezula, for finding ways
to aid my studies. I thank all my teachers for teaching me, my
family for loving me, and my friends for supporting me. I thank everyone for
doing their best in their respective places.
Contents
1 Introduction
1.1 Biometrics
1.2 Gait
1.3 Scope of Thesis
2 Related Work
2.1 Appearance-Based Approach
2.2 Model-Based Approach
2.3 A Comprehensive Overview of Previous Methods
3 Gait Variables Representation
3.1 Model Definition
3.2 Trajectories of Movement
3.3 Distance-Time Dependency Signals
3.4 Similarity of DTDSs
3.5 Gait Pattern
3.6 Methods of Gait Comparison
4 Normalization of Signals
4.1 Identifying Footsteps
4.2 Identifying Walk Cycles
4.3 Time Normalization
5 Experimental Evaluation
5.1 Database
5.2 Methodology
5.3 Results
6 Conclusions and Future Research
Abstract
This thesis concentrates on recognizing persons according to the way they walk
(gait). The primary objective is to design and implement a new method for
measuring similarity of gait features that are extracted from trajectories of 3D
coordinates of body components moving in time. The proposed method introduces
signals of the distance-time dependency of body point pairs during walking
as the basic unit of a gait pattern. The signals are subsequently normalized with
respect to a novel distance function. Parameters of the distance function are
optimized with regard to the recognition rate, and the achieved results are
experimentally evaluated.
The textual part contains a comprehensive survey of existing approaches
to human gait recognition, as well as a description of a suitable model of human
movement represented by 3D trajectories, a method for extracting gait
features, a proper similarity function for effective comparison of the extracted
features, and results obtained by analyzing the recognition rate.
Keywords
1 Introduction
In a world of rising terrorism and free movement of criminals, people clearly
understand the importance of security monitoring and control for the purposes
of national defense and public safety. Early-warning systems already have a
large number of surveillance cameras installed in public areas, but they require
intelligent approaches to human recognition. A truly useful system would
analyze the collected video data and raise an alert before an adverse event
happens. Upon detecting abnormal behavior, the system could instantly identify
all scene participants, rapidly investigate their previous activities, and start
tracking the suspects.
1.1 Biometrics
Fingerprints
A fingerprint is formed by friction ridges and valleys of the surface
of a fingertip. It has been proven that each single finger generates
DNA
DNA (deoxyribonucleic acid) contains the genetic instructions of all
living creatures. DNA recognition instruments are likely to be
used in criminal investigation and in diagnostics. The
greatest advantage is its high distinctiveness: the probability
of two people having exactly the same DNA code is one in a
hundred billion. On the other hand, DNA matching is not available
in real time because it needs a physical sample, while other
biometric systems only use an image or a recording.
Retina
Capillary vessels located at the innermost, most sensitive layer of the eye
have a unique pattern. Retina scans require the person to remove
his glasses, place his eye close to the scanner, stare at a specific
point, and remain still while the scan is completed. A retina scan
cannot be faked, as it is currently impossible to forge a human
retina. Furthermore, the retina of a deceased person decays too
rapidly to be used to deceive a retinal scan. Although enrollment
and scanning are intrusive and slow, the method is highly
accurate.
Iris
The iris is the pigmented portion of the eye, in the center of which
the pupil is situated. Its visual texture stabilizes during the first two years of
life. The complex iris texture carries very distinctive information
useful for personal recognition. The hippus movement of the eye
may also be used as a liveness measure for this biometric.
Face
Applications of this non-intrusive method range from a static and
controlled authentication to a dynamic uncontrolled identification
in a cluttered background. The most popular approaches to face
recognition are based on either the location and shape of facial
attributes (eyes, eyebrows, nose, lips, chin) and their spatial rela-
tionships, or the overall analysis of the face image as a weighted
combination of canonical faces. A facial recognition system should
automatically detect whether a face is present in an acquired im-
age, locate the face if there is one, and recognize the face from a
general viewpoint.
Hand geometry
Hand geometry recognition systems are based on a number of
measurements taken from the human hand, including its shape,
the size of the palm, and the lengths and widths of the fingers. Since hand
geometry may vary during a lifetime and accessories
(jewelry) may interfere with extraction of the variables, this biometric is
not known to be very distinctive.
Signature
The way a person writes his name has been accepted in govern-
ment, legal, and commercial transactions as a method of authen-
tication. This is an obtrusive method, since the user’s effort and
contact with a pen is required. The actual signature changes over
time and is influenced by physical and emotional conditions of
the signatories. In addition, professional forgers may be able to
reproduce signatures that fool the signature verification system.
Gait
Gait refers to the manner in which a person walks, and is one of
the few biometric traits that can be used to recognize people at a
distance. Therefore, this trait is very appropriate in surveillance
scenarios. Most gait recognition algorithms attempt to extract the
human silhouette in order to derive the gait variables. Hence, the
selection of a good model to represent the human body is pivotal
to the efficient functioning of a gait recognition system. However,
Keystroke dynamics
Each person is hypothesized to type on a keyboard in a charac-
teristic way. Keystroke dynamics is a collection of detailed press
and release timing information about typing on a keyboard. This
biometric is not expected to be unique to each individual but it
may offer sufficient discriminatory information to permit identity
verification. The keystrokes of a person can be monitored unobtrusively,
and continuous verification of an individual's identity
is available.
Voice
The measurable parameters of a person's voice are tone, pitch,
cadence, and frequency. However, they change over time due to
age, medical conditions (such as the common cold), or emotional state. Being
not very distinctive, voice may not be an appropriate trait for
large-scale identification. A text-dependent voice recognition
system is based on the utterance of a fixed predetermined phrase. A
text-independent voice recognition system offers more protection
against fraud, but is more difficult to design. Speaker recognition
is possible in telephone-based applications but the voice signal is
typically degraded in quality by the communication channel.
All biometric traits have their own relative merits in various operational
scenarios, and the choice for a particular application depends on a series of
conditions besides match performance. No single biometric ever meets all
the requirements (accuracy, cost, efficiency) imposed by all applications.
However, the inherent limitation factors of processing a single biometric can be
alleviated by fusing the information obtained from multiple sources. For ex-
ample, the fingerprints of the right and left index fingers, or multiple images
of face from different angles, or a combination of the face and gait traits may
be used to resolve the identity of an individual. Consolidating the evidence
presented by multiple sources is expected to be more reliable. Furthermore,
fusion of biometrics helps users to get successfully enrolled and makes the sys-
tem more robust against hackers.
Most human recognition technologies generally require a subject
who permits and enables physical contact, close proximity, or views from
specific angles, in order to help the recognition procedure. Without a cooperating
subject, it is difficult to reliably recognize individuals from arbitrary views
when walking at a distance in real-world changing environmental conditions.
For optimal performance, we should make use of as much information as can
possibly be obtained from the available observations. Gait and face are two
[Figure: identification scheme — a biometric query data sample is compared with database samples 1 to N; the identity with the maximum similarity is returned.]
1. A k-NN query returns the k most similar distinct objects to the query object.
2. A range query returns all objects within some range (radius) of the query object.
1.2 Gait
3. Early medical and psychological studies [18] indicate that the human gait has 24 different
measurable components.
of one’s gait.
Gait can be seen as advantageous over other traits, since the information
is captured without any obtrusive body-invasive equipment or the subject's
cooperation. From a surveillance perspective, walk-pattern biometrics is
appealing because it can be performed at a distance, even surreptitiously.
Together with its high collectability, these are the reasons
why the method is preferably employed for human identification rather than
authentication. Another motivation is that video footage of suspects is readily
available, as surveillance cameras are relatively low-cost and installed at
most locations requiring the presence of security. Unlike face recognition, which
can easily be affected by low resolution, here a video of reduced quality is often
sufficient. Moreover, gait is difficult to disguise, and by trying to do so the
individual will probably appear even more suspicious, while a face can easily be
altered or hidden.
On the other hand, slightly modified conditions may introduce a large
appearance change, which may lead to a failure in recognition. Gait can be
affected by clothing (long cloak, shoes, carrying objects), physical changes
(pregnancy, injury, weight gain/loss) or environmental context (walking sur-
face, background). Temporary stimulants (drugs and alcohol) as well as mood
[20] can change a person’s walking style. The large gait variation of the same
person under different conditions intentionally or unintentionally reduces the
discriminating power of gait as a biometric. Hence it may not be as unique
as fingerprint or iris, although the inherent gait characteristic of an individual
still makes it irreplaceable and useful in visual surveillance.
The ultimate goal is to detect, classify, and identify humans at long distances,
day or night, in all weather conditions, alone or in a group
of people. Among the external factors, light deficit and self-occlusion of
feature points (overlapping legs) hinder the extraction of gait features.
challenges involved in gait recognition include also imperfect foreground seg-
mentation of the walking subject from the background scene and variations
in the camera viewing angle with respect to the walking subjects. Large vari-
ation of the same individual under different conditions requires more gallery
samples collected from different environmental contexts. Despite having sam-
ples generally obtained under similar environmental conditions, a reliable gait
recognition system with large-scale database seems unreal due to the complex-
ity of real-world situations. Therefore, a reasonable solution is not to ask the
system for a single (the best match) identity of an individual, but for a set of
most probable identities. The set can be constructed by taking first k most sim-
ilar patterns (k-NN query), or by setting a similarity threshold (range query).
It is important to adjust the values sensitively with respect to database size
and similarity function in order to filter out maximum of unrelated identities,
but also to keep the related one. The goal of a dramatic reduction of the set
of all identities (from billions to hundreds) is now achievable.
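The two candidate-set constructions can be sketched as follows; the database layout, the one-dimensional toy patterns, and the distance function are illustrative assumptions, not part of the thesis implementation.

```python
def knn_query(database, query, k, distance):
    """Return the identities of the k gait patterns most similar to the query."""
    ranked = sorted(database, key=lambda entry: distance(query, entry[1]))
    return [identity for identity, _ in ranked[:k]]

def range_query(database, query, radius, distance):
    """Return the identities of all gait patterns within `radius` of the query."""
    return [identity for identity, pattern in database
            if distance(query, pattern) <= radius]

# Toy example: one-dimensional "gait patterns" compared by absolute difference.
db = [("alice", 1.0), ("bob", 5.0), ("carol", 2.0)]
dist = lambda a, b: abs(a - b)
candidates_knn = knn_query(db, 0.0, 2, dist)        # the two best matches
candidates_range = range_query(db, 0.0, 2.5, dist)  # everyone within radius 2.5
```

Both queries return a small candidate set rather than a single verdict, which is exactly the reduction from billions to hundreds of identities described above.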
[Figure: gait recognition pipeline — a walking subject is spotted on video (Detection), raw gait data are captured and features extracted (Extraction), and the extracted features are compared with a database of recorded features to reveal the identity of the subject (Recognition).]
1. Detection:
Detecting and tracking humans in a video sequence is the first
step in gait recognition, where an individual is actually spotted
walking. Gait detection systems usually work under the assumption
that the video sequence to be processed is captured by a static
camera and that the only moving object in the sequence is the
subject (person). Given a video sequence from a static camera, this
module detects and tracks the moving silhouettes. The process
comprises two submodules, Foreground Modelling4 and Human
Tracking5 using a skeletonization operation.
2. Extraction:
Targeted elements of gait are collected from the individual in question.
In this step it helps to have the background as simple as
possible, and extra attention should be paid to the selection of an
appropriate (side) viewpoint. To obtain a useful descriptor of a
walking person, the person's legs can be temporally tracked for
symmetries. A more challenging approach is to extract phase
information from the timed interplay of the coordinated movements
comprising gait. Variable lighting and moving backgrounds make
traditional foreground extraction techniques such as optical flow6
and background subtraction7 unstable. Finally, the specific markers
of the identification scheme are extracted and subsequently
preprocessed (normalized) into a comparable form.
3. Recognition:
The newly extracted samples are compared with the entries stored
in a central database. The identity of the individual holding the
most (and sufficiently) similar gait sample is picked and stated as the
recognition verdict.
A gait recognition system can be used in a number of different scenarios. If
an individual whose gait has previously been recorded walks by the camera and
he is a known threat, then the system will recognise him and the appropriate
authorities can be automatically alerted. Such systems have a large amount of
potential application in airports, banks, and other high-security areas.
6. The pattern of apparent motion of objects, surfaces, and edges in a visual scene caused
by the relative motion between an observer (an eye or a camera) and the scene.
7. A commonly used class of techniques for segmenting out objects of interest in a scene
from the background noise.
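As an illustration of the background subtraction mentioned above, a minimal sketch might look like the following; the grey-level frames stored as nested lists and the intensity threshold are illustrative assumptions.

```python
def background_subtract(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when it deviates from the static
    background model by more than `threshold` intensity levels."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 10],   # a bright object entered the scene
         [10, 180, 10]]
mask = background_subtract(frame, background)
```

Real systems use an adaptive background model rather than a single static frame, which is precisely why variable lighting makes this technique unstable.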
function for effective comparison of gait samples. Several functions that compare
various combinations of extracted signals are suggested to determine the
similarity of gait samples with respect to the presented model. A secondary
contribution lies in the proposal of a novel method for the automatic determination of
walk cycles within extracted signals and their consecutive time normalization.
In addition, the importance of the entire preprocessing is evaluated on a real-life
human motion database.
Walking subjects are assumed to be discovered with their gait variables8
captured. The work focuses on the preprocessing of gait variables and the design of a
reasonable similarity function for their comparison. Experiments are employed
to reveal the optimal settings for the similarity function.
After a short introduction to the principles of biometrics and gait recognition
in Chapter 1, a rather exhaustive survey of existing approaches to human gait
recognition is given in Chapter 2. The model is defined in Chapter 3, together
with a specification of which gait parameters to measure and how to compare them.
The normalization approach for achieving a high similarity of related signals is
described in Chapter 4, and its importance is evaluated with optimized settings
in Chapter 5. Chapter 6 serves as a summary of the thesis, picking out the spots
of imperfection that are nominated as future research issues.
2 Related Work
Early research was motivated by Johansson's [17] and Barclay's [2] psychological
experiments, in which participants were able to recognize the type of movement
of pedestrians simply from observing the 2D motion pattern generated
by light bulbs attached to several joints of their bodies. The results are consistent
with physiological measurements. These experiments demonstrated that gait is
personally unique and can be used for biometric recognition. Similar experiments
later showed some evidence that the identity of a familiar person (friend)
[8], as well as gender [2], might be recognizable.
Because of the complexity of the field, most approaches analyze gait only
from the side view, without exploring the variation in measurements caused
by differing view angles. Previous results on automatic gait recognition looked
promising [6, 12, 4]; however, the subject databases used for testing were
typically small (often fewer than ten people). Subsequent work has to be carried
out on larger databases to ascertain how effective identification methods will
be with a large-scale population. Even when operating on small databases,
the success rate is reported as percent correct, that is, the fraction of trials in
which the system correctly recognizes the individual by choosing its best
match. Such a result gives little insight into how the technique might scale when
a database contains hundreds or more people.
Generally for any method, the recognition rate decreases with increasing
number of subjects and increases with increasing number of gait sequences
recorded for a single subject. Some of the large gait databases widely used in
academic research are:
• CASIA Gait Database1 (Dataset B) [34]
• CMU Motion of Body Database2 [13]
• USF HumanID Gait Baseline Database3 [25]
In spite of some covariates in view angle, shoe type, and carrying conditions,
these large databases have been built for massive human identification.
Regarding the challenging factors, such as view angle, clothing, or walking
speed, much related research has been published, and most methods work
accurately under a controlled environment and walking style. However, existing
approaches are far from perfect. For example, it is difficult to track a pedestrian
in a crowd, and gait features may be extremely inaccurate if the camera is shaking or
the weather changes dramatically.
influenced by light and noise. The classifier was built simply by treating feature
vectors as points in Eigengait space7, and a test sample was classified by
determining its 5-NN, using the Euclidean distance and simple majority as a decision
rule. Experiments with recognition rates up to 93% have shown that people
maintain similar plots on different video sequences and that the plots differ
from person to person, giving evidence of their possible use in a gait
classifier.
Figure 2.2: Self-similarity plots (right) for the walk sequences (left) [3].
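The 5-NN majority-vote decision rule can be sketched as follows; the Eigengait projection itself is omitted, and the feature vectors, labels, and gallery below are made-up illustrations.

```python
import math
from collections import Counter

def knn_classify(query, gallery, k=5):
    """Classify a feature vector by Euclidean k-NN majority vote.
    `gallery` is a list of (feature_vector, label) pairs."""
    nearest = sorted(gallery, key=lambda pair: math.dist(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

gallery = [((0.0, 1.0), "A"), ((1.0, 0.0), "A"), ((1.0, 1.0), "A"),
           ((9.0, 9.0), "B"), ((8.0, 8.0), "B")]
label = knn_classify((0.0, 0.0), gallery)
```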
Some other discrete methods aim to represent gait cycles with a Hidden
Markov Model (HMM). The states of these systems are not directly observable,
but they generate output based on probability. Kale et al. [18] used an HMM to
capture the information from a gait sequence and to recognize individuals. The
postures that an individual adopts can be regarded as the states of the HMM
and provide a means of discrimination. Gait was represented by the width
of the outer contour of the binarized silhouette. An HMM was trained for each
person, and gait recognition was then performed by evaluating the probability
that a given observation sequence was generated by a particular HMM. On the
CMU database, the right person is in the top three matches (3-NN) 90% of
the time for the cases where the training and testing sets correspond to the
same walking styles.
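The probability evaluation described above can be sketched with the standard forward algorithm for a discrete HMM; the tiny two-state model below is a made-up illustration, not the width-of-silhouette model from [18].

```python
def forward_probability(obs, pi, A, B):
    """P(observation sequence | HMM) via the forward algorithm.
    pi[i]: initial state probability, A[i][j]: transition probability,
    B[i][o]: probability of emitting symbol o in state i."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Deterministic toy HMM that alternates between two postures (symbols 0 and 1).
pi = [1.0, 0.0]
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
p_alternating = forward_probability([0, 1, 0, 1], pi, A, B)  # matches the model
p_constant = forward_probability([0, 0, 0, 0], pi, A, B)     # does not
```

Recognition then amounts to evaluating the query sequence under each person's trained HMM and returning the identity with the highest probability.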
Han and Bhanu [14] proposed a new spatio-temporal gait representation,
called the Gait Energy Image (GEI), for individual recognition by gait. Unlike
other gait representations, which consider gait as a sequence of templates
(poses), GEI captures a human motion sequence in a single image while
preserving temporal information. Real and synthetic templates are generated
directly from training silhouette sequences, the latter by simulating silhouette
distortion (see Figure 2.3). The intensity of each pixel of a GEI reveals the
duration the foreground stays at that position. Experimental results on the USF HumanID Database
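The GEI construction can be sketched as a per-pixel average of size-normalized binary silhouettes; the tiny 2x2 frames below are purely illustrative.

```python
def gait_energy_image(silhouettes):
    """Average a sequence of equally sized binary silhouettes into one
    grey-level image; brighter pixels stay foreground for longer."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[r][c] for s in silhouettes) / n for c in range(cols)]
            for r in range(rows)]

frames = [[[1, 0], [1, 1]],
          [[1, 1], [0, 1]]]
gei = gait_energy_image(frames)
```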
Angular velocity10 [27] or trajectories11 [5, 30] are measured on the body model
as it deforms over the walking sequence. They are typically mapped to some
low-dimensional feature vectors and compared by various sophisticated
classification techniques.
As opposed to silhouettes, models can reliably handle occlusion (especially
self-occlusion), noise, scale, and rotation. They offer the ability to derive gait
signatures directly from model parameters. Another considerable advantage is
that extra evidence gathering techniques can be used across the whole sequence
before model fitting. Mapping to models also helps reduce the data
dimensionality. These methods are easy to understand; however, their effectiveness is
still limited by imperfect vision techniques, especially in body structure
modeling12. Moreover, the comparison and searching procedures tend to be complex
and computationally costly.
The first model-based approach to gait biometrics was by Cunado et al.
[7]. Their simple structure modelled a leg as interlinked pendulums. The gait
signature was derived from the angular rotation of the hip and knee, as depicted
in Figure 2.5, by multiplying the phase and magnitude components of the
Fourier description. Assuming that gait is symmetric, only one leg was
modelled and the other one was calculated with a phase lock of a half-period shift.
This method could withstand differing amounts of noise and occlusion, and
achieved a recognition rate of 100% on a database of 10 subjects.
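A phase-weighted magnitude signature of this kind can be sketched from a uniformly sampled rotation signal; the number of harmonics and the per-harmonic magnitude-times-phase product are illustrative assumptions, not the exact construction in [7].

```python
import cmath, math

def fourier_signature(angles, n_harmonics=3):
    """Phase-weighted magnitudes of the first few Fourier harmonics
    of a rotation signal sampled over one gait cycle."""
    n = len(angles)
    signature = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(angles[f] * cmath.exp(-2j * cmath.pi * k * f / n)
                    for f in range(n)) / n
        signature.append(abs(coeff) * cmath.phase(coeff))
    return signature

# One pure sine cycle: all signature energy sits in the first harmonic.
angles = [math.sin(2 * math.pi * f / 64) for f in range(64)]
sig = fourier_signature(angles)
```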
Research made by Dockstader et al. [10, 11] presents a structural approach
toward 3D tracking and extraction of gait from human motion. For the struc-
10. Also called rotational velocity, it is a quantitative expression of the amount of rotation
that a spinning object undergoes per unit time.
11. Path that a moving object follows through space as a function of time.
12. Tracking and labelling human body.
Figure 2.5: Structural model of a leg [7]. (a) Upper and lower pendulum rep-
resents the thigh and the calf, respectively, connected at the knee joint. (b)
Fourier description of rotation-phase dependency.
ture of the human model, they used a set of thick lines joined at points to
represent the legs and a periodic pendulum motion to describe the gait (see
Figure 2.6). A 15-parameter stick model (p1, ..., p15) and a 10-parameter
bounding volume (p16, ..., p25) are defined. Each component of the model is measured
in 3D, body-centered coordinates. It is assumed that during a normal gait cycle
the body moves forward, tangentially to the transverse (TP, x-y) and sagittal
(SP, x-z) planes and orthogonally to the coronal plane (CP, y-z). Points in this
space are indicated by a vector (x, y, z). To further limit the variability and to
increase the tracking accuracy of the body model, they introduced the concept
of additional hard and soft kinematic constraints13.
Figure 2.6: Structural object model with its kinematic constraints [10].
13. Bounds on the velocity and acceleration magnitudes, spanning distances and scaling.
Figure 2.7: Left knee trajectories before (top) and after (bottom) time
normalization [27].
14. Trajectory’s time axis is linearly converted from the experimentally-recorded time units
to an axis representing percentage of the gait cycle.
15. Used mainly in databases, the term refers to detecting and correcting corrupted records.
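The linear time-axis conversion described in footnote 14 can be sketched as a resampling of a signal onto a fixed number of samples spanning 0–100% of the gait cycle; the pure-Python linear interpolation and the target length below are illustrative assumptions.

```python
def time_normalize(signal, target_len):
    """Linearly resample `signal` so its time axis becomes `target_len`
    evenly spaced samples covering the whole gait cycle."""
    n = len(signal)
    normalized = []
    for i in range(target_len):
        t = i * (n - 1) / (target_len - 1)  # position on the original frame axis
        lo = int(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        normalized.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return normalized

resampled = time_normalize([0.0, 2.0, 4.0], 5)
```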
Figure 2.8: [30] (a) Examples of 2D trajectories. (b) Notion of the LCSS match-
ing within the gray region.
Figure 2.9: Sample of the USF data set [16]. (a) Original frame, (b)
Automatically-extracted silhouette, (c) Manually-labelled silhouette.
Figure 2.10: Extraction of the stick figures from body contours [33].
Figure 2.11: [32] (a) 3D human body model approximated by 12 segments
and (b) its hierarchical structure.
2.3 A Comprehensive Overview of Previous Methods
Research in human gait recognition has experienced a boom over the past 15 years.
As far back as the 1990s, scientists were working with systems of > 90% recognition
rate. Despite tiny databases, the research attracted considerable attention.
Current research is indeed more sophisticated and the methods developed are more
sensitive. Comparing the old methods with the new ones serves not only as an
indicator of meaningful research, but also reflects the natural interest of
some researchers.
There are plenty of factors to consider in comparisons, but the questions most
often asked concern the success rate of the recognition procedure. Related to that
parameter, the number of walking subjects and gait samples plays an important role.
One could invent a 100% method, albeit tested only on a database with
2 persons. On the other hand, another could work harder and introduce an
80% method examined on hundreds of subjects. Hence, research conditions are
highly significant for the relevance of performance expressed as a single
number. In order to perform a completely fair performance survey, the
implementations of all methods should be executed on the very same database,
one that could offer gait data in particular representations and could process any
particular query.
Unfortunately, it is impossible to test all methods on a single database,
which may limit the informative value of this survey.
Some methods focus on a traditional style of walking, some work with iden-
3 Gait Variables Representation
\[ M = (p_1, \ldots, p_m), \]
\[ M = (S_L, S_R, E_L, E_R, H_L, H_R, L_L, L_R, K_L, K_R, F_L, F_R), \]
\[ P_f = [x_f, y_f, z_f] \in \mathbb{R}^3 \]
captured at a given video frame f ∈ F. The relative frame (discrete time)
domain F = [1, l] ⊂ ℕ avoids operating with concrete time instances of an absolute
time domain. The number l is relative and refers to the length of the input video
in terms of the number of frames, i.e., to the number of instances at which a specific
body point has been captured. This enables us to use the above relative notation of
body points. The only condition is a constant capture frequency, corresponding
to a fixed time difference between the capture times of two consecutive video frames.
\[ T_P = \{ P_f \mid f \in F \}. \]
The discrete frame domain F, limited from above by the trajectory's relative
length l = |T_P|, allows us to utilize metric functions for a point-by-point
comparison of trajectories.
[Figure: anatomical landmarks on the body model and the extracted 3D trajectories (axes x, y, z).]
Extracted trajectories cannot be directly used for recognition because the
values of their spatial coordinates depend on the system calibration that detects
and estimates the particular coordinates. Moreover, persons do not walk in the
same direction, which makes trajectories of different walks (even of the same
person) incomparable. We rather compute distances between selected pairs of
trajectories.
A distance-time dependency signal (DTDS) expresses how the distance
between two trajectories changes over time as the person walks. It is the variation
in these distances that is primarily exploited as identity information. Such
signals are already independent of the walk direction and the system
calibration.
The distance is always measured between two points P_f = [x_f, y_f, z_f] and
P'_f = [x'_f, y'_f, z'_f] captured at the same video frame, on the basis of the Euclidean
distance
\[ L_2(P_f, P'_f) = \sqrt{(x_f - x'_f)^2 + (y_f - y'_f)^2 + (z_f - z'_f)^2}. \]
For two given points P and P' that travel over trajectories T_P and T_{P'} of
an identical domain F, we formally define the corresponding DTDS as a function
S_{PP'} : F → ℝ⁺₀,
\[ S_{PP'}(f) = L_2(P_f, P'_f). \]
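Under this definition, a DTDS can be computed frame by frame from two synchronized trajectories; a minimal sketch, assuming trajectories are stored as lists of 3D points:

```python
import math

def dtds(traj_p, traj_q):
    """Distance-time dependency signal of two trajectories sharing the
    same frame domain: the Euclidean distance at every video frame."""
    assert len(traj_p) == len(traj_q), "trajectories must share the frame domain"
    return [math.dist(p, q) for p, q in zip(traj_p, traj_q)]

left_foot = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
right_foot = [(3.0, 4.0, 0.0), (1.0, 0.0, 5.0)]
signal = dtds(left_foot, right_foot)
```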
\[ \forall f \in F : S_{PP}(f) = L_2(P_f, P_f) = 0 \;\Rightarrow\; S_{PP} = F \times \{0\} \]
A person's walk identity consists of several DTDSs that describe the person's
style of walking. In the following, we present a methodology for measuring the
similarity between two signals of the same point pairs.
Having measured two DTDSs of an identical pair of points, the main issue
is to propose an effective method for their comparison. After reading Chapter 2.2,
an attentive reader is familiar with some notable trajectory-comparing
distance functions that can be applied to signals as well. We could, for example,
use functions such as EDR [5] or LCSS [30] that take local time shifting and noise
into account. However, with the help of DTDS normalization (see Chapter 4)
we are not forced to use such sophisticated and complex distance functions.
Even the simplest pointwise difference (Manhattan distance) gives a sound
result.
Extracted DTDSs within a single walking sequence always have the same
length and number of footsteps. However, we need to compare signals extracted
from different walks, which can have significantly different lengths and footstep
counts. The problem of different lengths could simply be solved by truncating the
longer signal to the length of the shorter one. Thinking about possible inaccuracies,
however, two long signals that are very similar can have the same distance as two
short signals that are rather different. Fortunately, signal normalization resolves
this unwanted scenario by unifying the frame domains. We can then use the
following distance for measuring the dissimilarity of two signals S and S' defined
on a (normalized) frame domain Φ = [1, λ] of a fixed length λ:
\[ \delta(S, S') = \frac{1}{\lambda} \sum_{f \in \Phi} |S(f) - S'(f)|. \]
This function sums the point-by-point differences between two specific DTDSs and computes the average difference per frame. It returns 0 if the DTDSs are identical, and their similarity decreases as the return value increases.
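The distance δ above can be sketched in a few lines of Python; the sketch assumes signals are given as equal-length sequences of distances over the normalized frame domain (the function name is ours, introduced for illustration):

```python
def dtds_distance(s, s_prime):
    """Average pointwise difference of two DTDSs defined on the
    same normalized frame domain of length lambda."""
    if len(s) != len(s_prime):
        raise ValueError("signals must share the frame domain")
    lam = len(s)
    # Sum per-frame absolute differences, then average per frame.
    return sum(abs(a - b) for a, b in zip(s, s_prime)) / lam
```

For example, two identical signals are at distance 0, while `dtds_distance([0, 0], [1, 3])` averages the per-frame differences 1 and 3 to 2.0.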
a) Non-negativity:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) ≥ 0
Let S and S′ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) = (1/λ) Σ_{f∈Φ} |S(f) − S′(f)| ≥ 0,
since 1/λ > 0 and each summand |S(f) − S′(f)| ≥ 0.
b) Identity of Indiscernibles:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) = 0 ⇔ S = S′
c) Symmetry:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) = δ(S′, S)
Let S and S′ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) = (1/λ) Σ_{f∈Φ} |S(f) − S′(f)| = (1/λ) Σ_{f∈Φ} |S′(f) − S(f)| = δ(S′, S).
d) Triangle Inequality:
∀ S, S′, S″ ∈ Φ × R+_0 : δ(S, S″) ≤ δ(S, S′) + δ(S′, S″)
Let S, S′ and S″ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) + δ(S′, S″) = (1/λ) Σ_{f∈Φ} (|S(f) − S′(f)| + |S′(f) − S″(f)|) ≥ (1/λ) Σ_{f∈Φ} |S(f) − S″(f)| = δ(S, S″),
since |S(f) − S′(f)| + |S′(f) − S″(f)| ≥ |S(f) − S″(f)| holds for each f ∈ Φ.
where each S_{M_i M_j} refers to the DTDS of the body points on the corresponding coordinates i and j of the model M.
The following equivalence is defined so that we can treat relative gait patterns. Given two gait patterns G and G′, we put
G ∼ G′
complicated, or the (m choose 2) comparisons would consume an unacceptable amount of time. The reason is that recognizing with the use of too many criteria may have a negative influence on the success rate. This structure therefore allows us to assign weights to individual DTDSs and thus reflect the importance of a given DTDS in the process of recognition. Assigning zero weight to a DTDS excludes it from recognition entirely. The question of how the weights should be set in order to maximize the recognition rate is a subject of this research. Hence, the experiments in Chapter 5 were employed to determine the weight scaling of the DTDSs selected as components of the gait pattern structure.
In the following, we present a methodology for measuring similarity between
two gait patterns.
We introduce a novel similarity function for comparing two gait patterns G and
G 0 of an m-parameter model. This function is based on aggregation of weighted
signal distances and is generally defined as a weighted sum
D_w(G, G′) = (1 / (m choose 2)) Σ_{i=1}^{(m choose 2)} w_i δ(G_i, G′_i),
where δ(G_i, G′_i) is the distance of the DTDSs at the i-th coordinate of both gait pattern structures. The weights should be assigned with the properties
∀ i ∈ [1, (m choose 2)]_N : w_i ∈ [0, 1]_R  and  Σ_{i=1}^{(m choose 2)} w_i = 1.
The set of gait patterns together with the gait distance function D_w then forms a metric space.
Proof. Metricity of the gait distance function Dw follows directly from metric-
ity of the DTDS distance function δ at each coordinate of the gait pattern
structure, which is proven in Lemma 3.
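To make the aggregation concrete, here is a minimal Python sketch of the weighted gait distance D_w, assuming a gait pattern is represented as a list of equal-length DTDSs (one per point pair) and that the weights already satisfy the properties above; the helper name `delta` mirrors δ and is ours:

```python
def gait_distance(G, G_prime, w):
    """Weighted aggregation D_w of per-coordinate DTDS distances.

    G, G_prime: gait patterns, i.e. lists of C(m,2) DTDSs, each DTDS
                a sequence of distances over the normalized frame domain
    w:          weights in [0, 1] summing to 1
    """
    def delta(s, s_prime):
        # average pointwise difference of two equal-length signals
        return sum(abs(a - b) for a, b in zip(s, s_prime)) / len(s)

    n = len(G)  # number of point-pair coordinates, C(m, 2)
    return sum(w[i] * delta(G[i], G_prime[i]) for i in range(n)) / n
```

Assigning zero weight to a coordinate removes its DTDS from the aggregation, exactly as described above.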
The DTDSs within a single walk identity always have the same length and number of footsteps. However, a recognition algorithm has to compare DTDSs of identities which can have significantly different lengths and footstep counts.
Use of the introduced metric function is meaningful only when both compared signals contain the same number of footsteps and start at the same phase of the walking process; otherwise, such heterogeneous signals are semantically incomparable by standard distance functions (e.g., Manhattan, Euclidean, or DTW). Consequently, we normalize extracted DTDSs with respect to duration and walk cycle phase before similarity comparison.
4 Normalization of Signals
A walk cycle is a sequence of poses between the exact same repetitive events of walking. The cycle can be defined to start at any moment; for instance, if it starts with the right foot contacting the ground, then the cycle ends when the right foot makes contact again. We distinguish two major phases, stance and swing. The stance phase is the part of the cycle when the foot is in contact with the ground. It comprises 62% of the cycle, beginning with the initial foot strike and ending with toe-off. The swing phase occurs when the foot is in the air and comprises 38% of the cycle, beginning with toe-off and ending with the second foot strike. The motion between successive points of contact of the same foot is called a stride, and the motion between consecutive heel strikes of opposite feet is a footstep. A complete walk cycle (see Figure 4.1) thus includes two footsteps, and human gait is a form of periodic motion.
Figure 4.1: Poses of one walk cycle: (a) Pose 1, (b) Pose 2, (c) Pose 3, (d) Pose 4, (e) Pose 1.
Considering the fact that human walking is periodic, this process splits an input feet DTDS S_{FL FR}¹ into individual walk cycles. To extract the requested walk cycle, we need to identify the inceptions of individual footsteps.
The inception of each footstep must begin at the same phase. Any phase can be chosen; however, some phases are more striking and easier to describe and measure. We select the moment when the person’s legs are closest to each other as the footstep inception, as outlined in Figure 4.1(a). By visual inspection of the signal we can see a sequence of hills and valleys (see Figure 4.2). Each hill represents a subcycle of the points drawing apart and their consecutive approach. As a walk cycle inception, we selected the moment
1. The feet signal of our 12-parameter model introduced in Chapter 3.1, S_{FL FR}, represents the changing distance between the left (FL) and right (FR) foot as the person walks.
Figure 4.2: Normalization of two feet DTDSs with a different number of footsteps (each hill represents a single footstep), plotted as distance over time [29]. (a) represents these signals without normalization. (b) denotes identified minima of each signal. (c) constitutes just the first walk cycle of the signals, starting with the move of the left foot ahead. (d) shows the extracted walk cycles after time normalization.
when the person’s legs are close to each other. In the hill-valley notation, this phase corresponds to the lowest tip of a valley (a minimum). The reason is that even miniature footsteps are then easier to determine. The values of such minima within the signal are very similar, because the feet pass at a similar (almost identical) distance at each footstep. To determine footsteps, we identify all the minima within the feet signal. We cannot rely on the signal containing minima at a fixed distance, nor on it being ideally smooth, i.e., without any undulation caused by measurement errors. This is the reason we cannot use traditional find-minima algorithms.
The algorithm for finding minima is described in Algorithm 1. The algorithm processes the feet signal S_{FL FR} of frame domain F and returns the array of frames f ∈ F corresponding to identified minima that are smaller than some threshold. The threshold defines the maximum distance which the signal at an identified minimum can reach. This threshold should be assigned in the form of a percentile β of the distances of the input feet signal, with respect to walking conditions. The feet signal is scanned from the beginning, and the first frame f with a distance S_{FL FR}(f) smaller than the threshold becomes a candidate for the minimum. If there is a better candidate f′ ∈ [f + 1, f + α]_N with a smaller distance S_{FL FR}(f′) < S_{FL FR}(f), it replaces the previous candidate f. The search for a better candidate continues until no candidate with a smaller distance can be found within the next α frames. The last candidate f is declared as the minimum, the nearest frame f′
Serializing individual footsteps would extract the walk cycle while disregarding which foot undertakes a given footstep. Note that the pose of passing feet can continue with either the left or the right foot going forward, which corresponds to two different places within a walk cycle. From the physiological point of view, not only injured but also healthy human walking is never perfectly balanced. The occurrence of tiny differences between left and right footsteps results in a different characteristic of the feet signal for the left and the right foot, and often reveals discriminatory information about the gait pattern. Hence we need to determine which foot undertakes the following footstep at this phase; in particular, the objective is to extract a single walk cycle that always starts with the move of the left foot ahead. The walk cycle is then composed of the footstep of the left foot and the consecutive footstep of the right foot.
The first 4 frames of our minima array, f0, f1, f2 and f3, are utilized to form the requested walk cycle. To identify the first footstep of the left leg, we analyze the signal S_{KL FR} that constitutes the changing distance between the left knee and the right foot. At the moment the feet pass each other, this signal achieves a higher value when the left foot is moving ahead than in the opposite situation when the right foot is moving ahead. Thus we can decide whether a given footstep was undertaken by the left or the right foot. This way, if the condition S_{KL FR}(f0) > S_{KL FR}(f1) is met, each signal in our gait pattern is cropped according to the frames f0 and f2 (these are the frames where the first and third minima of the feet signal were found). Otherwise, signals are cropped according to the frames f1 and f3. Then a requested walk cycle is extracted either as the first two footsteps, or as the second and third footstep, both preserving that the first footstep is undertaken by the left foot.
The pseudocode is described in Algorithm 2.
Algorithm 2 Walk cycle identification starting with the move of left foot
ahead.
Input: Arbitrary DTDS S, the DTDS SKL FR and minima array minima.
Output: Signal of one walk cycle starting with the move of left foot ahead.
IdentifyWalkCycles(S,SKL FR ,minima)
1: S 0 ← ∅
2: if SKL FR (minima[0]) > SKL FR (minima[1]) then
3: for f ← 1 to minima[2] − minima[0] do
4: S 0 ← S 0 ∪ {(f, S(f + minima[0]))}
5: end for
6: else
7: for f ← 1 to minima[3] − minima[1] do
8: S 0 ← S 0 ∪ {(f, S(f + minima[1]))}
9: end for
10: end if
11: return S 0
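Algorithm 2 translates to Python almost line by line; this sketch assumes DTDSs are stored as dicts mapping frame numbers to distances, and uses the left-foot condition in the form stated in the text (S_KLFR(f0) > S_KLFR(f1) means the left foot moves ahead first):

```python
def identify_walk_cycle(S, S_KLFR, minima):
    """Crop one walk cycle starting with the left foot moving ahead.

    S:       the DTDS to crop, a dict {frame: distance}
    S_KLFR:  left-knee-to-right-foot DTDS on the same frame domain
    minima:  frames of the first four footstep inceptions
    """
    # A higher knee-foot distance at the first minimum means the
    # left foot moves ahead there; otherwise skip one footstep.
    if S_KLFR[minima[0]] > S_KLFR[minima[1]]:
        start, end = minima[0], minima[2]
    else:
        start, end = minima[1], minima[3]
    # Re-index the cropped walk cycle to start at frame 1.
    return {f: S[f + start] for f in range(1, end - start + 1)}
```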
5 Experimental Evaluation
5.1 Database
1. http://mocap.cs.cmu.edu
5.2 Methodology
and achieved rational values from [0, 1]_Q. Here 1_{ς_w(G) ∼ G} denotes the indicator function that returns 1 if ς_w(G) and G are gait patterns of the same person, and 0 otherwise.
Having the recognition rates of all possible weight distributions available,
we could theoretically define the optimal weight vector as the weight vector of
the maximal success rate
ω = arg max_{w ∈ R^{(m choose 2)}} ρ(w),
however, algorithmically speaking, the program would have to browse an uncountable number of weight vectors. Moreover, even with binary² weights and our m = 12, the program would need to compute 2⁶⁶ recognition-rate values. The amount of time consumed by such an algorithm is clearly unacceptable, therefore we need to shrink the (m choose 2)-dimensional space of weight distributions and establish a so-called hops count for partitioning the real-valued weight interval.
Each DTDS is assigned a number of up to η weight units (including 0) that determines its relative weight with respect to the other weights. For each assignment of weight units to DTDSs, a weight vector is constructed by dividing all the counts by their sum (to ensure the weight vector properties) and tested for its success rate on the given database. We see that each DTDS is assigned
2. The weight achieves the value of either 0 or (m choose 2)^{-1}.
Figure: Example DTDSs of four point pairs (FOOT_LEFT and FOOT_RIGHT, KNEE_LEFT and FOOT_LEFT, SHOULDER_RIGHT and HAND_RIGHT, HAND_LEFT and HAND_RIGHT), plotted as distance in cm over time in frame numbers.
Table 5.1: Examined priority DTDSs sets with their hops counts.
Table 5.1, the database was modified into several versions, each gait pattern containing only the appropriate set of priority DTDSs. Moreover, entering an iteration of Algorithm 5 and processing the iteration-th relative weight vector v is conditional on whether the corresponding weight vector w has already been used as a parameter for calculating the success rate (in Algorithm 6). Entrance to an iteration is granted only in the positive cases of checking the condition

gcd_{i=1}^{(m choose 2)} v_i = 1,
which means that the relative weight vector v is not a multiple of another previously used relative weight vector, which would correspond to a weight vector already used for success rate calculation.
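The gcd condition can be checked with a couple of helper functions; this is a minimal sketch under the assumption that relative weight vectors are lists of non-negative integer unit counts (function names are ours, introduced for illustration):

```python
from math import gcd
from functools import reduce

def is_canonical(v):
    """True iff the unit counts in v have gcd 1, i.e. v is not a
    multiple of a previously evaluated relative weight vector."""
    nonzero = [c for c in v if c > 0]
    return bool(nonzero) and reduce(gcd, nonzero) == 1

def normalize(v):
    """Divide unit counts by their sum to obtain a weight vector
    with weights in [0, 1] summing to 1."""
    total = sum(v)
    return [c / total for c in v]
```

For instance, `[2, 4, 0]` is skipped because it normalizes to the same weight vector as the already processed `[1, 2, 0]`.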
In the remainder of this section, we provide and discuss the results of the evaluation experiments.
5.3 Results
Table 5.2 shows the results of searching for optimal weight vectors in the scenarios assigned in Table 5.1. Each line represents a single experiment with a given priority DTDS set and hops count. For transparency, the dimension of the resulting optimal weight vector is shrunk to |Π| positions; hidden values are explicitly set to zeros. The table also shows the maximum success rate as the ratio of correctly recognized gait patterns over the number of query gait patterns.
Note that the higher the hops count, the more accurate the results, but also the more of them there are. Since the success rate is calculated over the rather small query set of size 48, there are only 49 different values that the success rate of a given weight vector can achieve. Hence, the higher the hops count
Table 5.2: Optimal weight vectors and their success rates for the examined trials.
we set, the more weight vectors of equal success rate there are. This means that multiple optimal weight vectors were reported for each scenario. The most balanced (keeping the most symmetric DTDS weights equal) and compact (with the largest number of zero weights) vector was selected as the representative among all the weight vectors of the highest success rate.
In trial #1 of 8 signals, we see a fairly spread weight distribution, slightly dominated by the hand-foot signals, while the knee and feet signals became significantly recessed. The more massive trial #2 achieved a higher success rate by employing only 5 out of 15 signals that are rather asymmetric (left-elbow-right-hand, left-hand-right-foot, right-hand-right-loin, and both same-side loin-foot signals); moreover, just 3 of them (left-elbow-right-hand, right-hand-right-loin and right-loin-right-foot) achieved the same recognition rate in trial #3, which despite its asymmetry seems like an economical solution. Trials #4 and #5 were evaluated to complete the experiments over the entire DTDS character spectrum. Balanced weights of the almost static cross-side loin-knee signals and the solitary static loins signal scored the highest rates.
An attentive reader observes that the presence of relatively static signals leads to a higher recognition rate. The highly fluctuating signals are usually assigned zero weight, which means that they harm recognition when used. This leads to the unexpected conclusion that, in order to increase the recognition rate, static signals should be utilized instead. However, measuring human skeletons without monitoring any dynamic parameter of walking would contradict the idea of gait recognition.
The recognition rates on the employed database are rather satisfactory, although the trend of static signals being more discriminatory than the dynamic ones is a result of either the small database size or an inappropriate choice of distance functions. Both of these issues are subjects of our future research.
6 Conclusions and Future Research
Bibliography
[1] Aravecchia M., Calderara S., Chiossi S., and Cucchiara R., A Videosurveillance Data Browsing Software Architecture for Forensics: From Trajectories Similarities to Video Fragments, Proceedings of the 2nd ACM Workshop on Multimedia in Forensics, Security and Intelligence, ACM, New York, 2010.
[2] Barclay C., Cutting J., and Kozlowski L., Temporal and Spatial Factors in Gait Perception That Influence Gender Recognition, Perception and Psychophysics, vol. 23, no. 2, pp. 145–152, 1978.
[3] BenAbdelkader C., Cutler R., Nanda H., and Davis L., EigenGait: Motion-Based Recognition of People Using Image Self-Similarity, Proceedings of the Third International Conference on Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, London, pp. 284–294, 2001.
[4] Boyd J.E. and Little J.J., Biometric Gait Recognition, Advanced Studies in Biometrics, pp. 19–42, 2005.
[5] Chen L., Özsu M.T., and Oria V., Robust and Fast Similarity Search for Moving Object Trajectories, Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 491–502, 2005.
[6] Cunado D., Nixon M.S., and Carter J.N., Using Gait as a Biometric, via Phase-Weighted Magnitude Spectra, 1st International Conference on Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, pp. 95–102, 1997.
[7] Cunado D., Nixon M.S., and Carter J.N., Automatic Extraction and Description of Human Gait Models for Recognition Purposes, Computer Vision and Image Understanding, vol. 90, no. 1, pp. 1–41, 2003.
[8] Cutting J. and Kozlowski L., Recognizing Friends by Their Walk: Gait Perception Without Familiarity Cues, Bulletin of the Psychonomic Society, pp. 353–356, 1977.
[10] Dockstader S.L., Bergkessel K.A., and Tekalp A.M., Feature Extraction for the Analysis of Gait and Human Motion, Proceedings of the 16th International Conference on Pattern Recognition, vol. 1, pp. 5–8, 2002.
[11] Dockstader S.L., Berg M.J., and Tekalp A.M., Stochastic Kinematic Modeling and Feature Extraction for Gait Analysis, IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 962–976, 2003.
[12] Foster J.P., Nixon M.S., and Prugel-Bennett A., New Area Based Gait Recognition, Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, pp. 312–317, 2001.
[13] Gross R. and Shi J., The CMU Motion of Body (MoBo) Database, Robotics Institute, Pittsburgh, PA, Technical Report CMU-RI-TR-01-18, June 2001.
[14] Han J. and Bhanu B., Individual Recognition Using Gait Energy Image, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 316–322, 2006.
[15] Huang P.S., Harris C.J., and Nixon M.S., Human Gait Recognition in Canonical Space Using Temporal Templates, IEE Proceedings on Vision, Image and Signal Processing, vol. 146, no. 2, pp. 93–100, 1999.
[17] Johansson G., Visual Perception of Biological Motion and a Model for Its Analysis, Perception and Psychophysics, vol. 14, no. 2, pp. 201–211, 1973.
[18] Kale A., Sundaresan A., Rajagopalan A.N., Cuntoor N.P., Roy-Chowdhury A.K., Krüger V., and Chellappa R., Identification of Humans Using Gait, IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1163–1173, 2004.
[19] Kamruzzaman J. and Begg R.K., Support Vector Machines and Other Pattern Recognition Approaches to the Diagnosis of Cerebral Palsy Gait, IEEE Transactions on Biomedical Engineering, vol. 53, no. 12, pp. 2479–2490, 2006.
[23] Niyogi S. and Adelson E., Analyzing and Recognizing Walking Figures in XYT, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, pp. 469–474, 1994.