Masaryk University
Faculty of Informatics
Master's Thesis
Michal Balážia
Brno, 2013
Declaration
Hereby I declare that this thesis is my original authorial work, which I have
worked out on my own. All sources, references, and literature used or excerpted
during the elaboration of this work are properly cited and listed with complete
reference to the due source.
Michal Balážia
Acknowledgement
First and foremost, I thank the colleagues who motivated me to produce this work:
Honza Sedmidubský, for bringing up the topic and providing many helpful
insights and much encouragement, and, together with Kuba Valčík, for their beneficial
input and inspiring work environment; and Professor Pavel Zezula, for finding ways
to aid my studies. I thank all my teachers for teaching me, my
family for loving me, and my friends for supporting me. I thank everyone for
doing their best in their respective places.
Contents
1 Introduction
1.1 Biometrics
1.2 Gait
1.3 Scope of Thesis
2 Related Work
2.1 Appearance-Based Approach
2.2 Model-Based Approach
2.3 A Comprehensive Overview of Previous Methods
3 Gait Variables Representation
3.1 Model Definition
3.2 Trajectories of Movement
3.3 Distance-Time Dependency Signals
3.4 Similarity of DTDSs
3.5 Gait Pattern
3.6 Methods of Gait Comparison
4 Normalization of Signals
4.1 Identifying Footsteps
4.2 Identifying Walk Cycles
4.3 Time Normalization
5 Experimental Evaluation
5.1 Database
5.2 Methodology
5.3 Results
6 Conclusions and Future Research
Abstract
This thesis concentrates on recognizing persons according to the way they walk
(gait). The primary objective is to design and implement a new method for
measuring similarity of gait features that are extracted from trajectories of 3D
coordinates of body components moving in time. The proposed method introduces
signals of the distance-time dependency of body point pairs during walking
as the basic unit of a gait pattern. The signals are subsequently normalized with
respect to a novel distance function. Parameters of the distance function are
optimized with regard to the recognition rate, and the achieved results are
experimentally evaluated.
The textual part contains a comprehensive survey of existing approaches
to human gait recognition, as well as a description of a suitable model of human
movement represented by 3D trajectories, a method for extracting gait
features, a proper similarity function for effective comparison of the extracted
features, and results obtained by analyzing the recognition rate.
Keywords
1 Introduction
In a world of rising terrorism and free movement of criminals, people clearly
understand the importance of security monitoring and control for the purposes
of national defense and public safety. Early-warning systems already have a
large number of surveillance cameras installed in public areas, but they require
intelligent approaches to human recognition. A truly useful system would
analyze the collected video data and raise an alert before an adverse event
happens. Upon detecting abnormal behavior, the system could instantly identify
all scene participants, rapidly investigate their previous activities, and start
tracking the suspects.
1.1 Biometrics
Fingerprints
A fingerprint is formed by friction ridges and valleys of the surface
of a fingertip. It has been proven that each single finger generates
DNA
DNA (deoxyribonucleic acid) contains the genetic instructions of all
living creatures. DNA recognition instruments are likely to be
used in criminal investigation and in diagnostics. The
greatest advantage is its high distinctiveness: the probability
of two people having exactly the same DNA code is one in a
hundred billion. On the other hand, DNA matching is not available
in real time because it needs a physical sample, while other
biometric systems only use an image or a recording.
Retina
Capillary vessels located at the innermost, most sensitive layer of the eye
have a unique pattern. Retina scans require the person to remove
his glasses, place his eye close to the scanner, stare at a specific
point, and remain still while the scan is completed. A retina scan
cannot be faked, as it is currently impossible to forge a human
retina. Furthermore, the retina of a deceased person decays too
rapidly to be used to deceive a retinal scan. Although enrollment
and scanning are intrusive and slow, the method is highly
accurate.
Iris
The iris is the pigmented portion of the eye, in the center of which
the pupil is situated. Its visual texture stabilizes during the first two years of
life. The complex iris texture carries very distinctive information
useful for personal recognition. The hippus movement of the eye
may also be used as a liveness measure for this biometric.
Face
Applications of this non-intrusive method range from a static and
controlled authentication to a dynamic uncontrolled identification
in a cluttered background. The most popular approaches to face
recognition are based on either the location and shape of facial
attributes (eyes, eyebrows, nose, lips, chin) and their spatial rela-
tionships, or the overall analysis of the face image as a weighted
combination of canonical faces. A facial recognition system should
automatically detect whether a face is present in an acquired im-
age, locate the face if there is one, and recognize the face from a
general viewpoint.
Hand geometry
Hand geometry recognition systems are based on a number of
measurements taken from the human hand, including its shape,
the size of the palm, and the lengths and widths of the fingers. Since hand
geometry may vary during a lifetime and accessories
(jewelry) may interfere with extraction of the variables, this biometric is
not known to be very distinctive.
Signature
The way a person writes his name has been accepted in govern-
ment, legal, and commercial transactions as a method of authen-
tication. This is an obtrusive method, since the user’s effort and
contact with a pen is required. The actual signature changes over
time and is influenced by physical and emotional conditions of
the signatories. In addition, professional forgers may be able to
reproduce signatures that fool the signature verification system.
Gait
Gait refers to the manner in which a person walks, and is one of
the few biometric traits that can be used to recognize people at a
distance. Therefore, this trait is very appropriate in surveillance
scenarios. Most gait recognition algorithms attempt to extract the
human silhouette in order to derive the gait variables. Hence, the
selection of a good model to represent the human body is pivotal
to the efficient functioning of a gait recognition system. However,
Keystroke dynamics
Each person is hypothesized to type on a keyboard in a charac-
teristic way. Keystroke dynamics is a collection of detailed press
and release timing information about typing on a keyboard. This
biometric is not expected to be unique to each individual but it
may offer sufficient discriminatory information to permit identity
verification. The keystrokes of a person can be monitored unobtrusively,
and continuous verification of an individual's identity
is available.
Voice
The measurable parameters of a person's voice are tone, pitch,
cadence, and frequency. However, they change over time due to
age, medical conditions (such as the common cold), or emotional state. Being
not very distinctive, voice may not be an appropriate trait for
large-scale identification. A text-dependent voice recognition
system is based on the utterance of a fixed predetermined phrase. A
text-independent voice recognition system offers more protection
against fraud, but is more difficult to design. Speaker recognition
is possible in telephone-based applications but the voice signal is
typically degraded in quality by the communication channel.
All biometric traits have their own relative merits in various operational
scenarios, and the choice for a particular application depends on a series of
conditions besides match performance. No single biometric ever meets all
the requirements (accuracy, cost, efficiency) imposed by all applications.
However, the inherent limitation factors of processing a single biometric can be
alleviated by fusing the information obtained from multiple sources. For ex-
ample, the fingerprints of the right and left index fingers, or multiple images
of face from different angles, or a combination of the face and gait traits may
be used to resolve the identity of an individual. Consolidating the evidence
presented by multiple sources is expected to be more reliable. Furthermore,
fusion of biometrics helps users to get successfully enrolled and makes the sys-
tem more robust against hackers.
Most human recognition technologies generally require a subject
who permits and enables physical contact, close proximity, or views from
specific angles, in order to help the recognition procedure. Without a cooperating
subject, it is difficult to reliably recognize individuals from arbitrary views
when walking at a distance in real-world changing environmental conditions.
For optimal performance, we should make use of as much information as can
possibly be obtained from the available observations. Gait and face are two
[Figure: identification scheme — a biometric query data sample is compared with database samples 1 to N; the identity with the maximum similarity is returned.]
1. A k-NN query returns the k most similar distinct objects to the query object.
2. A range query returns all objects within some range (radius) of the query object.
1.2 Gait
3. Early medical and psychological studies [18] indicate that the human gait has 24 different
measurable components.
of one’s gait.
Gait can be seen as advantageous over other traits, since the information
is captured without any obtrusive body-invasive equipment or the subject's
cooperation. From a surveillance perspective, walk-pattern biometrics is
appealing because it can be performed at a distance, even surreptitiously.
Together with its high collectability, these are the reasons
why the method is preferably employed for human identification rather than
authentication. Another motivation is that video footage of suspects is readily
available, as surveillance cameras are relatively low-cost and installed at
most locations requiring the presence of security. Unlike face recognition, which
can easily be affected by low resolution, here a video of reduced quality is often
sufficient. Moreover, gait is difficult to disguise, and by trying to do so the
individual will probably appear even more suspicious, while a face can easily be
altered or hidden.
On the other hand, slightly modified conditions may introduce a large
appearance change, which may lead to a failure in recognition. Gait can be
affected by clothing (long cloak, shoes, carrying objects), physical changes
(pregnancy, injury, weight gain/loss) or environmental context (walking sur-
face, background). Temporary stimulants (drugs and alcohol) as well as mood
[20] can change a person’s walking style. The large gait variation of the same
person under different conditions intentionally or unintentionally reduces the
discriminating power of gait as a biometric. Hence it may not be as unique
as fingerprint or iris, although the inherent gait characteristic of an individual
still makes it irreplaceable and useful in visual surveillance.
The ultimate goal is to detect, classify, and identify humans at long distances,
day or night, in all weather conditions, alone or in a group
of people. Among the external factors, light deficit and self-occlusion of
feature points (overlapping legs) hinder the extraction of gait features.
challenges involved in gait recognition include also imperfect foreground seg-
mentation of the walking subject from the background scene and variations
in the camera viewing angle with respect to the walking subjects. Large vari-
ation of the same individual under different conditions requires more gallery
samples collected from different environmental contexts. Despite having sam-
ples generally obtained under similar environmental conditions, a reliable gait
recognition system with large-scale database seems unreal due to the complex-
ity of real-world situations. Therefore, a reasonable solution is not to ask the
system for a single (the best match) identity of an individual, but for a set of
most probable identities. The set can be constructed by taking first k most sim-
ilar patterns (k-NN query), or by setting a similarity threshold (range query).
It is important to adjust the values sensitively with respect to database size
and similarity function in order to filter out maximum of unrelated identities,
but also to keep the related one. The goal of a dramatic reduction of the set
of all identities (from billions to hundreds) is now achievable.
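The two candidate-set constructions can be sketched as follows; the database layout, the one-dimensional toy patterns, and the distance function are illustrative assumptions, not part of the thesis implementation.

```python
def knn_query(database, query, k, distance):
    """Return the identities of the k gait patterns most similar to the query."""
    ranked = sorted(database, key=lambda entry: distance(query, entry[1]))
    return [identity for identity, _ in ranked[:k]]

def range_query(database, query, radius, distance):
    """Return the identities of all gait patterns within `radius` of the query."""
    return [identity for identity, pattern in database
            if distance(query, pattern) <= radius]

# Toy example: one-dimensional "gait patterns" compared by absolute difference.
db = [("alice", 1.0), ("bob", 5.0), ("carol", 2.0)]
dist = lambda a, b: abs(a - b)
candidates_knn = knn_query(db, 0.0, 2, dist)        # the two best matches
candidates_range = range_query(db, 0.0, 2.5, dist)  # everyone within radius 2.5
```

Both queries return a small candidate set rather than a single verdict, which is exactly the reduction from billions to hundreds of identities described above.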
[Figure: gait recognition pipeline — a walking subject is spotted on video (Detection), raw gait data are captured and features extracted (Extraction), and the extracted features are compared with a database of recorded features to reveal the identity of the subject (Recognition).]
1. Detection:
Detecting and tracking humans in a video sequence is the first
step in gait recognition, where an individual is actually spotted
walking. Gait detection systems usually work under the assumption
that the video sequence to be processed is captured by a static
camera and that the only moving object in the sequence is the
subject (person). Given a video sequence from a static camera, this
module detects and tracks the moving silhouettes. The process
comprises two submodules, Foreground Modelling4 and Human
Tracking5 using a skeletonization operation.
2. Extraction:
Targeted elements of gait are collected from the individual in question.
In this step it helps to have the background as simple as
possible, and extra attention should be paid to the selection of an
appropriate (side) viewpoint. To obtain a useful descriptor of a
walking person, the person's legs can be temporally tracked for
symmetries. A more challenging approach is to extract phase
information from the timed interplay of the coordinated movements
comprising gait. Variable lighting and moving backgrounds make
traditional foreground extraction techniques such as optical flow6
and background subtraction7 unstable. Finally, the specific markers
of the identification scheme are extracted and subsequently
preprocessed (normalized) into a comparable form.
3. Recognition:
The newly extracted samples are compared with the entries stored
in a central database. The identity of the individual holding the
most (and sufficiently) similar gait sample is picked and stated as the
recognition verdict.
A gait recognition system can be used in a number of different scenarios. If
an individual whose gait has previously been recorded walks by the camera and
he is a known threat, then the system will recognise him and the appropriate
authorities can be automatically alerted. Such systems have a large amount of
potential application in airports, banks, and other high-security areas.
6. The pattern of apparent motion of objects, surfaces, and edges in a visual scene caused
by the relative motion between an observer (an eye or a camera) and the scene.
7. A commonly used class of techniques for segmenting out objects of interest in a scene
from the background noise.
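As an illustration of the background subtraction mentioned above, a minimal sketch might look like the following; the grey-level frames stored as nested lists and the intensity threshold are illustrative assumptions.

```python
def background_subtract(frame, background, threshold=30):
    """Mark a pixel as foreground (1) when it deviates from the static
    background model by more than `threshold` intensity levels."""
    return [[1 if abs(p - b) > threshold else 0
             for p, b in zip(frame_row, bg_row)]
            for frame_row, bg_row in zip(frame, background)]

background = [[10, 10, 10],
              [10, 10, 10]]
frame = [[10, 200, 10],   # a bright object entered the scene
         [10, 180, 10]]
mask = background_subtract(frame, background)
```

Real systems use an adaptive background model rather than a single static frame, which is precisely why variable lighting makes this technique unstable.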
function for effective comparison of gait samples. Several functions that compare
various combinations of extracted signals are suggested to determine the
similarity of gait samples with respect to the presented model. A secondary
contribution lies in the proposal of a novel method for the automatic determination of
walk cycles within extracted signals and their consecutive time normalization.
In addition, the importance of the entire preprocessing is evaluated on a real-life
human motion database.
Walking subjects are assumed to be discovered with their gait variables8
captured. The work focuses on the preprocessing of gait variables and the design of a
reasonable similarity function for their comparison. Experiments are employed
to reveal the optimal settings for the similarity function.
After a short introduction to the principles of biometrics and gait recognition
in Chapter 1, a rather exhaustive survey of existing approaches to human gait
recognition is given in Chapter 2. The model is defined in Chapter 3, together
with a specification of which gait parameters to measure and how to compare them.
The normalization approach for achieving a high similarity of related signals is
described in Chapter 4, and its importance is evaluated with optimized settings
in Chapter 5. Chapter 6 serves as a summary of the thesis, picking out the spots
of imperfection that are nominated as future research issues.
2 Related Work
Early research was motivated by Johansson's [17] and Barclay's [2] psychological
experiments, in which participants were able to recognize the type of movement
of pedestrians simply from observing the 2D motion pattern generated
by light bulbs attached to several joints of their bodies. The results are consistent
with physiological measurements. These experiments demonstrated that gait is
personally unique and can be used for biometric recognition. Similar experiments
later showed some evidence that the identity of a familiar person (friend)
[8], as well as gender [2], might be recognizable.
Because of the complexity of the field, most approaches analyze gait only
from the side view, without exploring the variation in measurements caused
by differing view angles. Previous results on automatic gait recognition looked
promising [6, 12, 4]; however, the subject databases used for testing were
typically small (often fewer than ten people). Subsequent work has to be carried
out on larger databases to ascertain how effective identification methods will
be with a large-scale population. Even when operating on small databases,
the success rate is reported as percent correct, that is, the fraction of trials in
which the system correctly recognizes the individual by choosing its best
match. Such a result gives little insight into how the technique might scale when
a database contains hundreds or more people.
Generally for any method, the recognition rate decreases with increasing
number of subjects and increases with increasing number of gait sequences
recorded for a single subject. Some of the large gait databases widely used in
academic research are:
• CASIA Gait Database1 (Dataset B) [34]
• CMU Motion of Body Database2 [13]
• USF HumanID Gait Baseline Database3 [25]
In spite of some covariates in view angle, shoe type, and carrying conditions,
these large databases have been built for massive human identification.
Regarding the challenging factors, such as view angle, clothing, or walking
speed, much related research has been published, and most methods work
accurately under a controlled environment and walking style. However, existing
approaches are far from perfect. For example, it is difficult to track a pedestrian
in a crowd, and gait features may be extremely inaccurate if the camera is shaking or
the weather changes dramatically.
influenced by light and noise. The classifier was built simply by treating feature
vectors as points in Eigengait space7, and a test sample was classified by
determining its 5-NN, using the Euclidean distance and simple majority as a decision
rule. Experiments with recognition rates up to 93% have shown that people
maintain similar plots on different video sequences and that the plots differ
from person to person, giving evidence of their possible use in a gait
classifier.
Figure 2.2: Self-similarity plots (right) for the walk sequences (left) [3].
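The 5-NN majority-vote decision rule can be sketched as follows; the Eigengait projection itself is omitted, and the feature vectors, labels, and gallery below are made-up illustrations.

```python
import math
from collections import Counter

def knn_classify(query, gallery, k=5):
    """Classify a feature vector by Euclidean k-NN majority vote.
    `gallery` is a list of (feature_vector, label) pairs."""
    nearest = sorted(gallery, key=lambda pair: math.dist(query, pair[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

gallery = [((0.0, 1.0), "A"), ((1.0, 0.0), "A"), ((1.0, 1.0), "A"),
           ((9.0, 9.0), "B"), ((8.0, 8.0), "B")]
label = knn_classify((0.0, 0.0), gallery)
```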
Some other discrete methods aim to represent gait cycles with a Hidden
Markov Model (HMM). The states of these systems are not directly observable,
but they generate output based on probability. Kale et al. [18] used an HMM to
capture the information from a gait sequence and to recognize individuals. The
postures that an individual adopts can be regarded as the states of the HMM
and provide a means of discrimination. Gait was represented by the width
of the outer contour of the binarized silhouette. An HMM was trained for each
person, and gait recognition was then performed by evaluating the probability
that a given observation sequence was generated by a particular HMM. On the
CMU database, the right person is in the top three matches (3-NN) 90% of
the time for the cases where the training and testing sets correspond to the
same walking styles.
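The probability evaluation described above can be sketched with the standard forward algorithm for a discrete HMM; the tiny two-state model below is a made-up illustration, not the width-of-silhouette model from [18].

```python
def forward_probability(obs, pi, A, B):
    """P(observation sequence | HMM) via the forward algorithm.
    pi[i]: initial state probability, A[i][j]: transition probability,
    B[i][o]: probability of emitting symbol o in state i."""
    n = len(pi)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    return sum(alpha)

# Deterministic toy HMM that alternates between two postures (symbols 0 and 1).
pi = [1.0, 0.0]
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[1.0, 0.0], [0.0, 1.0]]
p_alternating = forward_probability([0, 1, 0, 1], pi, A, B)  # matches the model
p_constant = forward_probability([0, 0, 0, 0], pi, A, B)     # does not
```

Recognition then amounts to evaluating the query sequence under each person's trained HMM and returning the identity with the highest probability.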
Han and Bhanu [14] proposed a new spatio-temporal gait representation,
called the Gait Energy Image (GEI), for individual recognition by gait. Unlike
other gait representations, which consider gait as a sequence of templates
(poses), GEI captures a human motion sequence in a single image while
preserving temporal information. Real and synthetic templates are generated
directly from training silhouette sequences, the latter by simulating silhouette
distortion (see Figure 2.3). The intensity of each pixel of a GEI reveals the
duration the foreground stays at that position. Experimental results on the USF HumanID Database
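The GEI construction can be sketched as a per-pixel average of size-normalized binary silhouettes; the tiny 2x2 frames below are purely illustrative.

```python
def gait_energy_image(silhouettes):
    """Average a sequence of equally sized binary silhouettes into one
    grey-level image; brighter pixels stay foreground for longer."""
    n = len(silhouettes)
    rows, cols = len(silhouettes[0]), len(silhouettes[0][0])
    return [[sum(s[r][c] for s in silhouettes) / n for c in range(cols)]
            for r in range(rows)]

frames = [[[1, 0], [1, 1]],
          [[1, 1], [0, 1]]]
gei = gait_energy_image(frames)
```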
Angular velocity10 [27] or trajectories11 [5, 30] are measured on the body model
as it deforms over the walking sequence. They are typically mapped to some
low-dimensional feature vectors and compared by various sophisticated
classification techniques.
As opposed to silhouettes, models can reliably handle occlusion (especially
self-occlusion), noise, scale, and rotation. They offer the ability to derive gait
signatures directly from model parameters. Another considerable advantage is
that extra evidence gathering techniques can be used across the whole sequence
before model fitting. Mapping to models also helps reduce the data
dimensionality. These methods are easy to understand; however, their effectiveness is
still limited by imperfect vision techniques, especially in body structure
modeling12. Moreover, the comparison and searching procedures tend to be complex
and computationally costly.
The first model-based approach to gait biometrics was by Cunado et al.
[7]. Their simple structure modelled a leg as interlinked pendulums. The gait
signature was derived from the angular rotation of the hip and knee, as depicted
in Figure 2.5, by multiplying the phase and magnitude components of the
Fourier description. Assuming that gait is symmetric, only one leg was
modelled and the other one was calculated with a phase lock of a half-period shift.
This method could withstand differing amounts of noise and occlusion, and
achieved a recognition rate of 100% on a database of 10 subjects.
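A phase-weighted magnitude signature of this kind can be sketched from a uniformly sampled rotation signal; the number of harmonics and the per-harmonic magnitude-times-phase product are illustrative assumptions, not the exact construction in [7].

```python
import cmath, math

def fourier_signature(angles, n_harmonics=3):
    """Phase-weighted magnitudes of the first few Fourier harmonics
    of a rotation signal sampled over one gait cycle."""
    n = len(angles)
    signature = []
    for k in range(1, n_harmonics + 1):
        coeff = sum(angles[f] * cmath.exp(-2j * cmath.pi * k * f / n)
                    for f in range(n)) / n
        signature.append(abs(coeff) * cmath.phase(coeff))
    return signature

# One pure sine cycle: all signature energy sits in the first harmonic.
angles = [math.sin(2 * math.pi * f / 64) for f in range(64)]
sig = fourier_signature(angles)
```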
Research made by Dockstader et al. [10, 11] presents a structural approach
toward 3D tracking and extraction of gait from human motion. For the struc-
10. Also called rotational velocity, it is a quantitative expression of the amount of rotation
that a spinning object undergoes per unit time.
11. Path that a moving object follows through space as a function of time.
12. Tracking and labelling human body.
Figure 2.5: Structural model of a leg [7]. (a) Upper and lower pendulum rep-
resents the thigh and the calf, respectively, connected at the knee joint. (b)
Fourier description of rotation-phase dependency.
ture of the human model, they used a set of thick lines joined at points to
represent the legs and a periodic pendulum motion to describe the gait (see
Figure 2.6). A 15-parameter stick model (p1, ..., p15) and a 10-parameter
bounding volume (p16, ..., p25) are defined. Each component of the model is measured
in 3D, body-centered coordinates. It is assumed that during a normal gait cycle
the body moves forward, tangentially to the transverse (TP, x-y) and sagittal
(SP, x-z) planes and orthogonally to the coronal plane (CP, y-z). Points in this
space are indicated by a vector (x, y, z). To further limit the variability and to
increase the tracking accuracy of the body model, they introduced the concept
of additional hard and soft kinematic constraints13.
Figure 2.6: Structural object model with its kinematic constraints [10].
13. Bounds on the velocity and acceleration magnitudes, spanning distances and scaling.
Figure 2.7: Left knee trajectories before (top) and after (bottom) time
normalization [27].
14. Trajectory’s time axis is linearly converted from the experimentally-recorded time units
to an axis representing percentage of the gait cycle.
15. Used mainly in databases, the term refers to detecting and correcting corrupted records.
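The linear time-axis conversion described in footnote 14 can be sketched as a resampling of a signal onto a fixed number of samples spanning 0–100% of the gait cycle; the pure-Python linear interpolation and the target length below are illustrative assumptions.

```python
def time_normalize(signal, target_len):
    """Linearly resample `signal` so its time axis becomes `target_len`
    evenly spaced samples covering the whole gait cycle."""
    n = len(signal)
    normalized = []
    for i in range(target_len):
        t = i * (n - 1) / (target_len - 1)  # position on the original frame axis
        lo = int(t)
        hi = min(lo + 1, n - 1)
        frac = t - lo
        normalized.append(signal[lo] * (1 - frac) + signal[hi] * frac)
    return normalized

resampled = time_normalize([0.0, 2.0, 4.0], 5)
```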
Figure 2.8: [30] (a) Examples of 2D trajectories. (b) Notion of the LCSS match-
ing within the gray region.
Figure 2.9: Sample of the USF data set [16]. (a) Original frame, (b)
Automatically-extracted silhouette, (c) Manually-labelled silhouette.
Figure 2.10: Extraction of the stick figures from body contours [33].
Figure 2.11: [32] (a) 3D human body model approximated by 12 segments
and (b) its hierarchical structure.
2.3 A Comprehensive Overview of Previous Methods
Research in human gait recognition has experienced a boom over the past 15 years.
As far back as the 1990s, scientists were working with systems of > 90% recognition
rate. Despite tiny databases, the research attracted considerable attention.
Current research is indeed more sophisticated and the methods developed are more
sensitive. Comparing the old methods with the new ones serves not only as an
indicator of meaningful research, but also reflects the natural interest of
some researchers.
There are plenty of factors to consider in comparisons, but the questions most
often asked concern the success rate of the recognition procedure. Related to that
parameter, the number of walking subjects and gait samples plays an important role.
One could invent a 100% method, albeit tested only on a database with
2 persons. On the other hand, another could work harder and introduce an
80% method examined on hundreds of subjects. Hence, research conditions are
highly significant for the relevance of performance expressed as a single
number. In order to perform a completely fair performance survey, the
implementations of all methods should be executed on the very same database,
one that could offer gait data in particular representations and could process any
particular query.
Unfortunately, it is impossible to test all methods on a single database,
which may limit the informative value of this survey.
Some methods focus on a traditional style of walking, some work with iden-
3 Gait Variables Representation
\[ M = (p_1, \ldots, p_m), \]
\[ M = (S_L, S_R, E_L, E_R, H_L, H_R, L_L, L_R, K_L, K_R, F_L, F_R), \]
\[ P_f = [x_f, y_f, z_f] \in \mathbb{R}^3 \]
captured at a given video frame f ∈ F. The relative frame (discrete time)
domain F = [1, l] ⊂ ℕ avoids operating with concrete time instances of an absolute
time domain. The number l is relative and refers to the length of the input video
in terms of the number of frames, i.e., to the number of instances at which a specific
body point has been captured. This enables us to use the above relative notation of
body points. The only condition is a constant capture frequency, corresponding
to a fixed time difference between the capture times of two consecutive video frames.
\[ T_P = \{ P_f \mid f \in F \}. \]
The discrete frame domain F, limited from above by the trajectory's relative
length l = |T_P|, allows us to utilize metric functions for a point-by-point
comparison of trajectories.
[Figure: anatomical landmarks on the body model and the extracted 3D trajectories (axes x, y, z).]
Extracted trajectories cannot be directly used for recognition because the
values of their spatial coordinates depend on the system calibration that detects
and estimates the particular coordinates. Moreover, persons do not walk in the
same direction, which makes trajectories of different walks (even of the same
person) incomparable. We rather compute distances between selected pairs of
trajectories.
A distance-time dependency signal (DTDS) expresses how the distance
between two trajectories changes over time as the person walks. It is the variation
in these distances that is primarily exploited as identity information. Such
signals are already independent of the walk direction and the system
calibration.
The distance is always measured between two points P_f = [x_f, y_f, z_f] and
P'_f = [x'_f, y'_f, z'_f] captured at the same video frame, on the basis of the Euclidean
distance
\[ L_2(P_f, P'_f) = \sqrt{(x_f - x'_f)^2 + (y_f - y'_f)^2 + (z_f - z'_f)^2}. \]
For two given points P and P' that travel over trajectories T_P and T_{P'} of
an identical domain F, we formally define the corresponding DTDS as a function
S_{PP'} : F → ℝ⁺₀,
\[ S_{PP'}(f) = L_2(P_f, P'_f). \]
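Under this definition, a DTDS can be computed frame by frame from two synchronized trajectories; a minimal sketch, assuming trajectories are stored as lists of 3D points:

```python
import math

def dtds(traj_p, traj_q):
    """Distance-time dependency signal of two trajectories sharing the
    same frame domain: the Euclidean distance at every video frame."""
    assert len(traj_p) == len(traj_q), "trajectories must share the frame domain"
    return [math.dist(p, q) for p, q in zip(traj_p, traj_q)]

left_foot = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
right_foot = [(3.0, 4.0, 0.0), (1.0, 0.0, 5.0)]
signal = dtds(left_foot, right_foot)
```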
\[ \forall f \in F : S_{PP}(f) = L_2(P_f, P_f) = 0 \;\Rightarrow\; S_{PP} = F \times \{0\} \]
A person's walk identity consists of several DTDSs that describe the person's
style of walking. In the following, we present a methodology for measuring the
similarity between two signals of the same point pairs.
Having measured two DTDSs of an identical pair of points, the main issue
is to propose an effective method for their comparison. After reading Chapter 2.2,
an attentive reader is familiar with some notable trajectory-comparing
distance functions that can be applied to signals as well. We could, for example,
use functions such as EDR [5] or LCSS [30] that take local time shifting and noise
into account. However, with the help of DTDS normalization (see Chapter 4)
we are not forced to use such sophisticated and complex distance functions.
Even the simplest pointwise difference (Manhattan distance) gives a sound
result.
Extracted DTDSs within a single walking sequence always have the same
length and number of footsteps. However, we need to compare signals extracted
from different walks, which can have significantly different lengths and footstep
counts. The problem of different lengths could simply be solved by truncating the
longer signal to the length of the shorter one. Thinking about possible inaccuracies,
however, two long signals that are very similar can have the same distance as two
short signals that are rather different. Fortunately, signal normalization resolves
this unwanted scenario by unifying the frame domains. We can then use the
following distance for measuring the dissimilarity of two signals S and S' defined
on a (normalized) frame domain Φ = [1, λ] of a fixed length λ:
\[ \delta(S, S') = \frac{1}{\lambda} \sum_{f \in \Phi} |S(f) - S'(f)|. \]
This function sums the point-by-point differences between two specific DTDSs and computes the average difference per frame. It returns 0 if the DTDSs are identical, and their similarity decreases as the return value increases.
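The distance δ above can be sketched in a few lines of Python; the sketch assumes signals are given as equal-length sequences of distances over the normalized frame domain (the function name is ours, introduced for illustration):

```python
def dtds_distance(s, s_prime):
    """Average pointwise difference of two DTDSs defined on the
    same normalized frame domain of length lambda."""
    if len(s) != len(s_prime):
        raise ValueError("signals must share the frame domain")
    lam = len(s)
    # Sum per-frame absolute differences, then average per frame.
    return sum(abs(a - b) for a, b in zip(s, s_prime)) / lam
```

For example, two identical signals are at distance 0, while `dtds_distance([0, 0], [1, 3])` averages the per-frame differences 1 and 3 to 2.0.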
a) Non-negativity:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) ≥ 0
Let S and S′ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) = (1/λ) Σ_{f∈Φ} |S(f) − S′(f)| ≥ 0,
since 1/λ > 0 and each summand |S(f) − S′(f)| ≥ 0.
b) Identity of Indiscernibles:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) = 0 ⇔ S = S′
c) Symmetry:
∀ S, S′ ∈ Φ × R+_0 : δ(S, S′) = δ(S′, S)
Let S and S′ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) = (1/λ) Σ_{f∈Φ} |S(f) − S′(f)| = (1/λ) Σ_{f∈Φ} |S′(f) − S(f)| = δ(S′, S).
d) Triangle Inequality:
∀ S, S′, S″ ∈ Φ × R+_0 : δ(S, S″) ≤ δ(S, S′) + δ(S′, S″)
Let S, S′ and S″ be DTDSs defined on the frame domain Φ = [1, λ]. Then
δ(S, S′) + δ(S′, S″) = (1/λ) Σ_{f∈Φ} (|S(f) − S′(f)| + |S′(f) − S″(f)|) ≥ (1/λ) Σ_{f∈Φ} |S(f) − S″(f)| = δ(S, S″),
since |S(f) − S′(f)| + |S′(f) − S″(f)| ≥ |S(f) − S″(f)| holds for each f ∈ Φ.
where each S_{M_i M_j} refers to the DTDS of the body points on the corresponding coordinates i and j of the model M.
The following equivalence is defined so that we can treat relative gait patterns. Given two gait patterns G and G′, we put
G ∼ G′
complicated, or the (m choose 2) comparisons would consume an unacceptable amount of time. The reason is that recognizing with the use of too many criteria may have a negative influence on the success rate. This structure therefore allows us to assign weights to individual DTDSs and thus reflect the importance of a given DTDS in the process of recognition. Assigning zero weight to a DTDS excludes it from recognition entirely. The question of how the weights should be set in order to maximize the recognition rate is a subject of this research. Hence, the experiments in Chapter 5 were employed to determine the weight scaling of the DTDSs selected as components of the gait pattern structure.
In the following, we present a methodology for measuring similarity between
two gait patterns.
We introduce a novel similarity function for comparing two gait patterns G and
G 0 of an m-parameter model. This function is based on aggregation of weighted
signal distances and is generally defined as a weighted sum
D_w(G, G′) = (1 / (m choose 2)) Σ_{i=1}^{(m choose 2)} w_i δ(G_i, G′_i),
where δ(G_i, G′_i) is the distance of the DTDSs at the i-th coordinate of both gait pattern structures. The weights should be assigned with the properties
∀ i ∈ [1, (m choose 2)]_N : w_i ∈ [0, 1]_R  and  Σ_{i=1}^{(m choose 2)} w_i = 1.
The set of gait patterns together with the gait distance function D_w then forms a metric space.
Proof. Metricity of the gait distance function Dw follows directly from metric-
ity of the DTDS distance function δ at each coordinate of the gait pattern
structure, which is proven in Lemma 3.
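To make the aggregation concrete, here is a minimal Python sketch of the weighted gait distance D_w, assuming a gait pattern is represented as a list of equal-length DTDSs (one per point pair) and that the weights already satisfy the properties above; the helper name `delta` mirrors δ and is ours:

```python
def gait_distance(G, G_prime, w):
    """Weighted aggregation D_w of per-coordinate DTDS distances.

    G, G_prime: gait patterns, i.e. lists of C(m,2) DTDSs, each DTDS
                a sequence of distances over the normalized frame domain
    w:          weights in [0, 1] summing to 1
    """
    def delta(s, s_prime):
        # average pointwise difference of two equal-length signals
        return sum(abs(a - b) for a, b in zip(s, s_prime)) / len(s)

    n = len(G)  # number of point-pair coordinates, C(m, 2)
    return sum(w[i] * delta(G[i], G_prime[i]) for i in range(n)) / n
```

Assigning zero weight to a coordinate removes its DTDS from the aggregation, exactly as described above.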
The DTDSs within a single walk identity always have the same length and number of footsteps. However, a recognition algorithm has to compare DTDSs of identities which can have significantly different lengths and footstep counts.
Use of the introduced metric function is meaningful only when both compared signals contain the same number of footsteps and start at the same phase of the walking process; otherwise, such heterogeneous signals are semantically incomparable by standard distance functions (e.g., Manhattan, Euclidean, or DTW). Consequently, we normalize extracted DTDSs with respect to duration and walk cycle phase before similarity comparison.
4 Normalization of Signals
A walk cycle is a sequence of poses between the exact same repetitive events of walking. The cycle can be defined to start at any moment; for instance, if it starts with the right foot contacting the ground, then the cycle ends when the right foot makes contact again. We distinguish two major phases, stance and swing. The stance phase is the part of the cycle when the foot is in contact with the ground. It comprises 62% of the cycle, beginning with the initial foot strike and ending with toe-off. The swing phase occurs when the foot is in the air and comprises 38% of the cycle, beginning with toe-off and ending with the second foot strike. The motion between successive points of contact of the same foot is called a stride, and the motion between consecutive heel strikes of opposite feet is a footstep. A complete walk cycle (see Figure 4.1) thus includes two footsteps, and human gait is a form of periodic motion.
Figure 4.1: Poses of one walk cycle: (a) Pose 1, (b) Pose 2, (c) Pose 3, (d) Pose 4, (e) Pose 1.
Considering the fact that human walking is periodic, this process splits an input feet DTDS S_{FL FR}¹ into individual walk cycles. To extract the requested walk cycle, we need to identify the inceptions of individual footsteps.
The inception of each footstep must begin at the same phase. Any phase can be chosen; however, some phases are more striking and easier to describe and measure. We select the moment when the person’s legs are closest to each other as the footstep inception, as outlined in Figure 4.1(a). By visual inspection of the signal we can see a sequence of hills and valleys (see Figure 4.2). Each hill represents a subcycle of the points drawing apart and their consecutive approach. As a walk cycle inception, we selected the moment
1. The feet signal of our 12-parameter model introduced in Chapter 3.1, S_{FL FR}, represents the changing distance between the left (FL) and right (FR) foot as the person walks.
Figure 4.2: Normalization of two feet DTDSs with a different number of footsteps (each hill represents a single footstep), plotted as distance over time [29]. (a) represents these signals without normalization. (b) denotes identified minima of each signal. (c) constitutes just the first walk cycle of the signals, starting with the move of the left foot ahead. (d) shows the extracted walk cycles after time normalization.
when the person’s legs are close to each other. In the hill-valley notation, this phase corresponds to the lowest tip of a valley (a minimum). The reason is that even miniature footsteps are then easier to determine. The values of such minima within the signal are very similar, because the feet pass at a similar (almost identical) distance at each footstep. To determine footsteps, we identify all the minima within the feet signal. We cannot rely on the signal containing minima at a fixed distance, nor on it being ideally smooth, i.e., without any undulation caused by measurement errors. This is the reason we cannot use traditional find-minima algorithms.
The algorithm for finding minima is described in Algorithm 1. The algorithm processes the feet signal S_{FL FR} of frame domain F and returns the array of frames f ∈ F corresponding to identified minima that are smaller than some threshold. The threshold defines the maximum distance which the signal at an identified minimum can reach. This threshold should be assigned in the form of a percentile β of the distances of the input feet signal, with respect to walking conditions. The feet signal is scanned from the beginning, and the first frame f with a distance S_{FL FR}(f) smaller than the threshold becomes a candidate for the minimum. If there is a better candidate f′ ∈ [f + 1, f + α]_N with a smaller distance S_{FL FR}(f′) < S_{FL FR}(f), it replaces the previous candidate f. The search for a better candidate continues until no candidate with a smaller distance can be found within the next α frames. The last candidate f is declared as the minimum, the nearest frame f′
Serializing individual footsteps would extract the walk cycle while disregarding which foot undertakes a given footstep. Note that the pose of passing feet can continue with either the left or the right foot going forward, which corresponds to two different places within a walk cycle. From the physiological point of view, not only injured but also healthy human walking is never perfectly balanced. The occurrence of tiny differences between left and right footsteps results in a different characteristic of the feet signal for the left and the right foot, and often reveals discriminatory information about the gait pattern. Hence we need to determine which foot undertakes the following footstep at this phase; in particular, the objective is to extract a single walk cycle that always starts with the move of the left foot ahead. The walk cycle is then composed of the footstep of the left foot and the consecutive footstep of the right foot.
The first 4 frames of our minima array, f0, f1, f2 and f3, are utilized to form the requested walk cycle. To identify the first footstep of the left leg, we analyze the signal S_{KL FR} that constitutes the changing distance between the left knee and the right foot. At the moment the feet pass each other, this signal achieves a higher value when the left foot is moving ahead than in the opposite situation when the right foot is moving ahead. Thus we can decide whether a given footstep was undertaken by the left or the right foot. This way, if the condition S_{KL FR}(f0) > S_{KL FR}(f1) is met, each signal in our gait pattern is cropped according to the frames f0 and f2 (these are the frames where the first and third minima of the feet signal were found). Otherwise, signals are cropped according to the frames f1 and f3. Then a requested walk cycle is extracted either as the first two footsteps, or as the second and third footstep, both preserving that the first footstep is undertaken by the left foot.
The pseudocode is described in Algorithm 2.
Algorithm 2 Walk cycle identification starting with the move of left foot
ahead.
Input: Arbitrary DTDS S, the DTDS SKL FR and minima array minima.
Output: Signal of one walk cycle starting with the move of left foot ahead.
IdentifyWalkCycles(S,SKL FR ,minima)
1: S 0 ← ∅
2: if SKL FR (minima[0]) > SKL FR (minima[1]) then
3: for f ← 1 to minima[2] − minima[0] do
4: S 0 ← S 0 ∪ {(f, S(f + minima[0]))}
5: end for
6: else
7: for f ← 1 to minima[3] − minima[1] do
8: S 0 ← S 0 ∪ {(f, S(f + minima[1]))}
9: end for
10: end if
11: return S 0
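Algorithm 2 translates to Python almost line by line; this sketch assumes DTDSs are stored as dicts mapping frame numbers to distances, and uses the left-foot condition in the form stated in the text (S_KLFR(f0) > S_KLFR(f1) means the left foot moves ahead first):

```python
def identify_walk_cycle(S, S_KLFR, minima):
    """Crop one walk cycle starting with the left foot moving ahead.

    S:       the DTDS to crop, a dict {frame: distance}
    S_KLFR:  left-knee-to-right-foot DTDS on the same frame domain
    minima:  frames of the first four footstep inceptions
    """
    # A higher knee-foot distance at the first minimum means the
    # left foot moves ahead there; otherwise skip one footstep.
    if S_KLFR[minima[0]] > S_KLFR[minima[1]]:
        start, end = minima[0], minima[2]
    else:
        start, end = minima[1], minima[3]
    # Re-index the cropped walk cycle to start at frame 1.
    return {f: S[f + start] for f in range(1, end - start + 1)}
```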
5 Experimental Evaluation
5.1 Database
1. http://mocap.cs.cmu.edu
5.2 Methodology
and achieved rational values from [0, 1]_Q. Here 1_{ς_w(G) ∼ G} denotes the indicator function that returns 1 if ς_w(G) and G are gait patterns of the same person, and 0 otherwise.
Having the recognition rates of all possible weight distributions available,
we could theoretically define the optimal weight vector as the weight vector of
the maximal success rate
ω = arg max_{w ∈ R^{(m choose 2)}} ρ(w),
however, algorithmically speaking, the program would have to browse an uncountable number of weight vectors. Moreover, even with binary² weights and our m = 12, the program would need to compute 2⁶⁶ recognition-rate values. The amount of time consumed by such an algorithm is clearly unacceptable, therefore we need to shrink the (m choose 2)-dimensional space of weight distributions and establish a so-called hops count for partitioning the real-valued weight interval.
Each DTDS is assigned a number of up to η weight units (including 0) that determines its relative weight with respect to the other weights. For each assignment of weight units to DTDSs, a weight vector is constructed by dividing all the counts by their sum (to ensure the weight vector properties) and tested for its success rate on the given database. We see that each DTDS is assigned
2. The weight achieves the value of either 0 or (m choose 2)^{-1}.
Figure: Example DTDSs of four point pairs (FOOT_LEFT and FOOT_RIGHT, KNEE_LEFT and FOOT_LEFT, SHOULDER_RIGHT and HAND_RIGHT, HAND_LEFT and HAND_RIGHT), plotted as distance in cm over time in frame numbers.
Table 5.1: Examined priority DTDSs sets with their hops counts.
Table 5.1, the database was modified into several versions, each gait pattern containing only the appropriate set of priority DTDSs. Moreover, entering an iteration of Algorithm 5 and processing the iteration-th relative weight vector v is conditional on whether the corresponding weight vector w has already been used as a parameter for calculating the success rate (in Algorithm 6). Entrance to an iteration is granted only in the positive cases of checking the condition

gcd_{i=1}^{(m choose 2)} v_i = 1,
which means that the relative weight vector v is not a multiple of another previously used relative weight vector, which would correspond to a weight vector already used for success rate calculation.
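The gcd condition can be checked with a couple of helper functions; this is a minimal sketch under the assumption that relative weight vectors are lists of non-negative integer unit counts (function names are ours, introduced for illustration):

```python
from math import gcd
from functools import reduce

def is_canonical(v):
    """True iff the unit counts in v have gcd 1, i.e. v is not a
    multiple of a previously evaluated relative weight vector."""
    nonzero = [c for c in v if c > 0]
    return bool(nonzero) and reduce(gcd, nonzero) == 1

def normalize(v):
    """Divide unit counts by their sum to obtain a weight vector
    with weights in [0, 1] summing to 1."""
    total = sum(v)
    return [c / total for c in v]
```

For instance, `[2, 4, 0]` is skipped because it normalizes to the same weight vector as the already processed `[1, 2, 0]`.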
In the remainder of this section, we provide and discuss the results of the evaluation experiments.
5.3 Results
Table 5.2 shows the results of searching for optimal weight vectors in the scenarios assigned in Table 5.1. Each line represents a single experiment with a given priority DTDS set and hops count. For transparency, the dimension of the resulting optimal weight vector is shrunk to |Π| positions; hidden values are explicitly set to zeros. The table also shows the maximum success rate as the ratio of correctly recognized gait patterns over the number of query gait patterns.
Note that the higher the hops count, the more accurate the results, but also the more of them there are. Since the success rate is calculated over the rather small query set of size 48, there are only 49 different values that the success rate of a given weight vector can achieve. Hence, the higher the hops count
Table 5.2: Optimal weight vectors and their success rates for the examined trials.
we set, the more weight vectors of equal success rate there are. This means that multiple optimal weight vectors were reported for each scenario. The most balanced (keeping the most symmetric DTDS weights equal) and compact (with the largest number of zero weights) vector was selected as the representative among all the weight vectors of the highest success rate.
In trial #1 of 8 signals, we see a fairly spread weight distribution, slightly dominated by the hand-foot signals, while the knee and feet signals became significantly recessed. The more massive trial #2 achieved a higher success rate by employing only 5 out of 15 signals that are rather asymmetric (left-elbow-right-hand, left-hand-right-foot, right-hand-right-loin, and both same-side loin-foot signals); moreover, just 3 of them (left-elbow-right-hand, right-hand-right-loin and right-loin-right-foot) achieved the same recognition rate in trial #3, which despite its asymmetry seems like an economical solution. Trials #4 and #5 were evaluated to complete the experiments over the entire DTDS character spectrum. Balanced weights of the almost static cross-side loin-knee signals and the solitary static loins signal scored the highest rates.
An attentive reader observes that the presence of relatively static signals leads to a higher recognition rate. The highly fluctuating signals are usually assigned zero weight, which means that they harm recognition when used. This leads to the unexpected conclusion that, in order to increase the recognition rate, static signals should be utilized instead. However, measuring human skeletons without monitoring any dynamic parameter of walking would contradict the idea of gait recognition.
The recognition rates on the employed database are rather satisfactory, although the trend of static signals being more discriminatory than the dynamic ones is a result of either the small database size or an inappropriate choice of distance functions. Both of these issues are subjects of our future research.
6 Conclusions and Future Research
Bibliography
[1] Aravecchia M., Calderara S., Chiossi S., and Cucchiara R., A Videosurveillance Data Browsing Software Architecture for Forensics: From Trajectories Similarities to Video Fragments, Proceedings of the 2nd ACM Workshop on Multimedia in Forensics, Security and Intelligence, ACM, New York, 2010.
[2] Barclay C., Cutting J., and Kozlowski L., Temporal and Spatial Factors in Gait Perception That Influence Gender Recognition, Perception and Psychophysics, vol. 23, no. 2, pp. 145–152, 1978.
[3] BenAbdelkader C., Cutler R., Nanda H., and Davis L., EigenGait: Motion-Based Recognition of People Using Image Self-Similarity, Proceedings of the Third International Conference on Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, London, pp. 284–294, 2001.
[4] Boyd J.E. and Little J.J., Biometric Gait Recognition, Advanced Studies in Biometrics, pp. 19–42, 2005.
[5] Chen L., Özsu M.T., and Oria V., Robust and Fast Similarity Search for Moving Object Trajectories, Proceedings of the 2005 ACM SIGMOD International Conference on Management of Data, pp. 491–502, 2005.
[6] Cunado D., Nixon M.S., and Carter J.N., Using Gait as a Biometric, via Phase-Weighted Magnitude Spectra, 1st International Conference on Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, pp. 95–102, 1997.
[7] Cunado D., Nixon M.S., and Carter J.N., Automatic Extraction and Description of Human Gait Models for Recognition Purposes, Computer Vision and Image Understanding, vol. 90, no. 1, pp. 1–41, 2003.
[8] Cutting J. and Kozlowski L., Recognizing Friends by Their Walk: Gait Perception Without Familiarity Cues, Bulletin of the Psychonomic Society, pp. 353–356, 1977.
[10] Dockstader S.L., Bergkessel K.A., and Tekalp A.M., Feature Extraction for the Analysis of Gait and Human Motion, Proceedings of the 16th International Conference on Pattern Recognition, vol. 1, pp. 5–8, 2002.
[11] Dockstader S.L., Berg M.J., and Tekalp A.M., Stochastic Kinematic Modeling and Feature Extraction for Gait Analysis, IEEE Transactions on Image Processing, vol. 12, no. 8, pp. 962–976, 2003.
[12] Foster J.P., Nixon M.S., and Prugel-Bennett A., New Area Based Gait Recognition, Audio- and Video-Based Biometric Person Authentication, Springer-Verlag, pp. 312–317, 2001.
[13] Gross R. and Shi J., The CMU Motion of Body (MoBo) Database, Robotics Institute, Pittsburgh, PA, Technical Report CMU-RI-TR-01-18, June 2001.
[14] Han J. and Bhanu B., Individual Recognition Using Gait Energy Image, IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 28, no. 2, pp. 316–322, 2006.
[15] Huang P.S., Harris C.J., and Nixon M.S., Human Gait Recognition in Canonical Space Using Temporal Templates, IEE Proceedings on Vision, Image and Signal Processing, vol. 146, no. 2, pp. 93–100, 1999.
[17] Johansson G., Visual Perception of Biological Motion and a Model for Its Analysis, Perception and Psychophysics, vol. 14, no. 2, pp. 201–211, 1973.
[18] Kale A., Sundaresan A., Rajagopalan A.N., Cuntoor N.P., Roy-Chowdhury A.K., Krüger V., and Chellappa R., Identification of Humans Using Gait, IEEE Transactions on Image Processing, vol. 13, no. 9, pp. 1163–1173, 2004.
[19] Kamruzzaman J. and Begg R.K., Support Vector Machines and Other Pattern Recognition Approaches to the Diagnosis of Cerebral Palsy Gait, IEEE Transactions on Biomedical Engineering, vol. 53, no. 12, pp. 2479–2490, 2006.
[23] Niyogi S. and Adelson E., Analyzing and Recognizing Walking Figures in XYT, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, Seattle, Washington, USA, pp. 469–474, 1994.