
Gait Recognition Based on Normalized Walk Cycles

Conference Paper · July 2012


DOI: 10.1007/978-3-642-33191-6_2



Gait Recognition Based
on Normalized Walk Cycles

Jan Sedmidubsky, Jakub Valcik, Michal Balazia, and Pavel Zezula

Masaryk University, Botanicka 68a, 602 00 Brno, Czech Republic

Abstract. We focus on recognizing persons according to the way they
walk. Our approach considers a human movement as a set of trajectories
formed by specific anatomical landmarks, such as hips, feet, shoulders,
or hands. The trajectories are used for the extraction of distance-time
dependency signals that express how the distance between a pair of specific
landmarks on the human body changes in time as the person walks.
The collection of such signals characterizes the gait pattern of a person’s
walk. To determine the similarity of gait patterns, we propose several
functions that compare various combinations of extracted signals. The
gait patterns are compared on the level of individual walk cycles in order
to increase the recognition effectiveness. Evaluated on a 3D database of
walking humans, the approach achieved a recognition rate of up to 96 %.

1 Introduction
Human gait is defined as the manner in which a person walks. Recent stud-
ies have proven that gait can be seen as a biometric characteristic and used
as a signature to recognize people. The great advantage is its possibility to be
captured at a distance, even surreptitiously. However, effectiveness of gait recog-
nition methods strongly depends on many factors such as camera view, carried
accessories, person’s clothes, or walking surface.
Gait recognition methods can be divided into two major categories: model-based
and appearance-based approaches. Appearance-based methods [4,6,10] generally
characterize the whole motion pattern of the human body by a compact repre-
sentation regardless of the underlying structure. They usually combine extracted
human silhouettes from each video frame into a single gait image that preserves
temporal information. Nevertheless, recognition based on comparison of such
gait images is restricted to one view point. To use gait features in unconstrained
views, we need to adopt the model-based concept in order to estimate 3D models
of walking persons.
The model-based concept fits various kinds of stick figures onto the walking
human. The recovered stick structure allows accurate measurements to be performed
independently of the camera view. BenAbdelkader et al. [1] computed an average
stride length and cadence of feet and used just these two numbers for gait
recognition. Tanawongsuwan and Bobick [7] compared joint-angle trajectories of
hips, knees, and feet by the dynamic time warping (DTW) similarity function,
with normalization for noise reduction. Cunado et al. [5] used a pendulum model

G. Bebis et al. (Eds.): ISVC 2012, Part II, LNCS 7432, pp. 11–20, 2012.

© Springer-Verlag Berlin Heidelberg 2012

where thigh’s motion and rotation were analyzed using a Fourier transformation.
The approach of Wang et al. [9] measured a mean shape of silhouettes gained
by Procrustes’s analysis and combined it with absolute positions of angles of
specific joints. Yoo et al. [11] compared sequences of 2D stick figures by a back-
propagation neural network algorithm. Recent advances in gait recognition have
been surveyed in [3].
We adopt the model-based concept to recover a 3D stick figure of the human
body by capturing spatial coordinates of significant anatomical landmarks, such
as hands, hips, knees, or feet. The recovered stick figure is used to compute
distance-time dependency signals that express how a distance between two spe-
cific joints of the human body changes in time. The collection of such signals
defines a gait pattern of person’s walk (Section 2). In Section 3, a novel similarity
function for comparing gait patterns is introduced. To effectively compare gait
patterns, we normalize them to encapsulate signals corresponding exclusively to
a single walk cycle (Section 4). In Section 5, the influence of normalization and
the effectiveness of the similarity function are thoroughly evaluated on a real-life
3D motion database. Our approach differs from existing ones by also taking movements of
arms into account and by comparing gait patterns on the basis of normalized
walk cycles.
The main contributions of this paper are: (1) proposal of a gait pattern
that encapsulates information about a person’s walk in the form of viewpoint
invariant distance-time dependency signals, (2) introduction of a novel similarity
function for comparing gait patterns, taking movements of legs and arms into
account, and (3) experimental evaluation of recognition rate of the proposed
function and its modifications, based also on diverse normalization methods.

2 Gait Representation

We introduce a structural model of a human body. This model is used for the
extraction of viewpoint invariant planar signals. The collection of such planar
signals forms a gait pattern of person’s walk. Recognition of persons is based on
comparing their gait patterns by a sophisticated similarity function.

2.1 Model Definition

We define the human model by a set of significant anatomical landmarks: clavicles
$C_L/C_R$, elbows $E_L/E_R$, hands $H_L/H_R$, hips $L_L/L_R$, knees $K_L/K_R$, and
feet $F_L/F_R$. The subscripts $L$ and $R$ express whether a given landmark is situated
at the left or right side of the human body, respectively (see Figure 1). In
particular, we suggest a 12-parameter model $M$:

$$M = (C_L, C_R, E_L, E_R, H_L, H_R, L_L, L_R, K_L, K_R, F_L, F_R),$$

where each landmark is described by a 3-dimensional body point $P_f = (x_f, y_f, z_f)$
captured at a given video frame $f \in F$. The domain $F = \{1, \ldots, n\}$ refers to the
length of input video in terms of number of frames, i.e., to the number of times
a specific body point has been captured.

Fig. 1. Location of specific anatomical landmarks and their trajectories captured as
the person walks (shown in a 3D coordinate system with axes x, y, z)

A collection of consecutive points represents a motion trajectory (see Figure 1).
Formally, each point $P$ moving in time, as the person walks, constitutes a discrete
trajectory $T_P$, defined as:

$$T_P = \{P_f \mid f \in F\}.$$

The discrete domain F allows us to utilize metric functions for point-by-point
comparison of trajectories. Trajectories cannot be used directly for recognition
because the values of their spatial coordinates depend on the calibration
of the system that detects and estimates the coordinates. Moreover, persons
do not walk in the same direction, which makes trajectories of different
walks (even of the same person) incomparable. We rather compute distances
between selected pairs of trajectories to construct distance-time dependency sig-
nals. Such signals are already independent of the walk direction and system
calibration.

2.2 Distance-Time Dependency Signal

A distance-time dependency signal (DTDS) expresses how a distance between
two trajectories changes over time as the person walks. The variation in these distances
is primarily exploited as information for human recognition. The distance
is always measured between two points $P_f = (x_f, y_f, z_f)$ and $P'_f = (x'_f, y'_f, z'_f)$
captured at the same video frame on the basis of the Euclidean distance $L_2$:

$$L_2(P_f, P'_f) = \sqrt{(x_f - x'_f)^2 + (y_f - y'_f)^2 + (z_f - z'_f)^2}.$$

A distance-time dependency signal $S_{PP'}$ between two trajectories $T_P$ and $T_{P'}$ of
points $P$ and $P'$ is formally defined as:

$$S_{PP'} = \{ d_f \in \mathbb{R}^+_0 \mid f \in F \cap F' \wedge d_f = L_2(P_f, P'_f) \},$$

where $F$ and $F'$ are domains of trajectories $T_P$ and $T_{P'}$, respectively. The DTDSs
of selected pairs of trajectories are used to construct a gait pattern that serves
as the characteristic of a person’s walk.
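
The definition above can be illustrated with a minimal Python sketch. It assumes that a trajectory is stored as a dictionary mapping a frame number to an (x, y, z) point; this storage format, the function name `dtds`, and the toy coordinates are our own illustrative choices and are not taken from the paper.

```python
import math

def dtds(traj_p, traj_q):
    """Distance-time dependency signal between two trajectories.

    Each trajectory maps a frame number to an (x, y, z) point. The result
    maps every frame present in BOTH trajectories to the Euclidean distance
    between the two points captured at that frame.
    """
    common = sorted(traj_p.keys() & traj_q.keys())
    return {f: math.dist(traj_p[f], traj_q[f]) for f in common}

# Hypothetical toy data: a hip that stays put and a foot that moves.
hip = {1: (0.0, 0.0, 1.0), 2: (0.0, 0.0, 1.0), 3: (0.0, 0.0, 1.0)}
foot = {2: (0.0, 0.0, 0.0), 3: (3.0, 0.0, 5.0), 4: (1.0, 0.0, 0.0)}

signal = dtds(hip, foot)
print(signal)  # only frames 2 and 3 are shared by both trajectories
```

Because only distances between landmarks are kept, rotating or translating both trajectories together leaves the signal unchanged, which is the viewpoint invariance the text relies on.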

2.3 Gait Pattern

A gait pattern describes the person’s style of walking by encapsulating information
about DTDSs that are extracted from the single person’s walk. In particular,
each gait pattern consists of signals described in Table 1.

Table 1. Signals extracted from each person’s walk

Notation         DTDS measured between
$S_{L_L F_L}$    left hip and left foot
$S_{L_R F_R}$    right hip and right foot
$S_{C_L H_L}$    left clavicle and left hand
$S_{C_R H_R}$    right clavicle and right hand
$S_{F_L F_R}$    feet
$S_{K_L F_R}$    left knee and right foot

Formally, a gait pattern $G$ of a person’s walk is defined as the 6-parameter
structure, consisting of six DTDSs:

$$G = (S_{L_L F_L}, S_{L_R F_R}, S_{C_L H_L}, S_{C_R H_R}, S_{F_L F_R}, S_{K_L F_R}).$$

In the following, we present a methodology for measuring similarity between two
gait patterns.
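
As an illustration, the six signals of a gait pattern could be assembled as in the following Python sketch. It assumes that every landmark trajectory is a list of (x, y, z) points with one entry per frame, all starting at the same frame; the two-letter labels and the toy model are hypothetical and serve only to make the example runnable.

```python
import math

# Labels for the 12 landmarks of the model (our own naming, following Section 2.1).
LANDMARKS = ["CL", "CR", "EL", "ER", "HL", "HR", "LL", "LR", "KL", "KR", "FL", "FR"]

# Landmark pairs of the six DTDSs listed in Table 1.
GAIT_PAIRS = [("LL", "FL"), ("LR", "FR"), ("CL", "HL"),
              ("CR", "HR"), ("FL", "FR"), ("KL", "FR")]

def gait_pattern(model):
    """Build a gait pattern from a dict: landmark label -> list of (x, y, z).

    Returns a dict keyed by landmark pair; each value is the list of
    per-frame Euclidean distances between the two landmarks. zip() stops
    at the shorter trajectory, i.e. only common frames are used.
    """
    return {
        (a, b): [math.dist(p, q) for p, q in zip(model[a], model[b])]
        for a, b in GAIT_PAIRS
    }

# Hypothetical toy model: landmark number i sits at x = i in both frames.
model = {name: [(float(i), 0.0, 0.0), (float(i), 1.0, 0.0)]
         for i, name in enumerate(LANDMARKS)}

G = gait_pattern(model)
print(G[("LL", "FL")])  # distance between landmark 6 and landmark 10 in each frame
```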

3 Similarity of Gait Patterns

To express similarity of gait patterns $G$ and $G'$, we firstly need to define the
way of comparing DTDSs. We define a function $\Phi$ for measuring a distance
(dissimilarity) of two signals $S = \{d_f \mid f \in F\}$ and $S' = \{d'_f \mid f \in F'\}$ as:

$$\Phi(S, S') = \sum_{f \in F \cap F'} |d_f - d'_f|. \tag{1}$$

This function, also known as the $L_1$ or Manhattan distance function, sums point-
by-point differences between two specific DTDSs. In case the domains $F$ and $F'$
are not the same, similarity is computed among their common frames only. The
function returns 0 if the signals are identical and with an increasing distance
their similarity decreases.
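
A minimal Python sketch of this comparison follows. It assumes that a signal is stored as a dictionary from frame number to distance, so that the restriction to common frames is explicit; the name `phi` and the toy values are illustrative only.

```python
def phi(s1, s2):
    """Manhattan (L1) distance of two signals over their common frames.

    Each signal maps a frame number to a distance value. Frames that
    appear in only one of the signals are ignored.
    """
    common = s1.keys() & s2.keys()
    return sum(abs(s1[f] - s2[f]) for f in common)

a = {1: 2.0, 2: 3.0, 3: 5.0}
b = {2: 1.0, 3: 6.0, 4: 9.0}

print(phi(a, b))  # frames 2 and 3 only: |3 - 1| + |5 - 6| = 3.0
print(phi(a, a))  # identical signals give 0
```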
We introduce a novel similarity function $D$ for comparing gait patterns $G$ and
$G'$. This function is based on aggregation of four $\Phi$ functions and is formally
defined as:

$$D(G, G') = \Phi(S_{L_L F_L}, S'_{L_L F_L}) + \Phi(S_{L_R F_R}, S'_{L_R F_R}) + \Phi(S_{C_L H_L}, S'_{C_L H_L}) + \Phi(S_{C_R H_R}, S'_{C_R H_R}). \tag{2}$$

Individual $\Phi$ functions compare similarity of signals between the clavicle and
hand and between the hip and foot for both the arms and legs. In this way, the
similarity function $D$ expresses the difference in the manner of walking of two gait
patterns, taking movements of arms and legs into account. Similar to $\Phi$, $D$ returns
0 for identical gait patterns, and with an increasing value their similarity
decreases.
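
The aggregation in Equation 2 can be sketched in Python as follows. The sketch assumes that both gait patterns are dictionaries keyed by landmark pair and that their signals are lists of equal length (i.e., already normalized); the labels and toy values are hypothetical.

```python
def phi(s1, s2):
    """L1 distance of two equally long signals given as lists of floats."""
    return sum(abs(a - b) for a, b in zip(s1, s2))

# The four signals aggregated by Equation 2: hip-foot and clavicle-hand, both sides.
D_PAIRS = [("LL", "FL"), ("LR", "FR"), ("CL", "HL"), ("CR", "HR")]

def gait_distance(g1, g2, pairs=D_PAIRS):
    """Sum of the per-signal distances over the selected landmark pairs."""
    return sum(phi(g1[p], g2[p]) for p in pairs)

# Hypothetical toy patterns that differ by 0.5 in the first frame of every signal.
g1 = {p: [1.0, 2.0] for p in D_PAIRS}
g2 = {p: [1.5, 2.0] for p in D_PAIRS}

print(gait_distance(g1, g2))  # 4 signals x 0.5 = 2.0
print(gait_distance(g1, g1))  # identical patterns give 0
```

Passing a different `pairs` list gives the single-signal and two-signal variants that are evaluated in Section 5.3.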
The use of Φ functions is meaningful only in case when input signals contain
the same number of footsteps and start at the same phase of a walking process,
otherwise the signals are semantically incomparable. Consequently, before simi-
larity comparison we normalize the signals within each gait pattern with respect
to a duration and walk cycles’ phase. Signals are extended or contracted to be
synchronized, i.e., keeping the same phase of a walking process at every moment.

4 Normalization of Gait Patterns


We preprocess each gait pattern to contain signals aligned to 150 video frames,
prior to application of the similarity function. The length of 150 frames corresponds
to the average walk-cycle duration in the database we used. We preprocess
each gait pattern by three different normalization methods, which were proposed
in [8]. These methods are briefly summarized in the rest of this section.

4.1 Simple Normalization


The most straightforward way of aligning signals is to take the first 150 video
frames from each signal S ∈ G, where G is an input gait pattern. In case the
signal is shorter, it is linearly transformed to the length of 150 frames. This
simple method is denoted as SN .
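
A possible reading of the SN method is sketched below in Python. The paper does not specify how a signal is "linearly transformed"; the sketch assumes plain linear interpolation between neighboring samples, and the function names are our own.

```python
def resample(signal, length):
    """Linearly interpolate a non-empty list of floats to `length` (>= 2) samples."""
    last = len(signal) - 1
    out = []
    for i in range(length):
        pos = i * last / (length - 1)   # fractional position in the input
        lo = int(pos)
        hi = min(lo + 1, last)
        out.append(signal[lo] + (signal[hi] - signal[lo]) * (pos - lo))
    return out

def simple_normalization(signal, length=150):
    """Keep the first `length` frames; stretch shorter signals to `length`."""
    if len(signal) >= length:
        return signal[:length]
    return resample(signal, length)

print(resample([0.0, 10.0], 5))                             # evenly spaced values from 0 to 10
print(len(simple_normalization([0.0] * 200)))               # truncated to 150
print(len(simple_normalization([0.0, 10.0])))               # stretched to 150
```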

4.2 Footstep Normalization


In contrast to the SN approach, the footstep normalization (FN) considers
speed and characteristics of walking by extracting a single walk cycle from each
signal. Signals corresponding purely to a single walk cycle, which is two footsteps,
can effectively be compared by the D similarity function, instead of comparing
signals of unknown movements. To extract the requested walk cycle, we need
to identify inceptions of individual footsteps. We select the moment when per-
son’s legs are the closest to each other as a footstep inception. Focusing on the
character of one of the feet signals $S_{F_L F_R} \in G$ in Figure 2a, we can see a sequence
of hills and valleys. Each hill represents a period of moving feet apart and their
consecutive approach. The minimum of each valley expresses that both the feet
are passing, i.e., the legs are the closest to each other. To determine footsteps,
all the minima within the feet signal must be identified. We cannot rely on the
signal to contain minima at a fixed distance and to be ideally smooth, which
means without any undulation caused by measurement errors. This is the reason
we utilize our specialized find-minima algorithm proposed in [8]. In particular,
we pick the video frames $m_1, m_2, m_3, m_4$ where the first four minima were identified.
The pairs of adjacent minima determine individual footsteps, alternately
with the left or right foot in front. The requested walk cycle is formed by the
first two footsteps, so each $S \in G$ is cropped according to the $m_1$-th and $m_3$-th
video frame. The cropped signals are linearly transformed to the standardized
length of 150 video frames.

Fig. 2. Normalization of two feet DTDSs with a different number of footsteps (each
hill represents a single footstep). Figure (a) represents these signals without any
normalization. Figure (b) denotes identified minima of each signal. Figure (c) constitutes
just the first walk cycle of the signals, which starts with the move of left foot ahead.
Figure (d) shows the extracted walk cycles after linear transformation to 150 frames.
(Each panel plots distance against video frames for the signals $S_{F_L F_R}$ and $S'_{F_L F_R}$.)
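
The FN procedure can be sketched in Python under several assumptions. The naive neighbor comparison in `local_minima` is a stand-in that only works on smooth signals; it is not the find-minima algorithm of [8], which is designed to cope with measurement noise. The crop is assumed to include both boundary frames, and the stretching is assumed to be linear interpolation.

```python
def resample(signal, length):
    """Linearly interpolate a non-empty list of floats to `length` (>= 2) samples."""
    last = len(signal) - 1
    out = []
    for i in range(length):
        pos = i * last / (length - 1)
        lo = int(pos)
        hi = min(lo + 1, last)
        out.append(signal[lo] + (signal[hi] - signal[lo]) * (pos - lo))
    return out

def local_minima(signal):
    """Naive minima detection: frames lower than the previous and not above the next."""
    return [i for i in range(1, len(signal) - 1)
            if signal[i - 1] > signal[i] <= signal[i + 1]]

def footstep_normalization(pattern, feet_key, length=150):
    """Crop every signal to the first two footsteps and stretch it to `length`."""
    minima = local_minima(pattern[feet_key])
    if len(minima) < 3:
        raise ValueError("need at least three minima to extract a walk cycle")
    m1, m3 = minima[0], minima[2]
    return {key: resample(sig[m1:m3 + 1], length) for key, sig in pattern.items()}

# Hypothetical feet signal with minima at frames 1, 5 and 9 (two hills in between).
feet = [3, 1, 2, 4, 2, 0, 3, 5, 2, 1, 4]
pattern = {"feet": feet, "hip_foot": [float(i) for i in range(11)]}

cycle = footstep_normalization(pattern, "feet", length=9)
print(local_minima(feet))  # [1, 5, 9]
print(cycle["feet"])       # frames 1..9 of the feet signal
```

All signals of the pattern are cropped with the minima of the feet signal, so they stay synchronized with each other.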

4.3 Walk Cycle Normalization

The FN approach extracts the walk cycle disregarding the fact whether the
first footstep belonged to the left or right foot. However, a characteristic of some
DTDSs depends on the leg which undertook a given footstep – such DTDSs
are periodic on the level of walk cycles. Moreover, human walking might not be
balanced, e.g., due to an injury, which even results in a different characteristic of
feet signal for the left and right foot. The walk cycle normalization (WN) solves
this problem by extracting a single walk cycle that always starts with the move
of left foot ahead – the footstep of the left leg and consecutive footstep of the
right leg. To identify the first footstep of the left leg, we analyze the signal
$S_{K_L F_R} = \{d_f \mid f \in F\}$ that constitutes the changing distance between the left
knee and right foot. If both the feet are passing, this signal achieves a higher value
when the left foot is moving ahead in comparison with the opposite situation
when the right foot is moving ahead. In this way, if the condition $d_{m_1} < d_{m_2}$ is
met, we crop each signal $S \in G$ according to the $m_1$-th and $m_3$-th video frame
($m_1$ and $m_3$ are frames where the first and third minima of the feet signal were
found). Otherwise, signals are cropped according to the $m_2$-th and $m_4$-th video
frame. Both the extracted footsteps form the requested walk cycle with the first
footstep undertaken by the left leg. Similar to FN, the requested walk cycle is
finally transformed to the length of 150 video frames. The whole normalization
process is depicted in Figure 2 and described in [8] in more detail.
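
The selection rule of the WN method can be written down as a short Python sketch. It takes the minima of the feet signal as given and applies the condition on the knee-foot signal exactly as stated in the text; the function names, the inclusive crop, and the toy values are our own assumptions.

```python
def walk_cycle_bounds(feet_minima, knee_foot):
    """Return the (start, end) frames of the walk cycle chosen by WN.

    feet_minima: frames of the first four minima of the feet signal.
    knee_foot:   the left-knee/right-foot signal, indexed by frame.
    Raises ValueError if fewer than four minima are supplied.
    """
    m1, m2, m3, m4 = feet_minima[:4]
    if knee_foot[m1] < knee_foot[m2]:
        return m1, m3
    return m2, m4

def crop(pattern, start, end):
    """Crop every signal of a gait pattern to frames start..end (inclusive)."""
    return {key: sig[start:end + 1] for key, sig in pattern.items()}

# Hypothetical values of the knee-foot signal at the first two feet minima.
minima = [1, 5, 9, 13]
knee_foot = [0.0] * 15
knee_foot[1], knee_foot[5] = 0.2, 0.6

print(walk_cycle_bounds(minima, knee_foot))  # (1, 9)

knee_foot[1], knee_foot[5] = 0.6, 0.2
print(walk_cycle_bounds(minima, knee_foot))  # (5, 13)
```

The cropped signals would then be stretched to 150 frames in the same way as in the FN method.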

5 Experimental Evaluation
We evaluate effectiveness of the proposed similarity function for gait recognition
and compare it with other functions and different normalization approaches.
Firstly, we describe a motion-capture database used. Secondly, methodology for
evaluating experimental trials is presented. Thirdly, effectiveness of examined
similarity functions and influence of diverse normalization processes is reported.

5.1 Database
We utilized the Motion Capture Database (MoCap DB) 1 from the CMU Graph-
ics Lab as a primary data source of trajectories of walking humans. This database
contains motion sequences of different kinds of movements (e.g., dance, walk,
box, etc.) for 144 recorded persons. We performed experiments on the subset of
motion sequences that corresponded to common walking. We took all 131 walk-
ing sequences belonging to 24 recorded persons. Each person had at least two
different sequences. Walking sequences are the only ones that could meaningfully
be used for gait recognition.
We implemented specialized software to extract gait patterns from the 131 walking
sequences. In particular, we extracted trajectories of all landmarks $P \in M$
(see Section 2) for each walking sequence. The obtained trajectories were em-
ployed to compute DTDSs specified in Table 1. These DTDSs were normalized
and used to construct a gait pattern for each walking sequence.

5.2 Methodology
We concentrated on verifying effectiveness of our approach by evaluating nearest-
neighbors queries. To be maximally fair, we constructed one query for each
person – the query object for each query was randomly chosen from gait patterns
belonging to the given person. Thus 24 queries were constructed and evaluated
against a database of all 131 gait patterns. The nearest found neighbor was
always the same as the query gait pattern (i.e., the exact match), so it was
omitted and the next closest neighbor was analyzed. If the gait pattern of the
analyzed neighbor belonged to the same person as the query pattern, search
was successful because of the correct person identified. Search could always be
successful since at least two different gait patterns were available in the database
for each person. Effectiveness – a recognition rate – was stated as a ratio between
the number of correctly identified persons and the number of all persons (i.e.,
the number of successful queries divided by 24). Since we do not define any
recognition threshold, it is not possible to calculate false positives. This is
part of our future work.
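
The evaluation protocol can be summarized by the following Python sketch. It assumes that the database is a list of (person, gait pattern) entries and that some distance function on gait patterns is supplied; the query itself is excluded by its index, which has the same effect as dropping the exact match. The toy data use plain numbers in place of gait patterns and are hypothetical.

```python
import random

def recognition_rate(database, distance, rng):
    """One randomly chosen query per person; a query succeeds when its
    nearest neighbor (other than itself) belongs to the same person."""
    by_person = {}
    for idx, (person, _) in enumerate(database):
        by_person.setdefault(person, []).append(idx)

    hits = 0
    for person, indices in by_person.items():
        q = rng.choice(indices)
        candidates = [i for i in range(len(database)) if i != q]
        best = min(candidates,
                   key=lambda i: distance(database[q][1], database[i][1]))
        if database[best][0] == person:
            hits += 1
    return hits / len(by_person)

def toy_distance(a, b):
    return abs(a - b)

# Hypothetical data: numbers stand in for gait patterns.
easy = [("a", 1.0), ("a", 1.1), ("b", 5.0), ("b", 5.2)]
hard = [("a", 1.0), ("a", 9.0), ("b", 1.2), ("b", 9.1)]

print(recognition_rate(easy, toy_distance, random.Random(0)))  # 1.0
print(recognition_rate(hard, toy_distance, random.Random(0)))  # 0.0
```

In both toy databases the outcome does not depend on which pattern is drawn as the query, so the printed rates hold for any random seed.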
1 http://mocap.cs.cmu.edu

5.3 Results
The results were studied in detail for diverse similarity functions and the three
normalization approaches presented in Section 4: (1) Simple Normalization (SN),
(2) Footstep Normalization (FN), and (3) Walk Cycle Normalization (WN).
We expect the SN normalization to achieve the worst recognition rate
since it does not take individual footsteps into account. The WN normalization
should be more effective than FN because it additionally distinguishes between
the left and right foot.
The normalized signals served as input parameters for computation of similarity
of gait patterns. We also evaluated three different types of similarity by
changing the $\Phi$ function in Equation 1. In addition to the original Manhattan
distance ($L_1$), the Euclidean distance ($L_2$) and the dynamic time warping
approach (DTW) [2] were used to measure similarity of two DTDSs.
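
For illustration, the two alternative $\Phi$ functions might look as follows in Python. The DTW shown is the textbook dynamic-programming formulation with the absolute difference as local cost and no warping window; the exact variant and parameters used in the experiments are not stated in the text, so this is an assumption.

```python
import math

def l2(s1, s2):
    """Euclidean distance of two equally long signals."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(s1, s2)))

def dtw(s1, s2):
    """Dynamic time warping cost with |a - b| as the local cost."""
    n, m = len(s1), len(s2)
    inf = float("inf")
    # cost[i][j] = cheapest alignment of the first i samples of s1
    # with the first j samples of s2.
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(s1[i - 1] - s2[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],       # s1 advances
                                 cost[i][j - 1],       # s2 advances
                                 cost[i - 1][j - 1])   # both advance
    return cost[n][m]

print(l2([0.0, 0.0], [3.0, 4.0]))                 # 5.0
print(dtw([1.0, 2.0, 3.0], [1.0, 2.0, 2.0, 3.0])) # 0.0, the repeated sample is absorbed
print(dtw([0.0, 0.0], [1.0, 1.0]))                # 2.0
```

Unlike $L_1$ and $L_2$, DTW tolerates local differences in timing, which may explain why it is less sensitive to imperfect normalization.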
We also modified Equation 2 to evaluate suitability of different DTDSs for
gait recognition. Firstly, we simply modified the function $D$ to recognize persons
based purely on a single DTDS, i.e., $D = \Phi(S, S')$. This approach is denoted as
single-DTDS recognition. Secondly, we modified the function $D$ to combine several
DTDSs with the same “weight” (e.g., the original setting with four DTDSs
in Equation 2). The use of several DTDSs is referred to as multi-DTDS recognition.

Single-DTDS Recognition. We evaluated recognition rates of all possible
DTDSs that were computed for each couple of anatomical landmarks from $M$.
Table 2 presents the results for top-six DTDSs with the highest recognition rates.
Individual rows constitute the six best signals examined and columns represent
combinations of similarity functions and normalization approaches.

Table 2. Recognition rate comparison of diverse similarity functions and different
normalization approaches for the single-DTDS recognition

                   L1                  L2                  DTW
Examined DTDS      WN    FN    SN      WN    FN    SN      WN    FN    SN
$S_{L_L F_R}$      0.77  0.54  0.27    0.75  0.52  0.21    0.67  0.58  0.44
$S_{L_L F_L}$      0.69  0.60  0.33    0.63  0.56  0.21    0.75  0.56  0.54
$S_{L_R F_R}$      0.71  0.58  0.25    0.67  0.50  0.25    0.73  0.67  0.56
$S_{C_L H_L}$      0.73  0.63  0.40    0.73  0.63  0.40    0.58  0.58  0.40
$S_{L_R F_L}$      0.56  0.52  0.23    0.56  0.54  0.25    0.69  0.48  0.38
$S_{C_R H_R}$      0.67  0.60  0.40    0.65  0.60  0.35    0.60  0.52  0.54

The best recognition rate of 0.77 (i.e., effectiveness of 77 %) was achieved by
the $L_1$ similarity function, WN normalization, and the $S_{L_L F_R}$ signal that represents
the changing distance between the left hip and right foot. We can deduce
that success of recognition primarily depends on the normalization approach and
the type of DTDS used. There is an obvious difference in the distribution of recognition
rates between SN and the other normalization approaches, which shows the
usefulness of normalization. From the similarity point of view, $L_1$ and DTW
were slightly more successful than $L_2$. However, the 77 % effectiveness is still not
satisfactory and calls for combining more DTDSs.

Multi-DTDS Recognition. We combined several DTDSs to improve the recognition
rate. The best recognition rate of 0.96 (only one query out of 24 was unsuccessful)
was achieved by using the WN normalization and DTW similarity
function comparing the changing distance between the shoulder and hand of the
left and right arm, i.e.,

$$D(G, G') = \Phi(S_{C_L H_L}, S'_{C_L H_L}) + \Phi(S_{C_R H_R}, S'_{C_R H_R}).$$

The original similarity function proposed in Equation 2 achieved the 0.83 recognition
rate. The lower effectiveness is caused by summing “mismatched” similarities
of legs and arms. This should be solved by weighting individual functions $\Phi$ or
by normalizing them to the interval [0, 1]. The results of top-four similarity functions
with the highest recognition rates – both the discussed similarity functions along
with other two functions comparing the changing distances between (1) the hip
and foot for both the legs and (2) the left hip and left foot and the left shoulder
and left hand – are presented in Table 3.

Table 3. Recognition rate comparison of diverse similarity functions and different
normalization approaches for the multi-DTDS recognition

                                                              L1                  L2                  DTW
Combination of examined DTDSs                                 WN    FN    SN      WN    FN    SN      WN    FN    SN
$S_{C_L H_L} + S_{C_R H_R}$                                   0.94  0.71  0.44    0.88  0.67  0.38    0.96  0.73  0.69
$S_{L_L F_L} + S_{L_R F_R}$                                   0.83  0.65  0.40    0.81  0.63  0.35    0.92  0.67  0.60
$S_{L_L F_L} + S_{C_L H_L}$                                   0.81  0.81  0.54    0.85  0.79  0.56    0.83  0.75  0.67
$S_{L_L F_L} + S_{C_L H_L} + S_{L_R F_R} + S_{C_R H_R}$       0.71  0.67  0.29    0.69  0.65  0.27    0.83  0.67  0.50

The results again confirmed the importance of normalization. The maximum
recognition rate of SN was not higher than the minimal recognition rate of WN,
disregarding the $\Phi$ function and combinations of DTDSs used.

6 Conclusions
We investigated the problem of gait recognition based on processing trajectories
of human walking. Trajectories were used to extract distance-time dependency
signals to ensure viewpoint invariant recognition. These signals were normalized
in the form of walk cycles that were compared by a specialized similarity method.
The results were evaluated on a real-life database and compared with diverse
similarity functions and normalization approaches. The combination of signals
expressing the manner of movement of left and right arm along with the walk-
cycle normalization and DTW-like comparison led to the 96 % effectiveness. We
demonstrated that the normalization process and the movement of arms are
important characteristics to be considered for gait recognition.

We are aware of the fact that the database of 131 walking sequences is too
small, so we plan to build a bigger 3D database of motion trajectories acquired
by the Kinect 2 equipment. In the future, we also plan to improve the normalization
approach along with the similarity function, so that gait patterns can be composed of
more than a single walk cycle for more effective recognition.

Acknowledgements. This research was supported by the national project
GACR 103/10/0886. The database used in this paper was obtained from
http://mocap.cs.cmu.edu – the database was created with funding from NSF
EIA-0196217.

References
1. BenAbdelkader, C., Cutler, R., Davis, L.: Stride and cadence as a biometric in
automatic person identification and verification. In: 5th International Conference
on Automatic Face Gesture Recognition, pp. 372–377. IEEE (2002)
2. Berndt, D.J., Clifford, J.: Finding patterns in time series: a dynamic programming
approach. In: Advances in Knowledge Discovery and Data Mining, pp. 229–248.
American Association for Artificial Intelligence, Menlo Park (1996)
3. Bhanu, B., Han, J.: Human Recognition at a Distance in Video. In: Advances in
Computer Vision and Pattern Recognition. Springer (2010)
4. Chen, C., Liang, J., Zhao, H., Hu, H., Tian, J.: Frame difference energy image for
gait recognition with incomplete silhouettes. Pattern Recognition 30(11), 977–984
(2009)
5. Cunado, D.: Automatic extraction and description of human gait models for recog-
nition purposes. Computer Vision and Image Understanding 90(1), 1–41 (2003)
6. Han, J., Bhanu, B.: Individual recognition using gait energy image. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence 28(2), 316–322 (2006)
7. Tanawongsuwan, R., Bobick, A.F.: Gait recognition from time-normalized joint-
angle trajectories in the walking plane. In: International Conference on Computer
Vision and Pattern Recognition (CVPR 2001), vol. 2(C), II–726–II–731 (2001)
8. Valcik, J., Sedmidubsky, J., Balazia, M., Zezula, P.: Identifying Walk Cycles for
Human Recognition. In: Chau, M., Wang, G.A., Yue, W.T., Chen, H. (eds.) PAISI
2012. LNCS, vol. 7299, pp. 127–135. Springer, Heidelberg (2012)
9. Wang, L., Ning, H., Tan, T., Hu, W.: Fusion of static and dynamic body biomet-
rics for gait recognition. IEEE Transactions on Circuits and Systems for Video
Technology 14(2), 149–158 (2004)
10. Xue, Z., Ming, D., Song, W., Wan, B., Jin, S.: Infrared gait recognition based
on wavelet transform and support vector machine. Pattern Recognition 43(8),
2904–2910 (2010)
11. Yoo, J.H., Hwang, D., Moon, K.Y., Nixon, M.S.: Automated human recognition
by gait using neural network. In: Workshops on Image Processing Theory, Tools
and Applications, pp. 1–6. IEEE (2008)

2 http://www.xbox.com/kinect
