
3D Gait Recognition Using Multiple Cameras

Guoying Zhao1,2, Guoyi Liu2, Hua Li2, Matti Pietikäinen1


1. Machine Vision Group, Infotech Oulu and Department of Electrical and Information
Engineering,
P. O. Box 4500 FI-90014 University of Oulu, Finland
E-mail:{gyzhao, mkp}@ee.oulu.fi
2. Key Laboratory of Intelligent Information Processing, Institute of Computing Technology,
Chinese Academy of Sciences, Beijing 100080
E-mail:{gyliu, lihua}@ict.ac.cn

Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR’06)
0-7695-2503-2/06 $20.00 © 2006 IEEE

Abstract

Gait recognition is used to identify individuals in image sequences by the way they walk. Nearly all of the approaches proposed for gait recognition are 2D methods based on analyzing image sequences captured by a single camera. In this paper, video sequences captured by multiple cameras are used as input, and a human 3D model is set up. The motion is tracked by applying a local optimization algorithm. The lengths of key segments are extracted as static parameters, and the motion trajectories of the lower limbs are used as dynamic features. Finally, linear time normalization is exploited for matching and recognition. The proposed method, based on 3D tracking and recognition, is robust to changes of viewpoint. Moreover, better results are achieved for sequences containing difficult surface variations than with 2D methods, which proves the efficiency of our algorithm.

1. Introduction

Biometric methods for identifying people based on their physiological or behavioral characteristics, such as face, speech, iris, fingerprints, hand geometry and gait, have come to play an increasingly important role in human identification, due primarily to their universality and uniqueness [1, 2]. Gait recognition is one of the behavioral biometric methods, used to establish the identity of individuals in image sequences ‘by the way they walk’ [3]. From a surveillance perspective, gait recognition is an attractive modality because it may be performed at a distance, surreptitiously.

Boyd and Little [4] gave an overview of gait recognition, comparing and evaluating the research and tests to date. The approaches to gait recognition can be regarded as 2D or 3D methods. A great majority of the proposed approaches are 2D methods based on analyzing video sequences captured by a single camera. These methods can be divided into feature/appearance-based methods [2, 5-8] and model-based methods [1, 9-12]. Appearance-based methods, which have low requirements on video quality, are simple to implement but often sensitive to variations of viewpoint, shoes and lighting. Model-based methods, with rather complex model matching and tracking stages, are usually applied to high-quality videos. The main disadvantages of 2D recognition are caused by the limitations of a single camera view and by occlusion. Some methods try to address these limitations within a 2D analysis. BenAbdelkader et al. [13] extract a subject’s height, the amplitude of height oscillations during gait, gait cadence and stride length, and Johnson and Bobick [14] used four static body parameters for recognition; however, the features extracted by these methods are simple, and they do not use any timing information from gait. Moreover, these methods require some camera calibration and knowledge of the distance from the camera to the subject. Lee and Elgammal [15] introduced a method using higher-order singular value decomposition, which separates view factors, body configuration factors and gait-style factors. The gait-style factors are view-invariant, time-invariant and speed-invariant, and can be used for recognition. But due to the limitations of 2D methods, they cannot substantially deal with drastic view variations that were not included in the training data of the multi-linear analysis.

In 3D gait recognition, videos are captured by multiple cameras. The walking persons are tracked with the help of 3D human models. Then the 3D human structure is reconstructed and dynamic features
are extracted. Finally, gait analysis or recognition is performed. Dockstader et al. [16] introduced a complex method based on hard and soft kinematic constraints for 3D tracking and the extraction of gait patterns in human motion. They experimented only with data from two persons, with an emphasis on constructing human models and tracking, but not on recognition. Urtasun and Fua [17] proposed an approach based on matching 3D motion models to synthesized video, and on tracking and restoring motion parameters. They performed tests on 4 people with nine speed variations, emphasizing robustness to speed changes.

In real-world environments, a 2D analysis is easily affected by varying viewpoints, occlusion and surface variations, and cannot provide correct and accurate results.

In this paper, we propose a novel approach to 3D gait recognition. In our method, gait sequences captured by multiple cameras are tracked, trajectories of key joints are extracted as dynamic features, and lengths of segments are used as static parameters to assist analysis, simulation and recognition.

2. Human model and initialization

To restore the 3D pose of a person, a 3D human model should first be defined, which includes the skeleton structure and the appearance. The skeleton parameters define the pose, while the appearance model describes the shape, which can be used in the comparison of features from projected images.

Fig. 1. Tree structure of the human model

In our approach, a tree structure is applied to describe the human skeleton, and each node in the tree denotes a joint, as shown in Fig. 1. A local coordinate system is attached to each joint, and this coordinate system can rotate around the joint according to the rotation parameters of the joint. The parameters of a joint include a rotation vector N (1x3) and a translation vector T (1x3) expressing the displacement relative to the father joint node. This displacement is set according to measurements obtained from the body. In tracking, the lengths of the skeleton segments remain unchanged, so the translation vector T is fixed. A skeleton model with 10 joints and 24 degrees of freedom (DOFs), shown in Fig. 2, is used in our implementation. Fig. 3(a) shows the appearance model (here a truncated conic model is applied) and Fig. 3(b) the projected image using our models.

Fig. 2. Human skeleton model: 10 joints and 24 DOFs. (The number in brackets is the DOF of the joint)

Fig. 3. (a) Human appearance model (b) Estimated edge points and inner points after projection

Fig. 4. Manual initialization and stick figures of simulation data after initialization

The visual tracking is based on an initialized first frame. The radii and lengths of the truncated conic model, the initial pose and the initial angular configuration in all views at the first time step have to be known. This step requires the user’s interaction, so a semi-automatic system was set up for this purpose. A user adjusts the parameters of the conic model and clicks on the 2D joint locations in all views at the first time step, as shown in Fig. 4(a). Given these, the 3D pose and the image projection of the matching angular configuration are found by minimizing the sum of squared differences between the projected model joint locations and the user-supplied joint locations. Symmetry constraints, requiring that the left and right body segment lengths are the same, are enforced as well. Good results are achieved with a Quasi-Newton method. Fig. 4 shows the original input images (Fig. 4(b)(d)(f)) and the corresponding stick models (Fig. 4(c)(e)(g)) after initialization.
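The per-joint parameterization described above (a rotation vector and a fixed translation relative to the father joint) can be sketched in a few lines of numpy. This is an illustrative sketch, not the authors' implementation: the function names are our own, and the rotation is built with the Rodrigues formula that the paper introduces later, in Section 3.

```python
import numpy as np

def rodrigues(a):
    """Rotation matrix for an axis-angle vector a: rotate by
    theta = ||a|| around the unit axis a / ||a||."""
    theta = np.linalg.norm(a)
    if theta < 1e-12:
        return np.eye(3)
    k = a / theta
    K = np.array([[0.0, -k[2], k[1]],
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])      # anti-symmetric matrix [k x]
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def joint_transform(a, T):
    """4x4 rigid transform M = [[R_a, T], [0, 1]] of a joint's local
    frame relative to its father joint (rotation a, fixed offset T)."""
    M = np.eye(4)
    M[:3, :3] = rodrigues(a)
    M[:3, 3] = T
    return M

# Chain two joints: a fixed hip offset, then a knee rotated 90 deg about z.
hip  = joint_transform(np.zeros(3), np.array([0.0, 1.0, 0.0]))
knee = joint_transform(np.array([0.0, 0.0, np.pi / 2]), np.array([1.0, 0.0, 0.0]))
# A point given in the knee frame, mapped into the world frame.
point = hip @ knee @ np.array([1.0, 0.0, 0.0, 1.0])  # -> [1, 2, 0, 1]
```

Because the translation T is fixed during tracking, only the axis-angle vectors change from frame to frame, which is what makes the pose a compact optimization variable.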

3. Tracking

The axis-angle representation is used to parameterize rotation. It describes a rotation by an angle θ = ||a|| around the axis direction a/||a|| (a is the axis-angle vector). The corresponding rotation matrix can be computed using the Rodrigues formula:

R_a(θ) = I_3 + sin(θ) [a/||a|| ×] + (1 − cos(θ)) [a/||a|| ×]²   (1)

where [· ×] denotes the anti-symmetric (cross-product) matrix. On the basis of a and T, we can compute the relative rigid-body transformation of a joint's coordinate system to its father node's coordinate system, M = ( R_a  T ; 0_{1×3}  1 ).

The Gauss-Newton algorithm is appropriate for solving non-linear least-squares problems, whose objective function is a sum of squared residuals: f(x) = Σ_{i=1}^{M} r_i²(x).

Suppose that J is the Jacobian matrix of the residual vector r = [r_1, r_2, ..., r_M]', where M is the number of visible points on the model surface and x is the parameter vector containing all the axis angles to be optimized:

J(x) = [ ∂r_1/∂x_1 ... ∂r_1/∂x_N ; ... ; ∂r_M/∂x_1 ... ∂r_M/∂x_N ]   (2)

Then the first derivative of the objective function is ∇f(x) = 2 Jᵀ r. Supposing that the norms of the residuals r_i near the optimum are small, the second-order Hessian matrix can be approximated as

H(x) = 2 Jᵀ J + 2 Σ_{i=1}^{M} r_i(x) ∇²r_i(x) ≈ 2 Jᵀ J.

According to the Newton optimization formula, the iterative formula for minimizing the objective function f(x) is:

x_{n+1} = x_n − H(x_n)⁻¹ ∇f(x_n) = x_n − (Jᵀ J)⁻¹ Jᵀ r(x_n) = x_n + q_n   (3)

In this paper, the objective function is built from three different image features: luminance, edges and silhouette, as shown in Fig. 5. The edges and the silhouette constrain the projected position of the human shape model, while the luminance constrains the motion between two neighboring frames.

Fig. 6 demonstrates the framework, which can be summarized as follows:

Fig. 6. Framework

1) Set up the human model;
2) Perform manual initialization to get the initial pose vector for the images of the first time step;
3) Input: images I(t) and I(t+1) from the multiple cameras at times t and t+1, and the pose parameter vector x(t) for I(t); set x_0(t+1) = x(t);
4) Iterate, starting from n = 0, after extracting the image features:
a) Compute the model projection image features from the current pose x_n(t+1);
b) Compare the input image features and the model projection features, then compute the residual vector r;
c) Solve J q_n + r(x_n) = 0 to get the modification increment q_n = −(Jᵀ J)⁻¹ Jᵀ r(x_n);
d) Update: x_{n+1} = x_n + q_n; n = n + 1.
The iteration stops when the residual vector r is smaller than a threshold or the iteration count n exceeds a given value; the pose vector of the current multi-camera images is then output as the resulting pose and as the initial pose for the frames of the next time step.

Fig. 7 shows the tracking results.

Fig. 7. a) Foreground detection b) Tracking c) Skeleton model simulation
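As a concrete illustration of the update in Eq. (3), the following is a minimal numpy sketch of a Gauss-Newton loop on a toy least-squares problem with a finite-difference Jacobian. It is not the paper's tracker (whose residuals come from luminance, edge and silhouette features); the toy residual is our own assumption, chosen only to show the iteration x ← x + q with q = −(JᵀJ)⁻¹Jᵀr.

```python
import numpy as np

def residuals(x):
    """Toy residual vector r(x): fit y = exp(-x[0] * t) + x[1]
    to synthetic data generated with x = [2.0, 0.5]."""
    t = np.linspace(0.0, 1.0, 20)
    y = np.exp(-2.0 * t) + 0.5
    return np.exp(-x[0] * t) + x[1] - y

def jacobian(r, x, eps=1e-6):
    """Finite-difference Jacobian with entries dr_i / dx_j (Eq. 2)."""
    r0 = r(x)
    J = np.zeros((r0.size, x.size))
    for j in range(x.size):
        dx = np.zeros_like(x)
        dx[j] = eps
        J[:, j] = (r(x + dx) - r0) / eps
    return J

def gauss_newton(r, x, n_iter=20, tol=1e-12):
    """Iterate x <- x + q with q = -(J^T J)^{-1} J^T r (Eq. 3)."""
    for _ in range(n_iter):
        rx = r(x)
        if np.dot(rx, rx) < tol:
            break
        J = jacobian(r, x)
        q = -np.linalg.solve(J.T @ J, J.T @ rx)
        x = x + q
    return x

x_hat = gauss_newton(residuals, np.array([1.0, 0.0]))
```

In the tracker, the same structure applies per time step, with x holding all axis angles and r comparing projected model features against the input images; the small-residual assumption behind H ≈ 2JᵀJ holds because the previous frame's pose is a good starting point.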

Fig. 5. (a) Edge features (b) Silhouette features (c) Matching of edge points from the model projection and the input image

4. Gait recognition

Gait is mainly a motion of the lower limbs, although the motion of the upper limbs can also affect it. For the following two reasons, upper-limb motion parameters are not selected as gait features: 1) In our

experiments, because of self-occlusion and inter-occlusion of the upper limbs and the body, tracking of the upper limbs is not correct and reliable, and usually fails; this observation agrees with [16, 17]. 2) In walking, the poses of the upper limbs often change, for example when carrying a suitcase, lifting something or pulling baggage, so the inconstancy of the upper limbs makes us “distrust” their parameters in recognition. Therefore, only the motion parameters of the lower limbs are applied as dynamic features in our recognition.

Intuitively, when recognizing a person at a distance, people first judge according to bodily form and/or proportions, and then according to the way of walking. We therefore combine static and dynamic features for recognition. The static features here refer to whole-body information, such as height and leg length, which do not change with variations of mental state (being drunk, for example), while the dynamic features describe the motion of the human. The trajectories of the positions of the knee joints and ankle joints relative to the root node are extracted as dynamic features. These two kinds of features are combined to recognize gait.

Different persons have different walking speeds, and even for the same person the speed often changes at different times and in different situations. It is therefore important to take these variations into account. They can be handled using time normalization, which can be implemented with linear or non-linear methods, such as linear time normalization (LTN) or dynamic time warping (DTW).

In LTN, suppose that there are R frames in the training data and T frames in the test sequence (usually T ≠ R). In matching, every frame in the training data is compared to certain test frames determined by linear measure-change compensation. This method is appropriate for situations in which there is a linear, or approximately linear, relation between the times of the decomposed activities and the whole activity. DTW is a dynamic-programming method appropriate for situations in which there are no rules relating the decomposed and whole activity times, but only an ordering constraint on the activities.

5. Experiments

Recently, Urtasun and Fua [17] proposed an approach based on matching 3D motion models to synthesized video and on tracking and restoring motion parameters. They performed tests on 4 people with nine speed variations, with an emphasis on robustness to speed changes. For speed variation, however, a 2D analysis has already provided good results [6], but it is sensitive to surface changes [6-8, 18]. For example, almost all methods have given poor results in the USF surface-variation experiments, e.g. 29% in [8] and 34.3% in [7]. For the CMU database, to our knowledge, there are only few results dealing with walking on inclined surfaces. This is because such surface variations have drastic effects on human silhouettes, which are difficult to handle using 2D methods. We expect that a 3D method can help in this respect. To test this, we used the CMU slow walking data as the training set and the incline data (walking on an inclined surface) as the test set. Static parameters and dynamic features obtained by tracking are used for recognition.

Fig. 8. Images from the views of four cameras in the CMU database

In the CMU MoBo database, the data of each person include sequences from six cameras. In 3D tracking and analysis, at least two cameras are required, but because of occlusion, information from two or three cameras is not enough. Data from four cameras were selected for analysis, because six cameras did not provide any better tracking results in our experiments. The cameras marked as 03, 05, 07 and 13, as shown in Fig. 8, were used in our further experiments.

Fig. 9. Gait simulation and analysis figure

Fig. 10. Gait tracking and stick figure simulation

Figs. 9 and 10 present examples of volume and stick-model images of gait simulation based on tracking. The rightmost figure in Fig. 9 shows the distance trajectories of the left knee, right knee, left ankle and right ankle relative to the root node, together with analyzed information such as the gait period. Fig. 10 demonstrates the key stick images simulating gait.

Sequences from the lateral, front-oblique, front and back-oblique views are used for tracking and recognition. The static features include the lengths of eight segments: upper arms, lower arms, shoulders, head, upper body, hips, upper legs and lower legs. These features can be obtained in the motion initialization. To decrease the errors caused by marking the joints manually, five independent initializations are done and the mean lengths of the key segments are taken as the final static features. The dynamic features mainly use the trajectories of the distances from the two knees and the two ankles to the root, and the distances between the two knees and between the two ankles,

respectively. The total number of trajectories used as dynamic features to describe the gait motion is six.

Fig. 11. Alignment of the key joints’ trajectories of one subject’s incline and slow sequences in one period using LTN

We define four key poses in gait, similar to [19]. P1: the two legs are together and the swinging right leg passes the planted left foot; P2: the two legs are furthest apart and the right leg is in front; P3: the two legs are together and the swinging left leg just passes the planted right foot; P4: the two legs are furthest apart and the left leg is in front. Considering that each period of walking includes these four key poses, and that the time between two neighboring key poses has a linear, or approximately linear, relation to the whole period, LTN is applied for time normalization. Intuitively, in pose matching, P1 should match P1. With DTW, the motion parameters of similar poses may be very different, and this correspondence cannot be guaranteed. Fig. 11 shows the alignment of the key joints’ trajectories of subject 04011_incline and 04011_slow in one period using LTN.

In time alignment, it is required that the start pose is the same. In our experiments, when extracting a one-period sequence, the period starts from pose P1. This makes alignment and matching easy and reliable. Fig. 12 shows the trajectories of the distance between the two ankles (a) and the distance between the two knees (b) of 04011_incline together with ten persons’ slow sequences, and the relative distances of the left ankle (c) and the right ankle (d) of 04022_incline together with ten persons’ slow sequences (the two lines marked with “*” describe the two different sequences of the same person; the person can be discriminated in these trajectory figures).

Fig. 13. Similarity of incline and slow sequences

Slow and incline sequences of 10 persons were randomly selected to evaluate our algorithm. The similarity plot is shown in Fig. 13; the more similar two sequences are, the darker the entry. The diagonal describes the similarity of the two different gaits of the same person, which is the highest, and so it can be used for recognition.

After feature extraction and alignment, the Euclidean distance and the nearest-neighbor (1NN) rule are applied for matching. The static features reflect the body characteristics; the recognition accuracy obtained with the lengths of the eight segments is 60%. The dynamic features describe the characteristics of the motion; the time alignment keeps the motion parameters of the whole sequence, providing a recognition rate of 60%.

If we add the parameters of the upper limbs, there are 16 dynamic features in total. Besides the distances of the left knee, right knee, left ankle and right ankle, and the distances between the two knees and between the two ankles, there are the distances of the left hip, right hip, head top, neck, left hand, right hand, left elbow, right elbow, left shoulder and right shoulder. All these distances are taken relative to the root node (MidHip) during the motion, except the distances between the two knees and between the two ankles.

Fig. 14. Recognition rate with different numbers of dynamic features

Fig. 14 shows the results obtained with different numbers of dynamic features. We can see that the motion parameters of the upper limbs are not helpful for recognition. On the contrary, because of their inaccuracy and instability, the recognition rates decrease when these features are added. The best results are achieved using only the six features from the lower limbs (LeftKnee, RightKnee, LeftFoot, RightFoot, DisBetwKnee, DisBetwFoot).

When combining the static features and the dynamic features, and testing the incline walking sequences against the slow dataset, a recognition rate of 70% was obtained. The combination of the two kinds of features thus describes gait in both appearance and motion, increasing the efficiency and accuracy of the analysis.

6. Conclusions

In this paper, an approach for gait tracking and recognition based on multiple cameras is proposed. A tracking method using a local optimization algorithm is presented, and gait feature extraction and recognition are studied. The main advantage of the proposed method is that it can deal with the effects of viewpoint and surface variations, which are difficult to handle with 2D analysis. In recognition, static parameters and dynamic features are combined to describe gait.
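The matching pipeline used in the experiments (LTN time normalization followed by Euclidean-distance 1NN) can be sketched as follows. This is an illustrative numpy sketch with synthetic trajectories, not the authors' code; the subject labels and the choice of two features per frame are placeholders.

```python
import numpy as np

def ltn(traj, length=100):
    """Linear time normalization: linearly resample a one-period
    trajectory (frames x features) to a fixed number of frames."""
    traj = np.asarray(traj, dtype=float)
    src = np.linspace(0.0, 1.0, traj.shape[0])
    dst = np.linspace(0.0, 1.0, length)
    return np.column_stack([np.interp(dst, src, traj[:, j])
                            for j in range(traj.shape[1])])

def nearest_neighbor(gallery, probe):
    """1NN: label of the gallery sequence with the smallest Euclidean
    distance to the probe, both LTN-normalized to the same length."""
    p = ltn(probe).ravel()
    dists = {label: np.linalg.norm(ltn(g).ravel() - p)
             for label, g in gallery.items()}
    return min(dists, key=dists.get)

# Synthetic one-period trajectories (frames x 2 features) for two subjects,
# sampled at different frame counts to mimic different walking speeds.
t_a = np.linspace(0.0, 2.0 * np.pi, 80)
t_b = np.linspace(0.0, 2.0 * np.pi, 120)
gallery = {"subj_a": np.column_stack([np.sin(t_a), np.cos(t_a)]),
           "subj_b": np.column_stack([2.0 * np.sin(t_b), np.cos(t_b)])}
probe = np.column_stack([2.0 * np.sin(t_a), np.cos(t_a)])  # subj_b, new speed
match = nearest_neighbor(gallery, probe)  # -> "subj_b"
```

Because both sequences start from the same key pose (P1) and are resampled to a common length, corresponding frames describe corresponding phases of the gait cycle, which is what makes the frame-wise Euclidean distance meaningful.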

Fig. 12. Trajectories of the distance between the two ankles (a) and the distance between the two knees (b) of 04011_incline with ten persons’ slow sequences; relative distances of the left ankle (c) and the right ankle (d) of 04022_incline with ten persons’ slow sequences
Because in 2D recognition differences in surfaces have a strong effect on recognition accuracy, ten persons were randomly selected in our experiments to evaluate our method. The slow dataset is used as the training set, and the incline walking dataset as the test set to be recognized against the slow dataset. A recognition rate of 70% is achieved. Moreover, the results of the experiments with different dynamic features show the efficiency of the 3D analysis, the importance of the lower-limb motion, and the low reliability of the upper limbs. However, the initialization is still performed manually, and the reliability of the tracking also needs to be improved.

Acknowledgements

This work is partly supported by the Academy of Finland.

References

[1] A. Kale, A.N. Rajagopalan, N. Cuntoor, V. Kruger, and R. Chellappa, “Identification of Humans Using Gait”, IEEE Transactions on Image Processing, 2004, 13(9): 1163-1173.
[2] N. Cuntoor, A. Kale, and R. Chellappa, “Combining Multiple Evidences for Gait Recognition”, ICASSP 2003, Hong Kong, pp. 6-10.
[3] C. BenAbdelkader, R. Cutler, and L. Davis, “Gait Recognition Using Image Self-Similarity”, EURASIP Journal on Applied Signal Processing, 2004, 4: 572-585.
[4] J.E. Boyd and J.J. Little, “Biometric Gait Recognition”, Biometrics School 2003, LNCS 3161: 19-42, 2005.
[5] R.T. Collins, Y. Liu, and Y. Tsin, “Gait Sequence Analysis Using Frieze Patterns”, In ECCV, 2002, vol. 2, pp. 659-671.
[6] G. Zhao, L. Cui, and H. Li, “Combining Wavelet Velocity Moments and Reflective Symmetry for Gait Recognition”, In International Workshop on Biometric Recognition Systems (IWBRS), 2005, Beijing. LNCS, 205-212.
[7] L. Wang, T. Tan, H. Ning, and W. Hu, “Silhouette Analysis-based Gait Recognition for Human Identification”, IEEE Trans. PAMI, 2003, 25(12): 1505-1518.
[8] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, and K.W. Bowyer, “The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis”, IEEE Trans. PAMI, 2005, 27(2): 162-177.
[9] C.Y. Yam, M.S. Nixon, and J.N. Carter, “Gait Recognition by Walking and Running: A Model-Based Approach”, ACCV 2002, pp. 1-6.
[10] D. Meyer, J. Denzler, and H. Niemann, “Model Based Extraction of Articulated Objects in Image Sequences for Gait Analysis”, ICIP, 1997, pp. 78-81.
[11] L. Lee and W.E.L. Grimson, “Gait Appearance for Recognition”, Biometric Authentication, pp. 143-154, 2002.
[12] G.V. Veres, M.S. Nixon, and J.N. Carter, “Model-based Approaches for Predicting Gait Changes Over Time”, IWBRS, 2005, Beijing. LNCS, 213-220.
[13] C. BenAbdelkader, R. Cutler, and L. Davis, “Person Identification Using Automatic Height and Stride Estimation”, In Proceedings of ICPR 2002: 377-380.
[14] A.Y. Johnson and A.F. Bobick, “A Multi-view Method for Gait Recognition Using Static Body Parameters”, AVBPA 2001, Halmstad, Sweden: 301-311.
[15] C. Lee and A. Elgammal, “Towards Scalable View-Invariant Gait Recognition: Multilinear Analysis for Gait”, AVBPA 2005: 395-405.
[16] S.L. Dockstader and A.M. Tekalp, “A Kinematic Model for Human Motion and Gait Analysis”, In Proc. of the Workshop on Statistical Methods in Video Processing (ECCV), pp. 49-54, Copenhagen, Denmark, June 2002.
[17] R. Urtasun and P. Fua, “3D Tracking for Gait Characterization and Recognition”, FGR, 2004, pp. 17-22.
[18] L. Lee, G. Dalley, and K. Tieu, “Learning Pedestrian Models for Silhouette Refinement”, ICCV 2003, 663-670.
[19] R.T. Collins, R. Gross, and J. Shi, “Silhouette-based Human Identification from Body Shape and Gait”, FG’02, pp. 366-371.

