Proceedings of the 7th International Conference on Automatic Face and Gesture Recognition (FGR’06)
0-7695-2503-2/06 $20.00 © 2006 IEEE
are extracted. Finally, the gait analysis or recognition is performed. Dockstader et al. introduced a complex method based on hard and soft kinematic constraints for 3D tracking and extraction of gait patterns in human motion [16]. They experimented with data from only two persons, with an emphasis on constructing human models and tracking, but not on recognition. Urtasun and Fua [17] proposed an approach based on matching 3D motion models to synthesized video, and on tracking and recovering motion parameters. They performed tests on four people with nine speed variations, emphasizing robustness to speed changes.
In real-world environments, a 2D analysis is easily affected by varying viewpoints, occlusion and surface variations, and cannot provide correct and accurate results.
In this paper, we propose a novel approach to 3D gait recognition. In our method, gait sequences captured by multiple cameras are tracked, trajectories of key joints are extracted as dynamic features, and lengths of body segments are used as static parameters to assist analysis, simulation and recognition.

system can rotate around this joint according to the rotation parameters of the joint. The parameters of a joint include a rotation vector N (1x3) and a translation vector T (1x3) expressing the displacement relative to the father joint node. This displacement is set according to measurements obtained from the body. In tracking, the length of the skeleton remains unchanged, so the translation vector T is fixed. A skeleton model with 10 joints and 24 degrees of freedom (DOFs), shown in Fig. 2, is used in our implementation. Fig. 3(a) shows the appearance model (here a truncated conic model is applied) and Fig. 3(b) the projected image using our models.

Fig. 3. (a) Human appearance model (b) Estimated edge points and inner points after projection
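The per-joint parameterization described above (a rotation applied at each joint, plus a fixed translation T relative to the father joint) can be sketched as a simple kinematic chain. This is an illustrative reconstruction, not the paper's implementation; the joint names and segment lengths below are hypothetical:

```python
import numpy as np

def rodrigues(a):
    """Rotation matrix for an axis-angle vector a (Rodrigues formula)."""
    theta = np.linalg.norm(a)
    if theta < 1e-12:
        return np.eye(3)
    k = a / theta                      # unit rotation axis
    K = np.array([[0.0, -k[2], k[1]],  # anti-symmetric (cross-product) matrix [k x]
                  [k[2], 0.0, -k[0]],
                  [-k[1], k[0], 0.0]])
    return np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * (K @ K)

def joint_transform(a, T):
    """4x4 rigid transform of a joint relative to its father joint:
    rotation from the axis-angle vector a, fixed translation T (bone offset)."""
    M = np.eye(4)
    M[:3, :3] = rodrigues(np.asarray(a, dtype=float))
    M[:3, 3] = T
    return M

# Hypothetical chain root -> hip -> knee -> ankle; the axis-angle values and
# segment lengths (metres) are made-up illustrative numbers.
chain = [joint_transform([0.0, 0.0, 0.1], [0.0, -0.10, 0.0]),   # hip
         joint_transform([0.3, 0.0, 0.0], [0.0, -0.45, 0.0]),   # knee
         joint_transform([-0.2, 0.0, 0.0], [0.0, -0.42, 0.0])]  # ankle

M = np.eye(4)
for Mi in chain:
    M = M @ Mi                         # accumulate transforms down the chain
ankle_pos = M[:3, 3]                   # ankle position in the root frame
print(ankle_pos)
```

Because T is fixed during tracking, only the axis-angle vectors change from frame to frame, which is what makes them the natural optimization variables.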
3. Tracking

The axis-angle representation is used to parameterize rotation. It describes a rotation by an angle θ = ||a|| around the axis direction a/||a|| (a is an axis-angle vector). The corresponding rotation matrix can be computed using the Rodrigues formula:

R_a(θ) = I_3 + sin(θ)[a/||a|| ×] + (1 − cos(θ))[a/||a|| ×]²   (1)

where [· ×] denotes the anti-symmetric (cross-product) matrix of a vector. On the basis of a and T, we can compute the relative rigid-body transformation from the joint coordinate system to the father node coordinate system:

M = [ R_a    T ]
    [ 0_1×3  1 ]

The Gauss-Newton algorithm is appropriate for solving non-linear least-squares problems; its objective function has the form of a sum of squared residuals:

f(x) = Σ_{i=1}^{M} r_i²(x)

Suppose that J is the Jacobian matrix of the residual vector r = [r_1, r_2, ..., r_M]', where M is the number of visible points on the model surface, and x is the parameter vector including all the axis angles to be optimized:

J(x) = [ ∂r_1/∂x_1 ... ∂r_1/∂x_N ]
       [    ...    ...    ...    ]   (2)
       [ ∂r_M/∂x_1 ... ∂r_M/∂x_N ]

Then the first derivative of the objective function is ∇f(x) = 2J^T r. Supposing the norm of the residuals r_i at the optimum is small, the second-order Hessian matrix can be approximated by

H(x) = 2J^T J + 2 Σ_{i=1}^{M} r_i (∂²r_i / ∂x ∂x^T) ≈ 2J^T J

According to the Newton optimization formula, the iteration for minimizing the objective function f(x) is:

x_{n+1} = x_n − H(x_n)^{-1} ∇f(x_n)
        = x_n − (J^T J)^{-1} J^T r(x_n)   (3)
        = x_n + q_n

In this paper, the objective function is built from three different image features: luminance, edges and the silhouette, as shown in Fig. 5. The edges and the silhouette constrain the projected position of the human shape model, while the luminance constrains the motion between two neighboring frames.

Fig. 6 demonstrates the framework, which can be summarized as follows:

Fig. 6. Framework

1) Set up the human model;
2) Initialize manually to get the initial pose vector for the images of the first time step;
3) Input: images I(t), I(t+1) from the multiple cameras at times t and t+1, and the pose parameter vector x(t) for I(t); set x_0(t+1) = x(t);
4) Iterate, with n = 0, extracting image features:
a) Compute model-projection image features from the current pose x_n(t+1);
b) Compare the input image features with the model-projection features, then compute the residual vector r;
c) Solve J q_n + r(x_n) = 0 in the least-squares sense to get the modification increment q_n = −(J^T J)^{-1} J^T r(x_n);
d) Modify: x_{n+1} = x_n + q_n, n = n + 1.
The iteration stops when the residual vector r is smaller than a threshold or the iteration count n exceeds a given value; the pose vector of the current multi-camera images is then output as the resulting pose and as the initial pose for the frames of the next time step.
Fig. 7 shows the tracking results.

Fig. 7. (a) Foreground detection (b) Tracking (c) Skeleton model simulation
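The Gauss-Newton update of Eq. (3) can be sketched on a generic least-squares problem. This is an illustrative example with a made-up residual (fitting an exponential decay), not the paper's image-feature residuals; here the Jacobian is available analytically:

```python
import numpy as np

def gauss_newton(residual, jacobian, x0, tol=1e-10, max_iter=50):
    """Minimize f(x) = sum_i r_i(x)^2 with the update
    x_{n+1} = x_n - (J^T J)^{-1} J^T r(x_n)   (Eq. (3))."""
    x = np.asarray(x0, dtype=float)
    for _ in range(max_iter):
        r = residual(x)
        if np.linalg.norm(r) < tol:
            break
        J = jacobian(x)
        # Solve J q + r = 0 via the normal equations: q = -(J^T J)^{-1} J^T r
        q = -np.linalg.solve(J.T @ J, J.T @ r)
        x = x + q
    return x

# Made-up data from y = 2 * exp(-0.5 * t); parameters x = (amplitude, rate)
t = np.linspace(0.0, 4.0, 20)
y = 2.0 * np.exp(-0.5 * t)

def residual(x):
    return x[0] * np.exp(-x[1] * t) - y

def jacobian(x):
    # Columns: dr/d(amplitude), dr/d(rate)
    return np.column_stack([np.exp(-x[1] * t),
                            -x[0] * t * np.exp(-x[1] * t)])

x_hat = gauss_newton(residual, jacobian, x0=[1.5, 0.6])
print(x_hat)   # approximately [2.0, 0.5]
```

In the paper's setting the residuals compare projected model features with image features, so the Jacobian of Eq. (2) would be filled with derivatives of those feature positions with respect to the joint axis angles.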
Fig. 5. (a) Edge features (b) Silhouette features (c) Matching of edge points from the model projection and the input image

4. Gait recognition

Gait is mainly a motion of the lower limbs, although the motion of the upper limbs can also affect gait. The upper-limb motion parameters are not selected as gait features for the following two reasons: 1) In our
experiments, because of self-occlusion and inter-occlusion of the upper limbs and body, tracking of the upper limbs is not correct or reliable, and usually fails. This conclusion is the same as in [16, 17]; 2) In walking, the poses of the upper limbs often change, for example when carrying a suitcase, lifting something or pulling baggage, so the inconstancy of the upper limbs makes us "distrust" their parameters in recognition. Therefore, only motion parameters of the lower limbs are applied as dynamic features in our recognition.
Intuitively, when recognizing a person at a distance, people first judge according to bodily form and/or proportion, and then according to the way of walking. So we combine the static and dynamic features for recognition. The static features here refer to whole-body information, such as height and leg length, which do not change with the variation of mental state (for example, being drunk), while the dynamic features describe the motion of a human. Trajectories of the positions of the knee and ankle joints relative to the root node are extracted as dynamic features. These two kinds of features are combined to recognize gait.
Different persons have different walking speeds, and even for one person the speed often changes at different times and in different situations. Therefore, it is important to take these variations into account. They can be handled using time normalization, which can be implemented with linear or non-linear methods, such as linear time normalization (LTN) or dynamic time warping (DTW).
In LTN, suppose that there are R frames in the training data and T frames in the test sequence (usually T ≠ R). In matching, every frame in the training data is compared to certain test frames determined by linearly rescaling the time axis. This method is appropriate for situations in which there is a linear or approximately linear relation between the times of the decomposed activity and the whole activity. DTW is a dynamic-programming method, appropriate for situations in which there are no rules relating the decomposed and whole activity times, but just an ordering constraint on the activities.

5. Experiments
respectively. The total number of trajectories used as dynamic features to describe gait motion is six.

Fig. 11. Alignment of the key joints' trajectories of one subject's incline and slow sequences in one period using LTN

We define four key poses in gait, similar to [19]: P1: the two legs are together and the swinging right leg passes the planted left foot; P2: the two legs are furthest apart and the right leg is in front; P3: the two legs are together and the swinging left leg passes the planted right foot; P4: the two legs are furthest apart and the left leg is in front. Considering that each walking period includes these four key poses, and that the time between two neighboring key poses has a linear or approximately linear relation to the whole period, LTN is applied for time normalization. In fact, in pose matching we intuitively expect that P1 should match P1. If DTW is used, the motion parameters of similar poses may be very different, and this correspondence cannot be found. Fig. 11 shows the alignment of the key joints' trajectories of subject 04011_incline and 04011_slow in one period using LTN.
In time alignment, it is required that the start pose is the same. In our experiments, when extracting a one-period sequence, the period starts from pose P1. This makes alignment and matching easy and reliable.
Fig. 12 shows the trajectory of the distance between the two ankles (a) and the distance between the two knees (b) of 04011_incline with ten persons' slow sequences, and the relative distance of the left ankle (c) and right ankle (d) of 04022_incline with ten persons' slow sequences (the two lines with "*" describe the two different sequences of the same person; the person can be discriminated in this trajectory figure).
describes the similarity of two different gaits of the same person, which are more similar, and so it can be used for recognition.
After feature extraction and alignment, Euclidean distance and 1-NN are applied for measurement. Static features reflect the body characteristics. The recognition accuracy obtained with the lengths of eight segments is 60%. Dynamic features describe the characteristics of motion. The time alignment keeps the motion parameters of the whole sequence, providing a recognition rate of 60%.
If we add the parameters of the upper limbs, there are 16 dynamic features in total. Besides the distances of the left knee, right knee, left ankle and right ankle, the distance between the two knees and the distance between the two ankles, there are the distances of the left hip, right hip, head top, neck, left hand, right hand, left elbow, right elbow, left shoulder and right shoulder. All these distances are measured relative to the root node (MidHip) in motion, except the distance between the two knees and that between the two ankles.

Fig. 14. Relationship between recognition and different numbers of dynamic features

Fig. 14 shows the results obtained with different numbers of dynamic features. We can see that the motion parameters of the upper limbs are not helpful for recognition. On the contrary, because of their inaccuracy and instability, the recognition rates decrease when these features are added. The best results are achieved using only six features from the lower limbs (LeftKnee, RightKnee, LeftFoot, RightFoot, DisBetwKnee, DisBetwFoot).
When combining the static and dynamic features, and testing inclined walking against the slow dataset, a recognition rate of 70% was obtained. So the combination of these two kinds of features can describe gait in both appearance and motion, increasing the efficiency and accuracy of analysis.
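The LTN alignment and Euclidean 1-NN measurement described above can be sketched as follows. This is an illustrative reconstruction under stated assumptions: each trajectory is a 1-D array over one gait period, LTN is implemented as linear resampling to a fixed length, and the subject names and signals are hypothetical:

```python
import numpy as np

def ltn(seq, length=50):
    """Linear time normalization: linearly resample a one-period
    trajectory (R frames) to a fixed number of frames."""
    seq = np.asarray(seq, dtype=float)
    src = np.linspace(0.0, 1.0, len(seq))
    dst = np.linspace(0.0, 1.0, length)
    return np.interp(dst, src, seq)

def nn1(test_vec, gallery):
    """1-NN classification with Euclidean distance.
    gallery: dict mapping subject id -> aligned feature vector."""
    dists = {sid: np.linalg.norm(test_vec - v) for sid, v in gallery.items()}
    return min(dists, key=dists.get)

# Hypothetical one-period knee-distance trajectories with different frame counts
gallery = {
    "subj_A": ltn(np.sin(np.linspace(0.0, 2.0 * np.pi, 37))),
    "subj_B": ltn(0.5 * np.sin(np.linspace(0.0, 2.0 * np.pi, 61))),
}
# A test sequence of yet another length, similar to subject A's gait
test = ltn(np.sin(np.linspace(0.0, 2.0 * np.pi, 44)) + 0.05)

print(nn1(test, gallery))   # -> subj_A
```

Because every sequence is resampled to the same length, frame i of the test sequence always corresponds to frame i of the training sequence, which is the fixed correspondence the paper relies on when matching key poses P1-P4.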
6. Conclusions
Fig. 12. Trajectory of the distance between the two ankles (a) and the distance between the two knees (b) of 04011_incline with ten persons' slow sequences; relative distance of the left ankle (c) and right ankle (d) of 04022_incline with ten persons' slow sequences
Because in 2D recognition differences in surfaces have a larger effect on recognition accuracy, ten persons were randomly selected in our experiments to evaluate our method. A slow-walking dataset is used as the training set, and an inclined-walking dataset as the test set to be recognized against the slow dataset. A recognition rate of 70% is achieved. Moreover, the results of experiments using different dynamic features show the efficiency of 3D analysis, the importance of lower-limb motion, and the low reliability of the upper limbs. However, initialization is still performed manually, and the reliability of tracking also needs to be improved.

Acknowledgements
This work is partly supported by the Academy of Finland.

References
[1] A. Kale, A.N. Rajagopalan, N. Cuntoor, V. Kruger, and R. Chellappa, "Identification of Humans Using Gait", IEEE Transactions on Image Processing, 2004, 13(9): 1163-1173.
[2] N. Cuntoor, A. Kale and R. Chellappa, "Combining Multiple Evidences for Gait Recognition", ICASSP 2003, Hong Kong, pp. 6-10.
[3] C. BenAbdelkader, R. Cutler and L. Davis, "Gait Recognition Using Image Self-Similarity", EURASIP Journal on Applied Signal Processing, 2004, 4: 572-585.
[4] J.E. Boyd and J.J. Little, "Biometric Gait Recognition", Biometrics School 2003, LNCS 3161: 19-42, 2005.
[5] R.T. Collins, Y. Liu, and Y. Tsin, "Gait Sequence Analysis Using Frieze Patterns", In ECCV, 2002, vol. 2, pp. 659-671.
[6] G. Zhao, L. Cui, and H. Li, "Combining Wavelet Velocity Moments and Reflective Symmetry for Gait Recognition", In International Workshop on Biometric Recognition Systems (IWBRS), 2005, Beijing. LNCS, 205-212.
[7] L. Wang, T. Tan, H. Ning and W. Hu, "Silhouette Analysis-based Gait Recognition for Human Identification", IEEE Trans. PAMI, 2003, 25(12): 1505-1518.
[8] S. Sarkar, P.J. Phillips, Z. Liu, I.R. Vega, P. Grother, and K.W. Bowyer, "The HumanID Gait Challenge Problem: Data Sets, Performance, and Analysis", IEEE Trans. PAMI, 2005, 27(2): 162-177.
[9] C.Y. Yam, M.S. Nixon, and J.N. Carter, "Gait Recognition by Walking and Running: A Model-Based Approach", ACCV 2002, pp. 1-6.
[10] D. Meyer, J. Denzler and H. Niemann, "Model Based Extraction of Articulated Objects in Image Sequences for Gait Analysis", ICIP, 1997, pp. 78-81.
[11] L. Lee and W.E.L. Grimson, "Gait Appearance for Recognition", Biometric Authentication, pp. 143-154, 2002.
[12] G.V. Veres, M.S. Nixon, and J.N. Carter, "Model-based Approaches for Predicting Gait Changes Over Time", IWBRS, 2005, Beijing. LNCS, 213-220.
[13] C. BenAbdelkader, R. Cutler and L. Davis, "Person Identification Using Automatic Height and Stride Estimation", In Proceedings of ICPR 2002: 377-380.
[14] A.Y. Johnson and A.F. Bobick, "A Multi-view Method for Gait Recognition Using Static Body Parameters", AVBPA 2001, Halmstad, Sweden: 301-311.
[15] C. Lee and A. Elgammal, "Towards Scalable View-Invariant Gait Recognition: Multilinear Analysis for Gait", AVBPA 2005: 395-405.
[16] S.L. Dockstader and A.M. Tekalp, "A Kinematic Model for Human Motion and Gait Analysis", In Proc. of the Workshop on Statistical Methods in Video Processing (ECCV), pp. 49-54, Copenhagen, Denmark, June 2002.
[17] R. Urtasun and P. Fua, "3D Tracking for Gait Characterization and Recognition", FGR, 2004, pp. 17-22.
[18] L. Lee, G. Dalley, and K. Tieu, "Learning Pedestrian Models for Silhouette Refinement", ICCV 2003, 663-670.
[19] R.T. Collins, R. Gross, and J. Shi, "Silhouette-based Human Identification from Body Shape and Gait", FG'02, pp. 366-371.