
Multi-View Gait Image Generation for Cross-View Gait Recognition

Xin Chen, Xizhao Luo, Member, IEEE, Jian Weng, Member, IEEE, Weiqi Luo, Huiting Li, and Qi Tian, Fellow, IEEE

Abstract— Gait recognition aims to recognize persons' identities by their walking styles. Compared with face and fingerprint recognition, gait recognition has unique advantages owing to its non-contact and long-distance characteristics. Cross-view gait recognition is a challenging task because view variance can have a large impact on gait silhouettes. The development of deep learning has promoted cross-view gait recognition performance to a higher level. However, the performance of existing deep learning-based cross-view gait recognition methods is limited by the lack of gait samples under different views. In this paper, we propose a Multi-view Gait Generative Adversarial Network (MvGGAN) to generate fake gait samples that extend existing gait datasets, which provides adequate gait samples for deep learning-based cross-view gait recognition methods. The proposed MvGGAN method trains a single generator for all view pairs involved in single or multiple datasets. Moreover, we perform domain alignment based on projected maximum mean discrepancy to reduce the influence of the distribution divergence caused by sample generation. The experimental results on the CASIA-B and OUMVLP datasets demonstrate that the fake gait samples generated by the proposed MvGGAN method obviously improve the performance of existing state-of-the-art cross-view gait recognition methods under both single-dataset and cross-dataset evaluation settings.

Index Terms— Cross-view gait recognition, gait image generation, multi-domain generative adversarial networks, domain alignment, convolutional neural networks.

Manuscript received May 4, 2020; revised December 26, 2020; accepted January 20, 2021. Date of publication February 5, 2021; date of current version February 22, 2021. This work was supported in part by the National Natural Science Foundation of China under Grant U1736203, Grant 61932010, Grant 61825203, Grant 61732021, Grant 61877029, Grant 61872153, Grant 61802145, Grant U1636209, Grant 61972454, and Grant 61906074; in part by the National Key Research and Development Plan of China under Grant 2017YFB0802203 and Grant 2018YFB1003701; in part by the Major Program of Guangdong Basic and Applied Research under Grant 2019B030302008; in part by the National Joint Engineering Research Center of Network Security Detection and Protection Technology, Guangdong Provincial Special Funds for Applied Technology Research and Development and Transformation of Important Scientific and Technological Achievements under Grant 2016B010124009 and Grant 2017B010124002; in part by the Guangdong Key Laboratory of Data Security and Privacy Preserving under Grant 2017B030301004; in part by the Project Funded by the China Postdoctoral Science Foundation under Grant 2019M650232; in part by the National Key Research and Development Program of China under Grant 2018YFB1402600; in part by the Science and Technology Program of Guangzhou of China under Grant 201802010061; in part by the Natural Science Foundation of Jiangsu Province under Grant BK20201405; in part by the Guangdong Basic and Applied Basic Research Foundation under Grant 2019A1515011276; and in part by the Natural Science Foundation of Guangdong Province, China, under Grant 2019B010136003 and Grant 2019B010137005. The associate editor coordinating the review of this manuscript and approving it for publication was Prof. Mireille Boutin. (Corresponding author: Xizhao Luo.)

Xin Chen is with the College of Electronic Engineering and the College of Artificial Intelligence, South China Agricultural University, Guangzhou 510642, China (e-mail: chenxin@jnu.edu.cn).

Xizhao Luo is with the School of Computer Science and Technology, Soochow University, Suzhou 215006, China (e-mail: xzluo@suda.edu.cn).

Jian Weng, Weiqi Luo, and Huiting Li are with the Guangdong Key Laboratory of Data Security and Privacy Preserving, the Guangzhou Key Laboratory of Data Security and Privacy Preserving, the Guangdong Engineering Research Center of Data Security and Privacy Preserving, and the College of Information Science and Technology, Jinan University, Guangzhou 510632, China (e-mail: cryptjweng@gmail.com; lwq@jnu.edu.cn; lihuting7@gmail.com).

Qi Tian is with Noah's Ark Laboratory, Huawei Inc., Shenzhen 518129, China (e-mail: tian.qi1@huawei.com).

Digital Object Identifier 10.1109/TIP.2021.3055936

I. INTRODUCTION

Gait recognition aims to recognize a pedestrian's identity based on walking styles. Gait can be captured at a long distance without direct contact, and is hard to disguise once formed. This is an important advantage compared with face recognition and fingerprint recognition, which require subjects' active cooperation. Thus, gait recognition has become an increasingly popular research direction in the intelligent monitoring field.

Gait recognition is easily affected by several interference factors, including dress, carrying, and view. Among these factors, view variation has the most obvious influence on recognition performance because it can change a subject's silhouette to a large extent [1]. In reality, capturing gait data under several views is quite expensive, and it is nearly impossible to capture gait data under all views. Cross-view gait recognition, which aims to recognize gait data under unknown views based on gait data under known views, is therefore an important challenge.

Existing cross-view gait recognition methods can be mainly classified into two types: view transformation model (VTM) methods [2]–[6] and view-invariant feature extraction methods [7]–[11]. As to VTM methods, Makihara et al. [2] realized view transformation based on the frequency spectrum characteristics of gait sequences. Muramatsu et al. [5] realized arbitrary view transformation by constructing 3D gait volumes from training sequences and projecting them onto arbitrary 2D view planes in the test phase. As to view-invariant feature extraction methods, Martín-Félez and Xiang [9] trained a ranking model to learn stable gait features existing in gait sequences under different walking conditions. Xing et al. [8] discovered common features of multi-view gait sequences by a canonical correlation analysis method. With the rapid development of deep learning
techniques, deep convolutional neural networks have shown great potential in improving gait recognition performance. Wu et al. [12] trained siamese deep networks on gait sequences under eleven views to realize end-to-end cross-view gait recognition and achieved obvious performance improvements compared with traditional methods. Shiraga et al. [13] designed a specific deep network taking the gait energy image (GEI) as input and significantly outperformed state-of-the-art gait recognition approaches. These works demonstrate the effectiveness of deep learning methods in gait recognition. However, the discriminative capacities of deep networks in gait recognition are still limited because existing gait datasets are quite small compared with those of other image recognition tasks [14]. If larger gait datasets were used to train deep networks, gait recognition performance would be further improved significantly. In actual scenes, it is very time- and labor-consuming to collect massive gait samples under a large number of views for one subject. Recently, generative adversarial networks (GAN) have achieved great success in generating vivid fake images [15]–[17]. Wei et al. [18] proposed a person transfer GAN (PTGAN) to realize style transformation that preserves foreground identity information between different datasets. The experimental results showed that transferring a person ReID dataset to another dataset can improve performance on the latter dataset. Zheng et al. [19] added fake person images generated by GANs to increase training data, and achieved encouraging results in person re-identification tasks. These works demonstrate that generative models are helpful in improving performance by increasing the amount of training data. In the gait recognition field, Yu et al. [20] introduced GaitGAN to generate invariant normal gait images under the side view based on gait images under other views. In that work, the GAN is taken as a regressor to realize gait transformation to side views, so it can be classified into the second type of cross-view gait recognition methods, that is, view-invariant feature extraction methods. This work suggested that GANs can generate gait samples with encouraging results. However, the gait samples generated by GaitGAN are used directly in the recognition phase without increasing the amount of training data. Moreover, GaitGAN can only generate gait samples under the side view.

To extend limited gait data, in this paper, we propose to add fake gait samples under different views generated by a GAN to increase the amount of training samples, and to use the enlarged training set including fake samples to train deep networks for gait recognition. More training samples will increase the generalization capacity of deep networks [21], so gait recognition performance will be further improved. The process of generating gait samples under one view from another view can be seen as a domain transformation problem where each view corresponds to a domain, and multi-view gait image synthesis can be seen as a multi-domain translation task. In multi-domain translation tasks, supposing there are d domains to be translated, most existing methods need to train d × (d − 1) generators [22]. In gait recognition, since the gait samples in existing gait datasets are quite limited, training one generator independently for each domain pair suffers from a serious over-fitting problem. Moreover, training generators independently cannot make full use of the common features among different domains. To address this problem, we use a single generator for all domain pairs, which takes both an image and a domain label as input. In this way, a gait image can be translated to any desired domain by controlling the domain labels. This design is similar to the StarGAN method proposed in [23]. The main differences include two aspects: one is that StarGAN does not need to preserve identity information while gait image translation does; the other is that StarGAN only needs to translate static images while gait image translation needs to preserve the dynamic information in gait sequences.

After training the generative adversarial network, we can extend gait datasets by generating fake gait samples. The gait sample generation process covering two gait datasets is illustrated in Fig. 1. By generating gait samples under the merged views of different gait datasets, the gait samples of each dataset are extended to a large extent. By t-SNE visualization of real and fake gait samples, we find that there are obvious distribution gaps between real and fake samples, which may influence the generalization capacity of gait classification models. To reduce this influence, we take the projected maximum mean discrepancy (MMD) to measure the distribution divergence between real and fake samples, and then train a domain alignment module to minimize the distribution divergence. In this way, the distributions of real and fake samples are aligned to some extent. We then train deep networks for gait recognition based on the extended gait datasets, which improves the generalization and discriminative ability of the gait recognition network.

We highlight the main contributions of our method as follows:
• Gait images under different views are automatically generated by a single generator, which not only takes advantage of the common gait characteristics of different subjects and different datasets, but also makes the single generator learn richer gait information from different subjects and different datasets, increasing the diversity of the generated fake samples.
• We perform domain alignment based on projected maximum mean discrepancy (MMD) for real and fake gait samples to reduce the influence of the domain shift caused by fake sample generation.
• We extend existing gait datasets by adding fake gait images generated from the same or different datasets, which improves the generalization capacities of gait classification models. Apart from views, gait images under other domains are also generated to further extend gait datasets to more walking conditions, including dress and carrying variances. This further improves gait recognition accuracies under more realistic walking conditions. The proposed method achieves obvious accuracy improvements in cross-dataset experimental settings, and outperforms the state-of-the-art performance on the CASIA-B and OUMVLP datasets.

II. RELATED WORK

A. Gait Recognition

Traditional gait recognition methods can be classified into two categories: model-based methods [24]–[26] and
appearance-based methods [1], [27]–[33]. Model-based methods reconstruct human body shape and structure geometrically, which requires high image quality. Appearance-based methods extract gait features directly from gait image sequences, which is more efficient and flexible. 3D reconstruction is a typical technique in model-based gait recognition methods. Tang et al. [34] reconstructed 3D objects based on gait silhouettes, and extracted silhouettes in different views by re-projecting the 3D objects onto 2D space. Liao et al. [35] proposed PoseGait, which estimates the pedestrian's 3D pose by CNN and then extracts spatio-temporal features, invariant to view variance, from the estimated 3D pose. As to appearance-based methods, feature selection is a typical approach. Rida et al. selected discriminative gait features based on Group Lasso of Motion [29], based on modified phase-only correlation [30], based on Statistical Dependency and Globality-Locality Preserving Projections [32], and based on a feature selection mask [33]. These feature selection methods can effectively select discriminative gait features to reduce intra-class variations.

As appearance-based methods are easily influenced by several factors like dress, carrying, and view, many methods have been proposed to reduce the effect of these factors [36]–[39]. View variance is recognized as having a larger influence than the other factors, so many cross-view gait recognition methods have appeared [40], [41]. Muramatsu et al. [6] proposed quality measures to estimate the bias degree of dissimilarity scores in VTMs. Li et al. [11] combined GEI's discriminative subspace projections and collaborative representation classification to achieve robust and efficient performance.

In recent years, the development of deep learning methods has pushed gait recognition performance to a higher level [42]–[46]. Alotaibi and Mahmood [47] developed a specialized deep CNN architecture for gait recognition, which avoided several shortcomings of subspace learning methods. A multi-layer perceptron based neural network architecture was proposed in [48] for Kinect-based gait recognition, which introduced two unique geometric features: joint relative cosine dissimilarity and joint relative triangle area. Wang and Yan [45] designed a deep gait recognition model using convolutional Long Short-Term Memory to take advantage of gait temporality. He et al. [49] proposed multi-task generative adversarial networks (MGANs) to generate period energy images (PEI) under a specific view after feature transformation in the encoded latent space; this method can extract richer features through an adversarial training mechanism and is more interpretable for cross-view gait recognition. Li et al. [50] proposed a joint intensity transformer deep network robust to clothing and carrying variations. Chao et al. [44] creatively regarded a gait sequence as a set of independent frames and designed a deep network (GaitSet) to complete identity recognition based on the set; this method is immune to frame permutation and incomplete silhouettes. Zhang et al. [51] proposed a robust and effective angle center loss function, extracted discriminative spatial-temporal features by a spatial transformer network, and learned each frame's attention score by long short-term memory (LSTM) units; this method achieved superior performance on most existing gait datasets. Since deep learning methods have presented obvious advantages over conventional methods, we make use of fake gait samples generated by GANs to improve the gait recognition performance of deep networks [44].

Fig. 1. The proposed Multi-view Generative Adversarial Network (MvGGAN) framework applied in cross-dataset settings.

B. Image Translation

Recently, image translation methods have achieved impressive effects. Isola et al. [52] proposed the pix2pix method to realize image translation between paired data samples based on conditional adversarial networks. In reality, obtaining paired data samples requires a lot of manual annotation work. To address this problem, image translation methods for unpaired data samples have appeared. Zhu et al. [53] introduced a cycle consistency loss to translate images to target domains while preserving shape and structure information. Kim et al. [22] proposed DiscoGAN to realize style transfer between different domains. A similar idea is also applied in [54].

Conditional GANs can generate images based on given condition labels [55], [56]. The generated images can be manipulated by controlling the given condition labels, including text descriptions [57]–[59]. Reed et al. [57] translated visual concepts from text descriptions to pixel-level images by combining a novel deep architecture with a GAN formulation. Besides, conditional image generation has been applied successfully in several fields, like photo generation [60], super-resolution [61], style transfer [22], [62], and so on. The success of conditional GANs suggests that GANs have the capacity of generating images based on given conditions.

C. Pedestrian Image Generation

Generative adversarial networks have shown encouraging results in pedestrian image generation. Typical GAN-based
models in person re-identification tasks can be grouped into three classes: conditional GAN-based models, Cycle-GAN-based models [63]–[67], and StarGAN-based models [68], [69]. Ren et al. [68] realized identity-based style mappings between different camera views. Ma et al. [70] generated person images based on U-Net, taking a condition image and a target pose as input. Qian et al. [71] fused original and synthesized pose-normalized image features to reduce the effect of pose variations. Ge et al. [72] designed a novel Siamese network to learn identity-related and pose-unrelated features under the guidance of pose information. Wei et al. [18] bridged the domain gap between different person re-identification datasets by GAN-based style transfer. Zheng et al. [19] optimized the generative module and the discriminative module jointly, by which the generated image quality and the person re-identification accuracy promote each other.

These works demonstrate that generative adversarial networks are capable of generating pedestrian images under different appearances, poses, viewpoints, and so on. However, they require massive paired pedestrian images under different conditions as training data. Moreover, appearance information is not necessary in gait recognition. Thus, image translation methods are more appropriate for gait image synthesis.

Apart from static image generation, generative adversarial networks have also presented great potential in generating dynamic animations. Shaham et al. [73] trained a pyramid of GANs to capture the internal distribution of patches within images, and then generated diverse samples with the same visual content. Consecutive diverse samples with similar visual content can form a vivid animation effect. Tesfaldet et al. [74] proposed a two-stream model for dynamic texture synthesis including an appearance stream and a dynamics stream. To generate a novel dynamic texture, this work took a noise sequence as input and optimized the output by the two-stream model to gradually match each stream's features of a given texture. Li et al. [75] proposed a fully convolutional model to generate video sequences based only on a start frame and an end frame. This method used a stochastic fusion mechanism to learn latent representations incorporated between the start and end frames. These works have important reference value for synthesizing the dynamic characteristics of gait.
D B is learned in a similar way. To generate images in multiple
domains, unsupervised GANs should build a G A and G B
III. MULTI-VIEW IMAGE GENERATION FOR CROSS-VIEW GAIT RECOGNITION

We first describe the designed architecture of the multi-view generative adversarial networks, then describe multi-view gait image sequence generation across datasets, and finally extend multi-view sequence generation to multi-domain generation covering dress and carrying variance. The overall framework of the Multi-view Generative Adversarial Network architecture is presented in Fig. 2. In the figure, 'Conv(7*7)' represents a convolutional layer with kernel size 7, stride 1, and padding size 3, followed by an Instance Normalization layer and a ReLU layer. 'Conv(4*4)' represents a convolutional layer with kernel size 4, stride 2, and padding size 1, followed by an Instance Normalization layer and a ReLU layer. 'Deconv(4*4)' has the same kernel size, stride, and padding size as 'Conv(4*4)', and is likewise followed by an Instance Normalization layer and a ReLU layer. The ResidualBlock architecture is stacked from a convolutional layer, an Instance Normalization layer, a ReLU layer, a convolutional layer, and an Instance Normalization layer; in these convolutional layers, the kernel size is 3, the stride is 1, and the padding size is 1.
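The following PyTorch sketch assembles these building blocks into a generator skeleton. It is a minimal reading of the layer description above combined with the layer counts stated in Section III-J (two stride-2 convolutions, six residual blocks, two stride-2 deconvolutions); the channel widths, the 22 label channels, and the Tanh output are illustrative assumptions, not the authors' released code.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv-IN-ReLU-Conv-IN with a skip connection (kernel 3, stride 1, padding 1)."""
    def __init__(self, channels: int):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, stride=1, padding=1),
            nn.InstanceNorm2d(channels, affine=True),
        )

    def forward(self, x):
        return x + self.block(x)

def conv_in_relu(in_ch, out_ch, k, s, p):
    """Conv(k*k) -> InstanceNorm -> ReLU, matching the paper's notation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, stride=s, padding=p),
        nn.InstanceNorm2d(out_ch, affine=True),
        nn.ReLU(inplace=True),
    )

# Generator skeleton: Conv(7*7) -> 2 x Conv(4*4) down -> 6 residual blocks
# -> 2 x Deconv(4*4) up -> output convolution. Widths are assumptions.
generator = nn.Sequential(
    conv_in_relu(1 + 22, 64, k=7, s=1, p=3),   # silhouette + view-label channels
    conv_in_relu(64, 128, k=4, s=2, p=1),      # 'Conv(4*4)' downsampling
    conv_in_relu(128, 256, k=4, s=2, p=1),
    *[ResidualBlock(256) for _ in range(6)],
    nn.ConvTranspose2d(256, 128, kernel_size=4, stride=2, padding=1),  # 'Deconv(4*4)'
    nn.InstanceNorm2d(128, affine=True),
    nn.ReLU(inplace=True),
    nn.ConvTranspose2d(128, 64, kernel_size=4, stride=2, padding=1),
    nn.InstanceNorm2d(64, affine=True),
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 1, kernel_size=7, stride=1, padding=3),
    nn.Tanh(),  # bounded silhouette output; an assumed but common choice
)
```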
A. Multi-View Generative Adversarial Networks

Generative adversarial networks (GAN) can generate fake images through two convolutional neural networks competing against each other. The main structure of a GAN consists of a generator that models data distributions and a discriminator that estimates the probability of its input being real or fake. The goal of the generator is to map a noise vector to a pixel-level image that is as realistic as possible. The input of the generator can also be an image instead of a noise vector. When the generator takes an image as input, it can transfer the input image into domains defined by target images, by which the domain transfer task is realized [76]. Given paired samples, the domain transfer task can be realized in a supervised manner. For instance, the pix2pix method [52] takes two images from two different domains as one training data pair and realizes translation from one domain to the other.

However, gait features should be extracted from gait sequences consisting of a series of gait images, where each gait image corresponds to a walking phase. Gait is continuous motion, but gait images are sampled discretely to form gait sequences, and it is hard to completely align the walking phases under different views without any deviation. Thus, unsupervised GANs based on unpaired images [53], [54] are more appropriate for gait sequence generation. Given two sets sampled from domains U and V, unsupervised GANs need to learn two generators: one is G_A: U → V, mapping an image from domain U to an image from domain V, and the other is G_B: V → U, mapping an image from domain V to an image from domain U. Correspondingly, the generator G_A learns a discriminator D_A to distinguish fake images generated by G_A from real images of domain V, and the discriminator D_B is learned in a similar way. To generate images in multiple domains, unsupervised GANs should build a G_A and a G_B for each pair of domains. When applied to gait recognition, supposing we need to generate gait images under k views, unsupervised GANs need to train k × (k − 1) GANs. However, existing gait samples are too limited to train so many GANs, which easily suffers from the over-fitting problem.

To generate gait images under adequate views while avoiding over-fitting, we instead train a single generator G that realizes mappings among multiple views. To achieve this goal, we control the views of the generated gait images by training a conditional GAN conditioned on target view labels. Given an input image sequence X = (x_1, x_2, . . . , x_N), where N represents the number of frames in the sequence, and the target view label v^t, the output image sequence Y = (y_1, y_2, . . . , y_N) is obtained by G(X, v^t) → Y. Then we need a discriminator to predict whether the generated images are real or fake, the view label of Y, and the human identity of Y. This idea is similar to the StarGAN method proposed in [23], but there are several differences: (1) StarGAN aims to translate images from one domain to another domain, while our method aims to generate more fake samples with diversity in several domains; (2) StarGAN does not need to preserve identity information, while our method aims to preserve identity information so that the generated samples can serve as training data.

Fig. 2. The overall framework of the Multi-view Generative Adversarial Networks. The view label can include views of the same or different gait datasets.
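A common way to implement this conditioning, and the one used by StarGAN-style models, is to replicate the one-hot target-view label spatially and concatenate it to the input silhouette as extra channels. The sketch below assumes MvGGAN follows the same convention (the paper specifies that the generator takes both image and label as input, but not the exact tensor layout).

```python
import torch

def condition_on_view(frames: torch.Tensor, view_idx: int, num_views: int) -> torch.Tensor:
    """Attach a target-view label to every frame of a gait sequence.

    frames: (N, 1, H, W) silhouette sequence; the label becomes `num_views`
    constant feature maps on the channel axis, giving (N, 1+num_views, H, W).
    """
    n, _, h, w = frames.shape
    label = torch.zeros(n, num_views, h, w, device=frames.device)
    label[:, view_idx] = 1.0                  # one-hot target view v^t
    return torch.cat([frames, label], dim=1)  # generator input for G(X, v^t)

# Usage: translate a 30-frame sequence to a target view (index is illustrative).
x = torch.rand(30, 1, 128, 128)               # silhouettes resized to 128x128
x_cond = condition_on_view(x, view_idx=5, num_views=11)
# y = generator(x_cond)  # -> fake sequence under the target view
```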
B. Discriminator Loss

When the generator outputs the generated gait image sequence Y, we need a discriminator D to estimate the probability that Y is real or fake. The discriminator tries to distinguish the generated fake sequence Y from the ground-truth sequence, while the generator tries to fool the discriminator. This objective can be achieved by an adversarial loss as follows:

\[ \mathcal{L}_{adv} = \mathbb{E}_{X}[\log D(X)] + \mathbb{E}_{X,v^{t}}[\log(1 - D(G(X, v^{t})))] \tag{1} \]

where G(X, v^t) represents the image sequence generated by the generator G based on the input image sequence X and the target view label v^t. To generate samples that are as realistic as possible, the generator G tries to minimize this loss function while the discriminator tries to maximize it.
C. View Classification Loss

To generate gait images under a specific view v^t, we need the generated gait samples to be classified into view v^t by a view classification network. Supposing there are k views in total, we add a classifier with k output nodes on top of the discriminator D to realize view classification. Denoting the designed view classification network as D_view, we first need to optimize D_view by taking real gait images as input and the corresponding ground-truth view labels as output. The optimization loss function is as follows:

\[ \mathcal{L}_{view}^{real} = \mathbb{E}_{X,v}[-\log D_{view}(v \mid X)] \tag{2} \]

where D_view(v | X) produces the probability of the input sample X belonging to view v. The view classification network D_view can thus learn to classify a real gait image sequence to its corresponding original view v. On the other hand, we also need D_view to correctly classify the fake samples Y generated by G into the target view v^t. Then G should minimize the following loss function:

\[ \mathcal{L}_{view}^{fake} = \mathbb{E}_{X,v^{t}}[-\log D_{view}(v^{t} \mid Y)] = \mathbb{E}_{X,v^{t}}[-\log D_{view}(v^{t} \mid G(X, v^{t}))] \tag{3} \]

With this loss, the generator G will try to generate fake images that are classified into the target view v^t by D_view.

D. Cycle Consistency Loss

The adversarial loss and the view classification loss aim to generate gait images that look realistic and belong to the target views. However, since the training samples from the source and target views may have other differences apart from view-related changes, only minimizing the loss functions in Eq. (1) and Eq. (3) may generate gait samples with other variances (like dress, carryings, and so on) apart from views, which makes the generation results uncontrollable. To preserve the other gait information of the input gait samples while changing only the view-related information, we adopt the cycle consistency loss in [22], [54]. The cycle consistency loss requires the generator to produce results from which the original input can be reconstructed:

\[ \mathcal{L}_{rec} = \mathbb{E}_{X,v,v^{t}}\big[ \| X - G(G(X, v^{t}), v) \|_{1} \big] \tag{4} \]

where the generator G is used twice: G(X, v^t) generates gait images under view v^t, and then G(G(X, v^t), v) reconstructs the original input sequence under view v.
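To make the interplay of Eqs. (1), (3), and (4) concrete, the sketch below computes the generator-side loss terms for one batch. It is schematic: it assumes a binary-probability discriminator and view-logit classifier as named in the text, and uses the plain adversarial form of Eq. (1), whereas the paper's actual training adopts the WGAN gradient-penalty variant (see Section III-J).

```python
import torch
import torch.nn.functional as F

def generator_losses(G, D, D_view, x_src, v_src, v_tgt):
    """Schematic generator objective: adversarial (Eq. 1), target-view
    classification (Eq. 3), and cycle-consistency (Eq. 4) terms.
    D(x) -> real/fake probability; D_view(x) -> view logits (k classes)."""
    y_fake = G(x_src, v_tgt)                        # translate to target view
    x_rec = G(y_fake, v_src)                        # translate back: G used twice

    adv = torch.log(1.0 - D(y_fake) + 1e-8).mean()  # G minimizes this Eq. (1) term
    tgt = torch.full((y_fake.size(0),), v_tgt, dtype=torch.long)
    view = F.cross_entropy(D_view(y_fake), tgt)     # -log D_view(v^t | Y), Eq. (3)
    rec = (x_src - x_rec).abs().mean()              # L1 reconstruction, Eq. (4)
    return adv, view, rec
```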
E. Identification Loss

The reconstruction loss tries to preserve the structural information, apart from view-related variances, of each gait image in the input sequence. However, this loss treats each image in a sequence separately, without considering the continuous dynamic information between different image frames. This limitation may lose the subject's identity when the dynamic gait information changes. Thus, we need to design an identification discriminator D_id. The identification discriminator takes the original gait image sequence and the generator's output gait image sequence as one training data pair, and produces the probability that the data pair comes from the same person. If the training data pair comes from one subject, the identification discriminator should output 1; otherwise it should output 0. The loss function is as follows:

\[ \mathcal{L}_{id} = \mathbb{E}_{X}[\log D_{id}(X)] + \mathbb{E}_{X,v^{t}}[\log(1 - D_{id}(G(X, v^{t})))] \tag{5} \]

To make full use of the static and dynamic gait information at the same time, we design the identification discriminator architecture based on the LB network in [12].
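The pair-based decision described above can be sketched as follows. The two-stream comparison is modeled on the LB-style architecture the text cites, but the concrete layers and the frame-averaging summary here are placeholders, not the authors' exact network.

```python
import torch
import torch.nn as nn

class IdentificationDiscriminator(nn.Module):
    """Scores whether an (original, generated) sequence pair shows the same person.
    Frames are averaged into one map per sequence, a simple stand-in for the
    static+dynamic summary used by LB-style networks."""
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(2, 32, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.Conv2d(32, 64, kernel_size=5, stride=2, padding=2), nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 1)

    def forward(self, seq_a: torch.Tensor, seq_b: torch.Tensor) -> torch.Tensor:
        # (N, 1, H, W) sequences -> temporal average -> channel-stacked pair
        pair = torch.cat([seq_a.mean(dim=0), seq_b.mean(dim=0)], dim=0).unsqueeze(0)
        prob = torch.sigmoid(self.classifier(self.features(pair).flatten(1)))
        return prob  # ~1 if the pair comes from the same subject (Eq. 5)
```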

Authorized licensed use limited to: Linkoping University Library. Downloaded on June 21,2021 at 00:16:39 UTC from IEEE Xplore. Restrictions apply.
3046 IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 30, 2021

F. Full Objective

Combining the four loss functions mentioned above, we obtain the final optimization objective. Overall, the multi-view gait image sequence generation GAN can be optimized by minimizing the following objective function:

\[ \mathcal{L}(G, D) = \mathcal{L}_{adv} + \lambda_{view}\mathcal{L}_{view}^{real} + \lambda_{view}\mathcal{L}_{view}^{fake} + \lambda_{rec}\mathcal{L}_{rec} + \lambda_{id}\mathcal{L}_{id} \tag{6} \]

where G tries to minimize this objective function while D tries to maximize it, and the λ terms are hyper-parameters controlling the importance of the different loss functions in the optimization process.

G. Multi-View Generation Across Datasets

With the multi-view GAN, we can control the target view of the generated gait images by controlling the view label vector v^t. If v^t consists of view labels coming from a dataset A including k_1 views, we can generate gait samples under k_1 views. Furthermore, if we combine the view labels of another dataset B including k_2 views with v^t, the multi-view GAN will generate gait samples under k_1 + k_2 views. In this way, both dataset A and dataset B come to include samples under k_1 + k_2 views, and both datasets are extended to contain more samples. To achieve this goal, we follow the StarGAN method [23] and define a unified label vector ṽ as follows:

\[ \tilde{v} = [v_{1}, \ldots, v_{n}, m] \tag{7} \]

where v_i represents the label vector of dataset i, and m is an n-dimensional one-hot vector. If the i-th element of m is '1', the target view for gait image generation comes from the i-th dataset.

H. Training With Multiple Domains

In gait recognition, apart from view variance, dress and carryings are also two important influence factors. Collecting a large amount of gait samples under different dress and carrying conditions is also labor-consuming. If we consider a dress or carrying condition as a domain, then we can generate gait samples under different dress or carrying conditions by GANs. Supposing there is a gait dataset including k views, the view label is then a k-dimensional one-hot vector k, where the element corresponding to the target view is '1' and the other elements are zeros. To realize gait image sequence generation under other domains, we need to concatenate the domain label vector representing dress or carrying attributes with k to form a unified label vector u. Then we have u = [k, d], where d represents the one-hot label vector of the dress or carrying condition.

The main challenge of multi-domain gait image sequence generation is that, unlike view variance, dress or carrying variance does not have regular changing patterns. In reality, there are innumerable dress and carrying types; for example, a longuette and jeans will produce quite different silhouette variances and should correspond to different domain labels. In this paper, we mainly perform gait image sequence generation for two typical dress types (summer wear and long coat) and two typical carrying types (backpack and single shoulder bag), which are involved in the CASIA-B dataset [77].

In the training process, we consider the 'nm' type, 'bg' type, and 'cl' type as three domains. We select training samples from each type of sequence for its corresponding domain. Then, we train MvGGAN with these training samples. After training, we can generate fake gait samples under different dress or carrying conditions.

I. Domain Adaptation Between Fake and Real Gait Images

Since we adopt a single generator to generate fake gait images under different walking conditions and even different datasets, theoretically, there should be distribution differences between real and fake gait images. We present t-SNE visualization results of real and fake gait images in Fig. 3, where we can see that there is an obvious gap between real and fake samples. If we merge real and fake samples directly in the training process of gait classification models, the distribution differences will affect the performance on test samples. Therefore, we need to design a domain alignment method to reduce the distribution differences and lead the fake gait samples to help improve the generalization capacities of gait classification models.

Supposing the set of real gait features is D_r = {x_{r_i}, y_{r_i}}_{i=1}^{n} and the set of fake gait features is D_f = {x_{f_j}, y_{f_j}}_{j=n+1}^{n+m}, we have the label space Y_r = Y_f. The goal of domain alignment is to learn a feature mapping function F(·) that reduces the distribution differences between real and fake gait samples. As argued in [78], the distribution difference between different datasets is dominated by the marginal (P) distributions, while the distribution difference between samples in the same dataset is dominated by the conditional (Q) distributions. Since our proposed method realizes image generation covering both the same and different datasets, we consider both marginal and conditional distributions for the distribution alignment of gait images. The distribution alignment is defined as:

\[ D_{F}(\mathcal{D}_{r}, \mathcal{D}_{f}) = (1-\eta)\, D_{F}(P_{r}, P_{f}) + \eta \sum_{c=1}^{C} D_{F}^{(c)}(Q_{r}, Q_{f}) \tag{8} \]

where η ∈ [0, 1] represents the adaptive factor balancing the importance of the marginal and conditional distributions, and c ∈ {1, . . . , C} represents the subject identities of the gait images.

D_F(P_r, P_f) represents the marginal distribution alignment and D_F^(c)(Q_r, Q_f) represents the conditional distribution alignment of subject c.

The distribution divergence between fake and real gait images can be calculated empirically by the projected maximum mean discrepancy (MMD) [79], [80]. The distribution alignment can be written as:

\[ D_{F}(\mathcal{D}_{r}, \mathcal{D}_{f}) = (1-\eta)\, \big\| \mathbb{E}[F(x_{r})] - \mathbb{E}[F(x_{f})] \big\|_{\mathcal{H}_{K}}^{2} + \eta \sum_{c=1}^{C} \big\| \mathbb{E}[F(x_{r}^{(c)})] - \mathbb{E}[F(x_{f}^{(c)})] \big\|_{\mathcal{H}_{K}}^{2} \tag{9} \]

where E[·] represents the mean value of the embedded features and H_K represents the reproducing kernel Hilbert space. Based on the representer theorem, equation (9) can be written as:

\[ D_{F}(\mathcal{D}_{r}, \mathcal{D}_{f}) = \operatorname{tr}(\mathbf{F}^{\mathsf{T}} M \mathbf{F}) \tag{10} \]

In this equation, F ∈ R^((n+m)×d) represents the feature matrix concatenated from the feature vector of each sample. In the feature matrix, each row corresponds to a feature vector F(x) of a real or fake gait sample x, and d represents the feature dimension of F(x). The MMD matrix M = (1 − η)M_0 + η ∑_{c=1}^{C} M_c is used to calculate the distribution divergence of the gait features in matrix F, where each element is calculated as:

\[ (M_{0})_{ij} = \begin{cases} \frac{1}{n^{2}}, & x_{i}, x_{j} \in \mathcal{D}_{r} \\ \frac{1}{m^{2}}, & x_{i}, x_{j} \in \mathcal{D}_{f} \\ -\frac{1}{mn}, & \text{otherwise} \end{cases} \tag{11} \]

\[ (M_{c})_{ij} = \begin{cases} \frac{1}{n_{c}^{2}}, & x_{i}, x_{j} \in \mathcal{D}_{r}^{(c)} \\ \frac{1}{m_{c}^{2}}, & x_{i}, x_{j} \in \mathcal{D}_{f}^{(c)} \\ -\frac{1}{m_{c} n_{c}}, & x_{i} \in \mathcal{D}_{r}^{(c)}, x_{j} \in \mathcal{D}_{f}^{(c)} \ \text{or} \ x_{i} \in \mathcal{D}_{f}^{(c)}, x_{j} \in \mathcal{D}_{r}^{(c)} \\ 0, & \text{otherwise} \end{cases} \tag{12} \]

where n_c and m_c represent the number of real and fake samples of subject c. In this paper, we set η = 0.8 when generating fake samples in the same dataset, and set η = 0.2 when generating fake samples in cross-dataset settings.
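The matrices in Eqs. (11)–(12) and the trace form of Eq. (10) translate directly into code. The sketch below builds M_0 for the marginal term and evaluates tr(FᵀMF); it is an illustrative NumPy rendering of the formulas, not the authors' implementation (the per-subject matrices M_c follow the same pattern with per-subject index sets).

```python
import numpy as np

def marginal_mmd_matrix(n: int, m: int) -> np.ndarray:
    """M0 from Eq. (11): first n rows/cols are real samples, last m are fake."""
    M0 = np.zeros((n + m, n + m))
    M0[:n, :n] = 1.0 / n**2      # x_i, x_j both real
    M0[n:, n:] = 1.0 / m**2      # x_i, x_j both fake
    M0[:n, n:] = -1.0 / (m * n)  # mixed real/fake pairs
    M0[n:, :n] = -1.0 / (m * n)
    return M0

def projected_mmd(feats: np.ndarray, n: int, m: int, eta: float = 0.8) -> float:
    """tr(F^T M F) from Eq. (10), here with only the marginal part of M,
    scaled by (1 - eta) as in Eq. (8)."""
    M = (1.0 - eta) * marginal_mmd_matrix(n, m)
    return float(np.trace(feats.T @ M @ feats))

# Toy check: identical real/fake feature clouds give zero divergence.
rng = np.random.default_rng(0)
real = rng.normal(size=(100, 16))
fake = real.copy()
print(projected_mmd(np.vstack([real, fake]), 100, 100))  # -> 0.0
```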
Fig. 3. The t-SNE visualization results of real and fake gait samples. Samples with labels 0−4 represent fake gait images coming from 5 subjects, and samples with labels 5−9 represent real gait images coming from the same 5 subjects. In the left image, fake images cover 22 views merged from both the CASIA-B and OUMVLP datasets, while real images cover 11 views coming from only the CASIA-B dataset. In the right image, both fake and real images are under 90°. We can see that there is an obvious gap between fake and real gait images under different views.

To reduce the distribution divergence between real and fake samples, the mapping function F needs to minimize the objective function in Eq. (8) during training. In this paper, the mapping function F is constructed as a multi-layer fully connected network. The input gait features can be extracted from any layer of the gait classification network. Supposing we extract gait features from the output of layer l, the node number of F's first layer should equal the node number of layer l, and the node number of F's last layer should equal the node number of layer l + 1. The node numbers of the middle layers are 1024. The reason we set the middle layers to 1024 nodes is that the domain alignment network aims to reduce the distribution differences between real and fake gait images; that is, we only require the overall distributions of real and fake images to be aligned, and the merged characteristics of single images are not constrained. Thus, 1024 nodes per middle layer are enough to realize distribution alignment; more nodes would increase the computational cost and the overfitting risk without bringing performance improvement. Once F is trained, the gait features of real and fake gait samples are input to F to complete domain alignment. The domain-aligned gait features are then input to the next layers of the gait classification network to complete training or testing.

J. Implementation

As to the network architecture, the generator has an encoder-decoder architecture composed of two convolutional layers with a stride of two, six residual blocks, and two deconvolutional layers with a stride of two. Instance normalization is adopted in the generator rather than the discriminator, following [23]. For the adversarial discriminator architecture, the StarGAN method used PatchGANs [52], [53], [62], which judge whether the input is real or fake from local image patches. Since gait recognition is based on complete body characteristics, and image patches cannot reflect global gait information, we directly input whole gait images to the PatchGAN to get classification results. As to the identification loss, we adopt the LB network in [12] to judge whether two inputs come from the same subject.

In the training process, since the Wasserstein GAN (WGAN) [18] has proved its capacity for stabilizing the adversarial training process, we also adopt the gradient penalty term of WGAN in the loss function L_adv. As to hyper-parameters, we set λ_view = 1, λ_rec = 10, λ_id = 5. We apply Adam [82] to optimize the generator and discriminators. The batch size is set to 16. The generator is updated once after the discriminator is updated five times. The learning rate is initially set to 0.0001 for 50 epochs and decayed linearly to 0 over the next 50 epochs.

IV. EXPERIMENTAL RESULTS

In this section, we first perform experiments on the CASIA-B and OUMVLP datasets independently, then perform cross-dataset experiments where the shooting views of the two datasets are merged, then study the performance of adding fake dress or carrying samples, and finally analyze the computational complexities.


Fig. 4. Some gait samples selected from the CASIA-B and OUMVLP datasets. The first row represents gait samples from CASIA-B and the second row represents gait samples from OUMVLP.

Fig. 5. The influence of hyper-parameters on recognition accuracies.

A. Datasets

(1) CASIA-B [77] contains 124 subjects. There are 110 sequences under 11 views (0°, 18°, . . . , 162°, 180°) for each subject, and each view corresponds to 10 sequences. Among the 10 sequences, 6 sequences (nm01-nm06) are taken under normal conditions, 2 sequences (cl01-cl02) are captured in coats, and 2 sequences (bg01-bg02) are captured with bags. During testing, nm01-nm04 are taken as the gallery set while nm05-nm06, bg01-bg02, and cl01-cl02 are taken as the probe set. The dataset provides samples under different view, clothing, and carrying conditions, which satisfies the training data requirement for multi-view gait image sequence generation.

(2) OUMVLP [83] is a large view-variation gait dataset. This dataset includes more than 10,000 subjects. Each subject's sequences are captured under 14 views (0°, 15°, 30°, 45°, 60°, 75°, 90°, 180°, 195°, 210°, 225°, 240°, 255°, and 270°). There are two sequences under each view, which is similar to OULP [84]. The dataset can be extended to contain more sequences for each subject by our proposed gait sequence generation method. Some samples are illustrated in Fig. 4.

B. Parameter Selection

The hyper-parameters of the proposed method include λ_view, λ_rec, λ_id in the objective function of the multi-view GAN, and η in the objective function of domain adaptation. The λ values control the importance of the different loss functions in the multi-view GAN, while η balances the importance of the marginal and conditional distributions in domain adaptation. We first set λ_view = 1, λ_rec = 10, λ_id = 5, referencing [23]. To study the influence of each parameter, we change one parameter while the others are fixed. The influence of the λ values on recognition accuracies is presented in Fig. 5a. We can see that, when fixing λ_rec and λ_id, the recognition accuracy first increases and then decreases as λ_view increases, because a larger view classification loss may generate fake gait images with more accurate view information but reduced diversity. When fixing λ_view and λ_id, the recognition accuracy increases continuously as λ_rec increases, because a larger λ_rec preserves more accurate structural gait information. When fixing λ_view and λ_rec, the recognition accuracy first increases and then decreases as λ_id increases, because a larger identification loss may generate fake gait images with more accurate identification information but somewhat reduced diversity.

The influence of parameter η on recognition accuracies is presented in Fig. 5b. We can see that, when generating gait images within the same dataset, the recognition accuracy increases continuously as η increases; when generating gait images in the cross-dataset setting, the recognition accuracy decreases continuously as η increases.

C. Experimental Results on CASIA-B

After training the proposed multi-view gait image sequence generation network, we can generate gait images under all views participating in the training process. For CASIA-B, we take the first 6 sequences (nm01-nm06) of the first 74 subjects as training sequences and the other 50 subjects for testing. During testing, the first 4 normal sequences (nm01-nm04) are selected as the gallery set and the other sequences (nm05-nm06) are taken as the probe set. Before training the proposed multi-view GAN, we cut and align the raw gait silhouettes and resize the silhouettes to 128 × 128. Moreover, since the number of gait samples differs between views, to reduce the influence of sample unbalancing, we balance the gait samples under different walking conditions by the re-sampling strategy in [85].

As the CASIA-B dataset contains 11 views (0°, 18°, . . . , 162°, 180°), given gait samples under one view like 18°, we can generate gait samples under the other 10 views. Naturally, gait samples under 18° can also be generated. To validate the effectiveness of gait sample generation, supposing we only have training gait samples under 90°, we automatically generate gait samples under the other 10 views by our designed network. We then take the generated fake gait samples combined with the original real samples under 90° as training data to train a gait classification network (we adopt the GaitSet network in [44] as the classification network in this experiment), and the testing gait samples remain unchanged. The validation results are presented in Fig. 6. We can see
that the generated fake samples under different views can improve gait recognition accuracies to a large extent compared with the GaitSet network trained using samples under only 90°. Moreover, the recognition accuracy distribution of the fake samples is quite similar to the recognition accuracy distribution of GaitSet trained using the original real samples. This suggests that the proposed multi-view GAN method has the capacity of generating fake samples corresponding to the distribution of real samples. We present some generated fake samples under different views in Fig. 7. We can see that the proposed method can generate gait silhouettes visually consistent with the view variance.

TABLE I
THE RECOGNITION ACCURACIES (%) WHEN ADDING FAKE GAIT SAMPLES TO THE ORIGINAL TRAINING DATA

To validate whether the generated fake samples can improve the generalization capacity of the gait classification network, suppose we have training samples under 11 views. Taking 90° as an example, we select 10 sequences from the gait samples under 90°, and use the selected 10 sequences to generate g sequences under each other view, so we generate g × 11 sequences based on the selected 10 sequences under 90°. If we select 10 sequences under each view, then we generate g × 11 × 11 fake sequences for each subject. In this experiment, we set g = 10. We add these fake sequences to the original training data and train the GaitSet network; the results are presented in Table I. From the results, we can see that adding fake samples to the training data obviously improves gait recognition performance under all three walking conditions. The reason is that more samples enhance the discriminative and generalization capacity of deep classification networks, which is consistent with the conclusion in [21]. On the other hand, this phenomenon also suggests that the proposed multi-view GAN method has the capacity of generating gait samples under different views while preserving identification information.
If we remove the identification loss in Eq. (5), the recognition accuracy drops obviously, because fake samples with wrong identification information produce a negative impact on the final performance.

To study the influence of the view interval between input and output on the generation quality of the multi-view GAN method, we design an experiment where the view intervals between input and output gait image sequences are less than 36°, which is referred to as the '36°+' method in Table I. We can see that, apart from the extreme views 0° and 180°, the view interval between input and output gait image sequences has only a small influence on recognition accuracies, which suggests that the generator can learn the relationship between two views well even when the interval is large. This further demonstrates that the proposed method has a stronger view transformation capacity than traditional VTM methods [2], [3], whose performance drops sharply when view intervals are large. The main reason why view intervals have a smaller influence under normal views than under extreme views is that the dynamic gait characteristics under extreme views are hard to capture completely; gait features under normal views are much more regular, so it is much easier for the proposed MvGGAN model to learn the mapping relationships between gait samples under normal views than under extreme views.

Since the idea of cross-view gait generation using StarGAN has been discussed in the VT-GAN method [86], which also involves an identification loss to keep discriminability, we compare our proposed method with VT-GAN in Table I. We can see that our proposed method outperforms the VT-GAN method significantly. One reason is that, since the GEI combines static and dynamic gait information in a single image, it is harder to realize view transformation of GEIs than of raw gait image sequences. In person re-identification tasks, there have been several works using GAN-based models to generate pedestrian images for data augmentation, which can be classified into two classes: Cycle-GAN-based methods and StarGAN-based methods. In Table I, we also compare the proposed method with typical Cycle-GAN-based methods (Cycle-GAN, Cycle-GAN+SiaNet) and a StarGAN-based method (CTGAN). The experimental results demonstrate that our proposed MvGGAN method obviously outperforms the other GAN-based methods. The Cycle-GAN-based methods achieve lower accuracies than the StarGAN-based methods, because one Cycle-GAN model can only correspond to one view pair and Cycle-GAN models cannot guarantee accurate view classification results.

Fig. 6. Recognition accuracy comparison of the proposed method with and without fake samples when only real samples under 90° are provided. Adding fake samples can improve performance to a large extent when annotated real samples under some views are unavailable. (a) Recognition accuracy comparison under the condition of normal walking; (b) recognition accuracy comparison under the condition of carrying bags; (c) recognition accuracy comparison under the condition of wearing coats.

Fig. 7. Some fake gait samples under different views generated by our proposed multi-view GAN method. In each row, the first column represents a real gait image, and the other columns, from left to right, represent views 0°, 15°, 18°, 30°, 36°, 45°, 54°, 60°, 72°, 75°, 90°, 108°, 126°, 144°, 162°, 180°, 195°, 210°, 225°, 240°, 255°, 270°, existing in both the CASIA-B and OUMVLP datasets.

D. Experimental Results on OUMVLP

We test the effectiveness of the proposed multi-view GAN method on the OUMVLP dataset. The evaluation setting in [88] is adopted in this experiment, which takes 5,153 people for training and 5,154 people for testing. We evaluate gait samples under four typical views (0°, 30°, 60°, 90°) in Table II. We first use the gait images in the training set to train the proposed multi-view GAN network, then use the trained network to generate fake gait samples. We add 200 fake gait samples to the training set of each view per subject. As listed in Table II, fake samples are helpful in improving recognition accuracies under the four typical views. This further demonstrates that fake samples generated by the MvGGAN method can improve the generalization and discriminative capacities of gait classification networks.

TABLE II
RECOGNITION ACCURACY (%) COMPARISON ON OUMVLP DATASET UNDER FOUR TYPICAL VIEWS

E. Experimental Results of CASIA-B+OUMVLP

In the above two experiments, the proposed MvGGAN method generates fake samples under different views within a single dataset. In reality, the views of probe gait sequences often do not appear in the gallery set, which makes performance drop sharply. Thus, making the gallery set contain sequences under as many views as possible is important. In this experiment, we try to merge the views of both the CASIA-B and OUMVLP datasets to make each dataset contain sequences under more views, and perform evaluations under these merged views, as illustrated in Fig. 1. In this way, the practical recognition capacities of the two datasets are promoted by each other. The CASIA-B dataset contains 11 views (0°, 18°, . . . , 162°, 180°) and the OUMVLP dataset contains 14 views (0°, 15°, 30°, . . . , 240°, 255°, 270°), so the view union of the two datasets contains 22 views, since there are three common views (0°, 90°, 180°). The training samples under the common views come from the single CASIA-B dataset. We train MvGGAN combining the training sets of the two datasets based on the union view labels defined in Eq. (7). Based on the trained MvGGAN, fake gait samples under the 22 views can be generated. During evaluation, taking 15° as an example, to generate gait samples for CASIA-B under 15°, which does not exist in the original dataset, we select g samples from each view of the original dataset (11 views in total), so we can generate 11 × g fake samples under 15°. We set g = 10 in this experiment.

Fig. 8. Recognition accuracy comparison under different ranks of gait recognition model trained and tested in cross-dataset settings.

TABLE III
T HE C ROSS -D ATASET R ECOGNITION A CCURACIES (%) OF G AIT C LAS -
SIFICATION N ETWORK T RAINED ON CASIA-B D ATASET I NCLUDING
FAKE I MAGES U NDER 22 M ERGED V IEWS AND E VALUATED ON
OUMVLP D ATASET

Fig. 9. Some fake images generated based on images from OUMVLP dataset
and target views coming from CASIA-B dataset.

The evaluation results are presented in Table III and Fig. 8. We can see that the recognition accuracies of the GaitSet method drop sharply when recognizing sequences under views unknown during the training process. After adding fake samples, the GaitSet model trained on CASIA-B achieves satisfactory recognition results on the OUMVLP dataset, and the GaitSet model trained on OUMVLP also achieves satisfactory recognition results on the CASIA-B dataset. The recognition accuracies on the extreme views (0° and 180°) are lower than on the other views because these two views reflect the least gait information, which may cause information loss during fake sample generation and real sample classification.

We compare the rank-5 accuracies of the gait recognition model trained in cross-dataset settings in Fig. 8. We can see that the accuracies of the gait recognition model drop sharply at every rank in cross-dataset settings, and that our proposed MvGGAN method improves the accuracy at each rank. This phenomenon suggests that it is possible to make full use of gait samples across different gait datasets, which is important for bringing gait recognition technologies to real-world applications.

We present some fake images generated from gait images of the OUMVLP dataset with target views coming from the CASIA-B dataset in Fig. 9. We can see that MvGGAN has the capacity to generate fake gait images across datasets.

F. Experimental Results of Multi-Domain Generation

The proposed MvGGAN method not only can generate fake gait samples under different views, but can also generate fake gait samples under other walking conditions such as dress or carryings. However, different dress or carrying types often have a large impact on gait silhouettes. Each typical dress or carrying type can be considered as a domain. Since it is very expensive to collect gait samples under a large number of dress or carrying types, in this experiment we only consider the dress and carrying types in the CASIA-B dataset to test the possibility of generating fake samples under different dress or carrying types. Sample generation involving more dress and carrying types will be studied in future work.

We present some fake gait samples generated by MvGGAN in Fig. 10. We use 600 real gait images covering several subjects, without identity information, in each domain as training data. We can see that MvGGAN has the capacity to learn the main domain information contained in dress and carryings. To extend gait datasets to include more dress and carrying conditions, we input 'nm' real gait images to generate corresponding fake 'bg' and 'cl' gait images. The generated fake images are merged with the real images to extend the gait datasets (see the sketch below). We then take the extended dataset to train the GaitSet model, and complete the evaluation using real gait samples. The recognition results are presented in Table IV. We can see that adding fake 'bg' and 'cl' gait images improves the performance of GaitSet, because the fake 'bg' and 'cl' gait images make GaitSet capture gait information under more dress or carrying conditions in the training phase, by which the generalization capacity of GaitSet with respect to dress or carrying variances is improved obviously. The fake 'bg' gait samples improve gait recognition performance more than the fake 'cl' samples, because the patterns of 'bg' samples are much more regular than those of 'cl' samples.

If provided with training samples covering more dress or carrying types, the proposed method will produce fake gait images under more dress or carrying conditions, and the gait dataset will be extended to a large extent. Since training MvGGAN does not need identity information, collecting training images under different dress or carrying conditions is much cheaper than collecting gait samples with identity information. Thus, the proposed MvGGAN method is helpful in improving the generalizing capacities of gait classification models.
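The dataset-extension step referenced above can be sketched as follows; `translate` is a hypothetical stand-in for the trained MvGGAN generator conditioned on a dress/carrying domain label, and the sample identifiers are illustrative:

```python
def translate(sample: str, target_domain: str) -> str:
    # Placeholder for the trained MvGGAN generator ('bg' = bag, 'cl' = coat).
    return f"fake_{target_domain}({sample})"

def extend_dataset(real: dict) -> dict:
    """Merge real samples with fake 'bg'/'cl' samples generated from 'nm' ones."""
    extended = {domain: list(samples) for domain, samples in real.items()}
    for target in ("bg", "cl"):
        extended[target] += [translate(s, target) for s in real["nm"]]
    return extended

real = {"nm": ["nm_001", "nm_002"], "bg": ["bg_001"], "cl": ["cl_001"]}
training_set = extend_dataset(real)  # the extended set then trains GaitSet
```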


TABLE IV
Recognition Accuracies (%) of Gait Classification Network Trained on CASIA-B Dataset Merged With 'bg' and 'cl' Fake Gait Images

TABLE V
Recognition Results Comparison (%) of Ablation Study in MvGGAN on CASIA-B Dataset

G. Complexity Analysis

As to computational complexity, compared with existing deep gait classification models, our proposed method needs to train an extra generative adversarial network and a domain alignment module, which requires more training time. However, once the network is trained, it can generate fake gait samples automatically without human cooperation. Once the fake samples are merged with the original gait datasets, gait classification models can be trained at the same complexity level as existing models. Thus, our proposed method does not need extra computational time in the test process. Moreover, compared with existing gait classification models, our method does not need extra training data. When we train MvGGAN, we only need to select training samples from existing gait datasets in different domains, without requiring subject identity information. Therefore, collecting training samples for MvGGAN is much easier than collecting supervised gait samples with corresponding identity labels. In future work, we will try to collect more training data not included in existing gait datasets for MvGGAN; we will then be able to generate fake gait samples of higher quality, and gait recognition performances will be further improved.

H. Ablation Study

We conduct several ablation experiments to evaluate the individual influence of different modules, including the view classification loss, cycle consistency loss, identification loss, and domain adaption. The experimental results are presented in Table V. 'MvGGAN-Identification loss' means removing the identification loss from MvGGAN, 'MvGGAN-View loss' means removing the view classification loss from MvGGAN, 'MvGGAN-Cycle loss' means removing the cycle consistency loss from MvGGAN, and 'MvGGAN-Domain adaption' means removing the domain adaption module from MvGGAN.
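These variants can be read as switching one term off in the full training objective. The sketch below is schematic only: the weights and exact loss definitions follow the formulation given earlier in the paper, the placeholder values here are ours, and removing domain adaption in practice means dropping the alignment module rather than zeroing a loss term:

```python
def total_objective(losses: dict, ablate: frozenset = frozenset()) -> float:
    """Schematic weighted sum; each ablation variant zeroes one term."""
    weights = {"adv": 1.0, "view": 1.0, "cycle": 10.0, "id": 1.0}  # illustrative
    return sum(w * losses[t] for t, w in weights.items() if t not in ablate)

losses = {"adv": 0.7, "view": 0.4, "cycle": 0.2, "id": 0.5}  # dummy values
full = total_objective(losses)
no_cycle = total_objective(losses, ablate=frozenset({"cycle"}))  # 'MvGGAN-Cycle loss'
```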


Fig. 10. Some fake gait samples generated by our proposed multi-view GAN method on CASIA-B dataset. The first row represents real 'bg' gait images, the second row represents fake 'bg' gait images, the third row represents real 'cl' gait images, and the last row represents fake 'cl' gait images.
From the results, we can see that each module has an influence on the recognition accuracies. The cycle consistency loss has a big influence because it affects gait structural information. The domain adaptation also plays an important role: without domain adaptation, the domain gap between fake and real gait images can weaken the discriminative capacity of the deep network for identity information.
V. CONCLUSION

In this paper, we generate fake gait samples under different views across different gait datasets based on the Multi-view Gait Generative Adversarial Network (MvGGAN). MvGGAN includes a single generator, which generates fake gait samples under several walking conditions, and a discriminator, which realizes adversarial training and preserves person identity information. By adding the generated fake gait samples to the original gait datasets and performing domain alignment between real and fake samples, the performances of deep learning-based gait classification networks can be improved obviously. This paper demonstrates that it is practicable to improve cross-view gait recognition performances by adding fake samples to the original gait datasets, and also demonstrates that it is possible to generate fake samples for one dataset under different views or other walking conditions involved in another dataset. Cross-dataset gait image generation is quite important because it enables a gait classification network to learn gait information under as many walking conditions as possible, and improves gait recognition performances in real-world scenes where unregistered walking conditions often appear.

Xin Chen received the master's and Ph.D. degrees from the College of Information Science and Technology, Jinan University, China, in 2015 and 2018, respectively. She was a Postdoctoral Researcher with the College of Information Science and Technology, Jinan University, Guangzhou, China, from 2018 to 2020. Her current research interests include computer vision, machine learning, action recognition, and gait recognition.

Xizhao Luo (Member, IEEE) received the B.S. and M.S. degrees from the Xi'an University of Technology, Xi'an, China, in 2000 and 2003, respectively, and the Ph.D. degree from Soochow University, China, in 2010. He held a postdoctoral position with the Center of Cryptography and Code, School of Mathematical Sciences, Soochow University. He is currently an Associate Professor with the School of Computer Science and Technology, Soochow University. His research interests include cryptography and computational complexity.

Jian Weng (Member, IEEE) received the Ph.D. degree in computer science from Shanghai Jiao Tong University, China, in 2007. He is currently a Professor with the College of Information Science and Technology, Jinan University, Guangzhou, China. His research interests include cryptography, information security, and multimedia analysis.

Weiqi Luo received the B.S. and M.S. degrees from Jinan University, in 1982 and 1985, respectively, and the Ph.D. degree from the South China University of Technology in 1999. He is currently a Professor with the School of Information Science and Technology, Jinan University. He has published more than 100 high-quality papers in international journals and conferences. His research interests include network security, big data, and artificial intelligence.

Huiting Li received the bachelor's degree from the College of Electrical and Information, Jinan University, Guangzhou, China, in 2017, where she is currently pursuing the master's degree with the College of Information Science and Technology. Her current research interests include computer vision and action/gait recognition.

Qi Tian (Fellow, IEEE) received the B.E. degree in electronic engineering from Tsinghua University, the M.S. degree in ECE from Drexel University, and the Ph.D. degree in ECE from the University of Illinois at Urbana–Champaign (UIUC). He was a Full Professor with the Department of Computer Science, The University of Texas at San Antonio (UTSA), from 2002 to 2019. From 2008 to 2009, he took a one-year faculty leave with Microsoft Research Asia (MSRA). From 2018 to 2020, he was the Chief Scientist in computer vision with the Huawei Noah's Ark Laboratory. He is currently a Chief Scientist in artificial intelligence with Cloud BU, Huawei. He has published more than 610 refereed journal and conference papers. His Google citation count is more than 26100, with an H-index of 78. He was a coauthor of best papers, including IEEE ICME 2019, ACM CIKM 2018, ACM ICMR 2015, PCM 2013, MMM 2013, and ACM ICIMCS 2012, a Top 10% Paper Award at MMSP 2011, the Student Contest Paper at ICASSP 2006, and the Best Paper/Student Paper Candidate at ACM Multimedia 2019, ICME 2015, and PCM 2007. His research interests include computer vision, multimedia information retrieval, and machine learning. He received the 2017 UTSA President's Distinguished Award for Research Achievement, the 2016 UTSA Innovation Award, the 2014 Research Achievement Award from the College of Science, UTSA, the 2010 Google Faculty Award, and the 2010 ACM Service Award. He is an Associate Editor of IEEE TMM, IEEE TCSVT, ACM TOMM, and MMSJ, and serves on the Editorial Boards of the Journal of Multimedia (JMM) and the Journal of Machine Vision and Applications (MVA). He is a Guest Editor of IEEE TMM, the Journal of CVIU, and so on.
