www.vislab.ucr.edu
Recognizing Individuals
[Illustration: suspect images from Camera A and Camera B — same person?]
Add low resolution, noise, and blur: the same person can look dissimilar, while different persons can look similar!
Previous Approaches
[Pipeline diagram: gallery and probe images each pass through feature extraction, then feature selection / transformation / metric learning, and finally classification.]
Related Work
Pipeline: person detection → feature extraction → feature mapping / distance metric → matching
• Yang et al., ECCV 2014: propose a semantic color descriptor as a higher-level color feature
• Zhao et al., CVPR 2014: learn discriminative mid-level filters from image patch clusters
• Kuo et al., WACV 2013: propose color names and RankBoost for re-identification
• Liu et al., PR 2014: study the importance of different features
• Li et al., CVPR 2014: use a deep neural network to learn robust features
• Zheng et al., TPAMI 2013: formulate re-identification as a relative distance comparison problem
• Tao et al., TCSVT 2013: propose a regularized Mahalanobis distance metric
• Li et al., CVPR 2013: partition the image space into different configurations
• Xiong et al., ECCV 2014: use different metric learning methods with histogram-based features
• Pedagadi et al., CVPR 2013: use local Fisher discriminant analysis to reduce feature dimensionality
Drawbacks of Traditional Approaches
[Example images: different people can look similar, and the same person can look different across views]
Basic Reference Set Idea
• Propose a new feature representation: the reference descriptor (RD)
• An RD is generated by computing the similarities between a probe/gallery image and a reference set
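A minimal sketch of RD generation, assuming a Gaussian similarity between feature vectors (the actual similarity measure used in the paper may differ):

```python
import numpy as np

def reference_descriptor(x, reference_set, sigma=1.0):
    """Build a reference descriptor (RD) for one probe/gallery feature vector.

    Instead of comparing probe and gallery directly, each image is described
    by its similarities [s1, ..., sN] to N reference subjects. The Gaussian
    similarity is an illustrative choice, not necessarily the paper's.
    """
    refs = np.asarray(reference_set, dtype=float)      # (N, d)
    d2 = np.sum((refs - x) ** 2, axis=1)               # squared distances to references
    return np.exp(-d2 / (2.0 * sigma ** 2))            # similarity vector of length N
```

Probe and gallery RDs built this way can then be compared directly, since both live in the same reference space.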
Regularized CCA
Canonical Correlation Analysis (CCA) / Regularized CCA (RCCA)
• Explores the relations between two sets of random variables obtained from different observations of the same data
• Finds projections of the data such that, after projection, the correlation between the two sets is maximized
• Formulated as a constrained optimization problem, solved via equivalent generalized eigenvalue problems
Note: Given a small training set, as in most re-identification problems, direct application of CCA may perform poorly due to inaccurate estimates of the data covariance matrices. We proposed ROCCA (IEEE SPL 2015), which uses shrinkage estimation and a smoothing technique; it robustly estimates the data covariance matrices from limited training samples and is simple to implement.
[Illustration: data correlation before CCA vs. after CCA]
Coherent Subspace
• Canonical correlation analysis (CCA) is used to learn feature
transformation matrices
• After transformation, features of the same person from different
cameras are better correlated
$$\rho = \frac{\mathrm{cov}(w_A^T X_A,\ w_B^T X_B)}{\sqrt{\mathrm{var}(w_A^T X_A)\,\mathrm{var}(w_B^T X_B)}}$$

$$C_{AB} w_B = \lambda C_{AA} w_A, \qquad C_{BA} w_A = \lambda C_{BB} w_B$$

$$C_{AB} = E[X_A X_B^T] = C_{BA}^T, \qquad C_{AA} = E[X_A X_A^T], \quad C_{BB} = E[X_B X_B^T]$$
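The eigenvalue formulation above can be sketched numerically; this is a numpy-only illustration with a small ridge term standing in for the shrinkage regularization, not the exact ROCCA estimator:

```python
import numpy as np

def cca(XA, XB, reg=1e-3):
    """Toy CCA: find wA, wB maximizing correlation of wA^T XA and wB^T XB.

    XA: (dA, n), XB: (dB, n), columns are paired samples.
    reg is a small ridge term on the covariances (illustrative shrinkage).
    """
    XA = XA - XA.mean(axis=1, keepdims=True)
    XB = XB - XB.mean(axis=1, keepdims=True)
    n = XA.shape[1]
    CAA = XA @ XA.T / n + reg * np.eye(XA.shape[0])
    CBB = XB @ XB.T / n + reg * np.eye(XB.shape[0])
    CAB = XA @ XB.T / n
    # Eliminating wB gives CAB CBB^{-1} CBA wA = lambda^2 CAA wA
    M = CAB @ np.linalg.solve(CBB, CAB.T)
    vals, vecs = np.linalg.eig(np.linalg.solve(CAA, M))
    wA = np.real(vecs[:, np.argmax(np.real(vals))])    # top canonical direction, view A
    wB = np.linalg.solve(CBB, CAB.T @ wA)              # matching direction, view B
    wB /= np.linalg.norm(wB)
    return wA, wB
```

After projection, features of the same person observed in cameras A and B are maximally correlated along these directions.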
Data Disparity
[Plots: distance distributions (# samples vs. distance) for the two camera views]
Generate Reference Descriptor
[Diagram: a reference set of N subjects (1, 2, 3, …, N). The probe's similarities to the reference set form the similarity-based probe RD [s1, s2, s3, …, sN]; the gallery image's similarities form the similarity-based gallery RD [s1, s2, s3, …, sN].]
Experimental Setup
VIPeR Database
• Two camera views
• 632 persons appear in both views
• One image per person per view
• Images normalized to 128×48
• Images from one camera used as probe, images from the other camera as gallery
• Half of the data used for training and half for testing

Feature Extraction
• HSV and Lab features for color
• LBP for texture
• Features extracted from 8×16 blocks

Methods Compared
• RPLM [10], Hirzer et al., ECCV 2012
• PS [2], Cheng et al., BMVC 2011
• SDALF [5], Farenzena et al., CVPR 2010
• KISSME [13], Köstinger et al., CVPR 2012
• DDC [9], Hirzer et al., SCIA 2011
• LMNN [19], Weinberger et al., JMLR 2009
• PRDC [20], Zheng et al., PAMI 2013
• ITML [3], Davis et al., ICML 2007
• ERSVM [17], Prosser et al., BMVC 2010
• ELF [7], Gray et al., ECCV 2008
• LDML [8], Guillaumin et al., ICCV 2009
• LMNN-R [4], Dikmen et al., ACCV 2011
Experimental Results
Recognition rates at top ranks (in %)
[CMC curves for the proposed method, RCCA only, and RPLM (the 2nd-best method): recognition rate (%) vs. rank 0–30]
Experimental Results
Effects of reference set size
[CMC curves for reference set sizes N = 316, 200, and 100: recognition rate (%) vs. rank 0–30]
Saliency and Re-Ranking
• To improve performance, a re-ranking step based on image saliency is added
• If a salient region is detected in one view, a corresponding salient region is likely to be present in the other view
• Soft-biometrics (backpack, jeans, carrying, short hair, and male) are used for re-ranking
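The soft-biometric re-ranking step can be sketched as boosting candidates whose attributes agree with the probe's; the additive combination rule and the weight `alpha` are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

def rerank(scores, probe_attrs, gallery_attrs, alpha=0.2):
    """Re-rank gallery candidates using soft-biometric agreement.

    scores: (n,) appearance-based similarity of each candidate to the probe.
    probe_attrs: (k,) binary soft-biometrics of the probe
                 (e.g. backpack, jeans, carrying, short hair, male).
    gallery_attrs: (n, k) the same attributes for each gallery candidate.
    """
    agreement = (gallery_attrs == probe_attrs).mean(axis=1)  # fraction of matching attributes
    adjusted = scores + alpha * agreement                    # illustrative combination
    return np.argsort(adjusted)[::-1]                        # best match first
```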
Further Experiments
• VIPeR dataset, two cameras, 632 subjects*
• CUHK campus dataset, two cameras, 971 subjects**
• Images normalized to 128×48
• Color (HSV, Lab) and texture (LBP) features are used
*http://vision.soe.ucsc.edu/?q=node/178
**http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html
Results on VIPeR Dataset
Results on CUHK Dataset
Comparison with state-of-the-art
Reference Set Selection
• Random selection
• Max-variation
  – Keep $f_i$ such that $\mathrm{var}\{s(f_i, f_j)\}_{j=1,\, j \neq i}^{N}$ is highest
• Min-correlation
  – Remove $f_i$ whose average correlation with the other samples is highest
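The two non-random criteria can be sketched as follows, assuming precomputed pairwise similarity and correlation matrices:

```python
import numpy as np

def select_max_variation(sim, k):
    """Keep the k references whose similarity to the others varies most.

    sim: (N, N) pairwise similarity matrix s(f_i, f_j).
    """
    n = sim.shape[0]
    off = ~np.eye(n, dtype=bool)                       # mask out s(f_i, f_i)
    var = np.array([np.var(sim[i][off[i]]) for i in range(n)])
    return np.argsort(var)[::-1][:k]                   # highest variance first

def select_min_correlation(corr, k):
    """Iteratively remove the sample with the highest average correlation."""
    keep = list(range(corr.shape[0]))
    while len(keep) > k:
        sub = corr[np.ix_(keep, keep)]
        avg = (sub.sum(axis=1) - np.diag(sub)) / (len(keep) - 1)
        keep.pop(int(np.argmax(avg)))                  # drop the most redundant sample
    return np.array(keep)
```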
Summary
• Direct comparison between images from different cameras is bypassed by using RDs
• The reference-based approach outperforms previous approaches
• Future work will extend this framework to the multi-shot scenario with reinforcement learning (RL) and reference set selection
BEST PAPER AWARD
IEEE International Conference on Advanced Video and Signal-Based
Surveillance (AVSS), August 27-30, 2013, Krakow, Poland.
Re-identification Scenario
• In a network of non-overlapping cameras, a person seen in one camera reappears in a different camera at a different time
• Tasks
  – Vision: tracking and re-identification
  – Learning: online adaptation of the matching weights
Tracking and Re-identification
[Framework diagram: video stream(s) → detection & tracklets → features → matching → evaluation → identity; a reinforcement signal updates the matching weights via stochastic reinforcement learning, drawing on a database (features, IDs) and the context.]
Framework for dual functionalities:
• Tracking: single camera
• Re-identification: multi-camera
• Matching by weighted comparison of features
• Weights learned online by stochastic RL
• Weights adapted based on the context
• Evaluation can be a combination of
  – Self-evaluation: e.g., confidence, error
  – External: e.g., higher-level generative models
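Matching by weighted comparison of features can be sketched as a per-cue weighted similarity, where the weights are the quantities the RL module would adapt online; the cue names and the distance-to-similarity mapping are illustrative assumptions:

```python
import numpy as np

def weighted_match_score(feats_a, feats_b, weights):
    """Weighted comparison of per-cue feature distances.

    feats_a, feats_b: dicts of cue name -> feature vector (e.g. color, texture).
    weights: dict of cue name -> importance, adapted online by the RL module.
    Returns a similarity score (higher = better match).
    """
    score = 0.0
    for cue, w in weights.items():
        d = np.linalg.norm(feats_a[cue] - feats_b[cue])    # per-cue distance
        score += w * np.exp(-d)                            # map distance to similarity
    return score
```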
RL: SRV Unit Algorithm
• Stochastic real-valued (SRV) unit
  – Immediate reinforcement
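The SRV algorithm is not detailed on the slide; below is a generic sketch of an immediate-reinforcement SRV-style unit (in the spirit of Gullapalli's stochastic real-valued unit, an assumption about the variant used). The unit emits a real-valued output sampled around a learned mean and nudges the mean toward outputs that earned above-baseline reward:

```python
import random

class SRVUnit:
    """Minimal stochastic real-valued unit with immediate reinforcement."""

    def __init__(self, mean=0.5, sigma=0.1, lr=0.05):
        self.mean, self.sigma, self.lr = mean, sigma, lr
        self.baseline = 0.0          # running estimate of expected reward
        self.last = mean

    def act(self):
        # Explore by sampling around the current mean.
        self.last = random.gauss(self.mean, self.sigma)
        return self.last

    def reinforce(self, reward):
        # Move the mean in the direction of the exploration noise,
        # scaled by how much the reward beat the baseline.
        self.mean += self.lr * (reward - self.baseline) * (self.last - self.mean)
        self.baseline += 0.1 * (reward - self.baseline)
```

In the framework above, one such unit per feature weight would let the matcher adapt its weights online from the evaluation signal.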
Results
• Comparison with the baseline with fixed weights
[Plot: cumulative recognition accuracy (0–1) vs. tracklet index (50–350) for the proposed method and the fixed-weight baseline, with camera transitions (Cameras 1–8) marked]
Results
• Why does performance keep dropping in Cameras 4 and 5?
  – The appearance of people under these cameras is drastically different, and the system has not seen any examples under these scenarios
  – In addition, the detector performs worse in Camera 4 due to smaller target sizes
[Plots: per-camera cumulative recognition accuracy vs. tracklet index (130–210) for the proposed method and the baseline in Camera 4 (outdoor, unseen) and Camera 5; example detections from Camera 4]
Results
• If the algorithm has seen similar images in the past, it performs better (it learns from past experience)
[Example images from Cameras 4, 5, 7, and 8]
AVSS 2017
Person Re-identification
Motivation
• Allow overlapping
• Reliability and accuracy
• Not only existence-to-existence, but also existence-to-nonexistence relationships
Technical Approach
Attribute detectors
• Three separate ConvNets for the head, torso, and leg parts
• Three convolutional layers and two fully connected layers
• Cross-entropy loss function
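The cross-entropy loss for per-attribute detector outputs can be written as a short, framework-agnostic sketch (shapes and the sigmoid/binary formulation are assumptions; the actual networks are ConvNets):

```python
import numpy as np

def binary_cross_entropy(scores, labels, eps=1e-7):
    """Cross-entropy loss over per-attribute sigmoid outputs.

    scores: (n, k) raw logits for k attributes; labels: (n, k) in {0, 1}.
    """
    p = 1.0 / (1.0 + np.exp(-scores))          # sigmoid probabilities
    p = np.clip(p, eps, 1.0 - eps)             # numerical safety
    return float(-np.mean(labels * np.log(p) + (1 - labels) * np.log(1 - p)))
```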
Co-occurrence Pattern Mining for Attribute Detection Refinement
• Refinement
– Case 1:
– Case 2:
Co-occurrence of attributes
• Association rules mine the relationships between attributes subject to given lower bounds on both the support and the confidence.
• Their advantages are:
  (1) they fit the non-symmetric nature of the relationships well;
  (2) items can overlap, which complies with the pairwise relationships among different attributes;
  (3) they take into consideration both the reliability (support) and the accuracy (confidence);
  (4) they can predict not only the existence-to-existence relationship but also the existence-to-nonexistence one.
• The Weka platform was used to implement our mining technique because it makes it simple to explore the data and run learning algorithms.
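A toy sketch of pairwise rule mining with support and confidence thresholds, including existence-to-nonexistence rules (the actual system uses Weka; attribute names below are hypothetical):

```python
from itertools import permutations

def mine_rules(transactions, min_support=0.3, min_confidence=0.7):
    """Mine pairwise association rules between attributes.

    transactions: list of sets of attributes present in each image.
    Returns rules (lhs, rhs, support, confidence); rhs may be ('not', attr)
    to express an existence-to-nonexistence relationship.
    """
    n = len(transactions)
    items = sorted(set().union(*transactions))
    rules = []
    for lhs, other in permutations(items, 2):          # non-symmetric: ordered pairs
        with_lhs = [t for t in transactions if lhs in t]
        if not with_lhs:
            continue
        for rhs, holds in ((other, lambda t: other in t),
                           (("not", other), lambda t: other not in t)):
            both = sum(1 for t in with_lhs if holds(t))
            support = both / n                         # reliability
            confidence = both / len(with_lhs)          # accuracy
            if support >= min_support and confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return rules
```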
Ranking
• Transfer learning for person re-id
  – Apply attribute detection to person re-id
    • Take attribute scores as features
    • A metric learning method is used to learn a distance metric
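The slide does not name the metric learning method; as an illustration, ranking with a learned Mahalanobis-style metric over attribute-score features could look like this (the matrix M stands in for whatever metric is learned):

```python
import numpy as np

def mahalanobis_dist(x, y, M):
    """Distance between attribute-score vectors under a learned PSD matrix M."""
    d = x - y
    return float(np.sqrt(d @ M @ d))

def rank_gallery(probe, gallery, M):
    """Rank gallery entries by learned distance to the probe (closest first)."""
    dists = [mahalanobis_dist(probe, g, M) for g in gallery]
    return np.argsort(dists)
```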
Experiments
• Dataset
  – iLIDS-VID dataset
    • 300 video pairs captured at an airport arrival hall
    • 23 to 192 image frames per video
    • 150 pairs for training, 150 for testing
  – PRID 2011 dataset
    • 300 video pairs
    • 5 to 675 image frames per video
    • 100 pairs for training, 100 for testing
  – PETA dataset
    • 19,000 attribute-labeled images of 8,705 persons, including parts of iLIDS-VID and PRID 2011
Experiments
• Setup
  – Attribute Detection
  – Evaluation Protocol
    • 10-fold cross-validation
    • Cumulative Matching Characteristic (CMC)
– Each probe is compared against all gallery samples
– Sort the matching scores
– Determine the rank at which a true match occurs
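The CMC protocol described above can be sketched as follows, given a probe-by-gallery distance matrix:

```python
import numpy as np

def cmc_curve(dist, labels_probe, labels_gallery):
    """Cumulative Matching Characteristic from a distance matrix.

    For each probe: sort the gallery by distance (matching scores),
    find the rank at which the true match occurs, then accumulate
    the rank counts into a cumulative curve.
    """
    n_probe, n_gallery = dist.shape
    hits = np.zeros(n_gallery)
    for i in range(n_probe):
        order = np.argsort(dist[i])                    # sort matching scores
        rank = np.where(labels_gallery[order] == labels_probe[i])[0][0]
        hits[rank] += 1
    return np.cumsum(hits) / n_probe                   # CMC(k) = P(true match in top k+1)
```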
Experiments
• Results
– iLIDS-VID dataset
Experiments
• Results
– PRID 2011 dataset
Conclusions
AN UNBIASED TEMPORAL REPRESENTATION
FOR VIDEO-BASED PERSON RE-IDENTIFICATION
ICIP 2018
Deep Agent Using RL- ICIP 2018
Key Idea:
1. We propose a Deep Agent that integrates existing algorithms and enables them to complement each other.
2. Two Deep Agents are designed to integrate algorithms for the data augmentation and feature extraction parts of re-ID separately.
Acknowledgment
• Students: Le An, Xiu Zhang, Fulong Jiao
• Post-doc: Federico Pala, Ninad Thakoor
• Funding agencies
• IEEE TCSVT 2015
• IEEE SPL 2015
• IEEE Computer 2015
• Information Sciences 2015
• AVSS 2017
• ICIP 2016, 2017, 2018
Thank you.
Questions?