
Human Re-identification in a Network of Video Cameras


Bir Bhanu
University of California, Riverside
Riverside, California 92521, USA
bhanu@cris.ucr.edu

July 31, 2018

Recognizing Individuals
[Figure: a suspect captured by Camera A and by Camera B at different times. Same person?]

• Non-overlapping cameras at different times and locations


Challenges
[Figure panels: different pose, changing illumination, occlusion]

Plus low resolution, noise, and blurriness: the same person looks dissimilar, while different persons look similar!

Previous Approaches
Gallery: Feature Extraction → Feature Selection / Transformation / Metric Learning → Classification
Probe: Feature Extraction → Feature Selection / Transformation / Metric Learning → Classification

Related Work
Pipeline: Person Detection → Feature Extraction → Feature Mapping / Distance Metric → Matching

Method: Description
Yang et al. (ECCV 2014): Propose a semantic color descriptor as a higher-level color feature
Zhao et al. (CVPR 2014): Learn discriminative mid-level filters from image patch clusters
Kuo et al. (WACV 2013): Use color names and RankBoost for re-identification
Liu et al. (PR 2014): Study the importance of different features
Li et al. (CVPR 2014): Use a deep neural network to learn robust features
Zheng et al. (TPAMI 2013): Formulate re-identification as a relative distance comparison problem
Tao et al. (TCSVT 2013): Propose a regularized Mahalanobis distance metric
Li et al. (CVPR 2013): Partition the image space into different configurations
Xiong et al. (ECCV 2014): Use different metric learning methods with histogram-based features
Pedagadi et al. (CVPR 2013): Use local Fisher discriminant analysis to reduce feature dimensionality
Proposed: Transform the original image features into reference descriptors to avoid direct matching

Drawbacks of Traditional Approaches

"The variations in feature space between the images of the same person due to pose and illumination change are almost always larger than image variations due to change in person identity."

[Figure: different people look similar; the same person looks different]

Direct comparison is not that reliable…


A Reference Helps!
Intuition: a reference set can be used to indirectly compare two sets of data (images from different camera views)

[Figure: two images of Jennifer Lopez from different views are matched indirectly through a reference set that includes Angelina Jolie]

Basic Reference Set Idea
• Proposed a new feature representation: the reference descriptor (RD)
• An RD is generated by computing the similarity between a probe/gallery image and a reference set

Regularized CCA
Canonical Correlation Analysis (CCA) / Regularized CCA (RCCA)
• Explores relations between two sets of random variables drawn from different observations of the same data
• Finds projections of the data such that, after projection, the correlation between the two sets is maximized
• Formulated as a constrained optimization problem and solved via equivalent generalized eigenvalue problems

Note: Given a small training set, as in most re-identification problems, direct application of CCA may perform poorly due to inaccurate estimates of the data covariance matrices. We proposed ROCCA (IEEE SPL 2015), which uses shrinkage estimation and a smoothing technique; it robustly estimates the data covariance matrices from limited training samples and is simple to implement.

[Figure: feature distributions before and after CCA]

Coherent Subspace
• Canonical correlation analysis (CCA) is used to learn feature
transformation matrices
• After transformation, features of the same person from different
cameras are better correlated

$$\rho = \frac{\operatorname{cov}(w_A^\top X_A,\; w_B^\top X_B)}{\sqrt{\operatorname{var}(w_A^\top X_A)\,\operatorname{var}(w_B^\top X_B)}}$$

$$C_{AB} w_B = \lambda\, C_{AA} w_A, \qquad C_{BA} w_A = \lambda\, C_{BB} w_B$$

$$C_{AB} = E[X_A X_B^\top] = C_{BA}^\top, \qquad C_{AA} = E[X_A X_A^\top], \quad C_{BB} = E[X_B X_B^\top]$$
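For concreteness, here is a minimal NumPy sketch of the coherent-subspace computation. It solves the whitened form of the generalized eigenvalue problems above; the `reg` shrinkage term is an illustrative stand-in for the ROCCA-style covariance regularization, not the authors' exact implementation, and the function name is ours.

```python
import numpy as np

def rcca(XA, XB, reg=1e-3):
    """Regularized CCA sketch. XA, XB: (d_A x n) and (d_B x n) feature
    matrices of the same n training persons seen in cameras A and B
    (columns are paired). `reg` is an illustrative shrinkage term."""
    XA = XA - XA.mean(axis=1, keepdims=True)
    XB = XB - XB.mean(axis=1, keepdims=True)
    n = XA.shape[1]
    CAA = XA @ XA.T / n + reg * np.eye(XA.shape[0])  # shrink toward identity
    CBB = XB @ XB.T / n + reg * np.eye(XB.shape[0])
    CAB = XA @ XB.T / n
    # Whiten with Cholesky factors and take an SVD: equivalent to the
    # generalized eigenvalue problems on the slide, but more stable.
    LA = np.linalg.cholesky(CAA)
    LB = np.linalg.cholesky(CBB)
    M = np.linalg.solve(LA, CAB) @ np.linalg.inv(LB).T
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    WA = np.linalg.solve(LA.T, U)     # columns are w_A, camera A projection
    WB = np.linalg.solve(LB.T, Vt.T)  # columns are w_B, camera B projection
    return WA, WB, s                  # s holds the canonical correlations
```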

Data Disparity

[Figure: distance distributions (number of samples vs. distance) for the two camera views, illustrating the disparity between them]

Generate Reference Descriptor
Reference Set: images 1 … N

Probe → similarity to each reference image → similarity-based probe RD = [s1, s2, s3, …, sN]
Gallery → similarity to each reference image → similarity-based gallery RD = [s1, s2, s3, …, sN]

The resulting match is determined using the cosine similarity between the probe and gallery RDs (a sketch follows below).
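A minimal sketch of RD generation and matching, assuming cosine similarity as the similarity function s(·,·) and the camera-specific CCA projection W from the previous slides; the function names are ours.

```python
import numpy as np

def reference_descriptor(x, reference_set, W):
    """Similarity of a (projected) image feature x to each of the N
    reference images -> an N-dimensional reference descriptor (RD).
    Cosine similarity is one plausible choice of s(.,.) here."""
    z = W.T @ x
    R = np.stack([W.T @ r for r in reference_set])        # (N, d')
    sims = R @ z / (np.linalg.norm(R, axis=1) * np.linalg.norm(z) + 1e-12)
    return sims                                           # [s1, ..., sN]

def match(probe_rd, gallery_rds):
    """Rank gallery entries by cosine similarity between RDs."""
    g = np.stack(gallery_rds)
    scores = g @ probe_rd / (np.linalg.norm(g, axis=1)
                             * np.linalg.norm(probe_rd) + 1e-12)
    return np.argsort(-scores)    # best match first
```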

Experimental Setup
VIPeR Database
• Two camera views
• 632 persons appear in both views
• One image per person per view
• Images normalized to 128×48
• Images from one camera as probe, images from the other camera as gallery
• Half the data used for training and half for testing

Feature Extraction
• HSV and Lab features for color
• LBP for texture
• Features extracted from 8×16 blocks

Methods Compared
• RPLM [10], Hirzer et al., ECCV 2012
• PS [2], Cheng et al., BMVC 2011
• SDALF [5], Farenzena et al., CVPR 2010
• KISSME [13], Köstinger et al., CVPR 2012
• DDC [9], Hirzer et al., SCIA 2011
• LMNN [19], Weinberger et al., JMLR 2009
• PRDC [20], Zheng et al., TPAMI 2013
• ITML [3], Davis et al., ICML 2007
• ERSVM [17], Prosser et al., BMVC 2010
• ELF [7], Gray et al., ECCV 2008
• LDML [8], Guillaumin et al., ICCV 2009
• LMNN-R [4], Dikmen et al., ACCV 2011

Experimental Results

Recognition rates at top ranks (in %)

[Table: recognition rates at top ranks for the proposed method (this paper, AVSS 2013) and prior methods from ECCV 2012, BMVC 2011, CVPR 2010, CVPR 2012, SCIA 2011, JMLR 2009, TPAMI 2013, ICML 2007, BMVC 2010, ECCV 2008, ICCV 2009, ACCV 2011, and CVPR 2013; only the values 27, 62, and 76 survive extraction]
Experimental Results

CMC curves for the proposed method, RCCA only, and RPLM (the second-best method)

[Figure: recognition rate (%) vs. rank (0-30) for RPLM, RCCA only, and the proposed method]

Experimental Results
Effects of reference set size

[Figure: CMC curves (recognition rate (%) vs. rank, 0-30) for reference set sizes N = 316, 200, and 100]
Saliency and Re-Ranking
• To improve performance, a re-ranking step based on image saliency is added
• If a salient region is detected in one view, the corresponding salient region is likely to be present in the other view
• Soft biometrics (backpack, jeans, carrying, short hair, and male) are used for re-ranking (a sketch follows below)
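A minimal sketch of what such a soft-biometric re-ranking step could look like, assuming binary attribute vectors for probe and gallery; the top-k window and the bonus weight w are illustrative knobs, not values from the paper.

```python
import numpy as np

def soft_biometric_rerank(ranking, scores, probe_attrs, gallery_attrs, k=20, w=0.1):
    """Re-rank the top-k gallery candidates by agreement of binary
    soft-biometric attributes with the probe. `ranking` is the initial
    best-first list of gallery ids; `scores` maps id -> appearance score."""
    top, rest = list(ranking[:k]), list(ranking[k:])
    def boosted(g):
        # fraction of attributes (backpack, jeans, ...) agreeing with probe
        agree = np.mean(np.asarray(gallery_attrs[g]) == np.asarray(probe_attrs))
        return scores[g] + w * agree       # appearance score + attribute bonus
    top.sort(key=boosted, reverse=True)
    return top + rest
```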

Further Experiments
• VIPeR dataset, two cameras, 632 subjects*
• CUHK campus dataset, two cameras, 971 subjects**
• Images normalized to 128×48
• Color (HSV, Lab) and texture (LBP) features are used

*http://vision.soe.ucsc.edu/?q=node/178
**http://www.ee.cuhk.edu.hk/~xgwang/CUHK_identification.html

Results on VIPeR Dataset

[Table: comparison with the state of the art]

[Figure: performance with reduced reference set size]

Results on CUHK Dataset
[Table: comparison with the state of the art]

Reference Set Selection
• Random selection
• Max-variation: keep $f_i$ such that $\operatorname{var}\{s(f_i, f_j)\}_{j=1,\, j \neq i}^{N}$ is highest
• Min-correlation: remove the $f_i$ whose average correlation with the other samples is highest (see the sketch below)
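A greedy NumPy sketch of the three selection strategies, assuming a precomputed N×N similarity matrix S with S[i, j] = s(f_i, f_j); the paper's exact procedure may differ.

```python
import numpy as np

def select_reference_set(S, m, strategy="max_variation"):
    """Shrink an N-image reference set to m images, given the NxN
    similarity matrix S. A greedy sketch of the slide's criteria."""
    N = S.shape[0]
    if strategy == "max_variation":
        # keep the f_i whose similarities to the other samples vary most
        var = np.array([np.var(np.delete(S[i], i)) for i in range(N)])
        return np.argsort(-var)[:m]
    elif strategy == "min_correlation":
        # iteratively remove the f_i most correlated, on average, with the rest
        keep = list(range(N))
        while len(keep) > m:
            sub = S[np.ix_(keep, keep)]
            avg = (sub.sum(axis=1) - np.diag(sub)) / (len(keep) - 1)
            keep.pop(int(np.argmax(avg)))
        return np.array(keep)
    else:
        # random baseline
        return np.random.choice(N, m, replace=False)
```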

Summary
• Direct comparison between images from different cameras is bypassed by using RDs
• The reference-based approach outperforms previous approaches
• Future work will extend this framework to the multi-shot scenario, with reinforcement learning (RL) and reference set selection

BEST PAPER AWARD
IEEE International Conference on Advanced Video and Signal-Based
Surveillance (AVSS), August 27-30, 2013, Krakow, Poland.



Outline of Presentation
• Human Identification
• Human Re-Identification
  - Reference-Based Framework
  - Online Learning for Re-identification
• Attribute Co-occurrence Based Re-identification
• Unbiased Representation of Video for Re-identification
• Conclusions and Future Work

Re-identification Scenario
• In a network of non-overlapping cameras, identify an individual
• Tracking: from frame to frame
• Re-identification: re-appearing in a different camera (or the same camera) at a different time
• Tasks
  - Vision: tracking and re-identification
  - Learning: online adaptation of feature weights based on the context

[Figure: example views from Cameras 2, 7, and 8]
Tracking and Re-identification
Framework for dual functionalities
• Tracking: single camera
• Re-identification: multi-camera

[Figure: video stream(s) → detection & tracklets → features → matching → identity; an evaluation step feeds reinforcement to a stochastic reinforcement learning module, which adapts the feature weights using the context; a database stores features and IDs]

• Matching by weighted comparison of features
• Weights learned online by stochastic RL
• Weights adapted based on the context
• Evaluation can be a combination of
  - Self-evaluation: e.g., confidence, error
  - External: e.g., higher-level generative models
RL: SRV Unit Algorithm
• Stochastic real-valued (SRV) unit
  - Immediate reinforcement
  - Real-valued output
  - Associative learner
• An advancement of the REINFORCE algorithm for real-valued output
• Maps context to a feature weight (one SRV unit per feature)
• Context is provided by the environment
• Context and reinforcement are used to update the sample generator, e.g., the mean and variance of a Gaussian distribution
• An output function maps the output of the sample generator to the real-valued output; e.g., the logistic function maps (−∞, +∞) to (0, 1), which is suitable for feature weights (a sketch of such a unit follows below)

[Figure: SRV unit: context inputs → parameter computation → sample generator (with update rules) → output mapping (logistic function) → weight; reinforcement drives the updates]
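A minimal sketch of an SRV unit in this spirit: a Gaussian sample generator whose mean is computed from the context, a logistic output mapping, and a REINFORCE-style update driven by the difference between received and predicted reinforcement. The learning rates and the linear reinforcement predictor are illustrative choices, not the exact update rules used here.

```python
import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

class SRVUnit:
    """Stochastic real-valued unit sketch (one per feature). Maps a
    context vector to a feature weight in (0, 1); the Gaussian sample
    generator's mean and variance adapt from scalar reinforcement r."""
    def __init__(self, dim, alpha=0.1, sigma0=0.5):
        self.w = np.zeros(dim)   # parameters of the mean computation
        self.v = np.zeros(dim)   # parameters of the reinforcement predictor
        self.alpha, self.sigma0 = alpha, sigma0

    def act(self, context):
        mu = context @ self.w
        r_hat = np.clip(context @ self.v, 0.0, 1.0)
        sigma = self.sigma0 * (1.0 - r_hat)      # explore less when doing well
        a = np.random.normal(mu, max(sigma, 1e-6))
        self._cache = (context, mu, sigma, a)
        return logistic(a)                        # feature weight in (0, 1)

    def learn(self, r):
        context, mu, sigma, a = self._cache
        r_hat = np.clip(context @ self.v, 0.0, 1.0)
        # REINFORCE-style: move the mean toward actions that beat expectation
        self.w += self.alpha * (r - r_hat) * ((a - mu) / max(sigma, 1e-6)) * context
        self.v += self.alpha * (r - r_hat) * context
```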
Video Data
• Data collected from 8 cameras in the Videoweb
• 61 individuals, ~70 minutes
• Soft-biometric features
  - Height
  - Weight
  - Body color histogram
  - Torso color histogram
  - Leg color histogram
• Context
  - Distance from camera
  - Computed with homography calibration

[Figure: Videoweb, a wireless camera network with 80 cameras (37 outdoor)]

Results
• Comparison with the baseline with fixed weights

[Figure: cumulative recognition accuracy vs. tracklet index (50-350) for the proposed method and the fixed-weight baseline, annotated with the active camera (1-8)]

Results
• Why does performance keep dropping in Cameras 4 and 5?
  - The appearance of people under these cameras is drastically different, and the system has not seen any examples under these scenarios
  - In addition, the detector's performance is inferior in Camera 4 due to smaller target sizes

[Figure: per-camera cumulative recognition accuracy for Cameras 4 (outdoor, unseen) and 5, proposed vs. baseline, with example detections in Camera 4]
Results
• If the algorithm has seen similar images in the past, it performs better (it learns from past experience)

Camera 4: Outdoor. Accuracy = 0.23
Camera 5: Outdoor + strong illumination changes. Accuracy = 0.31
Camera 7: Outdoor + strong illumination changes. Accuracy = 0.73
Camera 8: Outdoor + strong illumination changes + viewpoint change. Accuracy = 0.64

• Performance can be improved further by
  - Adding more contexts while learning, e.g., illumination, viewpoints
  - Allowing camera-to-camera appearance mapping
Attributes Co-occurrence Pattern Mining
for Video-based Person Re-identification

AVSS 2017

Person Re-identification

• Person re-identification aims at matching pedestrians across non-overlapping cameras

Motivation

• Co-occurrence among attributes helps detection
• Intuition: people tend to follow some rules in dressing
• Requirements:
  - Non-symmetric
  - Allow overlapping
  - Reliability and accuracy
  - Not only existence-to-existence, but also existence-to-nonexistence relationships

Motivation

• How do people recognize persons?
  - Attributes
  - "A man with a white T-shirt and blue jeans"
  - "A girl wearing sunglasses and a black shirt"
• Challenges:
  - Low resolution
  - Limited labeled data
  - Different discriminative capabilities among attributes

Technical Approach

Attribute detectors
• Three separate ConvNets for the head, torso, and leg parts
• Three convolutional layers and two fully connected layers
• Cross-entropy loss function (a sketch follows below)
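A minimal PyTorch sketch of one such part network; the channel counts, input size, and per-attribute two-way (present/absent) output layout are our illustrative assumptions, not values from the paper.

```python
import torch
import torch.nn as nn

class PartAttributeNet(nn.Module):
    """Sketch of one part-based attribute detector (head, torso, or
    legs): three conv layers + two fully connected layers, trained
    with a cross-entropy loss as on the slide."""
    def __init__(self, num_attrs, in_hw=(48, 48)):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        flat = 128 * (in_hw[0] // 8) * (in_hw[1] // 8)
        self.fc = nn.Sequential(
            nn.Flatten(), nn.Linear(flat, 256), nn.ReLU(),
            nn.Linear(256, 2 * num_attrs),   # present/absent logits per attribute
        )
        self.num_attrs = num_attrs

    def forward(self, x):
        # (batch, 2, num_attrs): class dim second, as CrossEntropyLoss expects
        return self.fc(self.features(x)).view(x.size(0), 2, self.num_attrs)

# training sketch: labels is a (batch, num_attrs) LongTensor of 0/1
net = PartAttributeNet(num_attrs=10)
criterion = nn.CrossEntropyLoss()
logits = net(torch.randn(4, 3, 48, 48))
loss = criterion(logits, torch.randint(0, 2, (4, 10)))
```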

Co-occurrence Pattern Mining for Attribute Detection Refinement

• Association rules mining:
• Refinement
  - Case 1:
  - Case 2:

Co-occurrence of attributes
• Association rules aim at mining the relationships between attributes under the restriction of given lower bounds on both the support and the confidence.
• The advantages are:
  (1) they fit the non-symmetric nature well;
  (2) items can overlap, which complies with the pairwise relationships among different attributes;
  (3) they take into consideration both the reliability (support) and the accuracy (confidence);
  (4) they can predict not only the existence-to-existence relationship but also the existence-to-nonexistence one.
• We adopted the FP-growth algorithm for calculating the co-occurrence information matrix because it converges very fast by virtue of its tree-like structure (a toy illustration follows below).
• The Weka platform has been used to implement our mining technique for its simplicity in exploring the data and running learning algorithms.
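To make the rule-mining step concrete, here is a tiny brute-force stand-in for FP-growth over binary attribute vectors (FP-growth itself would build a prefix tree instead of enumerating pairs). Items are (attribute, value) pairs, so existence-to-nonexistence rules are covered; the thresholds are illustrative.

```python
from itertools import permutations

def mine_rules(transactions, min_support=0.3, min_confidence=0.8):
    """Mine pairwise association rules from `transactions`, a list of
    dicts mapping attribute -> bool. Rules such as
    (long hair, True) -> (male, False) are found as well."""
    n = len(transactions)
    attrs = list(transactions[0].keys())
    items = [(a, v) for a in attrs for v in (True, False)]

    def support(itemset):
        return sum(all(t[a] == v for a, v in itemset) for t in transactions) / n

    rules = []
    for lhs, rhs in permutations(items, 2):   # ordered: rules are non-symmetric
        if lhs[0] == rhs[0]:
            continue                          # skip same-attribute pairs
        s_both, s_lhs = support([lhs, rhs]), support([lhs])
        if s_both >= min_support and s_lhs > 0 and s_both / s_lhs >= min_confidence:
            rules.append((lhs, rhs, s_both, s_both / s_lhs))  # rule, support, confidence
    return rules
```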

Ranking
• Transfer learning for person re-id
  - Apply attribute detection to person re-id
    • Take attribute scores as features
    • A metric learning method is used to learn a distance metric
• Integration with an appearance-based re-id system
  - Add the distance metric values from the two separate parts, calculated from appearance features and attribute features (see the sketch below)
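A minimal sketch of the score-level fusion described above; the slide simply adds the two distances, and the balance weight `beta` is our illustrative addition.

```python
import numpy as np

def fused_ranking(d_app, d_attr, beta=1.0):
    """Combine two (num_probe x num_gallery) distance matrices, one
    from appearance features and one from attribute scores, each
    computed under its own learned metric."""
    d = d_app + beta * d_attr
    return np.argsort(d, axis=1)   # per-probe gallery ranking, best first
```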

Experiments
• Datasets
  - iLIDS-VID dataset
    • 300 video pairs at the airport arrival hall
    • 23-192 image frames per video
    • 150 for training, 150 for testing
  - PRID 2011 dataset
    • 300 video pairs from two static outdoor surveillance cameras
    • 5-675 image frames per video
    • 100 for training, 100 for testing
  - PETA dataset
    • 19,000 attribute-labeled images from 8,705 persons, including parts of iLIDS-VID and PRID 2011

Experiments

• Setup
  - Attribute detection
  - Evaluation protocol
    • 10-fold cross-validation
    • Cumulative Matching Characteristic (CMC), sketched below
      - Each probe is compared against all gallery samples
      - Sort the matching scores
      - Determine the rank at which a true match occurs
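A minimal sketch of the CMC computation from a probe × gallery distance matrix; the function and argument names are ours.

```python
import numpy as np

def cmc(dist, probe_ids, gallery_ids, max_rank=20):
    """Cumulative Matching Characteristic: sort each probe's gallery
    by distance, record the rank of the true match, and accumulate."""
    gallery_ids = np.asarray(gallery_ids)
    hits = np.zeros(max_rank)
    for i, pid in enumerate(probe_ids):
        order = np.argsort(dist[i])                       # best match first
        rank = int(np.where(gallery_ids[order] == pid)[0][0])
        if rank < max_rank:
            hits[rank] += 1
    return np.cumsum(hits) / len(probe_ids)               # CMC curve values
```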

Experiments

• Results
– iLIDS-VID dataset

Experiments

• Results
– PRID 2011 dataset

Conclusions

• Proposed an attribute-based method for person re-identification
• Designed three ConvNets for attribute detection
• Exploited co-occurrence information to refine the attribute detection
• Integrated the attributes into an appearance-based model for the final prediction

AN UNBIASED TEMPORAL REPRESENTATION
FOR VIDEO-BASED PERSON RE-IDENTIFICATION

ICIP 2018

Deep Agent Using RL (ICIP 2018)

Key idea:
1. We propose a Deep Agent that can integrate existing algorithms and enable them to complement each other.
2. Two Deep Agents are designed to integrate algorithms for the data augmentation and feature extraction stages, separately, for re-id.

Acknowledgment
• Students: Le An, Xiu Zhang, Fulong Jiao
• Post-docs: Federico Pala, Ninad Thakoor
• Funding agencies
• Publications: IEEE TCSVT 2015; IEEE SPL 2015; IEEE Computer 2015; Information Sciences 2015; AVSS 2017; ICIP 2016, 2017, 2018

Thank you.

Questions?

