
Who is who at different cameras:

people re-identification
using depth cameras

Belén Castillón Fernández de Pinedo


1. Introduction

2. System description


Kinect sensor
Camera calibration
Height maps
Segmentation
Tracking

3. Bodyprints
3.1 Extraction
3.2 Matching people

4. Results and discussion

5. Conclusions

1. Introduction

Goal: obtain a feature vector per person,

called a bodyprint; these vectors can be
matched to solve the re-identification
problem. Bodyprints are obtained using
calibrated depth-colour cameras such as
the Microsoft Kinect.

Main problem: in multi-camera systems it is

difficult to re-identify people who leave one
camera's view and enter another, or the same
one, after a period of time.

2. System description

2.1 Kinect sensor

Aligned RGB and depth

images are obtained with the Kinect.

The sensor has a colour (RGB) camera and

an infrared (IR) camera. It also has
an IR pattern projector that,
together with the IR camera,
determines depth.

It segments people and is able

to estimate their position.

The sensor measures depth from a

minimum distance of about 1 m
up to a maximum distance of
about 10 m (indoor use only).

2.2 Camera calibration

To change from
camera coordinates to
world coordinates.

Spatial camera
coordinates equations
for every pixel:
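The equations themselves did not survive the transcription; a standard pinhole-camera reconstruction (with focal lengths f_x, f_y and principal point c_x, c_y from the calibration, and d(u, v) the measured depth at pixel (u, v) — all names assumed here, not taken from the slides) would read:

```latex
Z = d(u, v), \qquad
X = \frac{(u - c_x)\, Z}{f_x}, \qquad
Y = \frac{(v - c_y)\, Z}{f_y}
```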

These equations
provide a 3D point for every pixel.

The coordinate system has its origin at the

optical centre of the camera and is aligned with
the camera axes.

2.2 Camera calibration

The z_world coordinate represents the height.

Transformation between both systems:
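The transformation equation is missing from the transcript; the usual rigid transformation between camera and world coordinates, with a rotation matrix R and translation vector T obtained from calibration (names assumed), is:

```latex
\begin{pmatrix} x_{world} \\ y_{world} \\ z_{world} \end{pmatrix}
= R \begin{pmatrix} x_{cam} \\ y_{cam} \\ z_{cam} \end{pmatrix} + T
```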

2.2 Camera calibration (ground)

To determine the ground plane, select a

portion of the RGB-D image that corresponds
to the ground. The z_world axis will be
normal to this plane.
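A least-squares plane fit over the selected ground pixels is one way to recover that plane; a minimal sketch (the function name and the SVD-based approach are my own, not taken from the slides):

```python
import numpy as np

def fit_ground_plane(points):
    """Fit a plane to 3D points (N x 3) by least squares using SVD.

    Returns (normal, centroid): the unit normal of the plane and a
    point on it. The z_world axis can then be taken along `normal`.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value is
    # the normal of the best-fitting plane through the centroid.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    # Orient the normal to point "up" (positive z by convention).
    if normal[2] < 0:
        normal = -normal
    return normal, centroid
```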

2.3 Height maps

Virtual aerial view of the scene.

They are images in which pixel values

represent the height with respect to the
ground plane.

Height maps make segmentation easier.
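A height map can be sketched as follows (the cell size, map dimensions, and max-height aggregation are assumed details, not given in the slides):

```python
import numpy as np

def height_map(points, cell=0.05, size=(100, 100)):
    """Build a virtual aerial view (height map) from world points.

    points: (N, 3) array in world coordinates, where z_world = height.
    cell:   ground-plane cell size in metres (assumed value).
    size:   (rows, cols) of the output map.
    Each map pixel stores the maximum height observed in its cell.
    """
    hmap = np.zeros(size, dtype=float)
    rows = np.floor(points[:, 1] / cell).astype(int)
    cols = np.floor(points[:, 0] / cell).astype(int)
    # Keep only points that fall inside the map.
    valid = (rows >= 0) & (rows < size[0]) & (cols >= 0) & (cols < size[1])
    for r, c, z in zip(rows[valid], cols[valid], points[valid, 2]):
        hmap[r, c] = max(hmap[r, c], z)
    return hmap
```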

2.4 Segmentation

Segmentation tells us which pixels in the

original image belong to each particular
person.

2.5 Tracking

Process of linking the segmentation results

from several frames.

A track denotes a thread of linked objects

corresponding to the same person.

3. Bodyprints

Key idea: to match people, we extract a feature

vector per track, which we call a bodyprint. Each
bodyprint summarises the colour appearance at
each height along a track.

Algorithm: height is discretised in steps of 2


At time t, the mean RGB value at each

height is computed to obtain the temporal
signature.
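The temporal-signature formula is not in the transcript; a reconstruction consistent with the surrounding definitions, where P_t(h) denotes the set of pixels belonging to the person that fall in height bin h at time t, would be:

```latex
RGB_t(h) = \frac{1}{\lvert P_t(h) \rvert} \sum_{p \in P_t(h)} RGB(p),
\qquad
C_t(h) = \lvert P_t(h) \rvert
```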

3. Bodyprints

3.1 Extraction

We obtain bodyprints by averaging the

temporal signatures along time:

where RGB_k(h) is the bodyprint vector, which

describes the appearance of the person, and
C_k(h) is the count vector, which measures
the reliability of the bodyprint values.
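The averaging formula itself is missing; a count-weighted temporal average consistent with the definitions of RGB_k(h) and C_k(h) would be:

```latex
RGB_k(h) = \frac{\sum_t C_t(h)\, RGB_t(h)}{\sum_t C_t(h)},
\qquad
C_k(h) = \sum_t C_t(h)
```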

3.2 Matching people

To compare bodyprints we propose a

normalized weighted correlation coefficient.

To compare bodyprints j and k, we use a weight

W(h) that allows comparing bodyprints with
missing values (e.g. due to occlusions):

3.2 Matching people

We compute a weighted mean for each

track, which is used to compensate for changes
in brightness, and finally the correlation:
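The exact formulas are not in the transcript; the sketch below implements a weighted Pearson-style correlation over height bins, with the assumption W(h) = min(C_j(h), C_k(h)) — both the weight choice and the function name are illustrative, not taken from the slides:

```python
import numpy as np

def weighted_correlation(bp_j, bp_k, c_j, c_k):
    """Normalised weighted correlation between two bodyprints.

    bp_j, bp_k: (H, 3) mean-RGB-per-height vectors.
    c_j, c_k:   (H,) count (reliability) vectors; zero where missing.
    W(h) = min(C_j(h), C_k(h)) is an assumed weight choice: heights
    missing in either track (e.g. occlusions) get zero weight.
    """
    w = np.minimum(c_j, c_k).astype(float)
    w_sum = w.sum()
    if w_sum == 0:
        return 0.0  # no overlapping heights to compare
    w3 = w[:, None]
    # Subtracting weighted means compensates for global brightness changes.
    mu_j = (w3 * bp_j).sum(axis=0) / w_sum
    mu_k = (w3 * bp_k).sum(axis=0) / w_sum
    dj, dk = bp_j - mu_j, bp_k - mu_k
    cov = (w3 * dj * dk).sum()
    var_j = (w3 * dj * dj).sum()
    var_k = (w3 * dk * dk).sum()
    if var_j == 0 or var_k == 0:
        return 0.0
    return cov / np.sqrt(var_j * var_k)
```

Because the weighted means are subtracted before correlating, a uniform brightness offset between the two tracks leaves the score unchanged, which matches the stated purpose of the weighted mean.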

4. Results and discussion

Experiment 1: people recorded by camera 1

are searched across some videos recorded
by camera 2.

One camera captures people entering a

shop and another captures people at
the exit (front1-front2 and rear1-rear2).
The re-identification performance

obtained is 93%.

4. Results and discussion

Front-front example

Rear-rear example

4. Results and discussion

Example of a wrong match: the correct

match had the second-highest correlation
coefficient, very close to the
highest (0.87345 vs. 0.87212).

4. Results and discussion

Experiment 2: people are re-identified using

the same camera.

The key difference compared with the

previous experiment is that frontal and rear
views are now compared.

The average correct re-identification rate

drops to 55% in this experiment.

4. Results and discussion

Problems in re-identification: the presence of

logos on T-shirts, and backpacks.

5. Conclusions

The method has proved to be robust against

differences in illumination, point of view and
momentary partial occlusions.

Remaining difficulties: similar appearance of two different people,
and different appearance of the same person from the point of
view of each camera.

Future work: more complex models can be used,
e.g. models that take into account the relative angular position
with respect to the person's axis.