
Who is who at different cameras:
people re-identification
using depth cameras

Belén Castillón Fernández de Pinedo

Outline

1. Introduction

2. System description

2.1 Kinect sensor
2.2 Camera calibration
2.3 Height maps
2.4 Segmentation
2.5 Tracking

3. Bodyprints
3.1 Extraction
3.2 Matching people

4. Results and discussion

5. Conclusions

1. Introduction

Goal: obtain a feature vector per person,
called a bodyprint; these vectors can be
matched to solve the re-identification
problem. Bodyprints are obtained using
calibrated depth-colour cameras such as
the Microsoft Kinect.

Main problem: in multi-camera systems it is
difficult to re-identify people who leave one
camera and appear in another, or in the same
one after a period of time.

2. System description

2.1 Kinect sensor

Aligned RGB and depth
images are obtained with the Kinect.

It has a colour RGB camera and an
infrared (IR) camera. It also has
an IR pattern generator that,
together with the IR camera,
is able to determine the depth.

It segments people and is able
to estimate their position.

The sensor has a minimum
distance to measure depth of
around 1 m, and a maximum
distance of about 10 m
(indoor use only).

2.2 Camera calibration

Used to change from
camera coordinates to
world coordinates.

Spatial camera-coordinate
equations are applied
to every pixel.

These equations
provide 3D
coordinates.
This coordinate system has its origin at the
optical centre of the camera and is aligned
with the camera axes.
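The equations are likely the standard pinhole back-projection; the focal lengths f_x, f_y and the principal point (c_x, c_y) are assumed notation, not given in the slides:

```latex
x_{\mathrm{cam}} = \frac{(u - c_x)\, z(u,v)}{f_x}, \qquad
y_{\mathrm{cam}} = \frac{(v - c_y)\, z(u,v)}{f_y}, \qquad
z_{\mathrm{cam}} = z(u,v)
```

where z(u,v) is the depth measured by the sensor at pixel (u, v).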

2.2 Camera calibration

The z_world coordinate represents the height.

Transformation between both systems:
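A sketch of this transformation, assuming the usual rigid motion with a rotation matrix R and a translation vector T obtained from calibration:

```latex
\begin{pmatrix} x_{\mathrm{world}} \\ y_{\mathrm{world}} \\ z_{\mathrm{world}} \end{pmatrix}
= R \begin{pmatrix} x_{\mathrm{cam}} \\ y_{\mathrm{cam}} \\ z_{\mathrm{cam}} \end{pmatrix} + T
```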

2.2 Camera calibration (ground)

To determine the ground plane, just select a
portion of the RGB-D image that corresponds
to the ground. The z_world axis will be
normal to this plane.
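This step amounts to a least-squares plane fit; a minimal sketch in Python, assuming the selected ground pixels have already been back-projected to 3D (function and variable names are illustrative):

```python
import numpy as np

def fit_ground_plane(points):
    """Fit a plane to 3D points (N x 3) selected as ground pixels.

    Returns (normal, centroid); the unit normal defines the z_world axis.
    """
    centroid = points.mean(axis=0)
    # The right singular vector with the smallest singular value of the
    # centred points is the least-squares plane normal.
    _, _, vt = np.linalg.svd(points - centroid)
    normal = vt[-1]
    if normal[2] < 0:  # orient the normal to point upwards
        normal = -normal
    return normal, centroid

# Usage: points sampled from the plane z = 0
rng = np.random.default_rng(0)
pts = np.column_stack([rng.uniform(-1.0, 1.0, (100, 2)), np.zeros(100)])
normal, centroid = fit_ground_plane(pts)
```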

2.3 Height maps

Virtual aerial view of the scene.

They are images whose pixel values
represent the height with respect to the
ground.

They make segmentation easier.
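A minimal sketch of building such a height map from 3D world points, assuming a fixed cell size and scene extent (all parameter values here are illustrative assumptions):

```python
import numpy as np

def height_map(points, cell=0.05, x_range=(0.0, 2.0), y_range=(0.0, 2.0)):
    """Build a virtual aerial view: each cell keeps the maximum height
    (z_world) of the 3D points that fall into it.

    `points` is an (N, 3) array in world coordinates; `cell` is the
    cell size in metres.
    """
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    hmap = np.zeros((ny, nx))
    ix = ((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) / cell).astype(int)
    ok = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    for x, y, z in zip(ix[ok], iy[ok], points[ok, 2]):
        hmap[y, x] = max(hmap[y, x], z)  # keep the tallest point per cell
    return hmap

# A single point 1.7 m high at (1, 1) shows up in the aerial view
hmap = height_map(np.array([[1.0, 1.0, 1.7]]))
```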

2.4 Segmentation

Segmentation tells us which pixels in the
original image belong to each particular
person.

2.5 Tracking

Tracking is the process of linking the
segmentation results from several frames.

A track denotes a thread of linked objects
corresponding to the same person.
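The slides do not specify the linking method; a minimal sketch using greedy nearest-neighbour association between consecutive frames (the detections-as-ground-positions representation and the `max_dist` threshold are assumptions):

```python
import math

def link_tracks(frames, max_dist=0.5):
    """Link per-frame detections into tracks.

    `frames` is a list of lists of (x, y) ground positions; each track
    is a list of (frame_index, position) pairs.
    """
    tracks = []
    for t, detections in enumerate(frames):
        unused = list(detections)
        for track in tracks:
            last_t, last_pos = track[-1]
            if last_t != t - 1 or not unused:
                continue  # only extend tracks seen in the previous frame
            # pick the closest detection to the track's last position
            best = min(unused, key=lambda p: math.dist(p, last_pos))
            if math.dist(best, last_pos) <= max_dist:
                track.append((t, best))
                unused.remove(best)
        for det in unused:  # unmatched detections start new tracks
            tracks.append([(t, det)])
    return tracks

# Two people moving in parallel produce two separate tracks
frames = [[(0.0, 0.0), (2.0, 0.0)], [(0.1, 0.0), (2.1, 0.0)]]
tracks = link_tracks(frames)
```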

3. Bodyprints

Key idea: to match people, we extract a feature
vector per track, which we call a bodyprint.
Each bodyprint summarises the colour
appearance at different heights for a track.

Algorithm: height is discretised in steps of
2 cm.

At time t, the mean RGB value at each given
height is computed to obtain the temporal
signatures.
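The step above can be sketched as follows, assuming each person's pixels come with world coordinates and RGB values (the function name and the 2 m height cap are illustrative):

```python
import numpy as np

def temporal_signature(points, colours, step=0.02, max_h=2.0):
    """Mean RGB per 2 cm height bin for one person at one time t.

    `points`: (N, 3) world coordinates of the person's pixels;
    `colours`: (N, 3) RGB values. Returns (signature, counts), where
    counts[h] is the number of pixels contributing to each bin.
    """
    bins = int(max_h / step)
    sig = np.zeros((bins, 3))
    cnt = np.zeros(bins)
    h = np.clip((points[:, 2] / step).astype(int), 0, bins - 1)
    for i, b in enumerate(h):
        sig[b] += colours[i]
        cnt[b] += 1
    nonzero = cnt > 0
    sig[nonzero] /= cnt[nonzero][:, None]  # mean RGB per occupied bin
    return sig, cnt

# Two pixels at ~1 m height fall into the same 2 cm bin (index 50)
pts = np.array([[0.0, 0.0, 1.005], [0.0, 0.0, 1.015]])
cols = np.array([[255.0, 0.0, 0.0], [0.0, 0.0, 255.0]])
sig, cnt = temporal_signature(pts, cols)
```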

3. Bodyprints

3.1 Extraction

We obtain bodyprints by averaging the
temporal signatures over time:

The bodyprint vector is RGB_k(h); it
describes the appearance of the person.
The count vector is C_k(h) and measures
the reliability of the values of the bodyprint.
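Reconstructed from the description above (the notation RGB_{k,t}(h) for the temporal signature of track k at time t is assumed), the averaging is:

```latex
RGB_k(h) = \frac{1}{C_k(h)} \sum_{t \,:\, h \text{ observed}} RGB_{k,t}(h)
```

where C_k(h) counts the frames in which height h was observed for track k.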

3.2 Matching people

To compare bodyprints we propose a
normalized weighted correlation coefficient.

To compare bodyprints j and k, we use a
weight W(h) that allows us to compare
bodyprints with missing values (e.g. due to
occlusions):

3.2 Matching people

We compute a weighted mean for each
track, which is used to compensate for
changes in brightness, and finally the
correlation:
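A sketch of the comparison under stated assumptions: the slides omit the exact formula, so here W(h) is taken as the minimum of the two count vectors and the weighted mean is subtracted per colour channel:

```python
import numpy as np

def match_score(sig_j, cnt_j, sig_k, cnt_k):
    """Normalized weighted correlation between two bodyprints.

    sig_*: (bins, 3) mean RGB per height bin; cnt_*: (bins,) counts.
    W(h) = min(C_j(h), C_k(h)), so bins missing in either bodyprint
    (e.g. occlusions) get zero weight; subtracting the weighted mean
    compensates global brightness changes.
    """
    w = np.minimum(cnt_j, cnt_k)[:, None]      # (bins, 1) weights
    mu_j = (w * sig_j).sum(0) / w.sum()        # weighted mean per channel
    mu_k = (w * sig_k).sum(0) / w.sum()
    dj, dk = sig_j - mu_j, sig_k - mu_k
    num = (w * dj * dk).sum()
    den = np.sqrt((w * dj ** 2).sum() * (w * dk ** 2).sum())
    return num / den

# Identical bodyprints give a perfect score of 1
rng = np.random.default_rng(1)
sig = rng.uniform(0.0, 255.0, (100, 3))
cnt = np.ones(100)
score = match_score(sig, cnt, sig, cnt)
```

Because the weighted mean is removed, a uniform brightness shift of one bodyprint leaves the score unchanged.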

4. Results and discussion

Experiment 1: people recorded by camera 1
are searched across videos recorded by
camera 2.

One camera captures people entering a
shop and another captures people at the
exit (front1-front2 and rear1-rear2 views).

The re-identification performance obtained
is 93%.

4. Results and discussion

Front-front example

Rear-rear example

4. Results and discussion

Example of a wrong match: the correct
match had the second-highest correlation
coefficient, which was very close to the
highest (0.87212 vs. 0.87345).

4. Results and discussion

Experiment 2: people are re-identified using
the same camera.

The key difference from the previous
experiment is that frontal and rear views
are now compared.

The average correct re-identification rate
obtained in this experiment drops to 55%.

4. Results and discussion

Problems in re-identification: presence of
logos on T-shirts, and backpacks.

5. Conclusions

The method has proved to be robust against
differences in illumination, point of view and
momentary partial occlusions.

Errors:
Similar appearance of two different people.
Different appearance of the same person from the point of
view of each camera.

Solutions:
More complex models can be used.
Models that take into account the relative angular position
with respect to the person's axis.