DOI 10.1007/s00138-008-0174-7
ORIGINAL PAPER
Received: 7 December 2006 / Accepted: 15 September 2008 / Published online: 23 October 2008
© Springer-Verlag 2008
123
428 L. Brèthes et al.
no assumption on the probability distributions entailed in the characterization of the problem, and they enable an easy combination/fusion of diverse kinds of measurements.

Particle filters represent the posterior distribution by a set of samples, or particles, with associated importance weights. This weighted particle set is first drawn from the initial probability distribution of the state vector, and is then updated over time by taking into account the measurements together with prior knowledge of the system dynamics and observation models.

In the Computer Vision community, the formalism was pioneered in the seminal paper [20] by Isard and Blake, which coined the term CONDENSATION. In this scheme, the particles are drawn from the dynamics and weighted by their likelihood w.r.t. the measurement. CONDENSATION is shown to outperform the Kalman filter in the presence of background clutter.

Following the CONDENSATION algorithm, various improvements and extensions have been proposed for visual tracking. Isard et al. in [22] introduce a mixed-state CONDENSATION tracker in order to perform multiple-model tracking. The same authors propose in [21] another extension, named ICONDENSATION, which first introduced importance sampling into visual tracking. It constitutes a mathematically principled way of directing search by combining the dynamics and the measurements, so that the tracker can take advantage of the distinct qualities of the information sources and re-initialize automatically when temporary failures occur. Particle filtering with history sampling is proposed as a variant in [37]. Rui and Chen in [34] introduce the Unscented Particle Filter (UPF) into audio and visual tracking. The UPF uses the Unscented Kalman filter to generate proposal distributions that seamlessly integrate the current observation. Partitioned sampling, introduced by MacCormick and Isard in [27], is another way of applying particle filters to tracking problems with high-dimensional configuration spaces. This algorithm is shown to be well suited to tracking articulated objects [28]. The hierarchical strategy [33] constitutes a generalization. Last, though outside the scope of this paper, particle filters have also become a popular tool to perform simultaneous tracking of multiple persons [27,42].

As mentioned before, the literature proposes numerous particle filtering algorithms, yet few studies comparing the efficiency of these filtering strategies have been carried out. When such comparisons are made, the results are mainly measured against those of the original CONDENSATION approach [26,34,37].

Another observation concerns data fusion. It can be argued that data fusion using particle filtering schemes has been fairly seldom exploited in this visual tracking context. Numerous visual trackers in the literature consider a single cue, i.e. contours [20,28,34] or color [30,32]. The association of multiple cues has often been confined to contours and color [8,21,37,47], or color and motion [6,10,33,43].

This data fusion problem has been extensively tackled by Pérez et al. in [33]. The authors propose a hierarchical particle filtering algorithm which successively takes the measurements into account so as to draw the particles efficiently. In our view, using multiple cues simultaneously, both in the importance and measurement functions, not only allows the use of complementary and redundant information but also enables more robust failure detection and recovery. More globally, other existing particle filtering strategies should also be evaluated in order to check which ones best fulfill the requirements of the envisaged application.

From these considerations, a first contribution of this paper relates to visual data fusion in robotics scenarios covering a wide variety of environmental conditions. A large spectrum of plausible multi-cue associations for such a context is depicted. Evaluations are then performed in order to exhibit the most meaningful visual cue associations in terms of discriminative power, robustness to artifacts and time consumption, whether these cues are involved in the particle filter importance or measurement functions. A second contribution concerns a thorough comparison of the various particle filtering strategies for data fusion dedicated to the applications envisaged here. Experiments are presented in which the efficiency of the designed trackers is evaluated with respect to temporary target occlusion, presence of significant clutter, as well as large variations in the target appearance and in the illumination of the environment. These trackers have been integrated on a tour-guide robot named Rackham, whose role is to help people attending an exhibition.

The paper is organized as follows. Section 2 describes Rackham and outlines the embedded visual trackers. Section 3 sums up the well-known particle filtering formalism and reviews some variants which enable data fusion for tracking. Then, Sect. 4 specifies some visual measurements which rely on the shape, color or image motion of the observed target. A study comparing the efficiency of various particle filtering strategies is carried out in Sect. 5. Section 6 reports on the implementation of these modalities on Rackham. Last, Sect. 7 summarizes our contribution and puts forward some future extensions.

2 Rackham and the tour-guide scenario

Rackham is an iRobot B21r mobile platform whose standard equipment has been extended with one EVI-D70 pan-tilt camera dedicated to H/R interaction, one digital camera for robot localization, one ELO touch-screen, a pair of loudspeakers, an optical fiber gyroscope and wireless Ethernet (Fig. 1).
Fig. 3 The robot visual modalities: a search for interaction, b proximal interaction, c guidance mission

These trackers involve the EVI-D70 camera, whose characteristics are: image resolution 320×240 pixels, retina dimension 1/2″, and focal length 4.1 mm.

3 Particle filtering algorithms for data fusion

3.1 A generic algorithm

Particle filters are sequential Monte Carlo simulation methods for the state vector estimation of any Markovian dynamic system subject to possibly non-Gaussian random inputs [3,13,14]. Their aim is to recursively approximate the posterior probability density function (pdf) p(x_k|z_{1:k}) of the state vector x_k at time k conditioned on the set of measurements z_{1:k} = z_1, …, z_k. A linear point-mass combination

p(x_k|z_{1:k}) ≈ ∑_{i=1}^N w_k^{(i)} δ(x_k − x_k^{(i)}), ∑_{i=1}^N w_k^{(i)} = 1, (1)

is determined—with δ(·) the Dirac distribution—which expresses the selection of a value, or "particle", x_k^{(i)} with probability, or "weight", w_k^{(i)}, i = 1, …, N. An approximation of the conditional expectation of any function of x_k, such as the minimum mean square error (MMSE) estimate E_{p(x_k|z_{1:k})}[x_k], then follows.

Let the system be fully described by the prior p(x_0), the dynamics pdf p(x_k|x_{k−1}) and the observation pdf p(z_k|x_k). The generic particle filtering algorithm, or "Sampling Importance Resampling" (SIR), is shown in Table 1. Its initialization consists of an independent identically distributed (i.i.d.) sequence drawn from p(x_0). At each further time k, the particles evolve stochastically, being sampled from an importance function q(x_k|x_{k−1}^{(i)}, z_k) which aims at adaptively exploring "relevant" areas of the state space. They are then suitably weighted so as to guarantee the consistency of the approximation (1). To this end, step 5 assigns each particle x_k^{(i)} a weight w_k^{(i)} involving its likelihood p(z_k|x_k^{(i)}) w.r.t. the measurement z_k as well as the values at x_k^{(i)} of the importance function and dynamics pdf.

In order to limit the degeneracy phenomenon—whereby, whatever the sequential Monte Carlo simulation method, after a few instants all but one of the particle weights tend to zero—step 8 inserts a resampling stage, e.g. the so-called "systematic resampling" defined in [25]. There, the particles associated with high weights are duplicated while the others collapse, so that the resulting sequence x_k^{(s^{(1)})}, …, x_k^{(s^{(N)})} is i.i.d. according to ∑_{i=1}^N w_k^{(i)} δ(x_k − x_k^{(i)}). Note that this resampling stage should preferably be fired only when the filter efficiency, related to the number of "useful" particles, goes beneath a predefined threshold [14].

3.2 Importance sampling from either dynamics or measurements: basic strategies

CONDENSATION—for "Conditional Density Propagation" [20]—can be viewed as the instance of the SIR algorithm in which the particles are drawn according to the system dynamics, viz. when q(x_k|x_{k−1}^{(i)}, z_k) = p(x_k|x_{k−1}^{(i)}). This endows CONDENSATION with a prediction-update structure, in that ∑_{i=1}^N w_{k−1}^{(i)} δ(x_k − x_k^{(i)}) approximates the prior p(x_k|z_{1:k−1}). The weighting stage becomes w_k^{(i)} ∝ w_{k−1}^{(i)} p(z_k|x_k^{(i)}). In the visual tracking context, the original algorithm [20] defines the particle likelihoods from contour primitives, yet other visual cues have also been exploited [33].

Resampling by itself cannot efficiently limit the degeneracy phenomenon. In addition, it may lead to a loss of diversity in the state space exploration. The importance function must thus be defined with special care.

In visual tracking, the modes of the likelihoods p(z_k|x_k), though multiple, are generally pronounced. As CONDENSATION draws the particles x_k^{(i)} from the system dynamics but "blindly" w.r.t. the measurement z_k, many of these may well be assigned a low likelihood p(z_k|x_k^{(i)}) and thus a low weight in step 5, significantly worsening the overall filter performance. An alternative, henceforth labeled "Measurement-based SIR" (MSIR), merely consists in sampling the particles at time k—or just some of their entries—according to an importance function π(x_k|z_k) defined from the current image. The first MSIR strategy was ICONDENSATION [21], which guides the state space exploration by a color blob detector. Other visual detection functionalities can be used as well, e.g. a face detector (Sect. 4), or any other intermittent primitive which, despite its sporadicity, is very discriminant when present [33]: motion, sound, etc.

In an MSIR scheme, a particle x_k^{(i)} whose entries are drawn from the current image may be inconsistent with its predecessor x_{k−1}^{(i)} from the point of view of the state dynamics.

Table 1 The generic particle filtering algorithm, or "Sampling Importance Resampling" (SIR)

1: IF k = 0, THEN draw x_0^{(1)}, …, x_0^{(N)} i.i.d. according to p(x_0), and set w_0^{(i)} = 1/N END IF
2: IF k ≥ 1 THEN {—[{x_{k−1}^{(i)}, w_{k−1}^{(i)}}]_{i=1}^N being a particle description of p(x_{k−1}|z_{1:k−1})—}
3:   FOR i = 1, …, N, DO
4:     "Propagate" the particle x_{k−1}^{(i)} by independently sampling x_k^{(i)} ∼ q(x_k|x_{k−1}^{(i)}, z_k)
5:     Update the weight w_k^{(i)} associated with x_k^{(i)} according to w_k^{(i)} ∝ w_{k−1}^{(i)} [p(z_k|x_k^{(i)}) p(x_k^{(i)}|x_{k−1}^{(i)})] / q(x_k^{(i)}|x_{k−1}^{(i)}, z_k), prior to a normalization step s.t. ∑_i w_k^{(i)} = 1
6:   END FOR
7:   Compute the conditional mean of any function of x_k, e.g. the MMSE estimate E_{p(x_k|z_{1:k})}[x_k], from the approximation ∑_{i=1}^N w_k^{(i)} δ(x_k − x_k^{(i)}) of the posterior p(x_k|z_{1:k})
8:   At any time, or depending on an "efficiency" criterion, resample the description [{x_k^{(i)}, w_k^{(i)}}]_{i=1}^N of p(x_k|z_{1:k}) into the equivalent evenly weighted particle set [{x_k^{(s^{(i)})}, 1/N}]_{i=1}^N, by sampling in {1, …, N} the indexes s^{(1)}, …, s^{(N)} according to P(s^{(i)} = j) = w_k^{(j)}; set x_k^{(i)} and w_k^{(i)} to x_k^{(s^{(i)})} and 1/N
9: END IF
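As a minimal illustration of the SIR recursion of Table 1, the sketch below performs steps 3–8 for one time instant. This is not the authors' implementation: the model callables (`propose`, `dynamics_pdf`, `likelihood`, `proposal_pdf`) are hypothetical placeholders for problem-specific densities, and resampling uses plain multinomial draws, fired on an effective-sample-size criterion [14], rather than the systematic scheme of [25].

```python
import numpy as np

def sir_step(particles, weights, propose, dynamics_pdf, likelihood,
             proposal_pdf, z, rng, resample_fraction=0.5):
    """One SIR iteration (Table 1, steps 3-8) over an (N, d) particle array."""
    n = len(particles)
    # Step 4: propagate each particle through the importance function q.
    new_particles = np.array([propose(xp, z, rng) for xp in particles])
    # Step 5: w ∝ w_{k-1} p(z|x) p(x|x_{k-1}) / q(x|x_{k-1}, z), then normalize.
    new_weights = weights * np.array(
        [likelihood(z, x) * dynamics_pdf(x, xp) / proposal_pdf(x, xp, z)
         for x, xp in zip(new_particles, particles)])
    new_weights = new_weights / new_weights.sum()
    # Step 7: MMSE estimate = weighted mean of the particles.
    estimate = np.sum(new_weights[:, None] * new_particles, axis=0)
    # Step 8: resample only when the effective sample size gets too low.
    if 1.0 / np.sum(new_weights ** 2) < resample_fraction * n:
        idx = rng.choice(n, size=n, p=new_weights)
        new_particles = new_particles[idx]
        new_weights = np.full(n, 1.0 / n)
    return new_particles, new_weights, estimate
```

With `propose` drawing from the dynamics and `proposal_pdf` equal to `dynamics_pdf`, the weight update reduces to the likelihood and the scheme degenerates to CONDENSATION (Sect. 3.2).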
As expected, the smaller the value p(x_k^{(i)}|x_{k−1}^{(i)}), the smaller the weight w_k^{(i)}. One solution to this problem, as proposed in the genuine ICONDENSATION algorithm, consists in sampling some of the particles from the dynamics and some w.r.t. the prior, so that the importance function reads as, with α, β ∈ [0; 1],

q(x_k|x_{k−1}^{(i)}, z_k) = α π(x_k|z_k) + β p(x_k|x_{k−1}^{(i)}) + (1 − α − β) p_0(x_k). (2)

This combination enables the tracker to benefit from the distinct qualities of the information sources and to re-initialize automatically when temporary failures occur.

3.3 Towards the "optimal" case: the auxiliary particle filter

It can be shown [14] that the "optimal" recursive scheme, i.e. the one which best limits the degeneracy phenomenon, must define q*(x_k|x_{k−1}^{(i)}, z_k) = p(x_k|x_{k−1}^{(i)}, z_k) and thus w_k^{*(i)} ∝ w_{k−1}^{*(i)} p(z_k|x_{k−1}^{(i)}) in the SIR algorithm (Table 1). Each weight w_k^{*(i)} can then be computed before drawing x_k^{(i)}. So, the overall efficiency can be enhanced by resampling the weighted particle set [{x_{k−1}^{(i)}, w_k^{*(i)}}]_{i=1}^N—which in fact represents the smoother pdf p(x_{k−1}|z_{1:k})—just before its "propagation" through the optimal importance function q*(x_k|x_{k−1}^{(i)}, z_k).

Although such an algorithm can seldom be implemented exactly, it can be mimicked by the "Auxiliary Particle Filter" (AUXILIARY_PF) along the lines of [31], see Table 2. Let the importance function π(x_k|x_{k−1}^{(i)}, z_k) = p(x_k|x_{k−1}^{(i)}) be defined in place of q*(x_k|x_{k−1}^{(i)}, z_k), and let p̂(z_k|x_{k−1}^{(i)}) be an approximation of the predictive likelihood p(z_k|x_{k−1}^{(i)}) (steps 3–5); for instance, one can set p̂(z_k|x_{k−1}^{(i)}) = p(z_k|μ_k^{(i)}), where μ_k^{(i)} characterizes the distribution of x_k conditioned on x_{k−1}^{(i)}, e.g. μ_k^{(i)} = E_{p(x_k|x_{k−1}^{(i)})}[x_k] or μ_k^{(i)} ∼ p(x_k|x_{k−1}^{(i)}).

First, an auxiliary weight λ_k^{(i)} ∝ w_{k−1}^{(i)} p̂(z_k|x_{k−1}^{(i)}) is associated with each particle x_{k−1}^{(i)}. The approximation [{x_{k−1}^{(i)}, λ_k^{(i)}}]_{i=1}^N of p(x_{k−1}|z_{1:k}) is then resampled into [{x_{k−1}^{(s^{(i)})}, 1/N}]_{i=1}^N (step 6), prior to its propagation until time k through π(x_k|x_{k−1}^{(s^{(i)})}, z_k) (step 8). Finally, the weights of the resulting particles x_k^{(i)} must be corrected (step 9) in order to take account of the "distance" between λ_k^{(i)} and w_k^{*(i)}, as well as of the dissimilarity between the selected and optimal importance functions π(x_k|x_{k−1}^{(s^{(i)})}, z_k) and p(x_k|x_{k−1}^{(s^{(i)})}, z_k). The particle cloud can thus be steered towards relevant areas of the state space. In the visual tracking context, the approximate predictive likelihood can rely on visual cues distinct from those involved in the computation of the "final-stage" likelihoods p(z_k|x_k^{(i)}).

The main limitation of the AUXILIARY_PF algorithm is its poor performance when the dynamics is uninformative compared to the final-stage likelihood, e.g. when the dynamics is very noisy or when the observation density has sharp modes. In that case, μ_k^{(i)} being a bad characterization of p(x_k|x_{k−1}^{(i)}), the smoother pdf p(x_{k−1}|z_{1:k}) is poorly approximated by [{x_{k−1}^{(i)}, λ_k^{(i)}}]_{i=1}^N. The resampling stage in step 6 of Table 2 may well eliminate some particles which, once propagated through the dynamics, would be very likely w.r.t. z_k. At the same time, other particles may well be duplicated which, after the prediction step, come to lie in the tails of the final-stage likelihood.

In the framework of auxiliary particle filters, the Unscented Transform [24] can constitute a way to define a better approximation p̂(z_k|x_{k−1}) of the predictive likelihood p(z_k|x_{k−1}), which is the basis of the auxiliary resampling stage, see steps 3–6 of Table 2. As is the case in the Unscented Particle Filter [39], this transform can also be involved in associating to each particle the Gaussian near-optimal importance function from which it is sampled. Andrieu et al. propose such a strategy in [2].

Table 2 Auxiliary particle filtering (AUXILIARY_PF)

1: IF k = 0, THEN draw x_0^{(1)}, …, x_0^{(N)} i.i.d. according to p(x_0), and set w_0^{(i)} = 1/N END IF
2: IF k ≥ 1 THEN {—[{x_{k−1}^{(i)}, w_{k−1}^{(i)}}]_{i=1}^N being a particle description of p(x_{k−1}|z_{1:k−1})—}
3:   FOR i = 1, …, N, DO
4:     From the approximation p̂(z_k|x_{k−1}^{(i)}) = p(z_k|μ_k^{(i)})—e.g. with μ_k^{(i)} ∼ p(x_k|x_{k−1}^{(i)}) or μ_k^{(i)} = E_{p(x_k|x_{k−1}^{(i)})}[x_k]—compute the auxiliary weights λ_k^{(i)} ∝ w_{k−1}^{(i)} p̂(z_k|x_{k−1}^{(i)}), prior to a normalization step s.t. ∑_i λ_k^{(i)} = 1
5:   END FOR
6:   Resample [{x_{k−1}^{(i)}, λ_k^{(i)}}]_{i=1}^N—or, equivalently, sample in {1, …, N} the indexes s^{(1)}, …, s^{(N)} of the particles at time k − 1 according to P(s^{(i)} = j) = λ_k^{(j)}—in order to get [{x_{k−1}^{(s^{(i)})}, 1/N}]_{i=1}^N; both ∑_{i=1}^N λ_k^{(i)} δ(x_{k−1} − x_{k−1}^{(i)}) and (1/N) ∑_{i=1}^N δ(x_{k−1} − x_{k−1}^{(s^{(i)})}) mimic p(x_{k−1}|z_{1:k})
7:   FOR i = 1, …, N, DO
8:     "Propagate" the particles by independently drawing x_k^{(i)} ∼ p(x_k|x_{k−1}^{(s^{(i)})})
9:     Update the weights, prior to their normalization, by setting w_k^{(i)} ∝ [p(z_k|x_k^{(i)}) p(x_k^{(i)}|x_{k−1}^{(s^{(i)})})] / [p̂(z_k|x_{k−1}^{(s^{(i)})}) π(x_k^{(i)}|x_{k−1}^{(s^{(i)})}, z_k)] = p(z_k|x_k^{(i)}) / p̂(z_k|x_{k−1}^{(s^{(i)})}) = p(z_k|x_k^{(i)}) / p(z_k|μ_k^{(s^{(i)})})
10:    Compute E_{p(x_k|z_{1:k})}[x_k] from the approximation ∑_{i=1}^N w_k^{(i)} δ(x_k − x_k^{(i)}) of the posterior p(x_k|z_{1:k})
11:  END FOR
12: END IF
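The auxiliary sampling of Table 2 can be sketched as follows, with the predictive likelihood approximated at the point μ_k^{(i)} as in step 4. The callables `dynamics_mean`, `sample_dynamics` and `likelihood` are illustrative placeholders, not the paper's implementation.

```python
import numpy as np

def auxiliary_pf_step(particles, weights, dynamics_mean, sample_dynamics,
                      likelihood, z, rng):
    """One AUXILIARY_PF iteration (Table 2), with p̂(z|x_{k-1}) = p(z|μ),
    μ taken as the mean of the dynamics (step 4)."""
    n = len(particles)
    # Steps 3-5: auxiliary weights λ ∝ w_{k-1} p(z | μ(x_{k-1})).
    mu = np.array([dynamics_mean(xp) for xp in particles])
    lam = weights * np.array([likelihood(z, m) for m in mu])
    lam = lam / lam.sum()
    # Step 6: auxiliary resampling of the time k-1 particles.
    s = rng.choice(n, size=n, p=lam)
    # Step 8: propagate the selected particles through the dynamics.
    new_particles = np.array([sample_dynamics(particles[j], rng) for j in s])
    # Step 9: corrected weights w ∝ p(z|x_k) / p(z|μ(x_{k-1}^{(s)})).
    w = np.array([likelihood(z, x) for x in new_particles]) / \
        np.array([likelihood(z, mu[j]) for j in s])
    w = w / w.sum()
    return new_particles, w
```

The division in step 9 is what penalizes particles whose auxiliary stage was over-optimistic, i.e. whose predicted point μ looked likely but whose propagated state did not.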
Nevertheless, despite its attractiveness and its ability to mimic the optimal case, this scheme is more difficult to implement and shows a higher computational cost.

3.4 Other strategies suited to visual tracking

3.4.1 History sampling

Several interesting particle filtering alternatives for visual tracking are proposed in [37]. One of them considers dynamic models of order greater than or equal to 2, in which the state vector has the form x_k = (u_k′, v_k′, h_k′)′, with [.]′ the transpose operator. The subvector (u_k′, v_k′)′—or "innovation part" of x_k—obeys a stochastic state equation on x_{k−1}, while h_k—called the "history part"—is a deterministic function f(x_{k−1}). It is assumed that the innovations (u_k^{(i)}, v_k^{(i)}) are sampled from an importance function such as q_I(u_k, v_k|x_{k−1}^{(i)}, z_k) = π(u_k|z_k) p(v_k|u_k, x_{k−1}^{(i)})—i.e. the subparticles u_k^{(i)} are positioned from the measurement only, while the v_k^{(i)}'s are drawn by fusing the state dynamics with the knowledge of u_k^{(i)}—and that the pdf of the measurement conditioned on the state satisfies p(z_k|x_k) = p(z_k|u_k, v_k). This context is particularly well suited to visual tracking, for state-space representations of linear AR models entail the above decomposition of the state vector, and because the output equation does not involve its "history part".

The authors define procedures enabling the avoidance of any contradiction between (u_k^{(i)}, v_k^{(i)}) and its past x_{k−1}^{(i)}. Their "Rao-Blackwellized Subspace SIR with History Sampling" (RBSSHSSIR) is summarized in Table 3. Its step 5 consists, for each subparticle u_k^{(i)} drawn from π(u_k|z_k), in the resampling of a predecessor particle—and thus of the "history part" of x_k^{(i)}—which is at the same time likely w.r.t. u_k^{(i)} from the dynamics point of view and assigned a significant weight. The RBSSHSSIR algorithm noticeably differs from ICONDENSATION precisely because of this stage, which is nonetheless necessary for the weighted particles [{x_k^{(i)}, w_k^{(i)}}]_{i=1}^N to remain a consistent description of the posterior p(x_k|z_{1:k}). An original proof of the RBSSHSSIR algorithm is sketched in [9], using arguments similar to those underlying the AUXILIARY_PF. It is shown that the algorithm applies even when the state process is of the first order, by just suppressing the entry f(x_{k−1}) from x_k.

3.4.2 Partitioned and hierarchical sampling

Partitioned and hierarchical sampling can significantly enhance the efficiency of a particle filter in cases when the system dynamics comes as the successive application of elementary dynamics, provided that intermediate likelihoods can be defined on the state vector after applying each partial evolution model. The classical single-stage sampling of the full state space is then replaced by a layered sampling approach: thanks to a succession of sampling operations followed by resamplings based on the intermediate likelihoods, the search can be guided so that each sampling stage refines the output of the previous one.

To outline the technical aspects of each strategy, let ξ_0 = x_{k−1}, ξ_1, …, ξ_{M−1}, ξ_M = x_k be M + 1 "auxiliary vectors" such that the dynamics p(x_k|x_{k−1}) reads as the convolution

p(x_k|x_{k−1}) = ∫ d̃_M(ξ_M|ξ_{M−1}) ⋯ d̃_1(ξ_1|ξ_0) dξ_1 ⋯ dξ_{M−1}, (3)

i.e. the successive application of d̃_1(ξ_1|ξ_0), …, d̃_m(ξ_m|ξ_{m−1}), …, d̃_M(ξ_M|ξ_{M−1}).

Table 3 The "Rao-Blackwellized Subspace SIR with History Sampling" (RBSSHSSIR) algorithm

1: IF k = 0, THEN draw x_0^{(1)}, …, x_0^{(N)} i.i.d. according to p(x_0), and set w_0^{(i)} = 1/N END IF
2: IF k ≥ 1 THEN {—[{x_{k−1}^{(i)}, w_{k−1}^{(i)}}]_{i=1}^N being a particle description of p(x_{k−1}|z_{1:k−1})—}
3:   FOR i = 1, …, N, DO
4:     Draw u_k^{(i)} ∼ π(u_k|z_k)
5:     Sample in {1, …, N} the index I_k^{(i)} of the predecessor particle of u_k^{(i)} according to the weights (w_{k−1}^{(1)} p(u_k^{(i)}|x_{k−1}^{(1)}), …, w_{k−1}^{(N)} p(u_k^{(i)}|x_{k−1}^{(N)})), i.e. according to P(I_k^{(i)} = j) = w_{k−1}^{(j)} p(u_k^{(i)}|x_{k−1}^{(j)}) / ∑_{l=1}^N w_{k−1}^{(l)} p(u_k^{(i)}|x_{k−1}^{(l)})
6:     Draw v_k^{(i)} ∼ p(v_k|u_k^{(i)}, x_{k−1}^{(I_k^{(i)})})
7:     Set x_k^{(i)} = (u_k^{(i)′}, v_k^{(i)′}, f(x_{k−1}^{(I_k^{(i)})})′)′
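Step 5 of Table 3—the history-sampling stage that reconciles each measurement-drawn subparticle u_k^{(i)} with a plausible predecessor—can be sketched as below. `innovation_pdf`, evaluating p(u_k|x_{k−1}), is a hypothetical placeholder for the model-specific density.

```python
import numpy as np

def sample_predecessors(u_samples, prev_particles, prev_weights,
                        innovation_pdf, rng):
    """Step 5 of RBSSHSSIR (Table 3): for each subparticle u_k^{(i)} drawn
    from π(u_k|z_k), sample a predecessor index I_k^{(i)} with probability
    proportional to w_{k-1}^{(j)} p(u_k^{(i)} | x_{k-1}^{(j)})."""
    n = len(prev_particles)
    indices = np.empty(len(u_samples), dtype=int)
    for i, u in enumerate(u_samples):
        probs = prev_weights * np.array(
            [innovation_pdf(u, xp) for xp in prev_particles])
        probs = probs / probs.sum()
        indices[i] = rng.choice(n, p=probs)
    return indices
```

Each selected index then supplies both the conditioning state for drawing v_k^{(i)} (step 6) and the history part f(x_{k−1}^{(I_k^{(i)})}) (step 7).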
The measurement pdf p(z_k|x_k) is supposed to factorize as p(z_k|x_k) = ∏_{m=1}^M p_m(z_k|x_k).

The second partitioned particle filtering algorithm proposed in [28] assumes that the state vector dynamics are component-wise independent—i.e., if all the vectors ξ_m, m = 1, …, M, are analogously partitioned into M subvectors ξ_m^1, …, ξ_m^M, then d̃_m(ξ_m|ξ_{m−1}) = p(ξ_m^m|ξ_{m−1}^m)—and the intermediate likelihoods p_m(z_k|x_k) are supposed to concern a subset of the state vector all the more important as m → M, i.e. to have the form p_m(z_k|x_k) = l_m(z_k|x_k^1, …, x_k^m). Under these hypotheses, the partitioned particle filter follows the algorithm outlined in Table 4, with q̃_m(ξ_m|ξ_{m−1}, z_k) = d̃_m(ξ_m|ξ_{m−1}) and p_m(z_k|x_k) = l_m(z_k|x_k^1, …, x_k^m).

Partitioned sampling has been successfully applied to the visual tracking of an open kinematic chain in [28], by organizing the state vector so that its first entries depict the top elements of the chain—to be accurately positioned sooner for a higher efficiency—while its last components are related to the extremities. A branched algorithm has also proved able to track multiple persons in [27].

The hierarchical particle filter developed in [33] can be viewed as a generalization of the partitioned scheme outlined above. No restriction is imposed on the functions d̃_1(·|·), …, d̃_M(·|·). The measurement z_k is supposed to be made up of M sensory measurements z_k^1, …, z_k^M that are conditionally independent given x_k, so that the intermediate likelihoods come as p_m(z_k|x_k) = p_m(z_k^m|x_k). Importantly, the particles relative to the auxiliary vectors ξ_1, …, ξ_M are not sampled from d̃_1(·|·), …, d̃_M(·|·) but instead from distributions q̃_1(·|·), …, q̃_M(·|·) related to the importance function q(x_k|x_{k−1}, z_k) by q(x_k|x_{k−1}, z_k) = ∫ q̃_M(x_k|ξ_{M−1}, z_k^M) ⋯ q̃_1(ξ_1|x_{k−1}, z_k^1) dξ_1 ⋯ dξ_{M−1}. Incorporating each likelihood p_m(z_k|·) after applying the intermediate dynamics leads to the algorithm depicted in Table 4, with q̃_m(ξ_m|ξ_{m−1}, z_k) = q̃_m(ξ_m|ξ_{m−1}, z_k^m) and p_m(z_k|x_k) = p_m(z_k^m|x_k).

Importance sampling offers a mathematically principled way of directing search according to visual cues which are discriminant though possibly intermittent, e.g. motion. Such cues are logical candidates for detection modules and efficient proposal distributions. Besides, each sample weight is updated taking into account its likelihood w.r.t. the current image. This likelihood is computed by means of measurement functions, according to visual cues (e.g. color, shape) which must be persistent but may be more prone to ambiguity in cluttered scenes. In both the importance sampling and weight update steps, combining or fusing multiple cues enables the tracker to better benefit from distinct information sources, and can decrease its sensitivity to temporary failures in some of the measurement processes. Measurement and importance functions are depicted in the next subsections.

4.1 Measurement functions

4.1.1 Shape cue

The use of shape cues requires that silhouette templates of human limbs have been learnt beforehand (Fig. 4).

Table 4 Partitioned (PARTITIONED_PF) and hierarchical (HIERARCHICAL_PF) particle filtering: q̃_m(ξ_m|ξ_{m−1}, z_k) and p_m(z_k|x_k) are defined either as (PARTITIONED_PF) q̃_m(ξ_m|ξ_{m−1}, z_k) = d̃_m(ξ_m|ξ_{m−1}), p_m(z_k|x_k) = l_m(z_k|x_k^1, …, x_k^m), or as (HIERARCHICAL_PF) q̃_m(ξ_m|ξ_{m−1}, z_k) = q̃_m(ξ_m|ξ_{m−1}, z_k^m), p_m(z_k|x_k) = p_m(z_k^m|x_k)

[{x_k^{(i)}, w_k^{(i)}}]_{i=1}^N = PARTITIONED_OR_HIERARCHICAL_PF([{x_{k−1}^{(i)}, w_{k−1}^{(i)}}]_{i=1}^N, z_k)
1: IF k = 0, THEN draw x_0^{(1)}, …, x_0^{(N)} i.i.d. according to p(x_0), and set w_0^{(i)} = 1/N END IF
2: IF k ≥ 1 THEN {—[{x_{k−1}^{(i)}, w_{k−1}^{(i)}}]_{i=1}^N being a particle description of p(x_{k−1}|z_{1:k−1})—}
3:   Set {ξ_0^{(i)}, τ_0^{(i)}} = {x_{k−1}^{(i)}, w_{k−1}^{(i)}}
4:   FOR m = 1, …, M, DO
5:     FOR i = 1, …, N, DO independently sample ξ_m^{(i)} ∼ q̃_m(ξ_m|ξ_{m−1}^{(i)}, z_k) and associate ξ_m^{(i)} with the weight τ_m^{(i)} ∝ τ_{m−1}^{(i)} [p_m(z_k|ξ_m^{(i)}) d̃_m(ξ_m^{(i)}|ξ_{m−1}^{(i)})] / q̃_m(ξ_m^{(i)}|ξ_{m−1}^{(i)}, z_k) END FOR
6:     Resample [{ξ_m^{(i)}, τ_m^{(i)}}]_{i=1}^N into the evenly weighted particle set [{ξ_m^{(s^{(i)})}, 1/N}]_{i=1}^N; rename [{ξ_m^{(s^{(i)})}, 1/N}]_{i=1}^N into [{ξ_m^{(i)}, τ_m^{(i)}}]_{i=1}^N
7:   END FOR
8:   Set {x_k^{(i)}, w_k^{(i)}} = {ξ_M^{(i)}, τ_M^{(i)}}, which is a consistent description of p(x_k|z_{1:k})
(…): END IF
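The layered sampling loop of Table 4 (steps 4–7) can be sketched as follows for the PARTITIONED_PF setting, where each proposal equals the partial dynamics d̃_m, so the weight update of step 5 reduces to the intermediate likelihood. `stages` pairs hypothetical partial-dynamics samplers with intermediate likelihoods; neither is taken from the paper.

```python
import numpy as np

def layered_step(particles, weights, stages, z, rng):
    """Layered sampling (Table 4, steps 4-7): each stage applies an
    elementary dynamics and an intermediate likelihood, then resamples,
    so that every stage refines the output of the previous one."""
    n = len(particles)
    for sample_dyn, intermediate_lik in stages:      # m = 1, ..., M
        # Step 5: sample ξ_m from the partial dynamics and reweight by
        # the intermediate likelihood (proposal = d̃_m cancels in the ratio).
        particles = np.array([sample_dyn(xp, rng) for xp in particles])
        weights = weights * np.array(
            [intermediate_lik(z, x) for x in particles])
        weights = weights / weights.sum()
        # Step 6: resample into an evenly weighted set.
        idx = rng.choice(n, size=n, p=weights)
        particles = particles[idx]
        weights = np.full(n, 1.0 / n)
    return particles, weights
```

With stage-specific measurements z_k^m and proposals q̃_m, the same loop yields the HIERARCHICAL_PF variant, the weight update then keeping the full ratio of step 5.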
Each particle x is classically given an edge-based likelihood p(z^S|x) that depends on the sum of the squared distances between N_p points uniformly distributed along the template corresponding to x and their nearest image edges [20], i.e.

p(z^S|x) ∝ exp(−D²/(2σ_s²)), D = ∑_{j=1}^{N_p} |x(j) − z(j)|, (4)

where the similarity measure D involves each jth template point x(j) and its associated closest edge z(j) in the image, the standard deviation σ_s being determined a priori.

A variant [17] consists in converting the edge image into a Distance Transform (DT) image. Interestingly, the DT is a smoother function of the model parameters. In addition, the DT image reduces the involved computations, as it needs to be generated only once whatever the number of particles involved in the filter. The similarity distance D in (4) is replaced by

D = ∑_{j=1}^{N_p} I_DT(j), (5)

where I_DT(j) denotes the DT image value at the jth template point. Figure 5 plots this shape-based likelihood for an example where the target is a 2D elliptical template corresponding coarsely to the subject on the right of the input image. In case of a cluttered background, using only shape cues for the model-to-image fitting is not sufficiently discriminant, as multiple peaks are present.

4.1.2 Color cue

Reference color models can be associated with the targeted ROIs. These models are defined either a priori or online using some automatic detection modules. Let h^c_ref and h^c_x be two N_bi-bin normalized histograms in channel c ∈ {R, G, B}, respectively corresponding to the model and to a region B_x parametrized by the state x. The color likelihood p(z^C|x) must favor candidate color histograms h^c_x close to the reference histogram h^c_ref. The likelihood has a form similar to (4), provided that D denotes the Bhattacharyya distance [30] between the two histograms h^c_x and h^c_ref, i.e. for a channel c,

D(h^c_x, h^c_ref) = (1 − ∑_{j=1}^{N_bi} √(h^c_{j,x} · h^c_{j,ref}))^{1/2}. (6)

A single histogram does not capture any information on the spatial arrangement of colors, and so can lead to noticeable drift. This drift can be avoided by splitting the tracked region into sub-regions with individual reference color models. Let
the union B_x = ∪_{p=1}^{N_R} B_{x,p} be associated with the set of N_R reference histograms {h^c_{ref,p} : c ∈ {R, G, B}, p = 1, …, N_R}. By assuming conditional independence of the color measurements, the likelihood p(z^C|x) becomes

p(z^C|x) ∝ exp(− ∑_c ∑_{p=1}^{N_R} D²(h^c_{x,p}, h^c_{ref,p}) / (2σ_c²)). (7)

Figure 6b and c plot the single- and multi-patch likelihoods for the above example. The ROIs corresponding to the face and clothes of the person on the right are compared to their reference models shown in Fig. 6a.

4.1.3 Motion cue

For a static camera, a basic method consists in computing the absolute luminance difference image from successive frames. To capture motion activity, we propose to embed the frame difference information into a likelihood model similar to the one developed for the color measurements.

Pérez et al. in [33] define a reference histogram model for motion cues. For motionless regions, the measurements fall in the lower histogram bins, while for moving regions they fall a priori in all the histogram bins. From these considerations, the reference motion histogram h^M_ref is given by h^M_{j,ref} = 1/N_bi, j = 1, …, N_bi. The motion likelihood is set to

p(z^M|x) ∝ exp(−D²(h^M_x, h^M_ref)/(2σ_m²)), (8)

and is illustrated in Fig. 7a, b and c.

Fig. 7 a Absolute luminance frame difference. b Motion histograms of two ROIs (top, middle) and of a reference ROI (bottom). c Consequent associated likelihood

4.1.4 Multi-cues fusion

Fusing multiple cues enables the tracker to better benefit from M distinct measurements (z^1, …, z^M). Assuming that these are mutually independent conditioned on the state, the unified measurement function factorizes as

p(z^1, …, z^M|x) = ∏_{m=1}^M p(z^m|x). (9)

Yet, to avoid the evaluation of the likelihood for each cue, we hereafter propose some variants which combine multiple cues into a single likelihood model.

4.1.5 Shape and motion cues combination

With a static camera, the targeted subject is likely to be moving, at least intermittently. To cope with background clutter, we thus favor the moving edges (if any) by combining motion and shape cues in the definition of the likelihood p(z^S, z^M|x) of each particle x. Given f⃗(z(j)) the optical flow vector at pixel z(j), the similarity distance D in (4) is then replaced by

D = ∑_{j=1}^{N_p} (|x(j) − z(j)| + ρ·γ(z(j))), (10)

where γ(z(j)) = 0 (resp. 1) if f⃗(z(j)) ≠ 0 (resp. if f⃗(z(j)) = 0), and ρ > 0 denotes a penalty. Figure 8 plots this more discriminant likelihood function for the example seen above. The target is still the subject on the right, but it is now assumed to be moving.

4.1.6 Shape and color cues combination

We propose in [7] a likelihood model p(z^S, z^C|x) which combines both shape and color cues through a skin-colored region segmentation. The use of color features makes the tracker more robust to situations where there is poor grey-level contrast between the human limbs and the background.
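Equations (6), (7) and (9) can be sketched as follows. The dict keyed by (channel, sub-region) pairs is an illustrative layout, not imposed by the paper, and σ_c is a free parameter.

```python
import numpy as np

def bhattacharyya(h, h_ref):
    """Eq. (6): Bhattacharyya distance between two normalized histograms
    (argument clipped for numerical safety before the square root)."""
    return float(np.sqrt(np.clip(1.0 - np.sum(np.sqrt(h * h_ref)), 0.0, 1.0)))

def color_likelihood(hists, ref_hists, sigma_c=0.2):
    """Eq. (7): multi-patch color likelihood, summing the squared
    Bhattacharyya distances over all (channel c, sub-region p) pairs."""
    total = sum(bhattacharyya(hists[key], ref_hists[key]) ** 2
                for key in ref_hists)
    return float(np.exp(-total / (2.0 * sigma_c ** 2)))

def fused_likelihood(cue_likelihoods):
    """Eq. (9): cues assumed mutually independent given the state, so the
    unified measurement function is the product of per-cue likelihoods."""
    return float(np.prod(cue_likelihoods))
```

Splitting the region into sub-histograms is what reintroduces some spatial arrangement of colors into an otherwise position-blind histogram comparison.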
Fig. 13 Average distance between the true target position and (1) the maximum likelihood peak, (2) the nearest significative peak, for the measurement functions S1, S2, C1, S1C1, S1M, S2C1, S2C2 and S1MC1

Fig. 15 Average detection rate relative to false positives, true positives and false negatives for the importance functions FD, FMD, FSBD, MD, SBD and SBMD. The red and blue lines depict the real true positive rate and the frontal face recognition rate in the database, respectively
123
[Fig. 16 Average computation time of one image for each importance function (FD, FMD, FSBD, MD, SBD, SBMD). Note that this time is independent of the number of particles involved in the tracking]

ensures a sufficient number of particles to be sampled on the target, even though some may be positioned on false detections. Consequently, the importance functions FSBD and MD are considered as candidate elements of the aforementioned visual tracking modalities, to be characterized and evaluated below.

5.2 Particle filtering strategies evaluations

The filtering strategies described in Sect. 3 must be examined in order to check which ones best fulfill the requirements of the considered H/R interaction modalities. For the sake of comparison, importance functions rely on dynamics or measurements alone (respectively noted DIF for "Dynamics-based Importance Function" and MIF for "Measurement-based Importance Function"), or combine both (termed DMIF for "Dynamics and Measurement-based Importance Function"). Further, each modality is evaluated on a database of sequences acquired from the robot in a wide range of typical conditions: cluttered environments, appearance changes or sporadic disappearance of the targeted subject, jumps in his/her dynamics, etc. For each sequence, the mean estimation error with respect to the "ground truth", together with the mean failure rate (% of target loss), are computed over several filter runs and particle numbers. The error is computed as the distance (in pixels) between the estimated position and the true position of the object to be tracked. A failure is registered when this error exceeds a threshold (related to the region of interest), and is followed by a re-initialization of the tracker. For space reasons, only a subset of the associated figure plots is shown here. This analysis motivates our choices, described hereafter, for the three visual tracking modalities. The presented results have been obtained on a 3 GHz Pentium IV personal computer.

5.2.1 Face tracker (proximal interaction)

This modality involves the state vector x_k = (u_k, v_k, s_k, θ_k). As the robot remains static, both shape and motion cues are combined into the S1M measurement function. The tracker is evaluated in nominal conditions, viz. under no disturbance, as well as against cluttered environments and illumination changes. A typical run is shown in Fig. 17. Figures 18 and 19 plot the tracking failure rate as well as tracking errors averaged over scenarios involving cluttered backgrounds. Dynamics-based Importance Functions lead to a better precision (about 10 pixels) together with a low failure rate, so that detection modules are not necessary in this "easiest" context.

The AUXILIARY_PF strategy shows a higher time consumption than the CONDENSATION algorithm, though with no improvement of the approximation of the posterior. The increased computational cost is of course due to the auxiliary sampling step. The fair performance comes from the poorly informative dynamics model, see the end of Sect. 3.3. This is why we opt for a CONDENSATION algorithm, which can run at ≈ 40 Hz for N = 200 particles (Fig. 20).

The parameter values of our face tracker are listed in Table 5. The standard deviations of the Gaussian noises entailed in the random walk dynamics are set using classical arguments relating them to the admissible evolution of the template between two consecutive images. The standard deviations of the importance functions come from an offline prior study of the underlying detectors, as was done in [21]. The parameter σ_s involved in the shape-based likelihood (and, in Tables 6, 7, the parameters σ_c, σ_m involved in the color-based and motion-based likelihoods) are defined as follows: first, for each element of an image database, a "ground truth" is determined by manually adjusting the template position, orientation and scale; then the distances involved in the various likelihoods are computed for several perturbations of the template parameters around their "true" values; assuming that these distances are samples of a Gaussian centered on 0 enables the estimation of σ_s, σ_c, σ_m. The remaining coefficients, including the number of particles, are selected by trial-and-error, so as to ensure overall good performance of the tracker while limiting its computational cost.

5.2.2 Upper human body tracker (guidance mission)

This modality involves the state vector x_k = (u_k, v_k, s_k) (the orientation θ_k being set to a known constant), as well as two color models h^c_ref,1, h^c_ref,2, respectively corresponding to the head and the torso of the guided person, in the measurement function (7). To overcome appearance changes of these ROIs in the video stream, their associated color models are
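The calibration and evaluation protocol above can be sketched as follows; the input data are placeholders, and `estimate_sigma` simply fits a zero-mean Gaussian to the measured likelihood distances:

```python
import numpy as np

def estimate_sigma(distances):
    """Maximum-likelihood estimate of sigma for a Gaussian centered on 0,
    as used to set sigma_s, sigma_c, sigma_m from distances measured on
    perturbed template configurations."""
    d = np.asarray(distances, dtype=float)
    return float(np.sqrt(np.mean(d ** 2)))

def evaluate_run(errors_px, threshold_px):
    """Mean estimation error and failure rate over one run: a failure is
    registered whenever the pixel error exceeds the threshold (after which
    the tracker would be re-initialized)."""
    e = np.asarray(errors_px, dtype=float)
    return float(e.mean()), float((e > threshold_px).mean())
```

In practice the two statistics would be averaged over several filter runs and particle numbers, as described above.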
[Fig. 18 Average failure rate versus number of particles on sequences involving cluttered background, for CONDENSATION (DIF), AUXILIARY_PF (DIF), ICONDENSATION (MIF), RBSSHSSIR (MIF), ICONDENSATION (DMIF) and RBSSHSSIR (DMIF)]

[Fig. 19 Tracking errors versus number of particles on sequences involving cluttered background, for the same strategies]

[Fig. 20 Average time consumption versus number of particles on all sequences for several strategies]

updated online through a first-order discrete-time filtering process entailing the state estimate, i.e.

h^c_ref,k = (1 − κ) · h^c_ref,k−1 + κ · h^c_E[x_k],   (14)

where κ weights the contribution of the mean state histogram h^c_E[x_k] to the target model h^c_ref,k−1, and the index p has been omitted for compactness. Drift, and possible subsequent target loss, are experienced in any tracker which involves model updating. To avoid this, the particle weighting step considers the likelihood S2C1 which fuses, thanks to (9), the color distributions cue but also a shape cue relative to the head silhouette (Fig. 4).

Given the H/R interaction distance and the evaluation results in Sect. 4.2, the common importance function of the ICONDENSATION and RBSSHSSIR strategies is based on color blobs and face detectors, namely FSBD. These proposals permit automatic initialization when persons appear or re-appear in the scene and improve the recovery of deadlocks induced by target loss.

In nominal conditions, all the particle filtering strategies lead to a similar precision and failure rate. Experiments on sequences including appearance or illumination changes, such as the two runs reported in Fig. 21, also show similar results. Indeed, fusing shape and color cues in the
Table 6 Parameter values used in the upper human body tracker (guidance mission)

(α, β)  Mixture coefficients in the importance function q(x_k | x_{k−1}, z_k) along Eq. (2): (0.3, 0.6)
(σ_u, σ_v, σ_s)  Standard deviations of the random walk dynamics noise on the state vector x_k = (u_k, v_k, s_k): (11, 6, 0.1)
(σ_ui, σ_vi)  Standard deviations in importance function π(x_k | z^S) for the FD-based detector: (6, 6)
(σ_ui, σ_vi)  Standard deviations in importance function π(x_k | z^C) for the SBD detector: (6, 6)
σ_s  Standard deviation in the shape-based likelihood p(z^S_k | x_k): 25
N_R  Number of patches in p(z^C_k | x_k): 2
σ_c  Standard deviation in the color-based likelihood p(z^C_k | x_k): 0.03
N_bi  Number of color bins per channel involved in p(z^C_k | x_k): 32
κ  Coefficients for the reference histograms h^c_ref,1, h^c_ref,2 update in Eq. (14): (0.1, 0.05)
N  Number of particles: 150

Table 7 Parameter values used in the person tracker (search for interaction)

(α, β)  Mixture coefficients in the importance function q(x_k | x_{k−1}, z_k) along Eq. (2): (0.3, 0.6)
(σ_u, σ_v, σ_s)  Standard deviations of the random walk dynamics noise on the state vector x_k = (u_k, v_k, s_k): (7, 5, 0.1)
ν  Threshold for importance function π(x_k | z^M_k): 10
(σ_ui, σ_vi)  Standard deviations in importance function π(x_k | z^M) for the MD-based detector: (8, 8)
σ_m  Standard deviation in the motion-based likelihood p(z^M_k | x_k): 0.2
N_bi  Number of motion bins involved in p(z^M_k | x_k): 32
σ_c  Standard deviation in the color-based likelihood p(z^C_k | x_k): 0.03
N_bi  Number of color bins per channel involved in p(z^C_k | x_k): 32
N_R  Number of patches in p(z^C_k | x_k): 1
κ  Coefficient for the reference histogram h^c_ref update in Eq. (14): 0.1
N  Number of particles: 150
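The reference histogram update of Eq. (14), with the κ values listed above, can be sketched as follows (a minimal illustration; the histograms are assumed already normalized, so their convex combination stays normalized):

```python
import numpy as np

def update_reference_histogram(h_ref_prev, h_mean_state, kappa):
    """First-order update of Eq. (14):
    h_ref_k = (1 - kappa) * h_ref_{k-1} + kappa * h_{E[x_k]}.
    kappa weights the contribution of the mean state histogram to the
    previous target model."""
    h_prev = np.asarray(h_ref_prev, dtype=float)
    h_mean = np.asarray(h_mean_state, dtype=float)
    return (1.0 - kappa) * h_prev + kappa * h_mean
```

A small κ (0.05-0.1 in the tables) keeps the model adaptation slow, which limits drift toward the background.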
measurement function improves the discriminative power, so that in these contexts a robust tracking can be performed whatever the particle filtering strategy used.

Experiments on sequences including additional sporadic disappearances (due to the limits of the camera field of view) or jumps in the target dynamics highlight the efficiency of the ICONDENSATION/RBSSHSSIR strategies in terms of failure rate (Fig. 22). In fact, these two strategies, by drawing some particles according to the output from detection modules, permit automatic initialization and aid recovery from transient tracking failures. In addition, the RBSSHSSIR filter leads to a slightly better precision than ICONDENSATION. This is a consequence of the more efficient association of subparticles sampled from the proposal with plausible predecessors, thanks to the intermediate resampling stage (step 6 in Table 3).

Experiments on sequences including spurious detections due to the presence of another non-occluding person in the
[Fig. 22 Average failure rate versus number of particles on sequences including jumps in the target dynamics]

[Fig. 25 Average failure rate versus number of particles on sequences including two people occluding each other]
[Fig. 23 Average failure rate versus number of particles on sequences including a spurious detection without occlusion]

camera field of view bring out that Measurement-based Importance Functions lead to a worse failure rate (Fig. 23). Conversely, the DMIF strategies ensure a proper tracking thanks to the sampling of some particles from the dynamics. To illustrate these observations, Fig. 24 shows a tracking run including two persons for MIF-ICONDENSATION (Fig. 24, top) and DMIF-ICONDENSATION (Fig. 24, bottom). In the MIF-ICONDENSATION case, only the non-targeted person is detected, so that the importance function positions the particles on him.

Experiments on sequences including two people occluding each other highlight the efficiency of the ICONDENSATION/RBSSHSSIR strategies in terms of failure rate (Fig. 25). DIF strategies lead to track the person in the foreground (Fig. 26, top), whereas ICONDENSATION/RBSSHSSIR strategies keep locking on the right target throughout the sequence (Fig. 26, bottom), thanks to the sampling of some particles according to the visual detector outputs and to the discriminative power of the measurement function.

The above experiments emphasize the necessity of taking into account both the dynamics and the measurements so that the tracking can be robust and efficient enough in all considered scenarios related to the guidance modality. Therefore, the DIF and MIF particle filtering strategies are excluded in this context. Even if DMIF-ICONDENSATION and DMIF-RBSSHSSIR are well suited and have similar time consumption (Fig. 27) for the used number N of particles (between 100 and 200), we finally adopt DMIF-RBSSHSSIR for this guidance modality because of its slightly better performance compared to DMIF-ICONDENSATION. The parameters reported in Table 6 are used in the likelihoods, proposal and state dynamics involved in our upper human body tracker.
[Fig. 27 Average time consumption versus number of particles on all sequences for several strategies]

[Fig. 28 Tracking errors versus number of particles in nominal conditions, for ICONDENSATION (DMIF), RBSSHSSIR (DMIF) and HIERARCHICAL_PF]

[Fig. 29 Average failure rate versus number of particles in nominal conditions]

5.2.3 Person tracker (search for interaction)

As was the case in the above section, the state vector has the form x_k = (u_k, v_k, s_k). Color and motion cues are fused into the global measurement function (9), as the robot is supposed motionless. The selected importance function MD is defined from the motion detector output. Relying on the motion cue, more focused on the target, it results in a significant decrease of the tracking error under nominal conditions, as illustrated in Fig. 28. In this helpful case, a slightly better failure rate is also observed, as shown in Fig. 29.

Experiments on sequences including a full occlusion of the moving target by a static object (Fig. 32) highlight the efficiency of the DMIF-ICONDENSATION/DMIF-RBSSHSSIR strategies in terms of failure rate compared to the HIERARCHICAL_PF strategy (Fig. 30). Though some particles are sampled on the targeted person by the motion-based importance function (MD) as soon as he/she reappears after an occlusion, the HIERARCHICAL_PF strategy fails (Figs. 31, 32). Indeed, as these particles lie in the tails of the dynamics pdf, they are assigned small weights and thus get eliminated during the first resampling step of the algorithm (step 6 in Table 4). Meanwhile, the other filtering strategies, which rely on Dynamics and Measurement based importance functions, can perform the tracking.

Similar conclusions hold when the target is motionless and subject to occlusion (Figs. 33, 34). This can again be explained by the action of the first resampling step of HIERARCHICAL_PF, which concentrates particles on high motion activity regions. Figures 35
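The elimination mechanism invoked above (particles lying in the tails of the dynamics pdf receive small weights and disappear at resampling) can be illustrated with a generic systematic resampling routine; this is a sketch, not the paper's HIERARCHICAL_PF implementation:

```python
import numpy as np

def systematic_resample(weights, rng=None):
    """Systematic resampling: returns the indices of the particles kept.
    Particles with a negligible weight are very unlikely to survive."""
    rng = rng or np.random.default_rng(0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()
    n = len(w)
    # One uniform offset, then n evenly spaced positions in [0, 1).
    positions = (rng.random() + np.arange(n)) / n
    return np.searchsorted(np.cumsum(w), positions)

# Two particles carry almost all the weight (e.g. they lie in the mode of the
# dynamics pdf); the two tail particles are discarded by the resampling.
kept = systematic_resample([0.49, 0.49, 0.01, 0.01])
```

After resampling, only copies of the high-weight particles remain, which is exactly why particles re-detected in the dynamics tails cannot rescue the hierarchical scheme.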
[Fig. 30 Average failure rate versus number of particles on sequences including an occlusion by a static object]

[Fig. 31 Tracking errors versus number of particles on sequences including an occlusion by a static object]

[Fig. 33 Average failure rate versus number of particles on sequences including target occlusions]

[Fig. 34 Tracking errors versus number of particles on sequences including target occlusions]
[Figure: functional modules embedded on the robot: ICU ("I See You"), LuckyLoc (monocular localization on quadrangles), Segloc (map building and localization on segments), NDD (local avoidance), SonO (sonar obstacles detection)]

The C++ implementation of the module ICU is integrated in
6.3 Visual functions provided by the module ICU

[Fig. 39 Snapshots of detected (red)/recognized (green) faces with associated probabilities. The target is named Sylvain in this example]

[Fig. 40 Transitions between tracking modalities: INIT (MD-based detector); proximal interaction (FD-based detector, face classification, face learning, face tracking, user presence table update, interaction through touch-screen); guidance mission (FD-based detector, face classification, upper human body tracking); search for interaction (whole body tracking, motion monitoring)]
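The four modalities of Fig. 40 can be sketched as a lookup-table state machine; the event names below are hypothetical placeholders, since the paper's actual switching heuristics are not reproduced here:

```python
# States are the tracking modalities of Fig. 40; the (state, event) keys are
# illustrative assumptions about what triggers each numbered transition.
TRANSITIONS = {
    ("INIT", "motion_detected"): "search_for_interaction",
    ("search_for_interaction", "face_detected"): "proximal_interaction",
    ("proximal_interaction", "guidance_requested"): "guidance_mission",
    ("guidance_mission", "mission_completed"): "proximal_interaction",
    ("proximal_interaction", "target_lost"): "INIT",
    ("guidance_mission", "target_lost"): "INIT",
    ("search_for_interaction", "target_lost"): "INIT",
}

def next_modality(state, event):
    """Return the next tracking modality, staying put on unknown events."""
    return TRANSITIONS.get((state, event), state)
```

Keeping the transitions in a single table makes the heuristic switching logic easy to audit and to extend with new modalities.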
6.4 Heuristic-based switching between trackers

A finite-state automaton can be defined from the tour-guide scenario outlined in Sect. 2, as illustrated in Fig. 40. Its four

This paper has introduced mechanisms for visual data fusion/combination within particle filtering to develop people trackers from a single color camera mounted on a mobile robot. Most particle filtering techniques applied to single-target tracking
have been surveyed and tested. A first contribution concerns visual data fusion for the considered robotics scenarios, a context which has fairly seldom exploited particle filtering based solutions. The most persistent cues are used in the particle weighting stage. The others, logically intermittent, permit automatic initialization and aid recovery from transient tracking failures. Mixing these cues both into the importance and measurement functions of the underlying estimation scheme can help trackers work under a wide range of conditions encountered by our Rackham robot during its displacements. A second contribution relates to the evaluation of dedicated particle filtering strategies in order to check which people trackers, regarding visual cue and algorithm associations, best fulfill the requirements of the considered scenarios. Let us point out that few studies comparing the efficiency of so many filtering strategies had been carried out in the literature before. A third contribution concerns the integration of all these trackers on a mobile platform, whose deployment in public areas has highlighted the relevance and the complementarity of our visual modalities. To our knowledge, few mature robotic systems enjoy such advanced human perception capabilities.

Several directions are currently investigated. First, we study how to fuse other information such as laser or sound cues. The sound cue would not just contribute to the localization in the image plane, but would also endow the tracker with the ability to switch its focus between speakers. A next issue will concern the incorporation of appropriate degrees of adaptivity into our multiple-cue based likelihood models, depending on the target property changes or the current viewing conditions [40]. In addition, our tracking modalities will be made much more active. Zooming will be used to actively adapt the focal length with respect to the H/R distance and to the current active visual modalities. Finally, the tracker will be enhanced so as to track multiple persons simultaneously [27,42].

Acknowledgments The work described in this paper was partially conducted within the EU Integrated Project COGNIRON (The Cognitive Companion) and the EU STREP Project CommRob (Advanced Behavior and High-Level Multimodal Communication with and among Robots), funded by the European Commission Division FP6-IST Future and Emerging Technologies under Contracts FP6-IST-002020 and FP6-IST-045441.

References

1. Alami, R., Chatila, R., Fleury, S., Ingrand, F.: An architecture for autonomy. Int. J. Robot. Res. 17(4), 315–337 (1998)
2. Andrieu, C., Davy, M., Doucet, A.: Improved auxiliary particle filtering: Application to time-varying spectral analysis. In: IEEE Wk. on Statistical Signal Processing, pp. 309–312. Singapore (2001)
3. Arulampalam, S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for on-line non-linear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
4. Avidan, S.: Support vector tracking. In: IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR'01), pp. 184–191. Kauai, HI (2001)
5. Bailly, G., Brèthes, L., Chatila, R., Clodic, A., Crowley, J., Danès, P., Elisei, F., Fleury, S., Herrb, M., Lerasle, F., Menezes, P., Alami, R.: HR+: towards an interactive autonomous robot. In: Journées ROBEA, pp. 39–45. Montpellier, France (2005)
6. Bichot, E., Mascarilla, L., Courtellemont, P.: Particle filtering based on motion and color information. IEEE Trans. Information Sci. Appl. 2, 220–227 (2005)
7. Brèthes, L., Menezes, P., Lerasle, F., Briot, M.: Face tracking and hand gesture recognition for human-robot interaction. In: IEEE Int. Conf. on Robotics and Automation (ICRA'04), pp. 1901–1906. New Orleans, LA (2004)
8. Bretzner, L., Laptev, I., Lindeberg, T.: Hand gesture recognition using multi-scale colour features, hierarchical models and particle filtering. In: IEEE Int. Conf. on Automatic Face and Gesture Recognition (FGR'02), pp. 405–410. Washington D.C. (2002)
9. Brèthes, L.: Suivi visuel par filtrage particulaire. Application à l'interaction homme-robot. Ph.D. thesis, Université Paul Sabatier, LAAS-CNRS, Toulouse (2006)
10. Bullock, D., Zelek, J.: Real-time tracking for visual interface applications in cluttered and occluding situations. J. Vis. Image Comput. 22, 1083–1091 (2004)
11. Chen, H., Liu, T.: Trust-region methods for real-time tracking. In: IEEE Int. Conf. on Computer Vision (ICCV'01), vol. 2, pp. 717–722. Vancouver, Canada (2001)
12. Comaniciu, D., Ramesh, V., Meer, P.: Kernel-based object tracking. IEEE Trans. Pattern Anal. Mach. Intell. 25, 564–575 (2003)
13. Doucet, A., De Freitas, N., Gordon, N.J.: Sequential Monte Carlo Methods in Practice. Series Statistics for Engineering and Information Science. Springer, New York (2001)
14. Doucet, A., Godsill, S.J., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000)
15. Gavrila, D.M.: The visual analysis of human movement: a survey. Comput. Vis. Image Underst. 1, 82–98 (1999)
16. Germa, T., Brèthes, L., Lerasle, F., Simon, T.: Data fusion and eigenface based tracking dedicated to a tour-guide robot. In: Int. Conf. on Vision Systems (ICVS'07). Bielefeld, Germany (2007)
17. Giebel, J., Gavrila, D.M., Schnorr, C.: A Bayesian framework for multi-cue 3D object tracking. In: Eur. Conf. on Computer Vision (ECCV'04). Prague, Czech Republic (2004)
18. Haritaoglu, I., Harwood, D., Davis, L.: W4: Real-time surveillance of people and their activities. IEEE Trans. Pattern Anal. Mach. Intell. 8(22), 809–830 (2000)
19. Hu, W., Tan, T., Wang, L., Maybank, S.: A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybernet. 34(3), 334–352 (2004)
20. Isard, M., Blake, A.: Condensation: conditional density propagation for visual tracking. Int. J. Comput. Vis. 29(1), 5–28 (1998)
21. Isard, M., Blake, A.: ICondensation: Unifying low-level and high-level tracking in a stochastic framework. In: Eur. Conf. on Computer Vision (ECCV'98), pp. 893–908. London, UK (1998)
22. Isard, M.A., Blake, A.: A mixed-state condensation tracker with automatic model-switching. In: IEEE Int. Conf. on Computer Vision (ICCV'98), pp. 107–112. Bombay, India (1998)
23. Jones, M., Rehg, J.: Color detection. Tech. rep., Compaq Cambridge Research Lab (1998)
24. Julier, S., Uhlmann, J.: A general method for approximating nonlinear transformations of probability distributions. Tech. rep., RRG, Dept. of Engineering Science, University of Oxford (1994)
25. Kitagawa, G.: Monte Carlo filter and smoother for non-Gaussian nonlinear state space models. J. Comput. Graph. Stat. 5(1), 1–25 (1996)
26. Li, P., Zhang, T.: Visual contour tracking based on sequential importance sampling/resampling algorithm. In: IEEE Int. Conf. on Pattern Recognition (ICPR'02), pp. 564–568. Quebec, Canada (2002)
27. MacCormick, J., Blake, A.: A probabilistic exclusion principle for tracking multiple objects. Int. J. Comput. Vis. 39(1), 57–71 (2000)
28. MacCormick, J., Isard, M.: Partitioned sampling, articulated objects, and interface-quality hand tracking. In: Eur. Conf. on Computer Vision (ECCV'00), pp. 3–19. Springer, London, UK (2000)
29. Moeslund, T., Granum, E.: A survey on computer vision-based human motion capture. Comput. Vis. Image Underst. 81, 231–268 (2001)
30. Nummiaro, K., Koller-Meier, E., Gool, L.V.: An adaptive color-based particle filter. J. Image Vis. Comput. 21, 90–110 (2003)
31. Pitt, M., Shephard, N.: Filtering via simulation: auxiliary particle filters. J. Am. Stat. Assoc. 94(446), 590–599 (1999)
32. Pérez, P., Hue, C., Vermaak, J., Gangnet, M.: Color-based probabilistic tracking. In: Eur. Conf. on Computer Vision (ECCV'02), pp. 661–675. Berlin, Germany (2002)
33. Pérez, P., Vermaak, J., Blake, A.: Data fusion for visual tracking with particles. Proc. IEEE 92(3), 495–513 (2004)
34. Rui, Y., Chen, Y.: Better proposal distributions: Object tracking using unscented particle filter. In: IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR'01), pp. 786–793. Kauai, HI (2001)
35. Schwerdt, K., Crowley, J.L.: Robust face tracking using color. In: Int. Conf. on Face and Gesture Recognition (FGR'00), pp. 90–95. Grenoble, France (2000)
36. Thayananthan, A., Stenger, B., Torr, P., Cipolla, R.: Learning a kinematic prior for tree-based filtering. In: British Machine Vision Conf. (BMVC'03), vol. 2, pp. 589–598. Norwich, UK (2003)
37. Torma, P., Szepesvári, C.: Sequential importance sampling for visual tracking reconsidered. In: AI and Statistics, pp. 198–205 (2003)
38. Turk, M.A., Pentland, A.P.: Face recognition using eigenfaces. In: IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR'91), pp. 586–591. Maui, HI (1991)
39. Van Der Merwe, R., De Freitas, N., Doucet, A., Wan, E.: The unscented particle filter. In: Advances in Neural Information Processing Systems, vol. 13 (2001)
40. Vermaak, J., Andrieu, C., Doucet, A., Godsill, S.: Particle methods for Bayesian modeling and enhancement of speech signals. IEEE Trans. Speech Audio Process. 10(3), 173–185 (2002)
41. Vermaak, J., Blake, A.: Nonlinear filtering for speaker tracking in noisy and reverberant environments. In: IEEE Int. Conf. on Acoustics, Speech and Signal Processing (ICASSP'00). Istanbul, Turkey (2000)
42. Vermaak, J., Doucet, A., Pérez, P.: Maintaining multi-modality through mixture tracking. In: IEEE Int. Conf. on Computer Vision (ICCV'03). Nice, France (2003)
43. Vermaak, J., Pérez, P., Gangnet, M., Blake, A.: Towards improved observation models for visual tracking: Selective adaptation. In: Eur. Conf. on Computer Vision (ECCV'02), pp. 645–660. Berlin, Germany (2002)
44. Vezhnevets, V., Sazonov, V., Andreeva, A.: A survey on pixel-based skin color detection techniques. In: Proc. Graphicon-2003, pp. 85–92 (2003)
45. Viola, P., Jones, M.: Rapid object detection using a boosted cascade of simple features. In: IEEE Int. Conf. on Computer Vision and Pattern Recognition (CVPR'01). Kauai, HI (2001)
46. Wachter, S., Nagel, H.: Tracking persons in monocular image sequences. Comput. Vis. Image Underst. 74(3), 174–192 (1999)
47. Wu, Y., Huang, T.: A co-inference approach to robust visual tracking. In: IEEE Int. Conf. on Computer Vision (ICCV'01), vol. 2, pp. 26–33. Vancouver, Canada (2001)