by a set of pixels or estimated by computing the first and second moments on the probability map. In this way, the color distribution can be represented with a small number of component Gaussians; however, building and updating MoGs via EM is time-consuming.
The idea of the second category is histogram matching. It usually requires a reference color model of the object and a similarity measure that evaluates the similarity between the reference and candidate color models. The candidate whose model is most similar to the reference is selected as the tracking result in the current frame. The well-known mean shift algorithm [5] and the subsequent improvements [23,1,4,22] fall into this category. In these methods the color model is represented by a weighted histogram (a kernel-based probability distribution), and the similarity is measured with the Bhattacharyya distance. Applying first-order gradient ascent to the similarity measure yields the mean shift algorithm, which converges to the locally best candidate.
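For reference, the matching step of this category can be illustrated with the Bhattacharyya coefficient between two normalized histograms, which is the quantity that kernel-based mean shift trackers maximize. The sketch below is a minimal, hypothetical illustration: the function name and the count-list input format are our assumptions, and it omits the kernel weighting used in [5].

```python
import math


def bhattacharyya_coefficient(h1, h2):
    """Bhattacharyya coefficient of two histograms given as bin counts.

    Both histograms are normalized to sum to 1 first; the result lies in
    [0, 1] and equals 1 only when the normalized histograms coincide.
    """
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")
    s1, s2 = float(sum(h1)), float(sum(h2))
    if s1 <= 0 or s2 <= 0:
        raise ValueError("each histogram needs at least one count")
    return sum(math.sqrt((a / s1) * (b / s2)) for a, b in zip(h1, h2))


reference = [10, 30, 60, 0]   # hypothetical reference histogram
candidate = [12, 28, 55, 5]   # hypothetical candidate histogram
print(round(bhattacharyya_coefficient(reference, reference), 6))  # 1.0
print(round(bhattacharyya_coefficient(reference, candidate), 4))
```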
The method proposed in this paper belongs to the second class and aims to solve two problems of these algorithms. First, conventional histogram methods [5,23,1,4,22] partition the whole color space into a regular square tessellation. This neglects the fact that object colors are usually very compact and occupy only a few small regions of the color space, so it leads to a large number of empty bins and wastes computational resources. Second, the color information within each bin is not modelled: the distribution of the multi-channel gray levels inside a bin is discarded.
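The first problem can be made concrete by counting how many cells of a regular grid a compact color distribution actually occupies. The sketch below assumes 32 bins per RGB channel (the setting later used for mean shift in Section 4) and synthetic reddish object colors. The data and the function names are hypothetical.

```python
import random


def occupied_bins(pixels, bins_per_channel=32, levels=256):
    """Set of occupied cells of a regular RGB grid histogram."""
    width = levels // bins_per_channel  # gray levels covered by one bin
    return {tuple(c // width for c in px) for px in pixels}


def clamp(v):
    """Clip a sample to the valid 8-bit gray-level range."""
    return min(255, max(0, int(v)))


rng = random.Random(0)
# Hypothetical compact object colors: reddish pixels with a small spread.
pixels = [(clamp(rng.gauss(180, 8)), clamp(rng.gauss(60, 8)), clamp(rng.gauss(50, 8)))
          for _ in range(2000)]

total = 32 ** 3
used = len(occupied_bins(pixels))
print(f"{used} of {total} bins occupied ({100.0 * used / total:.2f}%)")
```

For colors like these, only a small fraction of the 32,768 cells is non-empty, yet a regular-grid tracker still stores and compares all of them.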
To address these two problems, a clustering-based color model is proposed and a fast algorithm based on Integral Images is developed for object tracking. In Section 2, K-means clustering is used to partition the color space adaptively, and the histogram bins of the object model are determined accordingly. Moreover, we model the multi-channel gray-level distribution in each bin with a Gaussian to capture a richer description of the target. A similarity measure based on the Bhattacharyya distance, together with its simplified form, is then introduced to evaluate the similarity between two color models. In Section 3, Integral Images for computing the histogram, mean and variance are proposed, with which the color model can be evaluated through fast array-indexing operations. The Integral Images make it possible to implement a brute-force search tracking algorithm efficiently. In Section 4, diverse experiments demonstrate the validity and performance of the algorithm.
2. Clustering-based color model
It is commonly understood that adaptive-binning histograms can represent distributions more efficiently and more accurately with far fewer bins. Although adaptive partition of the color space has long been studied in image coding [6] and image segmentation [2], little related work exists in object tracking.
2.1. Adaptive partition of color space
In this paper, K-means clustering [7] is employed to partition the color space of the object adaptively. The histogram bins are then determined from the clustering result with the following simple method. For each cluster, the pixel farthest from the cluster center is used to determine the bin range, which is a non-uniform rectangle in two dimensions or a hyper-rectangle in higher dimensions. Adjacent rectangles (or hyper-rectangles) may overlap slightly. A pixel within such an overlapping region is assigned by computing its distance to the relevant cluster centers and selecting the cluster with the minimum distance.
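The partition procedure can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's implementation. K-means uses random initialization. We read "the pixel farthest from the cluster center" per channel, so each bin is the axis-aligned box centered on the cluster center whose half-width along a channel is the largest deviation of the cluster's pixels in that channel. A pixel outside every box is left unassigned (None). All function names are hypothetical.

```python
import math
import random


def nearest(p, centers):
    """Index of the center closest to p (Euclidean distance)."""
    return min(range(len(centers)), key=lambda i: math.dist(p, centers[i]))


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd iterations; returns the list of k cluster centers."""
    centers = random.Random(seed).sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[nearest(p, centers)].append(p)
        new_centers = [tuple(sum(c) / len(cl) for c in zip(*cl)) if cl else centers[i]
                       for i, cl in enumerate(clusters)]
        if new_centers == centers:  # assignments no longer change
            break
        centers = new_centers
    return centers


def build_bins(points, centers):
    """One axis-aligned (hyper-)rectangle per cluster: (low corner, high corner)."""
    labels = [nearest(p, centers) for p in points]
    boxes = []
    for i, c in enumerate(centers):
        members = [p for p, lab in zip(points, labels) if lab == i]
        half = [max((abs(p[j] - c[j]) for p in members), default=0.0)
                for j in range(len(c))]
        boxes.append(([c[j] - half[j] for j in range(len(c))],
                      [c[j] + half[j] for j in range(len(c))]))
    return boxes


def bin_index(p, centers, boxes):
    """Bin of pixel p; overlaps are resolved by the nearest cluster center."""
    inside = [i for i, (lo, hi) in enumerate(boxes)
              if all(lo_j <= v <= hi_j for v, lo_j, hi_j in zip(p, lo, hi))]
    if not inside:
        return None  # outside every bin (assumption: not counted)
    return min(inside, key=lambda i: math.dist(p, centers[i]))


# Usage with two hypothetical color clusters in RG space.
rng = random.Random(1)
pts = ([(rng.gauss(200, 5), rng.gauss(40, 5)) for _ in range(200)] +
       [(rng.gauss(60, 5), rng.gauss(150, 5)) for _ in range(200)])
centers = kmeans(pts, 2)
boxes = build_bins(pts, centers)
print(bin_index((200, 40), centers, boxes), bin_index((0, 255), centers, boxes))
```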
Fig. 1 presents an example of adaptive partition of the color space. The left figure is a reference image of a human face. The middle figure shows the color distribution of the object in RG color space. It shows that the colors are very compact and occupy only a few small regions of the RG color space. The right figure shows the non-uniform histogram bins obtained by K-means clustering (d = 6), where pixels belonging to the same bin are labelled with the same color.
Determining the number of histogram bins is an important yet unresolved problem in color-based object tracking [5,23,1,3,21,22]. Too many bins make the model unable to cope with environmental changes or noise, which leads to tracking failures. Too few bins do not allow good discrimination of the target color model, so the tracker is distracted by nearby regions of similar color. In our case, a straightforward application of clustering algorithms that select the number of clusters automatically [8] does not yet solve this problem. Thus, as in most color-based tracking algorithms, the bin number is set empirically (between d = 4 and 8 in our case), and selecting the bin number to account for environmental changes is left for future work.
ARTICLE IN PRESS
L. Peihua / Signal Processing: Image Communication 21 (2006) 676–687 677
2.2. Color model and similarity measure
Based on the adaptive bins obtained above, consider a reference image consisting of a set of pixels $I(x_i)$, $i = 1, \dots, N$. The reference color model is represented by $p = \{p_u\}$, $u = 1, \dots, d$, where $p_u$ is defined as

$$p_u(I(x); \beta_u, \mu_u, \Sigma_u) = \beta_u\, G(\mu_u, \Sigma_u). \tag{1}$$

In the above equation, $G(\mu_u, \Sigma_u)$ is a Gaussian distribution with mean vector $\mu_u$ and covariance matrix $\Sigma_u$, and $\beta_u$, $\mu_u$, $\Sigma_u$ take the following forms:

$$\beta_u = \frac{n_u}{N}, \qquad \mu_u = \frac{1}{n_u}\sum_{i=1}^{N} I(x_i)\,\delta_u(x_i), \qquad \Sigma_u = \frac{1}{n_u}\sum_{i=1}^{N} \bigl(I(x_i) - \mu_u\bigr)\bigl(I(x_i) - \mu_u\bigr)^{T} \delta_u(x_i), \tag{2}$$

where $n_u = \sum_{i=1}^{N} \delta_u(x_i)$ is the number of pixels within the u-th bin. Here $\delta_u(x_i)$ is the Kronecker delta function, which is 1 if $I(x_i)$ falls into the u-th bin and 0 otherwise.
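Eq. (2) amounts to per-bin counting and averaging. The sketch below assumes each pixel has already been assigned a bin label, for example by a partition like the one in Section 2.1. Anticipating the simplification of Section 2.2.1, it keeps only the per-channel variances, i.e. the diagonal of $\Sigma_u$. The function name and the tuple layout (β_u, mean vector, variances) are hypothetical.

```python
def color_model(pixels, labels, d):
    """Per-bin parameters of Eq. (2) with a diagonal covariance.

    pixels: list of color tuples I(x_i); labels: bin index u of each pixel
    (None if the pixel lies outside every bin). Returns, for u = 0..d-1,
    (beta_u, mean vector, per-channel variances), or (0.0, None, None)
    for an empty bin.
    """
    n_total = len(pixels)  # N in Eq. (2)
    model = []
    for u in range(d):
        members = [p for p, lab in zip(pixels, labels) if lab == u]
        n_u = len(members)
        if n_u == 0:
            model.append((0.0, None, None))
            continue
        mean = [sum(channel) / n_u for channel in zip(*members)]
        var = [sum((p[j] - mean[j]) ** 2 for p in members) / n_u
               for j in range(len(mean))]
        model.append((n_u / n_total, mean, var))
    return model


pixels = [(10, 20, 30), (12, 20, 34), (100, 0, 0)]
print(color_model(pixels, [0, 0, 1], 2))
```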
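Consider the color model $q = \{q_u\}$, $u = 1, \dots, d$, of a candidate region comprising $N'$ pixels, in which each component distribution has the form

$$q_u(I(x); \beta'_u, \mu'_u, \Sigma'_u) = \beta'_u\, G(\mu'_u, \Sigma'_u), \tag{3}$$

where $\beta'_u$, $\mu'_u$ and $\Sigma'_u$ are defined analogously to Eq. (2). The similarity between two component distributions $p_u(I(x); \beta_u, \mu_u, \Sigma_u)$ and $q_u(I(x); \beta'_u, \mu'_u, \Sigma'_u)$ is measured with the Bhattacharyya coefficient $\rho(p_u, q_u) = \int p_u^{1/2} q_u^{1/2}\, dI(x)$. Evaluating the integral gives

$$\rho(p_u, q_u) = c_u \exp\!\left(-\frac{1}{4}(\mu_u - \mu'_u)^{T}(\Sigma_u + \Sigma'_u)^{-1}(\mu_u - \mu'_u)\right), \tag{4}$$

where $c_u$ is given by

$$c_u = \left(2^{3}\beta_u \beta'_u\right)^{1/2} \left(\frac{|\Sigma_u|^{1/2}\,|\Sigma'_u|^{1/2}}{|\Sigma_u + \Sigma'_u|}\right)^{1/2}. \tag{5}$$

The factor $2^{3}$ comes from the three color channels and ensures that $\rho(p_u, p_u) = \beta_u$. Thus, the similarity measure between two distributions $p = \{p_u\}$ and $q = \{q_u\}$ is defined as

$$\rho(p, q) = \sum_{u=1}^{d} \rho(p_u, q_u), \tag{6}$$

which equals 1 when $q = p$, because the $\beta_u$ of the reference model sum to 1.
2.2.1. Simplification of the color model
Assuming that the gray-level distributions of the different channels within each bin are mutually independent, the covariance matrix becomes diagonal and the similarity measure can be simplified. Let $\mu_u = [m_{u,1}\; m_{u,2}\; m_{u,3}]^{T}$ and $\Sigma_u = \operatorname{diag}\{\sigma^2_{u,1}, \sigma^2_{u,2}, \sigma^2_{u,3}\}$. The similarity between two component distributions given by Eq. (4) then simplifies to

$$\rho(p_u, q_u) = c_u \exp\!\left(-\frac{1}{4}\sum_{j=1}^{3}\frac{(m_{u,j} - m'_{u,j})^2}{\sigma^2_{u,j} + \sigma'^2_{u,j}}\right), \tag{7}$$

where $c_u$ takes the form

$$c_u = \left(2^{3}\beta_u \beta'_u\right)^{1/2} \prod_{j=1}^{3}\left(\frac{\sigma_{u,j}\,\sigma'_{u,j}}{\sigma^2_{u,j} + \sigma'^2_{u,j}}\right)^{1/2}. \tag{8}$$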
The advantage of this assumption is that the means and variances can be evaluated with array-indexing operations through the Integral Images described in Section 3.1.

Fig. 1. Adaptive partition of color space. From left to right: a reference image (size: 73 × 69) of a human face, the histogram of the reference model, and the non-uniform histogram bins in RG space.
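A minimal sketch of the simplified similarity of Eqs. (6) and (7) for the diagonal model follows. Each component is computed as $\sqrt{\beta_u\beta'_u}$ times, for every channel, the closed-form Bhattacharyya coefficient of two one-dimensional Gaussians, $\sqrt{2\sigma\sigma'/(\sigma^2+\sigma'^2)}\,\exp\bigl(-(m-m')^2/(4(\sigma^2+\sigma'^2))\bigr)$. With this per-channel normalization, a model whose $\beta_u$ sum to 1 scores 1 against itself. The small variance floor `eps`, which guards against zero-variance bins, is our own assumption, as are the function names and the (β_u, mean, variances) tuple layout.

```python
import math


def component_similarity(p, q, eps=1e-6):
    """rho(p_u, q_u) for two diagonal-Gaussian components (beta, mean, var)."""
    beta_p, m_p, v_p = p
    beta_q, m_q, v_q = q
    if beta_p == 0 or beta_q == 0:
        return 0.0  # an empty bin contributes nothing
    rho = math.sqrt(beta_p * beta_q)
    for mp, mq, vp, vq in zip(m_p, m_q, v_p, v_q):
        vp, vq = vp + eps, vq + eps  # variance floor (assumption)
        total = vp + vq
        rho *= math.sqrt(2.0 * math.sqrt(vp * vq) / total)
        rho *= math.exp(-((mp - mq) ** 2) / (4.0 * total))
    return rho


def model_similarity(p_model, q_model):
    """Eq. (6): sum of the component similarities over the d bins."""
    return sum(component_similarity(p, q) for p, q in zip(p_model, q_model))


ref = [(0.5, [100, 50, 40], [25, 16, 9]), (0.5, [200, 180, 170], [36, 36, 36])]
moved = [(0.5, [110, 50, 40], [25, 16, 9]), (0.5, [200, 180, 170], [36, 36, 36])]
print(model_similarity(ref, ref), model_similarity(ref, moved))
```

Shifting one bin's mean by 10 gray levels lowers the score smoothly rather than abruptly, because each bin carries its own Gaussian spread.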
2.2.2. Remarks
A histogram models the probability $p_u$ of a pixel $I(x)$ falling into the u-th bin. It is interesting to compare the forms that $p_u$ takes under different definitions of the histogram:

$$p_u = \begin{cases} \beta_u & \text{for a traditional histogram},\\ \beta_u\, G(\mu_u, \Sigma_u) & \text{for the histogram proposed in this paper}. \end{cases}$$

A traditional histogram only counts the pixels belonging to each bin and does not model the color distribution within a bin. The underlying assumption is that the pixels within a bin are uniformly distributed. The histogram proposed in this paper, in addition to counting the pixels, models the distribution within each bin as a Gaussian.
3. Fast algorithm based on Integral Images for object tracking
Exhaustive search for the maximal mode via histogram comparison is usually computationally prohibitive in real-time tracking applications. However, the Integral Images proposed below make a brute-force search feasible.
Motivated by the work of Viola and Jones [19], we previously presented a straightforward method for computing histograms by introducing the concept of an Integral Histogram Image [20]. Porikli independently presented the concept of the Integral Histogram and analyzed its computational complexity at length [13]. With either method, the histogram of a rectangular region of any size can be obtained with fast array-indexing operations.
In this paper we use the method introduced in [20] to compute histograms. Furthermore, we extend the work of Viola and Jones by presenting Integral Images for computing the means and variances of the three channels in each bin.
3.1. Computation of color distribution through Integral Images
Given the original color image $D(x, y) = (D_j(x, y),\ j = 1, 2, 3)$, we present Integral Images $I_{\beta_u}(x, y)$, $I_{m_{u,j}}(x, y)$ and $I_{\sigma_{u,j}}(x, y)$, where $u = 1, \dots, d$ and $j = 1, 2, 3$. They are used to compute the histogram and the mean and variance of the gray level in each of the three channels.
Assume the image $D(x, y)$ is of size $M \times N$ pixels. The corresponding Integral Image for the histogram is then an array with $(M+1) \times (N+1)$ rows and $d$ columns. The Integral Image $I_{\beta_u}(x, y)$ at location $(x, y)$ is the number of pixels that fall within the u-th bin above and to the left of $(x, y)$ in the image:

$$I_{\beta_u}(x, y) = \sum_{x' \le x,\; y' \le y} \delta_u(x', y'), \tag{9}$$

where $\delta_u(x', y') = 1$ if the pixel at location $(x', y')$ belongs to the u-th bin, and $\delta_u(x', y') = 0$ otherwise. Consider the following pair of recurrences:

$$i_{\beta_u}(x, y) = i_{\beta_u}(x-1, y) + \delta_u(x, y), \qquad I_{\beta_u}(x, y) = I_{\beta_u}(x, y-1) + i_{\beta_u}(x, y), \qquad u = 1, \dots, d, \tag{10}$$

where $i_{\beta_u}(0, y) = 0$ and $I_{\beta_u}(x, 0) = 0$ for any $x$ and $y$. With them, the Integral Image for the histogram can be computed in one pass over the original image.
Given any rectangle, its histogram $n_u$ ($u = 1, \dots, d$) can be determined with $4d$ array references (see Fig. 2 and Eq. (11)) using the Integral Histogram Image:

$$n_u = I_{\beta_u}(x + w, y + h) - I_{\beta_u}(x + w, y) - I_{\beta_u}(x, y + h) + I_{\beta_u}(x, y), \qquad u = 1, \dots, d, \tag{11}$$

where $I_{\beta_u}(x, 0) = I_{\beta_u}(0, y) = 0$, and $w$ and $h$ are the width and height of the rectangle, respectively.
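The recurrences of Eq. (10) and the four-reference lookup of Eq. (11) can be sketched in a few lines. This is a sketch under our own assumptions, not the code used in the experiments. The bin labels are given as a list of rows, with `labels[y][x]` holding the bin of pixel (x, y) (0-based) or None. The arrays carry an extra zero row and column, so that $I_{\beta_u}(x, 0) = I_{\beta_u}(0, y) = 0$. The function names are hypothetical.

```python
def integral_histogram(labels, d):
    """Integral Histogram Image via the recurrences of Eq. (10).

    Returns I with I[u][y][x] = number of pixels of bin u in rows < y and
    columns < x; row 0 and column 0 are the zero boundary.
    """
    h, w = len(labels), len(labels[0])
    I = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(d)]
    for u in range(d):
        Iu = I[u]
        for y in range(1, h + 1):
            i_u = 0  # running count along the row, i_u(0, y) = 0
            for x in range(1, w + 1):
                i_u += 1 if labels[y - 1][x - 1] == u else 0
                Iu[y][x] = Iu[y - 1][x] + i_u  # I_u(x, y) = I_u(x, y-1) + i_u(x, y)
    return I


def rect_histogram(I, x, y, w, h):
    """Eq. (11): bin counts of the w-by-h rectangle covering 0-based
    pixel columns x..x+w-1 and rows y..y+h-1 (4 references per bin)."""
    return [Iu[y + h][x + w] - Iu[y][x + w] - Iu[y + h][x] + Iu[y][x] for Iu in I]


labels = [[0, 1, 1],
          [2, 0, 1]]
I = integral_histogram(labels, 3)
print(rect_histogram(I, 0, 0, 3, 2))  # [2, 3, 1]
print(rect_histogram(I, 1, 0, 2, 2))  # [1, 3, 0]
```

Once the arrays are built, each rectangle query costs the same four lookups per bin regardless of the rectangle size.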
The Integral Images for means and variances are defined as follows:

$$I_{m_{u,j}}(x, y) = \sum_{x' \le x,\; y' \le y} \delta_u(x', y')\, D_j(x', y'), \qquad I_{\sigma_{u,j}}(x, y) = \sum_{x' \le x,\; y' \le y} \delta_u(x', y')\, D_j(x', y')^2, \qquad u = 1, \dots, d;\ j = 1, 2, 3. \tag{12}$$
Fig. 2. Construction of the Integral Image for the histogram. On the left is a rectangle with width w and height h; on the right, each plane is the Integral Image of one bin.
With the following two pairs of recurrences:

$$i_{m_{u,j}}(x, y) = i_{m_{u,j}}(x-1, y) + D_j(x, y)\,\delta_u(x, y), \qquad I_{m_{u,j}}(x, y) = I_{m_{u,j}}(x, y-1) + i_{m_{u,j}}(x, y), \qquad u = 1, \dots, d;\ j = 1, 2, 3, \tag{13}$$

$$i_{\sigma_{u,j}}(x, y) = i_{\sigma_{u,j}}(x-1, y) + D_j(x, y)^2\,\delta_u(x, y), \qquad I_{\sigma_{u,j}}(x, y) = I_{\sigma_{u,j}}(x, y-1) + i_{\sigma_{u,j}}(x, y), \qquad u = 1, \dots, d;\ j = 1, 2, 3, \tag{14}$$

the Integral Images for means and variances can be computed in one pass over the original image.
Based on Eqs. (13) and (14), the mean and variance of the j-th channel in the u-th bin can be obtained with fast array-indexing operations, with $n_u$ given by Eq. (11):

$$m_{u,j} = \frac{1}{n_u}\Bigl(I_{m_{u,j}}(x+w, y+h) - I_{m_{u,j}}(x+w, y) - I_{m_{u,j}}(x, y+h) + I_{m_{u,j}}(x, y)\Bigr),$$

$$\sigma^2_{u,j} = \frac{1}{n_u}\Bigl(I_{\sigma_{u,j}}(x+w, y+h) - I_{\sigma_{u,j}}(x+w, y) - I_{\sigma_{u,j}}(x, y+h) + I_{\sigma_{u,j}}(x, y)\Bigr) - m^2_{u,j}, \qquad u = 1, \dots, d;\ j = 1, 2, 3. \tag{15}$$
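Eqs. (12)–(15) follow the same pattern, with weighted sums in place of counts. The sketch below uses hypothetical naming and a hypothetical data layout: an image given as rows of (D_1, D_2, D_3) tuples plus a matching grid of bin labels. For one bin u it builds the Integral Images of δ_u, δ_u D_j and δ_u D_j², and then reads off $n_u$, $m_{u,j}$ and $\sigma^2_{u,j}$ for a rectangle. Because the variance is a difference of two sums, tiny negative round-off is clamped to zero; that clamp is our addition.

```python
def integral_image(values):
    """S[y][x] = sum of values over rows < y and columns < x (zero boundary)."""
    h, w = len(values), len(values[0])
    S = [[0.0] * (w + 1) for _ in range(h + 1)]
    for y in range(1, h + 1):
        row_sum = 0.0
        for x in range(1, w + 1):
            row_sum += values[y - 1][x - 1]
            S[y][x] = S[y - 1][x] + row_sum
    return S


def rect_sum(S, x, y, w, h):
    """Four-reference lookup as in Eqs. (11) and (15)."""
    return S[y + h][x + w] - S[y][x + w] - S[y + h][x] + S[y][x]


def bin_integral_images(image, labels, u, channels=3):
    """Integral Images of delta_u, delta_u*D_j and delta_u*D_j**2 (Eq. (12))."""
    delta = [[1.0 if lab == u else 0.0 for lab in row] for row in labels]
    count = integral_image(delta)
    sums = [integral_image([[px[j] * dl for px, dl in zip(irow, drow)]
                            for irow, drow in zip(image, delta)]) for j in range(channels)]
    squares = [integral_image([[px[j] ** 2 * dl for px, dl in zip(irow, drow)]
                               for irow, drow in zip(image, delta)]) for j in range(channels)]
    return count, sums, squares


def rect_mean_var(count, sums, squares, x, y, w, h):
    """n_u, means m_{u,j} and variances sigma^2_{u,j} of one bin in a rectangle (Eq. (15))."""
    n = rect_sum(count, x, y, w, h)
    if n == 0:
        return 0, None, None
    means = [rect_sum(S, x, y, w, h) / n for S in sums]
    variances = [max(0.0, rect_sum(Q, x, y, w, h) / n - m * m)
                 for Q, m in zip(squares, means)]
    return n, means, variances


image = [[(10, 20, 30), (12, 20, 34)],
         [(100, 0, 0), (14, 20, 32)]]
labels = [[0, 0],
          [1, 0]]
count, sums, squares = bin_integral_images(image, labels, u=0)
print(rect_mean_var(count, sums, squares, 0, 0, 2, 2))
```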
3.2. Object tracking algorithm
The object shape is represented by a rectangle that can move freely in the image plane and whose width and height change by the same scale factor. Given the object location (position and size) in the previous frame, an exhaustive search for the maximal mode is made over the neighboring region, whose size is twice the object size.
To adapt to scale variation, the object size is also changed by ±0.2 in scale and the exhaustive search is repeated. The candidate with the maximum similarity is adopted. The search step in the x and y directions is set to 10% of the object width and height, respectively.
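The search loop can be sketched as below. The sketch rests on several assumptions of ours. Rectangles are (x, y, w, h) with (x, y) the top-left corner. "±0.2 in scale" is read as trying the scale factors 0.8, 1.0 and 1.2. The search window is twice the previous object's width and height, centered on it and clipped to the image. The step is 10% of the previous width and height. The `score` callback is a hypothetical stand-in for evaluating ρ(p, q) of Eq. (6) through the Integral Images.

```python
def exhaustive_search(prev, img_w, img_h, score, scales=(0.8, 1.0, 1.2), step_frac=0.1):
    """Return the candidate rectangle with the highest score and that score."""
    px, py, pw, ph = prev
    cx, cy = px + pw / 2.0, py + ph / 2.0
    # Search window: twice the object size, centered on the previous location.
    x0, x1 = max(0, round(cx - pw)), min(img_w, round(cx + pw))
    y0, y1 = max(0, round(cy - ph)), min(img_h, round(cy + ph))
    step_x = max(1, round(step_frac * pw))
    step_y = max(1, round(step_frac * ph))
    best, best_score = None, float("-inf")
    for s in scales:
        w, h = max(1, round(pw * s)), max(1, round(ph * s))
        for y in range(y0, y1 - h + 1, step_y):
            for x in range(x0, x1 - w + 1, step_x):
                value = score(x, y, w, h)
                if value > best_score:
                    best, best_score = (x, y, w, h), value
    return best, best_score


def toy_score(x, y, w, h):
    """Hypothetical score peaking at the rectangle (30, 20, 24, 24)."""
    return -(abs(x - 30) + abs(y - 20) + abs(w - 24) + abs(h - 24))


print(exhaustive_search((25, 18, 20, 20), 100, 100, toy_score))  # ((29, 20, 24, 24), -1)
```

Because every candidate on the grid is scored, the result does not depend on an initial guess. The price is one score evaluation per candidate, which is why the constant-time lookups of Section 3.1 matter.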
Exhaustive search guarantees that the global maximum within the search region (at the sampled positions and scales) is found. This is an advantage over gradient-based algorithms such as mean shift, which can only reach a local maximum. Fig. 3 shows an example. In the left image the girl's face is tracked while it is occluded by the man's face nearby. The right figure shows the probability map, in which the left, global maximum corresponds to the object and the right, local maximum to the man. The convergence of gradient-ascent algorithms such as mean shift depends on the initial condition, and they may be trapped in the local maximum.
Thanks to the proposed Integral Images, the similarity measure can be evaluated at very low computational cost. Note that, for tracking, only the Integral Images of the neighboring region surrounding the object need to be computed. This is very efficient, so the algorithm runs very fast despite the brute-force search over the neighborhood.
4. Experiments
The program is written in C++ and runs on a laptop with a 1.8 GHz Intel Pentium-M 745 (Dothan) CPU and 512 MB of memory. The cluster number d is 6 in the proposed algorithm, and the mean shift algorithm is implemented with 32 × 32 × 32 bins. Both algorithms use the RGB color space. Both are initialized by hand in the first frame, and the ground truth is labelled manually.
Four measures are adopted to compare the two algorithms: the x and y coordinates and the size of the computed rectangle, as well as the area of the overlapping region between the true bounding rectangle (ground truth) and the computed one (tracking result). In addition, to evaluate how long the object is not effectively followed, we use the temporal fraction of frames in which there is no overlap between the true bounding rectangle and the computed one. In most of our experiments, the temporal fraction is zero, which means effective tracking throughout the whole sequence. Therefore, in the following, the temporal fraction is indicated explicitly only where it is not zero.
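Some of these measures are easy to state precisely in code: the overlap area of two rectangles, the temporal fraction of frames with no overlap, and the centroid errors. The sketch assumes rectangles given as (x, y, w, h) with a top-left corner. The text does not specify how the overlap area is normalized into the "area error" reported in the tables, so that step is deliberately left out. The function names are hypothetical.

```python
def overlap_area(a, b):
    """Area of the intersection of two rectangles (x, y, w, h)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    dx = min(ax + aw, bx + bw) - max(ax, bx)
    dy = min(ay + ah, by + bh) - max(ay, by)
    return max(0, dx) * max(0, dy)


def temporal_fraction(truths, results):
    """Fraction of frames in which the computed rectangle misses the ground truth."""
    if len(truths) != len(results) or not truths:
        raise ValueError("need one result per ground-truth frame")
    misses = sum(1 for t, r in zip(truths, results) if overlap_area(t, r) == 0)
    return misses / len(truths)


def centroid_errors(t, r):
    """Absolute x and y errors between the rectangle centroids."""
    return (abs((t[0] + t[2] / 2) - (r[0] + r[2] / 2)),
            abs((t[1] + t[3] / 2) - (r[1] + r[3] / 2)))


truth = [(0, 0, 10, 10)] * 4
tracked = [(0, 0, 10, 10), (20, 20, 5, 5), (5, 5, 10, 10), (30, 0, 2, 2)]
print(temporal_fraction(truth, tracked))  # 0.5
```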
Fig. 3. Exhaustive search guarantees that the global maximum is found. In the left image the girl's face is tracked while it is occluded by the man's face nearby. The right figure shows the probability map, in which the left, global maximum corresponds to the object and the right, local maximum to the man.
4.1. Person tracking
The experiment is conducted on a clip (frames 480–915) of the image sequence “ThreePastShop2cor.mpg” [18] (size: 388 × 284). Among the three subjects walking in the corridor, the one on the left dressed in red clothes is followed.
Note that from frame 260 the illumination varies, and from frames 360 to 380 another person gradually occludes the subject of interest from the left. Despite these difficulties, both the proposed algorithm and the mean shift algorithm follow the object throughout the complete sequence.
The tracking errors versus frame index are plotted in Fig. 4, and some typical tracking results of the proposed algorithm are shown in Fig. 5. The average tracking errors and times of both algorithms are given in Table 1. The mean errors of the x and y coordinates and of the scale are smaller for the proposed algorithm than for the mean shift algorithm. The standard deviations of the y coordinate and scale errors are also smaller for the proposed algorithm, whereas its standard deviation of the x coordinate error is slightly larger.
During the occlusion and the short period immediately after it (frames 360–420), the scale error of the mean shift algorithm becomes very large, as shown in the bottom-left panel of Fig. 4. In this period, the bounding rectangle computed by mean shift tends to grow and almost encloses the true one; therefore its area error becomes very small.
Fig. 4. Comparison of errors for person tracking between the mean shift algorithm (blue, dotted) and the proposed algorithm (red, solid). From left to right, top to bottom: errors of the x and y coordinates of the object centroid (pixels), scale error and error of the overlapping region area versus frame index (0–450).
Table 1 also shows the average tracking time per frame: 16 ms for the mean shift algorithm and 7 ms for the proposed algorithm, of which most (5 ms) is spent computing the Integral Images.
4.2. Human face tracking
The face image sequence (size: 256 × 192) was recorded in a typical office environment [17]. Comparisons between the two algorithms are shown in Fig. 6, and the average errors and standard deviations in Table 2. Some typical tracking results of the proposed algorithm are presented in Fig. 7. Note that the tracking errors of both algorithms on this sequence are larger than on the person sequence. This is not surprising, because the face sequence is more challenging: both the camera and the subject move, the object disappears, the illumination changes severely, and a similar object occludes the target.
From frames 140 to 165 the subject gradually turns her back towards the camera and the face becomes invisible, and in the following 100 consecutive frames the illumination changes considerably. The face becomes invisible again when the girl turns around from frames 270 to 360. While the face is invisible, both trackers deviate from the target and the errors become large, because the reference color model is built from the subject's frontal face. Since the reference color model contains some hair pixels, however, the deviation remains small and tracking recovers when the girl faces the camera again.
From frames 630 to 710 a man's face gradually occludes and then un-occludes the tracked face, and Fig. 8 shows the different behaviors of the two algorithms. When a very similar object appears nearby, two local maxima appear (cf. Fig. 3). The gradient-based mean shift is trapped in a local maximum and locks onto the man's face, and, as Fig. 6 shows, its x, y and scale errors become very large. The proposed algorithm, which performs an exhaustive search, handles this situation successfully.
As Table 2 shows, the average errors of the x and y coordinates and of the scale are all smaller for the proposed algorithm than for the mean shift algorithm.
Fig. 5. Some typical tracking results of the proposed algorithm. From left to right, top to bottom: frames 20, 80, 148, 220, 322, 369, 381 and 430.
Table 1
Comparison of tracking errors (mean ± standard deviation) and time for person tracking

| | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) | Tracking time^a (ms) |
|---|---|---|---|---|---|
| Mean shift | 2.3 ± 1.8 | 4.7 ± 2.7 | 0.12 ± 0.26 | 0.14 ± 0.15 | 16 |
| The proposed | 2.0 ± 1.9 | 3.4 ± 2.3 | 0.02 ± 0.02 | 0.15 ± 0.11 | 7 (5) |

^a The data in parentheses is the average time to compute the Integral Images.
The average tracking time is 20 ms for mean shift and 15 ms for the proposed algorithm, of which 12 ms is spent computing the Integral Images. The temporal fraction of the proposed algorithm (0.09, versus 0.22 for mean shift) is significantly smaller, which indicates that the object is successfully tracked in most frames.
4.3. Performance evaluation of the proposed algorithm vs. cluster number and color space
The cluster number in the above experiments is 6. It is interesting to see how the performance varies with the cluster number, which is shown in Table 3 for person tracking and in Table 4 for face tracking.

Fig. 6. Comparison of errors for face tracking between the mean shift algorithm (blue, dotted) and the proposed algorithm (red, solid). From left to right, top to bottom: errors of the x and y coordinates of the object centroid (pixels), scale error and error of the overlapping region area versus frame index (0–800).

Table 2
Comparison of tracking errors (mean ± standard deviation) and time for face tracking

| | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) | Tracking time^b (ms) |
|---|---|---|---|---|---|
| Mean shift | 12.0 ± 13.4 | 17.0 ± 26.0 | 0.24 ± 0.24 | 0.57 ± 0.41 (0.22) | 20 |
| The proposed | 10.2 ± 10.7 | 13.3 ± 14.1 | 0.14 ± 0.17 | 0.51 ± 0.32 (0.09) | 15 (12) |

^a The data in parentheses is the temporal fraction in which the object is not effectively tracked.
^b The data in parentheses is the average time to compute the Integral Images.
As Table 3 shows, when the cluster number increases, the scale error grows, the area error generally decreases, and the y error fluctuates. The x error increases gradually from d = 12 to 24 but remains smaller than at d = 6.
For face tracking, the y, scale and area errors at d = 12, 18 and 24 are smaller than at d = 6. The x error at d = 12 is smaller than at d = 6, whereas at d = 18 and 24 it is larger. No consistent increasing or decreasing trend is obvious, since almost every error fluctuates as d increases. In both examples, the tracking time increases significantly as the cluster number grows.
Overall, the performance of the proposed algorithm improves slightly as the cluster number increases, but at the cost of much more CPU time. This suggests that a small number of clusters is generally sufficient for the proposed algorithm to describe the color information of a target well.
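A back-of-the-envelope count helps explain the growth in computation time. With the Integral Images of Section 3.1, each bin needs one count plane plus one sum plane and one sum-of-squares plane per channel, i.e. 7 planes per bin in RGB. The work of building them therefore grows linearly with d, and the Integral Image times in Table 3 (5, 12, 19 and 26 ms for d = 6, 12, 18 and 24) are consistent with roughly linear growth. The 80 × 80 search region and the function name below are hypothetical.

```python
def integral_image_entries(d, region_w, region_h, channels=3):
    """Planes and array entries needed for Eqs. (9) and (12) over a region."""
    planes = d * (1 + 2 * channels)  # delta_u, delta_u*D_j, delta_u*D_j**2
    entries = planes * (region_w + 1) * (region_h + 1)  # zero row/column included
    return planes, entries


for d in (6, 12, 18, 24):
    planes, entries = integral_image_entries(d, 80, 80)
    print(f"d = {d:2d}: {planes:3d} planes, {entries:,} entries")
```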
For the sake of a simple color model and computational efficiency, we assume that the gray-level distributions of the different RGB channels are independent. Although correlations exist between the channels of the RGB space, the experiments in Sections 4.1, 4.2 and 4.4 show that the color model works well under this assumption. Using the full covariance matrix may improve the performance of the algorithm, but at the cost of a large increase in computational load, since no fast algorithm for it is currently available.

Fig. 8. Comparison of the two algorithms when a similar object occludes the subject. Top row: results of the proposed algorithm; bottom row: results of the mean shift algorithm. From left to right: frames 630, 662, 670, 690 and 700.

Table 3
Performance vs. cluster number using the proposed algorithm for person tracking

| Cluster number d | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) | Tracking time^a (ms) |
|---|---|---|---|---|---|
| 6 | 2.01 ± 1.94 | 3.37 ± 2.34 | 0.021 ± 0.021 | 0.153 ± 0.109 | 7 (5) |
| 12 | 1.75 ± 1.57 | 4.30 ± 2.41 | 0.039 ± 0.097 | 0.132 ± 0.100 | 15 (12) |
| 18 | 1.78 ± 1.48 | 2.78 ± 2.45 | 0.060 ± 0.093 | 0.057 ± 0.036 | 24 (19) |
| 24 | 1.83 ± 1.55 | 3.62 ± 2.75 | 0.074 ± 0.101 | 0.064 ± 0.039 | 32 (26) |

^a The data in parentheses is the average time to compute the Integral Images.

Fig. 7. Some typical tracking results of the proposed algorithm. From left to right, top to bottom: frames 1, 90, 160, 260, 320, 378, 460 and 700.
It is interesting to examine the performance of the proposed algorithm in other color spaces with greater channel separation, particularly YCbCr, CIELAB and HSV. For person tracking, as Table 5 shows, the y error in YCbCr space is larger than in RGB space while the other three errors are smaller, and almost all errors increase in the CIELAB and HSV spaces. For the more challenging face sequence, as shown in Table 6, the tracker fails in both the YCbCr and HSV spaces: the object is lost from frame 370 in the former and from frame 150 in the latter, and it is never recovered. In CIELAB space, the x and y errors are larger than in RGB space, whereas the scale error, the area error and the temporal fraction are smaller.
The experiments above suggest that, among several factors including our independence assumption, illumination and appearance changes may play the dominant role in how a tracking algorithm performs in different color spaces. To handle this problem, some researchers have investigated how to dynamically select the best color space from many candidates [16], or the best color features based on linear combinations of the channels of a color space [3].
4.4. More tracking results
Further experiments were conducted to test the performance of the algorithm on image sequences covering different scenarios: sequences 1 and 2 involve vehicle tracking, and sequences 3 and 4 pedestrian tracking. The tracking results are summarized in Table 7.
In sequence 1 (frames 560 to 760, size: 768 × 576) [9], a car accelerating along the highway was filmed from behind with a camera installed in another vehicle following it. In this scenario, both the foreground and the background are moving, and the appearance changes are non-trivial as the tracked car moves farther and farther away. As the first column of Table 7 shows, the x and scale errors of the proposed algorithm are smaller than those of mean shift, while its y and area errors are larger. On average, mean shift takes about 20 ms per frame, compared with 16 ms for the proposed algorithm.

Table 4
Performance vs. cluster number using the proposed algorithm for face tracking

| Cluster number d | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) | Tracking time^b (ms) |
|---|---|---|---|---|---|
| 6 | 10.15 ± 10.71 | 13.30 ± 14.07 | 0.143 ± 0.173 | 0.514 ± 0.316 (0.094) | 15 (12) |
| 12 | 8.85 ± 10.51 | 10.74 ± 11.16 | 0.098 ± 0.147 | 0.386 ± 0.315 (0.011) | 35 (32) |
| 18 | 10.51 ± 9.58 | 11.34 ± 13.15 | 0.097 ± 0.121 | 0.427 ± 0.319 (0.040) | 50 (44) |
| 24 | 10.98 ± 1.68 | 9.79 ± 10.34 | 0.115 ± 0.162 | 0.409 ± 0.329 (0.045) | 67 (61) |

^a The data in parentheses is the temporal fraction in which the object is not effectively tracked.
^b The data in parentheses is the average time to compute the Integral Images.

Table 5
Performance vs. color space using the proposed algorithm for person tracking (cluster number is 6)

| Color space | X error (pixels) | Y error (pixels) | Scale error (%) | Area error (%) |
|---|---|---|---|---|
| RGB | 2.01 ± 1.94 | 3.37 ± 2.34 | 0.021 ± 0.021 | 0.153 ± 0.109 |
| YCbCr | 1.52 ± 1.67 | 3.81 ± 2.54 | 0.019 ± 0.079 | 0.130 ± 0.108 |
| CIELAB | 2.76 ± 2.94 | 4.03 ± 2.18 | 0.008 ± 0.010 | 0.200 ± 0.146 |
| HSV | 2.58 ± 2.23 | 5.99 ± 3.77 | 0.036 ± 0.035 | 0.324 ± 0.217 |

Table 6
Performance vs. color space using the proposed algorithm for face tracking (cluster number is 6)

| Color space | X error (pixels) | Y error (pixels) | Scale error (%) | Area error^a (%) |
|---|---|---|---|---|
| RGB | 10.15 ± 10.71 | 13.30 ± 14.07 | 0.143 ± 0.173 | 0.514 ± 0.316 (0.224) |
| YCbCr | n/a | n/a | n/a | n/a |
| CIELAB | 12.25 ± 11.70 | 18.57 ± 21.77 | 0.109 ± 0.142 | 0.513 ± 0.319 (0.165) |
| HSV | n/a | n/a | n/a | n/a |

^a The data in parentheses is the temporal fraction in which the object is not effectively tracked.
In the second sequence (frames 990 to 1350, size: 768 × 576) [10], a hatchback enters the view from the left, moves forward along the road past a row of parked vehicles, and finally reverses into a parking slot. In this situation, the nearby parked cars, which are similar in appearance to the hatchback, pose a challenge to the trackers. The second column of Table 7 shows that the tracking results of the proposed algorithm are better than those of mean shift except for the y coordinate. Mean shift takes about 21 ms per frame and the proposed algorithm 19 ms.
The scenario in sequence 3 (frames 208 to 430, size: 720 × 576) [11] is a train station hall. A person walks quickly to the exit of the hall, away from the camera, and the fast motion causes severe motion blur in the object's appearance. As the third column of Table 7 indicates, both algorithms have the same scale error. The y error of the proposed algorithm is smaller than that of mean shift, but its x error is larger. The area error of mean shift is smaller mainly because, when the illumination changes from about frame 380 on, its computed bounding rectangle tends to grow and almost encloses the true one. The average tracking time is 23 ms for mean shift and 10 ms for the proposed algorithm.
In sequence 4 (frames 126 to 280, size: 720 × 576) [12], a lady walks straight from left to right. During frames 230 to 250, a man walking past partially occludes her. In this scenario, all tracking errors of the proposed algorithm are smaller than those of mean shift. The average time per frame is 17 ms for the proposed algorithm and 25 ms for mean shift.
5. Conclusions
In this paper, a color model based on K-means clustering is proposed, in which the color space is partitioned adaptively and the histogram bins are determined accordingly. Moreover, the distribution of the multi-channel gray levels is modelled within each bin to capture more information on object color. To measure the similarity between two color models, a similarity measure based on the Bhattacharyya distance is defined and its simplified form is derived.
Thanks to the proposed Integral Images, the tracking algorithm can search exhaustively yet efficiently for the global maximal mode in the neighboring region. Comparisons with the well-known mean shift algorithm show that the proposed algorithm performs better at the same or lower computational cost.
ARTICLE IN PRESS
Table 7
Comparisons of tracking results on different image sequences

Measure       Algorithm    Sequence 1        Sequence 2      Sequence 3      Sequence 4
X error       Mean shift   3.6 ± 3.7         8.0 ± 7.1       2.2 ± 2.0       8.8 ± 9.6
              Proposed     2.9 ± 2.5         6.0 ± 4.6       3.2 ± 2.2       7.1 ± 7.5
Y error       Mean shift   1.6 ± 1.6         4.0 ± 3.0       3.0 ± 2.4       10.1 ± 14.3
              Proposed     1.8 ± 1.6         8.0 ± 7.5       2.6 ± 2.2       9.1 ± 6.7
Scale error   Mean shift   0.0278 ± 0.0314   0.041 ± 0.026   0.021 ± 0.022   0.118 ± 0.087
              Proposed     0.0114 ± 0.0120   0.026 ± 0.018   0.021 ± 0.022   0.035 ± 0.050
Area error    Mean shift   0.0554 ± 0.0393   0.319 ± 0.124   0.280 ± 0.118   0.301 ± 0.370
              Proposed     0.187 ± 0.0951    0.315 ± 0.109   0.411 ± 0.085   0.255 ± 0.271
Time^a        Mean shift   20                21              23              25
              Proposed     16 (14)           19 (16)         10 (8)          17 (15)

Units: X and Y errors in pixels; scale and area errors in %; tracking time in ms.
^a The values in parentheses are the average times to compute the Integral Images.
686 L. Peihua / Signal Processing: Image Communication 21 (2006) 676–687

Currently, the number of bins is set empirically,
and this setting was adequate for all our experiments.
Nevertheless, it is desirable to determine the number
of bins automatically, so as to account for illumination
changes or noise while retaining good discriminative
power. Once the Integral Images are computed, the
color model can be evaluated very quickly. The
Integral Images together with the color model can
therefore be applied to tasks that require a
brute-force yet efficient search, such as object
detection and sub-image retrieval, which we leave
to future work.
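The claim that the color model "can be evaluated very fast" once the Integral Images exist rests on a general mechanism that a minimal Python sketch can illustrate. It assumes every pixel has already been assigned a bin label (for example, its nearest K-means centre) and keeps one summed-area table per bin, in the spirit of the integral image of Viola and Jones [19] and the integral histogram of Porikli [13]; the bin counts of any candidate window then cost four table look-ups per bin, whatever the window size. The sketch keeps the window size fixed, does not search over scale, and scores candidates with the plain Bhattacharyya coefficient. It covers bin counts only and does not reproduce the paper's own Integral Images or its full color model; all names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Table = List[List[int]]


def integral_images(labels: Sequence[Sequence[int]], n_bins: int) -> List[Table]:
    """One summed-area table per bin: ii[b][y][x] counts pixels with label b in
    rows < y and columns < x, so every table has an extra zero row and column."""
    h, w = len(labels), len(labels[0])
    ii = [[[0] * (w + 1) for _ in range(h + 1)] for _ in range(n_bins)]
    for b, t in enumerate(ii):
        for y in range(h):
            row_sum = 0  # running count of label b in row y, columns 0..x
            for x in range(w):
                row_sum += 1 if labels[y][x] == b else 0
                t[y + 1][x + 1] = t[y][x + 1] + row_sum
    return ii


def region_histogram(ii: List[Table], x0: int, y0: int, w: int, h: int) -> List[float]:
    """Normalised bin counts of the w-by-h box whose top-left pixel is (x0, y0);
    four look-ups per bin, independent of the box size."""
    x1, y1 = x0 + w, y0 + h
    counts = [t[y1][x1] - t[y0][x1] - t[y1][x0] + t[y0][x0] for t in ii]
    return [c / float(w * h) for c in counts]


def bhattacharyya_coefficient(p: Sequence[float], q: Sequence[float]) -> float:
    return sum(math.sqrt(a * b) for a, b in zip(p, q))


def exhaustive_search(ii: List[Table], ref: Sequence[float], prev_x: int, prev_y: int,
                      w: int, h: int, radius: int) -> Tuple[float, int, int]:
    """Score every box position within +/- radius of the previous one and return
    (best coefficient, x, y). The box size is fixed; scale is not searched."""
    img_h, img_w = len(ii[0]) - 1, len(ii[0][0]) - 1
    best = (-1.0, prev_x, prev_y)
    for y in range(max(0, prev_y - radius), min(img_h - h, prev_y + radius) + 1):
        for x in range(max(0, prev_x - radius), min(img_w - w, prev_x + radius) + 1):
            rho = bhattacharyya_coefficient(ref, region_histogram(ii, x, y, w, h))
            if rho > best[0]:
                best = (rho, x, y)
    return best


if __name__ == "__main__":
    labels = [[0] * 8 for _ in range(8)]  # 8x8 frame, background in bin 0
    for y in (4, 5):
        for x in (5, 6):
            labels[y][x] = 1              # 2x2 object in bin 1, top-left at (5, 4)
    ii = integral_images(labels, n_bins=2)
    print(exhaustive_search(ii, [0.0, 1.0], prev_x=3, prev_y=3, w=2, h=2, radius=3))
    # prints (1.0, 5, 4)
```

Building the tables costs O(B·H·W) per frame for B bins and an H-by-W frame; after that, each of the (2r + 1)^2 candidate positions costs only O(B), which is why the search can afford to be exhaustive within the neighborhood.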
Acknowledgments
The work was supported by the National Natural
Science Foundation of China (NSFC) under Grant
Number 60505006, Natural Science Foundation of
Hei Long Jiang Province (F200512), Science and
Technology Research Project of Educational Bureau
of Hei Long Jiang Province (1151G033),
Postdoctoral Fund for Scientific Research of Hei
Long Jiang Province (LHK-04093) and Science
Fund of Hei Long Jiang University for Distinguished
Young Scholars (JC200406).
References
[1] S.T. Birchfield, S. Rangarajan, Spatiograms versus histograms
for region-based tracking, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern
Recognition, San Diego, CA, USA, June 2005,
pp. 1158–1163.
[2] H.-D. Cheng, X.-H. Jiang, Y. Sun, J. Wang, Color image
segmentation: advances and prospects, Pattern Recognition
34 (12) (2001) 2259–2281.
[3] R. Collins, Y. Liu, On-line selection of discriminative
tracking features, in: Proceedings of the IEEE International
Conference on Computer Vision, Nice, France, 2003, pp. 346–352.
[4] R.T. Collins, Mean-shift blob tracking through scale space,
in: Proceedings of the IEEE Conference on Computer Vision
and Pattern Recognition, 2003, pp. 234–241.
[5] D. Comaniciu, V. Ramesh, P. Meer, Real-time tracking of
non-rigid objects using mean shift, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition,
2000, pp. 142–149.
[6] A. Gersho, R. Gray, Vector Quantization and Signal
Compression, Kluwer Publishers, Dordrecht, 1992.
[7] A.K. Jain, R.C. Dubes, Algorithms for Clustering Data,
Prentice-Hall, Englewood Cliffs, NJ, 1988.
[8] A.K. Jain, M. Murty, P. Flynn, Data clustering: a review,
ACM Comput. Surv. 31 (3) (1999) 264–323.
[9] PETS2001 datasets, The University of Reading, UK, found
at URL: <http://peipa.essex.ac.uk/ipa/pix/pets/PETS2001/DATASET5/TESTING/CAMERA1_JPEGS/>.
[10] PETS2001 datasets, The University of Reading, UK, found
at URL: <http://peipa.essex.ac.uk/ipa/pix/pets/PETS2001/DATASET5/TRAINING/CAMERA1_JPEGS/>.
[11] PETS 2006 dataset S7 camera 4, ISCAPS consortium, found
at URL: <http://ftp.cs.rdg.ac.uk/PETS2006/S3-T7-A.zip>.
[12] PETS 2006 dataset S7 camera 3, ISCAPS consortium, found
at URL: <http://ftp.cs.rdg.ac.uk/PETS2006/S3-T7-A.zip>.
[13] F. Porikli, Integral histogram: a fast way to extract
histograms in Cartesian spaces, in: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition,
San Diego, CA, USA, 2005, pp. 829–836.
[14] Y. Raja, S.J. McKenna, S. Gong, Colour model selection
and adaptation in dynamic scene, in: Proceedings of the
European Conference on Computer Vision, 1998,
pp. 460–474.
[15] C. Stauffer, W.E. Grimson, Learning patterns of activity
using real-time tracking, IEEE Trans. Pattern Anal.
Machine Intell. 22 (8) (2000) 747–757.
[16] H. Stern, B. Efros, Adaptive color space switching for
tracking under varying illumination, Image Vision Comput.
23 (3) (2005) 353–364.
[17] Test image sequences for face tracking by Stan Birchfield,
found at URL: <http://vision.stanford.edu/birch/headtracker/seq/>.
[18] The EC Funded CAVIAR project/IST 2001 37540, found at
URL: <http://homepages.inf.ed.ac.uk/rbf/CAVIAR/>.
[19] P. Viola, M. Jones, Rapid object detection using a boosted
cascade of simple features, in: Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition,
2001, pp. 511–518.
[20] H. Wang, P. Li, T. Zhang, Proposal of novel histogram
features for face detection, in: International Conference on
Advances in Pattern Recognition, Bath, UK, 2005,
pp. 334–343.
[21] C. Wren, A. Azarbayejani, T. Darrell, A.P. Pentland,
Pfinder: real-time tracking of the human body, IEEE Trans.
Pattern Anal. Machine Intell. 19 (7) (1997) 780–785.
[22] C. Yang, R. Duraiswami, L. Davis, Efficient mean-shift
tracking via a new similarity measure, in: Proceedings of the
IEEE Conference on Computer Vision and Pattern Recognition,
2005, pp. 176–183.
[23] Q. Zhao, H. Tao, Object tracking using color correlogram,
in: IEEE Workshop on VS-PETS, 2005.