Spatial color histogram based center voting method for subsequent object tracking

and segmentation

Suryanto, Dae-Hwan Kim, Hyo-Kak Kim, Sung-Jea Ko ⁎
School of Electrical Engineering, Korea University, Seoul, South Korea
a b s t r a c t a r t i c l e i n f o
Article history:
Received 13 August 2010
Received in revised form 2 September 2011
Accepted 23 September 2011
Keywords:
Object tracking
Spatial color
Histogram
Center voting
Back projection
Generalized Hough transform
In this paper, we introduce an algorithm for object tracking in video sequences. In order to represent the ob-
ject to be tracked, we propose a new spatial color histogram model which encodes both the color distribution
and spatial information. Using this spatial color histogram model, a voting method based on the generalized
Hough transform is employed to estimate the object location from frame to frame. The proposed voting based
method, called the center voting method, requests every pixel near the previous object center to cast a vote
for locating the new object center in the new frame. Once the location of the object is obtained, the back pro-
jection method is used to segment the object from the background. Experiment results showsuccessful track-
ing of the object even when the object being tracked changes in size and shares similar color with the
background.
© 2011 Elsevier B.V. All rights reserved.
1. Introduction
The tracking of moving objects from frame to frame in real time
video sequences captured by the moving camera is a highly challeng-
ing task. The conventional background subtraction techniques cannot
be employed to locate the moving object in such video sequences due
to the constant changes of the background scene. Additional chal-
lenges come from complex object motion, non-rigid object tracking,
partial occlusion, illumination change, and real time processing re-
quirement. Despite these difficulties, many tracking algorithms have
been proposed over past decades. For a comprehensive review of var-
ious tracking algorithms, the readers can refer to [1].
Two important aspects that determine the performance of the track-
ing algorithms are target representation and target localization. Target
representation refers to how the object to be tracked is modeled and
target localization deals with how the search of the corresponding ob-
ject in the following frame is accomplished. Popular models used for
target representation are object contour [2,3], feature point [4–7], and
color histogram [8–10]. Depending on the target representation
model, various target localization techniques can be employed.
Tracking with object contour performs well even when the object
being tracked is not rigid. The CONDENSATION algorithm [2] parameter-
izes the contour using the B-Spline curve and performs tracking using the
particle filtering method. At each frame, a total of N particles, with each
particle represented by one B-Spline curve, have to be maintained and
updated based on the local edge map. In general, a high number of parti-
cles are required to maintain a good tracking performance. The computa-
tion requirement of this algorithm is relatively high, which limits its
application in simultaneous multiple object tracking. Another algorithm
for object contour tracking represents the object contour using two linked
lists and a level set array [3]. Contour adaptationis realizedby performing
switching on elements of the linked lists. At each frame, the elements of
the linked lists are adjusted to fit the object contour. The computation
complexity of this algorithm is relatively low, but the non-parametric
representation of the contour makes the contour unconstrained. As the
object moves into the backgroundwhichshares similar color withthe ob-
ject, the object contour expands to include this backgroundregion, result-
ing in tracking failure.
Tracking using feature points produces good results when the
object has rich textures. For a good feature point, the iterative Newton
Raphson minimization algorithm can be employed to find its corre-
sponding point in the next frame [5,11]. Tracking with feature points
is fast andreliable. However, when the object turns around or is partially
occluded, the performance of the tracking algorithm deteriorates.
The use of the color histogram for target representation has been
increasingly popular due to its robustness against object pose changes
and partial occlusion. Bradski proposed an algorithm called CAM-
SHIFT [8] which tracks the face in video sequences using the color his-
togram of the skin. In order to locate the face from frame to frame, an
Image and Vision Computing 29 (2011) 850–860
☆ This paper has been recommended for acceptance by Richard Bowden.
⁎ Corresponding author at: School of Electrical Engineering, Korea University, Anam-
dong, Sungbuk-Gu, Seoul, 136-713, South Korea. Tel.: +82 2 3290 3228; fax: +82 2
925 5883.
E-mail addresses: suryanto@dali.korea.ac.kr (Suryanto), dhkim@dali.korea.ac.kr
(D.-H. Kim), hkkim@dali.korea.ac.kr (H.-K. Kim), sjko@korea.ac.kr (S.-J. Ko).
0262-8856/$ – see front matter © 2011 Elsevier B.V. All rights reserved.
doi:10.1016/j.imavis.2011.09.008
Contents lists available at SciVerse ScienceDirect
Image and Vision Computing
j our nal homepage: www. el sevi er . com/ l ocat e/ i mavi s
iterative procedure based on the mean shift is applied to center the
object rectangle in the face region. At each iteration, the rectangle po-
sition is moved to a new position until convergence. Even though the
Bradski's algorithm was developed for face tracking, it can be used to
track any object of interest. Comaniciu et al. proposed the Kernel
Based Tracking (KBT) algorithm which uses the kernel weighted
color histogram to represent the color distribution of the object [9].
In the kernel weighted color histogram representation, the pixels lo-
cated near the object boundary are given smaller weights while the
pixels around the center of the object are assigned larger weights.
The object localization is performed iteratively by a mean shift meth-
od similar to CAMSHIFT. The use of the kernel weighted histogram for
the mean shift based algorithms significantly improves the tracking
performance. However, the algorithm does not perform well when
the object being tracked changes in size.
In this paper, we introduce a newobject representation model and
localization method. In order to represent the object to be tracked, we
propose a new class of spatial color histogram model, by adopting the
concept of the spatiogram[10]. Each bin in the spatial color histogram
model contains the information on the number of pixels belonging to
the color bin and the positions of those pixels relative to the object
center. The localization of the object in the next and following frames
is accomplished using the center voting method based on the gener-
alized Hough transform voting scheme [12]. Once the object location
is obtained, the back projection method is utilized to segment the ob-
ject from the background. With the segmented object, the current ob-
ject size is estimated and the search range is adjusted accordingly.
This paper is organized as follows. In the next section, we briefly
review the generalized Hough transform and the spatiogram which
are closely related to our proposed algorithm. Then, we present
our algorithm in detail in Section 3. Experiment results are given in
Section 4. Section 5 concludes the content of the paper.
The preliminary version of this work has been published in [13].
We extend our previous work by utilizing the kernel to create a
more reliable model and by introducing the spatial color histogram
update mechanism.
2. Related works
2.1. The generalized Hough transform
The generalized Hough transform has been widely used to detect
the shape in the image [12]. It consists of two main parts; the con-
struction of the R-Table to represent the shape and the voting scheme
to detect the shape in the image.
2.1.1. Construction of the R-Table
Given an arbitrary shape as in Fig. 1(a), for each point x

¼ x; y ð Þ at
the boundary of the shape, the gradient direction φ and its relative
position r

¼ r
x
; r
y
_ _
from a reference point c

¼ c
x
; c
y
_ _
are comput-
ed. Then, the relative position vectors r

¼ x

− c

are stored as a
function of the gradient direction in the R-Table. In general, a gradient
direction φ may have many values of r

. Table 1 shows a general form
of the R-Table.
2.1.2. Voting scheme for the shape detection
For each edge pixel x

in the image in Fig. 1(b), its gradient direc-
tion φ=φ′ is calculated. Then, each pixel casts votes into the vote ac-
cumulator at positions x

− r

where r

is the set of all position
vectors indexed by φ′ in the R-Table. Fig. 1(c) shows the vote accu-
mulator for the detection of the shape in Fig. 1(a) from the test
image in Fig. 1(b). The pixel with the highest intensity, i.e. the pixel
with the highest vote, indicates the location of the shape.
In Section 3, we show how to adopt this generalized Hough trans-
form technique to track the object from frame to frame in a video se-
quence. Note that the underlying idea behind the generalized Hough
transform has also been applied successfully to the object category
detection [14].
2.2. Tracking by spatiogram
The spatiogram [10] represents the object to be tracked as
h ¼ n
b
; μ
b
; Σ
b
f g
b¼1;…;B
; ð1Þ
where n
b
is the number of object pixels whose quantized values fall
into the b-th bin, and μ
b
and Σ
b
are the mean vector and the covari-
ance matrix, respectively, of the coordinates of those pixels. The num-
ber B is the total number of bins in the spatiogram.
The object localization is performed by determining the location
yεR
2
in the image where the similarity between the spatiogram of
the object h and the spatiogram at location y, h(y)={n
b
(y), μ
b
(y), Σ
b
(y)}, is maximized. The spatiogram h(y) is calculated from an image
region whose center is at location y and has the same size as the ob-
ject to be tracked.
The similarity between two spatiograms is computed as the
weighted sum of the Bhattacharyya similarity between two histo-
grams
ρ h y ð Þ; h ð Þ ¼ ∑
B
b¼1
ψ
b
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
n
b
y ð Þn
b
_
; ð2Þ
a b c
Fig. 1. Geometry for the generalized Hough transform. (a) Model shape. (b) Test image. (c) Vote accumulator.
Table 1
The R-Table of an arbitrary shape.
Gradient direction Positions
0 r

_ ¸
¸
¸ r

¼ x

− c

; ϕ x ð Þ ¼ 0g
Δϕ r

_ ¸
¸
¸ r

¼ x

− c

; ϕ x ð Þ ¼ Δϕg
2Δϕ r

_ ¸
¸
¸ r

¼ x

− c

; ϕ x ð Þ ¼ 2Δϕg
… …
851 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
where the weighting function is given by
ψ
b
¼ η exp −
1
2
μ
b
y ð Þ−μ
b
ð Þ
T
Σ

−1
b
μ
b
y ð Þ−μ
b
ð Þ
_ _
; ð3Þ
with η as the Gaussian normalization constant and Σ

b
−1
¼
Σ
−1
b
y ð Þ þΣ
−1
b
. The location y that maximizes the similarity function
in Eq. (2) is determined either by using the gradient descent mean
shift method or the exhaustive local search. In general, the mean
shift based localization is several order faster than the exhaustive
local search.
In our algorithm, we modify this spatiogram to construct a table
similar to the R-Table and then use the voting mechanism of the gen-
eralized Hough transform to find the location of the object to be
tracked. We describe our algorithm in detail in the next section.
3. Proposed algorithm
Given an object centered at location c

¼ c
x
; c
y
_ _
in the current
frame, the tracking objective is to find the new object center c


in
the next and the following frames. The object to be tracked is referred
to as the target object.
Since our main objective is to track the object, we assume that at
the initial frame, the object has been segmented out from the back-
ground. This object segmentation can be done at the initial frame by
performing the background subtraction algorithms [15–17] or even
by manual selection by a human operator.
3.1. Target representation
Let x

i
¼ x
i
; y
i
ð Þ
_ _
i¼1…N
be the location of the pixels belonging to
the target object, c

be the location of the object center, and l
x
and l
y
be half the widthand height of the rectangle bounding the object region
as shown in Fig. 2(a). In order to represent the target object, we use the
spatial color histogram model h ¼ μ

b;k ð Þ
; n
b;k ð Þ
_ _
b¼1;…;B;k¼1;2;…
, where
μ

b;k ð Þ
is the mean vector representing the position of the k-th cluster
of pixels relative to the object center. A cluster is defined as a group of
pixels whose quantized values fall into the same b-th bin of the histo-
gram and which are located close to each other. The n
(b, k)
is the proba-
bility value associated with the number of pixels belonging to that
particular cluster. Mathematically,
n
b;k ð Þ
¼ C ∑
N
i¼1
K x

i
_ _
δ
ibk
; ð4Þ
where C is the normalization constant to ensure that ∑
b;k
n
b;k ð Þ
¼ 1
which is given by
C ¼
1

N
i¼1
K x

i
_ _
; ð5Þ
and δ
ibk
=1 if the value of pixel x

i
is quantized into the b-th bin and its
distance from the μ

b;k ð Þ
is smaller than threshold ε, otherwise δ
ibk
=0.
The pseudo-code for estimating ěcμ
(b, k)
is given in Algorithm 1.
Algorithm 1. Estimation pf μ

b;k ð Þ
Input: pixel location x

i
of the target object
Output: cluster of pixels μ

b;k ð Þ
for each x

i
:
1. Calculate the bin index b of pixel x

i
.
2. Create a new cluster μ

b;1 ð Þ
¼ x

i
if there are no cluster in the b-
th bin.
3. Otherwise, for each k:
(a) calculate the distance of x

i
from μ

b;k ð Þ
,
(b) include x

i
to μ

b;k ð Þ
if the distance is smaller than ε,
(c) update μ

b;k ð Þ
.
4. Create a new cluster μ

b;k ð Þ
¼ x

i
if x

i
is not included in any
existing clusters.
As in [9], here we also employ a monotonically decreasing kernel
function K x

i
_ _
to assign smaller weights to pixels located farther
away from the object center; because these pixels are less reliable,
a b c
Fig. 3. Illustration of object tracking using the proposed algorithm. (a) The target object. (b) Center voting procedure. (c) Tracking result.
a b c d
Fig. 2. (a) The target object. (b) Object region and background region. (c) 2-D object kernel. (d) 2-D background kernel.
852 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
since they are often affected by occlusion or interference from the
background. The kernel function K x

i
_ _
can be written as
K x

i
_ _
¼
1−d
2
x

i
_ _
; if d x

i
_ _
≤1;
0; otherwise;
_
ð6Þ
where d x

i
_ _
is the standardized Euclidean distance between pixel
x

i
and the object center c

, calculated by first normalizing (x
i
−c
x
)
and (y
i
−c
y
) by l
x
and l
y
, respectively, i.e.:
d x

i
_ _
¼
ffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi
x
i
−c
x
l
x
_ _
2
þ
y
i
−c
y
l
y
_ _
2
¸
¸
¸
_
: ð7Þ
The standardized Euclidean distance in Eq. (7) is similar to the equa-
tion of the ellipse. In fact, if d x

i
_ _
¼ 1, we obtain an ellipse centered
on c

with the semi-minor axis l
x
and the semi-major axis l
y
as shown
in Fig. 2(b). For pixels located outside this ellipse, which are mostly the
background pixels, their standardized distances from the object center
are greater than one, and consequently their kernel weights K(cx
i
) be-
come zero. On the other hand, for pixels inside the ellipse, their standard-
ized distances are smaller than one. The kernel function K x

i
_ _
assigns
larger weights to the pixels located closer to the center as shown in
Fig. 2(c).
Unlike the spatiogram which has a single mean and a single co-
variance for each bin, our spatial color histogram model allows each
histogram bin to have more than one mean vector. Each mean vector
represents a cluster of pixels sharing a similar color which are located
in close proximity to each other.
For example, consider a target object in Fig. 3(a). The alphabets in-
dicate the color histogram bins associated with the pixels. The spatial
color histogram for this object is shown in Table 2. For simplicity, we
use auniform kernel K x

i
_ _
¼ 1 for this particular example. Note that
the two pixels belonging to the same color histogram bin a are not
clustered together due to their spatial distance. Thus, the correspond-
ing histogram bin has two mean vectors. On the other hand, since the
two pixels with bin index e are adjacent to each other, they are
grouped into the same cluster with a single mean.
Table 3 shows that the proposed spatial color histogram model
has a structure similar to the R-Table. In the next subsection, we
show how to use this spatial color histogram model to locate the tar-
get object in the following frames.
3.2. Target localization
Target localization consists of center voting and back projection
steps. In the center voting step, every pixel located near the previous
object center c

is required to cast a vote for the location of the object
center. Rules for the center voting procedure are as follows:
1. Only the pixel whose color exists in the target model can cast a vote.
2. Each pixel x

i
, whose quantized value falls into b-th bin, casts a
vote on position x

i
−μ

b;k ð Þ
_ _
.
3. More reliable pixels cast votes with higher weights than less reli-
able pixels.
Fig. 3(b) illustrates the center voting procedure. The arrows indi-
cate where the pixels cast their votes on. The pixels labeled × are
the pixels whose colors do not fall into any of the bins of the target
model histogram. Thus, these pixels do not cast any vote on the loca-
tion of the object center. Pixels located at x

i
labeled b, c, and e cast
their votes on the position indicated by their mean vectors at
x

i
− 0; −1 ð Þ
_ _
, x

i
− 1; −1 ð Þ
_ _
, and x

i
− −0:5; 1 ð Þ
_ _
, respectively.
Since bin a contains two mean vectors, pixels whose colors fall into
this bin cast two votes. It can be seen that the location in the image
receiving the highest number of votes is located at pixels labeled d,
which is the new location of the object c


as shown in Fig. 3.
In this example, we have made a very naive assumption that the
background does not share the same color with the object. The as-
sumption, of course, does not hold in most cases, which causes the al-
gorithm to fail to correctly estimate the location of the object center.
This problem can be solved by adding the third rule, i.e. reliable pixels
cast votes with higher weights.
Naturally, a pixel is regarded as a reliable pixel if its color exists in
the object, but not in the background. A straightforward way to quan-
tify the reliability of a pixel is to use the probability difference as
employed in [18]. Thus, the voting weight w
(b, k)
for a vote casted
using mean vector μ

b;k ð Þ
by a pixel whose color belongs to the b-th
bin of the histogram can be expressed as
w
b;k ð Þ
¼ max
n
b;k ð Þ
−m
b
n
b;k ð Þ
þ m
b
; 0
_ _
; ð8Þ
where n
(b, k)
is given in Eq. (4) and m
b
is the probability value associ-
ated with the number of pixels in the background whose quantized
values fall into the b-th bin.
Redefine x

i
¼ x
i
; y
i
ð Þ
_ _
i¼1…N
as the pixels in the background re-
gion, located between the ellipse with semi-minor axis l
x
and semi-
major axis l
y
and the ellipse with semi-minor axis l
x
+Δ and semi-
major axis l
y
+Δ, both centered at c

as shown in Fig. 2(b). The prob-
ability value m
b
, which is also the value of the background histogram
at b-th bin, can be calculated by
m
b
¼ C
BG

N
i¼1
K
BG
x

i
_ _
δ
ib
; ð9Þ
Table 2
Spatial color histogram for the target object in Fig. 3(a).
bin index h ¼ μ

b;k ð Þ
; n
b;k ð Þ
_ _
a {(−1,−1),1/7}, {(1,1),1/7}
b {(0,−1),1/7}
c {(1,−1),1/7}
d {(0, 0),1/7}
e {(−0.5, 1),2/7}
Table 3
Attributes of the R-Table and the proposed model.
R-Table Our proposed model
Reference point c

Object center c

Gradient direction ϕ Histogram bins b
Positions r

Mean vectors μ

b;k ð Þ
Table 4
The parameters.
Param Meaning Suggested value
Object motion related parameters
η Speed of the object For most tracking application: 2. If the object
displacement between frames is larger than
the object size, then 3 or larger.
ε Spatial clustering
threshold
5
ζ Object rigidity threshold 10
Model update related parameters
α Object size update ratio 0.1
β Background model update
ratio
0.5
γ Object model update ratio 0.1
ξ Pruning threshold 0.00001
853 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
where
C
BG
¼
1

N
i¼1
K
BG
x

i
_ _
is a normalization constant and δ
ib
=1 if the value of pixel x

i
is quan-
tized into the b-th bin, otherwise δ
ib
=0. K
BG
x

i
_ _
is the background
kernel function as used in [18], which is given by
K
BG
x

i
_ _
¼
1−λ σ−d x

i
_ _ _ _
2
; if d x

i
_ _
≥1
_ _
and d
Δ
x

i
_ _
≤1
_ _
;
0; otherwise;
_
ð10Þ
where
σ ¼
1
2
1 þ
d x

i
_ _
d
Δ
x

i
_ _
_
_
_
_
; ð11Þ
λ ¼
1
σ−1 ð Þ
2
; ð12Þ
and d
Δ
x

i
_ _
is the standardized Euclidean distance as in Eq. (7) but
with the normalizer l
x
+Δ and l
y
+Δ.
The ring-shaped background kernel function K
BG
in Eq. (10) assigns
weights with value zero for pixels located outside the background re-
gion, as shown in Fig. 2(d). Note that the widthof the backgroundregion
is determined by the value Δwhich can be calculated as a function of the
object dimension, i.e. Δ=η⋅min(l
x
, l
y
), where η is a parameter that de-
pends on the speed of the object. For most tracking application where
the object motion between frames is not larger than the object size,
η=2 is sufficient. Larger value of η should be used when faster moving
objects are tracked.
Furthermore, as the object in the new frame tends to be located
near its previous location, we only have to collect the votes from
pixels located nearby its previous center c

. This set of pixels, from
whom we collect the votes, is the search range of the algorithm and
is shown in Fig. 3(b) as a region enclosed by the dash-line rectangle
centered on c

denoted as Rect c

_ _
. The dimension of this search re-
gion is 2(l
x
+Δ)×2(l
y
+Δ).
Once we obtain the new object center c


, we re-scan the pixels in
the neighborhood to see which pixels have casted the correct votes.
The pixels that have casted the correct votes for the object center are
marked as the object pixels. Since the object being tracked may not be
rigid and may grow or shrink in size, the relative location of pixels
fromthe object center can change slightly, causing themto vote slightly
off from the object center. In order to include these pixels into the fore-
ground, we allowpixels that have casted their votes somewhere within
distance ζ fromthe object center c


to be categorized as object pixels as
well. The pseudocode is given in Algorithm 2.
Algorithm 2. Back projection
for all x

∈Rect c

_ _
distance←‖vote x

_ _
− c



if distancebζ then
x

is foreground pixel
else
x

is background pixel
end if
end for.
The result of this back projection method is a foreground image
where the pixels belonging to the object are marked as 1's. With
this foreground image, we can easily estimate the change in object
size and then re-adjust the dimension of the object and background
kernel accordingly. Let l
x
*
and l
y
*
be half of the width and length of
the rectangle bounding the foreground region obtained by the back
projection method. The new object size is calculated by
l
x
0
¼ 1−α ð Þ⋅l
x
þα⋅l
4
x
;
l
y
0
¼ 1−α ð Þ⋅l
y
þα⋅l
4
y
;
ð13Þ
where α determines how fast we update the old object size with the
newly obtained size.
a b c d
e f g h
Fig. 4. Tracking result using the proposed algorithm. (a) Target object at the initial frame. (b) Tracking result at frame 30. (c) Vote map for frame 30. (d) Segmented object at frame
30. (e) Tracking result at frame 60. (f) Segmented object at frame 60. (g) Tracking result at frame 140. (h) Segmented object at frame 140.
854 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
3.3. Model update
During tracking, the object and background models have to be con-
tinuously updated to reflect the change in object and background in-
formation. Updating the background model is quite straightforward.
Let m
b
*
be the background histogram calculated at the current frame
by centering the background kernel at c


. The background histogram
is updated as
m
b
¼ 1−β ð Þ⋅m
b
þβ⋅m
4
b
; ð14Þ
a
b
c
d
e
Fig. 5. Comparison of the tracking results of the (a) KBT, (b) level set, (c) spatiogram, (d) EFS, and (e) the proposed algorithm.
855 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
where β determines howmuch the proportion of the newly calculated
background histogram is used to update the background histogram.
Since the background tends to change quickly from frame to frame,
we use β=0.5.
Updating the object histogram involves merging, appending, and
pruning. Let h
4
¼ μ
→4
b;l ð Þ
; n
4
b;l ð Þ
_ _
b¼1;…;B;l¼1;2;…
be the spatial color histo-
gram calculated at the current frame by centering the object kernel at
c


with the updated kernel dimensions (l
x
, l
y
). The spatial color histo-
gram of the object is updated by:
1. Merge μ

b;k ð Þ
with μ
→4
b;l ð Þ
by simply taking their average if they are
matched, i.e. the distance between them is smaller than the clus-
tering threshold ε. The corresponding probability is updated by
(1−ma)⋅n
(b, k)
+γ⋅ n
(b, l)
*
. If μ

b;k ð Þ
does not find a match in any
entry of the h
*
, its corresponding probability value is updated by
(1−γ)⋅n
(b, k)
.
2. Append μ
→4
b;l ð Þ
to the object histogram model if a match cannot be
found in h.
3. Prune μ

b;k ð Þ
out of the model if its corresponding probability n
(b, k)
is
smaller than a certain threshold ξ. Practically, ξ=0.0001 can be
chosen.
The parameter γ is a histogram update parameter similar to β. As
the object appearance should not change dramatically from frame to
frame, we set this parameter to 0.1. After updating the object histo-
gram, its probability value has to be normalized by dividing each
probability value n
b, k
by ∑
b;k
n
b;k ð Þ
.
3.4. Algorithm summary
The overall summary of the proposed algorithm is given in
Algorithm 3. Additionally, Table 4 presents the list of the parameters
used in this paper.
3.5. Our contributions
The algorithm we proposed in this paper is the result of meshing the
spatiogram and the generalized Hough transform. In this subsection,
we highlight our contributions and show explicitly how our algorithm
differs from the existing approaches.
Our target representation model is derived from the spatiogram
proposed in [10]. However, unlike [10] which tracks the object in
the following frames by finding the image region whose spatiogram
representation is most similar to the spatiogram of the target object,
Fig. 6. Tracking object which shares very similar color with the background.
Fig. 7. Tracking result using the proposed algorithm for the car-rear sequence from PETS2001 data set.
856 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
we propose an adaptive voting method based on the generalized
Hough transform to locate the target object. It should be noted that
the use of the voting scheme is feasible only after we modify the spa-
tiogram into a form similar to the R-Table of the generalized Hough
transform.
Algorithm 3. Algorithm summary
Initialization
Given an object to be tracked at a spatial location c

in the initial
frame:
1. Create the object histogram h using Algorithm 1 and Eq. (4).
2. Create the background histogram m
b
by using Eq. (9).
Tracking
At the following frame:
3. Request each pixel inside the search region Rect c

_ _
to cast
the votes with the voting weights according to Eq. (8).
4. Assign the location in the image which receives the highest
votes as the new object location c


.
5. Perform the back projection algorithm.
Model update
6. Update the dimension of the target object using Eq. (13).
7. Create the new background histogram m
b
*
at the new object
location c


and update the background histogram m
b
by
using Eq. (14).
8. Create the new object histogram h
*
at the new object location
c


and update the object histogram h.
9. Go back to step 3.
Our proposed tracking algorithm has several advantages over the
existing methods. First, by allowing each bin of our spatial color his-
togram model to have more than one mean vector, we obtain a target
representation model that has richer spatial information than the
spatiogram. Second, the proposed adaptive voting method explicitly
considers the existence of background region which shares similar
color with the object and suppresses the contribution of those colors
in tracking. Third, the proposed algorithm segments the object region
from the background using the simple back projection method.
4. Experiment results
In our experiments, we manually select a target object at the ini-
tial frame and model it using the proposed spatial color histogram
model presented in Section 3.1. For all experiments, we use a
16×16×16-bins RGB color histogram and set the spatial clustering
threshold ε=5 and α=0.1.
Fig. 4(a) and (b) shows the initial frame with the target object
marked with green color and the tracking result at frame 30 with
the predicted object marked by rectangle, respectively. In order to vi-
sually illustrate how the center voting procedure works, we present a
vote-map for this particular frame in Fig. 4(c). The high intensity
pixels represent the locations associated with a large number of
votes, i.e. the location of the object center c


to be estimated. After
Table 7
The average dice coefficients given various values of ζ.
ζ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
2 0.41 0.40 0.32
6 0.78 0.76 0.83
10* 0.80 0.75 0.83
15 0.80 0.60 0.74
Table 5
Time complexity of algorithms (time in ms).
Algorithm Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
KBT 2.76 0.67 5.72
Spatiogram 2.39 0.26 3.62
Level set 1.86 0.69 3.26
EFS 25.41 22.56 36.46
Proposed 4.81 1.64 6.31
Table 6
The average dice coefficients given various values of ε.
ε Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
2 0.76 0.73 0.82
5* 0.80 0.75 0.83
10 0.30 0.66 0.67
15 0.22 0.72 0.62
a
b
c
Fig. 8. The dice coefficient for each frame of the sequence in (a) Fig. 5, (b) Fig. 6, (c)
Fig. 7.
857 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
the object center is obtained, the back projection method is utilized to
segment the object from the background. This segmentation result is
shown in Fig. 4(d). Fig. 4(e), (f), (g), and (h) shows more tracking re-
sults along with the segmented object results. As the object being
tracked is not rigid, ζ=10.
We compare our algorithm with the kernel based tracking (KBT)
[9], the level set [3], the spatiogram based mean shift [10], and the ex-
tended feature selection (EFS) algorithm [19] and show some of the
tracking results in Fig. 5. The performance differences among these al-
gorithms become apparent when the target object becomes smaller
and smaller as it moves away from the camera. As shown in the
third column of Fig. 5(a), the KBT algorithm tracks the object with a
much larger bounding rectangle than the object actually is. This
poor estimation of the target object size contributes to tracking fail-
ure in the later frames as shown in the fourth column of the figure.
The EFS algorithm also fails for the same reason. The spatiogram
based mean shift algorithm loses the target object due to sudden
movement of the camera. The level set algorithm shows relatively
good tracking performance, but with its contour occasionally expands
to include the neighboring objects and shrinks to capture only a por-
tion of the target object as can be seen in the third and fourth columns
of Fig. 5(b). Our algorithm tracks both the location and the size of the
target object successfully throughout the sequence.
In Fig. 6, we present the proposed algorithm result when tracking
an object which shares similar color with the background. The robust
performance of our algorithm against the background with similar
color is achieved due to the use of adaptive voting weights.
Fig. 7 demonstrates the tracking of a vehicle in Performance Eval-
uation of Tracking and Surveillance (PETS) data set using the pro-
posed algorithm. The algorithm tracks the car successfully until the
car becomes too small to be discerned. Even though the object
being tracked is rigid, the car appearance changes quickly both in
pose and size, thus ζ is set to 10.
In order to compare the performance of the algorithms quantita-
tively, we use the dice coefficient metric [20] to measure the degree
of overlap between the tracking rectangle and the ground truth. If
we denote the ground truth rectangle and the tracking rectangle as
Ω
1
and Ω
2
, respectively, then the dice coefficient can be calculated as
D Ω
1
; Ω
2
ð Þ ¼
2⋅Area Ω
1
∩Ω
2
ð Þ
Area Ω
1
ð Þ þ Area Ω
2
ð Þ
: ð15Þ
The dice coefficients of the various algorithms for each frame of
the sequence in Figs. 5, 6, and 7 are shown in Fig. 8(a), (b), and (c),
respectively. In all three sequences, the proposed algorithm outper-
forms the conventional methods.
In order to compare the algorithmcomplexity, we measure the av-
erage time required to process a single frame during tracking. The ex-
periment is run on a PC with dual core 3 GHz CPU and 3 GB RAM. We
present the experiment result in Table 5. The resolution of the se-
quence from Figs. 5 and 6 is 320×240, and Fig. 7 is 384×288. While
our algorithm is a little more complex than the conventional algo-
rithms except for the EFS algorithm, it still runs at the speed much
higher than the real time requirement. The slight increase in compu-
tational time is well justified by its superior tracking performance.
Next, we show the effect of changing the value of the parameters
to the tracking performance. The experiment results are given in
Table 6 to Table 11 with the average dice coefficient used as the per-
formance metric. The asterisk sign near the parameter value indicates
the default value of the parameters as suggested in Table 4. At each
experiment, only one parameter is varied and the rest of the parame-
ters are set to the default value.
Table 6 shows the tracking results for various values of the spatial
clustering threshold ε. Large values of ε will cause more pixels to be
grouped into the same cluster. The result is a coarse model of the spa-
tial color histogram. In general, small ε gives good performance.
The rigidity threshold ζ should be set according to the characteris-
tic of the object being tracked. We suggest using a small value for the
rigid object and a larger value for the non rigid object. The result of
experimenting with various values of ζ is given in Table 7. We con-
clude that ζ is critical for good tracking performance, but it is not
very sensitive since using a value slightly smaller or larger than the
suggested value does not affect the tracking performance greatly.
The parameter α is used to update the object size. We set this
value to 0.1 as the size of the object generally does not change drasti-
cally between frames. As shown in Table 8, smaller value results in
better performance as we expected.
Table 9 shows the result of experimenting with various values of
β. The parameter determines how fast the background model is
updated. This parameter is relatively insensitive except for the chal-
lenging sequence in Fig. 6 where the object being tracked shares sim-
ilar color with the background.
In video tracking, it is reasonable to assume that the object ap-
pearance does not change drastically from frame to frame. Thus, set-
ting the object model update parameter γ to a small value is a logical
choice. The tracking result for various values of γ is shown in Table 10.
In general, large γ reduces the tracking performance as it allows more
background pixels to be updated into the object model.
The pruning threshold ξ is used to remove clusters whose proba-
bility to be the object model are small. Since these clusters are casting
votes with very small weights, their removal does not affect the track-
ing performance. This parameter is not sensitive as long as it is set to a
small value. We provide the experiment result in Table 11 to support
our argument.
Table 9
The average dice coefficients given various values of β.
β Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
0.30 0.77 0.80 0.82
0.40 0.82 0.50 0.82
0.50* 0.80 0.75 0.83
0.60 0.60 0.70 0.82
0.70 0.70 0.79 0.80
Table 10
The average dice coefficients given various values of γ.
γ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
0.05 0.62 0.79 0.84
0.10* 0.80 0.75 0.83
0.20 0.51 0.36 0.78
0.30 0.74 0.67 0.82
0.40 0.64 0.71 0.72
Table 11
The average dice coefficients given various values of ξ.
ξ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
0.00001* 0.80 0.75 0.83
0.00005 0.79 0.78 0.79
0.00010 0.84 0.79 0.80
0.00050 0.60 0.77 0.80
Table 8
The average dice coefficients given various values of α.
α Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7
0.05 0.76 0.75 0.72
0.10* 0.80 0.75 0.83
0.20 0.78 0.30 0.80
0.30 0.79 0.55 0.76
0.40 0.77 0.55 0.79
858 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
Fig. 9 shows the experiment result of tracking a fast moving ob-
ject. The result of the KBT algorithm and our algorithm is presented
in Fig. 9(a) and (b), respectively. In this sequence, the movement of
the ball from frame to frame is larger than its size. In order to ensure
that the target region at the current and following frames are overlap-
ping, Comaniciu et al. suggest to initialize the target model using a
21×31 size region, which is larger than the ball itself, when tracking
using their KBT algorithm. Our algorithm, however, deals with this
problem by simply enlarging the search region by setting η=3,
allowing more pixels around the previous object location to vote for
the new location of the object. Note that our algorithm obtains better
estimate of both the location and size of the ball. Since the object
being tracked is rigid, ζ=2 is used.
Here we also showthe result of tracking in the case of illumination
change. Since our algorithm uses the color information to represent
the object, variation in illumination affects the performance of our al-
gorithm greatly. We show both the success case and the failure case
in Fig. 10(a) and (b), respectively. Note that in Fig. 10(b), the target
appearance changes drastically when the person walks into the shad-
ow area.
The robustness of the proposed algorithm can be improved by
using multiple features together with the color feature. We consider
that several features such as edges and textures can compensate the
weakness of the color feature. However, in order to keep the content
of this paper concise, only the color information is used and the com-
bination with other features remains as our future work.
5. Discussion
We presented a new object tracking algorithm based on the gen-
eralized Hough transform method. In the proposed algorithm, the ob-
ject is represented by using a spatial color histogrammodel which has
a form similar to the R-Table of the generalized Hough transform.
With the proposed spatial color histogram model, the object position
a
b
Fig. 9. The table tennis sequence. (a) KBT algorithm. (b) The proposed algorithm.
a
b
Fig. 10. Performance of the proposed algorithm under illumination change. (a) Successful tracking. (b) Failure tracking.
859 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860
in the next frame is estimated by requesting each pixel to vote for the
location of the object. Experimental results indicate that the proposed
algorithm can track the object successfully even when the object
shares similar color with the background and changes in size.
The proposed algorithmcan be employed into applications such as
understanding human motion in video sequence. Many of the algo-
rithms developed for human motion understanding are based on
the analysis of object silhouettes [21–24]. In order to extract the ob-
ject silhouette, these algorithms usually employ a simple background
subtraction technique which is effective only when the video is cap-
tured by a static camera. Our algorithm can extract the object silhou-
ette even when the video sequence is taken by a moving camera.
The future work shall be focused on integrating the boundary in-
formation into color histogram to further improve the reliability of
the algorithm. Furthermore, the relative distance of a cluster of pixels
from the object center is currently expressed inpixel-distance. Thus
we expect that an alternative representation that is invariant to size
and shape changes will improve the robustness of the algorithm.
Acknowledgments
This work was supported by the Mid-career Researcher Program
through NRF grant funded by the MEST (No. 2011-0000200).
Appendix A. Supplementary data
Supplementary data to this article can be found online at doi:10.
1016/j.imavis.2011.09.008.
References
[1] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Comput. Surv. 38
(2006) 13.
[2] M. Isard, A. Blake, CONDENSATION—conditional density propagation for visual
tracking, Int. J. Comput. Vision 29 (1998) 5–28.
[3] Y. Shi, W.C. Karl, Real-time tracking using level sets, Proceedings of IEEE Confer-
ence on Computer Vision and Pattern Recognition, 2, 2005, pp. 34–41.
[4] J. Shi, C. Tomasi, Good features to track, Proceedings of IEEE Conference on Com-
puter Vision and Pattern Recognition, 1994, pp. 593–600.
[5] C. Tomasi, T. Kanade, Detection and tracking of point features, Technical Report,
Carnegie Mellon University, 1991.
[6] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Com-
put. Vision 60 (2004) 91–110.
[7] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (surf), Com-
put. Vis. Image Underst. 110 (2008) 346–359.
[8] G.R. Bradski, Real time face and object tracking as a component of a perceptual
user interface, IEEE Workshop on Applications of Computer Vision, 1998,
pp. 214–219.
[9] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. Pat-
tern Anal. Mach. Intell. 25 (2003) 564–575.
[10] S.T. Birchfield, S. Rangarajan, Spatiograms versus histograms for region-based
tracking, Proceedings of IEEE Conference on Computer Vision and Pattern Recog-
nition, 2, 2005, pp. 1158–1163.
[11] B.D. Lucas, T. Kanade, An iterative image registration technique with an applica-
tion to stereo vision, Proceedings of International Joint Conference on Artificial
intelligence, 2, 1981, pp. 674–679.
[12] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Read-
ings in Computer Vision: Issues, Problems, Principles, and Paradigms, 1987,
pp. 714–725.
[13] H.-K. Kim Suryanto, S.-H. Park, D.-H. Kim, S.-J. Ko, Probabilistic center voting
method for subsequent object tracking and segmentation, World Academy of Sci-
ence, Engineering and Technology, 59, 2009, pp. 450–454.
[14] B. Leibe, A. Leonardis, B. Schiele, Combined object categorization and segmenta-
tion with an implicit shape model, Workshop on Statistical Learning in Computer
Vision, ECCV, 2004, pp. 17–32.
[15] K. Kim, T.H. Chalidabhongse, D. Harwood, L. Davis, Real-time foreground-
background segmentation using codebook model, Real-Time Imaging 11 (2005)
172–185.
[16] C. Stauffer, W. Grimson, Adaptive background mixture models for real-time
tracking, Proceedings of IEEE Conference on Computer Vision and Pattern Recog-
nition, 2, 1999, pp. 246–252.
[17] A.M. Elgammal, D. Harwood, L.S. Davis, Non-parametric model for background
subtraction, Proceedings of European Conference on Computer Vision, 2, 2000,
pp. 751–767.
[18] I. Leichter, M. Lindenbaum, E. Rivlin, Tracking by affine kernel transformations
using color and boundary cues, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009)
164–171.
[19] J. Wang, Y. Yagi, Integrating color and shape texture features for adaptive real-
time object tracking, IEEE Trans. Image Process. 17 (2008) 235–240.
[20] D. Doermann, D. Mihalcik, Tools and techniques for video performance evalua-
tion, Proceedings of International Conference on Pattern Recognition, 4, 2000,
pp. 167–170.
[21] R. Hoshino, D. Arita, S. Yonemoto, R.-I. Taniguchi, R. Hoshino, D. Arita, S. Yone-
moto, R.-I. Taniguchi, Real-time human motion analysis based on analysis of sil-
houette contour and color blob, Proceedings of the Second International
Workshop on Articulated Motion and Deformable Objects, 2002, pp. 92–103.
[22] K. Tabb, N. Davey, R.G. Adams, S.J. George, Analysis of human motion using snakes
and neural networks, Proceedings of the First International Workshop on Articu-
lated Motion and Deformable Objects, 2000, pp. 48–57.
[23] F. Buccolieri, C. Distante, A. Leone, Human posture recognition using active con-
tours and radial basis function neural network, Proceedings of the Advanced
Video and Signal Based Surveillance, 2005, pp. 213–218.
[24] I. Haritaoglu, D. Harwood, L.S. Davis, W4: real-time surveillance of people and
their activities, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 809–830.
860 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

μb . ð2Þ a b c Fig. 2.2. we present our algorithm in detail in Section 3. 1. Table 1 shows a general form of the R-Table. Σb gb¼1.Suryanto et al. In the next section. the pixels located near the object boundary are given smaller weights while the pixels around the center of the object are assigned larger weights. The object localization is performed iteratively by a mean shift method similar to CAMSHIFT. 2. a gradient → direction φ may have many values of r . (b) Test image. is maximized. The spatiogram h(y) is calculated from an image region whose center is at location y and has the same size as the object to be tracked. . ϕðxÞ ¼ 0g →→ → → n r  r ¼ x − c . In general. by adopting the concept of the spatiogram [10]. the current object size is estimated and the search range is adjusted accordingly. The number B is the total number of bins in the spatiogram. indicates the location of the shape. we briefly review the generalized Hough transform and the spatiogram which are closely related to our proposed algorithm. Even though the Bradski's algorithm was developed for face tracking. its gradient direction φ = φ′ is calculated. h(y) = {nb(y). In Section 3.1. 1(a). the construction of the R-Table to represent the shape and the voting scheme to detect the shape in the image. 1(b). Gradient direction 0 Δϕ 2Δϕ … Positions n  →→ → → n r  r ¼ x − c . and μb and Σb are the mean vector and the covariance matrix. Fig. ry from a reference point c ¼ cx . The use of the kernel weighted histogram for the mean shift based algorithms significantly improves the tracking performance. Then. In the kernel weighted color histogram representation. Section 5 concludes the content of the paper. of the coordinates of those pixels. This paper is organized as follows. (a) Model shape. proposed the Kernel Based Tracking (KBT) algorithm which uses the kernel weighted color histogram to represent the color distribution of the object [9]. we introduce a new object representation model and localization method. Σb (y)}. The pixel with the highest intensity. Once the object location is obtained. However. The generalized Hough transform The generalized Hough transform has been widely used to detect the shape in the image [12]. the rectangle position is moved to a new position until convergence. μb(y). Then. Then. Each bin in the spatial color histogram model contains the information on the number of pixels belonging to the color bin and the positions of those pixels relative to the object center. 1(b). The similarity between two spatiograms is computed as the weighted sum of the Bhattacharyya similarity between two histograms B ρðhðyÞ. 1(a) from the test image in Fig. the algorithm does not perform well when the object being tracked changes in size.e. Geometry for the generalized Hough transform. for each point x ¼ ðx.2. The preliminary version of this work has been published in [13]. we show how to adopt this generalized Hough transform technique to track the object from frame to frame in a video sequence. it can be used to track any object of interest. With the segmented object. In order to represent the object to be tracked. Experiment results are given in Section 4. the back projection method is utilized to segment the object from the background. the pixel with the highest vote. It consists of two main parts. / Image and Vision Computing 29 (2011) 850–860 851 iterative procedure based on the mean shift is applied to center the object rectangle in the face region. ϕðxÞ ¼ Δϕg →→ → → r  r ¼ x − c . In this paper. Related works 2. ð1Þ where nb is the number of object pixels whose quantized values fall into the b-th bin. each pixel casts votes into the vote ac→ → → cumulator at positions x − r where r is the set of all position vectors indexed by φ′ in the R-Table. 2. Tracking by spatiogram The spatiogram [10] represents the object to be tracked as h ¼ fnb . the gradient direction φ and its relative Á Á → À → À position r ¼ rx . yÞ at the boundary of the shape. respectively. hÞ ¼ ∑ ψb b¼1 pffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi nb ðyÞnb . The object localization is performed by determining the location y ε R2 in the image where the similarity between the spatiogram of the object h and the spatiogram at location y. Construction of the R-Table → Given an arbitrary shape as in Fig. the relative position vectors r ¼ x − c are stored as a Table 1 The R-Table of an arbitrary shape. cy are comput→ → → ed.….1. (c) Vote accumulator. The localization of the object in the next and following frames is accomplished using the center voting method based on the generalized Hough transform voting scheme [12]. At each iteration. Note that the underlying idea behind the generalized Hough transform has also been applied successfully to the object category detection [14].1. Comaniciu et al. i. we propose a new class of spatial color histogram model. 2. Voting scheme for the shape detection → For each edge pixel x in the image in Fig. ϕðxÞ ¼ 2Δϕg … function of the gradient direction in the R-Table.1. We extend our previous work by utilizing the kernel to create a more reliable model and by introducing the spatial color histogram update mechanism.B . 1(c) shows the vote accumulator for the detection of the shape in Fig.

Proposed algorithm → Given an object centered at location c ¼ cx . 2.   N → nðb.k¼1. i¼1 ⌢ −1 with η as the Gaussian normalization constant and Σ b ¼ −1 −1 Σb ðyÞ þ Σb . → → (b) include x i to μ ðb. for each k: → → (a) calculate the distance of x i from μ ðb. This object segmentation can be done at the initial frame by performing the background subtraction algorithms [15–17] or even by manual selection by a human operator. → K xi ∑ N i¼1 ð5Þ → and δibk = 1 if the value of pixel x i is quantized into the b-th bin and its → distance from the μ ðb. 2(a). where the weighting function is given by ( ) −1 1 T ⌢ ψb ¼ η exp − ðμb ðyÞ−μb Þ Σ ðμb ðyÞ−μb Þ .kÞ . (c) Tracking result. → → 2. cy in the current →′ frame. The pseudo-code for estimating ěcμ(b. Calculate the bin index b of pixel x i . otherwise δibk = 0. The object to be tracked is referred to as the target object. c be the location of the object center. (b) Center voting procedure. and lx and ly be half the width and height of the rectangle bounding the object region as shown in Fig. (c) 2-D object kernel.kÞ . Otherwise.… → μ ðb. where spatial color histogram model h ¼ μ ðb. / Image and Vision Computing 29 (2011) 850–860 a b c d Fig. The n(b. (d) 2-D background kernel. 2 b ð3Þ of pixels relative to the object center. k) is the probability value associated with the number of pixels belonging to that particular cluster. The location y that maximizes the similarity function in Eq. we modify this spatiogram to construct a table similar to the R-Table and then use the voting mechanism of the generalized Hough transform to find the location of the object to be tracked.B. we assume that at the initial frame.1Þ ¼ x i if there are no cluster in the bth bin.kÞ ¼ x i if x i is not included in any existing clusters.852 Suryanto et al.kÞ b¼1.kÞ is the mean vector representing the position of the k-th cluster Let n À Á ð4Þ where C is the normalization constant to ensure that ∑b. a b c Fig. (a) The target object. In order to represent the target object. we use the n o → . Since our main objective is to track the object.kÞ → for each x i : → 1.1.…. k) is given in Algorithm 1. → Algorithm 1. (2) is determined either by using the gradient descent mean shift method or the exhaustive local search. Target representation o → x i ¼ ðxi .kÞ → Input: pixel location x i of the target object → Output: cluster of pixels μ ðb. In our algorithm.k nðb. Create a new cluster μ ðb. here we also employ a monotonically decreasing kernel   → function K x i to assign smaller weights to pixels located farther away from the object center. (a) The target object. the mean shift based localization is several order faster than the exhaustive local search.2. yi Þ be the location of the pixels belonging to i¼1…N → the target object. A cluster is defined as a group of pixels whose quantized values fall into the same b-th bin of the histogram and which are located close to each other. Mathematically.kÞ ¼ 1 which is given by C¼ 1  . 3. As in [9]. Illustration of object tracking using the proposed algorithm. because these pixels are less reliable. Create a new cluster μ ðb. 3.kÞ if the distance is smaller than ε. In general. nðb. the object has been segmented out from the background. We describe our algorithm in detail in the next section. 3.kÞ . → (c) update μ ðb.kÞ ¼ C ∑ K x i δibk . the tracking objective is to find the new object center c in the next and the following frames. Estimation pf μ ðb. 3. → → → 4. (b) Object region and background region.kÞ is smaller than threshold ε. .

n o → Redefine x i ¼ ðxi . both centered at c as shown in Fig. It can be seen that the location in the image receiving the highest number of votes is located at pixels labeled d. In the next subsection. 3. 5 10 ε ζ Spatial clustering threshold Object rigidity threshold {(− 1. our spatial color histogram model allows each histogram bin to have more than one mean vector. 3. which are mostly the background pixels. Only the pixel whose color exists in the target model can cast a vote. → 2. (4) and mb is the probability value associated with the number of pixels in the background whose quantized values fall into the b-th bin. ð8Þ The standardized Euclideandistance in Eq. calculated by first normalizing (xi−cx) and (yi−cy) by lx and ly. Thus. Target localization Target localization consists of center voting and back projection steps. For example. Note that the two pixels belonging to the same color histogram bin a are not clustered together due to their spatial distance. 0. The kernel function K x i can be written as  → K xi ¼  (     → → 1−d2 x i . x i −ð1.kÞ . i. The assumption. Table 2 Spatial color histogram for the target object in Fig. of course. −1Þ . does not hold in most cases. A straightforward way to quantify the reliability of a pixel is to use the probability difference as employed in [18]. pixels whose colors fall into this bin cast two votes. Thus. since the two pixels with bin index e are adjacent to each other.kÞ   → where d x i is the standardized Euclidean distance between pixel → → x i and the object center c .1/7} {(− 0. On the other hand. 3(b) illustrates the center voting procedure.1/7} {(0.0 .kÞ −mb ) . i. If the object displacement between frames is larger than the object size. located between the ellipse with semi-minor axis lx and semimajor axis ly and the ellipse with semi-minor axis lx + Δ and semi→ major axis ly + Δ. R-Table → Reference point c Gradient direction ϕ → Positions r Our proposed model → Object center c Histogram bins b → Mean vectors μ ðb.2. they are grouped into the same cluster with a single mean. the corresponding histogram bin has two mean vectors. The arrows indicate where the pixels cast their votes on. for pixels inside the ellipse.5.1 0.1/7}.Suryanto et al. For simplicity. 3. (7) is similar to the equa → tion of the ellipse. which is also the value of the background histogram at b-th bin. we obtain an ellipse centered → on c with the semi-minor axis lx and the semi-major axis ly as shown in Fig. Naturally. More reliable pixels cast votes with higher weights than less reliable pixels. casts a n o → → vote on position x i − μ ðb. we show how to use this spatial color histogram model to locate the target object in the following frames. 2(c). and e cast their votes on the position indicated by their mean vectors at n o n o n o → → → x i −ð0.−1). the voting weight w(b.00001 . nðb. On the other hand. Rules for the center voting procedure are as follows: 1. a pixel is regarded as a reliable pixel if its color exists in the object. 3(a).e. respectively. Unlike the spatiogram which has a single mean and a single covariance for each bin. −1Þ . 1).kÞ þ mb where n(b. 0). The spatial color histogram for this object is shown in Table 2. In the center voting step. 1Þ . if d x i ¼ 1. Each mean vector represents a cluster of pixels sharing a similar color which are located in close proximity to each other. reliable pixels cast votes with higher weights.2/7} Model update related parameters α Object size update ratio β Background model update ratio γ Object model update ratio ξ Pruning threshold 0. Thus.kÞ o nðb. k) for a vote casted → using mean vector μ ðb. {(1. The probability value mb. respectively. yi Þ as the pixels in the background rei¼1…N gion. k) is given in Eq. which causes the algorithm to fail to correctly estimate the location of the object center. The alphabets indicate the color histogram bins associated with the pixels.1). can be calculated by   N → mb ¼ CBG ∑ KBG x i δib .1/7} {(1.e. we have made a very naive assumption that the background does not share the same color with the object. ifd x i ≤1. i¼1 ð9Þ Table 4 The parameters. we   → use auniform kernel K x i ¼ 1 for this particular example. otherwise. Pixels located at x i labeled b.1 0. For pixels located outside this ellipse.: vffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffiffi !2   ux −c 2 u yi −cy → d xi ¼t i x þ : lx ly ð7Þ Fig. their standard  → ized distances are smaller than one. Table 3 shows that the proposed spatial color histogram model has a structure similar to the R-Table. these pixels do not cast any vote on the loca→ tion of the object center.kÞ by a pixel whose color belongs to the b-th bin of the histogram can be expressed as ( wðb.1/7} {(0. c.−1). In this example. Since bin a contains two mean vectors. Param Meaning Object motion related parameters η Speed of the object Suggested value For most tracking application: 2. then 3 or larger.−1). and consequently their kernel weights K(cxi) become zero. The kernel function K x i assigns larger weights to the pixels located closer to the center as shown in Fig. This problem can be solved by adding the third rule. but not in the background. bin index a b c d e h¼ n → μ ðb. The pixels labeled × are the pixels whose colors do not fall into any of the bins of the target model histogram. whose quantized value falls into b-th bin. and x i −ð−0:5.kÞ . 3(a). 2(b). consider a target object in Fig.kÞ ¼ max nðb. In fact. →′ which is the new location of the object c as shown in Fig. their standardized distances from the object center are greater than one. / Image and Vision Computing 29 (2011) 850–860 853 since they are often affected by occlusion or interference from the   → background.5 0. 2(b). Each pixel x i . every pixel located near the previous → object center c is required to cast a vote for the location of the object center. ð6Þ Table 3 Attributes of the R-Table and the proposed model.

In order to include these pixels into the foreground.e. The ring-shaped background kernel function KBG in Eq. causing them to vote slightly off from the object center. The result of this back projection method is a foreground image where the pixels belonging to the object are marked as 1's. we can easily estimate the change in object size and then re-adjust the dimension of the object and background * * kernel accordingly. ð10Þ where  1 0 → d xi 1@ 1 þ → A. (c) Vote map for frame 30. The dimension of this search region is 2(lx + Δ) × 2(ly + Δ). Larger value of η should be used when faster moving objects are tracked. ð12Þ ðσ−1Þ2   → and dΔ x i is the standardized Euclidean distance as in Eq. η = 2 is sufficient. σ¼ 2 d x Δ i   → → centered on c denoted as Rect c . the relative location of pixels from the object center can change slightly. →′ Once we obtain the new object center c . if d x i ≥1 and dΔ x i ≤1 . is the search range of the algorithm and is shown in Fig. (b) Tracking result at frame 30. as the object in the new frame tends to be located near its previous location. (f) Segmented object at frame 60. / Image and Vision Computing 29 (2011) 850–860 where CBG ¼ 1   → ∑ KBG x i N i¼1 → of is a normalization constant and δib = 1 if the value pixel x i is quan → tized into the b-th bin. The pseudocode is given in Algorithm 2. otherwise.ly). Note that the width of the background region is determined by the value Δ which can be calculated as a function of the object dimension. which is given by   → KBG x i ¼ (   2         → → → . KBG x i is the background kernel function as used in [18]. (a) Target object at the initial frame. as shown in Fig. 3(b) as a region enclosed by the dash-line rectangle λ¼ lx 0 ¼ ð1−αÞ⋅lx þ α⋅l4 . ð13Þ where α determines how fast we update the old object size with the newly obtained size. we only have to collect the votes from → pixels located nearby its previous center c . Δ = η ⋅ min(lx. Let lx and ly be half of the width and length of the rectangle bounding the foreground region obtained by the back projection method. (e) Tracking result at frame 60.854 Suryanto et al. 4. . (7) but with the normalizer lx + Δ and ly + Δ. Furthermore. we re-scan the pixels in the neighborhood to see which pixels have casted the correct votes. we allow pixels that have casted their votes somewhere within →′ distance ζ from the object center c to be categorized as object pixels as well. (d) Segmented object at frame 30. 2(d). otherwise δib = 0. (h) Segmented object at frame 140. (10) assigns weights with value zero for pixels located outside the background region. x 4 ly 0 ¼ ð1−αÞ⋅ly þ α⋅ly . 1−λ σ−d x i 0. With this foreground image. a b c d e f g h Fig. The pixels that have casted the correct votes for the object center are marked as the object pixels. This set of pixels. For most tracking application where the object motion between frames is not larger than the object size. Back projection   → → for all x ∈Rect   c →′ → distance←‖vote x − c ‖ if distance b ζ then → x is foreground pixel else → x is background pixel end if end for. Since the object being tracked may not be rigid and may grow or shrink in size. Tracking result using the proposed algorithm. i. (g) Tracking result at frame 140. The new object size is calculated by ð11Þ 1 . Algorithm 2. from whom we collect the votes. where η is a parameter that depends on the speed of the object.

the object and background models have to be continuously updated to reflect the change in object and background information. Updating the background model is quite straightforward. (c) spatiogram. The background histogram is updated as mb ¼ ð1−βÞ⋅mb þ β⋅mb . 4 ð14Þ a b c d e Fig. * Let mb be the background histogram calculated at the current frame →′ by centering the background kernel at c . / Image and Vision Computing 29 (2011) 850–860 855 3. 5.Suryanto et al. (b) level set. . (d) EFS.3. Comparison of the tracking results of the (a) KBT. Model update During tracking. and (e) the proposed algorithm.

where β determines how much the proportion of the newly calculated background histogram is used to update the background histogram. → 3.5. we use β = 0. Fig. Our target representation model is derived from the spatiogram proposed in [10]. gram calculated at the current frame by centering the object kernel at →′ c with the updated kernel dimensions (lx.…. →4 2. ly). 7.lÞ to the object histogram model if a match cannot be found in h.lÞ be the spatial color histob¼1. we highlight our contributions and show explicitly how our algorithm differs from the existing approaches. 3. Tracking object which shares very similar color with the background. k) + γ ⋅ n(b. Let h4 ¼ μ ðb.856 Suryanto et al. Algorithm summary The overall summary of the proposed algorithm is given in Algorithm 3. the distance between them is smaller than the clustering threshold ε.e. k by ∑b. 6. k) is smaller than a certain threshold ξ. In this subsection. nðb. k).kÞ does not find a match in any * entry of the h .5. i. 3.kÞ out of the model if its corresponding probability n(b. However. As the object appearance should not change dramatically from frame to frame. Append μ ðb. The corresponding probability is updated by → * (1−ma) ⋅ n(b. / Image and Vision Computing 29 (2011) 850–860 Fig. Additionally. Since the background tends to change quickly from frame to frame. appending. .2.… The parameter γ is a histogram update parameter similar to β. If μ ðb. Prune μ ðb. and n o →4 4 pruning.B.kÞ .lÞ by simply taking their average if they are matched. unlike [10] which tracks the object in the following frames by finding the image region whose spatiogram representation is most similar to the spatiogram of the target object. Table 4 presents the list of the parameters used in this paper. After updating the object histogram. Our contributions The algorithm we proposed in this paper is the result of meshing the spatiogram and the generalized Hough transform. ξ = 0. The spatial color histogram of the object is updated by: →4 → 1. its corresponding probability value is updated by (1−γ) ⋅ n(b.kÞ with μ ðb.k nðb. l). Merge μ ðb.0001 can be chosen.lÞ . Practically.l¼1. Tracking result using the proposed algorithm for the car-rear sequence from PETS2001 data set.1. its probability value has to be normalized by dividing each probability value nb.4. Updating the object histogram involves merging. we set this parameter to 0.

Update the dimension of the target object using Eq. by allowing each bin of our spatial color histogram model to have more than one mean vector. Our proposed tracking algorithm has several advantages over the existing methods. (13).26 0. Algorithm KBT Spatiogram Level set EFS Proposed Sequence in Fig. 5. Table 5 Time complexity of algorithms (time in ms). respectively.86 25.31 ζ 2 6 10* 15 Sequence in Fig.67 0. 5 0. the proposed adaptive voting method explicitly considers the existence of background region which shares similar color with the object and suppresses the contribution of those colors in tracking. we manually select a target object at the initial frame and model it using the proposed spatial color histogram model presented in Section 3.22 Sequence in Fig. * 7. Model update 6.75 0. Sequence in Fig. we present a vote-map for this particular frame in Fig.46 6.62 Algorithm 3. / Image and Vision Computing 29 (2011) 850–860 857 a Table 6 The average dice coefficients given various values of ε. 2. 7 0. Create the background histogram mb by using Eq. Third.32 0. 7 5.69 22. (b) Fig. 4(c).Suryanto et al.64 .76 0.74 c Fig.83 0. ε 2 5* 10 15 Sequence in Fig. The dice coefficient for each frame of the sequence in (a) Fig.83 0.76 0.62 3. 4(a) and (b) shows the initial frame with the target object marked with green color and the tracking result at frame 30 with the predicted object marked by rectangle. It should be noted that the use of the voting scheme is feasible only after we modify the spatiogram into a form similar to the R-Table of the generalized Hough transform. 6 0.26 36.30 0. we obtain a target representation model that has richer spatial information than the spatiogram.81 Sequence in Fig.39 1. i. 6 0. After Table 7 The average dice coefficients given various values of ζ. 8. (9). 7 0. Fig. 5. 5 0.80 0.83 0.72 3. For all experiments.73 0.80 0. Go back to step 3. (4). Perform the back projection algorithm. 8. the proposed algorithm segments the object region from the background using the simple back projection method. The high intensity pixels represent the locations associated with a large number of →′ votes.41 0.40 0. (c) Fig. 4. Second. 5 2. b Tracking At the following frame:   → 3. 6 0.78 0. we use a 16 × 16 × 16-bins RGB color histogram and set the spatial clustering threshold ε = 5 and α = 0. Experiment results In our experiments. (14). First.76 2.80 Sequence in Fig. 4.41 4.72 Sequence in Fig. 6. 9.67 0. Assign the location in the image which receives the highest →′ votes as the new object location c .82 0. Create the object histogram h using Algorithm 1 and Eq. 7.75 0. Create the new background histogram mb at the new object →′ location c and update the background histogram mb by using Eq.e.56 1.66 0.1. In order to visually illustrate how the center voting procedure works. we propose an adaptive voting method based on the generalized Hough transform to locate the target object.60 Sequence in Fig. Algorithm summary Initialization → Given an object to be tracked at a spatial location c in the initial frame: 1. Request each pixel inside the search region Rect c to cast the votes with the voting weights according to Eq. (8). Create the new object histogram h* at the new object location →′ c and update the object histogram h. the location of the object center c to be estimated.1.

60 0. Table 6 shows the tracking results for various values of the spatial clustering threshold ε. 5 0. The algorithm tracks the car successfully until the car becomes too small to be discerned. The EFS algorithm also fails for the same reason.50* 0. Fig. and (h) shows more tracking results along with the segmented object results.05 0.75 0.77 Suryanto et al.82 0. 5(a).78 0.83 0. Table 9 shows the result of experimenting with various values of β.40 Sequence in Fig.20 0.83 0. Next.82 0.70 0. / Image and Vision Computing 29 (2011) 850–860 Table 10 The average dice coefficients given various values of γ. the level set [3].60 0. The parameter determines how fast the background model is updated. ζ = 10.71 Sequence in Fig. This segmentation result is shown in Fig.80 0. The result of experimenting with various values of ζ is given in Table 7. The robust performance of our algorithm against the background with similar color is achieved due to the use of adaptive voting weights. we show the effect of changing the value of the parameters to the tracking performance.82 0. 7 0.70 Sequence in Fig. As shown in Table 8. 5. setting the object model update parameter γ to a small value is a logical choice.64 Sequence in Fig. Table 11 The average dice coefficients given various values of ξ. 7 0.80 0. Ω2 Þ ¼ ð15Þ The dice coefficients of the various algorithms for each frame of the sequence in Figs. small ε gives good performance.55 the object center is obtained. We compare our algorithm with the kernel based tracking (KBT) [9]. and 7 are shown in Fig.72 0. thus ζ is set to 10. 6 0. As the object being tracked is not rigid.80 0. then the dice coefficient can be calculated as 2⋅AreaðΩ1 ∩Ω2 Þ : AreaðΩ1 Þ þ AreaðΩ2 Þ DðΩ1 . If we denote the ground truth rectangle and the tracking rectangle as Ω1 and Ω2. 7 0.70 Sequence in Fig.30 0.79 0.84 0. We suggest using a small value for the rigid object and a larger value for the non rigid object. 6 0.1 as the size of the object generally does not change drastically between frames. In all three sequences. 6 0. The rigidity threshold ζ should be set according to the characteristic of the object being tracked. 6 0. We conclude that ζ is critical for good tracking performance. We present the experiment result in Table 5. The performance differences among these algorithms become apparent when the target object becomes smaller and smaller as it moves away from the camera.78 0. 6. 4(d). we measure the average time required to process a single frame during tracking.80 0. 7 0. and Fig.00001* 0. (b). respectively.80 0. 5 0. Fig. The spatiogram based mean shift algorithm loses the target object due to sudden movement of the camera. Table 9 The average dice coefficients given various values of β. and the extended feature selection (EFS) algorithm [19] and show some of the tracking results in Fig.40 Sequence in Fig.80 0. In video tracking. The asterisk sign near the parameter value indicates the default value of the parameters as suggested in Table 4.30 0. 5(b). The experiment is run on a PC with dual core 3 GHz CPU and 3 GB RAM. their removal does not affect the tracking performance.75 0. The level set algorithm shows relatively good tracking performance. The resolution of the sequence from Figs.00010 0.79 0.82 0.80 In order to compare the algorithm complexity.858 Table 8 The average dice coefficients given various values of α. We provide the experiment result in Table 11 to support our argument. but it is not very sensitive since using a value slightly smaller or larger than the suggested value does not affect the tracking performance greatly. respectively. 5 0. We set this value to 0.20 0. it still runs at the speed much higher than the real time requirement. 6 where the object being tracked shares similar color with the background.55 0. 8(a).84 0. 7 is 384 × 288. This poor estimation of the target object size contributes to tracking failure in the later frames as shown in the fourth column of the figure. This parameter is relatively insensitive except for the challenging sequence in Fig.30 0. This parameter is not sensitive as long as it is set to a small value.83 0. 5 0.78 0. 6.36 0.80 0.51 0.80 . β 0.75 0. we present the proposed algorithm result when tracking an object which shares similar color with the background. In Fig.05 0. The tracking result for various values of γ is shown in Table 10.79 0. In general.75 0.72 Sequence in Fig. the proposed algorithm outperforms the conventional methods.76 0.82 0.50 0. (f). In general.75 0. As shown in the third column of Fig.79 γ 0. the spatiogram based mean shift [10]. it is reasonable to assume that the object appearance does not change drastically from frame to frame.30 0. The experiment results are given in Table 6 to Table 11 with the average dice coefficient used as the performance metric. 5.76 0. While our algorithm is a little more complex than the conventional algorithms except for the EFS algorithm. The result is a coarse model of the spatial color histogram. and (c).79 0. the car appearance changes quickly both in pose and size. ξ 0. large γ reduces the tracking performance as it allows more background pixels to be updated into the object model. The slight increase in computational time is well justified by its superior tracking performance.60 Sequence in Fig. In order to compare the performance of the algorithms quantitatively.83 0. At each experiment.67 0. only one parameter is varied and the rest of the parameters are set to the default value.00050 Sequence in Fig.74 0. we use the dice coefficient metric [20] to measure the degree of overlap between the tracking rectangle and the ground truth. The parameter α is used to update the object size. Large values of ε will cause more pixels to be grouped into the same cluster. Even though the object being tracked is rigid. Since these clusters are casting votes with very small weights.00005 0. smaller value results in better performance as we expected.77 Sequence in Fig.10* 0. Thus. Our algorithm tracks both the location and the size of the target object successfully throughout the sequence.62 0.40 0. 7 demonstrates the tracking of a vehicle in Performance Evaluation of Tracking and Surveillance (PETS) data set using the proposed algorithm.10* 0. 5 and 6 is 320 × 240. Sequence in Fig. α 0.79 0. the KBT algorithm tracks the object with a much larger bounding rectangle than the object actually is.79 Sequence in Fig. 4(e). the back projection method is utilized to segment the object from the background. The pruning threshold ξ is used to remove clusters whose probability to be the object model are small. (g). but with its contour occasionally expands to include the neighboring objects and shrinks to capture only a portion of the target object as can be seen in the third and fourth columns of Fig.77 0.

Since the object being tracked is rigid. variation in illumination affects the performance of our algorithm greatly. Since our algorithm uses the color information to represent the object. (a) KBT algorithm. In order to ensure that the target region at the current and following frames are overlapping. The robustness of the proposed algorithm can be improved by using multiple features together with the color feature. suggest to initialize the target model using a 21 × 31 size region. However. deals with this problem by simply enlarging the search region by setting η = 3. Discussion We presented a new object tracking algorithm based on the generalized Hough transform method. allowing more pixels around the previous object location to vote for the new location of the object. the target appearance changes drastically when the person walks into the shadow area. We consider that several features such as edges and textures can compensate the weakness of the color feature. (b) The proposed algorithm. 9(a) and (b). Performance of the proposed algorithm under illumination change. Note that our algorithm obtains better estimate of both the location and size of the ball. We show both the success case and the failure case in Fig. Comaniciu et al. 5. ζ = 2 is used. With the proposed spatial color histogram model. in order to keep the content of this paper concise. Our algorithm. respectively. (a) Successful tracking. however. respectively. 9. . The result of the KBT algorithm and our algorithm is presented in Fig. which is larger than the ball itself. In the proposed algorithm. 9 shows the experiment result of tracking a fast moving object. the movement of the ball from frame to frame is larger than its size. 10(a) and (b). / Image and Vision Computing 29 (2011) 850–860 859 a b Fig. Here we also show the result of tracking in the case of illumination change. The table tennis sequence. the object position a b Fig.Suryanto et al. when tracking using their KBT algorithm. Fig. Note that in Fig. 10(b). In this sequence. the object is represented by using a spatial color histogram model which has a form similar to the R-Table of the generalized Hough transform. 10. (b) Failure tracking. only the color information is used and the combination with other features remains as our future work.

Yonemoto. Intell. pp. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Shah. [22] K. Arita. 2000. Grimson.09. Thus we expect that an alternative representation that is invariant to size and shape changes will improve the robustness of the algorithm. Yagi. Y.T. Bay. Supplementary data Supplementary data to this article can be found online at doi:10.-J. Yonemoto. pp.2011. Harwood. 2. Combined object categorization and segmentation with an implicit shape model. Principles. [13] H. Isard. Analysis of human motion using snakes and neural networks. Intell. Taniguchi. 714–725. S. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Shi. Detection and tracking of point features. 593–600. O. Mach. Leone. 1987. 1016/j. J. 59. Comaniciu. Surv. CONDENSATION—conditional density propagation for visual tracking.S. R. In order to extract the object silhouette. pp. Haritaoglu. 1158–1163. Lowe.imavis. Proceedings of International Joint Conference on Artificial intelligence. B. Experimental results indicate that the proposed algorithm can track the object successfully even when the object shares similar color with the background and changes in size. Rangarajan. S. A. [3] Y.G. in the next frame is estimated by requesting each pixel to vote for the location of the object. Vis. D. 2. D. [18] I.M. 2. Ko. 674–679. Chalidabhongse. Generalizing the Hough transform to detect arbitrary shapes. Real-time tracking using level sets. Real time face and object tracking as a component of a perceptual user interface. Comput. Mach. Vision 29 (1998) 5–28. D. Intell. 2002. 167–170. [17] A. Mach.H. A. [2] M. pp. T.J. Distante. 214–219. 17–32. Karl. Lindenbaum. 2005.S. [19] J. Harwood. [16] C. Problems. [10] S. Probabilistic center voting method for subsequent object tracking and segmentation. Engineering and Technology. Integrating color and shape texture features for adaptive realtime object tracking. [5] C. S. Our algorithm can extract the object silhouette even when the video sequence is taken by a moving camera. V. Image Underst. 1998. C. [23] F. 92–103. [21] R. S. Adaptive background mixture models for real-time tracking. Leonardis. A. An iterative image registration technique with an application to stereo vision. Ess. Acknowledgments This work was supported by the Mid-career Researcher Program through NRF grant funded by the MEST (No.-I. Kanade.G. Kim. W. pp. World Academy of Science. 2004. M. Bradski. R. 2000. L. 2011-0000200). 2005. Tools and techniques for video performance evaluation. Shi. 1981. 246–252. C. R. T. 2005. D. Birchfield. Proceedings of the Advanced Video and Signal Based Surveillance. M. 1991. 450–454. pp. Pattern Anal. Adams. pp. [8] G. Pattern Anal.-H. 2. Kernel-based object tracking. Kanade. the relative distance of a cluster of pixels from the object center is currently expressed inpixel-distance.-H. Arita. S. Meer. 213–218. Good features to track. 38 (2006) 13. Davey. Ballard. D. pp. [20] D. J. Tuytelaars. [12] D. Tomasi.R. Rivlin. L. Mihalcik. E. Harwood. 34–41. Elgammal. IEEE Trans. Comput. Many of the algorithms developed for human motion understanding are based on the analysis of object silhouettes [21–24]. Javed. Carnegie Mellon University. [6] D. References [1] A. 1994. L. N. 2009. pp. L. Technical Report. R.-I. Int. pp. Image Process. T. Ramesh. Yilmaz. Tracking by affine kernel transformations using color and boundary cues. W.D. Vision 60 (2004) 91–110. Readings in Computer Vision: Issues.H. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Lucas. Hoshino. Object tracking: a survey. / Image and Vision Computing 29 (2011) 850–860 [7] H. 2. 751–767. Wang. Non-parametric model for background subtraction. Proceedings of the Second International Workshop on Articulated Motion and Deformable Objects. pp. Davis. Davis.008. Proceedings of the First International Workshop on Articulated Motion and Deformable Objects. IEEE Trans. Comput. Taniguchi. Real-time foregroundbackground segmentation using codebook model. D. George. Kim. Workshop on Statistical Learning in Computer Vision. T. 22 (2000) 809–830. Furthermore. IEEE Trans. Schiele. A. these algorithms usually employ a simple background subtraction technique which is effective only when the video is captured by a static camera. ACM Comput.-K. Doermann. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. [15] K. 4. [24] I. Proceedings of European Conference on Computer Vision. Tabb. 2000. Leichter. D. P. 1999. [14] B. Speeded-up robust features (surf). IEEE Trans. The proposed algorithm can be employed into applications such as understanding human motion in video sequence. 17 (2008) 235–240. Stauffer. and Paradigms.C. Kim Suryanto. Buccolieri. ECCV. 110 (2008) 346–359. [9] D. 31 (2009) 164–171. Appendix A. Proceedings of International Conference on Pattern Recognition. [4] J. Real-Time Imaging 11 (2005) 172–185. Real-time human motion analysis based on analysis of silhouette contour and color blob. Hoshino. pp. S. Tomasi. . The future work shall be focused on integrating the boundary information into color histogram to further improve the reliability of the algorithm. Human posture recognition using active contours and radial basis function neural network. W4: real-time surveillance of people and their activities. Leibe. Davis. Van Gool. pp. Spatiograms versus histograms for region-based tracking. 48–57. Blake. pp. Int. [11] B. Park. 25 (2003) 564–575. Distinctive image features from scale-invariant keypoints. Pattern Anal.860 Suryanto et al. IEEE Workshop on Applications of Computer Vision.