59 views

Uploaded by raghu247

save

- A Methodology for Extracting Standing Human Bodies from Single Images
- Based on correlation coefficient in image matching
- Lane Change Detection and Tracking for a Safe-Lane Approach in Real Time Vision Based Navigation Systems
- A Real Time Approach for Indian Road Analysis using Image Processing and Computer Vision
- Object Detection Using a Max-margin Hough Transform - Maji, Malik - Proceedings of IEEE Conference on Computer Vision and Pattern Recognition - 2009
- Chapter 10
- Object Extraction in Data Mining Framework for Video Sequenc
- Color Image Segmentation Based on Jnd Color Histogram
- A Novel Approach for Hand Analysis Using Image Processing Techniques
- Real-Time Object Detection for Smart Vehicles
- IMAGE PROCESSING BASED GIRTH MONITORING AND RECORDING SYSTEM FOR RUBBER PLANTATIONS
- A Benchmark for Breast Ultrasound Image Segmentation (BUSIS)
- Segmentation and Classification of Point Clouds From Dense Aerial Image Matching
- Neural Network Approach for Eye Detection
- cvip0
- 2009 Humanoids Drkaakd
- Influence of Local Segmentation in the Context of Digital Image Processing
- A Survey of Methods and Strategies in Character Segmentation
- 10.1.1.23
- Histogram Specification a Fast and Flexible Method
- Recognition of Urdu Scripts
- 2014 IEEE MATLAB TITLES for Btech
- Contents
- recognition of Urdu Script
- Segmentation of Brain MRI Images using Fuzzy c-means and DWT
- Object Base Image Classifications Bangladesh
- Av 23274279
- S R S
- Iris Recognition
- Moving Object Detection for Video Surveillance
- greedy_method.pdf
- IEEE_MM
- fator_impacto.pdf
- Fator Impacto

and segmentation

☆

Suryanto, Dae-Hwan Kim, Hyo-Kak Kim, Sung-Jea Ko ⁎

School of Electrical Engineering, Korea University, Seoul, South Korea

a b s t r a c t a r t i c l e i n f o

Article history:

Received 13 August 2010

Received in revised form 2 September 2011

Accepted 23 September 2011

Keywords:

Object tracking

Spatial color

Histogram

Center voting

Back projection

Generalized Hough transform

In this paper, we introduce an algorithm for object tracking in video sequences. In order to represent the ob-

ject to be tracked, we propose a new spatial color histogram model which encodes both the color distribution

and spatial information. Using this spatial color histogram model, a voting method based on the generalized

Hough transform is employed to estimate the object location from frame to frame. The proposed voting based

method, called the center voting method, requests every pixel near the previous object center to cast a vote

for locating the new object center in the new frame. Once the location of the object is obtained, the back pro-

jection method is used to segment the object from the background. Experiment results showsuccessful track-

ing of the object even when the object being tracked changes in size and shares similar color with the

background.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

The tracking of moving objects from frame to frame in real time

video sequences captured by the moving camera is a highly challeng-

ing task. The conventional background subtraction techniques cannot

be employed to locate the moving object in such video sequences due

to the constant changes of the background scene. Additional chal-

lenges come from complex object motion, non-rigid object tracking,

partial occlusion, illumination change, and real time processing re-

quirement. Despite these difﬁculties, many tracking algorithms have

been proposed over past decades. For a comprehensive review of var-

ious tracking algorithms, the readers can refer to [1].

Two important aspects that determine the performance of the track-

ing algorithms are target representation and target localization. Target

representation refers to how the object to be tracked is modeled and

target localization deals with how the search of the corresponding ob-

ject in the following frame is accomplished. Popular models used for

target representation are object contour [2,3], feature point [4–7], and

color histogram [8–10]. Depending on the target representation

model, various target localization techniques can be employed.

Tracking with object contour performs well even when the object

being tracked is not rigid. The CONDENSATION algorithm [2] parameter-

izes the contour using the B-Spline curve and performs tracking using the

particle ﬁltering method. At each frame, a total of N particles, with each

particle represented by one B-Spline curve, have to be maintained and

updated based on the local edge map. In general, a high number of parti-

cles are required to maintain a good tracking performance. The computa-

tion requirement of this algorithm is relatively high, which limits its

application in simultaneous multiple object tracking. Another algorithm

for object contour tracking represents the object contour using two linked

lists and a level set array [3]. Contour adaptationis realizedby performing

switching on elements of the linked lists. At each frame, the elements of

the linked lists are adjusted to ﬁt the object contour. The computation

complexity of this algorithm is relatively low, but the non-parametric

representation of the contour makes the contour unconstrained. As the

object moves into the backgroundwhichshares similar color withthe ob-

ject, the object contour expands to include this backgroundregion, result-

ing in tracking failure.

Tracking using feature points produces good results when the

object has rich textures. For a good feature point, the iterative Newton

Raphson minimization algorithm can be employed to ﬁnd its corre-

sponding point in the next frame [5,11]. Tracking with feature points

is fast andreliable. However, when the object turns around or is partially

occluded, the performance of the tracking algorithm deteriorates.

The use of the color histogram for target representation has been

increasingly popular due to its robustness against object pose changes

and partial occlusion. Bradski proposed an algorithm called CAM-

SHIFT [8] which tracks the face in video sequences using the color his-

togram of the skin. In order to locate the face from frame to frame, an

Image and Vision Computing 29 (2011) 850–860

☆ This paper has been recommended for acceptance by Richard Bowden.

⁎ Corresponding author at: School of Electrical Engineering, Korea University, Anam-

dong, Sungbuk-Gu, Seoul, 136-713, South Korea. Tel.: +82 2 3290 3228; fax: +82 2

925 5883.

E-mail addresses: suryanto@dali.korea.ac.kr (Suryanto), dhkim@dali.korea.ac.kr

(D.-H. Kim), hkkim@dali.korea.ac.kr (H.-K. Kim), sjko@korea.ac.kr (S.-J. Ko).

0262-8856/$ – see front matter © 2011 Elsevier B.V. All rights reserved.

doi:10.1016/j.imavis.2011.09.008

Contents lists available at SciVerse ScienceDirect

Image and Vision Computing

j our nal homepage: www. el sevi er . com/ l ocat e/ i mavi s

iterative procedure based on the mean shift is applied to center the

object rectangle in the face region. At each iteration, the rectangle po-

sition is moved to a new position until convergence. Even though the

Bradski's algorithm was developed for face tracking, it can be used to

track any object of interest. Comaniciu et al. proposed the Kernel

Based Tracking (KBT) algorithm which uses the kernel weighted

color histogram to represent the color distribution of the object [9].

In the kernel weighted color histogram representation, the pixels lo-

cated near the object boundary are given smaller weights while the

pixels around the center of the object are assigned larger weights.

The object localization is performed iteratively by a mean shift meth-

od similar to CAMSHIFT. The use of the kernel weighted histogram for

the mean shift based algorithms signiﬁcantly improves the tracking

performance. However, the algorithm does not perform well when

the object being tracked changes in size.

In this paper, we introduce a newobject representation model and

localization method. In order to represent the object to be tracked, we

propose a new class of spatial color histogram model, by adopting the

concept of the spatiogram[10]. Each bin in the spatial color histogram

model contains the information on the number of pixels belonging to

the color bin and the positions of those pixels relative to the object

center. The localization of the object in the next and following frames

is accomplished using the center voting method based on the gener-

alized Hough transform voting scheme [12]. Once the object location

is obtained, the back projection method is utilized to segment the ob-

ject from the background. With the segmented object, the current ob-

ject size is estimated and the search range is adjusted accordingly.

This paper is organized as follows. In the next section, we brieﬂy

review the generalized Hough transform and the spatiogram which

are closely related to our proposed algorithm. Then, we present

our algorithm in detail in Section 3. Experiment results are given in

Section 4. Section 5 concludes the content of the paper.

The preliminary version of this work has been published in [13].

We extend our previous work by utilizing the kernel to create a

more reliable model and by introducing the spatial color histogram

update mechanism.

2. Related works

2.1. The generalized Hough transform

The generalized Hough transform has been widely used to detect

the shape in the image [12]. It consists of two main parts; the con-

struction of the R-Table to represent the shape and the voting scheme

to detect the shape in the image.

2.1.1. Construction of the R-Table

Given an arbitrary shape as in Fig. 1(a), for each point x

→

¼ x; y ð Þ at

the boundary of the shape, the gradient direction φ and its relative

position r

→

¼ r

x

; r

y

_ _

from a reference point c

→

¼ c

x

; c

y

_ _

are comput-

ed. Then, the relative position vectors r

→

¼ x

→

− c

→

are stored as a

function of the gradient direction in the R-Table. In general, a gradient

direction φ may have many values of r

→

. Table 1 shows a general form

of the R-Table.

2.1.2. Voting scheme for the shape detection

For each edge pixel x

→

in the image in Fig. 1(b), its gradient direc-

tion φ=φ′ is calculated. Then, each pixel casts votes into the vote ac-

cumulator at positions x

→

− r

→

where r

→

is the set of all position

vectors indexed by φ′ in the R-Table. Fig. 1(c) shows the vote accu-

mulator for the detection of the shape in Fig. 1(a) from the test

image in Fig. 1(b). The pixel with the highest intensity, i.e. the pixel

with the highest vote, indicates the location of the shape.

In Section 3, we show how to adopt this generalized Hough trans-

form technique to track the object from frame to frame in a video se-

quence. Note that the underlying idea behind the generalized Hough

transform has also been applied successfully to the object category

detection [14].

2.2. Tracking by spatiogram

The spatiogram [10] represents the object to be tracked as

h ¼ n

b

; μ

b

; Σ

b

f g

b¼1;…;B

; ð1Þ

where n

b

is the number of object pixels whose quantized values fall

into the b-th bin, and μ

b

and Σ

b

are the mean vector and the covari-

ance matrix, respectively, of the coordinates of those pixels. The num-

ber B is the total number of bins in the spatiogram.

The object localization is performed by determining the location

yεR

2

in the image where the similarity between the spatiogram of

the object h and the spatiogram at location y, h(y)={n

b

(y), μ

b

(y), Σ

b

(y)}, is maximized. The spatiogram h(y) is calculated from an image

region whose center is at location y and has the same size as the ob-

ject to be tracked.

The similarity between two spatiograms is computed as the

weighted sum of the Bhattacharyya similarity between two histo-

grams

ρ h y ð Þ; h ð Þ ¼ ∑

B

b¼1

ψ

b

ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

n

b

y ð Þn

b

_

; ð2Þ

a b c

Fig. 1. Geometry for the generalized Hough transform. (a) Model shape. (b) Test image. (c) Vote accumulator.

Table 1

The R-Table of an arbitrary shape.

Gradient direction Positions

0 r

→

_ ¸

¸

¸ r

→

¼ x

→

− c

→

; ϕ x ð Þ ¼ 0g

Δϕ r

→

_ ¸

¸

¸ r

→

¼ x

→

− c

→

; ϕ x ð Þ ¼ Δϕg

2Δϕ r

→

_ ¸

¸

¸ r

→

¼ x

→

− c

→

; ϕ x ð Þ ¼ 2Δϕg

… …

851 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

where the weighting function is given by

ψ

b

¼ η exp −

1

2

μ

b

y ð Þ−μ

b

ð Þ

T

Σ

⌢

−1

b

μ

b

y ð Þ−μ

b

ð Þ

_ _

; ð3Þ

with η as the Gaussian normalization constant and Σ

⌢

b

−1

¼

Σ

−1

b

y ð Þ þΣ

−1

b

. The location y that maximizes the similarity function

in Eq. (2) is determined either by using the gradient descent mean

shift method or the exhaustive local search. In general, the mean

shift based localization is several order faster than the exhaustive

local search.

In our algorithm, we modify this spatiogram to construct a table

similar to the R-Table and then use the voting mechanism of the gen-

eralized Hough transform to ﬁnd the location of the object to be

tracked. We describe our algorithm in detail in the next section.

3. Proposed algorithm

Given an object centered at location c

→

¼ c

x

; c

y

_ _

in the current

frame, the tracking objective is to ﬁnd the new object center c

→

′

in

the next and the following frames. The object to be tracked is referred

to as the target object.

Since our main objective is to track the object, we assume that at

the initial frame, the object has been segmented out from the back-

ground. This object segmentation can be done at the initial frame by

performing the background subtraction algorithms [15–17] or even

by manual selection by a human operator.

3.1. Target representation

Let x

→

i

¼ x

i

; y

i

ð Þ

_ _

i¼1…N

be the location of the pixels belonging to

the target object, c

→

be the location of the object center, and l

x

and l

y

be half the widthand height of the rectangle bounding the object region

as shown in Fig. 2(a). In order to represent the target object, we use the

spatial color histogram model h ¼ μ

→

b;k ð Þ

; n

b;k ð Þ

_ _

b¼1;…;B;k¼1;2;…

, where

μ

→

b;k ð Þ

is the mean vector representing the position of the k-th cluster

of pixels relative to the object center. A cluster is deﬁned as a group of

pixels whose quantized values fall into the same b-th bin of the histo-

gram and which are located close to each other. The n

(b, k)

is the proba-

bility value associated with the number of pixels belonging to that

particular cluster. Mathematically,

n

b;k ð Þ

¼ C ∑

N

i¼1

K x

→

i

_ _

δ

ibk

; ð4Þ

where C is the normalization constant to ensure that ∑

b;k

n

b;k ð Þ

¼ 1

which is given by

C ¼

1

∑

N

i¼1

K x

→

i

_ _

; ð5Þ

and δ

ibk

=1 if the value of pixel x

→

i

is quantized into the b-th bin and its

distance from the μ

→

b;k ð Þ

is smaller than threshold ε, otherwise δ

ibk

=0.

The pseudo-code for estimating ěcμ

(b, k)

is given in Algorithm 1.

Algorithm 1. Estimation pf μ

→

b;k ð Þ

Input: pixel location x

→

i

of the target object

Output: cluster of pixels μ

→

b;k ð Þ

for each x

→

i

:

1. Calculate the bin index b of pixel x

→

i

.

2. Create a new cluster μ

→

b;1 ð Þ

¼ x

→

i

if there are no cluster in the b-

th bin.

3. Otherwise, for each k:

(a) calculate the distance of x

→

i

from μ

→

b;k ð Þ

,

(b) include x

→

i

to μ

→

b;k ð Þ

if the distance is smaller than ε,

(c) update μ

→

b;k ð Þ

.

4. Create a new cluster μ

→

b;k ð Þ

¼ x

→

i

if x

→

i

is not included in any

existing clusters.

As in [9], here we also employ a monotonically decreasing kernel

function K x

→

i

_ _

to assign smaller weights to pixels located farther

away from the object center; because these pixels are less reliable,

a b c

Fig. 3. Illustration of object tracking using the proposed algorithm. (a) The target object. (b) Center voting procedure. (c) Tracking result.

a b c d

Fig. 2. (a) The target object. (b) Object region and background region. (c) 2-D object kernel. (d) 2-D background kernel.

852 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

since they are often affected by occlusion or interference from the

background. The kernel function K x

→

i

_ _

can be written as

K x

→

i

_ _

¼

1−d

2

x

→

i

_ _

; if d x

→

i

_ _

≤1;

0; otherwise;

_

ð6Þ

where d x

→

i

_ _

is the standardized Euclidean distance between pixel

x

→

i

and the object center c

→

, calculated by ﬁrst normalizing (x

i

−c

x

)

and (y

i

−c

y

) by l

x

and l

y

, respectively, i.e.:

d x

→

i

_ _

¼

ﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ

x

i

−c

x

l

x

_ _

2

þ

y

i

−c

y

l

y

_ _

2

¸

¸

¸

_

: ð7Þ

The standardized Euclidean distance in Eq. (7) is similar to the equa-

tion of the ellipse. In fact, if d x

→

i

_ _

¼ 1, we obtain an ellipse centered

on c

→

with the semi-minor axis l

x

and the semi-major axis l

y

as shown

in Fig. 2(b). For pixels located outside this ellipse, which are mostly the

background pixels, their standardized distances from the object center

are greater than one, and consequently their kernel weights K(cx

i

) be-

come zero. On the other hand, for pixels inside the ellipse, their standard-

ized distances are smaller than one. The kernel function K x

→

i

_ _

assigns

larger weights to the pixels located closer to the center as shown in

Fig. 2(c).

Unlike the spatiogram which has a single mean and a single co-

variance for each bin, our spatial color histogram model allows each

histogram bin to have more than one mean vector. Each mean vector

represents a cluster of pixels sharing a similar color which are located

in close proximity to each other.

For example, consider a target object in Fig. 3(a). The alphabets in-

dicate the color histogram bins associated with the pixels. The spatial

color histogram for this object is shown in Table 2. For simplicity, we

use auniform kernel K x

→

i

_ _

¼ 1 for this particular example. Note that

the two pixels belonging to the same color histogram bin a are not

clustered together due to their spatial distance. Thus, the correspond-

ing histogram bin has two mean vectors. On the other hand, since the

two pixels with bin index e are adjacent to each other, they are

grouped into the same cluster with a single mean.

Table 3 shows that the proposed spatial color histogram model

has a structure similar to the R-Table. In the next subsection, we

show how to use this spatial color histogram model to locate the tar-

get object in the following frames.

3.2. Target localization

Target localization consists of center voting and back projection

steps. In the center voting step, every pixel located near the previous

object center c

→

is required to cast a vote for the location of the object

center. Rules for the center voting procedure are as follows:

1. Only the pixel whose color exists in the target model can cast a vote.

2. Each pixel x

→

i

, whose quantized value falls into b-th bin, casts a

vote on position x

→

i

−μ

→

b;k ð Þ

_ _

.

3. More reliable pixels cast votes with higher weights than less reli-

able pixels.

Fig. 3(b) illustrates the center voting procedure. The arrows indi-

cate where the pixels cast their votes on. The pixels labeled × are

the pixels whose colors do not fall into any of the bins of the target

model histogram. Thus, these pixels do not cast any vote on the loca-

tion of the object center. Pixels located at x

→

i

labeled b, c, and e cast

their votes on the position indicated by their mean vectors at

x

→

i

− 0; −1 ð Þ

_ _

, x

→

i

− 1; −1 ð Þ

_ _

, and x

→

i

− −0:5; 1 ð Þ

_ _

, respectively.

Since bin a contains two mean vectors, pixels whose colors fall into

this bin cast two votes. It can be seen that the location in the image

receiving the highest number of votes is located at pixels labeled d,

which is the new location of the object c

→

′

as shown in Fig. 3.

In this example, we have made a very naive assumption that the

background does not share the same color with the object. The as-

sumption, of course, does not hold in most cases, which causes the al-

gorithm to fail to correctly estimate the location of the object center.

This problem can be solved by adding the third rule, i.e. reliable pixels

cast votes with higher weights.

Naturally, a pixel is regarded as a reliable pixel if its color exists in

the object, but not in the background. A straightforward way to quan-

tify the reliability of a pixel is to use the probability difference as

employed in [18]. Thus, the voting weight w

(b, k)

for a vote casted

using mean vector μ

→

b;k ð Þ

by a pixel whose color belongs to the b-th

bin of the histogram can be expressed as

w

b;k ð Þ

¼ max

n

b;k ð Þ

−m

b

n

b;k ð Þ

þ m

b

; 0

_ _

; ð8Þ

where n

(b, k)

is given in Eq. (4) and m

b

is the probability value associ-

ated with the number of pixels in the background whose quantized

values fall into the b-th bin.

Redeﬁne x

→

i

¼ x

i

; y

i

ð Þ

_ _

i¼1…N

as the pixels in the background re-

gion, located between the ellipse with semi-minor axis l

x

and semi-

major axis l

y

and the ellipse with semi-minor axis l

x

+Δ and semi-

major axis l

y

+Δ, both centered at c

→

as shown in Fig. 2(b). The prob-

ability value m

b

, which is also the value of the background histogram

at b-th bin, can be calculated by

m

b

¼ C

BG

∑

N

i¼1

K

BG

x

→

i

_ _

δ

ib

; ð9Þ

Table 2

Spatial color histogram for the target object in Fig. 3(a).

bin index h ¼ μ

→

b;k ð Þ

; n

b;k ð Þ

_ _

a {(−1,−1),1/7}, {(1,1),1/7}

b {(0,−1),1/7}

c {(1,−1),1/7}

d {(0, 0),1/7}

e {(−0.5, 1),2/7}

Table 3

Attributes of the R-Table and the proposed model.

R-Table Our proposed model

Reference point c

→

Object center c

→

Gradient direction ϕ Histogram bins b

Positions r

→

Mean vectors μ

→

b;k ð Þ

Table 4

The parameters.

Param Meaning Suggested value

Object motion related parameters

η Speed of the object For most tracking application: 2. If the object

displacement between frames is larger than

the object size, then 3 or larger.

ε Spatial clustering

threshold

5

ζ Object rigidity threshold 10

Model update related parameters

α Object size update ratio 0.1

β Background model update

ratio

0.5

γ Object model update ratio 0.1

ξ Pruning threshold 0.00001

853 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

where

C

BG

¼

1

∑

N

i¼1

K

BG

x

→

i

_ _

is a normalization constant and δ

ib

=1 if the value of pixel x

→

i

is quan-

tized into the b-th bin, otherwise δ

ib

=0. K

BG

x

→

i

_ _

is the background

kernel function as used in [18], which is given by

K

BG

x

→

i

_ _

¼

1−λ σ−d x

→

i

_ _ _ _

2

; if d x

→

i

_ _

≥1

_ _

and d

Δ

x

→

i

_ _

≤1

_ _

;

0; otherwise;

_

ð10Þ

where

σ ¼

1

2

1 þ

d x

→

i

_ _

d

Δ

x

→

i

_ _

_

_

_

_

; ð11Þ

λ ¼

1

σ−1 ð Þ

2

; ð12Þ

and d

Δ

x

→

i

_ _

is the standardized Euclidean distance as in Eq. (7) but

with the normalizer l

x

+Δ and l

y

+Δ.

The ring-shaped background kernel function K

BG

in Eq. (10) assigns

weights with value zero for pixels located outside the background re-

gion, as shown in Fig. 2(d). Note that the widthof the backgroundregion

is determined by the value Δwhich can be calculated as a function of the

object dimension, i.e. Δ=η⋅min(l

x

, l

y

), where η is a parameter that de-

pends on the speed of the object. For most tracking application where

the object motion between frames is not larger than the object size,

η=2 is sufﬁcient. Larger value of η should be used when faster moving

objects are tracked.

Furthermore, as the object in the new frame tends to be located

near its previous location, we only have to collect the votes from

pixels located nearby its previous center c

→

. This set of pixels, from

whom we collect the votes, is the search range of the algorithm and

is shown in Fig. 3(b) as a region enclosed by the dash-line rectangle

centered on c

→

denoted as Rect c

→

_ _

. The dimension of this search re-

gion is 2(l

x

+Δ)×2(l

y

+Δ).

Once we obtain the new object center c

→

′

, we re-scan the pixels in

the neighborhood to see which pixels have casted the correct votes.

The pixels that have casted the correct votes for the object center are

marked as the object pixels. Since the object being tracked may not be

rigid and may grow or shrink in size, the relative location of pixels

fromthe object center can change slightly, causing themto vote slightly

off from the object center. In order to include these pixels into the fore-

ground, we allowpixels that have casted their votes somewhere within

distance ζ fromthe object center c

→

′

to be categorized as object pixels as

well. The pseudocode is given in Algorithm 2.

Algorithm 2. Back projection

for all x

→

∈Rect c

→

_ _

distance←‖vote x

→

_ _

− c

→

′

‖

if distancebζ then

x

→

is foreground pixel

else

x

→

is background pixel

end if

end for.

The result of this back projection method is a foreground image

where the pixels belonging to the object are marked as 1's. With

this foreground image, we can easily estimate the change in object

size and then re-adjust the dimension of the object and background

kernel accordingly. Let l

x

*

and l

y

*

be half of the width and length of

the rectangle bounding the foreground region obtained by the back

projection method. The new object size is calculated by

l

x

0

¼ 1−α ð Þ⋅l

x

þα⋅l

4

x

;

l

y

0

¼ 1−α ð Þ⋅l

y

þα⋅l

4

y

;

ð13Þ

where α determines how fast we update the old object size with the

newly obtained size.

a b c d

e f g h

Fig. 4. Tracking result using the proposed algorithm. (a) Target object at the initial frame. (b) Tracking result at frame 30. (c) Vote map for frame 30. (d) Segmented object at frame

30. (e) Tracking result at frame 60. (f) Segmented object at frame 60. (g) Tracking result at frame 140. (h) Segmented object at frame 140.

854 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

3.3. Model update

During tracking, the object and background models have to be con-

tinuously updated to reﬂect the change in object and background in-

formation. Updating the background model is quite straightforward.

Let m

b

*

be the background histogram calculated at the current frame

by centering the background kernel at c

→

′

. The background histogram

is updated as

m

b

¼ 1−β ð Þ⋅m

b

þβ⋅m

4

b

; ð14Þ

a

b

c

d

e

Fig. 5. Comparison of the tracking results of the (a) KBT, (b) level set, (c) spatiogram, (d) EFS, and (e) the proposed algorithm.

855 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

where β determines howmuch the proportion of the newly calculated

background histogram is used to update the background histogram.

Since the background tends to change quickly from frame to frame,

we use β=0.5.

Updating the object histogram involves merging, appending, and

pruning. Let h

4

¼ μ

→4

b;l ð Þ

; n

4

b;l ð Þ

_ _

b¼1;…;B;l¼1;2;…

be the spatial color histo-

gram calculated at the current frame by centering the object kernel at

c

→

′

with the updated kernel dimensions (l

x

, l

y

). The spatial color histo-

gram of the object is updated by:

1. Merge μ

→

b;k ð Þ

with μ

→4

b;l ð Þ

by simply taking their average if they are

matched, i.e. the distance between them is smaller than the clus-

tering threshold ε. The corresponding probability is updated by

(1−ma)⋅n

(b, k)

+γ⋅ n

(b, l)

*

. If μ

→

b;k ð Þ

does not ﬁnd a match in any

entry of the h

*

, its corresponding probability value is updated by

(1−γ)⋅n

(b, k)

.

2. Append μ

→4

b;l ð Þ

to the object histogram model if a match cannot be

found in h.

3. Prune μ

→

b;k ð Þ

out of the model if its corresponding probability n

(b, k)

is

smaller than a certain threshold ξ. Practically, ξ=0.0001 can be

chosen.

The parameter γ is a histogram update parameter similar to β. As

the object appearance should not change dramatically from frame to

frame, we set this parameter to 0.1. After updating the object histo-

gram, its probability value has to be normalized by dividing each

probability value n

b, k

by ∑

b;k

n

b;k ð Þ

.

3.4. Algorithm summary

The overall summary of the proposed algorithm is given in

Algorithm 3. Additionally, Table 4 presents the list of the parameters

used in this paper.

3.5. Our contributions

The algorithm we proposed in this paper is the result of meshing the

spatiogram and the generalized Hough transform. In this subsection,

we highlight our contributions and show explicitly how our algorithm

differs from the existing approaches.

Our target representation model is derived from the spatiogram

proposed in [10]. However, unlike [10] which tracks the object in

the following frames by ﬁnding the image region whose spatiogram

representation is most similar to the spatiogram of the target object,

Fig. 6. Tracking object which shares very similar color with the background.

Fig. 7. Tracking result using the proposed algorithm for the car-rear sequence from PETS2001 data set.

856 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

we propose an adaptive voting method based on the generalized

Hough transform to locate the target object. It should be noted that

the use of the voting scheme is feasible only after we modify the spa-

tiogram into a form similar to the R-Table of the generalized Hough

transform.

Algorithm 3. Algorithm summary

Initialization

Given an object to be tracked at a spatial location c

→

in the initial

frame:

1. Create the object histogram h using Algorithm 1 and Eq. (4).

2. Create the background histogram m

b

by using Eq. (9).

Tracking

At the following frame:

3. Request each pixel inside the search region Rect c

→

_ _

to cast

the votes with the voting weights according to Eq. (8).

4. Assign the location in the image which receives the highest

votes as the new object location c

→

′

.

5. Perform the back projection algorithm.

Model update

6. Update the dimension of the target object using Eq. (13).

7. Create the new background histogram m

b

*

at the new object

location c

→

′

and update the background histogram m

b

by

using Eq. (14).

8. Create the new object histogram h

*

at the new object location

c

→

′

and update the object histogram h.

9. Go back to step 3.

Our proposed tracking algorithm has several advantages over the

existing methods. First, by allowing each bin of our spatial color his-

togram model to have more than one mean vector, we obtain a target

representation model that has richer spatial information than the

spatiogram. Second, the proposed adaptive voting method explicitly

considers the existence of background region which shares similar

color with the object and suppresses the contribution of those colors

in tracking. Third, the proposed algorithm segments the object region

from the background using the simple back projection method.

4. Experiment results

In our experiments, we manually select a target object at the ini-

tial frame and model it using the proposed spatial color histogram

model presented in Section 3.1. For all experiments, we use a

16×16×16-bins RGB color histogram and set the spatial clustering

threshold ε=5 and α=0.1.

Fig. 4(a) and (b) shows the initial frame with the target object

marked with green color and the tracking result at frame 30 with

the predicted object marked by rectangle, respectively. In order to vi-

sually illustrate how the center voting procedure works, we present a

vote-map for this particular frame in Fig. 4(c). The high intensity

pixels represent the locations associated with a large number of

votes, i.e. the location of the object center c

→

′

to be estimated. After

Table 7

The average dice coefﬁcients given various values of ζ.

ζ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

2 0.41 0.40 0.32

6 0.78 0.76 0.83

10* 0.80 0.75 0.83

15 0.80 0.60 0.74

Table 5

Time complexity of algorithms (time in ms).

Algorithm Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

KBT 2.76 0.67 5.72

Spatiogram 2.39 0.26 3.62

Level set 1.86 0.69 3.26

EFS 25.41 22.56 36.46

Proposed 4.81 1.64 6.31

Table 6

The average dice coefﬁcients given various values of ε.

ε Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

2 0.76 0.73 0.82

5* 0.80 0.75 0.83

10 0.30 0.66 0.67

15 0.22 0.72 0.62

a

b

c

Fig. 8. The dice coefﬁcient for each frame of the sequence in (a) Fig. 5, (b) Fig. 6, (c)

Fig. 7.

857 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

the object center is obtained, the back projection method is utilized to

segment the object from the background. This segmentation result is

shown in Fig. 4(d). Fig. 4(e), (f), (g), and (h) shows more tracking re-

sults along with the segmented object results. As the object being

tracked is not rigid, ζ=10.

We compare our algorithm with the kernel based tracking (KBT)

[9], the level set [3], the spatiogram based mean shift [10], and the ex-

tended feature selection (EFS) algorithm [19] and show some of the

tracking results in Fig. 5. The performance differences among these al-

gorithms become apparent when the target object becomes smaller

and smaller as it moves away from the camera. As shown in the

third column of Fig. 5(a), the KBT algorithm tracks the object with a

much larger bounding rectangle than the object actually is. This

poor estimation of the target object size contributes to tracking fail-

ure in the later frames as shown in the fourth column of the ﬁgure.

The EFS algorithm also fails for the same reason. The spatiogram

based mean shift algorithm loses the target object due to sudden

movement of the camera. The level set algorithm shows relatively

good tracking performance, but with its contour occasionally expands

to include the neighboring objects and shrinks to capture only a por-

tion of the target object as can be seen in the third and fourth columns

of Fig. 5(b). Our algorithm tracks both the location and the size of the

target object successfully throughout the sequence.

In Fig. 6, we present the proposed algorithm result when tracking

an object which shares similar color with the background. The robust

performance of our algorithm against the background with similar

color is achieved due to the use of adaptive voting weights.

Fig. 7 demonstrates the tracking of a vehicle in Performance Eval-

uation of Tracking and Surveillance (PETS) data set using the pro-

posed algorithm. The algorithm tracks the car successfully until the

car becomes too small to be discerned. Even though the object

being tracked is rigid, the car appearance changes quickly both in

pose and size, thus ζ is set to 10.

In order to compare the performance of the algorithms quantita-

tively, we use the dice coefﬁcient metric [20] to measure the degree

of overlap between the tracking rectangle and the ground truth. If

we denote the ground truth rectangle and the tracking rectangle as

Ω

1

and Ω

2

, respectively, then the dice coefﬁcient can be calculated as

D Ω

1

; Ω

2

ð Þ ¼

2⋅Area Ω

1

∩Ω

2

ð Þ

Area Ω

1

ð Þ þ Area Ω

2

ð Þ

: ð15Þ

The dice coefﬁcients of the various algorithms for each frame of

the sequence in Figs. 5, 6, and 7 are shown in Fig. 8(a), (b), and (c),

respectively. In all three sequences, the proposed algorithm outper-

forms the conventional methods.

In order to compare the algorithmcomplexity, we measure the av-

erage time required to process a single frame during tracking. The ex-

periment is run on a PC with dual core 3 GHz CPU and 3 GB RAM. We

present the experiment result in Table 5. The resolution of the se-

quence from Figs. 5 and 6 is 320×240, and Fig. 7 is 384×288. While

our algorithm is a little more complex than the conventional algo-

rithms except for the EFS algorithm, it still runs at the speed much

higher than the real time requirement. The slight increase in compu-

tational time is well justiﬁed by its superior tracking performance.

Next, we show the effect of changing the value of the parameters

to the tracking performance. The experiment results are given in

Table 6 to Table 11 with the average dice coefﬁcient used as the per-

formance metric. The asterisk sign near the parameter value indicates

the default value of the parameters as suggested in Table 4. At each

experiment, only one parameter is varied and the rest of the parame-

ters are set to the default value.

Table 6 shows the tracking results for various values of the spatial

clustering threshold ε. Large values of ε will cause more pixels to be

grouped into the same cluster. The result is a coarse model of the spa-

tial color histogram. In general, small ε gives good performance.

The rigidity threshold ζ should be set according to the characteris-

tic of the object being tracked. We suggest using a small value for the

rigid object and a larger value for the non rigid object. The result of

experimenting with various values of ζ is given in Table 7. We con-

clude that ζ is critical for good tracking performance, but it is not

very sensitive since using a value slightly smaller or larger than the

suggested value does not affect the tracking performance greatly.

The parameter α is used to update the object size. We set this

value to 0.1 as the size of the object generally does not change drasti-

cally between frames. As shown in Table 8, smaller value results in

better performance as we expected.

Table 9 shows the result of experimenting with various values of

β. The parameter determines how fast the background model is

updated. This parameter is relatively insensitive except for the chal-

lenging sequence in Fig. 6 where the object being tracked shares sim-

ilar color with the background.

In video tracking, it is reasonable to assume that the object ap-

pearance does not change drastically from frame to frame. Thus, set-

ting the object model update parameter γ to a small value is a logical

choice. The tracking result for various values of γ is shown in Table 10.

In general, large γ reduces the tracking performance as it allows more

background pixels to be updated into the object model.

The pruning threshold ξ is used to remove clusters whose proba-

bility to be the object model are small. Since these clusters are casting

votes with very small weights, their removal does not affect the track-

ing performance. This parameter is not sensitive as long as it is set to a

small value. We provide the experiment result in Table 11 to support

our argument.

Table 9

The average dice coefﬁcients given various values of β.

β Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

0.30 0.77 0.80 0.82

0.40 0.82 0.50 0.82

0.50* 0.80 0.75 0.83

0.60 0.60 0.70 0.82

0.70 0.70 0.79 0.80

Table 10

The average dice coefﬁcients given various values of γ.

γ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

0.05 0.62 0.79 0.84

0.10* 0.80 0.75 0.83

0.20 0.51 0.36 0.78

0.30 0.74 0.67 0.82

0.40 0.64 0.71 0.72

Table 11

The average dice coefﬁcients given various values of ξ.

ξ Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

0.00001* 0.80 0.75 0.83

0.00005 0.79 0.78 0.79

0.00010 0.84 0.79 0.80

0.00050 0.60 0.77 0.80

Table 8

The average dice coefﬁcients given various values of α.

α Sequence in Fig. 5 Sequence in Fig. 6 Sequence in Fig. 7

0.05 0.76 0.75 0.72

0.10* 0.80 0.75 0.83

0.20 0.78 0.30 0.80

0.30 0.79 0.55 0.76

0.40 0.77 0.55 0.79

858 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

Fig. 9 shows the experiment result of tracking a fast moving ob-

ject. The result of the KBT algorithm and our algorithm is presented

in Fig. 9(a) and (b), respectively. In this sequence, the movement of

the ball from frame to frame is larger than its size. In order to ensure

that the target region at the current and following frames are overlap-

ping, Comaniciu et al. suggest to initialize the target model using a

21×31 size region, which is larger than the ball itself, when tracking

using their KBT algorithm. Our algorithm, however, deals with this

problem by simply enlarging the search region by setting η=3,

allowing more pixels around the previous object location to vote for

the new location of the object. Note that our algorithm obtains better

estimate of both the location and size of the ball. Since the object

being tracked is rigid, ζ=2 is used.

Here we also showthe result of tracking in the case of illumination

change. Since our algorithm uses the color information to represent

the object, variation in illumination affects the performance of our al-

gorithm greatly. We show both the success case and the failure case

in Fig. 10(a) and (b), respectively. Note that in Fig. 10(b), the target

appearance changes drastically when the person walks into the shad-

ow area.

The robustness of the proposed algorithm can be improved by

using multiple features together with the color feature. We consider

that several features such as edges and textures can compensate the

weakness of the color feature. However, in order to keep the content

of this paper concise, only the color information is used and the com-

bination with other features remains as our future work.

5. Discussion

We presented a new object tracking algorithm based on the gen-

eralized Hough transform method. In the proposed algorithm, the ob-

ject is represented by using a spatial color histogrammodel which has

a form similar to the R-Table of the generalized Hough transform.

With the proposed spatial color histogram model, the object position

a

b

Fig. 9. The table tennis sequence. (a) KBT algorithm. (b) The proposed algorithm.

a

b

Fig. 10. Performance of the proposed algorithm under illumination change. (a) Successful tracking. (b) Failure tracking.

859 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

in the next frame is estimated by requesting each pixel to vote for the

location of the object. Experimental results indicate that the proposed

algorithm can track the object successfully even when the object

shares similar color with the background and changes in size.

The proposed algorithmcan be employed into applications such as

understanding human motion in video sequence. Many of the algo-

rithms developed for human motion understanding are based on

the analysis of object silhouettes [21–24]. In order to extract the ob-

ject silhouette, these algorithms usually employ a simple background

subtraction technique which is effective only when the video is cap-

tured by a static camera. Our algorithm can extract the object silhou-

ette even when the video sequence is taken by a moving camera.

The future work shall be focused on integrating the boundary in-

formation into color histogram to further improve the reliability of

the algorithm. Furthermore, the relative distance of a cluster of pixels

from the object center is currently expressed inpixel-distance. Thus

we expect that an alternative representation that is invariant to size

and shape changes will improve the robustness of the algorithm.

Acknowledgments

This work was supported by the Mid-career Researcher Program

through NRF grant funded by the MEST (No. 2011-0000200).

Appendix A. Supplementary data

Supplementary data to this article can be found online at doi:10.

1016/j.imavis.2011.09.008.

References

[1] A. Yilmaz, O. Javed, M. Shah, Object tracking: a survey, ACM Comput. Surv. 38

(2006) 13.

[2] M. Isard, A. Blake, CONDENSATION—conditional density propagation for visual

tracking, Int. J. Comput. Vision 29 (1998) 5–28.

[3] Y. Shi, W.C. Karl, Real-time tracking using level sets, Proceedings of IEEE Confer-

ence on Computer Vision and Pattern Recognition, 2, 2005, pp. 34–41.

[4] J. Shi, C. Tomasi, Good features to track, Proceedings of IEEE Conference on Com-

puter Vision and Pattern Recognition, 1994, pp. 593–600.

[5] C. Tomasi, T. Kanade, Detection and tracking of point features, Technical Report,

Carnegie Mellon University, 1991.

[6] D.G. Lowe, Distinctive image features from scale-invariant keypoints, Int. J. Com-

put. Vision 60 (2004) 91–110.

[7] H. Bay, A. Ess, T. Tuytelaars, L. Van Gool, Speeded-up robust features (surf), Com-

put. Vis. Image Underst. 110 (2008) 346–359.

[8] G.R. Bradski, Real time face and object tracking as a component of a perceptual

user interface, IEEE Workshop on Applications of Computer Vision, 1998,

pp. 214–219.

[9] D. Comaniciu, V. Ramesh, P. Meer, Kernel-based object tracking, IEEE Trans. Pat-

tern Anal. Mach. Intell. 25 (2003) 564–575.

[10] S.T. Birchﬁeld, S. Rangarajan, Spatiograms versus histograms for region-based

tracking, Proceedings of IEEE Conference on Computer Vision and Pattern Recog-

nition, 2, 2005, pp. 1158–1163.

[11] B.D. Lucas, T. Kanade, An iterative image registration technique with an applica-

tion to stereo vision, Proceedings of International Joint Conference on Artiﬁcial

intelligence, 2, 1981, pp. 674–679.

[12] D.H. Ballard, Generalizing the Hough transform to detect arbitrary shapes, Read-

ings in Computer Vision: Issues, Problems, Principles, and Paradigms, 1987,

pp. 714–725.

[13] H.-K. Kim Suryanto, S.-H. Park, D.-H. Kim, S.-J. Ko, Probabilistic center voting

method for subsequent object tracking and segmentation, World Academy of Sci-

ence, Engineering and Technology, 59, 2009, pp. 450–454.

[14] B. Leibe, A. Leonardis, B. Schiele, Combined object categorization and segmenta-

tion with an implicit shape model, Workshop on Statistical Learning in Computer

Vision, ECCV, 2004, pp. 17–32.

[15] K. Kim, T.H. Chalidabhongse, D. Harwood, L. Davis, Real-time foreground-

background segmentation using codebook model, Real-Time Imaging 11 (2005)

172–185.

[16] C. Stauffer, W. Grimson, Adaptive background mixture models for real-time

tracking, Proceedings of IEEE Conference on Computer Vision and Pattern Recog-

nition, 2, 1999, pp. 246–252.

[17] A.M. Elgammal, D. Harwood, L.S. Davis, Non-parametric model for background

subtraction, Proceedings of European Conference on Computer Vision, 2, 2000,

pp. 751–767.

[18] I. Leichter, M. Lindenbaum, E. Rivlin, Tracking by afﬁne kernel transformations

using color and boundary cues, IEEE Trans. Pattern Anal. Mach. Intell. 31 (2009)

164–171.

[19] J. Wang, Y. Yagi, Integrating color and shape texture features for adaptive real-

time object tracking, IEEE Trans. Image Process. 17 (2008) 235–240.

[20] D. Doermann, D. Mihalcik, Tools and techniques for video performance evalua-

tion, Proceedings of International Conference on Pattern Recognition, 4, 2000,

pp. 167–170.

[21] R. Hoshino, D. Arita, S. Yonemoto, R.-I. Taniguchi, R. Hoshino, D. Arita, S. Yone-

moto, R.-I. Taniguchi, Real-time human motion analysis based on analysis of sil-

houette contour and color blob, Proceedings of the Second International

Workshop on Articulated Motion and Deformable Objects, 2002, pp. 92–103.

[22] K. Tabb, N. Davey, R.G. Adams, S.J. George, Analysis of human motion using snakes

and neural networks, Proceedings of the First International Workshop on Articu-

lated Motion and Deformable Objects, 2000, pp. 48–57.

[23] F. Buccolieri, C. Distante, A. Leone, Human posture recognition using active con-

tours and radial basis function neural network, Proceedings of the Advanced

Video and Signal Based Surveillance, 2005, pp. 213–218.

[24] I. Haritaoglu, D. Harwood, L.S. Davis, W4: real-time surveillance of people and

their activities, IEEE Trans. Pattern Anal. Mach. Intell. 22 (2000) 809–830.

860 Suryanto et al. / Image and Vision Computing 29 (2011) 850–860

μb . ð2Þ a b c Fig. 2.2. we present our algorithm in detail in Section 3. 1. Table 1 shows a general form of the R-Table. Σb gb¼1.Suryanto et al. In the next section. the pixels located near the object boundary are given smaller weights while the pixels around the center of the object are assigned larger weights. The object localization is performed iteratively by a mean shift method similar to CAMSHIFT. 2. a gradient → direction φ may have many values of r . (b) Test image. is maximized. The spatiogram h(y) is calculated from an image region whose center is at location y and has the same size as the object to be tracked. . ϕðxÞ ¼ 0g →→ → → n r r ¼ x − c . In general. by adopting the concept of the spatiogram [10]. the current object size is estimated and the search range is adjusted accordingly. The number B is the total number of bins in the spatiogram. indicates the location of the shape. we brieﬂy review the generalized Hough transform and the spatiogram which are closely related to our proposed algorithm. Even though the Bradski's algorithm was developed for face tracking. its gradient direction φ = φ′ is calculated. h(y) = {nb(y). In Section 3.1. 1(a). the construction of the R-Table to represent the shape and the voting scheme to detect the shape in the image. 1(b). Gradient direction 0 Δϕ 2Δϕ … Positions n →→ → → n r r ¼ x − c . and μb and Σb are the mean vector and the covariance matrix. Fig. ry from a reference point c ¼ cx . The use of the kernel weighted histogram for the mean shift based algorithms signiﬁcantly improves the tracking performance. Then. In the kernel weighted color histogram representation. Section 5 concludes the content of the paper. of the coordinates of those pixels. This paper is organized as follows. (a) Model shape. proposed the Kernel Based Tracking (KBT) algorithm which uses the kernel weighted color histogram to represent the color distribution of the object [9]. we introduce a new object representation model and localization method. Σb (y)}. The pixel with the highest intensity. Once the object location is obtained. However. The generalized Hough transform The generalized Hough transform has been widely used to detect the shape in the image [12]. the rectangle position is moved to a new position until convergence. μb(y). Then. Then. Each bin in the spatial color histogram model contains the information on the number of pixels belonging to the color bin and the positions of those pixels relative to the object center. 1(b). The similarity between two spatiograms is computed as the weighted sum of the Bhattacharyya similarity between two histograms B ρðhðyÞ. 1(a) from the test image in Fig. the algorithm does not perform well when the object being tracked changes in size.e. Geometry for the generalized Hough transform. for each point x ¼ ðx.2. The preliminary version of this work has been published in [13]. we show how to adopt this generalized Hough transform technique to track the object from frame to frame in a video sequence. it can be used to track any object of interest. With the segmented object. In order to represent the object to be tracked. Experiment results are given in Section 4. the back projection method is utilized to segment the object from the background. the pixel with the highest vote. It consists of two main parts. / Image and Vision Computing 29 (2011) 850–860 851 iterative procedure based on the mean shift is applied to center the object rectangle in the face region. ϕðxÞ ¼ Δϕg →→ → → r r ¼ x − c . In this paper. Related works 2. ð1Þ where nb is the number of object pixels whose quantized values fall into the b-th bin. each pixel casts votes into the vote ac→ → → cumulator at positions x − r where r is the set of all position vectors indexed by φ′ in the R-Table. 2. Tracking by spatiogram The spatiogram [10] represents the object to be tracked as h ¼ fnb . the gradient direction φ and its relative Á Á → À → À position r ¼ rx . yÞ at the boundary of the shape. respectively. hÞ ¼ ∑ ψb b¼1 pﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ nb ðyÞnb . The object localization is performed by determining the location y ε R2 in the image where the similarity between the spatiogram of the object h and the spatiogram at location y. Construction of the R-Table → Given an arbitrary shape as in Fig. the relative position vectors r ¼ x − c are stored as a Table 1 The R-Table of an arbitrary shape. cy are comput→ → → ed.….1. (c) Vote accumulator. The localization of the object in the next and following frames is accomplished using the center voting method based on the generalized Hough transform voting scheme [12]. At each iteration. Note that the underlying idea behind the generalized Hough transform has also been applied successfully to the object category detection [14].1. Comaniciu et al. i. we propose a new class of spatial color histogram model. 2. Voting scheme for the shape detection → For each edge pixel x in the image in Fig. ϕðxÞ ¼ 2Δϕg … function of the gradient direction in the R-Table.1. We extend our previous work by utilizing the kernel to create a more reliable model and by introducing the spatial color histogram update mechanism.B . 1(c) shows the vote accumulator for the detection of the shape in Fig.

Proposed algorithm → Given an object centered at location c ¼ cx . 2. N → nðb.k¼1. i¼1 ⌢ −1 with η as the Gaussian normalization constant and Σ b ¼ −1 −1 Σb ðyÞ þ Σb . → → (b) include x i to μ ðb. for each k: → → (a) calculate the distance of x i from μ ðb. This object segmentation can be done at the initial frame by performing the background subtraction algorithms [15–17] or even by manual selection by a human operator. → K xi ∑ N i¼1 ð5Þ → and δibk = 1 if the value of pixel x i is quantized into the b-th bin and its → distance from the μ ðb. 2(a). where the weighting function is given by ( ) −1 1 T ⌢ ψb ¼ η exp − ðμb ðyÞ−μb Þ Σ ðμb ðyÞ−μb Þ .kÞ . (c) Tracking result. → → 2. cy in the current →′ frame. The pseudo-code for estimating ěcμ(b. Calculate the bin index b of pixel x i . otherwise δibk = 0. The object to be tracked is referred to as the target object. c be the location of the object center. (b) Center voting procedure. and lx and ly be half the width and height of the rectangle bounding the object region as shown in Fig. (c) 2-D object kernel.kÞ . Otherwise.… → μ ðb. where spatial color histogram model h ¼ μ ðb. / Image and Vision Computing 29 (2011) 850–860 a b c d Fig. The n(b. (d) 2-D background kernel. 2 b ð3Þ of pixels relative to the object center. k) is the probability value associated with the number of pixels belonging to that particular cluster. The location y that maximizes the similarity function in Eq. we modify this spatiogram to construct a table similar to the R-Table and then use the voting mechanism of the generalized Hough transform to ﬁnd the location of the object to be tracked.B. we assume that at the initial frame.1Þ ¼ x i if there are no cluster in the bth bin.kÞ ¼ x i if x i is not included in any existing clusters.852 Suryanto et al.kÞ b¼1.kÞ is the mean vector representing the position of the k-th cluster Let n À Á ð4Þ where C is the normalization constant to ensure that ∑b. a b c Fig. (a) The target object. In order to represent the target object. we use the n o → . Since our main objective is to track the object.kÞ → for each x i : → 1.1.…. k) is given in Algorithm 1. → Algorithm 1. (2) is determined either by using the gradient descent mean shift method or the exhaustive local search. Target representation o → x i ¼ ðxi .kÞ → Input: pixel location x i of the target object → Output: cluster of pixels μ ðb. In our algorithm.k nðb. Create a new cluster μ ðb. here we also employ a monotonically decreasing kernel → function K x i to assign smaller weights to pixels located farther away from the object center. (a) The target object. the mean shift based localization is several order faster than the exhaustive local search.2. yi Þ be the location of the pixels belonging to i¼1…N → the target object. A cluster is deﬁned as a group of pixels whose quantized values fall into the same b-th bin of the histogram and which are located close to each other. Mathematically.kÞ ¼ 1 which is given by C¼ 1 . 3. As in [9]. Illustration of object tracking using the proposed algorithm. because these pixels are less reliable. Create a new cluster μ ðb. 3.kÞ if the distance is smaller than ε. In general. nðb. the object has been segmented out from the background. We describe our algorithm in detail in the next section. 3.kÞ . → (c) update μ ðb.kÞ ¼ C ∑ K x i δibk . the tracking objective is to ﬁnd the new object center c in the next and the following frames. Estimation pf μ ðb. 3. → → → 4. (b) Object region and background region.kÞ is smaller than threshold ε. .

n o → Redeﬁne x i ¼ ðxi . both centered at c as shown in Fig. It can be seen that the location in the image receiving the highest number of votes is located at pixels labeled d. In the next subsection. 3. 5 10 ε ζ Spatial clustering threshold Object rigidity threshold {(− 1. our spatial color histogram model allows each histogram bin to have more than one mean vector. 3. which are mostly the background pixels. Only the pixel whose color exists in the target model can cast a vote. → 2. (4) and mb is the probability value associated with the number of pixels in the background whose quantized values fall into the b-th bin. ð8Þ The standardized Euclideandistance in Eq. calculated by ﬁrst normalizing (xi−cx) and (yi−cy) by lx and ly. Thus. Target localization Target localization consists of center voting and back projection steps. For example. Note that the two pixels belonging to the same color histogram bin a are not clustered together due to their spatial distance. 0. The kernel function K x i can be written as → K xi ¼ ( → → 1−d2 x i . x i −ð1.kÞ . i. The assumption. Table 2 Spatial color histogram for the target object in Fig. of course. −1Þ . does not hold in most cases. A straightforward way to quantify the reliability of a pixel is to use the probability difference as employed in [18]. pixels whose colors fall into this bin cast two votes. Thus. since the two pixels with bin index e are adjacent to each other.kÞ → where d x i is the standardized Euclidean distance between pixel → → x i and the object center c .1/7} {(− 0. On the other hand. 3(b) illustrates the center voting procedure.1/7} {(0.0 .kÞ −mb ) . i. If the object displacement between frames is larger than the object size. located between the ellipse with semi-minor axis lx and semimajor axis ly and the ellipse with semi-minor axis lx + Δ and semi→ major axis ly + Δ. R-Table → Reference point c Gradient direction ϕ → Positions r Our proposed model → Object center c Histogram bins b → Mean vectors μ ðb.2. they are grouped into the same cluster with a single mean. the corresponding histogram bin has two mean vectors. The arrows indicate where the pixels cast their votes on. for pixels inside the ellipse.5.1 0.1/7}.Suryanto et al. For simplicity. 3. (7) is similar to the equa → tion of the ellipse. which is also the value of the background histogram at b-th bin. we obtain an ellipse centered → on c with the semi-minor axis lx and the semi-major axis ly as shown in Fig. Naturally. More reliable pixels cast votes with higher weights than less reliable pixels. casts a n o → → vote on position x i − μ ðb. we show how to use this spatial color histogram model to locate the target object in the following frames. 2(c). and e cast their votes on the position indicated by their mean vectors at n o n o n o → → → x i −ð0.−1). the voting weight w(b.00001 . nðb. On the other hand. Rules for the center voting procedure are as follows: 1. a pixel is regarded as a reliable pixel if its color exists in the object. 3(a).e. respectively. Unlike the spatiogram which has a single mean and a single covariance for each bin. −1Þ . 1).kÞ þ mb where n(b. 0). The spatial color histogram for this object is shown in Table 2. In the center voting step. 1Þ . if d x i ¼ 1. Each mean vector represents a cluster of pixels sharing a similar color which are located in close proximity to each other. reliable pixels cast votes with higher weights.2/7} Model update related parameters α Object size update ratio β Background model update ratio γ Object model update ratio ξ Pruning threshold 0. Thus.kÞ o nðb. k) for a vote casted → using mean vector μ ðb. {(1. The probability value mb. respectively. yi Þ as the pixels in the background rei¼1…N gion. k) is given in Eq. which causes the algorithm to fail to correctly estimate the location of the object center. The alphabets indicate the color histogram bins associated with the pixels.1). can be calculated by N → mb ¼ CBG ∑ KBG x i δib .1/7} {(1.e. we have made a very naive assumption that the background does not share the same color with the object. ifd x i ≤1. i¼1 ð9Þ Table 4 The parameters. we → use auniform kernel K x i ¼ 1 for this particular example. otherwise. Pixels located at x i labeled b.1 0. For pixels located outside this ellipse.: vﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃﬃ !2 ux −c 2 u yi −cy → d xi ¼t i x þ : lx ly ð7Þ Fig. their standard → ized distances are smaller than one. Table 3 shows that the proposed spatial color histogram model has a structure similar to the R-Table. these pixels do not cast any vote on the loca→ tion of the object center.kÞ by a pixel whose color belongs to the b-th bin of the histogram can be expressed as ( wðb.1/7} {(0. c.−1). In this example. Since bin a contains two mean vectors. Param Meaning Object motion related parameters η Speed of the object Suggested value For most tracking application: 2. then 3 or larger.−1). and consequently their kernel weights K(cxi) become zero. The kernel function K x i assigns larger weights to the pixels located closer to the center as shown in Fig. This problem can be solved by adding the third rule. but not in the background. bin index a b c d e h¼ n → μ ðb. The pixels labeled × are the pixels whose colors do not fall into any of the bins of the target model histogram. whose quantized value falls into b-th bin. and x i −ð−0:5.kÞ . 3(a). 2(b). consider a target object in Fig.kÞ ¼ max nðb. In fact. →′ which is the new location of the object c as shown in Fig. their standardized distances from the object center are greater than one. / Image and Vision Computing 29 (2011) 850–860 853 since they are often affected by occlusion or interference from the → background.5 0. 2(b). Each pixel x i . every pixel located near the previous → object center c is required to cast a vote for the location of the object center. ð6Þ Table 3 Attributes of the R-Table and the proposed model.

In order to include these pixels into the foreground.e. The ring-shaped background kernel function KBG in Eq. causing them to vote slightly off from the object center. The result of this back projection method is a foreground image where the pixels belonging to the object are marked as 1's. we can easily estimate the change in object size and then re-adjust the dimension of the object and background * * kernel accordingly. ð10Þ where 1 0 → d xi 1@ 1 þ → A. (c) Vote map for frame 30. The dimension of this search region is 2(lx + Δ) × 2(ly + Δ). Larger value of η should be used when faster moving objects are tracked. ð12Þ ðσ−1Þ2 → and dΔ x i is the standardized Euclidean distance as in Eq. η = 2 is sufﬁcient. σ¼ 2 d x Δ i → → centered on c denoted as Rect c . the relative location of pixels from the object center can change slightly. →′ Once we obtain the new object center c . if d x i ≥1 and dΔ x i ≤1 . is the search range of the algorithm and is shown in Fig. (b) Tracking result at frame 30. as the object in the new frame tends to be located near its previous location. (f) Segmented object at frame 60. / Image and Vision Computing 29 (2011) 850–860 where CBG ¼ 1 → ∑ KBG x i N i¼1 → of is a normalization constant and δib = 1 if the value pixel x i is quan → tized into the b-th bin. The pseudocode is given in Algorithm 2. otherwise.ly). Note that the width of the background region is determined by the value Δ which can be calculated as a function of the object dimension. which is given by → KBG x i ¼ ( 2 → → → . KBG x i is the background kernel function as used in [18]. (a) Target object at the initial frame. as shown in Fig. 3(b) as a region enclosed by the dash-line rectangle λ¼ lx 0 ¼ ð1−αÞ⋅lx þ α⋅l4 . ð13Þ where α determines how fast we update the old object size with the newly obtained size. we only have to collect the votes from → pixels located nearby its previous center c . Δ = η ⋅ min(lx. Let lx and ly be half of the width and length of the rectangle bounding the foreground region obtained by the back projection method. (e) Tracking result at frame 60.854 Suryanto et al. 4. . (7) but with the normalizer lx + Δ and ly + Δ. Furthermore. we re-scan the pixels in the neighborhood to see which pixels have casted the correct votes. we allow pixels that have casted their votes somewhere within →′ distance ζ from the object center c to be categorized as object pixels as well. (d) Segmented object at frame 30. 2(d). otherwise δib = 0. (h) Segmented object at frame 140. (10) assigns weights with value zero for pixels located outside the background region. x 4 ly 0 ¼ ð1−αÞ⋅ly þ α⋅ly . 1−λ σ−d x i 0. With this foreground image. a b c d e f g h Fig. The pixels that have casted the correct votes for the object center are marked as the object pixels. This set of pixels. For most tracking application where the object motion between frames is not larger than the object size. Back projection → → for all x ∈Rect c →′ → distance←‖vote x − c ‖ if distance b ζ then → x is foreground pixel else → x is background pixel end if end for. Since the object being tracked may not be rigid and may grow or shrink in size. Tracking result using the proposed algorithm. i. (g) Tracking result at frame 140. The new object size is calculated by ð11Þ 1 . Algorithm 2. from whom we collect the votes. where η is a parameter that depends on the speed of the object.

the object and background models have to be continuously updated to reﬂect the change in object and background information. Updating the background model is quite straightforward. (c) spatiogram. The background histogram is updated as mb ¼ ð1−βÞ⋅mb þ β⋅mb . 4 ð14Þ a b c d e Fig. * Let mb be the background histogram calculated at the current frame →′ by centering the background kernel at c . / Image and Vision Computing 29 (2011) 850–860 855 3. 5.Suryanto et al. (b) level set. . (d) EFS.3. Comparison of the tracking results of the (a) KBT. Model update During tracking. and (e) the proposed algorithm.

where β determines how much the proportion of the newly calculated background histogram is used to update the background histogram. → 3.5. we use β = 0. Fig. Our target representation model is derived from the spatiogram proposed in [10]. gram calculated at the current frame by centering the object kernel at →′ c with the updated kernel dimensions (lx.…. →4 2. ly). 7.lÞ to the object histogram model if a match cannot be found in h.lÞ be the spatial color histob¼1. we highlight our contributions and show explicitly how our algorithm differs from the existing approaches. 3. Tracking object which shares very similar color with the background. k) + γ ⋅ n(b. Let h4 ¼ μ ðb.856 Suryanto et al. Algorithm summary The overall summary of the proposed algorithm is given in Algorithm 3. the distance between them is smaller than the clustering threshold ε.e. k by ∑b. 6. k) is smaller than a certain threshold ξ. In this subsection. nðb. k).kÞ does not ﬁnd a match in any * entry of the h .5. i. 3.kÞ out of the model if its corresponding probability n(b. However. As the object appearance should not change dramatically from frame to frame. Append μ ðb. The corresponding probability is updated by → * (1−ma) ⋅ n(b. / Image and Vision Computing 29 (2011) 850–860 Fig. Additionally. Since the background tends to change quickly from frame to frame. appending. .2.… The parameter γ is a histogram update parameter similar to β. If μ ðb. Prune μ ðb. and n o →4 4 pruning.B.kÞ .lÞ by simply taking their average if they are matched. unlike [10] which tracks the object in the following frames by ﬁnding the image region whose spatiogram representation is most similar to the spatiogram of the target object. Table 4 presents the list of the parameters used in this paper. After updating the object histogram. Our contributions The algorithm we proposed in this paper is the result of meshing the spatiogram and the generalized Hough transform. ξ = 0. The spatial color histogram of the object is updated by: →4 → 1. its corresponding probability value is updated by (1−γ) ⋅ n(b.kÞ with μ ðb.k nðb. l). Merge μ ðb.0001 can be chosen.lÞ . Practically.l¼1. Tracking result using the proposed algorithm for the car-rear sequence from PETS2001 data set.1. its probability value has to be normalized by dividing each probability value nb.4. Updating the object histogram involves merging. we set this parameter to 0.

Update the dimension of the target object using Eq. by allowing each bin of our spatial color histogram model to have more than one mean vector. Our proposed tracking algorithm has several advantages over the existing methods. (13).26 0. Algorithm KBT Spatiogram Level set EFS Proposed Sequence in Fig. 5. Table 5 Time complexity of algorithms (time in ms). respectively.86 25.31 ζ 2 6 10* 15 Sequence in Fig.67 0. 5 0. the proposed adaptive voting method explicitly considers the existence of background region which shares similar color with the object and suppresses the contribution of those colors in tracking. we manually select a target object at the initial frame and model it using the proposed spatial color histogram model presented in Section 3.22 Sequence in Fig. * 7. Model update 6.75 0. Sequence in Fig. we present a vote-map for this particular frame in Fig.46 6.62 Algorithm 3. / Image and Vision Computing 29 (2011) 850–860 857 a Table 6 The average dice coefﬁcients given various values of ε. 2. 7 0. Create the background histogram mb by using Eq. Third.32 0. 7 5.69 22. (b) Fig. 4(c).Suryanto et al.64 .76 0.74 c Fig.83 0. ε 2 5* 10 15 Sequence in Fig. The dice coefﬁcient for each frame of the sequence in (a) Fig.83 0.76 0.62 3. 4(a) and (b) shows the initial frame with the target object marked with green color and the tracking result at frame 30 with the predicted object marked by rectangle. It should be noted that the use of the voting scheme is feasible only after we modify the spatiogram into a form similar to the R-Table of the generalized Hough transform. 6 0.26 36.30 0. we obtain a target representation model that has richer spatial information than the spatiogram.81 Sequence in Fig.39 1. i. 6 0. After Table 7 The average dice coefﬁcients given various values of ζ. 8. (9). 7 0. Fig. 5. 5 0.80 0.83 0.72 3. For all experiments.73 0.80 0. Go back to step 3. (4). Perform the back projection algorithm. 8. the proposed algorithm segments the object region from the background using the simple back projection method. The high intensity pixels represent the locations associated with a large number of →′ votes.41 0.40 0. (c) Fig. 4. Second. 5 2. b Tracking At the following frame: → 3. 6 0.78 0. we use a 16 × 16 × 16-bins RGB color histogram and set the spatial clustering threshold ε = 5 and α = 0. Experiment results In our experiments. (14). First.76 2.80 Sequence in Fig. 4.41 4.72 Sequence in Fig. 6. 9.67 0. Assign the location in the image which receives the highest →′ votes as the new object location c .82 0. Create the object histogram h using Algorithm 1 and Eq. 7.75 0. Create the new background histogram mb at the new object →′ location c and update the background histogram mb by using Eq.e.56 1.66 0.1. In order to visually illustrate how the center voting procedure works. we propose an adaptive voting method based on the generalized Hough transform to locate the target object.60 Sequence in Fig. Algorithm summary Initialization → Given an object to be tracked at a spatial location c in the initial frame: 1. Request each pixel inside the search region Rect c to cast the votes with the voting weights according to Eq. (8). Create the new object histogram h* at the new object location →′ c and update the object histogram h. the location of the object center c to be estimated.1.

60 0. Table 6 shows the tracking results for various values of the spatial clustering threshold ε. 5 0. The algorithm tracks the car successfully until the car becomes too small to be discerned. The EFS algorithm also fails for the same reason.50* 0. Fig. and (h) shows more tracking results along with the segmented object results.05 0.75 0.77 Suryanto et al.82 0. 5(a).78 0.83 0. Table 9 shows the result of experimenting with various values of β.40 Sequence in Fig.20 0.83 0. Next.82 0.70 0. / Image and Vision Computing 29 (2011) 850–860 Table 10 The average dice coefﬁcients given various values of γ. the level set [3].60 0. The parameter determines how fast the background model is updated. ζ = 10.71 Sequence in Fig. This segmentation result is shown in Fig.80 0. The result of experimenting with various values of ζ is given in Table 7. The robust performance of our algorithm against the background with similar color is achieved due to the use of adaptive voting weights. we show the effect of changing the value of the parameters to the tracking performance.82 0. 7 0.70 Sequence in Fig. As shown in Table 8. 5. setting the object model update parameter γ to a small value is a logical choice.64 Sequence in Fig. Table 11 The average dice coefﬁcients given various values of ξ. 7 0.80 0. Ω2 Þ ¼ ð15Þ The dice coefﬁcients of the various algorithms for each frame of the sequence in Figs. small ε gives good performance.55 the object center is obtained. We compare our algorithm with the kernel based tracking (KBT) [9]. and 7 are shown in Fig.72 0. thus ζ is set to 10. 6 0. As the object being tracked is not rigid.80 0. then the dice coefﬁcient can be calculated as 2⋅AreaðΩ1 ∩Ω2 Þ : AreaðΩ1 Þ þ AreaðΩ2 Þ DðΩ1 . If we denote the ground truth rectangle and the tracking rectangle as Ω1 and Ω2. 7 0.70 Sequence in Fig.30 0.79 0.84 0. We suggest using a small value for the rigid object and a larger value for the non rigid object. 6 0.1 as the size of the object generally does not change drastically between frames. In all three sequences. 6 0. The rigidity threshold ζ should be set according to the characteristic of the object being tracked. 6 0. We conclude that ζ is critical for good tracking performance. We present the experiment result in Table 5. The performance differences among these algorithms become apparent when the target object becomes smaller and smaller as it moves away from the camera.78 0. 6. 4(d). we measure the average time required to process a single frame during tracking.80 0. 7 0. and Fig.00001* 0. (b). respectively.80 0. 5 0. Fig. The spatiogram based mean shift algorithm loses the target object due to sudden movement of the camera. Table 9 The average dice coefﬁcients given various values of β. and the extended feature selection (EFS) algorithm [19] and show some of the tracking results in Fig.40 Sequence in Fig.80 0. In video tracking. The asterisk sign near the parameter value indicates the default value of the parameters as suggested in Table 4.30 0. 5(b). The experiment is run on a PC with dual core 3 GHz CPU and 3 GB RAM. their removal does not affect the tracking performance.75 0. The level set algorithm shows relatively good tracking performance. The resolution of the sequence from Figs.00010 0.79 0.82 0.80 In order to compare the algorithm complexity.858 Table 8 The average dice coefﬁcients given various values of α. We provide the experiment result in Table 11 to support our argument. but it is not very sensitive since using a value slightly smaller or larger than the suggested value does not affect the tracking performance greatly. respectively. 5 0. We set this value to 0.20 0. it still runs at the speed much higher than the real time requirement. 6 where the object being tracked shares similar color with the background.55 0. 8(a).84 0. 7 is 384 × 288. This poor estimation of the target object size contributes to tracking failure in the later frames as shown in the fourth column of the ﬁgure. This parameter is relatively insensitive except for the challenging sequence in Fig.30 0. This parameter is not sensitive as long as it is set to a small value.83 0. 5 0.78 0. 6.36 0.80 0.51 0.80 . β 0.75 0. we present the proposed algorithm result when tracking an object which shares similar color with the background. In Fig.05 0. The tracking result for various values of γ is shown in Table 10.79 0. In general.75 0.72 Sequence in Fig. the proposed algorithm outperforms the conventional methods.76 0.82 0.50 0. (f). In general.75 0. As shown in the third column of Fig.79 γ 0. the spatiogram based mean shift [10]. it is reasonable to assume that the object appearance does not change drastically from frame to frame.30 0. The experiment results are given in Table 6 to Table 11 with the average dice coefﬁcient used as the performance metric. 5.76 0. While our algorithm is a little more complex than the conventional algorithms except for the EFS algorithm. The result is a coarse model of the spatial color histogram. and (c).79 0. the car appearance changes quickly both in pose and size. ξ 0. large γ reduces the tracking performance as it allows more background pixels to be updated into the object model. The slight increase in computational time is well justiﬁed by its superior tracking performance.60 Sequence in Fig. In order to compare the performance of the algorithms quantitatively.83 0. At each experiment.67 0. only one parameter is varied and the rest of the parameters are set to the default value.00050 Sequence in Fig.74 0. we use the dice coefﬁcient metric [20] to measure the degree of overlap between the tracking rectangle and the ground truth. The parameter α is used to update the object size. Large values of ε will cause more pixels to be grouped into the same cluster. Even though the object being tracked is rigid. Since these clusters are casting votes with very small weights.00005 0. smaller value results in better performance as we expected.77 Sequence in Fig.10* 0. Thus. Our algorithm tracks both the location and the size of the target object successfully throughout the sequence.62 0.40 0. 7 demonstrates the tracking of a vehicle in Performance Evaluation of Tracking and Surveillance (PETS) data set using the proposed algorithm.10* 0. 5 and 6 is 320 × 240. Sequence in Fig. α 0.79 0. the KBT algorithm tracks the object with a much larger bounding rectangle than the object actually is.79 Sequence in Fig. 4(e). the back projection method is utilized to segment the object from the background. The pruning threshold ξ is used to remove clusters whose probability to be the object model are small. (g). but with its contour occasionally expands to include the neighboring objects and shrinks to capture only a portion of the target object as can be seen in the third and fourth columns of Fig.77 0.

Since the object being tracked is rigid. variation in illumination affects the performance of our algorithm greatly. Since our algorithm uses the color information to represent the object. (a) KBT algorithm. In order to ensure that the target region at the current and following frames are overlapping. The robustness of the proposed algorithm can be improved by using multiple features together with the color feature. suggest to initialize the target model using a 21 × 31 size region. However. deals with this problem by simply enlarging the search region by setting η = 3. Discussion We presented a new object tracking algorithm based on the generalized Hough transform method. allowing more pixels around the previous object location to vote for the new location of the object. the target appearance changes drastically when the person walks into the shadow area. We consider that several features such as edges and textures can compensate the weakness of the color feature. (b) The proposed algorithm. 9(a) and (b). Performance of the proposed algorithm under illumination change. Note that our algorithm obtains better estimate of both the location and size of the ball. We show both the success case and the failure case in Fig. Comaniciu et al. 5. ζ = 2 is used. With the proposed spatial color histogram model. in order to keep the content of this paper concise. Our algorithm. respectively. (a) Successful tracking. however. respectively. 9. . The result of the KBT algorithm and our algorithm is presented in Fig. which is larger than the ball itself. In the proposed algorithm. 9 shows the experiment result of tracking a fast moving object. the movement of the ball from frame to frame is larger than its size. 10(a) and (b). / Image and Vision Computing 29 (2011) 850–860 859 a b Fig. Here we also show the result of tracking in the case of illumination change. The table tennis sequence. the object position a b Fig.Suryanto et al. when tracking using their KBT algorithm. Fig. Note that in Fig. 10(b). In this sequence. the object is represented by using a spatial color histogram model which has a form similar to the R-Table of the generalized Hough transform. 10. (b) Failure tracking. only the color information is used and the combination with other features remains as our future work.

Yonemoto. Intell. pp. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Shah. [22] K. Arita. 2000. Grimson.09. Thus we expect that an alternative representation that is invariant to size and shape changes will improve the robustness of the algorithm. Yagi. Y.T. Bay. Supplementary data Supplementary data to this article can be found online at doi:10.-J. Yonemoto. pp.2011. Harwood. 2. Combined object categorization and segmentation with an implicit shape model. Principles. [13] H. Isard. Analysis of human motion using snakes and neural networks. Intell. Taniguchi. 714–725. S. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Shi. Detection and tracking of point features. 593–600. O. Mach. Leone. 1987. 1016/j. J. 59. Comaniciu. Surv. CONDENSATION—conditional density propagation for visual tracking.S. R. In order to extract the object silhouette. pp. Haritaoglu. 1158–1163. Lowe.imavis. Proceedings of International Joint Conference on Artiﬁcial intelligence. B. Experimental results indicate that the proposed algorithm can track the object successfully even when the object shares similar color with the background and changes in size. Rangarajan. S. A. [3] Y.G. in the next frame is estimated by requesting each pixel to vote for the location of the object. Vis. D. 2. D. [18] I.M. 2. Ko. 674–679. Chalidabhongse. Generalizing the Hough transform to detect arbitrary shapes. Real-time tracking using level sets. Real time face and object tracking as a component of a perceptual user interface. Comput. Mach. Vision 29 (1998) 5–28. D. Intell. 2002. 167–170. [17] A. Mach.H. A. [2] M. pp. T.J. Distante. 214–219. 17–32. Karl. Lindenbaum. 2005.S. [19] J. Harwood. [16] C. Problems. [10] S. Probabilistic center voting method for subsequent object tracking and segmentation. Engineering and Technology. Integrating color and shape texture features for adaptive realtime object tracking. [5] C. S. Our algorithm can extract the object silhouette even when the video sequence is taken by a moving camera. V. Image Underst. 1998. C. [23] F. 92–103. [21] R. S. Adaptive background mixture models for real-time tracking. Leonardis. A. An iterative image registration technique with an application to stereo vision. Ess. Acknowledgments This work was supported by the Mid-career Researcher Program through NRF grant funded by the MEST (No.-I. Kanade.G. Kim. W. pp. World Academy of Science. 2004. M. Bradski. R. 2000. L. 2011-0000200). 2005. Tools and techniques for video performance evaluation. Shi. 1981. 246–252. C. R. T. 2005. D. Birchﬁeld. Proceedings of the Advanced Video and Signal Based Surveillance. M. 1991. 450–454. pp. Pattern Anal. Adams. pp. [8] G. Pattern Anal.-H. 2. Kernel-based object tracking. Kanade. the relative distance of a cluster of pixels from the object center is currently expressed inpixel-distance.-H. Arita. S. Meer. 213–218. Good features to track. 38 (2006) 13. Davey. Ballard. D. pp. [20] D. J. Tuytelaars. [12] D. Tomasi.R. Rivlin. L. Mihalcik. E. Harwood. 34–41. Elgammal. IEEE Trans. Comput. Many of the algorithms developed for human motion understanding are based on the analysis of object silhouettes [21–24]. Javed. Carnegie Mellon University. [6] D. References [1] A. 1994. L. N. 2009. pp. L. Technical Report. R.-I. Int. pp. Image Process. T. Ramesh. Yilmaz. Tracking by afﬁne kernel transformations using color and boundary cues. W.D. Vision 60 (2004) 91–110. Readings in Computer Vision: Issues.H. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. Lucas. Hoshino. Object tracking: a survey. / Image and Vision Computing 29 (2011) 850–860 [7] H. 2. 751–767. Wang. Non-parametric model for background subtraction. Proceedings of the Second International Workshop on Articulated Motion and Deformable Objects. pp. Davis. Davis.008. Proceedings of the First International Workshop on Articulated Motion and Deformable Objects. IEEE Trans. Comput. Taniguchi. Real-time foregroundbackground segmentation using codebook model. D. George. Kim. Workshop on Statistical Learning in Computer Vision. T. 22 (2000) 809–830. Furthermore. IEEE Trans. Schiele. A. these algorithms usually employ a simple background subtraction technique which is effective only when the video is captured by a static camera. ACM Comput.-K. Doermann. Proceedings of IEEE Conference on Computer Vision and Pattern Recognition. [15] K. 4. [24] I. Proceedings of European Conference on Computer Vision. Tabb. 2000. Leichter. D. P. 1999. [14] B. Speeded-up robust features (surf). IEEE Trans. The proposed algorithm can be employed into applications such as understanding human motion in video sequence. 17 (2008) 235–240. Stauffer. and Paradigms.C. Kim Suryanto. Buccolieri. ECCV. 110 (2008) 346–359. [9] D. 31 (2009) 164–171. Appendix A. Proceedings of International Conference on Pattern Recognition. [4] J. Real-Time Imaging 11 (2005) 172–185. Real-time human motion analysis based on analysis of silhouette contour and color blob. Hoshino. pp. S. Tomasi. . The future work shall be focused on integrating the boundary information into color histogram to further improve the reliability of the algorithm. Human posture recognition using active contours and radial basis function neural network. W4: real-time surveillance of people and their activities. Leibe. Davis. Van Gool. pp. Spatiograms versus histograms for region-based tracking. 48–57. Blake. pp. Int. [11] B. Park. 25 (2003) 564–575. Distinctive image features from scale-invariant keypoints. Pattern Anal.860 Suryanto et al. IEEE Workshop on Applications of Computer Vision.

- A Methodology for Extracting Standing Human Bodies from Single ImagesUploaded byjournal
- Based on correlation coefficient in image matchingUploaded byInternational Journal of Research in Engineering and Science
- Lane Change Detection and Tracking for a Safe-Lane Approach in Real Time Vision Based Navigation SystemsUploaded byCS & IT
- A Real Time Approach for Indian Road Analysis using Image Processing and Computer VisionUploaded byIOSRjournal
- Object Detection Using a Max-margin Hough Transform - Maji, Malik - Proceedings of IEEE Conference on Computer Vision and Pattern Recognition - 2009Uploaded byzukun
- Chapter 10Uploaded bytrungnt1981
- Object Extraction in Data Mining Framework for Video SequencUploaded byRashmi Shankar
- Color Image Segmentation Based on Jnd Color HistogramUploaded byursubhash
- A Novel Approach for Hand Analysis Using Image Processing TechniquesUploaded byijcsis
- Real-Time Object Detection for Smart VehiclesUploaded byapi-3799599
- IMAGE PROCESSING BASED GIRTH MONITORING AND RECORDING SYSTEM FOR RUBBER PLANTATIONSUploaded bysipij
- A Benchmark for Breast Ultrasound Image Segmentation (BUSIS)Uploaded byMia Amalia
- Segmentation and Classification of Point Clouds From Dense Aerial Image MatchingUploaded byIJMAJournal
- Neural Network Approach for Eye DetectionUploaded byCS & IT
- cvip0Uploaded byHàng Không Mẫu Hạm
- 2009 Humanoids DrkaakdUploaded byJoko Lelono
- Influence of Local Segmentation in the Context of Digital Image ProcessingUploaded byIAEME Publication
- A Survey of Methods and Strategies in Character SegmentationUploaded byjamilhaider
- 10.1.1.23Uploaded byMouncef Cherqaoui
- Histogram Specification a Fast and Flexible MethodUploaded byThanga Mani
- Recognition of Urdu ScriptsUploaded byapi-3715896
- 2014 IEEE MATLAB TITLES for BtechUploaded byHari Ram
- ContentsUploaded bysharmikbk
- recognition of Urdu ScriptUploaded byapi-3754855
- Segmentation of Brain MRI Images using Fuzzy c-means and DWTUploaded byIJSTE
- Object Base Image Classifications BangladeshUploaded byMohd Razif Sumairi
- Av 23274279Uploaded byAnonymous 7VPPkWS8O
- S R SUploaded byfaizan_ke
- Iris RecognitionUploaded byNandhini Ilangovan
- Moving Object Detection for Video SurveillanceUploaded byIJMER