

Two-Stage Object Tracking Method Based on Kernel and Active Contour

Qiang Chen, Quan-Sen Sun, Pheng Ann Heng, Senior Member, IEEE, and De-Shen Xia

Abstract—This letter presents a two-stage object tracking method by combining a region-based method and a contour-based method. First, a kernel-based method is adopted to locate the object region. Then the diffusion snake is used to evolve the object contour in order to improve the tracking precision. In the first, object localization, stage, the initial target position is predicted and evaluated by the Kalman filter and the Bhattacharyya coefficient, respectively. In the contour evolution stage, the active contour is evolved on the basis of an object feature image generated from the color information in the initial object region. During the evolution, similarities of the target region are compared to ensure that the object contour evolves in the right way. The comparison between our method and the kernel-based method demonstrates that our method can effectively cope with severe deformation of the object contour, and therefore achieves a higher tracking precision.

Index Terms—Diffusion snake, Kalman filter, mean-shift, object deformation, object tracking.

I. Introduction

OBJECT TRACKING is an important task in many computer vision applications such as driver assistance [1], video surveillance [2], object-based video compression [3], and so on. Various methods have been proposed and improved, ranging from simple rigid object tracking with a static camera to complex nonrigid object tracking with a moving camera. For ease of discussion, we classify these methods into two categories: region-based methods and contour-based methods.

The basic idea of region-based methods is to track the object using a similarity measure over the object region. The Bhattacharyya coefficient and the Kullback–Leibler divergence are two popular similarity measures, and the mean-shift algorithm has achieved considerable success in similarity region search due to its simplicity and robustness. The real-time kernel-based object tracker proposed by Comaniciu et al. [4] can successfully track partially occluded nonrigid objects, but cannot cope with severe deformation of object contours. A more discriminative similarity measure in spatial-feature space was proposed by Yang et al. [5]: a symmetric similarity function between spatially smoothed kernel-density estimates of the model and target distributions. This measure copes effectively with translation and scaling of the object, but does not consider rotation invariance. To cope with occlusions effectively, the Kalman filter [4], the particle filter [6], and SIFT features [7] have been combined with the mean-shift algorithm. The scale of the mean-shift kernel is a crucial parameter, and several mechanisms have been proposed for choosing or updating it. Moments of the sample weight image were used in [8] to compute blob scale and orientation. In [4], the mean-shift algorithm is repeated at each iteration with window sizes of ±10% of the current size, and the best scale is selected with the Bhattacharyya coefficient. Collins [9] added a scaling factor to the similarity measure and used an updating rule to adapt the scale. None of these scale-updating methods, however, handles object deformation effectively.
For contour-based methods, snakes [10], [11] or level sets [12] are mainly used to track object contours. Peterfreund [13] presented the Kalman snake model, in which the energy function is mainly constructed from optical flow. Chung and Chen [14] presented a video segmentation system that integrates Markov random field (MRF)-based contour tracking with graph-cut image segmentation. Yilmaz et al. [15] incorporated prior shape into the object energy function and used a level set to evolve the contour by minimizing the energy functional. To track objects with a non-Gaussian state density in clutter, Isard and Blake [16] presented the condensation algorithm. Contour-based methods can achieve a high tracking precision, but their robustness is usually lower than that of region-based methods. Furthermore, their computational cost is usually high, especially for large and fast-moving objects.
Some tracking methods use both region and contour information. Sung and Kim [17] proposed an active contour-based active appearance model (AAM) to improve the tracking accuracy and convergence rate of the existing robust AAM [18]. Rathi et al. [19] formulated a particle filtering algorithm in the geometric active contour framework that can be used for tracking moving and deforming objects. Combining the merits of region-based and contour-based methods, we introduce a two-stage object tracking method. First, the kernel-based method is adopted to locate the object region, with the Kalman filter and the Bhattacharyya coefficient used to determine the initial object tracking position. Then an object feature image is generated according to the color information in the object region, and the diffusion snake is used to evolve the object contour in order to improve the tracking precision.

II. Target Localization

A. Target Prediction Based on the Kalman Filter

In 1960, Kalman [20] published his famous paper describing a recursive solution to the discrete-data linear filtering problem. The Kalman filter is a set of mathematical equations that provides an efficient recursive means to estimate the state of a process in a way that minimizes the mean squared error. In this letter, the Kalman filter is used to predict the center [xc, yc] of the object region. Let the displacements in the x and y directions be dx and dy, respectively; the state vector is then X = [xc, yc, dx, dy]. The Kalman filter system model is

$$X_{k+1} = F X_k + W_k \qquad (1)$$

and the measurement model is

$$Z_k = H X_k + V_k \qquad (2)$$

where

$$F = \begin{bmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}, \qquad H = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{bmatrix}.$$

Details about the Kalman filter can be found in [21] and [22].
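To make (1) and (2) concrete, the following is a minimal NumPy sketch of the predict and correct steps for this constant-velocity model. The noise covariances Q and R are illustrative assumptions; the letter does not specify them.

```python
import numpy as np

# Constant-velocity Kalman filter for the object center, following (1) and (2).
# State X = [xc, yc, dx, dy]; Q and R below are assumed values for illustration.
F = np.array([[1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0],
              [0, 0, 0, 1]], dtype=float)
H = np.array([[1, 0, 0, 0],
              [0, 1, 0, 0]], dtype=float)
Q = 0.01 * np.eye(4)   # process noise covariance (assumed)
R = 1.0 * np.eye(2)    # measurement noise covariance (assumed)

def kalman_predict(x, P):
    """Predict the next state; the prediction seeds the mean-shift search."""
    return F @ x, F @ P @ F.T + Q

def kalman_correct(x_pred, P_pred, z):
    """Correct with the measured object center z = [xc, yc]."""
    S = H @ P_pred @ H.T + R
    K = P_pred @ H.T @ np.linalg.inv(S)          # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)
    P = (np.eye(4) - K @ H) @ P_pred
    return x, P
```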

B. Evaluation of the Initial Target Region


III. Evolution of Object Contour
The center y of the candidate region in the kernel-based
method is crucial. When the target region intersects on the After the target localization with the kernel-based method,
object region, the mean-shift algorithm will find the location we adopt the diffusion snake to evolve the object contour in the
that maximizes the Bhattacharyya coefficient, which is the object feature space, in order to improve the tracking precision.
new position of the target. If there exists no intersection, the
procedure fails, namely the loss of the object. Though Kalman A. Generation of Object Feature Image
filter can improve the validity of the initial target region, the Let Q0 be the a bin image by quantifying each color
object loss still exists when the object changes the moving component of the initial RGB color image I0 , and let {Yi }i=1,...,n
direction suddenly. In this letter, the Bhattacharyya coefficient be the pixel locations in the initial object region. The color
of the target region is calculated in order to judge whether the probability density function in the object region is
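A minimal sketch of this relocation step follows. The `bhattacharyya` helper, assumed to evaluate the coefficient of (5) for a candidate region, is hypothetical; the letter does not define such an interface.

```python
def relocate_target(frame, x0, y0, h, w, q_model, bhattacharyya):
    """Pick the best of the four candidate centers [x0 +/- 0.5h, y0 +/- w].

    `bhattacharyya(frame, center, q_model)` is assumed to evaluate the
    coefficient of (5) for a region centered at `center`; its exact form
    is not specified in the letter.
    """
    candidates = [(x0 + sx * 0.5 * h, y0 + sy * w)
                  for sx in (-1, 1) for sy in (-1, 1)]
    return max(candidates,
               key=lambda c: bhattacharyya(frame, c, q_model))
```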
C. Kernel-Based Object Tracking

Let the reference target model be the probability density function (pdf) q in the feature space. To reduce the computational cost, m-bin histograms are used. Thus we have $\hat{q} = \{\hat{q}_u\}_{u=1\cdots m}$ with $\sum_{u=1}^{m}\hat{q}_u = 1$, and the target candidate $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1\cdots m}$ with $\sum_{u=1}^{m}\hat{p}_u = 1$, where y is the center of the object region. Let $\{X_i^*\}_{i=1\cdots n}$ be the normalized pixel locations in the region defined as the target model, with the region centered at 0. The function $b: R^2 \to \{1 \cdots m\}$ associates the pixel at location $X_i^*$ with the index $b(X_i^*)$ of its bin in the quantized feature space. The probability of the feature $u = 1 \cdots m$ in the target model is then computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k\left(\left\|X_i^*\right\|^2\right)\delta\left[b(X_i^*) - u\right] \qquad (3)$$

where k is the Epanechnikov kernel and δ is the Kronecker delta function. The normalization constant C is derived by imposing the condition $\sum_{u=1}^{m}\hat{q}_u = 1$.

Let $\{X_i\}_{i=1\cdots n_h}$ be the normalized pixel locations of the target candidate, centered at y in the current frame. Then

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\left(\left\|\frac{y - X_i}{h}\right\|^2\right)\delta\left[b(X_i) - u\right] \qquad (4)$$

where h is the bandwidth and $C_h$ is the normalization constant.

The similarity function defines a distance between the target model and the candidates. We adopt the Bhattacharyya coefficient to define this distance:

$$\hat{\rho}(y) \equiv \rho\left[\hat{p}(y), \hat{q}\right] = \sum_{u=1}^{m} \sqrt{\hat{p}_u(y)\,\hat{q}_u}. \qquad (5)$$

The detailed object localization process with the Bhattacharyya coefficient was introduced in [4].
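To make (3)–(5) concrete, the sketch below builds a kernel-weighted color histogram and evaluates the Bhattacharyya coefficient. The 16-level-per-channel RGB quantization follows the experiments in Section IV; weighting a rectangular patch with an axis-normalized Epanechnikov profile is our simplification.

```python
import numpy as np

def color_histogram(patch, m_per_channel=16):
    """Kernel-weighted color histogram of an RGB patch, as in (3)/(4).

    `patch` is an (H, W, 3) uint8 array centered on the target. The
    Epanechnikov profile k(r) = 1 - r (for r < 1) weights each pixel by
    its normalized squared distance from the patch center.
    """
    H, W, _ = patch.shape
    ys, xs = np.mgrid[0:H, 0:W]
    r2 = ((xs - W / 2) / (W / 2)) ** 2 + ((ys - H / 2) / (H / 2)) ** 2
    weights = np.clip(1.0 - r2, 0.0, None)           # Epanechnikov profile
    # bin index b(x): quantize each channel into m_per_channel levels
    q = (patch.astype(int) * m_per_channel) // 256
    bins = q[..., 0] * m_per_channel**2 + q[..., 1] * m_per_channel + q[..., 2]
    hist = np.bincount(bins.ravel(), weights=weights.ravel(),
                       minlength=m_per_channel**3)
    return hist / hist.sum()                         # impose sum = 1

def bhattacharyya_coeff(p, q):
    """Bhattacharyya coefficient of (5)."""
    return np.sum(np.sqrt(p * q))
```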
III. Evolution of Object Contour

After the target localization with the kernel-based method, we adopt the diffusion snake to evolve the object contour in the object feature space, in order to improve the tracking precision.

A. Generation of Object Feature Image

Let $Q_0$ be the bin image obtained by quantizing each color component of the initial RGB color image $I_0$ into a levels, and let $\{Y_i\}_{i=1,\ldots,n}$ be the pixel locations in the initial object region. The color probability density function in the object region is

$$w_r = C\left(\sum_{i=1}^{n}\delta\left[b(Y_i) - r\right]\right)^{s}, \quad r = 1, \ldots, a^3 \qquad (6)$$

where the definitions of the constant C and the function δ are the same as in (3), and the exponent 0 < s < 2 adjusts the color difference between the object and the background.

The values of the parameters a and s depend on the consistency of the color in the object region. If the consistency is good, namely there is little color variety in the object region and little color change of the object throughout the sequence, the values of a and s should be larger. If there is much color variety in the object region and large color change of the object, the values of a and s should be smaller. In order to overcome the influence of illumination changes, we can adopt the chrominance instead of the RGB space to generate the object feature image. If the texture feature in the object region is very obvious, we can adopt the texture feature to generate the object feature image.

According to the color probability density function, the object feature image can be obtained as

$$F_I(i, j) = w\left(b(Z(i, j))\right) \qquad (7)$$

where Z(i, j) denotes the pixel with image coordinate (i, j). The basic idea of (7) is that, for each pixel in the image, the corresponding value of the color probability density function is used as the value of the object feature image.

Fig. 1. Generation of the object feature image. (a) Initialization. (b) Object feature image. (c) Smoothed object feature image.

Fig. 1 shows the generation of an object feature image. Fig. 1(a) shows the initial object contour in the first frame. Fig. 1(b) shows the object feature image generated with (7), where a = 8 and s = 1. Fig. 1(c) is the object feature image smoothed with a Gaussian filter, where some noise is removed.
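A compact sketch of (6) and (7), under our reading of (6) in which the region histogram (rather than each δ term) is raised to the exponent s; the quantization mirrors the histogram sketch above.

```python
import numpy as np

def object_feature_image(frame, mask, a=8, s=1.0):
    """Build the object feature image of (6)-(7).

    `frame` is an (H, W, 3) uint8 RGB image and `mask` a boolean array
    marking the initial object region. Each color channel is quantized
    into `a` levels; the region histogram is raised to the exponent `s`
    (our reading of (6)) and back-projected onto every pixel.
    """
    q = (frame.astype(int) * a) // 256
    bins = q[..., 0] * a * a + q[..., 1] * a + q[..., 2]    # b(Z(i, j))
    # color pdf w_r over the object region, eq. (6)
    hist = np.bincount(bins[mask].ravel(), minlength=a**3).astype(float)
    w = hist ** s
    w /= w.sum()                                            # normalization C
    return w[bins]                                          # F_I(i, j), eq. (7)
```

In practice the resulting image would then be smoothed with a Gaussian filter, as in Fig. 1(c).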
B. Diffusion Snake

Cremers et al. [23] presented the diffusion snake by integrating shape statistics into the Mumford–Shah model [24]. To construct the energy function conveniently, we adopt a closed spline curve to represent the boundary C:

$$C : [0, 1] \to C(s) = \sum_{n=1}^{N} p_n B_n(s) \qquad (8)$$

where the $B_n$ are periodic, quadratic B-spline basis functions and $p_n = (x_n, y_n)^t$ are the spline control points; s is the knot parameter of the B-spline curve C, and N is the total number of spline control points.

In 1989, Mumford and Shah presented a variational approach to image segmentation that consists of minimizing the following energy function:

$$E_i(u, C) = \frac{1}{2}\int_{\Omega}(I - u)^2\,dx + \frac{1}{2}\lambda^2\int_{\Omega - C}|\nabla u|^2\,dx + \nu\,|C| \qquad (9)$$

where I is the input image, namely the object feature image $F_I$ in this letter, u is a piecewise smooth function, Ω is the image plane, and λ, ν are positive constants. Replacing the original length norm |C| by the squared $L_2$-norm $L(C) = \int_0^1 C_s^2\,ds$ yields the functional of the diffusion snake:

$$E_i(u, C) = \frac{1}{2}\int_{\Omega}(I - u)^2\,dx + \frac{1}{2}\lambda^2\int_{\Omega - C}|\nabla u|^2\,dx + \nu\,L(C). \qquad (10)$$

The diffusion snake can thus be considered a hybrid model that combines the external energy of the Mumford–Shah functional with the internal energy of the snake.

For a fixed segmentation u, minimizing the diffusion snake functional with respect to the contour C gives the Euler–Lagrange equation

$$\frac{\partial E_i}{\partial C} = \left[e^{-}(s) - e^{+}(s)\right] n(s) - \nu C_{ss}(s) = 0, \quad \forall s \in [0, 1] \qquad (11)$$

where $C_{ss}$ is the second derivative of the B-spline curve with respect to s, and $e^{+}$ and $e^{-}$ denote the energy density inside and outside the contour C(s), respectively:

$$e^{+/-} = (I - u)^2 + \lambda^2(\nabla u)^2 \qquad (12)$$

and n denotes the outer normal vector on the contour.

Solving the minimization problem of (10) by gradient descent results in the evolution equation

$$\frac{\partial C}{\partial t} = -\frac{\partial E_i}{\partial C} = \left[e^{+}(s) - e^{-}(s)\right] n(s) + \nu C_{ss}(s). \qquad (13)$$

Equation (13) can be converted into an evolution equation for the control points by inserting the definition (8) of the contour as a spline curve. For control point m we obtain

$$\frac{dx_m(t)}{dt} = \sum_i B^{-1}_{mi}\left[\left(e^{+}(s_i, t) - e^{-}(s_i, t)\right) n_x(s_i, t) + \nu\left(x_{i-1} - 2x_i + x_{i+1}\right)\right]$$
$$\frac{dy_m(t)}{dt} = \sum_i B^{-1}_{mi}\left[\left(e^{+}(s_i, t) - e^{-}(s_i, t)\right) n_y(s_i, t) + \nu\left(y_{i-1} - 2y_i + y_{i+1}\right)\right] \qquad (14)$$

where the cyclic tridiagonal matrix B contains the spline basis functions evaluated at the nodes.
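The control-point update (14) lends itself to a simple explicit scheme. The sketch below assumes the standard quadratic B-spline node weights (1/8, 6/8, 1/8) for B and an explicit Euler step; the step size dt and the value of ν are illustrative.

```python
import numpy as np

def spline_matrix(N):
    """Cyclic tridiagonal matrix B of quadratic B-spline basis functions
    evaluated at the nodes (row weights 1/8, 6/8, 1/8)."""
    B = np.zeros((N, N))
    for i in range(N):
        B[i, (i - 1) % N] = 1 / 8
        B[i, i] = 6 / 8
        B[i, (i + 1) % N] = 1 / 8
    return B

def evolve_control_points(px, py, e_plus, e_minus, nx, ny, B_inv,
                          nu=0.1, dt=0.1):
    """One explicit Euler step of (14) for the spline control points.

    e_plus/e_minus are the energy densities of (12) sampled at the nodes,
    (nx, ny) the outer normals there; nu and dt are illustrative values.
    """
    force = e_plus - e_minus
    curv_x = np.roll(px, 1) - 2 * px + np.roll(px, -1)  # x_{i-1}-2x_i+x_{i+1}
    curv_y = np.roll(py, 1) - 2 * py + np.roll(py, -1)
    px_new = px + dt * (B_inv @ (force * nx + nu * curv_x))
    py_new = py + dt * (B_inv @ (force * ny + nu * curv_y))
    return px_new, py_new
```

Here `B_inv = np.linalg.inv(spline_matrix(N))` need only be computed once, since B depends solely on the number of control points.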
C. Evolution Control

Because the generated object feature image only coarsely distinguishes object from background, the object contour may wrongly evolve into the background. In order to make the object contour evolve in the right way, we compare the similarities of target regions during the contour evolution. After some evolution steps, the Bhattacharyya coefficient of the target region is calculated. If the coefficient becomes larger, the object contour continues to evolve; otherwise, the evolution stops.
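This control amounts to a monitored loop. The sketch below uses hypothetical helpers: evolve_step (a few iterations of (14)) and region_similarity (the coefficient of (5) for the region enclosed by the contour); the checking interval and iteration cap are arbitrary choices.

```python
def evolve_with_control(contour, q_model, evolve_step, region_similarity,
                        check_every=5, max_iters=200):
    """Evolve the contour while the Bhattacharyya coefficient keeps rising.

    `evolve_step` and `region_similarity` are assumed helpers (see the
    earlier sketches); both names and the iteration counts are illustrative.
    """
    best = region_similarity(contour, q_model)
    for _ in range(max_iters // check_every):
        candidate = evolve_step(contour, n_steps=check_every)
        score = region_similarity(candidate, q_model)
        if score <= best:       # similarity stopped improving: stop evolving
            break
        contour, best = candidate, score
    return contour
```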
IV. Experimental Results and Discussion

We have tested our method on various videos downloaded from the Internet, and have performed the experiments on a 2.8 GHz Pentium 4 PC with 512 MB of memory. In the experiments, the RGB color space was taken as the feature space and quantized into 16 × 16 × 16 bins.
Fig. 2. Tracking of the Plane sequence with the kernel-based method (top) and our method (bottom). Frames 1, 8, 15, 25, 45, and 54 are shown.

Fig. 3. Tracking of the Ball Board sequence with the kernel-based method (top) and our method (bottom). Frames 1, 11, 23, 33, and 45 are shown.

Fig. 4. Comparison of the tracking error. (a) Tracking error of Fig. 2. (b) Tracking error of Fig. 3.

Fig. 5. Comparison of the location error. (a) Location error of Fig. 2. (b) Location error of Fig. 3.

Fig. 2 shows the tracking results for the Plane sequence with the kernel-based method [4] and our method; the frames with large deformation are shown. The Plane sequence has 60 frames of 160 × 120 pixels. The color spaces for our method and the kernel-based method are quantized into 8 × 8 × 8 bins and 16 × 16 × 16 bins, respectively. The first frame is initialized manually. The same Kalman filter is used for the target prediction, and the scale of the kernel is adjusted by ±10% of the current size. Fig. 3 shows the tracking results for the Ball Board sequence, which has 45 frames of 352 × 240 pixels, with the kernel-based method [4] and our method.

In order to evaluate the tracking precision quantitatively, the error measure in this letter is defined as

$$\text{error} = 1 - \frac{|O_1 \cap O_2| + |B_1 \cap B_2|}{|O_1 \cup B_1|} \qquad (15)$$

where $O_1$ and $B_1$ are the object region and background region in the manual segmentation result, and $O_2$ and $B_2$ are the object region and background region in the result of the object tracking method. In addition, the location error of the object center is adopted to evaluate the location precision, defined as

$$d = \|C_a - C_m\| \qquad (16)$$

where $C_a$ and $C_m$ are the object center coordinates of the tracking and manual segmentation results, respectively.

Fig. 4 shows the tracking errors for Figs. 2 and 3. It indicates that our method copes with deformation and scale change more effectively than the kernel-based method, so our method achieves a higher tracking precision. The time performance of our method, however, is worse than that of the kernel-based method: for the Plane sequence, tracking takes 463 s with our method but 54 s with the kernel-based method. Fig. 5 shows the location errors for Figs. 2 and 3. From Fig. 5, we can observe that our method locates the object center better than the kernel-based method for most frames. For Figs. 2 and 3, no object loss occurs. If we omit the target localization and only use the active contour, object loss appears; the reason is that the initial target region then does not include the object region, or includes only a very small part of it.
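Both measures are easy to compute from binary masks and center coordinates; a minimal sketch, under our reading of (15) in which the denominator is the total pixel count:

```python
import numpy as np

def tracking_error(obj_manual, obj_tracked):
    """Region error of (15) from two boolean object masks of equal shape."""
    bg_manual, bg_tracked = ~obj_manual, ~obj_tracked
    agree = (obj_manual & obj_tracked).sum() + (bg_manual & bg_tracked).sum()
    return 1.0 - agree / obj_manual.size    # |O1 u B1| = all pixels

def location_error(center_auto, center_manual):
    """Center distance d of (16)."""
    return float(np.linalg.norm(np.asarray(center_auto, float)
                                - np.asarray(center_manual, float)))
```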
Fig. 6. Tracking of the person's head in a Movie sequence with the kernel-based method (top) and our method (bottom). Frames 1, 2, 16, and 24 are shown.

Fig. 6 shows the tracking of a person's head in a Movie sequence. Because the displacement of the object between two consecutive frames is large, as between frames 1 and 2 in Fig. 6, the object would easily be lost with the active contour method alone, without the kernel-based object localization.

V. Conclusion

By combining the merits of the region-based method and the contour-based method, we have presented a two-stage object tracking method. Using the kernel-based method, we can locate the object effectively in complex conditions with camera motion, partial occlusions, clutter, etc., but the tracking precision is not high when the object deforms severely. In order to improve the tracking precision, we use the contour-based method to track the object contour precisely after the target localization. The experimental results demonstrate that our method achieves a higher tracking precision than the kernel-based method, but it is more time-consuming. Because the object feature image in this letter is based on color information, our method cannot effectively track the object when the color feature of the object is very similar to that of the background. In future research, we will incorporate other image information, such as texture and shape, into the color information to generate a more robust object feature image.
2007.
References

[1] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, and W. von Seelen, "Computer vision for driver assistance systems," in Proc. Soc. Photo-Optic. Instrum. Eng., vol. 3364, 1998, pp. 136–147.
[2] D. Gavrila, "The visual analysis of human movement: A survey," Comput. Vision Image Understand., vol. 73, no. 1, pp. 82–98, Jan. 1999.
[3] M. Lee, W. Chen, B. Lin, C. Gu, T. Markoc, S. Zabinsky, and R. Szeliski, "A layered video object coding system using sprite and affine motion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 130–145, Feb. 1997.
[4] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.
[5] C. Yang, R. Duraiswami, and L. Davis, "Efficient mean-shift tracking via a new similarity measure," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 1, 2005, pp. 176–183.
[6] C. Chang and R. Ansari, "Kernel particle filter for visual tracking," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 242–245, Mar. 2005.
[7] H. Y. Zhou, Y. Yuan, and C. M. Shi, "Object tracking using SIFT features and mean shift," Comput. Vision Image Understand., vol. 113, no. 3, pp. 345–352, Mar. 2009.
[8] G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technol. J., vol. 2, no. 2, pp. 1–15, 1998.
[9] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 2, 2003, pp. 234–240.
[10] S. Sun, D. R. Haynor, and Y. Kim, "Semiautomatic video object segmentation using VSnakes," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 75–82, Jan. 2003.
[11] Q. Chen, Q. S. Sun, P. A. Heng, and D. S. Xia, "Parametric active contours for object tracking based on matching degree image of object contour points," Pattern Recognit. Lett., vol. 29, no. 2, pp. 126–141, Jan. 2008.
[12] N. Paragios and R. Deriche, "Geodesic active regions and level set methods for motion estimation and tracking," Comput. Vision Image Understand., vol. 97, no. 3, pp. 259–282, Mar. 2005.
[13] N. Peterfreund, "Robust tracking of position and velocity with Kalman snakes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 6, pp. 564–569, Jun. 1999.
[14] C. Y. Chung and H. H. Chen, "Video object extraction via MRF-based contour tracking," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 1, pp. 149–155, Jan. 2010.
[15] A. Yilmaz, X. Li, and M. Shah, "Contour-based object tracking with occlusion handling in video acquired using mobile cameras," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1531–1536, Nov. 2004.
[16] M. Isard and A. Blake, "Condensation: Conditional density propagation for visual tracking," Int. J. Comput. Vision, vol. 29, no. 1, pp. 5–28, 1998.
[17] J. Sung and D. Kim, "A background robust active appearance model using active contour technique," Pattern Recognit., vol. 40, no. 1, pp. 108–120, Jan. 2007.
[18] R. Gross, I. Matthews, and S. Baker, "Constructing and fitting active appearance models with occlusion," in Proc. IEEE Workshop Face Process. Video, 2004, pp. 674–679.
[19] Y. Rathi, N. Vaswani, A. Tannenbaum, and A. Yezzi, "Tracking deforming objects using particle filtering for geometric active contours," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1470–1475, Aug. 2007.
[20] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. Am. Soc. Mechan. Eng.–J. Basic Eng., vol. 82, no. 1, pp. 35–45, 1960.
[21] G. Welch and G. Bishop, "An introduction to the Kalman filter," Dept. Comput. Sci., Univ. North Carolina, Chapel Hill, Tech. Rep. TR95-041, 2004.
[22] D. Salmond, "Target tracking: Introduction and Kalman tracking filters," in Proc. IEE Target Tracking: Algorithms Applicat. (Ref. No. 2001/174), vol. 2, 2001, pp. 1–16.
[23] D. Cremers, F. Tischhäuser, J. Weickert, and C. Schnörr, "Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional," Int. J. Comput. Vision, vol. 50, no. 3, pp. 295–313, Dec. 2002.
[24] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions and associated variational problems," Comm. Pure Appl. Math., vol. 42, no. 5, pp. 577–685, 1989.
