Abstract—This letter presents a two-stage object tracking method by combining a region-based method and a contour-based method. First, a kernel-based method is adopted to locate the object region. Then the diffusion snake is used to evolve the object contour in order to improve the tracking precision. In the first, object localization, stage, the initial target position is predicted and evaluated by the Kalman filter and the Bhattacharyya coefficient, respectively. In the contour evolution stage, the active contour is evolved on the basis of an object feature image generated from the color information in the initial object region. During the evolution, similarities of the target region are compared to ensure that the object contour evolves in the right way. A comparison between our method and the kernel-based method demonstrates that our method copes effectively with severe deformation of the object contour and therefore achieves a higher tracking precision.

Index Terms—Diffusion snake, Kalman filter, mean-shift, object deformation, object tracking.

I. Introduction

OBJECT TRACKING is an important task in many computer vision applications such as driver assistance [1], video surveillance [2], object-based video compression [3], and so on. Various methods have been proposed and improved, from simple rigid object tracking with a static camera to complex nonrigid object tracking with a moving camera. For ease of discussion, we classify these methods into two categories: region-based methods and contour-based methods.

The basic idea of region-based methods is to track the object with a similarity measure of the object region. The Bhattacharyya coefficient and the Kullback–Leibler divergence are two popular similarity measures, and the mean-shift algorithm has achieved considerable success in similarity region search due to its simplicity and robustness. The real-time kernel-based object tracking proposed by Comaniciu [4] can successfully track partially occluded nonrigid objects, but cannot cope with severe deformation of object contours. A more discriminative similarity measure in the spatial-feature space was proposed by Yang [5]: a symmetric similarity function between spatially smoothed kernel-density estimates of the model and target distributions. This similarity measure copes effectively with translation and scaling of the object, but does not provide rotation invariance. To cope with occlusions effectively, the Kalman filter [4], the particle filter [6], and SIFT features [7] have been combined with the mean-shift algorithm. The scale of the mean-shift kernel is a crucial parameter, so many mechanisms have been presented for choosing or updating the scale. Moments of the sample weight image were used in [8] to compute blob scale and orientation. Comaniciu et al. [4] suggest repeating the mean-shift algorithm at each iteration with window sizes of ±10% of the current size and selecting the best scale using the Bhattacharyya coefficient. Collins [9] added a scaling factor to the similarity measure and used an updating rule to adapt the scale. However, none of these scale-updating methods handles object deformation effectively.
Manuscript received May 31, 2006; revised February 10, 2007 and October 12, 2008. First version published January 29, 2010; current version published April 2, 2010. This work was supported by the National Science Foundation of China under Grants 60805003/60773172, by the Special Grade of the China Postdoctoral Science Foundation under Grant 200902519, and by a grant from the Research Grants Council of the Hong Kong Special Administrative Region (Project No. CUHK4121/08E). This paper was recommended by Associate Editor P. Topiwala.
Q. Chen and Q.-S. Sun are with the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China (e-mail: chen2qiang@163.com; sunquansen@mail.njust.edu.cn).
P. A. Heng is with the Department of Computer Science and Engineering, The Chinese University of Hong Kong, Hong Kong, China, and also with the Shenzhen Institute of Advanced Integration Technology, Chinese Academy of Sciences/Chinese University of Hong Kong, Shenzhen, China (e-mail: pheng@cse.cuhk.edu.hk).
D.-S. Xia is with the School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094, China, also with ESIC/ELEC, Rouen, France, and also with the Computer Graphics Laboratory, Centre National de la Recherche Scientifique (CNRS), Paris, France (e-mail: deshen_x@263.net).
Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TCSVT.2010.2041819
1051-8215/$26.00 © 2010 IEEE

606 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 20, NO. 4, APRIL 2010

For contour-based methods, snakes [10], [11] or level sets [12] are mainly used to track object contours. Peterfreund presented the Kalman snake model [13], in which the energy function is mainly constructed with optical flow. Chung and Chen [14] presented a video segmentation system that integrates Markov random field (MRF)-based contour tracking with graph-cut image segmentation. Yilmaz [15] incorporated prior shape into the object energy functional and used a level set to evolve the contour by minimizing that functional. To track objects with non-Gaussian state densities in clutter, Isard and Blake [16] presented the condensation algorithm. Contour-based methods can achieve high tracking precision, but their robustness is usually lower than that of region-based methods. Furthermore, their computing cost is usually high, especially for large, fast-moving objects.

There are some tracking methods that use both region and contour information. Sung and Kim [17] proposed an active
contour-based active appearance model (AAM) to improve the tracking accuracy and the convergence rate of the existing robust AAM [18]. Rathi et al. [19] formulated a particle filtering algorithm in the geometric active contour framework that can be used for tracking moving and deforming objects. Combining the merits of region-based and contour-based methods, we introduce a two-stage object tracking method. First, the kernel-based method is adopted to locate the object region, and the Kalman filter and the Bhattacharyya coefficient are used to determine the initial object tracking position. Then the object feature image is generated according to the color information in the object region, and the diffusion snake is used to evolve the object contour in order to improve the tracking precision.

II. Target Localization

A. Target Prediction Based on the Kalman Filter

In 1960, Kalman [20] published his famous paper describing a recursive solution to the discrete-data linear filtering problem. The Kalman filter is a set of mathematical equations that provides an efficient recursive means of estimating the state of a process in a way that minimizes the mean squared error. In this letter, the Kalman filter is used to predict the center $[x_c, y_c]$ of the object region. Let the displacements in the $x$ and $y$ directions be $dx$ and $dy$, respectively, so that the state vector is $X = [x_c, y_c, dx, dy]$. The Kalman filter system model is

$$X_{k+1} = F X_k + W_k \qquad (1)$$

respectively. We calculate the Bhattacharyya coefficients of four regions with the centers $[x_0 \pm 0.5h, y_0 \pm w]$, and take the center with the maximum coefficient as the center of the initial target region.

C. Kernel-Based Object Tracking

Let the reference target model be the probability density function (pdf) $q$ in the feature space. To reduce the computational cost, $m$-bin histograms are used. Thus we have $\hat{q} = \{\hat{q}_u\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m}\hat{q}_u = 1$, and the target candidate $\hat{p}(y) = \{\hat{p}_u(y)\}_{u=1,\dots,m}$ with $\sum_{u=1}^{m}\hat{p}_u(y) = 1$, where $y$ is the center of the object region. Let $\{X_i^*\}_{i=1,\dots,n}$ be the normalized pixel locations in the region defined as the target model; the region is centered at 0. The function $b: \mathbb{R}^2 \to \{1,\dots,m\}$ associates the pixel at location $X_i^*$ with the index $b(X_i^*)$ of its bin in the quantized feature space. The probability of the feature $u = 1,\dots,m$ in the target model is then computed as

$$\hat{q}_u = C \sum_{i=1}^{n} k\!\left(\|X_i^*\|^2\right) \delta\!\left[b(X_i^*) - u\right] \qquad (3)$$

where $k$ is the Epanechnikov kernel and $\delta$ is the Kronecker delta function. The normalization constant $C$ is derived by imposing the condition $\sum_{u=1}^{m}\hat{q}_u = 1$.

Let $\{X_i\}_{i=1,\dots,n_h}$ be the normalized pixel locations of the target candidate, centered at $y$ in the current frame. Then

$$\hat{p}_u(y) = C_h \sum_{i=1}^{n_h} k\!\left(\left\|\frac{y - X_i}{h}\right\|^2\right) \delta\!\left[b(X_i) - u\right] \qquad (4)$$
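The target localization machinery above — the constant-velocity prediction of (1) and the kernel-weighted histograms of (3) and (4) compared via the Bhattacharyya coefficient — can be sketched in a few lines of NumPy. This is a minimal illustration under our own assumptions, not the authors' implementation: the function names, the identity-velocity transition matrix F, the noise covariance Q, and the way pixel locations are normalized to the unit disc are all choices of ours.

```python
import numpy as np

def predict_center(state, P, F=None, Q=None):
    """One Kalman prediction step for the state X = [xc, yc, dx, dy] of (1).
    F is a constant-velocity transition matrix; W_k has covariance Q."""
    if F is None:
        F = np.array([[1, 0, 1, 0],
                      [0, 1, 0, 1],
                      [0, 0, 1, 0],
                      [0, 0, 0, 1]], dtype=float)
    if Q is None:
        Q = 0.01 * np.eye(4)
    state = F @ state           # X_{k+1} = F X_k  (process noise is zero-mean)
    P = F @ P @ F.T + Q         # propagate the error covariance
    return state, P

def epanechnikov(r2):
    """Epanechnikov profile k(r^2) evaluated on normalized squared distances."""
    return np.where(r2 < 1.0, 1.0 - r2, 0.0)

def kernel_histogram(patch_bins, m):
    """m-bin model of a patch of quantized feature indices b(X_i), as in (3):
    q_u = C * sum_i k(||X_i*||^2) * delta[b(X_i*) - u]."""
    H, W = patch_bins.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # normalized pixel locations, patch centered at 0
    r2 = (((ys - (H - 1) / 2) / (H / 2)) ** 2 +
          ((xs - (W - 1) / 2) / (W / 2)) ** 2)
    w = epanechnikov(r2).ravel()
    hist = np.bincount(patch_bins.ravel(), weights=w, minlength=m)
    return hist / hist.sum()    # the division plays the role of C

def bhattacharyya(p, q):
    """rho(p, q) = sum_u sqrt(p_u * q_u); equals 1 for identical histograms."""
    return float(np.sum(np.sqrt(p * q)))
```

Evaluating (4) then amounts to applying `kernel_histogram` to the patch extracted at each candidate center y, and the initial target position is the candidate whose histogram maximizes `bhattacharyya` against the model.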
Fig. 2. Tracking of the Plane sequence with the kernel-based method (top) and our method (bottom). Frames 1, 8, 15, 25, 45, and 54 are shown.
Fig. 3. Tracking of the Ball Board sequence with the kernel-based method (top) and our method (bottom). Frames 1, 11, 23, 33, and 45 are shown.
Fig. 4. Comparison of the tracking error. (a) Tracking error of Fig. 2. (b) Tracking error of Fig. 3.
Fig. 5. Comparison of the location error. (a) Location error of Fig. 2. (b) Location error of Fig. 3.
Fig. 2 shows the tracking results of the Plane sequence with the kernel-based method [4] and our method; the frames with large deformation are shown. The Plane sequence has 60 frames of 160 × 120 pixels. The color spaces for our method and the kernel-based method are quantized into 8 × 8 × 8 bins and 16 × 16 × 16 bins, respectively. The first frame is initialized manually. The same Kalman filter is used for the target prediction, and the scale of the kernel is adjusted by ±10% of the current size. Fig. 3 shows the tracking results of the Ball Board sequence, which has 45 frames of 352 × 240 pixels, with the kernel-based method [4] and our method.

In order to evaluate the tracking precision quantitatively, the error measure in this letter is defined as

$$\mathrm{error} = 1 - \frac{|O_1 \cap O_2| + |B_1 \cap B_2|}{|O_1| + |B_1|} \qquad (15)$$

where $O_1$ and $B_1$ are the object region and background region in the manual segmentation result, and $O_2$ and $B_2$ are the object region and background region in the segmentation result of the object tracking method. In addition, the location error of the object center is adopted to evaluate the location precision, defined as

$$d = \|C_a - C_m\| \qquad (16)$$

where $C_a$ and $C_m$ are the object center coordinates of the object tracking and manual segmentation results, respectively.

Fig. 4 shows the tracking errors of Figs. 2 and 3. It indicates that our method copes with deformation and scale change more effectively than the kernel-based method, so it achieves a higher tracking precision. The time performance of our method, however, is worse than that of the kernel-based method: tracking the Plane sequence takes 463 s with our method but 54 s with the kernel-based method. Fig. 5 shows the location errors of Figs. 2 and 3; from it we can observe that our method locates the object center better than the kernel-based method in most frames. In Figs. 2 and 3 no object loss occurs. If we omit the target localization and use only the active contour, the object is lost; the reason is that the initial target region then either does not include the object region or includes only a very small part of it.
CHEN et al.: TWO-STAGE OBJECT TRACKING METHOD BASED ON KERNEL AND ACTIVE CONTOUR 609
Fig. 6. Tracking of the person’s head in a Movie sequence with the kernel-based method (top) and our method (bottom). Frames 1, 2, 16, and 24 are shown.
Fig. 6 shows the tracking of the person's head in a Movie sequence. Because the displacement of the object between two consecutive frames is large, for example between frames 1 and 2 in Fig. 6, the object is easily lost with the active contour method alone, without the object localization based on the kernel-based method.

V. Conclusion

By combining the merits of the region-based method and the contour-based method, we have presented a two-stage object tracking method. Using the kernel-based method, we can locate the object effectively in complex conditions with camera motion, partial occlusions, clutter, etc., but the tracking precision is not high when the object deforms severely. To improve the tracking precision, we used the contour-based method to track the object contour precisely after the target localization. The experimental results demonstrated that our method achieves a higher tracking precision than the kernel-based method, but it is time-consuming. Because the object feature image in this letter is based on color information, our method cannot effectively track the object when the color feature of the object is very similar to that of the background. In future research, we will incorporate other image information, such as texture and shape, with the color information to generate a more robust object feature image.

References

[1] U. Handmann, T. Kalinke, C. Tzomakas, M. Werner, and W. von Seelen, "Computer vision for driver assistance systems," in Proc. Soc. Photo-Optic. Instrum. Eng., vol. 3364, 1998, pp. 136–147.
[2] D. Gavrila, "The visual analysis of human movement: A survey," Comput. Vision Image Understand., vol. 73, no. 1, pp. 82–98, Jan. 1999.
[3] M. Lee, W. Chen, B. Lin, C. Gu, T. Markoc, S. Zabinsky, and R. Szeliski, "A layered video object coding system using sprite and affine motion model," IEEE Trans. Circuits Syst. Video Technol., vol. 7, no. 1, pp. 130–145, Feb. 1997.
[4] D. Comaniciu, V. Ramesh, and P. Meer, "Kernel-based object tracking," IEEE Trans. Pattern Anal. Mach. Intell., vol. 25, no. 5, pp. 564–577, May 2003.
[5] C. Yang, R. Duraiswami, and L. Davis, "Efficient mean-shift tracking via a new similarity measure," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 1, 2005, pp. 176–183.
[6] C. Chang and R. Ansari, "Kernel particle filter for visual tracking," IEEE Signal Process. Lett., vol. 12, no. 3, pp. 242–245, Mar. 2005.
[7] H. Y. Zhou, Y. Yuan, and C. M. Shi, "Object tracking using SIFT features and mean shift," Comput. Vision Image Understand., vol. 113, no. 3, pp. 345–352, Mar. 2009.
[8] G. R. Bradski, "Computer vision face tracking for use in a perceptual user interface," Intel Technol. J., vol. 2, no. 2, pp. 1–15, 1998.
[9] R. T. Collins, "Mean-shift blob tracking through scale space," in Proc. IEEE Comput. Soc. Conf. Comput. Vision Pattern Recognit. (CVPR), vol. 2, 2003, pp. 234–240.
[10] S. Sun, D. R. Haynor, and Y. Kim, "Semiautomatic video object segmentation using VSnakes," IEEE Trans. Circuits Syst. Video Technol., vol. 13, no. 1, pp. 75–82, Jan. 2003.
[11] Q. Chen, Q. S. Sun, P. A. Heng, and D. S. Xia, "Parametric active contours for object tracking based on matching degree image of object contour points," Pattern Recognit. Lett., vol. 29, no. 2, pp. 126–141, Jan. 2008.
[12] N. Paragios and R. Deriche, "Geodesic active regions and level set methods for motion estimation and tracking," Comput. Vision Image Understand., vol. 97, no. 3, pp. 259–282, Mar. 2005.
[13] N. Peterfreund, "Robust tracking of position and velocity with Kalman snakes," IEEE Trans. Pattern Anal. Mach. Intell., vol. 21, no. 6, pp. 564–569, Jun. 1999.
[14] C. Y. Chung and H. H. Chen, "Video object extraction via MRF-based contour tracking," IEEE Trans. Circuits Syst. Video Technol., vol. 20, no. 1, pp. 149–155, Jan. 2010.
[15] A. Yilmaz, X. Li, and M. Shah, "Contour-based object tracking with occlusion handling in video acquired using mobile cameras," IEEE Trans. Pattern Anal. Mach. Intell., vol. 26, no. 11, pp. 1531–1536, Nov. 2004.
[16] M. Isard and A. Blake, "Condensation: Conditional density propagation for visual tracking," Int. J. Comput. Vision, vol. 29, no. 1, pp. 5–28, 1998.
[17] J. Sung and D. Kim, "A background robust active appearance model using active contour technique," Pattern Recognit., vol. 40, no. 1, pp. 108–120, Jan. 2007.
[18] R. Gross, I. Matthews, and S. Baker, "Constructing and fitting active appearance models with occlusion," in Proc. IEEE Workshop Face Process. Video, 2004, pp. 674–679.
[19] Y. Rathi, N. Vaswani, A. Tannenbaum, and A. Yezzi, "Tracking deforming objects using particle filtering for geometric active contours," IEEE Trans. Pattern Anal. Mach. Intell., vol. 29, no. 8, pp. 1470–1475, Aug. 2007.
[20] R. E. Kalman, "A new approach to linear filtering and prediction problems," Trans. Am. Soc. Mechan. Eng.–J. Basic Eng., vol. 82, no. 1, pp. 35–45, 1960.
[21] G. Welch and G. Bishop, "An introduction to the Kalman filter," Dept. Comput. Sci., Univ. North Carolina, Chapel Hill, Tech. Rep. TR95-041, 2004.
[22] D. Salmond, "Target tracking: Introduction and Kalman tracking filters," in Proc. IEE Target Tracking: Algorithms Applicat. (Ref. No. 2001/174), vol. 2, 2001, pp. 1–16.
[23] D. Cremers, F. Tischhäuser, J. Weickert, and C. Schnörr, "Diffusion snakes: Introducing statistical shape knowledge into the Mumford–Shah functional," Int. J. Comput. Vision, vol. 50, no. 3, pp. 295–313, Dec. 2002.
[24] D. Mumford and J. Shah, "Optimal approximations by piecewise smooth functions and associated variational problems," Commun. Pure Appl. Math., vol. 42, no. 5, pp. 577–685, 1989.