Region Covariance Based Object Tracking Using Monte Carlo Method

2010 8th IEEE International Conference on Control and Automation
Xiamen, China, June 9-11, 2010
FrB7.1

Xiaofeng Ding, Chengrong Huang, Fengchen Huang,
Lizhong Xu, Member, IEEE, and Xiao-fang Li, Member, IEEE
This work is supported by the National High Technology Research and Development Program of China (No. 2007AA11Z227) and the Natural Science Foundation of Jiangsu Province of China (No. BK2009352).
Xiaofeng Ding, Fengchen Huang, Lizhong Xu and Xiao-fang Li are with the College of Computer and Information, Hohai University, Nanjing, 210098, China. dingxfeng@gmail.com, lzhxu@hhu.edu.cn.
Chengrong Huang is with the School of Computer Engineering, Nanjing Institute of Technology, Nanjing, 211167, China.
978-1-4244-5196-8/10/$26.00 ©2010 IEEE

Abstract: Covariance features enable efficient fusion of different types of image features and have low dimensionality, and covariance-based object tracking has proved robust and versatile at a modest computational cost. In this paper, a method that combines the Monte Carlo method with covariance features is proposed. The Monte Carlo method is used to determine the search scope for the target at the region level. Covariance features are used to model the object's appearance at the object level. Improved object matching and occlusion handling strategies are given, followed by an appearance model update method. Experiments show that our approach is robust and effective for tracking objects with irregular movement and partial occlusions.

I. INTRODUCTION

Object tracking is an important task in computer vision, with applications in visual surveillance, human-computer interaction, vehicle navigation and so on. Object tracking can be classified into point tracking, kernel tracking and silhouette tracking [1], and different tracking methods suit different tracking tasks. Point tracking describes the object with point features, which are usually detected by the Harris, KLT and SIFT detectors. It is implemented by finding the relationship between the points in single and consecutive frames. Kernel tracking is typically performed by computing the motion of the kernel from one frame to the next. Silhouette tracking is performed by finding the object region in each frame by means of an object model generated from the previous frames, wherein the object model takes the form of a color histogram, object edges or object contours. Details of these three categories can be found in [1].

Generally, a suitable feature is selected to describe the target in kernel tracking and silhouette tracking, so selecting a feature that is discriminative, robust and easy to compute is critical to tracking performance. At first, the raw pixel values of several image statistics, such as color, gradient and filter responses, were used as image features. These features change easily under illumination changes and non-rigid motion. Then the histogram, a natural extension of raw pixel values, was used. However, the size of a joint histogram grows exponentially with the number of features. Recently, the covariance region descriptor has been proposed in [2]. A covariance matrix extracted from a region is used as the region descriptor. The covariance matrix fuses multiple features that might be correlated, such as intensity, color, gradients and filter responses. It is robust and discriminative, and computing it from integral images makes the computational cost independent of the size of the region.

Covariance-based object tracking, proposed in [2], [3], [4], has proved robust and versatile at a modest computational cost. Tuzel et al. [2] proposed a global search algorithm to find the best match between consecutive video frames, wherein covariance features were shown to be faster and more robust than color histograms. Porikli et al. [3] proposed a simple and elegant algorithm to track non-rigid objects using a covariance-based object description and an update mechanism based on means on Riemannian manifolds. Wu et al. [4] and Palaio and Batista [5] proposed covariance-based particle filters for single-object and multi-object tracking, respectively. However, when particle filters are applied to visual tracking, particle degeneracy and the choice of the target's motion model are two of the main problems.

In this paper, we propose a Monte Carlo method [6] combined with a covariance-based object description to track non-rigid objects with irregular movement and partial occlusions. In order to improve occlusion handling, we give a new object matching and model update strategy.

The paper is organized as follows: Section II describes the region covariance descriptor; Section III gives the Monte Carlo method; the object matching and model update strategies are introduced in Section IV; experimental results are provided in Section V, followed by conclusions in Section VI.

II. REGION COVARIANCE DESCRIPTOR

The region covariance descriptor proposed by Tuzel et al. in [2] fuses multiple features into a covariance matrix that is low-dimensional and discriminative.

Let $I$ be a three-dimensional color image. Let $F$ be the $W \times H \times d$ dimensional feature image extracted from $I$,

$$F(x, y) = \phi(I, x, y), \tag{1}$$

where the function $\phi$ can be any mapping such as intensity, color, gradients, filter responses, etc. For a given rectangular region $R$ in image $I$, let $\{f_k\}_{k=1..n}$ be the $d$-dimensional feature points inside $R$. The region covariance can be expressed as

$$C_R = \frac{1}{n-1} \sum_{k=1}^{n} (f_k - \mu)(f_k - \mu)^T, \tag{2}$$

where $C_R$ is a $d \times d$ symmetric positive definite matrix and $\mu$ is the mean of all the points in $R$.
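As an illustration of (2), the following minimal sketch computes the covariance of the feature points of one region directly, without the integral-image speed-up of [2]. The function name `region_covariance`, the use of NumPy and the toy 4 x 3 region with three features per pixel are assumptions made for this example only.

```python
import numpy as np

def region_covariance(features):
    """Covariance (2) of n feature points, one d-dimensional point per row."""
    f = np.asarray(features, dtype=float)      # shape (n, d)
    n = f.shape[0]
    mu = f.mean(axis=0)                        # mean of all points in R
    centered = f - mu
    return centered.T @ centered / (n - 1)     # d x d matrix

# Hypothetical toy region: 4 rows, 3 columns, d = 3 features (x, y, intensity).
ys, xs = np.mgrid[0:4, 0:3]
intensity = np.arange(12, dtype=float).reshape(4, 3) ** 2
F = np.stack([xs, ys, intensity], axis=-1)     # feature image, shape (4, 3, 3)
C_R = region_covariance(F.reshape(-1, 3))      # n = 12 feature points
```

The direct computation costs O(n d^2) per region, which is why a tracker that evaluates many candidate regions per frame benefits from the integral-image method of [2].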
Different combinations of sub-features extracted from the image represent different properties. To obtain a correct object representation, we need to choose which features to extract from the image $I$. In this paper, we define $\phi$ as

$$\phi = \left[\, x \;\; y \;\; R(x, y) \;\; G(x, y) \;\; B(x, y) \;\; I(x, y) \;\; I_x(x, y) \;\; I_y(x, y) \,\right], \tag{3}$$

where $x$ and $y$ are the pixel location; $R(x, y)$, $G(x, y)$ and $B(x, y)$ are the red, green and blue color values; $I$ is the intensity; $I_x(x, y)$ and $I_y(x, y)$ are the image derivatives calculated with the filter $[-1 \; 0 \; 1]$ in the $x$ and $y$ directions.

A. Covariance Computation and Dissimilarity Metric

Covariance-based object tracking needs a fast way to compute the covariance matrix of a given region. Once the covariance matrices of the regions are available, good tracking performance depends on how the dissimilarity between the matrices is measured. To address these two problems, an integral-image-based computation method [2] and a metric on a Riemannian manifold [3] are used.

The integral-image-based covariance matrix computation method proposed in [2] is used in our paper. With this method, the computational cost is independent of the size of the region. Details of the computation can be found in [2].

A covariance metric, which is an invariant metric on the tangent space of symmetric positive definite matrices, has been proposed in [3]. For completeness, and for the convenience of the object matching and model update in Section IV, we give the necessary introduction here.

The covariance metric proposed in [3] is given by

$$\langle y, z \rangle_X = \mathrm{tr}\left( X^{-\frac{1}{2}} \, y \, X^{-1} \, z \, X^{-\frac{1}{2}} \right), \tag{4}$$

where $\langle y, z \rangle_X$ is the metric defined by a collection of inner products on the tangent space, $y$ and $z$ are arbitrary vectors of the tangent space at $X$, and $\mathrm{tr}(\cdot)$ denotes the matrix trace; capital letters denote points on the manifold and small letters correspond to vectors on the tangent space.

The distance on the manifold, i.e. the dissimilarity between the covariance matrices of two regions, is defined in terms of minimum-length curves between points on the manifold. The curve with minimum length is called the geodesic, and the length of the curve is the intrinsic distance. Let $y \in T_X M$, where $T_X M$ is the tangent space at point $X \in M$. There is a unique geodesic starting at $X$ with tangent vector $y$. The exponential map, $\exp_X : T_X M \to M$, maps the vector $y$ to the point $Y$ reached by this geodesic. We denote its inverse by $\log_X$. The exponential map is given by

$$\exp_X(y) = X^{\frac{1}{2}} \exp\left( X^{-\frac{1}{2}} \, y \, X^{-\frac{1}{2}} \right) X^{\frac{1}{2}}. \tag{5}$$

The logarithm map is given by

$$\log_X(Y) = X^{\frac{1}{2}} \log\left( X^{-\frac{1}{2}} \, Y \, X^{-\frac{1}{2}} \right) X^{\frac{1}{2}}. \tag{6}$$

The distance between $X$ and $Y$ is given by $d(X, Y) = \| \log_X(Y) \|_X$. Combining (6) and (4), we have

$$d^2(X, Y) = \langle \log_X(Y), \log_X(Y) \rangle_X = \mathrm{tr}\left( \log^2\left( X^{-\frac{1}{2}} \, Y \, X^{-\frac{1}{2}} \right) \right). \tag{7}$$

III. MONTE CARLO METHOD

Tracking can be considered as estimating the state given all the measurements up to the current moment, or equivalently as constructing the probability density function of the object location [3]. Generally, a filtering-with-prediction approach is used, and the most common filter is the particle filter. However, particle degeneracy and the choice of the target's motion model are critical to the tracking performance. Our approach is based on the Monte Carlo method [6], which is a special case of the particle filter and does not need to handle particle degeneracy or to model the object's motion.

Given the object's location in the current frame, the Monte Carlo method is used to sample the object's possible locations and scales in the next frame. From these samples, the best match, i.e. the object's location, is obtained by (11).

Let the object's motion between consecutive frames follow a two-dimensional Gaussian distribution, and use this distribution to produce $n$ sampling rectangles. The centers of these rectangles follow the two-dimensional Gaussian distribution: the mean is the center of the rectangle in the previous frame, and the variance reflects the speed of the object's movement. The centers of the sampling rectangles are produced by the Box-Muller method [7]. Let the random variables $x$ and $y$ be uniformly distributed in $[0, 1]$; then $u$ and $v$, expressed as

$$\begin{cases} u = \left( \sqrt{-2 \ln x} \, \cos 2\pi y \right) \times \sigma + \mu \\ v = \left( \sqrt{-2 \ln x} \, \sin 2\pi y \right) \times \sigma + \mu \end{cases}, \tag{8}$$

follow the two-dimensional Gaussian distribution. Let the object's scale also follow a Gaussian distribution, whose mean is the scale of the object in the previous frame and whose variance reflects the rate of change of the scale. Using the Monte Carlo method, we then obtain the object's possible locations and scales in the next frame, see Fig. 1.

Fig. 1: The object is presented with a rectangle, and its corresponding 100 sampling rectangles.
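The sampling step built on (8) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `box_muller` and `sample_rectangles` are hypothetical, the scale is modeled as a multiplicative factor with mean 1 relative to the previous frame, separate means are used for the two center coordinates, and the value 0.05 for the scale spread is an arbitrary placeholder. The count of 100 rectangles follows Fig. 1.

```python
import math
import random

def box_muller(mu_u, mu_v, sigma, rng):
    """Draw one pair (u, v) as in (8) from two uniform variates x and y."""
    x = 1.0 - rng.random()          # in (0, 1], so log(x) is always defined
    y = rng.random()
    r = math.sqrt(-2.0 * math.log(x))
    u = r * math.cos(2.0 * math.pi * y) * sigma + mu_u
    v = r * math.sin(2.0 * math.pi * y) * sigma + mu_v
    return u, v

def sample_rectangles(cx, cy, w, h, sigma_pos, sigma_scale, n, seed=0):
    """Return n candidate rectangles (center_x, center_y, width, height)."""
    rng = random.Random(seed)
    rects = []
    for _ in range(n):
        u, v = box_muller(cx, cy, sigma_pos, rng)           # candidate center
        s, _ = box_muller(1.0, 1.0, sigma_scale, rng)       # scale factor (assumption)
        s = max(s, 0.1)             # guard against degenerate sizes (assumption)
        rects.append((u, v, w * s, h * s))
    return rects

# Hypothetical previous-frame rectangle centered at (160, 120), size 40 x 60.
samples = sample_rectangles(cx=160.0, cy=120.0, w=40.0, h=60.0,
                            sigma_pos=8.0, sigma_scale=0.05, n=100)
```

A larger `sigma_pos` spreads the candidates over a wider area, which helps with fast or irregular motion at the cost of a sparser coverage near the previous location.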
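Candidate regions are compared with the target through the distance (7). The following minimal sketch evaluates the maps (5) and (6) and the squared distance (7) with NumPy, applying the matrix functions through an eigendecomposition. The helper names (`_apply`, `log_map`, `exp_map`, `distance_sq`) and the 2 x 2 test matrices are assumptions made for illustration.

```python
import numpy as np

def _apply(A, fun):
    """Apply a scalar function to a symmetric matrix through its eigenvalues."""
    w, V = np.linalg.eigh((A + A.T) / 2.0)   # symmetrize against round-off
    return (V * fun(w)) @ V.T

def log_map(X, Y):
    """Logarithm map (6)."""
    Xh = _apply(X, np.sqrt)
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ Y @ Xih, np.log) @ Xh

def exp_map(X, y):
    """Exponential map (5)."""
    Xh = _apply(X, np.sqrt)
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    return Xh @ _apply(Xih @ y @ Xih, np.exp) @ Xh

def distance_sq(X, Y):
    """Squared distance (7): tr(log^2(X^{-1/2} Y X^{-1/2}))."""
    Xih = _apply(X, lambda w: 1.0 / np.sqrt(w))
    L = _apply(Xih @ Y @ Xih, np.log)
    return float(np.trace(L @ L))

# Hypothetical symmetric positive definite matrices.
X = np.array([[2.0, 0.5], [0.5, 1.0]])
Y = np.array([[1.0, 0.2], [0.2, 3.0]])
d2 = distance_sq(X, Y)
```

For the identity matrix and diag(e, e^2), the eigenvalue logarithms are 1 and 2, so (7) gives 1 + 4 = 5; the maps (5) and (6) are inverse to each other, so `exp_map(X, log_map(X, Y))` returns `Y` up to round-off.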
IV. OBJECT MATCHING AND MODEL UPDATE

To obtain the region most similar to the target, we can use (7) to measure the distance between the covariance matrices of the target object window and of the candidate regions. Theoretically, a single covariance matrix extracted from a region is usually enough to match the region in different views and poses, so the best object match can be found by minimizing the distance (7).

However, occlusions often happen during object tracking. Tuzel et al. [2] and Palaio and Batista [5] represented an object with five covariance matrices of the image features computed inside the object region, as shown in Fig. 2. Based on this representation, two different dissimilarities between two objects were given in [2], [5]. The dissimilarity in [5] is given by

$$\rho(O_1, O_2) = \sum_{i=1}^{5} d^2\left( C^i_{O_1}, C^i_{O_2} \right), \tag{9}$$

where $C^i_{O_1}$ and $C^i_{O_2}$ ($i = 1, ..., 5$) are the five covariance matrices of object 1 and object 2, respectively. This dissimilarity is better than the one in [2]. Equation (9) integrates the distances of all five regions of an object, and it is robust when the object is partially occluded.

Fig. 2: Object representation. Construction of the five covariance matrices from overlapping regions of an object feature image. Panels: (a) Object, (b) $C^1$, (c) $C^2$, (d) $C^3$, (e) $C^4$, (f) $C^5$.

Combining (9) with the Monte Carlo method, an improved object matching strategy is given. In the following, $k = 1, ..., n$ and $i = 1, ..., 5$ unless otherwise noted. Among all the sampling regions, let the location and scale of the $k$-th sampling region be $x_k$, and let $d_k$ be the distance between the $k$-th sampling region (or candidate region) and the target object region (or reference object). Let $\frac{1}{d_k}$ be the weight of the $k$-th sampling region. Then the predicted object location and scale is

$$\hat{x} = \frac{\sum_{k=1}^{n} \frac{x_k}{d_k}}{\sum_{k=1}^{n} \frac{1}{d_k}}. \tag{10}$$

Let the predicted object location and scale be the $(n + 1)$-th sampling region.

For the $k$-th sampling region, we can compute its five-covariance-matrix representation $C_k^i$. For a given $i$, $\min_{1 \le k \le n} d^2(C_k^i, C_T^i)$ selects one sampling region, which gives five sampling regions $k_1, k_2, k_3, k_4, k_5$ based on the spatial relationships between the original object and the subregions in Fig. 2. By (9), the best and most robust object match is

$$\arg\min_{j \in \{k_1, k_2, k_3, k_4, k_5, n+1\}} \sum_{i=1}^{5} d^2\left( C_j^i, C_T^i \right), \tag{11}$$

where $C_j^i$ and $C_T^i$ are the five covariance matrices of the $j$-th sampling region and of the target object, respectively.

Solutions for updating the covariance matrices were proposed in [3] and [5]: the method in [3] considers the previous $T$ covariance matrices through a gradient descent approach, while the method in [5] considers only the current and the last covariance matrices through a middle-mean approach. In this paper, we propose a novel solution for the covariance matrix update based on a variable mean of the current and the last covariance matrices. Let $X$ and $Y$ be the last and the current covariance matrices, and let $d(X, Y)$ be the distance between $X$ and $Y$. Then our model update is

$$C = X^{\frac{1}{2}} \exp\left( X^{-\frac{1}{2}} \left( \alpha \, d(X, Y) \right) X^{-\frac{1}{2}} \right) X^{\frac{1}{2}}, \tag{12}$$

where $\alpha$ is the rate of model update; the model is updated when $d(X, Y)$ exceeds a given threshold.

V. EXPERIMENTS AND RESULTS

To demonstrate the performance of the proposed approach, experiments were carried out on several video sequences. The experiments show that our approach is effective, feasible and robust when the object's motion involves one or several of these situations: (1) irregular movement; (2) target scale variations; (3) partial occlusions; (4) illumination changes.

A. Experiment 1

An experiment was carried out to show that our object matching strategy is robust and accurate. In Table I, the values in the final column do not differ much although the occlusions are different; and in each of columns 2 to 6, the minimum value corresponds to the largest visible area of the target object.

Fig. 3: The original object and different occlusions. (a): the original object. (b)-(f): the object under different types of partial occlusions (Occlusion-1 to Occlusion-5).
Occlusion j   d^2(C_O^1, C_j^1)   d^2(C_O^2, C_j^2)   d^2(C_O^3, C_j^3)   d^2(C_O^4, C_j^4)   d^2(C_O^5, C_j^5)   rho(O, O_j)
1             2.0875              1.5445              2.5791              0.6510              3.3970              10.2591
2             1.8770              1.6951              2.7369              0.2546              2.8200              9.3836
3             1.9158              0.4171              2.9884              0.3370              2.8523              8.5106
4             2.3987              0.3247              3.8712              2.6385              2.6440              11.8771
5             2.1392              0.6884              3.3502              1.9920              2.8185              10.9883

TABLE I: The dissimilarities of the object without and with different partial occlusions. These dissimilarities are calculated under the best matching selected manually. $C_O^i$ is the i-th covariance matrix (Fig. 2) of the object in Fig. 3(a). $C_j^i$ is the i-th covariance matrix of the object with occlusion-j (Fig. 3) with the i-th covariance representation.
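The final column of Table I is the sum (9) of the five per-region squared distances. The short check below recomputes it from the tabulated values; the variable names are illustrative, and the numbers are copied from Table I.

```python
# Rows of Table I: occlusion j -> (five squared distances, reported rho).
table_1 = {
    1: ([2.0875, 1.5445, 2.5791, 0.6510, 3.3970], 10.2591),
    2: ([1.8770, 1.6951, 2.7369, 0.2546, 2.8200], 9.3836),
    3: ([1.9158, 0.4171, 2.9884, 0.3370, 2.8523], 8.5106),
    4: ([2.3987, 0.3247, 3.8712, 2.6385, 2.6440], 11.8771),
    5: ([2.1392, 0.6884, 3.3502, 1.9920, 2.8185], 10.9883),
}

def rho(part_dist_sq):
    """Dissimilarity (9): sum of the five per-region squared distances."""
    return sum(part_dist_sq)

recomputed = {j: rho(parts) for j, (parts, _) in table_1.items()}
max_abs_error = max(abs(recomputed[j] - reported)
                    for j, (_, reported) in table_1.items())
least_dissimilar = min(recomputed, key=recomputed.get)
```

Each recomputed sum agrees with the reported value to the four printed decimals, and Occlusion-3 has the smallest total dissimilarity (8.5106).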
B. Experiment 2

Figure 4 shows the tracking results on a continuous sequence of a head's irregular movement and illustrates how to choose the parameter σ1 governing the spread of the object's motion and the parameter σ2 governing the rate of scale variation. In Fig. 4, the top row shows the tracking result with σ1 = 5: the rectangles in frames 4 and 10 do not overlap the head well. The second row shows that tracking succeeds when σ1 is increased to 8, where σ1 is in pixel units.

Fig. 4: A head is tracked under irregular movement. Top row, panels (a)-(d): frames 1, 3, 4 and 10. Second row, panels (e)-(h): frames 1, 3, 4 and 10.

C. Experiment 3

In this experiment, a pedestrian in a gray overcoat is tracked. The pedestrian's scale increases as he walks from far to near, and he also stops for a while or turns around during the walk. In Fig. 5, from frame 181 to 332, the pedestrian turns around and changes his walking direction; in frame 392, the pedestrian stops for a while to have a look; in frame 416, another pedestrian passes in front of him, which causes a partial occlusion. The results shown in Fig. 5 illustrate that our approach can successfully track the pedestrian.

Fig. 5: Tracking result of the pedestrian walking sequence. Panels (a)-(h): frames 10, 181, 332, 392, 416, 497, 647 and 684.

D. Other Experiments

Experiments on outdoor tracking of pedestrians and riders were also carried out. Although outdoor video sequences are easily degraded by noise or camera vibration, the tracking results show that our proposed approach is effective. Due to space limitations, we omit the results of these experiments.

VI. CONCLUSIONS

A Monte Carlo method is introduced into covariance-based object tracking. The proposed approach can track objects with partial occlusions and irregular movements in the above experiments. The minimum-cover idea is used to reduce the area over which the integral image is computed when the covariance matrices of the sampling regions are calculated. Compared with the particle filter, our approach does not need to consider particle degeneracy or to model the object's motion, while its performance is similar in the above experiments. The model update strategy gives a good prediction while the object is under occlusion.

REFERENCES

[1] A. Yilmaz, O. Javed and M. Shah, "Object tracking: A survey," ACM Computing Surveys, 2006, 38(4).
[2] O. Tuzel, F. Porikli and P. Meer, "Region Covariance: A Fast Descriptor for Detection and Classification," in Computer Vision - ECCV 2006, 2006, pp. 589-600.
[3] F. Porikli, O. Tuzel and P. Meer, "Covariance Tracking using Model Update Based on Means on Riemannian Manifolds," in Computer Vision and Pattern Recognition, 2006 IEEE Computer Society Conference on, 2006.
[4] Y. Wu, B. Wu, et al., "Probabilistic tracking on Riemannian manifolds," in Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, 2008, pp. 1-4.
[5] H. Palaio and J. Batista, "A region covariance embedded in a particle filter for multi-object tracking," in Workshop on Visual Surveillance, ECCV 2008, 2008.
[6] Haiqin Zhang and Houqiang Li, "Object Tracking with Monte Carlo Methods," Journal of Image and Graphics, vol. 13, no. 5, pp. 937-943, 2008.
[7] T. Shinzato, "Box Muller Method," http://www.sp.dis.titech.ac.jp/~shinzato/boxmuller.pdf
