
High-speed Tracking with Multi-kernel Correlation Filters

Ming Tang^{1,2,*}, Bin Yu^{1,2}, Fan Zhang^{3}, and Jinqiao Wang^{1,2}

^1 University of Chinese Academy of Sciences, Beijing, China
^2 National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China
^3 School of Info. & Comm. Eng., Beijing University of Posts and Telecommunications
arXiv:1806.06418v1 [cs.CV] 17 Jun 2018

Abstract

Correlation filter (CF) based trackers are currently ranked top in terms of their performances. Nevertheless, only some of them, such as KCF [26] and MKCF [47], are able to exploit the powerful discriminability of non-linear kernels. Although MKCF achieves more powerful discriminability than KCF through introducing multi-kernel learning (MKL) into KCF, its improvement over KCF is quite limited, and its computational burden increases significantly in comparison with KCF. In this paper, we introduce MKL into KCF in a different way than MKCF. We reformulate the MKL version of the CF objective function with its upper bound, significantly alleviating the negative mutual interference of different kernels. Our novel MKCF tracker, MKCFup, outperforms KCF and MKCF by large margins and can still work at very high fps. Extensive experiments on public data sets show that our method is superior to state-of-the-art algorithms for target objects of small move at very high speed.

Figure 1. Qualitative comparison of our novel multi-kernel correlation filter tracker, MKCFup, with state-of-the-art trackers, KCF [26], MKCF [47], SRDCF [12], and ECO HC [9] on challenging sequences, singer2 and freeman4 of OTB2013 [54] and ski_long and running_100_m_2 of NfS [17].

1. Introduction
Visual object tracking is one of the most challenging problems in computer vision [48, 28, 31, 41, 34, 38, 35, 37, 29, 58, 56, 23, 49, 6, 45]. To adapt to unpredictable variations of object appearance and background during tracking, a tracker could select a single strong feature that is robust to any variation. However, this strategy is known to be difficult [50, 20], especially for a model-free tracking task in which no prior knowledge about the target object is available except for the initial frame. Therefore, designing an effective and efficient scheme to combine several complementary features for tracking is a reasonable alternative [53, 55, 32, 16, 1, 52, 59, 57].

Since 2010, correlation filter based trackers (CF trackers) have been proposed continually and have almost dominated the tracking domain in recent years [4, 25, 16, 10, 26, 13, 15, 5, 8, 42, 14, 40, 36]. Bolme et al. [4] reignited the interest in correlation filters in the vision community by proposing a CF tracker, called minimum output sum of squared error (MOSSE), with classical signal processing techniques. MOSSE used a base image patch and several virtual ones to train the correlation filter directly in the Fourier domain, achieving top accuracy and fps at that time. Later, the expression of MOSSE in the spatial domain turned out to be ridge regression [44] with a linear kernel [25]. Therefore, in order to exploit the powerful discriminability of non-linear kernels, Henriques et al. [25, 26] utilized the circulant structure produced by a base sample to propose an efficient kernelized correlation filter based tracker (KCF). Danelljan et al. [16] extended the KCF with a historically weighted objective function and low-dimensional adaptive color channels. To adaptively employ complementary features in KCF, Tang and Feng [47] derived a multi-kernel learning (MKL) [43] based correlation filter (MKCF) which is able to take advantage of the invariance-discriminative power spectrums of various features [50] to improve the location performance. By introducing a mask on the samples into the loss item of the correlation filter formulation, Galoogahi et al. [19] proposed the correlation filter with limited boundaries (CFLB) to address the boundary effect [30]. And Danelljan et al. [12] introduced a smooth spatial regularization factor within the regularizer to restrain the boundary effect. In [9], Danelljan et al. employed dimensionality reduction, linear weighting of features, and sample clustering to further improve the SRDCF proposed in [12] in both location accuracy and fps.

Up till now, there are at least two principal lines along which to improve MOSSE and KCF. The first one is to weight the filter or samples with a mask in MOSSE or in the KCF with a linear kernel, alleviating the negative boundary effect greatly and improving the location performance remarkably. However, the trackers on this line, such as CFLB, SRDCF, C-COT [15], and ECO HC [9], are unable to employ powerful non-linear kernels. The other line is to improve the objective function of KCF, such as designing more complicated objective functions [2], or introducing MKL into KCF to adaptively exploit multiple (non-linear) kernels. Although MKCF, the MKL version of KCF, is more discriminative than KCF, its improvement over KCF is quite limited because the different kernels of MKCF may restrict each other in training and updating. And unfortunately, the computational cost of MKCF increases significantly in comparison to KCF. Specifically, MKCF's improvement over KCF on AUC is only about 2% to 3%, while its fps drops dramatically from an average of about 300 for KCF to 30. It is noticed that such an improvement from introducing MKL into KCF is similar to that of introducing MKL into a single-kernel binary classifier [50], where the improvement of the MKL version is about 2%.

In this paper, we introduce MKL into KCF in a different way than [47] to adaptively exploit multiple complementary features and non-linear kernels more effectively than in MKCF. We reformulate the MKL version of the CF objective function with its upper bound, significantly alleviating the negative mutual interference of complementary features while keeping very large fps. In fact, our novel MKCF tracker, i.e., MKCFup, outperforms KCF and the KCF with scaling on AUC by about 16% and 7%, respectively, at about 150 fps. A qualitative comparison shown in Fig. 1 indicates that our novel tracker, MKCFup, outperforms other state-of-the-art trackers in the challenging sequences singer2 and freeman4 of OTB2013 [54] and ski_long and running_100_m_2 of NfS [17].

The remainder of this paper is organized as follows. In Sec. 2, we briefly overview the related work. Sec. 3 first simplifies the solution of MKCF, then analyzes its shortcoming, and finally derives a novel multi-kernel correlation filter with the upper bound of the objective function. Sec. 4 provides some necessary implementation details. Experimental results and comparison with state-of-the-art approaches are presented in Sec. 5. Sec. 6 summarizes our work.

* The corresponding author (tangm@nlpr.ia.ac.cn). This work was supported by Natural Science Foundation of China under Grants 61375035 and 61772527. The code is available at http://www.nlpr.ia.ac.cn/mtang/Publications.htm.

2. Related Work

Multi-kernel learning (MKL) aims at simultaneously learning a kernel and the associated predictor in supervised learning settings. Rakotomamonjy et al. [43] proposed an efficient algorithm, named SimpleMKL, for solving the MKL problem through reduced gradient descent in a primal formulation. Varma and Ray [50] extended the MKL formulation in [43] by introducing an additional constraint on the combinational coefficients and applied it to object classification. Vedaldi et al. [51] and Gehler and Nowozin [20] applied MKL based approaches to object detection and classification. Cortes et al. [7] studied the problem of learning kernels of the same family with an L2 regularization for ridge regression (RR) [44]. Tang and Feng [47] extended the MKL formulation of [43] to RR, and presented a different multi-kernel RR approach. In this paper, differently from all the above approaches, we derive a novel multi-kernel correlation filter through optimizing the upper bound of the multi-kernel version of KCF's objective function.

In addition to the correlation filter based trackers aforementioned, generalizations of KCF to other applications have also been proposed [3, 18, 24] in recent years. And Henriques et al. [27] utilized the circulant structure of the Gram matrix to speed up the training of pose detectors in the Fourier domain. It is noted that all these approaches are unable to employ multiple kernels or non-linear kernels simultaneously. In this paper, we propose a novel multi-kernel correlation filter which is able to fully take advantage of the invariance-discriminative power spectrums of various features at really high speed.

3. Multi-kernel Correlation Filters with Upper Bound

In this section, we will first review the multi-kernel correlation filter (MKCF) [47] and simplify its optimization, then analyze its drawback, and finally derive a novel multi-kernel correlation filter with an upper bound. Readers may refer to [43, 21] for more details on multi-kernel learning.

3.1. Simplified Multi-kernel Correlation Filter

The goal of ridge regression [44] is to solve the Tikhonov regularization problem,

    \min_f \frac{1}{2} \sum_{i=0}^{l-1} (f(x_i) - y_i)^2 + \lambda_o \|f\|_k^2,    (1)
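To make Problem (1) concrete, the dual solution of kernel ridge regression is \alpha = (K + \lambda_o I)^{-1} y, where K is the Gram matrix of the samples. The following is a minimal NumPy sketch of that closed form (an illustration with a linear kernel, not the paper's MATLAB implementation):

```python
import numpy as np

def ridge_regression_fit(X, y, lam):
    """Dual solution of Problem (1): alpha = (K + lam * I)^{-1} y.

    A linear kernel K = X X^T is used purely for illustration; any positive
    semi-definite kernel matrix could replace it.
    """
    K = X @ X.T                                   # Gram matrix of the l samples
    alpha = np.linalg.solve(K + lam * np.eye(len(y)), y)
    return alpha, K

# toy usage with l = 8 samples of dimension 5
rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))
y = rng.normal(size=8)
alpha, K = ridge_regression_fit(X, y, lam=0.1)

# the learned function is f(x) = sum_i alpha_i * k(x_i, x); on the training
# samples themselves this is simply K @ alpha
f_train = K @ alpha
```

The correlation-filter case below specializes this generic solve to circulant Gram matrices, which is what allows the FFT shortcuts of Sec. 3.1.1.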
where l is the number of samples, f lies in a bounded convex subset of an RKHS defined by a positive definite kernel function k(\cdot,\cdot), the x_i's and y_i's are the samples and their regression targets, respectively, and \lambda_o \ge 0 is the regularization parameter.

As a special case of ridge regression, correlation filters generate their training set \{x_i \mid i = 0, \ldots, l-1\} by cyclically shifting a base sample x \in R^l, such that x_i = P_l^i x, where P_l is the l \times l permutation matrix [26], and the y_i's are often Gaussian labels.

By means of the Representer Theorem [46], the optimal solution f^* to Problem (1) can be expressed as f^*(x) = \sum_{i=0}^{l-1} \alpha_i k(x_i, x). Then \|f\|_k^2 = \alpha^\top K \alpha, where \alpha = (\alpha_0, \alpha_1, \ldots, \alpha_{l-1})^\top, and K is the positive semi-definite kernel matrix with \kappa_{ij} = k(x_i, x_j) as its elements, and Problem (1) becomes

    \min_{\alpha \in R^l} \frac{1}{2} \|y - K\alpha\|_2^2 + \frac{\lambda_o}{2} \alpha^\top K \alpha    (2)

for \alpha, where y = (y_0, y_1, \ldots, y_{l-1})^\top.

It has been shown that using multiple kernels instead of a single one can improve the discriminability [33, 50]. Given the base kernels k_m, where m = 1, 2, \ldots, M, a usual approach is to consider k(x_i, x_j) to be a convex combination of the base kernels, i.e., k(x_i, x_j) = d^\top \mathbf{k}(x_i, x_j), where \mathbf{k}(x_i, x_j) = (k_1(x_i, x_j), k_2(x_i, x_j), \ldots, k_M(x_i, x_j))^\top, d = (d_1, d_2, \ldots, d_M)^\top, \sum_{m=1}^{M} d_m = 1, and d_m \ge 0. Hence we have K = \sum_{m=1}^{M} d_m K_m, where K_m is the m-th base kernel matrix with \kappa_{ij}^m = k_m(x_i, x_j) as its elements. Substituting this K into (2), we obtain the following constrained optimization problem.

    \min_{\alpha, d} F(\alpha, d),
    s.t. \sum_{m=1}^{M} d_m = 1,  d_m \ge 0, m = 1, \ldots, M,    (3)

where

    F(\alpha, d) = \frac{1}{2} \Big\| y - \sum_{m=1}^{M} d_m K_m \alpha \Big\|_2^2 + \frac{\lambda_o}{2} \alpha^\top \sum_{m=1}^{M} d_m K_m \alpha.    (4)

The optimal solution to Problem (3) can be expressed as

    f^*(x) = \sum_{i=0}^{l-1} \alpha_i \, d^\top \mathbf{k}(x_i, x).    (5)

Given d in Problem (3), we get an unconstrained quadratic programming problem w.r.t. \alpha. And given \alpha, Problem (3) is a constrained quadratic programming problem w.r.t. d. Let \{K_m\} be positive semi-definite. Then it is clear that, given d, F(\alpha, d) is convex w.r.t. \alpha, and given \alpha, F(\alpha, d) is convex w.r.t. d.

To solve for \alpha, let \nabla_\alpha F(\alpha, d) = 0; it is achieved that

    \alpha = \Big( \sum_{m=1}^{M} d_m K_m + \lambda_o I \Big)^{-1} y,    (6)

where I is the l \times l identity matrix. And d can be determined with the quadprog function in Matlab's optimization toolbox. Initially, d_m = 1/M for all m. Then, because F(\alpha, d) \ge 0, alternately evaluating Eq. (6) with fixed d and invoking quadprog with fixed \alpha for d will achieve a local optimal solution (\alpha^*, d^*).

3.1.1 Fast Evaluation in Training

As stated in Sec. 3.1, the training samples of correlation filters are cyclic shifts. Therefore, the optimization processes of \alpha and d can be sped up by means of the fast Fourier transform (FFT) pair, F and F^{-1}.

At first, the evaluation of the first rows k_m of the kernel matrices K_m can be accelerated with FFT because the samples are circulant [25, 26]. Because the K_m's are circulant [25], and the inverses and sums of circulant matrices are circulant [22], the evaluation of Eq. (6) can be accelerated as

    \alpha = F^{-1} \left( \frac{ F(y) }{ F\big( \sum_{m=1}^{M} d_m k_m \big) + \lambda_o } \right).    (7)

According to Eq. (4), given \alpha, the optimization function F(d; \alpha) w.r.t. d can be expressed as

    F(d; \alpha) = \frac{1}{2} d^\top A_d d + \frac{1}{2} d^\top B_d + \frac{1}{2} y^\top y,    (8)

where

    A_d = \begin{bmatrix} \alpha^\top K_1^\top K_1 \alpha & \cdots & \alpha^\top K_1^\top K_M \alpha \\ \vdots & \ddots & \vdots \\ \alpha^\top K_M^\top K_1 \alpha & \cdots & \alpha^\top K_M^\top K_M \alpha \end{bmatrix},    (9)

and

    B_d = \big( b_d^\top K_1 \alpha, \ldots, b_d^\top K_M \alpha \big)^\top,    (10)

with b_d = \lambda_o \alpha - 2y. The evaluation of A_d and B_d can be accelerated by evaluating K_m \alpha with F^{-1}(F^*(k_m) \odot F(\alpha)), where m = 1, \ldots, M.

3.1.2 Fast Detection

According to Eq. (5), the MKCF evaluates the responses of all test samples z_n = P_l^n z, n = 0, 1, \ldots, l-1, in the current frame p + 1 as

    y^n(z) = \sum_{m=1}^{M} d_m \sum_{i=0}^{l-1} \alpha_i k_m(z_n, x_{m,i}^p),    (11)
where z is the base test sample, x_{m,i}^p = P_l^i x_m^p, and x_m^p is the weighted average of the m-th feature over the historical locations up to frame p. Formally,

    x_m^p = (1 - \eta_m) x_m^{p-1} + \eta_m R(D(\iota(p), s_p^*), \zeta, m),    (12)

where \eta_m \in [0, 1] is the learning rate of kernel m for the appearance of the training samples, \iota(p) and s_p^* are the optimal location and scale of the target object in frame p, respectively, \zeta is the pre-defined scale for the image sequence, D(\iota(p), s_p^*) is the image patch determined by \iota(p) and s_p^* in frame p, R(D, \zeta, m) denotes D re-sampled by \zeta for kernel m, and x_m^0 is the feature in the initial frame.

Because the k_m(\cdot,\cdot)'s are permutation-matrix-invariant, the response map y(z) of all virtual samples generated by z can be evaluated as

    y(z) \equiv (y^0(z), \ldots, y^{l-1}(z))^\top = \sum_{m=1}^{M} d_m C(k_m^p) \alpha,    (13)

where k_m^p = (k_{m,0}^p, \ldots, k_{m,l-1}^p), k_{m,i}^p = k_m(z, P_l^i x_m^p), and C(k_m^p) is the circulant matrix with k_m^p as its first row. Therefore, the response map can be accelerated as follows.

    y(z) = \sum_{m=1}^{M} d_m F^{-1}\big( F^*(k_m^p) \odot F(\alpha) \big).    (14)

The element of y(z) which takes the maximal value is accepted as the optimal location of the object in frame p + 1. And the target's optimal scale is determined with fDSST [14].

3.2. Shortcoming of Multi-kernel Correlation Filter

In order to achieve robust location performance, MKCF is updated with the weighted average of the historical samples. To improve the location performance further, we would like to train a common MKCF (i.e., a common \alpha and d) over the historical samples, just like what was done in [16]. Then, the optimization function should be as follows.

    \tilde{F}(\alpha, d) = \sum_{j=1}^{p} \beta^j \left[ \frac{1}{2} \Big\| y - \sum_{m=1}^{M} d_m K_m^j \alpha \Big\|_2^2 + \frac{\lambda_o}{2} \alpha^\top \sum_{m=1}^{M} d_m K_m^j \alpha \right]
                        = \frac{1}{2} \sum_{j=1}^{p} \beta^j \left( y^\top y - 2 \sum_{m=1}^{M} d_m y^\top K_m^j \alpha + \lambda_o \sum_{m=1}^{M} d_m \alpha^\top K_m^j \alpha \right)
                          + \frac{1}{2} \sum_{j=1}^{p} \beta^j \alpha^\top \Big( \sum_{m=1}^{M} d_m K_m^j \Big) \Big( \sum_{m=1}^{M} d_m K_m^j \Big) \alpha,

where \beta^j is the weight of the optimization function of the sample in frame j, K_m^j is the circulant kernel matrix with k_m^j as its first row, k_m^j = (k_{m,0}^j, \ldots, k_{m,l-1}^j), k_{m,i}^j = k_m(z, P_l^i x_m^j), j = 1, \ldots, p, and x_m^j is evaluated by using Eq. (12) with j in place of p.

Commonly, different kernels (i.e., features) should be equipped with different weights \beta^j, as their robustness differs throughout an image sequence. For example, the colors of the target object may vary more frequently than its HOG in an image sequence. Nevertheless, it is impossible to set different \beta^j for different kernels in \tilde{F}(\alpha, d), because the different kernels are multiplied by each other and cannot be separated into different items. Therefore, it is to be expected that the location performance will be affected negatively if \tilde{F}(\alpha, d), instead of F(\alpha, d), is used in Problem (3), because different kernels have to share the same weight \beta^j.

3.3. Extension of Multi-kernel Correlation Filter with Upper Bound

Let y_c = y / M. We have

    F(\alpha, d) = \frac{1}{2} \Big\| y - \sum_{m=1}^{M} d_m K_m \alpha \Big\|_2^2 + \frac{\lambda_o}{2} \alpha^\top \sum_{m=1}^{M} d_m K_m \alpha
                \le \frac{\mu}{2} \sum_{m=1}^{M} \Big( \|y_c - d_m K_m \alpha\|_2^2 + \lambda d_m \alpha^\top K_m \alpha \Big) \equiv U_{F(\alpha,d)},

where \mu = 2M + 1, \lambda = \lambda_o / \mu, and the upper bound is reached when d_{m_1} K_{m_1} \alpha = d_{m_2} K_{m_2} \alpha for all m_1, m_2 = 1, \ldots, M. The proof can be found in the supplementary material. We then treat U_{F(\alpha,d)}, the upper bound of F(\alpha, d), as the optimization function of MKCF and introduce the historical samples into it. Consequently, the final optimization objective for training a common multi-kernel correlation filter over the whole set of historical samples can be expressed as follows.

    F_p(\alpha_p, d_p) \equiv \frac{1}{2} \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j u_{F(\alpha,d)}^{j,m},

where

    u_{F(\alpha,d)}^{j,m} = \|y_c - d_{m,p} K_m^j \alpha_p\|_2^2 + \lambda d_{m,p} \alpha_p^\top K_m^j \alpha_p,

    \beta_m^1 = (1 - \gamma_m)^{p-1},  \beta_m^j = \gamma_m (1 - \gamma_m)^{p-j}, j = 2, \ldots, p,

p is the number of historical frames, \gamma_m \in (0, 1) is the learning rate of kernel m for the common MKCF, K_m^j is the Gram matrix of the m-th kernel for the samples in frame j, \alpha_p = (\alpha_{0,p}, \alpha_{1,p}, \ldots, \alpha_{l-1,p})^\top and d_p = (d_{1,p}, d_{2,p}, \ldots, d_{M,p})^\top are the dual vector and the weight vector of all kernels when frame p is processed, respectively, and \sum_{m=1}^{M} d_{m,p} = 1. And the new optimization problem for the MKCF with the whole set of samples is

    \min_{\alpha_p, d_p} F_p(\alpha_p, d_p),
    s.t. \sum_{m=1}^{M} d_{m,p} = 1,  d_{m,p} \ge 0, m = 1, \ldots, M.    (15)
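Before moving on, the multi-kernel response map of Eqs. (13)-(14) can be illustrated with a short NumPy sketch (again an illustration, not the paper's code): each kernel contributes a circular cross-correlation of its kernel vector with \alpha, computed via FFT, and the peak of the weighted sum gives the predicted shift.

```python
import numpy as np

def response_map(kps, d, alpha):
    """Eq. (14)-style response: y(z) = sum_m d_m * F^{-1}( conj(F(k_m^p)) * F(alpha) )."""
    A = np.fft.fft(alpha)
    y = np.zeros(len(alpha))
    for dm, km in zip(d, kps):
        # conj(F(k)) * F(alpha) is the Fourier form of circular cross-correlation
        y += dm * np.fft.ifft(np.conj(np.fft.fft(km)) * A).real
    return y

rng = np.random.default_rng(2)
l = 32
kps = [rng.normal(size=l) for _ in range(2)]   # kernel correlation vectors, M = 2
d = np.array([0.5, 0.5])
alpha = rng.normal(size=l)

y = response_map(kps, d, alpha)
peak = int(np.argmax(y))   # predicted target shift among the l cyclic shifts
```

With M = 2 (e.g. one color and one HOG kernel, as in Sec. 4), a frame's detection step is just two FFTs, two element-wise products, two inverse FFTs, and an argmax, which is why the detection stage stays cheap even with multiple kernels.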
This is a constrained optimization problem. And similar to Problem (3), given d_p, F_p(\alpha_p, d_p) is convex and unconstrained w.r.t. \alpha_p, and given \alpha_p, F_p(\alpha_p, d_p) is convex and constrained w.r.t. d_p.

Because F_p(\alpha_p, d_p) is unconstrained w.r.t. \alpha_p, to solve for \alpha_p, let \nabla_{\alpha_p} F_p(\alpha_p, d_p) = 0; we achieve that

    \alpha_p = \left( \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j \big( (d_{m,p} K_m^j)^2 + \lambda d_{m,p} K_m^j \big) \right)^{-1} \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} K_m^j y_c,    (16)

which can be evaluated efficiently with FFT as follows.

    A_p \equiv F(\alpha_p) = \frac{ \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j F(d_{m,p} k_m^j) \odot F(y_c) }{ \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j F(d_{m,p} k_m^j) \odot \big( F(d_{m,p} k_m^j) + \lambda \big) }.

Set

    A_p = \frac{A_p^N}{A_p^D} = \frac{ \sum_{m=1}^{M} A_{m,p}^N }{ \sum_{m=1}^{M} A_{m,p}^D },    (17)

where, for p > 1,

    A_{m,p}^N = (1 - \gamma_m) A_{m,p-1}^N + \gamma_m F(d_{m,p} k_m^p) \odot F(y_c),
    A_{m,p}^D = (1 - \gamma_m) A_{m,p-1}^D + \gamma_m F(d_{m,p} k_m^p) \odot \big( F(d_{m,p} k_m^p) + \lambda \big).

In the initial frame, p = 1, and

    A_{m,1}^N = F(d_{m,1} k_m^1) \odot F(y_c),
    A_{m,1}^D = F(d_{m,1} k_m^1) \odot \big( F(d_{m,1} k_m^1) + \lambda \big).

Therefore, A_p can be evaluated efficiently frame by frame.

Solving for d_p in Problem (15) would require dealing with a constrained optimization problem. This means that it is difficult to obtain an iteration scheme for the optimal d_p^* which is as efficient as the one for \alpha_p^*. Now let us investigate the constraints in Problem (15). It is clear that there are three purposes for adding these constraints in Problem (15). (1) d_{m,p} \ge 0, m = 1, \ldots, M, are necessary to ensure that \sum_{m=1}^{M} d_{m,p} K_m is a convex combination. (2) \sum_{m=1}^{M} d_{m,p} = 1 is necessary to ensure that the optimal d_p^* is unique and its value is finite. (3) Both d_{m,p} \ge 0 and \sum_{m=1}^{M} d_{m,p} = 1 are necessary to ensure that there exists at least one m such that d_{m,p} > 0. Therefore, if we are able to design an algorithm to optimize the unconstrained problem

    \min_{\alpha_p, d_p} F_p(\alpha_p, d_p)    (18)

w.r.t. d_p such that the above three requirements are satisfied implicitly, then the explicit constraints in Problem (15) can be canceled. In the rest of this section, we will first derive an efficient algorithm to optimize Problem (18) w.r.t. d_p, and then prove that the optimal d_p^* indeed implicitly satisfies the above requirements for the optimal solution if d_{m,1} > 0, m = 1, \ldots, M.

To solve for d_p in Problem (18), let \nabla_{d_p} F_p(\alpha_p, d_p) = 0. Then it is achieved that

    d_{m,p} = \frac{ \sum_{j=1}^{p} \beta_m^j (K_m^j \alpha_p)^\top (2 y_c - \lambda \alpha_p) }{ 2 \sum_{j=1}^{p} \beta_m^j (K_m^j \alpha_p)^\top (K_m^j \alpha_p) },

where m = 1, \ldots, M. Set

    d_{m,p} = \frac{d_{m,p}^N}{d_{m,p}^D},    (19)

where, for p > 1,

    d_{m,p}^N = (1 - \gamma_m) d_{m,p-1}^N + \gamma_m (K_m^p \alpha_p)^\top (2 y_c - \lambda \alpha_p),
    d_{m,p}^D = (1 - \gamma_m) d_{m,p-1}^D + 2 \gamma_m (K_m^p \alpha_p)^\top (K_m^p \alpha_p).

And if p = 1, then

    d_{m,1}^N = (K_m^1 \alpha_1)^\top (2 y_c - \lambda \alpha_1),
    d_{m,1}^D = 2 (K_m^1 \alpha_1)^\top (K_m^1 \alpha_1).

It is clear that K_m^p \alpha_p can be accelerated with F^{-1}( F^*(k_m^p) \odot F(\alpha_p) ) = F^{-1}( F^*(k_m^p) \odot A_p ). Therefore, d_{m,p} can be evaluated efficiently, and the optimal solution d_p^* can be obtained efficiently frame by frame.

Theorem 1. Suppose that K_m^j is a circulant Gram matrix, \lambda > 0, all components of y_c are positive, and also suppose d_{m,p}^t > 0, m = 1, \ldots, M, j = 1, \ldots, p, t = 1, 2, \ldots, where d_{m,p}^t is the t-th iterate on frame p when solving Problem (18) with alternate evaluation of \alpha_p and d_p. Then,

(1) d_{m,p}^{t+1} > 0;

(2) c_l \lambda / 2 + c_l b^{min} < d_{m,p}^{t+1} < c_u \lambda / 2 + c_u b^{max}, where c_l and c_u are two constants determined by y_c, the discrete Fourier transform matrix, \beta_m^j, and the eigenvalues of K_m^j, and b^{min} and b^{max} are two constants related to d_{m,p}^t, \beta_m^j, and the eigenvalues of K_m^j.

The proof can be found in the supplementary material. It can be seen from Theorem 1 that the range of d_{m,p}^{t+1} is totally determined by two lines w.r.t. \lambda when d_{m,p}^1 is fixed. The smaller \lambda is, the smaller d_{m,p}^{t+1}, and therefore the smaller the components of the final optimal solution d_p^*. That is, the components of d_p^* are always finite and controlled by \lambda. It is obvious that d_p^* satisfies the three requirements for the optimal solution of Problem (18) w.r.t. d_p, given the initial d_{m,p}^1 > 0, m = 1, \ldots, M.

A more refined analysis of the relationship between \lambda and the optimal d_p^* is complex, because the bounds of d_p^* heavily depend on the eigenvalues of all kernel matrices, which are constructed from practical samples and an additional scale parameter in the kernel. Therefore, we will experimentally show the further numerical relation between \lambda and d_p^* in Sec. 5.1.

Based on the above analysis, it is concluded that the optimization objective of the extension of MKCF is Problem (18), and its optimization process is as follows. Initially, d_{m,1} = 1/M, m = 1, \ldots, M. Then alternately evaluate Eq. (17) with fixed d_p and Eq. (19) with fixed \alpha_p. Because F_p(\alpha_p, d_p) \ge 0 is convex w.r.t. \alpha_p and d_p, respectively, such iterations converge to a local optimal solution (\alpha_p^*, d_p^*). In our experiments, a satisfactory convergence (\alpha_p^*, d_p^*) on frame p can be achieved within three iterations of Eq. (17) and Eq. (19).

The fast determination of the optimal location and scale of the target object in frame p + 1 is the same as that of MKCF described in Sec. 3.1.2, with \alpha = \alpha_p^* and d = d_p^*.

4. Implementation Details

In our experiments, color and HOG are used as the features in MKCFup. Considering the tradeoff between discriminability and computational cost, we employ one kernel for each of color and HOG, i.e., M = 2. As in [16, 26, 11, 47], the multiple channels of the color and HOG features are each concatenated into a single vector. The response map y is identical to that in KCF [25].

The color scheme proposed by [16] is adopted as our color feature, except that we reduce the dimensionality of color to four with principal component analysis (PCA). The usual nine gradient orientations and 4 x 4 cell size are utilized in the HOGs. The dimensionality of our HOGs is also reduced to four with PCA to speed up MKCFup. The Gaussian kernel is used for both features, with \sigma_{color} = 0.515 and \sigma_{HOG} = 0.6 for color sequences, and \sigma_{color} = 0.3 and \sigma_{HOG} = 0.4 for gray sequences. Employing the Gaussian kernel to construct the kernel matrices ensures that all K_m's are positive definite [39]. The learning rates are \gamma_{color} = 0.0174 and \gamma_{HOG} = 0.0173 for color sequences, and \gamma_{color} = 0.0175 and \gamma_{HOG} = 0.018 for gray sequences. The learning rates of the sample appearance are \eta_{color} = \gamma_{color} and \eta_{HOG} = \gamma_{HOG} for both color and gray sequences.

In order to reduce the high-frequency noise in the frequency domain stemming from the large discontinuity between opposite edges of a cyclically extended image patch, the feature patches are windowed with a Hann window. Because there is only one true sample in each frame, it is well known that too large a search region in KCF will reduce the location performance [25, 16]. Therefore, the search region is set 2.5 times larger than the bounding box of the target object, which is the same as that in KCF and CN2 [16].

5. Experimental Results

MKCFup was implemented in MATLAB. The experiments were performed on a PC with an Intel Core i7 3.40GHz CPU and 8GB RAM.

It is well known that all samples of MOSSE, KCF, MKCF, and MKCFup are circulant. Therefore, their search regions cannot be set too large [12]. Too large a search region will include too much background, significantly reducing the discriminability of the filters for the target object against the background. Consequently, the search regions of the above CF trackers have to be set empirically to around 2.5 times the object bounding boxes [26, 47], much smaller than those of CFLB, SRDCF, and ECO HC [19, 12, 9]. It is obvious that it will be impossible for any tracker to catch the target object once the target moves out of its search region in the next frame. Therefore, CFLB, SRDCF, and ECO HC are better suited for locating target objects of large move than KCF, MKCF, and MKCFup.

An even worse situation for KCF, MKCF, and MKCFup is that, according to the experimental experience with correlation filter based trackers [4, 25, 10, 47], even if the target is in the search region in the next frame, its location may still be unreliable when the target moves near the boundaries of the search region. Specifically, it is often difficult for the CF trackers which use only one base sample, such as MOSSE, CN2 [16], KCF, MKCF, and MKCFup, to obtain a reliable location from the response maps if the ratio of the center distance of the target object over the bounding box in two frames is larger than 0.6 when background clutter is present. Consequently, the above CF trackers are suitable for tracking a target object with quite small move between two frames. In this paper, the move of the target object is defined as small if the offset ratio

    \tau \equiv \frac{ \|c(x_t) - c(x_{t+\delta})\|_2 }{ \sqrt{ w(x_t) \cdot h(x_t) } } < 0.6,    (20)

where c(\cdot), w(\cdot), and h(\cdot) are the center, width, and height of the sample, respectively. \delta = 1 if there is no occlusion of the target object; otherwise \delta is the number of frames from the start to the end of the occlusion. A sequence is accepted as containing a target object of large move if there exist two adjacent frames, or an occlusion of the target object, such that \tau > 0.6. It is noted that the above definition of the offset ratio for small move is quite rough, because it neglects the possibly big difference between width and height.

According to the above discussion, two visual tracking benchmarks, OTB2013 [54] and NfS [17], were utilized to compare different trackers in this paper, because most sequences of OTB2013 and most of the high-frequency part of NfS only contain small move of the target object.
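The offset-ratio test of Eq. (20) is straightforward to compute per frame pair; the following is a minimal sketch (the helper name and the toy numbers are illustrative, not from the paper's code):

```python
import numpy as np

def offset_ratio(c_t, c_next, w, h):
    """Eq. (20): tau = ||c(x_t) - c(x_{t+delta})||_2 / sqrt(w(x_t) * h(x_t))."""
    c_t = np.asarray(c_t, dtype=float)
    c_next = np.asarray(c_next, dtype=float)
    return float(np.linalg.norm(c_t - c_next) / np.sqrt(w * h))

# a move is accepted as "small" when tau < 0.6
tau = offset_ratio((100.0, 80.0), (112.0, 89.0), w=50.0, h=30.0)
small_move = tau < 0.6   # here tau = 15 / sqrt(1500) ~ 0.387, so the move is small
```

Sweeping this test over consecutive (or occlusion-bracketing) frame pairs of a sequence is all that is needed to sort sequences into the small-move and large-move groups used in Sec. 5.3.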
In our experiments, the trackers are evaluated in one-pass evaluation (OPE) using both precision and success plots [54], calculated as the percentages of frames with center errors lower than a threshold and with intersection-over-union (IoU) overlaps exceeding a threshold, respectively. Trackers are ranked using the precision score with center error lower than 20 pixels and the area-under-the-curve (AUC), respectively, in the precision and success plots.

In this paper, to simplify the experiments, we only compare against those state-of-the-art trackers which merely employ the hand-crafted features color or HOG.

5.1. Relationship of optimal weight d_p^* and regularization parameter \lambda

Fig. 2 shows the numerical relation of \lambda and d_p^* obtained on OTB2013 when initially d_p^1 = (0.5, 0.5). In the experiment, \lambda \in \{10^{-3}, 10^{-2}, 10^{-1}, 1, 10, 10^2, 10^3, 10^4\}. According to Theorem 1, we write d_{m,p}^* = d_{m,p}^*(\lambda), because d_{m,p}^* is a function of \lambda, and

    \bar{d}(\lambda) = \frac{1}{M P S} \sum_{m} \sum_{p} \sum_{i} d_{m,p,i}^*(\lambda),

    \big( \delta_{max}(\lambda), \delta_{min}(\lambda) \big) = \Big( \max_{m,p,i} d_{m,p,i}^*(\lambda), \min_{m,p,i} d_{m,p,i}^*(\lambda) \Big),

where P and S are the number of selected frames in each image sequence and the total number of selected sequences, respectively, p and i index the selected frame and the selected sequence, respectively, and d_{m,p,i}^* is the optimal weight of the m-th kernel at frame p of sequence i. In our experiment, specifically, P = 10 and S = 20. That is, for each \lambda, ten frames are randomly sampled from each of 20 randomly selected sequences of OTB2013, and the d_{m,p}^*(\lambda)'s on these frames are used to calculate \bar{d}(\lambda) and the two deviations, \delta_{max}(\lambda) and \delta_{min}(\lambda). To demonstrate the relationship more clearly, \lambda and its three functions are shown on logarithmic scales.

Figure 2. The numerical relationship between the regularization parameter \lambda and \bar{d}, the average of d_{m,p}^* over m, on OTB2013 [54]. Besides \bar{d}, the two deviations away from \bar{d} are also presented. The logarithmic function is employed to make the relation clearer. See Sec. 5.1 for details.

It is interesting to notice that the relation between \lambda and the average of the optimal d_p^* is almost linear when \lambda < 10^{-1} or \lambda \ge 1. And \delta_{max}(\lambda) and \delta_{min}(\lambda) drop significantly when \lambda < 10^{-1}. When \lambda \le 0.05, the deviations are really close to the average, and the relation of \lambda and d_p^* itself is approximately linear. Surprisingly, (1/M) \sum_{m=1}^{M} d_{m,p}^* \approx 0.5 for the frames of all sequences when \lambda < 10^{-1} in our experiment. That is, \sum_{m=1}^{M} d_{m,p}^* \approx 1, because M = 2. This means that the constraint of Problem (15) on the sum of all components of the optimal d_p is satisfied implicitly and approximately while optimizing Problem (18) w.r.t. d_p with the iterations of Eq. (17) and Eq. (19).

5.2. Comparison among MKCFs

In this section, we consider KCF as a special case of the original MKCF [47] with M = 1. To verify that our improvement on KCF and MKCF is effective, we compare KCF, KCFscale, MKCF, fMKCF, and MKCFup on OTB2013, where KCFscale is the KCF with the scaling scheme of a patch pyramid, and fMKCF is a variant of MKCF whose features and scaling scheme are the same as those adopted by MKCFup and which adopts the optimization of d that is more efficient than the one in [47], as described in Sec. 3.1.1. Fig. 3 reports the results. It is concluded from the figure that MKCFup outperforms KCF and KCFscale with large margins in both center precision and IoU, and that the novel objective function and training scheme of MKCFup improve the location performance to an average precision score of 83.5% and an average AUC score of 64.1%, significantly outperforming MKCF and fMKCF by 6.8% and 7.7% in precision and by 4.9% and 6.1% in AUC, respectively. It is noticed that the location performance of fMKCF is inferior to that of MKCF, although its fps is higher than MKCF's (50 vs. 30).

Figure 3. The precision and success plots of KCF [26], KCFscale, MKCF [47], fMKCF, and MKCFup on OTB2013 [54]. See Sec. 5.2 for details. The average precision scores and AUCs of the trackers on the sequences are reported in the legends. Precision: MKCFup [0.835], KCFscale [0.782], MKCF [0.767], fMKCF [0.758], KCF [0.742]. Success (AUC): MKCFup [0.641], MKCF [0.592], KCFscale [0.585], fMKCF [0.580], KCF [0.521].

5.3. Comparison to State-of-the-art Trackers with Handcrafted Features

We compare our MKCFup to 6 other trackers, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC, on OTB2013 and NfS. Fig. 4 shows the results. It can be seen
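The OPE metrics used above can be sketched in a few lines (illustrative Python; the exact threshold grid of the success plot, here 101 uniformly spaced IoU thresholds, and the strict/non-strict comparisons are common conventions assumed by this sketch rather than details taken from [54]):

```python
import numpy as np

def precision_score(center_errors, thresh=20.0):
    """Fraction of frames whose center location error is within `thresh` pixels."""
    return float(np.mean(np.asarray(center_errors, dtype=float) <= thresh))

def success_plot(ious, thresholds=None):
    """Success rate at each IoU threshold; its mean is the AUC ranking score."""
    if thresholds is None:
        thresholds = np.linspace(0.0, 1.0, 101)
    ious = np.asarray(ious, dtype=float)
    success = np.array([float(np.mean(ious > t)) for t in thresholds])
    return success, float(success.mean())

# toy per-frame results of one sequence
errors = [3.0, 8.0, 15.0, 42.0]      # center errors in pixels
ious = [0.9, 0.7, 0.5, 0.1]          # IoU overlaps
p20 = precision_score(errors)        # 3 of the 4 frames are within 20 px
success, auc = success_plot(ious)
```

Averaging `p20` and `auc` over all sequences of a benchmark yields the legend numbers reported in the figures of this section.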
[Figure 4 shows four plots. Legend scores — OTB2013 precision: ECO-HC 0.862, SRDCF 0.838, MKCFup 0.835, KCFscale 0.782, MKCF 0.767, KCF 0.742, fDSST 0.741. OTB2013 success: ECO-HC 0.656, MKCFup 0.641, SRDCF 0.638, MKCF 0.592, KCFscale 0.585, fDSST 0.564, KCF 0.521. NfS precision: ECO-HC 0.560, MKCFup 0.532, SRDCF 0.487, KCFscale 0.454, fDSST 0.450, MKCF 0.437, KCF 0.390. NfS success: ECO-HC 0.459, MKCFup 0.455, SRDCF 0.414, fDSST 0.382, MKCF 0.378, KCFscale 0.377, KCF 0.290.]

Figure 4. The precision and success plots of MKCFup, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC on OTB2013 [54] and NfS [17]. The average precision scores and AUCs of the trackers on the sequences are reported in the legends.
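For orientation, the precision scores and AUCs in the legends follow the standard OTB evaluation protocol [54]: precision is the fraction of frames whose center location error falls within a pixel threshold (reported at 20 px by convention), and the AUC averages the success rate over IoU thresholds. A minimal sketch of the two metrics (the helper names and toy data are ours, not the benchmark toolkit's):

```python
import numpy as np

def precision_score(pred_centers, gt_centers, threshold=20.0):
    """Fraction of frames whose center location error is within `threshold` pixels."""
    err = np.linalg.norm(pred_centers - gt_centers, axis=1)
    return np.mean(err <= threshold)

def iou(a, b):
    """Per-frame IoU of boxes given as (x, y, w, h) rows."""
    x1 = np.maximum(a[:, 0], b[:, 0])
    y1 = np.maximum(a[:, 1], b[:, 1])
    x2 = np.minimum(a[:, 0] + a[:, 2], b[:, 0] + b[:, 2])
    y2 = np.minimum(a[:, 1] + a[:, 3], b[:, 1] + b[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = a[:, 2] * a[:, 3] + b[:, 2] * b[:, 3] - inter
    return inter / union

def success_auc(pred_boxes, gt_boxes, thresholds=np.linspace(0, 1, 21)):
    """Area under the success curve: mean success rate over IoU thresholds."""
    overlaps = iou(pred_boxes, gt_boxes)
    return np.mean([np.mean(overlaps > t) for t in thresholds])
```

With a perfect track the precision score is 1.0, and the success AUC approaches 1 (the success rate drops to 0 only at the overlap threshold 1.0).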

that MKCFup outperforms all other trackers in both precision scores and AUCs, except for ECO HC, on the two benchmarks. ECO HC is able to exploit larger search regions than MKCFup does to catch target objects of large move, whereas MKCFup is not. Therefore, ECO HC outperforms MKCFup on the whole benchmarks.

5.4. Comparison on Sequences of Small Move

By means of Eq. (20), it is found that there exist six sequences which contain large move in OTB2013.1 We then removed them from the benchmark and compared our MKCFup, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC on the rest of the sequences. Fig. 5 reports the results. It is seen that MKCFup outperforms SRDCF and ECO HC on the average precision score and AUC by 3.1% and 2.7%, and by 1.6% and 1.5%, respectively, on the small move sequences of OTB2013.

[Figure 5 shows two plots. Legend scores — OTB2013 small-move precision: MKCFup 0.885, ECO-HC 0.869, SRDCF 0.854, KCFscale 0.821, MKCF 0.802, fDSST 0.801, KCF 0.789. Success: MKCFup 0.680, ECO-HC 0.665, SRDCF 0.653, MKCF 0.620, KCFscale 0.612, fDSST 0.602, KCF 0.549.]

Figure 5. The precision and success plots of MKCFup, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC on small move sequences of OTB2013 [54]. The average precision scores and AUCs of the trackers on the sequences are reported in the legends.

To verify the advantage of MKCFup further, we removed the large move sequences2 from NfS by means of Eq. (20) and compared the above trackers on the rest 84 sequences. Note that an occluded target object is considered undergoing large move if its τ > 0.6 between the two frames of starting and ending occlusion. Fig. 6 shows the results. It is seen that MKCFup outperforms SRDCF and ECO HC on the average precision score and AUC by 7.6% and 6.6%, and by 0.5% and 2.1%, respectively, on the small move sequences of NfS.

[Figure 6 shows two plots. Legend scores — NfS small-move precision: MKCFup 0.575, ECO-HC 0.570, SRDCF 0.499, fDSST 0.479, MKCF 0.474, KCFscale 0.467, KCF 0.414. Success: MKCFup 0.499, ECO-HC 0.478, SRDCF 0.433, MKCF 0.416, fDSST 0.415, KCFscale 0.397, KCF 0.310.]

Figure 6. The precision and success plots of MKCFup, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC on small move sequences of NfS [17]. The average precision scores and AUCs of the trackers on the sequences are reported in the legends.

Table 1 lists the number of frames the trackers can process per second.

Table 1. The number of frames processed per second (fps) by different trackers.

Tracker  KCF  MKCF  fMKCF  fDSST  SRDCF  ECO-HC  MKCFup
fps      297  30    50     80     6      39      150

According to the above experiments, it can be concluded that MKCFup outperforms state-of-the-art trackers, such as SRDCF and ECO HC, at a much higher fps as long as the move of the target object is small.

6. Conclusions and Future Work

A novel tracker, MKCFup, has been presented in this paper. By optimizing the upper bound of the objective function of the original MKCF and introducing historical samples into the upper bound, we derived the novel MKCFup. It has been demonstrated that the discriminability of MKCFup is more powerful than those of state-of-the-art trackers, such as SRDCF and ECO-HC, although its search region is much smaller than theirs. Moreover, MKCFup's fps is much higher than those of state-of-the-art trackers. In conclusion, MKCFup outperforms state-of-the-art trackers with handcrafted features at high speed when the movement of the target object is small.

1 The 6 sequences which contain the target object with large move in OTB2013 are boy, matrix, tiger2, ironman, couple, and jumping.
2 The 16 sequences which contain the target object with large move in NfS are airboard 1, airtable 3, bee, bowling3, football skill, parkour, pingpong8, basketball 1, basketball 3, basketball 6, bowling2, dog 2, pingpong2, motorcross, person scooter, and soccer player 3.
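Eq. (20), which defines the small/large move criterion, is given in the main text and is not reproduced in this excerpt. Purely as an illustration of the kind of motion statistic involved, one might compute a normalized inter-frame displacement and apply the τ = 0.6 threshold mentioned above; the displacement definition below (center shift divided by target diagonal) is our assumption, not the paper's exact Eq. (20):

```python
import numpy as np

def max_normalized_move(boxes):
    """boxes: (T, 4) array of (x, y, w, h) per frame.
    Returns the largest inter-frame center displacement, normalized by the
    target diagonal -- an assumed stand-in for the paper's tau statistic."""
    centers = boxes[:, :2] + boxes[:, 2:] / 2.0
    step = np.linalg.norm(np.diff(centers, axis=0), axis=1)
    diag = np.linalg.norm(boxes[:-1, 2:], axis=1)
    return float(np.max(step / diag))

def is_large_move(boxes, tau=0.6):
    """Flag a sequence as 'large move' when the motion statistic exceeds tau."""
    return max_normalized_move(boxes) > tau
```

Under such a rule, a sequence whose target center jumps by more than 0.6 of its diagonal between consecutive frames would be excluded from the small-move subset.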
References

[1] L. Bertinetto, J. Valmadre, S. Golodetz, O. Miksik, and P. H. S. Torr. Staple: Complementary learners for real-time tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[2] A. Bibi, M. Mueller, and B. Ghanem. Target response adaptation for correlation filter tracking. In Proc. European Conference on Computer Vision, 2016.
[3] V. Boddeti, T. Kanade, and B. Kumar. Correlation filters for object alignment. In Proc. Computer Vision and Pattern Recognition, 2013.
[4] D. Bolme, R. Beveridge, B. Draper, and Y. Lui. Visual object tracking using adaptive correlation filters. In Proc. Computer Vision and Pattern Recognition, 2010.
[5] J.-W. Choi, H. Chang, J. Jeong, Y. Demiris, and J.-Y. Choi. Visual tracking using attention-modulated disintegration and integration. In Proc. Computer Vision and Pattern Recognition, 2016.
[6] J.-W. Choi, H. Chang, S. Yun, T. Fischer, Y. Demiris, and J.-Y. Choi. Attentional correlation filter network for adaptive visual tracking. In Proc. Computer Vision and Pattern Recognition, 2017.
[7] C. Cortes, M. Mohri, and A. Rostamizadeh. L2 regularization for learning kernels. In Proc. Uncertainty in Artificial Intelligence, 2009.
[8] Z. Cui, S. Xiao, J. Feng, and S. Yan. Recurrently target-attending tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[9] M. Danelljan, G. Bhat, F. Shahbaz Khan, and M. Felsberg. ECO: Efficient convolution operators for tracking. In Proc. Computer Vision and Pattern Recognition, 2017.
[10] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Accurate scale estimation for robust visual tracking. In Proc. British Machine Vision Conference (BMVC), 2014.
[11] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Convolutional features for correlation filter based visual tracking. In Proc. International Conference on Computer Vision Workshop: VOT, 2015.
[12] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Learning spatially regularized correlation filters for visual tracking. In Proc. International Conference on Computer Vision, 2015.
[13] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Adaptive decontamination of the training set: A unified formulation for discriminative visual tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[14] M. Danelljan, G. Hager, F. Shahbaz Khan, and M. Felsberg. Discriminative scale space tracking. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2017.
[15] M. Danelljan, A. Robinson, F. Shahbaz Khan, and M. Felsberg. Learning continuous convolution operators for visual tracking. In Proc. European Conference on Computer Vision, 2016.
[16] M. Danelljan, F. Shahbaz Khan, M. Felsberg, and J. van de Weijer. Adaptive color attributes for real-time visual tracking. In Proc. Computer Vision and Pattern Recognition, 2014.
[17] H. Galoogahi, A. Fagg, C. Huang, D. Ramanan, and S. Lucey. Need for speed: A benchmark for higher frame rate object tracking. In Proc. International Conference on Computer Vision, 2017.
[18] H. Galoogahi, T. Sim, and S. Lucey. Multi-channel correlation filters. In Proc. International Conference on Computer Vision, 2013.
[19] H. Galoogahi, T. Sim, and S. Lucey. Correlation filters with limited boundaries. In Proc. Computer Vision and Pattern Recognition, 2015.
[20] P. Gehler and S. Nowozin. On feature combination for multi-class object classification. In Proc. International Conference on Computer Vision, 2009.
[21] M. Gönen and E. Alpaydın. Multiple kernel learning algorithms. Journal of Machine Learning Research, 12:2211–2268, 2011.
[22] R. Gray. Toeplitz and Circulant Matrices: A Review. Now Publishers Inc., 2006.
[23] B. Han, J. Sim, and H. Adam. Branchout: Regularization for online ensemble tracking with convolutional neural networks. In Proc. Computer Vision and Pattern Recognition, 2017.
[24] J. Henriques, J. Carreira, R. Caseiro, and J. Batista. Beyond hard negative mining: Efficient detector learning via block-circulant decomposition. In Proc. International Conference on Computer Vision, 2013.
[25] J. Henriques, R. Caseiro, P. Martins, and J. Batista. Exploiting the circulant structure of tracking-by-detection with kernels. In Proc. European Conference on Computer Vision, 2012.
[26] J. Henriques, R. Caseiro, P. Martins, and J. Batista. High-speed tracking with kernelized correlation filters. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37:583–596, 2015.
[27] J. Henriques, P. Martins, R. Caseiro, and J. Batista. Fast training of pose detectors in the fourier domain. In Proc. Neural Information Processing Systems (NIPS), 2014.
[28] Z. Hong, C. Wang, X. Mei, D. Prokhorov, and D. Tao. Tracking using multilevel quantizations. In Proc. European Conference on Computer Vision, 2014.
[29] Y. Hua, K. Alahari, and C. Schmid. Online object tracking with proposal selection. In Proc. International Conference on Computer Vision, 2015.
[30] B. Kumar, A. Mahalanobis, and R. Juday. Correlation Pattern Recognition. Cambridge University Press, 2005.
[31] J. Kwon, J. Roh, K.-M. Lee, and L. Van Gool. Robust visual tracking with double bounding box model. In Proc. European Conference on Computer Vision, 2014.
[32] X. Lan, A. Ma, and P. Yuen. Multi-cue visual tracking using robust feature-level fusion based on joint sparse representation. In Proc. Computer Vision and Pattern Recognition, 2014.
[33] G. Lanckriet, T. De Bie, N. Cristianini, M. Jordan, and W. Noble. A statistical framework for genomic data fusion. Bioinformatics, 20:2626–2635, 2004.
[34] D. Lee, J.-Y. Sim, and C.-S. Kim. Visual tracking using pertinent patch selection and masking. In Proc. Computer Vision and Pattern Recognition, 2014.
[35] T. Liu, G. Wang, and Q. Yang. Real-time part-based visual tracking via adaptive correlation filters. In Proc. Computer Vision and Pattern Recognition, 2015.
[36] A. Lukezic, T. Vojir, L.-C. Zajc, J. Matas, and M. Kristan. Discriminative correlation filter with channel and spatial reliability. In Proc. Computer Vision and Pattern Recognition, 2017.
[37] C. Ma, J. Huang, X. Yang, and M. Yang. Hierarchical convolutional features for visual tracking. In Proc. International Conference on Computer Vision, 2015.
[38] C. Ma, X. Yang, C. Zhang, and M. Yang. Long-term correlation tracking. In Proc. Computer Vision and Pattern Recognition, 2015.
[39] C. Micchelli. Interpolation of scattered data: Distance matrices and conditionally positive definite functions. Constructive Approximation, 2:11–22, 1986.
[40] M. Mueller, N. Smith, and B. Ghanem. Context-aware correlation filter tracking. In Proc. Computer Vision and Pattern Recognition, 2017.
[41] H. Nam, S. Hong, and B. Han. Online graph-based tracking. In Proc. European Conference on Computer Vision, 2014.
[42] Y. Qi, S. Zhang, L. Qin, H. Yao, Q. Huang, J. Lim, and M.-H. Yang. Hedged deep tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[43] A. Rakotomamonjy, F. Bach, S. Canu, and Y. Grandvalet. SimpleMKL. Journal of Machine Learning Research, 9:2491–2521, 2008.
[44] R. Rifkin, G. Yeo, and T. Poggio. Regularized least-squares classification. Nato Science Series Sub Series III: Computer and Systems Sciences, 190:131–154, 2003.
[45] D. Rozumnyi, J. Kotera, F. Sroubek, L. Novotny, and J. Matas. The world of fast moving objects. In Proc. Computer Vision and Pattern Recognition, 2017.
[46] B. Schölkopf and A. Smola. Learning with Kernels. MIT Press, Cambridge, MA, 2002.
[47] M. Tang and J. Feng. Multi-kernel correlation filter for visual tracking. In Proc. International Conference on Computer Vision, 2015.
[48] M. Tang and X. Peng. Robust tracking with discriminative ranking lists. IEEE Transactions on Image Processing, 21(7):3273–3281, 2012.
[49] J. Valmadre, L. Bertinetto, J. Henriques, A. Vedaldi, and P. H. S. Torr. End-to-end representation learning for correlation filter based tracking. In Proc. Computer Vision and Pattern Recognition, 2017.
[50] M. Varma and D. Ray. Learning the discriminative power-invariance trade-off. In Proc. International Conference on Computer Vision, 2007.
[51] A. Vedaldi, V. Gulshan, M. Varma, and A. Zisserman. Multiple kernels for object detection. In Proc. International Conference on Computer Vision, 2009.
[52] L. Wang, W. Ouyang, X. Wang, and H. Lu. STCT: Sequentially training convolutional networks for visual tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[53] Y. Wu, E. Blasch, G. Chen, L. Bai, and H. Ling. Multiple source data fusion via sparse representation for robust visual tracking. In Proc. FUSION, 2011.
[54] Y. Wu, J. Lim, and M.-H. Yang. Online object tracking: A benchmark. In Proc. Computer Vision and Pattern Recognition, 2013.
[55] F. Yang, H. Lu, and M. Yang. Robust visual tracking via multiple kernel boosting with affinity constraints. IEEE Transactions on Circuits and Systems for Video Technology, 24:242–254, 2014.
[56] S. Yun, J.-W. Choi, Y. Yoo, K. Yun, and J.-Y. Choi. Action-decision networks for visual tracking with deep reinforcement learning. In Proc. Computer Vision and Pattern Recognition, 2017.
[57] L. Zhang, J. Varadarajan, and P.-N. Suganthan. Robust visual tracking using oblique random forests. In Proc. Computer Vision and Pattern Recognition, 2017.
[58] T. Zhang, A. Bibi, and B. Ghanem. In defense of sparse tracking: Circulant sparse tracking. In Proc. Computer Vision and Pattern Recognition, 2016.
[59] T. Zhang, C. Xu, and M.-H. Yang. Multi-task correlation particle filter for robust object tracking. In Proc. Computer Vision and Pattern Recognition, 2017.
Supplementary Material

High-speed Tracking with Multi-kernel Correlation Filters

Ming Tang1,2, Bin Yu1,2, Fan Zhang3, and Jinqiao Wang1,2

1 University of Chinese Academy of Sciences, Beijing, China
2 National Lab of Pattern Recognition, Institute of Automation, CAS, Beijing 100190, China
3 School of Info. & Comm. Eng., Beijing University of Posts and Telecommunications

Abstract

This supplementary material includes 1) the experimental results on OTB2015, and 2) the proof of Theorem 1.

1. Experimental Results on OTB2015

[Figure 1 shows the success plot. Legend AUCs — MKCFup 0.656, ECO-HC 0.648, SRDCF 0.637, MKCF 0.603, KCFscale 0.592, fDSST 0.587, KCF 0.518.]

Figure 1. The success plot of MKCFup, KCF, KCFscale, MKCF, SRDCF, fDSST, and ECO HC on small move sequences of OTB2015. The AUCs of the trackers on the sequences are reported in the legends.

2. Proof of Upper Bound of MKCF Objective Function [2]

Lemma 1 Suppose $a_m$ is a vector, $m = 1, \ldots, M$. Then

$$\Big\| \sum_{m=1}^{M} a_m \Big\|_2^2 \le (2M+1) \sum_{m=1}^{M} \| a_m \|_2^2.$$

The equality holds when $a_m = a_n$, where $m = 1, \ldots, M$ and $n = 1, \ldots, M$.

Proof of Lemma 1: It is true that

$$\Big\| \sum_{m=1}^{M} a_m \Big\|_2^2 = \Big( \sum_{m=1}^{M} a_m \Big)^{\top} \Big( \sum_{m=1}^{M} a_m \Big) = \sum_{m=1}^{M} a_m^{\top} a_m + 2 \sum_{m=1}^{M} \sum_{n=1}^{M} a_m^{\top} a_n.$$

It is also true that

$$a_m^{\top} a_m + a_n^{\top} a_n - 2 a_m^{\top} a_n = a_m^{\top}(a_m - a_n) - a_n^{\top}(a_m - a_n) = (a_m - a_n)^{\top}(a_m - a_n) = \| a_m - a_n \|_2^2 \ge 0.$$

Therefore,

$$a_m^{\top} a_m + a_n^{\top} a_n \ge 2 a_m^{\top} a_n,$$
$$\sum_{m=1}^{M} a_m^{\top} a_m + M a_n^{\top} a_n \ge 2 \sum_{m=1}^{M} a_m^{\top} a_n,$$
$$M \sum_{m=1}^{M} a_m^{\top} a_m + M \sum_{n=1}^{M} a_n^{\top} a_n \ge 2 \sum_{n=1}^{M} \sum_{m=1}^{M} a_m^{\top} a_n,$$
$$2M \sum_{m=1}^{M} a_m^{\top} a_m \ge 2 \sum_{m=1}^{M} \sum_{n=1}^{M} a_m^{\top} a_n.$$

Therefore,

$$\Big\| \sum_{m=1}^{M} a_m \Big\|_2^2 = \sum_{m=1}^{M} a_m^{\top} a_m + 2 \sum_{m=1}^{M} \sum_{n=1}^{M} a_m^{\top} a_n \le \sum_{m=1}^{M} a_m^{\top} a_m + 2M \sum_{m=1}^{M} a_m^{\top} a_m = (2M+1) \sum_{m=1}^{M} a_m^{\top} a_m.$$
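As a quick numerical sanity check of the stated inequality (the random-trial harness below is ours, not part of the paper):

```python
import numpy as np

# Check Lemma 1: ||sum_m a_m||_2^2 <= (2M + 1) * sum_m ||a_m||_2^2
# on random vectors for several values of M.
rng = np.random.default_rng(0)
for M in (1, 2, 5, 10):
    a = rng.standard_normal((M, 16))                     # M random vectors a_m
    lhs = np.linalg.norm(a.sum(axis=0)) ** 2             # squared norm of the sum
    rhs = (2 * M + 1) * np.sum(np.linalg.norm(a, axis=1) ** 2)
    assert lhs <= rhs
```

The inequality holds on every trial, as it must for any choice of vectors.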
That is,

$$\Big\| \sum_{m=1}^{M} a_m \Big\|_2^2 \le (2M+1) \sum_{m=1}^{M} \| a_m \|_2^2.$$

It is clear that the equality holds when $a_m = a_n$, where $m = 1, \ldots, M$ and $n = 1, \ldots, M$.

Q.E.D.

2.1. Proof of Upper Bound

According to Lemma 1,

$$\Big\| y - \sum_{m=1}^{M} d_m K_m \alpha \Big\|_2^2 \le (2M+1) \sum_{m=1}^{M} \| y_c - d_m K_m \alpha \|_2^2.$$

Therefore, $UF(\alpha, d)$ is the upper bound of $F(\alpha, d)$, and the upper bound is reached when $d_m K_m \alpha = d_n K_n \alpha$, where $m = 1, \ldots, M$ and $n = 1, \ldots, M$.

Q.E.D.

3. Proof of Theorem 1

In the extension of MKCF with upper bound, to optimize the unconstrained problem

$$\min_{\alpha_p, d_p} F_p(\alpha_p, d_p), \qquad (1)$$

we achieve that

$$\alpha_p = \Big[ \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j \big( (d_{m,p} K_m^j)^2 + \lambda d_{m,p} K_m^j \big) \Big]^{-1} \cdot \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} K_m^j y_c, \qquad (2)$$

and

$$d_{m,p} = \frac{d_{m,p}^{N}}{d_{m,p}^{D}}, \qquad (3)$$

where

$$d_{m,p}^{N} = (1 - \gamma_m) d_{m,p-1}^{N} + \gamma_m (K_m^p \alpha_p)^{\top} (2 y_c - \lambda \alpha_p),$$
$$d_{m,p}^{D} = (1 - \gamma_m) d_{m,p-1}^{D} + 2 \gamma_m (K_m^p \alpha_p)^{\top} (K_m^p \alpha_p),$$

when $p > 1$. If $p = 1$, then

$$d_{m,1}^{N} = (K_m^1 \alpha_1)^{\top} (2 y_c - \lambda \alpha_1), \qquad d_{m,1}^{D} = (K_m^1 \alpha_1)^{\top} (K_m^1 \alpha_1).$$

To simplify the notation, in the proof, $d_{m,p}$ expresses the kernel weight $d_{m,p}^{t}$ of the $t$-th iteration of $\alpha_p$ and $d_p$.

3.1. Proof of First Conclusion

According to Eq. (2), we set $\alpha_p = D_p^{-1} N_p y_c$, where

$$D_p = \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j \big( (d_{m,p} K_m^j)^2 + \lambda d_{m,p} K_m^j \big)$$

and

$$N_p = \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} K_m^j.$$

It is clear that both $D_p$ and $N_p$ are positive definite because $\beta_m^j > 0$, $d_{m,p} > 0$, $\lambda > 0$, and $K_m^j$ is positive definite. Because $K_m^j$ is a circulant Gram matrix, we have $K_m^j = U \Sigma_m^j U^{H}$, where $U = \frac{1}{\sqrt{l}} F_l^{-1}$ and $F_l$ is the 1-D discrete Fourier transform matrix [1]. Because the linear combination of circulant matrices is also circulant, we have

$$D_p = U \Big[ \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j \big( d_{m,p}^2 (\Sigma_m^j)^2 + \lambda d_{m,p} \Sigma_m^j \big) \Big] U^{H}$$

and

$$N_p = U \Big[ \sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \Sigma_m^j \Big] U^{H}.$$

Let $\Sigma_m^j = \mathrm{diag}\big( \sigma_{m,1}^j, \ldots, \sigma_{m,l}^j \big)$, $\sigma_{m,n}^j > 0$, $n = 1, \ldots, l$. Then the $n$-th eigenvalue of $D_p^{-1} N_p$ is

$$\sigma_{\alpha_p, n} \equiv \frac{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \sigma_{m,n}^j}{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \sigma_{m,n}^j (d_{m,p} \sigma_{m,n}^j + \lambda)} = (\lambda + b_n)^{-1},$$

where

$$b_n = \frac{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p}^2 (\sigma_{m,n}^j)^2}{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \sigma_{m,n}^j}.$$

It is clear that $b_n > 0$.

According to Eq. (3), we also have

$$d_{m,p}^{N} = \sum_{j=1}^{p} \beta_m^j (K_m^j \alpha_p)^{\top} (2 y_c - \lambda \alpha_p) = y_c^{\top} \sum_{j=1}^{p} \beta_m^j N_p D_p^{-1} K_m^j (2 I - \lambda D_p^{-1} N_p) y_c = y_c^{\top} D_{m,p}^{N} y_c,$$

where $D_{m,p}^{N} = N_p D_p^{-1} \sum_{j=1}^{p} \beta_m^j K_m^j (2 I - \lambda D_p^{-1} N_p)$, and its $n$-th eigenvalue is

$$\sigma_{m,p,n}^{N} = \sigma_{\alpha_p, n} (2 - \lambda \sigma_{\alpha_p, n}) \sum_{j=1}^{p} \beta_m^j \sigma_{m,n}^j.$$
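The diagonalization $K_m^j = U \Sigma_m^j U^{H}$ used above is the standard spectral decomposition of a circulant matrix in the DFT basis [1] (normalization conventions for $U$ vary). A small numerical check with numpy (the construction and the particular unitary convention below are ours):

```python
import numpy as np

l = 8
rng = np.random.default_rng(1)
c = rng.standard_normal(l)
# Circulant matrix whose first column is c: column i is c rolled by i.
C = np.array([np.roll(c, i) for i in range(l)]).T

F = np.fft.fft(np.eye(l))        # unnormalized DFT matrix, F[j, k] = exp(-2j*pi*j*k/l)
U = F.conj().T / np.sqrt(l)      # one unitary DFT convention
D = U.conj().T @ C @ U           # diagonalizes C in this basis

assert np.allclose(U @ U.conj().T, np.eye(l))    # U is unitary
assert np.allclose(D, np.diag(np.fft.fft(c)))    # eigenvalues are the DFT of c
```

This is the property that lets the proof reduce the matrix expressions for $D_p$ and $N_p$ to elementwise operations on eigenvalues.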
$\because \lambda \sigma_{\alpha_p, n} = \lambda (\lambda + b_n)^{-1} < 1$, $\therefore 2 - \lambda \sigma_{\alpha_p, n} > 1$, $\therefore \sigma_{m,p,n}^{N} > 0$, $n = 1, \ldots, l$, $\therefore D_{m,p}^{N}$ is positive definite, and $d_{m,p}^{N} > 0$. It is obvious that $d_{m,p}^{D} > 0$. Consequently,

$$d_{m,p}^{t+1} = \frac{d_{m,p}^{N}}{d_{m,p}^{D}} > 0,$$

where $m = 1, \ldots, M$.

Q.E.D.

3.2. Proof of Second Conclusion

According to Eq. (2), we have

$$d_{m,p}^{D} = 2 \sum_{j=1}^{p} \beta_m^j (K_m^j \alpha_p)^{\top} (K_m^j \alpha_p) = y_c^{\top}\, 2 \sum_{j=1}^{p} \beta_m^j N_p D_p^{-1} (K_m^j)^2 D_p^{-1} N_p\, y_c = y_c^{\top} D_{m,p}^{D} y_c,$$

where $D_{m,p}^{D} = 2 N_p D_p^{-1} \sum_{j=1}^{p} \beta_m^j (K_m^j)^2 D_p^{-1} N_p$, and its $n$-th eigenvalue is

$$\sigma_{m,p,n}^{D} = 2 \sigma_{\alpha_p, n}^2 \sum_{j=1}^{p} \beta_m^j (\sigma_{m,n}^j)^2.$$

Then, according to Eq. (3),

$$d_{m,p}^{t+1} = \frac{d_{m,p}^{N}}{d_{m,p}^{D}} = \frac{y_c^{\top} D_{m,p}^{N} y_c}{y_c^{\top} D_{m,p}^{D} y_c} = \frac{y_c^{\top} U \Sigma_{m,p}^{N} U^{H} y_c}{y_c^{\top} U \Sigma_{m,p}^{D} U^{H} y_c}.$$

Let $U^{H} y_c = (y_{u,1}, \ldots, y_{u,l})$, $c_n^{N} = \sum_{j=1}^{p} \beta_m^j \sigma_{m,n}^j$, and $c_n^{D} = \sum_{j=1}^{p} \beta_m^j (\sigma_{m,n}^j)^2$. Then

$$d_{m,p}^{t+1} = \frac{\sum_{n=1}^{l} y_{u,n}^2 c_n^{N} \sigma_{\alpha_p, n} (2 - \lambda \sigma_{\alpha_p, n})}{2 \sum_{n=1}^{l} y_{u,n}^2 c_n^{D} \sigma_{\alpha_p, n}^2}.$$

Let $c_{\max}^{N} = \max_n c_n^{N}$, $c_{\min}^{N} = \min_n c_n^{N}$, $c_{\max}^{D} = \max_n c_n^{D}$, $c_{\min}^{D} = \min_n c_n^{D}$, $y_{\max} = \max_n y_{u,n}$, $y_{\min} = \min_n y_{u,n}$,

$$c_l = \frac{y_{\min}^2 c_{\min}^{N}}{y_{\max}^2 c_{\max}^{D}}, \qquad c_u = \frac{y_{\max}^2 c_{\max}^{N}}{y_{\min}^2 c_{\min}^{D}},$$

and

$$\sigma_r = \frac{\sum_{n=1}^{l} \sigma_{\alpha_p, n} (2 - \lambda \sigma_{\alpha_p, n})}{2 \sum_{n=1}^{l} \sigma_{\alpha_p, n}^2}.$$

Then

$$c_l \cdot \sigma_r < d_{m,p}^{t+1} < c_u \cdot \sigma_r.$$

Furthermore,

$$\sigma_r = \frac{\sum_{n=1}^{l} (\lambda + b_n)^{-1} \big( 2(\lambda + b_n) - \lambda \big) (\lambda + b_n)^{-1}}{2 \sum_{n=1}^{l} (\lambda + b_n)^{-2}} = \frac{\sum_{n=1}^{l} (\lambda + b_n)^{-2} (\lambda + 2 b_n)}{2 \sum_{n=1}^{l} (\lambda + b_n)^{-2}} = \frac{1}{2} \Big( \lambda + \frac{\sum_{n=1}^{l} b_n (\lambda + b_n)^{-2}}{\sum_{n=1}^{l} (\lambda + b_n)^{-2}} \Big).$$

Let $\sigma_{m,\max}^j = \max_n \sigma_{m,n}^j$ and $\sigma_{m,\min}^j = \min_n \sigma_{m,n}^j$,

$$b_{\max} = \frac{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p}^2 (\sigma_{m,\max}^j)^2}{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \sigma_{m,\min}^j}, \qquad b_{\min} = \frac{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p}^2 (\sigma_{m,\min}^j)^2}{\sum_{j=1}^{p} \sum_{m=1}^{M} \beta_m^j d_{m,p} \sigma_{m,\max}^j}.$$

Then $b_{\min} \le b_n \le b_{\max}$, and

$$\frac{1}{2}(\lambda + b_{\min}) \le \sigma_r \le \frac{1}{2}(\lambda + b_{\max}),$$
$$\frac{c_l}{2}(\lambda + b_{\min}) < d_{m,p}^{t+1} < \frac{c_u}{2}(\lambda + b_{\max}),$$

where $m = 1, \ldots, M$.

Q.E.D.

References

[1] P. Davis. Circulant Matrices. Chelsea Publishing Company, 2nd edition, 1994.
[2] M. Tang and J. Feng. Multi-kernel correlation filter for visual tracking. In Proc. International Conference on Computer Vision, 2015.