
Journal of Real-Time Image Processing

https://doi.org/10.1007/s11554-019-00874-x

ORIGINAL RESEARCH PAPER

GPU-based chromatic co-occurrence matrices for tracking moving objects
Issam Elafi1 · Mohamed Jedra1 · Noureddine Zahid1

Received: 3 July 2018 / Accepted: 13 April 2019


© Springer-Verlag GmbH Germany, part of Springer Nature 2019

Abstract
Generally, a good tracking system requires a huge amount of computation time to localize the target object accurately. For real-time tracking applications, the running time is a critical factor. In this paper, a GPU implementation of the chromatic co-occurrence matrices (CCM) tracking system is proposed. Indeed, descriptors based on CCM help to improve the accuracy of the tracking; however, they require a long computation time. To overcome this limitation, a parallel GPU-based implementation of these matrices is incorporated into the tracker. The developed algorithm is then integrated into an embedded system to build a real-time autonomous embedded tracking system. The experimental results show a speed-up of 150% for the GPU version of the tracker compared to the CPU version.

Keywords  Chromatic co-occurrence matrices · Particle filter · Real time · GPU · Embedded system

* Issam Elafi
  massifale@gmail.com

1 Laboratory of Conception and Systems (Electronics, Signals, and Informatics), Faculty of Science, Mohammed V University, Rabat, Morocco

1 Introduction

In the last decades, the tracking field has become a very important domain for many applications such as surveillance [1–3], driving assistance [4, 5] and robot control [6, 7]. Before tracking an object, a detection phase is required. Many methods are used to detect a moving object in a given scene. The most used among them is the popular background subtraction method [8]. Once the object is detected, a tracking phase is performed by establishing a frame-to-frame update of the object location. At this stage, several methods based on learning systems [9–12], Histogram of Oriented Gradient (HOG) [13] or optical flow [14] can be used.

Most existing tracking methods use the histogram as a descriptor of the target object. However, this descriptor cannot represent a target object efficiently, as it has difficulty distinguishing objects with the same color. Among the proposed solutions are the chromatic co-occurrence matrices (CCM) [15–17]. The co-occurrence matrix uses texture and color information to represent the target object. Thus, even if the colors of two objects are similar, their moments are different. Co-occurrence matrices were initially defined for gray-level images, under the name "Gray Level Co-occurrence Matrix" (GLCM) [18], and were then successfully adopted for chromatic images. However, CCMs are more computationally expensive than histograms, which prevents their usage in several domains, especially real-time tracking. This paper proposes a real-time autonomous embedded tracking system with a parallel implementation of the chromatic co-occurrence matrices to address the computation time limitation.

The Graphics Processing Unit (GPU) is a special set of processors designed to accelerate computational workloads. When programmed correctly, GPUs become a good tool to process huge data streams and consequently outperform CPUs. Many applications in image processing take advantage of the benefits offered by GPUs. Dixon and Ding [19] proposed a parallel implementation of GLCM. They extracted 17 features that are used to analyze the diffraction images of biological cells. The authors claim that the parallel implementation speeds up the computation seven times compared to the CPU version. Liu et al. [20] proposed a multi-cue-based face tracking algorithm. To speed up the tracking process, they used two parallel computing techniques. The first one uses the MapReduce thread model and the second explores a GPU-based speed-up approach. Afterward, they presented a third approach that combines both techniques. The obtained results show that the use of the GPU can speed up multi-cue face tracking 8–12 times.


Laborda et al. [21] proposed to extract color-pattern-based features from multiple high-quality cameras in an uncontrolled sports environment such as a football match. The authors compared multiple CPU and GPU implementations of their algorithm. The usage of the GPU shows a speed-up of four to ten times. Gómez-Luna et al. [22] presented an optimized GPU-based method for histogram calculation, able to eliminate position conflicts, reduce bank conflicts and improve memory access. The results show that the proposed method is 1.4–15.7 times faster than the other implementations. Franco et al. [23] proposed a GPU implementation of the 2D wavelet transform based on a pair of Quadrature Mirror Filters. Amamra and Aouf [24] used an adaptation of the Kalman filter to improve the accuracy of the Kinect sensor for real-time RGBD (Red, Green, Blue, and Depth) image capture. The GPU implementation of the filter allows real-time processing when calculating the pixel kernels in the depth image. All these applications and others have benefited from parallel computation on the GPU to improve the running time. Hence, this motivated us to realize a GPU implementation of the CCM tracker and integrate it into an embedded system.

The rest of this article is organized as follows. In Sect. 2, the chromatic co-occurrence matrices and their calculation are described. Section 3 presents in detail the tracking algorithm based on CCM and the proposed parallel implementation of the CCM. The comparative study is provided in Sect. 4 and finally, a conclusion is given.

Fig. 1  Example of M^{k,k′} calculation with k = R and i = 2 (step 1)

2 The chromatic co‑occurrence matrices

The chromatic co-occurrence matrices represent the correlation between the color components of an image [15]. Each pixel p in a color image I is represented in the RGB color space by a vector {r_p, g_p, b_p}. Let C_p^x be the value of a pixel p for the color channel x, such that:

$$C_p^x = \begin{cases} r_p & \text{if } x = R \\ g_p & \text{if } x = G \\ b_p & \text{if } x = B \end{cases} \qquad (1)$$

Consider k and k′, two components of the RGB space. Let M^{k,k′} be the co-occurrence matrix of I for the two components k and k′, with a size of 256 × 256 and initially empty. To fill the ith row of M^{k,k′}, we first need to find every pixel p having C_p^k = i (Fig. 1). Then, for each such pixel p, we define a 3 × 3 neighborhood Nh of adjacent pixels. For every pixel p′ in all neighborhoods Nh (Fig. 2), if C_{p′}^{k′} = j, the value of the cell M^{k,k′}(i, j) is incremented by 1. In other words, we count the number of times each value C_{p′}^{k′} occurs in all neighborhoods Nh for C_p^k = i.

Fig. 2  Example of M^{k,k′} calculation with k′ = G and i = 2 (step 2)

Figures 1, 2, and 3 show an example of a chromatic co-occurrence matrix calculation for pixels with values equal to 2 (i = 2) in the R color component (k = R) and all values in the second component G (k′ = G).

Color images are coded on 256 levels for every color channel. Implicitly, the chromatic co-occurrence matrix size is 256 × 256. However, it is not necessary to use all the levels of a color image; only a few levels are needed to compute the CCM. Consequently, both the sensitivity to illumination variation and the computation time are decreased.

Every color image I can generate nine chromatic co-occurrence matrices: M^{R,R}[I], M^{G,G}[I], M^{B,B}[I], M^{R,G}[I], M^{G,B}[I], M^{R,B}[I], M^{G,R}[I], M^{B,G}[I] and M^{B,R}[I]. The matrices M^{G,R}[I], M^{B,G}[I], and M^{B,R}[I] can be deduced using symmetry.


The use of the Nh neighborhood makes the CCM less sensitive to rotation and translation. To reduce the sensitivity to resolution differences between images, a normalization phase of these matrices is recommended, using the following equation:

$$M^{k,k'}[I](i,j) \leftarrow \frac{M^{k,k'}[I](i,j)}{\sum_{i=0}^{N-1} \sum_{j=0}^{N-1} M^{k,k'}[I](i,j)} \qquad (2)$$

Fig. 3  Chromatic co-occurrence matrix calculation example (k = R, k′ = G, i = 2) (step 3)
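To make the construction above concrete, the following is a minimal sequential sketch of the computation of one normalized CCM (an illustration, not the authors' code; identifier names are placeholders, and pixel values are assumed to be already quantized to N levels):

```cpp
#include <cstdint>
#include <vector>

// Sequential sketch of one CCM: for every pixel p with level i in channel
// k, count the levels j of its 3 x 3 neighbors in channel k', then
// normalize the whole matrix as in Eq. (2).
std::vector<float> computeCCM(const std::vector<uint8_t>& chanK,   // channel k
                              const std::vector<uint8_t>& chanKp,  // channel k'
                              int h, int l, int N)                 // rows, cols, levels
{
    std::vector<float> M(N * N, 0.0f);
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < l; ++x) {
            int i = chanK[y * l + x];
            for (int dy = -1; dy <= 1; ++dy)
                for (int dx = -1; dx <= 1; ++dx) {
                    if (dx == 0 && dy == 0) continue;        // skip p itself
                    int nx = x + dx, ny = y + dy;
                    if (nx < 0 || nx >= l || ny < 0 || ny >= h) continue;
                    int j = chanKp[ny * l + nx];
                    M[i * N + j] += 1.0f;                    // increment M(i, j)
                }
        }
    float sum = 0.0f;                                        // Eq. (2) normalization
    for (float v : M) sum += v;
    if (sum > 0.0f)
        for (float& v : M) v /= sum;
    return M;
}
```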

3 Parallel chromatic co-occurrence tracker (PCCT)

3.1 Tracking objects by chromatic co-occurrence tracker (CCT)

To track occluded objects, Elafi et al. [16] proposed a combination of the particle filter (PF) and CCM-based descriptors. This method can follow different objects regardless of their nature or size. The comparison results show that the method performs competitively in several complex situations, such as illumination and scale variations of occluded objects. The main idea of this tracker is to calculate the chromatic co-occurrence matrices of the target object. These matrices are evaluated to extract invariant features. The calculated features help to estimate the position of the occluded object using the particle filter.

The aim of the particle filter [25–27] is to estimate the hidden state of a dynamic system at time t by considering its state at time t − 1. The PF randomly generates NP samples (particles) to calculate the estimated state at each time step. Every particle, which represents a hypothesis about the real state, is weighted using a likelihood function. The estimated state is then calculated as the weighted average of all particles as follows:

$$\hat{s} = \sum_{i=1}^{N_P} \omega_i \, s_i, \qquad (3)$$

where s_i is the state of particle i and ω_i its normalized weight.

Generally, for a tracking system, the particle filter tries to estimate the center coordinates of the target in each frame. Consequently, the state vector is defined as follows:

$$s = \{x, y, v_x, v_y\}, \qquad (4)$$

where x and y are the coordinates of the target center, and v_x and v_y are the velocities of x and y, respectively.

The sample set is propagated in each frame by

$$s_t = A s_{t-1} + w_{t-1}, \qquad (5)$$

$$A = \begin{pmatrix} 1 & 0 & 1 & 0 \\ 0 & 1 & 0 & 1 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad (6)$$

where s_t is the state of a particle at time t, A is the model matrix and w_{t−1} is random Gaussian noise.

To avoid the degeneration of particles, a resampling phase is mandatory, using the Sequential Importance Resampling (SIR) algorithm [28]. The tracking process using the CCM and particle filtering is divided into three steps (Fig. 4).

In the first step, the chromatic co-occurrence matrices of every sample are calculated. Before tracking, we must select the target object in the first frame. This object is then considered as the model image of the target.

The second step calculates the parameter Md, which represents the differential rate between the co-occurrence matrices of a sample and those of the model image. This parameter is calculated as the average of Nd features as follows:

$$M_d = \frac{1}{N_d} \sum_{i=1}^{N_d} m_i. \qquad (7)$$

A feature m_i is obtained by calculating the difference between two CCMs over the same color channels, taken from the model image and the sample image. For example, if Nd = 3, these features can be:


Fig. 4  Different steps of tracking an object by chromatic co-occurrence matrices and particle filter for one frame

Fig. 5  Computation time percentage of CCT steps for one frame

$$\begin{aligned} m_1 &= \sum_{i=1}^{N} \sum_{j=1}^{N} \left| M_s^{R,G}(i,j) - M_m^{R,G}(i,j) \right|, \\ m_2 &= \sum_{i=1}^{N} \sum_{j=1}^{N} \left| M_s^{G,B}(i,j) - M_m^{G,B}(i,j) \right|, \\ m_3 &= \sum_{i=1}^{N} \sum_{j=1}^{N} \left| M_s^{R,B}(i,j) - M_m^{R,B}(i,j) \right|, \end{aligned} \qquad (8)$$

where N is the number of CCM levels.

The third step evaluates every sample by calculating its weight using the likelihood function defined as

$$\omega_s = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{M_d^2}{2\sigma^2}}. \qquad (9)$$

This weight helps to determine the similarity between the sample image (represented by a particle) and the model image. Consequently, the particle with the highest weight is considered the closest one to the target object. The weights are normalized after the evaluation of all particles. The estimated position of the target is then deduced using Eq. (3).

It has been shown that using chromatic co-occurrence matrices as a descriptor for tracking improves the accuracy and the robustness of the tracker [16, 17, 29]. Figure 5 shows the average percentage of time taken by every step of the tracker for one frame. It is clear that the CCM computation step is time-consuming compared to the other steps. To overcome this problem, a parallel version of the chromatic co-occurrence matrices is designed to decrease the required computation time.
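For concreteness, Eqs. (7)–(9) amount to the following host-side sketch (a simplified illustration, not the authors' code; ccmSample, ccmModel and sigma are placeholder names):

```cpp
#include <cmath>

// Sketch of Eqs. (7)-(9): Md is the mean L1 distance between the Nd pairs
// of normalized CCMs of a sample and of the model; the particle weight is
// the Gaussian likelihood of Eq. (9), to be normalized over all particles.
double particleWeight(const float* const* ccmSample,  // Nd matrices, N*N cells each
                      const float* const* ccmModel,
                      int Nd, int N, double sigma)
{
    const double PI = 3.14159265358979323846;
    double Md = 0.0;
    for (int d = 0; d < Nd; ++d) {
        double m = 0.0;                                  // one feature m_d, Eq. (8)
        for (int c = 0; c < N * N; ++c)
            m += std::fabs(ccmSample[d][c] - ccmModel[d][c]);
        Md += m;
    }
    Md /= Nd;                                            // Eq. (7)
    return std::exp(-(Md * Md) / (2.0 * sigma * sigma))
           / std::sqrt(2.0 * PI);                        // Eq. (9)
}
```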


Table 1  Average running time (ms) for the three parallel implementations, for 8, 64 and 256 CCM levels

                 8 levels                     64 levels                    256 levels
Image size       GPU-1     GPU-2    GPU-3     GPU-1     GPU-2    GPU-3     GPU-1     GPU-2    GPU-3
128 × 128        111.41    6.97     6.37      111.45    7.07     3.73      111.56    8.05     3.59
256 × 256        445.47    11.96    10.69     445.60    11.78    8.60      445.63    13.18    8.41
512 × 512        1778.68   23.75    19.86     1777.32   21.91    17.54     1778.65   22.31    22.26
1024 × 1024      –         103.01   45.53     –         142.25   35.33     –         213.27   60.78

3.2 Graphics processing unit CUDA

Compute Unified Device Architecture (CUDA) is an architecture that increases computing performance by exploiting the power of graphics processors. The Graphics Processing Unit (GPU) contains a large number of parallel computing units that allow performing complex calculations. A CUDA program is organized in one or more kernels. Each kernel is executed, at the GPU level, by several threads grouped in multidimensional blocks. Each thread of a block knows its index in this block; this index is called "threadIdx" and, of course, each thread also knows the index of its block, "blockIdx". In practice, a CUDA GPU consists of a set of SMs (Streaming Multiprocessors). All these SMs operate in parallel and independently. An SM has many processors that can run a set of threads in parallel. For best performance, the SMs must remain busy, which means that we have a maximum of parallelism.

Each thread has access to a global memory and to another, private memory called local memory. Threads of a block also have access to a memory called shared memory, a small memory integrated directly on the SM. Any memory access must respect some rules, and coalescence is one of them. Indeed, whenever a thread on the GPU reads or writes from/to the global memory, it always accesses a whole segment of memory, even if this thread only needs to read or write a small chunk of this segment. Consequently, if other threads make similar accesses to the same segment at the same time, the GPU can exploit and reuse this segment for all of them. This means that the GPU is more efficient when threads read or write contiguous memory locations; this is called coalesced access. Threads can share data and even modify it using shared memory and global memory. However, this can cause a problem when a thread tries to read a result before another thread writes it. To avoid this issue, we must use synchronization between threads. Some atomic operations, implemented natively on the GPU, make it possible to solve this issue.

Besides the standard transfer in CUDA, there is another type of data transfer between the CPU and the GPU, called asynchronous transfer. Indeed, with the standard transfer, the CPU hangs until the end of the transfer, while the asynchronous transfer allows the execution of the next instructions to continue even when the transfer has not yet finished. In other words, this process allows the GPU to work while data are being transferred from or to memory. These asynchronous transfers are controlled through streams, where each stream is a sequence of ordered operations.

3.3 Parallel CCM calculation

The goal of this work is to speed up the computation of the chromatic co-occurrence matrices by taking advantage of parallel computing on GPUs. We start by analyzing the CPU version to find the regions where the algorithm can be parallelized. First, the basic CPU code of the CCM calculation is executed on the GPU using a single GPU thread (GPU-1). With this implementation, we observe that simply executing the CPU code on the GPU yields low performance because of the difference in frequency between the two architectures (Table 1, GPU-1). This implementation uses only a small fraction of the GPU power. So, without parallelism, the CPU code remains more suitable than the GPU code. Second, we tried to introduce parallelism at the level of the image rows by associating with each row a specific thread in the GPU. Indeed, the index of the image row is given by threadIdx. Then, each thread executes a loop over the whole line (GPU-2). Consequently, only one block containing N threads is used (N being the number of rows in the image). This parallel version is almost 100 times better than the first GPU implementation (GPU-1) but is still not powerful enough (Table 1, GPU-2).
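The asynchronous-transfer mechanism of Sect. 3.2 follows the usual CUDA streams pattern sketched below (a fragment with illustrative names; for the copies to actually overlap with kernel execution, the host buffers must be page-locked, e.g. allocated with cudaMallocHost):

```cuda
// Sketch: two streams, each an ordered sequence of copy-in, kernel,
// copy-out; operations placed in different streams may overlap.
cudaStream_t streams[2];
for (int i = 0; i < 2; ++i) cudaStreamCreate(&streams[i]);

for (int i = 0; i < 2; ++i) {
    // Asynchronous copies return immediately; the CPU keeps running.
    cudaMemcpyAsync(d_in[i], h_in[i], inBytes,
                    cudaMemcpyHostToDevice, streams[i]);
    someKernel<<<grid, block, 0, streams[i]>>>(d_in[i], d_out[i]);
    cudaMemcpyAsync(h_out[i], d_out[i], outBytes,
                    cudaMemcpyDeviceToHost, streams[i]);
}
cudaDeviceSynchronize();   // wait for both streams to finish
for (int i = 0; i < 2; ++i) cudaStreamDestroy(streams[i]);
```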


Fig. 6  Diagram of CCM parallel implementation

The third implementation assigns to each pixel a specific thread that computes, relative to its neighbors, the indices of the matrix cells to increment. This thread is responsible for calculating all possible combinations of this pixel with its 3 × 3 neighborhood (GPU-3). However, we must pay attention to two constraints. The first is that a block of threads cannot execute more than 1024 threads. The second concerns memory access, which must be coalesced. For this reason, we added a parameter called K that helps us manage the number of threads that each block contains. Indeed, we subdivide the image into sub-images of K × K dimensions. Each sub-image is assigned to a block of threads. Since a thread block cannot exceed 1024 threads, the maximum value of K is 32 (32 × 32 = 1024). During the calculation, we chose to use the image matrix as a one-dimensional table. In fact, we concatenate the lines of the image to obtain a one-dimensional array. Figure 6 represents the diagram of the CCM parallel computation for an image with h rows and l columns.

The kernel code of the proposed GPU-based CCM approach is as follows:
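(What follows is a minimal sketch reconstructed from the description above rather than the published listing; identifier names are illustrative: one thread per pixel, K × K thread blocks, row-major one-dimensional image arrays, and atomic increments to resolve concurrent writes.)

```cuda
// One thread per pixel: read the level i of pixel p in channel k, then
// atomically count the levels j of its 3 x 3 neighbors in channel k'.
__global__ void ccmKernel(const unsigned char* imgK,   // channel k, flattened
                          const unsigned char* imgKp,  // channel k', flattened
                          unsigned int* ccm,           // N x N matrix, flattened
                          int h, int l, int N)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // column of p
    int y = blockIdx.y * blockDim.y + threadIdx.y;  // row of p
    if (x >= l || y >= h) return;

    int i = imgK[y * l + x];                        // level of p in channel k

    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx) {
            if (dx == 0 && dy == 0) continue;
            int nx = x + dx, ny = y + dy;
            if (nx < 0 || nx >= l || ny < 0 || ny >= h) continue;
            int j = imgKp[ny * l + nx];             // level of p' in channel k'
            atomicAdd(&ccm[i * N + j], 1u);         // two threads may hit the same cell
        }
}
```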


The kernel call by the CPU is as follows:
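(Again a sketch consistent with the K parameter described above, not the published listing; d_imgK, d_imgKp and d_ccm are placeholder device pointers.)

```cuda
// K x K threads per block; enough blocks to cover the h x l image.
dim3 block(K, K);
dim3 grid((l + K - 1) / K, (h + K - 1) / K);
ccmKernel<<<grid, block>>>(d_imgK, d_imgKp, d_ccm, h, l, N);
cudaDeviceSynchronize();  // wait for the kernel before reading d_ccm back
```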

To test the performance of our parallel implementations, a dataset containing 1000 images representing different textures is used. Figure 7 shows some examples of these images. Each texture comes in four copies, each with a different dimension (128 × 128, 256 × 256, 512 × 512, and 1024 × 1024).

Fig. 7  Examples of the dataset images

The tests are performed on an NVIDIA Tegra X1 GPU integrated into a Jetson TX1 onboard card. Table 1 summarizes the results obtained from the three implementations. From Table 1 we can deduce:

• The number of levels in the CCM calculation does not affect the computation time.
• In GPU-1, the single thread is killed automatically once its execution exceeds a threshold amount of time, which is why no results are reported for the largest images.
• GPU-3 gives more relevant results when the size of the CCM increases. Indeed, when the dimension of the images increases, the probability of finding two threads trying to increment the same memory cell at the same time decreases.
• Comparing the three implementations, we can clearly see that the third one (GPU-3) is the best.

Since GPU-3 is the best parallel implementation, we compare the CPU and GPU-3 versions. Table 2 presents the results obtained by measuring the running times of the CPU and the GPU for multiple image dimensions (K = 8).

Table 2  Test results obtained by the CPU and GPU-3 versions of the CCM calculation (average running time, ms)

                 8 levels              64 levels
Image size       CPU       GPU-3       CPU       GPU-3
128 × 128        9.04      6.80        7.94      5.96
256 × 256        41.97     17.23       39.19     16.70
512 × 512        149.72    103.00      146.28    99.15
1024 × 1024      578.09    390.97      570.78    385.728

The results in Table 2 show that the implementation on the GPU always gives better computation times than the CPU, even with small images. To test the effect of the parameter K, we varied the value of K while fixing the number of levels of the CCM. The obtained results are shown in Table 3.

Table 3  The results obtained for different values of K (levels = 8; running time, ms)

Image size       K = 4     K = 8     K = 16    K = 32
128 × 128        2.79      1.54      1.53      1.61
256 × 256        10.43     5.49      5.49      5.88
512 × 512        25.72     15.02     15.13     15.44
1024 × 1024      46.02     47.04     47.99     49.67

From Table 3, we notice that when the value of K varies, the GPU becomes more or less efficient because of the SM occupation problem. Indeed, when K increases, the number of threads in a block increases and consequently the number of thread blocks decreases. As mentioned before, the SMs must always be busy to get good GPU performance. So, the SMs stay less busy when K increases, unless the image is big enough. Also, when K decreases, the number of threads per block decreases and, consequently, the number of blocks increases; because an SM cannot hold more than eight blocks, the GPU does not execute all the blocks at the same time (the number of SMs is fixed), which negatively influences the running time.

Fig. 8  Percentage of computation and memory bandwidth for the proposed kernel (GPU-3)


Fig. 9  Occupancy per SM and the number of warps for the proposed kernel (GPU-3)

Fig. 10  L2 cache memory bandwidth for the proposed kernel (GPU-3)

The Nvidia Profiler is a useful tool designed for analyzing the performance of CUDA kernels. Figures 8, 9, and 10 were generated automatically with this tool. To run the test, we used images of size 1024 × 1024 pixels with K equal to 16 and 64 levels for the CCM dimension. To get good kernel performance, we must maximize the GPU compute and memory bandwidth. From Fig. 8, the Nvidia Profiler shows that the computation and memory bandwidth of our implementation are balanced. Another performance aspect to verify is the occupancy. Indeed, when the GPU does not have enough work, the performance of the kernel drops and its efficiency is limited. This issue is solved by increasing the occupancy. In Fig. 9, the occupancy rate measures how many warps the kernel has activated on the GPU, relative to the maximum number of warps available on the GPU. The obtained results show that the occupancy rate of the proposed kernel can reach 90.6%. Also, the performance of a kernel can be limited by the memory bandwidth when the GPU memories cannot provide data at the rate requested by the kernel. Figure 10 indicates the usage of the L2 cache memory relative to the maximum throughput supported by the memory. From Fig. 10, we deduce that the memory management of our kernel is quite good.

3.4 Pseudo-code of the parallel chromatic co-occurrence tracker

The overall algorithm is summarized in Fig. 11.

Fig. 11  Algorithm of tracking objects by CCM and particle filtering

4 Experimental results

4.1 Quantitative study

To evaluate the performance of the CCT tracker, we use the Object Tracking Benchmark (OTB) [30, 31]. This is a useful benchmark that contains many complex scenes including occlusion, illumination variation, fast motion, rotation, scale variation, and deformation. This benchmark uses two types of measurements. The first one is the precision rate in terms of location error threshold, which represents the Euclidean distance between the centroids of the tracked window and the groundtruth window. The second measure is the success rate in terms of overlap threshold [30]. The overlap is the ratio between the number of pixels in the intersection region of the tracked bounding box and the groundtruth bounding box, and the number of pixels in their union region. The Chromatic Co-occurrence Tracker is compared with 12 other algorithms: LOT [32], IVT [33], CT [34], DFT [35], CXT [36], SCM [37], CSK [38], L1APG [39], ASLA [40], MTT [41], KCF [42] and OCT_KCF [43]. For our tracker, the number of particles is fixed to NP = 60 and the number of chromatic co-occurrence matrix levels to N = 16. Each object is tracked using its own CCM. We extract from every model image three co-occurrence matrices (Nd = 3): M^{R,G}, M^{G,B} and M^{R,B}. Using more matrices increases the accuracy of the tracker; consequently, the running time also increases.

The CCT algorithm achieves the best robustness scores on most of the issues present in the OTB, as shown in Table 4, which summarizes the overlap scores obtained by all the trackers. For more details, please refer to [16].
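Read concretely, the overlap measure above is the intersection-over-union of the two bounding boxes; a minimal sketch (an illustrative helper, boxes given as x, y, width, height):

```cpp
#include <algorithm>

// OTB success measure: ratio of the intersection area of the tracked and
// groundtruth boxes to the area of their union.
struct Box { float x, y, w, h; };

float overlapScore(const Box& a, const Box& b)
{
    float iw = std::max(0.0f, std::min(a.x + a.w, b.x + b.w) - std::max(a.x, b.x));
    float ih = std::max(0.0f, std::min(a.y + a.h, b.y + b.h) - std::max(a.y, b.y));
    float inter = iw * ih;
    float uni   = a.w * a.h + b.w * b.h - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}
```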


Table 4  Average overlap scores

                        SCM    ASLA   CT     DFT    LOT    L1APG  CT     CXT    CSK    IVT    MTT    KCF    OCT_KCF  CCT
Occlusion               0.442  0.462  0.375  0.512  0.608  0.470  0.375  0.510  0.532  0.451  0.453  0.679  0.674    0.764
Deformation             0.359  0.397  0.312  0.436  0.623  0.458  0.312  0.467  0.442  0.383  0.417  0.621  0.688    0.716
Fast motion             0.158  0.135  0.259  0.283  0.508  0.542  0.259  0.693  0.505  0.138  0.484  0.562  0.662    0.795
Illumination variation  0.288  0.433  0.394  0.550  0.682  0.484  0.394  0.580  0.549  0.425  0.539  0.722  0.692    0.737
Out-of-plane rotation   0.328  0.354  0.303  0.426  0.597  0.468  0.303  0.524  0.491  0.343  0.422  0.639  0.678    0.744
In-plane rotation       0.082  0.201  0.178  0.203  0.000  0.560  0.178  0.730  0.606  0.215  0.286  0.650  0.717    0.782
Scale variation         0.158  0.135  0.259  0.283  0.508  0.542  0.259  0.693  0.505  0.138  0.484  0.562  0.662    0.795
Average                 0.259  0.302  0.297  0.385  0.504  0.503  0.297  0.600  0.519  0.299  0.441  0.634  0.682    0.762


Fig. 12  Some tracking results over the used sequences

Table 5  The results obtained for the different sequences (K = 8)

              Number of frames    CCM_CPU (s)    CCM_GPU (s)
Sequence 1    412                 32.90          22.12
Sequence 2    651                 36.18          26.59
Sequence 3    610                 31.83          18.31
Sequence 4    869                 37.51          25.80
Sequence 5    975                 28.08          17.88
Sequence 6    977                 31.04          20.59


4.2 PCCT vs CCT

The computation of the CCM for every particle is executed in a separate stream. In this way, we can benefit from asynchronous copies, from and to the GPU global memory, between particles. Indeed, the use of multiple streams improves the computation time. The initial models of the targets are obtained using the method proposed by Elafi et al. [1]. This method can automatically detect all moving objects in the scene without any prior information about them.

To compare the CCT with its parallel version PCCT, we chose several scenes taken by surveillance cameras inside and outside buildings, coming from the PETS and CAVIAR datasets [44, 45]. These sequences contain many challenging issues in object tracking such as partial occlusion, rotation, illumination variation, scale variation, and deformation. The results are obtained by running the tracker on a GPU-based embedded system. The NVIDIA Jetson TX1 is an onboard card designed to deploy parallel computing on GPUs. It has a 256-core CUDA GPU with the NVIDIA Maxwell architecture. The qualitative results of the CCT with the parallel implementation of CCM are shown in Fig. 12.

Table 5 summarizes the results obtained using the CPU and GPU versions of the CCM. We can clearly see the advantage of computing the chromatic co-occurrence matrices on the GPU compared to the CPU version. In fact, the parallel version completes the tracking of all moving objects in the scene in a shorter time than the one required by the CPU version. As mentioned before, these results are obtained by running the CCM calculation on an embedded GPU. As is known for embedded systems, the performance is limited due to their small size and low energy consumption. The use of a standard GPU (not an embedded one) would give better results; however, it is not tested here, as the intention is to develop an autonomous tracker devoted to embedded systems.

5 Conclusion

In this paper, a new GPU-based parallel implementation of the Chromatic Co-occurrence Matrices (CCM) is presented. To test the quality of the proposed implementation, we developed a real-time tracking system of moving objects based on the CCM as an object descriptor and particle filtering as a tracking process. This tracker is then integrated into an embedded system. The new implementation assigns to each pixel a specific thread that computes the position of the value to increment in the CCM matrix. This thread is responsible for calculating all possible combinations of this pixel with its 3 × 3 neighborhood. The proposed implementation accounts for the limit on the number of threads in a thread block by introducing a new parameter K. This parameter subdivides the original image into multiple sub-images. Every portion of the image is assigned to a specific thread block. To improve the computation time of the tracking system, we use streams to execute multiple parallel CCM computations at the same time. The experimental results are obtained using the embedded Jetson TX1 system from NVIDIA. The comparative study shows that the parallel version on the GPU completes the tracking of all moving objects in the scene in a shorter time than the CPU version. The GPU implementation speeds up the tracking process by more than 150%; this performance could be boosted further by using a standalone standard GPU unit.

Acknowledgements  We gratefully acknowledge the support of NVIDIA Corporation with the donation of the Jetson TX1 onboard card used for this research.

References

1. Elafi, I., Jedra, M., Zahid, N.: Unsupervised detection and tracking of moving objects for video surveillance applications. Pattern Recognit. Lett. 84, 70–77 (2016)
2. Chua, J.-L., Chang, Y.C., Lim, W.K.: A simple vision-based fall detection technique for indoor video surveillance. Signal Image Video Process. 9(3), 623–633 (2013)
3. Zhang, S., Zhou, H., Zhang, B., Han, Z., Guo, Y.: "Signal, image and video processing" special issue: semantic representations for social behavior analysis in video surveillance systems. Signal Image Video Process. 8(1), 73–74 (2014)
4. Abdi, L., Meddeb, A.: In-vehicle augmented reality TSR to improve driving safety and enhance the driver's experience. Signal Image Video Process. 12(1), 75–82 (2017)
5. Wang, J., Zhang, L., Zhang, D., Li, K.: An adaptive longitudinal driving assistance system based on driver characteristics. IEEE Trans. Intell. Transp. Syst. 14(1), 1–12 (2013)
6. Ding, S., Zhai, Q., Li, Y., Zhu, J., Zheng, Y.F., Xuan, D.: Simultaneous body part and motion identification for human-following robots. Pattern Recognit. 50, 118–130 (2016)
7. Maglietta, R., Milella, A., Caccia, M., Bruzzone, G.: A vision-based system for robotic inspection of marine vessels. Signal Image Video Process. 12(3), 471–478 (2017)
8. Piccardi, M.: Background subtraction techniques: a review. In: IEEE International Conference on Systems, Man and Cybernetics, vol. 4, pp. 3099–3104, The Hague, Netherlands, Oct. 2004
9. Fang, J., Wang, Q., Yuan, Y.: Part-based online tracking with geometry constraint and attention selection. IEEE Trans. Circuits Syst. Video Technol. 24(5), 854–864 (2014)
10. Lan, X., Zhang, S., Yuen, P.C.: Robust joint discriminative feature learning for visual tracking. In: 25th International Joint Conference on Artificial Intelligence, pp. 3403–3410, New York, USA, Jul. 2016
11. Lan, X., Yuen, P.C., Chellappa, R.: Robust MIL-based feature template learning for object tracking. In: Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pp. 4118–4125, San Francisco, USA, 2017
12. Liu, R., Lan, X., Yuen, P.C., Feng, G.C.: Robust visual tracking using dynamic feature weighting based on multiple dictionary learning. In: 24th European Signal Processing Conference (EUSIPCO), pp. 2166–2170, Budapest, Hungary, Aug. 2016


13. Tian, S., et al.: Multilingual scene character recognition with co-occurrence of histogram of oriented gradients. Pattern Recognit. 51, 125–134 (2016)
14. Wang, Q., Fang, J., Yuan, Y.: Multi-cue based tracking. Neurocomputing 131, 227–236 (2014)
15. Arvis, V., Debain, C., Berducat, M., Benassi, A.: Generalization of the co-occurrence matrix for colour images: application to colour texture classification. Image Anal. Stereol. 23(1), 63–72 (2011)
16. Elafi, I., Jedra, M., Zahid, N.: Tracking occluded objects using chromatic co-occurrence matrices and particle filter. Signal Image Video Process. 12(7), 1227–1235 (2018)
17. Elafi, I., Jedra, M., Zahid, N.: Fuzzy chromatic co-occurrence matrices for tracking objects. Pattern Anal. Appl. (2018). https://doi.org/10.1007/s10044-018-0726-z
18. Haralick, R.M., Shanmugam, K., Dinstein, I.: Textural features for image classification. IEEE Trans. Syst. Man Cybern. SMC-3(6), 610–621 (1973)
19. Dixon, J., Ding, J.: An empirical study of parallel solutions for GLCM calculation of diffraction images. In: 38th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 3969–3972, Orlando, Florida, USA (2016)
20. Liu, K.-Y., Li, Y.-H., Li, S., Tang, L., Wang, L.: A new parallel particle filter face tracking method based on heterogeneous system. J. Real-Time Image Process. 7(3), 153–163 (2012)
21. Laborda, M.A.M., Moreno, E.F.T., del Rincón, J.M., Jaraba, J.E.H.: Real-time GPU color-based segmentation of football players. J. Real-Time Image Process. 7(4), 267–279 (2012)
22. Gómez-Luna, J., González-Linares, J.M., Benavides, J.I., Guil, N.: An optimized approach to histogram computation on GPU. Mach. Vis. Appl. 24(5), 899–908 (2013)
23. Franco, J., Bernabé, G., Fernández, J., Ujaldón, M.: The 2D wavelet transform on emerging architectures: GPUs and multicores. J. Real-Time Image Process. 7(3), 145–152 (2012)
24. Amamra, A., Aouf, N.: GPU-based real-time RGBD data filtering. J. Real-Time Image Process. 14(2), 323–340 (2018)
25. Doucet, A., Godsill, S., Andrieu, C.: On sequential Monte Carlo sampling methods for Bayesian filtering. Stat. Comput. 10(3), 197–208 (2000)
26. Arulampalam, M.S., Maskell, S., Gordon, N., Clapp, T.: A tutorial on particle filters for online nonlinear/non-Gaussian Bayesian tracking. IEEE Trans. Signal Process. 50(2), 174–188 (2002)
27. Elafi, I., Jedra, M., Zahid, N.: Tracking objects with co-occurrence matrix and particle filter in infrared video sequences. IET Comput. Vis. 12(5), 634–639 (2018)
28. Øivind, S., Erik, B., Lars, H.: Improved sampling-importance resampling and reduced bias importance sampling. Scand. J. Stat. 30(4), 719–737 (2003)
29. Elafi, I., Jedra, M., Zahid, N.: A novel particle swarm tracking system based on chromatic co-occurrence matrices. In: 2018 International Conference on Intelligent Systems and Computer Vision (ISCV), pp. 1–8, Fez, Morocco (2018)
30. Wu, Y., Lim, J., Yang, M.H.: Object tracking benchmark. IEEE Trans. Pattern Anal. Mach. Intell. 37(9), 1834–1848 (2015)
31. OTB. http://cvlab.hanyang.ac.kr/tracker_benchmark/. Accessed Jan 2019
32. Oron, S., Bar-Hillel, A., Levi, D., Avidan, S.: Locally orderless tracking. Int. J. Comput. Vis. 111(2), 213–228 (2014)
33. Ross, D.A., Lim, J., Lin, R.-S., Yang, M.-H.: Incremental learning for robust visual tracking. Int. J. Comput. Vis. 77(1–3), 125–141 (2007)
34. Zhang, K., Zhang, L., Yang, M.H.: Fast compressive tracking. IEEE Trans. Pattern Anal. Mach. Intell. 36(10), 2002–2015 (2014)
35. Sevilla-Lara, L., Learned-Miller, E.: Distribution fields for tracking. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1910–1917, Providence, Rhode Island (2012)
36. Dinh, T.B., Vo, N., Medioni, G.: Context tracker: exploring supporters and distracters in unconstrained environments. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1177–1184, Colorado Springs (2011)
37. Zhong, W., Lu, H., Yang, M.H.: Robust object tracking via sparsity-based collaborative model. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 1838–1845, Colorado Springs (2011)
38. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: Exploiting the circulant structure of tracking-by-detection with kernels. In: European Conference on Computer Vision (ECCV), pp. 702–715, Firenze, Italy (2012)
39. Bao, C., Wu, Y., Ling, H., Ji, H.: Real time robust L1 tracker using accelerated proximal gradient approach. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1830–1837, Providence, Rhode Island (2012)
40. Jia, X., Lu, H., Yang, M.H.: Visual tracking via adaptive structural local sparse appearance model. In: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pp. 1822–1829, Providence, Rhode Island (2012)
41. Zhang, T., Ghanem, B., Liu, S., Ahuja, N.: Robust visual tracking via multi-task sparse learning. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2042–2049, Providence, Rhode Island (2012)
42. Henriques, J.F., Caseiro, R., Martins, P., Batista, J.: High-speed tracking with kernelized correlation filters. IEEE Trans. Pattern Anal. Mach. Intell. 37(3), 583–596 (2015)
43. Zhang, B., et al.: Output constraint transfer for kernelized correlation filter in tracking. IEEE Trans. Syst. Man Cybern. Syst. 47(4), 693–703 (2017)
44. PETS. ftp://pets.rdg.ac.uk/pub/PETS2000/. Accessed 07 May 2015
45. CAVIAR. http://homepages.inf.ed.ac.uk/rbf/CAVIARDATA1/. Accessed 07 May 2015

Publisher's Note  Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Issam Elafi  Issam Elafi holds a Ph.D. degree from the Faculty of Science, Mohammed V University in Rabat, and received the MS degree in computer science and embedded systems in 2012. Currently, he is a post-doctoral researcher at the Conception and Systems Laboratory, Mohammed V University in Rabat, Morocco. His research interests include pattern recognition, image analysis and video processing.

Mohamed Jedra  Mohamed Jedra holds « Doctorat de Troisième Cycle » and « Doctorat d'Etat » degrees in Electronics Engineering and Informatics, all from Mohammed V University in Rabat, Morocco. From 1990 to 1999, he was the Network and Internet Center Manager in the Faculty of Sciences and Assistant Professor at the Department of Physics. In 1999, he became a Professor Habilité of Informatics in the same Department, and in 2003 he was promoted to the position of Professor. From 1987 to the present, he has been a member of the Conception & Systems Laboratory. He is the co-founder and the current Chair of the Informatics, Automatics and Embedded Systems Master (IASE) in the same faculty. His main research interests include machine learning, pattern recognition, biometrics and video processing. He has been a reviewer for several conferences dealing with topics related to machine learning and pattern recognition. He has published numerous refereed papers in specialized journals and conferences.


Noureddine Zahid  Noureddine Zahid holds « Doctorat de Troisième Cycle » and « Doctorat d'Etat » degrees in Electronics Engineering and Informatics, all from Mohammed V University in Rabat, Morocco. He was an Assistant Professor in computer science at the Department of Physics between 1988 and 1999, and Professor « Habilité » between 1999 and 2003. In 2003, he was promoted to the position of Professor in the same department. From 1987 to the present, he has been a member of the Conception & Systems Laboratory. He is the co-founder and the Chair of the Informatics, Telecommunications and Imagery Master (ITI) in the Faculty of Science in Rabat. He has presented and published many articles in scientific journals and conferences. He has reviewed numerous papers for international journals in the fields of fuzzy systems and pattern recognition. His research interests lie in the areas of fuzzy logic, pattern recognition, machine learning and biometrics.
