
2012 Fifth International Symposium on Computational Intelligence and Design

Parallel Algorithm for Moving Foreground Detection in Dynamic Background

YI YANG, WENJIE CHEN


School of Automation, Beijing Institute of Technology, Beijing, 100081
Beijing Key Laboratory of Automatic Control System (Beijing Institute of Technology)
Beijing, China
e-mail: yyhengwuyuan73@yahoo.com.cn, chen.wenjie@163.com

Abstract—Foreground detection in dynamic background has become a hot topic in video surveillance in recent years. In this paper we propose a new foreground detection approach for dynamic background based on the GPU. With the proposed method, SIFT features are first extracted from two adjacent frames of the video sequence and used to compute the parameters of an affine transform model and to perform global motion compensation. An improved background subtraction approach with a dynamic background updating module is then adopted to detect foreground objects. The GPU is used to improve application performance: combined with CUDA, the three main algorithm modules, namely the Global Motion Compensation Module, the Background Updating Module and the Foreground Detection Module, are optimized. The GPU and CPU are used as a combined computing unit, which makes good use of the GPU's strong parallel computing ability. The effectiveness of the method is demonstrated, and contrastive experiments on processing time show that the proposed GPU-based algorithm is faster.

Keywords—dynamic background; affine transform; foreground detection; GPU parallel computing

I. INTRODUCTION

At present, there are two dominant methods for moving foreground detection: the optical flow method and the motion compensation method. Unfortunately, owing to its high computational complexity and poor noise robustness, the optical flow method can only meet real-time requirements with special hardware. Among the motion compensation based methods, global motion compensation is the more helpful. The main idea of this method is to estimate the motion parameters of the camera through image matching, but complex computation and excessive processing remain serious problems for this approach [1,2].

With the rapid development of parallel computation, the GPU (Graphics Processing Unit) has become more and more attractive due to its powerful parallel computing ability. CUDA (Compute Unified Device Architecture) is a general-purpose GPU computing platform released by NVIDIA in 2007. It provides a new structure that eliminates many of the constraints of earlier GPU computing models.

Considering the problems above, we propose a parallel algorithm for moving foreground detection in dynamic background. It consists of four modules: image matching between frames, global motion compensation, background updating and moving foreground detection. We combine the last three modules with CUDA to improve the detection performance, so that the GPU's strong parallel computing ability is fully used and the computation is divided sensibly between the CPU and the GPU.

We adopt a function-structured, modularized design concept to develop our moving foreground detection system based on CPU and GPU in dynamic background. The complete system is designed as shown in Figure 1.

Figure 1. The flow diagram of the detection system in this paper: video input, image matching between frames, global motion compensation, background updating and moving foreground detection, divided between CPU and GPU.

II. ALGORITHM FOR MOVING FOREGROUND DETECTION IN DYNAMIC BACKGROUND

The algorithm in this paper consists of the four parts shown in Figure 1. The complete algorithm is shown in Figure 2.

A. Image Matching between Frames

The SIFT (Scale Invariant Feature Transform) algorithm is applied for image feature extraction. As a feature-point based matching algorithm, SIFT can handle matching between two images related by translation, rotation and affine transformation. Its strong matching ability and good robustness have made SIFT widely used in image matching. The principal steps of the SIFT algorithm are extreme point detection, key-point confirmation in scale space, assignment of direction parameters to each key-point, and key-point descriptor generation. The Euclidean distance is used to measure the distance between two SIFT feature vectors. To obtain accurate parameters for the affine transformation model, the RANSAC method is used to eliminate abnormal feature points [3,4].

B. Global Motion Compensation

Global motion compensation is an essential part of moving target detection in dynamic background. In this paper, we adopt an affine model with six parameters to describe the global image motion. This model is widely used because of its invariance under common motions and variations such as translation, scaling and 2D rotation [5].
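As an illustration of the matching-and-estimation step in Sections II.A and II.B, the sketch below fits the six affine parameters to matched point pairs and uses a simple RANSAC loop to discard abnormal matches. This is a minimal NumPy stand-in on synthetic correspondences, not the paper's implementation; the function names and the specific RANSAC settings (sample size 3, 200 iterations, 1-pixel inlier threshold) are our own assumptions.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares fit of the six-parameter affine model
    x' = a1*x + a2*y + a3,  y' = a4*x + a5*y + a6
    from matched point pairs; src and dst are (N, 2) arrays."""
    n = len(src)
    A = np.zeros((2 * n, 6))
    A[0::2, 0:2] = src   # a1, a2 multiply (x, y) in the x' equation
    A[0::2, 2] = 1.0     # a3: x translation
    A[1::2, 3:5] = src   # a4, a5 multiply (x, y) in the y' equation
    A[1::2, 5] = 1.0     # a6: y translation
    params, *_ = np.linalg.lstsq(A, dst.reshape(-1), rcond=None)
    return params        # (a1, a2, a3, a4, a5, a6)

def apply_affine(params, pts):
    """Apply the six-parameter model to an (N, 2) array of points."""
    a1, a2, a3, a4, a5, a6 = params
    x, y = pts[:, 0], pts[:, 1]
    return np.stack([a1 * x + a2 * y + a3, a4 * x + a5 * y + a6], axis=1)

def ransac_affine(src, dst, n_iter=200, thresh=1.0, seed=0):
    """RANSAC: fit on 3 random matches per iteration, keep the model
    with the most inliers, then refit on all inliers."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_iter):
        idx = rng.choice(len(src), size=3, replace=False)
        trial = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(trial, src) - dst, axis=1)
        inliers = err < thresh
        if best is None or inliers.sum() > best.sum():
            best = inliers
    return fit_affine(src[best], dst[best])
```

With clean synthetic matches plus a few gross outliers, ransac_affine recovers the generating parameters; in the real system, the (src, dst) pairs would come from SIFT matching between two adjacent frames.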

978-0-7695-4811-1/12 $26.00 © 2012 IEEE
DOI 10.1109/ISCID.2012.270
Figure 2. The complete algorithm in this paper: SIFT extraction and matching between the last frame and the current frame, global motion compensation, frame differencing against the background image, the background updating module, foreground object output and image display.

The affine model with six parameters is shown below:

    f_x^t = a1 * f_x^(t-1) + a2 * f_y^(t-1) + a3
    f_y^t = a4 * f_x^(t-1) + a5 * f_y^(t-1) + a6        (1)

where f_x^t and f_y^t represent the x and y coordinates of a pixel in the image at time t, (a1, a2, a4, a5) are the parameters describing image rotation and scaling, and (a3, a6) are the parameters describing translation.

C. Background Updating Model and Foreground Detection

In order to adapt to dynamic background, we propose a new background updating model that differs from traditional background subtraction. The background updating model is a crucial component of our improved background subtraction.

In this paper, the new background is made up of two parts: the background area of the current frame, and the foreground of the current frame after affine transformation, which is the estimate of the foreground motion. The updating model is shown below:

    B_(t+1)(x, y) = bk_t(x, y) + F(fr_t(x, y))        (2)

where B_(t+1)(x, y) is the new background image after updating, bk_t(x, y) is the background area of the current frame, fr_t(x, y) is the foreground area of the current frame, and F(·) is the affine transformation.

After obtaining the new background image for our improved background subtraction, we can detect the foreground of the current input frame. The method we adopt is shown below:

    M(x, y) = 0,  if |f_t(x, y) − F(B_t(x, y))| < Th
    M(x, y) = 1,  if |f_t(x, y) − F(B_t(x, y))| ≥ Th        (3)

where M(x, y) is the detection result, in which 0 means the pixel belongs to the background area and 1 means it belongs to the foreground area; Th is the threshold; f_t(x, y) is the current input frame; and B_t(x, y) is the corresponding background image.

III. ALGORITHM OPTIMIZATION BASED ON GPU

Based on the above analysis, there is a large amount of cumulative and matrix calculation in the global motion compensation model, the background updating model and the foreground detection model. The following sections transplant these three modules to the CUDA platform for optimization.

A. The Division of Tasks

CUDA is a hardware and software system that treats the GPU as a data-parallel computing device. The CUDA programming model regards the CPU as the host and the GPU as a co-processor or device. In this model, the CPU and GPU work together, each taking its own responsibility: the CPU takes charge of highly logical transaction processing and serial computing, while the GPU executes the highly threaded parallel processing tasks. Each has an independent memory address space: host-side memory and device-side memory. Memory operation in this algorithm is basically the same as in a general C program, except that memory operations must call the memory management functions of the CUDA API. These operations include allocating, releasing and initializing memory space, as well as transferring data between the host and the device. For better utilization of the GPU's parallel processing capabilities, the task is divided as shown in Figure 3.

B. Algorithm Transplant

A kernel (kernel function) running on the GPU is organized as a thread grid (grid). Each grid is composed of a number of thread blocks (block), and each thread block consists of many threads (thread), the smallest execution unit of the kernel function. Threads in the same block can be synchronized quickly and share a limited amount of shared memory. The number of threads per block is limited (GPU dependent). Blocks that execute the same procedure are composed into a grid [6-8].

The affine transformation in the global motion compensation module, the frame difference in the foreground detection, and the key calculation steps of the background update, shown in (1)-(3), are typical image manipulations and typical matrix operations. They occupy a lot of processing time: the larger the image, the longer the processing time, and the harder it is to guarantee real-time video processing. Considering the large amount of matrix computation in these three models, this paper uses the GPU to achieve parallel computing acceleration.
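The per-pixel operations that get transplanted, the affine map (1), the background update (2) and the thresholded frame difference (3), can be paraphrased in vectorized NumPy, where each output element is computed independently, just as one CUDA thread would compute it. This is an illustrative sketch only (grayscale images, nearest-neighbour inverse warping, and a literal reading of (2) are our assumptions); the paper's real kernels are written in CUDA C and are not reproduced here.

```python
import numpy as np

def affine_warp(img, params):
    """F(.): warp a grayscale image with the six-parameter model (1),
    using inverse mapping with nearest-neighbour sampling so that every
    output pixel is an independent computation (one 'thread' each)."""
    a1, a2, a3, a4, a5, a6 = params
    h, w = img.shape
    inv = np.linalg.inv(np.array([[a1, a2], [a4, a5]]))
    ys, xs = np.mgrid[0:h, 0:w]                          # output coordinates
    sx = inv[0, 0] * (xs - a3) + inv[0, 1] * (ys - a6)   # source x
    sy = inv[1, 0] * (xs - a3) + inv[1, 1] * (ys - a6)   # source y
    sx = np.clip(np.rint(sx).astype(int), 0, w - 1)
    sy = np.clip(np.rint(sy).astype(int), 0, h - 1)
    return img[sy, sx]

def detect_foreground(frame, background, th=15):
    """Eq. (3): M = 1 where |f_t - B_t| >= th, else 0; the background is
    assumed already motion-compensated. th = 15 as in the experiments."""
    diff = np.abs(frame.astype(np.int32) - background.astype(np.int32))
    return (diff >= th).astype(np.uint8)

def update_background(frame, mask, params):
    """Eq. (2), read literally: B_{t+1} = bk_t + F(fr_t), i.e. the
    background part of the current frame plus its affine-warped
    foreground part."""
    fr = np.where(mask == 1, frame, 0)   # fr_t: foreground area
    bk = np.where(mask == 0, frame, 0)   # bk_t: background area
    return bk + affine_warp(fr, params)
```

For example, with identity parameters (1, 0, 0, 0, 1, 0) the warp is a no-op, and a single bright pixel against a flat background yields a one-pixel foreground mask.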
The serial part of the program will not be repeated here. The parallel algorithm procedure used in the computation of the three models is shown in Figure 4.

Figure 3. The division of tasks between GPU and CPU: frame creation and initialization, feature extraction and matching between frames, and image display run on the CPU, while the affine transformation with global motion compensation, frame-difference foreground detection and the background updating model run on the GPU.

Figure 4. The flow diagram of the parallel algorithm: the host allocates graphics memory, transmits data from main memory to graphics memory and launches the GPU kernel (calculate thread index; affine transformation, frame difference or background updating; summation or subtraction; calculate new coordinates); the new coordinates are then transmitted from device to host, the images are displayed, and the graphics memory is released.

Pretreatment consists of a series of preparatory work after starting CUDA, including defining the number of thread blocks in the grid, defining the number of threads in each thread block, and allocating memory for the data on both the host side and the device side. After the data is received, it is passed to the CUDA program: using the CUDA API, data in host memory is read into device memory and then passed to the kernel function. The kernel function performs the appropriate processing on the GPU, and the results are copied from device memory back to main memory. The CPU is responsible for displaying the result image. The work of the CPU serial code includes data preparation and device initialization before the kernel starts, covering the creation and initialization of the video image for each frame, image feature matching, some serial computation, de-noising, and the final image display.

The frame difference operation that detects the moving foreground (3) and the background updating model (2) are image additions and subtractions, so they can be regarded as typical matrix operations. The frame size in our experiment is 640 × 480, so in CPU mode every image addition or subtraction requires 640 × 480 operations. Taking into account that the experiment uses an NVIDIA GeForce 9800 GT GPU with compute capability 1.1, when using the GPU for optimization the two-dimensional grid is defined as (10, 120, 1) and the two-dimensional thread block as (64, 4). In other words, the grid has 10 blocks in each row and 120 blocks in each column, and each block has 64 threads in each row and 4 threads in each column. The product of the thread-block dimensions is 64 × 4 = 256, which is below the upper limit of 768 threads per block for compute capability 1.1 hardware. The grid definition is shown in Figure 5. In total there are 640 × 480 threads, and each thread only needs to compute one corresponding element, which it can easily locate through its unique thread identifier.

For the affine transformation part, according to the six-parameter affine model (1), each affine transformation in the serial code requires four multiplications and four additions; for one frame this amounts to 4 × 640 × 480 multiplications and 4 × 640 × 480 additions. Therefore, using the GPU parallel mode with the same two-dimensional grid (10, 120, 1) and thread block (64, 4) defined above, multiple threads work simultaneously to reduce the computing time and achieve acceleration.

Figure 5. The layout of the grid and threads on the GPU: a 10 × 120 grid of blocks, each block containing 64 × 4 threads (10 × 64 = 640 columns, 120 × 4 = 480 rows).

IV. MAIN EXPERIMENTS AND RESULTS

A. Effectiveness Verification of the Detection Method in Dynamic Background

First, we verify the effectiveness of our algorithm on the CPU with an indoor pedestrian detection experiment. The resolution of the moving camera in this experiment is 640 × 480. Our method is run on a PC with an Intel(R)
Core(TM)2 Duo CPU at 2.4 GHz, using VC2010, Intel OpenCV and CUDA Toolkit 4.0. The graphics card is an NVIDIA GeForce 9800 GT with compute capability 1.1. The foreground threshold is 15.

The result of the indoor pedestrian detection experiment is shown in Figure 6. Figure 6(a) shows four original images extracted from the original video sequence, and Figure 6(b) shows the corresponding foreground detection results. The results show that our method is effective for moving foreground detection under dynamic background: SIFT feature matching and affine-transformation-based global motion compensation serve the detection in dynamic background well. The detection system, which consists of four parts (image matching between frames, global motion compensation, background updating and moving foreground detection), performs well in dynamic background.

Figure 6. (a) Original video sequences; (b) results of detection

B. Performance Testing of GPU Optimization

TABLE I. THE RESULT OBTAINED BY CONTRASTIVE EXPERIMENTS: for each of five experiment groups (I-V), the average processing time of the three parts before optimizing (s), after optimizing (s), and the resulting speed-up ratio.

The experiment environment is the same as above. We compare the processing time before and after GPU optimization of the three parts: global motion compensation, background updating and moving foreground detection. The results are shown in Table I. The average processing time is calculated according to the following steps: 1) pick 10 frames at random from the video sequence; 2) measure the processing time of the three parts for each frame; 3) calculate the average. After that, the speed-up ratio is calculated. This constitutes one group of experiments; we conducted five such groups altogether.

The results of the contrastive analysis show that the detection method proposed in this paper makes full use of the GPU's powerful parallel computing ability, and that moving foreground detection in dynamic background is successfully accomplished.

V. CONCLUSION

In this paper, we propose a new GPU-based foreground detection approach for dynamic background. The proposed method first extracts SIFT features from two adjacent frames of the video sequence, then computes the parameters of the affine transform model used for global motion compensation, and finally adopts an improved background subtraction approach to detect foreground objects. The background updating model leads to more accurate detection. Combined with the GPU, the three main algorithm modules, the Global Motion Compensation Module, the Background Updating Module and the Foreground Detection Module, have been optimized. The GPU and CPU are used as a combined computing unit, which makes good use of the GPU's powerful parallel computing ability. The effectiveness of the method and the speed-up obtained with the GPU have been demonstrated. Further improving the real-time processing ability is the direction of our future work.

ACKNOWLEDGEMENTS

We would like to thank Professor LiHua Dou for discussions. This work is sponsored by the Ordinary University Key Laboratory of Beijing, SYS100070417.

REFERENCES
[1] Bo Gao, Guobin Bao. Dynamic object detection and tracking based on dynamic image processing technology [J]. Photoelectric Technology, 2010, 25(4), pp. 73-76.
[2] Jun Lu, Fengling Li, Mai Jiang. Dynamic object detection and tracking under camera movement [J]. Journal of Harbin Engineering University, 2008, 29(8), pp. 831-835.
[3] Hua Nian. GPU general computing and parallel algorithms for SIFT-feature-based image matching [D]. Xi'an: master thesis, University of Electronic Science and Technology, 2010.
[4] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints [J]. International Journal of Computer Vision, 2004, 60(2), pp. 91-110.
[5] Xiaofeng Yang, Guilin Zhang. A target tracking algorithm based on an affine transformation model [J]. Computer and Digital Engineering, 2005, 33(12), pp. 30-34.
[6] Feng Cheng, Dehua Li. The Adaboost algorithm based on CUDA and its parallel implementation [J]. Computer Engineering and Science, 2011, 33(2), pp. 118-123.
[7] Hao Deng, Fei Yang, Xudong Pan, Anqing You. An implementation with CUDA of real-time target tracking based on an upper machine [J]. Information and Electronic Engineering, 2010, 3(6), pp. 368-371.
[8] Qing Zhong, Shungang Hua. GPU-accelerated calculation of affine transformations of complex grid models [J]. Photoelectric Technology Application, 2011, 26(1), pp. 59-62.
