Conference Paper · December 2013
DOI: 10.1109/GCIS.2013.17


A Pedestrian Detection and Tracking System Based on Video Processing
Technology
Yuanyuan Chen1,2, Shuqin Guo2, Biaobiao Zhang1, K.-L. Du1

1. Enjoyor Labs, Enjoyor Inc., Hangzhou, China
2. College of Information Engineering, Zhejiang University of Technology, Hangzhou, China

Abstract—Pedestrian detection and tracking are widely applied in intelligent video surveillance, intelligent transportation, and automotive autonomous driving or driving-assistance systems. We select OpenCV as the development tool for implementing pedestrian detection, tracking, counting, and risk warning in a video segment. We introduce a low-dimensional soft-output SVM pedestrian classifier to implement precise pedestrian detection. Experiments indicate that the system has high recognition accuracy and can operate in real time.

Keywords—pedestrian detection; pedestrian tracking; pedestrian counting; risk warning

I. INTRODUCTION

Pedestrian detection and tracking are basic functions in intelligent video surveillance and intelligent transportation [1]. Their performance has a great impact on pedestrian counting, the capture of pedestrians running red lights, and other behavior analysis. Specifically, the technology automatically detects targets of interest in an image sequence, and continuously locates them in the subsequent frames. Currently, the technology is widely used in banks, military sites, transportation, supermarkets, warehouses, and other locations with high security requirements.

II. SYSTEM COMPOSITION

The system is developed on the Visual Studio 2010 platform. We select OpenCV as the development tool, and improve several functions for implementing pedestrian detection, tracking, counting, and risk warning in a video segment. Fig. 1 gives the system block diagram comprising five modules. The user interface is used to load and play a video, and to display the system functions. The foreground objects are obtained by differencing with the background. Pedestrians are screened from the foreground objects by using pedestrian physical characteristics as well as a low-dimensional soft-output SVM pedestrian classifier. Finally, pedestrians of interest are tracked and their trajectories are plotted.

Figure 1. System block diagram.

III. WORKING PRINCIPLE OF THE SYSTEM

A. Pedestrian detection

Pedestrian detection is to detect pedestrians in each frame of a video and to sequentially store them into a container. For video processing with a fixed camera, pedestrian detection can in general be approached by the optical flow method, the inter-frame difference method, or the background subtraction method. The background subtraction method [2] is simple and easy to implement, and is the one used in this paper. Fig. 2 gives a block diagram of the pedestrian detection algorithm, comprising two modules, namely a main thread module and a support thread module. The features of the two threads are given in Fig. 2.

1) Extracting foreground objects

The background difference method performs background modeling. The Gaussian mixture model [3] is one of the most successful background modeling methods. It represents the characteristic of each pixel in the image using Gaussian probability density functions. Let G_t be the background image at time t. Create a Gaussian mixture model for each pixel in the background image at time t:

P(G_t) = \sum_{m=1}^{k} \omega_{m,t} f(G_t, \mu_{m,t}, \sigma_{m,t}^2),  (1)

where \omega_{m,t}, \mu_{m,t}, \sigma_{m,t}^2 are, respectively, the weighting coefficient, the mean, and the variance of the m-th component, and f(G_t, \mu_{m,t}, \sigma_{m,t}^2) is the distribution function of the Gaussian component at time t.

As time passes, the background image slowly changes, so we have to constantly update the Gaussian mixture model:

\omega_{m,t+1} = (1 + \alpha)\omega_{m,t},  \mu_{m,t+1} = \beta\mu_{m,t} + (1 - \beta) I(x, y, t),  (2)

where \alpha is the weight update parameter of the background model, \beta is the mean update parameter of the background model, and I(x, y, t) is the gray value of the image I at pixel (x, y).

For the Gaussian mixture model given by (1), P(G_t) has two parameters: the mean \mu and the variance \sigma^2. For an image I(x, y, t), match each pixel I(x, y) to the corresponding background model P(G_t): check whether it satisfies

e^{-(I(x, y) - \mu(x, y))^2 / (2\sigma^2(x, y))} > T,  (3)
where T is a preset threshold, 0.7 ≤ T ≤ 0.75. If (3) is satisfied, (x, y) is judged as a background point; otherwise, it is judged a foreground point.

Figure 2. The block diagram of pedestrian detection.

2) Screening foreground objects

Screening foreground objects is to identify pedestrians among the foreground objects. The traditional method generally selects the most obvious shape features, such as aspect ratio and size, for detection. This method can quickly remove objects with a large deviation from pedestrian shape, such as vehicles. However, this shape-based method has low accuracy. Thus, we introduce a classifier to improve the detection accuracy.

Let the size of a bounding rectangle be S = W × H. Due to the varying distance between an object and the camera, an object's size varies widely. Thus, we divide the video frame into three regions along the horizontal or vertical axis: [0, x_0), [x_0, x_1), [x_1, x_2). Judge the condition

0.001 < \delta S < 0.01, if 0 ≤ x < x_0,
0.002 < \delta S < 0.05, if x_0 ≤ x < x_1,   (4)
0.004 < \delta S < 0.1, if x_1 ≤ x < x_2.

Using (4), we remove objects that are too large or too small. Further, if the aspect ratio of the object bounding rectangle satisfies

0.2 < scale = width / height < 0.8,   (5)

we determine the object as a pedestrian. As pedestrians tend to be tall and narrow, they can be well separated by this ratio.

We further use classification to implement pedestrian detection. We treat each object as a separate region and perform normalization and feature extraction. Then, we classify the processed objects one by one. For an arbitrary static image, these objects usually occupy small areas in each frame. Thus, the feature dimension extracted from each image is greatly reduced, leading to a substantially decreased complexity for classification. We present a low-dimensional soft-output SVM pedestrian classifier.

We first extract HOG features, and then select a support vector machine (SVM) with a Gaussian kernel as the classifier. The method is implemented as follows.
(a) Select positive and negative samples for training the classifier. We select 800 positive samples and 500 negative samples from the INRIA pedestrian database, which are already normalized to 64 × 128 [4].
(b) Reduce the feature dimension of the sample set. If a sample is positive, trim the pedestrian area and normalize it to 32 × 64; otherwise, directly normalize the sample to 32 × 64.
(c) For each sample, extract HOG features [5]. A sample is first divided into blocks of 16 × 16 pixels; then, each block is divided into 8 × 8 pixel cells, and the step is 8 pixels. The original HOG feature dimension is 7 × 15 × 36 = 3780. After reducing the feature dimension, the HOG feature dimension is 3 × 7 × 36 = 756. In summary, the computational complexity is reduced by a factor of 5.
(d) We choose the efficient LIBSVM classifier, and use the soft output in [0, 1] instead of {−1, 1} [6].
(e) Train the LIBSVM classifier with the HOG features as input to get a low-dimensional soft-output pedestrian classifier. Meanwhile, sequentially save in a queue those foreground objects without a clear classification, together with the frame numbers of the images in which the objects appear.
(f) Extract HOG features of 756 dimensions for each object. Input the HOG features of each object to the trained pedestrian classifier, and the output gives whether the object is a pedestrian.

3) Correcting the wrong output of the main thread

In the main thread module, the foreground objects without a clear classification have been saved sequentially in the queue. We process these objects in the support thread module as follows. First, when the count of foreground images stored in the queue reaches 10, the support thread starts to work. The system reads these images in FIFO order and normalizes them to size 64 × 128. Then, it extracts HOG features of 3780 dimensions for each image, and feeds them to the trained pedestrian classifier. Compare this result with that of the low-dimensional soft-output SVM classifier. If they are the same, go to the next foreground object in the queue; otherwise, we use this result instead of the previous one.

B. Pedestrian tracking

Pedestrian tracking is to automatically monitor the spatial and temporal changes of each pedestrian in a video sequence [7]. We select the CamShift algorithm, which implements tracking by color characteristics and can effectively handle the problem of object deformation.

In OpenCV, there is a semi-automatic, single-object CamShift algorithm. In practical applications, we expect to track one or several objects, obtain their trajectories, and conduct analysis on the trajectories. Thus, the system implements tracking of multiple objects based on the semi-automatic, single-object CamShift algorithm [8]. The object trajectories are plotted and stored in a specified folder. The working principle of multi-object tracking is given as follows.
(a) Use the mouse to select a region of interest. Set the flag trackObject[i] to indicate whether the i-th object is selected: if unselected, the value is 0; if selected, the value is −1; at the end of the trace, the value is 1. Store all selected areas in the array by their labels.
(b) Get the initial objects and the H-component histogram. Call the setMouseCallback() function to obtain the coordinates of the objects of interest. Call the calcHist() function to calculate the H-component histogram of the object region. Initialize the size and position of the search window, and define the mass-center coordinates y_0.
(c) Use the histogram to calculate the back projection of the input image. This is implemented by calling calcBackProject().
(d) Run the MeanShift tracking algorithm and search the new window area of the object image. Define the zeroth-order moment M_{00} and the first-order moments M_{10}, M_{01} of the search window:

M_{00} = \sum_x \sum_y I(x, y),  M_{10} = \sum_x \sum_y x I(x, y),  M_{01} = \sum_x \sum_y y I(x, y),  (6)

where (x, y) represents a pixel in the search window, and I(x, y) represents the grayscale value of pixel (x, y) in the projection image. The mass-center coordinates of the search window are obtained by

(\bar{x}, \bar{y}) = (M_{10} / M_{00}, M_{01} / M_{00}).  (7)

(e) Move the center of the search window to the mass center y_1, and denote the original mass center by y_0. Let d = ||y_1 − y_0||, \varepsilon be the error threshold, and N the maximum number of iterations. If d < \varepsilon or k > N, end the iteration and return the new target position y_1; otherwise, set y_0 ← y_1 and k = k + 1, and go to (d).
(f) Save the mass center obtained in each frame to a container of vector<Point2f>. The mass centers of every fifth frame are connected into a trace. Thus, we obtain the object trajectories.

Figure 3. Illustration of pedestrian counting.
Figure 4. Main interface.
Figure 5. The secondary interface.
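As an illustration, the moment computation in (6)-(7) and the window update of step (e) can be sketched in plain Python. This is a hypothetical sketch, not the system's actual code: window_centroid and mean_shift are our own names, a nested list stands in for the back-projection image, and the window size and thresholds are illustrative.

```python
def window_centroid(img, x0, y0, w, h):
    """Mass center of a search window from its zeroth- and first-order
    moments, as in (6)-(7); img is a 2-D list of back-projection values."""
    m00 = m10 = m01 = 0.0
    for y in range(y0, min(y0 + h, len(img))):
        for x in range(x0, min(x0 + w, len(img[0]))):
            v = img[y][x]
            m00 += v            # M00
            m10 += x * v        # M10
            m01 += y * v        # M01
    if m00 == 0:
        return None             # empty window: no mass center
    return (m10 / m00, m01 / m00)   # (x_bar, y_bar) = (M10/M00, M01/M00)

def mean_shift(img, x0, y0, w, h, eps=0.5, max_iter=20):
    """Re-center the window on its mass center until the shift d < eps
    or the iteration limit is reached, as in step (e)."""
    for _ in range(max_iter):
        c = window_centroid(img, x0, y0, w, h)
        if c is None:
            break
        # new top-left corner so the window is centered on the mass center
        nx = int(round(c[0] - w / 2))
        ny = int(round(c[1] - h / 2))
        if abs(nx - x0) < eps and abs(ny - y0) < eps:
            break               # converged: shift smaller than threshold
        x0, y0 = nx, ny
    return x0, y0
```

In the actual system, OpenCV's CamShift() performs this iteration internally and additionally adapts the window size to the object.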
C. Pedestrian counting and risk warning

Pedestrian counting is an application of pedestrian detection and tracking [9]. We select a video of size M × N. Each frame is divided into four regions along the x-axis, as shown in Fig. 3. The left and right regions are for direction flagging, and the two in the middle are counting areas. When detecting a pedestrian, we first determine the entering direction. If a pedestrian enters from the left, the left identifier is set to 1 and the right identifier to 0. Then, when the pedestrian enters the left counting region, the count is increased by 1. If the pedestrian enters from the right, the procedure is similar.

Risk warning is another application of pedestrian detection and tracking. For video surveillance in such places as parks, shopping malls, and other scenes, some regions are prohibited for pedestrians. The system realizes automatic detection of such abnormal behavior and issues an alarm. The method is as follows. First, set the forbidden region, such as the green areas in a park or the cashier in a shopping mall. Then, determine whether a pedestrian is within the preset region. If yes, call the alarm function to execute the alarm. Finally, set the effect of the alarm.

IV. SOFTWARE IMPLEMENTATION OF THE SYSTEM

The program runs on a PC with a configuration of Intel i3 (CPU), 2GB DDR3 (memory), and Visual Studio 2010 (development tool).

A. User interface

The user interface provides an interactive communication platform between the user and the computer. It allows us to change system parameters and display system functions. We create a dialog box interface, as shown in Fig. 4. The interface consists of a video loading area and a video processing area. The video loading area is used to play and pause a video source. The video processing area contains two options: Set and Run/Stop. Clicking the Set button pops up a "Detection" dialog box, as shown in Fig. 5, where one can select the processing methods for moving target detection, pedestrian detection, and tracking. Clicking the Run/Stop button displays or stops the processing results in the display. In addition, the display text Warning will automatically alarm when an exception is triggered in a preset area of the display.

B. Software implementation

Many image processing functions in the OpenCV library are used for the system development.

First, load and close a video. A video is composed of a continuous sequence of images. Each image can be read and displayed by setting an appropriate frame number. DrawToHDC() can be called to realize this function. A video can be closed by setting a label.

Then, run the pedestrian detection and tracking module.
• Background modeling. Call the morphologyEx() function to implement the Gaussian mixture model.
• Foreground extraction. Call absdiff(), threshold() to implement the differential operation and binarization, and a binary foreground image is obtained. Then, through the morphological filtering functions erode(), morphologyEx(), dilate(), one can optimize the obtained foreground image.
• Edge detection. Get each foreground contour with findContours().
• Pedestrian recognition.
a) Calculate the size and aspect ratio of the foreground objects.
b) Load the trained SVM classifier. We can call resize() to normalize an input foreground image to 32 × 64.
c) Mark the identified pedestrian with rectangle().
• Pedestrian tracking. Select the objects of interest, and call setMouseCallback() to get the coordinates of these objects. Then, call calcHist(), normalize(), calcBackProject() to get the object histogram and back projection. Finally, implement pedestrian tracking with the CamShift() function.

Finally, run the pedestrian counting and risk warning modules. The procedure of pedestrian counting is given in Section III.C. The procedure for risk warning is given below.
• Add three message response functions to get the coordinates of the forbidden region. OnLButtonDown(): left-click anywhere on the display to get the coordinates of a vertex of the rectangle. OnMouseMove(): drag the mouse to a new location. OnLButtonUp(): release the mouse to obtain the diagonal vertex of the rectangle. The two vertices construct a rectangle parallel to the window.
• Change a rectangle: double-click anywhere in the bounding rectangle to cancel the rectangle. Then we can redraw a rectangle.
• Add a Warning button, initialize it to hidden, and set the alarm mode. Call ShowWindow() to display the button. Call SetDownColor(), SetUpColor() to change the button color. Call ShowWindow() again to hide the button.
• Determine whether a pedestrian is within the preset region. If yes, call the alarm() function to execute the alarm.

C. Experimental results

In order to verify the effectiveness of the system, we selected a surveillance video sequence. The size is 480 × 360, and the frame rate is 25 f/s.

The procedure of pedestrian detection is shown in Fig. 6. In Fig. 6a, we can see that the system can completely detect moving objects (vehicle and pedestrian), and some noise occurs due to antenna jitter. In Fig. 6b, the system can eliminate the noise and identify all pedestrians. Fig. 7 shows that the system can track the objects of interest.

Fig. 8 illustrates pedestrian counting. In Fig. 8a, "in: 2 out: 0" indicates a total of 2 pedestrians entering from the left and a total of 0 entering from the right. In Fig. 8b, "in: 2 out: 1" indicates 2 entering from the left and 1 from the right. Clearly, the system gives accurate counting from both sides.

Risk warning is illustrated in Fig. 9. When someone is trampling the lawn, the system automatically alarms. First, we manually set the region of interest on the lawn. If no one has stepped into the region, the Warning button is hidden. When someone steps into the preset region, the display text "Warning" flashes and the system alarms. At the same time, the system saves the pedestrian trajectory and records a video clip.

Figure 6. Pedestrian detection (the 165th and 210th frames): (a) foregrounds extracted by the background subtraction method; (b) pedestrians detected.
Figure 7. Pedestrian tracking: (a) single pedestrian tracking; (b) multiple pedestrians tracking.
Figure 8. Pedestrian counting: (a) entering from the left; (b) entering from the right.
Figure 9. Risk warning: (a) set a warning zone; (b) warning.
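The directional counting scheme of Section III.C (direction-flag regions at the frame edges, counting regions in the middle, boundaries at M/4, M/2, and 3M/4 as in Fig. 3) can be sketched as a small per-object state machine. This is a hypothetical illustration, not the system's actual code: the Counter class, its method names, and the single-coordinate interface are our own assumptions.

```python
class Counter:
    """Directional pedestrian counter: flag regions at the frame edges,
    counting regions in the middle (region bounds at M/4, M/2, 3M/4)."""

    def __init__(self, frame_width):
        self.m = frame_width
        self.in_count = 0    # pedestrians that entered from the left
        self.out_count = 0   # pedestrians that entered from the right
        self.flags = {}      # object id -> 'L' or 'R' direction flag

    def _region(self, x):
        if x < self.m / 4:
            return 'Lflag'   # left direction-flag region
        if x < self.m / 2:
            return 'Lcount'  # left counting region
        if x < 3 * self.m / 4:
            return 'Rcount'  # right counting region
        return 'Rflag'       # right direction-flag region

    def update(self, obj_id, x):
        """Feed the x coordinate of a tracked pedestrian's mass center."""
        r = self._region(x)
        flag = self.flags.get(obj_id)
        if r == 'Lflag':
            self.flags[obj_id] = 'L'
        elif r == 'Rflag':
            self.flags[obj_id] = 'R'
        elif r == 'Lcount' and flag == 'L':
            self.in_count += 1        # crossed from the left into the counting area
            del self.flags[obj_id]    # clear the flag so this crossing counts once
        elif r == 'Rcount' and flag == 'R':
            self.out_count += 1       # crossed from the right into the counting area
            del self.flags[obj_id]
```

Clearing the flag after counting means a pedestrian walking straight through the frame is counted exactly once.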
V. SUMMARY

In this paper, we developed a pedestrian detection and tracking system, and also applied it to counting and risk warning. We also designed a low-dimensional soft-output SVM pedestrian classifier. The combination of the classifier with the support thread can improve the pedestrian detection accuracy and the speed of operation.

REFERENCES

[1] P. Spagnolo, M. Leo, T. D'Orazio & A. Distante, "Robust moving objects segmentation by background subtraction," Proc. Interactive Services (WIAMIS), Lisboa, Portugal, 2004, pp. 81-84.
[2] J. Rymal, J. Renno, D. Greenhill, J. Orwell & G. A. Jones, "Adaptive eigen-backgrounds for object detection," Proc. IEEE Int. Conf. on Image Processing (ICIP), Singapore, Oct. 2004, pp. 1847-1850.
[3] M. Andriluka, S. Roth & B. Schiele, "Pictorial structures revisited: people detection and articulated pose estimation," Proc. IEEE Conf. on Computer Vision and Pattern Recognition, Miami, Florida, USA, 2009, pp. 1014-1021.
[4] N. Shou, H. Peng, H. Wang, L.-M. Meng & K.-L. Du, "An ROIs based pedestrian detection system for single images," Proc. 5th Int. Congress on Image and Signal Processing (CISP), Chongqing, China, Oct. 2012, pp. 1205-1208.
[5] Q. Zhu, S. Avidan, M.-C. Yeh & K.-T. Cheng, "Fast human detection using a cascade of histograms of oriented gradients," Proc. IEEE Conf. Computer Vision and Pattern Recognition, New York, 2006, pp. 1490-1499.
[6] J. C. Platt, "Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods," Advances in Large Margin Classifiers, MIT Press, 1999.
[7] C. Wren, A. Azarbayejani, T. Darrell & A. Pentland, "Pfinder: real-time tracking of the human body," IEEE Trans. Pattern Anal. Mach. Intell., vol. 19, pp. 780-785, 1997.
[8] M. Andriluka, S. Roth & B. Schiele, "People-tracking-by-detection and people-detection-by-tracking," Proc. IEEE Conf. Computer Vision and Pattern Recognition, Anchorage, Alaska, USA, 2008, pp. 1-8.
[9] Y.-L. Hou & G. K. H. Pang, "People counting and human detection in a challenging situation," IEEE Trans. Syst. Man Cybern., vol. 41, pp. 24-33, 2011.
