Moving/motionless foreground object detection using fast statistical background updating

W-Y Chiu and D-M Tsai*

Department of Industrial Engineering and Management, Yuan-Ze University, Tao-Yuan, Taiwan

The MS was accepted for publication on 21 April 2011.
* Corresponding author: Du-Ming Tsai, Department of Industrial Engineering and Management, Yuan-Ze University, 135 Yuan-Tung Road, Nei-Li, Tao-Yuan, Taiwan; email: iedmtsai@saturn.yzu.edu.tw

Abstract: In video surveillance, the detection of foreground objects in an image sequence from
a still camera is very important for object tracking, activity recognition and behaviour
understanding. The conventional background subtraction cannot respond promptly to dynamic
changes in the background, and temporal difference cannot accurately extract the object shapes
and detect motionless objects. In this paper, we propose a fast statistical process control scheme
for foreground segmentation. The proposed method can promptly calculate the exact grey-level
mean and standard deviation of individual pixels in both short- and long-term image sequences
by simply deleting the earliest one among the set of images and adding the current image scene
in the image sequence. A short-term updating process can be highly responsive to dynamic
changes of the environment, and a long-term updating process can well extract the shape of a
moving object. The detection results from both the short- and long-term processes are
incorporated to detect motionless objects and eliminate non-stationary background objects.
Experimental results have shown that the proposed scheme can be well applied to both indoor
and outdoor environments. It can effectively extract foreground objects with various moving
speeds or without motion at a high process frame rate.

Keywords: motion detection, surveillance, foreground segmentation, statistical process control

1 INTRODUCTION

In video surveillance, the detection of moving objects in video sequences from a stationary camera is very important for the success of object tracking, incident detection, activity recognition and behaviour understanding. Motion detection aims to segment the foreground pixels corresponding to moving objects from the background in a scene image. A robust motion detection algorithm has three properties: (1) high responsiveness to dynamic environments such as gradual changes in outdoor light, sudden on/off switching of indoor lights and door opening/closing; (2) accurate extraction of foreground shapes; and (3) computational efficiency for real-time implementation. Responsive foreground detection can promptly absorb the changes in the background and prevent the erroneous detection of non-stationary background objects, such as the movement of chairs, as foreground regions. Accurate extraction of the complete shape of a foreground object is very important for object and activity recognition in appearance-based methods. Fast computation in foreground segmentation means that more time is available for a high-level classification process.

Hu et al.1 categorised motion segmentation approaches as background subtraction, temporal difference and optical flow. Optical flow represents the motion of objects by a velocity vector for each pixel in two consecutive images, and was originally proposed by Horn and Schunck.2 The method proposed by Lucas and Kanade3 is one of the most popular versions of optical flow methods for motion detection. Gibson and Spann,4 Liu et al.,5 Kim and Kak,6 and Chamorro-Martinez and Fernandez-Valdivia7 have applied optical flow to detect moving objects in video sequences.


Optical flow methods for motion detection are relatively computationally slow, and cannot detect motionless foreground objects.

Temporal difference8,9 calculates the difference of pixel features between two consecutive scene frames in an image sequence. It is very computationally efficient, and well accommodates environmental changes, but it generally can only extract partial shapes of moving objects. Since temporal difference relies on the measure of pixel variations in consecutive image frames, it is prone to failure in detecting motionless foreground objects.

Background subtraction detects moving objects in an image by evaluating the difference of pixel features between the current scene image and a reference background image. This approach is very computationally fast, but it is also very sensitive to environmental changes. To compensate for illumination changes, a background model updating process is generally adopted, but that process requires additional computation time. In this paper, we propose a fast foreground segmentation scheme that can effectively detect moving foreground objects and motionless humans, while being promptly adaptive to environmental changes in the background. It is computationally very fast and accurate in updating both short- and long-term backgrounds, characterised by the grey-level mean and standard deviation of each individual pixel over time, using the statistical process control technique.

Many background updating strategies have been proposed in the literature on foreground segmentation. Piccardi10 presented a review of background subtraction techniques with an emphasis on background modelling algorithms for detecting moving objects from a static camera. Background model updating methods are generally based on the analysis of the grey-level (or colour) histogram taken by each individual pixel over a limited number of recent frames. Wren et al.11 developed a system called Pfinder for person segmentation, tracking and interpretation. This system modelled each pixel of the background over time with a single Gaussian distribution. Single Gaussian updating models have also been adopted by many researchers12–15 for background subtraction.

A more robust background modelling technique is to represent each pixel in the background image over time by a mixture of Gaussians, which was originally proposed by Stauffer and Grimson.16,17 It has been used extensively as a benchmark for the comparison of motion detection techniques. The background model can deal with multimodal distributions caused by shadows, swaying branches, etc. It can handle slow lighting changes by slowly adapting the parameter values of the Gaussians. Since the estimation of Gaussian parameter values for each pixel in the image using standard algorithms such as Expectation Maximisation is computationally prohibitive, recursive updating using a simple linear adaptive filter is applied for real-time implementation. The background model is generally considered as a linear combination of the current background and the current scene image with a specific learning rate. A slow updating of the background model cannot respond promptly to background changes, whereas a fast updating might absorb a slow-moving object into the background. KaewTrakulPong and Bowden18 argued that the background model of Stauffer and Grimson17 suffers from slow learning at the beginning of the updating process. They therefore proposed two different updating formulations to estimate the Gaussian mixture model at the beginning phase and at the stable phase, when sufficient image samples have been processed. Their methods improved the accuracy of the initial estimate, and allowed fast convergence on a stable background model. Lee19 presented an adaptive learning rate of the background model to improve the updating convergence without compromising model stability. McFarlane and Schofield,20 and Manzanera and Richefeu21 used a recursive approximation of the temporal median for background estimation. In their models, the background is updated with a constant increment/decrement of 1 based on the sign of the grey-level difference between the current frame and the background. The method entails only a small cost in terms of memory consumption and computational complexity. The variance of each pixel over time is also increased/decreased by 1 at each frame. The statistics of the background must be constantly updated with a pre-determined frequency. This approach might not be fast enough to track radical changes in the background, and a moving object might not be promptly identified during the updating.

Rather than modelling the features of each pixel with Gaussian distributions, Elgammal et al.22 and Ianasi et al.23 evaluated the probability of a background pixel using kernel density estimation from very recent historical samples in the image sequence.


Elgammal et al.24 also used the fast Gauss transform25 to improve the computation of the Gaussian kernel density estimation. The kernel-based density estimation approach is computationally intensive because it requires a large data set for accurate estimation. The adaptive background updating methods discussed above can well identify moving objects in a dynamically changing environment by carefully adjusting the updating rate of the background. They generally encounter the contradiction that either both non-stationary background objects and motionless foreground objects are absorbed as parts of the background, or both motionless foreground objects and non-stationary background objects are detected as foreground regions.

As mentioned above, background subtraction cannot promptly respond to dynamic changes in the background and fails to absorb non-stationary background objects without a carefully-designed background updating mechanism. Temporal difference cannot extract the complete shape of a moving object and fails to detect motionless human figures. In this paper, we propose a fast statistical process control (SPC) scheme for identifying both moving and motionless foreground objects in a dynamic environment. A motionless foreground object in this study means a moving human figure that becomes still in the video. In SPC, the mean and variance over a series of observed sample data must first be calculated. They are then repeatedly updated as new sample data are collected. The control limits (i.e. the thresholds) are then given by some constant multiple of the standard deviation with the mean as the centre. If the measured value falls outside the control limits, an anomaly is declared. We assume that the grey values of foreground and background points are generally distinctly different.

The currently existing background updating models, either with a single Gaussian or multiple Gaussians, derive the grey-level distribution parameters using estimation techniques. The representation of the true current background can be degraded after a very long run due to the accumulated errors of background estimation in each update operation.

The proposed method can quickly calculate the exact grey-level mean and standard deviation of individual pixels in both short- and long-term image sequences by simply deleting the earliest one among the set of images and adding the current image scene to the image sequences. It therefore involves only two arithmetic operations to update the intensity statistics, regardless of the number of consecutive image frames. The mean of the short-term image sequence represents a short-term background, which can be used to detect moving objects and is very responsive to dynamic changes in the background. Conversely, the mean of the long-term image sequence represents a long-term background, which can be used to extract the complete silhouettes of the moving/motionless foreground objects. The proposed method thus takes advantage of both the background subtraction and temporal difference methods, yet it is very computationally efficient.

There have also been a few studies based on combining short- and long-term information for foreground segmentation. Huerta et al.26 proposed an adaptive dual-background modelling algorithm for moving and motionless object detection. The background model is updated using long-term and short-term learning rates. This method can detect a motionless person. However, when a background object changes its status in the scene, it is falsely detected as a foreground object. Bayona et al.27 combined simple background subtraction and temporal difference for stationary foreground detection. This mechanism cannot distinguish a motionless person from background objects with status changes. Porikli and Yin28 presented a dual-background model for temporally static region detection. Their method can detect a static foreground object such as abandoned luggage. However, it is very sensitive to dynamic changes of the environment, such as a door opening/closing. The method proposed in this paper not only detects stationary foreground objects such as an unconscious person on the floor, but also responsively absorbs non-stationary background objects such as a moving chair and opening/closing curtains as parts of the background.

The organisation of this paper is as follows. Section 2 presents the statistical process control scheme for foreground segmentation. Section 3 describes the experimental results on three sets of video scenarios, and compares the performance of the proposed method with that of temporal difference and background subtraction with Gaussian mixture modelling. Conclusions are given in Section 4.

2 STATISTICAL PROCESS CONTROL FOR FOREGROUND SEGMENTATION


As aforementioned, background subtraction techniques cannot promptly respond to changes in the environment such as cups and newspapers placed on a table, a curtain opening/closing in a living room, and cars on a street parking and moving. Temporal difference techniques cannot detect motionless foreground objects such as an unconscious person or an older person falling asleep.

The proposed method uses statistical process control to identify individual foreground pixels from both short- and long-term image frames. Statistical process control29 has been commonly used in manufacturing to monitor process stability. The most important aspect of implementing SPC is control charting. Unnatural patterns in control charts can be associated with a specific set of assignable causes, provided that appropriate process knowledge is available.30 A simple form of control chart for SPC is given by $m \pm Ks$, where $m$ is the mean of the measure, $s$ is the standard deviation of the measure and $K$ is a control constant.

Let $f_t(x,y)$ be the 2D spatial scene image of size $R \times C$ at frame $t$, for $x = 0, 1, 2, \ldots, R-1$, $y = 0, 1, 2, \ldots, C-1$ and $t = T, T-1, \ldots, T-N+1$, where $T$ denotes the current time frame. It can be expected that the grey values for each pixel at $(x,y)$ will be approximately the same if the pixel at $(x,y)$ is a part of the static background over the entire $N$ observed frames. Because a pixel at $(x,y)$ in the background region has similar grey values over the $N$ time frames, the grey-level mean $m_f$ will represent the background and the grey-level standard deviation $s_f$ will be approximately zero. Therefore, detection of foreground objects in a series of video images is equivalent to identifying the anomalies from a stable process given by the control limits $m_f \pm K s_f$. The statistics $m_f$ and $s_f$ of the control limits can be evaluated from both short- and long-term processes. By considering a large number of recent frames, the control limits give the long-term process (background) variation. The proposed SPC scheme then acts as background subtraction to extract the moving/motionless foreground objects with accurate shapes. In contrast, by considering a small number of recent frames, the control limits present the short-term process variation. The proposed SPC scheme can therefore act as a temporal difference that responds quickly to environmental changes.

2.1 Short- and long-term background statistics

In this paper, we propose a fast update process of the grey-level mean and variance from a series of temporal images to set up the control limits. The multiple temporal images of the background will present approximately the same grey value with a small variance. The grey values in a foreground region will be distinctly different from those of the background, and therefore outside the control limits given by $m_f \pm K s_f$. The grey-level mean and variance of multiple temporal images can be quickly and precisely calculated by deleting the earliest image in the series of historical image frames and adding the current video image. The statistical process control procedure for foreground segmentation is described in detail as follows.

Let $F_N = \{f_t(x,y),\ t = T-1, T-2, \ldots, T-N\}$ be a series of $N$ consecutive image frames, where $T$ denotes the current time frame. The grey-level mean and variance of multiple temporal images are given by

$$m_{T-1}(x,y) = \frac{1}{N} S_{T-1}(x,y) \qquad (1)$$

$$s_{T-1}^2(x,y) = E_{T-1}\left[f^2(x,y)\right] - \left\{E_{T-1}\left[f(x,y)\right]\right\}^2 = \frac{1}{N} S_{T-1}^2(x,y) - m_{T-1}^2(x,y) \qquad (2)$$

where $S_{T-1}(x,y)$ and $S_{T-1}^2(x,y)$ are defined, respectively, as the sum of $f_t(x,y)$ and the sum of squares of $f_t(x,y)$ over the $N$ image frames, i.e.

$$S_{T-1}(x,y) = \sum_{i=1}^{N} f_{T-i}(x,y), \quad \forall (x,y) \qquad (3)$$

$$S_{T-1}^2(x,y) = \sum_{i=1}^{N} f_{T-i}^2(x,y), \quad \forall (x,y) \qquad (4)$$

When the grey-level mean and variance at time frame $T-1$ are obtained, the upper and lower control limits for foreground-pixel detection in image frame $f_T(x,y)$ can be given by $m_{T-1}(x,y) \pm K s_{T-1}(x,y)$, where $K$ is a control constant. If the grey level $f_T(x,y)$ at pixel coordinates $(x,y)$ is outside the control limits, the pixel is then considered as a foreground point. Otherwise, it is classified as a steady background point. The detection result is represented by a binary image $B_T(x,y)$, where

$$B_T(x,y) = \begin{cases} 0 \text{ (background)}, & \text{if } f_T(x,y) \in m_{T-1}(x,y) \pm K s_{T-1}(x,y) \\ 1 \text{ (foreground)}, & \text{otherwise} \end{cases} \qquad (5)$$

Since the grey values between foreground and background points are generally distinctly different, the control constant $K$ is set at 5 in this study, rather than the commonly-used three-sigma in SPC.
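To make the control-limit test concrete, the following is a minimal NumPy sketch of equations (1)–(5). It is an illustrative reading, not the authors' implementation; the function name, the (N, R, C) array layout and the clipping of tiny negative variances are our assumptions:

```python
import numpy as np

def spc_classify(frames, current, K=5.0):
    """K-sigma control-limit test of equations (1)-(5).

    frames : (N, R, C) array holding the N most recent grey-level frames
    current: (R, C) array, the incoming frame f_T
    Returns a binary mask B_T (1 = foreground, 0 = background).
    """
    N = frames.shape[0]
    S = frames.sum(axis=0, dtype=np.float64)            # eq. (3): sum over N frames
    S2 = (frames.astype(np.float64) ** 2).sum(axis=0)   # eq. (4): sum of squares
    m = S / N                                           # eq. (1): windowed mean
    s = np.sqrt(np.maximum(S2 / N - m ** 2, 0.0))       # eq. (2): std, clipped at 0
    # eq. (5): a pixel outside m +/- K*s is declared foreground
    return (np.abs(current.astype(np.float64) - m) > K * s).astype(np.uint8)
```

Here the sums are recomputed from the whole window only for clarity; the point of the method, formalised in equations (6) and (7) below, is that $S$ and $S^2$ can instead be carried along and updated in constant time per frame.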


Once the foreground objects in image frame $f_T(x,y)$ are segmented from the background, the grey-level mean $m_T$ and standard deviation $s_T$ at the current time frame $T$ can be quickly updated from $S_{T-1}(x,y)$ and $S_{T-1}^2(x,y)$. That is

$$m_T(x,y) = \frac{1}{N} S_T(x,y) \qquad (6)$$

$$s_T^2(x,y) = \frac{1}{N} S_T^2(x,y) - m_T^2(x,y) \qquad (7)$$

where

$$S_T(x,y) = S_{T-1}(x,y) - f_{T-N}(x,y) + f_T(x,y)$$

$$S_T^2(x,y) = S_{T-1}^2(x,y) - f_{T-N}^2(x,y) + f_T^2(x,y)$$

Note that the grey-level mean $m_T(x,y)$ and standard deviation $s_T(x,y)$ at the current time frame $T$ are derived from $S_{T-1}(x,y)$ and $S_{T-1}^2(x,y)$, which are passed along from the previous updating stage. They are calculated precisely, rather than being estimated from an updating filter. $S_T(x,y)$ and $S_T^2(x,y)$ can be efficiently updated by dropping the earliest image frame $f_{T-N}(x,y)$ in the image series and adding the current image frame $f_T(x,y)$ to the image series. Therefore, the updating computation involves only two simple arithmetic operations, and a very high processing rate of image frames is thus achieved. Note also that the mean and variance updating processes in equations (6) and (7) are invariant with respect to the number of image frames $N$ in the series.

The updated mean $m_T(x,y)$ can be interpreted as the background at the current time. In this paper, we do not use image differencing that computes the difference between the reference background and the current image frame to identify foreground objects. Rather, the standard deviation $s_T$ of pixel intensities over time is effectively used to set up the control limits for discriminating foreground pixels from the stationary background. The grey-level mean and variance updating formulae in equations (6) and (7) involve one parameter value, the number of image frames $N$, to be determined. A small $N$ produces a fast update of the background statistics. The mean $m_T(x,y)$ obtained from a small $N$ can then be interpreted as a short-term background. It can quickly absorb non-stationary background objects, such as a moving chair and the opening/closing of a curtain, as parts of the background. It is therefore very responsive to dynamic changes in the environment. In contrast, a large $N$ produces a slow update of the background statistics. The mean $m_T(x,y)$ obtained from a large $N$ can be considered as a long-term background. It can effectively extract the accurate silhouette of a moving object. It can also be used to detect a motionless foreground object. As previously noted, the updating process is independent of the number of image frames $N$. Both short- and long-term background statistics can be simultaneously evaluated with minimum computational effort. They can be incorporated to extract the foreground shapes in a dynamic environment.

Let the number of image frames $N$ be set at $N_s$ for short-term updating and $N_l$ for long-term updating, where $N_l > N_s$. Furthermore, denote by $m_T^s(x,y)$ and $\left(s_T^s\right)^2(x,y)$ the short-term grey-level mean and variance, and by $m_T^l(x,y)$ and $\left(s_T^l\right)^2(x,y)$ the long-term grey-level mean and variance at the current time frame $T$. In this study, the short-term number $N_s$ is usually set in the range of 10–100, and the long-term number $N_l$ is set between 100 and 10 000, depending on the monitoring environment, such as indoor scenes or highway traffic scenes.

The foreground detection results, represented as binary images $B_T^s(x,y)$ and $B_T^l(x,y)$ for the short- and long-term image frames, are respectively given by

$$B_T^s(x,y) = \begin{cases} 0, & \text{if } f_T(x,y) \in m_{T-1}^s(x,y) \pm K s_{T-1}^s(x,y) \\ 1, & \text{otherwise} \end{cases} \qquad (8)$$

and

$$B_T^l(x,y) = \begin{cases} 0, & \text{if } f_T(x,y) \in m_{T-1}^l(x,y) \pm K s_{T-1}^l(x,y) \\ 1, & \text{otherwise} \end{cases} \qquad (9)$$

Figure 1a shows the sequence of an indoor scenario, in which a man walks into a classroom, puts a bag on the desk, sits on a chair, falls asleep for a while, then wakes up and, finally, leaves the room without taking the bag with him. The background scene of the classroom involves the rotation of ceiling fans. Figure 1b shows the short-term mean $m_t^s(x,y)$ with $N_s = 100$. It indicates that the short-term process can quickly respond to dynamic change in the environment, where the revolving ceiling fans, the bag and the sleeping man are all taken as the background. Figure 1c shows the long-term mean $m_t^l(x,y)$ with $N_l = 10\,000$. It reveals that the long-term process can well preserve the static background, where the bag and the sleeping man are not absorbed as parts of the background, but the constantly revolving ceiling fans are still taken as the background.
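One plausible realisation of the constant-time update in equations (6) and (7) keeps a ring buffer of the last $N$ frames so that the earliest frame can be subtracted exactly; running one instance with $N_s$ and another with $N_l$ then yields the $B_T^s$ and $B_T^l$ of equations (8) and (9). The class below is a sketch under those assumptions (all names are invented), not the paper's code:

```python
import numpy as np

class SlidingBackground:
    """Exact windowed mean/variance via running sums (eqs. (6) and (7))."""

    def __init__(self, first_frame, N, K=5.0):
        self.N, self.K = N, K
        # Ring buffer of the last N frames, seeded with copies of the first frame.
        self.buf = np.repeat(first_frame[None], N, axis=0)
        self.idx = 0
        f = first_frame.astype(np.float64)
        self.S = N * f           # S_{T-1}, eq. (3)
        self.S2 = N * f ** 2     # S^2_{T-1}, eq. (4)

    def detect_and_update(self, frame):
        f = frame.astype(np.float64)
        m = self.S / self.N
        s = np.sqrt(np.maximum(self.S2 / self.N - m ** 2, 0.0))
        mask = (np.abs(f - m) > self.K * s).astype(np.uint8)   # eq. (8) or (9)
        # Two arithmetic operations per pixel: drop f_{T-N}, add f_T.
        old = self.buf[self.idx].astype(np.float64)
        self.S += f - old                                      # eq. (6)
        self.S2 += f ** 2 - old ** 2                           # eq. (7)
        self.buf[self.idx] = frame
        self.idx = (self.idx + 1) % self.N
        return mask

# Dual horizons, e.g. short = SlidingBackground(f0, N=100) and
# long = SlidingBackground(f0, N=10000), give B_T^s and B_T^l per frame.
```

The price of the exact statistics is memory: the window of $N$ frames must be retained so that the oldest frame can be removed, which for the long-term window is the dominant storage cost of this reading of the method.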


1 Estimated mean on an indoor scenario: (a) discrete scene images in a video sequence taken at 15 fps; (b) mean grey values $m_t^s$ calculated from a short-term number of 100 frames; (c) mean grey values $m_t^l$ calculated from a long-term number of 10 000 frames (the symbol $t$ represents the frame number in the sequence)


2 Estimated mean on an outdoor parking-lot scenario: (a) discrete scene images in a video sequence taken at 15 fps; (b) means calculated from the short-term process ($N_s = 10$); (c) means calculated from the long-term process ($N_l = 100$) (the symbol $t$ represents the frame number in the sequence)

Figure 2a illustrates the sequence of an outdoor parking lot scenario. In contrast to the slow-walking person with a short-term number $N_s = 100$ in Fig. 1, the foreground object in Fig. 2 is a fast-moving motorcycle and, therefore, a relatively small short-term number $N_s = 10$ is used for rapid SPC statistics updating. Figure 2b presents the short-term mean $m_t^s(x,y)$ with $N_s = 10$. It shows that the short-term process can respond promptly to dynamic change in the environment, where the stationary motorcycle is taken as a part of the background soon after it is parked in the lot. Figure 2c shows the long-term mean $m_t^l(x,y)$ with $N_l = 100$. It indicates that the long-term process can well preserve the static background without showing the stationary motorcycle as a background region. The walking person is not present in either the short-term or the long-term background.

Note that the indoor scenario in Fig. 1 involves the slow movement of a person and, therefore, relatively larger $N_s$ and $N_l$ are used. Conversely, the outdoor scenario in Fig. 2 involves the fast movement of a vehicle and, thus, small $N_s$ and $N_l$ are used for the process. The selection of proper numbers of image frames for $N_s$ and $N_l$ depends on the speed of moving foreground objects in the observed scene. The short-term grey-level mean from a small number of the most recent frames can avoid false detection from non-stationary background objects such as moving chairs, and the long-term mean can detect stationary foreground objects such as an unconscious person. As previously noted, the short-term process can be highly responsive to dynamic changes in the environment, and the long-term process can effectively extract the shape of a moving/motionless foreground object.


3 Combined scheme of long- and short-term processes: (a) discrete scene images in a video sequence taken at 15 fps; (b) short-term process results $B_t^s(x,y)$ ($N_s = 100$); (c) long-term process results $B_t^l(x,y)$ ($N_l = 10\,000$); (d) final detection results of the combined scheme (the symbol $t$ represents the frame number in the sequence)

2.2 Motionless foreground object detection

In order to detect the motionless foreground objects, the detection results from both the short- and long-term processes are incorporated as follows. Let $B_t^s(x,y)$ and $B_t^l(x,y)$ represent, respectively, the short- and long-term binary results at time frame $t$. Denote by $b_t^s(i)$, $i = 1, 2, 3, \ldots, n_s$, the moving objects detected in $B_t^s(x,y)$, where $n_s$ is the total number of moving objects, and by $b_t^l(j)$, $j = 1, 2, 3, \ldots, n_l$, the foreground objects detected in $B_t^l(x,y)$, where $n_l$ is the total number of foreground objects detected.

When the short-term $B_T^s(x,y) = 0$ for all $(x,y)$, i.e. no moving objects are detected in image $f_T(x,y)$, we then check the long-term $B_T^l(x,y)$ at the current frame $T$ and the short-term $B_{T-1}^s(x,y)$ at the previous frame $T-1$. If there is at least one $b_{T-1}^s(i)$, i.e. a moving object in image $B_{T-1}^s(x,y)$, then we find the corresponding object $b_T^l(j)$ that has the shortest distance to $b_{T-1}^s(i)$ in the long-term $B_T^l(x,y)$. The Euclidean distance between the centroid of blob $b_{T-1}^s(i)$ and that of blob $b_T^l(j)$ is denoted by $d\left(b_{T-1}^s(i), b_T^l(j)\right)$.


Let ROI be the effective region slightly smaller than the size of the image frame. If a moving object falls outside the ROI, this indicates that the foreground object walks away from the field of view of the still camera. The motionless foreground objects can then be extracted using the following detection process:

If $B_T^s(x,y) = 0$, $\forall (x,y)$ (i.e. no moving objects) and there exists a $b_{T-1}^s(i) \in \text{ROI}$, then

Compute
$$d\left(b_{T-1}^s(i), b_T^l(j)\right) = \left\{\left[x_{T-1}^s(i) - x_T^l(j)\right]^2 + \left[y_{T-1}^s(i) - y_T^l(j)\right]^2\right\}^{1/2} \quad \text{for } j = 1, 2, \ldots, n_l \qquad (10)$$

where $\left(x_{T-1}^s(i), y_{T-1}^s(i)\right)$ and $\left(x_T^l(j), y_T^l(j)\right)$ are the coordinates of the centre of area for blobs $b_{T-1}^s(i)$ and $b_T^l(j)$, respectively.

Let
$$d\left(b_{T-1}^s(i), b_T^l(j^*)\right) = \min\left\{ d\left(b_{T-1}^s(i), b_T^l(j)\right),\ j = 1, 2, \ldots, n_l \right\}$$

Superimpose $b_T^l(j^*)$ on the binary image $B_T^s(x,y)$, i.e.
$$B_T^s(x,y) \leftarrow B_T^s(x,y) \cup b_T^l(j^*) \qquad (11)$$

Endif

In equation (10), blob $b_{T-1}^s(i)$ represents a detected moving object in $B_{T-1}^s(x,y)$ at frame $T-1$ that disappears in $B_T^s(x,y)$ at frame $T$. The blob $b_T^l(j^*)$ that has the shortest distance to $b_{T-1}^s(i)$ then represents the object $b_{T-1}^s(i)$ that becomes motionless at frame $T$ in the long-term background. Equation (11) indicates that all final detection results of the motionless foreground objects are superimposed on the binary image $B_T^s(x,y)$ at the current frame $T$.
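The matching step of equations (10) and (11) can be sketched with connected-component labelling; here SciPy's ndimage is used for labelling and centroids, which is an implementation choice of ours, as are the function name and the ROI margin parameter:

```python
import numpy as np
from scipy import ndimage

def recover_motionless(Bs_T, Bs_prev, Bl_T, roi_margin=5):
    """If a blob present in B^s_{T-1} vanishes in B^s_T, superimpose the
    nearest long-term blob of B^l_T onto B^s_T (equations (10) and (11))."""
    if Bs_T.any():                 # moving objects still present: nothing to do
        return Bs_T
    lbl_prev, n_prev = ndimage.label(Bs_prev)
    lbl_long, n_long = ndimage.label(Bl_T)
    if n_prev == 0 or n_long == 0:
        return Bs_T
    out = Bs_T.copy()
    R, C = Bs_T.shape
    cent_long = ndimage.center_of_mass(Bl_T, lbl_long, range(1, n_long + 1))
    for (x, y) in ndimage.center_of_mass(Bs_prev, lbl_prev, range(1, n_prev + 1)):
        # ROI test (an assumed margin): skip blobs that left the field of view.
        if not (roi_margin <= x < R - roi_margin and roi_margin <= y < C - roi_margin):
            continue
        d = [np.hypot(x - xl, y - yl) for (xl, yl) in cent_long]  # eq. (10)
        j_star = int(np.argmin(d))
        out |= (lbl_long == j_star + 1).astype(out.dtype)         # eq. (11)
    return out
```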
Figure 3a shows a sequence of an indoor scenario with a person opening the door and then fainting on the floor. Figure 3b shows the short-term process results, whereby the motionless person cannot be detected in $B_T^s(x,y)$ after fainting (at frame $T = 201$). Figure 3c shows the long-term process results, in which the opened door cannot be promptly absorbed as a part of the background, but the fainted person is well extracted in $B_T^l(x,y)$. Figure 3d displays the results of the combined scheme. It reveals that the proposed method can effectively detect motionless foreground objects and is highly responsive to dynamic environments. It requires no complicated tracking algorithms to detect the stationary foreground objects.

3 EXPERIMENTAL RESULTS

This section presents the experimental results from three sets of image sequences that involve one outdoor and two indoor scenarios. The proposed algorithms were implemented in the C++ language and run on a Pentium 4, 3.0 GHz personal computer. The test images in the experiments were 150×200 pixels with 8-bit grey levels. The computation time per frame of size 150×200 was 0.0071 s (i.e. 142 fps). In the experiments, we also compare the performance of the proposed method with that of temporal difference and the adaptive background mixture models proposed by Stauffer and Grimson.16 The source code of the Gaussian mixture modelling algorithm for background subtraction is available through OpenCV.31 In the adaptive background mixture models, a mixture of three Gaussians was used, and the learning rate α was given by 0.1 for all test scenarios. In the proposed method, the short-term and long-term numbers of image frames were set at $N_s = 100$ and $N_l = 10\,000$ for all test scenarios. In order to show the primary effectiveness of the proposed method, post-processing operations such as noise removal and connected component analysis were not applied to the resulting images.

In order to evaluate the performance of the proposed method against the currently available methods that combine short- and long-frame information, the dual-background model proposed by Porikli and Yin28 is used as the benchmark method, and the indoor scenario in Fig. 4 is used for the experiment. The indoor scenario entails a person walking into the laboratory with the curtain closed. He then opens the curtain, faints on the floor, stands up after a while and finally leaves the room with the curtain still open. The grey-level images shown in Fig. 4a display the original video sequence at varying time frames. The video images were captured at 15 fps.


4 Experimental results on an indoor laboratory scenario: (a) discrete scene images in a video sequence taken at 15 fps; (b) detected foreground objects from the proposed method; (c) detection results from the long-term and short-term combination method;28 (d) detection results from temporal difference; (e) detection results from the adaptive background mixture model (the symbol $t$ represents the frame number in the sequence)

The results of the proposed method are presented in Fig. 4b. The detection results show that the proposed method can well extract the moving person and is not affected by the changed state of the curtain. Furthermore, the proposed method also correctly detected the motionless man after he fainted. The opened curtain was promptly adapted as a part of the background. Figure 4c shows the results from the dual-background model method.28 When the person fainted at $t = 550$, the method was able to detect the motionless man. At the same time, however, it also detected the opened curtain as a foreground object. Even after the person left at $t = 1095$, the curtain was still incorrectly detected as a foreground object.


5 Experimental results of an indoor living room with a rotating fan: (a) discrete scene images in a video sequence taken at 15 fps; (b) detected foreground objects from the proposed method; (c) detection results from temporal difference; (d) detection results from the adaptive background mixture model (the symbol $t$ indicates the frame number in the sequence)

The computation time per frame of the proposed method was 0.0071 s (i.e. 142 fps), whereas that of the dual-background model28 was 0.021 s (i.e. 46 fps). The proposed method is thus both more effective and computationally more efficient.

The indoor scenario in Fig. 4 was also tested with the temporal difference method and the mixture of Gaussians model. Figure 4d shows the results from temporal difference. It indicates that temporal difference cannot detect a stationary foreground object such as the fainted man. The resulting shape of the moving object from temporal difference was not accurate, and only partial edges were detected. Figure 4e displays the results from the adaptive background mixture model, in which the walking person and the opened curtain were both detected as foreground objects with the learning rate α = 0.1.


6 Experimental results on an outdoor parking lot scenario: (a) discrete scene images in a video sequence taken at 15 fps; (b) detected foreground objects from the proposed method; (c) detection results from temporal difference; (d) detection results from the adaptive background mixture model (the symbol $t$ indicates the frame number in the sequence)

If a larger learning rate is used, the opened curtain can indeed be updated promptly as a part of the background. However, the slow-moving person will then not be fully identified as a foreground object. As seen in the last row of Fig. 4, the proposed method was able to quickly update the floor area occupied by the fainted person as a part of the background soon after he got up from the floor. The mixture of Gaussians model could also detect the walking person when he got up from the floor, but the floor area he fell onto still generated severe residuals.

The second indoor example in the experiments came from a scenario in which a person walks into a living room equipped with a ceiling fan, and then leaves the room, while the ceiling fan keeps spinning.


7 Background estimation: (a1–a3) real background images for the scenarios in the last rows of Figs. 4a, 5a and 6a, respectively; (b1–b3) estimated background images $m_t^s$ from the proposed method; (c1–c3) estimated background images from the adaptive background mixture model

The video sequence was taken at 15 fps. Figure 5a shows the video sequence at varying time frames. Figure 5b displays the results from the proposed method. It shows that the regular rotation of the ceiling fan was promptly adapted as a part of the background, and only the walking person was identified as a foreground object. Figure 5c shows the results from temporal difference. The detection result was affected by the rotation of the ceiling fan, and only a partial shape of the walking person was extracted. The detection results from the adaptive background mixture model are demonstrated in Fig. 5d. The mixture model of adaptive Gaussians was also capable of dealing with the movement of the ceiling fan. However, it was more sensitive to the shadow of the walking person, and the background model was not promptly updated right after the person walked out of the room, as seen in Fig. 5d.

The third outdoor scenario entailed the monitoring of a parking lot, in which a car leaves a parking space in the parking lot at varying speeds, and then drives back to the lot and parks in the same space. The video sequence was also taken at 15 fps. Figure 6a shows the original video sequence at varying time frames. There are many trees around the parking lot, and wind and sunlight produced dynamic changes in the background. The proposed foreground segmentation scheme can effectively extract the moving car, regardless of its speed, as shown in Fig. 6b. The detection results from temporal difference are presented in Fig. 6c. The different driving speeds of the car caused an incomplete car shape.


8 Bar chart of overall performance on the Microsoft benchmark dataset

When the car slowed down or stopped, the temporal difference method was unable to detect the car effectively. In the experiment, the frames from $t = 569$ to $t = 605$ show that the target car remains still. Thus, the temporal difference method cannot detect the foreground from the image sequence, whereas the proposed method can reliably detect this motionless foreground car. Figure 6d presents the detection results from the adaptive background mixture model. It shows that the mixture of Gaussians model was robust to the changes in sunlight and the waving of leaves. However, the car shape was not well extracted. The residuals of the car were still present in the image soon after the car parked in the lot and became a part of the background, as seen in Fig. 6d.

In order to further analyse the detection results of the proposed method and the adaptive background mixture model, Fig. 7 shows the estimated backgrounds for the three scenarios. Figure 7a1–a3 are, respectively, the real scene backgrounds of the laboratory (at image frame $t = 1095$ in Fig. 4a), the living room (at image frame $t = 363$ in Fig. 5a) and the parking lot (at image frame $t = 1650$ in Fig. 6a). Figure 7b1–b3 are the short-term means $m_t^s$ given by the proposed method. They show that the backgrounds derived from the proposed method are very close to the real ones. Figure 7c1–c3 present the estimated backgrounds from the mixture of Gaussians model. The results show that the mixture model can effectively estimate the scene background of the living room, as seen in Fig. 7c2. It can well model the non-stationary blades of the ceiling fan, and adapt the fan as a part of the background. However, the mixture of Gaussians model cannot responsively update the scene backgrounds of the laboratory and the parking lot. There were dark residuals in the floor area previously occupied by the fainted person, as seen in Fig. 7c1. The edges of the white car were not effectively updated soon after it parked in the lot and presented some artificial effects, as seen in Fig. 7c3.

In order to verify the effect of parameter K in equation (5), the K value was varied from 1 to 7 and tested on the Microsoft Wallflower dataset.32 The dataset contains seven scenarios of image sequences: Moved objects, Time of day, Light switch, Waving trees, Camouflage, Bootstrapping and Foreground aperture. The Wallflower dataset also provides hand-segmented ground truth for evaluation. The accuracy of foreground segmentation results can be represented by the false-positive rate (FP), the false-negative rate (FN) and the total error rate (TE). The false-positive rate refers to the percentage of background pixels that are misclassified as foreground pixels, and the false-negative rate indicates the percentage of foreground pixels that are misclassified as background pixels. The total error rate is the sum of FP and FN.
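Under this reading of the definitions (normalising FP by the number of background pixels and FN by the number of foreground pixels, which the text implies but does not spell out), the three rates can be computed from a detection mask and the hand-segmented ground truth as follows:

```python
import numpy as np

def error_rates(detected, truth):
    """FP, FN and TE (in %) from binary masks, per the definitions above."""
    detected = detected.astype(bool)
    truth = truth.astype(bool)
    n_bg = np.count_nonzero(~truth)   # background pixels in the ground truth
    n_fg = np.count_nonzero(truth)    # foreground pixels in the ground truth
    fp = 100.0 * np.count_nonzero(detected & ~truth) / max(n_bg, 1)
    fn = 100.0 * np.count_nonzero(~detected & truth) / max(n_fg, 1)
    return fp, fn, fp + fn            # TE = FP + FN
```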


In the experiments, we compared the performance of the proposed method under different control constants K with two adaptive background models: the single Gaussian11 and the mixture of Gaussians.16 In the latter background model, a mixture of three Gaussians was used. We experimented with different settings of the adjustable parameters of the two compared algorithms until the results seemed optimal over the entire datasets, and we did not change parameter values between sequences. The hand-segmented ground truths provided by the benchmark were used for performance evaluation. Figure 8 summarises the performance in terms of FP, FN and TE for each algorithm. When the control constant K is increased from 1 to 7, the FP decreases and the FN slightly increases.

When K is between 4 and 7, all the resulting FP values are less than 1%, indicating lower noise in the background. The TE is stable for K from 2 to 7. Considering the overall performance of FP and TE, we chose K = 5 for robust foreground detection. Figure 8 also shows that the proposed method can detect the foreground objects with the least total errors, compared to the single Gaussian and mixture of Gaussians methods. In terms of the sensitivity of the parameter value, the experimental results indicate that the proposed method has stable and effective performance over a wide range of K values between 3 and 7.

The proposed foreground segmentation method is based on the statistical mean and standard deviation of a series of consecutive image frames. The well-known single Gaussian updating models also rely on these two statistics for motion detection. However, the proposed method updates the background by precise calculation of the mean and standard deviation and requires no reset operation in a non-stop monitoring process. In contrast, the single Gaussian updating models estimate these two statistics with a linear filter, and the estimation degrades over time due to accumulated error.

4 CONCLUSION

In this paper, we have presented a statistical process control scheme for foreground detection in video sequences from a static camera. The main advantages of the proposed method are that it can observe both long- and short-term backgrounds simultaneously, and that the corresponding grey-level mean and variance can be efficiently and exactly computed without the introduction of a learning rate for background estimation. The grey-level mean of long-term frames represents a static background without moving objects, and retains all the advantages of background subtraction. The grey-level mean of short-term frames represents the most current status of the dynamic environment, and retains all the advantages of temporal difference. The variance of image frames in either the short- or long-term process indicates the degree of scene change, thus enabling noise and repetitive movements, such as rotating fans and swinging leaves, to be easily removed from the background. Combining the results of the long- and short-term processes can detect the motionless objects and eliminate the influence of non-stationary background objects. The proposed method is robust in accommodating noise and grey-level variations of individual pixels. It can effectively extract the shape of a moving object, and is very responsive to both gradual and radical changes in the environment. High processing frame rates can be achieved: up to 142 fps for smaller images of size 150×200 and 15 fps for larger images of size 640×480 on an Intel Core2 2.33 GHz personal computer. Experimental results have revealed that the proposed method can be applied to the monitoring of both indoor and outdoor scenarios in which dynamic environments and foreground objects with various moving velocities or without motion might occur.

REFERENCES

1 Hu, W., Tan, T., Wang, L. and Maybank, S. A survey on visual surveillance of object motion and behaviors. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev., 2004, 34, 334–352.
2 Horn, B. K. P. and Schunck, B. G. Determining optical flow. Artif. Intell., 1981, 17, 185–203.
3 Lucas, B. D. and Kanade, T. An iterative image registration technique with an application to stereo vision, Proc. DARPA Image Understanding Workshop, Washington, DC, USA, April 1981, IEEE, pp. 121–130.
4 Gibson, D. and Spann, M. Robust optical flow estimation based on a sparse motion trajectory set. IEEE Trans. Image Process., 2003, 12, 431–445.
5 Liu, P. R., Meng, M. Q.-H. and Liu, P. X. Moving object segmentation and detection for monocular robot based on active contour model. Electron. Lett., 2005, 41, 1320–1322.
6 Kim, Y. H. and Kak, A. C. Error analysis of robust optical flow estimation by least median of squares methods for the varying illumination model. IEEE Trans. Patt. Anal. Mach. Intell., 2006, 28, 1418–1435.
7 Chamorro-Martinez, J. and Fernandez-Valdivia, J. A new approach to motion pattern recognition and its application to optical flow estimation. IEEE Trans. Syst. Man Cybern. Part C: Appl. Rev., 2007, 37, 39–51.


8 Lipton, A. J., Fujiyoshi, H. and Patil, R. S. Moving target classification and tracking from real-time video, Proc. 4th IEEE Workshop on Applications of Computer Vision: WACV '98, Princeton, NJ, USA, October 1998, IEEE, pp. 8–14.
9 Wang, C. M. and Brandstein, M. S. A hybrid real-time face tracking system, Proc. IEEE Int. Conf. on Acoustics, Speech, and Signal Processing: ICASSP '98, Seattle, WA, USA, May 1998, IEEE, pp. 3737–3740.
10 Piccardi, M. Background subtraction techniques: a review, Proc. IEEE Int. Conf. on Systems, Man, and Cybernetics, The Hague, The Netherlands, October 2004, IEEE, pp. 3099–3104.
11 Wren, C. R., Azarbayejani, A., Darrell, T. and Pentland, A. P. Pfinder: real-time tracking of the human body. IEEE Trans. Patt. Anal. Mach. Intell., 1997, 19, 780–785.
12 Olson, T. and Brill, F. Moving object detection and event recognition algorithm for smart cameras, Proc. DARPA Image Understanding Workshop, New Orleans, LA, USA, May 1997, Morgan Kaufmann, pp. 159–175.
13 Eveland, C., Konolige, K. and Bolles, R. C. Background modeling for segmentation of video-rate stereo sequences, Proc. IEEE Conf. on Computer Vision and Pattern Recognition: CVPR '98, Santa Barbara, CA, USA, June 1998, IEEE Computer Society, pp. 266–271.
14 Kanade, T., Collins, R., Lipton, A., Burt, P. and Wixson, L. Advances in cooperative multi-sensor video surveillance, Proc. DARPA Image Understanding Workshop, Monterey, CA, USA, November 1998, IEEE, Vol. 1, pp. 3–24.
15 Cavallaro, A. and Ebrahimi, T. Video object extraction based on adaptive background and statistical change detection. Proc. SPIE, 2001, 4310, 465–475.
16 Stauffer, C. and Grimson, W. E. L. Adaptive background mixture models for real-time tracking, Proc. IEEE Conf. on Computer Vision and Pattern Recognition: CVPR '99, IEEE Computer Society, Vol. 2, pp. 246–252.
17 Stauffer, C. and Grimson, W. E. L. Learning patterns of activity using real-time tracking. IEEE Trans. Patt. Anal. Mach. Intell., 2000, 22, 747–757.
18 KaewTrakulPong, P. and Bowden, R. An improved adaptive background mixture model for real-time tracking with shadow detection, Proc. 2nd European Workshop on Advanced Video-Based Surveillance: AVBS 2001, Kingston, UK, September 2001, Kingston University, pp. 149–158.
19 Lee, D.-S. Effective Gaussian mixture learning for video background subtraction. IEEE Trans. Patt. Anal. Mach. Intell., 2005, 27, 827–832.
20 McFarlane, N. and Schofield, C. Segmentation and tracking of piglets in images. Mach. Vision Appl., 1995, 8, 187–193.
21 Manzanera, A. and Richefeu, J. C. A new motion detection algorithm based on Σ–Δ background estimation. Patt. Recogn. Lett., 2007, 28, 320–328.
22 Elgammal, A., Duraiswami, R., Harwood, D. and Davis, L. S. Background and foreground modeling using nonparametric kernel density estimation for visual surveillance. Proc. IEEE, 2002, 90, 1151–1163.
23 Ianasi, C., Gui, V., Toma, C. I. and Pescaru, D. A fast algorithm for background tracking in video surveillance using nonparametric kernel density estimation. Facta Univ. Ser.: Electron. Energ., 2005, 18, 127–144.
24 Elgammal, A., Duraiswami, R. and Davis, L. Efficient kernel density estimation using the fast Gauss transform with applications to color modeling and tracking. IEEE Trans. Patt. Anal. Mach. Intell., 2003, 25, 1499–1504.
25 Greengard, L. and Strain, J. The fast Gauss transform. SIAM J. Sci. Stat. Comput., 1991, 12, 79–94.
26 Huerta, I., Rowe, D., Gonzalez, J. and Villanueva, J. J. Improving foreground detection for adaptive background segmentation, Proc. 1st CVC Workshop on Computer Vision: Progress of Research and Development: CVCRD 2006, Bellaterra, Spain, October 2006, Universitat Autònoma de Barcelona, pp. 1–6.
27 Bayona, A., SanMiguel, J. C. and Martínez, J. M. Stationary foreground detection using background subtraction and temporal difference in video surveillance, Proc. 17th IEEE Int. Conf. on Image Processing: ICIP 2010, Hong Kong, China, September 2010, IEEE Signal Processing Society, pp. 4657–4660.
28 Porikli, F. and Yin, Z. Temporally static region detection in multi-camera systems, Proc. 10th IEEE Int. Workshop on Performance Evaluation of Tracking and Surveillance: PETS 2007, Rio de Janeiro, Brazil, October 2007, IEEE, pp. 79–86.
29 Montgomery, D. C. Introduction to Statistical Quality Control, 2004 (John Wiley & Sons, New York).
30 Hsieh, K. L., Tong, L. I. and Wang, M. C. The application of control chart for defects and defect clustering in IC manufacturing based on fuzzy theory. Expert Syst. Appl., 2007, 32, 765–776.
31 OpenCV, Intel Open Source Computer Vision Library. http://www.intel.com/technology/computing/opencv/
32 Toyama, K., Krumm, J., Brumitt, B. and Meyers, B. Wallflower: principles and practice of background maintenance, Proc. 7th IEEE Int. Conf. on Computer Vision: ICCV '99, Corfu, Greece, September 1999, IEEE Computer Society, pp. 255–261.
