Abstract—Motion detection is the first essential process in the extraction of information regarding moving objects, and makes use of stabilization in functional areas such as tracking, classification, recognition, and so on. In this paper, we propose a novel and accurate approach to motion detection for the automatic video surveillance system. Our method achieves complete detection of moving objects by involving three significant proposed modules: a background modeling (BM) module, an alarm trigger (AT) module, and an object extraction (OE) module. For our proposed BM module, a unique two-phase background matching procedure is performed using rapid matching followed by accurate matching in order to produce optimum background pixels for the background model. Next, our proposed AT module eliminates the unnecessary examination of the entire background region, allowing the subsequent OE module to only process blocks containing moving objects. Finally, the OE module forms the binary object detection mask in order to achieve highly complete detection of moving objects. The detection results produced by our proposed (PRO) method were both qualitatively and quantitatively analyzed through visual inspection and for accuracy, along with comparisons to the results produced by other state-of-the-art methods. The analyses show that our PRO method has a substantially higher degree of efficacy, outperforming other methods by an F1 metric accuracy rate of up to 53.43%.

Index Terms—Background model, entropy, morphology, motion detection, video surveillance.

I. Introduction

IN THE LAST DECADE, video surveillance systems have become an extremely active research area due to increasing levels of terrorist activity and general social problems. This has motivated the development of strong and precise automatic processing systems, an essential tool for safety and security in both the public and private sectors. The need for advanced video surveillance systems has inspired progress in many important areas of science and technology, including traffic monitoring [1], [2], transport networks, traffic flow analysis, understanding of human activity [3], [4], home nursing, monitoring of endangered species, and observation of people and vehicles within a busy environment [5]–[12], along with many others.

The design of an advanced automatic video surveillance system requires the application of many important functions including, but not limited to, motion detection [13]–[25], classification [26], tracking [27], [28], behavior [29], activity analysis, and identification [30], [31]. Motion detection is one of the greatest problem areas in video surveillance, as it is not only responsible for the extraction of moving objects but also critical to many computer vision applications, including object-based video encoding, human motion analysis, and human–machine interaction [32]. Therefore, our focus here is the further development of the motion detection phase for an advanced video surveillance system.

The three major classes of methods for motion detection are background subtraction, temporal differencing, and optical flow [13]. Background subtraction [14]–[23], [33] is the most popular motion detection method and consists of differentiating moving objects from a maintained and updated background model; such methods can be further grouped into parametric and non-parametric types [33]. Based on the implicit assumption along with the choice of parameters, a parametric model may achieve performance that corresponds closely to the real data along with parametric information [22]. On the contrary, a non-parametric model is heavily data dependent, without any parameters [22], [33]. Apart from background subtraction, two other motion detection methods, optical flow and temporal differencing, are discussed in [25]. While the optical flow method shows the projected motion on the image plane with successful approximation of complex background handling, it often requires very high computational complexity, which creates difficulties in its implementation [34]. The temporal differencing method, while effectively adapting to environmental changes, often results in incomplete detection of the shapes of moving objects, due to its sensitive threshold and the noisy, local consistency properties of the change mask [35].

Manuscript received October 22, 2009; revised February 8, 2010; accepted June 16, 2010. Date of publication October 18, 2010; date of current version February 24, 2011. This work was supported by the National Science Council, under Grant NSC 98-2218-E-027-008. This paper was recommended by Associate Editor I. Ahmad. The author is with the Department of Electronic Engineering, National Taipei University of Technology, Taipei 106, Taiwan (e-mail: schuang@ntut.edu.tw). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2010.2087812

The currently implemented method for background subtraction accomplishes its objective by subtracting each pixel of the incoming video frame from the background model, thus generating an absolute difference. It then applies a threshold to obtain the binary object detection mask [20]. Threshold
2 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2011
selection is a critical operation and can be conducted by a variety of previously researched methods [36]–[40]. Although the currently implemented background subtraction method is convenient to implement, its noise tolerance in the video frame relies on the determined threshold. Functionalities such as object classification, tracking, behavior, and identification are then performed on the regions where moving objects have been detected.

The computational costs of traditional foreground analysis methods are usually relatively expensive for video surveillance systems based on the traditional optical flow implementation [34]. For more accurate motion detection design, foreground analysis is always needed for the most popular background subtraction method in order to achieve the analysis of the motion information [34].

With respect to background maintenance, pixel-level processes and region-level processes should both be clearly designed into the background subtraction approach [41]. This is because pixel-level processes can handle the adaptation to a changing background at each pixel independently, without observing groups of pixels, while region-level processes can refine the raw pixel-level classification with regard to inter-pixel relationships [41].

This paper presents a novel background subtraction method which generates a background model using selected suitable background candidates. Then, through the use of an alarm trigger (AT) module, it detects the pixels of moving objects within the regions determined to significantly feature objects. The organization of the proposed (PRO) method is as follows.
1) A two-phase background matching procedure is used to select suitable background candidates for generation of an updated background model.
2) A block-based entropy evaluation with morphological operations is conducted through a triggered block-based alarm module.
3) Production of motion detection is completed through the automatic threshold selection algorithm.

When compared to other state-of-the-art methods included in the performance study, our method proved to be of higher efficacy. This was indicated by both qualitative and quantitative results through analysis using a wide range of natural video sequences. The remainder of our paper is organized as follows. Section II presents a condensed overview of the various background subtraction approaches used for comparison. Section III contains our proposed motion detection method. Section IV presents the experimental results achieved by our PRO method compared to those of other methods. Section V contains our concluding remarks.

II. Related Work

The major purpose of background subtraction is to generate a reliable background model and thus significantly improve the detection of moving objects [35], [42]. Some state-of-the-art background subtraction methods include simple background subtraction (SBS), running average (RA) [14], Σ–Δ estimation (SDE) [16], multiple Σ–Δ estimation (MSDE) [19], simple statistical difference (SSD) [20], RA with discrete cosine transform (DCT) domain (RADCT) [21], and temporal median filter (TMF) [23]. These methods are briefly reviewed in the following sections.

A. Simple Background Subtraction

Both the reference image B(x, y) and the incoming video frame I_t(x, y) are obtained from the video sequence. A binary motion detection mask D(x, y) is calculated as follows:

D(x, y) = 1 if |I_t(x, y) − B(x, y)| > τ, 0 if |I_t(x, y) − B(x, y)| ≤ τ   (1)

where τ is the predefined threshold which designates pixels as either the background or the moving objects in a video frame. If the absolute difference between the reference image and the incoming video frame does not exceed τ, the pixels of the detection mask are labeled "0," which means they contain background; otherwise, active ones are labeled "1," which designates them as containing moving objects. A significant problem experienced by the SBS method in most real video sequences is that it fails to respond precisely when noise occurs in the incoming video frame I_t(x, y) and static objects occur in the reference image B(x, y) [20]. Note that the reference image B(x, y) represents the fixed background model, which is selected from the test frames [20].

B. Running Average

The problem can be countered by using the RA [14] to generate an adaptive background model for adaptation to temporal changes in the video sequence. RA differs from SBS in that it updates each background image frame B_t(x, y) of the adaptive background model frequently in order to ensure the reliability of motion detection.

The previous background frame B_{t−1}(x, y) and the new incoming video frame I_t(x, y) are then integrated with the current background image. The adaptive background model is attained using the simple adaptive filter as follows:

B_t(x, y) = (1 − β)B_{t−1}(x, y) + βI_t(x, y)   (2)

where β is an empirically adjustable parameter. While a large coefficient β leads to a faster background updating speed, it also causes the creation of artificial trails behind moving objects in the background model. In other words, if objects remain stationary long enough, they become part of the background model.

The binary motion detection mask D(x, y) is based on the SBS method and is defined as follows:

D(x, y) = 1 if |I_t(x, y) − B_t(x, y)| > τ, 0 if |I_t(x, y) − B_t(x, y)| ≤ τ   (3)

where I_t(x, y) is the current incoming video frame, B_t(x, y) is the current background model, and τ is an experimentally predefined threshold to generate the binary motion detection mask.
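As a concrete illustration, the per-pixel operations of eqs. (1)–(3) can be sketched in Python as follows; the threshold and update-rate values below are illustrative placeholders, not the paper's settings:

```python
def sbs_mask(i_t, b, tau):
    """Simple background subtraction, eq. (1): label a pixel as a moving
    object (1) when |I_t(x, y) - B(x, y)| exceeds the threshold tau."""
    return 1 if abs(i_t - b) > tau else 0

def running_average(b_prev, i_t, beta):
    """Running-average update, eq. (2): B_t = (1 - beta) * B_{t-1} + beta * I_t.
    A larger beta adapts faster but leaves artificial trails behind
    moving objects."""
    return (1.0 - beta) * b_prev + beta * i_t

# Eq. (3) is eq. (1) applied against the updated background B_t:
b = running_average(90.0, 100.0, beta=0.1)   # B_t = 91.0
mask = sbs_mask(100, b, tau=5)               # |100 - 91| = 9 > 5, so mask = 1
```

Applying `sbs_mask` against the running-average background at each frame gives the mask of eq. (3); with a fixed reference image instead, it gives eq. (1).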
HUANG: AN ADVANCED MOTION DETECTION ALGORITHM WITH VIDEO QUALITY ANALYSIS FOR VIDEO SURVEILLANCE SYSTEMS 3
C. Σ–Δ Estimation

B_t(x, y) = B_{t−1}(x, y) + sgn(I_t(x, y) − B_{t−1}(x, y))   (5)

where B_t(x, y) is the current background model, B_{t−1}(x, y) is the previous background model, and I_t(x, y) is the current incoming video frame. The intensity of the background model increases or decreases by a value of one through the evaluation of the sgn function at every frame. The image of absolute difference Δ_t(x, y) is then calculated as the estimative difference between I_t(x, y) and B_t(x, y) as follows:

Δ_t(x, y) = |I_t(x, y) − B_t(x, y)|.   (6)

In a similar fashion, the time-variance V_t(x, y) is calculated by utilizing the sgn function, which measures motion activity in order to determine whether each pixel should be designated as "background" or "moving object":

V_t(x, y) = V_{t−1}(x, y) + sgn(N × Δ_t(x, y) − V_{t−1}(x, y))   (7)

where V_t(x, y) is the current time-variance, V_{t−1}(x, y) is the previous time-variance, and N is the predefined parameter, which ranges from 1 to 4.

Based on the generated current time-variance V_t(x, y), the binary motion detection mask D_t(x, y) is obtained as follows:

D_t(x, y) = 1 if Δ_t(x, y) > V_t(x, y), 0 if Δ_t(x, y) ≤ V_t(x, y).   (8)

D. Multiple Σ–Δ Estimation

The SDE method is characterized by its updating period, which features a constant time in which the background model is generated. This in turn causes a constraint when used for certain complex scenes, such as scenes with many moving objects or those with moving objects exhibiting variable motion [19]. Thus, in this situation, the MSDE method is proposed in order to build the adaptive background model. The background model formula is expressed as follows:

b_t^i(x, y) = b_{t−1}^i(x, y) + sgn(b_t^{i−1}(x, y) − b_{t−1}^i(x, y))   (9)

where b_t^i(x, y) is the current ith reference background, b_{t−1}^i(x, y) is the previous ith reference background, each α_i is the predefined confidence value, i is the reference number, R is the total number of references, and B_t(x, y) is the confidence adaptive background model formed from the R reference backgrounds. According to [19], R is experimentally set to 3, and the confidence values α_1, α_2, and α_3 are set to 1, 8, and 16, respectively. Notice that the binary moving objects mask D(x, y) is generated by the same approach as SDE, based on the confidence adaptive background model B_t(x, y).

For certain complex scenes, the MSDE method can detect multiple moving objects with higher degrees of accuracy than the SDE method. This is because the MSDE method generates the binary moving objects mask D(x, y) based on the multimodal background model B_t(x, y), a procedure which requires greater computational complexity.

E. Simple Statistical Difference

The SSD method is based on the use of the mean value and standard deviation, and organizes the background model by computing the mean value of the individual pixels over the previous video frames. Each pixel value µ_xy of the background image is produced from a collection of previous frames in the time interval (t_0, t_{K−1}). For each pixel, a threshold is also represented by the standard deviation σ_xy in the same interval, as follows:

µ_xy = (1/K) Σ_{k=0}^{K−1} I_k(x, y)   (12)

σ_xy = [ (1/K) Σ_{k=0}^{K−1} (I_k(x, y) − µ_xy)² ]^{1/2}.   (13)

In order to achieve motion detection, the absolute difference between the incoming video frame and the background model is calculated. The predefined parameter is represented by λ; an absolute difference larger than λσ_xy describes a pixel which is part of a moving object, while an absolute difference less than or equal to λσ_xy denotes a background pixel, as follows:

D_t(x, y) = 1 if |I_t(x, y) − µ_xy| > λσ_xy, 0 if |I_t(x, y) − µ_xy| ≤ λσ_xy.   (14)
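Two of the models above reduce to a few per-pixel operations. A minimal Python sketch of the Σ–Δ recursion of eqs. (5)–(8) and the SSD statistics of eqs. (12)–(14) follows; N = 2 and the sample values are illustrative choices, not the paper's settings:

```python
import math

def sgn(v):
    """Sign function used by the sigma-delta updates: returns -1, 0, or +1."""
    return (v > 0) - (v < 0)

def sde_step(b, v, i_t, n=2):
    """One sigma-delta estimation step for a single pixel:
    eq. (5): the background steps toward I_t by +/-1;
    eq. (6): absolute difference Delta_t;
    eq. (7): the time-variance steps toward N * Delta_t;
    eq. (8): the pixel is a moving object (1) when Delta_t > V_t."""
    b = b + sgn(i_t - b)          # eq. (5)
    delta = abs(i_t - b)          # eq. (6)
    v = v + sgn(n * delta - v)    # eq. (7)
    d = 1 if delta > v else 0     # eq. (8)
    return b, v, d

def ssd_background(history):
    """Eqs. (12)-(13): per-pixel mean mu_xy and standard deviation
    sigma_xy over the K previous values of one pixel."""
    k = len(history)
    mu = sum(history) / k
    sigma = math.sqrt(sum((v - mu) ** 2 for v in history) / k)
    return mu, sigma

def ssd_mask(i_t, mu, sigma, lam):
    """Eq. (14): a pixel is a moving object (1) when its absolute
    difference from the mean exceeds lambda * sigma_xy."""
    return 1 if abs(i_t - mu) > lam * sigma else 0
```

Iterating `sde_step` over frames lets the background and variance track slow changes with only ±1 updates per pixel, which is the appeal of the Σ–Δ scheme; `ssd_background` instead needs a window of past pixel values.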
Fig. 1. Gray-level distribution. (a) Of the original signal. (b) Of the extracted optimum background pixels.
F. RA with DCT Domain

The RADCT algorithm [21] makes use of an altered RA method [14] in order to model the adaptive background in the DCT domain, a correlation of the same function in the spatial domain. The resultant adaptive background model can be expressed as follows:

…the long-term background model B_t^L(x, y). Each pixel of B_t^L(x, y) exhibits a long-term timer T^L(x, y), which counts the number of frames in which a given pixel exists in B_t^L(x, y), as follows:

T^L(x, y) = T^L(x, y) + 1, if |I_t(x, y) − B_t^L(x, y)| > τ   (16)
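Only the increment branch of the timer update (16) is shown above. A minimal per-pixel sketch of that recoverable branch follows; leaving the timer unchanged in the other case is an assumption of this sketch, not stated in the text:

```python
def long_term_timer(t_l, i_t, b_l, tau):
    """Long-term timer of eq. (16): counts frames in which a pixel of the
    incoming frame I_t differs from the long-term background B_t^L by more
    than tau. Only the increment branch appears in the text; keeping the
    timer unchanged otherwise is an assumption of this sketch."""
    if abs(i_t - b_l) > tau:
        return t_l + 1  # pixel disagrees with the long-term background
    return t_l          # assumed: timer unchanged when within tau
```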
Fig. 3. Flowchart for our proposed background modeling procedure.

III. Proposed Method

In this section, we present a novel motion detection approach for static-camera surveillance scenarios. Our approach achieves complete detection of moving objects and involves three proposed modules: a background modeling (BM) module, an AT module, and an object extraction (OE) module. Initially, the proposed BM module designs a unique two-phase background matching procedure using rapid matching followed by accurate matching in order to produce optimum background pixels for the background model.

In order to drastically reduce the computational complexity of the motion detection process, we propose using an AT module. This module consists of a novel block-based entropy evaluation method developed for the employment of block candidates, after which the most likely moving objects within the motion blocks are determined based on block-based morphological erosion and dilation operations. Our AT module eliminates the unnecessary examination of the entire background region, allowing the OE module to only process blocks containing moving objects.

As the final step of our process, the proposed OE module examines every block which may possibly contain moving objects in order to generate the binary object detection mask. This is accomplished by utilizing a suitable threshold value, which is applied automatically through our proposed effective threshold selection algorithm.

A. Background Modeling

1) Initial Background Model: The modified moving average (MMA) is used to compute the average of frames 1 through K for the initial background model generation. For each pixel (x, y), the corresponding value of the current background model B_t(x, y) is calculated as follows:

B_t(x, y) = B_{t−1}(x, y) + (1/t)(I_t(x, y) − B_{t−1}(x, y))   (18)

where B_{t−1}(x, y) is the previous background model, I_t(x, y) is the current incoming video frame, t is the frame number in the video sequence, and K is experimentally set at 50 to represent the initial background model. In order to reduce frame storage consumption [21], the initial background model adopts the calculated average. This is accomplished by making appropriate use of MMA, which holds only the last background model B_t(x, y) and the current incoming video frame I_t(x, y) during the calculation procedure.

2) Optimum Background Modeling: In order to expeditiously determine background candidates, emphasis is placed on the first rapid matching phase in optimum background modeling (OBM). Fig. 1(a) illustrates the gray-level distribution, which is composed of the stable signal formed by the major part of the background. Unstable signals result only occasionally and indicate the appearance of moving object(s). For the optimum background pixel estimation in the video sequence, the main objective of OBM is to extract the stable signal of the incoming frame in the video sequence, as shown in Fig. 1(b). As can be seen, the signal extracted by OBM is more stable than the un-extracted signals in the other frames.

The framework of OBM can be seen in Fig. 2. It consists of the following steps:
1) immediate determination of background candidates via the rapid matching procedure;
2) use of the stable signal trainer in order to provide a measure of temporal activity of the pixels within the set of background candidates;
3) determination of the optimum background pixels via the accurate matching procedure.

Rapid matching: This procedure is used to quickly find a great quantity of background candidates by determining whether or not their respective pixel values for the incoming video frame I_t(x, y) are equal to the corresponding pixel values of the previous video frame I_{t−1}(x, y). If the values correspond, it indicates good candidate selection for the following stable signal trainer.

Stable signal trainer: All pixels from the set of background candidates selected via the rapid matching procedure are then trained through the stable signal trainer, which is expressed as follows:

M_t(x, y) = M_{t−1}(x, y) + p if I_t(x, y) > M_{t−1}(x, y), M_{t−1}(x, y) − p if I_t(x, y) < M_{t−1}(x, y)   (19)

where M_t(x, y) is the corresponding pixel within the most recent set of background candidates, M_{t−1}(x, y) is the corresponding pixel within the previous set of background candidates, and p represents the real value, which is experimentally set at 1. Notice that the initial background candidate value M_0(x, y) is set at I_0(x, y).

Accurate matching: As shown on the right in Fig. 2, each light-gray pixel of the background candidate is trained to the dark-gray pixel by using the stable signal trainer. The optimum of these are then determined from the dark-gray
pixels when the pixels of M_t(x, y) are equal to I_t(x, y). They are represented here in black.

3) Background Updating: Each optimum background pixel of M_t(x, y) will then be supplied to every frame of the background model B_t(x, y). Based on OBM, the best possible background pixels are then updated for the background model. Here, we adopt a simple moving average method in order to smooth the proposed background model, which yields better results when motion detection is performed. The moving average formula is expressed as follows:

B_t(x, y) = B_{t−1}(x, y) + (1/α)(I_t(x, y) − B_{t−1}(x, y))   (20)

where α is the predefined parameter and, in this paper, is experimentally set at 8. The flowchart for the proposed background model can be seen in Fig. 3.

B. Alarm Trigger Module

After the background model is produced via the BM procedure at each frame, the absolute difference Δ_t(x, y) is generated by the absolute differential estimation between the updated background model B_t(x, y) and the current incoming video frame I_t(x, y).

In order to significantly accelerate the following OE module, we propose that the AT module be comprised of a stepwise procedure involving novel block-based entropy evaluation followed by block-based morphological operations. Detection of each possible motion block candidate is accomplished by the proposed block-based entropy evaluation. Elimination of some of the detected background blocks and completion of the motion blocks are then performed via the block-based morphological erosion and dilation operations.

Suppose that each w × w block (i, j) within the absolute difference Δ_t(x, y) is composed of V discrete gray-levels denoted by {L_0, L_1, L_2, L_3, ..., L_{V−1}}. The block-based probability density function P_h^{(i,j)} is defined as follows:

P_h^{(i,j)} = n_h^{(i,j)} / w²   (21)

E(i, j) = − Σ_{h=0}^{V−1} P_h^{(i,j)} log₂(P_h^{(i,j)}).   (22)

A(i, j) = 1 if E(i, j) > T, 0 otherwise.   (23)

When the calculated entropy of block (i, j) exceeds T, the motion block A(i, j) is labeled "1," denoting that it contains pixels of moving objects. Otherwise, nonactive ones are labeled "0." Table I illustrates the block-based entropy density of the pronounced change for the blocks in a sampled video frame. By setting T equal to 1, possible motion blocks can then be detected.

The elimination of some of the detected background blocks and the completion of motion blocks can then be performed. This is accomplished through the use of block-based erosion and dilation, defined as follows:

A* = δ_{b_{2λ}}(ε_{b_λ}(A))   (24)

where ε is the morphological erosion, δ is the morphological dilation, and the structuring element b_λ is a ball of radius λ, which is experimentally set at 1 in this paper. The flowchart of the block-based alarm trigger (AT) module can be seen in Fig. 4.

C. OE Module

The detection of moving objects can be achieved through the observed change in gray-level illumination of the obtained motion blocks within the absolute difference Δ_t(x, y). However, the critical challenge is obtaining a suitable threshold for binarization [30]. To solve this problem, we propose the effective threshold selection algorithm for use with the OE module in order to produce the binary motion detection mask.

For the first variance estimate, the computation employs the basic stable signal trainer for the estimation of the short-term variance value at each frame when Δ_t(x, y) is not equal to 0. The short-term variance estimation formula can be expressed as follows:

vs_t(x, y) = vs_{t−1}(x, y) + p if N × Δ_t(x, y) > vs_{t−1}(x, y), vs_{t−1}(x, y) − p if N × Δ_t(x, y) < vs_{t−1}(x, y)   (25)

where vs_{t−1}(x, y) represents the previous short-term variance value, and N is the predefined parameter which ranges from
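The block-based entropy trigger of eqs. (21)–(23) can be sketched per block as follows; this is a pure-Python sketch, with the paper's trigger threshold T = 1 used only as a default:

```python
import math

def block_entropy(block):
    """Eqs. (21)-(22): P_h = n_h / w^2 over the gray levels present in a
    w x w block of the absolute difference image, then the entropy
    E(i, j) = -sum(P_h * log2(P_h))."""
    n = sum(len(row) for row in block)  # w * w pixels in the block
    counts = {}
    for row in block:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def motion_block(block, t=1.0):
    """Eq. (23): the block is a motion-block candidate (1) when its
    entropy exceeds the trigger threshold T."""
    return 1 if block_entropy(block) > t else 0
```

A flat background block has a single gray level and therefore zero entropy, so it never triggers; a block straddling a moving object mixes gray levels, and its entropy rises above T.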
TABLE II
Benchmark of the Original Video Sequences Used
Fig. 7. Comparison between (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
Fig. 8. Background images of the RD video sequence generated by (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
Fig. 9. Background images of the ST video sequence generated by (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
We then evaluate the generated background models of different methods for a specific pixel selected from a scene of the 6000-frame RD video sequence displayed in Fig. 7. In this scene, many vehicles pass through a country road, which causes the signal of the observed pixel I_t(30, 20) to tremble frequently. Therefore, the output signal of the background model must be more stable in order to differentiate between the trembling signals of moving objects and those produced by the background. When compared to other state-of-the-art methods, it becomes apparent that the output background signal B_t(30, 20) of our proposed background model in Fig. 7(e) exhibits less variance than those of Fig. 7(a)–(d).

Figs. 8 and 9 show the results of the background models generated by the respective use of SDE, RA, TMF, MSDE, and our PRO method in the RD and ST video sequences. The background images calculated by our proposed model are illustrated in Figs. 8(e) and 9(e). Here, we can observe that our proposed model is easily the most successful in modeling background images, which abstain from generating noise and artificial "ghost" trails.
In addition to background model evaluation, the qualitative motion detection results obtained through our proposed (PRO) method were also compared with those obtained through other state-of-the-art methods. To obtain measurements of quantitative accuracy, different metrics were computed on several video sequences; these included Recall, Precision, F1, and Similarity [22], [42]–[47].

Recall gives the percentage of detected true positives as compared with the total number of moving-object pixels in the ground truth, as follows:

Recall = tp/(tp + fn)   (29)

where tp is the total number of true positive pixels, fn is the total number of false negative pixels, and (tp + fn) is the total number of moving-object pixels in the ground truth.

Precision gives the percentage of true positives among all pixels detected as moving objects in the binary objects mask, as follows:

Precision = tp/(tp + fp)   (30)

where fp is the total number of false positive pixels and (tp + fp) is the total number of pixels detected as moving objects in the binary objects mask.

Unfortunately, Recall measures only the loss of true positive pixels internal to the moving object, and Precision measures only the wrong detections external to it. Therefore, the accuracy measurements produced by Recall and Precision independently of each other cannot offer an adequate comparison between the different methods.
TABLE V
Comparison Between the Binary Objects Masks with Similarity and F1 Values of Each Method in Sequence SC

TABLE VI
Comparison Between the Binary Objects Masks with Similarity and F1 Values of Each Method in Sequence WS

However, there are two accuracy metrics, F1 and Similarity, which can reconcile the accuracy measurements of Recall and Precision by fairly weighting their harmonic balance, as follows:

F1 = 2(Recall)(Precision)/(Recall + Precision)   (31)

Similarity = tp/(tp + fp + fn).   (32)

All metric values range from 0 to 1, with higher values representing greater accuracy.

The qualitative results for six test frames of each video sequence, along with the Similarity and F1 measurements of their binary objects masks obtained by each method, are shown in Tables III–VI. Tables III–VI show the original frames, the ground truth frames, and the test frames with Similarity and F1 accuracy measurements of their binary objects masks; these were generated by the PRO, RADCT, MSDE, SDE, and SSD methods, respectively, with the results shown from top to bottom. Here, we can observe that the PRO method is obviously more precise than the MSDE, SDE, SSD, and RADCT methods for almost every test sequence. The binary objects masks obtained by the PRO method exhibit not only the greatest completeness but also the highest Similarity and F1 accuracy measurements.

In addition to the single binary moving objects mask and the Similarity and F1 measurements of the six sampled frames
our proposed motion detection approach, as not only did the accuracy rates of our procedure exceed those of other methods, but the resulting visual performance was also more pleasing.

References

[1] D. Koller, K. Daniilidis, and H. Nagel, "Model-based object tracking in monocular image sequences of road traffic scenes," Int. J. Comput. Vis., vol. 10, pp. 257–281, Jun. 1993.
[2] Z. Zhu, G. Xu, B. Yang, D. Shi, and X. Lin, "Visatram: A real-time vision system for automatic traffic monitoring," Image Vis. Comput., vol. 18, no. 10, pp. 781–794, Jul. 2000.
[3] S. Dockstader and M. Tekalp, "Multiple camera tracking of interacting and occluded human motion," Proc. IEEE, vol. 89, no. 10, pp. 1441–1455, Oct. 2001.
[4] S. Park and J. Aggarwal, "A hierarchical Bayesian network for event recognition of human actions and interactions," Multimedia Syst., vol. 10, no. 2, pp. 164–179, Aug. 2004.
[5] P. Remagnino, T. Tan, and K. Baker, "Multiagent visual surveillance of dynamic scenes," Image Vis. Comput., vol. 16, pp. 529–532, Jun. 1998.
[6] T. Huang and S. Russell, "Object identification: A Bayesian analysis with application to traffic surveillance," Artif. Intell., vol. 103, nos. 1–2, pp. 77–93, Aug. 1998.
[7] G. L. Foresti, "Real-time system for video surveillance of unattended outdoor environments," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 6, pp. 697–704, Oct. 1998.
[8] M. Haag and H. H. Nagel, "Incremental recognition of traffic situations from video image sequences," Image Vis. Comput., vol. 18, pp. 137–153, Jan. 2000.
[9] T. Darrell, G. G. Gordon, M. Harville, and J. Woodfill, "Integrated person tracking using stereo, color and pattern detection," Int. J. Comput. Vis., vol. 37, pp. 175–185, Jun. 2000.
[10] J. M. Ferryman, S. J. Maybank, and A. D. Worrall, "Visual surveillance for moving vehicles," Int. J. Comput. Vis., vol. 37, no. 2, pp. 187–197, Jun. 2000.
[11] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 809–830, Aug. 2000.
[12] N. M. Oliver, B. Rosario, and A. P. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 831–843, Aug. 2000.
[13] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004.
[14] C. R. Wren, A. Azarbayehani, T. Darrell, and A. P. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Patt. Anal. Mach. Intell., vol. 19, no. 7, pp. 780–785, Jul. 1997.
[15] S. J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, and H. Wechsler, "Tracking groups of people," Comput. Vis. Image Understanding, vol. 80, no. 1, pp. 42–56, Oct. 2000.
[16] A. Manzanera and J. C. Richefeu, "A robust and computationally efficient motion detection algorithm based on Σ-Δ background estimation," in Proc. ICVGIP, 2004, pp. 46–51.
[17] G. Pajares, "A Hopfield neural network for image change detection," IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1250–1264, Sep. 2006.
[18] D. Culibrk, O. Marques, D. Socek, H. Kalva, and B. Furht, "Neural network approach to background modeling for video object segmentation," IEEE Trans. Neural Netw., vol. 18, no. 6, pp. 1614–1627, Nov. 2007.
[19] A. Manzanera and J. C. Richefeu, "A new motion detection algorithm based on Σ-Δ background estimation," Patt. Recog. Lett., vol. 28, pp. 320–328, Feb. 2007.
[20] M. Oral and U. Deniz, "Center of mass model: A novel approach to background modeling for segmentation of moving objects," Image Vis. Comput., vol. 25, pp. 1365–1376, Aug. 2007.
[21] W. Wang, J. Yang, and W. Gao, "Modeling background and segmenting moving objects from compressed video," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 670–681, May 2008.
[22] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008.
[23] B. Shoushtarian and N. Ghasem-Aghaee, "A practical approach to real-time dynamic background generation based on a temporal median filter," J. Sci. Islamic Republic Iran, vol. 14, no. 4, pp. 351–362, 2003.
[24] L. Havasi, Z. Szlavik, and T. Sziranyi, "Detection of gait characteristics for scene registration in video surveillance system," IEEE Trans. Image Process., vol. 16, no. 2, pp. 503–510, Feb. 2007.
[25] D.-M. Tsai and S.-C. Lai, "Independent component analysis-based background subtraction for indoor surveillance," IEEE Trans. Image Process., vol. 18, no. 1, pp. 158–167, Jan. 2009.
[26] H. Murase and R. Sakai, "Moving object recognition in eigenspace representation: Gait analysis and lip reading," Patt. Recog. Lett., vol. 17, pp. 155–162, Feb. 1996.
[27] N. McFarlane and C. Schofield, "Segmentation and tracking of piglets in images," Mach. Vision Applicat., vol. 8, no. 3, pp. 187–193, May 1995.
[28] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.
[29] D. Ayers and M. Shah, "Monitoring human behavior from video taken in an office environment," Image Vis. Comput., vol. 19, pp. 833–846, Oct. 2001.
[30] L. Wang, T. Tan, H. Ning, and W. Hu, "Silhouette analysis-based gait recognition for human identification," IEEE Trans. Patt. Anal. Mach. Intell., vol. 25, no. 12, pp. 1505–1518, Dec. 2003.
[31] J. Hayfron-Acquah, M. Nixon, and J. Carter, "Automatic gait recognition by symmetry analysis," Patt. Recog. Lett., vol. 24, pp. 2175–2183, Sep. 2003.
[32] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, Nov. 2004.
[33] A. M. Elgammal, D. Harwood, and L. S. Davis, "Non-parametric model for background subtraction," in Proc. Eur. Conf. Comput. Vision, 2000, pp. 751–767.
[34] Y.-L. Tian, M. Lu, and A. Hampapur, "Robust and efficient foreground analysis for real-time video surveillance," in Proc. IEEE Int. Conf. Comput. Vision Patt. Recog., vol. 1, Jun. 2005, pp. 1182–1187.
[35] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, "Image change detection algorithms: A systematic survey," IEEE Trans. Image Process., vol. 14, no. 3, pp. 294–307, Mar. 2005.
[36] T. Ridler and S. Calvard, "Picture thresholding using an iterative selection method," IEEE Trans. Syst., Man, Cybern., vol. 8, no. 8, pp. 630–632, Aug. 1978.
[37] N. Otsu, "A threshold selection method from gray level histograms," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, Jan. 1979.
[38] J. Kapur, P. Sahoo, and A. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram," Comput. Vision Graphics Image Process., vol. 29, no. 3, pp. 273–285, 1985.
[39] P. Rosin, "Thresholding for change detection," Comput. Vis. Image Understanding, vol. 86, no. 2, pp. 79–95, May 2002.
[40] L. Snidaro and G. Foresti, "Real-time thresholding with Euler numbers," Patt. Recog. Lett., vol. 24, pp. 1533–1544, Jun. 2003.
[41] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in Proc. IEEE Int. Conf. Comput. Vision, vol. 1, Sep. 1999, pp. 255–261.
[42] G. Gualdi, A. Prati, and R. Cucchiara, "Video streaming for mobile video surveillance," IEEE Trans. Multimedia, vol. 10, no. 6, pp. 1142–1154, Oct. 2008.
[43] C. Benedek and T. Sziranyi, "Bayesian foreground and shadow detection in uncertain frame rate surveillance videos," IEEE Trans. Image Process., vol. 17, no. 4, pp. 608–621, Apr. 2008.
[44] C.-Y. Chen, T.-M. Lin, and W. H. Wolf, "A visible/infrared fusion algorithm for distributed smart cameras," IEEE J. Selected Topics Signal Process., vol. 2, no. 4, pp. 514–525, Aug. 2008.
[45] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Subrahmanian, P. Turaga, and O. Udrea, "A constrained probabilistic Petri Net framework for human activity detection in video," IEEE Trans. Multimedia, vol. 10, no. 6, pp. 982–996, Oct. 2008.
[46] F. Bartolini, A. Tefas, M. Barni, and I. Pitas, "Image authentication techniques for surveillance applications," Proc. IEEE, vol. 89, no. 10, pp. 1403–1418, Oct. 2001.
[47] D. Avitzour, "Novel scene calibration procedure for video surveillance systems," IEEE Trans. Aerospace Electron. Syst., vol. 40, no. 3, pp. 1105–1110, Jul. 2004.