Abstract—Motion detection is the first essential process in the extraction of information regarding moving objects, and makes use of stabilization in functional areas such as tracking, classification, recognition, and so on. In this paper, we propose a novel and accurate approach to motion detection for the automatic video surveillance system. Our method achieves complete detection of moving objects by involving three significant proposed modules: a background modeling (BM) module, an alarm trigger (AT) module, and an object extraction (OE) module. For our proposed BM module, a unique two-phase background matching procedure is performed using rapid matching followed by accurate matching in order to produce optimum background pixels for the background model. Next, our proposed AT module eliminates the unnecessary examination of the entire background region, allowing the subsequent OE module to only process blocks containing moving objects. Finally, the OE module forms the binary object detection mask in order to achieve highly complete detection of moving objects. The detection results produced by our proposed (PRO) method were both qualitatively and quantitatively analyzed through visual inspection and for accuracy, along with comparisons to the results produced by other state-of-the-art methods. The analyses show that our PRO method has a substantially higher degree of efficacy, outperforming other methods by an F1 metric accuracy rate of up to 53.43%.

Index Terms—Background model, entropy, morphology, motion detection, video surveillance.

I. Introduction

IN THE LAST DECADE, video surveillance systems have become an extremely active research area due to increasing levels of terrorist activity and general social problems. This has motivated the development of strong and precise automatic processing systems, an essential tool for safety and security in both the public and private sectors. The need for advanced video surveillance systems has inspired progress in many important areas of science and technology, including traffic monitoring [1], [2], transport networks, traffic flow analysis, understanding of human activity [3], [4], home nursing, monitoring of endangered species, and observation of people and vehicles within a busy environment [5]–[12], along with many others.

The design of an advanced automatic video surveillance system requires the application of many important functions including, but not limited to, motion detection [13]–[25], classification [26], tracking [27], [28], behavior [29], activity analysis, and identification [30], [31]. Motion detection is one of the greatest problem areas in video surveillance, as it is not only responsible for the extraction of moving objects but also critical to many computer vision applications, including object-based video encoding, human motion analysis, and human–machine interaction [32]. Therefore, our focus here is the further development of the motion detection phase for an advanced video surveillance system.

The three major classes of methods for motion detection are background subtraction, temporal differencing, and optical flow [13]. Background subtraction [14]–[23], [33] is the most popular motion detection method and consists of differentiating moving objects from a maintained and updated background model; such methods can be further grouped into parametric and non-parametric types [33]. Based on the implicit assumption along with the choice of parameters, a parametric model may achieve performance that corresponds closely to the real data along with parametric information [22]. On the contrary, a non-parametric model is heavily data dependent, without any parameters [22], [33]. Apart from background subtraction, two other motion detection methods, optical flow and temporal differencing, are discussed in [25]. While the optical flow method shows the projected motion on the image plane with successful approximation of complex background handling, it often requires very high computational complexity, which creates difficulties in its implementation [34]. The temporal differencing method, while effectively adapting to environmental changes, often results in incomplete detection of the shapes of moving objects, due to its sensitive threshold and the noisy, local consistency properties of the change mask [35].

Manuscript received October 22, 2009; revised February 8, 2010; accepted June 16, 2010. Date of publication October 18, 2010; date of current version February 24, 2011. This work was supported by the National Science Council, under Grant NSC 98-2218-E-027-008. This paper was recommended by Associate Editor I. Ahmad. The author is with the Department of Electronic Engineering, National Taipei University of Technology, Taipei 106, Taiwan (e-mail: schuang@ntut.edu.tw). Color versions of one or more of the figures in this paper are available online at http://ieeexplore.ieee.org. Digital Object Identifier 10.1109/TCSVT.2010.2087812

The currently implemented method for background subtraction accomplishes its objective by subtracting each pixel of the incoming video frame from the background model, thus generating an absolute difference. It then applies a threshold to obtain the binary object detection mask [20]. Threshold
2 IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY, VOL. 21, NO. 1, JANUARY 2011
selection is a critical operation and can be conducted by a variety of previously researched methods [36]–[40]. Although the currently implemented background subtraction method is convenient to implement, its noise tolerance in the video frame relies on the determined threshold. Functionalities such as object classification, tracking, behavior, and identification are then performed on the regions where moving objects have been detected.

The computational costs of traditional foreground analysis methods are usually relatively expensive for video surveillance systems based on the traditional optical flow implementation [34]. For more accurate motion detection design, foreground analysis is always needed for the most popular background subtraction method in order to achieve the analysis of the motion information [34].

With respect to background maintenance, pixel-level processes and region-level processes should both be clearly designed into the background subtraction approach [41]. This is because pixel-level processes can handle the adaptation to a changing background at each pixel independently, without observing groups of pixels, while region-level processes can refine the raw pixel-level classification with regard to inter-pixel relationships [41].

This paper presents a novel background subtraction method which generates a background model using selected suitable background candidates. Then, through the use of an alarm trigger (AT) module, it detects the pixels of moving objects within the regions determined to significantly feature objects. The organization of the proposed (PRO) method is as follows.
1) A two-phase background matching procedure is used to select suitable background candidates for generation of an updated background model.
2) A block-based entropy evaluation with morphological operations is conducted through a triggered block-based alarm module.
3) Production of motion detection is completed through the automatic threshold selection algorithm.

When compared to other state-of-the-art methods included in the performance study, our method proved to be of higher efficacy. This was indicated by both qualitative and quantitative results through analysis using a wide range of natural video sequences. The remainder of our paper is organized as follows. Section II presents a condensed overview of the various background subtraction approaches used for comparison. Section III contains our proposed motion detection method. Section IV presents the experimental results achieved by our PRO method compared to those of other methods. Section V contains our concluding remarks.

II. Related Work

The major purpose of background subtraction is to generate a reliable background model and thus significantly improve the detection of moving objects [35], [42]. Some state-of-the-art background subtraction methods include simple background subtraction (SBS), running average (RA) [14], Σ–Δ estimation (SDE) [16], multiple Σ–Δ estimation (MSDE) [19], simple statistical difference (SSD) [20], RA with discrete cosine transform (DCT) domain (RADCT) [21], and temporal median filter (TMF) [23]. These methods are briefly reviewed in the following sections.

A. Simple Background Subtraction

Both the reference image B(x, y) and the incoming video frame I_t(x, y) are obtained from the video sequence. A binary motion detection mask D(x, y) is calculated as follows:

D(x, y) = 1 if |I_t(x, y) − B(x, y)| > τ, 0 if |I_t(x, y) − B(x, y)| ≤ τ   (1)

where τ is the predefined threshold which designates pixels as either the background or the moving objects in a video frame. If the absolute difference between the reference image and the incoming video frame does not exceed τ, the pixels of the detection mask are labeled "0," which means they contain background; otherwise, active ones are labeled "1," which designates them as containing moving objects. A significant problem experienced by the SBS method in most real video sequences is that it fails to respond precisely when noise occurs in the incoming video frame I_t(x, y) and static objects occur in the reference image B(x, y) [20]. Note that the reference image B(x, y) represents the fixed background model, which is selected from the test frames [20].

B. Running Average

The problem can be countered by using the RA [14] to generate an adaptive background model for adaptation to temporal changes in the video sequence. RA differs from SBS in that it updates each background image frame B_t(x, y) of the adaptive background model frequently in order to ensure the reliability of motion detection.

The previous background frame B_{t−1}(x, y) and the new incoming video frame I_t(x, y) are then integrated with the current background image. The adaptive background model is attained using the simple adaptive filter as follows:

B_t(x, y) = (1 − β)B_{t−1}(x, y) + βI_t(x, y)   (2)

where β is an empirically adjustable parameter. While a large coefficient β leads to a faster background updating speed, it also causes the creation of artificial trails behind moving objects in the background model. In other words, if objects remain stationary long enough, they become part of the background model.

The binary motion detection mask D(x, y) is based on the SBS method and is defined as follows:

D(x, y) = 1 if |I_t(x, y) − B_t(x, y)| > τ, 0 if |I_t(x, y) − B_t(x, y)| ≤ τ   (3)

where I_t(x, y) is the current incoming video frame, B_t(x, y) is the current background model, and τ is an experimentally predefined threshold to generate the binary motion detection mask.
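As a concrete illustration, the per-pixel operations of eqs. (1)–(3) can be sketched in Python as follows; the threshold and update-rate values below are illustrative placeholders, not the paper's settings:

```python
def sbs_mask(i_t, b, tau):
    """Simple background subtraction, eq. (1): label a pixel as a moving
    object (1) when |I_t(x, y) - B(x, y)| exceeds the threshold tau."""
    return 1 if abs(i_t - b) > tau else 0

def running_average(b_prev, i_t, beta):
    """Running-average update, eq. (2): B_t = (1 - beta) * B_{t-1} + beta * I_t.
    A larger beta adapts faster but leaves artificial trails behind
    moving objects."""
    return (1.0 - beta) * b_prev + beta * i_t

# Eq. (3) is eq. (1) applied against the updated background B_t:
b = running_average(90.0, 100.0, beta=0.1)   # B_t = 91.0
mask = sbs_mask(100, b, tau=5)               # |100 - 91| = 9 > 5, so mask = 1
```

Applying `sbs_mask` against the running-average background at each frame gives the mask of eq. (3); with a fixed reference image instead, it gives eq. (1).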
HUANG: AN ADVANCED MOTION DETECTION ALGORITHM WITH VIDEO QUALITY ANALYSIS FOR VIDEO SURVEILLANCE SYSTEMS 3
C. Σ–Δ Estimation

B_t(x, y) = B_{t−1}(x, y) + sgn(I_t(x, y) − B_{t−1}(x, y))   (5)

where B_t(x, y) is the current background model, B_{t−1}(x, y) is the previous background model, and I_t(x, y) is the current incoming video frame. The intensity of the background model increases or decreases by a value of one through the evaluation of the sgn function at every frame. The image of absolute difference Δ_t(x, y) is then calculated as the estimative difference between I_t(x, y) and B_t(x, y) as follows:

Δ_t(x, y) = |I_t(x, y) − B_t(x, y)|.   (6)

In a similar fashion, the time-variance V_t(x, y) is calculated by utilizing the sgn function, which measures motion activity in order to determine whether each pixel should be designated as "background" or "moving object":

V_t(x, y) = V_{t−1}(x, y) + sgn(N × Δ_t(x, y) − V_{t−1}(x, y))   (7)

where V_t(x, y) is the current time-variance, V_{t−1}(x, y) is the previous time-variance, and N is the predefined parameter, which ranges from 1 to 4.

Based on the generated current time-variance V_t(x, y), the binary motion detection mask D_t(x, y) is obtained as follows:

D_t(x, y) = 1 if Δ_t(x, y) > V_t(x, y), 0 if Δ_t(x, y) ≤ V_t(x, y).   (8)

D. Multiple Σ–Δ Estimation

The SDE method is characterized by its updating period, which features a constant time in which the background model is generated. This in turn causes a constraint when used for certain complex scenes, such as scenes with many moving objects or those with moving objects exhibiting variable motion [19]. Thus, in this situation, the MSDE method is proposed in order to build the adaptive background model. The background model formula is expressed as follows:

b_t^i(x, y) = b_{t−1}^i(x, y) + sgn(b_t^{i−1}(x, y) − b_{t−1}^i(x, y))   (9)

where b_t^i(x, y) is the current ith reference background, b_{t−1}^i(x, y) is the previous ith reference background, each α_i is the predefined confidence value, i is the reference number, R is the total number of references, and B_t(x, y) is the confidence adaptive background model formed from the R reference backgrounds. According to [19], R is experimentally set to 3, and the confidence values α_1, α_2, and α_3 are set to 1, 8, and 16, respectively. Notice that the binary moving objects mask D(x, y) is generated by the same approach as SDE, based on the confidence adaptive background model B_t(x, y).

For certain complex scenes, the MSDE method can detect multiple moving objects with higher degrees of accuracy than the SDE method. This is because the MSDE method generates the binary moving objects mask D(x, y) based on the multimodal background model B_t(x, y), a procedure which requires greater computational complexity.

E. Simple Statistical Difference

The SSD method is based on the use of the mean value and standard deviation, and organizes the background model by computing the mean value of the individual pixels over the previous video frames. Each pixel value µ_xy of the background image is produced from a collection of previous frames in the time interval (t_0, t_{K−1}). For each pixel, a threshold is also represented by the standard deviation σ_xy in the same interval, as follows:

µ_xy = (1/K) Σ_{k=0}^{K−1} I_k(x, y)   (12)

σ_xy = [ (1/K) Σ_{k=0}^{K−1} (I_k(x, y) − µ_xy)² ]^{1/2}.   (13)

In order to achieve motion detection, the absolute difference between the incoming video frame and the background model is calculated. The predefined parameter is represented by λ; an absolute difference larger than λσ_xy describes a pixel which is part of a moving object, while an absolute difference less than or equal to λσ_xy denotes a background pixel, as follows:

D_t(x, y) = 1 if |I_t(x, y) − µ_xy| > λσ_xy, 0 if |I_t(x, y) − µ_xy| ≤ λσ_xy.   (14)
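Two of the models above reduce to a few per-pixel operations. A minimal Python sketch of the Σ–Δ recursion of eqs. (5)–(8) and the SSD statistics of eqs. (12)–(14) follows; N = 2 and the sample values are illustrative choices, not the paper's settings:

```python
import math

def sgn(v):
    """Sign function used by the sigma-delta updates: returns -1, 0, or +1."""
    return (v > 0) - (v < 0)

def sde_step(b, v, i_t, n=2):
    """One sigma-delta estimation step for a single pixel:
    eq. (5): the background steps toward I_t by +/-1;
    eq. (6): absolute difference Delta_t;
    eq. (7): the time-variance steps toward N * Delta_t;
    eq. (8): the pixel is a moving object (1) when Delta_t > V_t."""
    b = b + sgn(i_t - b)          # eq. (5)
    delta = abs(i_t - b)          # eq. (6)
    v = v + sgn(n * delta - v)    # eq. (7)
    d = 1 if delta > v else 0     # eq. (8)
    return b, v, d

def ssd_background(history):
    """Eqs. (12)-(13): per-pixel mean mu_xy and standard deviation
    sigma_xy over the K previous values of one pixel."""
    k = len(history)
    mu = sum(history) / k
    sigma = math.sqrt(sum((v - mu) ** 2 for v in history) / k)
    return mu, sigma

def ssd_mask(i_t, mu, sigma, lam):
    """Eq. (14): a pixel is a moving object (1) when its absolute
    difference from the mean exceeds lambda * sigma_xy."""
    return 1 if abs(i_t - mu) > lam * sigma else 0
```

Iterating `sde_step` over frames lets the background and variance track slow changes with only ±1 updates per pixel, which is the appeal of the Σ–Δ scheme; `ssd_background` instead needs a window of past pixel values.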
Fig. 1. Gray-level distribution. (a) Of the original signal. (b) Of the extracted optimum background pixels.
F. RA with DCT Domain

The RADCT algorithm [21] makes use of an altered RA method [14] in order to model the adaptive background in the DCT domain, a correlation of the same function in the spatial domain. The resultant adaptive background model can be expressed as follows:

…the long-term background model B_t^L(x, y). Each pixel of B_t^L(x, y) exhibits a long-term timer T^L(x, y), which counts the number of frames in which a given pixel exists in B_t^L(x, y), as follows:

T^L(x, y) = T^L(x, y) + 1, if |I_t(x, y) − B_t^L(x, y)| > τ   (16)
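Only the increment branch of the timer update (16) is shown above. A minimal per-pixel sketch of that recoverable branch follows; leaving the timer unchanged in the other case is an assumption of this sketch, not stated in the text:

```python
def long_term_timer(t_l, i_t, b_l, tau):
    """Long-term timer of eq. (16): counts frames in which a pixel of the
    incoming frame I_t differs from the long-term background B_t^L by more
    than tau. Only the increment branch appears in the text; keeping the
    timer unchanged otherwise is an assumption of this sketch."""
    if abs(i_t - b_l) > tau:
        return t_l + 1  # pixel disagrees with the long-term background
    return t_l          # assumed: timer unchanged when within tau
```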
Fig. 3. Flowchart for our proposed background modeling procedure.

III. Proposed Method

In this section, we present a novel motion detection approach for static-camera surveillance scenarios. Our approach achieves complete detection of moving objects and involves three proposed modules: a background modeling (BM) module, an AT module, and an object extraction (OE) module. Initially, the proposed BM module designs a unique two-phase background matching procedure using rapid matching followed by accurate matching in order to produce optimum background pixels for the background model.

In order to drastically reduce the computational complexity of the motion detection process, we propose using an AT module. This module consists of a novel block-based entropy evaluation method developed for the employment of block candidates, after which the most likely moving objects within the motion blocks are determined based on block-based morphological erosion and dilation operations. Our AT module eliminates the unnecessary examination of the entire background region, allowing the OE module to only process blocks containing moving objects.

As the final step of our process, the proposed OE module examines every block which may possibly contain moving objects in order to generate the binary object detection mask. This is accomplished by utilizing a suitable threshold value, which is applied automatically through our proposed effective threshold selection algorithm.

A. Background Modeling

1) Initial Background Model: The modified moving average (MMA) is used to compute the average of frames 1 through K for the initial background model generation. For each pixel (x, y), the corresponding value of the current background model B_t(x, y) is calculated as follows:

B_t(x, y) = B_{t−1}(x, y) + (1/t)(I_t(x, y) − B_{t−1}(x, y))   (18)

where B_{t−1}(x, y) is the previous background model, I_t(x, y) is the current incoming video frame, t is the frame number in the video sequence, and K is experimentally set at 50 to represent the initial background model. In order to reduce frame storage consumption [21], the initial background model adopts the calculated average. This is accomplished by making appropriate use of MMA, which holds only the last background model B_t(x, y) and the current incoming video frame I_t(x, y) during the calculation procedure.

2) Optimum Background Modeling: In order to expeditiously determine background candidates, emphasis is placed on the first rapid matching phase in optimum background modeling (OBM). Fig. 1(a) illustrates the gray-level distribution, which is composed of the stable signal formed by the major part of the background. Unstable signals result only occasionally and indicate the appearance of moving object(s). For the optimum background pixel estimation in the video sequence, the main objective of OBM is to extract the stable signal of the incoming frame in the video sequence, as shown in Fig. 1(b). As can be seen, the signal extracted by OBM is more stable than the un-extracted signals in the other frames.

The framework of OBM can be seen in Fig. 2. It consists of the following steps:
1) immediate determination of background candidates via the rapid matching procedure;
2) use of the stable signal trainer in order to provide a measure of temporal activity of the pixels within the set of background candidates;
3) determination of the optimum background pixels via the accurate matching procedure.

Rapid matching: This procedure is used to quickly find a great quantity of background candidates by determining whether or not their respective pixel values for the incoming video frame I_t(x, y) are equal to the corresponding pixel values of the previous video frame I_{t−1}(x, y). If the values correspond, it indicates good candidate selection for the following stable signal trainer.

Stable signal trainer: All pixels from the set of background candidates selected via the rapid matching procedure are then trained through the stable signal trainer, which is expressed as follows:

M_t(x, y) = M_{t−1}(x, y) + p if I_t(x, y) > M_{t−1}(x, y), M_{t−1}(x, y) − p if I_t(x, y) < M_{t−1}(x, y)   (19)

where M_t(x, y) is the corresponding pixel within the most recent set of background candidates, M_{t−1}(x, y) is the corresponding pixel within the previous set of background candidates, and p represents the real value, which is experimentally set at 1. Notice that the initial background candidate value M_0(x, y) is set at I_0(x, y).

Accurate matching: As shown on the right in Fig. 2, each light-gray pixel of the background candidate is trained to the dark-gray pixel by using the stable signal trainer. The optimum of these are then determined from the dark-gray
pixels when the pixels of M_t(x, y) are equal to I_t(x, y). They are represented here in black.

3) Background Updating: Each optimum background pixel of M_t(x, y) will then be supplied to every frame of the background model B_t(x, y). Based on OBM, the best possible background pixels are then updated for the background model. Here, we adopt a simple moving average method in order to smooth the proposed background model, which yields better results when motion detection is performed. The moving average formula is expressed as follows:

B_t(x, y) = B_{t−1}(x, y) + (1/α)(I_t(x, y) − B_{t−1}(x, y))   (20)

where α is the predefined parameter and, in this paper, is experimentally set at 8. The flowchart for the proposed background model can be seen in Fig. 3.

B. Alarm Trigger Module

After the background model is produced via the BM procedure at each frame, the absolute difference Δ_t(x, y) is generated by the absolute differential estimation between the updated background model B_t(x, y) and the current incoming video frame I_t(x, y).

In order to significantly accelerate the following OE module, we propose that the AT module be comprised of a stepwise procedure involving novel block-based entropy evaluation followed by block-based morphological operations. Detection of each possible motion block candidate is accomplished by the proposed block-based entropy evaluation. Elimination of some of the detected background blocks and completion of the motion blocks are then performed via the block-based morphological erosion and dilation operations.

Suppose that each w × w block (i, j) within the absolute difference Δ_t(x, y) is composed of V discrete gray-levels denoted by {L_0, L_1, L_2, L_3, ..., L_{V−1}}. The block-based probability density function P_h^{(i,j)} is defined as follows:

P_h^{(i,j)} = n_h^{(i,j)} / w²   (21)

E(i, j) = − Σ_{h=0}^{V−1} P_h^{(i,j)} log₂(P_h^{(i,j)}).   (22)

A(i, j) = 1 if E(i, j) > T, 0 otherwise.   (23)

When the calculated entropy of block (i, j) exceeds T, the motion block A(i, j) is labeled "1," denoting that it contains pixels of moving objects. Otherwise, nonactive ones are labeled "0." Table I illustrates the block-based entropy density of the pronounced change for the blocks in a sampled video frame. By setting T equal to 1, possible motion blocks can then be detected.

The elimination of some of the detected background blocks and the completion of motion blocks can then be performed. This is accomplished through the use of block-based erosion and dilation, defined as follows:

A* = δ_{b_{2λ}}(ε_{b_λ}(A))   (24)

where ε is the morphological erosion, δ is the morphological dilation, and the structuring element b_λ is a ball of radius λ, which is experimentally set at 1 in this paper. The flowchart of the block-based alarm trigger (AT) module can be seen in Fig. 4.

C. OE Module

The detection of moving objects can be achieved through the observed change in gray-level illumination of the obtained motion blocks within the absolute difference Δ_t(x, y). However, the critical challenge is obtaining a suitable threshold for binarization [30]. To solve this problem, we propose the effective threshold selection algorithm for use with the OE module in order to produce the binary motion detection mask.

For the first variance estimate, the computation employs the basic stable signal trainer for the estimation of the short-term variance value at each frame when Δ_t(x, y) is not equal to 0. The short-term variance estimation formula can be expressed as follows:

vs_t(x, y) = vs_{t−1}(x, y) + p if N × Δ_t(x, y) > vs_{t−1}(x, y), vs_{t−1}(x, y) − p if N × Δ_t(x, y) < vs_{t−1}(x, y)   (25)

where vs_{t−1}(x, y) represents the previous short-term variance value, and N is the predefined parameter which ranges from
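The block-based entropy trigger of eqs. (21)–(23) can be sketched per block as follows; this is a pure-Python sketch, with the paper's trigger threshold T = 1 used only as a default:

```python
import math

def block_entropy(block):
    """Eqs. (21)-(22): P_h = n_h / w^2 over the gray levels present in a
    w x w block of the absolute difference image, then the entropy
    E(i, j) = -sum(P_h * log2(P_h))."""
    n = sum(len(row) for row in block)  # w * w pixels in the block
    counts = {}
    for row in block:
        for v in row:
            counts[v] = counts.get(v, 0) + 1
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def motion_block(block, t=1.0):
    """Eq. (23): the block is a motion-block candidate (1) when its
    entropy exceeds the trigger threshold T."""
    return 1 if block_entropy(block) > t else 0
```

A flat background block has a single gray level and therefore zero entropy, so it never triggers; a block straddling a moving object mixes gray levels, and its entropy rises above T.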
TABLE II
Benchmark of the Original Video Sequences Used
Fig. 7. Comparison between (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
Fig. 8. Background images of the RD video sequence generated by (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
Fig. 9. Background images of the ST video sequence generated by (a) SDE, (b) RA, (c) TMF, (d) MSDE, and (e) our proposed background model.
We then evaluate the generated background models of different methods for a specific pixel selected from a scene of the 6000-frame RD video sequence displayed in Fig. 7. In this scene, many vehicles pass through a country road, which causes the signal of the observed pixel I_t(30, 20) to tremble frequently. Therefore, the output signal of the background model must be more stable in order to differentiate between the trembling signals of moving objects and those produced by the background. When compared to other state-of-the-art methods, it becomes apparent that the output background signal B_t(30, 20) of our proposed background model in Fig. 7(e) exhibits less variance than those of Fig. 7(a)–(d).

Figs. 8 and 9 show the results of the background models generated by the respective use of SDE, RA, TMF, MSDE, and our PRO method in the RD and ST video sequences. The background images calculated by our proposed model are illustrated in Figs. 8(e) and 9(e). Here, we can observe that our proposed model is easily the most successful in modeling background images, which abstain from generating noise and artificial "ghost" trails.
In addition to background model evaluation, the qualitative motion detection results obtained through our proposed (PRO) method were also compared with those obtained through other state-of-the-art methods. To obtain measurements of quantitative accuracy, different metrics were computed on several video sequences; these included Recall, Precision, F1, and Similarity [22], [42]–[47].

Recall gives the percentage of detected true positives as compared with the total number of moving-object pixels in the ground truth, as follows:

Recall = tp/(tp + fn)   (29)

where tp is the total number of true positive pixels, fn is the total number of false negative pixels, and (tp + fn) is the total number of moving-object pixels in the ground truth.

Precision gives the percentage of true positives among all pixels detected as moving objects in the binary objects mask, as follows:

Precision = tp/(tp + fp)   (30)

where fp is the total number of false positive pixels and (tp + fp) is the total number of pixels detected as moving objects in the binary objects mask.

Unfortunately, Recall measures only the loss of true positive pixels internal to the moving object, and Precision measures only the wrong detections external to it. Therefore, the accuracy measurements produced by Recall and Precision independently of each other cannot offer an adequate comparison between the different methods.
TABLE V
Comparison Between the Binary Objects Masks with Similarity and F1 Values of Each Method in Sequence SC

TABLE VI
Comparison Between the Binary Objects Masks with Similarity and F1 Values of Each Method in Sequence WS

However, there are two accuracy metrics, F1 and Similarity, which can reconcile the accuracy measurements of Recall and Precision by fairly weighting their harmonic balance, as follows:

F1 = 2(Recall)(Precision)/(Recall + Precision)   (31)

Similarity = tp/(tp + fp + fn).   (32)

All metric values range from 0 to 1, with higher values representing greater accuracy.

The qualitative results for six test frames of each video sequence, along with the Similarity and F1 measurements of their binary objects masks obtained by each method, are shown in Tables III–VI. Tables III–VI show the original frames, the ground truth frames, and the test frames with Similarity and F1 accuracy measurements of their binary objects masks; these were generated by the PRO, RADCT, MSDE, SDE, and SSD methods, respectively, with the results shown from top to bottom. Here, we can observe that the PRO method is obviously more precise than the MSDE, SDE, SSD, and RADCT methods for almost every test sequence. The binary objects masks obtained by the PRO method exhibit not only the greatest completeness but also the highest Similarity and F1 accuracy measurements.

In addition to the single binary moving objects mask and the Similarity and F1 measurements of the six sampled frames
our proposed motion detection approach, as not only did the accuracy rates of our procedure exceed those of other methods, but the resulting visual performance was also more pleasing.

References

[1] D. Koller, K. Daniilidis, and H. Nagel, "Model-based object tracking in monocular image sequences of road traffic scenes," Int. J. Comput. Vis., vol. 10, pp. 257–281, Jun. 1993.
[2] Z. Zhu, G. Xu, B. Yang, D. Shi, and X. Lin, "Visatram: A real-time vision system for automatic traffic monitoring," Image Vis. Comput., vol. 18, no. 10, pp. 781–794, Jul. 2000.
[3] S. Dockstader and M. Tekalp, "Multiple camera tracking of interacting and occluded human motion," Proc. IEEE, vol. 89, no. 10, pp. 1441–1455, Oct. 2001.
[4] S. Park and J. Aggarwal, "A hierarchical Bayesian network for event recognition of human actions and interactions," Multimedia Syst., vol. 10, no. 2, pp. 164–179, Aug. 2004.
[5] P. Remagnino, T. Tan, and K. Baker, "Multiagent visual surveillance of dynamic scenes," Image Vis. Comput., vol. 16, pp. 529–532, Jun. 1998.
[6] T. Huang and S. Russell, "Object identification: A Bayesian analysis with application to traffic surveillance," Artif. Intell., vol. 103, nos. 1–2, pp. 77–93, Aug. 1998.
[7] G. L. Foresti, "Real-time system for video surveillance of unattended outdoor environments," IEEE Trans. Circuits Syst. Video Technol., vol. 8, no. 6, pp. 697–704, Oct. 1998.
[8] M. Haag and H. H. Nagel, "Incremental recognition of traffic situations from video image sequences," Image Vis. Comput., vol. 18, pp. 137–153, Jan. 2000.
[9] T. Darrell, G. G. Gordon, M. Harville, and J. Woodfill, "Integrated person tracking using stereo, color and pattern detection," Int. J. Comput. Vis., vol. 37, pp. 175–185, Jun. 2000.
[10] J. M. Ferryman, S. J. Maybank, and A. D. Worrall, "Visual surveillance for moving vehicles," Int. J. Comput. Vis., vol. 37, no. 2, pp. 187–197, Jun. 2000.
[11] I. Haritaoglu, D. Harwood, and L. S. Davis, "W4: Real-time surveillance of people and their activities," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 809–830, Aug. 2000.
[12] N. M. Oliver, B. Rosario, and A. P. Pentland, "A Bayesian computer vision system for modeling human interactions," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 831–843, Aug. 2000.
[13] W. Hu, T. Tan, L. Wang, and S. Maybank, "A survey on visual surveillance of object motion and behaviors," IEEE Trans. Syst., Man, Cybern. C, Appl. Rev., vol. 34, no. 3, pp. 334–352, Aug. 2004.
[14] C. R. Wren, A. Azarbayehani, T. Darrell, and A. P. Pentland, "Pfinder: Real-time tracking of the human body," IEEE Trans. Patt. Anal. Mach. Intell., vol. 19, no. 7, pp. 780–785, Jul. 1997.
[15] S. J. McKenna, S. Jabri, Z. Duric, A. Rosenfeld, and H. Wechsler, "Tracking groups of people," Comput. Vis. Image Understanding, vol. 80, no. 1, pp. 42–56, Oct. 2000.
[16] A. Manzanera and J. C. Richefeu, "A robust and computationally efficient motion detection algorithm based on Σ-Δ background estimation," in Proc. ICVGIP, 2004, pp. 46–51.
[17] G. Pajares, "A Hopfield neural network for image change detection," IEEE Trans. Neural Netw., vol. 17, no. 5, pp. 1250–1264, Sep. 2006.
[18] D. Culibrk, O. Marques, D. Socek, H. Kalva, and B. Furht, "Neural network approach to background modeling for video object segmentation," IEEE Trans. Neural Netw., vol. 18, no. 6, pp. 1614–1627, Nov. 2007.
[19] A. Manzanera and J. C. Richefeu, "A new motion detection algorithm based on Σ-Δ background estimation," Patt. Recog. Lett., vol. 28, pp. 320–328, Feb. 2007.
[20] M. Oral and U. Deniz, "Center of mass model: A novel approach to background modeling for segmentation of moving objects," Image Vis. Comput., vol. 25, pp. 1365–1376, Aug. 2007.
[21] W. Wang, J. Yang, and W. Gao, "Modeling background and segmenting moving objects from compressed video," IEEE Trans. Circuits Syst. Video Technol., vol. 18, no. 5, pp. 670–681, May 2008.
[22] L. Maddalena and A. Petrosino, "A self-organizing approach to background subtraction for visual surveillance applications," IEEE Trans. Image Process., vol. 17, no. 7, pp. 1168–1177, Jul. 2008.
[23] B. Shoushtarian and N. Ghasem-Aghaee, "A practical approach to real-time dynamic background generation based on a temporal median filter," J. Sci. Islamic Republic Iran, vol. 14, no. 4, pp. 351–362, 2003.
[24] L. Havasi, Z. Szlavik, and T. Sziranyi, "Detection of gait characteristics for scene registration in video surveillance system," IEEE Trans. Image Process., vol. 16, no. 2, pp. 503–510, Feb. 2007.
[25] D.-M. Tsai and S.-C. Lai, "Independent component analysis-based background subtraction for indoor surveillance," IEEE Trans. Image Process., vol. 18, no. 1, pp. 158–167, Jan. 2009.
[26] H. Murase and R. Sakai, "Moving object recognition in eigenspace representation: Gait analysis and lip reading," Patt. Recog. Lett., vol. 17, pp. 155–162, Feb. 1996.
[27] N. McFarlane and C. Schofield, "Segmentation and tracking of piglets in images," Mach. Vision Applicat., vol. 8, no. 3, pp. 187–193, May 1995.
[28] C. Stauffer and W. E. L. Grimson, "Learning patterns of activity using real-time tracking," IEEE Trans. Patt. Anal. Mach. Intell., vol. 22, no. 8, pp. 747–757, Aug. 2000.
[29] D. Ayers and M. Shah, "Monitoring human behavior from video taken in an office environment," Image Vis. Comput., vol. 19, pp. 833–846, Oct. 2001.
[30] L. Wang, T. Tan, H. Ning, and W. Hu, "Silhouette analysis-based gait recognition for human identification," IEEE Trans. Patt. Anal. Mach. Intell., vol. 25, no. 12, pp. 1505–1518, Dec. 2003.
[31] J. Hayfron-Acquah, M. Nixon, and J. Carter, "Automatic gait recognition by symmetry analysis," Patt. Recog. Lett., vol. 24, pp. 2175–2183, Sep. 2003.
[32] L. Li, W. Huang, I. Y.-H. Gu, and Q. Tian, "Statistical modeling of complex backgrounds for foreground object detection," IEEE Trans. Image Process., vol. 13, no. 11, pp. 1459–1472, Nov. 2004.
[33] A. M. Elgammal, D. Harwood, and L. S. Davis, "Non-parametric model for background subtraction," in Proc. Eur. Conf. Comput. Vision, 2000, pp. 751–767.
[34] Y.-L. Tian, M. Lu, and A. Hampapur, "Robust and efficient foreground analysis for real-time video surveillance," in Proc. IEEE Int. Conf. Comput. Vision Patt. Recog., vol. 1, Jun. 2005, pp. 1182–1187.
[35] R. J. Radke, S. Andra, O. Al-Kofahi, and B. Roysam, "Image change detection algorithms: A systematic survey," IEEE Trans. Image Process., vol. 14, no. 3, pp. 294–307, Mar. 2005.
[36] T. Ridler and S. Calvard, "Picture thresholding using an iterative selection method," IEEE Trans. Syst., Man, Cybern., vol. 8, no. 8, pp. 630–632, Aug. 1978.
[37] N. Otsu, "A threshold selection method from gray level histograms," IEEE Trans. Syst., Man, Cybern., vol. 9, no. 1, pp. 62–66, Jan. 1979.
[38] J. Kapur, P. Sahoo, and A. Wong, "A new method for gray-level picture thresholding using the entropy of the histogram," Comput. Vision Graphics Image Process., vol. 29, no. 3, pp. 273–285, 1985.
[39] P. Rosin, "Thresholding for change detection," Comput. Vis. Image Understanding, vol. 86, no. 2, pp. 79–95, May 2002.
[40] L. Snidaro and G. Foresti, "Real-time thresholding with Euler numbers," Patt. Recog. Lett., vol. 24, pp. 1533–1544, Jun. 2003.
[41] K. Toyama, J. Krumm, B. Brumitt, and B. Meyers, "Wallflower: Principles and practice of background maintenance," in Proc. IEEE Int. Conf. Comput. Vision, vol. 1, Sep. 1999, pp. 255–261.
[42] G. Gualdi, A. Prati, and R. Cucchiara, "Video streaming for mobile video surveillance," IEEE Trans. Multimedia, vol. 10, no. 6, pp. 1142–1154, Oct. 2008.
[43] C. Benedek and T. Sziranyi, "Bayesian foreground and shadow detection in uncertain frame rate surveillance videos," IEEE Trans. Image Process., vol. 17, no. 4, pp. 608–621, Apr. 2008.
[44] C.-Y. Chen, T.-M. Lin, and W. H. Wolf, "A visible/infrared fusion algorithm for distributed smart cameras," IEEE J. Selected Topics Signal Process., vol. 2, no. 4, pp. 514–525, Aug. 2008.
[45] M. Albanese, R. Chellappa, V. Moscato, A. Picariello, V. S. Subrahmanian, P. Turaga, and O. Udrea, "A constrained probabilistic Petri Net framework for human activity detection in video," IEEE Trans. Multimedia, vol. 10, no. 6, pp. 982–996, Oct. 2008.
[46] F. Bartolini, A. Tefas, M. Barni, and I. Pitas, "Image authentication techniques for surveillance applications," Proc. IEEE, vol. 89, no. 10, pp. 1403–1418, Oct. 2001.
[47] D. Avitzour, "Novel scene calibration procedure for video surveillance systems," IEEE Trans. Aerospace Electron. Syst., vol. 40, no. 3, pp. 1105–1110, Jul. 2004.