Energy Based Surveillance Systems For ATM Machines
E(n) = Σ_{i=1}^{W} Σ_{j=1}^{H} w_{i,j}(n) · v_{i,j}²(n)   (1)
The parameter v_{i,j}(n) is the velocity of the (i-th, j-th) pixel in the n-th frame (Width = W, Height = H), and the coefficient w_{i,j}(n) is obtained from the blob area of the current frame to describe the number of people in the blob. This equation has been defined and discussed at full length in [11], [12].
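As a concrete illustration, Eq. (1) amounts to a single weighted sum over all pixels of a frame. A minimal NumPy sketch (the function name and array shapes are ours, not from the paper):

```python
import numpy as np

def kinetic_energy(w, v):
    """Eq. (1): E(n) = sum over all pixels of w_ij(n) * v_ij(n)^2,
    where w and v are H x W arrays of per-pixel weights and speeds."""
    return float(np.sum(w * v ** 2))

# toy example: uniform weights, every pixel moving at speed 2
E = kinetic_energy(np.ones((2, 2)), np.full((2, 2), 2.0))
```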
In this paper, the same equation is used. The coefficient of kinetic energy, w_{i,j}(n), should be reset for the identification of violence, since collisions, rotations and vibrations occur constantly in the process of human body movements. Therefore, our question focuses on how to describe the discordance and complexity of motion. For this purpose, we analyze the angle component of the optical flow motion field.
B. Angle Field Analysis
The angle component of a motion field, referred to as the Angle Field for short, can be obtained through the following steps:
1) Calculate the horizontal u_{i,j}(n) and vertical v_{i,j}(n) components of every pixel in the current frame and express them in complex form, u_{i,j}(n) + j·v_{i,j}(n).
2) Obtain the velocity component as the complex modulus (magnitude) of the complex matrix, V_{i,j}(n) = |u_{i,j}(n) + j·v_{i,j}(n)|. Build MASK(n) by applying a threshold V_min to the velocity component matrix.
3) Obtain the angle component as the phase angle of the complex matrix, and apply MASK(n) to get the final Angle Field, A_{i,j}(n) = arctan(u_{i,j}(n) / v_{i,j}(n)) · MASK_{i,j}(n). The purpose of the mask is to reduce noise from the background.
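The three steps above can be sketched with NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name is ours, the default V_min of 0.01 is taken from the velocity threshold in Table II, and arctan2 is used so that arctan(u/v) is quadrant-safe.

```python
import numpy as np

def angle_field(u, v, v_min=0.01):
    """Steps 1-3: build the masked Angle Field from the horizontal (u) and
    vertical (v) optical-flow components of one frame (H x W arrays)."""
    flow = u + 1j * v            # step 1: complex form u + j*v
    speed = np.abs(flow)         # step 2: velocity magnitude V = |u + j*v|
    mask = speed > v_min         # MASK(n): drop near-static background pixels
    angle = np.arctan2(u, v)     # step 3: phase angle, arctan(u / v)
    return angle * mask, speed, mask

# toy 2x2 flow field: two fast pixels, two near-static ones
u = np.array([[1.0, 0.0], [0.001, 0.0]])
v = np.array([[0.0, 0.002], [0.0, 1.0]])
A, speed, mask = angle_field(u, v)
```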
Fig. 4 Motion Field and Angle Distribution. The left four images describe normal behaviors and the right four abnormal behaviors. Sub-figure (a) is the current frame. (b) is the optical flow field; the red point in (b) indicates the pixel with the highest velocity, which is mostly located on the head and limbs and changes position frequently with the object's posture. (d) is a zoomed view of the red box in (b), showing detail around the red point. (c) is the histogram of the angle field.
Fig. 4 shows the angle distributions of the normal and abnormal situations separately. The angles concentrate in a certain direction when the object moves through the view normally. When aggressive events occur, the angle field presents a near-uniform distribution. Next, we use two coefficients to describe this angle discordance.
C. Weighted Coefficients Design
In our algorithm, two coefficients are adopted. The first is Angle in Table I, which indicates the angle difference between adjacent frames. The second is designed to represent the inconsistency of angles within the current frame. To achieve this, we need a benchmark angle value. Three kinds of angle can be considered as the benchmark. The first is the average direction, which represents the common motion direction of the body parts. The second is the motion direction of the object centroid, which shows the general motion trend of the object. The one we adopt is the direction of the pixel with the highest speed, because it is situated in the most intense conflicting region, to which we should pay the most attention.
We name this coefficient AngleM; it can be obtained through the following steps:
1) Find the pixel with the highest speed and use its angle as the benchmark angle AngleMax.
2) Calculate the difference between every pixel in the flow field and AngleMax, Eq. (2):

AngleM_{i,j}(n) = A_{i,j}(n) − AngleMax   (2)

3) Some elements of the matrix AngleM_{i,j}(n) must be wrapped into the range (−π, π): if a value of AngleM_{i,j} is less than −π or greater than π, add 2π or −2π to it. AngleM_{i,j}(n) is then the difference from the benchmark angle.
4) Before application, the coefficients should be normalized into the range (0, 1). First, calculate the absolute matrix of AngleM_{i,j}(n), then divide AngleM_{i,j}(n) by π.
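The four steps can be sketched as follows. The function takes the Angle Field A, the per-pixel speeds, and the velocity mask from the previous stage; all names are ours, for illustration only.

```python
import numpy as np

def angle_m(A, speed, mask):
    """Steps 1-4: normalized angle difference to the fastest pixel."""
    # step 1: benchmark angle = angle of the pixel with the highest speed
    i, j = np.unravel_index(np.argmax(speed * mask), speed.shape)
    angle_max = A[i, j]
    # step 2: difference of every pixel's angle to the benchmark, Eq. (2)
    diff = A - angle_max
    # step 3: wrap out-of-range values back into (-pi, pi)
    diff = np.where(diff > np.pi, diff - 2 * np.pi, diff)
    diff = np.where(diff < -np.pi, diff + 2 * np.pi, diff)
    # step 4: absolute value, normalized by pi into (0, 1)
    return np.abs(diff) / np.pi

A = np.array([[np.pi / 2, 0.0], [0.0, -np.pi / 2]])
speed = np.array([[2.0, 1.0], [1.0, 1.0]])   # fastest pixel at (0, 0)
mask = np.ones((2, 2), dtype=bool)
m = angle_m(A, speed, mask)
```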
In order to exaggerate the weight effect for better classification performance, we can use two weighting approaches, (A) Eq. (3):

w_{i,j}(n) = (1 + |Angle_{i,j}(n)|/π + |AngleM_{i,j}(n)|/π)²   (3)

and (B) Eq. (4):

w_{i,j}(n) = (|Angle_{i,j}(n)|/π · 10)² + (|AngleM_{i,j}(n)|/π · 10)²   (4)
Compared with Eq. (1), the Weighted Kinetic Energy wE of the n-th frame is obtained from Eq. (5) below:

wE(n) = Σ_{i=1}^{W} Σ_{j=1}^{H} w_{i,j}(n) · v_{i,j}²(n) · MASK_{i,j}(n)   (5)
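Putting the pieces together, weighting approach (B) and the masked sum of Eq. (5) reduce to a few array operations. A hedged sketch, assuming `angle_n` and `angle_m` hold the already-normalized |Angle|/π and |AngleM|/π matrices in (0, 1); the names are ours:

```python
import numpy as np

def weighted_energy(angle_n, angle_m, v, mask):
    """Weighted Kinetic Energy of one frame via Eq. (4) and Eq. (5)."""
    # Eq. (4), approach (B): scale each normalized coefficient by 10 and square
    w = (angle_n * 10) ** 2 + (angle_m * 10) ** 2
    # Eq. (5): weighted, masked sum of squared velocities
    return float(np.sum(w * v ** 2 * mask))

# single-pixel example: both coefficients 0.5, speed 2, pixel unmasked
wE = weighted_energy(np.array([[0.5]]), np.array([[0.5]]),
                     np.array([[2.0]]), np.array([[True]]))
```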
TABLE II
EXPERIMENT DESIGN

Parameters                            Value
Video signal
  1 Video resolution                  576 × 720 pixels
  2 Down-sampled resolution           120 × 160 pixels
  3 Frame rate                        25 fps
  4 Time threshold of the blob stay   10 s
  5 Velocity threshold for Mask       0.01
Software platform
  1 OS                                Windows XP
  2 Simulation environment            Matlab 2007a, Simulink
Hardware platform
  1 CPU                               Intel D-Core 2140
  2 Memory                            DDR II 667, 2 GB
Fig. 5 Comparison of Energy Curves. These curves are obtained from different optical flow algorithms and weighting approaches; the comparison helps to choose the best energy extraction method. The red line indicates the time when the abnormality occurs. The sub-figure under the chart shows snapshots of a clip containing violence.
IV. EXPERIMENTS
We randomly surveyed 30 ATM outlets in NanShan district
of ShenZhen, a big city in south China, for installation
information and experimental videos of ATM surveillance
systems. The position and number of camera rely on the
shape and space of the ATM booth to meet the requirements
of face identication and event recording. The realistic video
dataset of robberies and other abnormal events at ATMs is
rather difcult to collect. When we viewed and analyzed
clips of actual cases downloaded from network, We found
that those cases occur in different scenes, the position of
camera and the quality of video is varies considerably with
the position of camera and location s where those case occur.
For the comparability of experimental result, we imitate those
real events and act them out in a same camera location to
test our algorithm.
A. Experiment Design
A traditional CCTV video surveillance system is installed in the laboratory to simulate the ATM scene, as shown in Fig. 6. The door of the lab is taken as the location of the ATM, and a yellow line is drawn 1 m away from it outside the door. The camera is placed at the top of the wall close to the ATM, to ensure that a customer operating the ATM does not overlap the yellow line from the camera's view. The experiment materials consist of 21 video clips running 20 to 30 seconds each; they contain normal status in the first part and several typical abnormal events, such as fraud, fighting and robbery, in the second part.

Fig. 6 ATM scene planform.

Experiment information and the parameters of the system are listed in Table II:
B. Comparison and Analysis
Fig. 5 shows how well the Weighted Kinetic Energy describes aggressive behaviors in a clip recording four people fighting. As the figure shows, when the abnormality occurs (around the 269th frame, as the red line indicates), the Original Kinetic Energy curve [9] (oKEn), i.e. the energy without weights, remains in the same stationary state as in the previous normal situation, but the Weighted Energy Curve (wKEn) fluctuates drastically and rises to a high value within a few frames. Two high-energy events occur around the 80th and the 180th frame, where the value of the Original Kinetic Energy curve is close to 2500 and that of the Weighted Energy Curve is 2800, a proportion of about 1 : 1.2. When the real abnormal event occurs, the proportion rises to between 1 : 3 and 1 : 6. This means the weighted method restrains the weight when a high-speed but non-complex motion happens, but boosts it when a noticeable event occurs.
The difference between the Weighted and the Original curves is enlarged in the abnormal part because of the increased disorder of motion directions. Sub-figures (1) and (2) show the energy curves obtained from the Lucas-Kanade and Horn-Schunck methods, respectively. We chose the Horn-Schunck algorithm because its curves between the 150th and 200th frames prove to be rather robust in the normal part. Another comparison is made between the two weighting approaches, (A)
Fig. 7 Wavelet Analysis. The upper sub-figure is the energy curve computed by the Horn-Schunck algorithm with weighting approach (B). The 3-level approximation in the 2nd figure indicates that more energy is present when violence occurs, and the 1-level detail indicates that the energy varies acutely in the abnormal part.
Eq. (3) and (B) Eq. (4). Notice that, compared with (A), the curve based on weighting approach (B) is more robust in the normal part and more sensitive in the abnormal part, which is what we want. Finally, the energy extraction approach we chose is based on weighting method (B) and the Horn-Schunck optical flow algorithm.
Furthermore, the Stationary Wavelet Transform (SWT) [16] is employed to analyze the energy curve, using the sym4 wavelet of the Symlets family to perform a 3-level signal decomposition. In Fig. 7, the 3rd-level approximation coefficient of the energy curve shows that more energy is generated when aggressive behavior occurs. As for the 1st-level detail coefficient (the lowest sub-figure), when a 1-dimensional variance-adaptive threshold is applied to it, the boundary (located at the 271st frame) between the two parts with different variance is quite close to the frame at which the abnormal event starts. From these results it is clear that we do not even need machine learning approaches to distinguish the abnormal from the normal; an energy threshold value is good enough to do the job. In this experiment, the threshold values of the Orange and Red alarms are defined as 0.5 × 10⁴ and 0.7 × 10⁴, respectively.
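The resulting alarm rule is then a plain two-threshold test on the per-frame weighted energy; a minimal sketch (the function name and level labels are our own):

```python
def alarm_level(wE, orange=0.5e4, red=0.7e4):
    """Classify one frame's weighted energy with the two experimental
    thresholds: 0.5e4 raises an Orange alarm, 0.7e4 a Red alarm."""
    if wE >= red:
        return "Red"
    if wE >= orange:
        return "Orange"
    return "Normal"
```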
V. RESULT AND DISCUSSION
The group of figures in Fig. 8 shows the alarms corresponding to the video contents; the statistics of the experimental results are given in Table III. The system performs with a low rate of False Negatives (FN) and a high rate of False Positives (FP), which means that the system is quite sensitive to abnormal events but sometimes overreacts to
Fig. 8 Snapshots of experimental results. The semi-transparent yellow region is the sensitive area we defined at the beginning. A rectangle on a person means that their motion is being tracked. The Yellow Alarm indicates that someone is striding over the yellow line while a customer is operating the ATM. The Red Alarm warns of the occurrence of violence. The left sub-figure of the Orange Alarm indicates more than one customer in the area, and the right sub-figure shows the interim of violent behaviors.
normal situations. Due to imperfections of the calibration algorithm, such False Positives occur mainly in the normal part of the video, when a customer walks close to the ATM. The False Negative rate of nonviolent cases is lower than that of violent events, because nonviolent crime detection mainly relies on object tracking, which is more robust than the energy approach used for aggressive behavior detection. The system output frame rate is around 9-11 fps, which satisfies the real-time requirement.
The proposed energy-based algorithm shows a prominent performance in solving the problem of aggressive behavior detection. The novel weighting method is not just a description of the entropy of the velocity histogram. It focuses on the pixel with the maximum velocity in the field and its relationship
TABLE III
RESULT

Case Type    Clips Num.   Frame Num.   FP.   FN.
1 Normal         4           2000      3.8   0
2 Fraud          5           3775      2.7   1.5
3 Fight          7           4200      2.1   1.7
4 Robbery        4           1700      3.6   2.3
with other pixels, which represents an important feature of aggressive behavior. The structure of the system has been proven effective for nonviolent and violent abnormality detection at ATMs. To enhance the robustness of the ATM surveillance system, our future work will concentrate on the following three aspects. Firstly, the energy mining approach should be improved to better describe abnormal situations in specific scenes. Secondly, a robust calibration algorithm should be considered to reduce the false positive rate. Finally, appropriate feature extraction and analysis methods for the energy curve will reveal more valuable information about abnormality in video. The intelligence level of the system should be upgraded by introducing machine learning and fuzzy decision rules for complex event recognition.
ACKNOWLEDGMENT
The authors would like to acknowledge Dr. Weizhong Ye of The Chinese University of Hong Kong. The authors also wish to thank Mr. Huihuan Qian for his insightful suggestions, and other colleagues in the Advanced Robotics Laboratory at The Chinese University of Hong Kong for their help and encouragement.
REFERENCES
[1] F. Cupillard, F. Bremond, M. Thonnat. Behaviour Recognition for individuals, groups of people and crowd, Intelligent Distributed Surveillance Systems, IEE Symposium on, vol. 7, pp. 1-5, 2003.
[2] Ankur Datta, Mubarak Shah, Niels Da Vitoria Lobo. Person-on-Person Violence Detection in Video Data, In Proceedings of the 16th International Conference on Pattern Recognition, vol. 1, pp. 433-438, 2002.
[3] Yufeng Chen, Guoyuan Liang, Ka Keung Lee, and Yangsheng Xu. Abnormal Behavior Detection by Multi-SVM-Based Bayesian Network, Proceedings of the 2007 International Conference on Information Acquisition, vol. 1, pp. 298-303, 2007.
[4] Xinyu Wu, Yongsheng Ou, Huihuan Qian, and Yangsheng Xu. A Detection System for Human Abnormal Behavior, IEEE International Conference on Intelligent Robots and Systems, vol. 1, no. 1, pp. 1589-1593, 2005.
[5] Xin Geng, Gang Li, Yangdong Ye, Yiqing Tu, Honghua Dai. Abnormal Behavior Detection for Early Warning of Terrorist Attack, Lecture Notes in Computer Science, vol. 4304, pp. 1002-1009, 2006.
[6] Oren Boiman, Michal Irani. Detecting Irregularities in Images and in Video, International Journal of Computer Vision, vol. 74, no. 1, pp. 17-31, 2007.
[7] I. Haritaoglu, D. Harwood, L.S. Davis. A fast background scene modeling and maintenance for outdoor surveillance, In Proceedings of the International Conference on Pattern Recognition, vol. 4, pp. 179-183, 2000.
[8] Jeho Nam, Masoud Alghoniemy, Ahmed H. Tewfik. Audio-Visual Content-based Violent Scene Characterization, In Proceedings of the International Conference on Image Processing (ICIP), vol. 1, pp. 353-357, 1998.
[9] W. Zajdel, J.D. Krijnders, T. Andringa, D.M. Gavrila. CASSANDRA: audio-video sensor fusion for aggression detection, IEEE Int. Conf. on Advanced Video and Signal based Surveillance (AVSS), vol. 1, pp. 200-205, 2007.
[10] Thanassis Perperis, Sofia Tsekeridou. A Knowledge Engineering Approach for Complex Violence Identification in Movies, International Federation for Information Processing (IFIP), vol. 247, pp. 357-364, 2007.
[11] Z. Zhong, W.Z. Ye, S.S. Wang, M. Yang and Y.S. Xu. Crowd Energy and Feature Analysis, IEEE International Conference on Integration Technology (ICIT07), vol. 1, pp. 144-150, 2007.
[12] Z. Zhong, W.Z. Ye, M. Yang, S.S. Wang and Y.S. Xu. Energy Methods for Crowd Surveillance, IEEE International Conference on Information Acquisition (ICIA07), vol. 1, pp. 504-510, 2007.
[13] Jiangjian Xiao, Hui Cheng, Harpreet S. Sawhney, Cen Rao, Michael A. Isnardi. Bilateral Filtering-Based Optical Flow Estimation with Occlusion Detection, ECCV (1), pp. 211-224, 2006.
[14] B. Lucas and T. Kanade. An Iterative Image Registration Technique with an Application to Stereo Vision, In Proceedings of the International Joint Conference on Artificial Intelligence, vol. 1, pp. 674-679, 1981.
[15] J.L. Barron, D.J. Fleet, S.S. Beauchemin, and T.A. Burkitt. Performance of optical flow techniques, IEEE Computer Society Conference on Computer Vision and Pattern Recognition, vol. 1, pp. 236-242, 1992.
[16] R.R. Coifman, D.L. Donoho. Translation invariant de-noising, Lecture Notes in Statistics, vol. 103, pp. 125-150, 1995.