https://doi.org/10.1007/s11760-018-1267-z
ORIGINAL PAPER
Abstract
In this paper, a new method for detecting abnormal events in public surveillance systems is proposed. In the first step of the
proposed method, candidate regions are extracted, and the redundant information is eliminated. To describe appearance and
motion of the extracted regions, HOG-LBP and HOF are calculated for each region. Finally, abnormal events are detected
using two distinct one-class SVM models. To achieve more accurate anomaly localization, the large regions are divided into
non-overlapping cells, and the abnormality of each cell is examined separately. Experimental results on the UCSD anomaly detection
video datasets show that the proposed method outperforms existing methods.
Signal, Image and Video Processing
Biswas et al. [22] propose an approach in the H.264/AVC framework to detect anomalies in surveillance videos using the magnitude and orientation of motion vectors. Zaharescu and Wildes [23] propose a method that employs spatiotemporal energy filters to build a pixel-level activity pattern. They also model the appearance and dynamics of the crowded scenes using the mixture of dynamic textures (MDT) [15]. Bertini et al. [24] propose a local test to measure the similarity of a video cell and its eight neighbors. In this method, the cells that are not similar to their neighboring cells are considered abnormal events. Reddy et al. [14] divide each frame into cells of equal size and extract features such as motion, size, and texture for each of them. They model the scene by combining three types of classifiers to make effective use of the different features.

Cong et al. [32] first segment the video in the spatiotemporal domain. Then, a region-based descriptor is used to describe the appearance and motion of the patches. In this approach, anomaly detection is formulated as a matching problem instead of using statistical models. Roshtkhari et al. [13] model the spatiotemporal compositions (STC) using a method which imposes spatial and temporal constraints on the video volumes so that a video arrangement with a very low frequency of occurrence is considered an abnormality. Amraee et al. [26] present a method based on connected component analysis (CCA). In this method, the appearance abnormality is detected using the HOG of candidate regions and a multivariate Gaussian model. For motion anomaly detection, they propose a simple algorithm using the average magnitude of optical flow without considering the direction of moving objects.

Revathi et al. [33] propose a deep learning-based anomaly detection (DLAD) system that involves four modules for background estimation, object segmentation, feature extraction, and activity recognition. The method proposed by Zhou et al. [20] models the events using a spatial-temporal convolutional neural network (CNN) to capture features from both spatial and temporal dimensions by performing spatial-temporal convolutions. Yu et al. [21] propose a multiscale histogram of optical flow and oriented gradient (MHOFG) to represent the features of spatiotemporal patches. They use two sparse dictionaries to include both normal and abnormal information in training. Also, an additional abnormality dictionary is trained as an additional condition for judging testing samples to be normal or abnormal. Sabokrou et al. [29] define each video as a set of non-overlapping cubic patches using two local and global descriptors. The local and global features are based on structure similarity between adjacent patches and the features learned in an unsupervised way, using a sparse autoencoder.

In another work, Sabokrou et al. [30] propose two cubic patch-based anomaly detectors using autoencoders, where one works based on the reconstruction error (RE), and the other on the sparse representation (SR) of an input video patch. Xu et al. [36] propose an appearance and motion deep network (AMDN) in a double-fusion framework to extract both motion and appearance information. Leyva et al. [28] propose binary features to detect abnormal events in an online manner. They extract binary strings from temporal gradients and also foreground occupancy features to detect the abnormality in a Gaussian mixture model (GMM).

Kangwei et al. [27] extract five image descriptors, namely the color moments, the edge histogram descriptors, the color and edge directivity descriptors, the color layout descriptors, and the scalable color descriptors, for robust detection. They employ the local binary fitting model (LBF) for anomaly detection.

In this paper, we propose a new approach for anomaly detection and localization in crowded scenes. The most important contributions of the proposed method include: (1) eliminating the problem of occluded normal objects by learning the conventional overlapping from the training data; (2) extracting important regions and eliminating trivial information to reduce the number of required feature vectors and the computational load; (3) using two different classifiers to detect appearance and motion anomalies separately; (4) dividing large regions into non-overlapping cells and examining abnormalities for each cell to achieve more accurate anomaly localization. The rest of this paper is structured as follows: the main body of the proposed method is discussed in Sect. 2. The experimental results are shown in Sect. 3. Finally, Sect. 4 is dedicated to conclusions and recommendations for future research.

2 The proposed method

The proposed method consists of three main parts: feature extraction, one-class SVM model, and anomaly detection, which are described in this section.

2.1 Feature extraction

As shown in Fig. 1, in the first step of the proposed method, a foreground binary mask is created for each frame. Then, the connected components are extracted from the foreground mask using an associated region map. A connected component in a binary image is a set of contiguous pixels that contain “1”. In a region map, each connected component receives a unique number as its region code. There are three separate regions in the map of Fig. 1b, which are labeled with No. 1 to 3. The size of each region is defined by the rectangle that covers it (see Fig. 1a).

In the next step, we construct a matrix called the region-size matrix. This matrix displays the number of regions that have the same size. In other words, the entry in the ith row and jth
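The region map and the region-size counts described above can be illustrated with a minimal sketch. It is not the authors' implementation: the 8-connectivity rule, the function names `label_regions` and `region_size_counts`, the toy mask, and the choice to key the counts by (height, width) of the covering rectangle are all assumptions made for illustration.

```python
from collections import Counter, deque


def label_regions(mask):
    """Label foreground components (8-connectivity, an assumption).

    Returns the region map and, per region code, the covering
    rectangle as (top, left, bottom, right) with inclusive bounds.
    """
    rows, cols = len(mask), len(mask[0])
    region_map = [[0] * cols for _ in range(rows)]
    boxes = {}
    code = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] != 1 or region_map[r][c] != 0:
                continue
            code += 1  # new connected component found
            region_map[r][c] = code
            top, left, bottom, right = r, c, r, c
            queue = deque([(r, c)])
            while queue:
                y, x = queue.popleft()
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if (0 <= ny < rows and 0 <= nx < cols
                                and mask[ny][nx] == 1
                                and region_map[ny][nx] == 0):
                            region_map[ny][nx] = code
                            queue.append((ny, nx))
            boxes[code] = (top, left, bottom, right)
    return region_map, boxes


def region_size_counts(boxes):
    """Count how many regions share each (height, width) size."""
    return Counter((b - t + 1, r - l + 1) for t, l, b, r in boxes.values())


# Toy foreground mask with three separate regions.
mask = [
    [1, 1, 0, 0, 0, 0],
    [1, 1, 0, 0, 1, 1],
    [0, 0, 0, 0, 1, 1],
    [0, 1, 0, 0, 0, 0],
]
region_map, boxes = label_regions(mask)
sizes = region_size_counts(boxes)
```

In this toy mask, two regions share the size 2 × 2 and one has size 1 × 1, so the counts play the role of the nonzero entries of a region-size matrix.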
Fig. 2 Feature extraction from training data: a appearance features (HOG-LBP), b motion features (HOF). Panel labels: a HOG (bins 1 to 9), LBP (bins 1 to 10), HOG-LBP (bins 1 to 19); b optical flow maps, foreground cells, HOF
Here, $x_i$ for $i \in \{1, 2, \ldots, N\}$ is the set of training data and $w$ is the learned weight vector. Also, $\rho$ is the offset, and the predefined parameter $\nu \in (0, 1]$ represents an upper bound on the fraction of data allowed to lie on the outlier side of the hyperplane. $\Phi(x_i)$ is a feature projection function which maps the feature vector $x_i$ into a higher-dimensional feature hyperspace $F$. The projection function can be defined implicitly by introducing an associated kernel function $k$ [36,43]:

$$k(x_i, x_j) = \Phi(x_i)^T \cdot \Phi(x_j) \tag{3}$$

Polynomial, sigmoid and radial basis function (RBF) kernels are the most frequently used kernel functions in SVM models. The RBF kernel has fewer parameters influencing the complexity of model selection than the others. In the proposed method, we use the RBF kernel in two distinct OC-SVM models to detect appearance and motion anomalies separately. The RBF kernel function on two samples $x_i$ and $x_j$ is defined as [36]:

$$k(x_i, x_j) = \exp\left(-\gamma \|x_i - x_j\|^2\right), \quad \gamma \ge 0 \tag{4}$$

where $\gamma$ is a parameter that sets the spread of the kernel.

2.3 Anomaly detection

To detect abnormalities in our method, the foreground mask and the region map are extracted from the current frame. For the regions whose sizes are similar to the PCS, HOF is calculated and then applied to the OC-SVM model. If the HOF of a region does not fall into the normal half of the feature space, it is reported as a motion anomaly. Large regions are more complicated to handle because the abnormality may be confined to a small part of the region. For example, in a large region that includes two or more occluded objects, only one of them may carry out abnormal motion. So, the global HOF of the region cannot reflect this abnormality. Therefore, it is necessary to divide the large region into non-overlapping cells and calculate an individual HOF for each cell. Figure 3 illustrates how this subdivision is performed. In this figure, a large region is
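The subdivision of a large region into non-overlapping cells can be sketched as follows. This is a sketch under stated assumptions, not the paper's code: the names `split_into_cells` and `motion_anomalies` are hypothetical, clipping the last row and column of cells at the region border is an assumption, the "size similar to the PCS" test is simplified to "fits within one cell", and `is_normal` is a placeholder callable standing in for "compute the HOF of the cell and test it against the trained motion OC-SVM".

```python
def split_into_cells(box, cell_h, cell_w):
    """Tile an inclusive box (top, left, bottom, right) with
    non-overlapping cells; border cells are clipped (assumption)."""
    top, left, bottom, right = box
    cells = []
    for y in range(top, bottom + 1, cell_h):
        for x in range(left, right + 1, cell_w):
            cells.append((y, x,
                          min(y + cell_h - 1, bottom),
                          min(x + cell_w - 1, right)))
    return cells


def motion_anomalies(box, cell_h, cell_w, is_normal):
    """Return the cells of a region that the motion model rejects.

    is_normal is a hypothetical stand-in for the HOF + OC-SVM test.
    """
    height = box[2] - box[0] + 1
    width = box[3] - box[1] + 1
    if height <= cell_h and width <= cell_w:
        cells = [box]  # region fits in one cell: test it as a whole
    else:
        cells = split_into_cells(box, cell_h, cell_w)
    return [cell for cell in cells if not is_normal(cell)]


# A 6 x 8 region tiled with 3 x 4 cells gives four cells.
large_region = (0, 0, 5, 7)
cells = split_into_cells(large_region, 3, 4)

# Toy stand-in: only the cell starting at (3, 4) is treated as abnormal.
flagged = motion_anomalies(large_region, 3, 4,
                           lambda cell: cell[:2] != (3, 4))
```

Because each cell is tested separately, only the abnormal cell is reported, which localizes the anomaly inside the large region instead of flagging the whole bounding rectangle.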
Flowchart (box labels only): Create HOG-LBP for the region; OC-SVM; Appearance Anomaly Detection; Divide the region into non-overlapped cells; Create HOF for each cell

3 Experimental results
Fig. 7 Anomaly localization on Ped2: pixel-level ROC

Fig. 8 ROC of appearance anomaly detection: Proposed method (HOG-LBP, OC-SVM), and CCA [26] (HOG, Gaussian model)
Table 2 Comparison of AUC on UCSD (Ped1 | Ped2)
Frame-level Pixel-level Method
Table 3 Total number of extracted feature vectors

                      HOG-LBP                  HOF
Traditional method
  Ped1                100 × 7200 = 720,000     720,000
  Ped2                144 × 2010 = 289,440     289,440
Proposed method
  Ped1                7090                     74,689
  Ped2                5535                     25,620

4 Conclusion

In this paper, a new method for detecting anomalous events in crowded scenes is presented. The proposed method is based on the one-class support vector machine (OC-SVM). In the first step, we determine the proper cell size (PCS). The PCS is calculated by considering the common size of normal objects in the training data. Then, the appearance abnormality is detected by computing HOG-LBP descriptors of the candidate regions and applying them to a trained OC-SVM classifier. To achieve more accurate detection of appearance anomalies, the conventional overlaps in the training data are taken into account. As a result, the proposed method can correctly distinguish a few overlapped normal objects from a single large (abnormal) one. Motion abnormalities are detected by a separate OC-SVM using HOF. To achieve higher accuracy, the large regions are divided into non-overlapping cells, and a separate HOF is calculated for each cell. The experimental results on the UCSD dataset show that the proposed method detects abnormal events in crowded scenes and outperforms the state-of-the-art methods.

In follow-up work, we intend to move toward deep learning concepts and the use of autoencoders. In this way, it will no longer be necessary to calculate hand-crafted features such as HOG-LBP and HOF. The output codes of a trained autoencoder in a deep network can be used as the extracted feature vectors. Also, we will direct our research toward other types of anomaly detection algorithms applicable to other sorts of surveillance systems. In the method presented here, it is assumed that the background image is time-invariant and is calculated in the training phase. In applications using a moving camera, the background image of the target environment is variable. In these circumstances, we need to create a module that recalculates the background image at specified intervals and add it to the proposed system.

References

1. Sodemann, A., Ross, M., Borghetti, B.: A review of anomaly detection in automated surveillance. IEEE Trans. Syst. Man Cybern. 42(6), 1257–1272 (2012)
2. Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 29(10), 983–1009 (2013)
3. Feng, W., Liu, R., Zhu, M.: Fall detection for elderly person care in a vision-based home surveillance environment using a monocular camera. Signal Image Video Process. 8(6), 1129–1138 (2014)
4. Zhou, S.H., et al.: Unusual event detection in crowded scenes by trajectory analysis. In: Proceedings of ICASSP, pp. 1300–1304 (2015)
5. Kumar, D., et al.: A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis. Comput. 33(3), 265–281 (2017)
6. Junejo, I.: Using dynamic Bayesian network for scene modeling and anomaly detection. Signal Image Video Process. 4(1), 1–10 (2010)
7. Rao, Y.: Automatic vehicle recognition in multiple cameras for video surveillance. Vis. Comput. 31(3), 271–280 (2015)
8. Zhang, C., Chen, W., et al.: A multiple instance learning and relevance feedback framework for retrieving abnormal incidents in surveillance videos. J. Multimed. 5(4), 310–321 (2010)
9. Vallejo, D., Albusac, J., Jimenez, L.: A cognitive surveillance system for detecting incorrect traffic behaviors. Expert Syst. Appl. 36(7), 10503–10511 (2009)
10. Albusac, J., et al.: Intelligent surveillance based on normality analysis to detect abnormal behaviors. Pattern Recognit. Artif. Intell. 23(7), 1223–1244 (2009)
11. Varadarajan, J., Odobez, J.: Topic models for scene analysis and abnormality detection. In: Proceedings of IEEE Conference on Computer Vision Workshops, pp. 1338–1345 (2009)
12. Tang, S., Andriluka, M., Schiele, B.: Detection and tracking of occluded people. Int. J. Comput. Vis. 110(1), 58–69 (2014)
13. Roshtkhari, M., Levine, D.: An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions. Comput. Vis. Image Underst. 117(10), 1436–1452 (2013)
14. Reddy, V., Sanderson, C., Lovell, B.: Improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 55–61 (2011)
15. Mahadevan, V., Li, W., et al.: Anomaly detection in crowded scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)
16. Kim, J., Grauman, K.: Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2928 (2009)
17. Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 935–942 (2009)
18. Zhang, T., et al.: A new method for violence detection in surveillance scenes. Multimed. Tools Appl. 75(12), 7327–7349 (2016)
19. Ren, W., et al.: Unsupervised kernel learning for abnormal events detection. Vis. Comput. 31(3), 245–255 (2015)
20. Zhou, S.H., et al.: Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Proc. Image Comm. 47, 358–368 (2016)
21. Yu, Y., Shen, W., Huang, H., Zhang, Zh: Abnormal event detection in crowded scenes using two sparse dictionaries with saliency. J. Electron. Imaging 26(3), 33013 (2017)
22. Biswas, S., Babu, R.V.: Anomaly detection in compressed H.264/AVC video. Multimed. Tools Appl. 74(24), 11099–11115 (2015)
23. Zaharescu, A., Wildes, R.: Anomalous behavior detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing. In: Proceedings of European Conference on Computer Vision, pp. 563–576 (2010)
24. Bertini, M., Bimbo, A., Seidenari, L.: Multi-scale and real-time nonparametric approach for anomaly detection and localization. Comput. Vis. Image Underst. 116(3), 320–329 (2012)
25. Li, T., Chang, H., et al.: Crowded scene analysis: a survey. IEEE Trans. Circuits Syst. Video Technol. 25(3), 367–386 (2015)
26. Amraee, S., et al.: Anomaly detection and localization in crowded scenes using connected component analysis. Multimed. Tools Appl. https://doi.org/10.1007/s11042-017-5061-7 (2017)
27. Kangwei, L., et al.: Abnormal event detection and localization using level set based on hybrid features. Signal Image Video Process. https://doi.org/10.1007/s11760-017-1153-0 (2017)
28. Leyva, R., et al.: Abnormal event detection in videos using binary features. In: International Conference on Telecommunications and Signal Processing (TSP) (2017)
29. Sabokrou, M., et al.: Real-time anomaly detection and localization in crowded scenes. In: IEEE Conference on Computer Vision Pattern Recognition Workshops, pp. 320–329 (2015)
30. Sabokrou, M., et al.: Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder. Electron. Lett. 52(13), 1122–1124 (2016)
31. Lee, D., et al.: Motion influence map for unusual human activity. IEEE Trans. Circuits Syst. Video Technol. 25(10), 1612–1623 (2015)
32. Cong, Y., Yuan, J., Yandong, T.: Video anomaly search in crowded scenes via spatio-temporal motion context. IEEE Trans. Inf. Forensics Secur. 8(10), 1590–1599 (2013)
33. Revathi, A., Kumar, D.: An efficient system for anomaly detection using deep learning classifier. Signal Image Video Process. 11(2), 291–299 (2017)
34. Xiang, T., Gong, Sh: Video behavior profiling for anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 893–908 (2008)
35. Cheng, W., Chen, T., Fang, H.: Gaussian process regression-based video anomaly detection and localization with hierarchical feature representation. IEEE Trans. Image Process. 24(12), 5288–5301 (2015)
36. Xu, D., Yan, Y., Ricci, E., Sebe, N.: Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput. Vis. Image Underst. 156(C), 117–127 (2017)
37. Miao, Y., Song, J.: Abnormal event detection based on SVM in video surveillance. In: Proceedings of IEEE Workshop on Advanced Research and Technology in Industry Applications, pp. 1379–1383 (2014)
38. Chen, Y., Qian, J., Saligrama, V.: A new one-class SVM for anomaly detection. In: Proceedings of IEEE ICASSP, pp. 3567–3571 (2013)
39. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893 (2005)
40. Ojala, T., Pietikainen, M., Maenpaa, T.: Multi resolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
41. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Proceedings of European Conference on Computer Vision, pp. 428–441 (2006)
42. Barron, L., Fleet, J., Beauchemin, S., Burkitt, A.: Performance of optical flow techniques. Int. J. Comput. Vis. 12(1), 43–77 (1994)
43. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
44. UCSD Anomaly Detection Dataset.: http://www.svcl.ucsd.edu/projects/anomaly/dataset