
Signal, Image and Video Processing

https://doi.org/10.1007/s11760-018-1267-z

ORIGINAL PAPER

Abnormal event detection in crowded scenes using one-class SVM


Somaieh Amraee1 · Abbas Vafaei1 · Kamal Jamshidi1 · Peyman Adibi1

Received: 21 August 2017 / Revised: 24 January 2018 / Accepted: 2 March 2018


© Springer-Verlag London Ltd., part of Springer Nature 2018

Abstract
In this paper, a new method for detecting abnormal events in public surveillance systems is proposed. In the first step of the
proposed method, candidate regions are extracted, and the redundant information is eliminated. To describe appearance and
motion of the extracted regions, HOG-LBP and HOF are calculated for each region. Finally, abnormal events are detected
using two distinct one-class SVM models. To achieve more accurate anomaly localization, the large regions are divided into
non-overlapping cells, and the abnormality of each cell is examined separately. Experimental results show that the proposed
method outperforms existing methods based on the UCSD anomaly detection video datasets.

Keywords Anomaly detection · Crowded scenes · One-class SVM · Optical flow

Corresponding author: Abbas Vafaei (abbas_vafaei@eng.ui.ac.ir)
1 Faculty of Computer Engineering, University of Isfahan, Isfahan, Iran

1 Introduction

Video surveillance systems have made great progress in recent years, and pedestrian safety has become one of the most important applications of such systems. Therefore, the number of public places equipped with surveillance cameras is increasing day by day. Due to the cost and difficulty of careful monitoring by human operators in a traditional surveillance system, automatic approaches are required for detecting abnormal events in public places and important buildings. The presence of unusual moving objects such as trucks, cars, bicycles, or carts on pedestrian walkways is considered an appearance anomaly, whereas unusual actions of one or more pedestrians are considered motion abnormalities; running and aggressive behavior of pedestrians are two examples of this anomaly type.

To detect abnormalities in places with a sparse distribution of moving objects, automated surveillance systems often extract object-level features such as position, size, speed, and trajectory for each moving object, and the abnormal object is then detected according to these features [1–4]. In crowded scenes, the performance of tracking algorithms degrades because objects are temporarily lost when they are blocked from the camera view. Accurate feature extraction for occluded entities is therefore an important problem in the object-level methods [5–12]. In systems where the congestion of the monitored environment varies, we need a method that maintains high performance in crowded situations.

In the low-level feature extraction methods, the moving objects are not recognized as independent entities. In these approaches, features such as motion, color, texture, and optical flow are extracted for each pixel or cell. The low-level approaches have the advantage of being robust to the occlusions that degrade tracking accuracy; since no object is extracted from the image, they are able to operate in crowded scenes [13–38]. In most low-level feature extraction methods, the input video sequence is first divided into a set of 2D or 3D non-overlapping cells. Then, the cells are represented by appropriate descriptors such as the histogram of oriented gradients (HOG) [39], the local binary pattern (LBP) [40], or the histogram of optical flow (HOF) [41]. Finally, the cells that contain abnormal events are detected using a proper one-class classifier.

In recent years, several methods have been developed to detect abnormal events in crowded scenes without the need for object tracking. Kim and Grauman [16] model the local optical flow using mixtures of probabilistic principal component analyzers (MPPCA) and employ a spatiotemporal Markov random field (MRF) to detect local and global abnormalities. Mehran et al. [17] propose the social force concept, in which interacting forces are calculated using the optical flow extracted from small cells.


Biswas et al. [22] propose an approach in the H.264/AVC framework to detect anomalies in surveillance videos using the magnitude and orientation of motion vectors. Zaharescu and Wildes [23] propose a method that employs spatiotemporal energy filters to build a pixel-level activity pattern. In [15], the appearance and dynamics of crowded scenes are modeled with a mixture of dynamic textures (MDT). Bertini et al. [24] propose a local test to measure the similarity of a video cell and its eight neighbors; the cells that are not similar to their neighboring cells are considered abnormal events. Reddy et al. [14] divide each frame into cells of the same size and extract features such as motion, size, and texture for each of them. They model the scene by combining three types of classifiers to make effective use of the different features.

Cong et al. [32] first segment the video in the spatiotemporal domain. Then, a region-based descriptor is used to describe the appearance and motion of the patches. In this approach, anomaly detection is formulated as a matching problem instead of using statistical models. Roshtkhari et al. [13] model spatiotemporal compositions (STC) using a method which imposes spatial and temporal constraints on the video volumes, so that a video arrangement with a very low frequency of occurrence is considered an abnormality. Amraee et al. [26] present a method based on connected component analysis (CCA). In this method, the appearance abnormality is detected using the HOG of candidate regions and a multivariate Gaussian model. For motion anomaly detection, they propose a simple algorithm using the average magnitude of optical flow without considering the direction of moving objects.

Revathi et al. [33] propose a deep learning-based anomaly detection (DLAD) system that involves four modules for background estimation, object segmentation, feature extraction, and activity recognition. The method proposed by Zhou et al. [20] models events using spatial-temporal convolutional neural networks (CNN) that capture features from both the spatial and temporal dimensions by performing spatial-temporal convolutions. Yu et al. [21] propose a multiscale histogram of optical flow and oriented gradient (MHOFG) to represent the features of spatiotemporal patches. They use two sparse dictionaries to include both normal and abnormal information in training; an additional abnormality dictionary is trained as a further condition for judging whether testing samples are normal or abnormal. Sabokrou et al. [29] define each video as a set of non-overlapping cubic patches described by local and global descriptors. The local and global features are based on the structural similarity between adjacent patches and on features learned in an unsupervised way using a sparse autoencoder. In other work, Sabokrou et al. [30] propose two cubic patch-based anomaly detectors using autoencoders, where one works based on the reconstruction error (RE) and the other on the sparse representation (SR) of an input video patch. Xu et al. [36] propose an appearance and motion deep network (AMDN) in a double-fusion framework to extract both motion and appearance information. Leyva et al. [28] propose binary features to detect abnormal events in an online manner; they extract binary strings from temporal gradients as well as foreground occupancy features and detect the abnormality with a Gaussian mixture model (GMM). Kangwei et al. [27] extract five image descriptors, namely the color moments, the edge histogram descriptors, the color and edge directivity descriptors, the color layout descriptors, and the scalable color descriptors, for robust detection. They employ the local binary fitting (LBF) model for anomaly detection.

In this paper, we propose a new approach for anomaly detection and localization in crowded scenes. The most important contributions of the proposed method are: (1) eliminating the problem of occluded normal objects by learning the conventional overlapping from the training data; (2) extracting important regions and eliminating trivial information to reduce the number of required feature vectors and the computational load; (3) using two different classifiers to detect appearance and motion anomalies separately; and (4) dividing large regions into non-overlapping cells and examining the abnormality of each cell to achieve more accurate anomaly localization. The rest of this paper is structured as follows: the main body of the proposed method is discussed in Sect. 2, the experimental results are shown in Sect. 3, and Sect. 4 is dedicated to conclusions and recommendations for future research.

2 The proposed method

The proposed method consists of three main parts: feature extraction, one-class SVM modeling, and anomaly detection, which are described in this section.

2.1 Feature extraction

As shown in Fig. 1, in the first step of the proposed method a foreground binary mask is created for each frame. Then, the connected components are extracted from the foreground mask using the associated region map. A connected component in a binary image is a set of contiguous pixels that contain "1". In a region map, each connected component receives a unique number as its region code. There are three separate regions in the map of Fig. 1b, which are labeled with No. 1 to 3. The size of each region is defined by the rectangle that covers it (see Fig. 1a).
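To make this step concrete, the following Python sketch shows one possible way to extract the covering rectangle of every connected component and to accumulate the rectangle sizes over the training masks. This is only an assumed equivalent of the paper's MATLAB implementation (OpenCV is used here for the connected-component labeling); the accumulated size counts correspond to the region-size matrix and the proper cell size (PCS) described in the next paragraphs.

```python
import numpy as np
import cv2  # assumed dependency; any connected-component routine would do


def region_rectangles(foreground_mask):
    """Return (width, height) of the covering rectangle of every connected
    component in a binary foreground mask, plus the region (label) map."""
    num, labels, stats, _ = cv2.connectedComponentsWithStats(
        foreground_mask.astype(np.uint8), connectivity=8)
    # label 0 is the background; each remaining label is one region code
    rects = [(int(stats[i, cv2.CC_STAT_WIDTH]), int(stats[i, cv2.CC_STAT_HEIGHT]))
             for i in range(1, num)]
    return rects, labels


def proper_cell_size(training_masks, max_size=200):
    """Accumulate a region-size matrix over all training masks and return the
    most frequent rectangle size (the PCS described in the following text)."""
    rsm = np.zeros((max_size, max_size), dtype=np.int64)  # rows: width, cols: height
    for mask in training_masks:
        rects, _ = region_rectangles(mask)
        for w, h in rects:
            if w < max_size and h < max_size:
                rsm[w, h] += 1
    w, h = np.unravel_index(np.argmax(rsm), rsm.shape)
    return int(w), int(h)
```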


In the next step, we construct a matrix called the region-size matrix, which counts the number of regions that have the same size: the entry in the ith row and jth column represents the number of regions whose covering rectangle has size i × j. For example, if there are four regions of size 24 × 10 pixels, the element at row 24 and column 10 is set to 4 (blue rectangle in Fig. 1c). After parsing all the training frames, this matrix represents how often the different region sizes occur in the whole training data. Once the region-size matrix is complete, the entry with the maximum value is taken as the cell size. The red circle marks the maximum entry of Fig. 1c; thus, the proper cell in this example is a rectangle of size 21 × 12. Hereafter, we refer to this proper cell size as the PCS. We expect the PCS to be proportional to the size of the normal objects (pedestrians in Fig. 1d), because it is calculated from the most frequent region size in the training data.

Fig. 1 Determining the proper cell size: a foreground mask with three connected components and their covering rectangles, b region map, c region-size matrix, d proper cell size (PCS)

Generally, there are two types of abnormalities in surveillance systems. An appearance anomaly refers to an object that is visually different from normal ones, such as a truck or bicycle on the sidewalk. A motion anomaly refers to unusual motion by an object that appears normal in physical appearance; a running pedestrian on the sidewalk is an example of a motion anomaly. To detect appearance anomalies, the first step is to extract large regions. Any region whose covering rectangle is sufficiently larger than the PCS is considered a large region. The large regions are extracted from the training data, and the HOG-LBP descriptor is calculated for each of them. Since there is no abnormality in the training data, the large regions in these frames are certainly made by a set of occluded normal objects. Therefore, we can build a one-class model on these regions; this model describes the large regions in which there is no appearance anomaly. It should be remembered that the main problem in crowded scenes is the occlusion of moving objects. In our method, this problem is well resolved, because the conventional overlapped objects are learned from the training data.

Figure 2a shows the block diagram of the proposed method for HOG-LBP extraction from the training data. To extract the HOG vector for a candidate region, each pixel within a cell casts a weighted vote for one of nine histogram bins based on the values found by the gradient computation. Also, we use the rotationally invariant uniform LBP, which produces a ten-bin histogram as a simple texture descriptor. By concatenating these HOG and LBP vectors, the final feature vector is created to describe the appearance of a region. Therefore, the HOG-LBP feature vector has a length of 19, and we have a 19-dimensional space in which to detect appearance abnormalities. To extract motion information from the training data, we use the Lucas–Kanade method [42] to calculate the optical flow of two consecutive frames. For each pixel, the magnitude of the optical flow indicates its speed, which has a value greater than zero inside the foreground. In order to make better use of the motion information, it is necessary to use the direction of the optical flow vectors in addition to their magnitudes. As shown in Fig. 2b, after calculating the optical flow, each frame is divided into non-overlapping cells and a histogram of oriented optical flow (HOF) is calculated for each of them. It should be noted that cells that are mostly occupied by background pixels do not carry significant motion information. So, to avoid modeling the background and to reduce the computational load, the motion feature is only extracted from the foreground cells. Thus, the white cells in Fig. 2b do not affect the histogram formation.
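The descriptor construction can be sketched as follows in Python. This is an illustrative re-implementation rather than the paper's MATLAB code: scikit-image supplies the rotation-invariant uniform LBP, the dense flow field (u, v) is assumed to have been computed beforehand (the paper uses the Lucas–Kanade method [42]), and the number of HOF orientation bins is an assumption, since the paper does not state the HOF dimensionality.

```python
import numpy as np
from skimage.feature import local_binary_pattern  # assumed helper for uniform LBP


def hog_lbp_descriptor(gray_region):
    """9-bin gradient-orientation histogram + 10-bin rotation-invariant
    uniform LBP histogram -> 19-dimensional appearance descriptor."""
    gy, gx = np.gradient(gray_region.astype(np.float64))
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 180.0           # unsigned orientation
    hog, _ = np.histogram(ang, bins=9, range=(0, 180), weights=mag)
    lbp = local_binary_pattern(gray_region, P=8, R=1, method='uniform')
    lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10))
    feat = np.concatenate([hog, lbp_hist]).astype(np.float64)
    return feat / (np.linalg.norm(feat) + 1e-12)            # normalized, length 19


def hof_descriptor(u, v, fg_mask, bins=8):
    """Histogram of oriented optical flow over the foreground pixels of a cell.
    u, v: horizontal/vertical flow components of the cell (precomputed)."""
    fg = fg_mask > 0
    mag = np.hypot(u, v)[fg]
    ang = (np.degrees(np.arctan2(v, u)) % 360.0)[fg]
    hof, _ = np.histogram(ang, bins=bins, range=(0, 360), weights=mag)
    return hof / (hof.sum() + 1e-12)
```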


Fig. 2 Feature extraction from training data: a appearance features (HOG-LBP), b motion features (HOF)

   
2.2 One-class SVM modeling

One-class SVM (OC-SVM) is widely used for abnormal event detection [36–38]. The main idea of OC-SVM is to find the maximal-margin hyperplane, using an appropriate kernel function, that maps most of the training data onto only one side of the hyperplane while maximizing the distance of this hyperplane from the origin. It may be viewed as a regular two-class SVM where all the training data lie on one side of the feature space while outlier data points lie on the other side. To find the maximal-margin hyperplane, the underlying problem of OC-SVM is formulated as the following quadratic program [36,43]:

\min_{w,\rho} \; \frac{1}{2}\lVert w \rVert^{2} + \frac{1}{\nu N}\sum_{i=1}^{N}\xi_{i} - \rho \qquad (1)

such that

w^{T}\Phi(x_{i}) \ge \rho - \xi_{i}, \quad \xi_{i} \ge 0 \qquad (2)

where x_i, for i = 1, 2, ..., N, is the set of training data and w is the learned weight vector. Also, ρ is the offset, and the predefined parameter ν ∈ (0, 1] represents an upper bound on the fraction of data that is allowed to lie on the outlier side of the hyperplane. Φ(x_i) is a feature projection function which maps the feature vector x_i into a higher-dimensional feature space F. The projection function Φ can be defined implicitly by introducing an associated kernel function k [36,43]:

k(x_{i}, x_{j}) = \Phi(x_{i})^{T}\Phi(x_{j}) \qquad (3)

Polynomial, sigmoid, and radial basis function (RBF) kernels are the most frequently used kernel functions in SVM models. In the RBF kernel, the number of parameters influencing the complexity of model selection is smaller than in the others. In the proposed method, we use the RBF kernel in two distinct OC-SVM models to detect appearance and motion anomalies separately. The RBF kernel function on two samples x_i and x_j is defined as [36]:

k(x_{i}, x_{j}) = \exp\left(-\gamma \lVert x_{i} - x_{j} \rVert^{2}\right), \quad \gamma \ge 0 \qquad (4)

where γ is a parameter that sets the spread of the kernel.
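A minimal sketch of this modeling step is given below. It assumes scikit-learn's OneClassSVM (which wraps LIBSVM, the library used in the paper's experiments) and the γ values reported later in Sect. 3; the ν value is a placeholder, since the paper does not report it.

```python
import numpy as np
from sklearn.svm import OneClassSVM

# X_app: HOG-LBP vectors (N_app x 19) from the large regions of the training data
# X_mot: HOF vectors from the PCS-sized regions / foreground cells of the training data
def train_models(X_app, X_mot, nu=0.05):  # nu is an assumed value
    appearance_model = OneClassSVM(kernel='rbf', gamma=0.05, nu=nu).fit(X_app)
    motion_model = OneClassSVM(kernel='rbf', gamma=0.10, nu=nu).fit(X_mot)
    return appearance_model, motion_model

# At test time, model.predict(x) returns +1 for samples on the normal side of the
# learned hyperplane and -1 for outliers, i.e. candidate appearance or motion anomalies.
```

Where a continuous anomaly score is needed (e.g., for the ROC analysis in Sect. 3), `decision_function` can be used instead of the hard `predict` output.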


2.3 Anomaly detection

To detect abnormalities in our method, the foreground mask and the region map are extracted from the current frame. For the regions whose sizes are similar to the PCS, the HOF is calculated and then applied to the OC-SVM model. If the HOF of a region does not fall into the normal half of the feature space, the region is reported as a motion anomaly. For large regions, the task is more complicated, because there may be an abnormality in only a small part of the region. For example, in a large region that includes two or more occluded objects, only one of them may carry out an abnormal motion, so the global HOF of the region cannot reflect this abnormality. Therefore, it is necessary to divide the large region into non-overlapping cells and calculate an individual HOF for each cell. Figure 3 illustrates how this subdivision is performed. In this figure, a large region is divided into six cells, and the HOF is plotted at the bottom of each cell. The first cell contains no motion information, so its HOF is not calculated. Cell No. 3 covers a skateboard, so it produces an unusual HOF; therefore, this cell is reported as containing a motion anomaly. The other cells all contain normal motion, so their HOFs fall into the normal half of the feature space.

Fig. 3 Division of large regions into non-overlapping cells

In this way, the motion abnormalities of each cell are examined independently of the others. In addition, a large region is not always the result of occluded pedestrians; sometimes an unusual moving object, such as a truck, forms a large region. By extracting the HOG-LBP and applying it to another OC-SVM model, we can identify large regions that include an object with an appearance anomaly. In other words, if a new HOG-LBP is classified as an outlier of the learned hyperplane, it means that this large region is not made by a set of occluded normal objects and that it has appearance characteristics different from the training ones. As a result, this region is reported as an appearance anomaly. The overall structure of the proposed algorithm for detecting abnormalities in crowded scenes is shown in Fig. 4. As can be seen in this figure, a region map is created when a new frame is received. The abnormality of each region is examined separately, and the size of each region determines the path it follows in the flowchart of Fig. 4.
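The per-frame decision logic of Fig. 4 can be summarized with the following Python sketch. It is illustrative only: the regions are assumed to arrive with their descriptors already computed (using the functions sketched in Sect. 2.1), the two trained models are those of Sect. 2.2, and the factor of 2 used to decide whether a region is "large" follows the setting reported in Sect. 3.

```python
import numpy as np


def detect_frame(regions, pcs_area, appearance_model, motion_model):
    """Decision logic of Fig. 4 (sketch).

    `regions` is a list of dicts prepared from the region map of the current
    frame, each with:
      'area'      : area of the covering rectangle,
      'hof'       : HOF of the whole region (PCS-sized regions),
      'hog_lbp'   : 19-dim appearance descriptor (large regions only),
      'cell_hofs' : HOFs of its non-overlapping foreground cells (large regions only).
    Returns (region_index, anomaly_type) pairs.
    """
    alarms = []
    for idx, r in enumerate(regions):
        if r['area'] <= 2 * pcs_area:
            # region comparable to the PCS: motion check only
            if motion_model.predict(np.atleast_2d(r['hof']))[0] == -1:
                alarms.append((idx, 'motion'))
        else:
            # large region: appearance check on the whole region ...
            if appearance_model.predict(np.atleast_2d(r['hog_lbp']))[0] == -1:
                alarms.append((idx, 'appearance'))
            # ... and an independent motion check on each cell
            for hof in r['cell_hofs']:
                if motion_model.predict(np.atleast_2d(hof))[0] == -1:
                    alarms.append((idx, 'motion'))
                    break
    return alarms
```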

Fig. 4 Anomaly detection in the proposed method


3 Experimental results

The proposed method is implemented using MATLAB 2016 and the LibSVM 3.22 library. We used the UCSD anomaly detection dataset [44] to evaluate abnormal event detection and localization in crowded scenes. The UCSD image set has been recorded by a fixed camera overlooking pedestrian walkways. The crowd density in the walkways varies from sparse to very crowded, and abnormal events are due either to the circulation of non-pedestrian moving objects on the sidewalks or to anomalous pedestrian motion patterns. The UCSD dataset is split into two image sequences: Ped1 contains 34 training and 36 testing video clips, and Ped2 includes 16 training clips and 12 test clips. Each test clip in Ped1 and Ped2 is provided with manually generated pixel-level binary masks for evaluating anomaly localization. In the experiments, any region whose covering rectangle's area is more than two times greater than the PCS is considered a large region. The calculated PCS is 32 × 12 for Ped1 and 40 × 15 for Ped2. In the proposed method, there are two distinct OC-SVM classifiers to detect appearance and motion anomalies; thus, they receive two independent γ values in Eq. (4). The value of γ is set to 0.05 for the appearance classifier and 0.10 for the motion classifier; these values led to better detection accuracy in our experiments.

Fig. 5 Anomaly detection in Ped1 (top) and Ped2 (bottom)

Figure 5 shows four examples of anomaly detection on the Ped1 and Ped2 datasets. Also, Table 1 reports the frame-level and pixel-level values of the equal error rate (EER) on Ped2. In an anomaly detection method, the EER is defined as the point at which the false positive rate (FPR) equals the false negative rate (FNR); lower EER values indicate higher detection accuracy. As this table shows, the proposed method achieves a smaller pixel-level EER, which indicates its higher accuracy in anomaly detection. A closer look at Table 1 shows that although the frame-level EER of the methods in [13] and [31] is lower than that of the proposed method, their pixel-level error is higher than ours (the pixel-level EER is not available for [31]).

Table 1 EER comparison with the state-of-the-art methods

Method             Frame-level   Pixel-level
SF [17]               0.42          0.79
MPPCA [16]            0.31          0.82
SF+MPPCA [15]         0.36          0.72
Bertini [24]          0.30          –
Zaharescu [23]        0.27          0.36
MDT [15]              0.25          0.55
Biswas [22]           0.21          –
Reddy [14]            0.20          –
Revathi [33]          0.18          –
STC [13]              0.13          0.26
Cong [32]             0.24          –
Zhou [20]             0.24          –
Xu [36]               0.17          –
Sabokrou [29]         0.19          0.24
RE-SR [30]            0.15          –
CCA [26]              0.21          0.27
Leyva [28]            0.21          0.38
Lee [31]              0.10          –
Proposed method       0.14          0.21

Note that a frame is considered correctly detected at the frame level if at least one pixel is detected as abnormal; the actual location of the anomaly is not required for this evaluation. Therefore, some true positive detections may identify abnormal events at the wrong location. To test localization accuracy, the detected regions are compared to pixel-level ground-truth masks. In the pixel-level evaluation, a frame is considered correctly detected only if at least 40 percent of the truly anomalous pixels have been identified; otherwise it is counted as a false positive. Thus, the pixel-level criterion is the more demanding criterion for comparing different methods. The frame-level and pixel-level receiver operating characteristic (ROC) curves of the proposed method are plotted in Figs. 6 and 7, respectively. Also, a quantitative comparison of different methods in terms of the area under the curve (AUC) is shown in Table 2. The superiority of the proposed method in terms of anomaly detection and localization is evident from this table.

Fig. 6 Anomaly detection on Ped2: frame-level ROC

To perform a more detailed evaluation, the ROC curves for appearance and motion anomaly detection are drawn individually and then compared with the corresponding curves of CCA [26].¹ Figure 8 shows the ROC of appearance anomaly detection for the proposed method and CCA [26]. It can be seen in this figure that the proposed method achieves better results due to the use of a more accurate descriptor (HOG-LBP) and classifier (OC-SVM).

¹ Individual ROC curves for motion/appearance anomaly detection are only available for Ped2 in [26]; such curves have not been reported for the other methods.
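For reference, the frame-level EER and the 40-percent pixel-level criterion described above can be computed as in the following sketch (an assumed evaluation helper, not the paper's code; it presumes per-frame anomaly scores and binary detection masks are available, and uses scikit-learn for the ROC):

```python
import numpy as np
from sklearn.metrics import roc_curve, auc


def frame_level_eer_auc(labels, scores):
    """labels: 1 for frames containing an anomaly, 0 otherwise.
    scores: per-frame anomaly scores. EER is the point where FPR = FNR (= 1 - TPR)."""
    fpr, tpr, _ = roc_curve(labels, scores)
    fnr = 1.0 - tpr
    i = int(np.argmin(np.abs(fpr - fnr)))
    return 0.5 * (fpr[i] + fnr[i]), auc(fpr, tpr)


def pixel_level_hit(detection_mask, gt_mask, min_overlap=0.4):
    """A positive frame counts as correctly localized only if at least 40% of the
    truly anomalous pixels are covered by the detection; otherwise it is counted
    as a false positive (the pixel-level criterion described above)."""
    gt = gt_mask > 0
    if gt.sum() == 0:
        return False
    covered = np.logical_and(detection_mask > 0, gt).sum()
    return covered / gt.sum() >= min_overlap
```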


Fig. 7 Anomaly localization on Ped2: pixel-level ROC

Fig. 8 ROC of appearance anomaly detection: Proposed method (HOG-LBP, OC-SVM), and CCA [26] (HOG, Gaussian model)

Table 2 Comparison of AUC on UCSD (Ped1 | Ped2)

Method             Frame-level      Pixel-level
SF [17]            0.67 | 0.55      0.20 | 0.17
MPPCA [16]         0.59 | 0.69      0.20 | 0.13
SF+MPPCA [15]      0.67 | 0.61      0.21 | 0.20
MDT [15]           0.82 | 0.83      0.44 | 0.42
CCA [26]           –    | 0.85      –    | 0.80
Kangwei [27]       0.79 | 0.90      0.66 | 0.77
Xu [36]            0.92 | 0.90      0.67 | –
Lee [31]           –    | –         0.65 | 0.81
Biswas [22]        –    | –         0.40 | –
Proposed method    0.85 | 0.93      0.68 | 0.85

The ROC curves of motion anomaly detection are plotted in Fig. 9. This figure also shows the superiority of the proposed method for detecting abnormal motion events. It should be mentioned that the motion anomaly detection algorithm in [26] is based on the average magnitude of optical flow (AOF) and does not consider the direction of moving objects. Figure 9 demonstrates that the HOF descriptor uses both the magnitude and the direction of the optical flow vectors simultaneously, which results in higher accuracy for motion anomaly detection.

Fig. 9 ROC of motion anomaly detection: Proposed method (HOF, OC-SVM), and CCA [26] (Average optical flow)

In the proposed method, because the connected components are extracted from the original frames, there is no need to divide the entire frame into non-overlapping cells (see Fig. 4). Therefore, the amount of required data, and thus the computational load, is significantly reduced. For example, as can be seen in Fig. 1b, a frame of Ped2 contains 144 cells of size 40 × 15. In the traditional method, one HOG-LBP feature vector and one HOF feature vector are extracted for each cell; thus, it is necessary to extract 144 × 2 = 288 feature vectors for one frame of Ped2. However, in the proposed method, if a hypothetical frame contains six small regions and four large regions (each as large as three cells), the number of extracted feature vectors is much smaller than 288. In this example, six HOF vectors are extracted for the small regions, and four HOG-LBP vectors plus 4 × 3 = 12 HOF vectors are extracted for the large regions. In this way, a total of 22 vectors are calculated for the example frame, which is much less than 288. It should be noted that the number of large regions in the test set is 7090 for Ped1 and 5535 for Ped2, and the total number of extracted HOFs is 74,689 for Ped1 and 25,620 for Ped2. Also, there are 7200 test frames in Ped1 and 2010 in Ped2. Table 3 compares our method with the traditional method in terms of the total number of feature vectors. The comparison of these numbers shows that our method can accurately detect anomalies using a smaller number of feature vectors.
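The counting in the example above can be reproduced with a few lines (a worked sketch of the arithmetic only, not code from the paper):

```python
def vectors_traditional(cells_per_frame):
    # one HOG-LBP vector and one HOF vector per cell
    return 2 * cells_per_frame


def vectors_proposed(n_small, n_large, cells_per_large):
    # one HOF per small region; one HOG-LBP plus per-cell HOFs per large region
    return n_small + n_large * (1 + cells_per_large)


print(vectors_traditional(144))   # 288 vectors for one Ped2 frame
print(vectors_proposed(6, 4, 3))  # 22 vectors for the hypothetical frame
```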


Table 3 Total number of extracted feature vectors

                      HOG-LBP                    HOF
Traditional method
  Ped1                100 × 7200 = 720,000       720,000
  Ped2                144 × 2010 = 289,440       289,440
Proposed method
  Ped1                7090                       74,689
  Ped2                5535                       25,620

4 Conclusion

In this paper, a new method for detecting anomalous events in crowded scenes is presented. The proposed method is based on the one-class support vector machine. In the first step, we determine the proper cell size (PCS). The PCS is calculated by considering the common size of normal objects in the training data. Then, appearance abnormalities are detected by computing HOG-LBP descriptors for the candidate regions and applying them to a trained OC-SVM classifier. To achieve more accurate detection of appearance anomalies, the conventional overlappings are taken into account in the training data; in fact, the proposed method can correctly distinguish a few overlapped normal objects from a single large (abnormal) one. Motion abnormalities are also detected by a separate OC-SVM using HOF. To achieve higher accuracy, the large regions are divided into non-overlapping cells, and a separate HOF is calculated for each cell. The experimental results show that the proposed method can detect abnormal events in crowded scenes, outperforming the state-of-the-art methods on the UCSD dataset.

As a follow-up to this work, we intend to move toward deep learning concepts and the use of autoencoders. In this way, it will no longer be necessary to calculate hand-crafted features such as HOG-LBP and HOF; the output codes of a trained autoencoder in a deep network can be used as the extracted feature vectors. Also, we will direct our research toward other types of anomaly detection algorithms applicable to other sorts of surveillance systems. In the method presented here, it is assumed that the background image is time-invariant and is calculated in the training phase. In applications using a moving camera, the background image of the target environment is variable; in these circumstances, a module that recalculates the background image at specified intervals would need to be added to the proposed system.

References

1. Sodemann, A., Ross, M., Borghetti, B.: A review of anomaly detection in automated surveillance. IEEE Trans. Syst. Man Cybern. 42(6), 1257–1272 (2012)
2. Vishwakarma, S., Agrawal, A.: A survey on activity recognition and behavior understanding in video surveillance. Vis. Comput. 29(10), 983–1009 (2013)
3. Feng, W., Liu, R., Zhu, M.: Fall detection for elderly person care in a vision-based home surveillance environment using a monocular camera. Signal Image Video Process. 8(6), 1129–1138 (2014)
4. Zhou, S.H., et al.: Unusual event detection in crowded scenes by trajectory analysis. In: Proceedings of ICASSP, pp. 1300–1304 (2015)
5. Kumar, D., et al.: A visual-numeric approach to clustering and anomaly detection for trajectory data. Vis. Comput. 33(3), 265–281 (2017)
6. Junejo, I.: Using dynamic Bayesian network for scene modeling and anomaly detection. Signal Image Video Process. 4(1), 1–10 (2010)
7. Rao, Y.: Automatic vehicle recognition in multiple cameras for video surveillance. Vis. Comput. 31(3), 271–280 (2015)
8. Zhang, C., Chen, W., et al.: A multiple instance learning and relevance feedback framework for retrieving abnormal incidents in surveillance videos. J. Multimed. 5(4), 310–321 (2010)
9. Vallejo, D., Albusac, J., Jimenez, L.: A cognitive surveillance system for detecting incorrect traffic behaviors. Expert Syst. Appl. 36(7), 10503–10511 (2009)
10. Albusac, J., et al.: Intelligent surveillance based on normality analysis to detect abnormal behaviors. Pattern Recognit. Artif. Intell. 23(7), 1223–1244 (2009)
11. Varadarajan, J., Odobez, J.: Topic models for scene analysis and abnormality detection. In: Proceedings of IEEE Conference on Computer Vision Workshops, pp. 1338–1345 (2009)
12. Tang, S., Andriluka, M., Schiele, B.: Detection and tracking of occluded people. Int. J. Comput. Vis. 110(1), 58–69 (2014)
13. Roshtkhari, M., Levine, M.D.: An on-line, real-time learning method for detecting anomalies in videos using spatio-temporal compositions. Comput. Vis. Image Underst. 117(10), 1436–1452 (2013)
14. Reddy, V., Sanderson, C., Lovell, B.: Improved anomaly detection in crowded scenes via cell-based analysis of foreground speed, size and texture. In: Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 55–61 (2011)
15. Mahadevan, V., Li, W., et al.: Anomaly detection in crowded scenes. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 1975–1981 (2010)
16. Kim, J., Grauman, K.: Observe locally, infer globally: a space-time MRF for detecting abnormal activities with incremental updates. In: IEEE Conference on Computer Vision and Pattern Recognition, pp. 2921–2928 (2009)
17. Mehran, R., Oyama, A., Shah, M.: Abnormal crowd behavior detection using social force model. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 935–942 (2009)
18. Zhang, T., et al.: A new method for violence detection in surveillance scenes. Multimed. Tools Appl. 75(12), 7327–7349 (2016)
19. Ren, W., et al.: Unsupervised kernel learning for abnormal events detection. Vis. Comput. 31(3), 245–255 (2015)
20. Zhou, S.H., et al.: Spatial-temporal convolutional neural networks for anomaly detection and localization in crowded scenes. Signal Process. Image Commun. 47, 358–368 (2016)
21. Yu, Y., Shen, W., Huang, H., Zhang, Zh.: Abnormal event detection in crowded scenes using two sparse dictionaries with saliency. J. Electron. Imaging 26(3), 33013 (2017)
22. Biswas, S., Babu, R.V.: Anomaly detection in compressed H.264/AVC video. Multimed. Tools Appl. 74(24), 11099–11115 (2015)
23. Zaharescu, A., Wildes, R.: Anomalous behavior detection using spatiotemporal oriented energies, subset inclusion histogram comparison and event-driven processing. In: Proceedings of European Conference on Computer Vision, pp. 563–576 (2010)


24. Bertini, M., Bimbo, A., Seidenari, L.: Multi-scale and real-time nonparametric approach for anomaly detection and localization. Comput. Vis. Image Underst. 116(3), 320–329 (2012)
25. Li, T., Chang, H., et al.: Crowded scene analysis: a survey. IEEE Trans. Circuits Syst. Video Technol. 25(3), 367–386 (2015)
26. Amraee, S., et al.: Anomaly detection and localization in crowded scenes using connected component analysis. Multimed. Tools Appl. (2017). https://doi.org/10.1007/s11042-017-5061-7
27. Kangwei, L., et al.: Abnormal event detection and localization using level set based on hybrid features. Signal Image Video Process. (2017). https://doi.org/10.1007/s11760-017-1153-0
28. Leyva, R., et al.: Abnormal event detection in videos using binary features. In: International Conference on Telecommunications and Signal Processing (TSP) (2017)
29. Sabokrou, M., et al.: Real-time anomaly detection and localization in crowded scenes. In: IEEE Conference on Computer Vision and Pattern Recognition Workshops, pp. 320–329 (2015)
30. Sabokrou, M., et al.: Video anomaly detection and localisation based on the sparsity and reconstruction error of auto-encoder. Electron. Lett. 52(13), 1122–1124 (2016)
31. Lee, D., et al.: Motion influence map for unusual human activity. IEEE Trans. Circuits Syst. Video Technol. 25(10), 1612–1623 (2015)
32. Cong, Y., Yuan, J., Yandong, T.: Video anomaly search in crowded scenes via spatio-temporal motion context. IEEE Trans. Inf. Forensics Secur. 8(10), 1590–1599 (2013)
33. Revathi, A., Kumar, D.: An efficient system for anomaly detection using deep learning classifier. Signal Image Video Process. 11(2), 291–299 (2017)
34. Xiang, T., Gong, Sh.: Video behavior profiling for anomaly detection. IEEE Trans. Pattern Anal. Mach. Intell. 30(5), 893–908 (2008)
35. Cheng, W., Chen, T., Fang, H.: Gaussian process regression-based video anomaly detection and localization with hierarchical feature representation. IEEE Trans. Image Process. 24(12), 5288–5301 (2015)
36. Xu, D., Yan, Y., Ricci, E., Sebe, N.: Detecting anomalous events in videos by learning deep representations of appearance and motion. Comput. Vis. Image Underst. 156(C), 117–127 (2017)
37. Miao, Y., Song, J.: Abnormal event detection based on SVM in video surveillance. In: Proceedings of IEEE Workshop on Advanced Research and Technology in Industry Applications, pp. 1379–1383 (2014)
38. Chen, Y., Qian, J., Saligrama, V.: A new one-class SVM for anomaly detection. In: Proceedings of IEEE ICASSP, pp. 3567–3571 (2013)
39. Dalal, N., Triggs, B.: Histograms of oriented gradients for human detection. In: Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pp. 886–893 (2005)
40. Ojala, T., Pietikainen, M., Maenpaa, T.: Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Trans. Pattern Anal. Mach. Intell. 24(7), 971–987 (2002)
41. Dalal, N., Triggs, B., Schmid, C.: Human detection using oriented histograms of flow and appearance. In: Proceedings of European Conference on Computer Vision, pp. 428–441 (2006)
42. Barron, L., Fleet, J., Beauchemin, S., Burkitt, A.: Performance of optical flow techniques. Int. J. Comput. Vis. 12(1), 43–77 (1994)
43. Schölkopf, B., et al.: Estimating the support of a high-dimensional distribution. Neural Comput. 13(7), 1443–1471 (2001)
44. UCSD Anomaly Detection Dataset. http://www.svcl.ucsd.edu/projects/anomaly/dataset
