Appl Intell

DOI 10.1007/s10489-017-1017-x

Anomaly detection using piecewise aggregate approximation in the amplitude domain

Huorong Ren1 · Xiujuan Liao1 · Zhiwu Li1,2 · Abdulrahman Al-Ahmari3

© Springer Science+Business Media, LLC 2017

Huorong Ren
rszhjin@126.com

1 School of Electro-Mechanical Engineering, Xidian University, Xi’an 710071, China
2 Institute of Systems Engineering, Macau University of Science and Technology, Taipa 999078, Macau, China
3 Industrial Engineering Department, College of Engineering, King Saud University, Riyadh 11421, Saudi Arabia

Abstract Anomaly detection has received much attention due to its various applications. Generally, the first step in discovering anomalies is a data representation method that reduces dimensionality while preserving key information. Anomaly detection based on real-value representation methods is attractive because numeric operations are convenient. A typical real-value representation method is the Piecewise Aggregate Approximation (PAA), which is simple and intuitive because it captures the mean values of segments in a sequence. However, if segments have the same or similar average values but different oscillation amplitudes, the PAA method cannot effectively describe a sequence composed of such segments. To address this issue, we propose a representation method called the Piecewise Aggregate Approximation in the Amplitude Domain (AD-PAA). To discover anomalies, a sequence is first partitioned into subsequences by a sliding window. Then, in the AD-PAA method, a subsequence is divided into equal-size subsections according to the amplitude domain. With the mean values of the subsections computed, the amplitude oscillation of a subsequence is captured effectively. When the AD-PAA method is applied to approximate the subsequences, the AD-PAA representation of a sequence is constructed. Anomalies are determined by anomaly scores that are based on similarities among representation results. Experimental results on various data confirm that the proposed method is more accurate than the PAA-based method and other comparison methods. The ability of the proposed algorithm to differentiate anomalies is also superior.

Keywords Sequences · Anomaly detection · Data representation · Anomaly score

1 Introduction

Sequences arise frequently in many fields, including medical science, network security, finance, industrial engineering, and transportation [3, 9, 32, 40, 42]. Since meaningful information of a sequence is involved in anomalies, anomaly detection for sequences is valuable in real applications, for instance, disease diagnosis in medical science [2] and intrusion detection in networks [44]. In a sequence, pattern anomalies are sub-regions (subsequences) that deviate from the expected behavior [11], and they are an important type of anomaly. This paper focuses on pattern anomaly detection.

Initially, studies of anomaly detection usually dealt with sequences directly. Different techniques such as distance functions [25], density functions [5], clustering methods [19] and prediction models [4] are employed for detecting various types of anomalies. Because a subsequence is equal to a set of points, Ma and Perkins [35] propose an algorithm using a prediction model. By using a Support Vector Regression (SVR) model trained on the normal parts of a sequence, the predicted values for the points to be tested are computed. When the difference between the real values of some points and the predicted values exceeds expectation, the pattern anomalies
H. Ren et al.

are detected. The work in [24] develops a method called the brute force algorithm that utilizes the nearest neighbor distance between any two subsequences. By setting a threshold, the top-k anomalies are found. Luo et al. [34] suggest a method that reconstructs a raw sequence in the phase space. When the reconstruction errors between parts of a raw sequence and a reconstructed sequence are larger than the selected threshold, anomalies are detected. Other studies also utilize reconstruction methods for anomaly detection. For example, in [18] and [46], reconstruction models are built by fuzzy C-means clustering. Certainly, some algorithms that handle raw sequences directly are effective for anomaly detection. However, the time cost of these methods is usually high.

Recently, algorithms usually commence with a data representation method to speed up the procedure of anomaly detection [29, 36, 41, 43]. Data representation methods have two major functions. Firstly, they compress sequences [21]. When a sequence is long, a data representation approach is useful for reducing computational costs. Secondly, they represent the critical information of a sequence [12]. If an unsuitable data representation is adopted, false anomaly detection results are likely. Consequently, an effective data representation method is essential to anomaly detection. The symbol representation is a common data representation technique. With a certain rule for segments in a sequence, each segment is converted into a character [30]. Accordingly, a subsequence in the transformed sequence is a string of characters. The symbol representation has the advantage of admitting complex tools such as suffix trees [31]. Keogh et al. propose a method called HOT-SAX [24] that employs the maximum neighbor distance among strings to decide anomalies. Several other algorithms using symbol representation methods have also been reported. Yan and Chen [43] develop an on-line method that continuously estimates the density of anomaly scores in subsequences, where the anomaly scores are calculated by the HOT-SAX algorithm. Lonardi et al. [33] suggest an approach that uses a Markov model to train on a normal string and then computes the expected probability of test strings. The work in [6] reports a binary-representation-based method that adopts the same algorithmic framework as HOT-SAX. In reality, anomaly detection depends on numerical values, so in the existing studies the computation on symbols has to be transformed into numeric operations. Operations on real-value approximation results, which are already in numeric form, are more convenient. Hence, an efficacious real-value representation method for anomaly detection is worthy of attention.

The Piecewise Aggregate Approximation (PAA) [21] technique is a typical and fast real-value method which uses the mean values of equal-size segments to represent a sequence. However, the PAA method smooths out the local amplitude information of a sequence. Improved works for the PAA method fall mainly into three types. The first adds different information, such as variance, to each segment of a sequence directly. The second uses adaptive division along the time axis. The third combines the addition of various information with adaptive division. These improved works have two limitations. Firstly, they only capture the local change of a sequence along the time axis, while the local change along the amplitude axis is ignored. Secondly, the complexity of the improved representation methods based on adaptive division is high. To overcome these limitations of the PAA-based methods, a non-adaptive representation method that strengthens the description of local information along the amplitude axis is needed.

In this paper, we propose a concise and effective real-value representation method, namely the Piecewise Aggregate Approximation in the amplitude domain (AD-PAA), which divides a subsequence into equal-size subsections according to the amplitude domain to realize anomaly detection. The AD-PAA method first partitions a sequence into non-overlapping subsequences. Then the mean values in the subsections are used to substitute the original data. After that, the proposed representation can be constructed. By applying the AD-PAA method to all subsequences, the AD-PAA representation of a sequence is obtained. Since the AD-PAA method describes information of a sequence in the amplitude domain, it preserves the information of oscillation amplitude. The division method in the AD-PAA adds information in the direction of the amplitude domain. In addition, there is no need to fit the shape of sequences in the division process. With the help of the Euclidean distance, we construct the relationship between two arbitrary subsequences. Finally, we convert the correlations among subsequences into anomaly scores for subsequences. By setting an anomaly threshold parameter, anomalies are detected.

The remainder of this paper is structured as follows. Section 2 briefly reviews some previous works. Section 3 presents the related definitions of anomaly detection. The proposed method AD-PAA is developed in Section 4. Section 5 formulates an algorithm for anomaly detection using the AD-PAA method. The experiment results are reported in Section 6. Conclusions are reached in Section 7.

2 Related work

Currently, many representation methods have been proposed. Popular real-value representation methods include the Discrete Wavelet Transform (DWT) [8, 10], the Model Approximation (MA) [14, 37, 38], the Piecewise Linear
Approximation (PLA) [22] and the Piecewise Aggregate Approximation (PAA) [21, 45]. Various real-value data representation strategies lead to different approaches for anomaly detection. Based on the DWT method, Shahabi et al. [39] propose an algorithm called the TSA-Tree which uses the maximum of Haar wavelet coefficients to find anomalies. The Haar wavelet is outstanding in describing local information [13]. Since a sequence is represented by only several significant Haar DWT coefficients, some anomalies may be lost when the TSA-Tree algorithm is utilized for anomaly detection. Akouemo et al. propose an algorithm that combines different models [1]. In this work, adopting various models also increases the difficulty of the training process. On the basis of the PLA method, which approximates segments as straight lines, Leng et al. [27] develop an approach that applies the dynamic warping distance function to two arbitrary subsequences. Once some results exceed the set threshold, they can be regarded as anomalies. In [28], the PLA method is applied by Leng et al. again to propose a method in which the density function called the Local Outlier Factor (LOF) is used. The LOF is adopted in [28] for measuring the density relationship between a subsequence and its neighbors. Generally, if the LOF of a subsequence is larger than one, the subsequence is an anomaly. These works by Leng et al. achieve good accuracy, but their complex procedures reduce the efficiency of detection, especially when the data size is large. From the literature review, it is easy to see that data representation methods have an important influence on the results of anomaly detection.

The DWT is a method which transforms a sequence of length $m$ into merely $p$ main wavelet coefficients (with $p \ll m$). Because the DWT completes the task of representation in the frequency domain instead of the time domain, the representation results of the DWT method are not intuitive. The MA, as its name suggests, is a method that describes a sequence by a kind of model. By building a suitable model with its parameters or coefficients, the model is formed. However, sometimes a single kind of model is not sufficient for approximating a sequence, especially when the sequence is complex. Classic model approximation methods include the Support Vector Machine (SVM), the Hidden Markov Model (HMM), etc. The PLA is a linear method which is intuitive and understandable. By selecting breakpoints and connecting them into segments, the process of the PLA approach is implemented. A vital parameter for choosing breakpoints is the segmentation error [26], and different methods to select breakpoints have been proposed. An example of the PLA method is given next. Firstly, one can link two adjacent points and add points one by one until the fitting error of a straight line exceeds the segmentation error. Secondly, the previous step is executed repeatedly to obtain all breakpoints of a sequence. Once the breakpoints of a sequence are obtained, the PLA representation of a sequence is completed. The PLA method is effective in shape approximation and information representation if the breakpoints are reasonable. However, the segmentation procedure, which traverses all points to find breakpoints, is complex [15].

Next the PAA technique [21, 45] is reviewed. Suppose that a sequence $X(m) = \{x(1), x(2), \ldots, x(m)\}$ of length $m$ is to be transformed by the PAA approach. The transformed sequence can be written as $\bar{X} = \{\bar{x}(1), \bar{x}(2), \ldots, \bar{x}(w)\}$. Formally,

$$\bar{x}(i) = \frac{w}{m} \sum_{j=\frac{m}{w}(i-1)+1}^{\frac{m}{w} i} x(j) \qquad (1)$$

where $w$ is the number of segments after transformation. Besides, the size of each segment is equal. In (1), the dimensionality of a sequence is reduced from $m$ to $w$ ($w < m$). By arranging the means of the $w$ segments in time order, the PAA representation of a sequence is obtained. If we extract subsequences by a sliding window from the PAA representation of a whole sequence, the PAA representations of any subsequences are acquired.

Compared with the DWT, MA and PLA methods, the procedure in the PAA approach is simple. Along with its merits, the PAA technique also has a problem. For example, given two segments with the same or similar average values but different oscillation amplitudes, the PAA method cannot describe their difference. Given the number of segments $w = 10$, Fig. 1 illustrates an example of the PAA representation of a sequence. Figure 1 shows that the PAA representation method divides a time series in the direction of the time axis. However, the sequence in Fig. 1 is smoothed out and amplitude information is lost.

Fig. 1 The PAA representations [plot not reproduced; vertical axis: data value]

To improve the representation quality of the PAA method, different methods have been presented. Guo et al. in [16] suggest an improved PAA by adding variance information of each equal-size segment, which is an approach
that supplies different information directly. Since the amount of information in each segment grows, the difficulty of data analysis increases. In fact, diverse information such as the extreme value can be appended to the PAA representation, but it is difficult to decide what information is useful. Keogh et al. in [7] propose the Adaptive Piecewise Constant Approximation (APCA) technique, which uses breakpoints to decide the best linear segments and then captures the mean value of each segment. Since its segment size is not fixed, the APCA approach adapts to the shape of a sequence. With unfixed sizes for tracking the shape of a sequence, the mean value of each segment plays a better role along the time axis. However, when a single mean value is used for each segment, the amplitude information is still likely to be lost in the APCA method. On the basis of the APCA method, the work in [17] proposes the Piecewise Linear Aggregate Approximation (PLAA) technique. That is to say, the adaptive division by selecting breakpoints in the PLAA method is the same as in the APCA method. Meanwhile, both mean and slope are employed to represent each unfixed-size segment in a sequence. Hence, the information in the PLAA method is increased by using the slope information. In addition, the use of breakpoints exposes some problems of the APCA and PLAA approaches. To choose optimal breakpoints, the procedure of the APCA and PLAA techniques is complex. If the APCA and PLAA methods are used for anomaly detection, it is difficult to establish the correlations among subsequences. Furthermore, the improved works only capture the local change of a sequence along the time axis. Aimed at highlighting the local change of a sequence along the amplitude axis, a concise representation method is proposed. In addition, because a sequence is divided according to the amplitude domain, the concise representation method does not need to adapt to the shape of a sequence. This paper reports a compact real-value representation method for anomaly detection.

3 Definition statement

To state the issue of anomaly detection concretely, we give the related definitions used in this paper.

Definition 1 Sequence: A sequence $X(m) = \{x(1), x(2), \ldots, x(m)\}$ is a time series, where data elements are ordered by time, and $m$ stands for the length of $X(m)$.

Definition 2 Sliding Window: A sliding window is a time window of a user-defined length $n \le m$ that moves along a sequence.

Definition 3 Subsequence: Given a sequence, the subsequences of length $n$ are extracted by a sliding window. If a sliding window moves $l$ steps each time, the $k$-th subsequence is expressed as:

$$X_k = \{x(1 + l(k-1)), x(2 + l(k-1)), \ldots, x(n + l(k-1))\} \qquad (2)$$

where the number of subsequences $k$ is computed by:

$$k = \frac{m-n}{l} + 1 \qquad (3)$$

In particular, if the step $l$ of the sliding window is equal to the length $n$ of the subsequences, the $k$-th subsequence is alternatively denoted as:

$$X_k = \{x(1 + n(k-1)), x(2 + n(k-1)), \ldots, x(nk)\} \qquad (4)$$

where $k = m/n$. A subsequence is a part of a sequence with length $n \le m$. Indeed, it can be seen as a new sequence too.

Definition 4 Amplitude domain: Given a subsequence $X_i$, the amplitude domain of the subsequence is defined by $A_i = [X_{il}, X_{iu}]$, where $X_{il}$ ($X_{iu}$) is the minimum (maximum) value of $X_i$.

Definition 5 Subsections: Given a subsequence $X_i$, its subsections are generated by dividing the amplitude domain $A_i$ of the subsequence into equal-size subsections. The $t$-th subsection of a subsequence is defined by:

$$A_{it} = [a_{t-1}, a_t), \quad 1 \le t \le q \qquad (5)$$

where $a_{t-1}$ and $a_t$ are the lower and upper bounds of the $t$-th subsection, respectively, and $q$ is the number of subsections.

4 Piecewise Aggregate Approximation in the amplitude domain

In this section, a representation method called the AD-PAA is introduced. Unlike the PAA method, which divides a sequence only in the direction of the time axis, the AD-PAA method focuses on division in the direction of the amplitude axis. The division process and the suitable form for this representation method are introduced in detail.

4.1 Division

The inspiration of the AD-PAA method is to divide the amplitude domain in order to describe the local information in the direction of amplitude. Considering the time attribute of a sequence and the local property of anomalies, the division starts from subsequences which are obtained by a sliding window. A sequence is then composed of a series of subsequences. Given a subsequence $X_i$ of length $n$, the amplitude domain of $X_i$ is determined according to Definition 4. Then we divide $X_i$ into $q$ equal-size subsections. Based on the division, points are assigned to the corresponding subsection. In other words, when the data value of a
point in the subsequence $X_i$ is within $[a_{t-1}, a_t)$, it belongs to the $t$-th subsection.

To introduce the AD-PAA division frame clearly, we continue to use the sequence in Fig. 1. The division frame of the AD-PAA method is shown in Fig. 2, illustrating two subsequences of length 40 (40 points) that are transformed to length four (four subsections). That is to say, each subsequence is reduced to 10% of its raw length. The AD-PAA method is competitive in reducing dimensionality when a data set is large.

Fig. 2 The division in the AD-PAA representation [plot not reproduced; vertical axis: data value]

4.2 Representation

A vital representation step is to capture useful information. Indeed, each subsection is a dispersed point set. To assess the overall trend of the points within each subsection, the mean value is beneficial. Accordingly, we employ the mean values of points within each subsection to construct the AD-PAA representation of a subsequence. Consequently, several average values in subsections are calculated for a subsequence. Let $\mu_{it}$ ($1 \le t \le q$) denote the mean value in the $t$-th subsection of a subsequence $X_i$. The representation of a subsequence is written as a vector of the mean elements in subsections. Formally,

$$\bar{X}_i = [\mu_{i1}, \mu_{i2}, \ldots, \mu_{iq}]^T \qquad (6)$$

The sequence previously mentioned is still adopted in this step. With length four of the subsequence in the PAA representation, which is equal to the length of the subsequence in the AD-PAA approach, the first subsequence is represented as [−1.40, −1.47, −1.21, −0.96]. In reality, the amplitude domain of the first subsequence is [−2.28, −0.63]. The elements in the PAA representation gather in a narrow range of the amplitude domain. In the AD-PAA technique, the first subsequence is transformed to $[−2.14, −1.60, −1.46, −1.05]^T$. It is close to the original amplitude of the first subsequence [−2.28, −0.63]. Thus, compared with the PAA method, the amplitude oscillation information of the first subsequence in the AD-PAA method is preserved.

5 The AD-PAA for anomaly detection

According to the concept of pattern anomalies, the goal is to find the most unusual subsequences. Hence, an approach that computes anomaly scores for subsequences is proposed. Before discussing anomaly scores, building a similarity measure is an essential step. A similarity measure is used to describe the differences among subsequences. Given two subsequences $X_i$ and $X_j$, we first apply the AD-PAA method to them. Consequently, $X_i$ and $X_j$ are transformed to the vectors $[\mu_{i1}, \mu_{i2}, \ldots, \mu_{iq}]^T$ and $[\mu_{j1}, \mu_{j2}, \ldots, \mu_{jq}]^T$, respectively, as expressed by (6). The similarity between these two vectors reflects the difference between $X_i$ and $X_j$. Hence, we define the similarity for $X_i$ and $X_j$ according to the Euclidean distance as follows:

$$S_\mu(X_i, X_j) = \sqrt{\sum_{t=1}^{q} (\mu_{it} - \mu_{jt})^2} \qquad (7)$$

where $q$ is the number of subsections. If two subsequences are similar, the value is small, and vice versa. Given a sequence $X(m)$, we first divide it into $k$ subsequences by a sliding window and then represent each subsequence by the AD-PAA method. To detect anomalous subsequences, we calculate the similarities between every pair of subsequences, and the results are denoted as:

$$S = \begin{bmatrix} S_\mu(X_1, X_1) & S_\mu(X_1, X_2) & \cdots & S_\mu(X_1, X_k) \\ S_\mu(X_2, X_1) & S_\mu(X_2, X_2) & \cdots & S_\mu(X_2, X_k) \\ \vdots & \vdots & \ddots & \vdots \\ S_\mu(X_k, X_1) & S_\mu(X_k, X_2) & \cdots & S_\mu(X_k, X_k) \end{bmatrix} \qquad (8)$$

which is called the similarity matrix $S$. Finally, we employ the similarity matrix $S$ to compute the anomaly scores of subsequences. The anomaly score of the $i$-th subsequence $X_i$ is calculated by:

$$h_i = \frac{\sum_{j=1}^{k} S_\mu(X_i, X_j)/k}{\sum_{i=1}^{k} \sum_{j=1}^{k} S_\mu(X_i, X_j)/k^2} \qquad (9)$$

In (9), the numerator is the mean of a column (a row) in the similarity matrix $S$, and the denominator is the mean of all elements in $S$. Obviously, if a subsequence $X_i$ is an anomaly, its mean similarity $\sum_{j=1}^{k} S_\mu(X_i, X_j)/k$ ($j \in [1, k]$) in a column (a row) is larger than that of other columns (rows) in the similarity matrix $S$. Hence, the computed anomaly scores highlight anomalies. Since the anomaly score of an anomaly deviates
from that of the other subsequences, it must be larger than the average level of the anomaly scores of all subsequences. A threshold is crucial for distinguishing between normal and abnormal subsequences. Indeed, the threshold depends on a performance index, which is called the confidence index in the next section. Based on theoretical analysis and extensive experiments, the threshold is set to an empirical value of the confidence index. The process of the proposed method is as follows:

Step 1. Divide a sequence into subsequences by a sliding window.
Step 2. Construct the AD-PAA representation of each subsequence.
Step 3. Compute the distances between every pair of subsequences by (7), which yields the similarity matrix in (8).
Step 4. Calculate anomaly scores for subsequences by (9).
Step 5. Use a threshold to determine anomalies.

6 Experiment

We conduct various experiments on extensive data sets in this section. The evaluation of the presented method focuses on effectiveness. The setting of parameters is discussed using synthetic data. To verify the performance of the proposed approach, experiments on real data are carried out. In addition, comparison methods are also applied for further discussion.

6.1 Performance indices

To evaluate the performance of detection methods, the indices Det.rate [20], False.rate and confidence index [18] are introduced. The Det.rate for a sequence is the positive detection rate, which is defined as follows:

$$\mathit{Det.rate} = \frac{t_p}{t_a} \qquad (10)$$

where $t_a$ is the number of anomalies to detect in a sequence and $t_p$ is the number of anomalies detected correctly in practice. From the definition above, the greater $t_p$ is, the greater Det.rate is. In particular, Det.rate = 1 ($t_a = t_p$) means that all $t_a$ anomalies are detected successfully, while Det.rate = 0 means that no anomalies are detected correctly.

Except for the $t_a$ abnormal subsequences, the remaining $(k - t_a)$ subsequences are normal, where $k$ is the number of all subsequences. If $t_f$ normal subsequences are detected as anomalies, the False.rate for a sequence is defined by

$$\mathit{False.rate} = \frac{t_f}{k - t_a} \qquad (11)$$

where the False.rate for a sequence is the false detection rate. When the False.rate is equal to zero, no normal subsequences are regarded as anomalies. However, if the False.rate is a positive value, some normal subsequences are misclassified. The Det.rate and False.rate evaluate the correct and false results, respectively. To determine the Det.rate and False.rate, the confidence index of a detection method for detecting an anomaly is computed. Let us consider a set $h_n$ that contains the anomaly scores of all subsequences. The confidence index for a sequence is given by

$$CI = \frac{h_j}{\bar{h}_n} \qquad (12)$$

where $CI$ is the confidence index, $h_j$ stands for the anomaly score of the $j$-th subsequence, and $\bar{h}_n$ is the mean of $h_n$. According to (12), $CI$ is the ratio of the anomaly score of a subsequence to the mean value of the anomaly scores of all subsequences. Certainly, a large $CI$ obtained by an anomaly detection method for an anomaly in a sequence means that the anomaly deviates far from normal subsequences. Thus, the anomaly scores of anomalies are larger than those of normal subsequences and than the average anomaly score of all subsequences. According to (9), it is easy to infer that the mean value of the anomaly scores of all subsequences is one. Consequently, if anomalies exist in a sequence, the confidence index $CI$ of an algorithm for detecting an anomaly is much larger than one. Conversely, the confidence index $CI$ of a normal subsequence is close to one. Hence, the threshold for the proposed method must be larger than one. Extensive experiments show that a threshold of two, which is equal to twice the average anomaly score of all subsequences, is effective for anomaly detection. Moreover, the confidence index $CI$ is helpful for evaluating how well an algorithm discovers an anomaly in a sequence. To evaluate the ability to differentiate multiple anomalies in a sequence, the average confidence index, denoted as $\overline{CI}$, is calculated by

$$\overline{CI} = \frac{\bar{h}_a}{\bar{h}_n} \qquad (13)$$

In particular, if the number of anomalies is equal to one, the average confidence index is identical to the confidence index of the anomaly to be detected, namely $\overline{CI} = CI$. From the definition of the average confidence index, the larger $\overline{CI}$ is, the stronger the ability of the method to differentiate normal and abnormal subsequences. In addition, the average confidence index $\overline{CI}$ helps to estimate missed detections. If $\overline{CI}$ is lower than the threshold, there must be a missed detection. Otherwise, missed detections may or may not exist.
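As an illustration only, Steps 1 to 5 of Section 5 and the indices (10), (11) and (13) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation. The function names (`paa`, `adpaa`, `anomaly_scores`, `evaluate`) and the toy data are hypothetical. Three choices are assumptions the text does not specify: the maximum value of a subsequence is placed in the last subsection, an empty subsection is represented by its midpoint, and $\bar{h}_a$ in (13) is read as the mean anomaly score of the true anomalies.

```python
import math  # math.dist requires Python 3.8 or later

def paa(sub, w):
    """Plain PAA (Eq. 1): mean of each of w equal-length time segments."""
    size = len(sub) // w  # assumes len(sub) is divisible by w
    return [sum(sub[i * size:(i + 1) * size]) / size for i in range(w)]

def adpaa(sub, q):
    """AD-PAA (Eqs. 5-6): mean of the points in each of q equal-width amplitude bins."""
    lo, hi = min(sub), max(sub)
    width = (hi - lo) / q
    bins = [[] for _ in range(q)]
    for v in sub:
        # Assumption: the maximum value is assigned to the last bin,
        # and a constant subsequence falls entirely into bin 0.
        t = min(int((v - lo) / width), q - 1) if width > 0 else 0
        bins[t].append(v)
    # Assumption: an empty bin is represented by its midpoint.
    return [sum(b) / len(b) if b else lo + (t + 0.5) * width
            for t, b in enumerate(bins)]

def anomaly_scores(reps):
    """Eqs. 7-9: row mean of the Euclidean distance matrix over its overall mean."""
    k = len(reps)
    S = [[math.dist(a, b) for b in reps] for a in reps]
    row_means = [sum(row) / k for row in S]
    overall = sum(row_means) / k
    if overall == 0:  # all representations identical: nothing stands out
        return [1.0] * k
    return [r / overall for r in row_means]

def evaluate(scores, true_anomalies, threshold=2.0):
    """Det.rate (Eq. 10), False.rate (Eq. 11) and average confidence index (Eq. 13)."""
    detected = {i for i, h in enumerate(scores) if h > threshold}
    det_rate = len(detected & true_anomalies) / len(true_anomalies)
    false_rate = len(detected - true_anomalies) / (len(scores) - len(true_anomalies))
    mean_all = sum(scores) / len(scores)
    # Assumption: h_a in Eq. 13 is the mean score of the true anomalies.
    mean_anom = sum(scores[i] for i in true_anomalies) / len(true_anomalies)
    return det_rate, false_rate, mean_anom / mean_all

# Toy data: ten non-overlapping subsequences of length 8. Subsequence 4 has the
# same mean as the others but a three times larger oscillation amplitude.
normal = [0, 1, 0, -1, 0, 1, 0, -1]
subsequences = [list(normal) for _ in range(10)]
subsequences[4] = [3 * v for v in normal]

paa_reps = [paa(s, 2) for s in subsequences]      # identical for all ten
adpaa_reps = [adpaa(s, 2) for s in subsequences]  # differs for subsequence 4
scores = anomaly_scores(adpaa_reps)
det_rate, false_rate, avg_ci = evaluate(scores, {4})
print(scores[4], det_rate, false_rate, avg_ci)
```

On this toy input the PAA vectors of all ten subsequences coincide, whereas the AD-PAA vector of subsequence 4 differs from the rest, so only that subsequence exceeds the threshold of two. The scores average to one, as inferred from (9).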
6.2 Synthetic data

According to the introduction of the proposed method, the parameter $q$, which stands for the number of subsections, is significant to the quality of representation and anomaly detection. To determine the optimal parameter $q$, synthetic data are designed in this section. Waves have been used for anomaly detection in [11]. The first sequence is built on a sine signal of cycle 60 and length 1200. By adding Gaussian noise $n(t)$ with zero mean and a standard deviation of 0.1 to the whole sine wave, the resulting signal, denoted by $Y_1(t)$, is generated, i.e.,

$$Y_1(t) = \sin\left(\frac{40\pi}{N} t\right) + n(t) \qquad (14)$$

Then we add an anomalous portion in [601, 630] of the signal with a noise $e_1(t)$ that obeys the distribution N(0, 0.3). The resulting signal $Y_2(t)$ is written as follows:

$$Y_2(t) = \sin\left(\frac{40\pi}{N} t\right) + n(t) + e_1(t) \qquad (15)$$

where $t = 1, 2, \ldots, 1200$, and $N = 1200$.

Considering the quasi-periodic property of the first synthetic data, we set the length of the subsequences to the cycle 60. The number of subsections $q$, namely the parameter of the proposed method, starts from two. We gradually increase the parameter $q$ by one each time. Several detection results with different values of $q$ are illustrated in Fig. 3.

Figure 3a–d show the detection results for the first synthetic data with different values of the parameter $q$ (the number of subsections), and each sub-figure is composed of two parts. The top part is the raw sequence and the bottom part is the anomaly score obtained by the proposed method and assigned to each subsequence. To make the detection results obvious, a threshold line is added in the bottom part. The threshold line stands for twice the mean value of all anomaly scores in a sequence. When an anomaly score exceeds the threshold, its corresponding subsequence is regarded as an anomaly. For clarity, anomalies discovered successfully are also highlighted. In Fig. 3a, the anomaly score of the event $e_1(t)$ is large when $q = 2$. Since the anomaly score of the first subsequence is also close to the threshold line, $q = 2$ is not an optimal choice for the first synthetic data. In Fig. 3b–d, the anomalies are discovered successfully, and the performance is better than in Fig. 3a.

To understand the role of the parameter $q$ in the proposed method thoroughly, the performance indices Det.rate, False.rate and $\overline{CI}$ (the average confidence index) are shown in Table 1. Generally, the larger Det.rate and $\overline{CI}$ are, and the smaller False.rate is, the better the parameter $q$ is. In Table 1, a Det.rate of 1/1 and a False.rate of 0/19 are obtained consistently as $q$ is increased. Although the
Fig. 3 Detection results with different numbers of subsections q in the first synthetic data [plots not reproduced; panels (a) to (d), each showing data value (top) and anomaly score (bottom) against time index 0 to 1200]
Table 1 Performance indices with different numbers of subsections q in the first synthetic data

q 2 3 4 5 6 7 8 9 10

Det.rate 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1 1/1
False.rate 0/19 0/19 0/19 0/19 0/19 0/19 0/19 0/19 0/19
CI 2.38 2.81 2.76 2.91 3.07 2.69 2.90 2.51 2.59

Det.rate and the False.rate are the same for all values of q, the CI varies. Since a large CI is helpful for detection, any value of q from three to ten is better than two in Table 1. Furthermore, CI reaches its maximum in Table 1 when the number of subsections is six. Hence, the best value of q for the first synthetic data is six.
In the first synthetic data, the anomaly is obviously different from the other parts. It is also of interest to understand the effect of the parameter q on the proposed method when the anomalies in a sequence are weak. We therefore conduct an experiment on the second synthetic data. The second sequence is obtained by adding an event e2(t) that obeys the distribution N(0, 0.22) to the interval [601, 630] of the signal Y1(t) as an anomaly.
Figure 4 shows the detection results for the second synthetic data under different values of parameter q (the number of subsections). In Fig. 4a, where q is three, no anomaly score is greater than the threshold. Although the anomaly score of the event e2(t) is the maximum, several other anomaly scores are close to it. In other words, the anomaly is hidden among the normal subsequences in this case. In Fig. 4b and c, the anomaly score of the event e2(t) exceeds the threshold. Moreover, in each of these two sub-figures, the other anomaly scores are below the threshold, so e2(t) is detected easily. In Fig. 4d, where q is ten, the anomaly score of the subsequence containing the anomaly is lower than the threshold. Hence, the anomaly is not detected. More detection results are listed in Table 2 for selecting the optimal parameter q.
In Table 2, the False.rate is zero for all values of q. Since a Det.rate of 1/1 means that the anomaly is detected correctly, values of q from five to eight are feasible for the second synthetic data. Furthermore, the values of CI (the average confidence index) for q from five to eight are larger than those for the other values. Hence, according to the overall performance indices in Table 2, the detection results for q from five to eight are better than the others. In addition, on the basis of the maximum CI, the best value of q is again six.
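The selection rule used in this discussion (keep the values of q that detect the anomaly without false detections, then take the one with the largest CI) can be sketched as follows. This is a minimal illustration, not the authors' code: the numbers are copied from Tables 1 and 2, while the function name `best_q` and the dictionary layout are hypothetical.

```python
# Performance indices copied from Tables 1 and 2 (numerators only for the rates).
Q_VALUES = [2, 3, 4, 5, 6, 7, 8, 9, 10]

TABLE_1 = {  # first synthetic data
    "detected": [1, 1, 1, 1, 1, 1, 1, 1, 1],
    "false": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "ci": [2.38, 2.81, 2.76, 2.91, 3.07, 2.69, 2.90, 2.51, 2.59],
}
TABLE_2 = {  # second synthetic data
    "detected": [0, 0, 0, 1, 1, 1, 1, 0, 0],
    "false": [0, 0, 0, 0, 0, 0, 0, 0, 0],
    "ci": [0.93, 1.57, 1.91, 2.21, 2.29, 2.05, 2.13, 1.89, 1.90],
}


def best_q(table, n_anomalies=1):
    """Return the feasible q with the largest CI, or None if no q is feasible.

    A q is treated as feasible when every anomaly is detected and there are
    no false detections (an assumption that mirrors the text's reasoning).
    """
    feasible = [
        (ci, q)
        for q, det, false, ci in zip(
            Q_VALUES, table["detected"], table["false"], table["ci"]
        )
        if det == n_anomalies and false == 0
    ]
    if not feasible:
        return None
    return max(feasible)[1]  # largest CI wins


print(best_q(TABLE_1), best_q(TABLE_2))  # prints: 6 6
```

Both tables lead to q = 6, which agrees with the value chosen in the text.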
[Figure 4: panels (a)-(d); each plots data value and anomaly score; horizontal range 0 to 1200]
Fig. 4 Detection results with different numbers of subsections q in the second synthetic data

Table 2 Performance indices with different numbers of subsections q in the second synthetic data

q 2 3 4 5 6 7 8 9 10

Det.rate 0/1 0/1 0/1 1/1 1/1 1/1 1/1 0/1 0/1
False.rate 0/19 0/19 0/19 0/19 0/19 0/19 0/19 0/19 0/19
CI 0.93 1.57 1.91 2.21 2.29 2.05 2.13 1.89 1.90
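The denominators in Tables 1 and 2 (one abnormal and 19 normal subsequences) are consistent with cutting the 1200-point sequence into subsequences of length 60. The sketch below illustrates this for the second synthetic data under several assumptions that are not taken from the paper: only the sinusoidal component sin(40πt/N) is generated (the noise terms are omitted), the event is drawn with an illustrative standard deviation of 0.2, and the subsequences are assumed not to overlap. The names `make_sequence` and `split` are hypothetical.

```python
import numpy as np

N = 1200            # sequence length of the synthetic data
CYCLE = 60          # subsequence length; sin(40*pi*t/N) has period 60 when N = 1200
EVENT = (601, 630)  # 1-based, inclusive interval that receives the event e2(t)


def make_sequence(event_std=0.2, seed=0):
    """Sinusoid with a Gaussian event added on EVENT (event_std is illustrative)."""
    rng = np.random.default_rng(seed)
    t = np.arange(1, N + 1)
    y = np.sin(40 * np.pi * t / N)  # periodic component only (assumption)
    start, stop = EVENT
    y[start - 1:stop] += rng.normal(0.0, event_std, size=stop - start + 1)
    return y


def split(y, length=CYCLE):
    """Cut y into non-overlapping subsequences of equal length (assumption)."""
    usable = (len(y) // length) * length
    return y[:usable].reshape(-1, length)


subsequences = split(make_sequence())
# 1-based indices of the subsequences that contain the event boundaries
hit = sorted({(i - 1) // CYCLE + 1 for i in EVENT})
print(subsequences.shape, hit)  # prints: (20, 60) [11]
```

Under these assumptions the event falls entirely inside the eleventh subsequence, leaving 19 normal subsequences.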

The experiments on the first and the second synthetic data reveal that an appropriate q is of paramount importance. The best value of q in both the first and the second synthetic data is six. Therefore, we set the value of parameter q to six in this paper.

6.3 Real data

Since real data are more complex than synthetic data, the experiments are extended to real data. The data come from the UCR Time Series Data Mining Archive [23], which is an open resource and was used in [20, 24] as well as in other papers. A portion of the real data, comprising electrocardiogram (ECG) data sets and a video data set, is used in this experiment. The ECG data sets involve various anomalies, such as premature ventricular contraction, supraventricular escape beat and bundle branch block beat. The video data set is extracted from a video of an actor performing a shooting action with a replica gun. The video data set describes the X and Y coordinates of the actor's right hand. The repeated actions of the actor include drawing a replica gun from a hip-mounted holster, aiming it at a target and returning it. Since different data sets may have different cycles, they may be segmented into subsequences of different lengths. We list two parameters m and n of the data sets in Table 3, where m is the length of a sequence and n is the length of its subsequences. Several data sets contain multiple columns, but the parameters are the same for every column in a data set. For example, chfdbchf01 275 2 is the second column of sequence chfdbchf01 275, whose length m is 3700 and whose subsequence length n is 230. The parameter q discussed previously is still set to six for real data. Given the parameters m and n of the real data sets, the proposed algorithm is applied.
Figure 5 shows the detection results of the proposed method on two real sequences. In Fig. 5a, the sub-region [2000, 2600] of sequence 231 is different from the rest. The abnormal part of sequence 231 involves the seventh and the eighth subsequences. As illustrated in Fig. 5, the two anomalies are detected successfully. Because the two anomaly scores are much larger than the threshold, the proposed method has a strong ability to differentiate normal and abnormal subsequences. Next, a video data set is used to verify the proposed method. The detection result of video 2 (the second column of the video data set) is shown in Fig. 5b. As shown in Fig. 5b, two parts in

Table 3 The performance indices of the PAA method and the proposed method on various data

Data sets             m (sequence)  n (subsequence)  PAA method                         Proposed method
                                                     Det.rate    False.rate     CI      Det.rate    False.rate  CI
chfdbchf01 275 2      3700          230              1/1         0/15           2.12    1/1         0/15        4.97
chfdbchf01 275 3      3700          230              1/1         0/15           4.51    1/1         0/15        4.21
chfdbchf13 45590 2    3650          165              0/1         1/21           1.26    1/1         1/21        3.17
chfdbchf13 45590 3    3650          165              0/1         1/21           0.74    1/1         1/21        5.88
xmitdbx108 2          5100          340              0/1         1/14           1.20    1/1         1/14        2.95
xmitdbx108 3          5100          340              0/1         1/14           0.69    1/1         1/14        2.70
231                   8500          330              0/2         1/23           0.83    2/2         0/23        4.02
234 3                 8500          170              0/1         3/49           0.73    1/1         0/49        6.82
qtdbsele0606 3        9000          150              3/6         0/55           2.62    5/6         0/55        3.45
qtdbsel102 3          10000         200              1/2         0/46           0.98    2/2         0/46        4.72
video 1               6000          150              1/4         2/36           1.91    3/4         0/36        3.12
video 2               6000          150              1/4         2/36           1.96    4/4         0/36        3.72
Total                 72900         2620             8/25(32%)   12/345(3.5%)   19.55   23/25(92%)  0/345(0%)   49.73
Average               6075          218              N.A.        N.A.           1.63    N.A.        N.A.        4.14

[Figure 5: panels (a) and (b); each plots data value and anomaly score; horizontal range 0 to 8000 in (a) and 0 to 6000 in (b)]
Fig. 5 Detection results of the proposed algorithm on ECG data

[2000, 3000] are markedly inconsistent with the others, and they span four subsequences. With the proposed method, the anomaly scores of these four subsequences exceed the threshold line and correspond to the anomalies to be detected. Hence, the anomalies are discovered correctly, without false detections, by the proposed method.
To examine the performance of the proposed approach in more depth, it is compared with the PAA method (Euclidean distance), the direct detection (Euclidean distance) method [25] and the brute force algorithm [24]. To compare these methods, a sequence is partitioned into subsequences whose length is identical to that in the AD-PAA method. In the PAA (Euclidean distance) method, the PAA representation is applied to each subsequence, and the distance between two arbitrary subsequences is calculated by (7) and (8). In the direct detection (Euclidean distance) method, the distance between two subsequences is computed as the sum of the distances between all of their points. To compare the detection results of the different methods, the anomaly scores of all four methods are obtained by (9).
Two ECG data sets are used to verify the proposed method. Figure 6a–d show the detection results of the four
[Figure 6: panels (a)-(d); each plots data value and anomaly score; horizontal range 0 to 3000]
Fig. 6 Detection results of four methods on sequence chfdbchf13 45590 2
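The difference between the PAA baseline and the direct detection baseline can be made concrete with a small sketch. It assumes the standard PAA of Keogh et al. [21] (segment means and the corresponding scaled Euclidean distance) and a plain point-wise Euclidean distance for direct detection; the exact forms of Eqs. (7)-(9) of this paper are not reproduced here, and all function names are hypothetical. The two test signals share the same segment means but differ in oscillation amplitude, which is the case the abstract identifies as problematic for PAA.

```python
import numpy as np


def paa(x, w):
    """Standard PAA: mean of each of w equal-length segments.

    len(x) must be divisible by w in this simplified sketch.
    """
    x = np.asarray(x, dtype=float)
    return x.reshape(w, -1).mean(axis=1)


def paa_distance(x, y, w):
    """Euclidean distance between PAA representations, scaled by sqrt(n / w)."""
    n = len(x)
    return np.sqrt(n / w) * np.linalg.norm(paa(x, w) - paa(y, w))


def direct_distance(x, y):
    """Point-wise Euclidean distance on the raw subsequences."""
    return np.linalg.norm(np.asarray(x, dtype=float) - np.asarray(y, dtype=float))


t = np.arange(60)
small = 0.1 * np.sin(2 * np.pi * t / 10)  # low-amplitude oscillation
large = 1.0 * np.sin(2 * np.pi * t / 10)  # same shape, ten times the amplitude

# Each of the 6 segments covers one full period, so every segment mean is ~0.
print(paa_distance(small, large, w=6) < 1e-9)  # prints: True
print(direct_distance(small, large) > 1.0)     # prints: True
```

Because every segment mean is approximately zero for both signals, the PAA distance vanishes even though the raw signals clearly differ.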

[Figure 7: panels (a)-(d); each plots data value and anomaly score; horizontal range 0 to 5000]
Fig. 7 Detection results of four methods on sequence xmitdbx108 2

methods on sequence chfdbchf13 45590 2, which contains one anomaly. In Fig. 6a, c and d, the maximum anomaly score points to the anomaly in the sequence. In other words, the anomaly is discovered by the PAA method, the brute force method and the proposed method. However, the anomaly scores in Fig. 6b are lower than the threshold, so the direct detection method fails to detect the anomaly. Moreover, because the column of the maximum anomaly score corresponds to a normal subsequence, the PAA method generates a false detection result. Compared with the other methods, the difference between the anomaly scores of the abnormal and normal

Table 4 The performance indices of the direct detection method, the brute force method and the proposed method on various data

Data sets             Direct detection method            Brute force method                 Proposed method
                      Det.rate    False.rate     CI      Det.rate    False.rate     CI      Det.rate    False.rate     CI
chfdbchf01 275 2      1/1         0/15           3.86    1/1         0/15           6.60    1/1         0/15           4.97
chfdbchf01 275 3      0/1         0/15           1.89    1/1         0/15           3.30    1/1         0/15           4.21
chfdbchf13 45590 2    0/1         0/21           1.94    1/1         0/21           2.30    1/1         0/21           3.17
chfdbchf13 45590 3    0/1         0/21           1.23    0/1         0/21           1.56    1/1         0/21           5.88
xmitdbx108 2          0/1         0/14           0.90    0/1         0/14           1.28    1/1         0/14           2.95
xmitdbx108 3          0/1         0/14           0.92    0/1         1/14           0.46    1/1         0/14           2.70
231                   0/2         0/23           0.90    1/2         0/23           1.47    2/2         0/23           4.02
234 3                 0/1         0/49           0.97    1/1         0/49           2.22    1/1         0/49           6.82
qtdbsele0606 3        1/6         0/55           1.65    5/6         0/55           3.97    5/6         0/55           3.45
qtdbsel102 3          0/2         0/46           0.92    1/2         2/46           3.83    2/2         0/46           4.72
video 1               1/4         0/36           1.86    4/4         1/36           2.66    3/4         0/36           3.12
video 2               3/4         0/36           2.00    4/4         0/36           3.27    4/4         0/36           3.72
Total                 6/25(24%)   1/345(0.3%)    19.04   19/25(76%)  5/345(1.5%)    32.92   23/25(92%)  0/345(0%)      49.73
Average               N.A.        N.A.           1.59    N.A.        N.A.           2.64    N.A.        N.A.           4.14
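The overall detection rates in Tables 3 and 4, and the gaps between the proposed method and each comparison method, follow directly from the "Total" rows. The short sketch below reproduces that arithmetic; the dictionary and variable names are hypothetical, and the gaps are expressed in percentage points.

```python
from fractions import Fraction

# Detected anomalies out of 25, taken from the "Total" rows of Tables 3 and 4.
TOTAL_ANOMALIES = 25
DETECTED = {
    "PAA": 8,
    "Direct detection": 6,
    "Brute force": 19,
    "Proposed": 23,
}

# Exact detection rates as fractions, then as whole percentages.
rates = {name: Fraction(k, TOTAL_ANOMALIES) for name, k in DETECTED.items()}
percent = {name: round(rate * 100) for name, rate in rates.items()}

# Gap between the proposed method and each baseline, in percentage points.
gain = {
    name: percent["Proposed"] - value
    for name, value in percent.items()
    if name != "Proposed"
}

print(percent)  # {'PAA': 32, 'Direct detection': 24, 'Brute force': 76, 'Proposed': 92}
print(gain)     # {'PAA': 60, 'Direct detection': 68, 'Brute force': 16}
```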

subsequences is large in the proposed method. Hence, the proposed method is superior in discriminating normal and abnormal subsequences.
Figure 7a–d show the detection results of the four methods on sequence xmitdbx108 2, which contains one anomaly. In Fig. 7a–c, the anomaly scores are lower than the threshold. Hence, the PAA method, the direct detection method and the brute force method fail to detect the anomaly. With our method, one column of anomaly scores in Fig. 7d points to the anomaly. It is concluded that the proposed method detects the anomaly in sequence xmitdbx108 2 effectively. In addition, the proposed method produces no false detection results. Given the difference between the anomaly scores of the abnormal and normal subsequences, the proposed method shows a strong ability to differentiate abnormal and normal subsequences.
Beyond the previous results, further experiments are conducted to compare the accuracy of the PAA method, the direct detection method, the brute force method and the proposed method. Table 3 shows the experimental results of the PAA method and the proposed method. Parameters m and n stand for the lengths of the sequence and of the subsequences, respectively. Table 3 records the performance indices Det.rate, False.rate and CI (the average confidence index) for 12 data sets. We analyze Table 3 in three steps. Firstly, the Det.rate of the proposed approach is 92%, while the Det.rate of the PAA method is 32%. That is to say, our method handles some cases where the PAA method does not work well. Overall, the Det.rate is improved by 60 percentage points in the developed algorithm. Secondly, the False.rate of the PAA method is about 3.5%, whereas the False.rate of the proposed method is 0% in this experiment. Although the False.rate is reduced by only 3.5 percentage points, the reduction makes the results more accurate. Finally, we evaluate the confidence index of the two methods on the different data sets. In most data sets, the CI for anomalies in our method is greater than that in the PAA method. A large CI indicates a stronger ability to discriminate normal and abnormal subsequences. In turn, if a method has a stronger ability to separate anomalies from the rest of a sequence, its detection results will be better. Table 3 confirms that the average CI of the proposed method, 4.14, is greater than that of the PAA method, 1.63. In summary, the proposed method is more accurate than the PAA method and produces fewer false results.
Table 4 lists the Det.rate, False.rate and CI (the average confidence index) of the direct detection approach, the brute force approach and the proposed algorithm. By comparing the performance indices of the three methods, the quality of the proposed method is explored further. The parameter setting is the same as in Table 3. In Table 4, the Det.rate values of the direct detection method and the brute force method are 24% and 76%, respectively. Compared with these two approaches, the Det.rate of the proposed method is improved by 68 and 16 percentage points, respectively. As for the False.rate of the three methods, that of the direct detection method is 0.3%, that of the brute force algorithm is 1.5% and that of the proposed method is 0%. Although the False.rate values of all three methods are small, the proposed algorithm still has an advantage. In comparison with the direct detection method and the brute force method, the False.rate of the proposed method is reduced by 0.3 and 1.5 percentage points, respectively. Furthermore, the average CI of the proposed method, 4.14, is greater than 1.59 for the direct detection method and 2.64 for the brute force method. The overall performance indices above indicate that the proposed method is excellent in accuracy and in its ability to differentiate anomalies.

7 Conclusion

In this paper, an effective approach is proposed for anomaly detection. The approach is based on a real-value representation method called AD-PAA. It is a novel representation method in which the division is carried out in the amplitude domain. The mean values of the points within subsections are then used to represent subsequences. After the AD-PAA method is applied to subsequences, a similarity measure is constructed for two arbitrary subsequences. The similarity results are transformed into anomaly scores, and anomalies are discovered by means of a threshold. We experimentally evaluated the proposed method on synthetic data and real data. Experiments show that the accuracy of the proposed algorithm is 92% on real data. Compared with the PAA method, its accuracy is improved by 60 percentage points, while its false detection rate is reduced by 3.5 percentage points. The ability of the proposed method to distinguish normal from abnormal subsequences is also outstanding. When more approaches are applied to the experimental data for comparison, the proposed method remains superior in accuracy and in its ability to differentiate anomalies.

Acknowledgements The authors extend their appreciation to the International Scientific Partnership Program ISPP at King Saud University for funding this research work through ISPP#0799.

References

1. Akouemo HN, Povinelli RJ (2016) Probabilistic anomaly detection in natural gas time series data. Int J Forecast 32(3):948–956. doi:10.1016/j.ijforecast.2015.06.001
2. Andrysiak T (2016) Machine learning techniques applied to data analysis and anomaly detection in ECG signals. Appl Artif Intell 30(6):610–634. doi:10.1080/08839514.2016.1193720

3. Avazbeigi M, Doulabi SHH, Karimi B (2010) Choosing the appropriate order in fuzzy time series: a new N-factor fuzzy time series for prediction of the auto industry production. Expert Syst Appl 37(8):5630–5639. doi:10.1016/j.eswa.2010.02.049
4. Balasooriya U (1989) Detection of outliers in the exponential distribution based on prediction. Commun Stat- Theory Methods 18(2):711–720. doi:10.1080/03610928908829929
5. Breunig MM, Kriegel H, Ng RT, Sander J (2000) LOF: identifying density-based local outliers. In: ACM SIGMOD international conference on management of data, pp 93–104. doi:10.1145/342009.335388
6. Buu HTQ, Anh DT (2011) Time series discord discovery based on iSAX symbolic representation. In: Proceedings of the third international conference on knowledge and systems engineering, pp 11–18. doi:10.1109/KSE.2011.11
7. Chakrabarti K, Keogh E, Mehrotra S, Pazzani M (2002) Locally adaptive dimensionality reduction for indexing large time series databases. ACM Trans Database Syst 27(2):188–228. doi:10.1145/568518.568520
8. Chan FKP, Fu AWC, Yu C (2003) Haar wavelets for efficient similarity search of time-series: with and without time warping. IEEE Trans Knowl Data Eng 15(3):686–705. doi:10.1109/TKDE.2003.1198399
9. Chang PC, Fan CY, Lin JL (2011) Trend discovery in financial time series data using a case based fuzzy decision tree. Expert Syst Appl 38(5):6070–6080. doi:10.1016/j.eswa.2010.11.006
10. Chaovalit P, Gangopadhyay A, Karabatis G, Chen ZY (2011) Discrete wavelet transform-based time series analysis and mining. ACM Comput Surv 43(2):33–63. doi:10.1145/1883612.1883613
11. Chen XY, Zhan YY (2008) Multi-scale anomaly detection algorithm based on infrequent pattern of time series. J Comput Appl Math 214(1):227–237. doi:10.1016/j.cam.2007.02.027
12. Esling P, Agon C (2012) Time-series data mining. ACM Comput Surv 45(1):12–45. doi:10.1145/2379776.2379788
13. Fu AWC, Leung OTW, Keogh E, Lin J (2006) Finding time series discords based on haar transform. In: Proceedings of international conference on advanced data mining and applications, pp 31–41. doi:10.1007/11811305_3
14. Fuchs E, Gruber T, Nitschke J, Sick B (2010) Online segmentation of time series based on polynomial least-squares approximations. IEEE Trans Pattern Anal Mach Intell 32(12):2232–2245. doi:10.1109/TPAMI.2010.44
15. Guerrero JL, Berlanga A, Garc J, Molina JM (2010) Piecewise linear representation segmentation as a multiobjective optimization problem. Adv Intell Soft Comput 79:267–274. doi:10.1007/978-3-642-14883-5_35
16. Guo CH, Li HL, Pan DH (2010) An improved piecewise aggregate approximation based on statistical features for time series mining. In: International conference on knowledge science, engineering and management, pp 234–244. doi:10.1007/978-3-642-15280-1_23
17. Hung NQ, Anh DT (2008) An improvement of PAA for dimensionality reduction in large time series databases. In: Proceedings of pacific rim international conference on artificial intelligence, pp 698–707. doi:10.1007/978-3-540-89197-0_64
18. Izakian H, Pedrycz W (2013) Anomaly detection in time series data using a fuzzy C-means clustering. In: Proceedings of IFSA world congress and NAFIPS meeting, pp 1513–1518. doi:10.1109/IFSA-NAFIPS.2013.6608627
19. Jiang MF, Tseng SS, Su CM (2001) Two-phase clustering process for outliers detection. Pattern Recogn Lett 22(6–7):691–700. doi:10.1016/S0167-8655(00)00131-8
20. Jones M, Nikovski D, Imamura M, Hirata T (2016) Exemplar learning for extremely efficient anomaly detection in real-valued time series. Data Min Knowl Disc 30(6):1–28. doi:10.1007/s10618-015-0449-3
21. Keogh E, Chakrabarti K, Pazzani M, Mehrotra S (2001a) Dimensionality reduction for fast similarity search in large time series databases. Knowl Inf Syst 3(3):263–286. doi:10.1007/PL00011669
22. Keogh E, Chu S, Hart D, Pazzani M (2001b) An online algorithm for segmenting time series. In: Proceedings of IEEE international conference on data mining, pp 289–296. doi:10.1109/ICDM.2001.989531
23. Keogh E, Lin J, Fu AWC (2005) Details about time series discords. http://www.cs.ucr.edu/eamonn/discords
24. Keogh E, Lin J, Fu AW, Herle HV (2006) Finding unusual medical time-series subsequences: algorithms and applications. IEEE Trans Inf Technol Biomed 10(3):429–439. doi:10.1109/TITB.2005.863870
25. Knorr EM, Ng R, Tucakov V (2000) Distance-based outliers: algorithms and applications. VLDB J 8(3):237–253. doi:10.1007/s007780050006
26. Lemire D (2007) A better alternative to piecewise linear time series segmentation. In: Proceedings of SIAM international conference on data mining, pp 985–993. doi:10.1137/1.9781611972771.59
27. Leng MW, Lai XS, Tan G, Xu X (2009) Time series representation for anomaly detection. In: IEEE international conference on computer science and information technology, pp 628–632. doi:10.1109/ICCSIT.2009.5234775
28. Leng MW, Yu W, Wu S, Hu H (2013) Anomaly detection algorithm based on pattern density in time series. Lecture Notes Electr Eng 236:305–311. doi:10.1007/978-1-4614-7010-6_35
29. Li GL, Bräysy O, Jiang LX, Wu ZD, Wang YZ (2013) Finding time series discord based on bit representation clustering. Knowl-Based Syst 54(4):243–254. doi:10.1016/j.knosys.2013.09.015
30. Lin J, Keogh E, Lonardi S, Chiu B (2003) A symbolic representation of time series, with implications for streaming algorithms. In: Proceedings of the eighth ACM SIGMOD workshop on research issues in data mining and knowledge discovery, pp 2–11. doi:10.1145/882082.882086
31. Lin J, Keogh E, Wei L, Lonardi S (2007) Experiencing SAX: a novel symbolic representation of time series. Data Min Knowl Disc 15(2):107–144. doi:10.1007/s10618-007-0064-z
32. Lippi M, Bertini M, Frasconi P (2013) Short-term traffic flow forecasting: an experimental comparison of time-series analysis and supervised learning. IEEE Trans Intell Transp Syst 14(2):871–882. doi:10.1109/TITS.2013.2247040
33. Lonardi S, Lin J, Keogh E, Chiu B (2006) Efficient discovery of unusual patterns in time series. N Gener Comput 25(1):61–93. doi:10.1007/s00354-006-0004-2
34. Luo W, Gallagher M, Wiles J (2013) Parameter-free search of time-series discord. J Comput Sci Technol 28(2):300–310. doi:10.1007/s11390-013-1330-8
35. Ma J, Perkins S (2003) Online novelty detection on temporal sequences. In: ACM SIGKDD international conference on knowledge discovery and data mining, pp 613–618. doi:10.1145/956750.956828
36. Ma JG, Sun L, Wang H, Zhang YC, Aickelin U (2016) Supervised anomaly detection in uncertain pseudoperiodic data streams. ACM Trans Internet Technol 16(1):1–20. doi:10.1145/2806890
37. Mok MS, Sohn SY, Ju YH (2010) Random effects logistic regression model for anomaly detection. Expert Syst Appl 37(10):7162–7166. doi:10.1016/j.eswa.2010.04.017
38. Quinn JA, Sugiyama M (2014) A least-squares approach to anomaly detection in static and sequential data. Pattern Recogn Lett 40(1):36–40. doi:10.1016/j.patrec.2013.12.016
39. Shahabi C, Tian XM, Zhao WG (2000) TSA-tree: a wavelet-based approach to improve the efficiency of multi-level surprise and trend queries on time-series data. In: Proceedings of the twelfth international conference on scientific and statistical database management, pp 55–68. doi:10.1109/SSDM.2000.869778

40. Tewatia DK, Tolakanahalli RP, Paliwal BR, Tomé WA (2011) Time series analyses of breathing patterns of lung cancer patients using nonlinear dynamical system theory. Phys Med Biol 56(7):2161–2181. doi:10.1118/1.4734982
41. Truong CD, Anh DT (2015) An efficient method for motif and anomaly detection in time series based on clustering. Int J Bus Intell Data Min 10(4):356–377. doi:10.1504/IJBIDM.2015.072212
42. Viinikka J, Debar H, Mé L, Lehikoinen A, Tarvainen M (2009) Processing intrusion detection alert aggregates with time series modeling. Inf Fusion 10(4):312–324. doi:10.1016/j.inffus.2009.01.003
43. Yan QY, Chen XT (2013) A novel never-ending uncertain Top-k discord detection method. Inf Technol J 12(19):4906–4910. doi:10.3923/itj.2013.4906.4910
44. Yang Y, Hu HP, Xiong W, Ding F (2011) A novel network traffic anomaly detection model based on superstatistics theory. J Networks 6(2):311–318. doi:10.4304/jnw.6.2.311-318
45. Yi BK, Faloutsos C (2000) Fast time sequence indexing for arbitrary Lp Norms. In: Proceedings of the twenty-sixth international conference on very large data bases, pp 385–394
46. Zhao J, Liu K, Wang W, Liu Y (2014) Adaptive fuzzy clustering based anomaly data detection in energy system of steel industry. Inf Sci Int J 259(3):335–345. doi:10.1016/j.ins.2013.05.018

Huorong Ren received his BS, MS, and PhD degrees in electronic engineering from Xidian University, Xi’an, China, in 1994, 1999, and 2004, respectively. Currently, he is an Associate Professor with the School of Electro-Mechanical Engineering, Xidian University, China. His research interests include face recognition, data mining, pattern recognition, computer vision, and biosecurity.

Xiujuan Liao received her BS degree in measurement and control technology and instrument from Xi’an University of Posts and Telecommunications, China, in 2014. Currently, she is a postgraduate student with the School of Electro-Mechanical Engineering, Xidian University, China. Her research interests include data mining, and pattern recognition.

Zhiwu Li received his BS, MS, and PhD degrees in mechanical engineering, automatic control, and manufacturing engineering, respectively, all from Xidian University, Xi’an, China, in 1989, 1992, and 1995, respectively. He joined Xidian University in 1992 and now he is also with the Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau. Over the past decade, he was a Visiting Professor at the University of Toronto, Technion (Israel Institute of Technology), Martin-Luther University, Conservatoire National des Arts et Métiers (Cnam), Meliksah Universitesi, and King Saud University. His current research interests include Petri net theory and application, supervisory control of discrete event systems, workflow modeling and analysis, system reconfiguration, game theory, and data and process mining. He is a member of Discrete Event Systems Technical Committee of the IEEE Systems, Man, and Cybernetics Society, and a member of IFAC Technical Committee on Discrete Event and Hybrid Systems from 2011 to 2014. He serves as a frequent reviewer for 50+ international journals including Automatica and a number of the IEEE Transactions as well as many international conferences. He is listed in Marquis Who’s Who in the world, 27th Edition, 2010. Dr. Li is a recipient of an Alexander von Humboldt Research Grant, Alexander von Humboldt Foundation, Germany, and Research in Paris, France. He is the founding chair of Xi’an Chapter of IEEE Systems, Man, and Cybernetics Society.

Abdulrahman AI-Ahmari received his PhD degree in manufacturing systems engineering from the University of Sheffield, Sheffield, U.K., in 1998. He is currently a Professor of Industrial Engineering with King Saud University (KSU), Riyadh, Saudi Arabia. He led a number of funded projects from different organizations at KSU. His research interests include advanced manufacturing technologies, analysis, and design of manufacturing systems; computer-integrated manufacturing; optimization of manufacturing operations; applications of simulation optimization; flexible manufacturing systems and cellular manufacturing systems; and applications of decision support system in manufacturing.
