
Expert Systems With Applications 149 (2020) 113252


Anomaly pattern detection for streaming data


Taegong Kim, Cheong Hee Park∗
Dept. of Computer Science and Engineering Chungnam National University 220 Gung-dong, Yuseong-gu Daejeon, 305–763, Korea

Article history:
Received 14 March 2019
Revised 6 December 2019
Accepted 28 January 2020
Available online 8 February 2020

Keywords:
Anomaly pattern detection
Control charts
Hypothesis testing
Outlier detection
Streaming data

Abstract

Outlier detection aims to find a data sample that is different from most other data samples. While outlier detection is performed at an individual instance level, anomaly pattern detection on a data stream means detecting a time point where the pattern that generates the data is unusual and significantly different from normal behavior. Beyond predicting the outlierness of individual data samples in a data stream, it can be very useful to detect the occurrence of anomalous patterns in real time. In this paper, we propose a method for anomaly pattern detection in a data stream based on binary classification for outliers and statistical tests on a data stream of binary labels of normal or an outlier. In the first step, by applying a clustering-based outlier detection method, we transform a data stream into a stream of binary values, where 0 stands for the prediction as normal data and 1 for outlier prediction. In the second step, anomaly pattern detection is performed on the stream of binary values by two approaches: testing the equality of parameters in the binomial distributions of a reference window and a detection window, and using control charts for the fraction defective. The proposed method obtained an average true positive detection rate of 94% in simulated experiments using real and artificial data. The experimental results also show that anomaly pattern occurrence can be detected reliably even when outlier detection performance is relatively low.

© 2020 Elsevier Ltd. All rights reserved.

1. Introduction

An outlier is defined as an observation which deviates so much from other observations as to arouse suspicion that it was generated by a different mechanism (Aggarwal, 2017; Hawkins, 1980). Outlier detection can be applied to various problems such as fraud detection, intrusion detection in computer networks, system fault detection, and unexpected error detection in databases (Park, 2019).

While most outlier detection methods work in batch mode, where all the data samples are available at once, the necessity for efficient outlier and anomaly pattern detection in a data stream has increased (Domingues, Filippone, Michiardi, & Zouaoui, 2018). A data stream is a sequence of data that is generated continuously over time. It is infeasible to store streaming data completely in memory, and therefore it is difficult to process data streams using traditional data mining algorithms.

Outlier detection is usually performed at an individual instance level, predicting the outlierness of each sample. In contrast, anomaly pattern detection on a data stream involves detecting a time point where the behavior of the data generation system is unusual and significantly different from normal behavior (Park, 2019; Wong, Moore, Cooper, & Wagner, 2002). Fig. 1 shows the conceptual difference between outlier detection and anomaly pattern detection on data streams (Park, 2019). As shown in Fig. 1(a), outlier detection focuses on predicting whether an incoming test instance is an outlier or not. Fig. 1(b) illustrates the concept of anomaly pattern detection, where the arrow indicates the time point when outliers suddenly begin to occur heavily. There may be some noise in the data, and the outlierness of individual data samples does not always mean that an interesting situation occurs. However, a sudden increase of data samples predicted as outliers may indicate that something unusual has happened, such as a breakdown of the equipment generating the data stream or the occurrence of a particular event. In (Park, 2019), we presented a review of outlier detection, anomaly pattern detection, and concept drift for streaming data. The review shows that many batch mode outlier detection methods have been extended for outlier detection on data streams, but anomaly pattern detection has not been studied much compared to outlier detection. This observation motivated research into how to detect outbreaks of abnormal patterns in a data stream.

In this paper, we propose a method for anomaly pattern detection in a data stream, assuming that a training set consisting only of normal data samples is given. Anomaly pattern detection in the proposed method is performed in two steps.

∗ Corresponding author.
E-mail addresses: dmflsla@naver.com (T. Kim), cheonghee@cnu.ac.kr (C.H. Park).
https://doi.org/10.1016/j.eswa.2020.113252
0957-4174/© 2020 Elsevier Ltd. All rights reserved.

Fig. 1. An illustration of outlier detection and anomaly pattern detection on data streams. Circles represent normal samples and triangles denote outliers (Park, 2019).

In the first step, an outlier detection model that outputs binary labels of normal samples or outliers is learned from the training data. By applying the outlier detection model to incoming data samples, a data stream is transformed into a stream of binary values indicating normal or outlier. In the second step, the occurrence of an anomaly pattern is detected on the stream of binary values by testing the equality of parameters in the binomial distributions of a reference window and a detection window, or by using control charts for the fraction defective.

The contribution of the paper can be summarized as follows.

• We propose two methods for detecting a time point where an anomaly pattern occurs on a data stream: APD-HT based on hypothesis testing and APD-CC using control charts.
• By using an outlier detection method which outputs a binary label for outlier prediction, the proposed method overcomes the difficulty of detecting changes in a real-valued data stream.
• The proposed method works under the assumption that training data consisting of normal data is given. This makes the proposed method very practical in environments where outliers occur at a very small rate or where it is difficult to collect outliers in a training stage.
• The experimental results show that the proposed method is not sensitive to the prediction performance of the outlier detection method.
• The proposed method APD-HT obtained an average true positive detection rate of 94% in simulated experiments using real and artificial data.

The remainder of the paper is organized as follows. In Section 2, related work for anomaly pattern detection is reviewed. In Section 3, we propose a method for anomaly pattern detection on a data stream. Experimental results are given in Section 4 and discussion follows in Section 5.

2. Related work

Wong et al. proposed a method for detecting disease outbreaks by searching a database of emergency department cases (Wong et al., 2002). They applied a rule-based anomaly pattern detection algorithm where a rule was defined in the form Xi = Vi^j for a feature Xi and the j-th value Vi^j of that feature. Strange events were detected by comparing the records of the current day against the records of the past. In (Das, Schneider, & Neil, 2008), this was extended to generate a pattern of anomalous records that was similar for a particular subset of attributes, but anomalous due to unusual values in any random set of attributes. However, both methods are only applicable to categorical data and have limitations in building complicated rules in various situations.

In (Ahmad & Purdy, 2016), raw anomaly scores for streaming data instances were computed using an HTM (hierarchical temporal memory) network (Padilla, Brinkworth, & McDonnell, 2013). In a sequence of outlier scores, the sample mean and variance for the last W raw anomaly scores were computed, and the recent short-term average of anomaly scores was used for anomaly pattern detection by applying a threshold to the Gaussian tail probability. However, the method in (Ahmad & Purdy, 2016) has a limitation in that there is no guarantee for a Gaussian distribution of anomaly scores, and extending the method to multivariate data streams is not easy.

In (Spinosa, Carvalho, & Gama, 2007; 2008), a cluster-based anomaly pattern detection method was proposed. Starting with examples of a single class that describe the normal profile, k-means clustering was applied to build the normal model consisting of hyperspheres. If an incoming data instance lies inside any hypersphere, it is considered to be a normal instance. Otherwise, it is moved to a short-term memory where validated clusters that represent a novel concept or the extension of the existing model are found. However, this approach focuses mainly on finding evolving classes and does not detect the time point where an anomaly pattern occurs on a data stream.

Anomaly pattern detection is closely related to concept drift detection on a data stream, whose goal is to perceive a change of existing concepts (Park, 2019). When there is limited access to class labels, a classification model can be constructed using a small amount of labeled data, and concept drift on a stream of unlabeled data is detected using a confidence value derived from the classifier (Kim & Park, 2017; Lindstrom, Namee, & Delany, 2011; Sethi & Kantardzic, 2015; Zliobaite, 2010). The method in (Zliobaite, 2010) analyzed a sequence of posterior estimates derived from the classifier by using univariate statistical tests such as the two-sample t-test and the Wilcoxon rank sum test. CDBD (Confidence Distribution Batch Detection) (Lindstrom et al., 2011) used the Kullback–Leibler divergence to compare the distribution of the confidence values. While most concept drift detection methods under limited access to class labels work on a sequence of real-valued posterior estimates from a trained classifier, our proposed method for anomaly pattern detection performs change detection on a sequence of binary values of 0 and 1.

3. The proposed anomaly pattern detection method

Detection of an anomaly pattern such as a sudden burst of outliers can be performed based on the distribution of outlier scores of the incoming data samples. However, change detection based on the distribution of outlier scores relies on accurately estimating the parameters of the probability distribution. In order to overcome the difficulty of change detection in a real-valued stream, we define a Bernoulli random variable representing whether an incoming data instance is an outlier or not, and propose two approaches for anomaly pattern detection on an outcome stream from Bernoulli trials: testing the equality of parameters in the binomial distributions of a reference window and a detection window, and using control charts for the fraction defective.

3.1. Transformation into a stream of binary values indicating outliers

In order to transform a data stream into a stream of binary values of 0 or 1, any outlier detection method which gives a binary label indicating normal or an outlier can be used. We applied a clustering-based outlier detection method (Haque, Khan, & Baron, 2016; Park, Kim, Kim, Choi, & Lee, 2018; Spinosa, Carvalho, & Gama, 2008). Given training data that contains only normal data, the region of normal data samples is modeled as a union of hyperspheres by performing k-means clustering on the training data. Each cluster is summarized by a center and a radius, where the radius is the farthest distance from the center to a cluster member. When a test data sample arrives, the closest cluster center to it is found, and the sample is predicted to be an outlier when it is not included within the cluster radius.
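The paper gives no code for this step. The following minimal Python sketch is not the authors' implementation — the function names, the basic k-means loop, and all parameter defaults are illustrative assumptions — but it shows how normal-only training data can be summarized as hyperspheres and used for binary outlier labeling:

```python
import numpy as np

def fit_normal_model(train, k=3, n_iter=20, seed=0):
    """Model the normal region as k hyperspheres: run a basic k-means
    on normal-only training data and summarize each cluster by its
    center and radius (farthest distance from the center to a member)."""
    rng = np.random.default_rng(seed)
    centers = train[rng.choice(len(train), size=k, replace=False)].astype(float)
    labels = np.zeros(len(train), dtype=int)
    for _ in range(n_iter):
        # assign every sample to its nearest center, then recompute centers
        dists = np.linalg.norm(train[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = train[labels == j].mean(axis=0)
    radii = np.array([
        np.linalg.norm(train[labels == j] - centers[j], axis=1).max()
        if np.any(labels == j) else 0.0
        for j in range(k)
    ])
    return centers, radii

def predict_outlier(x, centers, radii):
    """Binary label: 1 (outlier) if x lies outside the hypersphere of
    its closest cluster, 0 (normal) otherwise."""
    dists = np.linalg.norm(centers - x, axis=1)
    closest = dists.argmin()
    return int(dists[closest] > radii[closest])
```

An ensemble variant, as described next, would fit one such model per chunk of the training data and label a sample as an outlier only when it falls outside the clusters of every chunk.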
Fig. 2. The flowchart of the clustering-based ensemble outlier detection method.

Clustering-based ensemble models can be constructed by dividing the normal training data into t chunks and performing k-means clustering on each chunk. When a new data sample is outside the cluster area in all chunks, it is ultimately determined to be an outlier. This means that the region of normal data is regarded as the union of all cluster regions. Fig. 2 summarizes the clustering-based ensemble outlier detection method in (Park et al., 2018).

The number of clusters, k, affects how the normal region surrounding the normal data is constructed. Using too small a k value increases the cluster radius, because all the normal data must be included in a small number of big hyperspheres. Therefore, it can increase the false negatives, that is, the prediction of an outlier as normal data. On the other hand, if the value of k is too large, the normal region is set too tight, and the false positive prediction of normal data as an outlier is increased. The noise present in the normal data also affects the clustering result. By using a rather large k value in the k-means clustering, it is possible to efficiently construct a normal region by removing small clusters below size s, which might include noise.

3.2. Anomaly pattern detection on a binary value stream

3.2.1. Hypothesis testing for the difference between two proportions

The data stream is transformed into a stream of binary values, 0 for the prediction as normal data and 1 for outlier prediction, as shown in Fig. 3. We test the difference between the proportions of a reference window and a detection window. Let the samples in the reference window be generated from the binomial distribution with proportion p1 and the samples in the detection window from the binomial distribution with proportion p2. The values of p1 and p2 are unknown. When X and Y represent the numbers of the value 1 in the reference window and the detection window, and the sizes m and n of the two windows are large, by the central limit theorem the distribution of the difference between p̂1 = X/m and p̂2 = Y/n approximately follows a normal distribution as in (1) (Navidi, 2006):

p̂1 − p̂2 ∼ N( p1 − p2 , p1(1 − p1)/m + p2(1 − p2)/n )   (1)

Under the null hypothesis H0: p1 − p2 = 0 and the alternative hypothesis H1: p1 − p2 < 0,

p-value = P(Z < z0)  for  z0 = (p̂1 − p̂2) / √( p̂(1 − p̂)(1/m + 1/n) )   (2)

is computed, where p̂ = (X + Y)/(m + n). For a significance level α, if the p-value is less than α, then H0 is rejected and anomaly pattern detection is signaled. Otherwise, H0 is accepted and the detection window is moved forward for a new hypothesis test. Fig. 3(a) shows the sliding of the reference window and the detection window. The reference window is fixed in the very early period, when an anomaly pattern is not considered to have occurred, but the detection window moves forward for real-time anomaly pattern detection. We call this approach Anomaly Pattern Detection by Hypothesis Testing (APD-HT).

Fig. 3. (a) APD-HT (Anomaly Pattern Detection by Hypothesis Testing) (b) APD-CC (Anomaly Pattern Detection by Control Charts).
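The test of Eqs. (1)–(2) can be sketched in a few lines of Python. This is an illustrative rendering, not the authors' code; it assumes the two windows are 0/1 sequences produced by the first step, and computes the standard normal CDF via the error function:

```python
import math

def apd_ht(ref_window, det_window, alpha=0.01):
    """One-sided two-proportion z-test of Eqs. (1)-(2): signal an anomaly
    pattern when the outlier fraction in the detection window is
    significantly larger than that in the reference window."""
    m, n = len(ref_window), len(det_window)
    x, y = sum(ref_window), sum(det_window)   # numbers of 1s (predicted outliers)
    p1_hat, p2_hat = x / m, y / n
    p_hat = (x + y) / (m + n)                 # pooled proportion
    se = math.sqrt(p_hat * (1 - p_hat) * (1 / m + 1 / n))
    if se == 0.0:                             # both windows all-0 or all-1
        return False
    z0 = (p1_hat - p2_hat) / se
    # p-value = P(Z < z0), i.e. Phi(z0); H0 is rejected in the left tail
    p_value = 0.5 * (1.0 + math.erf(z0 / math.sqrt(2.0)))
    return p_value < alpha
```

In the experiments reported below, both windows have size 400 and alpha is 0.01.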
3.2.2. Using control charts for the fraction defective

Control charts can be used to determine whether a manufacturing process is in or out of control based on the upper and lower control limits (Sheldon, 2004). Let us consider situations where the items produced have quality characteristics that are classified as either defective or nondefective. Suppose that each item produced will independently be defective with probability p when the process is in control. If X denotes the number of defective items in a set of n items, X will be a binomial random variable with E(X) = np and V(X) = np(1 − p), and it approximately follows a normal distribution when np ≥ 10 and n(1 − p) ≥ 10. If F = X/n is the fraction of defective items, then F ∼ N(p, p(1 − p)/n). Hence, when the process is in control, the fraction defective in a set of size n should be between the limits

LCL = p − 3√(p(1 − p)/n),  UCL = p + 3√(p(1 − p)/n)   (3)

To apply a control chart, it is necessary to estimate p. Choosing l sets of size n under the process in control (l ≥ 20), let Fi denote the fraction of defective items in the i-th set. Then the estimate of p is given by F̄ = (F1 + ··· + Fl)/l (Sheldon, 2004).

However, applying control charts to the fraction of outliers in a window of size n has a problem in that outliers occur at a very small rate or almost do not exist in real situations. Hence, the assumption of a normal distribution might produce inaccurate results. To rectify this problem, we apply the arcsine transformation (Mosteller & Youtz, 1961), which is

Y = sin⁻¹(√(X/n)).   (4)

Then the distribution of Y is approximately normal with mean sin⁻¹(√p) and variance 1/(4n). The lower and upper control limits are

LCL = sin⁻¹(√p) − 3√(1/(4n)),  UCL = sin⁻¹(√p) + 3√(1/(4n))   (5)

and sin⁻¹(√p) can be replaced with

Ȳ = ( sin⁻¹(√F1) + ··· + sin⁻¹(√Fl) )/l.   (6)

As shown in Fig. 3(b), we initialize the ensemble of reference windows and the detection window. The l reference windows are constructed by moving the first reference window with stride 1. Let Fi be the fraction of outliers in the i-th reference window and D be the fraction of outliers in the detection window. Ȳ is computed by Eq. (6). If the condition

sin⁻¹(√D) > Ȳ + 3√(1/(4n))   (7)

holds, then an alarm for anomaly pattern detection is signaled. Otherwise, the detection window moves forward by one, and a new reference window is added to the ensemble of reference windows by moving the last reference window forward by one. We call this approach Anomaly Pattern Detection by Control Charts (APD-CC). Fig. 4 shows the flowchart of the proposed anomaly pattern detection method.

Fig. 4. Summarization of the proposed anomaly pattern detection method.

4. Experimental results

4.1. Data description

In order to test the performance of the proposed anomaly pattern detection method, we used six data sets which include real and artificial data. Table 1 describes the details of the data.

Table 1
Data description.

Data        Attributes  Samples  Outliers  Outlier ratio (%)
Creditcard  28          284,807  492       0.17
Kdd-Http    3           567,498  2211      0.39
Annthyroid  6           7200     534       7.42
Shuttle     9           49,097   3511      7.15
Gaussian    1           102,500  2500      2.44
RBFevents   5           100,000  10,201    10.2

Creditcard data is a summary of credit card usage by some card holders in Europe in September 2013.¹ The total number of data samples was 284,807, and the number of outliers was 492. We used 28 attributes, excluding the attribute indicating usage amount. KDD-http data was obtained from the kddcup.data-10-percent-corrected data of the DARPA Intrusion Detection Evaluation Program.² We extracted 567,498 samples whose value of the attribute service was http, and three attributes, duration, src-bytes, and dst-bytes, were used as in (Hawkins, Hongxing, Williams, & Baxter, 2002). The number of outliers indicating network intrusion was 2211. Annthyroid and Shuttle data were downloaded from the UCI machine learning repository.

¹ https://www.kaggle.com/dalpozz/creditcardfraud.
² http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.
Fig. 5. Performance evaluation.

In Annthyroid data, the data samples in two classes, hyperfunction and subnormal functioning, were considered to be outliers, and the remaining class was set as the normal class. In Shuttle data, the data samples of class 1 were treated as normal data, and the data samples in the other classes except class 4 were set as outliers, as in (Liu, Ting, & Zhou, 2008).

Gaussian data is 1-dimensional data where 100,000 normal samples were generated using 5 Gaussian mixture distributions with mean values 0, 1, 2, 3 and 4, and 2500 abnormal data samples were generated from a Gaussian distribution with mean value 6. RBFevents data was constructed using the artificial streaming data generator RandomRBFevents of MOA (Bifet, Holmes, Kirkby, & Pfahringer, 2010), with normal data from five normal distributions and outliers from a random uniform distribution.

4.2. Experiment setup

In order to simulate the occurrence of an anomaly pattern on data streams, outliers were arranged after a sequence of normal data, and the starting point of the outliers was set as the actual anomaly pattern occurrence point. An outlier detection model is constructed using normal data corresponding to 30% of the total data samples. Moving the detection window forward, the anomaly pattern detection process continues on a data sequence of the remaining 70% until the occurrence of the anomaly pattern is predicted. The anomaly pattern detection performance can be evaluated as shown in Fig. 5.

• True Positive (TP): increased by one when anomaly pattern occurrence is predicted after the actual anomaly pattern occurrence point.
• False Positive (FP): increased by one when an anomaly pattern is predicted to have occurred before the actual anomaly pattern occurrence point.
• False Negative (FN): increased by one when anomaly pattern occurrence is not detected after the actual anomaly pattern occurrence point.
• Delay: distance to the anomaly pattern prediction point after the actual anomaly pattern occurrence point.

The experiment is repeated while randomly mixing the data samples in each sequence of the normal data and the outliers. We accumulate TP, FP, FN and compute the average of the delay values.

4.3. Outlier detection performance by the clustering-based ensemble method

In the first step of the proposed anomaly pattern detection method, the clustering-based ensemble outlier detection model needs to be constructed from training data that contains normal data. We used normal data corresponding to 30% of the total data samples as a training set and applied the model for outlier prediction of the remaining 70% test data. TP, FP, and FN were measured for outlier prediction of test data, and Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 · Precision · Recall/(Precision + Recall) were computed.

In the clustering-based ensemble outlier detection, the parameters to be determined are the number of ensembles, t, the number of clusters, k, and the maximum size of a cluster to be deleted, s. In Fig. 6, the average F1 value obtained from 10 repeated experiments is shown for various k and t values. The s value was set as 5, meaning that clusters below size 5 were removed in the clustering model. Overall, if the number of clusters exceeded 30, the F1 value tended to increase. Also, as the number of ensemble members increased, the performance stabilized, while being less affected by the number of clusters. In the experiments for anomaly pattern detection, we used default values of t = 3, k = 30, and s = 5 as in (Park et al., 2018).

4.4. Comparison of anomaly pattern detection performance

In the second step of the proposed method, the size of the reference window and the detection window was set as 400, and the significance level of the hypothesis test was set at 0.01. As a comparison method, we applied Isolation Forest (IF) (Liu et al., 2008), which is a well-known tree-based outlier detection method. The Isolation Forest method builds an Isolation Tree by recursively repeating the random selection of an attribute and a splitting value until all instances are isolated. The length of the path that a data sample traverses from the root node to an external node is used to compute an outlier score. For anomaly pattern detection, the Wilcoxon rank sum test was performed on a stream of outlier scores generated by IF. The Wilcoxon rank sum test is a nonparametric test that can be used to determine whether two groups of samples were selected from populations having the same distribution. Tests were performed with a significance level of 0.01, and the size of both the reference window and the detection window was set to 400.

Table 2 shows the performance of the compared methods. TP, FP, and FN were obtained by repeating the experiment 100 times. In all cases, FN was 0, hence we do not show it in the table. Delay is the average of the delay values, which were measured only for true positive detection. In the fifth column, the F1 value obtained by the clustering-based ensemble outlier detection method is shown. Overall, the APD-HT method showed the best performance. APD-HT detected anomaly pattern occurrence 94 times among 100 experiments on average, while successful detection by the Wilcoxon rank sum test was 49.8 times. The low performance of the Wilcoxon rank sum test indicates that it can be difficult to detect changes in the outlier score distribution. For the Creditcard and Annthyroid data, the anomaly pattern detection performance was quite high even though the outlier detection performance was not. This shows that the proposed method is not sensitive to the prediction performance of the outlier detection method, because it detects the point where the prediction tendency changes.
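For reference, the APD-CC alarm condition evaluated in these experiments (Section 3.2.2, Eqs. (4)–(7)) can be sketched as follows; the function name and input conventions are illustrative assumptions, with the outlier fractions assumed to be precomputed per window:

```python
import math

def apd_cc(ref_fractions, det_fraction, n):
    """Arcsine control-chart rule of Eqs. (4)-(7): alarm when the
    transformed outlier fraction of the detection window exceeds the
    upper control limit built from the l reference windows of size n."""
    l = len(ref_fractions)
    y_bar = sum(math.asin(math.sqrt(f)) for f in ref_fractions) / l  # Eq. (6)
    ucl = y_bar + 3.0 * math.sqrt(1.0 / (4.0 * n))                   # Eq. (5) upper limit
    return math.asin(math.sqrt(det_fraction)) > ucl                  # Eq. (7)
```

In the experiments above, n = 400 and the ensemble of reference windows is updated by stride-1 sliding, as described in Section 3.2.2.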
Fig. 6. A comparison of the F1-values with respect to the k value in k-means clustering and the number t of ensemble members (Park et al., 2018).

4.5. Comparison of anomaly pattern detection performance in noisy environments

We also tested anomaly pattern detection performance in noisy environments. This was simulated by mixing a small number of outliers into a sequence of normal data samples. We inserted 10% outliers randomly into the normal data sequence, and the same number of normal data samples was mixed into the outlier data sequence. Table 3 compares the performance of anomaly pattern detection. Although the performance of APD-HT in noisy environments is lower than that in Table 2, it is still much higher than the performance of the Wilcoxon rank sum test. APD-HT detected anomaly pattern occurrence 78.8 times among 100 experiments, while successful detection by the Wilcoxon rank sum test was 48.8 times. APD-CC showed a true positive detection ratio similar to the Wilcoxon rank sum test, but the delay distance of APD-CC was shorter than that of the Wilcoxon rank sum test.

5. Discussions

In this paper, we proposed a method to detect anomaly pattern occurrence in streaming data. It detects the occurrence of an anomaly pattern in a stream of binary labels of normal or an outlier, which is predicted by the clustering-based ensemble outlier detection method.
Table 2
Comparison of anomaly pattern detection performance. Delay denotes the average delay distance, which was measured in the cases of true positive detection.

            Wilcoxon rank sum    out. det.   APD-HT             APD-CC
Data        TP    FP    delay    F1          TP   FP   delay    TP    FP    delay
Creditcard  24    76    53.7     0.56        100  0    13.2     25    75    6.2
Kdd-Http    36    64    48.4     0.98        100  0    27.2     87    13    17.9
Annthyroid  84    16    50.1     0.29        92   8    72.7     80    20    61.8
Shuttle     60    40    45.7     0.96        72   28   7.6      50    50    0.5
Gaussian    52    48    45.2     0.93        100  0    5.9      83    17    2.3
RBFevents   43    57    45.1     0.99        100  0    5.0      99    1     1.9
mean        49.8  50.2  48.0     0.78        94   6    21.9     70.7  29.3  15.1

Table 3
Comparison of anomaly pattern detection performance in noisy environments.

            Wilcoxon rank sum    APD-HT              APD-CC
Data        TP    FP    delay    TP    FP    delay   TP    FP    delay
Creditcard  37    63    59.4     100   0     14.9    40    60    7.8
Kdd-Http    29    71    49.9     100   0     30.5    35    65    40.2
Annthyroid  78    22    54.2     91    9     83.3    84    16    70.8
Shuttle     58    42    51.0     59    41    12.7    57    43    9.0
Gaussian    50    50    52.2     72    28    9.3     32    68    6.2
RBFevents   41    59    52.2     51    49    15.9    37    63    11.1
mean        48.8  51.2  53.2     78.8  21.2  27.8    47.5  52.5  24.2

Two approaches, APD-HT based on hypothesis tests for the difference between two proportions and APD-CC based on control charts for the fraction defective with the arcsine transformation, were proposed. The experimental results showed that the proposed method had competent detection performance. APD-HT obtained a successful detection rate of 94% on average in the simulated experiments. It also detected anomaly pattern occurrence 78.8 times among 100 experiments in a simulated noisy environment. The true positive detection rate by APD-HT was 1.6 to 2 times higher than that of the comparison method using the Wilcoxon rank sum test, while the delay distance was shorter than that by the Wilcoxon rank sum test. The experimental results on some data sets in Table 2 show that anomaly pattern occurrence can be detected reliably even when outlier detection performance is relatively low.

Most research focuses on outlier detection methods, and there has not been much work on anomaly pattern detection on a data stream. Existing methods for anomaly pattern detection work in limited situations such as a one-dimensional data stream or simple rule-based detection. On the other hand, the proposed method can be applied reliably in a general streaming environment. It utilizes an outlier detection method which directly gives a binary output of outlier or normal. Hence, the usually difficult process of setting a threshold on outlier scores for outlier prediction is not needed. However, as future work, we plan to extend the proposed anomaly pattern detection method to operate on a real-valued stream of outlier scores. In that way, well-known outlier detection methods which compute outlier scores can be used.

The proposed method is based on the assumption that a training set consisting of normal data samples is given. The normal training data is used to build a clustering model of normal data and an initial reference window. Since it is easy to collect normal data compared to real outliers, this assumption makes it possible to apply the proposed method in situations where it is difficult to obtain real outliers. Nevertheless, manual labeling of normal data requires some effort, so extending the proposed method to remove that assumption will widen the application of anomaly pattern detection.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

CRediT authorship contribution statement

Taegong Kim: Conceptualization, Data curation, Formal analysis, Methodology, Resources, Software, Validation, Writing - original draft. Cheong Hee Park: Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Supervision, Writing - review & editing.

Acknowledgments

This research was supported in part by Korea Electric Power Corporation (grant number: R18XA05) and in part by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (no. NRF-2019R1F1A1062341).

References

Aggarwal, C. (2017). Outlier analysis. Springer.
Ahmad, S., & Purdy, S. (2016). Real-time anomaly detection for streaming analytics. Retrieved from https://arxiv.org/pdf/1607.02480.pdf.
Bifet, A., Holmes, G., Kirkby, R., & Pfahringer, B. (2010). MOA: Massive online analysis. Journal of Machine Learning Research, 11, 1601–1604.
Das, K., Schneider, J., & Neil, D. (2008). Anomaly pattern detection in categorical datasets. In Proceedings of KDD.
Domingues, R., Filippone, M., Michiardi, P., & Zouaoui, J. (2018). A comparative evaluation of outlier detection algorithms: Experiments and analyses. Pattern Recognition, 74, 406–421.
Haque, A., Khan, L., & Baron, M. (2016). SAND: Semi-supervised adaptive novel class detection and classification over data stream. In Proceedings of the AAAI.
Hawkins, D. (1980). Identification of outliers. Springer Netherlands.
Hawkins, S., Hongxing, H., Williams, G., & Baxter, R. (2002). Outlier detection using replicator neural networks. In Proceedings of the international conference on data warehousing and knowledge discovery.
Kim, Y., & Park, C. (2017). An efficient concept drift detection method for streaming data under limited labeling. IEICE Transactions on Information and Systems, E100-D(10), 2537–2546.
Lindstrom, P., Namee, B., & Delany, S. (2011). Drift detection using uncertainty distribution divergence. In Proceedings of the IEEE 11th int. conf. data mining workshops.
Liu, F., Ting, K., & Zhou, Z. (2008). Isolation forest. In Proceedings of the 8th international conference on data mining.
Mosteller, F., & Youtz, C. (1961). Tables of the Freeman–Tukey transformations for the binomial and Poisson distributions. Biometrika, 48(3–4), 433–440.
Navidi, W. (2006). Statistics for engineers and scientists. McGraw-Hill.
Padilla, D., Brinkworth, R., & McDonnell, M. (2013). Performance of a hierarchical temporal memory network in noisy sequence learning. In Proceedings of IEEE international conference on computational intelligence and cybernetics.
Park, C. (2019). Outlier and anomaly pattern detection on data streams. Journal of Supercomputing, 75(9), 6118–6128.
Park, C., Kim, T., Kim, J., Choi, S., & Lee, G. (2018). Outlier detection by clustering-based ensemble model construction. KIPS Transactions on Software and Data Engineering, 7(11), 435–442.
Sethi, T., & Kantardzic, M. (2015). Don't pay for validation: Detecting drifts from unlabeled data using margin density. In Proceedings of the INNS conference on big data.
Sheldon, S. (2004). Introduction to probability and statistics for engineers and scientists. Elsevier Academic Press.
Spinosa, E., Carvalho, A., & Gama, J. (2007). OLINDDA: A cluster-based approach for detecting novelty and concept drift in data streams. In Proceedings of the SAC.
Spinosa, E., Carvalho, A., & Gama, J. (2008). Cluster-based novel concept detection in data streams applied to intrusion detection in computer networks. In Proceedings of the SAC.
Wong, W., Moore, A., Cooper, G., & Wagner, M. (2002). Rule-based anomaly pattern detection for detecting disease outbreaks. In Proceedings of the 18th international conference on artificial intelligence.
Zliobaite, I. (2010). Change with delayed labeling: When is it detectable? In Proceedings of the IEEE international conference on data mining workshops.
