All content following this page was uploaded by Gobinath Loganathan on 14 December 2018.
Abstract—Advanced intrusion detection systems are beginning to utilize the power and flexibility offered by Complex Event Processing (CEP) engines. Adapting to new attacks and optimizing CEP rules are two challenges in this domain. Optimizing CEP rules requires a complete framework which can be ported to stream processors because a CEP rule cannot run without a stream processor. External dependencies of stream processors make a CEP rule a black box which is hard to optimize. In this paper, we present a novel adaptive and functionally auto-scaling stream processor: “Wisdom” with a built-in hybrid optimizer developed using Particle Swarm Optimization and Bisection algorithms to optimize CEP rule parameters. We show that an adaptive “Wisdom” rule tuned by the proposed optimization algorithm is able to detect selected attacks in the CICIDS 2017 dataset with an average precision of 99.98% and an average recall of 93.42% while processing over 2.5 million events per second. The proposed distributed functionally auto-scaling deployment mode consumes significantly fewer system resources than the monolithic deployment of CEP rules.
I. INTRODUCTION
Complex Event Processing (CEP) is a reactive programming paradigm used in responding to real-time events based on predefined rules. Stream processors provide the necessary infrastructure to develop and deploy CEP rules for a wide range of applications including intrusion detection [1], healthcare [2], fleet management [3], and power grid [4]. In all these domains, constructing a CEP rule often requires a domain expert who knows how to mine complex events out of a stream of raw events. Some recent studies have proposed automatic CEP rule generation using unsupervised machine learning algorithms to replace domain experts with machines [5]–[8]. Machine learning algorithms require a lot of preprocessed data and training time. Moreover, the proposed solutions are based on frequent patterns which are not useful in anomaly detection scenarios like intrusion detection. Instead, the traditional way of defining CEP rules based on human cognition with the support of domain-specific facts is easier than mining rules from training data. Even though humans are experts in logical reasoning, we are poor at handling numbers. For example, it is easy for a domain expert to say that “a CEP rule to detect FTP brute force attack must look for a lot of failed login attempts within a short period of time”, but he/she requires manual inspection of training data to quantify the number of failed attempts and the shortest time interval to consider. Analyzing training data to find those threshold values is a tedious and time-consuming task for humans. In a volatile domain like intrusion detection, such threshold values are subject to change frequently. Therefore, an optimization algorithm to find optimal threshold values for a CEP rule can relieve domain experts from spending time in analyzing training data. Furthermore, such an algorithm can be used at runtime to continuously optimize CEP rules based on the dynamicity of the environment.
Optimizing CEP rules using available optimization algorithms requires dynamic stream processors (stream processors which allow runtime query modification) to try different threshold values without restarting the stream processor. Even though Esper and Apache Flink support CEP variables, they do not provide a built-in mechanism to optimize queries defined using variables [9], [10]. Abadi et al. proposed the first fully dynamic stream processor: “Borealis” years ago [11]. However, none of the existing commercial stream processors offer dynamic operators comparable to “Borealis” [9]–[12]. These stream processors focus on performance optimization and scalability rather than query optimization. Even though stream processors can distribute and scale operators, not all stateful operators are horizontally scalable [13]. Especially when it comes to dynamic CEP operators, it is hard to track and atomically update them in a horizontally scaled environment. Another dynamic complex event processor: iCEP does not reach the throughput benchmark set by commercial stream processors [6]. We believe these limitations are keeping commercial stream processors away from dynamic and adaptive CEP.
In this paper, we propose a new stream processor: “Wisdom” developed with the following features: (1) dynamic without compromising the performance, (2) adaptive using Particle Swarm Optimization and Bisection algorithms, (3) distributed and functionally auto-scaling as an alternative to horizontal scaling. We use the term “functionally auto-scaling” to mean the ability of the “Wisdom” stream processor to start new rules to add more features or to stop unwanted rules to reduce resource consumption. We tested our stream processor using three CEP rules defined by domain experts and optimized by “Wisdom” using packets that arrived within a 10-minute interval to detect FTP brute force attack, HTTP Slow Header Denial of Service (DoS) attack, and Port Scan probe. “Wisdom” was able to detect the selected attacks with an average precision
of 99.98% and an average recall of 93.42%, which is better than the maximum precision of 80% and maximum recall of 90% obtained by Turchin et al. after training their adaptive CEP rule using the entire dataset. “Wisdom” is able to process 2.5 million events per second in a single thread environment, which is significantly better than the throughput of iCEP and comparable with commercial stream processors. We also show that the proposed functionally auto-scaling deployment consumes fewer system resources compared to traditional monolithic deployment without compromising the accuracy.
II. BACKGROUND
A. Dynamic Complex Event Processor
In a dynamically changing environment, static CEP rules become obsolete very soon. Redeploying new rules for every change in the environment reduces the uptime of the system. Moreover, dynamicity is one of the primary requirements for adaptive complex event processing so that the system can automatically adjust the flow of events without redeploying the rule. The Borealis stream processing engine developed by Abadi et al. supports dynamic query modification and performance optimization at runtime using separate control flows [11]. A similar approach is used by Bhargavi et al. in their dynamic complex event processor to deploy CEP rules without restarting the stream processor [14].
B. Complex Event Processing Rule Mining
Mousheimish et al. proposed automatic predictive CEP rule mining from classified multivariate time series data [5]. The learning algorithm first searches for subsequences across a time series input. The length of possible subsequences is limited by user-defined lower and upper bounds. A CEP rule is built using the subsequence with the highest accuracy after removing redundant parts from the sequence. However, this approach is limited by user-defined sequence lengths and limited CEP rule templates which are not guaranteed to fit all use cases.
Margara et al. developed iCEP which can generate expressive CEP rules using time window, selector, logical operator, pattern, and aggregator [6]. iCEP learns interesting events and time frame followed by aggregators and filters, and finally parameters and sequences in an independent three-phase pipeline. In this approach, errors made in an early stage of the pipeline can propagate and affect the following learners. For example, if the time window learner fails to capture all necessary events, the sequence learner cannot learn a meaningful sequence at the end of the pipeline. Isolated learning phases of iCEP fail to address the correlation between CEP operators. Therefore, the rule generated by iCEP may not perform well in a highly correlated domain.
CEP rule mining based on similarity match was proposed by Lee et al. [7]. In this work, the authors cluster event sequences, extract a complex event based on similarity across sequences from the same cluster and finally generate a complex event pattern using a Markov Transition model. Their clustering algorithm calculates the distance between two sequences based on the cosine similarity between individual events. The attribute to calculate cosine similarity is determined by domain experts.
Mehdiyev et al. used Elitist Pareto-based Multi-Objective Evolutionary Algorithm to select event attributes and Fuzzy Unordered Rule Induction Algorithm to classify events [8]. In this research, the authors compared their algorithm with other classification algorithms. However, they did not propose how to convert the output of their algorithm to a CEP rule and admit that generating CEP rules using their classifier will be a difficult challenge to address.
All of the above CEP rule mining methodologies were developed with the intention of replacing domain experts with machines [5]–[8]. However, they rely on false assumptions like raw events not being complex, TimeWindow being enough to collect events in all scenarios, or a single CEP rule template being able to represent all complex events. These assumptions oversimplify the problem and do not capture the real-world requirements. Furthermore, these solutions mainly focus on generating rules for commonly occurring patterns. In anomaly-driven domains like intrusion detection, such patterns represent legitimate traffic in which we are not interested. Hence, rules developed for frequent patterns may not work well for detecting anomalous traffic.
C. Parameter Tuning
Turchin et al. defined CEP rules based on the probability score of selected attributes and tuned threshold values using Discrete Kalman Filter based on expert feedback and event history [15]. The concept of tuning rule parameters and the application of adaptive rules to detect attacks in the DARPA 1999 dataset are close to our research. Therefore, we have chosen this research as a benchmark to compare the results we obtain. However, their contribution to CEP rule optimization may not be widely applicable because their rules neither use any CEP operators nor follow CEP semantics. Instead, they calculate the anomalous probability score of request length, response length, possible “SYN” error, and hostname for each packet. A packet is classified either as an anomaly or not by comparing the total score of these four attributes with two threshold values. Therefore, this rule does not address any problems we raised in Section III-A3.
Bayesian Optimization is widely used by researchers for hyperparameter optimization and black-box optimization [16]. In this method, an unknown objective function is mapped into a prior belief and sequentially refined by a Bayesian posterior update [16]. Snoek et al. used Bayesian Optimization to tune machine learning hyperparameters [17]. It is also used by Pooyan et al. to optimize the performance of the Apache Storm stream processor [18]. Among the population-based optimization algorithms, Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) are widely used for hyperparameter tuning [19], [20]. GA and PSO optimized a selected set of problems with equal accuracy in a test conducted by Hassan et al. [21]. Though GA has been successfully applied to optimization problems, it is inefficient for applications with highly correlated parameters [20]. In
addition, GA is much more complex to implement than PSO. Therefore, we have chosen Bayesian Optimization and PSO for our experiment. Hosseini et al. used the PSO algorithm to optimize the Multiple Criteria Linear Programming (MCLP) algorithm used to detect DoS attacks in the KDD CUP 1999 dataset [22]. Even though the use case (intrusion detection) and optimization algorithm are the same as our research, the actual contribution of Hosseini et al. is optimizing MCLP in which the problem is already in an optimizable format with variables. We are focusing on optimizing CEP rules which are black-box functions and require additional steps to convert them into optimizable problems. Furthermore, Hosseini et al. used MCLP for anomaly-based detection in the KDD CUP 1999 dataset and we use CEP for signature-based detection in the CICIDS 2017 [23] dataset. Therefore, we compare our final results with Turchin et al. because both of us are solving the CEP rule optimization problem.
III. METHODOLOGY
A. Wisdom Architecture
We developed an adaptive and functionally auto-scaling stream processor: “Wisdom” with the following goals: (1) dynamic without compromising performance, (2) functionally auto-scaling, and (3) optimizable CEP rule.
1) Dynamicity and Performance: In our early attempts, we tried to modify existing open source stream processors: Apache Flink [10] and WSO2 Siddhi [12] to make them dynamic. The underlying static data structures used to represent events in these stream processors are designed for high throughput and low latency and did not allow us to implement dynamic operators. The authors of the iCEP dynamic complex event processor claim that their complex event processor can analyze “thousands of events in a few minutes” [6]. Our underlying architecture using the Observer design pattern and the Mediator design pattern [24] to implement variables and dynamic CEP operators yields performance comparable to commercial stream processors as given in Table I and significantly better performance than iCEP.
TABLE I
PERFORMANCE COMPARISON OF “WISDOM” WITH COMMERCIAL [caption partly lost in extraction] ENVIRONMENT
[table body not recovered]
2) Functionally Auto-scaling: In intrusion detection systems, some attack detectors may need more resources than others. For example, a DoS attack detector may need more system resources than an SQL attack detector due to the large amount of traffic involved in a DoS attack. Distributing and scaling a stream processor at the operator level can cause coordination problems in CEP operators depending on the order of events. Therefore, we have designed “Wisdom” using microservice architecture [25] to deploy each CEP rule as a microservice with required memory and CPU allocation. Each “Wisdom” instance can be controlled via exposed RESTful [26] admin service endpoints. We also have developed a service named “Wisdom Manager” to start, stop and control “Wisdom” instances automatically. Using the @app annotation, “Wisdom Manager” can be informed to start a query only if specific streams receive events in the system. Though “Wisdom” can be used as a Java library, for functionally auto-scaling, we recommend distributed deployment of “Wisdom” instances with Apache Kafka [27] for intermediate communication.
3) Optimizable CEP Rule: Threshold values in a CEP rule can be an integer, a real number or a constant. Considering all possible constants as a list of candidates, they can be mapped into integer values. These numbers may or may not have lower and upper bounds. For example, the minimum number of packets threshold in Figure 3 has a lower bound of 0 because it is a count, but it has no upper bound. However, these parameters are correlated with each other in such a way that they cannot take all possible values in the space. According to these facts, a CEP rule optimization problem can be defined as
$$\max/\min\; f(x_1, x_2, x_3, \ldots, x_n) \quad \text{s.t.} \quad AX \le B, \quad x \in \mathbb{R}$$
where A is a rational matrix and B is a rational vector.
A CEP rule is a discontinuous function which takes streams of events as input and optionally generates complex events as output. Therefore, it is hard to fit a CEP rule itself in an optimization problem. Instead, f is a continuous profit or loss function defined using the output of a CEP rule in such a way that optimizing f will optimize the CEP rule. This way, optimizing a CEP rule can be defined as a Mixed Integer Linear Programming (MILP) problem if f is linear or a Mixed Integer Non-Linear Programming (MINLP) problem if f is non-linear. Both MILP and MINLP are NP-Hard problems, so finding a solution in polynomial time is not always feasible [28], [29].
The profit or loss function f is a black box of correlated variables because its output depends on the underlying CEP
TABLE II
COMPARISON OF BAYESIAN AND PARTICLE SWARM OPTIMIZATION ALGORITHMS USING THE PROFIT FUNCTION SHOWN IN FIGURE 1
[table body not recovered]
def stream PacketStream;
def stream FTPBruteForceAttackStream;
[remainder of the query not recovered]
Fig. 3. Optimizable Wisdom query to detect FTP brute force attack
[algorithm fragment; earlier steps not recovered]
3: val ← Bisection(function, val, constraints[val], step[val])
4: end for
5: return optimal values
[diagram labels: Parameters, Total Loss, Optimization Algorithm]
Fig. 4. Wisdom Optimizer architecture in which the Input Feeder and Loss Function must be defined by the user
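The surviving step "val ← Bisection(function, val, constraints[val], step[val])" suggests that each parameter is refined in turn within its own bounds and resolution. The Python sketch below illustrates one way such a per-parameter refinement could look. It is not the authors' algorithm: the names `bisect_minimize` and `refine_parameters`, the finite-difference slope test, and the assumption that the loss is unimodal in each parameter are all illustrative choices that do not come from the paper.

```python
from typing import Callable, Dict, Tuple

Params = Dict[str, float]


def bisect_minimize(loss_1d: Callable[[float], float],
                    lo: float, hi: float, step: float) -> float:
    """Shrink [lo, hi] until it is no wider than `step`.

    Assumption (not from the paper): the loss is unimodal on [lo, hi],
    so the sign of a finite-difference slope tells us which half holds
    the minimum.
    """
    while hi - lo > step:
        mid = (lo + hi) / 2.0
        # Compare the loss slightly right and slightly left of the midpoint.
        if loss_1d(mid + step / 2.0) < loss_1d(mid - step / 2.0):
            lo = mid  # loss still decreasing: minimum is in the right half
        else:
            hi = mid  # loss increasing: minimum is in the left half
    return (lo + hi) / 2.0


def refine_parameters(loss: Callable[[Params], float],
                      start: Params,
                      constraints: Dict[str, Tuple[float, float]],
                      step: Dict[str, float]) -> Params:
    """Refine each parameter in turn while holding the others fixed."""
    values = dict(start)
    for name in list(values):
        lo, hi = constraints[name]

        def loss_1d(x: float, name: str = name) -> float:
            trial = dict(values)
            trial[name] = x
            return loss(trial)

        values[name] = bisect_minimize(loss_1d, lo, hi, step[name])
    return values


# Hypothetical usage with a toy loss whose minimum is at (3, 7).
def toy_loss(p: Params) -> float:
    return (p["min_packets"] - 3.0) ** 2 + (p["time_window"] - 7.0) ** 2


result = refine_parameters(
    toy_loss,
    start={"min_packets": 0.0, "time_window": 0.0},
    constraints={"min_packets": (0.0, 10.0), "time_window": (0.0, 10.0)},
    step={"min_packets": 0.01, "time_window": 0.01},
)
```

A real CEP loss is computed by replaying events through the rule, so it is unlikely to be unimodal everywhere; a local refinement of this kind is only useful when started from a good candidate, which is presumably why the paper pairs Bisection with Particle Swarm Optimization.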
[algorithm fragment; earlier steps not recovered]
7: end if
8: end for
9: return loss
[deployment diagram labels: Wisdom Manager (Start/Stop), Filter Query, FTP Brute-force Detector, Port Scan Detector, HTTP Slow Header Detector, PossibleBruteForceStream, PossiblePortScanStream, PossibleDosStream, Streams, Throughput, Apache Kafka]
Fig. 7. Memory consumption of “Wisdom” instances in (a) manual deployment and (b) functionally auto-scaling deployment
[plot axes: Memory (MB) against Wisdom Instances; (a) HTTP Slow Header, FTP Brute Force, Port Scan; (b) HTTP Slow Header, Packet Filter, FTP Brute Force, Port Scan]
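The functionally auto-scaling behaviour described in Section III-A2, where “Wisdom Manager” starts a query only if specific streams receive events and stops unwanted rules to save resources, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the actual “Wisdom Manager”: the classes `RuleSpec` and `Manager`, the idle timeout used to decide that a rule is unwanted, and the mapping of the stream names from the deployment diagram to detectors are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RuleSpec:
    """A detector rule and the streams whose events should start it (hypothetical)."""
    name: str
    trigger_streams: Set[str]


@dataclass
class Manager:
    """Toy stand-in for a manager that starts and stops rule instances."""
    specs: List[RuleSpec]
    idle_limit: float = 60.0  # assumed policy: stop a rule after this many idle seconds
    running: Dict[str, float] = field(default_factory=dict)  # rule name -> last trigger time

    def on_event(self, stream: str, now: float) -> List[str]:
        """Record an event on `stream`; return the rules started because of it."""
        started = []
        for spec in self.specs:
            if stream in spec.trigger_streams:
                if spec.name not in self.running:
                    started.append(spec.name)
                self.running[spec.name] = now
        return started

    def reap_idle(self, now: float) -> List[str]:
        """Stop rules whose trigger streams have been silent for too long."""
        stopped = [name for name, last in self.running.items()
                   if now - last > self.idle_limit]
        for name in stopped:
            del self.running[name]
        return stopped


# Hypothetical wiring; the stream-to-detector mapping is an assumption.
manager = Manager(specs=[
    RuleSpec("ftp_brute_force_detector", {"PossibleBruteForceStream"}),
    RuleSpec("port_scan_detector", {"PossiblePortScanStream"}),
    RuleSpec("http_slow_header_detector", {"PossibleDosStream"}),
])

first = manager.on_event("PossibleBruteForceStream", now=0.0)
second = manager.on_event("PossibleBruteForceStream", now=10.0)
kept = manager.reap_idle(now=30.0)
stopped = manager.reap_idle(now=100.0)
```

In this sketch a lightweight filter would publish to the "Possible..." streams, so the heavier detectors consume memory only while suspicious traffic is present. In a real deployment the start and stop actions would be calls to each instance's admin endpoints rather than dictionary updates.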