A Survey of Outlier Detection Methods in Network Anomaly Identiﬁcation
Prasanta Gogoi ^{1} , D K Bhattacharyya ^{1} , B Borah ^{1} and Jugal K Kalita ^{2}
^{1} Department of Computer Science and Engineering, Tezpur University, Napaam Tezpur, India 784028 ^{2} Department of Computer Science, College of Engineering and Applied Science University of Colorado, Colorado Springs Email: {prasant, dkb, bgb}@tezu.ernet.in kalita@eas.uccs.edu
The detection of outliers has gained considerable interest in data mining with the realization that outliers can be the key discovery to be made from very large databases. Outliers arise due to various reasons such as mechanical faults, changes in system behavior, fraudulent behavior, human error and instrument error. Indeed, for many applications the discovery of outliers leads to more interesting and useful results than the discovery of inliers. Detection of outliers can lead to identification of system faults, so that administrators can take preventive measures before the faults escalate. It is possible that anomaly detection may enable detection of new attacks. Outlier detection is an important anomaly detection approach. In this paper, we present a comprehensive survey of well-known distance-based, density-based and other techniques for outlier detection and compare them. We provide definitions of outliers and discuss their detection based on supervised and unsupervised learning in the context of network anomaly detection.
Keywords: Anomaly; Outlier; NIDS; Density-based; Distance-based; Unsupervised
Received 27 September 2010; revised 9 February 2011
1. INTRODUCTION
Outlier detection refers to the problem of finding patterns in data that are very different from the rest of the data based on appropriate metrics. Such a pattern often contains useful information regarding abnormal behavior of the system described by the data. These anomalous patterns are usually called outliers, noise, anomalies, exceptions, faults, defects, errors, damage, surprise, novelty or peculiarities in different application domains. Outlier detection is a widely researched problem and finds immense use in application domains such as credit card fraud detection, fraudulent usage of mobile phones, unauthorized access in computer networks, abnormal running conditions in aircraft engine rotation, abnormal flow problems in pipelines, military surveillance for enemy activities and many other areas. Outlier detection is important due to the fact that outliers can carry significant information. Outliers can be candidates for aberrant data that may affect systems adversely, such as by producing incorrect results, misspecification of models, and biased estimation of parameters. It is therefore important to identify them prior to modelling and analysis [1]. Outliers in data translate to significant (and often critical) information in a large variety of application domains. For example, an anomalous traffic pattern in a computer network could mean that a hacked computer is sending out sensitive data to an unauthorized destination. In tasks such as credit card usage monitoring or mobile phone monitoring, a sudden change in usage pattern may indicate fraudulent usage such as stolen cards or stolen phone airtime. In public health data, outlier detection techniques are widely used to detect anomalous patterns in patient medical records, possibly indicating symptoms of a new disease. Outliers can also help discover critical entities, as in military surveillance, where the presence of an unusual region in a satellite image of an enemy area could indicate enemy troop movement. In many safety-critical environments, the presence of an outlier indicates abnormal running conditions from which significant performance degradation may result, such as an aircraft engine rotation defect or a flow problem in a pipeline.
An outlier detection algorithm may need access to certain information to work. A labelled training
The Computer Journal, Vol. ??, No. ??, ????
data set is one such piece of information that can be used with techniques from machine learning [2] and statistical learning theory [3]. A training data set is required by techniques which build an explicit predictive model. The labels associated with a data instance denote whether that instance is normal or an outlier. Based on the extent to which these labels are available or utilized, outlier detection techniques can be either supervised or unsupervised. Supervised outlier detection techniques assume the availability of a training data set which has labelled instances for the normal as well as the outlier class. In such techniques, predictive models are built for both normal and outlier classes. Any unseen data instance is compared against the two models to determine which class it belongs to. An unsupervised outlier detection technique makes no assumption about the availability of labelled training data. Thus, these techniques are more widely applicable. The techniques in this class make other assumptions about the data. For example, parametric statistical techniques assume a parametric distribution for one or both classes of instances. Several techniques make the basic assumption that normal instances are far more frequent than outliers. Thus a frequently occurring pattern is typically considered normal while a rare occurrence is an outlier.

Outlier detection is of interest in many practical applications. For example, an unusual flow of network packets, revealed by analysing system logs, may be classified as an outlier, because it may be a virus attack [4] or an attempt at an intrusion. Another example is automatic systems for preventing fraudulent use of credit cards. These systems detect unusual transactions and may block such transactions in early stages, preventing large losses. The problem of outlier detection typically arises in the context of very high dimensional data sets. However, much of the recent work on finding outliers uses methods which make implicit assumptions regarding relatively low dimensionality of the data. A specific point to note in outlier detection is that the great majority of objects analysed are not outliers. Moreover, in many cases, it is not known a priori which objects are outliers.
1.1. Outlier Detection in Anomaly Detection
The anomaly detection problem is similar to the problem of finding outliers, specifically in network intrusion detection. Intrusion detection is a part of a security management system for computers and networks. An intrusion [5] is a set of actions that aim to compromise computer security goals such as confidentiality, integrity and availability of resources. Traditional technologies such as firewalls are used to build a manual, passive defence system against attacks. An Intrusion Detection System (IDS) is usually used to enhance the network security of enterprises by monitoring and analysing network data packets. Intrusion detection is a system's "second line of defence" [6]. IDSs play a vital role in network security. Network intrusion detection systems (NIDSs) can detect attacks by observing network activities. Intrusion detection techniques are used primarily for misuse detection and anomaly detection. Misuse-based detection involves an attempt to define a set of rules (also called signatures) that can be used to decide that a given behavior is that of an intruder. For example, Snort [7] is a misuse-based NIDS. The other approach, anomaly detection, involves the collection of data relating to the behavior of legitimate users over a period of time, and then applying tests to the gathered data to determine whether that behavior is legitimate user behavior or not. Anomaly detection has the advantage that it can detect new attacks that the system has never seen before, as they deviate from normal behavior. ADAM [8] is a well-known anomaly detection NIDS. The key challenge for outlier detection in this domain is the huge volume of data. Outlier detection schemes need to be computationally efficient to handle these large sized inputs. An outlier can be an observation that is distinctly different or is at a position of abnormal distance from other values in the dataset. Detection of abnormal behavior can be based on features extracted from traces such as a network trace or a system call trace [9]. An intrusion can be detected by finding an outlier whose features are distinctly different from the rest of the data. Outliers can often be individuals or groups of clients exhibiting behavior outside the range of what is considered normal. In order to apply outlier detection to anomaly-based network intrusion detection, it is assumed [10] that:
1. The majority of the network connections are normal traffic. Only a small amount of traffic is malicious.
2. Attack traffic is statistically different from normal traffic.

However, in a real-world network scenario, these assumptions may not always be true. For example, when dealing with DDoS (distributed denial of service) [11] or bursty attack [12] detection in computer networks, the anomalous traffic is actually more frequent than the normal traffic.
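The two assumptions above suggest a simple unsupervised scheme: score each connection by how far it lies from the bulk of the traffic and flag the highest-scoring fraction. The sketch below is illustrative only; the one-dimensional feature values and the 75th-percentile cutoff are invented. It also shows why the caveat matters: if attack traffic dominated the sample, the same scoring would flag the normal minority instead.

```python
def mean_distance_scores(points):
    """Score each point by its mean distance to all the other points."""
    scores = []
    for i, p in enumerate(points):
        others = [q for j, q in enumerate(points) if j != i]
        scores.append(sum(abs(p - q) for q in others) / len(others))
    return scores

# Hypothetical 1-D feature (e.g. packets/second per connection):
# mostly normal traffic plus two high-rate connections.
traffic = [10, 11, 9, 10, 12, 11, 10, 95, 98]
scores = mean_distance_scores(traffic)
threshold = sorted(scores)[int(0.75 * len(scores))]  # flag the top quarter
flags = [s > threshold for s in scores]
print(flags)  # only the two high-rate connections are flagged
```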
1.2. Contribution of The Paper
Outlier detection methods have been used for numerous applications in various domains. A lot of these techniques have been developed to solve focused problems in a particular application domain, while others have been developed in a more generic fashion. Outlier detection approaches found in literature [13, 14, 15, 16] have varying scopes and abilities. The selection of an approach for detection of outlier(s) depends on the domain of application, type
of data (e.g., numeric, categorical or mixed) and the availability of labeled data. Adequate knowledge of existing approaches to outlier detection is therefore essential when selecting an appropriate method for a specific domain. In this paper, we aim to provide a comprehensive, up-to-date survey of outlier detection methods and of approaches to network anomaly identification that use outlier detection. In particular, this paper contributes to the literature on outlier detection in the following ways.
• We have found general surveys on outlier detection such as [13, 14, 15, 16, 17, 18] and surveys on network anomaly detection such as [19, 20, 21], but survey papers on the specific topic of anomaly identification using outlier detection methods are not available. This survey emphasizes anomaly identification using outlier detection approaches.

• In network traffic, most traffic is normal. Traffic related to attacks is naturally rare and is therefore an outlier. Thus, it is befitting that the problem of network anomaly identification be studied as an outlier detection problem. It will therefore be beneficial for researchers as well as practitioners to have a resource that surveys papers using outlier detection for network anomaly identification.

• We believe that our classification of outliers into six cases provides a unique and novel insight into understanding the concept of an outlier. This insight is likely to have implications for the design and development of algorithms for outlier detection, whether for network anomaly detection or other contexts.

• Although other surveys classify outlier detection techniques into the categories of supervised and unsupervised, our survey is the most up-to-date.

• Our classification of anomaly scores into three categories is also novel. An appropriate selection of anomaly score is crucial when applying outlier detection methods in specific domains. This survey will help readers select an appropriate anomaly score for their purpose.

• We identify various key research issues and challenges of outlier detection methods in network anomaly identification.
1.3. Organization of The Paper
The remainder of this paper is organized as follows. In the next section, we present preliminaries necessary to understand outlier detection methodologies. In Section 3, we explain issues in anomaly detection of network intrusion detection. Existing outlier detection approaches and a classiﬁcation of these approaches are presented in Section 4. In Section 5, we outline various research challenges and possibilities of future work. Finally, Section 6 concludes the paper.
2. PRELIMINARIES
Outlier detection searches for objects that do not obey rules and expectations valid for the major part of the data. The detection of an outlier object may be evidence that there are new tendencies in the data. Although outliers are often considered noise or errors, they may carry important information. What counts as an outlier often depends on the detection methods applied and on hidden assumptions regarding the data structures used. Depending on the approaches used in outlier detection, the methodologies can be broadly classified as:
1. Distance-based,
2. Density-based, and
3. Machine learning or soft-computing based.
These are discussed below.
2.1. Distance-based Outlier Detection
Distance-based methods for outlier detection are based on the calculation of distances among objects in the data, with a clear geometric interpretation. We can calculate a so-called outlier factor as a function F : x → R to quantitatively characterize an outlier [14]. The function F depends on the distance between the given object x and the other objects in the dataset being analysed. We introduce some commonly available definitions of distance-based outlier detection from [22, 23, 24] below.
Definition 1: Hawkins Outlier - Outliers are observations which deviate so significantly from other observations as to arouse suspicion that they are generated by a different mechanism [22]. This notion is formalized by Knorr and Ng [23] as follows: Let o, p, q denote objects in a dataset and let d(p, q) denote the distance between objects p and q. C is a set of objects and d(p, C) denotes the minimum distance between p and any object q in C:

d(p, C) = min { d(p, q) | q ∈ C }.   (1)
Definition 2: DB(pct, dmin) Outlier - An object p in a dataset D is a DB(pct, dmin) outlier if at least pct percent of the objects in D lie at a distance greater than dmin from p, i.e., the cardinality of the set {q ∈ D | d(p, q) ≤ dmin} is less than or equal to (100 − pct)% of the size of D [23]. To illustrate, consider the 2-D dataset depicted in Fig. 1. This is a simple 2-dimensional dataset containing 602 objects. There are 500 objects in the first cluster C_1, 100 objects in the cluster C_2, and two additional objects O_1 and O_2. In this example, C_2 forms a denser cluster than C_1. According to Hawkins' definition, both O_1 and O_2 are outliers, whereas objects in C_1 and C_2 are not. In contrast, within the framework of distance-based outliers, only O_1 is a reasonable DB(pct, dmin) outlier, in the following sense.
FIGURE 1. A 2D data set
If for every object q in C_1, the distance between q and its nearest neighbour is greater than the distance between O_2 and C_2 (i.e., d(O_2, C_2)), we can show that there is no appropriate value of pct and dmin such that O_2 is a DB(pct, dmin) outlier but the objects in C_1 are not. The reason is as follows. If the dmin value is less than the distance d(O_2, C_2), all 601 objects (pct = 100 × 601/602) are further away from O_2 than dmin. But the same condition holds for every object q in C_1 as well. Thus, in this case, O_2 and all objects in C_1 are DB(pct, dmin) outliers. Otherwise, if the dmin value is greater than the distance d(O_2, C_2), it is easy to see that if O_2 is a DB(pct, dmin) outlier, then there are many objects q in C_1 such that q is also a DB(pct, dmin) outlier. This is because the cardinality of the set {p ∈ D | d(p, O_2) ≤ dmin} is always bigger than the cardinality of the set {p ∈ D | d(p, q) ≤ dmin}.
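Definition 2 translates almost directly into code. The following minimal sketch (with an invented dense cluster and an isolated point, not the 602-object dataset of Fig. 1) checks whether at least pct percent of the objects lie farther than dmin from a candidate point:

```python
import math

def is_db_outlier(p, D, pct, dmin):
    """DB(pct, dmin): at least pct% of objects in D lie farther than dmin from p."""
    far = sum(1 for q in D if q is not p and math.dist(p, q) > dmin)
    return far >= (pct / 100.0) * len(D)

cluster = [(x * 0.1, y * 0.1) for x in range(5) for y in range(5)]  # dense cluster
o1 = (5.0, 5.0)                                                     # isolated object
D = cluster + [o1]
print(is_db_outlier(o1, D, pct=95, dmin=2.0))          # True
print(is_db_outlier(cluster[0], D, pct=95, dmin=2.0))  # False
```

Note that the check is linear in |D| per query point, which is why pruning and index structures are used in practice on large datasets.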
Definition 3: D_n^k Outlier - Given an input dataset with N points, parameters n and k can be used to define a D_n^k outlier: a point p is a D_n^k outlier if there are no more than n − 1 other points p′ such that D^k(p′) > D^k(p) [24]. D^k(p) denotes the distance of point p from its k-th nearest neighbour. The points can be ranked according to their D^k(p) distances. For a given value of k and n, a point p is an outlier if no more than n − 1 other points in the data set have a higher value of distance than D^k(p). For example, as can be seen in Fig. 1, for n = 6, a point p is a D_6^k outlier if there are no more than (6 − 1) = 5 other points p′ such that D^k(p′) > D^k(p).
Definition 3 has intuitive appeal in that it ranks each point based on its distance from its k-th nearest neighbour. With this definition of outliers, it is possible to rank outliers based on their D^k(p) distances. Outliers with larger D^k(p) distances have fewer points close to them and are thus intuitively stronger outliers. Various proximity measures can be used to measure the distance between a pair of points with numeric as well as categorical data.
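Definition 3 amounts to sorting points by their k-th nearest-neighbour distance D^k(p) and taking the top n. A brute-force sketch over invented 2-D points (quadratic in the dataset size, so for illustration only):

```python
import math

def kth_nn_distance(p, D, k):
    """D^k(p): distance from p to its k-th nearest neighbour in D."""
    dists = sorted(math.dist(p, q) for q in D if q is not p)
    return dists[k - 1]

def top_n_outliers(D, k, n):
    """Rank points by D^k(p) and return the n strongest outliers."""
    return sorted(D, key=lambda p: kth_nn_distance(p, D, k), reverse=True)[:n]

D = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (8, 8), (9, 9)]
print(top_n_outliers(D, k=2, n=2))  # the two isolated points
```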
Based on these definitions, we observe that distance-based outliers are data points that are situated away from the majority of points according to some geometric distance measure, following a fixed or adaptive threshold related to the domain of interest. The advantage of distance-based methods is the high degree of accuracy of distance measures. High dimensional data is always sparse with respect to some dimension or attribute. Because of this sparsity, distance-based approaches usually do not perform well in situations where the actual values of the distances are similar for many pairs of points. So, researchers [25] working with distance-based outlier detection methods take a non-parametric approach. Although distance-based approaches for outlier detection are non-parametric, their drawback is the amount of computation time required. In distance-based outlier detection methods, effective use of an adaptive or conditional threshold value can result in better performance.
2.2. Density-based Outlier Detection
The density-based approach was originally proposed in [26]. Density-based methods estimate the density distribution of the input space and then identify outliers as those lying in regions of low density [27]. Density-based outlier detection techniques estimate the density of the neighbourhood of each data instance. An instance that lies in a neighbourhood with low density is declared to be an outlier, while an instance that lies in a dense neighbourhood is declared to be normal. A generalized definition of density-based outliers based on [28] is given next. This approach is very sensitive to the parameters defining the neighbourhood. The definition is complex and is therefore introduced in several steps.
Definition 4: LOF based Outlier - A local outlier factor (LOF) [28] is computed for each object in the dataset, indicating its degree of outlierness. This quantifies how outlying an object is. The outlier factor is local in the sense that only a restricted neighbourhood of each object is taken into account. The LOF of an object is based on a single parameter called MinPts, which is the number of nearest neighbours used in defining the local neighbourhood of the object. The LOF of an object p can be defined as

LOF_MinPts(p) = ( Σ_{o ∈ N_MinPts(p)} lrd_MinPts(o) / lrd_MinPts(p) ) / |N_MinPts(p)|.   (2)

The outlier factor of object p captures the degree to which we can call p an outlier. It is the average of the ratios of the local reachability density of p's MinPts-nearest neighbours to that of p. The lower p's local reachability density (lrd) is, and the higher the lrd of p's MinPts-nearest neighbours are, the higher is the LOF value of p. The local reachability density (lrd) of an object p is the inverse of the average reachability distance (reach_dist) based on the MinPts nearest neighbours of p. Note that the local density can be ∞ if all the
reachability distances in the summation are 0. This may occur for an object p if there are at least MinPts objects, different from p, but sharing the same spatial coordinates, i.e., if there are at least MinPts duplicates of p in the dataset. lrd is defined as:
lrd_MinPts(p) = 1 / ( ( Σ_{o ∈ N_MinPts(p)} reach_dist_MinPts(p, o) ) / |N_MinPts(p)| ).   (3)
The reachability distance of an object p with respect to object o is reach_dist_MinPts(p, o):

reach_dist_MinPts(p, o) = max { MinPts-distance(o), d(p, o) }.   (4)
For any positive integer k, the k-distance of object p, denoted as k-distance(p), is defined as the distance d(p, o) between p and an object o ∈ D, where D is a dataset, such that:

1. for at least k objects o′ ∈ D \ {p} it holds that d(p, o′) ≤ d(p, o), and
2. for at most k − 1 objects o′ ∈ D \ {p} it holds that d(p, o′) < d(p, o).

The k-distance neighbourhood of p contains every object whose distance from p is not greater than the k-distance, i.e.,

N_{k-distance(p)}(p) = { q ∈ D \ {p} | d(p, q) ≤ k-distance(p) }.

These objects q are called the k-nearest neighbours of p. The notation N_k(p) is used as a shorthand for N_{k-distance(p)}(p). k-distance(p) is well defined for any positive integer k, although the object o may not be unique. In such a case, the cardinality of N_k(p) is greater than k. For example, suppose that there are: (i) 1 object at distance 1 unit from p; (ii) 2 objects at distance 2 units from p; and (iii) 3 objects at distance 3 units from p. Then 2-distance(p) is identical to 3-distance(p), and 4-distance(p) is the distance of the 3 objects at distance 3 units from p. Thus, the cardinality of N_4(p) can be greater than 4; in this case it is 6. Fig. 2 illustrates the idea of reachability distance with
k = 4. Intuitively, if object p is far away from o (e.g., p_2 in the figure), the reachability distance between the two is simply their actual distance. However, if they are sufficiently close (e.g., p_1 in the figure), the actual distance is replaced by the k-distance of o. The reason is that in so doing, the statistical fluctuations of d(p, o) for all the p's close to o can be significantly reduced. The strength of this smoothing effect can be controlled by the parameter k. The higher the value of k, the more similar the reachability distances for objects within the same neighbourhood. Density-based outlier detection is a parameter-based approach. The performance of a density-based method is largely dependent on optimized parameter selection.
FIGURE 2. Reachability distance

With reference to intrusion detection, we can consider density-based outliers as data points lying in low-density regions with respect to specific attributes or parameters.
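The LOF quantities above (reachability distance, lrd and LOF itself) can be written down almost verbatim. This pure-Python sketch ignores the tie-handling of the k-distance neighbourhood (so |N_k(p)| is taken as exactly k) and uses invented toy data:

```python
import math

def knn(p, D, k):
    """The k nearest neighbours of p in D (ties broken by list order)."""
    return sorted((q for q in D if q is not p), key=lambda q: math.dist(p, q))[:k]

def k_distance(p, D, k):
    return math.dist(p, knn(p, D, k)[-1])

def reach_dist(p, o, D, k):            # equation (4)
    return max(k_distance(o, D, k), math.dist(p, o))

def lrd(p, D, k):                      # equation (3)
    N = knn(p, D, k)
    return len(N) / sum(reach_dist(p, o, D, k) for o in N)

def lof(p, D, k):                      # equation (2)
    N = knn(p, D, k)
    return sum(lrd(o, D, k) for o in N) / (len(N) * lrd(p, D, k))

D = [(0, 0), (0, 1), (1, 0), (1, 1), (5, 5)]
print(lof(D[-1], D, 2))  # well above 1: (5, 5) is an outlier
print(lof(D[0], D, 2))   # about 1: (0, 0) is an inlier
```

LOF values near 1 indicate density comparable to the neighbours; values much larger than 1 indicate a local outlier.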
2.3. Outlier Detection based on Soft-computing Approaches
In this section we present some deﬁnitions of outliers inspired by soft computing approaches.
Definition 5: RMF (rough membership function) based Outlier [29, 30] - An RMF is defined as follows. Let IS = (U, A, V, f) be an information system, X ⊆ U and X ≠ Φ. U is a non-empty finite set of objects, A a set of attributes, V the union of attribute domains, and f : U × A → V a function such that for any x ∈ U and a ∈ A, f(x, a) ∈ V_a. Let ν be a given threshold value. For any x ∈ X, if ROF_X(x) > ν, x is called a rough membership function (RMF) based outlier with respect to X in IS, where ROF_X(x) is the rough outlier factor of x with respect to X in IS. The rough outlier factor is defined as

ROF_X(x) = 1 − ( Σ_{j=1}^{m} μ_X^{A_j}(x) × |A_j| + Σ_{j=1}^{m} μ_X^{{a_j}}(x) × W_X^{{a_j}}(x) ) / ( 2 × |A|² )   (5)

where A = {a_1, a_2, ..., a_m}, and μ_X^{A_j}(x) and μ_X^{{a_j}}(x) are RMFs for every attribute subset A_j ⊆ A and singleton subset {a_j} of A, 1 ≤ j ≤ m. For every singleton subset {a_j}, W_X^{{a_j}} : X → (0, 1] is a weight function such that for any x ∈ X, W_X^{{a_j}}(x) = √( |[x]_{a_j}| / |U| ), where [x]_{a_j} = {u ∈ U : f(u, a_j) = f(x, a_j)} denotes the indiscernibility class of the relation IND({a_j}) that contains element x. For any B ⊆ A, the RMF μ_X^B : X → (0, 1] is such that for any x ∈ X,

μ_X^B(x) = |[x]_B ∩ X| / |[x]_B|   (6)

where [x]_B = {u ∈ U : ∀a ∈ B (f(u, a) = f(x, a))} and B ⊆ A denotes the indiscernibility class of relation
FIGURE 3. Six cases
IND(B) that contains element x. Rough sets are used in classification systems where we do not have complete knowledge of the system [31]. In any classification task the aim is to form classes where each class contains objects that are not noticeably different. These indiscernible or indistinguishable objects can be viewed as basic building blocks (concepts) used to build a knowledge base about the real world. This kind of uncertainty is referred to as rough uncertainty. Rough uncertainty is formulated in terms of rough sets. In fuzzy sets, the membership of an element in a set is not crisp: it can be anything in between yes and no. The concept of fuzzy sets is important in pattern classification. Thus, fuzzy and rough sets represent different facets of uncertainty. Fuzziness deals with vagueness among overlapping sets [32]. On the other hand, rough sets deal with coarse non-overlapping concepts [33]. Neither roughness nor fuzziness depends on the occurrence of an event. In fuzzy sets, each granule of knowledge can have only one membership value for a particular class. However, rough sets assert that each granule may have different membership values for the same class. Thus, roughness arises from indiscernibility in the input pattern set, while fuzziness is generated by the vagueness present in the output classes and the clusters. To model situations where both vagueness and approximation are present, the concept of fuzzy-rough sets [33] can be employed.
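The rough membership function of equation (6) is easy to compute on categorical data. A minimal sketch over a hypothetical information system (object names, attributes and values are all invented):

```python
def indiscernibility_class(x, U, f, B):
    """[x]_B: all objects of U agreeing with x on every attribute in B."""
    return [u for u in U if all(f[u][a] == f[x][a] for a in B)]

def rmf(x, X, U, f, B):
    """Equation (6): mu_X^B(x) = |[x]_B ∩ X| / |[x]_B|."""
    cls = indiscernibility_class(x, U, f, B)
    return sum(1 for u in cls if u in X) / len(cls)

# f maps each object to its attribute values; X is the subset of interest.
f = {
    "c1": {"proto": "tcp", "flag": "SF"},
    "c2": {"proto": "tcp", "flag": "SF"},
    "c3": {"proto": "tcp", "flag": "S0"},
    "c4": {"proto": "udp", "flag": "SF"},
}
U = list(f)
X = {"c1", "c2"}
print(rmf("c1", X, U, f, ["proto"]))          # 2/3: [c1]_proto = {c1, c2, c3}
print(rmf("c3", X, U, f, ["proto", "flag"]))  # 0.0: c3 is alone in its class
```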
2.4. Comparison of Outlier Detection Approaches
Outlier detection has largely focused on data that is univariate and on data with a known (or parametric or density-based) distribution. These two limitations have restricted the ability to apply outlier detection methods to large real-world databases, which typically have many different fields and no easy way of characterizing the multivariate distribution.
TABLE 1. A General Comparison of Three Outlier Detection Approaches

Approaches      Case 1  Case 2  Case 3  Case 4  Case 5     Case 6
Distance-based  Yes     Yes     Yes     Yes     Yes        No
Density-based   Yes     Yes     No      Yes     Partially  Partially
Soft computing  Yes     Yes     Yes     Yes     Partially  No
To evaluate the effectiveness of outlier detection methods, we consider six cases over the synthetic data set given in Fig. 3. In these figures, O_i is an object and C_i is a cluster of objects. A distinct outlier object (case 1 in Fig. 3) is one that cannot be included in any cluster, whereas a distinct inlier object (case 2) lies inside a cluster. An equidistant outlier (case 3) is an object at equal distance from the clusters, whereas a non-distinct inlier (case 4) is an object located near the border of a cluster. The chaining effect (case 5) represents objects situated in a straight line among the clusters. The staying-together effect (case 6) represents outlier objects that lie together on a straight line. A comparison of the three approaches in the context of these six cases over synthetic data is given in Table 1.
3. NETWORK ANOMALY DETECTION
Unusual activities are outliers that are inconsistent with the remainder of the data set [34]. An outlier is an observation that lies at an abnormal distance from other values in the dataset. In order to apply outlier detection to anomaly detection in the context of network intrusion detection, it is assumed by most researchers [10] that (1) the majority of the network connections are normal traffic and only a small fraction of the traffic is malicious, and (2) the attack traffic is statistically different from normal traffic. An anomaly detection technique identifies attacks based on deviations from established profiles of normal activities. Activities that exceed thresholds of deviation are identified as attacks. Thus, supervised and unsupervised outlier detection methods can find anomalies in network intrusion detection. Network intrusion detection systems [35] deal with detecting intrusions in network data. The primary reason for these intrusions is attacks launched by outside hackers who want to gain unauthorized access to the network to steal information or to disrupt it. A typical setting is a large network of computers connected to the rest of the world through the Internet. Generally, detection of an intrusion in a network system is carried out based on two basic approaches: signature based and anomaly based. A signature based approach attempts to find attacks based on previously stored patterns
or signatures for known intrusions. An anomaly based approach, in contrast, detects intrusions based on deviations from previously stored profiles of normal activities, and is thereby also capable of detecting unknown intrusions (or suspicious patterns). The NIDS does this by reading all incoming packets and trying to find suspicious patterns.
3.1. Network Anomalies and Types
Anomaly detection attempts to find data patterns that are deviations in the sense that they do not conform to expected behavior. These deviations or non-conforming patterns are anomalies. Based on their nature, context, behavior or cardinality, anomalies are generally classified into the following three categories [10]:
1. Point Anomalies - This simplest type of anomaly is an instance of the data that has been found to be anomalous with respect to the rest of the data. This type of anomaly occurs in a majority of applications, and a good amount of research addresses this issue.

2. Contextual Anomalies - This type of anomaly (also known as a conditional anomaly [36]) is defined for a data instance in a specific context. Generally, the notion of a context is induced by the structure in the data set. Two sets of attributes determine if a data instance belongs to this type of anomaly: (i) contextual attributes and (ii) behavioural attributes. Contextual attributes determine the context (or neighbourhood) for the instance. For example, in stock exchange or gene expression time series datasets, time is a contextual attribute that helps to specify the position of the instance in the entire sequence. Behavioural attributes, on the other hand, are responsible for the non-contextual characteristics of an instance. For example, in a spatial data set describing the average number of people infected by a specific disease in a country, the amount of infection at a specific location can be defined as a behavioural attribute.

3. Collective Anomalies - These are collections of related data instances found to be anomalous with respect to the entire set of data. In a collective anomaly, the individual data instances may not be anomalous by themselves; however, their collective occurrence is anomalous.
To handle the above types of anomalies, various detection techniques have been proposed over the decades. In particular, to handle the point anomaly type, distance-based approaches have been found suitable. However, the effectiveness of such techniques largely depends on the type of data, the proximity measure or anomaly score used and the dimensionality of the data. In the case of contextual anomaly detection, both distance-based and density-based approaches have been found suitable. However, as in the previous case, for distance-based outlier approaches the proximity measure used, the type of data, the data dimensionality and the threshold measure play a vital role. The density-based approach can handle this type of anomaly effectively for uniformly distributed datasets. However, in the case of skewed distributions, an appropriate density threshold is required for handling the variable density situation. Collective anomalies are mostly handled using density-based approaches. However, in the identification of this type of anomalous pattern, other factors (such as compactness and single linkage effects) also play a crucial role.
3.2. Characterizing ANIDS
An ANIDS is an anomaly based network intrusion detection system. A variety of ANIDSs have been
proposed since the mid 1980s. It is necessary to have
a clear deﬁnition of anomaly in the context of network
intrusion detection. The majority of current research on ANIDS does not explicitly state what constitutes an anomaly in the study [37]. In a recent survey [19], anomaly detection methods were classified into two classes: generative and discriminative. Generally, an ANIDS is characterized based on the following attributes: (i) the nature and type of the input data, (ii) the appropriateness of similarity/dissimilarity measures, (iii) the labelling of data and (iv) the reporting of anomalies. Next, we discuss each of these issues.
3.2.1. Types of Data
A key aspect of any anomaly detection technique is
the nature of the input data. The input is generally
a collection of data instances or objects. Each data
instance can be described using a set of attributes (also
referred to as variables, characteristics, features, ﬁelds
or dimensions). The attributes can be of different types such as binary, categorical or continuous. Each data instance may consist of only one attribute (univariate) or multiple attributes (multivariate). In the case of multivariate data instances, all attributes may be of the same type or may be a mixture of different data types.
3.2.2. Proximity measures
Distance or similarity measures are necessary to solve many pattern recognition problems such as classification, clustering and retrieval. From scientific and mathematical points of view, distance is defined as a quantitative degree of how far apart two objects are. A synonym for distance is dissimilarity. Distance measures satisfying the metric properties are simply called metrics, while non-metric distance measures are occasionally called divergences. A synonym for similarity is proximity, and similarity measures are often called similarity coefficients. The selection of a proximity measure is very difficult because it depends upon (i) the types of attributes in the data, (ii) the dimensionality of the data and (iii) the problem of weighing
The Computer Journal, Vol. ??, No. ??, ????
8
P.Gogoi D.K.Bhattacharyya B.Borah J.K.Kalita
data attributes. In the case of numeric data objects, their inherent geometric properties can be exploited naturally to define distance functions between two data points. Numeric objects may be discrete or continuous. A detailed discussion of the various proximity measures for numeric data can be found in [38]. Categorical attribute values cannot be naturally ordered as numerical values, so computing similarity between categorical data instances is not straightforward. Several data-driven similarity measures have been proposed [39] for categorical data; the behavior of such measures depends directly on the data. Mixed-type datasets include both categorical and numeric values. A common practice for clustering mixed datasets is to transform categorical values into numeric values and then use a proximity measure for numeric data. Another approach [27] is to compare the categorical values directly, in which two distinct values result in distance 1 while two identical values result in distance 0.
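The direct-comparison idea of [27] can be combined with the numeric attributes into a single mixed-type measure. The following sketch is illustrative only (the function name and the unweighted Euclidean-style combination are our own choices, not from [27]):

```python
import math

def mixed_distance(x, y, cat_idx):
    """Illustrative distance between two mixed-type records.

    Categorical attributes (positions listed in cat_idx) contribute 0
    for identical values and 1 for distinct values, as in [27];
    numeric attributes contribute squared differences. The terms are
    combined Euclidean-style.
    """
    total = 0.0
    for i, (a, b) in enumerate(zip(x, y)):
        if i in cat_idx:
            total += 0.0 if a == b else 1.0
        else:
            total += (a - b) ** 2
    return math.sqrt(total)
```

For example, `mixed_distance(("tcp", 3.0), ("udp", 3.0), {0})` returns 1.0, since the records differ only in the categorical attribute.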
3.2.3. Data Labels
The labels associated with a data instance denote whether that instance is normal or anomalous. It should be noted that obtaining labelled data that is accurate as well as representative of all types of behaviours is often prohibitively expensive. Labelling is often done manually by a human expert and hence substantial effort is required to obtain the labelled training data set. Several active learning approaches for creating labelled datasets have also been proposed [40]. Typically, getting a labelled set of anomalous data instances that covers all possible types of anomalous behavior is more difficult than getting labels for normal behavior. Moreover, anomalous behavior is often dynamic in nature; e.g., new types of anomalies may arise for which there is no labelled training data. The KDD CUP '99 dataset [41] is an evaluated intrusion dataset with labelled training and testing data. Based on the extent to which labels are available, anomaly detection techniques can operate in either supervised or unsupervised mode. A supervised approach usually trains the system with normal patterns and attempts to detect an attack based on its non-conformity with reference to the normal patterns. In the case of the KDD CUP '99 dataset, the attack data are labelled into four classes: DoS (denial of service), R2L (remote to local), U2R (user to root) and probe. This defines the intrusion detection problem as a 5-class problem. If attack data are labelled into n possible classes, we have an (n+1)-class problem at hand. An unsupervised approach does not need a labelled dataset. Once the system identifies meaningful clusters, it applies appropriate labelling techniques to the identified clusters. A supervised approach has a higher detection rate (DR) and a lower false positive rate (FPR) of attack detection compared to an unsupervised approach. Supervised approaches can detect known attacks, whereas unsupervised approaches can detect unknown attacks as well.
3.2.4. Anomaly Scores
Detection of anomalies depends on scoring techniques that assign an anomaly score to each instance in the test data, reflecting the degree to which that instance is considered an anomaly. Thus the output of such a technique is a ranked list of anomalies. An analyst may choose either to analyse the top few anomalies or to use a cutoff threshold to select anomalies. Several anomaly score estimation techniques have been developed over the past decades. Some of them are presented below under the categories of distance-based, density-based and machine learning or soft computing based approaches.
A Distance-based anomaly scores. In this section, we introduce some of the popular distance-based anomaly score estimation techniques.
A.1 LOADED (Link-based Outlier and Anomaly Detection in Evolving Data Sets) Anomaly Score [42]: Assume our data set contains both continuous and categorical attributes. Two data points p_i and p_j are considered linked if they are considerably similar to each other. Moreover, associated with each link is a link strength that captures the degree of linkage, determined using a similarity metric defined on the two points. The data points p_i and p_j are linked in the categorical attribute space if they have at least one attribute-value pair in common. The associated link strength is equal to the number of attribute-value pairs shared between the two points. A score function that generates high scores for outliers assigns to a point a score that is inversely proportional to the sum of the strengths of all its links. To estimate this score efficiently, ideas from frequent itemset mining are used. Let I be the set of all possible attribute-value pairs in the data set M. Let

D = \{d : d \in \mathrm{PowerSet}(I) \wedge \forall_{i,j : i \neq j}\; d_i\!\cdot\!attrib \neq d_j\!\cdot\!attrib\}

be the set of all itemsets in which an attribute occurs at most once. The score function for the categorical attribute space is defined as:
Score_1(p_i) = \sum_{d \subseteq p_i \,\wedge\, sup(d) \le s} \frac{1}{|d|}   (7)
where p_i is an ordered set of categorical attribute values. sup(d) is the number of points p in the data set with d ⊆ p, otherwise known as the support of itemset d. |d| is the number of attribute-value pairs in d. s is a user-defined threshold of minimum support, or minimum number of links.
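Equation (7) can be transcribed by brute force for small datasets (LOADED itself estimates the score efficiently with frequent-itemset machinery; representing points as tuples of attribute-value pairs is our illustrative choice):

```python
from itertools import combinations

def loaded_score1(point, dataset, s):
    """Eq. (7): sum of 1/|d| over all itemsets d contained in the
    point whose support in the dataset is at most s, so a point's
    infrequent (weakly linked) itemsets raise its score."""
    score = 0.0
    for size in range(1, len(point) + 1):
        for d in combinations(point, size):
            dset = set(d)
            sup = sum(1 for p in dataset if dset <= set(p))
            if sup <= s:
                score += 1.0 / size
    return score
```

With three records sharing some attribute-value pairs, the record whose pairs are rarest receives the highest score.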
A point is deﬁned to be linked to another point in
the mixed data space if they are linked together in
the categorical data space and if their continuous attributes adhere to the joint distribution as indicated by the correlation matrix. Points that violate these conditions are deﬁned to be outliers. The modiﬁed score function for mixed attribute data is as follows:
Score_2(p_i) = \sum_{d \subseteq p_i \,\wedge\, (C_1 \vee C_2) \wedge C_3} \frac{1}{|d|}   (8)
where C_1: sup(d) ≤ s; C_2: at least δ% of the correlation coefficients disagree with the distribution followed by the continuous attributes for point p_i; and C_3: C_1 or C_2 holds true for every superset of d in p_i. Condition C_1 is the same condition used to find outliers in a categorical data space using Score_1(p_i). Condition C_2 adds continuous attribute checks to Score_1(p_i). Condition C_3 is a heuristic and allows for more efficient processing, because if an itemset does not satisfy conditions C_1 and C_2, none of its subsets are considered.
A.2 RELOADED (REduced memory LOADED) Anomaly Score [43]: An anomalous data point can be defined as one that has a subset of attributes taking unusual values given the values of the other attributes. When all categorical attributes of a data point have been processed, the anomaly score of the data point is computed as a function of the count of incorrect predictions and the violation score, as below:
AnomalyScore(P_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{i - W_j}{i} + \frac{V_\tau}{m n^2}   (9)
where W_j is the cumulative number of incorrect predictions of categorical attribute j over the previous i data points, m is the number of categorical attributes, n is the number of continuous attributes and V_τ is the cumulative violation score of point P_i.
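A direct transcription of Equation (9) as stated above (the argument names are illustrative):

```python
def reloaded_score(i, wrong_counts, violation_score, n_continuous):
    """Eq. (9) for the i-th point.

    wrong_counts    -- W_j: cumulative incorrect predictions for each
                       of the m categorical attributes
    violation_score -- V_tau: cumulative violation score of the point
    n_continuous    -- n: number of continuous attributes
    """
    m = len(wrong_counts)
    categorical_part = sum((i - w) / i for w in wrong_counts) / m
    continuous_part = violation_score / (m * n_continuous ** 2)
    return categorical_part + continuous_part
```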
B Density-based anomaly scores. Here, we introduce a few density-based anomaly score estimation techniques.
B.1 ODMAD (Outlier Detection for Mixed Attribute Datasets) Anomaly Score [27]: This score can be used in an approach that mines outliers from data containing both categorical and continuous attributes. The ODMAD score is computed for each point taking into consideration the irregularity of the categorical values, the continuous values, and the relationship between the two spaces in the dataset. A good indicator for deciding whether point X_i is an outlier in the categorical attribute space is the score value Score_1, defined below:
Score_1(X_i) = \sum_{d \subseteq X_i \,\wedge\, supp(d) < \sigma \,\wedge\, |d| \le MAX} \frac{1}{supp(d) \times |d|}   (10)
where X _{i} is a data point with m _{c} categorical attributes in a dataset D . Let T be the set of all possible combinations of attribute and value pairs in the dataset D . Let S be the set of all sets d so that an attribute occurs only once in each set d . Then,
S = \{d : d \in \mathrm{PowerSet}(T) \wedge \forall\, l, k \in d,\; l \neq k\}   (11)
where l and k represent attributes whose values appear in set d, and |d| represents the length of set d. σ is a user-defined threshold. A point can be an outlier if it contains single values that are infrequent or sets of values that are infrequent. A categorical value or a combination of values is infrequent if it appears fewer than σ times in the dataset. MAX is a user-defined maximum length of attribute sets.
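Equation (10) can be sketched with a brute-force support count (ODMAD itself prunes the search; this illustrative version simply enumerates all value subsets up to MAX):

```python
from itertools import combinations

def odmad_score1(point, dataset, sigma, max_len):
    """Eq. (10): sum of 1/(supp(d)*|d|) over subsets d of the point's
    categorical values that are infrequent (supp(d) < sigma) and no
    longer than max_len values."""
    score = 0.0
    for size in range(1, min(len(point), max_len) + 1):
        for d in combinations(point, size):
            dset = set(d)
            supp = sum(1 for p in dataset if dset <= set(p))
            if 0 < supp < sigma:
                score += 1.0 / (supp * size)
    return score
```

Infrequent values contribute large terms (small support in the denominator), so rare combinations dominate the score.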
In the case of mixed attribute datasets a modiﬁed score is deﬁned for the data points that share the same categorical value as well as similar continuous values as below:
Score_2(X_i) = \frac{1}{|X_i^c|} \times \sum_{\forall a \in X_i^c} \cos(X_i^q, \mu_a)   (12)

where X_i is a data point containing m_c categorical values and m_q continuous values. X_i^c and X_i^q are respectively the categorical and the continuous parts of X_i. Let a be one of the categorical values of X_i^c that occurs with support supp(a). Let the subset of the data containing the continuous vectors of the data points that share categorical value a be \{X_i^q : a \in X_i^c,\; i = 1 \ldots n\}, with a total of supp(a) vectors. The mean vector of this set, μ_a, is:

\mu_a = \frac{1}{supp(a)} \times \sum_{i=1 \,\wedge\, a \in X_i^c}^{n} X_i^q   (13)

Also, the cosine similarity between a point X_i^q and mean μ_a is given below:

\cos(X_i^q, \mu_a) = \sum_{j=1}^{m_q} \frac{x_{ij}^q}{\|X_i^q\|} \times \frac{\mu_{aj}}{\|\mu_a\|}   (14)

where ‖X‖ represents the L_2 norm of vector X. Score_2(X_i) is a score for each point X_i: for all categorical values a contained in X_i^c, it is the summation of the cosine similarities divided by the total number of values in the categorical part of X_i, |X_i^c|. As
minimum cosine similarity is 0 and maximum is 1, the data points with similarity close to 0 are more likely to be outliers.
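Equations (12)-(14) amount to averaging cosine similarities against per-category mean vectors. An illustrative sketch (the data layout as (categorical values, continuous vector) pairs is our assumption, not from [27]):

```python
import math

def cosine(u, v):
    """Cosine similarity of Eq. (14)."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)

def odmad_score2(cat_part, num_part, dataset):
    """Eq. (12): average, over the point's categorical values a, of
    the cosine similarity between the point's continuous part and the
    mean continuous vector mu_a (Eq. 13) of all points sharing a.

    dataset -- list of (categorical_values, continuous_vector) pairs.
    """
    total = 0.0
    for a in cat_part:
        members = [q for c, q in dataset if a in c]
        mu = [sum(col) / len(members) for col in zip(*members)]
        total += cosine(num_part, mu)
    return total / len(cat_part)
```

A point whose continuous part points away from the mean vectors of its categorical peers gets a score near 0 and is flagged as a likely outlier.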
C Machine learning or soft computing based anomaly scores. This section presents a few machine learning or soft computing based anomaly score estimation techniques.
C.1 RNN (Replicator Neural Network) Outlier Detection Anomaly Score [44]: This score has been used in feed-forward multilayer perceptron approaches to anomaly detection. The outlier factor δ_i of the i-th data record is the measure of outlierness. δ_i is defined as the average reconstruction error over all features (variables):
\delta_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - o_{ij})^2   (15)
where the x_ij are the values of the input data instance, the o_ij are the corresponding outputs reconstructed by the network, and n is the number of features over which the data is defined. The reconstruction error is used as the anomaly score.
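Equation (15) is simply a per-record mean squared error between the input record and its reconstruction. A minimal sketch (the replicator network itself is assumed to have produced the reconstruction):

```python
def outlier_factor(record, reconstruction):
    """Eq. (15): mean squared reconstruction error of one record.
    record is the network input, reconstruction its output."""
    n = len(record)
    return sum((x - o) ** 2 for x, o in zip(record, reconstruction)) / n
```

Records the trained network reproduces poorly receive a large outlier factor and are ranked as anomalies.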
C.2 GMM (Gaussian Mixture Model) Anomaly Detection [36]: Assume each data instance d is represented as [x, y]. If U models the contextual data and V the behavioural data, the mapping function p(V_j | U_i) indicates the probability of the indicator part y of a data point being generated from mixture component V_j when the environmental part x is generated by U_i. The anomaly score for a test instance d is given as:

Anomaly Score = \sum_{i=1}^{n_U} p(x \in U_i) \sum_{j=1}^{n_V} p(y \in V_j)\, p(V_j \mid U_i)   (16)

where n_U is the number of mixture components in U and n_V is the number of mixture components in V. p(x ∈ U_i) indicates the probability that a sample point x is generated from the mixture component U_i, while p(y ∈ V_j) indicates the probability that a sample point y is generated from the mixture component V_j.
C.3 Markov Chain Model Anomaly Score [45]: This model is used to represent a temporal profile of normal behavior in a computer and network system. The Markov chain model of the normal profile is learned from historic data of the system's normal behavior. The observed behavior of the system is analysed to infer the probability that the Markov chain model of the normal profile supports the observed behavior. A low probability of support indicates anomalous behavior that
may result from intrusive activities. The likelihood P (S ) of sequence S is given as:
P(S) = q_{S_1} \prod_{t=2}^{|S|} p_{S_{t-1} S_t}   (17)
where q_{S_1} is the probability of observing the symbol S_1 in the training set and p_{S_{t-1} S_t} is the probability of observing the symbol S_t after S_{t-1} in the training set. The inverse of P(S) is the anomaly score for a given sequence S.
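Equation (17) and its inverse-likelihood score can be sketched as follows (the probability tables q and p are assumed to have been estimated from training sequences):

```python
def sequence_likelihood(seq, q, p):
    """Eq. (17): likelihood of a symbol sequence under a first-order
    Markov chain. q maps a symbol to its initial probability; p maps
    a (previous, current) pair to its transition probability."""
    prob = q.get(seq[0], 0.0)
    for prev, cur in zip(seq, seq[1:]):
        prob *= p.get((prev, cur), 0.0)
    return prob

def markov_anomaly_score(seq, q, p):
    """Inverse likelihood: higher means more anomalous."""
    prob = sequence_likelihood(seq, q, p)
    return float("inf") if prob == 0.0 else 1.0 / prob
```

A transition never seen in training makes the likelihood zero, so the score becomes infinite, i.e. maximally anomalous.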
A general comparison of the effectiveness of the various anomaly/outlier scores reported in the previous subsections can be made based on parameters such as the detection approach used (density-based, distance-based or soft computing based), the attribute types of the data and the applications under consideration. A summary of the comparison is given in Table 2.
3.2.5. Datasets Used
Network intrusion detection is a problem of handling high dimensional mixed-type data. Most approaches considered here are able to handle categorical, numerical or mixed-type high dimensional data. Most techniques are evaluated based on the KDD Cup 1999 intrusion detection dataset. However, this dataset has several limitations: (i) the dataset is not unbiased and (ii) it is a purified dataset that does not contain fragmented data. Several academic and research laboratories and commercial organizations have generated unbiased intrusion datasets to evaluate IDSs. Prototype IDS testing platforms have been developed by the University of California at Davis [46] and IBM Zurich [47]. Rigorous and extensive IDS testing was performed by MIT Lincoln Laboratory [48]. The Air Force Research Laboratory [49] has also been involved in IDS testing in a complex hierarchical network environment. The MITRE Corporation [50] investigated the characteristics and capabilities of network-based IDSs.
4. EXISTING OUTLIER DETECTION APPROACHES FOR NETWORK ANOMALY DETECTION
Outlier detection is a critical task in many safety critical environments as outliers indicate abnormal running conditions from which signiﬁcant performance degradation may result. We can categorise and analyse a broad range of outlier detection methodologies as either supervised or unsupervised approaches.
4.1. Supervised Approaches
The supervised approaches are essentially supervised classiﬁcation and require prelabelled data, tagged as
TABLE 2. Comparison of Anomaly Scores

Author & Year | Score Formula | Approach (Density/Distance/Soft computing) | Data Type | Application
Ye, 2000 | P(S) = q_{S_1} \prod_{t=2}^{|S|} p_{S_{t-1} S_t} | Soft computing based approach using a Markov chain model [45] | Mixed-type data | First-order Markov chain modelling [45] for contextual anomaly detection
Hawkins, 2002 | \delta_i = \frac{1}{n} \sum_{j=1}^{n} (x_{ij} - o_{ij})^2 | Soft computing based approach in RNN [44] | Categorical data | One-class anomaly detection [44]
Ghoting, 2004 | Score_1(p_i) = \sum_{d \subseteq p_i \wedge sup(d) \le s} \frac{1}{|d|} | Distance-based approach | Categorical data | Capturing dependencies using links for categorical data in LOADED [42]
Ghoting, 2004 | Score_2(p_i) = \sum_{d \subseteq p_i \wedge (C_1 \vee C_2) \wedge C_3} \frac{1}{|d|} | Distance-based approach | Mixed-type data | Handling mixed attribute data in LOADED [42]
Otey, 2005 | AnomalyScore(P_i) = \frac{1}{m} \sum_{j=1}^{m} \frac{i - W_j}{i} + \frac{V_\tau}{m n^2} | Distance-based approach | Mixed-type data | Discriminating outliers over categorical and continuous attributes in RELOADED [43]
Song, 2007 | Anomaly Score = \sum_{i=1}^{n_U} p(x \in U_i) \sum_{j=1}^{n_V} p(y \in V_j) p(V_j | U_i) | Soft computing based approach in GMM-CAD [36] | Mixed-type data | Reduction of contextual anomaly to point anomaly [36]
Koufakou, 2010 | Score_1(X_i) = \sum_{d \subseteq X_i \wedge supp(d) < \sigma \wedge |d| \le MAX} \frac{1}{supp(d) |d|} | Density-based approach | Categorical data | Categorical score finding in ODMAD [27] for categorical data
Koufakou, 2010 | Score_2(X_i) = \frac{1}{|X_i^c|} \sum_{a \in X_i^c} \cos(X_i^q, \mu_a) | Density-based approach | Mixed-type data | Continuous score finding in ODMAD [27] for mixed attribute data
normal or abnormal. These approaches can be used for online classiﬁcation, where the classiﬁer learns the classiﬁcation model and then classiﬁes new exemplars as and when required against the learned model. If the new exemplar lies in a region of normality it is classiﬁed as normal, otherwise it is ﬂagged as an outlier. Classiﬁcation algorithms require a good spread of both normal and abnormal labelled data.
4.1.1. Statistical Methods
The general approach to solving the outlier detection problem using statistical methods is based on the construction of probabilistic data models and the use of mathematical methods of applied statistics and probability theory. With a statistical approach to outlier detection, a system learns the behavior of users, applying metrics or measuring methods. As the system runs, the outlier or anomaly detector constantly measures the deviation of the present behavior profile from the original. The LNKnet software package was developed to simplify the application of the most important statistical, neural network and machine learning pattern classifiers [51] on network connection data.
Methods for detecting outliers based on regression analysis are included among statistical methods. Regression analysis consists of finding a dependence of one random variable (or a group of variables) Y on another variable (or a group of variables) X. Specifically, the problem is formulated as that of examining the conditional probability distribution P(Y | X).
Among regression methods for outlier analysis, two approaches are distinguished. In the framework of the ﬁrst approach, the regression model is constructed with the use of all data; then, the objects with the greatest errors are successively, or simultaneously, excluded from the model. This approach is called a reverse search. The second approach consists of constructing a model based on a part of data and, then, adding new objects followed by the reconstruction of the model. Such a method is referred to as a direct search [52]. The model is extended by adding the most appropriate objects, which are the objects with least deviations from the model constructed. The objects added to the model in the last turn are considered outliers. A regression method based on support vector regression (SVR) and particle swarm optimization algorithm (PSOA) is given in [53] for pattern analysis of intrusion detection. Basic disadvantages of the regression methods are that they greatly depend on assumptions about the error distribution and need a prior partition of variables into independent and dependent ones.
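The reverse-search idea above can be illustrated with a one-variable least-squares fit: fit the model on all data, then flag the objects with the greatest deviations. The residual threshold used here (k times the mean absolute residual) is our own illustrative choice:

```python
def fit_line(points):
    """Ordinary least-squares fit of y = a*x + b."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

def regression_outliers(points, k=2.0):
    """Reverse-search flavour: fit on all data, then flag the points
    whose absolute residual exceeds k times the mean absolute
    residual (an illustrative thresholding rule)."""
    a, b = fit_line(points)
    residuals = [abs(y - (a * x + b)) for x, y in points]
    mean_res = sum(residuals) / len(residuals)
    return [pt for pt, r in zip(points, residuals) if r > k * mean_res]
```

Note that a gross outlier pulls the fitted line toward itself, which is why the reverse search removes flagged objects and refits iteratively in practice.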
4.1.2. Decision Tree based Approaches
These approaches begin with a set of cases or examples, and create a tree data structure that can be used to classify new cases. Each case is described by a set of attributes (or features) which can have numeric or symbolic values. Associated with each training case is a label representing the name of a class. Each internal node of the tree contains a test, the result of which is used to decide what branch to follow from that node.
For example, a test might ask "is x > 4?" for attribute x. If the test is true, then the case proceeds down the left branch, and if not it follows the right branch. The leaf nodes contain class labels instead of tests. In classification mode, when a test case (which has no label) reaches a leaf node, a decision tree method such as C4.5 [54] classifies it using the label stored there. Decision tree learners use a method known as divide and conquer to construct a suitable tree from a training set. The divide and conquer algorithm partitions the data until every leaf contains cases of a single class, or until further partitioning is impossible because two cases have the same values for each attribute but belong to different classes. Consequently, if there are no conflicting cases, the decision tree will correctly classify all training cases. This so-called overfitting is generally thought to lead to a loss of predictive accuracy in most applications. Overfitting can be avoided by a stopping criterion that prevents some sets of training cases from being subdivided (usually on the basis of a statistical test of the significance of the best test), or by removing some of the structure of the decision tree after it has been produced. The possibilistic decision tree [55] has been used for an intrusion detection system. Traffic data is captured by performing simulated attacks on Intelligent Electronic Devices (IEDs). Data is obtained for two types of genuine user activity and two types of common malicious attacks on IEDs. The genuine user activity includes casual browsing of IED data and downloading of IED data, while a ping-flood denial of service (DoS) and a password crack attack are performed as malicious attacks. Classification is done using possibilistic decision trees on the logarithmic histogram of the time difference between the arrival of two consecutive packets, yielding a continuous-valued possibilistic decision tree and its cut points. It also includes the use of mean distance metrics to obtain the possibility distribution for the real attack data.
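The classification walk just described, including the "is x > 4?" example test, can be sketched as follows (the node layout and the class labels are illustrative, not taken from C4.5 itself):

```python
class Node:
    """Internal node (attr/threshold set) or leaf (label set)."""
    def __init__(self, label=None, attr=None, threshold=None,
                 left=None, right=None):
        self.label, self.attr, self.threshold = label, attr, threshold
        self.left, self.right = left, right  # left taken when test is true

def classify(node, case):
    """Follow test outcomes down the tree until a leaf label is reached."""
    while node.label is None:
        node = node.left if case[node.attr] > node.threshold else node.right
    return node.label

# The example test from the text: "is x > 4 for attribute x?"
tree = Node(attr="x", threshold=4,
            left=Node(label="anomalous"),
            right=Node(label="normal"))
```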
4.1.3. Soft Computing Method: Rough Set Approach
The rough set [29] philosophy is based on the assumption that with every object of the universe there is associated a certain amount of information (data, knowledge), expressed by means of its attributes. Objects having the same description are indiscernible. The basic idea of rough sets is discussed in Subsection 2.3.
In [6], RST (Rough Set Theory) and SVM (Support Vector Machine) are used to detect intrusions. First, RST is used to preprocess the data and to reduce its dimensionality. Next, the features selected by RST are sent to the SVM model for learning and testing respectively. The method is effective in decreasing the space density of data and has a low false positive rate and good detection accuracy. The major advantage of rough set theory is that it does not need any preliminary or additional information about the data, such as a probability distribution. The main problems that can be solved using rough set theory include data reduction (i.e., elimination of superfluous data), discovery of data dependencies, estimation of data significance, generation of decision (control) algorithms from data, approximate classification of data, discovery of similarities or differences in data, discovery of patterns in data, and discovery of cause-effect relationships.
4.1.4. Proximity-based Approaches
Proximity-based techniques are simple to implement and make no prior assumptions about the data distribution model. However, they suffer from high computational cost because they are founded on the calculation of the distances between all records. The computational complexity is directly proportional to both the dimensionality of the data m and the number of records n. The k-nearest neighbour (kNN) algorithm is suitable for outlier detection; it calculates the nearest neighbours of a record using a suitable distance metric such as Euclidean distance or Mahalanobis distance. A definition of the proximity-based outlier measure D^k(p) from [24] is discussed in Definition 4 in Subsection 2.1. A partition-based outlier detection algorithm first partitions the input points using a clustering algorithm and computes lower and upper bounds on D^k for points in each partition. It then uses this information to identify the partitions that cannot possibly contain the top n outliers and prunes them. Outliers are then computed from the remaining points (belonging to unpruned partitions) in a final phase. PAIDS (Proximity-Assisted Intrusion Detection System) [56],
is an approach for identifying the outbreak of unknown worms. PAIDS does not rely on signatures. Instead, it takes advantage of the proximity information of compromised hosts. PAIDS operates on a dimension orthogonal to existing IDS approaches and can thus work collaboratively with existing IDSs to achieve better performance. In trace-driven simulations, PAIDS achieves a high detection rate and a low false positive rate.
The major limitation of this approach is selecting an appropriate proximity measure, especially for high dimensional mixed-type data. Also, developing a heuristic method for appropriate threshold selection is a difficult task.
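The D^k score mentioned above (Definition 4, [24]) can be sketched with a brute-force neighbour search. The partition-based pruning of the full algorithm is omitted, and points are assumed distinct:

```python
import math

def dk(point, data, k):
    """D^k(p): Euclidean distance from p to its k-th nearest
    neighbour among the other (distinct) points."""
    dists = sorted(math.dist(point, q) for q in data if q != point)
    return dists[k - 1]

def top_n_outliers(data, k, n):
    """Rank points by D^k and return the n with the largest values."""
    return sorted(data, key=lambda p: dk(p, data, k), reverse=True)[:n]
```

Points far from their k-th neighbour sit in sparse regions and are reported as the top-n outliers; the partition-based variant avoids the O(n^2) distance computations this sketch performs.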
4.1.5. Kernel Function based approach
As has been found in the case of distance-based methods, defining an appropriate proximity measure for heterogeneously structured data is a difficult task. To overcome this limitation, methods based on kernel functions can be used [57]. The kernel of a function
f is the equivalence relation on the function’s domain that roughly expresses the idea of equivalence as far as the function f can tell.
Definition 6: Let X and Y be sets and let f be a function from X to Y. Elements x_1 and x_2 of X are equivalent if f(x_1) and f(x_2) are equal, i.e., they are the same element of Y [58]. Formally, for f : X → Y,

ker(f) = \{(x_1, x_2) \in X \times X : f(x_1) = f(x_2)\}.   (18)
The kernel function K(x, y) can be expressed as a dot product in a high dimensional space. If the arguments to the kernel are in a measurable space X, and if the kernel is positive semi-definite, i.e.,

\sum_{i,j} K(x_i, x_j)\, c_i c_j \ge 0

for any finite subset \{x_1, \ldots, x_n\} of X and any real coefficients \{c_1, \ldots, c_n\}, then there exists a function φ(x) whose range is in an inner product space of possibly high dimension, such that

K(x, y) = \varphi(x) \cdot \varphi(y).   (19)
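Equation 19 can be checked concretely. For the polynomial kernel K(x, y) = (x · y)^2 on R^2 (our illustrative choice), an explicit feature map φ into R^3 satisfies K(x, y) = φ(x) · φ(y):

```python
import math

def poly_kernel(x, y):
    """K(x, y) = (x . y)^2, a positive semi-definite kernel on R^2."""
    return (x[0] * y[0] + x[1] * y[1]) ** 2

def phi(x):
    """Explicit feature map with K(x, y) = phi(x) . phi(y) (Eq. 19)."""
    return (x[0] ** 2, math.sqrt(2) * x[0] * x[1], x[1] ** 2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))
```

For x = (1, 2) and y = (3, 0.5), both sides give 16. The kernel evaluates the inner product in the feature space without constructing φ explicitly, which is what the kernel trick exploits.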
KPCA (Kernel Principal Component Analysis) [59] has been used in a real-time IDS composed of two parts: the first part performs online feature extraction, and the second performs classification using the extracted features as input. With an adaptation of the kernel trick (Equation 19), KPCA extracts nonlinear features online. Here, Least Squares Support Vector Machines (LS-SVM) [60] is used as the classifier. SVMs typically solve problems by quadratic programming (QP). Solving a QP problem requires complicated computational effort and has a high memory requirement. LS-SVM overcomes this by solving a set of linear equations instead.
4.1.6. Kernel Function using Fuzzy Approach
This is a fuzzy clustering method in the feature space for the multi-class classification problem. The approach [14] searches for one common cluster containing images of all objects from the original space. In this case, the membership degree of an object image with respect to the fuzzy cluster in the feature space may be viewed as a typicalness degree of the object, i.e., a measure opposed to outlierness. Objects with a low typicalness degree (less than a threshold determined by the user) are considered outliers. It should be noted that modification of the threshold (i.e., of the outlier factor criterion) does not require model reconstruction, which is the case when distance-based algorithms are used. Petrovskiy introduces a fuzzy kernel-based method for real-time network intrusion detection [61]. It involves a kernel-based fuzzy clustering technique. Here, network audit records are vectors with numeric and nominal attributes. These vectors are implicitly mapped by means of a special kernel function into a high dimensional feature space, where a possibilistic clustering algorithm is applied to calculate the measure of typicalness and to discover outliers.
4.1.7. Distance-based outlier detection approach
This approach trains classifiers and computes covariance matrices incrementally. Therefore, the decision whether a given point is an anomaly or not is based only on the previously processed data points. For example, in the RELOADED algorithm [43], for each point in the data set and for each categorical attribute d of that data point, an appropriate classifier is trained. That classifier, in turn, is used to predict the appropriate value of d. If the prediction is wrong, the count of incorrect predictions is incremented. Next, the continuous attributes of the data point are used to incrementally compute the covariance matrix corresponding to the attribute-value pair d, and the cumulative violation score of the data point is incremented accordingly. The anomaly score for RELOADED is given in Equation 9.
VAHD (Variable-length Average Hamming Distance) [62] is a distance-based anomaly detection method. The method operates in two stages: (i) building up a normal variable-length pattern database and (ii) detecting such pattern(s) in a real environment. In the first stage, it builds a normal profile by collecting system calls of some processes of interest and by extracting patterns of interest. In the second stage, this profile is used to monitor system behavior and to calculate the average Hamming distance (AHD) between observed and stored patterns, which determines the strength of an anomalous signal. If it exceeds a user-defined threshold value, a suspicious event is assumed. The method has some advantages, such as high accuracy and real-time detection.
One needs to choose a threshold for the anomaly score in order to discriminate between outliers and normal points. This can be done by incrementally computing the mean and standard deviation of the anomaly scores and flagging any point as an outlier if its score is more than s standard deviations greater than the current mean. Here, deriving the standard deviation is difficult. Also, estimating distance over a combined numeric and categorical attribute domain with proper weightage is a difficult task.
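The incremental thresholding described above can be sketched as follows (an illustrative implementation, not taken from the cited papers; Welford's online algorithm supplies the running mean and standard deviation):

```python
import math

class IncrementalThreshold:
    """Flag a score as an outlier if it exceeds mean + s standard deviations,
    with mean and variance maintained online (Welford's algorithm)."""

    def __init__(self, s=3.0):
        self.s = s
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0          # running sum of squared deviations

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def is_outlier(self, x):
        if self.n < 2:
            return False       # not enough history yet
        std = math.sqrt(self.m2 / (self.n - 1))
        return x > self.mean + self.s * std
```

In a streaming setting, each new anomaly score would first be tested with `is_outlier` and then fed to `update`, so the threshold adapts as the score distribution drifts.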
4.1.8. Signal processing based approach
Signal processing techniques can be applied to identify network anomalies, and to study network characteristics such as routing and congestion. There are two signal processing based approaches: wavelet based approach and cognitive packet network based approach.
A. Wavelet based approach In a wireless sensor network (WSN), a large number of sensor nodes are distributed over a large area. The sensor nodes are endowed with
The Computer Journal, Vol. ??, No. ??, ????
wireless communication capabilities for sensing and processing. In a WSN, an outlier is detected as a measurement that significantly deviates from the normal pattern of the sensed data. In [17], a detailed overview of existing outlier detection techniques for WSNs is given. A survey of wavelet based network anomaly detection approaches in the context of WSNs can be found in [20]. In recent research, another significant work [63] can be found on anomaly detection by identifying outliers based on wavelets. In addition, the research reported in [64, 65, 66, 67, 68] presents significant work dealing with automatic network response to unexpected events and improvement in quality of service (QoS).
B. Cognitive packet network based approach The cognitive packet network (CPN) architecture uses an adaptive routing protocol that attempts to address the stability and reliability of network services by rerouting their traffic as necessary. Network worms are self-replicating and self-propagating malicious applications. They can exploit system vulnerabilities of operating systems and spread through networks, causing significant damage by reducing system performance. Worms can be considered anomalies in network traffic. Research related to detection of attacks, particularly those by network worms, using CPN is found in [69, 70, 71, 72, 73]. Recent research [74, 75] also reports on the development of self-aware computer networks (SAN) based on CPN. Such networks are capable of detecting and reacting to intrusions adaptively.
4.1.9. Density-based outlier detection approach
A density-based approach uses an outlier factor as a measure of being an outlier. In the LOF [28] algorithm, a local outlier factor (LOF) is used as a measure for identifying a sample as an outlier. This local outlier factor is computed from the sample's nearest neighbour objects rather than from the entire data set as a whole. LOF is the mean value of the ratio of the density distribution estimate in the neighbourhood of the object analysed to the distribution densities of its neighbours. LOF is computed using Equation 2.
Another density-based anomaly detection method is found in [76]. An important advantage of this method is its capability to update the normal profile of the system usage pattern dynamically. It models the system usage pattern based on features of program behavior. When the system usage pattern changes, new program behaviours are inserted into old profiles by density-based incremental clustering. It uses DBSCAN [77] to generate the initial clusters for normal program behavior profiles. The profiles are updated by modifying DBSCAN's clusters using incremental clustering. This method supports incremental detection and has a much lower false alarm rate.
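The LOF computation described at the start of this subsection can be written out directly. The sketch below is a simplified illustration of the k-distance, reachability-distance, local reachability density (lrd) and LOF steps from [28], not the authors' code (`math.dist` requires Python 3.8+):

```python
import math

def knn(i, pts, k):
    """k nearest neighbours of point i as (distance, index) pairs."""
    dists = sorted((math.dist(pts[i], pts[j]), j)
                   for j in range(len(pts)) if j != i)
    return dists[:k]

def k_distance(i, pts, k):
    """Distance from point i to its k-th nearest neighbour."""
    return knn(i, pts, k)[-1][0]

def reach_dist(i, j, pts, k):
    """Reachability distance of i with respect to j."""
    return max(k_distance(j, pts, k), math.dist(pts[i], pts[j]))

def lrd(i, pts, k):
    """Local reachability density of point i."""
    neigh = knn(i, pts, k)
    return len(neigh) / sum(reach_dist(i, j, pts, k) for _, j in neigh)

def lof(i, pts, k):
    """LOF: mean ratio of the neighbours' densities to i's own density."""
    neigh = knn(i, pts, k)
    return sum(lrd(j, pts, k) for _, j in neigh) / (len(neigh) * lrd(i, pts, k))

pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5), (10, 10)]
# the isolated point gets a LOF well above 1; cluster members stay near 1
assert lof(5, pts, 3) > lof(4, pts, 3)
```

Points deep inside a cluster get a LOF near 1, while points in regions sparser than their neighbourhoods get a LOF substantially above 1.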
4.2. Unsupervised Approaches
These approaches determine outliers with no prior knowledge of the data. They use a learning approach analogous to unsupervised clustering. The approaches process the data as static distributions, pinpoint the most remote points, and flag them as potential outliers. Once a system possesses a sufficiently large dataset with good coverage, it can compare new items with the existing data.
4.2.1. Statistical Method
These approaches are developed from the viewpoint of statistical learning theory. They attempt to detect outliers in an online process through the online unsupervised learning of a probabilistic model of the information source. A score is given to an input based on the learned model, with a high score indicating a high possibility of being a statistical outlier. An offline process of outlier detection uses batch detection, in which outliers can be detected only after seeing the entire dataset. The online setting is more realistic than the offline one when dealing with the tremendous amount of data in network monitoring. An example of such a method is SmartSifter (SS) [78]. SS is able to detect 79% of intrusions in the top 3%, and 81% of intrusions in the top 5%, of the KDDCUP 1999 dataset.
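SmartSifter itself fits finite mixture models with a discounting learning algorithm; the single-Gaussian sketch below (an assumption-laden simplification, not SmartSifter's actual model) illustrates only the core idea of online scoring with gradual forgetting of old data:

```python
import math

class DiscountedGaussian:
    """Online univariate Gaussian with exponential discounting of old data."""

    def __init__(self, r=0.05):
        self.r = r             # discounting rate: larger values forget faster
        self.mean = 0.0
        self.var = 1.0

    def score(self, x):
        """Negative log-likelihood: a high score marks a statistical outlier."""
        return 0.5 * math.log(2 * math.pi * self.var) \
            + (x - self.mean) ** 2 / (2 * self.var)

    def update(self, x):
        # discounted updates: recent observations dominate the estimates
        d = x - self.mean
        self.mean += self.r * d
        self.var = (1 - self.r) * (self.var + self.r * d * d)
```

Each incoming value is scored against the current model and then used to update it, so the notion of "normal" tracks a drifting data stream.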
4.2.2. Graph Theoretic approach
These approaches generate many classiﬁcation trees from the original data using a tree classiﬁcation algorithm. After the forest of trees is formed, a new object that needs to be classiﬁed is sent down each of the trees in the forest for classiﬁcation. It ﬁnds outliers whose proximities to all other cases in the entire data are generally small. An example is the random forests [21] algorithm. The random forests algorithm generates many classiﬁcation trees from the original data. If cases k and n are in the same leaf of a tree, their proximity is increased by one. Finally, the proximities are normalized by dividing by the number of trees.
In the random forests algorithm, outliers can be defined as the cases whose proximities to other cases in the dataset are generally small, and outlierness can be calculated from these proximities. Let class(k) = j denote that case k belongs to class j, and let prox(n, k) denote the proximity between cases n and k. The average proximity from case n in class j to the rest of the cases in class j is computed as:

P̄(n) = Σ_{class(k) = j} prox²(n, k).    (20)

The raw outlierness of case n is defined as N/P̄(n), where N denotes the number of cases in the dataset. In each class, the median and the standard deviation of
all raw outlierness values are calculated. The median is subtracted from each raw outlierness value, and the result of the subtraction is divided by the standard deviation to get the final outlierness. If the outlierness of a case is large, its proximity is small, and the case is determined to be an outlier. The random forests algorithm provides relatively high detection rates at low false positive rates on the KDDCUP 1999 dataset. For example, the detection rate is 95% when the false positive rate is 1%. When the false positive rate is reduced to 0.1%, the detection rate is still over 60%.
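Given the leaf assignments produced by a trained forest, the proximity and outlierness computations above can be sketched as follows (a toy `leaves` matrix stands in for a real forest's leaf indices; this is an illustration of Equation 20 and the median/standard-deviation normalization, not the original implementation):

```python
import statistics

def proximities(leaves):
    """prox(n, k) = fraction of trees that put cases n and k in the same leaf.
    leaves[t][n] holds the leaf index of case n in tree t."""
    n_trees, n_cases = len(leaves), len(leaves[0])
    prox = [[0.0] * n_cases for _ in range(n_cases)]
    for tree in leaves:
        for n in range(n_cases):
            for k in range(n_cases):
                if tree[n] == tree[k]:
                    prox[n][k] += 1.0 / n_trees
    return prox

def outlierness(prox, labels):
    """Raw outlierness N / P_bar(n) over same-class proximities (Equation 20),
    then per-class normalization by the median and standard deviation."""
    n_cases = len(prox)
    raw = []
    for n in range(n_cases):
        p_bar = sum(prox[n][k] ** 2 for k in range(n_cases)
                    if k != n and labels[k] == labels[n])
        raw.append(n_cases / p_bar if p_bar else float("inf"))
    final = []
    for n in range(n_cases):
        same = [raw[k] for k in range(n_cases) if labels[k] == labels[n]]
        med, std = statistics.median(same), statistics.pstdev(same)
        final.append((raw[n] - med) / std if std else 0.0)
    return final
```

A case that rarely shares a leaf with the rest of its class accumulates small proximities, a small P̄(n), and therefore a large final outlierness.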
4.2.3. Clustering
These approaches attempt to detect both single-point outliers and cluster-based outliers, and can assign each outlier a degree of being an outlier. LDBSCAN (local-density-based spatial clustering of applications with noise) [79] is a cluster-based outlier detection algorithm. LDBSCAN randomly selects one core point which has not been clustered, and then retrieves all points that are local-density reachable from the chosen core point to form a cluster. It does not stop until there is no unclustered core point. Cluster-based outlier detection has been applied to the Backbone Anomaly Detection System for CSTNET, an Internet service provider for all the institutes of the Chinese Academy of Sciences [79]. The Backbone Anomaly Detection System continuously monitors the input and output throughput of about 300 network nodes of CSTNET. Each node generates its average throughput record every five minutes, so the Backbone Anomaly Detection System checks the node state 300 × 12 = 3600 times each hour. Under normal circumstances, the throughput of a certain node forms a cluster. During the period of an abnormal event, however, the throughput exhibits temporal locality, i.e., it forms a new cluster which is different from the historical one. Using cluster-based outlier detection, the system generates 10 alerts per hour. According to the feedback of the network administrators at CSTNET, cluster-based outlier detection generates accurate alerts.

We compare the outlier detection approaches discussed in the previous subsections based on parameters such as detection approach (supervised or unsupervised), method (statistical, proximity based, kernel function), input data (whether training data is required), proximity measure used (e.g., Euclidean distance), whether the method is parametric, and the attribute types of the data. See Table 3 for how the methods compare against one another. Supervised methods require training data, but unsupervised methods do not. Except for a few, most supervised methods use numeric data. However, unsupervised methods are capable of handling mixed-type high dimensional data. All the methods in Table 3 can handle high dimensional data.
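LDBSCAN combines DBSCAN-style cluster expansion with local densities; as a simpler, self-contained illustration of the cluster-based idea (points that join no cluster are reported as outliers), here is a plain DBSCAN sketch with assumed parameters `eps` (neighbourhood radius) and `min_pts` (core-point threshold):

```python
import math

def dbscan_outliers(pts, eps, min_pts):
    """Plain DBSCAN; points left in no cluster are labelled -1 (outliers)."""
    labels = [None] * len(pts)

    def region(i):
        return [j for j in range(len(pts)) if math.dist(pts[i], pts[j]) <= eps]

    cid = 0
    for i in range(len(pts)):
        if labels[i] is not None:
            continue
        neigh = region(i)
        if len(neigh) < min_pts:
            labels[i] = -1                 # provisionally noise
            continue
        labels[i] = cid
        seeds = list(neigh)
        while seeds:
            j = seeds.pop()
            if labels[j] == -1:
                labels[j] = cid            # noise rescued as a border point
            if labels[j] is not None:
                continue
            labels[j] = cid
            j_neigh = region(j)
            if len(j_neigh) >= min_pts:    # expand only through core points
                seeds.extend(j_neigh)
        cid += 1
    return labels
```

Applied to throughput records, points that fall outside every dense cluster (label -1) would be the candidates raised as alerts.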
5. RESEARCH ISSUES AND CHALLENGES
Outlier detection is an extremely important problem with direct application in various domains. It involves exploring unseen spaces. A key observation in outlier detection is that it is not a well-formulated problem. The nature of the data, the nature of the outliers, the constraints and the assumptions collectively constitute the problem formulation. Some outlier detection techniques are developed in a generic fashion and can be ported to various application domains, while others directly target a particular application domain. In many cases, the data structures used for faster detection, the proximity measure or anomaly detection formula used, and the capability of handling higher dimensional (possibly mixed-type) data for any distribution pattern dictate which method outperforms the others. In outlier detection, the developer should select an algorithm that is suitable for their data set in terms of the correct distribution model, the correct attribute types, scalability, speed, any desired incremental capabilities to allow new exemplars to be handled, and the modelling accuracy. The developer should also consider which of the fundamental approaches is suitable for their problem. The distance-based techniques do not make assumptions about the data since they compute the distance between each pair of points. The distance measuring techniques can be based on numeric, categorical or mixed-type data. It is difficult to have a single proximity measure that can handle numeric, categorical or mixed-type attributes for any dimensionality and for any number of instances. Datasets that consist of one type of attribute, i.e., only numerical attributes or only categorical attributes, can be directly mapped into numerical values. However, the mapping of categorical attributes to numerical attributes is not a straightforward process and greatly depends on the mapping used.
On the other hand, density-based methods estimate the density distribution of the data points based on attributes or parameters and then identify outliers as those lying in regions of low density. As with distance-based methods, the design of an appropriate density-based outlier detection method is a challenging task. Density-based methods are also based on distance computations, which can be inappropriate for categorical data, and again are not straightforward. In addition, high dimensional data is almost always sparse, which creates problems for density-based methods. If indiscernible or indistinguishable objects occur, rough set techniques can provide good solutions for outlier detection. For objects with uncertainties, fuzzy-rough techniques may be suitable for outlier detection. In general, soft computing [80] based approaches allow tolerance for imprecision and uncertainty to achieve tractability, robustness, and low solution cost.
TABLE 3. Outlier Detection Methods: A General Comparison

| Approach | Author, Year | Methods Used | Method Class | Training Dataset | Proximity Measure | Parametric / non-Parametric / Both | Numeric / Categorical / Mixed Type |
|---|---|---|---|---|---|---|---|
| Supervised | Hadi, 1992 | Regression Analysis [52] | Statistical | Training Data | No | Parametric | Numeric |
| Supervised | Ramaswamy, 2000 | Partition based [24] | Proximity based | No | Euclidean/Mahalanobis | No | Numeric |
| Supervised | Petrovskiy, 2003 | Fuzzy Approach [14] | Kernel Function | No | Membership Degree | non-Parametric | Numeric |
| Supervised | Pawlak, 1995 | Rough set [29] | Soft computing | Training Data | Membership Function | non-Parametric | Numeric |
| Supervised | Quinlan, 1993 | C4.5 [54] | Decision Tree | Training Data | Information entropy | Parametric | Numeric |
| Supervised | Matthew, 2005 | RELOADED [43] | Distance based | Training Data | Euclidean Distance | non-Parametric | Mixed Type |
| Supervised | Kriegel, 2000 | LOF [28] | Density based | Training Data | LOF | Parametric | Mixed Type |
| Unsupervised | Kenji, 2004 | SmartSifter [78] | Statistical | No | No | Both | Mixed Type |
| Unsupervised | Zhang, 2006 | Random forests [21] | Graph Theoretic | No | Gini Index | Parametric | Mixed Type |
| Unsupervised | Duan, 2008 | LDBSCAN [79] | Clustering | No | Euclidean distance | non-Parametric | Numeric |
In addition, for all types of attacks, not all features are equally predictive [81]. Considering various factors, one can conclude that a combined approach, based on distance, density or soft computing, can provide the required robustness and scalability for outlier detection. Here, preprocessing of the data set is essential to identify responsible features among categorical attributes. For continuous attributes, a distance-based approach is suitable, but the right threshold value is needed for differentiation of data points. Thus, a faster incremental method capable of handling high dimensional mixed-type data with a high detection rate and reduced false positives is still called for. The major issues in outlier detection are as follows.
• Defining a normal region which encompasses every possible normal behavior is very difficult. Oftentimes normal behavior evolves over time, and an existing notion of normal behavior may not be sufficiently representative in the future.
• The exact notion of an outlier is different for different application domains. Every application domain imposes a set of requirements and constraints, giving rise to a specific problem formulation for outlier detection.
• Often the data contains noise similar to the actual outliers, which is hence difficult to distinguish and remove.
• Availability of labelled data for training/validation is often a major issue when developing an outlier detection technique.
• Typically, soft computing embraces several computational intelligence methodologies, including artificial neural networks, fuzzy logic, evolutionary computation, and probabilistic computing. These methods are neither independent of one another nor in competition with one another. Rather, they work in a cooperative and complementary way. In this context, establishing an appropriate soft computing method for outlier detection is a challenging task.
• Nowadays, most network attacks are distributed. To handle such attacks, a distributed outlier detection technique is essential.
• Among existing outlier detection approaches, none is capable of handling the outlier detection problem individually to a satisfactory level. In particular, to achieve a high detection rate with a low false positive alarm rate in the network domain, a cost-effective ensemble approach may be more suitable.
• Several scoring functions have been proposed over the decades for numeric, categorical or mixed type data. However, the design of an appropriate scoring function that can handle all these types of data in the presence of noise still remains a challenge.
• Often, it has been observed that supervised or unsupervised anomaly detection based on clustering approaches alone cannot handle all network traffic data effectively. In such cases, for time-window based real-time attack detection, integration of an appropriate outlier detection technique, either as a post-processing task or in support of simultaneous processing, may be useful.
6. CONCLUSION
This paper has attempted to establish the significance of outlier detection in anomaly identification. A comprehensive survey of various distance-based, density-based and soft computing based outlier detection techniques has been provided in this paper. In addition, we report on and analyse various outlier detection techniques under supervised and unsupervised approaches. Based on our review, we observe that the notion of an outlier is different for different application domains. Thus, development of an effective outlier detection technique for mixed-type and evolving network traffic data, especially in the presence of noise, is a challenging task. For future research, outlier detection methods should be tested on real network data collected using tools such as flow-tools [82] and datasets like the MITRE [50] data set.
REFERENCES
[1] Liu, H., Shah, S., and Jiang, W. (2004) On-line outlier detection and data cleaning. Computers and Chemical Engineering, 28, 1635–1647. [2] Mitchell, T. M. (1997) Machine Learning. McGraw Hill, Inc., New York, NY, USA. [3] Vapnik, V. N. (1995) The nature of statistical learning theory. Springer-Verlag, New York, USA.
[4] Gelenbe, E. (2007) Dealing with software viruses: A biological paradigm. Information Security Technical Report, 12(4), 242–250. [5] Heady, R., Luger, G., Maccabe, A., and Servilla, M. (1990) The architecture of a network level intrusion detection system. Technical Report NM 87131. Computer Science Department, University of New Mexico, Albuquerque, New Mexico. [6] Chen, R. C., Cheng, K. F., and Hsieh, C. F. (2009) Using rough set and support vector machine for network intrusion detection. International Journal of Network Security & Its Applications (IJNSA), 1(1), 1–13.
[7] Roesch, M. (1999) Snortlightweight intrusion detection for networks. Proceedings of the 13th Conference on Systems Administration (LISA99), Seatlle, WA, USA November 712, pp. 229–238. USENIX, Seattle, Washington. [8] Daniel, B., Julia, C., Sushil, J., and Ningning, W. (2001) Adam: a testbed for exploring the use of data mining in intrusion detection. SIGMOD Rec., 30, 15–
24.
[9] Lee, W. and Stolfo, S. J. (1998) Data mining approaches for intrusion detection. Proceedings of the 7th conference on USENIX Security Symposium  Volume 7, San Antonio, Texas, USA, Jan., pp. 6–6. USENIX. [10] Chandola, V., Banerjee, A., and Kumar, V. (2009) Anomaly detection : A survey. ACM Computing Surveys (CSUR), 41, 15:1–58. [11] Padmanabhan, J. and Easwarakumar, K. S. (2009) Traﬃ c engineering based attack detection in active networks. Lecture Notes in Computer Science (LNCS), 5408, 181–186. [12] Lazarevic, A., Ertoz, L., Kumar, V., Ozgur, A., and Srivastava, J. (2003) A comparative study of anomaly detection schemes in network intrusion detection. Proceedings of the 3rd SIAM International Conference on Data mining, San Francisco, CA, May 13, 2003, pp. 25–36. SIAM.
[13] Hodge, V. and Austin, J. (2004) A survey of outlier detection methodologies. Artiﬁcial Intelligence, 22,
85–126.
[14] Petrovskiy, M. I. (2003) Outlier detection algorithms
in data mining systems. Programming and Computer
Software, 29, 228–237.
[15] Tang, J., Chen, Z., Fu, A. W., and Cheung, D. W. (2006) Capabilities of outlier detection schemes in large datasets, framework and methodologies. Knowledge and Information Systems, 11, 45–84. [16] Chandola, V., Banerjee, A., and Kumar, V. (2007) Outlier detection: A survey. Technical Report TR 07-017. Dept of CSE, University of Minnesota, USA. [17] Zhang, Y., Meratnia, N., and Havinga, P. (2010) Outlier detection techniques for wireless sensor networks: A survey. IEEE Communications Surveys & Tutorials, 12(2), 159–170.
[18] Bhuyan, M. H., Bhattacharyya, D. K., and Kalita, J. K. (2011) RODD: An effective reference based outlier detection technique for large datasets. LNCS-CCIS, 133, Part I, 76–84. [19] Ng, B. (2006) Survey of anomaly detection methods. Technical Report UCRL-TR-225264. Lawrence Livermore National Laboratory, University of California, California, USA. [20] Kaur, G., Saxena, V., and Gupta, J. P. (2010) Anomaly detection in network traffic and role of wavelets. Proc. of 2nd IEEE International Conference on Computer Engineering and Technology (ICCET 2010), Chengdu, China, 16-18 April, pp. V7: 46–51. IEEE.
[21] Zhang, J. and Zulkernine, M. (2006) Anomaly based network intrusion detection with unsupervised outlier detection. IEEE International Conference on Communications (ICC), June, pp. 2388–2393. IEEE Xplore, Istanbul. [22] Hawkins, D. (1980) Identiﬁcation of outliers. Chapman and Hall, London. [23] Knorr, E. M. and Ng, R. T. (1998) Algorithms for mining distancebased outliers in large datasets. Proceedings of the 24th Int’l. Conf. on Very Large Data Bases, New York USA, Sep., pp. 392–403. Morgan
Kaufmann.
[24] Ramaswamy, S., Rastogi, R., and Shim, K. (2000) Efficient algorithms for mining outliers from large data sets. ACM SIGMOD Record, 29, 427–438.
[25] Knorr, E. M. and Ng, R. T. (1999) Finding intensional knowledge of distancebased outliers. Proceedings of the 25th International Conference on Very Large Data Bases, VLDB’99, Edinburgh, Scotland, UK, 710 Sep., pp. 211–222. Morgan Kaufmann Publishers Inc. San Francisco, CA, USA. [26] Breunig, M. M., Kriegel, H., Ng, R. T., and Sander,
J. (2000) Lof: Identifying densitybased local outliers.
Proceedings of the 2000 ACM SIGMOD international
conference on management of data, Dallas, Texas, United States, May, pp. 93–104. New York: ACM
Press.
[27] Koufakou, A. and Georgiopoulos, M. (2010) A fast outlier detection strategy for distributed high dimensional data sets with mixed attributes. Data Mining and Knowledge Discovery, 20, 259–289.
[28] Breunig, M. M., Kriegel, H. P., Ng, R. T., and Sander,
J. (2000) Lof: Identifying densitybased local outliers.
ACM SIGMOD, 29, 93–104. [29] Pawlak, Z., GrzymalaBusse, J., and Ziarko, W. (1995) Rough sets. Communications of the ACM, 38, 88–95. [30] Jiangab, F., Suia, Y., and Caoa, C. (2008) A rough set approach to outlier detection. International Journal of General Systems, 37, 519–536. [31] Sarkar, M. and Yegnanarayana, B. (1998) Fuzzy rough membership functions. Proceedings of the IEEE International Conference on Systems, Man and Cybernetics, San Diego, CA , USA, Oct., pp. 2028– 2033. IEEE Xplore. [32] Bezdek, J. C. and Pal, S. K. (1992) Fuzzy Model for Pattern Recognition. Eds. IEEE Press, Newyork USA. [33] Duboia, D. and Prade, H. (1990) Roughfuzzy sets and fuzzyrough sets. International Journal of General Systems, 17, 191–209. [34] Barnett, V. and Lewis, T. (1994) Outliers in Statistical Data. John Wiley, Chichester, New York, USA.
[35] Teodoro, P. G., Verdejo, J. D., Fernandez, G. M., and Vazquez, E. (2009) Anomaly-based network intrusion detection: Techniques, systems and challenges. Computers & Security, 28, 18–28.
[36] Song, X., Wu, M., Jermaine, C., and Ranka, S. (2007) Conditional anomaly detection. IEEE Transactions on Knowledge and Data Engineering, 19, 631–645. [37] Tavallaee, M., Stakhanova, N., and Ghorbani, A. A. (2010) Towards credible evaluation of anomalybased intrusion detection methods. IEEE Transactions on System, Man, and Cybernetics Part C: Applications and Reviews, 40(5), 516–524. [38] Cha, S.H. (2007) Comprehensive survey on dis tance/similarity measures between probability density functions. International Journal of Mathematical Mod els and Methods in Applied Science, 1 (4) , 300–307. [39] Boriah, S., Chandola, V., and Kumar, V. (2008) Sim ilarity measures for categorical data: A comparative evaluation. Proceedings of the 8th SIAM International Conference on Data Mining, Atlanta, Georgia, USA, Apr., pp. 243–254. Society for Industrial and Applied Mathematics(SIAM). [40] Settles, B. (2009) Active learning literature survey. Computer Sciences Technical Report 1648. University of Wisconsin–Madison. [41] Lippman, R. P., Fried, D. J., Graf, I., Haines, J., Kendall, K., McClung, D., Weber, D., Wyschogrod,
S. W. D., Cunningham, R. K., and Zissman, M. A.
(2000) Evaluating intrusion detection systems: The 1998 darpa o ﬀline intrusion detection evaluation. Pro ceedings of DARPA Information Survivability Confer ence and Exposition, 2000. DISCEX ’00, Hilton Head, SC , USA, 2527 Jan., pp. 12–26 Vol 2. IEEE Xplore. [42] Ghoting, A., Otey, M. E., and Parthasarathy, S. (2004) Loaded: Linkbased outlier and anomaly detection in evolving data sets. Proceedings of the 4th IEEE International Conference on Data Mining, Brighton, UK, Nov., pp. 387–390. IEEE Computer Society.
[43] Otey, M. E., Parthasarathy, S., and Ghoting, A. (2005) Fast lightweight outlier detection in mixedattribute data. Technical Report OSUCISRC 6/05TR43. Department of Computer Science and
Engineering, The Ohio State University, Ohio, United
States.
[44] Hawkins, S., He, H., Williams, G., and Baxter,
R. (2002) Outlier detection using replicator neural
networks. Proceedings of the 4th International Conference on Data Warehousing and Knowledge Discovery, London, UK, Sep., pp. 170–180. Springer
Verlag.
[45] Ye, N. (2000) A markov chain model of temporal behavior for anomaly detection. Proceedings of the
2000 IEEE Workshop on Information Assurance and
Security, United States Military Academy, West Point, NY, June, pp. 171–174. IEEE Xplore.
[46] Puketza, N., Chung, M., Olsson, A. R., and Mukherjee,
B. (1997) A software platform for testing intrusion
detection systems. IEEE Software, 14(5), 43–51. [47] Debar, H., Dacier, M., Wespi, A., and Lampart, S. (1998) An experimentation workbench for intrusion detection systems. Technical report. RZ 2998(93044) Research Division, IBM, New York, NY. [48] McHugh, J. (2000) Testing intrusion detection systems:
A critique of the 1998 and 1999 darpa intrusion detection system evaluations as performed by lincoln laboratory. ACM Transactions on Information and System Security, 3(4), 262–294. [49] Durst, R., Champion, T., Witten, B., Miller, E., and Spagnuolo, L. (1999) Testing and evaluating computer intrusion detection systems. Communication ACM, 42(7), 53–61. [50] Aguirre, S. J. and Hill, W. H. (1997) Intrusion detection ﬂyoﬀ : Implications for the united states navy. Technical report. Sept. 1997, MITRE, MTR 97W096 McLean, Virginia.
[51] Lippmann, R. and Kukolich, L. (1993). Lnknet user’s guide. MIT Lincoln Laboratory. [52] Hadi, A. S. (1992) A new measure of overall potential inﬂuence in linear regression. Computational Statistics & Data Analysis, 14, 1–27.
[53] Tian, W. and Liu, J. (2009) Intrusion detection quantitative analysis with support vector regression and particle swarm optimization algorithm. Proceedings of the 2009 International Conference on Wireless Networks ICWN'09, Shanghai, China, 28-29 December, pp. 133–136. IEEE Xplore. [54] Salzberg, S. L. (1994) C4.5: Programs for machine learning. Machine Learning, 16, 235–240. [55] Premaratne, U., Ling, C., Samarabandu, J., and Sidhu, T. (2009) Possibilistic decision trees for intrusion detection in IEC61850 automated substations. Proceedings of the 2009 International Conference on Industrial and Information Systems (ICIIS), Sri Lanka, Dec., pp. 204–209. IEEE Xplore. [56] Zhuang, Z., Li, Y., and Chen, Z. (2009) PAIDS: a proximity-assisted intrusion detection system for unidentified worms. Proceedings of the 33rd Annual IEEE International Computer Software and Applications Conference, Seattle, Washington, 20-24 July, pp. 392–399. IEEE. [57] Scholkopf, B. and Smola, A. J. (2000) Learning with Kernels. The MIT Press, Cambridge, Massachusetts, London, England.
[58] Sewell, M. (2009). Kernel methods. Department of Computer Science, University College London. [59] Kim, B. and Kim, I. (2006) Kernel based intrusion detection system. Proceedings of the 4th Annual ACIS International Conference on Computer and Information Science (ICIS’05), Jeju Island, South Korea, 1616 July, pp. 13–18. IEEE Computer Society. [60] Suykens, J. A. K. and Vandewalle, J. (1999) Least squares support vector machine classiﬁers. Neural Processing Letters, 9, 293–300. [61] Petrovskiy, M. (2003) A fuzzy kernelbased method for realtime network intrusion detection. Proceeding of the 3rd International Workshop of Innovative Internet Community Systems (IICS), Leipzig, Germany, 1921 June, pp. 189–200. Springer. [62] Du, Y., Zhang, R., and Guo, Y. (2010) A useful anomaly intrusion detection method using variable length patterns and average hamming distance. Journal of Computers, 5(8), 1219–1226. [63] Hey, L. and Gelenbe, E. (2009) Adaptive packet pri oritisation for wireless sensor networks. Procedings of Next Generation Internet Networks, Aveiro, Portugal, 13 July, pp. 1–7. IEEE Xplore. [64] Lent, R., Abdelrahman, O. H., Gorbil, G., and Gelenbe, E. (2010) Fast message dissemination for emergency communications. Proceedings of PerCom Workshop on Pervasive Networks for Emergency Management (PerNEM’10), Mannheim, Germany, March 29April 02 2010, pp. 370–375. IEEE, NY, USA. [65] Ngai, E., Gelenbe, E., and Humber, G. (2009) Informationaware traﬃc reduction for wireless sensor networks. Proceedings of the 34th Annual IEEE Conference on Local Computer Networks (LCN 2009) October 2023, Zurich, Switzerland, pp. 451–458. IEEE Zurich, Switzerland. [66] Gelenbe, E. and Ngai, E. (2008) Adaptive qos routing for signiﬁcant events in wireless sensor networks. Proceedings of the 5th IEEE International Conference on Mobile AdHoc and Sensor Systems (MASS’08), Atlanta, GA, USA, 29 September 2 October 2008, pp. 410–415. IEEE, New York, NY, USA. [67] Gelenbe, E. and Ngai, E. 
(2010) Adaptive random re-routing for differentiated QoS in sensor networks. The Computer Journal, 53(7), 1052–1061. [68] Gelenbe, E. and Ngai, E. (2008) Adaptive random re-routing in sensor networks. Proceedings of the Annual Conference of ITA (ACITA '08), September 16-18, London, UK, pp. 348–349. Imperial College London, UK. [69] Sakellari, G. and Gelenbe, E. (2010) Demonstrating cognitive packet network resilience to worm attacks. Proceedings of the 17th ACM Conference on Computer and Communications Security (CCS 2010), Chicago, IL, USA, 4-8 October 2010, pp. 636–638. ACM, New York, NY, USA. [70] Sakellari, G. and Gelenbe, E. (2009) Adaptive resilience of the cognitive packet network in the presence of network worms. Proceedings of the NATO Symposium on C3I for Crisis, Emergency and Consequence Management, Bucharest, Romania, 11-12 May, pp. 16:1–16:14. NATO Research & Technology Organisation.
[71] Sakellari, G., Hey, L., and Gelenbe, E. (2008) Adaptability and failure resilience of the cognitive packet network. Presented at the Demo Session of INFOCOM 2008, Phoenix, AZ, USA, 15-17 April. IEEE, New York, NY, USA. [72] Oke, G., Loukas, G., and Gelenbe, E. (2007) Detecting denial of service attacks with Bayesian classifiers and the random neural network. Proceedings of FUZZ-IEEE 2007, London, UK, 23-26 July, pp. 1964–1969. IEEE, New York, NY, USA. [73] Gelenbe, E. and Loukas, G. (2007) A self-aware approach to denial of service defence. Computer Networks, 51(5), 1299–1314. [74] Gelenbe, E. (2011) Self-aware networks. McGraw-Hill 2011 Yearbook of Science & Technology, to appear, Manuscript ID YB11-0175, 2011. [75] Gelenbe, E. (2009) Steps towards self-aware networks. Communications of the ACM, 52(7), 66–75. [76] Ren, F., Hu, L., Liang, H., Liu, X., and Ren, W. (2008) Using density-based incremental clustering for anomaly detection. Proceedings of the 2008 International Conference on Computer Science and Software Engineering, Wuhan, Hubei, China, 12-14 December, pp. 986–989. IEEE Computer Society. [77] Ester, M. and Kriegel, H. (1996) A density-based algorithm for discovering clusters in large spatial databases with noise. Proceedings of the 2nd International Conference on Knowledge Discovery and Data Mining, Portland, Oregon, August 2-4, pp. 226–231. AAAI Press. [78] Yamanishi, K., Takeuchi, J., Williams, G., and Milne, P. (2004) On-line unsupervised outlier detection using finite mixtures with discounting learning algorithms. Data Mining and Knowledge Discovery, 8, 275–300. [79] Duan, L., Xu, L., Liu, Y., and Lee, J. (2008) Cluster-based outlier detection. Annals of Operations Research, 168, 151–168. [80] Zadeh, L. A. (1994) Fuzzy logic, neural networks, and soft computing. Communications of the ACM, 37, 77–84. [81] Kayacik, H. G., Zincir-Heywood, A. N., and Heywood, M. I.
(2005) Selecting features for intrusion detection: A feature relevance analysis on KDD 99 intrusion detection datasets. Proceedings of the 3rd Annual Conference on Privacy, Security and Trust