Abstract
Detecting the vulnerabilities in a network plays a vital role in preventing intrusions into a system. This paper describes a cluster-based mechanism for detecting vulnerabilities and, in turn, intrusions. The network is analyzed and a graph representing the entire network is constructed. This graph is passed to a clustering algorithm that clusters the nodes. Since clustering proceeds by eliminating edges, neither the number of clusters nor their shape needs to be specified before processing. This process helps in sorting out the outliers: the nodes that are most vulnerable to attack. Analysis shows that our process has an accuracy rate of 0.91375.
International Journal of Computational Intelligence and Information Security, April 2013, Vol. 4 No. 4,
ISSN: 1837-7823
2. Related Works
In general, anomaly detection focuses mainly on monitoring and recording user behavior, which helps in distinguishing unusual behavior from normal behavior. Any behavior that deviates from the norm is labeled as an anomaly or intrusion. Typical conventional anomaly detection studies [1, 2, 3] have used statistical approaches. Statistical methods have the strong point that the size of a profile for real-time intrusion detection can be minimized. However, statistical operators alone cannot provide the best results; furthermore, false positives cannot be avoided, and statistical methods cannot handle infrequent but periodically occurring activities.
Leonid Portnoy [4] introduced a clustering algorithm to detect both known and new intrusion types without the need to label the training data, using a simple variant of single-linkage clustering to separate intrusion instances from normal instances. Though this algorithm overcomes the dependency on a predefined number of clusters, it requires a predefined clustering width W, which is not always easy to find. The assumption that "the normal instances constitute an overwhelmingly large portion (>98%)" is also too strong. In [5], Qiang Wang introduced Fuzzy-Connectedness Clustering (FCC) for intrusion detection, based on the concept of fuzzy connectedness introduced by Rosenfeld in 1979. FCC can detect known intrusion types and their variants, but it is difficult to find a generally accepted definition of the fuzzy affinity used by FCC.
3. System Architecture
The process of intrusion detection is performed as described in Figure 1.
The initial phase deals with creating a graph for the subsequent processing: every system in the network is considered a node, and every connection between systems is marked as an edge. A complete graph is created, along with weight details, for future analysis. The graph is then analyzed using the weight values, and all related nodes are grouped together to form clusters [10]. After the formation of clusters, cluster analysis [6] is performed, in which every cluster is checked for outlying items, i.e. items that are farthest from the cluster centre. These are isolated and considered the vulnerable nodes. After this process, the nodes are monitored, and if traffic anomalies are detected, the node is labeled as an intruder.
The degree of node $i$ is
$$\deg_i = \sum_{j=1}^{n} a_{ij}$$
A measure that evaluates the clustering tendency in graphs is the clustering coefficient. It is based on the analysis of three-node cycles around a node $i$. A formulation of this measure for unweighted graphs is given by
$$c_i = \frac{2 \sum_{\substack{j=1 \\ j \neq i}}^{n-1} \sum_{\substack{k=j+1 \\ k \neq i}}^{n} a_{ij}\, a_{jk}\, a_{ik}}{\deg_i\,(\deg_i - 1)}$$
Note that $\deg_i$ indicates the total number of neighbors of node $i$. The denominator measures the maximum possible number of edges that could exist between the vertices within the neighborhood.
This measure evaluates the tendency of the nearest neighbors of node i to be connected to each other.
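The degree and clustering-coefficient formulas above can be evaluated directly from an adjacency matrix. A minimal sketch, with a made-up four-node graph as input:

```python
# Clustering coefficient of node i in an unweighted graph:
# deg_i = sum_j a[i][j], and c_i = 2 * (triangles through i) / (deg_i * (deg_i - 1)).

def degree(a, i):
    return sum(a[i])

def clustering_coefficient(a, i):
    n = len(a)
    deg = degree(a, i)
    if deg < 2:
        return 0.0  # fewer than two neighbours: no triangle possible
    triangles = 0
    for j in range(n):
        if j == i:
            continue
        for k in range(j + 1, n):
            if k == i:
                continue
            # a_ij * a_jk * a_ik is 1 exactly when i, j, k form a triangle
            triangles += a[i][j] * a[j][k] * a[i][k]
    return 2.0 * triangles / (deg * (deg - 1))

# Example: node 0 is connected to 1, 2, 3; the edge (1, 2) closes one triangle.
A = [
    [0, 1, 1, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
]
print(clustering_coefficient(A, 0))  # 1 triangle of 3 possible -> 0.333...
```

Here node 0 has degree 3, so three edges could exist among its neighbours, and only one does.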
After constructing the graph, clustering is performed. The process of clustering divides the graph into several subgraphs. Clustering [7][8] is performed by providing a threshold value, which is based on the k-distance. For a positive integer k, the k-distance of an object p, written k-distance(p), is defined as the distance d(p, o) between p and an object o ∈ D such that:
(i) for at least k objects o' ∈ D \ {p}, d(p, o') ≤ d(p, o), and
(ii) for at most k − 1 objects o' ∈ D \ {p}, d(p, o') < d(p, o).
Given the k-distance of p, the k-distance neighborhood of p contains every object whose distance from p is not greater than the k-distance:
N_{k-distance(p)}(p) = { q ∈ D \ {p} | d(p, q) ≤ k-distance(p) }
These objects q are called the k-nearest neighbors of p.
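The two definitions above can be sketched directly. A minimal example over 1-D points with absolute-difference distance; the metric and the data are assumptions, since the text does not fix either:

```python
# k-distance and the k-distance neighbourhood of an object p,
# following the definitions in the text.

def k_distance(data, p, k):
    # distance from p to its k-th nearest neighbour in D \ {p}
    return sorted(abs(p - o) for o in data if o != p)[k - 1]

def k_distance_neighborhood(data, p, k):
    # every object whose distance from p is not greater than k-distance(p)
    kd = k_distance(data, p, k)
    return [o for o in data if o != p and abs(p - o) <= kd]

data = [1, 2, 3, 7, 12]
print(k_distance(data, 1, 2))               # 2
print(k_distance_neighborhood(data, 1, 2))  # [2, 3]
```

For p = 1 the sorted neighbour distances are 1, 2, 6, 11, so the 2-distance is 2 and the neighbourhood holds the objects 2 and 3.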
Given the k-distance of p, consider p as the center of a circle with radius k-distance(p); all objects in this circle form the k-distance neighborhood of p, and p is the centre of mass of this circle. The Local Deviation Coefficient (LDC) is then defined in terms of the Local Deviation Rate (LDR) as:
$$LDC_k(p) = \sum_{o \in N_{k\text{-distance}(p)}(p)} LDR_k(o)$$
Intuitively, LDC is the sum of the LDR over the k-distance neighborhood of p. The coefficient reflects the degree of dispersion of an object's neighborhood: a greater LDC value means a higher probability of the object being an outlier, while a low LDC value indicates that the density of the object's neighborhood is high, so it is unlikely to be an outlier. All probable outliers are shortlisted in this phase. After the completion of this phase comes the monitoring phase: all shortlisted nodes that are considered vulnerable to attacks are monitored for attacks or abnormal activities, and the traffic flow to and from these nodes is monitored. If any abnormalities are discovered, a cleanup is performed on the node to remove the vulnerabilities.
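The shortlisting step can be sketched as follows. The text defines LDC as a sum of LDR over the k-distance neighbourhood but does not spell out LDR itself, so as a stand-in (an assumption, not the paper's definition) LDR here is taken to be the distance from p to each neighbour, making LDC a measure of how dispersed the neighbourhood is around p:

```python
# Shortlist outlier candidates by LDC: the object with the largest LDC is
# the most probable outlier. LDR is a stand-in (distance from p to each
# neighbour); the paper's own LDR definition is not given in this excerpt.

def k_distance(data, p, k):
    return sorted(abs(p - o) for o in data if o != p)[k - 1]

def neighborhood(data, p, k):
    kd = k_distance(data, p, k)
    return [o for o in data if o != p and abs(p - o) <= kd]

def ldc(data, p, k):
    # assumed LDR: distance from p to the neighbour
    return sum(abs(p - o) for o in neighborhood(data, p, k))

data = [1, 2, 3, 4, 20]   # 20 sits far from the dense cluster
scores = {p: ldc(data, p, 2) for p in data}
shortlist = [max(scores, key=scores.get)]
print(shortlist)  # [20]
```

The isolated point 20 accumulates large neighbour distances (17 + 16 = 33), while points inside the dense cluster score 2 or 3, so only 20 is shortlisted for monitoring.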
5. Result Analysis
The current process is evaluated with various sets of data containing different numbers of data items, and the obtained values are recorded in a confusion matrix.

Table 1: Confusion Matrix

                    Predicted Positive   Predicted Negative
Actual Positive     TP                   FN
Actual Negative     FP                   TN

where TP = true positive, FN = false negative, FP = false positive, and TN = true negative.
The two performance measures, sensitivity and specificity, are used for evaluating the results.
Sensitivity is the accuracy on the positive instances (equivalent to the True Positive Rate, TPR), and specificity is the accuracy on the negative instances (equivalent to the True Negative Rate, TNR).
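These two measures follow directly from the counts in Table 1. A small sketch with illustrative (made-up) counts:

```python
# Sensitivity (TPR) and specificity (TNR) from confusion-matrix counts.

def sensitivity(tp, fn):
    # fraction of actual positives that were predicted positive
    return tp / (tp + fn)

def specificity(tn, fp):
    # fraction of actual negatives that were predicted negative
    return tn / (tn + fp)

tp, fn, fp, tn = 90, 10, 15, 85   # illustrative counts only
print(sensitivity(tp, fn))  # 0.9
print(specificity(tn, fp))  # 0.85
```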
From Figure 5, we can see that during the initial stages, when the number of entries is minimal, the plots fall at the points (0, 0) and (0, 1). As the number of entries increases, the plotted points cluster towards the northwest corner, above the diagonal. This shows that the process provides a high level of accuracy, almost meeting the perfect standard of (0, 1).
[Figure: ROC plot of TPR (y-axis) against FPR (x-axis)]
Figure 5: ROC Plot
Precision is the fraction of retrieved instances that are relevant, while recall is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance; hence we can use these measures to assess the relevance of the readings.
[Figure: precision (y-axis) against recall (x-axis)]
Figure 6: PR Curve
Usually, precision and recall [9] scores are not discussed in isolation. Instead, either values for one measure are compared at a fixed level of the other, or both are combined into a single measure such as the F-measure, the weighted harmonic mean of precision and recall:
$$F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$
This is also known as the $F_1$ measure. The general formula for a non-negative real $\beta$ is:
$$F_\beta = (1 + \beta^2) \cdot \frac{\text{precision} \cdot \text{recall}}{\beta^2 \cdot \text{precision} + \text{recall}}$$
Two other commonly used F-measures are the $F_2$ measure, which weights recall higher than precision, and the $F_{0.5}$ measure, which weights precision higher than recall. The F-measure was derived by van Rijsbergen (1979) so that $F_\beta$ "measures the effectiveness of retrieval with respect to a user who attaches $\beta$ times as much importance to recall as precision."
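The general formula can be computed in a few lines; the precision and recall values below are illustrative only:

```python
# F_beta = (1 + beta^2) * P * R / (beta^2 * P + R).
# beta = 1 gives the plain harmonic mean; beta = 2 weights recall higher,
# beta = 0.5 weights precision higher.

def f_beta(precision, recall, beta=1.0):
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

p, r = 0.8, 0.6
print(f_beta(p, r, 1.0))  # F1: pulled equally toward both
print(f_beta(p, r, 2.0))  # F2: pulled toward recall (0.6)
print(f_beta(p, r, 0.5))  # F0.5: pulled toward precision (0.8)
```

With p = 0.8 and r = 0.6, F1 ≈ 0.686, F2 ≈ 0.632 (closer to recall), and F0.5 = 0.75 (closer to precision).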
This corresponds to van Rijsbergen's effectiveness measure
$$E = 1 - \frac{1}{\alpha \cdot \frac{1}{P} + (1 - \alpha) \cdot \frac{1}{R}}, \qquad \alpha = \frac{1}{1 + \beta^2},$$
so that $F_\beta = 1 - E$.
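A quick numerical check, under the reconstruction above, that the effectiveness measure with $\alpha = 1/(1+\beta^2)$ does satisfy $F_\beta = 1 - E$ (values of P and R are illustrative):

```python
# Verify F_beta = 1 - E for a few beta values.

def e_measure(p, r, beta=1.0):
    alpha = 1.0 / (1.0 + beta * beta)
    return 1.0 - 1.0 / (alpha * (1.0 / p) + (1.0 - alpha) * (1.0 / r))

def f_beta(p, r, beta=1.0):
    b2 = beta * beta
    return (1 + b2) * p * r / (b2 * p + r)

for beta in (0.5, 1.0, 2.0):
    assert abs(f_beta(0.8, 0.6, beta) - (1.0 - e_measure(0.8, 0.6, beta))) < 1e-12
print("F_beta = 1 - E for beta in {0.5, 1, 2}")
```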
[Figure: number of nodes in the network vs. number of nodes detected for vulnerabilities, plotted per instance number]
Figure 8 shows the detection rate of our algorithm. 15% of the total nodes show abnormalities.
[Figure 9: number of nodes detected for monitoring vs. actual number of nodes attacked, plotted per instance number]
Figure 9 shows the number of nodes detected for monitoring of vulnerabilities versus the actual number of nodes attacked. We can see that our algorithm has managed to detect most of the vulnerable nodes; our system shows a detection percentage of 84.91729. The F-measure of our values shows a rate of 0.84833, and we obtain an average accuracy rate of 0.91375. Further, our proposed structure reduces the number of nodes that must be monitored, hence reducing the amount of processing required. Moreover, since the number and shape of the clusters are not predefined, any type of network can be used for the clustering process. The current process can be further fine-tuned by incorporating artificial intelligence into the system, which can help create an evolutionary system that learns new types of attacks and evolves over time.
7. References
[1] Harold S. Javitz and Alfonso Valdes, "The NIDES Statistical Component Description and Justification," Annual Report, SRI International, 333 Ravenswood Avenue, Menlo Park, CA 94025, March 1994.
[2] Phillip A. Porras and Peter G. Neumann, "EMERALD: Event Monitoring Enabling Responses to Anomalous Live Disturbances," 20th NISSC, October 1997.
[3] H. S. Javitz and A. Valdes, "The SRI IDES Statistical Anomaly Detector," IEEE Symposium on Research in Security and Privacy, May 1991.
[4] Portnoy, L., Eskin, E., and Stolfo, S., "Intrusion Detection with Unlabeled Data Using Clustering," ACM CSS Workshop on Data Mining Applied to Security, pp. 5-8, ACM Press, Philadelphia, 2001.
[5] Qiang, W. and Vasileios, M., "A Clustering Algorithm for Intrusion Detection," SPIE Conference on Data Mining, Intrusion Detection, Information Assurance, and Data Networks Security, Florida, vol. 5812, pp. 31-38, 2005.
[6] Joshua Oldmeadow, Siddarth Ravinutala, and Christopher Leckie, "Adaptive Clustering for Network Intrusion Detection," PAKDD 2004, LNAI 3056, pp. 255-259, Springer-Verlag, Berlin Heidelberg, 2004.
[7] Xiong Jiajun, Li Qinghua, and Tu Jing, "A Heuristic Clustering Algorithm for Intrusion Detection Based on Information Entropy," Wuhan University Journal of Natural Sciences, Vol. 11, No. 2, pp. 355-359, 2006.
[8] Maria C. V. Nascimento and Andre C. P. L. F. Carvalho, "A Graph Clustering Algorithm Based on a Clustering Coefficient for Weighted Graphs," Journal of the Brazilian Computer Society, 17: 19-29, DOI 10.1007/s13173-010-0027, 2011.
[9] Jesse Davis and Mark Goadrich, "The Relationship Between Precision-Recall and ROC Curves," Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, 2006.
[10] Sang-Hyun Oh and Won-Suk Lee, "Anomaly Intrusion Detection Based on Dynamic Cluster Updating," in Z.-H. Zhou, H. Li, and Q. Yang (Eds.), PAKDD 2007, LNAI 4426, pp. 737-744, Springer-Verlag, Berlin Heidelberg, 2007.