(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 2, 2010
Figure 1. A typical anomaly IDS [7].
The main issue with anomaly-based detectors is that they produce a high number of false alerts [7]. According to [9], anomaly detectors also tend to be computationally expensive, because several metrics are maintained and often need to be updated against every system activity. In addition, owing to insufficient data, they may gradually be trained incorrectly, in the long run, to recognize abnormal behaviors as normal.
In this study, we assume that the neural network behavior of the SOM will learn patterns of normal system behavior and continually produce profiles to be combined with the fuzzy logic behavior of the FCM. This combination is also used to determine the appropriate membership function, which helps reduce false alerts and increase the detection accuracy of the detection sensor [10].
III. THE CLASSIFIER INTRUSION DETECTION
Detecting or preventing new attacks without prior knowledge of their behavior is a difficult task, especially when it comes to determining which input features to monitor in order to distinguish normal from intrusive behavior. For this task, we chose unsupervised learning techniques, as they are best suited to such situations [27].
The focus here is to provide a multi-classifier system that can work as a supplementary inference engine to enhance the capability of the IDS. Using the classifier system, we can determine the importance of features in various anomaly detection cases.
To build the inference engine classifier system, we used the unsupervised learning method known as Kohonen's self-organizing maps (SOM) [6] for clustering and recognizing input data, and fuzzy cognitive maps (FCM) [5] to detect feature relevancy. The FCM use causal reasoning to assess the SOM output and then model the final decision. FCM are an ideal tool for acquiring causal knowledge: they are fuzzy signed graphs that can be represented as an associative single-layer neural network [4]. Using FCM, our methodology attempts to diagnose and direct network traffic data based on its relevance to attack or normal connections.
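As a rough illustration of the clustering step, the following Python sketch trains a single-level Kohonen map with the standard online update rule. It is not the authors' implementation: the function names, the Gaussian neighborhood, the linear decay schedules, and the initialization from randomly chosen input vectors are assumptions made for brevity, and the multi-level arrangement used in this paper is not shown.

```python
import math
import random


def best_matching_unit(weights, x):
    """Return the (row, col) of the neuron whose weight vector is closest to x."""
    best, best_d = (0, 0), float("inf")
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            d = sum((wk - xk) ** 2 for wk, xk in zip(w, x))
            if d < best_d:
                best, best_d = (i, j), d
    return best


def train_som(data, rows, cols, epochs=50, lr0=0.3, seed=0):
    """Train a rows x cols Kohonen map on a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    # Assumption: initialize each neuron with a distinct, randomly chosen input
    # vector (requires rows * cols <= len(data)).
    flat = [list(v) for v in rng.sample(data, rows * cols)]
    weights = [flat[r * cols:(r + 1) * cols] for r in range(rows)]
    radius0 = max(rows, cols) / 2.0
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # visit inputs in shuffled order
            frac = step / total
            lr = lr0 * (1.0 - frac)                    # learning rate decays linearly
            radius = max(radius0 * (1.0 - frac), 0.5)  # neighborhood shrinks to a floor
            bi, bj = best_matching_unit(weights, x)
            for i in range(rows):
                for j in range(cols):
                    # Gaussian neighborhood based on grid distance to the winner.
                    h = math.exp(-((i - bi) ** 2 + (j - bj) ** 2) / (2.0 * radius ** 2))
                    w = weights[i][j]
                    for k in range(len(w)):
                        w[k] += lr * h * (x[k] - w[k])
            step += 1
    return weights


# Two well-separated toy groups of 2-D input vectors.
data = [[0.0, 0.0], [0.1, 0.05], [0.05, 0.1], [1.0, 1.0], [0.9, 0.95], [0.95, 0.9]]
som = train_som(data, rows=2, cols=2)
```

Each input vector can then be labeled with the grid position of its best-matching unit; that cluster label is the kind of SOM output the FCM stage would go on to assess.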
By quantifying the causal inference process, we can detect attacks and determine the severity of odd packets. As such, packets with low causal relations to attacks can be dropped or ignored, and packets with high causal relations to attacks are highlighted. In the following subsections, we elaborate on the modules of the classifier system. Figure 3 shows the overall detection process.
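The causal inference and the drop/highlight decision can be sketched as follows. This assumes Kosko-style FCM inference with a tanh squashing function, evidence concepts clamped to their initial values, and a single "attack" concept whose final activation is read as the causal relation to attacks. The concept names, edge weights, and thresholds are hypothetical; the paper does not specify them here.

```python
import math


def fcm_infer(weights, initial, clamped, steps=50, tol=1e-6):
    """Kosko-style FCM inference: A_i(t+1) = tanh(sum_j weights[j][i] * A_j(t)).

    weights[j][i] is the signed causal edge from concept j to concept i, in [-1, 1].
    Concepts listed in `clamped` keep their initial (evidence) value throughout.
    """
    n = len(initial)
    a = list(initial)
    for _ in range(steps):
        nxt = [
            a[i] if i in clamped
            else math.tanh(sum(weights[j][i] * a[j] for j in range(n)))
            for i in range(n)
        ]
        converged = max(abs(p - q) for p, q in zip(nxt, a)) < tol
        a = nxt
        if converged:
            break
    return a


def triage(attack_level, ignore_below=0.2, highlight_above=0.6):
    """Map the attack concept's final activation to an action (illustrative thresholds)."""
    if attack_level < ignore_below:
        return "ignore"      # low causal relation to attacks: drop or ignore
    if attack_level > highlight_above:
        return "highlight"   # high causal relation to attacks
    return "monitor"


# Hypothetical concepts: 0 = record fell in an anomalous SOM cluster,
# 1 = rare service/flag combination, 2 = matches the normal profile, 3 = attack.
W = [[0.0] * 4 for _ in range(4)]
W[0][3], W[1][3], W[2][3] = 0.7, 0.5, -0.6
EVIDENCE = {0, 1, 2}

suspicious = fcm_infer(W, [1.0, 1.0, 0.0, 0.0], EVIDENCE)[3]  # tanh(1.2), about 0.83
benign = fcm_infer(W, [0.0, 0.0, 1.0, 0.0], EVIDENCE)[3]      # tanh(-0.6), about -0.54
```

Using tanh rather than a logistic sigmoid lets a negative causal edge (here, "matches the normal profile") push the attack concept below zero, so thresholds separate the two cases cleanly.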
A.
Preprocessor Module
The data preprocessing module performs the final preparation of the target data records. This includes slicing the large dataset, where the selection criteria are based on user-predefined mechanisms or a threshold value, and on the number of the starting row in the given dataset. First, we introduce the input file containing all the input vectors; then we enter the number of vectors to read, the number of levels, and the threshold value. In this module, the user can also specify the number of neurons and the selected features to be used at each SOM level. After that, the user can train and save the neuron states for each training level. The elapsed time is the difference between the first and the last level, according to the user-predefined number of levels.
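A minimal sketch of the slicing and per-level feature selection described above, assuming comma-separated KDD'99-style records. The names LevelConfig, slice_records, project, and run_levels are hypothetical, the sample records are truncated to six columns, and the threshold value is omitted because its exact use is not specified here.

```python
import csv
import io
import time
from dataclasses import dataclass
from itertools import islice
from typing import Iterable, List, Sequence, Tuple


@dataclass
class LevelConfig:
    """Hypothetical per-level settings: SOM size and the feature columns it uses."""
    n_neurons: int
    features: Sequence[int]


def slice_records(rows: Iterable[List[str]], start_row: int, n_vectors: int) -> List[List[str]]:
    """Read n_vectors records beginning at the 0-based start_row."""
    return list(islice(rows, start_row, start_row + n_vectors))


def project(records: List[List[str]], features: Sequence[int]) -> List[List[str]]:
    """Keep only the selected feature columns of every record."""
    return [[rec[i] for i in features] for rec in records]


def run_levels(records, levels) -> Tuple[List[List[List[str]]], float]:
    """Prepare each level's input and time the pass from the first to the last level."""
    start = time.perf_counter()
    inputs = []
    for level in levels:
        inputs.append(project(records, level.features))
        # Training and saving a SOM with level.n_neurons neurons would happen here.
    return inputs, time.perf_counter() - start


# KDD'99-style lines, truncated to six columns for brevity.
SAMPLE = "0,tcp,http,SF,181,5450\n0,udp,private,SF,105,146\n0,icmp,ecr_i,SF,1032,0\n2,tcp,ftp,S0,0,0\n"
records = slice_records(csv.reader(io.StringIO(SAMPLE)), start_row=1, n_vectors=2)
levels = [LevelConfig(n_neurons=16, features=[1, 4, 5]), LevelConfig(n_neurons=9, features=[1, 2, 3])]
level_inputs, elapsed = run_levels(records, levels)
```

Reading through an iterator with islice means only the requested slice is materialized, which matters when the full dataset is large.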
Figure 2. Preprocessing module.
This module involves slicing the dataset into five classes. In each class, symbolic-valued features are mapped into numeric-valued features; for example, protocol type symbols (TCP, UDP, and ICMP) are mapped into integer values. More details about the data used and the data classes are available in Section V. Each symbol corresponds to a position in the labels array, and this position is used to fill the input vector. In this module, we focus on the final preparation of the target data to be presented to the subsequent module.
This module is of prime importance because finding or discovering related patterns in a dataset is an exploratory process, carried out with little or even no prior knowledge about the structure of the dataset to be examined [21]. Hence, relying on a clean dataset gives more confidence that the assumptions drawn from the pattern exploration output accurately reflect the model of the data being examined. Moreover, redundant and unrelated patterns can be dropped early to avoid congestion in subsequent operations. Thus, the module makes the system vigilant and gives it flexibility in feature selection for further exploration of attack details.
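The labels-array mapping can be sketched as follows. The protocol symbols come from the text (written in lowercase, as they appear in KDD'99 records); building the service and flag label arrays in first-seen order, and the helper names, are assumptions for illustration.

```python
from typing import Dict, List, Sequence

# Labels array for the protocol type; a symbol's position is its integer code.
PROTOCOL_LABELS = ["tcp", "udp", "icmp"]


def build_labels(records: Sequence[Sequence[str]], col: int) -> List[str]:
    """Collect the distinct symbols of one column in first-seen order."""
    labels: List[str] = []
    for rec in records:
        if rec[col] not in labels:
            labels.append(rec[col])
    return labels


def to_input_vector(record: Sequence[str], symbolic: Dict[int, List[str]]) -> List[float]:
    """Replace each symbolic value by its position in that column's labels array.

    Numeric columns are converted with float(); an unseen symbol raises ValueError.
    """
    return [
        float(symbolic[col].index(value)) if col in symbolic else float(value)
        for col, value in enumerate(record)
    ]


records = [
    ["0", "tcp", "http", "SF", "181", "5450"],
    ["0", "udp", "private", "SF", "105", "146"],
    ["0", "icmp", "ecr_i", "REJ", "1032", "0"],
]
symbolic = {1: PROTOCOL_LABELS, 2: build_labels(records, 2), 3: build_labels(records, 3)}
vectors = [to_input_vector(r, symbolic) for r in records]
```

Here the third record becomes [0.0, 2.0, 2.0, 1.0, 1032.0, 0.0]: "icmp" is at position 2 of the protocol labels, "ecr_i" at position 2 of the service labels, and "REJ" at position 1 of the flag labels.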
B.
Data Mining Module
The data mining module is the first important component of the classifier system. The task of this module is to generate cluster information, that is, to produce logical and homogeneous clusters from the input dataset. To achieve that task, a network
[Figure 2 components: KDD'99 dataset; preprocessing (reading, slicing, selection); user-defined threshold; target data]
[Figure 1 components: audit data; system profile; statistically deviant?; update profile; attack state; generate new profiles dynamically]
255 http://sites.google.com/site/ijcsis/ ISSN 1947-5500