International Journal of Computer Trends and Technology (IJCTT), Volume 4, Issue 7, July 2013

Identifying Network Intrusions using One Dimensional distance


Greeshma K
Department of Computer Science and Engineering, Calicut University, Kerala, India.
Abstract
Firewalls and other simple boundary devices lack the intelligence needed to observe, recognize, and identify attack signatures that may be present in the traffic they monitor and the log files they collect. This deficiency explains the need for intrusion detection systems (IDS), which help maintain proper network security. The simplest way to define an IDS is as a specialized tool that knows how to read and interpret the contents of log files from routers, firewalls, servers, and other network devices. To detect intrusions effectively, this paper introduces a new approach that combines clustering and classification techniques. In this approach, two distances are measured and summed. The first is the distance between each data sample and its cluster center, and the second is the distance between the data sample and its nearest neighbor in the same cluster. The resulting one-dimensional distance-based feature is then used to represent each data sample.

I. INTRODUCTION
An intrusion detection system (IDS) is a device or software application that monitors network or system activities for malicious activity or policy violations and produces reports to a management station. An IDS inspects all inbound and outbound network activity and identifies suspicious patterns that may indicate a network or system attack from someone attempting to break into or compromise a system. Intrusion detection systems help information systems prepare for, and deal with, attacks. They accomplish this by collecting information from a variety of system and network sources and then analyzing it for possible security problems. Intrusion detection provides monitoring and analysis of user and system activity, auditing of system configurations and vulnerabilities, assessment of the integrity of critical system and data files, statistical analysis of activity patterns based on matching to known attacks, abnormal activity analysis, and operating system audit.

Intrusion detection and prevention systems (IDPSes) typically record information related to observed events, notify security administrators of important observed events, and produce reports. Many IDPSes can also respond to a detected threat by attempting to prevent it from succeeding. They use several response techniques, which involve the IDPS stopping the attack itself, changing the security environment (e.g. reconfiguring a firewall), or changing the attack's content.

Intrusion detection (ID) is a type of security management system for computers and networks. An ID system gathers and analyzes information from various areas within a computer or a network to identify possible security breaches, which include both intrusions (attacks from outside the organization) and misuse (attacks from within the organization). ID uses vulnerability assessment (sometimes referred to as scanning), a technology developed to assess the security of a computer system or network.

The three main types of intrusion detection system are as follows. The first is the network intrusion detection system (NIDS), which analyzes passing traffic on the entire subnet, works in promiscuous mode, and matches the traffic passed on the subnet against a library of known attacks. The second is the network node intrusion detection system (NNIDS), which analyzes the traffic passed from the network to a specific host; here the traffic is monitored on a single host only, not for the entire subnet. The third is the host intrusion detection system (HIDS), which takes a snapshot of the existing system files and compares it to the previous snapshot. If critical system files have been modified or deleted, an alert is sent to the administrator to investigate.

In this paper, we propose a new approach for effective and efficient intrusion detection. It is based on combining cluster centers and nearest neighbors. Specifically, given a dataset, the k-means clustering algorithm is used to extract the cluster centers of each pre-defined category. Then, the nearest neighbor of each data sample in the same cluster is identified. Next, the sum of the distances between a data sample and the cluster centers and the distance between that sample and its nearest neighbor is calculated. This yields a new distance that serves as the feature representing the sample in the given dataset. Consequently, the new dataset, which contains only one dimension (i.e. the distance-based feature representation), is used for AODE classification, which allows for effective and efficient intrusion detection.

II. LITERATURE SURVEY
Network security is a large and growing area of concern for every network.
Most network environments face an ever-increasing number of security threats in the form of Trojans, worm attacks, and viruses that can damage computer systems and communication channels. Firewalls are used as a security checkpoint in a network environment, but different types of security issues continue to arise. In order to further protect the network from illegal access, the concept of the Intrusion Detection System (IDS) and the Intrusion Prevention System (IPS) is gaining popularity. Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of possible incidents, which are violations or imminent threats of violation of computer security policies or standard security policies. Intrusion prevention is the process of performing intrusion detection and attempting to stop detected possible incidents.

A. HIDE: Hierarchical Network Intrusion Detection System Using Statistical Preprocessing and Neural Network Classification
HIDE is an anomaly-based network intrusion detection system with a hierarchical architecture that uses statistical models and neural network classifiers to detect attacks. Its authors report experimental results on the performance of five different types of neural networks, as well as the results of traffic intensity stress-testing on HIDE. The system is a distributed hierarchical application that consists of several tiers, with each tier containing several Intrusion Detection Agents (IDAs). IDAs are IDS components that monitor the activities of a host or a network. Different tiers correspond to different network scopes that are protected by the agents affiliated with them. The intrusion detection system can be divided into three tiers. Tier 1 agents monitor the system activities of the servers and bridges within a department and periodically generate reports for Tier 2 agents. Tier 2 agents detect the network status of a departmental LAN based on the network traffic that they observe as well as the reports from the Tier 1 agents within the LAN. Tier 3 agents collect data from the Tier 1 agents at the firewall and the router as well as data from the Tier 2 agents.

B. Intrusion Detection in MANET Using Classification Algorithms
In this work, the authors employ statistical classification algorithms in order to perform intrusion detection in MANETs.
Such algorithms have the advantages that they are largely automated, that they can be quite accurate, and that they are rooted in statistics. For that reason, they are prime candidates for use in cost-sensitive classification problems. After training, they can be used for detection with arbitrary cost matrices. They have wide application, including intrusion detection in wired networks, and they have been extensively studied, both theoretically and experimentally, and used in many applications with a high degree of success.

C. One-Dependence Estimators for Accurate Detection of Anomalous Network Traffic
This paper observes that, prior to the application of any training algorithm on a given data set, it is essential to convert all features (attributes) to a format that is intelligible to the classification algorithm. Doing so alleviates the risk that the impact of certain features on the outcome of the classification is nullified. In the NSL-KDD data set, all features take numeric values except three, namely protocol type, service, and flag. As part of the preprocessing phase, the numeric features are converted into nominal values so that the AODE training algorithm, which is used for network traffic classification, can operate on this data set unaffected. The numeric-to-nominal conversion is achieved through discretization of the numeric values, using techniques such as equal frequency binning, in which bin boundaries are chosen so that each bin receives approximately the same number of data values.

III. PROPOSED METHOD
To avoid the problems of existing systems, we introduce a new system for effective and efficient intrusion detection. The proposed approach is based on two distances used as new features: the distance between a data sample and its cluster center, and the distance between the data sample and its nearest neighbor. The approach comprises extracting cluster centers, identifying nearest neighbors, and forming the new data.
The purpose of the nearest-neighbor algorithm is to assign an unlabeled data sample to the class of its k nearest neighbors. This process thus provides effective results.

A. Selection of features: The KDD Cup 99 dataset is used to examine this technique. After the dataset is loaded, it passes to the feature classification step, in which the features are classified into five classes. After feature classification is complete, the classified features are separated into a set of 41 features, and records are identified as normal or anomalous. A feature selection algorithm is used to eliminate the unimportant features.

B. Distance of cluster center: To extract cluster centers, any clustering technique can be applied at this stage. In this paper, the k-means clustering algorithm is used. As an illustration, consider a dataset consisting of 12 data samples (N1 to N12) that forms a five-class classification problem. The number of clusters is then defined as five (i.e. k = 5) for the k-means clustering algorithm. As a result, there are five clusters, each of which contains a cluster center (i.e. C1, C2, C3, C4, and C5).

C. Loading one dimensional dataset: After the cluster centers and the nearest neighbor of every data sample in the chosen dataset have been extracted and identified, two types of distance are calculated and then summed. The first type is the distance from each data point to the cluster centers. That is, if there are three cluster centers, then there are three distances, one between the data point and each of the three cluster centers. The second type is the distance from each data point to its nearest neighbor. The distance between two data points is the Euclidean distance. Finally, the summed distance for each data point forms the new one-dimensional dataset.
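The feature construction described in subsections B and C can be illustrated with a minimal sketch. It is not the implementation used in this paper: the function names (`kmeans`, `distance_feature`), the toy data, and the seeding of k-means with the first k points are assumptions made for illustration. The sketch follows subsection C in summing the distances to all cluster centers and restricts the nearest-neighbor search to the sample's own cluster.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(points, centers):
    """Index of the closest center for every point."""
    return [min(range(len(centers)), key=lambda c: euclidean(p, centers[c]))
            for p in points]


def kmeans(points, k, iterations=20):
    """Plain Lloyd's k-means.

    Assumption: centers are seeded with the first k points, which keeps the
    example deterministic; the paper does not state an initialization.
    """
    centers = [list(p) for p in points[:k]]
    for _ in range(iterations):
        labels = assign(points, centers)
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # leave an empty cluster's center where it is
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers, assign(points, centers)


def distance_feature(points, centers, labels):
    """One value per sample: the sum of its distances to all cluster centers
    plus the distance to its nearest neighbor in the same cluster."""
    features = []
    for i, p in enumerate(points):
        to_centers = sum(euclidean(p, c) for c in centers)
        same_cluster = [euclidean(p, q) for j, q in enumerate(points)
                        if j != i and labels[j] == labels[i]]
        # Assumption: a sample alone in its cluster has no neighbor term.
        to_neighbor = min(same_cluster) if same_cluster else 0.0
        features.append(to_centers + to_neighbor)
    return features


# Toy data (hypothetical, not taken from KDD Cup 99): two well-separated pairs.
points = [(0.0, 0.0), (10.0, 0.0), (0.0, 1.0), (10.0, 1.0)]
centers, labels = kmeans(points, k=2)
features = distance_feature(points, centers, labels)
```

Each entry of `features` is the single value that would replace the original feature vector of the corresponding sample before classification.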

D. AODE Classifier: The new dataset is divided into training and testing datasets, which are used to train and test a specific classifier respectively. In this paper, we use the Averaged One-Dependence Estimator (AODE) to classify network traffic. This classifier classifies network traffic based on the attributes (features) of the traffic. Naive Bayes classifiers assume that attributes are independent of one another given the class. The Super-Parent One-Dependence Estimator (SPODE) relaxes this assumption by identifying a single feature as a super-parent, on which all other features depend. As a result, a dependency graph is generated to establish inter-feature relationships. It was observed that such an approach improves the accuracy of the detection process significantly. AODE, an enhancement of SPODE, addresses the attribute independence assumption by averaging the dependencies between the various attributes of a given data set over multiple models. For the NSL-KDD dataset, our proposed intrusion detection scheme outperforms all other classifiers in terms of accuracy in attack detection, lowered false alarm rates, and improved precision and recall values. In particular, the SPODE classifier detected 97.8% of all attack traffic, whereas the AODE classifier detected 99.3% of the attacks with a false positive rate of 0.1%, compared to a detection rate of 97.3% and a false positive rate of 1% exhibited by the other classifiers.

IV. CLASSIFICATION
AODE was found to outperform both ODE and SPODE in terms of speed and accuracy. The training phase of the AODE algorithm operates by iterating through a given data set with k features and generating a set of frequency vectors as follows:
1. cfreq[y] - number of data elements belonging to a given class y
2. afreq[k] - number of times a given data element is found to possess a value, iterated over all k features
3. vfreq[x_i] - number of times the value x_i is encountered in the entire data set
4. freq[y; x_i; x_j] - the frequency of simultaneous occurrence of two attribute values x_i and x_j for a given class y

During the testing phase of the AODE algorithm, the data elements (instances) of the data set are presented to the algorithm with their class hidden. The task of the AODE classifier is to predict the probability that the data element belongs to each of the given classes. The class with the highest probability is then chosen as the class of the data element. These values are computed based on the following equations:

for all $i \in \{1, \dots, k\}$: $p = \hat{P}(x_i \wedge y)$
for all $j \in \{1, \dots, k\}$, if $x_j$ is known: $p = p \cdot \hat{P}(x_j \mid x_i \wedge y)$

where $k$ is the total number of attributes, $x_j$ is the value of a feature, $\hat{P}(x_i \wedge y)$ is the estimated probability of observing the value $x_i$ together with class $y$, and $\hat{P}(x_j \mid x_i \wedge y)$ is the estimated probability of observing the value $x_j$ given that the value $x_i$ occurs with class $y$.

V. SIMULATION ANALYSIS
In this section, we analyze the results obtained from simulations performed to test the effectiveness of our proposed intrusion detection system. The results were quantified using the following metrics, commonly used for evaluating intelligent classifiers, where TP, TN, FP, and FN denote the numbers of true positives, true negatives, false positives, and false negatives:
1. Accuracy = (TP + TN) / (TP + TN + FN + FP)
2. Recall = TP / (TP + FN)
3. Precision = TP / (TP + FP)
4. Specificity = TN / (TN + FP)
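Returning to the AODE procedure of Section IV, the training and testing phases can be sketched as follows. This is a simplified illustration rather than the classifier evaluated in this paper: the class name `AODESketch`, the Laplace smoothing, the `min_count` threshold, the uniform fallback, and the toy records are all assumptions. The counters `vfreq`, `pair_freq`, and `triple_freq` play roles comparable to vfreq[x_i] and freq[y; x_i; x_j] above, and the per-class score averages over every attribute value acting as super-parent.

```python
from collections import Counter


class AODESketch:
    """Simplified averaged one-dependence estimator for nominal attributes."""

    def __init__(self, min_count=1):
        # Assumption: a value must occur at least min_count times in training
        # before it is allowed to act as a super-parent.
        self.min_count = min_count

    def fit(self, rows, classes):
        self.n = len(rows)
        self.labels = sorted(set(classes))
        # Number of distinct values per attribute, used for smoothing.
        self.n_values = [len({r[a] for r in rows}) for a in range(len(rows[0]))]
        self.vfreq = Counter()        # (i, x_i) -> count
        self.pair_freq = Counter()    # (y, i, x_i) -> count
        self.triple_freq = Counter()  # (y, i, x_i, j, x_j) -> count
        for row, y in zip(rows, classes):
            for i, xi in enumerate(row):
                self.vfreq[(i, xi)] += 1
                self.pair_freq[(y, i, xi)] += 1
                for j, xj in enumerate(row):
                    if j != i:
                        self.triple_freq[(y, i, xi, j, xj)] += 1
        return self

    def scores(self, row):
        """Normalized class scores for one instance whose class is hidden."""
        raw = {}
        for y in self.labels:
            total = 0.0
            for i, xi in enumerate(row):
                if self.vfreq[(i, xi)] < self.min_count:
                    continue  # value too rare to serve as super-parent
                parent = self.pair_freq[(y, i, xi)]
                # p = P^(x_i ^ y), Laplace-smoothed (an assumption).
                p = (parent + 1) / (self.n + len(self.labels) * self.n_values[i])
                for j, xj in enumerate(row):
                    if j != i:
                        # p = p * P^(x_j | x_i ^ y), Laplace-smoothed.
                        joint = self.triple_freq[(y, i, xi, j, xj)]
                        p *= (joint + 1) / (parent + self.n_values[j])
                total += p
            raw[y] = total
        norm = sum(raw.values())
        if norm == 0:
            # Simplification: uniform scores when no super-parent qualifies.
            return {y: 1 / len(self.labels) for y in self.labels}
        return {y: v / norm for y, v in raw.items()}

    def predict(self, row):
        s = self.scores(row)
        return max(s, key=s.get)


# Hypothetical toy records (protocol, service); not taken from NSL-KDD.
rows = [("tcp", "http"), ("tcp", "http"), ("udp", "dns"),
        ("udp", "dns"), ("tcp", "dns")]
classes = ["normal", "normal", "attack", "attack", "normal"]
model = AODESketch().fit(rows, classes)
prediction = model.predict(("udp", "dns"))
```

Because every attribute is nominal here, numeric features would first have to be discretized, as described in the literature survey.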

Classifier   Accuracy   Detection rate   False alarm
SVM          99.01%     93.29%           0.0289%
k-NN         99.2%      96.921%          0.322%
AODE         99.3%      97.3%            0.310%
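The evaluation metrics of Section V can be computed directly from the four confusion counts, as in the short sketch below. The function name `classification_metrics` and the example counts are hypothetical and are not results from this paper. The sketch also assumes that the detection rate in the table corresponds to recall and that the false alarm rate is FP / (FP + TN), which is a common convention but is not stated explicitly here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Metrics from Section V, given confusion-matrix counts.

    Assumes every denominator is non-zero, i.e. the test set contains both
    attack and normal records and at least one record is flagged as an attack.
    """
    return {
        "accuracy": (tp + tn) / (tp + tn + fn + fp),
        "recall": tp / (tp + fn),            # assumed equal to detection rate
        "precision": tp / (tp + fp),
        "specificity": tn / (tn + fp),
        "false_alarm_rate": fp / (fp + tn),  # assumed definition: 1 - specificity
    }


# Made-up counts for illustration only.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
```

With these definitions, the false alarm rate and the specificity always sum to one, so reporting either is sufficient.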

Figure 1: Comparison of the precision-recall curves for different AODE models built using different feature selection techniques

VI. CONCLUSION
A new approach for effective and efficient intrusion detection is proposed in this paper. The approach first transforms the original feature representation of a given dataset into a one-dimensional distance-based feature. This new dataset is then used to train and test an AODE classifier. Our experimental results show that the approach performs similarly to the k-NN and SVM classifiers using the original dataset in terms of accuracy, detection rate, and false alarm rate. However, the important strength of this method is that it needs less computational effort than the k-NN and SVM classifiers trained and tested on the original datasets. That is, although it requires additional computation to extract the proposed distance-based feature, it greatly reduces the training and testing (i.e. detection) time, since the new dataset contains only one dimension.

REFERENCES
[1] Mukkamala, S. (2002). Intrusion detection using neural networks and support vector machine. Proceedings of the 2002 IEEE International Honolulu, HI.
[2] Sammany, M.; Sharawi, M.; El-Beltagy, M.; and Saroit, I. (2007). Artificial neural networks architecture for intrusion detection systems and classification of attacks. Accepted for publication in the 5th international conference INFO2007, Cairo University.
[3] Morteza, A.; Jalili, R.; and Hamid R.S. (2006). RT-UNNID: A practical solution to real-time network-based intrusion detection using unsupervised neural networks. Computers & Security, 25(6), 459-468.
[4] Tran, T.P.; Cao, L.; Tran, D.; and Nguyen, C.D. (2009). Novel intrusion detection using probabilistic neural network and adaptive boosting. International Journal of Computer Science and Information Security (IJCSIS), 6(1), 83-91.
[5] Chen, R.C.; Cheng, K.F.; and Hsieh, C.F. (2009). Using rough set and support vector machine for network intrusion detection. International Journal of Network Security & Its Applications (IJNSA), 1(1), 1-13.
[6] G. Vigna, R. A. Kemmerer, NetSTAT: A Network-based Intrusion Detection Approach, Proceedings of 14th Annual Computer Security Applications Conference, 1998, pp. 25-34.
[7] W. Lee, S. J. Stolfo, K. Mok, A Data Mining Framework for Building Intrusion Detection Models, Proceedings of 1999 IEEE Symposium of Security and Privacy, pp. 120-132.
[8] A.K. Ghosh, J. Wanken, F. Charron, Detecting Anomalous and Unknown Intrusions Against Programs, Proceedings of IEEE 14th Annual Computer Security Applications Conference, 1998.
[9] Lorenzo-Fonseca, I.; Maciá-Pérez, F.; Mora-Gimeno, F.; Lau-Fernández, R.; Gil-Martínez-Abarca, J.; and Marcos-Jorquera, D. (2009). Intrusion detection method using neural networks based on the reduction of characteristics. LNCS, 5517, 1296-1303.
[10] S. Axelsson. "Research in intrusion detection systems: A survey". Technical Report No. 98-17, Dept. of Computer Engineering, Chalmers University of Technology, Göteborg, Sweden, 1999.

ISSN: 2231-2803

http://www.ijcttjournal.org
