

**Efficient Probabilistic Classification Methods for NIDS**

S.M.Aqil Burney

Department of Computer Science University of Karachi, Karachi-Pakistan

**M.Sadiq Ali Khan**

Department of Computer Science University of Karachi, Karachi-Pakistan msakhan@uok.edu.pk

**Mr.Jawed Naseem**

Principal Scientific Officer-PARC

Abstract: As technology improves, attackers try to gain access to network system resources by many means, and open loopholes in the network allow them to penetrate it more easily. Various approaches have been tried for the classification of attacks. In this paper we compare two methods, Naïve Bayes and the Junction Tree Algorithm, on a reduced set of features, which improves performance compared with the full data set. PCA is used for feature reduction, which helped in proposing a new method for efficient classification. We propose a Bayesian network-based model with a reduced set of features for intrusion detection. The proposed method generates a lower false positive rate, which increases detection efficiency by reducing the workload and thereby increases the overall performance of an IDS. We also investigate whether conditional independence really affects the detection of attacks/threats.

Keywords: Network Intrusion Detection System (NIDS); Bayesian Networks; Junction Tree Algorithm

I. INTRODUCTION

Network security, whether in a commercial organization or in a critically important research network, is a major concern; with the increasing use of the web, even personal information is under threat. An efficient network intrusion detection system is the only solution to such threats [4]. An IDS is a system that monitors networks to protect them from cyber terrorists; equivalently, intrusion detection is the process of examining the events occurring in a network or computer system and detecting signs of incidents that threaten computer security policies. The IDS monitors the network system for any rule violation. When such a violation occurs, an efficient IDS raises an alarm that alerts the administrator to take measures against the vulnerability. Common intrusion attacks are classified on the basis of various features/parameters. The KDD-99 data set is usually used for investigating the nature of attacks. The data set lists 41 features. The information value of these features and the interdependence among them are of interest. How far can the features be reduced without reducing the efficiency of the classification algorithm, and does interdependency really contribute to detection efficiency? We try to answer such questions in this paper. PCA is an effective data dimension reduction technique. Similarly, the Naïve Bayes classifier and the Bayesian network both use a probabilistic approach for determining attack probability. Naïve Bayes classifiers assume conditional independence while Bayesian networks assume conditional dependence. The two methods can therefore be compared to see whether conditional independence or interdependence really contributes to the probability of attack. In the next section we discuss some related work; in Section 3 we discuss the two classification methods; in Section 4 the methodology is described; and finally in Section 5 the results and discussion are presented.

II. BACKGROUND

Most network-based systems become targets for hackers, so building an efficient IDS is the main task nowadays [4]. Intrusion detection systems need a component that generates alerts on the basis of a rule set; to detect malicious activity correctly it is necessary to manage the alerts correctly [1]. Data mining approaches are being applied by researchers for attack detection in their intrusion detection systems [2]. Probabilistic approaches for reducing the false alarm rate have been proposed; for example, see [3]. An enormous amount of network traffic data accumulates each day. A number of data mining approaches are used to collect domain knowledge for intrusion detection, including clustering, association rules and classification [12]. Data analysis is supported by data mining techniques, which have become an important component of intrusion detection systems. The main concern when using data mining techniques in an attack detection system is to differentiate between normal and abnormal packets. To apply data mining to intrusion detection we need a data set and a classification model. That classification model may be a Bayesian network, a neural network, a rule-based or decision-tree-based model, or another soft computing technique such as Support Vector Machines (SVM) [10, 11].

An intrusion detection system has now become a necessity for an organization's security system, and its credibility may depend on the data mining techniques used.

2.1 Clustering

The process of labeling data and arranging it in groups is called clustering. Grouping basically improves the performance of the different classifiers used. A genuine cluster contains data corresponding to a single category [5]. The data set belonging to a cluster is modeled with respect to its existing


http://sites.google.com/site/ijcsis/ ISSN 1947-5500

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 8, November 2010

features. Clustering may be defined as an unsupervised machine learning mechanism for pattern matching in unlabeled data with numerous aspects.

2.2 Classification

In classification we break the data set into different classes; it is much less exploratory than clustering. By means of classification we need to classify data into the classes normal/not normal and to sub-classify them into different types. Naïve Bayes is used as a classification algorithm in this research to achieve data classification for intrusion detection. Because of the huge amount of traffic data that must be collected, classification is less popular [6].
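As a rough illustration of the Naïve Bayes classification step mentioned here, the following minimal sketch labels a record with the class that maximises the prior times the product of per-attribute likelihoods. The function names, the two toy attributes and the six records are made up for illustration and are not taken from the paper or from KDD'99; Laplace smoothing is an added assumption so that unseen attribute values do not zero out a class.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class attribute-value frequencies."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (attribute index, label) -> value counts
    vocab = defaultdict(set)             # attribute index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            vocab[i].add(value)
    return class_counts, value_counts, vocab

def predict(row, model, alpha=1.0):
    """Return the label maximising log P(O_j) + sum_k log P(x_k | O_j)."""
    class_counts, value_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_label in class_counts.items():
        score = math.log(n_label / total)  # log prior
        for i, value in enumerate(row):
            count = value_counts[(i, label)][value]
            # Laplace smoothing (assumption): alpha pseudo-counts per value
            score += math.log((count + alpha) / (n_label + alpha * len(vocab[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy records: (protocol_type, flag)
rows = [("tcp", "SF"), ("tcp", "SF"), ("udp", "SF"),
        ("tcp", "S0"), ("tcp", "S0"), ("tcp", "S0")]
labels = ["normal", "normal", "normal", "neptune", "neptune", "neptune"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(("tcp", "S0"), model)` on this toy model picks the class whose smoothed likelihoods are largest; working in log space avoids numerical underflow when many attributes are multiplied.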

III. CLASSIFICATION METHODS

3.1 Naïve Bayes Classifier

The Naïve Bayes classifier is an effective technique for the classification of data. The technique is particularly useful when the data dimension is large. Naïve Bayes is a special case of Bayes' theorem which presupposes independence among the data attributes [7]. Even though Naïve Bayes assumes data independence, its performance is efficient and on a par with other techniques that assume conditional dependence. The Naïve Bayes classifier can handle continuous or categorical data. Consider a set of variables X = {x1, x2, ..., xn} with possible outcomes O = {o1, o2, ..., on}. The posterior probability of the dependent variable is obtained by Bayes' rule:

$$P(O_j \mid x_1, x_2, \ldots, x_n) \propto P(x_1, x_2, \ldots, x_n \mid O_j)\, P(O_j)$$

A new case X is labeled with the class O_j that has the highest posterior probability. Because the attributes are assumed independent given the class, the likelihood factorizes and the rule becomes

$$\hat{O} = \arg\max_{O_j} \; P(O_j) \prod_{k=1}^{n} P(x_k \mid O_j)$$

The efficiency of the Naïve Bayes classifier lies in the fact that it converts a multidimensional density estimation into one-dimensional density estimations. The independence assumption does not appreciably affect the posterior probability, so the classification task is generally efficient. The same was found in this study when the Naïve Bayes classifier was compared with the Junction Tree algorithm. For modeling the Naïve Bayes classifier, several distributions, including the normal, gamma or Poisson density functions, can be employed.

3.2 Junction Tree Algorithm

It is a graphical method of belief updating or probabilistic reasoning. For probabilistic reasoning we use Bayesian Networks and Decision Graphs (BNDG), details of which can be found in [9]. The basic concept in the junction tree is the clustering of predicted attributes [8]. In belief updating, instead of approximating the joint probability distribution of all targeted variables, clusters of attributes (cliques) are formed and the potentials of the clusters are used to approximate the probability. So basically a junction tree is a graphical representation of potential cluster nodes, or cliques, together with a suitable algorithm to update these potentials. The junction tree algorithm involves several steps: moralizing the graph, triangulation, junction tree formation, assigning probabilities to cliques, message passing, and reading the cliques' marginal potentials from the junction tree.

Using the junction tree algorithm requires that the directed graph be changed into an undirected graph to ensure uniform application. This process is called moralization, and it involves adding edges between parents and dropping the directions. Let a directed graph be changed into an undirected graph $G(N_G, E_G)$. In effect, two new edge sets are required along with $E_G$, corresponding to the edges added between parents and to the original edges with their directions dropped; these define the undirected moralized graph.

The junction tree is formed after moralization and is basically a hypergraph of cliques. If the cliques of the undirected graph G are given by C(G), then the junction tree has the unique property that the intersection of any two nodes is contained in every node on the unique path joining them. Consider a cluster representation having two neighbouring clusters U and V sharing a variable S in common.


(Diagram: clusters U and V joined through the separator S, with potentials Ψ(U), Ψ(S) and Ψ(V).)

The aim of the JTA is to modify the potentials in such a way that the distribution P(V) is obtained from the modified potential Ψ(V). In that case the probability of S is given by

$$P(S) = \sum_{V \setminus S} \Psi(V)$$

Similarly,

$$P(S) = \sum_{U \setminus S} \Psi(U)$$

Let Ψ(S) represent the modified potential, so that Ψ(S) = P(S). If a potential, say Ψ(V), is modified as a result of new evidence, the potentials Ψ(S) and Ψ(U) can both be updated using the equivalence

$$\sum_{U \setminus S} \Psi(U) = P(S) = \sum_{V \setminus S} \Psi(V)$$

Belief updating in the junction tree is carried out through message passing. Let V and W be two adjacent nodes with separator S. The task is for W to absorb from V through S, updating the potentials Ψ(W) and Ψ(S) subject to the condition

$$\sum_{W \setminus S} \Psi^{*}(W) = \Psi^{*}(S) = \sum_{V \setminus S} \Psi^{*}(V)$$

In absorption, Ψ*(S) and Ψ*(W) are obtained as

$$\Psi^{*}(S) = \sum_{V \setminus S} \Psi(V), \qquad \Psi^{*}(W) = \Psi(W)\,\frac{\Psi^{*}(S)}{\Psi(S)}$$

In this way the belief of the whole network is updated through message passing.

IV. METHODOLOGY

The KDD'99 intrusion detection data set was used. PCA was applied and 14 features were selected on the basis of the analysis. The selection of data sets for training and testing plays a vital role in the accuracy of prediction. In intrusion detection the frequency of some attacks is very large compared with others. To ensure that all attack types were included in learning, stratified random samples were drawn in proportion to each attack type. This produces better results than simple random sampling. For Naïve Bayes classification, two data sets (stratified samples of equal size, 10000) were used for learning and testing with the BN classifier software. For the junction tree algorithm, structure learning was carried out by drawing a random sample of 5000 from the KDD data set using Netica. Then five data sets, each of size 1000, were selected through simple random sampling; data set 1 was used for learning and drawing the junction tree, and data sets 2 to 5 were used for testing the belief updates learned by the junction tree.

V. RESULTS & DISCUSSION

The 41 features of the KDD'99 data set were reduced to 14 features. PCA identified 12 major components with eigenvalues above the cut-off, and more than about 80% of the variability of the data is explained by these features, while 98% of the variability can be explained by 24 components. The difference in explained variability between selecting 24 and 14 features is only 18%, but the computational cost increases sharply if 24 parameters are selected, so 14 were selected to optimize processing speed. It is evident from the scree plot in Figure 1 that the first 24 components represent 98.866% of the data and that 14 components explain 80% of the variability, which is quite sufficient; the work was carried out on these components only, neglecting the remaining, less informative components. Besides this, structure learning also supports the selection of 14 features. The Bayesian network model shown in Figure 2 represents the interdependence among the various attributes. It is evident that mainly two factors, count and src_byte, are affected by various features, and these two in turn ultimately affect the attack types. The KDD'99 data set classification lists 18 attack types; however, normal and neptune are the most frequent.

Figure 1: Scree Plot of attributes.
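The trade-off read off a scree plot, keeping the leading components until a target share of the variability is explained, can be sketched as below. This is only an illustration: the helper name `components_for_variance` and the five eigenvalues are hypothetical and are not the paper's PCA output.

```python
from itertools import accumulate

def components_for_variance(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative
    share of total variance reaches the given threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    for k, cumulative in enumerate(accumulate(ordered), start=1):
        if cumulative / total >= threshold:
            return k
    return len(ordered)

# Hypothetical eigenvalues, chosen only to make the arithmetic easy to follow
toy_eigenvalues = [4.0, 2.0, 1.0, 0.5, 0.5]
k_80 = components_for_variance(toy_eigenvalues, 0.80)
k_98 = components_for_variance(toy_eigenvalues, 0.98)
```

With these toy values the cumulative shares are 0.5, 0.75, 0.875, 0.9375 and 1.0, so a lower threshold is met with noticeably fewer components than a threshold close to 1, which is the kind of cost-versus-coverage choice described in the text.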


Figure 2: Bayesian Network Model Intrusion Detection System

Prediction Accuracy of Major Attack Category (values plotted in Figure 4)

| Attack Category | Actual | Predicted |
|---|---|---|
| DoS | 2872 | 2868 |
| Probe | 726 | 538 |
| R2L | 7 | 2 |
| U2R | 3 | 1 |

Figure 4: Prediction Accuracy of Major Attacks

BN classification also supports the importance of these two types, normal (0.527) and neptune (0.399), as shown in Table 2. The probabilities of buffer_overflow, imap and multihop are less than 0.001, and those of ftp_write, guess_password and load_module are close to 0. This suggests that these classes can be merged.


The BN classifier learned the more frequent attacks more effectively. In identifying the normal class it showed an error rate of only 0.8%, and for the most frequent attack, neptune, the error rate is 6.8% (see Table 1).

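TABLE 1. ACCURACY OF CLASSIFICATION (BAYESIAN CLASSIFIER)

| Class | Actual | Predicted | Diff | Error % |
|---|---|---|---|---|
| back | 62 | 62 | 0 | 0 |
| buffer_overflow | 2 | 0 | 2 | 100 |
| guess_passwd | 3 | 0 | 3 | 100 |
| imap | 2 | 0 | 2 | 100 |
| ipsweep | 225 | 284 | -59 | -26.2 |
| multihop | 1 | 0 | 1 | 100 |
| neptune | 2630 | 2587 | 43 | 1.6 |
| nmap | 96 | 35 | 61 | 63.5 |
| normal | 4271 | 4287 | -16 | -0.37 |
| phf | 1 | 0 | 1 | 100 |
| pod | 12 | 0 | 12 | 100 |
| portsweep | 186 | 219 | -33 | -17.7 |
| rootkit | 1 | 0 | 1 | 100 |
| satan | 219 | 273 | -54 | -24.6 |
| smurf | 168 | 180 | -12 | -7.1 |
| teardrop | 60 | 39 | 21 | 35 |
| warezclient | 57 | 34 | 23 | 40.35 |
| warezmaster | 4 | 0 | 4 | 100 |
| Total | 8000 | 8000 | | |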
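The Diff and Error % columns of Table 1 appear to follow from the counts as Diff = Actual - Predicted and Error % = Diff / Actual × 100; that reading is an inference from the tabulated values, not something the paper states. A small sketch that recomputes the signed error for four of the classes (the helper name `error_percent` is hypothetical):

```python
def error_percent(actual, predicted):
    """Signed error as a percentage of the actual count."""
    return (actual - predicted) / actual * 100

# (actual, predicted) counts copied from Table 1
table1 = {
    "ipsweep": (225, 284),
    "neptune": (2630, 2587),
    "nmap": (96, 35),
    "normal": (4271, 4287),
}
errors = {name: round(error_percent(a, p), 2) for name, (a, p) in table1.items()}
```

A negative value means the classifier predicted more records of that class than were actually present (over-prediction), while a positive value means records of that class were missed.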

Figure 3: Prediction accuracy using BN Classifier (normal: actual 4271, predicted 4287; attack: actual 3729, predicted 3713)

Figure 4 shows the predictions for the major attack categories. DoS attacks are 99.86% detected, while about 75% of probe attacks are detected.
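As a check on these figures, and assuming the reported percentages are the ratio of predicted to actual counts plotted in Figure 4 (DoS: 2872 actual, 2868 predicted; Probe: 726 actual, 538 predicted), the detection rates work out as:

$$\text{DoS: } \frac{2868}{2872} \approx 0.9986 \;(99.86\%), \qquad \text{Probe: } \frac{538}{726} \approx 0.741 \;(74.1\%)$$

Under that assumption the DoS figure matches the text exactly, and the probe figure is consistent with the rounded "about 75%".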

TABLE 2. PROBABILITY OF ATTACK (AVERAGE)

| Class | Junction Tree | Naïve Bayes Classifier | Diff |
|---|---|---|---|
| back | 0.0102 | 0.0086 | 0.0016 |
| buffer_overflow | 0.0008 | 0.001 | -0.0002 |
| imap | 0.0006 | 0.0005 | 0.0001 |
| ipsweep | 0.0368 | 0.0368 | 0 |
| multihop | 0.0002 | 0 | 0.0002 |
| neptune | 0.3992 | 0.3936 | 0.0056 |
| nmap | 0.0176 | 0.0147 | 0.0029 |
| normal | 0.527 | 0.5432 | -0.0162 |
| Total | 1 | 1 | |


Using the junction tree algorithm, the accuracy of identification is up to 98%. The junction tree also identified neptune as the most frequent attack. The probabilities identified for the various attacks are given in Table 2. It is evident that the probability estimates of the two methods are almost equal. A statistical comparison showed no significant difference between the two methods. The frequencies of the remaining attacks are very small and their probabilities are close to zero.
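The paper does not say which statistical test was used for this comparison. Purely as an illustration, the sketch below assumes a paired t-test on the eight class probabilities listed in Table 2; the choice of test, the 5% level and the variable names are assumptions, and the eight listed classes are only part of the full class set.

```python
import math
import statistics

# Average attack probabilities for the eight classes listed in Table 2
junction_tree = [0.0102, 0.0008, 0.0006, 0.0368, 0.0002, 0.3992, 0.0176, 0.527]
naive_bayes = [0.0086, 0.001, 0.0005, 0.0368, 0.0, 0.3936, 0.0147, 0.5432]

# Paired differences, one per class
diffs = [jt - nb for jt, nb in zip(junction_tree, naive_bayes)]
mean_diff = statistics.mean(diffs)
std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
t_stat = mean_diff / std_err

# Two-sided 5% critical value of Student's t with 7 degrees of freedom
T_CRITICAL = 2.365
significant = abs(t_stat) > T_CRITICAL
```

On these eight pairs the statistic stays well inside the critical value, which is in line with the statement that the two methods do not differ significantly; a different test or the full set of classes could of course give a different statistic.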

Probability of Attack (chart: probability by attack type; series: Avg JT and Naïve Bayes)

VI. CONCLUSION & FUTURE RECOMMENDATIONS

Despite the fact that Naïve Bayes classifiers assume conditional independence and the junction tree algorithm assumes parameter interdependence, the Naïve Bayes and junction tree classifiers are almost equally effective. It is recommended that only the more frequent attacks be considered in order to achieve better performance. It was also found that appropriate sampling techniques should be used when selecting the learning and testing data sets for better prediction results.

REFERENCES

[1] Moon Sun Shin, Eun Hee Kim, and Keun Ho Ryu, “False Alarm classification model for network-based IDS”, Springer-Verlag Berlin Heidelberg, LNCS 3177, pp. 259-265, 2004.
[2] M. J. Lee, M. S. Shin, H. S. Moon, “Design and implementation of alert analyzer with data mining engine”, Proc. IDEAL ’03, Hong Kong, 2003.
[3] A. Valdes and K. Skinner, “Probabilistic alert correlation”, 4th International Symposium on Recent Advances in Intrusion Detection (RAID), pp. 54-68, 2003.
[4] S. M. Aqil Burney and M. Sadiq Ali Khan, “Network Usage Security Policies for Academic Institutions”, International Journal of Computer Applications, October Issue, Foundation of Computer Science, 2010.
[5] Anoop Singhal and Sushil Jajodia, “Data warehousing and data mining techniques for intrusion detection systems”, Distributed and Parallel Databases, Volume 20, Number 2, pp. 149-166, DOI: 10.1007/s10619-006-94965, 2006.
[6] Tasleem Mustafa, Ahmed Mateen, Ahsan Raza Sattar, Nauman ul Haq and M. Yahya Saeed, “Forensic Data Security for Intrusions”, European Journal of Scientific Research, ISSN 1450-216X, Vol. 39, No. 2, pp. 296-308, 2010.
[7] Karl Friston, Carlton Chu, Jnaina Mourao, Oliver Hulme, Geriant Rees, Will Penny and John Ashburner, “Bayesian decoding of brain images”, Elsevier NeuroImage, Volume 39, Issue 1, pp. 181-205, January 2008.
[8] Jaydip Sen, “An agent-based intrusion detection system for local area networks”, IJCNIS, Vol. 2, No. 2, August 2010.
[9] F. V. Jensen and T. S. Nielsen, “Bayesian Networks and Decision Graphs”, Springer, Berlin Heidelberg, New York, 2007.
[10] C. Cortes and V. Vapnik, “Support Vector Networks”, Machine Learning, 20, pp. 273-297, 1995.
[11] Jungtaek Seo, “An Attack Classification Mechanism Based on Multiple Support Vector Machines”, LNCS 4706, Part II, pp. 94-103, Springer-Verlag Berlin Heidelberg, ICCSA 2007.
[12] Hebah H. O. Nasereddin, “Stream Data Mining”, International Journal of Web Applications, Volume 1, Number 4, December 2009.

AUTHORS PROFILE

Dr. S. M. Aqil Burney is a Meritorious Professor and an approved supervisor in Computer Science and Statistics of the Higher Education Commission, Govt. of Pakistan. He is also the Director and Chairman of the Computer Science Department, University of Karachi, and a Director of the Main Communication Network, University of Karachi. He is a member of various higher academic boards of different universities of Pakistan. His research interests include AI, soft computing, neural networks, fuzzy logic, data mining, statistics, simulation and stochastic modeling of mobile communication systems and networks, network security, and MIS in health services. Dr. Burney is also a referee for various journals and conference proceedings, nationally and internationally. He is member of IEEE(USA), ACM(USA) and

M. Sadiq Ali Khan received his BS and MS degrees in Computer Engineering from SSUET in 1998 and 2003 respectively. Since 2003 he has been serving the Computer Science Department, University of Karachi, as an Assistant Professor. He has about 12 years of teaching experience, and his research areas include data communication and networks, network security, cryptography issues, and security in wireless networks. He is a member of CSI, PEC and NSP.

Jawed Naseem is a Principal Scientific Officer in the Pakistan Agricultural Research Council. He holds an M.Sc. (Statistics) and an MCS from the University of Karachi and is currently pursuing an MS (Computer Science) at the University of Karachi. His research interests are data modeling, information management and security, and decision support systems, particularly in agricultural research. He has been a team member in the development of several regional (SAARC) level agricultural databases.
