
ISSN: 2347-9310 (Online)    IJCSITAR Vol. 2, Issue 2, Feb. 2014

Modeling Intrusion Detection System Based on Generalized Discriminant Analysis and Support Vector Machines

P Indira Priyadarsini                          I Ramesh Babu
Dept. of Computer Science & Engg.              Dept. of Computer Science & Engg.
Acharya Nagarjuna University                   Acharya Nagarjuna University
Guntur, A.P., India                            Guntur, A.P., India
indupullagura@gmail.com                        rinampudi@hotmail.com

ABSTRACT

These days incidents of cyber attacks have increased, so developing effective Intrusion Detection Systems (IDSs) is mandatory for defending the security of information systems. In general, existing IDSs use all the features in a network packet to trace out known intrusive patterns, yet a few of these features are irrelevant or redundant, and high-dimensional, non-linear data degrade performance due to the curse of dimensionality. We have used a non-linear dimensionality reduction technique, Generalized Discriminant Analysis (GDA), which finds an optimized transformation that maximizes class separability and avoids the curse of dimensionality. Well-organized IDSs make the classification process more effective and efficient. Support Vector Machines (SVMs) are used in this process since they have eminent classifying ability with good generalization power. The purpose of this paper is to select the most important features that are useful in building a computationally effective IDS. We have successfully introduced an IDS with GDA using SVMs and are able to speed up the process with minimal memory space and CPU utilization. We conducted experiments on the KDD Cup 99 dataset with standard Principal Component Analysis (PCA) using SVMs and compared its performance with that of GDA.

Key words: Curse of dimensionality, Intrusion Detection System, Generalized Discriminant Analysis (GDA), Support Vector Machines (SVMs), Principal Component Analysis (PCA).

1. INTRODUCTION
With the advent of the Internet and worldwide connectivity, the damage that can be caused by attacks launched over the Internet against remote systems has increased. So, Intrusion Detection Systems (IDSs) have become an essential module of computer security to detect these attacks. Several methods have been proposed in the development of IDSs, including Decision Trees, Bayesian networks, Artificial Neural Networks, Association Rules and SVMs. Besides these methods, research has investigated the performance of the SVM as the tool for the classification module of an intrusion detection system using various kernels [1],[2]. Now, to enhance the learning capabilities and reduce the computational cost of the SVM, different dimensionality reduction techniques are applied.

In recent years, a large variety of nonlinear dimensionality reduction techniques have been proposed, many of which depend on the evaluation of local properties of the data [3],[4],[5]. Dimensionality reduction is the major criterion in improving the performance of Intrusion Detection Systems; the point of reducing the data is to remove redundant or irrelevant features. In one study, Ravi Kiran et al. [6] demonstrated that Principal Component Analysis (PCA) improved the performance of an Artificial Neural Network (ANN) classifier for intrusion detection. Other related work achieved good results with PCA as the feature reduction technique for SVM-based IDSs [7],[8],[9]. More recently, Rupali et al. investigated Linear Discriminant Analysis (LDA) in intrusion detection systems and achieved drastically improved results [10], and Srilatha et al. [11] showed a comparably good detection rate.

Having examined the many aspects of dimensionality reduction techniques in IDSs, we use Generalized Discriminant Analysis (GDA) as the approach for reducing the number of dimensions, and SVMs for classifying the networked data, since they are accurate and perform well on large data sets. The goal of this paper is to apply the kernel trick, which offers a modular framework for dimensionality reduction. The remaining part of this paper is organized as follows. Section 2 gives an overview of Intrusion Detection Systems and the KDD Cup 99 dataset. Section 3 describes Support Vector Machine classification. Section 4 describes the dimensionality reduction techniques, standard PCA and the proposed GDA algorithm. Section 5 describes the proposed framework for building an efficient IDS. Section 6 gives the metrics for evaluating the results, while the last section presents conclusions and future work.

2. AN OVERVIEW OF INTRUSION DETECTION SYSTEM (IDS)
2.1. NETWORKING ATTACKS
In the year 1998 the DARPA intrusion detection evaluation program [DARPA] set up an environment simulating a typical U.S. Air Force LAN to obtain raw TCP/IP dump data for a network; during that period the network was blasted with multiple attacks. DARPA'98 comprises 7 weeks of network traffic, processed into about 5 million connection records of about 100 bytes each. The KDD Cup 99 training dataset [12] consists of around 4,900,000 single connection vectors, each of which contains 41 features and is marked as either normal or an attack, with exactly one specific attack type; the testing data contain the last two weeks of traffic with nearly 300,000 connections. It holds new types of attacks

that were not contained in the training data. From the training database a subset of 494,021 records was taken as the standard dataset, which is 10% of the original data. Each class specifies a category of simulated attack, and the attack categories are Denial of Service (DOS), User to Root (U2R), Remote to Local (R2L), and Probing.
 Denial of Service (DOS): an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests by sending malicious packets.
 User to Root (U2R): an attack in which the attacker exploits some vulnerability of the target system to gain access to the root account.
 Remote to Local Attack (R2L): an attack in which an attacker who does not have any account on the target machine makes use of some flaw to gain access to that machine.
 Probing Attack: an attack in which a malicious attacker attempts to gather information about a network of computers for the apparent purpose of circumventing its security controls.
The IDS trains and builds a classifier on a training dataset containing TCP/IP data and labels taken from the KDD 99 dataset, and then tests itself by trying to classify a set of test cases into the above groups.
2.2. KDD CUP 99 DATASET DESCRIPTION
The KDD Cup 99 dataset used in the experiment was taken from the Third International Knowledge Discovery and Data Mining Tools Competition. Each connection record is given with 41 attributes. The list of attributes contains both continuous and discrete variables, whose statistical distributions differ drastically from each other, which makes detecting intrusions a very challenging task. There are 22 categories of attacks from the following four classes: DOS, R2L, U2R, and Probe [13]. The dataset has 391458 DOS attack records, 97278 normal records, 4107 Probe attack records, 1126 R2L attack records and 52 U2R attack records. This is the dataset taken from only 10 percent of the original data set. The data has been preprocessed before being used for training and testing the IDS model. The 41 attributes [1] are referred to in the order A, B, C, D, E, F, G, H, I, J, K, L, M, N, O, P, Q, R, S, T, U, V, W, X, Y, Z, AA, AB, AC, AD, AE, AF, AG, AH, AI, AJ, AK, AL, AM, AN, AO, with the class label AP.
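As a minimal illustration of this record layout, the sketch below loads the 10% subset with pandas and inspects it. The local file name and the abbreviated column list are assumptions made for the example, not part of the original competition files; the paper itself refers to the attributes simply as A through AP.

```python
# Sketch: loading the KDD Cup 99 10% subset; file name and the abbreviated
# column names are assumed for illustration (the full 41-name list is in the
# competition's documentation).
import pandas as pd

columns = (["duration", "protocol_type", "service", "flag", "src_bytes",
            "dst_bytes"] + [f"feature_{i}" for i in range(7, 42)] + ["label"])

df = pd.read_csv("kddcup.data_10_percent", names=columns)
print(df.shape)                      # (494021, 42): 41 features + label
print(df["label"].value_counts())   # per-attack-type record counts
```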
3. SUPPORT VECTOR MACHINES
Support Vector Machines (SVMs) are machines that perform classification based on support vectors. SVMs were introduced by Vapnik [24],[25] and are built on Statistical Learning Theory (SLT). They are accurate on training samples and have good generalization ability on testing samples. SVMs can create both linear and non-linear decision margins by solving an optimization problem [14]. These days SVMs have become a tremendous area of research and a powerful tool in machine learning. Generally, SVMs solve a two-class problem by separating the dataset using a maximal marginal hyperplane defined by a set of support vectors, as shown in Fig. 1. The dataset is separated by a hyperplane chosen so that the margin is maximized, which can be done by pushing the two bounding marginal lines apart on both sides. The solid line is called the Maximal Marginal Hyperplane (MMH). The support vectors are the subset of the training dataset that plays the crucial role in the classification process; hence the name Support Vector Machines. If an SVM is not able to separate the data into two classes directly, it maps the input dataset into a high-dimensional space using a kernel function; in that high-dimensional space it is able to classify with good accuracy. Several kernel functions are used in SVM classification, such as the linear, polynomial and Gaussian kernels. The SVM algorithm is given below.

[Figure 1: MMH shown in SVM classification. The figure depicts the two classes (labels y = +1 and y = −1), the bounding hyperplanes w·x + b = 1 and w·x + b = −1 each at distance 1/‖w‖ from the MMH w·x + b = 0, and the support vectors lying on the bounding hyperplanes.]

SVM Algorithm:
For classifying a training dataset, we try to estimate a function f: Rⁿ → {+1, −1}. Suppose there are two classes, denoted A and B. Class A is given by x ∈ A, y = 1 and class B by x ∈ B, y = −1, for (xᵢ, yᵢ) ∈ Rⁿ × {+1, −1}. If the given training data are linearly separable then there exists a pair (w, b) ∈ Rⁿ × R such that y(wᵀx + b) ≥ 1 for all x ∈ A ∪ B, where w is the weight vector and b is the bias.
1. The SVM can be defined as a maximal marginal classifier, in which classification is posed as the optimization problem:
   min ψ(w) = ½‖w‖²  subject to  y(wᵀx + b) ≥ 1.
2. The dual of the problem is to find the Lagrange multipliers λᵢ that maximize
   max W(λ) = Σᵢ λᵢ − ½ Σᵢ Σⱼ λᵢ λⱼ yᵢ yⱼ xᵢᵀxⱼ  subject to  λᵢ ≥ 0 and Σᵢ λᵢ yᵢ = 0.
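To make the kernel trick concrete, here is a minimal sketch (not the authors' code) that trains a soft-margin SVM with scikit-learn on a toy two-class problem, first with a linear kernel and then with the Gaussian (RBF) kernel; scikit-learn's SVC solves exactly the dual problem of step 2 above.

```python
# Sketch: linear vs. RBF-kernel SVM on a toy non-linearly-separable problem,
# illustrating the kernel trick described in Section 3.
from sklearn.datasets import make_circles
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Concentric circles: a two-class problem no linear hyperplane can separate.
X, y = make_circles(n_samples=500, noise=0.05, factor=0.5, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel, C=10).fit(X_train, y_train)
    # The support vectors are the training points with non-zero multipliers.
    print(kernel, "accuracy:", clf.score(X_test, y_test),
          "support vectors:", clf.n_support_.sum())
```

The linear kernel fails on this data while the RBF kernel separates it cleanly, which is the behavior the paragraph above describes.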

SVMs were originally designed for binary classification and were later extended to handle multiclass problems, which are solved using the one-versus-rest, one-versus-one and Directed Acyclic Graph methods.

4. DIMENSIONALITY REDUCTION TECHNIQUES
A dimension is a measurement of a certain characteristic of an object. The general goals of dimensionality reduction are to remove irrelevant and redundant data in order to reduce the computational cost and avoid over-fitting [17], and to improve the quality of the data. Dimensionality reduction is an effectual solution to the problem of the "curse of dimensionality": experiments have shown that when the number of dimensions grows linearly, the number of examples required for learning grows exponentially [18]. In practice the terms dimension, feature, variable, vector, object and attribute are used interchangeably. Consider any application in which a system processes data (e.g. speech signals, images, or patterns in general) in the form of a collection of vectors; in many such situations removing the irrelevant and redundant dimensions from the data produces successful results [19],[23].
4.1. PRINCIPAL COMPONENT ANALYSIS (PCA)
PCA is the most widely used dimensionality reduction technique; in the signal processing literature it is also known as the Karhunen-Loeve transform. Class label information is not taken into consideration in this method. It decreases the number of dimensions needed to classify new data by generating a set of principal components, which are orthonormal eigenvalue/eigenvector pairs [20]. It reduces the dimensionality of the data to the directions in the feature space where the variance is largest; the amount of the total variance captured by a component is proportional to its eigenvalue [21]. PCA can be computed by the following steps [6].
STEP 1: Get some data.
The 41 features of the preprocessed KDD Cup 99 dataset, which are redundant and correlated, are given to PCA for feature optimization.
STEP 2: Subtract the mean.
The mean of each dimension is calculated as
   X̄ = (1/n) Σᵢ₌₁..ₙ Xᵢ ----- (1)
Now subtract the mean from every value of that dimension; the mean subtracted is the average across each dimension. The resulting data set has zero mean.
STEP 3: Compute the covariance matrix.
Each entry of the covariance matrix is calculated as
   cov(X, Y) = Σᵢ₌₁..ₙ (Xᵢ − X̄)(Yᵢ − Ȳ) / (n − 1) ----- (2)
STEP 4: Compute the eigenvectors and eigenvalues of the covariance matrix.
Since the covariance matrix is square, its eigenvectors and eigenvalues can be calculated.
STEP 5: Form a feature vector by selecting components.
The components with the largest eigenvalues are chosen; these are the principal components, and together they form the feature vector.
STEP 6: Derive the new data.
After multiplying the resulting components with the mean-adjusted old data, we obtain the new, lower-dimensional data.
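The six steps above can be written directly in a few lines of NumPy. The sketch below is an illustrative implementation assuming the data matrix has samples as rows, not a reproduction of the authors' Weka-based experiment.

```python
# Sketch: PCA following STEPs 1-6 above (rows = samples, columns = features).
import numpy as np

def pca(X: np.ndarray, n_components: int) -> np.ndarray:
    X_centered = X - X.mean(axis=0)                # STEP 2: subtract the mean
    cov = np.cov(X_centered, rowvar=False)         # STEP 3: covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)         # STEP 4: eigendecomposition
    order = np.argsort(eigvals)[::-1]              # largest eigenvalues first
    components = eigvecs[:, order[:n_components]]  # STEP 5: feature vector
    return X_centered @ components                 # STEP 6: projected data

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 41))                     # stand-in for the 41 features
print(pca(X, 19).shape)                            # (100, 19), cf. Experiment 1
```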
4.2. GENERALIZED DISCRIMINANT ANALYSIS (GDA)
GDA is also called Kernel Fisher Discriminant Analysis (KFD) or Kernel Discriminant Analysis (KDA). It is a kernelized version of Linear Discriminant Analysis (LDA): it extends standard LDA from the linear domain to a nonlinear domain via the kernel trick, and it extends KFD, which is named after Fisher, to multiple classes. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be applied successfully. GDA was given by Baudat et al. [22], who assume that the kernel matrix K is non-singular, with applications on low-dimensional feature spaces.

Suppose that X is an n-dimensional sample set with N elements, let X_c denote the subset of X belonging to class c, and let C be the number of classes. X is mapped into a feature space Z by a non-linear mapping function ɸ: X → Z. The between-class scatter matrix and the within-class scatter matrix of the nonlinearly mapped data are given by the following equations [26]:
   Bᶲ = Σ_{c=1..C} M_c m_cᶲ (m_cᶲ)ᵀ ----- (3)
   Wᶲ = Σ_{c=1..C} Σ_{x∈X_c} ɸ(x) ɸ(x)ᵀ ----- (4)
where m_cᶲ is the mean of class X_c in Z and M_c is the number of instances in X_c.
The main purpose of GDA is to find the projection matrix Uᶲ that maximizes the ratio
   Uᶲ_opt = arg max |(Uᶲ)ᵀ Bᶲ Uᶲ| / |(Uᶲ)ᵀ Wᶲ Uᶲ| = [u₁ᶲ, u₂ᶲ, …, u_Nᶲ] ----- (5)
where the vectors uᶲ can be taken as solutions of the generalized eigenvalue problem
   Bᶲ uᵢᶲ = λ Wᶲ uᵢᶲ ----- (6)
The training samples are adjusted to be centered (zero mean, unit variance) in the feature space Z. From the theory of reproducing kernels, any solution uᶲ ∈ Z has to lie in the span of all training samples in Z, so it can be written as
   uᶲ = Σ_{c=1..C} Σ_{i=1..M_c} α_ci ɸ(x_ci) ----- (7)
where the α_ci are real weights and x_ci is the i-th sample of class c. The solution is obtained by solving
   λ = (αᵀ K D K α) / (αᵀ K K α) ----- (8)
where K is the kernel matrix with entries K_ij = K(x_i, x_j), arranged as a block matrix over the classes,
   K = (K_kl), k = 1, …, C, l = 1, …, C ----- (9)
and the matrix D is given by
   D_ij = 1/M_k if x_i and x_j both belong to the k-th class; 0 otherwise ----- (10)
Solving this eigenvalue problem yields the coefficient vectors α that define the projection vectors uᶲ ∈ Z. The projection of a testing vector x_test is computed as
   (uᶲ)ᵀ ɸ(x_test) = Σ_{c=1..C} Σ_{i=1..M_c} α_ci K(x_ci, x_test) ----- (11)

GDA Algorithm:
Input: training dataset X with class labels, and an input data point z.
Output: low-dimensional dataset.

1. Compute the matrices K and D from equations (9) and (10).
2. Decompose K using eigenvector decomposition.
3. Compute the eigenvectors α and eigenvalues of equation (5).
4. Compute the eigenvectors uᶲ from the α_ci using equation (7) and normalize them using equation (8).
5. Compute the projections of the test points onto the eigenvectors uᶲ using equation (11).
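A compact way to follow steps 1-5 numerically is to solve the equivalent generalized eigenvalue problem KDKα = λKKα from equations (8)-(10) directly. The sketch below does this with an RBF kernel; it is an illustrative reading of the algorithm, and the kernel width and the small regularization term added for numerical stability are our own assumptions, not part of the paper.

```python
# Sketch: GDA / kernel discriminant analysis via the alpha-space eigenproblem
# KDK·alpha = lambda·KK·alpha (equations (8)-(10)); illustrative only.
import numpy as np
from scipy.linalg import eigh
from sklearn.metrics.pairwise import rbf_kernel

def gda_fit(X, y, n_components, gamma=0.01, reg=1e-8):
    K = rbf_kernel(X, X, gamma=gamma)           # kernel matrix, eq. (9)
    n = len(y)
    D = np.zeros((n, n))                        # block matrix D, eq. (10)
    for c in np.unique(y):
        idx = np.flatnonzero(y == c)
        D[np.ix_(idx, idx)] = 1.0 / len(idx)    # 1/M_k within class k
    # Generalized eigenproblem from eq. (8); reg keeps KK positive definite.
    vals, alphas = eigh(K @ D @ K, K @ K + reg * np.eye(n))
    return alphas[:, np.argsort(vals)[::-1][:n_components]]

def gda_transform(X_train, X_new, alphas, gamma=0.01):
    # Projection of new points, eq. (11): sum_i alpha_i K(x_i, x_new).
    return rbf_kernel(X_new, X_train, gamma=gamma) @ alphas
```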
5. PROPOSED FRAMEWORK
With the large dataset and its 41 input features, the classification process, including training and testing, becomes slow. Irrelevant and redundant features lead to slow training and testing, greater resource consumption, and a poor detection rate, so they are omitted. To build a computationally capable intrusion detection system it is necessary to identify the important input features. In this paper, a statistical dimensionality reduction technique is introduced to identify the most significant attributes, which speeds up the learning process, and SVM classification is applied to obtain accurate results. In the experiments a standard data reduction scheme, Principal Component Analysis, was also performed, and the proposed approach is compared with PCA. Due to the large differences in the attack patterns of the various attack classes, there is generally an overlap between some of these classes in the feature space; in this situation, a feature transformation mechanism that maximizes the between-class scatter relative to the within-class scatter, as given in the previous section, is used. The proposed framework is shown in Fig. 2.

A. DATASET SELECTION
The KDD Cup 99 dataset of 494,020 records contains 97277, 391458, 4107, 1126 and 52 instances of Normal, DOS, Probe, R2L and U2R respectively. The dataset cannot be processed in this raw form, so pre-processing techniques have to be applied; pre-processing a dataset is therefore a crucial task in classification and also makes a difference in accuracy.

B. PRE-PROCESSING STEP
Even though the pre-processing step is often neglected, it is an important step in the data mining process.
1) The dataset contains no missing values, which is one advantage. The class attribute contains the normal label and all categories of attacks, which are not differentiated, so they are assigned names: the target attribute is converted based on the four attack categories. For example back, land, neptune, pod, smurf and teardrop belong to the DOS category, so instances with these labels are converted to DOS in the class label. Similarly all the remaining attack categories are named.
2) The symbolic attributes are converted to numeric values, so the attributes B, C and D are converted to numeric. For example, attribute B is protocol_type, which takes the values icmp, tcp and udp; these are assigned 1 for icmp, 2 for tcp and 3 for udp. Likewise the other symbolic attributes are converted.
3) Since the KDD Cup 99 dataset as retrieved is unlabeled, column headers are added specifying duration, protocol_type, src_bytes, dst_bytes, etc.
4) Then sampling is applied to the dataset. The 10% KDD Cup 99 dataset consists mostly of normal and DOS instances, with only a very minor part belonging to the remaining attacks, so we applied sampling to the normal and DOS instances while keeping all instances of the remaining attacks. Among the 97277 normal instances, 32790 are taken; from the 391458 DOS instances, 32790 are taken into the final dataset; and from the Probe, U2R and R2L attacks all instances are taken, i.e. 4107, 1126 and 52 instances respectively. After these preprocessing techniques, the final dataset contains 70865 instances. The experiments are conducted on 5 classes. From this final dataset we take two-thirds as the training dataset and one-third as the test dataset, containing 46771 and 24094 instances respectively.
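A minimal sketch of pre-processing steps 1)-4) is given below, assuming the 10% file and pandas. The sampling counts mirror those quoted above, while the file name, the abbreviated column list and the random seed are our own assumptions.

```python
# Sketch: pre-processing steps 1)-4) of Section 5.B on the 10% KDD dataset.
import pandas as pd

# Abbreviated column list, as in the earlier loading sketch (assumed names).
columns = (["duration", "protocol_type", "service", "flag", "src_bytes",
            "dst_bytes"] + [f"feature_{i}" for i in range(7, 42)] + ["label"])
df = pd.read_csv("kddcup.data_10_percent", names=columns)   # step 3: headers

# Step 1: collapse raw attack labels into the five classes (DOS shown here;
# the Probe/R2L/U2R label lists are analogous).
dos = {"back.", "land.", "neptune.", "pod.", "smurf.", "teardrop."}
df["class"] = df["label"].apply(
    lambda s: "normal" if s == "normal." else ("DOS" if s in dos else s))

# Step 2: encode symbolic attributes (B, C, D) as integers, e.g. B:
df["protocol_type"] = df["protocol_type"].map({"icmp": 1, "tcp": 2, "udp": 3})

# Step 4: down-sample normal and DOS to 32790 instances each, keep the rest.
parts = [g.sample(n=32790, random_state=0) if name in ("normal", "DOS") else g
         for name, g in df.groupby("class")]
final = pd.concat(parts)      # 70865 instances in the paper's final dataset
```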
[Figure 2: Building the proposed system framework. Flowchart: collection of KDD Cup dataset → pre-processing step → final dataset with 41 attributes, 70865 instances → dimension reduction (GDA) step → reduced dataset with 12 attributes, 70865 instances → classification using SVM step → testing the model on 24094 instances (test data) → metrics used to analyze the results.]

C. DIMENSION REDUCTION
Dimension reduction is done using PCA and GDA. Since the final dataset contains 41 attributes with 70865 instances, the number of attributes is reduced. Both experiments are run on Windows 7 with a 3.40 GHz CPU and 2.0 GB of RAM, using Java 1.6 and Weka 3.6.9.

D. CLASSIFICATION USING SVM
After the dataset is reduced, the model is constructed using Support Vector Machines. The kernel used is the Radial Basis Function with parameters C = 10 and γ = 0.01. The reason behind the choice of the RBF kernel is that it has fewer numerical difficulties, since its kernel values lie between zero
and one, while polynomial kernel values may go to infinity or zero when the degree is large. The final step consists of running the SVM algorithm on the lower-dimensional training dataset; the test dataset is then classified, and the results obtained are shown in Section 6.
EXPERIMENT 1:
The first experiment is performed with PCA. The final dataset is reduced using Principal Component Analysis, which selected 19 of the 41 attributes: A, B, C, D, E, F, J, M, P, U, V, W, X, AA, AD, AE, AF, AH, AK. Training is then done on 46771 instances using SVMs. The time taken to build the model using 19 attributes is 2350.89 s. On the test dataset, 20802 instances are correctly classified and the remaining 3294 are incorrectly classified.
EXPERIMENT 2:
Now the proposed approach, Generalized Discriminant Analysis, is applied to the final dataset; it selects 12 of the 41 attributes: C, E, F, L, W, X, Y, AB, AE, AF, AG, AI. Training is again done on 46771 instances using SVMs. The time taken to build the model using 12 attributes is 1521.82 s. We obtained 22648 correctly classified instances among the 24094 test instances, with the rest incorrectly classified.
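Both experiments follow the same pipeline and differ only in the reduction step. A hedged end-to-end sketch of that pipeline in scikit-learn follows (the paper itself used Weka 3.6.9, so this is a re-expression, with PCA standing in for the reduction stage; the scaler is our own addition).

```python
# Sketch: the experimental pipeline of Section 5 -- reduce dimensions, then
# train an RBF SVM with C=10 and gamma=0.01 as in Section 5.D.
import time
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def run_experiment(X_train, y_train, X_test, y_test, n_components):
    model = make_pipeline(StandardScaler(),
                          PCA(n_components=n_components),
                          SVC(kernel="rbf", C=10, gamma=0.01))
    start = time.time()
    model.fit(X_train, y_train)                 # cf. 2350.89 s / 1521.82 s
    print("train time: %.2f s" % (time.time() - start))
    print("correctly classified:", (model.predict(X_test) == y_test).sum())
```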
6. EVALUATION AND RESULTS
6.1. METRICS USED FOR EVALUATION
We have used three metrics to analyze the results: the confusion matrix, the Detection Rate (DR) and the False Alarm Rate (FAR).
Confusion matrix: The confusion matrix records the actual and predicted classifications made by a classifier for each class; for the proposed system it contains 5 classes, namely Normal, DOS, Probe, R2L and U2R. A confusion matrix generally contains True Positive (TP), False Negative (FN), False Positive (FP) and True Negative (TN) values, as shown in Table I, and the performance of the system is computed from the data in the matrix. In these matrices, rows represent actual categories while columns represent predicted categories. Table II shows the confusion matrix for Normal, for illustration.

TABLE I: CONFUSION MATRIX

  Actual \ Predicted | Normal | Attack
  Normal             |   TP   |   FN
  Attack             |   FP   |   TN

TABLE II: CONFUSION MATRIX FOR NORMAL FROM TABLE III

  Actual \ Predicted | Normal | Attack
  Normal             | 10473  |   668
  Attack             |  2323  | 10630

TP = actual normal instances that were correctly predicted as normal = 10473
FN = actual normal instances that were incorrectly predicted as attacks = 425 + 94 + 144 + 5 = 668
FP = actual attacks that were incorrectly labeled as normal = 1876 + 214 + 224 + 9 = 2323
TN = all remaining attacks that were correctly classified as non-normal = 9293 + 4 + 178 + 976 + 10 + 58 + 44 + 57 + 2 + 3 + 2 + 3 = 10630

Detection Rate: DR = TP / (TP + FN) × 100%
False Alarm Rate (FAR): the proportion of instances of the other classes that is falsely detected as the class under consideration: FAR = FP / (FP + TN) × 100%
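As a worked check of these formulas against the Normal-class numbers of Table II (a sketch using only the values quoted above):

```python
# Worked example: DR and FAR for the Normal class of Table II.
TP, FN, FP, TN = 10473, 668, 2323, 10630

DR = TP / (TP + FN) * 100    # 10473 / 11141 -> ~94.0%, as in Table V
FAR = FP / (FP + TN) * 100   # 2323 / 12953  -> ~17.9%, as in Table V
print(f"DR = {DR:.1f}%  FAR = {FAR:.2f}%")
```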
TABLE III: CONFUSION MATRIX OBTAINED BY SVM CLASSIFIER WITH PCA

  Actual \ Predicted | Normal |  DOS  | Probe | R2L | U2R |  %
  Normal             | 10473  |  425  |   94  | 144 |  5  |  94
  DOS                |  1876  | 9293  |    4  |   0 |  0  |  83
  Probe              |   214  |  178  |  976  |  10 |  0  |  71
  R2L                |   224  |   58  |   44  |  57 |  0  |  15
  U2R                |     9  |    2  |    3  |   2 |  3  |  16
  %                  |    82  |   93  |   87  |  27 | 38  |

The time taken to build the model: 2350.89 sec.

TABLE IV: CONFUSION MATRIX OBTAINED BY SVM CLASSIFIER WITH GDA

  Actual \ Predicted | Normal |  DOS  | Probe | R2L | U2R |  %
  Normal             | 11118  |   23  |    0  |   0 |  0  |  98
  DOS                |   998  | 10149 |   26  |   0 |  0  |  91
  Probe              |    10  |    4  | 1279  |  85 |  0  |  93
  R2L                |    90  |  120  |   75  |  98 |  0  |  26
  U2R                |    10  |    2  |    2  |   1 |  4  |  21
  %                  |    91  |   98  |   93  |  53 |  0  |

The time taken to build the model: 1521.82 sec.

TABLE V: COMPARISON OF BOTH THE MODELS

            SVM classifier with PCA    SVM classifier with GDA
            DR (%)      FAR (%)        DR (%)      FAR (%)
  Normal    94          17.92          99.7        8.5
  DOS       83.17       5.13           90.8        1.06
  Probe     70.8        0.63           92.8        0.45
  R2L       14.8        0.61           25.5        0.358
  U2R       15.7        0.02           21.0        0

7. CONCLUSIONS AND FUTURE WORK
In this paper GDA is used to select the important features for classifying the KDD Cup 99 dataset. Whereas PCA finds the directions (principal directions) that best represent the original data, GDA obtains directions that are efficient for discrimination. Among the several SVM kernels we used the RBF kernel, because it shows better performance, particularly in IDS settings. The experimental results show that our system is able to speed up the training and testing
process of Intrusion Detection Systems. Even though both GDA and SVMs are somewhat bewildering, with many mathematical equations, they are good at solving complex problems. As future work, we will extend our SVM system to build more proficient intrusion detection systems based on several other non-linear dimensionality reduction techniques.
REFERENCES
[1] Mukkamala S., Janoski G., Sung A. H., "Intrusion Detection Using Neural Networks and Support Vector Machines", Proceedings of the IEEE International Joint Conference on Neural Networks, 2002, pp. 1702-1707.
[2] Wun-Hwa Chen, Sheng-Hsun Hsu, "Application of SVM and ANN for intrusion detection", Computers & Operations Research, Elsevier, 2005.
[3] C. J. C. Burges, "Geometric Methods for Feature Selection and Dimensional Reduction: A Guided Tour", in Data Mining and Knowledge Discovery Handbook: A Complete Guide for Practitioners and Researchers, Kluwer Academic Publishers, 2005.
[4] B. Schölkopf, A. J. Smola, and K. R. Müller, "Nonlinear component analysis as a kernel eigenvalue problem", Neural Computation, 10(5):1299-1319, 1998.
[5] J. Wang, Z. Zhang, and H. Zha, "Adaptive manifold learning", in Advances in Neural Information Processing Systems, vol. 17, pp. 1473-1480, The MIT Press, Cambridge, MA, USA, 2005.
[6] Ravi Kiran Varma, V. Valli Kumari, "Feature Optimization and Performance Improvement of a Multiclass Intrusion Detection System using PCA and ANN", International Journal of Computer Applications (0975-8887), Vol. 44, No. 13, April 2012.
[7] Hansheng Lei, Venu Govindaraju, "Speeding Up Multi-class SVM Evaluation by PCA and Feature Selection", The 5th SIAM International Conference on Data Mining Workshop, California, USA, 2005.
[8] Gopi K. Kuchimanchi, Vir V. Phoha, Kiran S. Balagani, Shekhar R. Gaddam, "Dimension Reduction Using Feature Extraction Methods for Real-time Misuse Detection Systems", Proceedings of the 2004 IEEE Workshop on Information Assurance and Security, United States Military Academy, West Point, NY, June 2004.
[9] Heba F. Eid, Ashraf Darwish, Aboul Ella Hassanien, and Ajith Abraham, "Principle Components Analysis and Support Vector Machine based Intrusion Detection System", ISDA 2010, pp. 363-367.
[10] Rupali Datti, Bhupendra Verma, "Feature Reduction for Intrusion Detection Using Linear Discriminant Analysis", (IJCSE) International Journal on Computer Science and Engineering, Vol. 02, No. 04, 2010, pp. 1072-1078.
[11] Srilatha Chebrolu, Ajith Abraham, Johnson P. Thomas, "Feature deduction and ensemble design of intrusion detection systems", Computers & Security, Elsevier, 24 (2005), pp. 295-307.
[12] KDD Cup 1999, October 2007, http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[13] Mahbod Tavallaee, Ebrahim Bagheri, Wei Lu, and Ali A. Ghorbani, "A Detailed Analysis of the KDD CUP 99 Data Set", Proceedings of the 2009 IEEE Symposium on Computational Intelligence in Security and Defence Applications (CISDA 2009).
[14] P Indira Priyadarsini, Nagaraju Devarakonda, I Ramesh Babu, "A Chock-Full Survey on Support Vector Machines", International Journal of Computer Science and Software Engineering, Vol. 3, Issue 10, 2013.
[15] Mukkamala S., Janoski G., Sung A. H., "Comparison of Neural Networks and Support Vector Machines in Intrusion Detection", Workshop on Statistical and Machine Learning Techniques in Computer Intrusion Detection, June 11-13, 2002.
[16] Vapnik V., "The Nature of Statistical Learning Theory", Springer-Verlag, New York, 1995.
[17] Andrew Y. Ng, "Preventing overfitting of cross-validation data", in Proceedings of the Fourteenth International Conference on Machine Learning, pp. 245-253, 1997.
[18] R. Bellman, "Adaptive Control Processes: A Guided Tour", Princeton University Press, Princeton, 1961.
[19] U. M. Fayyad and R. Uthurusamy, "Evolving data mining into solutions for insights", Communications of the ACM, 45(8):28-31, August 2002.
[20] I. T. Jolliffe, "Principal Component Analysis", Springer-Verlag, New York, 2002.
[21] E. E. Cureton and R. B. D'Agostino, "Factor Analysis: An Applied Approach", London: Lawrence Erlbaum Associates, vol. I, 1983.
[22] G. Baudat and F. Anouar, "Generalized Discriminant Analysis Using a Kernel Approach", Neural Computation, 2000.
[23] Kai-mei Zheng, Xu Qian, Na An, "Supervised Non-Linear Dimensionality Reduction Techniques for Classification in Intrusion Detection", International Conference on Artificial Intelligence and Computational Intelligence, 2010.
[24] Boser, Guyon, and Vapnik, "A training algorithm for optimal margin classifiers", Proceedings of the Fifth Annual Workshop on Computational Learning Theory, pp. 144-152, 1992.
[25] Cortes C., Vapnik V., "Support vector networks", Machine Learning, 20:273-297, 1995.
[26] Sebastian Mika, Gunnar Rätsch, Jason Weston, Bernhard Schölkopf, and Klaus-Robert Müller, "Fisher discriminant analysis with kernels", IEEE, 1999.
