
Expert Systems With Applications 88 (2017) 249–257


A feature reduced intrusion detection system using ANN classifier


Akashdeep∗, Ishfaq Manzoor, Neeraj Kumar
UIET, Panjab University, Chandigarh, India

Article history: Received 2 August 2016; Revised 28 June 2017; Accepted 6 July 2017; Available online 8 July 2017

Keywords: Intrusion Detection System (IDS); Feature ranking; Feature reduction; ANN

Abstract

The rapid increase in internet and network technologies has led to a considerable increase in the number of attacks and intrusions. Detection and prevention of these attacks has become an important part of security. The intrusion detection system is one of the important ways to achieve high security in computer networks and is used to thwart different attacks. Intrusion detection systems suffer from the curse of dimensionality, which tends to increase time complexity and decrease resource utilization. As a result, it is desirable that only the important features of the data be analyzed by an intrusion detection system, to reduce dimensionality. This work proposes an intelligent system which first performs feature ranking on the basis of information gain and correlation. Feature reduction is then done by combining the ranks obtained from both information gain and correlation using a novel approach to identify useful and useless features. These reduced features are then fed to a feed-forward neural network for training and testing on the KDD-99 dataset. Pre-processing of the KDD-99 dataset has been done to normalize the number of instances of each class before training. The system then behaves intelligently to classify test data into attack and non-attack classes. The aim of the feature-reduced system is to achieve the same degree of performance as a normal system. The system is tested on five different test datasets, and both individual and average results of all datasets are reported. A comparison of the proposed method with and without feature reduction is made in terms of various performance metrics, and comparisons with recent and relevant approaches are also tabulated. The results obtained for the proposed method are encouraging.

© 2017 Elsevier Ltd. All rights reserved.

1. Introduction

Traditionally, intrusion was detected by authentication, encryption and decryption techniques, firewalls, etc. These are known as the first line of defense in computer security. This enables evaluation of computer programs installed on the host to detect known vulnerabilities. After evaluation, the vulnerable computer program is patched with the latest patch code (Amiri, Yousefi, Lucas, Shakery, & Yazdani, 2011). However, an attacker can bypass them easily, and the first-line defense mechanism is not flexible and powerful enough to thwart different kinds of attacks or intrusions. Antivirus software, which acts as the second line of defense, has the limitation that it can only detect attacks whose signatures are present in its database; it is limited in its ability to cope with new attacks that may appear in the hours before its next update. A stronger alternative is the Intrusion Detection System (IDS), which gathers information related to activities that violate security policies. An IDS gathers information from a network system and analyzes it in order to determine elements which violate the security policies of computers and networks. Accuracy, extensibility and adaptability are three important characteristics of an intrusion detection system. An IDS must achieve good accuracy and adaptability to counter attacks from intruders. An IDS distinguishes between legitimate and illegitimate users and must be used together with the first line of defense to thwart intrusions and aberrations from inside as well as outside attackers. An IDS is an important asset to computer security because an attacker tries to conceal his identity and launch attacks through intermediate hosts, widely known as stepping-stone intrusion. Secondly, the changing nature of technology and technique makes it more difficult to detect attacks. An IDS can therefore make use of learning techniques to detect unknown future attacks. Intrusion detection systems can be classified into misuse detection systems and anomaly detection systems; the two methods may also be combined to form a hybrid detection system.

In a misuse detection system, signatures of already known attack patterns are stored in a database and matched against network data; if the match is positive, it is declared an attack (Wu & Huang, 2010). For example, successive failed logins within a minute or two may indicate a password-guessing attack. Experts from the security field encode rules which are obtained from real intrusions. Misuse detection fails or gives less effective results in case signatures are not known or the attack varies from the actual signature

∗ Corresponding author.
E-mail addresses: akashdeep@pu.ac.in (Akashdeep), 11ubaid@gmail.com (I. Manzoor), neer2890@gmail.com (N. Kumar).

http://dx.doi.org/10.1016/j.eswa.2017.07.005

pattern (Lin, Ving, Lee, & Lee, 2012). Also, this type of detection system has the same problem as antivirus software, which needs periodic updating to detect new types of attacks. An anomaly detection system creates a normal profile by analyzing and observing the normal behavior of the network system, known as the normal or baseline profile. It then checks for any deviation from the baseline profile (Bhuyan, Bhattacharya, & Kalita, 2014); patterns which deviate from the normal profile are called outliers, anomalies or aberrations. A significant deviation from the normal profile is considered an attack. In an anomaly detection system there is no need for prior knowledge of signatures. Anomaly detection can be divided into static and dynamic. A static detector works on the principle that only a fixed part of the system which does not change is monitored, e.g. operating system software. A dynamic anomaly detector addresses network traffic data or audit records; it sets a threshold to separate normal consumption of resources from anomalous consumption. This method can detect a seen as well as a new attack, which is an advantage over misuse detection systems, but it may lead to a high rate of false alarms. Another drawback is that if attackers know they are being profiled, they can slowly change the profile to train the anomaly detection system to treat the intruder's malicious behavior as normal. False positive and false negative errors also lead to inevitable costs (Joo, Hong, & Han, 2003). Such systems can be further categorized into Network Intrusion Detection Systems (NIDS) and Host Intrusion Detection Systems (HIDS). Network-based intrusion detection monitors and analyzes network traffic to differentiate and detect normal usage patterns from attack patterns; if a malicious pattern is detected, it is said to be an intrusion. Host-based intrusion detection analyzes log files for attack signatures. A HIDS analyzes host-based audit sources such as audit trails, system logs and application logs to detect attacks. A hybrid detection system ensembles both misuse- and anomaly-based detection systems. Hybrid methods thus eradicate known attacks using a signature-based mechanism by identifying a match, and then anomaly detection checks whether there is any deviation from the normal baseline profile, to increase the detection rate and decrease the false alarm rate.

Feature ranking and reduction: Feature ranking and selection is an important perspective in intrusion detection systems for getting better performance. Methods of feature ranking and feature selection are useful to answer the question of the importance of features present in a dataset and to categorize them into features of high or low significance. These features help to classify data traffic in networks into normal or abnormal (attack) classes. However, features which contribute only marginally or not at all to detecting different kinds of attacks should be removed to get better accuracy and speed in intrusion detection systems. Removal of these features will make IDS performance better in terms of computation, dimension reduction and time complexity (Sangkatsance, Watlanapongsakorn, & CharnsriPinyo, 2011). Predicting the importance of such features is a complex task due to the lack of proper mathematical methods. Empirical methods may be used to determine the importance of these features, and mostly more than one method is used to gauge their importance. Feature reduction develops an understanding of features, leads to reduction in data, improves performance and can also aid the use of simple models for classification. A complete analysis of the various advantages of feature reduction was given by Barmejo, Ossa, Gamez, and Puerta (2012).

The literature is rich in studies on feature selection or reduction, and inspired by these studies, we introduce the development of a feature-reduction-based intrusion detection system. This study is driven by the observation that, in order to make an IDS effective, it is not computationally advantageous to work on all features collectively for differentiation of attack and non-attack cases. Therefore, the proposed system first reduces the number of features and then performs classification using supervised learning. The highlight of the proposed method is that it first ranks features according to information gain and correlation. The important features are then combined using a novel mechanism so that only useful ones are included and useless ones are discarded. A neural network using back-propagation learning is then trained with the KDD 99 training dataset. Pre-processing of the KDD dataset has been done to prevent over-training and over-fitting of data. The developed IDS is then tested using a testing dataset, and its performance is evaluated with the help of various statistical measures. The study is compared with other contemporary techniques in the literature, and the results obtained are promising.

The manuscript is arranged as follows: a brief outline of related studies is covered in Section 2; Section 3 introduces the proposed method; illustrations and results are given in Section 4. Conclusions and future directions are provided in Section 5.

2. Literature survey

An intrusion detection system faces the dimensionality curse, i.e. large datasets which simulate real network data increase the time complexity of training and testing in IDS. Large data also leads to consumption of resources and may result in less detection stability. It is pragmatic that data not contributing to detection must be eliminated before processing. This leads to the development of an effective feature extraction and reduction policy that can not only help to reduce training time but shall also provide higher accuracy and can safeguard against unknown attacks. Feature selection reduces computational complexity and information redundancy, increases the accuracy of the learning algorithm, facilitates data understanding and improves generalization. Feature selection and ranking methods are divided into two types, wrapper and filter methods (Barmejo et al., 2012). Filter methods use some predefined criteria in order to select features from the data set, eliminating irrelevant features. Wrapper methods, on the other hand, are based on training data to evaluate features.

Amiri et al. (2011) proposed some feature selection algorithms and developed an IDS using a support vector machine. They carried out feature investigation by two feature selection methods, the linear correlation coefficient and the forward feature selection algorithm (FFSA), and proposed modified mutual information feature selection (MMIFS). They also compared the results obtained by the three feature selection methods and analyzed their effects. The data set used was KDD cup 99 (KDD Dataset, 1999). Experiments were performed on the Windows platform, and results showed that modified mutual information feature selection is more effective in detecting the Probe and R2L attack classes, while FFSA performance was good in detecting the U2R, DoS and normal profile classes. Li et al. (2012) performed preprocessing by k-means clustering to get a compact dataset, then selected a small training data set with the help of ant colony optimization (ACO). They performed feature reduction to reduce the features from 41 to 19 in the KDD data set. Sangkatsance et al. (2011) proposed a real-time intrusion detection system (RT-IDS) and, in the first step, extracted 12 essential features from the network packet header. In the second step, information gain (IG) was used to analyze their importance in detecting various types of attacks. RT-IDS achieved a detection rate of 98% in the denial of service and probing attack classes. Liu, Sui, and Xiao (2014) proposed a clustered mutual information hybrid method for feature reduction. Clustering of features was done in an unsupervised stage based on similarity. Supervised learning was used to select representative features to increase similarity with response features representing class labels. Xiao, Liu, and Xiao (2009) proposed a two-step feature algorithm for IDS in which redundant features were eliminated by a mutual information method, and experiments were carried out on the KDD cup 99 dataset. Improvement in processing speed and better accuracy were achieved by their method.

Bolon-Canedo, Sanchez-Marono, and Alonso-Betanzos (2011) obtained a reduced set of features with the help of feature reduction algorithms like correlation and INTERACT (based on symmetrical uncertainty). An ensemble method was applied to get better accuracy in detection rate. This method was a combination of discretizers, filters and classifiers to achieve better performance, and in most cases feature reduction was more than 80%. Al-Jarrah et al. (2014) proposed the Random Forest-Forward Selection Ranking (RF-FSR) and Random Forest-Backward Elimination Ranking (RF-BER) feature selection methods. In RF-FSR, two subsets were formed, the first containing the three highest-weighted features, with all remaining features placed in the other set; features were then added to the first set one by one to check the detection rate. In RF-BER, the feature having the lowest weight was eliminated to check the effect on the detection rate. The resulting feature set was compared with well-known feature sets, and the proposed method resulted in an increase in detection rate and a decrease in false alarm rate to 0.01%. Karimi, Mansour, and Harounabadi (2013) merged two different feature sets, the first created by applying information gain and the second by applying symmetrical uncertainty. The combined features were weighted and ranked to get the most important features. Experimental results showed that the detection rate was statistically better compared to other feature selection algorithms. Mukherjee and Sharma (2012) analyzed the performance of feature selection by correlation, information gain and gain ratio using the WEKA 3.6 tool. They also proposed a feature vitality based reduction method to identify important features and used a Naive Bayes classifier to classify different types of attacks in the dataset. Results in detection rate were good, and their method achieved better accuracy in the U2R attack class.

Sung and Mukammala (2003) proposed feature ranking on the basis of the support vector decision function. A support vector machine and a neural network were used for the classification process. The detection accuracy achieved was good in all the attack classes. Barmejo et al. (2012) proposed a method which deals with subset selection in datasets with a very large number of attributes. Their goal was to maintain good performance with a reduced number of wrapper evaluations. The algorithm switches between filter ranking and wrapper feature subset selection to achieve better performance. The method was tested on 11 high-dimension data sets using different classifiers. Uguz (2011) used feature selection in text categorization by means of a two-stage feature extraction and selection algorithm. In the first stage, information gain was used, and in the second stage, principal component analysis (PCA) and a genetic algorithm were used. k-nearest neighbor and the C4.5 decision algorithm were used for classification. The datasets used were Reuters-21578 and Classic3, and high categorization effectiveness was achieved by the proposed method. A conditional mutual information based method for feature selection was proposed by Fleuret (2004). He compared the proposed algorithm with other feature selection algorithms and showed that the conditional mutual information method along with a Naive Bayes classifier has better performance than methods like support vector machines. Chebrolu, Abraham, and Thomas (2005) used a Markov blanket model and decision tree analysis in feature selection to identify the importance of different features. Bayesian networks and regression trees were used for classification. Mukkamela and Sung (2006) deleted each feature one by one to see the change in the detection rate. They extracted 19 significant features from the 41 features, and results showed that the performance change was statistically insignificant.

Table 1 summarizes important points of the various studies available in the literature. In spite of the availability of a large number of studies, the major research gaps which lay the foundation of the current study are summarized here. The cited literature indicates that predicting an optimal number of features to increase the accuracy of an intrusion detection system and reduce training time complexity is still an open issue. Studies are available, but these have been targeted to improve performance of one or more attack classes; a generic model that fits well for attack and non-attack classes is still desired. It has also been evident that less detection stability, or true positive rate, has been observed in the case of less frequent attacks like the U2R and R2L classes. The existing approaches also suffer from higher false alarm rates due to higher false positives for frequently occurring attacks. Redundant and irrelevant data also tend to increase the overall complexity of the intrusion detection system. The reason for such variability is that training data for some classes is abundant whereas for other classes it is very scarce. Based on these literature gaps, an intelligent system is proposed in this study which performs pre-processing of data as an initial step to remove redundant data from the dataset. This step has helped to overcome higher false alarm rates, and classes having fewer tuples were also normalized. The method then performs dimensionality reduction to decrease time complexity and increase resource utilization. The system employs a unique mechanism to combine information gain with correlation-based features to find features with higher utility values. A classifier based on an artificial neural network (ANN) has been implemented for training and testing of the system; the ANN was used to increase the effectiveness of the classification process. The system has been tested on the KDD 99 dataset, and the results are encouraging. The next section provides details of the proposed method.

Fig. 1. Diagrammatic representation of proposed method.

3. Proposed method

The proposed methodology is shown in Fig. 1. The first step involves selection of the KDD-99 dataset, which is a benchmark dataset in intrusion detection. KDD is actually raw TCP/IP dump

Table 1
Summarization of various feature selection and reduction approaches.

Amiri et al. (2011)
  Remarks: Compared the results of three feature selection methods and analyzed their effects.
  Advantages: MMIFS was effective in detecting Probe and R2L attacks, while FFSA was good in detecting U2R, DoS and normal profile attacks.
  Limitations: The method shows less detection in DoS and R2L attacks.

Li et al. (2012)
  Remarks: Used feature reduction to reduce 41 features to 19.
  Advantages: Achieved higher overall accuracy.
  Limitations: The system was able to recognize only 71% of normal attacks.

Sangkatsance et al. (2011)
  Remarks: Extracted 12 essential features from the network packet.
  Advantages: Achieved a 98% detection rate in the DoS and Probe classes.
  Limitations: Results for R2L and U2R attacks were not presented.

Xiao et al. (2009)
  Remarks: Redundant features were removed by mutual information.
  Advantages: Processing speed and accuracy were improved.
  Limitations: Experiments showed good results in DoS and Probing attacks only.

Bolon-Canedo et al. (2011)
  Remarks: Used correlation and symmetrical uncertainty to reduce features.
  Advantages: In most cases, feature reduction was more than 80%.
  Limitations: Detection rates in the Normal, DoS and U2R classes are low.

Al-Jarrah et al. (2014)
  Remarks: Two feature sets were formed and the lowest-weight features were removed.
  Advantages: Significant increase in detection rate and decrease in false alarm rate to 0.01%.
  Limitations: Results were shown in the form of accuracy only.

Karimi et al. (2013)
  Remarks: Merged two feature sets obtained by applying information gain and symmetrical uncertainty.
  Advantages: Detection rate was improved.
  Limitations: Detection accuracy in U2R and R2L needs to be improved further.

Mukherjee and Sharma (2012)
  Remarks: Proposed a feature vitality based reduction method to identify important features.
  Advantages: Achieved a good detection rate and better accuracy in U2R attacks.
  Limitations: Detection rate of U2R attacks, with complexity and overheads, needs to be improved.

Sung and Mukammala (2003)
  Remarks: Used SVM and neural network for classification of features.
  Advantages: Compared SVM with neural networks and found that SVM has more scalability, i.e. SVM can be used on large datasets.
  Limitations: SVM only makes binary classifications, and neural networks took more training time than SVM.

Uguz (2011)
  Remarks: Used a two-stage feature extraction method.
  Advantages: High categorization effectiveness was achieved.
  Limitations: Results were not appreciable for all classes.

Fleuret (2004)
  Remarks: Proposed a mutual information based method for feature selection.
  Advantages: Their method along with Naive Bayes has better performance than SVM.
  Limitations: Focused more on processing time.

Chebrolu et al. (2005)
  Remarks: Classified features according to their importance.
  Advantages: Identified all types of attacks with 12 features only.
  Limitations: U2R attacks have a lower detection rate.

Mukkamela and Sung (2006)
  Remarks: Removed features one by one to check the change in detection rate.
  Advantages: Comparative study of SVMs, MARS and LGPs with elimination of features one by one.
  Limitations: The most important features in the Normal, DoS and U2R classes overlap with each other; accuracy for the Probe and DoS classes is low.

Horng et al. (2010)
  Remarks: A hierarchical clustering algorithm provided the SVM with fewer, abstracted, and higher-qualified training instances.
  Advantages: Better performance in detection of DoS and Probe attacks.
  Limitations: Very low detection accuracy for the R2L and U2R classes.

data which was acquired by the MIT Lincoln lab by simulating a US Air Force LAN. It was operated like a real network, and different intentional or pseudo attacks were launched on it. KDD features fall into four categories and are both quantitative and qualitative in nature. The KDD-99 dataset consists of five classes, out of which one is the normal class and the other four are attack classes, known as DoS, U2R, R2L and Probe, having redundancy and imbalance. Denial of service (DoS) is the type of attack in which legitimate users are denied or kept waiting for resources because attackers make the resources so busy that legitimate users are not able to use them or their requests for resources are denied; examples are Smurf, Neptune, teardrop, back, etc. In Probing, attackers gather all information regarding the computer network and look for weak points to launch an attack. Port scanning is one of the major attacks of this category; others are ip-sweep, saint, nmap, etc. In a remote to local (R2L) attack, attackers exploit computer system vulnerabilities to gain access as a local user. The attacker tries to obtain an account on the victim machine by guessing a password or by spying. Guess-password, multi-hop, phf, spy, warezclient, etc. are examples of R2L attacks. In user to root (U2R) attacks, attackers having local access to the system exploit its weak points to get root privileges; examples are buffer overflow, root-kit, load-module, perl, etc.

The 10% subset of KDD-99 consists of a total of 494,020 instances, out of which 97,277 are normal instances, 391,458 are denial of service instances, 4107 are probe instances, only 52 instances belong to user to root, and 1126 are remote to local instances. Data preprocessing has been done manually by removing duplicate instances from the KDD-99 dataset and separating the instances into different classes. The method starts by removal of some redundant instances in the high-frequency classes. The result of the data preprocessing step is a compact dataset with removal of redundancy and imbalance. In the next step, feature ranking is performed by two algorithms, namely information gain and correlation. Information gain calculates the entropy of each feature; the higher the entropy, the more information content it has. It determines which attributes in a given set of feature vectors are useful for learning purposes. These features will be used by the classification algorithm to distinguish unknown instances into different attack classes. The second method used to rank features is correlation: the lower the correlation of an attribute in the feature vector, the more is its power to distinguish between types of attacks in a multiclass problem. Ranked features from the previous step are then divided into subsets. Information gain features are divided into three subsets named IG-1, IG-2 and IG-3, and correlation features are divided into three subsets named CR-1, CR-2 and CR-3. The IG-1 and CR-1 subsets are built such that both contain the first 10 features, ranked 1 to 10 by information gain and correlation respectively; IG-2 and CR-2 consist of features ranked in the range 11 to 30, and IG-3 and CR-3 contain the rest of the features. We call these the strongly useful, useful and useless feature subsets respectively. Strongly useful features have a high ranking and hence a high ability to differentiate instances into different classes; useful features contain more information, or ability to differentiate instances into different classes, than useless features. IG-1 and CR-1 subset features have a higher ranking than the features in IG-2 and CR-2, and similarly IG-2 and CR-2 subset features have a higher ranking than the features in IG-3 and CR-3. In the next step, the union of the IG-1 and CR-1 feature subsets is performed.
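The subset-combination rule (keep the union of the strongly useful subsets, keep the intersection of the useful subsets, discard the rest) can be checked directly against the subset contents reported in Section 4. The Python rendering below is a sketch for illustration, not the authors' own code.

```python
# Feature indices (into the 41 KDD-99 features) as reported in Section 4.
IG1 = {4, 37, 41, 22, 32, 34, 40, 39, 31, 14}
CR1 = {33, 2, 41, 27, 22, 14, 37, 38, 12, 39}
IG2 = {33, 29, 36, 30, 28, 35, 15, 20, 38, 9, 1, 8, 13, 11, 6, 19, 12, 26, 27, 10}
CR2 = {4, 16, 8, 13, 5, 6, 7, 3, 19, 20, 17, 10, 1, 24, 9, 11, 23, 15, 21, 18}

strongly_useful = IG1 | CR1                 # union: keep every top-ranked feature
useful = IG2 & CR2                          # intersection: keep features both methods agree on
reduced = sorted(strongly_useful | useful)  # IG-3/CR-3 ("useless") features are dropped

print(len(strongly_useful))  # 15, since five features are common to IG-1 and CR-1
print(len(reduced))          # 25, the size of the reduced feature set
```

The 15-feature union reproduces the set <2,4,12,14,22,27,31,32,33,34,37,38,39,40,41> given in the text, and adding the 10 features in the IG-2/CR-2 intersection yields the 25 inputs later fed to the neural network.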

Table 2
Sample distributions of instances for five classes in the training dataset.

Category of class    Number of instances              Percentage of class occurrence
Normal               25,000                           42.65%
U2R                  105 (35, re-sampled 3 times)     0.17%
DoS                  30,020                           51.22%
R2L                  751                              1.28%
Probe                2738                             4.67%
Total                58,614                           100%

Table 3
Sample distributions of instances for five classes in a single test dataset.

Category of class    Number of instances    Percentage of class occurrence
Normal               14,000                 41.76%
U2R                  35                     0.10%
DoS                  16,000                 47.72%
R2L                  751                    2.24%
Probe                2738                   8.17%
Total                33,524                 100%
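The percentages in Tables 2 and 3 follow directly from the instance counts. The small sketch below (an illustration, not the authors' code) reproduces them, with the training U2R count taken as the 35 original instances re-sampled three times, as the text describes:

```python
# Instance counts from Tables 2 and 3. Training U2R = 35 originals
# re-sampled 3 times, as stated in the text.
train_counts = {"Normal": 25000, "U2R": 35 * 3, "DoS": 30020, "R2L": 751, "Probe": 2738}
test_counts = {"Normal": 14000, "U2R": 35, "DoS": 16000, "R2L": 751, "Probe": 2738}

def class_shares(counts):
    """Total instance count and per-class percentage, two decimal places."""
    total = sum(counts.values())
    return total, {cls: round(100 * n / total, 2) for cls, n in counts.items()}

print(class_shares(train_counts))  # training: total 58614, Normal share 42.65
print(class_shares(test_counts))   # test: total 33524, Normal share 41.76
```

The training total of 58,614 agrees with the figure quoted in the running text of Section 4.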

The intersection of the IG-2 feature subset with the CR-2 feature subset is then calculated, and a new feature set is generated. The rest of the features, present in IG-3 and CR-3, were removed because their presence makes negligible difference to intrusion detection. Union is implemented to make sure that all important features qualify for the next level of the detection and classification process. A reduced data set consisting of 25 features is obtained by applying the union operation to the results obtained in the previous step.

The next step involves training of the classification network, which has been implemented as a feed-forward neural network. A feed-forward neural network model is constructed in which every neuron of a layer is connected to the neurons of the next layer. In our case, we have a three-layer feed-forward neural network. The input layer has 25 neurons, equal to the number of input features. The output layer has 5 neurons, equal to the five output classes, i.e. Normal, DoS, U2R, R2L and Probe. The middle layer has 10 neurons; the number of middle layer neurons is determined by the empirical formula √(I + O) + α (α = 1–10), where I is the number of input layer neurons, O is the number of output layer neurons and α is a random number in the range 1 to 10. Training of the feed-forward network is done by the Levenberg-Marquardt training method, in which weights and biases are changed to train the neural network; the Levenberg-Marquardt method is the fastest back-propagation method but may need more memory. The activation of the feed-forward neural network is implemented by calculating the weights of each connection between neurons and the biases of each neural network layer; these weights keep changing to perform training of the neural network. Advantages of using an artificial neural network are knowledge discovery about dependencies without prior knowledge, robustness to inaccuracies and high efficiency. The network is trained using a training dataset created by extracting tuples from the KDD dataset. A suitable ratio of the number of samples has been maintained among the various classes. MATLAB version 2013 was used to perform feature ranking by information gain and correlation and to measure classification performance for these features. The training dataset contains a total of 58,614 instances, out of which 30,020 are DoS instances, 25,000 are normal, 2738 are Probe, 751 are remote to local and 35 are user to root. Re-sampling of the user to root instances is done to increase their count. After training, testing is performed on five different test datasets. Each test dataset consists of 33,524 instances, comprising a mixture of known and unknown instances in equal proportion. Out of the 33,524 instances, 16,000 are DoS, 14,000 are normal, 751 are R2L, 35 are U2R, and 2738 are Probe instances. We performed testing on five different datasets, all containing different types of instances. Tables 2 and 3 show the distribution of instances into different classes for training and testing purposes. Results of each dataset were taken, and the average of all was tabulated by calculating statistical measures like true positive rate, false positive rate, precision and recall. The next section gives details about the results and analysis of the proposed method.

4. Illustrations and results

The KDD-99 dataset was observed to have redundancy and class imbalance; therefore, a compact dataset was formed manually in which all 41 features are present. In the second step, the compact dataset was imported, and the feature ranking and selection algorithms were applied. The information gain algorithm was applied by calculating the entropy of each feature to know the extent of information present in the different features of the dataset. Correlation returns a matrix containing the pair-wise linear correlation coefficient between each pair of columns; the correlation coefficient of every feature is then calculated by taking the mean of every column. Feature rankings from both methods were prepared, and the corresponding ranking results are shown in Table 4, in which features like Dst-host-count, Dst-host-srv-count, Dst-host-srv-diff-host-rate, flag and protocol-type have higher rankings. These features are represented by feature number in the table for reasons of brevity; readers may refer to Appendix A for resolving the numbers of the different features to names. Many of these features frequently change values in the corresponding feature column, while features like Num-root, Num-file-creation, Num-shell and Is-host-login have the lowest rankings, as these features contain values which remain constant throughout. After performing the ranking of features, three feature subsets were formed as strongly useful, useful and useless features. Strongly useful features cannot be removed from the dataset, as this may lead to a decrease in the accuracy of the proposed method. Useful features, being more important than useless features, cannot be eliminated either, as these features also help in the detection of different attacks like the DoS attack. However, any useless feature which does not have a significant contribution in differentiating different types of attacks, or in distinguishing between normal and abnormal data, can be removed. One research problem that we address here is to calculate the optimal number of features in feature ranking and feature selection to get higher accuracy in classification methods for intrusion detection.

Information gain ranked features are divided into three subsets, IG-1, IG-2 and IG-3, on the basis of the ranking attained by them under the information gain algorithm. The IG-1 subset consists of the first 10 features, ranked 1 to 10; the IG-2 subset contains 20 features, ranked 11 to 30; and the IG-3 subset contains 11 features, ranked 31 to 41. The IG-1 subset consists of features <4,37,41,22,32,34,40,39,31,14>, the IG-2 subset consists of features <33,29,36,30,28,35,15,20,38,9,1,8,13,11,6,19,12,26,27,10>, and the IG-3 subset consists of features <17,18,2,3,23,5,25,7,24,16,21>.

Correlation-ranked features are likewise divided into three subsets, CR-1, CR-2 and CR-3. The CR-1 subset consists of the first 10 features ranked by the correlation algorithm, which are <33,2,41,27,22,14,37,38,12,39>; CR-2 consists of the 20 features ranked in the range 11 to 30, <4,16,8,13,5,6,7,3,19,20,17,10,1,24,9,11,23,15,21,18>; and the CR-3 subset consists of the features ranked between 31 and 41, <25,26,29,35,28,30,36,31,40,34,32>.

In the next step, the union of the IG-1 feature subset and the CR-1 feature subset is performed. The union operation is used here because it will include all strongly useful features present in IG-1 and CR-1. The result of this operation has 15 features, <2,4,12,14,22,27,31,32,33,34,37,38,39,40,41>, since five features are common in
Since it is not possible to carry out experiments with whole both feature subsets IG-1 and CR-1. Intersection of IG-2 and CR-2
KDD-99 dataset because of dimensionality, existence of redun- is performed and result of two subsets consists total of 10 features
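For illustration, the kind of information-gain ranking behind Table 4 can be sketched as follows. This is a generic, minimal sketch on toy discrete data, not the authors' MATLAB implementation; the feature values and labels below are hypothetical.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy H(Y) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature."""
    n = len(labels)
    h_cond = 0.0
    for v in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == v]
        h_cond += (len(subset) / n) * entropy(subset)
    return entropy(labels) - h_cond

# A feature that perfectly separates the classes gets the full label
# entropy as its gain; an uninformative feature gets zero.
labels = ["attack", "attack", "normal", "normal"]
print(information_gain([1, 1, 0, 0], labels))  # 1.0
print(information_gain([0, 1, 0, 1], labels))  # 0.0
```

Ranking the 41 KDD features then amounts to sorting them by this score, which is how a ranking table such as Table 4 is produced.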
Table 4
Ranking of features by the information gain and correlation methods.

Method            # Features   Ranking
Information gain  41           4,37,41,22,32,34,40,39,31,14,33,29,36,30,28,35,15,20,38,9,1,8,13,11,6,19,12,26,27,10,17,18,2,3,23,5,25,7,24,16,21
Correlation       41           33,2,41,27,22,14,37,38,12,39,4,16,8,13,5,6,7,3,19,20,17,10,1,24,9,11,23,15,21,18,25,26,29,35,28,30,36,31,40,34,32
Table 5
Feature reduced dataset with 25 selected features.

Dataset                  # Features   Selected features
Feature reduced dataset  25           1,2,4,6,8,9,10,11,12,13,14,15,19,20,22,27,31,32,33,34,37,38,39,40,41
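The union-and-intersection combination described in this section reproduces the 25 features of Table 5 directly from the ranked subsets of Table 4; a minimal sketch:

```python
# Top-10 (rank 1-10) and middle (rank 11-30) subsets from Table 4.
IG1 = {4, 37, 41, 22, 32, 34, 40, 39, 31, 14}
IG2 = {33, 29, 36, 30, 28, 35, 15, 20, 38, 9, 1, 8, 13, 11, 6, 19, 12, 26, 27, 10}
CR1 = {33, 2, 41, 27, 22, 14, 37, 38, 12, 39}
CR2 = {4, 16, 8, 13, 5, 6, 7, 3, 19, 20, 17, 10, 1, 24, 9, 11, 23, 15, 21, 18}

strongly_useful = IG1 | CR1   # union keeps every top-ranked feature (15 features)
useful = IG2 & CR2            # intersection keeps features both rankers agree on (10)
reduced = sorted(strongly_useful | useful)

print(len(strongly_useful), len(useful), len(reduced))  # 15 10 25
print(reduced)
# [1, 2, 4, 6, 8, 9, 10, 11, 12, 13, 14, 15, 19, 20, 22, 27, 31, 32, 33, 34, 37, 38, 39, 40, 41]
```

The design choice is asymmetric on purpose: a union is forgiving at the top of both rankings (a feature highly ranked by either criterion is kept), while an intersection is strict in the middle (a mid-ranked feature is kept only if both criteria retain it).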
The intersection of IG-2 and CR-2 is then taken, and the result consists of a total of 10 features, <1,6,8,9,10,11,13,15,19,20>. Subsets IG-3 and CR-3 are eliminated because they do not have much influence on the accuracy of the classification method; instead, they only increase the training time. In the next step the feature reduced dataset, which is the combination of the features obtained by the union of IG-1 and CR-1 and the intersection of IG-2 and CR-2, is formed. Table 5 shows the feature reduced dataset, which contains the total of 25 features used in the feature reduced training and testing datasets to carry out the experiments.

An artificial neural network based classifier, as defined in the previous section, was then set up for testing the effectiveness of the proposed method. The classifier was trained with the training dataset and various experiments were performed. The performance of the system has been observed using a number of statistical measures: true positive rate (sensitivity), false positive rate (specificity), precision and recall. A confusion matrix, also known as an error matrix, is generated and utilized for measuring the performance of the classifier. In the confusion matrix, columns correspond to actual classes and rows correspond to predicted classes. Values of true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN) can be easily obtained. A true positive is a correct prediction of the classifier as attack when the actual test instance is an attack. A true negative is a correct prediction as normal when the actual test instance is normal. A false positive is an incorrect prediction as attack when the actual instance is normal. A false negative is an incorrect prediction as normal when the actual test instance is an attack.

TPR or Sensitivity = TP / (TP + FN)    (1)

FPR or (1 - Specificity) = FP / (FP + TN)    (2)

Precision = TP / (TP + FP)    (3)

Accuracy = (TP + TN) / (TP + TN + FP + FN)    (4)

Sensitivity, also known as true positive rate or recall, measures the proportion of actual positive cases that are correctly identified as such. Thus sensitivity quantifies the avoidance of false negatives, as specificity does for false positives. For any test, there is usually a trade-off between these measures. Eqs. (1)-(4) have been used to evaluate our performance parameters. We here present sample values obtained for our test datasets.

Table 6
Confusion matrix for test dataset-1 (rows: output/predicted class; columns: target/actual class).

Output class   Normal   U2R   DoS      R2L   Probe
Normal         13,833   9     14       92    25
U2R            4        17    0        1     0
DoS            23       5     15,982   3     2
R2L            57       4     1        654   0
Probe          83       0     3        1     2711

Table 6 shows the results for test dataset-1 in the form of a confusion matrix. The statistical parameter true positive is shown across the main diagonal of the confusion matrix, while the others are calculated using the formulae given above. The main diagonal corresponds to the true positive samples, which are 13,833 for the Normal class, 17 for U2R, and so on. The false negatives of a particular class can be calculated from the corresponding row without including the true positives of that class; for example, the Normal class false negatives equal 140, which is the sum of 9, 14, 92 and 25, excluding the true positives. The false positives of a class can be calculated from the respective column without including the true positives of that class; for example, the false positives of the Normal class are obtained by adding 4, 23, 57 and 83, which gives 167. To calculate the true negatives of the Normal class, eliminate column one and row one of the confusion matrix and add the rest of the sub-matrix, which also includes the true positives of the other classes.

Table 7
Statistical parameters for test dataset-1.

S. No   Parameters       Normal   U2R      DoS      R2L      Probe
1       True positive    13,833   17       15,982   654      2711
2       False negative   140      5        33       62       87
3       False positive   167      18       18       97       27
4       True negative    19,384   33,484   17,491   32,711   30,699
5       TPR (%)          99.0     77.3     99.8     91.3     96.9
6       FPR              0.0086   0.0005   0.0010   0.0030   0.0009
7       Precision (%)    98.6     48.6     99.9     87.1     99.0
8       Recall (%)       99.0     77.3     99.8     91.3     96.9
9       Accuracy (%)     99.08    99.93    99.85    99.53    99.66

Table 7 summarizes the values of the various statistical parameters undertaken for test dataset-1. It can be seen from Table 7 that the values of false positives and false negatives are small, which is good for the system: a false negative compromises the security of the system by allowing malicious data, falsely predicted as normal by the intrusion detection system, to enter the network, while false positives increase overheads and may cost time and system resources. However, it is not possible to eliminate all false positives and false negatives, because there is a trade-off between these parameters. True positives and true negatives must be on the high side so that the accuracy of the system in detecting different attacks increases, which is the case for our system. The above-mentioned process was repeated for all test datasets. The values obtained for test datasets 2, 3, 4 and 5 are presented in Tables 8-15 respectively.
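The per-class calculations illustrated above can be reproduced directly from the Table 6 matrix, following the paper's row/column convention (false negatives from the row, false positives from the column); a minimal sketch:

```python
# Confusion matrix from Table 6 (rows: predicted class, columns: actual class),
# class order: Normal, U2R, DoS, R2L, Probe.
cm = [
    [13833,  9,    14,  92,   25],
    [    4, 17,     0,   1,    0],
    [   23,  5, 15982,   3,    2],
    [   57,  4,     1, 654,    0],
    [   83,  0,     3,   1, 2711],
]
total = sum(sum(row) for row in cm)  # 33,524 test instances

def per_class(i):
    tp = cm[i][i]
    fn = sum(cm[i]) - tp             # rest of the row, as described in the text
    fp = sum(r[i] for r in cm) - tp  # rest of the column
    tn = total - tp - fn - fp        # remaining sub-matrix
    return tp, fn, fp, tn

tp, fn, fp, tn = per_class(0)          # Normal class
print(tp, fn, fp, tn)                  # 13833 140 167 19384
print(round(100 * tp / (tp + fn), 1))  # TPR by Eq. (1): 99.0
```

The printed values match the Normal column of Table 7 (FN = 140, FP = 167, TN = 19,384, TPR = 99.0%).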
Tables 7, 9, 11, 13 and 15 show that the FPs and FNs are small, which is good for the system, because a false negative can compromise the security of the system by allowing malicious data to enter the network. The values of the statistical parameters precision and recall do not depend on the size of the dataset and are also appreciable across the various datasets. Table 16 shows the empirical results obtained by averaging the values across all datasets. The table shows that the proposed method achieves a better detection rate in all classes. The proposed method was able to achieve an 86.6% detection rate for U2R and a 91.9% detection rate for R2L, which is really appreciable. Also, the false alarm rate achieved in the U2R, DoS, R2L and Probe classes was very low, a good indication that the number of false detections was also very low. The values of precision and recall are also appreciable.

Table 8
Confusion matrix for test dataset-2 (rows: output class; columns: target class).

Output class   Normal   U2R   DoS      R2L   Probe
Normal         10,255   10    9        88    66
U2R            0        12    0        0     0
DoS            3688     4     15,991   1     0
R2L            29       7     0        662   0
Probe          28       2     0        0     2672

Table 9
Statistical parameters for test dataset-2.

S. No   Parameters       Normal   U2R      DoS      R2L      Probe
1       True positive    10,255   12       15,991   662      2672
2       False negative   173      0        3693     36       30
3       False positive   3745     23       9        89       66
4       True negative    19,351   33,489   13,831   32,737   27,988
5       TPR (%)          98.3     100      81.2     94.8     98.9
6       FPR              0.162    0.0007   0.0006   0.0027   0.0023
7       Precision (%)    73.7     34.3     99.9     88.1     97.6
8       Recall (%)       98.3     100      81.2     94.8     98.9
9       Accuracy (%)     88.31    99.93    88.95    99.62    99.68

Table 10
Confusion matrix for test dataset-3 (rows: output class; columns: target class).

Output class   Normal   U2R   DoS      R2L   Probe
Normal         12,391   9     0        92    25
U2R            2        17    0        1     0
DoS            403      5     16,000   3     2
R2L            66       4     0        654   0
Probe          1138     0     0        1     2711

Table 11
Statistical parameters for test dataset-3.

S. No   Parameters       Normal   U2R      DoS      R2L      Probe
1       True positive    12,391   17       16,000   654      2711
2       False negative   126      3        413      70       1139
3       False positive   1609     18       0        97       27
4       True negative    19,398   33,489   17,111   32,703   29,647
5       TPR (%)          99.0     85       97.5     90.3     70.4
6       FPR              0.0765   0.0005   0.0000   0.0029   0.0009
7       Precision (%)    88.5     48.6     100      87.1     99.0
8       Recall (%)       99       85       97.5     90.3     70.4
9       Accuracy (%)     94.82    99.93    98.76    99.49    96.52

Table 12
Confusion matrix for test dataset-4 (rows: output class; columns: target class).

Output class   Normal   U2R   DoS      R2L   Probe
Normal         13,299   10    0        88    66
U2R            2        12    0        0     0
DoS            135      4     16,000   1     0
R2L            59       7     0        662   0
Probe          505      2     0        0     2672

Table 13
Statistical parameters for test dataset-4.

S. No   Parameters       Normal   U2R      DoS      R2L      Probe
1       True positive    13,299   12       16,000   662      2672
2       False negative   164      2        140      66       507
3       False positive   701      23       0        89       66
4       True negative    14,164   33,470   17,367   32,707   30,279
5       TPR (%)          98.8     85.7     99.1     90.9     84.1
6       FPR              0.0471   0.0006   0.0000   0.0027   0.0022
7       Precision (%)    95       34.35    100      88.1     97.6
8       Recall (%)       98.8     85.7     99.1     90.9     84.1
9       Accuracy (%)     96.94    99.92    99.58    99.53    98.29

Table 14
Confusion matrix for test dataset-5 (rows: output class; columns: target class).

Output class   Normal   U2R   DoS      R2L   Probe
Normal         12,403   9     9        92    25
U2R            2        17    0        1     0
DoS            1509     5     15,991   3     2
R2L            48       4     0        654   0
Probe          38       0     0        1     2711

Table 15
Statistical parameters for test dataset-5.

S. No   Parameters       Normal   U2R      DoS      R2L      Probe
1       True positive    12,403   17       15,991   654      2711
2       False negative   135      3        1519     52       39
3       False positive   1597     18       9        97       27
4       True negative    19,372   33,486   15,988   32,704   30,747
5       TPR (%)          98.9     85       91.3     92.6     98.6
6       FPR              0.0761   0.0005   0.0006   0.0029   0.0008
7       Precision (%)    88.6     92.6     99.95    87.1     99
8       Recall (%)       98.9     98.6     91.35    92.6     99
9       Accuracy (%)     94.83    99.93    95.43    99.55    99.80

Table 16
Average statistical parameters of the proposed method with feature reduction.

S. No   Parameters           Normal    U2R      DoS      R2L      Probe
1       TPR or Sensitivity   98.8      86.6     93.8     91.9     89.8
2       FPR                  0.06558   0.0005   0.0004   0.0028   0.0014
3       Precision            88.9      42.88    99.9     87.5     98.4
4       Recall               98.8      86.6     93.8     91.9     89.8

Table 17
Comparison of the average statistical parameters of the two methods.

Proposed method             Class    TPR    FPR      Precision   Recall
With feature reduction      Normal   98.8   0.0655   88.9        98.8
                            U2R      86.6   0.0005   42.9        86.6
                            DoS      93.8   0.0004   99.9        93.8
                            R2L      91.9   0.0028   87.5        91.9
                            Probe    89.8   0.0014   98.4        89.8
Without feature reduction   Normal   99.3   0.0835   85.9        99.3
                            U2R      81.4   0.0005   49.7        81.4
                            DoS      90.4   0.0004   99.9        90.4
                            R2L      91.6   0.0048   94.7        91.6
                            Probe    97.5   0.0010   98.8        97.5

In order to justify the significance of the proposed method, an experiment was performed in which the feature reduced system was compared with the system without feature reduction. The proposed method with the reduced number of features (25) was compared against all 41 features present in KDD-99, and statistical parameters like TPR, FPR, precision and recall were calculated. The training and testing instances for both methods were the same as used in the previous experiment. Table 17 shows a comparison of the proposed method with and without feature reduction by taking the average of all statistical parameters. The results of the two methods show that the true positive rate or sensitivity, i.e., the proportion of positives which are correctly identified, has increased when feature ranking is done. It can be seen
from the table that the proposed method shows a gradual increase in the true positive rate and a lower false positive rate for the less frequent attacks like U2R and R2L. The TPR has also increased for the DoS class, while it has decreased slightly for the Probe class; this can be considered negligible given the savings in training time. The FPR achieved by feature ranking for the Normal and R2L classes is lower, which is a good indication, while for the U2R and DoS classes it has remained the same. Precision decreases for the less frequent attacks like U2R and R2L under the feature ranking method, because a smaller percentage of class occurrences is available for these attacks during training. This is acceptable for large systems, as such attacks do not occur often. Overall, it indicates that feature ranking plays an important role in increasing the detection rate of the different classes, especially the less frequent attack classes like U2R and R2L. The performance of the system has been retained and the results are encouraging.

The performance of the proposed system was further compared with recent and relevant approaches found in the literature. Such studies were harvested considering their wide dissemination, publishing agency, citations, and the availability of class-wise results on the KDD dataset. Table 18 presents comparative results of detection accuracy for the various studies. It can be seen from the table that the detection accuracy of the proposed method on DoS attacks is higher than that of almost all methods. Also, 98.79% of probe attacks are detected by the proposed method, which is better than the studies of Badran and Rockett (2012) and Horng et al. (2010), while for the other studies the results are comparable. The proposed method is better than the methods of Li, Xia, Zhang, Yan, Chuan, and Dai (2012), Badran and Rockett (2012), Horng et al. (2010) and Chebrolu et al. (2005) in detecting DoS, U2R, R2L and Probe attacks. The only drawback of the proposed method is its performance for the Normal class, where its values are somewhat on the lower side compared to other methods. There is very little to choose between the results of the proposed method and the studies of Amiri et al. (2011) and Mukkamela and Sung (2006), since the obtained values are in stiff competition with each other; the proposed study scores over these studies in the majority of cases. Overall, it can be concluded that the proposed method is better, considering its performance levels across both attack and non-attack classes and the advantages incurred by reducing the number of features.

Table 18
Comparison of detection accuracy for various algorithms.

Method/Study name            Normal   DoS     U2R     R2L     Probe
Li et al. (2012)             71.51    98.61   99.69   77.86   99.66
Badran and Rockett (2012)    99.5     97.00   11.40   5.60    78.00
Amiri et al. (2011)          99.80    99.00   93.16   99.91   99.83
Mukkamela and Sung (2006)    99.55    99.25   99.87   99.78   99.70
Horng et al. (2010)          99.3     99.5    19.7    28.8    97.5
Chebrolu et al. (2005)       98.78    98.95   48.00   98.93   99.57
Proposed method              94.80    99.93   96.51   99.54   98.79

5. Conclusion

The study proposed a new intelligent intrusion detection system that works on a reduced number of features. The system extracts features using the concepts of information gain and correlation. Features are first ranked using information gain and correlation and then combined using a suitably designed mechanism. The method uses pre-processing to eliminate redundant and irrelevant data from the dataset in order to improve resource utilization and reduce time complexity. A classification system was designed using an ANN, trained on the compact dataset and tested on five different subsets of the KDD99 dataset. It can be seen from the results that the method outperforms other methods for attack and non-attack classes. Overall, the method has reported an increased detection rate and a decreased false alarm rate. The system was put to the test against contemporary techniques and the results were found to be encouraging. The implication of the proposed system is a demonstration of the fact that feature reduction can be an important phenomenon for reducing the dimensionality and training time of the system. The performance of the feature reduced system is actually better than that of the system without feature reduction, thereby influencing the design of systems with far smaller time complexities. The proposed intrusion detection system can be used to provide security in network, organizational and social settings where security is of prime importance. The study can also inspire researchers from the fields of data science and big data to apply their work to propose more challenging solutions for the current research problem.

Although the present work seems convincing, there have been some shortcomings: the pre-processing work has been done manually, and the number of features in the reduced feature set can be made optimal. The present work can be extended to address these shortcomings in a number of ways, such as finding the optimal number of features to further increase the accuracy of the intrusion detection system. This can be accomplished by the use of population based optimization algorithms like genetic algorithms, big bang big crunch optimization, etc. Fuzzy systems can also be used in the pre-processing work, as they have a stable history of working in imprecise domains. The performance of the system can be further improved by the use of fast converging learning algorithms to check for a speedy and accurate detection rate. The increase in the amount of data requires more and more powerful networks, and studies based on deep networks like convolutional neural networks can be one of the hottest candidates in this direction.

Appendix A. List of various features in KDD

    Feature name          Description
1   duration              Length (number of seconds) of the connection
2   protocol_type         Type of the protocol, e.g. tcp, udp, etc.
3   service               Network service on the destination, e.g., http, telnet, etc.
4   src_bytes             Number of data bytes from source to destination
5   dst_bytes             Number of data bytes from destination to source
6   flag                  Normal or error status of the connection
7   land                  1 if connection is from/to the same host/port; 0 otherwise
8   wrong_fragment        Number of "wrong" fragments
9   urgent                Number of urgent packets
10  hot                   Number of "hot" indicators
11  num_failed_logins     Number of failed login attempts
12  logged_in             1 if successfully logged in; 0 otherwise
13  num_compromised       Number of "compromised" conditions
14  root_shell            1 if root shell is obtained; 0 otherwise
15  su_attempted          1 if "su root" command attempted; 0 otherwise
16  num_root              Number of "root" accesses
17  num_file_creations    Number of file creation operations
18  num_shells            Number of shell prompts
19  num_access_files      Number of operations on access control files
20  num_outbound_cmds     Number of outbound commands in an ftp session
21  is_hot_login          1 if the login belongs to the "hot" list; 0 otherwise
22  is_guest_login        1 if the login is a "guest" login; 0 otherwise
23  count                 Number of connections to the same host as the current connection in the past two seconds
    Feature name                   Description
24  serror_rate                    % of connections that have "SYN" errors
25  rerror_rate                    % of connections that have "REJ" errors
26  same_srv_rate                  % of connections to the same service
27  diff_srv_rate                  % of connections to different services
28  srv_count                      Number of connections to the same service as the current connection in the past two seconds
29  srv_serror_rate                % of connections that have "SYN" errors
30  srv_rerror_rate                % of connections that have "REJ" errors
31  srv_diff_host_rate             % of connections to different hosts
32  dst_host_count                 Destination host count
33  dst_host_srv_count             Service count for destination host
34  dst_host_same_srv_rate         Same service count for destination host
35  dst_host_diff_srv_rate         Different service count for destination host
36  dst_host_same_src_port_rate    Same source port rate for destination host
37  dst_host_srv_diff_host_rate    Different host rate for destination host
38  dst_host_serror_rate           Serror rate for destination host
39  dst_host_srv_serror_rate       Srv-serror rate for destination host
40  dst_host_rerror_rate           Rerror rate for destination host
41  dst_host_srv_rerror_rate       Srv-rerror rate for destination host

Source: KDD Cup 1999 Dataset

References

Al-Jarrah, O. Y., Siddiqui, A., Elsalamouny, M., Yoo, P. D., Muhaidat, S., & Kim, K. (2014). Machine learning based feature selection techniques for large scale intrusion detection. In Distributed computing systems workshops 2014, IEEE 34th international conference on (pp. 177-181). doi:10.1109/ICDCSW.2014.14.
Amiri, F., Yousefi, M. M. R., Lucas, C., Shakery, A., & Yazdani, N. (2011). Mutual information based feature selection for intrusion detection. Network and Computer Applications, 34, 1184-1199.
Badran, K., & Rockett, P. (2012). Multi-class pattern classification using single, multi-dimensional feature-space feature extraction evolved by multi-objective genetic programming and its application to network intrusion detection. Genetic Programming and Evolvable Machines, 13(1), 33-63.
Barmejo, P., Ossa, L., Gamez, J. A., & Puerta, J. M. (2012). Fast wrapper feature subset selection in high dimensional datasets by means of filter re-ranking. Journal of Knowledge Based Systems, 25, 35-44.
Bhuyan, M. H., Bhattacharya, D. K., & Kalita, J. K. (2014). Network anomaly detection: Methods, systems and tools. IEEE Communication Surveys and Tutorials, 16, 303-336.
Bolon-Canedo, V., Sanchnez-Marano, N., & Alonso-Betanzos, A. (2011). Feature selection and classification in multiple class datasets: An application to KDD cup 99 dataset. Expert Systems with Applications, 38, 5947-5957.
Chebrolu, S., Abraham, A., & Thomas, P. (2005). Feature deduction and ensemble design of intrusion detection systems. Computers and Security, 24(4), 295-307.
Fleuret, F. (2004). Fast binary feature selection with conditional mutual information. Journal of Machine Learning Research, 5, 1531-1555.
Horng, S. J., Su, M.-Y., Chen, Y. H., Kao, T. K., Chen, R. J., & Lai, J. L. (2010). A novel intrusion detection system based on hierarchical clustering and support vector machines. Expert Systems with Applications. doi:10.1016/j.eswa.2010.06.066.
Joo, D., Hong, T., & Han, I. (2003). Neural network model for IDS based on asymmetrical costs of false negative errors and false positive errors. Expert Systems with Applications, 25, 69-75.
Karimi, Z., Mansour, M., & Harounabadi, A. (2013). Feature ranking in intrusion detection dataset using combination of filter methods. International Journal of Computer Applications, 78, 21-27.
KDD dataset. [Online, accessed 2 August 2016] http://kdd.ics.uci.edu/databases/kddcup99.
Li, Y., Xia, J., Zhang, S., Yan, J., Chuan, X., & Dai, K. (2012). An efficient intrusion detection system based on support vector machine and gradually features removal method. Expert Systems with Applications, 39, 424-430.
Lin, S., Ving, K., Lee, C., & Lee, Z. (2012). An intelligent algorithm with feature selection and decision rules applied to anomaly intrusion detection. Journal of Soft Computing, 12, 3285-3290.
Liu, Q., Sui, S., & Xiao, J. (2014). A mutual information based hybrid feature selection method using feature clustering. In IEEE 38th annual international conference on computers, software and applications 2014 (pp. 27-32). doi:10.1109/compsac.2014.99.
Mukherjee, S., & Sharma, N. (2012). Intrusion detection using Naïve Bayes classifier with feature reduction. Procedia Technology, 4, 119-128. doi:10.1016/j.protcy.2012.05.017.
Mukkamela, S., & Sung, A. H. (2006). Significant feature selection using computational intelligent techniques for intrusion detection. Advanced Information and Knowledge Processing, 24, 285-306. doi:10.1007/1-84628-284-5_11.
Sangkatsance, P., Watlanapongsakorn, N., & CharnsriPinyo, C. (2011). Practical real time intrusion detection using machine learning approach. Journal of Computer Communication, 34, 2227-2235.
Sung, A. H., & Mukammala, S. (2003). Feature selection for intrusion detection using neural network and support vector machine. Transportation Research Record: Journal of the Transportation Research Board, 1822, 1-11. http://dx.doi.org/10.3141/1822-05.
Uguz, H. (2011). Two stage feature selection method for text categorization by using information gain, principal component analysis and genetic algorithm. Journal of Knowledge Based Systems, 24, 1024-1032.
Wu, H., & Haung, S. S. (2010). Neural network based detection on stepping stone intrusion. Expert Systems with Applications, 37, 1431-1437.
Xiao, L., Liu, Y., & Xiao, L. (2009). A two step feature selection algorithm adapting to intrusion detection. In Convergence and hybrid information technology, 2009, international joint conference on (pp. 618-622).