158-166
ISSN: 2222-2510
2011 WAP journal. www.waprogramming.com
Abstract: This paper presents a succinct review of the application of various Artificial Intelligence (AI) techniques, and of their advances, in the design, development and application of Intrusion Detection Systems (IDS) for protecting computer and communication networks from intruders. These Computational Intelligence algorithms have demonstrated high accuracy in various applications. This study is intended to serve as an all-in-one resource for practitioners and researchers in this interdisciplinary field, helping them take a critical look at the efforts made so far in order to design and develop new and better algorithms capable of solving the yet-unsolved problems in Network Intrusion Detection. Given recent advances in hybrid and ensemble systems, this paper also presents an overview of some existing systems and concludes with suggestions for combining and integrating existing individual techniques into new and more efficient hybrid and ensemble algorithms for IDS.
Keywords: intrusion detection systems, artificial intelligence, computational intelligence, hybrid intelligent
systems, ensemble systems.
I. INTRODUCTION
Intrusion detection is the process of monitoring the events occurring in a computer system or network and analyzing them for signs of intrusion [1]; an Intrusion Detection System (IDS) is software or hardware that automates this process. An IDS is useful not only for detecting successful intrusions but also for monitoring attempts to break security, which provides important information for timely countermeasures. IDSs can be classified into two types: Misuse Intrusion Detection, which looks for patterns of known attacks, and Anomaly Intrusion Detection, which looks for deviations from normal behavior.
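The difference between the two types can be illustrated with a minimal Python sketch: misuse detection matches events against signatures of known attacks, whereas anomaly detection flags events that deviate from a learned profile of normal behavior. The signature strings, the bytes-per-session feature and the three-standard-deviation threshold are hypothetical choices made only for illustration.

```python
import statistics

# Hypothetical signatures of known attacks (misuse detection).
KNOWN_SIGNATURES = ["/etc/passwd", "' OR '1'='1"]


def misuse_detect(payload: str) -> bool:
    """Misuse detection: flag a payload containing any known attack signature."""
    return any(sig in payload for sig in KNOWN_SIGNATURES)


def build_profile(normal_values):
    """Learn a simple profile (mean, sample standard deviation) of normal behavior."""
    return statistics.mean(normal_values), statistics.stdev(normal_values)


def anomaly_detect(value, profile, k=3.0):
    """Anomaly detection: flag a value more than k standard deviations from the mean."""
    mean, std = profile
    return abs(value - mean) > k * std


if __name__ == "__main__":
    print(misuse_detect("GET /../../etc/passwd"))  # True: signature matched
    print(misuse_detect("GET /index.html"))        # False: no signature matched
    profile = build_profile([500, 520, 480, 510, 490])  # hypothetical bytes per session
    print(anomaly_detect(505, profile))     # False: close to the normal mean
    print(anomaly_detect(50_000, profile))  # True: far outside the normal range
```

The trade-off is visible even at this scale: the signature check cannot flag an attack it has no signature for, while the statistical profile can flag unseen attacks but will also flag unusual benign traffic.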
Traditional protection techniques such as user authentication, data encryption, avoidance of programming errors, and firewalls serve as the first line of defense for computer security. However, if a weak password is compromised, user authentication cannot prevent unauthorized use. Likewise, firewalls are vulnerable to configuration errors and to ambiguous or undefined security policies. An IDS therefore provides a further line of defense when these measures fail.
Recently, Artificial Intelligence (AI) techniques have been employed in various data mining and machine learning schemes for classification and prediction modeling. In addition, hybrid data mining schemes, hierarchical hybrid intelligent system models, and ensemble learning approaches that combine base models with other hybrid machine learning paradigms have gained popularity in the literature [2]; these aim to maximize accuracy while minimizing both the root mean squared error and the computational complexity.
In this paper, we present a succinct review of the individual capabilities of various AI techniques as applied to network IDS. These techniques include Artificial Neural Networks (ANN), Support Vector Machines (SVM), Genetic Algorithms (GA) and Fuzzy Neural Networks (FNN). We also propose possible hybrid approaches based on these techniques.
The rest of this paper is organized as follows. Section 2 presents an overview of IDSs and recent advances in AI. Section 3 gives a brief account of some AI techniques and their applications in IDS. Section 4 discusses the concept of hybrid and ensemble AI techniques and the possible hybrid architectures that researchers could explore when designing and developing IDSs. Section 5 looks into the future of AI, hybrid and ensemble applications in IDSs and their prospects, while Section 6 gives the summary and concluding remarks.
Anifowose and Eludiora, World Applied Programming, Vol (2), No (3), March 2012.
II. BACKGROUND KNOWLEDGE
time. A major focus of machine learning research is to automatically learn to recognize complex attributes and to
make intelligent decisions based on the correlations among the data variables. Hence, machine learning is closely
related to fields such as statistics, probability theory, data mining, pattern recognition, artificial intelligence,
adaptive control, and theoretical computer science.
Machine learning algorithms are commonly categorized into three types: supervised, unsupervised and hybrid learning. In supervised learning, the algorithm generates a function that maps inputs to the desired outputs with the least possible error. In unsupervised learning, a set of inputs is analyzed without target outputs; clustering is the most common example. Hybrid learning combines supervised and unsupervised techniques to generate an appropriate function that meets the specific needs of a problem. The computational analysis of machine learning algorithms and their performance is a branch of theoretical computer science known as computational learning theory [7].
A general modeling framework for computational intelligence is shown in Figure 1.
Figure 1. General modeling framework (components: input dataset, intrusion attack training set, learning process, initial AI model, trained AI model, intrusion attack testing set, model validation, identified attack type).
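The workflow in Figure 1 can be sketched in Python as follows, assuming the input dataset is a list of labeled records that is split into training and testing sets, that the learning process turns the initial model into a trained model, and that the testing set is used for model validation. A nearest-centroid classifier stands in for the AI model here; the records, the two features and the attack-type labels are all hypothetical, and any of the techniques in Section III could take the model's place.

```python
import math
import random

# Hypothetical labeled records: ((connections per second, failed logins), attack type).
DATASET = [
    ((1.0, 0.0), "normal"), ((2.0, 1.0), "normal"), ((1.5, 0.0), "normal"), ((2.5, 1.0), "normal"),
    ((95.0, 0.0), "dos"), ((100.0, 1.0), "dos"), ((98.0, 0.0), "dos"), ((102.0, 1.0), "dos"),
    ((3.0, 40.0), "brute_force"), ((2.0, 45.0), "brute_force"),
    ((4.0, 38.0), "brute_force"), ((3.5, 50.0), "brute_force"),
]


def split(dataset, test_fraction=0.25, seed=0):
    """Shuffle the input dataset and split it into training and testing sets."""
    data = list(dataset)
    random.Random(seed).shuffle(data)
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]


def train(training_set):
    """Learning process: compute one mean feature vector (centroid) per attack type."""
    sums, counts = {}, {}
    for features, label in training_set:
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def identify(model, features):
    """Identify the attack type whose centroid is nearest to the given features."""
    return min(model, key=lambda label: math.dist(model[label], features))


def validate(model, testing_set):
    """Model validation: fraction of testing records whose attack type is identified correctly."""
    hits = sum(identify(model, f) == label for f, label in testing_set)
    return hits / len(testing_set)


if __name__ == "__main__":
    training_set, testing_set = split(DATASET)
    model = train(training_set)
    print(identify(model, (97.0, 0.0)))  # dos
    print(validate(model, testing_set))
```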
III. OVERVIEW OF SOME ARTIFICIAL INTELLIGENCE TECHNIQUES AND THEIR APPLICATION IN IDS
Many studies have applied various CI/AI techniques to model IDS strategies. Some of these techniques are discussed in the following sections.
A. Artificial Neural Networks (ANN)
Attempts to artificially simulate the biological processes that lead to intelligent behavior culminated in the development of ANN. An ANN is a mathematical or computational model based on biological neural networks. It consists of an interconnected group of artificial neurons that process information using a connectionist approach to computation. In most cases, an ANN is an adaptive system that changes its structure based on external or internal information that flows through the network during the learning phase.
In more practical terms, neural networks are non-linear statistical data modeling tools. They can be used to model complex relationships between inputs and outputs or to find patterns in data. A typical ANN framework is shown in Figure 2. An ANN loosely emulates the biological nervous system: each neuron multiplies its inputs by weights, sums the results, and applies a threshold, and the result of this computation is then transmitted to subsequent neurons. The output of a single neuron can therefore be written as:
$$y_i = f\left(\sum_{k} w_{ik}\, x_k + \theta_i\right) \qquad (1)$$

where $x_k$ are the inputs to neuron $i$, $w_{ik}$ are the weights attached to the inputs, $\theta_i$ is a threshold, offset or bias, $f(\cdot)$ is a transfer function and $y_i$ is the output of the neuron. The transfer function $f(\cdot)$ can be linear, non-linear, piece-wise linear, sigmoidal, hyperbolic tangent or polynomial.
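Equation (1) translates directly into code. The Python sketch below computes the output of one neuron for three of the transfer functions listed above; it treats $\theta_i$ as a bias added to the weighted sum (a threshold corresponds to the same term with the opposite sign), and the input, weight and bias values are arbitrary.

```python
import math


def neuron_output(inputs, weights, theta, transfer=math.tanh):
    """Equation (1): y_i = f(sum_k w_ik * x_k + theta_i), with theta_i treated as a bias."""
    net = sum(w * x for w, x in zip(weights, inputs)) + theta
    return transfer(net)


def sigmoid(z):
    """Sigmoidal (logistic) transfer function."""
    return 1.0 / (1.0 + math.exp(-z))


def step(z):
    """Hard-threshold transfer function: fires (1.0) when the net input is non-negative."""
    return 1.0 if z >= 0 else 0.0


if __name__ == "__main__":
    x, w = [1.0, 2.0], [0.5, 0.25]
    print(neuron_output(x, w, -1.0, sigmoid))  # 0.5, since the net input is 0
    print(neuron_output(x, w, -1.0))           # 0.0 (tanh of 0)
    print(neuron_output(x, w, -2.0, step))     # 0.0, since the net input is -1
```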
Depending on the algorithm used at the summation stage, versions of ANN include Probabilistic Neural Networks, Generalized Regression Neural Networks and Multi-Layer Perceptron Neural Networks. The most commonly used learning algorithm for ANNs is back-propagation applied to feed-forward networks.
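Back-propagation for a small feed-forward network can be sketched in Python as follows: a forward pass computes each neuron's output with Equation (1) and a sigmoidal transfer function, a backward pass propagates the output error to the hidden layer, and the weights are updated by gradient descent on the squared error. The XOR data, the three hidden units, the learning rate and the number of epochs are illustrative assumptions rather than settings from the reviewed studies.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_mlp(samples, hidden=3, lr=0.5, epochs=2000, seed=1):
    """Train a feed-forward network (one sigmoid hidden layer, one sigmoid output)
    with online back-propagation. Returns the weights and the mean squared error
    before and after training."""
    rng = random.Random(seed)
    n_in = len(samples[0][0])
    # Row j of w1 holds the input weights of hidden unit j; its last entry is the bias.
    w1 = [[rng.uniform(-1, 1) for _ in range(n_in + 1)] for _ in range(hidden)]
    # w2 holds the hidden-to-output weights; its last entry is the output bias.
    w2 = [rng.uniform(-1, 1) for _ in range(hidden + 1)]

    def forward(x):
        h = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + row[-1]) for row in w1]
        y = sigmoid(sum(w * hj for w, hj in zip(w2, h)) + w2[-1])
        return h, y

    def mse():
        return sum((forward(x)[1] - t) ** 2 for x, t in samples) / len(samples)

    initial = mse()
    for _ in range(epochs):
        for x, t in samples:
            h, y = forward(x)  # forward pass
            # Output error term for the loss 0.5 * (y - t)^2; y * (1 - y) is the sigmoid derivative.
            d_out = (y - t) * y * (1 - y)
            # Backward pass: propagate the error to each hidden unit (using the old w2).
            d_hid = [d_out * w2[j] * h[j] * (1 - h[j]) for j in range(hidden)]
            # Gradient-descent updates, output layer first, then hidden layer.
            for j in range(hidden):
                w2[j] -= lr * d_out * h[j]
            w2[-1] -= lr * d_out
            for j in range(hidden):
                for i in range(n_in):
                    w1[j][i] -= lr * d_hid[j] * x[i]
                w1[j][-1] -= lr * d_hid[j]
    return w1, w2, initial, mse()


if __name__ == "__main__":
    xor = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
    _, _, before, after = train_mlp(xor)
    print(f"MSE before training: {before:.3f}, after: {after:.3f}")
```

Because XOR is not linearly separable, a single neuron cannot learn it; the hidden layer trained by back-propagation is what makes it learnable, although convergence from a given random initialization is not guaranteed.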
More details of this technique can be found in [8, 9].
More studies on the application of ANN and Fuzzy Logic can be found in the literature.
C. Support Vector Machines
Support Vector Machines (SVMs) are a set of related supervised learning methods used for classification and regression. They belong to the family of generalized linear classifiers and can also be considered a special case of Tikhonov regularization. SVMs map input vectors to a higher-dimensional space in which a maximum-margin separating hyperplane is constructed [14]. This is shown in Figure 4.
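The mapping to a higher-dimensional space need not be computed explicitly, because SVMs work with a kernel function $K(x, z)$ equal to the inner product of the mapped vectors. The Python sketch below checks this for the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ on two-dimensional inputs, whose explicit map is $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$, and shows that points which no straight line can separate in the input space become linearly separable after the mapping. The points are hypothetical, and the separating plane is written down by hand rather than found by SVM training.

```python
import math


def phi(x):
    """Explicit degree-2 feature map from the 2-D input space to a 3-D feature space."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def poly_kernel(x, z):
    """Degree-2 polynomial kernel: equals dot(phi(x), phi(z)) without computing phi."""
    return dot(x, z) ** 2


def feature_space_classifier(x):
    """Linear rule in feature space: the plane z1 + z3 = 1, i.e. the circle x1^2 + x2^2 = 1."""
    z = phi(x)
    return 1 if z[0] + z[2] - 1.0 > 0 else -1


# Hypothetical points: no straight line separates the inner from the outer points in 2-D.
inner = [(0.5, 0.0), (0.0, -0.5), (-0.3, 0.3)]   # class -1
outer = [(2.0, 0.0), (0.0, 2.0), (-1.5, -1.5)]   # class +1

if __name__ == "__main__":
    x, z = (1.0, 2.0), (3.0, -1.0)
    print(poly_kernel(x, z), dot(phi(x), phi(z)))  # equal up to rounding
    print([feature_space_classifier(p) for p in inner + outer])  # [-1, -1, -1, 1, 1, 1]
```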
The generalization ability of SVMs is ensured by special properties of the optimal hyperplane, which maximizes the distance to the training examples in a high-dimensional feature space. SVMs were initially introduced for classification. In 1995, Vapnik et al., as reported in [15], developed an ε-insensitive loss function based on statistical learning theory that adheres to the principle of structural risk minimization, seeking to minimize an upper bound on the generalization error. The resulting technique, called Support Vector Regression (SVR), extends SVMs to regression and has been reported to perform very well. Further details on SVM can be found in [16, 17, 18].
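The ε-insensitive loss can be written as $L_\varepsilon(y, f(x)) = \max(0, |y - f(x)| - \varepsilon)$: prediction errors smaller than ε cost nothing, and larger errors are penalized linearly. A short Python sketch follows; the default ε = 0.1 and the sample values are arbitrary.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon=0.1):
    """L_eps(y, f(x)) = max(0, |y - f(x)| - eps): errors inside the eps-tube cost nothing."""
    return max(0.0, abs(y_true - y_pred) - epsilon)


def empirical_risk(ys, preds, epsilon=0.1):
    """Average epsilon-insensitive loss over a data set."""
    return sum(epsilon_insensitive_loss(y, p, epsilon) for y, p in zip(ys, preds)) / len(ys)


if __name__ == "__main__":
    print(epsilon_insensitive_loss(1.0, 1.05))      # 0.0: inside the tube
    print(epsilon_insensitive_loss(1.0, 1.5))       # about 0.4: outside the tube
    print(empirical_risk([1.0, 2.0], [1.05, 2.5]))  # about 0.2
```

In SVR this loss is minimized together with a regularization term that keeps the regression function flat, which is where the structural risk minimization principle enters.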
In [19], Zhang and Shen used SVM to formulate intrusion detection as a binary classification problem, characterizing each process by the frequencies of the system calls executed by privileged programs. Drawing on techniques from pattern recognition and text categorization, they modified the conventional SVM, the Robust SVM and the one-class SVM, and compared their performance with that of the original algorithms. Using the 1998 DARPA BSM data set collected at MIT's Lincoln Laboratory, they verified that the modified SVMs can be trained online and outperform the original ones, with fewer Support Vectors (SVs) and less training time and without decreasing detection accuracy.
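The feature representation used by Zhang and Shen treats each trace of system calls like a document in text categorization, describing it by the frequency of each call. A minimal Python sketch of that feature-extraction step follows; the system-call vocabulary and the trace are hypothetical, and the resulting vectors are what an SVM, or one of the modified variants, would then be trained on.

```python
from collections import Counter

# Hypothetical vocabulary of system calls issued by privileged programs.
VOCABULARY = ["open", "read", "write", "close", "execve", "setuid"]


def frequency_vector(trace):
    """Map a system-call trace to relative frequencies over VOCABULARY, analogous to a
    term-frequency vector in text categorization. Calls outside the vocabulary still
    count toward the trace length."""
    if not trace:
        return [0.0] * len(VOCABULARY)
    counts = Counter(trace)
    return [counts[call] / len(trace) for call in VOCABULARY]


if __name__ == "__main__":
    trace = ["open", "read", "read", "write", "close", "open", "read", "close"]
    print(frequency_vector(trace))  # [0.25, 0.375, 0.125, 0.25, 0.0, 0.0]
```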
Figure 4. Mapping Input Vectors to a Higher Dimensional Space in SVM [22, 24, 25].
D. Genetic Algorithms
A Genetic Algorithm (GA) is a stochastic search technique used to find exact or approximate solutions to optimization problems. GAs are categorized as global search heuristics and belong to the class of evolutionary algorithms, which use techniques inspired by evolutionary biology such as inheritance, mutation, selection, and crossover. GAs are implemented as a computer simulation in which a population of abstract representations (called chromosomes) of candidate solutions (called individuals, or phenotypes) to an optimization problem evolves toward better solutions. Traditionally, solutions are represented as binary strings of 0s and 1s, but other encodings are also possible.
The evolution process begins with a population of randomly generated individuals and continues in generations.
In each generation, the fitness of every individual in the population is evaluated, multiple individuals are
stochastically selected from the current population (based on their fitness), and modified (recombined and
possibly randomly mutated) to form a new population. The new population is then used in the next iteration of the
algorithm. Usually, the algorithm terminates when either a maximum number of generations has been produced,
or a satisfactory fitness level has been reached for the population. If the algorithm has terminated due to a
maximum number of generations, a satisfactory solution may or may not have been reached.
Genetic Algorithms have been applied in many fields of research. The main property that makes genetic representations convenient in computer simulations is that their parts are easily aligned because of their fixed size, which facilitates simple crossover operations. The fitness function is defined over the genetic representation and measures the quality of the represented solution. Once the genetic representation of a problem has been obtained and the fitness function defined, a GA initializes a population of solutions randomly and then improves it through repeated application of the mutation, crossover, inversion and selection operators.
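The generational loop described above can be sketched in Python on the toy "OneMax" problem, where the fitness of a bit string is simply its number of 1s. Tournament selection, single-point crossover, bit-flip mutation and elitism (copying the best individual into the next generation) are common choices assumed here for illustration, as are all parameter values; the loop stops at a maximum number of generations or when a satisfactory fitness is reached.

```python
import random


def fitness(bits):
    """Hypothetical fitness: number of 1s in the chromosome (the OneMax toy problem)."""
    return sum(bits)


def tournament(pop, rng, k=3):
    """Selection: the fittest of k randomly chosen individuals."""
    return max(rng.sample(pop, k), key=fitness)


def genetic_algorithm(n_bits=20, pop_size=30, generations=50,
                      crossover_rate=0.9, mutation_rate=0.02, seed=42):
    rng = random.Random(seed)
    # Initial population of randomly generated bit strings (chromosomes).
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    history = [max(map(fitness, pop))]
    for _ in range(generations):
        # Elitism: carry the best individual over unchanged.
        new_pop = [max(pop, key=fitness)[:]]
        while len(new_pop) < pop_size:
            a, b = tournament(pop, rng), tournament(pop, rng)
            if rng.random() < crossover_rate:
                # Single-point crossover: fixed-length strings align position by position.
                cut = rng.randint(1, n_bits - 1)
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation.
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            new_pop.append(child)
        pop = new_pop
        history.append(max(map(fitness, pop)))
        if history[-1] == n_bits:  # satisfactory fitness level reached
            break
    return max(pop, key=fitness), history


if __name__ == "__main__":
    best, history = genetic_algorithm()
    print(fitness(best), history[:5])
```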
More details on GA can be found in [20, 21].
E. Functional Networks
Functional Networks (FN) are an extension of Artificial Neural Networks. A standard neural network consists of layers of neurons connected by links, where each computing unit, or neuron, performs a simple calculation: a scalar, typically monotone, function f of a weighted sum of its inputs. The function f associated with the neurons is fixed, and the weights are learned from data using well-known algorithms such as least-squares fitting.
The main idea of FN is to allow the functions f to be learned while suppressing the weights. In addition, the functions f are allowed to be multi-dimensional, though they can be equivalently replaced by functions of single variables. When there are several links, say $m$, going from the last layer of neurons to a given output unit, the value of this output unit can be written in several different forms (one per link). This leads to a system of $m - 1$ functional equations, which can be written directly from the topology of the network. Solving this system leads to a great simplification of the initial functions f associated with the neurons.
As shown in Figure 5, an FN consists of a layer of input units containing the input data, a layer of output units containing the output data, and one or several layers of neurons, or computing units, each of which evaluates a set of input values coming from the previous layer and passes a set of output values to the next layer of neurons or to the output units. The computing units are connected to each other, in the sense that the output of one unit can serve as part of the input to another neuron or to the units in the output layer. Once the input values are given, the output is determined by the neuron type, which can be defined by a function.
For example, assume that a neuron has $s$ inputs $(x_1, \ldots, x_s)$ and $k$ outputs $(y_1, \ldots, y_k)$; then we assume that there exist $k$ functions $F_j$, $j = 1, \ldots, k$, such that $y_j = F_j(x_1, \ldots, x_s)$, $j = 1, \ldots, k$.
FN also consists of a set of directed links that connect the input layer to the first layer of neurons, neurons of one
layer to neurons of the next layer, and the last layer of neurons to the output units. Connections are represented by
arrows, indicating the direction of information flow [22].
The least-squares fitting algorithm learns directly from the input data by minimizing the sum of squared errors to obtain the parameters needed for training, namely the number of neurons and the type of kernel functions. The FN learning process consists of creating an initial network, modifying the initial network, and selecting the best model. More details can be found in [24].
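The learning step can be sketched in Python under an assumption common in the FN literature: each unknown neuron function is represented as a linear combination of known basis functions, $f(x) = \sum_j a_j \phi_j(x)$, and the coefficients $a_j$ are found by minimizing the sum of squared errors. The polynomial basis $\{1, x, x^2\}$ and the sample data are hypothetical, and a complete FN would also impose the functional-equation constraints derived from its topology, which this sketch omits.

```python
def basis(x):
    """Hypothetical family of basis functions {1, x, x^2} used to represent f."""
    return [1.0, x, x * x]


def solve(a, b):
    """Solve the linear system a @ c = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    coef = [0.0] * n
    for r in range(n - 1, -1, -1):  # back-substitution
        coef[r] = (m[r][n] - sum(m[r][c] * coef[c] for c in range(r + 1, n))) / m[r][r]
    return coef


def learn_function(xs, ys):
    """Least-squares estimate of a_j in f(x) = sum_j a_j * phi_j(x), obtained from the
    normal equations (Phi^T Phi) a = Phi^T y."""
    rows = [basis(x) for x in xs]
    k = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(k)]
    return solve(ata, aty)


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0, 4.0]
    ys = [1 + 2 * x + 3 * x * x for x in xs]  # samples of a hypothetical f(x) = 1 + 2x + 3x^2
    print([round(a, 6) for a in learn_function(xs, ys)])  # [1.0, 2.0, 3.0]
```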
Figure 5. Illustration of the Generalized Associativity Functional Network. (a) Initial network (b) simplified network [23]
given task, one of them is how to automatically generate an ensemble structure for taking advantage of available
learners whose capabilities have been well studied or known.
A. Fuzzy Neural Networks with Genetic Algorithms
One implementation of hybrid techniques is that of [1], which proposed a Fuzzy Neural Network assisted by a GA (FNN/GA). The FNN component restricts each membership function to a specific shape, such as triangular, trapezoidal or bell-shaped, and the GA component then tunes the parameters of the membership functions to achieve the desired mapping accuracy. The FNN consists of 4 layers. Layer 1, with 4 nodes, consists of input and output nodes representing the input and output linguistic variables, respectively. Nodes in layer 2 act as membership functions, each responsible for mapping an input linguistic variable into a possibility distribution for that variable. Together, all the layer 3 nodes form a fuzzy rule base, and the links between layers 3 and 4 function as a connectionist inference engine. The training algorithm first constructs and trains the FNN using the back-propagation algorithm to obtain the membership functions and the consequent weight vector. Each membership function is then represented by a group of line segments obtained by partitioning and sampling it, and finally, for every partition point, the GA is used to search for the optimal value, yielding the optimal membership functions.
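The membership-function shapes named above are each fixed by a handful of parameters, which is what makes them convenient for a GA to tune. The following Python sketch gives the standard triangular, trapezoidal and generalized bell definitions; the parameter names a, b, c, d follow common fuzzy-logic usage and are not taken from [1], and in an FNN/GA scheme a GA chromosome could encode such parameter values.

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at b, falling back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def trapezoidal(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x <= c:
        return 1.0
    return (d - x) / (d - c)


def bell(x, a, b, c):
    """Generalized bell membership: 1 / (1 + |(x - c) / a|^(2b)), centered at c."""
    return 1.0 / (1.0 + abs((x - c) / a) ** (2 * b))


if __name__ == "__main__":
    print(triangular(2.5, 0, 5, 10))      # 0.5
    print(trapezoidal(3, 0, 2, 4, 6))     # 1.0
    print(bell(5, 2, 1, 3))               # 0.5
```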
B. Hybrid of Functional Network, Support Vector Machines and Type-2 Fuzzy Logic
Another recent hybrid implementation is [26], which combined the strengths of Functional Networks (FN), Support Vector Machines (SVM) and Type-2 Fuzzy Logic (T2FL). There were two versions of this hybrid: FN-Fuzzy Logic-SVM (FFS) and FN-SVM-Fuzzy Logic (FSF). In the FFS version, the FN was used to select the most relevant variables from the input data; these variables were passed to the T2FL block, where uncertainties were removed, and the SVM block then performed the training and prediction tasks. In the FSF version, the best variables from the FN block were passed through the SVM block, where they were transformed to a higher-dimensional space for the T2FL block to use in the training and prediction tasks. An improvement to the FFS and FSF hybrid models is presented in [27].
C. Fuzzy Linear Programming with Support Vector Machines
Another dimension to the hybridization of AI techniques was presented by [16], which combined fuzzy membership functions with Linear Programming SVMs (LP-SVMs) to resolve unclassifiable regions in multiclass problems. The LP-SVMs were trained, and membership functions were defined in the directions orthogonal to their decision functions. Then, using the minimum or average operator over these membership functions, a membership function was defined for each class. Finally, one-against-all and pairwise Fuzzy LP-SVMs were evaluated on several benchmark datasets, and the proposed Fuzzy LP-SVMs were shown to outperform conventional LP-SVMs.
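The idea of resolving unclassifiable regions with membership functions can be sketched in Python for the one-against-all case, following our reading of the fuzzy SVM formulation: for class i, the membership along its own decision function $D_i(x)$ is $\min(1, D_i(x))$, the membership along each other decision function $D_j(x)$ is $\min(1, -D_j(x))$, and the class membership combines these with the minimum or average operator. The decision values below are hypothetical; in [16] they would come from trained LP-SVMs.

```python
def class_memberships(decision_values, operator="min"):
    """Fuzzy class memberships from one-against-all decision values D_0(x), ..., D_{n-1}(x).

    For class i, the membership along its own decision function is min(1, D_i(x))
    and along every other decision function D_j(x) it is min(1, -D_j(x)); these
    one-dimensional memberships are combined by the minimum or the average operator.
    """
    n = len(decision_values)
    result = []
    for i in range(n):
        m_ij = [min(1.0, d) if j == i else min(1.0, -d) for j, d in enumerate(decision_values)]
        result.append(min(m_ij) if operator == "min" else sum(m_ij) / n)
    return result


def classify(decision_values, operator="min"):
    """Assign x to the class with the largest membership."""
    m = class_memberships(decision_values, operator)
    return max(range(len(m)), key=m.__getitem__)


if __name__ == "__main__":
    # Hypothetical decision values: the first two are both positive, so the conventional
    # one-against-all rule (sign of each D_i) leaves this x unclassifiable.
    d = [0.4, 0.3, -1.5]
    print(class_memberships(d))    # [-0.3, -0.4, -1.5]
    print(classify(d))             # 0
    print(classify(d, "average"))  # 0
```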
V. THE FUTURE
Given the successful applications of AI techniques reported in the literature, and the few reported applications of hybrid techniques, there are strong prospects for further successful use of hybrids and ensembles of existing techniques for more proactive detection of network intrusions. More effort can also be devoted to better tuning of the parameters of existing techniques. Automatic parameter tuning can be performed using global search heuristics such as Genetic Algorithms, Particle Swarm Optimization and Ant Colony Optimization.
In addition, best-feature selection can be used to increase the correlation between the input variables and the target by selecting the most relevant variables instead of using all available variables, some of which may be irrelevant or redundant and can degrade the contribution of the more relevant ones. Techniques such as Discriminant Analysis and Principal Component Analysis have become preferred over conventional statistical methods such as multivariate analysis. The least-squares fitting algorithm of Functional Networks has been successfully used by [22, 24, 25] for best-subset selection.
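As a simple, generic illustration of best-feature selection, the Python sketch below ranks input variables by the absolute value of their Pearson correlation with the target and keeps the top k. This filter method is swapped in for brevity; it is not the Functional Network least-squares method of [22, 24, 25], and the variable names and values are hypothetical.

```python
import math

# Hypothetical input variables and binary target (1 = attack).
TARGET = [0, 0, 1, 1, 0, 1]
COLUMNS = {
    "duration": [5, 3, 4, 6, 5, 4],
    "failed_logins": [0, 1, 9, 8, 0, 7],
    "src_bytes": [100, 120, 20, 10, 110, 15],
}


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def select_features(columns, target, k):
    """Return the names of the k variables most correlated (in absolute value) with the target."""
    ranked = sorted(columns, key=lambda name: abs(pearson(columns[name], target)), reverse=True)
    return ranked[:k]


if __name__ == "__main__":
    print(select_features(COLUMNS, TARGET, 2))
```

Using the absolute value keeps strongly negative predictors such as `src_bytes` in this toy data, since a negative correlation is just as informative as a positive one.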
VI. CONCLUSION
Given this succinct review of the application of Artificial Intelligence techniques, their advances, and the strong performance reported for them in the literature, we conclude that further research in this area is warranted, as such techniques have produced very promising results. Ensembles and hybrids of various Artificial Intelligence techniques also indicate a bright future for the analysis of IDS and the prediction of its various properties for effective real-time network security.
ACKNOWLEDGMENT
The authors would like to thank their respective institutions, King Fahd University of Petroleum and Minerals and Obafemi Awolowo University, for the support and resources provided during the conduct of this review.
REFERENCES
[1] C.H. Lee and Y.C. Lin, "Hybrid Learning Algorithm for Neuro-Fuzzy Systems," in Proceedings of the 2004 IEEE International Conference on Fuzzy Systems, 2004, pp. 691-696.
[2] H. Artail, H. Safa, M. Sraj, I. Kuwatly, and Z. Al-Masri, "A Hybrid Honeypot Framework for Improving Intrusion Detection Systems in Protecting Organizational Networks," Journal of Computers & Security, Vol. 25, 2006, pp. 274-288.
[3] J. Eduardo and M.S. Brando, "A New Approach for IDS Composition," in Proceedings of the IEEE International Conference on Communications, 2006, pp. 2195-2200.
[4] A. Watkins, "An Immunological Approach to Intrusion Detection," in Proceedings of the 12th Annual Canadian Information Technology Security Symposium, Ottawa, Canada, June 2000, pp. 447-454.
[5] Wikipedia, The Free Encyclopedia, "Intrusion Detection System," http://en.wikipedia.org/wiki/Intrusion-detection_system, accessed June 25, 2011.
[6] H. Jun, "Computational Intelligence, Research Interests," http://www.cs.bham.ac.uk/~jxh/hejunrs.html, 2008, accessed June 25, 2011.
[7] A.L. Symeonidis and P.A. Mitkas, Agent Intelligence through Data Mining, Multi-agent Systems, Artificial Societies, and Simulated Organizations Series 14: 200, USA: International Book Series, Springer+Business Media, 2005.
[8] J.B. Petrus, F. Thuijsman, and A.J. Weijters, Artificial Neural Networks: An Introduction to ANN Theory and Practice, Springer, 1995, pp. 37-57.
[9] Y. Wang, "Fuzzy Clustering Analysis by using Genetic Algorithm," Innovative Computing, Information and Control Express Letters, Vol. 2, No. 4, 2008, pp. 331-337.
[10] O. Castillo and P. Melin, "Intelligent Systems with Interval Type-2 Fuzzy Logic," International Journal of Innovative Computing, Information and Control, Vol. 4, No. 4, 2008, pp. 771-784.
[11] J. Mendel, "Type-2 Fuzzy Sets: Some Questions and Answers," IEEE Connections, Newsletter of the IEEE Neural Networks Society, Vol. 1, 2003, pp. 10-13.
[12] C. Sampada, S. Khusbu, D. Neha, M. Sanghamitra, A. Abraham, and S. Sugata, "Adaptive Neuro-Fuzzy Intrusion Detection Systems," in Proceedings of the International Conference on Information Technology: Coding and Computing (ITCC'04), 2004, DOI: 0-7695-2108-8/04.
[13] N. Bashah, I.B. Shanmugam, and A.M. Ahmed, "Hybrid Intelligent Intrusion Detection System," World Academy of Science, Engineering and Technology, Vol. 11, 2005, pp. 23-26.
[14] C.J. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition," Data Mining and Knowledge Discovery, Vol. 2, 1998, pp. 121-167.
[15] N. Cristianini and J. Shawe-Taylor, An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 1st Edition, Cambridge University Press, UK, 2000.
[16] S. Abe, "Fuzzy LP-SVMs for Multiclass Problems," in Proceedings of the European Symposium on Artificial Neural Networks, Belgium, 2004, pp. 429-434.
[17] J. Taboada, J.M. Matías, C. Ordóñez, and P.J. García, "Creating a Quality Map of a Slate Deposit using Support Vector Machines," Elsevier Journal of Computational and Applied Mathematics, Vol. 20, No. 4, 2007, pp. 84-94.
[18] Y. Xing, X. Wu, and Z. Xu, "Multiclass Least Squares Auto-Correlation Wavelet Support Vector Machines," Innovative Computing, Information and Control Express Letters, Vol. 2, No. 4, 2008, pp. 345-350.
[19] Z. Zhang and H. Shen, "Application of Online Training SVMs for Real-time Intrusion Detection with Different Considerations," Elsevier Journal of Computer Communications, Vol. 28, No. 12, July 2005, pp. 1428-1442.
[20] S. Mohsen, A. Morteza, and Y.V. Ali, "Design of Neural Networks using Genetic Algorithm for the Permeability Estimation of the Reservoir," Journal of Petroleum Science and Engineering, Vol. 59, 2007, pp. 97-105.
[21] R.R. Bies, M.F. Muldoon, B.G. Pollock, S. Manuck, G. Smith, and M.E. Sale, "A Genetic Algorithm-Based Hybrid Machine Learning Approach to Model Selection," Journal of Pharmacokinetics and Pharmacodynamics, Vol. 33, No. 2, 2006, pp. 195-221.
[22] E. Castillo, "Functional Networks," Neural Processing Letters, Vol. 7, 1998, pp. 151-159.
[23] E.A. El-Sebakhy, "Software Reliability Identification using Functional Networks: A Comparative Study," Expert Systems with Applications, Vol. 36, No. 2, March 2009, pp. 4013-4020.
[24] F. Anifowose, Hybrid AI Models for the Characterization of Oil and Gas Reservoirs: Concept, Design and Implementation, VDM Verlag, Germany, 2009.
[25] H. Inoue and H. Narihisa, "Efficient Pruning Method for Ensemble Self-Generating Neural Networks," Journal of Systemics, Cybernetics and Informatics, 2003, pp. 423-428.
[26] T. Helmy, F. Anifowose, and K. Faisal, "Hybrid Computational Models for the Characterization of Oil and Gas Reservoirs," Elsevier International Journal of Expert Systems with Applications, Vol. 37, 2010, pp. 5353-5363.
[27] F. Anifowose and A. Abdulraheem, "Fuzzy Logic-Driven and SVM-Driven Hybrid Computational Intelligence Models Applied to Oil and Gas Reservoir Characterization," Journal of Natural Gas Science and Engineering, Vol. 3, No. 3, July 2011, pp. 505-517.