
2017 4th International Conference on Engineering Technology and Application (ICETA 2017)

ISBN: 978-1-60595-527-8

Intrusion Detection Algorithm Based on Convolutional Neural Network


Yuchen Liu, Shengli Liu* & Xing Zhao
State Key Laboratory of Mathematical Engineering and Advanced Computing, Zhengzhou, Henan, China

ABSTRACT: A periodic load balancing method is proposed to address the load imbalance of network servers
in a highly concurrent clustered system. Operation is divided into load periods according to the load
condition of the server nodes, and a corresponding load balancing strategy is adopted for each period: a
fast-select strategy is used in periods of small load, while the server node with a stable response time
is preferred in periods of large load. The periodic strategy avoids the load imbalance caused by a single
load strategy. Experimental comparison shows that the improved periodic load balancing strategy is
superior to the weighted least connection load balancing strategy.

Keywords: intrusion detection system; deep learning; convolutional neural network

*Corresponding author: liu05.meac@gmail.com

1 INTRODUCTION

With the rapid development of internet technology, network security problems grow more serious by the
day. Researchers have long studied Intrusion Detection Systems (IDS) to protect network systems from
malware and attacks. Based on the detection object, IDS fall into two categories: host-based IDS and
network-based IDS [1]. A host-based IDS (HIDS) monitors the operating behavior or status of the host
system, such as whether system events show unauthorized installation or access, or whether memory
status or file system status contains anticipatory data. HIDS relies on the system event log. It has a
low false alarm rate, but it cannot analyze network-related behavior. A network-based IDS (NIDS) is
placed in an isolation region or at a chokepoint on the network edge. It monitors and analyzes network
traffic in real time to detect unauthorized intrusions or hostile attacks. There are usually two kinds
of detection technology [2]. One is behavior-based intrusion detection, which we call anomaly
detection. It compares observed behavior with normal behavior to capture attacks. The other is misuse
detection, which relies on known information: it detects the attacks recorded in a repository of known
attack information.

Misuse detection based on pattern matching is the main approach by which traditional intrusion
detection algorithms build a detection model [3]. To detect a user's activities against given patterns,
misuse detection compares user behavior with known attacker behavior [4]. It trains the intrusion
detection model offline on labeled attack data and then applies the model in actual intrusion detection
to detect attacks of known types. Such a system is slow to build and costly to update, which makes it
difficult to detect newly emerging attack types on the internet. Facing expanding internet scale and
endlessly emerging attacks, most traditional intrusion detection algorithms lack adaptivity and
extensibility. We therefore need a more automated and intelligent approach to building a behavior-based
intrusion detection model that is self-adaptive and dynamically extensible.

2 IDS RESEARCH STATUS

With the arrival of the big data era, network attack methods are updated day by day, and new network
intrusions are becoming more intelligent and more complex. Traditional anomaly detection technologies
struggle to handle new network intrusions and to deliver satisfactory results. Almost all commercial
intrusion detection systems rely on a database of known attacks for detection. How to
train a high-efficiency model while reducing training cost has become a research focus. In response to
the variety of new network intrusions, many new intelligent intrusion detection algorithms have
emerged. Soft Computing (SC) tolerates uncertain, imprecise, and incomplete truth values, and thereby
achieves low-cost, robust solutions that can effectively reduce detection cost. IDS have adopted
several SC technologies, such as artificial neural networks, Genetic Algorithms (GA), decision trees,
and fuzzy logic. Owing to their flexible and rapid learning abilities, SC technologies have entered
practical use in intrusion detection.

As research has deepened, neural networks have proved able to fit complex nonlinear relations. Their
simple learning rules are easy to implement on a computer, and they offer strong robustness, memory,
nonlinear mapping, and self-learning abilities. Hence, neural network algorithms have a large
application market. Yihua Liao et al. proposed applying k-Nearest Neighbor (k-NN) to IDS [5], using a
k-NN classifier for anomaly detection. Andrew H. Sung et al. evaluated IDS performance with Support
Vector Machines (SVM) and Artificial Neural Networks (ANN) [6], and reduced the training features
according to their influence on attack classification. The performance with the reduced features was
similar to that obtained by training with the original features. Maheshkumar Sabhnani et al. designed a
classifier model that combines several machine learning algorithms, such as Multi-Layer Perceptron
(MLP), k-means, and decision trees [7], and showed that the multiple-classifier model performs better
than a single-classifier model. Using Self-Organizing Maps (SOM), Heywood et al. proposed a
hierarchical neural network structure for intrusion detection [8]. J. Shum et al. proposed building an
IDS by training a feedforward neural network with Back Propagation (BP) [9]. Mukkamala et al. proposed
a hybrid IDS model that combines neural networks and SVM [10]. Xue et al. proposed using an improved
Jordan recurrent neural network in an intrusion detection system [11]. Skaruz et al. used a Jordan
recurrent neural network to detect SQL attacks [12]. In 2015, K. Jihyun et al. applied a recurrent
neural network with Hessian-free optimization to the DARPA dataset [13].

If we extend this line of research to convolutional neural networks, we may obtain detection accuracy
higher than that of previous approaches. A convolutional neural network can achieve high-level
abstraction of data through complex structures or combinations of nonlinear transformations, and can
thus obtain a higher detection rate. This paper applied a convolutional neural network model trained by
BP as the IDS model, and used the KDD Cup 1999 dataset to train the model and evaluate its performance.
The detection rate and false alarm rate were obtained through experiment.

3 CONVOLUTIONAL NEURAL NETWORK

The Convolutional Neural Network (CNN) is a well-known deep learning framework inspired by the natural
visual perception mechanism of living creatures. In 1990, LeCun et al. [14] published a paper that
established the CNN framework, and later improved it in [15]. They developed a multi-layer artificial
neural network called LeNet-5, which can classify handwritten digits. Like other neural networks,
LeNet-5 has multiple layers and can be trained by BP [16]. It can obtain effective representations of
the original image, which makes it possible to recognize visual patterns directly from raw pixels with
almost no preprocessing.

There are many CNN variants, but their basic modules are very similar. Take the famous LeNet-5 [15] as
an example (see Figure 1). It is made up of three types of layers: convolutional layers, subsampling
layers, and fully connected layers.

Figure 1. LeNet-5 network structure.

As shown in Figure 1, the convolutional layer aims to learn feature representations of the input. It is
composed of several convolutional kernels, which are used to compute different feature maps. Each
neuron of a feature map is connected to a neighborhood of neurons in the previous layer; this
neighborhood is called the receptive field of the neuron. A new feature map is obtained by first
convolving the input with a learned kernel and then applying an element-wise nonlinearity to the
convolution result. Using different convolutional kernels yields the complete set of new feature maps.
The feature value $z_{i,j,k}^{l}$ at location (i, j) of the k-th feature map of the l-th layer is
computed by the equation below.

    $z_{i,j,k}^{l} = (w_{k}^{l})^{T} x_{i,j}^{l} + b_{k}^{l}$    (1)

Here $w_{k}^{l}$ and $b_{k}^{l}$ are the weight vector and bias term of the k-th filter of the l-th
layer, and $x_{i,j}^{l}$ is the input patch centered at location (i, j) of the l-th layer. The kernel
$w_{k}^{l}$ that generates the feature values $z_{i,j,k}^{l}$ is shared across all locations (i, j).
This weight-sharing mechanism has several advantages, such as reducing model complexity and making
training easier. The activation function introduces nonlinearity to the CNN, which is desirable for a
multi-layer network to
detect nonlinear features. Let $a(\cdot)$ denote the nonlinear activation function. The activation
value $a_{i,j,k}^{l}$ of the convolutional feature $z_{i,j,k}^{l}$ is computed as follows:

    $a_{i,j,k}^{l} = a(z_{i,j,k}^{l})$    (2)

Typical activation functions are sigmoid, tanh, and ReLU [17]. A subsampling layer usually sits between
two convolutional layers and aims to achieve position invariance by reducing the resolution of the
feature maps. Each feature map of a subsampling layer is connected to its corresponding feature map of
the previous convolutional layer. Let $\mathrm{pool}(\cdot)$ denote the pooling function. For each
feature map $a_{:,:,k}^{l}$ we have:

    $y_{i,j,k}^{l} = \mathrm{pool}(a_{m,n,k}^{l}), \quad \forall (m,n) \in R_{ij}$    (3)

where $R_{ij}$ is the neighborhood of location (i, j). Typical pooling operations are average pooling
[18] and max pooling [19]. By stacking convolutional layers and subsampling layers, we can extract more
abstract feature representations.

After several convolutional layers and subsampling layers there may be one or more fully connected
layers, which aim to perform high-level reasoning [20][21]. They connect all neurons of the previous
layer to every neuron of the current layer in order to generate global semantic information. The output
of the last fully connected layer is fed to the output layer. Softmax [22] is usually used for
classification tasks. Another commonly used method is SVM, which can be combined with CNN features to
solve different classification tasks [20]. All CNN parameters (e.g. the weight vectors and bias terms)
are optimized for a particular task by minimizing an appropriate loss function defined for that task.
A CNN learns through weight sharing, local receptive fields, subsampling, and other related mechanisms,
and it is highly robust [23].

4 TEST SETS

The KDD Cup 1999 data is the most widely used dataset for evaluating network traffic analysis in
intrusion detection. According to Stolfo et al. [24], the KDD Cup 1999 data was built from data
captured in the DARPA IDS evaluation program. Although it has been in use for a long time, many
intrusion detection results are still reported on it, so it provides a good basis for comparing
intrusion detection models. This is the main reason why this paper chose the KDD99 dataset.

The dataset contains 4,898,431 network flows, and each flow has 41 features. According to the flow
features, there are 22 attack types. See Table 1 for the attack categories. A DoS attack exhausts the
resources of the target server and stops it from providing service. An R2L attack obtains unauthorized
remote access. A U2R attack tries to obtain superuser permission. A Probe attack is used to find
vulnerabilities in target servers.

Because the original dataset contains too many records, this paper used 10% of the KDD Cup 1999 data
for training and testing. However, DoS attacks made up the largest share of all attack data, while the
other attacks accounted for only 1%, as shown in Figure 2. Therefore, the training weights of the
traffic classes in the IDS model cannot be balanced. Compared with other kinds of traffic, DoS attacks
and normal traffic are easier to detect.

Table 1. Attack Categories.
Category   Attack Pattern
DoS        back, land, neptune, pod, smurf, teardrop
R2L        ftp-write, guess-passwd, imap, multihop, phf, spy, warezclient, warezmaster
U2R        buffer-overflow, loadmodule, perl, rootkit
Probe      ipsweep, nmap, portsweep, satan

Figure 2. Percentage of each test data.

To solve this problem, we needed to generate a more balanced training dataset. We therefore took 300
examples from each attack category. Because there were very few examples of U2R attacks, we took only
30 examples from the U2R category. We then added 1,000 normal examples.

Zhanyi Wang [25] proposed extracting payload bytes from traffic data. A payload sequence of around
1,000 bytes can be mapped to an image or a document, in which each byte is a pixel or a character.
Because convolutional neural networks have achieved very good results in feature learning and
classification for images and documents, this approach can be applied effectively in information
security.

5 EXPERIMENT

In this section, we first established the IDS detection model and trained it on the training datasets.
Then, we used the test sets to perform detection. The
experimental environment is as follows:

CPU: Intel(R) Core(TM) i7-7700K 4.20GHz
GPU: GTX 1080
RAM: 32GB
Operating system: Ubuntu 14.04

5.1 Evaluation Indicators

Detection Rate (DR) and False Alarm Rate (FAR) are the measures most commonly used in IDS evaluation.
DR is the proportion of intrusion examples detected by the IDS model. FAR is the proportion of normal
examples that are classified into wrong categories. Based on the confusion matrix, the measures are
defined as follows (TP: true positives, TN: true negatives, FP: false positives, FN: false negatives):

    $\mathrm{DR} = \mathrm{TP} / (\mathrm{TP} + \mathrm{FN})$    (4)

    $\mathrm{FAR} = \mathrm{FP} / (\mathrm{TN} + \mathrm{FP})$    (5)

Model performance improves as DR increases or FAR decreases. A further measure, Efficiency (Ef),
therefore evaluates the model more comprehensively:

    $\mathrm{Ef} = \mathrm{DR} / \mathrm{FAR}$    (6)

5.2 Establishment of IDS Model

Before training, we extracted the first 1024 bytes of the payload of every example. The input was
therefore a 32×32 = 1024-pixel grey-scale map, as shown in Figure 3. The output vector was composed of
4 attack classes and 1 normal class. Therefore, the input dimension was 1024 while the output dimension
was 5. This paper applied the LeNet-5 convolutional neural network [2] in training, comprising 2
convolutional layers, 2 subsampling layers, 2 fully connected layers, and 1 classification layer. The
convolutional kernel was 5×5 in size.

Figure 3. Grey-scale Map of Input Data.

In the test, we selected 10 test datasets from the KDD Cup data according to the generated results.
Each dataset contained 5,000 randomly selected examples. Figure 4 shows the detection rate and false
alarm rate of each test dataset. Summarizing the results in Table 2, we found that the average
detection rate was 0.976614, meaning that 97.7% of all attack examples were detected. The average false
alarm rate was 0.099958, meaning that around 10% of normal examples were classified into wrong
categories.

Figure 4. Test results.

Table 2. Conclusions.
Category        Detection rate   False alarm rate   Efficiency
Maximum value   0.97967          0.12509            7.83155
Minimum value   0.97103          0.07781            12.47898
Average value   0.97661          0.09996            9.77024

Figure 5 shows the average percentage detected for each attack category. The detection results for DoS
and normal examples were good. However, the detection rate for U2R examples was zero, meaning that the
30 U2R examples were too few to meet the training needs of the model.

Figure 5. Attack detection percentage.

To make an objective evaluation, we compared the results with other classifier algorithms, as shown in
Table 3. Although our false alarm rate was higher than that of some of the other algorithms, we
obtained the highest detection rate and the highest precision.
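The evaluation indicators of Section 5.1 can be illustrated with a short Python sketch. It assumes that Ef is the ratio DR/FAR, which is the reading consistent with the values in Table 2. The function name `detection_metrics` and the confusion-matrix counts in the usage lines are hypothetical and serve only as an illustration; they are not results from the paper.

```python
def detection_metrics(tp, fn, fp, tn):
    """Return (DR, FAR, Ef) from confusion-matrix counts.

    DR  = TP / (TP + FN)   share of attack examples that are detected
    FAR = FP / (TN + FP)   share of normal examples that are misclassified
    Ef  = DR / FAR         assumed reading of Equation (6)
    """
    dr = tp / (tp + fn)
    far = fp / (tn + fp)
    # A model without any false alarm has unbounded efficiency under this definition.
    ef = dr / far if far > 0 else float("inf")
    return dr, far, ef


# Hypothetical counts: 1,000 attack examples and 1,000 normal examples.
dr, far, ef = detection_metrics(tp=977, fn=23, fp=100, tn=900)
print(f"DR={dr:.3f} FAR={far:.3f} Ef={ef:.2f}")
```

Because Ef divides by FAR, small changes in a low false alarm rate move Ef far more than comparable changes in DR, which is worth keeping in mind when reading the Efficiency column of Table 2.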

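As a rough illustration of how Sections 3 and 5.2 fit together, the following dependency-free Python sketch maps the first 1024 payload bytes to a 32×32 grey-scale map and applies one convolution, activation, and pooling stage in the sense of Equations (1) to (3). All function names, the zero-padding of short payloads, the choice of ReLU and 2×2 max pooling, and the random kernel are assumptions made for illustration; they are not taken from the authors' implementation.

```python
import random


def bytes_to_image(payload, side=32):
    """Map the first side*side payload bytes to a side x side grid of values in [0, 1]."""
    n = side * side
    buf = payload[:n].ljust(n, b"\x00")  # assumption: short payloads are zero-padded
    return [[buf[r * side + c] / 255.0 for c in range(side)] for r in range(side)]


def conv2d_valid(x, w, b):
    """Equation (1) without padding: weighted sum over each patch plus a bias.

    Each patch is indexed by its top-left corner here rather than its center.
    """
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[sum(w[u][v] * x[i + u][j + v] for u in range(kh) for v in range(kw)) + b
             for j in range(out_w)]
            for i in range(out_h)]


def relu(z):
    """Equation (2) with ReLU as the activation function a(.)."""
    return [[max(v, 0.0) for v in row] for row in z]


def max_pool(a, size=2):
    """Equation (3) with max pooling over non-overlapping size x size neighborhoods."""
    h, w = len(a) // size, len(a[0]) // size
    return [[max(a[i * size + u][j * size + v] for u in range(size) for v in range(size))
             for j in range(w)]
            for i in range(h)]


rnd = random.Random(0)
payload = bytes(rnd.randrange(256) for _ in range(1024))          # stand-in for real payload bytes
kernel = [[rnd.uniform(-1.0, 1.0) for _ in range(5)] for _ in range(5)]  # untrained 5x5 kernel

image = bytes_to_image(payload)
feature_map = max_pool(relu(conv2d_valid(image, kernel, b=0.1)))
print(len(image), len(image[0]))              # 32 32
print(len(feature_map), len(feature_map[0]))  # 14 14
```

With a 5×5 kernel and no padding, the 32×32 input shrinks to 28×28 after convolution and to 14×14 after 2×2 pooling. In a trained network the kernel weights would be learned by BP rather than drawn at random.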
Table 3. Comparative results.
Category   Detection Rate (%)   False Alarm Rate (%)   Efficiency (%)
GRNN       59.12                12.46                  87.54
PNN        96.33                3.34                   96.66
RBNN       69.83                6.95                   93.05
KNN        45.74                46.49                  90.74
SVM        87.65                6.12                   90.4
Bayes      77.6                 17.57                  88.46
CNN        97.7                 9.996                  98.11

6 CONCLUSIONS

This paper established an intrusion detection algorithm based on a convolutional neural network and
evaluated the IDS model. In the training phase, it generated datasets by sampling examples from the
KDD Cup 1999 dataset and converted the data into two-dimensional form to suit convolutional neural
network learning. In the test phase, it extracted 10 test datasets and tested their performance.
Compared with other IDS classifiers, the intrusion detection model based on a convolutional neural
network has the highest detection rate and precision. This demonstrates the feasibility of applying
convolutional neural networks to intrusion detection. In the future, we will focus on reducing the
false alarm rate. We will start from detection rules based on a white list and increase the amount of
normal data the model learns from, in order to reduce the false alarm rate efficiently while also
providing real data for training and testing.

ACKNOWLEDGEMENT

This paper is supported by the National Key Research and Development Project (No.: 2016YFB0801505,
2016YFB0801601).

REFERENCES

[1] Yuebin, B. & Kobayashi, H. 2003. Intrusion detection systems: technology and development. AINA
2003. 17th International Conference on. IEEE.
[2] Ozgur, D. et al. 2005. An intelligent intrusion detection system (IDS) for anomaly and misuse
detection in computer networks. Expert Systems with Applications, 29(4): 713-722.
[3] Taeshik, S. & Jongsub, M. 2007. A hybrid machine learning approach to network anomaly detection.
Science Direct, Information Sciences, 177: 3799-3821.
[4] Yehui, C., Ajith, A. & Bo, Y. 2007. Hybrid flexible neural-tree-based intrusion detection systems.
International Journal of Intelligent Systems, 22(4): 337-352.
[5] Yihua, L. & Vemuri, V. R. 2002. Use of k-nearest neighbor classifier for intrusion detection.
Computers & Security, 21(5): 439-448.
[6] Andrew, H. S. & Mukkamala, S. 2003. Identifying important features for intrusion detection using
support vector machines and neural networks. Applications and the Internet.
[7] Maheshkumar, S. & Serpen, G. 2003. Application of machine learning algorithms to KDD intrusion
detection dataset within misuse detection context. MLMTA.
[8] Kayacik, H., Zincir-Heywood, A. & Heywood, M. 2007. A hierarchical SOM-based intrusion detection
system. In Proc. Elsevier Engineering Application of Artificial Intelligence, pp: 439-451.
[9] Shum, J. & Malki, H. A. 2008. Network intrusion detection system using neural network. In Proc.
IEEE Fourth Int. Conference on Natural Computation, pp: 242-246.
[10] Mukkamala, S., Andrew, H. S. & Abraham, A. 2003. Intrusion detection using ensemble of soft
computing paradigms. Intelligent Systems Design and Applications. Springer Berlin Heidelberg, pp:
239-248.
[11] Xue, J. S., Sun, J. Z. & Zhang, X. 2004. Recurrent network in network intrusion detection system.
In Machine Learning and Cybernetics, Proceedings of 2004 International Conference on, 5: 2676-2679.
[12] Skaruz, J. & Seredynski, F. 2007. Recurrent neural networks towards detection of SQL attacks. In
Parallel and Distributed Processing Symposium, 2007, pp: 18.
[13] Jihyun, K. & Howon, K. 2015. Applying recurrent neural network to intrusion detection with
Hessian free optimization. In Proc. WISA.
[14] LeCun, Y., Denker, J. S., Henderson, D. et al. 1990. Handwritten digit recognition with a
back-propagation network. Advances in Neural Information Processing Systems, Colorado, USA: [s.n.]:
396-404.
[15] LeCun, Y., Bottou, L., Bengio, Y. & Haffner, P. 1998. Gradient-based learning applied to document
recognition. Proceedings of the IEEE.
[16] Hecht-Nielsen, R. 1989. Theory of the back-propagation neural network. In IJCNN.
[17] Nair, V. & Hinton, G. E. 2010. Rectified linear units improve restricted Boltzmann machines. In
ICML.
[18] Wang, T., Wu, D., Coates, A. & Ng, A. 2012. End-to-end text recognition with convolutional neural
networks. In ICPR.
[19] Boureau, Y., Ponce, J. & LeCun, Y. 2010. A theoretical analysis of feature pooling in visual
recognition. In ICML.
[20] Zeiler, M. D. & Fergus, R. 2014. Visualizing and understanding convolutional networks. In ECCV.
[21] Simonyan, K. & Zisserman, A. 2015. Very deep convolutional networks for large-scale image
recognition. In ICLR.
[22] Krizhevsky, A., Sutskever, I. & Hinton, G. E. 2012. Imagenet classification with deep
convolutional neural networks. In NIPS.
[23] LeCun, Y., Bottou, L. & Bengio, Y. et al. 1998. Gradient-based learning applied to document
recognition. Proceedings of the IEEE, 86(11): 2278-2324.
[24] http://kdd.ics.uci.edu
[25] Wang, Z. Y. 2015. The applications of deep learning on traffic identification. In Black Hat USA.
