
Network Malicious Behavior Detection

Using Bidirectional LSTM

Wenwu Chen1, Su Yang1,2, Xu An Wang1,2,3, Wei Zhang1, and Jindan Zhang4,5

1 Key Laboratory for Network and Information Security of the Chinese Armed Police Force, Engineering University of the Chinese Armed Police Force, Xi'an, Shaanxi, China
Chenwen5abc@163.com, wangxazjd@163.com
2 Department of Electronic Technology, Engineering University of the Chinese Armed Police Force, Xi'an, Shaanxi, China
3 Guangxi Key Laboratory of Cryptography and Information Security, Guilin University of Electronic Technology, Guilin, People's Republic of China
4 State Key Laboratory of Integrated Service Networks, School of Telecommunications Engineering, Xidian University, Xi'an, Shaanxi, China
5 Xianyang Vocational Technical College, Xianyang, Shaanxi, China

Abstract. With the rapid development of the Internet, cyber attack methods have become more complex and the damage they cause has grown steadily. Timely detection of malicious behavior on the network has therefore become an important security issue. This paper proposes an intrusion detection system based on deep learning, applies a bidirectional long short-term memory (Bi-directional LSTM) architecture to the system, and uses the UNSW-NB15 data set for training and testing. Experiments show that the system can effectively detect both known and unknown malicious network behavior in the current network environment.

1 Introduction

In recent years, with the rapid development of the Internet, cyberspace security has faced increasingly serious challenges. Among the technologies for protecting network security, the malicious behavior detection system has received more and more attention as an important means of protection. A malicious behavior detection system, namely an intrusion detection system (IDS) [1], collects and analyzes network behavior, security logs, and audit data to check whether the network or system shows violations of the security policy or signs of being attacked.
According to the detection technique, IDSs can be divided into two categories:
(a) Anomaly detection: the system learns and summarizes normal behavior to form normal behavior patterns; when the difference between the current behavior and the normal pattern exceeds a threshold, an intrusion is declared. (b) Feature (misuse) detection: the system first collects the behavioral characteristics of abnormal operations [10, 11]; when the current behavior matches an abnormal pattern, it is judged to be an intrusion.

© Springer International Publishing AG, part of Springer Nature 2019


L. Barolli et al. (Eds.): CISIS 2018, AISC 772, pp. 627–635, 2019.
https://doi.org/10.1007/978-3-319-93659-8_57

With the development of machine learning, especially deep learning, more and more researchers have begun to apply deep learning to intrusion detection systems [12]. Deep learning obtains optimal weights through complex neural network structures and iterative computation, and can therefore produce more accurate results.
As a sequence model, the long short-term memory network (LSTM) [2] treats network connection data as a time sequence, which matches the structure of current network attacks well. Staudemeyer et al. [3] first used LSTM for network intrusion detection. The input features were the original 41 features of the KDD99 data set; the output vector had length 5, covering 4 attack types and the normal condition. The method achieved 93.3% accuracy, exceeding the winning entry of KDD Cup 1999. Kim et al. [4] used LSTM for network intrusion detection and parameter selection on the KDD99 data set and obtained a high detection rate (98.88%) and accuracy (96.93%); however, the false alarm rate was also high, reaching 10.04%. These studies indicate that LSTM overcomes the problem of rapid information loss in neural networks [13], can recall previous connection information, and can play a greater role in the field of intrusion detection.
This paper adopts a bidirectional long short-term memory network (Bi-directional LSTM) for the intrusion detection system, using the UNSW-NB15 data set for training and testing. Through experiments, the optimal parameters of the bidirectional LSTM are found, and a high detection rate and low false alarm rate are obtained.

2 Bi-directional LSTM

A Bi-directional LSTM [5] uses a finite sequence to predict or tag each element of the sequence according to its past and future context. It combines the outputs of two LSTMs, one processing the sequence from left to right and the other from right to left; the composite output is the prediction of the given target signal. This technique has proved particularly useful.

2.1 Recurrent Neural Network


Neural network experts such as Jordan and Elman proposed a recurrent neural network (RNN) structure in the late 1980s. The essential feature of this network is that, besides feedforward connections, there are internal feedback connections between the processing units. From a systems viewpoint, it is a feedback dynamical system that reflects the dynamic characteristics of the process in its computation.
In a feedforward neural network, connections exist only between layers, with no connections among the nodes within a layer. A recurrent neural network, by contrast, can process sequences of any length by using neurons with self-feedback.
A recurrent neural network is closer to the structure of biological neural networks than a feedforward network. A directed graph is formed along the sequence of units in the network to represent the dynamic change of a time series. The RNN processes input sequences using an internal state and therefore has a memory function. Recurrent neural networks have been widely used in speech recognition, language modeling, and natural language generation (Figs. 1, 3 and Tables 1, 2, 3).

Fig. 1. A standard RNN and its unfolded form. The chain-like structure reveals that the RNN is essentially related to sequences and lists; RNNs are the most natural neural networks for this kind of data.

The RNN has recurrent connections. Let the input sequence, hidden vector sequence, and output vector sequence be denoted X, H, and O. The input sequence is given by X = (x_1, x_2, ..., x_T).
The RNN computes the hidden vector sequence H = (h_1, h_2, ..., h_T) and the output vector sequence O = (o_1, o_2, ..., o_T) for t = 1 to T as follows:

h_t = σ(W_xh x_t + W_hh h_{t-1} + b_h)    (1)

o_t = W_ho h_t + b_o    (2)

where σ is a nonlinear activation function, W denotes a weight matrix, and b a bias term.
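Eqs. (1) and (2) can be sketched as a single NumPy time step. This is an illustrative toy, not the paper's implementation; tanh is chosen here as the nonlinearity σ, and all dimensions are made up:

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h, W_ho, b_o):
    """One RNN time step implementing Eqs. (1)-(2), with sigma = tanh."""
    h_t = np.tanh(W_xh @ x_t + W_hh @ h_prev + b_h)  # Eq. (1)
    o_t = W_ho @ h_t + b_o                            # Eq. (2)
    return h_t, o_t

# toy dimensions: 3-dim input, 4-dim hidden state, 2-dim output
rng = np.random.default_rng(0)
W_xh, W_hh = rng.normal(size=(4, 3)), rng.normal(size=(4, 4))
W_ho = rng.normal(size=(2, 4))
b_h, b_o = np.zeros(4), np.zeros(2)

h = np.zeros(4)
for x in rng.normal(size=(5, 3)):       # unfold over T = 5 time steps
    h, o = rnn_step(x, h, W_xh, W_hh, b_h, W_ho, b_o)
# h has shape (4,), o has shape (2,)
```

The hidden state h is the only thing carried between time steps, which is exactly the "internal state" that gives the RNN its memory function.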

2.2 LSTM Cell


Long short-term memory (LSTM) is an RNN architecture proposed by Hochreiter and Schmidhuber [2]. The problems of vanishing and exploding gradients are addressed by introducing a gate structure and a memory cell. Figure 2 shows a single LSTM cell; the operation of its three gates and cell state is described by the formulas below [9].
σ is the sigmoid function, and i, f, o, and c denote the input gate, forget gate, output gate, and cell state. W_ci, W_cf, and W_co denote the weight matrices of the peephole connections.
In the LSTM, the three gates (i, f, o) control the flow of information. The input and forget gates together determine how much information from the previous cell layer remains in the current memory cell, and the output gate determines how much information the next cell layer can acquire from the current neuron. The specific process is as follows:
(A) Forgetting information from the cell state is determined by the sigmoid function of the forget gate, which takes the input of the current layer and the output of the previous layer as input and acts on the cell state at time t−1:

Fig. 2. The basic architecture of the LSTM model, in which the middle four interacting layers
are the core of the entire model.

Fig. 3. The basic architecture of the Bi-directional LSTM model

Table 1. Details of a partition of the UNSW-NB15 data set


Class Category Training set Testing set
0 Normal 56,000 37,000
1 Analysis 2000 677
2 Backdoor 1746 583
3 DoS 12,264 4089
4 Exploits 33,393 11,132
5 Fuzzers 18,184 6062
6 Generic 40,000 18,871
7 Reconnaissance 10,491 3496
8 Shellcode 1133 378
9 Worms 130 44
Total records 175,341 82,332

Table 2. Optimal model parameters


Learning rate 0.01
Hidden layer nodes 128
Batch size 10
Epochs 10500

Table 3. Bi-directional LSTM network architecture


Layer Dimensions Activation Dropout
Input 49 – –
Bi-directional LSTM 128 Sigmoid 0.4
Bi-directional LSTM 128 Sigmoid 0.4
Dense 10 Softmax –

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)    (3)

(B) The information stored in the cell state consists of two parts: the output of the sigmoid layer of the input gate, which weights the update, and the new candidate vector created by the tanh function, which is added to the cell state. The old cell state is multiplied by f_t and combined with the new candidate information to produce the cell state update.

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)    (4)

Ĉ_t = tanh(W_C · [h_{t-1}, x_t] + b_C)    (5)

C_t = f_t * C_{t-1} + i_t * Ĉ_t    (6)

(C) The output information is determined by the output gate. First, a sigmoid layer selects the cell state information to be output; the cell state is then passed through tanh, and the product of the two parts is the output value.

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)    (7)

h_t = o_t * tanh(C_t)    (8)
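Eqs. (3)-(8) amount to a short forward pass per time step. Below is a minimal NumPy sketch, illustrative only: for brevity one weight matrix per gate acts on the concatenated [h_{t-1}, x_t], the peephole terms (W_ci, W_cf, W_co) are omitted, and all sizes are toy values:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM cell step implementing Eqs. (3)-(8)."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate,   Eq. (3)
    i = sigmoid(W["i"] @ z + b["i"])         # input gate,    Eq. (4)
    c_hat = np.tanh(W["c"] @ z + b["c"])     # candidate,     Eq. (5)
    c = f * c_prev + i * c_hat               # cell update,   Eq. (6)
    o = sigmoid(W["o"] @ z + b["o"])         # output gate,   Eq. (7)
    h = o * np.tanh(c)                       # hidden output, Eq. (8)
    return h, c

n_in, n_hid = 3, 4
rng = np.random.default_rng(1)
W = {k: rng.normal(size=(n_hid, n_hid + n_in)) for k in "fico"}
b = {k: np.zeros(n_hid) for k in "fico"}

h, c = np.zeros(n_hid), np.zeros(n_hid)
for x in rng.normal(size=(5, n_in)):         # run over a toy sequence
    h, c = lstm_step(x, h, c, W, b)
```

Note how c is updated only by elementwise gating (Eq. 6), which is what lets gradients flow over long sequences without vanishing.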

2.3 Bi-LSTM Network


When a standard LSTM network processes a time series, prediction quality suffers because future context information is ignored and the network does not learn from the whole sequence. The Bi-directional LSTM [5] attaches two LSTM cells to each training sequence, one running forward and one running backward. This structure can compute both the past and the future state of each element in the input sequence.
The hidden layer of a Bi-directional LSTM network stores two values: A, which participates in the forward computation, and A′, which participates in the backward computation. The final output value y depends on both A and A′.
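The forward/backward wiring can be illustrated in a few lines. In this sketch a simple tanh recurrence stands in for the LSTM cell, since only the bidirectional combination (A and A′ concatenated per position) is the point; all dimensions are made up:

```python
import numpy as np

rng = np.random.default_rng(2)
n_in, n_hid, T = 3, 4, 5
Wx, Wh = rng.normal(size=(n_hid, n_in)), rng.normal(size=(n_hid, n_hid))

def step(x, h):
    # a plain tanh recurrence stands in for the LSTM cell of Sect. 2.2
    return np.tanh(Wx @ x + Wh @ h)

def bidirectional(seq):
    """Run the cell forward and backward over seq and concatenate the two
    hidden states at each position, so position t sees past and future."""
    fwd, h = [], np.zeros(n_hid)
    for x in seq:                    # left-to-right pass (A)
        h = step(x, h)
        fwd.append(h)
    bwd, h = [], np.zeros(n_hid)
    for x in seq[::-1]:              # right-to-left pass (A')
        h = step(x, h)
        bwd.append(h)
    bwd = bwd[::-1]                  # realign with forward time order
    return [np.concatenate([f, b]) for f, b in zip(fwd, bwd)]

outs = bidirectional(rng.normal(size=(T, n_in)))
# each output has dimension 2 * n_hid
```

This doubling of the hidden dimension is why a Bidirectional wrapper around a 128-unit LSTM emits 256-dimensional vectors per time step.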

3 The Data Set

In the field of network intrusion detection, the KDD99 data set [6] is the de facto benchmark and laid the foundation for research on machine-learning-based network intrusion detection. However, the KDD99 data set was created in 1998; its experimental conditions and attack methods are outdated, and attacks have since evolved from the network layer to the application layer (e.g., cross-site scripting, cross-site request forgery, and clickjacking), so the data set does not reflect modern network traffic.
To solve this problem, Nour Moustafa and Jill Slay created the UNSW-NB15 data set in 2015 [7, 8]. It reflects modern network traffic patterns and contains many low-footprint intrusions and deeply structured network traffic information. UNSW-NB15 is a collection of approximately 100 GB of raw network traffic created by the Cyber Range Lab of the Australian Centre for Cyber Security (ACCS). The data set includes realistic modern normal activities and synthetic contemporary attacks. It contains 49 features and some 2,540,044 data instances, covering normal records and nine attack types: Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. The published training and testing subsets include 175,341 and 82,332 records, respectively.
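Mapping the category labels of the partition to the class indices of Table 1 might look as follows. This is a sketch: the commented file name is the commonly distributed one (an assumption, not stated in the paper), and the tiny frame below is made-up illustration data:

```python
import pandas as pd

# Hypothetical file name for the published partition:
# df = pd.read_csv("UNSW_NB15_training-set.csv")
# For illustration, a tiny frame with the same label-column layout:
df = pd.DataFrame({
    "attack_cat": ["Normal", "Generic", "Exploits", "Normal", "DoS"],
    "label":      [0, 1, 1, 0, 1],   # 0 = normal, 1 = attack (binary flag)
})

# class indices in the order of Table 1
classes = ["Normal", "Analysis", "Backdoor", "DoS", "Exploits",
           "Fuzzers", "Generic", "Reconnaissance", "Shellcode", "Worms"]
df["class_id"] = df["attack_cat"].map({c: i for i, c in enumerate(classes)})
# df["class_id"] is now [0, 6, 4, 0, 3]
```

The resulting class_id column is what a 10-way softmax output layer is trained against.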
The attack types in the UNSW-NB15 data set fall into nine categories, as follows:
(1) Fuzzers: the attacker attempts to discover security vulnerabilities in an application, operating system, or network by feeding it large amounts of random input data to make it crash.
(2) Analysis: a variety of intrusions that penetrate web applications via ports (e.g., port scanning), e-mail (e.g., spam), and web scripts (e.g., HTML files).
(3) Backdoor: a technique that bypasses normal authentication to gain unauthorized remote access to a device while remaining hard to detect.
(4) DoS: an intrusion that exhausts computer resources such as memory, generating so much load that legitimate access to the device is prevented.
(5) Exploit: a sequence of instructions that takes advantage of a glitch, vulnerability, or bug, leading to unintentional or unanticipated behavior on a host or network.
(6) Generic: a technique that uses a hash function to attack every block cipher by causing a collision, without regard to the configuration of the block cipher.
(7) Reconnaissance: an attack that gathers information about a computer network in order to evade its security controls.
(8) Shellcode: malware in which the attacker injects a small piece of code starting from a shell to control the compromised machine.
(9) Worm: an attack that replicates itself in order to spread to other computers. Typically, it uses a computer network to spread, relying on security failures on the target computer to gain access.

4 The Experiment
4.1 Experimental Environment
The experimental environment adopted in this paper is as follows:
CPU: Intel(R) Core(TM) i7-6700HQ @ 2.60 GHz
Memory: 4 GB
GPU: GTX 960M
Operating system: Windows 10
Machine learning framework: TensorFlow 1.3 + Keras 2.1

4.2 Model Build


Each connection in the data set is described by 49 features, and the attack category takes 10 different values. Therefore, the input layer has 49 nodes and the output layer has 10. The output layer uses a softmax classifier, the optimizer is Adam, and the loss function is "binary_crossentropy".
A training subset is used to determine the optimal parameters. The parameters to be selected are the learning rate, the number of hidden layer nodes, the time steps, the batch size, and the number of epochs.
Iterating through combinations of these parameters with the grid search utility ("grid_search") of the sklearn library, the optimal parameters were determined as follows:
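The grid search procedure can be sketched with scikit-learn's GridSearchCV. Since the paper's Keras model is not reproduced here, a small MLPClassifier on synthetic data stands in as the estimator, so the parameter names below are sklearn's, not the paper's:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.neural_network import MLPClassifier

# synthetic stand-in data; the paper searches over a Keras Bi-LSTM instead
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# grid of candidate hyperparameters, analogous to Table 2's search space
param_grid = {
    "learning_rate_init": [0.1, 0.01],
    "hidden_layer_sizes": [(64,), (128,)],
    "batch_size": [10, 32],
}
search = GridSearchCV(MLPClassifier(max_iter=30, random_state=0),
                      param_grid, cv=3)
search.fit(X, y)                 # exhaustively fits every combination
best = search.best_params_       # combination with the best CV score
```

Each parameter combination is cross-validated, and the best-scoring combination is retained, which is the procedure the paper uses to arrive at Table 2.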
The sequential model of Keras was initialized with the API function Sequential(), and two hidden Bi-directional LSTM layers were added, with the input dimension set to 49 and 128 nodes in each hidden layer. To prevent overfitting during training, Dropout was set to 0.4. A Dense layer with 10 nodes was added as the output layer.
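A Keras model matching this description and Table 3 might look as follows. This is a sketch, not the authors' code: the time-step dimension is assumed to be 1 (one 49-feature connection record per sequence step), and categorical_crossentropy is used here as the conventional loss for a 10-class softmax, whereas the paper states "binary_crossentropy":

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Bidirectional, LSTM, Dense, Dropout

model = Sequential([
    # first Bi-LSTM layer: 128 units per direction over (time, features)
    Bidirectional(LSTM(128, return_sequences=True), input_shape=(1, 49)),
    Dropout(0.4),                       # regularization, as in Table 3
    Bidirectional(LSTM(128)),           # second Bi-LSTM layer
    Dropout(0.4),
    Dense(10, activation="softmax"),    # 10-way output: normal + 9 attacks
])
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
# model.output_shape is (None, 10)
```

return_sequences=True on the first layer is required so the second Bi-LSTM layer receives a full sequence rather than only the final hidden state.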
As a comparison, we use an ordinary LSTM network to conduct the same classification test, with the relevant parameters unchanged.

4.3 Classification Result Analysis


In Tables 4 and 5, the attack categories are represented by numbers: 0 represents normal connections, and 1–9 denote Fuzzers, Analysis, Backdoors, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms, in that order.
The experimental results show that the model can accurately determine whether a connection is normal or not, but it performs poorly at attack classification. The reason may be that normal connections have distinctive characteristics and are easy to identify, whereas some attack types have relatively few samples and differ little from one another, so it is not easy to distinguish between attacks.
Compared with the ordinary LSTM network, the bidirectional LSTM network performs better.
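Per-class precision, recall, and F1 values of the kind shown in Tables 4 and 5 can be computed with scikit-learn; the labels below are made up for illustration and do not come from the paper's experiments:

```python
from sklearn.metrics import classification_report, precision_score

# toy labels: 0 = normal, other integers = attack classes
y_true = [0, 0, 0, 2, 2, 4]
y_pred = [0, 0, 2, 2, 2, 4]

# per-class precision/recall/F1 table, like Tables 4 and 5
print(classification_report(y_true, y_pred, zero_division=0))

# support-weighted average precision, like the "Avg" row
p = precision_score(y_true, y_pred, average="weighted", zero_division=0)
# p == 8/9 for these toy labels
```

The support-weighted average explains why the "Avg" rows in Tables 4 and 5 look high despite many zero per-class scores: the majority classes (normal, Exploits) dominate the weighting.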

Table 4. Classification result with bi-directional LSTM network


Class Precision Recall F1-score
0 0.99 0.97 0.96
1 0.00 0.00 0.00
2 0.22 0.87 0.36
3 0.26 0.19 0.20
4 0.59 0.70 0.64
5 0.27 0.16 0.20
6 0.00 0.00 0.00
7 0.00 0.00 0.00
8 0.00 0.00 0.00
9 0.00 0.00 0.00
Avg 0.93 0.89 0.91

Table 5. Classification result with LSTM network


Class Precision Recall F1-score
0 0.94 0.96 0.94
1 0.00 0.00 0.00
2 0.17 0.64 0.27
3 0.18 0.13 0.15
4 0.45 0.73 0.56
5 0.19 0.23 0.11
6 0.00 0.00 0.00
7 0.00 0.00 0.00
8 0.00 0.00 0.00
9 0.00 0.00 0.00
Avg 0.85 0.88 0.86

5 Conclusion

In this paper, a Bi-directional LSTM network is used to classify the UNSW-NB15 data set. A neural network consisting of two Bi-directional LSTM layers is constructed, and the optimal parameters are found using the training data set. The trained network is applied to the test data set, and high accuracy and recall are obtained in normal/attack detection. For comparison, an ordinary LSTM network was trained and tested on the same data set. The results show that the bidirectional LSTM network effectively improves detection performance.
Some problems were found during the experiments; for example, the large, imbalanced data set makes it difficult for the classifier to generalize across classes. These problems need to be resolved in future work.

Acknowledgements. This work is supported by National Natural Science Foundation of China


(Grant Nos. 61772550, 61572521, U1636114), National Cryptography Development Fund of
China Under Grants No. MMJJ20170112, National Key Research and Development Program of
China Under Grants No. 2017YFB0802000, the Natural Science Basic Research Plan in Shaanxi
Province of china (Grant Nos. 2016JQ6037) and Guangxi Key Laboratory of Cryptography and
Information Security (No. GCIS201610).

References
1. Rowland, C.H.: Intrusion Detection System. US, US6405318 (2002)
2. Hochreiter, S., Schmidhuber, J.: Long short-term memory. Neural Comput. 9(8), 1735–1780
(1997)
3. Staudemeyer, R.C.: Applying long short-term memory recurrent neural networks to intrusion
detection. S. Afr. Comput. J. 56(1), 136–154 (2015)
4. Kim, J., et al.: Long short term memory recurrent neural network classifier for intrusion
detection. In: International Conference on Platform Technology and Service IEEE, pp. 1–5
(2016)
5. Graves, A., Schmidhuber, J.: Framewise phoneme classification with bidirectional LSTM
and other neural network architectures. Neural Netw. 18(5), 602–610 (2005)
6. Stolfo, S.J.: KDD Cup 1999 Dataset (1999)
7. Moustafa, N., Slay, J.: UNSW-NB15: a comprehensive data set for network intrusion
detection systems (UNSW-NB15 network data set). In: Military Communications and
Information Systems Conference (MilCIS), IEEE (2015)
8. Moustafa, N., Slay, J.: The evaluation of network anomaly detection systems: statistical
analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set. Inf.
Secur. J. Glob. Perspect. 25, 1–14 (2016)
9. Olah, C.: Understanding LSTM Networks (2015). http://colah.github.io/posts/2015-08-Understanding-LSTMs/
10. Denning, D.E.: An Intrusion-Detection Model. IEEE Press, New York (1987)
11. Lee, W., Stolfo, S.J., Mok, K.W.: A data mining framework for building intrusion detection
models. In: Proceedings of the 1999 IEEE Symposium on Security and Privacy, p. 0120
(1999)
12. Ryan, J., Lin, M.J., Miikkulainen, R.: Intrusion detection with neural networks. Adv. Neural.
Inf. Process. Syst. 28(10), 915 (1998)
13. Gers, F., Schmidhuber, J., Cummins, F.: Learning to forget: continual prediction with
LSTM. Neural Comput. 12(10), 2451–2471 (2000)
