10.a Novel Multimodal-Sequential Approach Based On Multi-View Features For Network Intrusion Detection

Received November 11, 2019, accepted November 30, 2019, date of publication December 12, 2019,
date of current version December 27, 2019.

Digital Object Identifier 10.1109/ACCESS.2019.2959131
A Novel Multimodal-Sequential Approach Based on Multi-View Features for Network Intrusion Detection
HAITAO HE 1,3 , XIAOBING SUN 1,3 , HONGDOU HE 1,3 , GUYU ZHAO 1,3 , LIGANG HE 2,
AND JIADONG REN 1,3

1 College of Information Science and Engineering, Yanshan University, Qinhuangdao 066004, China
2 Department of Computer Science, University of Warwick, Coventry CV4 7AL, U.K.
3 Key Computer Virtual Technology and System Integration Laboratory, Yanshan University, Qinhuangdao 066004, China
Corresponding author: Xiaobing Sun (dabingsun@yeah.net)
This work was supported in part by the National Natural Science Foundation of China under Grant 61772449 and Grant 61572420 and in
part by the Natural Science Foundation of Hebei Province under Grant F2019203120.
ABSTRACT Network intrusion detection systems (NIDS) are essential tools for ensuring network information security, and neural networks have become an increasingly popular solution for NIDS. However, as the network environment grows more complex, existing solutions based on conventional neural networks cannot make full use of the rich information in network traffic data because of their single structure. More importantly, this leaves existing NIDS with incomplete knowledge of the intrusion detection domain, so they cannot achieve a high detection rate and good stability in new environments. In this paper, we take a step forward and extract features at different levels from each network connection, rather than using the single long feature vector of the traditional approach, so that feature information can be processed separately and more efficiently. Further, we propose a multimodal-sequential intrusion detection approach with a hierarchical progressive network structure, which is supported by multimodal deep auto-encoder (MDAE) and LSTM technologies. Through this hierarchical progressive network structure, our approach can efficiently integrate the features at different levels within a network connection and, at the same time, automatically learn the temporal information between adjacent network connections. Using three benchmark datasets released between 1999 and 2017, namely NSL-KDD, UNSW-NB15, and CICIDS 2017, we investigated the performance of the proposed approach on the task of detecting attacks within modern networks. The experimental results show that the average accuracy of this method is 94% in binary classification and 88% in multi-class classification, which is at least 2% and 4% higher than other methods, respectively, and demonstrate that our model has excellent stability. Moreover, we further explore the multimodality and complementarity in traffic data; the experimental results show that the performance of a detection model can be further improved by 2% to 5% when our MDAE model is used to process the features of traffic data.
INDEX TERMS Network anomaly detection, hierarchical progressive network, multimodal deep learning.
The associate editor coordinating the review of this manuscript and approving it for publication was Mamoun Alazab.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see http://creativecommons.org/licenses/by/4.0/
VOLUME 7, 2019 183207

I. INTRODUCTION
With the rapid development of information and communication technology (ICT), the Internet has brought great convenience to network users. However, the problem of information security is becoming more and more serious with the increase in network intrusion attacks, such as DDoS, cross-site scripting, and probe attacks. For example, ICT systems and networks that hold sensitive user data are prone to various attacks, which can result in serious data breaches [1]. In the field of cybersecurity defense, the network intrusion detection system (NIDS) is an important security countermeasure for identifying and preventing malicious intrusion [2], [3]. In fact, network intrusion detection is a typical classification problem: its task is to watch network behaviors continuously and determine whether to send an alarm message to the network
system administrators. Specifically, intrusion detection units need to identify attacks or possible threats concealed within network traffic in a timely and automatic manner, so that systems in the network keep working stably and efficiently.
In order to protect target systems and networks from malicious activities, many researchers have committed to building more reliable traffic data and more effective intrusion detection models. In the early stage of the Internet, most network attacks could be handled effectively with port-based detection and firewall technology, because the transport protocols used for computer communication were relatively fixed and simple [4]. However, with the advancement of network, software, and hardware technology in recent years, network behavior has become a more complex human-computer interaction process that is associated with various protocols and user activities. Therefore, these early detection technologies are inadequate for detecting modern attacks.
At present, relatively mature technologies for detecting network attacks include misuse detection and anomaly detection. Misuse detection can quickly identify attacks by matching predefined patterns while keeping false alarm rates at acceptable levels. However, its pattern database must be maintained manually, and it can only detect well-known attacks. Hence, researchers have attempted to develop flexible and adaptive misuse NIDS to compensate for these shortcomings [5]–[7]. However, because knowledge of unknown attack patterns is limited in a dynamic network ecosystem, there is still no efficient detection method for new attacks. In contrast, anomaly detection has the ability to identify new attacks. It uses a statistical model established from both normal and abnormal traffic and identifies attacks by observing whether the current action deviates from normal actions. Although anomaly detection may suffer from high false positive rates [8], integrating this technology with adaptive software frameworks or artificial intelligence methods is still attractive for constructing intelligent detection models, for example by using a typical artificial neural network (ANN), which mimics human decision making through neurons and activation functions, to build a NIDS.
Moreover, factors such as reliable traffic data also play a very important role in improving detection efficiency. Reliable traffic data is a comprehensive reflection of contemporary threats and of the normal range of traffic [9]. It can be conceptualized as relational data [10]. Intuitively, according to RFC 2722 [11], a network connection can be considered equivalent to the artificial logic of a call or connection, which contains aggregated feature values reflecting the events that occur during this connection, such as the packet size, the number of packets, and other statistical features of the traffic packets. At the same time, the network connections in the traffic data are collected sequentially according to their access timestamps. This indicates that traffic data contains not only the feature information of each network connection, but also structured information such as the temporal information between network connections. The rich information in traffic data can help to achieve better intrusion detection performance.
However, most current studies consider the information in traffic data only incompletely. The authors of [12], [13] hold that the features describing a network connection have complex relationships, and they use an unsupervised learning method based on an auto-encoder neural network to extract intermediate representations of the features. However, as network infrastructure and architecture develop, the features become more and more heterogeneous, and a traditional neural network with its simple structure cannot obtain effective information from the domain knowledge of traffic data. Besides, considering the context between adjacent network connections, some studies [14], [15] attempt to use a recurrent neural network (RNN) to learn time-related information. These considerations do promote the development of intrusion detection, but it is hard for these studies to use the full information of traffic data. In these methods, the complex feature information within a network connection and the temporal information between network connections are either ignored completely or considered only superficially. As a result, the intrusion detection models inevitably lose some of the information in the traffic data and can only classify with incomplete feature information.
To this end, this paper integrates anomaly detection theory with advanced artificial intelligence technology and proposes an intelligent multimodal-sequential methodology, with the goal of making full use of the multiple kinds of information in traffic data and further boosting the performance of an anomaly NIDS. Our methodology provides a novel multi-view solution: it extracts groups of features at different levels from the traffic data as sub-vectors, instead of using one long feature vector, and then uses a hierarchical progressive network structure to process the complex features within a network connection and learn the temporal information between network connections at the same time. To do so, our methodology provides scalable access to the sub-feature vectors through a scalable multimodal deep auto-encoder (MDAE), in which each access channel is a probabilistic graphical model that learns the distribution of the features at one level. Although, to our knowledge, this is the first time a multimodal deep learning method has been applied to an information security problem, our evaluation shows that it can integrate the multi-view features effectively and intelligently. Moreover, our methodology is also supported by sequence modeling through LSTM technology. Code has been released at https://github.com/dabingsun/MS-DHCP. The experimental results show the effectiveness of the proposed approach. The main contributions of this paper are as follows:
1. We provide a novel multi-view solution to reduce the complexity of features, and explore the use of advanced multimodal deep learning to establish an effective feature fusion module that learns the underlying structure of traffic data. To the best of our knowledge, this is the first time that multimodal technology based on deep learning has been used to solve the problem of intrusion detection in the field of information security.

2. We propose a novel methodology for designing an intelligent anomaly NIDS with a hierarchical progressive network structure, which makes full use of the rich structured information in traffic data and further improves the performance of intrusion detection.
3. The proposed methodology is scalable: the choice of interpreter in each access channel and the number of channels in the MDAE can be changed according to the feature views, which makes our methodology applicable to detection in new environments.
4. We investigated the performance of the proposed approach on the task of detecting attacks within modern networks. We implemented the method on each of three benchmark datasets released between 1999 and 2017 and demonstrated its excellent stability and accuracy in both binary and multi-class classification.
The remainder of this paper is organized as follows. Section II describes related work in the field of intrusion detection. Section III explains the proposed multimodal-sequential intrusion detection methodology. Section IV describes the experimental process and analyzes the performance of the proposed methodology on the three datasets. Finally, Section V concludes the paper.

II. RELATED WORK
In the early stages of research, many scholars focused on traditional machine learning methods. One popular approach is to use a shallow ANN to build intrusion detection models. For example, a Feedforward Neural Network (FNN) is applied to construct a classifier, and the back-propagation algorithm is used to train it [2]. Other popular machine learning approaches, such as Support Vector Machine (SVM), Random Forest (RF), and Multiclass SVM (MSVM), have also been used to implement intrusion detection systems [16].
Subsequently, some hybrid methods using multi-level and ensemble models were proposed. Among the multi-level models, the two-level model is the most representative, for example with feature engineering as the first stage and a classifier as the second stage. In [17], the kernel PCA technique is used for feature dimensionality reduction, and an SVM is then employed for classification. Most ensemble models combine single algorithms that are trained on different training samples, such as the combined classifier based on k-means and RF proposed for intrusion detection in [18]. When the performance of a hybrid method is compared with that of a single machine learning method, the former is observed to be more effective. However, the major problem with these methods is that, because of the limitations of shallow learning, they cannot effectively solve the intrusion detection problem when the data is massive and its relationships are complex.
Recently, deep learning methods have become a research hotspot and have also been widely used in the design of intrusion detection systems. In [12], a typical neural network classifier based on Self-taught Learning (STL) is proposed for intrusion detection. In this method, all the features of the traffic data are regarded as unimodal features, used to pre-train the STL model, and then classified by traditional supervised methods. In [13], the authors use all features to uniformly train the proposed Nonsymmetric Deep Auto Encoder (NDAE). It is worth noting that these methods use deep learning techniques for feature engineering and improve the efficiency and accuracy of attack detection. In [19], the authors propose a method for constructing an intrusion detection classifier using a deep belief net (DBN). The method first uses an unsupervised learning technique to pre-train the DBN and initialize the weight parameters, and then uses supervised training to fine-tune them. In [14], the authors explore how to build an intrusion detection system using a classical RNN. This is an attempt to use a time series model to learn the time-related features in intrusion detection, and the experimental results show that the model is superior to traditional machine learning models. In [20], a Conditional Variational Autoencoder (CVAE) with a specific architecture is used for intrusion detection in Internet of Things networks. Through this architecture, the method can recover missing features from incomplete training datasets and provides better classification results than other commonly used classifiers.
In addition, some well-known multi-input deep learning methods based on LSTM and CNN are also used for intrusion detection. In [21], a multi-channel LSTM assembles features from different sources, such as numeric-based, nominal-based, and binary-based features, and assigns them to different channels for training. In this work, each channel is an independent LSTM neural network, and the final detection result is obtained by majority voting over the multi-channel classifiers. Although the experimental results are good, the authors evaluate the multi-channel method on only one dataset, NSL-KDD. The NSL-KDD dataset is relatively old and lacks the attack types of modern networks. Similar to the multi-channel LSTM, a multi-channel CNN (MC-CNN) is proposed and used for DDoS attack detection [22]. Unlike the multi-channel LSTM, the MC-CNN model splits features according to their level, such as packet-based and traffic-based, and adds a fully connected layer after all the CNN channels to produce the final classification. The experiments show that the MC-CNN model can detect DDoS attacks with high accuracy. However, the authors focus only on DDoS attacks rather than on all modern attacks. In [23], considering the temporal and spatial features of each network connection, the authors design a hierarchical spatial-temporal features-based intrusion detection system (HAST-IDS) with a hierarchical deep network supported by CNN and LSTM, and detect attacks using pcap data that they extracted from DARPA1998 themselves. This study considers two aspects of the traffic data at the same time. Although the above approaches provide a new way to understand the rich information in traffic data, features usually have different structures and a high degree of non-linear correlation between them, which makes it difficult to use them fully. Although HAST-IDS used pcap data, its authors extracted only part of the pcap traffic data themselves, so the information they

get is incomplete from the beginning. Therefore, how to integrate these heterogeneous features into joint features is particularly important. Moreover, we found that these studies seem to ignore structured information such as the temporal relationships in traffic data, and that the dataset they use was released long ago and covers only a few types of attacks.
In this paper, in order to advance the understanding of network-based behaviors in information security, we do not use traffic records directly as input. Instead, we split each record into groups of heterogeneous features at different levels, and each group is considered as one feature view. To integrate these heterogeneous features and extract the structured information of the records, we design a multimodal-sequential intrusion detection approach with a hierarchical progressive network structure, in which the scalable multimodal deep auto-encoder (MDAE) is used in the first stage and the joint features are passed to the LSTM in the second stage. We implemented our method on three different datasets: CICIDS2017, UNSW-NB15, and NSL-KDD. The experimental results show that our approach achieves 96% and 99% accuracy on the modern datasets UNSW-NB15 and CICIDS 2017, respectively. We compared our approach with methods such as NB, SVM, and DNN on the three datasets; on average our approach outperforms the other methods, improves the accuracy of intrusion detection by at least 9%, and shows better stability. Moreover, in the final experimental section, we further explore the multimodality and complementarity of traffic data features using three different views (single-view, simple-view, and multi-view). The experimental results show that the performance of the detection model can be further improved when our approach is used to process the traffic data features.

III. METHOD
In this section, we develop a multimodal-sequential approach with a deep hierarchical progressive network for modern attack detection, called MS-DHPN. MS-DHPN consists of two layers. In the first layer, we provide a multimodal fusion algorithm based on the multimodal deep auto-encoder (MDAE) to integrate the complex features within each traffic flow at the low level. In the second layer, we adopt LSTM as the sequential learning algorithm to extract the temporal information between traffic flows at the high level. Before introducing the multimodal algorithm and the sequential algorithm, we first describe the data preprocessing module that splits the complex features of the traffic data.

A. DATA PREPROCESSING
The data preprocessing module is mainly designed to obtain the features at different levels in the traffic data. By monitoring the network flow, we obtain a traffic database that contains historical network behaviors. To represent a sequence of TCP packets from source to destination within a connection, a connection record is used to describe the packet sequence. It is expressed as $F = (f_1, f_2, \dots, f_n)$, where $f$ is one feature and $n$ is the number of features in each connection record. In fact, network behaviors are in essence dynamic and continuous interactive network data, which suggests that the records in traffic data contain multi-dimensional information about the network connection. Meanwhile, network information can also be divided into traffic-based and packet-based information [22]. Therefore, we do not use traffic records directly as input to detect attacks in this paper.
As shown in Figure 1, we split the features of each record into groups based on the nature of the features, such as packet-based, traffic-based, general-based, and so on. During this segmentation, the sequential relationship between records is maintained, and the number of records does not change. Afterwards, we obtain several feature groups for each record. Each group is still a vector, and the groups are expressed as $F_{groups} = \{F_1, F_2, \dots, F_m\}$, where $m$ is the number of feature groups.
FIGURE 1. Extraction features from the multi-features.
Compared with the simple approach of concatenating all the complex features into one long feature vector, the data processing in this paper divides the long feature vector into different feature groups and reduces the complexity of the features. Moreover, the splitting criterion is flexible and can follow the monitoring tools and the different views of the features, because the features produced by different network data monitoring tools are not uniform. For the CICIDS2017, UNSW-NB15, and NSL-KDD evaluation datasets used in this paper, we divided the features into 2, 3, and 3 groups, respectively.

B. MULTIMODAL FUSION MODEL
This module designs an extensible multimodal deep auto-encoder (MDAE) based on multimodal learning technology [24] to learn a joint representation of the multiple feature groups in intrusion detection. The design of the MDAE is based on the insight that the features in a traffic flow are heterogeneous and complementary. For example, the traffic feature group and the packet feature group both come from the same network connection, but they describe different aspects of the connection. Unlike conventional approaches with a single input channel, which cannot effectively process data with multiple feature groups at the same time, our MDAE model is well suited to processing and relating information from multiple feature groups.
The architecture of the MDAE network is illustrated in Figure 2. The multiple input channels in the input layer are

provided according to the number of feature groups, rather than being fixed at one or two. The MDAE model therefore accepts multiple feature vectors at the same time. Several individual Gaussian restricted Boltzmann machines (Gaussian RBMs) are adopted as the interpreters in the intermediate layer to learn the distribution of each input channel. The final layer is a joint network that fuses the multimodal information from these Gaussian RBM interpreters and produces the joint feature representation. Therefore, given traffic flow data with $m$ feature groups, $F_{groups} = \{F_1, F_2, \dots, F_m\}$, our goal is to learn the final consensus representation $F' = \{F_{joint}\}$.
FIGURE 2. The neural network architecture of MDAE and its construction process.
As shown in Figure 2, the training procedure of the MDAE model consists of two parts: forward encoding and back decoding. Forward encoding is responsible for fusing the multiple feature groups and calculating the initial joint representation values. Back decoding is responsible for fine-tuning the weight matrices according to the reconstruction error.
In the forward encoding part, because an RBM, with its undirected graphical structure, has only two layers (an input layer and a hidden layer), we tune the number of hidden neurons between 10 and 120 depending on the size of the input. After the Gaussian RBM interpreters for the intermediate layer are built, the hidden layers of those Gaussian RBMs are concatenated and used as the input of an upper RBM, which forms the joint network. More specifically, a Gaussian RBM has no connections between units in the same layer, so given the real-valued visible units $v$ and the binary hidden units $h$, the joint distribution $P(v, h)$ is easy to compute from an energy function as follows:
$$P(v, h) = \frac{\exp(-E(v, h))}{Z} \tag{1}$$
$$E(v, h) = \frac{1}{2\sigma^2} v^{\top} v - \frac{1}{\sigma^2}\left(c^{\top} v + b^{\top} h + h^{\top} W v\right) \tag{2}$$
where $Z$ is a normalization constant, $E(v, h)$ is the energy function, $\sigma$ is a hyper-parameter, $W$ is the weight matrix between the visible layer and the hidden layer, and $c$ and $b$ are the biases of the visible layer and the hidden layer, respectively. When we set $\sigma = 1$, the conditional probability distributions of the Gaussian RBM are also easy to compute:
$$P(h_i = 1 \mid v) = \mathrm{sigmoid}\big((Wv + b)_i\big) \tag{3}$$
$$P(v_i \mid h) = \mathcal{N}\big((W^{\top} h + c)_i,\, 1\big) \tag{4}$$
We adopt the contrastive divergence algorithm [26] to train the Gaussian RBM and obtain the RBM parameters $\theta = (W, b, c)$. The learning rule is:
$$\Delta W = E_{data}(h v^{\top}) - E_{model}(h v^{\top}) \tag{5}$$
where $E_{data}$ is the expectation observed in the training data and $E_{model}$ is the expectation observed in the data generated by the RBM model.
In the back decoding part, the stacked RBMs obtained from the forward encoding part are unfolded into a deep auto-encoder with multiple inputs and multiple outputs. The parameters of the encoder and decoder parts are initialized with the corresponding RBM weight matrices. The learning rate is set to 0.001, and the SGD back-propagation algorithm is used to tune the weights of the MDAE model. Finally, the joint representations of the different features are extracted. Compared with other multimodal deep learning methods that use Multiple Kernel Learning (MKL) [25] or CNN [22] as the interpreter, our MDAE uses restricted Boltzmann machines (RBMs) with a graph structure as interpreters, which can learn more interpretable fusion features from traffic data. The forward encoding and back decoding algorithms are described in Algorithm 1.

Algorithm 1 The Training Procedures of MDAE
Input: Different feature groups (F1, F2, ..., Fm)
Output: The joint representation Fjoint
Step 1: Forward encoding
1: for i = 1 to m do
2:   Train a Gaussian RBM for the current feature group Fi
3:   Save the Gaussian RBM as an individual interpreter
4: end for
5: Aggregate the hidden layers of all Gaussian RBMs into an intermediate layer
6: Use the intermediate layer as input to train the next RBM as the joint interpreter
7: Get the preliminary joint representation F'joint
Step 2: Back decoding
8: Unfold all RBMs symmetrically to form the MDAE
9: Update the MDAE according to the reconstruction error of each feature group

C. SEQUENTIAL LEARNING MODEL
In a real-time network environment, network connections do not exist independently but occur in sequence, and the number and content of the packets in each connection change over time. This indicates that the traffic data contains structured information, the most obvious of which is temporal information. Unlike the method in [14], which uses a Recurrent Neural Network (RNN) to learn the temporal features, this paper adopts an LSTM with one hidden layer to perform the temporal feature extraction in an automatic manner. LSTM is a variant of the RNN [26] and has been successfully applied in sequence modeling and natural language processing. In the LSTM model, the traffic data can be formalized as a sequence. Because the LSTM not only retains the loop structure of the RNN but also has a memory block that can capture long-term dependencies through gate control, the hidden layer $h_t$ is controlled by three main gates $(i, f, o)$, which are the input gate, the forget gate, and the output gate, as shown in Figure 3. Given the input traffic record $x_t$ at time $t$, the information flow of the traffic record in the LSTM is described as follows.
FIGURE 3. Information flow in LSTM cell.
The function of the forget gate is to selectively discard information in the cell state, which is the first step in the LSTM decision. The forget gate first receives the information flow from the previous hidden layer $h_{t-1}$ and the input layer $x_t$, and then uses the activation function $\sigma$ to determine how much information to forget. The proportion forgotten is between 0 and 1.
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + b_f) \tag{6}$$
The information flow of the input gate goes to the cell state $c_t$, and the gate is used to decide which new information to store. The update of the cell state contains two steps. First, the input gate selects the new information through the activation function $\sigma$. Second, the retained old information and the new information are combined into the update that is added to the cell state.
$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + b_i) \tag{7}$$
$$c_t = f_t \odot c_{t-1} + i_t \odot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \tag{8}$$
where $\odot$ denotes element-wise multiplication.
The function of the output gate is to determine the final state of the cell and the final output value. The output gate first determines which parts of the cell state to output. Then, the cell state is passed through a tanh layer and multiplied by the output of the sigmoid layer to give the final output value.
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + b_o) \tag{9}$$
$$h_t = o_t \odot \tanh(c_t) \tag{10}$$
Unlike KNN, RF, SVM, and other simple models, the structure of the LSTM cell allows the information of the network flow to be stored and the temporal correlation information to be captured more completely at the high level. More importantly, in modern networks traffic data is structured sequentially; it is usually a time series and should be modeled as a sequence.

D. MULTIMODAL REAL-TIME MODEL
We consider that network flow contains rich information, which can be further extracted and integrated by the multimodal fusion and sequential learning algorithms to improve the performance of attack recognition. However, on their own, the multimodal fusion algorithm can only extract the complex features at the low level and the sequential learning algorithm can only extract the external temporal features at the high level. Each can therefore use only the feature information at its own level and cannot fully express all the information in the traffic.
This paper develops an end-to-end intelligent multimodal real-time approach (MS-DHPN) that makes as much use as possible of the low-level and high-level information in the traffic. Considering the diversity of inputs across feature views, we designed a flexible multimodal deep auto-encoder (MDAE) to make MS-DHPN more usable. MS-DHPN is a hierarchical progressive network that runs from the MDAE to the LSTM network, and its structure is shown in Figure 4.
In our approach, the MDAE model with two layers of interpreters is constructed in advance, and a soft-max function is attached to the end of the LSTM to classify network traffic as normal or as one of the known attacks. Furthermore, to improve the classification performance of the classifier, a loss function is used as the criterion for measuring the error between the actual labels $y_t$ and the predicted labels $\hat{y}_t$. For binary classification we use the cross-entropy as the loss function:
$$L = -\sum_{t=1}^{T} \left[ y_t \log \hat{y}_t + (1 - y_t) \log(1 - \hat{y}_t) \right] \tag{11}$$
For multi-class classification we use the categorical cross-entropy as the loss function:
$$L = -\sum_{t=1}^{T} y_t \log \hat{y}_t \tag{12}$$
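The gate equations (6)-(10) and the two losses (11)-(12) can be illustrated with a small NumPy sketch. This is only an illustration under stated assumptions, not the released implementation. The helper names (`lstm_step`, `init_params`, `binary_cross_entropy`, `categorical_cross_entropy`), the toy sizes, and the random initialization are all hypothetical. The multi-class loss is written with the conventional leading minus sign, which is an assumption about the intended form of Eq. (12).

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step in the form of Eqs. (6)-(10); p maps names to weights/biases."""
    f_t = sigmoid(p["Wxf"] @ x_t + p["Whf"] @ h_prev + p["bf"])  # forget gate, Eq. (6)
    i_t = sigmoid(p["Wxi"] @ x_t + p["Whi"] @ h_prev + p["bi"])  # input gate, Eq. (7)
    g_t = np.tanh(p["Wxc"] @ x_t + p["Whc"] @ h_prev + p["bc"])  # candidate cell update
    c_t = f_t * c_prev + i_t * g_t                               # cell state, Eq. (8)
    o_t = sigmoid(p["Wxo"] @ x_t + p["Who"] @ h_prev + p["bo"])  # output gate, Eq. (9)
    h_t = o_t * np.tanh(c_t)                                     # hidden state, Eq. (10)
    return h_t, c_t


def init_params(n_in, n_hidden, rng):
    """Small random weights for the four blocks f, i, c, o (assumed initialization)."""
    p = {}
    for gate in "fico":
        p["Wx" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_in))
        p["Wh" + gate] = rng.normal(scale=0.1, size=(n_hidden, n_hidden))
        p["b" + gate] = np.zeros(n_hidden)
    return p


def binary_cross_entropy(y, y_hat, eps=1e-12):
    """Eq. (11): y holds 0/1 labels, y_hat the predicted attack probabilities."""
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    return -np.sum(y * np.log(y_hat) + (1.0 - y) * np.log(1.0 - y_hat))


def categorical_cross_entropy(y_onehot, y_hat, eps=1e-12):
    """Eq. (12) with a leading minus sign: one-hot labels, soft-max outputs."""
    return -np.sum(y_onehot * np.log(np.clip(y_hat, eps, 1.0)))


# Toy sequence of T records, each already reduced to an n_in-dimensional vector.
rng = np.random.default_rng(0)
T, n_in, n_hidden = 5, 8, 4
params = init_params(n_in, n_hidden, rng)
xs = rng.normal(size=(T, n_in))

h, c = np.zeros(n_hidden), np.zeros(n_hidden)
hidden_states = []
for x_t in xs:
    h, c = lstm_step(x_t, h, c, params)
    hidden_states.append(h)
hidden_states = np.stack(hidden_states)  # shape (T, n_hidden)
```

In a full model, each row of `hidden_states` would be passed through a soft-max layer, and the resulting probabilities would be compared with the labels using one of the two loss functions.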

FIGURE 4. Framework of the detection procedure of the proposed method.
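The lower stage of this framework (splitting each record into feature groups, encoding each group with its own interpreter, and fusing the hidden layers in a joint layer) can be sketched in NumPy as follows. This is a minimal sketch of the data flow only. The column grouping, the layer sizes, and the helper names `split_into_groups` and `encode_group` are hypothetical, and untrained random weights stand in for interpreters that would actually be trained with contrastive divergence and then fine-tuned.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def split_into_groups(records, group_columns):
    """Split an (n_records, n_features) array into one sub-array per feature view.

    The row order, i.e. the sequence of records, is preserved.
    """
    return [records[:, cols] for cols in group_columns]


def encode_group(F, W, b):
    """Hidden activation probabilities of one interpreter, in the form of Eq. (3)."""
    return sigmoid(F @ W.T + b)


rng = np.random.default_rng(0)
records = rng.normal(size=(6, 10))  # toy data: 6 connection records, 10 features

# Hypothetical grouping into three views; the real split depends on the dataset.
group_columns = [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
hidden_sizes = [5, 4, 4]  # assumed number of hidden units per interpreter
joint_size = 6            # assumed size of the joint representation

groups = split_into_groups(records, group_columns)

# One interpreter per input channel (random weights, for illustration only).
hidden = []
for F, n_hidden in zip(groups, hidden_sizes):
    W = rng.normal(scale=0.1, size=(n_hidden, F.shape[1]))
    b = np.zeros(n_hidden)
    hidden.append(encode_group(F, W, b))

# Concatenate the hidden layers and feed them to the joint layer.
intermediate = np.concatenate(hidden, axis=1)
W_joint = rng.normal(scale=0.1, size=(joint_size, intermediate.shape[1]))
F_joint = encode_group(intermediate, W_joint, np.zeros(joint_size))
```

Because the number of channels follows the length of `group_columns`, the same flow works for two groups or for three, which is the scalability property the text attributes to the MDAE.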
The training and testing process of the deep hierarchical network structure model is shown in Algorithm 2.

Algorithm 2 The Training and Testing Procedures for MS-DHPN
Input: Network connection flow $x_i = \{F_1, F_2, \ldots, F_m\}$, $(i = 1, 2, \ldots, T)$
Output: Flow category probabilities $y_i$
Step 1: Multimodal fusion model
  1: Generate the MDAE model according to Algorithm 2
  2: Use MDAE to integrate the internal multi-features: $x'_i = \{F_{fusion}\}$
Step 2: Sequential learning model
  3: for i from 1 to T do
  4:   Use LSTM to extract the external temporal feature: $x''_i = \{F_{fusion}^{temporal}\}$
  5: end for
Step 3: Hierarchical progressive network integrating MDAE and LSTM
  6: Add a soft-max layer to the end of LSTM to form MS-DHPN
  7: while training has not ended do
  8:   Get the output probability $\hat{y}_t$ from $x''_i = \{F_{fusion}^{temporal}\}$
  9:   Calculate the cross entropy L and update MS-DHPN
Step 4: Test
  10: Feed the test traffic flow to the MS-DHPN model
  11: Return the list of category probabilities

IV. EXPERIMENT
In this section, experiments in several stages are designed to evaluate the performance of the MS-DHPN model on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets. In the first stage, MDAE was used to learn the joint representation of multimodal features in traffic data. In the second stage, LSTM was used to learn temporal information between adjacent traffic records. In the final experiment, we used our MS-DHPN method to integrate the joint representation with real-time features to detect attacks. In each experiment, we separated the datasets into training and test datasets; the training datasets were used to train the intrusion detection model and the test datasets were used to evaluate it. For the experimental evaluations, each model was run three times on each dataset, and the average accuracy of the three runs is taken as the final result. We implemented our experiments in Python with the TensorFlow, Keras, and Scikit-learn libraries, and compared performance against popular attack detection methods. Before presenting our experimental results, we first introduce the data model, experiment setting, and evaluation metrics.

A. DATA MODEL
Nowadays, only a few datasets are publicly available for evaluating intrusion detection models. Additionally, most of the publicly available datasets are not maintained. Therefore, it is inaccurate to use only one dataset to evaluate a detection model. This paper used three different datasets released between 1999 and 2018 in the experiments; two of them were recently released and contain a variety of attacks from modern real networks, and the other is the classic NSL-KDD dataset. In particular, the UNSW-NB15 and CICIDS 2017 datasets have the characteristics of real-time network traffic. Table 1 shows the main information on the three datasets. In the experiments, we use the three datasets to train and test the MS-DHPN model. Therefore, we focus on how the features of instances are split into different groups.

1) NSL-KDD
The NSL-KDD dataset is an improved version of the KDD-CUP 99 dataset [27], [28]; both datasets were built by processing raw tcpdump data of the 1998 DARPA intrusion detection challenge dataset. Compared with KDD-CUP

99 data, the NSL-KDD dataset is primarily optimized for data quality and eliminates a large number of redundant records in the KDD-CUP 99 data. Although it has been pointed out that the NSL-KDD dataset lacks the types of modern network attacks [9], such as low-footprint attacks, it is still widely used by many researchers to measure the performance of intrusion detection methods because its test dataset contains new attacks. There are 41 features and 5 labels (Normal, Dos, Probe, R2L, and U2R), and the distribution statistics of NSL-KDD are shown in Table 1.

TABLE 1. The main information of distribution on NSL-KDD, UNSW-NB15, and CICIDS 2017 dataset.

TABLE 2. Feature groups in NSL-KDD dataset.
TABLE 3. Feature groups in UNSW-NB15 dataset.

In other studies, the 41 features in the NSL-KDD dataset are usually treated as a single instance and fed to a single-channel model for classification. In this paper, however, we consider each level of features as a modality and split the features into three groups based on their different sources. The detailed results are shown in Table 2, and the detailed description is as follows:
Basic features: the basic features describe the primary properties of an individual TCP connection, such as service and protocol type. These features are extracted from the packet header, UDP datagram, and TCP segment in the packet capture files of tcpdump.
Content features: the content features describe the data payload information of a packet, such as the number of login failures. Specifically, these features are extracted from the full payload of TCP/IP packets in order to detect attacks, such as U2R and R2L, which are embedded in the data payload instead of appearing as frequent sequence patterns in data records.
Traffic features: the traffic features describe statistical indicators about the current connection, including time-based and target host-based indicators. The time-based features are derived from traffic metrics in the last two seconds, such as the number of connections with the same target host and the same service. Similarly, target host-based features are derived from the 100 connection records with the same target host.
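The grouping described above can be sketched as a simple split of one flat record into per-group sub-vectors. The column boundaries below are an assumption based on the commonly used NSL-KDD column order (9 basic, 13 content, and 19 traffic features before any encoding); the extracted text does not reproduce the exact assignment in Table 2, and the names `NSL_KDD_GROUPS` and `split_views` are hypothetical.

```python
# Assumed 0-based column ranges in the standard 41-feature NSL-KDD order.
# These boundaries are illustrative; Table 2 is the authoritative grouping.
NSL_KDD_GROUPS = {
    "basic": range(0, 9),      # e.g. duration, protocol_type, service, flag
    "content": range(9, 22),   # e.g. number of failed logins
    "traffic": range(22, 41),  # time-based and target host-based statistics
}


def split_views(record, groups=NSL_KDD_GROUPS):
    """Split one flat feature record into one sub-vector per feature group."""
    return {name: [record[i] for i in idx] for name, idx in groups.items()}


# Placeholder record whose values are just the column indices 0..40.
record = list(range(41))
views = split_views(record)
```

Each entry of `views` can then be fed to its own input channel instead of passing the full 41-dimensional vector to a single channel.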
Meanwhile, as the features of each group come from different sources, their data types are usually different. The basic features are mainly nominal, the content features are binary, and the traffic features are numeric. Nominal features need to be converted to numerical features to meet the input requirements of the model; for example, the values ‘tcp’, ‘udp’ and ‘icmp’ of the ‘protocol_type’ feature are mapped to (0,0,1), (0,1,0) and (1,0,0). In addition, since the numerical distribution of each dimension lies in a different interval, we normalize the new feature list using min-max normalization to eliminate the influence of scale between the feature indicators.

2) UNSW-NB15
The UNSW-NB15 dataset [29] was generated in 2015 and contains the normal and attack behaviors of live modern network traffic. Unlike the NSL-KDD dataset, the raw traffic packets of the UNSW-NB15 dataset were generated in a hybrid way by the IXIA PerfectStorm tool: two servers of the IXIA traffic generator were used to generate real modern normal activities, and the other one was used to generate synthetic contemporary attack behaviors. Finally, the tcpdump tool was used to capture these network behaviors and form traffic records. The distribution of the UNSW-NB15 dataset is shown in Table 1.
Similar to the NSL-KDD dataset, the 43 features of the UNSW-NB15 dataset are split into three groups: basic, content, and traffic features (time and additional generated features). The detailed results are shown in Table 3. Basic features involve the attributes that represent protocol connections, such as service and state of protocol. Content features mainly describe the attributes of TCP/IP. Time features and additional generated features mainly describe the traffic information of a TCP connection, including the time attributes of packets and of the TCP protocol and the statistical indicators of the TCP connection, such as the arrival time between packets, the round-trip time of the TCP protocol, and matched statistical indicators under a certain window.
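The preprocessing described for nominal and numeric features (mapping nominal values to indicator vectors, then min-max normalization) can be illustrated with a short sketch. The `protocol_type` codes are the ones given in the text; the helper names and the handling of a constant column are assumptions made only for this example.

```python
# Indicator codes for 'protocol_type' exactly as listed in the text.
PROTOCOL_CODES = {"tcp": (0, 0, 1), "udp": (0, 1, 0), "icmp": (1, 0, 0)}


def encode_protocol(value):
    """Map a nominal protocol_type value to its indicator vector."""
    return PROTOCOL_CODES[value]


def min_max_normalize(column):
    """Rescale one numeric feature column to [0, 1]: (v - min) / (max - min)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # Assumption: a constant column carries no information, so map it to 0.
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```

For example, `min_max_normalize([2, 4, 6])` rescales the column to `[0.0, 0.5, 1.0]`, so features measured on very different scales become comparable.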

3) CICIDS2017
The CICIDS2017 dataset [30] was released in 2017 and is widely used in intrusion detection experiments. It is a reliable real-time dataset and contains the most common attacks from real attack scenarios. This dataset collects normal and attack traffic from Monday to Friday based on realistic background traffic generated in real time. On Monday, the normal traffic was generated using the B-Profile system, which can profile the abstract behaviors of human interactions. On the remaining four days, various types of attack traffic were generated alternately with normal traffic. Finally, the collected traffic flow was accurately labeled by professionals. Its attack labels include Web attack, Brute Force FTP, Brute Force SSH, DoS, DDoS, Infiltration, Bot and PortScan. The distribution of the CICIDS2017 dataset for network intrusion detection is shown in Table 1.
Unlike other datasets, the CICIDS2017 dataset includes not only the results of the network traffic analysis but also the timestamp, protocol, and source and destination ports. According to [22], we select 77 features in the CICIDS2017 dataset and split them into two groups, one of which is based on packet information and the other on traffic indicators. The detailed results are shown in Table 4.

TABLE 4. Feature groups in CICIDS 2017 dataset.

B. EXPERIMENT SETTING
According to the analysis of the NSL-KDD dataset, we find that its features mainly come from three sources: packet header, data payload, and statistical indicators of the connection. So, we split the features of NSL-KDD into three groups and set three input channels in the MDAE model for it. Similarly, we split the features of the UNSW-NB15 and CICIDS2017 datasets into three and two groups respectively, and set three and two input channels for them respectively.
For the MDAE model with two-layer interpreters, the number of neurons of each channel was tuned from 10 to 120 depending on the size of the input, and the number of neurons in the interpreter of the middle layer was also tuned depending on the number of neurons of the channel. For example, for the NSL-KDD dataset, we set three input channels in the MDAE model; the numbers of neurons of the three input channels are 90, 13, and 19 respectively, the numbers of neurons of the first layer of interpreters are 60, 10, and 15 respectively, and the number of neurons of the second layer of interpreters is 60. When the MDAE is fine-tuned, the learning rate is set to 0.001, and the weights of the MDAE are adjusted by the SGD back-propagation algorithm. For the LSTM model, we set the time step to 50 and reshape the joint representation of the MDAE as input to train it. The number of hidden layers of the LSTM is set to 1 and the number of neurons is tuned over [120, 90, 80, 70, 60, 50, 40, 30, 20] units. The learning rate is tuned from 0.001 to 0.1. All parameters of all models on the three datasets are shown in Table 5.

TABLE 5. The total parameters details of all models in three datasets.

C. EVALUATION METRICS
In our experiment, Accuracy, Precision, Recall, and F1-score are used as metrics to measure the performance of the proposed approach. Accuracy is used as the main performance indicator. Besides, the confusion matrix, which is widely used for classification models, is also used in our experiment. In the confusion matrix, TP is the number of attack records correctly classified as attack, TN is the number of normal records correctly classified as normal, FP is the number of normal records incorrectly classified as attack, and FN is the number of attack records incorrectly classified as normal. The performance metrics can be computed as follows:
Accuracy: the percentage of all records correctly classified in total records.

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN} \tag{13}$$

Precision: the percentage of the correctly identified attack records in all identified attack records.

$$Precision = \frac{TP}{TP + FP} \tag{14}$$

Recall: the percentage of the correctly identified attack records in all attack records. It is also called the true positive rate (TPR).

$$Recall = \frac{TP}{TP + FN} \tag{15}$$

F1-score: the harmonic mean of Precision and Recall.

$$F1 = \frac{2(Recall \times Precision)}{Recall + Precision} \tag{16}$$
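As a small worked illustration of (13) to (16), the sketch below computes the four metrics from the entries of a binary confusion matrix. The function name and the choice to return 0.0 when a denominator is zero are assumptions for this example, and the counts in the usage note are made up rather than taken from the experiments.

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute Accuracy, Precision, Recall and F1 from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    # Assumption: report 0.0 instead of dividing by zero when no record
    # was predicted as attack (precision) or no attack exists (recall).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * (recall * precision) / (recall + precision)
    else:
        f1 = 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

With the illustrative counts TP = 40, TN = 50, FP = 10, and FN = 0, this gives an accuracy of 0.9, a precision of 0.8, a recall of 1.0, and an F1-score of about 0.889.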
FIGURE 5. Loss values of MDAE and each feature groups. (a) NSL-KDD dataset. (b) UNSW-NB15 dataset. (c) CICIDS2017 dataset
FIGURE 6. Train accuracy of MDAE model, LSTM model, and MS-DHPN model on (a) NSL-KDD dataset, (b) UNSW-NB15 dataset, (c) CICIDS2017 dataset.
FIGURE 7. ROC curves of MDAE model, LSTM model, and MS-DHPN model on (a) NSL-KDD dataset, (b) UNSW-NB15 dataset, (c) CICIDS2017 dataset.
In addition, the Receiver Operating Characteristic (ROC) curve is also used to measure the performance of the proposed approach. The ROC curve plots the tradeoff between the TPR on the y axis and the false positive rate (FPR) on the x axis across different thresholds, and the Area Under the ROC Curve (AUC) is used as a comparison metric for machine learning models. FPR and AUC can be computed as follows:

$$FPR = \frac{FP}{TN + FP} \tag{17}$$

$$AUC = \int_{0}^{1} \frac{TP}{TP + FN} \, d\!\left(\frac{FP}{TN + FP}\right) \tag{18}$$

For these measures, higher values of Accuracy, Precision, Recall, F1-score, and AUC and a lower value of FPR indicate better model performance.

D. RESULTS AND ANALYSIS
To evaluate our model more comprehensively, we perform the binary and multiclass classification tasks on the MDAE model, the LSTM model, and the proposed model respectively. The loss values using MDAE for the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets are shown in Figure 5a, Figure 5b, and Figure 5c respectively. We can see that the loss value of each level feature decreases synchronously and eventually

converges to near zero. The training accuracy of the three models on the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets is shown in Figure 6a, Figure 6b, and Figure 6c respectively. On these datasets, the proposed method reached a training accuracy in the range of 95% to 99% and converged faster. This indicates that, compared with the MDAE and LSTM methods, the proposed method achieves the highest training accuracy in the shortest time. Moreover, the ROC curves for the NSL-KDD, UNSW-NB15, and CICIDS2017 datasets are shown in Figure 7a, Figure 7b, and Figure 7c respectively. In most cases, MS-DHPN performed well and its AUC value is higher. For MS-DHPN on the CICIDS2017 and UNSW-NB15 datasets, the FPR is close to 0 and the TPR is close to 1.
In addition, we compared the performance of our approach with the conventional approaches in [1], a recent work that reports the detailed performance of various machine learning and deep learning methods such as SVM, DNN, and Naive Bayes (NB). The average accuracy of these models on the three datasets is shown in Figure 8; among all the competitors, our MS-DHPN model clearly has the highest accuracy. Moreover, the detailed performance of these models in binary and multiclass classification is shown in Table 6 and Table 7 respectively. We used one-way analysis of variance (ANOVA) to determine statistical significance. The performance of MS-DHPN is significantly higher than that of the traditional methods (p < 0.01). For binary classification, the final categories on the three datasets NSL-KDD, UNSW-NB15, and CICIDS2017 are Normal and Attack, so this task mainly focuses on whether the model can identify an attack. According to the experimental results in Table 6, our model achieves the best accuracy and F-score among the six models. In particular, on the UNSW-NB15 and CICIDS2017 datasets, the accuracy of our model reaches 96.8% and 99.9% respectively, and the F-score is 0.971 and 0.999 respectively. Although the results of the MDAE model and the LSTM model are lower than those of the proposed method, they are still slightly better than those of the traditional methods. The experimental results show that all six models are able to identify attacks, but our method further improves intrusion detection performance, which makes it possible to detect more attacks when the volume of network data is massive.
From the multiclass classification results in Table 7, the accuracy of the classification algorithms declines compared with binary classification. This is because, in the multi-classification task, the final classifications on the three datasets comprise five, ten, and eight categories respectively instead of two. In this paper, multi-classification is mainly concerned with whether the model has good stability. The accuracy and F-score on the NSL-KDD dataset are the lowest among the three datasets, because attacks such as ‘U2R’ and ‘R2L’ are far fewer than ‘Dos’ and ‘Probe’ in the training set, and new types of attacks are included in the test set. Even so, our model is still superior to those using the SVM, NB, and DNN algorithms.

TABLE 6. Test results of MS-DHPN model and the other models in the binary classification.

TABLE 7. Test results of MS-DHPN model and the other models in the multi-classification.

FIGURE 8. Average performance comparison of all competitive methods in the binary and multi classification.

At the same time, we find that when the NB model obtains a higher recall on the CICIDS2017 dataset, its precision is very low. The SVM model shows the same behavior in binary classification. This indicates that these classifiers are unstable in a new environment. To further analyze the stability of our proposed model, we plot the accuracy of these models on the three datasets, based on the experimental results, in Figure 9. It can be observed that the accuracy of the traditional methods varies greatly as the dataset changes. In contrast, the accuracy of our model remains at a very good level. The same phenomenon can also be observed in binary classification. The experimental results show that our model generalizes well and is more suitable for detecting various modern attacks.

TABLE 8. Test accuracy of three feature scenarios in the binary and multi-classification on NSL-KDD dataset.

TABLE 9. Test accuracy of three feature scenarios in the binary and multi-classification on UNSW-NB15 dataset.

TABLE 10. Test accuracy of three feature scenarios in the binary and multi-classification on CICIDS2017 dataset.

FIGURE 9. The test accuracy trends of all competitive methods on the three different datasets from 2009 to 2017.

The experiments on both binary classification and multi-classification on the three datasets show that the MDAE model, the LSTM model, and our proposed model can all achieve excellent classification results. For CICIDS2017 and UNSW-NB15, which contain real-time network traffic, the LSTM model performs better than the MDAE model, but the opposite is true on the NSL-KDD dataset. This shows that the MDAE model can intelligently integrate the features of different levels within a traffic connection, and that the LSTM model can extract the temporal features between adjacent traffic connections. Our MS-DHPN model takes both considerations into account at the same time, and can therefore further improve the performance of a network intrusion detection system. At the same time, its performance on multiple datasets is very stable.
In order to further explore the complementarity and multimodality of network features, we studied the impact on detection accuracy of different feature inputs from the NSL-KDD, UNSW-NB15 and CICIDS datasets using our MDAE model. Specifically, we analyze the performance in three scenarios: single view, simple view, and multi-view. The single view approach uses only one level of features as input. The simple view approach uses the vector obtained by directly concatenating the features of all levels. The multi-view approach uses the features of the different levels as separate inputs. In fact, for the single view and simple view approaches, the MDAE model has just one interpreter, so it becomes an auto-encoder that uses RBM to initialize its parameters. This is a classical deep learning technique that is widely used in feature engineering. Table 8, Table 9, and Table 10 show the detailed experimental results on the three datasets, and Figure 10 presents the box plot of accuracy using different feature views in binary classification and multi-classification.
From the single view results in Table 8, Table 9, and Table 10, we found that the accuracy of the different single feature groups is almost at the same level. For the NSL-KDD dataset, for example, the accuracy of binary classification is in the range of 75% to 80% and the accuracy of multi-classification is in the range of 60% to 70%. However, the accuracy of the basic features is almost always higher than that of the content features on the NSL-KDD and UNSW-NB15 datasets. These results are similar to those in [31], although in their experimental results the accuracy of content features is much lower than that of basic features. This is because they used only part of the payload information in the packet, which leads to inaccurate results. By analyzing the experimental results and the actual meaning of these single view features, we can conclude that each single feature group is important for intrusion detection, but using only one level of features is not enough to achieve high detection accuracy.

FIGURE 10. Box plot of the accuracies with three feature scenarios in the binary classification and multi-classification.
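The difference between the three feature scenarios lies only in how the model inputs are assembled, which the following sketch makes concrete. The function `build_inputs`, its arguments, and the toy feature values are hypothetical; the sketch only shows the input construction and says nothing about the network that consumes the inputs.

```python
def build_inputs(views, scenario, single=None):
    """Assemble the model inputs for one record under a feature scenario.

    views: dict mapping a feature-group name to its feature vector.
    Returns a list with one vector per model input channel.
    """
    if scenario == "single":
        # Single view: only one level of features is used.
        return [list(views[single])]
    if scenario == "simple":
        # Simple view: all levels directly concatenated into one vector.
        return [[v for name in views for v in views[name]]]
    if scenario == "multi":
        # Multi-view: each level of features keeps its own input channel.
        return [list(views[name]) for name in views]
    raise ValueError("unknown scenario: " + str(scenario))


# Toy record with three feature groups (values are placeholders).
toy_views = {"basic": [1, 2], "content": [3], "traffic": [4, 5, 6]}
```

Under this sketch, the single view and simple view scenarios both produce one input channel, while the multi-view scenario produces one channel per feature group.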

TABLE 11. The time expenditure for training and testing models.

Further, from the simple view and multi-view results in Table 8, Table 9, Table 10, and Figure 10, we can see that the multi-view approach almost always gives the best results on the three datasets. Its average accuracies in binary classification and multi-classification are 87.03% and 81.82% respectively. Although the simple view method outperforms the single view, it is still about 2% to 5% lower than the multi-view approach, which indicates that it does not yet make effective use of the multiple network features and that it is very difficult to fuse the related information between features of different levels. In contrast, the multi-view approach can learn high-level representations between different modalities. Through the processing of multiple layers in a deep neural network, effective shared representations are extracted automatically. Moreover, the relations across the various modalities are deep instead of shallow. In order to illustrate the convergence performance of the MDAE model, we give the training and test times of the simple view and multi-view approaches on the UNSW-NB15 dataset. The parameters of the models were set to be the same, and the experimental results are shown in Table 11. From the results, we found that the simple view takes less time than our multi-view approach. This is because the structure of the simple view model is simple, whereas our multi-view model has a special structure that uses multiple deep learning techniques; while it can capture the relations across various modalities, its computational cost and training time increase at the same time. More importantly, in the test stage, the total time used by our multi-view model is only 25 seconds, only about 36% more than that of the simple view model, and improved hardware configuration and GPU acceleration can further reduce the time overhead in the future.
Moreover, to further investigate the enhancement effect of fusing single view features, we analyzed the confusion matrices of the simple view and multi-view approaches on the CICIDS2017 and UNSW-NB15 datasets, which reveal the strengths and weaknesses of each approach in detecting different modern network attacks. Figure 11 presents the confusion matrices. As these results indicate, both the simple view and multi-view approaches can significantly enhance attack detection performance. However, the multi-view approach used in this paper provides even better improvements than the simple view. Compared with the simple view on the UNSW-NB15 dataset, the multi-view approach improves the accuracies for ‘Fuzzers’, ‘Worms’, and ‘Shellcode’ by 22%, 52% and 36%, and those for the other classes by 1% up to 14%. On the CICIDS2017 dataset, although the accuracy of the simple view for ‘DDoS’ is 6% higher than that of the multimodal view, the latter is superior to the former in the other seven categories, especially in ‘bot’, for which the values are 0.03% and 0.43 respectively. So the experimental result reveals why our multi-view approach can enhance the performance of intrusion detection. Moreover, these analyses of the experimental results show that the multimodal view method can not only integrate multi-level network features more effectively, but can also be used to compare the importance of features, so as to further improve the accuracy of the model. This is crucial for an NIDS, because in an actual network, a slight improvement in performance can detect more attacks and reduce more potential threats.

FIGURE 11. The confusion matrices of the simple view and multi-view. (a) UNSW-NB15 dataset. (b) CICIDS2017 dataset.

V. CONCLUSION
Network traffic is a formal representation of complex network behavior and contains rich feature information, but intrusion detection based on a single consideration cannot make full use of the rich information in network traffic. In this paper, a multimodal-sequential approach with a deep hierarchical progressive network is proposed, and a novel view for considering the features of information security is given. In our approach, the special structure of the MDAE and LSTM maximizes the use of feature information at different levels and enables the model to achieve outstanding performance. To the best of our knowledge, this is the first time the two aspects have been considered together in the design of an NIDS, and it is also the first time that multimodal deep learning technology has been used for an information security problem. In the experimental part, we use three datasets from 1999 to 2017 to verify the performance of our method. The results show that, compared with other network intrusion detection methods based on a single consideration, our method improves the accuracy of intrusion detection by at least 2%. At the same time, we analyze and present the utilization of

features in three different feature views. This is something that has never been studied before. Our research shows that our intelligent approach based on the multi-view of features can significantly improve performance, which provides a new perspective for other researchers to understand the nature of network behavior.
In the future, we will design our own traffic collection system. Through the collection system, we will collect network data closer to the real world. We can then find more attacks to test our model, and further study data multimodality in the field of information security from a more primitive point of view to improve the accuracy of intrusion detection.

REFERENCES
[1] R. Vinayakumar, M. Alazab, K. Soman, P. Poornachandran, A. Al-Nemrat, and S. Venkatraman, “Deep learning approach for intelligent intrusion detection system,” IEEE Access, vol. 7, pp. 41525–41550, 2019.
[2] L. Li, Y. Yu, S. Bai, Y. Hou, and X. Chen, “An effective two-step intrusion detection approach based on binary classification and k-NN,” IEEE Access, vol. 6, pp. 12060–12073, 2018.
[3] M. H. Ali, B. A. D. Al Mohammed, A. Ismail, and M. F. Zolkipli, “A new intrusion detection system based on fast learning network and particle swarm optimization,” IEEE Access, vol. 6, pp. 20255–20261, Apr. 2018.
[4] A. Dainotti, F. Gargiulo, L. I. Kuncheva, A. Pescapè, and C. Sansone, “Identification of traffic flows hiding behind TCP Port 80,” in Proc. IEEE Int. Conf. Commun., May 2010, pp. 1–6.
[5] D. Papamartzivanos, F. G. Mármol, and G. Kambourakis, “Introducing deep learning self-adaptive misuse network intrusion detection systems,” IEEE Access, vol. 7, pp. 13546–13560, 2019.
[6] K.-H. Lee and Y. B. Park, “A study of environment-adaptive intrusion detection system,” in Advances in Computer Science and Ubiquitous Computing, J. J. Park, Y. Pan, G. Yi, and V. Loia, Eds. Singapore: Springer, 2017, pp. 625–630.
[7] G. Kim, S. Lee, and S. Kim, “A novel hybrid intrusion detection method integrating anomaly detection with misuse detection,” Expert Syst. Appl., vol. 41, no. 4, pp. 1690–1700, 2014.
[8] P. Mishra, V. Varadharajan, U. Tupakula, and E. S. Pilli, “A detailed investigation and analysis of using machine learning techniques for intrusion detection,” IEEE Commun. Surveys Tuts., vol. 21, no. 1, pp. 686–728, 1st Quart., 2019.
[9] N. Moustafa and J. Slay, “UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set),” in Proc. IEEE Mil. Commun. Inf. Syst. Conf. (MiLCIS), Nov. 2015, pp. 1–6.
[10] P. Gogoi, M. H. Bhuyan, D. Bhattacharyya, and J. K. Kalita, “Packet and flow based network intrusion dataset,” in Proc. 5th Int. Conf. Contemp. Comput., 2012, pp. 322–334.
[11] N. Brownlee, C. Mills, and G. Ruth, Traffic Flow Measurement: Architecture, document RFC 2722, 1999.
[12] A. Javaid, Q. Niyaz, W. Sun, and M. Alam, “A deep learning approach for network intrusion detection system,” in Proc. 9th EAI Int. Conf. Bio-Inspired Inf. Commun. Technol. (BICT), May 2016, pp. 21–26.
[13] N. Shone, T. N. Ngoc, V. D. Phai, and Q. Shi, “A deep learning approach to network intrusion detection,” IEEE Trans. Emerg. Topics Comput. Intell., vol. 2, no. 1, pp. 41–50, Feb. 2018.
[14] C. Yin, Y. Zhu, J. Fei, and X. He, “A deep learning approach for intrusion detection using recurrent neural networks,” IEEE Access, vol. 5, pp. 21954–21961, 2017.
[15] J. Kim, J. Kim, H. L. T. Thu, and H. Kim, “Long short term memory recurrent neural network classifier for intrusion detection,” in Proc. Int. Conf. Platform Technol. Service. (PlatCon), Feb. 2016, pp. 1–5.
[16] C.-F. Tsai, Y.-F. Hsu, C.-Y. Lin, and W.-Y. Lin, “Intrusion detection by machine learning: A review,” Expert Syst. Appl., vol. 36, no. 10, pp. 11994–12000, 2009.
[17] F. Kuang, W. Xu, and S. Zhang, “A novel hybrid KPCA and SVM with GA model for intrusion detection,” Appl. Soft Comput., vol. 18, pp. 178–184, May 2014.
[18] Y. Y. Aung and M. M. Min, “An analysis of random forest algorithm based network intrusion detection system,” in Proc. 18th IEEE/ACIS Int. Conf. Softw. Eng., Artif. Intell., Netw. Parallel/Distrib. Comput., Jun. 2017, pp. 127–132.
[19] N. Gao, L. Gao, Y. He, Q. Gao, and J. Ren, “An intrusion detection model based on deep belief networks,” in Proc. 2nd Int. Conf. Adv. Cloud Big, Nov. 2014, pp. 247–252.
[20] M. Lopez-Martin, B. Carro, A. Sanchez-Esguevillas, and J. Lloret, “Conditional variational autoencoder for prediction and feature recovery applied to intrusion detection in IoT,” Sensors, vol. 17, no. 9, p. 1967, 2017.
[21] J. Feng, Y. Fu, B. B. Gupta, F. Lou, S. Rho, F. Meng, and Z. Tian, “Deep learning based multi-channel intelligent attack detection for data security,” IEEE Trans. Sustain. Comput., to be published.
[22] J. Chen, Y.-T. Yang, K.-K. Hu, H.-B. Zheng, and Z. Wang, “DAD-MCNN: DDoS attack detection via multi-channel CNN,” in Proc. 11th Int. Conf. Mach. Learn. Comput. (ICMLC-ACM), 2019, pp. 484–488.
[23] W. Wang, Y. Sheng, J. Wang, X. Zeng, X. Ye, Y. Huang, and M. Zhu, “HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection,” IEEE Access, vol. 6, pp. 1792–1806, 2018.
[24] J. Ngiam, “Multimodal deep learning,” in Proc. Int. Conf. Mach. Learn., 2011, pp. 689–696.
[25] M. Guillaumin, J. Verbeek, and C. Schmid, “Multimodal semi-supervised learning for image classification,” in Proc. IEEE Conf. Comput. Vision. Pattern. Recog., Jun. 2010, pp. 902–909.
[26] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural Comput., vol. 9, no. 8, pp. 1735–1780, 1997.
[27] M. Tavallaee, E. Bagheri, W. Lu, and A.-A. Ghorbani, “A detailed analysis of the KDD CUP 99 data set,” in Proc. IEEE Symp. Comput. Intell. Secur. Defense Appl., Jul. 2009, pp. 1–6.
[28] KDD Cup 99. Accessed: Oct. 1999. [Online]. Available: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html
[29] N. Moustafa and J. Slay, “The evaluation of network anomaly detection systems: Statistical analysis of the UNSW-NB15 data set and the comparison with the KDD99 data set,” Inf. Secur. J., A Global Perspective, vol. 25, nos. 1–3, pp. 18–31, 2016.
[30] I. Sharafaldin, A. H. Lashkari, and A. A. Ghorbani, “Toward generating a new intrusion detection dataset and intrusion traffic characterization,” in Proc. ICISSP., Jan. 2018, pp. 108–116.
[31] Y. Zhang, X. Chen, L. Jin, X. Wang, and D. Guo, “Network intrusion detection: Based on deep hierarchical network and original flow data,” IEEE Access, vol. 7, pp. 37004–37016, 2019.

HAITAO HE was born in January 1968. She received the Ph.D. degree in mechanical design and manufacturing from Yanshan University, China. She is currently a Professor with the School of Information Science and Engineering, Yanshan University. Her research interests include data mining, network information security, and artificial intelligence.

XIAOBING SUN was born in 1994. He received the B.S. degree from the Hebei University of Architecture, China, in 2017. He is currently pursuing the M.S. degree with the College of Information Science and Engineering, Yanshan University, China. He is currently focusing on the project on intrusion detection and information security, which has been supported by the National Natural Science Foundation of China under Grant 61772449.

HONGDOU HE was born in 1991. He received the B.S. degree from the College of Information Science and Engineering, Yanshan University, China, in 2014, where he is currently pursuing the Ph.D. degree. He is currently focusing on a project on software security, and he is proficient in Java and Python. His research interests include data mining and machine learning. His research has been supported by the National Natural Science Foundation of China under Grant 61472341. He is a member of the ACM.

GUYU ZHAO was born in 1993. She received the B.S. degree from Hebei Normal University, China, in 2015. She is currently pursuing the Ph.D. degree in the College of Information Science and Engineering, Yanshan University, China. She is currently focusing on a project on data mining with air quality, which has been supported by the National Natural Science Foundation of China under Grant 61772451. She is also involved in research on data mining and machine learning. She is a member of the ACM.

LIGANG HE received the Ph.D. degree in computer science from the University of Warwick, U.K. He was a Postdoctoral Researcher with the University of Cambridge, U.K. Since 2006, he has been with the Department of Computer Science, University of Warwick, as an Assistant Professor and then as an Associate Professor, where he is currently a Reader. His research interests focus on parallel and distributed computing and big data processing.

JIADONG REN received the B.S. and M.S. degrees from the Northeast Heavy Machinery Institute, in 1989 and 1994, respectively, and the Ph.D. degree from the Harbin Institute of Technology, in 1999. He is currently a Professor with the School of Information Science and Engineering, Yanshan University, China. His research interests include data mining, complex networks, and software security. He is a Senior Member of the Chinese Computer Society and a member of the IEEE SMC Society and the ACM.
