
Hindawi

Computational Intelligence and Neuroscience


Volume 2022, Article ID 6591140, 17 pages
https://doi.org/10.1155/2022/6591140

Research Article
Network Intrusion Detection Method Based on FCWGAN
and BiLSTM

Zexuan Ma,1 Jin Li,1 Yafei Song,1 Xuan Wu,1 and Chen Chen1,2
1College of Air and Missile Defense, Air Force Engineering University, Xi'an 710051, China
2Xi'an Satellite Control Center, Xi'an 710043, China

Correspondence should be addressed to Yafei Song; yafei_song@163.com

Received 13 February 2022; Revised 11 March 2022; Accepted 15 March 2022; Published 13 April 2022

Academic Editor: Konstantinos Demertzis

Copyright © 2022 Zexuan Ma et al. This is an open access article distributed under the Creative Commons Attribution License,
which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Imbalanced datasets greatly affect the analysis capability of intrusion detection models, biasing their classification results toward normal behavior and leading to high false-positive and false-negative rates. To alleviate the impact of class imbalance on the detection accuracy of network intrusion detection models and improve their effectiveness, this paper proposes a method based on a feature selection-conditional Wasserstein generative adversarial network (FCWGAN) and a bidirectional long short-term memory network (BiLSTM). The method uses the XGBoost algorithm with Spearman's correlation coefficient to select the data features, filters out useless and redundant features, and simplifies the data structure. A conditional WGAN (CWGAN) is used to generate minority class samples, which are added to the original training set to supplement the dataset, and a BiLSTM is applied to complete the training of the model and realize the classification. In comparative tests on the NSL-KDD and UNSW-NB15 datasets, the accuracy of the proposed model reached 99.57% and 85.59%, respectively, which is 1.44% and 2.98% higher than that of the comparable CWGAN and deep neural network (CWGAN-DNN) model.

1. Introduction

The continuous development of computer and network technology has greatly improved people's lives, but with it come a variety of attacks and threats at the network level, making network security an unavoidable and urgent problem. As an effective method to detect and defend against network attacks, the intrusion detection system (IDS) has been widely used. It monitors network traffic in real time, classifies it as normal or malicious, and provides the information necessary to intrusion prevention systems. In recent years, machine learning and deep learning have been widely applied to intrusion detection. However, since real-life network traffic data are unbalanced and relatively little of the traffic has malicious attack attributes, the training sets of such methods are severely unbalanced. Hence, while existing network intrusion detection systems can determine with high accuracy whether there is an attack, their detection accuracy for the various sample classes is still low, especially for minority-class attacks, which results in the misclassification of such traffic as other traffic and a failure to meet performance analysis requirements. Therefore, it is important to solve the network data imbalance problem and improve the performance of model intrusion detection.

The class imbalance problem is commonly addressed by enhancing the model training effect through an increased number of samples in the datasets, and much research has been conducted on this basis. Yousefnezhad et al. [1] proposed a feature extraction ensemble classification method based on deep learning. First, a feature selection algorithm based on the ensemble margin is used to select the samples, and a deep learning method is used to extract the sample features. Finally, the outputs of multiple KNN and SVM classifiers are combined according to the Dempster–Shafer method. This use of ensemble learning can improve the detection rate for attack types to a great extent. At the same time, feature selection based on the ensemble margin can remove useless data from the original dataset, improve the overall detection accuracy, and shorten the training time to a certain extent.

However, the structure is complex, there are many classifiers, and the overall calculation cost is high. Moreover, this method uses KNN and SVM as the classifiers, and the overall classification accuracy of the model still has considerable room for improvement. Considering the complexity of dimensions and the low efficiency of traditional algorithms, a chaotic cuckoo optimization algorithm with Levy flight, a disruption operator, and opposition-based learning (CCOALFDO) was proposed by Kelidari and Hamidzadeh [2]. The algorithm combines Levy flight, the disruption operator, and opposition-based learning to select the optimal feature subspace for classification. Levy flight can deal with uncertainty and better update the cuckoo steps in high-dimensional space, while opposition-based learning and the disruption operator improve the search ability of the algorithm and ensure the diversity of the population. Combining these advantages greatly reduces the randomness of feature selection and avoids falling into a local optimum; at the same time, the elimination of some redundant features greatly improves the classification accuracy. However, the combination of multiple algorithms increases the overall computational complexity, which requires a higher computational cost, slows convergence, and lengthens computation time. Gonzalez-Cuautle et al. [3] proposed a resampling method that integrates the synthetic minority oversampling technique (SMOTE) and grid search algorithms to solve the problems of overfitting and low classification accuracy. This method improved the classification results on intrusion detection system (IDS) datasets by merging synthetically generated balanced data and tuning different supervised learning algorithms. SMOTE oversamples the data to increase the number of minority samples, while the grid search algorithm automatically optimizes the parameters, finds the parameters with the best detection effect, applies them to the model structure, and avoids falling into a local optimum, which ensures the optimality of the detection effect. However, SMOTE randomly synthesizes new samples from the original data according to the k-nearest-neighbor principle and does not learn the essence of the original data, so the quality of the generated samples is poor. At the same time, the grid search algorithm evaluates every parameter combination, which leads to an excessive calculation cost and calculation time, leaving large room for improvement. Lee and Park [4] proposed AE-CGAN-RF, a model that addresses the data imbalance problem by using an autoencoder to reduce the dimension of the network traffic and a conditional generative adversarial network (CGAN) to generate data samples, which are passed to a random forest (RF) to complete the intrusion detection classification. The model greatly improves the accuracy of minority class sample detection and reduces the data dimension, which reduces both the time required for training and the calculation cost. However, the use of RF as the classifier leads to low overall detection accuracy because of RF's weak classification ability. Lee and Park [5] also proposed a detection model that uses a generative adversarial network (GAN) to generate minority class attack samples and RF for classification. This method increases the minority samples of the CIC-IDS2017 dataset and improves the detection ability of the model for minority attack samples, so that the model achieves a better detection effect; at the same time, the structure is simple and detection is fast. However, only an ordinary GAN is used for sample generation, without considering the instability of the GAN, so there are hidden dangers in the sample generation process; furthermore, other datasets and models were not used to validate its feasibility, which makes the result less convincing. Liu et al. [6] proposed a GAN-FS method to address feature redundancy. The model can select dataset features based on feature variance, largely eliminating the impact of redundant and useless data on the detection effect and improving detection accuracy and speed, and it uses a GAN to generate samples, which increases the number of samples and enhances the training effect. Comparative experiments confirmed that the method can effectively improve model detection performance, but it does not consider the degrees of freedom of GAN training, so the generated data are unsupervised and uncontrollable; compared with a CGAN, it is less targeted. In addition, it selects features only according to feature variance, so the selection method is not comprehensive and has certain limitations. He [7] addressed the low accuracy of detection on class-imbalanced data and proposed a model using a conditional Wasserstein generative adversarial network (CWGAN) to generate minority class attack samples and a deep neural network (DNN) as the classifier for network intrusion detection, which improves the detection effect compared with a plain DNN. However, using only a DNN as the classifier to identify intrusion behavior still leaves a large gap in detection accuracy compared with other deep learning methods. At the same time, the high dimensionality of the data is not considered, and the use of such a network intrusion detection system in a large-scale network environment will be limited by time and space complexity because the data have high dimensionality and nonlinear characteristics. Therefore, dimensionality reduction of high-dimensional data is a key step toward improving detection speed and performance.

To solve the above problems, this paper combines feature selection with a CWGAN. Feature selection-based dimensionality reduction of high-dimensional data can filter out redundant and useless features, simplify the data structure, improve intrusion detection performance, and decrease training time. The CWGAN oversamples the minority class data to supplement the samples and balance the data distribution, thus improving detection performance. A bidirectional long short-term memory network (BiLSTM) is used to extract and classify the features from the time series. The loss function and optimization algorithm are analyzed to select the most suitable hyperparameters.

This paper makes the following contributions:

(1) We propose FCWGAN-BiLSTM, a network intrusion detection system based on FCWGAN and a BiLSTM network, to alleviate the impact of class imbalance on detection performance and improve the overall performance of a network intrusion detection model
(2) We use XGBoost and the Spearman correlation coefficient for feature selection to filter out redundant and useless data and simplify the feature structure, which reduces computational difficulty and improves detection accuracy
(3) We apply a CWGAN to generate minority class samples to supplement the dataset, enhance the model training effect, reduce the impact of class imbalance on the detection rate, and improve detection performance
(4) A BiLSTM network captures the long-term dependencies in network traffic data, extracts network traffic features based on time series, and effectively uses future-moment information to improve the model classification effect
(5) Model performance analysis experiments, model ablation experiments, and comparison experiments with different data augmentation algorithms and classification algorithms demonstrate the performance of the proposed model

The rest of this paper is organized as follows. Section 2 presents the background and related work. Section 3 presents the proposed model, Section 4 provides experimental results and analysis, and Section 5 presents the conclusions.

2. Background and Related Work

2.1. Feature Selection. Feature selection is a method of selecting the relevant features of a dataset by obtaining a subset of the original feature set based on specific criteria. Data dimensionality reduction is often applied to high-dimensional complex data [8]. Unlike feature extraction, feature selection preserves the physical meaning of the original features by retaining some of the data, and thus makes the model more readable and interpretable [9, 10]. In the field of intrusion detection, where datasets are characterized by a large volume of data and high dimensionality, feature selection reduces computational difficulty and eliminates data redundancy [11], thereby improving the detection rate of the model and reducing false positives. For example, a firefly algorithm was used for feature selection, and the selected features were passed through a classifier based on C4.5 and a Bayesian network (BN) to complete the classification for intrusion detection [12]. The method selected the important features of the KDD CUP 99 dataset and reduced the 41-dimensional features to 10 dimensions, which achieved better detection performance and reduced computation. However, the method suffers from a low discovery rate and slow solution speed, which leads to long calculation times. Le et al. [13] proposed SFSDT, a feature selection model that combines a hybrid sequential forward selection (SFS) algorithm with a decision tree (DT) model to select the best feature subset from the complete set of features in a dataset. The CF function in the SFS algorithm is adjusted, and the accuracy and error score of the DT model on each feature subset are generated by the SFS. SFSDT starts from an empty set and sequentially adds features to enhance the accuracy of the DT model until it is maximized on a validation dataset (feature subset). The algorithm reduces execution time and required memory and significantly improves detection performance. However, SFS can only add features, not remove them, and it tends to fall into local optima; thus, it requires a large number of experiments to obtain the best subset. Considering the above problems, we use XGBoost and the Spearman correlation coefficient for dataset feature selection.

2.1.1. XGBoost. Proposed by Chen in 2015, XGBoost (eXtreme Gradient Boosting) is a model framework based on the idea of the gradient boosting decision tree (GBDT) [14]. It has the advantages of high speed, high efficiency, and strong performance, and it has been widely used to solve classification and regression problems. The core idea is to generate a new tree by splitting the features in a dataset and then to add new trees: each new tree fits the residual of the last prediction to obtain a new function, improving performance through iteration. The traditional GBDT algorithm uses only first-order derivative information, while XGBoost uses a second-order Taylor expansion of the loss function and a regularization term to speed up training and prevent overfitting. We use this method to rank the importance of the features in the dataset [15].

2.1.2. Spearman Correlation Coefficient. We use the Spearman correlation coefficient to measure the correlation between features. Proposed by Spearman in 1904, it measures the strength of the relationship between two variables [16], and it takes values in the range [-1, 1]. The Spearman correlation coefficient between the variables x_i and y_i is calculated as

\rho = \frac{\sum_{i} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i} (x_i - \bar{x})^2 \sum_{i} (y_i - \bar{y})^2}},   (1)

where x_i (i = 1, 2, ..., n) and y_i (i = 1, 2, ..., n) are the elements of the vectors X and Y, respectively. A value of \rho close to ±1 indicates a strong association, in which case one of the two features can be filtered out; a value close to 0 indicates that there is no association between them, and both should be retained.
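As a concrete illustration of how (1) can drive the filtering step, the following minimal sketch computes pairwise Spearman coefficients with pandas and drops one feature from each strongly correlated pair. It is not the authors' code: the DataFrame X, the 0.9 threshold, and the tie-breaking rule (keep the feature seen first) are illustrative assumptions.

    import pandas as pd

    def spearman_filter(X: pd.DataFrame, threshold: float = 0.9) -> list:
        """Keep one feature from every pair whose |rho| exceeds the threshold."""
        corr = X.corr(method="spearman").abs()   # |rho| for every feature pair, per (1)
        dropped = set()
        cols = list(corr.columns)
        for i, a in enumerate(cols):
            for b in cols[i + 1:]:
                # |rho| near 1 means the two features are redundant: drop one of them
                if a not in dropped and b not in dropped and corr.loc[a, b] > threshold:
                    dropped.add(b)
        return [c for c in cols if c not in dropped]

In practice this filter would be applied together with the XGBoost importance ranking of Section 2.1.1, so that of two correlated features the less important one is discarded.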

2.2. CWGAN. A GAN is a deep learning model inspired by the two-person zero-sum game of game theory and is used to simulate the complex high-dimensional distributions of real-world data. It consists of a generator (G) and a discriminator (D) [17], which are both neural networks. The generator captures the potential distribution of the real data samples and generates new data samples. The discriminator is a binary classifier used to determine whether an input sample is real or generated data. The classification results are passed back to the generator and discriminator through updates of the weighted loss. The two networks are trained until the discriminator can no longer distinguish between real and generated samples [18]. The optimization process is a minimax game whose goal is to reach a Nash equilibrium at which the generator network can estimate the distribution of the data samples [19]. The objective function of the generative adversarial network is

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z)))],   (2)

where p_{data} denotes the distribution of the real samples, p_z the noise distribution, G(z) maps the noise z to the data space, and D(x) is the probability that sample x is real data. To distinguish between real and generated data, D(x) should be as large as possible and D(G(z)) as small as possible.

The CGAN is based on a GAN, where category information y is merged with the noise and the original data as the input to the generator and discriminator [20], with loss function

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim p_{data}(x)}[\log D(x|y)] + \mathbb{E}_{z \sim p_z(z)}[\log(1 - D(G(z|y)))],   (3)

where y represents the category information and the other parameters are the same as in (2).

A GAN differs from ordinary oversampling in that it generates new samples by learning the potential distribution of the original data and passing random noise into the generator. By training the generator and discriminator, the generated samples resemble the original sample distribution with high confidence. GANs are used to generate samples for minority classes and to expand datasets. For example, the SIGMA method [21] generates new samples to enhance the ability of IDSs to resist new types of attacks, combining a GAN with hybrid local search and genetic algorithms to iteratively generate new samples and retrain the machine learning-based intrusion detection system until the detection rate converges. AEGAN [22] is a hybrid model consisting of adversarial environment reinforcement learning (AE-RL) and a CGAN, which is trained on a network intrusion detection dataset to generate synthetic samples that address class imbalance problems. The above methods can improve the performance of network intrusion detection systems, but none considers the vanishing gradient problem that can occur during the training of GANs.

GANs and CGANs can generate samples and reduce class imbalance problems. However, their use of the Jensen–Shannon divergence requires overlap between the distributions of real and generated samples, which is nonexistent or negligible when the discriminator is trained to be optimal, and this can lead to mode collapse and vanishing gradient problems [23].

To solve the above problems, we introduce the Lipschitz limit and the Wasserstein distance into the CGAN to realize a CWGAN for the dataset samples, with the workflow shown in Figure 1.

We fix the discriminator, input the noise vector and labels to the generator, and train it to simulate the real data distribution. We then use the discriminator to judge the real and generated samples. If it cannot distinguish between them, we fix the generator and train the discriminator; if it can, we fix the discriminator and train the generator. We repeat these steps until the loss of the discriminator stabilizes at about 0.5, at which time we generate attack samples and add them to the training set.

Through the above method, the model can generate data of a specified pattern to supplement the dataset while effectively avoiding the vanishing gradients caused by the failure of the discriminator to converge during training. The objective function of the CWGAN is

V(D, G) = \max_D \{\mathbb{E}_{x \sim p_{data}}[D(x|y)] - \mathbb{E}_{\tilde{x} \sim p_g}[D(\tilde{x}|y)] - \lambda \mathbb{E}_{\hat{x} \sim p_{penalty}}[(\|\nabla_{\hat{x}} D(\hat{x}|y)\| - 1)^2]\},   (4)

where \lambda is a manually set coefficient, \|\nabla_{\hat{x}} D(\hat{x}|y)\| is the norm of the gradient of D with respect to \hat{x}, and \hat{x} \sim p_{penalty} samples points from the middle of the lines connecting points of p_r and p_g.

2.3. BiLSTM. A traditional neural network model focuses only on the processing of the current moment, while a recurrent neural network (RNN) can use information processed at the current moment at the next moment [24]. Considering the problems of vanishing and exploding gradients during the training of an RNN, Hochreiter et al. proposed the long short-term memory network (LSTM) [25], which adds a gate mechanism and a memory unit on the basis of the RNN to effectively solve these problems and better handle longer-distance dependencies [26]. An LSTM has input, forget, and output gates, as shown in Figure 2.

The LSTM structure is described as

f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f),
i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i),
\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C),
C_t = f_t \cdot C_{t-1} + i_t \cdot \tilde{C}_t,   (5)
o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o),
h_t = o_t \cdot \tanh(C_t),

where f_t is the forget gate; i_t is the input gate; \tilde{C}_t and C_t are the current input and unit state, respectively; \sigma is the sigmoid function; W_f, W_i, W_o, and W_C are the weight matrices of the forget gate, input gate, output gate, and current input unit state, respectively; [h_{t-1}, x_t] denotes the concatenation of the two vectors; and b_f, b_i, b_o, and b_C are the bias terms of the forget gate, input gate, output gate, and current input unit state, respectively. These parameters change continuously during training.
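To make the gradient penalty term of (4) concrete, the sketch below computes the critic loss for one batch in TensorFlow (the framework named in Section 4.1). It is a minimal sketch rather than the authors' implementation: the callable D(x, y), the penalty weight LAMBDA = 10, and the uniform interpolation weights are assumptions commonly used for WGAN-GP-style training.

    import tensorflow as tf

    LAMBDA = 10.0  # assumed value of the penalty weight lambda in (4)

    def critic_loss(D, real_x, fake_x, y):
        """Wasserstein critic loss with gradient penalty, following (4)."""
        # Wasserstein term: E[D(x|y)] - E[D(G(z)|y)]
        w_dist = tf.reduce_mean(D(real_x, y)) - tf.reduce_mean(D(fake_x, y))
        # x_hat ~ p_penalty: points on the lines between real and generated samples
        eps = tf.random.uniform([tf.shape(real_x)[0], 1], 0.0, 1.0)
        x_hat = eps * real_x + (1.0 - eps) * fake_x
        with tf.GradientTape() as tape:
            tape.watch(x_hat)
            d_hat = D(x_hat, y)
        grad = tape.gradient(d_hat, x_hat)
        penalty = tf.reduce_mean((tf.norm(grad, axis=1) - 1.0) ** 2)
        # The critic maximizes w_dist - LAMBDA * penalty; return the value to minimize
        return -(w_dist - LAMBDA * penalty)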

[Figure 1: CWGAN workflow diagram.]

[Figure 2: LSTM structure diagram.]

Considering the distinct temporal characteristics of network traffic data, the use of RNN-like approaches to deal with network intrusion problems has unique advantages. For example, in [27], a deep learning-based intrusion detection system, DL-IDS, uses a hybrid network of convolutional neural networks (CNNs) and LSTM to extract the spatiotemporal characteristics of network traffic data, thus providing a better intrusion detection system. However, it does not consider that a unidirectional LSTM can only read sequence data from one direction and cannot account for the influence of subsequent information on the detection results. Thus, BiLSTM was used instead of LSTM to process the incoming data [28].

BiLSTM combines a forward and a backward LSTM to learn from forward and backward time-series data. The hidden layer contains two units with the same input that are connected to the same output, where one processes the forward time series and the other the backward time series. This increases the time-series information involved in training and learns the features better, thus providing higher accuracy for longer time series. The BiLSTM process is shown in Figure 3.

[Figure 3: BiLSTM process diagram.]

The process is

\overrightarrow{C}_t = \mathrm{LSTM}(x_t, \overrightarrow{h}_{t-1}, \overrightarrow{C}_{t-1}),
\overleftarrow{C}_t = \mathrm{LSTM}(x_t, \overleftarrow{h}_{t-1}, \overleftarrow{C}_{t-1}),   (6)
C_t = W_T \overrightarrow{C}_t + W_V \overleftarrow{C}_t,

where the LSTM function represents the nonlinear transformation of the input feature, which is encoded into the corresponding LSTM hidden state (see (5)), and W_T and W_V are the weight coefficients corresponding to the forward and backward unit states, respectively.
3. Network Intrusion Detection Method Based on FCWGAN and BiLSTM

We propose a network intrusion detection method based on FCWGAN and BiLSTM. XGBoost is used in the feature selection stage to rank the importance of the features in the dataset, whose relevance is then analyzed with the Spearman correlation coefficient. Features with strong relevance and low importance are filtered out to simplify the feature structure. The selected features are passed into the CWGAN together with the labels, and minority class samples for the training set are generated in a controlled manner. The generated samples are passed into the BiLSTM together with the original training data for training, and the model is validated on a test set. The intrusion detection process includes stages of data preprocessing, feature selection, sample generation, feature extraction and training, and testing, as shown in Figure 4.

3.1. Data Preprocessing. Tag encoding was used to convert the string-type features in the NSL-KDD and UNSW-NB15 datasets to numeric type. The datasets were checked for null values, and, finding none, the data were normalized by min-max scaling,

x' = \frac{x - M_{min}}{M_{max} - M_{min}},   (7)

where M_{min} and M_{max} are the minimum and maximum values, respectively, of the dimension.
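A minimal preprocessing sketch consistent with Section 3.1, assuming pandas and scikit-learn are available; the file name, the target column name 'label', and the use of LabelEncoder for the tag encoding are illustrative assumptions rather than the authors' exact pipeline.

    import pandas as pd
    from sklearn.preprocessing import LabelEncoder

    df = pd.read_csv("KDDTrain+.csv")                    # assumed NSL-KDD training file
    assert not df.isnull().values.any()                  # proceed only if no null values

    # Tag encoding: convert string-type features (e.g., protocol_type, service, flag)
    for col in df.select_dtypes(include="object").columns:
        df[col] = LabelEncoder().fit_transform(df[col])

    # Min-max normalization per feature dimension, as in (7);
    # constant columns (M_max == M_min) would need special handling
    features = df.columns.drop("label")                  # 'label' is an assumed column name
    mn, mx = df[features].min(), df[features].max()
    df[features] = (df[features] - mn) / (mx - mn)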

[Figure 4: Schematic diagram of the model structure based on FCWGAN and BiLSTM (datasets → data preprocessing → feature selection → training/testing sets → sample generation → BiLSTM → softmax → normal/network intrusion types).]

3.2. Feature Selection. In the feature selection stage, we used XGBoost to rank the feature importance and Spearman's correlation coefficient to analyze the feature relevance. Irrelevant and redundant features were filtered out, and the important features were retained to improve detection speed and enhance the detection results.

XGBoost obtains a new function by fitting the residuals of the last prediction of the model and iterates to improve model performance [29]. Unlike the traditional GBDT algorithm, which uses only first-order derivative information, the XGBoost algorithm performs a second-order Taylor expansion on the loss function and adds a regularization term to improve the model training speed and prevent overfitting. The target loss function of the XGBoost algorithm is

\mathrm{Obj} = \sum_{i=1}^{n} l(y_i, \hat{y}_i) + \sum_{k=1}^{K} \Omega(f_k), \qquad \Omega(f_k) = \gamma T + \frac{1}{2} \lambda \|\omega\|^2,   (8)

where l(y_i, \hat{y}_i) is the loss function, which represents the difference between the predicted value \hat{y}_i and the true value y_i, and \Omega(f_k) aims to prevent overfitting; here T is the number of leaf nodes, \omega denotes the leaf weights, the term \gamma T reduces the number of leaf nodes in the tree, \gamma is the penalty coefficient, \lambda \|\omega\|^2 is the regularization term, and \lambda is the regularization coefficient.

XGBoost requires several iterations to continuously generate trees [30]. Assuming that the t-th iteration produces the tree f_t, the objective function of the t-th iteration is

\mathrm{Obj}^{(t)} = \sum_{i=1}^{n} l(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)) + \Omega(f_t),   (9)

where \Omega(f_t) is a function to prevent overfitting.

We can evaluate the reasonableness of a decision tree structure p based on the structure loss

\mathrm{Obj}^{(t)}(p) = -\frac{1}{2} \sum_{j=1}^{T} \frac{(\sum_{i \in I_j} g_i)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T,   (10)

where g_i and h_i are the first- and second-order derivatives of the loss function with respect to the predicted values after iteration t-1, I_j = \{i \mid p(x_i) = j\} is the index set of leaf node j, and a smaller structural loss indicates a better decision tree structure.

If the tree splits at node j into a left branch I_L and a right branch I_R, the structure gain of the split is

\mathrm{Obj}_s = \mathrm{Obj}(p_{before}) - \mathrm{Obj}(p_{after}) = \frac{1}{2}\left[\frac{(\sum_{i \in I_L} g_i)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{(\sum_{i \in I_R} g_i)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{(\sum_{i \in I_j} g_i)^2}{\sum_{i \in I_j} h_i + \lambda}\right] - \gamma,   (11)

where \gamma is the split coefficient, which can reduce the complexity of the model and prevent overfitting. This split gain is used to judge the quality of a split node.

The importance of the features is ranked according to formula (11), and the Spearman correlation coefficient is used to analyze the feature correlation. The two are combined to eliminate irrelevant and redundant features, filter out the key features, and pass them to the GAN for minority class sample generation.
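In outline, the ranking step can be reproduced with the xgboost package; max_depth = 12 and gamma = 0 follow Table 2, while n_estimators and the use of the scikit-learn wrapper are illustrative assumptions rather than the authors' exact configuration.

    import xgboost as xgb

    def rank_features_by_total_gain(X_train, y_train):
        """Fit XGBoost and rank features by their total split gain, cf. (11)."""
        model = xgb.XGBClassifier(max_depth=12, gamma=0, n_estimators=100)
        model.fit(X_train, y_train)
        # 'total_gain' accumulates the gain (11) over every split that uses a feature
        scores = model.get_booster().get_score(importance_type="total_gain")
        return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)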

3.3. Sample Generation. In the sample generation stage, the CWGAN was trained using noise and the data samples that had undergone feature selection and preprocessing [31], as shown in Table 1.

In the process of training the CWGAN, the generator and discriminator were trained in turn, as follows:
(1) The discriminator is fixed, and the generator is trained to simulate the distribution of the real data
(2) The generator is fixed, and the discriminator is trained until it cannot distinguish whether samples come from the real dataset or the generator
(3) The discriminator is fixed, and the generator is trained until the discriminator can no longer distinguish the samples after successive training
(4) Steps 1–3 are repeated until the loss value of the discriminator reaches 0.5
(5) The generator is used to generate attack samples, which are added to the training set to complete sample generation

Table 1: CWGAN training algorithm.

Algorithm 1: minority class sample generation based on CWGAN.
Input: s = (z, y), where z is noise data and y is the class label
Output: s_G = [G(z, y'), y']
(1) while the loss of D does not approach 0.5 do            /* CWGAN training */
(2)   for t = 1, ..., n do                                  /* optimize discriminator */
(3)     sample {(x_i, y_i)}_{i=1}^{n_z} from p_data(x, y)
(4)     sample {z_i}_{i=1}^{n_z} from p_z(z)
(5)     \eta_{\theta_D} \leftarrow \nabla_{\theta_D}[(1/n_z) \sum_{i=1}^{n_z} (D(x_i, y_i) - D(G(z_i, y'_i), y'_i) - \lambda \mathbb{E}_{(\hat{x}, y) \sim p_{penalty}}[(\|\nabla_{\hat{x}} D(\hat{x}, y)\| - 1)^2])]
(6)     \theta_D \leftarrow \theta_D + \alpha_D \cdot \mathrm{Adam}(\theta_D, \eta_{\theta_D})
(7)   end
(8)   sample {z_i}_{i=1}^{n_z} from p_z(z)                  /* optimize generator */
(9)   \eta_{\theta_G} \leftarrow \nabla_{\theta_G}[(1/n_z) \sum_{i=1}^{n_z} D(G(z_i, y'_i), y'_i)]
(10)  \theta_G \leftarrow \theta_G - \alpha_G \cdot \mathrm{Adam}(\theta_G, \eta_{\theta_G})
(11) end
(12) return s_G                                             /* generate samples */

Here \theta_G, \eta_{\theta_G}, \theta_D, and \eta_{\theta_D} denote the network parameters and gradients of the generator and discriminator, respectively.
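A condensed sketch of the alternating schedule of Algorithm 1 in TensorFlow. The models G and D, their input signatures, and the choice of n_critic = 5 discriminator updates per generator update are assumptions for illustration; critic_loss is the sketch given after (4), and the learning rate follows Table 2.

    import tensorflow as tf

    g_opt = tf.keras.optimizers.Adam(1e-4)   # alpha_G, learning rate per Table 2
    d_opt = tf.keras.optimizers.Adam(1e-4)   # alpha_D

    def train_step(G, D, real_x, real_y, noise_dim=32, n_critic=5):
        """One outer iteration: n_critic critic updates, then one generator update."""
        batch = tf.shape(real_x)[0]
        for _ in range(n_critic):                        # steps (2)-(7)
            z = tf.random.normal([batch, noise_dim])
            with tf.GradientTape() as tape:
                fake_x = G([z, real_y], training=True)
                d_loss = critic_loss(D, real_x, fake_x, real_y)
            grads = tape.gradient(d_loss, D.trainable_variables)
            d_opt.apply_gradients(zip(grads, D.trainable_variables))
        z = tf.random.normal([batch, noise_dim])         # steps (8)-(10)
        with tf.GradientTape() as tape:
            g_loss = -tf.reduce_mean(D(G([z, real_y], training=True), real_y))
        grads = tape.gradient(g_loss, G.trainable_variables)
        g_opt.apply_gradients(zip(grads, G.trainable_variables))

The outer loop repeats this step until the discriminator loss settles near 0.5, after which the trained generator produces the minority class attack samples that are merged into the training set.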

3.4. Feature Extraction and Training. In the feature extraction stage, a BiLSTM layer learned the long-term temporal features in the dataset, Nadam optimization was applied to the neural network [32], a dropout layer alleviated overfitting, and a softmax classifier was used for the network attack classification.

3.5. Testing. The trained model was used to classify the test set and obtain the predicted type. To ensure credible test results, the model was tested by k-fold cross-validation. The softmax function,

\sigma(x)_j = \frac{e^{x_j}}{\sum_{k=1}^{K} e^{x_k}}, \quad j = 1, \ldots, K,   (12)

was used to calculate the probability of each classification, which was compared with the original labels.
4. Experiment and Result Analysis

4.1. Experimental Settings. The performance of the network intrusion detection method based on FCWGAN and BiLSTM was evaluated according to the following experiments:

Experiment 1: model performance analysis
Experiment 2: model noise robustness
Experiment 3: model ablation
Experiment 4: comparison of data enhancement algorithms
Experiment 5: comparison of classification algorithms
Experiment 6: comparison of intrusion detection models

The experimental environment was a 64-bit Windows 10 operating system with the TensorFlow learning framework, an AMD Ryzen 9 5900HX with Radeon Graphics at 3.30 GHz, and 32 GB RAM.

A Bayesian optimization algorithm was used for the automatic optimization of the model parameters, whose settings are shown in Table 2.

Table 2: Model parameter settings.

Parameter | Setting
XGBoost maximum depth | 12
XGBoost gamma value | 0
CWGAN learning rate | 0.0001
CWGAN training iterations | 200
Noise dimension | 32
Batch size | 1024
Loss function | Categorical cross-entropy
Optimizer | Nadam
Optimizer learning rate | 0.001
BiLSTM cell count | 64/128
Dropout rate | 0.5

The categorical cross-entropy loss function is

\mathrm{loss} = -\frac{1}{N} \sum_{i=1}^{N} [y_i \log \hat{y}_i + (1 - y_i) \log(1 - \hat{y}_i)].   (13)
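For concreteness, a sketch of a classifier configuration consistent with Table 2 and (13), using Keras. The cell counts (64/128), the dropout rate (0.5), the Nadam learning rate (0.001), and the loss follow the table; the exact layer arrangement, stacking order, and input shape are assumptions, not the authors' published architecture.

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_classifier(time_steps, n_features, n_classes):
        model = tf.keras.Sequential([
            layers.Input(shape=(time_steps, n_features)),
            layers.Bidirectional(layers.LSTM(64, return_sequences=True)),  # 64 cells
            layers.Bidirectional(layers.LSTM(128)),                        # 128 cells
            layers.Dropout(0.5),                                           # per Table 2
            layers.Dense(n_classes, activation="softmax"),                 # cf. (12)
        ])
        model.compile(optimizer=tf.keras.optimizers.Nadam(learning_rate=0.001),
                      loss="categorical_crossentropy",                     # (13)
                      metrics=["accuracy"])
        return model

    # model.fit(..., batch_size=1024) would then train with the Table 2 batch size.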

4.2. Dataset and Experimental Evaluation Criteria. The proposed model was evaluated on the NSL-KDD and UNSW-NB15 datasets.

The NSL-KDD dataset was obtained by Tavallaee et al. in 2009 by eliminating the duplicate instances in the KDD99 dataset, enabling a more objective reflection of a model's detection accuracy [33]. It includes the DoS, Probe, R2L, and U2R attack types and has 41 attributes, but the data are extremely unbalanced: it has far fewer attack instances than normal instances, with only 995 R2L attacks and 52 U2R attacks.

The UNSW-NB15 dataset was created by the Cyber Range Lab of the Australian Cyber Security Centre and includes attack types beyond those of NSL-KDD, i.e., Fuzzers, Analysis, Backdoor, DoS, Exploits, Generic, Reconnaissance, Shellcode, and Worms. Similarly, there are far fewer attack instances than normal instances.

The distributions of the training set types are shown in Figure 5 for NSL-KDD and in Figure 6 for UNSW-NB15.

The comparative experiments used classification accuracy, precision, recall, and F1-score to judge the classification effectiveness of the models. The classification confusion matrix is defined in Table 3.

Table 3: Definition of the classification confusion matrix.

                | Predicted normal | Predicted abnormal
Actual normal   | TP               | FN
Actual abnormal | FP               | TN

The four evaluation criteria are as follows:

\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad \mathrm{recall} = \frac{TP}{TP + FN}, \quad \mathrm{precision} = \frac{TP}{TP + FP}, \quad \mathrm{F1\text{-}score} = 2 \times \frac{\mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}.   (14)
4.3.1. Model Performance Analysis Experiment. To verify the minority class samples generated by the model largely al-
effectiveness of the proposed model at network intrusion leviate the impact of the class imbalance problem, thus
detection, we set up performance analysis experiments on improving the overall detection effect.
network intrusion detection methods based on FCWGAN
and BiLSTM.
FCWGAN was used to select the features of the training 4.3.2. Model Noise Robustness Experiment. In recent years,
set samples of the NSL-KDD and UNSW-NB15 datasets, the network environment has become more and more
filter out redundant and useless samples, and simplify the complex. In addition to a large number of redundant and
data structure. The feature importance was judged using useless data, there are also noise data in the network data,
XGBoost, and the feature importance scores were obtained which will lead to the low robustness of the intrusion de-
as shown in Figures 7 and 8. tection system [34]. In order to verify the robustness of the
The feature importance score in Figures 7 and 8 selects model proposed in this paper to noise, this section sets up a
the total splitting gain, which can better reflect the im- noise robustness experiment for network intrusion detection
portance of variables to the model. methods based on FCWGAN and BiLSTM.
From Figure 7, one can see that among the features of Different levels of Gaussian white noise are added to
NSL-KDD datasets, the “dst_host_srv_count” is the most NSL-KDD and UNSW-NB15 datasets, which obey N
important and the “su_attempted” is the lowest; Similarly, it (0, 0.02), N (0, 0.04), N (0, 0.06) and N (0, 0.08), respectively.
can be seen from Figure 8 that among the features of UNSW- The detection accuracy of the model under the influence of
NB15 datasets, the “dur” is the most important and the different noise levels is shown in Table 7.
“ct_ftp_cmd” is the lowest. At the same time, it can be seen From Table 7, it shows that the accuracy of the two
that in the above two datasets, the importance of different datasets decreases to a certain extent with the increase of the
features varies greatly, and the importance of individual noise level. However, the range of change did not exceed
features is close to 0, which has little influence on the dis- 1.5%. This shows that the model proposed in this paper has
crimination of sample types. Therefore, these useless features strong robustness and stability to the interference of noise,
with low importance can be screened out to simplify the and a small amount of noise data cannot have a significant
feature structure. impact on the performance of the model. At the same time,
The feature correlations were analyzed using the according to the conclusion of 3.3.1, different levels of
Spearman correlation coefficient; the correlation between Gaussian white noise are added to several features with
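The perturbation used in this experiment can be sketched as follows; drawing the noise with NumPy and applying it to already normalized test features are assumptions about the setup, while the four variance levels follow the text.

    import numpy as np

    def add_gaussian_white_noise(X: np.ndarray, variance: float) -> np.ndarray:
        """Perturb features with white noise drawn from N(0, variance)."""
        return X + np.random.normal(loc=0.0, scale=np.sqrt(variance), size=X.shape)

    # Noise levels tested: N(0, 0.02), N(0, 0.04), N(0, 0.06), N(0, 0.08);
    # X_test is assumed to hold the preprocessed test features
    noisy_versions = {v: add_gaussian_white_noise(X_test, v) for v in (0.02, 0.04, 0.06, 0.08)}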

[Figure 5: Distribution of NSL-KDD training set types. Attack data: DoS 45,927 (78.33%), Probe 11,656 (19.88%), R2L 995 (1.7%), U2R 52 (0.09%); the remainder is normal data.]

[Figure 6: Distribution of UNSW-NB15 training set types. Attack data: Generic 40,000 (33.52%), Exploits 33,393 (27.98%), Fuzzers 18,184 (15.24%), DoS 12,264 (10.28%), Reconnaissance 10,491 (8.79%), Analysis 2,000 (1.67%), Backdoor 1,746 (1.46%), Shellcode 1,133 (0.95%), Worms 130 (0.11%); the remainder is normal data.]

4.3.3. Model Ablation Experiment. We set up model ablation experiments to verify the ability of the proposed feature selection and CWGAN to improve the detection effect of the model on minority samples.

Under the same experimental conditions, BiLSTM, GAN-BiLSTM, CWGAN-BiLSTM, and the model in this paper were compared on the NSL-KDD dataset. The detection rates of each model for the various types in the NSL-KDD dataset were evaluated and are displayed in Table 8.

Table 8 shows that the feature selection and the proposed CWGAN played a relatively significant role in improving the detection rate for minority class samples. The reason is that real-world data contain many irrelevant, redundant, and noisy features, whose removal through feature selection can greatly reduce storage and computational costs while simplifying the data structure and improving the detection results. The proposed feature selection method was used to directly select a subset of relevant features for the model, eliminating useless and redundant features and improving the test effect from the original

dataset level. The CWGAN achieved the controlled generation of minority samples by adding category information and the Wasserstein distance while avoiding the vanishing gradient, and combining the generated samples with the original training set increased the number of minority samples. Ultimately, this enhanced the training effect of the model. Therefore, we combined the two to process the dataset and improve the test effect of the model.

[Figure 7: Feature importance map of NSL-KDD.]

[Figure 8: Feature importance map of UNSW-NB15.]

[Figure 9: Feature correlation diagram of NSL-KDD.]

4.3.4. Comparative Experiments with Different Data Enhancement Algorithms. We set up comparison experiments to verify the superiority of the FCWGAN data enhancement algorithm for network intrusion detection.

Under the same experimental conditions, ROS, ADASYN, SMOTE, WGAN, and the proposed FCWGAN method were used for data enhancement on the NSL-KDD and UNSW-NB15 datasets, respectively, using BiLSTM as the classifier, with the test results shown in Tables 9 and 10.

From Tables 9 and 10, it can be seen that the proposed FCWGAN-BiLSTM achieved the best test results in terms of accuracy, precision, recall, and F1-score, so overall, FCWGAN was better for data enhancement. The time in the tables is the training time of a single epoch; the training time of the proposed model is lower than that of the other methods, indicating that its calculation speed is the fastest and its calculation cost the smallest. This is because ROS only performs a simple resampling of the original data, while ADASYN and SMOTE randomly synthesize new data from the original data based on the k-nearest-neighbor principle, and none of them learns the nature of the original data. In contrast, FCWGAN, which is based on deep learning, can acquire the potential distribution of the original data, randomly connect the data points with class labels, and pass them to the generator to generate new minority samples. Compared with WGAN, FCWGAN adds feature selection and simplifies the data structure, which reduces the calculation cost and accelerates the calculation. At the same time, a gradient penalty term solves the vanishing gradient problem during training, so that FCWGAN can generate minority class samples of higher quality that are more similar to the original samples.

4.3.5. Comparative Experiments with Different Classification Algorithms. We performed comparison experiments to verify that BiLSTM could achieve better results for the classification of network intrusions.

Under the same experimental conditions, the dataset was processed using FCWGAN and then trained on RF, DNN, LSTM, and BiLSTM. The results of the different algorithms on network intrusion behavior were evaluated and are shown in Tables 11 and 12.

From Tables 11 and 12, it can be seen that the proposed FCWGAN-BiLSTM achieved the best results in terms of accuracy, precision, recall, and F1-score. Moreover, BiLSTM

Figure 10: Feature correlation diagram of UNSW-NB15 (heatmap of the pairwise correlations among the 42 candidate features, with the color scale running from 0 to 1).

Table 4: Feature selection results.

Dataset     Selected features                                                                 Number
NSL-KDD     duration, protocol_type, service, dst_host_srv_count, src_bytes, dst_host_count,
            dst_bytes, count, dst_host_same_src_port_rate, dst_host_srv_diff_host_rate,
            srv_count, dst_host_rerror_rate, dst_host_serror_rate, diff_srv_rate,
            srv_diff_host_rate, hot, serror_rate, rerror_rate, num_compromised, num_root      20
UNSW-NB15   dur, sload, ct_srv_src, sbytes, stcpb, ct_src_ltm, tcprtt, ct_srv_dst,
            ct_dst_src_ltm, ct_dst_ltm, djit, sjit, dload, smean, ct_src_dport_ltm, dmean,
            service, proto, response_body_len                                                 19
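As a rough illustration of this two-stage selection, the sketch below ranks features with XGBoost and then prunes Spearman-redundant ones. The function name select_features, the top_k value, and the 0.9 correlation threshold are our own illustrative assumptions; the paper does not restate its exact thresholds here.

import numpy as np
from scipy.stats import spearmanr
from xgboost import XGBClassifier

def select_features(X, y, top_k=30, corr_thresh=0.9):
    """X: pandas DataFrame of scaled features, y: integer class labels."""
    # Stage 1: rank features by XGBoost importance and keep the top_k.
    model = XGBClassifier(n_estimators=100, max_depth=6).fit(X, y)
    order = np.argsort(model.feature_importances_)[::-1]
    ranked = list(X.columns[order][:top_k])

    # Stage 2: walk the ranking and drop any feature whose Spearman
    # correlation with an already-kept (more important) feature is high.
    kept = []
    for col in ranked:
        rhos = [abs(spearmanr(X[col], X[k])[0]) for k in kept]
        if not rhos or max(rhos) < corr_thresh:
            kept.append(col)
    return kept

A filter of this kind would yield reduced feature sets of the sort listed in Table 4, keeping the most informative member of each highly correlated group.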

Table 5: Distribution of NSL-KDD dataset before and after sample generation.

Class    Before sample generation   After sample generation
Normal   67343                      67343
DoS      45927                      45927
Probe    11656                      11656
R2L      995                        5995
U2R      52                         5052
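Table 5 (and Table 6 below) shows which minority classes are topped up with generated records; for example, U2R grows from 52 to 5052 samples. The following PyTorch sketch illustrates how class-conditional records could be drawn from such a generator once trained; the layer sizes, the class index 4 for U2R, and the helper generate_minority are illustrative assumptions, not the authors' released code.

import torch
import torch.nn as nn

class Generator(nn.Module):
    def __init__(self, latent_dim=100, n_classes=5, n_features=20):
        super().__init__()
        # Embedding the class label and concatenating it with the noise
        # vector is what makes the generator class-conditional.
        self.label_emb = nn.Embedding(n_classes, n_classes)
        self.net = nn.Sequential(
            nn.Linear(latent_dim + n_classes, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, 128),
            nn.LeakyReLU(0.2),
            nn.Linear(128, n_features),
            nn.Sigmoid(),  # features assumed min-max scaled to [0, 1]
        )

    def forward(self, z, labels):
        return self.net(torch.cat([z, self.label_emb(labels)], dim=1))

def generate_minority(generator, class_id, n_samples, latent_dim=100):
    """Draw n_samples synthetic records for one minority class."""
    z = torch.randn(n_samples, latent_dim)
    labels = torch.full((n_samples,), class_id, dtype=torch.long)
    with torch.no_grad():
        return generator(z, labels)

gen = Generator()  # would first be trained against a Wasserstein critic
u2r_fake = generate_minority(gen, class_id=4, n_samples=5000)

In the full method, the generator is trained adversarially before sampling, and the synthetic rows are then appended to the original training set, which gives the "after sample generation" counts in the tables.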

Moreover, BiLSTM has advantages in the network intrusion detection problem. The reason is that network traffic data have obvious time-series characteristics, and LSTM and BiLSTM have strong time-series processing capabilities that allow deeper feature extraction from long-term time-series data, so this type of method can achieve good results in network intrusion detection. However, LSTM can only read sequence data in one direction and cannot account for the influence of subsequent information on the detection results; thus, BiLSTM is used to process the incoming data and improve the detection effect. On the other hand, the training time of a single epoch of this model is higher than that of the other detection methods, because BiLSTM is more complex than the other classification algorithms.
Table 6: Distribution of UNSW-NB15 dataset before and after sample generation.

Class           Before sample generation   After sample generation
Normal          56000                      56000
Generic         40000                      40000
Exploits        33393                      33393
Fuzzers         18184                      18184
DoS             12264                      12264
Reconnaissance  10491                      10491
Analysis        2000                       7000
Backdoor        1746                       6746
Shellcode       1133                       6133
Worms           130                        5130

Figure 11: NSL-KDD accuracy curve (training and validation accuracy and loss over 200 epochs).
Figure 12: UNSW-NB15 accuracy curve (training and validation accuracy and loss over 200 epochs).
Figure 13: NSL-KDD class detection rate curve (per-class detection rate over 200 epochs).
Figure 14: UNSW-NB15 class detection rate curve (per-class detection rate over 200 epochs).

4.3.6. Comparative Experiment with Existing Intrusion Detection Models. Performance comparison experiments were conducted to further verify the comprehensive performance of the proposed network intrusion detection algorithm based on FCWGAN and BiLSTM.
Under the same experimental conditions, models with superior detection results in the literature, such as CNN-BiLSTM [35], SSAE-LSTM [36], CWGAN-DNN, and AE-CGAN-RF, were applied to the NSL-KDD and UNSW-NB15 datasets in accordance with their published descriptions and parameter settings, with results as shown in Tables 13 and 14.
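For reference, the four evaluation metrics reported in Tables 9-14 can be computed for the multiclass case as in the sketch below; macro averaging is our assumption, as the averaging mode is not restated in this section.

from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report(y_true, y_pred):
    """Return the four metrics used in Tables 9-14 (macro-averaged)."""
    acc = accuracy_score(y_true, y_pred)
    prec, rec, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro", zero_division=0)
    return {"accuracy": acc, "precision": prec, "recall": rec, "f1": f1}

print(report([0, 1, 2, 2, 1], [0, 1, 2, 1, 1]))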

Table 7: Detection accuracy (%) under the influence of different noise levels.

Dataset      Noise level: 0   0.02           0.04           0.06           0.08
NSL-KDD      99.57 ± 0.21     99.55 ± 0.22   99.45 ± 0.22   98.88 ± 0.24   98.27 ± 0.25
UNSW-NB15    85.59 ± 0.27     85.53 ± 0.29   85.28 ± 0.30   84.71 ± 0.31   84.15 ± 0.33
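The exact noise model behind Table 7 is not spelled out in this section; one plausible reading, sketched below under that assumption, adds zero-mean Gaussian noise with standard deviation equal to the listed noise level to the min-max-scaled test features.

import numpy as np

def add_feature_noise(X, level, seed=0):
    """Add zero-mean Gaussian noise with std = level, then re-clip to [0, 1]."""
    rng = np.random.default_rng(seed)
    return np.clip(X + rng.normal(0.0, level, size=X.shape), 0.0, 1.0)

X_test = np.random.rand(100, 20)        # stand-in for the scaled test features
for level in (0.0, 0.02, 0.04, 0.06, 0.08):
    X_noisy = add_feature_noise(X_test, level)
    # accuracy at this level would come from model.evaluate(X_noisy, y_test)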

Table 8: Ablation experiment detection rates (%) for each sample type.

Algorithm             Normal         DoS            Probe          U2R            R2L
BiLSTM                94.65 ± 0.21   88.24 ± 0.19   72.91 ± 0.23   46.81 ± 0.35   51.97 ± 0.30
GAN-BiLSTM            95.31 ± 0.25   92.18 ± 0.22   81.27 ± 0.30   60.33 ± 0.42   65.10 ± 0.37
CWGAN-BiLSTM          98.54 ± 0.19   94.60 ± 0.15   85.15 ± 0.23   70.20 ± 0.26   72.13 ± 0.25
Model in this paper   99.68 ± 0.14   96.01 ± 0.11   90.12 ± 0.15   76.35 ± 0.27   80.26 ± 0.19

Table 9: Comparison of data enhancement algorithms on the NSL-KDD dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
ROS-BiLSTM            89.18 ± 0.35   90.34 ± 0.40   88.61 ± 0.35   89.46 ± 0.37   4
ADASYN-BiLSTM         92.95 ± 0.24   93.12 ± 0.27   92.61 ± 0.21   92.86 ± 0.25   5
SMOTE-BiLSTM          93.66 ± 0.28   94.63 ± 0.34   93.14 ± 0.26   93.88 ± 0.30   3
WGAN-BiLSTM           96.56 ± 0.23   96.71 ± 0.28   95.65 ± 0.21   96.20 ± 0.26   7
Model in this paper   99.57 ± 0.21   99.55 ± 0.20   99.47 ± 0.17   99.51 ± 0.18   2

Table 10: Comparison of data enhancement algorithms on the UNSW-NB15 dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
ROS-BiLSTM            81.70 ± 0.43   79.32 ± 0.47   80.49 ± 0.41   79.90 ± 0.44   6
ADASYN-BiLSTM         83.65 ± 0.37   84.11 ± 0.40   82.14 ± 0.35   83.12 ± 0.37   6
SMOTE-BiLSTM          83.66 ± 0.31   84.28 ± 0.34   81.24 ± 0.27   82.73 ± 0.30   5
WGAN-BiLSTM           81.49 ± 0.30   84.71 ± 0.24   82.51 ± 0.28   83.60 ± 0.26   8
Model in this paper   85.59 ± 0.27   86.11 ± 0.21   85.57 ± 0.24   85.84 ± 0.22   4
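To make the contrast between the resampling baselines in Tables 9 and 10 and the learned generator concrete, the following is a minimal sketch of the k-nearest-neighbor interpolation that SMOTE-style oversamplers perform; smote_like and its parameter values are our own illustration, not the imbalanced-learn implementation.

import numpy as np
from sklearn.neighbors import NearestNeighbors

def smote_like(X_min, n_new, k=5, seed=0):
    """Synthesize n_new minority samples by k-NN interpolation."""
    rng = np.random.default_rng(seed)
    knn = NearestNeighbors(n_neighbors=k + 1).fit(X_min)
    _, idx = knn.kneighbors(X_min)             # idx[:, 0] is each point itself
    base = rng.integers(0, len(X_min), n_new)  # pick random seed samples
    neigh = idx[base, rng.integers(1, k + 1, n_new)]  # one of its k neighbors
    lam = rng.random((n_new, 1))
    # Each synthetic point lies on the segment between a real sample and
    # one of its neighbors; nothing about the distribution is learned.
    return X_min[base] + lam * (X_min[neigh] - X_min[base])

X_min = np.random.rand(200, 20)      # stand-in for scaled minority samples
X_new = smote_like(X_min, n_new=1000)

Because every synthetic point lies on a segment between two existing minority samples, this kind of augmentation never leaves the convex hull of the observed data, which is why it cannot capture the underlying distribution the way FCWGAN does.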

Table 11: Comparison of classification algorithms on the NSL-KDD dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
FCWGAN-RF             91.29 ± 0.27   90.24 ± 0.29   89.11 ± 0.21   89.67 ± 0.24   1
FCWGAN-DNN            95.11 ± 0.23   96.01 ± 0.22   94.98 ± 0.17   95.00 ± 0.19   2
FCWGAN-LSTM           98.29 ± 0.23   98.37 ± 0.21   98.14 ± 0.15   98.25 ± 0.18   2
Model in this paper   99.57 ± 0.21   99.55 ± 0.20   99.47 ± 0.17   99.51 ± 0.18   2

Table 12: Comparison of classification algorithms on the UNSW-NB15 dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
FCWGAN-RF             81.00 ± 0.37   81.94 ± 0.33   80.97 ± 0.31   81.45 ± 0.32   1
FCWGAN-DNN            83.44 ± 0.31   84.12 ± 0.33   83.40 ± 0.27   83.76 ± 0.30   2
FCWGAN-LSTM           84.98 ± 0.30   85.44 ± 0.29   84.67 ± 0.25   85.05 ± 0.28   3
Model in this paper   85.59 ± 0.27   86.11 ± 0.21   85.57 ± 0.24   85.84 ± 0.22   4

Table 13: Comparison of multiclassification results on the NSL-KDD dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
CNN-BiLSTM            99.22 ± 0.31   99.18 ± 0.29   99.14 ± 0.24   99.15 ± 0.26   6
SSAE-LSTM             97.63 ± 0.34   97.91 ± 0.33   97.21 ± 0.28   97.56 ± 0.30   4
CWGAN-DNN             98.13 ± 0.26   99.03 ± 0.30   97.91 ± 0.25   98.46 ± 0.27   8
AE-CGAN-RF            98.53 ± 0.27   98.67 ± 0.28   98.31 ± 0.23   98.49 ± 0.25   7
Model in this paper   99.57 ± 0.21   99.55 ± 0.20   99.47 ± 0.17   99.51 ± 0.18   2

Table 14: Comparison of multiclassification results on the UNSW-NB15 dataset.

Algorithm             Accuracy       Precision      Recall         F1-score       Time (s)
CNN-BiLSTM            82.08 ± 0.43   82.68 ± 0.43   80.00 ± 0.37   81.32 ± 0.40   10
SSAE-LSTM             82.31 ± 0.45   83.65 ± 0.44   81.94 ± 0.36   82.78 ± 0.41   7
CWGAN-DNN             82.61 ± 0.37   82.95 ± 0.41   82.11 ± 0.33   82.53 ± 0.38   14
AE-CGAN-RF            81.24 ± 0.39   83.47 ± 0.40   80.31 ± 0.35   81.86 ± 0.38   13
Model in this paper   85.59 ± 0.27   86.11 ± 0.21   85.57 ± 0.24   85.84 ± 0.22   4

From Tables 13 and 14, it can be seen that the proposed model achieved the best detection results on all metrics. Compared with CNN-BiLSTM and SSAE-LSTM, the proposed model uses FCWGAN to simplify the data features and reduce the dataset dimensionality, which lowers the computational cost, while generating minority class samples to supplement the dataset, which alleviates the impact of class imbalance; thus, it obtains better detection results. Compared with CWGAN-DNN and AE-CGAN-RF, the proposed model avoids the curse of dimensionality and simplifies the data structure, while using BiLSTM for feature extraction and classification, which extracts deeper and more comprehensive features at the time-series level; thus, it obtains better multiclassification results.

5. Conclusion

To alleviate the impact of class imbalance on the accuracy of network intrusion detection models and improve their effectiveness at detecting network intrusion attacks, we proposed a network intrusion detection method based on FCWGAN and BiLSTM. The method uses XGBoost and Spearman's correlation coefficient to process the dataset, which effectively filters out redundant and useless features, simplifies the data structure, reduces the computational cost and training time, and avoids the curse of dimensionality. Minority class samples are generated using CWGANs to supplement the dataset and alleviate class imbalance, and BiLSTM is used to extract the time-series features of the data and complete the classification of network intrusions. Extensive experiments on the NSL-KDD and UNSW-NB15 datasets demonstrated that the model greatly improves the detection of minority class samples, has strong feature extraction capability, high detection accuracy, and a low false-positive rate when processing large-scale network data, and shows promise for real-time intrusion detection systems. However, the accuracy of this model on the UNSW-NB15 dataset shows that there is room for improvement. Future work will focus on this deficiency, and we will investigate the construction of feature extraction and classification models to find ways to improve detection accuracy.

Data Availability

All data used in this paper can be obtained by contacting the authors of this study.

Ethical Approval

This article does not contain any studies with human or animal subjects performed by any of the authors.

Consent

Informed consent was obtained from all individual participants included in the study.

Conflicts of Interest

The authors declare that they have no conflicts of interest.

Acknowledgments

This work was supported by the National Natural Science Foundation of China (nos. 61703426, 61806219, and 61876189), the Youth Talent Promotion Plan of the Shaanxi University Association for Science and Technology (no. 20190108), and the Shaanxi Innovation Capability Support Plan (no. 2020KJXX-065).

References

[1] M. Yousefnezhad, J. Hamidzadeh, and M. Aliannejadi, "Ensemble classification for intrusion detection via feature extraction based on deep learning," Soft Computing, vol. 25, no. 20, Article ID 12667, 2021.

[2] M. Kelidari and J. Hamidzadeh, "Feature selection by using chaotic cuckoo optimization algorithm with levy flight, opposition-based learning and disruption operator," Soft Computing, vol. 25, no. 4, pp. 2911–2933, 2020.
[3] D. Gonzalez-Cuautle, A. Hernandez-Suarez, G. Sanchez-Perez et al., "Synthetic minority oversampling technique for optimizing classification tasks in botnet and intrusion-detection-system datasets," Applied Sciences, vol. 10, no. 3, p. 794, 2020.
[4] J. Lee and K. Park, "AE-CGAN model based high performance network intrusion detection system," Applied Sciences, vol. 9, no. 20, p. 4221, 2019.
[5] J. Lee and K. Park, "GAN-based imbalanced data intrusion detection system," Personal and Ubiquitous Computing, vol. 25, no. 1, pp. 121–128, 2019.
[6] X. Liu, T. Li, R. Zhang, D. Wu, Y. Liu, and Z. Yang, "A GAN and feature selection-based oversampling technique for intrusion detection," Security and Communication Networks, vol. 2021, Article ID 9947059, 15 pages, 2021.
[7] J. He, "CWGAN-DNN: a method of conditional Wasserstein generative adversarial network for network intrusion detection," Journal of Air Force Engineering University (Natural Science Edition), vol. 22, no. 5, pp. 67–74, 2021.
[8] A. Thakkar and R. Lohiya, "Attack classification using feature selection techniques: a comparative study," Journal of Ambient Intelligence and Humanized Computing, vol. 12, 2021.
[9] M. Di Mauro, G. Galatro, G. Fortino, and A. Liotta, "Supervised feature selection techniques in network intrusion detection: a critical review," Engineering Applications of Artificial Intelligence, vol. 101, Article ID 104216, 2021.
[10] S. M. Z. Kashani and J. Hamidzadeh, "Feature selection by using privacy-preserving of recommendation systems based on collaborative filtering and mutual trust in social networks," Soft Computing, vol. 24, no. 15, Article ID 11425, 2019.
[11] X. Li, P. Yi, W. Wei, Y. Jiang, and L. Tian, "Lnnls-KH: a feature selection method for network intrusion detection," Security and Communication Networks, vol. 2021, Article ID 8830431, 22 pages, 2021.
[12] B. Selvakumar and K. Muneeswaran, "Firefly algorithm based feature selection for network intrusion detection," Computers & Security, vol. 81, pp. 148–155, 2019.
[13] T. Le, Y. Kim, and H. Kim, "Network intrusion detection based on novel feature selection model and various recurrent neural networks," Applied Sciences, vol. 9, no. 7, p. 1392, 2019.
[14] B. S. Bhati, G. Chugh, F. Al-Turjman, and N. S. Bhati, "An improved ensemble based intrusion detection technique using XGBoost," Transactions on Emerging Telecommunications Technologies, vol. 32, no. 6, 2021.
[15] S. T. Ikram, A. K. Cherukuri, B. Poorva et al., "Anomaly detection using XGBoost ensemble of deep neural network models," Cybernetics and Information Technologies, vol. 21, no. 3, 2021.
[16] G. P. Dubey and D. R. K. Bhujade, "Optimal feature selection for machine learning based intrusion detection system by exploiting attribute dependence," Materials Today: Proceedings, vol. 47, pp. 6325–6331, 2021.
[17] D. Liao, S. Huang, Y. Tan, and G. Bai, "Network intrusion detection method based on GAN model," in Proceedings of the 2020 International Conference on Computer Communication and Network Security (CCNS), August 2020.
[18] S. Huang and K. Lei, "IGAN-IDS: an imbalanced generative adversarial network towards intrusion detection system in ad-hoc networks," Ad Hoc Networks, vol. 105, Article ID 102177, 2020.
[19] H. Chen and L. Jiang, "Efficient GAN-based method for cyber-intrusion detection," 2019, https://arxiv.org/abs/1904.02426.
[20] J. Ye, Y. Fang, and J. Ma, "Intrusion detection model based on conditional generative adversarial networks," in Proceedings of the 2019 Second International Conference on Algorithms, Computing and Artificial Intelligence, ACM, Sanya, China, December 2019.
[21] S. Msika, A. Quintero, and F. Khomh, "SIGMA: strengthening IDS with GAN and metaheuristics attacks," 2019, https://arxiv.org/abs/1912.09303.
[22] R. Ahsan, W. Shi, X. Ma, and W. L. Croft, "A comparative analysis of CGAN-based oversampling for anomaly detection," IET Cyber-Physical Systems: Theory & Applications, vol. 7, no. 6, 2021.
[23] G. Zhang, X. Wang, R. Li, Y. Song, J. He, and J. Lai, "Network intrusion detection based on conditional Wasserstein generative adversarial network and cost-sensitive stacked autoencoder," IEEE Access, vol. 8, Article ID 190431, 2020.
[24] R. Feng, "Uncertainty analysis in well log classification by Bayesian long short-term memory networks," Journal of Petroleum Science and Engineering, vol. 205, 2021.
[25] S. Hochreiter and J. Schmidhuber, "Long short-term memory," Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997.
[26] S. A. Althubiti, E. M. Jones, and K. Roy, "LSTM for anomaly-based network intrusion detection," in Proceedings of the 2018 28th International Telecommunication Networks and Applications Conference (ITNAC), November 2018.
[27] P. Sun, P. Liu, Q. Li et al., "DL-IDS: extracting features using CNN-LSTM hybrid network for intrusion detection system," Security and Communication Networks, vol. 2020, Article ID 8890306, pp. 1–11, 2020.
[28] S. Hao, J. Long, and Y. Yingchuan, "BL-IDS: detecting web attacks using Bi-LSTM model based on deep learning," in Proceedings of the Second EAI International Conference, SPNCE 2019, Tianjin, China, April 2019.
[29] P. Devan and N. Khare, "An efficient XGBoost–DNN-based classification model for network intrusion detection system," Neural Computing & Applications, vol. 32, no. 16, Article ID 12499, 2020.
[30] Y. Zhou, G. Cheng, S. Jiang, and M. Dai, "Building an efficient intrusion detection system based on feature selection and ensemble classifier," Computer Networks, vol. 174, Article ID 107247, 2020.
[31] M. Usama, M. Asim, S. Latif, J. Qadir, and A.-A. Fuqaha, "Generative adversarial networks for launching and thwarting adversarial attacks on network intrusion detection systems," in Proceedings of the 2019 15th International Wireless Communications & Mobile Computing Conference (IWCMC), Tangier, Morocco, June 2019.
[32] M. Ishaque and L. Hudec, "Feature extraction using deep learning for intrusion detection system," in Proceedings of the 2019 Second International Conference on Computer Applications & Information Security, May 2019.
[33] C. Sarika and K. Nishtha, "Analysis of KDD-cup'99, NSL-KDD and UNSW-NB15 datasets using deep learning in IoT," Procedia Computer Science, vol. 167, 2020.
[34] X. Zhu and X. Wu, "Class noise vs. attribute noise: a quantitative study," Artificial Intelligence Review, vol. 22, pp. 177–210, 2004.
[35] S. Jay and M. Manollas, "Effective deep CNN-BiLSTM model for network intrusion detection," in Proceedings of the Third International Conference on Artificial Intelligence and Pattern Recognition 2020, p. 9, Xiamen, Fujian, China, June 2020.
[36] Y. Lin, J. Wang, Y. Tu, L. Chen, and Z. Dou, "Time-related network intrusion detection model: a deep learning method," in Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, December 2019.
