PII: S1084-8045(23)00041-3
DOI: https://doi.org/10.1016/j.jnca.2023.103622
Reference: YJNCA 103622
Please cite this article as: W. Yao, H. Shi and H. Zhao, Scalable anomaly-based intrusion detection
for secure Internet of Things using generative adversarial networks in fog environment. Journal of
Network and Computer Applications (2023), doi: https://doi.org/10.1016/j.jnca.2023.103622.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
ARTICLE INFO

Keywords:
Intrusion detection
Internet of Things
Anomaly detection
Generative adversarial networks
Fog computing

ABSTRACT

The data generated exponentially by a massive number of devices in the Internet of Things (IoT) are extremely high-dimensional, large-scale, and unlabeled, which poses great challenges to timely analysis and effective decision making for anomaly detection in IoT. In this paper, we propose a novel unsupervised deep learning method to identify anomalies in IoT networks, which exploits Bidirectional Generative Adversarial Networks (BiGAN) to build a model on normal IoT data. The model introduces the Wasserstein distance to capture and learn the distribution of high-dimensional raw data and focuses on latent representations using an auxiliary classifier. A cycle consistency connection between data is designed to prevent information loss, which helps to reduce the false positive rate. The model detects outliers by utilizing the reconstruction error in feature space. Another challenge facing current anomaly detection solutions is their limited scalability, which restricts their capability in handling big IoT data. This issue is resolved by deploying and jointly training the proposed method in a fog computing environment. The anomaly-based intrusion detection becomes scalable by leveraging the flexibility of fog computing, which contributes to efficient detection. Experimental results on two recent datasets (i.e., UNSW-NB15 and CIC-IDS2017) validate that the proposed method achieves a 4% increase in accuracy and a 4% reduction in false alarm rate over state-of-the-art methods while maintaining computational efficiency.

1. Introduction
[...] system (IDS) has become an effective and active technology to monitor and detect attacks or possible anomalous activities (Stoyanova et al., 2020). An IDS can be divided into two categories: signature-based IDS and anomaly-based IDS. A signature-based IDS identifies attack behaviors by examining traffic with previously learned knowledge of confirmed attacks (Stoyanova et al., 2020). However, it fails to recognize zero-day or novel attacks (Zoppi and Ceccarelli, 2021). In contrast, an anomaly-based IDS is able to identify any deviation from the normal profile as [an attack] (…san et al., 2021; Cao et al., 2019). Such attacks may evade advanced security technologies and cause heavy damage to networks (Cao et al., 2019). Traditional machine learning (ML) methods (Shafiq et al., 2021) have been proven able to identify malicious attacks in IoT networks thanks to their advantages in behavior analysis (Keshk et al., 2021). Nevertheless, ML has also been shown to handle large amounts of complex, high-dimensional data poorly, and it predominantly exhibits high false positive rates for attack detection (Bengio et al., 2013). Nowadays, Deep Learning (DL) methods have been receiving considerable attention in intrusion detection (Mahdavifar and Ghorbani, 2019; Wu et al., 2020). Advances in DL have spawned novel IDS technologies capable of handling the complexity and sophistication of existing attacks (Li et al., 2022a).

∗ Corresponding author.
E-mail addresses: yaow.neu@gmail.com (W. Yao), shihan.neu@gmail.com (H. Shi), zhaoh@mail.neu.edu.cn (H. Zhao).

Obtaining labeled attacks from IoT networks is usually a difficult task, which can be time-consuming and even impossible in some complex scenarios. Thus, unsupervised learning techniques are considered a promising way forward, being able to identify anomalies without the need for labels (Choi et al., 2019; Schlegl et al., 2019; Zenati et al., 2018b). Nevertheless, most existing unsupervised methods are built on simple linear projections and transformations, which often fail to find the hidden relationships in data (Li et al., 2019). Moreover, they generally use a simple comparison to distinguish between normal and abnormal data, which often increases false alarm rates. This may not be sufficient for anomaly detection because the control boundaries are not flexible enough and cannot effectively identify malicious attacks. Thus, it is crucial to develop a novel unsupervised method to detect attacks in IoT networks.

Meanwhile, existing approaches for anomaly detection mainly rely on a centralized cloud. This means they lack the capability to meet the needs of current IoT applications, such as computational distribution, low latency, and scalability (Wang et al., 2020; de Araujo-Filho et al., 2021; Li et al., 2022a). Moreover, cloud-based methods usually incur a high recognition time to detect attacks due to the long distance between IoT devices and the IDS, leading to long communication times (Lim et al., 2020). As an alternative, edge computing faces a serious obstacle to deploying an IDS directly at the edge level due to limited processing and battery resources (Ning et al., 2021). A newer computing paradigm, fog computing, brings cloud-to-things services next to end users by grouping available computing, storage, and processing resources at the network edge (Ni et al., 2018). Fog computing can reduce the costly storage and processing generated by IoT devices and enable real-time detection with high efficiency and flexibility. Thus, the fog layer provides a unique opportunity for IoT to deploy intelligent and collaborative security solutions. Consequently, an unsupervised method can take advantage of fog computing to efficiently identify attacks in IoT networks.

Recently, Generative Adversarial Networks (GAN) (Li et al., 2021) have been widely used as an effective unsupervised anomaly detection approach in various fields, such as computer vision and cyber security (Schlegl et al., 2019; Zhang et al., 2022). The standard GAN consists of a generator and a discriminator. The generator produces fake samples from a specific latent space, while the discriminator attempts to distinguish fake samples from real ones. When trained on normal data, a GAN can be utilized to detect anomalies by observing its anomaly score. The trained GAN often produces small reconstruction errors on normal data and much higher values on abnormal data. Although a GAN can approximate the normal data distribution and generate realistic data samples, it is still difficult to find a suitable latent representation 𝑧 to determine whether a given test sample conforms to the normal data distribution. More importantly, our goal in learning the data distribution is to estimate whether a test sample is abnormal by estimating how well the given sample fits the learned distribution. Therefore, instead of using a standard GAN, we adopt Bidirectional Generative Adversarial Networks (BiGAN) (Donahue et al., 2018) to propose a novel anomaly detection method for IoT networks. BiGAN introduces an encoder network, which alleviates the computation of the reconstruction error for a given input sample 𝑥 (i.e., we do not need to spend expensive time finding a suitable latent representation 𝑧 for the sample 𝑥). Thus, BiGAN is better suited to our goal. However, when the BiGAN model learns to capture the distribution of high-dimensional IoT data, the model gradient may vanish, which leads to low detection performance. To solve this issue, we introduce the Wasserstein distance (Arjovsky et al., 2017) into the adversarial loss to train the BiGAN model, which helps to stabilize and accelerate model training. To distinguish the potential distribution of normal data from IoT anomalies, we develop a code classifier that learns their latent representations to enhance detection accuracy. To reduce the false alarm rate, we add a consistency loss to guarantee that normal data are mapped to the desired outputs and reconstructed precisely. In addition, we use a feature-based discrimination loss as a novel anomaly score function to identify potential IoT anomalies. Finally, to achieve low detection latency and communication cost, our anomaly detection framework combines the advantages of fog computing and cloud computing to detect anomalies. The difference between the proposed method and previous methods is that our method not only has stronger generalization and prediction capability for detecting anomalies, by focusing on the latent representation and reconstruction of data, but also achieves scalable and efficient intrusion detection by exploiting fog computing. Previous methods mainly rely on simple comparisons in the data space to identify anomalies, which does not help to reduce false positive rates. They also use centralized deployment for anomaly detection, which cannot support efficient real-time detection. The main contributions of this paper are summarized as follows.

1. We propose a novel unsupervised DL method for detecting attacks in IoT networks utilizing an improved GAN, which not only solves the challenge of obtaining labels but also improves detection performance.
2. We develop a code classifier and a cycle-consistent loss to learn a new latent space representation, which aims to enhance the capability of unsupervised anomaly detection methods in detecting intrusions in IoT networks.
3. We design a scalable anomaly detection framework by making use of the benefits of fog nodes and a cloud server. This helps to reduce response time and communication overhead. Extensive experiments are conducted on two recent intrusion datasets to assess the performance of the proposed method. The experimental results show that our method greatly outperforms existing methods for anomaly detection, while keeping a lower recognition time and communication cost than the cloud-based way.

The rest of this paper is organized as follows. Section 2
briefly introduces an overview of recent research on intrusion detection. In Section 3, we present the system model in IoT networks and some fundamental preliminaries. Section 4 describes the details of the proposed anomaly detection method. Then we demonstrate and analyze the evaluation results in Section 5. Section 6 analyzes the results obtained from the proposed method and compares them with state-of-the-art IDS studies. Section 7 presents the limitations of the proposed method. Finally, Section 8 concludes this paper and presents potential future work.

2. Related Work

In this section, we briefly review existing studies of IDS and recent advances in deep learning-based intrusion detection methods.

2.1. Traditional Intrusion Detection Methods

As an indispensable security tool, an IDS can collect and screen network traffic to detect whether a network has been compromised. A large number of intrusion/anomaly detection methods based on machine learning (ML) have been investigated. For example, Chawathe (2018) employed many ML approaches, such as J48 and Random Forest (RF), to detect IoT attacks. Nevertheless, these methods are not effective at identifying novel botnets. To detect point-wise and collective anomalies, Marteau (2021) utilized a forest of binary partitioning trees to build an intrusion detection model. By introducing the distance-based paradigm, the proposed method outperforms the Isolation Forest (IF) approach. Nomm and Bahsi (2018) presented a new method for identifying anomalies in IoT datasets. In the proposed method, they first resampled normal data from the IoT dataset to expand the training dataset. Then, the trained Local Outlier Factor (LOF) and One-Class Support Vector Machine (OCSVM) were employed to detect malicious attacks. The sampling technique may alter the distribution of the original data, thus affecting the effectiveness of LOF and OCSVM. In Yang et al. (2022), a Cluster Labeling (CL) K-Means was used as an unsupervised learner for zero-day attack detection. The authors first employed Kernel PCA to select the relevant features for reducing dimensionality and then exploited two biased classifiers and Bayesian optimization with a Gaussian process (BO-GP) to optimize the model, which realizes much improved performance. Different from this, Abdelmoumin et al. (2022) proposed a triple stacking-based ensemble learning technique for detecting anomalies in IoT. This approach ensembles Principal Component Analysis (PCA), One-Class SVM, and two-class Neural Networks to aggregate the predictions, which provides better detection accuracy than a single model. Similarly, Khan et al. (2023) presented an ensemble IDS model utilizing AutoML based on a soft voting method in the network environment. Nevertheless, ensemble learning is costly in terms of training time, testing time, and computational overhead, which can lead to high latency and resource utilization in IoT. To tackle this issue, Qi et al. (2022) combined locality-sensitive hashing (LSH), isolation forest, and PCA techniques to efficiently and accurately detect point anomalies and partial group anomalies in Industry 4.0.

However, these ML methods often rely on the robustness of feature engineering and only capture shallow feature relationships in the data, which may limit their learning capability. Moreover, their detection performance degrades when utilized on high-dimensional IoT data and cannot satisfy increasingly complex and demanding IoT data (Moustafa et al., 2021).

2.2. Deep Learning-based Intrusion Detection

Deep learning (DL) has attracted extensive interest in intrusion/anomaly detection due to its powerful learning abilities (especially from high-dimensional data) and its independence from feature engineering. Some advanced deep learning methods have been employed in IDS for IoT networks, such as Convolutional Neural Networks (CNN) (Ferrag et al., 2020), Long Short-Term Memory (LSTM) (Al-Hawawreh et al., 2021), and Recurrent Neural Networks (RNN) (Zhang et al., 2022). For example, Gao et al. (2021) explored LSTM and feed-forward neural networks (FNN) to identify attacks respectively and then combined them in their experiments. Relying on improved feature representation, Zhou et al. (2021) developed an IDS utilizing variational LSTM to handle the complexity of data compression while retaining critical features. Ding and Li (2022) proposed an intrusion detection method that exploits a Graph Convolutional Network (GCN) to build graph-structured data from network traffic and combines LSTM with an attention mechanism to capture the time dependence of network traffic. To resolve the problem of data imbalance, Park et al. (2022) developed a novel Boundary Equilibrium GAN (BEGAN)-based IDS to detect network attacks. They extracted features from a trained encoder and used DNN, CNN, and LSTM as supervised classifiers, which improves detection performance. However, the computational cost of these DL methods is expensive.

Apart from the previous approaches using supervised learning as an anomaly detection model, several studies have applied unsupervised learning, especially autoencoder (AE) models (Vu et al., 2022; Monshizadeh et al., 2022; Yang and Hwang, 2022). Vu et al. (2022) introduced three regularized AE architectures (i.e., MAE, MVAE, MDAE) to build the desired feature representation of IoT data, which was then utilized to facilitate a supervised learner in detecting attacks. Yang and Hwang (2022) proposed an unsupervised, ensemble-based anomaly detection method based on AE and Mahalanobis distances. An anomaly score was then calculated for testing samples by a weighted sum of the calculated reconstruction loss and the Mahalanobis distance over the output features of each layer in the autoencoder. Also, an anomaly detection approach based on a hierarchical AE was developed by Kye et al. (2022). Unlike Yang and Hwang (2022), this method has multiple detection stages, where the first stage, based on the encoder output, identifies anomalies using a normalized 𝐿1 norm distance, and the remaining stages, based on decoder and hidden-layer outputs, detect anomalies utilizing the Mahalanobis distance. Li et al. (2022b) employed autoencoders and temporal convolutional networks to identify early failures of large mechanical equipment. As one of the significant results, Alsaedi et al. (2023) designed an integrated method called USMD to identify attacks on Cyber-Physical Systems. They first employed a Temporal Dependencies Network (TDN) based on LSTM and a Temporal Attention Unit for multi-sensor data, and then used Isolation Forest to calculate misbehaviour scores, which achieves F-scores of 96.99% and 97.02% on the SWaT and WADI datasets, respectively.

As another way of applying unsupervised learning, Li et al. (2018) proposed a new method called GAN-AD to identify possible anomalies in cyber-physical systems. Long short-term memory recurrent neural networks (LSTM-RNN) were employed to obtain the distribution of multivariate IoT data. They realized a high detection rate and low false positives in detecting anomalies caused by various attacks. The authors then extended GAN-AD to propose an unsupervised multivariate anomaly detection method with GAN (MAD-GAN) (Li et al., 2019) to identify attacks utilizing a new anomaly score called DR-Score. Moreover, Schlegl et al. (2017) presented AnoGAN, which combines the discrimination loss and reconstruction loss to define an anomaly score for medical images. However, these methods require computationally expensive latent-space optimization, which may not be suitable for real-time IoT applications such as e-healthcare. Many studies solve this issue by introducing an encoder that maps input data to a latent representation. Schlegl et al. (2019) proposed the f-AnoGAN method by introducing an encoder and the more stable Wasserstein GAN. However, it does not emphasize the importance of learning the encoder and generator jointly, which may degrade detection performance. Other works applied efficient GAN-based anomaly detection to network data, which possess important specificity and complexity. In addition, de Araujo-Filho et al. (2021) proposed an improved GAN, called FID-GAN, to identify anomalies in IoT while realizing relatively low detection latency. They introduced a trained encoder and estimated the loss function using the reconstruction of data converted into the latent space. Cui et al. (2022) exploited an Energy-based Generative Adversarial Network (EBGAN)-based IDS for complex high-dimensional data. They then estimated abnormality based on sample reconstruction using the autoencoder in EBGAN. Despite the merits of GAN-based approaches, they may still suffer from high computational complexity and false positives, which limits their applicability in real-time or resource-constrained IoT applications.

2.3. Deep Learning for Fog/Edge Computing

Currently, several works have investigated different methods in fog/edge environments (Moustafa et al., 2021; Yao et al., 2021; Abdel-Basset et al., 2021; Nie et al., 2022). Moustafa et al. (2021) developed a novel distributed anomaly detection method using ensemble statistical learning to timely and effectively identify zero-day attacks in edge networks. To capture long-standing dependencies and complement parallel computation on IoT traffic sequences, Abdel-Basset et al. (2021) introduced a new DL model to identify attacks in fog-based industrial IoT by integrating gated recurrent units (GRU) and multi-head attention (MA). Nie et al. (2022) proposed a GAN-based intrusion detection method to detect new attacks in a collaborative edge computing environment. Li et al. (2022a) developed an LSTM-AE-based anomaly detection framework, which leverages edge computing to discover potential attacks in IoT networks. The authors showed that the proposed model could achieve a high F1 measure compared with other conventional ML models.

Although the above studies have achieved promising results, there are still open issues in IoT network anomaly detection concerning performance, scalability, and computation. In this paper, we propose to take advantage of the Wasserstein distance to effectively stabilize model training and enhance detection performance. Moreover, we introduce a code classifier and a cycle-consistency constraint to learn the desired latent representations of input samples, which helps to distinguish abnormal data from normal data. Finally, we exploit fog nodes and a cloud server to jointly train the proposed DL method for detecting anomalies in IoT networks, which can reduce response time and communication cost.

3. Problem Statement

[...] are relatively secure (Liyanage et al., 2021). Moreover, the communication between gateways and the cloud server is protected through encrypted secure channels. In other words, only IoT devices, which are the main sources of traffic within the networks, are considered as victims that may potentially be compromised.

2) Stable Traffic Pattern: The traffic patterns of IoT devices in a local network are considered to be relatively stable (Meidan et al., 2018). This is because of the periodic and steady flow of IoT data, which is simpler than its dense and erratic conventional network traffic counterparts.

3) Sufficient Prior Knowledge of Normal Traffic: Although it is difficult to describe benign IoT profiles accurately and comprehensively, we still assume that there are enough IoT
devices and IoT systems that can obtain reliable normal IoT traffic with the help of network experts.

[...] will perform poorly in the IoT network due to the high number of devices and the long distance between them. Therefore, as shown in Fig. 1, our proposed IDS framework works as [...]

Fig. 1. System model (cloud layer with data center, fog layer, and edge layer with transportation, office facility, video monitoring, and smart devices).

[...]ligent devices. Thus, the proposed IDS would be deployed [...] all attacks from the edge layer. Each fog node detects attacks by simply processing the data obtained from the IoT devices connected to that node. Therefore, the detection load is shared among the fog nodes of the fog layer, which improves detection performance. Additionally, attack detection is performed close to the IoT devices, reducing the detection response time. The deployment of the proposed IDS would achieve high protection against malicious attacks while monitoring gateways in edge networks.

[...] function to the cloud. [...] can be regarded as a 1-vs-1 structure. The details of the proposed detection framework are presented in Section 4.6.

• Fog Layer. The fog layer consists of many fog nodes that group computation resources near the network edge. As the core of the IoT network, fog computing acts as an intermediary between the edge and the cloud, managing the IoT traffic under its coverage. Thus, the fog layer basically has the available processing capability, storage, and networking resources to satisfy various network services.

3.3. Preliminaries

BiGAN: The standard GAN is unable to determine the inverse mapping from input data to latent representation, and an expensive latent-space optimization is required during training. To address this issue, Bidirectional Generative Adversarial Networks (BiGAN) (Donahue et al., 2018)
Fig. 2. Proposed GAN model architecture, including the generator G, encoder E, discriminator D, and classifier C; 𝐿1 denotes the cycle consistency loss.

[...] the differences for the parameters of the generator may be discontinuous, making it difficult to reach a Nash equilibrium. Wasserstein GAN (WGAN) (Arjovsky et al., 2017) uses the Earth-Mover distance 𝑊 to measure the distance between the real and fake data distributions to facilitate GAN training. The WGAN objective is described as

min_G max_{D∈𝑊_𝐷} 𝔼_{𝑥∼𝑝_𝑥}[D(𝑥)] − 𝔼_{𝑧∼𝑝_𝑧}[D(G(𝑧))]

Fig. 3. (a) Loss value for the discriminator using the JS distance during model training. (b) Performance result using the JS distance on the UNSW-NB15 dataset.

4. Proposed Anomaly Detection Method

The core idea behind our anomaly detection method is to build an effective model that can identify anomalous IoT data by learning the latent representation of the normal-sample manifold through encoding and decoding. The model should have the powerful ability not to reconstruct anomalies as "normal" [...]

[...] confidence. The more realistic the data, the higher the score given. [...] the dimensional expansion may be so severe that the real and generated data have almost no overlapping support. For example, when latent variables are mapped to a higher-dimensional space by the generator, the high-dimensional space is entirely constrained by the samples from the low-dimensional space. Thus, the support dimension of the high-dimensional space is actually the dimension of the latent space. Under the influence of dimensional expansion, the two distributions [...]
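The advantage of the Wasserstein distance over the JS divergence in this no-overlap regime can be illustrated numerically (a toy example, not from the paper): for distributions with disjoint supports, the JS divergence saturates at log 2 regardless of how far apart they are, so its gradient carries no information, while the Wasserstein-1 distance still reflects the separation.

```python
import numpy as np

def js_divergence(p, q):
    """JS divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                   # m > 0 wherever a > 0, so this is safe
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1_discrete(p, q):
    """1-D Wasserstein-1 on a unit-spaced grid: L1 distance between CDFs."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

grid = 10
p = np.zeros(grid); p[0] = 1.0         # "real" mass at position 0
for shift in (2, 8):                   # "generated" mass near vs. far away
    q = np.zeros(grid); q[shift] = 1.0
    # JS stays at log 2 ~ 0.693 for both shifts; W1 grows with the shift.
    print(shift, js_divergence(p, q), w1_discrete(p, q))
```

This is exactly the failure mode shown in Fig. 3: with disjoint supports the JS-based discriminator loss plateaus, whereas the Wasserstein critic still provides a useful training signal.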
[...] traffic is generally stable and has a relatively centralized likelihood distribution, the Wasserstein distance is beneficial for describing and identifying IoT traffic patterns.

Moreover, to enforce the Lipschitz continuity of the discriminator D, we employ the gradient penalty regularization term proposed by Gulrajani et al. (2017), which penalizes gradient norms not equal to 1. Therefore, the objective in Eq. (3) can be reformulated as

min_{G,E} max_{D∈𝑊_𝐷} 𝑉_{𝑎𝑑𝑣}(G, E, D) = 𝔼_{𝑥∼𝑝_𝑥}[D(𝑥, E(𝑥))] − 𝔼_{𝑧∼𝑝_𝑧}[D(G(𝑧), 𝑧)] + 𝜇 GP(𝑥̃, 𝑧̃)    (4)

where GP represents the penalty term in Gulrajani et al. (2017), 𝜇 is the weight controlling the gradient penalty, and (𝑥̃, 𝑧̃) denotes sampling uniformly along straight lines between the distributions 𝑝_𝑥 and 𝑝_𝑧. In this work, we use the default value 𝜇 = 10 provided by Gulrajani et al. (2017). For Eq. (4), G and E are trained to minimize the objective function and make the generated couple (G(𝑧), 𝑧) close to (𝑥, E(𝑥)), while the discriminator D is trained to maximize the objective function and discriminate between (G(𝑧), 𝑧) and (𝑥, E(𝑥)) couples. Through this mini-max game, D can finally lead the data generated by G to be close to the real normal data.

[...] from E are different from the latent variables 𝑧 (i.e., the coding loss). More formally, the objective function 𝑉_{𝑐𝑜𝑑}(E, C) can be expressed as

min_E max_C 𝑉_{𝑐𝑜𝑑}(E, C) = 𝔼_{𝑥∼𝑝_𝑥}[log C(𝑧̂)] + 𝔼_{𝑧∼𝑝_𝑧}[log(1 − C(𝑧))]    (5)

where 𝑧̂ = E(𝑥) denotes the encoded latent representation of normal data 𝑥. To increase the stability and robustness of the classifier C, we also apply the Wasserstein distance to the objective function in Eq. (5). Therefore, the objective function can be reformulated as [...] For abnormal random noise 𝑧, the classifier should identify it as abnormal, i.e., C(𝑧) = 0.

4.3. Cycle Consistency Loss

The original BiGAN does not impose any constraint on the alignment of G and E (i.e., G⁻¹ = E), which can result in mismatches in the reconstruction of normal samples; hence, the false alarm rate in identifying anomalies is high. For example, the mapping of input data may reach positions in the latent representation that are only sparsely sampled during training, which can fail to convince the discriminator after the inverse mapping. Consequently, there may be small residuals even for anomalous samples, leading to erroneous anomaly scores. To narrow the possible mapping search space, a cycle consistency regularization term on G and E is employed to facilitate their alignment. Zhu et al. (2017) first introduced the cycle consistency loss for computer vision tasks. Inspired by Zhu et al. (2017), we combine G and E with the cycle loss by minimizing the 𝐿1 norm between the generated fake data and the reconstructed ones, which is expressed as

𝑉_{𝑐𝑦𝑐}(G, E) = 𝔼_{𝑥∼𝑝_𝑥}[‖𝑥 − G(E(𝑥))‖₁]    (7)

This loss function describes the latent space differently from the other loss (i.e., 𝑧 − E(G(𝑧))), which helps to distinguish between normal data and IoT anomalies; see the evaluation in Section 5.5.

[...] where 𝜎 controls the importance of the consistency loss. As illustrated in Fig. 2, the benefits of our method for anomaly detection are as follows. First, the Wasserstein distance is utilized to improve model training. Second, we use a code classifier to distinguish between random latent variables and potential latent representations, which contributes to improving the network's learning capability. Third, the encoder and generator (i.e., decoder) with the cycle-consistency loss enable the model to effectively encode and decode the input data. Therefore, the well-trained encoder and generator can be used as an anomaly detection measure. Next, we explain the details of the anomaly detection phase utilizing the proposed method.
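The gradient-penalty objective of Eq. (4) can be sketched numerically as follows. A linear toy critic replaces the trained discriminator so the input gradient is analytic; all names and shapes are illustrative, not the paper's architecture.

```python
import numpy as np

# Toy critic D(v) = w.v over a concatenated (data, latent) pair, so that
# the gradient of D with respect to its input is just w.
rng = np.random.default_rng(1)
d = 6                                  # dim of a concatenated (data, latent) pair
w = rng.normal(size=d)                 # critic weights

def D(v):
    return w @ v                       # critic score of a joint pair

def grad_D(v):
    return w                           # analytic input gradient of the linear critic

real = rng.normal(size=d)              # stands in for (x, E(x))
fake = rng.normal(size=d)              # stands in for (G(z), z)

eps = rng.uniform()                    # sample uniformly along the line
interp = eps * real + (1 - eps) * fake # the interpolate (x_tilde, z_tilde)

mu = 10.0                              # default penalty weight used in the paper
gp = mu * (np.linalg.norm(grad_D(interp)) - 1.0) ** 2

# Critic loss to minimize: -(D(real) - D(fake)) plus the gradient penalty.
critic_loss = D(fake) - D(real) + gp
print(critic_loss)
```

The penalty is zero exactly when the critic's input gradient has unit norm along the interpolation line, which is how the Lipschitz constraint of Eq. (4) is enforced softly.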
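The cycle-consistency term of Eq. (7) is simply a batch-averaged 𝐿1 distance between inputs and their reconstructions G(E(𝑥)); a minimal sketch with placeholder G and E callables:

```python
import numpy as np

def cycle_consistency_loss(x_batch, G, E):
    """Mean L1 norm of x - G(E(x)) over a batch of row vectors."""
    recon = np.stack([G(E(x)) for x in x_batch])
    return np.mean(np.abs(x_batch - recon).sum(axis=1))

# An identity G∘E gives zero loss; a biased reconstruction does not.
x_batch = np.ones((4, 5))
assert cycle_consistency_loss(x_batch, lambda z: z, lambda x: x) == 0.0
print(cycle_consistency_loss(x_batch, lambda z: z + 0.1, lambda x: x))
```

Minimizing this term pushes E toward an approximate inverse of G on normal data, which is the alignment constraint that the original BiGAN lacks.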
Notation   Description
𝑁          [...]
𝑚          Number of iterations of the critic per epoch
𝑥          Real normal flow
𝑧          Random noise vector
𝐹𝒆         Error terms of the generator and encoder

4.5. Anomaly Detection Score

Once the model is well trained, the detection process is generally reconstruction-based. Samples that fit the learned normal data distribution should be accurately reconstructed, while anomalous ones should be poorly reconstructed. To this end, one can calculate E(𝑥) and construct the reconstruction [...] phase. Since our model is trained only on normal data, 𝐴(𝑥) reflects the discriminator's confidence that the samples are well encoded and reconstructed by the encoder and generator, and thus drawn from the normal data distribution (Schlegl et al., 2017). The larger the value 𝐴(𝑥), the more likely a sample 𝑥 is deemed anomalous.

Fig. 4. Anomaly detection framework, where the discriminator and classifier are located on the cloud server and the generator and encoder are hosted on the fog node.

Why does the discriminator play an important role in anomaly detection? According to the training objective of the proposed GAN, the discriminator should give smaller values to generated samples that are far from the real samples. In practical training, the generator often fails to produce a distribution that matches the training data very well. Generated samples that are very different from the training samples become abnormal data for the discriminator. The discriminator is then trained to give small values to these samples, and also to anomalies. This explains why the output of the discriminator in the anomaly score helps to identify anomalies. The hidden vector in the last layer of the discriminator is often very different for an anomaly and its reconstruction. In the proposed GAN, this difference is utilized to calculate the anomaly score.

[...] of a GAN and then concludes that the discriminator is no better than a random guess at the equilibrium (Salimans et al., 2016). Consider that D is expected to distinguish between the real data pair (𝑥, E(𝑥)) and its reconstruction (G(E(𝑥)), E(𝑥)). At the same time, the generator and encoder may finally perfectly capture the real data and latent variable distributions. In this situation, D is unable to accurately distinguish between real samples and reconstructed ones, and thus may produce a random prediction, which cannot serve as an informative anomaly score. We validate that our proposed anomaly score performs better than the output of D through the experiment in Section 5.5.

4.6. Intrusion Detection System

In this section, we describe how the proposed method is implemented in IoT networks. To train an anomaly detection model over the network, we utilize fog nodes and a cloud server to jointly design the detection model. The whole proposed framework is illustrated in Fig. 4. A discriminator and a classifier are both located on the cloud server, whereas a generator and an encoder are hosted on a fog node. The anomaly detection model consists of two steps: 1) train- [...]
8
Journal Pre-proof
of
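The feature-space anomaly score 𝐴(𝑥) described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the linear maps `We`, `Wg`, `Wf` are random stand-ins for the trained encoder, generator, and the discriminator's last hidden layer, and the sizes are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_f = 77, 32, 64                         # feature, latent, hidden sizes (illustrative)
We = rng.normal(scale=0.1, size=(d_z, d_x))        # stand-in for the trained encoder E
Wg = rng.normal(scale=0.1, size=(d_x, d_z))        # stand-in for the trained generator G
Wf = rng.normal(scale=0.1, size=(d_f, d_x + d_z))  # stand-in for D's last hidden layer f(.)

def feat(x, z):
    """Hidden vector of the discriminator's last layer for a pair (x, z)."""
    return np.tanh(Wf @ np.concatenate([x, z]))

def anomaly_score(x):
    """A(x): distance in D's feature space between the pair (x, E(x)) and
    its reconstruction (G(E(x)), E(x)); larger means more anomalous."""
    z = We @ x                 # E(x)
    x_rec = Wg @ z             # G(E(x))
    return float(np.abs(feat(x, z) - feat(x_rec, z)).sum())

x = rng.random(d_x)
print(anomaly_score(x))        # a non-negative scalar score
```

With trained networks, normal flows would yield small scores and anomalies large ones; a threshold on 𝐴(𝑥) then gives the detection decision.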
1) Training on Cloud Server: The cloud server hosts a discriminator 𝐷 and a classifier 𝐶, both multi-layer perceptron neural networks. Every global iteration, the cloud first receives a mini-batch from the generator and the encoder at the fog side, which consists of 𝑛 generated samples {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}ⁿᵢ₌₁ and 𝑛 real samples {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))}ⁿᵢ₌₁. The loss function 𝐿_𝐷 of the discriminator 𝐷 is denoted as

𝐿_𝐷 = (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾) − (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾)) + 𝜇 𝐺𝑃(𝑥̃, 𝑧̃)    (10)

and the loss function 𝐿_𝐶 of the classifier 𝐶 is given in Eq. (11).

Meanwhile, the cloud server calculates the error terms 𝐹_𝒆 of 𝐺 and 𝐸 using the received traffic and transfers them to the fog side, which contributes to updating the parameters of the generator 𝐺 and encoder 𝐸. Formally, 𝐹_𝒆 = {𝒆₁, …, 𝒆ₙ}, where 𝒆ᵢ is defined as

𝒆ᵢ = 𝜕𝐿_{𝐺,𝐸} / 𝜕𝜻ᵢ    (12)

where 𝜻ᵢ denotes the 𝑖-th data of the combined 𝐸(𝑥) and 𝐺(𝑧), and 𝐿_{𝐺,𝐸} is the loss function of 𝐺 and 𝐸.

2) Training and Detection on Fog Node: The fog node hosts an encoder 𝐸 and a generator 𝐺 with their parameters 𝑤_{𝑔,𝑒}. The encoder and generator are both multi-layer perceptron neural networks. Every global iteration, for a fixed 𝑤_{𝑔,𝑒}, the fog node first generates and sends the couples of generated samples (𝐺(𝑧ᵢ), 𝑧ᵢ) and real samples (𝑥ᵢ, 𝐸(𝑥ᵢ)) to the cloud server for training the discriminator 𝐷 and classifier 𝐶. Then, the fog node receives the error terms 𝐹_𝒆 from the cloud, corresponding to the error made by 𝐺 and 𝐸. Here 𝐿_{𝐺,𝐸} represents the loss function of 𝐺 and 𝐸, which is expressed as

𝐿_{𝐺,𝐸} = (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾)) − (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾) + (1/𝑛) ∑ᵢ₌₁ⁿ ‖𝑥⁽ⁱ⁾ − 𝐺(𝐸(𝑥⁽ⁱ⁾))‖₁    (13)

Thus, the gradient ∇𝑤_{𝑔,𝑒} of the generator and encoder is deduced as

∇𝑤_{(𝑔,𝑒),𝑗} = 𝜕𝐿_{𝐺,𝐸} / 𝜕𝑤_{(𝑔,𝑒),𝑗} = (1/𝑛) ∑_{𝜻ᵢ∈𝜻} (𝜕𝐿_{𝐺,𝐸}/𝜕𝜻ᵢ)(𝜕𝜻ᵢ/𝜕𝑤_{(𝑔,𝑒),𝑗}) = (1/𝑛) ∑_{𝜻ᵢ∈𝜻} 𝒆ᵢ (𝜕𝜻ᵢ/𝜕𝑤_{(𝑔,𝑒),𝑗})    (14)

Step 1. The input data traffic is first preprocessed and normalized into [0, 1] to accelerate the model convergence. The data preprocessing process is detailed in Section 5.2.

Step 2. Every global iteration, based on the training dataset, the method first utilizes the generator and encoder in the fog node to generate a batch of network traffic with 𝑛 samples, including pairs of fake data {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}ⁿᵢ₌₁ and pairs of normal data {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))}ⁿᵢ₌₁. This data traffic is sent to the cloud server for updating the parameters of 𝐷 and 𝐶 and for calculating the error terms of 𝐺 and 𝐸.

Step 3. On the other hand, the discriminator and classifier in the cloud server utilize the received flow containing generated data and real data from the fog node to produce the trained model. The discriminator and classifier utilize these data to compute the gradients ∇𝑤_𝑑 and ∇𝑤_𝑐 with the loss functions 𝐿_𝐷 and 𝐿_𝐶 for updating their parameters 𝑤_𝑑 and 𝑤_𝑐 according to Eq. (10) and Eq. (11), respectively.

Step 4. Then, the error terms 𝐹_𝒆 of the generator and encoder are also computed in the cloud server by using the received traffic together with Eq. (12) and Eq. (13), and are then transferred to the fog side.

Step 5. After receiving the error terms 𝐹_𝒆 from the cloud server, the generator and encoder in the fog node use them to update their parameters 𝑤_{𝑔,𝑒} according to Eq. (14).
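The exchange of error terms in Eqs. (12) and (14) is ordinary backpropagation partitioned across two machines: the cloud differentiates the loss with respect to the received activations 𝜻ᵢ, and the fog node multiplies the returned error terms by its local Jacobian 𝜕𝜻ᵢ/𝜕𝑤. A toy NumPy check with a linear "encoder" 𝜻 = 𝑊𝑥 and a quadratic loss (both hypothetical stand-ins, not the paper's networks or losses) confirms that the split gradient matches the directly computed one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_x, d_z = 4, 5, 3
W = rng.normal(size=(d_z, d_x))    # fog-side parameters (toy stand-in for w_{g,e})
X = rng.normal(size=(n, d_x))      # a mini-batch of n flows

Z = X @ W.T                        # fog computes activations ζ_i = W x_i and sends them
# Cloud side: toy loss L = (1/(2n)) Σ_i ||ζ_i||²; error terms e_i = ∂L/∂ζ_i (cf. Eq. (12))
err = Z / n                        # e_i = ζ_i / n, returned to the fog node
# Fog side: chain rule ∇W = Σ_i e_i (∂ζ_i/∂W), cf. Eq. (14)
grad_split = err.T @ X
# Direct differentiation of the same loss, for comparison
grad_direct = (Z.T @ X) / n
print(np.allclose(grad_split, grad_direct))  # True
```

This split means the raw model parameters never leave the fog node; only activations travel up and error terms travel down.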
Fig. 5. The overview of the proposed framework for intrusion detection in IoT, following Section 4.6: the input data traffic is initially preprocessed. A pair of real data and fake data is generated by the fog node and then sent to the cloud server to train the model. The cloud server determines the authenticity of the pair of data and updates the parameters of the discriminator and classifier. The error terms are returned to the fog node to update the generator and encoder. The trained model on the fog node is utilized to detect anomalies in the testing data.
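The closed loop in Fig. 5 (generate, send, compute errors, update, repeat) can be simulated in-process. The NumPy sketch below is purely illustrative: the "cloud loss" is a toy quadratic rather than the paper's GAN objectives, and `fog_forward` stands in for the encoder/generator pair; it only demonstrates that the round trip of activations and error terms drives the fog-side parameters downhill.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_x, d_z = 16, 8, 4
w_fog = rng.normal(scale=0.5, size=(d_z, d_x))  # toy encoder parameters on the fog node
X = rng.normal(size=(n, d_x))                   # preprocessed training flows (step 1)

def fog_forward(X, w):
    """Fog side: encode a batch, i.e. the 'pair of data' sent to the cloud (step 2)."""
    return X @ w.T

def cloud_errors(Z):
    """Cloud side: toy quadratic loss; returns the error terms e_i (steps 3-4)."""
    return Z / len(Z)

eta = 0.2
before = np.square(fog_forward(X, w_fog)).mean()
for _ in range(50):                  # step 6: repeat for a fixed budget
    Z = fog_forward(X, w_fog)        # step 2: generate and send a pair of data
    e = cloud_errors(Z)              # steps 3-4: cloud trains and returns errors
    w_fog -= eta * (e.T @ X)         # step 5: fog applies the chain-rule update
after = np.square(fog_forward(X, w_fog)).mean()
print(after < before)                # True: the round trip reduces the toy loss
```

In the real deployment the two sides would communicate over sockets, as described in Section 5.1, instead of sharing memory.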
Each flow from the testing dataset is fed into the anomaly score 𝐴(𝑥) in Eq. (9) to obtain the classification result for online anomaly detection.

The pseudo-code of our anomaly detection method, including these steps, is illustrated in Algorithm 1 and Algorithm 2. As described in Algorithm 1, in each iteration the training process of Step 3 and Step 4 on the cloud server corresponds to the pseudocode between Line 1 and Line 14, and the training process of Step 2 and Step 5 on the fog node corresponds to the pseudocode between Line 16 and Line 28. After that, the well-trained model is ready for anomaly detection. For each test flow 𝑥, Step 7, representing the test process, corresponds to the pseudocode between Line 1 and Line 8 in Algorithm 2. Thus, our method combines the advantages of the cloud and fog.

• Cloud. The discriminator and classifier require 𝑚𝑁 ⋅ (𝑑_𝐷 + 𝑑_𝐶) operations to compute the gradients and update their parameters, where 𝑑_𝐷 and 𝑑_𝐶 denote the number of operations for the discriminator and classifier with one flow, respectively. For brevity, let 𝑑_𝐷 = |𝑤_𝑑| and 𝑑_𝐶 = |𝑤_𝑐|. Hence, the computation cost on the cloud is 𝑂(𝑚𝑁𝑇 ⋅ (3|𝑤_𝑑| + 2|𝑤_𝑐|)).

• Fog. The fog node makes the generator and encoder generate the flow information, which needs (𝑁⋅𝑑_𝐺 + 𝑁⋅𝑑_𝐸) operations in one iteration, where 𝑑_𝐺 and 𝑑_𝐸 denote the number of operations for the generator and encoder with one flow, respectively. Then, it also requires
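To make the cloud-side bound concrete, here is a hedged back-of-envelope calculation. The values of 𝑁 and the parameter counts are illustrative assumptions, not measurements from the paper: |𝑤_𝑑| is estimated from the Table 3 discriminator (a 64-unit layer over the concatenated 77 + 32 inputs, then a 1-unit output), and 𝑁 is taken as the UNSW-NB15 normal training size.

```python
# Back-of-envelope for the cloud-side cost O(m·N·T·(3|w_d| + 2|w_c|)).
# m and T follow Section 5.1; N and the parameter counts are assumptions.
m, T = 5, 200                            # critic iterations per epoch, training epochs
N = 11_200                               # e.g. UNSW-NB15 normal training flows (Table 4)
w_d = (77 + 32) * 64 + 64 + 64 * 1 + 1   # discriminator weights + biases: 7105 (estimate)
w_c = w_d                                # classifier mirrors the discriminator (Section 5.1)

cloud_ops = m * N * T * (3 * w_d + 2 * w_c)
print(f"|w_d| = {w_d}, cloud cost ~ {cloud_ops:.2e} operations")
```

Even under these rough assumptions the bound stays in the low hundreds of billions of scalar operations per training run, which is comfortably within a single cloud server's capacity.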
Algorithm 1 Training based on cloud server and fog node
Input: Training data, the batch size 𝑛, the training epoch 𝑇, the number of iterations of the critic per epoch 𝑚, the learning rate 𝜂
Output: Trained generator 𝐺, encoder 𝐸, and discriminator 𝐷
1:  procedure CLOUDSERVER(𝑇, 𝑚, 𝑛, 𝜂)
2:      Initialize 𝑤_𝑑, 𝑤_𝑐 for 𝐷 and 𝐶, respectively
3:      for 𝑗 = 1 to 𝑇 do
4:          for 𝑘 = 1 to 𝑚 do
5:              {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}, {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))} ← GetSamples(𝐺, 𝐸)
6:              ∇𝑤_𝑑 ← CompDisLoss(𝐷) by Eq. (10)
7:              ∇𝑤_𝑐 ← CompClaLoss(𝐶) by Eq. (11)
8:              𝑤_𝑑 ← 𝑤_𝑑 + Adam(∇𝑤_𝑑, 𝜂)
9:              𝑤_𝑐 ← 𝑤_𝑐 + Adam(∇𝑤_𝑐, 𝜂)
10:         end for
11:         𝐹_𝒆 ← CompError(𝐺, 𝐸) by Eq. (12) and Eq. (13)
12:         Send the error terms 𝐹_𝒆 to the fog node
13:     end for
14: end procedure
15:
16: procedure FOGNODE(𝑇, 𝑛, 𝜂)
17:     Initialize 𝑤_{𝑔,𝑒} for 𝐺 and 𝐸
18:     for 𝑗 = 1 to 𝑇 do
19:         Sample {𝑥⁽ⁱ⁾}ⁿᵢ₌₁, a batch from the normal data
20:         Sample {𝑧⁽ⁱ⁾}ⁿᵢ₌₁, a batch of prior samples
21:         𝐺(𝑧⁽ⁱ⁾) ← Generator(𝑧⁽ⁱ⁾)
22:         𝐸(𝑥⁽ⁱ⁾) ← Encoder(𝑥⁽ⁱ⁾)
23:         Send {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}, {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))} to the cloud server
24:         Receive the error terms 𝐹_𝒆 from the cloud server
25:         ∇𝑤_{𝑔,𝑒} ← CompLoss(𝐺, 𝐸) by Eq. (14)
26:         𝑤_{𝑔,𝑒} ← 𝑤_{𝑔,𝑒} + Adam(∇𝑤_{𝑔,𝑒}, 𝜂)
27:     end for
28: end procedure

5. Experiments and Analysis

5.1. Experimental Setup
The overall experiments of this paper are performed on an Intel i7 processor (2.9 GHz) with 16 GB of RAM, under a 64-bit Windows 10 system, equipped with an NVIDIA GTX 1070 GPU. The proposed method is implemented using the Python 3.6 programming language, Scikit-Learn libraries, and the PyTorch 1.10 environment. In our implementation, we emulate one cloud server utilizing an Alibaba virtual server and the fog-layer node utilizing a laptop, which is linked with the virtual edge. We employ sockets to realize the communication between the cloud and fog sides. For the experiments, we utilize multi-layer perceptron (MLP) neural networks with different layers to train the encoder, generator, discriminator, and classifier. The encoder has fully connected layers with 64 and 32 units, the LeakyReLU function, and a Batch Normalization operation. The generator has fully connected layers with 64 and 128 units, ReLU and Sigmoid functions, and a Batch Normalization operation. The discriminator is composed of two input channels for 𝑥 and 𝑧 with a fully connected layer with 64 units, the LeakyReLU function, a Batch Normalization operation, and a dropout layer of 0.2. The classifier has one input channel and the same network parameters as the discriminator. We obtain the hyper-parameter settings by running a series of hyperparameter experiments. The dimension of the latent space is set to 32 with 𝑧 ∼ 𝒩(0, 1). The number of critic iterations is 𝑚 = 5, and the training epoch is 𝑇 = 200 with a batch size of 𝑛 = 64. The model is optimized utilizing the well-known Adam optimizer. The model architecture and hyperparameter settings are shown in Table 3. Note that only normal data samples are used for the training process.

5.2. Datasets
A variety of intrusion detection datasets are publicly available. For our simulations, we select two represen-
Table 3
Proposed method architecture and hyperparameters.

Operation    Units       Non-Linearity   Batch Norm   Dropout
E(x)
  Linear     64          LeakyReLU       ✓            0.0
  Linear     32          None            ✗            0.0
G(z)
  Linear     64          ReLU            ✗            0.0
  Linear     128         ReLU            ✓            0.0
  Linear     196 or 77   Sigmoid         ✗            0.0
D(x, z)
  Linear     64          LeakyReLU       ✓            0.2
  Linear     1           Sigmoid         ✗            0.0
C(z)
  Linear     64          LeakyReLU       ✓            0.2
  Linear     1           Sigmoid         ✗            0.0

Table 4
The datasets for evaluating the proposed method.

UNSW-NB15                                CIC-IDS17
Class      Training Set   Testing Set    Class              Training Set   Testing Set
Normal     11200          37000          Normal             21984          167079
Fuzzers    -              6062           DoS Hulk           -              92049
Analysis   -              677            DoS GoldenEye      -              4117
Backdoor   -              583            DoS slowloris      -              2318
DoS        -              4089           DoS Slowhttptest   -              2200
Exploits   -              11132          Heartbleed         -              5
Generic    -              18871
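The encoder and generator dimensions in Table 3 can be cross-checked with a minimal NumPy forward pass. The weights below are random and untrained, the 77-feature input assumes the CIC-IDS17 case, and Batch Normalization and dropout are omitted for brevity; this is a shape sketch, not the paper's PyTorch implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def linear(d_in, d_out):
    """Random, untrained weights and biases for a fully connected layer."""
    return rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out)

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)

d_x, d_z = 77, 32                   # 77 input features (CIC-IDS17 case), latent dim 32

e1, e2 = linear(d_x, 64), linear(64, d_z)                        # E(x): 64, then 32
g1, g2, g3 = linear(d_z, 64), linear(64, 128), linear(128, d_x)  # G(z): 64, 128, 77

def E(x):
    h = leaky_relu(e1[0] @ x + e1[1])              # LeakyReLU after the first layer
    return e2[0] @ h + e2[1]                       # last encoder layer has no activation

def G(z):
    h = np.maximum(0, g1[0] @ z + g1[1])           # ReLU
    h = np.maximum(0, g2[0] @ h + g2[1])           # ReLU
    return 1 / (1 + np.exp(-(g3[0] @ h + g3[1])))  # Sigmoid keeps outputs in [0, 1]

x = rng.random(d_x)
print(E(x).shape, G(E(x)).shape)    # (32,) (77,)
```

The Sigmoid output layer matches the [0, 1] normalization of the input features, so the reconstruction 𝐺(𝐸(𝑥)) lives in the same range as 𝑥.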