PII: S1084-8045(23)00041-3
DOI: https://doi.org/10.1016/j.jnca.2023.103622
Reference: YJNCA 103622
Please cite this article as: W. Yao, H. Shi and H. Zhao, Scalable anomaly-based intrusion detection
for secure Internet of Things using generative adversarial networks in fog environment. Journal of
Network and Computer Applications (2023), doi: https://doi.org/10.1016/j.jnca.2023.103622.
This is a PDF file of an article that has undergone enhancements after acceptance, such as the
addition of a cover page and metadata, and formatting for readability, but it is not yet the definitive
version of record. This version will undergo additional copyediting, typesetting and review before it
is published in its final form, but we are providing this version to give early visibility of the article.
Please note that, during the production process, errors may be discovered which could affect the
content, and all legal disclaimers that apply to the journal pertain.
ARTICLE INFO

Keywords:
Intrusion detection
Internet of Things
Anomaly detection
Generative adversarial networks
Fog computing

ABSTRACT

The data generated exponentially by a massive number of devices in the Internet of Things (IoT) are extremely high-dimensional, large-scale, and unlabeled, which poses great challenges to timely analysis and effective decision making for anomaly detection in IoT. In this paper, we propose a novel unsupervised deep learning method to identify anomalies in IoT networks, which exploits Bidirectional Generative Adversarial Networks (BiGAN) to build a model on normal IoT data. The model introduces the Wasserstein distance to capture and learn the distribution of high-dimensional raw data and focuses on latent representations using an auxiliary classifier. A cycle consistency connection between data is designed to prevent information loss, which helps to reduce the false positive rate. The model detects outliers by utilizing the reconstruction error in feature space. Another challenge facing current anomaly detection solutions is their limited scalability, which restricts their capability in handling big IoT data. This issue is resolved by deploying and jointly training the proposed method in a fog computing environment. The anomaly-based intrusion detection becomes scalable by leveraging the flexibility of fog computing, which contributes to efficient detection. Experimental results on two recent datasets (i.e., UNSW-NB15 and CIC-IDS2017) validate that the proposed method achieves a 4% increase in accuracy and a 4% reduction in false alarm rate over state-of-the-art methods while maintaining computational efficiency.

1. Introduction
[...] system (IDS) has become an effective and active technology to monitor and detect attacks or possible anomalous activities (Stoyanova et al., 2020). An IDS can be divided into two categories: signature-based IDS and anomaly-based IDS. A signature-based IDS identifies attack behaviors by examining traffic with previously learned knowledge of confirmed attacks (Stoyanova et al., 2020). However, it fails to recognize zero-day or novel attacks (Zoppi and Ceccarelli, 2021). In contrast, an anomaly-based IDS is able to identify any deviation from the normal profile as [an attack] (…san et al., 2021; Cao et al., 2019). Such attacks may evade advanced security technologies and cause heavy damage to networks (Cao et al., 2019). Traditional machine learning (ML) methods (Shafiq et al., 2021) have been proven able to identify malicious attacks in IoT networks thanks to their advantages in behavior analysis (Keshk et al., 2021). Nevertheless, ML has also been shown to handle large amounts of complex, high-dimensional data poorly, and it predominantly exhibits high false positive rates for attack detection (Bengio et al., 2013). Nowadays, Deep Learning (DL) methods have been receiving considerable attention in intrusion detection (Mahdavifar and Ghorbani, 2019; Wu et al., 2020). Advances in DL have spawned novel IDS technologies capable of handling the complexity and sophistication of existing attacks (Li et al., 2022a).

∗ Corresponding author.
E-mail addresses: yaow.neu@gmail.com (W. Yao), shihan.neu@gmail.com (H. Shi), zhaoh@mail.neu.edu.cn (H. Zhao).

Obtaining labeled attacks from IoT networks is usually a difficult task, which can be time-consuming and even impossible in some complex scenarios. Thus, unsupervised learning techniques are considered a promising way forward, being able to identify anomalies without the need for labels (Choi et al., 2019; Schlegl et al., 2019; Zenati et al., 2018b). Nevertheless, most existing unsupervised methods are built on simple linear projections and transformations, which often fail to find the hidden relationships in data (Li et al., 2019). Moreover, they generally use a simple comparison to distinguish between normal and abnormal data, which often increases false alarm rates. This may not be sufficient for anomaly detection because the control boundaries are not flexible enough and cannot effectively identify malicious attacks. Thus, it is crucial to develop a novel unsupervised method to detect attacks in IoT networks.

Meanwhile, existing approaches for anomaly detection mainly rely on a centralized cloud. This means they lack the capability to meet the needs of current IoT applications, such as computational distribution, low latency, and scalability (Wang et al., 2020; de Araujo-Filho et al., 2021; Li et al., 2022a). Moreover, cloud-based methods usually incur a high recognition time to detect attacks due to the long distance between IoT devices and the IDS, leading to long communication times (Lim et al., 2020). As an alternative, edge computing faces a serious obstacle to deploying an IDS directly at the edge level due to limited processing and battery resources (Ning et al., 2021). A newer computing paradigm, fog computing, brings cloud-to-things services next to end users by grouping available computing, storage, and processing resources at the network edge (Ni et al., 2018). Fog computing can reduce the costly storage and processing generated by IoT devices and enable real-time detection with high efficiency and flexibility. Thus, the fog layer provides a unique opportunity for IoT to deploy intelligent and collaborative security solutions. Consequently, an unsupervised method can take advantage of fog computing to efficiently identify attacks in IoT networks.

Recently, Generative Adversarial Networks (GAN) (Li et al., 2021) have been widely used as an effective unsupervised anomaly detection approach in various fields, such as computer vision and cyber security (Schlegl et al., 2019; Zhang et al., 2022). The standard GAN consists of a generator and a discriminator. The generator produces fake samples from a specific latent space, while the discriminator attempts to distinguish fake samples from real ones. When trained on normal data, a GAN can be utilized to detect anomalies by observing its anomaly score. The trained GAN often produces small reconstruction errors on normal data and much higher values on abnormal data. Although a GAN can approximate the normal data distribution and generate realistic data samples, it is still difficult to find a suitable latent representation 𝑧 to determine whether a given test sample conforms to the normal data distribution. More importantly, our goal in learning the data distribution is to estimate whether a test sample is abnormal by estimating how well the given sample fits the learned distribution. Therefore, instead of using a standard GAN, we adopt Bidirectional Generative Adversarial Networks (BiGAN) (Donahue et al., 2018) to propose a novel anomaly detection method for IoT networks. BiGAN introduces an encoder network, which alleviates the computation of the reconstruction error for a given input sample 𝑥 (i.e., we do not need to spend expensive time finding a suitable latent representation 𝑧 for the sample 𝑥). Thus, BiGAN is better suited to our goal. However, when the BiGAN model learns to capture the distribution of high-dimensional IoT data, the model gradient may vanish, which leads to low detection performance. To solve this issue, we introduce the Wasserstein distance (Arjovsky et al., 2017) into the adversarial loss to train the BiGAN model, which helps to stabilize and accelerate model training. To distinguish the potential distribution of normal data from IoT anomalies, we develop a code classifier that learns their latent representations to enhance detection accuracy. To reduce the false alarm rate, we add a consistency loss to guarantee that normal data are mapped to the desired outputs and reconstructed precisely. In addition, we use a feature-based discrimination loss as a novel anomaly score function to identify potential IoT anomalies. Finally, to achieve low detection latency and communication cost, our anomaly detection framework combines the advantages of fog computing and cloud computing to detect anomalies. The difference between the proposed method and previous methods is that our method not only has stronger generalization and prediction capability for detecting anomalies, by focusing on the latent representation and reconstruction of data, but also achieves scalable and efficient intrusion detection by exploiting fog computing. Previous methods mainly rely on simple comparisons in the data space to identify anomalies, which does not help to reduce false positive rates. They also use centralized deployment for anomaly detection, which cannot support efficient real-time detection. The main contributions of this paper are summarized as follows.

1. We propose a novel unsupervised DL method for detecting attacks in IoT networks utilizing an improved GAN, which not only solves the challenge of obtaining labels but also improves detection performance.
2. We develop a code classifier and a cycle-consistent loss to learn a new latent space representation, which aims to enhance the capability of unsupervised anomaly detection methods in detecting intrusions in IoT networks.
3. We design a scalable anomaly detection framework by making use of the benefits of fog nodes and a cloud server. This helps to reduce response time and communication overhead. Extensive experiments are conducted on two recent intrusion datasets to assess the performance of the proposed method. The experimental results show that our method greatly outperforms existing methods for anomaly detection, while keeping a lower recognition time and communication cost than the cloud-based way.

The rest of this paper is organized as follows. Section 2
briefly introduces an overview of recent research on intrusion detection. In Section 3, we present the system model in IoT networks and some fundamental preliminaries. Section 4 describes the details of the proposed anomaly detection method. Then we demonstrate and analyze the evaluation results in Section 5. Section 6 analyzes the results obtained from the proposed method and compares them with state-of-the-art IDS studies. Section 7 presents the limitations of the proposed method. Finally, Section 8 concludes this paper and presents potential future work.

2. Related Work

In this section, we briefly review existing studies of IDS and recent advances in deep learning-based intrusion detection methods.

2.1. Traditional Intrusion Detection Methods

As an indispensable security tool, an IDS can collect and screen network traffic to detect whether a network has been compromised. A large number of intrusion/anomaly detection methods based on machine learning (ML) have been investigated. For example, Chawathe (2018) employed many ML approaches, such as J48 and Random Forest (RF), to detect IoT attacks. Nevertheless, these methods are not effective at identifying novel botnets. To detect point-wise and collective anomalies, Marteau (2021) utilized a forest of binary partitioning trees to build an intrusion detection model. By introducing the distance-based paradigm, the proposed method outperforms the Isolation Forest (IF) approach. Nomm and Bahsi (2018) presented a new method for identifying anomalies in IoT datasets. In the proposed method, they first resampled normal data from the IoT dataset to expand the training dataset. Then, the trained Local Outlier Factor (LOF) and One-Class Support Vector Machine (OCSVM) were employed to detect malicious attacks. The sampling technique may alter the distribution of the original data, thus affecting the effectiveness of LOF and OCSVM. In Yang et al. (2022), a Cluster Labeling (CL) K-Means was used as an unsupervised learner for zero-day attack detection. The authors first employed Kernel PCA to select the relevant features for reducing dimensionality and then exploited two biased classifiers and Bayesian optimization with a Gaussian process (BO-GP) to optimize the model, which realizes much improved performance. Different from this, Abdelmoumin et al. (2022) proposed a triple stacking-based ensemble learning technique for detecting anomalies in IoT. This approach ensembles Principal Component Analysis (PCA), One-Class SVM, and two-class Neural Networks to aggregate the predictions, which provides better detection accuracy than a single model. Similarly, Khan et al. (2023) presented an ensemble IDS model utilizing AutoML based on a soft voting method in the network environment. Nevertheless, ensemble learning is costly in terms of training time, testing time, and computational overhead, which can lead to high latency and resource utilization in IoT. To tackle this issue, Qi et al. (2022) combined locality-sensitive hashing (LSH), isolation forest, and PCA techniques to efficiently and accurately detect point anomalies and partial group anomalies in Industry 4.0.

However, these ML methods often rely on the robustness of feature engineering and only capture shallow feature relationships in the data, which may limit their learning capability. Moreover, their detection performance degrades when utilized on high-dimensional IoT data and cannot satisfy increasingly complex and demanding IoT data (Moustafa et al., 2021).

2.2. Deep Learning-based Intrusion Detection

Deep learning (DL) has attracted extensive interest in intrusion/anomaly detection due to its powerful learning abilities (especially from high-dimensional data) and its independence from feature engineering. Some advanced deep learning methods have been employed in IDS for IoT networks, such as Convolutional Neural Networks (CNN) (Ferrag et al., 2020), Long Short-Term Memory (LSTM) (Al-Hawawreh et al., 2021), and Recurrent Neural Networks (RNN) (Zhang et al., 2022). For example, Gao et al. (2021) explored LSTM and feed-forward neural networks (FNN) to identify attacks respectively and then combined them in their experiments. Relying on improved feature representation, Zhou et al. (2021) developed an IDS utilizing variational LSTM to handle the complexity of data compression while retaining critical features. Ding and Li (2022) proposed an intrusion detection method that exploits a Graph Convolutional Network (GCN) to build graph-structured data from network traffic and combines LSTM with an attention mechanism to capture the time dependence of network traffic. To resolve the problem of data imbalance, Park et al. (2022) developed a novel Boundary Equilibrium GAN (BEGAN)-based IDS to detect network attacks. They extracted features from a trained encoder and used DNN, CNN, and LSTM as supervised classifiers, which improves detection performance. However, the computational cost of these DL methods is expensive.

Apart from the previous approaches using supervised learning as an anomaly detection model, several studies have applied unsupervised learning, especially autoencoder (AE) models (Vu et al., 2022; Monshizadeh et al., 2022; Yang and Hwang, 2022). Vu et al. (2022) introduced three regularized AE architectures (i.e., MAE, MVAE, MDAE) to build the desired feature representation of IoT data, which was then utilized to facilitate a supervised learner in detecting attacks. Yang and Hwang (2022) proposed an unsupervised, ensemble-based anomaly detection method based on AE and Mahalanobis distances. An anomaly score was then calculated for testing samples by a weighted sum of the calculated reconstruction loss and the Mahalanobis distance over the output features of each layer in the autoencoder. Also, an anomaly detection approach based on a hierarchical AE was developed by Kye et al. (2022). Unlike Yang and Hwang (2022), this method has multiple detection stages, where the first stage, based on the encoder output, identifies anomalies using a normalized 𝐿1 norm distance, and the remaining stages, based on decoder and hidden-layer outputs, detect anomalies utilizing the Mahalanobis distance. Li et al. (2022b) employed autoencoders and temporal convolutional networks to identify early failures of large mechanical equipment. As one of the significant results, Alsaedi et al. (2023) designed an integrated method called USMD to identify attacks on Cyber-Physical Systems. They first employed a Temporal Dependencies Network (TDN) based on LSTM and a Temporal Attention Unit for multi-sensor data, and then used Isolation Forest to calculate misbehaviour scores, which achieves F-scores of 96.99% and 97.02% on the SWaT and WADI datasets, respectively.

As another way of applying unsupervised learning, Li et al. (2018) proposed a new method called GAN-AD to identify possible anomalies in cyber-physical systems. Long short-term memory recurrent neural networks (LSTM-RNN) were employed to obtain the distribution of multivariate IoT data. They realized a high detection rate and low false positives in detecting anomalies caused by various attacks. The authors then extended GAN-AD to propose an unsupervised multivariate anomaly detection method with GAN (MAD-GAN) (Li et al., 2019) to identify attacks utilizing a new anomaly score called DR-Score. Moreover, Schlegl et al. (2017) presented AnoGAN, which combines the discrimination loss and reconstruction loss to define an anomaly score for medical images. However, these methods require computationally expensive latent-space optimization, which may not be suitable for real-time IoT applications such as e-healthcare. Many studies solve this issue by introducing an encoder that maps input data to a latent representation. Schlegl et al. (2019) proposed the f-AnoGAN method by introducing an encoder and the more stable Wasserstein GAN. However, it does not emphasize the importance of learning the encoder and generator jointly, which may degrade detection performance. Other works applied efficient GAN-based anomaly detection to network data, which possess important specificity and complexity. In addition, de Araujo-Filho et al. (2021) proposed an improved GAN, called FID-GAN, to identify anomalies in IoT while realizing relatively low detection latency. They introduced a trained encoder and estimated the loss function using the reconstruction of data converted into the latent space. Cui et al. (2022) exploited an Energy-based Generative Adversarial Network (EBGAN)-based IDS for complex high-dimensional data. They then estimated abnormality based on sample reconstruction using the autoencoder in EBGAN. Despite the merits of GAN-based approaches, they may still suffer from high computational complexity and false positives, which limits their applicability in real-time or resource-constrained IoT applications.

2.3. Deep Learning for Fog/Edge Computing

Currently, several works have investigated different methods in fog/edge environments (Moustafa et al., 2021; Yao et al., 2021; Abdel-Basset et al., 2021; Nie et al., 2022). Moustafa et al. (2021) developed a novel distributed anomaly detection method using ensemble statistical learning to timely and effectively identify zero-day attacks in edge networks. To capture long-standing dependencies and complement parallel computation on IoT traffic sequences, Abdel-Basset et al. (2021) introduced a new DL model to identify attacks in fog-based industrial IoT by integrating gated recurrent units (GRU) and multi-head attention (MA). Nie et al. (2022) proposed a GAN-based intrusion detection method to detect new attacks in a collaborative edge computing environment. Li et al. (2022a) developed an LSTM-AE-based anomaly detection framework, which leverages edge computing to discover potential attacks in IoT networks. The authors showed that the proposed model could achieve a high F1 measure compared with other conventional ML models.

Although the above studies have achieved promising results, there are still open issues in IoT network anomaly detection concerning performance, scalability, and computation. In this paper, we propose to take advantage of the Wasserstein distance to effectively stabilize model training and enhance detection performance. Moreover, we introduce a code classifier and a cycle-consistency constraint to learn the desired latent representations of input samples, which helps to distinguish abnormal data from normal data. Finally, we exploit fog nodes and a cloud server to jointly train the proposed DL method for detecting anomalies in IoT networks, which can reduce response time and communication cost.

3. Problem Statement

[...] are relatively secure (Liyanage et al., 2021). Moreover, the communication between gateways and the cloud server is protected through encrypted secure channels. In other words, only IoT devices, which are the main sources of traffic within the networks, are considered as victims that may potentially be compromised.

2) Stable Traffic Pattern: The traffic patterns of IoT devices in a local network are considered to be relatively stable (Meidan et al., 2018). This is because of the periodic and steady flow of IoT data, which is simpler than its dense and erratic conventional network traffic counterparts.

3) Sufficient Prior Knowledge of Normal Traffic: Although it is difficult to describe benign IoT profiles accurately and comprehensively, we still assume that there are enough IoT
devices and IoT systems that can obtain reliable normal IoT traffic with the help of network experts.

[...] will perform poorly in the IoT network due to the high number of devices and the long distance between them. Therefore, as shown in Fig. 1, our proposed IDS framework works as [...]

Fig. 1. System model (cloud layer with data center, fog layer, and edge layer with transportation, office facility, video monitoring, and smart devices).

[...]ligent devices. Thus, the proposed IDS would be deployed [...] all attacks from the edge layer. Each fog node detects attacks by simply processing the data obtained from the IoT devices connected to that node. Therefore, the detection load is shared among the fog nodes of the fog layer, which improves detection performance. Additionally, attack detection is performed close to the IoT devices, reducing the detection response time. The deployment of the proposed IDS would achieve high protection against malicious attacks while monitoring gateways in edge networks.

[...] function to the cloud. [...] can be regarded as a 1-vs-1 structure. The details of the proposed detection framework are presented in Section 4.6.

• Fog Layer. The fog layer consists of many fog nodes that group computation resources near the network edge. As the core of the IoT network, fog computing acts as an intermediary between the edge and the cloud, managing the IoT traffic under its coverage. Thus, the fog layer basically has the available processing capability, storage, and networking resources to satisfy various network services.

3.3. Preliminaries

BiGAN: The standard GAN is unable to determine the inverse mapping from input data to latent representation, and an expensive latent-space optimization is required during training. To address this issue, Bidirectional Generative Adversarial Networks (BiGAN) (Donahue et al., 2018)
Fig. 2. Proposed GAN model architecture, including the generator G, encoder E, discriminator D, and classifier C; 𝐿1 denotes the cycle consistency loss.

[...] the differences for the parameters of the generator may be discontinuous, making it difficult to reach a Nash equilibrium. Wasserstein GAN (WGAN) (Arjovsky et al., 2017) uses the Earth-Mover distance 𝑊 to measure the distance between the real and fake data distributions to facilitate GAN training. The WGAN objective is described as

min_G max_{D∈𝑊_𝐷} 𝔼_{𝑥∼𝑝_𝑥}[D(𝑥)] − 𝔼_{𝑧∼𝑝_𝑧}[D(G(𝑧))]

Fig. 3. (a) Loss value for the discriminator using the JS distance during model training. (b) Performance result using the JS distance on the UNSW-NB15 dataset.

4. Proposed Anomaly Detection Method

The core idea behind our anomaly detection method is to build an effective model that can identify anomalous IoT data by learning the latent representation of the normal-sample manifold through encoding and decoding. The model should have the powerful ability not to reconstruct anomalies as "normal" [...]

[...] confidence. The more realistic the data, the higher the score given. [...] the dimensional expansion may be so severe that the real and generated data have almost no overlapping support. For example, when latent variables are mapped to a higher-dimensional space by the generator, the high-dimensional space is entirely constrained by the samples from the low-dimensional space. Thus, the support dimension of the high-dimensional space is actually the dimension of the latent space. Under the influence of dimensional expansion, the two distributions [...]
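The advantage of the Wasserstein distance over the JS divergence in this no-overlap regime can be illustrated numerically (a toy example, not from the paper): for distributions with disjoint supports, the JS divergence saturates at log 2 regardless of how far apart they are, so its gradient carries no information, while the Wasserstein-1 distance still reflects the separation.

```python
import numpy as np

def js_divergence(p, q):
    """JS divergence between two discrete distributions."""
    m = 0.5 * (p + q)
    def kl(a, b):
        mask = a > 0                   # m > 0 wherever a > 0, so this is safe
        return np.sum(a[mask] * np.log(a[mask] / b[mask]))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def w1_discrete(p, q):
    """1-D Wasserstein-1 on a unit-spaced grid: L1 distance between CDFs."""
    return np.sum(np.abs(np.cumsum(p) - np.cumsum(q)))

grid = 10
p = np.zeros(grid); p[0] = 1.0         # "real" mass at position 0
for shift in (2, 8):                   # "generated" mass near vs. far away
    q = np.zeros(grid); q[shift] = 1.0
    # JS stays at log 2 ~ 0.693 for both shifts; W1 grows with the shift.
    print(shift, js_divergence(p, q), w1_discrete(p, q))
```

This is exactly the failure mode shown in Fig. 3: with disjoint supports the JS-based discriminator loss plateaus, whereas the Wasserstein critic still provides a useful training signal.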
[...] traffic is generally stable and has a relatively centralized likelihood distribution, the Wasserstein distance is beneficial for describing and identifying IoT traffic patterns.

Moreover, to enforce the Lipschitz continuity of the discriminator D, we employ the gradient penalty regularization term proposed by Gulrajani et al. (2017), which penalizes gradient norms not equal to 1. Therefore, the objective in Eq. (3) can be reformulated as

min_{G,E} max_{D∈𝑊_𝐷} 𝑉_{𝑎𝑑𝑣}(G, E, D) = 𝔼_{𝑥∼𝑝_𝑥}[D(𝑥, E(𝑥))] − 𝔼_{𝑧∼𝑝_𝑧}[D(G(𝑧), 𝑧)] + 𝜇 GP(𝑥̃, 𝑧̃)    (4)

where GP represents the penalty term in Gulrajani et al. (2017), 𝜇 is the weight controlling the gradient penalty, and (𝑥̃, 𝑧̃) denotes sampling uniformly along straight lines between the distributions 𝑝_𝑥 and 𝑝_𝑧. In this work, we use the default value 𝜇 = 10 provided by Gulrajani et al. (2017). For Eq. (4), G and E are trained to minimize the objective function and make the generated couple (G(𝑧), 𝑧) close to (𝑥, E(𝑥)), while the discriminator D is trained to maximize the objective function and discriminate between (G(𝑧), 𝑧) and (𝑥, E(𝑥)) couples. Through this mini-max game, D can finally lead the data generated by G to be close to the real normal data.

[...] from E are different from the latent variables 𝑧 (i.e., the coding loss). More formally, the objective function 𝑉_{𝑐𝑜𝑑}(E, C) can be expressed as

min_E max_C 𝑉_{𝑐𝑜𝑑}(E, C) = 𝔼_{𝑥∼𝑝_𝑥}[log C(𝑧̂)] + 𝔼_{𝑧∼𝑝_𝑧}[log(1 − C(𝑧))]    (5)

where 𝑧̂ = E(𝑥) denotes the encoded latent representation of normal data 𝑥. To increase the stability and robustness of the classifier C, we also apply the Wasserstein distance to the objective function in Eq. (5). Therefore, the objective function can be reformulated as [...] For abnormal random noise 𝑧, the classifier should identify it as abnormal, i.e., C(𝑧) = 0.

4.3. Cycle Consistency Loss

The original BiGAN does not impose any constraint on the alignment of G and E (i.e., G⁻¹ = E), which can result in mismatches in the reconstruction of normal samples; hence, the false alarm rate in identifying anomalies is high. For example, the mapping of input data may reach positions in the latent representation that are only sparsely sampled during training, which can fail to convince the discriminator after the inverse mapping. Consequently, there may be small residuals even for anomalous samples, leading to erroneous anomaly scores. To narrow the possible mapping search space, a cycle consistency regularization term on G and E is employed to facilitate their alignment. Zhu et al. (2017) first introduced the cycle consistency loss for computer vision tasks. Inspired by Zhu et al. (2017), we combine G and E with the cycle loss by minimizing the 𝐿1 norm between the generated fake data and the reconstructed ones, which is expressed as

𝑉_{𝑐𝑦𝑐}(G, E) = 𝔼_{𝑥∼𝑝_𝑥}[‖𝑥 − G(E(𝑥))‖₁]    (7)

This loss function describes the latent space differently from the other loss (i.e., 𝑧 − E(G(𝑧))), which helps to distinguish between normal data and IoT anomalies; see the evaluation in Section 5.5.

[...] where 𝜎 controls the importance of the consistency loss. As illustrated in Fig. 2, the benefits of our method for anomaly detection are as follows. First, the Wasserstein distance is utilized to improve model training. Second, we use a code classifier to distinguish between random latent variables and potential latent representations, which contributes to improving the network's learning capability. Third, the encoder and generator (i.e., decoder) with the cycle-consistency loss enable the model to effectively encode and decode the input data. Therefore, the well-trained encoder and generator can be used as an anomaly detection measure. Next, we explain the details of the anomaly detection phase utilizing the proposed method.
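The gradient-penalty objective of Eq. (4) can be sketched numerically as follows. A linear toy critic replaces the trained discriminator so the input gradient is analytic; all names and shapes are illustrative, not the paper's architecture.

```python
import numpy as np

# Toy critic D(v) = w.v over a concatenated (data, latent) pair, so that
# the gradient of D with respect to its input is just w.
rng = np.random.default_rng(1)
d = 6                                  # dim of a concatenated (data, latent) pair
w = rng.normal(size=d)                 # critic weights

def D(v):
    return w @ v                       # critic score of a joint pair

def grad_D(v):
    return w                           # analytic input gradient of the linear critic

real = rng.normal(size=d)              # stands in for (x, E(x))
fake = rng.normal(size=d)              # stands in for (G(z), z)

eps = rng.uniform()                    # sample uniformly along the line
interp = eps * real + (1 - eps) * fake # the interpolate (x_tilde, z_tilde)

mu = 10.0                              # default penalty weight used in the paper
gp = mu * (np.linalg.norm(grad_D(interp)) - 1.0) ** 2

# Critic loss to minimize: -(D(real) - D(fake)) plus the gradient penalty.
critic_loss = D(fake) - D(real) + gp
print(critic_loss)
```

The penalty is zero exactly when the critic's input gradient has unit norm along the interpolation line, which is how the Lipschitz constraint of Eq. (4) is enforced softly.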
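The cycle-consistency term of Eq. (7) is simply a batch-averaged 𝐿1 distance between inputs and their reconstructions G(E(𝑥)); a minimal sketch with placeholder G and E callables:

```python
import numpy as np

def cycle_consistency_loss(x_batch, G, E):
    """Mean L1 norm of x - G(E(x)) over a batch of row vectors."""
    recon = np.stack([G(E(x)) for x in x_batch])
    return np.mean(np.abs(x_batch - recon).sum(axis=1))

# An identity G∘E gives zero loss; a biased reconstruction does not.
x_batch = np.ones((4, 5))
assert cycle_consistency_loss(x_batch, lambda z: z, lambda x: x) == 0.0
print(cycle_consistency_loss(x_batch, lambda z: z + 0.1, lambda x: x))
```

Minimizing this term pushes E toward an approximate inverse of G on normal data, which is the alignment constraint that the original BiGAN lacks.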
Notation   Description
𝑁          [...]
𝑚          Number of iterations of the critic per epoch
𝑥          Real normal flow
𝑧          Random noise vector
𝐹𝒆         Error terms of the generator and encoder

4.5. Anomaly Detection Score

Once the model is well trained, the detection process is generally reconstruction-based. Samples that fit the learned normal data distribution should be accurately reconstructed, while anomalous ones should be poorly reconstructed. To this end, one can calculate E(𝑥) and construct the reconstruction [...] phase. Since our model is trained only on normal data, 𝐴(𝑥) reflects the discriminator's confidence that the samples are well encoded and reconstructed by the encoder and generator, and thus drawn from the normal data distribution (Schlegl et al., 2017). The larger the value 𝐴(𝑥), the more likely a sample 𝑥 is deemed anomalous.

Fig. 4. Anomaly detection framework, where the discriminator and classifier are located on the cloud server and the generator and encoder are hosted on the fog node.

Why does the discriminator play an important role in anomaly detection? According to the training objective of the proposed GAN, the discriminator should give smaller values to generated samples that are far from the real samples. In practical training, the generator often fails to produce a distribution that matches the training data very well. Generated samples that are very different from the training samples become abnormal data for the discriminator. The discriminator is then trained to give small values to these samples, and also to anomalies. This explains why the output of the discriminator in the anomaly score helps to identify anomalies. The hidden vector in the last layer of the discriminator is often very different for an anomaly and its reconstruction. In the proposed GAN, this difference is utilized to calculate the anomaly score.

[...] of a GAN and then concludes that the discriminator is no better than a random guess at the equilibrium (Salimans et al., 2016). Consider that D is expected to distinguish between the real data pair (𝑥, E(𝑥)) and its reconstruction (G(E(𝑥)), E(𝑥)). At the same time, the generator and encoder may finally perfectly capture the real data and latent variable distributions. In this situation, D is unable to accurately distinguish between real samples and reconstructed ones, and thus may produce a random prediction, which cannot serve as an informative anomaly score. We validate that our proposed anomaly score performs better than the output of D through the experiment in Section 5.5.

4.6. Intrusion Detection System

In this section, we describe how the proposed method is implemented in IoT networks. To train an anomaly detection model over the network, we utilize fog nodes and a cloud server to jointly design the detection model. The whole proposed framework is illustrated in Fig. 4. A discriminator and a classifier are both located on the cloud server, whereas a generator and an encoder are hosted on a fog node. The anomaly detection model consists of two steps: 1) train- [...]
8
Journal Pre-proof
of
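The feature-space anomaly score 𝐴(𝑥) described above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation: the linear maps `We`, `Wg`, `Wf` are random stand-ins for the trained encoder, generator, and the discriminator's last hidden layer, and the sizes are only illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
d_x, d_z, d_f = 77, 32, 64                         # feature, latent, hidden sizes (illustrative)
We = rng.normal(scale=0.1, size=(d_z, d_x))        # stand-in for the trained encoder E
Wg = rng.normal(scale=0.1, size=(d_x, d_z))        # stand-in for the trained generator G
Wf = rng.normal(scale=0.1, size=(d_f, d_x + d_z))  # stand-in for D's last hidden layer f(.)

def feat(x, z):
    """Hidden vector of the discriminator's last layer for a pair (x, z)."""
    return np.tanh(Wf @ np.concatenate([x, z]))

def anomaly_score(x):
    """A(x): distance in D's feature space between the pair (x, E(x)) and
    its reconstruction (G(E(x)), E(x)); larger means more anomalous."""
    z = We @ x                 # E(x)
    x_rec = Wg @ z             # G(E(x))
    return float(np.abs(feat(x, z) - feat(x_rec, z)).sum())

x = rng.random(d_x)
print(anomaly_score(x))        # a non-negative scalar score
```

With trained networks, normal flows would yield small scores and anomalies large ones; a threshold on 𝐴(𝑥) then gives the detection decision.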
1) Training on Cloud Server: The cloud server hosts a discriminator 𝐷 and a classifier 𝐶, both multi-layer perceptron neural networks. Every global iteration, the cloud first receives a mini-batch from the generator and the encoder at the fog side, which consists of 𝑛 generated samples {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}ⁿᵢ₌₁ and 𝑛 real samples {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))}ⁿᵢ₌₁. The loss function 𝐿_𝐷 of the discriminator 𝐷 is denoted as

𝐿_𝐷 = (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾) − (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾)) + 𝜇 𝐺𝑃(𝑥̃, 𝑧̃)    (10)

and the loss function 𝐿_𝐶 of the classifier 𝐶 is given in Eq. (11).

Meanwhile, the cloud server calculates the error terms 𝐹_𝒆 of 𝐺 and 𝐸 using the received traffic and transfers them to the fog side, which contributes to updating the parameters of the generator 𝐺 and encoder 𝐸. Formally, 𝐹_𝒆 = {𝒆₁, …, 𝒆ₙ}, where 𝒆ᵢ is defined as

𝒆ᵢ = 𝜕𝐿_{𝐺,𝐸} / 𝜕𝜻ᵢ    (12)

where 𝜻ᵢ denotes the 𝑖-th data of the combined 𝐸(𝑥) and 𝐺(𝑧), and 𝐿_{𝐺,𝐸} is the loss function of 𝐺 and 𝐸.

2) Training and Detection on Fog Node: The fog node hosts an encoder 𝐸 and a generator 𝐺 with their parameters 𝑤_{𝑔,𝑒}. The encoder and generator are both multi-layer perceptron neural networks. Every global iteration, for a fixed 𝑤_{𝑔,𝑒}, the fog node first generates and sends the couples of generated samples (𝐺(𝑧ᵢ), 𝑧ᵢ) and real samples (𝑥ᵢ, 𝐸(𝑥ᵢ)) to the cloud server for training the discriminator 𝐷 and classifier 𝐶. Then, the fog node receives the error terms 𝐹_𝒆 from the cloud, corresponding to the error made by 𝐺 and 𝐸. Here 𝐿_{𝐺,𝐸} represents the loss function of 𝐺 and 𝐸, which is expressed as

𝐿_{𝐺,𝐸} = (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾)) − (1/𝑛) ∑ᵢ₌₁ⁿ 𝐷(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾) + (1/𝑛) ∑ᵢ₌₁ⁿ ‖𝑥⁽ⁱ⁾ − 𝐺(𝐸(𝑥⁽ⁱ⁾))‖₁    (13)

Thus, the gradient ∇𝑤_{𝑔,𝑒} of the generator and encoder is deduced as

∇𝑤_{(𝑔,𝑒),𝑗} = 𝜕𝐿_{𝐺,𝐸} / 𝜕𝑤_{(𝑔,𝑒),𝑗} = (1/𝑛) ∑_{𝜻ᵢ∈𝜻} (𝜕𝐿_{𝐺,𝐸}/𝜕𝜻ᵢ)(𝜕𝜻ᵢ/𝜕𝑤_{(𝑔,𝑒),𝑗}) = (1/𝑛) ∑_{𝜻ᵢ∈𝜻} 𝒆ᵢ (𝜕𝜻ᵢ/𝜕𝑤_{(𝑔,𝑒),𝑗})    (14)

Step 1. The input data traffic is first preprocessed and normalized into [0, 1] to accelerate the model convergence. The data preprocessing process is detailed in Section 5.2.

Step 2. Every global iteration, based on the training dataset, the method first utilizes the generator and encoder in the fog node to generate a batch of network traffic with 𝑛 samples, including pairs of fake data {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}ⁿᵢ₌₁ and pairs of normal data {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))}ⁿᵢ₌₁. This data traffic is sent to the cloud server for updating the parameters of 𝐷 and 𝐶 and for calculating the error terms of 𝐺 and 𝐸.

Step 3. On the other hand, the discriminator and classifier in the cloud server utilize the received flow containing generated data and real data from the fog node to produce the trained model. The discriminator and classifier utilize these data to compute the gradients ∇𝑤_𝑑 and ∇𝑤_𝑐 with the loss functions 𝐿_𝐷 and 𝐿_𝐶 for updating their parameters 𝑤_𝑑 and 𝑤_𝑐 according to Eq. (10) and Eq. (11), respectively.

Step 4. Then, the error terms 𝐹_𝒆 of the generator and encoder are also computed in the cloud server by using the received traffic together with Eq. (12) and Eq. (13), and are then transferred to the fog side.

Step 5. After receiving the error terms 𝐹_𝒆 from the cloud server, the generator and encoder in the fog node use them to update their parameters 𝑤_{𝑔,𝑒} according to Eq. (14).
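The exchange of error terms in Eqs. (12) and (14) is ordinary backpropagation partitioned across two machines: the cloud differentiates the loss with respect to the received activations 𝜻ᵢ, and the fog node multiplies the returned error terms by its local Jacobian 𝜕𝜻ᵢ/𝜕𝑤. A toy NumPy check with a linear "encoder" 𝜻 = 𝑊𝑥 and a quadratic loss (both hypothetical stand-ins, not the paper's networks or losses) confirms that the split gradient matches the directly computed one:

```python
import numpy as np

rng = np.random.default_rng(1)
n, d_x, d_z = 4, 5, 3
W = rng.normal(size=(d_z, d_x))    # fog-side parameters (toy stand-in for w_{g,e})
X = rng.normal(size=(n, d_x))      # a mini-batch of n flows

Z = X @ W.T                        # fog computes activations ζ_i = W x_i and sends them
# Cloud side: toy loss L = (1/(2n)) Σ_i ||ζ_i||²; error terms e_i = ∂L/∂ζ_i (cf. Eq. (12))
err = Z / n                        # e_i = ζ_i / n, returned to the fog node
# Fog side: chain rule ∇W = Σ_i e_i (∂ζ_i/∂W), cf. Eq. (14)
grad_split = err.T @ X
# Direct differentiation of the same loss, for comparison
grad_direct = (Z.T @ X) / n
print(np.allclose(grad_split, grad_direct))  # True
```

This split means the raw model parameters never leave the fog node; only activations travel up and error terms travel down.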
Fig. 5. The overview of the proposed framework for intrusion detection in IoT, following Section 4.6: the input data traffic is initially preprocessed. A pair of real data and fake data is generated by the fog node and then sent to the cloud server to train the model. The cloud server determines the authenticity of the pair of data and updates the parameters of the discriminator and classifier. The error terms are returned to the fog node to update the generator and encoder. The trained model on the fog node is utilized to detect anomalies in the testing data.
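The closed loop in Fig. 5 (generate, send, compute errors, update, repeat) can be simulated in-process. The NumPy sketch below is purely illustrative: the "cloud loss" is a toy quadratic rather than the paper's GAN objectives, and `fog_forward` stands in for the encoder/generator pair; it only demonstrates that the round trip of activations and error terms drives the fog-side parameters downhill.

```python
import numpy as np

rng = np.random.default_rng(2)
n, d_x, d_z = 16, 8, 4
w_fog = rng.normal(scale=0.5, size=(d_z, d_x))  # toy encoder parameters on the fog node
X = rng.normal(size=(n, d_x))                   # preprocessed training flows (step 1)

def fog_forward(X, w):
    """Fog side: encode a batch, i.e. the 'pair of data' sent to the cloud (step 2)."""
    return X @ w.T

def cloud_errors(Z):
    """Cloud side: toy quadratic loss; returns the error terms e_i (steps 3-4)."""
    return Z / len(Z)

eta = 0.2
before = np.square(fog_forward(X, w_fog)).mean()
for _ in range(50):                  # step 6: repeat for a fixed budget
    Z = fog_forward(X, w_fog)        # step 2: generate and send a pair of data
    e = cloud_errors(Z)              # steps 3-4: cloud trains and returns errors
    w_fog -= eta * (e.T @ X)         # step 5: fog applies the chain-rule update
after = np.square(fog_forward(X, w_fog)).mean()
print(after < before)                # True: the round trip reduces the toy loss
```

In the real deployment the two sides would communicate over sockets, as described in Section 5.1, instead of sharing memory.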
Each flow from the testing dataset is fed into the anomaly score 𝐴(𝑥) in Eq. (9) to obtain the classification result for online anomaly detection.

The pseudo-code of our anomaly detection method, including these steps, is illustrated in Algorithm 1 and Algorithm 2. As described in Algorithm 1, in each iteration the training process of Step 3 and Step 4 on the cloud server corresponds to the pseudocode between Line 1 and Line 14, and the training process of Step 2 and Step 5 on the fog node corresponds to the pseudocode between Line 16 and Line 28. After that, the well-trained model is ready for anomaly detection. For each test flow 𝑥, Step 7, representing the test process, corresponds to the pseudocode between Line 1 and Line 8 in Algorithm 2. Thus, our method combines the advantages of the cloud and fog.

• Cloud. The discriminator and classifier require 𝑚𝑁 ⋅ (𝑑_𝐷 + 𝑑_𝐶) operations to compute the gradients and update their parameters, where 𝑑_𝐷 and 𝑑_𝐶 denote the number of operations for the discriminator and classifier with one flow, respectively. For brevity, let 𝑑_𝐷 = |𝑤_𝑑| and 𝑑_𝐶 = |𝑤_𝑐|. Hence, the computation cost on the cloud is 𝑂(𝑚𝑁𝑇 ⋅ (3|𝑤_𝑑| + 2|𝑤_𝑐|)).

• Fog. The fog node makes the generator and encoder generate the flow information, which needs (𝑁⋅𝑑_𝐺 + 𝑁⋅𝑑_𝐸) operations in one iteration, where 𝑑_𝐺 and 𝑑_𝐸 denote the number of operations for the generator and encoder with one flow, respectively. Then, it also requires
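To make the cloud-side bound concrete, here is a hedged back-of-envelope calculation. The values of 𝑁 and the parameter counts are illustrative assumptions, not measurements from the paper: |𝑤_𝑑| is estimated from the Table 3 discriminator (a 64-unit layer over the concatenated 77 + 32 inputs, then a 1-unit output), and 𝑁 is taken as the UNSW-NB15 normal training size.

```python
# Back-of-envelope for the cloud-side cost O(m·N·T·(3|w_d| + 2|w_c|)).
# m and T follow Section 5.1; N and the parameter counts are assumptions.
m, T = 5, 200                            # critic iterations per epoch, training epochs
N = 11_200                               # e.g. UNSW-NB15 normal training flows (Table 4)
w_d = (77 + 32) * 64 + 64 + 64 * 1 + 1   # discriminator weights + biases: 7105 (estimate)
w_c = w_d                                # classifier mirrors the discriminator (Section 5.1)

cloud_ops = m * N * T * (3 * w_d + 2 * w_c)
print(f"|w_d| = {w_d}, cloud cost ~ {cloud_ops:.2e} operations")
```

Even under these rough assumptions the bound stays in the low hundreds of billions of scalar operations per training run, which is comfortably within a single cloud server's capacity.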
Algorithm 1 Training based on cloud server and fog node
Input: Training data, the batch size 𝑛, the training epoch 𝑇, the number of iterations of the critic per epoch 𝑚, the learning rate 𝜂
Output: Trained generator 𝐺, encoder 𝐸, and discriminator 𝐷
1:  procedure CLOUDSERVER(𝑇, 𝑚, 𝑛, 𝜂)
2:      Initialize 𝑤_𝑑, 𝑤_𝑐 for 𝐷 and 𝐶, respectively
3:      for 𝑗 = 1 to 𝑇 do
4:          for 𝑘 = 1 to 𝑚 do
5:              {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}, {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))} ← GetSamples(𝐺, 𝐸)
6:              ∇𝑤_𝑑 ← CompDisLoss(𝐷) by Eq. (10)
7:              ∇𝑤_𝑐 ← CompClaLoss(𝐶) by Eq. (11)
8:              𝑤_𝑑 ← 𝑤_𝑑 + Adam(∇𝑤_𝑑, 𝜂)
9:              𝑤_𝑐 ← 𝑤_𝑐 + Adam(∇𝑤_𝑐, 𝜂)
10:         end for
11:         𝐹_𝒆 ← CompError(𝐺, 𝐸) by Eq. (12) and Eq. (13)
12:         Send the error terms 𝐹_𝒆 to the fog node
13:     end for
14: end procedure
15:
16: procedure FOGNODE(𝑇, 𝑛, 𝜂)
17:     Initialize 𝑤_{𝑔,𝑒} for 𝐺 and 𝐸
18:     for 𝑗 = 1 to 𝑇 do
19:         Sample {𝑥⁽ⁱ⁾}ⁿᵢ₌₁, a batch from the normal data
20:         Sample {𝑧⁽ⁱ⁾}ⁿᵢ₌₁, a batch of prior samples
21:         𝐺(𝑧⁽ⁱ⁾) ← Generator(𝑧⁽ⁱ⁾)
22:         𝐸(𝑥⁽ⁱ⁾) ← Encoder(𝑥⁽ⁱ⁾)
23:         Send {(𝐺(𝑧⁽ⁱ⁾), 𝑧⁽ⁱ⁾)}, {(𝑥⁽ⁱ⁾, 𝐸(𝑥⁽ⁱ⁾))} to the cloud server
24:         Receive the error terms 𝐹_𝒆 from the cloud server
25:         ∇𝑤_{𝑔,𝑒} ← CompLoss(𝐺, 𝐸) by Eq. (14)
26:         𝑤_{𝑔,𝑒} ← 𝑤_{𝑔,𝑒} + Adam(∇𝑤_{𝑔,𝑒}, 𝜂)
27:     end for
28: end procedure

5. Experiments and Analysis

5.1. Experimental Setup
The overall experiments of this paper are performed on an Intel i7 processor (2.9 GHz) with 16 GB of RAM, under a 64-bit Windows 10 system, equipped with an NVIDIA GTX 1070 GPU. The proposed method is implemented using the Python 3.6 programming language, Scikit-Learn libraries, and the PyTorch 1.10 environment. In our implementation, we emulate one cloud server utilizing an Alibaba virtual server and the fog-layer node utilizing a laptop, which is linked with the virtual edge. We employ sockets to realize the communication between the cloud and fog sides. For the experiments, we utilize multi-layer perceptron (MLP) neural networks with different layers to train the encoder, generator, discriminator, and classifier. The encoder has fully connected layers with 64 and 32 units, the LeakyReLU function, and a Batch Normalization operation. The generator has fully connected layers with 64 and 128 units, ReLU and Sigmoid functions, and a Batch Normalization operation. The discriminator is composed of two input channels for 𝑥 and 𝑧 with a fully connected layer with 64 units, the LeakyReLU function, a Batch Normalization operation, and a dropout layer of 0.2. The classifier has one input channel and the same network parameters as the discriminator. We obtain the hyper-parameter settings by running a series of hyperparameter experiments. The dimension of the latent space is set to 32 with 𝑧 ∼ 𝒩(0, 1). The number of critic iterations is 𝑚 = 5, and the training epoch is 𝑇 = 200 with a batch size of 𝑛 = 64. The model is optimized utilizing the well-known Adam optimizer. The model architecture and hyperparameter settings are shown in Table 3. Note that only normal data samples are used for the training process.

5.2. Datasets
A variety of intrusion detection datasets are publicly available. For our simulations, we select two represen-
Table 3
Proposed method architecture and hyperparameters.

Operation    Units       Non-Linearity   Batch Norm   Dropout
E(x)
  Linear     64          LeakyReLU       ✓            0.0
  Linear     32          None            ✗            0.0
G(z)
  Linear     64          ReLU            ✗            0.0
  Linear     128         ReLU            ✓            0.0
  Linear     196 or 77   Sigmoid         ✗            0.0
D(x, z)
  Linear     64          LeakyReLU       ✓            0.2
  Linear     1           Sigmoid         ✗            0.0
C(z)
  Linear     64          LeakyReLU       ✓            0.2
  Linear     1           Sigmoid         ✗            0.0

Table 4
The datasets for evaluating the proposed method.

UNSW-NB15                                CIC-IDS17
Class      Training Set   Testing Set    Class              Training Set   Testing Set
Normal     11200          37000          Normal             21984          167079
Fuzzers    -              6062           DoS Hulk           -              92049
Analysis   -              677            DoS GoldenEye      -              4117
Backdoor   -              583            DoS slowloris      -              2318
DoS        -              4089           DoS Slowhttptest   -              2200
Exploits   -              11132          Heartbleed         -              5
Generic    -              18871
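The encoder and generator dimensions in Table 3 can be cross-checked with a minimal NumPy forward pass. The weights below are random and untrained, the 77-feature input assumes the CIC-IDS17 case, and Batch Normalization and dropout are omitted for brevity; this is a shape sketch, not the paper's PyTorch implementation.

```python
import numpy as np

rng = np.random.default_rng(3)

def linear(d_in, d_out):
    """Random, untrained weights and biases for a fully connected layer."""
    return rng.normal(scale=0.1, size=(d_out, d_in)), np.zeros(d_out)

def leaky_relu(a, slope=0.01):
    return np.where(a > 0, a, slope * a)

d_x, d_z = 77, 32                   # 77 input features (CIC-IDS17 case), latent dim 32

e1, e2 = linear(d_x, 64), linear(64, d_z)                        # E(x): 64, then 32
g1, g2, g3 = linear(d_z, 64), linear(64, 128), linear(128, d_x)  # G(z): 64, 128, 77

def E(x):
    h = leaky_relu(e1[0] @ x + e1[1])              # LeakyReLU after the first layer
    return e2[0] @ h + e2[1]                       # last encoder layer has no activation

def G(z):
    h = np.maximum(0, g1[0] @ z + g1[1])           # ReLU
    h = np.maximum(0, g2[0] @ h + g2[1])           # ReLU
    return 1 / (1 + np.exp(-(g3[0] @ h + g3[1])))  # Sigmoid keeps outputs in [0, 1]

x = rng.random(d_x)
print(E(x).shape, G(E(x)).shape)    # (32,) (77,)
```

The Sigmoid output layer matches the [0, 1] normalization of the input features, so the reconstruction 𝐺(𝐸(𝑥)) lives in the same range as 𝑥.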