Shrink AutoEncoder For Federated Learning-Based IoT Anomaly Detection

2022 9th NAFOSTED Conference on Information and Computer Science (NICS)
Thai An Vu, Tuan Phong Tran, Ly Vu and Quang Uy Nguyen
Le Quy Don Technical University, Hanoi, Vietnam
978-1-6654-5422-3/22/$31.00 ©2022 IEEE | DOI: 10.1109/NICS56915.2022.10013475

Abstract: Federated Learning (FL)-based anomaly detection is a promising framework for Internet of Things (IoT) security. Due to the scarcity of abnormal data, unsupervised deep neural network models, such as variations of the AutoEncoder (AE), are considered effective solutions for anomaly detection in IoT devices. These models construct low-dimensional representations of the input data that are then used for classification. Nevertheless, given the enormous number of IoT devices, their intrinsic heterogeneity, and the distributed nature of the FL training process, the latent representation of the local data is distributed randomly. The determination of the global anomaly score is thus no longer accurate. To address this issue, this work provides an effective FL-based IoT anomaly detection framework with novel AutoEncoder models, namely Federated Shrink AutoEncoder (FedSAE). The proposed model forces the normal data of IoT devices to lie near the origin. Thus, a universal or global anomaly score can be determined accurately for all IoT devices. Extensive experiments on the N-BaIoT dataset indicate that FedSAE lowers the false detection rate to 1.84%, below that of the other AE-based FL frameworks for the IoT anomaly detection problem.

Index Terms: Shrink AutoEncoder, federated learning, anomaly detection, IoT.

I. INTRODUCTION

The Internet of Things (IoT) is a network of billions of IoT devices that execute independent tasks with minimum human intervention. These devices are often limited in battery and computing power, and hence more vulnerable to malicious attacks. A number of deep neural network (DNN) models have recently been developed to detect anomalous behaviors in IoT networks, e.g., [1]–[4]. To build a DNN model, we need to collect a huge amount of data generated by IoT devices, including normal and abnormal data. However, abnormal data is usually hard to collect. Thus, unsupervised DNN models that are trained only on normal data are preferred for IoT anomaly detection. Among the various DNN models, AutoEncoder (AE)-based models are considered the most effective solution for IoT anomaly detection [5]–[10].

As mentioned above, using DNN models for IoT anomaly detection usually requires massive training data at the central server [5]–[10]. However, collecting training data from IoT devices and sending it to a central server can compromise the privacy of the devices. This may discourage IoT devices from sharing their raw data for algorithm or model training purposes. To address this issue, Federated Learning (FL) frameworks have recently been introduced as a highly effective solution to train different DNN models with distributed training data [11]–[15].

Among these works, the FedDetect model proposed in [15] can be considered one of the most effective solutions for IoT anomaly detection. Specifically, FedDetect estimates the global anomaly score (see Footnote 1) as the median of the mean square error (MSE) values received from all training clients (IoT devices). However, the normal data collected by different IoT devices differ, which leads to significantly different MSE values at the training clients. As shown in Fig. 1(a), the representations of the normal data of the IoT devices are different, leading to different anomaly scores. Thus, the estimate of the global anomaly score in FedDetect is less effective.

Footnote 1: The score is used to separate the normal and abnormal data [15].

Given the above, this work proposes an effective FL-based IoT anomaly detection framework with novel AE models, namely Federated Shrink AutoEncoder (FedSAE). Specifically, we design the FL framework based on the Shrink AutoEncoder (SAE) [5], which forces all normal IoT data into the region near the origin so that we can define a universal/global anomaly score for all the IoT devices. Fig. 1(b) shows that by forcing the normal data from different IoT devices closer to the origin, the global anomaly score (the green circle) can be estimated more effectively. Thus, FL based on SAE, i.e., Federated SAE (FedSAE), can handle the problem of the global anomaly score in FL frameworks. The major contributions of this paper are as follows:
• We propose using SAE for FL (i.e., FedSAE) to handle the problem of the global anomaly score for FL-based IoT anomaly detection.
• We conduct extensive experiments to show that FedSAE can enhance the accuracy of IoT anomaly detection. Specifically, FedSAE lowers the false detection rate to 1.84%, below that of the conventional AE-based FL frameworks.

The rest of the paper is organized as follows. Section II reviews the techniques for anomaly detection related to DNN models and FL architectures. In Section III, we present the fundamental background of our proposed method. Then, the proposed model is presented in detail in Section IV. In Section V, we describe the datasets and experimental scenarios. The results and discussions of the experiments are presented in Section VI. Finally, conclusions and future work are summarized in Section VII.

[Figure 1: two scatter plots of two-dimensional latent representations, both axes ranging from −15 to 15. Panel (a): Representation of FedDetect. Panel (b): Representation of FedSAE. Legend: Anomaly, Normal IoT 1 to Normal IoT 9, Global anomaly score.]
Fig. 1. Representation of normal and abnormal data samples selected randomly from the N-BaIoT dataset [7]. The two-dimensional representations are produced by (a) FedDetect [15] and (b) FedSAE.

II. RELATED WORK

Due to the speed and volume of data generated by IoT devices, DNNs have recently been widely adopted for the anomaly detection problem [16]–[18]. Among various DNN models,
AE-based models are widely used in anomaly detection due to their simplicity, flexibility across multiple data formats, and variety of variations [5], [6], [8], [9]. Yin et al. [8] proposed a novel architecture for time series anomaly detection that combines a Convolutional Neural Network (CNN) and Long Short Term Memory (LSTM) based on AE. Its two-stage sliding-window preprocessing can be viewed as the extraction of low-level temporal features. However, these techniques are only suitable for univariate time series data. Moreover, Cao et al. [5] examined representation learning models based on AE for anomaly detection. The results showed that AE-based models can enhance the accuracy of anomaly detection compared with other methods. However, the performance of these techniques is highly dependent on anomaly scores that are selected manually based on the loss value of the AE, i.e., the Mean Square Error (MSE). In this paper, we overcome this issue by calculating an anomaly score based on multiple MSE values.

FL has recently received considerable interest for handling the data privacy of DNN models thanks to its potential to address drawbacks of centralized training systems such as latency, communication bandwidth, and data privacy [11]–[15]. The work in [11] proposed sparse ternary compression (STC), a communication protocol that utilizes sparsification, ternarization, error accumulation, and optimum Golomb encoding to compress both upstream and downstream data. However, since STC compresses models after the local training phase, the quantization process is not optimized during training. The work in [12] introduced an asynchronous client-side learning technique and a server-side temporally weighted aggregation of the local models. Furthermore, DeepFed [13] is a federated scheme using a novel gated recurrent unit-CNN model for detecting threats against industrial cyber-physical systems. In [14], the authors introduced an on-device, communication-efficient FL-based architecture for deep anomaly detection. An Attention Mechanism-based Convolutional Neural Network-Long Short Term Memory (AMCNN-LSTM) model was proposed for detecting anomalies. In practice, these FL architectures usually need to collect anomaly data for learning. However, anomaly data is not always available, especially for data collected from IoT devices. To handle this, the FL architecture based on AE, namely FedDetect, was shown to be an effective method for IoT anomaly detection [15]. This architecture leverages AE to learn the representation of normal IoT data. The anomaly score is set globally based on the MSE of the AE models from all the clients. However, because each client trains on different data collected from different IoT devices, setting the global anomaly score for all clients is difficult. We address this problem by proposing a model that forces latent representations toward the origin, thus facilitating a more accurate calculation of the global anomaly score.

III. BACKGROUND

In this section, we present the fundamental background of AE and FL, which are the key components of our proposed framework.

A. AutoEncoder

An Autoencoder (AE) [19] is a type of artificial neural network designed for unsupervised learning. It is composed of two major components: an encoder En and a decoder De. The encoder generates a low-dimensional representation of an input $x^i$ (i.e., a latent representation $z^i$), while the decoder reconstructs the data $\hat{x}^i$ from the latent representation space in order to minimize the loss function (as shown in Eq. (3)).

We use $\phi = (W_e, b_e)$ to denote the parameter set for training the encoder, where $W_e$ and $b_e$ are the weight matrix and the bias vector, respectively. Assume the latent representation $z^i$ of the input sample $x^i$ has the general form

$$z^i = q_\phi(x^i) = \sigma_e(W_e x^i + b_e), \tag{1}$$

where $q_\phi$ is the encoder, and $\sigma_e$ is the activation function of the encoder. Similarly, $\theta = (W_d, b_d)$, $p_\theta$, and $\sigma_d$ represent the parameter set, the decoder, and the activation function of the decoder, respectively. The decoder $p_\theta$ is responsible for mapping the latent representation $z^i$ to the reconstruction $\hat{x}^i$:

$$\hat{x}^i = p_\theta(z^i) = \sigma_d(W_d z^i + b_d). \tag{2}$$
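As a minimal sketch of Eqs. (1) and (2), the forward pass of a single-layer AE can be written in a few lines of NumPy. The sigmoid activations, the input size of 115 features, the two-dimensional latent space, and the random weights are illustrative assumptions, not the configuration used in the experiments.

```python
import numpy as np

def sigmoid(a):
    """Element-wise logistic function, used for both sigma_e and sigma_d (an assumption)."""
    return 1.0 / (1.0 + np.exp(-a))

def encode(x, W_e, b_e):
    """Eq. (1): z = sigma_e(W_e x + b_e)."""
    return sigmoid(W_e @ x + b_e)

def decode(z, W_d, b_d):
    """Eq. (2): x_hat = sigma_d(W_d z + b_d)."""
    return sigmoid(W_d @ z + b_d)

rng = np.random.default_rng(0)
d_in, d_latent = 115, 2  # hypothetical input and latent sizes

# Randomly initialized (untrained) parameters phi = (W_e, b_e) and theta = (W_d, b_d).
W_e = rng.normal(scale=0.1, size=(d_latent, d_in))
b_e = np.zeros(d_latent)
W_d = rng.normal(scale=0.1, size=(d_in, d_latent))
b_d = np.zeros(d_in)

x = rng.random(d_in)          # one synthetic input sample
z = encode(x, W_e, b_e)       # latent representation
x_hat = decode(z, W_d, b_d)   # reconstruction

# Squared reconstruction error of this single sample.
sq_error = float(np.sum((x - x_hat) ** 2))
print(z.shape, x_hat.shape, sq_error)
```

Training would adjust `W_e`, `b_e`, `W_d`, and `b_d` so that this squared error becomes small on normal data.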
The AE is trained to achieve a minimal reconstruction error. The reconstruction error of an AE is often estimated as the mean squared error (MSE) averaged across all data samples as follows:

$$L_{AE}(x, \phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} \|x^i - p_\theta(q_\phi(x^i))\|^2, \tag{3}$$

where $n$ is the number of data samples in the dataset.

AEs are commonly utilized in anomaly detection by training on normal data samples [5]. The training objective of AEs is to minimize the loss function in Eq. (3). Moreover, the MSE value is usually used to set up the anomaly score for the anomaly detection problem [5], [15].

B. AutoEncoder Variations

AEs with too many hidden layers can learn to simply copy their inputs without extracting essential information. The Denoising AE (DAE) [5], [20] is a regularized AE designed to acquire new characteristics and generalize more effectively. DAEs are trained to reconstruct the original input from a stochastically noise-added input. The loss function of DAEs is reformulated as follows:

$$L_{DAE}(x, \phi, \theta) = \sum_{i=1}^{n} \mathbb{E}_{p(\tilde{x}|x^i)} \|x^i - p_\theta(q_\phi(\tilde{x}))\|^2, \tag{4}$$

where $\tilde{x}$ is the corrupted version of $x^i$ as determined by $p(\tilde{x}|x^i)$, and $\mathbb{E}_{p(\tilde{x}|x^i)}$ is the expectation of the reconstruction loss at $x^i$ over a number of samples $\tilde{x}$. There are several techniques to corrupt the input, such as Gaussian noise and salt-and-pepper noise, but the most popular is arbitrarily masking input features to zero.

A Variational AE (VAE) [5], [21] is an AE whose distribution of encodings is regularized during training to guarantee that its latent space has good properties that enable the generation of new data. Unlike in a conventional AE, the input of the VAE is encoded as a distribution over the latent space, rather than as a single point, which helps incorporate some regularisation into the latent space. The loss function of the VAE consists of two terms. The first one is the reconstruction term on the final layer, which tends to make the encoding-decoding scheme as efficient as possible. The second one is the regularisation term on the latent layer, which tends to regularise the organisation of the latent space by pushing the distribution returned by the encoder closer to the standard normal distribution. This term is stated as the Kullback-Leibler divergence [22] between the returned distribution and a Gaussian distribution.

The Shrink AutoEncoder (SAE) was introduced by Cao et al. [5]. It is a variant of AE that adds a new regularizer to the loss function of AE. The new regularizer forces the latent representation of an AE toward the origin. In the training process, only normal data is used, and it is trained to map close to the origin of the latent representation space. After training, the normal data is concentrated around the origin while the abnormal data is mapped to regions far from the origin. By forcing the normal data to the origin, the normal and abnormal data of multiple data types are easily distinguished in the latent representation by the same anomaly score. This property is suitable for FL, where a central AE-based model is aggregated from multiple AE-based models. Thus, we have developed a new FL framework based on SAE for the anomaly detection problem.

C. Federated Learning

Federated Learning (FL) is a promising distributed machine learning paradigm for privacy preservation [23]. A shared global model is trained from a federation of participating devices under the control of a central server. FL enables mobile devices to collaborate on training a shared model while keeping all training data on the device, ensuring the protection of personal data and minimizing system latency.

The FL operation process consists of three phases, i.e., initialization, local model training and updating, and global model aggregation and updating [24]. The server broadcasts an initialized global model to all the clients during the initialization phase. Following that, each client trains the model and updates the model parameters locally by using its private dataset. At the end of this phase, the clients send the updated model parameters to the server. During the aggregation phase, the server aggregates the local models and then broadcasts the updated global model back to the clients. There are several ways to aggregate client models; however, model averaging has been widely used in the literature [12], [14].

IV. PROPOSED METHOD

This section presents our proposed FL architecture, i.e., FedSAE, for IoT anomaly detection.

The Shrink AutoEncoder (SAE), introduced in [5], was shown to be an effective unsupervised AE-based model for anomaly detection. SAE is trained on normal network traffic data and uses an anomaly score to distinguish between normal and abnormal network traffic data. The loss function of SAE, given in Eq. (5), aims to map normal data samples close to the origin.

Algorithm 1 Training FedSAE.
1: Server: Initialize $w_s$,
2: Clients: Initialize $w_k$ ($k = 1 \ldots K$),
3: for round < $T$ do
4:     for each client $k = 1$ to $K$ do
5:         Train the SAE model,
6:         Update the weight $w_k$ at the client $k$,
7:         Send $w_k$ to the Server,
8:     end for
9: end for
10: Server: Calculate $w_s = \frac{1}{K} \sum_{k=1}^{K} w_k$,
11: Server: Calculate $MSE_s = MSE_1, MSE_2, \ldots, MSE_K$,
12: Server: Calculate $th_{global} = MSE_s + \alpha * \sigma(MSE_s)$,
13: Server: Send the new model weight $w_s$ and the anomaly score $th_{global}$ to all the Clients.
The SAE loss function is

$$L_{SAE}(x, \phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \|x^i - p_\theta(q_\phi(x^i))\|^2 + \lambda * \|z^i\|^2 \right), \tag{5}$$

where $q_\phi$ and $p_\theta$ are the encoder and decoder parts, respectively, $n$ is the number of data samples in the dataset, $x$ is the input dataset, and $z^i$ is the latent representation of a data sample $x^i$.

Based on the SAE, we propose FedSAE, which leverages the advantages of the SAE for the FL architecture. To train FedSAE, we adjust the training framework of FedDetect [15] as shown in Algorithm 1. We choose the learning rate and the optimization algorithm by considering different settings based on the training data. In Algorithm 1, Clients represent the IoT devices that collect data and train the SAE model locally, and Server represents the central node that aggregates the SAE models received from all the Clients. FedSAE is trained as follows. First, we initialize the model weights of the SAE models on the Server and all the Clients. Second, for each training round, all the Clients train the SAE locally and send their updated weights $w_k$ to the Server. The Server receives the weights from all the Clients and calculates the average weights $w_s$. Then, the Server sends $w_s$ to all the Clients for updating. $T$ is the number of training rounds. In the final training round, the Clients send the weights $w_k$ and the $MSE_k$ values to the Server to calculate the loss value of the Server, i.e., $MSE_s$. The global anomaly score is computed in line 12 of the algorithm.

$$L(x, \tilde{x}, \phi, \theta) = \frac{1}{n} \sum_{i=1}^{n} \left( \|x^i - p_\theta(q_\phi(\tilde{x}^i))\|^2 + \lambda * \|z^i\|^2 \right). \tag{6}$$

In Fig. 1, we visualize the representation of IoT data by FedDetect and FedSAE in a two-dimensional space, i.e., $z$ has two neurons. After training FedDetect and FedSAE, we randomly selected 100 normal data samples of each IoT device and 100 abnormal data samples from the IoT datasets to visualize the latent representation $z$. The visualizations of $z$ generated by FedDetect and FedSAE are shown in Fig. 1(a) and (b), respectively.

Fig. 1(a) shows that after training FedDetect, each IoT device may need a different anomaly score to separate its normal and abnormal data samples. Thus, it is hard to set the global anomaly score of the Server as required by Algorithm 1. On the other hand, as shown in Fig. 1(b), the FedSAE model represents the normal IoT data collected from all IoT devices close to the origin. This helps to estimate the global anomaly threshold (the green circle) for IoT anomaly detection more effectively than FedDetect [15].

The reason that FedSAE enhances the accuracy of IoT anomaly detection is as follows. The FL architecture trains a neural network model (i.e., SAE) on different IoT devices. The training data collected from different IoT devices can be different. Consequently, the anomaly score for each client can be significantly different. This makes the aggregation of the global anomaly score for all clients (as shown in Algorithm 1) less effective. To handle this, we compress the normal training data from all IoT devices (i.e., clients) to the same value range. To make the setup easier, this value is set to zero, similar to that in SAE [5]. Therefore, the second term $\|z^i\|^2$ in Eq. (6) is used to compress the original data close to the origin. The value of $\lambda$ controls the trade-off between the two loss terms.

Fig. 2 visualizes the training process of the proposed FedSAE model. First, in each training round, the Clients train the AE-based models (SAE) locally and send the weights $w_k$ to the Server. Then, the Server calculates the average weight $w_s$ as in line 10 of Algorithm 1 and sends it back to the Clients. Second, in the final training round, each Client $k$ sends both its model weight $w_k$ and its MSE, which is calculated by Eq. (3). Then, the Server calculates the global anomaly score $th_{global}$ as in line 12 of Algorithm 1. Finally, the Server sends $w_s$ and $th_{global}$ to all the Clients. The Clients use the final $w_s$ and the global anomaly score $th_{global}$ to detect anomalies. A new data sample is fed into the AE-based model (SAE) of a Client with the weight $w_s$, which yields an MSE value. If the MSE value is larger than the global anomaly score, i.e., $th_{global}$, the Client reports this data sample as an anomaly.

V. EXPERIMENTAL SETTING

A. Experimental Dataset

In this section, we present the two datasets used for our experiments.

1) N-BaIoT dataset: We used the N-BaIoT dataset [7] for evaluating our proposed models. The dataset comprises 115 numeric features that represent network traffic data gathered from nine commercial IoT devices. These devices were authentically infected by two popular botnet attacks, Mirai and BASHLITE. Benign, BASHLITE, and Mirai have 502605, 2835317, and 2935131 samples, respectively. The feature set is composed of data from the five most recent time frames (100 ms, 500 ms, 1.5 sec, 10 sec, and 1 min).

We divided this dataset into two sets, i.e., the training set and the testing set, with a ratio of 8:2. After the elimination of the malicious samples, the training set was distributed to all the clients for training. Each client was trained on the data collected from one IoT device. Thus, we had a total of nine clients for training the FL architecture. After training, we evaluated the global model on the testing set.

2) NSL-KDD dataset: We also conducted experiments on a network IDS dataset, namely the NSL-KDD dataset. The dataset comprises 41 features that represent network traffic data captured in the DARPA'98 IDS evaluation program. Each attack is one of the following four types: Denial of Service, User to Root, Remote to Local, and Probing. The dataset has two sets, i.e., the training set and the testing set. The training set consists of 67343 normal data samples and 58630 attack data samples. The corresponding numbers in the testing set are 9711 and 12833, respectively. Our anomaly detection problem is based on the semi-supervised approach; thus, we need to eliminate the malicious samples in the training set. Then, the normal training data samples are distributed to nine clients for FL.
[Figure 2: architecture diagram. The Server holds the global model, calculates the weight $w_s$ and the global anomaly score $th_{global}$, and sends $w_s$ and $th_{global}$ to Client 1, Client 2, ..., Client K. Each Client $k$ trains a local SAE model on the normal data of IoT device $k$ and sends $w_k$ and $MSE_k$ to the Server.]
Fig. 2. The architecture of FedSAE.

After training, we evaluated the global model on the testing set.

B. Evaluation Metric

We use popular and effective metrics for classification problems, namely False Positive Rate (FPR), True Negative Rate (TNR), Precision, and Accuracy, defined as follows:

$$FPR = \frac{FP}{TN + FP} \tag{7}$$

$$TNR = \frac{TN}{TN + FP} \tag{8}$$

$$Precision = \frac{TP}{TP + FP} \tag{9}$$

$$Accuracy = \frac{TN + TP}{TN + TP + FN + FP} \tag{10}$$

where TP and FP denote the numbers of correctly and incorrectly predicted samples for a single class, respectively, while TN and FN denote the numbers of correctly and incorrectly predicted samples for all other classes.

C. Experimental Setups

All experiments were implemented in PyTorch and based on the FedDetect framework [15]. Moreover, the same computing platform (operating system: Ubuntu 16.04 (64 bit), Intel(R) Core(TM) i5-5200U CPU, 2 cores, and 4GB RAM) was used in all experiments in this work. To highlight the strength of the proposed method, we compared four models, i.e., FedDetect [15], FedVAE, FedDAE, and the proposed model FedSAE, which combine the FL architecture with AE [19], VAE [21], DAE [20], and SAE [5], respectively.

VI. EXPERIMENT RESULTS AND DISCUSSION

This section highlights the performance of the proposed model compared with other modern FL frameworks. The FL frameworks evaluated are FedDetect, FedVAE, FedDAE, and FedSAE. Table I reports the false detection rates of each FL framework on the N-BaIoT dataset. A lower value indicates that the anomaly detection system detects anomalous behavior more effectively. This table shows that our proposed model, FedSAE, decreases the false detection rate of the IoT anomaly detection problem in contrast to the other models. FedSAE has the lowest false detection rate at 1.84%, whereas FedDetect, FedDAE, and FedVAE have rates of 2.13%, 5.81%, and 100%, respectively. Additionally, FedSAE also has the highest true detection rate for malicious data samples. FedSAE had the best TNR with 98.15%, followed by FedDetect and FedDAE with 97.87% and 94.19%, respectively. FedVAE has the worst performance, with a TNR of 0%. This means that this model misclassifies all harmful data samples. In addition to its superiority over the other models in terms of TNR and FPR, our proposed model also achieves better classification performance. The results of the FL frameworks in Table II also show that FedSAE enhances the accuracy of the anomaly detection problem.

TABLE I
PERFORMANCE COMPARISON BETWEEN FL FRAMEWORKS ON THE N-BAIOT DATASET (%).

Model       FPR      TNR      Precision
FedVAE      100      0        50
FedDAE      5.81     94.19    94.19
FedDetect   2.13     97.87    97.91
FedSAE      1.84     98.15    98.19

TABLE II
PERFORMANCE COMPARISON BETWEEN FL FRAMEWORKS ON THE NSL-KDD DATASET (%).

Model       FPR      TNR      Precision
FedVAE      100      0        50
FedDAE      25.07    74.92    79.93
FedDetect   26.82    83.17    85.58
FedSAE      20.74    79.25    82.80

Moreover, Fig. 3 demonstrates that FedSAE has the greatest accuracy of the models evaluated. FedSAE attained 99.07% accuracy, which is 0.19 and 1.9 percentage points higher than FedDetect and FedDAE, respectively. As with the detection rates, FedVAE obtained the lowest accuracy at 50%. This indicates that mapping the normal IoT data close to the origin helps FL training to be more effective. The reason is that on all clients, the normal IoT data is represented in the same range of values. This helps to estimate the global anomaly score that separates the normal and abnormal IoT data more effectively.

VII. CONCLUSION

In this paper, we have proposed a new FL architecture, namely FedSAE, for IoT anomaly detection to address the anomaly score and poisoning attack problems. The FedSAE framework is able to effectively calculate the global anomaly score by forcing normal data from all the clients into the same value range. The experimental results showed that the proposed method can enhance the accuracy of the IoT anomaly
detection problem while not increasing the communication cost of the FL architecture.

[Figure 3: bar chart of accuracy (%) per model. FedVAE: 50, FedDAE: 97.09, FedDetect: 98.88, FedSAE: 99.07.]
Fig. 3. The accuracy of FL frameworks.

REFERENCES

[1] Z. Chkirbene, A. Erbad, and R. Hamila, "A combined decision for secure cloud computing based on machine learning and past information," in 2019 IEEE Wireless Communications and Networking Conference (WCNC), 2019, pp. 1–6.
[2] S. Sahu and B. M. Mehtre, "Network intrusion detection system using J48 decision tree," in 2015 International Conference on Advances in Computing, Communications and Informatics (ICACCI), 2015, pp. 2023–2026.
[3] M. A. Ambusaidi, X. He, P. Nanda, and Z. Tan, "Building an intrusion detection system using a filter-based feature selection algorithm," IEEE Transactions on Computers, vol. 65, no. 10, pp. 2986–2998, 2016.
[4] Y. Lei, "Network anomaly traffic detection algorithm based on SVM," in 2017 International Conference on Robots Intelligent System (ICRIS), 2017, pp. 217–220.
[5] V. L. Cao, M. Nicolau, and J. McDermott, "Learning neural representations for network anomaly detection," IEEE Transactions on Cybernetics, vol. 49, no. 8, pp. 3074–3087, 2019.
[6] L. Vu, V. L. Cao, Q. U. Nguyen, D. N. Nguyen, D. T. Hoang, and E. Dutkiewicz, "Learning latent representation for IoT anomaly detection," IEEE Transactions on Cybernetics, pp. 1–14, 2020.
[7] Y. Meidan, M. Bohadana, Y. Mathov, Y. Mirsky, A. Shabtai, D. Breitenbacher, and Y. Elovici, "N-BaIoT: Network-based detection of IoT botnet attacks using deep autoencoders," IEEE Pervasive Computing, vol. 17, no. 3, pp. 12–22, 2018.
[8] C. Yin, S. Zhang, J. Wang, and N. N. Xiong, "Anomaly detection based on convolutional recurrent autoencoder for IoT time series," IEEE Transactions on Systems, Man, and Cybernetics: Systems, vol. 52, no. 1, pp. 112–122, 2022.
[9] R.-H. Hwang, M.-C. Peng, C.-W. Huang, P.-C. Lin, and V.-L. Nguyen, "An unsupervised deep learning model for early network traffic anomaly detection," IEEE Access, vol. 8, pp. 30387–30399, 2020.
[10] M. A. Salahuddin, V. Pourahmadi, H. A. Alameddine, M. F. Bari, and R. Boutaba, "Chronos: DDoS attack detection using time-based autoencoder," IEEE Transactions on Network and Service Management, vol. 19, no. 1, pp. 627–641, 2022.
[11] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, "Robust and communication-efficient federated learning from non-i.i.d. data," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 9, pp. 3400–3413, 2020.
[12] Y. Chen, X. Sun, and Y. Jin, "Communication-efficient federated deep learning with layerwise asynchronous model update and temporally weighted aggregation," IEEE Transactions on Neural Networks and Learning Systems, vol. 31, no. 10, pp. 4229–4238, 2020.
[13] B. Li, Y. Wu, J. Song, R. Lu, T. Li, and L. Zhao, "DeepFed: Federated deep learning for intrusion detection in industrial cyber–physical systems," IEEE Transactions on Industrial Informatics, vol. 17, no. 8, pp. 5615–5624, 2021.
[14] Y. Liu, S. Garg, J. Nie, Y. Zhang, Z. Xiong, J. Kang, and M. S. Hossain, "Deep anomaly detection for time-series data in industrial IoT: A communication-efficient on-device federated learning approach," IEEE Internet of Things Journal, vol. 8, no. 8, pp. 6348–6358, 2021.
[15] T. Zhang, C. He, T. Ma, L. Gao, M. Ma, and S. Avestimehr, "Federated learning for internet of things," in Proceedings of the 19th ACM Conference on Embedded Networked Sensor Systems, ser. SenSys '21. New York, NY, USA: Association for Computing Machinery, 2021, pp. 413–419. [Online]. Available: https://doi.org/10.1145/3485730.3493444
[16] S. Manimurugan, S. Al-Mutairi, M. M. Aborokbah, N. Chilamkurti, S. Ganesan, and R. Patan, "Effective attack detection in internet of medical things smart environment using a deep belief neural network," IEEE Access, vol. 8, pp. 77396–77404, 2020.
[17] Y. Cheng, Y. Xu, H. Zhong, and Y. Liu, "Leveraging semisupervised hierarchical stacking temporal convolutional network for anomaly detection in IoT communication," IEEE Internet of Things Journal, vol. 8, no. 1, pp. 144–155, 2021.
[18] T. Alladi, B. Gera, A. Agrawal, V. Chamola, and F. R. Yu, "DeepADV: A deep neural network framework for anomaly detection in VANETs," IEEE Transactions on Vehicular Technology, vol. 70, no. 11, pp. 12013–12023, 2021.
[19] C.-Y. Liou, W.-C. Cheng, J.-W. Liou, and D.-R. Liou, "Autoencoder for words," Neurocomputing, vol. 139, pp. 84–96, 2014.
[20] P. Vincent, H. Larochelle, Y. Bengio, and P.-A. Manzagol, "Extracting and composing robust features with denoising autoencoders," in Proceedings of the 25th International Conference on Machine Learning, ser. ICML '08. New York, NY, USA: Association for Computing Machinery, 2008, pp. 1096–1103. [Online]. Available: https://doi.org/10.1145/1390156.1390294
[21] D. P. Kingma and M. Welling, "Auto-encoding variational Bayes," CoRR, vol. abs/1312.6114, 2014.
[22] B. Schölkopf, J. C. Platt, J. C. Shawe-Taylor, A. J. Smola, and R. C. Williamson, "Estimating the support of a high-dimensional distribution," Neural Comput., vol. 13, no. 7, pp. 1443–1471, Jul. 2001. [Online]. Available: https://doi.org/10.1162/089976601750264965
[23] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[24] W. Y. B. Lim, N. C. Luong, D. T. Hoang, Y. Jiao, Y.-C. Liang, Q. Yang, D. Niyato, and C. Miao, "Federated learning in mobile edge networks: A comprehensive survey," IEEE Communications Surveys Tutorials, vol. 22, no. 3, pp. 2031–2063, 2020.
