IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. XX, NO. XX, DECEMBER 2020
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.3043458, IEEE Transactions on Industrial Informatics.

Abstract—The sheer volume of Industrial Internet of Things (IIoT) malware is one of the most serious security threats in today's interconnected world, with new types of advanced persistent threats and advanced forms of obfuscation. This paper presents a robust federated learning (FL)-based architecture, called Fed-IIoT, for detecting Android malware applications in the IIoT. Fed-IIoT consists of two parts: i) the participant side, where the data are triggered by two dynamic poisoning attacks, based on a generative adversarial network (GAN) and a federated generative adversarial network (FedGAN); and ii) the server side, which aims to monitor the global model and shape a robust collaborative training model by avoiding anomalies in aggregation through a GAN network (A3GAN) and by adjusting two GAN-based countermeasure algorithms. One of the main advantages of Fed-IIoT is that devices can safely participate in the IIoT and efficiently communicate with each other, with no privacy issues. We evaluate our solutions through experiments on various features using three IoT datasets. The results confirm the high accuracy rates of our attack and defence algorithms, and show that the A3GAN defensive approach preserves the robustness of data privacy for Android mobile users, with accuracy about 8% higher than existing state-of-the-art solutions.

Index Terms—Internet of Things (IoT), Federated Learning (FL), Generative Adversarial Network (GAN), Malware.

R. Taheri is with the Computer Engineering and IT Department, Shiraz University of Technology, Shiraz, Iran (e-mail: r.taheri@sutech.ac.ir). M. Shojafar and R. Tafazolli are with the Institute for Communication Systems, 6G Innovation Centre (6GIC), University of Surrey, Guildford, GU27XH, United Kingdom (e-mail: {m.shojafar, r.tafazolli}@surrey.ac.uk). M. Alazab is with the College of Engineering, IT and Environment, Charles Darwin University, Australia, e-mail: alazab.m@ieee.org. (Corresponding author: Mamoun Alazab.)

I. INTRODUCTION

The Industrial Internet of Things (IIoT) consists of heterogeneous devices that connect and communicate via the Internet. In recent years, most of these devices have used the Android operating system (OS), the most popular and well-known mobile OS for processing and communication. An Android system can easily be installed on IoT-based systems and improves accessibility to a wide range of applications [1], [2]. This popularity has made the Android OS an attractive target for malware writers and malicious Android applications, and attackers have written several complex malware models to invade the Android OS. Several solutions have applied traditional machine learning (ML) algorithms to distinguish malware from benign programs and to deal with this problem. These algorithms have achieved good results by collecting data and constructing models based on the identification of malware features. The majority of such ML algorithms are centralised methods, meaning that they first gather data from different users for use as a training dataset, which is placed on the ML server, and then build a model to classify new data samples by applying ML algorithms to this training dataset.

Fig. 1: FL-based architecture applied to a mobile Android device. FA := federated aggregation; BR := binary representation; Ps := the s-th participant, which generates a BR. [Figure: each participant P1, . . . , PS trains locally on its own BR; on the server side, FA aggregates the locally trained parameters of model Mt and updates the global model to Mt+1.]

However, access to these datasets in centralised ML methods raises concerns about data privacy for users. Since traditional ML techniques classify only on the basis of the training dataset, it is easy for attackers to access the data during the learning process. These approaches therefore face significant problems with data privacy and leakage. Collaborative machine learning (CML) was designed to cope with this problem and, at the same time, to make better use of ML methods [3], [4]. CML is a kind of decentralised learning that analyses data from small, mobile devices such as those connected to the IoT. Based on the CML framework, federated learning (FL) was designed to protect data privacy. In FL, each participant uses a global training model without needing to upload their private data to a third-party server. Fig. 1 illustrates an FL-based architecture applied to an Android malware system. In the figure, each participant (Pi, ∀i ∈ {1, . . . , S}, where S is the total number of participants) is located on the participant side and influences a global model [5]. This global model is pre-defined and trained by each participant to generate local model parameters in round t (see the model graph in the upper part of Fig. 1). Then, on the server side (right-hand rectangular box in Fig. 1), we use a federated aggregation algorithm to aggregate the trained parameters from each participant and update the global model (see the model graph in the lower part of Fig. 1).

In FL, individual computing machines may show abnormal actions, for example due to faulty software, hardware invasions, unreliable communication channels, or malicious samples deliberately crafted to subvert the model [6]. To mitigate these challenges, we require robust policies to control the learning phases in FL. It is therefore necessary to develop provably robust FL algorithms that can deal with Byzantine failures. Recently developed robust FL defence mechanisms mainly depend
on the type of attacks launched against the system. As an example, the authors in [7] introduced a Byzantine detection algorithm for backdoor attacks in CML. This method depends on the distribution of the training data and is not robust, especially for the various data distributions that arise in FL settings. Other categories of solutions, such as those in [8]–[11], deal with controlling the noise injected into the training dataset to trigger the distribution on the model and increase the weight clipping. For instance, the authors of [10] designed a fast-converging defence algorithm to handle backdoor attacks on FL tasks using model weight clipping and noise injection. However, this scheme was limited, as it was unable to manage untargeted attacks such as those in [12], [13].

TABLE I: Comparison between different federated learning solutions (a tick (✓) indicates that the method supports the property, and a cross (✗) indicates that it does not).

Ref.         Attack   Defence   Malware Detection   Supervised/Unsupervised   Used For
-- ML --
[15]           ✓         ✓           ✓              Supervised                IoT
[16]           ✗         ✓           ✓              Unsupervised              IoT
[17]           ✗         ✓           ✓              Unsupervised              IoT
[18]           ✗         ✓           ✓              Unsupervised              IoT
[19]           ✗         ✓           ✓              Unsupervised              IIoT
-- FL --
[20]           ✗         ✓           ✓              Supervised                IoT
[21]           ✗         ✓           ✓              Supervised                IoT
[22]           ✗         ✓           ✓              Unsupervised              IoT
[23]           ✗         ✓           ✓              Supervised                IIoT
-- AFL --
[24]           ✗         ✗           ✓              Supervised                -
[25]           ✓         ✗           ✓              Supervised                -
[26]           ✓         ✓           ✓              Supervised                IoT
[27]           ✓         ✓           ✓              Supervised                IIoT
This paper     ✓         ✓           ✓              Unsupervised              IIoT

Compared with conventional ML, FL can preserve data security, especially in terms of participant data during the learning process. FL can also help in updating server-side data for the global model, and the participant is not required to provide their
data to the server. Nevertheless, FL is vulnerable to several security threats. For example, since the participant cannot see or access the server-side data, an attacker can access the participants' training and inject poisoned data into the training model, meaning that the global model will be contaminated with false data. This is a well-known attack in ML, and is called a poisoning attack [14]. There are several significant reasons for the vulnerability of FL to poisoning attacks: (i) each participant trains the local model, and the server cannot determine whether the parameters loaded by the participant are benign or malicious; and (ii) there is no mechanism for participant authentication in FL, meaning that an adversary can pretend to be a benign participant. Motivated by this, we address the above-mentioned issues by designing an FL-based Android malware detection defence algorithm to protect the privacy of the users' data. In particular, we design two algorithms that launch poisoning attacks on the participants' training model (see the colourful training model adopted by the ML algorithm in Fig. 1), and apply two countermeasure solutions, namely Byzantine Medium (BM) and Byzantine Krum (BK), to preserve the robustness of the network under these types of attacks.
Contributions. The main contributions of the paper are as follows:
• We present an FL-based architecture, Fed-IIoT, imposing an Android malware detection algorithm, including various independent identically distributed learning models.
• We propose two poisoning attacks, namely GAN and FedGAN, based on a latent random variable adopting a GAN to conduct malware floating on the benign data samples using FL.
• We propose the avoiding anomaly in aggregation by GAN network (A3GAN) defence algorithm, which is formed by combining FL and GAN algorithms to detect adversaries in the server-side component.
• We modify and adapt the Byzantine defence algorithms based on Krum and Medium, apply them against these forms of attacks, and verify their effectiveness.
• Finally, we conduct an exhaustive set of experiments to validate the attack and defence mechanisms on three IoT datasets using different features.
Roadmap. The remainder of this paper is structured as follows. Section II gives a short summary of related FL solutions that have been designed to tackle anomalies and malware in the network. Section III discusses the representation of the FL data. Section IV presents our proposed FL architecture, attacks and solutions. We first describe our FL model for the Android OS, and then describe various attack scenarios. Finally, we explain our adjusted defence mechanisms for mitigating these attacks. A performance analysis of the proposed attacks and adapted defence algorithms is presented in Section V. Finally, Section VI summarises the main achievements of the paper and gives some directions for future work.

II. RELATED WORK

In this section, we review the most recent related works in the field of ML approaches to malware detection (Section II-A) and the robustness of FL-based malware detection approaches (Section II-B). A comparison of the techniques found in the literature is presented in Table I.

A. ML Approaches to Android Malware Detection

ML algorithms are widely used to leverage the performance of Android mobile apps. In [15], one of the earliest and most well-known works, the authors added crafted poisoning attack algorithms adopted on Android malware clustering. The authors of [16] then designed a feature transformation-based Android malware detection scheme that considered the major features of Android malware detection and transformed them irreversibly into a new feature domain to validate the robustness of the ML model. The work in [17] introduced SEDMDroid, a stacking ensemble framework for identifying Android malware. SEDMDroid validates diversity on the features and applies random feature subspaces and bootstrap sampling techniques. The study in [18] presented a permission-based malware detection approach, named SIGPID, to deal with the growth in the number of malicious Android applications. The SIGPID algorithm applies three levels of pruning to the dataset to discover the most important permission features that
1551-3203 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
can help in attaining a distinction between benign and malicious applications. Most recently, the work in [19] introduced a malware detection framework, called MD-IIoT, to identify malware attacks on the IIoT. The authors of MD-IIoT proposed a methodology for handling colour image visualisation and used a deep convolutional neural network to identify benign and malicious samples. Although the methods described above are promising ML solutions, none of them deal with global training models applied to each mobile app (possibly, participant). Unlike these schemes, Fed-IIoT considers this aspect.

B. Adversarial FL Approaches

Some researchers have adopted distributed ML (DML) to monitor data gathered from IoT devices [20], [21], [28]. The DML technique is the preliminary deployed solution that can support FL [23], [29], [30]. These approaches commanded some bandwidth and communication indications to mainly concentrate on analysing the system performance and preserving the reliability of the federated nodes. FL is also vulnerable to poisoned data that can fool the local and global ML models. To cope with this issue, the work in [22] presented a rejection algorithm based on the error rate and loss function to deny suspicious local updates by testing the impact of the data on the global training model using a validation set. The main problem with this FL solution is that validation testing for large Android mobile applications is computationally expensive and cannot be applied in real-time apps. In another study [23], the authors designed an FL-based algorithm to distribute the training process of a deep neural network. Their approach allows mobile users to keep their data on their devices while a service provider aggregates and distributes the locally trained model across the users. This helps to minimise the amount of data collected by third parties on mobile users.

Adversarial FL (AFL) settings are another issue that must be considered. One prominent AFL technique relates to Byzantine settings, where a subset of client data can behave stochastically. We therefore need to design robust aggregation rules to mitigate this issue. Exhaustive research has been carried out on Byzantine settings [24]–[27], [31]. For example, the authors of [24], [31] focused on gradient similarities, the work in [25] applied geometric median aggregation, the study in [26] examined redundant communication, and [27] utilised adaptive model quality estimation to deal with the anomalous behaviour of the samples in AFL. While these approaches can provide appropriate convergence guarantees in Byzantine cases, they are computationally expensive and need to be manually modified during the federated communication. Unlike the above AFL methods, our proposed Fed-IIoT method adopts a generative adversarial network (GAN) to mimic the environment of the poisoned sample. We also adapt Byzantine defence mechanisms using Medium and Krum and add a GAN to deal with the proposed attack scenarios.

III. DATA REPRESENTATION IN FL

In this section, we give a detailed description of the data representation for the Android malware dataset in FL (Section III-A) and explain the proposed threat model and paper assumptions (Section III-B).

A. The considered data representation model

Fig. 2 shows the information gathered from various IIoT applications using the Android OS to shape our sparse matrix representation. Note that the IIoT devices are under threat from an adversary who wants to modify and corrupt the data (see dashed lines in Fig. 2). This matrix includes important information on the features of an Android app, such as system features.

Fig. 2: Data representation of IIoT samples as a sparse matrix. Dashed lines refer to an injecting attack from an adversary in the IIoT system. [Figure: binary feature strings (e.g., {00110...001}) extracted from Android apps in the IIoT system are collected into a dataset and stacked as the rows of a sparse matrix; the adversary can inject modified strings.]

We assume that the local model of each IIoT device consists of a set of benign samples, denoted by B, and a set of malware samples, denoted by M. Then, we set up our settings to contain the labelled examples (i.e., S samples) and the feature elements of each sample, as shown in (1):

D = {(a_i, b_i) | ∀i = 1, . . . , S},    (1)

Here, b_i ∈ {0, 1} is the binary label of the i-th sample; a_i denotes the binary representation (BR) of the i-th sample, with each component representing a selected feature; and a_if is the binary value of the f-th feature in the i-th sample, where ∀f = 1, . . . , F. If the i-th sample has the f-th feature, then we set a_if = 1, and otherwise 0. S is the total number of samples.

B. Threat model and assumptions

We consider some important hypotheses, which we list here. First, our proposed attack and defence algorithms apply to the static features of IoT devices, because the speed of executing operations on them is greater than with dynamic features.
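To make the binary representation of eq. (1) concrete, the short sketch below builds a_i vectors from static features. The feature vocabulary, the apps, and the labels are all invented for illustration; the paper extracts real static features (e.g., permissions) from Android apps.

```python
# Sketch of the binary representation (BR) in eq. (1).
# FEATURES plays the role of the F static features; the apps and labels
# below are invented for illustration only.

FEATURES = ["SEND_SMS", "READ_CONTACTS", "INTERNET", "CAMERA"]  # f = 1, ..., F

def to_binary_representation(app_features):
    """Map an app's static feature set to the binary vector a_i
    (a_if = 1 if the app has the f-th feature, otherwise 0)."""
    return [1 if f in app_features else 0 for f in FEATURES]

# D = {(a_i, b_i)}: b_i = 1 labels malware, b_i = 0 labels benign.
D = [
    (to_binary_representation({"SEND_SMS", "INTERNET"}), 1),  # hypothetical malware
    (to_binary_representation({"INTERNET"}), 0),              # hypothetical benign app
]

print(D[0][0])  # -> [1, 0, 1, 0]
```

Stacking such vectors row-wise yields the sparse matrix of Fig. 2.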
Second, the proposed methods include a number of adversaries, which we have expanded to five in order to explore the impact of increasing the number of adversaries. We do not consider collaborations between adversaries, and we assume that the adversaries are independent of each other. Third, this work uses a special GAN structure, which is designed on the basis of the convolutional neural networks described in Figs. 4 and 5. Finally, we assume that each client sends local model weight updates to the server without encryption.

IV. FED-IIOT: PROPOSED APPROACH

In this section, we describe our robust IIoT FL architecture for malware detection (Section IV-A). We then present our GAN-based attack algorithm and describe its behaviour using an analysis of Android IoT devices (Section IV-B). Finally, we present our adapted countermeasure algorithms, which aim to mitigate attacks using the GAN method inspired by the Byzantine algorithms (Section IV-C).

… server side. The participant side, as shown by the red rectangle on the left-hand side of Fig. 3, contains the different participants (i.e., Pi, i ∈ {1 . . . S}). This part represents an adversary that can train the model locally (the adversary poison data generation, P-BR, as shown in Fig. 3). Each Android application is accompanied by a participant and generates a sample as an input to the binary vector representing a feature. In each step t + 1, one of the participants reuses the model learned in the previous step t, Mt. This model is subverted and modified by the adversary. The adversary performs the attack by adding poisoned samples (the P-BR vector on the participant side, as shown in Fig. 3) as new samples in the training phase of one of the participants. We will explain how to create these poisoned updates in Section IV-B.

B. Proposed GAN-based Trivial Attack Algorithm (GAN)

On the participant side (red rectangle in Fig. 3), one or more adversaries enter the system as ordinary participants and try to change the process by modifying the features of the input samples so that a generated malware sample represents a benign sample. This reduces the accuracy of the detection system and opens the door to allow more malware samples into the system. This part of the figure shows the adversary using a GAN mechanism to generate adversarial samples, in which the trained model (i.e., Mt) is used as the discriminator function. A generator network is created based on the latent random variable, and this network is used to generate new samples. The GAN is used to produce new samples that are very similar to the real samples, which the adversary uses to add updates to the model, causing the model to be trained so that it cannot detect malware samples.

Modified GAN for the Participant Attack: In this case, we intend to create GANs to enhance the training structure provided by Goodfellow et al. [32]. The first step involves gathering data by sampling from a dataset.

An interesting feature of a GAN is that it does not need labelled information. Its learning methods can be classified into generative and discriminative models. A generative model is trained to obtain the joint probability of the input data and output class labels, i.e., p(a, b). This can be used to derive a conditional distribution, i.e., p(b|a), using Bayes' rule. We can also use this learned joint probability for other purposes, such as generating new samples (a, b).

The main idea of a GAN is to use the discriminative framework against the generative framework. In this way, the two neural network components of the GAN act as adversaries, and are trained on real samples to produce non-identifiable samples. The discriminator model is adopted here as a binary-label classifier. For the classifier, the input point is b, and the output is an F-dimensional vector of logits (the inverse of the sigmoidal logistic function). The output vector is as follows:

b_1, b_2, . . . , b_F,    (2)

The softmax function is a kind of normalised function applied as an activation function for the CNN in the GAN. If we can increase the accuracy of the normalisation, we can obtain higher classification accuracy for both the attack and defence algorithms. To train the model, we minimise the negative log-likelihood between P_model(b|a) and the observed labels b. We add some fake examples generated by the generator G to the available dataset (see the dashed components on the participant side of Fig. 3).

Algorithm 1 presents the pseudocode for the GAN-based trivial attack algorithm. In lines 1 and 2 of this algorithm, we first define the generator and discriminator functions that are used to generate adversarial samples. Then, in lines 3 to 17, within the two nested for loops, a batch of training data is first separated. We then randomly generate a noise vector of the batch size from a normal distribution. In the next step, the algorithm gives this noise vector to the generator function, G, and its output is sent to the discriminator function, D, to compute the similarity of the generated sample to the training dataset. In lines 11 and 12, we calculate the losses of the two functions G and D, and repeat. For each epoch, by calculating gradients, the optimizer function is used to optimise the solutions. It should be noted that, in the proposed trivial GAN method, all the data collected from the IoT devices are considered as a single dataset and the training is performed on the server side; as stated in the introduction, privacy is still an important matter in this type of approach.

C. Proposed Federated GAN Attack Algorithm (FedGAN)

Algorithm 2 presents a method, named FedGAN, inspired by using the concept of federated learning in combination with a GAN, which can maintain the privacy of the data of each IoT device and produce adversarial samples. This algorithm assumes that the IoT devices do not share their datasets, but only update the model. We use this policy to preserve
[Fig. 3, participant side (caption lost in extraction): an adversary feeds binary representations (BR) from the training data and a validation dataset into a generator-discriminator pair built on the trained model Mt, producing poisoned binary representations (P-BR).]
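To make the control flow of the GAN-based trivial attack (Algorithm 1) concrete, the following is a deliberately simplified sketch. The real method trains CNN generator and discriminator networks (Figs. 4 and 5); here both are replaced by invented stand-ins (a thresholding generator, a distance-based discriminator, and an arbitrary 0.5 acceptance threshold), so only the loop structure mirrors the algorithm: sample noise, generate a candidate, score it with D, and keep candidates that fool D as poisoned P-BR vectors.

```python
import numpy as np

# Toy sketch of the GAN-based trivial attack loop (Algorithm 1).
# The paper trains CNN generator/discriminator networks (Figs. 4 and 5);
# here both are replaced by invented stand-ins so that only the control
# flow of the attack is illustrated.

rng = np.random.default_rng(0)
F = 16                                       # number of binary features
benign = rng.integers(0, 2, size=(300, F))   # stand-in training BRs (300 samples)
profile = benign.mean(axis=0)                # per-feature frequency of the data

def generator(noise):
    # Stand-in G: threshold latent noise against the benign feature profile.
    return (noise < profile).astype(int)

def discriminator(sample):
    # Stand-in D: score close to 1 means "looks like the training data".
    return 1.0 - np.abs(sample - profile).mean()

poisoned = []
for epoch in range(100):                     # the paper uses 300 epochs
    noise = rng.random(F)                    # latent random variable
    candidate = generator(noise)
    if discriminator(candidate) > 0.5:       # D cannot tell it apart: keep it
        poisoned.append(candidate)           # inject as a poisoned sample (P-BR)

print(len(poisoned), "poisoned samples generated")
```

In the real attack, G and D are trained against each other with gradient updates (lines 11 and 12 of Algorithm 1); the fixed acceptance threshold above is only a stand-in for the discriminator's decision.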
Algorithm 3 A3GAN Defence Algorithm
Input: MG, V.Data, Clients, Round
Output: MG
 1: for each i in N do
 2:   Li <- Model(V.Data/N)
 3:   MGAN.Train(Li.params)
 4: end for
 5: for r in Round do
 6:   for c in Clients do
 7:     W̃i <- MGAN(c.params)
 8:     Compute Ai using Eq. (5)
 9:   end for
10:   Compute τ using Eq. (6)
11:   if (Ai > τ) then
12:     Update MG using Eq. (4)
13:   end if
14: end for
15: return MG

… client in the federated learning system, and not to include that client in the aggregation model if this value is higher than the specified threshold for each client.

In this way, we first divide the validation data into N separate sections and create a corresponding model for each. We then use the weights of these models to train a GAN network. The resulting network will act as an anomaly detector.

Without loss of generality, suppose that K clients participate in federated learning (K ≤ S) and that each client has n_k training points. Let W^k_{t+1} be the weight of the k-th client in round (t + 1) of the global model. We were inspired by the data aggregation algorithm used in FedAvg [33], as reported in eq. (4):

W_{t+1} = Σ_{k=1}^{S} (n_k / n) · W^k_{t+1},    (4)

Considering the trained anomaly detector of FL, we want to shape the aggregation model in such a way as not to allow clients that have a high anomaly value to be used in calculating the aggregation. We compute the anomaly value of client k based on the Mean Square Error (MSE) relation in (5):

A^k_{t+1} = || W^k_{t+1} − W̃^k_{t+1} ||²,    (5)

where W^k_{t+1} is the weight of client k in round t + 1 and W̃^k_{t+1} is the weight calculated by the GAN for this client. After calculating A^k_{t+1} for all clients, we calculate the threshold τ in

… on the number of samples, and the time complexity is O(n). Inside these two loops, commands are used, each of which requires the use of the training data, and the time complexity of each of them is O(n); so, considering the two nested for loops mentioned, the total computational complexity is O(300 · n · n) = O(n²).
• FedGAN Attack. In FedGAN, we use four nested for loops, in which the three external loops execute a fixed number of times (in this paper, Round = 300, Clients = 10, epochs = 300). The internal for loop is the same as in the GAN-based trivial algorithm, so the computational complexity of FedGAN is O(n²). Also, due to the execution of the loops, the execution time is much longer than for the GAN-based trivial method, but it remains in the order of O(n²).
• A3GAN Defence. In A3GAN, lines 1-4 contain a for loop whose time complexity is O(n). In lines 5-14, we use two nested for loops, which form the main part of the algorithm and consume most of the time. Thus, the total time complexity of this algorithm is O(n²).

V. PERFORMANCE EVALUATION

In this section, we report an experimental evaluation of the proposed attack and countermeasure algorithms.

A. Simulation Setup

We have extracted the static features of the datasets and created a sparse matrix that maps each feature to a binary value (1 if the feature is present, otherwise 0). In the following, we describe the datasets and system settings.
Datasets: The tested IIoT datasets are listed below.
• Drebin dataset [34]: This dataset contained 131,611 Android samples representing benign and malware/malicious apps. A total of 96,150 of these samples were gathered from the Google Play store, 19,545 from the Chinese market, 2,810 from the Russian market, and 13,106 from other Internet sources.
• Genome dataset [35]: This dataset contained 1,200 Android malware samples, classified by installation methods, activation mechanisms, and malicious payloads.
• Contagio dataset [36]: This dataset contained 16,800 benign
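Returning to the A3GAN aggregation of eqs. (4) and (5), the filtering step can be sketched as below. The GAN weight estimates W̃, the client weights, the sample counts, and the fixed threshold are invented for illustration; the paper derives the threshold τ from eq. (6), which is not included in this excerpt.

```python
import numpy as np

# Sketch of A3GAN's robust aggregation: FedAvg-style weighted averaging
# (eq. (4)) that skips clients whose anomaly value (eq. (5)) exceeds a
# threshold tau. All concrete values below are invented for illustration.

def a3gan_aggregate(client_weights, gan_weights, n_k, tau):
    """client_weights[k] is W^k_{t+1}; gan_weights[k] is the GAN estimate W~^k_{t+1}."""
    n = sum(n_k)
    aggregate = np.zeros_like(client_weights[0])
    for W, W_tilde, nk in zip(client_weights, gan_weights, n_k):
        anomaly = float(np.sum((W - W_tilde) ** 2))  # A^k_{t+1}, eq. (5)
        if anomaly <= tau:                           # keep only non-anomalous clients
            aggregate += (nk / n) * W                # weighted term of eq. (4)
    return aggregate

# Hypothetical round: the third client's weights deviate strongly from the
# GAN's expectation (a poisoned update) and are excluded from the aggregate.
clients = [np.array([1.0, 1.0]), np.array([1.1, 0.9]), np.array([9.0, -9.0])]
gan_est = [np.array([1.0, 1.0])] * 3
print(a3gan_aggregate(clients, gan_est, n_k=[100, 100, 100], tau=1.0))
```

Note that, as in eq. (4), an excluded client's share n_k/n is simply dropped rather than renormalised in this sketch.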
We adopt our method using TensorFlow (version 1.12.0) and Keras (version 2.2.4) and build our GAN models in Fed-IIoT.
Methodologies. We present the adopted generator and discriminator functions in Figs. 4 and 5, respectively. The input to the generator network is examples of the Android files gathered from the IoT devices. The generator model learns based on the data distribution and generates similar examples to deceive the discriminator network. The generated sample created by the generator network feeds as an input to the discriminator network, which tries to detect adversarial samples. If the sample is detected as adversarial, it is returned to the generator network. Also, if a sample cannot be detected by the discriminator network, then it is added to the data as a poisoned sample. In this paper, we use a CNN architecture for the generator and discriminator functions. Specifically, as shown in Fig. 4, we use a sequential CNN to design the generator. In this figure, the first layer, a Dense layer, is followed by three Conv2DTranspose sublayers with BatchNormalization, ReLU, and Reshape layers. Similarly, as shown in Fig. 5, we use a sequential CNN that has two Conv2D layers, where we use LeakyReLU and Dropout between these layers. Also, we adopt Flatten and Dense in the final layers.

Fig. 4: CNN architecture for the generative GAN. [Figure: input feature samples pass through a Dense layer (4×4×256) and Conv2DTranspose layers with BatchNormalization and ReLU, strides = (2, 2), producing feature maps of shape (4, 4, 128), (8, 8, 64), and (16, 16, 1).]

Fig. 5: CNN architecture for the discriminative GAN. [Figure: the generated sample passes through two Conv2D layers (64 and 128 filters, strides = (2, 2)) with LeakyReLU and Dropout (rate = 0.3), followed by Flatten and Dense layers that classify samples as malware or benign.]

… of correct predictions and the total number of tested samples. Hence, we can define it as follows:

A = (ζ + π) / (ζ + π + ν + µ),    (7)

where ζ is the ratio of correctly classified benign samples, π is the ratio of correctly classified malware samples, ν is the ratio of wrongly classified benign samples, and µ is the ratio of wrongly classified malware samples. The Python implementation of Fed-IIoT is available in [38].

B. Experimental results

In this section, we test the proposed attack and defence mechanisms on the datasets and features described above.
Attack algorithm results: In the first experiment, we studied the GAN and FedGAN attack algorithms, applied to the traffic data gathered from ten Android devices connected to the global training model for the federated attack. We assumed that the GAN model on an IIoT device could only keep 300 binary examples for adversarial training. We tested our attack algorithms on all three feature types on the three datasets. We generate the adversarial examples as transfer attacks. We presume that we can produce the initial training model, while the server-side federated model is unable to retrain the model and the adversary is also unable to get access to the updated model, Mt+1.

We set the number of epochs to 300, which is aligned with the binary example rates for adversarial training. We use each … After finalizing each epoch, the IIoT devices transfer the updated gradient information to their corresponding server to perform aggregation. Here, we present the prediction accuracy for the datasets Drebin, Contagio, and Genome using the proposed GAN and FedGAN federated attack algorithms. Fig. 6 shows the results of the implementation of the two proposed attack algorithms. In each of the subfigures, we present the results for the Drebin, Contagio, and Genome datasets using the API, permission, and intent features. On the x-axis of each subfigure, we indicate the number of epochs. On the y-axis of each subfigure, we present the accuracy as calculated in eq. (7). The illustrated plots display three modes: no-attack, expressing that no attack was injected, and two of our proposed attacks (i.e., GAN-attack and FedGAN). From these figures, we can see that when the number of epochs increases, which is actually associated with the use of optimisers, accuracy always increases in all methods. Fig. 6 shows that in no-attack mode, the accuracy is always higher than 98% for all datasets and all file properties with a sufficient number of epochs. However, with the two types of attacks, GAN-attack and FedGAN-attack, the accuracy value is drastically reduced; in some cases it reaches less than 70%. This level of reduction is particularly
Defence Methods. We adopt and adjust two scenarios of noticeable in the case of malware data and binary features. By
byzantine methods reported in [7], [37] and with some mod- comparing the accuracy plots of GAN-attack and FedGAN-
ifications apply them on Krum and Medium and utilize them attack, we can see that in most cases, FedGAN-attack is
as defence mechanisms against the proposed attacks. more disastrous and can reduce the accuracy more than GAN-
Feature selection and metric. We rank the features using attack. Also, FedGAN-attack could cause a wider breach in
RandomForestRegressor algorithm and select 300 of them with the data privacy for each compromised participant. Focusing
higher ranks. We use accuracy as our main metric for the on the dataset feature results in Fig. 6, it can be seen that the
experiments. Accuracy (A) is the ratio between the number accuracies of all methods on API features are smaller than the
1551-3203 (c) 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
Authorized licensed use limited to: Univ of Calif Santa Barbara. Downloaded on May 16,2021 at 10:27:12 UTC from IEEE Xplore. Restrictions apply.
This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication. Citation information: DOI 10.1109/TII.2020.3043458, IEEE
Transactions on Industrial Informatics
IEEE TRANSACTIONS ON INDUSTRIAL INFORMATICS, VOL. XX, NO. XX, DECEMBER 2020 8
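The poisoning loop described above (generated samples that the discriminator fails to flag are added to the training data as poison) can be sketched as follows; the random bit-vector generator and threshold discriminator are hypothetical stand-ins for the paper's CNN-based GAN networks, and the function names are ours:

```python
import random
from typing import Callable, List

BinaryVector = List[int]

def poison(generate: Callable[[], BinaryVector],
           is_adversarial: Callable[[BinaryVector], bool],
           budget: int, max_tries: int = 10_000) -> List[BinaryVector]:
    """Collect up to `budget` generated samples that the discriminator
    fails to flag; flagged samples are discarded back to the generator,
    mirroring the GAN loop described above."""
    poisoned: List[BinaryVector] = []
    for _ in range(max_tries):
        if len(poisoned) >= budget:
            break
        sample = generate()
        if not is_adversarial(sample):  # undetected -> becomes poison
            poisoned.append(sample)
    return poisoned

# Hypothetical stand-ins: random 8-bit feature vectors and a naive
# discriminator that flags any vector with more than 6 set bits.
random.seed(0)
generate = lambda: [random.randint(0, 1) for _ in range(8)]
is_adversarial = lambda v: sum(v) > 6
samples = poison(generate, is_adversarial, budget=300)
print(len(samples))
```

The budget of 300 mirrors the per-device limit on binary examples used in the experiments above; in the real attack the generator is trained, not random.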
[Fig. 6: three subplots of accuracy (%) versus epoch (25–300); accuracy spans roughly 60–95%.]

[Fig. 7: three subplots of accuracy (%) versus round (50–300); accuracy spans roughly 65–95%.]
accuracies of all methods on permission and intent features, which is because of the smaller number of API samples. The last point to be made about Fig. 6 is that the results presented are for 10 participants, only one of whom is an adversary. Obviously, increasing the number of adversaries while keeping the number of participants constant will cause the accuracy to decrease, while increasing the number of participants while keeping the number of adversaries constant will cause the accuracy to increase. As can be seen from the figures, after 300 epochs the results have reached a steady state, and it seems that increasing the number of epochs further would not change the results significantly.

(i) Comparing algorithms based on accuracy over different rounds: Fig. 7 shows the accuracy of the proposed algorithms for different numbers of running rounds on the various datasets. In this figure, with approximately 250 rounds, the accuracy reaches a steady state. This result is almost identical for all three datasets and all three types of API, permission, and intent files. In particular, Fig. 7a shows the accuracy associated with running the proposed attack methods on the Drebin dataset and confirms that when an attack has not yet taken place, even with 50 rounds, the accuracy for the different features is more than 93%. Using the GAN-based and FedGAN approaches, the accuracy is significantly reduced, to between 70% and 86%, respectively. As a result, with an increasing number of rounds the accuracy initially increases, but within at most 300 rounds of running we achieve an almost constant accuracy.

(ii) Comparing algorithms based on accuracy for different numbers of clients: Fig. 8 compares the proposed attack methods for different numbers of clients. In this figure, it is assumed that only one of the clients is an adversary. When the number of clients is small, the adversary can poison a higher percentage of the data, and in this case the accuracy will be lower. On the other hand, as the number of clients increases, each client, and therefore the adversary, uses a smaller percentage of the data to train its local model, resulting in less impact on accuracy. It should be noted that in the GAN-based attack algorithm, in which the model is created directly from the whole data, we have in effect changed the percentage of poisoned data. Focusing on the no-adversary cases (the dashed lines), we assume that the training process is distributed and observe a high accuracy rate. This figure also shows that as the number of clients increases, the accuracy decreases; this confirms that the model is affected by the larger amount of aggregated data, which influences the classification in FedGAN. Focusing on the attack algorithms, when an adversary is present among 5 clients, the accuracy is reduced to very low values on all datasets for both attack algorithms, and is lower for the GAN method.

(iii) Comparing algorithms based on accuracy for different numbers of adversaries: In Table II we present the accuracy results of the GAN and FedGAN attack approaches with 1 to 5 adversaries (Ad1, ..., Ad5) for different features on the various datasets. It is observed that increasing the number of adversaries in the FedGAN algorithm, or increasing the percentage of data poisoning in the GAN-based attack, decreases the accuracy.

Defence algorithm results: In the next experiment, we compare the adjusted defence algorithms and verify their efficiency for the various features and datasets. Specifically, Fig. 9 shows the results of using the Byzantine defence algorithm
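The Krum and coordinate-wise Median aggregation rules adopted as defences above (following [7], [37]) can be illustrated with a minimal stand-alone sketch; the function names and toy client updates are ours, not the paper's implementation:

```python
import math
from statistics import median
from typing import List

Vector = List[float]

def krum(updates: List[Vector], f: int) -> Vector:
    """Krum [7]: select the single client update whose summed squared
    distance to its n - f - 2 nearest neighbours is smallest."""
    n = len(updates)
    assert n > f + 2, "Krum needs n > f + 2 clients"
    scores = []
    for i, u in enumerate(updates):
        sq_dists = sorted(math.dist(u, v) ** 2
                          for j, v in enumerate(updates) if j != i)
        scores.append(sum(sq_dists[: n - f - 2]))  # closest n-f-2 only
    return updates[scores.index(min(scores))]

def coordinate_median(updates: List[Vector]) -> Vector:
    """Coordinate-wise median aggregation: robust to outlier updates."""
    return [median(col) for col in zip(*updates)]

# Four honest clients near [1, 1] and one poisoned update far away:
# both rules ignore the outlier, unlike plain averaging.
clients = [[1.0, 1.1], [0.9, 1.0], [1.1, 0.9], [1.0, 1.0], [50.0, -40.0]]
print(krum(clients, f=1))          # an honest update is selected
print(coordinate_median(clients))
```

This is only the aggregation step; the paper's defence additionally adjusts these rules and combines them with the GAN-based A3GAN monitoring on the server side.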
[Fig. 8: three subplots of accuracy (%) versus number of clients (5–15); accuracy spans roughly 50–85%.]
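The server-side averaging step that the client-count experiment in (ii) varies can be sketched as weighted federated averaging in the style of FedAvg [23]; the function name and toy values are illustrative, not the paper's code:

```python
from typing import List

Vector = List[float]

def fedavg(updates: List[Vector], weights: List[int]) -> Vector:
    """Weighted federated averaging: the server combines client updates
    in proportion to each client's local sample count."""
    total = sum(weights)
    dim = len(updates[0])
    return [sum(w * u[k] for u, w in zip(updates, weights)) / total
            for k in range(dim)]

# One adversary among three equally weighted clients drags the average
# far from the honest value ...
honest = [[1.0, 1.0], [1.0, 1.0]]
poisoned = [[10.0, 10.0]]
print(fedavg(honest + poisoned, weights=[100, 100, 100]))  # [4.0, 4.0]

# ... but with nine honest clients the same adversary moves it much less,
# matching the trend discussed in (ii).
print(fedavg([[1.0, 1.0]] * 9 + poisoned, weights=[1] * 10))
```

This plain average is exactly what the Byzantine rules above replace: as the honest share grows the poison dilutes, but a single unbounded update can still shift the mean arbitrarily.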
[Fig. 9 bar charts: accuracy (%) between 80% and 92% per feature group for the no-attack baseline, the FedGAN attack, and the A3GAN, BM, and BK defence combinations.]
(a) Drebin dataset. (b) Contagio dataset. (c) Genome dataset.
Fig. 9: The accuracy results of A3GAN and FedGAN adjusted with the Byzantine-Median (BM) and Byzantine-Krum (BK) defence approaches for different features on the various datasets. P:= permission; A:= API; I:= intent.
defence algorithm and adjust two Android malware detection schemes that use GAN and FL algorithms to accurately detect a malicious model and delete the poisoned samples. The results of a comprehensive set of experiments confirm that our methods outperform existing defence-based schemes in terms of accuracy. In future work, we will explore the use of robust ensemble learning based on a GAN model and analyse the anomalous behaviour of IIoT samples, especially for heterogeneous streams of Android applications. We will also consider robust data aggregation techniques, such as information fusion, to enhance the GAN and federated models in IIoT applications.

REFERENCES

[1] L. Da Xu, W. He, and S. Li, "Internet of things in industries: A survey," IEEE Transactions on Industrial Informatics, vol. 10, no. 4, pp. 2233–2243, 2014.
[2] M. Alazab, S. Huda, J. Abawajy, R. Islam, J. Yearwood, S. Venkatraman, and R. Broadhurst, "A hybrid wrapper-filter approach for malware detection," Journal of Networks, vol. 9, no. 11, pp. 2878–2891, 2011.
[3] L. Zhao, S. Hu, Q. Wang, J. Jiang, S. Chao, X. Luo, and P. Hu, "Shielding collaborative learning: Mitigating poisoning attacks through client-side detection," IEEE Transactions on Dependable and Secure Computing, 2020.
[4] M. Alazab, R. Layton, R. Broadhurst, and B. Bouhours, "Malicious spam emails developments and authorship attribution," in 2013 Fourth Cybercrime and Trustworthy Computing Workshop. IEEE, 2013, pp. 58–68.
[5] Q. Yang, Y. Liu, T. Chen, and Y. Tong, "Federated machine learning: Concept and applications," ACM Transactions on Intelligent Systems and Technology (TIST), vol. 10, no. 2, pp. 1–19, 2019.
[6] S. K. Lo, Q. Lu, C. Wang, H. Paik, and L. Zhu, "A systematic literature review on federated machine learning: From a software engineering perspective," arXiv preprint arXiv:2007.11354, 2020.
[7] P. Blanchard, R. Guerraoui, J. Stainer et al., "Machine learning with adversaries: Byzantine tolerant gradient descent," in Advances in Neural Information Processing Systems, 2017, pp. 119–129.
[8] E. Bagdasaryan, A. Veit, Y. Hua, D. Estrin, and V. Shmatikov, "How to backdoor federated learning," in International Conference on Artificial Intelligence and Statistics. PMLR, 2020, pp. 2938–2948.
[9] C. Zhang, S. Li, J. Xia, W. Wang, F. Yan, and Y. Liu, "BatchCrypt: Efficient homomorphic encryption for cross-silo federated learning," in 2020 USENIX Annual Technical Conference (USENIX ATC 20), 2020, pp. 493–506.
[10] Z. Sun, P. Kairouz, A. T. Suresh, and H. B. McMahan, "Can you really backdoor federated learning?" arXiv preprint arXiv:1911.07963, 2019.
[11] S. Mishra and S. Jain, "Ontologies as a semantic model in IoT," International Journal of Computers and Applications, vol. 42, no. 3, pp. 233–243, 2020.
[12] S. Li, Y. Cheng, W. Wang, Y. Liu, and T. Chen, "Learning to detect malicious clients for robust federated learning," arXiv preprint arXiv:2002.00211, 2020.
[13] S. Fu, C. Xie, B. Li, and Q. Chen, "Attack-resistant federated learning with residual-based reweighting," arXiv preprint arXiv:1912.11464, 2019.
[14] J. Zhang and C. Li, "Adversarial examples: Opportunities and challenges," IEEE Transactions on Neural Networks and Learning Systems, 2019.
[15] B. Biggio, K. Rieck, D. Ariu, C. Wressnegger, I. Corona, G. Giacinto, and F. Roli, "Poisoning behavioral malware clustering," in Proceedings of the 2014 Workshop on Artificial Intelligent and Security Workshop, 2014, pp. 27–36.
[16] Q. Han, V. Subrahmanian, and Y. Xiong, "Android malware detection via (somewhat) robust irreversible feature transformations," IEEE Transactions on Information Forensics and Security, 2020.
[17] H. Zhu, Y. Li, R. Li, J. Li, Z.-H. You, and H. Song, "SEDMDroid: An enhanced stacking ensemble of deep learning framework for Android malware detection," IEEE Transactions on Network Science and Engineering, 2020.
[18] J. Li, L. Sun, Q. Yan, Z. Li, W. Srisa-An, and H. Ye, "Significant permission identification for machine-learning-based Android malware detection," IEEE Transactions on Industrial Informatics, vol. 14, no. 7, pp. 3216–3225, 2018.
[19] H. Naeem, F. Ullah, M. R. Naeem, S. Khalid, D. Vasan, S. Jabbar, and S. Saeed, "Malware detection in industrial internet of things based on hybrid image visualization and deep learning model," Ad Hoc Networks, p. 102154, 2020.
[20] J. Dean, G. Corrado, R. Monga, K. Chen, M. Devin, M. Mao, M. Ranzato, A. Senior, P. Tucker, K. Yang et al., "Large scale distributed deep networks," in Advances in Neural Information Processing Systems, 2012, pp. 1223–1231.
[21] Y. Song, T. Liu, T. Wei, X. Wang, Z. Tao, and M. Chen, "FDA3: Federated defense against adversarial attacks for cloud-based IIoT applications," IEEE Transactions on Industrial Informatics, 2020.
[22] M. Fang, X. Cao, J. Jia, and N. Gong, "Local model poisoning attacks to byzantine-robust federated learning," in 29th USENIX Security Symposium (USENIX Security 20), 2020, pp. 1605–1622.
[23] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-efficient learning of deep networks from decentralized data," in Artificial Intelligence and Statistics. PMLR, 2017, pp. 1273–1282.
[24] F. Sattler, S. Wiedemann, K.-R. Müller, and W. Samek, "Robust and communication-efficient federated learning from non-iid data," IEEE Transactions on Neural Networks and Learning Systems, 2019.
[25] Y. Chen, L. Su, and J. Xu, "Distributed statistical machine learning in adversarial settings: Byzantine gradient descent," Proceedings of the ACM on Measurement and Analysis of Computing Systems, vol. 1, no. 2, pp. 1–25, 2017.
[26] L. Chen, H. Wang, Z. Charles, and D. Papailiopoulos, "DRACO: Byzantine-resilient distributed training via redundant gradients," arXiv preprint arXiv:1803.09877, 2018.
[27] L. Muñoz-González, K. T. Co, and E. C. Lupu, "Byzantine-robust federated machine learning through adaptive model averaging," arXiv preprint arXiv:1909.05125, 2019.
[28] W. Zhang, T. Zhou, Q. Lu, X. Wang, C. Zhu, Z. Wang, and F. Wang, "Dynamic fusion based federated learning for COVID-19 detection," arXiv preprint arXiv:2009.10401, 2020.
[29] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtárik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," arXiv preprint arXiv:1610.05492, 2016.
[30] M. Alazab and R. Broadhurst, "An analysis of the nature of spam as cybercrime," in Cyber-Physical Security. Springer, 2017, pp. 251–266.
[31] W. Zhang, Q. Lu, Q. Yu, Z. Li, Y. Liu, S. K. Lo, S. Chen, X. Xu, and L. Zhu, "Blockchain-based federated learning for device failure detection in industrial IoT."
[32] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio, "Generative adversarial nets," in Proc. of NIPS, 2014, pp. 2672–2680.
[33] S. Li, Y. Cheng, Y. Liu, W. Wang, and T. Chen, "Abnormal client behavior detection in federated learning," arXiv preprint arXiv:1910.09933, 2019.
[34] D. Arp, M. Spreitzenbarth, H. Gascon, K. Rieck, and C. Siemens, "Drebin: Effective and explainable detection of Android malware in your pocket," in Proc. of NDSS, 2014.
[35] X. Jiang and Y. Zhou, "Dissecting Android malware: Characterization and evolution," in Proc. of IEEE S&P, 2012, pp. 95–109.
[36] "Contagio dataset," http://contagiominidump.blogspot.com/, 2018, [Online; accessed 04-December-2020].
[37] M. B. Cohen, Y. T. Lee, G. Miller, J. Pachocki, and A. Sidford, "Geometric median in nearly linear time," in Proceedings of the Forty-Eighth Annual ACM Symposium on Theory of Computing, 2016, pp. 9–21.
[38] "Fed-IIoT source code," https://github.com/mshojafar/sourcecodes/raw/master/FeD-IIoT_sourcecode.zip, 2020.

Rahim Tafazolli (SM'09) is a professor and the Director of the Institute for Communication Systems (ICS) and 6G Innovation Centre (6GIC) at the University of Surrey in the UK. He has over 30 years of experience in digital communications research and teaching. He has published more than 500 research papers in refereed journals, at international conferences, and as an invited speaker. He is the editor of two books on "Technologies for Wireless Future", published by Wiley (Vol. 1 in 2004 and Vol. 2 in 2006), and is co-inventor on more than 30 granted patents, all in the field of digital communications. He was appointed a Fellow of the WWRF (Wireless World Research Forum) in April 2011 in recognition of his personal contribution to the wireless world, as well as heading one of Europe's leading research groups. He is regularly invited by governments to advise on network and 5G technologies and was advisor to the Mayor of London with regard to the London Infrastructure Investment 2050 Plan during May and June 2014. For more information: https://www.surrey.ac.uk/people/rahim-tafazolli