

ChainsFL: Blockchain-driven Federated Learning


from Design to Realization
Shuo Yuan, Bin Cao∗ , Mugen Peng, Yaohua Sun

State Key Laboratory of Networking and Switching Technology,


Beijing University of Posts and Telecommunications, Beijing, 100876, China
Email: {yuanshuo, caobin, pmg, sunyaohua}@bupt.edu.cn
∗Bin Cao is the corresponding author.

Abstract—Despite the advantages of Federated Learning (FL), such as devolving model training to intelligent devices and preserving data privacy, FL still faces the risk of a single point of failure and of attacks from malicious participants. Recently, blockchain has been considered a promising solution that can transform FL training into a decentralized manner and improve security during training. However, traditional consensus mechanisms and architectures for blockchain can hardly handle large-scale FL tasks due to huge resource consumption, limited throughput, and high communication complexity. To this end, this paper proposes a two-layer blockchain-driven FL framework, called ChainsFL, which is composed of multiple Raft-based shard networks (layer-1) and a Directed Acyclic Graph (DAG)-based main chain (layer-2), where layer-1 limits the scale of each shard to keep information exchange within a small range, and layer-2 allows each shard to update and share the model in parallel and asynchronously. Furthermore, the FL procedure is designed in a blockchain manner, and a refined DAG consensus mechanism is proposed to mitigate the effect of stale models. To provide a proof-of-concept implementation and evaluation, the shard blockchain based on Hyperledger Fabric is deployed on self-made gateways as layer-1, and the self-developed DAG-based main chain is deployed on a personal computer as layer-2. The experimental results show that ChainsFL provides acceptable, and sometimes better, training efficiency and stronger robustness compared with typical existing FL systems.
I. INTRODUCTION

The Internet of Everything brings not only massive amounts of data but also considerable distributed computation and storage resources. However, it is challenging to use these data and resources effectively while preserving privacy. To this end, Federated Learning (FL) has been proposed as a distributed training technology in which devices (e.g., mobile phones or vehicles) collaborate to train a model while keeping the training data local [1].

In FL, the global model is aggregated by a central server based on the local models trained by each device over its local raw data [2]. To enhance the availability of FL, many existing works investigate and optimize the performance of FL in wireless networks. For example, [3] formulates FL over a wireless network as an optimization problem to analyze the impact of the wireless environment on the completion time of an FL task. Besides, the radio resources are optimized in [4] to schedule devices and minimize the convergence time of FL. However, the security and efficiency of traditional centralized FL remain challenging in widespread applications, and the challenges can be summarized as follows:

• Security. Traditional FL, which relies on a central FL server to orchestrate the entire training process, is vulnerable to a single point of failure and targeted attacks, which lead to service paralysis. Moreover, valid local models will be distorted by a fallacious global model aggregated at the FL server. Besides, traditional FL cannot handle the effect of malicious devices on the accuracy of the global model.
• Efficiency. Nowadays, most FL architectures run in a synchronous manner, so training is inevitably slowed down by stragglers, i.e., devices that take a long time to complete one training round [5]. On the other hand, in asynchronous training, stale models trained from an older version of the global model may be used to update the latest global model and make the global model unstable [6].

To address the aforementioned challenges, blockchain technology [7] [8], with its decentralized and tamper-resistant characteristics, has been introduced into FL. By leveraging blockchain, [9] proposed BlockFL to carry out synchronous training in a decentralized manner. Thus, the single point of failure and targeted attacks can be overcome, and all local model updates are verified during the Proof-of-Work (PoW) consensus. Besides, Multi-access Edge Computing (MEC) servers are adopted in [10] to provide computing resources to resource-constrained devices and to act as miners that process the received local models. Although these architectures improve the security of the FL training process, the PoW consensus brings a huge computation cost and limits the throughput of the system.

Different from [9] and [10], which use a public blockchain, we employ a consortium blockchain suitable for the MEC environment, where edge nodes are used as blockchain nodes and devices need to be licensed to access the blockchain [11]. Moreover, compared with PoW, the Raft consensus [12] avoids flaws such as high computation cost and long confirmation delay, and achieves a significant improvement in throughput. Thus, we adopt Raft to achieve a fast, low-complexity consensus to support decentralized FL. However, the throughput of Raft is limited by the maximum performance of a single node with limited resources. So how do we improve the scalability of the Raft-based blockchain to support a large-scale FL task?

To this end, we split the edge nodes into multiple groups, where each group forms one shard that deploys one Raft-based blockchain to interact with the devices joining the task, and each shard executes FL training independently. Hence, the limited scalability and high communication complexity caused by a single large-scale Raft-based blockchain are tackled well. However, the time each shard takes to complete one iteration of synchronous training differs, which makes it hard to aggregate and update the global model from all the shards. So another critical problem is: how do we efficiently coordinate and integrate the models trained by all shards to update the global model?

To address this problem, we construct a main chain to store and share the models trained by the shards, and we adopt the Directed Acyclic Graph (DAG) consensus [13] to process these models asynchronously. Thus, a two-layer blockchain-driven FL framework called ChainsFL is set up in this paper, in which synchronous and asynchronous training are combined. Besides, aggregating the basic model for the next shard training iteration and validating the transactions in the DAG are combined into one process in ChainsFL.

Based on the above observations and discussions, the main contributions of this work can be summarized as follows.
• We propose ChainsFL, a federated learning framework driven by a two-layer blockchain. We design a Raft-based blockchain sharding architecture to improve scalability and a refined DAG-based main chain to achieve cross-shard interaction. To the best of our knowledge, ChainsFL is the first framework that uses a DAG to coordinate multiple shard blockchain networks to improve the security and scalability of FL systems.
• We define the operation process and design a set of interaction rules for ChainsFL to perform FL tasks. To improve FL efficiency, synchronous and asynchronous training are combined in ChainsFL. Moreover, a virtual pruning mechanism based on the refined DAG consensus is proposed to eliminate the impact of stale models on the global model.
• We establish a sharding network prototype based on Hyperledger Fabric to implement layer-1, and we develop a DAG-based main chain to implement layer-2 as well as the cross-layer interaction. Extensive evaluation results show that ChainsFL provides acceptable, and sometimes better, convergence rates compared with FedAvg [1] and AsynFL [6] for CNNs, and that it enhances the robustness of the training architecture.

II. FRAMEWORK OF CHAINSFL

In this section, we illustrate the framework and key components of ChainsFL. ChainsFL consists of a two-layer blockchain in which layer-1 is formed by multiple Raft-based blockchains and layer-2 by one DAG-based main chain. For layer-1, a set of edge nodes is partitioned into multiple independent shards to deploy these blockchains. Besides, the main chain is deployed on many distributed nodes to coordinate and validate the transactions committed by the shards in a decentralized manner. For simplicity, the descriptions below omit the details of the interaction between the shards and the main chain, which are given in Section III.
A. Raft-based Shard Blockchain

The Raft-based blockchain in each shard is deployed on several edge nodes with substantial computation and storage capabilities, which are responsible for coordinating the devices they serve to complete the training task. We assume that the device set $\{d_1, d_2, \cdots, d_n\}$ is selected by the blockchain of shard #1 to participate in the FL training task and that the datasets of these devices are $\{D_1, D_2, \cdots, D_n\}$. Without loss of generality, assume that each training sample in a dataset is an input-output pair $(x, y)$. The parameter set of the FL model is denoted as $w$. For each sample $i$, the loss function of the machine learning problem is defined as $f_i(w) = l(x_i, y_i \mid w)$. Thus, the loss function on the mini-batch $b_j$, which is sampled from $D_j$ and contains $k$ samples, can be written as $f_{b_j}(w)$. The goal of device $j$ is to solve the following optimization problem:
$$\min_{w} F_j(w) = \mathbb{E}_{b_j \sim D_j} f_{b_j}(w). \qquad (1)$$
The local model of device $j$ is updated according to $w_j \leftarrow w_j - \mu \nabla f_{b_j}(w_j)$, where $\mu$ is the learning rate of the device, as sketched below.
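To make the local update rule concrete, the following is a minimal sketch (ours, not the authors' code) of the mini-batch SGD step $w_j \leftarrow w_j - \mu \nabla f_{b_j}(w_j)$; `grad_loss` is a hypothetical helper returning the mini-batch gradient.

```python
import random

def local_update(w, data, grad_loss, mu=0.01, batch_size=10, epochs=5):
    """One device's local training: repeated mini-batch SGD steps.

    w         : current model parameters (e.g., a NumPy array)
    data      : list of (x, y) training samples held by the device
    grad_loss : hypothetical helper, grad_loss(w, batch) -> gradient of f_{b_j}
    """
    for _ in range(epochs):
        batch = random.sample(data, batch_size)   # b_j ~ D_j
        w = w - mu * grad_loss(w, batch)          # w_j <- w_j - mu * grad
    return w
```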
In addition, the FederatedAveraging algorithm [1] is adopted by each shard to aggregate the updated local models. The loss function of shard #1 on the decentralized datasets can then be expressed as
$$G_{s1}(w) = \sum_{j=1}^{n} \frac{|D_j|}{D} F_j(w), \qquad (2)$$
where $D = \sum_{j=1}^{n} |D_j|$ is the total size of the datasets used in this shard training round. As the devices selected in round $t$ upload the parameters of their updated local models, the model parameters of shard #1, $w_{s1}$, are updated through the weighted aggregation of all the updated local models' parameters, i.e.,
$$w_{s1}(t) = \sum_{j=1}^{n} \frac{|D_j| \, w_j(t)}{D}. \qquad (3)$$
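As a concrete reading of Eq. (3), a shard leader's aggregation step might look like the following sketch; the function name and data layout are our assumptions, not the paper's code.

```python
def aggregate_shard_model(local_models, dataset_sizes):
    """Dataset-size-weighted averaging of local models, as in Eq. (3).

    local_models  : list of parameter arrays w_j(t), one per device
    dataset_sizes : list of |D_j| for the same devices
    """
    D = float(sum(dataset_sizes))  # total data used in this round
    # w_s1(t) = sum_j (|D_j| / D) * w_j(t)
    shard_model = sum((n_j / D) * w_j
                      for n_j, w_j in zip(dataset_sizes, local_models))
    return shard_model
```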
After the shard training iteration is completed, the updated shard model is packed as a transaction and then published to the main chain.

B. DAG-based Main Chain

The DAG-based asynchronous consensus mechanism (known as DAG consensus or tangle consensus [13] [14]) is adopted in the main chain to handle the interactions with the shards. In the main chain, a tangle graph is constructed in which vertices represent transactions and edges denote the approval of another transaction. Each transaction contains the parameters of one model trained by a shard. Besides, transactions that have not been approved by any other transaction are called tips. Different from other blockchains, the DAG-based main chain does not rely on a single chain as the single source of confidence, thanks to the tangle graph. Each node in the DAG network maintains a local ledger that can be used to construct the tangle graph. Besides, a node belonging to a shard blockchain network can also join the DAG network.


[Fig. 1. Overview of the FL procedure in ChainsFL: shards #1–#M run Raft-based blockchains deployed on edge nodes; each shard gets the genesis block from the main chain, devices download the basic model from the shard blockchain and upload updated local models, and each shard selects two tips from the DAG-based main chain to aggregate the basic model for the current iteration before uploading its aggregated shard model.]

Algorithm 1 Updated Local Model Validation
Node in shard executes:
    receive the transactions from devices
    for all received transactions do
        A_new ← validate the accuracy of the model in the transaction
        if A_new > A_τ then
            forward the transaction to the leader in the shard
        else
            mark the transaction invalid and discard it
        end if
    end for
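The follower-side check in Algorithm 1 amounts to a simple accuracy gate. A minimal sketch under our own naming assumptions: `evaluate` is a hypothetical helper returning test accuracy on the dataset from g0, and `forward_to_leader` stands in for the shard's transaction relay.

```python
def validate_local_updates(transactions, test_set, a_tau,
                           evaluate, forward_to_leader):
    """Algorithm 1: accept a local model only if its accuracy beats A_tau.

    transactions      : iterable of received local-model transactions
    test_set          : test dataset stored in the genesis transaction g0
    a_tau             : threshold, e.g., accuracy of the current basic model
    evaluate          : assumed helper, evaluate(model, test_set) -> accuracy
    forward_to_leader : assumed helper that relays a valid transaction
    """
    for tx in transactions:
        a_new = evaluate(tx["model"], test_set)
        if a_new > a_tau:
            forward_to_leader(tx)   # valid: pass on to the shard leader
        # else: mark invalid and discard (simply drop it here)
```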
III. FEDERATED LEARNING PROCEDURE IN CHAINSFL

To perform the FL task in a decentralized manner, we define the operation process and design a set of interaction rules to orchestrate the devices and compute nodes in ChainsFL, as shown in Fig. 1. More details about this procedure are given as follows.

Phase 1: Task Publication. The task publisher signs a smart contract to release the FL task. The information of this task, such as the structure and parameters of the initial model, the configurations for shard training, and the conditions for ending this task, is stated in this contract. Then the smart contract creates the genesis transaction of the main chain (denoted as g0), which includes this information and a test dataset. Meanwhile, the related shard networks are triggered by this smart contract to run the training task.

Phase 2: Shard Training. The genesis transaction is pulled by the leader of each triggered shard blockchain. Then the configurations and requirements of this task are stored in the distributed ledger of the shard and used to initiate the training process. The interactions between the devices and the shard blockchain, and the details of the training process within one shard, are as follows:
• 1) Device Selection: The leader of the shard blockchain selects participants for shard training based on availability indicators, such as the power status and network status reported by the devices periodically. Only devices that are willing to participate in training and that have abundant battery and stable network coverage are selected. These devices are then authorized to access the shard blockchain to download the basic model for local training and to upload transactions. It is worth noting that the device selection in each shard is independent.
• 2) Local Update: Based on the basic model obtained from the shard blockchain, the device runs the training process over its local raw data to solve problem (1). Afterward, the updated local model is sent to the blockchain node once the training goal, such as a preset number of local epochs or a target accuracy, is achieved.
• 3) Model Aggregation: As shown in Algorithm 1, each blockchain node receives local models and validates their accuracy on the test dataset stored in g0, where A_τ is a preset threshold and can be set to the accuracy of the basic model used in the current training round. Next, the leader of the shard blockchain aggregates the valid local models according to (3) to update the shard model. Due to the synchronous training within each shard, the aggregation is triggered only if enough devices upload their local models in time; otherwise, the round is abandoned. The process from device selection to model aggregation is called one shard training round. If the current iteration has not ended, the updated shard model is packed as a transaction of the shard blockchain and published within the shard to provide the basic model for the next shard training round.

Phase 3: Shard Model Committing and Basic Model Aggregation. When the shard has completed the rounds required in the current iteration, the latest aggregated shard model is packed as a transaction of the main chain and committed to the main chain. The details of these processes are shown in Algorithm 2.

In addition, the smart contract monitors the main chain and performs an operation similar to the basic model aggregation in Algorithm 2 to aggregate the global model. The number of tips used to aggregate the global model is stated in the smart contract. Besides, the task publisher is able to aggregate the global model from any location that can access the main chain. On the other hand, devices licensed to access the edge shard blockchain can choose to join the training task to improve the model accuracy, or to aggregate the latest model and obtain the latest intelligent service, anytime and anywhere, without any centralized management. Thus, the vision of "providing AI for every person and every organization at everywhere" can be realized in an efficient way.

Algorithm 2 Tips Choosing and Shard Model Committing
if the current iteration is the 1st iteration then
    w_bm ← extract the initial parameters from g0
    ApproveSet ← {g0}
else
    (w'_1, w'_2, ..., w'_η) ← choose η tips from the tangle graph
    (A'_1, A'_2, ..., A'_η) ← validate the accuracy of the model in each chosen tip
    w_bm ← (Σ_{i=1}^{λ} w'_i) / λ, aggregating the λ (λ < η) tips with the highest accuracy
    ApproveSet ← these λ tips
end if
w_new ← ShardTrain(w_bm)
g ← package w_new and the IDs of all transactions in ApproveSet as a transaction
commit g to the main chain

[Fig. 2. Three situations in the lifecycle of each tip: after being committed to the DAG and selected as a candidate, a tip is either approved by another transaction within the freshness time (normal growth), approved only after the freshness time expires, or never approved; tips that outlive the freshness time are judged as stale models, are no longer selected, and are virtually pruned.]
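A compact sketch of the tip-selection step in Algorithm 2, under our own naming assumptions: pick η tips, score them on the g0 test set with an assumed `evaluate` helper, and average the λ most accurate to form the basic model.

```python
import random

def choose_basic_model(tangle_tips, test_set, evaluate, eta=3, lam=2):
    """Algorithm 2 (else-branch): build the basic model from DAG tips.

    tangle_tips : list of tip transactions, each carrying model parameters
    evaluate    : assumed helper, evaluate(model, test_set) -> accuracy
    eta, lam    : number of tips examined / aggregated (lam < eta)
    """
    candidates = random.sample(tangle_tips, min(eta, len(tangle_tips)))
    scored = sorted(candidates,
                    key=lambda tip: evaluate(tip["model"], test_set),
                    reverse=True)
    best = scored[:lam]                                   # most accurate tips
    w_bm = sum(tip["model"] for tip in best) / len(best)  # simple average
    approve_set = [tip["id"] for tip in best]             # tips this shard approves
    return w_bm, approve_set
```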
IV. CONSENSUS IN CHAINSFL

As described in Section II, the Raft consensus and the DAG consensus are adopted in layer-1 and layer-2, respectively.

A. Raft Consensus

The edge nodes in a Raft consensus-based shard blockchain network are divided into a leader and followers. The details of leader election are not the focus of this paper and can be found in [12]. As shown in Algorithm 1, followers are responsible for validating received transactions (updated local models) and forwarding the valid ones to the leader. The leader sorts these transactions according to their generation time. When the cumulative size of the transactions reaches a threshold or the block period is over, the leader creates a block and broadcasts it to all followers. Moreover, consensus on this block is achieved once the leader has received responses from half of all followers, as sketched below.
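The leader-side behavior described above can be summarized in a few lines. This is a schematic sketch under our assumptions, not Fabric's actual ordering-service code; `broadcast` is a hypothetical helper that sends a block and returns the number of follower acknowledgements.

```python
def leader_step(pending_txs, block_size_limit, period_over,
                broadcast, num_followers):
    """Schematic Raft-leader behavior for the shard blockchain.

    pending_txs : valid transactions forwarded by followers
    broadcast   : assumed helper; sends a block to all followers and
                  returns the number of acknowledgements received
    """
    # Sort forwarded transactions by their generation time.
    pending_txs.sort(key=lambda tx: tx["generated_at"])
    total_size = sum(tx["size"] for tx in pending_txs)
    if total_size >= block_size_limit or period_over:
        block = {"txs": pending_txs}
        acks = broadcast(block)
        # Consensus is reached once half of all followers respond.
        return acks >= num_followers / 2
    return False
```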
By partitioning a large-scale blockchain network into multiple independent shards that run consensus in parallel and store data separately, the system's overall throughput is scaled effectively. Besides, most data only needs to be synchronized within a shard instead of being broadcast to the entire network, which considerably reduces communication rounds and speeds up transaction processing. On the other hand, the influence of stragglers is limited to the shards to which the stragglers belong instead of the entire network.

B. DAG Consensus-based Virtual Pruning

The main chain in ChainsFL is used to share the model trained by each shard and to validate these models. As described in Algorithm 2, the leader aggregates the basic model using λ tips for the next shard training iteration. These tips are then approved by the shard model trained from this basic model in the current iteration. A model can be regarded as having reached consensus in the main chain when it is directly or indirectly approved by all new models committed to the main chain.

Due to the graph structure of the main chain, models that are not approved by other models for a long time (called stale models) are always treated as tips. Thus, the ratio of stale models among all tips increases if they are not dealt with in a timely manner, and the probability that a shard selects a stale model when aggregating the basic model increases accordingly. To eliminate stale models, a waiting time, called the freshness time, is set in the main chain. As shown in Fig. 2, each tip experiences one of three situations during its lifecycle, and tips in the third situation are judged as stale and are not selected by shards. On the other hand, with the validation performed during the DAG consensus described in Algorithm 2, a low-accuracy model among the selected η tips has a high probability of being discarded. Therefore, most of the tips judged as stale are models with lower accuracy, and they are pruned from the tangle graph.
V. IMPLEMENTATION

To implement and then evaluate the proposed ChainsFL, we set up the Raft-based blockchain on Hyperledger Fabric [15], which includes the Raft consensus, and we develop a refined DAG-based main chain. Due to the limited block size in Fabric, it is hard to store a model file directly in the block body. Thus, the peer-to-peer file system InterPlanetary File System (IPFS) [16] is used in ChainsFL to implement off-chain storage. When a file is added to IPFS, IPFS returns a hash value that uniquely identifies the file's contents; this hash can then be used to locate and download the file across the system [16]. The transactions in ChainsFL therefore store the hash of the parameter file to guarantee its immutability. The details of Fabric and IPFS are omitted to save space. The source code of the implementation is available on GitHub (https://github.com/shuoyuan/ChainsFL-implementation).


[Fig. 3. The implementation of ChainsFL in the real environment: layer-2 is a DAG host (PC), and layer-1 consists of Shards #1–#3, each a gateway deploying one leader and two followers.]

The implementation of ChainsFL, excluding the participating devices, is shown in Fig. 3, and the following details are considered:
• Device: lightweight computers are used as the participants of the FL task to train the local models.
• Raft-based Blockchain Node: multiple self-made gateways are used as the blockchain nodes that deploy the Fabric blockchain and as the access points that provide radio access for the devices.
• Main Chain: a personal computer (PC) is used to deploy the self-developed DAG-based main chain.
To accelerate and simplify the execution of massive experiments, the main chain is deployed on one computer (one node) instead of on many distributed nodes in our real-environment setup; since the interactions between the shard blockchain leaders and the main chain are unaffected by this deployment scheme, the performance of federated learning over ChainsFL is not disturbed. Due to the limited number of gateways, a complete Fabric blockchain containing one leader and two followers is configured on each gateway, as the Raft consensus in Fabric requires at least three nodes (one leader and two followers). Thus, a gateway and the devices it serves form one shard network as defined in ChainsFL.

VI. EVALUATION AND DISCUSSION

In this section, we empirically evaluate ChainsFL in terms of its convergence and its robustness against malicious model attacks from devices and shards.

A. Evaluation Specifics

With the implemented ChainsFL described in Section V, an object classification task is adopted to evaluate the efficiency and security of ChainsFL. The image dataset MNIST, which includes 60000 training images and 10000 testing images, is used. The synchronous algorithm FedAvg [1] and the asynchronous algorithm AsynFL [6] are used as baselines. To simulate real data characteristics, a non-IID data setting is adopted in all experiments: we sort the MNIST training set by digit label, divide it into 200 groups of 300 images each, and assign 2 groups to each of 100 devices, as sketched below.
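The non-IID partition just described can be reproduced in a few lines. A sketch assuming the 60000 training labels are available as a NumPy array (function and parameter names are ours):

```python
import numpy as np

def partition_mnist_noniid(labels, num_devices=100, groups_per_device=2,
                           group_size=300, seed=0):
    """Sort sample indices by digit label, split into 200 groups of 300
    images, and assign each device 2 groups (so each device sees ~2 digits).
    Returns a dict: device id -> array of sample indices."""
    order = np.argsort(labels)              # 60000 indices, sorted by label
    groups = order.reshape(-1, group_size)  # 200 groups x 300 images
    rng = np.random.default_rng(seed)
    picks = rng.permutation(len(groups))    # random, non-overlapping groups
    return {d: groups[picks[d * groups_per_device:
                            (d + 1) * groups_per_device]].reshape(-1)
            for d in range(num_devices)}
```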
A Convolutional Neural Network (CNN) with three convolutional layers and two fully connected layers is used in the training task. We take the learning rate μ = 0.01, mini-batch size B = 10, and number of local epochs E = 5 for all devices.

For ChainsFL, one shard training round is configured for each shard training iteration. Three shards are configured in ChainsFL, and k = 10 devices are selected for each shard without overlap. The parameters related to the aggregation of the basic model are set as λ = 2 and η = 3. For FedAvg, 10 devices are selected in each global epoch. For AsynFL, in each global epoch one device with local model w_lm is selected from the 100 devices and used to update the global model w_gm according to the update rule w_gm ← (1/2) w_gm + (1/2) w_lm. To compare the three training paradigms fairly, we conduct two comparisons: metrics versus the number of global epochs and metrics versus the number of gradients. For the first comparison, the basic model aggregated by one shard can be treated as the global model; thus, one shard training iteration can be treated as one global epoch. For the second comparison, the gradients generated in each local epoch of the devices are counted. For FedAvg, since the number of local epochs is 5 and 10 devices are selected, 50 gradients are generated in each global epoch; similarly, 50 gradients are used in each shard iteration in ChainsFL. For AsynFL, 5 gradients are used in each global epoch. Each experiment is run 10 times; the averages of these runs are drawn as curves, and their standard deviations are also shown in the figures.

[Fig. 4. Metrics vs. number of global epochs for the MNIST CNN (non-IID): (a) accuracy on the testing set; (b) cross entropy on the training set.]
[Fig. 5. Metrics vs. number of gradients for the MNIST CNN (non-IID): (a) accuracy on the testing set; (b) cross entropy on the training set.]

B. Convergence

The convergence as the number of global epochs and the number of gradients increase is shown in Fig. 4 and Fig. 5, respectively. In Fig. 4(a), we observe that ChainsFL converges faster than FedAvg and AsynFL for the same number of global epochs. The reason is that each shard aggregates its basic model using tips from the DAG, which means that the updated shard models are shared among all shards in ChainsFL. Besides, AsynFL uses only one device per global epoch, which makes it converge more slowly than ChainsFL and FedAvg.


Admittedly, the first comparison scheme is not entirely fair to AsynFL, since the computation performed by AsynFL in each global epoch is much less than that of FedAvg and ChainsFL. As shown in Fig. 5(a), the comparison based on the number of gradients shows that AsynFL achieves the fastest convergence, which is in line with the conclusion of [6]. As training progresses, the accuracy of ChainsFL gradually approaches that of AsynFL. Besides, the training loss in these experiments supports the same conclusions as the test accuracy.

[Fig. 6. Test accuracy with malicious devices for the MNIST CNN (non-IID): (a) accuracy of different FL paradigms with malicious devices; (b) accuracy of ChainsFL with different numbers of malicious devices.]

C. Robustness

To compare the robustness of the three paradigms, we configure 20% of the selected devices as malicious devices. The model from a malicious device or a malicious shard is set to the same fixed model with a test accuracy of 0.07 (see the sketch below). In detail, two malicious devices are configured among the 10 selected devices for FedAvg, and 20 malicious devices are configured among the 100 devices for AsynFL. As shown in Fig. 6(a), the accuracy of ChainsFL with two malicious devices per shard is higher than that of the other paradigms, thanks to the validation by the shard blockchain nodes and the virtual pruning in the main chain. It is clear that the accuracy of FedAvg drops significantly, and that the curve of AsynFL is chaotic and shows no sign of convergence.
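The attack model above simply replaces an honest update with a fixed low-accuracy model. A sketch (all names ours) of how such devices can be simulated in an experiment:

```python
import random

def make_updates(devices, malicious_frac, honest_update, bad_model):
    """Simulate the attack: a malicious device always submits the same
    fixed model (test accuracy ~0.07) instead of a trained update.

    honest_update : assumed helper running real local training for a device
    bad_model     : fixed low-accuracy parameters submitted by attackers
    """
    k = int(malicious_frac * len(devices))           # e.g., 20% of devices
    bad = set(random.sample(range(len(devices)), k))
    return [bad_model if i in bad else honest_update(dev)
            for i, dev in enumerate(devices)]
```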
To evaluate the impact of malicious devices or shards on the global model of ChainsFL, we design three attack schemes: two malicious devices per shard, five malicious devices per shard, and one malicious shard. As shown in Fig. 6(b), compared with the schemes in which each shard contains two or five malicious devices, a malicious shard has less impact on the performance of ChainsFL. The reason is that the DAG consensus-based virtual pruning effectively eliminates the model uploaded by the malicious shard. Although malicious devices are detected and eliminated during the shard consensus in the first two schemes, the number of valid devices participating in each shard iteration is thereby also reduced.

VII. CONCLUSION

In this paper, a two-layer blockchain-driven federated learning framework is proposed to ensure the security and efficiency of distributed training. Through the Raft consensus used in layer-1 and the DAG consensus used in layer-2, asynchronous and synchronous optimization are effectively combined to deal with stragglers and stale models, and a cross-layer operation procedure is proposed to achieve consensus among all shards on the updated shard models. Besides, a prototype of ChainsFL is developed, and massive experiments are run on this prototype. The experimental results demonstrate the effectiveness and robustness of ChainsFL.

VIII. ACKNOWLEDGEMENT

This work was supported in part by the National Natural Science Foundation of China under Grant 61921003, Grant 61925101, Grant 61831002, Grant 61701059, and Grant 61941114; in part by the Beijing Natural Science Foundation under Grant JQ18016; in part by the Eighteenth Open Foundation of the State Key Lab of Integrated Services Networks of Xidian University under Grant ISN20-05; in part by the Chongqing Technological Innovation and Application Development Projects under Grant cstc2019jscx-msxm1322; in part by the Fundamental Research Funds for the Central Universities under Grant 2020RC10 and Grant 2020RC11; and in part by the BUPT Excellent Ph.D. Students Foundation under Grant CX2020106.

REFERENCES
[1] B. McMahan, E. Moore, D. Ramage, S. Hampson, and B. A. y Arcas, "Communication-Efficient Learning of Deep Networks from Decentralized Data," in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS'17), Fort Lauderdale, FL, USA, Apr. 2017, pp. 1273–1282.
[2] K. Bonawitz, H. Eichner, W. Grieskamp, D. Huba, A. Ingerman, V. Ivanov, C. Kiddon et al., "Towards Federated Learning at Scale: System Design," arXiv preprint arXiv:1902.01046, Mar. 2019.
[3] N. H. Tran, W. Bao, A. Zomaya, M. N. H. Nguyen, and C. S. Hong, "Federated Learning over Wireless Networks: Optimization Model Design and Analysis," in Proc. IEEE INFOCOM'19, Apr. 2019, pp. 1387–1395.
[4] M. Chen, H. V. Poor, W. Saad, and S. Cui, "Convergence Time Optimization for Federated Learning over Wireless Networks," arXiv preprint arXiv:2001.07845, Jan. 2020.
[5] S. Dutta, G. Joshi, S. Ghosh, P. Dube, and P. Nagpurkar, "Slow and Stale Gradients Can Win the Race: Error-Runtime Trade-offs in Distributed SGD," in Proc. Int. Conf. Artif. Intell. Stat. (AISTATS'18), Playa Blanca, Lanzarote, Canary Islands, Spain, Aug. 2018.
[6] C. Xie, S. Koyejo, and I. Gupta, "Asynchronous Federated Optimization," arXiv preprint arXiv:1903.03934, Sep. 2019.
[7] S. Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System," bitcoin.org, pp. 1–9, 2008.
[8] B. Cao, Y. Li, L. Zhang, L. Zhang, S. Mumtaz, Z. Zhou, and M. Peng, "When Internet of Things Meets Blockchain: Challenges in Distributed Consensus," IEEE Netw., vol. 33, no. 6, pp. 133–139, Nov. 2019.
[9] H. Kim, J. Park, M. Bennis, and S.-L. Kim, "Blockchained On-Device Federated Learning," IEEE Commun. Lett., pp. 1–4, 2019.
[10] U. Majeed and C. S. Hong, "FLchain: Federated Learning via MEC-enabled Blockchain Network," in Proc. Asia-Pacific Netw. Oper. Manag. Symp. (APNOMS'19), Matsue, Japan, Sep. 2019, pp. 1–4.
[11] R. Yang, F. R. Yu, P. Si, Z. Yang, and Y. Zhang, "Integrated Blockchain and Edge Computing Systems: A Survey, Some Research Issues and Challenges," IEEE Commun. Surv. Tutor., vol. 21, no. 2, pp. 1508–1532, Jan. 2019.
[12] D. Ongaro and J. Ousterhout, "In Search of an Understandable Consensus Algorithm," in Proc. USENIX Annu. Tech. Conf. (USENIX ATC'14), Philadelphia, PA, USA, 2014, pp. 305–319.
[13] S. Popov, "The Tangle," IOTA, White Paper, Apr. 2018.
[14] Y. Li, B. Cao, M. Peng, L. Zhang, L. Zhang, D. Feng, and J. Yu, "Direct Acyclic Graph-Based Ledger for Internet of Things: Performance and Security Analysis," IEEE/ACM Trans. Networking, pp. 1–14, 2020.
[15] "A Blockchain Platform for the Enterprise — Hyperledger Fabric," https://hyperledger-fabric.readthedocs.io.
[16] J. Benet, "IPFS - Content Addressed, Versioned, P2P File System," arXiv preprint arXiv:1407.3561, Jul. 2014.
