because each commercial image source has its own unique image set, causing domain shift amongst clients. The dataset is inspired by the Office-31 dataset [37]. We also propose two novel algorithms, namely Fed-Cyclic and Fed-Star. Fed-Cyclic is a simple algorithm in which a client gets weights from the previous client, trains the model locally, and passes the weights to the next client in a cyclic fashion. In this way, a global model is simply passed from one client to another in a cyclic manner. The global server need not be involved. Even if we involve it to preserve anonymity, the global server does not have to perform any computation and can be used only for passing the parameters of one client to another. Fed-Star requires each client to receive weights from all the other clients during pre-aggregation in a star-like manner after the local training of each client on its train set. While pre-aggregating, every client prioritizes learning the outlier features present in different clients while retaining the common features, to train a more robust model impervious to statistical heterogeneity among the clients' data distributions, followed by aggregation via a global server after a fixed number of periods. Experiments show that our algorithms have better convergence and accuracy.

To summarize, our contributions are three-fold:

• We propose an image classification dataset specifically designed for Federated Learning, which is close to a real-world scenario where each client has a unique dataset demonstrating domain shift.

• We propose Fed-Cyclic, a simple algorithm, which is communication-efficient and attains higher accuracy than the baselines.

• We propose the Fed-Star algorithm, which trains a model that prioritizes learning of generalized and outlier features to create a model personalized to each client's heterogeneous dataset distribution, and attains faster convergence than the baselines with higher accuracy.

the global model. The issue arises in real-world scenarios when the data follow the non-IID distribution, as proposed by Zhao et al. [48].

Federated Learning with data heterogeneity: The vanilla FedAvg faces convergence issues when there is data heterogeneity among the clients. To tackle this challenge, different methods have been proposed. FedProx [24] adds a proximal term by calculating the squared distance between the server and client with the local loss to better optimize the global model. FedNova [43] proposes normalized averaging to eliminate objective inconsistency in heterogeneous Federated optimization. FedMax, as proposed by Chen et al. [7], aims to mitigate activation divergence by making activation vectors of the same classes across different devices similar. FedOpt, proposed by Reddi et al. [35], applies different optimizers, like Adam, Yogi, and AdaGrad, at the server. VRL-SGD [26] incorporates variance reduction into local SGD and attains higher accuracy while decreasing the communication cost. FedCluster [5] proposes grouping local devices into different clusters so that each cluster can implement any Federated algorithm. RingFed, proposed by Yang et al. [46], minimizes the communication cost by pre-aggregating the parameters among the local clients before uploading them to the central server. SCAFFOLD [20] uses variance reduction to mitigate client drift. Chen et al. [6] propose FedSVRG, which uses a stochastic variance-reduced gradient-based method to reduce the communication cost between clients and servers while maintaining accuracy. Jeong et al. [18] propose Federated augmentation (FAug), in which the local clients jointly train a generative model to augment their local datasets and generate an IID dataset. In this paper, we aim to train models robust to data heterogeneity with faster convergence and lower communication costs via our proposed algorithms. One of our proposed methods, Fed-Star, is similar to RingFed [46]: RingFed involves simple pre-aggregation of weights between adjacent clients, whereas our method involves pre-aggregation of weights between all the local clients using the accuracy metric.
poses the TiFL method to cluster the clients in different tiers and train the clients belonging to the same tier together for faster training. Sattler et al. [38] propose a hierarchical clustering-based approach based on the cosine similarity of the client gradients to group similar clients into the same cluster so that the local clients are trained properly. Xie et al. [45] propose Expectation Maximization to derive an optimal matching between the local clients and the global model for personalized training. Deng et al. [10] propose the APFL algorithm to find the optimal client-server pair for personalized learning. There are different clustering-based approaches, as explored by different authors [3] [14] [17] [13]. Tan et al. [40] provide a deeper analysis of personalized Federated frameworks. In our work, we aim to personalize the model at the global level by proposing an algorithm that better captures the outlier features of the clients while retaining the generalized features.

Federated Image classification datasets: Most Federated Learning algorithms are simulated on datasets belonging to a single domain with an artificial partition among the clients, or use existing public datasets. The dataset distribution may differ across clients in the real-world scenario, as the clients exhibit domain shift. The first work proposing a real-world image dataset [27] contains more than 900 images belonging to 7 different object categories, captured via street cameras and annotated with detailed boxes. That image dataset has applications in object detection. In our work, we propose the first real-world image dataset for the image classification task, to better understand the performance of Federated algorithms in a real-world setting.

3. Proposed Methods

3.1. Objective

In Federated Learning (FL), different clients (say K clients) collaborate to learn a global model without having to share their data. Let the weights of such a model be w, and let the loss value of the model for sample (x_i, y_i) be \mathcal {L}(x_i,y_i;w). The objective is

\min \frac {1}{|D|}\sum \limits _{(x_i,y_i)\in D}\mathcal {L}(x_i,y_i;w) (1)

where D denotes the union of all the data owned by different clients, as shown below:

D=\bigcup \limits _{k=1}^{K}D_k (2)

where D_k denotes the data owned by the k-th client. Given this, we can rewrite our objective function as follows:

\min \frac {1}{|D|}\sum \limits _{k=1}^{K}{\sum \limits _{i=1}^{|D_k|}\mathcal {L}(x_i,y_i;w)} (3)

If we represent the average local loss \mathbf {L}_k of the k-th client as

\mathbf {L}_k=\frac {1}{|D_k|}\sum \limits _{i=1}^{|D_k|}\mathcal {L}(x_i,y_i;w), (4)

we can reformulate the objective as follows:

\min \sum \limits _{k=1}^{K}{\frac {|D_k|}{|D|}\mathbf {L}_k} (5)

which suggests that our objective is to minimize the weighted sum of local losses incurred by our clients, where the weights are proportional to the number of data samples the clients hold. This objective function is the same as in [29]; we discuss it here to make our paper self-contained.

This formulation motivated FedAvg [29] to aggregate the local model weights in a weighted-averaging manner to obtain w, as shown below:

w=\sum \limits _{k=1}^K{\frac {|D_k|}{|D|}w_k} (6)

This aggregation happens iteratively: in a given iteration, the central server sends the global model to the local clients, where the local models are updated and then sent back to the global server for aggregation, as shown in Figure 1. This continues until the global model converges.

Figure 1. FedAvg algorithm [29]
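As a concrete illustration of the weighted averaging in equation (6), consider the following sketch. This is an illustrative NumPy snippet, not the implementation used in the experiments; the names `fedavg_aggregate`, `client_weights`, and `client_sizes` are our own.

```python
import numpy as np

def fedavg_aggregate(client_weights, client_sizes):
    """Weighted average of client parameter vectors, as in equation (6):
    w = sum_k (|D_k| / |D|) * w_k."""
    total = sum(client_sizes)  # |D| = sum over clients of |D_k|
    return sum((n / total) * w for w, n in zip(client_weights, client_sizes))

# Example: three clients holding 50, 30, and 20 samples respectively.
w = fedavg_aggregate(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([1.0, 1.0])],
    [50, 30, 20],
)
# w = 0.5*[1,0] + 0.3*[0,1] + 0.2*[1,1] = [0.7, 0.5]
```

The client with the most data dominates the average, which is exactly why, under domain shift, FedAvg can pull the global model away from the smaller clients' distributions.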
the accuracy drops too [15]. Also, FedAvg creates a generalized model by averaging the parameters from the local clients, forcing a local model with statistical heterogeneity to learn a generalized representation that may differ from its data distribution, leading to poorly trained local clients. In that case, the local clients do not find the global model satisfactory.

Considering these limitations of FedAvg, we propose the Fed-Cyclic and Fed-Star algorithms, which cater to the statistical heterogeneity of data across the clients and ensure satisfactory local performance of the global model despite it.

3.2. Fed-Cyclic

We propose the Fed-Cyclic algorithm to overcome the challenges faced by the FedAvg algorithm, which suffers from a communication bottleneck due to a large number of edge devices uploading their parameters to the central server, causing congestion in the network. The model visualization is shown in Figure 2.

Figure 2. Proposed model using Fed-Cyclic showing the passing of weights cyclically by the clients after local training. (W_k: weights of the k-th client)

In the Fed-Cyclic algorithm, we use a global model to initialize the weights of one of the clients in the network, followed by training that client's local model for E local epochs. The optimizer used at the local client is SGD. The updated weights are then used to initialize the weights of the next client in the network, as shown in equation (7), and the process continues until all the clients are trained cyclically in this manner, constituting one training round. In our Fed-Cyclic algorithm, the clients can either directly pass the weights to the next client or involve the global server to do so, to preserve the anonymity of the last client. After the end of each round, the global weights w get updated.

It is a communication-efficient algorithm since the clients can directly pass the weights to the next client without involving the global server. Even if the global server is involved in passing the weights from one client to another, no processing is done, and only parameters from a single

Algorithm 1 Proposed Fed-Cyclic Algorithm
Input: Initial weights w_init, number of global rounds R, number of local epochs E, learning rate η, K clients indexed by k (with local data D_k and local weights w_k), and local minibatch size b.
Output: Global weights w^R (after R rounds)
Algorithm:
Initialize w^0 ← w_init // Global weights initialized
w_1^0 ← w^0
for r = 0 to R − 1 do
    for k = 1 to K − 1 do
        w_k^{r+1} ← ClientUpdate(w_k^r, k) using SGD
        w_{k+1}^r ← w_k^{r+1}
    w_K^{r+1} ← ClientUpdate(w_K^r, K) using SGD
    w_1^{r+1} ← w_K^{r+1}
    w^{r+1} ← w_K^{r+1}

function ClientUpdate(w, k)
    B ← (split D_k into batches of size b)
    for e = 1 to E do
        for d ∈ B do
            w ← w − η∇g(w; d)
    return w

w^{r}_{k+1} \gets w^{r+1}_{k} (7)

where we use the k-th client's weights to initialize the (k+1)-th client. The algorithm is robust to statistical heterogeneity, as every client gets an opportunity to train the global model on its local data. Moreover, we can take the view that the global model is being periodically trained on different portions of the dataset (D), as if they were mini-batches (D_k). Hence, this algorithm is somewhat analogous to a typical deep learning approach from the point of view of the global model. As a result, convergence also gets ensured, unlike FedAvg, where we expect convergence from simple aggregation of weights.

3.3. Fed-Star

Although Fed-Cyclic can converge faster than FedAvg, it is a very simple algorithm, similar to FedAvg. Also, it lacks aggregation of any kind. Here, we propose the Fed-Star algorithm, which addresses these limitations.

In Fed-Star, the local models are first trained locally for some epochs in a parallel manner. Once the given epochs are complete, we perform pre-aggregation of weights locally at each client by sharing the models with each other. Each client gets the models of every other client, and they are evaluated on the local training set of the given client. The accuracy obtained helps us determine how much
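One round of the cyclic weight passing described above can be sketched as follows. This is an illustrative sketch, with a generic `client_update` callable standing in for the local SGD routine (`ClientUpdate` in Algorithm 1); the scalar toy data below is hypothetical.

```python
def fed_cyclic_round(w_global, clients, client_update):
    """One Fed-Cyclic round: the global weights are handed from client
    to client, each training locally before passing them on."""
    w = w_global                 # w_1^r <- w^r
    for k in clients:            # k = 1 .. K, in cyclic order
        w = client_update(w, k)  # w_k^{r+1} <- ClientUpdate(w_k^r, k)
    return w                     # w^{r+1} <- w_K^{r+1}

# Toy example: each "client" nudges a scalar weight toward its local mean.
data = {0: 4.0, 1: 8.0, 2: 6.0}
step = lambda w, k: w + 0.5 * (data[k] - w)  # stand-in for local SGD epochs
w = 0.0
for r in range(3):                           # R = 3 global rounds
    w = fed_cyclic_round(w, [0, 1, 2], step)
```

Note that, as in the paper's description, no server-side computation appears anywhere: the only operation is passing the trained weights along the ring.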
Algorithm 2 Proposed Fed-Star Algorithm
Algorithm:
Initialize w^0 ← w_init // Global weights initialized
for r = 0 to R − 1 do
    for k ∈ {1, · · · , K} parallelly do
        w_k^{r,p+1} ← ClientUpdate(w_k^{r,p}, k)
        for j ∈ {1, · · · , K} do
            transfer w_j^{r,p+1} to the k-th client
            M(k, j) = 1 − Acc(w_j^{r,p+1}, D_k)/100
        \sum \limits _{j=1}^{K}M(k,j)*w_j^{r,p+1}

Figure 3. Proposed model using Fed-Star demonstrating pre-aggregation of the parameters in a star-topology manner among local clients, given by equation (9). This is followed by the transfer of weights to the global model, where aggregation of weights is performed.
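The accuracy-driven pre-aggregation at a single client can be sketched as follows. This is an illustrative sketch with hypothetical names; in particular, normalizing the coefficients M(k, j) so that they sum to one is our assumption for this illustration, not a statement of the exact update in Algorithm 2.

```python
import numpy as np

def fed_star_preaggregate(client_weights, accuracies):
    """Pre-aggregation at client k. Each incoming model j is scored by
    M(k, j) = 1 - Acc(w_j, D_k) / 100, so models that fit D_k poorly
    (i.e., carry outlier features) receive *larger* mixing coefficients.
    The coefficients are normalized to sum to 1 (our assumption)."""
    m = np.array([1.0 - acc / 100.0 for acc in accuracies])
    m = m / m.sum()
    return sum(mj * wj for mj, wj in zip(m, client_weights))

# Client k evaluates three incoming models on its local train set D_k.
w_k = fed_star_preaggregate(
    [np.array([1.0, 0.0]), np.array([0.0, 1.0]), np.array([2.0, 2.0])],
    [90.0, 80.0, 50.0],  # Acc(w_j, D_k) in percent
)
# M = [0.1, 0.2, 0.5], normalized to [0.125, 0.25, 0.625]
```

The inversion of accuracy into a mixing weight is the key design choice: it prioritizes the poorly fitting (outlier) models during pre-aggregation, which is how Fed-Star retains features absent from the client's own distribution.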
dataset [37], which contains common objects in office settings like keyboards, printers, monitors, laptops, and so forth. Our dataset includes the same categories of images as the Office-31 dataset. We took extra care to ensure that only relevant and high-quality images from each source were taken. We removed poor-quality, duplicate, or irrelevant images by manually curating the dataset. The statistics showing how images are distributed within the classes and across the sources are summarized in Tables 1 and 2. Sample images are shown in Figure 4.
[Table: Method | Local Epochs (E) | Period (P) | Global Rounds (R) | Dataset | Model | Test Accuracy]

Figure 5. The graph shows how the accuracy of FedAvg, RingFed, Fed-Cyclic and Fed-Star changes with different learning rates (lr).

[Table: Learning Rate (η) | FedAvg [29] | RingFed [46] | Fed-Cyclic (Ours) | Fed-Star (Ours)]

relevant for meaningful collaboration amongst clients having statistical heterogeneity (domain shift).
[5] Cheng Chen, Ziyi Chen, Yi Zhou, and Bhavya Kailkhura. Fedcluster: Boosting the convergence of federated learning via cluster-cycling. In 2020 IEEE International Conference on Big Data (Big Data), pages 5017–5026. IEEE, 2020.
[6] Dawei Chen, Choong Seon Hong, Yiyong Zha, Yunfei Zhang, Xin Liu, and Zhu Han. Fedsvrg based communication efficient scheme for federated learning in mec networks. IEEE Transactions on Vehicular Technology, 70(7):7300–7304, 2021.
[7] Wei Chen, Kartikeya Bhardwaj, and Radu Marculescu. Fedmax: Mitigating activation divergence for accurate and communication-efficient federated learning. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pages 348–363. Springer, 2020.
[8] Emiliano De Cristofaro. An overview of privacy in machine learning. arXiv preprint arXiv:2005.08679, 2020.
[9] Ofer Dekel, Ran Gilad-Bachrach, Ohad Shamir, and Lin Xiao. Optimal distributed online prediction using mini-batches. Journal of Machine Learning Research, 13(1), 2012.
[10] Yuyang Deng, Mohammad Mahdi Kamani, and Mehrdad Mahdavi. Adaptive personalized federated learning. arXiv preprint arXiv:2003.13461, 2020.
[11] Georgios Drainakis, Konstantinos V Katsaros, Panagiotis Pantazopoulos, Vasilis Sourlas, and Angelos Amditis. Federated vs. centralized machine learning under privacy-elastic users: A comparative analysis. In 2020 IEEE 19th International Symposium on Network Computing and Applications (NCA), pages 1–8. IEEE, 2020.
[12] Moming Duan, Duo Liu, Xianzhang Chen, Renping Liu, Yujuan Tan, and Liang Liang. Self-balancing federated learning with global imbalanced data in mobile systems. IEEE Transactions on Parallel and Distributed Systems, 32(1):59–71, 2020.
[13] Moming Duan, Duo Liu, Xinyuan Ji, Renping Liu, Liang Liang, Xianzhang Chen, and Yujuan Tan. Fedgroup: Efficient federated learning via decomposed similarity-based clustering. In 2021 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), pages 228–237. IEEE, 2021.
[14] Avishek Ghosh, Jichan Chung, Dong Yin, and Kannan Ramchandran. An efficient framework for clustered federated learning. Advances in Neural Information Processing Systems, 33:19586–19597, 2020.
[15] Song Han, Huizi Mao, and William J Dally. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv preprint arXiv:1510.00149, 2015.
[16] Tzu-Ming Harry Hsu, Hang Qi, and Matthew Brown. Measuring the effects of non-identical data distribution for federated visual classification. arXiv preprint arXiv:1909.06335, 2019.
[17] Li Huang, Andrew L Shea, Huining Qian, Aditya Masurkar, Hao Deng, and Dianbo Liu. Patient clustering improves efficiency of federated machine learning to predict mortality and hospital stay time using distributed electronic medical records. Journal of Biomedical Informatics, 99:103291, 2019.
[18] Eunjeong Jeong, Seungeun Oh, Hyesung Kim, Jihong Park, Mehdi Bennis, and Seong-Lyun Kim. Communication-efficient on-device machine learning: Federated distillation and augmentation under non-iid private data. arXiv preprint arXiv:1811.11479, 2018.
[19] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2):1–210, 2021.
[20] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank Reddi, Sebastian Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for federated learning. In International Conference on Machine Learning, pages 5132–5143. PMLR, 2020.
[21] Sai Praneeth Karimireddy, Satyen Kale, Mehryar Mohri, Sashank J Reddi, Sebastian U Stich, and Ananda Theertha Suresh. Scaffold: Stochastic controlled averaging for on-device federated learning. 2019.
[22] Jakub Konečnỳ, H Brendan McMahan, Felix X Yu, Peter Richtárik, Ananda Theertha Suresh, and Dave Bacon. Federated learning: Strategies for improving communication efficiency. arXiv preprint arXiv:1610.05492, 2016.
[23] Li Li, Yuxi Fan, Mike Tse, and Kuo-Yi Lin. A review of applications in federated learning. Computers & Industrial Engineering, 149:106854, 2020.
[24] Tian Li, Anit Kumar Sahu, Ameet Talwalkar, and Virginia Smith. Federated learning: Challenges, methods, and future directions. IEEE Signal Processing Magazine, 37(3):50–60, 2020.
[25] Xiang Li, Kaixuan Huang, Wenhao Yang, Shusen Wang, and Zhihua Zhang. On the convergence of fedavg on non-iid data. arXiv preprint arXiv:1907.02189, 2019.
[26] Xianfeng Liang, Shuheng Shen, Jingchang Liu, Zhen Pan, Enhong Chen, and Yifei Cheng. Variance reduced local sgd with lower communication complexity. arXiv preprint arXiv:1912.12844, 2019.
[27] Jiahuan Luo, Xueyang Wu, Yun Luo, Anbu Huang, Yunfeng Huang, Yang Liu, and Qiang Yang. Real-world image datasets for federated learning. arXiv preprint arXiv:1910.11089, 2019.
[28] Yishay Mansour, Mehryar Mohri, Jae Ro, and Ananda Theertha Suresh. Three approaches for personalization with applications to federated learning. arXiv preprint arXiv:2002.10619, 2020.
[29] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics, pages 1273–1282. PMLR, 2017.
[30] Jose G Moreno-Torres, Troy Raeder, Rocío Alaiz-Rodríguez, Nitesh V Chawla, and Francisco Herrera. A unifying view on dataset shift in classification. Pattern Recognition, 45(1):521–530, 2012.
[31] Dinh C Nguyen, Ming Ding, Pubudu N Pathirana, Aruna Seneviratne, Jun Li, and H Vincent Poor. Federated learning for internet of things: A comprehensive survey. IEEE Communications Surveys & Tutorials, 23(3):1622–1658, 2021.
[32] Nicolas Papernot, Patrick McDaniel, Arunesh Sinha, and Michael Wellman. Towards the science of security and privacy in machine learning. arXiv preprint arXiv:1611.03814, 2016.
[33] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in Neural Information Processing Systems, 32, 2019.
[34] Joaquin Quinonero-Candela, Masashi Sugiyama, Anton Schwaighofer, and Neil D Lawrence. Dataset shift in machine learning. MIT Press, 2008.
[35] Sashank Reddi, Zachary Charles, Manzil Zaheer, Zachary Garrett, Keith Rush, Jakub Konečnỳ, Sanjiv Kumar, and H Brendan McMahan. Adaptive federated optimization. arXiv preprint arXiv:2003.00295, 2020.
[36] Peter Richtárik and Martin Takáč. Distributed coordinate descent method for learning with big data. The Journal of Machine Learning Research, 17(1):2657–2681, 2016.
[37] Kate Saenko, Brian Kulis, Mario Fritz, and Trevor Darrell. Adapting visual category models to new domains. In European Conference on Computer Vision, pages 213–226. Springer, 2010.
[38] Felix Sattler, Klaus-Robert Müller, and Wojciech Samek. Clustered federated learning: Model-agnostic distributed multitask optimization under privacy constraints. IEEE Transactions on Neural Networks and Learning Systems, 32(8):3710–3722, 2020.
[39] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014.
[40] Alysa Ziying Tan, Han Yu, Lizhen Cui, and Qiang Yang. Towards personalized federated learning. IEEE Transactions on Neural Networks and Learning Systems, 2022.
[41] Stacey Truex, Nathalie Baracaldo, Ali Anwar, Thomas Steinke, Heiko Ludwig, Rui Zhang, and Yi Zhou. A hybrid approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on Artificial Intelligence and Security, pages 1–11, 2019.
[42] Hao Wang, Zakhary Kaplan, Di Niu, and Baochun Li. Optimizing federated learning on non-iid data with reinforcement learning. In IEEE INFOCOM 2020 - IEEE Conference on Computer Communications, pages 1698–1707. IEEE, 2020.
[43] Jianyu Wang, Qinghua Liu, Hao Liang, Gauri Joshi, and H Vincent Poor. Tackling the objective inconsistency problem in heterogeneous federated optimization. Advances in Neural Information Processing Systems, 33:7611–7623, 2020.
[44] Qiong Wu, Xu Chen, Zhi Zhou, and Junshan Zhang. Fedhome: Cloud-edge based personalized federated learning for in-home health monitoring. IEEE Transactions on Mobile Computing, 2020.
[45] Ming Xie, Guodong Long, Tao Shen, Tianyi Zhou, Xianzhi Wang, Jing Jiang, and Chengqi Zhang. Multi-center federated learning. arXiv preprint arXiv:2005.01026, 2020.
[46] Guang Yang, Ke Mu, Chunhe Song, Zhijia Yang, and Tierui Gong. Ringfed: Reducing communication costs in federated learning on non-iid data. arXiv preprint arXiv:2107.08873, 2021.
[47] Sixin Zhang, Anna E Choromanska, and Yann LeCun. Deep learning with elastic averaging sgd. Advances in Neural Information Processing Systems, 28, 2015.
[48] Yue Zhao, Meng Li, Liangzhen Lai, Naveen Suda, Damon Civin, and Vikas Chandra. Federated learning with non-iid data. arXiv preprint arXiv:1806.00582, 2018.