
Personalized Learning with Limited Data on Edge Devices using

Federated Learning and Meta-Learning


Kousalya Soumya Lahari Voleti
Department of Computer Science
Glassboro, New Jersey, USA
soumya.voleti16@gmail.com

Shen-Shyang Ho
Department of Computer Science
Glassboro, New Jersey, USA
hos@rowan.edu

ABSTRACT

The efficient and effective handling of few-shot learning tasks on mobile devices is challenging due to the small training set issue and the physical limitations in power and computational resources on these devices. We propose a framework that combines federated learning and meta-learning to handle independent few-shot learning tasks on multiple devices. In particular, we utilize Prototypical Networks to perform meta-learning on all devices to learn multiple independent few-shot learning models, and we aggregate the device models using federated learning so that the aggregated model can be reused by the devices subsequently. We perform extensive experiments to (1) compare three different federated learning approaches, namely Federated Averaging (FedAvg), Federated Proximal (FedProx), and Federated Personalization (FedPer), on the proposed framework, and (2) investigate the effect of data heterogeneity across multiple devices on their few-shot learning performance. Our empirical results show that the proposed framework is feasible and improves the devices' individual prediction performance, with significant performance improvement from the aggregated model under any of the federated learning approaches when the few-shot learning tasks come from the same source; data heterogeneity, however, remains a challenging issue to overcome.

CCS CONCEPTS

• Computing methodologies → Neural networks; Cooperation and coordination.

KEYWORDS

Federated Learning, Meta-Learning, Few-Shot Learning

ACM Reference Format:
Kousalya Soumya Lahari Voleti and Shen-Shyang Ho. 2023. Personalized Learning with Limited Data on Edge Devices using Federated Learning and Meta-Learning. In The Eighth ACM/IEEE Symposium on Edge Computing (SEC '23), December 6–9, 2023, Wilmington, DE, USA. ACM, New York, NY, USA, 5 pages. https://doi.org/10.1145/3583740.3626811

1 INTRODUCTION

Mobile device usage has grown rapidly over the last decade. Moreover, there is a need to independently build effective personalized predictive models on these mobile devices for different user needs; in other words, the predictive models differ from device to device. The main challenge in building these predictive models is the limited amount of data available for each object class (e.g., five to ten images per class) on a device. This is the so-called few-shot learning problem [19].

Federated Learning (FL) [13, 16] is a recent technique that can help address the few-shot learning issue [19] by allowing edge devices to collaboratively train and share knowledge to improve the prediction accuracy at each device. In particular, distributed devices can effectively train their models and aggregate them to form an effective global model shared by the devices.

The key difference between our few-shot learning scenario and the scenarios in existing federated few-shot learning work is that each device has its own distinct prediction task, different from the others. Our proposed solution combines federated learning (using aggregated models trained for the few-shot learning tasks) with meta-learning [6] to fine-tune the predictive models at the devices so that they work well when the number of data samples per class is limited for the unrelated few-shot learning tasks at the mobile devices. Fig. 1 shows an example of our problem scenario and a high-level sketch of the proposed solution, which utilizes federated learning for knowledge (model) sharing among multiple devices and meta-learning to fine-tune the individual models that perform few-shot learning at these devices.

To enhance the training process for few-shot learning on the devices, a meta-learning technique is applied on each device over a collection of few-shot learning tasks so that global and local predictive models can be efficiently learned for previously unseen few-shot learning tasks. Many meta-learning methods have been proposed for the few-shot learning problem, such as Task Agnostic Meta-Learning [9] and meta-learning over a pre-trained model [3]. We investigate a federated meta-learning framework for few-shot learning that uses Prototypical Networks [18] as the base classifier to perform meta-learning on all devices in a centralized, data-distributed architecture. In particular, the different few-shot predictive models can be executed on the devices and on the server. We perform extensive experiments on three real-world datasets, namely CIFAR-100 [10], Fashion-MNIST [20] and Omniglot [11], to (1) compare three different federated learning approaches, namely Federated Averaging (FedAvg) [14], Federated Proximal (FedProx) [12], and Federated Personalization (FedPer) [2], on our proposed framework and (2) explore the effect of data heterogeneity (using different datasets on different edge devices) on the few-shot learning performance at the different edge devices.

Figure 1: Overview of Federated Few-shot Learning using Meta Learning

The main observations and conclusions from our empirical results are as follows:

1. For few-shot classification tasks of reasonable difficulty (> 50% accuracy initially), the proposed approach is able to improve the devices' individual prediction performance and to improve significantly on the global model (on the server) using any of the federated learning approaches when the few-shot learning tasks are from the same dataset.
2. Using the FedAvg performance as a baseline, we observe that FedPer and FedProx performed well on our proposed framework.
3. The data heterogeneity problem affects the prediction performance of our current implementation of the framework no matter which federated learning approach is used.

2 BACKGROUND

2.1 Meta-Learning for Few-Shot Learning

Meta-learning [6] is learning from one model trained on a large amount of data and fine-tuning it to another model for a new, similar task. Recently proposed meta-learning approaches include the synthetic information bottleneck method [7], which is based on an empirical Bayes formulation with a transductive approach to obtain the variational posterior, and a variant of MAML (Model-Agnostic Meta-Learning) [4] called Sharp-MAML [1], which uses a sharpness-aware minimization algorithm to overcome the generalization issues related to the nonconvex nature of MAML optimization.

One recent research question of interest is whether one can utilize meta-learning techniques to support few-shot learning tasks [19]. Unlike supervised learning, which trains on one set and tests on another, a meta-learning technique uses three different sets: a base set for prior training, a support set for fine-tuning, and a query set on which one performs prediction. The support set and query set mostly contain classes, with few samples each, that are unseen in the base set. Every support and query set is specified as an n-way k-shot q-query task, where n-way denotes the number of classes in the sets, k-shot the number of images per class in the support set, and q-query the number of images per class in the query set. The two main steps of an inductive meta-learning method for few-shot learning are as follows (a sketch of the episode construction is given at the end of this subsection):

1. Creating a set of support and query sets for training. To learn to handle an unseen few-shot learning task, a training set is needed to build prior knowledge for the model. The training set is randomly sampled to create multiple support and query sets of a pre-defined n-way k-shot q-query configuration for n-way k-shot few-shot learning training purposes. By replicating the process of predicting with support and query sets during training, the training becomes more effective when the model later has to adapt to an unseen support and query set.
2. Fine-tuning using a support set. The pre-trained model is given a support set (or novel set) with an n-way k-shot configuration. One fine-tunes the predictive model (built on the base set) using this support set before predicting on the query set.

A query set (or validation set) with an n-way q-query configuration is used to test the performance of the fine-tuned prediction model. In other words, a prediction model that is trained on a large base set and fine-tuned on an n-way k-shot support set predicts on an n-way q-query query set.
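For concreteness, the episode construction described above can be sketched as follows. This is an illustrative Python sketch, not the authors' code: the dataset is assumed to be a list of (image, label) pairs, and helper names such as sample_episode are hypothetical.

```python
import random
from collections import defaultdict

def group_by_class(dataset):
    """Index a list of (image, label) pairs by label."""
    by_class = defaultdict(list)
    for image, label in dataset:
        by_class[label].append(image)
    return by_class

def sample_episode(by_class, n_way=3, k_shot=5, q_query=10):
    """Sample one n-way k-shot q-query episode (support set + query set)."""
    classes = random.sample(list(by_class.keys()), n_way)
    support, query = [], []
    for episode_label, cls in enumerate(classes):
        images = random.sample(by_class[cls], k_shot + q_query)
        support += [(img, episode_label) for img in images[:k_shot]]
        query += [(img, episode_label) for img in images[k_shot:]]
    return support, query

# Example: the paper's 3-way 5-shot 10-query configuration on any
# dataset exposed as a list of (image, label) pairs.
# by_class = group_by_class(train_set)
# support, query = sample_episode(by_class, n_way=3, k_shot=5, q_query=10)
```

Repeating this sampling many times during pre-training produces the collection of support/query pairs used in step 1; the same routine can generate the fine-tuning episodes of step 2.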

Table 1: Mean Accuracy (%) for a 3-way 5-shot 10-query Task using 3 clients for the Fashion-MNIST Dataset. Server training used the initial model M and testing used the aggregated model M′.

FL Algorithm | Server Training (initial M) | C1 Query | C2 Query | C3 Query | Average of 3 Clients | Server Test (aggregated M′)
FedAvg  | 60.13 | 67.00 | 66.10 | 67.30 | 66.80 | 82.60
FedPer  | 61.68 | 71.70 | 65.20 | 69.80 | 68.90 | 83.70
FedProx | 59.91 | 65.40 | 72.40 | 68.20 | 68.67 | 80.60

Table 2: Mean Accuracy (%) for a 5-way 5-shot 10-query Task using 3 clients for the Fashion-MNIST Dataset. Server training used the initial model M and testing used the aggregated model M′.

FL Algorithm | Server Training (initial M) | C1 Query | C2 Query | C3 Query | Average of 3 Clients | Server Test (aggregated M′)
FedAvg  | 61.52 | 70.70 | 70.40 | 72.40 | 71.17 | 83.30
FedPer  | 61.30 | 70.30 | 75.40 | 70.80 | 72.17 | 89.00
FedProx | 61.39 | 66.00 | 70.40 | 71.00 | 69.13 | 86.50

2.2 Federated Learning

Recently, there has been great interest in utilizing federated learning to improve the performance of few-shot learning [8, 17]. Three federated learning aggregation approaches are integrated into our proposed framework and compared (a sketch contrasting their update rules follows the list):

1. FedAvg [14]: After the clients receive the server-trained model, they perform several rounds of local training on their local data. At each round, the clients update their weights; once local training finishes, the weight updates are averaged and the averaged model is tested on the server. It is a simple averaging technique and does not address the data heterogeneity issue among clients. This is the baseline approach against which we compare the other aggregation methods in our federated meta-learning framework.
2. FedPer [2]: Federated Personalization concentrates more on the individual learning process of the clients; the more the clients learn, the better the overall aggregated global server model performs. The client neural network has its layers divided into base and personalization layers: the base layers are updated with the FedAvg-aggregated model layers, while the personalization layers are kept aside for client specialization. In every communication round only the base layers are changed with respect to the server model; the personalization layers are never changed.
3. FedProx [12]: Federated Optimization (FedProx) specifically addresses the varying resource constraints of clients during federated learning as well as the heterogeneity of the local data at the clients. It accounts for the non-uniform capabilities of different client devices by allowing each client to perform a varying amount of local work, and it uses a proximal term that keeps each client's local updates close to the global model.
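The three approaches differ mainly in how client updates are combined and which parameters stay local. The following PyTorch-style sketch is an illustration under stated assumptions, not the authors' implementation: the size-weighted averaging, the proximal coefficient mu, and the use of the "fc" prefix to mark FedPer personalization layers are all illustrative choices.

```python
import copy

def fedavg_aggregate(client_states, client_sizes):
    """FedAvg: weighted average of client model parameters (state_dicts)."""
    total = sum(client_sizes)
    avg = copy.deepcopy(client_states[0])
    for key in avg:
        avg[key] = sum(
            state[key].float() * (n / total)
            for state, n in zip(client_states, client_sizes)
        )
    return avg

def fedprox_local_loss(task_loss, local_model, global_params, mu=0.01):
    """FedProx: add a proximal term that keeps local weights near the global model."""
    prox = sum(
        ((w - w_g.detach()) ** 2).sum()
        for w, w_g in zip(local_model.parameters(), global_params)
    )
    return task_loss + (mu / 2.0) * prox

def fedper_update(client_model, aggregated_state, personal_prefix="fc"):
    """FedPer: overwrite base layers with the aggregated model, but keep
    personalization layers (here: parameters whose names start with
    `personal_prefix`, an assumed naming convention) local to the client."""
    own = client_model.state_dict()
    for key, value in aggregated_state.items():
        if not key.startswith(personal_prefix):
            own[key] = value
    client_model.load_state_dict(own)
```

In this reading, FedAvg and FedPer act at aggregation time on the server, while FedProx changes each client's local training objective; all three can be dropped into the same communication loop.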
2.3 Meta-Learning using Prototypical Networks for Few-Shot Learning

We utilize Prototypical Networks [18], an efficient meta-learning algorithm well suited to few-shot learning in the federated learning setting. These networks are characterized by computing prototypes (i.e., centroids in a feature space). Consider a support set S of k classes with n labeled examples {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where {x_1, x_2, ..., x_n} are the input data and {y_1, y_2, ..., y_n} are their respective labels. We pass the raw input data through a feature extractor to obtain feature vectors {z_1, z_2, ..., z_n}. For each class, we compute the centroid of its feature vectors to obtain the k prototypes {p_1, p_2, ..., p_k}.

For a query data point x_q, we transform it with the feature extractor to obtain z_q and calculate the Euclidean distance between z_q and each prototype, giving k distances {d_1, d_2, ..., d_k}. A softmax over the negative distances yields a probability distribution over the k classes (whose negative logarithm is used as the training loss), and the class with the highest probability is the prediction for x_q. Prototypical Networks are computationally inexpensive, easy to implement, and noise resistant.
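The prototype computation and the softmax over negative distances can be written compactly. The sketch below follows Snell et al. [18] but is not the authors' code; it assumes a generic `encoder` network and PyTorch tensors for the support and query batches.

```python
import torch
import torch.nn.functional as F

def prototypical_logits(encoder, support_x, support_y, query_x, n_way):
    """Compute class prototypes from the support set and score queries by
    (negative) Euclidean distance to each prototype."""
    z_support = encoder(support_x)            # (n_way * k_shot, d)
    z_query = encoder(query_x)                # (n_query, d)
    prototypes = torch.stack([
        z_support[support_y == c].mean(dim=0) for c in range(n_way)
    ])                                        # (n_way, d)
    distances = torch.cdist(z_query, prototypes)   # (n_query, n_way)
    return -distances                         # logits: larger = closer

def prototypical_loss(logits, query_y):
    """Negative log-probability of the correct class, i.e. cross-entropy
    over the softmax of the negative distances."""
    return F.cross_entropy(logits, query_y)

# Prediction for a query point is the class with the highest probability:
# logits = prototypical_logits(encoder, sx, sy, qx, n_way=5)
# pred = logits.argmax(dim=1)
```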
3 CENTRALIZED FEDERATED META-LEARNING FOR FEW-SHOT LEARNING FRAMEWORK

Fig. 1 shows our centralized federated learning approach for knowledge (model) sharing among multiple devices, which performs few-shot learning at these devices and uses meta-learning to fine-tune the individual models. The framework consists of five main steps (see the sketch at the end of this section):

(1) The training dataset on the server is randomly sampled into multiple support and query sets of a predefined few-shot configuration to train a global model M based on meta-learning using a Prototypical Network.
(2) The global model M is sent to the clients.
(3) At each client, we have a support set and a query set of a predefined few-shot learning configuration (e.g., 5-way, 5-shot, 5-query) similar to the configuration at the server. Once the clients receive M, they perform model fine-tuning with their distinct support sets (e.g., S1, S2, S3 for 3 clients).
(4) The fine-tuned models are sent back to the server.
(5) Model aggregation is performed on the fine-tuned models at the server using one of the federated learning algorithms (FedAvg, FedProx, or FedPer) to obtain the aggregated model M′.

For our experimental setting, we perform prediction on the clients' respective query sets (e.g., Q1, Q2, Q3) using their respective fine-tuned models (e.g., M′_1, M′_2, M′_3). The five steps are iterated for multiple rounds in our experimental scenarios. The fine-tuned client models and the aggregated models are then evaluated. In our experiments, we utilize 2 and 3 clients.
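A minimal sketch of one communication round covering steps (2) to (5) is given below; it is an illustration of the framework, not the authors' implementation. It assumes a client object exposing fine_tune, evaluate, and num_examples, and an aggregate function for the server-side combination (weighted averaging in the FedAvg case); FedProx would instead modify the clients' local objective and FedPer would keep personalization layers local, as in the Section 2.2 sketch. Step (1), the server pre-training of M on its own episodes, is done once before the rounds.

```python
import copy

def federated_meta_round(global_model, clients, aggregate):
    """One communication round: broadcast M (2), clients fine-tune on their
    support sets (3), models are returned (4), and the server aggregates
    them into M' (5)."""
    states, sizes, accuracies = [], [], []
    for client in clients:
        local = copy.deepcopy(global_model)
        client.fine_tune(local)                    # meta-learning on the client's support set
        accuracies.append(client.evaluate(local))  # accuracy on the client's query set
        states.append(local.state_dict())
        sizes.append(client.num_examples)
    global_model.load_state_dict(aggregate(states, sizes))
    return global_model, accuracies

# The round is repeated for the desired number of rounds (20 in the
# experiments below), and both the fine-tuned client models and the
# aggregated model are evaluated afterwards.
```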

4 EMPIRICAL RESULTS

Table 3: Mean Accuracy (%) for a 3-way 5-shot 10-query Task using 2 clients for the Omniglot Dataset. Server training used the initial model M and testing used the aggregated model M′.

FL Algorithm | Server Training (initial M) | C1 Query | C2 Query | Average of 2 Clients | Server Test (aggregated M′)
FedAvg  | 57.00 | 64.70 | 70.80 | 67.75 | 89.70
FedPer  | 55.93 | 70.40 | 64.40 | 67.40 | 90.50
FedProx | 56.45 | 67.40 | 68.80 | 68.10 | 90.60

Table 4: Mean Accuracy (%) for a 5-way 5-shot 10-query Task using 2 clients for the Omniglot Dataset. Server training used the initial model M and testing used the aggregated model M′.

FL Algorithm | Server Training (initial M) | C1 Query | C2 Query | Average of 2 Clients | Server Test (aggregated M′)
FedAvg  | 57.32 | 67.80 | 67.00 | 67.40 | 89.20
FedPer  | 58.61 | 69.60 | 73.80 | 71.70 | 91.00
FedProx | 58.67 | 71.60 | 68.60 | 70.10 | 91.70
4.1 Datasets Description

The following three datasets are used in our empirical study of the federated few-shot learning problem.

(1) Fashion-MNIST [20]: The dataset consists of 60,000 training images and 10,000 testing images. The images belong to 10 different classes, all types of clothing such as trousers, shirts, etc. Every image is a 28 × 28 grayscale image.
(2) Omniglot [11]: A dataset of 1,623 hand-written characters from 50 different alphabets, each character written by 20 different people, that is 1623 × 20 = 32,460 data points. Every Omniglot character image is a 105 × 105 grayscale image. The training set consists of 19,280 data points and the test set consists of 13,180 data points.
(3) CIFAR-100 [10]: A subset of the 80 Million Tiny Images dataset. It consists of 60,000 colored 32 × 32 images divided among 100 different classes. The training set has 50,000 images (500 per class) and the test set has 10,000 images (100 per class).

4.2 Experimental Design and Implementation

Our experiments are performed on two few-shot learning task configurations, namely 3-way:5-shot:10-query and 5-way:5-shot:10-query. We compare the few-shot learning performance on the three datasets for the 2 (or 3)-client and 1-server scenario. Every dataset is divided into five parts: one large part for server base training, three parts for the clients' individual training/testing, and one part for testing the aggregated model. We use accuracy to evaluate the performance of the different test configurations.

Since we do not assume that all devices involved in the knowledge sharing of model parameters have sufficient computational resources, we only train every model for 20 rounds, and the number of epochs for each client or server update is 1. The number of episodes in pre-training is 400. The optimizer used for the ResNet18 [5] model is Stochastic Gradient Descent (SGD) [15], and the learning rate is 0.1 for all experiments. The platform used for the experiments is Google Colaboratory Pro+ with a GPU (NVIDIA P100) and 52 GB of RAM.
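The setup above can be summarized in a short configuration sketch. The hyper-parameter values are the ones reported in this subsection; the class-level five-part split and its 40/15/15/15/15 ratio are illustrative assumptions, since the paper does not specify the exact proportions.

```python
import random

# Hyper-parameters reported in Section 4.2.
NUM_ROUNDS = 20          # communication rounds
LOCAL_EPOCHS = 1         # epochs per client/server update
PRETRAIN_EPISODES = 400  # server pre-training episodes
LEARNING_RATE = 0.1      # SGD learning rate for the ResNet18 backbone

def five_part_split(classes, seed=0):
    """Split a dataset's classes into the five parts described in Section 4.2:
    one large part for server base training, three client parts, and one part
    for testing the aggregated model. The 40/15/15/15/15 ratio is an
    illustrative assumption, not taken from the paper."""
    rng = random.Random(seed)
    classes = list(classes)
    rng.shuffle(classes)
    n = len(classes)
    cuts = [int(n * f) for f in (0.40, 0.55, 0.70, 0.85)]
    server_base = classes[:cuts[0]]
    client_parts = [classes[cuts[0]:cuts[1]],
                    classes[cuts[1]:cuts[2]],
                    classes[cuts[2]:cuts[3]]]
    test_part = classes[cuts[3]:]
    return server_base, client_parts, test_part
```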
4.3 Results and Discussions

From Tables 1, 2, 3, and 4, we observe that, in the single-data-domain scenario, all the client models improved over the model obtained from the server for both the Fashion-MNIST and Omniglot datasets and for the two few-shot learning configurations of the individual few-shot meta-learning tasks. Moreover, the performance is further improved when the federated learning models (i.e., FedAvg, FedPer, and FedProx) are used. In particular, we observe that the FedPer aggregated model in our proposed framework performed best on the Fashion-MNIST dataset and was very competitive on the Omniglot dataset.

The second column in Tables 5 and 6 shows the performance of the initial model M trained on the server using the Omniglot and CIFAR-100 datasets. Each of the three clients fine-tuned the pre-trained model M for few-shot learning tasks on the three datasets independently. One interesting observation, comparing the Client C2 performance with the Omniglot results in Tables 3 and 4, is the significant improvement in client performance from 60-70% to the high 80% range when the global initial model is also trained with CIFAR-100 data. On the other hand, the CIFAR-100 results on Client C1 did not improve at all (baseline results, similar to Tables 3 and 4, are not shown here due to the page limitation). Client C3 has worse performance compared to Tables 1 and 2, as the model M was not trained with any Fashion-MNIST data at all.

Federated learning did not significantly improve the aggregated model performance for the CIFAR-100 few-shot learning task, and it resulted in worse performance for Omniglot with the aggregated model. However, it performed significantly better for the Fashion-MNIST few-shot learning task. It seems that Fashion-MNIST affected the few-shot learning performance on Omniglot; on the other hand, the model based on Fashion-MNIST at Client C3 helps to improve the aggregated model performance on Fashion-MNIST.

From the last three columns in Tables 5 and 6, it is not clear which federated learning approach is best for our proposed framework. A new federated learning aggregation approach has to be proposed to handle the data heterogeneity problem for few-shot learning using meta-learning.

Table 5: Mean Accuracy (%) for a 3-way 5-shot 10-query Task using 3 clients and Multiple (3) Datasets.

FL Algorithm | Server Training (initial M, Omniglot & CIFAR100) | C1 Query (CIFAR100) | C2 Query (Omniglot) | C3 Query (FMNIST) | Server Test M′ (CIFAR100) | Server Test M′ (Omniglot) | Server Test M′ (FMNIST)
FedAvg  | 64.65  | 40.33 | 90.66 | 60.33 | 44.83 | 83.00 | 74.00
FedPer  | 61.975 | 44.83 | 87.17 | 55.66 | 45.66 | 81.00 | 69.17
FedProx | 65.14  | 44.83 | 88.50 | 64.33 | 47.83 | 86.53 | 73.66

Table 6: Mean Accuracy (%) for a 5-way 5-shot 10-query Task using 3 clients and Multiple (3) Datasets.

FL Algorithm | Server Training (initial M, Omniglot & CIFAR100) | C1 Query (CIFAR100) | C2 Query (Omniglot) | C3 Query (FMNIST) | Server Test M′ (CIFAR100) | Server Test M′ (Omniglot) | Server Test M′ (FMNIST)
FedAvg  | 58.55 | 31.00 | 87.50 | 56.30 | 29.60 | 74.90 | 62.60
FedPer  | 59.93 | 31.70 | 88.90 | 51.00 | 34.30 | 67.90 | 64.60
FedProx | 56.56 | 30.00 | 86.50 | 50.10 | 33.10 | 74.80 | 60.60

5 CONCLUSIONS AND FUTURE WORK

We investigate and implement our proposed centralized federated meta-learning framework to handle independent few-shot learning tasks on multiple devices. In particular, we utilize Prototypical Networks to perform meta-learning on all devices to learn multiple independent few-shot learning models, and we aggregate the device models using federated learning so that they can be reused by the clients subsequently. Future work includes a more personalized use of transfer learning to handle the data heterogeneity issue on independent devices, and exploring federated meta-few-shot learning with decentralized architectures.

REFERENCES
[1] Momin Abbas, Quan Xiao, Lisha Chen, Pin-Yu Chen, and Tianyi Chen. 2022. Sharp-MAML: Sharpness-Aware Model-Agnostic Meta Learning. In International Conference on Machine Learning. PMLR, 10–32.
[2] Manoj Ghuhan Arivazhagan, Vinay Aggarwal, Aaditya Kumar Singh, and Sunav Choudhary. 2019. Federated learning with personalization layers. arXiv preprint arXiv:1912.00818 (2019).
[3] Yinbo Chen, Zhuang Liu, Huijuan Xu, Trevor Darrell, and Xiaolong Wang. 2021. Meta-baseline: Exploring simple meta-learning for few-shot learning. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 9062–9071.
[4] Chelsea Finn, Pieter Abbeel, and Sergey Levine. 2017. Model-agnostic meta-learning for fast adaptation of deep networks. In International Conference on Machine Learning. PMLR, 1126–1135.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[6] Timothy Hospedales, Antreas Antoniou, Paul Micaelli, and Amos Storkey. 2021. Meta-learning in neural networks: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 44, 9 (2021), 5149–5169.
[7] Shell Xu Hu, Pablo Garcia Moreno, Yang Xiao, Xi Shen, Guillaume Obozinski, Neil D. Lawrence, and Andreas C. Damianou. 2020. Empirical Bayes Transductive Meta-Learning with Synthetic Gradients. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26–30, 2020. OpenReview.net. https://openreview.net/forum?id=Hkg-xgrYvH
[8] Wenke Huang, Mang Ye, Bo Du, and Xiang Gao. 2022. Few-shot model agnostic federated learning. In Proceedings of the 30th ACM International Conference on Multimedia. 7309–7316.
[9] Muhammad Abdullah Jamal and Guo-Jun Qi. 2019. Task agnostic meta-learning for few-shot learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 11719–11727.
[10] Alex Krizhevsky. 2009. Learning Multiple Layers of Features from Tiny Images. (2009).
[11] Brenden M. Lake, Ruslan Salakhutdinov, and Joshua B. Tenenbaum. 2015. Human-level concept learning through probabilistic program induction. Science 350, 6266 (2015), 1332–1338.
[12] Tian Li, Anit Kumar Sahu, Manzil Zaheer, Maziar Sanjabi, Ameet Talwalkar, and Virginia Smith. 2020. Federated optimization in heterogeneous networks. Proceedings of Machine Learning and Systems 2 (2020), 429–450.
[13] Wei Yang Bryan Lim, Nguyen Cong Luong, Dinh Thai Hoang, Yutao Jiao, Ying-Chang Liang, Qiang Yang, Dusit Niyato, and Chunyan Miao. 2020. Federated learning in mobile edge networks: A comprehensive survey. IEEE Communications Surveys & Tutorials 22, 3 (2020), 2031–2063.
[14] Brendan McMahan, Eider Moore, Daniel Ramage, Seth Hampson, and Blaise Aguera y Arcas. 2017. Communication-efficient learning of deep networks from decentralized data. In Artificial Intelligence and Statistics. PMLR, 1273–1282.
[15] Deanna Needell, Rachel Ward, and Nati Srebro. 2014. Stochastic gradient descent, weighted sampling, and the randomized Kaczmarz algorithm. Advances in Neural Information Processing Systems 27 (2014).
[16] Dinh C. Nguyen, Ming Ding, Pubudu N. Pathirana, Aruna Seneviratne, Jun Li, and H. Vincent Poor. 2021. Federated learning for internet of things: A comprehensive survey. IEEE Communications Surveys & Tutorials 23, 3 (2021), 1622–1658.
[17] Debaditya Shome and Tejaswini Kar. 2021. FedAffect: Few-shot federated learning for facial expression recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 4168–4175.
[18] Jake Snell, Kevin Swersky, and Richard Zemel. 2017. Prototypical networks for few-shot learning. Advances in Neural Information Processing Systems 30 (2017).
[19] Yaqing Wang, Quanming Yao, James T. Kwok, and Lionel M. Ni. 2020. Generalizing from a few examples: A survey on few-shot learning. ACM Computing Surveys 53, 3 (2020), 1–34.
[20] Han Xiao, Kashif Rasul, and Roland Vollgraf. 2017. Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms. arXiv:1708.07747 [cs.LG].
