
Noise-Resilient Federated Learning: Suppressing Noisy Labels in the Local Datasets of Participants
Rahul Mishra, Hari Prabhat Gupta, and Tanima Dutta
{rahulmishra.rs.cse17, hariprabhat.cse, tanima.cse}@iitbhu.ac.in
Department of Computer Science and Engineering, Indian Institute of Technology (BHU) Varanasi, India
Abstract—Federated Learning (FL) is a novel paradigm of collaboratively training a model using the local datasets of multiple participants. FL maintains data privacy and keeps local datasets confined to the participants. This poster presents a novel noise-resilient federated learning approach that suppresses the negative impact of noisy labels in the local datasets of the participants. The approach starts with the estimation of the noise ratio without using prior information about the concentration of noisy labels. Next, the server generates different groups of participants using the estimated noise ratio. The FL-based training starts with the group having the least noise ratio, and subsequent groups are added later. We also introduce a noise robust loss function that incorporates dynamic variables to reduce the impact of noisy labels. The proposed approach reduces the overall training time and achieves adequate accuracy despite noisy labels.

Index Terms—Federated learning, local dataset, noisy labels.
I. INTRODUCTION

Federated Learning (FL) is a distributed and collaborative learning framework that preserves data privacy and reduces the communication overhead of transferring massive data from participants to the server [1]. FL operation starts with the generation and random initialization of a model at the server. Next, the server broadcasts the model to all the participants for training on their datasets. Afterwards, the participants transfer the Weight Parameter Matrices (WPM) to the server. In turn, the server aggregates the WPM received from the participants and sends them back the updated WPM. Nevertheless, the deployment of FL comes with several challenges. Noisy labels in the local datasets are one such challenge [2], which hampers performance and increases convergence time.
This poster assumes an FL scenario comprising a server and multiple participants with local datasets. The datasets of some participants may possess noisy labels with unknown and non-identical concentrations. Specifically, we investigate the following problem: how to reduce the training time for convergence of federated learning without compromising performance despite noisy labels? To this end, this poster proposes a novel noise-resilient federated learning approach to suppress the negative impact of noisy labels. The approach initially estimates the noise ratio without prior knowledge of the noise. It then creates different groups of participants using the estimated ratio. Next, FL-based training starts with the group having the least ratio, and the other groups are added later in ascending order of noise ratio. Training the group of participants with low noise concentrations first builds a more robust model and speeds up the training of subsequent groups via well-refined WPM. We also introduce a Noise Robust Loss (NRL) function, which incorporates dynamic variables to reduce the impact of noisy labels.
II. NOISE-RESILIENT FEDERATED LEARNING APPROACH

This poster considers a multi-class classification problem with a set C of c classes, where C = {1, 2, · · · , c}. Let P denote a set of N participants, P = {p1, p2, · · · , pN}. Each participant pi has a local dataset Di with ni instances and c classes, where 1 ≤ i ≤ N. The noise-resilient approach comprises the following components:

1.) Estimating noise ratio: We estimate the noise ratio on the local dataset of each participant without using prior information about the noise concentration. Inspired by [3], we determine most of the instances with correct labels in Di of pi, 1 ≤ i ≤ N. The work presented in [3] applies to centralized training; however, we exploit its noise estimation in FL. It starts with a random split of Di into two halves, Di1 and Di2. The local model Mi is trained on the first half, and pi then tests Di2 on the trained Mi, where 1 ≤ i ≤ N. If the predicted label of an instance in Di2 equals its given label, the instance is assigned to a set, denoted as Si1. Next, the model Mi is reinitialized, trained on Di2, and tested on Di1 in the same manner to obtain a set Si2. We obtain a (nearly) noise-free set Si by taking the union of Si1 and Si2. The noise ratio (βi) of pi is estimated as βi = (|Di| − |Si|)/|Di|.
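The following is a minimal sketch of this estimation step for a single participant, assuming a scikit-learn-style classifier with fit/predict methods; the poster does not fix a model API, so make_model is a hypothetical factory that returns a fresh, reinitialized model:

import numpy as np

def estimate_noise_ratio(X, y, make_model, seed=0):
    """Cross-validated noise-ratio estimate on one local dataset D_i."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    half1, half2 = idx[: len(y) // 2], idx[len(y) // 2 :]

    # Train M_i on D_i1; keep the instances of D_i2 whose predicted
    # label matches the given label (the set S_i1).
    m1 = make_model().fit(X[half1], y[half1])
    s1 = half2[m1.predict(X[half2]) == y[half2]]

    # Reinitialize, train on D_i2, and filter D_i1 the same way (S_i2).
    m2 = make_model().fit(X[half2], y[half2])
    s2 = half1[m2.predict(X[half1]) == y[half1]]

    clean = np.union1d(s1, s2)              # S_i = S_i1 ∪ S_i2
    beta = (len(y) - len(clean)) / len(y)   # β_i = (|D_i| − |S_i|) / |D_i|
    return beta, clean

For instance, estimate_noise_ratio(X, y, lambda: LogisticRegression(max_iter=500)) returns the estimated β_i together with the indices of the presumably clean instances; any classifier with the same interface would do.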
approach to suppress the negative impact of noisy labels. The probabilities of other class labels (except highest) are
The approach initially estimates the noise ratio without prior nearly zero but can provide significant information in the case
knowledge of noise. The approach creates different groups of of noisy labels. NRL considers the prediction probability of
participants using the estimated ratio. Next, FL-based training all the classes and uses dynamic variables λ1 and λ2 . Let τij
starts with the group having the least ratio, and other groups is the true label probability of j th instance in Di [4]. NRL
are added later in ascending order of noise ratio. Training loss (LiN RL ) of pi is given as:
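The poster does not prescribe a particular clustering rule, only that similar ratios share a group and that groups are sorted by ascending noise. One simple realization, purely as a sketch, is to sort participants by the estimated β_i and cut the sorted order into k contiguous chunks:

import numpy as np

def form_groups(betas, k):
    """Split participants into k groups of ascending noise ratio."""
    order = np.argsort(betas)     # participant indices, ascending β_i
    return [chunk.tolist() for chunk in np.array_split(order, k)]

# Example: 6 participants, 3 groups -> [[2, 3], [1, 0], [5, 4]]
print(form_groups(np.array([0.30, 0.12, 0.05, 0.11, 0.55, 0.42]), k=3))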
3.) Noise robust loss function: This component formulates the NRL function to alleviate noisy labels in the participant datasets. Let ρij denote the predicted probability vector for the jth instance in Di, where ρij = {ρ^1_ij, ρ^2_ij, . . . , ρ^c_ij}, 1 ≤ i ≤ N, and 1 ≤ j ≤ ni. The default loss functions (e.g., cross-entropy) consider only the class label with the highest predicted probability, which is valid in the noise-free scenario. The probabilities of the other class labels are nearly zero but can provide significant information in the case of noisy labels. NRL therefore considers the predicted probabilities of all the classes and uses dynamic variables λ1 and λ2. Let τij be the true-label probability of the jth instance in Di [4]. The NRL loss (L^i_NRL) of pi is given as:

L^i_{NRL} = -\frac{1}{n_i \times c} \sum_{j=1}^{n_i} \sum_{f=1}^{c} \Big[ \lambda_1 \lambda_2 \, \tau_{ij} \log \rho^f_{ij} + (1-\lambda_1)\, \rho^f_{ij} \log \tau_{ij} + (1-\lambda_2)(1-\tau_{ij}) \log(1-\rho^f_{ij}) \Big].   (1)
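A minimal PyTorch sketch of Eq. (1), evaluated over a mini-batch rather than the full local dataset, might look as follows; the clipping constant eps and the upstream softmax that produces probs are our assumptions:

import torch

def nrl_loss(probs, tau, lam1, lam2, eps=1e-7):
    """NRL of Eq. (1): probs is (n, c) predicted probabilities ρ_ij,
    tau is (n,) true-label probabilities τ_ij in the sense of [4]."""
    n, c = probs.shape
    probs = probs.clamp(eps, 1 - eps)           # keep the logarithms finite
    tau = tau.clamp(eps, 1 - eps).unsqueeze(1)  # (n, 1), broadcast over classes
    inner = (lam1 * lam2 * tau * torch.log(probs)
             + (1 - lam1) * probs * torch.log(tau)
             + (1 - lam2) * (1 - tau) * torch.log(1 - probs))
    return -inner.sum() / (n * c)

With lam1 = lam2 = 1 only the first term survives, the cross-entropy-like regime described next.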


NRL initially sets λ1 = λ2 = 1, which makes it similar to the cross-entropy loss. During training, λ1 and λ2 decrease asynchronously to reduce the impact of the noisy labels.
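The poster does not specify the decay schedule, only that both variables start at 1 and decrease asynchronously. A purely illustrative schedule (the linear decay, the staggered start, and the floor value are all assumptions) is:

def decay_lambdas(round_idx, total_rounds, floor=0.2):
    """Illustrative asynchronous decay of the dynamic variables λ1, λ2."""
    t = round_idx / max(1, total_rounds)
    lam1 = max(floor, 1.0 - t)                    # decays from the first round
    lam2 = max(floor, 1.0 - max(0.0, t - 0.25))   # starts decaying later
    return lam1, lam2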
4.) Local training and global aggregation: The server S designs and randomly initializes a model M and uploads M to the participants of G1. These participants train M on their local datasets by minimizing the NRL loss and finally transfer the WPM of the trained M to S for aggregation using FL. The NRL loss adjusts λ1 and λ2 to reduce the impact of noisy labels. Let pq denote a participant in G1, whose target is to estimate the optimal WPM (wq) by minimizing the NRL loss (L^q_NRL), 1 ≤ q ≤ N1. During aggregation, the server estimates the aggregated loss as L1(w1) = Σ_{q=1}^{N1} L^q_NRL, where w1 is the aggregated WPM of G1. w1 and the gradient ∇L1 are estimated as:

w_1 = \frac{1}{N_1} \sum_{q=1}^{N_1} w_q \quad \text{and} \quad \nabla L_1 = \frac{1}{N_1} \sum_{q=1}^{N_1} \nabla L^q_{NRL}.   (2)
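In code, the aggregation of Eq. (2) is a FedAvg-style uniform mean over the WPM received from the currently active participants; the sketch below assumes each WPM has been flattened into a single NumPy array:

import numpy as np

def aggregate_wpm(wpms):
    """Eq. (2): w = (1/N1) Σ_q w_q over the received weight arrays."""
    return np.mean(np.stack(wpms), axis=0)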
The trained model M on the server achieves higher performance and provides well-refined WPM for subsequent groups. Next, the server includes the participants of G2, trains M using the participants of G1 and G2 via the mechanism mentioned above, and obtains the WPM (w2) of the trained model as w2 = (1/(N1 + N2)) Σ_{q=1}^{N1+N2} wq. Similarly, we iterate over the k groups to obtain the final WPM (wk) of Gk as:

w_k = \frac{1}{N_1 + \cdots + N_k} \sum_{q=1}^{N_1 + \cdots + N_k} w_q = \frac{1}{N} \sum_{q=1}^{N} w_q.   (3)

wk is the well-refined WPM for all N participants and achieves adequate performance in less convergence time. Fig. 1 illustrates the different steps of the proposed noise-resilient approach, which are summarised in Algorithm 1 and sketched in code below.

Algorithm 1: Noise-Resilient Federated Learning.
Input: Set of participants P = {p1, p2, · · · , pN} with local datasets with or without noisy labels.
1   for each pi ∈ {p1, p2, · · · , pN} do
2       Estimate noise ratio βi;
3   Obtain k groups {G1, G2, · · · , Gk} using βi, 1 ≤ i ≤ N;
4   while not converged do
5       for each group Gl ∈ {G1, G2, · · · , Gk} do
6           if Gl == G1 then
7               G = G1;
8           else
9               G = G1 ∪ · · · ∪ Gl;
10          for each participant pq in G do
11              Obtain model M from the server;
12              Train M using NRL, given in Eq. (1);
13              Send WPM of M to the server;
14          Server aggregates WPM from all participants in G;
return Trained models on all the participants, {p1, · · · , pN}.
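Putting the pieces together, a minimal end-to-end sketch of Algorithm 1 that reuses the helpers sketched above could look as follows. The get_weights/set_weights accessors are assumed rather than part of any fixed API, local.fit stands in for minimizing the NRL loss of Eq. (1), and widening the active cohort by one group per communication round is a simplification of the nested loop in Algorithm 1:

import numpy as np

def noise_resilient_fl(participants, k, rounds, make_model):
    """participants: list of (X, y) local datasets; returns the server model."""
    # Lines 1-3: estimate noise ratios and form groups of ascending noise.
    betas = np.array([estimate_noise_ratio(X, y, make_model)[0]
                      for X, y in participants])
    groups = form_groups(betas, k)

    server = make_model()
    for rnd in range(rounds):
        # Lines 5-9: start with G1 and progressively add noisier groups.
        active = [q for l in range(min(rnd + 1, k)) for q in groups[l]]
        wpms = []
        for q in active:                     # lines 10-13: local training
            local = make_model()
            local.set_weights(server.get_weights())
            X, y = participants[q]
            local.fit(X, y)                  # would minimize NRL, Eq. (1)
            wpms.append(local.get_weights())
        server.set_weights(aggregate_wpm(wpms))   # line 14: Eqs. (2)-(3)
    return server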
[Fig. 1: An overview of the noise-resilient approach (Step 1: estimating noise ratio; Step 2: group formation; Step 3: training on G1; next rounds: training on G1 · · · Gk). Markers 1 and 2 denote transmission of WPM from the server and a participant, respectively.]
III. PRELIMINARY RESULTS

This section presents preliminary results to verify the effectiveness of the proposed noise-resilient approach. We presented the technique for handling noisy labels in [4]; however, its scope is limited to centralized training. We also collected the locomotion mode recognition dataset in [4], which is utilized in this preliminary evaluation, along with the deep learning-based conventional model in [4]. Further, we considered three schemes, i.e., baseline FedAvg, ICRP [2], and the proposed noise-resilient approach, denoted as S1, S2, and S3, respectively. We set a total of 10 participants, where 5 participants have noise-free labels and 5 have equal and randomly assigned noisy labels in their datasets.

[Fig. 2: Illustration of the accuracy of S1, S2, and S3 against (a) communication rounds and (b) noise ratio.]

Fig. 2(a) illustrates a rapid increment in the accuracy of S3 (proposed) up to 13 rounds and a marginal increment afterwards. The performance improvement is negligible after 60 rounds, which indicates that S3 converged after 60 rounds; it achieves an accuracy of 91.43%. We set the noise ratio to 0.1 during this experiment. Similarly, S1 and S2 converged after 80 and 70 rounds and achieved accuracies of 86.52% and 89.22%, respectively. Fig. 2(b) depicts the reduction in accuracy with increasing noise ratio. S3 and S1 achieve the highest and lowest performance, respectively, because S1 does not incorporate any mechanism for handling noisy labels, whereas S3 (proposed) estimates the noise ratio and uses NRL to mitigate the impact of noisy labels.

REFERENCES

[1] B. Luo, X. Li, S. Wang, J. Huang, and L. Tassiulas, "Cost-effective federated learning design," in Proc. IEEE INFOCOM, 2021, pp. 1–10.
[2] T. Tuor, S. Wang, B. J. Ko, C. Liu, and K. K. Leung, "Overcoming noisy and irrelevant data in federated learning," in Proc. IEEE ICPR, 2021, pp. 5020–5027.
[3] P. Chen, B. B. Liao, G. Chen, and S. Zhang, "Understanding and utilizing deep neural networks trained with noisy labels," in Proc. ICML, 2019, pp. 1062–1070.
[4] R. Mishra, A. Gupta, and H. P. Gupta, "Locomotion mode recognition using sensory data with noisy labels: A deep learning approach," IEEE Transactions on Mobile Computing, 2021, doi: 10.1109/TMC.2021.3135878.
