
Computer Communications 214 (2024) 215–222


FedAFR: Enhancing Federated Learning with adaptive feature reconstruction


Youxin Huang a, Shunzhi Zhu a,∗, Weizhe Chen b, Zhicai Huang c

a Xiamen University of Technology, Xiamen, China
b Guangzhou University, Guangzhou, China
c Xiamen Huaxia University, Xiamen, China

ARTICLE INFO

Keywords:
Federated Learning
Non-IID data
Feature reconstruction
Feature representations

ABSTRACT

Federated learning is a distributed machine learning method in which clients train models on local data so that the data are never transmitted to a central server, providing unique advantages in privacy protection. However, in real-world scenarios, the data held by different clients may be non-Independently and Identically Distributed (non-IID) and imbalanced, leading to discrepancies among local models and impairing the efficacy of global model aggregation. To tackle this issue, this paper proposes a novel framework, FedAFR, designed to improve Federated Learning performance by adaptively reconstructing local features during training. FedAFR offers a simple reconstruction module for aligning feature representations from various clients, thereby enhancing the generalization capability of cross-client aggregated models. Additionally, to better adapt the model to each client's data distribution, FedAFR employs an adaptive feature fusion strategy for a more effective blending of global and local model information, augmenting the model's accuracy and generalization performance. Experimental results demonstrate that our proposed Federated Learning method significantly outperforms existing methods on a variety of image classification tasks, achieving faster model convergence and superior performance when dealing with non-IID data distributions.

∗ Corresponding author.
E-mail addresses: youxhuang@s.xmut.edu.cn (Y. Huang), szzhu@xmut.edu.cn (S. Zhu), chenwz@e.gzhu.edu.cn (W. Chen), huangzc@hxxy.edu.cn (Z. Huang).

https://doi.org/10.1016/j.comcom.2023.12.007
Received 31 August 2023; Received in revised form 14 November 2023; Accepted 8 December 2023
Available online 12 December 2023
0140-3664/© 2023 Elsevier B.V. All rights reserved.

1. Introduction

In the era of big data, the rapid development of the internet has propelled the extensive application of data analytics and machine learning techniques across multiple sectors, leading to a wave of innovation. For example, the medical field leverages machine learning to predict disease risks [1], the transportation industry uses neural networks to forecast traffic conditions [2], and the financial sector refines risk assessment and customer behavior analysis through machine learning [3]. However, this wave has not progressed smoothly and has encountered various challenges, among which data privacy issues are particularly prominent. Our personal information, consumption habits, browsing history, etc., are likely to be collected and analyzed as we use various online services. With the rise of public awareness regarding personal privacy protection, people's attention to data privacy has gradually intensified. Consequently, after weighing the pros and cons, many organizations and corporations have found that they cannot share data directly, which constitutes a significant obstacle to data-driven decision-making and model training.

To address this problem, Google proposed a new solution: Federated Learning (FL) [4]. Federated Learning is not a traditional machine learning method but a unique distributed machine learning framework. It allows data owners to train their models locally and only share model updates while protecting data privacy. This way, the model can be shared and improved among all participants without sharing the original data. For instance, in the medical field, various medical institutions can share their model updates to improve disease prediction models without sharing patients' personal medical records [5,6].

In Federated Learning, each client typically collects data based on its own conditions and behaviors, leading to a Non-Independently and Identically Distributed (non-IID) data distribution. This characteristic raises two main issues. First, because model updates rely on individual data, their heterogeneity may lead to discrepancies among the updates, thereby affecting the convergence of the global model [7]; more iterations may be required to reach a stable state, which undoubtedly increases computational and communication costs. Second, due to the differences in updates, the global model may not adapt well to all client data, i.e., the global model may perform excellently on some clients while performing poorly on others.

To address the aforementioned issues, many researchers have begun to focus on and study the process of improving client-side model training. Li et al. [8] proposed FedProx, which adds a proximal term in client training to minimize the distance between local and global models

from the previous round, thereby reducing the differences between client-side models. However, this method might suppress the model's flexibility because it forces the model to stay close to the global model in each round of updates, which might also hinder the global model from advancing toward the optimum. Yao et al. [9] proposed a federated learning method based on knowledge distillation, FedGKD, which utilizes past global models to guide local model training. Each client learns global knowledge from past global models via adaptive knowledge refinement techniques. Nevertheless, the limitation of this method is that its performance largely depends on the quality of the global model. If the quality of the global model is low, its guidance might mislead the local model. Moreover, it may overlook the characteristics of local data, especially in scenarios where data distribution heterogeneity is high. Yuan et al. [10] proposed a federated learning distribution transformation framework, DisTrans, which introduces an offset to change the local data distribution, thus reducing the differences between client data. However, ideal results may not be achieved through a simple data offset for datasets with complex or irregular distributions.

Inspired by the aforementioned research, we designed a novel adaptive feature reconstruction federated learning method, FedAFR, for the problem of data heterogeneity. This method utilizes the feature information of the global model to reconstruct the features of the local model, achieving a more effective integration of data from different clients in the feature space. Simultaneously, we propose an adaptive feature fusion strategy, which reconstructs local features based on the difference between global and local features, giving our method greater flexibility in handling data heterogeneity and optimizing model performance. In general, the main contributions of this study include:

• We propose a novel federated learning framework, FedAFR, which introduces a feature reconstruction module to align feature representations from different clients effectively. In the aggregation process of the federated learning model, this strategy significantly mitigates the model performance degradation caused by data heterogeneity.
• In the feature reconstruction module of the FedAFR framework, an adaptive feature fusion strategy is introduced, optimizing the balance between learning globally shared features and locally specific features during training and making the model better adapt to the specific data distribution of each client.
• Experiments on multiple datasets show that our method significantly improves the handling of data heterogeneity and enhances model performance. Compared with existing federated learning methods, it demonstrates significant advantages.

The rest of this paper is organized as follows. Section 2 discusses related work on federated learning and feature reconstruction. Section 3 provides a detailed introduction to the FedAFR method. Section 4 summarizes experimental results and analysis. Section 5 concludes the paper.

2. Related work

In this section, we begin by outlining the various strategies proposed by researchers to mitigate the non-IID data problem, with adaptations tailored specifically to either the client side or the server side. Subsequently, we also introduce related methods of feature reconstruction, which are significant for enhancing the effectiveness of federated learning.

2.1. Addressing non-IID data in federated learning

Non-IID data poses significant challenges in federated learning, affecting the convergence and overall performance of the models [11–14]. Researchers have proposed different solutions on both the client and server sides to tackle this issue.

On the client side, Gao et al. [15] use a gradient correction term to reduce gradient drift and introduce a local drift variable to track the bias between the global and local models. FedAlign [16] introduces the concept of local learning generality, using a distillation-based regularization method to optimize the federated learning process. FedPvR [17] proposes a method for partial variance reduction in federated learning, which corrects model drift by reducing variance only on the final layer. On the server side, McMahan et al. [4] proposed FedAvg, the earliest and most basic method in federated learning; it simply averages all client model updates. FedDUAP [18] dynamically adjusts the server update method based on the probability distribution of client data and the amount of data. FedAvgM [19] uses server momentum to help the model adapt to the data distribution across different devices.

In addition, on the client side, some researchers have also attempted to solve data heterogeneity problems from the perspective of feature learning. Moon [20] aims to minimize the distance between the representations extracted by the global model and the local model, while maximizing the distance between the representations extracted by the current local model and the old local model. FedIntR [21] uses the intermediate layer representations of the global model and the old local model to calculate the weight of each intermediate layer representation, then uses these weights to calculate the regularization term.

Compared to other methods, our method, FedAFR, focuses more on the relationship between the reconstructed features and the features that blend the global and local models. This approach not only helps to improve the accuracy of the model, but also enhances the model's adaptability to various data distributions.

2.2. Feature reconstruction

Feature reconstruction is a crucial machine learning technique that can extract valuable data characteristics and reshape or transform them to improve model performance. Feature reconstruction can be implemented in various ways, such as standardization (making the data conform to a normal distribution), normalization (transforming data into a specific range), and dimensionality reduction (such as Principal Component Analysis, PCA, which is used to reduce data dimensions).

Effective feature reconstruction not only enhances model performance, but also aids in understanding and interpreting model decisions. For instance, Song et al. [22] proposed a global and local feature reconstruction network based on the encoder–decoder structure, achieving the capture of global context features and the recovery of spatial feature information. Göppel et al. [23] implemented feature reconstruction directly from CT data through a nonlinear regularization method, solving the image reconstruction problem from incomplete X-ray CT data. Li et al. [24] realized feature reconstruction based on meta-learning and metric learning methods, addressing the dependency on large amounts of annotated data in few-shot object detection tasks.

In the method of this paper, we mainly reconstruct the features extracted from local data through a simple but effective linear layer. The advantage of doing so is that it can adjust the features without overly increasing the complexity of the model. This allows the model to more effectively utilize the characteristics of local data, thus enhancing the overall learning effect.

3. Method

In this section, we introduce the proposed method, which aligns feature representations from different clients by adding a reconstruction module to the model. Meanwhile, we also introduce an adaptive feature fusion strategy, which allows the model to better adapt to the data characteristics of different clients and strike a better balance between learning global and local features.


Fig. 1. The training process of the FedAFR framework.

3.1. Motivation

Data heterogeneity in federated learning refers to the differences in feature distribution or statistical properties of the data held by clients, which may be due to different data sources or unique data collection processes. Such heterogeneity can cause performance fluctuations among clients and negatively impact the generalization capability of the global model. CNNs are widely used in deep learning for their excellent feature extraction ability. However, when faced with significant data heterogeneity, the fully connected layers in CNNs need to learn complex mapping relationships to accommodate diverse data, which may increase the disparity among the corresponding layers of different clients, complicate model aggregation, and possibly result in the loss of valuable local model information. To address this issue, we propose a novel federated learning method, FedAFR. It minimizes the disparities in the features extracted from local data by individual client models, thereby effectively mitigating the issue of data heterogeneity.

3.2. Overview

Fig. 1 shows the federated learning system framework, FedAFR, proposed in this paper. As the figure shows, FedAFR includes the following steps:

1. Initially, clients download the global model, denoted as G, to use as their local model before training commences. Client data is then utilized to update this local model, facilitating the extraction of local features, represented as Feature_l. In parallel, the global features, Feature_g, are extracted from the pristine, unmodified global model G, prior to any local updates. Feature_g captures the feature space of the global model before local training, while Feature_l dynamically evolves as local training progresses with the client's data.
2. Then, Feature_l is subjected to a feature reconstruction operation, generating the reconstructed feature (Feature_r). This aims to align the feature representations of each client within a unified feature space, reducing feature differences.
3. Next, by a weighted fusion of Feature_l and Feature_g, we obtain a new fused feature (Feature_f), which contains information about both local and global features.
4. Finally, by adding the classification loss l_ce(ŷ, y) and the reconstruction loss l_kld(Feature_f, Feature_r), we calculate the total loss.

3.3. Reconstruction module

Fig. 2. Components of the Reconstruction module.

The reconstruction module is shown in Fig. 2. It uses a linear layer and a ReLU activation function to reconstruct the features of the local model. The linear layer learns a simple linear transformation, aligning feature representations from different clients in a unified feature space, reducing feature differences, and simplifying the learning task of the fully connected layer. The ReLU activation function introduces non-linearity, enhancing the model's adaptability when dealing with different feature spaces and ensuring effective backpropagation. This simple design choice is beneficial because it has lower computational complexity and fewer parameters, which helps reduce the computational burden on clients and lowers the risk of overfitting.
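To make the design concrete, the sketch below shows one way the reconstruction module described above could be implemented in PyTorch. It is a minimal illustration rather than the authors' released code; the feature dimension, the module name, and the choice of mapping the feature back to its own dimensionality are assumptions.

```python
import torch
import torch.nn as nn

class ReconstructionModule(nn.Module):
    """Minimal sketch of the reconstruction module: one linear layer + ReLU.

    Assumption: local features are flattened vectors of size `feat_dim`,
    and the reconstructed feature keeps the same dimensionality.
    """

    def __init__(self, feat_dim: int):
        super().__init__()
        self.linear = nn.Linear(feat_dim, feat_dim)  # learns a simple linear transformation
        self.relu = nn.ReLU()                        # adds non-linearity, keeps gradients usable

    def forward(self, feature_l: torch.Tensor) -> torch.Tensor:
        # feature_l: (batch, feat_dim) features extracted by the local backbone
        return self.relu(self.linear(feature_l))     # Feature_r, the aligned representation

# Usage sketch: reconstruct a batch of 512-dimensional local features.
if __name__ == "__main__":
    recon = ReconstructionModule(feat_dim=512)
    feature_l = torch.randn(8, 512)
    feature_r = recon(feature_l)
    print(feature_r.shape)  # torch.Size([8, 512])
```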
3.4. Adaptive fusion strategies

To minimize the differences between the features of models from different clients, we propose an adaptive feature fusion strategy. This strategy calculates a new fused feature by taking a weighted sum of the features of the global model and the local model, combining information from both. The contributions of the global model feature and the local model feature are adjusted according to their difference and are calculated using the following formula:

Feature_f = α · Feature_g + (1 − α) · Feature_l    (1)

where Feature_g is the feature extracted by the global model and Feature_l is the feature extracted by the local model. α is a weight dynamically adjusted according to the difference between the features of the global and local models. The difference between the features is measured using the Kullback–Leibler divergence (KLD loss), calculated as follows:

diff = KLD(Feature_l, Feature_g)    (2)

We choose KLD as the criterion for measuring differences because it can more accurately assess the effectiveness of the reconstruction module in reducing differences between features and has advantages in capturing differences between probability distributions.

α = Sigmoid(μ · diff)    (3)
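The following is a minimal sketch of how Eqs. (1)–(3) could be computed in PyTorch. The softmax normalization of the features before the KL divergence, the tensor shapes, and the divergence direction are assumptions made for illustration; note that since diff ≥ 0, Sigmoid(μ · diff) ≥ 0.5, consistent with the α range discussed below.

```python
import torch
import torch.nn.functional as F

def adaptive_fusion(feature_l: torch.Tensor,
                    feature_g: torch.Tensor,
                    mu: float = 0.01):
    """Sketch of the adaptive feature fusion of Eqs. (1)-(3).

    Assumptions: features are (batch, dim) tensors turned into probability
    distributions with softmax before measuring their divergence.
    """
    log_p_l = F.log_softmax(feature_l, dim=1)      # local feature distribution (log-probs)
    p_g = F.softmax(feature_g, dim=1)              # global feature distribution (probs)

    # Eq. (2): a KL-divergence-based difference; with these arguments,
    # F.kl_div computes KL(p_g || p_l), and the argument order is an assumption.
    diff = F.kl_div(log_p_l, p_g, reduction="batchmean")

    # Eq. (3): alpha = Sigmoid(mu * diff); since diff >= 0, alpha lies in [0.5, 1)
    alpha = torch.sigmoid(mu * diff)

    # Eq. (1): weighted sum of global and local features
    feature_f = alpha * feature_g + (1.0 - alpha) * feature_l
    return feature_f, alpha, diff
```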


where μ is a scaling factor, ranging from 0 to 1. Its purpose is to prevent the Sigmoid function output from consistently approaching 1 due to an excessively large diff. Because diff is always greater than 0, the range of α is from 0.5 to 1.

According to Eqs. (2) and (3), when the difference between the convolutional outputs of the global model and the local model is large, α increases accordingly. This means that when calculating the weighted sum of features, the model pays more attention to the convolutional-layer output of the global model, which helps it focus on globally shared features and improves learning outcomes. As training proceeds, the difference gradually decreases and the value of α approaches 0.5. This allows the model to pay more attention to local features in the middle and late stages of training. These features help the model adapt to the specific data distribution of each client, thereby improving the accuracy and generalizability of the model.

3.5. FedAFR

Algorithm 1 shows the framework details of FedAFR. During the training process on the client side, we first input the local data into the local model and the global model, respectively extracting the local model feature Feature_l, the reconstructed feature Feature_r, and the global model feature Feature_g (lines 7 and 8).

Algorithm 1 Pseudo-code of FedAFR. Number of clients C, local training dataset D_i for client i, number of rounds R, number of epochs E, local learning rate η.
1: For r = 0 to R − 1 do
2:   Server sends θ^r to client i
3:   For i = 0 to C − 1 do
4:     θ_i^r ← θ^r
5:     For e = 0 to E − 1 do
6:       For each mini-batch D_m from D_i do
7:         ŷ, Feature_l, Feature_r ← f(θ_i^r, x)
8:         Feature_g ← f(θ^r, x)
9:         diff ← Calculate the difference between Feature_l and Feature_g
10:        α ← Sigmoid(μ × diff)
11:        Feature_f ← (1 − α) · Feature_l + α · Feature_g
12:        L = ℓ_ce(ŷ, y) + λ · ℓ_kld(Feature_f, Feature_r)
13:        θ_i^r ← θ_i^r − η∇L
14:   Client i sends θ_i^r to the server
15:   Server updates the global model: θ^r ← Σ_{k=1}^{K} (n_k / n) θ_k^r

Next, we calculate the difference diff between Feature_l and Feature_g through Eq. (2) (line 9). Then, the fused feature Feature_f is obtained through Eqs. (2) and (3) (lines 10 and 11).

Finally, when calculating the total loss function, we consider two factors: the classification loss and the reconstruction loss (line 12):

Loss = l_ce(ŷ, y) + λ · l_kld(Feature_f, Feature_r)

where λ ≤ 1 is used to adjust the weight of the reconstruction loss. In scenarios where client data is simple or highly similar across clients, mitigating disparities among features might provide only limited enhancement to the global model's performance. Under such circumstances, by appropriately lowering the value of λ, we can reduce the contribution of the reconstruction loss to the total loss, making the model focus more on classification performance and thus reducing the risk of overfitting.
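A compact sketch of the client-side update in Algorithm 1 and the server aggregation of line 15 is given below. It assumes a backbone whose forward pass returns the prediction together with the extracted feature and the reconstructed feature (as in lines 7–8); it is an illustration of the loop structure under those assumptions, not the authors' implementation.

```python
import copy
import torch
import torch.nn.functional as F

def local_update(local_model, global_model, loader, lam=1.0, mu=0.01,
                 lr=0.1, epochs=5, device="cpu"):
    """Sketch of one client's update (Algorithm 1, lines 4-13).

    Assumption: each model's forward pass returns (logits, feature,
    reconstructed_feature); the global model's reconstruction output is unused.
    """
    local_model.to(device).train()
    global_model.to(device).eval()                       # pristine global model, theta^r
    optimizer = torch.optim.SGD(local_model.parameters(), lr=lr)

    for _ in range(epochs):                              # line 5
        for x, y in loader:                              # line 6
            x, y = x.to(device), y.to(device)
            logits, feat_l, feat_r = local_model(x)      # line 7
            with torch.no_grad():
                _, feat_g, _ = global_model(x)           # line 8

            # lines 9-11: diff (Eq. 2), alpha (Eq. 3), fused feature (Eq. 1)
            diff = F.kl_div(F.log_softmax(feat_l, dim=1),
                            F.softmax(feat_g, dim=1), reduction="batchmean")
            alpha = torch.sigmoid(mu * diff)
            feat_f = alpha * feat_g + (1 - alpha) * feat_l

            # line 12: classification loss + weighted reconstruction loss
            loss = F.cross_entropy(logits, y) + lam * F.kl_div(
                F.log_softmax(feat_r, dim=1),
                F.softmax(feat_f.detach(), dim=1),       # using the fused feature as a fixed target is an assumption
                reduction="batchmean")

            optimizer.zero_grad()
            loss.backward()                              # line 13
            optimizer.step()
    return local_model.state_dict()

def fedavg_aggregate(state_dicts, num_samples):
    """Sketch of the server update (line 15): sample-size-weighted parameter average."""
    total = float(sum(num_samples))
    avg = copy.deepcopy(state_dicts[0])
    for key in avg:                                      # assumes floating-point parameters
        avg[key] = sum(sd[key].float() * (n / total)
                       for sd, n in zip(state_dicts, num_samples))
    return avg
```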
4. Experiments

In this section, we conduct a comprehensive performance evaluation of our proposed FedAFR method by comparing it with various existing methods under different datasets and settings. Our evaluation mainly focuses on two indicators: the highest accuracy of the trained model and the number of communication rounds needed to achieve a target accuracy.

4.1. Experimental setup

Datasets
We conducted experiments on four publicly available datasets, namely CIFAR10, CIFAR100 [25], EMNIST [26], and SVHN [27]. The training/testing splits used in this study are the same as the configurations in previous studies [28]. In the IID setting, training samples are randomly selected and evenly distributed among clients; furthermore, all clients receive the same number of training samples and data categories. For the non-IID data setting [29], we generate client data based on the Dirichlet distribution. We establish two non-IID scenarios, called D1 and D2, with Dirichlet parameters of 0.6 and 0.3 respectively.
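As an illustration of how such Dirichlet-based client partitions are commonly generated (the exact partitioning code used in the paper is not shown, so the sketch below is an assumption based on the standard practice of [29]): for each class, a Dirichlet vector over the clients determines what share of that class each client receives, and smaller concentration parameters (e.g., 0.3 for D2) yield more skewed splits.

```python
import numpy as np

def dirichlet_partition(labels, num_clients=100, beta=0.3, seed=0):
    """Sketch: split sample indices among clients with class proportions
    drawn from Dirichlet(beta) per class (smaller beta => more non-IID)."""
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    client_indices = [[] for _ in range(num_clients)]

    for c in np.unique(labels):
        idx_c = rng.permutation(np.where(labels == c)[0])
        # proportions of class c assigned to each client
        proportions = rng.dirichlet(np.full(num_clients, beta))
        # cumulative cut points into the shuffled indices of class c
        cuts = (np.cumsum(proportions) * len(idx_c)).astype(int)[:-1]
        for client_id, part in enumerate(np.split(idx_c, cuts)):
            client_indices[client_id].extend(part.tolist())
    return client_indices

# Usage sketch: a D2-style split (Dirichlet parameter 0.3) of dummy labels.
if __name__ == "__main__":
    fake_labels = np.random.randint(0, 10, size=50000)
    parts = dirichlet_partition(fake_labels, num_clients=100, beta=0.3)
    print(len(parts), len(parts[0]))
```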
Hyperparameter settings
Our experiments use the traditional federated learning architecture, where the selected clients use their private data to train local models and transfer the models to the server for aggregation with the contributions from other clients. All methods use Stochastic Gradient Descent (SGD) as the local optimization algorithm. Moreover, we set the batch size to 50, the number of local training epochs for each client to 5, the initial learning rate to 0.1, and the decay rate to 0.998, to ensure consistent experimental settings across all methods and datasets. We set the hyperparameter μ of FedProx to 10^−4, the hyperparameter γ of FedGKD to 0.2, the temperature parameter of both Moon and FedIntR to 0.5, and the hyperparameter μ of FedAFR to 0.01. For FedAFR on the CIFAR10, CIFAR100, EMNIST, and SVHN datasets, the value of λ is set to 1, 1, 0.01, and 0.01, respectively.
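For concreteness, one common way to realize the stated learning-rate schedule (initial rate 0.1 with a 0.998 decay factor) in PyTorch is an exponential scheduler, as sketched below; whether the decay is applied per communication round or per local epoch is an assumption.

```python
import torch

# Hypothetical stand-in model; any nn.Module would do.
model = torch.nn.Linear(512, 10)

optimizer = torch.optim.SGD(model.parameters(), lr=0.1)                      # initial learning rate 0.1
scheduler = torch.optim.lr_scheduler.ExponentialLR(optimizer, gamma=0.998)   # decay rate 0.998

for communication_round in range(1500):
    # ... local training for the selected clients would happen here ...
    scheduler.step()   # one decay step per round (per-round vs. per-epoch decay is an assumption)
```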
4.2. Experimental results

To evaluate the effectiveness of the proposed method in terms of model performance and convergence speed, we conducted comprehensive experiments. Our experiments show that FedAFR is robust and superior under various degrees of data heterogeneity and client participation levels. We have conducted detailed comparisons with baseline methods such as FedProx and FedAvg, highlighting the advantages of FedAFR in terms of accuracy and convergence speed. The results generated using the global model demonstrate that our proposed method reduces communication overhead between clients and servers while achieving the required accuracy level, consistently outperforming existing methods.

Efficiency Analysis of FedAFR
In Table 1, we can see that, regardless of the dataset or whether it is under the IID, D1, or D2 setting, FedAFR achieves the target accuracy in fewer communication rounds. In the best case (for example, the D2 setting of the CIFAR-100 dataset), FedAFR is 6.84 times faster than FedAvg in reaching 40% accuracy. Even in the least favorable case (for example, the D2 setting of the SVHN dataset), FedAFR is still 1.34 times faster than FedAvg. This demonstrates the superior performance of FedAFR across all settings.

The efficiency of FedAFR is mainly attributed to the design of its feature reconstruction module. This module reconstructs and aligns the local features of each client under the guidance of the global model, which reduces the feature differences introduced by different client data distributions, thereby enabling the global model to focus more on learning global, universal features instead of overly adapting to the particular data distribution of certain clients. This mechanism allows FedAFR to quickly learn and extract valuable global features in the early stages of training, thereby achieving a higher accuracy within fewer communication rounds.

While FedAFR is able to achieve the target accuracy with fewer communication rounds in most cases, Moon might be more advantageous under certain special data distributions. For example, in the


Table 1
The number of communication rounds required to achieve the target accuracy threshold using different approaches. The results are obtained over 1500 rounds with 15% of clients participating, considering one IID setting and two non-IID settings with Dirichlet-0.6 and Dirichlet-0.3 distributions, denoted as "D1" and "D2" respectively. The total number of communication rounds needed to achieve the target accuracy is labeled "R#", while the corresponding speedup relative to FedAvg is denoted "S↑". Methods that fail to meet the target accuracy under the communication constraint are marked with ">".
Methods Partial Participation (15%)
D1 D2 IID
R# S↑ R# S↑ R# S↑
SVHN, 100 clients, Target accuracy 90%
FedAvg 178 – 216 – 117 –
FedProx 164 1.08× 295 0.73× 130 0.9×
Moon 203 0.88× 196 1.10× 178 0.66×
FedIntR 164 1.09× 300 0.72× 193 0.61×
FedGKD 187 0.95× 319 0.67× 127 0.92×
DisTrans 230 0.77× 463 0.46× 191 0.61×
FedAFR 114 1.56× 192 1.12× 87 1.34×
EMNIST-L, 100 clients, Target accuracy 97%
FedAvg 122 – 204 – 197 –
FedProx 131 0.93× 204 1.00× 98 0.98×
Moon 86 1.42× 132 1.54× 82 1.18×
FedIntR 86 1.42× 132 1.54× 65 1.49×
FedGKD 113 1.03× 188 1.08× 100 0.97×
DisTrans 95 1.28× 126 1.61× 93 1.04×
FedAFR 99 1.23× 115 1.77× 63 1.53×
CIFAR10, 100 clients, Target accuracy 79%
FedAvg 372 – 556 – 212 –
FedProx 464 0.80× 508 1.09× 212 1.00×
Moon 287 1.29× 601 0.92× 230 0.92×
FedIntR 333 1.12× 692 0.80× 227 0.93×
FedGKD 340 1.09× 633 0.88× 211 1.00×
DisTrans >1500 0.24× >1500 0.37× 695 0.30×
FedAFR 171 2.17× 223 2.49× 132 1.60×
CIFAR100, 100 clients, Target accuracy 40%
FedAvg 931 – 583 – 930 –
FedProx 1008 0.92× 706 0.82× 966 0.96×
Moon 529 1.75× 488 1.19× 1117 0.83×
FedIntR 472 1.97× 478 1.22× >1500 0.62×
FedGKD 703 1.32× 514 1.13× >1500 0.62×
DisTrans >1500 0.62× >1500 0.38× >1500 0.62×
FedAFR 136 6.84× 145 4.02× 149 6.24×

Table 2
The highest accuracy (%) for IID and non-IID data over 1500 communication rounds, under three client configurations: setting 1 (100 clients with 15% participation), setting 2 (100 clients with full participation), and setting 3 (500 clients with full participation).
Method FedAvg FedProx Moon FedIntR FedGKD DisTrans FedAFR
Setting 1: 100 clients, 15% participation
CIFAR10-IID 82.53 81.99 81.72 82.15 82.03 79.58 85.72
CIFAR10-D1 80.50 80.07 81.46 81.28 80.92 78.66 84.76
CIFAR10-D2 79.77 79.99 79.91 79.94 79.54 77.46 84.52
CIFAR100-IID 40.28 40.29 40.27 39.75 39.82 39.10 52.46
CIFAR100-D1 40.37 40.10 40.77 41.22 40.86 38.53 51.61
CIFAR100-D2 41.29 40.62 41.40 40.93 41.18 38.57 50.97
EMNIST-IID 97.53 97.55 97.60 97.62 97.52 97.52 97.74
EMNIST-D1 97.50 97.39 97.58 97.58 97.48 97.53 97.68
EMNIST-D2 97.39 97.35 97.38 97.41 97.38 97.46 97.50
SVHN-IID 90.68 90.64 90.40 90.38 90.53 90.13 90.99
SVHN-D1 90.46 90.48 90.27 90.60 90.38 90.13 90.72
SVHN-D2 90.33 90.27 90.39 90.29 90.35 90.02 90.50
Setting 2: 100 clients, full participation
CIFAR10-IID 81.68 81.89 82.15 81.89 81.88 79.64 85.28
CIFAR10-D2 77.91 80.05 79.76 79.76 79.58 77.34 84.59
Setting 3: 500 clients, full participation
CIFAR100-IID 26.80 26.56 25.86 26.01 27.08 27.81 33.23
CIFAR100-D2 28.78 29.28 29.51 28.98 28.56 29.32 41.83


Fig. 3. Convergence performance comparison of various methods in different settings of data heterogeneity (IID, Dirichlet-0.6, and Dirichlet-0.3) on CIFAR10 and CIFAR100
datasets.

D2 setting of the EMNIST dataset, Moon reaches an accuracy of 97% after 86 communication rounds, while FedAFR requires 99 rounds. This might be because Moon is more capable of capturing and optimizing the differences between the global and local models, as well as between local models, in a lower-complexity data distribution scenario.

Performance of FedAFR
The data in Table 2 show that in client setting 1 (100 clients, 15% participation), FedAFR achieves the highest accuracy across all datasets and data distributions (IID and non-IID). It performs best on the CIFAR100 dataset, where it improves accuracy by over 10 percentage points compared to FedAvg. For instance, on CIFAR100-IID, the accuracy of FedAFR reaches 52.46%, an increase of 12.1 percentage points over FedAvg's 40.28%. Such excellent performance is mainly due to FedAFR's adaptive feature fusion strategy, which is highly flexible and can dynamically adjust according to the training process and data features. In practice, FedAFR focuses on optimizing those features that have a significant impact on specific client data; this targeted optimization helps the model converge quickly in the middle and late stages of training.

In client setting 2 (100 clients, full participation) and setting 3 (500 clients, full participation), the training of the global model is influenced by the local data of all clients, making the training process more complex. However, even when facing such challenges, FedAFR can still effectively balance the local data features of different clients by reconstructing local features, thereby maintaining its lead in terms of accuracy.

Convergence Performance of FedAFR
As shown in Fig. 3, different methods demonstrate different convergence behavior when facing different degrees of data heterogeneity, with FedAFR achieving the best results in all cases. Panels (a, d) present the convergence of CIFAR10 and CIFAR100 in the IID setting, while panels (b, e) and (c, f) respectively describe the convergence performance in the non-IID settings (i.e., Dirichlet-0.6 and Dirichlet-0.3). Among all compared methods, FedAFR displays a higher accuracy and always maintains a leading position. Of particular note is that when facing the more complex CIFAR100 dataset, the performance improvement of FedAFR significantly surpasses its gain on the CIFAR10 dataset. This finding further underscores the superior performance of FedAFR in handling data heterogeneity. Moreover, as the target accuracy increases, compared to other benchmark methods, FedAFR can effectively reduce communication overhead, further enhancing model efficiency.

Table 3
The highest accuracy of four federated learning methods (FedAvg, FedGKD, DisTrans, FedAFR) on the CIFAR-100 dataset under IID and D2 settings, using both ResNet and CNN models.
Model   DataSet  FedAvg  FedGKD  DisTrans  FedAFR
ResNet  IID      26.33   27.09   29.24     42.06
ResNet  D2       35.42   33.26   35.07     40.19
CNN     IID      40.28   39.82   39.10     52.46
CNN     D2       41.29   41.18   38.57     50.97

Effectiveness of FedAFR Across Different Models
Table 3 shows the results of our training on the CIFAR100 dataset, which includes experiments conducted with two different models (ResNet and CNN) in two settings, IID and non-IID (D2). It is clear that in all settings, FedAFR achieves the best performance, demonstrating its versatility and robustness. However, it is noteworthy that non-IID data performs better than IID data on the ResNet model, and that the CNN model generally outperforms the ResNet model. This can be attributed to the following reasons:

In the IID setting, as each client randomly selects data from the entire dataset, the data distribution of each client is essentially the same. However, the high complexity of the CIFAR100 dataset and the relatively small number of images per category (around 600) may limit the ability of deeper models (such as ResNet) to extract complex features.

On the other hand, in the non-IID setting, the data of each client may cover only a portion of the categories or specific categories of images, allowing the model to learn more representative features at each client. This can provide more information in the process of aggregating the global model.

Compared to the CNN model, the ResNet model has a more complex structure and a larger number of parameters. This means that ResNet requires more data or a longer training time to achieve a stable and accurate model.

Analysis of α for Model Performance
As shown in Fig. 4(a), we observe that when the value of α is small, the model performs better at the beginning of learning. This is because,


Fig. 4. Analysis of the 𝛼 parameter’s influence on model performance under the setting (100 clients, 15% participation, Dirichlet-0.3 distribution on CIFAR10).

Table 4
Highest accuracies (%) achieved by the reconstruction module with varied layer configurations (indicated by the number in parentheses) on the Cifar10-D2 and SVHN-D2 datasets.
Dataset      Method       Accuracy
Cifar10-D2   FedAFR (1)   84.52
             FedAFR (2)   83.87
             FedAFR (3)   82.36
             FedAFR (4)   81.69
SVHN-D2      FedAFR (1)   90.50
             FedAFR (2)   90.38
             FedAFR (3)   89.64
             FedAFR (4)   89.02

at this stage, the shared features of the global model play a key role. However, as training progresses, the learning speed of the model slows down and the upper limit of accuracy is relatively low, highlighting the importance of local features in later learning. In Fig. 4(b), we find that when the value of α is 0.5, the model achieves the optimal balance between global and local features, thereby demonstrating the best performance. This proves that overly relying on either the local model or the global model can limit the improvement of model performance during the overall training process. Finally, as shown in Fig. 4(c), we dynamically adjusted the value of α during training to achieve faster learning and convergence in the early stage, while ensuring optimal final performance of the model. This further verifies the effectiveness of our proposed adaptive feature fusion strategy.

Impact of Reconstruction Module Complexity
Table 4 demonstrates the impact of reconstruction module layer complexity on model performance. On the Cifar10-D2 dataset, a single-layer module, FedAFR (1), achieved the peak accuracy of 84.52%. Adding layers resulted in diminished accuracy, with a two-layer configuration achieving 83.87% and further drops to 82.36% and 81.69% for three and four layers, respectively. These results indicate that increasing complexity may not lead to better performance and could potentially impair it. A similar trend is observed on the SVHN-D2 dataset, which supports the premise that a simpler module, capable of capturing the essential features, often performs better. The enhanced performance of the single-layer module can be attributed to its lower parameter count, which reduces the risk of overfitting and offers computational benefits in the resource-limited settings characteristic of federated learning.

Fig. 5. UMAP visualization of the features extracted from the last layer of a CNN trained on the CIFAR10 dataset under non-IID (Dirichlet-0.3 distribution) conditions.

UMAP Feature Visualization
As shown in Fig. 5, we compared the global models trained on the CIFAR10 dataset under non-IID (Dirichlet-0.3 distribution) conditions that reached the highest accuracy. The comparison included FedAvg, FedGKD, Moon, and FedAFR. We used the UMAP [30] method to visualize the features extracted from the last layer of the CNN by these models. In the UMAP plots, the clarity of the feature points for each category intuitively reflects the performance of each model. Among them, the model trained by FedAvg has poor feature quality, manifested as mixed and indistinguishable features between categories. In contrast, FedAFR exhibits outstanding feature differentiation capability: the distribution of feature points for each category is clear, and the boundaries between categories are distinct, fully showcasing the excellent performance of FedAFR.
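For readers who want to reproduce this kind of plot, the sketch below shows a typical way to project penultimate-layer features with umap-learn and color them by class. The feature-extraction step and the array shapes are assumptions; only the UMAP call itself follows the library's documented API.

```python
import numpy as np
import umap                      # pip install umap-learn
import matplotlib.pyplot as plt

def plot_feature_umap(features: np.ndarray, labels: np.ndarray, title: str):
    """Project (n_samples, feat_dim) features to 2-D with UMAP and scatter by class."""
    reducer = umap.UMAP(n_components=2, random_state=42)
    embedding = reducer.fit_transform(features)   # (n_samples, 2)

    plt.figure(figsize=(5, 5))
    plt.scatter(embedding[:, 0], embedding[:, 1], c=labels, s=3, cmap="tab10")
    plt.title(title)
    plt.axis("off")
    plt.show()

# Usage sketch with dummy data standing in for the extracted CNN features.
if __name__ == "__main__":
    feats = np.random.randn(1000, 512).astype(np.float32)
    labs = np.random.randint(0, 10, size=1000)
    plot_feature_umap(feats, labs, "UMAP of last-layer features (sketch)")
```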
5. Conclusion

In this paper, we propose a novel federated learning method to address the problem of data heterogeneity. Specifically, we introduce a reconstruction module to align the feature representations of different clients and adopt an adaptive feature fusion strategy to better balance globally shared and locally specific features during training. With this design, we observe an enhancement in model performance across various clients and improved generalization capability, even when faced with non-independent and identically distributed data.

Through extensive experiments, we have verified that the proposed method achieves significant improvements in dealing with data heterogeneity and is superior to existing federated learning methods. Future work will focus on the following aspects:

1. Feature reconstruction methods: We plan to further research and explore different feature reconstruction methods to adapt to a wider range of data distributions and scenarios. This may involve using more complex alignment strategies or introducing more

advanced feature transformation techniques to enhance the model's generalization capabilities and adaptability.

2. Dynamic reconstruction strategies: In the future, we will further explore more refined dynamic feature reconstruction strategies, especially by introducing attention mechanisms. By using attention mechanisms, we can weight different features, enabling the model to focus more on important feature information, thereby further improving the learning effect.

CRediT authorship contribution statement

Youxin Huang: Conceptualization, Methodology, Software, Investigation, Validation, Writing – original draft. Shunzhi Zhu: Methodology, Writing – review & editing, Supervision. Weizhe Chen: Investigation, Supervision. Zhicai Huang: Supervision.

Declaration of competing interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Data availability

Data will be made available on request.

Acknowledgments

This work was supported in part by the Collaborative Innovation Project of the Fuxiaquan National Independent Innovation Demonstration Zone, China (No. 2022FX4), and the Type 2030 Green and Intelligent Ship project in the Fujian region, China (No. CBG4N21-4-4).

References

[1] D. Dahiwade, G. Patle, E. Meshram, Designing disease prediction model using machine learning approach, in: 2019 3rd International Conference on Computing Methodologies and Communication, ICCMC, IEEE, 2019, pp. 1211–1215.
[2] Z. Liu, Z. Li, K. Wu, M. Li, Urban traffic prediction from mobility data using deep learning, IEEE Netw. 32 (4) (2018) 40–46.
[3] M. Leo, S. Sharma, K. Maddulety, Machine learning in banking risk management: A literature review, Risks 7 (1) (2019) 29.
[4] B. McMahan, E. Moore, D. Ramage, S. Hampson, B.A. y Arcas, Communication-efficient learning of deep networks from decentralized data, in: Artificial Intelligence and Statistics, PMLR, 2017, pp. 1273–1282.
[5] T.S. Brisimi, R. Chen, T. Mela, A. Olshevsky, I.C. Paschalidis, W. Shi, Federated learning of predictive models from federated electronic health records, Int. J. Med. Inf. 112 (2018) 59–67.
[6] W. Zhang, T. Zhou, Q. Lu, X. Wang, C. Zhu, H. Sun, Z. Wang, S.K. Lo, F.-Y. Wang, Dynamic-fusion-based federated learning for COVID-19 detection, IEEE Internet Things J. 8 (21) (2021) 15884–15891.
[7] Y. Zhao, M. Li, L. Lai, N. Suda, D. Civin, V. Chandra, Federated learning with non-IID data, 2018, arXiv preprint arXiv:1806.00582.
[8] T. Li, A.K. Sahu, M. Zaheer, M. Sanjabi, A. Talwalkar, V. Smith, Federated optimization in heterogeneous networks, Proc. Mach. Learn. Syst. 2 (2020) 429–450.
[9] D. Yao, W. Pan, Y. Dai, Y. Wan, X. Ding, H. Jin, Z. Xu, L. Sun, Local-global knowledge distillation in heterogeneous federated learning with non-IID data, 2021, arXiv preprint arXiv:2107.00051.
[10] H. Yuan, B. Hui, Y. Yang, P. Burlina, N.Z. Gong, Y. Cao, Addressing heterogeneity in federated learning via distributional transformation, in: European Conference on Computer Vision, Springer, 2022, pp. 179–195.
[11] P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning, 2019, arXiv preprint arXiv:1912.04977.
[12] H. Zhu, J. Xu, S. Liu, Y. Jin, Federated learning on non-IID data: A survey, Neurocomputing 465 (2021) 371–390.
[13] M.F. Criado, F.E. Casado, R. Iglesias, C.V. Regueiro, S. Barro, Non-IID data and continual learning processes in federated learning: A long road ahead, Inf. Fusion 88 (2022) 263–280.
[14] V. Smith, C.-K. Chiang, M. Sanjabi, A.S. Talwalkar, Federated multi-task learning, in: Advances in Neural Information Processing Systems, 2017, pp. 4424–4434.
[15] L. Gao, H. Fu, L. Li, Y. Chen, M. Xu, C.-Z. Xu, FedDC: Federated learning with non-IID data via local drift decoupling and correction, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 10112–10121.
[16] M. Mendieta, T. Yang, P. Wang, M. Lee, Z. Ding, C. Chen, Local learning matters: Rethinking data heterogeneity in federated learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 8397–8406.
[17] B. Li, M.N. Schmidt, T.S. Alstrøm, S.U. Stich, Partial variance reduction improves non-convex federated learning on heterogeneous data, 2022, arXiv preprint arXiv:2212.02191.
[18] H. Zhang, J. Liu, J. Jia, Y. Zhou, H. Dai, D. Dou, FedDUAP: Federated learning with dynamic update and adaptive pruning using shared data on the server, 2022, arXiv preprint arXiv:2204.11536.
[19] T.-M.H. Hsu, H. Qi, M. Brown, Measuring the effects of non-identical data distribution for federated visual classification, 2019, arXiv preprint arXiv:1909.06335.
[20] Q. Li, B. He, D. Song, Model-contrastive federated learning, in: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, pp. 10713–10722.
[21] Y.L. Tun, C.M. Thwal, Y.M. Park, S.-B. Park, C.S. Hong, Federated learning with intermediate representation regularization, in: 2023 IEEE International Conference on Big Data and Smart Computing (BigComp), IEEE, 2023, pp. 56–63.
[22] J. Song, X. Chen, Q. Zhu, F. Shi, D. Xiang, Z. Chen, Y. Fan, L. Pan, W. Zhu, Global and local feature reconstruction for medical image segmentation, IEEE Trans. Med. Imaging 41 (9) (2022) 2273–2284.
[23] S. Göppel, J. Frikel, M. Haltmeier, Feature reconstruction from incomplete tomographic data without detour, Mathematics 10 (8) (2022) 1318.
[24] Y. Li, W. Feng, S. Lyu, Q. Zhao, Feature reconstruction and metric based network for few-shot object detection, Comput. Vis. Image Underst. 227 (2023) 103600.
[25] A. Krizhevsky, G. Hinton, et al., Learning Multiple Layers of Features from Tiny Images, Toronto, ON, Canada, 2009.
[26] G. Cohen, S. Afshar, J. Tapson, A. Van Schaik, EMNIST: Extending MNIST to handwritten letters, in: 2017 International Joint Conference on Neural Networks, IJCNN, IEEE, 2017, pp. 2921–2926.
[27] Y. Netzer, T. Wang, A. Coates, A. Bissacco, B. Wu, A.Y. Ng, Reading digits in natural images with unsupervised feature learning, 2011.
[28] P. Kairouz, H.B. McMahan, B. Avent, A. Bellet, M. Bennis, A.N. Bhagoji, K. Bonawitz, Z. Charles, G. Cormode, R. Cummings, et al., Advances and open problems in federated learning, Found. Trends Mach. Learn. 14 (1–2) (2021) 1–210.
[29] M. Yurochkin, M. Agarwal, S. Ghosh, K. Greenewald, N. Hoang, Y. Khazaeni, Bayesian nonparametric federated learning of neural networks, in: International Conference on Machine Learning, PMLR, 2019, pp. 7252–7261.
[30] L. McInnes, J. Healy, J. Melville, UMAP: Uniform manifold approximation and projection for dimension reduction, 2018, arXiv preprint arXiv:1802.03426.

