
Applied Mathematics and Nonlinear Sciences 2021(aop) 1–14

Applied Mathematics and Nonlinear Sciences


https://www.sciendo.com

Fed-UserPro: A user profile construction method based on federated learning


Yilin Fan1 , Zheng Huo1†, Yaxin Huang1
1Information Technology School, Hebei University of Economics and Business, Shijiazhuang, Hebei Province
050061, China

Submission Info
Communicated by Juan Luis García Guirao
Received February 14th 2022
Accepted April 10th 2022
Available online May 20th 2022

Abstract
User profiles constructed using vast network behaviour data are widely used in various fields. However, data island and
central server capacity problems limit the implementation of centralised big data training. This paper proposes a user profile
construction method, Fed-UserPro, based on federated learning, which uses non-independent and identically distributed
unstructured user text to jointly construct user profiles. Latent Dirichlet allocation model and softmax multi-classification
regression method are introduced into the federated learning structure to train data. The results show that the accuracy of
the Fed-UserPro method is 8.69%–19.71% higher than that of single-party machine learning methods.

Keywords: Federated Learning, Non-independent and identically distributed, Multi-classification, User profile

1 Introduction

With the rapid development of mobile internet, online social behaviours have shown a strong development
trend. The topic of how to make full use of the various elements of information shared by users participating in
online society has become popular in recent years. User profiles are built to document users’ social attributes,
living habits, consumption behaviours and other characteristics, as abstracted from vast user network behaviour
data. These generated profiles are widely used in e-commerce, social networking, internet financing, product
development and other fields, providing an important basis for accurate advertising, personalised recommen-
dations and risk control. In practical applications, user network behaviour profiles are built using big data and
machine learning technology. To accomplish this, one must collect a vast amount of user network behaviour
data. After cleaning and fusing the data, machine learning algorithms are used to model behaviours and build
profiles. Existing user network behaviour profiles tend to be domain oriented, focusing on improving the accu-
racy of the user profiles derived from the user information. Profiles built on single enterprise data usually have
difficulty in fully reflecting user characteristics. As the internet society grows and changes, the integration of user network behaviour has become a research trend across multiple fields and enterprises; thus, comprehensive and accurate user profiles are widely used in many network social governance services.

† Corresponding author. Email address: huozheng@heuet.edu.cn

ISSN 2444-8656 doi:10.2478/amns.2021.2.00188

Open Access. © 2022 Fan et al., published by Sciendo. This work is licensed under the Creative Commons Attribution alone 4.0 License.
The following key issues arise when large datasets are used to construct user profiles. First, the requirement for vast amounts of data imposes significant performance requirements on central server equipment and networks. If each service provider collects a large quantity of user data for combination and synthesis, the data must be stored and processed centrally. Thus, the centralised method of constructing user profiles poses severe challenges to storage capacity, computing power and network transmission capability. Second, a data island problem is caused
by fragmented data storage among enterprises [2]. The behaviour data input by users on the network are collected and stored by different service providers. However, owing to competition, security restrictions and approval processes, a barrier that is difficult to overcome surrounds the collection of network user behaviour data (i.e., the data island problem). Even if companies intend to exchange data, they may encounter policy accountability issues that prevent it. User network behaviour data stored in fragments obviously cannot support comprehensive user profile development, greatly reducing profile availability and accuracy. Third, there are huge
differences in data between enterprises. Because enterprises have different market positions, they attract different types of user groups. In addition, each enterprise's private data are shaped by the behaviour of its own users. Therefore, the private data distribution of any particular enterprise cannot fully reflect the global data distribution of the entire industry, and it is difficult to build a comprehensive and accurate user profile relying exclusively on one enterprise's private data.
Recently, a federated learning architecture proposed by Google [5, 17] has provided inroads to solving data
island and load capacity problems. This federated learning architecture ensures that the data of each participant
are stored in a decentralised manner without the need for centralisation. It builds a machine learning model of
global data without sharing the original data [11]. In real environments, unlike traditional machine learning and distributed machine learning, the data of the participants in a federated learning scenario are mostly not independent and identically distributed (non-IID). Against this background, this paper proposes Fed-UserPro, a global user profile construction method based on federated learning. The main contributions
of this paper are summarised as follows:

• This paper proposes, for the first time, a method for constructing user profiles based on unstructured data,
and it uses a federated learning architecture to cooperate with multiparty data to construct global user pro-
files. Compared with user profiles in a single field, it describes user characteristics more comprehensively.

• Based on the federated learning architecture, Fed-UserPro is proposed. It employs a horizontally partitioned federated learning architecture to construct a global user profile from multiparty data. With
this method, a latent Dirichlet allocation (LDA) model is used to mine potential user topic information to
obtain a topic probability distribution, and users are grouped by softmax regression multi-classification.
This method is extended to the federated learning architecture. When the data of each participant are
not independent and identically distributed (IID), the accuracy of the model is improved with parameter
transfer and aggregation.

• The Fed-UserPro algorithm was evaluated experimentally on the Sina Weibo dataset. The experiments verified the accuracy and running time of the algorithm on non-IID data and compared it with UserPro, a user profile algorithm trained by a single data holder. The results show that the accuracy of the Fed-UserPro algorithm was significantly higher than that of UserPro: the accuracy increased by 19.71% in the best case and by 8.69% in the worst case, and the running time increased linearly with the number of participants.

2 Related works

Based on the need for federated internet user profiling, this paper provides an improved federated learning
algorithm.
User profiles are widely used in recommendation, advertising and marketing services. In recent years, such
profiles have been widely used in the construction of smart libraries, smart campuses, emergency public opinion
managers and personalised insurance services. With the increasing use of user profiles, the types of data and
methods for constructing them are also increasing. In addition to using statistical learning methods to obtain
user profile tags, big data machine learning methods can be used to mine detailed and versatile user behaviour
information from different sources. Zeng and Sun [12] built user profiles and embedded them into a library’s rec-
ommendation service by collecting user behaviour information, such as access logs and search keywords in the
library, improving user retrieval efficiency and the quality of recommendations. He et al. [13] constructed user
profiles by collecting basic real-world information about the urban elderly using a smart elderly care platform
to predict their service needs and provide customised services. Ren et al. [14] used crawler technology to obtain
the static and dynamic attributes of Weibo users using machine learning methods to analyse their emotional
tendencies and build user profiles. They also leveraged user portrait information to predict emotions, helping the
platform develop targeted public opinion guidance strategies. Lin and Xie [9] analysed user behaviours based on
social identity theory and used an LDA topic model to mine user interests and preferences to construct profiles
of Weibo groups.
Federated learning has been proposed to solve the problem of training and updating local models under
privacy constraints. It can train a global model using multiple participants’ data simply by aggregating the gra-
dient or parameter information gathered from each during the model training stage without manipulating the
original local data. For the machine learning task, the goal is to find an optimal solution to minimise the loss
function. Usually, for complex problems with too many model parameters, optimisation algorithms are used to
find the numerical solution of the loss function. The most common optimisation algorithm is the stochastic gra-
dient descent (SGD) method. The FedSGD [19] algorithm applies the SGD algorithm to the federated learning
framework for the aggregation optimisation of model parameters. After receiving the current round of global
model parameters sent by the central server, the local participants perform a gradient calculation according to
all their local data and upload the gradient back to the server to complete the global model aggregation round
and update. Although this method is computationally efficient, it requires many communication rounds to reach
a satisfactory model. In federated learning, communication cost is a problem that must be solved. There are two
optimisation directions for reducing the communication costs of model training [1, 4, 6, 10]: reducing the number of communication rounds and reducing the amount of information transmitted in each round. Communication rounds can be reduced by increasing the parallelism of the participants and increasing their local computation. The information transmitted per round can be reduced through parameter compression.
The FedAvg algorithm proposed by McMahan et al. [4] reduces communication costs by increasing the local computation of each participant in each round. FedAvg has effects equivalent to centralised learning [22]. However,
when applied to real data, defects pertaining to equipment heterogeneity arise. For example, participants often
lack resources for the current training stage and cannot complete the training task within the specified time; thus,
the server abandons them [3]. Consequently, for the problem of equipment heterogeneity, some scholars have
proposed the FedProx algorithm [7, 23], which can dynamically adjust the number of local iterative training
rounds of participants according to the resource status of the equipment. It improves the stability of federated
learning. Because the original data are stored in different places, factors of time, space and individual differences easily cause the data to exhibit non-IID characteristics [1, 21]. In response to this problem, Zhao et al. [16] proposed sharing a small amount of data, which helped improve FedAvg performance when the data were non-IID.
In contrast with the above research, this paper examines a more comprehensive user profile that is based on
unstructured text data under a horizontally federated learning architecture, with the goal of improving model

accuracy by designing parameter transmission and aggregation when the data of each participant are non-IID.

3 Preliminaries

This section provides preparatory knowledge of user profiles and federated learning.

3.1 User profile

The concept of the user profile was first proposed by Alan Cooper, the “father” of interaction design. He
believed that user profiles were virtual representations of real users [8]. Creating a user profile requires a vast amount of real user data, from which usable information is obtained through statistical analysis and machine learning. The profile further describes individuals or groups by establishing tags from different dimensions, forming the
prototype of a user group.
Definition 1. A user profile $UP_i$ is a set composed of a user, a category and feature labels, expressed as $UP_i = \{U_i, C_i, \langle l_1, p_1\rangle, \langle l_2, p_2\rangle, \ldots, \langle l_k, p_k\rangle\}$, where $U_i$ represents the ID in the user profile and $C_i$ represents the category to which $U_i$ belongs. $\langle l_1, p_1\rangle, \langle l_2, p_2\rangle, \ldots, \langle l_k, p_k\rangle$ are the top-$k$ important feature labels of category $C_i$; $l_i$ represents a feature tag and $p_i$ represents the probability of having feature tag $l_i$.
The content posted by Weibo users on their personal accounts reflects their personal characteristics (e.g.,
hobbies, values and social needs). It is feasible to extract important information from such data to construct
user profiles. However, users do not store much content on Weibo, and only the most important aspects of their
personalities can be used to identify their behavioural characteristics. Therefore, this article selects only the
top-k important tags as features.
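To make Definition 1 concrete, the following is a minimal sketch of how a user profile $UP_i$ might be represented in memory; the field names and example values are illustrative assumptions, not drawn from the dataset.

```python
# A minimal sketch of the user profile structure from Definition 1.
# Field names and example values are illustrative assumptions.
user_profile = {
    "user_id": "u_001",            # U_i: the ID in the user profile
    "category": "tourism",         # C_i: the category U_i belongs to
    "top_k_labels": [              # top-k <l_i, p_i> feature tags of C_i
        ("travel photography", 0.31),
        ("backpacking", 0.22),
        ("hotel reviews", 0.14),
    ],
}
```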

3.2 Federated learning

A federated learning architecture includes a server and N participants, $P_1, P_2, \ldots, P_N$, as shown in Figure 1.

Fig. 1 Federated learning architecture

Under the coordination of the server, federated learning allows multiple participants to use their local data collaboratively to train a global model. The process of federated learning can be simply described as follows [18, 20]: (1) The server distributes the parameters of the current global model to the clients participating in this round of model update training; (2) The client receives the parameters, updates the local model, performs model training

based on the local data and updates the local model parameters; (3) The client uploads the parameters to the
server after updating the local model parameters of this round; (4) After the server receives all model parameters
participating in this round of training, it aggregates them according to certain rules and updates the global model
again.
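For illustration, the four steps above can be condensed into a short sketch of one communication round; this is a minimal Python illustration, assuming a hypothetical client object with an `update` method rather than any specific library API.

```python
import numpy as np

def federated_round(global_params, clients):
    """One communication round of the four-step process described above."""
    updates, sizes = [], []
    for client in clients:
        # Steps (1)-(2): distribute the global parameters; the client trains
        # locally starting from them. `client.update` is a hypothetical helper.
        local_params, n_samples = client.update(global_params)
        # Step (3): the client uploads its updated parameters.
        updates.append(local_params)
        sizes.append(n_samples)
    # Step (4): the server aggregates, here by data-size-weighted averaging.
    weights = np.array(sizes) / sum(sizes)
    return sum(w * p for w, p in zip(weights, updates))
```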
According to the distribution of participants’ local data, federated learning is divided into horizontal, vertical
and transfer types [5, 21]. The essence of horizontal federated learning is the collaborative learning of samples.
The sample characteristics of each participant are similar, and model training is carried out by combining dif-
ferent data samples among participants. The essence of vertical federated learning is the collaborative learning
of features. There is much overlap in user IDs, but the characteristics of user data held by different participants
differ. Federated transfer learning solves the problem of each participant having few ID and sample feature
overlaps. It can deal with insufficient sample sizes for a certain issue under the premise of ensuring data privacy
and security.
Although federated learning uses a distributed learning framework, there are certain differences between
it and distributed machine learning. For example, under federated learning, the central server does not have the right to allocate and control all of the data used for training; it only acts as an honest-but-curious third party that performs modelling tasks, distributes parameters and aggregates the parameters of the updated local models. In distributed machine learning, the data of each work node are IID, and the number of work nodes
is far lower than the number of training data samples. In federated learning, each participant is a work node, and
the data are non-IID under the following situations [1]: covariate shift, prior probability shift, concept shift and
data imbalance. Among these, a prior probability shift refers to the different distributions of the category labels
of different clients. The setting of the non-IID data in this article is based on the prior probability shift, which is
expressed as $P_{\text{train}}(x \mid y) = P_{\text{test}}(x \mid y)$ and $P_{\text{train}}(y) \neq P_{\text{test}}(y)$.

4 Problem definitions

The problem addressed is to combine the data of multiple participants under a federated learning architecture to construct user profiles when the data are horizontally partitioned.
The central server holds a uniformly distributed pre-training dataset provided by all participants according to their user category characteristics. The dataset was desensitised to delete user identities, and it is stored only on the central server. The pre-training dataset can be expressed as $S = \{\langle U^1, C^1\rangle, \langle U^2, C^2\rangle, \ldots, \langle U^M, C^M\rangle\}$, where $U^i$ is the unstructured data describing user $i$, $C^i$ is the corresponding category label and the total number of categories is $K$. The central server trains the global LDA model according to the pre-

Table 1 Symbols used in this article and their meanings

Symbol | Meaning
$P_i$ | The i-th participant in federated learning
$D_i$ | Dataset held by the i-th participant
$S$ | Pre-training dataset held by the server
$N$ | Total number of participants
$l_i$ | Topic feature tag i
$p_i$ | Probability of having topic feature label $l_i$
$U_i$ | User ID in user data
$C_i$ | Category i
$K$ | Total number of categories
$w$ | Softmax model parameters
$M$ | Total number of documents in the pre-training dataset
$U^i$ | Unstructured data describing user i in $S$

training dataset to map the participants' user text data into topic feature space vectors. Each participant $P_i$ holds part of the user data; the data held by participant $P_i$ are $D_i = \{U_i, \langle l_1, p_1\rangle, \langle l_2, p_2\rangle, \ldots, \langle l_n, p_n\rangle\}$, where $l_i$ is the i-th feature label and $p_i$ is the probability that user $U_i$ has label $l_i$. The central server initialises a softmax regression multi-classification model and uses the FedAvg federated learning algorithm to train, together with all participants, a global model covering all types of labels. The central server distributes the trained global model to each participant, and each participant uses its local data and the global model to obtain the distribution of group user interest labels in each category as a group user profile.
The symbols used in this article are described in Table 1.

5 Fed-UserPro algorithm

This section introduces the Fed-UserPro algorithm, which is divided into server and client sides.

5.1 Fed-UserPro server-side algorithm

The server-side algorithm is divided into preprocessing and training sub-algorithms. The server holds the pre-training dataset $S = \{\langle U^1, C^1\rangle, \langle U^2, C^2\rangle, \ldots, \langle U^M, C^M\rangle\}$. First, during the preprocessing stage, a global LDA model is trained to support the vector representation of the participants' local data in the topic feature space; then, a classification model is jointly trained during the training stage.
5.1.1 Server-side preprocessing algorithm
The data held by the server are the users' unstructured text data (i.e., all Weibo posts published by user $U_i$). The text data of Weibo users can intuitively reflect their interests and preferences. Owing to noise in microblog content, low word frequencies and nonstandard word use, traditional text mining technology cannot be used effectively. Moreover, the high-dimensional representation of text data in the feature space is a major challenge for subsequent model training. Topic models show a strong advantage in mining this type of text. Therefore, this paper uses the LDA topic model [15] to preprocess the text data, extracting topic features to obtain a low-dimensional topic feature space.
The LDA model is an unsupervised Bayesian generative learning model that includes a text–topic–word
distribution. In recent years, it has been widely used for text dimensionality reduction, topic mining and text
representation. All user document data contain several topics and probabilities corresponding to different topics.
Each topic contains multiple feature tags, and the feature tags have corresponding probability distributions in
the topics. Figure 2 shows the process of document generation using the LDA model.

Fig. 2 LDA graphical model representation. LDA, latent Dirichlet allocation

In Figure 2, α and β are corpus-level parameters shared by all documents; θ is a document-level variable, and each document corresponds to one θ. Thus, the probability of each document generating each topic z differs, and θ is sampled once per document. Both z and w are word-level variables: z is generated by θ, w is generated by both z and β, and each word w corresponds to a topic z.

According to the LDA probability model, the joint distribution of all variables is

$$P(\mathbf{w}, \mathbf{z}, \theta_d \mid \alpha, \beta) = P(\theta_d \mid \alpha) \prod_{n=1}^{N_d} P(z_n \mid \theta_d) P(w_n \mid z_n, \beta) \tag{1}$$
The training process of the topic model learns the parameters of the model in the existing document set,
and the Gibbs sampling method is most often used to solve the distribution parameters. It randomly assigns a
topic number to each feature word in the document set and modifies the topic number of each word by scanning
and updating the entire corpus. It repeats the process until convergence to obtain document topic distribution
parameters.
On the federated learning server side, the word segmenter is used to preprocess each user’s data (e.g., word
segmentation and stop-word removal) so that each user document becomes a bag of words. It then uses the LDA
model for training and obtaining the latent semantic information of the dataset. It then calculates the probability
distribution of each user under each topic to realise the vector representation of the original data in the topic
feature space.
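As an illustration of this preprocessing pipeline, the sketch below trains an LDA model and maps each user document to a topic-probability vector. The use of the gensim library, the variable names and the topic count of 20 are assumptions for illustration; the paper does not prescribe an implementation.

```python
from gensim import corpora
from gensim.models import LdaModel

# docs: tokenised user documents after word segmentation and stop-word
# removal, e.g. [["travel", "beach", ...], ...] (assumed to exist).
dictionary = corpora.Dictionary(docs)              # bag-of-words vocabulary
corpus = [dictionary.doc2bow(d) for d in docs]     # doc -> (word_id, count) pairs

# Topic count is illustrative; Section 5.1.1 chooses it via perplexity/coherence.
lda = LdaModel(corpus=corpus, id2word=dictionary, num_topics=20,
               passes=10, random_state=0)

def topic_vector(doc_tokens):
    """Map one user's text to its probability distribution over topics."""
    bow = dictionary.doc2bow(doc_tokens)
    return [p for _, p in lda.get_document_topics(bow, minimum_probability=0.0)]
```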
It is essential that an appropriate number of topics be selected for the LDA model. Presently, the number of topics is mostly determined through experience and experiments. In this paper, two indicators, perplexity and coherence, were used to jointly determine the number of topics. Perplexity measures how uncertain the trained model is about the topic to which a document belongs; the lower the perplexity, the better. Generally, however, the more topics a model has, the lower its perplexity, which leads to overfitting on the training set and lower topic interpretability. Coherence reveals the strength of the semantic relationship between the words in a topic; the higher the coherence, the better. Therefore, the combined perplexity and coherence scores determine the number of topics in the model.
The server distributes the trained LDA model to all participants, and each participant uses it to preprocess its local data. After preprocessing, all user text data are mapped to the topic feature space so that the server and the participants can begin federated learning to jointly construct a global user profile.
5.1.2 Server-side training algorithm
The server collects each round's participant parameters and updates the global model as a weighted average according to the proportion of each participant's dataset in the global dataset. It then sends the updated parameters to the participants for the next iteration. The specific procedure is shown in Algorithm 1.

Algorithm 1 Fed-UserPro server-side training algorithm

Input: Local model parameters of each participant $w_t^1, w_t^2, \ldots, w_t^N$
Output: Global user profile model
1: Initialise model parameters $w_1$;
2: for $t = 1, 2, \ldots, T$ do
3:   $m \leftarrow \max(\gamma \cdot N, 1)$, $\gamma \in [0, 1]$; // m is the number of participants in this round
4:   $P_t \leftarrow$ the set of participants selected for this round;
5:   for each participant $C \in P_t$ do
6:     $w_{t+1}^C \leftarrow$ ClientUpdate($w_t$);
7:   end for
8:   $w_{t+1} \leftarrow \sum_{C} \frac{|D_C|}{|D|} w_{t+1}^C$;
9: end for
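A minimal Python sketch of Algorithm 1 follows; the `client_update` callable and the `num_samples` attribute are illustrative assumptions standing in for the client-side algorithm and $|D_C|$.

```python
import random

def server_train(init_params, clients, T, gamma, client_update):
    """Sketch of Algorithm 1: each round, sample a fraction gamma of the N
    participants, collect their local updates and aggregate by data size."""
    w = init_params
    for t in range(T):
        m = max(int(gamma * len(clients)), 1)    # line 3: participants this round
        selected = random.sample(clients, m)     # line 4: this round's set P_t
        updates = [(client_update(c, w), c.num_samples)  # line 6: ClientUpdate
                   for c in selected]
        total = sum(n for _, n in updates)
        w = sum((n / total) * w_c for w_c, n in updates)  # line 8: weighted average
    return w
```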

5.2 Fed-UserPro client algorithm

This paper uses the softmax multi-classification regression algorithm to train the user classification model. Softmax regression generalises logistic regression, which is used for binary classification tasks, to multi-classification tasks.

The client $P_i$ holds an input training dataset containing $K$ categories, $D_i = \{\langle U_1, C_1\rangle, \langle U_2, C_2\rangle, \ldots, \langle U_m, C_m\rangle\}$, where $C_i \in \{1, 2, \ldots, K\}$ and $m$ is the number of samples. The output of the softmax multi-classification regression model is the probability that user $U_i$ belongs to each category, expressed as $h_w(U_i)$ in Eq. (2):
$$h_w(U_i) = \begin{bmatrix} p(C_i = 1 \mid U_i; w) \\ p(C_i = 2 \mid U_i; w) \\ \vdots \\ p(C_i = K \mid U_i; w) \end{bmatrix} = \frac{1}{\sum_{j=1}^{K} e^{w_j^T U_i}} \begin{bmatrix} e^{w_1^T U_i} \\ e^{w_2^T U_i} \\ \vdots \\ e^{w_K^T U_i} \end{bmatrix} \tag{2}$$

where $w = [w_1^T, w_2^T, \ldots, w_K^T]$ are the parameters of the model. The probability of user $U_i$ belonging to category $C_j$ is shown in Eq. (3):

$$p(C_i = j \mid U_i; w) = \frac{e^{w_j^T U_i}}{\sum_{l=1}^{K} e^{w_l^T U_i}} \tag{3}$$

The probabilities of a sample belonging to the $K$ categories sum to one. In softmax regression, the cross-entropy loss function is often used to measure the gap between the predicted and real results. $L(w)$ is the cost function of softmax regression:

$$L(w) = -\frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{K} 1\{C_i = j\} \log \frac{e^{w_j^T U_i}}{\sum_{l=1}^{K} e^{w_l^T U_i}} \tag{4}$$

where $1\{\cdot\}$ is an indicator function (i.e., $1\{\text{true expression}\} = 1$ and $1\{\text{false expression}\} = 0$). The gradient of $L(w)$ with respect to $w_j$ is given by Eq. (5):

$$\frac{\partial L(w)}{\partial w_j} = -\frac{1}{n} \sum_{i=1}^{n} U_i \left( 1\{C_i = j\} - p(C_i = j \mid U_i; w) \right) \tag{5}$$
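Eqs. (2)-(5) translate directly into a vectorised implementation. The following NumPy sketch computes the loss of Eq. (4) and the gradient of Eq. (5); it assumes the rows of X are the topic-feature vectors $U_i$ and y holds integer category labels.

```python
import numpy as np

def softmax_loss_and_grad(W, X, y, K):
    """Cross-entropy loss (Eq. 4) and gradient (Eq. 5) for softmax regression.
    W: (K, d) parameters, X: (n, d) topic-feature vectors, y: (n,) labels."""
    n = X.shape[0]
    logits = X @ W.T                                 # w_j^T U_i for every i, j
    logits -= logits.max(axis=1, keepdims=True)      # stabilise the exponentials
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)        # Eq. (3): p(C_i = j | U_i; w)
    onehot = np.eye(K)[y]                            # the indicator 1{C_i = j}
    loss = -np.mean(np.log(probs[np.arange(n), y]))  # Eq. (4)
    grad = -(onehot - probs).T @ X / n               # Eq. (5), stacked over j
    return loss, grad
```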

The algorithm of the Fed-UserPro client is shown in Algorithm 2.

Algorithm 2 Client $P_k$ training algorithm ClientUpdate($w_t$)

Input: Local training set $D_k = \{U_i, \langle l_1, p_1\rangle, \langle l_2, p_2\rangle, \ldots, \langle l_n, p_n\rangle\}$
Output: Local training model parameters $w_t^k$
1: The client $P_k$ downloads the parameters $w_t$ for this round of model training from the server;
2: Divide $D_k$ into small-batch training sets $B$ of size batch_size;
3: $w_t^k \leftarrow w_t$;
4: for $e = 1, 2, \ldots, E$ do
5:   for each batch $b \in B$ do
6:     $w_t^k \leftarrow w_t^k - \eta \nabla L(w_t^k; b)$;
7:   end for
8: end for
9: Upload $w_t^k$ to the server;

The client $P_k$ receives the training parameters sent by the server for local training. The client divides the held data $D_k$ into small-batch training sets $B$ of size batch_size, takes one batch $b$ at a time and updates the parameters $w_t^k$ using the gradient of the loss function $L(w)$ given in Eq. (5). After the local task is executed, the trained parameters are uploaded to the server. Finally, the user profile is constructed from the global data.
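A minimal sketch of Algorithm 2's local update, reusing `softmax_loss_and_grad` from the previous sketch; the default hyperparameters echo values studied in Section 6 but are otherwise illustrative.

```python
import numpy as np

def client_update(w_global, X, y, K, epochs=3, batch_size=120, lr=0.1):
    """Algorithm 2: E local epochs of mini-batch SGD starting from the global
    parameters, after which the updated parameters are returned for upload."""
    w = w_global.copy()                              # line 3: w_t^k <- w_t
    n = X.shape[0]
    for _ in range(epochs):                          # line 4: e = 1, ..., E
        order = np.random.permutation(n)
        for start in range(0, n, batch_size):        # line 5: each batch b in B
            b = order[start:start + batch_size]
            _, grad = softmax_loss_and_grad(w, X[b], y[b], K)
            w -= lr * grad                           # line 6: SGD step
    return w                                         # line 9: uploaded to the server
```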

6 Experiments

In this study, the Fed-UserPro method was experimentally verified on a real dataset. This section introduces
the experimental environment, parameter settings and experimental results.

6.1 Experimental data acquisition

This study applied crawler technology to collect the text data published by 30,000 active microblog users across 10 Sina Weibo fields (i.e., tourism, military, finance, sports, film, sports, childcare, food, fashion and digital). The text data published by users were preprocessed mainly using Chinese word segmentation and stop-word removal. Each user was labelled with an interest type according to the field (category) from which the user was sourced.
The above data were divided into training and test sets. In each category, 10% of the data were selected and provided to the central server for LDA model pre-training. The data released by 21,000 users were divided horizontally and used as the training sets of the participants in federated learning. Reflecting practical conditions, the data of each participant exhibited a non-IID prior probability shift: each participant held 10 categories of data for a total of 4,200 training samples, of which two categories contained 1,500 training samples each and the remaining eight contained 150 training samples each.
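The prior-probability-shifted partition described above can be generated with a short sketch; the helper below is an illustrative assumption about how one participant's split might be drawn, using the sample counts from this section.

```python
import random

def non_iid_split(samples_by_category, major_categories,
                  n_major=1500, n_minor=150):
    """Draw one participant's training set with a prior probability shift:
    two 'major' categories contribute 1,500 samples each and the remaining
    eight contribute 150 each, i.e. 4,200 samples in total."""
    client_data = []
    for cat, samples in samples_by_category.items():
        n = n_major if cat in major_categories else n_minor
        client_data.extend(random.sample(samples, n))
    random.shuffle(client_data)
    return client_data
```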
The text data published by 770 users from each category were selected as the test set for federated learning. It was assumed that the central server was credible and that there was no data interaction between the central server and the participants, or among the participants, during the model training stage, so that the original data information would not be leaked. The number of topics in the LDA model was jointly determined by the topic perplexity and coherence indicators, and the topic-word distribution was obtained after model training was completed. The user's domain served as the user's category, and the number of topics in the LDA model served as the number of user features.

6.2 Experimental parameters

For multi-classification tasks, precision, recall and F1 score are generally used as model evaluation indicators. The formulas for precision (P) and recall (R) are shown in Eq. (6), where TP denotes true positives, FP false positives and FN false negatives:

$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \tag{6}$$

Owing to the differences in the number of samples held by each participant during federated learning, the classification accuracy alone did not reflect the performance of the model in each category. This study therefore also used the macro-average index to evaluate the classification performance of the model. The macro-average used here is the F1 score computed from the arithmetic means of the per-class precision and recall, as shown in Eq. (7):

$$\text{Macro-avg} = \frac{2 \times \frac{1}{n}\sum_{i=1}^{n} P_i \times \frac{1}{n}\sum_{i=1}^{n} R_i}{\frac{1}{n}\sum_{i=1}^{n} P_i + \frac{1}{n}\sum_{i=1}^{n} R_i} \tag{7}$$
where $P_i$ and $R_i$, respectively, represent the precision and recall of the i-th category. To track the changes in the differences between the participants' updated parameters during training, the average variance of the participant parameters was also tested, calculated as shown in Eq. (8):

$$V = \frac{1}{K} \sum_{j=1}^{K} \frac{\sum_{i=1}^{N} (v_{ij} - \bar{v}_j)^2}{N} \tag{8}$$
where K is the total number of categories, N is the total number of participants, v̄ j is the average value of the
j-th category parameter and vi j is the parameter value of the i-th participant in the j-th category.
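For reference, Eqs. (7) and (8) can be computed as follows; the sketch assumes per-class precision and recall lists and one parameter vector of length K per participant.

```python
import numpy as np

def macro_avg(precisions, recalls):
    """Eq. (7): F1 score of the arithmetic means of per-class P_i and R_i."""
    p_bar, r_bar = np.mean(precisions), np.mean(recalls)
    return 2 * p_bar * r_bar / (p_bar + r_bar)

def avg_param_variance(client_params):
    """Eq. (8): mean over the K categories of the variance, across the N
    participants, of each category's parameter value v_ij."""
    v = np.stack(client_params)          # shape (N, K)
    return np.mean(np.var(v, axis=0))    # variance over participants, mean over j
```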

Table 2 Accuracy and macro-average results of federated learning and single-point learning
P1 Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 20.00% 22.45% 49.15% 64.39% 74.31% 77.58% 79.02%
Macro-average 7.35% 12.19% 49.04% 62.03% 72.01% 74.43% 73.65%
P2 Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 20.00% 20.00% 38.11% 47.50% 65.53% 79.75% 85.76%
Macro-average 6.66% 7.31% 37.00% 47.83% 66.10% 79.79% 84.46%
P3 Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 19.98% 20.00% 35.85% 50.58% 73.49% 84.32% 89.88%
Macro-average 7.46% 6.71% 33.69% 50.52% 72.17% 83.98% 88.95%
P4 Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 20.00% 20.64% 41.40% 45.63% 57.89% 75.62% 85.88%
Macro-average 10.56% 11.92% 45.18% 48.80% 60.73% 77.49% 85.92%
P5 Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 20.00% 20.19% 35.15% 43.57% 57.37% 71.12% 80.32%
Macro-average 7.16% 7.54% 32.51% 43.45% 57.59% 69.91% 77.93%
Fed-UserPro Index r-1 r-15 r-25 r-35 r-50 r-100 r-1000
Precision 79.77% 94.85% 97.02% 97.87% 98.20% 98.44% 98.28%
Macro-average 73.05% 94.69% 96.97% 97.84% 98.19% 98.43% 98.28%

6.3 Analysis of experimental results

The experiment compared and analysed Fed-UserPro, the user profile construction method based on federated learning, against UserPro, the user profile construction method using single-party data. Each user profile construction algorithm was iterated for 1,000 rounds, and the results are shown in Table 2. P1-P5 represent the results of five single parties executing the UserPro algorithm on their respective data. The data of all five single parties followed prior-probability-shifted distributions, and the distributions differed from one another, as described in Section 6.1. Fed-UserPro represents the user profile construction method based on the federated learning
proposed in this paper. The experiment analysed the accuracy of the algorithm and the macro-average. The
first (r-1), 15th (r-15), 25th (r-25), 35th (r-35), 50th (r-50), 100th (r-100) and 1,000th (r-1,000) round results
are presented for comparative analysis. The experimental results show that the accuracy of Fed-UserPro in the
first round of iteration reached 79.77%, which is significantly better than the accuracy of the first round of the
single party. After 100 iterations, the accuracy of Fed-UserPro reached 98.44%, and the accuracy of the single
party was 84.32% (best) and 71.12% (worst), showing the advantages of federated learning in terms of accuracy.
Similar results can be seen for the macro-average, because the participants uploaded representative samples to the central server to train the global LDA model, which reduced the inaccurate topic distributions that arise in single-point learning from small sample counts, poor sample diversity and the dimensionality reduction of the test set into the topic feature space. Simultaneously, federated learning let multiple participants train together, which increased the number of training samples of each type relative to single-point learning and improved the accuracy and macro-average score of the model on the test set. Fed-UserPro
improved the accuracy by 19.71% in the best case compared with the r-1,000 accuracy of P5, and by 8.69% in
the worst case compared with the r-1,000 accuracy of P3.
This paper also experimentally verified the performance changes of federated learning when its parameters were varied (i.e., the number of local epochs in participants' local training, the batch size used for model parameter updates and the number of participants per round participating in parameter updates). Performance testing covered algorithm availability (i.e., precision and macro-average) and efficiency (i.e., running time). The experimental results are shown in Figures 3-5; the abscissas all indicate the number of iterations.


Fig. 3 Experiment of changing the local epoch size of participants. (a) Accuracy (b) Macro-average


Fig. 4 Experiment of changing the batch size of the participants. (a) Accuracy (b) Macro-average


Fig. 5 Experiments to change the number of participants. (a) Accuracy (b) Macro-average

Figure 3 shows the accuracy and macro-average score of the Fed-UserPro algorithm when the local epoch of
the participants’ local training iterations was changed. At this time, the number of participants participating in
the model update in each round of federated learning was fixed at five, and the experiment was iterated for 1,000
rounds. The figure shows the results of the first 200 rounds. When the local epoch was set to five, the model
converged the fastest, and the accuracy reached 97.72% after 20 iterations. When the local epoch was set to
one, the accuracy was only 85.33% after 20 iterations. This is because, in each round of global communication,
increasing the number of calculations of the local model can reduce the global communication cost, and the
model can achieve higher accuracy in fewer communication rounds. Similar results can be seen in the macro-average score: when the local epoch was five, the macro-average score increased the fastest.
Figure 4 shows the changes in accuracy and macro-average resulting from changing the batch size used for model parameter updates. When the batch size was 120, the convergence speed of the model was the fastest. When the number
of iterations was 20, the accuracy of the model reached 96.27%. However, when the batch size was 600, the
accuracy of iteration 20 was only 80.24%. When the batch size was 300, the accuracy was between batch size
120 and batch size 600. Thus, reducing the number of samples for each model parameter update can speed up
the convergence of the model. Similar results can be seen with the macro-average score, but the convergence was slightly slower: at 20 iterations, the macro-average score was 96.19% with a batch size of 120 but only 73.78% with a batch size of 600.
Figure 5 shows the changes in accuracy and macro-average resulting from changing the number of participants in each update round. It can be seen from the results that when all five participants participated in each round of
global model update training, the accuracy of the model on the test dataset and the macro-average score steadily improved, and the model converged faster. When the number was three or four, the accuracy and macro-average score fluctuated greatly, and the smaller the number, the greater the fluctuation. More communication rounds were also required for the model to converge stably because some participants did not participate in the parameter update in a given round; when they participated in the parameter update again, the accuracy of the model improved greatly.
To track the changes in the differences between the participants' updated parameters during training, the average variance between the participants' parameters during federated learning was tested experimentally, as shown in Figure 6. The abscissa represents the number of iterations, and the ordinate represents the average variance between the participant parameters. Owing to the
differences in the amount of various local training data among participants at the beginning of training, each
participant focused more on training large local sample categories. Therefore, with the increase in training
times, the average variance of parameters among participants also increased. With the advancement of federated
learning, the central server aggregated and updated the parameters of all categories in each round, and the aver-
age variance of parameters among participants decreased, also improving the accuracy of participants in small
sample categories according to the aggregated parameters. This further improved the classification performance
of the local model as a whole.
The running times of the Fed-UserPro and UserPro algorithms were also tested, and the results are shown in Figure 7. Figure 7(a) shows the time required for the Fed-UserPro and UserPro algorithms to run 200 rounds when the batch size used for model parameter updates was varied and the number of local training iterations was fixed at three. The running time was longest when the batch size was 120 because the number of samples involved in each update was small: training on the local data requires more iterations and therefore more calculation time.
Figure 7(b) shows the running times of the Fed-UserPro and UserPro algorithms over 200 rounds when the number of local training iterations was varied and the batch size was fixed at 120. The smaller the number of local training iterations, the shorter the algorithm's running time. The results in Figure 7 also show that as the number of participants increased, the running time of the Fed-UserPro algorithm increased linearly, showing good scalability.

Fig. 6 Average variance between various parameters of participants in federated learning


Fig. 7 Running time experiment. (a) Change batch size (b) Change local epoch

7 Summary

User profiling is widely used in e-commerce, social networking, internet financing, product development
and other fields. It provides an important basis for accurate advertising, personalised recommendations and risk
control. Building a user profile requires a vast amount of data to provide an accurate user portrayal. However,
data island problems have become the biggest obstacle to building user profiles in a centralised fashion. The
emergence of federated learning allows multiple parties to jointly train user profile models without sharing local data. This paper proposed Fed-UserPro, a federated learning user profile method based on a
multi-classification model, and experimental verification was carried out on a real dataset. Experimental results
showed that this method can significantly improve the accuracy of a single-party training model based on local
data. This not only ensures that the data of participants are not shared but also improves model accuracy, which
helps build powerful user group profiles.
Notably, Fed-UserPro has room for improvement, which motivates future research. First, the privacy protection of participants' data needs strengthening: intermediate parameters should be protected with encryption or differential privacy before being aggregated and sent to the server. The user profile algorithm also needs improvement, for example by applying an unsupervised clustering algorithm that does not require the number of user categories to be determined in advance. To summarise, user profile technology under federated learning is
still a new research field, and there are many problems worthy of in-depth study.

Acknowledgements. This study was supported by the National Science Foundation of China (No.62002098), Natural
Science Foundation of Hebei Province (No.F2020207001, No.F2019207061), the Scientific Research Projects of Hebei
Education Department (No.QN2018116), and the Research Foundation of Hebei University of Economics and Business
(No.2018QZ04, No.2019JYQ08).

References

[1] Kairouz P, McMahan H B, Avent B, et al. (2020), Advances and Open Problems in Federated Learning. Foundations
and Trends in Machine Learning, 14(1-2):1-210
[2] Li Q, Wen Z, Wu Z, et al. (2019), A Survey on Federated Learning Systems: Vision, Hype and Reality for Data Privacy
and Protection. CoRR abs/1907.09693
[3] Aledhari M, Razzak R, Parizi R M, et al. (2020), Federated learning: A survey on Enabling Technologies, Protocols,
and Applications. IEEE Access, 8: 140699-140725
[4] McMahan B, Moore E, Ramage D, et al. (2017), Communication-efficient Learning of Deep Networks from Decen-
tralized Data. AISTATS: 1273-1282
[5] Yang Q, Liu Y, Chen T, et al. (2019), Federated Machine Learning: Concept and Applications. ACM Transactions on
Intelligent Systems and Technology, 10(2):1-19
[6] Li T, Sahu A K, Talwalkar A, et al. (2020), Federated learning: Challenges, methods, and future directions. IEEE
Signal Processing Magazine, 37(3): 50-60.
[7] Li T, Sahu A K, Zaheer M, et al. (2020), Federated optimization in heterogeneous networks. Proceedings of Machine
Learning and Systems, 2: 429-450
[8] Cooper A, Reimann M R. (2005), Software concept revolution: the essence of interaction design. Beijing: Electronic
Industry Press
[9] Lin Y, Xie X. (2018), User Portrait of Diversified Groups in Micro-blog Based on Social Identity Theory. Information
Studies: Theory & Application. 041(003):142-148
[10] Konečný J, McMahan H B, Yu F X, et al. (2016), Federated learning: Strategies for improving communication effi-
ciency. arXiv preprint arXiv:1610.05492
[11] Liu Y, Kang Y, Xing C, et al. (2018), Secure Federated Transfer Learning. arXiv preprint arXiv:1812.03337
[12] Zeng Z, Sun S. (2020), Research on Personalized Mobile Visual Search of Smart Library Based on User Portrait.
Library & Information, (4):8
[13] He Z, Zhu Q, Bai M. (2021), The Construction of Urban Elderly User Portrait from the Perspective of Pension Service.
Journal of Intelligence, 40(09):154-160
[14] Ren Z, Zhang P, Lan Y, et al. (2019), Emotional Tendency Prediction of Emergencies Based on the Portraits of Weibo
Users Taking “8 12” Accident in Tianjin as an Example. Journal of Intelligence, 38(11):130-137
[15] Blei D M, Ng A Y, Jordan M I. (2003), Latent Dirichlet allocation. Journal of Machine Learning Research, 3: 993-1022
[16] Zhao Y, Li M, Lai L, et al. (2018), Federated learning with non-IID data. arXiv preprint arXiv:1806.00582
[17] Bonawitz K, Eichner H, Grieskamp W, et al. (2019), Towards Federated Learning at Scale: System Design. CoRR
abs/1902.01046
[18] Sattler F, Müller K R, Samek W. (2021), Clustered Federated Learning: Model-Agnostic Distributed Multi-task Opti-
mization under Privacy Constraints. IEEE Transactions on Neural Networks and Learning Systems 32(8):3710-3722
[19] Liu L, Zheng F. (2021), A Bayesian Federated Learning Framework with Multivariate Gaussian Product. CoRR
abs/2102.01936
[20] Wang J, Kong L, Huang Z, et al. (2020), Research review of federated learning algorithms. Big Data Research, 6(6):64-
82
[21] Hahn S J, Lee J. (2019), Privacy-preserving Federated Bayesian Learning of a Generative Model for Imbalanced
Clinical Data. CoRR abs/1910.08489
[22] Nilsson A, Smith S, Ulm G, et al. (2018), A Performance Evaluation of Federated Learning Algorithms. DIDL at
Middleware: 1-8
[23] Sahu A K, Li T, Sanjabi M, et al. (2018), On the Convergence of Federated Optimization in Heterogeneous Networks.
CoRR abs/1812.06127
