
Abstract

Data privacy has become a paramount concern in the era of big data, prompting the development
of encryption algorithms and security strategies to safeguard sensitive information. Centralized
machine learning approaches often involve transferring data to a central point for model training,
which poses a risk of data exposure. To address this issue, multi-party privacy protection
combined with machine learning offers a solution, with federated learning emerging as a way to
ensure privacy in multi-party settings. In this study, we present PFMLP, a privacy-protected
machine learning algorithm that employs homomorphic encryption. The algorithm enables joint
model training while maintaining multi-party security. We use gradient descent with encrypted
data transmission, preventing data exposure during the process. To counter member inference
attacks, we employ homomorphic encryption on the data, ensuring data privacy. Our approach
demonstrates applicability across various domains, offering a privacy-protected multi-party
machine learning framework. Experimental results using MNIST and metal fatigue strength
datasets indicate the efficiency and accuracy of our method, paving the way for enhanced data
security and privacy in multi-party learning environments. The paper also presents a literature
review, details the PFMLP algorithm and its optimization, and discusses results and implications
for future work.

Introduction

Data privacy is one of the most pressing concerns in the age of big data. Several encryption
algorithms and security strategies exist to protect sensitive information, for example by allowing
only users who hold keys to access the data. In centralized machine learning, however, data is
collected, transferred to a central point, and used to train a model, and the transfer step risks
exposing sensitive data to hackers. Preventing data exposure during the transfer process is
therefore an important requirement for achieving data security.

An approach to ensuring data privacy is to combine multi-party privacy protection with machine
learning where several users share their data and jointly learn from the pooled data while
maintaining the security of their own information (1-3). This is possible through federated
learning (4, 5) which can address data privacy issues in a multi-party environment. In this paper,
we developed PFMLP, a machine learning privacy-protected algorithm using homomorphic
encryption. We jointly trained the model using gradient learning while maintaining multi-party
security. In each iteration round, we optimized the model using gradient descent ensuring that
each user could learn from other users’ information through transmission of the gradient. One
concern was the possibility of member inference attacks (6) wherein during the model training
process, hackers could train their own shadow models using the plaintext gradient thereby
compromising data privacy. To avoid this, we employed homomorphic encryption so that users
could perform calculations directly on encrypted data. Upon decryption, the result matches that
of the same operations performed on the plaintext, thereby ensuring the quality and efficiency of
the process (7). As encrypted data was used during the entire process, data privacy and data
security were ensured.

The machine learning algorithm and homomorphic encryption process proposed in this paper
have several practical applications. They provide a privacy-protected multi-party machine learning
framework using federated learning and homomorphic encryption to achieve data security and
protection of the model during the training process. Furthermore, it can ensure data privacy in a
multi-party learning environment. We tested our model using the MNIST and metal fatigue
strength datasets, and calculated the accuracy rate, time taken for homomorphic encryption, and
impact of various network structures and key lengths.

Section 2 presents the literature review that forms the foundation of our work. Section 3
provides an overview of the Paillier federated network algorithm in terms of network structure,
interaction, and security. Section 4 presents the experimental results, and Section 5 summarizes
the paper.

Literature review

Distributed machine learning

Distributed machine learning is a type of multi-node machine learning that aims to increase
accuracy, enhance performance, and scale easily to large data. One framework addressed the
synchronization issues that arise when training a huge model on a large volume of data by means
of a stale synchronous parallel parameter server (8). Another framework systematically
addressed data and model parallelism at larger scales (9). A sufficient factor broadcasting model
has also been proposed, which enables distributed learning of large matrix-parameterized models
(10). Communication efficiency under limited bandwidth has also been improved so that parallel
errors can be reduced in large-scale data-parallel applications (11). STRADS is another
distributed framework that optimizes the throughput of machine learning algorithms (12).

Google released DistBelief in 2012, a first-generation deep learning system capable of dividing a
model across 32 nodes to perform calculations (13). In 2013, model and data parallelism were
brought to distributed deep learning over an InfiniBand network (14). The training efficiency of
distributed SGD (stochastic gradient descent) under data and model parallelism has been
compared theoretically, showing that increasing the minibatch size can improve training
efficiency (15, 16).

Homomorphic encryption and secure multi-party computation


In distributed machine learning, a central network assigns tasks to external users and so, the data
is accessible by all users and there is no data privacy in the system. Multi-party computing is
usually involved in distributed learning where unknown or complicated computing processes are
revealed to third parties. One of the first methods proposed was the garbled circuit method,
which was used to solve general and simple two-party problems (17).
Several years later, SMPC (Secure Multi-Party Computation) was introduced (18). Currently,
SMPC represents a sub-category of cryptography that allows distributed users to collaborate in
computing functions while maintaining the privacy of their information.

Homomorphic encryption has gradually gained popularity in recent years. It was initially
proposed for banking applications in 1978 (19). Multiplicative homomorphism appears in one of
the first cryptosystems, RSA (Rivest–Shamir–Adleman) (20, 21). The Paillier algorithm,
developed in 1999 (22), is additively homomorphic and has been used in applications such as
digital auctions, cloud ciphertext retrieval, and digital elections. A fully homomorphic encryption
(FHE) algorithm based on lattices, satisfying both additive and multiplicative homomorphism,
was proposed in 2009 (23). FHE has been applied in several settings owing to its high security
(24-26), and it has proven especially useful in cloud computing, where it ensures data privacy
(27).

Another technique for maintaining data privacy is differential privacy, which adds noise to the
data and thereby prevents data exposure (28-30). However, when the dataset is small, the
introduced noise may impair the efficiency of model training.

Federated learning

Google proposed federated learning in 2016 to address the issues of multi-party learning and data
security (31). Since its release, it has been widely used as a cooperative multi-party machine
learning algorithm in industry and research (32, 33). Initially, federated learning was used for
updating models locally for Android users. Later, in 2019, scientists at Google used
TensorFlow to develop a scalable production system for multi-party joint learning on mobile
devices (34). Furthermore, in the same year, the problem of adapting the parameters of learning
models when data is divided among multiple nodes, without sending the raw data to a central
network, was addressed (35). The applications of federated transfer learning have also been
explored and applied to multi-party secure machine learning algorithms (36). A framework
known as SecureBoost has been proposed, with accuracy equivalent to that of a total of five
privacy protection techniques (37).

Several applications of federated learning have been documented. Google designed the Gboard
system which can recognize keyboard input, carry out predictions, protect privacy, and enhance
input efficiency of users (38, 39). In the field of healthcare, federated learning has been used to
protect sensitive medical information of patients (40, 41). Other applications of federated
learning include natural language processing (42) and recommendation systems (43).

Much work has been done in the past few years on machine learning algorithms for privacy
protection. Differential privacy has been used for this purpose, and SMC has been used to
address the noise that arises from the use of differential privacy (44, 45). The BatchCrypt
algorithm, developed by optimizing the FATE framework, has also been proposed (46). This
algorithm encodes a group of quantized gradients as a single long integer and encrypts it in one
operation, thereby increasing encryption and decryption efficiency and decreasing the required
computation. A vertical federated learning system for Bayesian machine learning using
homomorphic encryption has also been proposed, which has been shown to achieve up to 90%
of the performance of a model trained on a union server (47).

Methods

Generation of a multi-sample cooperative learning algorithm using a federated idea

When data is isolated, several intermediate variables still interact during the training phase. At
this point, it is possible to use the other party's data to optimize one's own model (Figure 1).
This forms the basis of federated learning, which is of two types: horizontal (sample-expansion)
and vertical (feature-expansion) federated learning.

Figure 1: Horizontal and vertical federated learning indicated by dotted row and column
respectively. ‘A’ and ‘B’ are the two data owners.
Horizontal federated learning encompasses machine learning with sample expansion. Let D
stand for a dataset, X for features, Y for labels, and I for the sample index; horizontal federated
learning can then be represented as:

X_i = X_j, Y_i = Y_j, I_i ≠ I_j, ∀ D_i, D_j, i ≠ j

Based on this equation, there may or may not be intersections between various users that have
different data. Horizontal federated learning aims to enable different users to pool their data to
train a model while ensuring the security and privacy of that data. For this to take place, all
users' data must be aligned so that every user involved in the training shares the same feature
domain. This ensures that all users contribute to the same model architecture and that the
model's iterations are synchronous. In contrast,
for vertical federated learning, the data of all users involved in the training have different
features.
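
To make the horizontal condition concrete, the following Python sketch (with illustrative shapes; the sample and feature counts are assumptions, not values from the paper) partitions one dataset by sample index while both partitions keep an identical feature space:

import numpy as np

# Hypothetical dataset: 1000 samples with 15 features each.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 15))
y = rng.integers(0, 4, size=1000)

# Horizontal (sample-expansion) split: both owners keep the full feature set.
X_A, y_A = X[:500], y[:500]      # data owner A holds sample indices 0..499
X_B, y_B = X[500:], y[500:]      # data owner B holds sample indices 500..999

assert X_A.shape[1] == X_B.shape[1]   # X_i = X_j: identical feature domain
# The sample indices are disjoint (I_i ≠ I_j), so the two parties expand samples.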

Federated network algorithm

The federated learning network proposed here aims to enable all users to train the same model by
passing various intermediate variables during the training phase. Since most neural networks are
trained by gradient descent, gradients have been chosen as the intermediate variables here.
Although the gradients do not directly expose the data, they capture the relationship between
the data and the model, which is what is needed to train the model. Figure 2 shows the federated
learning network's architecture, which comprises several learning clients and a computing
server.

Figure 2: Federated neural network architecture


Learning client

Learning clients have their own data, the quantitative dimensions of which are aligned with other
users’ data before training the model. The main functions of the learning client are initializing
the same model as the other users, training on local data, extracting gradients during training,
sending the gradients to the computing server for computation, collecting the server's responses,
updating the model with the results, and performing repeated iterations until the model
converges.

Computing server

The computing server represents an intermediate platform during the learning process. The main
functions of the server are receiving gradient information from the users, performing
calculations on the gradients, integrating the data obtained from the multiple models, and
transmitting the result to each individual user.

Federated multi-layer perceptron algorithm

In this paper, a federated multi-layer perceptron (FMLP) algorithm was developed based on the
traditional multi-layer perceptron algorithm. This type of algorithm is also known as a deep feed-
forward network and belongs to the category of deep learning models. The FMLP algorithm is
capable of training simple models for individual clients in an environment of multi-party data
isolation by means of a gradient-sharing process. Figure 3 shows the architecture of an FMLP
network.

Figure 3: Multi-layer perceptron model

Let the model parameters be θ = {w_1, …, w_n, b_1, …, b_n}, the learning rate during training
be lr, and the dataset be x = {x_1, …, x_n}. The objective of the model is to approximate the
distribution f*. The network's forward pass calculates the training output using the following
formula:

out = fp(x, θ)

The loss function calculates the distance between the ideal value and the model output using the
following formula:
c = loss(f*(x), out)

The back-propagation calculates the gradients and propagates them backward from the loss
function so that the network parameters can be adjusted accordingly and the error between the
ideal value and the output can be decreased. The back-propagation can be calculated using the
following formula:

grad = bp(x, θ, c)

The model is updated based on the adjustment of the network parameters according to the
gradients obtained following the back-propagation process, which is expressed using the
following formula:

θ' = θ − lr ∙ grad
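
To make these formulas concrete, the following sketch implements fp, loss, and bp for a one-hidden-layer perceptron and performs a single update step. The sigmoid activation, squared-error loss, and layer sizes are illustrative assumptions, and bp is passed the target rather than the scalar loss c, since the gradients are derived from it:

import numpy as np

def _hidden(x, theta):
    w1, b1, _, _ = theta
    return 1.0 / (1.0 + np.exp(-(x @ w1 + b1)))      # sigmoid hidden layer

def fp(x, theta):
    # Forward pass: out = fp(x, θ)
    _, _, w2, b2 = theta
    return _hidden(x, theta) @ w2 + b2

def loss(target, out):
    # Squared-error distance between the ideal value and the model output
    return 0.5 * np.sum((out - target) ** 2) / len(target)

def bp(x, theta, target):
    # Back-propagation: grad = bp(x, θ, c), with the target standing in for c
    w1, b1, w2, b2 = theta
    h = _hidden(x, theta)
    out = h @ w2 + b2
    d_out = (out - target) / len(x)                  # dL/d(out) for the mean loss
    d_h = (d_out @ w2.T) * h * (1.0 - h)             # chain rule through the sigmoid
    return [x.T @ d_h, d_h.sum(0), h.T @ d_out, d_out.sum(0)]

# One iteration of gradient descent: θ' = θ − lr ∙ grad
rng = np.random.default_rng(0)
theta = [rng.normal(scale=0.1, size=s) for s in [(15, 64), (64,), (64, 4), (4,)]]
x, target = rng.normal(size=(8, 15)), rng.normal(size=(8, 4))
lr = 0.1
theta = [p - lr * g for p, g in zip(theta, bp(x, theta, target))]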

Therefore, when a multi-layer perceptron (MLP) is combined with the federated network, a
federated multi-layer perceptron (FMLP) network is generated. Each user stores a copy of the
MLP model in its local memory. This copy comprises an input layer (x units), n hidden layers
(y units each), and an output layer (z units). The size of the input layer is determined by the
feature dimensions of the input data, while the size of the output layer is determined by the
required network output, which in turn is determined by the target output of the real application.

During the learning process, the computing server fuses the gradient data, which accelerates
gradient descent. Before updating the model, each user sends its gradients to the computing
server. The server integrates the gradient data obtained from all users and sends the newly
calculated gradient to each user for updating the model. Finally, the model converges when each
user's loss is less than ɛ, at which point all users hold the same federated model. The steps of the
FMLP algorithm are shown in Table 1.

Table 1: Algorithm for federated multi-layer perceptron

Input: Dataset x
Output: Model θfinal
1: Initialize model parameters θ
2: for i in iteration do
3: Forward propagation: outi = fp(xi, θi);
4: Compute loss: ci = loss(f*(xi),outi);
5: if ci < ɛ then
6: Break
7: else
8: Back propagation: gradi = bp(xi,θi,ci);
9: Send gradients to computing server and get new gradients;
10: Update: θi+1 = θi – lr * gradnew;
11: end if
12: end for
13: return Model with parameters θfinal
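
The loop in Table 1 can be simulated in a single process by reusing the fp, loss, and bp sketches above and standing in for the computing server with a local function; the element-wise mean as the fusion rule is our assumption, since the paper states only that the server integrates the gradients:

def server_aggregate(client_grads):
    # Computing-server step: fuse the per-client gradients parameter-wise.
    return [sum(per_param) / len(client_grads) for per_param in zip(*client_grads)]

def fmlp_train(clients, theta, lr=0.1, iterations=200, eps=1e-3):
    # `clients` is a list of (x, target) shards, one per learning client.
    for _ in range(iterations):
        losses, grads = [], []
        for x, target in clients:
            out = fp(x, theta)
            losses.append(loss(target, out))
            grads.append(bp(x, theta, target))
        if max(losses) < eps:                        # every client has converged
            break
        grad_new = server_aggregate(grads)           # send/receive with the server
        theta = [p - lr * g for p, g in zip(theta, grad_new)]   # synchronous update
    return theta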

Paillier federated network

Using the FMLP algorithm proposed in this paper, multiple users can perform cooperative
machine learning on their individual data. However, hackers target not only the data of each user
but also the final updated model.

Evidence from a member inference attack shows that hackers can gain access to the computing
server and derive shadow models from the data present in the server (6). Using ensemble
learning, hackers can then derive from these shadow models a predicted model similar to the
actual trained model. Therefore, the federated algorithm alone addresses only data security, not
model security.

To address model security, federated learning can be integrated with homomorphic encryption.
In homomorphic encryption, a plaintext a is encrypted to a ciphertext c, and certain operations
performed on ciphertexts in the ciphertext space correspond, after decryption, to operations on
the plaintexts in the plaintext space. This property is expressed using the following formula:

E(a) ⊕ E(b) = E(a ⊗ b)

where E stands for the encryption algorithm, a and b stand for different plaintexts, and ⊕ and ⊗
are the operators in the ciphertext and plaintext spaces, respectively.

The type of homomorphism depends on the operator. For instance, when the multiplication
operator is supported, multiplicative homomorphism is satisfied, an example of which is the RSA
algorithm (20). On the other hand, if the addition operator is supported, additive homomorphism
is satisfied, an example of which is the Paillier algorithm (22). Furthermore, if both the
multiplication and addition operators are supported, both homomorphisms are satisfied (23). In
our FMLP algorithm, the gradient data from all users are summed together, so the Paillier
algorithm can perform the required additive homomorphic encryption.
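
Because the experiments in this paper use the phe library (python-paillier 1.4.0), the additive property can be demonstrated directly; the 128-bit key below is a toy choice for speed, not a security recommendation:

from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=128)

# Two clients' gradient components (toy values), encrypted separately.
enc_a = public_key.encrypt(0.25)
enc_b = public_key.encrypt(-0.75)

# The server adds ciphertexts without ever seeing the plaintexts.
enc_sum = enc_a + enc_b                  # the ciphertext-space operation ⊕

# Decryption recovers the plaintext-space sum: E(a) ⊕ E(b) = E(a + b)
assert abs(private_key.decrypt(enc_sum) - (0.25 + -0.75)) < 1e-9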

Paillier algorithm

The additive homomorphic function of the Paillier algorithm can be divided into three parts: key
generation, encryption, and decryption.

1. Key generation: Two large primes p and q of equal length are selected such that
gcd(p ∙ q, (p − 1) ∙ (q − 1)) = 1. Then, n and λ are calculated using the following formulas:

n = p ∙ q

λ = lcm(p − 1, q − 1)

Next, a random integer g ∈ Z*_{n²} is selected such that the order of g is divisible by n.
Defining

L(x) = (x − 1) / n

µ can then be calculated as:

µ = (L(g^λ mod n²))⁻¹ mod n

At the end of this process, we obtain the public key (n, g) and the private key (λ, µ).

2. Encryption: Considering m to be the plaintext, c to be the ciphertext, and r to be a random
integer, the encryption process using the public key can be denoted by the following formula:

c = g^m ∙ r^n mod n²

3. Decryption: Similarly, the decryption process using the private key can be denoted by the
following formula:

m = L(c^λ mod n²) ∙ µ mod n
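
A from-scratch sketch of these three steps follows, using toy primes so the arithmetic can be traced by hand. The choice g = n + 1 is one standard valid generator (an assumption for simplicity; the scheme permits others), and Python 3.9+ is assumed for math.lcm and pow(x, -1, n):

import math, random

def L(x, n):
    return (x - 1) // n                      # L(x) = (x − 1) / n

def keygen(p, q):
    # Key generation with g = n + 1, a standard valid choice of g.
    assert math.gcd(p * q, (p - 1) * (q - 1)) == 1
    n = p * q
    lam = math.lcm(p - 1, q - 1)             # λ = lcm(p − 1, q − 1)
    g = n + 1
    mu = pow(L(pow(g, lam, n * n), n), -1, n)   # µ = (L(g^λ mod n²))⁻¹ mod n
    return (n, g), (lam, mu)

def encrypt(pub, m):
    n, g = pub
    r = random.randrange(1, n)               # random r coprime to n
    while math.gcd(r, n) != 1:
        r = random.randrange(1, n)
    return pow(g, m, n * n) * pow(r, n, n * n) % (n * n)   # c = g^m ∙ r^n mod n²

def decrypt(pub, priv, c):
    n, _ = pub
    lam, mu = priv
    return L(pow(c, lam, n * n), n) * mu % n    # m = L(c^λ mod n²) ∙ µ mod n

# Toy primes, far too small for real security, used only to trace the math.
pub, priv = keygen(17, 19)
c1, c2 = encrypt(pub, 12), encrypt(pub, 30)
assert decrypt(pub, priv, c1 * c2 % (pub[0] ** 2)) == 42   # additive homomorphism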

Improved Paillier algorithm


Owing to the complexity of the Paillier algorithm's encryption and decryption processes, the
efficiency of network training is greatly affected. Hence, we have used an improved version of
the Paillier algorithm whose accuracy and efficiency have been previously reported (48). The
three steps of this improved algorithm are given below:

1. Key generation: Considering α to be a divisor of λ, g is chosen so that its order in the
public key is αn.
2. Encryption: Considering r to be a random positive integer less than α, m to be the
plaintext, and c to be the ciphertext, the encryption process can be represented as follows:

c = g^m ∙ (g^n)^r mod n²

3. Decryption: The decryption process can be represented as follows:

m = L(c^α mod n²) / L(g^α mod n²) mod n

The strength of the improved Paillier algorithm lies in the decryption equation, where α is used
in place of λ. The number of power operations changes from 2∙λ to 2∙α, significantly reducing
the overhead time, since α is a divisor of λ. The computational complexity of the improved
algorithm is O(|n|²|α|), while that of the traditional algorithm is O(|n|³) (49).
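
The generalized decryption can be sketched by reusing the helpers above. Setting α = λ (a trivial divisor of itself) merely verifies that the formula reduces to standard Paillier decryption; the speedup reported in (48) requires a small divisor α of λ together with a g of order α∙n, a construction not reproduced in this toy check:

def decrypt_improved(pub, g, alpha, c):
    # m = L(c^α mod n²) / L(g^α mod n²) mod n, with division as a modular inverse
    n, _ = pub
    num = L(pow(c, alpha, n * n), n)
    den = L(pow(g, alpha, n * n), n)
    return num * pow(den, -1, n) % n

lam, _ = priv
assert decrypt_improved(pub, pub[1], lam, c1) == 12   # agrees with standard Paillier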

Paillier federated network architecture

Paillier encryption is used for protection of the gradients thereby ensuring that even if the
computing server is attacked, the specific data from each gradient are not accessible by the
hacker. Furthermore, hackers cannot make use of the encrypted gradient data for training of the
shadow models.

In general, Paillier encryption needs key pairs to perform its function; therefore, a key
management centre (KMC) is added to the algorithm for the generation and management of key
pairs. Thus, the architecture of the Paillier federated network comprises the computing server,
the learning clients, and the KMC (Figure 4).
Figure 4: Paillier federated network architecture

Paillier Federated Multi-Layer Perceptron (PFMLP)

PFMLP follows the architecture of FMLP, the only addition being the KMC: each learning
client must send a request to the KMC prior to training. Once the KMC confirms that every user
is online, it generates key pairs and sends them back to the users. Each user then carries out
multi-party machine learning on the encrypted data. The three additional steps performed by
PFMLP are encryption and decryption, homomorphic operations, and the generation and
distribution of key pairs. The algorithm for the learning client in PFMLP is given in Table 2.

Table 2: Algorithm for PFMLP in the learning client

Input: Dataset x
Output: Model θfinal
1: Request key pairs from KMC
2: Initialize model parameters θ
3: for i in iteration do
4: Forward propagation: outi = fp(xi, θi);
5: Compute loss: ci = loss(f*(xi),outi);
6: if ci < ɛ then
7: Break
8: else
9: Back propagation: gradi = bp(xi,θi,ci);
10: Use the public key of client i to encrypt the gradient: Enc(gradi) =
EncPaillier(Publickey, gradi);
11: Send Enc(gradi) to the computing server and receive Enc(gradnew);
12: Use the private key of client i to decrypt the gradient: gradnew =
DecPaillier(Privatekey, Enc(gradnew));
13: Update: θi+1 = θi – lr * gradnew;
14: end if
15: end for
16: return Model with parameters θfinal
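
Steps 10-13 of Table 2 can be sketched with phe as follows. Flattening the gradient and encrypting it element-wise is our assumption, since phe encrypts individual numbers rather than arrays; the server round trip is stubbed out with the identity purely to show the data flow:

import numpy as np
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=128)

def encrypt_gradient(grad):
    # Step 10: element-wise Paillier encryption of the flattened gradient.
    return [public_key.encrypt(float(v)) for v in np.ravel(grad)]

def decrypt_gradient(enc_grad, shape):
    # Step 12: decrypt the fused gradient returned by the computing server.
    return np.array([private_key.decrypt(c) for c in enc_grad]).reshape(shape)

grad = np.array([[0.10, -0.20], [0.30, 0.05]])    # toy local gradient
enc = encrypt_gradient(grad)                      # step 10
enc_new = enc           # step 11: send to the server; identity stands in here
grad_new = decrypt_gradient(enc_new, grad.shape)  # step 12
update = -0.1 * grad_new                          # step 13: θ ← θ − lr ∙ grad_new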

The local model is not updated immediately after the calculation of the gradients by the learning
client. The gradient data is homomorphically encrypted and transmitted to the computing server,
following which the learning client waits for the server to send back the encrypted data after
performing homomorphic operations. Once the learning client receives and decrypts the gradient
data, the local model is updated with the new data. This ensures privacy of other users’ data. The
algorithm for the KMC in PFMLP is given in Table 3.

Table 3: Algorithm for PFMLP in KMC

Input: requests
Output: KeyPair
1: while listening request from Clients do
2: if receive a request from a client then
3: Generate a KeyPair;
4: return a KeyPair to the learning client;
5: endif
6: endwhile
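
A minimal stand-in for this key service, using phe for key generation and leaving out the Socket transport that the paper deploys, might look like this; note that all clients in a session must receive the same pair so that their ciphertexts can be fused on the server:

from phe import paillier

_session_keys = {}

def kmc_handle_request(session_id, n_length=128):
    # Generate one key pair per training session and return the same pair
    # to every client that requests it (a sketch; real distribution would
    # run over the Socket channel described in the paper).
    if session_id not in _session_keys:
        _session_keys[session_id] = paillier.generate_paillier_keypair(n_length=n_length)
    return _session_keys[session_id]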

The computing server performs homomorphic operations on the encrypted data obtained from
the learning client upon request and sends the gradient data back to the client. The algorithm for
the computing server in PFMLP is given in Table 4. In this case too, data privacy is ensured
during model training as the generated key pairs are not transmitted to the computing server.

Table 4: Algorithm for PFMLP in the computing server

Input: requests
Output: GradientData
1: while listening requests from Clients do
2: Initialize GradientData;
3: if receive a request then
4: Push Encrypted Data Enc(data) to the Queue;
5: if the number of requests == the number of learning clients then
6: for i in the number of learning clients do
7: GradientData = GradientData ⊕ Enc(datai)
8: endfor
9: return GradientData to each client;
10: Break;
11: endif
12: endif
13: endwhile
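
The fusion in steps 6-8 of Table 4 reduces to ciphertext additions, which phe supports directly on its EncryptedNumber objects; a sketch, treating the queue as a list of per-client encrypted gradient vectors, is:

def server_fuse(queue):
    # Fold ⊕ over the encrypted gradients of all clients. The server holds
    # no decryption key, so ciphertext addition is the only operation it
    # can meaningfully perform on the data.
    fused = queue[0]
    for enc_grad in queue[1:]:
        fused = [a + b for a, b in zip(fused, enc_grad)]   # ciphertext addition
    return fused    # returned to every client, still encrypted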

Security analysis of the algorithm

The KMC never receives the gradient data, nor the data encrypted by the learning clients using
the keys, and therefore it cannot be the source of a security breach. The computing server, in
turn, receives only ciphertext encrypted by the learning clients and does not need to decrypt the
data to perform homomorphic operations. Therefore, all data present on the server is always in
encrypted form, so even if the server is attacked by hackers, they will not be able to access the
data.

Once the learning client receives the key pair from the KMC, it sends the encrypted data to the
server after which the server performs its operations and sends the result back to the learning
client in an encrypted form. This also ensures that a user cannot access the data of other users. If
hackers gain access to the computing server, they can only access the ciphertext form of the data.
Even if hackers gain access to some of the intermediate results sent by the server, the key pairs
can be changed in each iteration, so the hackers cannot recover the final result sent by the
server.

Results

Datasets and environment

The algorithm was verified using two datasets: the MNIST and metal fatigue strength datasets.
The MNIST handwritten digit dataset comprises 60,000 training samples and 10,000 testing
samples (50). The corresponding neural network model has an input layer of 784 units, two
hidden layers of 64 units each, and an output layer of 10 units.

On the other hand, the metal fatigue strength dataset comprises 437 records derived from the
NIMS MatNavi open dataset (51). MatNavi is a materials database that covers alloys, ceramics,
polymers, composites, superconducting materials, and diffusion samples. In this study, 437
samples covering different metals, such as carburizing steel, low-alloy steel, carbon steel, and
spring steel, under different conditions, characterized by rolled-product properties and heat-
treatment conditions, were selected from MatNavi, and a classification model was built over
them. Each sample has 15 dimensional features and a single-dimensional label. The entire
dataset is divided into four categories (Table 5). Overall, the model used in our study comprised
an input layer of 15 units, three hidden layers of 64 units each, and an output layer of 4 units.
The network structure of PFMLP is given in Table 6.
Table 5: Metal fatigue strength dataset for multi-classification tasks

Dataset   Data range   Data amount
Fatigue   [200,400)    56
          [400,500)    147
          [500,600)    148
          [600,∞)      86

Table 6: PFMLP network structure

Dataset   Input layer   Hidden layers             Output layer
MNIST     784 (units)   2 (layers) × 64 (units)   10 (units)
Fatigue   15 (units)    3 (layers) × 64 (units)   4 (units)

The value D_n^Dataset denotes the nth sample of the given dataset. The experiments performed
to test the PFMLP algorithm and its optimization assessed the prediction accuracy of FMLP
versus single-node MLP, the time used to train the model with varying key lengths, the time
used to train the model with varying hidden-layer sizes, and the impact of the number of
learning clients on model performance.

The experiments were run on Windows 10 with Python 3.6, scikit-learn 0.21.3, and phe 1.4.0. A
computing server, several learning clients, and a KMC were deployed on a local area network,
and sockets were used to establish communication between the machines. The layout of the
network used in the study is given in Figure 5.
Figure 5: Network deployment in the experimental environment

Comparison of prediction accuracy

The PFMLP and MLP algorithms were compared using the same dataset and network structure
during model training. With two learning clients, each dataset was split into two subsets, one for
each client.

For example, for the MNIST dataset, the first 4000 samples were selected as the training data
(D^mnist), which was divided into two parts: D_I^mnist = {D_1^mnist, …, D_2000^mnist} and
D_II^mnist = {D_2001^mnist, …, D_4000^mnist}. For the test data, 10,000 samples from the
MNIST dataset were used.

Similarly, for the metal fatigue strength dataset, 400 samples were selected (D^Fatigue) and
randomized [D^Fatigue′ = random(D^Fatigue)]. This was divided into two subsets:
D_I^Fatigue′ = {D_1^Fatigue′, …, D_200^Fatigue′} and D_II^Fatigue′ = {D_201^Fatigue′, …,
D_400^Fatigue′}. Overall, 70% of the data was used for training and 30% as the test set. The
results of the experiment are given in Table 7.

Table 7: Comparison of prediction accuracy of MLP and PFMLP on the MNIST and fatigue datasets

Dataset   Data subset    Algorithm   Accuracy
MNIST     D_1^mnist      MLP         0.8333
MNIST     D_2^mnist      MLP         0.9033
MNIST     D^mnist        MLP         0.9245
MNIST     D_1^mnist      PFMLP       0.9252
MNIST     D_2^mnist      PFMLP       0.9252
Fatigue   D_1^Fatigue′   MLP         0.9013
Fatigue   D_2^Fatigue′   MLP         0.7833
Fatigue   D^Fatigue′     MLP         0.8583
Fatigue   D_1^Fatigue′   PFMLP       0.8833
Fatigue   D_2^Fatigue′   PFMLP       0.8167
Fatigue   D^Fatigue′     PFMLP       0.8500

Based on the results, the model trained by PFMLP is similar to or better than the model trained
by MLP. In the experiment on the MNIST dataset, the PFMLP model reached an accuracy of
0.9252, while the MLP model reached 0.9245, a difference of just 0.0007 between the two
models. For the metal fatigue strength dataset, since the same PFMLP trained each client's
model, a weighted average of the two experimental results was taken, giving a final prediction
accuracy of 0.850. The prediction accuracy of the MLP model was 0.858, a difference of 0.008
between the two models. Therefore, our findings indicate that the PFMLP and MLP algorithms
can train models with similar prediction accuracies.

Comparison of time for model training

The transmission of plaintext data over a network is vulnerable to attacks by hackers, who may
use the data to train their own shadow models. This not only breaches the model security of the
client to whom the data belongs but also places other clients' data at risk. Therefore, to encrypt
the data, we used Paillier homomorphic encryption in PFMLP. The encryption was applied to
the data before transmission, and all homomorphic operations were carried out on the server to
tighten data security.

An important factor for the optimal functioning of the Paillier encryption method is the key
length. Higher security is provided by longer keys in general. However, the problem with using
longer keys is that it increases the time taken for generating the ciphertext. For both the datasets,
three comparison runs were carried out. Additionally, the structure of the model was fixed, and
the effect of various key lengths on the time and cost of training the model was evaluated (Table
8 and Figure 6).

Table 8: Impact of differences in key lengths on model training time

Encryption algorithm   Dataset   Key length (bits)   Time (s)
None                   MNIST     -                   7.92
Paillier               MNIST     128                 12,033.25
Paillier               MNIST     256                 45,467.44
Paillier               MNIST     512                 199,915.70
None                   Fatigue   -                   9.84
Paillier               Fatigue   64                  2567.44
Paillier               Fatigue   128                 4480.02
Paillier               Fatigue   256                 13,428.92
Figure 6: Impact of differences in key lengths on the time taken for training the model using the
MNIST and fatigue datasets

Our findings indicate that in Paillier encryption, the key length directly impacts the time used
for training the model. Each round includes the encryption and decryption of the gradient data,
and the encrypted data is transmitted to the server, so as the key length increases, the training
time also increases. An appropriately chosen key length can therefore balance data security
against time consumption. Furthermore, to enhance security, the key can be changed in each
round during model training: even if the key of a particular round is exposed, the security of the
overall training process is not affected, and a higher level of security can be achieved.
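
The trend in Table 8 can be reproduced with a small timing harness. The batch size below is an illustrative assumption and the absolute numbers depend on hardware, but the growth in time with key length should be clearly visible:

import time
from phe import paillier

values = [0.001 * i for i in range(200)]     # a stand-in batch of gradient values

for n_length in (128, 256, 512):
    public_key, private_key = paillier.generate_paillier_keypair(n_length=n_length)
    start = time.perf_counter()
    ciphertexts = [public_key.encrypt(v) for v in values]
    plaintexts = [private_key.decrypt(c) for c in ciphertexts]
    elapsed = time.perf_counter() - start
    print(f"{n_length}-bit key: {elapsed:.2f} s for {len(values)} encrypt/decrypt pairs")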

In our experimental runs with the same key length and the same model, each iteration took
358.14 s for 4000 samples, 733.69 s for 8000 samples, and 1284.06 s for 12,000 samples. The
amount of data therefore directly impacts the training time. Furthermore, the improved Paillier
algorithm was tested on the MNIST dataset for the same number of iterations to compare the
time taken for encryption and decryption (Table 9 and Figure 7). Our results show that the
improved Paillier algorithm performed 25-28% better than the native Paillier algorithm.

Table 9: Impact of differences in key lengths on iteration time in each round

Algorithm           Key length (bits)   Time (s)
Paillier            128                 1068.38
Paillier            256                 6411.23
Paillier            512                 35,930.83
Improved Paillier   128                 779.37
Improved Paillier   256                 4716.37
Improved Paillier   512                 26,148.56

Figure 7: Impact of differences in key lengths on iteration time in each round of model training
for the Paillier and improved Paillier algorithms

Comparison of training performance

In neural networks, the time and performance of both forward and backward propagation are
affected by the size of each layer. Usually, the size of the network and the time taken to train the
model are positively correlated. We tested this relationship through experimental runs on the
two datasets (Table 10 and Figure 8).

Table 10: Impact of hidden-layer size on model training time

Dataset   Size of hidden layers (units)   Time (s)
MNIST     2 × 64                          12,033.25
MNIST     2 × 128                         23,981.02
MNIST     2 × 256                         47,702.87
Fatigue   3 × 64                          2615.42
Fatigue   3 × 128                         6941.04
Fatigue   3 × 256                         21,782.07
Figure 8: Impact of hidden-layer size on iteration time in each round of model training for the
MNIST and fatigue datasets

Because the data is encrypted, increasing the number of hidden-layer units increases the time
consumed. In addition, as the number of hidden-layer units grows, more data must be
transmitted over the network. Therefore, to reduce the training time, the number of hidden
layers and their units should be kept small.

Comparison of number of learning clients

Multi-party machine learning is possible through the PFMLP algorithm. Theoretically,
increasing the number of learning clients should increase accuracy and decrease the time taken
for model training. We performed an experiment on the metal fatigue strength dataset using a
single node (MLP) alongside either two clients (2-Client-PFMLP) or four clients (4-Client-
PFMLP). The experimental results are given in Table 11. The local accuracy refers to the
prediction accuracy of the model on the local dataset, and the logical accuracy refers to the
average prediction accuracy over all learning clients.

Table 11: Accuracy of PFMLP on the fatigue dataset with various numbers of learning clients

Algorithm        Dataset                Local accuracy   Logical accuracy
MLP              D_{0-400}^Fatigue′     0.858            -
MLP              D_{0-200}^Fatigue′     0.833            -
MLP              D_{200-400}^Fatigue′   0.783            -
2-Client-PFMLP   D_{0-200}^Fatigue′     0.867            0.850
2-Client-PFMLP   D_{200-400}^Fatigue′   0.833            0.850
MLP              D_{0-100}^Fatigue′     0.767            -
MLP              D_{100-200}^Fatigue′   0.933            -
MLP              D_{200-300}^Fatigue′   0.800            -
MLP              D_{300-400}^Fatigue′   0.600            -
4-Client-PFMLP   D_{0-100}^Fatigue′     0.833            0.850
4-Client-PFMLP   D_{100-200}^Fatigue′   0.967            0.850
4-Client-PFMLP   D_{200-300}^Fatigue′   0.867            0.850
4-Client-PFMLP   D_{300-400}^Fatigue′   0.733            0.850

The prediction accuracy improved significantly with the multi-client PFMLP algorithm,
although the logical accuracy was similar with two or four clients. The improvement in local
accuracy under PFMLP was amplified in extreme cases: in the 4-client run, the last client held
outlier data, yet the accuracy of PFMLP was 13.3% higher than that of MLP. The local accuracy
of the second client, 93.3% under MLP, improved by 3.4% under PFMLP.

Furthermore, Table 11 shows that the model performance of PFMLP is not significantly
affected by the number of clients; rather, it is similar to the performance of MLP. Since model
performance depends on batch expansion, if each client transmits less data, the learning per
round is reduced, which in turn reduces the time taken to train the model.

Conclusions and future work

In this paper, we have proposed a multi-party privacy-protected machine learning algorithm that
different users can access and use, through federated learning and homomorphic encryption,
without compromising the privacy and security of their data. With respect to data security, our
proposed algorithm can train models over multiple rounds in a way that is comparable to
training a single model on all the data together. All users transmit their respective data over the
network, and homomorphic operations are performed on the data at the server; the resulting new
data forms the basis for updating the model. However, homomorphic encryption affects
performance by increasing the encryption and decryption workload, which in turn affects the
efficiency of training the model. The final model performance is also affected by the network
structure, the key length, and the key replacement frequency.

In future work, federated learning can be made more efficient and scalable by using vertical
federated learning algorithms, which divide the features among the users. Secondly, model
performance should be improved by increasing the efficiency of the encryption process. Finally,
more robust privacy-protected learning algorithms can be considered, such as anti-malicious-
client algorithms and hybrid algorithms.
References

1. Bonawitz, K.; Ivanov, V.; Kreuter, B.; Marcedone, A.; McMahan, H.B.; Patel, S.; Ramage, D.;
Segal, A.; Seth, K. Practical secure aggregation for privacy-preserving machine learning. In
Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security,
Dallas, TX, USA, 30 October–3 November 2017; pp. 1175–1191.

2. Giacomelli, I.; Jha, S.; Joye, M.; Page, C.D.; Yoon, K. Privacy-preserving ridge regression
with only linearly-homomorphic encryption. In International Conference on Applied
Cryptography and Network Security; Springer: Berlin/Heidelberg, Germany, 2018; pp. 243–261.

3. Hall, R.; Fienberg, S.E.; Nardi, Y. Secure multiple linear regression based on homomorphic
encryption. J. Off. Stat. 2011, 27, 669.

4. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated
learning: Strategies for improving communication efficiency. arXiv 2016, arXiv:1610.05492.

5. Yang, Q.; Liu, Y.; Chen, T.; Tong, Y. Federated machine learning: Concept and applications.
ACM Trans. Intell. Syst. Technol. (TIST) 2019, 10, 1–19.

6. Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against
machine learning models. In Proceedings of the 2017 IEEE Symposium on Security and Privacy
(SP), San Jose, CA, USA, 22–26 May 2017; pp. 3–18.

7. Yi, X.; Paulet, R.; Bertino, E. Homomorphic encryption. In Homomorphic Encryption and
Applications; Springer: Berlin/Heidelberg, Germany, 2014; pp. 27–46.

8. Ho, Q.; Cipar, J.; Cui, H.; Lee, S.; Kim, J.K.; Gibbons, P.B.; Gibson, G.A.; Ganger, G.; Xing,
E.P. More effective distributed ml via a stale synchronous parallel parameter server. Adv. Neural
Inf. Process. Syst. 2013, 2013, 1223–1231.

9. Xing, E.P.; Ho, Q.; Dai, W.; Kim, J.K.; Wei, J.; Lee, S.; Zheng, X.; Xie, P.; Kumar, A.; Yu, Y.
Petuum: A new platform for distributed machine learning on big data. IEEE Trans. Big Data
2015, 1, 49–67.

10. Xie, P.; Kim, J.K.; Zhou, Y.; Ho, Q.; Kumar, A.; Yu, Y.; Xing, E. Distributed machine
learning via sufficient factor broadcasting. arXiv 2015, arXiv:1511.08486.

11. Wei, J.; Dai, W.; Qiao, A.; Ho, Q.; Cui, H.; Ganger, G.R.; Gibbons, P.B.; Gibson, G.A.;
Xing, E.P. Managed communication and consistency for fast data-parallel iterative analytics. In
Proceedings of the Sixth ACM Symposium on Cloud Computing, Kohala Coast, HI, USA, 27–
29 August 2015; pp. 381–394.

12. Kim, J.K.; Ho, Q.; Lee, S.; Zheng, X.; Dai, W.; Gibson, G.A.; Xing, E.P. Strads: A
distributed framework for scheduled model parallel machine learning. In Proceedings of the
Eleventh European Conference on Computer Systems, London, UK, 18–21 April 2016; pp. 1–
16.

13. Dean, J.; Corrado, G.; Monga, R.; Chen, K.; Devin, M.; Mao, M.; Ranzato, M.; Senior, A.;
Tucker, P.; Yang, K.; et al. Large scale distributed deep networks. In Proceedings of the 25th
International Conference on Neural Information Processing Systems, Siem Reap, Cambodia, 13–
16 December 2012; pp. 1223–1231.

14. Coates, A.; Huval, B.; Wang, T.; Wu, D.; Catanzaro, B.; Andrew, N. Deep learning with cots
hpc systems. In Proceedings of the International Conference on Machine Learning, Atlanta, GA,
USA, 16–21 June 2013; pp. 1337–1345.

15. Seide, F.; Fu, H.; Droppo, J.; Li, G.; Yu, D. On parallelizability of stochastic gradient descent
for speech dnns. In Proceedings of the 2014 IEEE International Conference on Acoustics,
Speech and Signal Processing (ICASSP), Florence, Italy, 4–9 May 2014; pp. 235–239.

16. Watcharapichat, P.; Morales, V.L.; Fernandez, R.C.; Pietzuch, P. Ako: Decentralised deep
learning with partial gradient exchange. In Proceedings of the Seventh ACM Symposium on
Cloud Computing, Santa Clara, CA, USA, 5–7 October 2016; pp. 84–97.

17. Yao, A.C.-C. How to generate and exchange secrets. In Proceedings of the 27th Annual
Symposium on Foundations of Computer Science (sfcs 1986), Toronto, ON, Canada, 27–29
October 1986; pp. 162-167.

18. Goldreich, O. Secure Multi-Party Computation; Manuscript, Preliminary Version;
CiteSeerX: University Park, PA, USA, 1998; Volume 78.

19. Rivest, R.L.; Adleman, L.; Dertouzos, M.L. On data banks and privacy homomorphisms.
Found. Secur. Comput. 1978, 4, 169–180.

20. Calderbank, M. The Rsa Cryptosystem: History, Algorithm, Primes; Fundamental Concepts
of Encryption; University of Chicago: Chicago, IL, USA, 2007.

21. Somani, U.; Lakhani, K.; Mundra, M. Implementing digital signature with rsa encryption
algorithm to enhance the data security of cloud in cloud computing. In Proceedings of the 2010
First International Conference On Parallel, Distributed and Grid Computing (PDGC 2010),
Solan, India, 28–30 October 2010; pp. 211–216.

22. Paillier, P. Public-key cryptosystems based on composite degree residuosity classes. In
Proceedings of the International Conference on the Theory and Applications of Cryptographic
Techniques, Prague, Czech Republic, 14–18 May 1999; pp. 223–238.

23. Gentry, C. Fully homomorphic encryption using ideal lattices. In Proceedings of the Forty-
First Annual ACM Symposium on Theory of Computing, Washington, DC, USA, 31 May–2
June 2009; pp. 169–178.

24. Dijk, M.V.; Gentry, C.; Halevi, S.; Vaikuntanathan, V. Fully homomorphic encryption over
the integers. In Proceedings of the Annual International Conference on the Theory and
Applications of Cryptographic Techniques, Monaco and Nice, France, 30 May–3 June 2010; pp.
24–43.

25. Ibtihal, M.; Hassan, N. Homomorphic encryption as a service for outsourced images in
mobile cloud computing environment. In Cryptography: Breakthroughs in Research and
Practice; IGI Global: Hershey, PA, USA, 2020; pp. 316–330.

26. Makkaoui, K.E.; Ezzati, A.; Beni-Hssane, A.; Ouhmad, S. Fast cloud–paillier homomorphic
schemes for protecting confidentiality of sensitive data in cloud computing. J. Ambient. Intell.
Humaniz. Comput. 2020, 11, 2205–2214.

27. Çatak, F.; Mustacoglu, A.F. Cpp-elm: Cryptographically privacy-preserving extreme
learning machine for cloud systems. Int. J. Comput. Intell. Syst. 2018, 11, 33–44.

28. Abadi, M.; Chu, A.; Goodfellow, I.; McMahan, H.; Mironov, I.; Talwar, K.; Zhang, L. Deep
learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on
Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 308–318.

29. Dwork, C.; Roth, A. The algorithmic foundations of differential privacy. Found. Trends
Theor. Comput. Sci. 2014, 9, 211–407.

30. Zhu, T.; Ye, D.; Wang, W.; Zhou, W.; Yu, P. More than privacy: Applying differential
privacy in key areas of artificial intelligence. arXiv 2020, arXiv:2008.01916.

31. Gilad-Bachrach, R.; Dowlin, N.; Laine, K.; Lauter, K.; Naehrig, M.; Wernsing, J. Cryptonets:
Applying neural networks to encrypted data with high throughput and accuracy. In Proceedings
of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016;
pp. 201–210.

32. Yuan, J.; Yu, S. Privacy preserving back-propagation neural network learning made practical
with cloud computing. IEEE Trans. Parallel Distrib. Syst. 2013, 25, 212–221.

33. Shokri, R.; Shmatikov, V. Privacy-preserving deep learning. In Proceedings of the 22nd
ACM SIGSAC Conference on Computer and Communications Security, Denver, Colorado, 12–
16 October 2015; pp. 1310–1321.

34. Bonawitz, K.; Eichner, H.; Grieskamp, W.; Huba, D.; Ingerman, A.; Ivanov, V.; Kiddon, C.;
Konecny, J.; Mazzocchi, S.; McMahan, H.B.; et al. Towards federated learning at scale: System
design. arXiv 2019, arXiv:1902.01046.

35. Wang, S.; Tuor, T.; Salonidis, T.; Leung, K.K.; Makaya, C.; He, T.; Chan, K. Adaptive
federated learning in resource constrained edge computing systems. IEEE J. Sel. Areas
Commun. 2019, 37, 1205–1221.

36. Liu, Y.; Chen, T.; Yang, Q. Secure federated transfer learning. arXiv 2018,
arXiv:1812.03337.

37. Cheng, K.; Fan, T.; Jin, Y.; Liu, Y.; Chen, T.; Yang, Q. Secureboost: A lossless federated
learning framework. arXiv 2019, arXiv:1901.08755.

38. Yang, T.; Andrew, G.; Eichner, H.; Sun, H.; Li, W.; Kong, N.; Ramage, D.; Beaufays, F.
Applied federated learning: Improving google keyboard query suggestions. arXiv 2018,
arXiv:1812.02903.

39. Hard, A.; Rao, K.; Mathews, R.; Ramaswamy, S.; Beaufays, F.; Augenstein, S.; Eichner, H.;
Kiddon, C.; Ramage, D. Federated learning for mobile keyboard prediction. arXiv 2018,
arXiv:1811.03604.

40. Sheller, M.J.; Reina, G.A.; Edwards, B.; Martin, J.; Bakas, S. Multi-institutional deep
learning modeling without sharing patient data: A feasibility study on brain tumor segmentation.
In Proceedings of the International MICCAI Brainlesion Workshop, Granada, Spain, 16
September 2018; pp. 92–104.

41. Huang, L.; Shea, A.L.; Qian, H.; Masurkar, A.; Deng, H.; Liu, D. Patient clustering improves
efficiency of federated machine learning to predict mortality and hospital stay time using
distributed electronic medical records. J. Biomed. Inform. 2019, 99, 103291.

42. Chen, M.; Mathews, R.; Ouyang, T.; Beaufays, F. Federated learning of out-of-vocabulary
words. arXiv 2019, arXiv:1903.10635.

43. Ammad-Ud-Din, M.; Ivannikova, E.; Khan, S.A.; Oyomno, W.; Fu, Q.; Tan, K.E.; Flanagan,
A. Federated collaborative filtering for privacy-preserving personalized recommendation system.
arXiv 2019, arXiv:1901.09888.

44. Truex, S.; Baracaldo, N.; Anwar, A.; Steinke, T.; Ludwig, H.; Zhang, R.; Zhou, Y. A hybrid
approach to privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop
on Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 1–11.

45. Xu, R.; Baracaldo, N.; Zhou, Y.; Anwar, A.; Ludwig, H. Hybridalpha: An efficient approach
for privacy-preserving federated learning. In Proceedings of the 12th ACM Workshop on
Artificial Intelligence and Security, London, UK, 15 November 2019; pp. 13–23.

46. Zhang, C.; Li, S.; Xia, J.; Wang, W.; Yan, F.; Liu, Y. Batchcrypt: Efficient homomorphic
encryption for cross-silo federated learning. In Proceedings of the 2020 USENIX Annual
Technical Conference (USENIX ATC 20), online, 15–17 July 2020; pp. 493–506.

47. Ou, W.; Zeng, J.; Guo, Z.; Yan, W.; Liu, D.; Fuentes, S. A homomorphic-encryption-based
vertical federated learning scheme for risk management. Comput. Sci. Inf. Syst. 2020, 17, 819–
834.

48. Jost, C.; Lam, H.; Maximov, A.; Smeets, B.J. Encryption performance improvements of the
paillier cryptosystem. IACR Cryptol. ePrint Arch. 2015, 864, 2015.

49. Ogunseyi, T.B.; Bo, T. Fast decryption algorithm for paillier homomorphic cryptosystem. In
Proceedings of the 2020 IEEE International Conference on Power, Intelligent Computing and
Systems (ICPICS), Shenyang, China, 28–30 July 2020; pp. 803–806.

50. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document
recognition. Proc. IEEE 1998, 86, 2278–2324.

51. MatNavi, N. Materials Database. [DB/OL] 15 June 2013. Available online:
http://mits.nims.go.jp/index_en.html (accessed on 20 October 2020).
