IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020

Privacy-Preserving Distributed Optimization via Subspace Perturbation: A General Framework

Qiongxiu Li, Student Member, IEEE, Richard Heusdens, and Mads Græsbøll Christensen, Senior Member, IEEE

Abstract—As the modern world becomes increasingly digitized and interconnected, distributed signal processing has proven to be effective in processing its large volume of data. However, a main challenge limiting the broad use of distributed signal processing techniques is the issue of privacy in handling sensitive data. To address this privacy issue, we propose a novel yet general subspace perturbation method for privacy-preserving distributed optimization, which allows each node to obtain the desired solution while protecting its private data. In particular, we show that the dual variable introduced in each distributed optimizer will not converge in a certain subspace determined by the graph topology. Additionally, the optimization variable is ensured to converge to the desired solution, because it is orthogonal to this non-convergent subspace. We therefore propose to insert noise in the non-convergent subspace through the dual variable such that the private data are protected, and the accuracy of the desired solution is completely unaffected. Moreover, the proposed method is shown to be secure under two widely used adversary models: passive and eavesdropping. Furthermore, we consider several distributed optimizers, such as ADMM and PDMM, to demonstrate the general applicability of the proposed method. Finally, we test the performance through a set of applications. Numerical tests indicate that the proposed method is superior to existing methods in terms of several parameters, such as estimation accuracy, privacy level, communication cost and convergence rate.

Index Terms—Distributed optimization, LASSO, consensus, least squares, noise insertion, privacy, subspace.

Manuscript received April 29, 2020; revised August 24, 2020 and October 6, 2020; accepted October 6, 2020. Date of publication October 9, 2020; date of current version October 23, 2020. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Ketan Rajawat. (Corresponding author: Qiongxiu Li.)

Qiongxiu Li and Mads Græsbøll Christensen are with the Audio Analysis Lab, CREATE, Aalborg University, Aalborg 9000, Denmark (e-mail: qili@create.aau.dk; mgc@create.aau.dk).

Richard Heusdens is with the Netherlands Defence Academy (NLDA), Het Nieuwe Diep 8, Den Helder, AC 1781, The Netherlands, and also with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology, Mekelweg, Delft, CD 2628, The Netherlands (e-mail: r.heusdens@tudelft.nl).

Digital Object Identifier 10.1109/TSP.2020.3029887

1053-587X © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:39:09 UTC from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

IN a world of interconnected and digitized systems, new and innovative signal processing tools are needed to take advantage of the sheer scale of information/data. Such systems are often characterized by "big data". Another central aspect of such systems is their distributed nature, in which the data are usually located in different computational units that form a network. In contrast to traditional centralized systems, in which all the data must first be collected from different units and then processed at a central server, distributed signal processing circumvents this limitation by exploiting the structure of the network. That is, instead of relying on a single centralized coordinator, each node/unit is able to collect information from its neighbors and also to conduct computation on a subset of the overall networked data. This distributed processing has many advantages, such as allowing for flexible scalability in the number of nodes and robustness to dynamic changes in the graph topology. Currently, the computational units/nodes in distributed systems are usually limited in resources, as tablets and phones have become the primary computing devices used by many people [1], [2]. These devices often contain sensors and can use wireless communication to form so-called ad-hoc networks. Therefore, these devices can collaborate in solving problems by sharing computational resources and sensor data. However, the information collected from sensors such as GPS, cameras and microphones often includes personal data, thus posing a major concern, because such data are private in nature.

There has been considerable growth of optimization techniques in the field of distributed signal processing, as many traditional signal processing problems in distributed systems can be equivalently formulated as convex optimization problems. Owing to its general applicability and flexibility, distributed optimization has emerged in a wide range of applications such as acoustic signal processing [3], [4], control theory [5] and image processing [6]. Typically, the paradigm of distributed optimization is to separate the global objective function over the network into several local objective functions, each of which can be solved at a node by exchanging data only with its neighbors. This data exchange is a major privacy concern, because the exchanged data usually contain sensitive information, and traditional distributed optimization schemes do not address this privacy issue. Therefore, how to design a distributed optimizer for processing sensitive data is a challenge to be overcome in the field.

A. Related Works

To address the privacy issues in distributed optimization, the literature has mainly used techniques from differential privacy [7], [8] and secure multiparty computation (SMPC) [9]. Differential privacy is one of the most commonly used non-cryptographic techniques for privacy preservation, because it is computationally lightweight, and it also uses a strict privacy
metric to quantify that the posterior guess of the private data is only slightly better than the prior (quantified by a small positive number ε). This method of protecting private data has been applied in [10]–[16] through carefully adding noise to the exchanged states or objective functions. However, this noise insertion mechanism involves an inherent trade-off between privacy and the accuracy of the optimization outputs. Additionally, some approaches [17]–[19] have applied differential privacy with the help of a trusted third party (TTP) like a server/cloud. However, requiring a TTP for coordination makes the protocol not completely decentralized (i.e., not peer-to-peer) and thus hinders its use in many applications, such as wireless sensor networks, in which centralized coordination is unavailable.

SMPC, in contrast, has been widely used in distributed processing, because it provides cryptographic techniques to ensure privacy in a distributed network. More specifically, it aims to compute the result of a function of a number of parties' private data while protecting each party's private data from being revealed to others. Examples of how to preserve privacy using SMPC have been given in [20]–[24], in which partially homomorphic encryption (PHE) [25] was used to conduct computations in the encrypted domain. However, PHE requires the assistance of a TTP and thus cannot be directly applied in a fully decentralized setting. Additionally, although PHE is more computationally lightweight than other encryption techniques, such as fully homomorphic encryption [26] and Yao's garbled circuits [27], [28], it is more computationally demanding than noise insertion techniques, such as differential privacy, because it relies on a computational hardness assumption. To alleviate this computational bottleneck, another SMPC technique, called secret sharing [29], has become a popular alternative for distributed processing, because its computational cost is comparable to that of differential privacy. It has been applied in [30] to preserve privacy by splitting sensitive data into pieces and then sending the pieces to so-called computing parties. However, secret sharing is generally expensive in terms of communication cost, because it requires multiple communication rounds for each splitting process.

B. Paper Contributions

The main contribution of this paper is a novel subspace perturbation method, which circumvents the limitations of both the differential privacy and SMPC approaches for distributed signal processing. We propose to insert noise in a particular subspace such that not only is the private data protected from being revealed to others, but the accuracy of the results is also unaffected. The proposed subspace perturbation method has several attractive properties:
• Compared to differential privacy based approaches, the proposed approach is ensured to converge to the optimum results without compromising privacy. Additionally, it is defined in a completely decentralized setting, because no central aggregator is required.
• In contrast to SMPC based approaches, the proposed approach is efficient in both computation and communication, because it does not require complex encryption functions (such as those involved in PHE), and it does not have high communication costs (such as those required in the secret sharing approaches).
• The proposed subspace perturbation method is generally applicable to many distributed optimization algorithms, such as ADMM, PDMM or the dual ascent method.
• The convergence rate of the proposed method is invariant with respect to the amount of inserted noise and thus to the privacy level.

We published preliminary results in [31], [32], where the main concept of subspace perturbation was introduced using PDMM for a specific application. Here we give a more complete analysis of the proposed subspace perturbation in a broader context, i.e., for all convex problems, and further generalize it to other optimizers such as ADMM and dual ascent.

C. Outline and Notation

The remainder of this paper is organized as follows: Section II reviews distributed convex optimization and some important concepts for privacy preservation. Section III defines the problem to be solved and provides qualitative metrics to evaluate the performance. Section IV introduces the primal-dual method of multipliers (PDMM), explaining its key properties used in the proposed approach. Section V introduces the proposed subspace perturbation method based on PDMM. Section VI shows the general applicability of the proposed method by considering other types of distributed optimizers, such as ADMM and the dual ascent method. In Section VII the proposed approach is applied to a wide range of applications, including distributed average consensus, distributed least squares and distributed LASSO. Section VIII presents the numerical results for each application and compares the proposed method with existing approaches. Finally, Section IX concludes the paper.

The following notation is used in this paper. Lowercase letters (x), lowercase boldface letters (x), uppercase boldface letters (X), overlined uppercase letters (X̄) and calligraphic letters (𝒳) denote scalars, vectors, matrices, subspaces and sets, respectively. An uppercase letter (X) denotes the random variable of its lowercase argument, which means that the lowercase letter x is assumed to be a realization of the random variable X. null{·} and span{·} denote the nullspace and span of their argument, respectively. X^† and X^⊤ denote the Moore-Penrose pseudoinverse and transpose of X, respectively. x_i denotes the i-th entry of the vector x, and X_{ij} denotes the (i, j)-th entry of the matrix X. 0, 1 and I denote the all-zero vector, the all-one vector, and the identity matrix of appropriate size, respectively.

II. FUNDAMENTALS

In this section, we review the fundamentals and some important concepts related to privacy preservation. We first review distributed convex optimization and highlight its privacy concerns. Then we describe the adversary models that will be addressed later in this paper.


A. Distributed Convex Optimization

A distributed network is usually modeled as a graph G = (𝒩, ℰ), where 𝒩 = {1, 2, ..., n} is the set of nodes and ℰ ⊆ 𝒩 × 𝒩 is the set of edges. Let n = |𝒩| and m = |ℰ| denote the numbers of nodes and edges, respectively. 𝒩_i = {j | (i, j) ∈ ℰ} denotes the neighborhood of node i, and d_i = |𝒩_i| denotes the degree of node i.

Let x_i ∈ R^{u_i} denote the local optimization variable at node i. A standard constrained convex optimization problem over the network can then be expressed as

$$\begin{aligned} \min_{x_i} \;\; & \sum_{i\in\mathcal{N}} f_i(x_i) \\ \text{s.t.}\;\; & B_{i|j}x_i + B_{j|i}x_j = b_{i,j} \quad \forall (i,j)\in\mathcal{E}, \end{aligned} \tag{1}$$

where f_i : R^{u_i} → R ∪ {∞} denotes the local objective function at node i, which we assume to be convex for all nodes i ∈ 𝒩. Additionally, let v_{i,j} denote the dimension of the constraints at each edge (i, j) ∈ ℰ; the matrices B_{i|j} ∈ R^{v_{i,j}×u_i} and vectors b_{i,j} ∈ R^{v_{i,j}} define the constraints. Note that we distinguish the subscripts i|j and i,j: the former is a directed identifier that denotes the directed edge from node i to node j, while the latter, i,j, is an undirected identifier.

Stacking all the local information and letting N_n = Σ_{i∈𝒩} u_i and M_m = Σ_{(i,j)∈ℰ} v_{i,j}, we can compactly express (1) as

$$\begin{aligned} \min_{x} \;\; & f(x) \\ \text{s.t.}\;\; & Bx = b, \end{aligned} \tag{2}$$

where f : R^{N_n} → R ∪ {∞}, x ∈ R^{N_n}, B ∈ R^{M_m×N_n} and b ∈ R^{M_m}. For simplicity, we assume that the dimensions of the x_i are the same for all nodes and equal to u, i.e., u = u_i, ∀i ∈ 𝒩, that all B_{i|j} are square matrices, i.e., v_{i,j} = u_i = u, ∀(i, j) ∈ ℰ, and that the constraints are zero, i.e., b = 0. We thus have M_m = m × u and N_n = n × u. We further define the matrix B based on the incidence matrix of the graph: B_{i|j} = I, B_{j|i} = −I if and only if (i, j) ∈ ℰ and i < j, and B_{i|j} = −I, B_{j|i} = I if and only if (i, j) ∈ ℰ and i > j. Note that B reduces to the graph incidence matrix if u = 1.

To solve the above problem without any centralized coordination, several distributed optimizers have been proposed, such as ADMM [33] and PDMM [34], [35], which solve the problem iteratively by communicating only within the local neighborhood. That is, at each iteration (denoted by index k), each node i updates its optimization variable x_i^{(k)} by exchanging data only with its neighbors. The goal of distributed optimizers is to design updating functions that ensure x_i^{(k)} → x_i^*, where x_i^* denotes the optimum solution for node i.

B. Privacy Concerns

As mentioned in the introduction, the sensor data captured by an individual's device are usually private in nature. For example, health conditions like Parkinson's disease can be detected from voice signals [36], [37], and the activities of householders can be revealed by power consumption data [38]. In the context of distributed optimization, such private information regarding each node i is contained in its local objective function f_i(x_i) [12]. Recall that after each iteration, node i sends the updated optimization variable x_i^{(k+1)} to all of its neighbors. Since this variable is related to f_i(x_i), the revealed x_i^{(k+1)} leaks information about f_i(x_i), e.g., its subgradient ∂f_i(x_i), thereby violating privacy. This privacy concern, however, has not been addressed in existing distributed optimizers. Therefore, in this paper, we investigate this privacy issue and propose a general solution for achieving privacy-preserving distributed optimization.

C. Adversary Model

When designing a privacy-preserving algorithm, it is important to determine the adversary model that qualifies its robustness under various types of security attack. By colluding with a number of nodes, the adversary aims to conduct certain malicious behaviors, such as learning private data or manipulating the function result to be incorrect. These colluding and non-colluding nodes are referred to as corrupted nodes and honest nodes, respectively. Most of the literature has considered only a passive (also called honest-but-curious or semi-honest) adversary, where the corrupted nodes are assumed to follow the instructions of the designed protocol but are curious about the honest nodes' private data. Another common adversary is the external eavesdropping adversary, which is assumed to infer the private data of the honest nodes by eavesdropping on all the communication channels in the network. The eavesdropping adversary is relatively unexplored in the context of privacy-preserving distributed optimization. In fact, many SMPC based approaches, such as secret sharing [30], [39], [40], assume that all messages are transmitted through securely encrypted channels [41], such that the communication channels cannot be eavesdropped upon. However, channel encryption is computationally demanding and is therefore very expensive for iterative algorithms, such as those used here, because they use the communication channels between nodes many times. In this paper, we design the privacy-preserving distributed optimizers in an efficient way, such that channel encryption needs to be used only once.

III. PROBLEM DEFINITION

Given the above-mentioned fundamentals, we conclude that the goal of privacy-preserving distributed convex optimization is to jointly optimize the constrained convex problem while protecting the private data of each node from being revealed under the defined adversary models. More specifically, there are two requirements that should be satisfied simultaneously:
1) Output correctness: at the end of the algorithm, each node i obtains its optimum solution x_i^* and its correct function result f_i(x_i^*), which implies that the global function result f(x^*) is also achieved.
2) Individual privacy: throughout the execution of the algorithm, the private data, i.e., the information regarding f_i(x_i), held by each honest node should be protected against both passive and eavesdropping adversaries, except for the information that can be directly inferred from the knowledge of the function output and the private data of the corrupted nodes (in Section VII we explain this in detail).
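To make the edge-based constraint construction of Section II-A concrete, the following sketch (a hypothetical toy example, not code from the paper) builds B for the case u = 1, where B is exactly the graph incidence matrix, and checks that with b = 0 the stacked constraint Bx = 0 is satisfied precisely by consensus vectors:

```python
import numpy as np

def incidence_matrix(n, edges):
    """B[e, i] = +1 and B[e, j] = -1 for the e-th edge (i, j) with i < j,
    matching B_{i|j} = I, B_{j|i} = -I in the u = 1 case."""
    B = np.zeros((len(edges), n))
    for e, (i, j) in enumerate(edges):
        i, j = min(i, j), max(i, j)
        B[e, i], B[e, j] = 1.0, -1.0
    return B

# A small connected graph: n = 4 nodes on a cycle (m = 4 edges).
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
B = incidence_matrix(4, edges)

# Each edge constraint B_{i|j} x_i + B_{j|i} x_j = 0 reads x_i = x_j,
# so any constant (consensus) vector is feasible ...
x_consensus = 5.0 * np.ones(4)
print(np.allclose(B @ x_consensus, 0))  # True

# ... while a non-consensus vector violates at least one edge constraint.
print(np.allclose(B @ np.array([1.0, 2.0, 3.0, 4.0]), 0))  # False
```

For u > 1, each ±1 entry simply becomes a ±I block of size u, as described above.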


To quantify the above requirements, two metrics must be defined.

A. Output Correctness Metric

For each node i, achieving the optimum solution x_i^* implies obtaining the correct function output f_i(x_i^*) as well. To measure the output correctness for the whole network in terms of the amount of communication, we use the mean squared error ‖x^{(k)} − x^*‖_2^2 over all the nodes as a function of the number of transmissions, where one transmission denotes that one message package is transmitted from one node to another.

B. Individual Privacy Metric

In the literature, information-theoretic measures like mutual information and ε-differential privacy are often adopted as the privacy metric (see [42] for details). In this paper, we deploy mutual information as the metric for quantifying individual privacy. The reason for choosing mutual information over ε-differential privacy is that ε-differential privacy corresponds to a worst-case metric, and the worst-case privacy leakage can in practice be quite far from the typical leakage of the average user [43]. Notably, mutual information and ε-differential privacy are closely related to each other, and it is shown in [44] that Bayesian mutual information is a relaxation of ε-differential privacy.

Given a continuous random variable X with probability density function f_X, the differential entropy of X is defined as h(X) = −∫ f_X(x) log f_X(x) dx. Let Y be another random variable; the conditional entropy h(X|Y) quantifies the uncertainty about X after knowing Y. The mutual information I(X;Y) [45] measures the amount of information learned about X by observing Y, or vice versa, and is given by¹

$$I(X;Y) = h(X) - h(X|Y). \tag{3}$$

¹In the case of discrete random variables, the condition is expressed in terms of the Shannon entropy H(·).

C. Lower Bound of Information Leakage

When defining individual privacy, we explicitly exclude the information that can be deduced from the function output and the private data of the corrupted nodes, because each node will eventually obtain its function output from the algorithm, and in some cases this output may contain certain information regarding the private data held by an individual honest node. To explain this scenario more explicitly, take distributed average consensus as an example. Let 𝒩_c and 𝒩_h denote the sets of corrupted and honest nodes, respectively. A group of n people would like to compute their average salary, denoted by s_ave, while keeping each person's salary s_i unknown to the others. If the average result is accurate, the salary sum of the honest people can always be computed as Σ_{i∈𝒩_h} s_i = n × s_ave − Σ_{i∈𝒩_c} s_i, assuming the adversary knows n, regardless of the underlying algorithm. With the mutual information metric, this salary sum will leak I(S_i; Σ_{j∈𝒩_h} S_j) amount of information about the salary of the honest node i. Since this information leakage is unavoidable, we refer to it as the lower bound of information leakage. We now give a definition of perfect security in the context of distributed processing.

Definition 1 (Perfect privacy-preserving algorithms): Given a specific application, let δ ≥ 0 denote its lower bound of information leakage. A privacy-preserving algorithm is considered perfect (or achieves perfect security) as long as it reveals no more information than this lower bound δ.

IV. PRIMAL-DUAL METHOD OF MULTIPLIERS

To introduce the main idea of subspace perturbation, we first use PDMM as an example and then, in Section VI, generalize it to other distributed optimizers, such as ADMM and the dual ascent method. The main reasons for choosing PDMM over other optimizers are its general applicability and its broadcasting property (see Section IV-B), which allows for simplification of the individual privacy analysis. In this section, we first review the fundamentals of PDMM and introduce its main properties, which will be used later in the proposed approach.

A. Fundamentals

PDMM is an instance of Peaceman-Rachford splitting applied to the extended dual problem (refer to [35] for details). It is an alternative distributed optimization tool to ADMM for solving constrained convex optimization problems and is often characterized by a faster convergence rate [34]. For the distributed optimization problem stated in (1), the extended augmented Lagrangian of PDMM is given by

$$\mathcal{L}(x,\lambda) = f(x) + \big(P\lambda^{(k)}\big)^{\top} Cx + \frac{c}{2}\,\big\|Cx + PCx^{(k)}\big\|_2^2, \tag{4}$$

and the updating equations of PDMM are given by

$$x^{(k+1)} = \arg\min_x \mathcal{L}\big(x, x^{(k)}, \lambda^{(k)}\big), \tag{5}$$

$$\lambda^{(k+1)} = P\lambda^{(k)} + c\big(Cx^{(k+1)} + PCx^{(k)}\big), \tag{6}$$

where P ∈ R^{2M_m×2M_m} is a symmetric permutation matrix exchanging the first M_m with the last M_m rows of the matrix it is applied to, and c > 0 is a constant controlling the convergence rate. λ^{(k)} ∈ R^{2M_m} denotes the dual variable at iteration k, introduced for controlling the constraints. Each edge (i, j) ∈ ℰ is related to two dual variables λ_{i|j}, λ_{j|i} ∈ R^u, controlled by node i and node j, respectively. Additionally, C ∈ R^{2M_m×N_n} is a matrix related to B: C = [B_+^⊤ B_−^⊤]^⊤, where B_+ and B_− are the matrices containing only the positive and negative entries of B, respectively. Of note, C + PC = [B^⊤ B^⊤]^⊤ and ∀(i, j) ∈ ℰ: λ_{j|i} = (Pλ)_{i|j}.
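To make the updates (4)–(6) concrete, the sketch below (a hypothetical toy example, not the authors' implementation) runs PDMM on distributed average consensus over a 4-node cycle, where each node holds the local objective f_i(x_i) = (x_i − s_i)²/2 and the constrained optimum is the network average of s. For this quadratic objective, the x-update (5) has a closed-form solution obtained by setting the gradient of (4) to zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy network (u = 1): 4 nodes on a cycle, with B the incidence matrix.
edges = [(0, 1), (1, 2), (2, 3), (0, 3)]
n, m = 4, len(edges)
B = np.zeros((m, n))
for e, (i, j) in enumerate(edges):
    B[e, i], B[e, j] = 1.0, -1.0

# C = [B_+^T, B_-^T]^T stacks the positive and negative parts of B;
# P swaps the first m and last m rows, so that C + PC = [B^T, B^T]^T.
C = np.vstack([np.clip(B, 0, None), np.clip(B, None, 0)])
P = np.block([[np.zeros((m, m)), np.eye(m)],
              [np.eye(m), np.zeros((m, m))]])
assert np.allclose(C + P @ C, np.vstack([B, B]))

# Average consensus: f_i(x_i) = (x_i - s_i)^2 / 2, so the constrained
# optimum is x_i^* = mean(s) at every node.
s = rng.normal(size=n)
c = 0.5
x, lam = np.zeros(n), np.zeros(2 * m)

# For this quadratic f, the zero-gradient condition of (4) gives
# (I + c C^T C) x = s - C^T P lam - c C^T P C x_old, i.e., the x-update (5).
A = np.eye(n) + c * C.T @ C
for _ in range(300):
    x_old = x
    x = np.linalg.solve(A, s - C.T @ (P @ lam) - c * C.T @ (P @ C @ x_old))
    lam = P @ lam + c * (C @ x + P @ C @ x_old)  # dual update (6)

print(np.allclose(x, s.mean(), atol=1e-6))  # True: all nodes reach the average
```

Note that the dual update (6) combines the new x^{(k+1)} with the previous x^{(k)}, which is why x_old is kept for one iteration.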


B. Broadcast PDMM

On the basis of (5), the local updating function at each node i is given by

$$x_i^{(k+1)} = \arg\min_{x_i}\Bigg( f_i(x_i) + \sum_{j\in\mathcal{N}_i} \lambda_{j|i}^{(k)\top} B_{i|j} x_i + \frac{c}{2} \sum_{j\in\mathcal{N}_i} \big\| B_{i|j} x_i + B_{j|i} x_j^{(k)} \big\|_2^2 \Bigg), \tag{7}$$

$$\forall j\in\mathcal{N}_i:\quad \lambda_{i|j}^{(k+1)} = \lambda_{j|i}^{(k)} + c\big( B_{i|j} x_i^{(k+1)} + B_{j|i} x_j^{(k)} \big). \tag{8}$$

We can see that updating λ_{i|j}^{(k+1)} requires only λ_{j|i}^{(k)}, x_j^{(k)} and x_i^{(k+1)}, of which λ_{j|i}^{(k)} and x_j^{(k)} are already available at node j. Thus, node i needs to broadcast only x_i^{(k+1)}, after which the neighboring nodes can update λ_{i|j}^{(k+1)} themselves. As a consequence, the dual variables do not need to be transmitted at all, except for the initialization step.

More specifically, at each iteration k, each node i has the following knowledge:

$$\{x_i^{(k)}, \lambda_{i|j}^{(k)}\}_{j\in\mathcal{N}_i} \cup \{x_j^{(k)}, \lambda_{j|i}^{(k)}\}_{j\in\mathcal{N}_i}, \tag{9}$$

where the first set contains the local variables of node i, and the second the variables of its neighbors that are related to node i. Note that the optimization variables {x_j^{(k)}}_{j∈𝒩_i} are sent by the neighbouring nodes during the (iterative) optimization process, whereas the dual variables {λ_{j|i}^{(k)}}_{j∈𝒩_i} are computed and kept locally at node i, except for the initialization step (k = 0), in which these variables need to be exchanged through securely encrypted channels.

C. Convergence of Dual Variables

Consider two successive λ-updates in (6), from which we have

$$\lambda^{(k+2)} = \lambda^{(k)} + c\big( Cx^{(k+2)} + 2PCx^{(k+1)} + Cx^{(k)} \big), \tag{10}$$

as P² = I. Let H̄_p = span(C) + span(PC) and H̄_p^⊥ = null(C^⊤) ∩ null((PC)^⊤), and denote by Π_{H̄_p} the orthogonal projection onto H̄_p. From (10), we conclude that every two λ-updates affect only Π_{H̄_p}λ ∈ H̄_p, while (I − Π_{H̄_p})λ ∈ H̄_p^⊥ remains the same. Moreover, as shown in [35], (I − Π_{H̄_p})λ is only permuted in each iteration, and Π_{H̄_p}λ eventually converges to λ* given by

$$\lambda^* = -\begin{bmatrix} C^\top \\ (PC)^\top \end{bmatrix}^{\dagger} \begin{bmatrix} \partial f(x^*) + cC^\top C x^* \\ \partial f(x^*) + cC^\top PC x^* \end{bmatrix} + cCx^*. \tag{11}$$

We can thus separate the dual variable into two parts:

$$\lambda^{(k)} = \Pi_{\bar{H}_p}\lambda^{(k)} + \big(I - \Pi_{\bar{H}_p}\big)\lambda^{(k)} \;\to\; \lambda^* + P^k\big(I - \Pi_{\bar{H}_p}\big)\lambda^{(0)}, \tag{12}$$

since P² = I. Below, H̄_p and H̄_p^⊥ are referred to as the convergent subspace and the non-convergent subspace associated with PDMM, respectively; similarly, Π_{H̄_p}λ and (I − Π_{H̄_p})λ are called the convergent and non-convergent components of the dual variable, respectively.

V. PROPOSED APPROACH USING PDMM

Having introduced the PDMM algorithm, we now introduce the proposed approach. To achieve a computationally and communication-efficient solution for privacy preservation, one of the most used techniques is obfuscation by noise insertion, as used in differential privacy based approaches. However, inserting noise usually compromises the accuracy of the function, because the updates are disturbed by the noise. To alleviate this trade-off, we propose to insert noise in the non-convergent subspace only, so that the accuracy of the optimization solution is not affected (see also Remark 4), thus achieving both privacy and accuracy at the same time. The proposed noise insertion method is referred to as subspace perturbation. Below, we first present some information-theoretic results regarding privacy, after which we explain the proposed subspace perturbation in detail and prove that it satisfies both the output correctness and individual privacy requirements stated in Section III.

A. Privacy Preservation Using Noise Insertion

We first present the following information-theoretic results regarding privacy, which serve as fundamentals for the proposed approach.

Proposition 1: Let {X_i}_{i=1,...,n} denote a set of continuous random variables with means μ_{X_i} and variances σ_{X_i}², assuming these exist. Let {Y_i}_{i=1,...,n} be a set of mutually independent random variables, i.e., I(Y_i; Y_j) = 0 for i ≠ j, which are independent of {X_i}_{i=1,...,n}, i.e., I(X_i; Y_j) = 0 for all i, j ∈ 𝒩. Let Z_i = X_i + Y_i and let Ẑ_i = Z_i/σ_{Z_i} be the normalized random variable with unit variance. We have

$$\lim_{\sigma_{Y_i}^2 \to \infty} I\big(X_1,\ldots,X_n;\, \hat{Z}_1,\ldots,\hat{Z}_n\big) = 0. \tag{13}$$

Proof: See Appendix A. □

Proposition 1 states that, if the lower bound of information leakage is δ = 0, we achieve perfect security as the variance of the inserted noise goes to infinity; that is, the system is asymptotically optimal. In the context of distributed signal processing, however, δ is usually positive, as the optimum solution is often an aggregation of the local information of all nodes. In these cases, perfect security can be achieved with finite noise insertion. The following result gives a lower bound on the noise variance that guarantees perfect security.

Proposition 2: Consider two independent random variables X and Y with variances σ_X² and σ_Y², where X denotes the private data and Y denotes the noise inserted to protect X. Let Z = X + Y. Given δ > 0, if we choose to insert Gaussian noise, we obtain

I(X; Z) ≤ δ

as long as

$$\sigma_Y^2 \ge \frac{\sigma_X^2}{2^{2\delta} - 1}. \tag{14}$$


Proof: See Appendix B. □

Proposition 2 provides a simple way to set the noise variance for achieving perfect security. As an example, if δ = 7 × 10⁻² bits, then perfect security can be guaranteed by setting the variance of the inserted Gaussian noise to be 10 times the variance of the private data.

B. Subspace Perturbation

We first give the following assumption.

Assumption 1: The communication channels in the network are securely encrypted when transmitting the initialized dual variable λ^{(0)}.

Because of the broadcasting property of PDMM, after transmission of the initialized dual variables, the updated optimization variable x_i^{(k+1)} is the only information transmitted in the network at each iteration. Based on (7), x_i^{(k+1)} is computed from

$$0 \in \partial f_i\big(x_i^{(k+1)}\big) + \sum_{j\in\mathcal{N}_i} B_{i|j}^\top \lambda_{j|i}^{(k)} + c\sum_{j\in\mathcal{N}_i}\big( x_i^{(k+1)} - x_j^{(k)} \big). \tag{15}$$

We can see that the information about the local objective function f_i(x_i^{(k+1)}) is contained in the subgradient ∂f_i(x_i^{(k+1)}). The main goal of privacy preservation thus becomes minimizing the information loss about ∂f_i(x_i^{(k+1)}) given the information known by the adversary. To do so, we first need to analyze how much knowledge the adversary has regarding (15). Based on the defined adversary models, there are two ways for the adversary to collect information. The first is to eavesdrop on the communication channels. By doing so, the adversary is able to collect, at every iteration k ≥ 0, all optimization variables {x_i^{(k)}}_{i∈𝒩} in the network. The dual variables {λ^{(k)}}, on the other hand, cannot be eavesdropped, because they are transmitted only during the initialization phase (k = 0) through securely encrypted channels (see Assumption 1). The second way is to collect the information of all corrupted nodes (the passive adversary model), which is given by (9) for every i ∈ 𝒩_c. Combining the above information, we conclude that the adversary has the following knowledge regarding x and λ:

$$\{x_i^{(k)}\}_{i\in\mathcal{N}} \cup \{\lambda_{i|j}^{(k)}\}_{(i,j)\in\mathcal{E}_c}, \tag{16}$$

where ℰ_c = {(i, j) : (i, j) ∈ ℰ, (i, j) ∉ 𝒩_h × 𝒩_h} denotes the corrupted edge set. Let 𝒩_{i,c} = 𝒩_i ∩ 𝒩_c and 𝒩_{i,h} = 𝒩_i ∩ 𝒩_h denote the corrupted and honest neighborhoods of node i, respectively. By inspecting (15), we can see that with the knowledge (16), the adversary is able to compute both c Σ_{j∈𝒩_i}(x_i^{(k+1)} − x_j^{(k)}) and the partial sum contributed by the corrupted neighborhood of node i, i.e., Σ_{j∈𝒩_{i,c}} B_{i|j}^⊤ λ_{j|i}^{(k)}. Therefore, after deducting these known terms from (15), the adversary observes

$$\partial f_i\big(x_i^{(k+1)}\big) + \sum_{j\in\mathcal{N}_{i,h}} B_{i|j}^\top \lambda_{j|i}^{(k)}. \tag{17}$$

In order to achieve perfect security (see Definition 1), the information loss at every iteration should not exceed δ, i.e.,

$$I\Bigg( \partial f_i\big(X_i^{(k+1)}\big);\; \partial f_i\big(X_i^{(k+1)}\big) + \sum_{j\in\mathcal{N}_{i,h}} B_{i|j}^\top \Lambda_{j|i}^{(k)} \Bigg) \le \delta. \tag{18}$$

Note that the lower bound δ is given by the application. The only freedom we have, in order to obtain perfect security, is to control the variance of Σ_{j∈𝒩_{i,h}} B_{i|j}^⊤ λ_{j|i}^{(k)}.

Using (12), we can express (17) as

$$\partial f_i\big(x_i^{(k+1)}\big) + \sum_{j\in\mathcal{N}_{i,h}} B_{i|j}^\top \big( \Pi_{\bar{H}_p}\lambda^{(k)} \big)_{j|i} + \sum_{j\in\mathcal{N}_{i,h}} B_{i|j}^\top \big( P^k (I - \Pi_{\bar{H}_p})\lambda^{(0)} \big)_{j|i}, \tag{19}$$

from which we conclude that the variance of the convergent term Σ_{j∈𝒩_{i,h}} B_{i|j}^⊤(Π_{H̄_p}λ^{(k)})_{j|i} cannot be made arbitrarily large, as it always converges: Π_{H̄_p}λ^{(k)} → λ*. In contrast, the variance of the non-convergent term Σ_{j∈𝒩_{i,h}} B_{i|j}^⊤(P^k(I − Π_{H̄_p})λ^{(0)})_{j|i} can be made arbitrarily large, as it depends only on the initialization of the dual variable. As a consequence, we propose to exploit this non-convergent term to guarantee perfect security. That is, given an arbitrary δ > 0, we can adjust the variance of Σ_{j∈𝒩_{i,h}} B_{i|j}^⊤(P^k(I − Π_{H̄_p})λ^{(0)})_{j|i} such that (18) is satisfied. In particular, by applying Proposition 2 to the problem at hand, we conclude that a sufficient condition to guarantee perfect security (18) is to initialize the dual variable with Gaussian distributed noise such that

$$\exists j\in\mathcal{N}_{i,h}:\quad \operatorname{var}\Big( \big( (I - \Pi_{\bar{H}_p})\lambda^{(0)} \big)_{j|i} \Big) \ge \frac{\operatorname{var}\big( \partial f_i\big(X_i^{(k+1)}\big) \big)}{2^{2\delta} - 1}. \tag{20}$$

By inspecting the above condition, we have the following remarks.

Remark 1 ((I − Π_{H̄_p})λ^{(0)} ≠ 0 can be realized by randomly initializing λ^{(0)}): Of note, [C PC] ∈ R^{2M_m×2N_n} can be viewed as a new graph incidence matrix with 2N_n nodes and 2M_m edges (see (c) in Fig. 1 for an example); we thus have dim(H̄_p) ≤ 2N_n − 1, and H̄_p^⊥ is non-empty. For a connected graph with the number of edges not less than the number of nodes (i.e., M_m ≥ N_n), we conclude that a randomly initialized λ^{(0)} ∈ R^{2M_m} will achieve (I − Π_{H̄_p})λ^{(0)} ≠ 0 with probability 1.

Remark 2 (Privacy is still guaranteed if the adversary has full knowledge of the subspace H̄_p): If the network topology, i.e., B, is known to the adversary, it is able to construct the subspace H̄_p by using C and PC. However, knowing H̄_p will
not compromise the privacy because the term j∈Ni,h B i|j λj|i
the known terms from (15), what the adversary observes is in (17) can not be determined by the adversary. The reason is that

(k+1)
∂fi (xi )+
(k)
B i|j λj|i . (17) as long as λ(0) ∈/ H̄p , the adversary is not able to reconstruct the
j∈Ni,h
dual variables transmitted between the honest nodes.
Remark 3: (The privacy will not be compromised even though
the adversary collects information over iterations). Since the
2 Note that B 
i|j
= B i|j , B i|j B j|i = −I, and B i|j B i|j = I. updates are only conducted in the convergent subspace, even if

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:39:09 UTC from IEEE Xplore. Restrictions apply.
LI et al.: PRIVACY-PRESERVING DISTRIBUTED OPTIMIZATION VIA SUBSPACE PERTURBATION: A GENERAL FRAMEWORK 5989

Fig. 1. An example of graph topologies associated with dual ascent, ADMM and PDMM with u = 1: (a) A graph with n = 5 nodes and m = 6 edges. (b) The
bipartite graph constructed by ADMM with n + m nodes and 2 m edges. (c) The bipartite graph constructed by PDMM with 2n nodes and 2 m edges.

2) Individual Privacy: From (20), we conclude that the pro-


Algorithm 1: Privacy-Preserving Distributed Optimization
posed algorithm is able to achieve perfect security, against both
Via Subspace Perturbation Using PDMM.
passive and eavesdropping adversaries as long as the honest node
1: Each node i ∈ N initializes its optimization variable has at least one honest neighbor, i.e., Ni,h = ∅ and Assumption
(0) (0)
xi arbitrarily, and its dual variables λi|j , j ∈ Ni are 1 holds.
randomly initialized with a certain distribution with Overall, the proposed subspace perturbation approach is able
large variance (specified by the required privacy level). to achieve perfect security without compromising the accuracy.
(0)
2: Node i broadcasts xi and sends the initialized
(0)
{λi|j } to its neighbor j through secure channels [41].
C. Discussions
3: while x(k) − x∗ 2 < threshold do
4: Activated a node uniformly at random, e.g., node i, In Remark 4 we mentioned that the proposed approach has
(k+1) no trade-off between privacy and accuracy. When considering
updates xi using (5).
(k+1) practical signal processing tasks that require quantization, the
5: Node i broadcasts xi to its neighbors j ∈ Ni above claim still holds if the quantizer has a fixed resolution (cell
(through non-secure channels). width), but there will be a trade-off between privacy and bit rate
(k+1)
6: After receiving xi , each neighbor j ∈ Ni as an increase in variance will require a higher bit rate. On the
(k+1)
updates λi|j using (6). other hand, there will be a trade-off between privacy and accu-
7: end while racy if the quantizer has a fixed bit rate, since increasing the noise
will end up with a lower resolution. These quantization related
trade-offs, however, exist in all approaches using noise insertion
for example the differential privacy approaches. One way to
the adversary collects information over iterations, it can only circumvent these trade-offs is to adopt a fixed-rate quantizer
know the difference λ(k+2) − λ(k) ∈ H̄p . With the knowledge that change the cell width adaptively during the iterations [46],
of the dual variables associated with the corrupted nodes only, [47].
again the adversary is not able to determine the dual variables
 (k)
In connection to the above quantization effects, we note that
related to the honest nodes. Hence, j∈Ni,h B i|j λj|i in (17) the lower bound of information leakage is very important in
can not be determined and the privacy is thus guaranteed. reducing these trade-offs, i.e., minimizing the communication
Remark 4: (No trade-off between privacy and accuracy). bandwidth (bit rate) or the error in the algorithm output, because
No matter how much noise is inserted in the non-convergent it helps to specify the minimum noise variance that needs to be
subspace, the convergence of the optimization variable x → x∗ added for perfect security; it is not necessary to set the noise
is guaranteed. By inspecting (4), we can see that the x-update variance to be large if the lower bound is not low enough.
is independent of (I − ΠH̄p )λ as λ (I − ΠH̄p )P C = 0.
Details of the proposed approach using PDMM are shown in
Algorithm 1. And the analysis of both output correctness and VI. PROPOSED APPROACH USING OTHER OPTIMIZERS
individual privacy is summarized below. In this section, we demonstrate the general applicability of the
1) Output Correctness: As proven in [35], with strictly con- proposed subspace perturbation method. In fact, the proposed
(k)
vex f (xi ), the optimization variable xi of each node i of the method can be generally applied to any distributed optimizer if
PDMM is guaranteed to converge geometrically (linearly on the introduced dual variables converge only in a subspace (i.e.,
a logarithmic scale) to the optimum solution x∗i , regardless of there is a non-empty nullspace), which is indeed usually true,
the initialization of both x(0) and λ(0) , thereby ensuring the because these optimizers often work in a subspace determined
correctness. Moreover, for convex functions that are not strictly by the incidence matrix of a graph. To substantiate this claim, we
convex, a slightly modified version called averaged PDMM (see will show that the subspace perturbation also applies to ADMM
Section VII-C for an example) can be used to guarantee the and the dual ascent method. We then illustrate their differences
convergence. by linking the convergent subspaces to their graph topologies.
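Condition (20) is straightforward to evaluate in practice. The sketch below (Python; it assumes unit-variance private data and the Gaussian bound of Proposition 2, and the helper names `min_noise_variance` and `gaussian_leakage_bits` are illustrative, not from the paper's code release) inverts the bound to obtain the smallest admissible noise variance for a target leakage of δ bits:

```python
import math

def min_noise_variance(var_subgradient: float, delta_bits: float) -> float:
    """Smallest noise variance satisfying (20):
    var(noise) >= var(subgradient) / (2**(2*delta) - 1)."""
    return var_subgradient / (2 ** (2 * delta_bits) - 1)

def gaussian_leakage_bits(var_subgradient: float, var_noise: float) -> float:
    """Leakage I(X; X + Y) in bits for independent Gaussians (Proposition 2)."""
    return 0.5 * math.log2(1 + var_subgradient / var_noise)

# Unit-variance private data and the three privacy levels used in Section VIII:
for delta in (7e-3, 7e-5, 7e-7):
    v = min_noise_variance(1.0, delta)
    # at the minimum admissible variance the Gaussian bound is met with equality
    assert abs(gaussian_leakage_bits(1.0, v) - delta) < 1e-9
    print(f"delta = {delta:.0e} bits -> noise variance >= {v:.3g}")
```

The printed variances are approximately 10², 10⁴ and 10⁶ for δ = 7×10⁻³, 7×10⁻⁵ and 7×10⁻⁷ bits, which matches the dual-variable variances reported for these privacy levels in Section VIII.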

Authorized licensed use limited to: TU Delft Library. Downloaded on February 09,2021 at 06:39:09 UTC from IEEE Xplore. Restrictions apply.
5990 IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 68, 2020

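Before turning to the individual optimizers, the subspace structure itself can be checked numerically. The sketch below (a toy example with scalar variables on the 5-node, 6-edge graph of Fig. 1(a); the directed-edge construction of C and P is an assumption of this sketch, chosen to match the PDMM edge convention) verifies the dimension count of Remark 1 and the fact that an update of the form Pλ + c(Cx + PCx′) only permutes the component of λ in H̄p^⊥, whatever x and x′ are:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy graph of Fig. 1(a): n = 5 nodes, m = 6 edges (scalar variables).
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n, m = 5, len(edges)

# One row per directed edge: row (i|j) has +1 in column i, row (j|i) has
# -1 in column j; the permutation P swaps the two directions of each edge.
C = np.zeros((2 * m, n))
P = np.zeros((2 * m, 2 * m))
for e, (i, j) in enumerate(edges):
    C[2 * e, i], C[2 * e + 1, j] = 1.0, -1.0
    P[2 * e, 2 * e + 1] = P[2 * e + 1, 2 * e] = 1.0

# Orthogonal projector onto the convergent subspace span(C) + span(PC).
H = np.hstack([C, P @ C])
U, sv, _ = np.linalg.svd(H)
r = int((sv > 1e-10 * sv[0]).sum())
Pi = U[:, :r] @ U[:, :r].T

assert r == 2 * n - 1     # dim <= 2n - 1 (Remark 1) ...
assert r < 2 * m          # ... so the orthogonal complement is non-empty

# A dual update P@lam + c*(C@x + P@C@x') never shrinks the complement
# component: it only permutes it, regardless of the x-sequence.
lam = rng.normal(scale=1e3, size=2 * m)   # random large initialization
resid0 = lam - Pi @ lam                    # non-convergent component
for k in range(1, 6):
    lam = P @ lam + 1.0 * (C @ rng.normal(size=n) + P @ C @ rng.normal(size=n))
    assert np.allclose(lam - Pi @ lam,
                       np.linalg.matrix_power(P, k) @ resid0)
```

This is exactly the mechanism behind the non-convergent term of (19): the injected noise survives every iteration up to the permutation P.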
A. ADMM

Given the standard distributed optimization problem stated in (1), the augmented Lagrangian function of ADMM [33] is given by

L(x, v, z) = f(x) + v^⊤ (M x + W z) + (c/2) ‖M x + W z‖²,  (21)

where M ∈ R^(2m×n), like in PDMM, is a matrix related to the graph incidence matrix, M = [B₊^⊤ (−B₋)^⊤]^⊤, and W = [−I^⊤ −I^⊤]^⊤ ∈ R^(2m×m); v ∈ R^(2m) and z ∈ R^m are the introduced dual variable and the auxiliary variable for the constraints, respectively. The updating functions of ADMM are given by

x^(k+1) = arg min_x L(x, z^(k), v^(k))  (22)
z^(k+1) = arg min_z L(x^(k+1), z, v^(k))  (23)
v^(k+1) = v^(k) + c (M x^(k+1) + W z^(k+1)).  (24)

By inspecting the v-update in (24), we can see that it has a similar structure to that of the λ-update in (10) of PDMM. Let H̄a = span(M) + span(W); the matrix [M W] can also be seen as an incidence matrix of an extended bipartite graph (see (b) in Fig. 1 for an example). Therefore, we have dim(H̄a) ≤ m + n − 1, and every v-update only affects Π_{H̄a} v ∈ H̄a and leaves (I − Π_{H̄a}) v ∈ H̄a^⊥, where H̄a^⊥ = null(M^⊤) ∩ null(W^⊤), unchanged. In addition, similar to (15) in PDMM, the local optimization variable x_i^(k+1) of node i is computed by

0 ∈ ∂f_i(x_i^(k+1)) + Σ_{j∈N_i} v_{i|j}^(k) + c Σ_{j∈N_i} (x_i^(k+1) − z_{i,j}^(k)).  (25)

Note that ADMM is not a broadcasting protocol, i.e., it requires pairwise communications for the auxiliary variable. The individual privacy thus depends on both x and z. To simplify the analysis, we remark that revealing the auxiliary variable z does not disclose more information than revealing x, by the data processing inequality [45]. As a consequence, it is sufficient to analyze the individual privacy through (25). As for the knowledge of the adversary, we note that, in addition to the optimization variable and the dual variable, all the auxiliary variables {z_{i,j}^(k)}_{(i,j)∈E} are assumed to be known by the adversary, as they can be eavesdropped. We then conclude that, after deducting all the known terms, i.e., Σ_{j∈N_{i,c}} v_{i|j}^(k) and c Σ_{j∈N_i} (x_i^(k+1) − z_{i,j}^(k)), from (25), what the adversary observes is

∂f_i(x_i^(k+1)) + Σ_{j∈N_{i,h}} v_{i|j}^(k)
 = ∂f_i(x_i^(k+1)) + Σ_{j∈N_{i,h}} (Π_{H̄a} v^(k))_{i|j} + Σ_{j∈N_{i,h}} ((I − Π_{H̄a}) v^(0))_{i|j}.  (26)

The proof of both the output correctness and individual privacy using ADMM follows a similar structure as that of PDMM. The sufficient condition for perfect security using ADMM becomes

∃j ∈ N_{i,h} : var(((I − Π_{H̄a}) v^(0))_{i|j}) ≥ var(∂f_i(X_i^(k+1))) / (2^(2δ) − 1).  (27)

We thus conclude that perfect security can also be achieved using ADMM via subspace perturbation.

Of note, there are variations of ADMM. For example, [48] showed that the auxiliary variable can be eliminated and the dual variable update simplified by proper initialization; this requires that the dual variable be initialized in the column space of [B^⊤ (−B)^⊤]^⊤. However, such an initialization ensures (I − Π_{H̄a}) v^(0) = 0, so there is no subspace noise to protect the private data. Instead, we need to randomly initialize the dual variable v^(0) such that (27) is satisfied.

B. Dual Ascent Method

The Lagrangian of the dual ascent method for solving (1) is given by

L(x, u) = f(x) + u^⊤ B x,  (28)

where u ∈ R^m is the introduced dual variable. The updating functions are given by

x^(k+1) = arg min_x L(x, u^(k))  (29)
u^(k+1) = u^(k) + t^(k) B x^(k+1),  (30)

where t^(k) denotes the step size at iteration k. Likewise, the u-update in (30) has a similar structure to the λ-update of PDMM and the v-update of ADMM. Here the convergent subspace is H̄d = span(B), and B is also rank deficient as it is related to the graph incidence matrix. Hence, we conclude that the proposed subspace perturbation method also works for the dual ascent method.

C. Linking Graph Topologies and Subspaces

Thus far, we have shown that the dual variable updates of PDMM, ADMM and the dual ascent method depend only on their corresponding subspaces: H̄p = span(C) + span(PC), H̄a = span(M) + span(W) and H̄d = span(B). As mentioned before, each of the matrices [C PC], [M W] and B can be seen as an incidence matrix of a graph; therefore, they all have a non-empty left nullspace for subspace perturbation as long as m ≥ n (Remark 1). To examine the appearance of these constructed graphs, in Fig. 1 we give an example and provide illustrative insights into the differences among these optimizers.

VII. APPLICATIONS

To demonstrate the potential of the proposed approach to be used in a wide range of applications, we now present the


application of the proposed subspace perturbation to three fundamental problems: distributed average consensus, distributed least squares and distributed LASSO, because they serve as building blocks for many other signal processing tasks, such as denoising [49], interpolation [50], machine learning [51], [52] and compressed sensing [53], [54]. In each case we first introduce the application and then perform the individual privacy analysis. We continue using PDMM to introduce the details, but numerical results for all the discussed optimizers will be presented in the next section.

A. Privacy-Preserving Distributed Average Consensus

The optimization problem setup (1) for distributed average consensus becomes

min_{x_i} Σ_{i∈N} (1/2) ‖x_i − s_i‖²   s.t. x_i = x_j, ∀(i, j) ∈ E,  (31)

where s_i denotes the initial state value held by node i. The optimum solution for each optimization variable is x_i* = n⁻¹ Σ_{i∈N} s_i, and x* = (x_1*, . . . , x_n*)^⊤. Privacy-preserving distributed average consensus is widely investigated in the literature [55]–[63]; it aims to estimate the average of all the nodes' initial state values over a network while keeping each node's initial state value unknown to others. Such privacy-preserving solutions are highly desired in practice. For example, in face recognition applications, computing mean faces is usually required, thus prompting privacy concerns: a group of people may cooperate to compute the mean face, but none of them would like to reveal their own facial images during the computation.

1) Individual Privacy: Note that in distributed average consensus, the requirement of protecting the initial state value s_i is equivalent to protecting ∂f_i(x_i^(k+1)) = x_i^(k+1) − s_i, since the optimization variable x_i^(k+1) is known to the adversary and can thus be seen as a constant. To comply with existing approaches, in what follows we analyze the privacy in terms of the initial state value s_i. Substituting ∂f_i(x_i^(k+1)) = x_i^(k+1) − s_i in (17) and removing the known term x_i^(k+1), we then have

−s_i + Σ_{j∈N_{i,h}} B_{i|j} λ_{j|i}^(k).  (32)

As shown in [31], [55], the only revealed information here would be the partial sums over the honest components (connected subgraphs consisting of honest nodes only) after removal of all the corrupted nodes. Let H denote the node set of the component that the honest node i belongs to; the lower bound of information leakage for node i is thus given by I(S_i; Σ_{j∈H} S_j) = δ. We conclude that, given δ, the proposed approach is able to achieve perfect security, i.e.,

I(S_i; S_i + Σ_{j∈N_{i,h}} B_{i|j} λ_{j|i}^(k)) ≤ δ,  (33)

by satisfying (20) in which var(∂f_i(X_i^(k+1))) is replaced by var(S_i).

B. Privacy-Preserving Distributed Least Squares

Privacy-preserving distributed least squares aims to find a solution to a linear system (here we consider an overdetermined system, in which there are more equations than unknowns) over a network in which each node has only partial knowledge of the system and is only able to communicate with its neighbors, while the local information held by each node should not be revealed to others. More specifically, here the local information of node i comprises both the input observations, denoted by Q_i ∈ R^(p_i×u), p_i > u, and the decision vector, denoted by y_i ∈ R^(p_i). That is, each node i has p_i observations, each containing a u-dimensional feature vector. Collecting all the local information, we thus have Q = [Q_1^⊤, . . . , Q_n^⊤]^⊤ ∈ R^(P_n×u) and y = [y_1^⊤, . . . , y_n^⊤]^⊤ ∈ R^(P_n), where P_n = Σ_{i∈N} p_i.

The least-squares problem in a distributed network can be formulated as a distributed linearly constrained convex optimization problem, and the problem setup in (1) becomes

min_{x_i} Σ_{i∈N} (1/2) ‖y_i − Q_i x_i‖²   s.t. x_i = x_j, ∀(i, j) ∈ E,  (34)

where the optimum solution is given by x_i* = (Q^⊤ Q)⁻¹ Q^⊤ y ∈ R^u, ∀i ∈ N.

1) Individual Privacy: Note that the local information y_i and Q_i usually contains users' sensitive information [24]. Take distributed linear regression, widely used in the field of machine learning, as an example, and consider the case in which several hospitals want to collaboratively learn a predictive model by exploring all the data in their datasets. Such collaborations are limited because they must comply with policies such as the general data protection regulation (GDPR), and because individual patients/customers may feel uncomfortable with revealing their private information, such as insurance data and health conditions, to others. In this context, since ∂f_i(x_i^(k+1)) = Q_i^⊤ (Q_i x_i^(k+1) − y_i) contains sensitive information regarding the local information Q_i and y_i of node i, it is important to protect it from being revealed. We can see that at the end each node obtains the optimum solution ∀i ∈ N: x_i* = (Q^⊤ Q)⁻¹ Q^⊤ y. The lower bound δ is thus the amount of information learned about ∂f_i(x_i) of an honest node i ∈ N_h by knowing x_i*, given the knowledge of the corrupted nodes, i.e., {f_i(x_i)}_{i∈N_c}. Hence, the proposed approach is able to achieve privacy-preserving distributed least squares, i.e., perfect security (18) is guaranteed as long as (20) is satisfied.

C. Privacy-Preserving Distributed LASSO

Privacy-preserving distributed LASSO aims to securely find a sparse solution when solving an underdetermined system (in which the number of equations is less than the number of unknowns). We thus have a network similar to that of the previous least squares section, but with the dimension P_n < u to ensure an underdetermined system. The distributed LASSO is formulated as an ℓ₁-regularized distributed least squares problem given by

min_{x_i} Σ_{i∈N} (1/2) ‖y_i − Q_i x_i‖² + α|x_i|   s.t. x_i = x_j, ∀(i, j) ∈ E,  (35)


Fig. 2. Distributed average consensus with two different initializations of the dual variable with a variance of 10⁶: convergence of the optimization variable and of the convergent and non-convergent components of the dual variable, using (a) dual ascent, (b) ADMM, and (c) PDMM.

Fig. 3. Distributed least squares with two different initializations of the dual variable with a variance of 10⁶: convergence of the optimization variable and of the convergent and non-convergent components of the dual variable, using (a) dual ascent, (b) ADMM, and (c) PDMM.

Fig. 4. Distributed LASSO with two different initializations of the dual variable with a variance of 10⁶: convergence of the optimization variable and of the convergent and non-convergent components of the dual variable, using (a) dual ascent, (b) ADMM, and (c) PDMM.

where α is the constant controlling the sparsity of the solution. Because the objective function is convex but not strictly convex, we use averaged PDMM to ensure convergence: the x-updating function remains the same, and the λ-updating function in (6) is replaced with the weighted average

λ^(k+1) = θ (λ^(k) + c C (x^(k+1) − x^(k))) + (1 − θ) (P λ^(k) + c (C x^(k+1) + P C x^(k))),  (36)

where 0 < θ < 1 is the constant controlling the averaging weight. The output correctness is ensured by simply replacing equation (6) in Algorithm 1 with the above equation. The rest of the analysis follows similarly to the least squares example described above. Hence, with (20), we are able to achieve perfect security in distributed LASSO.

VIII. NUMERICAL RESULTS

In this section, several numerical tests³ are conducted to demonstrate both the general applicability and the benefits of the proposed subspace perturbation in terms of several important parameters, including accuracy, convergence rate, communication cost and privacy level.

³The code for reproducing these results is available at https://github.com/qiongxiu/PrivacyOptimizationSubspace

We simulated a distributed network by generating a random graph with n = 20 nodes, where the communication radius r was set at r² = 2 log n / n to ensure that the graph is connected with high probability [64]. For simplicity, all the local data regarding f(x_i) in each application, i.e., s_i in distributed average consensus, and Q_i and y_i in distributed least squares and LASSO, are randomly generated from a Gaussian distribution with unit variance, and the optimization variables are initialized with zeros. Additionally, we initialize all the dual variables with Gaussian distributions with different variances.

A. General Applicability

In Figs. 2, 3 and 4, we compare the convergence behavior of the proposed subspace perturbation methods (blue lines) with traditional non-private approaches (red lines) using three distributed optimizers in three applications: distributed average consensus, least squares and LASSO. More specifically, the blue lines indicate that the dual variables are randomly initialized with a variance of 10⁶, such that the non-convergent component (blue line with triangle markers) can protect the private data, whereas the red lines mean that the dual variables are initialized within the convergent subspace, so that the private data are not protected, because the non-convergent component is zero (the red lines with triangle markers are therefore not shown in the plots). We can see that the proposed approach has no effect on the accuracy, because all the optimization variables converge to the same optimum solution as their non-private counterparts. Overall, we can conclude that
1) the proposed approach is able to achieve both accuracy and privacy simultaneously;
2) it is able to solve a variety of convex problems;
3) it is generally applicable to a broad range of distributed convex optimization methods.

Fig. 5. Convergence of the optimization variable for three different privacy levels, i.e., approximately 7 × 10⁻³, 7 × 10⁻⁵ and 7 × 10⁻⁷ bits, for (a) proposed dual ascent (p-Dual), (b) proposed ADMM (p-ADMM), and (c) proposed PDMM (p-PDMM) in a distributed least squares application.

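The average consensus experiment can be reproduced in miniature. The sketch below is a synchronous, message-passing variant of PDMM applied to (31) on the random geometric network described above (the paper's Algorithm 1 activates one node at a time; the z-exchange form used here is an assumption of this sketch, chosen as a common equivalent formulation of the PDMM updates):

```python
import numpy as np

rng = np.random.default_rng(3)

# Random geometric network of Section VIII: n = 20 nodes on the unit
# square, connected when closer than r, with r^2 = 2*log(n)/n [64].
n = 20
r2 = 2 * np.log(n) / n
while True:                               # resample until connected
    pos = rng.random((n, 2))
    d2 = ((pos[:, None] - pos[None, :]) ** 2).sum(-1)
    adj = (d2 <= r2) & ~np.eye(n, dtype=bool)
    seen, stack = {0}, [0]
    while stack:
        for j in map(int, np.flatnonzero(adj[stack.pop()])):
            if j not in seen:
                seen.add(j)
                stack.append(j)
    if len(seen) == n:
        break
nbrs = {i: [int(j) for j in np.flatnonzero(adj[i])] for i in range(n)}

s = rng.normal(size=n)                    # private initial state values
A = lambda i, j: 1.0 if i < j else -1.0   # edge direction (u = 1)

# Step 1 of Algorithm 1: dual variables drawn with large variance.
c, sigma = 1.0, 1e3                       # sigma^2 = 1e6, as in Fig. 2
z = {(i, j): rng.normal(scale=sigma) for i in range(n) for j in nbrs[i]}

for _ in range(1500):
    # x-update, closed form for f_i(x_i) = (1/2)*(x_i - s_i)^2:
    x = np.array([(s[i] - sum(A(i, j) * z[(i, j)] for j in nbrs[i]))
                  / (1 + c * len(nbrs[i])) for i in range(n)])
    # dual exchange: node i sends z_{i|j} + 2*c*A_ij*x_i to neighbor j.
    z = {(j, i): z[(i, j)] + 2 * c * A(i, j) * x[i]
         for i in range(n) for j in nbrs[i]}

assert np.allclose(x, s.mean(), atol=1e-5)   # accuracy is unaffected
```

Despite dual variables of standard deviation 10³, every x_i still reaches the true average, consistent with Remark 4.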
B. Privacy Level-Invariant Convergence Rate


Another important aspect to quantify the performance is the
influence on the convergence rate when considering privacy.
Because the convergence rates of the discussed distributed op-
timizers depend only on the underlying graph topologies rather
than the initializations, initializing the dual variables with greater
variance will therefore not change the convergence rate; and it
Fig. 6. Performance comparison: convergence behavior under three different
will only result in only a larger offset in the initial error. To vali- noise variances using the proposed p-PDMM, p-ADMM and an existing differ-
date these results, in Fig. 5 we show the convergence behavior of ential privacy (DP) approach.
the proposed approaches in the distributed least squares problem
under three different privacy levels: δ = 7 × 10−3 , 7 × 10−5 ,
and 7 × 10−7 bits. In order to achieve perfect security, based p-ADMM) with a differential privacy approach [60] by using the
on Proposition 2, the variance of each dual variable is set as distributed average consensus application. We consider three
102 , 104 and 106 , respectively. As expected, in all optimizers, cases in which the variance of each dual variable is set as
the convergence rate remains identical regardless of the privacy 0, 102 , 104 . We can see that the accuracy of the differential
level. privacy approach decreases with increasing privacy level. Hence,
there is a trade-off between privacy and accuracy. Additionally,
the convergence rates of differential privacy approaches will also
C. Comparison With Differential Privacy
be affected when increasing the level of privacy, because noise
To demonstrate the benefits of the proposed method, in is inserted at every iteration, and a higher privacy level will also
Fig. 6, we compare the proposed approaches (both p-PDMM and result in a slower convergence rate.

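The qualitative contrast can be illustrated with a minimal simulation (a sketch only: "per-iteration noise" below stands in for differential-privacy-style noise insertion in general, not for the specific mechanism of [60], and the averaging matrix W = I − L/6 is a choice made here for stability):

```python
import numpy as np

rng = np.random.default_rng(4)

# Fixed 5-node connected graph and a doubly stochastic averaging matrix.
edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4), (3, 4)]
n = 5
L = np.zeros((n, n))                      # graph Laplacian
for i, j in edges:
    L[i, i] += 1
    L[j, j] += 1
    L[i, j] -= 1
    L[j, i] -= 1
W = np.eye(n) - L / 6                     # step 1/6 keeps the iteration stable

s = rng.normal(size=n)
x_clean, x_noisy = s.copy(), s.copy()
for _ in range(500):
    x_clean = W @ x_clean                                    # plain averaging
    x_noisy = W @ (x_noisy + rng.normal(scale=0.1, size=n))  # noisy states

assert np.allclose(x_clean, s.mean(), atol=1e-8)    # exact average
err = np.abs(x_noisy - s.mean()).max()
print(f"residual error with per-iteration noise: {err:.3f}")
```

The noiseless iteration reaches the exact average, whereas adding noise to every transmitted state leaves a residual error: this is the privacy-accuracy trade-off that subspace perturbation avoids by confining the noise to the non-convergent subspace.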

Fig. 7. Normalized mutual information of an arbitrary node i (i.e., I(S_i; X_i^(k)) / I(S_i; S_i)) at each iteration, using the proposed p-PDMM and non-private PDMM (n-PDMM).

D. Information Loss Over the Iterative Process

To visualize how the information loss behaves during the iterative process, we perform 10⁵ Monte Carlo simulations and estimate the normalized mutual information (NMI) I(S_i; X_i^(k)) / I(S_i; S_i) of distributed average consensus using the non-parametric entropy estimation toolbox (npeet) [65]. In Fig. 7, we show the estimated normalized mutual information of the proposed p-PDMM with a noise variance of 10⁶ and of traditional non-private PDMM, in which the dual variables are initialized with zeros. Since both approaches converge to the same average result x_i* = n⁻¹ Σ_{i∈N} s_i, they ultimately have the same information loss I(S_i; X_i*), which corresponds to the lower bound of information leakage under the condition that there are no passively corrupted nodes. As expected, the information loss of the proposed p-PDMM never exceeds this lower bound; hence, the proposed approach achieves perfect security. The n-PDMM, however, reveals all the information about s_i at the first iteration, as no noise is inserted; thus the privacy is not protected at all.

IX. CONCLUSION

In this paper, a novel and general subspace perturbation method was proposed for privacy-preserving distributed optimization. As a noise insertion approach, this method is more practical than SMPC based approaches in terms of both computation and communication costs. Additionally, by inserting noise in a subspace, it circumvents the trade-off between privacy and accuracy present in traditional noise insertion approaches such as differential privacy. Moreover, the proposed method guarantees perfect security and is generally applicable to various optimizers and all convex problems. Furthermore, we considered both passive and eavesdropping adversary models, in which the private data of each honest node are protected as long as that node has at least one honest neighbor; only secure channel encryption in the initialization is required.

APPENDIX A
PROOF OF PROPOSITION 1

Proof:

I(X₁, . . . , Xₙ; Z₁, . . . , Zₙ)
 = h(Z₁, . . . , Zₙ) − h(Z₁, . . . , Zₙ | X₁, . . . , Xₙ)
 (a)= h(Z₁, . . . , Zₙ) − h(Y₁, . . . , Yₙ)
 (b)= Σ_{i=1}^n h(Z_i | Z₁, . . . , Z_{i−1}) − Σ_{i=1}^n h(Y_i)
 (c)≤ Σ_{i=1}^n h(Z_i) − Σ_{i=1}^n h(Y_i)
 (d)= Σ_{i=1}^n I(X_i; Z_i)
 (e)= Σ_{i=1}^n I(X_i/σ_{Z_i}; Z_i).

Step (a) holds as h(Z_i | X_i) = h(Y_i); (b) holds by the chain rule of differential entropy together with the condition that the Y_i's are independent random variables; (c) holds as conditioning decreases entropy; (d) holds as h(Z_i) − h(Y_i) = h(Z_i) − h(Z_i | X_i) = I(X_i; Z_i); and (e) holds from the fact that mutual information is scaling invariant. As a consequence,

lim_{σ²_{Y_i}→∞} Σ_{i=1}^n I(X_i; Z_i) = lim_{σ_{Z_i}→∞} Σ_{i=1}^n I(X_i/σ_{Z_i}; Z_i) = Σ_{i=1}^n I(0; Z_i) = 0,

thereby proving our claims. ∎

APPENDIX B
PROOF OF PROPOSITION 2

Proof: As X and Y are independent, we have σ²_Z = σ²_X + σ²_Y. For a Gaussian random variable with variance σ², the differential entropy is given by (1/2) log(2πeσ²), so that

δ = I(X; Z) = h(Z) − h(Z | X) = h(Z) − h(Y)
 (a)≤ (1/2) log(2πeσ²_Z) − (1/2) log(2πeσ²_Y) = (1/2) log(1 + σ²_X/σ²_Y),  (37)

where (a) holds because the maximum entropy of a random variable with fixed variance is achieved by a Gaussian distribution; equality holds if X is also Gaussian distributed. ∎

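The Gaussian bound (37) can be cross-checked numerically via the identity I(X; Z) = −(1/2) log₂(1 − ρ²) for jointly Gaussian variables, where ρ is the correlation coefficient of X and Z (the Monte-Carlo sample size and tolerance below are choices of this sketch, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(5)

def mi_bits(var_x, var_y):
    """Closed form (37): I(X; X+Y) in bits for independent Gaussians."""
    return 0.5 * np.log2(1 + var_x / var_y)

# Monte-Carlo cross-check via I = -0.5*log2(1 - rho^2) for jointly
# Gaussian X and Z = X + Y, with rho their correlation coefficient.
var_x, var_y = 1.0, 100.0
x = rng.normal(scale=np.sqrt(var_x), size=1_000_000)
z = x + rng.normal(scale=np.sqrt(var_y), size=1_000_000)
rho = np.corrcoef(x, z)[0, 1]
mi_mc = -0.5 * np.log2(1 - rho ** 2)

assert abs(mi_mc - mi_bits(var_x, var_y)) < 1e-3
assert mi_bits(1.0, 1e8) < 1e-8     # leakage vanishes as the noise grows
```

The last assertion mirrors the statement of Proposition 1 in the Gaussian case: as the noise variance grows without bound, the leaked information tends to zero.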

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their helpful comments to improve this manuscript. In particular, the reviewer who suggested directly defining the subgradient as the private data helped to simplify and clarify this manuscript.

REFERENCES

[1] M. Anderson, Technology Device Ownership: 2015, Pew Research Center, 2015.
[2] J. Poushter et al., "Smartphone ownership and internet usage continues to climb in emerging economies," Pew Res. Center, vol. 22, pp. 1–44, 2016.
[3] T. Sherson, W. B. Kleijn, and R. Heusdens, "A distributed algorithm for robust LCMV beamforming," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Process., 2016, pp. 101–105.
[4] A. I. Koutrouvelis, T. W. Sherson, R. Heusdens, and R. C. Hendriks, "A low-cost robust distributed linearly constrained beamformer for wireless acoustic sensor networks with arbitrary topology," IEEE/ACM Trans. Audio, Speech, Lang. Process., vol. 26, no. 8, pp. 1434–1448, Aug. 2018.
[5] M. Zhu and S. Martínez, "On distributed convex optimization under inequality and equality constraints," IEEE Trans. Autom. Control, vol. 57, no. 1, pp. 151–164, Jan. 2011.
[6] M. Zibulevsky and M. Elad, "L1-L2 optimization in signal and image processing," IEEE Signal Process. Mag., vol. 27, no. 3, pp. 76–88, May 2010.
[7] C. Dwork, "Differential privacy," in Proc. Int. Colloq. Automata, Lang., Program., 2006, pp. 1–12.
[8] C. Dwork, F. McSherry, K. Nissim, and A. Smith, "Calibrating noise to sensitivity in private data analysis," in Proc. Theory Cryptogr. Conf., 2006, pp. 265–284.
[9] R. Cramer, I. B. Damgård, and J. B. Nielsen, Secure Multiparty Computation and Secret Sharing. Cambridge, U.K.: Cambridge Univ. Press, 2015.
[10] Z. Huang, S. Mitra, and N. Vaidya, "Differentially private distributed optimization," in Proc. Int. Conf. Distrib. Comput. Netw., 2015, pp. 1–10.
[11] S. Han, U. Topcu, and G. J. Pappas, "Differentially private distributed constrained optimization," IEEE Trans. Autom. Control, vol. 62, no. 1, pp. 50–64, Jan. 2016.
[12] E. Nozari, P. Tallapragada, and J. Cortés, "Differentially private distributed convex optimization via functional perturbation," IEEE Trans. Control Netw. Syst., vol. 5, no. 1, pp. 395–408, Mar. 2018.
[13] T. Zhang and Q. Zhu, "Dynamic differential privacy for ADMM-based distributed classification learning," IEEE Trans. Inf. Forensics Security, vol. 12, no. 1, pp. 172–187, Jan. 2016.
[14] X. Zhang, M. M. Khalili, and M. Liu, "Recycled ADMM: Improve privacy
[23] C. Wang, K. Ren, and J. Wang, "Secure and practical outsourcing of linear programming in cloud computing," in Proc. Int. Conf. Comput. Commun., 2011, pp. 820–828.
[24] C. Zhang, M. Ahmad, and Y. Wang, "ADMM based privacy-preserving decentralized optimization," IEEE Trans. Inf. Forensics Security, vol. 14, no. 3, pp. 565–580, Mar. 2019.
[25] P. Paillier, "Public-key cryptosystems based on composite degree residuosity classes," in Proc. EUROCRYPT, 1999, pp. 223–238.
[26] C. Gentry, "Fully homomorphic encryption using ideal lattices," in Proc. Symp. Theory Comput., 2009, pp. 169–178.
[27] A. C. Yao, "Protocols for secure computations," in Proc. Symp. Found. Comput. Sci., 1982, pp. 160–164.
[28] A. C. Yao, "How to generate and exchange secrets," in Proc. Symp. Found. Comput. Sci., 1986, pp. 162–167.
[29] I. Damgård, V. Pastro, N. Smart, and S. Zakarias, "Multiparty computation from somewhat homomorphic encryption," in Proc. Adv. Cryptol.–CRYPTO, Springer, 2012, pp. 643–662.
[30] K. Tjell and R. Wisniewski, "Privacy preservation in distributed optimization via dual decomposition and ADMM," in Proc. IEEE Conf. Decision Control (CDC), 2019, pp. 7203–7208.
[31] Q. Li, R. Heusdens, and M. G. Christensen, "Convex optimisation-based privacy-preserving distributed average consensus in wireless sensor networks," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2020, pp. 5895–5899.
[32] Q. Li, R. Heusdens, and M. G. Christensen, "Convex optimization-based privacy-preserving distributed least squares via subspace perturbation," in Proc. Eur. Signal Process. Conf., to be published, doi: 10.1109/ICASSP40776.2020.9053348.
[33] S. Boyd et al., "Distributed optimization and statistical learning via the alternating direction method of multipliers," Found. Trends Mach. Learn., vol. 3, no. 1, pp. 1–122, 2011.
[34] G. Zhang and R. Heusdens, "Distributed optimization using the primal-dual method of multipliers," IEEE Trans. Signal Inf. Process. Netw., vol. 4, no. 1, pp. 173–187, Mar. 2018.
[35] T. Sherson, R. Heusdens, and W. B. Kleijn, "Derivation and analysis of the primal-dual method of multipliers based on monotone operator theory," IEEE Trans. Signal Inf. Process. Netw., vol. 5, no. 2, pp. 334–347, Jun. 2019.
[36] A. H. Poorjam, Y. P. Raykov, R. Badawy, J. R. Jensen, M. G. Christensen, and M. A. Little, "Quality control of voice recordings in remote Parkinson's disease monitoring using the infinite hidden Markov model," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2019, pp. 805–809.
[37] A. H. Poorjam et al., "Automatic quality control and enhancement for voice-based remote Parkinson's disease detection," 2019, arXiv:1905.11785.
[38] G. Giaconi, D. Gündüz, and H. V. Poor, "Privacy-aware smart metering: Progress and challenges," IEEE Signal Process. Mag., vol. 35, no. 6, pp. 59–78, Nov. 2018.
[39] Q. Li and M. G. Christensen, "A privacy-preserving asynchronous aver-
aging algorithm based on Shamir’s secret sharing,” in Proc. Eur. Signal
and accuracy with less computation in distributed algorithms,” in Proc.
Process. Conf., 2019, pp. 1–5.
56th Annu. Allerton Conf. Commun., Control, Comput., 2018, pp. 959–965.
[40] K. Tjell, I. Cascudo, and R. Wisniewski, “Privacy preserving recursive
[15] X. Zhang, M. M. Khalili, and M. Liu, “Improving the privacy and accuracy
least squares solutions,” in Proc. Ewing Christian College, 2019, pp. 3490–
of ADMM-based distributed algorithms,” Proc. Int. Conf. Mach. Lear.,
3495.
pp. 5796–5805, 2018.
[41] D. Dolev, C. Dwork, O. Waarts, and M. Yung, “Perfectly secure mes-
[16] Y. Xiong, J. Xu, K. You, J. Liu, and L. Wu, “Privacy preserving distributed
sage transmission,” J. Assoc. Comput. Mach., vol. 40, no. 1, pp. 17–47,
online optimization over unbalanced digraphs via subgradient rescaling,”
1993.
IEEE Trans. Control Netw. Syst., vol. 7, no. 3, pp. 1366–1378, Sep. 2020.
[42] I. Wagner and D. Eckhoff, “Technical privacy metrics: A systematic
[17] Z. Huang, R. Hu, Y. Gong, and E. Chan-Tin, “DP-ADMM: ADMM-based
survey,” ACM Comput. Surv., vol. 51, no. 3, pp. 1–38, 2018.
distributed learning with differential privacy,” IEEE Trans. Inf. Forensics
[43] M. Lopuhaä-Zwakenberg, B. Škorić, and N. Li, “Information-theoretic
Security, vol. 15, pp. 1002–1012, 2020.
metrics for local differential privacy protocols,” 2019, arXiv:1910.07826.
[18] M. T. Hale and M. Egerstedt, “Differentially private cloud-based multi-
[44] P. Cuff and L. Yu, “Differential privacy as a mutual information constraint,”
agent optimization with constraints,” in Proc. Amer. Control Conf., 2015,
in Proc. 23rd ACM SIGSAC Conf. Comput. Commun. Security, pp. 43–54,
pp. 1235–1240.
2016.
[19] M. T. Hale and M. Egerstedt, “Cloud-enabled differentially private mul-
[45] T. M. Cover and J. A. Tomas, Elements of Information Theory. Hoboken,
tiagent optimization with constraints,” IEEE Trans. Control Netw. Syst.,
NJ, USA: Wiley, 2012.
vol. 5, no. 4, pp. 1693–1706, Dec. 2018.
[46] D. H. M. Schellekens, T. Sherson, and R. Heusdens, “Quantisation effects
[20] Z. Xu and Q. Zhu, “Secure and resilient control design for cloud enabled
in PDMM: A first study for synchronous distributed averaging,” in Proc.
networked control systems,” in Proc. 1st ACM Workshop Cyber-Phys.
Int. Conf. Acoust., Speech, Signal Process., 2017, pp. 4237–4241.
Syst.-Security Privacy, 2015, pp. 31–42.
[47] D. H. M. Schellekens, T. Sherson, and R. Heusdens, “Quantisation effects
[21] N. M. Freris and P. Patrinos, “Distributed computing over encrypted data,”
in distributed optimisation,” in Proc. Int. Conf. Acoust., Speech, Signal
in Proc. IEEE 54th Annu. Allerton Conf. Commun., Control, Comput.,
Process., 2018, pp. 3649–3653.
2016, pp. 1116–1122.
[48] W. Shi, Q. Ling, K. Yuan, G. Wu, and W. Yin, “On the linear convergence of
[22] Y. Shoukry et al., “Privacy-aware quadratic optimization using partially
the ADMM in decentralized consensus optimization,” IEEE Trans. Signal
homomorphic encryption,” in Proc. IEEE 55th Conf. Decis. Control, 2016,
Process., vol. 62, no. 7, pp. 1750–1761, Apr. 2014.
pp. 5053–5058.
[49] J. Pang, G. Cheung, A. Ortega, and O. C. Au, "Optimal graph Laplacian regularization for natural image denoising," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2015, pp. 2294–2298.
[50] S. K. Narang, A. Gadde, and A. Ortega, "Signal processing techniques for interpolation in graph structured data," in Proc. Int. Conf. Acoust., Speech, Signal Process., 2013, pp. 5445–5449.
[51] R. Tibshirani, "Regression shrinkage and selection via the lasso," J. Roy. Stat. Soc. B, vol. 58, no. 1, pp. 267–288, 1996.
[52] J. Wright, A. Yang, A. Ganesh, S. Sastry, and Y. Ma, "Robust face recognition via sparse representation," IEEE Trans. Pattern Anal. Mach. Intell., vol. 31, no. 2, pp. 210–227, Feb. 2009.
[53] E. J. Candes and T. Tao, "Near-optimal signal recovery from random projections: Universal encoding strategies?" IEEE Trans. Inf. Theory, vol. 52, no. 12, pp. 5406–5425, Dec. 2006.
[54] D. L. Donoho, "Compressed sensing," IEEE Trans. Inf. Theory, vol. 52, no. 4, pp. 1289–1306, Apr. 2006.
[55] Q. Li, I. Cascudo, and M. G. Christensen, "Privacy-preserving distributed average consensus based on additive secret sharing," in Proc. Eur. Signal Process. Conf., 2019, pp. 1–5.
[56] N. Gupta, J. Katz, and N. Chopra, "Privacy in distributed average consensus," IFAC-PapersOnLine, vol. 50, no. 1, pp. 9515–9520, 2017.
[57] N. Gupta, J. Katz, and N. Chopra, "Statistical privacy in distributed average consensus on bounded real inputs," in Proc. Amer. Control Conf., 2019, pp. 1836–1841.
[58] M. Kefayati, M. S. Talebi, B. H. Khalaj, and H. R. Rabiee, "Secure consensus averaging in sensor networks using random offsets," in Proc. IEEE Int. Conf. Telecommun., Malaysia Int. Conf. Commun., 2007, pp. 556–560.
[59] Z. Huang, S. Mitra, and G. Dullerud, "Differentially private iterative synchronous consensus," in Proc. ACM Workshop Privacy Electron. Soc., 2012, pp. 81–90.
[60] E. Nozari, P. Tallapragada, and J. Cortés, "Differentially private average consensus: Obstructions, trade-offs, and optimal algorithm design," Automatica, vol. 81, pp. 221–231, 2017.
[61] N. E. Manitara and C. N. Hadjicostis, "Privacy-preserving asymptotic average consensus," in Proc. Eur. Control Conf., 2013, pp. 760–765.
[62] Y. Mo and R. M. Murray, "Privacy preserving average consensus," IEEE Trans. Autom. Control, vol. 62, no. 2, pp. 753–765, Feb. 2017.
[63] J. He, L. Cai, C. Zhao, P. Cheng, and X. Guan, "Privacy-preserving average consensus: Privacy analysis and algorithm design," IEEE Trans. Signal Inf. Process. Netw., vol. 5, no. 1, pp. 127–138, Mar. 2019.
[64] J. Dall and M. Christensen, "Random geometric graphs," Phys. Rev. E, vol. 66, no. 1, 2002, Art. no. 016121.
[65] G. Ver Steeg, "Non-parametric entropy estimation toolbox (NPEET)," 2000. [Online]. Available: https://github.com/gregversteeg/NPEET

Qiongxiu Li (Student Member, IEEE) was born in Hubei, China, in 1993. She received the M.Sc. degree in information and communication engineering from Inha University, Incheon, South Korea, in 2017. She is currently working toward the Ph.D. degree with the Department of Architecture, Design, and Media Technology and the Audio Analysis Lab, Aalborg University, Aalborg, Denmark. Her research interests include distributed signal processing and secure and privacy-preserving signal processing.

Richard Heusdens received the M.Sc. and Ph.D. degrees from the Delft University of Technology, Delft, The Netherlands, in 1992 and 1997, respectively. Since 2002, he has been an Associate Professor with the Faculty of Electrical Engineering, Mathematics and Computer Science, Delft University of Technology. In the spring of 1992, he joined the Digital Signal Processing Group, Philips Research Laboratories, Eindhoven, The Netherlands. He has worked on various topics in the field of signal processing, such as image/video compression and VLSI architectures for image processing algorithms. In 1997, he joined the Circuits and Systems Group of Delft University of Technology, where he was a Postdoctoral Researcher. In 2000, he joined the Information and Communication Theory (ICT) Group, where he became an Assistant Professor responsible for the audio/speech signal processing activities within the ICT Group. He held visiting positions with KTH (Royal Institute of Technology, Sweden) in 2002 and 2008 and was a Guest Professor with Aalborg University from 2014 to 2016. Since 2019, he has been a Full Professor with the Netherlands Defence Academy. He is involved in various research projects that include audio and acoustic signal processing, speech enhancement, sensor signal processing, and distributed optimization.

Mads Græsbøll Christensen (Senior Member, IEEE) received the M.Sc. and Ph.D. degrees in 2002 and 2005, respectively, from Aalborg University (AAU), Denmark. He is currently with the Department of Architecture, Design & Media Technology, AAU, as a Professor in audio processing and the Head and a founder of the Audio Analysis Lab. He was formerly with the Department of Electronic Systems, AAU, and has held visiting positions with Philips Research Labs, ENST, UCSB, and Columbia University. He has authored or coauthored four books and more than 200 papers in peer-reviewed conference proceedings and journals, and he has given multiple tutorials at EUSIPCO, ICASSP, and INTERSPEECH and a keynote talk at IWAENC. His research interests include audio and acoustic signal processing, where he has worked on topics such as microphone arrays, noise reduction, signal modeling, speech analysis, audio classification, and audio coding.

Dr. Christensen was the recipient of several awards, including best paper awards, the Spar Nord Foundation's Research Prize, a Danish Independent Research Council Young Researcher's Award, the Statoil Prize, the EURASIP Early Career Award, and an IEEE SPS Best Paper Award. He is a beneficiary of major grants from the Independent Research Fund Denmark, the Villum Foundation, and Innovation Fund Denmark. He is the Editor-in-Chief of the EURASIP Journal on Audio, Speech, and Music Processing and the Senior Area Editor of the IEEE SIGNAL PROCESSING LETTERS, and he was previously an Associate Editor of the IEEE/ACM TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING and the IEEE SIGNAL PROCESSING LETTERS. He is a member of the IEEE Audio and Acoustic Signal Processing Technical Committee and a founding member of the EURASIP Special Area Team in Acoustic, Speech and Music Signal Processing. He is a member of EURASIP and a member of the Danish Academy of Technical Sciences.