DCG: Distributed Conjugate Gradient for Efficient Linear Equations Solving

Haodi Ping, Yongcai Wang, and Deying Li

The authors are with the School of Information, Renmin University of China, Beijing, P.R. China, 100872. E-mail: {haodi.ping, ycw, deyingli}@ruc.edu.cn

Abstract—Distributed algorithms for solving linear equations in multi-agent networks have attracted great research attention, and many iteration-based distributed algorithms have been developed. Convergence speed is a key factor for distributed algorithms, and it is known to depend on the spectral radius of the iteration matrix. However, the iteration matrix is determined by the network structure and can hardly be pre-tuned, so iteration-based distributed algorithms may converge very slowly when the spectral radius is close to 1. In contrast, in centralized optimization, the Conjugate Gradient (CG) method is widely adopted to speed up centralized solvers and guarantees convergence in a fixed number of steps. In this paper, we propose a general distributed implementation of CG, called DCG. DCG needs only local communication and local computation, while inheriting CG's fast convergence. DCG is guaranteed to converge in 4Hn rounds, where H is the maximum hop number of the network and n is the number of nodes. We present applications of DCG to the least square problem and the network localization problem. The results show that the convergence speed of DCG is three orders of magnitude faster than that of the widely used Richardson iteration method.

Index Terms—distributed algorithm, conjugate gradient, linear equations, network localization, least square problem

[Fig. 1. The convergence trails in R²: (a) the Richardson iteration; (b) DCG. The marker '+' represents the start of a trail. The markers '∗' and '◦' represent the converged estimate x̂i and the ground truth xi, respectively.]

[Fig. 2. The mean square error w.r.t. iteration rounds (t), comparing DCG and the Richardson iteration.]

I. INTRODUCTION

In many multi-agent applications, the underlying problem can be reduced to solving a system of linear equations [1]. Because the autonomous networked agents are usually discretely deployed, each agent can communicate only with its direct neighbors. Moreover, in certain scenarios, each agent desires only its own state. These characteristics give rise to distributed solvers for linear systems. Different from centralized solvers, distributed solvers usually work iteratively. The key feature of iterative approaches is linear iteration: each agent receives the states of its direct neighbors, then updates and sends its own state. The barycentric linear localization algorithm [2] is a typical iteration-based method.

Convergence speed is a crucial factor for iteration-based distributed algorithms, since it determines whether a distributed algorithm can be used when the application requires a fast response. The convergence rate of many iterative methods, e.g., the Jacobi iteration, the Gauss-Seidel iteration, and the Richardson iteration [3], is characterized by the spectral radius of the iteration matrix. However, the spectral radius is determined by the network topology and is difficult to pre-tune. Thus the convergence speed is highly uncertain and may be very slow in some network states.

To guarantee fast convergence of distributed algorithms for solving linear equations, this study explores the idea of the Conjugate Gradient (CG) method [4]. CG has two desirable properties for solving linear systems: 1) CG converges to the exact solution after a finite number of iterations, which is not larger than the size of the system matrix; 2) CG is well suited to linear systems with large and sparse system matrices. But CG is essentially a centralized solver. In pursuit of a distributed implementation, we design an efficient protocol to synchronize the vectors needed to update the residual. Our distributed CG, DCG, remarkably speeds up convergence at the price of limited neighborhood communication.

The rest of the paper is organized as follows. We formulate the problem and present related algorithms in Section II. DCG is proposed in Section III. Applications of DCG are discussed in Section IV. DCG is evaluated in Section V. The paper is concluded with further discussions in Section VI.
Notations: Throughout the paper, let AX = b denote a system of linear equations, where A ∈ R^(n×n) is the coefficient matrix, b ∈ R^(n×d) is the right-hand side, and X ∈ R^(n×d) collects the unknowns. n is the number of variables and d is the spatial dimension, d ∈ {2, 3}. The vector Ai,: = [Ai1 · · · Ain] denotes the ith row of A, i ∈ {1, · · · , n}.

II. PRELIMINARIES

A. Problem Formulation

Consider a network of n agents V = {v1, · · · , vn}, where each agent vi is capable of communicating with the agents within its reception range R. Let E be the set of edges, with (i, j) ∈ E if the distance between vi and vj is not larger than R. The multi-agent network can then be represented as a graph G = (V, E). The neighbors of vi are denoted by Ni, with j ∈ Ni if (i, j) ∈ E.

Problem: Assume that A is non-singular. Let X∗ denote the unique solution satisfying AX = b. Suppose each agent vi holds a state vector x̂i ∈ R^d. Initially, vi knows Ai,: and bi,:. The problem is to devise a local rule for each agent to update its state x̂i, leveraging only local communication with the agents in Ni, so that x̂i(t) converges to x∗i within finite t.
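To make the network model concrete, the following minimal Python sketch (ours, not part of the paper) builds the communication graph from agent positions and a reception range R; the resulting `neighbors` map corresponds to the sets Ni used throughout. All names here are illustrative assumptions.

```python
import numpy as np

def build_network(positions: np.ndarray, R: float):
    """Return the neighbor sets N_i of the disk graph G = (V, E).

    positions: (n, d) array of agent coordinates.
    R: reception range; (i, j) is an edge iff dist(v_i, v_j) <= R.
    """
    n = len(positions)
    neighbors = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(positions[i] - positions[j]) <= R:
                neighbors[i].add(j)
                neighbors[j].add(i)
    return neighbors

# Example: 10 agents in the plane (d = 2) with range R = 0.4.
rng = np.random.default_rng(0)
pos = rng.random((10, 2))
print(build_network(pos, 0.4))
```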
 
B. Related Work

The basic idea of iterative methods is as follows. Given an initialization x̂(0), generate an iteration sequence {x̂(t)}, t = 0, 1, · · · , in a certain manner so that

  lim_{t→∞} x̂(t) = x∗ ≜ A⁻¹b.   (1)

Generally, the state update of an iterative method can be represented as

  x̂(t + 1) = φk(x̂(t), x̂(t − 1), · · · , x̂(0), A, b),   (2)

where x̂(0) = φ0(A, b) or x̂(0) is selected manually. φk is called the iteration function. The specific designs of iteration functions are based on matrix splitting.

Definition 1 (Matrix Splitting). For a non-singular matrix A ∈ R^(n×n), a split of A is defined as A = M − N, where M is also non-singular.

Consider a general linear system

  AX = b,   (3)

where A is non-singular. Through the matrix splitting A = M − N, (3) can be transformed into

  MX = NX + b.   (4)

Then, we can construct an iteration function as

  MX̂(t + 1) = NX̂(t) + b,   (5)

which is equivalent to

  X̂(t + 1) = M⁻¹N X̂(t) + M⁻¹b ≜ G X̂(t) + g.   (6)

G = M⁻¹N is called the iteration matrix. It is straightforward that different iterative methods can be constructed by varying M.

In the Jacobi iteration, A is split as

  A = D − L − U.   (7)

D is the diagonal component of A; −L and −U are the strictly lower and strictly upper triangular components of A, respectively. Then, the Jacobi iteration is specified as

  X̂(t + 1) = D⁻¹(L + U) X̂(t) + D⁻¹b.   (8)

From the local view of an individual agent vi, the state update is

  x̂i(t + 1) = (1/Aii) ( bi − Σ_{j=1, j≠i}^{n} Aij x̂j(t) ).   (9)

In the Gauss-Seidel iteration, the iteration function is designed as

  X̂(t + 1) = (D − L)⁻¹ U X̂(t) + (D − L)⁻¹ b.   (10)

The local update of agent vi's state is

  x̂i(t + 1) = (1/Aii) ( bi − Σ_{j=1}^{i−1} Aij x̂j(t + 1) − Σ_{j=i+1}^{n} Aij x̂j(t) ).   (11)

Another simple iteration method is the Richardson iteration, applicable when A is symmetric positive definite. The iteration function is

  X̂(t + 1) = X̂(t) + ω ( b − A X̂(t) ).   (12)

The state of agent vi is updated as

  x̂i(t + 1) = x̂i(t) + ω ( bi − Σ_{j=1}^{n} Aij x̂j(t) ),   (13)

where ω is a non-negative scalar, suggested to be 2/(λmax + λmin). Similar methods include the Successive Over-Relaxation (SOR) iteration, the Symmetric SOR (SSOR) iteration, the Accelerated Over-Relaxation (AOR) iteration, the Symmetric AOR (SAOR) iteration, etc. [5].
C. Convergence and Convergence Rate

Since the aforementioned methods are iterative, a crucial issue is to guarantee convergence of the iteration. For a general iteration function as in (6), convergence is guaranteed by the following theorem.

Theorem 1. The iterates generated by X̂(t + 1) = G X̂(t) + g converge for any X̂(0) if and only if ρ(G) < 1 [6].

ρ(G) is the spectral radius of the iteration matrix G. See Theorem 4.1 of [6] for the proof.

Apart from knowing when the iteration converges, it is also desirable to know how fast it converges. Saad [6] showed that the convergence rate τ is the natural logarithm of the inverse of the spectral radius:

  τ = ln (1/ρ(G)) = − ln ρ(G).   (14)

It can be seen that these iterative methods converge slowly when G has eigenvalues whose magnitudes are close to 1.
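To see how (14) governs the iteration count, the short Python sketch below (our illustration, not from the paper) computes ρ(G) for the Jacobi splitting of a sample matrix and the number of rounds needed to shrink the error by a factor of 10⁶.

```python
import numpy as np

# Sample symmetric positive definite system.
A = np.array([[4.0, -1.0, 0.0], [-1.0, 4.0, -1.0], [0.0, -1.0, 4.0]])

# Jacobi splitting A = M - N with M = D: G = D^{-1}(L + U) = I - D^{-1} A.
G = np.eye(3) - np.diag(1.0 / np.diag(A)) @ A
rho = max(abs(np.linalg.eigvals(G)))   # spectral radius rho(G)
tau = -np.log(rho)                     # convergence rate, eq. (14)
rounds = np.log(1e6) / tau             # rounds to reduce the error by 1e6
print(f"rho(G) = {rho:.4f}, tau = {tau:.4f}, ~{rounds:.0f} rounds")
```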
III. DCG: A GENERAL DISTRIBUTED CONJUGATE GRADIENT IMPLEMENTATION

The most attractive feature of CG is that it converges within a fixed number of iterations. But CG is essentially a centralized gradient-based solver for linear equations; see Section 6 of [6] for the original CG algorithm. Although parallel or distributed CG algorithms have been reported for allocating computational loads in cloud computing [7] and for spectrum estimation in sensor networks [8], a general distributed CG has not been well studied.

The key difficulty in implementing a Distributed CG (DCG) is that the state X̂i is updated based on several global vectors, while vi knows only the ith element of each. Specifically, X̂i(t) = X̂i(t − 1) + αi(t) di(t), where αi(t) and di(t) are calculated using the direction vector d(t) and the residual vector r(t), but vi knows only di and ri.

A. Vector Synchronization

To supplement the necessary information, we design the Synchronize_Vector(zi) protocol, through which vi can gather any complete vector z by repeatedly exchanging the respective elements with its neighbors (Lines 21-31 of Algorithm 1). H is the largest number of hops throughout the network, so the synchronization finishes in H rounds. H can be set to n if it is not given at network deployment. The operation 'merge' (Line 29) selects all non-zero elements of the input vectors and stacks them into a new vector while maintaining their original indexes.
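The following Python sketch mimics the flooding behavior of Synchronize_Vector on a simulated network; arrays stand in for the per-agent messages, and zeros mark unknown entries, matching the 'merge' rule above. It is our illustration of the protocol, under the assumption of lossless synchronized rounds (a genuinely zero component is indistinguishable from an unknown one, a limitation inherent in merge-by-nonzero).

```python
import numpy as np

def synchronize_vector(z_local, neighbors, H):
    """Each agent i starts from [0,...,0, z_i, 0,...,0] and floods for H rounds.

    z_local: length-n array of scalar components, z_local[i] held by agent i.
    Returns the per-agent copies; after H rounds every copy equals z.
    """
    n = len(z_local)
    copies = [np.zeros(n) for _ in range(n)]
    for i in range(n):
        copies[i][i] = z_local[i]                 # Line 21: local init
    for _ in range(H):                            # Line 23
        snapshot = [c.copy() for c in copies]     # Line 25: exchange z_i(k)
        for i in range(n):
            for j in neighbors[i]:                # Line 27
                # Line 29: 'merge' keeps non-zero entries at their indexes.
                copies[i] = np.where(snapshot[j] != 0, snapshot[j], copies[i])
    return copies

neighbors = {0: [1], 1: [0, 2], 2: [1]}           # a 3-node path, H = 2
print(synchronize_vector(np.array([5.0, 0.0, 7.0]), neighbors, H=2))
```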
B. Distributed Conjugate Gradient

DCG is detailed as Function DCG (Lines 2-18 of Algorithm 1). At initialization (Line 2), the state X̂i(0) is set to 0. The direction component di and the residual component ri are set to 0 and −bi, respectively. At each iteration t, the local behavior of a node vi is as follows:

• Update Residual (Line 6). The residual ri is updated by ri(t) = (A X̂(t − 1))i − bi, which is realized by ri(t) = −bi + Σ_{j∈Ni∪{i}} Aij X̂j(t − 1). Ai,: and bi are known to vi at the beginning, and X̂j(t − 1) is the state of neighbor vj obtained by local communication.

• Check Residual (Line 8). CG theoretically completes after n iterations [3]. However, due to accumulated floating-point rounding errors, the residual and the direction gradually lose accuracy. Thus, DCG terminates by checking whether r(t)ᵀr(t) < ε, where the threshold ε is an empirical value based on the accuracy requirement. r(t) is obtained by r(t) = Synchronize_Vector(ri(t)). Once the synchronized r is known, the squared residual is calculated as r(t)ᵀr(t) = Σ_{i=1}^{n} ri(t)².

• Update Direction (Line 12). di needs another vector, r(t − 1), to be updated. It is obtained by Synchronize_Vector(ri(t − 1)). Then r(t − 1)ᵀr(t − 1) = Σ_{i=1}^{n} ri(t − 1)², and di can be calculated locally.

• Update Step Size (Line 14). Denote the numerator and denominator of αi by numer = d(t)ᵀr(t) and denom = d(t)ᵀA d(t), respectively. The direction vector d(t) is obtained by Synchronize_Vector(di(t)), so the numerator is numer = Σ_{i=1}^{n} di(t) ri(t). An intermediate vector T(t) = [T1(t) · · · Tn(t)]ᵀ is introduced to calculate denom: by communicating with neighbors, vi calculates Ti(t) = Σ_{j∈Ni∪{i}} Aij dj(t), and the complete T(t) is obtained by Synchronize_Vector(Ti(t)). So denom = Σ_{i=1}^{n} di(t) Ti(t). Then αi = −numer/denom.

• Update State (Line 16). The state X̂i(t) takes a step αi(t) di(t) from its last state.

Overall, all of the above DCG procedures are completely distributed and implemented through neighborhood message passing.

Algorithm 1: Distributed Conjugate Gradient (DCG) of vi
    Input: Ωi,:; bi
    Output: x̂i
2   X̂i(0) ← 0; di(0) ← 0; ri(0) ← −bi   // Initialize
    /* Update X̂i as in Section III-B */
4   for iterations t ∈ {1, · · · , tmax} do
6       ri(t) = (Ω X̂(t − 1))i − bi;   // residual
8       if r(t)ᵀr(t) < ε then
10          break   // meets accuracy requirement
12      di(t) = −ri(t) + ( r(t)ᵀr(t) / r(t − 1)ᵀr(t − 1) ) di(t − 1);   // direction
14      αi(t) = − d(t)ᵀr(t) / ( d(t)ᵀ Ω d(t) );   // step size
16      X̂i(t) = X̂i(t − 1) + αi(t) di(t);   // state
18  return X̂i(t).
19  Function Synchronize_Vector(zi):
21      initialize zi(0) ← [0_(i−1)×1; zi; 0_(n−i)×1];
23      for iterations k ∈ {0, · · · , H} do
25          exchange zi(k) with Ni;
27          for each vj ∈ Ni do
29              zi(k + 1) ← merge(zi(k), zj(k));
31      return zi.
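For readers who want to trace Algorithm 1 end to end, here is a compact Python simulation of the per-agent arithmetic (ours, not the authors' code). It collapses each Synchronize_Vector call into a direct read of the synchronized vector — each such read costs H communication rounds in the real protocol — and treats the per-agent states as scalars for brevity.

```python
import numpy as np

def dcg_simulate(Omega, b, t_max=None, eps=1e-12):
    """Simulate Algorithm 1: every agent runs the same recurrences,
    so one set of synchronized vectors r, d describes all agents."""
    n = len(b)
    t_max = t_max or n
    x = np.zeros(n)            # X_hat(0), one scalar state per agent
    d = np.zeros(n)            # direction components d_i
    r_prev = -b                # r_i(0) = -b_i
    for _ in range(t_max):
        r = Omega @ x - b                          # Line 6, via neighbor sums
        if r @ r < eps:                            # Line 8, synchronized r
            break
        d = -r + (r @ r) / (r_prev @ r_prev) * d   # Line 12
        alpha = -(d @ r) / (d @ (Omega @ d))       # Line 14; Omega @ d is T(t)
        x = x + alpha * d                          # Line 16
        r_prev = r
    return x

Omega = np.array([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
print(dcg_simulate(Omega, b), np.linalg.solve(Omega, b))
```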
C. Analysis of DCG

In each of the (at most n) iterations, four vectors are obtained by synchronization, and each synchronization takes H rounds, so DCG converges within 4Hn communication rounds when A is non-singular. In actual applications, measurement noise may affect A and b, so that the constructed linear system may be unsolvable [9]. Let Ā and b̄ denote the noisy matrices:

  Ā = A + ∆A,  b̄ = b + ∆b,   (15)

where ∆A and ∆b are error matrices representing the noise. If the noisy matrix Ā is still non-singular, DCG still converges to a neighborhood of X∗.

Lemma 1. For a noisy linear system ĀX = b̄, DCG converges if the error matrix satisfies

  ‖∆A‖ < λmin(A).   (16)

Proof. From (15), we obtain

  Ā = A (I + A⁻¹∆A).   (17)

Since ‖A⁻¹∆A‖ ≤ ‖A⁻¹‖ ‖∆A‖ = ‖∆A‖ / λmin(A), the matrix Ā must be non-singular if (16) holds. Thus DCG converges to the neighborhood of X∗ given by X̄∗ = Ā⁻¹b̄. ∎
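A quick numerical check of Lemma 1 (our illustration, not from the paper): perturb a symmetric positive definite matrix with noise whose spectral norm is below λmin(A), and confirm that the perturbed system stays non-singular and its solution stays near X∗.

```python
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[3.0, 1.0], [1.0, 2.0]])       # SPD coefficient matrix
b = np.array([1.0, 1.0])
lam_min = np.linalg.eigvalsh(A)[0]           # smallest eigenvalue of A

dA = rng.normal(size=(2, 2))
dA *= 0.5 * lam_min / np.linalg.norm(dA, 2)  # enforce ||dA|| < lambda_min(A)
A_bar = A + dA                               # noisy matrix, eq. (15)

print("lambda_min(A) =", lam_min)
print("||dA||_2      =", np.linalg.norm(dA, 2))
print("X*     =", np.linalg.solve(A, b))
print("X_bar* =", np.linalg.solve(A_bar, b)) # stays near X*, per Lemma 1
```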
IV. APPLICATIONS OF DCG

In this section, we investigate applying DCG to two practical scenarios.

A. The Least Square Problem

Linear equations arising in engineering practice are usually over-determined. A typical scenario is distributed parameter estimation, where the observation equations are

  Ai xi = bi + δi,   (18)

where xi is the parameter to be estimated and δi represents the measurement noise in bi. Due to the measurement noise, such linear equations usually have no solution that exactly meets all constraints. The problem can be formulated as follows. Suppose there is no solution to the linear system AX = b and AᵀA is non-singular. Each agent vi only knows the ith rows of A and b, denoted by Ai,: and bi,:, respectively. Design a distributed rule for each agent to update its state xi so that xi(t) converges to the unique solution of

  AᵀA X = Aᵀb.   (19)

Let Ω = AᵀA and β = Aᵀb. To solve the least square problem in (19), each agent must obtain the ith rows of Ω and β from Ai,: and bi,:.

Initially, vi maintains Ai,:. vi then transmits Ai,j to Ni and receives Aj,i from Ni, so it also knows the non-zero elements of the ith column A:,i, i.e., Aᵀi,:. Thus the non-zero elements of Ωi,: are calculated distributively as

  Ωij = Σ_{k=1}^{n} Aᵀik Akj,  ∀vj ∈ Ni.   (20)

Similarly, βi,: can be calculated as

  βij = Σ_{k=1}^{n} Aᵀik bkj,  j ∈ {1, · · · , d}.   (21)

Finally, the problem is solved by xi = DCG(Ωi,:, βi,:), and x solves the least squares problem in (19).
problem in (19). it also knows the nonzero elements of the ith column M:,i ,
5

i.e., MTi,: . Thus the nonzero elements of Ωi,: are calculated


distributively as:
Xn
T
Ωij = Mik Mkj , ∀vj ∈ Ni . (27)
k=1

To calculate β (i,:) , an intermediate vector µ ∈ Rn×d implying


BPA is introduced. vi locally calculates the ith row of µ:
X
µi,: = Aia pa , (28)
va ∈Ni ∩A

where Ni ∩A represents neighboring anchors. µi,: = [0, 0] if no


anchor is found in Ni . Then, vi sends µi,: to Ni and receives (a) The Richardson Iteration (b) DCG
µj,: of each vj ∈ Ni∗ (Line 4-6). Thus, β i,: is calculated as:
Fig. 3. The convergence trails in R3 .
X
T
β i,: = Mi,j µj,: , (29) 60
vj ∈Ni ∩F
DCG
where Ni ∩ F means the non-anchor barycentric neighbors. Richardson

For convenience, each location p̂i ∈ Rd×1 is decomposed to 40

[p̂1i , · · · , p̂di ]. β (i,:) is decomposed to [βi1 , · · · , βid ]. The ele-


ment p̂ji is calculated by DCG(Ai,: , βij ) (Line 10). Therefore, 20

from the GBLL model in (26), p̂i is calculated leveraging


DCG-Loc, where communications only involve message pass- 0 ... 2
100 200 300 400
ing with neighbors. Iteration (t)

Fig. 4. The mean square error w.r.t. iteration rounds.


Algorithm 2: Distributed Conjugate Gradient Localization (DCG-Loc) of vi
    Input: neighbors: Ni; barycentric coordinates: Ai,:
    Output: location: p̂i
2   Mij ← −Aij; calculate µi,: as in (28);
4   transmit Mij and µi,: to vj ∈ Ni;
6   receive Mji and µj,: from vj ∈ Ni;
8   calculate Ωi,: as in (27); calculate βi,: as in (29);
10  return p̂i ← [DCG(Ωi,:, βi¹), · · · , DCG(Ωi,:, βi^d)].
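As a sanity check of the pipeline (22)-(26), the Python sketch below builds a toy barycentric instance of ours — three anchors and two free nodes with hand-picked coordinates — and recovers the free locations per coordinate, mirroring Line 10 of Algorithm 2. The `cg` helper is a centralized stand-in for the per-agent DCG calls.

```python
import numpy as np

# Toy instance: anchors p1..p3 (known), free nodes p4, p5 (unknown).
PA = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # anchor locations
# Barycentric rows (eq. 22): p4 = 0.6 p1 + 0.2 p2 + 0.2 p3,
#                            p5 = (p2 + p3 + p4) / 3.
B = np.array([[0.6, 0.2, 0.2], [0.0, 1/3, 1/3]])      # coeffs on anchors
C = np.array([[0.0, 0.0], [1/3, 0.0]])                # coeffs on free nodes

M = np.eye(2) - C                                     # eq. (25)
Omega = M.T @ M                                       # eq. (26), SPD
beta = M.T @ (B @ PA)                                 # beta = M^T B P_A

def cg(Omega, beta, eps=1e-12):
    # Centralized stand-in for the per-agent DCG calls in Algorithm 2.
    x, d, r = np.zeros(len(beta)), np.zeros(len(beta)), -beta
    for _ in range(len(beta)):
        r_new = Omega @ x - beta
        if r_new @ r_new < eps:
            break
        d = -r_new + (r_new @ r_new) / (r @ r) * d
        x = x + (-(d @ r_new) / (d @ (Omega @ d))) * d
        r = r_new
    return x

PF = np.column_stack([cg(Omega, beta[:, k]) for k in range(2)])
print(PF)   # expect p4 = (0.2, 0.2), p5 = (0.4, 0.4)
```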
V. EVALUATION

In this section, we compare the convergence speed of our proposed DCG algorithm with that of the representative Richardson iteration by counting iteration rounds. Simulations are conducted using MATLAB R2020b in both R² and R³. A network denoted by G = {V, E} is deployed. The number of anchors is set to d + 1, which is the minimum number of anchors required to uniquely localize a network in R^d.

Fig. 1 and Fig. 2 show the results in R². DCG and the Richardson iteration are adopted to solve the linear localization problem modeled by the network in Fig. 1. Both DCG and Richardson successfully converge to the ground truth within finite rounds of iterations. However, Fig. 2 shows that the Richardson iteration consumes 5 × 10⁵ rounds while DCG only needs 550 rounds. From Fig. 3 and Fig. 4, similar results are obtained in R³. Overall, DCG-Loc is about 1,000 times faster than the Richardson iteration.

[Fig. 3. The convergence trails in R³: (a) the Richardson iteration; (b) DCG.]

[Fig. 4. The mean square error w.r.t. iteration rounds (t) in R³, comparing DCG and the Richardson iteration.]

VI. CONCLUSION

In this paper, we proposed DCG to enable a network of n agents to solve linear equations of the form AX = b in a fixed number of rounds. The DCG algorithm is presented with property analysis and two applications. Compared with the traditional Richardson iteration, DCG converges roughly three orders of magnitude faster. In future work, we will consider reducing the communication burden that DCG requires.

REFERENCES

[1] Shaoshuai Mou, A. Stephen Morse, Zhiyun Lin, Lili Wang, and Daniel Fullmer. A distributed algorithm for efficiently solving linear equations. In 2015 54th IEEE Conference on Decision and Control (CDC), pages 6791–6796. IEEE, 2015.
[2] Y. Diao, Z. Lin, and M. Fu. A barycentric coordinate based distributed localization algorithm for sensor networks. IEEE Transactions on Signal Processing, 62(18):4760–4771, 2014.
[3] Wolfgang Hackbusch. Iterative Solution of Large Sparse Systems of Equations, chapter 10.2, page 234. Applied Mathematical Sciences. Springer International Publishing, 2nd edition, 2016.
[4] Charu C. Aggarwal. Linear Algebra and Optimization for Machine Learning. Springer, 2020.
[5] Richard Barrett, Michael Berry, Tony F. Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk Van der Vorst. Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. SIAM, 1994.
[6] Yousef Saad. Iterative Methods for Sparse Linear Systems. SIAM, 2003.
[7] Leila Ismail and Rajeev Barua. Implementation and performance evaluation of a distributed conjugate gradient method in a cloud computing environment. Software: Practice and Experience, 43(3):281–304, 2013.
[8] Songcen Xu, Rodrigo C. De Lamare, and H. Vincent Poor. Distributed estimation over sensor networks based on distributed conjugate gradient strategies. IET Signal Processing, 10(3):291–301, 2016.
[9] Xu Fang, Xiaolei Li, and Lihua Xie. 3-D distributed localization with mixed local relative measurements. IEEE Transactions on Signal Processing, 68:5869–5881, 2020.
[10] Thien-Minh Nguyen, Zhirong Qiu, Thien Hoang Nguyen, Muqing Cao, and Lihua Xie. Persistently excited adaptive relative localization and time-varying formation of robot swarms. IEEE Transactions on Robotics, 36(2):553–560, 2020.
[11] T. Sun, Y. Wang, D. Li, Z. Gu, and J. Xu. WCS: Weighted component stitching for sparse network localization. IEEE/ACM Transactions on Networking, 26(5):2242–2253, 2018.
[12] Y. Wang, T. Sun, G. Rao, and D. Li. Formation tracking in sparse airborne networks. IEEE Journal on Selected Areas in Communications, 36(9):2000–2014, 2018.
[13] H. Ping, Y. Wang, D. Li, and T. Sun. Flipping free conditions and their application in sparse network localization. IEEE Transactions on Mobile Computing, pages 1–1, 2020.
[14] Haodi Ping, Yongcai Wang, and Deying Li. HGO: Hierarchical graph optimization for accurate, efficient, and robust network localization. In Proceedings of the 29th International Conference on Computer Communications and Networks (ICCCN), pages 1–9, 2020.
[15] T. Han, Z. Lin, R. Zheng, Z. Han, and H. Zhang. A barycentric coordinate based approach to three-dimensional distributed localization for wireless sensor networks. In 2017 13th IEEE International Conference on Control and Automation (ICCA), pages 600–605, 2017.
