DCG: Distributed Conjugate Gradient For Efficient Linear Equations Solving
radius of the iteration matrix. However, the iteration matrix is determined by the network structure and can hardly be pre-tuned, so iteration-based distributed algorithms may converge very slowly when the spectral radius is close to 1. In contrast, in centralized optimization, the Conjugate Gradient (CG) method is widely adopted to speed up convergence and guarantees convergence in a fixed number of steps. In this paper, we propose a general distributed implementation of CG, called DCG. DCG needs only local communication and local computation, while inheriting CG's fast convergence. DCG is guaranteed to converge in 4Hn rounds, where H is the maximum hop number of the network and n is the number of nodes. We present applications of DCG to the least square problem and the network localization problem. The results show the convergence speed of DCG is three orders of magnitude faster than the widely used Richardson iteration method.

Fig. 1. The convergence trails in R^2: (a) the Richardson iteration, (b) DCG. The marker '+' represents the start of a trail. The markers '∗' and '◦' represent the converged estimation x̂_i and the ground truth x_i, respectively.

Fig. 2. The mean square error w.r.t. iteration rounds.

I. INTRODUCTION

In many multi-agent applications, the underlying problem can be reduced to solving a system of linear equations [1]. Because the autonomous networked agents are usually discretely deployed, each agent can only communicate with its direct neighbors. Moreover, in certain scenarios, each agent desires only its own state. Such characteristics consequently give rise to distributed solvers for linear systems. Different from centralized solvers, distributed solvers usually adopt iterative manners. The key feature of iterative approaches is linear iterations, where each agent receives the states of its direct neighbors, then updates and sends its own state. The barycentric linear localization algorithm [2] is a typical iteration-based method.

The convergence speed is a crucial factor of iteration-based distributed algorithms: it determines whether a distributed algorithm can be used when the application requires a fast response. However, the convergence rates of many iterative methods, e.g., the Jacobi iteration, the Gauss-Seidel iteration, and the Richardson iteration [3], are characterized by the spectral radius of the iteration matrix, which is determined by the network topology and is difficult to pre-tune. Thus the convergence speed is highly uncertain and may be very slow in some network states.

To guarantee fast convergence of distributed algorithms in solving linear equations, this study explores the idea of the Conjugate Gradient (CG) method [4]. CG has two desired properties for solving linear systems: 1) CG converges to the exact solution after a finite number of iterations, which is not larger than the size of the system matrix; 2) CG is suited for solving linear systems with large and sparse system matrices. But CG is essentially a centralized solver. In pursuit of a distributed implementation, we design an efficient protocol to synchronize the necessary vectors to update the residual. Our distributed CG, i.e., DCG, remarkably speeds up convergence while paying limited neighborhood communication costs.

The rest of the paper is organized as follows. We formulate the problem and present related algorithms in Section II. DCG is proposed in Section III. Applications of DCG are discussed in Section IV. DCG is evaluated in Section V. The paper is concluded with further discussions in Section VI.

The authors are with the School of Information, Renmin University of China, Beijing, P.R. China, 100872. E-mail: {haodi.ping, ycw, deyingli}@ruc.edu.cn
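As a quick illustration of the finite-termination property that motivates DCG, the sketch below runs a minimal centralized CG (an illustrative implementation, not the paper's algorithm) on an assumed random symmetric positive definite system: in exact arithmetic it reaches the solution in at most n iterations for an n × n system.

```python
import numpy as np

def conjugate_gradient(A, b, tol=1e-10):
    """Minimal centralized CG sketch for SPD A; at most n steps."""
    n = len(b)
    x = np.zeros(n)
    r = b - A @ x          # residual
    p = r.copy()           # search direction
    k = 0
    for k in range(n):     # finite termination: no more than n iterations
        rr = r @ r
        if np.sqrt(rr) < tol:
            break
        Ap = A @ p
        alpha = rr / (p @ Ap)   # step length along p
        x += alpha * p
        r -= alpha * Ap
        beta = (r @ r) / rr     # conjugacy correction
        p = r + beta * p
    return x, k

# Assumed example system, SPD by construction
rng = np.random.default_rng(0)
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)
b = rng.standard_normal(6)
x, steps = conjugate_gradient(A, b)
```

The point of the sketch is the loop bound: unlike the spectral-radius-governed methods of Section II, the iteration count here is capped by the system size.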
The neighbors of v_i are denoted by N_i; j ∈ N_i if (i, j) ∈ E.

Problem: Assume that A is non-singular. Let X∗ denote the unique solution satisfying AX = b. Suppose each agent v_i holds a state vector x̂_i ∈ R^d. Initially, v_i knows A_{i,:} and b_{i,:}. The problem is to devise a local rule for each agent to update its state x̂_i, leveraging the local communication with agents in N_i, so that x̂_i(t) converges to x∗_i within finite t.

B. Related Work

The basic idea of the iterative methods is as follows. Given an initialization x̂(0), generate an iteration sequence {x̂(t)}_{t=0}^{∞} in a certain manner, so that:

lim_{t→∞} x̂(t) = x∗ ≜ A^{−1}b.  (1)

Generally, the state update of an iterative method can be represented as:

x̂(t+1) = φ_k(x̂(t), x̂(t−1), ..., x̂(0), A, b),  (2)

where x̂(0) = φ_0(A, b) or x̂(0) is selected manually. φ_k is called the iteration function. The specific designs of iteration functions are based on matrix splitting.

Definition 1 (Matrix Splitting). Suppose a non-singular matrix A ∈ R^{n×n}; a split of matrix A is defined as A = M − N, where M is also non-singular.

Consider a general linear system:

AX = b,  (3)

where A is non-singular. Through the matrix splitting A = M − N, (3) can be transformed into:

MX = NX + b.  (4)

Then, we can construct an iteration function as:

MX̂(t+1) = NX̂(t) + b,  (5)

which is equivalent to:

X̂(t+1) = M^{−1}N X̂(t) + M^{−1}b ≜ G X̂(t) + g.  (6)

G = M^{−1}N is called the iteration matrix. It is straightforward that different iterative methods can be constructed by varying M.

In Jacobi iteration, A is split as:

A = D − L − U,  (7)

where D is the diagonal part of A, and −L and −U are its strictly lower and strictly upper triangular parts. Taking M = D and N = L + U yields the iteration function:

X̂(t+1) = D^{−1}(L + U)X̂(t) + D^{−1}b.  (8)

The local update of agent v_i's state is:

x̂_i(t+1) = (1/A_{ii}) (b_i − Σ_{j≠i} A_{ij} x̂_j(t)).  (9)

In Gauss-Seidel iteration, the iteration function is designed as:

X̂(t+1) = (D − L)^{−1}U X̂(t) + (D − L)^{−1}b.  (10)

The local update of agent v_i's state is:

x̂_i(t+1) = (1/A_{ii}) (b_i − Σ_{j=1}^{i−1} A_{ij} x̂_j(t+1) − Σ_{j=i+1}^{n} A_{ij} x̂_j(t)).  (11)

Another simple iteration method is the Richardson iteration, applicable when A is symmetric positive definite. The iteration function is:

X̂(t+1) = X̂(t) + ω(b − AX̂(t)).  (12)

The state of agent v_i is updated as:

x̂_i(t+1) = x̂_i(t) + ω (b_i − Σ_{j=1}^{n} A_{ij} x̂_j(t)),  (13)

where ω is a non-negative scalar and is suggested to be 2/(λ_max + λ_min). Similar methods include the Successive Over-Relaxation (SOR) iteration, the Symmetric SOR (SSOR) iteration, the Accelerated OR (AOR) iteration, the Symmetric AOR (SAOR) iteration, etc. [5].

C. Convergence and Convergence Rate

Since the aforementioned methods are iterative, a crucial issue is to guarantee iteration convergence. For a general iteration function as in (6), convergence is guaranteed by the following theorem.

Theorem 1. The iterates formulated by X̂(t+1) = GX̂(t) + g converge for any X̂(0) if and only if ρ(G) < 1 [6].

ρ(G) is the spectral radius of the iteration matrix G. See Theorem 4.1 of [6] for the proof.

Apart from knowing when the iteration converges, it is also desirable to explore how fast it converges. Saad [6] showed that the convergence rate τ is the natural logarithm of the inverse of the spectral radius, i.e., τ = ln(1/ρ(G)).
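As a concrete check of the Richardson update (12)-(13) and the condition of Theorem 1, the sketch below (an illustrative example on an assumed random SPD system, not from the paper) builds the iteration matrix G = I − ωA with the suggested ω = 2/(λ_max + λ_min), confirms ρ(G) < 1, and iterates to convergence.

```python
import numpy as np

# Assumed example system: symmetric positive definite A, as (12) requires.
rng = np.random.default_rng(1)
R = rng.standard_normal((5, 5))
A = R @ R.T + 5 * np.eye(5)
b = rng.standard_normal(5)

# Suggested step size omega = 2 / (lambda_max + lambda_min)
lam = np.linalg.eigvalsh(A)          # eigenvalues in ascending order
omega = 2.0 / (lam[-1] + lam[0])

# Iteration matrix of (12); Theorem 1: convergence iff rho(G) < 1
G = np.eye(5) - omega * A
rho = max(abs(np.linalg.eigvals(G)))

# Matrix form of the agent-wise update (13):
# x_i(t+1) = x_i(t) + omega * (b_i - sum_j A_ij x_j(t))
x = np.zeros(5)
for t in range(2000):
    x = x + omega * (b - A @ x)
```

With this choice of ω, ρ(G) = (λ_max − λ_min)/(λ_max + λ_min), which is always below 1 for an SPD matrix but approaches 1 as the conditioning worsens, which is exactly the slow-convergence regime the paper targets.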
where ∆A and ∆b are error matrices implying the noise. If the noisy matrix is still non-singular, DCG still converges to the neighborhood of X∗.

Lemma 1. For a noisy linear system (A + ∆A)X = b + ∆b, DCG converges if the error matrix satisfies:

||∆A|| < λ_min(A).  (16)

Proof. From (15), we can obtain:

A + ∆A = A(I + A^{−1}∆A).  (17)

Since ||A^{−1}∆A|| ≤ ||A^{−1}|| ||∆A|| = ||∆A||/λ_min(A), the matrix A + ∆A must be non-singular if (16) holds. Thus DCG can converge to the neighborhood of X∗, given by X̃∗ = (A + ∆A)^{−1}(b + ∆b).

IV. APPLICATIONS OF DCG

In this section, we investigate applying DCG to two actual scenarios.

A. The Least Square Problem

Linear equations arising in engineering are usually over-determined. A typical scenario is distributed parameter estimation, where the observation equations are:

A_i x_i = b_i + δ_i,  (18)

where x_i is the desired parameter to be calculated and δ_i is the component implying the measurement noise of b_i. Due to the measurement noise, such linear equations usually do not have a solution that exactly meets the constraints. This problem can be formulated as follows. Suppose there is no solution to the linear system AX = b and A^T A is non-singular. Each agent v_i only knows the ith row of A and b, denoted by A_{i,:} and b_{i,:}, respectively. Design a distributed rule for each agent to update its state x_i so that x_i(t) converges to the unique solution of:

A^T A X = A^T b.  (19)

Let Ω = A^T A and β = A^T b. To solve the least square problem in (19), each agent should be aware of the ith rows of Ω and β, computed from A_{i,:} and b_{i,:}.

Initially, v_i maintains A_{i,:}. Then, v_i transmits A_{i,j} to N_i and receives A_{j,i} from N_i, so it also knows the nonzero elements of the ith column A_{:,i}, i.e., A^T_{i,:}. Thus the nonzero elements of Ω_{i,:} are calculated distributively as:

Ω_{ij} = Σ_{k=1}^{n} A^T_{ik} A_{kj}, ∀v_j ∈ N_i.  (20)

Similarly, β_{i,:} can be calculated as:

β_{ij} = Σ_{k=1}^{n} A^T_{ik} b_{kj}, j ∈ {1, ..., d}.  (21)

Finally, the problem can be solved as x_i = DCG(Ω_{i,:}, β_{i,:}). Therefore x solves the least squares problem in (19).

B. The Network Localization Problem

The geographical locations of nodes are fundamental information for many multi-agent applications [10], [11], [12]. Network localization techniques are usually adopted for calculating node locations in infrastructure-less scenarios [13], [14], and are formulated as follows. For a network of m + n agents in R^d, let V = A ∪ F denote the entire node set. Nodes in A = {v_1, ..., v_m} are called anchor agents, whose locations P_A = {p_1, ..., p_m} are known. Nodes in F = {v_{m+1}, ..., v_{m+n}} are called free agents, whose locations P_F = {p_{m+1}, ..., p_{m+n}} are unknown. Each agent v_i can only sense the relative distance d_{ij} between v_i and any neighbor v_j ∈ N_i. Each agent can exchange its estimated location p̂_i and distance measurements with neighbors. The network localization problem is to design a distributed protocol for each agent to update its location estimate p̂_i so that it converges to p_i.

To solve the network localization problem, we transform it into a linear system. First, each location p_i is represented as a linear combination of the locations of its neighbors:

p_i = Σ_{v_j ∈ N_i} a_{ij} p_j,  (22)

where the a_{ij} are called barycentric coordinates. The calculation of barycentric coordinates involves only local distance measurements; the specific process is introduced by Diao et al. [2] in R^2 and by Han et al. [15] in R^3. After calculating the barycentric coordinates for each node, the agent locations form a linear system:

[P_A; P_F] = [I 0; B C] [P_A; P_F].  (23)

A = [I 0; B C] ∈ R^{(m+n)×(m+n)} is constructed with barycentric coordinates, i.e., the ith row of A is the barycentric coordinate of v_i w.r.t. its neighbors N_i. Then, the localization problem can be transformed into solving the following linear system:

(I − C)P_F = B P_A.  (24)

Writing I − C as M, (24) can be reformulated as:

M P_F = B P_A.  (25)

Considering that CG is applicable when the system matrix is positive definite [3], we multiply both sides of (25) by M^T:

M^T M P_F = M^T B P_A.  (26)

Then, M^T M is positive definite. To apply DCG to distributed localization, the localization model in (26) is reformulated as Ω P_F = β, where Ω = M^T M ∈ R^{n×n} and β = M^T B P_A ∈ R^{n×d}.

Algorithm 2 shows the routine of DCG-Loc. For initialization, v_i needs to know the ith rows of Ω and β to invoke the general DCG. After constructing the local linear model by (22), v_i maintains A_{i,:}, so it knows the nonzero elements of the ith row M_{i,:} by M_{i,j} = −A_{i,j} if v_j ∈ N_i. Then, v_i transmits M_{ij} to N_i and receives M_{ji} from N_i (Lines 4-6), so it also knows the nonzero elements of the ith column M_{:,i},
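To make the localization pipeline (22)-(26) concrete, the following sketch runs it on a small hand-built example in R^2. The node coordinates and barycentric weights below are illustrative assumptions (not from the paper), and a centralized solve stands in for the DCG call.

```python
import numpy as np

# Hypothetical network: 3 anchors (known) and 2 free agents (unknown).
PA = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])   # anchor locations

# Barycentric coordinates (22): each free agent is an affine
# combination of its neighbors (each row's weights sum to 1).
B = np.array([[0.50, 0.25, 0.25],     # v4 w.r.t. anchors v1, v2, v3
              [-0.25, 0.25, 0.00]])   # v5 w.r.t. anchors v1, v2 ...
C = np.array([[0.0, 0.0],             # ... v4 uses no free neighbors
              [1.0, 0.0]])            # ... while v5 also uses v4

M = np.eye(2) - C            # M = I - C, as in (25)
Omega = M.T @ M              # Omega = M^T M in (26), positive definite
beta = M.T @ B @ PA          # beta  = M^T B PA in (26)

# Centralized stand-in for invoking DCG on Omega PF = beta.
PF = np.linalg.solve(Omega, beta)
```

Each row of the recovered P_F is one free agent's location; in DCG-Loc, agent v_i would hold only its own row Ω_{i,:} and β_{i,:}, assembled through the neighbor exchanges described above.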