Problem formulation
$$\min_{x \in \mathbb{R}^d} F(x) = \sum_{i=1}^{n} F_i(x), \qquad F_i \ \text{convex}$$
Figure: centralized vs. distributed architectures.
Problem formulation
Optimization problem:
$$\min_{x \in \mathbb{R}^d} F(x) = \sum_{i=1}^{n} F_i(x), \qquad F_i \ \text{convex}$$
$$\text{subject to} \quad \sum_{i=1}^{n} g_i(x) \le 0 \ \text{ or } \ x \in X$$
- Graph: $G = \{V, E\}$
- Goal: cooperatively estimate the global minimizer $x^*$, i.e., $x_1 = x_2 = \dots = x_n = x^*$
Motivation
Applications: smart grids, distributed networks, formation control.
Preliminaries
(Doubly stochastic) A square matrix $A$ is doubly stochastic if all of its entries are non-negative and the entries of each row and each column sum to 1.

(Row stochastic) A square matrix $A$ is row-stochastic if all of its entries are non-negative and the entries of each row sum to 1.

(L-Lipschitz continuity) Let $X \subseteq \mathbb{R}^n$ be a convex set. A function $h : X \to \mathbb{R}$ is said to be $L$-Lipschitz continuous on $X$ with modulus $L > 0$ if $\|h(y) - h(x)\| \le L\|x - y\|$ for all $x, y \in X$.

(L-smooth) Let $X \subseteq \mathbb{R}^n$ be a convex set. A differentiable function $h : X \to \mathbb{R}$ is said to be $L$-smooth on $X$ if its gradient $\nabla h$ is $L$-Lipschitz continuous on $X$.

(m-strongly convex) Let $X \subseteq \mathbb{R}^n$ be a convex set. A function $h : X \to \mathbb{R}$ is said to be $m$-strongly convex on $X$ if there exists $m > 0$ such that
$$h(y) \ge h(x) + \nabla h(x)^T (y - x) + \frac{m}{2}\|x - y\|^2 \quad \text{for all } x, y \in X.$$
This dissertation
Presentation:
- Chapter 3
- Chapter 4
- Chapter 2
Optimization problem with global constraint
$$\min_{x}\ F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad \text{subject to}\ x \in X,$$
Assumptions:
- $f_i$ is $m$-strongly convex and $L$-smooth
- $X$ is a convex and closed set
- All agents know $X$
- Communication graph: directed graph
Existing methods (I/III)
Unconstrained ($X = \mathbb{R}^p$) + undirected graph
$$x_i^{k+1} = \sum_{j=1}^{n} w_{ij} x_j^k - \alpha_k \nabla f_i(x_i^k),$$
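As an illustration, a minimal NumPy sketch of this update with assumed quadratic costs $f_i(x) = \frac12\|x - c_i\|^2$ and an assumed ring graph with doubly stochastic weights (all data are illustrative, not from the dissertation):

```python
import numpy as np

n, d, iters = 10, 3, 500
rng = np.random.default_rng(0)
c = rng.normal(size=(n, d))            # agent i's target; f_i(x) = 0.5*||x - c_i||^2

# Doubly stochastic weights for a ring graph (illustrative choice)
W = np.zeros((n, n))
for i in range(n):
    W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1 / 3
    W[i, i] = 1 / 3

x = np.zeros((n, d))                   # row i is agent i's estimate
for k in range(iters):
    alpha_k = 1.0 / (k + 1)            # diminishing step size
    grad = x - c                       # gradient of each f_i at x_i
    x = W @ x - alpha_k * grad         # consensus step + local gradient step

print("consensus error:", np.linalg.norm(x - x.mean(axis=0)))
print("distance to optimum:", np.linalg.norm(x.mean(axis=0) - c.mean(axis=0)))
```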
Analysis via LTI
$$A^\infty x^{k+1} = A^\infty x^k - \alpha A^\infty z^k$$
Hence
$$\|x^{k+1} - A^\infty x^{k+1}\|_A = \|Ax^k - A^\infty x^k - \alpha(z^k - A^\infty z^k)\|_A \le \pi\|x^k - A^\infty x^k\|_A + \alpha\|z^k - A^\infty z^k\|_A,$$
where $\pi \in (0, 1)$. Long story short,
$$\begin{bmatrix} \|x^{k+1} - A^\infty x^{k+1}\|_A \\ \|A^\infty x^{k+1} - 1_n \otimes x^*\|_2 \\ \|z^{k+1} - A^\infty z^{k+1}\|_A \end{bmatrix} \le \Omega \begin{bmatrix} \|x^{k} - A^\infty x^{k}\|_A \\ \|A^\infty x^{k} - 1_n \otimes x^*\|_2 \\ \|z^{k} - A^\infty z^{k}\|_A \end{bmatrix}$$
Convergence is obtained by ensuring $\rho(\Omega) < 1$, which can be derived using $\pi \in (0, 1)$.
Does it work with projection?
$$x^{k+1} = P_{X^n}\big[Ax^k - \alpha z^k\big]$$
Then,
$$A^\infty x^{k+1} = A^\infty P_{X^n}\big[Ax^k - \alpha z^k\big]$$
As a result,
$$\|x^{k+1} - A^\infty x^{k+1}\|_A = \big\|P_{X^n}\big[Ax^k - \alpha z^k\big] - A^\infty P_{X^n}\big[Ax^k - \alpha z^k\big]\big\|_A$$
The projection is nonlinear and does not commute with $A^\infty$, so the linear (LTI) error recursion above no longer applies.
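A tiny numerical check of this obstruction: averaging and box projection do not commute (numbers are illustrative):

```python
import numpy as np

# Uniform averaging matrix (doubly stochastic) and box constraint X = [0, 1]^2
A = np.full((2, 2), 0.5)
proj = lambda v: np.clip(v, 0.0, 1.0)    # Euclidean projection onto [0, 1]^2

v = np.array([2.0, 0.5])
print(proj(A @ v))    # [1.0, 1.0]  : average first, then project
print(A @ proj(v))    # [0.75, 0.75]: project first, then average
```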
Proposed method for constrained problem (I/II)
GTCP-RS:
$$x_i^{k+1} = \beta P_X\big[x_i^k - \alpha z_i^k\big] + (1-\beta)\sum_{j=1}^{n} a_{ij} x_j^k$$
$$y_i^{k+1} = \sum_{j=1}^{n} a_{ij} y_j^k$$
$$z_i^{k+1} = \sum_{j=1}^{n} a_{ij} z_j^k + \frac{\nabla f_i(x_i^{k+1})}{[y_i^{k+1}]_i} - \frac{\nabla f_i(x_i^k)}{[y_i^k]_i}$$
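Below is a minimal NumPy sketch of these updates, with assumed quadratic costs $f_i(x) = \frac12\|x - c_i\|^2$, a box constraint, a directed ring with row-stochastic weights, and ad hoc step sizes (all illustrative choices, not the dissertation's setup); each $y_i$ is initialized to the $i$-th unit vector so that $[y_i^k]_i$ is available as a diagonal entry:

```python
import numpy as np

n, d, iters = 8, 2, 3000
alpha, beta = 0.05, 0.5
rng = np.random.default_rng(1)
c = rng.normal(size=(n, d))               # f_i(x) = 0.5*||x - c_i||^2, grad = x - c_i
proj = lambda v: np.clip(v, -0.5, 0.5)    # projection onto the box X = [-0.5, 0.5]^d

# Row-stochastic A for a directed ring with self-loops (illustrative)
A = np.zeros((n, n))
for i in range(n):
    A[i, i] = A[i, (i + 1) % n] = 0.5

x = np.zeros((n, d))
Y = np.eye(n)                             # row i stores agent i's vector y_i, y_i^0 = e_i
z = x - c                                 # z_i^0 = grad f_i(x_i^0) / [y_i^0]_i, with [y_i^0]_i = 1
g_old = (x - c) / np.diag(Y)[:, None]     # rescaled gradients at k = 0
for k in range(iters):
    x_new = beta * proj(x - alpha * z) + (1 - beta) * A @ x
    Y = A @ Y                             # y-iteration; diag(Y) carries [y_i^{k+1}]_i
    g_new = (x_new - c) / np.diag(Y)[:, None]
    z = A @ z + g_new - g_old             # gradient tracking with rescaled gradients
    x, g_old = x_new, g_new

print("max deviation between agents:", np.abs(x - x.mean(axis=0)).max())
```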
Convergence Analysis
Theorem
Suppose $w^k = \big[\, \|A^\infty x^k - 1_n \otimes x^*\|,\ \|x^k - A^\infty x^k\|_A,\ \|A^\infty z^k - z^k\|_A \,\big]^T$ and that $\alpha < \frac{2}{nL}$. It follows that $w^{k+1} \le G\, w^k$.
Moreover, assume that the step sizes satisfy $0 < \alpha < \frac{1}{nL}$ and $0 < \beta < \min\{1, \bar\beta_1, \bar\beta_2\}$; then we have $\rho(G) < 1$.
Idea of the proof
$$x^{k+1} = \beta P_{\tilde X}\big[x^k - \alpha z^k\big] + (1-\beta) A x^k,$$
$$A^\infty x^{k+1} = \beta A^\infty P_{\tilde X}\big[x^k - \alpha z^k\big] + (1-\beta) A^\infty x^k,$$
Lemma
Let $X \in \mathbb{R}^{n \times n}$ be a non-negative matrix and $x \in \mathbb{R}^n$ be a positive vector. If $Xx < \omega x$ with $\omega > 0$, then $\rho(X) < \omega$.
- $\beta$ gives an additional degree of freedom to make the diagonal entries of $G$ less than 1 → use the above lemma to obtain $\rho(G) < 1$ (a small numerical check of the lemma follows)
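A quick numerical check of the lemma with an arbitrary non-negative matrix and positive vector:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.uniform(0, 1, size=(4, 4))        # non-negative matrix
v = rng.uniform(1, 2, size=4)             # positive vector

omega = np.max(X @ v / v) + 1e-9          # smallest omega with X v < omega v
assert np.all(X @ v < omega * v)
print("rho(X) =", max(abs(np.linalg.eigvals(X))), "< omega =", omega)
```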
Numerical Simulation
Consider 30 agents aiming to solve a logistic regression problem:
$$x^* = \operatorname*{argmin}_{x \in X \subset \mathbb{R}^p}\ \sum_{i=1}^{n} \sum_{j=1}^{m_i} \ln\Big[1 + \exp\big(-(c_{ij}^T x) h_{ij}\big)\Big] + \frac{\mu}{2}\|x\|^2,$$
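For concreteness, a sketch of one agent's local cost and gradient in this model, with synthetic data (dimensions and $\mu$ are illustrative; the ridge term appears once globally in the formulation above and is included locally here for simplicity):

```python
import numpy as np

rng = np.random.default_rng(3)
p, m_i, mu = 5, 20, 0.1
C = rng.normal(size=(m_i, p))              # rows are the feature vectors c_ij
h = rng.choice([-1.0, 1.0], size=m_i)      # labels h_ij

def f_i(x):
    # local logistic loss + ridge term
    return np.sum(np.log1p(np.exp(-(C @ x) * h))) + 0.5 * mu * x @ x

def grad_f_i(x):
    s = 1.0 / (1.0 + np.exp((C @ x) * h))  # sigmoid of the negative margin
    return -(C.T @ (s * h)) + mu * x
```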
Comparison with literature
Any weaknesses?
- Limited to strongly convex and smooth cost functions
- The variable $y_i^k$ is $n$-dimensional → not suitable for large networks
Existing method: column stochastic matrix
Unconstrained + directed [10]
$$\hat x_i^{k+1} = \sum_{j=1}^{n} b_{ij} \hat x_j^k - \alpha z_i^k$$
$$y_i^{k+1} = \sum_{j=1}^{n} b_{ij} y_j^k$$
$$x_i^{k+1} = \frac{\hat x_i^{k+1}}{y_i^{k+1}}$$
$$z_i^{k+1} = \sum_{j=1}^{n} b_{ij} z_j^k + \nabla f_i(x_i^{k+1}) - \nabla f_i(x_i^k)$$
GTCP-CS:
$$y_i^{k+1} = \sum_{j=1}^{n} b_{ij} y_j^k,$$
$$x_i^{k+1} = \beta P_X\left[\frac{\sum_{j=1}^{n} b_{ij}(y_j^k x_j^k) - \alpha z_i^k}{y_i^{k+1}}\right] + (1-\beta)\,\frac{\sum_{j=1}^{n} b_{ij}(y_j^k x_j^k)}{y_i^{k+1}},$$
$$z_i^{k+1} = \sum_{j=1}^{n} b_{ij} z_j^k + \nabla f_i(x_i^{k+1}) - \nabla f_i(x_i^k),$$
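As with the GTCP-RS sketch earlier, a minimal NumPy sketch under assumed quadratic costs and a directed ring, now with column-stochastic weights; the push-sum weight $y_i$ is a scalar here, which is precisely the advantage over the $n$-dimensional $y_i$ of GTCP-RS (all data and step sizes are illustrative):

```python
import numpy as np

n, d, iters = 8, 2, 3000
alpha, beta = 0.05, 0.5
rng = np.random.default_rng(4)
c = rng.normal(size=(n, d))               # f_i(x) = 0.5*||x - c_i||^2
proj = lambda v: np.clip(v, -0.5, 0.5)    # box constraint X

# Column-stochastic B for a directed ring: each column j sums to 1
B = np.zeros((n, n))
for j in range(n):
    B[j, j] = B[(j + 1) % n, j] = 0.5

x = np.zeros((n, d))
y = np.ones(n)                            # scalar push-sum weights, y_i^0 = 1
z = x - c                                 # z_i^0 = grad f_i(x_i^0)
for k in range(iters):
    y_new = B @ y
    mix = B @ (y[:, None] * x)            # sum_j b_ij (y_j^k x_j^k)
    x_new = beta * proj((mix - alpha * z) / y_new[:, None]) \
            + (1 - beta) * mix / y_new[:, None]
    z = B @ z + (x_new - c) - (x - c)     # gradient tracking
    x, y = x_new, y_new

print("max deviation between agents:", np.abs(x - x.mean(axis=0)).max())
```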
Convergence Analysis
Theorem
Let us define $g^k = \big[\, \|\tilde A^\infty x^k - 1_n \otimes x^*\|,\ \|x^k - \tilde A^\infty x^k\|,\ \|B^\infty z^k - z^k\| \,\big]^T$ and $h^k = \big[\, \|x^k\|,\ \|z^k\| \,\big]^T$. Suppose that $\alpha < \frac{2}{nL}$. It follows that
$$g^{k+1} \le \tilde G g^k + \tilde H^k h^k,$$
$$\tilde G = \begin{bmatrix} 1-\beta+\beta\lambda & \beta u_1 + \beta\alpha u_2 & \beta\alpha u_3 \\ \beta u_5 & \lambda\psi_2(1-\beta) + \beta u_6 + \beta\alpha u_7 & \beta\alpha u_8 \\ \beta(1+\lambda)u_{10} & \beta u_{11} + \beta\alpha u_{12} + (1+\psi_1)u_{13} & \psi_1 + \beta\alpha u_{14} \end{bmatrix}, \qquad \tilde H^k = \gamma^k \begin{bmatrix} u_4 & u_4^\dagger \\ u_9 & u_9^\dagger \\ u_{15} & u_{15}^\dagger \end{bmatrix}$$
Moreover, assume that $0 < \alpha < 1/(nL)$ holds. Then, there exists $\beta^M \in (0,1)$ such that, when $\beta < \beta^M$, we have
$$\|x^k - (1_n \otimes x^*)\| \le w\big(\max\{\rho(\tilde G), \psi_2\}\big)^k + \cdots$$
Idea of proof
- In compact form (with $Y^k = \operatorname{diag}(y^k)$):
$$y^{k+1} = B y^k,$$
$$s^{k+1} = \big((Y^{k+1})^{-1} B Y^k \otimes I_p\big) x^k,$$
$$x^{k+1} = \beta P_{\tilde X}\big[s^{k+1} - \alpha (Y^{k+1})^{-1} z^k\big] + (1-\beta) s^{k+1},$$
$$z^{k+1} = B z^k + \nabla f(x^{k+1}) - \nabla f(x^k),$$
Summary
$$\min_{x}\ F(x) = \frac{1}{n}\sum_{i=1}^{n} f_i(x), \qquad \text{subject to}\ x \in X.$$
Distributed EDP considering transmission limit (I/III)
Problem formulation:
$$\min_{p_g \in P}\ C(p_g) = \sum_{i=1}^{n} C_i(p_{g_i}),$$
$$\text{subject to} \quad \sum_{i=1}^{n} p_{L_i} - \sum_{i=1}^{n} p_{g_i} = 0,$$
$$-\bar p_f^q \le \sum_{i=1}^{n} \pi_{g_i}^q p_{g_i} - \sum_{i=1}^{n} \pi_{L_i}^q p_{L_i} \le \bar p_f^q, \qquad q = 1, \dots, \ell$$
Distributed EDP considering transmission limit (II/III)
First, the Lagrangian function of the EDP, $L : P \times \mathbb{R} \times \mathbb{R}^{2\ell}_+ \to \mathbb{R}$, is defined as
$$\begin{aligned} L(p_g, \lambda, \xi^U, \xi^m) &= \sum_{i=1}^{n} C_i(p_{g_i}) + \lambda\Big(\sum_{i=1}^{n} p_{L_i} - \sum_{i=1}^{n} p_{g_i}\Big) + \sum_{q=1}^{\ell} \xi^{U,q}\Big(\sum_{i=1}^{n} \pi_{g_i}^q p_{g_i} - \sum_{i=1}^{n} \pi_{L_i}^q p_{L_i} - \bar p_f^q\Big) \\ &\quad + \sum_{q=1}^{\ell} \xi^{m,q}\Big(\sum_{i=1}^{n} \pi_{L_i}^q p_{L_i} - \sum_{i=1}^{n} \pi_{g_i}^q p_{g_i} - \bar p_f^q\Big) \\ &= \sum_{i=1}^{n} L_i(p_{g_i}, \lambda, \xi^U, \xi^m) \end{aligned}$$
Distributed EDP considering transmission limit (III/III)
Dual problem
$$\max_{\lambda \in \mathbb{R}}\ \Phi(\lambda) = \sum_{i=1}^{N} \Phi_i(\lambda) \qquad (1)$$
Lemma
For all $i \in V$, let $C_i(\cdot)$ be $m$-strongly convex and $L$-smooth on $P_i$. Then $C_i^{\perp}(\lambda)$ is strongly convex with constant $1/L$, and $C_i^{\perp}(\lambda)$ is differentiable with a derivative that is Lipschitz continuous with constant $1/m$.
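A small numerical illustration of this duality fact, assuming $C_i^\perp$ is a conjugate-type transform: for the quadratic cost $C(p) = \frac{a}{2}p^2$ (illustrative), the conjugate is $\frac{\lambda^2}{2a}$, whose curvature $1/a$ lies in $[1/L, 1/m]$ whenever $m \le a \le L$:

```python
import numpy as np

a, m, L = 2.0, 1.0, 10.0                  # cost curvature a, with m <= a <= L
C = lambda p: 0.5 * a * p**2

p_grid = np.linspace(-20, 20, 400001)
def conjugate(lam):
    # C*(lam) = sup_p { lam * p - C(p) }, evaluated on a fine grid
    return np.max(lam * p_grid - C(p_grid))

lams = np.linspace(-3, 3, 7)
vals = np.array([conjugate(l) for l in lams])
print(np.allclose(vals, lams**2 / (2 * a), atol=1e-4))  # True: curvature 1/a in [1/L, 1/m]
```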
Simulation results (I/IV)
Simulation results (II/IV)
Simulation results (III/IV)
Figure: Comparison of normalized error (in log scale) between the proposed algorithm and an existing method based on a diminishing step size.
Simulation results (IV/IV)
We set $\bar p_f^q = 8$.
$$\Theta_g p_g^* - \Theta_L p_L = [1.03,\ 5.96,\ 0.7100,\ 1.56,\ 8,\ 7.9,\ 5.94,\ 2.62]^T$$
This dissertation
Presentation:
- Chapter 3 ✓
- Chapter 4
- Chapter 2
Problem formulation
$$\min_{x \in X}\ f(x) = \sum_{i=1}^{N} f_i(x_i),$$
$$\text{subject to} \quad g(x) = \sum_{i=1}^{N} g_i(x_i) \le 0,$$
Existing method via fixed step size
Primal-dual algorithm:
$$x^{k+1} = P_X\big[Ax^k - \alpha L_x(x^k, \lambda^k)\big]$$
$$\lambda^{k+1} = P_D\big[A\lambda^k - \alpha L_\lambda(x^k, \lambda^k)\big]$$
Motivation: design an algorithm that does not require a compact set $D$.
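To make the update concrete, here is a minimal single-agent sketch (so $A = I$) on a toy problem $\min x^2$ subject to $1 - x \le 0$; the steps follow the standard descent/ascent convention, which the slide's operators $L_x$ and $L_\lambda$ are assumed to encode:

```python
import numpy as np

alpha = 0.1
proj_X = lambda v: np.clip(v, -5.0, 5.0)   # primal constraint set X
proj_D = lambda v: np.clip(v, 0.0, 10.0)   # dual set D (compact here)

# Toy problem: min x^2 subject to 1 - x <= 0; optimum x* = 1, lambda* = 2
x, lam = 0.0, 0.0
for k in range(2000):
    gx = 2 * x - lam                       # gradient in x of L(x, lam) = x^2 + lam*(1 - x)
    glam = 1 - x                           # gradient in lam
    x = proj_X(x - alpha * gx)             # projected primal descent step
    lam = proj_D(lam + alpha * glam)       # projected dual ascent step

print(f"x = {x:.4f}, lambda = {lam:.4f}")  # approaches (1.0, 2.0)
```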
Algorithm development
Proposed algorithm
Convergence analysis
Theorem
(Primal and Dual Optimality) Let $\alpha$ satisfy …
Idea of the proof
- Dual function
Application to the EDP (I/IV)
Problem formulation:
$$\min_{p_g \in P}\ C(p_g) = \sum_{i=1}^{N} C_i(p_{g_i}),$$
$$\text{subject to} \quad \Upsilon(p_g) + \sum_{i=1}^{N} p_{L_i} - \sum_{i=1}^{N} p_{g_i} = 0,$$
where
$$\Upsilon(p_g) = \sum_{i=1}^{N} \sum_{j=1}^{N} p_{g_i} b_{ij} p_{g_j} + \sum_{i=1}^{N} b_{0i} p_{g_i} + b_{00}.$$
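$\Upsilon$ has the form of the classical B-coefficient (Kron) loss formula; a short sketch of its evaluation with placeholder coefficients:

```python
import numpy as np

def transmission_loss(p_g, B, b0, b00):
    # Kron-type loss: p_g^T B p_g + b0^T p_g + b00
    return p_g @ B @ p_g + b0 @ p_g + b00

# Illustrative coefficients for N = 3 generators
B = 1e-4 * np.array([[3.0, 0.2, 0.1], [0.2, 4.0, 0.3], [0.1, 0.3, 2.0]])
print(transmission_loss(np.array([50.0, 80.0, 40.0]), B, 1e-3 * np.ones(3), 0.05))
```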
Application to the EDP (II/IV)
Modified EDP:
$$\min_{u,\, p_g \in P}\ C(p_g) = \sum_{i=1}^{N} C_i(p_{g_i}),$$
$$\text{subject to} \quad \sum_{i=1}^{N} u_i^2 + \sum_{i=1}^{N} p_{L_i} - \sum_{i=1}^{N} p_{g_i} \le 0, \qquad R p_g = u,$$
Application to the EDP (III/IV)
Relaxed modified EDP:
$$\min_{u,\, p_g \in P}\ C(p_g) = \sum_{i=1}^{N} C_i(p_{g_i}),$$
$$\text{subject to} \quad \sum_{i=1}^{N} u_i^2 + \sum_{i=1}^{N} p_{L_i} - \sum_{i=1}^{N} p_{g_i} \le 0, \qquad R p_g = u,$$
- The problem has the same formulation as the one discussed before, so Algorithm 3 can be used to solve it
Simulation results (I/IV)
Simulation results (II/IV)
Simulation results (IV/IV)
Figure: cost function value (top) and supply-demand balance (bottom) versus iterations.
This dissertation
Presentation:
- Chapter 3 ✓
- Chapter 4 ✓
- Chapter 2
Problem formulation
$$\min_{x \in X}\ f(x) = \sum_{i=1}^{N} f_i(x_i),$$
Related methods: centralized
- Nesterov's Fast Gradient Method (FGM) is widely employed:
$$x^{k+1} = z^k - \alpha\Big(\frac{1}{n}\sum_{i=1}^{n} \nabla f_i(z^k)\Big)$$
$$z^{k+1} = x^{k+1} + \beta(x^{k+1} - x^k)$$
- The FGM uses $\alpha = 1/L$ and $\beta = \frac{\sqrt{L} - \sqrt{m}}{\sqrt{L} + \sqrt{m}}$.
- The rate can be improved by using the triple momentum (TM) method:
$$x^{k+1} = s^k - \alpha\Big(\frac{1}{n}\sum_{i=1}^{n} \nabla f_i(z^k)\Big)$$
$$s^{k+1} = x^{k+1} + \beta(x^{k+1} - x^k)$$
$$z^{k+1} = x^{k+1} + \gamma(x^{k+1} - x^k)$$
where the step sizes are chosen such that
$$(\alpha, \beta, \gamma) = \Big(\frac{1+\varrho}{L},\ \frac{\varrho^2}{2-\varrho},\ \frac{\varrho^2}{(1+\varrho)(2-\varrho)}\Big)$$
and $\varrho = 1 - 1/\sqrt{L/m}$.
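A minimal NumPy comparison of the two methods on a synthetic quadratic whose curvature lies in $[m, L]$ (the test function is illustrative):

```python
import numpy as np

d, L, m = 20, 10.0, 1.0
H = np.diag(np.linspace(m, L, d))         # quadratic with curvature in [m, L]
grad = lambda v: H @ v                    # minimizer is v* = 0

rho = 1 - 1 / np.sqrt(L / m)

def fgm(iters=200):
    alpha, beta = 1 / L, (np.sqrt(L) - np.sqrt(m)) / (np.sqrt(L) + np.sqrt(m))
    x = z = np.ones(d)
    for _ in range(iters):
        x_new = z - alpha * grad(z)
        z = x_new + beta * (x_new - x)
        x = x_new
    return np.linalg.norm(x)

def tm(iters=200):
    alpha = (1 + rho) / L
    beta = rho**2 / (2 - rho)
    gamma = rho**2 / ((1 + rho) * (2 - rho))
    x = s = z = np.ones(d)
    for _ in range(iters):
        x_new = s - alpha * grad(z)
        s = x_new + beta * (x_new - x)
        z = x_new + gamma * (x_new - x)
        x = x_new
    return np.linalg.norm(x)

print("FGM error:", fgm(), " TM error:", tm())
```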
Proposed methods
TM-like gradient tracking algorithm:
$$x_i^{k+1} = \sum_{j=1}^{n} a_{ij} s_j^k - \alpha_i y_i^k$$
$$s_i^{k+1} = x_i^{k+1} + \beta_i \sum_{j=1}^{n} a_{ij}(x_j^{k+1} - x_j^k)$$
$$z_i^{k+1} = x_i^{k+1} + \gamma_i \sum_{j=1}^{n} a_{ij}(x_j^{k+1} - x_j^k)$$
$$y_i^{k+1} = \sum_{j=1}^{n} b_{ij} y_j^k + \nabla f_i(z_i^{k+1}) - \nabla f_i(z_i^k)$$
- Show that the stacked error vector satisfies a linear recursion with transition matrix $\Omega$, where
$$\Omega = \begin{bmatrix} \pi_A + h_A u_1 \bar\alpha & u_1 \bar\alpha & u_1 \bar\alpha\bar\gamma + u_2 \bar\beta & u_3 \bar\alpha \\ u_5 \bar\alpha & \lambda & u_6 \bar\alpha\bar\gamma + u_7 \bar\beta & h_B u_7 \bar\alpha \\ u_8 + u_9 \bar\alpha & u_9 \bar\alpha & u_{10} \bar\alpha\bar\gamma + u_{11} \bar\beta & h_B \bar\alpha \\ u_{13}(1+\bar\gamma) + u_{14}\bar\alpha(1+\bar\gamma) & u_{15}\bar\alpha(1+\bar\gamma) & \xi_1 & \xi_2 \end{bmatrix}$$
Application to the EDP
Problem formulation:
$$\min_{p_{g_i},\, i=1,\dots,n}\ \sum_{i=1}^{n} C_i(p_{g_i})$$
$$\text{subject to} \quad \sum_{i=1}^{n} p_{g_i} = \sum_{i=1}^{n} p_{L_i}, \qquad p_{g_i} \in P_i = \{\, p_{g_i}^{\min} \le p_{g_i} \le p_{g_i}^{\max} \,\}.$$
Dual problem:
$$\min_{\lambda \in \mathbb{R}}\ g(\lambda) = \sum_{i=1}^{n} g_i(\lambda)$$
Simulation result (I/III)
Simulation result (II/III)
Figure: The sum of residuals at each agent, $\sum_{i=1}^{n} \|p_{g_i}(k) - p_{g_i}^*\|$
Simulation result (III/III)
Figure: Performance comparison between the proposed method and existing gradient-tracking-based methods (in log scale).
Summary
- Three different algorithms are proposed for different objectives.
- A linear combination of a projected vector and a non-projected vector is proposed to attain linear convergence for the constrained problem.
- Strong convexity is important in this class of distributed algorithms in order to employ a fixed step size.
References:
[2] K. I. Tsianos, S. Lawlor, M. G. Rabbat, Push-sum distributed dual averaging for convex optimization (2013).
[4] C. Xi, U. A. Khan, Distributed subgradient projection algorithm over directed graphs, IEEE Transactions on Automatic Control 62 (8) (2017).
[7] V. S. Mai, E. H. Abed, Distributed optimization over directed graphs with row stochasticity and constraint regularity, Automatica 102 (2019).
[8] H. Liu, W. Yu, G. Chen, Discrete-time algorithms for distributed constrained convex optimization with linear convergence rates, IEEE Transactions on Cybernetics (2020).
[9] Z. Dong, S. Mao, W. Du, Y. Tang, Distributed constrained optimization with linear convergence rate, in: 2020 IEEE 16th International Conference on Control and Automation (ICCA), 2020.