
Accelerated Distributed Algorithms for

Optimization Problem and Its Application to


Economic Dispatch Problems

Ismi Rosyiana Fitri

Department of Electrical and Information Engineering,


Seoul National University of Science and Technology

December 7th, 2021

1
Problem formulation

    min_{x ∈ R^d} F(x) = Σ_{i=1}^n F_i(x),   F_i convex

Centralized vs. distributed architectures
2
Problem formulation

Optimization problem:

    min_{x ∈ R^d} F(x) = Σ_{i=1}^n F_i(x),   F_i convex
    subject to Σ_{i=1}^n g_i(x) ≤ 0  or  x ∈ X

I Graph: G = {V, E}
I Goal: cooperatively estimate the global minimizer x*, i.e., x_1 = x_2 = … = x_n = x*
3
Motivation

Distributed decision making: smart grids, distributed networks, formation control
4
Preliminaries
(Doubly Stochastic) A square matrix A is doubly stochastic if all of its
entries are non-negative and the entries of each row and each column sum to 1.
(Row Stochastic) A square matrix A is row stochastic if all of its entries
are non-negative and the entries of each row sum to 1.
(L-Lipschitz Continuity) Let X ⊆ R^n be a convex set. A function
h : X → R is L-Lipschitz continuous on X with modulus L > 0 if
‖h(y) − h(x)‖ ≤ L‖x − y‖ for all x, y ∈ X.
(L-Smooth) Let X ⊆ R^n be a convex set. A differentiable function
h : X → R is L-smooth on X with modulus L if its gradient ∇h is
L-Lipschitz continuous on X.
(m-Strongly Convex Function) Let X ⊆ R^n be a convex set. A function
h : X → R is m-strongly convex on X if there exists m > 0 such that,
for all x, y ∈ X,

    h(y) ≥ h(x) + ∇h(x)^T (y − x) + (m/2)‖x − y‖².
5
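The two regularity notions above can be checked numerically. Below is a minimal Python sketch for the illustrative (made-up) function h(x) = x² + ln(1 + eˣ), which is m-strongly convex with m = 2 and L-smooth with L = 2.25, since h''(x) = 2 + s(x)(1 − s(x)) with s the sigmoid, so 2 < h''(x) ≤ 2.25:

```python
import math

# Hedged numeric check of the strong-convexity and smoothness inequalities
# for the made-up function h(x) = x^2 + ln(1 + e^x); m = 2, L = 2.25.
def h(x):
    return x * x + math.log1p(math.exp(x))

def dh(x):  # h'(x) = 2x + sigmoid(x)
    return 2 * x + 1 / (1 + math.exp(-x))

m, L = 2.0, 2.25
for x, y in [(-1.5, 0.7), (0.0, 2.0), (1.2, -0.3)]:
    lower = h(x) + dh(x) * (y - x) + (m / 2) * (y - x) ** 2
    assert h(y) >= lower - 1e-12                  # strong convexity inequality
    assert abs(dh(y) - dh(x)) <= L * abs(y - x)   # L-Lipschitz gradient (smoothness)
```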
This dissertation

Presentation:
I Chapter 3
I Chapter 4
I Chapter 2
6
Optimization problem with global constraint

    min_x F(x) = (1/n) Σ_{i=1}^n f_i(x),
    subject to x ∈ X

Assumptions:
I f_i is m-strongly convex and L-smooth
I X is a convex and closed set
I All agents know X
I Communication graph: directed graph
7
Existing methods (I/III)
Unconstrained (X = R^p) + undirected graph:

    x_i^{k+1} = Σ_{j=1}^n w_ij x_j^k − α_k ∇f_i(x_i^k)

Constrained (X convex, closed) + undirected graph [2]:

    x_i^{k+1} = P_X [ Σ_{j=1}^n w_ij x_j^k − α_k ∇f_i(x_i^k) ]

I Works for general convex functions with bounded subgradients
I W = [w_ij] is a doubly stochastic matrix for achieving consensus
I Converges if α_k is a diminishing step size
I Converges if the graph is connected
8
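The projected update above can be sketched in a few lines of Python, under illustrative assumptions: scalar quadratic costs f_i(x) = (x − c_i)²/2, X = [0, 1], and the uniform-averaging matrix W = (1/n)11ᵀ, which is doubly stochastic; the c_i values are made up:

```python
# Sketch of distributed projected (sub)gradient descent [2], assuming
# quadratic local costs f_i(x) = (x - c_i)^2 / 2 on X = [0, 1] and the
# doubly stochastic uniform-averaging matrix W = (1/n) 11^T.
def projected_dgd(c, iters=2000):
    n = len(c)
    x = [0.0] * n                      # x_i^0 in X
    for k in range(iters):
        avg = sum(x) / n               # consensus step: sum_j w_ij x_j^k
        alpha = 1.0 / (k + 1)          # diminishing step size
        # local gradient of f_i at x_i^k, then project onto X = [0, 1]
        x = [min(1.0, max(0.0, avg - alpha * (x[i] - c[i]))) for i in range(n)]
    return x

targets = [0.2, 0.9, 0.4]              # global minimizer: mean = 0.5
estimates = projected_dgd(targets)     # all agents approach 0.5
```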
Existing methods (II/III)
Unconstrained (X = R^p) + directed graph:

    x_i^{k+1} = Σ_{j=1}^n a_ij x_j^k − α_k ∇f_i(x_i^k) / [y_i^k]_i
    y_i^{k+1} = Σ_{j=1}^n a_ij y_j^k

Constrained (X convex, closed) + directed graph:

    x_i^{k+1} = P_X [ Σ_{j=1}^n a_ij x_j^k − α_k ∇f_i(x_i^k) / [y_i^k]_i ]
    y_i^{k+1} = Σ_{j=1}^n a_ij y_j^k

I Works for general convex functions with bounded subgradients
I A = [a_ij] is a row stochastic matrix for achieving consensus
I Converges if α_k is a diminishing step size
I Converges if the graph is strongly connected
9
Existing methods (III/III)
Unconstrained (X = R^p) + directed graph:

    x_i^{k+1} = Σ_{j=1}^n a_ij x_j^k − α z_i^k
    y_i^{k+1} = Σ_{j=1}^n a_ij y_j^k
    z_i^{k+1} = Σ_{j=1}^n a_ij z_j^k + ∇f_i(x_i^{k+1}) / [y_i^{k+1}]_i − ∇f_i(x_i^k) / [y_i^k]_i

I A fixed step size is employed
I A = [a_ij] is a row stochastic matrix for achieving consensus
I Converges if α is sufficiently small
I Converges with a linear convergence rate
I z_i^k ∈ R^p estimates the global gradient
Question: what is the corresponding algorithm for the constrained case?
10
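The gradient-tracking idea above can be sketched as follows. This simplified Python version assumes a doubly stochastic uniform-averaging matrix (so the eigenvector-correction terms [y_i^k]_i equal 1 and the y-update drops out) and made-up quadratic costs:

```python
# Hedged sketch of fixed-step gradient tracking: each z_i tracks the
# average gradient, so a constant step size alpha gives linear convergence.
# Simplifying assumption: uniform doubly stochastic mixing, costs
# f_i(x) = (x - c_i)^2 / 2.
def gradient_tracking(c, alpha=0.2, iters=300):
    n = len(c)
    grad = lambda i, v: v - c[i]                 # gradient of f_i
    x = [0.0] * n
    z = [grad(i, x[i]) for i in range(n)]        # z_i^0 = grad f_i(x_i^0)
    for _ in range(iters):
        x_new = [sum(x) / n - alpha * z[i] for i in range(n)]
        # z_i update: mix the trackers, then add the local gradient change
        z = [sum(z) / n + grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)]
        x = x_new
    return x

est = gradient_tracking([0.2, 0.9, 0.4])         # minimizer of the sum: 0.5
```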
Conjecture: projection on primal variable

Constrained (X convex, closed) + directed graph:

    x_i^{k+1} = P_X [ Σ_{j=1}^n a_ij x_j^k − α z_i^k ]
    y_i^{k+1} = Σ_{j=1}^n a_ij y_j^k
    z_i^{k+1} = Σ_{j=1}^n a_ij z_j^k + ∇f_i(x_i^{k+1}) / [y_i^{k+1}]_i − ∇f_i(x_i^k) / [y_i^k]_i

where α > 0 is a fixed step size.

Why doesn't it work? (in analysis)
11
Analysis via LTI

    x^{k+1} = A x^k − α z^k

Let A^∞ = lim_{k→∞} A^k. Then A^∞ A = A^∞ yields

    A^∞ x^{k+1} = A^∞ x^k − α A^∞ z^k

Hence

    ‖x^{k+1} − A^∞ x^{k+1}‖_A = ‖A x^k − A^∞ x^k − α(z^k − A^∞ z^k)‖_A
                              ≤ π‖x^k − A^∞ x^k‖_A + α‖z^k − A^∞ z^k‖_A

where π ∈ (0, 1). To make a long story short,

    [ ‖x^{k+1} − A^∞ x^{k+1}‖_A  ]       [ ‖x^k − A^∞ x^k‖_A  ]
    [ ‖A^∞ x^{k+1} − 1_n ⊗ x*‖_2 ]  ≤ Ω  [ ‖A^∞ x^k − 1_n ⊗ x*‖_2 ]
    [ ‖z^{k+1} − A^∞ z^{k+1}‖_A  ]       [ ‖z^k − A^∞ z^k‖_A  ]

Convergence is obtained by ensuring ρ(Ω) < 1, which can be derived using
π ∈ (0, 1).
12
Does it work with projection?

    x^{k+1} = P_{X^n} [ A x^k − α z^k ]

Then,

    A^∞ x^{k+1} = A^∞ P_{X^n} [ A x^k − α z^k ]

As a result,

    ‖x^{k+1} − A^∞ x^{k+1}‖_A = ‖P_{X^n}[A x^k − α z^k] − A^∞ P_{X^n}[A x^k − α z^k]‖_A

The property A^∞ = A^∞ A cannot be used, so the analysis from the
unconstrained problem does not carry over.
13
Proposed method for constrained problem (I/II)
GTCP-RS:

    x_i^{k+1} = β P_X [ x_i^k − α z_i^k ] + (1 − β) Σ_{j=1}^n a_ij x_j^k
    y_i^{k+1} = Σ_{j=1}^n a_ij y_j^k
    z_i^{k+1} = Σ_{j=1}^n a_ij z_j^k + ∇f_i(x_i^{k+1}) / [y_i^{k+1}]_i − ∇f_i(x_i^k) / [y_i^k]_i

I The method uses two fixed step sizes: α and β
I 0 < β < 1
I A linear combination of two vectors is employed in x_i^{k+1}
I If x_i^0 ∈ X for all i ∈ V, then x_i^k ∈ X for all k ≥ 0
14
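The GTCP-RS update can be sketched on a toy problem. This hedged Python version again assumes uniform doubly stochastic mixing (so the [y_i^k]_i correction is 1 and the y-update is omitted) and made-up quadratic costs whose unconstrained minimizer lies outside X = [0, 1], so the projection is active at the solution:

```python
# Hedged sketch of GTCP-RS: a convex combination of a projected local step
# and a consensus step, with gradient tracking and fixed step sizes.
# Assumed setup: f_i(x) = (x - c_i)^2 / 2, X = [0, 1], uniform mixing.
def gtcp_rs(c, alpha=0.1, beta=0.3, iters=2000):
    n = len(c)
    proj = lambda v: min(1.0, max(0.0, v))       # projection onto X = [0, 1]
    grad = lambda i, v: v - c[i]
    x = [0.0] * n                                # x_i^0 in X
    z = [grad(i, x[i]) for i in range(n)]        # z_i^0 = grad f_i(x_i^0)
    for _ in range(iters):
        avg_x = sum(x) / n
        # convex combination: beta * projected step + (1 - beta) * consensus
        x_new = [beta * proj(x[i] - alpha * z[i]) + (1 - beta) * avg_x
                 for i in range(n)]
        z = [sum(z) / n + grad(i, x_new[i]) - grad(i, x[i]) for i in range(n)]
        x = x_new
    return x

# mean(c) = 1.2 > 1, so the constrained minimizer is x* = 1.0
x_out = gtcp_rs([0.8, 1.6, 1.2])
```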
Convergence Analysis
Theorem
Suppose w^k = [ ‖A^∞ x^k − 1_n ⊗ x*‖, ‖x^k − A^∞ x^k‖_A, ‖A^∞ z^k − z^k‖_A ]^T
and that α < 2/(nL). It follows that

    w^{k+1} ≤ G w^k + H^k ‖∇f(x^k)‖,

where the inequality holds component-wise, and

        [ 1 − β + βλ      βu_1 + βαu_2                  βαu_3       ]
    G = [ βu_5 λ          π(1 − β) + βu_6 + βαu_7       βαu_6       ],
        [ β(1 + λ)u_9     βu_10 + βαu_11 + (1 + π)u_10  π + βαu_10  ]

          [ u_4 π^k  ]
    H^k = [ u_8 π^k  ].
          [ u_12 π^k ]

Moreover, if the step sizes satisfy 0 < α < 1/(nL) and
0 < β < min{1, β̄_1, β̄_2}, then ρ(G) < 1.
15
Idea of the proof

    x^{k+1} = β P_X̃ [ x^k − α z^k ] + (1 − β) A x^k,
    A^∞ x^{k+1} = β A^∞ P_X̃ [ x^k − α z^k ] + (1 − β) A^∞ x^k,

Lemma
Let X ∈ R^{n×n} be a non-negative matrix and x ∈ R^n be a positive vector.
If Xx < ωx with ω > 0, then ρ(X) < ω.

I β gives an additional degree of freedom to make the diagonal entries of G
  less than 1 → use the above lemma to obtain ρ(G) < 1
16
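The lemma above can be illustrated numerically. The 2×2 matrix below is a made-up example; the spectral radius is estimated by power iteration, which is valid here since the matrix is non-negative with a dominant eigenvalue:

```python
# Numeric illustration of the lemma: for a non-negative matrix X and a
# positive vector v with Xv < omega * v, we have rho(X) < omega.
def matvec(X, v):
    return [sum(X[i][j] * v[j] for j in range(len(v))) for i in range(len(X))]

def spectral_radius(X, iters=200):
    v = [1.0] * len(X)
    for _ in range(iters):
        w = matvec(X, v)
        norm = max(abs(c) for c in w)
        v = [c / norm for c in w]
    return max(abs(c) for c in matvec(X, v))     # dominant-eigenvalue estimate

X = [[0.5, 0.2], [0.1, 0.6]]                     # made-up non-negative matrix
v = [1.0, 1.0]                                   # positive test vector
omega = 0.8
assert all(c < omega * vi for c, vi in zip(matvec(X, v), v))  # Xv < omega*v
rho = spectral_radius(X)                         # eigenvalues are 0.7 and 0.4
```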
Numerical Simulation
Consider 30 agents aiming to solve a logistic regression problem:

    x* = argmin_{x ∈ X ⊂ R^p} Σ_{i=1}^n Σ_{j=1}^{m_i} ln[ 1 + exp( −(c_ij^T x) h_ij ) ] + (μ/2)‖x‖²,

where the regularization (μ/2)‖x‖² and the convex set X are used to avoid
overfitting.

[Figure: residual (log scale) over 8000 iterations.]
17
Comparison with literature

I Compared to [2, 4, 7]: GTCP-RS uses fixed step sizes
I Compared to [8]: GTCP-RS does not require two time scales
I Compared to [9]: GTCP-RS can deal with directed graphs

Any weaknesses?
I Limited to strongly convex and smooth cost functions
I The variable y_i^k is n-dimensional → not good for a large network

Next: an algorithm based on a column stochastic matrix
18
Existing method: column stochastic matrix
Unconstrained + directed [10]:

    x̂_i^{k+1} = Σ_{j=1}^n b_ij x̂_j^k − α z_i^k
    y_i^{k+1} = Σ_{j=1}^n b_ij y_j^k
    x_i^{k+1} = x̂_i^{k+1} / y_i^{k+1}
    z_i^{k+1} = Σ_{j=1}^n b_ij z_j^k + ∇f_i(x_i^{k+1}) − ∇f_i(x_i^k)

I B = [b_ij] is a column stochastic matrix for achieving consensus
I y_i^k is a scalar!
Question: what is the corresponding algorithm for the constrained case?
19
Proposed method for constrained problem (II/II)

GTCP-CS:

    y_i^{k+1} = Σ_{j=1}^n b_ij y_j^k,
    x_i^{k+1} = β P_X [ ( Σ_{j=1}^n b_ij (y_j^k x_j^k) − α z_i^k ) / y_i^{k+1} ]
                + (1 − β) ( Σ_{j=1}^n b_ij (y_j^k x_j^k) ) / y_i^{k+1},
    z_i^{k+1} = Σ_{j=1}^n b_ij z_j^k + ∇f_i(x_i^{k+1}) − ∇f_i(x_i^k),

I A linear combination of two vectors is employed in x_i^{k+1}
I y_i^k is a scalar!
20
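The reason a scalar y_i^k suffices is the push-sum ratio trick inherited from [10]: with a column stochastic B, the ratio x̂_i/y_i recovers the network average even though B is not doubly stochastic. A small Python sketch with made-up 3-node weights (each column sums to 1):

```python
# Hedged sketch of the push-sum mechanism underlying GTCP-CS: column sums
# of B are 1, so sum(xhat) and sum(y) are preserved and the ratio
# xhat_i / y_i converges to the network average at every agent.
B = [[0.5, 0.3, 0.0],
     [0.5, 0.4, 0.5],
     [0.0, 0.3, 0.5]]                    # made-up column stochastic weights
vals = [3.0, 6.0, 9.0]                   # local values; network average = 6
xhat = vals[:]                           # xhat_i^0 = local value
y = [1.0, 1.0, 1.0]                      # y_i^0 = 1 (a scalar per agent!)
for _ in range(100):
    xhat = [sum(B[i][j] * xhat[j] for j in range(3)) for i in range(3)]
    y = [sum(B[i][j] * y[j] for j in range(3)) for i in range(3)]
ratios = [xhat[i] / y[i] for i in range(3)]   # each ratio -> 6
```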
Convergence Analysis
Theorem
Let us define g^k = [ ‖Ã^∞ x^k − 1_n ⊗ x*‖, ‖x^k − Ã^∞ x^k‖, ‖B^∞ z^k − z^k‖ ]^T
and h^k = [ ‖x^k‖, ‖z^k‖ ]^T. Suppose that α < 2/(nL). It follows that

    g^{k+1} ≤ G̃ g^k + H̃^k h^k,

where the inequality holds component-wise, and

         [ 1 − β + βλ      βu_1 + βαu_2                      βαu_3         ]
    G̃ =  [ βu_5 λ          ψ_2(1 − β) + βu_6 + βαu_7         βαu_8         ],
         [ β(1 + λ)u_10    βu_11 + βαu_12 + (1 + ψ_1)u_13    ψ_1 + βαu_14  ]

              [ u_4  u_4†  ]
    H̃^k = γ^k [ u_9  u_9†  ].
              [ u_15 u_15† ]

Moreover, assume that 0 < α < 1/(nL) holds. Then there exists
β^M ∈ (0, 1) such that, when β < β^M,

    ‖x^k − (1_n ⊗ x*)‖ ≤ w ( max{ρ(G̃), ψ_2} + ε )^k.
21
Idea of proof

I In compact form:

    y^{k+1} = B y^k,
    s^{k+1} = (Y^{k+1})^{−1} (B Y^k ⊗ I_p) x^k,
    x^{k+1} = β P_X̃ [ s^{k+1} − α (Y^{k+1})^{−1} z^k ] + (1 − β) s^{k+1},
    z^{k+1} = B z^k + ∇f(x^{k+1}) − ∇f(x^k),

I Given that Y^k converges to Y^∞, s^k converges to ((Y^∞)^{−1} B Y^∞ ⊗ I_p) x^k
I (Y^∞)^{−1} B Y^∞ ⊗ I_p is a row stochastic matrix
I The analysis follows that of GTCP-RS
22
Summary

    min_x F(x) = (1/n) Σ_{i=1}^n f_i(x),
    subject to x ∈ X

I Two algorithms are proposed: GTCP-RS and GTCP-CS
I Both converge linearly to the optimal solution
I Both use fixed step sizes
Next: application to the EDP
23
Distributed EDP considering transmission limit (I/III)
Problem formulation:

    min_{p_g ∈ P} C(p_g) = Σ_{i=1}^n C_i(p_gi),
    subject to Σ_{i=1}^n p_Li − Σ_{i=1}^n p_gi = 0,
    −p̄_f^q ≤ Σ_{i=1}^n π_gi^q p_gi − Σ_{i=1}^n π_Li^q p_Li ≤ p̄_f^q,   q = 1, …, ℓ

I p_gi ∈ P_i = {p_gi | p_gi^min ≤ p_gi ≤ p_gi^max}: capacity constraint
I p̄_f^q is the power flow limit of transmission line q
I The DC power flow model, a linearization relating the power injections in
  the grid to the active power flows, is used
I We solve the dual problem
24
Distributed EDP considering transmission limit (II/III)
First, the Lagrangian function of the EDP, L : P × R × R_+^{2ℓ} → R, is defined as

    L(p_g, λ, ξ^U, ξ^m)
      = Σ_{i=1}^n C_i(p_gi) + λ ( Σ_{i=1}^n p_Li − Σ_{i=1}^n p_gi )
        + Σ_{q=1}^ℓ ξ^{U,q} ( Σ_{i=1}^n π_gi^q p_gi − Σ_{i=1}^n π_Li^q p_Li − p̄_f^q )
        + Σ_{q=1}^ℓ ξ^{m,q} ( Σ_{i=1}^n π_Li^q p_Li − Σ_{i=1}^n π_gi^q p_gi − p̄_f^q )
      = Σ_{i=1}^n L_i(p_gi, λ, ξ^U, ξ^m)

For a convex function C_i, i ∈ V, the conjugate function C_i^⊥ of C_i is
given by

    C_i^⊥(λ, ξ^U, ξ^m) = sup_{p_gi ∈ P_i} ( λ p_gi − Σ_{q=1}^ℓ ξ^{U,q} π_gi^q p_gi
                          + Σ_{q=1}^ℓ ξ^{m,q} π_gi^q p_gi − C_i(p_gi) ),
25
Distributed EDP considering transmission limit (III/III)

Dual problem:

    max_{λ ∈ R} Φ(λ) = Σ_{i=1}^N Φ_i(λ)    (1)

where Φ_i(λ) = −C_i^⊥(λ) + λ p_Li.

Lemma
For all i ∈ V, let C_i(·) be m-strongly convex and L-smooth on P_i. Then
C_i^⊥(λ) is strongly convex with constant 1/L, and C_i^⊥(λ) is differentiable
with a Lipschitz continuous derivative with constant 1/m.

→ Hence, the negated dual objective is (1/L)-strongly convex and (1/m)-smooth.
26
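The Lemma can be checked concretely for the common quadratic generation cost C_i(p) = a p² + b p on P_i = [p_min, p_max], for which m = 2a. The derivative of the conjugate is the maximizer p(λ), which is 1/m-Lipschitz; the coefficients below are made up for illustration:

```python
# Hedged sketch: the conjugate C^⊥(λ) = sup_{p in P} (λp − C(p)) of the
# quadratic cost C(p) = a p^2 + b p has derivative p(λ) (the maximizer),
# a clipped affine function of λ that is 1/m-Lipschitz with m = 2a.
a, b = 0.5, 2.0                  # made-up cost coefficients; m = 2a = 1
p_min, p_max = 0.0, 10.0         # capacity constraint P_i

def maximizer(lam):
    # unconstrained argmax of λp − (a p^2 + b p) is (λ − b)/(2a); project onto P_i
    return min(p_max, max(p_min, (lam - b) / (2 * a)))

m = 2 * a
lams = [0.0, 1.0, 2.5, 5.0, 13.0, 20.0]
for l1 in lams:
    for l2 in lams:
        # derivative of the conjugate is p(λ); check the 1/m-Lipschitz bound
        assert abs(maximizer(l1) - maximizer(l2)) <= (1 / m) * abs(l1 - l2) + 1e-12
```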
Simulation results (I/IV)

Figure: Directed graph of 6 agents.

27
Simulation results (II/IV)

Figure: The dispatched power at each bus over 2000 iterations.

    p_g* = [5.3, 17, 5, 15.7, 14, 8]^T

28
Simulation results (III/IV)


Figure: Comparison of normalized error (in log scale) between the proposed
algorithm and an existing method based on diminishing step size.

29
Simulation results (IV/IV)

Figure: The local estimates of the Lagrangian multiplier corresponding to the
upper bound of the power transfer limit in line number 4.

We set p̄_f^q = 8.

    Θ_g p_g* − Θ_L p_L = [1.03, 5.96, 0.71, 1.56, 8, 7.9, 5.94, 2.62]^T
30
This dissertation

Presentation:
I Chapter 3 X
I Chapter 4
I Chapter 2
31
Problem formulation

    min_{x ∈ X} f(x) = Σ_{i=1}^N f_i(x_i),
    subject to g(x) = Σ_{i=1}^N g_i(x_i) ≤ 0,

I f_i and g_i are convex; f_i is C-strongly convex on X; and g_i is
  β-Lipschitz continuous on X for all i = 1, …, N.
32
Existing method via fixed step size
Primal-dual algorithm:

    x^{k+1} = P_X [ A x^k − α L_x(x^k, λ^k) ]
    λ^{k+1} = P_D [ A λ^k − α L_λ(x^k, λ^k) ]

I A fixed step size α > 0 works
I The algorithm does not require the cost function to be strongly convex
I D is a superset of the optimal dual set
I D can be estimated using a point satisfying Slater's condition
I The convergence highly depends on the compact set D

Motivation: design an algorithm that does not require the compact set D.
33
Algorithm development

I The Lagrangian dual function of a strongly convex program is differentiable.
I The extragradient method:

    λ^{k+1/2} = P_{R≥0} [ λ^k + ∇φ(λ^k) ]
    λ^{k+1} = P_{R≥0} [ λ^k + ∇φ(λ^{k+1/2}) ]

  converges to the optimal solution if ∇φ is Lipschitz continuous on R≥0.
I Idea: a distributed extragradient method for the dual problem
34
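The extragradient predictor-corrector structure can be sketched on a made-up concave dual φ(λ) = −(λ − 2)²/2 over λ ≥ 0; an explicit step size α is added for illustration (the slide's version absorbs it into ∇φ):

```python
# Hedged sketch of (centralized) extragradient ascent on the toy dual
# φ(λ) = -(λ - 2)^2 / 2 over λ >= 0, whose gradient 2 - λ is 1-Lipschitz.
def extragradient(lam0=5.0, alpha=0.5, iters=100):
    grad = lambda lam: 2.0 - lam            # ∇φ(λ)
    proj = lambda v: max(0.0, v)            # projection onto R>=0
    lam = lam0
    for _ in range(iters):
        half = proj(lam + alpha * grad(lam))    # predictor (half) step
        lam = proj(lam + alpha * grad(half))    # corrector step at the half point
    return lam

lam_star = extragradient()                  # converges to the maximizer λ* = 2
```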
Proposed algorithm

[Algorithm 3: the proposed distributed extragradient method.]
35
Convergence analysis

Theorem
(Primal and Dual Optimality) Let α satisfy

    0 < α < 1/κ,

where κ < (β²/C) + 2ϑ and ϑ is the largest singular value of the Laplacian L.
Then there exists λ* ∈ Λ* such that

    lim_{k→∞} ‖λ_i(k) − λ*‖ = 0,   for all i = 1, …, N.

Hence, there exists x* = col(x_1*, …, x_N*) ∈ X* such that

    lim_{k→∞} ‖x(k) − x*‖ = 0,

where x(k) = [x_1(k), …, x_N(k)]^T.
36
Idea of the proof

I Dual function:

    φ(λ) = min_{x ∈ X} L(x, λ).

I The dual function φ(λ) has a Lipschitz continuous gradient, due to the
  strong convexity of the objective function
I Algorithm 3 can be rewritten in compact form as:

    η̂(k) = P_Ω [ η(k) − α G(η(k)) ],
    η(k+1) = P_Ω [ η(k) − α G(η̂(k)) ],

I Lyapunov function ‖η(k+1) − η†‖² satisfying:

    ‖η(k+1) − η†‖² ≤ ‖η(k) − η†‖² − (1 − α²κ²)‖η(k) − η̂(k)‖².
37
Application to the EDP (I/IV)

Problem formulation:

    min_{p_g ∈ P} C(p_g) = Σ_{i=1}^N C_i(p_gi),
    subject to Υ(p_g) + Σ_{i=1}^N p_Li − Σ_{i=1}^N p_gi = 0,

where

    Υ(p_g) = Σ_{i=1}^N Σ_{j=1}^N p_gi b_ij p_gj + Σ_{i=1}^N b_i^0 p_gi + b^00.

I The problem has a quadratic equality constraint
I The quadratic equality is not separable
I B is positive semidefinite
38
Application to the EDP (II/IV)

Modified EDP:

    min_{u, p_g ∈ P} C(p_g) = Σ_{i=1}^N C_i(p_gi),
    subject to Σ_{i=1}^N u_i² + Σ_{i=1}^N p_Li − Σ_{i=1}^N p_gi = 0,
               R p_g = u,

I The problem has a quadratic equality constraint
I The quadratic equality is separable
I R^T R = B and each agent i knows the ith column of R.
39
Application to the EDP (III/IV)
Relaxed modified EDP:

    min_{u, p_g ∈ P} C(p_g) = Σ_{i=1}^N C_i(p_gi),
    subject to Σ_{i=1}^N u_i² + Σ_{i=1}^N p_Li − Σ_{i=1}^N p_gi ≤ 0,
               R p_g = u,

I The problem has a quadratic inequality constraint
I The constraint function has a Lipschitz continuous gradient since the
  constraint sets are compact
I The relaxed problem solves the original problem

Theorem
[Relaxation] Suppose that ∇C_i(p_gi) ≥ 0 for all p_gi ∈ P_i, that
∇C_i(p_gi) > 0 for all p_gi ∈ relint(P_i), and that (u*, p_g*) is the optimal
solution of the above problem. Then p_g* is also the optimal solution of the
original EDP (inequality).
40
Application to the EDP (IV/IV)

Sub-optimal relaxed modified EDP:

    min_{u_i ∈ U, p_g ∈ P, i = 1, …, N} Σ_{i=1}^N C_i(p_gi) + τ Σ_{i=1}^N u_i²,
    subject to Σ_{i=1}^N u_i² + Σ_{i=1}^N p_Li − Σ_{i=1}^N p_gi ≤ 0,
               R p_g = u,

I The problem has the same formulation as the one discussed before, so
  Algorithm 3 can be used to solve it
41
Simulation results (I/IV)

Figure: The undirected communication graph of thirty agents.

42
Simulation results (II/IV)

Figure: Trajectories of primal variables pgi (k).

In a centralized way: p_g1* = 5, p_g2* = 9.0797, p_g3* = 18.6667,
p_g4* = 16.1137, p_g5* = 10, and p_g6* = 8 MW.
43
Simulation results (III/IV)

Figure: Trajectories of the dual error |λ_i(k) − λ*|, where λ is the Lagrangian
multiplier corresponding to the inequality constraint.

44
Simulation results (IV/IV)

Figure: Trajectories of the function value C(p_g(k)) and the coupled
constraint (two panels: cost function value; supply-demand balance).

45
This dissertation

Presentation:
I Chapter 3 X
I Chapter 4 X
I Chapter 2
46
Problem formulation

    min_{x ∈ X} f(x) = Σ_{i=1}^N f_i(x_i),

I f_i is m-strongly convex and L-smooth
I Motivation: accelerate convergence

47
Related methods: centralized
I Nesterov's Fast Gradient Method (FGM) is widely employed:

    x^{k+1} = z^k − α ( (1/n) Σ_{i=1}^n ∇f_i(z^k) )
    z^{k+1} = x^{k+1} + β(x^{k+1} − x^k)

I The FGM uses α = 1/L and β = (√L − √m)/(√L + √m).
I The rate can be improved by using the triple momentum (TM) method:

    x^{k+1} = s^k − α ( (1/n) Σ_{i=1}^n ∇f_i(z^k) )
    s^{k+1} = x^{k+1} + β(x^{k+1} − x^k)
    z^{k+1} = x^{k+1} + γ(x^{k+1} − x^k)

  where the step sizes are chosen such that

    (α, β, γ) = ( (1 + ρ)/L, ρ²/(2 − ρ), ρ²/((1 + ρ)(2 − ρ)) )

  and ρ = 1 − 1/√(L/m).
48
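The acceleration from momentum can be seen on a small example. The sketch below compares plain gradient descent and FGM (with the α and β quoted above) on a made-up strongly convex quadratic with m = 1 and L = 100:

```python
import math

# Hedged comparison of gradient descent vs. Nesterov's FGM on the made-up
# quadratic f(x) = (m*x1^2 + L*x2^2)/2 with m = 1, L = 100 (minimizer 0).
m, L = 1.0, 100.0
grad = lambda x: [m * x[0], L * x[1]]

def gd(iters):
    x = [1.0, 1.0]
    for _ in range(iters):
        g = grad(x)
        x = [x[0] - (1 / L) * g[0], x[1] - (1 / L) * g[1]]  # step 1/L
    return x

def fgm(iters):
    alpha = 1 / L
    beta = (math.sqrt(L) - math.sqrt(m)) / (math.sqrt(L) + math.sqrt(m))
    x = z = [1.0, 1.0]
    for _ in range(iters):
        g = grad(z)
        x_new = [z[0] - alpha * g[0], z[1] - alpha * g[1]]
        # momentum extrapolation z^{k+1} = x^{k+1} + beta (x^{k+1} - x^k)
        z = [x_new[0] + beta * (x_new[0] - x[0]),
             x_new[1] + beta * (x_new[1] - x[1])]
        x = x_new
    return x

err = lambda x: max(abs(x[0]), abs(x[1]))
# after the same iteration budget, FGM is far closer to the minimizer
```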
Proposed methods
TM-like gradient tracking algorithm:

    x_i^{k+1} = Σ_{j=1}^n a_ij s_j^k − α_i y_i^k
    s_i^{k+1} = x_i^{k+1} + β_i Σ_{j=1}^n a_ij (x_j^{k+1} − x_j^k)
    z_i^{k+1} = x_i^{k+1} + γ_i Σ_{j=1}^n a_ij (x_j^{k+1} − x_j^k)
    y_i^{k+1} = Σ_{j=1}^n b_ij y_j^k + ∇f_i(z_i^{k+1}) − ∇f_i(z_i^k)

I Consists of three step sizes: α, β, γ
I Information is exchanged twice per iteration:
  I before updating x_i^k
  I after updating x_i^k
49
Convergence analysis

I Show that

    [ ‖x^{k+1} − A^∞ x^{k+1}‖_A  ]       [ ‖x^k − A^∞ x^k‖_A  ]
    [ ‖A^∞ x^{k+1} − 1_n ⊗ x*‖_2 ]  ≤ Ω  [ ‖A^∞ x^k − 1_n ⊗ x*‖_2 ]
    [ ‖x^{k+1} − x^k‖_2          ]       [ ‖x^k − x^{k−1}‖_2  ]
    [ ‖y^{k+1} − B^∞ y^{k+1}‖_B  ]       [ ‖y^k − B^∞ y^k‖_B  ]

I where

        [ π_A + h_A u_1 ᾱ                 u_1 ᾱ            u_1 ᾱγ̄ + u_2 β̄     u_3 ᾱ     ]
    Ω = [ u_5 ᾱ                           λ                u_6 ᾱγ̄ + u_7 β̄     h_B u_7 ᾱ ]
        [ u_8 + u_9 ᾱ                     u_9 ᾱ            u_10 ᾱγ̄ + u_11 β̄   h_B ᾱ     ]
        [ u_13(1 + γ̄) + u_14 ᾱ(1 + γ̄)    u_15 ᾱ(1 + γ̄)    ξ_1                ξ_2       ]

I The analysis procedure is similar to that of the methods proposed earlier
50
Application to the EDP

Problem formulation:

    min_{p_gi, i = 1, …, n} Σ_{i=1}^n C_i(p_gi)
    subject to Σ_{i=1}^n p_gi = Σ_{i=1}^n p_Li,
               p_gi ∈ P_i = {p_gi^min ≤ p_gi ≤ p_gi^max}.

Dual problem:

    min_{λ ∈ R} g(λ) = Σ_{i=1}^n g_i(λ)
51
Simulation result (I/III)

Figure: Randomly generated directed graph of 30 agents.

52
Simulation result (II/III)

Figure: The sum of residuals at each agent, Σ_{i=1}^n ‖p_gi(k) − p_gi*‖,
over 200 iterations.

53
Simulation result (III/III)


Figure: Performance comparisons between the proposed method and the methods
based on gradient-tracking methods (in log scale).

54
Summary
I Three different algorithms are proposed for different objectives
I A linear combination of a projected vector and a non-projected vector is
  proposed to attain linear convergence for the constrained problem.
I Strong convexity is important in this class of distributed algorithms in
  order to employ a fixed step size

Future research topics:
I In this dissertation, the communication is assumed to be synchronous.
  In practice, this assumption may not hold.
I Linearly convergent algorithms for a constrained problem with
  uncoordinated step sizes
I Utilizing a constant step size for a constrained problem with coupled
  constraints under directed graphs
I Constrained problems subject to local constraint sets
55
Thank you

References:
[2] K. I. Tsianos, S. Lawlor, M. G. Rabbat, "Push-sum distributed dual
averaging for convex optimization," 2013.
[4] C. Xi, U. A. Khan, "Distributed subgradient projection algorithm over
directed graphs," IEEE Transactions on Automatic Control 62 (8), 2017.
[7] V. S. Mai, E. H. Abed, "Distributed optimization over directed graphs with
row stochasticity and constraint regularity," Automatica 102, 2019.
[8] H. Liu, W. Yu, G. Chen, "Discrete-time algorithms for distributed
constrained convex optimization with linear convergence rates," IEEE
Transactions on Cybernetics, 2020.
[9] Z. Dong, S. Mao, W. Du, Y. Tang, "Distributed constrained optimization
with linear convergence rate," 2020 IEEE 16th International Conference on
Control and Automation (ICCA), 2020.
56
