
Distributed Optimization via

Alternating Direction Method of Multipliers


Stephen Boyd, Neal Parikh, Eric Chu, Borja Peleato
Stanford University
ITMANET, Stanford, January 2011
Outline
precursors
dual decomposition
method of multipliers
alternating direction method of multipliers
applications/examples
conclusions/big picture
Dual problem
convex equality constrained optimization problem
minimize f(x)
subject to Ax = b
Lagrangian: L(x, y) = f(x) + y^T (Ax − b)
dual function: g(y) = inf_x L(x, y)
dual problem: maximize g(y)
recover x⋆ = argmin_x L(x, y⋆)
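As a worked example (not on the slides), take f(x) = (1/2)‖x‖_2^2; the dual can then be computed in closed form:

```latex
% dual of:  minimize (1/2)||x||_2^2  subject to  Ax = b
% the x-minimization in g(y) = inf_x L(x, y) gives x = -A^T y, so
\[
  g(y) = \inf_x \left( \frac{1}{2}\|x\|_2^2 + y^T (Ax - b) \right)
       = -\frac{1}{2}\, y^T A A^T y - b^T y ,
\]
% an unconstrained concave quadratic in y; maximizing it and setting
% x^\star = -A^T y^\star recovers the primal solution.
```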
Dual ascent
gradient method for dual problem: y^{k+1} = y^k + α^k ∇g(y^k)
∇g(y^k) = A x̃ − b, where x̃ = argmin_x L(x, y^k)
dual ascent method is
x^{k+1} := argmin_x L(x, y^k) // x-minimization
y^{k+1} := y^k + α^k (Ax^{k+1} − b) // dual update
works, with lots of strong assumptions
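A minimal numerical sketch of dual ascent (not from the slides) for a strictly convex quadratic f(x) = (1/2) x^T P x + q^T x, where the x-minimization has a closed form; the data, the step size rule, and the iteration count below are illustrative assumptions:

```python
import numpy as np

def dual_ascent(P, q, A, b, alpha, iters=200):
    """Dual ascent for:  minimize (1/2) x^T P x + q^T x  subject to  Ax = b."""
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        # x-minimization: solve grad_x L = P x + q + A^T y = 0
        x = np.linalg.solve(P, -(q + A.T @ y))
        # dual update: gradient ascent on g(y) with step size alpha
        y = y + alpha * (A @ x - b)
    return x, y

# illustrative data (assumed): strictly convex quadratic, random constraints
rng = np.random.default_rng(0)
A, b = rng.standard_normal((3, 10)), rng.standard_normal(3)
P, q = np.eye(10), rng.standard_normal(10)
alpha = 1.0 / np.linalg.norm(A @ A.T, 2)   # safe step size for this quadratic
x, y = dual_ascent(P, q, A, b, alpha)
print(np.linalg.norm(A @ x - b))           # primal residual, should be near zero
```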
Dual decomposition
suppose f is separable:
f(x) = f_1(x_1) + · · · + f_N(x_N), x = (x_1, . . . , x_N)
then L is separable in x: L(x, y) = L_1(x_1, y) + · · · + L_N(x_N, y) − y^T b,
L_i(x_i, y) = f_i(x_i) + y^T A_i x_i
x-minimization in dual ascent splits into N separate minimizations
x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k)
which can be carried out in parallel
Dual decomposition
dual decomposition (Everett, Dantzig, Wolfe, Benders 1960–65)
x_i^{k+1} := argmin_{x_i} L_i(x_i, y^k), i = 1, . . . , N
y^{k+1} := y^k + α^k ( ∑_{i=1}^N A_i x_i^{k+1} − b )
scatter y^k; update x_i in parallel; gather A_i x_i^{k+1}
solve a large problem by iteratively solving subproblems (in parallel)
dual variable update provides coordination
works, with lots of assumptions; often slow
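A sketch of one dual decomposition iteration for the separable case (the per-block solver interface `argmin_i` is an assumption, not part of the slides); the x-updates are independent, so in practice each would run on its own processor:

```python
import numpy as np

def dual_decomposition_step(blocks, y, alpha, b):
    """One iteration of dual decomposition.

    `blocks` is a list of (argmin_i, A_i) pairs, where argmin_i(y) is a
    hypothetical solver returning argmin_{x_i} f_i(x_i) + y^T A_i x_i.
    """
    # scatter y, then update each x_i (this loop is embarrassingly parallel)
    xs = [argmin_i(y) for argmin_i, _ in blocks]
    # gather the A_i x_i terms and take a dual gradient step
    residual = sum(A_i @ x_i for (_, A_i), x_i in zip(blocks, xs)) - b
    return xs, y + alpha * residual
```

Only y and the gathered residual are communicated; each block keeps its own data and variables local.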
Method of multipliers
a method to robustify dual ascent
use augmented Lagrangian (Hestenes, Powell 1969), ρ > 0
L_ρ(x, y) = f(x) + y^T (Ax − b) + (ρ/2)‖Ax − b‖_2^2
method of multipliers (Hestenes, Powell; analysis in Bertsekas 1982)
x^{k+1} := argmin_x L_ρ(x, y^k)
y^{k+1} := y^k + ρ (Ax^{k+1} − b)
(note specific dual update step length ρ)
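The same quadratic test problem as in the dual ascent sketch, now with the augmented Lagrangian; the closed-form x-update below is specific to that assumed quadratic f:

```python
import numpy as np

def method_of_multipliers(P, q, A, b, rho=1.0, iters=100):
    """Method of multipliers for:  minimize (1/2) x^T P x + q^T x  s.t.  Ax = b."""
    y = np.zeros(A.shape[0])
    for _ in range(iters):
        # x-update: solve grad_x L_rho = P x + q + A^T y + rho A^T (A x - b) = 0
        x = np.linalg.solve(P + rho * A.T @ A, rho * A.T @ b - q - A.T @ y)
        # dual update with step length exactly rho
        y = y + rho * (A @ x - b)
    return x, y
```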
Method of multipliers
good news: converges under much more relaxed conditions
(f can be nondifferentiable, take on value +∞, . . . )
bad news: quadratic penalty destroys splitting of the x-update, so can't do decomposition
Alternating direction method of multipliers
a method
with good robustness of method of multipliers
which can support decomposition
robust dual decomposition or decomposable method of multipliers
proposed by Gabay, Mercier, Glowinski, Marrocco in 1976
Alternating direction method of multipliers
ADMM problem form (with f, g convex)
minimize f(x) + g(z)
subject to Ax + Bz = c
two sets of variables, with separable objective
L_ρ(x, z, y) = f(x) + g(z) + y^T (Ax + Bz − c) + (ρ/2)‖Ax + Bz − c‖_2^2
ADMM:
x^{k+1} := argmin_x L_ρ(x, z^k, y^k) // x-minimization
z^{k+1} := argmin_z L_ρ(x^{k+1}, z, y^k) // z-minimization
y^{k+1} := y^k + ρ (Ax^{k+1} + Bz^{k+1} − c) // dual update
Alternating direction method of multipliers
if we minimized over x and z jointly, reduces to method of multipliers
instead, we do one pass of a Gauss-Seidel method
we get splitting since we minimize over x with z fixed, and vice versa
Convergence
assume (very little!)
f, g convex, closed, proper
L_0 has a saddle point
then ADMM converges:
iterates approach feasibility: Ax^k + Bz^k − c → 0
objective approaches optimal value: f(x^k) + g(z^k) → p⋆
Related algorithms
operator splitting methods
(Douglas, Peaceman, Rachford, Lions, Mercier, . . . 1950s, 1979)
proximal point algorithm (Rockafellar 1976)
Dykstra's alternating projections algorithm (1983)
Spingarn's method of partial inverses (1985)
Rockafellar-Wets progressive hedging (1991)
proximal methods (Rockafellar, many others, 1976–present)
Bregman iterative methods (2008–present)
most of these are special cases of the proximal point algorithm
Consensus optimization
want to solve problem with N objective terms
minimize ∑_{i=1}^N f_i(x)
e.g., f_i is the loss function for ith block of training data
ADMM form:
minimize ∑_{i=1}^N f_i(x_i)
subject to x_i − z = 0
x_i are local variables
z is the global variable
x_i − z = 0 are consistency or consensus constraints
Consensus optimization via ADMM
L_ρ(x, z, y) = ∑_{i=1}^N ( f_i(x_i) + y_i^T (x_i − z) + (ρ/2)‖x_i − z‖_2^2 )
ADMM:
x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − z^k) + (ρ/2)‖x_i − z^k‖_2^2 )
z^{k+1} := (1/N) ∑_{i=1}^N ( x_i^{k+1} + (1/ρ) y_i^k )
y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − z^{k+1})
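A sketch of these consensus updates; the assumed per-block interface `xmins[i](v, rho)` returns argmin_{x_i} f_i(x_i) + (ρ/2)‖x_i − v‖², which matches the x_i-update above once the linear term is folded in (v = z^k − (1/ρ)y_i^k):

```python
import numpy as np

def consensus_admm(xmins, n, rho=1.0, iters=100):
    """Global consensus ADMM for  minimize sum_i f_i(x)  via local copies x_i."""
    N = len(xmins)
    z = np.zeros(n)
    ys = [np.zeros(n) for _ in range(N)]
    for _ in range(iters):
        # x-updates: independent across blocks, so they can run in parallel
        xs = [xmins[i](z - ys[i] / rho, rho) for i in range(N)]
        # z-update: average of x_i + (1/rho) y_i
        z = sum(x + y / rho for x, y in zip(xs, ys)) / N
        # dual updates
        ys = [y + rho * (x - z) for x, y in zip(xs, ys)]
    return xs, z, ys
```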
Consensus optimization via ADMM
using ∑_{i=1}^N y_i^k = 0, algorithm simplifies to
x_i^{k+1} := argmin_{x_i} ( f_i(x_i) + y_i^{kT} (x_i − x̄^k) + (ρ/2)‖x_i − x̄^k‖_2^2 )
y_i^{k+1} := y_i^k + ρ (x_i^{k+1} − x̄^{k+1})
where x̄^k = (1/N) ∑_{i=1}^N x_i^k
in each iteration
gather x_i^k and average to get x̄^k
scatter the average x̄^k to processors
update y_i^k locally (in each processor, in parallel)
update x_i locally
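The simplified form as a gather/scatter loop (a serial stand-in for the parallel processors, using the same assumed local solver interface as above):

```python
import numpy as np

def consensus_admm_averaged(xmins, n, rho=1.0, iters=100):
    """Simplified consensus ADMM: only the average xbar is communicated."""
    N = len(xmins)
    xs = [np.zeros(n) for _ in range(N)]
    ys = [np.zeros(n) for _ in range(N)]
    for _ in range(iters):
        xbar = sum(xs) / N                                            # gather and average
        # after scattering xbar, each processor updates locally:
        ys = [y + rho * (x - xbar) for x, y in zip(xs, ys)]           # dual update
        xs = [xmins[i](xbar - ys[i] / rho, rho) for i in range(N)]    # x-update
    return sum(xs) / N
```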
Statistical interpretation
f_i is negative log-likelihood for parameter x given ith data block
x_i^{k+1} is MAP estimate under prior N(x̄^k + (1/ρ) y_i^k, ρ^{−1} I)
prior mean is previous iteration's consensus shifted by price of
processor i disagreeing with previous consensus
processors only need to support a Gaussian MAP method
type or number of data in each block not relevant
consensus protocol yields global maximum-likelihood estimate
Consensus classification
data (examples) (a_i, b_i), i = 1, . . . , N, a_i ∈ R^n, b_i ∈ {−1, +1}
linear classifier sign(a^T w + v), with weight w, offset v
margin for ith example is b_i (a_i^T w + v); want margin to be positive
loss for ith example is l(b_i (a_i^T w + v))
l is loss function (hinge, logistic, probit, exponential, . . . )
choose w, v to minimize
(1/N) ∑_{i=1}^N l(b_i (a_i^T w + v)) + r(w)
r(w) is regularization term (ℓ_2, ℓ_1, . . . )
split data and use ADMM consensus to solve
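A sketch of a local solver that plugs into the consensus routines above, using the logistic loss (one of the losses listed) so a generic smooth solver applies; the block data names and the use of scipy.optimize.minimize are assumptions for illustration, and r(w) is omitted here (it could be split across the local objectives):

```python
import numpy as np
from scipy.optimize import minimize

def make_local_solver(A_block, b_block):
    """Local x-update for one data block: x = (w, v) stacks weights and offset."""
    def xmin(v_center, rho):
        def obj(x):
            w, v = x[:-1], x[-1]
            margins = b_block * (A_block @ w + v)
            loss = np.mean(np.logaddexp(0.0, -margins))   # logistic loss
            return loss + (rho / 2.0) * np.sum((x - v_center) ** 2)
        return minimize(obj, v_center).x
    return xmin

# usage sketch: split the examples into blocks, build one solver per block,
# and pass the list to the consensus ADMM routine above; the consensus
# variable z then converges to the shared classifier (w, v).
```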
Consensus SVM example
hinge loss l(u) = (1 − u)_+ with ℓ_2 regularization
baby problem with n = 2, N = 400 to illustrate
examples split into 20 groups, in worst possible way:
each group contains only positive or negative examples
Iteration 1
[plot omitted: consensus SVM example after iteration 1; horizontal axis −3 to 3, vertical axis −10 to 10]
Iteration 5
[plot omitted: consensus SVM example after iteration 5; horizontal axis −3 to 3, vertical axis −10 to 10]
Iteration 40
[plot omitted: consensus SVM example after iteration 40; horizontal axis −3 to 3, vertical axis −10 to 10]
Reference
Distributed Optimization and Statistical Learning via the Alternating
Direction Method of Multipliers (Boyd, Parikh, Chu, Peleato, Eckstein)
available at Boyd web site