Convex Optimization I - Boyd, Slides, 2009
1. Introduction
mathematical optimization
least-squares and linear programming
convex optimization
example
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization

(mathematical) optimization problem

    minimize   f0(x)
    subject to fi(x) ≤ bi,  i = 1, . . . , m
Examples
portfolio optimization
variables: amounts invested in different assets
Least-squares

    minimize ‖Ax − b‖₂²

solving least-squares problems
analytical solution: x* = (AᵀA)⁻¹Aᵀb
reliable and efficient algorithms and software
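a minimal numerical sketch of the analytical solution (assuming NumPy; np.linalg.lstsq is better conditioned than forming (AᵀA)⁻¹ explicitly):

    import numpy as np

    # random overdetermined system: A is k x n with k >= n (illustrative data)
    rng = np.random.default_rng(0)
    A = rng.standard_normal((100, 10))
    b = rng.standard_normal(100)

    # least-squares solution x* = argmin ||Ax - b||_2^2
    x, *_ = np.linalg.lstsq(A, b, rcond=None)

    # agrees with the analytical formula x* = (A'A)^{-1} A'b
    x_normal = np.linalg.solve(A.T @ A, A.T @ b)
    assert np.allclose(x, x_normal)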
Linear programming

    minimize   cᵀx
    subject to aᵢᵀx ≤ bi,  i = 1, . . . , m
Convex optimization

    minimize   f0(x)
    subject to fi(x) ≤ bi,  i = 1, . . . , m

objective and constraint functions are convex:
    fi(αx + βy) ≤ αfi(x) + βfi(y)   if α + β = 1, α ≥ 0, β ≥ 0
includes least-squares problems and linear programs as special cases
Example

m lamps illuminating n (small, flat) patches

illumination Ik at patch k depends linearly on lamp powers pj:
    Ik = Σ_{j=1}^m akj pj,        akj = rkj⁻² max{cos θkj, 0}
(rkj is the distance and θkj the angle between lamp j and patch k)

problem: achieve desired illumination Ides with bounded lamp powers
    minimize   max_k |log Ik − log Ides|
    subject to 0 ≤ pj ≤ pmax,  j = 1, . . . , m
how to solve?
1. use uniform power: pj = p, vary p
2. use least-squares:
    minimize Σ_{k=1}^n (Ik − Ides)²
   round pj if pj > pmax or pj < 0
Nonlinear optimization
traditional techniques for general nonconvex problems involve compromises
local optimization methods (nonlinear programming)
find a point that minimizes f0 among feasible points near it
fast, can handle large problems
require initial guess
provide no information about distance to (global) optimum
global optimization methods
find the (global) solution
worst-case complexity grows exponentially with problem size
these algorithms are often based on solving convex subproblems
2. Convex sets
affine and convex sets
some important examples
operations that preserve convexity
generalized inequalities
separating and supporting hyperplanes
dual cones and generalized inequalities
Affine set

line through x1, x2: all points
    x = θx1 + (1 − θ)x2    (θ ∈ R)
(figure: points on the line through x1 and x2 at θ = 1.2, 1, 0.6, 0, −0.2)
affine set: contains the line through any two distinct points in the set
example: solution set of linear equations {x | Ax = b}
(conversely, every affine set can be expressed as solution set of system of
linear equations)
Convex set

line segment between x1 and x2: all points
    x = θx1 + (1 − θ)x2
with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set
    x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx1 + (1 − θ)x2 ∈ C
Convex cone

conic (nonnegative) combination of x1 and x2: any point of the form
    x = θ1x1 + θ2x2
with θ1 ≥ 0, θ2 ≥ 0

convex cone: set that contains all conic combinations of points in the set
Hyperplanes and halfspaces

hyperplane: set of the form {x | aᵀx = b} (a ≠ 0)
halfspace: set of the form {x | aᵀx ≤ b} (a ≠ 0)
a is the normal vector; hyperplanes are affine and convex; halfspaces are convex
Euclidean balls and ellipsoids

(Euclidean) ball with center xc and radius r:
    B(xc, r) = {x | ‖x − xc‖₂ ≤ r}
ellipsoid: set of the form
    {x | (x − xc)ᵀP⁻¹(x − xc) ≤ 1}
with P symmetric positive definite

Norm balls and norm cones

norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}
norm cone: {(x, t) | ‖x‖ ≤ t}
norm balls and cones are convex
Polyhedra

solution set of finitely many linear inequalities and equalities
    Ax ⪯ b,   Cx = d
(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)
a polyhedron is the intersection of a finite number of halfspaces and hyperplanes
(figure: polyhedron with outward normals a1, . . . , a5)
Positive semidefinite cone

Sⁿ₊ = {X ∈ Sⁿ | X ⪰ 0}: positive semidefinite n × n matrices
    X ∈ Sⁿ₊  ⟺  zᵀXz ≥ 0 for all z
Sⁿ₊ is a convex cone

example:
    [x y; y z] ∈ S²₊
(figure: boundary of S²₊ plotted over (x, y, z))
Operations that preserve convexity

practical methods for establishing convexity of a set C:
1. apply definition
    x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⟹  θx1 + (1 − θ)x2 ∈ C
2. show that C is obtained from simple convex sets by operations that preserve convexity:
intersection
affine functions
perspective function
linear-fractional functions
Intersection

the intersection of (any number of) convex sets is convex

example:
    S = {x ∈ R² | |p(t)| ≤ 1 for |t| ≤ π/3},   p(t) = x1 cos t + x2 cos 2t
(figures: the functions p(t) for |t| ≤ π/3, and the set S)
Affine function

suppose f : Rⁿ → Rᵐ is affine (f(x) = Ax + b with A ∈ R^{m×n}, b ∈ Rᵐ)

the image of a convex set under f is convex
    S ⊆ Rⁿ convex  ⟹  f(S) = {f(x) | x ∈ S} convex
the inverse image f⁻¹(C) of a convex set C under f is convex

examples
scaling, translation, projection
Perspective and linear-fractional function

perspective function P : Rⁿ⁺¹ → Rⁿ:
    P(x, t) = x/t,   dom P = {(x, t) | t > 0}
images and inverse images of convex sets under perspective are convex

linear-fractional function f : Rⁿ → Rᵐ:
    f(x) = (Ax + b)/(cᵀx + d),   dom f = {x | cᵀx + d > 0}
images and inverse images of convex sets under linear-fractional functions are convex

example of a linear-fractional function:
    f(x) = x/(x1 + x2 + 1)
(figure: a convex set C and its image f(C))
Generalized inequalities

a convex cone K ⊆ Rⁿ is a proper cone if
K is closed (contains its boundary)
K is solid (has nonempty interior)
K is pointed (contains no line)

examples
nonnegative orthant K = Rⁿ₊ = {x ∈ Rⁿ | xi ≥ 0, i = 1, . . . , n}
positive semidefinite cone K = Sⁿ₊
generalized inequality defined by a proper cone K:
    x ⪯_K y  ⟺  y − x ∈ K,        x ≺_K y  ⟺  y − x ∈ int K

examples
componentwise inequality (K = Rⁿ₊)
    x ⪯ y  ⟺  xi ≤ yi,  i = 1, . . . , n
matrix inequality (K = Sⁿ₊)
    X ⪯ Y  ⟺  Y − X positive semidefinite

properties: many properties of ⪯_K are similar to ≤ on R, e.g.,
    x ⪯_K y,  u ⪯_K v  ⟹  x + u ⪯_K y + v
Minimum and minimal elements

⪯_K is not in general a linear ordering: we can have x ⋠_K y and y ⋠_K x

x ∈ S is the minimum element of S with respect to ⪯_K if
    y ∈ S  ⟹  x ⪯_K y
x ∈ S is a minimal element of S with respect to ⪯_K if
    y ∈ S, y ⪯_K x  ⟹  y = x

example (K = R²₊)
x1 is the minimum element of S1
x2 is a minimal element of S2
(figure: sets S1 and S2)
Separating hyperplane theorem

if C and D are disjoint convex sets, then there exists a ≠ 0, b such that
    aᵀx ≤ b for x ∈ C,        aᵀx ≥ b for x ∈ D
the hyperplane {x | aᵀx = b} separates C and D
strict separation requires additional assumptions (e.g., C is closed, D is a singleton)

Supporting hyperplane theorem

supporting hyperplane to set C at boundary point x0:
    {x | aᵀx = aᵀx0}
where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C
supporting hyperplane theorem: if C is convex, then there exists a supporting hyperplane at every boundary point of C
Dual cones and generalized inequalities

dual cone of a cone K:
    K* = {y | yᵀx ≥ 0 for all x ∈ K}
examples
K = Rⁿ₊: K* = Rⁿ₊
K = Sⁿ₊: K* = Sⁿ₊
(both are self-dual)

dual cones of proper cones are proper, hence define generalized inequalities:
    y ⪰_{K*} 0  ⟺  yᵀx ≥ 0 for all x ⪰_K 0
(figure: minimum and minimal elements of a set S via dual inequalities)
example: optimal production frontier
different production methods use different amounts of resources x ∈ Rⁿ
production set P: resource vectors x for all possible production methods
efficient (Pareto optimal) methods correspond to resource vectors x that are minimal w.r.t. Rⁿ₊

example (n = 2): x1, x2, x3 are efficient; x4, x5 are not
(figure: production set P with fuel and labor on the axes)
3. Convex functions
basic properties and examples
operations that preserve convexity
the conjugate function
quasiconvex functions
log-concave and log-convex functions
convexity with respect to generalized inequalities
Definition

f : Rⁿ → R is convex if dom f is a convex set and
    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)
for all x, y ∈ dom f, 0 ≤ θ ≤ 1
(figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph of f)

f is concave if −f is convex
Examples on R

convex:
affine: ax + b on R, for any a, b ∈ R
exponential: e^{ax}, for any a ∈ R
powers: x^α on R₊₊, for α ≥ 1 or α ≤ 0
powers of absolute value: |x|^p on R, for p ≥ 1
negative entropy: x log x on R₊₊

concave:
affine: ax + b on R, for any a, b ∈ R
powers: x^α on R₊₊, for 0 ≤ α ≤ 1
logarithm: log x on R₊₊
Examples on Rⁿ and R^{m×n}

affine functions are convex and concave; all norms are convex
example on R^{m×n} (m × n matrices):
    f(X) = tr(AᵀX) + b = Σ_{i=1}^m Σ_{j=1}^n Aij Xij + b

Restriction of a convex function to a line

f : Rⁿ → R is convex if and only if the function g : R → R,
    g(t) = f(x + tv),   dom g = {t | x + tv ∈ dom f}
is convex (in t) for any x ∈ dom f, v ∈ Rⁿ
can check convexity of f by checking convexity of functions of one variable
Extended-value extension

extended-value extension f̃ of f is
    f̃(x) = f(x),  x ∈ dom f,        f̃(x) = ∞,  x ∉ dom f
First-order condition

f is differentiable if dom f is open and the gradient
    ∇f(x) = (∂f(x)/∂x1, ∂f(x)/∂x2, . . . , ∂f(x)/∂xn)
exists at each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex iff
    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   for all x, y ∈ dom f
(figure: the first-order approximation at (x, f(x)) is a global underestimator of f)
Second-order conditions

f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ Sⁿ,
    ∇²f(x)ij = ∂²f(x)/∂xi∂xj,   i, j = 1, . . . , n,
exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain
f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples

quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ Sⁿ)
    ∇f(x) = Px + q,        ∇²f(x) = P
convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖₂²
    ∇²f(x) = 2AᵀA
convex (for any A)

quadratic-over-linear: f(x, y) = x²/y
    ∇²f(x, y) = (2/y³) [y; −x][y; −x]ᵀ ⪰ 0
convex for y > 0
(figure: graph of f(x, y) = x²/y)
log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex
    ∇²f(x) = (1/1ᵀz) diag(z) − (1/(1ᵀz)²) zzᵀ        (zk = exp xk)
to show ∇²f(x) ⪰ 0, we must verify that vᵀ∇²f(x)v ≥ 0 for all v:
    vᵀ∇²f(x)v = [ (Σ_k zk vk²)(Σ_k zk) − (Σ_k vk zk)² ] / (Σ_k zk)² ≥ 0
since (Σ_k vk zk)² ≤ (Σ_k zk vk²)(Σ_k zk) (from Cauchy-Schwarz inequality)
geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on Rⁿ₊₊ is concave
(similar proof as for log-sum-exp)
Jensen's inequality

basic inequality: if f is convex, then for 0 ≤ θ ≤ 1,
    f(θx + (1 − θ)y) ≤ θf(x) + (1 − θ)f(y)

extension: if f is convex, then
    f(E z) ≤ E f(z)
for any random variable z

basic inequality is special case with discrete distribution
    prob(z = x) = θ,        prob(z = y) = 1 − θ
Positive weighted sum and composition with affine function

nonnegative multiple: αf is convex if f is convex, α ≥ 0
sum: f1 + f2 convex if f1, f2 convex
composition with affine function: f(Ax + b) is convex if f is convex

examples
log barrier for linear inequalities
    f(x) = −Σ_{i=1}^m log(bi − aᵢᵀx)
(any) norm of affine function: f(x) = ‖Ax + b‖
Pointwise maximum

if f1, . . . , fm are convex, then f(x) = max{f1(x), . . . , fm(x)} is convex

examples
piecewise-linear function: f(x) = max_{i=1,...,m} (aᵢᵀx + bi) is convex
Pointwise supremum

if f(x, y) is convex in x for each y ∈ A, then
    g(x) = sup_{y∈A} f(x, y)
is convex

examples
support function of a set C: S_C(x) = sup_{y∈C} yᵀx is convex
distance to farthest point in a set C:
    f(x) = sup_{y∈C} ‖x − y‖
Vector composition

composition of g : Rⁿ → Rᵏ and h : Rᵏ → R:
    f(x) = h(g(x)) = h(g1(x), g2(x), . . . , gk(x))

f is convex if
gi convex, h convex, h̃ nondecreasing in each argument
gi concave, h convex, h̃ nonincreasing in each argument

proof (for n = 1, differentiable g, h)
    f″(x) = g′(x)ᵀ ∇²h(g(x)) g′(x) + ∇h(g(x))ᵀ g″(x)

examples
Σ_{i=1}^m log gi(x) is concave if gi are concave and positive
log Σ_{i=1}^m exp gi(x) is convex if gi are convex
Minimization

if f(x, y) is convex in (x, y) and C is a convex set, then
    g(x) = inf_{y∈C} f(x, y)
is convex

examples
f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with
    [A  B; Bᵀ  C] ⪰ 0,   C ≻ 0
minimizing over y gives g(x) = xᵀ(A − BC⁻¹Bᵀ)x
g is convex, hence the Schur complement A − BC⁻¹Bᵀ ⪰ 0
Perspective

the perspective of a function f : Rⁿ → R is the function g : Rⁿ × R → R,
    g(x, t) = tf(x/t),   dom g = {(x, t) | x/t ∈ dom f, t > 0}
g is convex if f is convex

examples
f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
The conjugate function

the conjugate of a function f is
    f*(y) = sup_{x∈dom f} (yᵀx − f(x))
(figure: f*(y) is the maximum gap between the linear function xy and f(x); the tangent line with slope y has intercept (0, −f*(y)))

f* is convex (even if f is not)
examples
negative logarithm f(x) = −log x
    f*(y) = sup_{x>0} (xy + log x)
          = −1 − log(−y)  if y < 0,   ∞ otherwise

strictly convex quadratic f(x) = (1/2)xᵀQx with Q ∈ Sⁿ₊₊
    f*(y) = sup_x (yᵀx − (1/2)xᵀQx) = (1/2) yᵀQ⁻¹y
Quasiconvex functions

f : Rⁿ → R is quasiconvex if dom f is convex and the sublevel sets
    S_α = {x ∈ dom f | f(x) ≤ α}
are convex for all α

f is quasiconcave if −f is quasiconvex
f is quasilinear if it is quasiconvex and quasiconcave
Examples

√|x| is quasiconvex on R
linear-fractional function
    f(x) = (aᵀx + b)/(cᵀx + d),   dom f = {x | cᵀx + d > 0}
is quasilinear
distance ratio
    f(x) = ‖x − a‖₂/‖x − b‖₂,   dom f = {x | ‖x − a‖₂ ≤ ‖x − b‖₂}
is quasiconvex
internal rate of return

cash flow x = (x0, . . . , xn); xi is payment in period i (to us if xi > 0)
present value of cash flow x, for interest rate r:
    PV(x, r) = Σ_{i=0}^n (1 + r)^{−i} xi
internal rate of return is smallest interest rate for which PV(x, r) = 0
IRR is quasiconcave: superlevel set is an intersection of halfspaces,
    IRR(x) ≥ R  ⟺  Σ_{i=0}^n (1 + r)^{−i} xi ≥ 0 for 0 ≤ r ≤ R
Properties

modified Jensen inequality: for quasiconvex f
    0 ≤ θ ≤ 1  ⟹  f(θx + (1 − θ)y) ≤ max{f(x), f(y)}

first-order condition: differentiable f with convex domain is quasiconvex iff
    f(y) ≤ f(x)  ⟹  ∇f(x)ᵀ(y − x) ≤ 0

sums of quasiconvex functions are not necessarily quasiconvex
Log-concave and log-convex functions

a positive function f is log-concave if log f is concave:
    f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}   for 0 ≤ θ ≤ 1
f is log-convex if log f is convex

example: normal density is log-concave
    f(x) = (2π)^{−n/2} (det Σ)^{−1/2} e^{−(1/2)(x − x̄)ᵀΣ⁻¹(x − x̄)}
consequences of integration property

if f(x, y) is log-concave in (x, y), then
    g(x) = ∫ f(x, y) dy
is log-concave

convolution f ∗ g of log-concave functions f, g is log-concave:
    (f ∗ g)(x) = ∫ f(x − y)g(y) dy

example: if C ⊆ Rⁿ convex and y is a random variable with log-concave pdf p, then
    f(x) = prob(x + y ∈ C)
is log-concave; proof: write f(x) as an integral of a product of log-concave functions,
    f(x) = ∫ g(x + y) p(y) dy,        g(u) = 1 if u ∈ C,  0 if u ∉ C
4. Convex optimization problems

Optimization problem in standard form

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               hi(x) = 0,  i = 1, . . . , p

optimal value:
    p* = inf{f0(x) | fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p}
p* = ∞ if problem is infeasible (no x satisfies the constraints)
p* = −∞ if problem is unbounded below
Optimal and locally optimal points

x is feasible if x ∈ dom f0 and it satisfies the constraints
a feasible x is optimal if f0(x) = p*
x is locally optimal if it minimizes f0 over feasible points in a neighborhood

examples (with n = 1, m = p = 0)
f0(x) = 1/x, dom f0 = R₊₊: p* = 0, no optimal point
Implicit constraints

the standard form optimization problem has an implicit constraint
    x ∈ D = ∩_{i=0}^m dom fi  ∩  ∩_{i=1}^p dom hi
we call D the domain of the problem; the constraints fi(x) ≤ 0, hi(x) = 0 are the explicit constraints
a problem is unconstrained if it has no explicit constraints (m = p = 0)

example:
    minimize f0(x) = −Σ_{i=1}^k log(bi − aᵢᵀx)
is an unconstrained problem with implicit constraints aᵢᵀx < bi
Feasibility problem

    find       x
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               hi(x) = 0,  i = 1, . . . , p

can be considered a special case of the general problem with f0(x) = 0:
p* = 0 if constraints are feasible; any feasible x is optimal
p* = ∞ if constraints are infeasible

Convex optimization problem

standard form convex optimization problem
    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               aᵢᵀx = bi,  i = 1, . . . , p
f0, f1, . . . , fm are convex; equality constraints are affine
important property: the feasible set of a convex optimization problem is convex
example
    minimize   f0(x) = x1² + x2²
    subject to f1(x) = x1/(1 + x2²) ≤ 0
               h1(x) = (x1 + x2)² = 0

f0 is convex; feasible set {(x1, x2) | x1 = −x2 ≤ 0} is convex
not a convex problem (according to our definition): f1 is not convex, h1 is not affine
equivalent (but not identical) to the convex problem
    minimize   x1² + x2²
    subject to x1 ≤ 0
               x1 + x2 = 0
Local and global optima

any locally optimal point of a convex problem is (globally) optimal
proof sketch: if x is locally optimal, there is an R > 0 such that
    z feasible, ‖z − x‖₂ ≤ R  ⟹  f0(z) ≥ f0(x)
a feasible y with f0(y) < f0(x) would give a contradiction at z = θy + (1 − θ)x for small θ > 0

Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and
    ∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y
if nonzero, ∇f0(x) defines a supporting hyperplane to the feasible set X at x
examples
unconstrained problem: x is optimal iff
    x ∈ dom f0,   ∇f0(x) = 0
equality constrained problem (minimize f0(x) subject to Ax = b): x is optimal iff there exists a ν such that
    x ∈ dom f0,   Ax = b,   ∇f0(x) + Aᵀν = 0
minimization over nonnegative orthant (minimize f0(x) subject to x ⪰ 0): x is optimal iff
    x ∈ dom f0,   x ⪰ 0,   ∇f0(x)i ≥ 0 if xi = 0,   ∇f0(x)i = 0 if xi > 0
Equivalent convex problems

eliminating equality constraints
    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               Ax = b
is equivalent to
    minimize (over z)  f0(Fz + x0)
    subject to         fi(Fz + x0) ≤ 0,  i = 1, . . . , m
where F and x0 are such that
    Ax = b  ⟺  x = Fz + x0 for some z
introducing equality constraints
    minimize   f0(A0x + b0)
    subject to fi(Aix + bi) ≤ 0,  i = 1, . . . , m
is equivalent to
    minimize (over x, yi)  f0(y0)
    subject to             fi(yi) ≤ 0,  i = 1, . . . , m
                           yi = Aix + bi,  i = 0, 1, . . . , m

introducing slack variables for linear inequalities
    minimize   f0(x)
    subject to aᵢᵀx ≤ bi,  i = 1, . . . , m
is equivalent to
    minimize (over x, s)  f0(x)
    subject to            aᵢᵀx + si = bi,  i = 1, . . . , m
                          si ≥ 0,  i = 1, . . . , m
minimizing over some variables
    minimize   f0(x1, x2)
    subject to fi(x1) ≤ 0,  i = 1, . . . , m
is equivalent to
    minimize   f̃0(x1)
    subject to fi(x1) ≤ 0,  i = 1, . . . , m
where f̃0(x1) = inf_{x2} f0(x1, x2)
Quasiconvex optimization

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               Ax = b
with f0 : Rⁿ → R quasiconvex, f1, . . . , fm convex

can have locally optimal points that are not (globally) optimal
(figure: a quasiconvex f0 with a locally optimal point (x, f0(x)) that is not globally optimal)
convex representation of sublevel sets of f0: if f0 is quasiconvex, there exists a family of functions φt such that
φt(x) is convex in x for fixed t
t-sublevel set of f0 is 0-sublevel set of φt, i.e.,
    f0(x) ≤ t  ⟺  φt(x) ≤ 0

example
    f0(x) = p(x)/q(x)
with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f0
can take φt(x) = p(x) − t q(x): for t ≥ 0, φt is convex in x, and p(x)/q(x) ≤ t iff φt(x) ≤ 0
quasiconvex optimization via convex feasibility problems
    φt(x) ≤ 0,   fi(x) ≤ 0,  i = 1, . . . , m,   Ax = b        (1)

for fixed t, a convex feasibility problem in x
if feasible, we can conclude that t ≥ p*; if infeasible, t ≤ p*

Bisection method for quasiconvex optimization

given l ≤ p*, u ≥ p*, tolerance ε > 0.
repeat
    1. t := (l + u)/2.
    2. Solve the convex feasibility problem (1).
    3. If (1) is feasible, u := t; else l := t.
until u − l ≤ ε.

requires exactly ⌈log₂((u − l)/ε)⌉ iterations (where u, l are the initial values)
Linear program (LP)

    minimize   cᵀx + d
    subject to Gx ⪯ h
               Ax = b

convex problem with affine objective and constraint functions
feasible set is a polyhedron P
(figure: polyhedron P with cost vector c)
Examples

diet problem: choose quantities x1, . . . , xn of n foods
one unit of food j costs cj, contains amount aij of nutrient i
healthy diet requires nutrient i in quantity at least bi
to find cheapest healthy diet,
    minimize   cᵀx
    subject to Ax ⪰ b,  x ⪰ 0
piecewise-linear minimization
    minimize max_{i=1,...,m} (aᵢᵀx + bi)
equivalent to an LP
    minimize   t
    subject to aᵢᵀx + bi ≤ t,  i = 1, . . . , m
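a sketch of this reformulation in CVXPY (illustrative random data; the epigraph variable t is introduced explicitly to show the LP form):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((20, 5)), rng.standard_normal(20)

    x, t = cp.Variable(5), cp.Variable()
    lp = cp.Problem(cp.Minimize(t), [A @ x + b <= t])   # a_i'x + b_i <= t
    lp.solve()

    # same optimal value as minimizing the max directly
    direct = cp.Problem(cp.Minimize(cp.max(A @ x + b))).solve()
    assert abs(lp.value - direct) <= 1e-5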
Chebyshev center of a polyhedron

Chebyshev center of
    P = {x | aᵢᵀx ≤ bi, i = 1, . . . , m}
is center xcheb of largest inscribed ball
    B = {xc + u | ‖u‖₂ ≤ r}
aᵢᵀx ≤ bi for all x ∈ B if and only if
    sup{aᵢᵀ(xc + u) | ‖u‖₂ ≤ r} = aᵢᵀxc + r‖ai‖₂ ≤ bi
hence, xc, r can be determined by solving the LP
    maximize   r
    subject to aᵢᵀxc + r‖ai‖₂ ≤ bi,  i = 1, . . . , m
Linear-fractional program

    minimize   f0(x)
    subject to Gx ⪯ h
               Ax = b

linear-fractional program:
    f0(x) = (cᵀx + d)/(eᵀx + f),   dom f0 = {x | eᵀx + f > 0}

a quasiconvex optimization problem; can be solved by bisection
also equivalent to the LP (variables y, z)
    minimize   cᵀy + dz
    subject to Gy ⪯ hz
               Ay = bz
               eᵀy + fz = 1
               z ≥ 0
Quadratic program (QP)

    minimize   (1/2)xᵀPx + qᵀx + r
    subject to Gx ⪯ h
               Ax = b
with P ∈ Sⁿ₊: minimize a convex quadratic function over a polyhedron
Examples

least-squares
    minimize ‖Ax − b‖₂²
analytical solution x* = A†b (A† is pseudo-inverse)
can add linear constraints, e.g., l ⪯ x ⪯ u

linear program with random cost
    minimize   c̄ᵀx + γ xᵀΣx = E cᵀx + γ var(cᵀx)
    subject to Gx ⪯ h,  Ax = b
c is random vector with mean c̄ and covariance Σ
γ > 0 is a risk aversion parameter: trades off expected cost against variance (risk)
i = 1, . . . , m
424
i = 1, . . . , m
425
i = 1, . . . , m,
i = 1, . . . , m,
i = 1, . . . , m
426
Robust linear programming

the parameters in an LP can be uncertain; deterministic model: constraints must hold for all ai ∈ Ei
    minimize   cᵀx
    subject to aᵢᵀx ≤ bi for all ai ∈ Ei,  i = 1, . . . , m

uncertainty via ellipsoids: Ei = {āi + Piu | ‖u‖₂ ≤ 1}    (āi ∈ Rⁿ, Pi ∈ R^{n×n})
center is āi, semi-axes determined by singular values/vectors of Pi

robust LP is equivalent to the SOCP
    minimize   cᵀx
    subject to āᵢᵀx + ‖Pᵢᵀx‖₂ ≤ bi,  i = 1, . . . , m
(follows from sup_{‖u‖₂≤1} (āi + Piu)ᵀx = āᵢᵀx + ‖Pᵢᵀx‖₂)
random model: ai is Gaussian with mean āi, covariance Σi; constraints must hold with probability η:
    minimize   cᵀx
    subject to prob(aᵢᵀx ≤ bi) ≥ η,  i = 1, . . . , m

with Φ(x) = (1/√(2π)) ∫_{−∞}^x e^{−t²/2} dt the CDF of N(0, 1), this is equivalent to the SOCP
    minimize   cᵀx
    subject to āᵢᵀx + Φ⁻¹(η) ‖Σi^{1/2} x‖₂ ≤ bi,  i = 1, . . . , m
Geometric programming

monomial function
    f(x) = c x1^{a1} x2^{a2} · · · xn^{an},   dom f = Rⁿ₊₊
with c > 0; exponents ai can be any real numbers

posynomial function: sum of monomials
    f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} · · · xn^{ank},   dom f = Rⁿ₊₊

geometric program (GP)
    minimize   f0(x)
    subject to fi(x) ≤ 1,  i = 1, . . . , m
               hi(x) = 1,  i = 1, . . . , p
with fi posynomial, hi monomial
Geometric program in convex form

change variables to yi = log xi, and take logarithm of cost, constraints

monomial f(x) = c x1^{a1} · · · xn^{an} transforms to
    log f(e^{y1}, . . . , e^{yn}) = aᵀy + b    (b = log c)

posynomial f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} · · · xn^{ank} transforms to
    log f(e^{y1}, . . . , e^{yn}) = log ( Σ_{k=1}^K e^{akᵀy + bk} )    (bk = log ck)

the geometric program transforms to a convex problem in y
Design of cantilever beam (example)

a GP arises in sizing the N segments (widths wi, heights hi) of a cantilever beam: minimize total weight subject to limits on size, aspect ratio, maximum stress, and vertical deflection (the deflection recursion in vi, yi is omitted here)
formulation as a GP
    minimize   w1h1 + · · · + wNhN
    subject to wmax⁻¹ wi ≤ 1,   wmin wi⁻¹ ≤ 1,   i = 1, . . . , N
               hmax⁻¹ hi ≤ 1,   hmin hi⁻¹ ≤ 1,   i = 1, . . . , N
               Smax⁻¹ wi⁻¹ hi ≤ 1,   Smin wi hi⁻¹ ≤ 1,   i = 1, . . . , N
               6iF σmax⁻¹ wi⁻¹ hi⁻² ≤ 1,   i = 1, . . . , N
               ymax⁻¹ y1 ≤ 1

note
we write wmin ≤ wi ≤ wmax and hmin ≤ hi ≤ hmax as
    wmin/wi ≤ 1,   wi/wmax ≤ 1,   hmin/hi ≤ 1,   hi/hmax ≤ 1
and Smin ≤ hi/wi ≤ Smax as
    Smin wi/hi ≤ 1,   hi/(wi Smax) ≤ 1
Semidefinite program (SDP)

    minimize   cᵀx
    subject to x1F1 + x2F2 + · · · + xnFn + G ⪯ 0
               Ax = b
with Fi, G ∈ Sᵏ

inequality constraint is called linear matrix inequality (LMI)
includes problems with multiple LMI constraints: for example,
    x1F̂1 + · · · + xnF̂n + Ĝ ⪯ 0,   x1F̃1 + · · · + xnF̃n + G̃ ⪯ 0
is equivalent to the single LMI
    x1 [F̂1 0; 0 F̃1] + x2 [F̂2 0; 0 F̃2] + · · · + xn [F̂n 0; 0 F̃n] + [Ĝ 0; 0 G̃] ⪯ 0
LP and equivalent SDP

LP:
    minimize   cᵀx
    subject to Ax ⪯ b
SDP:
    minimize   cᵀx
    subject to diag(Ax − b) ⪯ 0

SOCP and equivalent SDP

SOCP:
    minimize   fᵀx
    subject to ‖Aix + bi‖₂ ≤ cᵢᵀx + di,  i = 1, . . . , m
SDP:
    minimize   fᵀx
    subject to [(cᵢᵀx + di)I   Aix + bi;  (Aix + bi)ᵀ   cᵢᵀx + di] ⪰ 0,  i = 1, . . . , m
Eigenvalue minimization

    minimize λmax(A(x))
where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ Sᵏ)

equivalent SDP
    minimize   t
    subject to A(x) ⪯ tI
variables x ∈ Rⁿ, t ∈ R
follows from
    λmax(A) ≤ t  ⟺  A ⪯ tI
Matrix norm minimization

    minimize ‖A(x)‖₂ = ( λmax(A(x)ᵀA(x)) )^{1/2}
where A(x) = A0 + x1A1 + · · · + xnAn (with given Ai ∈ R^{p×q})

equivalent SDP
    minimize   t
    subject to [tI   A(x);  A(x)ᵀ   tI] ⪰ 0
variables x ∈ Rⁿ, t ∈ R
constraint follows from
    ‖A‖₂ ≤ t  ⟺  AᵀA ⪯ t²I, t ≥ 0
             ⟺  [tI  A; Aᵀ  tI] ⪰ 0
Vector optimization

general vector optimization problem
    minimize (w.r.t. K)  f0(x)
    subject to           fi(x) ≤ 0,  i = 1, . . . , m
                         hi(x) = 0,  i = 1, . . . , p
vector objective f0 : Rⁿ → Rq, minimized with respect to proper cone K ⊆ Rq
Optimal and Pareto optimal points

set of achievable objective values
    O = {f0(x) | x feasible}
feasible x is optimal if f0(x) is the minimum value of O
feasible x is Pareto optimal if f0(x) is a minimal value of O
(figures: x* optimal; xpo Pareto optimal)
Multicriterion optimization

vector optimization problem with K = Rq₊
    f0(x) = (F1(x), . . . , Fq(x))
q different objectives Fi; roughly speaking we want all Fi's to be small

feasible x* is optimal if
    y feasible  ⟹  f0(x*) ⪯ f0(y)
feasible xpo is Pareto optimal if
    y feasible, f0(y) ⪯ f0(xpo)  ⟹  f0(xpo) = f0(y)
Regularized least-squares

    minimize (w.r.t. R²₊)  ( ‖Ax − b‖₂², ‖x‖₂² )
(figure: optimal trade-off curve of F2(x) = ‖x‖₂² versus F1(x) = ‖Ax − b‖₂²)
Risk-return trade-off in portfolio optimization

    minimize (w.r.t. R²₊)  ( −p̄ᵀx, xᵀΣx )
    subject to             1ᵀx = 1,  x ⪰ 0
allocation x: xi is fraction invested in asset i; p̄ᵀx is expected return, xᵀΣx is return variance
(figures: trade-off curve of mean return versus standard deviation of return, and the corresponding allocations x(1), . . . , x(4))
Scalarization

to find Pareto optimal points: choose λ ≻_{K*} 0 and solve scalar problem
    minimize   λᵀf0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               hi(x) = 0,  i = 1, . . . , p

(figure: x1 and x3 minimize λᵀf0 for two choices of λ; x2 is Pareto optimal but not found this way)

for convex vector optimization problems, can find (almost) all Pareto optimal points by varying λ ≻_{K*} 0
example: regularized least-squares problem (above); take λ = (1, γ) with γ > 0
    minimize ‖Ax − b‖₂² + γ‖x‖₂²
for fixed γ, a least-squares problem
(figure: trade-off curve of ‖x‖₂² versus ‖Ax − b‖₂², with the point for γ = 1 marked)
5. Duality
Lagrange dual problem
weak and strong duality
geometric interpretation
optimality conditions
perturbation and sensitivity analysis
examples
generalized inequalities
Lagrangian

standard form problem (not necessarily convex)
    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               hi(x) = 0,  i = 1, . . . , p
variable x ∈ Rⁿ, domain D, optimal value p*

Lagrangian: L : Rⁿ × Rᵐ × Rᵖ → R,
    L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)
weighted sum of objective and constraint functions
λi is the Lagrange multiplier associated with fi(x) ≤ 0; νi with hi(x) = 0
52
inf L(x, , )
xD
inf
xD
f0(x) +
m
X
ifi(x) +
i=1
p
X
ihi(x)
i=1
53
Least-norm solution of linear equations

    minimize   xᵀx
    subject to Ax = b

dual function: with L(x, ν) = xᵀx + νᵀ(Ax − b), minimize over x by setting the gradient to zero:
    ∇ₓL(x, ν) = 2x + Aᵀν = 0   ⟹   x = −(1/2)Aᵀν
plug in in L to obtain g:
    g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4) νᵀAAᵀν − bᵀν
a concave function of ν

lower bound property: p* ≥ −(1/4) νᵀAAᵀν − bᵀν for all ν
Standard form LP

    minimize   cᵀx
    subject to Ax = b,  x ⪰ 0

dual function
Lagrangian is
    L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
               = −bᵀν + (c + Aᵀν − λ)ᵀx
L is affine in x, hence
    g(λ, ν) = inf_x L(x, λ, ν) = −bᵀν  if Aᵀν − λ + c = 0,   −∞ otherwise
g is linear on affine domain, hence concave
lower bound property: p* ≥ −bᵀν if Aᵀν + c ⪰ 0
Equality constrained norm minimization

    minimize   ‖x‖
    subject to Ax = b

dual function
    g(ν) = inf_x ( ‖x‖ − νᵀx + bᵀν ) = bᵀν  if ‖Aᵀν‖* ≤ 1,   −∞ otherwise
where ‖v‖* = sup_{‖u‖≤1} uᵀv is the dual norm of ‖ · ‖
(if ‖Aᵀν‖* > 1, the infimum goes to −∞ along x = tu as t → ∞, for a suitable u)
lower bound property: p* ≥ bᵀν if ‖Aᵀν‖* ≤ 1
Two-way partitioning

    minimize   xᵀWx
    subject to xi² = 1,  i = 1, . . . , n
a nonconvex problem; feasible set contains 2ⁿ discrete points

dual function
    g(ν) = inf_x ( xᵀWx + Σ_i νi(xi² − 1) ) = −1ᵀν  if W + diag(ν) ⪰ 0,   −∞ otherwise

lower bound property: p* ≥ −1ᵀν if W + diag(ν) ⪰ 0
Lagrange dual and conjugate function

    minimize   f0(x)
    subject to Ax ⪯ b,  Cx = d

dual function
    g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
            = −f0*(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν
recall the definition of the conjugate f*(y) = sup_x (yᵀx − f(x))

example: entropy maximization
    f0(x) = Σ_{i=1}^n xi log xi,        f0*(y) = Σ_{i=1}^n e^{yi − 1}
The dual problem

Lagrange dual problem
    maximize   g(λ, ν)
    subject to λ ⪰ 0
finds best lower bound on p*; a convex optimization problem with optimal value d*

example: the dual of the standard form LP is
    maximize   −bᵀν
    subject to Aᵀν + c ⪰ 0
weak duality: d* ≤ p* always holds (for convex and nonconvex problems); can be used to find nontrivial lower bounds for difficult problems, e.g., solving the SDP
    maximize   −1ᵀν
    subject to W + diag(ν) ⪰ 0
gives a lower bound for the two-way partitioning problem above

strong duality: d* = p*
does not hold in general; (usually) holds for convex problems
Slater's constraint qualification

strong duality holds for a convex problem
    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m
               Ax = b
if it is strictly feasible, i.e.,
    ∃x ∈ int D:   fi(x) < 0,  i = 1, . . . , m,   Ax = b
Inequality form LP

primal problem
    minimize   cᵀx
    subject to Ax ⪯ b

dual function
    g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = −bᵀλ  if Aᵀλ + c = 0,   −∞ otherwise

dual problem
    maximize   −bᵀλ
    subject to Aᵀλ + c = 0,  λ ⪰ 0
Quadratic program

primal problem (assume P ∈ Sⁿ₊₊)
    minimize   xᵀPx
    subject to Ax ⪯ b

dual function
    g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4) λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem
    maximize   −(1/4) λᵀAP⁻¹Aᵀλ − bᵀλ
    subject to λ ⪰ 0

from Slater's condition: p* = d* if Ax̃ ≺ b for some x̃
in fact, p* = d* always
A nonconvex problem with strong duality

    minimize   xᵀAx + 2bᵀx
    subject to xᵀx ≤ 1
with A ⋡ 0, hence nonconvex

dual problem (variables t, λ):
    maximize   −t − λ
    subject to [A + λI   b;  bᵀ   t] ⪰ 0

strong duality although primal problem is not convex (not easy to show)
Geometric interpretation

for simplicity, consider problem with one constraint f1(x) ≤ 0

interpretation of dual function:
    g(λ) = inf_{(u,t)∈G} (t + λu),   where G = {(f1(x), f0(x)) | x ∈ D}

λu + t = g(λ) is a (non-vertical) supporting hyperplane to G
it intersects the t-axis at t = g(λ)
(figure: the set G, the values p* and d*, and the supporting line λu + t = g(λ))
epigraph variation: same interpretation if G is replaced with
    A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}
(figure: supporting line λu + t = g(λ) to A at (0, p*))

strong duality
holds if there is a non-vertical supporting hyperplane to A at (0, p*)
for convex problem, A is convex, hence has a supporting hyperplane at (0, p*)
Complementary slackness

assume strong duality holds, x* is primal optimal, (λ*, ν*) is dual optimal
    f0(x*) = g(λ*, ν*) = inf_x ( f0(x) + Σ_{i=1}^m λi* fi(x) + Σ_{i=1}^p νi* hi(x) )
                       ≤ f0(x*) + Σ_{i=1}^m λi* fi(x*) + Σ_{i=1}^p νi* hi(x*)
                       ≤ f0(x*)
hence, the two inequalities hold with equality
x* minimizes L(x, λ*, ν*)
λi* fi(x*) = 0 for i = 1, . . . , m (known as complementary slackness):
    λi* > 0 ⟹ fi(x*) = 0,        fi(x*) < 0 ⟹ λi* = 0
Karush-Kuhn-Tucker (KKT) conditions

the four KKT conditions (for a problem with differentiable fi, hi):
1. primal constraints: fi(x) ≤ 0, i = 1, . . . , m, hi(x) = 0, i = 1, . . . , p
2. dual constraints: λ ⪰ 0
3. complementary slackness: λi fi(x) = 0, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:
    ∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Σ_{i=1}^p νi ∇hi(x) = 0

from the previous slide: if strong duality holds and x, λ, ν are optimal, then they must satisfy the KKT conditions
KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy KKT for a convex problem, then they are optimal:
from complementary slackness: f0(x̃) = L(x̃, λ̃, ν̃)
from 4th condition (and convexity): g(λ̃, ν̃) = L(x̃, λ̃, ν̃)
hence, f0(x̃) = g(λ̃, ν̃)

if Slater's condition is satisfied: x is optimal if and only if there exist λ, ν that satisfy the KKT conditions
example: water-filling (assume αi > 0)
    minimize   −Σ_{i=1}^n log(αi + xi)
    subject to x ⪰ 0,  1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ Rⁿ, ν ∈ R such that
    λ ⪰ 0,   λi xi = 0,   1/(xi + αi) + λi = ν
if ν < 1/αi: λi = 0 and xi = 1/ν − αi
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
determine ν from 1ᵀx = Σ_i max{0, 1/ν − αi} = 1

interpretation
n patches; level of patch i is at height αi
flood area with unit amount of water; resulting level is 1/ν*
(figure: water level 1/ν*, with water depth xi above each patch of height αi)
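a small sketch of this solution via bisection on ν (NumPy; the patch heights α are illustrative, and the loop follows the case analysis above):

    import numpy as np

    alpha = np.array([0.8, 1.0, 1.2, 2.0])   # illustrative patch heights

    def total_water(nu):
        # sum_i max{0, 1/nu - alpha_i}, a decreasing function of nu
        return np.maximum(0.0, 1.0 / nu - alpha).sum()

    lo, hi = 1e-6, 1.0 / alpha.min() + 1.0
    for _ in range(100):                      # bisection on nu
        nu = 0.5 * (lo + hi)
        lo, hi = (lo, nu) if total_water(nu) < 1.0 else (nu, hi)

    x = np.maximum(0.0, 1.0 / nu - alpha)
    print(x, x.sum())                         # optimal allocation, sums to ~1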
Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual
    minimize   f0(x)                          maximize   g(λ, ν)
    subject to fi(x) ≤ 0,  i = 1, . . . , m     subject to λ ⪰ 0
               hi(x) = 0,  i = 1, . . . , p

perturbed problem and its dual
    min.  f0(x)                               max.  g(λ, ν) − uᵀλ − vᵀν
    s.t.  fi(x) ≤ ui,  i = 1, . . . , m         s.t.  λ ⪰ 0
          hi(x) = vi,  i = 1, . . . , p
x is primal variable; u, v are parameters; p*(u, v) is optimal value as a function of u, v
global sensitivity result: if strong duality holds for the unperturbed problem with dual optimal λ*, ν*, weak duality for the perturbed problem gives
    p*(u, v) ≥ p*(0, 0) − uᵀλ* − vᵀν*

sensitivity interpretation
if λi* large: p* increases greatly if we tighten constraint i (ui < 0)
if λi* small: p* does not decrease much if we loosen constraint i (ui > 0)
local sensitivity: if (in addition) p*(u, v) is differentiable at (0, 0), then
    λi* = −∂p*(0, 0)/∂ui,        νi* = −∂p*(0, 0)/∂vi
(figure: p*(u) and the global lower bound p*(0) − λ*u, tangent at u = 0)
Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals; reformulating the primal problem can be useful when the dual is difficult to derive, or uninteresting

common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions
e.g., replace f0(x) by φ(f0(x)) with φ convex, increasing
Introducing new variables and equality constraints

for minimize f0(Ax + b), the dual function is the constant p* (useless); the reformulated problem
    minimize   f0(y)
    subject to Ax + b − y = 0
has the more useful dual
    maximize   bᵀν − f0*(ν)
    subject to Aᵀν = 0
(the dual function is −f0*(ν) + bᵀν if Aᵀν = 0, −∞ otherwise)
norm approximation problem: minimize ‖Ax − b‖; reformulated as
    minimize   ‖y‖
    subject to y = Ax − b
dual function
    g(ν) = bᵀν  if Aᵀν = 0, ‖ν‖* ≤ 1,   −∞ otherwise
(as in the equality constrained norm minimization example above)

dual of norm approximation problem
    maximize   bᵀν
    subject to Aᵀν = 0,  ‖ν‖* ≤ 1
Implicit constraints

LP with box constraints: primal and dual problem
    minimize   cᵀx                   maximize   −bᵀν − 1ᵀλ1 − 1ᵀλ2
    subject to Ax = b                subject to c + Aᵀν + λ1 − λ2 = 0
               −1 ⪯ x ⪯ 1                       λ1 ⪰ 0,  λ2 ⪰ 0

reformulation with box constraints made implicit
    minimize   f0(x) = cᵀx  if −1 ⪯ x ⪯ 1,  ∞ otherwise
    subject to Ax = b

dual function
    g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) )
         = −bᵀν − ‖Aᵀν + c‖₁

dual problem: maximize −bᵀν − ‖Aᵀν + c‖₁
Duality with generalized inequalities

    minimize   f0(x)
    subject to fi(x) ⪯_{Ki} 0,  i = 1, . . . , m
               hi(x) = 0,  i = 1, . . . , p
⪯_{Ki} are generalized inequalities on R^{ki}

Lagrange multiplier for fi(x) ⪯_{Ki} 0 is a vector λi ∈ R^{ki}; Lagrangian
    L(x, λ1, . . . , λm, ν) = f0(x) + Σ_{i=1}^m λiᵀ fi(x) + Σ_{i=1}^p νi hi(x)
dual function
    g(λ1, . . . , λm, ν) = inf_{x∈D} L(x, λ1, . . . , λm, ν)

lower bound property: if λi ⪰_{Ki*} 0, then g(λ1, . . . , λm, ν) ≤ p*
proof: if x̃ is feasible and λi ⪰_{Ki*} 0, then
    f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λiᵀ fi(x̃) + Σ_{i=1}^p νi hi(x̃)
          ≥ inf_{x∈D} L(x, λ1, . . . , λm, ν)
          = g(λ1, . . . , λm, ν)
minimizing over all feasible x̃ gives p* ≥ g(λ1, . . . , λm, ν)

dual problem
    maximize   g(λ1, . . . , λm, ν)
    subject to λi ⪰_{Ki*} 0,  i = 1, . . . , m

weak duality: p* ≥ d* always
strong duality: p* = d* for convex problem with constraint qualification
(for example, Slater's: primal problem is strictly feasible)
Semidefinite program

primal SDP (Fi, G ∈ Sᵏ)
    minimize   cᵀx
    subject to x1F1 + · · · + xnFn ⪯ G

Lagrange multiplier is matrix Z ∈ Sᵏ
Lagrangian: L(x, Z) = cᵀx + tr( Z (x1F1 + · · · + xnFn − G) )

dual function
    g(Z) = inf_x L(x, Z) = −tr(GZ)  if tr(FiZ) + ci = 0, i = 1, . . . , n,   −∞ otherwise

dual SDP
    maximize   −tr(GZ)
    subject to Z ⪰ 0,  tr(FiZ) + ci = 0,  i = 1, . . . , n
6. Approximation and fitting

norm approximation
least-norm problems
regularized approximation
robust approximation
Norm approximation

    minimize ‖Ax − b‖
(A ∈ R^{m×n} with m ≥ n, ‖ · ‖ is a norm on Rᵐ)

interpretations of solution x* = argmin_x ‖Ax − b‖:
geometric: Ax* is point in R(A) closest to b
estimation: linear measurement model
    y = Ax + v
y are measurements, x is unknown, v is measurement error; given y = b, best guess of x is x*
optimal design: x are design variables (input), Ax is result (output); x* is design that best approximates desired result b
examples
least-squares approximation (‖ · ‖₂): solution satisfies normal equations
    AᵀAx = Aᵀb
(x* = (AᵀA)⁻¹Aᵀb if rank A = n)

Chebyshev approximation (‖ · ‖∞): can be solved as an LP
    minimize   t
    subject to −t1 ⪯ Ax − b ⪯ t1

sum of absolute residuals approximation (‖ · ‖₁): can be solved as an LP
    minimize   1ᵀy
    subject to −y ⪯ Ax − b ⪯ y
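a sketch comparing the three norms in CVXPY (illustrative random data; CVXPY performs the LP reformulations internally):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((100, 10)), rng.standard_normal(100)

    for p in (2, 1, "inf"):
        x = cp.Variable(10)
        prob = cp.Problem(cp.Minimize(cp.norm(A @ x - b, p)))
        prob.solve()
        print(f"l_{p} residual norm: {prob.value:.3f}")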
Penalty function approximation

    minimize   φ(r1) + · · · + φ(rm)
    subject to r = Ax − b
(A ∈ R^{m×n}, φ : R → R is a convex penalty function)

examples
quadratic: φ(u) = u²
deadzone-linear with width a: φ(u) = max{0, |u| − a}
log-barrier with limit a:
    φ(u) = −a² log(1 − (u/a)²)  if |u| < a,   ∞ otherwise
(figure: the quadratic, deadzone-linear, and log-barrier penalties φ(u))

example: histograms of residuals for the penalties
    φ(u) = |u|,   φ(u) = u²,   deadzone-linear,   φ(u) = −log(1 − u²)
(figures: the shape of the penalty function has a large effect on the distribution of the residuals)
Huber penalty function (with parameter M)
    φhub(u) = u²  if |u| ≤ M,        M(2|u| − M)  if |u| > M
linear growth for large u makes approximation less sensitive to outliers
(figures: φhub(u), and an affine fit f(t) to data with outliers, comparing the least-squares and Huber fits)
Least-norm problems

    minimize   ‖x‖
    subject to Ax = b
(A ∈ R^{m×n} with m ≤ n, ‖ · ‖ is a norm on Rⁿ)

interpretations of solution x* = argmin_{Ax=b} ‖x‖:
geometric: x* is point in affine set {x | Ax = b} with minimum distance to 0
estimation: b = Ax are (perfect) measurements of x; x* is smallest (most plausible) estimate consistent with measurements
design: x are design variables (inputs); b are required results (outputs); x* is smallest (most efficient) design that satisfies requirements
examples
least-squares solution of linear equations (‖ · ‖₂):
can be solved via optimality conditions
    2x + Aᵀν = 0,   Ax = b

minimum sum of absolute values (‖ · ‖₁): can be solved as an LP
    minimize   1ᵀy
    subject to −y ⪯ x ⪯ y,  Ax = b
tends to produce a sparse solution x*
Regularized approximation

    minimize (w.r.t. R²₊)  ( ‖Ax − b‖, ‖x‖ )
interpretation: find good approximation Ax ≈ b with small x
Scalarized problem

    minimize ‖Ax − b‖ + γ‖x‖
solution for γ > 0 traces out optimal trade-off curve

Tikhonov regularization
    minimize ‖Ax − b‖₂² + δ‖x‖₂²
can be solved as a least-squares problem
    minimize ‖ [A; √δ I] x − [b; 0] ‖₂²
with solution x = (AᵀA + δI)⁻¹Aᵀb
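a NumPy sketch of the stacked least-squares form (illustrative data; checks it against the closed-form solution above):

    import numpy as np

    rng = np.random.default_rng(0)
    A, b = rng.standard_normal((50, 8)), rng.standard_normal(50)
    delta = 0.1

    # stacked least-squares: minimize || [A; sqrt(delta) I] x - [b; 0] ||_2^2
    A_stack = np.vstack([A, np.sqrt(delta) * np.eye(8)])
    b_stack = np.concatenate([b, np.zeros(8)])
    x, *_ = np.linalg.lstsq(A_stack, b_stack, rcond=None)

    # closed form: x = (A'A + delta I)^{-1} A'b
    x_cf = np.linalg.solve(A.T @ A + delta * np.eye(8), A.T @ b)
    assert np.allclose(x, x_cf)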
Optimal input design

linear dynamical system with impulse response h:
    y(t) = Σ_{τ=0}^t h(τ) u(t − τ),   t = 0, 1, . . . , N
input design problem: a multicriterion problem trading off tracking error with respect to a desired output, input magnitude, and input variation
(figures: input u(t) and output y(t) trajectories for three weightings, illustrating the trade-off between tracking accuracy and input magnitude/variation)
Signal reconstruction

    minimize (w.r.t. R²₊)  ( ‖x̂ − xcor‖₂, φ(x̂) )
x ∈ Rⁿ is unknown signal; xcor = x + v is (known) corrupted version, with additive noise v
variable x̂ (reconstructed signal) is estimate of x
φ : Rⁿ → R is regularization function or smoothing objective

examples: quadratic smoothing, total variation smoothing:
    φquad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,        φtv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|
(figures: quadratic smoothing of a noisy signal, and total variation reconstruction of a signal with sharp transitions, each for three values of the trade-off parameter; quadratic smoothing smooths out the jumps, total variation preserves them)
Robust approximation

minimize ‖Ax − b‖ with uncertain A

two approaches:
stochastic: assume A is random, minimize E ‖Ax − b‖
worst-case: set A of possible values of A, minimize sup_{A∈A} ‖Ax − b‖
tractable only in special cases

(figure: example with A(u) = A0 + uA1; plot of r(u) = ‖A(u)x − b‖₂ for the nominal solution xnom, the stochastic solution xstoch, and the worst-case solution xwc)
stochastic robust LS with A = Ā + U, U random, E U = 0, E UᵀU = P
    minimize E ‖(Ā + U)x − b‖₂² = ‖Āx − b‖₂² + xᵀPx
hence, equivalent to the regularized LS problem
    minimize ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²
worst-case robust LS with A = {Ā + u1A1 + · · · + upAp | ‖u‖₂ ≤ 1}
    minimize sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²
where P(x) = [A1x  A2x  · · ·  Apx], q(x) = Āx − b

from the nonconvex quadratic problem with strong duality (duality lecture), strong duality holds between the following problems
    maximize   ‖Pu + q‖₂²                 minimize   t + λ
    subject to ‖u‖₂² ≤ 1                  subject to [I  P  q; Pᵀ  λI  0; qᵀ  0  t] ⪰ 0

hence, the robust LS problem is equivalent to the SDP
    minimize   t + λ
    subject to [I  P(x)  q(x); P(x)ᵀ  λI  0; q(x)ᵀ  0  t] ⪰ 0
(variables x, λ, t)
(figure: histogram of residuals r(u) = ‖A(u)x − b‖₂ over the uncertainty set, for the least-squares solution xls, the Tikhonov-regularized solution xtik, and the robust least-squares solution xrls; xrls has the smallest worst-case residual)
7. Statistical estimation
Maximum likelihood estimation

linear measurement model
    yi = aᵢᵀx + vi,  i = 1, . . . , m
x ∈ Rⁿ is vector of unknown parameters; vi is IID measurement noise with density p
likelihood of y is Π_{i=1}^m p(yi − aᵢᵀx); log-likelihood function is
    l(x) = Σ_{i=1}^m log p(yi − aᵢᵀx)
maximum likelihood (ML) estimate: maximize l(x); a convex problem if log p is concave
examples
Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)}
    l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (aᵢᵀx − yi)²
ML estimate is LS solution

Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a}
    l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |aᵢᵀx − yi|
ML estimate is ℓ1-norm solution

uniform noise on [−a, a]:
    l(x) = −m log(2a)  if |aᵢᵀx − yi| ≤ a, i = 1, . . . , m,   −∞ otherwise
ML estimate is any x with |aᵢᵀx − yi| ≤ a
Logistic regression

random variable y ∈ {0, 1} with distribution
    p = prob(y = 1) = exp(aᵀu + b)/(1 + exp(aᵀu + b))
a, b are parameters; u ∈ Rⁿ are (observable) explanatory variables

given data (ui, yi), ordered so that y1 = · · · = yk = 1, yk+1 = · · · = ym = 0, the log-likelihood function is
    l(a, b) = log ( Π_{i=1}^k exp(aᵀui + b)/(1 + exp(aᵀui + b)) · Π_{i=k+1}^m 1/(1 + exp(aᵀui + b)) )
            = Σ_{i=1}^k (aᵀui + b) − Σ_{i=1}^m log(1 + exp(aᵀui + b))
concave in a, b
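a sketch of ML logistic regression as a convex problem in CVXPY (synthetic data; cp.logistic(z) = log(1 + e^z), so the expression below is the negated log-likelihood above):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 200, 3
    U = rng.standard_normal((m, n))
    y = (U @ np.array([1.0, -2.0, 0.5]) + 0.3 + rng.logistic(size=m) > 0) * 1.0

    a, b = cp.Variable(n), cp.Variable()
    z = U @ a + b
    # negative log-likelihood: -sum_i [ y_i z_i - log(1 + e^{z_i}) ]
    nll = cp.sum(cp.logistic(z)) - y @ z
    cp.Problem(cp.Minimize(nll)).solve()
    print(a.value, b.value)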
example (n = 1, m = 50 measurements)
(figure: data points yi ∈ {0, 1} versus ui, with the fitted curve prob(y = 1))
(Binary) hypothesis testing

detection problem: given an observation of a random variable X ∈ {1, . . . , n}, choose between
hypothesis 1: X was generated by distribution p = (p1, . . . , pn)
hypothesis 2: X was generated by distribution q = (q1, . . . , qn)
randomized detector: matrix T ∈ R^{2×n} with tk ⪰ 0, 1ᵀtk = 1; if X = k, choose hypothesis 1 with probability t1k, hypothesis 2 with probability t2k

detection probability matrix
    D = [Tp  Tq] = [1 − Pfp   Pfn;  Pfp   1 − Pfn]
Pfp is the false positive probability, Pfn the false negative probability

multicriterion formulation: minimize (w.r.t. R²₊) (Pfp, Pfn)
scalarization with weight λ > 0 yields a deterministic detector:
    tk = (1, 0)  if pk ≥ λqk,        tk = (0, 1)  if pk < λqk
example
    P = [0.70  0.10;  0.20  0.10;  0.05  0.70;  0.05  0.10]
(figure: trade-off curve of Pfn versus Pfp; points 1, 2, 3 are deterministic detectors obtained by scalarization, point 4 is a randomized detector)
Experiment design

m linear measurements yi = aᵢᵀx + wi, i = 1, . . . , m of unknown x ∈ Rⁿ
measurement errors wi are IID N(0, 1)
ML (least-squares) estimate is
    x̂ = ( Σ_{i=1}^m ai aᵢᵀ )⁻¹ Σ_{i=1}^m yi ai
error e = x̂ − x has zero mean and covariance
    E = E eeᵀ = ( Σ_{i=1}^m ai aᵢᵀ )⁻¹
experiment design: choose each ai from a set of candidate test vectors {v1, . . . , vp} to make E 'small'
vector optimization formulation
    minimize (w.r.t. Sⁿ₊)  E = ( Σ_{k=1}^p mk vk vkᵀ )⁻¹
    subject to             mk ≥ 0,   m1 + · · · + mp = m,   mk ∈ Z
variables mk: the number of experiments performed with test vector vk

relaxed version, with normalized weights λk = mk/m:
    minimize (w.r.t. Sⁿ₊)  E = (1/m) ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
    subject to             λ ⪰ 0,  1ᵀλ = 1
(ignores the integer constraint mk ∈ Z)
D-optimal design

    minimize   log det ( Σ_{k=1}^p λk vk vkᵀ )⁻¹
    subject to λ ⪰ 0,  1ᵀλ = 1

interpretation: minimizes the volume of the confidence ellipsoids
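a sketch of the D-optimal design problem in CVXPY (random test vectors vk for illustration; cp.log_det is concave, so minimizing log det E⁻¹... is written as a maximization):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    n, p = 2, 20
    V = rng.standard_normal((n, p))              # columns are test vectors v_k

    lam = cp.Variable(p)
    M = sum(lam[k] * np.outer(V[:, k], V[:, k]) for k in range(p))
    # maximize log det sum_k lam_k v_k v_k' over the probability simplex
    prob = cp.Problem(cp.Maximize(cp.log_det(M)),
                      [lam >= 0, cp.sum(lam) == 1])
    prob.solve()
    print(np.round(lam.value, 3))                # typically few nonzero weights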
example (p = 20)
(figure: 20 candidate test vectors in R²; the D-optimal design puts weight λ1 = 0.5, λ2 = 0.5 on two of them, shown with the resulting confidence ellipsoid)
8. Geometric problems
Extremal volume ellipsoids

minimum volume ellipsoid around a set C (Löwner-John ellipsoid)
maximum volume inscribed ellipsoid in a convex set C
both are computable by convex optimization when C is a polyhedron
efficiency: shrinking the Löwner-John ellipsoid by a factor n gives an inscribed ellipsoid; the factor improves to √n if C is symmetric
Centering

some possible definitions of center of a convex set C:
center of largest inscribed ball (Chebyshev center); for a polyhedron, can be computed via linear programming (see the Chebyshev center LP in lecture 4)
center of maximum volume inscribed ellipsoid (previous slide)
(figure: xcheb and xmve for a polyhedron)
analytic center of a set of linear inequalities and equalities
    aᵢᵀx ≤ bi,  i = 1, . . . , m,   Fx = g
is defined as
    xac = argmin ( −Σ_{i=1}^m log(bi − aᵢᵀx) )  subject to Fx = g
(figure: analytic center xac of a set of linear inequalities)
Linear discrimination

separate two sets of points {x1, . . . , xN}, {y1, . . . , yM} by a hyperplane:
    aᵀxi + b > 0,  i = 1, . . . , N,        aᵀyi + b < 0,  i = 1, . . . , M

homogeneous in a, b, hence equivalent to the set of linear inequalities (in a, b)
    aᵀxi + b ≥ 1,  i = 1, . . . , N,        aᵀyi + b ≤ −1,  i = 1, . . . , M
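a sketch of this feasibility problem in CVXPY (separable synthetic data; a feasibility problem is written as minimizing 0 subject to the normalized inequalities):

    import cvxpy as cp
    import numpy as np

    rng = np.random.default_rng(0)
    X = rng.standard_normal((30, 2)) + np.array([3.0, 3.0])   # class 1
    Y = rng.standard_normal((30, 2)) - np.array([3.0, 3.0])   # class 2

    a, b = cp.Variable(2), cp.Variable()
    cons = [X @ a + b >= 1, Y @ a + b <= -1]   # normalized separation
    prob = cp.Problem(cp.Minimize(0), cons)
    prob.solve()
    print(prob.status, a.value, b.value)        # "optimal" iff separable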
Robust linear discrimination

(Euclidean) distance between the hyperplanes
    H1 = {z | aᵀz + b = 1},        H2 = {z | aᵀz + b = −1}
is dist(H1, H2) = 2/‖a‖₂; to separate the two sets of points by maximum margin,
    minimize   (1/2)‖a‖₂
    subject to aᵀxi + b ≥ 1,  i = 1, . . . , N        (1)
               aᵀyi + b ≤ −1,  i = 1, . . . , M
(after squaring the objective) a QP in a, b

the Lagrange dual of (1), after a change of variables, can be written as
    minimize   t
    subject to ‖ Σ_{i=1}^N λi xi − Σ_{i=1}^M μi yi ‖₂ ≤ t        (2)
               λ ⪰ 0, 1ᵀλ = 1,   μ ⪰ 0, 1ᵀμ = 1
optimal value is distance between convex hulls
810
Geometric problems
811
Geometric problems
812
Nonlinear discrimination

separate two sets of points by a nonlinear function:
    f(xi) > 0,  i = 1, . . . , N,        f(yi) < 0,  i = 1, . . . , M

choose a linearly parameterized family f(z) = θᵀF(z); the condition is linear in θ:
    θᵀF(xi) ≥ 1,  i = 1, . . . , N,        θᵀF(yi) ≤ −1,  i = 1, . . . , M

example: quadratic discrimination f(z) = zᵀPz + qᵀz + r
    xiᵀPxi + qᵀxi + r ≥ 1,        yiᵀPyi + qᵀyi + r ≤ −1
can add the constraint P ⪯ −I to get separation by an ellipsoid
(figure: separation by ellipsoid)
Placement and facility location

N points with coordinates xi ∈ R² (or R³); some positions are fixed, the remaining positions are variables
minimize a cost function of the pairwise positions
    minimize Σ_{(i,j)∈A} fij(xi, xj)
over the set A of connected pairs; convex if the functions fij are convex

example: minimize Σ_{(i,j)∈A} h(‖xi − xj‖₂)
(figures: optimal placements for h(z) = z, h(z) = z², and h(z) = z⁴)
9. Numerical linear algebra background

matrix structure and algorithm complexity
solving linear equations with factored matrices
LU, Cholesky, LDLᵀ factorizations
block elimination and the matrix inversion lemma
examples of cheaply invertible matrices

orthogonal matrices: A⁻¹ = Aᵀ
2n² flops to compute x = Aᵀb for general A

permutation matrices:
    Aij = 1 if j = πi,   0 otherwise
example:
    A = [0 1 0; 0 0 1; 1 0 0],        A⁻¹ = Aᵀ = [0 0 1; 1 0 0; 0 1 0]
the factor-solve method for Ax = b

factor A as a product of simple matrices, A = A1A2 · · · Ak
compute x = A⁻¹b = Ak⁻¹ · · · A2⁻¹A1⁻¹b by solving k 'easy' equations
    A1x1 = b,   A2x2 = x1,   . . . ,   Akx = x_{k−1}

equations with multiple right-hand sides
    Ax1 = b1,   Ax2 = b2,   . . . ,   Axm = bm
cost: one factorization plus m solves
LU factorization

every nonsingular matrix A can be factored as
    A = PLU
with P a permutation matrix, L lower triangular, U upper triangular
cost: (2/3)n³ flops
sparse LU factorization
    A = P1LUP2
adding permutation matrix P2 offers possibility of sparser L, U (hence, cheaper factor and solve steps)
P1 and P2 chosen (heuristically) to yield sparse L, U
choice of P1 and P2 depends on sparsity pattern and values of A
cost is usually much less than (2/3)n³; exact value depends in a complicated way on n, number of zeros in A, sparsity pattern
Cholesky factorization

every positive definite A can be factored as
    A = LLᵀ
with L lower triangular
cost: (1/3)n³ flops
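a factor-solve sketch with SciPy's Cholesky routines (illustrative SPD matrix; one factorization reused for several right-hand sides):

    import numpy as np
    from scipy.linalg import cho_factor, cho_solve

    rng = np.random.default_rng(0)
    B = rng.standard_normal((500, 500))
    A = B @ B.T + 500 * np.eye(500)     # positive definite by construction

    c, low = cho_factor(A)              # (1/3)n^3 flops, done once
    for _ in range(3):                  # about 2n^2 flops per solve
        b = rng.standard_normal(500)
        x = cho_solve((c, low), b)
        assert np.allclose(A @ x, b)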
LDLᵀ factorization

every nonsingular symmetric matrix A can be factored as
    A = PLDLᵀPᵀ
with P a permutation matrix, L lower triangular, D block diagonal with 1×1 or 2×2 diagonal blocks
cost: (1/3)n³

cost of solving symmetric sets of linear equations by LDLᵀ factorization:
    (1/3)n³ + 2n² ≈ (1/3)n³ for large n
for sparse A, can choose P to yield sparse L; cost ≪ (1/3)n³
Equations with structured sub-blocks

    [A11  A12; A21  A22] [x1; x2] = [b1; b2]        (1)

if A11 is nonsingular, can eliminate x1: x1 = A11⁻¹(b1 − A12x2)
to compute x2, solve
    (A22 − A21A11⁻¹A12) x2 = b2 − A21A11⁻¹b1
(A22 − A21A11⁻¹A12 is the Schur complement of A11)
Structured matrix plus low rank term

    (A + BC)x = b
with A ∈ R^{n×n} structured (Ax = b easy to solve), B ∈ R^{n×p}, C ∈ R^{p×n}

first write as
    [A  B; C  −I] [x; y] = [b; 0]
then apply block elimination: solve
    (I + CA⁻¹B) y = CA⁻¹b,
then solve Ax = b − By        (2)
10. Unconstrained minimization
Unconstrained minimization

    minimize f(x)
f convex, twice continuously differentiable (hence dom f open)
we assume optimal value p* = inf_x f(x) is attained (and finite)

unconstrained minimization methods
produce sequence of points x(k) ∈ dom f, k = 0, 1, . . . with
    f(x(k)) → p*
can be interpreted as iterative methods for solving optimality condition
    ∇f(x*) = 0
Initial point and sublevel set

algorithms in this chapter require a starting point x(0) such that x(0) ∈ dom f and the sublevel set S = {x | f(x) ≤ f(x(0))} is closed

the second condition is hard to verify, except when all sublevel sets are closed:
true if dom f = Rⁿ
true if f(x) → ∞ as x → bd dom f

examples of differentiable functions with closed sublevel sets:
    f(x) = log ( Σ_{i=1}^m exp(aᵢᵀx + bi) ),        f(x) = −Σ_{i=1}^m log(bi − aᵢᵀx)
Strong convexity and implications

f is strongly convex on S if there exists an m > 0 such that
    ∇²f(x) ⪰ mI  for all x ∈ S

implications
for x, y ∈ S,
    f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2)‖x − y‖₂²
hence, S is bounded
p* > −∞, and for x ∈ S,
    f(x) − p* ≤ (1/(2m)) ‖∇f(x)‖₂²
Descent methods

    x(k+1) = x(k) + t(k)Δx(k)   with f(x(k+1)) < f(x(k))
Δx is the step or search direction; t is the step size or step length
from convexity, f(x⁺) < f(x) implies ∇f(x)ᵀΔx < 0 (Δx is a descent direction)

line search types
exact line search: t = argmin_{t>0} f(x + tΔx)
backtracking line search (with parameters α ∈ (0, 1/2), β ∈ (0, 1)):
starting at t = 1, repeat t := βt until
    f(x + tΔx) < f(x) + αt ∇f(x)ᵀΔx
(figure: backtracking stops once f(x + tΔx) falls below the line f(x) + αt∇f(x)ᵀΔx)
Gradient descent method

general descent method with Δx = −∇f(x); for strongly convex f,
    f(x(k)) − p* ≤ cᵏ (f(x(0)) − p*)
c ∈ (0, 1) depends on m, x(0), line search type
quadratic problem in R²
    f(x) = (1/2)(x1² + γx2²)    (γ > 0)
with exact line search, starting at x(0) = (γ, 1):
    x1(k) = γ ( (γ − 1)/(γ + 1) )ᵏ,        x2(k) = ( −(γ − 1)/(γ + 1) )ᵏ
very slow if γ ≫ 1 or γ ≪ 1
(figure: zig-zagging iterates x(0), x(1), . . . on the contour lines of f)
nonquadratic example
    f(x1, x2) = e^{x1+3x2−0.1} + e^{x1−3x2−0.1} + e^{−x1−0.1}
(figures: iterates with backtracking line search and with exact line search)
a problem in R¹⁰⁰
    f(x) = cᵀx − Σ_{i=1}^{500} log(bi − aᵢᵀx)
(figure: f(x(k)) − p* versus k for exact and backtracking line search; both show linear convergence, i.e., a straight line on a semilog plot)
examples
Euclidean norm: xsd = f (x)
f (x)
xnsd
Unconstrained minimization
f (x)
xnsd
1012
x
x(1)
x(0)
(0)
x(2)
x(2)
x(1)
steepest descent with backtracking line search for two quadratic norms
ellipses show {x | kx x(k)kP = 1}
equivalent interpretation of steepest descent with quadratic norm k kP :
gradient descent after change of variables x
= P 1/2x
shows choice of P has strong effect on speed of convergence
Unconstrained minimization
1013
Newton step

    Δxnt = −∇²f(x)⁻¹∇f(x)

interpretations
x + Δxnt minimizes second order approximation
    f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v
(figure: f and the quadratic model f̂ at (x, f(x)); the minimizer of f̂ is (x + Δxnt, f̂(x + Δxnt)))
x + Δxnt solves linearized optimality condition
    ∇f(x + v) ≈ ∇f(x) + ∇²f(x)v = 0
Δxnt is steepest descent direction at x in local Hessian norm
    ‖u‖_{∇²f(x)} = ( uᵀ∇²f(x)u )^{1/2}
(figure: the directions x + Δxnsd and x + Δxnt on the ellipse {x + v | vᵀ∇²f(x)v = 1})
Newton decrement

    λ(x) = ( ∇f(x)ᵀ∇²f(x)⁻¹∇f(x) )^{1/2}

properties
gives an estimate of f(x) − p*, using quadratic approximation f̂:
    f(x) − inf_y f̂(y) = (1/2) λ(x)²
equal to the norm of the Newton step in the quadratic Hessian norm
    λ(x) = ( Δxntᵀ ∇²f(x) Δxnt )^{1/2}
directional derivative in the Newton direction: ∇f(x)ᵀΔxnt = −λ(x)²
Newton's method

given a starting point x ∈ dom f, tolerance ε > 0.
repeat
    1. Compute the Newton step and decrement: Δxnt := −∇²f(x)⁻¹∇f(x); λ² := ∇f(x)ᵀ∇²f(x)⁻¹∇f(x).
    2. Stopping criterion: quit if λ²/2 ≤ ε.
    3. Line search: choose step size t by backtracking line search.
    4. Update: x := x + tΔxnt.

affine invariant, i.e., independent of linear changes of coordinates
Classical convergence analysis

assumptions: f strongly convex on S with constant m; ∇²f is Lipschitz continuous on S with constant L > 0

outline: there exist constants η ∈ (0, m²/L), γ > 0 such that
if ‖∇f(x)‖₂ ≥ η, then f(x(k+1)) − f(x(k)) ≤ −γ    (damped Newton phase)
if ‖∇f(x)‖₂ < η, then
    (L/(2m²)) ‖∇f(x(k+1))‖₂ ≤ ( (L/(2m²)) ‖∇f(x(k))‖₂ )²    (quadratically convergent phase)

conclusion: number of iterations until f(x) − p* ≤ ε is bounded above by
    (f(x(0)) − p*)/γ + log₂ log₂(ε₀/ε)
γ, ε₀ are constants that depend on m, L, x(0); the second term grows extremely slowly with required accuracy
Examples

example in R² (the nonquadratic example above): high accuracy in about five iterations
example in R¹⁰⁰ (the log-barrier problem above)
(figures: f(x(k)) − p* versus k shows quadratic convergence; backtracking and exact line search perform similarly, and the backtracking step size reaches t = 1 after a few iterations)
example in R¹⁰⁰⁰⁰ (with sparse aᵢ)
    f(x) = −Σ_{i=1}^{10000} log(1 − xi²) − Σ_{i=1}^{100000} log(bi − aᵢᵀx)
(figure: f(x(k)) − p* versus k; roughly 20 iterations for a sparse problem with 10⁴ variables)
Self-concordance

shortcomings of classical convergence analysis
depends on unknown constants (m, L, . . . )
bound is not affinely invariant, although Newton's method is

convergence analysis via self-concordance (Nesterov and Nemirovski)
does not depend on any unknown constants
gives affine-invariant bound
applies to special class of convex functions (self-concordant functions)
developed to analyze polynomial-time interior-point methods for convex optimization
Self-concordant functions

definition: convex f : R → R is self-concordant if
    |f‴(x)| ≤ 2 f″(x)^{3/2}  for all x ∈ dom f
(for f : Rⁿ → R: require that g(t) = f(x + tv) is self-concordant for all x ∈ dom f, v ∈ Rⁿ)
Self-concordant calculus

properties
preserved under positive scaling α ≥ 1, and sum
preserved under composition with affine function
convergence analysis for self-concordant functions: there exist constants η ∈ (0, 1/4], γ > 0 such that
if λ(x) > η, then
    f(x(k+1)) − f(x(k)) ≤ −γ
if λ(x) ≤ η, then
    2λ(x(k+1)) ≤ ( 2λ(x(k)) )²
(η and γ only depend on the backtracking parameters α, β)

complexity bound: number of Newton iterations bounded by
    (f(x(0)) − p*)/γ + log₂ log₂(1/ε)
numerical example: 150 randomly generated instances of
    minimize f(x) = −Σ_{i=1}^m log(bi − aᵢᵀx)
○: m = 100, n = 50
□: m = 1000, n = 500
◇: m = 1000, n = 50
(figure: number of Newton iterations versus f(x(0)) − p*; grows roughly linearly in f(x(0)) − p* and is much smaller than the worst-case bound)
Implementation

main effort in each iteration: evaluate derivatives and solve Newton system
    HΔx = −g
where H = ∇²f(x), g = ∇f(x)

via Cholesky factorization
    H = LLᵀ,        Δxnt = −L⁻ᵀL⁻¹g,        λ(x) = ‖L⁻¹g‖₂
example of dense Newton system with structure
    f(x) = Σ_{i=1}^n ψi(xi) + ψ0(Ax + b),        H = D + AᵀH0A
where D is diagonal (from the ψi terms) and H0 = ∇²ψ0(Ax + b)

via block elimination: factor H0 = L0L0ᵀ and write the Newton system as
    DΔx + AᵀL0w = −g,        L0ᵀAΔx − w = 0
eliminate Δx using
    DΔx = −g − AᵀL0w
and solve the remaining smaller system for w; much cheaper when ψ0 has few terms
11. Equality constrained minimization

Equality constrained minimization

    minimize   f(x)
    subject to Ax = b
f convex, twice continuously differentiable; A ∈ R^{p×n} with rank A = p
optimality conditions: x* is optimal iff there exists a ν* such that
    ∇f(x*) + Aᵀν* = 0,   Ax* = b
equality constrained quadratic minimization (with P ∈ Sⁿ₊)
    minimize   (1/2)xᵀPx + qᵀx + r
    subject to Ax = b
optimality condition:
    [P  Aᵀ; A  0] [x*; ν*] = [−q; b]
the coefficient matrix is called the KKT matrix; it is nonsingular if and only if
    Ax = 0, x ≠ 0  ⟹  xᵀPx > 0
eliminating equality constraints: example, optimal allocation with resource constraint
    minimize   f1(x1) + · · · + fn(xn)
    subject to x1 + x2 + · · · + xn = b
eliminate xn = b − x1 − · · · − x_{n−1}, i.e., choose
    F = [I; −1ᵀ] ∈ R^{n×(n−1)}
reduced problem:
    minimize f1(x1) + · · · + f_{n−1}(x_{n−1}) + fn(b − x1 − · · · − x_{n−1})
(variables x1, . . . , x_{n−1})
Newton step

Newton step Δxnt of f at feasible x is given by the solution v of
    [∇²f(x)  Aᵀ; A  0] [v; w] = [−∇f(x); 0]

interpretations
Δxnt solves second order approximation (with variable v)
    minimize   f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v
    subject to A(x + v) = b
Δxnt equations follow from linearizing optimality conditions
    ∇f(x + v) + Aᵀw ≈ ∇f(x) + ∇²f(x)v + Aᵀw = 0,        A(x + v) = b
Newton decrement

    λ(x) = ( Δxntᵀ ∇²f(x) Δxnt )^{1/2} = ( −∇f(x)ᵀΔxnt )^{1/2}

properties
gives an estimate of f(x) − p* using quadratic approximation f̂:
    f(x) − inf_{Ay=b} f̂(y) = (1/2) λ(x)²
directional derivative in the Newton direction:
    (d/dt) f(x + tΔxnt) |_{t=0} = −λ(x)²

Newton's method with equality constraints: same algorithm as for unconstrained minimization, with the Newton step above; a feasible descent method (all iterates satisfy Ax(k) = b) and affine invariant
Newton's method with infeasible start: linearize the optimality conditions at an x that need not satisfy Ax = b, and define the primal-dual residual
    r(y) = r(x, ν) = ( ∇f(x) + Aᵀν,  Ax − b )
Infeasible start Newton method

given starting point x ∈ dom f, ν, tolerance ε > 0, α ∈ (0, 1/2), β ∈ (0, 1).
repeat
    1. Compute primal and dual Newton steps Δxnt, Δνnt.
    2. Backtracking line search on ‖r‖₂:
       t := 1.
       while ‖r(x + tΔxnt, ν + tΔνnt)‖₂ > (1 − αt)‖r(x, ν)‖₂,  t := βt.
    3. Update: x := x + tΔxnt, ν := ν + tΔνnt.
until Ax = b and ‖r(x, ν)‖₂ ≤ ε.

not a descent method in f; directional derivative of ‖r(y)‖₂ in the Newton direction is
    (d/dt) ‖r(y + tΔy)‖₂ |_{t=0} = −‖r(y)‖₂
Solving KKT systems

    [H  Aᵀ; A  0] [v; w] = −[g; h]

solution methods
LDLᵀ factorization
elimination (if H nonsingular)
    AH⁻¹Aᵀw = h − AH⁻¹g,        Hv = −(g + Aᵀw)
Equality constrained analytic centering

primal problem: minimize −Σ_{i=1}^n log xi subject to Ax = b
dual problem: maximize −bᵀν + Σ_{i=1}^n log(Aᵀν)i + n

three methods for an example with A ∈ R^{100×500}: Newton method with feasible starting point, Newton method applied to the dual, and infeasible start Newton method
(figures: suboptimality versus iteration for the three methods; all converge rapidly)
complexity per iteration of the three methods is identical: each requires solving a KKT system of the form
    [diag(x)⁻²  Aᵀ; A  0] [Δx; w] = [diag(x)⁻¹1; 0]
via block elimination, which reduces to an equation in w with coefficient matrix A diag(x)² Aᵀ; forming and factoring this matrix dominates the cost in all three methods
Network flow optimization

    minimize   Σ_{i=1}^n φi(xi)
    subject to Ax = b
directed graph with n arcs, p + 1 nodes
xi: flow through arc i; φi: cost flow function for arc i (with φi″(x) > 0)
node-incidence matrix entries:
    1 if arc j leaves node i,   −1 if arc j enters node i,   0 otherwise
KKT system
    [H  Aᵀ; A  0] [v; w] = −[g; h]
with H = diag(φ1″(x1), . . . , φn″(xn)), positive diagonal
solve via elimination:
    AH⁻¹Aᵀw = h − AH⁻¹g,        Hv = −(g + Aᵀw)
sparsity pattern of AH⁻¹Aᵀ is given by the graph connectivity
Analytic center of linear matrix inequality

    minimize   −log det X
    subject to tr(AiX) = bi,  i = 1, . . . , p
variable X ∈ Sⁿ

optimality conditions
    X* ≻ 0,   −(X*)⁻¹ + Σ_{j=1}^p νj* Aj = 0,   tr(AiX*) = bi,  i = 1, . . . , p

Newton equation at feasible X:
    X⁻¹ΔX X⁻¹ + Σ_{j=1}^p wj Aj = X⁻¹,   tr(AiΔX) = 0,  i = 1, . . . , p
(follows from the linear approximation (X + ΔX)⁻¹ ≈ X⁻¹ − X⁻¹ΔX X⁻¹)

solution by block elimination
eliminate ΔX from the first equation: ΔX = X − Σ_{j=1}^p wj X Aj X
plug ΔX into tr(AiΔX) = 0 to get a dense linear system in w        (2)
12. Interior-point methods

Inequality constrained minimization

    minimize   f0(x)
    subject to fi(x) ≤ 0,  i = 1, . . . , m        (1)
               Ax = b

fi convex, twice continuously differentiable
we assume p* is finite and attained, and that the problem is strictly feasible: there exists x̃ with
    fi(x̃) < 0,  i = 1, . . . , m,   Ax̃ = b
hence, strong duality holds and the dual optimum is attained
Examples

LP, QP, QCQP, GP
entropy maximization with linear inequality constraints
    minimize   Σ_{i=1}^n xi log xi
    subject to Fx ⪯ g
               Ax = b
with dom f0 = Rⁿ₊₊

differentiability may require reformulating the problem, e.g., piecewise-linear minimization or ℓ∞-norm approximation via LP
SDPs and SOCPs are better handled as problems with generalized inequalities (see later)
Logarithmic barrier

reformulation of (1) via indicator function:
    minimize   f0(x) + Σ_{i=1}^m I₋(fi(x))
    subject to Ax = b
where I₋(u) = 0 if u ≤ 0, I₋(u) = ∞ otherwise (indicator function of R₋)

approximation via logarithmic barrier
    minimize   f0(x) − (1/t) Σ_{i=1}^m log(−fi(x))
    subject to Ax = b
an equality constrained problem; for t > 0, −(1/t) log(−u) is a smooth approximation of I₋
(figure: −(1/t) log(−u) for several values of t, approaching the indicator as t grows)
logarithmic barrier function
    φ(x) = −Σ_{i=1}^m log(−fi(x)),   dom φ = {x | f1(x) < 0, . . . , fm(x) < 0}
convex; twice continuously differentiable, with derivatives
    ∇φ(x) = Σ_{i=1}^m (1/(−fi(x))) ∇fi(x)
    ∇²φ(x) = Σ_{i=1}^m (1/fi(x)²) ∇fi(x)∇fi(x)ᵀ + Σ_{i=1}^m (1/(−fi(x))) ∇²fi(x)
Central path

for t > 0, define x*(t) as the solution of
    minimize   tf0(x) + φ(x)
    subject to Ax = b
(for now, assume x*(t) exists and is unique for each t > 0)
central path is {x*(t) | t > 0}

example: central path for an LP
    minimize   cᵀx
    subject to aᵢᵀx ≤ bi,  i = 1, . . . , 6
hyperplane cᵀx = cᵀx*(t) is tangent to the level curve of φ through x*(t)
(figure: central path from x*(10) to x*)
Dual points on central path

x = x*(t) if there exists a w such that
    t∇f0(x) + Σ_{i=1}^m (1/(−fi(x))) ∇fi(x) + Aᵀw = 0,   Ax = b
therefore, x*(t) minimizes the Lagrangian L(x, λ*(t), ν*(t)), where
    λi*(t) = 1/(−t fi(x*(t))),        ν*(t) = w/t
this confirms the intuitive idea that f0(x*(t)) → p* as t → ∞:
    p* ≥ g(λ*(t), ν*(t)) = f0(x*(t)) − m/t
Interpretation via KKT conditions

x = x*(t), λ = λ*(t), ν = ν*(t) satisfy
1. primal constraints: fi(x) ≤ 0, i = 1, . . . , m, Ax = b
2. dual constraints: λ ⪰ 0
3. approximate complementary slackness: −λi fi(x) = 1/t, i = 1, . . . , m
4. gradient of Lagrangian with respect to x vanishes:
    ∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Aᵀν = 0
difference with KKT is that condition 3 replaces λi fi(x) = 0

Force field interpretation

centering problem (for problem with no equality constraints)
    minimize   tf0(x) − Σ_{i=1}^m log(−fi(x))
tf0(x) is the potential of force F0(x) = −t∇f0(x); −log(−fi(x)) is the potential of force Fi(x) = (1/fi(x))∇fi(x); the forces balance at x*(t):
    F0(x*(t)) + Σ_{i=1}^m Fi(x*(t)) = 0
example
    minimize   cᵀx
    subject to aᵢᵀx ≤ bi,  i = 1, . . . , m
objective force field is constant: F0(x) = −tc
constraint force field decays as inverse distance to constraint hyperplane:
    Fi(x) = −ai/(bi − aᵢᵀx),        ‖Fi(x)‖₂ = 1/dist(x, Hi)
where Hi = {x | aᵢᵀx = bi}
(figures: force balance at t = 1 and t = 3; larger t pushes the central point closer to the optimum)
Barrier method

given strictly feasible x, t := t(0) > 0, μ > 1, tolerance ε > 0.
repeat
    1. Centering step: compute x*(t) by minimizing tf0 + φ, subject to Ax = b, starting at x.
    2. Update: x := x*(t).
    3. Stopping criterion: quit if m/t < ε.
    4. Increase t: t := μt.

terminates with f0(x) − p* ≤ ε (follows from f0(x*(t)) − p* ≤ m/t)
centering is usually done by Newton's method, starting at the current x
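a compact sketch of the barrier method for the inequality form LP minimize cᵀx subject to Ax ⪯ b (NumPy; the centering step is a damped Newton iteration on tcᵀx − Σ log(bi − aᵢᵀx), and the random data are illustrative — the resulting polyhedron is bounded with high probability):

    import numpy as np

    rng = np.random.default_rng(0)
    m, n = 60, 10
    A = rng.standard_normal((m, n))
    b = 1.0 + rng.random(m)             # x = 0 is strictly feasible
    c = rng.standard_normal(n)

    def center(x, t, iters=50):
        # Newton's method on t*c'x - sum_i log(b_i - a_i'x)
        for _ in range(iters):
            d = 1.0 / (b - A @ x)
            g = t * c + A.T @ d
            H = A.T @ (d[:, None] ** 2 * A)
            dx = np.linalg.solve(H, -g)
            if -g @ dx / 2 <= 1e-10:    # Newton decrement stopping rule
                break
            s = 1.0                     # backtracking, staying strictly feasible
            while np.min(b - A @ (x + s * dx)) <= 0 or \
                  t * c @ (s * dx) \
                  - np.log((b - A @ (x + s*dx)) / (b - A @ x)).sum() \
                  >= 0.25 * s * (g @ dx):
                s *= 0.5
            x = x + s * dx
        return x

    x, t, mu, eps = np.zeros(n), 1.0, 20.0, 1e-6
    while m / t >= eps:                 # outer loop: increase t by factor mu
        x = center(x, t)
        t *= mu
    print("p* ~", c @ x)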
Convergence analysis

number of outer (centering) iterations: exactly
    ⌈ log(m/(εt(0))) / log μ ⌉
plus the initial centering step (to compute x*(t(0)))
Examples

inequality form LP (m = 100 inequalities, n = 50 variables)
(figures: duality gap versus Newton iterations for μ = 2, 50, 150 — the gap decreases in steps, one per outer iteration; and total Newton iterations versus μ — roughly constant over a wide range of μ)
geometric program
    minimize   log ( Σ_{k=1}^5 exp(a0kᵀx + b0k) )
    subject to log ( Σ_{k=1}^5 exp(aikᵀx + bik) ) ≤ 0,  i = 1, . . . , m
(figure: duality gap versus Newton iterations for μ = 2, 50, 150; behavior very similar to the LP example)
family of standard LPs (minimize cᵀx subject to Ax = b, x ⪰ 0, with A ∈ R^{m×2m}), for m = 10, . . . , 1000
(figure: number of Newton iterations versus m, roughly 15–35 throughout; grows very slowly with problem size)
Feasibility and phase I methods

feasibility problem: find x such that
    fi(x) ≤ 0,  i = 1, . . . , m,   Ax = b        (2)

phase I: computes strictly feasible starting point for barrier method
basic phase I method:
    minimize (over x, s)  s
    subject to            fi(x) ≤ s,  i = 1, . . . , m        (3)
                          Ax = b
if x, s feasible with s < 0, then x is strictly feasible for (2)
phase I example: a family of linear inequalities whose feasibility margin is varied
(figures: histograms of the slacks bi − aᵢᵀx at the phase I solutions for a feasible and an infeasible instance, and number of Newton iterations versus the margin; the effort grows as the problem approaches the boundary between feasibility and infeasibility)
Complexity analysis via self-concordance

same assumptions as before, plus: tf0 + φ is self-concordant with closed sublevel sets
holds for LP, QP, QCQP, and for entropy maximization after adding the explicit constraint x ⪰ 0:
    minimize   Σ_{i=1}^n xi log xi
    subject to Fx ⪯ g,  x ⪰ 0
Newton iterations per centering step: from self-concordance theory,
    #Newton iterations ≤ (μtf0(x) + φ(x) − μtf0(x⁺) − φ(x⁺))/γ + c
bound on the effort of computing x⁺ = x*(μt) starting at x = x*(t):
    μtf0(x) + φ(x) − μtf0(x⁺) − φ(x⁺) ≤ m(μ − 1 − log μ)
total number of Newton iterations (excluding first centering step)
    N ≤ ⌈ log(m/(t(0)ε)) / log μ ⌉ × ( m(μ − 1 − log μ)/γ + c )
(figure: N versus μ for m = 100, m/(t(0)ε) = 10⁵; the bound is minimized for μ close to 1)
confirms trade-off in choice of μ; in practice, #iterations is in the tens and not very sensitive to μ
polynomial-time complexity of barrier method

for μ = 1 + 1/√m:
    N = O( √m log(m/(t(0)ε)) )
number of Newton iterations for fixed gap reduction is O(√m)
Generalized inequalities

    minimize   f0(x)
    subject to fi(x) ⪯_{Ki} 0,  i = 1, . . . , m
               Ax = b
f0 convex; fi : Rⁿ → R^{ki} convex with respect to proper cones Ki; fi twice continuously differentiable
examples of greatest interest: SOCP, SDP
Generalized logarithm for proper cone

ψ : Rq → R is a generalized logarithm of degree θ for a proper cone K ⊆ Rq if dom ψ = int K and
    ψ(sy) = ψ(y) + θ log s  for y ≻_K 0, s > 0

examples
nonnegative orthant K = Rⁿ₊: ψ(y) = Σ_{i=1}^n log yi, with degree θ = n
positive semidefinite cone K = Sⁿ₊: ψ(Y) = log det Y    (θ = n)
second-order cone K = {y ∈ Rⁿ⁺¹ | (y1² + · · · + yn²)^{1/2} ≤ yn+1}:
    ψ(y) = log( yn+1² − y1² − · · · − yn² )    (θ = 2)
properties (without proof): for y ≻_K 0,
    ∇ψ(y) ⪰_{K*} 0,        yᵀ∇ψ(y) = θ

nonnegative orthant Rⁿ₊: ψ(y) = Σ log yi
    ∇ψ(y) = (1/y1, . . . , 1/yn),        yᵀ∇ψ(y) = n
positive semidefinite cone Sⁿ₊: ψ(Y) = log det Y
    ∇ψ(Y) = Y⁻¹,        tr(Y ∇ψ(Y)) = n
second-order cone: ψ(y) = log( yn+1² − y1² − · · · − yn² )
    ∇ψ(y) = 2/(yn+1² − y1² − · · · − yn²) · (−y1, . . . , −yn, yn+1),        yᵀ∇ψ(y) = 2
Logarithmic barrier and central path

logarithmic barrier for f1(x) ⪯_{K1} 0, . . . , fm(x) ⪯_{Km} 0:
    φ(x) = −Σ_{i=1}^m ψi(−fi(x)),   dom φ = {x | fi(x) ≺_{Ki} 0, i = 1, . . . , m}
(ψi is generalized logarithm for Ki, with degree θi)
1226
m
X
i=1
Dfi(x)T i(fi(x)) + AT w = 0
1
= i(fi(x(t))),
t
w
(t) =
t
Interior-point methods
m
X
i=1
1227
i = 1, . . . , n
i = 1, . . . , n
1228
Barrier method

given strictly feasible x, t := t(0) > 0, μ > 1, tolerance ε > 0.
repeat
    1. Centering step: compute x*(t) by minimizing tf0 + φ, subject to Ax = b, starting at x.
    2. Update: x := x*(t).
    3. Stopping criterion: quit if (Σ_i θi)/t < ε.
    4. Increase t: t := μt.

only difference with the scalar case is the stopping criterion (duality gap is (Σ_i θi)/t)
Examples

(figures: duality gap versus Newton iterations for a second-order cone program and a semidefinite program, each for several values of μ, and total Newton iterations versus μ; the same step-wise gap reduction and insensitivity to μ as in the scalar case)

family of SDPs (A ∈ Sⁿ, x ∈ Rⁿ): minimize 1ᵀx subject to A + diag(x) ⪰ 0, for n = 10, . . . , 1000
(figure: number of Newton iterations versus n, roughly 20–35; grows very slowly with problem size)
13. Conclusions
Modeling
mathematical optimization
problems in engineering design, data analysis and statistics, economics,
management, . . . , can often be expressed as mathematical
optimization problems
techniques exist to take into account multiple objectives or uncertainty
in the data
tractability
roughly speaking, tractability in optimization requires convexity
algorithms for nonconvex optimization find local (suboptimal) solutions,
or are very expensive
surprisingly many applications can be formulated as convex problems
Further topics

some topics we didn't cover:
methods for very large scale problems
subgradient calculus, convex analysis
localization, subgradient, and related methods
distributed convex optimization
applications that build on or use convex optimization
. . . these will be in EE364B next quarter