Convex Optimization (Boyd & Vandenberghe)
1. Introduction
mathematical optimization
least-squares and linear programming
convex optimization
example
course goals and topics
nonlinear optimization
brief history of convex optimization
Mathematical optimization
minimize    f0(x)
subject to  fi(x) ≤ bi,  i = 1, ..., m

x = (x1, ..., xn): optimization variables
f0 : R^n → R: objective function
fi : R^n → R, i = 1, ..., m: constraint functions
Examples
portfolio optimization
variables: amounts invested in different assets
constraints: budget, max./min. investment per asset, minimum return
objective: overall risk or return variance
data fitting
variables: model parameters
constraints: prior information, parameter limits
objective: measure of misfit or prediction error
Solving optimization problems

general optimization problems are very difficult to solve; a few problem
classes can be solved efficiently and reliably:

least-squares problems
linear programming problems
convex optimization problems
Least-squares

minimize  ‖Ax − b‖₂²

analytical solution x⋆ = (AᵀA)⁻¹Aᵀb; reliable and efficient algorithms;
a mature technology

using least-squares: easy to recognize; a few standard techniques
(weights, regularization terms) increase flexibility
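Below is a minimal NumPy sketch of a least-squares solve; the data A, b are random placeholders, not from the slides.

```python
# Minimal least-squares sketch: minimize ||Ax - b||_2^2 with NumPy.
import numpy as np

np.random.seed(0)
A = np.random.randn(20, 5)     # k x n data matrix (placeholder)
b = np.random.randn(20)

# np.linalg.lstsq returns a minimizer of ||Ax - b||_2
x, residuals, rank, sv = np.linalg.lstsq(A, b, rcond=None)

# agrees with the analytical solution x = (A^T A)^{-1} A^T b
# (A has full column rank here)
x_closed = np.linalg.solve(A.T @ A, A.T @ b)
assert np.allclose(x, x_closed)
```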
Linear programming

minimize    cᵀx
subject to  aiᵀx ≤ bi,  i = 1, ..., m
Convex optimization problem
minimize    f0(x)
subject to  fi(x) ≤ bi,  i = 1, ..., m

objective and constraint functions are convex: they satisfy

fi(αx + βy) ≤ α fi(x) + β fi(y)   if α + β = 1, α ≥ 0, β ≥ 0
solving convex optimization problems
no analytical solution
reliable and efficient algorithms
computation time (roughly) proportional to max{n³, n²m, F}, where F
is the cost of evaluating the fi and their first and second derivatives
almost a technology
Example: lamp illumination

m lamps illuminate n small flat patches; illumination Ik at patch k is
linear in the lamp powers pj: Ik = Σj akj pj, with akj = rkj⁻² max{cos θkj, 0}

goal: achieve a desired illumination Ides with bounded lamp powers
0 ≤ pj ≤ pmax
how to solve?
1. use uniform power: pj = p, vary p
2. use least-squares:
   minimize Σ_{k=1}^n (Ik − Ides)²
   (round pj if pj > pmax or pj < 0)
3. use weighted least-squares, iteratively adjusting the weights until
   0 ≤ pj ≤ pmax
4. use linear programming
5. use convex optimization: problem is equivalent to

   minimize    f0(p) = max_k h(Ik/Ides)
   subject to  0 ≤ pj ≤ pmax,  j = 1, ..., m

   with h(u) = max{u, 1/u}
   [figure: plot of h(u) for 0 ≤ u ≤ 4; h is convex, with minimum value 1 at u = 1]

   f0 is convex, so this is a convex optimization problem
additional constraints: does adding 1 or 2 below complicate the problem?
answer: with (1), still easy to solve; with (2), extremely difficult
moral: (untrained) intuition doesn't always work; without the proper
background, very easy problems can appear quite similar to very
difficult problems
Course goals and topics
goals
topics
Nonlinear optimization
Brief history of convex optimization
theory (convex analysis): ca. 1900-1970
algorithms
1947: simplex algorithm for linear programming (Dantzig)
1960s: early interior-point methods (Fiacco & McCormick, Dikin, . . . )
1970s: ellipsoid method and other subgradient methods
1980s: polynomial-time interior-point methods for linear programming
(Karmarkar 1984)
late 1980s-now: polynomial-time interior-point methods for nonlinear
convex optimization (Nesterov & Nemirovski 1994)
applications
before 1990: mostly in operations research; few in engineering
since 1990: many new applications in engineering (control, signal
processing, communications, circuit design, . . . ); new problem classes
(semidefinite and second-order cone programming, robust optimization)
Convex Optimization (Boyd & Vandenberghe)
2. Convex sets
affine and convex sets
operations that preserve convexity
generalized inequalities
separating and supporting hyperplanes
dual cones and generalized inequalities
Affine set
line through x1, x2: all points of the form

x = θx1 + (1 − θ)x2   (θ ∈ R)

[figure: line through x1 and x2, with points marked at θ = 1.2, 1, 0.6, 0, −0.2]

affine set: contains the line through any two distinct points in the set

example: solution set of linear equations {x | Ax = b}
Convex set
line segment between x1 and x2: all points of the form

x = θx1 + (1 − θ)x2   with 0 ≤ θ ≤ 1

convex set: contains line segment between any two points in the set

x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⇒  θx1 + (1 − θ)x2 ∈ C
Convex combination and convex hull
convex combination of x1, ..., xk: any point x of the form

x = θ1x1 + θ2x2 + ··· + θkxk

with θ1 + ··· + θk = 1, θi ≥ 0

convex hull conv S: set of all convex combinations of points in S
Convex cone
conic (nonnegative) combination of x1 and x2: any point of the form

x = θ1x1 + θ2x2   with θ1 ≥ 0, θ2 ≥ 0

[figure: the cone generated by x1 and x2, with apex at 0]

convex cone: set that contains all conic combinations of points in the set
Hyperplanes and halfspaces
hyperplane: set of the form {x | aᵀx = b}  (a ≠ 0)

halfspace: set of the form {x | aᵀx ≤ b}  (a ≠ 0)

a is the normal vector; hyperplanes are affine and convex; halfspaces are convex

[figure: hyperplane aᵀx = b through x0 with normal a, and the two
halfspaces aᵀx ≥ b and aᵀx ≤ b]
Euclidean balls and ellipsoids
(Euclidean) ball with center xc and radius r:

B(xc, r) = {x | ‖x − xc‖₂ ≤ r} = {xc + ru | ‖u‖₂ ≤ 1}

ellipsoid: set of the form

{x | (x − xc)ᵀP⁻¹(x − xc) ≤ 1}

with P ∈ S^n₊₊ (i.e., P symmetric positive definite)
Norm balls and norm cones
norm: a function ‖·‖ that satisfies

‖x‖ ≥ 0; ‖x‖ = 0 if and only if x = 0
‖tx‖ = |t| ‖x‖ for t ∈ R
‖x + y‖ ≤ ‖x‖ + ‖y‖

norm ball with center xc and radius r: {x | ‖x − xc‖ ≤ r}
norm cone: {(x, t) | ‖x‖ ≤ t}; norm balls and norm cones are convex
Polyhedra
solution set of finitely many linear inequalities and equalities:

Ax ⪯ b,   Cx = d

(A ∈ R^{m×n}, C ∈ R^{p×n}, ⪯ is componentwise inequality)

[figure: polyhedron P as intersection of five halfspaces with outward
normals a1, ..., a5]
Positive semidefinite cone
notation:

S^n is the set of symmetric n × n matrices
S^n₊ = {X ∈ S^n | X ⪰ 0}: positive semidefinite n × n matrices
X ∈ S^n₊  ⟺  zᵀXz ≥ 0 for all z
S^n₊₊ = {X ∈ S^n | X ≻ 0}: positive definite n × n matrices

example: [x y; y z] ∈ S²₊
[figure: boundary of the positive semidefinite cone S²₊ in (x, y, z)-space]
Operations that preserve convexity

practical methods for establishing convexity of a set C:

1. apply definition

   x1, x2 ∈ C, 0 ≤ θ ≤ 1  ⇒  θx1 + (1 − θ)x2 ∈ C

2. show that C is obtained from simple convex sets (hyperplanes,
   halfspaces, norm balls, ...) by operations that preserve convexity
   (intersection, affine functions, perspective function,
   linear-fractional functions)

example:

S = {x ∈ R^m | |p(t)| ≤ 1 for |t| ≤ π/3},   p(t) = x1 cos t + ··· + xm cos mt

[figure for m = 2: p(t) over t, and the convex set S in the (x1, x2)-plane]
the affine image and inverse image of a convex set are convex

examples
scaling, translation, projection
solution set of linear matrix inequality {x | x1A1 + ··· + xmAm ⪯ B}
(with Ai, B ∈ S^p)
hyperbolic cone {x | xᵀPx ≤ (cᵀx)², cᵀx ≥ 0} (with P ∈ S^n₊)

images and inverse images of convex sets under perspective and
linear-fractional functions are convex

linear-fractional function f : R^n → R^m:

f(x) = (Ax + b)/(cᵀx + d),   dom f = {x | cᵀx + d > 0}

example of a linear-fractional function:

f(x) = x/(x1 + x2 + 1)

[figure: a convex set C and its image f(C)]
Generalized inequalities

a convex cone K ⊆ R^n is a proper cone if it is closed, solid, and pointed

examples
nonnegative orthant K = R^n₊ = {x ∈ R^n | xi ≥ 0, i = 1, ..., n}
positive semidefinite cone K = S^n₊
nonnegative polynomials on [0, 1]:
K = {x ∈ R^n | x1 + x2t + x3t² + ··· + xnt^{n−1} ≥ 0 for t ∈ [0, 1]}

generalized inequality defined by a proper cone K:

x ⪯_K y  ⟺  y − x ∈ K,    x ≺_K y  ⟺  y − x ∈ int K
examples
componentwise inequality (K = R^n₊):
x ⪯ y  ⟺  xi ≤ yi, i = 1, ..., n

many properties of ⪯_K are similar to ≤ on R, e.g.,
x ⪯_K y, u ⪯_K v  ⇒  x + u ⪯_K y + v

Minimum and minimal elements

x ∈ S is the minimum element of S with respect to ⪯_K if
y ∈ S  ⇒  x ⪯_K y

x ∈ S is a minimal element of S with respect to ⪯_K if
y ∈ S, y ⪯_K x  ⇒  y = x

example (K = R²₊):
[figure: x1 is the minimum element of S1; x2 is a minimal element of S2]
Separating and supporting hyperplanes

separating hyperplane theorem: if C and D are nonempty disjoint convex
sets, there exist a ≠ 0 and b such that

aᵀx ≤ b for x ∈ C,    aᵀx ≥ b for x ∈ D

the hyperplane {x | aᵀx = b} separates C and D
[figure: hyperplane separating C and D]

supporting hyperplane to set C at boundary point x0:

{x | aᵀx = aᵀx0}

where a ≠ 0 and aᵀx ≤ aᵀx0 for all x ∈ C

supporting hyperplane theorem: if C is convex, then there exists a
supporting hyperplane at every boundary point of C
[figure: supporting hyperplane to C at x0 with normal a]
Dual cones

dual cone of a cone K:

K* = {y | yᵀx ≥ 0 for all x ∈ K}

examples
K = R^n₊:  K* = R^n₊
K = S^n₊:  K* = S^n₊
K = {(x, t) | ‖x‖₂ ≤ t}:  K* = {(x, t) | ‖x‖₂ ≤ t}
K = {(x, t) | ‖x‖₁ ≤ t}:  K* = {(x, t) | ‖x‖∞ ≤ t}

the first three examples are self-dual cones

dual cones of proper cones are proper, hence define generalized inequalities:

y ⪰_{K*} 0  ⟺  yᵀx ≥ 0 for all x ⪰_K 0
Minimum and minimal elements via dual inequalities

[figures: characterizing the minimum element x1 of a set S, and minimal
elements x2 of a set S, via supporting hyperplanes with normals
λ ≻_{K*} 0]

optimal production frontier

different production methods use different amounts of resources x ∈ R^n
(e.g., labor and fuel); the production set P is the set of achievable
resource vectors; efficient (Pareto optimal) methods correspond to
resource vectors that are minimal with respect to R^n₊

example (n = 2): x1, x2, x3 are efficient; x4, x5 are not
[figure: production set P with efficient points on its lower-left boundary]
3. Convex functions
basic properties and examples
operations that preserve convexity
the conjugate function
quasiconvex functions
log-concave and log-convex functions
convexity with respect to generalized inequalities
Definition
f : R^n → R is convex if dom f is a convex set and

f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y)

for all x, y ∈ dom f, 0 ≤ θ ≤ 1

[figure: the chord from (x, f(x)) to (y, f(y)) lies above the graph of f]

f is concave if −f is convex

f is strictly convex if dom f is convex and
f(θx + (1 − θ)y) < θ f(x) + (1 − θ) f(y)
for x, y ∈ dom f, x ≠ y, 0 < θ < 1
Examples on R
convex:

affine: ax + b on R, for any a, b ∈ R
exponential: e^{ax}, for any a ∈ R
powers: x^α on R₊₊, for α ≥ 1 or α ≤ 0
powers of absolute value: |x|^p on R, for p ≥ 1
negative entropy: x log x on R₊₊

concave:

affine: ax + b on R, for any a, b ∈ R
powers: x^α on R₊₊, for 0 ≤ α ≤ 1
logarithm: log x on R₊₊
Examples on R^n and R^{m×n}

affine functions are convex and concave; all norms are convex

examples on R^n

affine function f(x) = aᵀx + b
norms: ‖x‖_p = (Σ_{i=1}^n |xi|^p)^{1/p} for p ≥ 1; ‖x‖∞ = max_k |xk|
Restriction of a convex function to a line

f : R^n → R is convex if and only if the function g : R → R,

g(t) = f(x + tv),   dom g = {t | x + tv ∈ dom f}

is convex (in t) for any x ∈ dom f, v ∈ R^n

can check convexity of f by checking convexity of functions of one variable
Extended-value extension
extended-value extension f̃ of f is

f̃(x) = f(x) if x ∈ dom f,   f̃(x) = ∞ otherwise

often simplifies notation; for example, the condition

f̃(θx + (1 − θ)y) ≤ θ f̃(x) + (1 − θ) f̃(y) for 0 ≤ θ ≤ 1

(as an inequality in R ∪ {∞}) means the same as the two conditions:
dom f is convex
for x, y ∈ dom f, f(θx + (1 − θ)y) ≤ θ f(x) + (1 − θ) f(y) for 0 ≤ θ ≤ 1
First-order condition
f is differentiable if dom f is open and the gradient ∇f(x) exists at
each x ∈ dom f

1st-order condition: differentiable f with convex domain is convex
if and only if

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x)   for all x, y ∈ dom f

[figure: the first-order approximation at (x, f(x)) is a global
underestimator of f]
Second-order conditions
f is twice differentiable if dom f is open and the Hessian ∇²f(x) ∈ S^n,

∇²f(x)ij = ∂²f(x)/∂xi∂xj,   i, j = 1, ..., n,

exists at each x ∈ dom f

2nd-order conditions: for twice differentiable f with convex domain
f is convex if and only if ∇²f(x) ⪰ 0 for all x ∈ dom f
if ∇²f(x) ≻ 0 for all x ∈ dom f, then f is strictly convex
Examples

quadratic function: f(x) = (1/2)xᵀPx + qᵀx + r (with P ∈ S^n)

∇f(x) = Px + q,   ∇²f(x) = P

convex if P ⪰ 0

least-squares objective: f(x) = ‖Ax − b‖₂²

∇f(x) = 2Aᵀ(Ax − b),   ∇²f(x) = 2AᵀA

convex (for any A)

quadratic-over-linear: f(x, y) = x²/y

∇²f(x, y) = (2/y³) [y  −x] [y  −x]ᵀ ⪰ 0

convex for y > 0
[figure: graph of f(x, y) = x²/y]
log-sum-exp: f(x) = log Σ_{k=1}^n exp xk is convex

∇²f(x) = (1/(1ᵀz)) diag(z) − (1/(1ᵀz)²) zzᵀ   (zk = exp xk)

to show ∇²f(x) ⪰ 0, verify that vᵀ∇²f(x)v ≥ 0 for all v (follows from
the Cauchy-Schwarz inequality)

geometric mean: f(x) = (Π_{k=1}^n xk)^{1/n} on R^n₊₊ is concave
(similar proof)
α-sublevel set of f : R^n → R:

Cα = {x ∈ dom f | f(x) ≤ α}

sublevel sets of convex functions are convex (converse is false)

epigraph of f : R^n → R:

epi f = {(x, t) ∈ R^{n+1} | x ∈ dom f, f(x) ≤ t}

f is convex if and only if epi f is a convex set

Jensen's inequality: if f is convex, then for any random variable z

f(E z) ≤ E f(z)

the basic inequality is the special case with discrete distribution

prob(z = x) = θ,   prob(z = y) = 1 − θ
operations that preserve convexity: nonnegative multiple, sum,
composition with affine function

examples

log barrier for linear inequalities:

f(x) = −Σ_{i=1}^m log(bi − aiᵀx),   dom f = {x | aiᵀx < bi, i = 1, ..., m}

(any) norm of affine function: f(x) = ‖Ax + b‖
pointwise maximum: if f1, ..., fm are convex, then
f(x) = max{f1(x), ..., fm(x)} is convex

pointwise supremum: if f(x, y) is convex in x for each y ∈ A, then
g(x) = sup_{y∈A} f(x, y) is convex

examples

support function of a set C: S_C(x) = sup_{y∈C} yᵀx is convex
distance to farthest point in a set C:

f(x) = sup_{y∈C} ‖x − y‖

maximum eigenvalue of a symmetric matrix:

λmax(X) = sup_{‖y‖₂=1} yᵀXy
composition of g : R^n → R and h : R → R:

f(x) = h(g(x))

f is convex if:  g convex, h convex, h̃ nondecreasing
                 g concave, h convex, h̃ nonincreasing

note: monotonicity must hold for the extended-value extension h̃

examples

exp g(x) is convex if g is convex
1/g(x) is convex if g is concave and positive
composition of g : R^n → R^k and h : R^k → R:

f(x) = h(g(x)) = h(g1(x), ..., gk(x))

f is convex if:  gi convex, h convex, h̃ nondecreasing in each argument
                 gi concave, h convex, h̃ nonincreasing in each argument

examples

Σ_{i=1}^m log gi(x) is concave if the gi are concave and positive
log Σ_{i=1}^m exp gi(x) is convex if the gi are convex

minimization: if f(x, y) is convex in (x, y) and C is a convex set, then

g(x) = inf_{y∈C} f(x, y)

is convex
examples

f(x, y) = xᵀAx + 2xᵀBy + yᵀCy with

[A  B; Bᵀ  C] ⪰ 0,   C ≻ 0

minimizing over y gives g(x) = inf_y f(x, y) = xᵀ(A − BC⁻¹Bᵀ)x;
g is convex, hence the Schur complement A − BC⁻¹Bᵀ ⪰ 0

the perspective of f : R^n → R is the function g(x, t) = t f(x/t),
dom g = {(x, t) | x/t ∈ dom f, t > 0}; g is convex if f is convex

examples

f(x) = xᵀx is convex; hence g(x, t) = xᵀx/t is convex for t > 0
negative logarithm f(x) = −log x is convex; hence relative entropy
g(x, t) = t log t − t log x is convex on R²₊₊
if f is convex, then

g(x) = (cᵀx + d) f((Ax + b)/(cᵀx + d))

is convex on {x | cᵀx + d > 0, (Ax + b)/(cᵀx + d) ∈ dom f}
the conjugate of a function f:

f*(y) = sup_{x∈dom f} (yᵀx − f(x))

f* is convex (even if f is not)
[figure: f*(y) is the maximum gap between the linear function xy and
f(x); the tangent with slope y intersects the vertical axis at (0, −f*(y))]
Quasiconvex functions

f : R^n → R is quasiconvex if dom f is convex and the sublevel sets

Sα = {x ∈ dom f | f(x) ≤ α}

are convex for all α

[figure: a quasiconvex function on R whose α-sublevel set is the
interval [a, b], and β-sublevel set (−∞, c]]

f is quasiconcave if −f is quasiconvex
f is quasilinear if it is quasiconvex and quasiconcave

example: f(x) = √|x| is quasiconvex on R
[figure: graph of a quasiconvex f(x)]
Log-concave and log-convex functions

a positive function f is log-concave if log f is concave:

f(θx + (1 − θ)y) ≥ f(x)^θ f(y)^{1−θ}   for x, y ∈ dom f, 0 ≤ θ ≤ 1

example: the normal density is log-concave:

f(x) = (1/√((2π)^n det Σ)) e^{−(1/2)(x−x̄)ᵀΣ⁻¹(x−x̄)}

integration property: if C ⊆ R^n is convex and y is a random variable
with log-concave pdf, then

f(x) = prob(x + y ∈ C)

is log-concave

proof: write f(x) as an integral of a product of log-concave functions,

f(x) = ∫ g(x + y) p(y) dy,   g(u) = 1 if u ∈ C, 0 if u ∉ C,

where p is the pdf of y

example: yield function Y(x) = prob(x + w ∈ S), with x the nominal
parameter values and w a random variation; Y is log-concave if S is
convex and w has a log-concave pdf

Convexity with respect to generalized inequalities

example: f(X) = X² is S^m₊-convex on S^m: for fixed z ∈ R^m,
zᵀX²z = ‖Xz‖₂² is convex in X, i.e., for X, Y ∈ S^m, 0 ≤ θ ≤ 1,

zᵀ(θX + (1 − θ)Y)²z ≤ θ zᵀX²z + (1 − θ) zᵀY²z

therefore (θX + (1 − θ)Y)² ⪯ θX² + (1 − θ)Y²
4. Convex optimization problems
Optimization problem in standard form
minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p
optimal value:

p⋆ = inf{f0(x) | fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p}

p⋆ = ∞ if the problem is infeasible; p⋆ = −∞ if unbounded below

examples (with n = 1, m = p = 0)

f0(x) = 1/x, dom f0 = R₊₊: p⋆ = 0, no optimal point
f0(x) = −log x, dom f0 = R₊₊: p⋆ = −∞
f0(x) = x log x, dom f0 = R₊₊: p⋆ = −1/e, x = 1/e is optimal
f0(x) = x³ − 3x: p⋆ = −∞, local optimum at x = 1
Implicit constraints

the standard form problem has the implicit constraint

x ∈ D = ∩_{i=0}^m dom fi  ∩  ∩_{i=1}^p dom hi

we call D the domain of the problem; a problem is unconstrained if it
has no explicit constraints (m = p = 0)
example:

minimize  f0(x) = −Σ_{i=1}^k log(bi − aiᵀx)

is an unconstrained problem with implicit constraints aiᵀx < bi
Feasibility problem

find        x
subject to  fi(x) ≤ 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p

can be considered a special case of the general problem with f0(x) = 0:

minimize    0
subject to  fi(x) ≤ 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p

p⋆ = 0 if the constraints are feasible; p⋆ = ∞ otherwise
Convex optimization problem: standard form

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            aiᵀx = bi,  i = 1, ..., p

f0, f1, ..., fm are convex; equality constraints are affine

often written as

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            Ax = b

important property: the feasible set of a convex optimization problem is convex
Optimality criterion for differentiable f0

x is optimal if and only if it is feasible and

∇f0(x)ᵀ(y − x) ≥ 0 for all feasible y

[figure: −∇f0(x) defines a supporting hyperplane to the feasible set X at x]
Equivalent convex problems

eliminating equality constraints:

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            Ax = b

is equivalent to

minimize (over z)  f0(Fz + x0)
subject to         fi(Fz + x0) ≤ 0,  i = 1, ..., m

where F and x0 are such that Ax = b  ⟺  x = Fz + x0 for some z

introducing slack variables for linear inequalities:

minimize    f0(x)
subject to  aiᵀx ≤ bi,  i = 1, ..., m

is equivalent to

minimize (over x, s)  f0(x)
subject to            aiᵀx + si = bi,  si ≥ 0,  i = 1, ..., m

epigraph form: the standard form convex problem is equivalent to

minimize (over x, t)  t
subject to            f0(x) − t ≤ 0
                      fi(x) ≤ 0,  i = 1, ..., m
                      Ax = b
minimizing over some variables:

minimize    f0(x1, x2)
subject to  fi(x1) ≤ 0,  i = 1, ..., m

is equivalent to

minimize    f̃0(x1)
subject to  fi(x1) ≤ 0,  i = 1, ..., m

where f̃0(x1) = inf_{x2} f0(x1, x2)
Quasiconvex optimization

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            Ax = b

with f0 : R^n → R quasiconvex, f1, ..., fm convex

can have locally optimal points that are not (globally) optimal
[figure: quasiconvex f0 with a non-global local minimum at (x, f0(x))]

convex representation of sublevel sets of f0: if f0 is quasiconvex,
there exists a family of convex functions φt such that

f0(x) ≤ t  ⟺  φt(x) ≤ 0
example

f0(x) = p(x)/q(x)

with p convex, q concave, and p(x) ≥ 0, q(x) > 0 on dom f0; can take
φt(x) = p(x) − t q(x): for t ≥ 0, φt is convex in x, and

p(x)/q(x) ≤ t  ⟺  φt(x) ≤ 0

quasiconvex optimization via convex feasibility problems: bisect on t,
solving the convex feasibility problem φt(x) ≤ 0, fi(x) ≤ 0, Ax = b at
each step; the sketch below illustrates this.
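A minimal sketch of the bisection scheme in CVXPY (the Python counterpart of the cvx tool mentioned in the conclusions); the linear-fractional objective data c, d, e, f and the box constraint are made-up placeholders, as is the initial bracket [0, 10].

```python
# Bisection for quasiconvex minimization of (c^T x + d)/(e^T x + f)
# over a box, via the convex feasibility problems phi_t(x) <= 0.
import numpy as np
import cvxpy as cp

np.random.seed(1)
n = 4
c, e = np.random.randn(n), np.random.rand(n)
d, f = 1.0, 2.0

x = cp.Variable(n)
lo, hi = 0.0, 10.0            # bracket assumed to contain p*
for _ in range(40):           # each halving adds one bit of accuracy
    t = (lo + hi) / 2
    # phi_t(x) = (c^T x + d) - t (e^T x + f) <= 0 is convex in x
    prob = cp.Problem(cp.Minimize(0),
                      [c @ x + d - t * (e @ x + f) <= 0,
                       e @ x + f >= 1e-6,          # stay in dom f0
                       cp.abs(x) <= 1])
    prob.solve()
    if prob.status == cp.OPTIMAL:
        hi = t                # feasible: p* <= t
    else:
        lo = t                # infeasible: p* > t
print("p* is in [%.6f, %.6f]" % (lo, hi))
```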
Linear program (LP)

minimize    cᵀx + d
subject to  Gx ⪯ h
            Ax = b

convex problem with affine objective and constraint functions; the
feasible set is a polyhedron
[figure: polyhedron P with optimal point x⋆, where −c supports P]

a common variant (e.g., in resource allocation):

minimize    cᵀx
subject to  Ax ⪯ b,  x ⪰ 0
piecewise-linear minimization:

minimize  max_{i=1,...,m} (aiᵀx + bi)

equivalent to an LP:

minimize    t
subject to  aiᵀx + bi ≤ t,  i = 1, ..., m
Chebyshev center of a polyhedron

P = {x | aiᵀx ≤ bi, i = 1, ..., m}

the Chebyshev center xcheb is the center of the largest inscribed ball

B = {xc + u | ‖u‖₂ ≤ r}

aiᵀx ≤ bi for all x ∈ B if and only if aiᵀxc + r‖ai‖₂ ≤ bi, so xcheb and
r solve the LP

maximize    r
subject to  aiᵀxc + r‖ai‖₂ ≤ bi,  i = 1, ..., m
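A short CVXPY sketch of this LP; the polyhedron (a triangle) is a made-up example, not data from the slides.

```python
# Chebyshev center of {x | a_i^T x <= b_i} as an LP in (xc, r).
import numpy as np
import cvxpy as cp

A = np.array([[1.0, 1.0], [-1.0, 2.0], [0.0, -1.0]])   # rows a_i^T
b = np.array([2.0, 4.0, 0.0])

xc = cp.Variable(2)   # center of the inscribed ball
r = cp.Variable()     # its radius

constraints = [A[i] @ xc + r * np.linalg.norm(A[i]) <= b[i]
               for i in range(len(b))]
cp.Problem(cp.Maximize(r), constraints).solve()
print("Chebyshev center:", xc.value, "radius:", r.value)
```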
Linear-fractional program

minimize    f0(x)
subject to  Gx ⪯ h
            Ax = b

with

f0(x) = (cᵀx + d)/(eᵀx + f),   dom f0 = {x | eᵀx + f > 0}

a quasiconvex optimization problem; can be solved by bisection
also equivalent to the LP (variables y, z):

minimize    cᵀy + dz
subject to  Gy ⪯ hz
            Ay = bz
            eᵀy + fz = 1
            z ≥ 0
generalized linear-fractional program:

f0(x) = max_{i=1,...,r} (ciᵀx + di)/(eiᵀx + fi),
dom f0 = {x | eiᵀx + fi > 0, i = 1, ..., r}

a quasiconvex optimization problem; can be solved by bisection
Quadratic program (QP)

minimize    (1/2)xᵀPx + qᵀx + r
subject to  Gx ⪯ h
            Ax = b

P ∈ S^n₊, so the objective is convex quadratic; minimize a convex
quadratic function over a polyhedron
[figure: level curves of f0 over a polyhedron, with −∇f0(x⋆) at the optimum]

examples

least-squares:
minimize  ‖Ax − b‖₂²

LP with random cost (c random with mean c̄, covariance Σ; γ > 0 is a
risk-aversion parameter):

minimize    c̄ᵀx + γ xᵀΣx = E cᵀx + γ var(cᵀx)
subject to  Gx ⪯ h,  Ax = b
Second-order cone programming (SOCP)

minimize    fᵀx
subject to  ‖Aix + bi‖₂ ≤ ciᵀx + di,  i = 1, ..., m
            Fx = g

the inequalities are called second-order cone (SOC) constraints
Robust linear programming

the parameters ai in the LP

minimize    cᵀx
subject to  aiᵀx ≤ bi,  i = 1, ..., m

may be uncertain; two common approaches:

deterministic model: constraints must hold for all ai ∈ Ei:

minimize    cᵀx
subject to  aiᵀx ≤ bi for all ai ∈ Ei,  i = 1, ..., m

stochastic model: ai is random; constraints must hold with probability η:

minimize    cᵀx
subject to  prob(aiᵀx ≤ bi) ≥ η,  i = 1, ..., m

uncertainty ellipsoids:

Ei = {āi + Piu | ‖u‖₂ ≤ 1}   (āi ∈ R^n, Pi ∈ R^{n×n})

center is āi; semi-axes determined by singular values/vectors of Pi
deterministic approach via SOCP: the robust LP

minimize    cᵀx
subject to  aiᵀx ≤ bi for all ai ∈ Ei,  i = 1, ..., m

is equivalent to the SOCP

minimize    cᵀx
subject to  āiᵀx + ‖Piᵀx‖₂ ≤ bi,  i = 1, ..., m

(follows from sup_{‖u‖₂≤1} (āi + Piu)ᵀx = āiᵀx + ‖Piᵀx‖₂)
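A minimal CVXPY sketch of this SOCP reformulation; the data (c, āi, Pi, b) are random placeholders chosen so that x = 1 is (very likely) strictly feasible.

```python
# Robust LP with ellipsoidal uncertainty, written as an SOCP.
import numpy as np
import cvxpy as cp

np.random.seed(2)
n, m = 3, 5
c = np.random.randn(n)
abar = np.random.randn(m, n)                     # nominal rows abar_i^T
P = [0.1 * np.random.randn(n, n) for _ in range(m)]
b = abar @ np.ones(n) + 1.0                      # makes x = 1 feasible

x = cp.Variable(n)
cons = [abar[i] @ x + cp.norm(P[i].T @ x, 2) <= b[i] for i in range(m)]
cp.Problem(cp.Minimize(c @ x), cons).solve()
print("robust solution:", x.value)
```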
stochastic approach via SOCP: assume ai is Gaussian with mean āi and
covariance Σi; then aiᵀx ∼ N(āiᵀx, xᵀΣix), and the robust LP

minimize    cᵀx
subject to  prob(aiᵀx ≤ bi) ≥ η,  i = 1, ..., m

is, for η ≥ 1/2, equivalent to the SOCP

minimize    cᵀx
subject to  āiᵀx + Φ⁻¹(η) ‖Σi^{1/2}x‖₂ ≤ bi,  i = 1, ..., m

(Φ is the standard normal CDF)
Geometric programming (GP)

minimize    f0(x)
subject to  fi(x) ≤ 1,  i = 1, ..., m
            hi(x) = 1,  i = 1, ..., p

with fi posynomial, hi monomial

geometric program in convex form: change variables to yi = log xi and
take the logarithm of cost and constraints; a posynomial

f(x) = Σ_{k=1}^K ck x1^{a1k} x2^{a2k} ··· xn^{ank}

transforms to

log f(e^{y1}, ..., e^{yn}) = log( Σ_{k=1}^K e^{akᵀy + bk} )   (bk = log ck)

which is convex in y
design example: cantilever beam with segments of width wi and height hi

aspect ratio hi/wi and inverse aspect ratio wi/hi are monomials

the vertical deflection yi and slope vi of the central axis at the right
end of segment i are defined recursively as

vi = 12(i − 1/2) F/(E wi hi³) + v_{i+1}
yi = 6(i − 1/3) F/(E wi hi³) + vi+1 + y_{i+1}

(F is the load, E is Young's modulus), and are posynomial in w, h

note: bound constraints are put in GP form, e.g., we write
wmin ≤ wi ≤ wmax as wmin/wi ≤ 1, wi/wmax ≤ 1, and
Smin ≤ hi/wi ≤ Smax as Smin wi/hi ≤ 1, hi/(wi Smax) ≤ 1
minimization of spectral radius via GP:

minimize    λ
subject to  Σ_{j=1}^n A(x)ij vj/(λvi) ≤ 1,  i = 1, ..., n

variables λ, v, x
Generalized inequality constraints

convex problem with generalized inequality constraints:

minimize    f0(x)
subject to  fi(x) ⪯_{Ki} 0,  i = 1, ..., m
            Ax = b

conic form problem: special case with affine objective and constraints

minimize    cᵀx
subject to  Fx + g ⪯_K 0
            Ax = b

Semidefinite program (SDP)

minimize    cᵀx
subject to  x1F1 + x2F2 + ··· + xnFn + G ⪯ 0
            Ax = b

with Fi, G ∈ S^k; the inequality constraint is called a linear matrix
inequality (LMI)
note: multiple LMI constraints

x1F̂1 + ··· + xnF̂n + Ĝ ⪯ 0,   x1F̃1 + ··· + xnF̃n + G̃ ⪯ 0

can be combined into a single LMI with block-diagonal matrices
SOCP and equivalent SDP:

SOCP:  minimize    fᵀx
       subject to  ‖Aix + bi‖₂ ≤ ciᵀx + di,  i = 1, ..., m

SDP:   minimize    fᵀx
       subject to  [ (ciᵀx + di)I    Aix + bi
                     (Aix + bi)ᵀ    ciᵀx + di ] ⪰ 0,  i = 1, ..., m
Eigenvalue minimization

minimize  λmax(A(x))

where A(x) = A0 + x1A1 + ··· + xnAn (with given Ai ∈ S^k)

equivalent SDP:

minimize    t
subject to  A(x) ⪯ tI

variables x ∈ R^n, t ∈ R; follows from

λmax(A) ≤ t  ⟺  A ⪯ tI
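A short CVXPY sketch; the lambda_max atom encodes exactly the epigraph constraint A(x) ⪯ tI above. The symmetric matrices A0, A1, A2 are random placeholders.

```python
# Eigenvalue minimization as an SDP via CVXPY's lambda_max atom.
import numpy as np
import cvxpy as cp

np.random.seed(3)
k, n = 4, 2
def sym(M): return (M + M.T) / 2
A0, A1, A2 = (sym(np.random.randn(k, k)) for _ in range(3))

x = cp.Variable(n)
A = A0 + x[0] * A1 + x[1] * A2
prob = cp.Problem(cp.Minimize(cp.lambda_max(A)))
prob.solve()
print("optimal x:", x.value, " minimized lambda_max:", prob.value)
```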
Matrix norm minimization

minimize  ‖A(x)‖₂ = ( λmax(A(x)ᵀA(x)) )^{1/2}

where A(x) = A0 + x1A1 + ··· + xnAn (with given Ai ∈ R^{p×q})

equivalent SDP:

minimize    t
subject to  [ tI      A(x)
              A(x)ᵀ   tI  ] ⪰ 0

variables x ∈ R^n, t ∈ R; the constraint follows from

‖A‖₂ ≤ t  ⟺  AᵀA ⪯ t²I, t ≥ 0  ⟺  [tI  A; Aᵀ  tI] ⪰ 0
Vector optimization: optimal and Pareto optimal points

set of achievable objective values: O = {f0(x) | x feasible}

feasible x is optimal if f0(x) is the minimum value of O
feasible x is Pareto optimal if f0(xpo) is a minimal value of O
[figures: a set O with minimum value f0(x⋆); a set O with Pareto optimal
value f0(xpo)]

example: regularized least-squares, a multicriterion problem with
F1(x) = ‖Ax − b‖₂², F2(x) = ‖x‖₂²
[figure: achievable set O and its Pareto optimal boundary in the
(F1, F2)-plane]
example: risk-return trade-off in portfolio optimization
[figures: optimal mean return versus standard deviation of return, and
optimal allocation x versus standard deviation of return]
Scalarization

to find Pareto optimal points: choose λ ≻_{K*} 0 and solve the scalar problem

minimize    λᵀf0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p

if x is optimal for the scalar problem, then it is Pareto-optimal for
the vector optimization problem
[figure: Pareto optimal values f0(x1), f0(x2), f0(x3) on the boundary of
O, with supporting weight vectors λ1, λ2]
for convex vector optimization problems, can find (almost) all Pareto
optimal points by varying λ ≻_{K*} 0

examples

regularized least-squares:

minimize  ‖Ax − b‖₂² + γ‖x‖₂²

for fixed γ > 0, an ordinary least-squares problem; varying γ traces the
trade-off curve (see the sketch below)
[figure: trade-off curve of ‖x‖₂² versus ‖Ax − b‖₂²]

risk-return trade-off:

minimize    −p̄ᵀx + γ xᵀΣx
subject to  1ᵀx = 1,  x ⪰ 0

(p̄ is the mean, Σ the covariance of the asset return vector)
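A minimal CVXPY sketch of tracing the regularized least-squares trade-off curve by sweeping the weight γ; A and b are random placeholders.

```python
# Sweep gamma to trace the Pareto curve of (||Ax-b||^2, ||x||^2).
import numpy as np
import cvxpy as cp

np.random.seed(4)
A, b = np.random.randn(30, 10), np.random.randn(30)
x = cp.Variable(10)
gamma = cp.Parameter(nonneg=True)     # a Parameter lets us re-solve cheaply
prob = cp.Problem(cp.Minimize(cp.sum_squares(A @ x - b)
                              + gamma * cp.sum_squares(x)))

for g in np.logspace(-2, 2, 5):
    gamma.value = g
    prob.solve()
    print("gamma=%7.2f  ||Ax-b||^2=%7.3f  ||x||^2=%7.3f"
          % (g, cp.sum_squares(A @ x - b).value, cp.sum_squares(x).value))
```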
5. Duality
Lagrangian
standard form problem (not necessarily convex):

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p

Lagrangian L : R^n × R^m × R^p → R, with dom L = D × R^m × R^p:

L(x, λ, ν) = f0(x) + Σ_{i=1}^m λi fi(x) + Σ_{i=1}^p νi hi(x)

λi is the Lagrange multiplier associated with fi(x) ≤ 0
νi is the Lagrange multiplier associated with hi(x) = 0
Lagrange dual function

Lagrange dual function g : R^m × R^p → R:

g(λ, ν) = inf_{x∈D} L(x, λ, ν)

g is concave; can be −∞ for some λ, ν

lower bound property: if λ ⪰ 0, then g(λ, ν) ≤ p⋆

proof: if x̃ is feasible and λ ⪰ 0, then

f0(x̃) ≥ L(x̃, λ, ν) ≥ inf_{x∈D} L(x, λ, ν) = g(λ, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ, ν)
Least-norm solution of linear equations

minimize    xᵀx
subject to  Ax = b

dual function

Lagrangian is L(x, ν) = xᵀx + νᵀ(Ax − b)

to minimize L over x, set the gradient equal to zero:

∇x L(x, ν) = 2x + Aᵀν = 0   ⇒   x = −(1/2)Aᵀν

plug x back into L to obtain g:

g(ν) = L(−(1/2)Aᵀν, ν) = −(1/4) νᵀAAᵀν − bᵀν

a concave function of ν
Standard form LP

minimize    cᵀx
subject to  Ax = b,  x ⪰ 0

dual function

Lagrangian is

L(x, λ, ν) = cᵀx + νᵀ(Ax − b) − λᵀx
           = −bᵀν + (c + Aᵀν − λ)ᵀx

L is affine in x, hence

g(λ, ν) = inf_x L(x, λ, ν) = −bᵀν if Aᵀν − λ + c = 0, −∞ otherwise

g is linear on an affine domain, hence concave
Equality constrained norm minimization

minimize    ‖x‖
subject to  Ax = b

dual function

g(ν) = inf_x ( ‖x‖ − νᵀAx + bᵀν ) = bᵀν if ‖Aᵀν‖* ≤ 1, −∞ otherwise

where ‖v‖* = sup_{‖u‖≤1} uᵀv is the dual norm of ‖·‖
Two-way partitioning

minimize    xᵀWx
subject to  xi² = 1,  i = 1, ..., n

a nonconvex problem; the feasible set contains 2^n discrete points

dual function

g(λ) = inf_x ( xᵀWx + Σ_i λi(xi² − 1) ) = inf_x xᵀ(W + diag(λ))x − 1ᵀλ
     = −1ᵀλ if W + diag(λ) ⪰ 0, −∞ otherwise

lower bound property: p⋆ ≥ −1ᵀλ if W + diag(λ) ⪰ 0
Lagrange dual and conjugate function

minimize    f0(x)
subject to  Ax ⪯ b,  Cx = d

dual function

g(λ, ν) = inf_{x∈dom f0} ( f0(x) + (Aᵀλ + Cᵀν)ᵀx − bᵀλ − dᵀν )
        = −f0*(−Aᵀλ − Cᵀν) − bᵀλ − dᵀν

(f0* is the conjugate of f0); simplifies the derivation of the dual when
the conjugate of f0 is known
The dual problem

maximize    g(λ, ν)
subject to  λ ⪰ 0

finds the best lower bound on p⋆ obtainable from the Lagrange dual
function; a convex optimization problem; optimal value denoted d⋆

example: standard form LP and its dual:

minimize    cᵀx          maximize    −bᵀν
subject to  Ax = b       subject to  Aᵀν + c ⪰ 0
            x ⪰ 0
Weak and strong duality

weak duality: d⋆ ≤ p⋆

always holds (for convex and nonconvex problems)
can be used to find nontrivial lower bounds for difficult problems
for example, solving the SDP

maximize    −1ᵀλ
subject to  W + diag(λ) ⪰ 0

gives a lower bound for the two-way partitioning problem

strong duality: d⋆ = p⋆

does not hold in general
(usually) holds for convex problems
conditions that guarantee strong duality in convex problems are called
constraint qualifications
Slater's constraint qualification

strong duality holds for a convex problem

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m
            Ax = b

if it is strictly feasible, i.e.,

∃ x ∈ int D:   fi(x) < 0, i = 1, ..., m,   Ax = b

(also guarantees that the dual optimum is attained, if p⋆ > −∞)
Inequality form LP

primal problem

minimize    cᵀx
subject to  Ax ⪯ b

dual function

g(λ) = inf_x ( (c + Aᵀλ)ᵀx − bᵀλ ) = −bᵀλ if Aᵀλ + c = 0, −∞ otherwise

dual problem

maximize    −bᵀλ
subject to  Aᵀλ + c = 0,  λ ⪰ 0
Quadratic program

primal problem (assume P ∈ S^n₊₊)

minimize    xᵀPx
subject to  Ax ⪯ b

dual function

g(λ) = inf_x ( xᵀPx + λᵀ(Ax − b) ) = −(1/4) λᵀAP⁻¹Aᵀλ − bᵀλ

dual problem

maximize    −(1/4) λᵀAP⁻¹Aᵀλ − bᵀλ
subject to  λ ⪰ 0
A nonconvex problem with strong duality

minimize    xᵀAx + 2bᵀx
subject to  xᵀx ≤ 1

A ⋡ 0, hence nonconvex

strong duality holds although the primal problem is not convex (not easy
to show)
Geometric interpretation

for simplicity, consider a problem with one constraint f1(x) ≤ 0

interpretation of the dual function: with G = {(f1(x), f0(x)) | x ∈ D},

g(λ) = inf_{(u,t)∈G} (t + λu)

λu + t = g(λ) is a (non-vertical) supporting hyperplane to G, and it
intersects the t-axis at t = g(λ) ≤ p⋆
[figures: the set G with p⋆, d⋆ and the supporting line λu + t = g(λ)]

epigraph variation: same interpretation if G is replaced with

A = {(u, t) | f1(x) ≤ u, f0(x) ≤ t for some x ∈ D}

[figure: the set A with the supporting line λu + t = g(λ) through (0, p⋆)]
strong duality

holds if there is a non-vertical supporting hyperplane to A at (0, p⋆)

for a convex problem, A is convex, hence has a supporting hyperplane
at (0, p⋆)

Slater's condition: if there exist (ũ, t̃) ∈ A with ũ < 0, then the
supporting hyperplanes at (0, p⋆) must be non-vertical
Complementary slackness

assume strong duality holds, x⋆ is primal optimal, and (λ⋆, ν⋆) is dual
optimal; then

x⋆ minimizes L(x, λ⋆, ν⋆)
λi⋆ fi(x⋆) = 0 for i = 1, ..., m (known as complementary slackness):
λi⋆ > 0 ⇒ fi(x⋆) = 0,    fi(x⋆) < 0 ⇒ λi⋆ = 0
Karush-Kuhn-Tucker (KKT) conditions

the following four conditions are called KKT conditions (for a problem
with differentiable fi, hi):

1. primal feasibility: fi(x) ≤ 0, i = 1, ..., m, hi(x) = 0, i = 1, ..., p
2. dual feasibility: λ ⪰ 0
3. complementary slackness: λi fi(x) = 0, i = 1, ..., m
4. gradient of the Lagrangian with respect to x vanishes:

∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Σ_{i=1}^p νi ∇hi(x) = 0

if strong duality holds and x, λ, ν are optimal, then they must satisfy
the KKT conditions
KKT conditions for convex problem

if x̃, λ̃, ν̃ satisfy the KKT conditions for a convex problem, then they
are optimal: from complementary slackness, f0(x̃) = L(x̃, λ̃, ν̃); from
the 4th condition (and convexity), g(λ̃, ν̃) = L(x̃, λ̃, ν̃)

hence, f0(x̃) = g(λ̃, ν̃)

recall that Slater's condition implies strong duality, and that the dual
optimum is attained

generalizes the optimality condition ∇f0(x) = 0 for unconstrained problems
example: water-filling (assume αi > 0)

minimize    −Σ_{i=1}^n log(xi + αi)
subject to  x ⪰ 0,  1ᵀx = 1

x is optimal iff x ⪰ 0, 1ᵀx = 1, and there exist λ ∈ R^n, ν ∈ R such that

λ ⪰ 0,   λi xi = 0,   1/(xi + αi) + λi = ν

if ν < 1/αi: λi = 0 and xi = 1/ν − αi
if ν ≥ 1/αi: λi = ν − 1/αi and xi = 0
determine ν from 1ᵀx = Σ_i max{0, 1/ν − αi} = 1

interpretation

n patches; the level of patch i is at height αi
flood the area with a unit amount of water; the resulting level is 1/ν⋆
[figure: water levels xi above patch heights αi, with water level 1/ν⋆]
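A minimal NumPy sketch of solving the water-filling KKT conditions by bisection on ν; the patch heights αi are made-up values.

```python
# Water-filling: find nu with sum_i max(0, 1/nu - alpha_i) = 1.
import numpy as np

alpha = np.array([0.8, 1.0, 1.5, 2.5])   # placeholder patch heights

def water(nu):
    return np.maximum(0.0, 1.0 / nu - alpha).sum()

# water(nu) is decreasing in nu; bracket so water(lo) > 1 > water(hi)
lo, hi = 1e-6, 1.0 / alpha.min()
for _ in range(60):
    nu = (lo + hi) / 2
    if water(nu) > 1.0:
        lo = nu
    else:
        hi = nu

x = np.maximum(0.0, 1.0 / nu - alpha)
print("x =", x, " sum =", x.sum(), " water level 1/nu =", 1.0 / nu)
```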
Perturbation and sensitivity analysis

(unperturbed) optimization problem and its dual, and the perturbed
problem:

minimize    f0(x)
subject to  fi(x) ≤ ui,  i = 1, ..., m
            hi(x) = vi,  i = 1, ..., p

x is the primal variable; u, v are parameters; p⋆(u, v) is the optimal
value as a function of u, v

global sensitivity result

assume strong duality holds for the unperturbed problem, and that λ⋆, ν⋆
are dual optimal for the unperturbed problem

apply weak duality to the perturbed problem:

p⋆(u, v) ≥ g(λ⋆, ν⋆) − uᵀλ⋆ − vᵀν⋆
         = p⋆(0, 0) − uᵀλ⋆ − vᵀν⋆

sensitivity interpretation: if λi⋆ is large, tightening constraint i
(ui < 0) raises p⋆ greatly; if λi⋆ is small, loosening constraint i
(ui > 0) lowers p⋆ only slightly
local sensitivity: if (in addition) p⋆(u, v) is differentiable at (0, 0), then

λi⋆ = −∂p⋆(0, 0)/∂ui,    νi⋆ = −∂p⋆(0, 0)/∂vi

[figure: p⋆(u) and the linear underestimate p⋆(0) − λ⋆u]
Duality and problem reformulations

equivalent formulations of a problem can lead to very different duals

common reformulations
introduce new variables and equality constraints
make explicit constraints implicit or vice-versa
transform objective or constraint functions

Introducing new variables and equality constraints

minimize  f0(Ax + b)

the dual function of this problem is a constant (equal to p⋆), so strong
duality holds but the dual is useless; reformulating with y = Ax + b
gives a more useful dual
norm approximation problem: minimize ‖Ax − b‖; reformulated as

minimize    ‖y‖
subject to  y = Ax − b

its dual is

maximize    bᵀν
subject to  Aᵀν = 0,  ‖ν‖* ≤ 1
Implicit constraints

LP with box constraints: primal and dual problem

minimize    cᵀx         maximize    −bᵀν − 1ᵀλ1 − 1ᵀλ2
subject to  Ax = b      subject to  c + Aᵀν + λ1 − λ2 = 0
            −1 ⪯ x ⪯ 1              λ1 ⪰ 0,  λ2 ⪰ 0

reformulation with the box constraints made implicit
(dom f0 = {x | −1 ⪯ x ⪯ 1}):

dual function

g(ν) = inf_{−1⪯x⪯1} ( cᵀx + νᵀ(Ax − b) )
     = −bᵀν − ‖Aᵀν + c‖₁

dual problem: maximize −bᵀν − ‖Aᵀν + c‖₁
Problems with generalized inequalities

minimize    f0(x)
subject to  fi(x) ⪯_{Ki} 0,  i = 1, ..., m
            hi(x) = 0,  i = 1, ..., p

definitions are parallel to the scalar case: the Lagrange multiplier for
fi(x) ⪯_{Ki} 0 is a vector λi ∈ R^{ki}, and

L(x, λ1, ..., λm, ν) = f0(x) + Σ_{i=1}^m λiᵀfi(x) + Σ_{i=1}^p νi hi(x)

lower bound property: if λi ⪰_{Ki*} 0, then g(λ1, ..., λm, ν) ≤ p⋆

proof: if x̃ is feasible and λi ⪰_{Ki*} 0, then

f0(x̃) ≥ f0(x̃) + Σ_{i=1}^m λiᵀfi(x̃) + Σ_{i=1}^p νi hi(x̃)
       ≥ inf_{x∈D} L(x, λ1, ..., λm, ν)
       = g(λ1, ..., λm, ν)

minimizing over all feasible x̃ gives p⋆ ≥ g(λ1, ..., λm, ν)

dual problem

maximize    g(λ1, ..., λm, ν)
subject to  λi ⪰_{Ki*} 0,  i = 1, ..., m
Semidefinite program

primal SDP (Fi, G ∈ S^k)

minimize    cᵀx
subject to  x1F1 + ··· + xnFn ⪯ G

the Lagrange multiplier is a matrix Z ∈ S^k

dual SDP

maximize    −tr(GZ)
subject to  Z ⪰ 0,  tr(FiZ) + ci = 0,  i = 1, ..., n
Convex Optimization (Boyd & Vandenberghe)

6. Approximation and fitting

norm approximation
least-norm problems
regularized approximation
robust approximation
Norm approximation

minimize  ‖Ax − b‖

(A ∈ R^{m×n} with m ≥ n, ‖·‖ is a norm on R^m)

estimation interpretation: linear measurement model y = Ax + v

examples
least-squares approximation (‖·‖₂): solution satisfies the normal
equations AᵀAx = Aᵀb
Chebyshev approximation (‖·‖∞): equivalent to the LP

minimize    t
subject to  −t1 ⪯ Ax − b ⪯ t1

sum of absolute residuals approximation (‖·‖₁): equivalent to the LP

minimize    1ᵀy
subject to  −y ⪯ Ax − b ⪯ y
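A CVXPY sketch comparing the three norms on the same random placeholder data; the l1 solution typically has many zero residuals, the l-infinity solution spreads them out.

```python
# Compare l1, l2, and l-infinity norm approximation of Ax ~ b.
import numpy as np
import cvxpy as cp

np.random.seed(5)
A, b = np.random.randn(100, 30), np.random.randn(100)

for norm, name in [(cp.norm1, "l1"), (cp.norm2, "l2"),
                   (cp.norm_inf, "linf")]:
    x = cp.Variable(30)
    cp.Problem(cp.Minimize(norm(A @ x - b))).solve()
    r = A @ x.value - b
    print("%-4s  max|r| = %.3f   #(|r| < 1e-6) = %d"
          % (name, np.abs(r).max(), int((np.abs(r) < 1e-6).sum())))
```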
Penalty function approximation

minimize    φ(r1) + ··· + φ(rm)
subject to  r = Ax − b

(A ∈ R^{m×n}, φ : R → R is a convex penalty function)

examples
quadratic: φ(u) = u²
deadzone-linear with width a: φ(u) = max{0, |u| − a}
log barrier with limit a: φ(u) = −a² log(1 − (u/a)²) for |u| < a
[figure: the quadratic, deadzone-linear, and log barrier penalties φ(u)]

example: histograms of the residual amplitudes for the penalties
φ(u) = |u| (p = 1), φ(u) = u² (p = 2), deadzone-linear, and log barrier
[figure: the ℓ1 penalty produces many zero residuals; the quadratic
penalty concentrates residuals at small amplitudes]
Huber penalty function (with parameter M)

φhub(u) = u² if |u| ≤ M,   M(2|u| − M) if |u| > M

linear growth for large u makes the approximation less sensitive to outliers
[figures: φhub(u); affine fit f(t) = α + βt to data with outliers, using
least-squares and the Huber penalty]
Least-norm problems

minimize    ‖x‖
subject to  Ax = b

examples
least-squares solution of linear equations (‖·‖₂): can be solved via
the optimality conditions

2x + Aᵀν = 0,   Ax = b

minimum sum of absolute values (‖·‖₁): equivalent to the LP

minimize    1ᵀy
subject to  −y ⪯ x ⪯ y,  Ax = b

tends to produce a sparse solution x⋆
Regularized approximation

minimize (w.r.t. R²₊)  (‖Ax − b‖, ‖x‖)

Tikhonov regularization: minimize ‖Ax − b‖₂² + δ‖x‖₂²

can be solved as a least-squares problem, with solution

x⋆ = (AᵀA + δI)⁻¹ Aᵀb

example: optimal input design, trading off output tracking error against
input magnitude and input variation
[figures: three input signals u(t) and the corresponding outputs y(t),
for different weights on input magnitude and variation]
Signal reconstruction

minimize (w.r.t. R²₊)  ( ‖x̂ − xcor‖₂, φ(x̂) )

x ∈ R^n is the unknown signal
xcor = x + v is the (known) corrupted version of x, with additive noise v
variable x̂ (reconstructed signal) is the estimate of x
φ : R^n → R is the regularization function or smoothing objective

examples: quadratic smoothing and total variation:

φquad(x̂) = Σ_{i=1}^{n−1} (x̂_{i+1} − x̂_i)²,    φtv(x̂) = Σ_{i=1}^{n−1} |x̂_{i+1} − x̂_i|

quadratic smoothing example
[figures: original signal x and noisy signal xcor; three solutions on
the trade-off curve of ‖x̂ − xcor‖₂ versus φquad(x̂)]
total variation reconstruction example
[figures: original piecewise-constant signal x and noisy signal xcor;
three solutions on the trade-off curve of ‖x̂ − xcor‖₂ versus φquad(x̂);
quadratic smoothing smooths out the sharp transitions in the signal]

[figures: the same signal reconstructed with total variation: three
solutions on the trade-off curve of ‖x̂ − xcor‖₂ versus φtv(x̂); total
variation smoothing preserves the sharp transitions]
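A minimal CVXPY sketch of the two smoothers on a synthetic piecewise-constant signal (a stand-in for the slides' example data).

```python
# Quadratic smoothing vs. total-variation reconstruction.
import numpy as np
import cvxpy as cp

np.random.seed(6)
n = 200
x_true = np.concatenate([np.zeros(n // 2), np.ones(n - n // 2)])
xcor = x_true + 0.1 * np.random.randn(n)          # noisy observation

def reconstruct(phi, weight):
    xhat = cp.Variable(n)
    cp.Problem(cp.Minimize(cp.sum_squares(xhat - xcor)
                           + weight * phi(cp.diff(xhat)))).solve()
    return xhat.value

x_quad = reconstruct(cp.sum_squares, 10.0)        # phi_quad
x_tv = reconstruct(cp.norm1, 1.0)                 # phi_tv
print("max error, quad: %.3f   tv: %.3f"
      % (np.abs(x_quad - x_true).max(), np.abs(x_tv - x_true).max()))
```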
Robust approximation

minimize ‖Ax − b‖ with uncertain A

example: A(u) = A0 + uA1

xnom minimizes ‖A0x − b‖₂²
xstoch minimizes E ‖A(u)x − b‖₂² with u uniform on [−1, 1]
xwc minimizes sup_{−1≤u≤1} ‖A(u)x − b‖₂²
[figure: r(u) = ‖A(u)x − b‖₂ versus u for the three solutions]

stochastic robust LS with A = Ā + U, U random with mean 0, E UᵀU = P:

E ‖Ax − b‖₂² = E ‖Āx − b + Ux‖₂²
            = ‖Āx − b‖₂² + E xᵀUᵀUx
            = ‖Āx − b‖₂² + xᵀPx
            = ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²

hence the robust LS problem is

minimize  ‖Āx − b‖₂² + ‖P^{1/2}x‖₂²

for P = δI, this is the Tikhonov regularized problem
minimize ‖Āx − b‖₂² + δ‖x‖₂²
worst-case robust LS with A = {Ā + u1A1 + ··· + upAp | ‖u‖₂ ≤ 1}:

minimize  sup_{A∈A} ‖Ax − b‖₂² = sup_{‖u‖₂≤1} ‖P(x)u + q(x)‖₂²

where P(x) = [A1x  A2x  ···  Apx], q(x) = Āx − b

from the nonconvex problem with strong duality (page 5-14), the robust
LS problem is equivalent to the SDP

minimize    t + λ
subject to  [ I       P(x)   q(x)
              P(x)ᵀ   λI     0
              q(x)ᵀ   0      t   ] ⪰ 0

example: histogram of residuals r(u) = ‖(A0 + uA1)x − b‖₂, with u
uniform on [−1, 1], for the solutions xls, xtik (Tikhonov), xrls (robust LS)
[figure: frequency versus r(u) for the three solutions]
7. Statistical estimation

maximum likelihood estimation
optimal detector design
experiment design
Parametric distribution estimation

maximum likelihood estimation:

maximize (over x)  log px(y)

y is the observed value
l(x) = log px(y) is called the log-likelihood function
can add constraints x ∈ C explicitly, or define px(y) = 0 for x ∉ C
a convex optimization problem if log px(y) is concave in x for fixed y
Linear measurements with IID noise

linear measurement model:

yi = aiᵀx + vi,   i = 1, ..., m

x ∈ R^n is a vector of unknown parameters; vi is IID measurement noise
with density p(z); y is the observed value

maximum likelihood estimate: any solution x of

maximize  l(x) = Σ_{i=1}^m log p(yi − aiᵀx)
examples

Gaussian noise N(0, σ²): p(z) = (2πσ²)^{−1/2} e^{−z²/(2σ²)},

l(x) = −(m/2) log(2πσ²) − (1/(2σ²)) Σ_{i=1}^m (aiᵀx − yi)²

ML estimate is the least-squares solution

Laplacian noise: p(z) = (1/(2a)) e^{−|z|/a},

l(x) = −m log(2a) − (1/a) Σ_{i=1}^m |aiᵀx − yi|

ML estimate is the ℓ1-norm solution
Logistic regression

random variable y ∈ {0, 1} with distribution

p = prob(y = 1) = exp(aᵀu + b) / (1 + exp(aᵀu + b))

a, b are parameters; u ∈ R^n are (observable) explanatory variables

estimation problem: estimate a, b from m observations (ui, yi)

log-likelihood function (for y1 = ··· = yk = 1, yk+1 = ··· = ym = 0):

l(a, b) = Σ_{i=1}^k (aᵀui + b) − Σ_{i=1}^m log(1 + exp(aᵀui + b))

concave in a, b

example (n = 1, m = 50 measurements)
[figure: data points (ui, yi) and the fitted curve prob(y = 1) versus u]
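A minimal CVXPY sketch of this concave ML problem; cp.logistic(z) implements log(1 + exp(z)), and the data (ui, yi) are synthetic placeholders drawn from a logistic model.

```python
# Logistic regression by maximizing the concave log-likelihood.
import numpy as np
import cvxpy as cp

np.random.seed(7)
m = 50
u = 10 * np.random.rand(m)
y = (np.random.rand(m) < 1 / (1 + np.exp(-(u - 5)))).astype(float)

a, b = cp.Variable(), cp.Variable()
z = a * u + b
loglik = y @ z - cp.sum(cp.logistic(z))   # sum_{y_i=1} z_i - sum log(1+e^z)
cp.Problem(cp.Maximize(loglik)).solve()
print("a = %.3f, b = %.3f" % (a.value, b.value))
```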
(Binary) hypothesis testing

detection (hypothesis testing) problem: given an observation of a random
variable X ∈ {1, ..., n}, choose between
hypothesis 1: X was generated by distribution p = (p1, ..., pn)
hypothesis 2: X was generated by distribution q = (q1, ..., qn)

randomized detector: a nonnegative matrix T ∈ R^{2×n} with columns
summing to 1; if X = k is observed, hypothesis 1 is chosen with
probability t1k, hypothesis 2 with probability t2k

detection probability matrix:

D = [Tp  Tq] = [ 1 − Pfp   Pfn
                 Pfp       1 − Pfn ]

(Pfp is the false positive, Pfn the false negative probability);
variable T ∈ R^{2×n}

scalarization (with weight λ > 0) gives an LP; the minimax detector
minimizes max{Pfp, Pfn} and is also an LP
example (n = 4):

P = [p  q] = [ 0.70  0.10
               0.20  0.10
               0.05  0.70
               0.05  0.10 ]

[figure: trade-off curve of Pfn versus Pfp; points 1, 2, 3 correspond to
deterministic detectors, point 4 to the (randomized) minimax detector]
Experiment design

maximum likelihood estimate of x from linear measurements with unit
Gaussian noise:

x̂ = ( Σ_{i=1}^m ai aiᵀ )⁻¹ Σ_{i=1}^m yi ai

error e = x̂ − x has zero mean and covariance

E = E eeᵀ = ( Σ_{i=1}^m ai aiᵀ )⁻¹

confidence ellipsoids are given by {x | (x − x̂)ᵀE⁻¹(x − x̂) ≤ β}

experiment design: choose how often to use each available measurement
vector vk so that the error covariance E is small

D-optimal design: minimize log det E; complementary slackness at the
optimum gives λk (1 − vkᵀWvk) = 0, k = 1, ..., p

derivation of the dual: form the Lagrangian

L(X, λ, Z, z, ν) = log det X⁻¹ + tr( Z(X − Σ_{k=1}^p λk vk vkᵀ) ) − zᵀλ + ν(1ᵀλ − 1)

change variable W = Z/ν, and optimize over ν to get the dual of page 7-13
8. Geometric problems

extremal volume ellipsoids
centering
classification
placement and facility location
Minimum volume ellipsoid around a set

Löwner-John ellipsoid of a set C: the minimum volume ellipsoid E with
C ⊆ E; parametrize E as E = {v | ‖Av + b‖₂ ≤ 1}; vol E is proportional
to det A⁻¹, so computing E is a convex problem in A, b
Maximum volume inscribed ellipsoid

maximum volume ellipsoid E inside a convex set C ⊆ R^n: parametrize E
as E = {Bu + d | ‖u‖₂ ≤ 1}; vol E is proportional to det B, so computing
E is a convex problem in B, d
Efficiency of ellipsoidal approximations

the Löwner-John ellipsoid, shrunk by a factor n about its center, lies
inside C; the factor n can be improved to √n if C is symmetric
Centering

some possible definitions of the center of a convex set: the Chebyshev
center xcheb (center of the largest inscribed ball) and the center xmve
of the maximum volume inscribed ellipsoid
[figure: xcheb and xmve of a polyhedron]
Analytic center of a set of inequalities

the analytic center of the inequalities and equalities

fi(x) ≤ 0,  i = 1, ..., m,    Fx = g

is defined as the optimal point of

minimize    −Σ_{i=1}^m log(−fi(x))
subject to  Fx = g

analytic center of linear inequalities aiᵀx ≤ bi, i = 1, ..., m:
xac is the minimizer of

φ(x) = −Σ_{i=1}^m log(bi − aiᵀx)

[figure: analytic center xac of a polyhedron]
Linear discrimination

separate two sets of points {x1, ..., xN}, {y1, ..., yM} by a hyperplane:

aᵀxi + b > 0, i = 1, ..., N,    aᵀyi + b < 0, i = 1, ..., M

homogeneous in a, b, hence equivalent to the linear inequalities (in a, b)

aᵀxi + b ≥ 1, i = 1, ..., N,    aᵀyi + b ≤ −1, i = 1, ..., M
Robust linear discrimination

the (Euclidean) distance between the hyperplanes

H1 = {z | aᵀz + b = 1},    H2 = {z | aᵀz + b = −1}

is 2/‖a‖₂; to separate the two sets of points by the maximum margin:

minimize    (1/2)‖a‖₂
subject to  aᵀxi + b ≥ 1,  i = 1, ..., N        (1)
            aᵀyi + b ≤ −1,  i = 1, ..., M

(after squaring the objective) a QP in a, b
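A minimal CVXPY sketch of the maximum margin QP (1) on two synthetic point clouds (placed far enough apart that they are almost surely separable).

```python
# Maximum margin linear separation (hard-margin SVM) as a QP.
import numpy as np
import cvxpy as cp

np.random.seed(8)
X = np.random.randn(20, 2) + 3.0    # points x_i (placeholder cloud)
Y = np.random.randn(20, 2) - 3.0    # points y_i (placeholder cloud)

a, b = cp.Variable(2), cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(a)),  # squared objective
                  [X @ a + b >= 1, Y @ a + b <= -1])
prob.solve()
print("a:", a.value, " b:", b.value,
      " margin:", 2 / np.linalg.norm(a.value))
```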
Lagrange dual of the maximum margin separation problem (1):

maximize    1ᵀλ + 1ᵀμ
subject to  2 ‖ Σ_{i=1}^N λi xi − Σ_{i=1}^M μi yi ‖₂ ≤ 1        (2)
            1ᵀλ = 1ᵀμ,  λ ⪰ 0,  μ ⪰ 0

from strong duality, the optimal value is the inverse of the maximum
margin of separation

interpretation: change variables to θi = λi/1ᵀλ, γi = μi/1ᵀμ,
t = 1/(1ᵀλ + 1ᵀμ); problem (2) becomes

minimize    t
subject to  ‖ Σ_{i=1}^N θi xi − Σ_{i=1}^M γi yi ‖₂ ≤ t
            θ ⪰ 0,  1ᵀθ = 1,  γ ⪰ 0,  1ᵀγ = 1

whose optimal value is the distance between the convex hulls of the two
point sets
approximate linear separation of non-separable sets:

minimize    1ᵀu + 1ᵀv
subject to  aᵀxi + b ≥ 1 − ui,  i = 1, ..., N
            aᵀyi + b ≤ −1 + vi,  i = 1, ..., M
            u ⪰ 0,  v ⪰ 0

an LP in a, b, u, v

at the optimum, ui = max{0, 1 − aᵀxi − b}, vi = max{0, 1 + aᵀyi + b}

can be interpreted as a heuristic for minimizing the number of
misclassified points
Nonlinear discrimination

separate two sets of points by a nonlinear function f(z) = θᵀF(z):

θᵀF(xi) ≥ 1, i = 1, ..., N,    θᵀF(yi) ≤ −1, i = 1, ..., M

quadratic discrimination (f(z) = zᵀPz + qᵀz + r):

xiᵀPxi + qᵀxi + r ≥ 1,    yiᵀPyi + qᵀyi + r ≤ −1
Placement and facility location

placement problem:

minimize  Σ_{i≠j} fij(xi, xj)

variables are the positions of the free points; fij measures the cost of
the link between points i and j

[figures: optimal placements and histograms of link lengths ‖xi − xj‖₂
for several choices of the cost functions fij]
9. Numerical linear algebra background

Matrix structure and algorithm complexity

flop counts: a flop is one floating-point operation; the flop count is a
rough measure of algorithm cost

example: solving a linear system with lower triangular A by forward
substitution (cost: about n² flops):

x1 := b1/a11
x2 := (b2 − a21x1)/a22
x3 := (b3 − a31x1 − a32x2)/a33
...
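A minimal NumPy sketch of the forward-substitution recursion above, with a random well-conditioned test matrix as placeholder data.

```python
# Forward substitution for lower triangular L x = b (about n^2 flops).
import numpy as np

def forward_substitution(L, b):
    n = b.size
    x = np.zeros(n)
    for i in range(n):
        # subtract already-computed terms, then divide by the diagonal
        x[i] = (b[i] - L[i, :i] @ x[:i]) / L[i, i]
    return x

L = np.tril(np.random.randn(5, 5)) + 5 * np.eye(5)
b = np.random.randn(5)
assert np.allclose(forward_substitution(L, b), np.linalg.solve(L, b))
```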
permutation matrices: A with entries

aij = 1 if j = πi,  0 otherwise

(π is a permutation of (1, ..., n))

the factor-solve method: factor A as a product of simple matrices

A = A1 A2 ··· Ak

then compute x = A⁻¹b = Ak⁻¹ ··· A2⁻¹ A1⁻¹ b by solving k easy equations

A1z1 = b,   A2z2 = z1,   ...,   Akx = z_{k−1}
examples of factorizations:

A = PLU (LU factorization of nonsingular A; cost (2/3)n³)
A = P1LUP2 (LU factorization with permutations, for sparse A)
A = LLᵀ (Cholesky factorization of A ≻ 0; cost (1/3)n³)
A = PLLᵀPᵀ (sparse Cholesky)
A = PLDLᵀPᵀ (LDLᵀ factorization of nonsingular symmetric A)
block elimination: eliminate x1 from the block 2 × 2 system to get the
reduced equation in x2:

(A22 − A21A11⁻¹A12) x2 = b2 − A21A11⁻¹b1

(A22 − A21A11⁻¹A12 is the Schur complement of A11)

step 3, solving the reduced system, costs (2/3)n2³ flops

examples
general A11 (f = (2/3)n1³, s = 2n1²): no gain over the standard method
structured A11 (diagonal, banded, sparse): large gains possible
structured matrix plus low rank term: solve (A + BC)x = b

first write as

[ A   B ] [ x ]   [ b ]
[ C  −I ] [ y ] = [ 0 ]

then apply block elimination: solve

(I + CA⁻¹B) y = CA⁻¹b,

then solve Ax = b − By

this proves the matrix inversion lemma: if A and A + BC are nonsingular,

(A + BC)⁻¹ = A⁻¹ − A⁻¹B(I + CA⁻¹B)⁻¹CA⁻¹
solving underdetermined linear equations: if A ∈ R^{p×n} with p < n and
rank A = p,

{x | Ax = b} = {Fz + x̂ | z ∈ R^{n−p}}

x̂ is (any) particular solution
columns of F ∈ R^{n×(n−p)} span the nullspace of A
there exist several numerical methods for computing F
(QR factorization, rectangular LU factorization, . . . )
10. Unconstrained minimization

terminology and assumptions
gradient descent method
steepest descent method
Newton's method
self-concordant functions
implementation
Unconstrained minimization
minimize  f(x)

f convex, twice continuously differentiable (hence dom f open); we
assume the optimal value p⋆ = inf_x f(x) is attained

unconstrained minimization methods produce a sequence of points x(k)
with f(x(k)) → p⋆; they can be interpreted as iterative methods for
solving the optimality condition ∇f(x⋆) = 0

initial point and sublevel set: algorithms require a starting point x(0)
with x(0) ∈ dom f such that the sublevel set S = {x | f(x) ≤ f(x(0))} is
closed

the 2nd condition is hard to verify, except when all sublevel sets are
closed:
equivalent to the condition that epi f is closed
true if dom f = R^n
true if f(x) → ∞ as x → bd dom f
strong convexity assumption: there exists m > 0 with ∇²f(x) ⪰ mI for
all x ∈ S

implications

for x, y ∈ S,

f(y) ≥ f(x) + ∇f(x)ᵀ(y − x) + (m/2) ‖x − y‖₂²

hence, S is bounded

p⋆ > −∞, and for x ∈ S,

f(x) − p⋆ ≤ (1/(2m)) ‖∇f(x)‖₂²

useful as a stopping criterion (if you know m)
descent methods use a line search on f(x + t∆x) over t > 0 (exact or
backtracking)

gradient descent method: ∆x = −∇f(x); convergence result for strongly
convex f:

f(x(k)) − p⋆ ≤ cᵏ (f(x(0)) − p⋆)

c ∈ (0, 1) depends on m, x(0), and the line search type; convergence is
very slow if the condition number γ satisfies γ ≫ 1 or γ ≪ 1

example for γ = 10:
[figure: iterates x(0), x(1), ... zig-zagging toward the minimizer]
nonquadratic example in R²:
[figure: iterates x(0), x(1), x(2) of gradient descent with backtracking
and with exact line search]

a problem in R^100:

f(x) = cᵀx − Σ_{i=1}^{500} log(bi − aiᵀx)

[figure: error f(x(k)) − p⋆ versus k for exact and backtracking line
search; 'linear' convergence, i.e., a straight line on a semilog plot]
Steepest descent method

normalized steepest descent direction (at x, for norm ‖·‖):

∆xnsd = argmin{∇f(x)ᵀv | ‖v‖ = 1}

(unnormalized) steepest descent direction:

∆xsd = ‖∇f(x)‖* ∆xnsd

[figure: unit balls and normalized steepest descent directions for a
quadratic norm and the ℓ1-norm, relative to −∇f(x)]

[figure: steepest descent with backtracking line search for two
quadratic norms; ellipses show {x | ‖x − x(k)‖_P = 1}]

equivalent interpretation of steepest descent with quadratic norm ‖·‖_P:
gradient descent after the change of variables x̄ = P^{1/2}x
Newton step: ∆xnt = −∇²f(x)⁻¹∇f(x)

interpretations

x + ∆xnt minimizes the second-order approximation

f̂(x + v) = f(x) + ∇f(x)ᵀv + (1/2) vᵀ∇²f(x)v

[figure: f and its quadratic model f̂ touching at (x, f(x)), with
minimizer (x + ∆xnt, f̂(x + ∆xnt))]

∆xnt is the steepest descent direction at x in the local Hessian norm

‖u‖_{∇²f(x)} = ( uᵀ∇²f(x)u )^{1/2}

[figure: x + ∆xnsd and x + ∆xnt relative to the Hessian-norm ellipse]

Newton decrement: λ(x) = ( ∇f(x)ᵀ∇²f(x)⁻¹∇f(x) )^{1/2}

gives an estimate of f(x) − p⋆ using the quadratic approximation f̂:

f(x) − inf_y f̂(y) = (1/2) λ(x)²

equal to the norm of the Newton step in the quadratic Hessian norm:

λ(x) = ( ∆xntᵀ∇²f(x)∆xnt )^{1/2}

affine invariance: Newton iterates for f̃(y) = f(Ty), with starting point
y(0) = T⁻¹x(0), are y(k) = T⁻¹x(k)
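A minimal NumPy sketch of damped Newton's method with backtracking line search; the test function is the classic smooth convex example f(x) = e^{x1+3x2−0.1} + e^{x1−3x2−0.1} + e^{−x1−0.1}, used here as a placeholder.

```python
# Newton's method with backtracking line search on a smooth convex f.
import numpy as np

E = np.array([[1.0, 3.0], [1.0, -3.0], [-1.0, 0.0]])   # exponent rows

def f(x):    return np.exp(E @ x - 0.1).sum()
def grad(x): return E.T @ np.exp(E @ x - 0.1)
def hess(x): return E.T @ (np.exp(E @ x - 0.1)[:, None] * E)

x = np.array([-1.0, 1.0])
alpha, beta = 0.1, 0.7
for k in range(20):
    dx = -np.linalg.solve(hess(x), grad(x))   # Newton step
    lam2 = -grad(x) @ dx                      # Newton decrement squared
    if lam2 / 2 <= 1e-10:                     # stop when lambda^2/2 small
        break
    t = 1.0                                   # backtracking line search
    while f(x + t * dx) > f(x) + alpha * t * (grad(x) @ dx):
        t *= beta
    x = x + t * dx
print("iterations:", k, " x* ~", x, " f(x*) ~", f(x))
```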
Classical convergence analysis

assumptions

f strongly convex on S with constant m
∇²f is Lipschitz continuous on S, with constant L > 0:

‖∇²f(x) − ∇²f(y)‖₂ ≤ L ‖x − y‖₂

outline: there exist constants η ∈ (0, m²/L], γ > 0 such that

if ‖∇f(x)‖₂ ≥ η: f(x(k+1)) − f(x(k)) ≤ −γ (damped Newton phase)
if ‖∇f(x)‖₂ < η:

(L/(2m²)) ‖∇f(x(k+1))‖₂ ≤ ( (L/(2m²)) ‖∇f(x(k))‖₂ )²

(quadratically convergent phase)

conclusion: the number of iterations until f(x) − p⋆ ≤ ε is bounded
above by

(f(x(0)) − p⋆)/γ + log₂ log₂(ε0/ε)

the second term grows extremely slowly and can be treated as a constant

examples
[figures: error f(x(k)) − p⋆ versus k for examples in R², R^100, and a
sparse example in R^10000, showing quadratic convergence after a few
damped steps]
Self-concordance

definition

convex f : R → R is self-concordant if |f'''(x)| ≤ 2 f''(x)^{3/2} for
all x ∈ dom f

f : R^n → R is self-concordant if g(t) = f(x + tv) is self-concordant
for all x ∈ dom f, v ∈ R^n

examples on R

linear and quadratic functions
negative logarithm f(x) = −log x
negative entropy plus negative logarithm: f(x) = x log x − log x

properties

preserved under positive scaling α ≥ 1, and sum
preserved under composition with affine function
if g is convex with dom g = R₊₊ and |g'''(x)| ≤ 3 g''(x)/x, then

f(x) = log(−g(x)) − log x

is self-concordant

examples: these properties can be used to show that the following are s.c.

f(x) = −Σ_{i=1}^m log(bi − aiᵀx) on {x | aiᵀx < bi, i = 1, ..., m}
f(X) = −log det X on S^n₊₊
f(x) = −log(y² − xᵀx) on {(x, y) | ‖x‖₂ < y}
convergence analysis for self-concordant f: there exist constants
η ∈ (0, 1/4], γ > 0 such that, if λ(x) ≤ η, then

2λ(x(k+1)) ≤ ( 2λ(x(k)) )²

complexity bound: the number of Newton iterations until f(x) − p⋆ ≤ ε is
bounded above by

(f(x(0)) − p⋆)/γ + log₂ log₂(1/ε)

numerical example: randomly generated instances of
minimize f(x) = −Σ_{i=1}^m log(bi − aiᵀx),
with (m, n) = (100, 50), (1000, 500), (1000, 50)
[figure: iterations versus f(x(0)) − p⋆; the number of iterations is much
smaller than the bound and grows slowly with problem size]
Implementation

main effort in each iteration: evaluate derivatives and solve the Newton
system H∆x = −g, where H = ∇²f(x), g = ∇f(x); usually via Cholesky
factorization of H, exploiting structure (banded, sparse, diagonal plus
low rank) when possible; e.g., with ∇²ψ0 factored as L0L0ᵀ, a structured
Newton system can be solved via

D∆x + AᵀL0w = −g,   L0ᵀA∆x − w = 0
11. Equality constrained minimization

eliminating equality constraints
Newton's method with equality constraints
infeasible start Newton method
implementation
Equality constrained minimization
minimize    f(x)
subject to  Ax = b

f convex, twice continuously differentiable; A ∈ R^{p×n} with rank A = p

optimality conditions: x⋆ is optimal iff there exists a ν⋆ such that

∇f(x⋆) + Aᵀν⋆ = 0,   Ax⋆ = b

equality constrained quadratic minimization (with P ∈ S^n₊):

minimize    (1/2)xᵀPx + qᵀx + r
subject to  Ax = b

optimality condition:

[ P   Aᵀ ] [ x⋆ ]   [ −q ]
[ A   0  ] [ ν⋆ ] = [  b ]

the coefficient matrix is called the KKT matrix; it is nonsingular if
and only if

Ax = 0, x ≠ 0  ⇒  xᵀPx > 0
eliminating equality constraints: represent the solution set of Ax = b as

{x | Ax = b} = {Fz + x̂ | z ∈ R^{n−p}}

x̂ is (any) particular solution
range of F ∈ R^{n×(n−p)} is the nullspace of A (rank F = n − p, AF = 0)

reduced problem:

minimize  f(Fz + x̂)

from the solution z⋆, obtain x⋆ and ν⋆ as

x⋆ = Fz⋆ + x̂,   ν⋆ = −(AAᵀ)⁻¹A∇f(x⋆)
Newton step of f at feasible x: given by the solution v of

[ ∇²f(x)   Aᵀ ] [ v ]   [ −∇f(x) ]
[ A        0  ] [ w ] = [    0   ]

interpretation: x + ∆xnt minimizes the second-order approximation of f
subject to A(x + v) = b

Newton decrement:

λ(x) = ( ∆xntᵀ∇²f(x)∆xnt )^{1/2} = ( −∇f(x)ᵀ∆xnt )^{1/2}

properties

gives an estimate of f(x) − p⋆ using the quadratic approximation f̂:

f(x) − inf_{Ay=b} f̂(y) = (1/2) λ(x)²

in general, λ(x) ≠ ( ∇f(x)ᵀ∇²f(x)⁻¹∇f(x) )^{1/2}
Newton's method and elimination: Newton's method for the reduced problem

minimize  f̃(z) = f(Fz + x̂)

variables z ∈ R^{n−p}
x̂ satisfies Ax̂ = b; rank F = n − p and AF = 0

Newton's method for f̃, started at z(0), generates iterates z(k);
Newton's method with equality constraints, started at x(0) = Fz(0) + x̂,
generates the corresponding iterates

x(k) = Fz(k) + x̂
write optimality condition as r(y) = 0, where
given starting point x dom f , , tolerance > 0, (0, 1/2), (0, 1).
repeat
1. Compute primal and dual Newton steps xnt, nt.
2. Backtracking line search on krk2.
t := 1.
while kr(x + txnt, + tnt)k2 > (1 t)kr(x, )k2, t := t.
3. Update. x := x + txnt, := + tnt.
until Ax = b and kr(x, )k2 .
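For the quadratic case, one Newton step solves the problem exactly; below is a minimal NumPy sketch of solving the KKT system of an equality constrained QP directly (P, q, A, b are random placeholders).

```python
# Solve the equality constrained QP via its KKT system.
import numpy as np

np.random.seed(9)
n, p = 4, 2
M = np.random.randn(n, n)
P = M.T @ M + np.eye(n)              # P positive definite
q = np.random.randn(n)
A = np.random.randn(p, n)
b = np.random.randn(p)

KKT = np.block([[P, A.T], [A, np.zeros((p, p))]])
sol = np.linalg.solve(KKT, np.concatenate([-q, b]))
x_star, nu_star = sol[:n], sol[n:]

# check optimality: P x* + q + A^T nu* = 0 and A x* = b
assert np.allclose(P @ x_star + q + A.T @ nu_star, 0)
assert np.allclose(A @ x_star, b)
```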
Solving KKT systems

[ H   Aᵀ ] [ v ]     [ g ]
[ A   0  ] [ w ] = − [ h ]

solution methods

LDLᵀ factorization
elimination (if H nonsingular):

AH⁻¹Aᵀ w = h − AH⁻¹g,   Hv = −(g + Aᵀw)
example: equality constrained analytic centering, solved three ways

1. Newton method with equality constraints (requires a feasible x(0) ≻ 0)
[figure: f(x(k)) − p⋆ versus k]

2. Newton method applied to the dual problem
[figure: p⋆ − g(ν(k)) versus k]

3. infeasible start Newton method (requires x(0) ≻ 0)
[figure: ‖r(x(k), ν(k))‖₂ versus k]
complexity per iteration of the three methods is identical: each
requires solving a KKT system

[ H   Aᵀ ] [ v ]     [ g ]
[ A   0  ] [ w ] = − [ h ]

via elimination: AH⁻¹Aᵀ w = h − AH⁻¹g,  Hv = −(g + Aᵀw)

example: analytic centering with a linear matrix inequality; the Newton
equations reduce to the p linear equations

Σ_{j=1}^p tr(Ai X Aj X) wj = bi,   i = 1, ..., p        (2)
12. Interior-point methods

Inequality constrained minimization

minimize    f0(x)
subject to  fi(x) ≤ 0,  i = 1, ..., m        (1)
            Ax = b

fi convex, twice continuously differentiable; we assume the problem is
strictly feasible: there exists x̃ with

x̃ ∈ dom f0,   fi(x̃) < 0, i = 1, ..., m,   Ax̃ = b

hence strong duality holds and the dual optimum is attained

logarithmic barrier idea: approximate the indicator function of u ≤ 0 by
−(1/t) log(−u); the approximation improves as t → ∞
[figure: −(1/t) log(−u) for several values of t, versus the indicator
function]
logarithmic barrier function

φ(x) = −Σ_{i=1}^m log(−fi(x)),   dom φ = {x | f1(x) < 0, ..., fm(x) < 0}

convex (follows from the composition rules); twice continuously
differentiable, with derivatives

∇φ(x) = Σ_{i=1}^m (1/(−fi(x))) ∇fi(x)

∇²φ(x) = Σ_{i=1}^m (1/fi(x)²) ∇fi(x)∇fi(x)ᵀ + Σ_{i=1}^m (1/(−fi(x))) ∇²fi(x)
Central path: for t > 0, define x⋆(t) as the solution of

minimize    t f0(x) + φ(x)
subject to  Ax = b

(for now, assume x⋆(t) exists and is unique for each t > 0)

central path is {x⋆(t) | t > 0}

example: central path for an LP:

minimize    cᵀx
subject to  aiᵀx ≤ bi,  i = 1, ..., 6

the hyperplane cᵀx = cᵀx⋆(t) is tangent to the level curve of φ
through x⋆(t)
[figure: central path from the analytic center through x⋆(10) to x⋆]

dual points on the central path give lower bounds:

p⋆ ≥ g(λ⋆(t), ν⋆(t))
   = L(x⋆(t), λ⋆(t), ν⋆(t))
   = f0(x⋆(t)) − m/t

hence f0(x⋆(t)) − p⋆ ≤ m/t, i.e., x⋆(t) converges to an optimal point as
t → ∞
interpretation via KKT conditions: x = x⋆(t), λ = λ⋆(t), ν = ν⋆(t)
satisfy the modified KKT conditions: primal and dual feasibility, the
gradient condition

∇f0(x) + Σ_{i=1}^m λi ∇fi(x) + Aᵀν = 0

and approximate complementary slackness −λi fi(x) = 1/t

force field interpretation (for a problem with no equality constraints):
the objective force is F0(x) = −t∇f0(x) and the constraint forces are
Fi(x) = ∇log(−fi(x)); x⋆(t) is the point where the forces balance:

F0(x⋆(t)) + Σ_{i=1}^m Fi(x⋆(t)) = 0

[figure, LP example: force field and central points for t = 1 and t = 3]
Barrier method

given strictly feasible x, t := t(0) > 0, μ > 1, tolerance ε > 0.
repeat
1. Centering step: compute x⋆(t) by minimizing t f0 + φ, subject to Ax = b.
2. Update: x := x⋆(t).
3. Stopping criterion: quit if m/t < ε.
4. Increase t: t := μt.

terminates with f0(x) − p⋆ ≤ ε (follows from f0(x⋆(t)) − p⋆ ≤ m/t); the
centering problem is usually solved by Newton's method; see the sketch
below

examples

LP in inequality form
[figures: duality gap versus Newton iterations for μ = 2, 50, 150; total
Newton iterations versus μ, showing that a wide range of μ works well]

geometric program
[figure: duality gap versus Newton iterations for μ = 2, 50, 150]
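A minimal NumPy sketch of the barrier method for an inequality form LP, with Newton centering steps; all data are random placeholders arranged so that x = 0 is strictly feasible.

```python
# Barrier method for: minimize c^T x subject to Ax <= b.
import numpy as np

np.random.seed(10)
m, n = 100, 20
A = np.random.randn(m, n)
b = np.random.rand(m) + 0.1          # b > 0, so x = 0 is strictly feasible
c = np.random.randn(n)

def center(x, t, iters=50):
    """Newton's method on t*c^T x - sum log(b - Ax)."""
    for _ in range(iters):
        d = b - A @ x
        g = t * c + A.T @ (1.0 / d)                  # gradient
        H = A.T @ ((1.0 / d**2)[:, None] * A)        # Hessian
        dx = -np.linalg.solve(H, g)
        if -g @ dx / 2 < 1e-8:                       # Newton decrement test
            return x
        s = 1.0
        while np.min(b - A @ (x + s * dx)) <= 0:     # stay strictly feasible
            s *= 0.5
        while (t * c @ (x + s * dx) - np.sum(np.log(b - A @ (x + s * dx)))
               > t * c @ x - np.sum(np.log(d)) + 0.01 * s * (g @ dx)):
            s *= 0.5                                 # backtracking
        x = x + s * dx
    return x

x, t, mu, eps = np.zeros(n), 1.0, 20.0, 1e-6
while m / t >= eps:
    x = center(x, t)
    t *= mu
print("p* ~ %.6f  (duality gap <= %.1e)" % (c @ x, m / t))
```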
family of standard form LPs

minimize    cᵀx
subject to  Ax = b,  x ⪰ 0

[figure: number of Newton iterations versus m, for m ranging over
roughly 10 to 1000; the iteration count grows very slowly with problem
size, staying between roughly 15 and 35]
Feasibility and phase I methods

feasibility problem: find x such that

fi(x) ≤ 0,  i = 1, ..., m,    Ax = b        (2)

phase I: computes a strictly feasible starting point for the barrier method

basic phase I method:

minimize (over x, s)  s
subject to            fi(x) ≤ s,  i = 1, ..., m        (3)
                      Ax = b

if x, s is feasible with s < 0, then x is strictly feasible for (2)

sum of infeasibilities phase I method:

minimize    1ᵀs
subject to  s ⪰ 0,  fi(x) ≤ si,  i = 1, ..., m
            Ax = b

for infeasible problems, produces a solution that satisfies many more
inequalities than the basic phase I method
[figure: histograms of the margins bi − aiᵀxmax and bi − aiᵀxsum for an
infeasible set of linear inequalities]
example: family of linear inequalities Ax ⪯ b + γ∆b, feasible for γ > 0
and infeasible for γ ≤ 0
[figures: Newton iterations versus γ for the feasible and infeasible
cases; the number of iterations is roughly proportional to log(1/|γ|)]
Complexity analysis via self-concordance

the number of Newton iterations per centering step is bounded by

N ≈ m(μ − 1 − log μ)/γ + c

[figure: N versus μ for typical values of γ, c, with m = 100,
m/t(0) = 10^5; N is smallest for μ near 1]

for μ = 1 + 1/√m:

N = O( √m log( m/(t(0)ε) ) )

the number of Newton iterations for a fixed gap reduction is O(√m)
Generalized inequalities

minimize    f0(x)
subject to  fi(x) ⪯_{Ki} 0,  i = 1, ..., m
            Ax = b

generalized logarithm ψ for a proper cone K, with degree θ

examples
nonnegative orthant K = R^n₊: ψ(y) = Σ_{i=1}^n log yi, with degree θ = n
positive semidefinite cone K = S^n₊: ψ(Y) = log det Y (θ = n)
second-order cone K = {y ∈ R^{n+1} | ‖(y1, ..., yn)‖₂ ≤ y_{n+1}}:
ψ(y) = log(y_{n+1}² − y1² − ··· − yn²) (θ = 2)

properties:

∇ψ(y) ⪰_{K*} 0,   yᵀ∇ψ(y) = θ

examples
nonnegative orthant R^n₊: ∇ψ(y) = (1/y1, ..., 1/yn), yᵀ∇ψ(y) = n
positive semidefinite cone S^n₊: ∇ψ(Y) = Y⁻¹, tr(Y∇ψ(Y)) = n

logarithmic barrier for fi(x) ⪯_{Ki} 0:

φ(x) = −Σ_{i=1}^m ψi(−fi(x)),   dom φ = {x | fi(x) ≺_{Ki} 0, i = 1, ..., m}
dual points on the central path:

λi⋆(t) = (1/t) ∇ψi(−fi(x⋆(t))),   ν⋆(t) = w/t

give a lower bound via the duality gap:

f0(x⋆(t)) − g(λ⋆(t), ν⋆(t)) = (1/t) Σ_{i=1}^m θi
example: semidefinite programming

minimize    cᵀx
subject to  F(x) = Σ_{i=1}^n xiFi + G ⪯ 0

and its dual

maximize    −tr(GZ)
subject to  tr(FiZ) + ci = 0,  i = 1, ..., n
            Z ⪰ 0

barrier method: the algorithm is unchanged; the only difference is that
the duality gap m/t on the central path is replaced by Σ_i θi / t
examples

second-order cone program
[figures: duality gap versus Newton iterations for μ = 2, 50, 200]

semidefinite program
[figures: duality gap versus Newton iterations for μ = 2, 50, 150]
family of SDPs (A ∈ S^n, x ∈ R^n)

minimize    1ᵀx
subject to  A + diag(x) ⪰ 0

[figure: number of Newton iterations versus n, for n from roughly 10 to
1000; the count grows very slowly with problem size]
13. Conclusions
Modeling
mathematical optimization
tractability
Theoretical consequences of convexity
Practical consequences of convexity
high level modeling tools like cvx ease modeling and problem
specification
How to use convex optimization
even if the problem is quite nonconvex, you can use convex optimization
in subproblems, e.g., to find search direction
by repeatedly forming and solving a convex approximation at the
current point
Further topics
What's next?