
Convex Optimization: Part 1 of Chapter 7

Discussion

Presenter: Brian Quanz

A KTEC Center of Excellence 1


About today's discussion
Chapter 7 has no separate discussion of
convex optimization
It discusses it together with SVM problems

Instead:
Today: Discuss convex optimization
Next Week: Discuss some specific convex optimization problems
(from text), e.g. SVMs



About today's discussion
Mostly follow alternate text:
Convex Optimization, Stephen Boyd and Lieven Vandenberghe
Borrowed material from book and related course notes
Some figures and equations shown here
Available online: http://www.stanford.edu/~boyd/cvxbook/
Nice course lecture videos available from Stephen Boyd online:
http://www.stanford.edu/class/ee364a/
Corresponding convex optimization tool (discussed later) - CVX:
http://www.stanford.edu/~boyd/cvx/



Overview
Why convex? What is convex?
Key examples of linear and quadratic programming

Key mathematical ideas to discuss:
->Lagrange Duality
->KKT conditions
Brief concept of interior point methods
CVX: convex optimization made easy



Mathematical Optimization
All learning is some optimization problem
-> Stick to canonical form

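$x = (x_1, x_2, \ldots, x_p)$: optimization variables; $x^*$: optimal solution

$f_0 : \mathbb{R}^p \to \mathbb{R}$: objective function
$f_i : \mathbb{R}^p \to \mathbb{R}$, $i = 1, \ldots, m$: constraint functions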
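
The canonical program itself did not survive extraction from the slide. Assuming it follows the standard form of the referenced text (Boyd and Vandenberghe), with the bounds $b_i$ being an assumption here, it would read:

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le b_i, \quad i = 1, \ldots, m
\end{aligned}
```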

Optimization Example
We are already familiar with: regularized regression
Least squares

Add some constraints: ridge, lasso
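
The formulas for this slide were not recovered. As a sketch, the three problems are usually written as follows, where the data matrix $X$, targets $y$, weights $w$, and bound $t$ are notation assumed here rather than taken from the slide:

```latex
\begin{aligned}
\text{Least squares:}\quad & \min_{w} \; \|Xw - y\|_2^2 \\
\text{Ridge:}\quad         & \min_{w} \; \|Xw - y\|_2^2 \quad \text{subject to} \quad \|w\|_2^2 \le t \\
\text{Lasso:}\quad         & \min_{w} \; \|Xw - y\|_2^2 \quad \text{subject to} \quad \|w\|_1 \le t
\end{aligned}
```

All three have a convex objective and a convex constraint set, so they fit the canonical form.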



Why convex optimization?
Can't solve most optimization problems (OPs)
E.g. NP-hard; even high-order polynomial time is too slow

Convex OPs
(Generally) No analytic solution
Efficient algorithms to find the (global) solution
Interior point methods (basically iterated Newton) can be used:
~[10-100] * max{p^3, p^2 m, F} operations; F is the cost of evaluating the objective and constraint functions f
At worst, solve with general interior point (IP) methods (e.g. CVX); specialized solvers are faster



What is Convex Optimization?
OP with convex objective and constraint
functions

$f_0, \ldots, f_m$ are convex => convex OP, which has
an efficient solution!



Convex Function
Definition: the weighted mean of the function
evaluated at any two points is greater than
or equal to the function evaluated at the
weighted mean of the two points (weights nonnegative, summing to 1)
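
In symbols, the definition says $f(\theta x + (1-\theta) y) \le \theta f(x) + (1-\theta) f(y)$ for all $x, y$ and $\theta \in [0, 1]$. A minimal sketch of a numerical sanity check follows. The helper name `is_convex_on_samples` and the sample grid are hypothetical choices for illustration. Passing the check on samples does not prove convexity, while a single violation does disprove it.

```python
import math

def is_convex_on_samples(f, points, num_weights=11, tol=1e-12):
    """Return True if no sampled pair (x, y) violates the convexity inequality."""
    thetas = [k / (num_weights - 1) for k in range(num_weights)]
    for x in points:
        for y in points:
            for theta in thetas:
                mean_point = theta * x + (1 - theta) * y          # weighted mean of the points
                weighted_mean = theta * f(x) + (1 - theta) * f(y)  # weighted mean of the values
                if f(mean_point) > weighted_mean + tol:
                    return False
    return True

grid = [i / 4 for i in range(-20, 21)]  # sample points in [-5, 5]
square_ok = is_convex_on_samples(lambda x: x * x, grid)
abs_ok = is_convex_on_samples(abs, grid)
sine_ok = is_convex_on_samples(math.sin, grid)
print(square_ok, abs_ok, sine_ok)
```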



Convex Function
What does the definition mean?
Pick any two points x, y and evaluate the function there:
f(x), f(y)

Draw the line segment (chord) joining the two points (x, f(x)) and
(y, f(y))

Convex if the function evaluated at any point
between x and y lies on or below that chord



Convex Function



Convex Function

Convex!



Convex Function

Not Convex!!!

Convex Function
Easy to see why convexity allows for an
efficient solution
Just slide down the objective function as
far as possible and you will reach a (global) minimum
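
The "slide down" idea can be sketched as plain gradient descent. The objective $f(x) = (x - 3)^2 + 1$, the step size, and the starting points below are assumptions chosen for illustration. The point is only that every start reaches the same minimizer when the function is convex.

```python
def gradient_descent(grad, x0, step=0.1, iters=500):
    """Repeatedly step against the gradient, i.e. slide downhill."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Gradient of the assumed convex objective f(x) = (x - 3)^2 + 1
def grad_f(x):
    return 2.0 * (x - 3.0)

starts = [-50.0, 0.0, 2.5, 40.0]
minimizers = [gradient_descent(grad_f, x0) for x0 in starts]
print(minimizers)
```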



Local Optimum is Global (simple proof)
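
The proof on the slide was not recovered. The standard argument, which is presumably what was shown, goes by contradiction. Suppose $x$ is locally optimal but some feasible $y$ has $f_0(y) < f_0(x)$. For $\theta \in (0, 1]$ let $z = \theta y + (1 - \theta) x$, which is feasible because the feasible set of a convex problem is convex. Convexity of $f_0$ then gives:

```latex
f_0(z) \;\le\; \theta f_0(y) + (1 - \theta) f_0(x) \;<\; f_0(x)
```

As $\theta \to 0$, $z$ gets arbitrarily close to $x$ while having a strictly smaller objective value, which contradicts local optimality of $x$.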



Convex vs. Non-convex Ex.

Affine: border case of convexity

Convex: minimum easy to find



Convex vs. Non-convex Ex.

Non-convex: easy to get stuck in a local minimum

Can't rely on only local search techniques



Non-convex
Some non-convex problems highly multi-modal,
or NP hard
Could be forced to search all solutions, or hope
stochastic search is successful
Cannot guarantee best solution, inefficient
Harder to make performance guarantees with
approximate solutions



Determine/Prove Convexity
Can prove directly from the definition (show the inequality holds)
If the function restricted to any line is convex, the function is convex

If twice differentiable, show the Hessian is positive semidefinite ($\nabla^2 f(x) \succeq 0$) for all x


Often easier to:
Convert to a known convex OP
E.g. QP, LP, SOCP, SDP, often of a more general form
Combine known convex functions (building blocks) using
operations that preserve convexity
Similar idea to building kernels
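
As a small illustration of the Hessian test, the sketch below checks positive semidefiniteness of a symmetric 2x2 matrix. The helper `is_psd_2x2` and the two quadratics are hypothetical examples, not taken from the slides. For quadratics the Hessian is constant, so one check settles convexity.

```python
def is_psd_2x2(a, b, c, tol=1e-12):
    """PSD test for the symmetric matrix [[a, b], [b, c]].

    A symmetric 2x2 matrix is positive semidefinite exactly when both
    diagonal entries and the determinant are nonnegative.
    """
    return a >= -tol and c >= -tol and a * c - b * b >= -tol

# f(x1, x2) = x1^2 + x1*x2 + x2^2 has constant Hessian [[2, 1], [1, 2]]
convex_quadratic = is_psd_2x2(2.0, 1.0, 2.0)
# g(x1, x2) = x1^2 + 3*x1*x2 + x2^2 has constant Hessian [[2, 3], [3, 2]]
saddle_quadratic = is_psd_2x2(2.0, 3.0, 2.0)
print(convex_quadratic, saddle_quadratic)
```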



Some common convex OPs
Of particular interest for this book and
chapter:
linear programming (LP) and quadratic programming (QP)

LP: affine objective function, affine constraints

-e.g. LP SVM, portfolio management
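
The LP formula was lost in extraction. In the notation of the referenced text (the symbols $c$, $d$, $G$, $h$, $A$, $b$ are assumed here), a general LP is:

```latex
\begin{aligned}
\text{minimize}\quad   & c^T x + d \\
\text{subject to}\quad & G x \preceq h \\
                       & A x = b
\end{aligned}
```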



LP Visualization
Note: the constraints form the feasible set; for an LP it is a polyhedron



Quadratic Program
QP: Quadratic objective, affine constraints

LP is a special case

Many SVM problems result in a QP; so does regression

If the constraint functions are also quadratic, we have a Quadratically
Constrained Quadratic Program (QCQP)
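
The QP formula was not recovered. In the notation of the referenced text (symbols assumed here), a convex QP is shown below. A QCQP replaces the affine inequalities by convex quadratic ones of the form $\tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$ with $P_i \succeq 0$.

```latex
\begin{aligned}
\text{minimize}\quad   & \tfrac{1}{2} x^T P x + q^T x + r, \qquad P \succeq 0 \\
\text{subject to}\quad & G x \preceq h \\
                       & A x = b
\end{aligned}
```

Setting $P = 0$ recovers the LP, which is why the LP is a special case.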



QP Visualization



Second Order Cone Program

$A_i = 0$ - results in LP
$c_i = 0$ - results in QCQP
Constraint requires the affine functions
to lie in the second-order cone
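
The special cases above refer to the SOCP formula that was lost from the slide. Assuming the standard form of the referenced text (the symbols $f$, $F$, $g$ are assumed here), it reads:

```latex
\begin{aligned}
\text{minimize}\quad   & f^T x \\
\text{subject to}\quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \quad i = 1, \ldots, m \\
                       & F x = g
\end{aligned}
```

Each constraint says that the pair $(A_i x + b_i,\; c_i^T x + d_i)$ lies in the second-order cone $\{(u, s) : \|u\|_2 \le s\}$.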

Second Order Cone (Boundary) in R^3



Semidefinite Programming

Linear matrix inequality (LMI) constraints

Many problems can be expressed using
LMIs, e.g. LP and SOCP
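
The SDP formula was not recovered. A sketch of the standard form from the referenced text, with symmetric matrices $F_1, \ldots, F_p, G$ as assumed notation:

```latex
\begin{aligned}
\text{minimize}\quad   & c^T x \\
\text{subject to}\quad & x_1 F_1 + x_2 F_2 + \cdots + x_p F_p + G \preceq 0 \\
                       & A x = b
\end{aligned}
```

Here $\preceq 0$ means the matrix on the left is negative semidefinite, which is the linear matrix inequality.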

Semidefinite Programming



Building Convex Functions
From simple convex functions to complex:
some operations that preserve convexity
Nonnegative weighted sum
Composition with affine function
Pointwise maximum and supremum
Composition (under monotonicity conditions)
Partial minimization (over a convex set)
Perspective ( g(x,t) = t f(x/t), t > 0 )
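
A rough numerical illustration of three of these operations follows. The building blocks `f1` and `f2`, the helper `midpoint_convex`, and the grid are hypothetical choices. The check samples the midpoint inequality only, so it can refute convexity but not prove it. A difference of convex functions is included as a contrast, since subtraction is not on the list.

```python
def midpoint_convex(f, points, tol=1e-12):
    """Check f((x+y)/2) <= (f(x)+f(y))/2 on all sampled pairs."""
    return all(
        f((x + y) / 2) <= (f(x) + f(y)) / 2 + tol
        for x in points
        for y in points
    )

# Two simple convex building blocks of one variable
def f1(x):
    return x * x

def f2(x):
    return abs(x - 1.0)

def weighted_sum(x):       # nonnegative weighted sum
    return 2.0 * f1(x) + 0.5 * f2(x)

def affine_composed(x):    # composition with the affine map 3x - 2
    return f1(3.0 * x - 2.0)

def pointwise_max(x):      # pointwise maximum
    return max(f1(x), f2(x))

def difference(x):         # NOT a convexity-preserving operation
    return f2(x) - f1(x)

grid = [i / 4 for i in range(-20, 21)]  # sample points in [-5, 5]
results = {
    "weighted_sum": midpoint_convex(weighted_sum, grid),
    "affine_composed": midpoint_convex(affine_composed, grid),
    "pointwise_max": midpoint_convex(pointwise_max, grid),
    "difference": midpoint_convex(difference, grid),
}
print(results)
```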



Verifying Convexity Remarks
For more detail and expansion, consult the
referenced text, Convex Optimization
Geometric programs are also convex (after a change of variables) and can be
handled with a series of SDPs (details skipped
here)
CVX converts the problem either to an SOCP or
an SDP (or a series of these) and uses an efficient solver



Lagrangian
Standard form:

Lagrangian L:

$\lambda$, $\nu$: Lagrange multipliers (dual variables)
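
The standard form and the Lagrangian were images on the slide and were not recovered. Assuming the definitions of the referenced text, and writing $q$ for the number of equality constraints (an assumption, since $p$ already denotes the number of variables here), they are:

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 0, \quad i = 1, \ldots, m \\
                       & h_i(x) = 0, \quad i = 1, \ldots, q
\end{aligned}
\qquad
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x)
```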



Lagrange Dual Function

Lagrange dual function g found by minimizing L
with respect to the primal variables
Often can take the gradient of L w.r.t. the primal variables and set it to 0
(e.g. SVM)
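
Written out, with the Lagrangian $L$ as defined in the referenced text and $q$ equality constraints (notation assumed here), the dual function is:

```latex
g(\lambda, \nu) \;=\; \inf_{x} L(x, \lambda, \nu)
\;=\; \inf_{x} \Big( f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x) \Big)
```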



Lagrange Dual Function
Note: the Lagrange dual function is the pointwise
infimum of a family of affine functions
of (lambda, nu)
Thus, g is concave even if the problem is not
convex



Lagrange Dual Function
Lagrange Dual provides lower bound on
objective value at solution
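
The derivation on the slide was not recovered. A sketch of the usual argument: for any $\lambda \succeq 0$, any $\nu$, and any feasible point $\tilde{x}$ (so $f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$),

```latex
g(\lambda, \nu) \;\le\; L(\tilde{x}, \lambda, \nu)
\;=\; f_0(\tilde{x}) + \sum_{i} \lambda_i f_i(\tilde{x}) + \sum_{i} \nu_i h_i(\tilde{x})
\;\le\; f_0(\tilde{x})
```

Taking the infimum over all feasible $\tilde{x}$ gives $g(\lambda, \nu) \le p^*$, where $p^*$ is the optimal value of the primal problem.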



Lagrangian as Linear Approximation, Lower Bound

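Simple interpretation of the Lagrangian
Can incorporate the constraints into the objective as
indicator functions
Infinity if violated, 0 otherwise: $I_-(u) = 0$ if $u \le 0$, $I_-(u) = \infty$ if $u > 0$

In the Lagrangian we use a soft linear approximation to the
indicator functions; it is an under-estimator since $\lambda_i u \le I_-(u)$ for all $u$ when $\lambda_i \ge 0$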



Lagrange Dual Problem
Why not make the lower bound the best possible?
Dual problem: maximize $g(\lambda, \nu)$ subject to $\lambda \succeq 0$

Always a convex optimization problem (even when the primal
is non-convex)
Weak duality: $d^* \le p^*$ (have already seen
this)
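
A small worked example may help. The toy problem below is an assumption for illustration, not from the slides: minimize $x^2$ subject to $1 - x \le 0$, so $p^* = 1$ at $x^* = 1$. The Lagrangian $x^2 + \lambda(1 - x)$ is minimized at $x = \lambda/2$, giving $g(\lambda) = \lambda - \lambda^2/4$. The sketch evaluates $g$ on a grid, checks the lower bound, and finds the best bound.

```python
def dual_function(lam):
    """g(lam) = min over x of x^2 + lam*(1 - x); the minimum is at x = lam/2."""
    x = lam / 2.0
    return x * x + lam * (1.0 - x)

p_star = 1.0  # primal optimum of the toy problem: x* = 1, f0(x*) = 1

lams = [k / 100 for k in range(0, 501)]           # grid of multipliers in [0, 5]
dual_values = [dual_function(lam) for lam in lams]

weak_duality_holds = all(g <= p_star + 1e-12 for g in dual_values)
d_star = max(dual_values)                          # best lower bound on the grid
best_lam = lams[dual_values.index(d_star)]
print(weak_duality_holds, d_star, best_lam)
```

In this convex example the best lower bound coincides with $p^*$, attained at $\lambda = 2$.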

Strong Duality
If d* = p*, strong duality holds
Does not hold in general
Slater's Theorem: If the problem is convex and a
strictly feasible point exists, then strong
duality holds! (proof too involved, refer to text)
=> For such convex problems, can use the dual problem to find the solution



Complementary Slackness
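When strong duality holds, with x* primal optimal and (lambda*, nu*) dual optimal:

$f_0(x^*) = g(\lambda^*, \nu^*)$
$= \inf_x \left( f_0(x) + \sum_i \lambda_i^* f_i(x) + \sum_i \nu_i^* h_i(x) \right)$ (definition)
$\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*)$
$\le f_0(x^*)$ (since constraints
satisfied at x*)
Sandwiched between f0(x*), so the last 2 inequalities are
equalities, simple!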



Complementary Slackness

Which means: $\sum_i \lambda_i^* f_i(x^*) = 0$
Since each term is non-positive, we have
complementary slackness: $\lambda_i^* f_i(x^*) = 0$, $i = 1, \ldots, m$

Whenever a constraint is non-active,
the corresponding multiplier is zero



Complementary Slackness

This can also be described by
$\lambda_i^* > 0 \Rightarrow f_i(x^*) = 0$, and $f_i(x^*) < 0 \Rightarrow \lambda_i^* = 0$

Since usually only a few constraints are active at the
solution (see geometry), the dual variable
lambda is often sparse
Note: In general no guarantee



Complementary Slackness

As we will see, this is why support vector
machines result in a solution with only key
support vectors
These come from the dual problem: constraints correspond to points, and
complementary slackness ensures only the active points are kept



Complementary Slackness

However, avoid common misconceptions when
it comes to SVM and complementary slackness!
E.g. if a Lagrange multiplier is 0, the constraint could
still be active! (not a bijection!)
This means:



KKT Conditions
The KKT conditions are then just what we
call that set of conditions required at the
solution (basically list what we know)
KKT conditions play important role
Can sometimes be used to find solution analytically
Otherwise can think of many methods as ways of solving KKT
conditions



KKT Conditions
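Again given strong duality and assuming the functions are
differentiable: since x* minimizes $L(x, \lambda^*, \nu^*)$ over x, its
gradient must be 0 at x*:
$\nabla f_0(x^*) + \sum_i \lambda_i^* \nabla f_i(x^*) + \sum_i \nu_i^* \nabla h_i(x^*) = 0$

Thus, putting it all together, for non-convex
problems we have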



KKT Conditions (non-convex)

Necessary conditions
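
The list of conditions was an image and was not recovered. Assuming the numbering of the referenced text, which matches the references to conditions 1 through 5 on the next slide, and writing $q$ for the number of equality constraints (an assumption), the KKT conditions are:

```latex
\begin{aligned}
&1.\quad f_i(x^*) \le 0, \quad i = 1, \ldots, m \\
&2.\quad h_i(x^*) = 0, \quad i = 1, \ldots, q \\
&3.\quad \lambda_i^* \ge 0, \quad i = 1, \ldots, m \\
&4.\quad \lambda_i^* f_i(x^*) = 0, \quad i = 1, \ldots, m \\
&5.\quad \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^{q} \nu_i^* \nabla h_i(x^*) = 0
\end{aligned}
```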



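KKT Conditions (convex)

Also sufficient
conditions:
1+2 (primal feasibility) -> $\tilde{x}$ is feasible.
3 ($\tilde{\lambda} \succeq 0$) -> $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in x
5 (zero gradient) -> $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$
so $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu})$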
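
To make the conditions concrete, here is a minimal sketch that evaluates KKT residuals for a toy convex problem. The problem is an assumption for illustration: minimize $x_1^2 + x_2^2$ subject to $1 - x_1 - x_2 \le 0$ and $x_1 - 5 \le 0$. The candidate $x = (0.5, 0.5)$ with $\lambda = (1, 0)$ satisfies all conditions, and the inactive second constraint gets a zero multiplier, as complementary slackness requires.

```python
def kkt_residuals(x, lam):
    """Return how far (x, lam) is from satisfying each KKT condition of the toy problem."""
    x1, x2 = x
    f = [1.0 - x1 - x2, x1 - 5.0]           # inequality constraints, f_i(x) <= 0
    grad_f0 = [2.0 * x1, 2.0 * x2]          # gradient of the objective x1^2 + x2^2
    grad_f = [[-1.0, -1.0], [1.0, 0.0]]     # gradients of the two constraints
    stationarity = [
        grad_f0[j] + sum(lam[i] * grad_f[i][j] for i in range(2))
        for j in range(2)
    ]
    return {
        "primal": max(0.0, max(f)),                        # violation of f_i(x) <= 0
        "dual": max(0.0, max(-li for li in lam)),          # violation of lam_i >= 0
        "slackness": max(abs(lam[i] * f[i]) for i in range(2)),
        "stationarity": max(abs(s) for s in stationarity),
    }

candidate = kkt_residuals(x=(0.5, 0.5), lam=(1.0, 0.0))
wrong_point = kkt_residuals(x=(1.0, 1.0), lam=(1.0, 0.0))
print(candidate)
print(wrong_point)
```

Because the toy problem is convex, zero residuals at the candidate are enough to conclude it is optimal.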



Brief description of interior point method

Solve a series of equality constrained problems
with Newton's method
Approximate the inequality constraints with a log-barrier
(approx. of indicator)
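
The barrier problem was an image and was not recovered. Assuming the formulation of the referenced text, with affine equality constraints $Ax = b$ and a parameter $t > 0$ (notation assumed here), each subproblem is:

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) - \frac{1}{t} \sum_{i=1}^{m} \log\big(-f_i(x)\big) \\
\text{subject to}\quad & A x = b
\end{aligned}
```

The term $-(1/t)\log(-u)$ is a smooth approximation of the indicator $I_-(u)$ and is finite only for strictly feasible points.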



Brief description of interior point method

As t gets larger, approximation becomes better
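
The effect of increasing t can be sketched on a one-dimensional toy problem. The problem and all parameter values are assumptions for illustration: minimize $x$ subject to $1 - x \le 0$, whose solution is $x^* = 1$. For each $t$ the sketch minimizes $t x - \log(x - 1)$ with Newton's method, using step halving only to stay strictly feasible. The exact minimizer for a given $t$ is $1 + 1/t$, which approaches $x^*$ as $t$ grows.

```python
def centering_step(x, t, tol=1e-10, max_iter=100):
    """Minimize t*x - log(x - 1) over x > 1 by Newton's method, starting from x."""
    for _ in range(max_iter):
        grad = t - 1.0 / (x - 1.0)
        hess = 1.0 / (x - 1.0) ** 2
        if grad * grad / hess < tol:      # squared Newton decrement is small: done
            break
        step = -grad / hess
        s = 1.0
        while x + s * step <= 1.0:        # halve the step until strictly feasible
            s *= 0.5
        x += s * step
    return x

def barrier_method(x0=3.0, t0=1.0, mu=10.0, eps=1e-4):
    """Follow the central points for increasing t until the gap bound m/t is below eps."""
    m = 1                                  # number of inequality constraints
    x, t = x0, t0
    path = []
    while True:
        x = centering_step(x, t)           # warm start from the previous point
        path.append((t, x))
        if m / t < eps:
            return x, path
        t *= mu

x_final, path = barrier_method()
for t, x in path:
    print(t, x)
```

Each printed pair shows the minimizer of the barrier problem for one value of t, moving toward the constrained optimum.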



Central Path Idea



CVX: Convex Optimization Made Easy
CVX is a MATLAB toolbox
Allows you to flexibly express convex optimization problems
Translates these to a general form and uses efficient solver
(SOCP, SDP, or a series of these)
http://www.stanford.edu/~boyd/cvx/

All you have to do is design the convex
optimization problem
Plug it into CVX, and a first version of the algorithm is implemented
A more specialized solver may be necessary for some applications



CVX - Examples
Quadratic program: given H, f, A, and b
cvx_begin
    variable x(n)                  % n-dimensional optimization variable
    minimize( x'*H*x + f'*x )      % H must be positive semidefinite
    subject to
        A*x >= b
cvx_end



CVX - Examples
SVM-type formulation with L1 norm
cvx_begin
    variable w(p)
    variable b(1)
    variable e(n)
    expression by(n)
    by = train_label.*b;
    minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
    subject to
        X*w + by >= a - e;
        e >= ec;
cvx_end



CVX - Examples
More complicated terms built with expressions
cvx_begin
    variable w(p+1+n);
    expression q(ec);
    ct = 1;   % index into q; assumed to start at 1 (initialization not shown on the slide)
    for i = 1:p
        for j = i:p
            if (A(i,j) == 1)
                q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                ct = ct + 1;
            end
        end
    end
    minimize( f'*w + lambda*sum(q) )
    subject to
        X*w >= a;
cvx_end



Questions
Questions, Comments?



Extra proof

