Convex Optimization: Part 1 of Chapter 7
Discussion
Presenter: Brian Quanz
A KTEC Center of Excellence

About today's discussion
Chapter 7 has no separate discussion of convex optimization
It is discussed together with the SVM problems
Instead:
Today: discuss convex optimization
Next week: discuss some specific convex optimization problems (from the text), e.g. SVMs

About today's discussion
Mostly follows an alternate text:
Convex Optimization, Stephen Boyd and Lieven Vandenberghe
Material borrowed from the book and related course notes,
including some of the figures and equations shown here
Available online: http://www.stanford.edu/~boyd/cvxbook/
Nice course lecture videos available from Stephen Boyd online:
http://www.stanford.edu/class/ee364a/
Corresponding convex optimization tool (discuss later) - CVX:
http://www.stanford.edu/~boyd/cvx/

Overview
Why convex? What is convex?
Key examples: linear and quadratic programming
Key mathematical ideas to discuss:
->Lagrange duality
->KKT conditions
Brief concept of interior point methods
CVX: convex optimization made easy

Mathematical Optimization
All learning is some optimization problem
-> Stick to canonical form
$x = (x_1, x_2, \dots, x_p)$: optimization variables; $x^*$: a solution
$f_0 : \mathbb{R}^p \to \mathbb{R}$: objective function
$f_i : \mathbb{R}^p \to \mathbb{R}$: constraint functions

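Written out, the canonical form is presumably the one used in the referenced Boyd and Vandenberghe text. This is an assumption: the constraint count $m$ and the bounds $b_i$ are taken from that text, not from the slide.

```latex
\begin{aligned}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le b_i, \quad i = 1, \dots, m
\end{aligned}
```

A point $x^*$ is a solution if it is feasible and no other feasible point has a smaller objective value.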
Optimization Example
Already familiar with: regularized regression
Least squares
Add constraints or penalties: ridge, lasso
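As a sketch of how these examples look as optimization problems, assume a data matrix $X$, targets $y$, weights $w$, and a regularization weight $\lambda \ge 0$ (illustrative symbols, not taken from the slides):

```latex
\begin{aligned}
\text{least squares:} \quad & \min_w \; \|Xw - y\|_2^2 \\
\text{ridge:}         \quad & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_2^2 \\
\text{lasso:}         \quad & \min_w \; \|Xw - y\|_2^2 + \lambda \|w\|_1
\end{aligned}
```

All three are convex. Each penalized problem also has an equivalent constrained form, e.g. minimizing $\|Xw - y\|_2^2$ subject to $\|w\|_1 \le s$ for a suitable $s$.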

Why convex optimization?
Can't solve most optimization problems (OPs)
E.g. NP-hard; even high-order polynomial time is too slow
Convex OPs
(Generally) no analytic solution
Efficient algorithms to find the (global) solution
Interior point methods (basically iterated Newton) can be used:
Cost is roughly $[10\text{-}100] \cdot \max\{p^3, p^2 m, F\}$, where $F$ is the cost of evaluating the objective and constraint functions
At worst, solve with general interior point (IP) methods (CVX); specialized methods are faster

What is Convex Optimization?
OP with convex objective and constraint functions
$f_0, \dots, f_m$ are convex => convex OP that has an efficient solution!

Convex Function
Definition: the weighted mean of the function evaluated at any two points is greater than or equal to the function evaluated at the weighted mean of the two points
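In symbols, with $\theta$ as the weight (a sketch of the usual statement; $\theta$ is notation introduced here):

```latex
f(\theta x + (1 - \theta) y) \;\le\; \theta f(x) + (1 - \theta) f(y)
\qquad \text{for all } x, y \in \operatorname{dom} f, \quad 0 \le \theta \le 1
```

The domain of $f$ must itself be a convex set so that the left-hand side is always defined.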

Convex Function
What does the definition mean?
Pick any two points x, y and evaluate the function there: f(x), f(y)
Draw the line passing through the two points (x, f(x)) and (y, f(y))
Convex if the function evaluated at any point between x and y lies on or below that line
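The chord picture above can be tested numerically. The following is a minimal Python sketch (Python is used only for illustration; the helper name `is_convex_on_samples` and the sample grid are hypothetical). Passing the check on a finite grid does not prove convexity, but a single violation does disprove it.

```python
import math

def is_convex_on_samples(f, points, thetas, tol=1e-9):
    """Check f(theta*x + (1-theta)*y) <= theta*f(x) + (1-theta)*f(y) on a grid."""
    for x in points:
        for y in points:
            for theta in thetas:
                z = theta * x + (1 - theta) * y              # point between x and y
                chord = theta * f(x) + (1 - theta) * f(y)    # height of the line at z
                if f(z) > chord + tol:                       # function above the chord
                    return False
    return True

points = [i / 10 for i in range(-30, 31)]   # sample points in [-3, 3]
thetas = [k / 10 for k in range(11)]        # weights in [0, 1]

print(is_convex_on_samples(lambda x: x * x, points, thetas))  # x^2 passes the check
print(is_convex_on_samples(math.sin, points, thetas))         # sin fails on [-3, 3]
```
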

Convex Function

Convex Function
Convex!

Convex Function
Not Convex!!!

Convex Function
Easy to see why convexity allows for an efficient solution
Just slide down the objective function as far as possible and you will reach a minimum

Local Optimum is Global (simple proof)
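One standard version of the argument, sketched for a convex objective $f_0$ over a convex feasible set. This is an assumed reconstruction, not necessarily the proof shown on the slide.

```latex
\begin{aligned}
&\text{Suppose } x \text{ is locally optimal, but some feasible } y \text{ has } f_0(y) < f_0(x). \\
&\text{For } \theta \in (0, 1], \text{ the point } z = \theta y + (1 - \theta) x \text{ is feasible, and by convexity} \\
&\qquad f_0(z) \le \theta f_0(y) + (1 - \theta) f_0(x) < f_0(x). \\
&\text{As } \theta \to 0,\ z \to x, \text{ so every neighborhood of } x \text{ contains a better feasible point,} \\
&\text{contradicting local optimality. Hence } x \text{ is globally optimal.}
\end{aligned}
```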

Convex vs. Non-convex Ex.
Affine: border case of convexity
Convex, min. easy to find

Convex vs. Non-convex Ex.
Non-convex, easy to get stuck in a local min.
Can't rely on local search techniques alone
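The contrast can be seen with plain gradient descent. The following is a minimal Python sketch; the two test functions, the step size, and the starting points are illustrative choices made here, not taken from the slides.

```python
def gradient_descent(grad, x0, step=0.01, iters=5000):
    """Fixed-step gradient descent on a one-dimensional function."""
    x = x0
    for _ in range(iters):
        x -= step * grad(x)
    return x

# Convex: f(x) = (x - 1)^2 has a single minimum at x = 1.
convex_grad = lambda x: 2 * (x - 1)

# Non-convex: f(x) = x^4 - 3x^2 + x has two separate local minima.
nonconvex_f = lambda x: x**4 - 3 * x**2 + x
nonconvex_grad = lambda x: 4 * x**3 - 6 * x + 1

# On the convex function both starting points reach the same minimizer.
print(gradient_descent(convex_grad, -2.0), gradient_descent(convex_grad, 2.0))

# On the non-convex function the result depends on where the search starts.
left = gradient_descent(nonconvex_grad, -2.0)
right = gradient_descent(nonconvex_grad, 2.0)
print(left, nonconvex_f(left))
print(right, nonconvex_f(right))
```

The run started on the right settles in the shallower of the two minima, which is exactly the failure mode of purely local search.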

Non-convex
Some non-convex problems are highly multi-modal, or NP-hard
Could be forced to search all solutions, or hope stochastic search is successful
Cannot guarantee the best solution; inefficient
Harder to make performance guarantees with approximate solutions

Determine/Prove Convexity
Can prove directly that the definition holds
If the function restricted to any line is convex, the function is convex
If twice differentiable, show the Hessian is positive semidefinite ($\nabla^2 f(x) \succeq 0$)

Often easier to:
Convert to a known convex OP
E.g. QP, LP, SOCP, SDP, often of a more general form
Combine known convex functions (building blocks) using operations that preserve convexity
Similar idea to building kernels
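The direct checks listed above can be written as follows, together with the related first-order condition (a sketch; the direction $v$ and scalar $t$ are notation introduced here):

```latex
\begin{aligned}
\text{restriction to a line:} \quad
  & f \text{ convex} \iff g(t) = f(x + t v) \text{ is convex in } t \text{ for all } x \in \operatorname{dom} f \text{ and all } v \\
\text{first-order (differentiable } f\text{):} \quad
  & f(y) \ge f(x) + \nabla f(x)^T (y - x) \quad \text{for all } x, y \in \operatorname{dom} f \\
\text{second-order (twice differentiable } f\text{):} \quad
  & \nabla^2 f(x) \succeq 0 \quad \text{for all } x \in \operatorname{dom} f
\end{aligned}
```

For example, $f(x) = x^T P x$ with symmetric $P$ has Hessian $2P$, so it is convex exactly when $P \succeq 0$.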

Some common convex OPs
Of particular interest for this book and chapter:
linear programming (LP) and quadratic programming (QP)
LP: affine objective function, affine constraints
e.g. LP SVM, portfolio management
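A general LP, written in the notation of the referenced text (the symbols $c$, $d$, $G$, $h$, $A$, $b$ are assumed here, not taken from the slide; $\preceq$ is componentwise):

```latex
\begin{aligned}
\text{minimize}   \quad & c^T x + d \\
\text{subject to} \quad & G x \preceq h \\
                        & A x = b
\end{aligned}
```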

LP Visualization
Note: constraints form the feasible set; for an LP, a polyhedron

Quadratic Program
QP: quadratic objective, affine constraints
LP is a special case
Many SVM problems result in a QP, as does regression
If the constraint functions are also quadratic, we get a Quadratically Constrained Quadratic Program (QCQP)
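A convex QP in the notation of the referenced text (a sketch; $P$, $q$, $r$, $G$, $h$, $A$, $b$ are assumed symbols):

```latex
\begin{aligned}
\text{minimize}   \quad & \tfrac{1}{2} x^T P x + q^T x + r \\
\text{subject to} \quad & G x \preceq h \\
                        & A x = b
\end{aligned}
\qquad \text{with } P \succeq 0
```

Setting $P = 0$ recovers an LP. In a QCQP each inequality takes the form $\tfrac{1}{2} x^T P_i x + q_i^T x + r_i \le 0$ with $P_i \succeq 0$.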

QP Visualization

Second Order Cone Program
$A_i = 0$: reduces to an LP
$c_i = 0$: reduces to a QCQP
Constraint requires the affine functions to lie in the second-order cone
Second-order cone (boundary) in $\mathbb{R}^3$
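The program these remarks refer to, in the form given in the referenced text ($A_i$ and $c_i$ appear on the slide; $f$, $b_i$, $d_i$, $F$, $g$ are assumed from that text):

```latex
\begin{aligned}
\text{minimize}   \quad & f^T x \\
\text{subject to} \quad & \|A_i x + b_i\|_2 \le c_i^T x + d_i, \quad i = 1, \dots, m \\
                        & F x = g
\end{aligned}
```

Each constraint says that the pair $(A_i x + b_i,\; c_i^T x + d_i)$ lies in the second-order cone $\{(u, s) : \|u\|_2 \le s\}$.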

Semidefinite Programming
Linear matrix inequality (LMI) constraints
Many problems can be expressed using LMIs
E.g. LP and SOCP

Semidefinite Programming
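An SDP in the inequality form of the referenced text (a sketch; $c$, the symmetric matrices $F_1, \dots, F_p$, $G$, and $A$, $b$ are assumed symbols, and $\preceq 0$ here means negative semidefinite):

```latex
\begin{aligned}
\text{minimize}   \quad & c^T x \\
\text{subject to} \quad & x_1 F_1 + x_2 F_2 + \dots + x_p F_p + G \preceq 0 \\
                        & A x = b
\end{aligned}
```

As one instance of expressing other problems with LMIs, componentwise linear inequalities $C x \preceq e$ are equivalent to the LMI $\operatorname{diag}(C x - e) \preceq 0$.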

Building Convex Functions
From simple convex functions to complex:
some operations that preserve convexity
Nonnegative weighted sum
Composition with affine function
Pointwise maximum and supremum
Composition (under suitable monotonicity conditions)
Minimization (over some of the variables)
Perspective ( g(x,t) = t f(x/t), t > 0 )

Verifying Convexity Remarks
For more detail and expansion, consult the referenced text, Convex Optimization
Geometric programs are also convex (after a change of variables) and can be handled with a series of SDPs (details skipped here)
CVX converts the problem to either an SOCP or an SDP (or a series of these) and uses an efficient solver

Lagrangian
Standard form:
Lagrangian L:
$\lambda$, $\nu$: Lagrange multipliers (dual variables)
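Written out in the notation of the referenced text (a sketch; $q$, the number of equality constraints, is a symbol chosen here to avoid clashing with the dimension $p$):

```latex
\begin{aligned}
\text{minimize}   \quad & f_0(x) \\
\text{subject to} \quad & f_i(x) \le 0, \quad i = 1, \dots, m \\
                        & h_i(x) = 0,  \quad i = 1, \dots, q
\end{aligned}
\qquad
L(x, \lambda, \nu) = f_0(x) + \sum_{i=1}^{m} \lambda_i f_i(x) + \sum_{i=1}^{q} \nu_i h_i(x)
```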

Lagrange Dual Function
Lagrange dual function g is found by minimizing L with respect to the primal variables
Often can take the gradient of L w.r.t. the primal variables and set it to 0 (SVM)

Note: the Lagrange dual function is the pointwise infimum of a family of affine functions of (lambda, nu)
Thus, g is concave even if the problem is not convex

The Lagrange dual function provides a lower bound on the objective value at the solution (for lambda >= 0)
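The lower-bound property follows in one line. A sketch, assuming inequality constraints $f_i(x) \le 0$, equality constraints $h_i(x) = 0$, any feasible point $\tilde{x}$, and $\lambda \succeq 0$:

```latex
\begin{aligned}
g(\lambda, \nu) &= \inf_x L(x, \lambda, \nu) \\
                &\le L(\tilde{x}, \lambda, \nu)
                 = f_0(\tilde{x}) + \sum_i \lambda_i f_i(\tilde{x}) + \sum_i \nu_i h_i(\tilde{x})
                 \le f_0(\tilde{x})
\end{aligned}
```

The last step uses $\lambda_i f_i(\tilde{x}) \le 0$ and $h_i(\tilde{x}) = 0$. Taking the best feasible $\tilde{x}$ gives $g(\lambda, \nu) \le p^*$, the optimal objective value.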

Lagrangian as Linear Approximation, Lower Bound
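Simple interpretation of the Lagrangian
Can incorporate the constraints into the objective as indicator functions
Infinity if violated, 0 otherwise: $I_-(u) = 0$ for $u \le 0$ and $\infty$ otherwise; $I_0(u) = 0$ for $u = 0$ and $\infty$ otherwise
In the Lagrangian we use a soft linear approximation to the indicator functions; it is an under-estimator since $\lambda_i u \le I_-(u)$ (for $\lambda_i \ge 0$) and $\nu_i u \le I_0(u)$ for all $u$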

Lagrange Dual Problem
Why not make the lower bound the best possible?
Dual problem: maximize $g(\lambda, \nu)$ subject to $\lambda \succeq 0$
Always a convex optimization problem (even when the primal is non-convex)
Weak duality: $d^* \le p^*$, where $d^*$ and $p^*$ are the optimal dual and primal values (have already seen this)

Strong Duality
If $d^* = p^*$, strong duality holds
Does not hold in general
Slater's theorem: if the problem is convex and a strictly feasible point exists, then strong duality holds! (proof too involved, refer to text)
=> For convex problems, can use the dual problem to find the solution
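A tiny numerical illustration of weak and strong duality, as a minimal Python sketch. The problem is an example chosen here, not one from the slides: minimize $x^2$ subject to $1 - x \le 0$, whose optimal value is $p^* = 1$ at $x = 1$. It is convex and $x = 2$ is strictly feasible, so Slater's condition holds.

```python
def dual_function(lam):
    """g(lam) = inf over x of L(x, lam), where L(x, lam) = x^2 + lam * (1 - x)."""
    x = lam / 2.0                      # minimizer of L in x: set dL/dx = 2x - lam = 0
    return x * x + lam * (1.0 - x)

p_star = 1.0                           # primal optimum, attained at x = 1

lams = [k / 100 for k in range(0, 501)]        # grid over lam in [0, 5]
values = [dual_function(lam) for lam in lams]

d_star = max(values)                   # best lower bound found on the grid
best_lam = lams[values.index(d_star)]

print(all(v <= p_star + 1e-12 for v in values))   # weak duality on the whole grid
print(d_star, best_lam)                            # dual optimum and its multiplier
```

Every value of the dual function stays at or below $p^*$, and the best one matches it, which is what strong duality predicts for this problem.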

Complementary Slackness
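When strong duality holds, with $x^*$ primal optimal, $(\lambda^*, \nu^*)$ dual optimal, and constraints $f_i(x) \le 0$, $h_i(x) = 0$:
$f_0(x^*) = g(\lambda^*, \nu^*) = \inf_x L(x, \lambda^*, \nu^*)$ (definition)
$\le f_0(x^*) + \sum_i \lambda_i^* f_i(x^*) + \sum_i \nu_i^* h_i(x^*)$
$\le f_0(x^*)$ (since constraints satisfied at $x^*$)
Sandwiched between $f_0(x^*)$ on both sides, the last 2 inequalities are equalities, simple!

Which means: $\sum_i \lambda_i^* f_i(x^*) = 0$
Since each term is non-positive, we have complementary slackness: $\lambda_i^* f_i(x^*) = 0$ for every $i$
Whenever a constraint is non-active, the corresponding multiplier is zero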

This can also be described by: $\lambda_i^* > 0 \Rightarrow f_i(x^*) = 0$, and $f_i(x^*) < 0 \Rightarrow \lambda_i^* = 0$
Since usually only a few constraints are active at the solution (see geometry), the dual variable lambda is often sparse
Note: in general no guarantee

As we will see, this is why support vector machines result in a solution with only key support vectors
These come from the dual problem: constraints correspond to points, and complementary slackness ensures only the active points are kept

However, avoid common misconceptions when it comes to SVMs and complementary slackness!
E.g. if a Lagrange multiplier is 0, the constraint could still be active! (not a bijection!)
This means: $\lambda_i^* = 0$ does not imply $f_i(x^*) < 0$

KKT Conditions
The KKT conditions are then just the name for the set of conditions required at the solution (basically a list of what we know)
KKT conditions play an important role
Can sometimes be used to find the solution analytically
Otherwise, can think of many methods as ways of solving the KKT conditions

KKT Conditions
Again given strong duality, and assuming the functions are differentiable: since $x^*$ minimizes $L(x, \lambda^*, \nu^*)$, the gradient of $L$ must be 0 at $x^*$
Thus, putting it all together, for non-convex problems we have

KKT Conditions: non-convex
Necessary conditions
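The conditions themselves, numbered to match the references 1 to 5 on the following slide. This is a sketch in the notation of the referenced text; the ordering is an assumption inferred from those references, and $q$ is the number of equality constraints.

```latex
\begin{aligned}
&1.\ f_i(x^*) \le 0, \quad i = 1, \dots, m && \text{(primal feasibility)} \\
&2.\ h_i(x^*) = 0, \quad i = 1, \dots, q && \text{(primal feasibility)} \\
&3.\ \lambda_i^* \ge 0, \quad i = 1, \dots, m && \text{(dual feasibility)} \\
&4.\ \lambda_i^* f_i(x^*) = 0, \quad i = 1, \dots, m && \text{(complementary slackness)} \\
&5.\ \nabla f_0(x^*) + \sum_{i=1}^{m} \lambda_i^* \nabla f_i(x^*) + \sum_{i=1}^{q} \nu_i^* \nabla h_i(x^*) = 0 && \text{(stationarity)}
\end{aligned}
```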

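KKT Conditions: convex
Also sufficient conditions. If $\tilde{x}$, $\tilde{\lambda}$, $\tilde{\nu}$ satisfy conditions 1 to 5:
1+2 (primal feasibility) -> $\tilde{x}$ is feasible
3 (multipliers nonnegative) -> $L(x, \tilde{\lambda}, \tilde{\nu})$ is convex in $x$
5 (zero gradient of $L$) -> $\tilde{x}$ minimizes $L(x, \tilde{\lambda}, \tilde{\nu})$
so $g(\tilde{\lambda}, \tilde{\nu}) = L(\tilde{x}, \tilde{\lambda}, \tilde{\nu}) = f_0(\tilde{x})$ (using 2 and 4, complementary slackness), i.e. zero duality gap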
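A small worked example of solving the KKT conditions analytically. The problem is an illustration chosen here, not one from the slides: minimize $x^2$ subject to $1 - x \le 0$.

```latex
\begin{aligned}
&L(x, \lambda) = x^2 + \lambda (1 - x) \\
&\text{stationarity:} \quad 2x - \lambda = 0 \\
&\text{complementary slackness:} \quad \lambda (1 - x) = 0 \\
&\text{case } \lambda = 0: \quad x = 0, \text{ which violates } 1 - x \le 0 \\
&\text{case } 1 - x = 0: \quad x^* = 1, \quad \lambda^* = 2 \ge 0
\end{aligned}
```

The problem is convex, so this KKT point is the global solution, with optimal value $p^* = 1$.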

Brief description of interior point method
Solve a series of equality constrained problems with Newton's method
Approximate the constraints with a log-barrier (approximation of the indicator)
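In the form used in the referenced text (a sketch; the parameter $t > 0$ and the equality constraints $Ax = b$ are assumed from that text), each inequality $f_i(x) \le 0$ is replaced by a smooth barrier term:

```latex
\begin{aligned}
\text{minimize}   \quad & f_0(x) + \sum_{i=1}^{m} -\tfrac{1}{t} \log(-f_i(x)) \\
\text{subject to} \quad & A x = b
\end{aligned}
\qquad
-\tfrac{1}{t} \log(-u) \;\approx\; I_-(u)
```

Here $I_-(u)$ is the indicator that is 0 for $u \le 0$ and infinite otherwise. The barrier term is finite only when $f_i(x) < 0$, so iterates stay strictly inside the feasible set.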

Brief description of interior point method
As t gets larger, approximation becomes better
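A minimal Python sketch of the barrier idea on a one-dimensional toy problem chosen here for illustration: minimize $x$ subject to $1 \le x \le 3$, whose solution is $x^* = 1$. The starting point, the factor `mu = 10`, and the tolerances are assumptions, and the duality-gap bound $m/t$ used as the stopping rule is taken from the referenced text.

```python
import math

def phi(t, x):
    """Barrier objective: t*x - log(x - 1) - log(3 - x)."""
    return t * x - math.log(x - 1.0) - math.log(3.0 - x)

def center(t, x, max_iters=100):
    """Minimize phi(t, .) with damped Newton steps, starting from a strictly feasible x."""
    for _ in range(max_iters):
        grad = t - 1.0 / (x - 1.0) + 1.0 / (3.0 - x)
        hess = 1.0 / (x - 1.0) ** 2 + 1.0 / (3.0 - x) ** 2
        if grad * grad / hess < 1e-12:      # squared Newton decrement small: centered
            break
        step = -grad / hess
        s = 1.0
        while s > 1e-12:                    # backtracking line search
            x_new = x + s * step
            if 1.0 < x_new < 3.0 and phi(t, x_new) <= phi(t, x) + 0.25 * s * grad * step:
                x = x_new
                break
            s *= 0.5
        else:
            break                           # no acceptable step found; stop centering
    return x

def barrier_method(x0=2.0, t0=1.0, mu=10.0, m=2, eps=1e-6):
    """Follow the central path by increasing t until the gap bound m/t is below eps."""
    x, t = x0, t0
    path = []
    while True:
        x = center(t, x)                    # warm start from the previous center
        path.append((t, x))
        if m / t < eps:
            return x, path
        t *= mu

x_opt, path = barrier_method()
for t, x in path:
    print(t, x)                             # central points approach x* = 1 as t grows
```

Each pass of the outer loop is one centering problem solved with Newton's method, and the recorded `path` is a sampled version of the central path.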

Central Path Idea

CVX: Convex Optimization Made Easy
CVX is a MATLAB toolbox
Allows you to flexibly express convex optimization problems
Translates these to a general form and uses an efficient solver (SOCP, SDP, or a series of these)
http://www.stanford.edu/~boyd/cvx/
All you have to do is design the convex optimization problem
Plug it into CVX and a first version of the algorithm is implemented
A more specialized solver may be necessary for some applications

CVX - Examples
Quadratic program: given H, f, A, and b
n = size(H, 1);                  % number of variables
cvx_begin
    variable x(n)
    minimize( x'*H*x + f'*x )    % H must be positive semidefinite
    subject to
        A*x >= b
cvx_end

CVX - Examples
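SVM-type formulation with L1 norm
% X, train_label, L, I, C, l1_lambda, a, and ec are given data
cvx_begin
    variable w(p)            % weight vector
    variable b(1)            % bias
    variable e(n)            % slack variables
    expression by(n)
    by = train_label.*b;     % bias scaled by each label
    minimize( w'*(L + I)*w + C*sum(e) + l1_lambda*norm(w,1) )
    subject to
        X*w + by >= a - e;
        e >= ec;
cvx_end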

CVX - Examples
More complicated terms built with expressions
cvx_begin
    variable w(p+1+n);
    expression q(ec);        % ec is assumed to be the number of pairs with A(i,j) == 1
    ct = 1;                  % index of the next entry of q to fill
    for i = 1:p
        for j = i:p
            if A(i,j) == 1
                q(ct) = max(abs(w(i))/d(i), abs(w(j))/d(j));
                ct = ct + 1;
            end
        end
    end
    minimize( f'*w + lambda*sum(q) )
    subject to
        X*w >= a;
cvx_end

Questions
Questions, Comments?

Extra proof
