
Introduction to constrained optimization

Gilles Gasso

INSA Rouen - ASI Department


Laboratory LITIS

September 19, 2019



Plan

1 Introduction

2 Formulation

3 Concept of Lagrangian and duality, conditions of optimality


Concept of Lagrangian
Concept of duality

4 QP Problem


Constrained optimization problems

Where do these problems come from?


What is the connection with Machine Learning?
How to formulate them?


Example 1: sparse regression

Output to be predicted: y ∈ R
Input variables: x ∈ R^d
Linear model: f(x) = x^⊤θ
θ ∈ R^d: parameters of the model

Determination of a sparse θ
Minimization of the squared error
Only a few parameters are non-zero

min_{θ∈R^d} (1/2) ∑_{i=1}^N (y_i − x_i^⊤θ)²
s.t. ‖θ‖_p ≤ k, with ‖θ‖_p^p = ∑_{j=1}^d |θ_j|^p

http://www.ds100.org/sp17/assets/notebooks/linear_regression/Regularization.html
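A minimal Python sketch (an addition for illustration, assuming the CVXPY library; the data is synthetic and the choices p = 1, k = 2.5 are arbitrary):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
N, d = 50, 10
X = rng.standard_normal((N, d))
theta_true = np.zeros(d)
theta_true[:3] = [2.0, -1.0, 0.5]                  # sparse ground truth (made up)
y = X @ theta_true + 0.1 * rng.standard_normal(N)

theta = cp.Variable(d)
objective = cp.Minimize(0.5 * cp.sum_squares(y - X @ theta))
constraints = [cp.norm(theta, 1) <= 2.5]           # ||theta||_1 <= k, i.e. p = 1
cp.Problem(objective, constraints).solve()
print(np.round(theta.value, 3))                    # most entries shrink towards 0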


Example 2: where to locate the firehouse?

House M_i: defined by its coordinates z_i = [x_i, y_i]^⊤
Let θ be the coordinates of the firehouse
Minimize the distance from the firehouse to the farthest house

Problem formulation:

min_θ max_{i=1,...,N} ‖θ − z_i‖₂

Equivalent problem:

min_{t∈R, θ∈R²} t
s.t. ‖θ − z_i‖₂ ≤ t ∀i = 1,...,N
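The equivalent formulation can be written almost verbatim with a modeling tool. A hedged sketch, assuming CVXPY and made-up house coordinates:

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
z = rng.uniform(0, 10, size=(8, 2))      # N = 8 hypothetical houses in the plane

theta = cp.Variable(2)                   # firehouse coordinates
t = cp.Variable()                        # distance to the farthest house

# min t  s.t.  ||theta - z_i||_2 <= t  for all i
constraints = [cp.norm(theta - z[i], 2) <= t for i in range(len(z))]
cp.Problem(cp.Minimize(t), constraints).solve()
print(theta.value, t.value)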


Example 3: support vector machine (SVM)

D = {(x_i, y_i) ∈ R^d × {−1, 1}}_{i=1}^N : linearly separable points
Goal: find a function f(x) = θ^⊤x + b that correctly predicts the class
of each x_i while maximizing the margin between positives and
negatives

min_{θ∈R^d, b∈R} (1/2)‖θ‖² (maximization of the margin)
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,...,N (correct classification)

[Figure: two linearly separable classes, the separating hyperplane f(x) = 0, the margin boundaries f(x) = −1 and f(x) = 1, and margin = 2/‖θ‖]
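The hard-margin primal can be solved the same way. A sketch under the assumption that CVXPY is available, with synthetic, well-separated data (so the problem is feasible):

import numpy as np
import cvxpy as cp

rng = np.random.default_rng(0)
X = np.vstack([rng.standard_normal((20, 2)) + 2.5,    # class +1
               rng.standard_normal((20, 2)) - 2.5])   # class -1
y = np.hstack([np.ones(20), -np.ones(20)])

theta = cp.Variable(2)
b = cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(theta)),
                  [cp.multiply(y, X @ theta + b) >= 1])
prob.solve()
print(theta.value, b.value)
print(2 / np.linalg.norm(theta.value))   # width of the margin, 2/||theta||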


SVM example: where are the constraints?

min_{θ,b} (1/2)‖θ‖² (function J(θ) to be minimized)
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,...,N (constraints to be satisfied)

Constraints
How many constraints?
N constraints y_i(θ^⊤x_i + b) ≥ 1 with i = 1,...,N

Type of constraints?
Inequality constraints

Goal
Find the minimum θ* of J(θ) such that every constraint is satisfied

Constrained optimization
Elements of the problem
θ ∈ R^d: vector of unknown real parameters
J: R^d → R: the function to be minimized on its domain dom J
f_i and g_j are differentiable functions from R^d to R

Formulation of the primal problem P

min_{θ∈R^d} J(θ) (function to be minimized)
s.t. f_i(θ) = 0 ∀i = 1,...,n (n equality constraints)
g_j(θ) ≤ 0 ∀j = 1,...,m (m inequality constraints)

Feasibility
Let p* = min_θ {J(θ) such that f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j}
If p* = ∞ then the problem does not admit a feasible solution

Characterization of the solutions

Feasibility domain
The feasible domain is defined by the set of constraints

Ω = {θ ∈ R^d ; f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j}

Feasible points
θ₀ is feasible if θ₀ ∈ dom J and θ₀ ∈ Ω, i.e. θ₀ fulfills all the constraints and J(θ₀) has a finite value
θ* is a global solution of the problem if θ* is feasible and J(θ*) ≤ J(θ) for every feasible θ
θ̂ is a local solution if θ̂ is feasible and there exists ε > 0 such that J(θ̂) ≤ J(θ) for every feasible θ with ‖θ − θ̂‖ ≤ ε


Example 1

[Figure: level lines J(θ) = c of the objective and the rectangular feasible domain Ω]

Parameters: θ = [θ₁ θ₂]^⊤
Objective function:

J(θ) = 0.9θ₁² − 0.74θ₁θ₂ + 0.75θ₂² − 5.4θ₁ − 1.2θ₂

Feasibility domain (four inequality constraints):

Ω = {θ ∈ R² ; −4 ≤ θ₁ ≤ −1 and −3 ≤ θ₂ ≤ 4}
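Since the constraints are simple bounds, any box-constrained solver applies. A sketch assuming SciPy (the starting point x0 is an arbitrary feasible choice):

import numpy as np
from scipy.optimize import minimize

def J(th):
    t1, t2 = th
    return 0.9*t1**2 - 0.74*t1*t2 + 0.75*t2**2 - 5.4*t1 - 1.2*t2

res = minimize(J, x0=np.array([-2.0, 0.0]),
               bounds=[(-4, -1), (-3, 4)])   # the four inequality constraints
print(res.x, res.fun)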



Example 2
[Figure: level lines of J(θ) = θ₁ + θ₂ and the feasible circle θ₁² + θ₂² = 2; at θ* = (−1, −1)^⊤ (and at the stationary point (1, 1)^⊤) the gradients ∇J(θ*) = (1, 1)^⊤ and ∇h(θ*) are collinear]

min_{θ∈R²} θ₁ + θ₂
s.t. θ₁² + θ₂² − 2 = 0

An equality constraint
Domain of feasibility: the circle centered at 0 with radius √2
The optimal solution is obtained for θ* = (−1, −1)^⊤ and we have J(θ*) = −2


Optimality

How to assess a solution of the primal problem?


Do we have optimality conditions similar to those of unconstrained
optimization?


Notion of Lagrangian
Primal problem P
min_{θ∈R^d} J(θ)
s.t. f_i(θ) = 0 ∀i = 1,...,n (n equality constraints)
g_j(θ) ≤ 0 ∀j = 1,...,m (m inequality constraints)
θ is called the primal variable

Principle of the Lagrangian
Each constraint is associated with a scalar parameter called a Lagrange multiplier:
Equality constraint f_i(θ) = 0: we associate λ_i ∈ R
Inequality constraint g_j(θ) ≤ 0: we associate µ_j ≥ 0 (beware of the direction of the inequality g_j(θ) ≤ 0)
The Lagrangian transforms the constrained problem into an unconstrained one with additional variables λ_i and µ_j.

Lagrangian formulation

Associated Lagrange parameters

min_{θ∈R^d} J(θ) : none
f_i(θ) = 0 ∀i = 1,...,n : λ_i, any real number, ∀i = 1,...,n
g_j(θ) ≤ 0 ∀j = 1,...,m : µ_j ≥ 0, ∀j = 1,...,m

Lagrangian
The Lagrangian is defined by:

L(θ, λ, µ) = J(θ) + ∑_{i=1}^n λ_i f_i(θ) + ∑_{j=1}^m µ_j g_j(θ), with µ_j ≥ 0, ∀j = 1,...,m

Parameters λ_i, i = 1,...,n and µ_j, j = 1,...,m are called dual variables
Dual variables are unknown parameters to be determined

Examples
Example 1

min_{θ∈R²} θ₁ + θ₂
s.t. θ₁² + 2θ₂² − 2 ≤ 0 (inequality constraint)
θ₂ ≥ 0 (inequality constraint; mind the type of inequality)

Lagrangian: L(θ, µ) = (θ₁ + θ₂) + µ₁(θ₁² + 2θ₂² − 2) + µ₂(−θ₂), µ₁ ≥ 0, µ₂ ≥ 0

Example 2

min_{θ∈R³} (1/2)(θ₁² + θ₂² + θ₃²)
s.t. θ₁ + θ₂ + 2θ₃ = 1 (equality constraint)
θ₁ + 4θ₂ + 2θ₃ = 3 (equality constraint)

Lagrangian: L(θ, λ) = (1/2)(θ₁² + θ₂² + θ₃²) + λ₁(θ₁ + θ₂ + 2θ₃ − 1) + λ₂(θ₁ + 4θ₂ + 2θ₃ − 3), λ₁, λ₂ ∈ R
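For Example 2, stationarity of the Lagrangian in θ plus the two equality constraints give five equations in five unknowns. A sketch with SymPy (an assumption, not part of the slides) solves them symbolically:

import sympy as sp

t1, t2, t3, l1, l2 = sp.symbols('theta1 theta2 theta3 lambda1 lambda2')
L = sp.Rational(1, 2)*(t1**2 + t2**2 + t3**2) \
    + l1*(t1 + t2 + 2*t3 - 1) + l2*(t1 + 4*t2 + 2*t3 - 3)

# grad_theta L = 0 together with the two equality constraints
eqs = [sp.diff(L, v) for v in (t1, t2, t3)] \
      + [t1 + t2 + 2*t3 - 1, t1 + 4*t2 + 2*t3 - 3]
print(sp.solve(eqs, (t1, t2, t3, l1, l2)))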


Necessary optimality conditions


Assume that J, f_i, g_j are differentiable functions. Let θ* be a local
solution of the problem P. Then there exist dual variables
λ*_i, i = 1,...,n and µ*_j, j = 1,...,m such that the following conditions are
satisfied.
Karush-Kuhn-Tucker (KKT) conditions

Stationarity: ∇_θ L(θ, λ, µ) = 0, i.e. ∇J(θ) + ∑_{i=1}^n λ_i ∇f_i(θ) + ∑_{j=1}^m µ_j ∇g_j(θ) = 0

Primal feasibility: f_i(θ) = 0 ∀i = 1,...,n and g_j(θ) ≤ 0 ∀j = 1,...,m

Dual feasibility: µ_j ≥ 0 ∀j = 1,...,m

Complementary slackness: µ_j g_j(θ) = 0 ∀j = 1,...,m



Example
min_{θ∈R²} (1/2)(θ₁² + θ₂²)
s.t. θ₁ − 2θ₂ + 2 ≤ 0

Lagrangian: L(θ, µ) = (1/2)(θ₁² + θ₂²) + µ(θ₁ − 2θ₂ + 2), µ ≥ 0

KKT conditions
Stationarity: ∇_θ L(θ, µ) = 0 ⇒ θ₁ = −µ and θ₂ = 2µ
Primal feasibility: θ₁ − 2θ₂ + 2 ≤ 0
Dual feasibility: µ ≥ 0
Complementary slackness: µ(θ₁ − 2θ₂ + 2) = 0
Remarks on the complementary slackness
If θ₁ − 2θ₂ + 2 < 0 (inactive constraint) ⇒ µ = 0 (no penalty required as the constraint is satisfied)
If µ > 0 ⇒ θ₁ − 2θ₂ + 2 = 0 (active constraint)
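The two complementary-slackness cases can be checked mechanically. A sketch with SymPy (assumed available, not part of the slides):

import sympy as sp

t1, t2, mu = sp.symbols('theta1 theta2 mu', real=True)
stationarity = [t1 + mu, t2 - 2*mu]      # grad_theta L = 0
g = t1 - 2*t2 + 2

# Case mu = 0: stationarity gives theta = (0, 0), but g(0, 0) = 2 > 0, infeasible
print(g.subs({t1: 0, t2: 0}))
# Case g = 0 (active constraint): solve stationarity together with g = 0
print(sp.solve(stationarity + [g], (t1, t2, mu)))
# -> theta = (-2/5, 4/5), mu = 2/5 >= 0: all KKT conditions hold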

Duality

Dual function
Let L(θ, λ, µ) be the Lagrangian of the primal problem P, with µ_j ≥ 0. The
corresponding dual function is D(λ, µ) = min_θ L(θ, λ, µ)

Theorem [Weak duality]

Let p* = min_θ {J(θ) such that f_i(θ) = 0 ∀i and g_j(θ) ≤ 0 ∀j} be the
optimal value (assumed finite) of the problem P. Then, for any values
µ_j ≥ 0 ∀j and λ_i ∀i, we have

D(λ, µ) ≤ p*


Dual problem

Weak duality indicates that the dual function
D(λ, µ) = min_θ L(θ, λ, µ) is a lower bound on p*
Bridging the gap: maximize the dual function w.r.t. the dual variables λ and µ to bring
this lower bound as close as possible to p*

Dual problem

max_{λ,µ} D(λ, µ)
s.t. µ_j ≥ 0 ∀j = 1,...,m

http://www.onmyphd.com/?p=duality.theory


Interest of the dual problem

Remarks
Transform the primal problem into an equivalent dual problem, possibly much simpler to solve

Solving the dual problem can lead to the solution of the primal problem

Solving the dual problem gives the optimal values of the Lagrange multipliers


Example: inequality constraints

min_{θ∈R²} (1/2)(θ₁² + θ₂²)
s.t. θ₁ − 2θ₂ + 2 ≤ 0

Lagrangian: L(θ, µ) = (1/2)(θ₁² + θ₂²) + µ(θ₁ − 2θ₂ + 2), µ ≥ 0

Stationarity of the KKT conditions:

∇_θ L(θ, µ) = 0 ⇒ θ₁ = −µ and θ₂ = 2µ (1)

Dual function D(µ) = min_θ L(θ, µ): substituting (1) into L we obtain

D(µ) = −(5/2)µ² + 2µ

Dual problem: max_µ D(µ) s.t. µ ≥ 0

Dual solution:

∇D(µ) = 0 ⇒ µ = 2/5 (which satisfies µ ≥ 0) (2)

Primal solution: (2) and (1) lead to θ* = (−2/5, 4/5)^⊤
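The one-dimensional dual can also be maximized numerically. A sketch assuming SciPy (the upper bound 10 on µ is an arbitrary choice):

import numpy as np
from scipy.optimize import minimize_scalar

D = lambda mu: -2.5 * mu**2 + 2 * mu             # dual function D(mu)
res = minimize_scalar(lambda mu: -D(mu), bounds=(0, 10), method='bounded')
mu_star = res.x                                  # ~0.4 = 2/5
theta_star = np.array([-mu_star, 2 * mu_star])   # back-substitute via (1)
print(mu_star, theta_star)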

Constrained optimization with equality constraints


min_{θ∈R³} (1/2)(θ₁² + θ₂² + θ₃²)
s.t. θ₁ + θ₂ + 2θ₃ − 1 = 0
θ₁ + 4θ₂ + 2θ₃ − 3 = 0

Matrix form: min_{θ∈R³} (1/2)θ^⊤θ s.t. Aθ − b = 0, with

A = [1 1 2; 1 4 2], b = [1; 3]

Lagrangian: L(θ, λ) = (1/2)θ^⊤θ + λ^⊤(Aθ − b), λ ∈ R²

KKT conditions
Stationarity: ∇_θ L(θ, λ) = 0 ⇒ θ = −A^⊤λ (1)
Dual function: substituting (1) into L we get D(λ) = −(1/2)λ^⊤AA^⊤λ − λ^⊤b
Dual problem: max_λ D(λ)
Dual solution: ∇D(λ) = 0 ⇒ λ = −(AA^⊤)^{−1}b (2)
Primal solution: (2) in (1) gives θ = A^⊤(AA^⊤)^{−1}b
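The closed-form primal solution amounts to one linear solve. A minimal NumPy check (illustration only):

import numpy as np

A = np.array([[1.0, 1.0, 2.0],
              [1.0, 4.0, 2.0]])
b = np.array([1.0, 3.0])

lam = -np.linalg.solve(A @ A.T, b)    # lambda = -(A A^T)^{-1} b
theta = -A.T @ lam                    # theta = A^T (A A^T)^{-1} b
print(theta)
print(A @ theta - b)                  # both constraints hold up to round-off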

Convex constrained optimization

Convexity conditions
min_{θ∈R^d} J(θ) : J is a convex function
f_i(θ) = 0 ∀i = 1,...,n : f_i are affine ∀i = 1,...,n
g_j(θ) ≤ 0 ∀j = 1,...,m : g_j are convex functions ∀j = 1,...,m

Problems of interest
Linear Programming (LP)
Quadratic Programming (QP)
Off-the-shelf toolboxes exist for these problems (Gurobi, Mosek, CVX, ...)


QP convex problem
Standard form

min_{θ∈R^d} (1/2)θ^⊤Gθ + q^⊤θ + r
s.t. a_i^⊤θ = b_i ∀i = 1,...,n (affine equality constraints)
c_j^⊤θ ≥ d_j ∀j = 1,...,m (linear inequality constraints)

with q, a_i, c_j ∈ R^d, b_i and d_j real scalars, and G ∈ R^{d×d} a positive
definite matrix
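Any problem in this standard form can be handed to a QP solver. A hedged sketch assuming CVXPY, with small made-up (and clearly feasible) data:

import numpy as np
import cvxpy as cp

G = np.array([[2.0, 0.5],
              [0.5, 1.0]])             # positive definite (made-up data)
q = np.array([-1.0, -1.0])

theta = cp.Variable(2)
prob = cp.Problem(
    cp.Minimize(0.5 * cp.quad_form(theta, G) + q @ theta),
    [np.ones(2) @ theta == 1.0,        # a^T theta = b
     theta >= 0])                      # c_j^T theta >= d_j with c_j = e_j, d_j = 0
prob.solve()
print(theta.value)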

Examples

min_{θ∈R²} (1/2)(θ₁² + θ₂²)
s.t. θ₁ − 2θ₂ + 2 ≤ 0

SVM problem
min_{θ,b} (1/2)‖θ‖²
s.t. y_i(θ^⊤x_i + b) ≥ 1 ∀i = 1,...,N


Summary

Optimization under constraints: an involved problem in the general case

Lagrangian: reduces the constrained problem to an unconstrained one via Lagrange multipliers

Lagrange multipliers: each constraint corresponds to a multiplier, which acts as a penalty when the corresponding constraint is violated

Optimality: stationarity condition + feasibility conditions + complementary slackness conditions

Duality: provides a lower bound on the primal problem. The dual problem is sometimes easier to solve than the primal.


A reference book
