
Optimization

Inequality Constrained Optimization Algorithms

Panos Patrinos
STADIUS, Department of Electrical
Engineering, KU Leuven
1 – Outline 2/47

1 Quadratic programs

2 Nonlinear programs

3 Interior point methods


1 – Quadratic Programs 3/47

minimize f(x) = (1/2) x^T B x + g^T x
subject to Ax ≤ b

     [ a_1^T ]
A =  [  ...  ]  ∈ IR^{m×n}
     [ a_m^T ]

a very important problem; it arises in
1 algorithms for NLPs (SQP)
2 optimal control of linear systems (MPC)
many algorithms: active set, interior point, first-order methods, ...
we assume that B ⪰ 0 (convex QP)
1 – KKT conditions 4/47

B x* + g + A^T µ* = 0            Lagrangian stationarity
A x* ≤ b                         primal feasibility
µ* ≥ 0                           dual feasibility
µ*_i (a_i^T x* − b_i) = 0        complementary slackness

B ⪰ 0: necessary & sufficient for x* to be a global optimum

complementarity condition gives

µ*_i ≥ 0   i ∈ A(x*) = {i | a_i^T x* = b_i}
µ*_i = 0   i ∈ I(x*) = {i | a_i^T x* < b_i}

optimal active set A(x*): active inequalities at the solution


1 – KKT conditions 5/47

if we could guess the optimal active set

A(x*) = {i | a_i^T x* = b_i}

then the optimal solution could be determined by solving the EQP

minimize (1/2) x^T B x + g^T x
subject to a_i^T x = b_i,  i ∈ A(x*)

can solve with methods of the previous lecture

for any index set A we use the notation

A_A x = b_A

to mean

a_i^T x = b_i,  i ∈ A
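for a known active set, the EQP above reduces to one linear KKT system, as in the previous lecture. a minimal numpy sketch (the function name `solve_eqp` is ours, not from the slides):

```python
import numpy as np

def solve_eqp(B, g, A_act, b_act):
    """Solve  min 0.5 x'Bx + g'x  s.t.  A_act x = b_act
    via the KKT system  [B  A'; A  0] [x; mu] = [-g; b]."""
    n, p = B.shape[0], A_act.shape[0]
    K = np.block([[B, A_act.T],
                  [A_act, np.zeros((p, p))]])
    sol = np.linalg.solve(K, np.concatenate([-g, b_act]))
    return sol[:n], sol[n:]          # primal x, multipliers mu
```

for example, min (1/2)‖x‖² − x_1 − x_2 subject to x_1 + x_2 = 1 gives x = (0.5, 0.5), µ = 0.5.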
1 – Active set methods 6/47

make a guess for the active set A

find the primal-dual solution (x̄, µ̄) of the equality constrained QP

minimize (1/2) x^T B x + g^T x
subject to A_A x = b_A

if (x̄, µ̄) satisfies

A_I x̄ ≤ b_I
µ̄ ≥ 0

where I = {1, ..., m} \ A, then the solution of the QP is found (the KKT conditions are satisfied)
otherwise add or remove constraints on A and repeat
1 – Active set methods 7/47

very efficient for medium/small scale problems


benefit greatly from warm-start
primal, dual, primal-dual versions
software available: quadprog (MATLAB), qpOASES, QPSchur
1 – Primal active set method 8/47

at each iteration k we know a feasible point x_k and

W_k ⊆ A(x_k) = {i | a_i^T x_k = b_i}

with a_i, i ∈ W_k linearly independent

W_k is called the working set
we solve the EQ subproblem

minimize (1/2) x^T B x + g^T x
subject to A_{W_k} x = b_{W_k}

in terms of the direction d = x − x_k the problem becomes

minimize f_k(d) = (1/2) d^T B d + g_k^T d
subject to A_{W_k} d = 0

where g_k = ∇f(x_k) = B x_k + g (we used A_{W_k} x_k = b_{W_k})


1 – Primal active set method 9/47

minimize f_k(d) = (1/2) d^T B d + g_k^T d
subject to A_{W_k} d = 0

where g_k = B x_k + g.
assume the subproblem has a unique solution d_k ≠ 0
d = 0 is feasible for the subproblem, so f_k(d_k) < 0
this shows that the directional derivative of
f(x) = (1/2) x^T B x + g^T x
is negative:
∇f(x_k)^T d_k = g_k^T d_k < −(1/2) d_k^T B d_k ≤ 0

d_k is a direction of descent:
f(x_k + α d_k) < f(x_k)
for α sufficiently small
1 – Primal active set method 10/47

update x_{k+1} = x_k + α_k d_k where α_k keeps x_{k+1} feasible

a_i^T x_{k+1} = a_i^T x_k + α_k a_i^T d_k

constraints in W_k put no restriction on α_k

for i ∉ W_k:
1 a_i^T d_k ≤ 0: a_i^T x_{k+1} ≤ b_i for any α_k ≥ 0
2 a_i^T d_k > 0: feasible if α_k ≤ (b_i − a_i^T x_k) / (a_i^T d_k)

x_{k+1} satisfies all constraints if

α_k = min{ 1, min_{i ∉ W_k, a_i^T d_k > 0} (b_i − a_i^T x_k) / (a_i^T d_k) }

constraints that achieve the minimum above are called blocking constraints
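the step-length rule above translates directly; a small sketch (the tolerance on a_i^T d_k > 0 is our choice):

```python
import numpy as np

def step_length(A, b, x, d, working, tol=1e-12):
    """alpha_k = min{1, min over i not in W_k with a_i'd > 0 of (b_i - a_i'x)/(a_i'd)},
    plus the index of a blocking constraint (None if alpha_k = 1)."""
    alpha, blocking = 1.0, None
    for i in range(A.shape[0]):
        if i in working:
            continue                          # constraints in W_k stay satisfied
        aid = A[i] @ d
        if aid > tol:                         # only a_i'd > 0 can restrict alpha
            ratio = (b[i] - A[i] @ x) / aid
            if ratio < alpha:
                alpha, blocking = ratio, i
    return alpha, blocking
```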


1 – Primal active set method 11/47

minimize (1/2) d^T B d + g_k^T d
subject to A_{W_k} d = 0

where g_k = B x_k + g.
if d_k = 0, the KKT conditions for the subproblem give

B d_k + g_k + A_{W_k}^T µ̃ = g_k + A_{W_k}^T µ̃ = 0

if µ̃ ≥ 0, then x_k and

µ_{W_k} = µ̃,   µ_i = 0, i ∉ W_k

satisfy the KKT conditions for the QP


1 – Primal active set method 12/47

minimize (1/2) d^T B d + g_k^T d
subject to A_{W_k} d = 0

where g_k = B x_k + g.
if µ̃_i < 0 for some i ∈ W_k we can further reduce the cost by solving

minimize (1/2) d^T B d + g_k^T d
subject to A_{W_{k+1}} d = 0

where W_{k+1} = W_k \ {j}

in other words we remove one constraint from W_k and re-solve
usually the constraint with the most negative multiplier is removed

j = argmin_{i ∈ W_k} µ̃_i
1 – Primal active set method 13/47

Require: feasible x_0, W_0 ⊆ A(x_0)

for k = 0, 1, 2, ... do
    find (d_k, µ̃) that solve (g_k = B x_k + g)

        minimize (1/2) d^T B d + g_k^T d   subject to A_{W_k} d = 0

    if d_k = 0 then
        if µ̃_i ≥ 0 for all i ∈ W_k then
            stop with x* = x_k
        else
            j ← argmin_{i ∈ W_k} µ̃_i,   W_{k+1} ← W_k \ {j},   x_{k+1} ← x_k
        end if
    else
        x_{k+1} = x_k + α_k d_k, with α_k that keeps x_{k+1} feasible
        if α_k < 1 then
            W_{k+1} = W_k ∪ {j}, where j is a blocking constraint
        else
            W_{k+1} = W_k
        end if
    end if
end for
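the loop above can be sketched in a few lines of numpy. this is a bare-bones illustration with none of the linear-independence safeguards of production solvers, and the names are ours; constraints are 0-indexed. on Ex. 16.4 of Nocedal & Wright it reproduces x* = (1.4, 1.7) with multiplier 0.8 on the first constraint:

```python
import numpy as np

def active_set_qp(B, g, A, b, x, W, tol=1e-9):
    """Primal active-set method for  min 0.5 x'Bx + g'x  s.t.  Ax <= b.
    x must be feasible and W a subset of the active set at x."""
    m, n = A.shape
    W = set(W)
    for _ in range(100):
        gk = B @ x + g
        Aw = A[sorted(W)]
        p = Aw.shape[0]
        # equality-constrained subproblem in d via its KKT system
        K = np.block([[B, Aw.T], [Aw, np.zeros((p, p))]])
        sol = np.linalg.solve(K, np.concatenate([-gk, np.zeros(p)]))
        d, mu = sol[:n], sol[n:]
        if np.linalg.norm(d) < tol:
            if p == 0 or mu.min() >= -tol:
                return x, dict(zip(sorted(W), mu))    # KKT satisfied
            W.remove(sorted(W)[int(mu.argmin())])     # drop most negative multiplier
        else:
            alpha, blocking = 1.0, None
            for i in range(m):                        # step length and blocking constraint
                if i not in W and A[i] @ d > tol:
                    r = (b[i] - A[i] @ x) / (A[i] @ d)
                    if r < alpha:
                        alpha, blocking = r, i
            x = x + alpha * d
            if blocking is not None:
                W.add(blocking)
    raise RuntimeError("no convergence")
```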
1 – Example [Nocedal & Wright (2006), Ex.16.4] 14/47

minimize (x_1 − 1)² + (x_2 − 2.5)²
subject to −x_1 + 2x_2 ≤ 2
           x_1 + 2x_2 ≤ 6
           x_1 − 2x_2 ≤ 2
           x_1 ≥ 0, x_2 ≥ 0

[figure: iterates x0, ..., x5 of the active-set method in the (x_1, x_2) plane, feasible polygon with vertices (0, 1), (2, 0), (4, 1), (2, 2)]
1 – Example [Nocedal & Wright (2006), Ex.16.4] 15/47

[Figure 16.3: iterates of the active-set method]

x0 = (2, 0), W0 = {3, 5}, d0 = (0, 0), (µ̃3, µ̃5) = (−2, −1)
x1 = (2, 0), W1 = {5}, d1 = (−1, 0), α1 = 1
x2 = (1, 0), W2 = {5}, d2 = (0, 0), µ̃5 = −5
x3 = (1, 0), W3 = ∅, d3 = (0, 2.5), α3 = 0.6
x4 = (1, 1.5), W4 = {1}, d4 = (0.4, 0.2), α4 = 1
x5 = (1.4, 1.7), W5 = {1}, d5 = (0, 0), µ̃1 = 0.8, solution found
1 – Initial feasible point 16/47

the primal active set algorithm requires an initial feasible point

can be as hard as finding an optimal point
solve an auxiliary problem to determine a feasible solution

minimize (1/2) ‖max{0, Ax − b}‖²_2

can be expressed as a QP in variables x ∈ IR^n, w ∈ IR^m

minimize (1/2) ‖w‖²                    (QPfeas)
subject to w ≥ 0, Ax − w ≤ b

feasible point for QPfeas: given any x_0 ∈ IR^n set

w_0 = max{0, A x_0 − b}

the optimal solution of QPfeas is a feasible solution for the QP if and only if the optimal cost is zero
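the feasible starting pair (x_0, w_0) for QPfeas is one line of numpy; a quick sketch (function name is ours):

```python
import numpy as np

def qpfeas_start(A, b, x0):
    """Feasible (x0, w0) for QPfeas: w0 = max{0, A x0 - b} satisfies
    w0 >= 0 and A x0 - w0 <= b by construction."""
    return x0, np.maximum(0.0, A @ x0 - b)
```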
1 – Dual active set 17/47

if B is positive definite, then the dual problem has simple constraints

minimize (1/2) (A^T µ + g)^T B^{-1} (A^T µ + g) + b^T µ
subject to µ ≥ 0

no need for a feasibility QP

after solving the dual QP for µ*, the primal solution is

x* = −B^{-1} (A^T µ* + g)
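because the only constraint is the bound µ ≥ 0, even projected gradient descent applies to the dual. a simple sketch, not a tuned solver: step size 1/L with L = ‖A B⁻¹ A^T‖₂, and a fixed (crude) iteration count:

```python
import numpy as np

def dual_qp(B, g, A, b, iters=20000):
    """Projected gradient on the dual of  min 0.5 x'Bx + g'x  s.t. Ax <= b (B > 0):
    min_{mu >= 0}  0.5 (A'mu + g)' B^{-1} (A'mu + g) + b'mu."""
    Binv = np.linalg.inv(B)
    M = A @ Binv @ A.T                       # dual Hessian
    t = 1.0 / np.linalg.norm(M, 2)           # step size 1/L
    mu = np.zeros(A.shape[0])
    for _ in range(iters):
        grad = M @ mu + A @ Binv @ g + b     # dual gradient
        mu = np.maximum(0.0, mu - t * grad)  # project onto mu >= 0
    return -Binv @ (A.T @ mu + g), mu        # recover primal x
```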
2 – Outline 18/47

1 Quadratic programs

2 Nonlinear programs

3 Interior point methods


2 – Inequality constrained optimization 19/47

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., l
           g_i(x) ≤ 0, i = 1, ..., m

f, h, g are twice continuously differentiable (can be nonconvex)

Sequential Quadratic Programming (SQP)
1 quadratic approximation of the cost around x_k
2 linear approximation of the constraints
extends the Newton-Lagrange method to inequality constraints
2 – Sequential quadratic programming (SQP) 20/47

at iteration k solve the QP

minimize (1/2) d^T B_k d + ∇f(x_k)^T d
subject to h_i(x_k) + ∇h_i(x_k)^T d = 0, i = 1, ..., l
           g_i(x_k) + ∇g_i(x_k)^T d ≤ 0, i = 1, ..., m

and let

x_{k+1} = x_k + d_k
(λ_{k+1}, µ_{k+1}) = Lagrange multipliers for equalities/inequalities

can use an active set method to solve the QP

if m = 0 and B_k = ∇²_{xx} L(x_k, λ_k, µ_k) we get Newton-Lagrange
2 – Choices for Bk 21/47

Newton

B_k = ∇²_{xx} L(x_k, λ_k, µ_k)

quadratic convergence (under assumptions)

Gauss-Newton: if f(x) = (1/2) ‖F(x)‖²_2

B_k = J_F(x_k)^T J_F(x_k)

linear convergence (quadratic for small-residual problems, F(x*) ≈ 0)
2 – Choices for Bk 22/47

quasi-Newton: B_k from BFGS

B_{k+1} = B_k − (B_k s_k s_k^T B_k) / (s_k^T B_k s_k) + (ỹ_k ỹ_k^T) / (s_k^T ỹ_k)

with
s_k = x_{k+1} − x_k,
y_k = ∇_x L(x_{k+1}, λ_{k+1}, µ_{k+1}) − ∇_x L(x_k, λ_{k+1}, µ_{k+1})
ỹ_k = θ_k y_k + (1 − θ_k) B_k s_k

θ_k = 1                                                    if s_k^T y_k ≥ 0.2 s_k^T B_k s_k
θ_k = 0.8 s_k^T B_k s_k / (s_k^T B_k s_k − s_k^T y_k)      if s_k^T y_k < 0.2 s_k^T B_k s_k

"Powell's trick": guarantees that

ỹ_k^T s_k ≥ 0.2 s_k^T B_k s_k > 0

hence B_{k+1} ≻ 0
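the damped update can be written compactly; a sketch (no safeguard for s^T B s ≈ 0):

```python
import numpy as np

def damped_bfgs_update(B, s, y):
    """Powell-damped BFGS: returns B_{k+1} > 0 given B_k > 0."""
    sBs = s @ B @ s
    sy = s @ y
    if sy >= 0.2 * sBs:
        theta = 1.0
    else:
        theta = 0.8 * sBs / (sBs - sy)       # Powell's trick
    yt = theta * y + (1 - theta) * (B @ s)   # damped y~_k
    Bs = B @ s
    return B - np.outer(Bs, Bs) / sBs + np.outer(yt, yt) / (s @ yt)
```

e.g. with B = I, s = (1, 0), y = (−1, 0) (negative curvature) the damping kicks in and the update stays positive definite.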
2 – Convergence rate of SQP 23/47

suppose that
1 (x*, λ*, µ*) is a KKT point
2 LICQ holds

∇h_i(x*), i = 1, ..., l, ∇g_i(x*), i ∈ A(x*) are linearly independent

3 SOSC holds

d^T ∇²_{xx} L(x*, λ*, µ*) d > 0,  ∀d ∈ C(x*, µ*) \ {0}

4 (x_0, λ_0, µ_0) is close enough to (x*, λ*, µ*)
5 B_k = ∇²_{xx} L(x_k, λ_k, µ_k)

then {(x_k, λ_k, µ_k)} converges quadratically to (x*, λ*, µ*)
2 – Finite identification property 24/47

assume that (x*, λ*, µ*) satisfy

1 KKT, LICQ, SOSC and
2 strict complementarity holds

µ*_i > 0, for i ∈ A(x*)

3 (x_k, λ_k, µ_k) is close enough to (x*, λ*, µ*)

then the solution of the SQP subproblem has the same active set as A(x*)

means that when close to the solution SQP becomes Newton-Lagrange for the equality constrained problem

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., l
           g_i(x) = 0, i ∈ A(x*)
2 – Local SQP example 25/47

minimize f(x) = −x_1 − x_2
subject to g_1(x) = x_1² − x_2 ≤ 0
           g_2(x) = x_1² + x_2² − 1 ≤ 0

(global) minimum is x* = (1/√2, 1/√2)
cost and constraint gradients

∇f(x) = [−1; −1]    ∇g_1(x) = [2x_1; −1]    ∇g_2(x) = [2x_1; 2x_2]

cost and constraint Hessians

∇²f(x) = [0 0; 0 0]    ∇²g_1(x) = [2 0; 0 0]    ∇²g_2(x) = [2 0; 0 2]

Hessian of the Lagrangian

∇²_{xx} L(x, µ) = [2(µ_1 + µ_2)  0;  0  2µ_2]
2 – Local SQP example 26/47

local SQP algorithm starting from x_0 = (1/2, 1), µ_0 = (0, 0)

k = 0: solve

minimize −d_1 − d_2
subject to d_1 − d_2 ≤ 3/4
           d_1 + 2d_2 ≤ −1/4

to obtain

d_0 = (5/12, −1/3)    µ_1 = (1/3, 2/3).

next iterate

x_1 = x_0 + d_0 = (11/12, 2/3)    µ_1 = (1/3, 2/3).


2 – Local SQP example 27/47

k = 1:

B_1 = ∇²_{xx} L(x_1, µ_1) = [2 0; 0 4/3].

solve

minimize d_1² + (2/3) d_2² − d_1 − d_2
subject to (11/6) d_1 − d_2 ≤ −25/144
           (11/6) d_1 + (4/3) d_2 ≤ −41/144

to obtain

d_1 = (−0.1696, 0.0196)    µ_2 = (0, 0.7304).

next iterate

x_2 = x_1 + d_1 = (0.7471, 0.6863)    µ_2 = (0, 0.7304).


2 – Local SQP example 28/47

minimize f(x) = −x_1 − x_2
subject to g_1(x) = x_1² − x_2 ≤ 0
           g_2(x) = x_1² + x_2² − 1 ≤ 0

first 4 iterations of local SQP

k   x_k              µ_k              g_1(x_k)   g_2(x_k)   d_k
0   (0.5, 1)         (0, 0)           −0.75      0.25       (0.417, −0.333)
1   (0.917, 0.667)   (0.333, 0.667)   0.174      0.285      (−0.170, −0.020)
2   (0.747, 0.686)   (0, 0.730)       −0.128     0.029      (−0.038, 0.021)
3   (0.709, 0.707)   (0, 0.707)       −0.204     0.002      (−0.002, 0.000)
4   (0.707, 0.707)   (0, 0.707)       −0.207     0          (0, 0)
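the iteration table can be reproduced in a few lines. since this example has only m = 2 inequalities, each QP subproblem can be solved exactly by enumerating candidate active sets (a sketch specific to small m; helper names are ours):

```python
import numpy as np
from itertools import combinations

def grad_f(x): return np.array([-1.0, -1.0])
def g(x): return np.array([x[0]**2 - x[1], x[0]**2 + x[1]**2 - 1.0])
def jac_g(x): return np.array([[2*x[0], -1.0], [2*x[0], 2*x[1]]])

def qp_enum(B, gk, A, c, tol=1e-9):
    """Solve  min 0.5 d'Bd + gk'd  s.t.  A d <= c  by enumerating active sets."""
    m, n = A.shape
    for k in range(m + 1):
        for act in map(list, combinations(range(m), k)):
            p = len(act)
            K = np.block([[B, A[act].T], [A[act], np.zeros((p, p))]])
            try:
                sol = np.linalg.solve(K, np.concatenate([-gk, c[act]]))
            except np.linalg.LinAlgError:
                continue                       # singular KKT system: skip this guess
            d, mu_act = sol[:n], sol[n:]
            if np.all(A @ d <= c + tol) and np.all(mu_act >= -tol):
                mu = np.zeros(m); mu[act] = mu_act
                return d, mu                   # KKT point of the convex QP
    raise RuntimeError("no KKT point found")

# local SQP loop from x0 = (0.5, 1), mu0 = (0, 0)
x, mu = np.array([0.5, 1.0]), np.zeros(2)
for k in range(10):
    B = np.diag([2*(mu[0] + mu[1]), 2*mu[1]])  # Hessian of the Lagrangian
    d, mu = qp_enum(B, grad_f(x), jac_g(x), -g(x))
    x = x + d
```

the loop converges to x* = (1/√2, 1/√2) with µ = (0, 1/√2), matching the table.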
2 – Globalization of SQP 29/47

SQP converges locally

to enforce global convergence we need a merit function
damp the Newton step by choosing a stepsize that decreases the merit function
ℓ1 penalty function

ϕ_c(x) = f(x) + c P(x)

where

P(x) = Σ_{i=1}^{l} |h_i(x)| + Σ_{i=1}^{m} max{g_i(x), 0}
     = ‖h(x)‖_1 + ‖max{g(x), 0}‖_1

P(x) = 0 for x feasible, P(x) > 0 for x infeasible

under mild assumptions this is an exact penalty function
2 – Exact penalty 30/47

constrained problem                    penalized problem

minimize f(x)                          minimize ϕ_c(x) = f(x) + c P(x)
subject to h(x) = 0                    subject to x ∈ IR^n
           g(x) ≤ 0

if x* is
1 a local minimum for the penalized problem
2 feasible for the constrained problem
then x* is a local minimum for the constrained problem

if (x*, λ*, µ*) satisfy KKT and SOSC for the constrained problem and

c > ‖(λ*, µ*)‖_∞

then x* is a local minimum for the penalized problem


2 – Counterexample 31/47

minimize 0
subject to x³ + 3x² + 3 = 0

unique global minimum is x* = −3.279

ℓ1 penalty function: ϕ_c(x) = c|x³ + 3x² + 3|
penalty function has two local minima: x* and x** = 0

[plot of ϕ_1(x) = |x³ + 3x² + 3| on [−4, 2], showing local minima at x* and x** = 0]
2 – Exact penalization 32/47

minimize x
subject to x ≥ 1

global minimum x* = 1, Lagrange multiplier λ* = 1
ϕ_c(x) = x + c max{1 − x, 0} is an exact penalty for c > 1

[plot of ϕ_c(x) for c = 0.9, 1.2, 1.5, 1.8 on [0.6, 1.4]: for c > 1 the minimum sits at x* = 1]
2 – Exact penalization 33/47

nondifferentiability is essential for the existence of a finite c

candidate merit function ϕ_c(x) = x + (c/2) max{1 − x, 0}²
smooth with ∇ϕ_c(x) = 1 − c max{1 − x, 0}
global minimum is x = 1 − 1/c ≠ 1, for any finite c

[plot of ϕ_c(x) for c = 2, 4, 8, 16 on [0.6, 1.4]: the minimizer approaches but never reaches x* = 1]
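one can check numerically that the minimizer of the smooth penalty sits at 1 − 1/c, strictly short of x* = 1; a pure-numpy grid-search sketch:

```python
import numpy as np

def smooth_penalty_argmin(c, lo=-2.0, hi=2.0, n=400001):
    """Grid minimizer of the smooth penalty phi_c(x) = x + (c/2) max(1-x, 0)^2."""
    xs = np.linspace(lo, hi, n)              # grid step 1e-5
    phi = xs + 0.5 * c * np.maximum(1.0 - xs, 0.0) ** 2
    return xs[np.argmin(phi)]
```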
2 – Directional differentiability of exact penalty 34/47

idea is to perform backtracking on ϕ_c to determine the stepsize

but ϕ_c is not differentiable
however it is directionally differentiable

ϕ'_c(x; d) = lim_{α↓0} (ϕ_c(x + αd) − ϕ_c(x)) / α

exists (but is not a linear function of d)

must guarantee that d_k coming from SQP is a direction of descent for ϕ_c

ϕ'_c(x_k; d_k) < 0
2 – Line search 35/47

the directional derivative is upper bounded by

ϕ'_c(x_k; d_k) ≤ ∇f(x_k)^T d_k − c P(x_k)

from the KKT conditions for the QP

∇f(x_k)^T d_k ≤ −d_k^T B_k d_k + ‖(λ_{k+1}, µ_{k+1})‖_∞ P(x_k)

combine the two

ϕ'_c(x_k; d_k) ≤ −d_k^T B_k d_k − (c − ‖(λ_{k+1}, µ_{k+1})‖_∞) P(x_k)

if c > ‖(λ_{k+1}, µ_{k+1})‖_∞ and B_k is positive definite then ϕ'_c(x_k; d_k) < 0
backtrack on α to satisfy

ϕ_c(x_k + αd_k) ≤ ϕ_c(x_k) + σα∆_k

where

∆_k = ∇f(x_k)^T d_k − c P(x_k)

will give x_{k+1} = x_k + α_k d_k that decreases ϕ_c
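the merit function and the backtracking condition translate directly into code; a sketch (σ = 0.1 and the minimum stepsize cutoff are arbitrary choices of ours):

```python
import numpy as np

def l1_merit(f, h, g, c, x):
    """phi_c(x) = f(x) + c (||h(x)||_1 + ||max{g(x), 0}||_1)."""
    return f(x) + c * (np.abs(h(x)).sum() + np.maximum(g(x), 0.0).sum())

def backtrack(f, h, g, c, x, d, delta, sigma=0.1):
    """Halve alpha until phi_c(x + alpha d) <= phi_c(x) + sigma alpha delta (delta < 0)."""
    alpha = 1.0
    phi0 = l1_merit(f, h, g, c, x)
    while l1_merit(f, h, g, c, x + alpha * d) > phi0 + sigma * alpha * delta:
        alpha *= 0.5
        if alpha < 1e-12:
            raise RuntimeError("line search failed")
    return alpha
```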
2 – Global SQP 36/47

Require: (x_0, λ_0, µ_0), c̄ > 0, σ ∈ (0, 1)

for k = 0, 1, 2, ... do
    find (d_k, λ_{k+1}, µ_{k+1}) that solve the QP

        minimize (1/2) d^T B_k d + ∇f(x_k)^T d
        subject to h_i(x_k) + ∇h_i(x_k)^T d = 0, i = 1, ..., l
                   g_i(x_k) + ∇g_i(x_k)^T d ≤ 0, i = 1, ..., m

    If d_k = 0 stop.
    Choose c_k ≥ ‖(λ_{k+1}, µ_{k+1})‖_∞ + c̄ and compute

        ∆_k = ∇f(x_k)^T d_k − c_k (‖h(x_k)‖_1 + ‖max{g(x_k), 0}‖_1)

    Find the smallest nonnegative integer i_k such that α_k = 2^{−i_k} satisfies

        ϕ_{c_k}(x_k + α_k d_k) ≤ ϕ_{c_k}(x_k) + σ α_k ∆_k

    Set x_{k+1} = x_k + α_k d_k
end for
2 – Convergence of global SQP 37/47

assume that
1 f, h, g are smooth with Lipschitz gradient
2 c_k remains constant after some iteration k̄
then (every accumulation point of) the sequence {(x_k, λ_k, µ_k)} converges to a KKT point
3 – Outline 38/47

1 Quadratic programs

2 Nonlinear programs

3 Interior point methods


3 – Interior point methods 39/47

minimize f(x)
subject to h_i(x) = 0, i = 1, ..., l
           g_i(x) + s_i = 0, i = 1, ..., m
           s_i ≥ 0, i = 1, ..., m

added slacks s_i to convert inequalities to equalities

KKT (z_i are the Lagrange multipliers for s_i ≥ 0)

∇f(x) + Σ_{i=1}^{l} λ_i ∇h_i(x) + Σ_{i=1}^{m} µ_i ∇g_i(x) = 0
µ_i − z_i = 0
h(x) = 0, g(x) + s = 0
s ≥ 0, z_i ≥ 0, s_i z_i = 0
3 – Interior point methods 40/47

can eliminate z_i using µ_i = z_i: therefore the KKT conditions become

∇f(x) + Σ_{i=1}^{l} λ_i ∇h_i(x) + Σ_{i=1}^{m} µ_i ∇g_i(x) = 0
h(x) = 0, g(x) + s = 0
s ≥ 0, µ_i ≥ 0, s_i µ_i = 0

the "hard part" is

s ≥ 0, µ_i ≥ 0, s_i µ_i = 0

introduce τ ≥ 0 and write them as

s ≥ 0, µ_i ≥ 0, s_i µ_i = τ

τ = 0: KKT for the original problem


3 – Interior point methods 41/47

perturbed KKT: let τ > 0 and omit the inequalities

∇f(x) + Σ_{i=1}^{l} λ_i ∇h_i(x) + Σ_{i=1}^{m} µ_i ∇g_i(x) = 0
h(x) = 0, g(x) + s = 0
s_i µ_i = τ

just a system of nonlinear equations (no inequalities)

main idea: solve the perturbed KKT system (using Newton) for a sequence of positive τ_k that go to zero, while maintaining s, µ positive
no need for the initial x_0 to be feasible
3 – Interior point methods 42/47

∇f(x) + ∇h(x)λ + ∇g(x)µ = 0
Sµ − τ1 = 0
h(x) = 0, g(x) + s = 0

where S = diag(s_1, ..., s_m).
solution for fixed τ ≥ 0: (x(τ), s(τ), λ(τ), µ(τ))
the trajectory as τ varies is called the primal-dual central path

interior point algorithm

solve the above using Newton's method for a sequence {τ_k} approaching zero
under mild assumptions, as τ → 0

(x(τ), s(τ), λ(τ), µ(τ)) → (x*, s*, λ*, µ*)


3 – Barrier method interpretation 43/47

minimize f(x) − τ Σ_{i=1}^{m} log s_i
subject to h_i(x) = 0, i = 1, ..., l
           g_i(x) + s_i = 0, i = 1, ..., m

the log term is called the logarithmic barrier: −log s → ∞ as s ↓ 0

barrier approach: solve (approximately) the barrier problem for {τ_k} → 0
KKT conditions for the barrier problem

∇f(x) + ∇h(x)λ + ∇g(x)µ = 0
−τ/s_i + µ_i = 0, i = 1, ..., m
h(x) = 0, g(x) + s = 0

multiplying the second equation by s_i (positive) gives s_i µ_i = τ: exactly the perturbed KKT system


3 – Logarithmic barrier 44/47

[plot of −τ log(s) on (0, 3] for τ = 2, τ = 1, τ = 0.5: the barrier grows to +∞ as s ↓ 0 and flattens as τ decreases]
3 – Example 45/47

minimize f(x) = (1/2)(x_1² + x_2²)
subject to x_1 ≥ 2

optimal solution: x* = (2, 0)

logarithmic barrier: B(x) = −log(x_1 − 2)
barrier subproblems

x_k = argmin{ (1/2)(x_1² + x_2²) − τ_k log(x_1 − 2) } = (1 + √(1 + τ_k), 0)

as τ_k decreases, x_k → x*
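the barrier subproblem here has the closed-form solution x_1 = 1 + √(1 + τ) (with x_2 = 0 along the whole path, so we work in x_1 only). a Newton solve of the perturbed KKT system confirms it; a sketch with a simple fraction-to-the-boundary style damping:

```python
import numpy as np

def central_path_point(tau, iters=50):
    """Newton's method on the perturbed KKT system of
    min 0.5*x^2  s.t.  x >= 2  (slack s = x - 2, multiplier mu):
    x - mu = 0,  2 - x + s = 0,  s*mu = tau."""
    x, s, mu = 3.0, 1.0, 1.0                   # strictly feasible start
    for _ in range(iters):
        F = np.array([x - mu, 2.0 - x + s, s * mu - tau])
        J = np.array([[1.0, 0.0, -1.0],
                      [-1.0, 1.0, 0.0],
                      [0.0, mu, s]])
        dx, ds, dmu = np.linalg.solve(J, -F)
        a = 1.0                                # damp to keep s, mu > 0
        while s + a * ds <= 0.0 or mu + a * dmu <= 0.0:
            a *= 0.5
        x, s, mu = x + a * dx, s + a * ds, mu + a * dmu
    return x
```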
3 – Example 46/47

level sets of (1/2)(x_1² + x_2²) − τ log(x_1 − 2)

[contour plots for τ = 0.3 and τ = 0.03 over x_1 ∈ [2.1, 2.3], x_2 ∈ [−1, 1]: the minimizer moves toward x* = (2, 0) as τ shrinks]
3 – Interior point methods 47/47

various rules for choosing τ_0, decreasing τ_k, and stopping the inner Newton iterations
warm-start each barrier subproblem using the previous solution
with a smart choice of τ_k, one inner Newton step is enough for convex problems
very efficient for medium-scale convex problems
competitive with SQP for general nonlinear programs
software for LPs, QPs, SOCPs, SDPs: MOSEK, CPLEX, Gurobi, SeDuMi, SDPT3
efficient open-source software for general NLPs: IPOPT
