Professional Documents
Culture Documents
Slide 1
Linear Optimization (LO): Lec. 1-9
2 Requirements
Slide 2
Homeworks: 30%
Examples of Formulations
4 History of Optimization
Slide 4
Fermat, 1638 Newton, 1670
min f(x) x: scalar
df(x) = 0
dx
Euler, 1755
min f(x1 : : : xn)
rf(x) = 0
Lagrange, 1797
min f(x1 : : : xn)
s.t. gk (x1 : : : xn) = 0 k = 1 : : : m
Euler, Lagrange Problems in innite dimensions, calculus of variations.
5 Nonlinear Optimization
5.1 The general problem Slide 5
min f(x1 : : : xn)
s:t: g1(x1 : : : xn) 0
...
gm (x1 : : : xn) 0:
2x1 + x2 3
x1 0 x2 0
c= 3 x = x1 b = 2 A = 12 12
1 x2 3
minimize c x 0
subject to Ax b
x0
7 History of LO
7.1 The pre-algorithmic period Slide 7
Fourier, 1826 Method for solving system of linear inequalities.
de la Vall
ee Poussin simplex-like method for objective function with abso
lute values.
1950s Applications.
Telecommunications
Manufacturing
Medicine
Engineering
Typesetting (TEX, LATEX)
9 Transportation Problem
9.1 Data Slide 10
m plants, n warehouses
3
9.2 Decision Variables
9.2.1 Formulation Slide 11
xij = number of units to send i ! j
m n
XX
min cij xij
i=1 j =1
Xm
s:t: xij = dj j = 1 : : :n
i=1
Xn
xij = si i = 1 : : :m
j =1
xij 0
10 Sorting through LO
Slide 12
Given n numbers c1 c2 : : : cn
The order statistic c(1) c(2) : : : c(n): c(1) c(2) : : : c(n)
P
X n
s:t: xi = k
i=1
0 xi 1 i = 1 : : : n
Example: You sell 1,000 shares at $50 per share you have bought them
at $30 per share Net cash is:
4
11.1 Formulation Slide 14
n
X
max ri (si ; xi )
i=1
Xn Xn Xn
s:t: pi xi ; 0:30 (pi ; qi)xi ; 0:01 pi xi C
i=1 i=1 i=1
0 xi si
12.2 Formulation
12.2.1 Decision Variables Slide 17
A : : :E: amount invested in $ millions
Casht : amount invested in cash in period t, t = 1 2 3
max 1:06Cash3 + 1:00B + 1:75D + 1:40E
s:t: A + C + D + Cash1 1
Cash2 + B 0:3A + 1:1C + 1:06Cash1
Cash3 + 1:0E 1:0A + 0:3B + 1:06Cash2
A 0:5 C 0:5 E 0:75
A : : : E 0
5
Solution: A = 0:5M , B = 0, C = 0, D = 0:5M , E = 0:659M , Cash1 = 0,
Cash2 = :15M , Cash3 = 0 Objective: 1:7976M
13 Manufacturing
13.1 Data Slide 18
n products, m raw materials
cj : prot of product j
14 Capacity Expansion
14.1 Data and Constraints Slide 20
Dt : forecasted demand for electricity at year t
Et : existing capacity (in oil) available at t
ct : cost to produce 1MW using coal capacity
nt : cost to produce 1MW using nuclear capacity
No more than 20% nuclear
Coal plants last 20 years
Nuclear plants last 15 years
6
14.2 Decision Variables Slide 21
xt : amount of coal capacity brought on line in year t.
yt : amount of nuclear capacity brought on line in year t.
wt : total coal capacity in year t.
zt : total nuclear capacity in year t.
14.3 Formulation Slide 22
P
T
min ct xt + ntyt
t=1
Pt
s:t: wt = xs t = 1 : : :T
s=max(0t;19)
zt =
Pt ys t = 1 : : :T
s=max(0t;14)
wt + zt + Et Dt
zt 0:2(wt + zt + Et)
xt yt wt zt 0:
15 Scheduling
15.1 Decision variables Slide 23
Hospital wants to make weekly nightshift for its nurses
Dj demand for nurses, j = 1 : : : 7
Every nurse works 5 days in a row
Goal: hire minimum number of nurses
Decision Variables
xj : # nurses starting their week on day j
15.2 Formulation Slide 24
min
P7 x
j
j =1
s:t: x1+ x4 + x5 + x6 + x7 d1
x1+ x2 x5 + x6 + x7 d2
x1+ x2 + x3 x6 + x7 d3
x1+ x2 + x3 + x4 + x7 d4
x1+ x2 + x3 + x4 + x5 d5
xj 0 x2 + x3 + x4 + x5 + x6 d6
x3 + x4 + x5 + x6 + x7 d7
7
16 Revenue Management
16.1 The industry Slide 25
Deregulation in 1978
Prior to Deregulation
{ Carriers only allowed to y certain routes. Hence airlines such as
Northwest, Eastern, Southwest, etc.
17 Revenue Management
17.1 Economics Slide 27
Huge sunk and xed costs
Very low variable costs per passenger ($10/passenger or less)
Strong economically competitive environment
Near-perfect information and negligible cost of information
Highly perishable inventory
Result: Multiple fares
18 Revenue Management
18.1 Data Slide 28
n origins, n destinations
1 hub
2 classes (for simplicity), Q-class, Y-class
Revenues rijQ rijY
8
18.2 LO Formulation
18.2.1 Decision Variables Slide 29
� Qij : # of Q-class customers we accept from i to j
� Yij : # of Y-class customers we accept from i to j
X
maximize Q
rij Qij +r Y
ij Yij
X
ij
n
subject to (Q + Y ) � C 0
ij ij i
�0
X
j
n
(Q + Y ) � C0
ij ij j
i�0
0�Q ij � Dij
Q
0�Y ij � Dij
Y
19 Revenue Management
19.1 Importance Slide 30
Robert Crandall, former CEO of American Airlines:
We estimate that RM has generated $1.4 billion in incremental revenue for
American Airlines in the last three years alone. This is not a one-time benet.
We expect RM to generate at least $500 million annually for the foreseeable
future. As we continue to invest in the enhancement of DINAMO we expect to
capture an even larger revenue premium.
20 Messages
20.1 How to formulate? Slide 31
1. Dene your decision variables clearly.
2. Write constraints and objective function.
3. No systematic method available.
What is a good LO formulation?
A formulation with a small number of variables and constraints, and the matrix
A is sparse.
21 Nonlinear Optimization
21.1 The general problem Slide 32
min f(x1 : : : xn)
s:t: g1(x1 : : : xn) 0
...
gm (x1 : : : xn) 0:
23 On the power of LO
23.1 LO formulation Slide 34
min f(x) = maxk dk x + ck
0
s:t: Ax b
min z
s:t: Ax b
dk x + ck z 8 k
24 On the power of LO
24.1 Problems with j:j
min
P c jx j Slide 35
j j
s:t: Ax b
Idea: jxj = maxfx ;xg Pc z
min j j
s:t: Ax b
xj zj
;xj zj
Message: Minimizing Piecewise linear convex function can be modelled by LO
10
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093 Optimization Methods
1 Outline Slide 1
� Polyhedra
� Standard form
� Algebraic and geometric de�nitions of corners
� Equivalence of de�nitions
� Existence of corners
� Optimality of corners
� Conceptual algorithm
xj � 0 j 2 N1
xj >< 0 j 2 N2
Characteristics
� Minimization problem
� Equality constraints
� Non-negative variables
2.2 Transformations Slide 4
max c0 x � min(�c0 x)
ai0x � bi ai0x + si � bi; si � 0
,
ai0x � bi ai0x � si � bi; si � 0
xj >< 0 xj � x+j � x�j
x+j � 0; xj�
� 0
x+1 � x
�
1 + 2x2 � s2 � 1
x+1 ; x
�
1 ; x2; s1; s2 � 0
3 Preliminary Insights
Slide 6
minimize �x1 � x2
subject to x1 + 2x2 � 3
2x1 + x2 � 3
x1; x2 � 0
x2
- x1 - x2 = - 2
1.5
- x1 - x2 = z
(1,1)
- x1 - x2 = 0
1.5 3 x1
c x1 + 2x2 < 3
2x1 + x2 < 3
Slide 7
�x1 + x2 � 1
x1 � 0
x2 � 0
Slide 8
� There exists a unique optimal solution.
� There exist multiple optimal solutions; in this case, the set of optimal
solutions can be either bounded or unbounded.
2
x2
c = (1,0)
c = (- 1,- 1)
1
c = (0,1)
c = (1,1)
x1
4 Polyhedra
4.1 Denitions Slide 9
� The set fx j a0 x � bg is called a hyperplane.
� The set fx j a0 x � bg is called a halfspace.
� The intersection of many halfspaces is called a polyhedron.
= b3
a '2
a '3x
x=
b2
a3
a2
a' x < b a4
a' x > b a1
=b a 4' x = b4
=b
1
a' x a5
1x
a'
a
a5' x = b
5
(a) (b)
5 Corners
5.1 Extreme Points Slide 10
� Polyhedron P � fx j Ax � bg
� x 2 P is an extreme point of P
if 6 9 y; z 2 P (y �
6 x; z 6� x):
x � �y + (1 � �)z ; 0 � � � 1
. . u
v
. w
. . z
.y
x
4
'w }
.
w
=c
c 'y
{y | P
c
.
x
'x }
{y | c
'y = c
x3
A .
E . P . C
x2
D . .
B
x1
Then 3 hyperplanes are tight, but constraints are not linearly independent.
Slide 14
Intuition: a point at which n inequalities are tight and corresponding equations
are linearly independent.
P � fx 2 �n j Ax � bg
� a1 ; : : :; am rows of A
� x2P
� I � fi j ai 0x � bi g
De�nition x is a basic feasible solution if subspace spanned by fai; i 2 I g
is �n .
5.3.1 Degeneracy
Slide 15
� If jI j � n, then ai ; i 2 I are linearly independent; x nondegenerate.
5
� If jI j � n, then there exist n linearly independent fai ; i 2 I g; x degener
ate.
A
C
P
B
E
(a) (b)
6 Equivalence of denitions
Slide 16
Theorem: P � fx j Ax � bg. Let x 2 P.
x is a vertex , x is an extreme point , x is a BFS.
7 BFS for standard form polyhedra
Slide 17
� Ax � b and x � 0
� m � n matrix A has linearly independent rows
� x 2 �n is a basic solution if and only if Ax � b, and there exist indices
B(1); : : :; B(m) such that:
{ The columns AB(1) ; : : :; AB(m) are linearly independent
{ If i 6� B(1); : : :; B(m), then xi � 0
7.1 Construction of BFS Slide 18
Procedure for constructing basic solutions
1. Choose m linearly independent columns AB(1) ; : : :; AB(m)
2. Let xi � 0 for all i 6� B(1); : : :; B(m)
3. Solve Ax � b for xB(1) ; : : : ; xB(m)
Ax � b ! BxB + NxN � b
xN � 0; xB � B� b 1
6
7.2 Example Slide 19
2 3 2 3
1 1 2 1 0 0 0 8
66 0 1 6 0 1 0
0
7 6 12
7
5 x �
6
4 1 0 0 0 0 1
0
7 4 4
75
0 1 0 0 0 0 1 6
� A ; A ; A ; A basic columns
4 5 6 7
A3
A1
A2
A4 = - A1
P Q
9 Optimality of BFS
Slide 23
min c0 x
s:t: x 2 P � fx j Ax � bg
Theorem: Suppose P has at least one extreme point. Either optimal cost is
�1 or there exists an extreme point which is optimal.
10 Conceptual algorithm
Slide 24
� Start at a corner
� Visit a neighboring corner that improves objective.
x3
. B = (0,0,10)
.
A = (0,0,0)
E = (4,4,4) .
. .
C = (0,10,0)
D = (10,0,0)
x1 x2
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093 Optimization Methods
1 Outline Slide 1
� Reduced Costs
� Optimality conditions
� Improving the cost
� Unboundness
� The Simplex algorithm
� The Simplex algorithm on degenerate problems
) xB � B � b � B � NxN
1 1
Then
� If c � 0 ) x optimal
1
2.3 Proof
� y arbitrary feasible solution
� d � y � x ) Ax � Ay � b ) Ad � 0
) BdB + P Aidi � 0
Slide 5
) dB � � P B �1 Ai di
i2N
) c0 d � c0B dB + P cidi
i2N
P i2N P
� (ci � c0B B �1Ai )di � cidi Slide 6
i2N i2N
� Since y � 0 and xi � 0; i 2 N, then di � yi � xi � 0; i 2 N
� c0 d � c0 (y � x) � 0 ) c0 y � c0 x
) x optimal
(b) in BT, Theorem 3.1
� � � (cB dB + cj dj )
� � � (cj � c0B B � Aj )
� � � cj
4 Unboundness Slide 9
� Is y � x + � � d feasible�
Since Ad � 0 ) Ay � Ax � b
� y�0�
If d � 0 ) x + � � d � 0 8 � � 0
) objective unbounded.
2
x3
(0,0,3) (1,0,3)
(2,0,2)
(0,1,3)
x1
(0,2,0) (2,2,0)
x2
5 Improvement
Slide 10
If di � 0, then
xi + �di � 0 ) � � � xdi
i
�
x �
) �� � fijmin �
di
di < g0 i �
xB i �
) �� � min
fi�1;:::;mjdB(i) <0g
� ( )
dB(i)
5.1 Example Slide 11
min x1+ 5x2 �2x3
s:t: x1+ x2+ x3 �4
x1 �2
x3 �3
3x2+ x3 �6
x1; x2; x3 �0
0
x 1
Slide 12
A A A A A A A
Slide 13
2
1
1 2
1 1
3
1
4
0
5
0
6 7
0
3 BB x 1
CC 0
4
1
BB x CC B 2
C
66 77
1 0 0 0 1 0
0
BB x 3
CC �
B@
3
CA
4
5
0 0 1 0 0 1
0
BB x 4
CC
0 3 1 0 0 0 1
@
x5
A
6
6
x7
3
x3
(0,0,3) (1,0,3)
(2,0,2)
(0,1,3)
x1
(0,2,0) (2,2,0)
x2
2
1 1 0 0 3
2
0 1 0 0
B �
664
10 01 0
1
0
0
775
; B� �
664
�11 �11
1
0 0
1 0
77 c0
� (0; 7; 0; 2; �3; 0;0)
0 1 0 1 0
1
�1 1 0 0
1
1
d �1
B d C
1
B 1
CC
d � 1; d � d � 0;
B
5 2 4 @
d CA
� �B� A
3
6
5 �
B
@
�1
A
Slide 15
d7 �1
y0 � x0 + �d0 � (2 � �; 0; 2 + �; 0; �; 1 � �; 4 � �)
What happens
as � increases�
�
�
�
�
( ) 0
Slide 17
1 1 0 0 1 0 �1 0
66 1 0 1 0
77 �1 66 0 0 1 0
77
B �
4
0 1 0 0
5
; B �
4
�1 1 1
0
5
0 1 0 1 0 0 �1 1
c0 � c0 � c0B B�1A � (0; 4; 0; �1; 0; 3; 0)
Need to continue, column A4 enters the basis.
6 Correctness Slide 18
� xB i �
� xdB(l) � i�1;:::;m;d
min � ( )
dB(i) � ��
B(l) B(i)<0
Theorem
� B � fAB i ;i�6 l ; Aj g basis
( )
� Else select j : cj � 0.
Slide 20
3. Compute u � �d � B �1Aj .
� If u � 0 ) cost unbounded; stop
� Else
xB(i) uB(l)
4. �� � 1�i�min
m;u >0 u
� u
i i l
5. Form a new basis by replacing AB(l) with Aj .
6. yj � ��
yB(i) � xB(i) � �� ui
7.1 Finite Convergence Slide 21
Theorem:
� P � fx j Ax � b; x � 0g �6 ;
� Every BFS non-degenerate
Then
� Simplex method terminates after a �nite number of iterations
� At termination, we have optimal basis B or we have a direction d : Ad �
0; d � 0; c0d � 0 and optimal cost is �1.
5
7.2 Degenerate problems Slide 22
� �� can equal zero (why�) ) y � x, although B 6� B .
� Even if �� � 0, there might be a tie
xB(i)
min ui )
1�i�m;ui >0
subscript.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093J Optimization Methods
1 Outline
Slide 1
• Revised Simplex method
• The full tableau implementation
• Finding an initial BFS
• The complete algorithm
• The column geometry
• Computational efficiency
2 Revised Simplex
Slide 2
Initial data: A, b, c
1
2.1 Example
Slide 5
min x1 + 5x2 −2x3
s.t. x1 + x2 + x3 ≤4
x1 ≤2
x3 ≤3
3x2 + x3 ≤6
x1 , x2 , x3 ≥0
Slide 6
B = {A1 , A3 , A6 , A7 }, BFS: x = (2, 0, 2, 0, 0, 1, 4)′
′
c = (0, 7, 0, 2, −3, 0,0)
1 1 0 0 0 1 0 0
1 0 0 0 −1
1 −1 0 0
B= 0 1 1 0 , B = −1
1 1 0
0 1 0 1 −1 1 0 1
(u1 , u3 , u6�, u7 )′ =�B −1 A5 = (1, −1, 1, 1)′
θ∗ = min 21 , 11 , 41 = 1, l = 6
l = 6 (A6 exits the basis). Slide 7
0 1 0 0 1
1 −1 0 0 −1
[B −1 |u] = −1
1 1 0 1
−1 1 0 1 1
1 0 −1 0
−1 0 0 1 0
⇒B = −1 1
1 0
0 0 −1 1
2
−c′B xB c1 ... cn
xB(1) | |
..
. B −1 A1 ... B −1 An
xB(m) | |
3.1 Example
Slide 10
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 ≤ 20
2x1 + x2 + 2x3 ≤ 20
2x1 + 2x2 + x3 ≤ 20
x1 , x2 , x3 ≥ 0
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 + x4 = 20
2x1 + x2 + 2x3 + x5 = 20
2x1 + 2x2 + x3 + x6 = 20
x1 , . . . , x6 ≥ 0
BFS: x = (0, 0, 0, 20, 20, 20)′
B=[A4 , A5 , A6 ] Slide 11
x1 x2 x3 x4 x5 x6
0 −10 −12 −12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2* 1 2 0 1 0
x6 = 20 2 2 1 0 0 1
x1 x2 x3 x4 x5 x6
100 0 −7 −2 0 5 0
x4 = 10 0 1.5 1* 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0
x6 = 0 0 1 −1 0 −1 1
Slide 13
3
x1 x2 x3 x4 x5 x6
120 0 −4 0 2 4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5* 0 1 −1.5 1
Slide 14
x1 x2 x3 x4 x5 x6
136 0 0 0 3.6 1.6 1.6
x3 = 4 0 0 1 0.4 0.4 −0.6
x1 = 4 1 0 0 −0.6 0.4 0.4
x2 = 4 0 1 0 0.4 −0.6 0.4
Slide 15
x 3
.B = (0,0,10)
A = (0,0,0) .
E = 4( 4, 4, ) .
. .
C = (0,10,0)
D = (10,0,0)
x 1 x 2
4 Comparison of implementations
Slide 16
Full tableau Revised simplex
4
5 Finding an initial BFS
Slide 17
• Goal: Obtain a BFS of Ax = b, x ≥ 0
• Special case: b ≥ 0
Ax ≤ b, x ≥ 0
⇒ Ax + s = b, x, s ≥ 0
s = b, x=0
min y1 + y2 + . . . + ym
s.t. Ax + y = b
x, y ≥ 0
Slide 19
3. If cost > 0 ⇒ LOP infeasible; stop.
4. If cost = 0 and no artificial variable is in the basis, then a BFS was found.
5. Else, all yi∗ = 0, but some are still in the basis. Say we have AB(1) , . . . , AB(k)
Slide 20
6. Drive artificial variables out of the basis: If lth basic variable is artifi
b ≥ 0.
5
4. If cost= 0, a feasible solution to the original problem has been found.
5. Drive artificial variables out of the basis, potentially eliminating redundant
rows.
Slide 22
Phase II:
1. Let the final basis and tableau obtained from Phase I be the initial basis
2. Compute the reduced costs of all variables for this initial basis, using the
rows.
6
z
B
. I C
F
H .
G . D
b .
initialbasis
z
. 6
3. . . . 2
1
4 . 7
nextbasis
8
. . 5
.b
optimalbasis
9 Computational efficiency
Slide 28
Exceptional practical behavior: linear in n
Worst case
max xn
s.t. ǫ ≤ x1 ≤ 1
ǫxi−1 ≤ xi ≤ 1 − ǫxi−1 , i = 2, . . . , n
Slide 29
x3
x2
x2
x1 x1
(a) (b)
Slide 30
Theorem
• The feasible set has 2n vertices
• The vertices can be ordered so that each one is adjacent to and has lower
• There exists a pivoting rule under which the simplex method requires
8
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093J Optimization Methods
1 Outline
Slide 1
• Motivation of duality
• General form of the dual
• Weak and strong duality
• Relations between primal and dual
• Economic Interpretation
• Complementary Slackness
2 Motivation
2.1 An idea from Lagrange
Slide 2
Consider the LOP, called the primal with optimal solution x∗
min c′ x
s.t. Ax = b
x≥0
g(p) ≤ c′ x∗ + p′ (b − Ax∗ ) = c′ x∗
Get the tightest lower bound, i.e.,
max g(p)
� �
g(p) = min c′ x + p′ (b − Ax)
x ≥0
= p′ b + min (c′ − p′ A)x
x ≥0
Note that �
′ ′ 0, if c′ − p′ A ≥ 0′ ,
min (c − p A)x =
x≥0 −∞, otherwise.
s.t. p′ A ≤ c′
1
3 General form of the dual
Slide 3
Primal Dual
min c′ x max p′ b
s.t. a′i x ≥ bi i ∈ M1 s.t. pi ≥ 0 i ∈ M1
a′i x ≤ bi i ∈ M2 pi ≤ 0 i ∈ M1
a′i x = bi i ∈ M3 pi >< 0 i ∈ M3
xj ≥ 0 j ∈ N1 p′ Aj ≤ cj j ∈ N1
xj ≤ 0 j ∈ N2 p′ Aj ≥ cj j ∈ N2
xj >< 0 j ∈ N3 p′ Aj = cj j ∈ N3
3.1 Example
Slide 4
min x1 + 2x2 + 3x3 max 5p1 + 6p2 + 4p3
s.t. −x1 + 3x2 =5 s.t. p1 free
2x1 − x2 + 3x3 ≥ 6 p2 ≥0
x3 ≤ 4 p3 ≤0
x1 ≥ 0 −p1 + 2p2 ≤1
x2 ≤ 0 3p1 − p2 ≥2
x3 free, 3p2 + p3 = 3.
Slide 5
Primal min max dual
≥ bi ≥0
constraints ≤ bi ≤0 variables
>
= bi <0
≥0 ≤ cj
variables ≤0 ≥ cj constraints
>
<0 = cj
4 Weak Duality
Slide 7
Theorem:
If x is primal feasible and p is dual feasible then p′ b ≤ c′ x
Proof
p′ b = p′ Ax ≤ c′ x
2
Corollary:
5 Strong Duality
Slide 8
Theorem: If the LOP has optimal solution, then so does the dual, and optimal
Proof:
min c′ x
s.t. Ax = b
x ≥ 0
Apply Simplex; optimal solution x, basis B.
Optimality conditions:
c′ − c′B B −1 A ≥ 0′
Slide 9
Define p′ = c′B B −1 ⇒ p′ A ≤ c′
⇒ p dual feasible for
max p′ b
s.t. p′ A ≤ c′
p′ b = c′B B −1 b = c′B xB = c′ x
⇒ x, p are primal and dual optimal
5.1 Intuition
Slide 10
a3
c
a2
a1
p 2a 2 p 1a 1
.
x *
3
6 Relations between primal and dual
Slide 11
Finite opt. Unbounded Infeasible
Finite opt. *
Unbounded *
Infeasible * *
7 Economic Interpretation
Slide 12
• x optimal nondegenerate solution: B −1 b > 0
• Suppose b changes to b + d for some small d
• How is the optimal cost affected?
• For small d feasibilty unaffected
• Optimality conditions unaffected
• New cost c′B B −1 (b + d) = p′ (b + d)
• If resource i changes by di , cost changes by pi di : “Marginal Price”
8 Complementary slackness
8.1 Theorem
Slide 13
Let x primal feasible and p dual feasible. Then x, p optimal if and only if
pi (a′i x − bi ) = 0, ∀i
xj (cj − p′ Aj ) = 0, ∀j
8.2 Proof
Slide 14
• ui = pi (a′i x − bi ) and vj = (cj − p′ Aj )xj
• If x, p primal and dual feasible, ui ≥ 0, vj ≥ 0 ∀i, j.
• Also c′ x − p′ b = i ui + j vj .
� �
ui = vj = 0 for all i, j.
4
8.3 Example
Slide 15
3x1 + x2 = 3 p1 + p2 ≤ 10
x1 , x2 , x3 ≥ 0 3p1 ≤ 6
⇒ p1 = 2, p2 = 1
Objective=19
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093 Optimization Methods
c
c
a1
a3
A
a5
a2 c a4
a1
B a3 c
a4 a2
a1
C c
a1
a5
D a1
1 Outline
Slide 1
• Geometry of duality
• The dual simplex algorithm
• Farkas lemma
• Duality as a proof technique
max p′ b
�
m
s.t. pi ai = c
i=1
p≥0
c′ − c′B B −1 A ≥ 0′
1
a2 c
a1
a3 a3
a2
x * a1
• Do not require B −1 b ≥ 0
• Require c̄ ≥ 0 (dual feasibility)
• Dual cost is
p′ b = c′B B −1 b = c′B xB
3.2 An iteration
Slide 5
1. Start with basis matrix B and all reduced costs ≥ 0.
2
x2 p2
1 . D
c b
. B . C
1
. C
. .
A
1
D
.
E
2 x1
A
. B
1/2
. .
1
E
p1
(a) (b)
Slide 6
4. Else, let j s.t.
c̄j c̄i
= min
|vj | {i|vi <0} |vi |
5. Pivot element vj : Aj enters the basis and AB(l) exits.
3.3 An example
Slide 7
min x1 + x2
s.t. x1 + 2x2 ≥ 2
x1 ≥ 1
x1 , x2 ≥ 0
min x1 + x2 max 2p1 + p2
s.t. x1 + 2x2 − x3 = 2 s.t. p1 + p2 ≤ 1
x1 − x4 = 1 2p1 ≤ 1
x1 , x2 , x3 , x4 ≥ 0 p1 , p2 ≥ 0
Slide 8
x1 x2 x3 x4
0 1 1 0 0
x3 = −2 −1 −2* 1 0
x4 = −1 −1 0 0 1
Slide 9
x1 x2 x3 x4
−1 1/2 0 1/2 0
x2 = 1 1/2 1 −1/2 0
x4 = −1 −1* 0 0 1
3
A 1
A 3
A 2
b
.
x1 x2 x3 x4
−3/2 0 0 1/2 1/2
x2 = 1/2 0 1 −1/2 1/2
x1 = 1 1 0 0 −1
1. ∃x ≥ 0 s.t. Ax = b.
2. ∃p s.t. p′ A ≥ 0′ and p′ b < 0.
4.1.1 Proof
Slide 11
“ ⇒′′ If ∃x ≥ 0 s.t. Ax = b, and if p′ A ≥ 0′ , then p′ b = p′ Ax ≥ 0
“ ⇐′′ Assume there is no x ≥ 0 s.t. Ax = b
4
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093 Optimization Methods
x ≥ 0
2 Outline
Slide 2
1. Global sensitivity analysis
2. Local sensitivity analysis
(a) Changes in b
(b) Changes in c
(c) A new variable is added
(d) A new constraint is added
(e) Changes in A
3. Detailed example
3.2 Dependence on b
Slide 4
Primal Dual
F (b) = min c′ x
F (b) = max p′ b
s.t. Ax = b
s.t. p′ A ≤ c′
x≥0
F (b) = maxi=1,...,N (pi )′ b is a convex function of b
1
( c + q d) ' ( x3)
( c + q d) ' ( x2
)
x1 o p t i m a l
. x2 o p t i m a l
. x3 o p t i m a l
. x4 o p t i m a l q
f( q)
( p1) ' ( b* + q d)
( p3) ' ( b* + q d)
( p2) ' ( b* + q d)
q1 q2 q
1. Feasibility conditions: B −1 b ≥ 0
2. Optimality conditions: c′ − c′B B −1 A ≥ 0′
Slide 6
• Suppose that there is a change in either b or c for example
• How do we find whether B is still optimal?
• Need to check whether the feasibility and optimality conditions are satis
fied
x≥0 x ≥ 0
1. Feasibility: B −1 (b + Δei ) ≥ 0
2. Optimality: c′ − c′B B −1 A ≥ 0′
Observations:
1. Changes in b affect feasibility
2. Optimality conditions are not affected
Slide 9
B −1 (b + Δei ) ≥ 0
βij = [B −1 ]ij
bj = [B −1 b]j
Thus,
(B −1 b)j + Δ(B −1 ei )j ≥ 0 ⇒ bj + Δβji ≥ 0 ⇒
3
bj bj
max − ≤ Δ ≤ min −
βji >0 βji βji <0 βji
Slide 10
Δ≤Δ≤Δ
5.2 Changes in c
Slide 11
cj → cj + Δ
Need to check:
1. Feasibility: B −1 b ≥ 0, unaffected
2. Optimality: c′ − c′B B −1 A ≥ 0′ , affected
• xj nonbasic
5.2.1 xj nonbasic
Slide 12
cB unaffected
(cj + Δ) − c′B B −1 Aj ≥ 0 ⇒ cj + Δ ≥ 0
Solution optimal if Δ ≥ −cj
What if Δ = −cj ?
What if Δ < −cj ?
4
5.2.2 xj basic
Slide 13
cB ← ĉB = cB + Δej
Then,
[c′ − ĉ′B B −1 A]i ≥ 0 ⇒ ci − [cB + Δej ]′ B −1 Ai ≥ 0
[B −1 A]ji = aji
ci ci
ci − Δaji ≥ 0 ⇒ max ≤ Δ ≤ min
aji <0 aji aji >0 aji
5
5.5 Changes in A
Slide 17
• Suppose aij ← aij + Δ
• Assume Aj does not belong in the basis
• Feasibility conditions: B −1 b ≥ 0, unaffected
• Optimality conditions: cl − c′B B −1 Al ≥ 0, l =
6 j, unaffected
• Optimality condition: cj − p′ (Aj + Δei ) ≥ 0 ⇒ cj − Δpi ≥ 0
6 Example
6.1 A Furniture company
Slide 18
• A furniture company makes desks, tables, chairs
• The production requires wood, finishing labor, carpentry labor
6.2 Formulation
Slide 19
Decision variables:
x1 = # desks, x2 = # tables, x3 = # chairs
6
Final tableau: s1 s2 s3 x1 x2 x3
280 0 10 10 0 5 0
s1 = 24 1 2 -8 0 -2 0
x3 = 8 0 2 -4 0 -2 1
x1 = 2 0 -0.5 1.5 1 1.25 0
• What is B −1 ?
1 2 −8
B −1 = 0 2 −4
0 −0.5 1.5
Slide 22
• What is the optimal solution?
• What is the optimal solution value?
• Is it a bit surprising?
• What is the optimal dual solution?
• What is the shadow price of the wood constraint?
• What is the shadow price of the finishing hours constraint?
• What is the reduced cost for x2 ?
7
• Suppose you can hire 1h of finishing overtime at $7. Would you do it?
• Another check
1 2 −8
c′B B −1 = (0, −20, −60) 0 2 −4 =
0 −0.5 1.5
8x1 + x3 + s1 + 6·1 = 48 s1 = 26
4x1 + 1.5x3 + 2·1 = 20 ⇒ x1 = 0.75
2x1 + 0.5x3 + 1.5 · 1 = 8 x3 = 10
z ′ − z = (60 ∗ 0.75 + 20 ∗ 10) − (60 ∗ 2 + 20 ∗ 8 + 30 ∗ 1) = −35 + 30 = −5 Slide 27
Another way to calculate the same thing: If x2 = 1
Suppose profit from tables increases from $30 to $34. Should it be produced?
At $35? At $36?
Optimality conditions:
cj − c′B B −1 Aj ≥ 0 ⇒
1 2
" #
−8
′
p = c′B B −1 = [0, −20, −(60 + Δ)] 0 2 −4
0 −0.5 1.5
8
6
c2 = c2 − p′ A2 = −30 + [0, 10 − 0.5Δ, 10 + 1.5Δ] 2 = 5 + 1.25Δ
1.5
cs2 = 10 − 0.5Δ
cs3 = 10 + 1.5Δ
Current basis optimal:
5 + 1.25Δ ≥ 0
10 − 0.5Δ ≥ 0 −4 ≤ Δ ≤ 20
10 + 1.5Δ ≥ 0
= 8 + 2Δ ≥ 0
2 − 0.5Δ
s1 = 24 + 2Δ
x3 = 8 + 2Δ
x1 = 2 − 0.5Δ
z = 60(2 − 0.5Δ) + 20(8 + 2Δ) = 280 + 10Δ
Slide 32
Suppose
Δ =
10 then
s1 44
x3 = 25 ← inf. (Use dual simplex)
x1 −3
xi ≥ 0
9
1
!
c4 −c′B B −1 A4 = −15 − (0, −10, −10) 1 =5≥0
1
Current basis still optimal. Do not produce stools
10
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093 Optimization Methods
2003.
2 Structure
Slide 2
• Motivation
• Data Uncertainty
3 Motivation
Slide 3
• The classical paradigm in optimization is to develop a model that assumes
that the input data is precisely known and equal to some nominal values.
This approach, however, does not take into account the influence of data
Slide 4
• Ben-Tal and Nemirovski (2000):
In real-world applications of Linear Optimization (Net Lib li
brary), one cannot ignore the possibility that a small uncer
tainty in the data can make the usual optimal solution com
pletely meaningless from a practical viewpoint.
3.1 Literature
Slide 5
• Ellipsoidal uncertainty; Robust convex optimization Ben-Tal and Nemirovski
1
4 Goal
Slide 6
Develop an approach to address data uncertainty for optimization problems
that:
• It allows to control the degree of conservatism of the solution;
• It is computationally tractable both practically and theoretically.
5 Data Uncertainty
Slide 7
minimize c′ x
subject to Ax ≤ b
l≤x≤u
xi ∈ Z, i = 1, . . . , k,
WLOG data uncertainty affects only A and c, but not the vector b. Slide 8
6 Robust MIP
Slide 9
• Consider an integer Γi ∈ [0, |Ji |], i = 0, 1, . . . , m.
• Γi adjusts the robustness of the proposed method against the level of
allowed to change.
Slide 10
• Nature will be restricted in its behavior, in that only a subset of the
• We will guarantee that if nature behaves like this then the robust solution
2
6.1 Problem
( ) Slide 11
X
′
minimize c x+ max dj |xj |
{S0 | S0 ⊆J0 ,|S0 |≤Γ0 }
j∈S0
( )
X X
subject to aij xj + max âij |xj | ≤ bi , ∀i
{Si | Si ⊆Ji ,|Si |≤Γi }
j j∈Si
l ≤ x ≤
u
xi ∈ Z, ∀i = 1, . . . k.
6.2 Theorem 1
Slide 12
The robust problem can be reformulated has an equivalent MIP:
P
minimize c ′ x + z 0 Γ0 + p
X j∈J0X0j
subject to aij xj + zi Γi + pij ≤ bi ∀i
j j∈Ji
z0 + p0j ≥ dj yj ∀j ∈ J0
zi + pij ≥ a ˆij yj ∀i =
� 0, j ∈ Ji
pij , yj , zi ≥ 0 ∀i, j ∈ Ji
−yj ≤ xj ≤ yj ∀j
lj ≤ xj ≤ uj ∀j
xi ∈ Z i = 1, . . . , k.
6.3 Proof
Slide 13
Given a vector x∗ , we define:
( )
X
βi (x∗ ) = max âij |x∗j | .
{Si | Si ⊆Ji ,|Si |=Γi }
j∈Si
0 ≤ zij ≤ 1 ∀i, j ∈ Ji .
Slide 14
Dual: X
βi (x∗ ) = min pij + Γi zi
j∈Ji
s.t. zi + pij ≥ âij |x∗j | ∀j ∈ Ji
pij ≥ 0 ∀j ∈ Ji
zi ≥ 0 ∀i.
3
|Ji | Γi
5 5
10 8.3565
100 24.263
200 33.899
6.4 Size
Slide 15
• Original Problem has n variables and m constraints
• Robust counterpart has 2n + m + l variables, where l = m
P
i=0 |Ji | is the
1 n n
X X X
Pr ãij x∗j > bi ≤ (1 − µ) +µ ,
2n l l
j l=⌊ν⌋ l=⌊ν⌋+1
Slide 17
Slide 18
7 Experimental Results
7.1 Knapsack Problems
• Slide 19
X
maximize ci xi
i∈N
X
subject to wi xi ≤ b
i∈N
x ∈ {0, 1}n.
4
0
10
Approx bound
Bound 2
−1
10
−2
10
−3
10
−4
10
0 1 2 3 4 5 6 7 8 9 10
Γi
[wi − δi , wi + δi ];
7.1.1 Data
Slide 20
• |N | = 200, b = 4000,
• wi randomly chosen from {20, 21, . . . , 29}.
• ci randomly chosen from {16, 17, . . . , 77}.
• δi = 0.1wi .
5
7.1.2 Results
Slide 21
minimize c′ x
subject to x ∈ X ⊂ {0, 1}n .
• Robust Counterpart:
X
Z∗ = minimize c′ x + max dj x j
{S| S⊆J,|S|=Γ}
j∈S
subject to x ∈ X,
• WLOG d1 ≥ d2 ≥ . . . ≥ dn .
8.1 Remarks
Slide 23
• Examples: the shortest path, the minimum spanning tree, the minimum
assignment, the traveling salesman, the vehicle routing and matroid inter
section problems.
8.2 Approach
X Slide 24
6
8.3 Algorithm A
Slide 25
• Solution: yj = max(dj xj − θ, 0)
• X
Z∗ = min θΓ + (cj xj + max(dj xj − θ, 0))
x∈X,θ≥0
j
• X
Z∗ = min θΓ + (cj + max(dj − θ, 0)) xj
x∈X,θ≥0
j
Slide 26
• d1 ≥ d2 ≥ . . . ≥ dn ≥ dn+1 = 0.
• For dl ≥ θ ≥ dl+1 ,
n l
X X
min θΓ + cj xj + (dj − θ)xj =
x∈X,dl ≥θ≥dl+1
j=1 j=1
n l
X X
dl Γ + min cj xj + (dj − dl )xj = Zl
x∈X
j=1 j=1
•
n l
X X
Z∗ = min dl Γ + min cj xj + (dj − dl )xj .
l=1,...,n+1 x∈X
j=1 j=1
8.4 Theorem 3
Slide 27
• Algorithm A correctly solves the robust 0-1 optimization problem.
• It requires at most |J| + 1 solutions of nominal problems. Thus, If the
nominal problem is polynomially time solvable, then the robust 0-1 coun
9 Experimental Results
9.1 Robust Sorting
X Slide 28
minimize ci xi
i∈N
X
subject to xi = k
i∈N
x ∈ {0, 1}n .
7
Γ ¯
Z(Γ) ¯
% change in Z(Γ)
σ(Γ) % change in σ(Γ)
0 8822 0 %
501.0 0.0 %
10 8827 0.056 %
493.1 -1.6 %
20 8923 1.145 %
471.9 -5.8 %
30 9059 2.686 %
454.3 -9.3 %
40 9627 9.125 %
396.3 -20.9 %
50 10049 13.91 %
371.6 -25.8 %
60 10146 15.00 %
365.7 -27.0 %
70 10355 17.38 %
352.9 -29.6 %
80 10619 20.37 %
342.5 -31.6 %
100 10619 20.37 %
340.1 -32.1 %
X
Z ∗ (Γ) = minimize c′ x + max dj x j
{S| S⊆J,|S|=Γ}
j∈S
X
subject to xi = k
i∈N
x ∈ {0, 1}n .
9.1.1 Data
Slide 29
• |N | = 200;
• k = 100;
• cj ∼ U [50, 200]; dj ∼ U [20, 200];
• For testing robustness, generate instances such that each cost component
cj to cj + dj .
9.1.2 Results
Slide 30
8
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093: Optimization Methods
2 Column Generation
Slide 2
• For x ∈ ℜn and n large consider the LOP:
min c′ x
s.t. Ax = b
x≥0
• Restricted problem X
min ci xi
i∈I
X
s.t. Ai xi = b
i∈I
x≥0
70 − (3 × 17 + 1 × 15) = 4
Slide 5
• Given w1 , . . . , wm and W there are many cutting patterns: (3, 1) and (2, 2)
for example
3 × 17 + 1 × 15 ≤ 70
2 × 17 + 2 × 15 ≤ 70
1
• Pattern: (a1 , . . . , am ) integers:
X
ai wi ≤ W
i=1
3.1 Problem
Slide 6
• Given wi , bi , i = 1, . . . , m (bi : number of rolls of width wi demanded,
• Find how to cut the large rolls in order to minimize the number of rolls
used.
2
3.3 Formulation
Slide 10
Decision variables: xj = number of rolls cut by pattern j characterized by vector
Aj :
Pn
min xj
j=1
b1
n
Aj · xj = ...
P
j=1
bm
xj ≥ 0 ( integer)
Slide 11
• Huge number of variables.
• Can we apply column generation, that is generate the patterns Aj on the
fly?
3.4 Algorithm
Slide 12
Idea: Generate feasible patterns as needed.
W
⌊ w1 ⌋ 0 0 0
0 ⌊W ⌋ 0
0
1) Start with initial patterns: w2
0 , 0
, W ,
⌊ ⌋
w3
0
W
0 0 0 ⌊ w4 ⌋
Slide 13
2) Solve:
min x1 + · · · + xm
x1 A1 + · · · + xm Am = b
xi ≥ 0
Slide 14
3) Compute reduced costs
3
3.4.1 Key Idea
Slide 15
4) Solve
m
X
z ∗ = max p i ai
i=1
Xm
s.t. wi ai ≤ W
i=1
ai ≥ 0, integer
This is the integer knapsack problem
Slide 16
• If z ∗ ≤ 1 ⇒ 1 − p′ Aj > 0 ∀j ⇒ current solution optimal
• If z ∗ > 1 ⇒ ∃ s: 1 − p′ As < 0 ⇒ Variable xs becomes basic, i.e., a new
i=1,...,m
Why ?
3.6 Example
Slide 18
max 11x1 + 7x2 + 5x3 + x4
s.t. 6x1 + 4x2 + 3x3 + x4 ≤ 25
xi ≥ 0, xi integer
F (0) = 0
F (1) = 1
F (2) = 1 + F (1) = 2
Slide 19
F (3) = max(5 + F (0)∗ , 1 + F (2)) = 5
F (9) = 11 + F (3) = 16
F (10) = 11 + F (4) = 18
F (u) = 11 + F (u − 6) = 16 u ≥ 11
4
⇒ F (25) = 11 + F (19) = 11 + 11 + F (13) = 11 + 11 + 11 + F (7) = 33 + 12 = 45
x∗ = (4, 0, 0, 1)
4 Stochastic Programming
4.1 Example
Slide 20
Wrenches
Pliers Cap.
Steel (lbs)
1.5
1.0 27,000
Molding machine (hrs)
1.0
1.0 21,000
Assembly machine (hrs)
0.3
0.5 9,000* Slide 21
Demand limit (tools/day)
15,000
16,000
Contribution to earnings
$130*
$100
($/1000 units)
max 130W + 100P
s.t. W ≤ 15
P ≤ 16
1.5W + P ≤ 27
W + P ≤ 21
0.3W + 0.5P ≤ 9
W, P ≥ 0
4.1.2 Decisions
Slide 23
• Need to decide steel capacity in the current quarter. Cost 58$/1000lbs.
• Soon after, uncertainty will be resolved.
• Next quarter, company will decide production quantities.
4.1.3 Formulation
Slide 24
5
State Cap. W. contr. Prob.
1 8,000 160 0.25
2 10,000 160 0.25
3 8,000 90 0.25
4 10,000 90 0.25
Decision Variables: S: steel capacity,
Mol. 1 W1 + P1 ≤ 21
Ste. 1 −S + 1.5W1 + P1 ≤ 0
W.d. 1 W1 ≤ 15
P.d. 1 P1 ≤ 16
Slide 26
Ass. 2
0.3W2 + 0.5P2 ≤ 10
Mol. 2
W2 + P2 ≤ 21
Ste. 2
−S + 1.5W2 + P2 ≤ 0
W.d. 2
W2 ≤ 15
P.d. 2 P2 ≤ 16
Obj. 2 −Z2 + 160W2 + 100P2 = 0
Slide 27
Ass. 3
0.3W3 + 0.5P3 ≤ 8
Mol. 3
W3 + P3 ≤ 21
Ste. 3
−S + 1.5W3 + P3 ≤ 0
W.d. 3
W3 ≤ 15
P.d. 3 P3 ≤ 16
Obj. 3 −Z3 + 90W3 + 100P3 = 0
Slide 28
Ass. 4
0.3W4 + 0.5P4 ≤ 10
Mol. 4
W4 + P4 ≤ 21
Ste. 4
−S + 1.5W4 + P4 ≤ 0
W.d. 4
W4 ≤ 15
P.d. 4 P4 ≤ 16
Obj. 4 −Z4 + 90W4 + 100P4 = 0
S, Wi , Pi ≥ 0
4.1.4 Solution
Slide 29
Solution: S = 27, 250lb.
Wi Pi
1 15,000 4,750
2 15,000 4,750
3 12,500 8,500
4 5,000 16,000
6
4.2 Two-stage problems
Slide 30
• Random scenarios indexed by w = 1, . . . , k. Scenario w has probability
αw .
Bw x + Dw yw = dw , yw ≥ 0.
4.2.1 Formulation
Slide 31
min c′ x + α1 f1′ y1 + ··· + αk fk′ yk
Ax =b
B1 x + D 1 y1 = d1
B2 x + D 2 y2 = d2 Slide 32
. . .. ..
. . .
Bk x + D k yk = dk
x, y1 , y2 , . . . , yk ≥ 0.
Structure: x y1 y2 y3 y4
Objective
7
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093: Optimization Methods
2 Integer Optimization
2.1 Mixed IO
Slide 2
(MIO) max c� x + h� y
s.t. Ax + By ≤ b
n
x ∈ Z+ (x ≥ 0, x integer)
m
y ∈ R+ (y ≥ 0)
2.2 Pure IO
Slide 3
(IO) max c� x
s.t. Ax ≤ b
n
x ∈ Z+
Important special case: Binary Optimization
(BO) max c� x
s.t. Ax ≤ b
x ∈ {0, 1}n
2.3 LO
Slide 4
(LO) max c� x
s.t. By ≤ b
n
y ∈ R+
aj : cost of project j
cj : value
� of project j
Slide 6
1, if project j is selected.
xj =
0, otherwise.
1
n
�
max cj xj
j=1
�
s.t. aj xj ≤ b
xj ∈ {0, 1}
x2 − x1 = 0
0 ≤ x2 ≤ x1
0 ≤ y ≤ U x, x ∈ {0, 1}
m jobs
cij : cost
� of assigning person j to job i.
�
s.t. xij = 1 each job is assigned
j=1
�m
i=1
xij ∈ {0, 1}
max c� x
s.t. Ax ≤ b
x ∈ {0, 1}n
2
• Add constraint � �
xj + (1 − xj ) ≥ 1.
j∈I0 j∈I1
• Extensions to MIO?
I = {1 . . . m} set of clients
• Decision variables
�
1, a facility is placed at location j
xj =
0, otherwise
yij = fraction of demand of client i
satisfied by facility j.
Slide 11
n
� m �
� n
IZ1 = min cj xj + hij yij
j=1 i=1 j=1
�n
s.t. yij = 1
j=1
yij ≤ xj
xj ∈ {0, 1}, 0 ≤ yij ≤ 1.
Slide 12
Consider an alternative formulation.
n
� m �
� n
IZ2 = min cj xj + hij yij
j=1 i=1 j=1
�n
s.t. yij = 1
j=1
�m
yij ≤ m · xj
i=1
xj ∈ {0, 1}, 0 ≤ yij ≤ 1.
3
4.2 Observations
Slide 13
• IZ1 = IZ2 , since the integer points both formulations define are the same.
n �
� 0 ≤ xj ≤ 1
P1 = {(x, y) : yij = 1, yij ≤ xj ,
0 ≤ yij ≤ 1
j=1
n
� m
�
P2 = {(x, y) : yij = 1, yij ≤ m · xj ,
j=1 i=1
�
0 ≤ xj ≤ 1
0 ≤ yij ≤ 1
Slide 14
• Let
Z1 = min cx + hy, Z2 = min cx + hy
(x, y) ∈ P1 (x, y) ∈ P2
• Z2 ≤ Z1 ≤ IZ1 = IZ2
4.3 Implications
Slide 15
• Finding IZ1 (= IZ2 ) is difficult.
• Solving to find Z1 , Z2 is a LOP. Since Z1 is closer to IZ1 several methods
Slide 16
Slide 18
4
• The extreme points of CH(H) have {0, 1} coordinates.
• So, if we know CH(H) explicitly, then by solving min cx + hy, (x, y) ∈
CH(H) ⊆ P1 ⊆ P2
5 Minimum Spanning
Tree (MST)
Slide 19
• How do telephone companies bill you?
• It used to be that rate/minute: Boston → LA proportional to distance in
MST
for TSP)
Slide 20
• Given a graph G = (V, E) undirected and Costs ce , e ∈ E.
• Find a tree of minimum cost spanning all the nodes.
�
1, if edge e is included in the tree
• Decision variables xe =
0, otherwise
Slide 21
• The tree should be connected. How can you model this requirement?
• Let S be a set of vertices. Then S and V \ S should be connected
�
i∈S
• Let δ(S) = {e = (i, j) ∈ E :
j ∈V \S
• Then, �
xe ≥ 1
e∈δ(S)
5
5.1 Formulation
� Slide 22
IZMST = min ce xe
⎧ e∈E
�
⎪ xe ≥ 1 ∀ S ⊆ V, S �= ∅, V
⎨ e∈δ(S)
⎪
�
H xe = n − 1
⎩ e∈E
⎪
⎪
xe ∈ {0, 1}.
Is this a good formulation? Slide 23
Pcut = {x ∈ R|E| : 0 ≤ x ≤ e,
�
xe = n − 1
e∈E
�
xe ≥ 1 ∀ S ⊆ V, S �= ∅, V }
e∈δ(S)
straints.
6
6.1 Formulation I
� Slide 27
1, if edge e is included in the tour.
xe =
0, otherwise.
�
min ce xe
e∈E
�
s.t. xe ≥ 2, S⊆E
e∈δ(S)
�
xe = 2, i∈V
e∈δ(i)
xe ∈ {0, 1}
6.2 Formulation II
� Slide 28
min �ce xe
s.t. xe ≤ |S| − 1, S ⊆ E
e∈E(S)
�
xe = 2, i ∈ V
e∈δ(i)
xe ∈ {0, 1}
Slide 29
T SP
= {x ∈ R|E| ;
� �
Pcut xe ≥ 2, xe
= 2
e∈δ(S) e∈δ(i)
0 ≤ xe ≤ 1}
�
T SP
Psub
= {x ∈ R|E| ; xe = 2
e∈δ(i)
�
xe ≤ |S| − 1
e∈δ(S)
0 ≤ xe ≤ 1}
Slide 30
T SP T SP
• Theorem: Pcut = Psub �⊇ CH(H)
• Nobody knows CH(H) for the TSP
7 Minimum Matching
Slide 31
• Given G = (V, E); ce costs on e ∈ E. Find a matching of minimum cost.
• Formulation: �
min �ce xe
s.t. xe = 1, i∈V
e∈δ(i)
xe ∈ {0, 1}
7
Let
PMAT = {x ∈ R|E| :
�
xe = 1
e∈δ(i)
�
xe ≥ 1 |S| = 2k + 1, S �= ∅
e∈δ(S)
xe ≥ 0}
8 Observations
Slide 33
• For MST, Matching there are efficient algorithms. CH(H) is known.
• For TSP � ∃ efficient algorithm. TSP is an N P − hard problem. CH(H)
is not known.
9 Summary
Slide 34
1. An IO formulation is better than another one if the polyhedra of their
is known.
8
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093: Optimization Methods
1 Outline Slide 1
� Cutting plane methods
� Branch and bound methods
LP relaxation
min c0x
s:t: Ax � b
x � 0:
j 2N
� � � �
� aij � B �1Aj i , ai0 � B �1b i :
1
� X
xi + aij xj � ai0 :
j 2N
� Since xj � 0 for all j ,
X X
xi + baij cxj � xi + aij xj � ai0 :
j 2N j 2N
� Since xj integer, X
xi + baij cxj � bai0 c:
j 2N
� Valid cut
2.4 Example Slide 6
min x1 � 2x2
s:t: �4x1 + 6x2 � 9
x 1 + x2 � 4
x1 ; x2 � 0
x1 ; x2 integer:
We transform the problem in standard form
min x1 � 2x2
s:t: �4x1 + 6x2 + x3 �9
x1 + x2 + x4 � 4
x1 ; : : :; x4 � 0
x1 ; : : :; x4 integer.
LP relaxation: x1 � (15�10; 25�10). Slide 7
�
x2 + 101 x3 + 10
1 x � 25 :
4
10
� Gomory cut
x2 � 2:
� Add constraints x2 + x5 � 2, x5 � 0
� New optimal x2 � (3�4; 2):
� One of the equations in the optimal tableau is
x1 � 14 x3 + 64 x5 � 43 :
� New Gomory cut
x1 � x3 + x5 � 0;
� New optimal solution is x3 � (1; 2):
Slide 8
2
x2
.
x1 - 3x1 + 5x 2 < 7
2 ..
x2
x3
x2 < 2
0
1 2 3 4 x1
(stop), or break the corresponding problem into further subproblems, which are
added to the list of active subproblem.
x3=0 x3=1
Objective value=22.2
x1=1, x2=0, x3=0.6, x4=1
x3=0 x3=1
x4=0 x4=1
LP relaxation Slide 12
max 12x1 + 8x2 + 7x3 + 6x4
s:t: 8x1 + 6x2 + 5x3 + 4x4 � 15
x1 � 1; x2 � 1; x3 � 1; x4 � 1
x1 ; x2; x3; x4 � 0
LP solution: x1 � 1; x2 � 0; x3 � 0:6; x4 � 1 Pro�t�22:2
3.2.1 Branch and bound tree Slide 13
3.3 Pigeonhole Problem Slide 14
Slide 15
� There are n + 1 pigeons with n holes. We want to place the pigeons in the Slide 16
holes in such a way that no two pigeons go into the same hole.
� Let xij � 1 if pigeon i goes into hole j , 0 otherwise.
Slide 17
� Formulation 1:
P
j
� 1; i � 1; : : : ; n + 1
xij
4
Objective value=22.2
x1=1, x2=0, x3=0.6, x4=1
x3=0 x3=1
x4=0 x4=1
x1=0 x1=1
� Formulation 2:
P
j
xij � 1;
� 1; : : : ; n + 1
i
Pn+1
xij � 1; 8j
i�1
ing.
� Probing : Setting temporarily a 0-1 variable to 0 or 1 and redo the logical
tests. Force logical connection between variables. For example, if 5x + 4y + z �
8; x; y; z 2 f0; 1g, then by setting x � 1, we obtain y � 0. This leads to an
inequality x + y � 1.
5
1 2 5
4 3 6 7
4 Application
4.1 Directed TSP
4.1.1 Assignment Lower Bound
Slide 20
Given a directed graph G � (N; A) with n nodes, and a cost cij for every arc,
�nd a tour (a directed cycle that visits all nodes) of minimum cost.
Pn Pn
min �1
i j �1 cij xij
Pn
s.t. : � 1; j � 1; : : : ; n;
Pin�1
xij
j �1 ij
x � 1; i � 1; :::; n;
Slide 21
Branching:Set one of the arcs selected in the optimal solution to zero. i.e., add
constraints of the type \xij � 0" to exclude the current optimal solution.
4.2 Improving BB Slide 22
� Better LP solver
� Use problem structure to derive better branching strategy
� Better choice of lower bound b(F ) - better relaxation
� Better choice of upper bound U - heuristic to get good solution
� KEY: Start pruning the search tree as early as possible
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093: Optimization Methods
1 Outline
Slide 1
Dx � d
x integer
� X � fx integer j Dx � dg
� Optimizing over X can be done e�ciently
2.1 Formulation Slide 3
� Consider
Z (�) � min c0x + �0
(b � Ax) (D)
s:t: x2X
� For �xed �, problem can be solved e�ciently
� �
� Z (�) � mini�1;:::;m c0 xi + �0 (b � Axi ) :
� Z (�) is concave and piecewise linear
2.2 Weak Duality Slide 4
If problem (D) has an optimal solution and if � � , then Z (�) � Z
0 IP
ZD
Z(p)
p* p
� ZD � ZIP
� We need to maximize a piecewise linear concave function
Slide 6
3 Strength of LD
3.1 Main Theorem Slide 7
X � fx integer j Dx � dg: Note that CH(X ) is a polyhedron. Then
ZD � min c0 x
s:t: Ax � b
x 2 CH(X )
2
x2
CH(X)
3 . .
xD
2 .
x IP
. .
c
xL P
1 . .
0
.
1
.
2 x1
X � (1; 0); (2; 0); (1; 1); (2; 1); (0; 2);
Slide 9
For p � 0, we have
Slide 10
� �
(x ;x )2X 1 2
8
�2 + p; �
� 0 � p � 5�3;
Z (p) � � 3 � 2p; 5�3 � p � 3;
:
6 � 3p; p � 3:
�
p � 5�3, and ZD � Z (5�3) � �1�3 Slide 11
� �
Slide 12
� xD � 1�3; 4�3 , ZD � �1�3
� �
� ZLP � ZD � ZIP
Slide 13
Z(p)
0
p
CH(X ) � x j Dx � d
� � �
� If� x j Dx � d , has integer extreme points, then CH(X ) � x j Dx �
d , and therefore Z � Z D LP
4 Solution of LD
� � Slide 15
� Z (�) � mini�1;:::;m c0xi + �0 (b � Axi ) ; i.e.,
� 0 ��:
Z (�) � i�1min
;:::;m
hi + f i
4
f (x *) +s' (x -x *)
f (x)
x* x
Z (p)
f2
s
f1
p* p
f (x ) � f (x� ) + s0 (x � x� );
for all x 2 �n .
� Let f be a concave function. A vector s such that
f (x ) � f (x� ) + s0 (x � x� );
t p
t
s
t
( t)
Z p
1 5:00
�3 �9 00
:
2 2:60
�3 �1 80
:
3 0:68
1 �1 32
:
4 1:19
1 �0 81
:
5 1:60
1 �0 40
:
6 1:92
�2 �0 84
:
7 1:40
1 �0 60
:
8 1:61
1 �0 39
:
9 1:78
�2 �0 56
:
10 1:51
1 �0 49
:
t and go to Step 2.
� �
3a If � � 0, ptj+1 � max ptj + �t stj ; 0 ; 8 j:
4.2.1 Step sizes
Slide 20
� Z (pt) converges to the unconstrained maximum of Z (�), for any stepsize
sequence �t such that
1
X
� t � 1; and lim � � 0:
t!1 t
t�1
� Examples �t � 1�t
� �t � �0 �t; t � 1; 2; : : :;
� �t � Z^Djj�sZjj(2p ) �t; where � satis�es 0 � � � 1, and Z^D is an estimate of
t
t
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093: Optimization Methods
1 Outline Slide 1
� Approximation algorithms
� Local search methods
� Simulated annealing
2 Approximation algorithms
Slide 2
� Algorithm H is an �-approximation algorithm for a minimization prob
lem with optimal cost Z � , if H runs in polynomial time, and returns a
feasible solution with cost ZH :
ZH � (1 + �)Z �
� For a maximization problem
ZH � (1 � �)Z �
2.1 TSP
2.1.1 MST-heuristic Slide 3
� Triangle inequality
cij � cik + ckj ; 8 i; k; j
� Find a minimum spanning tree with cost ZT
� Construct a closed walk that starts at some node, visits all nodes, returns
to the original node, and never uses an arc outside the minimal spanning
tree
� Each arc of the spanning tree is used exactly twice
Slide 4
� Total cost of this walk is 2ZT
� Because of triangle inequality ZH � 2ZT
� But ZT � Z � , hence
ZH � 2ZT � 2Z �
1-approximation algorithm
1
2.1.2 Matching heuristic
Slide 5
� Find a minimum spanning tree. Let ZT be its cost
� Find the set of odd degree nodes. There is an even number of them. Why�
� Find the minimum matching among those nodes with cost ZM
� Adding spanning tree and minimum matching creates a Eulerian graph,
i.e., each node has even degree. Construct a closed walk
� Performance
ZH � ZT + ZM � Z � + 1�2Z � � 3�2Z �
Slide 6
� Each tour has O(n2) neighbours. Search for better solution among its
neighbourhood.
Slide 9
� Performance of 2-OPT on random Euclidean instances
3.2 Extensions
4 Extensions Slide 10
� Iterated Local Search
� Large neighbourhoods (example 3-OPT)
� Simulated Annealing
� Tabu Search
� Genetic Algorithms
4.1 Large Neighbourhoods Slide 11
� Within a small neighbourhood, the solution may be locally optimal. Maybe
by looking at a bigger neighbourhood, we can �nd a better solution.
� Increase in computational complexity
4.1.1 TSP Again
Slide 12
3-OPT: Two tours are neighbour if one can be obtained from the other by
removing three edges and introducing three new edges
5 Simulated Annealing
Slide 13
� Allow the possibility of moving to an inferior solution, to avoid being
trapped at local optimum
� Idea: Use of randomization
5.1 Algorithm
Slide 14
� Starting at x, select a random neighbour y in the neighbourhood structure
qxy � 0; qxy � 1
y2N (x)
e�(c(y)�c(x))�T ;
stay in x otherwise
5.2 Convergence
Slide 15
� We de�ne a Markov chain.
� Under natural conditions, the long run probability of �nding the chain at
state x is given by
e
�c(x)�T
P
with A � z e
�c (z )�T
� But if T is too small, it takes longer to escape from local optimal (accept
longer for the markov chain to converge to the steady state distribution
Slide 16
� T (t) � R� log(t). Convergence guaranteed, but known to be slow empiri
cally.
4
5.4 Knapsack Problem
X X n
max c x : a x � b;
n
xi 2 f0; 1g
Slide 17
i i i i
i�1 i�1
Let X � (x1; :::; xn) 2 f0; 1gn
� Neighbourhood Structure: N (X ) � fY 2 f0; 1gn : d(X; Y ) � 1g. Exactly
one entry has been changed
Slide 18
Generate random Y � (y1 ; :::; yn):
� Choose j uniformly from 1; 2; :::; n.
� yi � xi if i 6� j . yj � 1 � xj .
� Accept if Pi ai yi � b.
5.4.1 Example
Slide 19
� c�(135, 139, 149, 150, 156, 163, 173, 184, 192, 201, 210, 214, 221, 229, 240)
� a�(70, 73, 77, 80,82, 87, 90,94, 98, 106, 110, 113, 115, 118, 120)
� b � 750
� X
� � (1; 0; 1; 0; 1; 0; 1; 1; 1; 0; 0; 0; 0; 1; 1), with value 1458
Slide 20
Cooling Schedule:
� T0 � 1000
� probability of accepting a downward move is between 0.787 (ci � 240) and
0.874 (ci � 135).
� Cooling Schedule: T (t) � �T (t � 1), � � 0:999
� Number of iterations: 1000, 5000
Slide 21
Performance:
� 1000 iterations: best solutions obtained in 10 runs vary from 1441 to 1454
� 5000 iterations: best solutions obtained in 10 runs vary from 1448 to 1456.
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093 Optimization Methods
1 Outline Slide 1
1. The knapsack problem
2. The traveling salesman problem
3. The general DP framework
4. Bellman equation
5. Optimal inventory control
6. Optimal trading
7. Multiplying matrices
j �1
X n
subject to wj xj K
j �1
xj
2 f0; 1g
De�ne i
X
Ci (w) � maximize cj xj
j �1
X i
subject to wj xj w
j �1
xj 2 f0; 1g
complexity O(nK)
S nfkg
C f1g; 1 � 0:
Length of an optimal tour is
min C f1; : : :; ng; k + ck1
k
Complexity: O n22n operations
2
5 The general DP framework
Slide 7
Discrete time dynamic system described by state xk , k indexes time.
uk control to be selected at time k. uk 2 Uk (xk ).
wk randomness at time k
N time horizon
Dynamics:
xk+1 � fk (xk ; uk ; wk)
Cost function: additive over time
NX1 !
6 The DP Algorithm
Slide 9
De�ne Jk (xk ) to be the expected optimal cost starting from stage k at
state xk .
Bellman's principle of optimality
JN (xN ) � gN (xN )
Jk (xk ) �
min E g (x ; u ; w ) + Jk+1 (fk (xk ; uk ; wk))
uk 2Uk (xk ) wk k k k k
Optimal expected cost for the overall problem: J0(x0 ).
3
7 Inventory Control
Slide 10
If r(xk ) � ax2k , wk N(�k ; �k2), then
uk � ck xk + dk ; Jk (xk ) � bk x2k + fk xk + ek
If r(xk ) � p max(0; xk ) + h max(0; xk ) , then there exist Sk :
uk � Sk xk if xk � Sk
0 if xk Sk
8 Optimal trading
Slide 11
S shares of a stock to be bought within a horizon T .
t � 1; 2; : : :; T discrete trading periods.
Control: St number of shares acquired in period t at price Pt, t � 1; 2; : : :; T
T
X
Objective: minE Pt St
t�1
T
X
s:t: St � S
t�1
Dynamics:
Pt � Pt 1 + �St + �t
where � � 0, �t N(0; 1)
8.1 DP ingredients Slide 12
State: (Pt 1; Wt)
Pt 1 price realized at the previous period
Wt # of shares remaining to be purchased
Control: St number of shares purchased at time t
Randomness: �t
PT
Objective: minE t�1 Pt St
Dynamics: Pt � Pt 1 + �St + �t Wt � Wt 1 St 1; W1 �
S; WT +1 � 0
Slide 13
Note that WT +1 � 0 is equivalent to the constraint that S must be executed by
period T
JT (PT 1; WT ) �
min E [P W ] � (PT 1 + �WT )WT
ST T T T
Since WT +1 � 0 ) ST � WT
8.3 Solution Slide 15
JT 1(PT 2 ; WT 1) �
� min ET
ST �1 1 PT 1ST 1 + JT (PT 1; WT )
� min ET
ST �1 1 (PT 2 + �ST 1 + �T 1 )ST 1 +
JT PT 2 + �ST 1 + �T 1; WT 1 ST 1
ST � WT 1
1
2
S1 � S
2
J1(P0 ; W1) � �S 1
P0S + 2 1 + T
ST k � WT k + �bk 1
XT k
k+1 2ak
bk � � + ��bk 1
2ak 1 ; b0 � �
�2 b2k
ck � �2 ck 1
4ak
1
; c0 � 0
1
dk � dk 1 + ck 1��2 ; d0 � 0 :
9 Matrix multiplication
Slide 19
Matrices: Mk : nk nk+1
Objective: Find M1 M2 MN
Example: M1 M2 M3 ; M1 : 1 10, M2 : 10 1, M3 : 1 10.
M1 (M2 M3 ) 200 multiplications;
(M1 M2 )M3 20 multiplictions.
What is the optimal order for performing the multiplication�
Slide 20
6
m(i; j) optimal number of scalar multiplications for multiplying Mi : : :Mj .
m(i; i) � 0
For i � j:
m(i; j) � imin
k�j
(m(i; k) + m(k + 1; j) + ni nk+1nj +1 )
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093 Optimization Methods
1 Lecture Outline
Slide 1
� History of Nonlinear Optimization
� Portfolio Optimization
� Tra�c Assignment
� Convex optimization
2 History of Optimization
Slide 2
Fermat, 1638; Newton, 1670
df(x)
� 0
dx
Euler, 1755
rf(x) � 0
Slide 3
Lagrange, 1797
1950s Applications.
1
3 Where do NLPs Arise�
� Equilibrium Problems
Slide 5
� Engineering
Pattern Classi�cation
� Manufacturing
Resource Allocation
Production Planning
4 A Simple Portfolio
Selection Problem
4.1 Data
Slide 6
� xi : decision variable on amount to invest in stock i � 1; 2
Data:
2
5 A Simple Portfolio
Selection Problem
� No short sales
Slide 8
subject to
X
xi � B
X X
i i
xi � 0; i � 1; 2
6 A Real Portfolio
Optimization Problem
6.1 Data
Slide 9
� We currently own zi shares from stock i, i 2 S
� We consider buying and selling stocks in S, and consider buying new stocks
from a set B (B \ S � ;)
Slide 10
� Data: Forecasted prices next period (say next month) and their correla
tions:
E[P^i] � �i
� (�1; : : :; �n)0
; � � �ij
3
6.2 Issues and Objectives Slide 11
� Mutual funds regulations: we cannot sell a stock if we do not own it
� Transaction costs
� Turnover
� Liquidity
� Volatility
� Objective: Maximize expected wealth next period minus transaction
costs
6.3 Decision variables Slide 12
# shares bought or sold if i 2 S
�
xi � # shares bought if i 2 B
By convention:
xi � 0 buy
xi � 0 sell
4
6.6 Turnover Slide 15
� Because of transaction costs: jxij should be small
jxij � �i ) ��i � xi � �i
� Alternatively, we might want to bound turnover:
X n
Pi jxij � t
i�1
6.7 Balanced portfolios Slide 16
� Need the value of stocks we buy and sell to balance out:
� �
�Xn �
n
X
�
�
�
Pixi � L ) �L ��
�
Pixi � L
i�1 i�1
� No short sales:
zi + xi � 0; i2 B[S
6.8 Expected value
and Volatility Slide 17
� Expected value of portfolio:
"
n #
n
P^i (zi + xi) � �i (zi + xi)
X X
E
i�1 i�1
� Variance of the value of the portfolio:
"
n #
P^i(zi + xi ) � (z + x)0�(z + x)
X
V ar
i�1
6.9 Overall formulation Slide 18
n n
X
X
i�1 i�1
s:t: (z + x) �(z + x) �2 0
zi + xi �i zitotal
�i xi �i
n
X
L Pixi L
i�1
n
X
Pi jxi j t
i�1
z i + xi 0
5
7 The general problem
Slide 19
f(x): �n !
7 �
gi (x): � 7! �; i � 1; : : :; m
n
.
.
gm (x) � 0
i�1 i�1
s:t: (z + x) �(z + x) �2
0
zi + xi �i zitotal
xi �i
�i
n
X
L Pixi L
i�1
n
X
Pi jxi j t
i�1
z i + xi 0
8 Geometry Problems
8.1 Fermat-Weber Problem Slide 21
Given m points c1 ; : : :; cm 2 �n (locations of retail outlets) and weights w1; : : :; wm 2
�. Choose the location of a distribution center.
That is, the point x 2 �n to minimize the sum of the weighted distances from
x to each of the points c1; : : :; cm 2 �n (minimize total daily distance
traveled).
m
min wijjx � cijj
P
i�1 n
s:t: x 2 �
or
m
min wijjx � cijj
P
i�1
s:t: x � 0
Ax � b; feasible sites
(Linearly constrained NLP)
8.2 The Ball Circumscription Problem Slide 22
Given m points c1; : : :; cm 2 �n , locate a distribution center at point x 2 �n to
minimize the maximum distance from x to any of the points c1 ; : : :; cm 2 �n.
min �
s:t: jjx � ci jj � �; i � 1; : : :; m
9 Transportation
9.1 Tra�c Assignment Slide 23
� ODPw, paths p 2 Pw , demand dw , xp : �ow of p
cij ( p: crossing (i;j ) xp ): travel cost of link (i; j).
cp (x) is the travel cost of path p and
X
cp (x) � cij (xij ); 8p 2 Pw ; 8w 2 W:
(i;j ) on p
System � optimization principle : Assign �ow on each path to satisfy
total demand and so that the total network cost is minimized.
X
s:t: xp � 0; xp � dw ; 8w
p2Pw
x�p1 � 6; x�p2 � 4; x�
p3 � 0
Slide 25
7
� User � optimization principle : Each user of the network chooses, among
all paths, a path requiring minimum travel cost,
i.e., for all w 2 W and p 2 Pw ,
x�p � 0 : �! cp (x� ) � cp (x� ) 8p0 2 Pw ; 8w 2 W
0
10 Optimal Routing
Slide 26
� Given a data net and a set W of OD pairs w � (i; j) each OD pair w has
input tra�c dw
� Optimal routing problem:
X X
X
s:t: xp � dw ; 8w 2 W
p2Pw
xp � 0; 8p 2 Pw ; w 2 W
.
.
gm (x) � 0
h1(x) � 0
.
.
hl (x) � 0
9
13 Convex Functions Slide 32
� A function f(x) is a convex function if
f(�x + (1 � �)y) � �f(x ) + (1 � �)f(y )
8x; y 8� 2 [0; 1]
1
n
10
The Hessian matrix is the matrix of second partial derivatives:
H(x� )ij � @@xf(@xx� )
i j
i�1
14 Convex Optimization
14.1 Convexity and Minima Slide 40
min f(x)
s.t. x2F
Theorem: Suppose that F is a convex set, f : F ! � is a convex function, and
x� is a local minimum of P. Then x� is a global minimum of f over F .
14.1.1 Proof Slide 41
Assume that x� is not the global minimum. Let y be the global minimum.
From the convexity of f(�),
f(y (�)) � f(�x � + (1 � �)y ) � �f(x � ) + (1 � �)f(y )
� �f(x� ) + (1 � �)f(x� ) � f(x� )
11
for all � 2 (0; 1).
Therefore, f(y (�)) � f(x� ) for all � 2 (0; 1), and so x� is not a local minimum,
resulting in a contradiction
14.2 COP Slide 42
COP : min f(x)
s:t: g1(x) � 0
..
.
gm (x) � 0
Ax � b
Slide 43
COP is called a convex optimization problem if f(x); g1(x); : : :; gm (x) are con
vex functions
Note that this implies that the feasible region F is a convex set
In COP we are minimizing a convex function over a convex set
Implication: If COP is a convex optimization problem, then any local minimum
will be a global minimum.
15 Examples of COPs
Slide 44
The Fermat-Weber Problem - COP
m
min
P
i�1
wijjx � cijj
s:t: x 2 P
The Ball Circumscription Problem - COP
min �
s:t: jjx � ci jj � �; i � 1; : : :; m
12
max �i (zi + xi ) 2
i�1 i�1
s:t: (z + x) �(z + x) �2
0
zi + xi �i zitotal
�i xi �i
n
X
L L
Pixi
i�1
n
X
Pi jxi j t
i�1
z i + xi 0
i � 1; : : :; m
This is a COP
di � 0;
i � 1; : : :; m
� Separable: f(x) � P
j fj (xj ), gi(x) � P
j gij (xj )
13
� Computation of minima via iterative algorithms
Iterative descent Methods
Interior Point Methods
18 Summary
� Convex optimization is a powerful modeling framework
14
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
1
15.093 Optimization Methods
Gradient Methods
1 Outline
Slide 1
1. Necessary and sufficient optimality conditions
2. Gradient methods
3. The steepest descent algorithm
4. Rate of convergence
5. Line search algorithms
2 Optimality Conditions
Slide 2
Necessary Conds for Local Optima
“If x̄ is local optimum then x̄ must satisfy ...”
3 Optimality Conditions
3.1 Necessary conditions
Slide 3
Consider
min f (x)
x∈ℜn
Zero first order variation along all directions
Theorem
Let f (x) be continuously differentiable.
If x∗ ∈ ℜn is a local minimum of f (x), then
3.2 Proof
Slide 4
Zero slope at local min x∗
• f (x∗ ) ≤ f (x∗ + λd) for all d ∈ ℜn , λ ∈ ℜ
1
• Pick λ > 0
f (x∗ + λd) − f (x∗ )
0≤
λ
• Take limits as λ → 0
0 ≤ ∇f (x∗ )′ d, ∀d ∈ ℜn
1 2 ′ 2
= λ d ∇ f (x∗ )d + λ2 ||d||2 R(x∗ ; λd) ⇒
2
f (x∗ + λd) − f (x∗ ) 1
= d′ ∇2 f (x∗ )d + ||d||2 R(x∗ ; λd)
λ2 2
If ∇2 f (x∗ ) is not PSD, ∃d¯: d¯ ′ ∇2 f (x∗ )d¯ < 0 ⇒ f (x∗ + λd)¯ < f (x̄), ∀λ
suff. small QED.
3.3 Example
Slide 6
f (x) = 12 x21 + x1 .x2 + 2x22 − 4x1 − 4x2 − x3
2
∇f (x) = (x�1 + x2 − 4, x1 + � 4x2 − 4 − 3x22 ) Candidates x∗ = (4, 0) and x̄ = (3, 1)
1 1
∇2 f (x) =
� 1 4 −�6x2
1 1
∇2 f (x∗ ) =
1 4
PSD
Slide 7
x̄ = (3, 1) � �
1 1
∇2 f (x̄) =
1 −2
Indefinite matrix
x∗ is the only candidate for local min
2
3.5 Example Continued...
Slide 9
At x∗ = (4,�0), ∇f (x∗ ) = 0 �and
1 1
∇2 f (x) =
1 4 − 6x2
is PSD for x ∈ B(x∗ , ǫ) Slide 10
f (x) = x31 + 2 2 ∗
� x2 and ∇f�(x) = (3x1 , 2x2 ) x = (0, 0)
6x1 0
∇2 f (x) = is not PSD in B(0, ǫ)
0 2
f (−ǫ, 0) = −ǫ3 < 0 = f (x∗ )
3.7 Proof
Slide 12
By convexity
f (λx + (1 − λ)x) ≤ λf (x) + (1 − λ)f (x)
f (x + λ(x − x)) − f (x)
≤ f (x) − f (x)
λ
As λ → 0,
∇f (x)′ (x − x) ≤ f (x) − f (x)
∇f (x∗ ) = 0
3
3.9 Descent Directions
Slide 14
Interesting Observation
f diff/ble at x̄
∃d: ∇f (x̄)′ d < 0 ⇒ ∀λ > 0, suff. small, f (x̄ + λd) < f (x̄)
3.10 Proof
Slide 15
f (x̄ + λd) = f (x̄) + λ∇f (x̄)t d + λ||d||R(x̄, λd)
where R(x̄, λd) −→λ→0 0
f (x̄ + λd) − f (x̄)
= ∇f (x̄)t d + ||d||R(x̄, λd)
λ
∇f (x̄)t d < 0, R(x̄, λd) −→λ→0 0 ⇒
∀λ > 0 suff. small f (x̄ + λd) < f (x̄). QED
5 Gradient Methods
5.1 A generic algorithm
Slide 17
• xk+1 = xk + λk dk
• If ∇f (xk ) 6= 0, direction dk satisfies:
∇f (xk )′ dk < 0
• Step-length λk > 0
• Principal example:
xk+1 = xk − λk Dk ∇f (xk )
4
5.2 Principal directions
Slide 18
• Steepest descent:
xk+1 = xk − λk ∇f (xk )
• Newton’s method:
6 Steepest descent
6.1 The algorithm
Slide 20
Step 0 Given x0 , set k := 0.
Step 1 dk := −∇f (xk ). If ||dk || ≤ ǫ, then stop.
Step 2 Solve minλ h(λ) := f (xk + λdk ) for the
step-length λk , perhaps chosen by an exact
or inexact line-search.
Step 3 Set xk+1 ← xk + λk dk , k ← k + 1.
Go to Step 1.
6.2 An example
Slide 21
minimize f (x1 , x2 ) = 5x21 + x22 + 4x1 x2 − 14x1 − 6x2 + 20
x∗ = (x∗1 , x∗2 )′ = (1, 1)′
f (x∗ ) = 10 Slide 22
k
Given x
−10xk1 − 4xk2 + 14 dk1
� � � �
dk = −∇f (xk1 , xk2 ) = =
−2xk2 − 4xk1 + 6 dk2
h(λ) = f (xk + λdk )
= 5(xk1 + λdk1 )2 + (xk2 + λdk2 )2 + 4(xk1 + λdk1 )(xk2 + λdk2 )−
−14(xk1 + λdk1 ) − 6(xk2 + λdk2 ) + 20
5
(dk1 )2 + (dk2 )2
λk = Slide 23
2(5(dk1 )2
+ (dk2 )2 + 4dk1 dk2 )
Start at x = (0, 10)′
ε = 10−6
k x1k x2k d1k d2k ||dk ||2 λk f (xk )
1 0.000000 10.000000 −26.000000 −14.000000 29.52964612 0.0866 60.000000
2 −2.252782 8.786963 1.379968 −2.562798 2.91071234 2.1800 22.222576
3 0.755548 3.200064 −6.355739 −3.422321 7.21856659 0.0866 12.987827
4 0.204852 2.903535 0.337335 −0.626480 0.71152803 2.1800 10.730379
5 0.940243 1.537809 −1.553670 −0.836592 1.76458951 0.0866 10.178542
6 0.805625 1.465322 0.082462 −0.153144 0.17393410 2.1800 10.043645
7 0.985392 1.131468 −0.379797 −0.204506 0.43135657 0.0866 10.010669
8 0.952485 1.113749 0.020158 −0.037436 0.04251845 2.1800 10.002608
9 0.996429 1.032138 −0.092842 −0.049992 0.10544577 0.0866 10.000638
10 0.988385 1.027806 0.004928 −0.009151 0.01039370 2.1800 10.000156
Slide 24
k   x1k   x2k   d1k   d2k   ||dk ||2   λk   f (xk )
11 0.999127 1.007856 −0.022695 −0.012221 0.02577638 0.0866 10.000038
12 0.997161 1.006797 0.001205 −0.002237 0.00254076 2.1800 10.000009
13 0.999787 1.001920 −0.005548 −0.002987 0.00630107 0.0866 10.000002
14 0.999306 1.001662 0.000294 −0.000547 0.00062109 2.1800 10.000001
15 0.999948 1.000469 −0.001356 −0.000730 0.00154031 0.0866 10.000000
16 0.999830 1.000406 0.000072 −0.000134 0.00015183 2.1800 10.000000
17 0.999987 1.000115 −0.000332 −0.000179 0.00037653 0.0866 10.000000
18 0.999959 1.000099 0.000018 −0.000033 0.00003711 2.1800 10.000000
19 0.999997 1.000028 −0.000081 −0.000044 0.00009204 0.0866 10.000000
20 0.999990 1.000024 0.000004 −0.000008 0.00000907 2.1803 10.000000
21 0.999999 1.000007 −0.000020 −0.000011 0.00002250 0.0866 10.000000
22 0.999998 1.000006 0.000001 −0.000002 0.00000222 2.1817 10.000000
23 1.000000 1.000002 −0.000005 −0.000003 0.00000550 0.0866 10.000000
24 0.999999 1.000001 0.000000 −0.000000 0.00000054 0.0000 10.000000
Slide 25
[Figure: surface plot of 5x² + 4y² + 3xy + 7x + 20]
Slide 26
[Figure: the zig-zag path of steepest descent over a bounded set]
• Perform line-search of h(λ) = f (xk + λdk )
8 Rate of convergence of algorithms
Slide 30
Let z1 , . . . , zn , . . . → z be a convergent sequence. We say that the order of
convergence of this sequence is p∗ if
p∗ = sup{ p : limsup_{k→∞} |zk+1 − z| / |zk − z|^p < ∞ }
Let
β = limsup_{k→∞} |zk+1 − z| / |zk − z|^{p∗}
8.2 Examples
Slide 32
• zk = a^k , 0 < a < 1, converges linearly to zero, with β = a
• zk = a^(2^k) , 0 < a < 1, converges quadratically to zero
• zk = 1/k converges sublinearly to zero
• zk = (1/k)^k converges superlinearly to zero
• An algorithm exhibits linear convergence if there is a constant δ < 1 such that
(f (xk+1 ) − f (x∗ )) / (f (xk ) − f (x∗ )) ≤ δ
8.3.1 Discussion
Slide 34
(f (xk+1 ) − f (x∗ )) / (f (xk ) − f (x∗ )) ≤ δ < 1
• If δ = 0.1, every iteration adds another digit of accuracy to the optimal
objective value.
9 Rate of convergence of steepest descent
9.1 Quadratic Case
9.1.1 Theorem
Slide 35
Suppose f (x) = ½ x′ Qx − c′ x, where Q is PSD.
9.1.2 Discussion
Slide 36
(f (xk+1 ) − f (x∗ )) / (f (xk ) − f (x∗ )) ≤ ((λmax /λmin − 1) / (λmax /λmin + 1))²
• κ(Q) := λmax /λmin is the condition number of Q
• κ(Q) ≥ 1
• κ(Q) plays an extremely important role in analyzing computation involving Q
Slide 37
(f (xk+1 ) − f (x∗ )) / (f (xk ) − f (x∗ )) ≤ ((κ(Q) − 1) / (κ(Q) + 1))²
κ(Q) = λmax /λmin   Convergence constant δ   Upper bound on iterations to reduce the optimality gap by 0.10
1.1 0.0023 1
3.0 0.25 2
10.0 0.67 6
100.0 0.96 58
200.0 0.98 116
400.0 0.99 231
Slide 38
For κ(Q) ∼ O(1) converges fast.
For large κ(Q)
((κ(Q) − 1) / (κ(Q) + 1))² ∼ (1 − 1/κ(Q))² ∼ 1 − 2/κ(Q)
Therefore
f (xk ) − f (x∗ ) ≤ (1 − 2/κ(Q))^k (f (x0 ) − f (x∗ ))
In k ∼ ½ κ(Q)(− ln ǫ) iterations, the method finds xk with f (xk ) − f (x∗ ) ≤ ǫ (f (x0 ) − f (x∗ )).
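These constants are easy to check numerically; a small sketch (the helper name kappa_delta is illustrative):

```python
import numpy as np

def kappa_delta(Q):
    """Condition number of a symmetric PD matrix and the bound ((k-1)/(k+1))^2."""
    eigs = np.linalg.eigvalsh(Q)            # eigenvalues, ascending
    kappa = eigs[-1] / eigs[0]
    return kappa, ((kappa - 1) / (kappa + 1)) ** 2

# Q of Example 3 below: kappa ~ 1.854, delta ~ 0.0896
print(kappa_delta(np.array([[20.0, 5.0], [5.0, 16.0]])))
```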
9.2 Example 2
Slide 39
f (x) = ½ x′ Qx − c′ x + 10
Q = [20, 5; 5, 2],   c = (14, 6)′
κ(Q) = 30.234
δ = ((κ(Q) − 1)/(κ(Q) + 1))² = 0.8760
Slide 40
k   x1k   x2k   ||dk ||2   λk   f (xk )   (f (xk ) − f (x∗ ))/(f (xk−1 ) − f (x∗ ))
1 40.000000 −100.000000 286.06293014 0.0506 6050.000000
2 25.542693 −99.696700 77.69702948 0.4509 3981.695128 0.658079
3 26.277558 −64.668130 188.25191488 0.0506 2620.587793 0.658079
4 16.763512 −64.468535 51.13075844 0.4509 1724.872077 0.658079
5 17.247111 −41.416980 123.88457127 0.0506 1135.420663 0.658079
6 10.986120 −41.285630 33.64806192 0.4509 747.515255 0.658079
7 11.304366 −26.115894 81.52579489 0.0506 492.242977 0.658079
8 7.184142 −26.029455 22.14307211 0.4509 324.253734 0.658079
9 7.393573 −16.046575 53.65038732 0.0506 213.703595 0.658079
10 4.682141 −15.989692 14.57188362 0.4509 140.952906 0.658079
Slide 41
[The iteration table continues.]
Slide 42
9.3 Example 3
Slide 43
f (x) = ½ x′ Qx − c′ x + 10
Q = [20, 5; 5, 16],   c = (14, 6)′
κ(Q) = 1.8541
δ = ((κ(Q) − 1)/(κ(Q) + 1))² = 0.0896
Slide 44
k   x1k   x2k   ||dk ||2   λk   f (xk )   (f (xk ) − f (x∗ ))/(f (xk−1 ) − f (x∗ ))
[The iteration table continues.]
Slide 45
9.4 Empirical behavior
Slide 46
• The convergence constant bound is not just theoretical. It is typically
experienced in practice.
Slide 47
10 Summary
Slide 48
1. Optimality Conditions
2. The steepest descent algorithm - Convergence
3. Rate of convergence of Steepest Descent
15.093 Optimization Methods
• Characterizations of Convexity
a. f convex ⇔ f (y) ≥ f (x) + ∇f (x)′ (y − x) ∀x, y
b. f convex ⇔ ∇2 f (x) PSD ∀x
• f convex: x∗ is a global minimum ⇔ ∇f (x∗ ) = 0
2 Steepest descent
2.1 The algorithm
Slide 2
Step 0 Given x0 , set k := 0.
Step 1 dk := −∇f (xk ). If ||dk || ≤ ǫ, then stop.
Step 2 Solve minλ h(λ) := f (xk + λdk ) for the step-length λk , perhaps chosen by an exact or inexact line-search.
Step 3 Set xk+1 ← xk + λk dk , k ← k + 1. Go to Step 1.
3 Outline
Slide 3
1. Bisection Method - Armijo’s Rule
2. Motivation for Newton’s method
3. Newton’s method
4. Quadratic rate of convergence
5. Modification for global convergence
• Diminishing stepsize: λk → 0, Σk λk = ∞
• Armijo Rule
4.1.2 Algorithm
Slide 6
Step 0. Set k = 0. Set λL := 0 and λU := λ̂.
Step k. Set λ̃ = (λU + λL )/2 and compute h′ (λ̃).
If h′ (λ̃) > 0, re-set λU := λ̃. Set k ← k + 1.
If h′ (λ̃) < 0, re-set λL := λ̃. Set k ← k + 1.
If h′ (λ̃) = 0, stop.
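A minimal Python sketch of this bisection scheme (bracket and tolerance illustrative), using the derivative h′ (λ) of the example in 4.1.4 below:

```python
def bisection_line_search(hp, lam_hat, tol=1e-10, max_iter=100):
    """Bisection on h'(lam); assumes h'(0) < 0 and h'(lam_hat) > 0."""
    lo, hi = 0.0, lam_hat
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        g = hp(mid)
        if abs(g) <= tol:
            break
        if g > 0:
            hi = mid          # minimizer lies to the left
        else:
            lo = mid          # minimizer lies to the right
    return mid

hp = lambda l: (-17.4225 + 1/(1 - l) - 0.6/(1 + 0.6*l) + 4/(1 - 4*l)
                + 0.25/(1 - 0.25*l) - 4.65/(1 + 4.65*l))
print(bisection_line_search(hp, 0.24))    # ~ 0.1970268
```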
4.1.3 Analysis
Slide 7
• At the kth iteration of the bisection algorithm, the length of the current interval [λL , λU ] is
length = (½)^k λ̂.
4.1.4 Example
Slide 8
h′ (λ) = −17.4225 + 1/(1 − λ) − 0.6/(1 + 0.6λ) + 4/(1 − 4λ) + 0.25/(1 − 0.25λ) − 4.65/(1 + 4.65λ)
Slide 10
[Figure: plot of h′ (λ) on (0, 0.25)]
Slide 11
k λkl λku h′ (λ̃)
1 0.0000000 1.0000000 NaN
2 0.0000000 0.5000000 NaN
3 0.0000000 0.2500000 −11.520429338348
4 0.1250000 0.2500000 −2.952901763683
5 0.1875000 0.2500000 13.286386294218
6 0.1875000 0.2187500 2.502969022220
7 0.1875000 0.2031250 −0.605144021505
8 0.1953125 0.2031250 0.831883373151
9 0.1953125 0.1992188 0.087369215988
10 0.1953125 0.1972656 −0.265032213496
Slide 12
k λkl λku h′ (λ̃)
20 0.1970253 0.1970272 −0.000184301091
30 0.1970268 0.1970268 −0.000000146531
40 0.1970268 0.1970268 0.000000000189
41 0.1970268 0.1970268 0.000000000023
42 0.1970268 0.1970268 −0.000000000059
43 0.1970268 0.1970268 −0.000000000018
44 0.1970268 0.1970268 0.000000000003
45 0.1970268 0.1970268 −0.000000000008
46 0.1970268 0.1970268 −0.000000000002
47 0.1970268 0.1970268 0.000000000000
Slide 13
Bisection is accurate but may be expensive in practice.
Armijo’s rule is an inexact line search method. It requires two parameters: ǫ ∈ (0, 1) and σ > 1. A step λ is acceptable if h(λ) ≤ h(0) + ǫλh′ (0); starting from a trial step, multiply by σ while the step remains acceptable, and divide by σ while it is not.
5 Newton’s method
5.1 History
Slide 15
Steepest Descent is simple but slow.
Raphson became a member of the Royal Society in 1691 for his book “Analysis …”
6 Newton’s method
6.1 Motivation
Slide 16
Consider
min f (x)
• Taylor series expansion around x̄
f (x) ≈ g(x) = f (x̄) + ∇f (x̄)′ (x − x̄) + ½ (x − x̄)′ ∇2 f (x̄)(x − x̄)
• Instead of min f (x), solve min g(x), i.e., ∇g(x) = 0
• ∇f (x̄) + ∇2 f (x̄)(x − x̄) = 0
⇒ x − x̄ = −(∇2 f (x̄))−1 ∇f (x̄)
• Newton iteration: xk+1 = xk − (∇2 f (xk ))−1 ∇f (xk )
6.3 Comments
Slide 18
• The method assumes that ∇2 f (xk ) is nonsingular at each iteration
• There is no guarantee that f (xk+1 ) ≤ f (xk )
• We can augment the algorithm with a line-search: xk+1 = xk + λk dk , with dk = −(∇2 f (xk ))−1 ∇f (xk )
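A minimal sketch of the pure Newton iteration in one dimension, applied to Example 1 below (f (x) = 7x − log x; the helper name is illustrative):

```python
def newton_1d(fp, fpp, x0, tol=1e-12, max_iter=50):
    """Pure Newton iteration x <- x - f'(x)/f''(x), no line search."""
    x = x0
    for _ in range(max_iter):
        step = fp(x) / fpp(x)
        x -= step
        if abs(step) <= tol:
            break
    return x

# f(x) = 7x - log x, so f'(x) = 7 - 1/x and f''(x) = 1/x^2
print(newton_1d(lambda x: 7 - 1/x, lambda x: 1/x**2, 0.1))  # -> 1/7
```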
6.4 Properties
Slide 19
Theorem If H = ∇2 f (xk ) is PD, then dk is a descent direction: ∇f (xk )′ dk < 0
Proof:
∇f (xk )′ dk = −∇f (xk )′ H −1 ∇f (xk ) < 0
if H −1 is PD. But,
0 < v ′ (H −1 )′ HH −1 v = v ′ H −1 v
⇒ H −1 is PD
6.5 Example 1
Slide 20
f (x) = 7x − log x,   x∗ = 1/7 = 0.142857143
f ′ (x) = 7 − 1/x ,   f ′′ (x) = 1/x²
⇒ d = −(f ′′ (x))−1 f ′ (x) = −x² (7 − 1/x) = x − 7x²
Slide 21
k     x0 = 1      x0 = 0   x0 = 0.01     x0 = 0.1
0     1           0        0.01          0.1
1     −5          0        0.0193        0.13
2     −185        0        0.03599       0.1417
3     −239945     0        0.062917      0.14284777
4     −4E11       0        0.098124      0.142857142
5     −112E22     0        0.128849782   0.142857143
6                 0        0.141483700   0.142857143
7                 0        0.142843938   0.142857143
8                 0        0.142857142   0.142857143
6.6 Example 2
Slide 22
f (x1 , x2 ) = − log(1 − x1 − x2 ) − log x1 − log x2
∇f (x1 , x2 ) = ( 1/(1 − x1 − x2 ) − 1/x1 ,  1/(1 − x1 − x2 ) − 1/x2 )′
∇2 f (x1 , x2 ) = [ (1/(1 − x1 − x2 ))² + (1/x1 )² ,  (1/(1 − x1 − x2 ))² ;
                   (1/(1 − x1 − x2 ))² ,  (1/(1 − x1 − x2 ))² + (1/x2 )² ]
Slide 23
(x1∗ , x2∗ ) = (1/3, 1/3),   f (x1∗ , x2∗ ) = 3.295836867
k xk1 xk2 ||x − x∗ ||
0 0.85 0.05 0.58925565
1 0.7170068 0.09659864 0.45083106
2 0.5129752 0.17647971 0.23848325
3 0.35247858 0.27324878 0.06306103
4 0.33844902 0.32623807 0.00874717
5 0.33333772 0.33325933 7.4133E-05
6 0.33333334 0.33333333 1.1953E-08
7 0.33333333 0.33333333 1.5701E-16
7 Quadratic convergence
Slide 24
• Recall: β = limsup_{n→∞} |zn+1 − z| / |zn − z|^{p∗}
7.2 Theorem
Slide 28
Suppose
||x − x∗ || ≤ min{ β, 2h/(3L) } = γ.
Let x̄ be the next iterate of Newton’s method. Then
||x̄ − x∗ || ≤ (3L/(2h)) ||x − x∗ ||²  and  ||x̄ − x∗ || ≤ ||x − x∗ ||.
7.3 Proof
Slide 29
x̄ − x∗ = x − H(x)−1 g(x) − x∗
= x − x∗ + H(x)−1 (g(x∗ ) − g(x))    (since g(x∗ ) = 0)
= x − x∗ + H(x)−1 ∫₀¹ H(x + t(x∗ − x))(x∗ − x) dt    (Lemma 1)
= H(x)−1 ∫₀¹ (H(x + t(x∗ − x)) − H(x)) (x∗ − x) dt
Slide 30
||x̄ − x∗ || ≤ ||H(x)−1 || ∫₀¹ ||H(x + t(x∗ − x)) − H(x)|| · ||x − x∗ || dt
≤ ||H(x)−1 || · ||x − x∗ || ∫₀¹ L ||x − x∗ || t dt
= ½ L ||H(x)−1 || · ||x − x∗ ||²
Slide 31
Now
h ≤ ||H(x∗ )|| = ||H(x) + H(x∗ ) − H(x)|| ≤ ||H(x)|| + ||H(x∗ ) − H(x)|| ≤ ||H(x)|| + L||x∗ − x||
⇒ ||H(x)|| ≥ h − L||x − x∗ ||
Slide 32
⇒ ||H(x)−1 || ≤ 1/||H(x)|| ≤ 1/(h − L||x − x∗ ||)
⇒ ||x̄ − x∗ || ≤ (L/(2(h − L||x − x∗ ||))) ||x − x∗ ||²
≤ (L/(2(h − (2/3)h))) ||x − x∗ ||²
= (3L/(2h)) ||x − x∗ ||²
7.4 Lemma 1
Slide 33
• Fix w. Let φ(t) = g(x + t(x∗ − x))′ w
• φ(0) = g(x)′ w, φ(1) = g(x∗ )′ w
• φ′ (t) = w ′ H(x + t(x∗ − x))(x∗ − x)
• φ(1) = φ(0) + ∫₀¹ φ′ (t) dt ⇒
Slide 34
∀w : w′ (g(x∗ ) − g(x)) = ∫₀¹ w′ H(x + t(x∗ − x))(x∗ − x) dt
⇒ g(x∗ ) − g(x) = ∫₀¹ H(x + t(x∗ − x))(x∗ − x) dt
Let rk = ||xk − x∗ || and C = 3L/(2h). If ||x0 − x∗ || < γ, then
rk ≤ (1/C)(C r0 )^(2^k)
Slide 37
Proposition: One can choose a diagonal matrix Dk such that ∇2 f (xk ) + Dk is PD.
10 Summary
Slide 40
1. Line search methods:
Bisection Method.
Armijo’s Rule.
15.093 Optimization Methods
1 Outline
Slide 1
1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications
2.2 Motivation
Slide 3
Given Q-conjugate directions d1 , . . . , dn and xk , compute
min_α f (xk + αdk ) = c′ (xk + αdk ) + ½ (xk + αdk )′ Q(xk + αdk )
= f (xk ) + α ∇f (xk )′ dk + ½ α² dk′ Qdk
Solution:
α̂k = −∇f (xk )′ dk / (dk′ Qdk ),    xk+1 = xk + α̂k dk
Moreover, xn+1 = x∗ .
2.4 The Algorithm
Slide 5
Step 0 Given x1 , set k := 1, d1 = −∇f (x1 )
Step 1 For k = 1, . . . , n do:
If ||∇f (xk )|| ≤ ǫ, stop; else:
α̂k = argminα f (xk + αdk ) = −∇f (xk )′ dk / (dk′ Qdk )
xk+1 = xk + α̂k dk
dk+1 = −∇f (xk+1 ) + λk dk ,    λk = ∇f (xk+1 )′ Qdk / (dk′ Qdk )
2.5 Correctness
Slide 6
Theorem: The directions d1 , . . . , dn are Q-conjugate; hence the algorithm finds an optimal solution in at most n steps.
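For a quadratic the whole method is a few lines of NumPy; a sketch (the 2×2 test instance is illustrative, not the 10×10 example below):

```python
import numpy as np

def conjugate_gradient(Q, c, x0, eps=1e-10):
    """CG for min f(x) = 0.5 x'Qx - c'x; finishes in at most n steps."""
    x = x0.astype(float)
    g = Q @ x - c                  # gradient
    d = -g                         # first direction: steepest descent
    for _ in range(len(c)):
        if np.linalg.norm(g) <= eps:
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)      # exact line search
        x = x + alpha * d
        g = Q @ x - c
        lam = (g @ Qd) / (d @ Qd)        # keep d_{k+1} Q-conjugate to d_k
        d = -g + lam * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([1.0, 2.0])
print(conjugate_gradient(Q, c, np.zeros(2)))   # solves Qx = c in <= 2 steps
```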
2.7 Example
Slide 8
f (x) = ½ x′ Qx − c′ x

     ⎡ 35 19 22 28 16  3 16  6  4  4 ⎤        ⎡ −1 ⎤
     ⎢ 19 43 33 19  5  2  5  4  0  0 ⎥        ⎢  0 ⎥
     ⎢ 22 33 40 29 12  7  6  2  2  4 ⎥        ⎢  0 ⎥
     ⎢ 28 19 29 39 16  7 14  6  2  4 ⎥        ⎢ −3 ⎥
Q =  ⎢ 16  5 12 16 12  4  8  2  4  8 ⎥ ,  c = ⎢  0 ⎥
     ⎢  3  2  7  7  4  5  1  0  1  4 ⎥        ⎢ −2 ⎥
     ⎢ 16  5  6 14  8  1 12  2  2  4 ⎥        ⎢  0 ⎥
     ⎢  6  4  2  6  2  0  2  4  0  0 ⎥        ⎢ −6 ⎥
     ⎢  4  0  2  2  4  1  2  0  2  4 ⎥        ⎢ −7 ⎥
     ⎣  4  0  4  4  8  4  4  0  4 16 ⎦        ⎣ −4 ⎦

κ(Q) ≈ 17,641,    δ = ((κ(Q) − 1)/(κ(Q) + 1))² = 0.999774
Slide 9
k   f (xk )   f (xk ) − f (x∗ )   (f (xk ) − f (x∗ ))/(f (xk−1 ) − f (x∗ ))
1 12.000000 2593.726852 1.000000
2 8.758578 2590.485430 0.998750
3 1.869218 2583.596069 0.997341
4 −12.777374 2568.949478 0.994331
5 −30.479483 2551.247369 0.993109
6 −187.804367 2393.922485 0.938334
7 −309.836907 2271.889945 0.949024
8 −408.590428 2173.136424 0.956532
9 −754.887518 1826.839334 0.840646
10 −2567.158421 14.568431 0.007975
11 −2581.711672 0.015180 0.001042
12 −2581.726852 −0.000000 −0.000000
For general (non-quadratic) f , replace the exact formulas by a line search and the Fletcher–Reeves update:
α̂k = argminα f (xk + αdk )
xk+1 = xk + α̂k dk
dk+1 = −∇f (xk+1 ) + λk dk ,    λk = ||∇f (xk+1 )||² / ||∇f (xk )||²
Step 2 k ← k + 1, go to Step 1
3 Necessary Optimality Conditions
3.1 Nonlinear Optimization
Slide 11
min f (x)
s.t. gj (x) ≤ 0, j = 1, . . . , p
hi (x) = 0, i = 1, . . . , m
P = {x| gj (x) ≤ 0, j = 1, . . . , p,
hi (x) = 0, i = 1, . . . , m}
3.2 The KKT conditions
Slide 12
Discovered by Karush, Kuhn, and Tucker in the 1950s.
Theorem
If
• x is local minimum of P
• I = {j| gj (x) = 0}, set of tight constraints
• Constraint qualification condition (CQC): the vectors ∇gj (x), j ∈ I, and ∇hi (x), i = 1, . . . , m, are linearly independent
Slide 13
Then, there exist vectors (u, v):
1. ∇f (x) + Σj uj ∇gj (x) + Σi vi ∇hi (x) = 0
2. u ≥ 0
3. uj gj (x) = 0, j = 1, . . . , p
3.4 Example 1
Slide 16
min f (x) = (x1 − 12)2 + (x2 + 6)2
s.t. h1 (x) = 8x1 + 4x2 − 20 = 0
g1 (x) = x21 + x22 + 3x1 − 4.5x2 − 6.5 ≤ 0
g2 (x) = (x1 − 9)2 + x22 − 64 ≤ 0
x = (2, 1)′ ;  g1 (x) = 0, g2 (x) = −14, h1 (x) = 0.
Slide 17
• I = {1}
• ∇f (x) = (−20, 14)′ ;  ∇g1 (x) = (7, −2.5)′
• ∇g2 (x) = (−14, 2)′ ;  ∇h1 (x) = (8, 4)′
• u1 = 4, u2 = 0, v1 = −1
• ∇g1 (x), ∇h1 (x) linearly independent
• ∇f (x) + u1 ∇g1 (x) + u2 ∇g2 (x) + v1 ∇h1 (x) = 0:
(−20, 14)′ + 4 (7, −2.5)′ + 0 (−14, 2)′ + (−1)(8, 4)′ = (0, 0)′
3.5 Example 2
Slide 18
max x′ Qx
s.t. x′ x ≤ 1
Q arbitrary; not a convex optimization problem. Equivalently,
min −x′ Qx
s.t. x′ x ≤ 1
3.5.1 KKT
Slide 19
−2Qx + 2ux = 0
x′ x ≤ 1
u ≥ 0
u(1 − x′ x) = 0
Slide 20
The stationarity condition reads Qx = ux: any KKT point with x ≠ 0 is an eigenvector of Q with eigenvalue u, and the objective value is x′ Qx = u x′ x. The maximum is attained at a unit eigenvector with u = λmax (Q).
3.6 Are CQC Necessary?
Slide 21
min x1
s.t. x21 − x2 ≤ 0
x2 = 0
The feasible set is the single point (0, 0).
KKT:
(1, 0)′ + u (2x1 , −1)′ + v (0, 1)′ = (0, 0)′
KKT multipliers do not exist, yet (0, 0)′ is a local minimum. Check ∇g1 (0, 0) = (0, −1)′ and ∇h1 (0, 0) = (0, 1)′ : they are linearly dependent, so the CQC fails.
Theorem Under the Slater condition the KKT conditions are necessary.
4 Sufficient
Optimality Conditions
Slide 23
Theorem If
• x feasible for P
• The feasible set P is convex and f (x) is convex
• There exist vectors (u, v), u ≥ 0:
∇f (x) + Σj uj ∇gj (x) + Σi vi ∇hi (x) = 0
uj gj (x) = 0, j = 1, . . . , p
Then x is an optimal solution of P .
4.1 Proof
Slide 24
• Let x̄ satisfy the conditions and let x ∈ P . Then x̄ + λ(x − x̄) ∈ P for λ ∈ [0, 1].
• gj (x̄ + λ(x − x̄)) ≤ 0 = gj (x̄) for j with uj > 0 ⇒
∇gj (x̄)′ (x − x̄) ≤ 0
• Similarly, hi (x̄ + λ(x − x̄)) = 0 ⇒
∇hi (x̄)′ (x − x̄) = 0
Slide 25
• Thus,
∇f (x̄)′ (x − x̄) = −( Σj uj ∇gj (x̄) + Σi vi ∇hi (x̄) )′ (x − x̄) ≥ 0
⇒ f (x) ≥ f (x̄), by the gradient inequality for convex f .
5 Convex Optimization
Slide 26
• The KKT conditions are always necessary under CQC.
• The KKT conditions are sufficient for convex optimization problems.
• The KKT conditions are necessary and sufficient for convex optimization problems that are strictly feasible.
[Figure: a set S with optimal point x∗ ]
• Take λ → 0:
c0 f (x) + Σj cj gj (x) + Σi di hi (x) ≥ c0 f (x∗ )
• With uj = cj /c0 and vi = di /c0 :
f (x∗ ) + Σj uj gj (x∗ ) + Σi vi hi (x∗ ) ≤ f (x∗ ) ≤ f (x) + Σj uj gj (x) + Σi vi hi (x)
• Thus,
f (x∗ ) = min_x ( f (x) + Σj uj gj (x) + Σi vi hi (x) )
Σj uj gj (x∗ ) = 0 ⇒ uj gj (x∗ ) = 0
6 Applications
6.1 Linear Optimization
Slide 30
min c′ x
s.t. Ax = b
x ≥ 0
min c′ x
s.t. Ax − b = 0
−x ≤ 0
6.1.1 KKT
Slide 31
c + A′ û − v = 0
v ≥ 0
vj xj = 0
Ax − b = 0
x ≥ 0
Setting u = −û:
A′ u ≤ c    (dual feasibility)
(cj − Aj′ u)xj = 0    (complementarity)
Ax − b = 0, x ≥ 0    (primal feasibility)
6.2 Portfolio Optimization
Slide 32
x =weights of the portfolio
max r′ x − (λ/2) x′ Qx
s.t. e′ x = 1
min −r′ x + (λ/2) x′ Qx
s.t. e′ x = 1
6.2.1 KKT
Slide 33
−r + λQx + ue = 0
⇒ x = (1/λ) Q−1 (r − ue)
e′ x = 1 ⇒ e′ Q−1 (r − ue) = λ
⇒ u = (e′ Q−1 r − λ) / (e′ Q−1 e)
As λ changes, the tradeoff between risk and return changes, and the allocation changes as well. This is the essence of modern portfolio theory.
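The closed-form solution translates directly into code; a short sketch with hypothetical data (r, Q, and λ below are made up for illustration):

```python
import numpy as np

def mean_variance_weights(r, Q, lam):
    """KKT solution of min -r'x + (lam/2) x'Qx s.t. e'x = 1."""
    e = np.ones(len(r))
    Qinv_r = np.linalg.solve(Q, r)
    Qinv_e = np.linalg.solve(Q, e)
    u = (e @ Qinv_r - lam) / (e @ Qinv_e)    # budget-constraint multiplier
    return (Qinv_r - u * Qinv_e) / lam       # x = (1/lam) Q^{-1}(r - u e)

r = np.array([0.10, 0.08, 0.05])
Q = np.array([[0.04, 0.01, 0.00],
              [0.01, 0.03, 0.00],
              [0.00, 0.00, 0.02]])
x = mean_variance_weights(r, Q, lam=2.0)
print(x, x.sum())                            # weights sum to 1
```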
15.093 Optimization Methods
2 History
Slide 2
• In 1984, Karmarkar at AT&T “invented” the interior point method
• In 1985, affine scaling was “invented” at IBM + AT&T in seeking an intuitive version of Karmarkar’s algorithm
3 Geometric intuition
3.1 Notation
Slide 3
min c′ x
s.t. Ax = b
x ≥ 0
and its dual
max p′ b
s.t. p′ A ≤ c′
• P = {x | Ax = b, x ≥ 0}
• {x ∈ P | x > 0} the interior of P and its elements interior points
[Figure: interior points x0 , x1 , x2 , . . . of P and the cost vector c]
d=x−y
min c′ d
s.t. Ad = 0
||Y −1 d|| ≤ β
4.2 Solution
Slide 7
If rows of A are linearly independent and c is not a linear combination of the
rows of A, then
• optimal solution d∗ :
d∗ = −β Y² (c − A′ p) / ||Y (c − A′ p)|| ,    p = (AY² A′ )−1 AY² c.
• x = y + d∗ ∈ P
• c′ x = c′ y − β ||Y (c − A′ p)|| < c′ y
4.2.1 Proof
Slide 8
• AY² A′ is invertible; if not, there exists some z ≠ 0 such that z′ AY² A′ z = 0
• w = Y A′ z: w′ w = 0 ⇒ w = 0
• Hence A′ z = 0, a contradiction
• Since c is not a linear combination of the rows of A, c − A′ p ≠ 0 and d∗ is well defined
• d∗ is feasible:
Y −1 d∗ = −β Y (c − A′ p) / ||Y (c − A′ p)|| ⇒ ||Y −1 d∗ || = β
Ad∗ = 0, since AY² (c − A′ p) = 0
• For every feasible d,
c′ d = (c′ − p′ A)d = (c′ − p′ A)Y Y −1 d ≥ −||Y (c − A′ p)|| · ||Y −1 d|| ≥ −β ||Y (c − A′ p)||
Slide 9
• For d∗ this bound is attained:
c′ d∗ = (c′ − p′ A)d∗ = −β (c′ − p′ A) Y² (c − A′ p) / ||Y (c − A′ p)|| = −β (Y (c − A′ p))′ (Y (c − A′ p)) / ||Y (c − A′ p)|| = −β ||Y (c − A′ p)||
• c′ x = c′ y + c′ d∗ = c′ y − β ||Y (c − A′ p)||
4.3 Interpretation
Slide 10
• Let y be a nondegenerate BFS with basis B
• A = [B N ]
• Y = diag(y1 , . . . , ym , 0, . . . , 0) and Y0 = diag(y1 , . . . , ym ); then AY = [BY0  0]
p = (AY² A′ )−1 AY² c = (B′ )−1 Y0−2 B −1 B Y0² cB = (B′ )−1 cB
r = c − A′ (B′ )−1 cB , the vector of reduced costs
• Under degeneracy?
4.4 Termination
Slide 11
Let y and p be primal and dual feasible solutions with
c′ y − b′ p < ǫ.
Then, if y∗ and p∗ are optimal,
c′ y∗ ≤ c′ y < c′ y∗ + ǫ,
b′ p∗ − ǫ < b′ p ≤ b′ p∗ .
4.4.1 Proof
Slide 12
• c′ y∗ ≤ c′ y
• By weak duality, b′ p ≤ c′ y∗
• Since c′ y − b′ p < ǫ:
c′ y < b′ p + ǫ ≤ c′ y∗ + ǫ
b′ p∗ = c′ y∗ ≤ c′ y < b′ p + ǫ, so b′ p > b′ p∗ − ǫ
5 Affine Scaling
5.1 Inputs
Slide 13
• (A, b, c);
• an initial primal feasible solution x0 > 0
• the optimality tolerance � > 0
• the parameter β ∈ (0, 1)
Given xk > 0, let
Xk = diag(xk1 , . . . , xkn ),
pk = (AXk² A′ )−1 AXk² c,
rk = c − A′ pk ,
xk+1 = xk − β Xk² rk / ||Xk rk ||.
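A minimal NumPy sketch of this update (names illustrative), applied to the example of Section 8 below after adding slack variables:

```python
import numpy as np

def affine_scaling_step(A, c, x, beta=0.5):
    """One short-step affine scaling update x <- x - beta X^2 r / ||X r||."""
    X = np.diag(x)
    AX2 = A @ X @ X
    p = np.linalg.solve(AX2 @ A.T, AX2 @ c)   # dual estimate
    r = c - A.T @ p                            # reduced costs
    Xr = X @ r
    return x - beta * (X @ Xr) / np.linalg.norm(Xr)

# max x1 + 2x2 s.t. x1+x2 <= 2, -x1+x2 <= 1 in standard form (min -x1 - 2x2)
A = np.array([[1.0, 1.0, 1.0, 0.0], [-1.0, 1.0, 0.0, 1.0]])
c = np.array([-1.0, -2.0, 0.0, 0.0])
x = np.array([0.5, 0.5, 1.0, 1.0])            # interior starting point
for _ in range(100):
    x = affine_scaling_step(A, c, x)
print(x[:2])                                   # approaches (0.5, 1.5)
```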
5.3 Variants
Slide 15
• ||u||∞ = maxi |ui |,  γ(u) = max{ui | ui > 0}
• γ(u) ≤ ||u||∞ ≤ ||u||
• Short-step method: the update above.
• Long-step variants:
xk+1 = xk − β Xk² rk / ||Xk rk ||∞
xk+1 = xk − β Xk² rk / γ(Xk rk )
6 Convergence
6.1 Assumptions
Slide 16
Assumptions A:
(a) The rows of the matrix A are linearly independent.
(b) The vector c is not a linear combination of the rows of A.
(c) There exists an optimal solution.
(d) There exists a positive feasible solution.
Assumptions B:
(a) Every BFS to the primal problem is nondegenerate.
(b) At every BFS to the primal problem, the reduced cost of every nonbasic
variable is nonzero.
6.2 Theorem
Slide 17
If we apply the affine scaling algorithm with ǫ = 0, the following hold:
(a) For the long-step variant, under Assumptions A and B and for 0 < β < 1, the sequences xk and pk converge to optimal primal and dual solutions, respectively.
(b) For the second long-step variant, under Assumption A and for 0 < β < 2/3, the sequences xk and pk converge to some primal and dual optimal solutions, respectively.
7 Initialization
Slide 18
min c′ x + M xn+1
s.t. Ax + (b − Ae)xn+1 = b
x ≥ 0, xn+1 ≥ 0
8 Example
Slide 19
max x1 + 2x2
s.t. x1 + x2 ≤ 2
−x1 + x2 ≤ 1
x1 , x2 ≥ 0
9 Practical Performance
Slide 20
• Excellent practical performance, simple
• Major step: invert AXk² A′
• Imitates the simplex method near the boundary
[Figure: the affine scaling path for the example in the (x1 , x2 ) plane]
15.093J Optimization Methods
2 Barrier methods
Slide 2
min f (x)
s.t. gj (x) ≤ 0, j = 1, . . . , p
hi (x) = 0, i = 1, . . . , m
2.1 Strategy
Slide 3
• A barrier function G(x) is a continuous function that tends to +∞ as x approaches the boundary of the feasible region from its interior.
• Examples:
G(x) = −Σj log(−gj (x)),    G(x) = −Σj 1/gj (x)
Slide 4
• Consider a sequence {µk }: 0 < µk+1 < µk and µk → 0.
• Consider the problem min f (x) + µk G(x), with minimizer x(µk ).
[Figure: the central path through x(10), x(1), x(0.1), x(0.01) to x∗ , starting at the analytic center]
Barrier problem:
min Bµ (x) = c′ x − µ Σj log xj
s.t. Ax = b
Minimizer: x(µ)
3 Central Path
Slide 6
• As µ varies, minimizers x(µ) form the central path
• limµ→0 x(µ) exists and is an optimal solution x∗ to the initial LP
• For µ = ∞, x(∞) is called the analytic center
min −Σj log xj
s.t. Ax = b
Slide 7
[Figure: the simplex P with analytic center (1/3, 1/3, 1/3), the optimal face Q with analytic center (1/2, 0, 1/2), and the central path]
• Q = {x | x = (x1 , 0, x3 ), x1 + x3 = 1, x ≥ 0}, the set of optimal solutions to the original LP
3.1 Example
Slide 8
min x2
s.t. x1 + x2 + x3 = 1
x1 , x2 , x3 ≥ 0
min x2 − µ log x1 − µ log x2 − µ log x3
s.t. x1 + x2 + x3 = 1
x1 (µ) = (1 − x2 (µ))/2
x2 (µ) = (1 + 3µ − √(1 + 2µ + 9µ²))/2
x3 (µ) = (1 − x2 (µ))/2
The analytic center: (1/3, 1/3, 1/3) Slide 9
• Solution (KKT):
Ax(µ) = b,  x(µ) ≥ 0
A′ p(µ) + s(µ) = c,  s(µ) ≥ 0
X(µ)S(µ)e = µe
Slide 11
• Theorem: If x(µ), p(µ), and s(µ) satisfy these conditions, then they are optimal solutions to the primal and dual barrier problems.
• Goal: solve the barrier problem
min Bµ (x) = c′ x − µ Σj log xj
s.t. Ax = b
• Newton step d (with multiplier p) at the current interior point x:
c − µX −1 e + µX −2 d − A′ p = 0
Ad = 0
Slide 15
(The unknowns are dj , j = 1, . . . , n, and pi , i = 1, . . . , m.)
• Solution:
d(µ) = ( I − X² A′ (AX² A′ )−1 A ) ( Xe − (1/µ) X² c )
p(µ) = (AX² A′ )−1 A (X² c − µXe)
• d(µ) is a Newton step
• Measure of closeness to the central path:
|| (1/µ) XSe − e || ≤ β
Slide 18
5 The Primal Barrier Algorithm
Slide 19
Input
(a) (A, b, c); A has full row rank;
(b) x0 > 0, s0 > 0, p0 ;
(c) optimality tolerance � > 0;
(d) µ0 , and α, where 0 < α < 1. Slide 20
1. (Initialization) Start with some primal and dual feasible x0 > 0, s0 >
0, p0 , and set k = 0.
X k = diag(xk1 , . . . , xkn ),
µk+1 = αµk
Slide 21
4. (Computation of directions) Solve the linear system
µk+1 Xk−2 d − A′ p = µk+1 Xk−1 e − c
Ad = 0
xk+1 = xk + d,
pk+1 = p,
sk+1 = c − A� p.
5.1 Correctness
Slide 22
Theorem Given α = 1 − (√β − β)/(√β + √n), β < 1, and (x0 , s0 , p0 ) with x0 > 0, s0 > 0 and
|| (1/µ0 ) X0 S0 e − e || ≤ β,
then after
K = ⌈ ((√β + √n)/(√β − β)) log( (s0 )′ x0 (1 + β) / (ǫ(1 − β)) ) ⌉
iterations, (xK , sK , pK ) is found:
(sK )′ xK ≤ ǫ.
5.2 Complexity
Slide 23
• Work per iteration involves solving a linear system with m + n equations
6 Primal-Dual Methods
F (z) = ( Ax − b, A′ p + s − c, XSe − µe ),  z = (x, p, s),  F : ℜr → ℜr ,  r = 2n + m
Solve F (z∗ ) = 0 by Newton’s method:
F (z k ) + J(z k )d = 0
Set z k+1 = z k + d (d is the Newton direction) Slide 27
(xk , pk , sk ): current primal and dual feasible solution
Newton direction d = (dkx , dkp , dks ), obtained from
⎡ A    0    0  ⎤ ⎡ dkx ⎤      ⎡ Axk − b        ⎤
⎢ 0    A′   I  ⎥ ⎢ dkp ⎥ = −  ⎢ A′ pk + sk − c ⎥
⎣ Sk   0    Xk ⎦ ⎣ dks ⎦      ⎣ Xk Sk e − µk e ⎦
Step lengths:
βPk = min{ 1, α min_{i: (dkx )i <0} ( −xki / (dkx )i ) }
βDk = min{ 1, α min_{i: (dks )i <0} ( −ski / (dks )i ) }
0 < α < 1
µk = (sk )′ xk / n
Xk = diag(xk1 , . . . , xkn )
S k = diag(sk1 , . . . , skn )
⎡ A    0    0  ⎤ ⎡ dkx ⎤      ⎡ Axk − b        ⎤
⎢ 0    A′   I  ⎥ ⎢ dkp ⎥ = −  ⎢ A′ pk + sk − c ⎥
⎣ Sk   0    Xk ⎦ ⎣ dks ⎦      ⎣ Xk Sk e − µk e ⎦
Slide 30
4. (Find step lengths)
βPk = min{ 1, α min_{i: (dkx )i <0} ( −xki / (dkx )i ) }
βDk = min{ 1, α min_{i: (dks )i <0} ( −ski / (dks )i ) }
5. (Solution update) xk+1 = xk + βPk dkx ,  pk+1 = pk + βDk dkp ,  sk+1 = sk + βDk dks
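One iteration of this primal-dual scheme fits in a short sketch (dense linear algebra for clarity; names illustrative, and a real implementation would factorize the system instead of forming it):

```python
import numpy as np

def primal_dual_step(A, b, c, x, p, s, alpha=0.995):
    """One Newton step on F(z) = (Ax - b, A'p + s - c, XSe - mu e)."""
    m, n = A.shape
    mu = (s @ x) / n
    J = np.block([
        [A,                np.zeros((m, m)), np.zeros((m, n))],
        [np.zeros((n, n)), A.T,              np.eye(n)],
        [np.diag(s),       np.zeros((n, m)), np.diag(x)],
    ])
    F = np.concatenate([A @ x - b, A.T @ p + s - c, x * s - mu * np.ones(n)])
    d = np.linalg.solve(J, -F)
    dx, dp, ds = d[:n], d[n:n + m], d[n + m:]

    def step_len(v, dv):
        neg = dv < 0   # largest step (capped at 1) keeping v + beta*dv > 0
        return min(1.0, alpha * np.min(-v[neg] / dv[neg])) if neg.any() else 1.0

    bP, bD = step_len(x, dx), step_len(s, ds)
    return x + bP * dx, p + bD * dp, s + bD * ds
```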
• Primal barrier direction:
dprimal-barrier = ( I − X² A′ (AX² A′ )−1 A ) ( Xe − (1/µ) X² c )
• For µ = ∞:
dcentering = ( I − X² A′ (AX² A′ )−1 A ) Xe
• Note that
dprimal-barrier = dcentering + (1/µ) daffine ,
where daffine = −( I − X² A′ (AX² A′ )−1 A ) X² c is the affine scaling direction.
• When µ is large, the centering direction dominates: in the beginning, the algorithm moves toward the analytic center.
• When µ is small, the affine scaling direction dominates: toward the end, the barrier algorithm behaves like the affine scaling algorithm.
• In implementations of IPMs, AXk² A′ is usually factorized as
AXk² A′ = LL′ (Cholesky),
and each system AXk² A′ d = f is solved by the triangular solves Ly = f , L′ d = y.
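A sketch of that factorization step (assumes SciPy is available; names illustrative):

```python
import numpy as np
from scipy.linalg import cholesky, solve_triangular

def solve_normal_equations(A, x, f):
    """Solve (A X^2 A') d = f via the Cholesky factorization L L'."""
    M = (A * x**2) @ A.T                         # A X^2 A' (X = diag(x))
    L = cholesky(M, lower=True)                  # M = L L'
    y = solve_triangular(L, f, lower=True)       # forward solve: L y = f
    return solve_triangular(L.T, y, lower=False) # back solve:    L' d = y
```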
8 Conclusions
Slide 33
• IPMs represent the present and future of Optimization.
• Very successful in solving very large problems.
• Extend to general convex problems
15.093 Optimization Methods
2 Preliminaries
Slide 2
• A symmetric matrix A is positive semidefinite (A ⪰ 0) if and only if
u′ Au ≥ 0 ∀u ∈ Rn
• trace(AB) = trace(BA)
• A • B = trace(A′ B) = trace(B′ A)
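A quick numerical check of this inner product (NumPy; the matrices are illustrative):

```python
import numpy as np

A = np.array([[1.0, 2.0], [2.0, 3.0]])
B = np.array([[0.0, 1.0], [1.0, 4.0]])
# A . B = trace(A'B) is just the elementwise sum of A_ij * B_ij
print(np.trace(A.T @ B), np.sum(A * B))   # both 16.0
```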
3 SDO
Slide 4
• C symmetric n × n matrix
• Ai , i = 1, . . . , m symmetric n × n matrices
• bi , i = 1, . . . , m scalars
• Semidefinite optimization problem (SDO)
(P ) : min C • X
s.t. Ai • X = bi i = 1, . . . , m
X ⪰ 0
3.1 Example
Slide 5
n = 3 and m = 2
A1 = [1 0 1; 0 3 7; 1 7 5],   A2 = [0 2 8; 2 6 0; 8 0 4],   C = [1 2 3; 2 9 0; 3 0 7]
b1 = 11, b2 = 19
X = [x11 x12 x13; x21 x22 x23; x31 x32 x33]
Slide 6
3.2 Convexity
Slide 7
(P ) : min C • X
s.t. Ai • X = bi , i = 1, . . . , m
X ⪰ 0
The feasible set is convex: if X and Y are feasible and λ ∈ [0, 1], then λX + (1 − λ)Y ⪰ 0 and Ai • (λX + (1 − λ)Y ) = bi .
3.3 LO as SDO
Slide 8
LO : min c′ x
s.t. Ax = b
x ≥ 0
Take
Ai = diag(ai1 , ai2 , . . . , ain ),   C = diag(c1 , c2 , . . . , cn )
Slide 9
(P ) : min C • X
s.t. Ai • X = bi , i = 1, . . . , m
Xij = 0, i = 1, . . . , n, j = i + 1, . . . , n
X ⪰ 0
X = diag(x1 , x2 , . . . , xn )
4 Duality
Slide 10
(D) : max Σi yi bi
s.t. Σi yi Ai + S = C
S ⪰ 0
Equivalently,
(D) : max Σi yi bi
s.t. C − Σi yi Ai ⪰ 0
4.1 Example
Slide 11
(D) max 11y1 + 19y2
s.t. y1 [1 0 1; 0 3 7; 1 7 5] + y2 [0 2 8; 2 6 0; 8 0 4] + S = [1 2 3; 2 9 0; 3 0 7]
S ⪰ 0
[Figure: the dual feasible region in the (y1 , y2 ) plane]
4.3 Proof
Slide 14
• We must show that if S ⪰ 0 and X ⪰ 0, then S • X ≥ 0
• Let S = P DP ′ and X = QEQ′ , where P, Q are orthonormal matrices. Then
S • X = trace(S′ X) = trace(SX) = trace(P DP ′ QEQ′ ) = trace(DP ′ QEQ′ P ) = Σj Djj (P ′ QEQ′ P )jj ≥ 0,
since Djj ≥ 0 and the diagonal entries of the PSD matrix P ′ QEQ′ P are nonnegative.
Theorem
• If there exist feasible solutions X̂ for (P ) and (ŷ, Ŝ) for (D) such that X̂ ≻ 0, Ŝ ≻ 0,
• then both (P ) and (D) attain their optimal values zP∗ and zD∗ ,
• and furthermore zP∗ = zD∗ .
i = 1, . . . , m
min t
Z − X + sI ⪰ 0
5.3 Optimizing Structural Dynamics
Slide 20
• Select xi , the cross-sectional area of structural element i, i = 1, . . . , n
• M (x) = M0 + Σi xi Mi , the mass matrix
• K(x) = K0 + Σi xi Ki , the stiffness matrix
• Structure weight w = w0 + Σi xi wi
• Dynamics:
M (x)d̈ + K(x)d = 0
Slide 21
• d(t): vector of displacements
• di (t) = Σj αij cos(ωj t − φj )
• det(K(x) − M (x)ω² ) = 0;  ω1 ≤ ω2 ≤ · · · ≤ ωn
• Fundamental frequency: ω1 = λmin (M (x), K(x))^{1/2}
• We want to bound the fundamental frequency:
ω1 ≥ Ω ⇐⇒ M (x)Ω² − K(x) ⪯ 0
• Minimize weight
Slide 22
Problem: minimize weight subject to
a fundamental frequency bound ω1 ≥ Ω
limits on the cross-sectional areas
Formulation:
min w0 + Σi xi wi
s.t. M (x)Ω² − K(x) ⪯ 0
lower and upper limits on each xi
• y = x + v;  E[y] = x̄;  Σ̂ = Σ + D
• Reliability of a test based on e′ y:
Var[e′ x] / Var[e′ y] = e′ Σe / e′ Σ̂e = 1 − e′ De / e′ Σ̂e
Slide 25
We can find a lower bound on the reliability of the test
min e′ Σe
s.t. Σ + D = Σ̂
Σ ⪰ 0, D ⪰ 0
D diagonal
Equivalently,
max e′ De
s.t. 0 ⪯ D ⪯ Σ̂
D diagonal
5.6 MAXCUT
Slide 27
• Given an undirected graph G = (N, E) with weights wij ≥ 0 on the edges (i, j) ∈ E
• Find a subset S ⊆ N such that Σ_{i∈S, j∈S̄} wij is maximized
5.6.1 Reformulation
Slide 28
• Let Y = xx′ , i.e., Yij = xi xj
• Let W = [wij ]
• Equivalent formulation:
MAXCUT : max ¼ ( Σi Σj wij − W • Y )
s.t. xj ∈ {−1, 1}, j = 1, . . . , n
Yjj = 1, j = 1, . . . , n
Y = xx′
5.6.2 Relaxation
Slide 29
• Y = xx′ ⪰ 0
• Relaxation:
RELAX : max ¼ ( Σi Σj wij − W • Y )
s.t. Yjj = 1, j = 1, . . . , n
Y ⪰ 0
Slide 30
• MAXCUT ≤ RELAX
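The relaxation can be handed almost verbatim to an SDO modeling tool; a sketch using CVXPY (assumed installed; the 4-node graph is illustrative):

```python
import numpy as np
import cvxpy as cp

W = np.array([[0, 1, 1, 0],      # weighted 4-cycle: edges (1,2),(1,3),(2,4),(3,4)
              [1, 0, 0, 1],
              [1, 0, 0, 1],
              [0, 1, 1, 0]], dtype=float)
n = W.shape[0]

Y = cp.Variable((n, n), symmetric=True)
relax = cp.Problem(cp.Maximize(0.25 * (W.sum() - cp.trace(W @ Y))),
                   [cp.diag(Y) == 1, Y >> 0])
relax.solve()
print(relax.value)    # upper bound on MAXCUT; tight for this graph (max cut = 4)
```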
15.093 Optimization Methods
2 SDO formulation
2.1 Primal and dual
Slide 2
• (P ) : min C • X
s.t. Ai • X = bi , i = 1, . . . , m
X ⪰ 0
• (D) : max Σi yi bi
s.t. C − Σi yi Ai ⪰ 0
3 Minimizing Polynomials
3.1 Sum of squares
Slide 3
• A polynomial f (x) is a sum of squares (SOS) if
f (x) = Σj gj² (x)
for some polynomials gj (x).
• Theorem: A univariate polynomial f (x) is nonnegative for all x if and only if it is a sum of squares.
3.2 Proof
Slide 4
• (⇐) Obvious: if f (x) = Σj gj² (x), then f (x) ≥ 0.
• (⇒) Factorize f (x) = C Πj (x − rj )^nj Πk (x − ak + ibk )^mk (x − ak − ibk )^mk .
Since f (x) is nonnegative, C ≥ 0 and all the nj are even. Then f (x) = f1 (x)² + f2 (x)² , where
f1 (x) = C^{1/2} Πj (x − rj )^{nj /2} Πk (x − ak )^mk
f2 (x) = C^{1/2} Πj (x − rj )^{nj /2} Πk bk^mk
3.3 SOS as SDO
Slide 5
• f (x) is SOS if and only if f (x) = z(x)′ Qz(x) with z(x) = (1, x, x² , . . .)′ for some Q ⪰ 0, i.e., Q = L′ L.
• Then, f (x) = z(x)′ L′ Lz(x) = ||Lz(x)||² .
3.4 Formulation
Slide 6
• Consider min f (x).
• Then, f (x) ≥ γ for all x if and only if f (x) − γ = z(x)′ Qz(x) with Q ⪰ 0. This turns the polynomial bound into linear constraints on (γ, Q) plus a semidefinite constraint.
• Reformulation:
max γ
s.t. f (x) − γ = z(x)′ Qz(x)
Q ⪰ 0
3.5 Example
3.5.1 Reformulation
Slide 7
min f (x) = 3 + 4x + 2x² + 2x³ + x⁴ .
f (x) − γ = 3 − γ + 4x + 2x² + 2x³ + x⁴ = (1, x, x² ) Q (1, x, x² )′ .
max γ
s.t. 3 − γ = q11
4 = 2q12 ,  2 = 2q13 + q22
2 = 2q23 ,  1 = q33
Q = [q11 q12 q13; q12 q22 q23; q13 q23 q33] ⪰ 0
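This is exactly the kind of problem an SDO solver accepts; a sketch using CVXPY (assumed installed):

```python
import cvxpy as cp

# min 3 + 4x + 2x^2 + 2x^3 + x^4 via the SOS reformulation above
gamma = cp.Variable()
Q = cp.Variable((3, 3), symmetric=True)
constraints = [Q >> 0,
               Q[0, 0] == 3 - gamma,            # constant term
               2 * Q[0, 1] == 4,                # x
               2 * Q[0, 2] + Q[1, 1] == 2,      # x^2
               2 * Q[1, 2] == 2,                # x^3
               Q[2, 2] == 1]                    # x^4
cp.Problem(cp.Maximize(gamma), constraints).solve()
print(gamma.value)    # the global minimum of f
```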
4 Stability
Slide 8
• A linear difference equation xk+1 = Axk is stable (xk → 0 for every x0 ) if and only if there exists P ≻ 0 such that A′ P A − P ≺ 0.
4.1 Proof
Slide 9
• (⇐=) Let Av = λv, v ≠ 0. Then
v∗ (A′ P A − P )v = (|λ|² − 1) v∗ P v,
and since v∗ P v > 0 and the left side is negative, |λ| < 1.
• (=⇒) Take P = Σ_{i=0}^∞ (A^i )′ QA^i with Q ≻ 0. Then
A′ P A − P = Σ_{i=1}^∞ (A^i )′ QA^i − Σ_{i=0}^∞ (A^i )′ QA^i = −Q ≺ 0.
4.2 Stabilization
Slide 10
• Consider now the case where A is not stable, but we can modify the dynamics to xk+1 = (A + LC)xk by choosing L.
• The stability condition is nonlinear in (P, L); with Y = P L and a Schur complement it becomes
[ P ,  A′ P + C′ Y ′ ;  P A + Y C ,  P ] ⪰ 0
• This is linear in (P, Y ).
• Solve using SDO, and recover L via L = P −1 Y .
5 Primal Barrier Algorithm for SDO
Slide 12
• X ⪰ 0 ⇔ λ1 (X) ≥ 0, . . . , λn (X) ≥ 0
• Natural barrier to repel X from the boundary (λ1 (X) > 0, . . . , λn (X) > 0):
−Σj log(λj (X)) = −log( Πj λj (X) ) = −log(det(X))
Slide 13
• Logarithmic barrier problem:
min C • X − µ log(det(X))
s.t. Ai • X = bi , i = 1, . . . , m
• KKT conditions:
Ai • X = bi , i = 1, . . . , m
C − µX −1 = Σi yi Ai
X ≻ 0
• Given µ, we need to solve these nonlinear equations for X and the yi
• Apply Newton’s method until we are “close” to the optimum
• Reduce the value of µ, and iterate until the duality gap is small
As µ → 0, these conditions approach the optimality conditions
Ai • X = bi , i = 1, . . . , m
Σi yi Ai + S = C
X, S ⪰ 0
XS = 0
6 Differences with LO
Slide 15
• There are many different ways to linearize the nonlinear complementarity condition XS = µI
7 Convergence
7.1 Stopping criterion
Slide 16
• The point (X, yi ) is feasible and has duality gap
C • X − Σi yi bi = µX −1 • X = nµ
• The barrier algorithm needs O( √n log(ǫ0 /ǫ) ) iterations to reduce the duality gap from ǫ0 to ǫ
8 Conclusions
Slide 17
• SDO is a powerful modeling tool
• Barrier and primal-dual algorithms are very powerful
• Many good solvers available: SeDuMi, SDPT3, SDPA, etc.
• Pointers to literature and solvers:
www-user.tu-chemnitz.de/~helmberg/semidef.html
MIT OpenCourseWare
http://ocw.mit.edu
For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.