
15.093J/2.098J Optimization Methods

Professor: Dimitris Bertsimas


1 Structure of Class

Slide 1
Linear Optimization (LO): Lec. 1-9

Network Flows: Lec. 10-11

Discrete Optimization: Lec. 12-15

Dynamic Optimization: Lec. 16

Nonlinear Optimization (NLO): Lec. 17-24

2 Requirements
Slide 2
Homeworks: 30%

Midterm Exam: 30%

Final Exam: 40%

Class Participation: important tie breaker

Use of commercial software for solving optimization problems

3 Lecture Outline Slide 3


History of Optimization

Where Do LOPs Arise?

Examples of Formulations

4 History of Optimization
Slide 4
Fermat, 1638; Newton, 1670
min f(x), x scalar
df(x)/dx = 0

Euler, 1755
min f(x_1, …, x_n)
∇f(x) = 0

Lagrange, 1797
min f(x_1, …, x_n)
s.t. g_k(x_1, …, x_n) = 0, k = 1, …, m

Euler, Lagrange: problems in infinite dimensions, calculus of variations.

5 Nonlinear Optimization
5.1 The general problem Slide 5
min f(x_1, …, x_n)
s.t. g_1(x_1, …, x_n) ≤ 0
     ⋮
     g_m(x_1, …, x_n) ≤ 0.

6 What is Linear Optimization?


6.1 Formulation Slide 6
minimize 3x_1 + x_2
subject to x_1 + 2x_2 ≥ 2
           2x_1 + x_2 ≥ 3
           x_1 ≥ 0, x_2 ≥ 0

c = (3, 1)′,  x = (x_1, x_2)′,  b = (2, 3)′,  A = [1 2; 2 1]

minimize c′x
subject to Ax ≥ b
           x ≥ 0
7 History of LO
7.1 The pre-algorithmic period Slide 7
Fourier, 1826: a method for solving systems of linear inequalities.

de la Vallée Poussin: a simplex-like method for objective functions with absolute values.

Kantorovich, Koopmans, 1930s: formulations and solution methods.

von Neumann, 1928: game theory, duality.

Farkas, Minkowski, Carathéodory, 1870-1930: foundations.

7.2 The modern period Slide 8


George Dantzig, 1947 Simplex method.

1950s Applications.

1960s Large Scale Optimization.

1970s Complexity theory.

Khachiyan, 1979 The ellipsoid algorithm.

Karmarkar, 1984 Interior point algorithms.

8 Where do LOPs Arise?


8.1 Wide Applicability Slide 9
Transportation
Air traffic control, crew scheduling, ...

Movement of Truck Loads

Telecommunications
Manufacturing
Medicine
Engineering
Typesetting (TEX, LATEX)

9 Transportation Problem
9.1 Data Slide 10
m plants, n warehouses

s_i: supply of the ith plant, i = 1, …, m

d_j: demand of the jth warehouse, j = 1, …, n

c_ij: cost of transporting one unit from i to j

9.2 Decision Variables
9.2.1 Formulation Slide 11
x_ij = number of units to send from i to j

min Σ_{i=1}^m Σ_{j=1}^n c_ij x_ij

s.t. Σ_{i=1}^m x_ij = d_j,  j = 1, …, n
     Σ_{j=1}^n x_ij = s_i,  i = 1, …, m
     x_ij ≥ 0
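The following is a minimal sketch of how a transportation LP of this form could be set up and solved with scipy.optimize.linprog; the plant/warehouse data in it are illustrative assumptions, not data from the lecture.

# Transportation LP sketch (illustrative data, balanced supply/demand).
import numpy as np
from scipy.optimize import linprog

s = np.array([30.0, 20.0])          # plant supplies s_i
d = np.array([10.0, 25.0, 15.0])    # warehouse demands d_j
c = np.array([[8.0, 6.0, 10.0],     # c_ij: unit cost of shipping i -> j
              [9.0, 12.0, 7.0]])
m, n = c.shape

# Equality constraints: sum_i x_ij = d_j and sum_j x_ij = s_i
A_eq = np.zeros((n + m, m * n))
b_eq = np.concatenate([d, s])
for j in range(n):                  # demand rows
    for i in range(m):
        A_eq[j, i * n + j] = 1.0
for i in range(m):                  # supply rows
    A_eq[n + i, i * n:(i + 1) * n] = 1.0

res = linprog(c.flatten(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
print(res.x.reshape(m, n), res.fun)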

10 Sorting through LO
Slide 12
Given n numbers c_1, c_2, …, c_n.

The order statistics c_(1), c_(2), …, c_(n) satisfy c_(1) ≤ c_(2) ≤ … ≤ c_(n).

Use LO to find Σ_{i=1}^k c_(i).

min Σ_{i=1}^n c_i x_i

s.t. Σ_{i=1}^n x_i = k
     0 ≤ x_i ≤ 1,  i = 1, …, n

11 Investment under taxation Slide 13


You have purchased s_i shares of stock i at price q_i, i = 1, …, n.
The current price of stock i is p_i.
You expect that the price of stock i one year from now will be r_i.
You pay a capital-gains tax at the rate of 30% on any capital gains at the
time of the sale.

You want to raise C amount of cash after taxes.

You pay 1% in transaction costs.

Example: you sell 1,000 shares at $50 per share that you bought at $30 per
share. The net cash is

50 × 1,000 − 0.30 × (50 − 30) × 1,000 − 0.01 × 50 × 1,000 = $43,500.
11.1 Formulation Slide 14
max Σ_{i=1}^n r_i (s_i − x_i)

s.t. Σ_{i=1}^n p_i x_i − 0.30 Σ_{i=1}^n (p_i − q_i) x_i − 0.01 Σ_{i=1}^n p_i x_i ≥ C
     0 ≤ x_i ≤ s_i

12 Investment Problem Slide 15


Five investment choices A, B, C, D, E.
A, C, and D are available in 1993.
B is available in 1994.
E is available in 1995.
Cash earns 6% per year.
$1,000,000 in 1993.
12.1 Cash Flow per Dollar Invested Slide 16
Year: A B C D E
1993 -1.00 0 -1.00 -1.00 0
1994 +0.30 -1.00 +1.10 0 0
1995 +1.00 +0.30 0 0 -1.00
1996 0 +1.00 0 +1.75 +1.40
LIMIT $500,000 NONE $500,000 NONE $750,000

12.2 Formulation
12.2.1 Decision Variables Slide 17
A, …, E: amounts invested, in $ millions
Cash_t: amount held in cash in period t, t = 1, 2, 3

max 1.06 Cash_3 + 1.00 B + 1.75 D + 1.40 E
s.t. A + C + D + Cash_1 ≤ 1
     Cash_2 + B ≤ 0.3 A + 1.1 C + 1.06 Cash_1
     Cash_3 + 1.0 E ≤ 1.0 A + 0.3 B + 1.06 Cash_2
     A ≤ 0.5,  C ≤ 0.5,  E ≤ 0.75
     A, …, E ≥ 0

Solution: A = 0.5M, B = 0, C = 0, D = 0.5M, E = 0.659M, Cash_1 = 0,
Cash_2 = 0.15M, Cash_3 = 0. Objective: 1.7976M.

13 Manufacturing
13.1 Data Slide 18
n products, m raw materials
c_j: profit of product j

b_i: available units of material i

a_ij: number of units of material i needed to produce one unit of product j


13.2 Formulation
13.2.1 Decision variables Slide 19
x_j = amount of product j produced.

max Σ_{j=1}^n c_j x_j

s.t. a_11 x_1 + … + a_1n x_n ≤ b_1
     ⋮
     a_m1 x_1 + … + a_mn x_n ≤ b_m
     x_j ≥ 0,  j = 1, …, n

14 Capacity Expansion
14.1 Data and Constraints Slide 20
Dt : forecasted demand for electricity at year t
Et : existing capacity (in oil) available at t
ct : cost to produce 1MW using coal capacity
nt : cost to produce 1MW using nuclear capacity
No more than 20% nuclear
Coal plants last 20 years
Nuclear plants last 15 years

14.2 Decision Variables Slide 21
xt : amount of coal capacity brought on line in year t.
yt : amount of nuclear capacity brought on line in year t.
wt : total coal capacity in year t.
zt : total nuclear capacity in year t.
14.3 Formulation Slide 22
min Σ_{t=1}^T (c_t x_t + n_t y_t)

s.t. w_t = Σ_{s=max(0, t−19)}^{t} x_s,  t = 1, …, T
     z_t = Σ_{s=max(0, t−14)}^{t} y_s,  t = 1, …, T
     w_t + z_t + E_t ≥ D_t
     z_t ≤ 0.2 (w_t + z_t + E_t)
     x_t, y_t, w_t, z_t ≥ 0.

15 Scheduling
15.1 Decision variables Slide 23
A hospital wants to make a weekly night-shift schedule for its nurses.
d_j: demand for nurses on day j, j = 1, …, 7
Every nurse works 5 days in a row
Goal: hire the minimum number of nurses
Decision Variables
x_j: number of nurses starting their 5-day week on day j
15.2 Formulation Slide 24
min Σ_{j=1}^7 x_j

s.t. x_1 + x_4 + x_5 + x_6 + x_7 ≥ d_1
     x_1 + x_2 + x_5 + x_6 + x_7 ≥ d_2
     x_1 + x_2 + x_3 + x_6 + x_7 ≥ d_3
     x_1 + x_2 + x_3 + x_4 + x_7 ≥ d_4
     x_1 + x_2 + x_3 + x_4 + x_5 ≥ d_5
     x_2 + x_3 + x_4 + x_5 + x_6 ≥ d_6
     x_3 + x_4 + x_5 + x_6 + x_7 ≥ d_7
     x_j ≥ 0

16 Revenue Management
16.1 The industry Slide 25
Deregulation in 1978
Prior to deregulation:
- Carriers were only allowed to fly certain routes; hence airlines such as
Northwest, Eastern, Southwest, etc.
- Fares were determined by the Civil Aeronautics Board (CAB) based on mileage
and other costs (the CAB no longer exists)
Slide 26
Post deregulation:
anyone can fly anywhere
fares are determined by the carrier (and the market)

17 Revenue Management
17.1 Economics Slide 27
Huge sunk and fixed costs
Very low variable costs per passenger ($10/passenger or less)
Strong economically competitive environment
Near-perfect information and negligible cost of information
Highly perishable inventory
Result: Multiple fares

18 Revenue Management
18.1 Data Slide 28
n origins, n destinations
1 hub
2 fare classes (for simplicity): Q-class and Y-class
Revenues: r^Q_ij ≤ r^Y_ij

Capacities: C_i0, i = 1, …, n; C_0j, j = 1, …, n

Expected demands: D^Q_ij, D^Y_ij

18.2 LO Formulation
18.2.1 Decision Variables Slide 29
• Q_ij: number of Q-class customers we accept from i to j
• Y_ij: number of Y-class customers we accept from i to j

maximize Σ_{i,j} (r^Q_ij Q_ij + r^Y_ij Y_ij)

subject to Σ_{j=0}^n (Q_ij + Y_ij) ≤ C_i0,  i = 1, …, n
           Σ_{i=0}^n (Q_ij + Y_ij) ≤ C_0j,  j = 1, …, n
           0 ≤ Q_ij ≤ D^Q_ij
           0 ≤ Y_ij ≤ D^Y_ij

19 Revenue Management
19.1 Importance Slide 30
Robert Crandall, former CEO of American Airlines:
"We estimate that RM has generated $1.4 billion in incremental revenue for
American Airlines in the last three years alone. This is not a one-time benefit.
We expect RM to generate at least $500 million annually for the foreseeable
future. As we continue to invest in the enhancement of DINAMO, we expect to
capture an even larger revenue premium."

20 Messages
20.1 How to formulate? Slide 31
1. Define your decision variables clearly.
2. Write the constraints and the objective function.
3. There is no systematic method available.
What is a good LO formulation?
A formulation with a small number of variables and constraints, and a sparse
constraint matrix A.

21 Nonlinear Optimization
21.1 The general problem Slide 32
min f(x_1, …, x_n)
s.t. g_1(x_1, …, x_n) ≤ 0
     ⋮
     g_m(x_1, …, x_n) ≤ 0.

22 Convex functions Slide 33


f : S → ℝ is convex if for all x_1, x_2 ∈ S and all λ ∈ [0, 1]
f(λx_1 + (1 − λ)x_2) ≤ λf(x_1) + (1 − λ)f(x_2).
f(x) is concave if −f(x) is convex.

23 On the power of LO
23.1 LO formulation  Slide 34
min f(x) = max_k (d_k′x + c_k)

s.t. Ax ≥ b

is equivalent to

min z
s.t. Ax ≥ b
     d_k′x + c_k ≤ z,  ∀ k

24 On the power of LO
24.1 Problems with |·|

min Σ_j c_j |x_j|   (c_j ≥ 0)                                        Slide 35
s.t. Ax ≥ b

Idea: |x| = max{x, −x}.

min Σ_j c_j z_j
s.t. Ax ≥ b
     x_j ≤ z_j
     −x_j ≤ z_j

Message: minimizing a piecewise linear convex function can be modeled by LO.
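As an illustration of the |x_j| reformulation, here is a small hedged sketch using scipy.optimize.linprog; the matrix A, vector b and costs c are made-up numbers chosen only to make the example runnable.

# |x_j| reformulation: variables stacked as (x, z), minimize sum_j c_j z_j
# subject to Ax >= b, x_j <= z_j, -x_j <= z_j.
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, -2.0], [3.0, 1.0]])
b = np.array([2.0, 5.0])
c = np.array([1.0, 2.0])            # c_j >= 0, costs on |x_j|
n = len(c)

obj = np.concatenate([np.zeros(n), c])          # cost only on z
I = np.eye(n)
A_ub = np.vstack([
    np.hstack([-A, np.zeros((A.shape[0], n))]), # Ax >= b  ->  -Ax <= -b
    np.hstack([I, -I]),                         # x - z <= 0
    np.hstack([-I, -I]),                        # -x - z <= 0
])
b_ub = np.concatenate([-b, np.zeros(2 * n)])

res = linprog(obj, A_ub=A_ub, b_ub=b_ub,
              bounds=[(None, None)] * n + [(0, None)] * n)
print(res.x[:n], res.fun)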
MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

15.093 Optimization Methods

Lecture 2: The Geometry of LO

1 Outline Slide 1
• Polyhedra
• Standard form
• Algebraic and geometric definitions of corners
• Equivalence of definitions
• Existence of corners
• Optimality of corners
• Conceptual algorithm

2 Central Problem Slide 2


minimize c′x
subject to a_i′x ≥ b_i,  i ∈ M_1
           a_i′x ≤ b_i,  i ∈ M_2
           a_i′x = b_i,  i ∈ M_3
           x_j ≥ 0,  j ∈ N_1
           x_j free,  j ∈ N_2

2.1 Standard Form Slide 3


minimize c′x
subject to Ax = b
           x ≥ 0

Characteristics
• Minimization problem
• Equality constraints
• Non-negative variables
2.2 Transformations Slide 4
max c′x ⇔ min(−c′x)
a_i′x ≤ b_i ⇔ a_i′x + s_i = b_i, s_i ≥ 0
a_i′x ≥ b_i ⇔ a_i′x − s_i = b_i, s_i ≥ 0
x_j free ⇔ x_j = x_j⁺ − x_j⁻, x_j⁺ ≥ 0, x_j⁻ ≥ 0

2.3 Example Slide 5


maximize x_1 − x_2
subject to x_1 + x_2 ≤ 1
           x_1 + 2x_2 ≥ 1
           x_1 free, x_2 ≥ 0

becomes

minimize −x_1⁺ + x_1⁻ + x_2
subject to x_1⁺ − x_1⁻ + x_2 + s_1 = 1
           x_1⁺ − x_1⁻ + 2x_2 − s_2 = 1
           x_1⁺, x_1⁻, x_2, s_1, s_2 ≥ 0

3 Preliminary Insights
Slide 6
minimize −x_1 − x_2
subject to x_1 + 2x_2 ≤ 3
           2x_1 + x_2 ≤ 3
           x_1, x_2 ≥ 0

[Figure: the feasible region of this example, with corners at (1.5, 0), (1, 1), (0, 1.5), and level sets −x_1 − x_2 = z moving in the direction of c; the optimum is the corner (1, 1).]

Slide 7
−x_1 + x_2 ≤ 1
x_1 ≥ 0
x_2 ≥ 0
Slide 8
• There exists a unique optimal solution.
• There exist multiple optimal solutions; in this case, the set of optimal
solutions can be either bounded or unbounded.

[Figure: the feasible set {−x_1 + x_2 ≤ 1, x_1, x_2 ≥ 0} with different cost vectors c = (1, 0), c = (−1, −1), c = (0, 1), c = (1, 1), illustrating the possible outcomes.]

• The optimal cost is −∞, and no feasible solution is optimal.

• The feasible set is empty.

4 Polyhedra
4.1 Definitions Slide 9
• The set {x | a′x = b} is called a hyperplane.
• The set {x | a′x ≥ b} is called a halfspace.
• The intersection of finitely many halfspaces is called a polyhedron.

[Figure: (a) a hyperplane a′x = b separating the halfspaces a′x ≥ b and a′x ≤ b; (b) a polyhedron whose boundary is formed by the hyperplanes a_i′x = b_i, i = 1, …, 5.]

5 Corners
5.1 Extreme Points Slide 10
• Polyhedron P = {x | Ax ≥ b}
• x ∈ P is an extreme point of P if there do not exist y, z ∈ P (y ≠ x, z ≠ x)
  and λ ∈ [0, 1] such that x = λy + (1 − λ)z.

[Figure: points u, v, w, x, y, z in a polyhedron; the corner points are extreme points, the others are not.]

5.2 Vertex Slide 11


• x ∈ P is a vertex of P if there exists c such that x is the unique optimum of
  minimize c′y
  subject to y ∈ P

5.3 Basic Feasible Solution Slide 12


P = {(x_1, x_2, x_3) | x_1 + x_2 + x_3 = 1, x_1, x_2, x_3 ≥ 0}
Slide 13
Points A, B, C: 3 constraints active
Point E: 2 constraints active
Suppose we add the constraint 2x_1 + 2x_2 + 2x_3 = 2.
[Figures: (top) the vertex definition: a cost vector c whose level set {y | c′y = c′x} touches P only at x; (bottom) the simplex P = {x_1 + x_2 + x_3 = 1, x ≥ 0} with corner points A, B, C, and points D, E on its faces, drawn in (x_1, x_2, x_3) space.]

Then 3 hyperplanes are tight, but constraints are not linearly independent.
Slide 14
Intuition: a corner is a point at which n inequalities are tight and the corresponding
equations are linearly independent.
P = {x ∈ ℝⁿ | Ax ≥ b}
• a_1, …, a_m: rows of A
• x ∈ P
• I = {i | a_i′x = b_i}
Definition: x is a basic feasible solution if the subspace spanned by {a_i, i ∈ I}
is ℝⁿ.
5.3.1 Degeneracy
Slide 15
• If |I| = n, then the a_i, i ∈ I, are linearly independent; x is nondegenerate.
• If |I| > n, then there still exist n linearly independent vectors among {a_i, i ∈ I}; x is degenerate.
[Figure: (a), (b) nondegenerate and degenerate corners of a polyhedron P, marked A, B, C, E.]

6 Equivalence of definitions
Slide 16
Theorem: Let P = {x | Ax ≥ b} and x ∈ P. Then
x is a vertex ⇔ x is an extreme point ⇔ x is a BFS.
7 BFS for standard form polyhedra
Slide 17
• Ax = b and x ≥ 0
• The m × n matrix A has linearly independent rows
• x ∈ ℝⁿ is a basic solution if and only if Ax = b and there exist indices
  B(1), …, B(m) such that:
  – The columns A_B(1), …, A_B(m) are linearly independent
  – If i ≠ B(1), …, B(m), then x_i = 0
7.1 Construction of BFS Slide 18
Procedure for constructing basic solutions
1. Choose m linearly independent columns AB(1) ; : : :; AB(m)
2. Let xi � 0 for all i 6� B(1); : : :; B(m)
3. Solve Ax � b for xB(1) ; : : : ; xB(m)

Ax � b ! BxB + NxN � b
xN � 0; xB � B� b 1

7.2 Example Slide 19
A = [1 1 2 1 0 0 0       b = [ 8
     0 1 6 0 1 0 0            12
     1 0 0 0 0 1 0             4
     0 1 0 0 0 0 1],           6]

• A_4, A_5, A_6, A_7 basic columns

• Solution: x = (0, 0, 0, 8, 12, 4, 6), a BFS

• Another basis: A_3, A_5, A_6, A_7 basic columns.

• Solution: x = (0, 0, 4, 0, −12, 4, 6), not a BFS
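A small numpy check of the two bases above (a sketch that simply reproduces A and b from the example and solves Bx_B = b):

import numpy as np

A = np.array([[1., 1., 2., 1., 0., 0., 0.],
              [0., 1., 6., 0., 1., 0., 0.],
              [1., 0., 0., 0., 0., 1., 0.],
              [0., 1., 0., 0., 0., 0., 1.]])
b = np.array([8., 12., 4., 6.])

def basic_solution(cols):
    """Set non-basic variables to 0 and solve B x_B = b."""
    B = A[:, cols]
    x = np.zeros(A.shape[1])
    x[list(cols)] = np.linalg.solve(B, b)
    return x

print(basic_solution([3, 4, 5, 6]))   # columns A4,A5,A6,A7 -> (0,0,0,8,12,4,6), a BFS
print(basic_solution([2, 4, 5, 6]))   # columns A3,A5,A6,A7 -> x5 = -12 < 0, not a BFS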


7.3 Geometric intuition Slide 20

[Figure: columns A_1, A_2, A_3 and A_4 = −A_1 drawn as vectors in the plane, giving geometric intuition for which collections of columns can form a basis.]

8 Existence of BFS Slide 21

[Figure: the polyhedra P and Q.]

P = {(x_1, x_2) : 0 ≤ x_1, x_2 ≤ 1}

Q = {(x_1, x_2) : −x_1 + x_2 ≤ 2, x_1 ≥ 0, x_2 ≥ 0}.

Slide 22
Definition: P contains a line if there exist x ∈ P and d ∈ ℝⁿ, d ≠ 0, such that
x + λd ∈ P for all λ.
Theorem: Let P = {x ∈ ℝⁿ | Ax ≥ b} ≠ ∅. Then
P has a BFS ⇔ P does not contain a line.
Implications
• Nonempty polyhedra in standard form P = {x | Ax = b, x ≥ 0} have a BFS
• Bounded polyhedra have a BFS.

9 Optimality of BFS
Slide 23
min c′x
s.t. x ∈ P = {x | Ax ≥ b}
Theorem: Suppose P has at least one extreme point. Then either the optimal cost is
−∞ or there exists an extreme point which is optimal.

10 Conceptual algorithm
Slide 24
• Start at a corner.
• Visit a neighboring corner that improves the objective.

[Figure: a polyhedron with corners A = (0, 0, 0), B = (0, 0, 10), C = (0, 10, 0), D = (10, 0, 0), E = (4, 4, 4); the algorithm moves from corner to adjacent corner.]

15.093 Optimization Methods

Lecture 3: The Simplex Method

1 Outline Slide 1
• Reduced costs
• Optimality conditions
• Improving the cost
• Unboundedness
• The simplex algorithm
• The simplex algorithm on degenerate problems

2 Matrix View Slide 2


min c′x
s.t. Ax = b
     x ≥ 0
x = (x_B, x_N), x_B basic variables, x_N non-basic variables
A = [B, N]
Ax = b ⇒ Bx_B + Nx_N = b
        ⇒ x_B + B⁻¹Nx_N = B⁻¹b
        ⇒ x_B = B⁻¹b − B⁻¹Nx_N

2.1 Reduced Costs Slide 3


z = c_B′x_B + c_N′x_N
  = c_B′(B⁻¹b − B⁻¹Nx_N) + c_N′x_N
  = c_B′B⁻¹b + (c_N′ − c_B′B⁻¹N)x_N
c̄_j = c_j − c_B′B⁻¹A_j   (reduced cost)

2.2 Optimality Conditions Slide 4


Theorem:
• x is a BFS associated with basis B
• c̄ is the vector of reduced costs

Then
• If c̄ ≥ 0, then x is optimal.

• If x is optimal and nondegenerate, then c̄ ≥ 0.
2.3 Proof
• y arbitrary feasible solution
• d = y − x ⇒ Ax = Ay = b ⇒ Ad = 0
  ⇒ Bd_B + Σ_{i∈N} A_i d_i = 0
Slide 5
  ⇒ d_B = −Σ_{i∈N} B⁻¹A_i d_i

  ⇒ c′d = c_B′d_B + Σ_{i∈N} c_i d_i
        = Σ_{i∈N} (c_i − c_B′B⁻¹A_i) d_i = Σ_{i∈N} c̄_i d_i                Slide 6
• Since y ≥ 0 and x_i = 0, i ∈ N, we have d_i = y_i − x_i ≥ 0, i ∈ N
• c′d = c′(y − x) ≥ 0 ⇒ c′y ≥ c′x
  ⇒ x optimal
For part (b), see BT, Theorem 3.1.

3 Improving the Cost


Slide 7
• Suppose c̄_j = c_j − c_B′B⁻¹A_j < 0.
  Can we improve the cost?
• Let d_B = −B⁻¹A_j,
  d_j = 1, d_i = 0, i ≠ B(1), …, B(m), j.
• Let y = x + θ·d, θ ≥ 0 scalar.
Slide 8
c′y = c′x + θ·c′d

c′d = c_B′d_B + c_j d_j = c_j − c_B′B⁻¹A_j = c̄_j

Thus, if c̄_j < 0 the cost will decrease.

4 Unboundedness Slide 9
• Is y = x + θ·d feasible?
  Since Ad = 0 ⇒ Ay = Ax = b.
• Is y ≥ 0?
  If d ≥ 0 ⇒ x + θ·d ≥ 0 for all θ ≥ 0
  ⇒ the objective is unbounded.

[Figure: the feasible set of the example below, with corners (0,0,3), (1,0,3), (2,0,2), (0,1,3), (0,2,0), (2,2,0) in (x_1, x_2, x_3) space.]

5 Improvement
Slide 10
If d_i < 0, then
x_i + θd_i ≥ 0 ⇒ θ ≤ −x_i / d_i

⇒ θ* = min_{i : d_i < 0} (−x_i / d_i)

⇒ θ* = min_{i=1,…,m : d_B(i) < 0} (−x_B(i) / d_B(i))
5.1 Example Slide 11
min x_1 + 5x_2 − 2x_3
s.t. x_1 + x_2 + x_3 ≤ 4
     x_1 ≤ 2
     x_3 ≤ 3
     3x_2 + x_3 ≤ 6
     x_1, x_2, x_3 ≥ 0
In standard form, with slack variables x_4, …, x_7:
Slide 12
A = [1 1 1 1 0 0 0       b = [4
     1 0 0 0 1 0 0            2
     0 0 1 0 0 1 0            3
     0 3 1 0 0 0 1],          6]
Slide 13
B = [A_1, A_3, A_6, A_7]

BFS: x = (2, 0, 2, 0, 0, 1, 4)′                                        Slide 14

[Figure: the feasible set again, with the current BFS at (2, 0, 2).]

B = [1 1 0 0       B⁻¹ = [ 0  1  0  0
     1 0 0 0              1 −1  0  0
     0 1 1 0             −1  1  1  0
     0 1 0 1],           −1  1  0  1]

c̄′ = (0, 7, 0, 2, −3, 0, 0)

x_5 enters the basis: d_5 = 1, d_2 = d_4 = 0, and

(d_1, d_3, d_6, d_7)′ = −B⁻¹A_5 = (−1, 1, −1, −1)′                      Slide 15

y = x + θd = (2 − θ, 0, 2 + θ, 0, θ, 1 − θ, 4 − θ)′

What happens as θ increases?

θ* = min_{i=1,…,m : d_B(i) < 0} (−x_B(i) / d_B(i)) = min{2/1, 1/1, 4/1} = 1.

l = 6 (A_6 exits the basis).
New solution:
y = (1, 0, 3, 0, 1, 0, 3)′                                              Slide 16
New basis: B = [A_1, A_3, A_5, A_7]

Slide 17
B = [1 1 0 0       B⁻¹ = [ 1  0 −1  0
     1 0 1 0              0  0  1  0
     0 1 0 0             −1  1  1  0
     0 1 0 1],            0  0 −1  1]

c̄′ = c′ − c_B′B⁻¹A = (0, 4, 0, −1, 0, 3, 0)
We need to continue: column A_4 enters the basis.

6 Correctness Slide 18
θ* = −x_B(l)/d_B(l) = min_{i=1,…,m : d_B(i) < 0} (−x_B(i) / d_B(i))
Theorem
• B̄ = {A_B(i), i ≠ l} ∪ {A_j} is a basis.

• y = x + θ*·d is a BFS associated with the basis B̄.

7 The Simplex Algorithm


Slide 19
1. Start with a basis B = [A_B(1), …, A_B(m)]
   and a BFS x.
2. Compute c̄_j = c_j − c_B′B⁻¹A_j.
   • If c̄ ≥ 0, x is optimal; stop.

   • Else select j with c̄_j < 0.
Slide 20
3. Compute u = −d_B = B⁻¹A_j.
   • If u ≤ 0 ⇒ the cost is unbounded; stop.
   • Else
4. θ* = min_{1≤i≤m : u_i > 0} x_B(i)/u_i = x_B(l)/u_l
5. Form a new basis by replacing A_B(l) with A_j.
6. y_j = θ*,  y_B(i) = x_B(i) − θ*·u_i.
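For concreteness, here is a bare-bones Python/numpy sketch of the algorithm above (dense linear algebra, no anti-cycling rule, arbitrary tolerances); it is a teaching illustration under those assumptions, not production code.

import numpy as np

def simplex(A, b, c, basis):
    """Minimize c'x s.t. Ax = b, x >= 0, starting from a feasible basis."""
    m, n = A.shape
    basis = list(basis)
    while True:
        B_inv = np.linalg.inv(A[:, basis])
        x_B = B_inv @ b
        p = c[basis] @ B_inv                  # simplex multipliers c_B' B^{-1}
        cbar = c - p @ A                      # reduced costs (step 2)
        if np.all(cbar >= -1e-9):             # optimality test
            x = np.zeros(n); x[basis] = x_B
            return x, c @ x
        j = int(np.argmin(cbar))              # entering variable
        u = B_inv @ A[:, j]                   # step 3
        if np.all(u <= 1e-9):
            raise ValueError("cost unbounded below")
        ratios = np.full(m, np.inf)           # step 4: min-ratio test
        pos = u > 1e-9
        ratios[pos] = x_B[pos] / u[pos]
        l = int(np.argmin(ratios))
        basis[l] = j                          # step 5: change of basis

# The example of this lecture (slacks x4..x7 form the initial basis);
# the expected output is x = (0, 0, 3, 1, 2, 0, 3) with cost -6.
A = np.array([[1., 1., 1., 1., 0., 0., 0.], [1., 0., 0., 0., 1., 0., 0.],
              [0., 0., 1., 0., 0., 1., 0.], [0., 3., 1., 0., 0., 0., 1.]])
b = np.array([4., 2., 3., 6.])
c = np.array([1., 5., -2., 0., 0., 0., 0.])
print(simplex(A, b, c, [3, 4, 5, 6]))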
7.1 Finite Convergence Slide 21
Theorem:
• P = {x | Ax = b, x ≥ 0} ≠ ∅
• Every BFS is non-degenerate
Then
• The simplex method terminates after a finite number of iterations.
• At termination, we either have an optimal basis B, or we have a direction d with
  Ad = 0, d ≥ 0, c′d < 0, and the optimal cost is −∞.

7.2 Degenerate problems Slide 22
• θ* can equal zero (why?) ⇒ y = x, although B̄ ≠ B.
• Even if θ* > 0, there might be a tie in
  min_{1≤i≤m : u_i > 0} x_B(i)/u_i ⇒
  the next BFS is degenerate.

• Finite termination is not guaranteed; cycling is possible.
7.3 Avoiding Cycling Slide 23
• Cycling can be avoided by carefully selecting which variables enter and
  exit the basis.
• Example (smallest-subscript rule): among all variables with c̄_j < 0, pick the one with
  the smallest subscript; among all variables eligible to exit the basis, pick the one with
  the smallest subscript.

15.093J Optimization Methods

Lecture 4: The Simplex Method II

1 Outline
Slide 1
• Revised Simplex method
• The full tableau implementation
• Finding an initial BFS
• The complete algorithm
• The column geometry
• Computational efficiency

2 Revised Simplex
Slide 2
Initial data: A, b, c

1. Start with basis B = [AB(1) , . . . , AB(m) ]


and B −1 .
2. Compute p′ = c′B B −1
cj = cj − p ′ A j
• If cj ≥ 0; x optimal; stop.
• Else select j : cj < 0.
Slide 3
3. Compute u = B −1 Aj .
• If u ≤ 0 ⇒ cost unbounded; stop
• Else
xB(i) uB(l)
4. θ∗ = min =
1≤i≤m,ui >0 ui ul
5. Form a new basis B by replacing AB(l) with Aj .
6. yj = θ∗ , yB(i) = xB(i) − θ∗ ui
Slide 4
7. Form [B −1 |u]
8. Add to each one of its rows a multiple of the lth row in order to make the
last column equal to the unit vector el .
The first m columns give the new B⁻¹.

1
2.1 Example
Slide 5
min x1 + 5x2 −2x3
s.t. x1 + x2 + x3 ≤4
x1 ≤2
x3 ≤3
3x2 + x3 ≤6
x1 , x2 , x3 ≥0
Slide 6
B = {A_1, A_3, A_6, A_7}, BFS: x = (2, 0, 2, 0, 0, 1, 4)′

c̄ = (0, 7, 0, 2, −3, 0, 0)

B = [1 1 0 0       B⁻¹ = [ 0  1  0  0
     1 0 0 0              1 −1  0  0
     0 1 1 0             −1  1  1  0
     0 1 0 1],           −1  1  0  1]

(u_1, u_3, u_6, u_7)′ = B⁻¹A_5 = (1, −1, 1, 1)′
θ* = min{2/1, 1/1, 4/1} = 1
l = 6 (A_6 exits the basis).                                            Slide 7

            [ 0  1  0  0 |  1]             [ 1  0 −1  0]
[B⁻¹ | u] = [ 1 −1  0  0 | −1]   ⇒  B⁻¹ =  [ 0  0  1  0]
            [−1  1  1  0 |  1]             [−1  1  1  0]
            [−1  1  0  1 |  1]             [ 0  0 −1  1]

2.2 Practical issues


Slide 8
• Numerical Stability
B −1 needs to be computed from scratch once in a while, as errors accu­
mulate
• Sparsity
B −1 is represented in terms of sparse triangular matrices

3 Full tableau implementation


Slide 9
−c_B′B⁻¹b | c′ − c_B′B⁻¹A
B⁻¹b      | B⁻¹A

or, in more detail,

−c_B′x_B  | c̄_1  ...  c̄_n
x_B(1)    |
   ⋮      | B⁻¹A_1  ...  B⁻¹A_n
x_B(m)    |

3.1 Example
Slide 10
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 ≤ 20
2x1 + x2 + 2x3 ≤ 20
2x1 + 2x2 + x3 ≤ 20
x1 , x2 , x3 ≥ 0
min −10x1 − 12x2 − 12x3
s.t. x1 + 2x2 + 2x3 + x4 = 20
2x1 + x2 + 2x3 + x5 = 20
2x1 + 2x2 + x3 + x6 = 20
x1 , . . . , x6 ≥ 0
BFS: x = (0, 0, 0, 20, 20, 20)′
B=[A4 , A5 , A6 ] Slide 11

x1 x2 x3 x4 x5 x6
0 −10 −12 −12 0 0 0
x4 = 20 1 2 2 1 0 0
x5 = 20 2* 1 2 0 1 0
x6 = 20 2 2 1 0 0 1

c′ = c′ − c′B B −1 A = c′ = (−10, −12, −12, 0, 0, 0) Slide 12

x1 x2 x3 x4 x5 x6
100 0 −7 −2 0 5 0
x4 = 10 0 1.5 1* 1 −0.5 0
x1 = 10 1 0.5 1 0 0.5 0

x6 = 0 0 1 −1 0 −1 1

Slide 13

3
x1 x2 x3 x4 x5 x6
120 0 −4 0 2 4 0
x3 = 10 0 1.5 1 1 −0.5 0
x1 = 0 1 −1 0 −1 1 0
x6 = 10 0 2.5* 0 1 −1.5 1
Slide 14

x1 x2 x3 x4 x5 x6
136 0 0 0 3.6 1.6 1.6
x3 = 4 0 0 1 0.4 0.4 −0.6
x1 = 4 1 0 0 −0.6 0.4 0.4
x2 = 4 0 1 0 0.4 −0.6 0.4
Slide 15
[Figure: the path of the simplex method on the feasible set, visiting the corners A = (0, 0, 0), B = (0, 0, 10), C = (0, 10, 0), D = (10, 0, 0), E = (4, 4, 4).]
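As a quick sanity check (not part of the original notes), the same example can be handed to scipy's LP solver:

import numpy as np
from scipy.optimize import linprog

c = [-10, -12, -12]
A_ub = [[1, 2, 2], [2, 1, 2], [2, 2, 1]]
b_ub = [20, 20, 20]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, bounds=(0, None))
print(res.x, res.fun)   # expected: x = (4, 4, 4), optimal cost -136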

4 Comparison of implementations
Slide 16
Full tableau Revised simplex

Memory O(mn) O(m2 )

Worst-case time O(mn) O(mn)

Best-case time O(mn) O(m2 )

4
5 Finding an initial BFS
Slide 17
• Goal: Obtain a BFS of Ax = b, x ≥ 0

or decide that LOP is infeasible.

• Special case: b ≥ 0

Ax ≤ b, x ≥ 0

⇒ Ax + s = b, x, s ≥ 0
s = b, x=0

5.1 Artificial variables


Slide 18
Ax = b, x≥0
1. Multiply rows with −1 to get b ≥ 0.
2. Introduce artificial variables y, start with initial BFS y = b, x = 0, and

apply simplex to auxiliary problem

min y1 + y2 + . . . + ym
s.t. Ax + y = b
x, y ≥ 0
Slide 19
3. If cost > 0 ⇒ LOP infeasible; stop.

4. If cost = 0 and no artificial variable is in the basis, then a BFS was found.

5. Else, all yi∗ = 0, but some are still in the basis. Say we have AB(1) , . . . , AB(k)

in basis k < m. There are m − k additional columns of A to form a basis.

Slide 20

6. Drive the artificial variables out of the basis: if the lth basic variable is artificial,
   examine the lth row of B⁻¹A. If all elements = 0, the row is redundant and can be removed.
   Otherwise, pivot on a nonzero element.

6 A complete Algorithm for LO


Slide 21
Phase I:
1. By multiplying some of the constraints by −1, change the problem so that

b ≥ 0.

2. Introduce y_1, …, y_m, if necessary, and apply the simplex method to
   min Σ_{i=1}^m y_i.

3. If cost> 0, original problem is infeasible; STOP.

5
4. If cost= 0, a feasible solution to the original problem has been found.
5. Drive artificial variables out of the basis, potentially eliminating redundant

rows.

Slide 22
Phase II:
1. Let the final basis and tableau obtained from Phase I be the initial basis

and tableau for Phase II.

2. Compute the reduced costs of all variables for this initial basis, using the

cost coefficients of the original problem.

3. Apply the simplex method to the original problem.

6.1 Possible outcomes


Slide 23
1. Infeasible: Detected at Phase I.
2. A has linearly dependent rows: Detected at Phase I, eliminate redundant

rows.

3. Unbounded (cost= −∞): detected at Phase II.


4. Optimal solution: Terminate at Phase II in optimality check.

7 The big-M method


min Σ_{j=1}^n c_j x_j + M Σ_{i=1}^m y_i                                Slide 24
s.t. Ax + y = b
     x, y ≥ 0

8 The Column Geometry


Slide 25
min c′ x
s.t. Ax = b
e′ x = 1
x ≥ 0
x_1 [A_1] + x_2 [A_2] + ··· + x_n [A_n] = [b]
    [c_1]       [c_2]            [c_n]    [z]
Slide 26
Slide 27

[Figures: the column geometry. The points (A_i, c_i) (labeled B, C, D, F, G, H, I) lie above the plane of the columns; the requirement line through b meets the lower envelope at the optimal basis. A second picture shows an initial basis, the next basis after a pivot, and the optimal basis.]

9 Computational efficiency
Slide 28
Exceptional practical behavior: linear in n
Worst case
max xn
s.t. ǫ ≤ x1 ≤ 1
ǫxi−1 ≤ xi ≤ 1 − ǫxi−1 , i = 2, . . . , n
Slide 29
[Figure: (a) the feasible set for ǫ = 0 (the unit cube) and (b) its perturbed version for ǫ > 0, on which the simplex method can visit all 2ⁿ vertices.]

Slide 30
Theorem
• The feasible set has 2n vertices
• The vertices can be ordered so that each one is adjacent to and has lower

cost than the previous one.

• There exists a pivoting rule under which the simplex method requires

2n − 1 changes of basis before it terminates.

8
15.093J Optimization Methods

Lecture 5: Duality Theory I

1 Outline
Slide 1
• Motivation of duality
• General form of the dual
• Weak and strong duality
• Relations between primal and dual
• Economic Interpretation
• Complementary Slackness

2 Motivation
2.1 An idea from Lagrange
Slide 2
Consider the LOP, called the primal with optimal solution x∗

min c′ x
s.t. Ax = b
x≥0

Relax the constraint


g(p) = min c′ x + p′ (b − Ax)
s.t. x ≥ 0

g(p) ≤ c′ x∗ + p′ (b − Ax∗ ) = c′ x∗
Get the tightest lower bound, i.e.,

max g(p)

g(p) = min_{x ≥ 0} [c′x + p′(b − Ax)]
     = p′b + min_{x ≥ 0} (c′ − p′A)x

Note that
min_{x ≥ 0} (c′ − p′A)x = { 0,    if c′ − p′A ≥ 0′,
                           { −∞,  otherwise.

Dual max g(p) ⇔ max p′ b

s.t. p′ A ≤ c′

1
3 General form of the dual

Slide 3
Primal                              Dual
min c′x                             max p′b
s.t. a_i′x ≥ b_i, i ∈ M_1           s.t. p_i ≥ 0, i ∈ M_1
     a_i′x ≤ b_i, i ∈ M_2                p_i ≤ 0, i ∈ M_2
     a_i′x = b_i, i ∈ M_3                p_i free, i ∈ M_3
     x_j ≥ 0, j ∈ N_1                    p′A_j ≤ c_j, j ∈ N_1
     x_j ≤ 0, j ∈ N_2                    p′A_j ≥ c_j, j ∈ N_2
     x_j free, j ∈ N_3                   p′A_j = c_j, j ∈ N_3

3.1 Example
Slide 4
min x1 + 2x2 + 3x3 max 5p1 + 6p2 + 4p3
s.t. −x1 + 3x2 =5 s.t. p1 free
2x1 − x2 + 3x3 ≥ 6 p2 ≥0
x3 ≤ 4 p3 ≤0
x1 ≥ 0 −p1 + 2p2 ≤1
x2 ≤ 0 3p1 − p2 ≥2
x3 free, 3p2 + p3 = 3.
Slide 5
Primal (min)                 Dual (max)
constraints  ≥ b_i           variables   ≥ 0
             ≤ b_i                       ≤ 0
             = b_i                       free
variables    ≥ 0             constraints ≤ c_j
             ≤ 0                         ≥ c_j
             free                        = c_j

Theorem: The dual of the dual is the primal.

3.2 A matrix view


Slide 6
min c′ x max p′ b
s.t. Ax = b s.t. p′ A ≤ c′
x ≥ 0
min c′ x max p′ b
s.t. Ax ≥ b s.t. p′ A = c′
p≥0

4 Weak Duality
Slide 7
Theorem:
If x is primal feasible and p is dual feasible then p′ b ≤ c′ x
Proof
p′ b = p′ Ax ≤ c′ x

2
Corollary:

If x is primal feasible, p is dual feasible, and p′ b = c′ x, then x is optimal in

the primal and p is optimal in the dual.

5 Strong Duality
Slide 8
Theorem: If the LOP has optimal solution, then so does the dual, and optimal

costs are equal.

Proof:

min c′ x
s.t. Ax = b
x ≥ 0
Apply Simplex; optimal solution x, basis B.
Optimality conditions:
c′ − c′B B −1 A ≥ 0′
Slide 9
Define p′ = c′B B −1 ⇒ p′ A ≤ c′
⇒ p dual feasible for
max p′ b
s.t. p′ A ≤ c′

p′ b = c′B B −1 b = c′B xB = c′ x
⇒ x, p are primal and dual optimal

5.1 Intuition
Slide 10
[Figure: geometric intuition for strong duality: at the optimum x*, the cost vector c is expressed as a nonnegative combination p_1 a_1 + p_2 a_2 of the active constraint normals.]
6 Relations between primal and dual
Slide 11
                    Dual finite opt.   Dual unbounded   Dual infeasible
Primal finite opt.         *
Primal unbounded                                               *
Primal infeasible                             *                *

7 Economic Interpretation
Slide 12
• x optimal nondegenerate solution: B −1 b > 0
• Suppose b changes to b + d for some small d
• How is the optimal cost affected?
• For small d feasibilty unaffected
• Optimality conditions unaffected
• New cost c′B B −1 (b + d) = p′ (b + d)
• If resource i changes by di , cost changes by pi di : “Marginal Price”

8 Complementary slackness
8.1 Theorem
Slide 13
Let x primal feasible and p dual feasible. Then x, p optimal if and only if

pi (a′i x − bi ) = 0, ∀i

xj (cj − p′ Aj ) = 0, ∀j

8.2 Proof
Slide 14
• ui = pi (a′i x − bi ) and vj = (cj − p′ Aj )xj
• If x, p primal and dual feasible, ui ≥ 0, vj ≥ 0 ∀i, j.
• Also c′x − p′b = Σ_i u_i + Σ_j v_j.

• By the strong duality theorem, if x and p are optimal, then c′ x = p′ b ⇒

ui = vj = 0 for all i, j.

• Conversely, if ui = vj = 0 for all i, j, then c′ x = p′ b,

• ⇒ x and p are optimal.

4
8.3 Example
Slide 15

min 13x1 + 10x2 + 6x3 max 8p1 + 3p2


s.t. 5x1 + x2 + 3x3 = 8 s.t. 5p1 + 3p2 ≤ 13

3x1 + x2 = 3 p1 + p2 ≤ 10

x1 , x2 , x3 ≥ 0 3p1 ≤ 6

Is x∗ = (1, 0, 1)′ optimal? Slide 16

5p1 + 3p2 = 13, 3p1 = 6

⇒ p1 = 2, p2 = 1
Both objectives equal 19, so x* and p are optimal.
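A short numerical sketch of this complementary-slackness argument (numpy only; it just re-does the computation above):

import numpy as np

A = np.array([[5., 1., 3.], [3., 1., 0.]])
b = np.array([8., 3.])
c = np.array([13., 10., 6.])
x = np.array([1., 0., 1.])                     # candidate primal solution

# CS forces the dual constraints of x1 and x3 to be tight:
p = np.linalg.solve(np.array([[5., 3.], [3., 0.]]), np.array([13., 6.]))
print(p)                                        # (2, 1)
print(A.T @ p <= c + 1e-9)                      # dual feasibility
print(c @ x, b @ p)                             # both 19 -> x and p optimal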

15.093 Optimization Methods

Lecture 6: Duality Theory II

[Figure: four cases A-D showing the cost vector c relative to the cone generated by the constraint normals a_1, …, a_5, used in the geometry of duality below.]

1 Outline
Slide 1
• Geometry of duality
• The dual simplex algorithm
• Farkas lemma
• Duality as a proof technique

2 The Geometry of Duality


Slide 2
min c′ x
s.t. a′i x ≥ bi , i = 1, . . . , m

max p′b
s.t. Σ_{i=1}^m p_i a_i = c
     p ≥ 0

3 Dual Simplex Algorithm


3.1 Motivation
Slide 3
• In simplex method B −1 b ≥ 0
• Primal optimality condition

c′ − c′B B −1 A ≥ 0′

same as dual feasibility

[Figure: the optimum x* with active constraint normals a_1, a_2, a_3 and the cost vector c.]

• Simplex is a primal algorithm: maintains primal feasibility and works


towards dual feasibility
• Dual algorithm: maintains dual feasibility and works towards primal
feasibility
Slide 4
−c′B xB c̄1 ... c̄n
xB(1) | |
..
. B −1 A1 ... B −1 An
xB(m) | |

• Do not require B −1 b ≥ 0
• Require c̄ ≥ 0 (dual feasibility)
• Dual cost is
p′ b = c′B B −1 b = c′B xB

• If B −1 b ≥ 0 then both dual feasibility and primal feasibility, and also

same cost ⇒ optimality

• Otherwise, change basis

3.2 An iteration
Slide 5
1. Start with basis matrix B and all reduced costs ≥ 0.

2. If B −1 b ≥ 0 optimal solution found; else, choose l s.t. xB(l) < 0.

2
[Figure: (a) the primal feasible region in (x_1, x_2) with points A-E; (b) the dual feasible region in (p_1, p_2), for the example below.]
3. Consider the lth row (pivot row) xB(l) , v1 , . . . , vn . If ∀i vi ≥ 0 then dual

optimal cost = +∞ and algorithm terminates.

Slide 6
4. Else, let j be such that
   c̄_j / |v_j| = min_{i : v_i < 0} c̄_i / |v_i|.
5. Pivot element vj : Aj enters the basis and AB(l) exits.

3.3 An example
Slide 7
min x1 + x2
s.t. x1 + 2x2 ≥ 2
x1 ≥ 1
x1 , x2 ≥ 0
min x1 + x2 max 2p1 + p2
s.t. x1 + 2x2 − x3 = 2 s.t. p1 + p2 ≤ 1
x1 − x4 = 1 2p1 ≤ 1
x1 , x2 , x3 , x4 ≥ 0 p1 , p2 ≥ 0
Slide 8
x1 x2 x3 x4
0 1 1 0 0
x3 = −2 −1 −2* 1 0
x4 = −1 −1 0 0 1
Slide 9

x1 x2 x3 x4
−1 1/2 0 1/2 0
x2 = 1 1/2 1 −1/2 0
x4 = −1 −1* 0 0 1

3
[Figure: the vectors A_1, A_2, A_3 and b, illustrating whether b lies in the cone generated by the columns of A.]

x1 x2 x3 x4
−3/2 0 0 1/2 1/2
x2 = 1/2 0 1 −1/2 1/2
x1 = 1 1 0 0 −1

4 Duality as a proof method


4.1 Farkas lemma
Slide 10
Theorem:

Exactly one of the following two alternatives hold:

1. ∃x ≥ 0 s.t. Ax = b.
2. ∃p s.t. p′ A ≥ 0′ and p′ b < 0.

4.1.1 Proof
Slide 11
“ ⇒′′ If ∃x ≥ 0 s.t. Ax = b, and if p′ A ≥ 0′ , then p′ b = p′ Ax ≥ 0
“ ⇐′′ Assume there is no x ≥ 0 s.t. Ax = b

(P ) max 0′ x (D) min p′ b


s.t. Ax = b s.t. p′ A ≥ 0 ′
x ≥ 0

(P) infeasible ⇒ (D) either unbounded or infeasible


Since p = 0 is feasible ⇒ (D) unbounded
⇒ ∃p : p′ A ≥ 0′ and p′ b < 0

15.093 Optimization Methods

Lecture 7: Sensitivity Analysis


1 Motivation
1.1 Questions
Slide 1
z = min c′ x
s.t. Ax = b

x ≥ 0

• How does z depend globally on c? on b?


• How does z change locally if either b, c, A change?
• How does z change if we add new constraints, introduce new variables?
• Importance: Insight about LO and practical relevance

2 Outline
Slide 2
1. Global sensitivity analysis
2. Local sensitivity analysis
(a) Changes in b
(b) Changes in c
(c) A new variable is added
(d) A new constraint is added
(e) Changes in A
3. Detailed example

3 Global sensitivity analysis


3.1 Dependence on c
Slide 3
G(c) = min c′ x
s.t. Ax = b
x≥0
G(c) = min_{i=1,…,N} c′x^i is a concave function of c

3.2 Dependence on b
Slide 4
Primal:  F(b) = min c′x   s.t. Ax = b, x ≥ 0
Dual:    F(b) = max p′b   s.t. p′A ≤ c′

F(b) = max_{i=1,…,N} (p^i)′b is a convex function of b

[Figures: (top) (c + θd)′x^i for the different basic solutions x^1, …, x^4; G is their lower envelope, a concave piecewise linear function of θ, with a different x^i optimal on each interval. (Bottom) f(θ) = max_i (p^i)′(b* + θd), the upper envelope of linear functions: a convex piecewise linear function with breakpoints θ_1, θ_2.]

4 Local sensitivity analysis


Slide 5
z = min c′ x
s.t. Ax = b
x≥0
What does it mean that a basis B is optimal?

1. Feasibility conditions: B −1 b ≥ 0
2. Optimality conditions: c′ − c′B B −1 A ≥ 0′
Slide 6
• Suppose that there is a change in either b or c for example
• How do we find whether B is still optimal?
• Need to check whether the feasibility and optimality conditions are satis­

fied

5 Local sensitivity analysis


5.1 Changes in b
Slide 7
bi becomes bi + Δ, i.e.
(P ) min c′ x (P ′ ) min c′ x
s.t. Ax = b → s.t. Ax = b + Δei

x≥0 x ≥ 0

• B optimal basis for (P )


• Is B optimal for (P ′ )?
Slide 8
Need to check:

1. Feasibility: B −1 (b + Δei ) ≥ 0
2. Optimality: c′ − c′B B −1 A ≥ 0′

Observations:
1. Changes in b affect feasibility
2. Optimality conditions are not affected
Slide 9
B −1 (b + Δei ) ≥ 0
βij = [B −1 ]ij
bj = [B −1 b]j
Thus,
(B⁻¹b)_j + Δ(B⁻¹e_i)_j ≥ 0 ⇒ b̄_j + Δβ_ji ≥ 0 ⇒

max_{β_ji > 0} (−b̄_j / β_ji) ≤ Δ ≤ min_{β_ji < 0} (−b̄_j / β_ji)
Slide 10

Δ_min ≤ Δ ≤ Δ_max

Within this range


• Current basis B is optimal
• z = c′B B −1 (b + Δei ) = c′B B −1 b + Δpi
• What if Δ = Δ_max?
• What if Δ > Δ_max?
Current solution is infeasible, but satisfies optimality conditions → use

dual simplex method

5.2 Changes in c
Slide 11
cj → cj + Δ

Is current basis B optimal?

Need to check:

1. Feasibility: B −1 b ≥ 0, unaffected
2. Optimality: c′ − c′B B −1 A ≥ 0′ , affected

There are two cases:


• xj basic

• xj nonbasic

5.2.1 xj nonbasic
Slide 12
cB unaffected
(cj + Δ) − c′B B −1 Aj ≥ 0 ⇒ cj + Δ ≥ 0
Solution optimal if Δ ≥ −cj
What if Δ = −cj ?
What if Δ < −cj ?

4
5.2.2 xj basic
Slide 13

cB ← ĉB = cB + Δej

Then,
[c′ − ĉ_B′B⁻¹A]_i ≥ 0 ⇒ c_i − [c_B + Δe_j]′B⁻¹A_i ≥ 0
With ā_ji = [B⁻¹A_i]_j,

c̄_i − Δā_ji ≥ 0 ⇒ max_{ā_ji < 0} (c̄_i / ā_ji) ≤ Δ ≤ min_{ā_ji > 0} (c̄_i / ā_ji)

What if Δ is outside this range? use primal simplex

5.3 A new variable is added


Slide 14
min c′ x min c′ x + cn+1 xn+1
s.t. Ax = b → s.t. Ax + An+1 xn+1 = b
x≥0 x≥0
In the new problem is xn+1 = 0 or xn+1 > 0? (i.e., is the new activity prof­
itable?) Slide 15
Current basis B. Is solution x = B −1 b, xn+1 = 0 optimal?

• Feasibility conditions are satisfied


• Optimality conditions:

cn+1 − c′B B −1 An+1 ≥ 0 ⇒ cn+1 − p′ An+1 ≥ 0?

• If yes, solution x = B −1 b, xn+1 = 0 optimal


• Otherwise, use primal simplex

5.4 A new constraint is added


Slide 16
′ min c′ x
min c x
s.t. Ax = b
s.t. Ax = b →
a′m+1 x = bm+1
x≥0
x≥0
If current solution feasible, it is optimal; otherwise, apply dual simplex

5
5.5 Changes in A
Slide 17
• Suppose aij ← aij + Δ
• Assume Aj does not belong in the basis
• Feasibility conditions: B −1 b ≥ 0, unaffected
• Optimality conditions: c̄_l = c_l − c_B′B⁻¹A_l ≥ 0, l ≠ j, unaffected
• Optimality condition for j: c_j − p′(A_j + Δe_i) ≥ 0 ⇒ c̄_j − Δp_i ≥ 0

• What if Aj is basic? BT, Exer. 5.3

6 Example
6.1 A Furniture company
Slide 18
• A furniture company makes desks, tables, chairs
• The production requires wood, finishing labor, carpentry labor

                     Desk   Table   Chair   Avail.
Profit ($)             60      30      20      -
Wood (ft)               8       6       1     48
Finishing hrs.          4       2     1.5     20
Carpentry hrs.          2     1.5     0.5      8

6.2 Formulation
Slide 19
Decision variables:
x1 = # desks, x2 = # tables, x3 = # chairs

max 60x1 + 30x2 + 20x3


s.t. 8x1 + 6x2 + x3 ≤ 48
4x1 + 2x2 + 1.5x3 ≤ 20
2x1 + 1.5x2 + 0.5x3 ≤8
x1 , x2 , x3 ≥0

6.3 Simplex tableaus


Slide 20
Initial tableau:      s1   s2   s3    x1    x2    x3
          0            0    0    0   -60   -30   -20
     s1 = 48           1    0    0     8     6     1
     s2 = 20           0    1    0     4     2   1.5
     s3 =  8           0    0    1     2   1.5   0.5

6
Final tableau: s1 s2 s3 x1 x2 x3
280 0 10 10 0 5 0
s1 = 24 1 2 -8 0 -2 0
x3 = 8 0 2 -4 0 -2 1
x1 = 2 0 -0.5 1.5 1 1.25 0

6.4 Information in tableaus


Slide 21
• What is B?
      [1   1    8]
  B = [0  1.5   4]
      [0  0.5   2]

• What is B⁻¹?
        [1    2   −8]
  B⁻¹ = [0    2   −4]
        [0 −0.5  1.5]
Slide 22
• What is the optimal solution?
• What is the optimal solution value?
• Is it a bit surprising?
• What is the optimal dual solution?
• What is the shadow price of the wood constraint?
• What is the shadow price of the finishing hours constraint?
• What is the reduced cost for x2 ?

6.5 Shadow prices


Slide 23
Why the dual price of the finishing hours constraint is 10?

• Suppose that finishing hours become 21 (from 20).


• Currently only desks (x1 ) and chairs (x3 ) are produced
• Finishing and carpentry hours constraints are tight
• Does this change leaves current basis optimal?
Slide 24
New system (finishing hours = 21):            New     Previous
8x1 + x3 + s1 = 48                            s1 = 26    24
4x1 + 1.5x3   = 21            ⇒               x1 = 1.5    2
2x1 + 0.5x3   =  8                            x3 = 10     8
Solution change: z′ − z = (60·1.5 + 20·10) − (60·2 + 20·8) = 10
Slide 25

7
• Suppose you can hire 1h of finishing overtime at $7. Would you do it?
• Another check:
                            [1    2   −8]
  c_B′B⁻¹ = (0, −20, −60)   [0    2   −4]  = (0, −10, −10)
                            [0 −0.5  1.5]
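The same check in a couple of lines of numpy (a verification sketch; using the max-problem sign convention, so the shadow prices come out as (0, 10, 10) rather than (0, −10, −10)):

import numpy as np

B = np.array([[1., 1., 8.],        # basic columns: s1, x3, x1
              [0., 1.5, 4.],
              [0., 0.5, 2.]])
c_B = np.array([0., 20., 60.])     # profits of the basic variables
p = c_B @ np.linalg.inv(B)
print(p)                           # [0., 10., 10.]: wood, finishing, carpentry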

6.6 Reduced costs


Slide 26
• What does it mean that the reduced cost for x2 is 5?
• Suppose you are forced to produce x2 = 1 (1 table)
• How much will the profit decrease?

8x1 + x3 + s1 + 6·1 = 48 s1 = 26
4x1 + 1.5x3 + 2·1 = 20 ⇒ x1 = 0.75
2x1 + 0.5x3 + 1.5 · 1 = 8 x3 = 10
z ′ − z = (60 ∗ 0.75 + 20 ∗ 10) − (60 ∗ 2 + 20 ∗ 8 + 30 ∗ 1) = −35 + 30 = −5 Slide 27
Another way to calculate the same thing: If x2 = 1

Direct profit from table +30


Decrease wood by -6 −6 ∗ 0 = 0
Decrease finishing hours by -2 −2 ∗ 10 = −20
Decrease carpentry hours by -1.5 −1.5 ∗ 10 = −15
Total Effect −5

Suppose profit from tables increases from $30 to $34. Should it be produced?
At $35? At $36?

6.7 Cost ranges


Slide 28
Suppose profit from desks becomes 60 + Δ. For what values of Δ does current

basis remain optimal?

Optimality conditions:

c̄_j = c_j − c_B′B⁻¹A_j ≥ 0 ⇒

                                  [1    2   −8]
p′ = c_B′B⁻¹ = [0, −20, −(60+Δ)]  [0    2   −4]  = −[0, 10 − 0.5Δ, 10 + 1.5Δ]
                                  [0 −0.5  1.5]


Slide 29
s1 , x3 , x1 are basic
Reduced costs of non-basic variables

                                                 [  6]
c̄_2 = c_2 − p′A_2 = −30 + [0, 10 − 0.5Δ, 10 + 1.5Δ] [  2]  = 5 + 1.25Δ
                                                 [1.5]
c̄_s2 = 10 − 0.5Δ
c̄_s3 = 10 + 1.5Δ
The current basis stays optimal when:

5 + 1.25Δ ≥ 0
10 − 0.5Δ ≥ 0      ⇒  −4 ≤ Δ ≤ 20
10 + 1.5Δ ≥ 0


⇒ 56 ≤ c1 ≤ 80 solution remains optimal.


If c1 < 56, or c1 > 80 current basis is not optimal.
Suppose c1 = 100(Δ = 40) What would you do?

6.8 Rhs ranges


Slide 30
Suppose the finishing hours change by Δ, becoming 20 + Δ. What happens?

     [48    ]   [1    2   −8] [48    ]   [24 + 2Δ ]
B⁻¹  [20 + Δ] = [0    2   −4] [20 + Δ] = [ 8 + 2Δ ]  ≥ 0
     [ 8    ]   [0 −0.5  1.5] [ 8    ]   [2 − 0.5Δ]

⇒ −4 ≤ Δ ≤ 4 current basis optimal Slide 31


Note that even if current basis is optimal, optimal solution variables change:

s1 = 24 + 2Δ
x3 = 8 + 2Δ
x1 = 2 − 0.5Δ
z = 60(2 − 0.5Δ) + 20(8 + 2Δ) = 280 + 10Δ
Slide 32
Suppose Δ = 10. Then

[s1]   [44]
[x3] = [28]   ← infeasible (x1 < 0): use the dual simplex method
[x1]   [−3]

6.9 New activity


Slide 33
Suppose the company has the opportunity to produce stools
Profit $15; requires 1 ft of wood, 1 finishing hour, 1 carpentry hour
Should the company produce stools?

max 60x1 +30x2 +20x3 +15x4

8x1 +6x2 +x3 +x4 +s1 = 48

4x1 +2x2 +1.5x3 +x4 +s2 = 20

2x1 +1.5x2 +0.5x3 +x4 +s3 = 8

xi ≥ 0

                                              [1]
c̄_4 = c_4 − c_B′B⁻¹A_4 = −15 − (0, −10, −10)  [1]  = 5 ≥ 0
                                              [1]
Current basis still optimal. Do not produce stools

15.093 Optimization Methods

Lecture 8: Robust Optimization


1 Papers
Slide 1
• B. and Sim, The Price of Robustness, Operations Research, 2003.
• B. and Sim, Robust Discrete optimization, Mathematical Programming,

2003.

2 Structure
Slide 2
• Motivation

• Data Uncertainty

• Robust Mixed Integer Optimization

• Robust 0-1 Optimization

3 Motivation
Slide 3
• The classical paradigm in optimization is to develop a model that assumes

that the input data is precisely known and equal to some nominal values.

This approach, however, does not take into account the influence of data

uncertainties on the quality and feasibility of the model.

• Can we design solution approaches that are immune to data uncertainty,

that is they are robust?

Slide 4
• Ben-Tal and Nemirovski (2000):
In real-world applications of Linear Optimization (the NETLIB library), one
cannot ignore the possibility that a small uncertainty in the data can make
the usual optimal solution completely meaningless from a practical viewpoint.

3.1 Literature
Slide 5
• Ellipsoidal uncertainty; Robust convex optimization Ben-Tal and Nemirovski

(1997), El-Ghaoui et. al (1996)

• Flexible adjustment of conservativism


• Nonlinear convex models
• Not extendable to discrete optimization

1
4 Goal
Slide 6
Develop an approach to address data uncertainty for optimization problems
that:
• It allows to control the degree of conservatism of the solution;
• It is computationally tractable both practically and theoretically.

5 Data Uncertainty
Slide 7
minimize c′ x
subject to Ax ≤ b
l≤x≤u
xi ∈ Z, i = 1, . . . , k,
WLOG data uncertainty affects only A and c, but not the vector b. Slide 8

• (Uncertainty for matrix A): aij , j ∈ Ji is independent, symmetric

and bounded random variable (but with unknown distribution) ãij , j ∈ Ji

that takes values in [aij − âij , aij + âij ].

• (Uncertainty for cost vector c): cj , j ∈ J0 takes values in [cj , cj + dj ].

6 Robust MIP
Slide 9
• Consider an integer Γi ∈ [0, |Ji |], i = 0, 1, . . . , m.
• Γi adjusts the robustness of the proposed method against the level of

conservativeness of the solution.

• Speaking intuitively, it is unlikely that all of the aij , j ∈ Ji will change.

We want to be protected against all cases that up to Γi of the aij ’s are

allowed to change.

Slide 10
• Nature will be restricted in its behavior, in that only a subset of the

coefficients will change in order to adversely affect the solution.

• We will guarantee that if nature behaves like this then the robust solution

will be feasible deterministically. Even if more than Γi change, then the

robust solution will be feasible with very high probability.

2
6.1 Problem
                                                                       Slide 11
minimize c′x + max_{S_0 ⊆ J_0, |S_0| ≤ Γ_0} Σ_{j∈S_0} d_j |x_j|

subject to Σ_j a_ij x_j + max_{S_i ⊆ J_i, |S_i| ≤ Γ_i} Σ_{j∈S_i} â_ij |x_j| ≤ b_i,  ∀i
           l ≤ x ≤ u
           x_i ∈ Z, ∀i = 1, …, k.

6.2 Theorem 1
Slide 12
The robust problem can be reformulated as an equivalent MIP:

minimize c′x + z_0 Γ_0 + Σ_{j∈J_0} p_0j
subject to Σ_j a_ij x_j + z_i Γ_i + Σ_{j∈J_i} p_ij ≤ b_i   ∀i
           z_0 + p_0j ≥ d_j y_j   ∀j ∈ J_0
           z_i + p_ij ≥ â_ij y_j   ∀i ≠ 0, j ∈ J_i
           p_ij, y_j, z_i ≥ 0   ∀i, j ∈ J_i
           −y_j ≤ x_j ≤ y_j   ∀j
           l_j ≤ x_j ≤ u_j   ∀j
           x_i ∈ Z   i = 1, …, k.

6.3 Proof
Slide 13
Given a vector x*, we define:

β_i(x*) = max_{S_i ⊆ J_i, |S_i| = Γ_i} Σ_{j∈S_i} â_ij |x*_j|.

This equals

β_i(x*) = max Σ_{j∈J_i} â_ij |x*_j| z_ij
          s.t. Σ_{j∈J_i} z_ij ≤ Γ_i
               0 ≤ z_ij ≤ 1, ∀j ∈ J_i.
Slide 14
Dual:
β_i(x*) = min Σ_{j∈J_i} p_ij + Γ_i z_i
          s.t. z_i + p_ij ≥ â_ij |x*_j|, ∀j ∈ J_i
               p_ij ≥ 0, ∀j ∈ J_i
               z_i ≥ 0.

3
|Ji | Γi

5 5

10 8.3565

100 24.263

200 33.899

Table 1: Choice of Γi as a function of |Ji | so that the probability of constraint


violation is less than 1%.

6.4 Size
Slide 15
• Original Problem has n variables and m constraints
• The robust counterpart has 2n + m + l variables, where l = Σ_{i=0}^m |J_i| is the
  number of uncertain coefficients, and 2n + m + l constraints.

6.5 Probabilistic Guarantee


6.5.1 Theorem 2
Slide 16
Let x* be an optimal solution of the robust MIP.
(a) If A is subject to the model of data uncertainty U:

Pr( Σ_j ã_ij x*_j > b_i ) ≤ (1/2ⁿ) [ (1 − µ) Σ_{l=⌊ν⌋}^{n} C(n, l) + µ Σ_{l=⌊ν⌋+1}^{n} C(n, l) ],

where n = |J_i|, ν = (Γ_i + n)/2 and µ = ν − ⌊ν⌋; the bound is tight.

(b) As n → ∞,

(1/2ⁿ) [ (1 − µ) Σ_{l=⌊ν⌋}^{n} C(n, l) + µ Σ_{l=⌊ν⌋+1}^{n} C(n, l) ] ∼ 1 − Φ( (Γ_i − 1)/√n ).

Slide 17
Slide 18

7 Experimental Results
7.1 Knapsack Problems
•                                                                      Slide 19
  maximize Σ_{i∈N} c_i x_i
  subject to Σ_{i∈N} w_i x_i ≤ b
             x ∈ {0, 1}ⁿ.

[Figure: the probability bound of Theorem 2 ("Bound 2") and its approximation versus Γ_i, plotted on a logarithmic scale.]

Γ Violation Probability Optimal Value Reduction


0 0.5 5592 0%
2.8 4.49 × 10−1 5585 0.13%
36.8 5.71 × 10−3 5506 1.54%
82.0 5.04 × 10−9 5408 3.29%
200 0 5283 5.50%

• w̃i are independently distributed and follow symmetric distributions in

[wi − δi , wi + δi ];

• c is not subject to data uncertainty.

7.1.1 Data
Slide 20
• |N | = 200, b = 4000,
• wi randomly chosen from {20, 21, . . . , 29}.
• ci randomly chosen from {16, 17, . . . , 77}.
• δi = 0.1wi .

5
7.1.2 Results
Slide 21

8 Robust 0-1 Optimization


Slide 22

• Nominal combinatorial optimization:

minimize c′ x
subject to x ∈ X ⊂ {0, 1}n .

• Robust Counterpart:
  Z* = minimize c′x + max_{S ⊆ J, |S| = Γ} Σ_{j∈S} d_j x_j
       subject to x ∈ X,

• WLOG d1 ≥ d2 ≥ . . . ≥ dn .

8.1 Remarks
Slide 23

• Examples: the shortest path, the minimum spanning tree, the minimum
assignment, the traveling salesman, the vehicle routing and matroid inter­
section problems.

• Other approaches to robustness are hard. Scenario based uncertainty:

minimize max(c′1 x, c′2 x)


subject to x ∈ X.

is NP-hard for the shortest path problem.

8.2 Approach
                                                                       Slide 24
Primal: Z* = min_{x∈X} [ c′x + max Σ_j d_j x_j u_j ]
                         s.t. 0 ≤ u_j ≤ 1, ∀j
                              Σ_j u_j ≤ Γ

Dual:   Z* = min_{x∈X} [ c′x + min ( θΓ + Σ_j y_j ) ]
                         s.t. y_j + θ ≥ d_j x_j, ∀j
                              y_j, θ ≥ 0

6
8.3 Algorithm A
Slide 25
• Solution: y_j = max(d_j x_j − θ, 0)
•   Z* = min_{x∈X, θ≥0} [ θΓ + Σ_j ( c_j x_j + max(d_j x_j − θ, 0) ) ]

• Since X ⊂ {0, 1}ⁿ,

    max(d_j x_j − θ, 0) = max(d_j − θ, 0) x_j

•   Z* = min_{x∈X, θ≥0} [ θΓ + Σ_j ( c_j + max(d_j − θ, 0) ) x_j ]
Slide 26
• d_1 ≥ d_2 ≥ … ≥ d_n ≥ d_{n+1} = 0.
• For d_l ≥ θ ≥ d_{l+1},

  min_{x∈X, d_l ≥ θ ≥ d_{l+1}} [ θΓ + Σ_{j=1}^n c_j x_j + Σ_{j=1}^l (d_j − θ) x_j ]
    = d_l Γ + min_{x∈X} [ Σ_{j=1}^n c_j x_j + Σ_{j=1}^l (d_j − d_l) x_j ] = Z_l

  Z* = min_{l=1,…,n+1} [ d_l Γ + min_{x∈X} ( Σ_{j=1}^n c_j x_j + Σ_{j=1}^l (d_j − d_l) x_j ) ].

8.4 Theorem 3
Slide 27
• Algorithm A correctly solves the robust 0-1 optimization problem.
• It requires at most |J| + 1 solutions of nominal problems. Thus, If the

nominal problem is polynomially time solvable, then the robust 0-1 coun­

terpart is also polynomially solvable.

• Robust minimum spanning tree, minimum assignment, minimum match­

ing, shortest path and matroid intersection, are polynomially solvable.

9 Experimental Results
9.1 Robust Sorting
                                                                       Slide 28
minimize Σ_{i∈N} c_i x_i
subject to Σ_{i∈N} x_i = k
           x ∈ {0, 1}ⁿ.

7
  Γ     Z̄(Γ)    % change in Z̄(Γ)    σ(Γ)    % change in σ(Γ)
  0     8822        0 %             501.0        0.0 %
 10     8827        0.056 %         493.1       -1.6 %
 20     8923        1.145 %         471.9       -5.8 %
 30     9059        2.686 %         454.3       -9.3 %
 40     9627        9.125 %         396.3      -20.9 %
 50    10049       13.91 %          371.6      -25.8 %
 60    10146       15.00 %          365.7      -27.0 %
 70    10355       17.38 %          352.9      -29.6 %
 80    10619       20.37 %          342.5      -31.6 %
100    10619       20.37 %          340.1      -32.1 %

Z*(Γ) = minimize c′x + max_{S ⊆ J, |S| = Γ} Σ_{j∈S} d_j x_j
        subject to Σ_{i∈N} x_i = k
                   x ∈ {0, 1}ⁿ.
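A sketch of Algorithm A specialized to this robust sorting problem: because the nominal problem is just "pick the k cheapest items", each of the n + 1 subproblems is solved by sorting. The data sizes and random costs below are illustrative assumptions, not the instance from the slides.

import numpy as np

rng = np.random.default_rng(0)
n, k, Gamma = 20, 10, 3
c = rng.uniform(50, 200, n)
d = rng.uniform(20, 200, n)

order = np.argsort(-d)                     # reorder so that d is nonincreasing
c, d = c[order], d[order]
d_ext = np.append(d, 0.0)                  # d_{n+1} = 0

best = np.inf
for l in range(n + 1):                     # threshold theta = d_l (and theta = 0)
    theta = d_ext[l]
    mod = c + np.maximum(d - theta, 0.0)   # modified costs c_j + (d_j - theta)^+
    val = theta * Gamma + np.sort(mod)[:k].sum()   # nominal problem: k cheapest items
    best = min(best, val)
print("robust optimal value:", best)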

9.1.1 Data
Slide 29
• |N | = 200;
• k = 100;
• cj ∼ U [50, 200]; dj ∼ U [20, 200];
• For testing robustness, generate instances such that each cost component

independently deviates with probability ρ = 0.2 from the nominal value

cj to cj + dj .

9.1.2 Results
Slide 30

15.093: Optimization Methods

Lecture 9: Large Scale Optimization


1 Outline
Slide 1
1. The idea of column generation
2. The cutting stock problem
3. Stochastic programming

2 Column Generation
Slide 2
• For x ∈ ℜn and n large consider the LOP:

min c′ x
s.t. Ax = b
x≥0

• Restricted problem
  min Σ_{i∈I} c_i x_i
  s.t. Σ_{i∈I} A_i x_i = b
       x ≥ 0

2.1 Two Key Ideas


Slide 3
• Generate columns Aj only as needed.
• Calculate mini ci efficiently without enumerating all columns.

3 The Cutting Stock Problem


Slide 4
• Company has a supply of large rolls of paper of width W .
• bi rolls of width wi , i = 1, . . . , m need to be produced.
• Example: w = 70 inches, can be cut in 3 rolls of width w1 = 17 and 1 roll

of width w2 = 15, waste:

70 − (3 × 17 + 1 × 15) = 4
Slide 5
• Given w1 , . . . , wm and W there are many cutting patterns: (3, 1) and (2, 2)

for example

3 × 17 + 1 × 15 ≤ 70
2 × 17 + 2 × 15 ≤ 70

• Pattern: a vector (a_1, …, a_m) of nonnegative integers with
  Σ_{i=1}^m a_i w_i ≤ W

3.1 Problem
Slide 6
• Given wi , bi , i = 1, . . . , m (bi : number of rolls of width wi demanded,

and W (width of large rolls):

• Find how to cut the large rolls in order to minimize the number of rolls

used.

3.2 Concrete Example


Slide 7
• What is the solution for W = 70, w1 = 21, w2 = 11, b1 = 40, b2 = 40?
• feasible patterns: (2, 2), (3, 0), (0, 6)
• Solution 1: (2, 2) : 20 patterns; 20 rolls used

• Solution 2: (3, 0) : 12, (0, 6) : 9, (2, 2) : 2 patterns: 23 rolls used


Slide 8
• W = 70, w1 = 20, w2 = 11, b1 = 12, b2 = 17
• Feasible patterns (a_1, a_2): (1,0), (2,0), (3,0), (0,1), (1,1), (2,1), (0,2),
  (1,2), (2,2), (0,3), (1,3), (0,4), (1,4), (0,5), (0,6)
• x_1, …, x_15 = number of rolls cut with each of the 15 patterns above, respectively

  min x_1 + ··· + x_15

  s.t. x_1 (1,0)′ + x_2 (2,0)′ + ··· + x_15 (0,6)′ = (12, 17)′
       x_1, …, x_15 ≥ 0
Slide 9
• Example: 2·(0,6)′ + 1·(0,5)′ + 4·(3,0)′ = (12, 17)′: 7 rolls used
           4·(0,4)′ + 1·(0,1)′ + 4·(3,0)′ = (12, 17)′: 9 rolls used
• Any ideas?

2
3.3 Formulation
Slide 10
Decision variables: xj = number of rolls cut by pattern j characterized by vector
Aj :
min Σ_{j=1}^n x_j

s.t. Σ_{j=1}^n A_j x_j = (b_1, …, b_m)′

     x_j ≥ 0 (integer)
Slide 11
• Huge number of variables.
• Can we apply column generation, that is generate the patterns Aj on the

fly?

3.4 Algorithm
Slide 12
Idea: Generate feasible patterns as needed.
1) Start with the initial "pure" patterns, the ith of which uses ⌊W/w_i⌋ rolls of
   width w_i and none of the other widths:
   (⌊W/w_1⌋, 0, 0, 0)′, (0, ⌊W/w_2⌋, 0, 0)′, (0, 0, ⌊W/w_3⌋, 0)′, (0, 0, 0, ⌊W/w_4⌋)′
   (shown here for m = 4).
Slide 13
2) Solve:
min x1 + · · · + xm

x1 A1 + · · · + xm Am = b

xi ≥ 0

Slide 14
3) Compute reduced costs

cj = 1 − p′ Aj for all patterns j


If cj ≥ 0 current set of patterns optimal
If cs < 0 ⇒ xs needs to enter basis
How are we going to compute reduced costs cj = 1 − p′ Aj for all j? (huge
number)

3
3.4.1 Key Idea
Slide 15
4) Solve

   z* = max Σ_{i=1}^m p_i a_i
   s.t. Σ_{i=1}^m w_i a_i ≤ W
        a_i ≥ 0, integer

This is the integer knapsack problem.
Slide 16
• If z ∗ ≤ 1 ⇒ 1 − p′ Aj > 0 ∀j ⇒ current solution optimal
• If z ∗ > 1 ⇒ ∃ s: 1 − p′ As < 0 ⇒ Variable xs becomes basic, i.e., a new

pattern As will enter the basis.

• Perform min-ratio test and update the basis.

3.5 Dynamic Programming


Slide 17
F (u) = max p1 a1 + · · · + pm am
s.t. w1 a1 + · · · + wm am ≤ u
ai ≥ 0, integer

• For u < w_min, F(u) = 0.


• For u ≥ wmin

F (u) = max {pi + F (u − wi )}

i=1,...,m

Why ?
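The recursion can be written directly as code; the sketch below assumes profits p, widths w and capacity W as plain Python lists and reproduces the value F(25) = 45 of the example that follows.

def knapsack_dp(p, w, W):
    """F[u] = max value of an integer (unbounded) knapsack with capacity u."""
    F = [0] * (W + 1)
    for u in range(min(w), W + 1):
        F[u] = max((p[i] + F[u - w[i]] for i in range(len(w)) if w[i] <= u),
                   default=0)
    return F

# Example from the next slide: max 11a1 + 7a2 + 5a3 + a4, 6a1 + 4a2 + 3a3 + a4 <= 25
F = knapsack_dp([11, 7, 5, 1], [6, 4, 3, 1], 25)
print(F[25])        # 45, attained by a = (4, 0, 0, 1)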

3.6 Example
Slide 18
max 11x1 + 7x2 + 5x3 + x4
s.t. 6x1 + 4x2 + 3x3 + x4 ≤ 25
xi ≥ 0, xi integer

F (0) = 0

F (1) = 1

F (2) = 1 + F (1) = 2

Slide 19
F (3) = max(5 + F (0)∗ , 1 + F (2)) = 5

F (4) = max(7 + F (0)∗ , 5 + F (1), 1 + F (3)) = 7

F (5) = max(7 + F (1)∗ , 5 + F (2), 1 + F (4)) = 8

F (6) = max(11 + F (0)∗ , 7 + F (2), 5 + F (3), 1 + F (5)) = 11

F (7) = max(11 + F (1)∗ , 7 + F (3), 5 + F (4), 1 + F (6)) = 12

F (8) = max(11 + F (2), 7 + F (4)∗ , 5 + F (5), 1 + F (7)) = 14

F (9) = 11 + F (3) = 16

F (10) = 11 + F (4) = 18

F (u) = 11 + F (u − 6) for u ≥ 11

4
⇒ F (25) = 11 + F (19) = 11 + 11 + F (13) = 11 + 11 + 11 + F (7) = 33 + 12 = 45
x∗ = (4, 0, 0, 1)

4 Stochastic Programming
4.1 Example
Slide 20
                               Wrenches   Pliers    Capacity
Steel (lbs)                       1.5       1.0      27,000
Molding machine (hrs)             1.0       1.0      21,000
Assembly machine (hrs)            0.3       0.5       9,000*           Slide 21
Demand limit (tools/day)       15,000    16,000
Contribution to earnings        $130*      $100
  ($/1000 units)
max 130W + 100P
s.t. W ≤ 15

P ≤ 16

1.5W + P ≤ 27

W + P ≤ 21

0.3W + 0.5P ≤ 9

W, P ≥ 0

4.1.1 Random data


Slide 22
• Assembly capacity is random: 8,000 with probability 1/2, 10,000 with probability 1/2.

• Contribution from wrenches: 160 with probability 1/2, 90 with probability 1/2.

4.1.2 Decisions
Slide 23
• Need to decide steel capacity in the current quarter. Cost 58$/1000lbs.
• Soon after, uncertainty will be resolved.
• Next quarter, company will decide production quantities.

4.1.3 Formulation
Slide 24

5
State Cap. W. contr. Prob.
1 8,000 160 0.25
2 10,000 160 0.25
3 8,000 90 0.25
4 10,000 90 0.25
Decision Variables: S: steel capacity,

Pi , Wi : i = 1, . . . , 4 production plan under state i. Slide 25

max −58S + 0.25Z1 + 0.25Z2 + 0.25Z3 + 0.25Z4


s.t.

Ass. 1 0.3W1 + 0.5P1 ≤ 8

Mol. 1 W1 + P1 ≤ 21

Ste. 1 −S + 1.5W1 + P1 ≤ 0
W.d. 1 W1 ≤ 15
P.d. 1 P1 ≤ 16

Obj. 1 −Z1 + 160W1 + 100P1 = 0

Slide 26
Ass. 2
0.3W2 + 0.5P2 ≤ 10
Mol. 2
W2 + P2 ≤ 21
Ste. 2
−S + 1.5W2 + P2 ≤ 0
W.d. 2
W2 ≤ 15
P.d. 2 P2 ≤ 16
Obj. 2 −Z2 + 160W2 + 100P2 = 0
Slide 27
Ass. 3
0.3W3 + 0.5P3 ≤ 8
Mol. 3
W3 + P3 ≤ 21
Ste. 3
−S + 1.5W3 + P3 ≤ 0
W.d. 3
W3 ≤ 15
P.d. 3 P3 ≤ 16
Obj. 3 −Z3 + 90W3 + 100P3 = 0
Slide 28
Ass. 4
0.3W4 + 0.5P4 ≤ 10
Mol. 4
W4 + P4 ≤ 21
Ste. 4
−S + 1.5W4 + P4 ≤ 0
W.d. 4
W4 ≤ 15
P.d. 4 P4 ≤ 16
Obj. 4 −Z4 + 90W4 + 100P4 = 0
S, Wi , Pi ≥ 0

4.1.4 Solution
Slide 29
Solution: S = 27, 250lb.
Wi Pi
1 15,000 4,750
2 15,000 4,750
3 12,500 8,500
4 5,000 16,000

6
4.2 Two-stage problems
Slide 30
• Random scenarios indexed by w = 1, . . . , k. Scenario w has probability

αw .

• First stage decisions: x: Ax = b, x ≥ 0.


• Second stage decisions: yw : w = 1, . . . , k.
• Constraints:

Bw x + Dw yw = dw , yw ≥ 0.

4.2.1 Formulation
Slide 31
min c′ x + α1 f1′ y1 + ··· + αk fk′ yk
Ax =b
B1 x + D 1 y1 = d1
B2 x + D 2 y2 = d2 Slide 32
     ⋮

Bk x + D k yk = dk

x, y1 , y2 , . . . , yk ≥ 0.

[Figure: the block-angular constraint structure of the two-stage problem, with columns for x, y_1, y_2, y_3, y_4 and the objective row.]

15.093: Optimization Methods

Lecture 12: Discrete Optimization


1 Today's Lecture
Slide 1
• Modeling with integer variables
• What is a good formulation?
• Theme: The Power of Formulations

2 Integer Optimization
2.1 Mixed IO
Slide 2
(MIO) max c′x + h′y
      s.t. Ax + By ≤ b
           x ∈ Zⁿ₊  (x ≥ 0, x integer)
           y ∈ Rᵐ₊  (y ≥ 0)

2.2 Pure IO
Slide 3
(IO) max c′x
     s.t. Ax ≤ b
          x ∈ Zⁿ₊
Important special case: Binary Optimization
(BO) max c′x
     s.t. Ax ≤ b
          x ∈ {0, 1}ⁿ

2.3 LO
Slide 4
(LO) max c′y
     s.t. By ≤ b
          y ∈ Rⁿ₊

3 Modeling with Binary Variables


3.1 Binary Choice
                                                                       Slide 5
x = { 1, if the event occurs
    { 0, otherwise
Example 1: IO formulation of the knapsack problem
n projects, total budget b

a_j: cost of project j

c_j: value of project j
Slide 6
x_j = { 1, if project j is selected
      { 0, otherwise

max Σ_{j=1}^n c_j x_j

s.t. Σ_{j=1}^n a_j x_j ≤ b
     x_j ∈ {0, 1}

3.2 Modeling relations


Slide 7
• At most one event occurs:
  Σ_j x_j ≤ 1

• Either neither or both events occur:

  x_2 − x_1 = 0

• If one event occurs, then another occurs:

  0 ≤ x_2 ≤ x_1

• If x = 0, then y = 0; if x = 1, then y is unconstrained (up to U):

  0 ≤ y ≤ U·x,  x ∈ {0, 1}

3.3 The assignment problem


Slide 8
n people

m jobs

c_ij: cost of assigning person j to job i.

x_ij = { 1, if person j is assigned to job i
       { 0, otherwise

min Σ_{i,j} c_ij x_ij

s.t. Σ_{j=1}^n x_ij = 1,  each job i is assigned
     Σ_{i=1}^m x_ij ≤ 1,  each person j does at most one job
     x_ij ∈ {0, 1}

3.4 Multiple optimal solutions


Slide 9
• Generate all optimal solutions to a BOP.

max c� x
s.t. Ax ≤ b
x ∈ {0, 1}n

• x∗ optimal solution: I0 = {j : x∗j = 0}, I1 = {j : x∗j = 1}.

2
• Add the constraint
  Σ_{j∈I_0} x_j + Σ_{j∈I_1} (1 − x_j) ≥ 1.

• Generate third best?

• Extensions to MIO?

4 What is a good formulation?


4.1 Facility Location
Slide 10
• Data

N = {1 . . . n} potential facility locations

I = {1 . . . m} set of clients

cj : cost of facility placed at j

hij : cost of satisfying client i from facility j.

• Decision variables
x_j = { 1, if a facility is placed at location j
      { 0, otherwise
y_ij = fraction of the demand of client i satisfied by facility j.
Slide 11

IZ_1 = min Σ_{j=1}^n c_j x_j + Σ_{i=1}^m Σ_{j=1}^n h_ij y_ij
       s.t. Σ_{j=1}^n y_ij = 1
            y_ij ≤ x_j
            x_j ∈ {0, 1}, 0 ≤ y_ij ≤ 1.
Slide 12
Consider an alternative formulation.

IZ_2 = min Σ_{j=1}^n c_j x_j + Σ_{i=1}^m Σ_{j=1}^n h_ij y_ij
       s.t. Σ_{j=1}^n y_ij = 1
            Σ_{i=1}^m y_ij ≤ m·x_j
            x_j ∈ {0, 1}, 0 ≤ y_ij ≤ 1.

Are both valid?

Which one is preferable?

3
4.2 Observations
Slide 13
• IZ1 = IZ2 , since the integer points both formulations define are the same.

P_1 = {(x, y) : Σ_{j=1}^n y_ij = 1, y_ij ≤ x_j, 0 ≤ x_j ≤ 1, 0 ≤ y_ij ≤ 1}

P_2 = {(x, y) : Σ_{j=1}^n y_ij = 1, Σ_{i=1}^m y_ij ≤ m·x_j, 0 ≤ x_j ≤ 1, 0 ≤ y_ij ≤ 1}
Slide 14
• Let
  Z_1 = min{c′x + h′y : (x, y) ∈ P_1},  Z_2 = min{c′x + h′y : (x, y) ∈ P_2}

• Z_2 ≤ Z_1 ≤ IZ_1 = IZ_2

4.3 Implications
Slide 15
• Finding IZ1 (= IZ2 ) is difficult.
• Solving to find Z1 , Z2 is a LOP. Since Z1 is closer to IZ1 several methods

(branch and bound) would work better (actually much better).

• Suppose that if we solve min cx + hy, (x, y) ∈ P1 we find an integral

solution. Have we solved the facility location problem?

Slide 16

• Formulation 1 is better than Formulation 2. (Despite the fact that 1 has

a larger number of constraints than 2.)

• What is then the criterion?
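To see the relaxation bounds Z_2 ≤ Z_1 numerically, here is an illustrative sketch that builds both LP relaxations for a tiny random instance and solves them with scipy.optimize.linprog (the instance itself is a made-up assumption):

import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, n = 6, 3                                    # clients, candidate facilities
cfix = rng.uniform(5, 10, n)                   # c_j
h = rng.uniform(1, 4, (m, n))                  # h_ij
nv = n + m * n                                 # variables: x (n), then y row-major

obj = np.concatenate([cfix, h.flatten()])
A_eq = np.zeros((m, nv)); b_eq = np.ones(m)
for i in range(m):
    A_eq[i, n + i * n: n + (i + 1) * n] = 1.0  # sum_j y_ij = 1

def solve(A_ub, b_ub):
    r = linprog(obj, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=(0, 1))
    return r.fun

# strong relaxation P1: y_ij <= x_j
A1 = np.zeros((m * n, nv))
for i in range(m):
    for j in range(n):
        A1[i * n + j, n + i * n + j] = 1.0
        A1[i * n + j, j] = -1.0
# weak relaxation P2: sum_i y_ij <= m * x_j
A2 = np.zeros((n, nv))
for j in range(n):
    A2[j, j] = -float(m)
    A2[j, n + j::n] = 1.0                      # y_1j, y_2j, ..., y_mj

print(solve(A1, np.zeros(m * n)), ">=", solve(A2, np.zeros(n)))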

4.4 Ideal Formulations


Slide 17
• Let P be a linear relaxation for a problem
• Let

H = {(x, y) : x ∈ {0, 1}n} ∩ P

• Consider the convex hull of H:

  CH(H) = {x : x = Σ_i λ_i x^i, Σ_i λ_i = 1, λ_i ≥ 0, x^i ∈ H}

Slide 18

4
• The extreme points of CH(H) have {0, 1} coordinates.
• So, if we know CH(H) explicitly, then by solving min cx + hy, (x, y) ∈

CH(H) we solve the problem.

• Message: Quality of formulation is judged by closeness to CH(H).

CH(H) ⊆ P1 ⊆ P2

5 Minimum Spanning
Tree (MST)
Slide 19
• How do telephone companies bill you?
• It used to be that rate/minute: Boston → LA proportional to distance in

MST

• Other applications: Telecommunications, Transportation (good lower bound

for TSP)

Slide 20
• Given a graph G = (V, E) undirected and Costs ce , e ∈ E.
• Find a tree of minimum cost spanning all the nodes.

• Decision variables: x_e = { 1, if edge e is included in the tree
                            { 0, otherwise
Slide 21
• The tree should be connected. How can you model this requirement?
• Let S be a set of vertices. Then S and V \ S should be connected.

• Let δ(S) = {e = (i, j) ∈ E : i ∈ S, j ∈ V \ S}.
• Then,
  Σ_{e∈δ(S)} x_e ≥ 1

• What is the number of edges in a tree?

• Then, Σ_{e∈E} x_e = n − 1

5
5.1 Formulation
                                                                        Slide 22
    IZ_MST = min ∑_{e∈E} c_e x_e

    H:  ∑_{e∈δ(S)} x_e ≥ 1,   ∀ S ⊆ V, S ≠ ∅, V
        ∑_{e∈E} x_e = n − 1
        x_e ∈ {0, 1}.
Is this a good formulation? Slide 23

    P_cut = {x ∈ R^|E| : 0 ≤ x_e ≤ 1,
             ∑_{e∈E} x_e = n − 1,
             ∑_{e∈δ(S)} x_e ≥ 1   ∀ S ⊆ V, S ≠ ∅, V}

Is Pcut the CH(H)?

5.2 What is CH(H)?


Slide 24
Let
    P_sub = {x ∈ R^|E| : ∑_{e∈E} x_e = n − 1,
             ∑_{e∈E(S)} x_e ≤ |S| − 1   ∀ S ⊆ V, S ≠ ∅, V}

    E(S) = {e = (i, j) : i ∈ S, j ∈ S}
Why is this a valid IO formulation? Slide 25

• Theorem: Psub = CH(H).


• ⇒ Psub is the best possible formulation.
• MESSAGE: Good formulations can have an exponential number of con­

straints.

6 The Traveling Salesman


Problem
Slide 26
Given G = (V, E) an undirected graph. V = {1, . . . , n}, costs ce ∀ e ∈ E. Find
a tour that minimizes total length.

6
6.1 Formulation I
                                                                        Slide 27
x_e = 1 if edge e is included in the tour, 0 otherwise.

    min  ∑_{e∈E} c_e x_e
    s.t. ∑_{e∈δ(S)} x_e ≥ 2,   S ⊂ V, S ≠ ∅, V
         ∑_{e∈δ(i)} x_e = 2,   i ∈ V
         x_e ∈ {0, 1}

6.2 Formulation II
                                                                        Slide 28
    min  ∑_{e∈E} c_e x_e
    s.t. ∑_{e∈E(S)} x_e ≤ |S| − 1,   S ⊂ V, S ≠ ∅, V
         ∑_{e∈δ(i)} x_e = 2,   i ∈ V
         x_e ∈ {0, 1}
Slide 29
    P_cut^TSP = {x ∈ R^|E| : ∑_{e∈δ(S)} x_e ≥ 2,  ∑_{e∈δ(i)} x_e = 2,  0 ≤ x_e ≤ 1}

    P_sub^TSP = {x ∈ R^|E| : ∑_{e∈δ(i)} x_e = 2,  ∑_{e∈E(S)} x_e ≤ |S| − 1,  0 ≤ x_e ≤ 1}
Slide 30
• Theorem: P_cut^TSP = P_sub^TSP ⊋ CH(H)
• Nobody knows CH(H) for the TSP

7 Minimum Matching
Slide 31
• Given G = (V, E); ce costs on e ∈ E. Find a matching of minimum cost.
• Formulation:
      min  ∑_{e∈E} c_e x_e
      s.t. ∑_{e∈δ(i)} x_e = 1,   i ∈ V
           x_e ∈ {0, 1}

• Is the linear relaxation CH(H)?


Slide 32

7
Let
    P_MAT = {x ∈ R^|E| : ∑_{e∈δ(i)} x_e = 1,
             ∑_{e∈δ(S)} x_e ≥ 1   for |S| = 2k + 1, S ≠ ∅,
             x_e ≥ 0}

Theorem: PMAT = CH(H)

8 Observations
Slide 33
• For MST and Matching there are efficient algorithms; CH(H) is known.
• For the TSP no efficient algorithm is known; the TSP is an NP-hard problem, and CH(H)
  is not known.
• Conjecture: the convex hulls of problems that are polynomially solvable are
  explicitly known.

9 Summary
Slide 34
1. An IO formulation is better than another one if the polyhedra of their

linear relaxations are closer to the convex hull of the IO.

2. A good formulation may have an exponential number of constraints.


3. Conjecture: Formulations characterize the complexity of problems. If a

problem is solvable in polynomial time, then the convex hull of solutions

is known.

8
MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093: Optimization Methods

Lecture 13: Exact Methods for IP

1 Outline Slide 1
� Cutting plane methods
� Branch and bound methods

2 Cutting plane methods


Slide 2
    min  c'x
    s.t. Ax ≥ b
         x ≥ 0
         x integer

LP relaxation:
    min  c'x
    s.t. Ax ≥ b
         x ≥ 0.

2.1 Algorithm Slide 3


� Solve the LP relaxation. Let x� be an optimal solution.
� If x� is integer stop; x� is an optimal solution to IP.
� If not, add a linear inequality constraint to LP relaxation that all integer
solutions satisfy, but x� does not; go to Step 1.
2.2 Example Slide 4
• Let x* be an optimal BFS to the LP relaxation with at least one fractional
  basic variable.
• N: set of indices of the nonbasic variables.
• Is this a valid cut?
      ∑_{j∈N} x_j ≥ 1.

2.3 The Gomory cutting


plane algorithm Slide 5
• Let x* be an optimal BFS and B an optimal basis.

      x_B + B⁻¹ A_N x_N = B⁻¹ b.

• With a_ij = (B⁻¹ A_j)_i and a_i0 = (B⁻¹ b)_i:

      x_i + ∑_{j∈N} a_ij x_j = a_i0.

• Since x_j ≥ 0 for all j,

      x_i + ∑_{j∈N} ⌊a_ij⌋ x_j ≤ x_i + ∑_{j∈N} a_ij x_j = a_i0.

• Since the x_j are integer,

      x_i + ∑_{j∈N} ⌊a_ij⌋ x_j ≤ ⌊a_i0⌋.

• Valid cut
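A minimal sketch (not from the lecture) of the rounding step above: given one tableau
row x_i + ∑_{j∈N} a_ij x_j = a_i0 with fractional right-hand side, floor the coefficients
and the right-hand side to obtain a Gomory cut. The row data below are the first cut of
the example in the next subsection.

import math

def gomory_cut(row_coeffs, rhs):
    """Given a tableau row  x_i + sum_j a_ij x_j = a_i0  with fractional rhs,
    return the cut  x_i + sum_j floor(a_ij) x_j <= floor(a_i0)."""
    cut_coeffs = {j: math.floor(a) for j, a in row_coeffs.items()}
    return cut_coeffs, math.floor(rhs)

# Row from the example: x2 + (1/10) x3 + (1/10) x4 = 25/10
coeffs, rhs = gomory_cut({3: 0.1, 4: 0.1}, 2.5)
print(coeffs, rhs)   # {3: 0, 4: 0} and 2, i.e. the cut x2 <= 2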
2.4 Example Slide 6
    min  x_1 − 2x_2
    s.t. −4x_1 + 6x_2 ≤ 9
          x_1 + x_2 ≤ 4
          x_1, x_2 ≥ 0
          x_1, x_2 integer.

We transform the problem to standard form:
    min  x_1 − 2x_2
    s.t. −4x_1 + 6x_2 + x_3 = 9
          x_1 + x_2 + x_4 = 4
          x_1, …, x_4 ≥ 0
          x_1, …, x_4 integer.

LP relaxation: x^1 = (15/10, 25/10).                                     Slide 7

      x_2 + (1/10) x_3 + (1/10) x_4 = 25/10.

• Gomory cut:  x_2 ≤ 2.
• Add the constraint x_2 + x_5 = 2, x_5 ≥ 0.
• New optimum x^2 = (3/4, 2).
• One of the equations in the optimal tableau is
      x_1 − (1/4) x_3 + (6/4) x_5 = 3/4.
• New Gomory cut:  x_1 − x_3 + x_5 ≤ 0.
• New optimal solution: x^3 = (1, 2).
Slide 8

[Figure: the LP feasible region in the (x1, x2) plane, the iterates x^1, x^2, x^3,
and the added cuts x2 ≤ 2 and −3x1 + 5x2 ≤ 7.]

3 Branch and bound                                                       Slide 9

1. Branching: Select an active subproblem F_i.
2. Pruning: If the subproblem is infeasible, delete it.
3. Bounding: Otherwise, compute a lower bound b(F_i) for the subproblem.
4. Pruning: If b(F_i) ≥ U, the current best upper bound, delete the subproblem.
5. Partitioning: If b(F_i) < U, either obtain an optimal solution to the subproblem
   (stop), or break the corresponding problem into further subproblems, which are
   added to the list of active subproblems.

3.1 LP Based Slide 10


• Compute the lower bound b(F) by solving the LP relaxation of the discrete
  optimization problem.
• From the LP solution x*, if there is a component x*_i which is fractional,
  we create two subproblems by adding either one of the constraints
      x_i ≤ ⌊x*_i⌋   or   x_i ≥ ⌈x*_i⌉.
  Note that both constraints are violated by x*.
• If there are more than 2 fractional components, we use selection rules like
  maximum infeasibility to determine the inequality to be added to the problem.
• Select the active subproblem using either depth-first or breadth-first search
  strategies.
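A minimal LP-based branch-and-bound sketch (not from the lecture). Because the example
that follows has a single knapsack constraint, the LP relaxation of each subproblem can
be solved by the greedy fractional fill, which stands in for an LP solver here; the data
are those of the example in Section 3.2.

def lp_bound(c, w, cap, fixed):
    """LP bound for max c'x, w'x <= cap, 0 <= x <= 1, with some x_j fixed to 0/1.
    With one knapsack row the LP optimum is the greedy fractional fill."""
    val, room, free = 0.0, cap, []
    for j in range(len(c)):
        if j in fixed:
            if fixed[j] == 1:
                val += c[j]; room -= w[j]
        else:
            free.append(j)
    if room < 0:
        return None, None                      # infeasible subproblem
    x = {j: float(v) for j, v in fixed.items()}
    for j in sorted(free, key=lambda j: c[j] / w[j], reverse=True):
        take = min(1.0, room / w[j]) if w[j] > 0 else 1.0
        x[j] = take; val += c[j] * take; room -= w[j] * take
    return val, x

def branch_and_bound(c, w, cap):
    best_val, best_x = float("-inf"), None
    stack = [dict()]                           # active subproblems = partial fixings
    while stack:
        fixed = stack.pop()
        bound, x = lp_bound(c, w, cap, fixed)
        if bound is None or bound <= best_val: # prune: infeasible or dominated
            continue
        frac = [j for j in x if abs(x[j] - round(x[j])) > 1e-9]
        if not frac:                           # integral LP solution: new incumbent
            best_val, best_x = bound, x
            continue
        j = frac[0]                            # branch on a fractional variable
        stack.append({**fixed, j: 0})
        stack.append({**fixed, j: 1})
    return best_val, best_x

print(branch_and_bound([12, 8, 7, 6], [8, 6, 5, 4], 15))   # value 21 at x = (0,1,1,1)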
3.2 Example Slide 11
    max  12x_1 + 8x_2 + 7x_3 + 6x_4
    s.t. 8x_1 + 6x_2 + 5x_3 + 4x_4 ≤ 15
         x_1, x_2, x_3, x_4 binary.
[Branch-and-bound tree, first levels: the root LP has value 22.2 at x = (1, 0, 0.6, 1).
Branching on x3 gives two nodes, both with LP value 22: x = (1, 0.5, 0, 1) for x3 = 0
and x = (1, 0, 1, 0.5) for x3 = 1. Branching the x3 = 1 node on x4 gives LP value 21.66
at x = (1, 0.33, 1, 0) for x4 = 0 and LP value 22 at x = (0.75, 0, 1, 1) for x4 = 1.]

LP relaxation                                                           Slide 12
    max  12x_1 + 8x_2 + 7x_3 + 6x_4
    s.t. 8x_1 + 6x_2 + 5x_3 + 4x_4 ≤ 15
         x_1 ≤ 1, x_2 ≤ 1, x_3 ≤ 1, x_4 ≤ 1
         x_1, x_2, x_3, x_4 ≥ 0
LP solution: x_1 = 1, x_2 = 0, x_3 = 0.6, x_4 = 1;  profit = 22.2
3.2.1 Branch and bound tree Slide 13
3.3 Pigeonhole Problem Slide 14
Slide 15
� There are n + 1 pigeons with n holes. We want to place the pigeons in the Slide 16
holes in such a way that no two pigeons go into the same hole.
� Let xij � 1 if pigeon i goes into hole j , 0 otherwise.
                                                                        Slide 17
• Formulation 1:
      ∑_j x_ij = 1,   i = 1, …, n + 1
      x_ij + x_kj ≤ 1,   ∀ j, i ≠ k

4
[Branch-and-bound tree, continued: branching the x4 = 1 node (LP value 22) on x1 gives
an integer solution with value 21 at x = (0, 1, 1, 1) for x1 = 0, and an infeasible
subproblem for x1 = 1.]

• Formulation 2:
      ∑_j x_ij = 1,   i = 1, …, n + 1
      ∑_{i=1}^{n+1} x_ij ≤ 1,   ∀ j

Which formulation is better for the problem?                            Slide 18

• The pigeonhole problem is infeasible.
• For Formulation 1, x_ij = 1/n for all i, j is feasible for the LP relaxation.
  O(n³) constraints. Nearly complete enumeration is needed for LP-based branch and
  bound, since the problem remains LP-feasible after fixing many variables.
• Formulation 2: the LP relaxation is infeasible. O(n) constraints.
• Message: Formulation of the problem is important!
3.4 Preprocessing Slide 19
• An effective way of improving integer programming formulations prior to and
  during branch-and-bound.
• Logical tests
  – Removal of empty (all-zero) rows and columns;
  – Removal of rows dominated by multiples of other rows;
  – Strengthening the bounds within rows by comparing individual variables
    and coefficients to the right-hand side;
  – Additional strengthening may be possible for integral variables using rounding.
• Probing: temporarily set a 0-1 variable to 0 or 1 and redo the logical tests, to
  force logical connections between variables. For example, if 5x + 4y + z ≤ 8,
  x, y, z ∈ {0, 1}, then setting x = 1 yields y = 0. This leads to the inequality
  x + y ≤ 1.

5

4 Application
4.1 Directed TSP
4.1.1 Assignment Lower Bound
Slide 20
Given a directed graph G = (N, A) with n nodes, and a cost c_ij for every arc,
find a tour (a directed cycle that visits all nodes) of minimum cost.

    min  ∑_{i=1}^n ∑_{j=1}^n c_ij x_ij
    s.t. ∑_{i=1}^n x_ij = 1,   j = 1, …, n
         ∑_{j=1}^n x_ij = 1,   i = 1, …, n
         x_ij ∈ {0, 1}.

Slide 21
Branching:Set one of the arcs selected in the optimal solution to zero. i.e., add
constraints of the type \xij � 0" to exclude the current optimal solution.
4.2 Improving BB Slide 22
� Better LP solver
� Use problem structure to derive better branching strategy
� Better choice of lower bound b(F ) - better relaxation
� Better choice of upper bound U - heuristic to get good solution
� KEY: Start pruning the search tree as early as possible

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

1
15.093: Optimization Methods

Lecture 14: Lagrangean Methods


1 Outline
Slide 1

� The Lagrangean dual


� The strength of the Lagrangean dual
� Solution of the Lagrangean dual

2 The Lagrangean dual


Slide 2
• Consider
      Z_IP = min  c'x
             s.t. Ax ≥ b
                  Dx ≥ d
                  x integer
• X = {x integer | Dx ≥ d}
• Optimizing over X can be done efficiently
2.1 Formulation Slide 3
• Consider
      Z(λ) = min  c'x + λ'(b − Ax)        (D)
             s.t. x ∈ X
• For fixed λ, the problem can be solved efficiently
• Z(λ) = min_{i=1,…,m} [ c'x^i + λ'(b − Ax^i) ]
• Z(λ) is concave and piecewise linear
2.2 Weak Duality Slide 4
If problem (D) has an optimal solution and if λ ≥ 0, then Z(λ) ≤ Z_IP.

• Proof: Let x* be an optimal solution of the integer program (so Ax* ≥ b, x* ∈ X).
• Then b − Ax* ≤ 0 and, therefore,
      c'x* + λ'(b − Ax*) ≤ c'x* = Z_IP
• Since x* ∈ X, Z(λ) ≤ c'x* + λ'(b − Ax*), and thus Z(λ) ≤ Z_IP.

[Figure: Z(p), concave and piecewise linear, maximized at p* with value Z_D.]

2.3 Key problem Slide 5


• Consider the Lagrangean dual:
      Z_D = max  Z(λ)
            s.t. λ ≥ 0
• Z_D ≤ Z_IP
• We need to maximize a piecewise linear concave function
Slide 6

3 Strength of LD
3.1 Main Theorem Slide 7
X = {x integer | Dx ≥ d}. Note that CH(X) is a polyhedron. Then
      Z_D = min  c'x
            s.t. Ax ≥ b
                 x ∈ CH(X)

3.2 Example Slide 8


    min  3x_1 − x_2
    s.t. x_1 − x_2 ≥ −1
         −x_1 + 2x_2 ≤ 5
         3x_1 + 2x_2 ≥ 3
         6x_1 + x_2 ≤ 15
         x_1, x_2 ≥ 0 integer

[Figure: CH(X), the points x_D, x_IP, x_LP and the cost vector c.]

Relax x_1 − x_2 ≥ −1; X involves the remaining constraints:

    X = {(1,0), (2,0), (1,1), (2,1), (0,2), (1,2), (2,2), (1,3), (2,3)}.

Slide 9
For p ≥ 0, we have                                                      Slide 10

    Z(p) = min_{(x_1, x_2) ∈ X} [ 3x_1 − x_2 + p(−1 − x_1 + x_2) ]

           ⎧ −2 + p,   0 ≤ p ≤ 5/3,
    Z(p) = ⎨ 3 − 2p,   5/3 ≤ p ≤ 3,
           ⎩ 6 − 3p,   p ≥ 3.

p* = 5/3, and Z_D = Z(5/3) = −1/3.                                      Slide 11
                                                                        Slide 12
• x_D = (1/3, 4/3),  Z_D = −1/3
• x_LP = (1/5, 6/5),  Z_LP = −3/5
• x_IP = (1, 2),  Z_IP = 1
• Z_LP < Z_D < Z_IP
Slide 13

• In general, Z_LP ≤ Z_D ≤ Z_IP
• For c'x = 3x_1 − x_2, we have Z_LP < Z_D < Z_IP.
• For c'x = −x_1 + x_2, we have Z_LP < Z_D = Z_IP.
• For c'x = −x_1 − x_2, we have Z_LP = Z_D = Z_IP.
• It is also possible that Z_LP = Z_D < Z_IP, but not in this example.


3.3 LP and LD Slide 14


• Z_IP = Z_D for all cost vectors c, if and only if
      CH( X ∩ {x | Ax ≥ b} ) = CH(X) ∩ {x | Ax ≥ b}
• We have Z_LP = Z_D for all cost vectors c, if
      CH(X) = {x | Dx ≥ d}
• If {x | Dx ≥ d} has integer extreme points, then CH(X) = {x | Dx ≥ d},
  and therefore Z_D = Z_LP

4 Solution of LD
� � Slide 15
• Z(λ) = min_{i=1,…,m} [ c'x^i + λ'(b − Ax^i) ], i.e.,
      Z(λ) = min_{i=1,…,m} [ h_i + f_i'λ ].
• Motivation: the classical steepest ascent method for maximizing Z(λ):
      λ^{t+1} = λ^t + θ_t ∇Z(λ^t),   t = 1, 2, …
• Problem: Z(λ) is not differentiable

[Figures: a concave function f lying below the supporting line f(x*) + s'(x − x*);
the piecewise linear function Z(p) with a subgradient s at a breakpoint.]

4.1 Subgradients Slide 16


• A function f : ℝ^n → ℝ is concave if and only if for any x* ∈ ℝ^n there
  exists a vector s ∈ ℝ^n such that
      f(x) ≤ f(x*) + s'(x − x*)
  for all x ∈ ℝ^n.
• Let f be a concave function. A vector s such that
      f(x) ≤ f(x*) + s'(x − x*)
  for all x ∈ ℝ^n is called a subgradient of f at x*.


Slide 17
Slide 18
4.2 Subgradient Algorithm                                               Slide 19
1. Choose a starting point λ^1; let t = 1.
2. Given λ^t, choose a subgradient s^t of the function Z(λ) at λ^t. If s^t = 0,
   then λ^t is optimal and the algorithm terminates. Else, continue.
3. Let λ^{t+1} = λ^t + θ_t s^t, where θ_t is a positive stepsize parameter.
   Increment t and go to Step 2.
3a. If λ ≥ 0 is required, set p^{t+1}_j = max{ p^t_j + θ_t s^t_j, 0 } for all j.

Iterations of the subgradient algorithm on the example of Section 3.2:

     t     p^t      s^t     Z(p^t)
     1     5.00     −3      −9.00
     2     2.60     −3      −1.80
     3     0.68      1      −1.32
     4     1.19      1      −0.81
     5     1.60      1      −0.40
     6     1.92     −2      −0.84
     7     1.40      1      −0.60
     8     1.61      1      −0.39
     9     1.78     −2      −0.56
    10     1.51      1      −0.49
4.2.1 Step sizes
                                                                        Slide 20
• Z(p^t) converges to the unconstrained maximum of Z(λ) for any stepsize
  sequence θ_t such that
      ∑_{t=1}^∞ θ_t = ∞   and   lim_{t→∞} θ_t = 0.
• Examples: θ_t = 1/t
• θ_t = θ_0 α^t, t = 1, 2, …
• θ_t = [ Ẑ_D − Z(p^t) ] α_t / ||s^t||², where α_t satisfies 0 < α_t < 1 and Ẑ_D is
  an estimate of the optimal value Z_D.
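A minimal sketch (not from the lecture) of the projected subgradient method applied to
the Lagrangean dual of the example above: Z(p) is evaluated by enumerating the finite
set X, the minimizer's constraint violation gives a subgradient, and the stepsize is
θ_t = 1/t with projection onto p ≥ 0.

# X from the example: the integer points satisfying the kept constraints
X = [(1,0),(2,0),(1,1),(2,1),(0,2),(1,2),(2,2),(1,3),(2,3)]

def Z_and_subgradient(p):
    """Z(p) = min_{x in X} 3x1 - x2 + p(-1 - x1 + x2); the minimizer's
    violation  -1 - x1 + x2  is a subgradient of Z at p."""
    best = min(X, key=lambda x: 3*x[0] - x[1] + p*(-1 - x[0] + x[1]))
    s = -1 - best[0] + best[1]
    return 3*best[0] - best[1] + p*s, s

p = 5.0                         # starting multiplier, as in the table above
for t in range(1, 200):
    val, s = Z_and_subgradient(p)
    p = max(p + s / t, 0.0)     # projected step with theta_t = 1/t
print(round(p, 3), round(Z_and_subgradient(p)[0], 3))  # approaches p* = 5/3, Z_D = -1/3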


4.3 Example Slide 21
Recall p� � 5�3 � 1:66 and ZD � �1�3 � �0:33. Apply subgradient optimiza
tion:

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

1
15.093: Optimization Methods

Lecture 15: Heuristic Methods

1 Outline Slide 1
� Approximation algorithms
� Local search methods
� Simulated annealing

2 Approximation algorithms
Slide 2
• Algorithm H is an ε-approximation algorithm for a minimization problem
  with optimal cost Z*, if H runs in polynomial time and returns a feasible
  solution with cost Z_H:
      Z_H ≤ (1 + ε) Z*
• For a maximization problem:
      Z_H ≥ (1 − ε) Z*
2.1 TSP
2.1.1 MST-heuristic Slide 3
• Triangle inequality:  c_ij ≤ c_ik + c_kj,  ∀ i, k, j
• Find a minimum spanning tree, with cost Z_T
• Construct a closed walk that starts at some node, visits all nodes, returns
  to the original node, and never uses an arc outside the minimum spanning tree
• Each arc of the spanning tree is used exactly twice                   Slide 4
• Total cost of this walk is 2 Z_T
• Because of the triangle inequality, Z_H ≤ 2 Z_T
• But Z_T ≤ Z*, hence
      Z_H ≤ 2 Z_T ≤ 2 Z*
  a 1-approximation algorithm

1
2.1.2 Matching heuristic
Slide 5
• Find a minimum spanning tree. Let Z_T be its cost
• Find the set of odd-degree nodes. There is an even number of them. Why?
• Find the minimum matching among those nodes, with cost Z_M
• Adding the spanning tree and the minimum matching creates an Eulerian graph,
  i.e., each node has even degree. Construct a closed walk
• Performance:
      Z_H ≤ Z_T + Z_M ≤ Z* + (1/2) Z* = (3/2) Z*
Slide 6

3 Local search methods Slide 7


� Local Search: replaces current solution with a better solution by slight
modi�cation (searching in some neighbourhood) until a local optimal so
lution is obtained
� Recall the Simplex method
3.1 TSP-2OPT Slide 8
• Two tours are neighbours if one can be obtained from the other by removing
  two edges and introducing two new edges
• Each tour has O(n²) neighbours. Search for a better solution in this
  neighbourhood (a sketch follows below).
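A minimal 2-OPT sketch (not from the lecture) on random Euclidean points: each move
reverses a segment of the tour, which replaces two edges by two new ones, and the
search stops at a 2-OPT local optimum.

import math, random

def tour_length(points, tour):
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))

def two_opt(points, tour):
    """Apply improving 2-OPT moves until none exists."""
    improved = True
    while improved:
        improved = False
        n = len(tour)
        for i in range(n - 1):
            for j in range(i + 2, n if i > 0 else n - 1):
                new_tour = tour[:i + 1] + tour[i + 1:j + 1][::-1] + tour[j + 1:]
                if tour_length(points, new_tour) < tour_length(points, tour) - 1e-12:
                    tour, improved = new_tour, True
    return tour

random.seed(0)
pts = [(random.random(), random.random()) for _ in range(30)]
start = list(range(30))
print(tour_length(pts, start), tour_length(pts, two_opt(pts, start)))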
Slide 9
• Performance of 2-OPT on random Euclidean instances:

    Size N      100    1000    10000    100000    1000000
    Matching    9.5     9.7      9.9       9.9          -
    2-OPT       4.5     4.9      5.0       4.9        4.9

3.2 Extensions
4 Extensions Slide 10
� Iterated Local Search
� Large neighbourhoods (example 3-OPT)
� Simulated Annealing
� Tabu Search
� Genetic Algorithms
4.1 Large Neighbourhoods Slide 11
� Within a small neighbourhood, the solution may be locally optimal. Maybe
by looking at a bigger neighbourhood, we can �nd a better solution.
� Increase in computational complexity
4.1.1 TSP Again
Slide 12
3-OPT: Two tours are neighbour if one can be obtained from the other by
removing three edges and introducing three new edges

3-OPT improves on 2-OPT performance, with corresponding increase in exe


cution time. Improvement from 4-OPT turns out to be not that substantial
compared to 3-OPT.

5 Simulated Annealing
Slide 13
� Allow the possibility of moving to an inferior solution, to avoid being
trapped at local optimum
� Idea: Use of randomization

5.1 Algorithm

Slide 14
• Starting at x, select a random neighbour y in the neighbourhood structure
  with probability q_xy:
      q_xy ≥ 0,   ∑_{y ∈ N(x)} q_xy = 1
• Move to y if c(y) ≤ c(x).
• If c(y) > c(x), move to y with probability
      e^{−(c(y) − c(x))/T},
  and stay at x otherwise.
• T is a positive constant, called the temperature.

5.2 Convergence

Slide 15
• We define a Markov chain.
• Under natural conditions, the long-run probability of finding the chain at
  state x is
      e^{−c(x)/T} / A,   with  A = ∑_z e^{−c(z)/T}
• If T → 0, then almost all of the steady-state probability is concentrated
  on states at which c(x) is minimum.
• But if T is too small, it takes longer to escape from local optima (an inferior
  move is accepted with probability e^{−(c(y) − c(x))/T}). Hence it takes much
  longer for the Markov chain to converge to its steady-state distribution.
5.3 Cooling schedules

Slide 16
• T(t) = R / log(t). Convergence guaranteed, but known to be slow empirically.
• Exponential schedule: T(t) = T(0) a^t with a < 1 and very close to 1
  (a = 0.95 or 0.99 commonly used).

4
5.4 Knapsack Problem
    max  ∑_{i=1}^n c_i x_i    s.t.  ∑_{i=1}^n a_i x_i ≤ b,   x_i ∈ {0, 1}
                                                                        Slide 17
Let X = (x_1, …, x_n) ∈ {0, 1}^n
• Neighbourhood structure: N(X) = {Y ∈ {0, 1}^n : d(X, Y) = 1}. Exactly one
  entry is changed.                                                     Slide 18
Generate a random Y = (y_1, …, y_n):
• Choose j uniformly from 1, 2, …, n.
• y_i = x_i if i ≠ j;  y_j = 1 − x_j.
• Accept if ∑_i a_i y_i ≤ b.
5.4.1 Example
Slide 19
• c = (135, 139, 149, 150, 156, 163, 173, 184, 192, 201, 210, 214, 221, 229, 240)
• a = (70, 73, 77, 80, 82, 87, 90, 94, 98, 106, 110, 113, 115, 118, 120)
• b = 750
• X* = (1, 0, 1, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 1, 1), with value 1458
                                                                        Slide 20
Cooling schedule:
• T_0 = 1000
• The probability of accepting a downward move is between 0.787 (c_i = 240) and
  0.874 (c_i = 135).
• Cooling schedule: T(t) = αT(t − 1), α = 0.999
• Number of iterations: 1000, 5000
                                                                        Slide 21
Performance:
• 1000 iterations: best solutions obtained in 10 runs vary from 1441 to 1454
• 5000 iterations: best solutions obtained in 10 runs vary from 1448 to 1456.
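A minimal simulated-annealing sketch (not from the lecture) using the data, the
single-flip neighbourhood, and the geometric cooling schedule described above; the
exact values found will vary with the random seed and the number of iterations.

import math, random

c = [135,139,149,150,156,163,173,184,192,201,210,214,221,229,240]
a = [70,73,77,80,82,87,90,94,98,106,110,113,115,118,120]
b, n = 750, 15

def value(x):  return sum(ci * xi for ci, xi in zip(c, x))
def weight(x): return sum(ai * xi for ai, xi in zip(a, x))

def anneal(iters=5000, T=1000.0, alpha=0.999, seed=0):
    rng = random.Random(seed)
    x = [0] * n                              # start from the empty knapsack
    best = list(x)
    for _ in range(iters):
        j = rng.randrange(n)                 # flip one entry: a neighbour at distance 1
        y = list(x); y[j] = 1 - y[j]
        if weight(y) > b:
            continue                         # reject infeasible neighbours
        d = value(y) - value(x)
        if d >= 0 or rng.random() < math.exp(d / T):
            x = y                            # accept improving moves, and worse ones
        if value(x) > value(best):           # with probability exp(-|d|/T)
            best = list(x)
        T *= alpha                           # geometric cooling
    return value(best), best

print(anneal())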

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

1
15.093 Optimization Methods

Lecture 16: Dynamic Programming

1 Outline Slide 1
1. The knapsack problem
2. The traveling salesman problem
3. The general DP framework
4. Bellman equation
5. Optimal inventory control
6. Optimal trading
7. Multiplying matrices

2 The Knapsack problem


                                                                        Slide 2
    maximize    ∑_{j=1}^n c_j x_j
    subject to  ∑_{j=1}^n w_j x_j ≤ K
                x_j ∈ {0, 1}

Define
    C_i(w) = maximize    ∑_{j=1}^i c_j x_j
             subject to  ∑_{j=1}^i w_j x_j ≤ w
                         x_j ∈ {0, 1}

2.1 A DP Algorithm                                                      Slide 3

• C_i(w): the maximum value that can be accumulated using some of the first i
  items subject to the constraint that the total accumulated weight is at most w
• Recursion:
      C_{i+1}(w) = max{ C_i(w),  C_i(w − w_{i+1}) + c_{i+1} }
• By considering all states of the form (i, w) with w ≤ K, the algorithm has
  complexity O(nK)
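A minimal table-filling sketch (not from the lecture) of the recursion above; the data
reuse the small knapsack instance from Lecture 13.

def knapsack_max_value(c, w, K):
    """C[i][v]: best value using the first i items with total weight at most v.
    Complexity O(nK)."""
    n = len(c)
    C = [[0] * (K + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for v in range(K + 1):
            C[i][v] = C[i - 1][v]                               # skip item i
            if w[i - 1] <= v:                                   # or take it
                C[i][v] = max(C[i][v], C[i - 1][v - w[i - 1]] + c[i - 1])
    return C[n][K]

print(knapsack_max_value([12, 8, 7, 6], [8, 6, 5, 4], 15))      # 21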

3 The TSP                                                               Slide 4

• G = (V, A) directed graph with n nodes
• c_ij: cost of arc (i, j)
• Approach: view the choice of a tour as a sequence of choices
• We start at node 1; then, at each stage, we choose which node to visit next.
• After a number of stages, we have visited a subset S of V and we are at a
  current node k ∈ S

3.1 A DP algorithm                                                      Slide 5
• Let C(S, k) be the minimum cost over all paths that start at node 1, visit all
  nodes in the set S exactly once, and end up at node k
• (S, k) is a state; this state can be reached from any state of the form
  (S \ {k}, m), with m ∈ S \ {k}, at a transition cost of c_mk
• Recursion:
      C(S, k) = min_{m ∈ S \ {k}} [ C(S \ {k}, m) + c_mk ],   k ∈ S
      C({1}, 1) = 0.
• The length of an optimal tour is
      min_k [ C({1, …, n}, k) + c_k1 ]
• Complexity: O(n² 2^n) operations
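A minimal sketch (not from the lecture) of the recursion above (the Held-Karp DP),
with nodes indexed from 0 instead of 1; the 4-node cost matrix is an illustrative
instance, not from the slides.

from itertools import combinations

def held_karp(cost):
    """C[(S, k)]: min cost of a path from node 0 that visits every node of S
    exactly once and ends at k (with 0 in S).  Runs in O(n^2 2^n)."""
    n = len(cost)
    C = {(frozenset([0]), 0): 0}
    for size in range(2, n + 1):
        for subset in combinations(range(1, n), size - 1):
            S = frozenset(subset) | {0}
            for k in subset:
                C[(S, k)] = min(C[(S - {k}, m)] + cost[m][k]
                                for m in S - {k} if (S - {k}, m) in C)
    full = frozenset(range(n))
    return min(C[(full, k)] + cost[k][0] for k in range(1, n))

cost = [[0, 2, 9, 10],
        [1, 0, 6, 4],
        [15, 7, 0, 8],
        [6, 3, 12, 0]]
print(held_karp(cost))   # 21 for this small asymmetric instance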

4 Guidelines for constructing


DP Algorithms
Slide 6
 View the choice of a feasible solution as a sequence of decisions occurring
in stages, and so that the total cost is the sum of the costs of individual
decisions.
 De�ne the state as a summary of all relevant past decisions.
 Determine which state transitions are possible. Let the cost of each state
transition be the cost of the corresponding decision.
 Write a recursion on the optimal cost from the origin state to a destination
state.
The most crucial step is usually the de�nition of a suitable state.

2
5 The general DP framework                                              Slide 7
• Discrete-time dynamic system described by a state x_k; k indexes time.
• u_k: control to be selected at time k, u_k ∈ U_k(x_k).
• w_k: randomness at time k
• N: time horizon
• Dynamics:
      x_{k+1} = f_k(x_k, u_k, w_k)
• Cost function, additive over time:
      E[ g_N(x_N) + ∑_{k=0}^{N−1} g_k(x_k, u_k, w_k) ]

5.1 Inventory Control                                                   Slide 8
• x_k: stock available at the beginning of the k-th period
• u_k: stock ordered at the beginning of the k-th period
• w_k: demand during the k-th period, with a given probability distribution.
  Excess demand is backlogged and filled as soon as additional inventory is
  available.
• Dynamics:
      x_{k+1} = x_k + u_k − w_k
• Cost:
      E[ R(x_N) + ∑_{k=0}^{N−1} ( r(x_k) + c u_k ) ]

6 The DP Algorithm                                                      Slide 9
• Define J_k(x_k) to be the expected optimal cost starting from stage k at state x_k.
• Bellman's principle of optimality:
      J_N(x_N) = g_N(x_N)
      J_k(x_k) = min_{u_k ∈ U_k(x_k)} E_{w_k}[ g_k(x_k, u_k, w_k) + J_{k+1}(f_k(x_k, u_k, w_k)) ]
• Optimal expected cost for the overall problem: J_0(x_0).

3
7 Inventory Control                                                     Slide 10
• If r(x_k) = a x_k² and w_k ~ N(μ_k, σ_k²), then
      u_k = c_k x_k + d_k,   J_k(x_k) = b_k x_k² + f_k x_k + e_k
• If r(x_k) = p max(0, −x_k) + h max(0, x_k), then there exist thresholds S_k:
      u_k = S_k − x_k   if x_k < S_k,
      u_k = 0           if x_k ≥ S_k.

8 Optimal trading                                                       Slide 11
• S shares of a stock to be bought within a horizon T.
• t = 1, 2, …, T discrete trading periods.
• Control: S_t, the number of shares acquired in period t at price P_t, t = 1, …, T
• Objective:
      min  E[ ∑_{t=1}^T P_t S_t ]
      s.t. ∑_{t=1}^T S_t = S
• Dynamics:
      P_t = P_{t−1} + θ S_t + ε_t,
  where θ ≥ 0 and ε_t ~ N(0, 1)

8.1 DP ingredients                                                      Slide 12
• State: (P_{t−1}, W_t)
  P_{t−1}: price realized in the previous period
  W_t: number of shares remaining to be purchased
• Control: S_t, the number of shares purchased at time t
• Randomness: ε_t
• Objective: min E[ ∑_{t=1}^T P_t S_t ]
• Dynamics: P_t = P_{t−1} + θ S_t + ε_t;  W_t = W_{t−1} − S_{t−1};  W_1 = S, W_{T+1} = 0
                                                                        Slide 13
Note that W_{T+1} = 0 is equivalent to the constraint that S must be executed by
period T.

8.2 The Bellman Equation                                                Slide 14
    J_t(P_{t−1}, W_t) = min_{S_t} E[ P_t S_t + J_{t+1}(P_t, W_{t+1}) ]

    J_T(P_{T−1}, W_T) = min_{S_T} E[ P_T W_T ] = (P_{T−1} + θ W_T) W_T

since W_{T+1} = 0 ⇒ S_T = W_T.

8.3 Solution                                                            Slide 15
    J_{T−1}(P_{T−2}, W_{T−1})
      = min_{S_{T−1}} E_{T−1}[ P_{T−1} S_{T−1} + J_T(P_{T−1}, W_T) ]
      = min_{S_{T−1}} E_{T−1}[ (P_{T−2} + θ S_{T−1} + ε_{T−1}) S_{T−1}
            + J_T( P_{T−2} + θ S_{T−1} + ε_{T−1},  W_{T−1} − S_{T−1} ) ]

    S_{T−1} = W_{T−1} / 2

    J_{T−1}(P_{T−2}, W_{T−1}) = W_{T−1} ( P_{T−2} + (3/4) θ W_{T−1} )
                                                                        Slide 16
Continuing in this fashion,
    S_{T−k} = W_{T−k} / (k + 1)

    J_{T−k}(P_{T−k−1}, W_{T−k}) = W_{T−k} ( P_{T−k−1} + [(k + 2)/(2(k + 1))] θ W_{T−k} )

    S_1 = S / T

    J_1(P_0, W_1) = P_0 S + (θ S²/2) (1 + 1/T)

    S_1 = S_2 = ⋯ = S_T = S / T

8.4 Different Dynamics                                                  Slide 17
    P_t = P_{t−1} + θ S_t + γ X_t + ε_t,   θ > 0
    X_t = ρ X_{t−1} + η_t,   X_1 = 1,   ρ ∈ (−1, 1)
where ε_t ~ N(0, σ_ε²) and η_t ~ N(0, σ_η²)

8.5 Solution                                                            Slide 18
    S_{T−k} = W_{T−k}/(k + 1) + [ρ b_{k−1} / (2 a_{k−1})] X_{T−k}

    J_{T−k}(P_{T−k−1}, X_{T−k}, W_{T−k}) = P_{T−k−1} W_{T−k} + a_k W_{T−k}²
          + b_k X_{T−k} W_{T−k} + c_k X_{T−k}² + d_k

for k = 0, 1, …, T − 1, where:
    a_k = (θ/2) (1 + 1/(k + 1)),                       a_0 = θ
    b_k = γ + θ ρ b_{k−1} / (2 a_{k−1}),               b_0 = γ
    c_k = ρ² c_{k−1} − ρ² b_{k−1}² / (4 a_{k−1}),      c_0 = 0
    d_k = d_{k−1} + c_{k−1} σ_η²,                      d_0 = 0.

9 Matrix multiplication                                                 Slide 19
• Matrices: M_k of dimensions n_k × n_{k+1}
• Objective: compute M_1 · M_2 ⋯ M_N
• Example: M_1 · M_2 · M_3 with M_1: 1 × 10, M_2: 10 × 1, M_3: 1 × 10.
  M_1(M_2 M_3): 200 multiplications;
  (M_1 M_2)M_3: 20 multiplications.
• What is the optimal order for performing the multiplications?
                                                                        Slide 20
• m(i, j): optimal number of scalar multiplications for multiplying M_i ⋯ M_j.
• m(i, i) = 0
• For i < j:
      m(i, j) = min_{i ≤ k < j} [ m(i, k) + m(k + 1, j) + n_i n_{k+1} n_{j+1} ]
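A minimal memoized sketch (not from the lecture) of the recursion above, with matrices
indexed from 0; it reproduces the 1×10, 10×1, 1×10 example.

from functools import lru_cache

def matrix_chain_order(dims):
    """dims[k] = n_k, so matrix M_k is n_k x n_(k+1).  m(i, j) is the minimum number
    of scalar multiplications needed to compute M_i ... M_j."""
    @lru_cache(maxsize=None)
    def m(i, j):
        if i == j:
            return 0
        return min(m(i, k) + m(k + 1, j) + dims[i] * dims[k + 1] * dims[j + 1]
                   for k in range(i, j))
    return m(0, len(dims) - 2)

# The lecture's example: M1 is 1x10, M2 is 10x1, M3 is 1x10
print(matrix_chain_order((1, 10, 1, 10)))   # 20, i.e. (M1 M2) M3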

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

1
15.093 Optimization Methods

Lecture 17: Applications of Nonlinear Optimization

1 Lecture Outline
Slide 1
� History of Nonlinear Optimization

� Where do NLPs Arise�

� Portfolio Optimization

� Tra�c Assignment

� The general problem

� The role of convexity

� Convex optimization

� Examples of convex optimization problems

2 History of Optimization

Slide 2
Fermat, 1638; Newton, 1670

min f(x) x: scalar

df(x)
� 0

dx

Euler, 1755

min f(x1 ; : : :; xn)

rf(x) � 0

Slide 3

Lagrange, 1797

min f(x1 ; : : :; xn)

s.t. gk (x1; : : :; xn) � 0 k � 1; : : :; m

Euler, Lagrange Problems in in�nite dimensions, calculus of variations.

Kuhn and Tucker, 1950s Optimality conditions.

1950s Applications.

1960s Large Scale Optimization.

Karmakar, 1984 Interior point algorithms.

1
3 Where do NLPs Arise�

3.1 Wide Applicability


Slide 4
� Transportation

Tra�c management, Tra�c equilibrium . ..

Revenue management and Pricing

� Finance - Portfolio Management

� Equilibrium Problems

Slide 5

� Engineering

Data Networks and Routing

Pattern Classi�cation

� Manufacturing

Resource Allocation

Production Planning

4 A Simple Portfolio

Selection Problem

4.1 Data
Slide 6
� xi : decision variable on amount to invest in stock i � 1; 2

� ri : reward from stock i � 1; 2 (random variable)

Data:

� �i � E(ri ): expected reward from stock i � 1; 2

� V ar(ri): variance in reward from stock i � 1; 2

� �ij � E[(rj � �j )(ri � �i )] � Cov(ri ; rj )

� Budget B, target � on expected portfolio reward

2
5 A Simple Portfolio

Selection Problem

5.1 The Problem


Slide 7
Objective: Minimize total portfolio variance so that:

� Expected reward of total portfolio is above target �

� Total amount invested stay within our budget

� No short sales

Slide 8

    min f(x) = x_1² Var(r_1) + x_2² Var(r_2) + 2 x_1 x_2 σ_12

    subject to
         ∑_i x_i ≤ B
         E[ ∑_i r_i x_i ] = ∑_i μ_i x_i ≥ μ    (expected reward of portfolio)
         x_i ≥ 0,   i = 1, 2

(Linearly constrained NLP)

6 A Real Portfolio

Optimization Problem

6.1 Data
Slide 9
� We currently own zi shares from stock i, i 2 S

� Pi : current price of stock i

� We consider buying and selling stocks in S, and consider buying new stocks

from a set B (B \ S � ;)

� Set of stocks B [ S � f1; : : :; ng

Slide 10

� Data: Forecasted prices next period (say next month) and their correla

tions:

E[P^i] � �i

Cov(P^i ; P^j ) � E[(P^i � �i)(P^j � �j )] � �ij

 � (�1; : : :; �n)0
; � � �ij

3
6.2 Issues and Objectives Slide 11
� Mutual funds regulations: we cannot sell a stock if we do not own it
� Transaction costs
� Turnover
� Liquidity
� Volatility
� Objective: Maximize expected wealth next period minus transaction
costs
6.3 Decision variables Slide 12
# shares bought or sold if i 2 S

xi � # shares bought if i 2 B
By convention:
xi � 0 buy
xi � 0 sell

6.4 Transaction costs Slide 13


� Small investors only pay commision cost: ai $/share traded
� Transaction cost: ai jxij
� Large investors (like portfolio managers of large funds) may a�ect price:
price becomes Pi + bi xi
� Price impact cost: (Pi + bi xi )xi � Pixi � bi x2i
� Total cost model:
ci (xi ) � ai jxij + bi x2i
6.5 Liquidity Slide 14
� Suppose you own 50% of all outstanding stock of a company
� How di�cult is to sell it�
� Reasonable to bound the percentage of ownership on a particular stock
� Thus, for liquidity reasons zzi total
+ xi � �
i
i
� zitotal
�# outstanding shares of stock i
� �i maximum allowable percentage of ownership

4
6.6 Turnover Slide 15
� Because of transaction costs: jxij should be small
jxij � �i ) ��i � xi � �i
� Alternatively, we might want to bound turnover:
X n
Pi jxij � t
i�1
6.7 Balanced portfolios Slide 16
� Need the value of stocks we buy and sell to balance out:
� �
�Xn �
n
X



Pixi � L ) �L ��

Pixi � L

i�1 i�1

� No short sales:
zi + xi � 0; i2 B[S
6.8 Expected value
and Volatility Slide 17
� Expected value of portfolio:
"

n #

n
P^i (zi + xi) � �i (zi + xi)
X X
E
i�1 i�1
� Variance of the value of the portfolio:

"

n #

P^i(zi + xi ) � (z + x)0�(z + x)
X

V ar
i�1
6.9 Overall formulation Slide 18
    max  ∑_{i=1}^n μ_i (z_i + x_i) − ∑_{i=1}^n ( a_i |x_i| + b_i x_i² )
    s.t. (z + x)' Σ (z + x) ≤ σ²
         z_i + x_i ≤ γ_i z_i^total
         −δ_i ≤ x_i ≤ δ_i
         −L ≤ ∑_{i=1}^n P_i x_i ≤ L
         ∑_{i=1}^n P_i |x_i| ≤ t
         z_i + x_i ≥ 0

5
7 The general problem
Slide 19
    f(x): ℝ^n → ℝ
    g_i(x): ℝ^n → ℝ,   i = 1, …, m

    NLP:  min  f(x)
          s.t. g_1(x) ≤ 0
               ⋮
               g_m(x) ≤ 0

7.1 Is Portfolio Optimization an NLP� Slide 20


n
X n
X
max �i (zi + xi ) (ai jxij + bi xi )
2

i�1 i�1
s:t: (z + x) �(z + x)  �2
0

zi + xi  �i zitotal

 xi  �i

�i
n
X

L Pixi  L

i�1

n
X
Pi jxi j  t
i�1
z i + xi 0

8 Geometry Problems
8.1 Fermat-Weber Problem Slide 21
Given m points c1 ; : : :; cm 2 �n (locations of retail outlets) and weights w1; : : :; wm 2
�. Choose the location of a distribution center.
That is, the point x 2 �n to minimize the sum of the weighted distances from
x to each of the points c1; : : :; cm 2 �n (minimize total daily distance
traveled).
m
min wijjx � cijj
P

i�1 n
s:t: x 2 �
or

m
min wijjx � cijj
P

i�1
s:t: x � 0
Ax � b; feasible sites
(Linearly constrained NLP)
8.2 The Ball Circumscription Problem Slide 22
Given m points c1; : : :; cm 2 �n , locate a distribution center at point x 2 �n to
minimize the maximum distance from x to any of the points c1 ; : : :; cm 2 �n.
min �
s:t: jjx � ci jj � �; i � 1; : : :; m

9 Transportation
9.1 Traffic Assignment                                                  Slide 23
• OD pairs w, paths p ∈ P_w, demand d_w, x_p: flow on path p
• c_ij( ∑_{p: p crosses (i,j)} x_p ): travel cost of link (i, j).
• c_p(x) is the travel cost of path p:
      c_p(x) = ∑_{(i,j) on p} c_ij(x_ij),   ∀ p ∈ P_w, ∀ w ∈ W.
• System-optimization principle: assign flow to each path so as to satisfy total
  demand and minimize the total network cost:
      min  C(x) = ∑_p c_p(x) x_p
      s.t. x_p ≥ 0,   ∑_{p ∈ P_w} x_p = d_w,  ∀ w

9.2 Example                                                             Slide 24
Consider a three-path network with d_w = 10 and travel costs
c_p1(x) = 2x_p1 + x_p2 + 15,  c_p2(x) = 3x_p2 + x_p1 + 11,  c_p3(x) = x_p3 + 48:
    C(x) = c_p1(x) x_p1 + c_p2(x) x_p2 + c_p3(x) x_p3
         = 2x_p1² + 3x_p2² + x_p3² + 2 x_p1 x_p2 + 15 x_p1 + 11 x_p2 + 48 x_p3

    x*_p1 = 6,   x*_p2 = 4,   x*_p3 = 0

Slide 25

7
� User � optimization principle : Each user of the network chooses, among
all paths, a path requiring minimum travel cost,
i.e., for all w 2 W and p 2 Pw ,
x�p � 0 : �! cp (x� ) � cp (x� ) 8p0 2 Pw ; 8w 2 W
0

where cp (x) is the travel time of path p and


X
cp (x) � cij (xij ); 8p 2 Pw ; 8w 2 W
i;j ) on p
(

10 Optimal Routing
Slide 26
� Given a data net and a set W of OD pairs w � (i; j) each OD pair w has
input tra�c dw
� Optimal routing problem:
X X

Min C(x) � Ci;j ( xp )


i;j p: (i;j )2p

X
s:t: xp � dw ; 8w 2 W
p2Pw
xp � 0; 8p 2 Pw ; w 2 W

11 The general problem again


Slide 27
f(x): �n 7! �
is a continuous (usually di�erentiable) function of n variables
gi(x): �n 7! �; i � 1; : : :; m;
hj (x): �n 7! �; j � 1; : : :; l

NLP: min f(x)


s.t. g1(x) � 0

.
.

gm (x) � 0
h1(x) � 0
.
.

hl (x) � 0

11.1 De�nitions Slide 28


� The feasible region of NLOP is the set:
F � fxjg1(x) � 0; : : :; gm (x) � 0g
h1 (x) � 0; : : :; hl (x) � 0g
11.2 Where do optimal solutions lie� Slide 29
Example:
min f(x; y) � (x � a)2 + (y � b)2
Subject to
(x � 8)2 + (y � 9)2 � 49
2 � x � 13
x + y � 24
Optimal solution(s) do not necessarily lie at an extreme point!
Depends on (a; b).
(a; b) � (16; 14) then solution lies at a corner
(a; b) � (11; 10) then solution lies in interior
(a; b) � (14; 14) then solution lies on the boundary
(not necessarily corner)

11.3 Local vs Global Minima Slide 30


� The ball centered at x� with radius � is the set:
B(x� ; �) :� fxjjjx � x� jj � �g
� x 2 F is a local minimum of NLOP if there exists � � 0 such that
f(x) � f(y ) for all y 2 B(x; �) \ F

� x 2 F is a global minimum of NLOP if f(x ) � f(y ) for all y 2 F

12 Convex Sets Slide 31


� A subset S � �n is a convex set if
x; y 2 S ) �x + (1 � �)y 2 S 8� 2 [0; 1]
� If S; T are convex sets, then S \ T is a convex set
� Implication: The intersection of any collection of convex sets is a convex
set

9
13 Convex Functions Slide 32
� A function f(x) is a convex function if
f(�x + (1 � �)y) � �f(x ) + (1 � �)f(y )
8x; y 8� 2 [0; 1]

� A function f(x) is a concave function if


f(�x + (1 � �)y) � �f(x ) + (1 � �)f(y )
8x; y 8� 2 [0; 1]

13.1 Examples in one dimension Slide 33


� f(x) � ax + b
� f(x) � x2 + bx + c
� f(x) � jxj
� f(x) � � ln(x) for x � 0
� f(x) � x1 for x � 0
� f(x) � ex
13.2 Properties Slide 34
� If f1 (x) and f2 (x) are convex functions, and a; b � 0, then f(x) :�
af1 (x) + bf2 (x) is a convex function
� If f(x) is a convex function and x � Ay + b, then g(y) :� f(Ay + b) is
a convex function
13.3 Recognition of a Convex Function Slide 35
A function f(x) is twice di erentiable at x� if there exists a vector rf(x� ) (called
the gradient of f(�)) and a symmetric matrix H(x� ) (called the Hessian of f(�))
for which:
f(x ) � f(x� ) + rf(x� )0 (x � x� )
+ 12 (x � x� )0 H(x� )(x � x� ) + R(x)jjx � x� jj
2
where R(x) ! 0 as x ! x� Slide 36
The gradient vector is the vector of partial derivatives:

@f(x� )
rf(x) � @x ; : : :; @x
� @f( x
� )
�0

1
n

10
The Hessian matrix is the matrix of second partial derivatives:
H(x� )ij � @@xf(@xx� )

i j

13.4 Examples                                                           Slide 37
• For LP, f(x) = c'x, ∇f(x̄) = c
• For the NLP with f(x) = 8x_1² − x_1 x_2 + x_2² + 8x_1, at x̄ = (1, 0):
      f(x̄) = 16,
      ∇f(x̄)' = (16x̄_1 − x̄_2 + 8, −x̄_1 + 2x̄_2) = (24, −1),
      H(x̄) = [ 16  −1 ;  −1  2 ]                                        Slide 38

Property: f(x) is a convex function if and only if H(x) is positive semidefinite
(PSD) for all x.
Recall that A is PSD if u'Au ≥ 0 for all u.
Property: If H(x) is PD for all x, then f(x) is a strictly convex function.

13.5 Examples in n Dimensions                                           Slide 39
• f(x) = a'x + b
• f(x) = (1/2) x'Mx − c'x, where M is PSD
• f(x) = ||x|| for any norm || · ||
• f(x) = −∑_{i=1}^m ln(b_i − a_i'x) for x satisfying Ax < b

14 Convex Optimization
14.1 Convexity and Minima                                               Slide 40
    min  f(x)
    s.t. x ∈ F
Theorem: Suppose that F is a convex set, f : F → ℝ is a convex function, and x*
is a local minimum of the problem. Then x* is a global minimum of f over F.

14.1.1 Proof                                                            Slide 41
Assume that x* is not the global minimum, and let y be the global minimum.
From the convexity of f, for all λ ∈ (0, 1):
    f(y(λ)) = f(λx* + (1 − λ)y) ≤ λ f(x*) + (1 − λ) f(y)
            < λ f(x*) + (1 − λ) f(x*) = f(x*).
Therefore f(y(λ)) < f(x*) for all λ ∈ (0, 1), so x* is not a local minimum,
a contradiction.
14.2 COP Slide 42
COP : min f(x)
s:t: g1(x) � 0
..
.
gm (x) � 0
Ax � b
Slide 43
COP is called a convex optimization problem if f(x); g1(x); : : :; gm (x) are con
vex functions
Note that this implies that the feasible region F is a convex set
In COP we are minimizing a convex function over a convex set
Implication: If COP is a convex optimization problem, then any local minimum
will be a global minimum.

15 Examples of COPs
Slide 44
The Fermat-Weber Problem - COP
m
min
P

i�1
wijjx � cijj
s:t: x 2 P
The Ball Circumscription Problem - COP
min �
s:t: jjx � ci jj � �; i � 1; : : :; m

12

15.1 Is Portfolio Optimization a COP� Slide 45


n
X n
X
(ai jxij + bi xi )

max �i (zi + xi ) 2

i�1 i�1
s:t: (z + x) �(z + x)  �2
0

zi + xi  �i zitotal

�i  xi  �i

n
X
L  L

Pixi
i�1
n

X
Pi jxi j  t
i�1
z i + xi 0

15.2 Quadratically Constrained Problems                                 Slide 46
    min  (A_0 x + b_0)'(A_0 x + b_0) − c_0'x − d_0
    s.t. (A_i x + b_i)'(A_i x + b_i) − c_i'x − d_i ≤ 0,   i = 1, …, m
This is a COP.

16 Classi�cation of NLPs Slide 47


� Linear: f(x) � ctx, gi(x) � Ati x � bi , i � 1; :::; m
� Unconstrained: f(x), �n
� Quadratic: f(x) � ctx + xtQx, gi(x) � Atix � bi
� Linearly Constrained: gi(x) � Ati x � bi
� Quadratically Constrained: gi (x) � (Ai x + bi ) (Ai x + bi ) � cix � 0 0

di � 0;
i � 1; : : :; m
� Separable: f(x) � P
j fj (xj ), gi(x) � P
j gij (xj )

17 Two Main Issues Slide 48


� Characterization of minima
Necessary | Su�cient Conditions
Lagrange Multiplier and KKT Theory

13
� Computation of minima via iterative algorithms
Iterative descent Methods
Interior Point Methods

18 Summary
• Convex optimization is a powerful modeling framework
• Main message: convex optimization can be solved efficiently

14

MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.

1
15.093 Optimization Methods

Lecture 18: Optimality Conditions and

Gradient Methods

for Unconstrained Optimization

1 Outline
Slide 1
1. Necessary and sufficient optimality conditions
2. Gradient methods
3. The steepest descent algorithm
4. Rate of convergence
5. Line search algorithms

2 Optimality Conditions
Slide 2
Necessary Conds for Local Optima
“If x̄ is local optimum then x̄ must satisfy ...”

Identifies all candidates for local optima.

Sufficient Conds for Local Optima


“If x̄ satisfies ...,then x̄ must be a local optimum ”

3 Optimality Conditions
3.1 Necessary conditions
Slide 3
Consider
min f (x)
x∈ℜn
Zero first order variation along all directions
Theorem
Let f (x) be continuously differentiable.
If x∗ ∈ ℜn is a local minimum of f (x), then

∇f (x∗ ) = 0 and ∇2 f (x∗ ) PSD

3.2 Proof
Slide 4
Zero slope at local min x∗
• f (x∗ ) ≤ f (x∗ + λd) for all d ∈ ℜn , λ ∈ ℜ

1
• Pick λ > 0
f (x∗ + λd) − f (x∗ )
0≤
λ
• Take limits as λ → 0
0 ≤ ∇f (x∗ )′ d, ∀d ∈ ℜn

• Since d arbitrary, replace with −d ⇒ ∇f (x∗ ) = 0.


Slide 5
Nonnegative curvature at a local min x∗
• f (x∗ + λd) − f (x∗ ) = ∇f (x∗ )′ (λd) + 21 (λd)′ ∇2 f (x∗ )(λd) + ||λd||2 R(x∗ ; λd)

where R(x∗ ; y) → 0 as y → 0. Since ∇f (x∗ ) = 0,

1 2 ′ 2
= λ d ∇ f (x∗ )d + λ2 ||d||2 R(x∗ ; λd) ⇒
2
f (x∗ + λd) − f (x∗ ) 1
= d′ ∇2 f (x∗ )d + ||d||2 R(x∗ ; λd)
λ2 2
If ∇2 f (x∗ ) is not PSD, ∃d¯: d¯ ′ ∇2 f (x∗ )d¯ < 0 ⇒ f (x∗ + λd)¯ < f (x̄), ∀λ
suff. small QED.

3.3 Example
Slide 6
    f(x) = (1/2)x_1² + x_1 x_2 + 2x_2² − 4x_1 − 4x_2 − x_2³
    ∇f(x) = (x_1 + x_2 − 4,  x_1 + 4x_2 − 4 − 3x_2²);  candidates x* = (4, 0) and x̄ = (3, 1)
    ∇²f(x) = [ 1  1 ;  1  4 − 6x_2 ]
    ∇²f(x*) = [ 1  1 ;  1  4 ]  is PSD                                   Slide 7
    At x̄ = (3, 1):  ∇²f(x̄) = [ 1  1 ;  1  −2 ],  an indefinite matrix.
    x* is the only candidate for a local min.

3.4 Sufficient conditions


Slide 8
Theorem f twice continuously differentiable. If ∇f (x∗ ) = 0 and ∇2 f (x) PSD

in B(x∗ , ǫ), then x∗ is a local minimum.

Proof: Taylor series expansion: For all x ∈ B(x∗ , ǫ)

f (x) = f (x∗ ) + ∇f (x∗ )′ (x − x∗ )


1
+ (x − x∗ )′ ∇2 f (x∗ + λ(x − x∗ ))(x − x∗ )
2
for some λ ∈ [0, 1]
⇒ f (x) ≥ f (x∗ )

2
3.5 Example Continued...
Slide 9
At x* = (4, 0), ∇f(x*) = 0 and
    ∇²f(x) = [ 1  1 ;  1  4 − 6x_2 ]
is PSD for x ∈ B(x*, ε).                                                 Slide 10
    f(x) = x_1³ + x_2²,  ∇f(x) = (3x_1², 2x_2),  x* = (0, 0)
    ∇²f(x) = [ 6x_1  0 ;  0  2 ]  is not PSD in B(0, ε)
    f(−ε, 0) = −ε³ < 0 = f(x*)

3.6 Characterization of convex functions


Slide 11
Theorem Let f (x) be continuously differentiable.
Then f (x) is convex if and only if

∇f (x)′ (x − x) ≤ f (x) − f (x)

3.7 Proof
Slide 12
By convexity
f (λx + (1 − λ)x) ≤ λf (x) + (1 − λ)f (x)
f (x + λ(x − x)) − f (x)
≤ f (x) − f (x)
λ
As λ → 0,
∇f (x)′ (x − x) ≤ f (x) − f (x)

3.8 Convex functions


Slide 13
Theorem Let f (x) be a continuously differentiable convex function. Then x∗ is
a minimum of f if and only if

∇f (x∗ ) = 0

Proof: If f convex and ∇f (x∗ ) = 0

f (x) − f (x∗ ) ≥ ∇f (x∗ )′ (x − x∗ ) = 0

3
3.9 Descent Directions
Slide 14
Interesting Observation

f diff/ble at x̄

∃d: ∇f (x̄)′ d < 0 ⇒ ∀λ > 0, suff. small, f (x̄ + λd) < f (x̄)

(d: descent direction)

3.10 Proof
Slide 15
f (x̄ + λd) = f (x̄) + λ∇f (x̄)t d + λ||d||R(x̄, λd)
where R(x̄, λd) −→λ→0 0
f (x̄ + λd) − f (x̄)
= ∇f (x̄)t d + ||d||R(x̄, λd)
λ
∇f (x̄)t d < 0, R(x̄, λd) −→λ→0 0 ⇒
∀λ > 0 suff. small f (x̄ + λd) < f (x̄). QED

4 Algorithms for unconstrained optimization


4.1 Gradient Methods-Motivation
Slide 16
• Decrease f (x) until ∇f (x∗ ) = 0

f (x̄ + λd) ≈ f (x̄) + λ∇f (x̄)′ d

• If ∇f (x̄)′ d < 0, then for small λ > 0,

f (x̄ + λd) < f (x̄)

5 Gradient Methods
5.1 A generic algorithm
Slide 17
• xk+1 = xk + λk dk
• If ∇f (xk ) 6= 0, direction dk satisfies:

∇f (xk )′ dk < 0

• Step-length λk > 0
• Principal example:

xk+1 = xk − λk Dk ∇f (xk )

D k positive definite symmetric matrix

4
5.2 Principal directions
Slide 18
• Steepest descent:
xk+1 = xk − λk ∇f (xk )

• Newton’s method:

xk+1 = xk − λk (∇2 f (xk ))−1 ∇f (xk )

5.3 Other directions


Slide 19
• Diagonally scaled steepest descent

D k = Diagonal approximation to (∇2 f (xk ))−1

• Modified Newton’s method

D k = Diagonal approximation to (∇2 f (x0 ))−1

• Gauss-Newton method for least squares problems f (x) = ||g(x)||2 Dk =


(∇g(xk )∇g(xk )′ )−1

6 Steepest descent
6.1 The algorithm
Slide 20
Step 0 Given x0 , set k := 0.
Step 1 dk := −∇f (xk ). If ||dk || ≤ ǫ, then stop.
Step 2 Solve minλ h(λ) := f (xk + λdk ) for the
step-length λk , perhaps chosen by an exact
or inexact line-search.
Step 3 Set xk+1 ← xk + λk dk , k ← k + 1.
Go to Step 1.
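A minimal sketch (not from the lecture) of the algorithm above applied to the quadratic
example that follows; for this quadratic the exact line-search stepsize has the closed
form given on the next slide.

def grad(x):
    return (10*x[0] + 4*x[1] - 14, 2*x[1] + 4*x[0] - 6)

def f(x):
    return 5*x[0]**2 + x[1]**2 + 4*x[0]*x[1] - 14*x[0] - 6*x[1] + 20

x = (0.0, 10.0)
for k in range(50):
    g = grad(x)
    d = (-g[0], -g[1])                                # steepest descent direction
    if d[0]**2 + d[1]**2 <= 1e-12:
        break
    # exact line search for this quadratic: argmin_l f(x + l d)
    lam = (d[0]**2 + d[1]**2) / (2*(5*d[0]**2 + d[1]**2 + 4*d[0]*d[1]))
    x = (x[0] + lam*d[0], x[1] + lam*d[1])
print(x, f(x))   # converges to (1, 1) with f = 10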

6.2 An example
Slide 21
minimize f (x1 , x2 ) = 5x21 + x22 + 4x1 x2 − 14x1 − 6x2 + 20
x∗ = (x∗1 , x∗2 )′ = (1, 1)′
f (x∗ ) = 10 Slide 22
k
Given x
−10xk1 − 4xk2 + 14 dk1
� � � �
dk = −∇f (xk1 , xk2 ) = =
−2xk2 − 4xk1 + 6 dk2
h(λ) = f (xk + λdk )
= 5(xk1 + λdk1 )2 + (xk2 + λdk2 )2 + 4(xk1 + λdk1 )(xk2 + λdk2 )−
−14(xk1 + λdk1 ) − 6(xk2 + λdk2 ) + 20

5
    λ_k = [ (d_1^k)² + (d_2^k)² ] / [ 2( 5(d_1^k)² + (d_2^k)² + 4 d_1^k d_2^k ) ]     Slide 23
Start at x = (0, 10)′
ε = 10−6
k x1k x2k d1k d2k ||dk ||2 λk f (xk )
1 0.000000 10.000000 −26.000000 −14.000000 29.52964612 0.0866 60.000000
2 −2.252782 8.786963 1.379968 −2.562798 2.91071234 2.1800 22.222576
3 0.755548 3.200064 −6.355739 −3.422321 7.21856659 0.0866 12.987827
4 0.204852 2.903535 0.337335 −0.626480 0.71152803 2.1800 10.730379
5 0.940243 1.537809 −1.553670 −0.836592 1.76458951 0.0866 10.178542
6 0.805625 1.465322 0.082462 −0.153144 0.17393410 2.1800 10.043645
7 0.985392 1.131468 −0.379797 −0.204506 0.43135657 0.0866 10.010669
8 0.952485 1.113749 0.020158 −0.037436 0.04251845 2.1800 10.002608
9 0.996429 1.032138 −0.092842 −0.049992 0.10544577 0.0866 10.000638
10 0.988385 1.027806 0.004928 −0.009151 0.01039370 2.1800 10.000156
Slide 24
k xk
1 xk
2 dk
1 dk
2
k
||d ||2 λk k
f (x )
11 0.999127 1.007856 −0.022695 −0.012221 0.02577638 0.0866 10.000038
12 0.997161 1.006797 0.001205 −0.002237 0.00254076 2.1800 10.000009
13 0.999787 1.001920 −0.005548 −0.002987 0.00630107 0.0866 10.000002
14 0.999306 1.001662 0.000294 −0.000547 0.00062109 2.1800 10.000001
15 0.999948 1.000469 −0.001356 −0.000730 0.00154031 0.0866 10.000000
16 0.999830 1.000406 0.000072 −0.000134 0.00015183 2.1800 10.000000
17 0.999987 1.000115 −0.000332 −0.000179 0.00037653 0.0866 10.000000
18 0.999959 1.000099 0.000018 −0.000033 0.00003711 2.1800 10.000000
19 0.999997 1.000028 −0.000081 −0.000044 0.00009204 0.0866 10.000000
20 0.999990 1.000024 0.000004 −0.000008 0.00000907 2.1803 10.000000
21 0.999999 1.000007 −0.000020 −0.000011 0.00002250 0.0866 10.000000
22 0.999998 1.000006 0.000001 −0.000002 0.00000222 2.1817 10.000000
23 1.000000 1.000002 −0.000005 −0.000003 0.00000550 0.0866 10.000000
24 0.999999 1.000001 0.000000 −0.000000 0.00000054 0.0000 10.000000
Slide 25
[Figure: surface plot of the quadratic objective.]
                                                                        Slide 26
[Figure: contour plot of the objective with the steepest descent iterates
zig-zagging toward (1, 1).]

6.3 Important Properties


Slide 27
• f (xk+1 ) < f (xk ) < · · · < f (x0 ) (because dk are descent directions)
• Under reasonable assumptions of f (x), the sequence x0 , x1 , . . . , will have

at least one cluster point x̄

• Every cluster point x̄ will satisfy ∇f (x̄) = 0


• Implication: If f (x) is a convex function, x̄ will be an optimal solution

7 Global Convergence Result


Slide 28
Theorem:
f : Rn → R is continuously diff/ble on F = {x ∈ Rn : f (x) ≤ f (x0 )} closed,

bounded set

Every cluster point x̄ of {xk } satisfies ∇f (x̄) = 0.

7.1 Work Per Iteration


Slide 29
Two computation tasks at each iteration of steepest descent:

• Compute ∇f (xk ) (for quadratic objective functions, it takes O(n2 ) steps)

to determine dk = −∇f (xk )

7
• Perform line-search of h(λ) = f (xk + λdk )

to determine λk = arg minλ h(λ) = arg minλ f (xk + λdk )

8 Rate of convergence
of algorithms
Slide 30
Let z_1, …, z_n, … → z be a convergent sequence. We say that the order of
convergence of this sequence is p* if
    p* = sup{ p : lim sup_{k→∞} |z_{k+1} − z| / |z_k − z|^p < ∞ }
Let
    β = lim sup_{k→∞} |z_{k+1} − z| / |z_k − z|^{p*}
The larger p*, the faster the convergence.

8.1 Types of convergence


Slide 31

1. p = 1, 0 < β < 1, then linear (or geometric) rate of convergence
2. p ∗ = 1, β = 0, super-linear convergence
3. p ∗ = 1, β = 1, sub-linear convergence
4. p ∗ = 2, quadratic convergence

8.2 Examples
Slide 32
• zk = ak , 0 < a < 1 converges linearly to zero, β = a

k
• zk = a2 , 0 < a < 1 converges quadratically to zero

1
• zk = k converges sub-linearly to zero

� �k
1
• zk = k converges super-linearly to zero

8.3 Steepest descent


Slide 33
• zk = f (xk ), z = f (x∗ ), where x∗ = arg min f (x)

8
• Then an algorithm exhibits linear convergence if there is a constant δ < 1

such that

f (xk+1 ) − f (x∗ )
≤δ,
f (xk ) − f (x∗ )

for all k sufficiently large, where x∗ is an optimal solution.

8.3.1 Discussion
Slide 34
f (xk+1 ) − f (x∗ )
≤δ<1
f (xk ) − f (x∗ )
• If δ = 0.1, every iteration adds another digit of accuracy to the optimal

objective value.

• If δ = 0.9, every 22 iterations add another digit of accuracy to the optimal

objective value, because (0.9)22 ≈ 0.1.

9 Rate of convergence
of steepest descent
9.1 Quadratic Case
9.1.1 Theorem
Slide 35
Suppose f(x) = (1/2) x′Qx − c′x, with Q psd;
λ_max = largest eigenvalue of Q, λ_min = smallest eigenvalue of Q.
Linear Convergence Theorem: If f(x) is a quadratic function and Q is psd, then
    [f(x^{k+1}) − f(x*)] / [f(x^k) − f(x*)]  ≤  ( (λ_max/λ_min − 1) / (λ_max/λ_min + 1) )²

9.1.2 Discussion
                                                                        Slide 36
    [f(x^{k+1}) − f(x*)] / [f(x^k) − f(x*)]  ≤  ( (κ(Q) − 1) / (κ(Q) + 1) )²

• κ(Q) := λ_max / λ_min is the condition number of Q
• κ(Q) ≥ 1
• κ(Q) plays an extremely important role in analyzing computation involv­
ing Q
                                                                        Slide 37
    [f(x^{k+1}) − f(x*)] / [f(x^k) − f(x*)]  ≤  ( (κ(Q) − 1) / (κ(Q) + 1) )²

    κ(Q) = λ_max/λ_min   Convergence constant δ   Iterations to reduce the optimality gap by 0.10
    1.1                  0.0023                    1
    3.0                  0.25                      2
    10.0                 0.67                      6
    100.0                0.96                      58
    200.0                0.98                      116
    400.0                0.99                      231
                                                                        Slide 38
For κ(Q) ~ O(1) the method converges fast. For large κ(Q),
    ( (κ(Q) − 1)/(κ(Q) + 1) )²  ≈  (1 − 1/κ(Q))²  ≈  1 − 2/κ(Q)
Therefore
    f(x^k) − f(x*) ≤ (1 − 2/κ(Q))^k (f(x^0) − f(x*))
In k ~ (1/2) κ(Q) (−ln ε) iterations, the method finds x^k with
    f(x^k) − f(x*) ≤ ε (f(x^0) − f(x*))

9.2 Example 2
                                                                        Slide 39
    f(x) = (1/2) x′Qx − c′x + 10,   Q = [ 20  5 ;  5  2 ],   c = (14, 6)′
    κ(Q) = 30.234,   δ = ( (κ(Q) − 1)/(κ(Q) + 1) )² = 0.8760              Slide 40

f (xk ) − f (x∗ )
k xk
1 xk
2 ||dk ||2 λk f (xk )
f (xk−1 ) − f (x∗ )
1 40.000000 −100.000000 286.06293014 0.0506 6050.000000
2 25.542693 −99.696700 77.69702948 0.4509 3981.695128 0.658079
3 26.277558 −64.668130 188.25191488 0.0506 2620.587793 0.658079
4 16.763512 −64.468535 51.13075844 0.4509 1724.872077 0.658079
5 17.247111 −41.416980 123.88457127 0.0506 1135.420663 0.658079
6 10.986120 −41.285630 33.64806192 0.4509 747.515255 0.658079
7 11.304366 −26.115894 81.52579489 0.0506 492.242977 0.658079
8 7.184142 −26.029455 22.14307211 0.4509 324.253734 0.658079
9 7.393573 −16.046575 53.65038732 0.0506 213.703595 0.658079
10 4.682141 −15.989692 14.57188362 0.4509 140.952906 0.658079

10
Slide 41
f (xk ) − f (x∗ )
k xk
1 xk
2 ||dk ||2 λk f (xk )
f (xk−1 ) − f (x∗ )

20 0.460997 0.948466 1.79847660 0.4509 3.066216 0.658079

30 −0.059980 3.038991 0.22196980 0.4509 0.965823 0.658079

40 −0.124280 3.297005 0.02739574 0.4509 0.933828 0.658079

50 −0.132216 3.328850 0.00338121 0.4509 0.933341 0.658079

60 −0.133195 3.332780 0.00041731 0.4509 0.933333 0.658078

70 −0.133316 3.333265 0.00005151 0.4509 0.933333 0.658025

80 −0.133331 3.333325 0.00000636 0.4509 0.933333 0.654656

90 −0.133333 3.333332 0.00000078 0.0000 0.933333 0.000000

Slide 42

9.3 Example 3
                                                                        Slide 43
    f(x) = (1/2) x′Qx − c′x + 10,   Q = [ 20  5 ;  5  16 ],   c = (14, 6)′
    κ(Q) = 1.8541,   δ = ( (κ(Q) − 1)/(κ(Q) + 1) )² = 0.0896              Slide 44

f (xk ) − f (x∗ )
k xk
1 x2k ||dk ||2 λk f (xk )
f (xk−1 ) − f (x∗ )

1 40.000000 −100.000000 1434.79336491 0.0704 76050.000000

2 19.867118 −1.025060 385.96252652 0.0459 3591.615327 0.047166

3 2.513241 −4.555081 67.67315150 0.0704 174.058930 0.047166

4 1.563658 0.113150 18.20422450 0.0459 12.867208 0.047166

5 0.745149 −0.053347 3.19185713 0.0704 5.264475 0.047166

6 0.700361 0.166834 0.85861649 0.0459 4.905886 0.047166

7 0.661755 0.158981 0.15054644 0.0704 4.888973 0.047166

8 0.659643 0.169366 0.04049732 0.0459 4.888175 0.047166

9 0.657822 0.168996 0.00710064 0.0704 4.888137 0.047166

10 0.657722 0.169486 0.00191009 0.0459 4.888136 0.047166

11 0.657636 0.169468 0.00033491 0.0704 4.888136 0.047166

12 0.657632 0.169491 0.00009009 0.0459 4.888136 0.047161

13 0.657628 0.169490 0.00001580 0.0704 4.888136 0.047068

14 0.657627 0.169492 0.00000425 0.0459 4.888136 0.045002

15 0.657627 0.169491 0.00000075 0.0000 4.888136 0.000000

Slide 45

11
9.4 Empirical behavior
Slide 46
• The convergence constant bound is not just theoretical. It is typically

experienced in practice.

• Analysis is due to Leonid Kantorovich, who won the Nobel Memorial

Prize in Economic Science in 1975 for his contributions to optimization

and economic planning.

Slide 47

• What about non-quadratic functions?

– Suppose x∗ = arg minx f (x)

– ∇2 f (x∗ ) is the Hessian of f (x) at x = x∗

– Rate of convergence will depend on κ(∇2 f (x∗ ))

10 Summary
Slide 48
1. Optimality Conditions
2. The steepest descent algorithm - Convergence
3. Rate of convergence of Steepest Descent

12
MIT OpenCourseWare
http://ocw.mit.edu

15.093J / 6.255J Optimization Methods


Fall 2009

For information about citing these materials or our Terms of Use, visit: http://ocw.mit.edu/terms.
15.093 Optimization Methods

Lecture 19: Line Searches


and Newton’s Method
1 Last Lecture
Slide 1
• Necessary Conditions for Optimality
(identifies candidates)
x∗ local min ⇒ ∇f (x∗ ) = 0, ∇2 f (x∗ ) PSD
• Sufficient Conditions for Optimality

∇f (x∗ ) = 0 ∇2 f (x) psd x ∈ Bǫ (x∗ ) ⇒ x∗ loc. min.

• Characterizations of Convexity
a. f convex ⇔ f (y) ≥ f (x) + ∇f (x)′ (y − x) ∀x, y
b. f convex ⇔ ∇2 f (x) PSD ∀x
• f convex then global min ⇔ ∇f (x∗ ) = 0

2 Steepest descent
2.1 The algorithm
Slide 2
Step 0 Given x0 , set k := 0.
Step 1 dk := −∇f (xk ). If ||dk || ≤ ǫ, then stop.
Step 2 Solve minλ h(λ) := f (xk + λdk ) for the

step-length λk , perhaps chosen by an exact

or inexact line-search.

Step 3 Set xk+1 ← xk + λk dk , k ← k + 1.

Go to Step 1.

3 Outline
Slide 3
1. Bisection Method - Armijo’s Rule
2. Motivation for Newton’s method
3. Newton’s method
4. Quadratic rate of convergence
5. Modification for global convergence

4 Choices of step sizes


Slide 4
• M inλ f (xk + λdk )
• Limited Minimization: M inλ∈[0,s] f (xk + λdk )

• Constant stepsize λk = s constant

1
• Diminishing stepsize: λk → 0, λk = ∞
P
k

• Armijo Rule

4.1 Bisection Line- Search Algorithm


4.1.1 Convex functions
Slide 5
λ̄ := arg min h(λ) := arg min f (x̄ + λd̄)
λ λ

If f (x) is convex, h(λ) is convex.


Find λ̄ for which h′ (λ) = 0
h′ (λ) = ∇f (x̄ + λd̄)′ d̄

4.1.2 Algorithm
Slide 6
Step 0. Set k = 0. Set λL := 0 and λU := λ̂.

λU + λL ′
Step k. Set λ̃ = and compute h (λ̃).


2

If h (λ̃) > 0, re-set λU := λ̃. Set k ← k + 1.


If h (λ̃) < 0, re-set λL := λ̃. Set k ← k + 1.


If h (λ̃) = 0, stop.
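A minimal sketch (not from the lecture) of the bisection step above; the test function
h′ is a hypothetical stand-in, and the interval is assumed to bracket the root.

def bisection_line_search(hprime, lam_hat, tol=1e-10):
    """Find lam in [0, lam_hat] with h'(lam) = 0 for convex h (h' increasing),
    assuming h'(0) < 0 <= h'(lam_hat)."""
    lo, hi = 0.0, lam_hat
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if hprime(mid) > 0:
            hi = mid            # the root lies to the left
        else:
            lo = mid            # the root lies to the right
    return 0.5 * (lo + hi)

# h(lam) = (lam - 0.3)^2, so h'(lam) = 2(lam - 0.3); the minimizer is 0.3
print(bisection_line_search(lambda lam: 2*(lam - 0.3), 1.0))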

4.1.3 Analysis
Slide 7
• At the kth iteration of the bisection algorithm, the current interval [λL , λU ]

must contain a point λ̄: h′ (λ̄) = 0

• At the k th iteration of the bisection algorithm, the length of the current

interval [λL , λU ] is

 k
1
length = (λ̂).
2

• A value of λ such that |λ − λ̄| ≤ ǫ can be found in at most

      ⌈ log₂( λ̂ / ǫ ) ⌉

  steps of the bisection algorithm.

2
4.1.4 Example
Slide 8

    h(λ) = (x_1 + λd_1) − 0.6(x_2 + λd_2) + 4(x_3 + λd_3) + 0.25(x_4 + λd_4)
           − ∑_{i=1}^4 log(x_i + λd_i) − log( 5 − ∑_{i=1}^4 x_i − λ ∑_{i=1}^4 d_i )

where (x_1, x_2, x_3, x_4) = (1, 1, 1, 1)′ and (d_1, d_2, d_3, d_4) = (−1, 0.6, −4, −0.25)′.
                                                                        Slide 9
    h(λ) = 4.65 − 17.4225λ − log(1 − λ) − log(1 + 0.6λ)
           − log(1 − 4λ) − log(1 − 0.25λ) − log(1 + 4.65λ)
and
    h′(λ) = −17.4225 + 1/(1 − λ) − 0.6/(1 + 0.6λ) + 4/(1 − 4λ)
            + 0.25/(1 − 0.25λ) − 4.65/(1 + 4.65λ)                        Slide 10

[Figure: h′(λ) on [0, 0.25]; the root is near λ ≈ 0.197.]                Slide 11
k λkl λku h′ (λ̃)
1 0.0000000 1.0000000 NaN
2 0.0000000 0.5000000 NaN
3 0.0000000 0.2500000 −11.520429338348
4 0.1250000 0.2500000 −2.952901763683
5 0.1875000 0.2500000 13.286386294218
6 0.1875000 0.2187500 2.502969022220
7 0.1875000 0.2031250 −0.605144021505
8 0.1953125 0.2031250 0.831883373151
9 0.1953125 0.1992188 0.087369215988
10 0.1953125 0.1972656 −0.265032213496
Slide 12

3
k λkl λku h′ (λ̃)
20 0.1970253 0.1970272 −0.000184301091
30 0.1970268 0.1970268 −0.000000146531
40 0.1970268 0.1970268 0.000000000189
41 0.1970268 0.1970268 0.000000000023
42 0.1970268 0.1970268 −0.000000000059
43 0.1970268 0.1970268 −0.000000000018
44 0.1970268 0.1970268 0.000000000003
45 0.1970268 0.1970268 −0.000000000008
46 0.1970268 0.1970268 −0.000000000002
47 0.1970268 0.1970268 0.000000000000

4.2 Armijo Rule

Slide 13
Bisection is accurate but may be expensive in practice

Need cheap method guaranteeing sufficient accuracy

Inexact line search method. Requires two parameters: ǫ ∈ (0, 1), σ > 1.

h̄(λ) = h(0) + λǫh′ (0)

λ̄ acceptable by Armijo’s rule if:


• h(λ̄) ≤ h̄(λ̄)
• h(σλ̄) ≥ h̄(σλ̄)   (prevents the step size from being too small)
  (i.e. f(x^k + λ̄ d^k) ≤ f(x^k) + λ̄ ǫ ∇f(x^k)′d^k)                      Slide 14
We get a range of acceptable stepsizes.
Step 0: Set k = 0, λ_0 = λ̄ > 0.
Step k: If h(λ_k) ≤ h̄(λ_k), choose λ_k and stop. If h(λ_k) > h̄(λ_k), let
        λ_{k+1} = (1/σ) λ_k (for example, σ = 2).
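A minimal backtracking sketch (not from the lecture) of the Armijo rule above: starting
from an initial stepsize, shrink by 1/σ until the sufficient-decrease condition holds
(starting large and backtracking also keeps the accepted step from being too small).
The quadratic test function is a hypothetical example.

def armijo_step(f, grad_dot_d, x, d, lam0=1.0, eps=0.2, sigma=2.0):
    """Shrink lam by 1/sigma until f(x + lam d) <= f(x) + lam * eps * grad_f(x)'d."""
    fx, lam = f(x), lam0
    while f([xi + lam*di for xi, di in zip(x, d)]) > fx + lam*eps*grad_dot_d:
        lam /= sigma
    return lam

# f(x) = x^2 at x = 1 with descent direction d = -2, so grad_f(x)'d = -4
f = lambda x: x[0]**2
print(armijo_step(f, grad_dot_d=-4.0, x=[1.0], d=[-2.0]))   # 0.5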

5 Newton’s method
5.1 History
Slide 15
Steepest Descent is simple but slow

Newton’s method complex but fast

Origins not clear

Raphson became member of the Royal Society in 1691 for his book “Analysis

Aequationum Universalis” with Newton method.

Raphson published it 50 years before Newton.

4
6 Newton’s method
6.1 Motivation
Slide 16
Consider
min f (x)
• Taylor series expansion around x̄
f (x) ≈ g(x) = f (x̄) + ∇f (x̄)′ (x − x̄)+
1
(x − x̄)′ ∇2 f (x̄)(x − x̄)
2
• Instead of min f (x), solve min g(x), i.e., ∇g(x) = 0
• ∇f (x̄) + ∇2 f (x̄)(x − x̄) = 0
⇒ x − x̄ = −(∇2 f (x̄))−1 ∇f (x̄)

• The direction d = −(∇2 f (x̄))−1 ∇f (x̄) is the Newton direction

6.2 The algorithm


Slide 17
Step 0 Given x0 , set k := 0.

Step 1 dk := −(∇2 f (xk ))−1 ∇f (xk ).

If ||dk || ≤ ǫ, then stop.

Step 2 Set xk+1 ← xk + dk , k ← k + 1.

Go to Step 1.
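A minimal one-dimensional sketch (not from the lecture) of the pure Newton iteration
above, using the function of Example 1 in Section 6.5 below; the two starting points
illustrate the local nature of the convergence.

def newton(f_prime, f_double_prime, x0, iters=10):
    """Pure Newton iteration x_{k+1} = x_k - f'(x_k)/f''(x_k) for a 1-D problem."""
    x = x0
    for _ in range(iters):
        x = x - f_prime(x) / f_double_prime(x)
    return x

# Example 1: f(x) = 7x - log x, minimized at x* = 1/7
fp  = lambda x: 7 - 1/x
fpp = lambda x: 1/x**2
print(newton(fp, fpp, 0.1))     # converges quadratically to 0.142857...
print(newton(fp, fpp, 1.0, 5))  # diverges: Newton needs a good starting point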

6.3 Comments
Slide 18
• The method assumes that ∇2 f (xk ) is nonsingular at each iteration
• There is no guaranteee that f (xk+1 ) ≤ f (xk )
• We can augment the algorithm with a line-search:

min f (xk + λdk )

6.4 Properties
Slide 19
Theorem If H = ∇2 f (xk ) is PD, then dk is a descent direction: ∇f (xk )′ dk < 0
Proof:
∇f (xk )′ dk = −∇f (xk )′ H −1 ∇f (xk ) < 0
if H −1 is PD. But,

0 < v ′ (H −1 )′ HH −1 v = v ′ H −1 v
⇒ H −1 is PD

5
6.5 Example 1
Slide 20
    f(x) = 7x − log x,   x* = 1/7 = 0.142857143
    f′(x) = 7 − 1/x,   f″(x) = 1/x²
    ⇒ d = −(f″(x))⁻¹ f′(x) = −x²(7 − 1/x) = x − 7x²
    x_{k+1} = x_k + x_k − 7(x_k)² = 2x_k − 7(x_k)²




Slide 21
    k    x_k (x_0 = 1)    x_k (x_0 = 0)    x_k (x_0 = 0.01)    x_k (x_0 = 0.1)
    0    1                0                0.01                0.1
    1    -5               0                0.0193              0.13
    2    -185             0                0.03599             0.1417
    3    -239945          0                0.062917            0.14284777
    4    -4E11            0                0.098124            0.142857142
    5    -112E22          0                0.128849782         0.142857143
    6                     0                0.141483700         0.142857143
    7                     0                0.142843938         0.142857143
    8                     0                0.142857142         0.142857143

6.6 Example 2

Slide 22
f (x1 , x2 ) = − log(1 − x1 − x2 ) − log x1 − log x2
1 1
 
 1 − x1 − x2 −
x1 
∇f (x1 , x2 ) =  
 1 1 

1 − x1 − x2 x2
  2  2  2 
1 1 1
 1−x −x +
1 2 x1 1 − x1 − x2 
∇2 f (x1 , x2 ) = 
 
 2  2  2 
 1 1 1 
+
1 − x1 − x2 1 − x1 − x2 x1
Slide 23
 
1 1
(x∗1 , x∗2 ) = , , f (x∗1 , x∗2 ) = 3.295836867
3 3

6
k xk1 xk2 ||x − x∗ ||
0 0.85 0.05 0.58925565
1 0.7170068 0.09659864 0.45083106
2 0.5129752 0.17647971 0.23848325
3 0.35247858 0.27324878 0.06306103
4 0.33844902 0.32623807 0.00874717
5 0.33333772 0.33325933 7.4133E-05
6 0.33333334 0.33333333 1.1953E-08
7 0.33333333 0.33333333 1.5701E-16

7 Quadratic convergence
Slide 24
• Recall

|zn+1 − z|

z = lim zn , lim sup =δ<∞


n→∞ n→∞ |zn − z|2
n
• Example: zn = a2 , 0 < a < 1, z = 0
n+1
|zn+1 − z| a2
= =1
|zn − z|2 a2n+1

7.1 Intuitive statement


Slide 25
Theorem Suppose f (x) ∈ C 3 (thrice cont. diff/ble), x∗ : ∇f (x∗ ) = 0 and
∇2 f (x∗ ) is nonsingular. If Newton’s method is started sufficiently close to x∗ ,
the sequence of iterates converges quadratically to x∗ .
Slide 26
Lemma:
Suppose f (x) ∈ C 3 on ℜn and x a given pt.
Then ∀ǫ > 0, ∃β > 0: ||x − y|| ≤ ǫ ⇒

||∇f (x) − ∇f (y) − H(y)(x − y)|| ≤ β||x − y||2

7.2 Formal statement


Slide 27
Theorem
Suppose f twice cont. diff/ble. Let g(x) = ∇f (x), H(x) = ∇2 f (x), and
x∗ : g(x∗ ) = 0. Suppose f (x) is twice differentiable and its Hessian satisfies:
• ||H(x∗ )|| ≥ h
• ||H(x) − H(x∗ )|| ≤ L||x − x∗ ||, ∀x : ||x − x∗ || ≤ β
(Reminder: If A is a square matrix:

||A|| = max ||Ax|| = max{|λ1 |, . . . , |λn |})


x:||x||=1

7
Slide 28
Suppose

||x − x∗|| ≤ min{ β, 2h/(3L) } = γ.

Let x̄ be the next iterate of Newton's method. Then,

||x̄ − x∗|| ≤ (3L/(2h)) ||x − x∗||²   and   ||x̄ − x∗|| < ||x − x∗|| < γ.

7.3 Proof
Slide 29
x̄ − x∗ = x − H(x)⁻¹ g(x) − x∗
       = x − x∗ + H(x)⁻¹ ( g(x∗) − g(x) )
       = x − x∗ + H(x)⁻¹ ∫₀¹ H(x + t(x∗ − x))(x∗ − x) dt        (Lemma 1)
       = H(x)⁻¹ ∫₀¹ ( H(x + t(x∗ − x)) − H(x) ) (x∗ − x) dt
Slide 30
||x̄ − x∗|| ≤ ||H(x)⁻¹|| ∫₀¹ ||H(x + t(x∗ − x)) − H(x)|| · ||x − x∗|| dt
           ≤ ||H(x)⁻¹|| · ||x − x∗|| ∫₀¹ L ||x − x∗|| t dt
           = (1/2) L ||H(x)⁻¹|| · ||x − x∗||²
Slide 31
Now
h ≤ ||H(x∗ )||
= ||H(x) + H(x∗ ) − H(x)||
≤ ||H(x)|| + ||H(x∗ ) − H(x)||
≤ ||H(x)|| + L||x∗ − x||
⇒ ||H(x)|| ≥ h − L||x − x∗ ||
Slide 32
⇒ ||H(x)⁻¹|| ≤ 1/||H(x)|| ≤ 1/(h − L||x − x∗||)

⇒ ||x̄ − x∗|| ≤ [ L / (2(h − L||x − x∗||)) ] ||x − x∗||²
            ≤ [ L / (2(h − (2/3)h)) ] ||x − x∗||²
            = (3L/(2h)) ||x − x∗||²

7.4 Lemma 1
Slide 33
• Fix w. Let φ(t) = g(x + t(x∗ − x))′ w
• φ(0) = g(x)′ w, φ(1) = g(x∗ )′ w
• φ′ (t) = w ′ H(x + t(x∗ − x))(x∗ − x)
• φ(1) = φ(0) + ∫₀¹ φ′(t) dt  ⇒
Slide 34
∀ w:   w′( g(x∗) − g(x) ) = ∫₀¹ w′ H(x + t(x∗ − x))(x∗ − x) dt

⇒ g(x∗) − g(x) = ∫₀¹ H(x + t(x∗ − x))(x∗ − x) dt

7.5 Critical comments


Slide 35
• The iterates from Newton’s method are equally attracted to local min and

local max.

• We do not know β, h, L in general.


• Note, however, that they are only used in the proof, not in the algorithm.
• We do not assume convexity, only that H(x∗ ) is nonsingular and not badly
behaved near x∗ .
Slide 36

7.6 Properties of Convergence


Proposition:

Let rk = ||xk − x∗|| and C = 3L/(2h). If ||x0 − x∗|| < γ, then

rk ≤ (1/C) (C r0)^(2^k)
Slide 37

Proposition:

If ||x0 − x∗|| > ε > 0, then ||xk − x∗|| < ε

∀ k ≥ log( log(1/(Cε)) / log(1/(Cr0)) ) / log 2

8 Solving systems of equations


Slide 38
g(x) = 0

g(xt+1 ) ≈ g(xt ) + ∇g(xt )(xt+1 − xt ) ⇒

xt+1 = xt − (∇g(xt ))−1 g(xt )

Application in optimization: g(x) = ∇f (x)

9 Modifications for global convergence


Slide 39
• Perform line search

• When Hessian is singular or near singular, use:


dk = −( ∇²f(xk) + Dk )⁻¹ ∇f(xk)

• D k is a diagonal matrix:

∇2 f (xk ) + Dk is PD

10 Summary
Slide 40
1. Line search methods:

Bisection Method.

Armijo’s Rule.

2. Newton’s method


3. Quadratic rate of convergence
4. Modification for global convergence

15.093 Optimization Methods

Lecture 20: The Conjugate Gradient Algorithm

Optimality conditions for constrained optimization

1 Outline
Slide 1
1. The Conjugate Gradient Algorithm
2. Necessary Optimality Conditions
3. Sufficient Optimality Conditions
4. Convex Optimization
5. Applications

2 The Conjugate Gradient Algorithm


2.1 Quadratic functions
Slide 2
min f(x) = (1/2) x′Qx + c′x

Definition: d1, . . . , dn are Q-conjugate if di ≠ 0 and d′i Q dj = 0 for i ≠ j.
Proposition: If d1, . . . , dn are Q-conjugate, then d1, . . . , dn are linearly independent.

2.2 Motivation
Slide 3
Given Q-conjugate d1, . . . , dn and xk, compute

min_α f(xk + αdk) = c′xk + αc′dk + (1/2)(xk + αdk)′Q(xk + αdk)
                  = f(xk) + α∇f(xk)′dk + (1/2) α² d′k Q dk

Solution:
α̂k = −∇f(xk)′dk / (d′k Q dk),    xk+1 = xk + α̂k dk

2.3 Expanding Subspace


Theorem
Slide 4
d1, . . . , dn are Q-conjugate. Then, xk+1 solves

min  f(x)
s.t. x = x1 + Σ_{j=1}^{k} αj dj

Moreover, xn+1 = x∗.

2.4 The Algorithm
Slide 5
Step 0 Given x1, set k := 1, d1 = −∇f(x1).
Step 1 For k = 1, . . . , n do:
   If ||∇f(xk)|| ≤ ε, stop; else:

   α̂k = argmin_α f(xk + αdk) = −∇f(xk)′dk / (d′k Q dk)

   xk+1 = xk + α̂k dk
   dk+1 = −∇f(xk+1) + λk dk,   λk = ∇f(xk+1)′Q dk / (d′k Q dk)
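
A minimal NumPy sketch of the algorithm above for a quadratic f(x) = (1/2)x′Qx + c′x follows; it is illustrative only, and the small Q and c used for the test are assumptions.

import numpy as np

def conjugate_gradient(Q, c, x, eps=1e-10):
    g = Q @ x + c                             # gradient of the quadratic
    d = -g
    for _ in range(len(c)):
        if np.linalg.norm(g) <= eps:
            break
        alpha = -(g @ d) / (d @ Q @ d)        # exact line search along d
        x = x + alpha * d
        g = Q @ x + c
        lam = (g @ Q @ d) / (d @ Q @ d)       # makes d_{k+1} Q-conjugate to d_k
        d = -g + lam * d
    return x

Q = np.array([[4.0, 1.0], [1.0, 3.0]])
c = np.array([-1.0, -2.0])
print(conjugate_gradient(Q, c, np.zeros(2)))  # compare with -np.linalg.solve(Q, c)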

2.5 Correctness
Slide 6
Theorem: The directions d1 , . . . , dn are Q-conjugate.

2.6 Convergence Properties


Slide 7
• This is a finite algorithm.
• If there are only k distinct eigenvalues of Q, the CGA finds an optimal

solution in k steps.

• Idea of pre-conditioning. Consider

  min f(Sx) = (Sx)′Q(Sx) + c′Sx

  so that the number of distinct eigenvalues of S′QS is small

2.7 Example
Slide 8
f(x) = (1/2) x′Qx − c′x

Q =
  35 19 22 28 16  3 16  6  4  4
  19 43 33 19  5  2  5  4  0  0
  22 33 40 29 12  7  6  2  2  4
  28 19 29 39 16  7 14  6  2  4
  16  5 12 16 12  4  8  2  4  8
   3  2  7  7  4  5  1  0  1  4
  16  5  6 14  8  1 12  2  2  4
   6  4  2  6  2  0  2  4  0  0
   4  0  2  2  4  1  2  0  2  4
   4  0  4  4  8  4  4  0  4 16

c = (−1, 0, 0, −3, 0, −2, 0, −6, −7, −4)′

κ(Q) ≈ 17,641,   δ = ( (κ(Q) − 1)/(κ(Q) + 1) )² = 0.999774
Slide 9

 k     f(xk)          f(xk) − f(x∗)    [f(xk) − f(x∗)] / [f(xk−1) − f(x∗)]
1 12.000000 2593.726852 1.000000
2 8.758578 2590.485430 0.998750
3 1.869218 2583.596069 0.997341
4 −12.777374 2568.949478 0.994331
5 −30.479483 2551.247369 0.993109
6 −187.804367 2393.922485 0.938334
7 −309.836907 2271.889945 0.949024
8 −408.590428 2173.136424 0.956532
9 −754.887518 1826.839334 0.840646
10 −2567.158421 14.568431 0.007975
11 −2581.711672 0.015180 0.001042
12 −2581.726852 −0.000000 −0.000000

2.8 General problems


Slide 10
Step 0 Given x1, set k := 1, d1 = −∇f(x1).
Step 1 If ||∇f(xk)|| ≤ ε, stop; else:

   α̂k = argmin_α f(xk + αdk)   (computed by a line search)

   xk+1 = xk + α̂k dk
   dk+1 = −∇f(xk+1) + λk dk,   λk = ||∇f(xk+1)||² / ||∇f(xk)||²
Step 2 k ← k + 1, go to Step 1

3 Necessary Optimality Conditions
3.1 Nonlinear Optimization
Slide 11
min f (x)
s.t. gj (x) ≤ 0, j = 1, . . . , p
hi (x) = 0, i = 1, . . . , m

P = {x| gj (x) ≤ 0, j = 1, . . . , p,
hi (x) = 0, i = 1, . . . , m}

3.2 The KKT conditions
Slide 12
Discovered by Karush, Kuhn, and Tucker in the 1950s.
Theorem
If
• x̄ is a local minimum of P
• I = {j | gj(x̄) = 0} is the set of tight constraints
• Constraint qualification condition (CQC): the vectors ∇gj(x̄), j ∈ I, and
  ∇hi(x̄), i = 1, . . . , m, are linearly independent
Slide 13
Then, there exist vectors (u, v):

1. ∇f(x̄) + Σ_{j=1}^{p} uj ∇gj(x̄) + Σ_{i=1}^{m} vi ∇hi(x̄) = 0

2. u ≥ 0
3. uj gj(x̄) = 0,  j = 1, . . . , p

3.3 Some Intuition from LO


Slide 14
Linearize the functions in the neighborhood of the candidate solution x̄. The problem becomes:

min  f(x̄) + ∇f(x̄)′(x − x̄)
s.t. gj(x̄) + ∇gj(x̄)′(x − x̄) ≤ 0,   j ∈ I
     hi(x̄) + ∇hi(x̄)′(x − x̄) = 0,   i = 1, . . . , m
Slide 15
This is an LO problem. Dual feasibility:

Σ_{j∈I} ûj ∇gj(x̄) + Σ_{i=1}^{m} v̂i ∇hi(x̄) = ∇f(x̄),   ûj ≤ 0

Change to uj = −ûj, vi = −v̂i to obtain:

∇f(x̄) + Σ_{j=1}^{p} uj ∇gj(x̄) + Σ_{i=1}^{m} vi ∇hi(x̄) = 0,   uj ≥ 0

3.4 Example 1
Slide 16
min f (x) = (x1 − 12)2 + (x2 + 6)2
s.t. h1 (x) = 8x1 + 4x2 − 20 = 0
g1 (x) = x21 + x22 + 3x1 − 4.5x2 − 6.5 ≤ 0
g2 (x) = (x1 − 9)2 + x22 − 64 ≤ 0

x̄ = (2, 1)′;   g1(x̄) = 0,  g2(x̄) = −14,  h1(x̄) = 0.
Slide 17

• I = {1}
• ∇f(x̄) = (−20, 14)′;   ∇g1(x̄) = (7, −2.5)′
• ∇g2(x̄) = (−14, 2)′;   ∇h1(x̄) = (8, 4)′
• u1 = 4, u2 = 0, v1 = −1
• ∇g1(x̄), ∇h1(x̄) linearly independent
• ∇f(x̄) + u1 ∇g1(x̄) + u2 ∇g2(x̄) + v1 ∇h1(x̄) = 0:

  (−20, 14)′ + 4 (7, −2.5)′ + 0 (−14, 2)′ + (−1)(8, 4)′ = (0, 0)′
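
The identity above can be verified numerically; the sketch below is illustrative only.

import numpy as np

x = np.array([2.0, 1.0])
grad_f  = np.array([2 * (x[0] - 12), 2 * (x[1] + 6)])        # (-20, 14)
grad_g1 = np.array([2 * x[0] + 3, 2 * x[1] - 4.5])           # (7, -2.5)
grad_g2 = np.array([2 * (x[0] - 9), 2 * x[1]])               # (-14, 2)
grad_h1 = np.array([8.0, 4.0])
u1, u2, v1 = 4.0, 0.0, -1.0
print(grad_f + u1 * grad_g1 + u2 * grad_g2 + v1 * grad_h1)   # -> [0. 0.]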

3.5 Example 2
Slide 18
max x′Qx
s.t. x′x ≤ 1
Q arbitrary; not a convex optimization problem.

min −x′Qx
s.t. x′x ≤ 1

3.5.1 KKT
Slide 19
−2Qx + 2ux = 0
x′x ≤ 1
u ≥ 0
u(1 − x′x) = 0
Slide 20

3.5.2 Solutions of KKT


• x = 0, u = 0, Obj = 0.
• x ≠ 0 ⇒ Qx = ux ⇒ x is an eigenvector of Q with nonnegative eigenvalue u.
• x′Qx = u x′x = u.
• Thus, pick the largest nonnegative eigenvalue û of Q. The solution is the
  corresponding eigenvector x̂ normalized such that x̂′x̂ = 1. If all
  eigenvalues are negative, x̂ = 0.
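
This eigenvalue characterization translates into a few lines of NumPy; the sketch below is illustrative, and the particular Q is an arbitrary assumption.

import numpy as np

Q = np.array([[1.0, 2.0], [2.0, -3.0]])   # arbitrary symmetric example
w, V = np.linalg.eigh(Q)                  # ascending eigenvalues, orthonormal eigenvectors
if w[-1] <= 0:
    x_hat, opt = np.zeros(len(Q)), 0.0
else:
    x_hat, opt = V[:, -1], w[-1]          # unit eigenvector of the largest eigenvalue
print(opt, x_hat @ Q @ x_hat)             # the two numbers agree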

3.6 Are CQC Necessary?
Slide 21
min x1
s.t. x21 − x2 ≤ 0
x2 = 0
The feasible set is {(0, 0)}.
KKT:

(1, 0)′ + u (2x1, −1)′ + v (0, 1)′ = (0, 0)′

At (0, 0) the first component reads 1 = 0, so KKT multipliers do not exist, while (0, 0)′
is still a local minimum. Check ∇g1(0, 0) and ∇h1(0, 0): they are linearly dependent.

3.7 Constraint Qualification


Slide 22
Slater condition: There exists an x0 such that gj (x0 ) < 0, j = 1, . . . , p, and
hi (x0 ) = 0 for all i = 1, . . . , m.

Theorem Under the Slater condition the KKT conditions are necessary.

4 Sufficient Optimality Conditions
Slide 23
Theorem If
• x̄ is feasible for P
• the feasible set of P is convex and f(x) is convex
• there exist vectors (u, v), u ≥ 0:

  ∇f(x̄) + Σ_{j=1}^{p} uj ∇gj(x̄) + Σ_{i=1}^{m} vi ∇hi(x̄) = 0

  uj gj(x̄) = 0,  j = 1, . . . , p

Then, x̄ is a global minimum of P.

4.1 Proof
Slide 24
• Let x ∈ P. Then (1 − λ)x̄ + λx ∈ P for λ ∈ [0, 1].
• For j with uj > 0 we have gj(x̄) = 0 and gj(x̄ + λ(x − x̄)) ≤ 0, so

  ∇gj(x̄)′(x − x̄) ≤ 0

• Similarly, hi(x̄ + λ(x − x̄)) = 0 ⇒

  ∇hi(x̄)′(x − x̄) = 0

Slide 25
• Thus,

  ∇f(x̄)′(x − x̄) = −( Σ_{j=1}^{p} uj ∇gj(x̄) + Σ_{i=1}^{m} vi ∇hi(x̄) )′ (x − x̄) ≥ 0

  and, since f is convex, ⇒ f(x) ≥ f(x̄).

5 Convex Optimization
Slide 26
• The KKT conditions are always necessary under CQC.
• The KKT conditions are sufficient for convex optimization problems.
• The KKT conditions are necessary and sufficient for convex optimization

problems under CQC.

• min f (x) s.t. Ax = b, x ≥ 0, f (x) convex, KKT are necessary and

sufficient even without CQC.

5.0.1 Separating hyperplanes


Slide 27
Theorem Let S be a nonempty closed convex subset of ℜn and let x∗ ∈ ℜn be
a vector that does not belong to S. Then, there exists some vector c ∈ ℜn such
that c′x∗ < c′x for all x ∈ S.
Proof in BT, p. 170.

5.1 Sketch of the Proof under convexity


Slide 28
• Suppose x̄ is a local (and thus global) optimal solution.
• The system f(x) < f(x̄), gj(x) ≤ 0, j = 1, . . . , p, hi(x) = 0, i = 1, . . . , m is infeasible.
• Let U = {(u0, u, v) | there exists x: f(x) < u0, gj(x) ≤ uj, hi(x) = vi}.
• (f(x̄), 0, 0) ∉ U.
• U is convex.
Slide 29
• By the separating hyperplane theorem, there is a vector (c0, c, d):

  c0 u0 + Σ_{j=1}^{p} cj uj + Σ_{i=1}^{m} di vi > c0 f(x̄)    ∀ (u0, u, v) ∈ U.

• c0 ≥ 0 and cj ≥ 0 for j ∈ I (constraint gj(x̄) ≤ 0 tight). Why?
  If (u0, u, v) ∈ U, then (u0 + λ, u, v) ∈ U for λ ≥ 0. Thus,

  ∀ λ ≥ 0:  c0(u0 + λ) + Σ_{j=1}^{p} cj uj + Σ_{i=1}^{m} di vi > c0 f(x̄)  ⇒  c0 ≥ 0.

• Select (u0, u, v) = (f(x) + λ, g1(x), . . . , gp(x), h1(x), . . . , hm(x)) ∈ U:

  c0 (f(x) + λ) + Σ_{j=1}^{p} cj gj(x) + Σ_{i=1}^{m} di hi(x) > c0 f(x̄)

• Take λ → 0:

  c0 f(x) + Σ_{j=1}^{p} cj gj(x) + Σ_{i=1}^{m} di hi(x) ≥ c0 f(x̄)

• c0 > 0 (the constraint qualification is needed here).
• Dividing by c0 and setting uj = cj/c0, vi = di/c0:

  f(x) + Σ_{j=1}^{p} uj gj(x) + Σ_{i=1}^{m} vi hi(x) ≥ f(x̄),   uj ≥ 0

  f(x̄) + Σ_{j=1}^{p} uj gj(x̄) + Σ_{i=1}^{m} vi hi(x̄) ≤ f(x̄)
        ≤ f(x) + Σ_{j=1}^{p} uj gj(x) + Σ_{i=1}^{m} vi hi(x)

• Thus,

  f(x̄) = min_x ( f(x) + Σ_{j=1}^{p} uj gj(x) + Σ_{i=1}^{m} vi hi(x) )

  Σ_{j=1}^{p} uj gj(x̄) = 0  ⇒  uj gj(x̄) = 0

• Unconstrained optimality conditions:

  ∇f(x̄) + Σ_{j=1}^{p} uj ∇gj(x̄) + Σ_{i=1}^{m} vi ∇hi(x̄) = 0

6 Applications
6.1 Linear Optimization
Slide 30
min  c′x
s.t. Ax = b
     x ≥ 0

Rewritten as:
min  c′x
s.t. Ax − b = 0
     −x ≤ 0

6.1.1 KKT
Slide 31
c + A′û − v = 0
v ≥ 0
vj xj = 0
Ax − b = 0
x ≥ 0
With u = −û:
A′u ≤ c               dual feasibility
(cj − A′j u) xj = 0    complementarity
Ax − b = 0            primal feasibility
x ≥ 0                 primal feasibility

6.2 Portfolio Optimization
Slide 32
x = weights of the portfolio

max  r′x − (λ/2) x′Qx
s.t. e′x = 1

min  −r′x + (λ/2) x′Qx
s.t. e′x = 1

6.2.1 KKT
Slide 33
−r + λQx + ue = 0
x = (1/λ) Q⁻¹(r − ue)
e′x = 1 ⇒ e′Q⁻¹(r − ue) = λ
u = (e′Q⁻¹r − λ) / (e′Q⁻¹e)
As λ changes, the tradeoff of risk and return changes, and the allocation changes as
well. This is the essence of modern portfolio theory.
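
The closed-form KKT solution can be evaluated directly; the sketch below is illustrative only, and the return vector r and covariance Q are assumed data.

import numpy as np

r = np.array([0.10, 0.08, 0.05])                      # expected returns (assumed)
Q = np.array([[0.09, 0.01, 0.00],
              [0.01, 0.04, 0.00],
              [0.00, 0.00, 0.01]])                    # covariance matrix (assumed)
e = np.ones(3)

def portfolio(lam):
    Qinv_r, Qinv_e = np.linalg.solve(Q, r), np.linalg.solve(Q, e)
    u = (e @ Qinv_r - lam) / (e @ Qinv_e)             # multiplier of e'x = 1
    return (Qinv_r - u * Qinv_e) / lam                # x = (1/λ) Q^{-1}(r - u e)

for lam in (0.5, 2.0, 10.0):
    x = portfolio(lam)
    print(lam, x.round(3), round(x.sum(), 6))         # weights always sum to 1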

15.093 Optimization Methods

Lecture 21: The Affine Scaling Algorithm


1 Outline
Slide 1
• History
• Geometric intuition
• Algebraic development
• Affine Scaling
• Convergence
• Initialization
• Practical performance

2 History
Slide 2
• In 1984, Karmarkar at AT&T “invented” the interior point method
• In 1985, affine scaling was “invented” at IBM + AT&T as an intuitive version of
  Karmarkar’s algorithm
• In early computational tests, A.S. far outperformed simplex and Karmarkar’s
  algorithm
• In 1989, it was realised that Dikin had invented A.S. in 1967

3 Geometric intuition
3.1 Notation
Slide 3
min  c′x
s.t. Ax = b
     x ≥ 0
and its dual
max  p′b
s.t. p′A ≤ c′

• P = {x | Ax = b, x ≥ 0}
• {x ∈ P | x > 0} the interior of P and its elements interior points

3.2 The idea

Slide 4
[Figure: from an interior point x0, move to improved interior points x1, x2, . . . in a
direction of decreasing cost c′x.]
4 Algebraic development
4.1 Theorem
Slide 5
β ∈ (0, 1), y ∈ ℜn with y > 0, and

S = { x ∈ ℜn | Σ_{i=1}^{n} (xi − yi)² / yi² ≤ β² }.

Then, x > 0 for every x ∈ S.

Proof
• x ∈ S
• (xi − yi)² ≤ β² yi² < yi²
• |xi − yi| < yi;  −xi + yi < yi, and hence xi > 0
Slide 6
x ∈ S is equivalent to ||Y⁻¹(x − y)|| ≤ β
Replace the original LP by:
min  c′x
s.t. Ax = b
     ||Y⁻¹(x − y)|| ≤ β.

With d = x − y:
min  c′d
s.t. Ad = 0
     ||Y⁻¹d|| ≤ β

4.2 Solution
Slide 7
If rows of A are linearly independent and c is not a linear combination of the
rows of A, then
• the optimal solution is

  d∗ = −β Y²(c − A′p) / ||Y(c − A′p)||,    p = (AY²A′)⁻¹ AY²c.

• x = y + d∗ ∈ P
• c′x = c′y − β ||Y(c − A′p)|| < c′y

4.2.1 Proof
Slide 8
• AY²A′ is invertible; if not, there exists some z ≠ 0 such that z′AY²A′z = 0.
• Let w = YA′z; then w′w = 0 ⇒ w = 0.
• Hence A′z = 0, a contradiction.
• Since c is not a linear combination of the rows of A, c − A′p ≠ 0 and d∗ is well
  defined.
• d∗ is feasible:

  Y⁻¹d∗ = −β Y(c − A′p) / ||Y(c − A′p)||  ⇒  ||Y⁻¹d∗|| = β
  Ad∗ = 0,  since AY²(c − A′p) = 0

• For any feasible d:

  c′d = (c′ − p′A)d = (c′ − p′A)YY⁻¹d ≥ −||Y(c − A′p)|| · ||Y⁻¹d|| ≥ −β ||Y(c − A′p)||.
Slide 9

  c′d∗ = (c′ − p′A)d∗
       = −(c′ − p′A) β Y²(c − A′p) / ||Y(c − A′p)||
       = −β (Y(c − A′p))′(Y(c − A′p)) / ||Y(c − A′p)||
       = −β ||Y(c − A′p)||.

• c′x = c′y + c′d∗ = c′y − β ||Y(c − A′p)||

4.3 Interpretation
Slide 10
• Let y be a nondegenerate BFS with basis B
• A = [B N]
• Y = diag(y1, . . . , ym, 0, . . . , 0) and Y0 = diag(y1, . . . , ym); then AY = [BY0 0]

  p = (AY²A′)⁻¹ AY²c = (B′)⁻¹ Y0⁻² B⁻¹ B Y0² cB = (B′)⁻¹ cB

• The vectors p are dual estimates
• r = c − A′p becomes the vector of reduced costs:

  r = c − A′(B′)⁻¹ cB

• Under degeneracy?

4.4 Termination
Slide 11
Let y and p be primal and dual feasible solutions with

c′y − b′p < ε

and let y∗ and p∗ be optimal primal and dual solutions. Then,

c′y∗ ≤ c′y < c′y∗ + ε,
b′p∗ − ε < b′p ≤ b′p∗

4.4.1 Proof
Slide 12
• c′y∗ ≤ c′y
• By weak duality, b′p ≤ c′y∗
• Since c′y − b′p < ε,
  c′y < b′p + ε ≤ c′y∗ + ε
  b′p∗ = c′y∗ ≤ c′y < b′p + ε

5 Affine Scaling
5.1 Inputs
Slide 13
• (A, b, c);
• an initial primal feasible solution x0 > 0
• the optimality tolerance � > 0
• the parameter β ∈ (0, 1)

5.2 The Algorithm


Slide 14
1. (Initialization) Start with some feasible x0 > 0; let k = 0.
2. (Computation of dual estimates and reduced costs) Given some feasible

xk > 0, let

   Xk = diag(xk1, . . . , xkn),
   pk = (A Xk² A′)⁻¹ A Xk² c,
   rk = c − A′pk.

3. (Optimality check) Let e = (1, 1, . . . , 1). If rk ≥ 0 and e′Xk rk < ε, then
   stop; the current solution xk is primal ε-optimal and pk is dual ε-optimal.

4. (Unboundedness check) If −Xk² rk ≥ 0 then stop; the optimal cost is −∞.

5. (Update of primal solution) Let

   xk+1 = xk − β Xk² rk / ||Xk rk||.
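
A compact NumPy sketch of this iteration, applied to the example of Section 8 after adding slack variables, is given below; it is illustrative only and omits the safeguards of a real implementation.

import numpy as np

# min -x1 - 2x2  s.t.  x1 + x2 + x3 = 2,  -x1 + x2 + x4 = 1,  x >= 0
A = np.array([[1.0, 1.0, 1.0, 0.0],
              [-1.0, 1.0, 0.0, 1.0]])
b = np.array([2.0, 1.0])
c = np.array([-1.0, -2.0, 0.0, 0.0])

x, beta = np.array([0.5, 0.5, 1.0, 1.0]), 0.5          # strictly feasible starting point
for k in range(100):
    X2 = np.diag(x ** 2)
    p = np.linalg.solve(A @ X2 @ A.T, A @ X2 @ c)      # dual estimates
    r = c - A.T @ p                                    # reduced costs
    if np.all(r >= 0) and x @ r < 1e-9:                # e'X_k r_k = x @ r
        break
    x = x - beta * (X2 @ r) / np.linalg.norm(x * r)    # scaled step
print(x.round(4), c @ x)                               # approaches (0.5, 1.5, 0, 0), cost -3.5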

5.3 Variants
Slide 15
• ||u||∞ = maxi |ui|,   γ(u) = max{ui | ui > 0}
• γ(u) ≤ ||u||∞ ≤ ||u||
• Short-step method.
• Long-step variants:

  xk+1 = xk − β Xk² rk / ||Xk rk||∞
  xk+1 = xk − β Xk² rk / γ(Xk rk)

6 Convergence
6.1 Assumptions
Slide 16
Assumptions A:
(a) The rows of the matrix A are linearly independent.
(b) The vector c is not a linear combination of the rows of A.
(c) There exists an optimal solution.
(d) There exists a positive feasible solution.
Assumptions B:
(a) Every BFS to the primal problem is nondegenerate.
(b) At every BFS to the primal problem, the reduced cost of every nonbasic
variable is nonzero.

6.2 Theorem
Slide 17
If we apply the long-step affine scaling algorithm with � = 0, the following hold:
(a) For the Long-step variant and under Assumptions A and B, and if 0 < β < 1,

xk and pk converge to the optimal primal and dual solutions

(b) For the second Long-step variant, and under Assumption A and if 0 < β <
2/3, the sequences xk and pk converge to some primal and dual optimal solutions,
respectively

7 Initialization
Slide 18
min  c′x + M xn+1
s.t. Ax + (b − Ae) xn+1 = b
     x, xn+1 ≥ 0

8 Example
Slide 19
max x1 + 2x2
s.t. x1 + x2 ≤ 2
−x1 + x2 ≤ 1
x1 , x2 ≥0

9 Practical Performance
Slide 20
• Excellent practical performance, simple
• Major step: invert A Xk² A′
• Imitates the simplex method near the boundary

15.093J Optimization Methods

Lecture 22: Barrier Interior Point Algorithms


1 Outline
Slide 1
1. Barrier Methods
2. The Central Path
3. Approximating the Central Path
4. The Primal Barrier Algorithm
5. The Primal-Dual Barrier Algorithm
6. Computational Aspects of IPMs

2 Barrier methods
Slide 2
min f (x)
s.t. gj (x) ≤ 0, j = 1, . . . , p
hi (x) = 0, i = 1, . . . , m

S = {x| gj (x) < 0, j = 1, . . . , p,


hi (x) = 0, i = 1, . . . , m}

2.1 Strategy
Slide 3
• A barrier function G(x) is a continuous function with the property that it
  approaches ∞ as one of the gj(x) approaches 0 from negative values.

• Examples:

  G(x) = − Σ_{j=1}^{p} log(−gj(x)),    G(x) = − Σ_{j=1}^{p} 1/gj(x)

Slide 4
• Consider a sequence of µk : 0 < µk+1 < µk and µk → 0.
• Consider the problem

xk = argmin_{x∈S} { f(x) + µk G(x) }

• Theorem Every limit point of a sequence {xk} generated by a barrier method is a
  global minimum of the original constrained problem.

[Figure: the central path, running from the analytic center through x(10), x(1),
x(0.1), x(0.01) toward an optimal solution x∗, with cost vector c.]

2.2 Primal path-following


IPMs for LO
Slide 5
(P) min  c′x              (D) max  b′p
    s.t. Ax = b               s.t. A′p + s = c
         x ≥ 0                     s ≥ 0

Barrier problem:

min  Bµ(x) = c′x − µ Σ_{j=1}^{n} log xj
s.t. Ax = b

Minimizer: x(µ)

3 Central Path
Slide 6
• As µ varies, minimizers x(µ) form the central path
• limµ→0 x(µ) exists and is an optimal solution x∗ to the initial LP
• For µ = ∞, x(∞) is called the analytic center
  min  − Σ_{j=1}^{n} log xj
  s.t. Ax = b
Slide 7

[Figure: the simplex P with analytic center (1/3, 1/3, 1/3), the optimal face Q with
analytic center (1/2, 0, 1/2), and the central path joining them (axes x1, x2, x3).]


• Q = {x | x = (x1, 0, x3), x1 + x3 = 1, x ≥ 0}, the set of optimal solutions to the
  original LP
• The analytic center of Q is (1/2, 0, 1/2)

3.1 Example
Slide 8
min  x2
s.t. x1 + x2 + x3 = 1
     x1, x2, x3 ≥ 0

min  x2 − µ log x1 − µ log x2 − µ log x3
s.t. x1 + x2 + x3 = 1

min  x2 − µ log x1 − µ log x2 − µ log(1 − x1 − x2)

x1(µ) = (1 − x2(µ))/2
x2(µ) = ( 1 + 3µ − √(1 + 9µ² + 2µ) ) / 2
x3(µ) = (1 − x2(µ))/2

The analytic center: (1/3, 1/3, 1/3)
Slide 9
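
The closed-form central path of this example can be evaluated directly; the sketch below is illustrative only.

import numpy as np

for mu in (10.0, 1.0, 0.1, 0.01, 0.001):
    x2 = (1 + 3 * mu - np.sqrt(1 + 9 * mu ** 2 + 2 * mu)) / 2
    x1 = x3 = (1 - x2) / 2
    print(f"mu={mu:>6}:  x(mu) = ({x1:.4f}, {x2:.4f}, {x3:.4f})")
# As mu grows the point tends to the analytic center (1/3, 1/3, 1/3);
# as mu -> 0 it tends to an optimal solution, where x2 = 0.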

3.2 Solution of Central Path


Slide 10
• Barrier problem for the dual:

  max  p′b + µ Σ_{j=1}^{n} log sj
  s.t. p′A + s′ = c′

• Solution (KKT):

  Ax(µ) = b
  x(µ) ≥ 0
  A′p(µ) + s(µ) = c
  s(µ) ≥ 0
  X(µ)S(µ)e = eµ
Slide 11
• Theorem: If x∗, p∗, and s∗ satisfy the optimality conditions, then they are
  optimal solutions to the primal and dual barrier problems.
• Goal: Solve the barrier problem

  min  Bµ(x) = c′x − µ Σ_{j=1}^{n} log xj
  s.t. Ax = b

4 Approximating the central path


Slide 12
∂Bµ(x)/∂xi = ci − µ/xi
∂²Bµ(x)/∂xi² = µ/xi²
∂²Bµ(x)/∂xi∂xj = 0,  i ≠ j
Given a vector x > 0:
Slide 13

Bµ(x + d) ≈ Bµ(x) + Σ_{i=1}^{n} (∂Bµ(x)/∂xi) di + (1/2) Σ_{i,j=1}^{n} (∂²Bµ(x)/∂xi∂xj) di dj
          = Bµ(x) + (c′ − µ e′X⁻¹) d + (1/2) µ d′X⁻²d

X = diag(x1, . . . , xn)
Slide 14
Approximating problem:

min  (c′ − µ e′X⁻¹) d + (1/2) µ d′X⁻²d
s.t. Ad = 0

Solution (from Lagrange):

c − µX⁻¹e + µX⁻²d − A′p = 0
Ad = 0
Slide 15


• System of m + n linear equations, with m + n unknowns (dj, j = 1, . . . , n,
  and pi, i = 1, . . . , m).
• Solution:

  d(µ) = ( I − X²A′(AX²A′)⁻¹A ) ( Xe − (1/µ) X²c )
  p(µ) = (AX²A′)⁻¹ A ( X²c − µXe )

4.1 The Newton connection


Slide 16
• d(µ) is the Newton direction; the process of calculating this direction is called
  a Newton step
• Starting with x, the new primal solution is x + d(µ)
• The corresponding dual solution becomes (p, s) = (p(µ), c − A′p(µ))
• We then decrease µ to µ̄ = αµ, 0 < α < 1

4.2 Geometric Interpretation


Slide 17
• Take one Newton step so that x is close to x(µ)
• Measure of closeness:

  || (1/µ) XSe − e || ≤ β,

  0 < β < 1,  X = diag(x1, . . . , xn),  S = diag(s1, . . . , sn)
• As µ → 0, the complementarity slackness condition will be satisfied

Slide 18

5 The Primal Barrier Algorithm
Slide 19
Input
(a) (A, b, c); A has full row rank;
(b) x0 > 0, s0 > 0, p0 ;
(c) optimality tolerance ε > 0;
(d) µ0 , and α, where 0 < α < 1. Slide 20
1. (Initialization) Start with some primal and dual feasible x0 > 0, s0 >

0, p0 , and set k = 0.

2. (Optimality test) If (sk)′xk < ε stop; else go to Step 3.


3. Let

X k = diag(xk1 , . . . , xkn ),
µk+1 = αµk

Slide 21
4. (Computation of directions) Solve the linear system

   µk+1 Xk⁻² d − A′p = µk+1 Xk⁻¹ e − c
   Ad = 0

5. (Update of solutions) Let

   xk+1 = xk + d,
   pk+1 = p,
   sk+1 = c − A′p.

6. Let k := k + 1 and go to Step 2.

5.1 Correctness
Slide 22
Theorem Given α = 1 − (√β − β)/(√β + √n), β < 1, and (x0, s0, p0) with x0 > 0, s0 > 0 and

|| (1/µ0) X0 S0 e − e || ≤ β.

Then, after

K = ( (√β + √n)/(√β − β) ) log( (s0)′x0 (1 + β) / (ε(1 − β)) )

iterations, (xK, sK, pK) is found with

(sK)′xK ≤ ε.

5.2 Complexity
Slide 23
• Work per iteration involves solving a linear system with m + n equations

in m + n unknowns. Given that m ≤ n, the work per iteration is O(n3 ).

• ε0 = (s0)′x0: initial duality gap. The algorithm needs

  O( √n log(ε0/ε) )

  iterations to reduce the duality gap from ε0 to ε, with O(n³) arithmetic
  operations per iteration.

6 The Primal-Dual Barrier Algorithm


6.1 Optimality Conditions
Slide 24
Ax(µ) = b
x(µ) ≥ 0
A′p(µ) + s(µ) = c
s(µ) ≥ 0
sj(µ) xj(µ) = µ,   or   X(µ)S(µ)e = eµ

X(µ) = diag(x1(µ), . . . , xn(µ)),   S(µ) = diag(s1(µ), . . . , sn(µ))

6.2 Solving Equations


Slide 25
F(z) = [ Ax − b ;  A′p + s − c ;  XSe − µe ]

z = (x, p, s),   r = 2n + m
Solve
F(z∗) = 0

6.2.1 Newton’s method


Slide 26
F (z k + d) ≈ F (z k ) + J (z k )d
Here J (z k ) is the r × r Jacobian matrix whose (i, j)th element is given by

∂Fi(z)/∂zj  evaluated at z = zk

F (z k ) + J (z k )d = 0
Set z k+1 = z k + d (d is the Newton direction) Slide 27

(xk , pk , sk ) current primal and dual feasible solution
Newton direction d = (dkx , dkp , dks )

⎡ A   0   0  ⎤ ⎡ dkx ⎤     ⎡ Axk − b        ⎤
⎢ 0   A′  I  ⎥ ⎢ dkp ⎥ = − ⎢ A′pk + sk − c  ⎥
⎣ Sk  0   Xk ⎦ ⎣ dks ⎦     ⎣ Xk Sk e − µk e ⎦

6.2.2 Step lengths


Slide 28
xk+1 = xk + βPk dkx
pk+1 = pk + βDk dkp
sk+1 = sk + βDk dks

To preserve nonnegativity, take

βPk = min{ 1, α min_{i:(dkx)i<0} ( −xki / (dkx)i ) },
βDk = min{ 1, α min_{i:(dks)i<0} ( −ski / (dks)i ) },

0 < α < 1

6.3 The Algorithm


Slide 29
1. (Initialization) Start with x0 > 0, s0 > 0, p0 , and set k = 0
2. (Optimality test) If (sk)′xk < ε stop; else go to Step 3.
3. (Computation of Newton directions)

   µk = (sk)′xk / n
   Xk = diag(xk1, . . . , xkn)
   Sk = diag(sk1, . . . , skn)

   Solve the linear system

   ⎡ A   0   0  ⎤ ⎡ dkx ⎤     ⎡ Axk − b        ⎤
   ⎢ 0   A′  I  ⎥ ⎢ dkp ⎥ = − ⎢ A′pk + sk − c  ⎥
   ⎣ Sk  0   Xk ⎦ ⎣ dks ⎦     ⎣ Xk Sk e − µk e ⎦
Slide 30

4. (Find step lengths)

   βPk = min{ 1, α min_{i:(dkx)i<0} ( −xki / (dkx)i ) }
   βDk = min{ 1, α min_{i:(dks)i<0} ( −ski / (dks)i ) }

5. (Solution update)

   xk+1 = xk + βPk dkx
   pk+1 = pk + βDk dkp
   sk+1 = sk + βDk dks

6. Let k := k + 1 and go to Step 2
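
A minimal NumPy sketch of this loop is shown below. It is illustrative only: the data and the strictly feasible starting point are assumptions, and the complementarity target is scaled by a factor σ < 1 at each iteration (a standard refinement not spelled out on the slide) so that the duality gap actually decreases.

import numpy as np

# min x1 + 2x2 + 0*x3  s.t.  x1 + x2 + x3 = 2, x >= 0   (assumed data, m = 1)
A = np.array([[1.0, 1.0, 1.0]]); b = np.array([2.0]); c = np.array([1.0, 2.0, 0.0])
x = np.array([0.5, 0.5, 1.0]); p = np.array([-1.0]); s = c - A.T @ p   # x > 0, s > 0
n, alpha, sigma = len(x), 0.9, 0.2

def steplen(v, dv):
    neg = dv < 0
    return min(1.0, alpha * np.min(-v[neg] / dv[neg])) if np.any(neg) else 1.0

for k in range(25):
    mu = sigma * (s @ x) / n                           # reduced target (sigma < 1: assumption)
    J = np.block([[A,                np.zeros((1, 1)), np.zeros((1, n))],
                  [np.zeros((n, n)), A.T,              np.eye(n)       ],
                  [np.diag(s),       np.zeros((n, 1)), np.diag(x)      ]])
    rhs = -np.concatenate([A @ x - b, A.T @ p + s - c, x * s - mu])
    d = np.linalg.solve(J, rhs)
    dx, dp, ds = d[:n], d[n:n + 1], d[n + 1:]
    bP, bD = steplen(x, dx), steplen(s, ds)
    x, p, s = x + bP * dx, p + bD * dp, s + bD * ds
print(x.round(4), round(s @ x, 8))                     # x approaches (0, 0, 2); gap -> 0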

6.4 Insight on behavior


Slide 31
• Affine scaling:

  d_affine = −X² ( I − A′(AX²A′)⁻¹AX² ) c

• Primal barrier:

  d_primal-barrier = ( I − X²A′(AX²A′)⁻¹A ) ( Xe − (1/µ) X²c )

• For µ = ∞:

  d_centering = ( I − X²A′(AX²A′)⁻¹A ) Xe

• Note that

  d_primal-barrier = d_centering + (1/µ) d_affine

• When µ is large, the centering direction dominates, i.e., in the beginning the
  barrier algorithm takes steps towards the analytic center
• When µ is small, the affine scaling direction dominates, i.e., towards the end the
  barrier algorithm behaves like the affine scaling algorithm

7 Computational aspects of IPMs


Slide 32
Simplex vs. Interior point methods (IPMs)
• Simplex method tends to perform poorly on large, massively degenerate

problems, whereas IP methods are much less affected.

• Key step in IPMs: solve

  (A Xk² A′) d = f

• In implementations of IPMs, A Xk² A′ is usually written as

  A Xk² A′ = LL′,

  where L is a square lower triangular matrix called the Cholesky factor
• Solve the system (A Xk² A′) d = f by solving the triangular systems

  Ly = f,   L′d = y

• The construction of L requires O(n³) operations, but the actual computational
  effort is highly dependent on the sparsity (number of nonzero entries) of L
• Large scale implementations employ heuristics (reorder rows and columns
  of A) to improve the sparsity of L. If L is sparse, IPMs are much faster.
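
The Cholesky-based solve described above looks as follows in NumPy/SciPy; the sketch is illustrative only, with assumed data.

import numpy as np
from scipy.linalg import cho_factor, cho_solve

A = np.array([[1.0, 1.0, 1.0, 0.0], [-1.0, 1.0, 0.0, 1.0]])
x = np.array([0.5, 1.5, 0.7, 0.3])
f = np.array([1.0, -2.0])

M = A @ np.diag(x ** 2) @ A.T          # the matrix A X_k^2 A'
d = cho_solve(cho_factor(M), f)        # factor M = LL' and solve the two triangular systems
print(np.allclose(M @ d, f))           # True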

8 Conclusions
Slide 33
• IPMs represent the present and future of Optimization.
• Very successful in solving very large problems.
• Extend to general convex problems

15.093 Optimization Methods

Lecture 23: Semidefinite Optimization


1 Outline
Slide 1
1. Preliminaries
2. SDO
3. Duality
4. SDO Modeling Power

2 Preliminaries
Slide 2
• A symmetric matrix A is positive semidefinite (A ⪰ 0) if and only if

  u′Au ≥ 0   ∀ u ∈ Rn

• A ⪰ 0 if and only if all eigenvalues of A are nonnegative

• Inner product: A • B = Σ_{i=1}^{n} Σ_{j=1}^{n} Aij Bij

2.1 The trace


Slide 3
• The trace of a matrix A is defined as

  trace(A) = Σ_{j=1}^{n} Ajj

• trace(AB) = trace(BA)
• A • B = trace(A′B) = trace(B′A)
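
These identities are easy to confirm numerically; the sketch below is illustrative only.

import numpy as np

rng = np.random.default_rng(0)
A, B = rng.standard_normal((3, 3)), rng.standard_normal((3, 3))
print(np.sum(A * B), np.trace(A.T @ B), np.trace(B.T @ A))   # all equal (A • B)
print(np.trace(A @ B), np.trace(B @ A))                      # equal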

3 SDO
Slide 4
• C symmetric n × n matrix
• Ai , i = 1, . . . , m symmetric n × n matrices
• bi , i = 1, . . . , m scalars
• Semidefinite optimization problem (SDO)

(P ) : min C • X
s.t. Ai • X = bi i = 1, . . . , m
X ⪰ 0

3.1 Example
Slide 5
n = 3 and m = 2
A1 = [1 0 1; 0 3 7; 1 7 5],   A2 = [0 2 8; 2 6 0; 8 0 4],   C = [1 2 3; 2 9 0; 3 0 7]

b1 = 11,  b2 = 19

X = [x11 x12 x13; x21 x22 x23; x31 x32 x33]
Slide 6

(P): min  x11 + 4x12 + 6x13 + 9x22 + 7x33
     s.t. x11 + 2x13 + 3x22 + 14x23 + 5x33 = 11
          4x12 + 16x13 + 6x22 + 4x33 = 19
          X = [x11 x12 x13; x21 x22 x23; x31 x32 x33] ⪰ 0

3.2 Convexity
Slide 7
(P): min  C • X
     s.t. Ai • X = bi,  i = 1, . . . , m
          X ⪰ 0
The feasible set is convex:

X1, X2 feasible ⇒ λX1 + (1 − λ)X2 feasible,  0 ≤ λ ≤ 1

Ai • (λX1 + (1 − λ)X2) = λ (Ai • X1) + (1 − λ)(Ai • X2) = λbi + (1 − λ)bi = bi

u′(λX1 + (1 − λ)X2)u = λ u′X1u + (1 − λ) u′X2u ≥ 0

3.3 LO as SDO
Slide 8
LO: min  c′x
    s.t. Ax = b
         x ≥ 0

Ai = diag(ai1, ai2, . . . , ain),   C = diag(c1, c2, . . . , cn)
Slide 9

(P): min  C • X
     s.t. Ai • X = bi,  i = 1, . . . , m
          Xij = 0,  i = 1, . . . , n,  j = i + 1, . . . , n
          X ⪰ 0

X = diag(x1, x2, . . . , xn)

4 Duality
Slide 10
(D): max  Σ_{i=1}^{m} yi bi
     s.t. Σ_{i=1}^{m} yi Ai + S = C
          S ⪰ 0

Equivalently,

(D): max  Σ_{i=1}^{m} yi bi
     s.t. C − Σ_{i=1}^{m} yi Ai ⪰ 0

4.1 Example
Slide 11
(D) max  11y1 + 19y2
    s.t. y1 [1 0 1; 0 3 7; 1 7 5] + y2 [0 2 8; 2 6 0; 8 0 4] + S = [1 2 3; 2 9 0; 3 0 7]
         S ⪰ 0

(D) max  11y1 + 19y2
    s.t. [ 1 − y1          2 − 2y2          3 − y1 − 8y2 ;
           2 − 2y2         9 − 3y1 − 6y2    −7y1         ;
           3 − y1 − 8y2    −7y1             7 − 5y1 − 4y2 ] ⪰ 0
Slide 12
[Figure: the dual feasible region in the (y1, y2) plane.]

Optimal value ≈ 13.9022

y1∗ ≈ 0.4847,  y2∗ ≈ 0.4511
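
The quoted solution can be checked numerically; the sketch below is illustrative only.

import numpy as np

A1 = np.array([[1.0, 0, 1], [0, 3, 7], [1, 7, 5]])
A2 = np.array([[0.0, 2, 8], [2, 6, 0], [8, 0, 4]])
C  = np.array([[1.0, 2, 3], [2, 9, 0], [3, 0, 7]])
y = np.array([0.4847, 0.4511])
S = C - y[0] * A1 - y[1] * A2
print(11 * y[0] + 19 * y[1])           # ~ 13.90 (dual objective)
print(np.linalg.eigvalsh(S).round(4))  # smallest eigenvalue ~ 0: S sits on the PSD boundary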

4.2 Weak Duality


Slide 13
Theorem Given a feasible solution X of (P) and a feasible solution (y, S) of (D),

C • X − Σ_{i=1}^{m} yi bi = S • X ≥ 0

If C • X − Σ_{i=1}^{m} yi bi = 0, then X and (y, S) are each optimal solutions to (P)
and (D), and SX = 0.

4.3 Proof
Slide 14
• We must show that if S � 0 and X � 0, then S • X ≥ 0
• Let S = P DP � and X = QEQ� where P, Q are orthonormal matrices

and D, E are nonnegative diagonal matrices

S • X = trace(S � X) = trace(SX)

= trace(P DP � QEQ� )
n

= trace(DP � QEQ� P ) = Djj (P � QEQ� P )jj ≥ 0,
j=1

since Djj ≥ 0 and the diagonal of P � QEQ� P must be nonnegative.


• Suppose that trace(SX) = 0. Then
n

Djj (P � QEQ� P )jj = 0
j=1

• Then, for each j = 1, . . . , n, Djj = 0 or (P � QEQ� P )jj = 0.


• The latter case implies that the j th row of P � QEQ� P is all zeros. There­

fore, DP � QEQ� P = 0, and so SX = P DP � QEQ� = 0.

4.4 Strong Duality


Slide 15
• (P ) or (D) might not attain their respective optima
• There might be a duality gap, unless certain regularity conditions hold

Theorem
• If there exist feasible solutions X̂ for (P) and (ŷ, Ŝ) for (D) such that

  X̂ ≻ 0,  Ŝ ≻ 0

• Then, both (P) and (D) attain their optimal values zP∗ and zD∗
• Furthermore, zP∗ = zD∗

5 SDO Modeling Power


5.1 Quadratically Constrained Problems
Slide 16
min  (A0x + b0)′(A0x + b0) − c0′x − d0
s.t. (Aix + bi)′(Aix + bi) − ci′x − di ≤ 0,  i = 1, . . . , m

(Ax + b)′(Ax + b) − c′x − d ≤ 0   ⇔   [ I, Ax + b; (Ax + b)′, c′x + d ] ⪰ 0
Slide 17

min  t
s.t. (A0x + b0)′(A0x + b0) − c0′x − d0 − t ≤ 0
     (Aix + bi)′(Aix + bi) − ci′x − di ≤ 0,  ∀ i

Slide 18

min  t
s.t. [ I, A0x + b0; (A0x + b0)′, c0′x + d0 + t ] ⪰ 0
     [ I, Aix + bi; (Aix + bi)′, ci′x + di ] ⪰ 0,  ∀ i

5.2 Eigenvalue Problems


Slide 19
• X: symmetric n × n matrix
• λmax (X) = largest eigenvalue of X
• λ1(X) ≥ λ2(X) ≥ · · · ≥ λn(X): eigenvalues of X
• λi(X + t·I) = λi(X) + t
• Theorem: λmax(X) ≤ t ⇔ t·I − X ⪰ 0
• Sum of the k largest eigenvalues:

  Σ_{i=1}^{k} λi(X) ≤ t  ⇔  t − k·s − trace(Z) ≥ 0,  Z ⪰ 0,  Z − X + sI ⪰ 0
  (for some scalar s)

• Follows from the characterization:

  Σ_{i=1}^{k} λi(X) = max{ X • V : trace(V) = k, 0 ⪯ V ⪯ I }

5.3 Optimizing Structural Dynamics
Slide 20
• Select xi, the cross-sectional area of structure i, i = 1, . . . , n
• M(x) = M0 + Σi xi Mi, mass matrix
• K(x) = K0 + Σi xi Ki, stiffness matrix
• Structure weight w = w0 + Σi xi wi
• Dynamics

  M(x) d̈ + K(x) d = 0
Slide 21
• d(t): vector of displacements
• di(t) = Σ_{j=1}^{n} αij cos(ωj t − φj)
• det(K(x) − M(x)ω²) = 0;  ω1 ≤ ω2 ≤ · · · ≤ ωn
• Fundamental frequency: ω1 = λmin(M(x), K(x))^{1/2}
• We want to bound the fundamental frequency:

  ω1 ≥ Ω  ⇐⇒  M(x)Ω² − K(x) ⪯ 0

• Minimize weight
Slide 22
Problem: Minimize weight subject to
  Fundamental frequency ω1 ≥ Ω
  Limits on cross-sectional areas
Formulation

min  w0 + Σi xi wi
s.t. M(x)Ω² − K(x) ⪯ 0
     li ≤ xi ≤ ui

5.4 Measurements with Noise


Slide 23
• x: ability of a random student on k tests

  E[x] = x̄,  E[(x − x̄)(x − x̄)′] = Σ

• y: score of a random student on k tests
• v: testing error of the k tests, independent of x

  E[v] = 0,  E[vv′] = D, diagonal (unknown)

• y = x + v;  E[y] = x̄

  E[(y − x̄)(y − x̄)′] = Σ̂ = Σ + D

• Objective: Estimate reliably x̄ and Σ

Slide 24

• Take samples of y from which we can estimate x̄, Σ̂
• e′x: total ability on the tests
• e′y: total test score
• Reliability of the test :=

  Var[e′x] / Var[e′y] = e′Σe / e′Σ̂e = 1 − e′De / e′Σ̂e

Slide 25
We can find a lower bound on the reliability of the test:

min  e′Σe
s.t. Σ + D = Σ̂
     Σ, D ⪰ 0
     D diagonal

Equivalently,

max  e′De
s.t. 0 ⪯ D ⪯ Σ̂
     D diagonal

5.5 Further Tricks


Slide 26
• If B ≻ 0,

  A = [ B, C′; C, D ] ⪰ 0   ⇐⇒   D − CB⁻¹C′ ⪰ 0

• x′Ax + 2b′x + c ≥ 0  ∀ x   ⇐⇒   [ c, b′; b, A ] ⪰ 0

5.6 MAXCUT
Slide 27
• Given G = (N, E), an undirected graph with weights wij ≥ 0 on each edge (i, j) ∈ E
• Find a subset S ⊆ N such that Σ_{i∈S, j∈S̄} wij is maximized
• xj = 1 for j ∈ S and xj = −1 for j ∈ S̄

MAXCUT: max  (1/4) Σ_{i=1}^{n} Σ_{j=1}^{n} wij (1 − xi xj)
         s.t. xj ∈ {−1, 1},  j = 1, . . . , n

5.6.1 Reformulation
Slide 28
• Let Y = xx′, i.e., Yij = xi xj
• Let W = [wij]
• Equivalent formulation:

MAXCUT: max  (1/4) ( Σ_{i=1}^{n} Σ_{j=1}^{n} wij − W • Y )
         s.t. xj ∈ {−1, 1},  j = 1, . . . , n
              Yjj = 1,  j = 1, . . . , n
              Y = xx′

5.6.2 Relaxation
Slide 29
• Y = xx′ ⪰ 0
• Relaxation:

RELAX: max  (1/4) ( Σ_{i=1}^{n} Σ_{j=1}^{n} wij − W • Y )
        s.t. Yjj = 1,  j = 1, . . . , n
             Y ⪰ 0
Slide 30

MAXCUT ≤ RELAX

• It turns out that:

  0.87856 RELAX ≤ MAXCUT ≤ RELAX

• The value of the SDO relaxation is therefore guaranteed to be no more than about
  14% higher than the value of the very difficult to solve problem MAXCUT

15.093 Optimization Methods

Lecture 24: Semidefinite Optimization


1 Outline
Slide 1
1. Minimizing Polynomials as an SDP
2. Linear Difference Equations and Stabilization
3. Barrier Algorithm for SDO

2 SDO formulation
2.1 Primal and dual
Slide 2

(P): min  C • X
     s.t. Ai • X = bi,  i = 1, . . . , m
          X ⪰ 0

(D): max  Σ_{i=1}^{m} yi bi
     s.t. C − Σ_{i=1}^{m} yi Ai ⪰ 0

3 Minimizing Polynomials
3.1 Sum of squares
Slide 3
• A polynomial f(x) is a sum of squares (SOS) if

  f(x) = Σ_j gj²(x)

  for some polynomials gj(x).


• A polynomial satisfies f (x) ≥ 0 for all x ∈ R if and only if it is a sum of

squares.

• Not true in more than one variable!

3.2 Proof
Slide 4
• (⇐) Obvious. If f(x) = Σ_j gj²(x), then f(x) ≥ 0.
• (⇒) Factorize f(x) = C Π_j (x − rj)^{nj} Π_k (x − ak + ibk)^{mk} (x − ak − ibk)^{mk}.
  Since f(x) is nonnegative, C ≥ 0 and all the nj are even. Then,
  f(x) = f1(x)² + f2(x)², where

  f1(x) = C^{1/2} Π_j (x − rj)^{nj/2} Π_k (x − ak)^{mk}
  f2(x) = C^{1/2} Π_j (x − rj)^{nj/2} Π_k bk^{mk}

3.3 SOS and SDO


Slide 5
• Let z(x) = (1, x, x², . . . , x^k)′.
• f(x) is a sum of squares if and only if

  f(x) = z(x)′Q z(x)

  for some Q ⪰ 0, i.e., Q = L′L.
• Then, f(x) = z(x)′L′L z(x) = ||L z(x)||².

3.4 Formulation
Slide 6
• Consider min f(x).
• Then, f(x) ≥ γ for all x if and only if f(x) − γ = z(x)′Q z(x) with Q ⪰ 0. This
  implies linear constraints on γ and Q.
• Reformulation:

  max  γ
  s.t. f(x) − γ = z(x)′Q z(x)   (as polynomials)
       Q ⪰ 0

3.5 Example
3.5.1 Reformulation
Slide 7
min f(x) = 3 + 4x + 2x² + 2x³ + x⁴.
f(x) − γ = 3 − γ + 4x + 2x² + 2x³ + x⁴ = (1, x, x²) Q (1, x, x²)′.

max  γ
s.t. 3 − γ = q11
     4 = 2q12,  2 = 2q13 + q22
     2 = 2q23,  1 = q33
     Q = [ q11 q12 q13; q12 q22 q23; q13 q23 q33 ] ⪰ 0

Extensions to multiple dimensions.
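
As a sanity check of this formulation, the minimum of f can also be computed directly from the real roots of f′; the sketch below is illustrative only.

import numpy as np

f = np.poly1d([1, 2, 2, 4, 3])                      # x^4 + 2x^3 + 2x^2 + 4x + 3
crit = f.deriv().roots
real_crit = crit.real[np.abs(crit.imag) < 1e-9]     # real critical points
print(min(f(t) for t in real_crit))                 # the optimal γ of the SDO equals this minimum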

4 Stability
Slide 8
• A linear difference equation

x(k + 1) = Ax(k), x(0) = x0

• x(k) converges to zero iff |λi(A)| < 1, i = 1, . . . , n
• Characterization:

  |λi(A)| < 1 ∀ i   ⇐⇒   ∃ P ≻ 0 :  A′PA − P ≺ 0

4.1 Proof
Slide 9
• (⇐=) Let Av = λv. Then,

  0 > v′(A′PA − P)v = (|λ|² − 1) v′Pv,   with v′Pv > 0,

  and therefore |λ| < 1

• (=⇒) Let P = Σ_{i=0}^{∞} (A^i)′ Q A^i, where Q ≻ 0. The sum converges by the
  eigenvalue assumption. Then,

  A′PA − P = Σ_{i=1}^{∞} (A^i)′ Q A^i − Σ_{i=0}^{∞} (A^i)′ Q A^i = −Q ≺ 0
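
The construction in the (⇒) direction can be illustrated numerically; the sketch below is illustrative only, with an assumed stable A and Q = I (the infinite sum is truncated).

import numpy as np

A = np.array([[0.5, 0.4], [-0.3, 0.2]])         # spectral radius < 1 (assumed)
Q = np.eye(2)
P = sum(np.linalg.matrix_power(A.T, i) @ Q @ np.linalg.matrix_power(A, i) for i in range(200))
print(np.max(np.abs(np.linalg.eigvals(A))))     # < 1
print(np.linalg.eigvalsh(A.T @ P @ A - P))      # approximately the eigenvalues of -Q, i.e. (-1, -1)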

4.2 Stabilization
Slide 10
• Consider now the case where A is not stable, but we can change some

elements, e.g., A(L) = A + LC, where C is a fixed matrix.

• Want to find an L such that A + LC is stable.


• Use Schur complements to rewrite the condition
  (A + LC)′P(A + LC) − P ≺ 0,  P ≻ 0  as

  [ P, (A + LC)′P; P(A + LC), P ] ≻ 0

  The condition is nonlinear in (P, L)

4.3 Changing variables


Slide 11
• Define a new variable Y := PL

  [ P, A′P + C′Y′; PA + YC, P ] ≻ 0

• This is linear in (P, Y).
• Solve using SDO; recover L via L = P⁻¹Y

5 Primal Barrier Algorithm for SDO
Slide 12
• X ⪰ 0 ⇔ λ1(X) ≥ 0, . . . , λn(X) ≥ 0
• Natural barrier to repel X from the boundary λ1(X) > 0, . . . , λn(X) > 0:

  − Σ_{i=1}^{n} log(λi(X)) = − log( Π_{i=1}^{n} λi(X) ) = − log(det(X))
Slide 13
• Logarithmic barrier problem:

  min  Bµ(X) = C • X − µ log(det(X))
  s.t. Ai • X = bi,  i = 1, . . . , m,
       X ≻ 0

• Derivative: ∇Bµ(X) = C − µX⁻¹
  Follows from

  log det(X + H) ≈ log det(X) + trace(X⁻¹H) + · · ·

• KKT conditions:

  Ai • X = bi,  i = 1, . . . , m,
  C − µX⁻¹ = Σ_{i=1}^{m} yi Ai,
  X ≻ 0

• Given µ, we need to solve these nonlinear equations for X and the yi
• Apply Newton’s method until we are “close” to the optimum
• Reduce the value of µ, and iterate until the duality gap is small

5.1 Another interpretation


Slide 14
• Recall the optimality conditions:

  Ai • X = bi,  i = 1, . . . , m,
  Σ_{i=1}^{m} yi Ai + S = C,
  X, S ⪰ 0,
  XS = 0

• Cannot solve directly. Rather, perturb the complementarity condition to XS = µI.
• Now there is a unique solution for every µ > 0 (the “central path”)
• Solve using Newton’s method, for decreasing values of µ.

6 Differences with LO
Slide 15
• Many different ways to linearize the nonlinear complementarity condition

X S = µI

• Want to preserve symmetry of the iterates


• Several search directions.

7 Convergence
7.1 Stopping criterion
Slide 16
• The point (X, y) on the central path is feasible and has duality gap:

  C • X − Σ_{i=1}^{m} yi bi = µX⁻¹ • X = nµ

• Therefore, reducing µ always decreases the duality gap
• The barrier algorithm needs O( √n log(ε0/ε) ) iterations to reduce the duality gap
  from ε0 to ε

8 Conclusions
Slide 17
• SDO is a powerful modeling tool
• Barrier and primal-dual algorithms are very powerful
• Many good solvers available: SeDuMi, SDPT3, SDPA, etc.
• Pointers to literature and solvers:
www-user.tu-chemnitz.de/~helmberg/semidef.html

