
Numerical Optimization

Unconstrained Optimization

Shirish Shevade
Computer Science and Automation
Indian Institute of Science
Bangalore 560 012, India.

NPTEL Course on Numerical Optimization



Coordinate Descent Method

Consider the problem,

    min_x f(x)

where f : R^n → R, f ∈ C^1.
Idea:
1. For every coordinate variable x_i, i = 1, ..., n, minimize f(x) w.r.t. x_i, keeping the other coordinate variables x_j, j ≠ i, fixed.
2. Repeat the procedure in step 1 until some stopping condition is satisfied.



Coordinate Descent Method
(1) Initialize x^0, ε, set k := 0.
(2) while ||g^k|| > ε
        for i = 1, ..., n
            x_i^new = arg min_{x_i} f(x)
            x_i = x_i^new
        endfor
    endwhile
Output : x^* = x^k, a stationary point of f(x).

Globally convergent method if a search along any coordinate direction yields a unique minimum point.
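As a concrete illustration (not part of the original slides; the helper below and the SciPy-based one-dimensional search are an assumed, minimal sketch of the scheme):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def coordinate_descent(f, grad, x0, eps=1e-6, max_sweeps=100):
        # Minimize f over one coordinate at a time, holding the others fixed.
        x = np.asarray(x0, dtype=float).copy()
        for _ in range(max_sweeps):
            if np.linalg.norm(grad(x)) <= eps:
                break
            for i in range(x.size):
                # Exact (numerical) line search along the i-th coordinate direction.
                def phi(t, i=i):
                    xt = x.copy()
                    xt[i] = t
                    return f(xt)
                x[i] = minimize_scalar(phi).x
        return x

    # Example: f(x) = 4*x1^2 + x2^2, started from (-1, -1).
    f = lambda x: 4*x[0]**2 + x[1]**2
    grad = lambda x: np.array([8*x[0], 2*x[1]])
    print(coordinate_descent(f, grad, [-1.0, -1.0]))   # ~ [0, 0]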



Example: Consider the problem,

    min_x f(x) = 4x_1^2 + x_2^2

We use coordinate descent method with exact line search to solve this problem.
x^0 = (−1, −1)^T
Let d^0 = (1, 0)^T
x^1 = x^0 + α^0 d^0 where

    α^0 = arg min_α φ_0(α) = f(x^0 + α d^0)
    φ_0(α) = f((x_1^0 + α d_1^0, x_2^0 + α d_2^0)^T) = 4(α − 1)^2 + 1
    φ_0'(α) = 0 ⇒ α^0 = 1 ⇒ x^1 = (0, −1)^T

d^1 = (0, 1)^T, x^2 = x^1 + α^1 d^1, where

    α^1 = arg min_α φ_1(α) = f((0, α − 1)^T) = (α − 1)^2 ⇒ α^1 = 1 ⇒ x^2 = (0, 0)^T = x^*

    min_x f(x) = 4x_1^2 + x_2^2

For the above problem,
Moving along coordinate directions and using exact line search gives the solution in at most two steps.
The same result is obtained even if d^0 and d^1 are interchanged.



Example: Consider the problem,

    min_x f(x) = 4x_1^2 + x_2^2 − 2x_1 x_2

We use coordinate descent method with exact line search to solve this problem.
x^0 = (−1, −1)^T
Let d^0 = (1, 0)^T
x^1 = x^0 + α^0 d^0 where

    α^0 = arg min_α φ_0(α) = f(x^0 + α d^0)
    φ_0(α) = f((x_1^0 + α d_1^0, x_2^0 + α d_2^0)^T) = 4(α − 1)^2 + 1 + 2(α − 1)
    φ_0'(α) = 0 ⇒ α^0 = 3/4 ⇒ x^1 = (−1/4, −1)^T

d^1 = (0, 1)^T, x^2 = x^1 + α^1 d^1, where

    α^1 = arg min_α φ_1(α) = f((−1/4, α − 1)^T) = (α − 1)^2 + (α − 1)/2 + 1/4 ⇒ α^1 = 3/4
    ⇒ x^2 = (−1/4, −1/4)^T ≠ x^*
Example 1:

    min_x f_1(x) = 4x_1^2 + x_2^2

    H = [ 8  0 ]
        [ 0  2 ]

x^* is attained in at most two steps using the coordinate descent method.
Example 2:

    min_x f_2(x) = 4x_1^2 + x_2^2 − 2x_1 x_2

    H = [  8  −2 ]
        [ −2   2 ]

x^* could not be attained in two steps using the coordinate descent method (if x^0 is not on one of the principal axes of the elliptical contours).
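A small NumPy check (not from the slides; illustrative only) of why the two examples behave differently: the principal axes of the elliptical contours are the eigenvectors of H, and they coincide with the coordinate directions only in Example 1.

    import numpy as np

    H1 = np.array([[8.0, 0.0], [0.0, 2.0]])    # Hessian of f1
    H2 = np.array([[8.0, -2.0], [-2.0, 2.0]])  # Hessian of f2

    for name, H in [("f1", H1), ("f2", H2)]:
        # Columns of V are the principal axes of the elliptical contours.
        eigvals, V = np.linalg.eigh(H)
        print(name, "principal axes (columns):")
        print(np.round(V, 3))
    # For f1 the axes are the coordinate directions e1, e2;
    # for f2 they are rotated, so coordinate-wise searches do not reach x^* in two steps.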



Consider the problem:

    min_x f(x) ≜ (1/2) x^T H x + c^T x

where H is a symmetric positive definite matrix.
Let {d^0, d^1, ..., d^{n−1}} be a set of linearly independent directions and x^0 ∈ R^n.
Any x ∈ R^n can be represented as

    x = x^0 + Σ_{i=0}^{n−1} α_i d^i

Given {d^0, d^1, ..., d^{n−1}} and x^0 ∈ R^n, the given problem is to minimize Ψ(α) defined as,

    Ψ(α) = (1/2) (x^0 + Σ_{i=0}^{n−1} α_i d^i)^T H (x^0 + Σ_{i=0}^{n−1} α_i d^i) + c^T (x^0 + Σ_{i=0}^{n−1} α_i d^i)



Define D = (d^0 | d^1 | ... | d^{n−1}) and α = (α_0, α_1, ..., α_{n−1}).

    Ψ(α) = (1/2) α^T (D^T H D) α + (H x^0 + c)^T D α + [(1/2) x^{0T} H x^0 + c^T x^0]

where Q ≜ D^T H D and the bracketed term is a constant.

    Q = D^T H D =
        [ d^{0T} H d^0        d^{0T} H d^1        ...   d^{0T} H d^{n−1}       ]
        [ d^{1T} H d^0        d^{1T} H d^1        ...   d^{1T} H d^{n−1}       ]
        [      ...                 ...            ...        ...               ]
        [ d^{(n−1)T} H d^0    d^{(n−1)T} H d^1    ...   d^{(n−1)T} H d^{n−1}   ]

Q will be a diagonal matrix if d^{iT} H d^j = 0, ∀ i ≠ j.



Let d^{iT} H d^j = 0, ∀ i ≠ j.

    Q = D^T H D =
        [ d^{0T} H d^0    0               ...   0                    ]
        [ 0               d^{1T} H d^1    ...   0                    ]
        [ ...             ...             ...   ...                  ]
        [ 0               0               ...   d^{(n−1)T} H d^{n−1} ]

Therefore,

    Q^{-1}_{ij} = 1 / (d^{iT} H d^i)  if j = i,   0 otherwise.

    Ψ(α) = (1/2) (x^0 + Σ_i α_i d^i)^T H (x^0 + Σ_i α_i d^i) + c^T (x^0 + Σ_i α_i d^i)
         = (1/2) Σ_i [ (x^0 + α_i d^i)^T H (x^0 + α_i d^i) + 2 c^T (x^0 + α_i d^i) ] + constant

Ψ(α) is separable in terms of α_0, α_1, ..., α_{n−1}.


    Ψ(α) = (1/2) Σ_i [ (x^0 + α_i d^i)^T H (x^0 + α_i d^i) + 2 c^T (x^0 + α_i d^i) ]

    ∂Ψ/∂α_i = 0 ⇒ α_i^* = − d^{iT} (H x^0 + c) / (d^{iT} H d^i)

Therefore,

    x^* = x^0 + Σ_{i=0}^{n−1} α_i^* d^i

Definition
Let H ∈ R^{n×n} be a symmetric matrix. The vectors {d^0, d^1, ..., d^{n−1}} are said to be H-conjugate if they are linearly independent and d^{iT} H d^j = 0 ∀ i ≠ j.
Example: Consider the problem,

    min_x f(x) = 4x_1^2 + x_2^2 − 2x_1 x_2

    H = [  8  −2 ]
        [ −2   2 ]

x^0 = (−1, −1)^T
Let d^0 = (1, 0)^T
x^1 = x^0 + α^0 d^0 where

    α^0 = arg min_α φ_0(α) = f(x^0 + α d^0)
    φ_0(α) = f((x_1^0 + α d_1^0, x_2^0 + α d_2^0)^T) = 4(α − 1)^2 + 1 + 2(α − 1)
    φ_0'(α) = 0 ⇒ α^0 = 3/4 ⇒ x^1 = (−1/4, −1)^T

Choose a non-zero direction d^1 such that d^{1T} H d^0 = 0.
Let d^1 = (a, b)^T. Therefore,

    (a  b) [  8  −2 ] (1)  = 0  ⇒  8a − 2b = 0
           [ −2   2 ] (0)
Let d^1 = (1, 4)^T,
x^2 = x^1 + α^1 d^1 where

    α^1 = arg min_α φ_1(α) ≜ f((α − 1/4, 4α − 1)^T) = (3/4)(4α − 1)^2

    φ_1'(α) = 0 ⇒ α^1 = 1/4
    x^2 = x^1 + α^1 d^1 = (0, 0)^T = x^*
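A quick numerical check of this example (a sketch, not from the slides): since d^0 = (1, 0)^T and d^1 = (1, 4)^T are H-conjugate, the closed-form coefficients α_i^* = −d^{iT}(Hx^0 + c)/(d^{iT} H d^i) derived earlier reproduce the two-step solution.

    import numpy as np

    # f(x) = (1/2) x^T H x + c^T x with H from the example and c = 0.
    H = np.array([[8.0, -2.0], [-2.0, 2.0]])
    c = np.zeros(2)
    x0 = np.array([-1.0, -1.0])
    D = [np.array([1.0, 0.0]), np.array([1.0, 4.0])]   # H-conjugate directions

    assert abs(D[0] @ H @ D[1]) < 1e-12                # conjugacy check

    x = x0.copy()
    for d in D:
        alpha = -(d @ (H @ x0 + c)) / (d @ H @ d)      # alpha_i^* from the earlier slide
        x = x + alpha * d
    print(x)                                           # ~ [0, 0] = x^*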

A convex quadratic function can be minimized in, at most, n steps, provided we search along conjugate directions of the Hessian matrix.

Given H, does a set of H-conjugate vectors exist? If yes, how to get a set of such vectors?



Conjugate Directions

Let H ∈ R^{n×n} be a symmetric matrix.

Do there exist n conjugate directions w.r.t. H?
H is symmetric ⇒ H has n mutually orthogonal eigenvectors.
Let v_1 and v_2 be two orthogonal eigenvectors of H. ∴ v_1^T v_2 = 0.

    H v_1 = λ_1 v_1 ⇒ v_2^T H v_1 = λ_1 v_2^T v_1
                    ⇒ v_2^T H v_1 = 0
                    ⇒ v_1 and v_2 are H-conjugate

∴ The n orthogonal eigenvectors of H are H-conjugate.
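A short numerical confirmation (illustrative, reusing an H from the earlier example): V^T H V is diagonal when the columns of V are orthonormal eigenvectors of H, i.e., the eigenvectors are mutually H-conjugate.

    import numpy as np

    H = np.array([[8.0, -2.0], [-2.0, 2.0]])
    _, V = np.linalg.eigh(H)          # columns of V are orthonormal eigenvectors
    print(np.round(V.T @ H @ V, 10))  # diagonal: off-diagonal entries v_i^T H v_j are ~ 0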



Conjugate Directions

Let H be a symmetric positive definite matrix and d^0, d^1, ..., d^{n−1} be nonzero directions such that d^{iT} H d^j = 0, i ≠ j.

Are d^0, d^1, ..., d^{n−1} linearly independent?

    Σ_{i=0}^{n−1} μ_i d^i = 0 ⇒ Σ_{i=0}^{n−1} μ_i d^{jT} H d^i = 0 for every j = 0, ..., n − 1
                              ⇒ μ_j d^{jT} H d^j = 0
                              ⇒ μ_j = 0 for every j = 0, ..., n − 1
                              ⇒ d^0, d^1, ..., d^{n−1} are linearly independent



Conjugate Directions
Geometric Interpretation:
Consider the problem:
    min_{x∈R^2} (1/2) x^T H x + c^T x,   H symmetric positive definite matrix.

Let x^* be the solution. ∴ H x^* = −c.
Let x^0 be any initial point, g^0 = H x^0 + c.
Let d^0 be some direction (d^0 ≠ 0).
x^1 is found by doing exact line search along d^0. ∴ g^{1T} d^0 = 0, where g^1 = H x^1 + c.

    (x^* − x^1)^T H d^0 = (H x^* − H x^1)^T d^0
                        = −g^{1T} d^0
                        = 0

Therefore, the direction (x^* − x^1) is H-conjugate to d^0.


Consider the problem:

    min_x f(x) ≜ (1/2) x^T H x + c^T x,   H symmetric positive definite matrix.

Let d^0, d^1, ..., d^{n−1} be H-conjugate. ∴ d^0, d^1, ..., d^{n−1} are linearly independent.
Let B^k denote the subspace spanned by d^0, d^1, ..., d^{k−1}. Clearly, B^k ⊂ B^{k+1}.
Let x^0 ∈ R^n be any arbitrary point.
Let x^{k+1} = x^k + α^k d^k where α^k is obtained by doing exact line search:

    α^k = arg min_α f(x^k + α d^k)

Claim:
    x^k = arg min_x f(x)
          s.t. x ∈ x^0 + B^k



Exact line search:

    α^k = arg min_{α∈R} f(x^k + α d^k)

Therefore,

    ∇f(x^k + α^k d^k)^T d^k = 0 ⇒ g^{(k+1)T} d^k = 0 ∀ k = 0, ..., n − 1

    x^k = x^{k−1} + α^{k−1} d^{k−1} = x^j + Σ_{i=j}^{k−1} α^i d^i   (j = 0, ..., k − 1)

    ∴ H x^k + c = H x^j + c + Σ_{i=j}^{k−1} α^i H d^i

    ∴ g^k = g^j + Σ_{i=j}^{k−1} α^i H d^i

    ∴ g^{kT} d^{j−1} = g^{jT} d^{j−1} + Σ_{i=j}^{k−1} α^i d^{iT} H d^{j−1} = 0

Therefore, g^{kT} d^j = 0 ∀ j = 0, ..., k − 1, i.e., g^k ⊥ B^k.
Note that for every j = 0, ..., n − 1,

    α^j = arg min_α f(x^j + α d^j)

    ∴ f(x^j + α^j d^j) ≤ f(x^j + μ_j d^j),  μ_j ∈ R
    ∴ f(x^j) + α^j g^{jT} d^j + (1/2) (α^j)^2 d^{jT} H d^j ≤ f(x^j) + μ_j g^{jT} d^j + (1/2) μ_j^2 d^{jT} H d^j

We need to show that f(x^k) ≤ f(x) ∀ x ∈ x^0 + B^k, or

    f(x^0 + Σ_{j=0}^{k−1} α^j d^j) ≤ f(x^0 + Σ_{j=0}^{k−1} μ_j d^j),  μ_j ∈ R ∀ j.

That is,

    f(x^0) + Σ_{j=0}^{k−1} [ α^j g^{0T} d^j + (1/2) (α^j)^2 d^{jT} H d^j ] ≤ f(x^0) + Σ_{j=0}^{k−1} [ μ_j g^{0T} d^j + (1/2) μ_j^2 d^{jT} H d^j ]

where μ_j ∈ R ∀ j.
For every j = 0, ..., n − 1,

    f(x^j) + α^j g^{jT} d^j + (1/2) (α^j)^2 d^{jT} H d^j ≤ f(x^j) + μ_j g^{jT} d^j + (1/2) μ_j^2 d^{jT} H d^j

Suppose g^{jT} d^j = g^{0T} d^j ∀ j. Then

    α^j g^{0T} d^j + (1/2) (α^j)^2 d^{jT} H d^j ≤ μ_j g^{0T} d^j + (1/2) μ_j^2 d^{jT} H d^j   ∀ j

Therefore,

    f(x^0) + Σ_{j=0}^{k−1} [ α^j g^{0T} d^j + (1/2) (α^j)^2 d^{jT} H d^j ] ≤ f(x^0) + Σ_{j=0}^{k−1} [ μ_j g^{0T} d^j + (1/2) μ_j^2 d^{jT} H d^j ]

    ∴ f(x^0 + Σ_{j=0}^{k−1} α^j d^j) ≤ f(x^0 + Σ_{j=0}^{k−1} μ_j d^j),  μ_j ∈ R ∀ j
    ∴ f(x^k) ≤ f(x),  ∀ x ∈ x^0 + B^k
We need to show that

    g^{jT} d^j = g^{0T} d^j   ∀ j

Consider x^j = x^0 + Σ_{i=0}^{j−1} α^i d^i.

    ∴ H x^j + c = H x^0 + c + Σ_{i=0}^{j−1} α^i H d^i
    ∴ g^j = g^0 + Σ_{i=0}^{j−1} α^i H d^i
    ∴ g^{jT} d^j = g^{0T} d^j + Σ_{i=0}^{j−1} α^i d^{iT} H d^j
    ∴ g^{jT} d^j = g^{0T} d^j   ∀ j



Expanding Subspace Theorem

Consider the problem to minimize f(x) = (1/2) x^T H x + c^T x where H is a symmetric positive definite matrix. Let d^0, d^1, ..., d^{n−1} be H-conjugate and let x^0 ∈ R^n be any initial point. Let

    α^k = arg min_{α∈R} f(x^k + α d^k),   ∀ k = 0, ..., n − 1

and x^{k+1} = x^k + α^k d^k, ∀ k = 0, ..., n − 1.
Then, for all k = 0, ..., n − 1,
1. g^{(k+1)T} d^j = 0, j = 0, ..., k
2. g^{kT} d^k = g^{0T} d^k
3. x^{k+1} = arg min_x f(x)
             s.t. x ∈ x^0 + B^{k+1}
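A numerical check of conclusion 1 on a randomly generated convex quadratic (a sketch, not from the slides; the problem data are illustrative): after k + 1 exact steps along H-conjugate directions, g^{k+1} is orthogonal to d^0, ..., d^k.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5
    A = rng.standard_normal((n, n))
    H = A @ A.T + n * np.eye(n)          # symmetric positive definite
    c = rng.standard_normal(n)

    _, V = np.linalg.eigh(H)             # eigenvectors of H serve as H-conjugate directions
    D = [V[:, i] for i in range(n)]

    x = rng.standard_normal(n)
    for k, d in enumerate(D):
        g = H @ x + c
        alpha = -(g @ d) / (d @ H @ d)   # exact line search step for a quadratic
        x = x + alpha * d
        g_next = H @ x + c
        print(k, max(abs(g_next @ D[j]) for j in range(k + 1)))   # ~ 0 for each k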
Given a set of n directions, d^0, d^1, ..., d^{n−1}, which are H-conjugate, and x^0 ∈ R^n, it is easy to determine α_i^*, ∀ i = 0, ..., n − 1,

    α_i^* = − d^{iT} (H x^0 + c) / (d^{iT} H d^i)

and get

    x^* = x^0 + Σ_{i=0}^{n−1} α_i^* d^i

How do we construct the H-conjugate directions, d^0, d^1, ..., d^{n−1}?
Given the H-conjugate directions, d^0, d^1, ..., d^{k−1}, how do we determine α^k where

    α^k = arg min_α f(x^k + α d^k)?



    x^* − x^0 = Σ_{i=0}^{n−1} α^i d^i

    ∴ d^{kT} H (x^* − x^0) = α^k d^{kT} H d^k

    ∴ α^k = d^{kT} H (x^* − x^0) / (d^{kT} H d^k)

Suppose that after k iterative steps and obtaining k H-conjugate directions,

    x^k − x^0 = Σ_{i=0}^{k−1} α^i d^i

    ∴ d^{kT} H (x^k − x^0) = 0



Given d^{kT} H (x^k − x^0) = 0,

    α^k = d^{kT} H (x^* − x^k + x^k − x^0) / (d^{kT} H d^k)
        = d^{kT} (H x^* − H x^k) / (d^{kT} H d^k)
        = d^{kT} (−c − H x^k) / (d^{kT} H d^k)
        = − g^{kT} d^k / (d^{kT} H d^k)

Therefore,

    α^k = − g^{kT} d^k / (d^{kT} H d^k)



Suppose {−g^0, −g^1, ..., −g^{n−1}} is a linearly independent set of vectors.
Use the Gram-Schmidt procedure to determine the H-conjugate vectors d^0, d^1, ..., d^{n−1}.
Let d^0 = −g^0.
In general,

    d^k = −g^k + Σ_{j=0}^{k−1} β^j d^j,   k = 1, ..., n − 1

But we want d^0, d^1, ..., d^{n−1} to be H-conjugate vectors.

    d^{iT} H d^k = −d^{iT} H g^k + Σ_{j=0}^{k−1} β^j d^{iT} H d^j,   i = 0, ..., k − 1
    ∴ 0 = −d^{iT} H g^k + β^i d^{iT} H d^i,   i = 0, ..., k − 1
    ∴ β^i = g^{kT} H d^i / (d^{iT} H d^i)
    ∴ d^k = −g^k + Σ_{j=0}^{k−1} ( g^{kT} H d^j / (d^{jT} H d^j) ) d^j
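This construction can be sketched in NumPy for an arbitrary linearly independent set (the slides apply it to −g^0, ..., −g^{n−1}; the generic function below and its names are illustrative):

    import numpy as np

    def h_conjugate(H, vectors):
        # Gram-Schmidt-style H-conjugation: given linearly independent vectors,
        # return directions d^0, ..., d^{n-1} with d^{iT} H d^j = 0 for i != j.
        ds = []
        for v in vectors:
            d = v.astype(float)
            for dj in ds:
                beta = (v @ H @ dj) / (dj @ H @ dj)
                d = d - beta * dj
            ds.append(d)
        return ds

    H = np.array([[8.0, -2.0], [-2.0, 2.0]])
    ds = h_conjugate(H, [np.array([1.0, 0.0]), np.array([0.0, 1.0])])
    print(ds)                      # e.g. (1, 0) and (1/4, 1), up to scaling
    print(ds[0] @ H @ ds[1])       # ~ 0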

We now need to show that {−g^0, −g^1, ..., −g^{n−1}} is a linearly independent set of vectors.
Note that

    span{d^0, d^1, ..., d^{k−1}} = span{−g^0, −g^1, ..., −g^{k−1}}

We have already shown that

    {d^0, d^1, ..., d^{k−1}} are H-conjugate ⇒ g^k ⊥ B^k
    ∴ −g^k ⊥ span{d^0, d^1, ..., d^{k−1}}
    ∴ −g^k ⊥ span{−g^0, −g^1, ..., −g^{k−1}}

Therefore, {−g^0, −g^1, ..., −g^{n−1}} is a linearly independent set of vectors.
Now, consider

    d^0 = −g^0
    d^k = −g^k + Σ_{j=0}^{k−1} β^j d^j,   β^j = g^{kT} H d^j / (d^{jT} H d^j),   ∀ k = 1, ..., n − 1

Note that x^{j+1} = x^j + α^j d^j and g^{j+1} = g^j + α^j H d^j.
Therefore,

    H d^j = (1/α^j) (g^{j+1} − g^j)

Thus,

    d^k = −g^k + Σ_{j=0}^{k−1} ( g^{kT} (g^{j+1} − g^j) / (d^{jT} (g^{j+1} − g^j)) ) d^j
        = −g^k + ( g^{kT} g^k / (d^{(k−1)T} (g^k − g^{k−1})) ) d^{k−1}

(Only the j = k − 1 term survives: since g^k ⊥ span{g^0, ..., g^{k−1}}, we have g^{kT}(g^{j+1} − g^j) = 0 for j < k − 1 and g^{kT}(g^k − g^{k−1}) = g^{kT} g^k.)



    d^k = −g^k + ( g^{kT} g^k / (d^{(k−1)T} (g^k − g^{k−1})) ) d^{k−1}

Due to exact line search, g^{kT} d^{k−1} = 0 and g^{(k−1)T} d^{k−2} = 0. Since

    d^{k−1} = −g^{k−1} + β^{k−2} d^{k−2}
    −d^{(k−1)T} g^{k−1} = g^{(k−1)T} g^{k−1} − β^{k−2} g^{(k−1)T} d^{k−2} = g^{(k−1)T} g^{k−1},

the denominator becomes d^{(k−1)T} (g^k − g^{k−1}) = −d^{(k−1)T} g^{k−1} = g^{(k−1)T} g^{k−1}.
Therefore,

    d^k = −g^k + ( g^{kT} g^k / (g^{(k−1)T} g^{k−1}) ) d^{k−1},   k = 1, ..., n − 1

Fletcher-Reeves method
Conjugate Gradient Algorithm (Fletcher-Reeves)
For a quadratic function, (1/2) x^T H x + c^T x, H symmetric positive definite
(1) Initialize x^0, ε, d^0 = −g^0, set k := 0.
(2) while ||g^k|| > ε
        (a) α^k = − g^{kT} d^k / (d^{kT} H d^k)
        (b) x^{k+1} = x^k + α^k d^k
        (c) g^{k+1} = H x^{k+1} + c
        (d) β^k = g^{(k+1)T} g^{k+1} / (g^{kT} g^k)
        (e) d^{k+1} = −g^{k+1} + β^k d^k
        (f) k := k + 1
    endwhile
Output : x^* = x^k, the global minimum of f(x).
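A direct NumPy translation of this algorithm (a minimal sketch; function names are illustrative, and the test problem is the earlier example with c = 0):

    import numpy as np

    def cg_fletcher_reeves_quadratic(H, c, x0, eps=1e-8, max_iter=None):
        # Minimize (1/2) x^T H x + c^T x for symmetric positive definite H.
        x = np.asarray(x0, dtype=float).copy()
        g = H @ x + c
        d = -g
        k = 0
        max_iter = max_iter or len(c)            # at most n steps in exact arithmetic
        while np.linalg.norm(g) > eps and k < max_iter:
            alpha = -(g @ d) / (d @ H @ d)       # exact line search for a quadratic
            x = x + alpha * d
            g_new = H @ x + c
            beta = (g_new @ g_new) / (g @ g)     # Fletcher-Reeves
            d = -g_new + beta * d
            g = g_new
            k += 1
        return x

    # Example from the slides: f(x) = 4x1^2 + x2^2 - 2x1x2, i.e. H = [[8,-2],[-2,2]], c = 0.
    H = np.array([[8.0, -2.0], [-2.0, 2.0]])
    print(cg_fletcher_reeves_quadratic(H, np.zeros(2), [-1.0, -1.0]))   # ~ [0, 0]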



Example:

    min_x f(x) = 4x_1^2 + x_2^2 − 2x_1 x_2

[Figure: contour plot of f(x) over x_1, x_2 ∈ [−3, 3] with the iterates]

Conjugate Gradient algorithm (Fletcher-Reeves) with exact line search applied to f(x)
Extension to Nonquadratic function, f(x):
Conjugate Gradient Algorithm (Fletcher-Reeves)
(1) Initialize x^0, ε, d^0 = −g^0, set k := 0.
(2) while ||g^k|| > ε
        (a) α^k = arg min_{α>0} f(x^k + α d^k)
        (b) x^{k+1} = x^k + α^k d^k
        (c) Compute g^{k+1}
        (d) if k < n − 1
                β^k = g^{(k+1)T} g^{k+1} / (g^{kT} g^k)
                d^{k+1} = −g^{k+1} + β^k d^k
                k := k + 1
            else
                x^0 = x^{k+1}
                d^0 = −g^{k+1}
                k := 0
            endif
    endwhile
Output : x^* = x^k, a stationary point of f(x).
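A minimal NumPy sketch of this restarted scheme (not from the slides; scipy.optimize.minimize_scalar stands in for the exact line search, and all names are illustrative):

    import numpy as np
    from scipy.optimize import minimize_scalar

    def cg_fletcher_reeves(f, grad, x0, eps=1e-6, max_iter=1000):
        # Nonlinear CG (Fletcher-Reeves) with a restart every n steps,
        # using a numerical line search in place of the exact one.
        x = np.asarray(x0, dtype=float).copy()
        n = x.size
        g = grad(x)
        d = -g
        k = 0
        for _ in range(max_iter):
            if np.linalg.norm(g) <= eps:
                break
            alpha = minimize_scalar(lambda a: f(x + a * d)).x
            x = x + alpha * d
            g_new = grad(x)
            if k < n - 1:
                beta = (g_new @ g_new) / (g @ g)
                d = -g_new + beta * d
                k += 1
            else:                       # restart with the steepest-descent direction
                d = -g_new
                k = 0
            g = g_new
        return x

    # Test on the earlier example (quadratic, so two steps suffice).
    f = lambda x: 4*x[0]**2 + x[1]**2 - 2*x[0]*x[1]
    grad = lambda x: np.array([8*x[0] - 2*x[1], 2*x[1] - 2*x[0]])
    print(cg_fletcher_reeves(f, grad, [-1.0, -1.0]))   # ~ [0, 0]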
β^k Determination

Fletcher-Reeves method

    β^k_FR = g^{kT} g^k / (g^{(k−1)T} g^{k−1})

Polak-Ribiere method

    β^k_PR = g^{kT} (g^k − g^{k−1}) / (g^{(k−1)T} g^{k−1})

Hestenes-Stiefel method

    β^k_HS = g^{kT} (g^k − g^{k−1}) / ((g^k − g^{k−1})^T d^{k−1})
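Written as small helper functions (an illustrative sketch, not from the slides):

    import numpy as np

    def beta_fr(g, g_prev, d_prev):
        # Fletcher-Reeves (d_prev unused; kept for a uniform signature)
        return (g @ g) / (g_prev @ g_prev)

    def beta_pr(g, g_prev, d_prev):
        # Polak-Ribiere (d_prev unused; kept for a uniform signature)
        return (g @ (g - g_prev)) / (g_prev @ g_prev)

    def beta_hs(g, g_prev, d_prev):
        # Hestenes-Stiefel
        y = g - g_prev
        return (g @ y) / (y @ d_prev)

    # For a quadratic with exact line search the three choices coincide,
    # since g^k is orthogonal to g^{k-1} and to d^{k-1}.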



    B^k_BFGS = B + (1 + γ^T B γ / (δ^T γ)) (δ δ^T / (δ^T γ)) − (δ γ^T B + B γ δ^T) / (δ^T γ)

Memoryless BFGS iteration (take B = I), with δ = x^k − x^{k−1} and γ = g^k − g^{k−1}:

    B^k_BFGS = I + (1 + γ^T γ / (δ^T γ)) (δ δ^T / (δ^T γ)) − (δ γ^T + γ δ^T) / (δ^T γ)

With exact line search, δ^{(k−1)T} g^k = α^{k−1} d^{(k−1)T} g^k = 0. Therefore,

    d^k_BFGS = −B^k_BFGS g^k = −g^k + (δ γ^T g^k) / (δ^T γ)
             = −g^k + ( g^{kT} (g^k − g^{k−1}) / ((g^k − g^{k−1})^T d^{k−1}) ) d^{k−1}
             = −g^k + β^k_HS d^{k−1}
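A numerical check of this identity on the earlier quadratic example (a sketch; names are illustrative): with exact line search, the memoryless BFGS direction equals the Hestenes-Stiefel conjugate gradient direction.

    import numpy as np

    H = np.array([[8.0, -2.0], [-2.0, 2.0]])
    c = np.zeros(2)

    x0 = np.array([-1.0, -1.0])
    g0 = H @ x0 + c
    d0 = -g0
    alpha = -(g0 @ d0) / (d0 @ H @ d0)        # exact line search along d0
    x1 = x0 + alpha * d0
    g1 = H @ x1 + c

    delta, gamma = x1 - x0, g1 - g0
    I = np.eye(2)
    B = (I + (1 + gamma @ gamma / (delta @ gamma)) * np.outer(delta, delta) / (delta @ gamma)
           - (np.outer(delta, gamma) + np.outer(gamma, delta)) / (delta @ gamma))
    d_bfgs = -B @ g1

    beta_hs = (g1 @ (g1 - g0)) / ((g1 - g0) @ d0)
    d_hs = -g1 + beta_hs * d0
    print(np.allclose(d_bfgs, d_hs))          # True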



