Unconstrained Optimization
Shirish Shevade
Computer Science and Automation
Indian Institute of Science
Bangalore 560 012, India.
$$\min_x f(x)$$
where $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$.
Idea:
1 For every coordinate variable $x_i$, $i = 1, \dots, n$, minimize $f(x)$ w.r.t. $x_i$, keeping the other coordinate variables $x_j$, $j \neq i$, constant.
2 Repeat the procedure in step 1 until some stopping condition is satisfied.
For Example 2 below, one sweep of this procedure from $x^0 = (-1, -1)^T$ reaches $(-\frac{1}{4}, -\frac{1}{4})^T \neq x^*$.
Shirish Shevade Numerical Optimization
Example 1:
$$\min_x f_1(x) \triangleq 4x_1^2 + x_2^2, \qquad H = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}$$
$x^*$ is attained in at most two steps using the coordinate descent method.
Example 2:
$$\min_x f_2(x) \triangleq 4x_1^2 + x_2^2 - 2x_1x_2, \qquad H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$$
$x^*$ cannot be attained in two steps using the coordinate descent method (if $x^0$ is not on one of the principal axes of the elliptical contours).
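The behaviour on Example 2 can be checked in a few lines of NumPy; a minimal sketch (not from the original slides), writing the quadratic as $f(x) = \frac{1}{2}x^T H x$ with $H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$:

```python
import numpy as np

# Coordinate descent on the quadratic f(x) = 0.5 x^T H x (Example 2):
# minimize exactly over one coordinate at a time, holding the rest fixed.
# For a quadratic, the minimizer over x_i (others fixed) solves
# H_ii x_i + sum_{j != i} H_ij x_j = 0.

def coordinate_descent(H, x0, num_sweeps):
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(num_sweeps):
        for i in range(n):
            # exact minimization of 0.5 x^T H x w.r.t. x_i
            x[i] = -(H[i] @ x - H[i, i] * x[i]) / H[i, i]
    return x

H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = coordinate_descent(H, [-1.0, -1.0], num_sweeps=1)
print(x)  # one sweep from (-1,-1) reaches (-1/4,-1/4), not x* = (0,0)
```

Running more sweeps moves the iterate toward $x^*$, but no finite number of sweeps reaches it exactly from this starting point.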
Writing $x = x^0 + D\alpha$, where the columns of $D$ are the directions $d^0, \dots, d^{n-1}$, the objective becomes
$$\Psi(\alpha) = \frac{1}{2}\,\alpha^T \underbrace{D^T H D}_{Q}\,\alpha + (Hx^0 + c)^T D\alpha + \underbrace{\frac{1}{2}\,{x^0}^T H x^0 + c^T x^0}_{\text{constant}}$$
$$Q = D^T H D = \begin{pmatrix} {d^0}^T H d^0 & {d^0}^T H d^1 & \cdots & {d^0}^T H d^{n-1} \\ {d^1}^T H d^0 & {d^1}^T H d^1 & \cdots & {d^1}^T H d^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ {d^{n-1}}^T H d^0 & {d^{n-1}}^T H d^1 & \cdots & {d^{n-1}}^T H d^{n-1} \end{pmatrix}$$
$Q$ will be a diagonal matrix if ${d^i}^T H d^j = 0\ \forall\, i \neq j$.
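A quick numerical sketch of this fact (illustrative values: the directions $(1,0)^T$ and $(1,4)^T$ are the $H$-conjugate pair constructed in the worked example below):

```python
import numpy as np

# With H-conjugate directions as the columns of D, Q = D^T H D is
# diagonal, so Psi(alpha) separates into one problem per alpha_i.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
D = np.array([[1.0, 1.0],
              [0.0, 4.0]])  # columns: d0 = (1,0)^T, d1 = (1,4)^T

Q = D.T @ H @ D
print(Q)  # off-diagonal entries are zero
```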
$$\Psi(\alpha) = \frac{1}{2}\Big(x^0 + \sum_i \alpha_i d^i\Big)^T H \Big(x^0 + \sum_i \alpha_i d^i\Big) + c^T\Big(x^0 + \sum_i \alpha_i d^i\Big)$$
$$= \frac{1}{2}\sum_i \Big[(x^0 + \alpha_i d^i)^T H (x^0 + \alpha_i d^i) + 2 c^T (x^0 + \alpha_i d^i)\Big] + \text{constant}$$
(by $H$-conjugacy the cross terms ${d^i}^T H d^j$, $i \neq j$, vanish, and the remaining constants are collected into the last term)
$$\frac{\partial \Psi}{\partial \alpha_i} = 0 \;\Rightarrow\; \alpha_i^* = -\frac{{d^i}^T (Hx^0 + c)}{{d^i}^T H d^i}$$
Therefore,
$$x^* = x^0 + \sum_{i=0}^{n-1} \alpha_i^* d^i$$
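The closed-form steps $\alpha_i^*$ can be checked numerically; a minimal sketch, assuming the example quadratic $H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$, $c = 0$, and the $H$-conjugate pair $d^0 = (1,0)^T$, $d^1 = (1,4)^T$ from the worked example below:

```python
import numpy as np

# Reach x* of f(x) = 0.5 x^T H x + c^T x in n steps along H-conjugate
# directions, using alpha_i* = -d_i^T (H x0 + c) / d_i^T H d_i.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
c = np.zeros(2)
x0 = np.array([-1.0, -1.0])
D = [np.array([1.0, 0.0]), np.array([1.0, 4.0])]  # H-conjugate pair

x = x0.copy()
for d in D:
    alpha = -d @ (H @ x0 + c) / (d @ H @ d)  # uses the gradient at x0
    x = x + alpha * d
print(x)  # lands on the minimizer x* = (0, 0)
```

Note that every $\alpha_i^*$ uses the gradient at $x^0$; conjugacy is what makes these decoupled steps add up to $x^*$.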
Definition
Let $H \in \mathbb{R}^{n \times n}$ be a symmetric matrix. The vectors $\{d^0, d^1, \dots, d^{n-1}\}$ are said to be $H$-conjugate if they are linearly independent and ${d^i}^T H d^j = 0\ \forall\, i \neq j$.
Example: Consider the problem,
$$\min_x f(x) \triangleq 4x_1^2 + x_2^2 - 2x_1x_2, \qquad H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$$
Let $x^0 = (-1, -1)^T$ and $d^0 = (1, 0)^T$.
$x^1 = x^0 + \alpha^0 d^0$, where
$$\alpha^0 = \arg\min_\alpha \phi_0(\alpha) \triangleq f(x^0 + \alpha d^0)$$
$$\phi_0(\alpha) = f\begin{pmatrix} x_1^0 + \alpha d_1^0 \\ x_2^0 + \alpha d_2^0 \end{pmatrix} = 4(\alpha - 1)^2 + 1 + 2(\alpha - 1)$$
$$\phi_0'(\alpha) = 0 \;\Rightarrow\; \alpha^0 = \tfrac{3}{4} \;\Rightarrow\; x^1 = (-\tfrac{1}{4}, -1)^T$$
Choose a non-zero direction $d^1$ such that ${d^1}^T H d^0 = 0$. Let $d^1 = (a, b)^T$. Therefore,
$$(a\ \ b)\begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0 \;\Rightarrow\; 8a - 2b = 0$$
Let $d^1 = (1, 4)^T$ and $x^2 = x^1 + \alpha^1 d^1$, where
$$\alpha^1 = \arg\min_\alpha \phi_1(\alpha) \triangleq f\begin{pmatrix} \alpha - \tfrac{1}{4} \\ 4\alpha - 1 \end{pmatrix} = \tfrac{3}{4}(4\alpha - 1)^2$$
$$\phi_1'(\alpha) = 0 \;\Rightarrow\; \alpha^1 = \tfrac{1}{4}$$
$$x^2 = x^1 + \alpha^1 d^1 = (0, 0)^T = x^*$$
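The two exact line searches above can be reproduced numerically; a sketch (not from the original slides) evaluating $\alpha^k = -{g^k}^T d^k / {d^k}^T H d^k$ at each iterate:

```python
import numpy as np

# The worked example: f(x) = 0.5 x^T H x, exact line search first along
# d0 = (1,0)^T and then along the H-conjugate direction d1 = (1,4)^T.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
alphas = []
for d in [np.array([1.0, 0.0]), np.array([1.0, 4.0])]:
    g = H @ x                       # gradient of 0.5 x^T H x
    alpha = -(g @ d) / (d @ H @ d)  # exact line search step length
    alphas.append(float(alpha))
    x = x + alpha * d
print(alphas, x)  # step lengths 3/4 and 1/4; final iterate x* = (0, 0)
```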
To see that $H$-conjugacy (with $H$ positive definite) forces linear independence, suppose $\sum_{i=0}^{n-1} \mu_i d^i = 0$. Premultiplying by ${d^j}^T H$,
$$\sum_{i=0}^{n-1} \mu_i {d^j}^T H d^i = 0 \text{ for every } j = 0, \dots, n-1$$
$$\Rightarrow\; \mu_j {d^j}^T H d^j = 0 \;\Rightarrow\; \mu_j = 0 \text{ for every } j = 0, \dots, n-1$$
$$\Rightarrow\; d^0, d^1, \dots, d^{n-1} \text{ are linearly independent}$$
$$\min_x f(x) \triangleq \frac{1}{2} x^T H x + c^T x, \quad H \text{ a symmetric positive definite matrix}$$
Let $d^0, d^1, \dots, d^{n-1}$ be $H$-conjugate. $\therefore\ d^0, d^1, \dots, d^{n-1}$ are linearly independent.
Let $\mathcal{B}^k$ denote the subspace spanned by $d^0, d^1, \dots, d^{k-1}$. Clearly, $\mathcal{B}^k \subset \mathcal{B}^{k+1}$.
Let $x^0 \in \mathbb{R}^n$ be an arbitrary point, and let $x^{k+1} = x^k + \alpha^k d^k$, where $\alpha^k$ is obtained by exact line search:
$$\alpha^k = \arg\min_\alpha f(x^k + \alpha d^k)$$
Claim:
$$x^k = \arg\min_x f(x) \quad \text{s.t. } x \in x^0 + \mathcal{B}^k$$
Therefore,
$$\nabla f(x^k + \alpha^k d^k)^T d^k = 0 \;\Rightarrow\; {g^{k+1}}^T d^k = 0 \quad \forall\, k = 0, \dots, n-1$$
$$x^k = x^{k-1} + \alpha^{k-1} d^{k-1} = x^j + \sum_{i=j}^{k-1} \alpha^i d^i \quad (j = 0, \dots, k-1)$$
$$\therefore\; Hx^k + c = Hx^j + c + \sum_{i=j}^{k-1} \alpha^i H d^i$$
$$\therefore\; g^k = g^j + \sum_{i=j}^{k-1} \alpha^i H d^i$$
$$\therefore\; {g^k}^T d^{j-1} = {g^j}^T d^{j-1} + \sum_{i=j}^{k-1} \alpha^i {d^i}^T H d^{j-1} = 0$$
Therefore, ${g^k}^T d^j = 0\ \forall\, j = 0, \dots, k-1$, i.e. $g^k \perp \mathcal{B}^k$.
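This orthogonality can be observed numerically; a sketch using the $H$ and the conjugate directions of the worked example (assumed values, not part of the original slides):

```python
import numpy as np

# After each exact-line-search step along H-conjugate directions, the
# new gradient is orthogonal to every direction used so far (g^k ⊥ B^k).
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
used, checks = [], []
for d in [np.array([1.0, 0.0]), np.array([1.0, 4.0])]:
    g = H @ x
    x = x + (-(g @ d) / (d @ H @ d)) * d   # exact line search step
    used.append(d)
    g_next = H @ x                         # gradient at the new iterate
    checks.extend(float(g_next @ dj) for dj in used)
print(checks)  # every inner product is zero
```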
Note that for every $j = 0, \dots, n-1$, exact line search gives
$$f(x^j + \alpha^j d^j) \le f(x^j + \mu^j d^j), \quad \mu^j \in \mathbb{R}$$
$$\therefore\; f(x^j) + \alpha^j {g^j}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le f(x^j) + \mu^j {g^j}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j$$
We need to show that $f(x^k) \le f(x)\ \forall\, x \in x^0 + \mathcal{B}^k$, or
$$f\Big(x^0 + \sum_{j=0}^{k-1} \alpha^j d^j\Big) \le f\Big(x^0 + \sum_{j=0}^{k-1} \mu^j d^j\Big), \quad \mu^j \in \mathbb{R}\ \forall\, j.$$
That is,
$$f(x^0) + \sum_{j=0}^{k-1}\Big(\alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j\Big) \le f(x^0) + \sum_{j=0}^{k-1}\Big(\mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j\Big)$$
where $\mu^j \in \mathbb{R}\ \forall\, j$.
For every $j = 0, \dots, n-1$,
$$f(x^j) + \alpha^j {g^j}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le f(x^j) + \mu^j {g^j}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j$$
Suppose ${g^j}^T d^j = {g^0}^T d^j\ \forall\, j$.
$$\therefore\; \alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le \mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j \quad \forall\, j$$
Therefore,
$$f(x^0) + \sum_{j=0}^{k-1}\Big(\alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j\Big) \le f(x^0) + \sum_{j=0}^{k-1}\Big(\mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j\Big)$$
$$\therefore\; f\Big(x^0 + \sum_{j=0}^{k-1} \alpha^j d^j\Big) \le f\Big(x^0 + \sum_{j=0}^{k-1} \mu^j d^j\Big), \quad \mu^j \in \mathbb{R}\ \forall\, j$$
$$\therefore\; f(x^k) \le f(x) \quad \forall\, x \in x^0 + \mathcal{B}^k$$
We need to show that
$${g^j}^T d^j = {g^0}^T d^j \quad \forall\, j$$
Consider $x^j = x^0 + \sum_{i=0}^{j-1} \alpha^i d^i$.
$$\therefore\; Hx^j + c = Hx^0 + c + \sum_{i=0}^{j-1} \alpha^i H d^i$$
$$\therefore\; g^j = g^0 + \sum_{i=0}^{j-1} \alpha^i H d^i$$
$$\therefore\; {g^j}^T d^j = {g^0}^T d^j + \sum_{i=0}^{j-1} \alpha^i {d^i}^T H d^j$$
$$\therefore\; {g^j}^T d^j = {g^0}^T d^j \quad \forall\, j$$
Let $d^0, \dots, d^{n-1}$ be $H$-conjugate and $x^{k+1} = x^k + \alpha^k d^k$, $\forall\, k = 0, \dots, n-1$, with $\alpha^k$ obtained by exact line search. Then, for all $k = 0, \dots, n-1$,
1 ${g^k}^T d^j = 0$, $j = 0, \dots, k-1$
2 ${g^k}^T d^k = {g^0}^T d^k$
3 $$\alpha^k = \frac{{d^k}^T H (x^* - x^0)}{{d^k}^T H d^k}$$
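Property 3 can be verified on the worked example ($x^* = (0,0)^T$, $x^0 = (-1,-1)^T$, exact-line-search steps $\tfrac{3}{4}$ and $\tfrac{1}{4}$); a minimal sketch:

```python
import numpy as np

# alpha^k = d^k^T H (x* - x^0) / d^k^T H d^k should reproduce the
# exact-line-search step lengths 3/4 and 1/4 of the example.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x0 = np.array([-1.0, -1.0])
x_star = np.array([0.0, 0.0])
alphas = [float(d @ H @ (x_star - x0) / (d @ H @ d))
          for d in (np.array([1.0, 0.0]), np.array([1.0, 4.0]))]
print(alphas)  # [0.75, 0.25]
```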
Suppose that after $k$ iterative steps and obtaining $k$ $H$-conjugate directions,
$$x^k - x^0 = \sum_{i=0}^{k-1} \alpha^i d^i$$
$$\therefore\; {d^k}^T H(x^k - x^0) = 0$$
Therefore,
$$d^k = -g^k + \frac{{g^k}^T g^k}{{g^{k-1}}^T g^{k-1}}\, d^{k-1}, \quad k = 1, \dots, n-1$$
(Fletcher-Reeves method)
Conjugate Gradient Algorithm (Fletcher-Reeves)
For the quadratic function $\frac{1}{2}x^T H x + c^T x$, $H$ symmetric positive definite:
(1) Initialize $x^0$, $\epsilon$, $d^0 = -g^0$, set $k := 0$.
(2) while $\|g^k\| > \epsilon$
(a) $\alpha^k = -\dfrac{{g^k}^T d^k}{{d^k}^T H d^k}$
(b) $x^{k+1} = x^k + \alpha^k d^k$
(c) $g^{k+1} = Hx^{k+1} + c$, $\beta^{k+1} = \dfrac{{g^{k+1}}^T g^{k+1}}{{g^k}^T g^k}$, $d^{k+1} = -g^{k+1} + \beta^{k+1} d^k$, $k := k+1$
endwhile
Output $x^k$.
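The algorithm can be sketched directly in NumPy (a minimal illustrative version; the stopping tolerance `eps` is an assumed parameter):

```python
import numpy as np

# Fletcher-Reeves conjugate gradient for the quadratic
# f(x) = 0.5 x^T H x + c^T x, with H symmetric positive definite.
def conjugate_gradient_fr(H, c, x0, eps=1e-10):
    x = np.array(x0, dtype=float)
    g = H @ x + c
    d = -g
    while np.linalg.norm(g) > eps:
        alpha = -(g @ d) / (d @ H @ d)   # exact line search
        x = x + alpha * d
        g_new = H @ x + c
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves beta
        d = -g_new + beta * d
        g = g_new
    return x

H = np.array([[8.0, -2.0], [-2.0, 2.0]])
c = np.array([0.0, 0.0])
x_min = conjugate_gradient_fr(H, c, [-1.0, -1.0])
print(x_min)  # converges to x* = (0, 0) in n = 2 steps
```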
(Figures: contour plots of the quadratic in the $(x_1, x_2)$-plane over $[-3, 3] \times [-3, 3]$, illustrating the iterates.)
Fletcher-Reeves method:
$$\beta^k_{FR} = \frac{{g^k}^T g^k}{{g^{k-1}}^T g^{k-1}}$$
Polak-Ribiere method:
$$\beta^k_{PR} = \frac{{g^k}^T (g^k - g^{k-1})}{{g^{k-1}}^T g^{k-1}}$$
Hestenes-Stiefel method:
$$\beta^k_{HS} = \frac{{g^k}^T (g^k - g^{k-1})}{(g^k - g^{k-1})^T d^{k-1}}$$
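For a quadratic minimized with exact line searches the three formulas coincide, since ${g^k}^T g^{k-1} = 0$ and ${g^k}^T d^{k-1} = 0$; a one-step numerical sketch on the example quadratic (assumed values, not from the original slides):

```python
import numpy as np

# One exact-line-search step of steepest descent on 0.5 x^T H x,
# then the three beta formulas evaluated at the new gradient.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
g = H @ x
d = -g
alpha = -(g @ d) / (d @ H @ d)
x = x + alpha * d
g_new = H @ x

beta_fr = (g_new @ g_new) / (g @ g)
beta_pr = (g_new @ (g_new - g)) / (g @ g)
beta_hs = (g_new @ (g_new - g)) / ((g_new - g) @ d)
print(beta_fr, beta_pr, beta_hs)  # all three are equal
```

The formulas differ only away from this idealized setting (non-quadratic $f$, inexact line search), which is where the PR and HS variants are usually preferred in practice.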
With $\delta = x^k - x^{k-1}$ and $\gamma = g^k - g^{k-1}$, the BFGS inverse Hessian update applied to $B^{k-1} = I$ (memoryless BFGS) gives
$$B^k_{BFGS} = I + \Big(1 + \frac{\gamma^T \gamma}{\delta^T \gamma}\Big)\frac{\delta\delta^T}{\delta^T \gamma} - \frac{\delta\gamma^T + \gamma\delta^T}{\delta^T \gamma}$$
With exact line search, $\delta^T g^k = \alpha^{k-1} {d^{k-1}}^T g^k = 0$. Therefore,
$$d^k_{BFGS} = -B^k_{BFGS}\, g^k = -g^k + \frac{\delta \gamma^T g^k}{\delta^T \gamma} = -g^k + \underbrace{\frac{{g^k}^T (g^k - g^{k-1})}{(g^k - g^{k-1})^T d^{k-1}}}_{\beta^k_{HS}}\, d^{k-1}$$
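This identity can be checked numerically on one step of the example quadratic (a sketch; $\delta$ and $\gamma$ are computed from an exact-line-search step, assumed values as before):

```python
import numpy as np

# With exact line search, the memoryless-BFGS direction -B g^k equals
# the Hestenes-Stiefel CG direction -g^k + beta_HS d^{k-1}.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x_old = np.array([-1.0, -1.0])
g_old = H @ x_old
d_old = -g_old
alpha = -(g_old @ d_old) / (d_old @ H @ d_old)  # exact line search
x = x_old + alpha * d_old
g = H @ x

delta = x - x_old            # delta = x^k - x^{k-1}
gamma = g - g_old            # gamma = g^k - g^{k-1}
dg = delta @ gamma
n = len(x)
B = (np.eye(n)
     + (1 + gamma @ gamma / dg) * np.outer(delta, delta) / dg
     - (np.outer(delta, gamma) + np.outer(gamma, delta)) / dg)

d_bfgs = -B @ g
beta_hs = (g @ gamma) / (gamma @ d_old)
d_hs = -g + beta_hs * d_old
print(d_bfgs, d_hs)  # the two directions coincide
```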