Unconstrained Optimization
Shirish Shevade
Computer Science and Automation
Indian Institute of Science
Bangalore 560 012, India.
$$\min_x f(x)$$
where $f : \mathbb{R}^n \to \mathbb{R}$, $f \in C^1$.
Idea:
1 For every coordinate variable $x_i$, $i = 1, \dots, n$, minimize $f(x)$ w.r.t. $x_i$, keeping the other coordinate variables $x_j$, $j \neq i$, constant.
2 Repeat the procedure in step 1 until some stopping condition is satisfied.
For Example 2 below, one sweep of this procedure from $x^0 = (-1, -1)^T$ reaches $(-\frac{1}{4}, -\frac{1}{4})^T \neq x^*$.
Shirish Shevade Numerical Optimization
Example 1:
$$\min_x f_1(x) \triangleq 4x_1^2 + x_2^2, \qquad H = \begin{pmatrix} 8 & 0 \\ 0 & 2 \end{pmatrix}$$
$x^*$ is attained in at most two steps using the coordinate descent method.
Example 2:
$$\min_x f_2(x) \triangleq 4x_1^2 + x_2^2 - 2x_1x_2, \qquad H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$$
$x^*$ cannot be attained in two steps using the coordinate descent method (if $x^0$ is not on one of the principal axes of the elliptical contours).
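The behaviour on Example 2 can be checked in a few lines of NumPy; a minimal sketch (not from the original slides), writing the quadratic as $f(x) = \frac{1}{2}x^T H x$ with $H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$:

```python
import numpy as np

# Coordinate descent on the quadratic f(x) = 0.5 x^T H x (Example 2):
# minimize exactly over one coordinate at a time, holding the rest fixed.
# For a quadratic, the minimizer over x_i (others fixed) solves
# H_ii x_i + sum_{j != i} H_ij x_j = 0.

def coordinate_descent(H, x0, num_sweeps):
    x = np.array(x0, dtype=float)
    n = len(x)
    for _ in range(num_sweeps):
        for i in range(n):
            # exact minimization of 0.5 x^T H x w.r.t. x_i
            x[i] = -(H[i] @ x - H[i, i] * x[i]) / H[i, i]
    return x

H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = coordinate_descent(H, [-1.0, -1.0], num_sweeps=1)
print(x)  # one sweep from (-1,-1) reaches (-1/4,-1/4), not x* = (0,0)
```

Running more sweeps moves the iterate toward $x^*$, but no finite number of sweeps reaches it exactly from this starting point.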
Writing $x = x^0 + D\alpha$, where the columns of $D$ are the directions $d^0, \dots, d^{n-1}$, the objective becomes
$$\Psi(\alpha) = \frac{1}{2}\,\alpha^T \underbrace{D^T H D}_{Q}\,\alpha + (Hx^0 + c)^T D\alpha + \underbrace{\frac{1}{2}\,{x^0}^T H x^0 + c^T x^0}_{\text{constant}}$$
$$Q = D^T H D = \begin{pmatrix} {d^0}^T H d^0 & {d^0}^T H d^1 & \cdots & {d^0}^T H d^{n-1} \\ {d^1}^T H d^0 & {d^1}^T H d^1 & \cdots & {d^1}^T H d^{n-1} \\ \vdots & \vdots & \ddots & \vdots \\ {d^{n-1}}^T H d^0 & {d^{n-1}}^T H d^1 & \cdots & {d^{n-1}}^T H d^{n-1} \end{pmatrix}$$
$Q$ will be a diagonal matrix if ${d^i}^T H d^j = 0\ \forall\, i \neq j$.
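A quick numerical sketch of this fact (illustrative values: the directions $(1,0)^T$ and $(1,4)^T$ are the $H$-conjugate pair constructed in the worked example below):

```python
import numpy as np

# With H-conjugate directions as the columns of D, Q = D^T H D is
# diagonal, so Psi(alpha) separates into one problem per alpha_i.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
D = np.array([[1.0, 1.0],
              [0.0, 4.0]])  # columns: d0 = (1,0)^T, d1 = (1,4)^T

Q = D.T @ H @ D
print(Q)  # off-diagonal entries are zero
```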
$$\Psi(\alpha) = \frac{1}{2}\Big(x^0 + \sum_i \alpha_i d^i\Big)^T H \Big(x^0 + \sum_i \alpha_i d^i\Big) + c^T\Big(x^0 + \sum_i \alpha_i d^i\Big)$$
$$= \frac{1}{2}\sum_i \Big[(x^0 + \alpha_i d^i)^T H (x^0 + \alpha_i d^i) + 2 c^T (x^0 + \alpha_i d^i)\Big] + \text{constant}$$
(by $H$-conjugacy the cross terms ${d^i}^T H d^j$, $i \neq j$, vanish, and the remaining constants are collected into the last term)
$$\frac{\partial \Psi}{\partial \alpha_i} = 0 \;\Rightarrow\; \alpha_i^* = -\frac{{d^i}^T (Hx^0 + c)}{{d^i}^T H d^i}$$
Therefore,
$$x^* = x^0 + \sum_{i=0}^{n-1} \alpha_i^* d^i$$
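The closed-form steps $\alpha_i^*$ can be checked numerically; a minimal sketch, assuming the example quadratic $H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$, $c = 0$, and the $H$-conjugate pair $d^0 = (1,0)^T$, $d^1 = (1,4)^T$ from the worked example below:

```python
import numpy as np

# Reach x* of f(x) = 0.5 x^T H x + c^T x in n steps along H-conjugate
# directions, using alpha_i* = -d_i^T (H x0 + c) / d_i^T H d_i.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
c = np.zeros(2)
x0 = np.array([-1.0, -1.0])
D = [np.array([1.0, 0.0]), np.array([1.0, 4.0])]  # H-conjugate pair

x = x0.copy()
for d in D:
    alpha = -d @ (H @ x0 + c) / (d @ H @ d)  # uses the gradient at x0
    x = x + alpha * d
print(x)  # lands on the minimizer x* = (0, 0)
```

Note that every $\alpha_i^*$ uses the gradient at $x^0$; conjugacy is what makes these decoupled steps add up to $x^*$.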
Definition
Let $H \in \mathbb{R}^{n \times n}$ be a symmetric matrix. The vectors $\{d^0, d^1, \dots, d^{n-1}\}$ are said to be $H$-conjugate if they are linearly independent and ${d^i}^T H d^j = 0\ \forall\, i \neq j$.
Example: Consider the problem,
$$\min_x f(x) \triangleq 4x_1^2 + x_2^2 - 2x_1x_2, \qquad H = \begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}$$
Let $x^0 = (-1, -1)^T$ and $d^0 = (1, 0)^T$.
$x^1 = x^0 + \alpha^0 d^0$, where
$$\alpha^0 = \arg\min_\alpha \phi_0(\alpha) \triangleq f(x^0 + \alpha d^0)$$
$$\phi_0(\alpha) = f\begin{pmatrix} x_1^0 + \alpha d_1^0 \\ x_2^0 + \alpha d_2^0 \end{pmatrix} = 4(\alpha - 1)^2 + 1 + 2(\alpha - 1)$$
$$\phi_0'(\alpha) = 0 \;\Rightarrow\; \alpha^0 = \tfrac{3}{4} \;\Rightarrow\; x^1 = (-\tfrac{1}{4}, -1)^T$$
Choose a non-zero direction $d^1$ such that ${d^1}^T H d^0 = 0$. Let $d^1 = (a, b)^T$. Therefore,
$$(a\ \ b)\begin{pmatrix} 8 & -2 \\ -2 & 2 \end{pmatrix}\begin{pmatrix} 1 \\ 0 \end{pmatrix} = 0 \;\Rightarrow\; 8a - 2b = 0$$
Let $d^1 = (1, 4)^T$ and $x^2 = x^1 + \alpha^1 d^1$, where
$$\alpha^1 = \arg\min_\alpha \phi_1(\alpha) \triangleq f\begin{pmatrix} \alpha - \tfrac{1}{4} \\ 4\alpha - 1 \end{pmatrix} = \tfrac{3}{4}(4\alpha - 1)^2$$
$$\phi_1'(\alpha) = 0 \;\Rightarrow\; \alpha^1 = \tfrac{1}{4}$$
$$x^2 = x^1 + \alpha^1 d^1 = (0, 0)^T = x^*$$
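The two exact line searches above can be reproduced numerically; a sketch (not from the original slides) evaluating $\alpha^k = -{g^k}^T d^k / {d^k}^T H d^k$ at each iterate:

```python
import numpy as np

# The worked example: f(x) = 0.5 x^T H x, exact line search first along
# d0 = (1,0)^T and then along the H-conjugate direction d1 = (1,4)^T.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
alphas = []
for d in [np.array([1.0, 0.0]), np.array([1.0, 4.0])]:
    g = H @ x                       # gradient of 0.5 x^T H x
    alpha = -(g @ d) / (d @ H @ d)  # exact line search step length
    alphas.append(float(alpha))
    x = x + alpha * d
print(alphas, x)  # step lengths 3/4 and 1/4; final iterate x* = (0, 0)
```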
To see that $H$-conjugacy (with $H$ positive definite) forces linear independence, suppose $\sum_{i=0}^{n-1} \mu_i d^i = 0$. Premultiplying by ${d^j}^T H$,
$$\sum_{i=0}^{n-1} \mu_i {d^j}^T H d^i = 0 \text{ for every } j = 0, \dots, n-1$$
$$\Rightarrow\; \mu_j {d^j}^T H d^j = 0 \;\Rightarrow\; \mu_j = 0 \text{ for every } j = 0, \dots, n-1$$
$$\Rightarrow\; d^0, d^1, \dots, d^{n-1} \text{ are linearly independent}$$
$$\min_x f(x) \triangleq \frac{1}{2} x^T H x + c^T x, \quad H \text{ a symmetric positive definite matrix}$$
Let $d^0, d^1, \dots, d^{n-1}$ be $H$-conjugate. $\therefore\ d^0, d^1, \dots, d^{n-1}$ are linearly independent.
Let $\mathcal{B}^k$ denote the subspace spanned by $d^0, d^1, \dots, d^{k-1}$. Clearly, $\mathcal{B}^k \subset \mathcal{B}^{k+1}$.
Let $x^0 \in \mathbb{R}^n$ be an arbitrary point, and let $x^{k+1} = x^k + \alpha^k d^k$, where $\alpha^k$ is obtained by exact line search:
$$\alpha^k = \arg\min_\alpha f(x^k + \alpha d^k)$$
Claim:
$$x^k = \arg\min_x f(x) \quad \text{s.t. } x \in x^0 + \mathcal{B}^k$$
Therefore,
$$\nabla f(x^k + \alpha^k d^k)^T d^k = 0 \;\Rightarrow\; {g^{k+1}}^T d^k = 0 \quad \forall\, k = 0, \dots, n-1$$
$$x^k = x^{k-1} + \alpha^{k-1} d^{k-1} = x^j + \sum_{i=j}^{k-1} \alpha^i d^i \quad (j = 0, \dots, k-1)$$
$$\therefore\; Hx^k + c = Hx^j + c + \sum_{i=j}^{k-1} \alpha^i H d^i$$
$$\therefore\; g^k = g^j + \sum_{i=j}^{k-1} \alpha^i H d^i$$
$$\therefore\; {g^k}^T d^{j-1} = {g^j}^T d^{j-1} + \sum_{i=j}^{k-1} \alpha^i {d^i}^T H d^{j-1} = 0$$
Therefore, ${g^k}^T d^j = 0\ \forall\, j = 0, \dots, k-1$, i.e. $g^k \perp \mathcal{B}^k$.
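This orthogonality can be observed numerically; a sketch using the $H$ and the conjugate directions of the worked example (assumed values, not part of the original slides):

```python
import numpy as np

# After each exact-line-search step along H-conjugate directions, the
# new gradient is orthogonal to every direction used so far (g^k ⊥ B^k).
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
used, checks = [], []
for d in [np.array([1.0, 0.0]), np.array([1.0, 4.0])]:
    g = H @ x
    x = x + (-(g @ d) / (d @ H @ d)) * d   # exact line search step
    used.append(d)
    g_next = H @ x                         # gradient at the new iterate
    checks.extend(float(g_next @ dj) for dj in used)
print(checks)  # every inner product is zero
```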
Note that for every $j = 0, \dots, n-1$, exact line search gives
$$f(x^j + \alpha^j d^j) \le f(x^j + \mu^j d^j), \quad \mu^j \in \mathbb{R}$$
$$\therefore\; f(x^j) + \alpha^j {g^j}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le f(x^j) + \mu^j {g^j}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j$$
We need to show that $f(x^k) \le f(x)\ \forall\, x \in x^0 + \mathcal{B}^k$, or
$$f\Big(x^0 + \sum_{j=0}^{k-1} \alpha^j d^j\Big) \le f\Big(x^0 + \sum_{j=0}^{k-1} \mu^j d^j\Big), \quad \mu^j \in \mathbb{R}\ \forall\, j.$$
That is,
$$f(x^0) + \sum_{j=0}^{k-1}\Big(\alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j\Big) \le f(x^0) + \sum_{j=0}^{k-1}\Big(\mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j\Big)$$
where $\mu^j \in \mathbb{R}\ \forall\, j$.
For every $j = 0, \dots, n-1$,
$$f(x^j) + \alpha^j {g^j}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le f(x^j) + \mu^j {g^j}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j$$
Suppose ${g^j}^T d^j = {g^0}^T d^j\ \forall\, j$.
$$\therefore\; \alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j \le \mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j \quad \forall\, j$$
Therefore,
$$f(x^0) + \sum_{j=0}^{k-1}\Big(\alpha^j {g^0}^T d^j + \frac{1}{2}{\alpha^j}^2 {d^j}^T H d^j\Big) \le f(x^0) + \sum_{j=0}^{k-1}\Big(\mu^j {g^0}^T d^j + \frac{1}{2}{\mu^j}^2 {d^j}^T H d^j\Big)$$
$$\therefore\; f\Big(x^0 + \sum_{j=0}^{k-1} \alpha^j d^j\Big) \le f\Big(x^0 + \sum_{j=0}^{k-1} \mu^j d^j\Big), \quad \mu^j \in \mathbb{R}\ \forall\, j$$
$$\therefore\; f(x^k) \le f(x) \quad \forall\, x \in x^0 + \mathcal{B}^k$$
We need to show that
$${g^j}^T d^j = {g^0}^T d^j \quad \forall\, j$$
Consider $x^j = x^0 + \sum_{i=0}^{j-1} \alpha^i d^i$.
$$\therefore\; Hx^j + c = Hx^0 + c + \sum_{i=0}^{j-1} \alpha^i H d^i$$
$$\therefore\; g^j = g^0 + \sum_{i=0}^{j-1} \alpha^i H d^i$$
$$\therefore\; {g^j}^T d^j = {g^0}^T d^j + \sum_{i=0}^{j-1} \alpha^i {d^i}^T H d^j$$
$$\therefore\; {g^j}^T d^j = {g^0}^T d^j \quad \forall\, j$$
Let $d^0, \dots, d^{n-1}$ be $H$-conjugate and $x^{k+1} = x^k + \alpha^k d^k$, $\forall\, k = 0, \dots, n-1$, with $\alpha^k$ obtained by exact line search. Then, for all $k = 0, \dots, n-1$,
1 ${g^k}^T d^j = 0$, $j = 0, \dots, k-1$
2 ${g^k}^T d^k = {g^0}^T d^k$
3 $$\alpha^k = \frac{{d^k}^T H (x^* - x^0)}{{d^k}^T H d^k}$$
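Property 3 can be verified on the worked example ($x^* = (0,0)^T$, $x^0 = (-1,-1)^T$, exact-line-search steps $\tfrac{3}{4}$ and $\tfrac{1}{4}$); a minimal sketch:

```python
import numpy as np

# alpha^k = d^k^T H (x* - x^0) / d^k^T H d^k should reproduce the
# exact-line-search step lengths 3/4 and 1/4 of the example.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x0 = np.array([-1.0, -1.0])
x_star = np.array([0.0, 0.0])
alphas = [float(d @ H @ (x_star - x0) / (d @ H @ d))
          for d in (np.array([1.0, 0.0]), np.array([1.0, 4.0]))]
print(alphas)  # [0.75, 0.25]
```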
Suppose that after $k$ iterative steps and obtaining $k$ $H$-conjugate directions,
$$x^k - x^0 = \sum_{i=0}^{k-1} \alpha^i d^i$$
$$\therefore\; {d^k}^T H(x^k - x^0) = 0$$
Therefore,
$$d^k = -g^k + \frac{{g^k}^T g^k}{{g^{k-1}}^T g^{k-1}}\, d^{k-1}, \quad k = 1, \dots, n-1$$
(Fletcher-Reeves method)
Conjugate Gradient Algorithm (Fletcher-Reeves)
For the quadratic function $\frac{1}{2}x^T H x + c^T x$, $H$ symmetric positive definite:
(1) Initialize $x^0$, $\epsilon$, $d^0 = -g^0$, set $k := 0$.
(2) while $\|g^k\| > \epsilon$
(a) $\alpha^k = -\dfrac{{g^k}^T d^k}{{d^k}^T H d^k}$
(b) $x^{k+1} = x^k + \alpha^k d^k$
(c) $g^{k+1} = Hx^{k+1} + c$, $\beta^{k+1} = \dfrac{{g^{k+1}}^T g^{k+1}}{{g^k}^T g^k}$, $d^{k+1} = -g^{k+1} + \beta^{k+1} d^k$, $k := k+1$
endwhile
Output $x^k$.
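The algorithm can be sketched directly in NumPy (a minimal illustrative version; the stopping tolerance `eps` is an assumed parameter):

```python
import numpy as np

# Fletcher-Reeves conjugate gradient for the quadratic
# f(x) = 0.5 x^T H x + c^T x, with H symmetric positive definite.
def conjugate_gradient_fr(H, c, x0, eps=1e-10):
    x = np.array(x0, dtype=float)
    g = H @ x + c
    d = -g
    while np.linalg.norm(g) > eps:
        alpha = -(g @ d) / (d @ H @ d)   # exact line search
        x = x + alpha * d
        g_new = H @ x + c
        beta = (g_new @ g_new) / (g @ g)  # Fletcher-Reeves beta
        d = -g_new + beta * d
        g = g_new
    return x

H = np.array([[8.0, -2.0], [-2.0, 2.0]])
c = np.array([0.0, 0.0])
x_min = conjugate_gradient_fr(H, c, [-1.0, -1.0])
print(x_min)  # converges to x* = (0, 0) in n = 2 steps
```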
(Figures: contour plots of the quadratic in the $(x_1, x_2)$-plane over $[-3, 3] \times [-3, 3]$, illustrating the iterates.)
Fletcher-Reeves method:
$$\beta^k_{FR} = \frac{{g^k}^T g^k}{{g^{k-1}}^T g^{k-1}}$$
Polak-Ribiere method:
$$\beta^k_{PR} = \frac{{g^k}^T (g^k - g^{k-1})}{{g^{k-1}}^T g^{k-1}}$$
Hestenes-Stiefel method:
$$\beta^k_{HS} = \frac{{g^k}^T (g^k - g^{k-1})}{(g^k - g^{k-1})^T d^{k-1}}$$
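For a quadratic minimized with exact line searches the three formulas coincide, since ${g^k}^T g^{k-1} = 0$ and ${g^k}^T d^{k-1} = 0$; a one-step numerical sketch on the example quadratic (assumed values, not from the original slides):

```python
import numpy as np

# One exact-line-search step of steepest descent on 0.5 x^T H x,
# then the three beta formulas evaluated at the new gradient.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x = np.array([-1.0, -1.0])
g = H @ x
d = -g
alpha = -(g @ d) / (d @ H @ d)
x = x + alpha * d
g_new = H @ x

beta_fr = (g_new @ g_new) / (g @ g)
beta_pr = (g_new @ (g_new - g)) / (g @ g)
beta_hs = (g_new @ (g_new - g)) / ((g_new - g) @ d)
print(beta_fr, beta_pr, beta_hs)  # all three are equal
```

The formulas differ only away from this idealized setting (non-quadratic $f$, inexact line search), which is where the PR and HS variants are usually preferred in practice.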
With $\delta = x^k - x^{k-1}$ and $\gamma = g^k - g^{k-1}$, the BFGS inverse Hessian update applied to $B^{k-1} = I$ (memoryless BFGS) gives
$$B^k_{BFGS} = I + \Big(1 + \frac{\gamma^T \gamma}{\delta^T \gamma}\Big)\frac{\delta\delta^T}{\delta^T \gamma} - \frac{\delta\gamma^T + \gamma\delta^T}{\delta^T \gamma}$$
With exact line search, $\delta^T g^k = \alpha^{k-1} {d^{k-1}}^T g^k = 0$. Therefore,
$$d^k_{BFGS} = -B^k_{BFGS}\, g^k = -g^k + \frac{\delta \gamma^T g^k}{\delta^T \gamma} = -g^k + \underbrace{\frac{{g^k}^T (g^k - g^{k-1})}{(g^k - g^{k-1})^T d^{k-1}}}_{\beta^k_{HS}}\, d^{k-1}$$
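This identity can be checked numerically on one step of the example quadratic (a sketch; $\delta$ and $\gamma$ are computed from an exact-line-search step, assumed values as before):

```python
import numpy as np

# With exact line search, the memoryless-BFGS direction -B g^k equals
# the Hestenes-Stiefel CG direction -g^k + beta_HS d^{k-1}.
H = np.array([[8.0, -2.0], [-2.0, 2.0]])
x_old = np.array([-1.0, -1.0])
g_old = H @ x_old
d_old = -g_old
alpha = -(g_old @ d_old) / (d_old @ H @ d_old)  # exact line search
x = x_old + alpha * d_old
g = H @ x

delta = x - x_old            # delta = x^k - x^{k-1}
gamma = g - g_old            # gamma = g^k - g^{k-1}
dg = delta @ gamma
n = len(x)
B = (np.eye(n)
     + (1 + gamma @ gamma / dg) * np.outer(delta, delta) / dg
     - (np.outer(delta, gamma) + np.outer(gamma, delta)) / dg)

d_bfgs = -B @ g
beta_hs = (g @ gamma) / (gamma @ d_old)
d_hs = -g + beta_hs * d_old
print(d_bfgs, d_hs)  # the two directions coincide
```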