where the terms $d_k^T Q d_j = 0$ for $k \neq j$ by the $Q$-conjugacy property. Hence $d_k^T Q (x^* - x_0) = \alpha_k d_k^T Q d_k$ gives
$$\alpha_k = \frac{d_k^T Q (x^* - x_0)}{d_k^T Q d_k};$$
indeed, $d_k^T Q d_k > 0$.
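Note that
$$x^{(k)} - x^{(0)} = \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{k-1} d_{k-1}.$$
Thus
$$x^* - x_0 = (x^* - x_k) + (x_k - x_0).$$
Premultiplying both sides of the above equation by $d_k^T Q$, we have
$$d_k^T Q (x^* - x_0) = d_k^T Q (x^* - x_k) = -d_k^T g_k,$$
because $d_k^T Q (x_k - x_0) = 0$ by $Q$-conjugacy and $Q(x^* - x_k) = b - Q x_k = -g_k$.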
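Consequently, after $n$ steps,
$$\begin{aligned}
x^{(n)} &= x^{(n-1)} + \alpha_{n-1} d_{n-1} \\
&= x^{(0)} + \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{n-1} d_{n-1} \\
&= x^{(0)} + (x^* - x^{(0)}) = x^*.
\end{aligned}$$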
Remark 3 The basic idea is that the minimization over $\mathbb{R}^n$ of the quadratic function
$f(x) = c + b^T x + \frac{1}{2} x^T Q x$, with $Q$ symmetric positive definite, can be split into $n$
minimizations over $\mathbb{R}$. This is done with the help of $n$ directions that are conjugate with
respect to $Q$. Along each direction an exact line search is performed in closed
form.
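The splitting described in Remark 3 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (`dot`, `matvec`, `conjugate_directions`, `minimise_quadratic`) and the 3 by 3 test matrix are hypothetical, and the conjugate directions are produced by Gram-Schmidt in the $Q$ inner product, which is one possible construction and not necessarily the one intended in these notes. The sign convention follows the remark, so the gradient is $Qx + b$.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

def conjugate_directions(Q):
    """Q-orthogonalise the standard basis (Gram-Schmidt in the Q inner product)."""
    n = len(Q)
    dirs = []
    for i in range(n):
        d = [1.0 if j == i else 0.0 for j in range(n)]
        for p in dirs:
            Qp = matvec(Q, p)
            coef = dot(d, Qp) / dot(p, Qp)
            d = [dj - coef * pj for dj, pj in zip(d, p)]
        dirs.append(d)
    return dirs

def minimise_quadratic(Q, b, x0):
    """Minimise c + b^T x + 0.5 x^T Q x with n exact one-dimensional line searches."""
    x = list(x0)
    for d in conjugate_directions(Q):
        g = [qi + bi for qi, bi in zip(matvec(Q, x), b)]  # gradient Qx + b
        alpha = -dot(g, d) / dot(d, matvec(Q, d))         # closed-form exact step
        x = [xi + alpha * di for xi, di in zip(x, d)]
    return x

# Hypothetical symmetric positive definite test problem.
Q = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b = [-1.0, -2.0, -3.0]
x_min = minimise_quadratic(Q, b, [0.0, 0.0, 0.0])
grad = [qi + bi for qi, bi in zip(matvec(Q, x_min), b)]
print(x_min, grad)
```

After the three line searches the gradient should vanish up to rounding, which is the sense in which the $n$-dimensional problem has been reduced to $n$ scalar problems.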
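Remark 4 Suppose that we start at $x^{(0)}$ and search in the direction $d_0$ to obtain
$$x^{(1)} = x^{(0)} + \alpha_0 d_0,$$
where the step size is calculated with the help of the following formula:
$$\alpha_0 = -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, d_0 \rangle_Q}.$$
We claim that
$$\langle \nabla f(x^{(1)}), d_0 \rangle = \langle g^{(1)}, d_0 \rangle = 0.$$
To see this, note that $g^{(1)} = Q x^{(1)} - b = g^{(0)} + \alpha_0 Q d_0$, so $\langle g^{(1)}, d_0 \rangle = \langle g^{(0)}, d_0 \rangle + \alpha_0 \langle d_0, d_0 \rangle_Q = 0$ by the choice of $\alpha_0$.
The equation $\langle g^{(1)}, d_0 \rangle = 0$ implies that $\alpha_0$ has the property that $\phi_0'(\alpha_0) = 0$,
that is,
$$f(x^{(1)}) = f(x^{(0)} + \alpha_0 d_0) = \min_{\alpha \in \mathbb{R}} f(x^{(0)} + \alpha d_0),$$
where
$$\phi_0(\alpha) = f(x^{(0)} + \alpha d_0).$$
That is, $\alpha_0$ is the exact minimizer of $\phi_0(\alpha) = f(x^{(0)} + \alpha d_0)$.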
Evaluating at $\alpha_0$, we have
$$\phi_0'(\alpha_0) = \nabla f(x^{(0)} + \alpha_0 d_0)^T d_0 = \nabla f(x^{(1)})^T d_0 = \langle g^{(1)}, d_0 \rangle = 0$$
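and
$$\alpha_k = \arg\min_{\alpha \in \mathbb{R}} \phi_k(\alpha), \qquad x^{(k+1)} = x^{(k)} + \alpha_k d_k,$$
where the step size at each step is calculated with the help of the following formula:
$$\alpha_k = -\frac{\langle g_k, d_k \rangle}{\langle d_k, d_k \rangle_Q}.$$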
Here
$$\phi_k(\alpha) = f(x^{(k)} + \alpha d_k).$$
That is, $\alpha_k$ is the exact minimizer of $\phi_k(\alpha) = f(x^{(k)} + \alpha d_k)$; the step size is found with the help of an exact line search. Note that
$$g_k = \nabla f(x^{(k)}) = Q x^{(k)} - b.$$
With the search direction $d_k$ chosen, we need to compute the step size $\alpha_k$. To
this end, we consider
$$\phi(\alpha) = f(x^{(k)} + \alpha d_k) = \frac{1}{2}(x^{(k)} + \alpha d_k)^T Q (x^{(k)} + \alpha d_k) - (x^{(k)} + \alpha d_k)^T b.$$
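This is a quadratic function of $\alpha$. Now
$$\begin{aligned}
\phi'(\alpha) &= \nabla f(x^{(k)} + \alpha d_k)^T d_k = [Q(x^{(k)} + \alpha d_k) - b]^T d_k \\
&= [Q x^{(k)} + \alpha Q d_k - b]^T d_k = (x^{(k)})^T Q d_k + \alpha\, d_k^T Q d_k - b^T d_k
\end{aligned}$$
and $\phi''(\alpha) = d_k^T Q d_k > 0$.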
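Its minimizer $\alpha_k$ is the value of $\alpha$ at which the derivative of $f(x^{(k)} + \alpha d_k)$ with respect to $\alpha$ vanishes. Therefore
$$\left.\frac{d}{d\alpha} f(x^{(k)} + \alpha d_k)\right|_{\alpha = \alpha_k} = \nabla f(x^{(k+1)})^T d_k = (g_{k+1})^T d_k = \langle g_{k+1}, d_k \rangle = 0.$$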
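Note that $\phi'(\alpha_k) = (x^{(k)})^T Q d_k + \alpha_k d_k^T Q d_k - b^T d_k = 0$ gives
$$\begin{aligned}
\alpha_k &= \frac{b^T d_k - (x^{(k)})^T Q d_k}{d_k^T Q d_k}
= -\frac{(x^{(k)})^T Q d_k - b^T d_k}{d_k^T Q d_k}
= -\frac{\big((x^{(k)})^T Q - b^T\big) d_k}{d_k^T Q d_k} \\
&= -\frac{\nabla f(x^{(k)})^T d_k}{d_k^T Q d_k}
= -\frac{\langle g_k, d_k \rangle}{d_k^T Q d_k}.
\end{aligned}$$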
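The closed-form step size can be checked numerically. The following is a minimal sketch, assuming the convention $f(x) = \frac{1}{2} x^T Q x - b^T x$ used in the derivation above; the helper names `exact_step` and `phi_prime` are hypothetical, and the data are the starting point and first direction of the worked example that appears later in these notes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

def exact_step(Q, b, x, d):
    """alpha_k = -<g_k, d_k> / (d_k^T Q d_k), with g_k = Q x - b."""
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]
    return -dot(g, d) / dot(d, matvec(Q, d))

def phi_prime(Q, b, x, d, alpha):
    """Derivative of phi(alpha) = f(x + alpha d), i.e. grad f(x + alpha d)^T d."""
    y = [xi + alpha * di for xi, di in zip(x, d)]
    g = [qi - bi for qi, bi in zip(matvec(Q, y), b)]
    return dot(g, d)

Q = [[1.0, 0.0], [0.0, 9.0]]
b = [0.0, 0.0]
x = [9.0, 1.0]
d = [-9.0, -9.0]          # steepest-descent direction -g at x

alpha = exact_step(Q, b, x, d)
print(alpha, phi_prime(Q, b, x, d, alpha))
```

The derivative of $\phi$ at the computed $\alpha$ should be zero up to rounding, which is exactly the condition $\langle g_{k+1}, d_k \rangle = 0$.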
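Now,
$$\begin{aligned}
Q x^{(k+1)} - Q x^{(k)} &= [Q x^{(k+1)} - b] - [Q x^{(k)} - b], \\
Q[x^{(k+1)} - x^{(k)}] &= g^{(k+1)} - g^{(k)}, \\
Q[\alpha_k d_k] &= g^{(k+1)} - g^{(k)},
\end{aligned}$$
which gives
$$g^{(k+1)} = g^{(k)} + \alpha_k Q d_k, \qquad [g^{(k+1)}]^T = [g^{(k)}]^T + \alpha_k d_k^T Q.$$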
Now, for $0 \le j \le k-1$,
$$[g^{(k+1)}]^T d_j = [g^{(k)}]^T d_j + \alpha_k d_k^T Q d_j = 0 + 0.$$
The first term on the right-hand side is zero by the inductive hypothesis, and the second term is
zero by $Q$-conjugacy. We also know that
$$g_{k+1}^T d_k = 0.$$
Hence $[g^{(k+1)}]^T d_j = 0$ for all $0 \le j \le k$.
Thus, $g^{(k+1)}$ is orthogonal to any vector from the subspace spanned by $\{d_0, d_1, d_2, \ldots, d_k\}$.
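The orthogonality of $g^{(k+1)}$ to all earlier directions can be observed on a small example. The sketch below is illustrative only: the matrix, the right-hand side and the three $Q$-conjugate directions are hypothetical values chosen by hand (one can check that $d_i^T Q d_j = 0$ for $i \neq j$), and the gradient is updated with the recursion $g^{(k+1)} = g^{(k)} + \alpha_k Q d_k$ derived above.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

# Hypothetical data: Q is symmetric positive definite, directions are Q-conjugate.
Q = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 3.0]]
b = [1.0, 1.0, 1.0]
directions = [[1.0, 0.0, 0.0], [1.0, -2.0, 0.0], [0.0, 0.0, 1.0]]

x = [0.0, 0.0, 0.0]
g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]   # g = Qx - b
worst = 0.0                                        # largest |<g^(k+1), d_j>| seen, j <= k

for k, d in enumerate(directions):
    Qd = matvec(Q, d)
    alpha = -dot(g, d) / dot(d, Qd)                # exact line search step
    x = [xi + alpha * di for xi, di in zip(x, d)]
    g = [gi + alpha * qdi for gi, qdi in zip(g, Qd)]   # g^(k+1) = g^(k) + alpha_k Q d_k
    for dj in directions[: k + 1]:
        worst = max(worst, abs(dot(g, dj)))

print(worst, g)
```

Note that the second step size in this run is negative, which is allowed for a general conjugate direction method since the minimization is over all of $\mathbb{R}$.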
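Proposition 7
$$f(x_{k+1}) = \min_{x \in V_k} f(x),$$
where $V_k = x^{(0)} + \operatorname{span}\{d_0, d_1, d_2, \ldots, d_k\}$. Thus we not only have
$$f(x^{(k+1)}) = \min_{\alpha \in \mathbb{R}} f(x^{(k)} + \alpha d_k)$$
but also
$$f(x^{(k+1)}) = \min_{\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}} f\Big(x^{(0)} + \sum_{i=0}^{k} \alpha_i d_i\Big).$$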
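Proof. Define the matrix $D_k = [d_0, d_1, d_2, \ldots, d_k]$. That is, $d_i$ is the $i$th column
of $D_k$. Note that $x_0 + \mathcal{R}(D_k) = V_k$, where $\mathcal{R}(D_k)$ is the column space of $D_k$.
Also
$$x_{k+1} = x^{(0)} + \sum_{i=0}^{k} \alpha_i d_i = x^{(0)} + D_k \alpha,$$
where $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_k]^T$, and a general point of $V_k$ has the form
$$x = x^{(0)} + D_k \alpha, \qquad \alpha \in \mathbb{R}^{k+1}.$$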
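Define
$$\phi_k(\alpha) = f(x^{(0)} + D_k \alpha).$$
Note that $\phi_k(\alpha)$ is a quadratic function and has a unique minimizer that satisfies
the FONC (first-order necessary condition). By the chain rule, evaluated at the vector $\alpha$ of step sizes for which $x^{(0)} + D_k \alpha = x_{k+1}$,
$$\begin{aligned}
D\phi_k(\alpha) &= \nabla f(x^{(0)} + D_k \alpha)^T D_k \\
&= [\nabla f(x_{k+1})]^T D_k \\
&= [g^{(k+1)}]^T D_k.
\end{aligned}$$
Since $g^{(k+1)}$ is orthogonal to each of $d_0, d_1, \ldots, d_k$, this row vector is zero, so the FONC holds at $x_{k+1}$.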
where
$$\alpha_0 = \arg\min_{\alpha \ge 0} f(x^{(0)} + \alpha d_0) = -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, d_0 \rangle_Q}.$$
5. In the next stage, we search in a direction $d_1$ that is $Q$-conjugate to $d_0$.
We choose $d_1$ as a linear combination of $g^{(1)}$ and $d_0$.
6. We then look for a direction $d_2$ that is conjugate to the previous directions
$d_0$ and $d_1$ with respect to the matrix $Q$.
7. Likewise we continue.
8. In general, at the $(k+1)$th stage, we choose $d_{k+1}$ as a linear combination
of $g^{(k+1)}$ and $d_k$ as follows:
$$d_{k+1} = -g^{(k+1)} + \beta_k d_k.$$
The coefficients $\beta_k$ are chosen in such a way that $d_{k+1}$ is $Q$-conjugate
to $d_0, d_1, d_2, \ldots, d_k$. This is done by choosing $\beta_k$ as
$$\beta_k = \frac{(g^{(k+1)})^T Q d_k}{d_k^T Q d_k}.$$
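Since $d_0 = -g^{(0)}$, the exact line search condition $[\nabla f(x^{(0)} + \alpha_0 d_0)]^T d_0 = 0$, that is, $[\nabla f(x^{(1)})]^T d_0 = 0$, gives $(g^{(1)})^T g^{(0)} = 0$.
That is, $g^{(1)}$ and $g^{(0)}$ are orthogonal to each other: $\langle g_1, g_0 \rangle = g_1^T g_0 = 0$.
Equivalently, $\langle g^{(1)}, d_0 \rangle = 0$; that is,
$\langle g^{(k+1)}, d_k \rangle = 0$ holds for $k = 0$.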
We now show that if d0 and d1 are conjugate with respect to Q; then
Note that
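Proof. We shall prove the result by induction. First we show that
$$d_0^T Q d_1 = 0.$$
To this end,
$$d_0^T Q d_1 = d_0^T Q(-g^{(1)} + \beta_0 d_0);$$
substituting the value of $\beta_0$, we have
$$d_0^T Q d_1 = -d_0^T Q g^{(1)} + \frac{(g^{(1)})^T Q d_0}{d_0^T Q d_0}\, d_0^T Q d_0 = 0,$$
where the last step uses the symmetry of $Q$.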
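Consider
$$x^{(k+1)} = x^{(k)} + \alpha_k d_k, \qquad \text{so that} \qquad \frac{x^{(k+1)} - x^{(k)}}{\alpha_k} = d_k.$$
Thus, for $j = 0, 1, \ldots, k-1$, using $d_{k+1} = -g^{(k+1)} + \beta_k d_k$ and $d_k^T Q d_j = 0$,
$$\begin{aligned}
d_{k+1}^T Q d_j &= -(g^{(k+1)})^T Q d_j = -(g^{(k+1)})^T Q \Big(\frac{x^{(j+1)} - x^{(j)}}{\alpha_j}\Big) \\
&= -\frac{1}{\alpha_j} (g^{(k+1)})^T Q (x^{(j+1)} - x^{(j)}) \\
&= -\frac{1}{\alpha_j} (g^{(k+1)})^T [Q x^{(j+1)} - Q x^{(j)}] \\
&= -\frac{1}{\alpha_j} [(g^{(k+1)})^T g^{(j+1)} - (g^{(k+1)})^T g^{(j)}] = 0.
\end{aligned}$$
Hence
$$d_{k+1}^T Q d_j = 0$$
for $j = 0, 1, 2, \ldots, k-1$.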
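Is $d_{k+1}^T Q d_k = 0$? We impose this condition, as it is what is needed to find $\beta_k$. Note that
$$\begin{aligned}
(d_{k+1})^T Q d_k = 0 &\iff (-g_{k+1} + \beta_k d_k)^T Q d_k = 0 \\
&\iff -g_{k+1}^T Q d_k + \beta_k d_k^T Q d_k = 0 \\
&\iff \beta_k d_k^T Q d_k = g_{k+1}^T Q d_k \\
&\iff \beta_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k}.
\end{aligned}$$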
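Putting the step size $\alpha_k$, the gradient recursion and the coefficient $\beta_k$ together gives the following minimal sketch of the conjugate gradient iteration for $f(x) = \frac{1}{2} x^T Q x - b^T x$. The function name `conjugate_gradient`, the stopping tolerance and the 3 by 3 test problem are assumptions made for illustration; the update formulas are the ones derived in these notes.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

def conjugate_gradient(Q, b, x0, tol=1e-24):
    """At most n steps of CG on f(x) = 0.5 x^T Q x - b^T x (Q symmetric positive definite)."""
    x = list(x0)
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]   # g_0 = Q x_0 - b
    d = [-gi for gi in g]                              # d_0 = -g_0
    for _ in range(len(b)):
        if dot(g, g) <= tol:                           # gradient already (numerically) zero
            break
        Qd = matvec(Q, d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd                       # exact line search
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi + alpha * qdi for gi, qdi in zip(g, Qd)]   # g_{k+1} = g_k + alpha_k Q d_k
        beta = dot(g, Qd) / dQd                        # beta_k = g_{k+1}^T Q d_k / d_k^T Q d_k
        d = [-gi + beta * di for gi, di in zip(g, d)]  # d_{k+1} = -g_{k+1} + beta_k d_k
    return x

# Hypothetical test problem.
Q3 = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
b3 = [1.0, 2.0, 3.0]
x3 = conjugate_gradient(Q3, b3, [0.0, 0.0, 0.0])
residual = [qi - bi for qi, bi in zip(matvec(Q3, x3), b3)]
print(x3, residual)
```

In exact arithmetic the loop terminates with the minimizer after at most $n$ steps; in floating point the residual is only small, which is why the check compares against a tolerance.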
Proof. We know that $d_{k+1} = -g^{(k+1)} + \beta_k d_k$. Fix $j \in \{0, 1, 2, \ldots, k\}$. Then
$$d_j = -g_j + \beta_{j-1} d_{j-1}$$
(for $j = 0$ simply $d_0 = -g_0$) gives
$$(g^{(k+1)})^T d_j = -(g^{(k+1)})^T g_j + \beta_{j-1} (g^{(k+1)})^T d_{j-1},$$
which implies
$$0 = -(g^{(k+1)})^T g_j + 0,$$
as $(g^{(k+1)})^T d_{j-1} = 0$ and $(g^{(k+1)})^T d_j = 0$.
Hence $(g^{(k+1)})^T g_j = 0$ for $j \in \{0, 1, 2, \ldots, k\}$.
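Note that
$$\nabla f(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \frac{\partial}{\partial x_2} f(x) \end{bmatrix}
= \begin{bmatrix} x_1 \\ 9x_2 \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \end{bmatrix}
= Qx - b$$
and
$$\nabla^2 f(x) = \begin{bmatrix} \frac{\partial^2}{\partial x_1^2} f(x) & \frac{\partial^2}{\partial x_2 \partial x_1} f(x) \\ \frac{\partial^2}{\partial x_1 \partial x_2} f(x) & \frac{\partial^2}{\partial x_2^2} f(x) \end{bmatrix}
= \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} = Q.$$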
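Suppose we start with $x^{(0)} = \begin{bmatrix} 9 \\ 1 \end{bmatrix}$. Then
$$\begin{aligned}
g_0 &= \nabla f(x_0) = \begin{bmatrix} 9 \\ 9 \end{bmatrix} \\
\Rightarrow\ d_0 &= -g^{(0)} = \begin{bmatrix} -9 \\ -9 \end{bmatrix} \\
\Rightarrow\ Q d_0 &= \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} -9 \\ -9 \end{bmatrix} = \begin{bmatrix} -9 \\ -81 \end{bmatrix} \\
d_0^T Q d_0 &= \begin{bmatrix} -9 \\ -9 \end{bmatrix}^T \begin{bmatrix} -9 \\ -81 \end{bmatrix} = 810 \\
\Rightarrow\ \alpha_0 &= -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, Q d_0 \rangle} = \frac{162}{810} = \frac{2}{10} = \frac{1}{5} \\
\Rightarrow\ x^{(1)} &= x^{(0)} + \alpha_0 d_0 = \begin{bmatrix} 9 \\ 1 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 9 \\ 9 \end{bmatrix} = \frac{4}{5} \begin{bmatrix} 9 \\ -1 \end{bmatrix}.
\end{aligned}$$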
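Now
$$\begin{aligned}
g^{(1)} &= \nabla f(x_1) = \frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \\
\Rightarrow\ d_1 &= -g^{(1)} + \beta_0 d_0 = -\frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}\, d_0 \\
&= -\frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{(36/5)(72)}{810} \begin{bmatrix} -9 \\ -9 \end{bmatrix} \\
&= \begin{bmatrix} -\frac{324}{25} \\ \frac{36}{25} \end{bmatrix} = \frac{36}{25} \begin{bmatrix} -9 \\ 1 \end{bmatrix}.
\end{aligned}$$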
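Now
$$\begin{aligned}
g_1^T d_1 &= \Big(\frac{36}{5}\Big)\Big(\frac{36}{25}\Big) \begin{bmatrix} 1 \\ -1 \end{bmatrix}^T \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \Big(\frac{36}{5}\Big)\Big(\frac{36}{25}\Big)(-10), \\
Q d_1 &= \frac{36}{25} \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \begin{bmatrix} -\frac{324}{25} \\ \frac{324}{25} \end{bmatrix} = \frac{324}{25} \begin{bmatrix} -1 \\ 1 \end{bmatrix}, \\
d_1^T Q d_1 &= \Big(\frac{36}{25}\Big)\Big(\frac{324}{25}\Big) \begin{bmatrix} -9 \\ 1 \end{bmatrix}^T \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \frac{23328}{125}, \\
\Rightarrow\ \alpha_1 &= -\frac{\langle g_1, d_1 \rangle}{\langle d_1, Q d_1 \rangle} = -\frac{(\frac{36}{5})(\frac{36}{25})(-10)}{\frac{23328}{125}} = \frac{5}{9}, \\
x_2 &= x_1 + \alpha_1 d_1 = \frac{4}{5} \begin{bmatrix} 9 \\ -1 \end{bmatrix} + \frac{5}{9}\Big(\frac{36}{25}\Big) \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.
\end{aligned}$$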
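The hand computation above can be reproduced in exact rational arithmetic. This is a small check, assuming $Q = \operatorname{diag}(1, 9)$, $b = 0$ and $x^{(0)} = (9, 1)^T$ as in the example; the variable names are illustrative only.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

Q = [[Fraction(1), Fraction(0)], [Fraction(0), Fraction(9)]]
x0 = [Fraction(9), Fraction(1)]

g0 = matvec(Q, x0)                                   # b = 0, so g = Qx
d0 = [-gi for gi in g0]                              # d0 = -g0
alpha0 = -dot(g0, d0) / dot(d0, matvec(Q, d0))
x1 = [xi + alpha0 * di for xi, di in zip(x0, d0)]

g1 = matvec(Q, x1)
beta0 = dot(g1, matvec(Q, d0)) / dot(d0, matvec(Q, d0))
d1 = [-gi + beta0 * di for gi, di in zip(g1, d0)]    # d1 = -g1 + beta0 d0
alpha1 = -dot(g1, d1) / dot(d1, matvec(Q, d1))
x2 = [xi + alpha1 * di for xi, di in zip(x1, d1)]

print(alpha0, x1, beta0, d1, alpha1, x2)
```

Using `Fraction` avoids rounding, so the intermediate values can be compared directly with the fractions in the text.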
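Here $d_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then $d_1 = \begin{bmatrix} a \\ b \end{bmatrix}$ must satisfy the following condition:
$$d_0^T Q d_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}^T \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = 8a - 4b = 0.$$
In particular we may choose $a = 1$; then $b = 2$. Hence $d_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. It may
be noted that conjugate directions are not unique. If we minimize the objective
function $f$ starting from $x^{(0)} = \begin{bmatrix} -\frac{1}{2} \\ 1 \end{bmatrix}$ along the direction $\begin{bmatrix} 1 \\ 0 \end{bmatrix}$, then
$$x^{(1)} = x^{(0)} + \alpha d_0 = \begin{bmatrix} -\frac{1}{2} \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -\frac{1}{2} + \alpha \\ 1 \end{bmatrix}.$$
Now
$$\begin{aligned}
\phi(\alpha) = f(x^{(0)} + \alpha d_0) = f(x^{(1)}) &= \frac{1}{2} \begin{bmatrix} -\frac{1}{2} + \alpha \\ 1 \end{bmatrix}^T \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} \begin{bmatrix} -\frac{1}{2} + \alpha \\ 1 \end{bmatrix} - \begin{bmatrix} 0 \\ 12 \end{bmatrix}^T \begin{bmatrix} -\frac{1}{2} + \alpha \\ 1 \end{bmatrix} \\
&= (4\alpha - 4)\Big(\alpha - \frac{1}{2}\Big) - 2\alpha + 5 - 12,
\end{aligned}$$
which attains its minimum at $\alpha_0 = 1$. Hence $x^{(1)} = \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$. Now start from
$x^{(1)} = \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix}$ and minimize the objective function along the direction $\begin{bmatrix} 1 \\ 2 \end{bmatrix}$; we
get
$$x^{(2)} = x^{(1)} + \alpha d_1 = \begin{bmatrix} \frac{1}{2} \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ 2 \end{bmatrix}.$$
$f$ attains its minimum value at $\alpha_1 = \frac{1}{2}$ and we get $x^{(2)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, which is a
minimum point of $f$:
$$\nabla f(x^{(2)}) = \begin{bmatrix} 8(1) - 4(2) \\ -12 + 8(2) - 4(1) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$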
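The two line searches of this example can be replayed exactly. The sketch below assumes, as the gradient formula above suggests, that $f(x) = \frac{1}{2} x^T Q x - b^T x$ with $Q = \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix}$ and $b = (0, 12)^T$, and that the starting point is $x^{(0)} = (-\frac{1}{2}, 1)^T$; the helper names are hypothetical.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def matvec(Q, v):
    return [dot(row, v) for row in Q]

Q = [[Fraction(8), Fraction(-4)], [Fraction(-4), Fraction(8)]]
b = [Fraction(0), Fraction(12)]
directions = [[Fraction(1), Fraction(0)], [Fraction(1), Fraction(2)]]  # d0, d1

x = [Fraction(-1, 2), Fraction(1)]
steps, iterates = [], []
for d in directions:
    g = [qi - bi for qi, bi in zip(matvec(Q, x), b)]   # gradient Qx - b
    alpha = -dot(g, d) / dot(d, matvec(Q, d))          # exact line search along d
    x = [xi + alpha * di for xi, di in zip(x, d)]
    steps.append(alpha)
    iterates.append(x)

final_gradient = [qi - bi for qi, bi in zip(matvec(Q, x), b)]
print(steps, iterates, final_gradient)
```

The step sizes and iterates agree with the values found by hand, and the gradient at the final point is exactly zero.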