
1 Unconstrained optimization problems:

2 Conjugate Gradient Method


Remark 1 The solution $x^*$ of the system of linear equations $Qx = b$ minimizes the quadratic form
$$f(x) = \frac{1}{2} x^T Q x - x^T b + c$$
with $Q$ a symmetric positive definite matrix (the solution of the system solves the optimization problem).
Proof. Let $x^*$ be the vector that satisfies $Qx^* = b$ and let $e$ be any nonzero vector in $\mathbb{R}^n$. Then
\begin{align*}
f(x^* + e) &= \frac{1}{2}(x^* + e)^T Q (x^* + e) - (x^* + e)^T b + c \\
&= \frac{1}{2}\big((x^*)^T + e^T\big)\big(Q x^* + Q e\big) - (x^*)^T b - e^T b + c \\
&= \frac{1}{2}(x^*)^T Q x^* + \frac{1}{2}\big[(x^*)^T Q e + e^T Q x^*\big] + \frac{1}{2} e^T Q e - (x^*)^T b - e^T b + c \\
&= \Big[\frac{1}{2}(x^*)^T Q x^* - (x^*)^T b + c\Big] + \frac{1}{2}\big[(x^*)^T Q e + e^T Q x^*\big] + \frac{1}{2} e^T Q e - e^T b \\
&= f(x^*) + e^T b + \frac{1}{2} e^T Q e - e^T b \\
&= f(x^*) + \frac{1}{2} e^T Q e.
\end{align*}
Note that $(x^*)^T Q e = \big((x^*)^T Q e\big)^T = e^T Q^T x^* = e^T Q x^* = e^T b$, so that $\frac{1}{2}\big[(x^*)^T Q e + e^T Q x^*\big] = \frac{1}{2}(2 e^T b) = e^T b$. Since $Q$ is positive definite, $\frac{1}{2} e^T Q e > 0$, and therefore $f(x^*) < f(x^* + e)$ for every nonzero vector $e$ in $\mathbb{R}^n$.


The identity above is also how we express a quadratic function $f$ in a form built around $x^*$ (its minimum point): substituting $e = x - x^*$ gives $f(x) = f(x^*) + \frac{1}{2}(x - x^*)^T Q (x - x^*)$.
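This claim is easy to check numerically. Below is a minimal sketch (assuming NumPy is available; the variable names are illustrative) that builds a random symmetric positive definite $Q$, solves $Qx = b$, and confirms that perturbing the solution in any direction increases $f$ by exactly $\frac{1}{2}e^T Q e$:

```python
import numpy as np

rng = np.random.default_rng(0)

# A random symmetric positive definite Q and right-hand side b.
A = rng.standard_normal((5, 5))
Q = A @ A.T + 5 * np.eye(5)      # A A^T + 5I is symmetric positive definite
b = rng.standard_normal(5)
c = 2.0

def f(x):
    return 0.5 * x @ Q @ x - x @ b + c

x_star = np.linalg.solve(Q, b)   # the solution of Qx = b

# f(x* + e) = f(x*) + (1/2) e^T Q e > f(x*) for every nonzero e.
for _ in range(100):
    e = rng.standard_normal(5)
    assert np.isclose(f(x_star + e) - f(x_star), 0.5 * e @ Q @ e)
    assert f(x_star + e) > f(x_star)
print("f is minimized exactly at the solution of Qx = b")
```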
Theorem 2 Let $x^{(0)}$ be any starting vector in $\mathbb{R}^n$. The basic conjugate direction algorithm converges to the unique $x^*$ (that solves $Qx = b$) in $n$ steps; that is, $x^{(n)} = x^*$.

Proof. Consider $x^* - x^{(0)} \in \mathbb{R}^n$. The vectors $\{d_0, d_1, d_2, \ldots, d_{n-1}\}$ are linearly independent in $\mathbb{R}^n$. Hence
$$x^* - x^{(0)} = \beta_0 d_0 + \beta_1 d_1 + \beta_2 d_2 + \cdots + \beta_{n-1} d_{n-1}.$$
Premultiplying both sides of the above equation by $d_k^T Q$, $0 \le k < n$, we have
$$d_k^T Q (x^* - x^{(0)}) = d_k^T Q (\beta_0 d_0 + \beta_1 d_1 + \cdots + \beta_{n-1} d_{n-1}) = \beta_k\, d_k^T Q d_k,$$
where the terms $d_k^T Q d_j = 0$, $k \ne j$, by the $Q$-conjugacy property. Hence $d_k^T Q (x^* - x^{(0)}) = \beta_k\, d_k^T Q d_k$ gives
$$\beta_k = \frac{d_k^T Q (x^* - x^{(0)})}{d_k^T Q d_k} \qquad (\text{indeed } d_k^T Q d_k > 0).$$
Note that
$$x^{(1)} = x^{(0)} + \alpha_0 d_0 \quad \text{and} \quad x^{(2)} = x^{(1)} + \alpha_1 d_1 = x^{(0)} + \alpha_0 d_0 + \alpha_1 d_1,$$
$$x^{(3)} = x^{(2)} + \alpha_2 d_2 = x^{(0)} + \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2.$$
Continuing,
$$x^{(k)} = x^{(0)} + \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{k-1} d_{k-1}.$$
Therefore
$$x^{(k)} - x^{(0)} = \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{k-1} d_{k-1}.$$
Thus
$$x^* - x^{(0)} = (x^* - x^{(k)}) + (x^{(k)} - x^{(0)}).$$
Premultiplying both sides of the above equation by $d_k^T Q$, we have
\begin{align*}
d_k^T Q (x^* - x^{(0)}) &= d_k^T Q (x^* - x^{(k)}) + d_k^T Q (x^{(k)} - x^{(0)}) \\
&= d_k^T Q (x^* - x^{(k)}) + d_k^T Q (\alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{k-1} d_{k-1}) \\
&= d_k^T [Q x^* - Q x^{(k)}] = -d_k^T g_k,
\end{align*}
because $g_k = Q x^{(k)} - b$, $Q x^* = b$, and $d_k^T Q (\alpha_0 d_0 + \alpha_1 d_1 + \cdots + \alpha_{k-1} d_{k-1}) = 0$ by $Q$-conjugacy. Thus
$$\beta_k = \frac{d_k^T Q (x^* - x^{(0)})}{d_k^T Q d_k} = -\frac{d_k^T g_k}{d_k^T Q d_k} = \alpha_k.$$
Hence
\begin{align*}
x^{(n)} &= x^{(n-1)} + \alpha_{n-1} d_{n-1} \\
&= x^{(0)} + \alpha_0 d_0 + \alpha_1 d_1 + \alpha_2 d_2 + \cdots + \alpha_{n-1} d_{n-1} \\
&= x^{(0)} + \beta_0 d_0 + \beta_1 d_1 + \beta_2 d_2 + \cdots + \beta_{n-1} d_{n-1} \\
&= x^{(0)} + x^* - x^{(0)} = x^*.
\end{align*}
Thus $x^{(n)}$ solves $Qx = b$; that is, $\nabla f(x^{(n)}) = 0$, and hence we have
$$f(x^{(n)}) = \min_{x \in \mathbb{R}^n} f(x).$$
For a quadratic function of $n$ variables, the conjugate direction method reaches the solution after $n$ steps.

Remark 3 The basic idea is that the minimization over $\mathbb{R}^n$ of the quadratic function $f(x) = \frac{1}{2} x^T Q x - x^T b + c$, with $Q$ symmetric positive definite, can be split into $n$ minimizations over $\mathbb{R}$. This is done with the help of $n$ directions conjugate with respect to $Q$; along each direction an exact line search is performed in closed form.
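As a concrete illustration of this splitting, here is a minimal sketch (assuming NumPy; the use of eigenvectors of $Q$ is just one convenient way to obtain $Q$-conjugate directions, not the method developed later in this lecture) that performs $n$ one-dimensional exact line searches and lands on the minimizer, as Theorem 2 predicts:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

# The eigenvectors of a symmetric Q are one convenient set of Q-conjugate
# directions: v_i^T Q v_j = lambda_j * v_i^T v_j = 0 for i != j.
_, V = np.linalg.eigh(Q)
directions = V.T                     # each row is a direction d_k

x = np.zeros(n)                      # starting vector x^(0)
for d in directions:
    g = Q @ x - b                    # gradient at the current iterate
    alpha = -(g @ d) / (d @ Q @ d)   # closed-form exact line search
    x = x + alpha * d                # one minimization over R

# After n one-dimensional searches, x solves Qx = b (up to round-off).
assert np.allclose(x, np.linalg.solve(Q, b))
print("minimizer reached in", n, "one-dimensional searches")
```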

Remark 4 Suppose that we start at $x^{(0)}$ and search in the direction $d_0$ to obtain
$$x^{(1)} = x^{(0)} + \alpha_0 d_0,$$
where the step size is calculated with the help of the formula
$$\alpha_0 = -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, d_0 \rangle_Q}.$$
We claim that
$$\langle \nabla f(x^{(1)}), d_0 \rangle = \langle g^{(1)}, d_0 \rangle = 0.$$
To see this,
\begin{align*}
\langle g^{(1)}, d_0 \rangle &= (g^{(1)})^T d_0 = (Q x^{(1)} - b)^T d_0 \\
&= (x^{(1)})^T Q d_0 - b^T d_0 \\
&= (x^{(0)} + \alpha_0 d_0)^T Q d_0 - b^T d_0 \\
&= (x^{(0)})^T Q d_0 + \alpha_0\, d_0^T Q d_0 - b^T d_0 \\
&= (x^{(0)})^T Q d_0 - \frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, d_0 \rangle_Q}\, d_0^T Q d_0 - b^T d_0 \\
&= (x^{(0)})^T Q d_0 - \langle g^{(0)}, d_0 \rangle - b^T d_0 \\
&= \big[(x^{(0)})^T Q - b^T\big] d_0 - \langle g^{(0)}, d_0 \rangle = \big[Q x^{(0)} - b\big]^T d_0 - \langle g^{(0)}, d_0 \rangle \\
&= (g^{(0)})^T d_0 - \langle g^{(0)}, d_0 \rangle = 0.
\end{align*}
The equation $\langle g^{(1)}, d_0 \rangle = 0$ implies that $\alpha_0$ has the property that
$$\alpha_0 = \arg\min_{\alpha} f(x^{(0)} + \alpha d_0) = \arg\min_{\alpha} \phi_0(\alpha);$$
that is,
$$f(x^{(1)}) = f(x^{(0)} + \alpha_0 d_0) = \min_{\alpha \in \mathbb{R}} f(x^{(0)} + \alpha d_0),$$
where
$$\phi_0(\alpha) = f(x^{(0)} + \alpha d_0).$$
That is, the value $\alpha_0$ is the exact minimizer of $\phi_0(\alpha) = f(x^{(0)} + \alpha d_0)$. Apply the chain rule to obtain
$$\phi_0'(\alpha) = \nabla f(x^{(0)} + \alpha d_0)^T d_0.$$
Evaluating at $\alpha_0$, we have
$$\phi_0'(\alpha_0) = \nabla f(x^{(0)} + \alpha_0 d_0)^T d_0 = \nabla f(x^{(1)})^T d_0 = \langle g^{(1)}, d_0 \rangle = 0.$$
Since $\phi_0$ is a quadratic function of $\alpha$ (see the remark at the start of this lecture, replacing $e$ by $\alpha d_0$ and $x^*$ by $x^{(0)}$) and the coefficient of the $\alpha^2$ term in $\phi_0$ is $\frac{1}{2} d_0^T Q d_0 > 0$, the above implies that
$$\alpha_0 = \arg\min_{\alpha \in \mathbb{R}} \phi_0(\alpha).$$
Using similar arguments, we can show that
$$\langle \nabla f(x^{(k+1)}), d_k \rangle = \langle g^{(k+1)}, d_k \rangle = 0$$
and
$$\alpha_k = \arg\min_{\alpha \in \mathbb{R}} \phi_k(\alpha).$$

Proposition 5 Let $d_0, d_1, d_2, \ldots, d_{n-1}$ be $n$ mutually conjugate directions with respect to a positive definite symmetric matrix $Q$ in $\mathbb{R}^n$ and let $x^{(0)}$ be any starting vector in $\mathbb{R}^n$. Let $x^{(1)}, x^{(2)}, x^{(3)}, \ldots, x^{(n)}$ be recursively defined by
$$x^{(k+1)} = x^{(k)} + \alpha_k d_k,$$
where the step size at each step is calculated with the help of the formula
$$\alpha_k = -\frac{\langle g_k, d_k \rangle}{\langle d_k, d_k \rangle_Q}.$$
Then
$$f(x^{(k+1)}) = f(x^{(k)} + \alpha_k d_k) = \min_{\alpha \in \mathbb{R}} f(x^{(k)} + \alpha d_k) = \min_{\alpha \in \mathbb{R}} \phi_k(\alpha),$$
where
$$\phi_k(\alpha) = f(x^{(k)} + \alpha d_k).$$
That is, the value $\alpha_k$ is the exact minimizer of $\phi_k(\alpha) = f(x^{(k)} + \alpha d_k)$; the step size is found with the help of an exact line search. Note that
$$g_k = \nabla f(x^{(k)}) = Q x^{(k)} - b.$$
With the search direction $d_k$ chosen, we need to compute the step size $\alpha_k$. To this end, we consider
$$\phi(\alpha) = f(x^{(k)} + \alpha d_k) = \frac{1}{2}(x^{(k)} + \alpha d_k)^T Q (x^{(k)} + \alpha d_k) - (x^{(k)} + \alpha d_k)^T b.$$
This is a quadratic function of $\alpha$. Now
$$\phi'(\alpha) = \nabla f(x^{(k)} + \alpha d_k)^T d_k = [Q(x^{(k)} + \alpha d_k) - b]^T d_k = (x^{(k)})^T Q d_k + \alpha\, d_k^T Q d_k - b^T d_k,$$
and $\phi''(\alpha) = d_k^T Q d_k > 0$. Its minimizer $\alpha_k$ is the value of $\alpha$ where $\phi'(\alpha) = \frac{d}{d\alpha} f(x^{(k)} + \alpha d_k)$ vanishes. Therefore
$$\left. \frac{d}{d\alpha} f(x^{(k)} + \alpha d_k) \right|_{\alpha = \alpha_k} = \nabla f(x^{(k+1)})^T d_k = (g_{k+1})^T d_k = \langle g_{k+1}, d_k \rangle = 0.$$
Note that $\phi'(\alpha_k) = (x^{(k)})^T Q d_k + \alpha_k\, d_k^T Q d_k - b^T d_k = 0$ gives
$$\alpha_k = \frac{b^T d_k - (x^{(k)})^T Q d_k}{d_k^T Q d_k} = -\frac{\big[(x^{(k)})^T Q - b^T\big] d_k}{d_k^T Q d_k} = -\frac{\nabla f(x^{(k)})^T d_k}{d_k^T Q d_k} = -\frac{\langle g_k, d_k \rangle}{d_k^T Q d_k}.$$
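In code, this closed-form exact line search is a one-liner. A minimal sketch (assuming NumPy; the function name exact_step is illustrative):

```python
import numpy as np

def exact_step(Q, b, x, d):
    """Exact line search for f(x) = 0.5 x^T Q x - x^T b along direction d:
    alpha_k = -<g_k, d_k> / <d_k, d_k>_Q."""
    g = Q @ x - b                    # g_k = grad f(x^(k))
    return -(g @ d) / (d @ Q @ d)

# Usage check: after the step, the new gradient is orthogonal to d.
Q = np.array([[2.0, 0.0], [0.0, 10.0]])
b = np.array([1.0, 1.0])
x = np.array([3.0, -2.0])
d = -(Q @ x - b)                     # steepest-descent direction at x
x_new = x + exact_step(Q, b, x, d) * d
print(np.dot(Q @ x_new - b, d))      # ~ 0, i.e. <g_{k+1}, d_k> = 0
```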

The following is an even stronger condition satisfied by $g_{k+1}$.

Proposition 6 In the conjugate direction algorithm,
$$g_{k+1}^T d_j = 0$$
for all $k$, $0 \le k \le n-1$, and $0 \le j \le k$. That is, $g_{k+1}$ is orthogonal to each of the directions $d_0, d_1, \ldots, d_k$.

Proof. The result is true for $k = 0$; that is, $g_1^T d_0 = 0$. Indeed, we know that $x^{(k+1)} = x^{(k)} + \alpha_k d_k$, where $\alpha_k$ is found such that $f(x^{(k+1)}) = \min_{\alpha \in \mathbb{R}} f(x^{(k)} + \alpha d_k) = \min_{\alpha \in \mathbb{R}} \phi(\alpha)$, where $\phi(\alpha) = f(x^{(k)} + \alpha d_k)$; that is, $\alpha_k$ is chosen to satisfy $f(x^{(k)} + \alpha_k d_k) = \min_{\alpha \in \mathbb{R}} f(x^{(k)} + \alpha d_k)$. So we must have $\phi'(\alpha)\big|_{\alpha = \alpha_k} = 0$. Thus $g_{k+1}^T d_k = 0$ for every $k$, and in particular $g_1^T d_0 = 0$.

Assume that the result is true for $k - 1$ (that is, $g_k^T d_j = 0$, $0 \le j \le k-1$); we will show that it is true for $k$ (that is, $g_{k+1}^T d_j = 0$, $0 \le j \le k$). Note that
$$x^{(k+1)} = x^{(k)} + \alpha_k d_k, \qquad x^{(k+1)} - x^{(k)} = \alpha_k d_k.$$
Now,
$$Q x^{(k+1)} - Q x^{(k)} = [Q x^{(k+1)} - b] - [Q x^{(k)} - b],$$
so $Q[x^{(k+1)} - x^{(k)}] = g^{(k+1)} - g^{(k)}$, and $Q[\alpha_k d_k] = g^{(k+1)} - g^{(k)}$ gives
$$g^{(k+1)} = g^{(k)} + \alpha_k Q d_k, \qquad [g^{(k+1)}]^T = [g^{(k)}]^T + \alpha_k d_k^T Q.$$
Now, for $0 \le j < k$,
$$[g^{(k+1)}]^T d_j = [g^{(k)}]^T d_j + \alpha_k\, d_k^T Q d_j = 0 + 0.$$
The first term on the RHS is zero by the inductive hypothesis and the second term is zero by $Q$-conjugacy. We also know that $g_{k+1}^T d_k = 0$ (by the exact line search, as above). Hence
$$(g^{(k+1)})^T d_j = 0 \quad \text{for all } k \text{ with } 0 \le k \le n-1 \text{ and } 0 \le j \le k.$$
Thus $g^{(k+1)}$ is orthogonal to any vector from the subspace spanned by $\{d_0, d_1, d_2, \ldots, d_k\}$.

Proposition 7
$$f(x^{(k+1)}) = \min_{x \in V_k} f(x),$$
where $V_k = x^{(0)} + \operatorname{span}\{d_0, d_1, d_2, \ldots, d_k\}$. Thus not only do we have
$$f(x^{(k+1)}) = \min_{\alpha \in \mathbb{R}} f(x^{(k)} + \alpha d_k)$$
but also
$$f(x^{(k+1)}) = \min_{\alpha_0, \alpha_1, \alpha_2, \ldots, \alpha_k \in \mathbb{R}} f\Big(x^{(0)} + \sum_{i=0}^{k} \alpha_i d_i\Big).$$
As $k$ increases, the subspace generated by $\{d_0, d_1, d_2, \ldots, d_k\}$ expands and will eventually fill the whole of $\mathbb{R}^n$ (provided the vectors $d_0, d_1, d_2, \ldots$ are linearly independent). Therefore, for some sufficiently large $k$, $x^*$ will lie in $V_k$. For this reason, the above result is sometimes called the expanding subspace theorem.

Proof. Define the matrix $D_k = [d_0, d_1, d_2, \ldots, d_k]$; that is, $d_i$ is the $i$th column of $D_k$. Note that $x^{(0)} + \mathcal{R}(D_k) = V_k$, where $\mathcal{R}(D_k)$ is the column space of $D_k$. Also
$$x^{(k+1)} = x^{(0)} + \sum_{i=0}^{k} \alpha_i d_i = x^{(0)} + D_k \alpha,$$
where $\alpha = [\alpha_0, \alpha_1, \ldots, \alpha_k]^T$. Hence $x^{(k+1)} \in x^{(0)} + \mathcal{R}(D_k) = V_k$. Now consider any vector $x \in V_k$. There exists a vector $a$ such that
$$x = x^{(0)} + D_k a.$$
Define
$$\phi_k(a) = f(x^{(0)} + D_k a).$$
Note that $\phi_k(a)$ is a quadratic function and has a unique minimizer that satisfies the FONC. By the chain rule,
$$D\phi_k(a) = \nabla f(x^{(0)} + D_k a)^T D_k,$$
so that, at $a = \alpha$,
$$D\phi_k(\alpha) = [\nabla f(x^{(k+1)})]^T D_k = [g^{(k+1)}]^T D_k.$$
We know that $[g^{(k+1)}]^T D_k = 0$ by Proposition 6. Hence
$$D\phi_k(\alpha) = [g^{(k+1)}]^T D_k = 0.$$
Therefore $\alpha$ satisfies the FONC for the quadratic function $\phi_k$; hence $\alpha$ is the minimizer of $\phi_k$, that is,
$$f(x^{(k+1)}) = \min_{a} f(x^{(0)} + D_k a) = \min_{x \in V_k} f(x).$$

How to generate $Q$-conjugate directions:

The conjugate gradient algorithm does not use prespecified conjugate directions, but instead computes the directions as the algorithm progresses. At each stage, the direction is calculated as a linear combination of the previous direction and the current gradient, in such a way that all the directions are mutually $Q$-conjugate. Consider the quadratic form
$$f(x) = \frac{1}{2} x^T Q x - x^T b + c$$
with $Q$ a symmetric positive definite matrix. The algorithm proceeds as follows (a code sketch is given after this list):

1. Let $x^{(0)}$ be an initial guess, then compute $g^{(0)} = \nabla f(x^{(0)}) = Q x^{(0)} - b$.

2. If $g^{(0)} = 0$, then stop; else go to the next step.

3. Take $d_0 = -g^{(0)}$; that is, the starting step is a selection of steepest descent.

4. Thus
$$x^{(1)} = x^{(0)} + \alpha_0 d_0, \quad \text{where} \quad \alpha_0 = \arg\min_{\alpha} f(x^{(0)} + \alpha d_0) = -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, d_0 \rangle_Q}.$$

5. In the next stage, we search in a direction $d_1$ that is $Q$-conjugate to $d_0$. We choose $d_1$ as a linear combination of $g^{(1)}$ and $d_0$.

6. We then look for a direction $d_2$ that is conjugate to the previous directions $d_0$ and $d_1$ with respect to the matrix $Q$.

7. Likewise we continue, and thus in general:

8. At the $(k+1)$th stage, we choose $d_{k+1}$ as a linear combination of $g^{(k+1)}$ and $d_k$ as follows:
$$d_{k+1} = -g^{(k+1)} + \beta_k d_k.$$
The coefficients $\beta_k$ are chosen in such a way that $d_{k+1}$ is $Q$-conjugate to $d_0, d_1, d_2, \ldots, d_k$. This is done by choosing $\beta_k$ as
$$\beta_k = \frac{\langle g_{k+1}, d_k \rangle_Q}{\langle d_k, d_k \rangle_Q}.$$
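The steps above translate directly into code. Here is a minimal sketch of the quadratic-case conjugate gradient iteration (assuming NumPy; the function name cg_quadratic is illustrative, not from the notes):

```python
import numpy as np

def cg_quadratic(Q, b, x0, tol=1e-12):
    """Conjugate gradient for f(x) = 0.5 x^T Q x - x^T b + c with Q
    symmetric positive definite. Returns the final iterate x^(n)."""
    x = np.asarray(x0, dtype=float)
    g = Q @ x - b                      # step 1: g^(0) = Q x^(0) - b
    d = -g                             # step 3: d_0 = -g^(0), steepest descent
    for _ in range(len(b)):
        if np.linalg.norm(g) < tol:    # step 2: stop when g^(k) = 0
            break
        Qd = Q @ d
        alpha = -(g @ d) / (d @ Qd)    # step 4: exact line search
        x = x + alpha * d              # x^(k+1) = x^(k) + alpha_k d_k
        g = Q @ x - b                  # g^(k+1)
        beta = (g @ Qd) / (d @ Qd)     # beta_k = <g_{k+1}, d_k>_Q / <d_k, d_k>_Q
        d = -g + beta * d              # step 8: d_{k+1} = -g^(k+1) + beta_k d_k
    return x

# Usage: on an n-dimensional quadratic, at most n steps are needed.
Q = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])
x = cg_quadratic(Q, b, np.zeros(2))
print(x, np.linalg.solve(Q, b))        # the two should agree
```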

2.1 Justification of the above process:

1. Note that
$$d_0 = -g^{(0)} \quad \text{and} \quad d_1 = -g^{(1)} + \beta_0 d_0,$$
which gives
$$x^{(1)} = x^{(0)} + \alpha_0 d_0,$$
where $\alpha_0$ is found such that $f(x^{(0)} + \alpha d_0)$ is minimized. That is,
$$[\nabla f(x^{(0)} + \alpha_0 d_0)]^T d_0 = 0, \quad [\nabla f(x^{(1)})]^T d_0 = 0, \quad \text{that is,} \quad (g^{(1)})^T g^{(0)} = 0.$$
That is, $g^{(1)}$ and $g^{(0)}$ are orthogonal to each other: $\langle g_1, g_0 \rangle = g_1^T g_0 = 0$. Equivalently, $\langle g^{(1)}, d_0 \rangle = 0$; that is, $\langle g^{(k+1)}, d_k \rangle = 0$ holds for $k = 0$.

We now show that if $d_0$ and $d_1$ are conjugate with respect to $Q$, then $\beta_0$ must be chosen equal to $\dfrac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}$. Note that
$$g = \nabla f(x) = Qx - b, \quad \text{and in particular} \quad g^{(0)} = \nabla f(x^{(0)}) = Q x^{(0)} - b \quad \text{and} \quad g^{(1)} = \nabla f(x^{(1)}) = Q x^{(1)} - b.$$
If $d_0$ and $d_1$ are conjugate with respect to $Q$, then we must have
$$d_1^T Q d_0 = 0, \quad \text{that is,} \quad (-g^{(1)} + \beta_0 d_0)^T Q d_0 = 0,$$
that is, $-(g^{(1)})^T Q d_0 + \beta_0\, d_0^T Q d_0 = 0$; that is,
$$\beta_0 = \frac{(g^{(1)})^T Q d_0}{d_0^T Q d_0} = \frac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}.$$

Proposition 8 In the conjugate gradient algorithm, the directions $d_0, d_1, \ldots, d_{n-1}$ are $Q$-conjugate.

Proof. We shall prove the result by induction. First we show that
$$d_0^T Q d_1 = 0.$$
To this end,
$$d_0^T Q d_1 = d_0^T Q (-g^{(1)} + \beta_0 d_0).$$
Substituting the value of $\beta_0$, we have
\begin{align*}
d_0^T Q d_1 &= d_0^T Q \Big[-g^{(1)} + \frac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}\, d_0\Big] \\
&= d_0^T Q (-g^{(1)}) + \frac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}\, d_0^T Q d_0 \\
&= d_0^T Q (-g^{(1)}) + \langle g^{(1)}, d_0 \rangle_Q = -d_0^T Q g^{(1)} + d_0^T Q g^{(1)} = 0.
\end{align*}
Assume that $d_0, d_1, \ldots, d_k$, $k < n-1$, are $Q$-conjugate directions. We now show that $d_{k+1}$ is $Q$-conjugate to the directions $d_0, d_1, \ldots, d_k$; that is,
$$d_{k+1}^T Q d_j = 0 \quad \text{for } j \in \{0, 1, 2, \ldots, k\}.$$
Consider
$$d_{k+1}^T Q d_j = [-g^{(k+1)} + \beta_k d_k]^T Q d_j = -(g^{(k+1)})^T Q d_j + \beta_k\, d_k^T Q d_j.$$
For $j < k$ we have $d_k^T Q d_j = 0$ by virtue of the induction hypothesis. Also,
$$x^{(j+1)} = x^{(j)} + \alpha_j d_j, \qquad \frac{x^{(j+1)} - x^{(j)}}{\alpha_j} = d_j.$$
Thus, for $j < k$,
\begin{align*}
d_{k+1}^T Q d_j = -(g^{(k+1)})^T Q d_j &= -(g^{(k+1)})^T Q \Big(\frac{x^{(j+1)} - x^{(j)}}{\alpha_j}\Big) \\
&= -\frac{1}{\alpha_j} (g^{(k+1)})^T Q (x^{(j+1)} - x^{(j)}) \\
&= -\frac{1}{\alpha_j} (g^{(k+1)})^T [Q x^{(j+1)} - Q x^{(j)}] \\
&= -\frac{1}{\alpha_j} \big[(g^{(k+1)})^T g^{(j+1)} - (g^{(k+1)})^T g^{(j)}\big] = 0,
\end{align*}
where the last equality holds because each $g^{(i)}$ with $i \le k$ lies in $\operatorname{span}\{d_{i-1}, d_i\}$ (from $d_i = -g^{(i)} + \beta_{i-1} d_{i-1}$, with $g^{(0)} = -d_0$), and $g^{(k+1)}$ is orthogonal to $d_0, d_1, \ldots, d_k$ by Proposition 6 together with the induction hypothesis. Hence
$$d_{k+1}^T Q d_j = 0 \quad \text{for } j = 0, 1, 2, \ldots, k-1.$$
Does $d_{k+1}^T Q d_k = 0$? This is precisely the condition that was used to find $\beta_k$. We know that
$$d_{k+1} = -g_{k+1} + \beta_k d_k, \quad \text{which implies} \quad d_{k+1}^T = -g_{k+1}^T + \beta_k d_k^T.$$
Note that
\begin{align*}
(d_{k+1})^T Q d_k = 0 &\iff (-g_{k+1}^T + \beta_k d_k^T) Q d_k = 0 \\
&\iff -g_{k+1}^T Q d_k + \beta_k\, d_k^T Q d_k = 0 \\
&\iff \beta_k\, d_k^T Q d_k = g_{k+1}^T Q d_k \\
&\iff \beta_k = \frac{g_{k+1}^T Q d_k}{d_k^T Q d_k},
\end{align*}
which is exactly how $\beta_k$ was chosen. Hence $d_{k+1}^T Q d_k = 0$, completing the induction.

Proposition 9 $g^{(k+1)}$ is orthogonal to $g_j$ for $0 \le j \le k$; that is,
$$g_{k+1}^T g_j = 0.$$

Proof. We know that $d_{k+1} = -g^{(k+1)} + \beta_k d_k$. Fix $j \in \{0, 1, 2, \ldots, k\}$ and write
$$d_j = -g_j + \beta_{j-1} d_{j-1}$$
(for $j = 0$, simply $d_0 = -g_0$). This gives
$$(g^{(k+1)})^T d_j = -(g^{(k+1)})^T g_j + \beta_{j-1} (g^{(k+1)})^T d_{j-1},$$
which implies $0 = -(g^{(k+1)})^T g_j + 0$, since $(g^{(k+1)})^T d_j = 0$ and $(g^{(k+1)})^T d_{j-1} = 0$ by Proposition 6. Hence $(g^{(k+1)})^T g_j = 0$ for $j \in \{0, 1, 2, \ldots, k\}$.
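Propositions 8 and 9 are easy to confirm numerically. Here is a minimal sketch (assuming NumPy) that runs the conjugate gradient recursion above and checks that the directions are mutually $Q$-conjugate and the gradients mutually orthogonal:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 6
A = rng.standard_normal((n, n))
Q = A @ A.T + n * np.eye(n)          # symmetric positive definite
b = rng.standard_normal(n)

x = np.zeros(n)
g = Q @ x - b
d = -g
grads, dirs = [g], [d]
for _ in range(n - 1):
    Qd = Q @ d
    alpha = -(g @ d) / (d @ Qd)      # exact line search
    x = x + alpha * d
    g = Q @ x - b
    beta = (g @ Qd) / (d @ Qd)
    d = -g + beta * d
    grads.append(g)
    dirs.append(d)

G, D = np.array(grads), np.array(dirs)
QG = D @ Q @ D.T                     # Gram matrix in the Q-inner product
GG = G @ G.T                         # ordinary Gram matrix of the gradients
# Proposition 8: off-diagonal entries of D Q D^T vanish (Q-conjugacy).
assert np.allclose(QG - np.diag(np.diag(QG)), 0, atol=1e-6)
# Proposition 9: off-diagonal entries of G G^T vanish (orthogonality).
assert np.allclose(GG - np.diag(np.diag(GG)), 0, atol=1e-6)
print("Q-conjugacy of directions and orthogonality of gradients verified")
```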

Example 10 Consider the following problem:
$$\min \ \frac{1}{2}(x_1^2 + 9x_2^2)$$
Here
$$f(x) = \frac{1}{2}(x_1^2 + 9x_2^2) = \frac{1}{2}\begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 0 \\ 0 \end{bmatrix} = \frac{1}{2} x^T Q x - x^T b, \quad \text{where } Q = \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \text{ and } b = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
Note that
$$\nabla f(x) = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} x_1 \\ 9x_2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 0 \\ 0 \end{bmatrix} = Qx - b,$$
and
$$\nabla^2 f(x) = \begin{bmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_2 \partial x_1 \\ \partial^2 f/\partial x_1 \partial x_2 & \partial^2 f/\partial x_2^2 \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} = Q.$$
Suppose we start with $x^{(0)} = \begin{bmatrix} 9 \\ 1 \end{bmatrix}$. Then
\begin{align*}
g_0 = \nabla f(x^{(0)}) &= \begin{bmatrix} 9 \\ 9 \end{bmatrix}
\;\Rightarrow\; d_0 = -g^{(0)} = \begin{bmatrix} -9 \\ -9 \end{bmatrix} \\
\Rightarrow\; Q d_0 &= \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} -9 \\ -9 \end{bmatrix} = \begin{bmatrix} -9 \\ -81 \end{bmatrix}, \qquad
d_0^T Q d_0 = \begin{bmatrix} -9 \\ -9 \end{bmatrix}^T \begin{bmatrix} -9 \\ -81 \end{bmatrix} = 810 \\
\Rightarrow\; \alpha_0 &= -\frac{\langle g^{(0)}, d_0 \rangle}{\langle d_0, Q d_0 \rangle} = \frac{162}{810} = \frac{1}{5} \\
\Rightarrow\; x^{(1)} &= x^{(0)} + \alpha_0 d_0 = \begin{bmatrix} 9 \\ 1 \end{bmatrix} - \frac{1}{5} \begin{bmatrix} 9 \\ 9 \end{bmatrix} = \frac{4}{5} \begin{bmatrix} 9 \\ -1 \end{bmatrix}.
\end{align*}
Now
\begin{align*}
g^{(1)} = \nabla f(x^{(1)}) &= \frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} \\
\Rightarrow\; d_1 = -g^{(1)} + \beta_0 d_0 &= -\frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{\langle g^{(1)}, d_0 \rangle_Q}{\langle d_0, d_0 \rangle_Q}\, d_0 \\
&= -\frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{(36/5)(72)}{810} \begin{bmatrix} -9 \\ -9 \end{bmatrix} = -\frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix} + \frac{16}{25} \begin{bmatrix} -9 \\ -9 \end{bmatrix} = \frac{36}{25} \begin{bmatrix} -9 \\ 1 \end{bmatrix}.
\end{align*}
Now
\begin{align*}
g_1^T d_1 &= \frac{36}{5} \begin{bmatrix} 1 \\ -1 \end{bmatrix}^T \frac{36}{25} \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \Big(\frac{36}{5}\Big)\Big(\frac{36}{25}\Big)(-10) \\
Q d_1 &= \frac{36}{25} \begin{bmatrix} 1 & 0 \\ 0 & 9 \end{bmatrix} \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \frac{324}{25} \begin{bmatrix} -1 \\ 1 \end{bmatrix} \\
d_1^T Q d_1 &= \Big(\frac{36}{25}\Big)\Big(\frac{324}{25}\Big) \begin{bmatrix} -9 \\ 1 \end{bmatrix}^T \begin{bmatrix} -1 \\ 1 \end{bmatrix} = \frac{23328}{125} \\
\Rightarrow\; \alpha_1 &= -\frac{\langle g_1, d_1 \rangle}{\langle d_1, Q d_1 \rangle} = -\frac{(36/5)(36/25)(-10)}{23328/125} = \frac{5}{9}
\end{align*}
and
$$x^{(2)} = x^{(1)} + \alpha_1 d_1 = \frac{4}{5} \begin{bmatrix} 9 \\ -1 \end{bmatrix} + \frac{5}{9} \cdot \frac{36}{25} \begin{bmatrix} -9 \\ 1 \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix},$$
which is the minimizer, reached in $n = 2$ steps.
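A quick numerical replay of Example 10 (assuming NumPy), which reproduces $\alpha_0 = 1/5$, $\beta_0 = 16/25$, $\alpha_1 = 5/9$, and $x^{(2)} = (0, 0)^T$:

```python
import numpy as np

Q = np.array([[1.0, 0.0], [0.0, 9.0]])
b = np.zeros(2)
x = np.array([9.0, 1.0])           # x^(0)

g = Q @ x - b                      # g^(0) = (9, 9)
d = -g                             # d_0 = (-9, -9)
for k in range(2):
    Qd = Q @ d
    alpha = -(g @ d) / (d @ Qd)    # alpha_0 = 1/5, alpha_1 = 5/9
    x = x + alpha * d
    g = Q @ x - b
    beta = (g @ Qd) / (d @ Qd)     # beta_0 = 16/25
    d = -g + beta * d
    print(f"alpha_{k} = {alpha}, x^({k+1}) = {x}")
# x^(2) = (0, 0), the minimizer, after n = 2 steps.
```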

Example 11 Consider the following problem:
$$\min \ 4x_1^2 + 4x_2^2 - 4x_1 x_2 - 12x_2$$
Here
\begin{align*}
f(x) &= 4x_1^2 + 4x_2^2 - 4x_1 x_2 - 12x_2 \\
&= \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 4 & -2 \\ -2 & 4 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 0 \\ 12 \end{bmatrix} \\
&= \frac{1}{2} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}^T \begin{bmatrix} 0 \\ 12 \end{bmatrix} = \frac{1}{2} x^T Q x - x^T b.
\end{align*}
Note that
$$\nabla f(x) = \begin{bmatrix} \partial f/\partial x_1 \\ \partial f/\partial x_2 \end{bmatrix} = \begin{bmatrix} 8x_1 - 4x_2 \\ -12 + 8x_2 - 4x_1 \end{bmatrix} = \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} - \begin{bmatrix} 0 \\ 12 \end{bmatrix}$$
and
$$\nabla^2 f(x) = \begin{bmatrix} \partial^2 f/\partial x_1^2 & \partial^2 f/\partial x_2 \partial x_1 \\ \partial^2 f/\partial x_1 \partial x_2 & \partial^2 f/\partial x_2^2 \end{bmatrix} = \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} = Q.$$
We now generate two conjugate directions $d_0$ and $d_1$. Suppose that we choose $d_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then $d_1 = \begin{bmatrix} a \\ b \end{bmatrix}$ must satisfy the condition
$$d_0^T Q d_1 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}^T \begin{bmatrix} 8 & -4 \\ -4 & 8 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = 8a - 4b = 0.$$
In particular, we may choose $a = 1$ and then $b = 2$; hence $d_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$. It may be noted that conjugate directions are not unique. We minimize the objective function $f$ starting from $x^{(0)} = \begin{bmatrix} -1/2 \\ 1 \end{bmatrix}$ along the direction $d_0 = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$. Then
$$x^{(1)} = x^{(0)} + \alpha d_0 = \begin{bmatrix} -1/2 \\ 1 \end{bmatrix} + \alpha \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} -1/2 + \alpha \\ 1 \end{bmatrix}.$$
Now
$$\phi(\alpha) = f(x^{(0)} + \alpha d_0) = 4\Big(\alpha - \frac{1}{2}\Big)^2 + 4 - 4\Big(\alpha - \frac{1}{2}\Big) - 12 = 4\alpha^2 - 8\alpha - 5,$$
which attains its minimum at $\alpha_0 = 1$. Hence $x^{(1)} = \begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$. Now start from $x^{(1)} = \begin{bmatrix} 1/2 \\ 1 \end{bmatrix}$ and minimize the objective function along the direction $d_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$; we get
$$x^{(2)} = x^{(1)} + \alpha d_1 = \begin{bmatrix} 1/2 + \alpha \\ 1 + 2\alpha \end{bmatrix}.$$
Here $f(x^{(1)} + \alpha d_1) = 12\alpha^2 - 12\alpha - 9$ attains its minimum at $\alpha_1 = 1/2$, and we get $x^{(2)} = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$, which is the minimum point of $f$:
$$\nabla f(x^{(2)}) = \begin{bmatrix} 8(1) - 4(2) \\ -12 + 8(2) - 4(1) \end{bmatrix} = \begin{bmatrix} 0 \\ 0 \end{bmatrix}.$$
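A short numerical replay of Example 11 (assuming NumPy), checking the conjugacy condition $d_0^T Q d_1 = 0$ and the two exact line searches:

```python
import numpy as np

Q = np.array([[8.0, -4.0], [-4.0, 8.0]])
b = np.array([0.0, 12.0])

d0 = np.array([1.0, 0.0])
d1 = np.array([1.0, 2.0])
print(d0 @ Q @ d1)                  # 0: d0 and d1 are Q-conjugate

x = np.array([-0.5, 1.0])           # x^(0)
for d in (d0, d1):
    g = Q @ x - b                   # gradient at the current iterate
    alpha = -(g @ d) / (d @ Q @ d)  # exact line search: alpha_0 = 1, alpha_1 = 1/2
    x = x + alpha * d
print(x, Q @ x - b)                 # x^(2) = (1, 2) with zero gradient
```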

