A Method To Minimize Exactly A Broad Class of Nonlinear Functions

A METHOD TO MINIMIZE EXACTLY A BROAD CLASS OF
NONLINEAR FUNCTIONS
*
B. BRUET
Laboratoire Productique et Logistique

Ecole Centrale des Arts et Manufactures
92290 CHATENAY-MALABRY Cedex, FRANCE
E-mail: 100321.1407@compuserve.com
Abstract
A technique is presented that minimizes exactly (in a finite number of steps)

the class of nonlinear functions whose isovalue surfaces are homothetic with
respect to the minimum ("HIS" functions). The method can be applied to general
nonlinear functions, only leaving out finite termination. Various numerical
experiments have been performed and their results compared to those obtained
with the well known BFGS and conjugate gradient methods. The method proved
to be more efficient than others when applied to very large problems such as
those found in structural design optimization.
Keywords : nonlinear optimization, unconstrained optimization, nonlinear

scaling, finite termination, homothetic surfaces, conjugate
gradient, BFGS
1. DEFINITION AND MINIMIZATION OF "HIS" FUNCTIONS
*+ n
Let R be the set of strictly positive real numbers and let F : R ->R
be a function having continuous first derivatives and a unique minimum at
origin such that :
n *+
(1) F(µX) = G(µ,F(X)) -
v X -
c R , -
v µ -
c R
*+
where G : R x R has continuous first derivatives G' and G' such
1 2
that :
*+
G'(µ,±)>0 -
v µ -
c R , -
v ± -c R, ±>F(0),
1
_______

*Ingénieur ECP - Docteur-Ingénieur de l Université Pierre et Marie Curie
MINIMIZING HIS FUNCTIONS 2
*+
G'(µ,±)>0 -
v µ -
c R , -
v ± -c R, ±>F(0).
2
The mathematical properties of such functions will be reviewed in further

detail in section 4, but for now, let us simply state that such a function
has homothetic isovalue surfaces (HIS function) and that its gradient satisfies:
(2) F'(X).X = G'(1,F(X))

1
n
Let be f : R ->R having a minimum x*, and defined by :
f(x) = F(x-x*)
n
where F : R ->R is a HIS function as defined above.
This is simply a translation from the origin of coordinates to the minimum,

so the results derived in section 4 still hold when applied to the minimum
instead of the origin. So, f will be said to be a HIS function with respect
to its unique minimum x*.
Definition (1) then becomes :
n *+
(3) f(x*+µ(x-x*)) = G(µ,f(x)) -
v x -
c R , -
v µ -
c R
*+
where G : R x R has continuous first derivatives such that :
*+
G'(µ,±)>0 -
v µ -
c R , -
v ± -c R,
1 *+
G'(µ,±)>0 -
v µ -
c R , -
v ± -c R,
2
where G' and G' denotes the derivatives with respect

1 1
to the variables.
Similarly, equation (3) yields :
(4) f'(x).(x-x*) = G'(1,f(x))

1
In this section, we will present a method to minimize exactly such HIS

functions.
1.1. Overview of the method
Assume we start at a given point x and we perform an unidimensional search

along a descent direction d, i.e. we restrict our attention to points y
verifying :
(D) y = x + ±d
Since d is supposed to be a descent direction for f when starting from x, we

have necessarily ±e0 .
Similarly to the method described in [1], let us search "uphill" (that is,
for increasing values of f) for a point y /
= x such that :
f(y) = f(x)
As we will show in section 4.2, such a point necessarily exists for a HIS
function. Let us now derive a more usable form of equation (4). After
determining a point y such that f(y)=f(x), we have G'(1,f(y))=G'(f(x)) and
1 1
according to equation (4) we can write :
f'(y).(y-x*) = f'(x).(x-x*)
Reordering, we get :
(5) (f'(y)-f'(x)).x* = f'(y).y - f'(x).x
where y = x + ±d for some ± -c R
This form could be used directed by constructing a set of n independent

equations corresponding to n+1 points giving the same value to f. This linear
system could then be solved using any relevant method of linear algebra such
as the classic Gauss method. However, this gives a poorly performing method
when applied to non HIS nonlinear functions, and we will now develop a more
attractive approach.
Let us find a point z lying on (D) such that :
(6) (f'(y)-f'(x)).z = f'(y).y - f'(x).x

Writing z=x+ßd and remembering that y=x+±d, we get :
(f'(y)-f'(x)).(x+ßd) = f'(y).(x+±d) - f'(x).x
Reordering and simplifying this equation yields :
f'(y).d
(7) ß = ±
f'(y).d-f'(x).d
It is interesting to note that on general HIS functions, there is no relation

between the z point determined as above, and the unidirectional minimum of
f from x along d : point z does not necessarily need to be this minimum, and
usually is simply not.
By hypothesis, d is a descent direction from x, so that f'(x).d should always

be negative. Similarly, by construction y is reached "uphill" along d, so
f'(y).d should be always positive. Therefore, the denominator of the above
expression should never vanish, even for non HIS functions. Furthermore, this
also implies that the denominator is superior to the numerator, yielding :
f'(y).d
0 < < 1
f'(y).d-f'(x).d
This means that, as one could expect, point z is located somewhere between
x and y, on the line joining x to y.
Using point z as determined above and subtracting (6) from (5) yields :
(f'(y)-f'(x)).(x*-z) = 0
This means that x* lies in the hyperplane orthogonal to f'(y)-f'(x) and

passing through z.
The idea of the method is then to restrict the remaining searches to proceed
within this hyperplane, and then to apply the same method starting at the
point z previously computed. Therefore, the searches will be restricted to
linear subspaces of a dimension decreased by one at each iteration thus
leading to termination in a finite number of steps (at most n).
Up to now, there has been no imposed way to choose the search directions,
provided these are kept orthogonal to the subspace generated by the gradient
differences f'(y)-f'(x) issued from previous steps. The trivial choice is
then obviously to choose the new direction of search as the projection of
the opposite of the gradient at the current iteration point onto the subspace
complementary to the subspace generated by gradient differences.
This is accomplished by orthogonalizing these differences through a process

like the well known Gram-Schmidt procedure. Note that if the set of previous
gradient differences is already orthogonalized, orthogonalizing a new gradient
difference will cost only O(n²) elementary operations instead of O(n3)
operations required for the classic Gram-Schmidt procedure. The projected
gradient is then obtained by subtracting from it its individual projections
onto the orthogonalized gradient differences.
Our method can now be summarized as follows :
- choose an inital point x and the first search direction

0
d = -f'(x ) ;
0 0
- at iteration k>=0, make an linear search along d to find the
k
point y /= x such that f(y ) = f(x ) ;
k k k k
- orthogonalize the gradient difference f'(y )-f'(x ) to previous
k k
gradient differences (if any) and normalize it ;
- compute z = x + ßd where ß is determined by formula (7) ;

k k k
- compute the gradient f'(z ) ;
k
- test for termination with an appropriate criterion (n iterations
done, null gradient, little changing f or x, etc) ;
- if not terminated, set x = z and d = projection

k+1 k k+1
of f'(z ) onto the subspace orthogonal to the orthogonalized
k
gradient differences, that is, remove from f'(z ) the orthogonal
k
projections of f'(z ) onto the individual normalized gradient
k
differences ;
- iterate process with k set to k+1.

1.2. Special issues
There are two possible cases of failure for the method as expressed above.
First, the newly computed f'(y)-f'(x) difference may lie within the subspace
generated by the previous differences. This leads to a null vector after
orthogonalization with respect to these differences.
We will now see that this is not possible under the assumption that the
direction d used for the search was a descent one, and that point y has been
determined "uphill".
Let us pose the hypothesis that the newly computed f'(y)-f'(x) would lie
within the subspace generated by the previous gradient differences. Since
the search direction d has been chosen orthogonal to this subspace, then d
would be necessarily orthogonal to f'(y)-f'(x). So we would have :
(f'(y)-f'(x)).d = 0
which could be rewritten as :
(8) f'(x).d - f'(y).d = 0
But from the fact that d is a descent direction, we have f'(x).d<0, and from
the fact that y has been found "uphill", we have f'(y).d>0 ; therefore we
obtain :
f'(x).d - f'(y).d < 0
which is contradictory with (8). So, a newly computed f'(y)-f'(x) cannot lie
with the subspace generated by the previous gradient differences.
Second, we will now show that the method cannot generate a direction that is
not a descent one.
Due the the fact that the direction has been chosen as the projection of the
gradient onto the subspace orthogonal to the subspace generated by previous
gradient differences, the new direction is not a descent one if and only if
the gradient lies within the subspace generated by the previous differences.
In this case the direction d would vanish, thus preventing the process from
being continued.
A vanishing new direction would mean that the gradient at the iteration point
is either zero or orthogonal to the subspace where the optimum lies. If the
gradient is null then the optimum is achieved. If not, at this iteration
point we would have :
f'(x).(x-x*) = 0
But, using (4), we would get :
f'(x).(x-x*) = G'(1,f(x)) = 0
1
which is contradictory with (3). So, the resulting direction as chosen above
cannot vanish when used on HIS functions.
1.3. Special case : quadratic f
It is interesting to see what becomes of ß as defined above when f is a

quadratic function, which is obviously a special case of a HIS function. In
this case, since the gradient of f is linear, we have :
f'(y) = f'(x) + ±Ad
where A is the Hessian matrix of f, and Ad the matrix-vector product

of A by d.
So the expression (7) yielding ß can be simplified and rewritten as :
f'(x).d
(9) ß = ± +
d.Ad
Using the fact that f is quadratic, we can write, using Taylor expansion :
f(y) = f(x) + ±f'(x).d + ±²d.Ad/2
and using f(y)=f(x) we get :

±f'(x).d + ±²d.Ad/2 = 0
The non trivial solution for ± is therefore :
f'(x).d
± = -2
d.Ad
Replacing and simplifying in (9) gives :
f'(x).d
ß = - = ±/2
d.Ad
which means that, as could be expected, z is midway between x and y, i.e. z

is the minimum of f along (D).
For quadratic functions, at every iteration k, the gradient difference

f'(x )-f'(y ) is colinear to Ad . Since the new direction d is
k k k k+1
computed to be orthogonal to the subspace generated by the previous
gradient differences, we then have :
d .Ad = 0, for j = 0 to k
k+1 j
This means that the direction d is conjugate to d with respect to A

k+1 j
for all jdk.
Given a sequence of n independent vectors, this property allows building

univocally a set of conjugate directions which spans the same subspace than
the original vectors.
In both this method and the conjugate gradient algorithm, the initial direction
is the opposite to the gradient at the starting point. In both methods also,
the new direction is uniquely defined by the fact it is the projection of
the gradient at the iteration point onto the subpace orthogonal to the subspace
generated by the previous gradient vectors. Hence, by induction, the resulting
directions are the same in both methods.
Therefore, when applied to quadratic functions, the method presented in this

paper generates the same directions (and thereafter the same iteration points)
as the conjugate gradient algorithm.
1.4. Down sized algorithm for huge problems
Increasingly often, modern practical problems involve a huge number of

variables. It is therefore interesting to note that this method can be easily
modified to accommodate whatever amount of storage is available.
This is accomplished by keeping only the last p gradient differences (where

p is some integer less than n), while discarding older differences. Of course,
the finite termination does not hold any longer in this case, but it is not
a major concern when the method is applied to non HIS functions for which
this feature does not hold anyway.
So, the method can be dynamically adjusted to whatever storage is reasonably

available, while still remaining a powerful tool.
This feature can be a definite advantage when compared to classic matrix

methods such as Huang family methods [3].
Numerical experiments presented in section 3.2 include results for such a

down sized algorithm.
2. MINIMIZING NON "HIS" FUNCTIONS
The algorithm developed in section 1 does not explicitly rely on the fact
that the function being minimized is a HIS one. Therefore, the method can be
used to minimize general nonlinear functions, provided their first derivatives
are known. However, some of the results established in section 4 do not hold
any longer when applied to non HIS functions, and this must be accounted for.
More specifically, there may not be a point y such that f(y)=f(x) along a
descent path starting at x, and the gradient at a newly computed iteration
point may lie within the subspace generated by gradient differences. In both
cases, the algorithm cannot be continued and the only possibility left is to
restart the process from the beginning, i.e. with a search direction set to
the gradient. If the process were to fail again at this point, the algorithm
should be stopped and a failure be reported.
Note that the method (as exposed in section 1) does not require line searches
for a minimum. This peculiarity may sound odd when compared to classical
minimization methods where linear searches for a minimum form the building
blocks of the process. However, preliminary experiments showed that there
was no special advantage adding an extra search for a minimum at the end of
the algorithm, and therefore this feature has been discarded.
By the way, it is interesting to remark that searching for a point y such

that f(y)=f(x) is faster that searching for a minimum when using a dichotomous
line search : in the first instance, the worst case rate of convergence is
0.5, in the second instance ("golden" search for a minimum), the rate is only
0.618.
Now, our algorithm can be stated more precisely as follows :
0. set x = some initial guess

0
1. note: starting direction is opposite of gradient
set d = -f'(x )
0 0
note: k is the iteration number, 0 d k < n
set k = 0
2. note: µ = some predefined threshold

if |f'(x )|<µ then go to 7
k
3. search ± > 0 such that f(x +± d ) = f(x )
k k k k k
if not such ± then go to 6
k
4. note: determination of next iteration point
set y = x + ± d
k k k k
let ß = ± f'(y ).d /(f'(y )-f'(x )).d
k k k k k k k
let x = x + ß d
k+1 k k k
note: orthogonalization of the gradient difference
let ´ = f'(y ) - f'(x )

k k k
for j=0 to k-1 let ´ = ´ - (´ .´ )´
k k k j j
let ´ = ´ /|´ |
k k k
note: orthogonalization of the new direction of search
let d = -f'(x )
k+1 k+1
for j=0 to k let d = d - (d .´ )´

k+1 k+1 k+1 j j
if |d |<µ then go to 6
k+1
5. let d = d /|d |
k+1 k+1 k+1
set k = k + 1
if ken then go to 1, else go to 2
6. if k=0 then stop 'Failure to converge'
set x = x
0 k
go to 1
7. display 'Minimum reached'

k
display x and other relevant information
stop
3. NUMERICAL EXPERIMENTS
3.1. Overview
In section 3.2, we will report some numerical experiments, with both the full
version and a down sized version of this algorithm. The down sized method is
obtained by keeping only the last four gradient differences as mentioned in
1.2.2.
The numbers of iterations they required to achieve convergence are compared

to those of the well known BFGS [2] method and Polak-Ribière version of the
conjugate gradient [4] method.
These experiments have been conducted under the following assumptions :
- the convergence was considered to be obtained when the optimum

-10
value is achieved with a precision better than 10 ;
- in any case of trouble (non decreasing function over a line
search, failure to generate a new direction, etc.), the failing
method was restarted from the beginning, using the current
point of iteration as the new starting point, throwing away any
remembered information ; the failure was considered definitive
when this restart attempt also failed ;
- all line searches have been performed with maximum precision

-16
(10 ) to avoid roundoff effects.
A little difficulty arises when comparing the performances of the methods,

because classic algorithms (CG and BFGS) use only one gradient evaluation
per step instead of two in our method. To circumvent this, two ranking criteria
were used. The first one was the number of iterations used to achieve the
convergence : it is well suited to problems with small numbers of variables,
where the gradient evaluation cost is likely to be negligible when compared
to line search cost. The second criterion was the number of gradient evalua-
tions necessary to achieve the convergence : it is well suited to problems
with a great number of variables, where line search cost can probably be
neglected when compared to gradient evaluation cost. This allows a potential
user of the method to put together his own criterion from these.
The following abbreviations have been employed thereafter :
BFGSM : Broyden Fletcher Goldfarb Shanno Method [2]

CGM : Conjugate Gradient Method [4]
4HISM : 4 directions Homotetic Isovalue Surface Method
FHISM : Full Homothetic Isovalue Surface Method
3.2. Test Problems
3.2.1. Huang and Levy Quasi-Quadratic Function
number of variables: n = 4
p
function: f(x) = [Q(x)]
where: Q(x) = ½xAx + bx + c

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 0.3
starting point: x = 1, i = 1 to n
i
solution x* = 0, f* = 0
4HISM FHISM BFGSM CGM

Iteration number 4 4 45 80
Iteration ranking 1 1 3 4
Gradient ranking 1 1 3 4
p

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 0.5
i

p

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 1.0
i

p

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 2.0
i

p

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 3.0
i

p

4.5 7 3.5 3 -0.5
7 14 9 8 -1.0
A = 3.5 9 8.5 5 b = -1.5 c = 0.25
3 8 5 7 0

parameter: p = 4.0
i

3.2.7. Hilbert Quasi-Quadratic Function
function: f(x) = (½xHx)²
where: Hij = 1/(i+j-1), i = 1 to n, j = 1 to n
starting point: x = -3, i = 1 to n
i

3.2.8. Quasi-Quadratic Function

function: f(x) = sin(Q(x))+1.001*Q(x)
where: Q(x) = £ ix²
, i = 1 to n
i
i

3.2.9. Homothetic Isovalue Surfaces Function
p
function: £ | )
f(x) = ln(1 + i|x
i i
parameter: p = 1.0
i

Iteration number 30* 10 191 173*
* method failed to converge
p
function: £ | )
f(x) = ln(1 + i|x
i i
parameter: p = 3.0
i

p p
function: £ | )
f(x) = (i|x
i i
parameter: p = 1.0
i

p p
function: £ | )
f(x) = (i|x
i i
parameter: p = 3.0
i

function: f(x) = sin(F(x)) + 1.001F(x)
p
£ |
F(x) = i|x
i i
parameter: p = 1.0
i

function: f(x) = sin(F(x)) + 1.001F(x)
p
£ |
F(x) = i|x
i i
parameter: p = 3.0
i

3.2.15. Fletcher-Powell Helical Function
function: f(x) = 100[(x -10)² + (r-1)²] + x²
3 3
where :
arctan(x /x ) , x >0
2À = $ 2 1 1
arctan(x /x )+À, x <0
2 1 1
½
r = (x²
+x²)
1 2
starting point: x = (-1,0,0)
o
solution x* = (1,0,0), f* = 0

3.2.16. Rosenbrock's 2D Function
function: f(x,y) = (x-1)² + 100(y-x²)²
starting point: (x ,y ) = (0,0)
o o
solution (x*,y*) = (1,1), f* = 0

3.2.17. Rosenbrock's 4D Function
function: f(x) = 100(x²
-x )² + (1-x )² +
1 2 1
90(x²
-x )² + (1-x )² +
3 4 3
10.1[(x -1)² + (x -1)²] +
2 4
19.8(x -1)(x -1)
2 4
starting point: x = (-3,-1,-3,-1)
o
solution x* = (1,1,1,1), f* = 0

3.2.18. Generalized Rosenbrock's 4D Function
n/4
function: f(x) = £ { 100(x² -x )² + (1-x )² +
i=1 4i-3 4i-2 4i-3
90(x² -x )² + (1-x )² +
4i-1 4i 4i-1
10.1[(x -1)²+(x -1)²] +
4i-2 4i
19.8(x -1)(x -1) }
4i-2 4i
i
solution x*= 1, i = 1 to n, f* = 0
i

n/4
function: f(x) = £ { 100(x² -x )² + (1-x )² +
i=1 4i-3 4i-2 4i-3
90(x² -x )² + (1-x )² +
4i-1 4i 4i-1
10.1[(x -1)²+(x -1)²] +
4i-2 4i
19.8(x -1)(x -1) }
4i-2 4i
i
i

n/4
function: f(x) = £ { 100(x² -x )² + (1-x )² +
i=1 4i-3 4i-2 4i-3
90(x² -x )² + (1-x )² +
4i-1 4i 4i-1
10.1[(x -1)²+(x -1)²] +
4i-2 4i
19.8(x -1)(x -1) }
4i-2 4i
i
i


n/4
function: f(x) = £ { 100(x² -x )² + (1-x )² +
i=1 4i-3 4i-2 4i-3
90(x² -x )² + (1-x )² +
4i-1 4i 4i-1
10.1[(x -1)²+(x -1)²] +
4i-2 4i
19.8(x -1)(x -1) }
4i-2 4i
i
i


n/4
function: f(x) = £ { 100(x² -x )² + (1-x )² +
i=1 4i-3 4i-2 4i-3
90(x² -x )² + (1-x )² +
4i-1 4i 4i-1
10.1[(x -1)²+(x -1)²] +
4i-2 4i
19.8(x -1)(x -1) }
4i-2 4i
i
i

Iteration number 56 0ª 0ª 210
ª method failed because of insufficient memory
3.2.23. Powell's Quartic Function
2 2 4
function: f(x) = (x +10x ) + 5(x -x ) + (x -2x ) + 10(x -
1 2 3 4 2 3 1
starting point: x = (3,-1,0,1)
o

3.2.24. Powell's Extended Quartic Function
n/4 2 2
function: f(x) = £ [ (x +10x ) + 5(x -x ) +
i=1 4i-3 4i-2 4i-1 4i
4 4
(x -2x ) + 10(x -x ) ]
4i-2 4i-1 4i-3 4i
starting point: (x ,x ,x ,x ) = (3,-1,0,3), i = 1 to n/4
4i-3 4i-2 4i-1 4i

n/4 2 2
function: f(x) = £ [ (x +10x ) + 5(x -x ) +
i=1 4i-3 4i-2 4i-1 4i
4 4
(x -2x ) + 10(x -x ) ]
4i-2 4i-1 4i-3 4i
4i-3 4i-2 4i-1 4i


n/4 2 2
function: f(x) = £ [ (x +10x ) + 5(x -x ) +
i=1 4i-3 4i-2 4i-1 4i
4 4
(x -2x ) + 10(x -x ) ]
4i-2 4i-1 4i-3 4i
4i-3 4i-2 4i-1 4i


n/4 2 2
function: f(x) = £ [ (x +10x ) + 5(x -x ) +
i=1 4i-3 4i-2 4i-1 4i
4 4
(x -2x ) + 10(x -x ) ]
4i-2 4i-1 4i-3 4i
4i-3 4i-2 4i-1 4i

Iteration number 41 0ª 0ª 509
ª method failed because of insufficient memory

3.2.28. Statistics
The following tables show the number of times a given method obtained a given
rank over the 27 test problems above. Every number is followed by the corre-
sponding percentage in parenthesis. Note that percentages in a line may add
up to more than 100% because there can be more than one method having a given
rank in case of a tie on a test problem.
Iteration Number Ranking
Rank 4HISM FHISM BFGSM CGM

1 16(59%) 20(74%) 5(19%) 1( 4%)
2 7(26%) 4(15%) 5(19%) 2( 7%)
3 4(15%) 3(11%) 11(41%) 6(22%)
4 0( 0%) 0( 0%) 6(22%) 18(67%)
Gradient Number Ranking
Rank 4HISM FHISM BFGSM CGM

1 8(30%) 10(37%) 9(33%) 7(26%)
2 6(22%) 7(26%) 4(15%) 7(26%)
3 12(44%) 10(37%) 9(33%) 0( 0%)
4 1( 4%) 0( 0%) 5(19%) 13(48%)
The following table shows the number of failures for every method.

Failure number 3(11%) 2( 7%) 2( 7%) 3(11%)
3.3. Discussion
At first glance, the major outcome of the above experiments is that on the
problems tested, the two versions of our algorithm performed similarly,
despite keeping only four directions in the down sized algorithm.
For the iteration criterion, the down sized version and the full version of
our method respectively occupy 59% and 74% of first ranks, which is clearly
superior to other methods.
For the gradient number criterion, the results are more balanced : for first
rank, the full version of our algorithm comes first (37%), but is closely
followed by the BFGS method (33%), itself followed by the down sized version
of our algorithm (30%) and finally the conjugate gradient method (26%). This
is due to the fact that our method has to undergo the severe handicap of two
gradient evaluations per iteration step, instead of one for the BFGS method
or the CG algorithm. Similar results hold for the lower ranks.
However, it is worthwhile to note that on the two largest (500 variables)

test problems (Rosenbrock and Powell's extended functions), our down sized
method still wins the first rank despite the above mentioned handicap. This
is partly due to the fact that for problems of such a size, matrix method
(BFGS and full HISM) are unusable. Howewer, this stresses the point that the
down sized version of our method seems to be very effective compared to the
CG algorithm, the only alternative method available for very big problems.
Hence, these experiments allow us to believe that the method presented here
may prove even more interesting for larger problems like those arising from
structural design optimization, and that it should be rewarding to investigate
further its use on such large practical problems.
4. MATHEMATICAL STUDY
In this section, we will review some mathematical properties of the functions

verifying (1).
4.1. Homothetic isovalue surfaces
Let F be a function verifying (1).
It is easy to show that the isovalue surfaces of F are homothetic to one

another with respect to the origin of coordinates.
Let (S1) be the isovalue surface of F defined by :
(10) F(X) = ±, ± -c R
and (S2) be the isovalue surface of F defined by :
F(Y) = ß, ß -
c R
We can assume without loss of generality that ±eßeF(0).

Let us consider the function f : R->R defined as follow for a given X :
f(µ) = F(µX)
f(µ) varies from f(0)=F(0) to f(1)=± when µ varies from 0 to 1. Since F

is continuous, so is f. Therefore, since f(0)dßdf(1), there exists some
µc
-[0,1] such that f(µ)=ß, i.e. F(µX)=ß.
Using (1), we have :
F(µX) = G(µ,F(X)) = G(µ,±) = ß

According to (1), for a fixed ± the function defined by G (µ) a G(µ,±)

±
is a continuous, monotonous and increasing function of µ in [0,1], so it has
-1
an inverse function G such that :
±
-1
µ = G (ß)
±
Therefore, µ does not depend on X, which shows that (S1) and (S2) are homot-
hetic.
Reciprocally, any surface homothetic to an isovalue surface of F is also an

isovalue surface of F. Consider the isovalue surface (S1) defined by (10).
Then the surface (S2) defined by :
*+
Y = µX, for some fixed µ in R and where X verifies (10)
is homothetic to (S1) with respect to the origin of coordinates. Using (1),

we get :
F(Y) = F(µX) = G(µ,F(X)) = G(µ,±) = some constant in R
So, (S2) is also an isovalue surface of F.
The fact that isovalue surfaces are homothetic to one another is the main
characterization of functions verifying (1). So, such "Homothetic Isovalue
Surfaces" functions defined by (1) will be described as "HIS" functions.
4.2. Fundamental property of the gradient of a HIS function
Differentiating definition (1) with respect to µ yields :
F'(µX).X = G'(µ,F(X))
1
where F' stands for the gradient of F and G'

for the derivative
1
with respect to the first variable of G,
n n
and "." denotes the scalar product of two vectors R x R -> R.
Taking µ=1 in equation above yields :

(11) F'(X).X = G'(1,F(X))

1
on an isovalue surface (S) defined by F(X)=±, we then have :
*+
F'(X).X = G'(1,±) = constant in R
1
This means that the scalar product F'(X).X is constant on any isovalue surface
of F. This is the fundamental property which has been used previously in
section 2 to minimize exactly HIS functions.
n
Reciprocally, assume we have some function F : R ->R with continuous
derivatives, such that :
n
(12) F'(X).X = h(F(X)), -
v X -
c R
*+
where h : R->R
*+
Rewriting (12) for µX, µ -
c R , we get :
F'(µX).(µX) = h(F(µX))
that is :
F'(µX).X
(13) = 1/µ
h(F(µX))
h(y) being defined and positive for all y>F(0) implies that 1/h(y) is also
defined and positive for all y>F(0). Therefore, there exists a function H :
R->R such that :
H'(y) = 1/h(y), y>F(0)
where H' stands for the first derivative of H.
Thus, (13) can be rewritten as follows :
dH(F(µX))
= 1/µ
dµ
Integrating this, we obtain :
H(F(µX)) = ln(µ) + c(X)
where ln : R->R is the Napierian logarithm,

n
and c : R ->R is some constant with respect to µ.
For µ=1, we get :
H(F(X)) = c(X)
So, we finally obtain :
(14) H(F(µX)) = ln(µ) + H(F(X))
From the fact that H'(y) is continuous and positive, we infer that H has a
-1 -1
inverse function H : R->R ; applying H to both sides of (14), we
obtain :
-1
F(µX) = H (ln(µ)+H(F(X))
-1 *+
Defining G(x,y) a H (ln(x)+H(y)) for (x,y)c
- R x R, we get :
F(µX) = G(µ,F(X))
which is precisely the form (1). From this and the facts that :
-1
G' (µ,±) = 1/H'(H (ln(µ)+H(±))) 1/µ
1
-1
= h(H (ln(µ)+H(±))) / µ
-1
and G' (µ,±) = 1/H'(H (ln(µ)+H(±))) H'(±)
2
-1
= h(H (ln(µ)+H(±))) / h(±)
are both continuous and positive for all µ,±, we can conclude that any function
verifying (12) is a HIS function.
4.3. Relation between the gradient and the Hessian matrix

Differentiating (11) with respect to X yields :
F"(X)*X + F'(X) = G"(1,F(X)) F'(X)

12
n n²
where F" : R ->R is the Hessian matrix of F,
* denotes the matrix-vector product,
and G" is the second derivative of G with respect to the first

12
and the second variable in turn.
This can be rewritten as :
F"(X)*X = [G"(1,F(X))-1] F'(X)

12
which expresses that everywhere, the gradient is colinear to the product of

the Hessian matrix by the gradient.
4.4. Existence of an >0 such that F(X+D)=F(X) for all X, D being a

descent direction
n
Let X be some point of R , and D some descent direction for F that defines
a line passing through X. We will show now that if F is a HIS function,

*+
there always exists an ±-cR such that :
F(X+±D) = F(X)
*+
First, we will show that there exists some ß
- c
R such that :
F(X+ßD) e F(X)
The proof is by refutation. Assume there were to be no such ß for some given
X,D. Then, we would have :
*+
F(X+ßD) < F(X) -
v ß -
c R
Let us consider the following points :
Y = X/ß
and Z = (X+ßD)/ß = Y + D
The values of F at these points would be :
F(Y) = F(X/ß) = G(1/ß,F(X))

and F(Z) = F((X+ßD)/ß) = G(1/ß,F(X+ßD))
G(µ,ß) is a increasing function of ß for all µ>0, ß>0. So F(X+ßD)<F(X) for

all ß>0 would then imply that F(Z)<F(Y) for all ß>0. Therefore, if we were
to make ß go to infinity, then Y would converge to 0 and Z would converge to
D. F being continuous, then F(Y) would converge to F(0). But F(Y)>F(Z)eF(0)
would imply in turn that F(Z) would also converge to F(0). Since F is conti-
nuous, this would imply that F(D)=F(0). Since D is a descent direction, it
is not the null vector and therefore F would have two distinct minima 0 and
D, which is contradictory with (1). Hence, for any descent direction D from
a point X, there exists some ß>0 such that F(X+ßD)eF(X).
So, there exist both a real ß>0 such that F(X+ßD)eF(X) and a real µ>0 such
that F(X+µD)<F(X). Since F is continuous, this implies that there is some
real ß>±>µ>0 such that F(X+±D)=F(X), whenever F is a HIS function and D is
a descent direction from X.
4.5. Building a HIS function
Let there be a hypersurface defined by the implicit equation :
n
S(X) = 0, X-cR
n
such that for any given Y- cR , Y=
/0, the equation :
S(ÄY) = 0
*+
has a unique solution Ä- cR .
*+
Let there be some function K : R ->R having a continuous first derivative
*+
K' such that K'(µ)>0 for all µ- cR ; then we can define a function
n
F : R ->R by :
F(X) = K(Ä),
where Ä is the unique solution of S(ÄX)=0.
Rewriting the previous equation for µX yields :
F(µX) = K(Ã),
where Ã is the unique solution of S(ÃµX)=0.
Therefore Ã=Ä/µ, where Ä is the unique solution of S(ÄX)=0.

-1
Since K has a continuous positive derivative, it has an inverse function K
*+
R->R such that :
-1 *+
K [K(Ä)] = Ä -vÄ -c R .
-1
So we have : Ä=K [F(X)], and therefore :
-1
F(µX) = K(K [F(X)]/µ)
n
Let G : R ->R be defined by :
-1 *+
G(µ,±) = K(K [±]/µ) - vµ -c R , -v ± -c R, ±>F(0).
Then F(µX)=G(µ,F(X)) and :
-1 -1
G'(µ,±) = K'(K [±]/µ)K [±]/µ²
1
-1
is positive from the fact that K'(µ)>0 wherever it is defined, and that K
*+
takes its values in R . Similarly :
-1 -1
G'(µ,±) = K'(K [±]/µ)(K [±])'/µ =
2
-1 -1
K'(K [±]/µ)/K'(K [±])/µ
is positive from the fact that K'(µ)>0 wherever it is defined and µ>0.
So, G fulfils the requirements of (1) and thereafter, F is a HIS function.

In other terms, this means that we can build a HIS function given an arbitrary
hypersurface containing the origin and an arbitrary positive real-valued
increasing function. This can be regarded as another way to define the rather
broad class of HIS functions.
4.6. Relation to homogeneous functions
n
A function F : R ->R is said to be homogeneous of degree p if it verifies :
p n
F(µX) = µ F(X) for all µ- cR and X -c R
Obviously, such a function is a HIS function for which G is defined by :
p
G(µ,Y) = µ Y
Conversely, it is interesting to note that there is a close relationship

between HIS functions and homogeneous ones : a HIS function can be transformed
into an homogeneous function through nonlinear scaling.
Let h : R->R be a function with a positive first derivative, and let K :

n
R ->R be defined by :
K(X) = h(F(X))
Such a function K is said to be a nonlinear (if h is nonlinear) scaling of F.
Substituting µX for X yields :
K(µX) = h(F(µX))
and differentiating with respect to µ and taking µ=1 yields :
K'(X).X = h'(F(X))F'(X).X
From (11) then we have :
K'(X).X = h'(F(X))G'(1,F(X))
We can now choose h such that K is homogeneous of degree p -

c R ; all we need
is that :
n
K'(X).X = pK(X) for all X -
c R
This means that :
h'(F(X))G'(1,F(X)) = ph(F(X))
If by Y we denote F(X) we get :
h'(Y) p
=
h(Y) G'(1,Y)
which can be integrated to :
dY
ln(h(Y)) = p !
G'(1,Y)
So, h is defined by :
dY
h(Y) = exp(p! ) for all Y -
c R,
G'(1,Y)
and for this choice of h, K is a HIS function.
ACKNOWLEDGMENTS
The author thanks Professor W. BOCQUET for having made this work possible in
his laboratory.
The author also thanks Professor P. LAURENT for his careful reading of the
manuscript and his helpful suggestions, and Mlle MAGNOUX for her invaluable
help in bibliographic database searching.
REFERENCES
[1] B. BRUET, "A Corrected-Gradient Method that makes Gradient

Optimization Algorithms Invariant to any Non-Linear Scaling"
Engineering Optimization, 1980, Vol. 5, pp. 13-18
[2] M. AVRIEL, "Nonlinear Programming: Analysis and Methods",

Prentice-Hall, Series in Automatic Computation, 1976,
ISBN 0-13-623603-0
[3] H. Y. HUANG, "Unified Approach to Quadratically Convergent

Algorithms for Function Minimization", JOTA, 5, (6), (1970)
[4] E. POLAK et G. RIBIERE, "Note sur la Convergence de Méthodes

de Directions Conjuguées" R.I.R.O. 3ème année, (16), pp. 35-45,
(1969)
__________

A Method To Minimize Exactly A Broad Class of Nonlinear Functions

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Method To Minimize Exactly A Broad Class of Nonlinear Functions

Uploaded by

Copyright:

Available Formats

A METHOD TO MINIMIZE EXACTLY A BROAD CLASS OF

Laboratoire Productique et Logistique

A technique is presented that minimizes exactly (in a finite number of steps)

Keywords : nonlinear optimization, unconstrained optimization, nonlinear

1. DEFINITION AND MINIMIZATION OF "HIS" FUNCTIONS

The mathematical properties of such functions will be reviewed in further

(2) F'(X).X = G'(1,F(X))

This is simply a translation from the origin of coordinates to the minimum,

Definition (1) then becomes :

where G' and G' denotes the derivatives with respect

Similarly, equation (3) yields :

(4) f'(x).(x-x*) = G'(1,f(x))

In this section, we will present a method to minimize exactly such HIS

1.1. Overview of the method

Assume we start at a given point x and we perform an unidimensional search

Since d is supposed to be a descent direction for f when starting from x, we

(5) (f'(y)-f'(x)).x* = f'(y).y - f'(x).x

where y = x + ±d for some ± -c R

This form could be used directed by constructing a set of n independent

Let us find a point z lying on (D) such that :

(6) (f'(y)-f'(x)).z = f'(y).y - f'(x).x

Writing z=x+ßd and remembering that y=x+±d, we get :

(f'(y)-f'(x)).(x+ßd) = f'(y).(x+±d) - f'(x).x

Reordering and simplifying this equation yields :

It is interesting to note that on general HIS functions, there is no relation

By hypothesis, d is a descent direction from x, so that f'(x).d should always

This means that x* lies in the hyperplane orthogonal to f'(y)-f'(x) and

This is accomplished by orthogonalizing these differences through a process

Our method can now be summarized as follows :

- choose an inital point x and the first search direction

- compute z = x + ßd where ß is determined by formula (7) ;

done, null gradient, little changing f or x, etc) ;

- if not terminated, set x = z and d = projection

- iterate process with k set to k+1.

1.2. Special issues

which could be rewritten as :

(8) f'(x).d - f'(y).d = 0

f'(x).d - f'(y).d < 0

But, using (4), we would get :

1.3. Special case : quadratic f

It is interesting to see what becomes of ß as defined above when f is a

f'(y) = f'(x) + ±Ad

where A is the Hessian matrix of f, and Ad the matrix-vector product

So the expression (7) yielding ß can be simplified and rewritten as :

f(y) = f(x) + ±f'(x).d + ±²d.Ad/2

and using f(y)=f(x) we get :

The non trivial solution for ± is therefore :

Replacing and simplifying in (9) gives :

which means that, as could be expected, z is midway between x and y, i.e. z

For quadratic functions, at every iteration k, the gradient difference

This means that the direction d is conjugate to d with respect to A

Given a sequence of n independent vectors, this property allows building

Therefore, when applied to quadratic functions, the method presented in this

1.4. Down sized algorithm for huge problems

Increasingly often, modern practical problems involve a huge number of

This is accomplished by keeping only the last p gradient differences (where

So, the method can be dynamically adjusted to whatever storage is reasonably

This feature can be a definite advantage when compared to classic matrix

Numerical experiments presented in section 3.2 include results for such a

2. MINIMIZING NON "HIS" FUNCTIONS

By the way, it is interesting to remark that searching for a point y such

Now, our algorithm can be stated more precisely as follows :

0. set x = some initial guess