NONLINEAR FUNCTIONS*

B. BRUET

_______
* Ingénieur ECP - Docteur-Ingénieur de l'Université Pierre et Marie Curie
Abstract
Let R*+ be the set of strictly positive real numbers and let F : R^n -> R be a
function having continuous first derivatives and a unique minimum at the
origin, such that:

(1)  F(µX) = G(µ, F(X)),   ∀X ∈ R^n, ∀µ ∈ R*+

where G : R*+ x R -> R has continuous first derivatives G'_1 and G'_2 such
that:

G'_1(µ,α) > 0,   ∀µ ∈ R*+, ∀α ∈ R, α > F(0),
G'_2(µ,α) > 0,   ∀µ ∈ R*+, ∀α ∈ R, α > F(0).
We consider the minimization of a function f : R^n -> R of the form:

(2)  f(x) = F(x - x*)

where F : R^n -> R is a HIS function as defined above. Such a function f then
verifies:

(3)  f(x* + µ(x - x*)) = G(µ, f(x)),   ∀x ∈ R^n, ∀µ ∈ R*+
where G : R*+ x R -> R has continuous first derivatives such that:

G'_1(µ,α) > 0,   ∀µ ∈ R*+, ∀α ∈ R,
G'_2(µ,α) > 0,   ∀µ ∈ R*+, ∀α ∈ R.

Differentiating (3) with respect to µ at µ = 1 gives:

(4)  f'(x).(x - x*) = G'_1(1, f(x))
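For a concrete illustration (our example, not from the original text):
F(X) = (X.X)^p with p > 0 is a HIS function, since

F(µX) = µ^{2p} (X.X)^p = G(µ, F(X))   with   G(µ,α) = µ^{2p} α,

whose partial derivatives G'_1(µ,α) = 2p µ^{2p-1} α and G'_2(µ,α) = µ^{2p}
are indeed positive for µ ∈ R*+ and α > F(0) = 0.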
(D)  y = x + αd
Similarly to the method described in [1], let us search "uphill" (that is,
toward increasing values of f) for a point y ≠ x such that:

f(y) = f(x)

As we will show in section 4.2, such a point necessarily exists for a HIS
function. Let us now derive a more usable form of equation (4). After
determining a point y such that f(y) = f(x), we have
G'_1(1, f(y)) = G'_1(1, f(x)), and according to equation (4) we can write:

f'(y).(y - x*) = f'(x).(x - x*)
Reordering, we get:

(7)  β = α f'(y).d / (f'(y).d - f'(x).d)

Since d is a descent direction at x and y has been found uphill, we have
f'(x).d < 0 and f'(y).d > 0, so that:

0 < f'(y).d / (f'(y).d - f'(x).d) < 1

This means that, as one could expect, the point z = x + βd is located
somewhere between x and y, on the line joining x to y.
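The computation above is easy to turn into code. The following Python sketch
(ours, not from the paper; the names and the bracketing strategy are our own
choices) locates the uphill point y by shrinking and doubling the step along d
and bisecting onto the level set, then computes β and z from equation (7):

```python
import numpy as np

def uphill_point(f, x, d, step=1.0, tol=1e-12):
    """Bracket and bisect to find alpha > 0 with f(x + alpha d) = f(x),
    searching 'uphill' along the descent direction d."""
    fx = f(x)
    a = step
    while f(x + a * d) >= fx:        # shrink until strictly below f(x)
        a *= 0.5
        if a < 1e-16:
            raise RuntimeError("d does not behave like a descent direction")
    b = a
    while f(x + b * d) < fx:         # double until back at/above f(x)
        b *= 2.0
        if b > 1e16:
            raise RuntimeError("no uphill point found (f may not be HIS)")
    while b - a > tol * b:           # bisect onto the level set f(y) = f(x)
        m = 0.5 * (a + b)
        a, b = (m, b) if f(x + m * d) < fx else (a, m)
    return 0.5 * (a + b)

def next_iterate(f, gf, x, d):
    """Equation (7): z = x + beta d lies between x and the uphill point y."""
    alpha = uphill_point(f, x, d)
    y = x + alpha * d
    gy_d, gx_d = gf(y) @ d, gf(x) @ d      # f'(y).d > 0, f'(x).d < 0
    beta = alpha * gy_d / (gy_d - gx_d)    # hence 0 < beta < alpha
    return x + beta * d, y

# example: f(X) = (X.X)^2 is HIS; from any x, one step along -f'(x)
# (a line through the origin) lands exactly on the minimizer:
f  = lambda x: float(x @ x) ** 2
gf = lambda x: 4.0 * float(x @ x) * x
x0 = np.array([1.0, 2.0])
z, y = next_iterate(f, gf, x0, -gf(x0))
print(z)   # ~ [0, 0]
```

On this radially symmetric example the single step lands on the minimizer
because the search line passes through x*, as the theory predicts.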
Using the point z determined above and subtracting (6) from (5) yields:

(f'(y) - f'(x)).(x* - z) = 0

The idea of the method is then to restrict the remaining searches to this
hyperplane, applying the same procedure again from the point z just computed.
The searches are thus confined to linear subspaces whose dimension decreases
by one at each iteration, which leads to termination in a finite number of
steps (at most n).
Up to now, no particular way of choosing the search directions has been
imposed, provided they are kept orthogonal to the subspace generated by the
gradient differences f'(y) - f'(x) issued from the previous steps. The natural
choice is then to take the new search direction as the projection of the
opposite of the gradient at the current iteration point onto the orthogonal
complement of the subspace generated by the gradient differences.
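A minimal sketch of this choice (our code; the paper does not prescribe an
implementation) keeps an orthonormal basis of the gradient differences and
removes their components from the opposite of the gradient:

```python
import numpy as np

def add_difference(basis, g_diff, tol=1e-12):
    """Orthonormalize a new gradient difference f'(y)-f'(x) against the
    stored basis (Gram-Schmidt) and append it unless it degenerates."""
    v = np.asarray(g_diff, dtype=float).copy()
    for u in basis:
        v -= (v @ u) * u
    norm = np.linalg.norm(v)
    if norm <= tol * max(1.0, np.linalg.norm(g_diff)):
        return False          # difference lies in the span of the basis
    basis.append(v / norm)
    return True

def new_direction(grad, basis):
    """Project -grad onto the orthogonal complement of span(basis)."""
    d = -np.asarray(grad, dtype=float).copy()
    for u in basis:
        d -= (d @ u) * u
    return d
```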
There are two possible cases of failure for the method as described above.
First, the newly computed difference f'(y) - f'(x) may lie within the subspace
generated by the previous differences. This leads to a null vector after
orthogonalization with respect to these differences.

We will now see that this is impossible as long as the direction d used for
the search is a descent one and the point y has been determined "uphill".

Suppose that the newly computed f'(y) - f'(x) were to lie within the subspace
generated by the previous gradient differences. Since the search direction d
has been chosen orthogonal to this subspace, d would necessarily be orthogonal
to f'(y) - f'(x). So we would have:

(8)  (f'(y) - f'(x)).d = 0

But from the fact that d is a descent direction, we have f'(x).d < 0, and from
the fact that y has been found "uphill", we have f'(y).d > 0; therefore we
obtain:

(f'(y) - f'(x)).d > 0

which contradicts (8). So a newly computed f'(y) - f'(x) cannot lie within the
subspace generated by the previous gradient differences.
Second, we will now show that the method cannot generate a direction that is
not a descent one. Since the direction is chosen as the projection of the
opposite of the gradient onto the orthogonal complement of the subspace
generated by the previous gradient differences, the new direction fails to be
a descent direction if and only if the gradient lies within the subspace
generated by the previous differences. In that case the direction d would
vanish, preventing the process from being continued.
A vanishing new direction would mean that the gradient at the iteration point
is either zero or orthogonal to the subspace where the optimum lies. If the
gradient is null, then the optimum is achieved. If not, at this iteration
point we would have:

f'(x).(x - x*) = 0

and therefore, according to equation (4):

f'(x).(x - x*) = G'_1(1, f(x)) = 0

which is contradictory with (3), since G'_1 must be positive. So the resulting
direction, chosen as above, cannot vanish when the method is applied to HIS
functions.
(9)  β = α + f'(x).d / (d.Ad)

Using the fact that f is quadratic, we can write, using a Taylor expansion
along d, that f(x + αd) = f(x) is equivalent to:

α f'(x).d + α² d.Ad/2 = 0

that is:

α = -2 f'(x).d / (d.Ad)

and therefore:

β = -f'(x).d / (d.Ad) = α/2

The directions generated on a quadratic then satisfy:

d_{k+1}.A d_j = 0,   for j = 0 to k
In both this method and the conjugate gradient algorithm, the initial
direction is the opposite of the gradient at the starting point. In both
methods also, the new direction is uniquely defined as the projection of the
opposite of the gradient at the iteration point onto the subspace orthogonal
to the subspace generated by the previous gradient vectors. Hence, by
induction, the resulting directions are the same in both methods.
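For a quadratic f(x) = ½x.Ax + b.x these closed forms are easy to check
numerically. The sketch below (our code, with randomly generated data)
performs one step of the scheme, using α from the Taylor expansion and
β = α/2, and verifies that consecutive directions are A-conjugate:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4.0 * np.eye(4)              # symmetric positive definite
b = rng.standard_normal(4)
grad = lambda x: A @ x + b                 # gradient of 0.5 x.Ax + b.x

x0 = rng.standard_normal(4)
d0 = -grad(x0)
alpha = -2.0 * (grad(x0) @ d0) / (d0 @ A @ d0)   # f(x0 + alpha d0) = f(x0)
x1 = x0 + 0.5 * alpha * d0                       # beta = alpha / 2

g_diff = grad(x0 + alpha * d0) - grad(x0)        # equals alpha * A d0
u = g_diff / np.linalg.norm(g_diff)
d1 = -grad(x1)
d1 -= (d1 @ u) * u                               # project out the difference

print(d1 @ A @ d0)    # ~ 0: d1 and d0 are A-conjugate
```

Since g_diff is proportional to A d0, orthogonality of d1 to g_diff is exactly
A-conjugacy of d1 and d0, which is why the two methods coincide on quadratics.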
The algorithm developed in section 1 does not explicitly rely on the fact that
the function being minimized is a HIS one. The method can therefore be used to
minimize general nonlinear functions, provided their first derivatives are
known. However, some of the results established in section 4 no longer hold
for non-HIS functions, and this must be accounted for. More specifically,
there may be no point y such that f(y) = f(x) along a descent path starting at
x, and the gradient at a newly computed iteration point may lie within the
subspace generated by the gradient differences. In both cases, the algorithm
cannot be continued and the only possibility left is to restart the process
from the beginning, i.e. with the search direction reset to the opposite of
the gradient. If the process were to fail again at this point, the algorithm
should be stopped and a failure reported.
Note that the method (as exposed in section 1) does not require line searches
for a minimum. This peculiarity may seem odd when compared with classical
minimization methods, where linear searches for a minimum form the building
blocks of the process. However, preliminary experiments showed no particular
advantage in adding an extra search for a minimum at the end of the algorithm,
and this feature has therefore been discarded.
1. set d_0 = -f'(x_0)
   set k = 0
   note: k is the iteration number, 0 ≤ k < n
2. set y_k = x_k + α_k d_k, with α_k chosen "uphill" so that f(y_k) = f(x_k)
   let β_k = α_k f'(y_k).d_k / ((f'(y_k) - f'(x_k)).d_k)
   let x_{k+1} = x_k + β_k d_k
   note: orthogonalization of the gradient difference
   orthogonalize f'(y_k) - f'(x_k) with respect to the previous differences
   let d_{k+1} = -f'(x_{k+1}), projected orthogonally to the stored differences
   set k = k + 1
   if k < n, go to 2
   set x_0 = x_k
   go to 1
   stop
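Assembling the pieces, here is a self-contained Python sketch of the complete
procedure (our reading of the listing above; the function names, tolerances
and bracketing strategy for the uphill step are our own assumptions, not the
paper's):

```python
import numpy as np

def uphill_alpha(f, x, d, fx, step=1.0):
    """Search for alpha > 0 with f(x + alpha d) = f(x); None on failure."""
    a = step
    while f(x + a * d) >= fx:          # shrink into the descent region
        a *= 0.5
        if a < 1e-16:
            return None                # d is not a descent direction
    b = a
    while f(x + b * d) < fx:           # expand back above the level set
        b *= 2.0
        if b > 1e16:
            return None                # no uphill point: f not HIS here
    while b - a > 1e-12 * b:           # bisect onto the level set
        m = 0.5 * (a + b)
        a, b = (m, b) if f(x + m * d) < fx else (a, m)
    return 0.5 * (a + b)

def minimize_his(f, gf, x0, max_restarts=20, gtol=1e-8):
    x = np.asarray(x0, dtype=float)
    for _ in range(max_restarts):      # restart loop ("go to 1")
        basis = []                     # orthonormalized gradient differences
        g = gf(x)
        d = -g
        for _ in range(len(x)):        # at most n steps per cycle
            if np.linalg.norm(g) < gtol:
                return x
            alpha = uphill_alpha(f, x, d, f(x))
            if alpha is None:
                break                  # failure: restart from scratch
            y = x + alpha * d
            gy = gf(y)
            beta = alpha * (gy @ d) / ((gy - g) @ d)   # equation (7)
            x = x + beta * d
            v = gy - g                 # orthogonalize the new difference
            for u in basis:
                v -= (v @ u) * u
            nv = np.linalg.norm(v)
            if nv < 1e-14:
                break                  # degenerate difference: restart
            basis.append(v / nv)
            g = gf(x)                  # projected steepest-descent direction
            d = -g
            for u in basis:
                d -= (d @ u) * u
            if np.linalg.norm(d) < 1e-14:
                break                  # optimum reached or restart needed
    return x
```

A production version would report failure when two successive restarts make
no progress, as discussed above; this sketch simply returns the current
iterate.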
3. NUMERICAL EXPERIMENTS
3.1. Overview
In section 3.2, we report numerical experiments with both the full version and
a downsized version of this algorithm. The downsized method is obtained by
keeping only the last four gradient differences, as mentioned in 1.2.2.
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c

        [ 4.5   7    3.5   3 ]        [ -0.5 ]
    A = [ 7    14    9     8 ]    b = [ -1.0 ]    c = 0.25
        [ 3.5   9    8.5   5 ]        [ -1.5 ]
        [ 3     8    5     7 ]        [  0   ]

parameter: p = 0.3
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
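For reproducibility, this test family can be transcribed into Python as
follows (our transcription of the data above):

```python
import numpy as np

A = np.array([[4.5,  7.0, 3.5, 3.0],
              [7.0, 14.0, 9.0, 8.0],
              [3.5,  9.0, 8.5, 5.0],
              [3.0,  8.0, 5.0, 7.0]])
b = np.array([-0.5, -1.0, -1.5, 0.0])
c = 0.25

def make_test(p):
    """f(x) = Q(x)**p with Q(x) = 0.5 x.Ax + b.x + c."""
    Q  = lambda x: 0.5 * x @ A @ x + b @ x + c
    f  = lambda x: Q(x) ** p
    gf = lambda x: p * Q(x) ** (p - 1) * (A @ x + b)
    return f, gf

f, gf = make_test(0.3)
x0 = np.ones(4)          # starting point x_i = 1
# one can check that A @ (0.5, -0.5, 0.5, 0) equals -b and that Q vanishes
# there, so the minimum value is f* = 0
```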
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c, with A, b, c as above
parameter: p = 0.5
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c, with A, b, c as above
parameter: p = 1.0
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c, with A, b, c as above
parameter: p = 2.0
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c, with A, b, c as above
parameter: p = 3.0
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
number of variables: n = 4
function: f(x) = [Q(x)]^p
where: Q(x) = ½xAx + bx + c, with A, b, c as above
parameter: p = 4.0
starting point: x_i = 1, i = 1 to n
solution: x* = (0.5, -0.5, 0.5, 0), f* = 0
number of variables: n = 5
function: f(x) = (½xHx)²
where: H_ij = 1/(i+j-1), i = 1 to n, j = 1 to n
starting point: x_i = -3, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = sin(Q(x)) + 1.001 Q(x)
where: Q(x) = Σ_i i x_i², i = 1 to n
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = ln(1 + Σ_i i|x_i|^p), i = 1 to n
parameter: p = 1.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = ln(1 + Σ_i i|x_i|^p), i = 1 to n
parameter: p = 3.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = (Σ_i i|x_i|^p)^p, i = 1 to n
parameter: p = 1.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = (Σ_i i|x_i|^p)^p, i = 1 to n
parameter: p = 3.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = sin(F(x)) + 1.001 F(x)
where: F(x) = Σ_i i|x_i|^p, i = 1 to n
parameter: p = 1.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 10
function: f(x) = sin(F(x)) + 1.001 F(x)
where: F(x) = Σ_i i|x_i|^p, i = 1 to n
parameter: p = 3.0
starting point: x_i = 1, i = 1 to n
solution: x* = 0, f* = 0
number of variables: n = 3
function: f(x) = 100[(x_3 - 10θ)² + (r - 1)²] + x_3²
where:
    2πθ = arctan(x_2/x_1),       x_1 > 0
          arctan(x_2/x_1) + π,   x_1 < 0
    r = (x_1² + x_2²)^½
starting point: x_0 = (-1, 0, 0)
solution: x* = (1, 0, 0), f* = 0
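The angular term θ is the delicate part of this classical helical-valley
function. A Python transcription (ours) can use atan2 for the two-branch
definition:

```python
import numpy as np

def helical_valley(x):
    # 2*pi*theta = arctan(x2/x1)        if x1 > 0
    #              arctan(x2/x1) + pi   if x1 < 0
    # atan2 matches these branches for x2 >= 0; for x1 < 0, x2 < 0 it
    # differs by 2*pi (a branch-cut choice the paper leaves implicit).
    theta = np.arctan2(x[1], x[0]) / (2.0 * np.pi)
    r = np.hypot(x[0], x[1])
    return 100.0 * ((x[2] - 10.0 * theta) ** 2 + (r - 1.0) ** 2) + x[2] ** 2

x0 = np.array([-1.0, 0.0, 0.0])   # starting point; minimum 0 at (1, 0, 0)
```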
number of variables: n = 2
function: f(x,y) = (x - 1)² + 100(y - x²)²
starting point: (x_0, y_0) = (0, 0)
solution: (x*, y*) = (1, 1), f* = 0
number of variables: n = 4
function: f(x) = 100(x_1² - x_2)² + (1 - x_1)² +
                 90(x_3² - x_4)² + (1 - x_3)² +
                 10.1[(x_2 - 1)² + (x_4 - 1)²] +
                 19.8(x_2 - 1)(x_4 - 1)
starting point: x_0 = (-3, -1, -3, -1)
solution: x* = (1, 1, 1, 1), f* = 0
number of variables: n = 4
function: f(x) = Σ_{i=1}^{n/4} { 100(x_{4i-3}² - x_{4i-2})² + (1 - x_{4i-3})² +
                                 90(x_{4i-1}² - x_{4i})² + (1 - x_{4i-1})² +
                                 10.1[(x_{4i-2} - 1)² + (x_{4i} - 1)²] +
                                 19.8(x_{4i-2} - 1)(x_{4i} - 1) }
starting point: x_i = -3, i = 1 to n
solution: x*_i = 1, i = 1 to n, f* = 0
number of variables: n = 20
function: f(x) = Σ_{i=1}^{n/4} { 100(x_{4i-3}² - x_{4i-2})² + (1 - x_{4i-3})² +
                                 90(x_{4i-1}² - x_{4i})² + (1 - x_{4i-1})² +
                                 10.1[(x_{4i-2} - 1)² + (x_{4i} - 1)²] +
                                 19.8(x_{4i-2} - 1)(x_{4i} - 1) }
starting point: x_i = -3, i = 1 to n
solution: x*_i = 1, i = 1 to n, f* = 0
number of variables: n = 80
function: f(x) = Σ_{i=1}^{n/4} { 100(x_{4i-3}² - x_{4i-2})² + (1 - x_{4i-3})² +
                                 90(x_{4i-1}² - x_{4i})² + (1 - x_{4i-1})² +
                                 10.1[(x_{4i-2} - 1)² + (x_{4i} - 1)²] +
                                 19.8(x_{4i-2} - 1)(x_{4i} - 1) }
starting point: x_i = -3, i = 1 to n
solution: x*_i = 1, i = 1 to n, f* = 0
number of variables: n = 4
function: f(x) = (x_1 + 10x_2)² + 5(x_3 - x_4)² + (x_2 - 2x_3)⁴ + 10(x_1 - x_4)⁴
starting point: x_0 = (3, -1, 0, 1)
solution: x* = 0, f* = 0
number of variables: n = 20
function: f(x) = Σ_{i=1}^{n/4} [ (x_{4i-3} + 10x_{4i-2})² + 5(x_{4i-1} - x_{4i})² +
                                 (x_{4i-2} - 2x_{4i-1})⁴ + 10(x_{4i-3} - x_{4i})⁴ ]
starting point: (x_{4i-3}, x_{4i-2}, x_{4i-1}, x_{4i}) = (3, -1, 0, 3), i = 1 to n/4
solution: x* = 0, f* = 0
number of variables: n = 80
function: f(x) = Σ_{i=1}^{n/4} [ (x_{4i-3} + 10x_{4i-2})² + 5(x_{4i-1} - x_{4i})² +
                                 (x_{4i-2} - 2x_{4i-1})⁴ + 10(x_{4i-3} - x_{4i})⁴ ]
starting point: (x_{4i-3}, x_{4i-2}, x_{4i-1}, x_{4i}) = (3, -1, 0, 3), i = 1 to n/4
solution: x* = 0, f* = 0
The following tables show the number of times a given method obtained a given
rank over the 27 test problems above. Every number is followed by the
corresponding percentage in parentheses. Note that the percentages in a line
may add up to more than 100%, since several methods can share a given rank in
case of a tie on a test problem.

The following table shows the number of failures for every method.
3.3. Discussion
At first glance, the major outcome of the above experiments is that, on the
problems tested, the two versions of our algorithm performed similarly,
despite the downsized algorithm keeping only four directions.

On the iteration-count criterion, the downsized version and the full version
of our method obtain first rank in 59% and 74% of the cases respectively,
clearly ahead of the other methods.

On the gradient-count criterion, the results are more balanced: for first
rank, the full version of our algorithm comes first (37%), closely followed by
the BFGS method (33%), then by the downsized version of our algorithm (30%)
and finally by the conjugate gradient method (26%). This is because our method
carries the handicap of two gradient evaluations per iteration step, against
one for the BFGS method or the CG algorithm. Similar results hold for the
lower ranks.

These experiments therefore suggest that the method presented here may prove
even more attractive on larger problems, such as those arising in structural
design optimization, and that its use on such large practical problems would
be worth investigating further.
4. MATHEMATICAL STUDY
(10)  F(X) = α,  α ∈ R
      F(Y) = β,  β ∈ R

f(µ) = F(µX)

Y = µX, for some fixed µ ∈ R*+ and where X verifies (10)

The fact that isovalue surfaces are homothetic to one another is the main
characterization of functions verifying (1). Such "Homothetic Isovalue
Surfaces" functions, defined by (1), will therefore be called "HIS" functions.
Differentiating (1) with respect to µ gives:

F'(µX).X = G'_1(µ, F(X))

and, for µ = 1, on the isovalue surface F(X) = α:

(11)  F'(X).X = G'_1(1, α) = constant in R*+

This means that the scalar product F'(X).X is constant on any isovalue surface
of F. This is the fundamental property which was used in section 2 to minimize
HIS functions exactly.
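This invariant is easy to observe numerically. The sketch below (our example)
takes F(X) = (½X.AX)^p, samples random points on one isovalue surface, and
evaluates F'(X).X:

```python
import numpy as np

rng = np.random.default_rng(1)
M = rng.standard_normal((3, 3))
A = M @ M.T + np.eye(3)                     # symmetric positive definite
p = 0.7
F  = lambda X: (0.5 * X @ A @ X) ** p
gF = lambda X: p * (0.5 * X @ A @ X) ** (p - 1) * (A @ X)

level = 2.0
for _ in range(4):
    U = rng.standard_normal(3)
    t = (level ** (1.0 / p) / (0.5 * U @ A @ U)) ** 0.5
    X = t * U                               # now F(X) == level
    print(gF(X) @ X)                        # always 2 * p * level
```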
Conversely, assume we have some function F : R^n -> R with continuous
derivatives, such that:

(12)  F'(X).X = h(F(X)),   ∀X ∈ R^n

where h : R -> R*+.

Rewriting (12) for µX, µ ∈ R*+, we get:

F'(µX).(µX) = h(F(µX))

that is:

(13)  F'(µX).X / h(F(µX)) = 1/µ
h(y) being defined and positive for all y > F(0) implies that 1/h(y) is also
defined and positive for all y > F(0). Therefore, there exists a function
H : R -> R, with H' = 1/h, such that:

dH(F(µX))/dµ = 1/µ

Integrating with respect to µ gives H(F(µX)) = ln(µ) + c(X) for some function
c; setting µ = 1 shows that:

H(F(X)) = c(X)

and therefore:

(14)  H(F(µX)) = ln(µ) + H(F(X))

From the fact that H'(y) is continuous and positive, we infer that H has an
inverse function H⁻¹ : R -> R; applying H⁻¹ to both sides of (14), we obtain:

F(µX) = H⁻¹(ln(µ) + H(F(X)))

Defining G(x,y) := H⁻¹(ln(x) + H(y)) for (x,y) ∈ R*+ x R, we get:

F(µX) = G(µ, F(X))
which is precisely the form (1). From this and from the facts that:

G'_1(µ,α) = (1/µ) / H'(H⁻¹(ln(µ)+H(α))) = h(H⁻¹(ln(µ)+H(α))) / µ

and:

G'_2(µ,α) = H'(α) / H'(H⁻¹(ln(µ)+H(α))) = h(H⁻¹(ln(µ)+H(α))) / h(α)

are both continuous and positive for all µ, α, we can conclude that any
function verifying (12) is a HIS function.
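As a sanity check of this converse (our example, not in the original):
F(X) = (½X.AX)^p, with A symmetric positive definite, satisfies (12) with
h(y) = 2py, since F'(X).X = 2pF(X). Then H(y) = ln(y)/(2p),
H⁻¹(t) = exp(2pt), and:

G(µ,α) = H⁻¹(ln(µ) + H(α)) = µ^{2p} α

which indeed reproduces F(µX) = µ^{2p} F(X).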
Let X be some point of R^n and D some descent direction for F, defining the
line along which we look for a point where:

F(X + αD) = F(X)

First, we will show that there exists some β ∈ R*+ such that:

F(X + βD) ≥ F(X)

The proof is by contradiction. Assume there were no such β for some given
X, D. Then we would have:

F(X + βD) < F(X),   ∀β ∈ R*+
Consider the points:

Y = X/β
and Z = (X + βD)/β = Y + D

So, there exist both a real β > 0 such that F(X + βD) ≥ F(X), and a real
µ > 0 such that F(X + µD) < F(X). Since F is continuous, this implies that
there is some real α, with β > α > µ > 0, such that F(X + αD) = F(X), whenever
F is a HIS function and D is a descent direction from X.
S(X) = 0,   X ∈ R^n

such that, for any given Y ∈ R^n, Y ≠ 0, the equation:

S(λY) = 0

has a unique solution λ ∈ R*+.
Let there be some function K : R*+ -> R having a continuous first derivative
K' such that K'(µ) > 0 for all µ ∈ R*+; then we can define a function
F : R^n -> R by:

F(X) = K(λ), where λ is the unique solution of S(λX) = 0

For µ ∈ R*+, λ/µ is then the unique solution of S((λ/µ)(µX)) = 0, so that:

F(µX) = K(λ/µ)

K being continuous and strictly increasing, it has an inverse function K⁻¹
such that:

K⁻¹[K(λ)] = λ,   ∀λ ∈ R*+

So we have λ = K⁻¹[F(X)], and therefore:

F(µX) = K(K⁻¹[F(X)]/µ)

Let G : R*+ x R -> R be defined by:

G(µ,α) = K(K⁻¹[α]/µ),   ∀µ ∈ R*+, ∀α ∈ R, α > F(0).

Its first partial derivative:

G'_1(µ,α) = K'(K⁻¹[α]/µ) K⁻¹[α]/µ²

is positive from the fact that K'(µ) > 0 wherever it is defined, and that K⁻¹
takes its values in R*+. Similarly:

G'_2(µ,α) = K'(K⁻¹[α]/µ) (K⁻¹[α])'/µ = K'(K⁻¹[α]/µ) / (µ K'(K⁻¹[α]))

is positive from the fact that K'(µ) > 0 wherever it is defined and µ > 0.
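As a concrete instance of this construction (our choice of S and K, purely
illustrative): take the ellipsoid S(X) = ½X.AX - 1, with A symmetric positive
definite, and K(λ) = λ. Then S(λX) = 0 has the unique solution
λ(X) = (½X.AX)^{-½} in R*+, so F(X) = (½X.AX)^{-½} and indeed:

F(µX) = λ/µ = K(K⁻¹[F(X)]/µ)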
A function F : R^n -> R is said to be homogeneous of degree p if it verifies:

F(µX) = µ^p F(X),   ∀µ ∈ R*+, ∀X ∈ R^n

Such a function is a HIS function, with:

G(µ,Y) = µ^p Y

Conversely, given a HIS function F, let us look for a function h such that
K(X) = h(F(X)) is homogeneous of degree p. From K(µX) = h(F(µX)) and (11) we
get:

K'(X).X = h'(F(X)) F'(X).X
K'(X).X = h'(F(X)) G'_1(1, F(X))

while homogeneity of degree p requires (Euler's identity):

K'(X).X = p K(X),   ∀X ∈ R^n
hence:

h'(F(X)) G'_1(1, F(X)) = p h(F(X))

that is:

h'(Y)/h(Y) = p / G'_1(1, Y)

ln(h(Y)) = p ∫ dY / G'_1(1, Y)

So, h is defined by:

h(Y) = exp( p ∫ dY / G'_1(1, Y) ),   ∀Y ∈ R.
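For example (our illustration): if F is homogeneous of degree q, then
G(µ,Y) = µ^q Y, so G'_1(1,Y) = qY and the formula gives:

h(Y) = exp( p ∫ dY/(qY) ) = Y^{p/q}

so that K(X) = F(X)^{p/q}, which is indeed homogeneous of degree p.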
ACKNOWLEDGMENTS
The author thanks Professor W. BOCQUET for having made this work possible in
his laboratory.
The author also thanks Professor P. LAURENT for his careful reading of the
manuscript and his helpful suggestions, and Mlle MAGNOUX for her invaluable
help in bibliographic database searching.
REFERENCES
__________