
A nonsmooth inexact Newton method for the solution of large-scale nonlinear complementarity problems

Francisco Facchinei^a,1 and Christian Kanzow^b

a Università di Roma "La Sapienza", Dipartimento di Informatica e Sistemistica, Via Buonarroti 12, 00185 Roma, Italy. e-mail: soler@peano.dis.uniroma1.it
b University of Hamburg, Institute of Applied Mathematics, Bundesstrasse 55, D-20146 Hamburg, Germany. e-mail: kanzow@math.uni-hamburg.de

Abstract

A new algorithm for the solution of large-scale nonlinear complementarity problems is introduced. The algorithm is based on a nonsmooth equation reformulation of the complementarity problem and on an inexact Levenberg-Marquardt-type algorithm for its solution. Under mild assumptions, and requiring only the approximate solution of a linear system at each iteration, the algorithm is shown to be both globally and superlinearly convergent, even on degenerate problems. Numerical results for problems with up to 10000 variables are presented.

1 Introduction

We consider the complementarity problem NCP(F), which is to find a vector in $\mathbb{R}^n$ satisfying the conditions

$$x \ge 0, \qquad F(x) \ge 0, \qquad x^T F(x) = 0,$$

where $F : \mathbb{R}^n \to \mathbb{R}^n$ is a continuously differentiable function. Nonlinear complementarity problems have important applications, see, e.g., [11,19], which
often call for the solution of large-scale problems. During the last few years many methods have been developed for the solution of the nonlinear complementarity problem NCP(F), see [17,25] and references therein, but there have been only a few proposals for the solution of large-scale problems, see,
1 The work of this author was partially supported by Agenzia Spaziale Italiana, Roma, Italy.

Preprint submitted to Elsevier Science, 26 May 1995


e.g., [24,14,16,7,2]. Some of these proposals [24,14] require, at each iteration, the approximate solution of a linear complementarity problem, which may be cumbersome from a computational point of view. Results for large-scale problems have also been obtained by using direct methods for the solution of the subproblems arising at each iteration [2,7]. But the computational costs associated with the exact solution of the subproblems preclude the use of these methods for really large problems. Finally, we point out that it would be easy to conceive of inexact extensions of methods which reduce the nonlinear complementarity problem to a smooth system of equations. However, with these methods fast convergence is precluded in the case of degenerate solutions, and it is well known that degenerate solutions occur, especially in the large-scale case.
In this paper we propose a new method for the solution of NCP(F) which is based on a semismooth equation reformulation of the nonlinear complementarity problem. This reformulation hinges on the use of the convex function $\varphi : \mathbb{R}^2 \to \mathbb{R}$ defined by

$$\varphi(a, b) := \sqrt{a^2 + b^2} - a - b. \tag{1}$$

The main property of this function is that

$$\varphi(a, b) = 0 \iff a \ge 0, \quad b \ge 0, \quad ab = 0,$$

so that NCP(F) is obviously equivalent to the nonlinear system $\Phi(x) = 0$, where $\Phi : \mathbb{R}^n \to \mathbb{R}^n$ is defined by

$$\Phi(x) := \begin{pmatrix} \varphi(x_1, F_1(x)) \\ \vdots \\ \varphi(x_n, F_n(x)) \end{pmatrix}. \tag{2}$$
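For concreteness, the reformulation is straightforward to evaluate in code. The following sketch (ours, not part of the paper) assumes NumPy and a user-supplied routine F returning the vector F(x); it evaluates $\varphi$ componentwise, the operator $\Phi$ of (2), and the merit function $\Psi$ introduced below.

    import numpy as np

    def phi(a, b):
        # Fischer's function (1): phi(a, b) = sqrt(a^2 + b^2) - a - b,
        # which vanishes exactly when a >= 0, b >= 0 and a*b = 0.
        return np.sqrt(a * a + b * b) - a - b

    def Phi(x, F):
        # Componentwise reformulation (2): Phi(x)_i = phi(x_i, F_i(x)),
        # so that NCP(F) is equivalent to Phi(x) = 0.
        return phi(x, F(x))

    def Psi(x, F):
        # Natural merit function Psi(x) = 0.5 * Phi(x)^T Phi(x).
        v = Phi(x, F)
        return 0.5 * (v @ v)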

The function (1) was introduced by Fischer [12] in 1992. Since then it has attracted the interest of several researchers in the fields of complementarity, constrained optimization and variational inequality problems; see [13] for a survey of its applications.
We propose an inexact Levenberg-Marquardt-type method for the solution of the semismooth system $\Phi(x) = 0$. This method requires, at each iteration, the approximate solution of a symmetric positive semidefinite and solvable linear system. We show that, under very mild assumptions, the algorithm is locally superlinearly and even quadratically convergent to a solution of the nonlinear complementarity problem. In order to globalize the local method we perform an Armijo-type line search to minimize the natural merit function

$$\Psi(x) := \frac{1}{2} \Phi(x)^T \Phi(x),$$

which, in spite of the nonsmoothness of $\Phi$, turns out to be continuously differentiable.
The new algorithm is based, on the one hand, on the recently developed theory for the solution of semismooth systems of equations by Newton-type methods, see [30,29,22], and, on the other hand, on the properties of the functions $\Phi$ and $\Psi$, which have also been the subject of recent research [10,5]. From the theoretical point of view, the new method compares favourably with existing methods. Numerical tests on problems with up to 10000 variables are reported to show also its practical viability. As far as we are aware, the algorithm we propose here is the first algorithm for nonlinear complementarity problems that requires, at each iteration, only the inexact solution of a linear system (which is an "easy" task) and can nevertheless guarantee fast convergence even on degenerate problems.
The paper is organized as follows: In Section 2 we recall some recent results on semismooth functions and prove some related new results that are needed to analyze the behavior of the algorithm. Some important properties of the operators $\Phi$ and $\Psi$ are summarized in Section 3. Section 4 contains a detailed description of the new method along with the corresponding global and local convergence theory. Some implementation issues of the algorithm are discussed in Section 5, while the numerical tests are reported in Section 6.
A few words about notation. Given a function $G : \mathbb{R}^n \to \mathbb{R}^n$, we denote by $G_i$ its $i$th component function. If $G$ is continuously differentiable, $G'(x)$ is the Jacobian matrix of $G$ at a point $x \in \mathbb{R}^n$. If $G$ is directionally differentiable, we denote the directional derivative of $G$ at $x \in \mathbb{R}^n$ in the direction $d \in \mathbb{R}^n$ by $G'(x; d)$. The symbol $\|\cdot\|$ indicates the Euclidean vector norm or its associated matrix norm. The index set $\{1, \ldots, n\}$ is indicated by $I$. If $M \in \mathbb{R}^{n \times n}$ is a matrix with elements $m_{ij}$, $i, j \in \{1, \ldots, n\}$, and if $K$ and $J$ are subsets of $\{1, \ldots, n\}$, then $M_{KJ}$ denotes the $|K| \times |J|$ submatrix of $M$ consisting of the elements $m_{ij}$, $i \in K$, $j \in J$.

2 Differentiability of Functions and Generalized Inexact Newton Methods

In this section we recall recent results on the differentiability of functions and the solution of semismooth systems of equations. Furthermore, we prove some new results about convergence rates of general algorithms for the solution of semismooth systems of equations and apply these results to the local analysis of a Levenberg-Marquardt-type method.

Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitzian; by Rademacher's theorem $G$ is differentiable almost everywhere. If we indicate by $D_G$ the set where $G$ is differentiable, we can define the B-subdifferential of $G$ at $x$ [29] as

$$\partial_B G(x) := \left\{ \lim_{x^k \to x, \; x^k \in D_G} G'(x^k) \right\},$$

and the generalized Jacobian of $G$ at $x$ [3] as

$$\partial G(x) := \operatorname{co} \partial_B G(x),$$

where co denotes the convex hull of a set.


Semismooth functions were introduced in [23] for functionals and further studied in [30].

Definition 1 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be locally Lipschitzian at $x \in \mathbb{R}^n$. We say that $G$ is semismooth at $x$ if

$$\lim_{\substack{H \in \partial G(x + t v'), \\ v' \to v, \; t \downarrow 0}} H v' \tag{3}$$

exists for every $v \in \mathbb{R}^n$. A semismooth function $G : \mathbb{R}^n \to \mathbb{R}^n$ is said to be strongly semismooth at $x \in \mathbb{R}^n$ if, for any $H \in \partial G(x + d)$ and for any $d \to 0$,

$$Hd - G'(x; d) = O(\|d\|^2). \tag{4}$$

Semismooth functions lie between Lipschitz functions and $C^1$ functions. It is known that if $G$ is semismooth at $x$ then it is also directionally differentiable there (in particular, formula (4) makes sense for this reason), and its directional derivative in the direction $v$ is given by (3).
In the study of algorithms for the local solution of semismooth systems of equations, the following regularity condition plays a role similar to that of the nonsingularity of the Jacobian in the study of algorithms for smooth systems of equations.

Definition 2 We say that a semismooth function $G : \mathbb{R}^n \to \mathbb{R}^n$ is BD-regular at $x$ if all elements in $\partial_B G(x)$ are nonsingular.

The BD-regularity condition has some noteworthy consequences; for example, it implies the local uniqueness of a solution $x^*$, see [29, Proposition 2.5]. In this paper we shall employ several times the following result [29, Lemma 2.6].

Proposition 3 Suppose that $x^*$ is a BD-regular solution of $G(x) = 0$, where $G : \mathbb{R}^n \to \mathbb{R}^n$ is semismooth. Then there exist a neighborhood $\Omega$ of $x^*$ and a constant $c$ such that

$$\|H^{-1}\| \le c \qquad \forall y \in \Omega, \; \forall H \in \partial_B G(y).$$
The following result plays a crucial role in the proof of our characterization theorem for Q-quadratic convergence.

Proposition 4 Assume that $G : \mathbb{R}^n \to \mathbb{R}^n$ is locally Lipschitzian and strongly semismooth at $x \in \mathbb{R}^n$, and directionally differentiable in a neighborhood of $x$. Then

$$\limsup_{h \to 0, \; H \in \partial G(x+h)} \frac{\|G(x+h) - G(x) - Hh\|}{\|h\|^2} < \infty. \tag{5}$$

Proof. Since $G$ is strongly semismooth at $x$, it is B-differentiable of degree two at $x$ [29, Lemma 2.3], i.e.,

$$\limsup_{h \to 0} \frac{\|G(x+h) - G(x) - G'(x; h)\|}{\|h\|^2} < \infty. \tag{6}$$

From (6) and the definition of strong semismoothness we directly obtain (5). □

The following result characterizes the local Q-superlinear and Q-quadratic convergence rate of sequences converging to a solution of $G(x) = 0$.

Theorem 5 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be a locally Lipschitz-continuous function in the open convex set $D \subseteq \mathbb{R}^n$. Let $\{x^k\} \subseteq D$ be any sequence converging to $x^* \in D$ with $x^k \ne x^*$ for all $k$. Then:

(a) If $G$ is semismooth and BD-regular at $x^*$, then $\{x^k\}$ converges Q-superlinearly to $x^*$ and $G(x^*) = 0$ if and only if

$$\lim_{k \to \infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} = 0, \tag{7}$$

where $H^k \in \partial_B G(x^k)$ and $d^k := x^{k+1} - x^k$.

(b) If $G$ is strongly semismooth and BD-regular at $x^*$ and directionally differentiable in a neighborhood of $x^*$, then $\{x^k\}$ converges Q-quadratically to $x^*$ and $G(x^*) = 0$ if and only if

$$\limsup_{k \to \infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} < \infty, \tag{8}$$

where again $H^k \in \partial_B G(x^k)$ and $d^k := x^{k+1} - x^k$.

Proof. Part (a) is just a restatement of Theorem 2 by Pang and Qi [27]. In order to prove part (b), first assume that (8) holds. Furthermore suppose, without loss of generality, that $x^{k+1} \ne x^k$. Let $e^k := x^k - x^*$ denote the error vector at the $k$th iterate. Note that (8) implies that

$$\frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} \to 0.$$

Because of part (a), this implies $G(x^*) = 0$ and $\{x^k\} \to x^*$ Q-superlinearly. In order to show that the rate of convergence is actually Q-quadratic, consider the following identity:

$$0 = G(x^*) = [G(x^k) + H^k d^k] - [G(x^k) - G(x^*) - H^k e^k] - H^k e^{k+1}. \tag{9}$$

In view of the semismoothness of $G$ and the BD-regularity of $x^*$, the matrices $H^k$ are nonsingular for all $k$ sufficiently large, and there exists a constant $c > 0$ such that $\|(H^k)^{-1}\| \le c$ for all these indices, see Proposition 3. Dividing (9) by $\|e^k\|^2$ therefore gives

$$\frac{\|e^{k+1}\|}{\|e^k\|^2} \le c \left( \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} \frac{\|d^k\|^2}{\|e^k\|^2} + \frac{\|G(x^k) - G(x^*) - H^k e^k\|}{\|e^k\|^2} \right). \tag{10}$$

Since we already know that $x^k \to x^*$ Q-superlinearly, we have

$$\lim_{k \to \infty} \frac{\|d^k\|}{\|e^k\|} = 1 \tag{11}$$

by Lemma 8.2.3 in [6]. Using (11), condition (8) and Proposition 4, we immediately obtain

$$\limsup_{k \to \infty} \frac{\|e^{k+1}\|}{\|e^k\|^2} < \infty,$$

i.e., $\{x^k\}$ converges Q-quadratically to $x^*$.

Based on the identity (9) and the upper semicontinuity of the generalized Jacobian, the converse direction can be shown in a very similar way; we omit the details here. □

We are now in the position to define and analyze a Levenberg-Marquardt-type method for the solution of a system of nonsmooth equations. Given a starting vector $x^0 \in \mathbb{R}^n$, let

$$x^{k+1} = x^k + d^k, \tag{12}$$

where $d^k$ is the solution of the linear system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T G(x^k), \qquad H^k \in \partial_B G(x^k), \quad \mu_k \ge 0. \tag{13}$$

This scheme is obviously formally identical to the classical Levenberg-Marquardt scheme for the solution of smooth systems of equations. We can also consider inexact versions of this scheme, which are more suited to the large-scale case. In this case the vector $d^k$ is given by the solution of the system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T G(x^k) + r^k, \qquad H^k \in \partial_B G(x^k), \quad \mu_k \ge 0, \tag{14}$$

where $r^k$ is the vector of residuals and measures how inexactly system (13) is solved. We can state the following result.
Theorem 6 Let $G : \mathbb{R}^n \to \mathbb{R}^n$ be given, and let $x^*$ be a solution of $G(x) = 0$. Suppose that $\{x^k\}$ is a sequence generated by (12) and (14) such that $\{x^k\}$ converges to $x^*$.

(a) Under the assumptions of Theorem 5 (a), if $\{\mu_k\} \to 0$ and $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$, then $\{x^k\} \to x^*$ Q-superlinearly.

(b) Under the assumptions of Theorem 5 (b), if $\mu_k = O(\|(H^k)^T G(x^k)\|)$ and $\|r^k\| = O(\|(H^k)^T G(x^k)\|^2)$, then $\{x^k\} \to x^*$ Q-quadratically.

Proof. (a) In order to prove the Q-superlinear convergence of the sequence $\{x^k\}$ to $x^*$, we only need to verify that

$$\lim_{k \to \infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|} = 0. \tag{15}$$

The assertion then follows from Theorem 5 (a). We first note that, by the upper semicontinuity of the generalized Jacobian and by Proposition 3, there exists a positive constant $\kappa$ such that

$$\|(H^k)^T\| \le \kappa, \qquad \|(H^k)^{-T}\| \le \kappa. \tag{16}$$

By (14) we have

$$G(x^k) + H^k d^k = (H^k)^{-T} (r^k - \mu_k d^k). \tag{17}$$

Hence, by (15), (17) and (16), to prove superlinear convergence we only need to show that

$$\lim_{k \to \infty} \frac{\|r^k - \mu_k d^k\|}{\|d^k\|} = 0. \tag{18}$$

In turn, since $\{\mu_k\} \to 0$, (18) is satisfied if

$$\lim_{k \to \infty} \frac{\|r^k\|}{\|d^k\|} = 0. \tag{19}$$

But from the definition of $d^k$ we have

$$d^k = \left( (H^k)^T H^k + \mu_k I \right)^{-1} \left( -(H^k)^T G(x^k) + r^k \right),$$

so that, by (16) and the assumption $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$, we have $\|d^k\| = O(\|(H^k)^T G(x^k)\|)$, from which (19) follows.

(b) In order to prove the Q-quadratic convergence of the sequence $\{x^k\}$ to $x^*$, we only need to verify that

$$\limsup_{k \to \infty} \frac{\|G(x^k) + H^k d^k\|}{\|d^k\|^2} < \infty. \tag{20}$$

The assertion then follows from Theorem 5 (b). We can reason as in the previous point to show that (20) is equivalent to

$$\limsup_{k \to \infty} \frac{\|r^k - \mu_k d^k\|}{\|d^k\|^2} < \infty. \tag{21}$$

Reasoning as in the previous point, we also have that $\|d^k\| = O(\|(H^k)^T G(x^k)\|)$. So, taking into account that by assumption $\mu_k = O(\|(H^k)^T G(x^k)\|)$ and $\|r^k\| = O(\|(H^k)^T G(x^k)\|^2)$, (21) readily follows. □

We stress that the condition $\|r^k\| = o(\|(H^k)^T G(x^k)\|)$ in Theorem 6 (a) can be replaced by the condition $\|r^k\| = o(\|G(x^k)\|)$ without destroying the Q-superlinear convergence of the sequence $\{x^k\}$. This follows from the boundedness of the sequence $\{x^k\}$ and the boundedness of the generalized Jacobian on bounded sets. Similarly, the sequence $\{x^k\}$ remains Q-quadratically convergent if we assume $\mu_k = O(\|G(x^k)\|)$ and $\|r^k\| = O(\|G(x^k)\|^2)$ in Theorem 6 (b).
Remark 7 We note that it is easy to check, following the proofs of Theorems 5 and 6, that actually the following statement holds. If $\{x^k\}$ is any sequence (no matter how generated) converging to $x^*$, then, under the assumptions of Theorem 6 (a), if $d^k$ is generated according to (14), it holds that

$$\lim_{k \to \infty} \frac{\|x^k + d^k - x^*\|}{\|x^k - x^*\|} = 0.$$

In the case $x^{k+1} = x^k + d^k$ this assertion coincides with Theorem 6 (a).
We finally give the definition of $SC^1$ function.

Definition 8 A function $f : \mathbb{R}^n \to \mathbb{R}$ is an $SC^1$ function if $f$ is continuously differentiable and its gradient is semismooth.

$SC^1$ functions can be viewed as functions which lie between $C^1$ and $C^2$ functions.
In proving the superlinear convergence rate of our method we shall use the following result, proved in [8].

Theorem 9 Let $f : \mathbb{R}^n \to \mathbb{R}$ be an $SC^1$ function and suppose that we have a sequence $\{x^k\}$ converging to a stationary point $x^*$ of $f$, as well as a sequence of directions $\{d^k\}$. Assume that $\lim_{k \to \infty} \|x^k + d^k - x^*\| / \|x^k - x^*\| = 0$ and that, for all $k$ sufficiently large and some $\delta > 0$,

$$\nabla f(x^k)^T d^k \le -\delta \|d^k\|^2. \tag{22}$$

Then, for any $\sigma \in (0, 1/2)$, there exists a $\bar{k}$ such that for every $k \ge \bar{k}$

$$f(x^k + d^k) \le f(x^k) + \sigma \nabla f(x^k)^T d^k. \tag{23}$$

3 Properties of $\Phi$ and $\Psi$

In this section we recall a few properties of the two operators $\Phi$ and $\Psi$ which are directly relevant to the convergence theory of the method to be described in the next section.

In the first theorem below we summarize the differential properties of $\Phi$ and $\Psi$ we shall need [10,20].

Theorem 10 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be given.

(a) If $F$ is a semismooth function, then $\Phi$ is also semismooth.
(b) If $F$ is twice continuously differentiable, then $\Phi$ is strongly semismooth.
(c) If $F$ is continuously differentiable, then $\Psi$ is also continuously differentiable, and its gradient at a vector $x \in \mathbb{R}^n$ is given by $\nabla \Psi(x) = H^T \Phi(x)$, where $H$ can be an arbitrary element in $\partial \Phi(x)$.
(d) If all component functions $F_i$ are $SC^1$ functions, then $\Psi$ is also an $SC^1$ function.

We remark that, although $\Phi$ is nonsmooth, $\Psi$ is continuously differentiable. This fact will be crucial in the design of our algorithm.
The following matrix classes will be used in the subsequent definitions and results.

Definition 11 A matrix $M \in \mathbb{R}^{n \times n}$ is called a

(a) $P$-matrix if $\det(M_{JJ}) > 0$ for all subsets $J \subseteq I$;
(b) $P_0$-matrix if $\det(M_{JJ}) \ge 0$ for all subsets $J \subseteq I$;
(c) semimonotone matrix if, for all $x \ge 0$, $x \ne 0$, there exists an index $i \in I$ such that $x_i > 0$ and $x_i [Mx]_i \ge 0$.

Obviously, every $P$-matrix is a $P_0$-matrix. Furthermore, it can easily be verified that positive definite matrices are $P$-matrices and positive semidefinite matrices are $P_0$-matrices. Moreover, it is known that the class of semimonotone matrices includes the class of $P_0$-matrices, see [4].
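As a small illustration (ours, not from the paper), the determinantal conditions in Definition 11 can be checked directly by enumerating principal minors; the cost is exponential in n, so this is only a toy verification for small matrices.

    import itertools
    import numpy as np

    def is_P_matrix(M, strict=True, tol=1e-12):
        # Check det(M_JJ) > 0 for all principal submatrices (P-matrix),
        # or det(M_JJ) >= 0 when strict=False (P0-matrix).
        # Enumerates all 2^n - 1 index subsets: toy scale only.
        n = M.shape[0]
        for r in range(1, n + 1):
            for J in itertools.combinations(range(n), r):
                d = np.linalg.det(M[np.ix_(J, J)])
                if (strict and d <= tol) or (not strict and d < -tol):
                    return False
        return True

For instance, a positive semidefinite matrix passes is_P_matrix(M, strict=False), consistent with the inclusions just recalled.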

For any given solution $x^* \in \mathbb{R}^n$ of the nonlinear complementarity problem NCP(F), let us introduce the index sets

$$\alpha(x^*) := \{ i \in I \mid x_i^* > 0 \},$$
$$\beta(x^*) := \{ i \in I \mid x_i^* = F_i(x^*) = 0 \},$$
$$\gamma(x^*) := \{ i \in I \mid F_i(x^*) > 0 \}.$$

If the solution $x^*$ under consideration is clear from the context, these index sets will be abbreviated by $\alpha$, $\beta$ and $\gamma$, respectively. We note that $\Phi$ is differentiable at a solution $x^*$ of NCP(F) (and hence of the system $\Phi(x) = 0$) if and only if the set $\beta(x^*)$ is empty.

Definition 12 A solution $x^* \in \mathbb{R}^n$ of NCP(F) is said to be b-regular if the principal submatrices $F'(x^*)_{\delta\delta}$ are nonsingular for all index sets $\delta$ with $\alpha \subseteq \delta \subseteq \alpha \cup \beta$.
Based on this definition, we can state the following result, which is an immediate consequence of [5, Theorem 5.4].

Theorem 13 Let $x^* \in \mathbb{R}^n$ be a b-regular solution of NCP(F) such that the Schur complement

$$F'(x^*)_{\beta\beta} - F'(x^*)_{\beta\alpha} F'(x^*)_{\alpha\alpha}^{-1} F'(x^*)_{\alpha\beta}$$

is a $P_0$-matrix. Then all elements $G \in \partial \Phi(x^*)$ are nonsingular.

Note, in particular, that the above theorem holds if the classical R-regularity condition of Robinson [31] is satisfied in $x^*$ or if $F'(x^*)$ itself is a $P$-matrix. The above theorem is important because it gives a sufficient condition for a solution of NCP(F) to be a BD-regular solution of the system $\Phi(x) = 0$, which, in light of Theorem 6, will be crucial in establishing a superlinear convergence rate of our method.
Finally, we give some conditions for a stationary point of $\Psi$ to be a solution of NCP(F). These conditions are also important because our method determines unconstrained stationary points of the natural merit function $\Psi$, so that they can be seen as conditions guaranteeing that the algorithm converges to solutions of NCP(F). Several sufficient as well as necessary-and-sufficient conditions were proven in [5]. Here we just mention two simple consequences of the results in [5], see also [13]. To this end we first recall that a vector $x \in \mathbb{R}^n$ is called feasible for NCP(F) if $x \ge 0$ and $F(x) \ge 0$.

Theorem 14 Let $F : \mathbb{R}^n \to \mathbb{R}^n$ be given and let $x^* \in \mathbb{R}^n$ be a stationary point of $\Psi$.

(a) If $F'(x^*)$ is a $P_0$-matrix, then $x^*$ is a solution of NCP(F).
(b) If $F'(x^*)$ is a semimonotone matrix and $x^*$ is feasible, then $x^*$ solves NCP(F).

Proof. Part (a) follows from Corollary 4.4 (a) in [5], whereas part (b) is a consequence of Corollary 3.9.7 in [4] and Corollary 4.4 (c) in [5]. □

As far as the authors are aware, these conditions for stationary points being solutions of NCP(F) are the weakest conditions known for any equation-based method or for any (constrained or unconstrained) minimization reformulation of NCP(F).

4 The Nonsmooth Inexact Levenberg-Marquardt Algorithm

We are now ready to describe an inexact Levenberg-Marquardt-type algorithm for the solution of NCP(F). We recall that we made the blanket assumption that $F$ is at least continuously differentiable. Roughly speaking, this algorithm can be seen as an attempt to solve the semismooth system of equations $\Phi(x) = 0$ by using the inexact Levenberg-Marquardt method (12) and (14). To ensure global convergence, a line search is performed to minimize the smooth merit function $\Psi$. If the search direction generated by (14) is not a "good" descent direction (according to test (25) below), we resort to the negative gradient of $\Psi$. We remark on the extreme simplicity of the method from both the theoretical and the computational point of view.
Algorithm.

Data: $x^0 \in \mathbb{R}^n$, $\rho > 0$, $p > 2$, $\sigma \in (0, 1/2)$, $\varepsilon \ge 0$.

Step 0: Set $k = 0$.

Step 1: (stopping criterion) If $\|\nabla \Psi(x^k)\| \le \varepsilon$, stop.

Step 2: (search direction calculation) Select an element $H^k \in \partial_B \Phi(x^k)$. Find an approximate solution $d^k \in \mathbb{R}^n$ of the system

$$((H^k)^T H^k + \mu_k I) d = -(H^k)^T \Phi(x^k), \tag{24}$$

where $\mu_k \ge 0$ is the Levenberg-Marquardt parameter. If the condition

$$\nabla \Psi(x^k)^T d^k \le -\rho \|d^k\|^p \tag{25}$$

is not satisfied, set $d^k = -\nabla \Psi(x^k)$.

Step 3: (line search) Find the smallest $i_k \in \{0, 1, 2, \ldots\}$ such that

$$\Psi(x^k + 2^{-i_k} d^k) \le \Psi(x^k) + \sigma 2^{-i_k} \nabla \Psi(x^k)^T d^k. \tag{26}$$

Set $x^{k+1} = x^k + 2^{-i_k} d^k$, $k \leftarrow k + 1$, and go to Step 1.
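The whole method is compact enough to be summarized in code. The following sketch is ours, under these assumptions: Phi and Psi are the NumPy helpers given after (2); Bsub(x) returns some element of $\partial_B \Phi(x)$ as a dense matrix (Section 5 describes a concrete, matrix-free choice); the monotone test (26) and the conjugate gradient forcing rule adopted in Section 6 are used; all function and parameter names are ours.

    import numpy as np

    def cg(matvec, b, tol, maxiter=200):
        # Unpreconditioned conjugate gradient for the symmetric positive
        # semidefinite system matvec(d) = b; stops when ||residual|| <= tol.
        d = np.zeros_like(b)
        r = b.copy()
        p = r.copy()
        rr = r @ r
        for _ in range(maxiter):
            if np.sqrt(rr) <= tol:
                break
            Ap = matvec(p)
            pAp = p @ Ap
            if pAp <= 0.0:               # guard for the semidefinite case
                break
            alpha = rr / pAp
            d += alpha * p
            r -= alpha * Ap
            rr_new = r @ r
            p = r + (rr_new / rr) * p
            rr = rr_new
        return d

    def solve_ncp(F, Bsub, x0, eps=1e-6, rho=1e-8, p=2.1, sigma=1e-4,
                  mu_rule=lambda k: 0.0, max_iter=100):
        # Inexact Levenberg-Marquardt sketch for Phi(x) = 0.
        x = x0.copy()
        for k in range(max_iter):
            Px = Phi(x, F)
            H = Bsub(x)
            grad = H.T @ Px              # gradient of Psi (Theorem 10 (c))
            if np.linalg.norm(grad) <= eps:
                break                    # stopping criterion (Step 1)
            mu = mu_rule(k)
            matvec = lambda v: H.T @ (H @ v) + mu * v
            tol = (0.1 / (k + 1)) * np.linalg.norm(grad)
            d = cg(matvec, -grad, tol)   # inexact solution of (24)
            if not np.any(d) or grad @ d > -rho * np.linalg.norm(d) ** p:
                d = -grad                # test (25) failed: steepest descent
            t, psi_x = 1.0, 0.5 * (Px @ Px)
            while Psi(x + t * d, F) > psi_x + sigma * t * (grad @ d):
                t *= 0.5                 # Armijo halving, test (26)
            x = x + t * d
        return x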

Note that the choice $\mu_k = 0$ at each step is allowed by the above algorithm. In this case, provided that $H^k$ is nonsingular, equation (24) is equivalent to the generalized Newton equation

$$H^k d = -\Phi(x^k).$$

The nonsmooth Newton method [29] for the solution of the system $\Phi(x) = 0$ is therefore a particular case of our scheme. In Step 2 we say that we want to solve the linear system (24) inexactly. By this we mean that the vector $d^k \in \mathbb{R}^n$ satisfies

$$((H^k)^T H^k + \mu_k I) d^k = -(H^k)^T \Phi(x^k) + r^k = -\nabla \Psi(x^k) + r^k \tag{27}$$

(the second equality follows by Theorem 10 (c)) for some suitable residual vector $r^k$. Note, however, that, as usual in truncated schemes, the vector $r^k$ is not fixed beforehand. Instead, an iterative solver is used to solve the linear system (24), and this method is stopped when the norm of the residual $r^k$ is smaller than a prefixed accuracy (see the next two sections).

In what follows, as usual in analyzing the behavior of algorithms, we shall assume that $\varepsilon = 0$ and that the algorithm produces an infinite sequence of points.
Theorem 15 Assume that the sequence $\{\mu_k\}$ is bounded and that the sequence of residual vectors $\{r^k\}$ satisfies the condition $\|r^k\| \le \alpha_k \|\nabla \Psi(x^k)\|$, where $\{\alpha_k\}$ is a sequence of positive numbers converging to 0 and such that $\alpha_k < 1$ for every $k$. Then each accumulation point of the sequence $\{x^k\}$ generated by the above procedure is a stationary point of $\Psi$.

Proof. First assume that there exists an infinite set $K$ such that $d^k = -\nabla \Psi(x^k)$ for all $k \in K$. Then the assertion follows immediately from well-known results (see, e.g., Proposition 1.16 in [1]). Hence we can assume, without loss of generality, that $d^k$ is always given by (27).

We now show that the sequence $\{d^k\}$ is uniformly gradient related to $\{x^k\}$ according to the definition given in [1], i.e., we show that for every convergent subsequence $\{x^k\}_K$ for which

$$\lim_{k \to \infty, \, k \in K} \nabla \Psi(x^k) \ne 0 \tag{28}$$

there holds

$$\limsup_{k \to \infty, \, k \in K} \|d^k\| < \infty, \tag{29}$$

$$\liminf_{k \to \infty, \, k \in K} |\nabla \Psi(x^k)^T d^k| > 0. \tag{30}$$

Let $x^* \in \mathbb{R}^n$ denote an accumulation point of the sequence $\{x^k\}$. Subsequencing if necessary, we assume, without loss of generality, that $x^k \to x^*$. Suppose that $x^*$ is not a stationary point of $\Psi$. We get from (27):

$$\|\nabla \Psi(x^k) - r^k\| = \|((H^k)^T H^k + \mu_k I) d^k\| \le \|(H^k)^T H^k + \mu_k I\| \, \|d^k\| \tag{31}$$

and therefore

$$\|d^k\| \ge \frac{\|\nabla \Psi(x^k) - r^k\|}{\|(H^k)^T H^k + \mu_k I\|}. \tag{32}$$

Note that the denominator in (32) is nonzero since otherwise we would have $\nabla \Psi(x^k) - r^k = 0$ by (31) which, in turn, since $\|r^k\| \le \alpha_k \|\nabla \Psi(x^k)\|$ with $\alpha_k < 1$, would be possible only if $\|\nabla \Psi(x^k)\| = 0$, so that $x^k$ would be a stationary point and the algorithm would have stopped.

Since $\{\mu_k\}$ is bounded by assumption and the generalized Jacobian is upper semicontinuous, there exists a constant $\kappa_1 > 0$ such that

$$\|(H^k)^T H^k + \mu_k I\| \le \kappa_1$$

(recall that the sequence $\{x^k\}$ is bounded since $x^k \to x^*$). Taking into account $\|r^k\| = o(\|\nabla \Psi(x^k)\|)$, we therefore get from (32)

$$\|d^k\| \ge \|\nabla \Psi(x^k) - r^k\| / \kappa_1 \ge (\|\nabla \Psi(x^k)\| - \|r^k\|) / \kappa_1 \ge \kappa_2 \|\nabla \Psi(x^k)\| \tag{33}$$

for all $k$ sufficiently large and a suitable constant $\kappa_2 > 0$.

Relation (29) now readily follows from the fact that we are assuming that the direction satisfies (25) with $p > 2$, while the gradient $\nabla \Psi(x^k)$ is bounded on the convergent sequence $\{x^k\}$. Consider (30). If it is not satisfied, there exists a subsequence $\{x^k\}_{K'}$ of $\{x^k\}_K$ for which $\lim_{k \to \infty, \, k \in K'} |\nabla \Psi(x^k)^T d^k| = 0$. This implies, by (25), that $\lim_{k \to \infty, \, k \in K'} \|d^k\| = 0$. In view of (33), this in turn implies $\lim_{k \to \infty, \, k \in K'} \|\nabla \Psi(x^k)\| = 0$, contradicting (28). The assertion of the theorem now follows from [1, Proposition 1.8]. □

Note that, under the additional assumptions of Theorem 14, we get from Theorem 15 that any accumulation point of the sequence $\{x^k\}$ is a solution of NCP(F).

The next result gives a mild sufficient condition for the whole sequence to converge to a single point.

Theorem 16 Let the assumptions of Theorem 15 hold. Let $\{x^k\}$ be any sequence generated by the algorithm. If one of the accumulation points of $\{x^k\}$, say $x^*$, is an isolated solution of NCP(F), then the entire sequence $\{x^k\}$ converges to $x^*$.

Proof. Let $K$ be a subset of $\{1, 2, \ldots\}$ such that $\{x^k\}_K \to x^*$. Since $\{\|\nabla \Psi(x^k)\|\}_K \to 0$, we have, either because $d^k = -\nabla \Psi(x^k)$ or by (25) with $p > 2$, that $\{d^k\}_K \to 0$. The thesis then follows from Theorem 2.3 in [9]. □

The following result deals with the local rate of convergence.

Theorem 17 Assume the sequence $\{r^k\}$ satisfies the assumption of Theorem 15. If one of the limit points of the sequence $\{x^k\}$, let us say $x^*$, is a BD-regular solution of the system $\Phi(x) = 0$, and if each $F_i$ is an $SC^1$ function, then

(a) The sequence $\{x^k\}$ converges to $x^*$.
(b) Eventually $d^k$ is always given by the solution of system (24) (i.e., the negative gradient is eventually never used).
(c) If $\{\mu_k\} \to 0$, eventually the full stepsize of one is always accepted, so that $x^{k+1} = x^k + d^k$.
(d) If $\{\mu_k\} \to 0$, then $x^k \to x^*$ Q-superlinearly.
(e) If $F_i \in C^2$ for each $i \in \{1, \ldots, n\}$, $\mu_k = O(\|\nabla \Psi(x^k)\|)$ and $\|r^k\| = O(\|\nabla \Psi(x^k)\|^2)$, then $x^k \to x^*$ Q-quadratically.

Proof. (a) Since $x^*$ is a BD-regular solution of $\Phi(x) = 0$ and since $\Phi$ is semismooth by Theorem 10 (a), $x^*$ is a locally unique solution of this system by Proposition 3 in [27] and therefore an isolated solution of NCP(F). Hence $x^k \to x^*$ by Theorem 16.

(b) We first show that there is a constant $\kappa_3 > 0$ such that

$$\nabla \Psi(x^k)^T d^k \le -\kappa_3 \|d^k\|^2 \tag{34}$$

for all $k$ sufficiently large. Since $x^*$ is a BD-regular solution of $\Phi(x) = 0$, $x^k \to x^*$ by part (a) and $\mu_k \ge 0$ for all $k$, the matrices $(H^k)^T H^k + \mu_k I$ are uniformly positive definite for all $k$ sufficiently large. Hence there exist constants $\kappa_5 \ge \kappa_4 > 0$ such that

$$\kappa_4 \|d^k\|^2 \le (d^k)^T \left( (H^k)^T H^k + \mu_k I \right) d^k \le \kappa_5 \|d^k\|^2. \tag{35}$$

We now obtain from (27), (33), (35), $\|r^k\| \le \alpha_k \|\nabla \Psi(x^k)\|$ and the Cauchy-Schwarz inequality:

$$\begin{aligned} \nabla \Psi(x^k)^T d^k &= -(d^k)^T \left( (H^k)^T H^k + \mu_k I \right) d^k + (r^k)^T d^k \\ &\le -\kappa_4 \|d^k\|^2 + \|r^k\| \, \|d^k\| \\ &\le -\kappa_4 \|d^k\|^2 + \alpha_k \|\nabla \Psi(x^k)\| \, \|d^k\| \\ &\le -\kappa_4 \|d^k\|^2 + (\alpha_k / \kappa_2) \|d^k\|^2. \end{aligned}$$

From this, inequality (34) follows immediately by taking into account that $\{\alpha_k\} \to 0$. Therefore, statement (b) follows from (34), $\{\|d^k\|\} \to 0$ and the fact that $p > 2$ in (25).

(c) By Theorem 10 (d), the merit function $\Psi$ is an $SC^1$ function. Statement (c) is therefore a direct consequence of (34), Remark 7 and Theorem 9.

(d) and (e) By part (c) the algorithm eventually reduces to $x^{k+1} = x^k + d^k$. The assertions then follow directly from Theorem 6, taking into account Theorem 10 (a) and (b) and the fact that $\nabla \Psi(x^k) = (H^k)^T \Phi(x^k)$ by Theorem 10 (c). □

The BD-regularity assumption in Theorem 17 is satisfied under the assumptions of Theorem 13 and therefore, in particular, under the R-regularity condition of Robinson [31]. We also stress that the rate-of-convergence results are independent of whether the limit point $x^*$ is degenerate or not.

5 Implementation Issues

In the implementation of the algorithm we assume that three routines are available: one that, given a point $x$, returns the value of $F(x)$; a second one that, given a point $x$ and a vector $v$, returns the product $F'(x)v$; and a last one that, given a point $x$ and a vector $v$, returns the product $F'(x)^T v$. We note that these are usual requirements in the solution of large-scale problems. In particular, since most large-scale problems have a sparse structure, it is generally possible to store the Jacobian of $F$ in a compact form from which the second and third routines mentioned above are readily obtained. In any case, even when the Jacobian is not explicitly known, automatic differentiation gives us a sound theoretical tool to obtain these routines at a cost which is only a small multiple of the cost of calculating $F$.
The computationally most intensive part of the algorithm is the approximate solution of system (24). We note that this system is always solvable. In fact, if $\mu_k > 0$, then the matrix $(H^k)^T H^k + \mu_k I$ is symmetric positive definite and hence system (24) is surely solvable. If $\mu_k = 0$, then the matrix $(H^k)^T H^k + \mu_k I$ reduces to $(H^k)^T H^k$, which is guaranteed to be only positive semidefinite. However, in this case, system (24) reduces to the normal equations for the system $H^k d = -\Phi(x^k)$, and is therefore solvable. We therefore chose to solve system (24) by the (unpreconditioned) conjugate gradient method, which is guaranteed to find a solution of the system in at most $n$ iterations in exact arithmetic (see, e.g., [28] for the semidefinite case). This means, in particular, that we can solve system (24) as accurately as we want, i.e., that it is always possible to make the residual vector $r^k$ as close to 0 as we like. Having chosen to solve system (24) by the conjugate gradient method, we have to show how it is possible to calculate the product of any vector $v$ with the matrix $(H^k)^T H^k + \mu_k I$. In turn, it is clear that this is possible if we are able to calculate the product of any vector $v$ by $H^k$ and by $(H^k)^T$. To this end we first have to specify which element
$H^k \in \partial_B \Phi(x^k)$ we select at the $k$th iteration. In [5] we showed that an element of $\partial_B \Phi(x)$ can be obtained in the following way. Let $\beta := \{ i : x_i^k = 0 = F_i(x^k) \}$ be the set of "degenerate indices" and define $z \in \mathbb{R}^n$ to be a vector whose components $z_i$ are 1 if $i \in \beta$ and 0 otherwise. Then the matrix $H^k$ defined by

$$H^k = A(x^k) + B(x^k) F'(x^k),$$

where $A$ and $B$ are $n \times n$ diagonal matrices whose $i$th diagonal elements are given, respectively, by

$$A_{ii}(x) = \begin{cases} \dfrac{x_i}{\|(x_i, F_i(x))\|} - 1 & \text{if } i \notin \beta, \\[2mm] \dfrac{z_i}{\|(z_i, \nabla F_i(x)^T z)\|} - 1 & \text{if } i \in \beta, \end{cases}$$

and by

$$B_{ii}(x) = \begin{cases} \dfrac{F_i(x)}{\|(x_i, F_i(x))\|} - 1 & \text{if } i \notin \beta, \\[2mm] \dfrac{\nabla F_i(x)^T z}{\|(z_i, \nabla F_i(x)^T z)\|} - 1 & \text{if } i \in \beta, \end{cases}$$
belongs to $\partial_B \Phi(x^k)$. It is now easy to see that, storing the diagonal elements of the matrices $A$ and $B$ in two vectors and using the routines that calculate the product of a vector $v$ by the matrices $F'(x)$ and $F'(x)^T$, we can also evaluate the product of $v$ by $H^k$ and by $(H^k)^T$. Two situations can be distinguished. If $\beta = \emptyset$ (i.e., if $\Phi$ is differentiable at $x^k$), then the product $H^k v$ can be calculated by performing the product $F'(x^k) v$ plus $O(n)$ arithmetic operations. If, instead, $\beta \ne \emptyset$ (i.e., if $\Phi$ is not differentiable at $x^k$), then we need an additional calculation of the product $F'(x^k) z$ (whose $i$th component is $\nabla F_i(x^k)^T z$) in order to evaluate the diagonal elements of $A(x^k)$ and $B(x^k)$. As expected, if $\Phi$ is not differentiable at $x^k$, then the calculation of $H^k v$ is more costly than in the case in which $\Phi$ is differentiable at $x^k$. Similar considerations apply to the product $(H^k)^T v$, and we omit them.
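A matrix-free sketch of this choice of $H^k$ follows (ours, not from the paper). It assumes routines F(x) and jacvec(x, v) = $F'(x)v$; the tolerance eps for detecting degenerate indices is an implementation choice, whereas the text tests for exact zeros.

    import numpy as np

    def Bsub_matvec(F, jacvec, x, eps=0.0):
        # Build the diagonals of A(x) and B(x) and return a closure
        # v -> H v for the element H = A(x) + B(x) F'(x) of the
        # B-subdifferential of Phi at x. An analogous routine based on
        # F'(x)^T v would give H^T v.
        Fx = F(x)
        beta = (np.abs(x) <= eps) & (np.abs(Fx) <= eps)  # degenerate set
        z = beta.astype(float)               # z_i = 1 iff i in beta
        a = np.empty_like(x)
        b = np.empty_like(x)
        nd = ~beta                           # nondegenerate indices
        norm_nd = np.hypot(x[nd], Fx[nd])    # ||(x_i, F_i(x))||
        a[nd] = x[nd] / norm_nd - 1.0
        b[nd] = Fx[nd] / norm_nd - 1.0
        if beta.any():                       # one extra product F'(x) z
            Jz = jacvec(x, z)
            norm_d = np.hypot(z[beta], Jz[beta])
            a[beta] = z[beta] / norm_d - 1.0
            b[beta] = Jz[beta] / norm_d - 1.0
        def matvec(v):
            # H v costs one Jacobian-vector product plus O(n) work
            return a * v + b * jacvec(x, v)
        return matvec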
We finally discuss the linesearch used in the algorithm. We use a nonmonotone linesearch, which can be viewed as an extension of (26). To motivate this variant we first recall that it has often been observed, in the field of nonlinear complementarity algorithms, that the linesearch test used to enforce global convergence can lead to very small stepsizes; in turn, this can lead to very slow convergence and even to a numerical failure of the algorithm. To circumvent this problem many heuristics have been used, see, e.g., [18,26]. We propose to substitute for the linesearch test (26) the following nonmonotone linesearch:

$$\Psi(x^k + 2^{-i_k} d^k) \le W + \sigma 2^{-i_k} \nabla \Psi(x^k)^T d^k, \tag{36}$$

where $W$ is any value satisfying

$$\Psi(x^k) \le W \le \max_{j = 0, 1, \ldots, M_k} \Psi(x^{k-j}) \tag{37}$$

and the $M_k$ are nonnegative integers bounded above for any $k$ (we suppose for simplicity that $k$ is large enough so that no negative superscripts can occur in (37)). This kind of linesearch was first proposed in [15] and since then it has proved very useful in the unconstrained minimization of smooth functions. In particular, it has already been used in the field of nonlinear complementarity problems in [5]. Adopting the same proof techniques used in [15], it is easy to see that all the results described in the previous section still hold if we substitute the linesearch (36) for (26) in the algorithm of Section 4. If at each iteration we take $W = \Psi(x^k)$, we obtain exactly the algorithm of Section 4; however, much more freedom is allowed by (36). In particular, following the most recent experience with nonmonotone minimization algorithms [32], we chose to keep $W$ fixed as long as the algorithm seems to make progress and to vary it only if, for a certain prefixed number of consecutive steps, the function values $\Psi(x^k)$ increase. The precise scheme is as follows (a small code sketch follows the list).
- Set $W = \Psi(x^0)$ at the beginning of the algorithm;
- keep the value of $W$ fixed as long as

$$\Psi(x^k) \le \min_{j = 0, 1, 2} \Psi(x^{k-j}); \tag{38}$$

- if (38) is not satisfied at iteration $k$, set $W = \Psi(x^k)$;
- in any case, reset $W$ to $\Psi(x^k)$ at least once every 20 iterations.
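A sketch of this bookkeeping (ours; the function and variable names are assumptions, and psi_hist is the list of values $\Psi(x^0), \ldots, \Psi(x^k)$):

    def update_W(W, psi_hist, k, last_reset, reset_every=20):
        # Keep W while (38) holds, i.e. Psi(x^k) is no larger than the
        # two previous values; otherwise reset W to Psi(x^k). In any
        # case reset at least once every `reset_every` iterations.
        psi_k = psi_hist[-1]
        progressing = len(psi_hist) < 3 or psi_k <= min(psi_hist[-3:-1])
        if not progressing or k - last_reset >= reset_every:
            return psi_k, k      # W := Psi(x^k); remember the reset
        return W, last_reset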

6 Numerical Results

We tested our algorithm on 17 test problems generated as suggested in [16]. Let $g(x) = 0$ be a (large-scale) differentiable system of nonlinear equations and let $x^* \in \mathbb{R}^n$ be defined by

$$x^* = (1, 0, 1, 0, \ldots)^T.$$
For all $i = 1, \ldots, n$ set

$$F_i(x) = \begin{cases} g_i(x) - g_i(x^*) & \text{if } i \text{ odd or } i > n/2, \\ g_i(x) - g_i(x^*) + 1 & \text{otherwise.} \end{cases}$$

In this way $x^*$ is a solution of the nonlinear complementarity problem NCP(F). In particular, note that $x^*$ is a degenerate solution (with $|\beta(x^*)| = n/4$), so that the system $\Phi(x) = 0$ is nondifferentiable at $x^*$. As done in [16], we used the collection of 17 large-scale problems used by Luksan in [21] and the starting points indicated there. We coded the algorithm in Fortran 77 and ran it on an IBM RISC 6000/375 machine, in double precision arithmetic.
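In code, the construction of the degenerate test problems reads as follows (a sketch, ours, using 0-based indexing; g is the user-supplied residual of the nonlinear system):

    import numpy as np

    def make_degenerate_ncp(g, n):
        # x* alternates 1, 0, 1, 0, ...; a unit shift is added to the
        # even components (1-based) in the first half of the range, so
        # that x* solves NCP(F) with |beta(x*)| = n/4 degenerate indices.
        xstar = np.zeros(n)
        xstar[::2] = 1.0                     # 1-based odd components
        gstar = g(xstar)
        idx = np.arange(n)
        shift = ((idx % 2 == 1) & (idx < n // 2)).astype(float)
        def F(x):
            return g(x) - gstar + shift
        return F, xstar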
The main stopping criterion is $\|\min(x^k, F(x^k))\| \le 10^{-5} \sqrt{n}$, but we also stopped the algorithm after 100 iterations if the former stopping criterion was not met. We used a single set of parameters for all the runs of the problems; more precisely, we set $\sigma = 10^{-4}$, $\rho = 10^{-8}$ and $p = 2.1$. The conjugate gradient algorithm was stopped when the norm of the residual became smaller than $(0.1/(k+1)) \|\nabla \Psi(x^k)\|$, but we also set a limit of 200 iterations for the conjugate gradient phase.
Finally, we turn to the choice of $\mu_k$. We first note that the classical sophisticated choices indicated, e.g., in [6], are not suitable for large-scale problems. We used a very simple, and yet seemingly effective, strategy. If in the previous iteration the quotient $\|\nabla \Psi(x^{k-1})\| / \|d^{k-1}\|$ was greater than 250 and the norm of the residual $\|\min(x^k, F(x^k))\|$ was smaller than a threshold of the form $\theta^k (0.1 \sqrt{n})$, then we set $\mu_k = 1$; otherwise we set $\mu_k = 0$. We note that the first test is a rough indicator that "something is going wrong", while the second test makes the possibility of the perturbation (i.e., $\mu_k > 0$) more and more unlikely the more the process progresses, so that the final fast convergence rate is preserved.
We ran the algorithm on the 17 test problems with dimensions $n = 100$, $n = 1000$ and $n = 10000$. The results are summarized in Table 1. For each test we report: the dimension of the problem (n), the number of iterations (it), the total number of iterations of the conjugate gradient algorithm used to solve the linear systems (minor it.), the number of function evaluations (F), the final value of the merit function $\Psi$, and the norm of the residual vector divided by $\sqrt{n}$, so as to make this stationarity measure independent of the dimension of the problem. If convergence occurred to a solution of NCP(F) different from $x^*$, this is indicated by an asterisk after the iteration number. Note that this eventuality is possible because we do not have any guarantee that $x^*$ is the only solution of the problem generated. In any case we observed that, even when the solution found differs from $x^*$, it does so only in very few components.

The overall results seem to indicate that, on this set of test problems, the algorithm is fairly robust and capable of finding a solution to the problem with a limited amount of work. We see that the algorithm is practically insensitive to the dimension of the problem, which is a typical feature of Newton-type

Table 1
Results for degenerate problems.
Prob. n it minor it. F Ψ Res/√n
100 9 139 10 .82d-9 .29d-5
1 1000 9 159 10 .31d-7 .56d-5
10000 10 166 11 .45d-8 .67d-6
100 4 6 8 .13d-9 .16d-5
2 1000 4 6 7 .20d-12 .20d-7
10000 4 6 7 .20d-11 .20d-7
100 F
3 1000 F
10000 F
100 10 82 11 .38d-9 .29d-5
4 1000 11 84 12 .31d-8 .27d-5
10000 11 81 12 .28d-9 .23d-6
100 31 1786 36 .45d-8 .93d-5
5 1000 19 1414 20 .45d-7 .90d-5
10000 21 1060 24 .46d-6 .92d-5
100 F
6 1000 F
10000 F
100 25 321 31 .22d-8 .67d-5
7 1000 26 330 29 .83d-8 .41d-7
10000 31 340 36 .41d-6 .91d-5
100 13 128 14 .83d-9 .23d-5
8 1000 15 144 16 .30d-8 .14d-5
10000 15 139 16 .46d-7 .30d-5
100 14 150 15 .41d-9 .25d-5
9 1000 14 101 15 .72d-8 .24d-5
10000 15 105 16 .91d-7 .24d-5
100 16 156 18 .47d-12 .97d-7
10 1000 10 24 11 .24d-10 .22d-6
10000 11 32 13 .26d-6 .72d-5
100 4 6 5 .26d-17 .23d-9
11 1000 4 6 5 .26d-16 .23d-8
10000 4 6 5 .26d-15 .23d-9
100 10 45 11 .16d-9 .18d-5
12 1000 10 42 11 .15d-8 .17d-5
10000 10 42 11 .15d-7 .17d-7
100 25 162 33 .70d-10 .63d-6
13 1000 22 133 30 .19d-7 .30d-5
10000 18 110 25 .83d-7 .64d-5
100 14 390 15 .16d-8 .56d-5
14 1000 14 407 15 .95d-8 .28d-5
10000 13 702 14 .11d-6 .38d-5
100 14 297 15 .60d-9 .25d-5
15 1000 14 417 15 .28d-7 .41d-5
10000 22 793 28 .54d-7 .19d-5
100 10 259 11 .81d-9 .40d-5
16 1000 100 18828 108 .32d-4 .24d-3
10000 100 18962 102 .75d-3 .12d-4
100 10 72 11 .15d-9 .17d-5
17 1000 10 70 11 .33d-8 .26d-5
10000 10 68 11 .40d-8 .90d-6
methods. Furthermore, in all cases in which convergence occurred, a fast final rate of convergence was observed, showing that the assumptions necessary to establish the superlinear rate of convergence of the method are usually met in practice. We now comment on the failures of the algorithm. The algorithm fails on Problems 3 and 6 for $n = 100$, $n = 1000$ and $n = 10000$. The reason for the failure is that in these cases convergence occurs towards stationary points of the merit function that are not solutions of NCP(F). There is not much
Table 2
Results for nondegenerate problems, n = 1000.
Prob. it minor it. F Ψ Res/√n
1 6 65 7 .21d-7 .64d-5
2 4 6 7 .40d-12 .28d-7
3 F
4 14 70 15 .39d-8 .28d-5
5 19 648 21 .30d-7 .78d-5
6 F
7 25 329 27 .36d-11 .85d-7
8 14 86 15 .38d-9 .87d-6
9 17 112 18 .63d-9 .11d-5
10 9 20 10 .15d-7 .55d-5
11 3 4 4 .13d-7 .51d-5
12 6 18 7 .83d-12 .40d-7
13 14 67 20 .72d-11 .12d-6
14 100 3191 796 .79d-1 .12d-1
15 12 136 16 .11d-7 .47d-5
16 11 141 12 .43d-9 .92d-6
17 10 61 11 .39d-9 .89d-6

we can do to avoid this (except considering global minimization algorithms for $\Psi$ or heuristics). However, it is interesting to note that in [16], where the merit function $\|\min(x, F(x))\|^2$ is used in the solution of the same problems considered here, convergence to stationary points other than the solutions of the nonlinear complementarity problems is more frequent, and furthermore divergence also occurs in some cases, which is not the case for our algorithm. This seems to confirm, at least on this set of test problems, that the merit function $\Psi$ enjoys better theoretical properties than other merit functions.
Continuing the analysis of the failures, we also see that the algorithm reached the maximum number of iterations without meeting the stopping criterion on Problem 16, for $n = 1000$ and $n = 10000$. In this case, as indicated by the final value of $\Psi$ and of the residual, the algorithm is close to a solution. We observed the following behavior. In a few iterations the algorithm is able to reach a zone close to the solution. After that, the maximum number of iterations allowed to the conjugate gradient solver is always reached. This means that the search direction is not close to the Newton direction, so that the algorithm converges very slowly to the solution.
We also performed a second set of experiments, generating complementarity problems as done before but setting

$$F_i(x) = \begin{cases} g_i(x) - g_i(x^*) & \text{if } i \text{ odd}, \\ g_i(x) - g_i(x^*) + 1 & \text{otherwise.} \end{cases}$$

The complementarity problems so obtained are nondegenerate at the solution $x^*$, and therefore $\Phi(x)$ is continuously differentiable at $x^*$. Since again there are no major differences when $n$ varies, we report in Table 2 only the results for $n = 1000$.

The only new observation we can make with respect to these results is that they are overall very similar to the previous ones, showing that the algorithm is seemingly insensitive to whether the solution towards which convergence occurs is a differentiable point of the system $\Phi(x) = 0$ or not.

Acknowledgement

We are grateful to M.A. Gomes-Ruggiero and J.M. Martínez, who very kindly provided us with the Fortran codes of the test problems.

References

[1] D.P. Bertsekas, Constrained Optimization and Lagrange Multiplier Methods (Academic Press, New York, 1982).
[2] C. Chen and O.L. Mangasarian, A class of smoothing functions for nonlinear and mixed complementarity problems, Technical Report 94-11, Computer Sciences Department, University of Wisconsin (Madison, WI, August 1994, revised February 1995). To appear in Computational Optimization and Applications.
[3] F.H. Clarke, Optimization and Nonsmooth Analysis (Wiley, New York, 1983; reprinted by SIAM, Philadelphia, 1990).
[4] R.W. Cottle, J.-S. Pang and R.E. Stone, The Linear Complementarity Problem (Academic Press, New York, 1992).
[5] T. De Luca, F. Facchinei and C. Kanzow, A semismooth equation approach to the solution of nonlinear complementarity problems, DIS Technical Report 01.95, Università di Roma "La Sapienza" (Roma, Italy, January 1995).
[6] J.E. Dennis and R.B. Schnabel, Numerical Methods for Unconstrained Optimization and Nonlinear Equations (Prentice Hall, Englewood Cliffs, New Jersey, 1983).
[7] S.P. Dirkse and M.C. Ferris, MCPLIB: A collection of nonlinear mixed complementarity problems, Technical Report 1215, Computer Sciences Department, University of Wisconsin (Madison, WI, 1994). To appear in Optimization Methods and Software.
[8] F. Facchinei, Minimization of SC^1 functions and the Maratos effect, DIS Technical Report 10.94, Università di Roma "La Sapienza" (Roma, Italy, 1994). To appear in Operations Research Letters.
[9] F. Facchinei, On the convergence to a unique point of minimization algorithms, DIS Technical Report 06.95, Università di Roma "La Sapienza" (Roma, Italy, January 1995).

[10] F. Facchinei and J. Soares, A new merit function for nonlinear complementarity problems and a related algorithm, DIS Technical Report 15.94, Università di Roma "La Sapienza" (Roma, Italy, 1994).
[11] M.C. Ferris and J.-S. Pang, Engineering and economic applications of complementarity problems, Technical Report, Computer Sciences Department, University of Wisconsin (Madison, WI, 1995).
[12] A. Fischer, A special Newton-type optimization method, Optimization 24 (1992) 269–284.
[13] A. Fischer, An NCP-function and its use for the solution of complementarity problems, to appear in D.-Z. Du, L. Qi and R. Womersley (eds.), Recent Advances in Nonsmooth Optimization (Kluwer Academic Publishers, Dordrecht, 1995).
[14] S.A. Gabriel and J.-S. Pang, An inexact NE/SQP method for solving the nonlinear complementarity problem, Computational Optimization and Applications 1 (1992) 67–91.
[15] L. Grippo, F. Lampariello and S. Lucidi, A nonmonotone line search technique for Newton's method, SIAM Journal on Numerical Analysis 23 (1986) 707–716.
[16] M.A. Gomes-Ruggiero, J.M. Martínez and S.A. Santos, Solving nonsmooth equations by means of quasi-Newton methods with globalization, to appear in D.-Z. Du, L. Qi and R. Womersley (eds.), Recent Advances in Nonsmooth Optimization (Kluwer Academic Publishers, Dordrecht, 1995).
[17] P.T. Harker and J.-S. Pang, Finite-dimensional variational inequality and nonlinear complementarity problems: a survey of theory, algorithms and applications, Mathematical Programming 48 (1990) 161–220.
[18] P.T. Harker and B. Xiao, Newton's method for the nonlinear complementarity problem: a B-differentiable equation approach, Mathematical Programming 48 (1990) 339–357.
[19] G. Isac, Complementarity Problems, Lecture Notes in Mathematics 1528 (Springer-Verlag, Berlin, 1992).
[20] H. Jiang and L. Qi, A new nonsmooth equations approach to nonlinear complementarities, Applied Mathematics Report 94/31, School of Mathematics, University of New South Wales (Sydney, Australia, October 1994).
[21] L. Luksan, Inexact trust region method for large sparse systems of nonlinear equations, Journal of Optimization Theory and Applications 81 (1994) 569–590.
[22] J.M. Martínez and L. Qi, Inexact Newton methods for solving nonsmooth equations, Applied Mathematics Report 93/9, School of Mathematics, University of New South Wales (Sydney, Australia, 1993, revised April 1994).
[23] R. Mifflin, Semismooth and semiconvex functions in constrained optimization, SIAM Journal on Control and Optimization 15 (1977) 957–972.

[24] J.-S. Pang, Inexact Newton methods for the nonlinear complementarity problem, Mathematical Programming 36 (1986) 54–71.
[25] J.-S. Pang, Complementarity problems, in: R. Horst and P.M. Pardalos (eds.), Handbook of Global Optimization (Kluwer Academic Publishers, Dordrecht, 1994) 271–338.
[26] J.-S. Pang and S.A. Gabriel, NE/SQP: A robust algorithm for the nonlinear complementarity problem, Mathematical Programming 60 (1993) 295–337.
[27] J.-S. Pang and L. Qi, Nonsmooth equations: motivation and algorithms, SIAM Journal on Optimization 3 (1993) 443–465.
[28] B.N. Pshenichny and Yu.M. Danilin, Numerical Methods in Extremal Problems (Mir Publishers, Moscow, 1978).
[29] L. Qi, A convergence analysis of some algorithms for solving nonsmooth equations, Mathematics of Operations Research 18 (1993) 227–244.
[30] L. Qi and J. Sun, A nonsmooth version of Newton's method, Mathematical Programming 58 (1993) 353–368.
[31] S.M. Robinson, Generalized equations, in: A. Bachem, M. Grötschel and B. Korte (eds.), Mathematical Programming: The State of the Art (Springer-Verlag, Berlin, 1983) 346–367.
[32] Ph.L. Toint, A non-monotone trust-region algorithm for nonlinear optimization subject to convex constraints, Report 92/94, Department of Mathematics, FUNDP (Namur, Belgium, 1994).
