A Note on Stability of Analog Neural Networks with Time Delays

Y. J. Cao and Q. H. Wu

Manuscript received March 15, 1995; revised April 8, 1996. The authors are with the Department of Electrical Engineering and Electronics, University of Liverpool, Liverpool L69 3BX, U.K.
Abstract-This note presents a generalized sufficient condition which guarantees stability of analog neural networks with time delays. The condition is derived using a Lyapunov functional, and the stability criterion is stated as follows: the equilibrium of an analog neural network with delays is globally asymptotically stable if the product of the norm of the connection matrix and the maximum neuronal gain is less than one.
I. INTRODUCTION
Theoretical study of neural dynamics and hardware implementation of artificial neural networks have advanced rapidly in recent years [1]-[9]. In particular, with advances in very large scale integration (VLSI) technology, electronic implementations of analog neural networks have laid a path leading to neural computers. However, many problems, such as switching delays and integration and communication delays, have arisen in hardware implementations; they deteriorate dynamic performance and lead to instability of hardware neural networks. Study of neural dynamics that takes these problems into account is therefore important for manufacturing high-quality microelectronic neural networks. Time delay greatly degrades the stability of dynamic systems [10]-[12], and it is a key cause of instability in hardware neural networks. The problem is introduced as follows.

The dynamics equations for analog neural networks with time delays can be described as follows [1], [4], [8]:

    C_i \frac{du_i(t)}{dt} = -\frac{u_i(t)}{R_i} + \sum_{j=1}^{N} T_{ij} f_j(u_j(t - \tau_j)),  i = 1, 2, ..., N    (1)
where the variable u_i(t) represents the voltage on the input of the ith neuron. Each neuron is characterized by an input capacitance C_i, a time delay \tau_j, and a transfer function f_j. The element T_{ij} of the connection matrix has the value 1/R_{ij} when the noninverting output of the jth neuron is connected to the input of the ith neuron through a resistance R_{ij}, and the value -1/R_{ij} when the inverting output of the jth neuron is connected to the input of the ith neuron through a resistance R_{ij}. The parallel resistance at the input of each neuron is defined as R_i = (\sum_j |T_{ij}|)^{-1}.
When the connection matrix T = {T_{ij}} is symmetric, it is well known that system (1) is a convergent gradient dynamics if \tau_j = 0, j = 1, 2, ..., N. This has been the basis for applications of the model to associative memory and optimization problems. However, when T is symmetric and the gains (defined as the slope of f_i(u) at u = 0) are sufficiently high, (1) is not necessarily convergent if time delays exist. Divergence may happen even if the delays \tau_j are very small. In particular, even if the delays are all the same (\tau_j = \tau) across the network, the dynamics of (1) may not converge, as shown by Marcus and Westervelt [1]. An extensive analysis of the effect of one common delay on the stability of (1) has been conducted in [1], especially with consideration of different network architectures. In [4], different delay values are allowed in the network, and the network dynamics, in particular oscillations, are considered and analyzed. The effect of the time delays on the network's learning properties has also been studied in [4].

Marcus and Westervelt have investigated the case in which C_i = C, \tau_j = \tau, and T_{ij} is symmetric in (1). By rescaling time, delay, and T_{ij}, the new variables t' = t/(RC), \tau' = \tau/(RC), and J_{ij} = R T_{ij} are obtained. Neglecting the primes without loss of generality, linearizing f_j(u_j(t - \tau)) around the equilibrium gives

    \frac{du_i}{dt} = -u_i + \sum_{j=1}^{N} \beta_j J_{ij} u_j(t - \tau)    (2)

where \beta_j is the gain of the jth neuron. With identical gains (\beta_j = \beta), it is convenient to represent the linearized form of the N time delay equations using amplitudes x_i (i = 1, 2, ..., N) along the eigenvectors of the connection matrix J_{ij}, which gives

    \frac{dx_i}{dt} = -x_i + \beta \lambda_i x_i(t - \tau)    (3)

where \lambda_i is the ith eigenvalue of matrix J. Denoting \lambda_min and \lambda_max as the minimal and maximal eigenvalues of matrix J, respectively, the following results can be obtained by using the characteristic equations.

1) If \beta\lambda_max < 1 and \beta\lambda_min > -(\omega^2 + 1)^{1/2}, where \omega = -\tan(\omega\tau) and \pi/2 < \omega\tau < \pi, the origin is stable.
2) If \beta\lambda_max > 1 or \beta\lambda_min < -(\omega^2 + 1)^{1/2}, with \omega = -\tan(\omega\tau) and \pi/2 < \omega\tau < \pi, the origin is unstable; \beta\lambda_max = 1 is a pitchfork bifurcation, whereas \beta\lambda_min = -(\omega^2 + 1)^{1/2}, \omega = -\tan(\omega\tau), \pi/2 < \omega\tau < \pi, is a Hopf bifurcation.
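These boundaries can be traced to the characteristic equation of the scalar mode (3); the following worked steps (reconstructed here from the analysis in [1], not reproduced from the note) show where the tangent condition comes from.

```latex
% Seeking solutions x_i(t) \propto e^{st} of (3) gives the characteristic equation
s = -1 + \beta \lambda_i \, e^{-s\tau}
% A real root crosses zero (s = 0) when \beta\lambda_i = 1: the pitchfork boundary.
% A complex pair crosses the axis (s = \pm i\omega, \omega > 0) when
i\omega + 1 = \beta \lambda_i \, e^{-i\omega\tau}
% For \beta\lambda_i < 0, taking modulus and phase yields the Hopf boundary:
\lvert \beta \lambda_i \rvert = (\omega^2 + 1)^{1/2},
\qquad \omega = -\tan(\omega\tau), \quad \tfrac{\pi}{2} < \omega\tau < \pi
```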
In this note, a Lyapunov functional [13] is employed to investigate the stability of the continuous Hopfield neural network with time delays. A generalized sufficient condition that guarantees stability of analog neural networks with delays is presented. The stability criterion can be described as follows: the equilibrium of an analog neural network with delays is globally asymptotically stable provided that the product of the norm of the connection matrix and the maximum neuronal gain is less than one. This criterion is an extension of the results presented in [1].

II. CONDITION FOR STABILITY OF NEURAL NETWORKS WITH DELAYS

Consider the following autonomous time delay equation:

    \dot{x}(t) = f(x_t)    (4)

where f: C \to R^n is completely continuous and solutions of (4) depend continuously on the initial data. We denote by x(\phi) the solution through (0, \phi), \phi \in C, where C denotes C([-\tau, 0], R^n). If V: C \to R is a continuous functional taken as a Lyapunov functional, we define the derivative of V along the solution of (4) as follows:

    \dot{V}(\phi) = \limsup_{h \to 0^+} \frac{1}{h} [V(x_h(\phi)) - V(\phi)].    (5)

Lemma 1: Suppose V: C \to R is continuous and there exist nonnegative functions a(r) and b(r) such that a(r) \to \infty as r \to \infty and

    a(|\phi(0)|) \le V(\phi),    \dot{V}(\phi) \le -b(|\phi(0)|)

then the solution x = 0 of (4) is stable and every solution is bounded. If, in addition, b(r) is positive definite, then every solution of (4) approaches zero as t \to \infty.
Based on the above lemma, let us consider the continuous Hopfield neural network with time delays. Assuming C_i = C and R_i = R, (1) can be rewritten as

    C \frac{du_i(t)}{dt} = -\frac{u_i(t)}{R} + \sum_{j=1}^{N} T_{ij} f_j(u_j(t - \tau_j)).    (6)

Using the transformation

    t' = \frac{t}{RC},  \tau_j' = \frac{\tau_j}{RC},  J_{ij} = R T_{ij}    (7)

noting that R \sum_j |T_{ij}| = 1, and neglecting the primes, (1) becomes

    \frac{du_i}{dt} = -u_i + \sum_{j=1}^{N} J_{ij} f_j(u_j(t - \tau_j)).    (8)

In this case, multiple time delays are considered in the neural network. According to the compatibility of norms of vectors and matrices, we denote \|x\| = \|x\|_2 = (\sum_{i=1}^{N} |x_i|^2)^{1/2}, where x is a vector, and the induced norm of a matrix A by \|A\|_2 = [\lambda_max(A^T A)]^{1/2}. A maximum neuronal gain, \beta = max{\beta_1, \beta_2, ..., \beta_N}, is defined. Using the Lyapunov functional defined in Lemma 1, the stability criterion can be obtained as follows.

Theorem: If \beta \|J\|_2 < 1, then the equilibrium of system (8) is unique and asymptotically stable.
The proof is given as follows.

1) The equilibrium of system (8) is unique. Suppose system (8) has a nonzero equilibrium X^0 = (X_1^0, X_2^0, ..., X_N^0)^T; then X_i^0 = \sum_{j=1}^{N} J_{ij} f_j(X_j^0), i = 1, 2, ..., N. Let Y^0 = (Y_1^0, Y_2^0, ..., Y_N^0)^T with Y_i^0 = f_i(X_i^0), i = 1, 2, ..., N; then X^0 = J Y^0. Since X^{0T} X^0 = X^{0T} J Y^0, we have \|X^0\|^2 \le \|X^0\| \|J\|_2 \|Y^0\|, and |Y_i^0| = |f_i(X_i^0)| \le \beta_i |X_i^0| \le \beta |X_i^0|; therefore

    \|Y^0\| \le \beta \|X^0\|,    \|X^0\|^2 \le \beta \|J\|_2 \|X^0\|^2

which results in \beta \|J\|_2 \ge 1, contradicting the assumption. Thus \|X^0\| = 0, X^0 = 0, and the origin is the unique equilibrium of system (8).

2) The equilibrium of system (8) is asymptotically stable. Let \phi = (\phi_1, \phi_2, ..., \phi_N)^T, a(r) = r^2, and let the V functional be

    V(u_t) = \sum_{i=1}^{N} u_i^2(t) + \sum_{i=1}^{N} \int_{t-\tau_i}^{t} u_i^2(s) ds.    (9)

Then a(r) tends to +\infty as r \to \infty, and obviously a(\|\phi(0)\|) \le V(\phi). Differentiating the V functional along the solutions of (8), we have

    \frac{dV(u_t)}{dt} = -(\|u(t)\|^2 + \|u(t,\tau)\|^2) + 2 \sum_{i=1}^{N} \sum_{j=1}^{N} u_i(t) J_{ij} f_j(u_j(t - \tau_j)).    (10)

In (10), let \eta = (f_1(u_1(t-\tau_1)), f_2(u_2(t-\tau_2)), ..., f_N(u_N(t-\tau_N)))^T and u(t) = (u_1(t), u_2(t), ..., u_N(t))^T; then the second term on the right-hand side of (10) becomes 2 u^T(t) J \eta. The neural network (8) will be unconditionally stable if u^T(t) J \eta is negative. Suppose it is positive; linearizing f_i based on the neuronal gain \beta_i and using \beta instead of \beta_i, we have

    \|\eta\| \le \beta \|u(t,\tau)\|.    (11)

Furthermore, based on the Cauchy inequality, we have

    u^T(t) J \eta \le \beta \|u(t)\| \|J\|_2 \|u(t,\tau)\|    (12)

where \|u(t,\tau)\| = (\sum_{i=1}^{N} u_i^2(t - \tau_i))^{1/2}. Therefore

    \frac{dV(u_t)}{dt} \le -(\|u(t)\|^2 + \|u(t,\tau)\|^2) + 2 \beta \|u(t)\| \|J\|_2 \|u(t,\tau)\|.    (13)

Noting the second term on the right-hand side of (13) and using the arithmetic-geometric inequality, we have

    2 \|u(t)\| \|u(t,\tau)\| \le \|u(t)\|^2 + \|u(t,\tau)\|^2    (14)

so that

    \frac{dV(u_t)}{dt} \le -(1 - \beta \|J\|_2)(\|u(t)\|^2 + \|u(t,\tau)\|^2) \le -(1 - \beta \|J\|_2) \|u(t)\|^2.

Defining b(r) as (1 - \beta \|J\|_2) r^2, according to the lemma, the equilibrium of system (8) is asymptotically stable.

Corollary: If the matrix J is symmetric, \lambda_max denotes the maximum absolute eigenvalue of matrix J, and \beta |\lambda_max| < 1, then the equilibrium of system (8) is unique and asymptotically stable.

An example is given as follows to show the above corollary. Let

    J = [  0    -1/2  -1/2
          -1/2   0    -1/2
          -1/2  -1/2   0   ].

The eigenvalues of matrix J are \lambda_{1,2} = 1/2 and \lambda_3 = -1. According to the theorem, if \beta < 1, then the equilibrium of system (8) is unique and asymptotically stable with respect to arbitrary delays.
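A quick numerical check of the example (our sketch; the -1/2 entries follow the reading of the printed matrix that is consistent with the stated eigenvalues and the condition \beta < 1):

```python
import numpy as np

# For a symmetric J, ||J||_2 equals the largest absolute eigenvalue, so the
# theorem's condition beta*||J||_2 < 1 reduces to beta * |lambda|_max < 1.
J = np.array([[0.0, -0.5, -0.5],
              [-0.5, 0.0, -0.5],
              [-0.5, -0.5, 0.0]])
print(np.linalg.eigvalsh(J))   # [-1.   0.5  0.5]: lambda_{1,2} = 1/2, lambda_3 = -1
print(np.linalg.norm(J, 2))    # 1.0, so the criterion is beta < 1
```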
III. CONCLUSION

This note presents a generalized sufficient condition which guarantees stability of analog neural networks with time delays. The stability criterion can be described as follows: the equilibrium of an analog neural network with time delays is globally asymptotically stable provided that the product of the norm of the connection matrix and the maximum neuronal gain is less than one. The condition provides a handy assessment of the stability of neural networks with time delays and can be used to design stable analog neural networks in practical applications.

REFERENCES

[1] C. M. Marcus and R. M. Westervelt, "Stability of analog neural networks with delay," Phys. Rev. A, vol. 39, no. 2, pp. 347-359, 1989.
[2] J. S. Denker, "Neural network for computing," in Proc. Conf. Neural Networks for Comput., Snowbird, UT, 1986.
[3] C. M. Marcus, F. R. Waugh, and R. M. Westervelt, "Nonlinear dynamics and stability of analog neural networks," Physica D, vol. 51, pp. 234-247, 1991.
[4] P. Baldi and A. F. Atiya, "How delays affect neural dynamics and learning," IEEE Trans. Neural Networks, vol. 5, pp. 612-621, July 1994.
[5] L. Wang, E. E. Pichler, and J. Ross, "Oscillations and chaos in neural networks: An exactly solvable model," Proc. Nat. Academy Sci. USA, vol. 87, pp. 9467-9471, Dec. 1990.
[6] J. J. Hopfield, "Neural networks and physical systems with emergent collective computational abilities," Proc. Nat. Academy Sci. USA, vol. 79, pp. 2554-2558, 1982.
[7] J. J. Hopfield and D. W. Tank, "Neural computation of decisions in optimization problems," Biol. Cybern., vol. 52, pp. 141-152, 1985.
[8] J. J. Hopfield, "Neurons with graded response have collective computational properties like those of two-state neurons," Proc. Nat. Academy Sci. USA, vol. 81, no. 10, pp. 3088-3092, 1984.
[9] U. an der Heiden, Analysis of Neural Networks. New York: Springer-Verlag, 1980.
[10] J. K. Hale, "Nonlinear oscillations in equations with delays," in Nonlinear Oscillations in Biology, Lectures in Applied Mathematics, vol. 17, F. C. Hoppensteadt, Ed. Providence, RI: Amer. Math. Soc., 1979.
[11] S. S. Wang, B. S. Chen, and T. P. Lin, "Robust stability of uncertain time-delay systems," Int. J. Contr., vol. 46, no. 3, pp. 963-976, 1987.
[12] E. Niebur, H. G. Schuster, and D. M. Kammen, "Collective frequencies and metastability in networks of limit-cycle oscillators with time delay," Phys. Rev. Lett., vol. 67, no. 20, pp. 2753-2756, 1991.
[13] J. K. Hale, Introduction to Functional Differential Equations. New York: Springer-Verlag, 1993.
Ultimate Performance of QEM Classifiers

Pierre Comon and Georges Bienvenu

Manuscript submitted June 8, 1995; revised July 29, 1996. The authors are with Thomson Sintra, F-06903 Sophia-Antipolis Cedex, France.

Abstract-Supervised learning of classifiers often resorts to the minimization of a quadratic error, even if this criterion is more especially matched to nonlinear regression problems. It is shown that the mapping built by quadratic error minimization (QEM) tends to output the Bayesian discriminating rules even with nonuniform losses, provided the desired responses are chosen accordingly. This property is, for instance, shared by the multilayer perceptron (MLP). It is shown that their ultimate performance can be assessed with finite learning sets by establishing links with kernel estimators of density.

I. INTRODUCTION
The classification problem consists of building a mapping \hat{\Phi} from a set of patterns (observations), E, to a set of classes. In practice, \hat{\Phi} often maps E to a set of decision variables, F, instead. In classification problems, the set \hat{\Phi}(E) is finite (and can be indexed by an integer i) and contains as many elements as classes. Denote by y^i the variable encoding in F the ith class, \omega_i. With this formulation, any pattern x in E is wished to be associated with a variable y^i in \hat{\Phi}(E) \subset F.

In the context of supervised classification, a set of examples A(N) = {(x(n), y'(n)), 1 \le n \le N} is given, so that the mapping \hat{\Phi} is apparently known at a finite number of points. This set of input-output pairs is subsequently referred to as the learning set. It is assumed throughout this paper that patterns are real valued and of dimension d, that is, E = R^d.

Next, let \Phi(W, \cdot) be a mapping parameterized by a set of weights, W, that associates any vector x of E to an output vector y = \Phi(W, x) in F, from which the decision will be made; \Phi(W, \cdot) is the estimate of \hat{\Phi}.

Of course, regardless of the algorithm that will be used for this purpose, learning requires the existence of a link (generally of statistical nature) between the data present in the learning set and the data to be classified [11], [13]. In the Bayesian context, it is assumed that any vector x of a given class \omega_k that may be observed is drawn from a fixed (but a priori unknown) conditional density, p(x | \omega_k). In addition, the occurrence of any class \omega_k has a constant probability denoted P_k. These notations will be subsequently assumed. The Bayesian approach is known to be able to detect new classes, but this will not be debated in the present letter. Also note that the true classes are not assumed to be disjoint, so that the ideal classifier may have a nonzero misclassification rate (it does not bear overfitting).

In the classification context, the quadratic error minimization (QEM) criterion consists of minimizing over the learning set the gap between desired responses and the outputs of the parameterized mapping, \Phi(W, \cdot):

    \epsilon(W) = \sum_{n=1}^{N} \| y'(n) - \Phi(W, x(n)) \|^2.    (1)

The output space is assumed to be provided with a norm, and it is assumed throughout that F = R^K. Many neural networks dedicated to classification proceed this way, and the numerous algorithms proposed in the literature actually aim at reaching the same goal.
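As a concrete illustration of criterion (1), here is a minimal sketch of ours; the linear map standing in for \Phi(W, \cdot), the one-hot class encoding, and the random data are assumptions for illustration, not from the letter.

```python
import numpy as np

# Sketch of the QEM criterion (1): the empirical quadratic gap between
# desired responses y'(n) and network outputs Phi(W, x(n)) over A(N).
def qem_error(W, X, Y, phi):
    """Sum of squared output errors over the learning set."""
    return sum(np.sum((y - phi(W, x)) ** 2) for x, y in zip(X, Y))

# Illustrative setup: a linear map as Phi(W, x) = W @ x, K = 2, d = 3.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))            # 10 patterns in E = R^3
labels = rng.integers(0, 2, size=10)
Y = np.eye(2)[labels]                   # y^i encodes the ith class in F = R^K
W = rng.normal(size=(2, 3))
print(qem_error(W, X, Y, lambda W, x: W @ x))
```

Minimizing this gap over W is what, e.g., backpropagation does for the MLP; the letter's claims concern the minimizer itself, independently of the algorithm used to reach it.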
The matter presented in this letter has already been published in a French conference [5]. At the same time, results related to the asymptotical performance of the multilayer perceptron (MLP) have been independently published in this journal [10]. One can also note that, historically, the asymptotical performance of the MLP has been derived earlier [1], but the proof relied heavily on the numerical algorithm utilized. It has been established in [9] that probabilities of misclassification are minimized when data samples are infinite and when losses are uniform. The scope of the paper is to show that similar results hold true for nonuniform losses, and for finite databases when noisy replicates are fed infinitely many times into the network. The statements presented are valid for general QEM classifiers independently of the exact form of the learning algorithm.

II. NOTATION

Assuming the existence of the above-mentioned statistical links, the Bayesian solution minimizes a risk function, corresponding to probabilities of misclassification weighted by losses. More precisely, the risk takes the form [6]

    R = \sum_{i=1}^{K} \sum_{j=1}^{K} \kappa(i,j) P_i \int_{\Omega_j} p(x | \omega_i) dx    (2)

where \kappa(i,j) denotes the loss associated with the classification in \omega_j of a member of class \omega_i, \Omega_j is the domain in which patterns are assigned class \omega_j, and K is the number of classes. In practice it is not very useful to assign a nonzero loss to patterns correctly classified. Therefore it can be set \kappa(i,i) = 0, and the minimization of (2) then simplifies. In this case a vector x will be assigned the class \omega_{j(x)} which minimizes the expression B_k(x) over index k:

    j(x) = ArgMin_k B_k(x)    (3)

    B_k(x) = \sum_{1 \le i \le K, i \ne k} \kappa(i,k) P_i p(x | \omega_i).    (4)

For instance, in the case of uniform losses, \kappa(i,j) = 1 - \delta_{ij}, and the minimization of B_k(x) is equivalent to the maximization of

    b_k(x) = P_k p(x | \omega_k).    (5)

The Bayesian discriminating rule is generally better known in this latter form. See, for instance, [9] where Richard and Lippmann discuss this case in detail.
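The rule (3)-(4) is straightforward to implement once the class densities are known. The following sketch of ours makes the effect of nonuniform losses concrete; the two Gaussian class densities, the priors, and the loss table are illustrative assumptions, not from the letter.

```python
import numpy as np
from scipy.stats import norm

# Bayesian rule (3)-(4) with nonuniform losses: assign x to the class j(x)
# minimizing B_k(x) = sum_{i != k} kappa(i,k) P_i p(x | w_i).
P = np.array([0.6, 0.4])                       # class priors P_i
kappa = np.array([[0.0, 1.0],                  # kappa(i,j), with kappa(i,i) = 0
                  [5.0, 0.0]])                 # missing class 2 is penalized more
pdfs = [norm(-1.0, 1.0).pdf, norm(1.0, 1.0).pdf]   # conditional densities p(x | w_i)

def bayes_class(x):
    B = [sum(kappa[i, k] * P[i] * pdfs[i](x)
             for i in range(len(P)) if i != k)
         for k in range(len(P))]
    return int(np.argmin(B))                   # eq. (3), 0-based class index

print([bayes_class(x) for x in (-2.0, 0.0, 2.0)])
```

With uniform losses (kappa(i,j) = 1 - delta_ij) the same routine reduces to maximizing b_k(x) = P_k p(x | w_k), i.e., the familiar form (5); the nonuniform loss table shifts the decision boundary toward the cheaper class.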
In practice, a finite learning set A(N)