
ANALYSIS OF THE MISADJUSTMENT OF BP NETWORK AND AN IMPROVED ALGORITHM

Yang Jia and Yang Dali

Beijing University of Posts and Telecommunications
P.O.Box 145, Beijing 100088, P.R.CHINA

Abstract -- This paper analyzes the misadjustment of the conventional BP network introduced by the momentum factor α and its disadvantageous effect on the network, and proposes an improved algorithm with a stochastic attenuation momentum factor α(k). Analysis and simulations show that the momentum misadjustment is a nonnegligible factor affecting the convergence precision of the BP network. Compared with the conventional BP algorithm, the improved algorithm can effectively cancel the negative effect of α on the network and improve its convergence performance.

I. INTRODUCTION

The multilayer feedforward neural network models with the error Back-Propagation (BP) algorithm have been widely researched and applied in many fields [1,2]. As a generalization of the LMS algorithm, the back-propagation algorithm uses a gradient search technique to minimize a cost function equal to the mean square difference between the desired and the actual network outputs. To reduce error bursting during learning and to speed up convergence of the algorithm, Rumelhart, Hinton and Williams suggested in 1986 adding a momentum term to the weight update expression of the BP algorithm [3]. Thus, the weight update expression of the conventional BP algorithm is written as

W_ij(k+1) = W_ij(k) + μ δ_j(k) O_j(k) + α ΔW_ij(k)    (1)

where μ is a step size that controls the convergence rate and the steady-state performance of the algorithm, O_j(k) is the input signal, and α and ΔW_ij(k) = W_ij(k) − W_ij(k−1) denote the momentum factor and the momentum, respectively. Observe that the weight update expression given by (1) corresponds to a second-order adaptive algorithm, in that two previous weight vectors are combined at each iteration to obtain the updated weight vector. A variety of experiments have shown that the momentum update is effective at suppressing error bursting in the weight space and at speeding up the smooth convergence of the algorithm.

In analyses of the properties of the BP network, the step size parameter μ is usually regarded as the main factor affecting the convergence performance of the BP algorithm, whereas the negative effect of the momentum factor α on the convergence performance is often overlooked. A comprehensive analysis of the momentum factor α was made in [4] for the momentum LMS (MLMS) adaptive algorithm, in which the factor α is a constant scalar. For an adaptive filter, the negative effect of α may not be serious in most applications. However, for a multilayer neural network with a great number of adaptive units connected to each other, the effect of α on the network deserves attention.

The purpose of this work is to clarify the negative effect of the constant factor α on the multilayer feedforward neural network with the BP algorithm, and to present an improved algorithm that overcomes this effect.

II. ANALYSIS OF THE MOMENTUM MISADJUSTMENT

For the conventional BP network with the momentum weight update given by equation (1), we may consider each neural node as a standard adaptive unit. According to the analysis of [4], the misadjustment in each neural node may be expressed as

M = M_0 / (1 − α)    (2)

where M_0 denotes the misadjustment without the momentum term, i.e. the misadjustment of the standard LMS algorithm [5]. Because α is bounded, 0 < α < 1, in applications of the BP network, it is clear from (2) that the larger α is, the larger the misadjustment is. Though a larger α contributes to speeding up convergence, the faster convergence is gained at the cost of a loss in convergence precision.

For simplicity of analysis, we consider the three-layer BP network shown in Fig. 1. The momentum misadjustment of the network can be described as noise sources added only to the nodes of the hidden layer and the output layer of the BP network. Thus, the output of a hidden layer node is expressed as

O_j^(2) = f( Σ_i W_ij^(1) x_i + N_j^(2)(α) ),  j = 1, ..., n_2    (3)

where N_j^(2)(α) denotes the noise introduced by the

0-7803-1254-6/93 $03.00 © 1993 IEEE

momentum factor. Clearly, this noise is part of the input signal of the network output layer. Accordingly, we write the outputs of the network as

y_l^(3) = f( Σ_j W_jl^(2) O_j^(2) + N_l^(3)(α) ),  l = 1, ..., n_3    (4)

Substituting (3) into (4) yields

y_l^(3) = f( Σ_j W_jl^(2) f( Σ_i W_ij^(1) x_i + N_j^(2)(α) ) + N_l^(3)(α) ),  l = 1, ..., n_3    (5)

In the above equations, the nonlinear function f(·) is the typical sigmoid function defined by

f(x) = 1 / (1 + e^(−x))    (6)

Observe that if equations (3) and (5) are expressed directly in the form of equation (6), the analysis of this question becomes extremely complex. To avoid the difficulty of analyzing the complex nonlinear sigmoid function, we use the following quadratic interpolation function P(x) to approximate the sigmoid function given by equation (6), i.e.

P(x) = 0.1,              x < −2 ln 3
P(x) = a + b x + c x²,   −2 ln 3 ≤ x ≤ 2 ln 3    (7)
P(x) = 0.9,              x > 2 ln 3

where a = 0.5, b = 0.3/ln 3, and c = 0.05/(ln 3)², the quadratic coefficient taking the sign opposite to that of x so that P(x) matches the sigmoid values at x = 0, ±ln 3 and ±2 ln 3. It may be demonstrated that the interpolation error between P(x) and f(x) satisfies |R(x)| = |f(x) − P(x)| < 10⁻¹. Consequently, we replace the sigmoid function f(x) by the interpolation function P(x) and rewrite equation (3) as

O_j^(2) = P(x_j^(2)) + g(x_j^(2), N_j^(2)(α)),  j = 1, ..., n_2    (8)

where x_j^(2) = Σ_i W_ij^(1) x_i, P(·) denotes the output of the hidden layer node without the momentum term, and g(·) is the additional output introduced by the momentum factor α. P(·) and g(·) are defined, respectively, as

P(x_j^(2)) = a + b x_j^(2) + c (x_j^(2))²
g(x_j^(2), N_j^(2)(α)) = N_j^(2)(α) ( b + c ( 2 x_j^(2) + N_j^(2)(α) ) )    (9)

where g(·) expresses the additional output introduced by the momentum factor. Clearly, the effect of g(·), as a noise source, on the network cannot be overlooked in the analysis of the BP algorithm.

It is clear from (8) that the momentum misadjustment g(·) in the hidden layer of the network enters the next layer as a noise along with the useful signal P(·). As a result, for the output layer of the network, we have

y_l^(3) = P( Σ_j W_jl^(2) O_j^(2) + N_l^(3)(α) )
        = P(x_l^(3)) + g(x_j^(2), N_j^(2)(α), x_l^(3), N_l^(3)(α))    (10)

where P(·) is the output of the network without the additional momentum term and g(·) again expresses the additional output introduced by the momentum factor.

III. AN IMPROVED ALGORITHM

We see from the above analysis that the momentum noises at the hidden and output nodes of the BP network depend on the value of the momentum factor α. Consequently, the key to canceling the momentum misadjustment noise is to cancel the effect of the momentum factor α. However, the factor α plays a contradictory role in the convergence process: it smooths convergence, and at the same time introduces momentum noise into the network. If we try to cancel the negative effect of α on the BP network simply by setting α = 0 once the algorithm has converged to a certain extent, then the smoothing role of α disappears along with it. Therefore, an improved algorithm should both cancel the momentum noise and retain the speed-up as well as the smoothing of convergence.

Based on this idea, we give an improved algorithm in which the momentum factor α(k) is designed as a stochastic variable and a function of the output error. As a result, the stochastic factor α(k) attenuates to zero as the algorithm converges, and the momentum misadjustment introduced by a constant α disappears.

The weight update expression for the improved algorithm is given by

W_ij(k+1) = W_ij(k) + μ δ_j(k) O_j(k) + α_j(k) ΔW_ij(k)    (12)

It can be demonstrated that the momentum and the gradient in the algorithm have the following orthogonal

relationship

δ_j(k) O_j^T(k) ΔW_j(k) = 0    (13)

where the superscript T denotes the vector transpose. For the mean-square-error (mse) function with a quadratic or an approximately quadratic performance surface, the output error can be expressed as

ξ(W(k)) = W^T(k) R W(k) − 2 P^T W(k) + C    (14)

where R is the correlation matrix of the input signal for each neural node, P is the cross-correlation vector of the desired response and the input signal, and C = E[d(k)]². By differentiating (14) with respect to the weight vector W, it is easy to obtain the following expression

∇ξ(W(k+1)) = ∇ξ(W(k)) + 2 R ( W(k+1) − W(k) )    (15)

From the above expressions, we have the following relation

α_j(k) = μ δ_j(k) O_j^T(k) R ΔW_j(k−1) / ( ΔW_j^T(k−1) R ΔW_j(k−1) )    (16)

Now, rewriting Eq. (15) with k replaced by k−1 yields

2 R ΔW_j(k−1) = ∇ξ(W(k)) − ∇ξ(W(k−1))    (17)

Substituting (17) into (16) and making use of the relation given by (13), we obtain α_j(k) as a function of the successive gradient estimates. By the definition of the momentum, we have the following recursion

ΔW_ij(k) = μ[ δ_j(k−1) O_j(k−1) + α_j(k−1) δ_j(k−2) O_j(k−2) + ... + α_j(k−1) α_j(k−2) ··· α_j(2) δ_j(1) O_j(1) ] + α_j(k−1) ··· α_j(1) ΔW_ij(0)    (19)

For each element of the momentum ΔW_ij(k), and for |α(k)| < 1 and μ < 1, the momentum remains bounded during adaptation.

We see from (21) that the momentum factor α(k) is a function of the gradient instead of a constant, so that the varying factor added to the current momentum can play the role of accelerating and smoothing the descent towards the minimum during learning. Note that the role of speeding up convergence disappears if α(k) becomes less than zero during iteration [4]. Therefore, to keep α(k) greater than zero at all times during adaptation, we should make a compromise between bursting compensation and speeding up convergence; that is, α(k) in (22) is taken as the magnitude of the expression in (21).

We see from (22) that α(k) is a stochastic attenuation variable greater than zero during iteration, and it satisfies the condition for speeding up convergence; its attenuation property follows from the fact that α(k) is proportional to the output error. Again, it is clear from (23) that the algorithm misadjustment has nothing to do with the momentum factor α(k). Thereby, we conclude that the momentum misadjustment noise in the BP network can be canceled when the α(k) given by (22) is used.

IV. SIMULATION

As a comparison of the two algorithms, we present a three-layer BP network that performs the XOR function. The learning curves shown in Fig. 3a and Fig. 3b illustrate that the improved algorithm, marked by II, has higher convergence precision than the conventional BP algorithm with α = 0.6 and α = 0.7, marked by I. The corresponding simulation results are given in Table 1.
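The behavior of an error-dependent, attenuating momentum factor can be illustrated on a single adaptive unit. The sketch below is a minimal scalar analogue, not the paper's network code: the function names, the data model y = 2x + noise, and the simplified α(k) (the magnitude of the scalar counterpart of the quotient in (16), in which the scalar R cancels) are all assumptions made for illustration.

```python
import random

def train(n_iter=4000, mode="constant", alpha_const=0.7, mu=0.05, seed=1):
    """LMS with a momentum term: w(k+1) = w(k) + mu*e(k)*x(k) + a(k)*dw(k)."""
    rng = random.Random(seed)
    w_true = 2.0          # unknown parameter to identify
    w, dw = 0.0, 0.0      # weight estimate and momentum (previous change)
    for _ in range(n_iter):
        x = rng.uniform(-1.0, 1.0)            # input sample
        d = w_true * x + rng.gauss(0.0, 0.1)  # noisy desired response
        e = d - w * x                         # output error
        if mode == "constant":
            a = alpha_const                   # conventional: fixed alpha
        else:
            # stochastic attenuation factor: proportional to the current
            # error-gradient term, so it decays toward zero as the error
            # shrinks; kept positive and below 1 (0 < a(k) < 1).
            a = min(0.99, abs(mu * e * x / dw)) if dw != 0.0 else 0.0
        step = mu * e * x + a * dw
        w, dw = w + step, step
    return w

w_const = train(mode="constant")   # constant momentum factor
w_adapt = train(mode="adaptive")   # attenuating momentum factor
print(round(w_const, 2), round(w_adapt, 2))
```

Both variants converge toward w_true; the adaptive factor shrinks with the error, so the momentum-induced steady-state noise fades instead of persisting as it does with a constant α.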

Table 1. Outputs and squared errors of the conventional BP algorithm (α = 0.7) and the improved algorithm over the final iterations (4981-5000), for the XOR inputs x1, x2 and the desired response d(k). (Tabular values are garbled in the source.)

Fig.1 The effect of the momentum misadjustment on the network for the conventional BP algorithm.

Fig.2 The improved algorithm to cancel the momentum misadjustment.
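The quality of the quadratic interpolation (7) of the sigmoid can be checked numerically. The sketch below is illustrative, not from the paper: it assumes the symmetric form P(x) = a + b·x − c·x·|x| on [−2 ln 3, 2 ln 3], where the quadratic term opposing the sign of x is an assumption made so that the printed coefficients a = 0.5, b = 0.3/ln 3, c = 0.05/(ln 3)² match the sigmoid at x = 0, ±ln 3 and ±2 ln 3; the grid is also an assumption.

```python
import math

LN3 = math.log(3.0)
A, B, C = 0.5, 0.3 / LN3, 0.05 / LN3**2   # coefficients from Section II

def f(x):
    """Sigmoid nonlinearity of equation (6)."""
    return 1.0 / (1.0 + math.exp(-x))

def P(x):
    """Quadratic interpolation of the sigmoid, equation (7) (assumed form)."""
    if x < -2 * LN3:
        return 0.1
    if x > 2 * LN3:
        return 0.9
    return A + B * x - C * x * abs(x)     # quadratic term opposes sign of x

# maximum interpolation error |R(x)| = |f(x) - P(x)| over a wide grid
grid = [-8.0 + 0.01 * i for i in range(1601)]
max_err = max(abs(f(x) - P(x)) for x in grid)
max_err_inside = max(abs(f(x) - P(x)) for x in grid if abs(x) <= 2 * LN3)
print(round(max_err, 3), round(max_err_inside, 4))
```

On the interpolation interval the error stays below 10⁻², while the overall bound of 10⁻¹ is set by the constant tails, where the sigmoid approaches 0 and 1 but P(x) stays at 0.1 and 0.9.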

Fig.3a The learning curves of the two algorithms with μ = 0.1 (with α = 0.6 for the conventional BP algorithm).

Fig.3b The learning curves of the two algorithms with μ = 0.1 (with α = 0.7 for the conventional BP algorithm).

REFERENCES

[1] R. P. Lippmann, "An introduction to computing with neural nets," IEEE ASSP Mag., Vol. 4, pp. 4-22, Apr. 1987.
[2] B. Widrow and M. A. Lehr, "30 years of adaptive neural networks: perceptron, madaline, and backpropagation," Proc. IEEE, Vol. 78, No. 9, pp. 1415-1442, Sept. 1990.
[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning internal representations by error propagation," in Parallel Distributed Processing, Vol. 1, pp. 318-362, Cambridge, MA: M.I.T. Press, 1986.
[4] S. Roy and J. J. Shynk, "Analysis of the momentum LMS algorithm," IEEE Trans. Acoust., Speech, Signal Processing, Vol. ASSP-38, pp. 2088-2098, Dec. 1990.
[5] B. Widrow, J. M. McCool, M. G. Larimore, and C. R. Johnson, Jr., "Stationary and nonstationary learning characteristics of the LMS adaptive filter," Proc. IEEE, Vol. 64, pp. 1151-1162, 1976.
