New Neuro-Fuzzy Systems and Their Applications
Bogdan M. Wilamowski
University of Wyoming
Department of Electrical Engineering
Laramie WY 82071
wilam@uwyo.edu
Abstract - Computational intelligence combines neural networks, fuzzy systems, and evolutionary computing. Neural networks and fuzzy systems have already proved their usefulness and have been found useful in many practical applications. We are at the beginning of the third technological revolution. Neural networks are now becoming capable of replacing highly skilled people with all their experience.

The concept of artificial neural networks is presented, underlining their unique features and limitations. A review and comparison of various supervised and unsupervised learning algorithms follows. Several special, easy to train, architectures are shown. The neural network presentation is illustrated with practical applications such as speaker identification, sound recognition of various equipment as a diagnosis tool, written character recognition, data compression using pulse coupled neural networks, time series prediction, etc.

In the later part of the presentation the concept of fuzzy systems, including the conventional Zadeh approach and the Takagi-Sugeno architecture, is presented. The basic building blocks of fuzzy systems are discussed. Comparisons of fuzzy and neural systems are given and illustrated with several applications. The fuzzy system presentation is concluded with a description of the fabricated VLSI fuzzy chip.

I. INTRODUCTION

Fascination with artificial neural networks started when McCulloch and Pitts in 1943 developed their model of an elementary computing neuron and when Hebb in 1949 introduced his learning rules. A decade later Rosenblatt introduced the perceptron concept. In the early sixties Widrow and Hoff developed their supervised training algorithm for adaptive elements.

II. NEURON

A biological neuron has a complicated structure which receives trains of pulses on hundreds of excitatory and inhibitory inputs. Those incoming pulses are summed and averaged with different weights during the time period of latent summation. If the summed value is higher than a threshold, then the neuron generates a pulse which is sent to neighboring neurons. If the value of the summed weighted inputs is higher, the neuron generates pulses more frequently. The above simplified description of the neuron action leads to a very complex neuron model which is not practical.

McCulloch and Pitts in 1943 showed that even with a very simple neuron model it is possible to build logic and memory circuits. The McCulloch-Pitts neuron model assumes that incoming and outgoing signals may have only binary values 0 and 1. If incoming signals summed through positive or negative weights have a value larger than threshold T, then the neuron output is set to 1. Otherwise it is set to 0. Examples of McCulloch-Pitts neurons realizing OR, AND, NOT, and MEMORY operations are shown in Fig. 1. Note that the structures of the OR and AND gates can be identical and only the threshold is different. These simple neurons, known also as perceptrons, are usually more powerful than the typical logic gates used in computers.
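The threshold behavior described above can be sketched in a few lines. The weights and thresholds below are illustrative choices, not necessarily those of the paper's Fig. 1; they show how OR and AND share identical weights and differ only in the threshold.

```python
# McCulloch-Pitts neuron: binary inputs, weighted sum compared to a threshold T.
def mcculloch_pitts(inputs, weights, threshold):
    """Return 1 if the weighted input sum reaches the threshold, else 0."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return 1 if net >= threshold else 0

# OR and AND gates share the same unit weights; only the threshold differs.
def gate_or(a, b):
    return mcculloch_pitts([a, b], [1, 1], threshold=1)

def gate_and(a, b):
    return mcculloch_pitts([a, b], [1, 1], threshold=2)

def gate_not(a):
    # A single negative weight with threshold 0 realizes negation.
    return mcculloch_pitts([a], [-1], threshold=0)
```
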
Authorized licensed use limited to: Auburn University. Downloaded on November 29, 2009 at 20:37 from IEEE Xplore. Restrictions apply.
These continuous activation functions allow for the gradient based training of multilayer networks. Typical activation functions are shown in Fig. 2.

Fig. 2. Typical activation functions: (a) hard threshold unipolar, (b) hard threshold bipolar, (c) continuous unipolar, (d) continuous bipolar.

III. FEEDFORWARD NEURAL NETWORKS

The simplest and most commonly used neural networks use only one directional signal flow. Furthermore, most feedforward neural networks are organized in layers. An example of the three layer feedforward neural network is shown in Fig. 3. This network consists of input nodes, two hidden layers, and an output layer.

Fig. 3. An example of the three layer feedforward neural network.

One neuron can divide only linearly separated patterns. In order to select just one region in n-dimensional input space, more than n+1 neurons should be used. If more input clusters should be selected, then the number of neurons in the input (hidden) layer should be properly multiplied. If the number of neurons in the input (hidden) layer is not limited, then all classification problems can be solved using the three layer network. The linear separation property of neurons makes some problems especially difficult for neural networks, such as exclusive OR, parity computation for several bits, or separating patterns lying on two neighboring spirals.

The feedforward neural networks are used for nonlinear transformation (mapping) of a multidimensional input variable into another multidimensional output variable. In theory, any input-output mapping should be possible if the neural network has enough neurons in hidden layers (the size of the output layer is set by the number of outputs required). Practically, this is not an easy task and presently there is no satisfactory method to define how many neurons should be used in hidden layers. Usually this is found by a trial and error method. In general, it is known that if more neurons are used, more complicated shapes can be mapped. On the other hand, networks with a large number of neurons lose their ability to generalize, and it is more likely that such a network will also try to map the noise supplied to the input.
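The layered signal flow described above can be sketched as a plain forward pass. The weights below are arbitrary illustrative values (each row carries a trailing bias weight, standing in for the neuron threshold); the layer sizes mirror the two-hidden-layer structure of Fig. 3.

```python
import math

def sigmoid(net):
    # Continuous unipolar activation (Fig. 2c).
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, weights):
    # weights: one row per neuron; the last entry of each row is the bias weight.
    return [sigmoid(sum(x * w for x, w in zip(inputs + [1.0], row)))
            for row in weights]

def forward(x, w_hidden1, w_hidden2, w_out):
    # One-directional signal flow through two hidden layers and an output layer.
    return layer(layer(layer(x, w_hidden1), w_hidden2), w_out)

# 2 inputs -> 3 hidden neurons -> 2 hidden neurons -> 1 output (made-up weights)
w1 = [[0.5, -0.4, 0.1], [0.3, 0.8, -0.2], [-0.6, 0.2, 0.0]]
w2 = [[1.0, -1.0, 0.5, 0.1], [0.4, 0.4, -0.3, 0.0]]
w3 = [[0.7, -0.5, 0.2]]
y = forward([0.2, 0.9], w1, w2, w3)
```
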
IV. LEARNING ALGORITHMS
B. Correlation learning rule

Δw_ij = c x_i d_j

This training algorithm starts with initialization of the weights to zero values.

C. Instar learning rule

If input vectors and weights are normalized, or they have only binary bipolar values (-1 or +1), then the net value will have the largest positive value when the weights have the same values as the input signals. Therefore, weights should be changed only if they are different from the signals:

Δw_i = c(x_i - w_i)

Note that the information required for the weight change is taken only from the input signals. This is a local and unsupervised learning algorithm.

D. WTA - Winner Takes All

The WTA is a modification of the instar algorithm where weights are modified only for the neuron with the highest net value. Weights of the remaining neurons are left unchanged. This unsupervised algorithm (because we do not know what the desired outputs are) has a global character. The WTA algorithm, developed by Kohonen in 1982, is often used for automatic clustering and for extracting statistical properties of input data.

E. Outstar learning rule

In the outstar learning rule it is required that the weights connected to a certain node should be equal to the desired outputs for the neurons connected through those weights:

Δw_ij = c(d_j - w_ij)

where d_j is the desired neuron output and c is a small learning constant which further decreases during the learning procedure. This is a supervised training procedure, because the desired outputs must be known. Both instar and outstar learning rules were developed by Grossberg in 1974.

F. Widrow-Hoff (LMS) learning rule

Widrow and Hoff in 1962 developed a supervised training algorithm which allows training a neuron for the desired response. This rule was derived by minimizing the square of the difference between the net and the desired output value:

Error_j = Σ_{p=1..P} (net_jp - d_jp)^2

Note that the weight change Δw_ij is a sum of the changes from each of the individually applied patterns. Therefore, it is possible to correct a weight after each individual pattern is applied. If the learning constant c is chosen to be small, then both methods give the same result. The LMS rule works well for all types of activation functions. This rule tries to enforce the net value to be equal to the desired value. Sometimes this is not what we are looking for. It is usually not important what the net value is, but it is important whether the net value is positive or negative. For example, a very large net value with a proper sign will result in a large error, and yet this may be the preferred solution.

G. Linear regression

The LMS learning rule requires hundreds or thousands of iterations before it converges to the proper solution. Using linear regression the same result can be obtained in only one step.

Considering one neuron and using vector notation for a set of input patterns X applied through weights w, the vector of net values net is calculated using

X w = net

where X is a rectangular array (n+1)*p, n is the number of inputs, and p is the number of patterns. Note that the size of the input patterns is always augmented by one, and this additional weight is responsible for the threshold. This method, similar to the LMS rule, assumes a linear activation function, so the net values net should be equal to the desired output values d:

X w = d

Usually p > n+1, and the above equation can be solved only in the least mean square error sense

w = (X^T X)^{-1} X^T d

or by converting the set of p equations with n+1 unknowns to a set of n+1 equations with n+1 unknowns. Weights are then a solution of the equation

Y w = z

where the elements of the Y matrix and the z vector are given by

y_ij = Σ_{p=1..P} x_ip x_jp,    z_i = Σ_{p=1..P} x_ip d_p
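The one-step solution above can be sketched directly: form Y = X^T X and z = X^T d, then solve Y w = z with Gaussian elimination. The data below are illustrative (a target that is an exact linear function of the augmented inputs, so the recovered weights are exact).

```python
# One-step training by linear regression: solve (X^T X) w = X^T d.
def solve(Y, z):
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(Y)
    A = [row[:] + [zi] for row, zi in zip(Y, z)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, n):
            f = A[r][col] / A[col][col]
            for c in range(col, n + 1):
                A[r][c] -= f * A[col][c]
    w = [0.0] * n
    for r in range(n - 1, -1, -1):
        w[r] = (A[r][n] - sum(A[r][c] * w[c] for c in range(r + 1, n))) / A[r][r]
    return w

# p = 4 patterns, n = 2 inputs augmented with a constant 1 for the threshold.
# Illustrative target: d = x1 + 2*x2 - 1.
X = [[0.0, 0.0, 1.0], [0.0, 1.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
d = [-1.0, 1.0, 0.0, 2.0]
m = len(X[0])
Y = [[sum(xp[i] * xp[j] for xp in X) for j in range(m)] for i in range(m)]  # X^T X
z = [sum(xp[i] * dp for xp, dp in zip(X, d)) for i in range(m)]            # X^T d
w = solve(Y, z)
```
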
H. Delta learning rule

The LMS method assumes a linear activation function net = o, and the obtained solution is sometimes far from optimum, as shown in Fig. 5 for a simple two dimensional case with four patterns belonging to two categories. In the solution obtained using the LMS algorithm, one pattern is misclassified. If the error is defined as

Error_j = Σ_{p=1..P} (o_jp - d_jp)^2

then in the case of incremental training, for each applied pattern

Δw_ij = c x_i f'_j (d_j - o_j)

the weight change should be proportional to the input signal x_i, to the difference between the desired and actual outputs d_jp - o_jp, and to the derivative of the activation function f'_j. Similar to the LMS rule, weights can be updated in both the incremental and the cumulative methods. In comparison to the LMS rule, the delta rule always leads to a solution close to the optimum. As illustrated in Fig. 5, when the delta rule is used, all four patterns are classified correctly.

I. Nonlinear regression method

The delta method is usually very slow. The solution can be obtained very fast when a nonlinear regression algorithm is adopted [1]. The total error for one neuron j and pattern p is now defined by a simple difference:

ε_jp = d_jp - o_jp(net)

where net = w_1 x_1 + w_2 x_2 + ... + w_n x_n. Linearizing this error with respect to the weights leads to a set of equations

X Δw = v

Matrix X is usually rectangular, and the above equation can only be solved using the pseudo inversion technique:

Δw = (X^T X)^{-1} X^T v

The (X^T X)^{-1} X^T matrix is composed of input patterns only, and it must be computed only once.
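The incremental delta rule of section H can be sketched for a single sigmoidal neuron. The task below (logical AND with an augmented bias input) and the learning constant are illustrative choices, not the paper's Fig. 5 data.

```python
import math

# Incremental delta rule: dw_i = c * x_i * f'(net) * (d - o).
def train_delta(patterns, targets, c=0.5, epochs=2000):
    w = [0.0, 0.0, 0.0]                           # two inputs + bias weight
    for _ in range(epochs):
        for x, d in zip(patterns, targets):
            xa = x + [1.0]                        # augmented input for threshold
            net = sum(xi * wi for xi, wi in zip(xa, w))
            o = 1.0 / (1.0 + math.exp(-net))      # unipolar sigmoid
            fprime = o * (1.0 - o)                # its derivative
            for i in range(len(w)):
                w[i] += c * xa[i] * fprime * (d - o)
    return w

patterns = [[0, 0], [0, 1], [1, 0], [1, 1]]
targets = [0, 0, 0, 1]                            # logical AND
w = train_delta(patterns, targets)
```
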
TABLE 1. Learning rules for a single neuron

General formula: Δw_i = c δ x_i
Hebb rule (unsupervised): δ = o
correlation rule (supervised): δ = d
perceptron fixed rule: δ = d - o
perceptron adjustable rule: δ = (d - o) x^T w / (x^T x) = (d - o) net / ||x||^2
LMS (Widrow-Hoff) rule: δ = d - net
delta rule: δ = (d - o) f'
pseudoinverse rule (for linear system): w = (X^T X)^{-1} X^T d
iterative pseudoinverse rule (for nonlinear system): w = (X^T X)^{-1} X^T (d - o)/f'

J. Error Backpropagation learning

The delta learning rule can be generalized for multilayer networks. Using a similar approach as described for the delta rule, the gradient of the global error can be computed with respect to each weight in the network. Interestingly,

Δw_ij = c x_i f'_j E_j

where c is the learning constant, x_i is the signal on the i-th neuron input, and f'_j is the derivative of the activation function. The cumulative error E_j on the neuron output is given by

E_j = Σ_{k=1..K} A_jk (d_k - o_k)

where K is the number of network outputs and A_jk is the small signal gain from the j-th neuron to the k-th network output, as Fig. 7 shows. The calculation of the back propagating error is somewhat artificial compared to the real nervous system. Also, the error backpropagation method is not practical from the point of view of hardware realization. Instead, it is simpler to find signal gains from the input of the j-th neuron to each network output (Fig. 7). In this case weights are corrected using

Δw_ij = c x_i Σ_{k=1..K} A_jk (d_k - o_k)

Note that the above formula is general, no matter whether neurons are arranged in layers or not. One way to find the gains A_jk is to introduce an incremental change on the input of the j-th neuron and observe the change in the k-th network output. This procedure requires only forward signal propagation, and it is easy to implement in a hardware realization. Another possible way is to calculate the gains through each layer and then find the total gains as products of layer gains. This procedure is equally or less computationally intensive than the calculation of cumulative errors in the error backpropagation algorithm.

Fig. 7. Illustration of the concept of the gain computation in neural networks.

The backpropagation algorithm has a tendency for oscillation. In order to smooth the process, the weight increments Δw_ij can be modified according to Rumelhart, Hinton, and Williams [14]

w_ij(n+1) = w_ij(n) + Δw_ij(n) + α Δw_ij(n-1)

or according to Sejnowski and Rosenberg (1987)

w_ij(n+1) = w_ij(n) + (1 - α) Δw_ij(n) + α Δw_ij(n-1)

where α is the momentum term.

The backpropagation algorithm can be significantly sped up when, after finding the components of the gradient, weights are modified along the gradient direction until a minimum is reached. This process can be carried on without the computationally intensive gradient calculation at each step. The new gradient components are calculated once a minimum in the direction of the previous gradient is obtained. This process is only possible for cumulative weight adjustment. One method to find a minimum along the gradient direction is the three step process of finding the error for three points along the gradient direction and then, using a parabola approximation, jumping directly to the minimum. A fast learning algorithm using the above approach was proposed by Fahlman [2] and is known as quickprop.

The backpropagation algorithm has many disadvantages which lead to very slow convergence. One of the most painful is that the learning process almost perishes for neurons responding with the maximally wrong answer. For example, if the value on the neuron output is close to +1 and the desired output should be close to -1, then the neuron gain f'(net) -> 0 and the error signal cannot back propagate, so the learning procedure is not effective. To overcome this difficulty, a modified method for derivative calculation was introduced by Wilamowski and Torvik [a]. The derivative is calculated as the slope of a line connecting the point of the output value with the point of the desired value, as shown in Fig. 8.
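The momentum-smoothed update quoted above (the Rumelhart, Hinton, and Williams form) can be sketched in isolation. The raw update sequence below is made up for illustration; in practice Δw(n) comes from the backpropagation gradient.

```python
# w(n+1) = w(n) + dw(n) + alpha * dw(n-1): the momentum term alpha reuses part
# of the previous weight increment to damp oscillation.
def momentum_step(w, dw, dw_prev, alpha=0.9):
    return w + dw + alpha * dw_prev

w, dw_prev = 0.0, 0.0
for dw in [0.1, 0.1, -0.05]:      # made-up sequence of raw weight increments
    w = momentum_step(w, dw, dw_prev)
    dw_prev = dw
```
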
f'_modified = (o_desired - o_actual) / (net_desired - net_actual)

Note that for small errors this expression converges to the derivative of the activation function at the point of the output value. With an increase of the system dimensionality, the chance for local minima decreases. It is believed that the phenomenon described above, rather than trapping in local minima, is responsible for convergence problems in the error backpropagation algorithm.

Fig. 8. Illustration of the modified derivative calculation for faster convergence of the error backpropagation algorithm.

K. Levenberg-Marquardt learning

The Levenberg-Marquardt learning algorithm is a second order search method for a minimum. At each iteration step the error surface is approximated by a paraboloid, and the minimum of the paraboloid is the solution for the step. The simplest approach requires function approximation by the first terms of the Taylor series

F(w_k + Δw_k) ≈ F(w_k) + g_k^T Δw_k + (1/2) Δw_k^T A_k Δw_k

where g_k is the gradient vector with elements ∂E/∂w_i and A_k is the Hessian matrix with elements ∂²E/∂w_i∂w_j. The steepest descent (error backpropagation) method calculates weights using

w_{k+1} = w_k - α g

while the Newton method uses

w_{k+1} = w_k - A_k^{-1} g

The Newton method is practical only for small networks where the Hessian A_k can be calculated and inverted. In the Levenberg-Marquardt method the Hessian A_k is approximated by a product of Jacobians

A = 2 J^T J

and the gradient as

g = 2 J^T e

where e is the vector of output errors and the Jacobian J is the matrix of derivatives of each output error with respect to each weight, with elements ∂E_i/∂w_j. It is much easier to calculate the Jacobian than the Hessian, and the Jacobian is also usually much smaller, so less memory is required. Therefore weights can be calculated as

w_{k+1} = w_k - (J_k^T J_k)^{-1} J_k^T e

To secure convergence, the Levenberg-Marquardt algorithm introduces the parameter μ:

w_{k+1} = w_k - (J_k^T J_k + μ I)^{-1} J_k^T e

When μ = 0 this method is similar to the second order Newton method. For larger values of the μ parameter the Levenberg-Marquardt method works as the steepest descent method with small time steps. The μ parameter is automatically adjusted during the computation process so that good convergence is secured. The Levenberg-Marquardt method has recently become very popular because it will usually converge in 5 to 10 iteration steps. The main disadvantage of this method is a large memory requirement, and therefore it cannot be used for large networks.

V. SPECIAL FEEDFORWARD NETWORKS

The number of neurons in the hidden layer is usually found by a trial and error process. The training process starts with all weights randomized to small values, and then the error backpropagation algorithm is used to find a solution. When the learning process does not converge, the training is repeated with a new set of randomly chosen weights. Nguyen and Widrow [16] proposed an experimental approach for the two layer network weight initialization. In the second layer, weights are randomly chosen in the range from -0.5 to +0.5. In the first layer, the initial weights are calculated from

w_ij = β ζ_ij

where ζ_ij is a random number from -0.5 to +0.5 and the scaling factor β is given by

β = 0.7 N^{1/n}

where n is the number of inputs and N is the number of hidden neurons in the first layer. This type of weight initialization usually leads to faster solutions.
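The initialization above can be sketched as follows. Note that the exact Nguyen-Widrow formula is partly reconstructed from the text, so treat the scaling details as an assumption; the layer sizes are illustrative.

```python
import random

# Nguyen-Widrow style initialization: second layer weights uniform in
# [-0.5, 0.5]; first layer weights scaled by beta = 0.7 * N**(1/n).
def init_first_layer(n_inputs, n_hidden, rng):
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # One row per hidden neuron; the extra column is the threshold weight.
    return [[beta * (rng.random() - 0.5) for _ in range(n_inputs + 1)]
            for _ in range(n_hidden)]

rng = random.Random(7)          # fixed seed so the sketch is reproducible
w1 = init_first_layer(n_inputs=2, n_hidden=5, rng=rng)
```
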
For adequate solutions with backpropagation networks, many tries are typically required with different network structures and different initial random weights. This encouraged researchers to develop feedforward networks which can be more reliable. Some of those networks are described below.

A. Functional link network

One layer neural networks are relatively easy to train, but these networks can solve only linearly separated problems. The concept of functional link networks was developed by Nilsson [10] and later elaborated by Pao [13] using the functional link network shown in Fig. 9.

Fig. 9. The functional link network.

Using nonlinear terms with initially determined functions, the actual number of inputs supplied to the one layer neural network is increased. In the simplest case the nonlinear elements are higher order terms of the input patterns. Note that the functional link network can be treated as a one layer network, where additional input data are generated off line using nonlinear transformations. The learning procedure for one layer is easy and fast. Fig. 10 shows an XOR problem solved using functional link networks. Note that when the functional link approach is used, this difficult problem becomes a trivial one. The problem with the functional link network is that proper selection of the nonlinear elements is not an easy task. However, in many practical cases it is not difficult to predict what kind of transformation of the input data may linearize the problem, so the functional link approach can be used.

B. Feedforward version of the counterpropagation network

The counterpropagation network was originally proposed by Hecht-Nielsen [3], and a modified feedforward version was described by Zurada [26]. This network, shown in Fig. 11, requires a number of hidden neurons equal to the number of input patterns, or more exactly, to the number of input clusters. The first layer is known as the Kohonen layer, with unipolar neurons. In this layer only one neuron, the winner, can be active. The second is the Grossberg outstar layer. The Kohonen layer can be trained in the unsupervised mode, but that need not be the case. When binary input patterns are considered, then the input weights must be exactly equal to the input patterns. In this case

net = x^T w = n - 2 HD(x, w)

where n is the number of inputs, w are the weights, x is the input vector, and HD(x, w) is the Hamming distance between the input pattern and the weights.

Fig. 11. The feedforward counterpropagation network: a Kohonen layer of unipolar neurons followed by a Grossberg layer of summing circuits.

Since for a given input pattern only one neuron in the first layer may have the value of one, and the remaining neurons have zero values, the weights in the output layer are equal to the required output pattern.

The feedforward counterpropagation network may also use analog inputs, but in this case all input data should be normalized:

w_i = x_i / ||x||

The counterpropagation network is very easy to design. The number of neurons in the hidden layer should be equal to the number of patterns (clusters). The weights in the input layer should be equal to the input patterns, and the weights in the output layer should be equal to the output patterns. This simple network can be used for rapid prototyping.
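The Hamming-distance identity quoted above can be checked directly on bipolar (+1/-1) patterns; the stored patterns and input below are illustrative. The Kohonen-layer winner is simply the stored pattern with the largest net value, i.e. the smallest Hamming distance.

```python
# net = x^T w = n - 2 * HD(x, w) for bipolar patterns with weights equal to
# the stored patterns.
def hamming(a, b):
    return sum(1 for ai, bi in zip(a, b) if ai != bi)

def net_value(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

stored = [[1, 1, -1, -1], [1, -1, 1, -1]]   # two illustrative cluster patterns
x = [1, 1, -1, 1]                            # input closest to stored[0]
nets = [net_value(x, w) for w in stored]
winner = nets.index(max(nets))               # active Kohonen-layer neuron
```
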
C. WTA architecture
The activation functions of the neurons are unipolar and continuous. The learning process starts with a weight initialization to small random values. During the learning process the weights are changed only for the neuron with the highest value on the output - the winner:

Δw_w = c(x - w_w)

where w_w are the weights of the winning neuron, x is the input vector, and c is the learning constant.

Fig. 12. A winner takes all (WTA) architecture for cluster extracting in the unsupervised training mode: (a) network connections, (b) single layer network arranged into a hexagonal shape.

D. Projection on sphere

The patterns in the transformed z-space are located on a hypersphere with radius R:

z^T z = x_1^2 + x_2^2 + ... + x_n^2 + (sqrt(R^2 - (x_1^2 + x_2^2 + ... + x_n^2)))^2 = R^2

The patterns of the transformed z-space have the same magnitudes and lie on a hypersphere in the (n+1)-dimensional space. Each cluster can now be cut out by a single hyperplane in the (n+1)-dimensional space. The separation hyperplanes should be normal to the vectors specified by the clusters' centers of gravity z_Ck. The equation for the separation plane of the k-th cluster can be easily found using the point and normal vector formula:

z_Ck^T (z - z_Ek) = 0

where z_Ck is the center of gravity of the k-th cluster and z_Ek is a point transformed from the edge of the cluster. To visualize the problem, let us consider a simple two dimensional example with three clusters, shown in Fig. 14.
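The transformation described above can be sketched as follows: each n-dimensional pattern x is extended with an extra coordinate sqrt(R^2 - ||x||^2), so every transformed pattern has magnitude R. The patterns and radius below are illustrative; R must satisfy R^2 >= ||x||^2 for all patterns.

```python
import math

# Project an n-dimensional pattern onto a hypersphere of radius R in (n+1)
# dimensions by appending z_{n+1} = sqrt(R^2 - ||x||^2).
def project_on_sphere(x, R):
    s = sum(xi * xi for xi in x)
    return x + [math.sqrt(R * R - s)]

patterns = [[0.5, 1.0], [1.5, -0.5], [0.0, 0.0]]
R = 2.0
zs = [project_on_sphere(x, R) for x in patterns]
norms = [math.sqrt(sum(zi * zi for zi in z)) for z in zs]   # all equal to R
```
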
After factorization,

||x - w||^2 = x_1^2 + x_2^2 + ... + x_n^2 + w_1^2 + w_2^2 + ... + w_n^2 - 2(x_1 w_1 + x_2 w_2 + ... + x_n w_n)

and finally

||x - w||^2 = x^T x + w^T w - 2 x^T w = ||x||^2 + ||w||^2 - 2 net

where net = x^T w is the weighted sum of the input signals. Fig. 16 shows the network which is capable of calculating the square of the Euclidean distance between the input vector x and a stored pattern w.

The correlation S between the output of the new hidden neuron and the errors on the network outputs should be maximized:

S = Σ_{o=1..O} | Σ_{p=1..P} (V_p - V̄)(E_op - Ē_o) |

where O is the number of network outputs, P is the number of training patterns, V_p is the output of the new hidden neuron, and E_op is the error on the network output. By finding the gradient ∂S/∂w_i, the adjustment Δw_i for the new neuron weights can be found as

Δw_i = Σ_{o=1..O} Σ_{p=1..P} σ_o (E_op - Ē_o) f'_p x_ip

where σ_o is the sign of the correlation between the neuron output and the o-th network error.
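The distance factorization above can be checked numerically: a unit that computes the weighted sum net = x^T w, together with the stored quantities ||x||^2 and ||w||^2, yields the squared Euclidean distance. The vectors below are illustrative.

```python
# ||x - w||^2 = ||x||^2 + ||w||^2 - 2 * net, with net = x^T w.
def net_value(x, w):
    return sum(xi * wi for xi, wi in zip(x, w))

def dist_sq_direct(x, w):
    return sum((xi - wi) ** 2 for xi, wi in zip(x, w))

def dist_sq_via_net(x, w):
    xx = sum(xi * xi for xi in x)
    ww = sum(wi * wi for wi in w)
    return xx + ww - 2.0 * net_value(x, w)

x, w = [1.0, 2.0, -1.0], [0.5, -1.0, 2.0]
```
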
This "neuron" is capable of recognizing certain patterns and generating output signals which are functions of the similarity. Features of this "neuron" are much more powerful than those of a neuron used in the backpropagation networks. As a consequence, a network made of such "neurons" is also more powerful.

If the input signal is the same as a pattern stored in a neuron, then this "neuron" responds with 1 and the remaining "neurons" have 0 on the output, as illustrated in Fig. 16. Thus, the output signals are exactly equal to the weights coming out from the active "neuron". This way, if the number of "neurons" in the hidden layer is large, then any input-output mapping can be obtained. Unfortunately, it may also happen that for some patterns several "neurons" in the first layer will respond with a non-zero signal. For a proper approximation, the sum of all signals from the hidden layer should be equal to one. In order to meet this requirement, output signals are often normalized, as shown in Fig. 18.

VI. RECURRENT NETWORKS

A. Hopfield network

The single layer recurrent network was analyzed by Hopfield in 1982. This network, shown in Fig. 19, has unipolar hard threshold neurons with outputs equal to 0 or 1. Weights are given by a symmetrical square matrix W with zero elements on the main diagonal (w_ij = 0 for i = j). The stability of the system is usually analyzed by means of the energy function

E = -(1/2) Σ_{i=1..N} Σ_{j=1..N} w_ij v_i v_j

It was proved that during signal circulation the energy E of the network decreases and the system converges to stable points. This is especially true when the values of the system outputs are updated in the asynchronous mode. This means that at a given cycle only one, random, output can be changed to the required value. Hopfield also proved that the stable points to which the system converges can be programmed by adjusting the weights using a modified Hebbian rule

Δw_ij = Δw_ji = (2v_i - 1)(2v_j - 1)

Such memory has limited storage capacity. Based on experiments, Hopfield estimated that the maximum number of stored patterns is 0.15N, where N is the number of neurons.

B. Autoassociative memory

In the autoassociative memory the stored patterns v_m are written into the weight matrix

W = Σ_{m=1..M} v_m v_m^T - M I

where M is the number of stored patterns and I is the unity matrix. Note that W is a square symmetrical matrix with elements on the main diagonal equal to zero (w_ij = 0 for i = j). Using a modified formula, new patterns can be added to or subtracted from memory. When such memory is exposed to a binary bipolar pattern by enforcing the initial network states, then after signal circulation the network will converge to the closest (most similar) stored pattern or to its complement. This stable point will be at the closest minimum of the energy function

E(v) = -(1/2) v^T W v

Like the Hopfield network, the autoassociative memory has limited storage capacity, which is estimated to be about M_max = 0.15N. When the number of stored patterns is large and close to the memory capacity, the network has a tendency to converge to spurious states which were not stored. These spurious states are additional minima of the energy function.

C. BAM

The concept of the autoassociative memory was extended to bidirectional associative memories (BAM) by Kosko [7][8]. This memory, shown in Fig. 20, is able to associate pairs of patterns a and b. It is a two layer network with the output of the second layer connected directly to the input of the first layer. The weight matrix of the second layer is W^T and it is W for the first layer. The rectangular weight matrix W is obtained as a sum of the cross correlation matrixes

W = Σ_{m=1..M} a_m b_m^T

where M is the number of stored pairs, and a_m and b_m are the stored vector pairs. If the nodes a or b are initialized with a vector similar to a stored one, then after signal circulation both stored patterns a_m and b_m should be recovered. The BAM has similar limited memory capacity and memory corruption problems as the autoassociative memory. The BAM concept can be extended for the association of three or more vectors.

Fig. 20. An example of the bidirectional autoassociative memory (BAM): (a) drawn as a two layer network with circulating signals, (b) drawn as a two layer network with bidirectional signal flow.

VII. FUZZY SYSTEMS

A. Fuzzy variables and basic operations

Contrary to Boolean logic, where variables can have only binary states, in fuzzy logic all variables may have any values between zero and one. Fuzzy logic consists of the same basic AND, OR, and NOT operators:

A AND B AND C => min{A, B, C} - the smallest value of A, B, or C
A OR B OR C => max{A, B, C} - the largest value of A, B, or C
NOT A => 1 - A - one minus the value of A

For example, 0.1 AND 0.8 AND 0.4 = 0.1, 0.1 OR 0.8 OR 0.4 = 0.8, and NOT 0.3 = 0.7. The above rules are also known as the Zadeh [25] AND, OR, and NOT operators. One can note that these rules are also true for classical binary logic.

B. Fuzzy controllers and basic blocks

The principle of operation of the fuzzy controller significantly differs from that of neural networks. The block diagram of a fuzzy controller is shown in Fig. 21(a). In the first step, analog inputs are converted into a set of fuzzy variables. In this step, usually 3 to 9 fuzzy variables are generated for each analog input. Each fuzzy variable has an analog value between zero and one. In the next step, fuzzy logic is applied to the input fuzzy variables and a resulting set of output variables is generated. In the last step, known as defuzzification, one or more output analog variables are generated from the set of output fuzzy variables, to be used as control variables.

C. Fuzzification

The purpose of fuzzification is to convert an analog input variable into a set of fuzzy variables. For higher accuracy, more fuzzy variables will be chosen. To illustrate the fuzzification process, let us consider that the input variable is the temperature, coded into five fuzzy variables: cold, cool, normal, warm, and hot. Each fuzzy variable should obtain a value between zero and one, which describes the degree of association of the analog input (temperature) with the given fuzzy variable. Sometimes, instead of the term degree of association, the term degree of membership is used. The process of
fuzzification is illustrated in Fig. 22. For example, for a temperature of 70°F, the following set of fuzzy variables is obtained: [0, 0.5, 0.3, 0, 0], and for T=80°F it is [0, 0, 0.2, 0.7, 0]. Usually only one or two fuzzy variables have a value different than zero. In the presented example, trapezoidal functions are used for calculation of the degree of association. Various other functions, such as the triangular or the Gaussian, can also be used, as long as the computed value is in the range from zero to one. Each membership function is described by only three or four parameters, which have to be stored in memory.

For proper design of the fuzzification stage, certain practical rules should be used:
1. Each point of the input analog variable should belong to at least one and no more than two membership functions.
2. For overlapping functions, the sum of two membership functions must not be larger than one. This also means that overlaps must not cross the points of maximum values (ones).
3. For higher accuracy, more membership functions should be used. However, very dense functions lead to a frequent system reaction, and sometimes to system instability.

[Fig. 22. Illustration of the fuzzification process.]

D. Rule Evaluation

Fuzzy rules are specified in the fuzzy table, as shown in Fig. 23 for a given system. Let us consider a simple system with two analog input variables x and y, and one output variable z. We have to design a fuzzy system generating z as f(x,y). Let us assume that after fuzzification the analog variable x is represented by five fuzzy variables: x1, x2, x3, x4, x5 and the analog variable y is represented by three fuzzy variables: y1, y2, y3. Let us also assume that the analog output variable is represented by four fuzzy variables: z1, z2, z3, z4. The key issue of the design process is to set the proper output fuzzy variable zk for every combination of input fuzzy variables, as illustrated in table (a) shown in Fig. 23. The designer has to specify many rules such as: if the inputs are represented by fuzzy variables xi and yj, then the output should be represented by fuzzy variable zk. It is only required to specify which membership functions of the output variable are associated with each combination of input fuzzy variables. Once the fuzzy table is specified, the fuzzy logic computation proceeds in two steps. First, each field of the fuzzy table is filled with intermediate fuzzy variables tij, obtained from the AND operator tij = min{xi, yj}, as shown in Fig. 23(b). This step is independent of the required rules for a given system. In the second step, the OR (max) operator is used to compute each output fuzzy variable zk. In the example given in Fig. 23, z1 = max{t11, t12, t21, t31}, z2 = max{t13, t22, t41, t51}, z3 = max{t23, t32, t42, t52}, and z4 = max{t33, t43, t53}. Note that these formulas depend on the specifications given in the fuzzy table shown in Fig. 23(a).
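The fuzzification and the two-step min/max rule evaluation described above can be sketched in Python. The trapezoid breakpoints and the rule table below are illustrative assumptions, not the actual values behind Figs. 22 and 23:

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership: rises on [a, b], equals 1 on [b, c], falls on [c, d]."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

# Hypothetical breakpoints for the five temperature terms; overlapping
# slopes are complementary, so the "sum of overlaps <= 1" design rule holds.
TEMP_TERMS = {
    "cold":   (-100, -100, 40, 55),
    "cool":   (40, 55, 60, 70),
    "normal": (60, 70, 75, 85),
    "warm":   (75, 85, 90, 100),
    "hot":    (90, 100, 200, 200),
}

def fuzzify(t):
    """Analog temperature -> vector of five fuzzy variables."""
    return [trapezoid(t, *pts) for pts in TEMP_TERMS.values()]

def evaluate(xf, yf, table):
    """Two-step rule evaluation: t_ij = min(x_i, y_j) (AND operator), then
    z_k = max over all table cells assigned to output variable k (OR operator)."""
    n_out = max(max(row) for row in table) + 1
    z = [0.0] * n_out
    for i, xv in enumerate(xf):
        for j, yv in enumerate(yf):
            t_ij = min(xv, yv)
            k = table[i][j]        # which output fuzzy variable this cell feeds
            z[k] = max(z[k], t_ij)
    return z
```

With these assumed breakpoints, fuzzify(65) returns [0.0, 0.5, 0.5, 0.0, 0.0]; note that, as stated above, only two neighboring fuzzy variables are nonzero.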
Fig. 24. Illustration of the defuzzification process.
E. Defuzzification
In the case when the shapes of the output membership functions μk(z) are the same, the above equation can be simplified to

z_analog = \frac{\sum_{k=1}^{n} z_k z_{ck}}{\sum_{k=1}^{n} z_k}

where n is the number of membership functions of the analog output variable, z_k are the fuzzy output variables obtained from rule evaluation, and z_{ck} are the analog values corresponding to the center of the k-th membership function.

therefore the sum of all the signals representing the fuzzy variables of a single input is constant. The fuzzifier circuit is presented in Fig. 26. This design requires only one differential pair for each membership function, in contrast to earlier designs where at least two differential pairs were required. Also, the output currents are automatically normalized, because the sum of I1 through I6 is always equal to I0. Thus the normalization circuit is integrated within the fuzzifier.
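The simplified defuzzification above is just a weighted average of the membership-function centers. A minimal sketch, where the center values are arbitrary examples:

```python
def defuzzify(z, z_centers):
    """z_analog = sum(z_k * z_ck) / sum(z_k) over the n output membership
    functions; z_k come from rule evaluation, z_ck are the analog values
    at the centers of the membership functions."""
    denominator = sum(z)
    if denominator == 0.0:
        return 0.0                 # no rule fired
    return sum(zk * zc for zk, zc in zip(z, z_centers)) / denominator
```

For example, with rule-evaluation outputs z = [0.2, 0.5, 0.5, 0.0] and assumed centers [0, 10, 20, 30], the result is 15/1.2, i.e. approximately 12.5.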
Fig. 25. Architecture of the VLSI fuzzy controller.

[Fig. 26. Fuzzifier circuit.]
values. Due to the specific shapes of the fuzzifier membership functions, where only two membership functions can overlap, a maximum of four cluster cells are active at a time. Although current-mode MIN and MAX operators are possible, it is much easier to convert the currents from the fuzzifiers into voltages and use the simple rule selection circuits with the fuzzy conjunction (AND), or fuzzy MIN, operator.

The voltage on the common node of all sources always follows the highest potential of any of the transistor gates, so it operates as a MAX/OR circuit. However, using the negative signal convention (lower voltages for higher signals), this circuit performs the MIN/AND function. This means that the output signal is low only when all inputs are low. A cluster is selected when all fuzzy signals are significantly lower than the positive battery voltage. Selectivity of the circuit increases with larger W/L ratios. Transistor M3 would be required only if three fuzzifier circuits were used with three inputs.

E. VLSI implementation

A universal fuzzy approximator has been designed and fabricated. In order to make the chip universal, each fuzzifier consists of seven differential pairs with seven equally spaced reference voltages. This results in eight membership functions for each input and 8*8 = 64 cluster cells. Sixty-four adjustable current mirrors for setting the weights of the output signals are programmed with 6-bit accuracy. For an arbitrary two-dimensional function, only 6*64 = 384 bits are required for programming. A test chip has been implemented in the 2 μm n-well MOSIS process, using more than 2000 transistors to perform the analog signal processing. To simplify the test chip implementation, the current sources were programmed at the contact mask level. Fig. 28 shows a comparison between the desired and the actually measured control surface from the fuzzy chip.
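The rule selection behavior described above, a voltage MAX that reads as MIN/AND under the negative signal convention, can be checked with a small behavioral model; the 5 V supply and the linear voltage-to-signal mapping are illustrative assumptions, not the chip's actual operating point:

```python
VDD = 5.0  # assumed supply voltage for illustration

def signal_to_voltage(s):
    # Negative signal convention: a stronger fuzzy signal is a lower voltage.
    return VDD * (1.0 - s)

def voltage_to_signal(v):
    return (VDD - v) / VDD

def rule_select(signals):
    """The common source node follows the highest gate potential (MAX in
    voltage); decoded back to signals, this realizes the fuzzy MIN/AND."""
    node_voltage = max(signal_to_voltage(s) for s in signals)
    return voltage_to_signal(node_voltage)
```

For example, rule_select([0.9, 0.3, 0.7]) evaluates to 0.3 (up to floating-point rounding): the cluster output is high only when all of its input signals are high.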
C. Normalization circuit

In order to obtain proper outputs, it is essential that normalization occurs before the weights are applied to the summed currents. The normalization circuit can be implemented using the same concept as the rule selection circuit. For the negative signal convention, PMOS transistors supplied by a common current source are required. The normalization circuit is shown in Fig. 27. The voltage on the common node A follows the lowest input potential.
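Functionally, the normalization stage rescales the summed currents so that they always add up to the bias current supplied by the common source. A behavioral sketch of that invariant, with arbitrary current values:

```python
def normalize(currents, i_bias=10.0):
    """Rescale branch currents so that their sum equals the bias current,
    as the shared-current-source PMOS stage enforces implicitly."""
    total = sum(currents)
    if total == 0.0:
        return [0.0] * len(currents)
    return [i_bias * c / total for c in currents]
```

For example, normalize([2.0, 3.0, 5.0]) returns the vector unchanged (it already sums to the bias current), while normalize([4.0, 6.0, 10.0]) is halved to the same vector, so the relative weights are preserved.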
D. Weight circuit

The weights for the different clusters can be assigned by setting proper W/L ratios for the current sources. This task can also be accomplished by introducing digitally programmable current mirrors.
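A digitally programmable current mirror effectively stores each cluster weight as a short binary code; the test chip described in this paper programs such mirrors with 6-bit accuracy. A rough model of that quantization, assuming a unit full-scale weight:

```python
def program_weight(w, bits=6):
    """Round a desired weight in [0, 1] to the nearest of 2**bits - 1
    mirror ratios; returns (digital code, realized weight)."""
    levels = (1 << bits) - 1            # 63 steps above zero for 6 bits
    code = min(levels, max(0, round(w * levels)))
    return code, code / levels
```

For example, program_weight(1.0) gives (63, 1.0), and program_weight(0.25) gives code 16, realizing 16/63 ≈ 0.254 instead of exactly 0.25; the residual is the quantization error of the 6-bit programming word.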