Digital VLSI Architectures for Neural Networks

S. Y. Kung and J. N. Hwang
Abstract
This paper proposes a generic iterative model for a wide variety of artificial neural networks (ANNs): single-layer feedback networks, multilayer feedforward networks, hierarchical competitive networks, as well as some probabilistic models. A unified formulation is provided for both the retrieving and learning phases of most ANNs. Based on the formulation, a programmable ring systolic array is developed. The architecture maximizes the strength of VLSI in terms of intensive and pipelined computing and yet circumvents the limitation on communication. It may be adopted as a basic structure for a universal neurocomputer architecture.

1 A Unified Generic Iterative ANN Model

A basic ANN model consists of a large number of neurons (with activation values {a_i}), linked to each other with synaptic weights {w_ij}. From an algorithmic viewpoint, two separate phases are involved in ANN processing: the retrieving phase and the learning phase.

    u_i(l+1) = Σ_{j=1}^{N} w_ij(l+1) a_j(l) + θ_i(l+1)        (1)

2. Layer Iteration: the network is a spatially iterative network. The heterogeneous neuron layers in between the input and output layers are called hidden layers.

3. Pattern Iteration: in certain models, each iteration (level) corresponds to one pattern input.

Nonlinear Activation Functions f_i  The nonlinear activation function f_i in Eq. 2 can be a deterministic function, a winner-take-all mechanism, or a stochastic decision. There are three popular deterministic nonlinear activation functions for Eq. 2: threshold, squash, and sigmoid [5]. In some pattern classification applications, winner-take-all (WTA) nonlinear mechanisms are adopted, often implemented by lateral inhibition so that only the neuron receiving the largest net input is activated.
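As a concrete illustration of Eq. 1, together with the activation step of Eq. 2, one retrieving iteration can be sketched in NumPy as follows; the function names and the choice of a sigmoid for f are our own illustrative assumptions, not the paper's:

```python
import numpy as np

def sigmoid(u):
    # One popular deterministic activation (see the discussion of f_i).
    return 1.0 / (1.0 + np.exp(-u))

def retrieve_step(W, a, theta, f=sigmoid):
    u = W @ a + theta    # Eq. 1: net inputs u_i(l+1)
    return f(u)          # Eq. 2: elementwise nonlinear activation

rng = np.random.default_rng(0)
N = 4
W, a, theta = rng.normal(size=(N, N)), rng.uniform(size=N), rng.normal(size=N)
a_next = retrieve_step(W, a, theta)   # activations a(l+1) for the next layer
```

With a sigmoid f, every component of a_next lies strictly between 0 and 1, so the output can feed directly into the next iteration.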
ISCAS '89  CH2692-2/89/0000-0445$1.00 © 1989 IEEE
The important factors in the learning phase are the recursion index m, the measure function E for the training criterion, the updating formulation Θ, the back-propagation rule of the corrective signals, and the homogeneity consideration in certain models.

Recursion Index m in the Learning Phase  The recursion index m used in the unified recursive weight updating formulation may represent either a pattern index or a sweep index.

1. Pattern Recursion: the network updates the synaptic weights after the presentation of each training pattern.

2. Sweep Recursion: the network updates the synaptic weights only after all the P training patterns are presented.

Measure Function E as Training Criterion  A criterion function E in Eq. 3 has to be selected first, so that weight training may be formulated as an iterative optimization (maximization or minimization) problem. The measure function E can be a global function of the network, or a local function of l, i.e., E(l).

Examples of Updating Formulation Θ  The updating formulation Θ in Eq. 3 can be additive or multiplicative, among others. The additive formulations lead to the gradient descent (minimization) or gradient ascent (maximization) approach:

    w_ij(l) ← w_ij(l) ∓ η ∂E/∂w_ij(l)        (4)

where the sign is determined by the maximization or minimization formulation (e.g., back-propagation learning [9]). On the other hand, if {w_ij(l)} has to satisfy the constraints w_ij(l) ≥ 0 and Σ_j w_ij(l) = 1, then the constrained optimization problem leads to a multiplicative formulation [8]:

    w_ij(l) ← w_ij(l) · η ∂E/∂w_ij(l)        (5)

Note that only a proper choice of the updating step η can ensure that the new weights {w_ij(l)} satisfy the same constraints (e.g., Baum-Welch reestimation learning [8]).

Back-Propagation of Corrective Signals  To alleviate the burden of computing the gradient, back-propagation of corrective signals based on chain-rule derivation may be adopted, where the backward propagated corrective signal δ_i(l) is defined to be ∂E/∂u_i(l) and can be calculated recursively.

1.3 Neural Network Examples of the Generic Iterative ANN

Table 1 classifies the neural network examples in terms of the factors considered in the retrieving and learning phases.

2 A Unified Ring Systolic Design for the Generic Iterative ANN Model

An important consideration in the architectural design for ANNs is to ensure that the processing in both the retrieving phase and the learning phase can share the same storage, processor hardware, and array configuration. This significantly speeds up real-time learning by avoiding the difficulty of transferring synaptic weights between the retrieving and learning phases. It will be shown that the operations in both the retrieving and learning phases of the generic iterative ANN models can be formulated as consecutive matrix-vector multiplication, outer-product updating, or consecutive vector-matrix multiplication problems. In terms of the array structure, all these formulations lead to the same universal ring systolic array architecture [6]. In terms of the functional operations, all these formulations call for a MAC (multiply and accumulate) operation. Proper digital arithmetic techniques (such as Cordic) to support the nonlinear processing are also required.

2.1 Ring Systolic Design for the Retrieving Phase

Basic Operations in the Retrieving Phase  The system dynamics in the retrieving phase of the generic iterative ANN model can be formulated as a consecutive matrix-vector multiplication (MVM) problem interleaved with the nonlinear activation function (see Eqs. 1 and 2):

    u(l+1) = W(l+1) a(l) + θ(l+1)        (8)

    a(l+1) = f[u(l+1), a(l)]             (9)

where the f[x] operator performs the nonlinear activation function f on each element of the vector x. Without loss of generality, and for the convenience of a homogeneous architectural design, it is assumed that all the iterations in the iterative ANN model have a uniform size of N neural units (it is always possible to artificially create a certain number of no-operation neural units and fixed zero weights to balance any inequality between two iterations).

A special feature of consecutive MVM problems is that the output vector of one iteration is used as the input vector for the next iteration. Therefore, the data have to be carefully arranged so that the outputs of one iteration can be immediately used as the inputs of the next iteration [6]. This leads to the ring systolic architecture shown in Figure 2. The functional operation at each PE is a MAC operation (see Figure 2). The pipelining period of this design is 1, which implies 100% utilization efficiency.

Figure 2: The ring systolic ANN at the (l+1)-th iteration in the retrieving phase.
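The ring schedule for the retrieving phase can be illustrated with a small NumPy simulation (variable names and structure are our own; this models the data movement, not the VLSI hardware): each of the N PEs performs exactly one MAC per clock while the activations circulate, so after N clocks PE i holds u_i.

```python
import numpy as np

def ring_mvm(W, a, theta):
    """Simulate one retrieving-phase MVM on an N-PE ring.

    PE i holds row i of W and starts its accumulator at the bias theta_i;
    idx[i] tracks which activation a_j is currently visiting PE i.  Every
    PE does one MAC per clock (pipelining period 1), and after N clocks
    PE i holds u_i = sum_j w_ij a_j + theta_i.
    """
    N = len(a)
    u = theta.astype(float).copy()          # accumulators, preloaded with bias
    idx = np.arange(N)                      # activation index seen by each PE
    for _ in range(N):                      # N clocks
        u += W[np.arange(N), idx] * a[idx]  # one MAC in every PE
        idx = (idx + 1) % N                 # activations shift leftward
    return u

rng = np.random.default_rng(0)
N = 4
W, a, theta = rng.normal(size=(N, N)), rng.normal(size=N), rng.normal(size=N)
u = ring_mvm(W, a, theta)                   # matches Eq. 8: W a + theta
```

Because idx is a distinct permutation at every clock, all N MACs of a clock are independent, which is what gives the design its 100% utilization.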
2.2 Ring Systolic Design for the Learning Phase

    w_ij(l+1) ← w_ij(l+1) ∘ [g_i(l+1) · h_j(l+1)]        (11)

where ∘ denotes the additive or multiplicative weight updating.
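In the additive case, the outer-product updating (OPU) of Eq. 11 can be sketched as a simple ring simulation in NumPy (the function name and simulation style are ours; it models the schedule, not the hardware): PE i keeps g_i fixed while the h_j values circulate, so every weight w_ij receives its increment g_i·h_j within N clocks.

```python
import numpy as np

def ring_opu(W, g, h):
    """One additive outer-product update sweep on an N-PE ring.

    PE i stores g_i and column slices of W; idx[i] is the column index
    whose h value is currently visiting PE i.  After N clocks every
    weight has been updated: w_ij += g_i * h_j.
    """
    N = len(g)
    W = W.astype(float).copy()
    idx = np.arange(N)                      # column seen by each PE
    for _ in range(N):                      # N clocks
        W[np.arange(N), idx] += g * h[idx]  # Δw_ij = g_i · h_j in every PE
        idx = (idx + 1) % N                 # h values circulate leftward
    return W

rng = np.random.default_rng(1)
N = 4
W, g, h = rng.normal(size=(N, N)), rng.normal(size=N), rng.normal(size=N)
W_new = ring_opu(W, g, h)                   # equals W + g h^T
```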
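The consecutive vector-matrix multiplication used to back-propagate the corrective signals, δ_i(l) = Σ_j δ_j(l+1) r_ji(l+1), can likewise be sketched as a ring simulation (a hedged illustration with our own naming, not the paper's implementation): here the accumulators acc_i circulate instead of the data.

```python
import numpy as np

def ring_vmm(delta_next, R):
    """Simulate the consecutive-VMM sweep on an N-PE ring.

    PE j holds delta_j(l+1) and row j of R; the accumulators acc_i start
    at zero and circulate leftward, each collecting one product per
    clock, so after N clocks acc_i = sum_j delta_j(l+1) * r_ji(l+1).
    """
    N = len(delta_next)
    acc = np.zeros(N)
    idx = np.arange(N)                      # which acc_i sits in each PE
    for _ in range(N):                      # N clocks
        # idx is a permutation, so the fancy-indexed += touches each
        # accumulator exactly once per clock (one MAC per PE).
        acc[idx] += delta_next * R[np.arange(N), idx]
        idx = (idx + 1) % N                 # accumulators shift leftward
    return acc

rng = np.random.default_rng(2)
N = 4
delta_next, R = rng.normal(size=N), rng.normal(size=(N, N))
delta = ring_vmm(delta_next, R)             # equals delta_next^T R
```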
Systolic Processing for OPU  The ring systolic array derived for the retrieving phase can be easily adapted to execute in parallel both the OPU and the consecutive VMM in the learning phase [6]. The OPU adopts the same ring systolic array configuration as the consecutive MVM problems, with similar MAC functional operations (see Figure 3). The operations of the OPU in the ring systolic ANN can be briefly described as follows:

1. The value of g_i(l+1) is computed and stored in the i-th PE. The value h_j(l+1) produced at the j-th PE is cyclically piped (leftward) to all other PEs in the ring systolic ANN during the N clocks.

2. When h_j(l+1) arrives at the i-th PE, it is multiplied with the stored value g_i(l+1) to yield Δw_ij(l+1), which is added to (or multiplied with) the old weight w_ij(l+1) of the previous recursion to yield the updated w_ij(l+1). Note that the old weight data {w_ij(l+1)} are retrieved in a circularly-shift-up sequence, just as in the retrieving phase.

3. After N clocks, all the N × N new weights {w_ij(l+1)} at the (l+1)-th iteration are generated. The ring systolic ANN is now ready for the weight updating of the next iteration (l).

Systolic Processing for Consecutive VMM  The consecutive VMM also leads to the same ring systolic array configuration with similar MAC functional operations (see Figure 4). The operations of the consecutive VMM in the ring systolic ANN can be briefly described as follows:

1. The back-propagated corrective signal δ_j(l+1) and the value r_ji(l+1) are available in the j-th PE at the (l+1)-th iteration. The value r_ji(l+1) is then multiplied with δ_j(l+1) at the j-th PE.

2. The product is added to the newly arrived accumulator acc_i. (The parameter acc_i is initialized at the i-th PE with a zero initial value and circularly shifted leftward across the ring array.)

3. After N such accumulation operations, the accumulator acc_i will return to the i-th PE after accumulating all the products: δ_i(l) = Σ_{j=1}^{N} δ_j(l+1) r_ji(l+1).

Figure 4: The ring systolic architecture for the consecutive VMM at the (l+1)-th iteration in the learning phase.

2.3 Implementation Considerations of Neural Processing Units

Time Efficient Design Based on a Parallel Array Multiplier  As discussed in Sections 2.1 and 2.2, most of the computations involved in the neural processing units are MAC or multiplication operations (e.g., the calculations of g_i(l+1), h_j(l+1), and r_ji(l+1)). For a time-efficient dedicated design, a parallel array multiplier (e.g., the Baugh-Wooley multiplier [1]) should be favorably considered.

There is concern about using digital hardware for the nonlinear sigmoid function, which is required in many continuous-valued ANN models. Simulations have been conducted [6] to replace the sigmoid function with a piecewise linear function. The simulations adopted special values (combinations of powers of two) for the slopes of each interval of the piecewise linear function, so that only shift-and-add operations are required for the implementation. It is observed that with 8 to 16 linear segments, the approximation produces results very similar to those of a 32-bit floating point implementation of the sigmoid function. This may be exploited to simplify the processing hardware.

Area Efficient Design Based on a Cordic Processor  For an area-efficient dedicated digital VLSI implementation of the neural processing units, a Cordic processor, which uses about 1/3 the silicon area of an array multiplier, might be a good alternative [1]. A Cordic scheme is an iterative method based on bit-level shift-and-add operations. It is especially suitable for computing a class of rotations. A 2-dimensional vector v = [x, y] can be rotated by an angle α by using a rotation operator R_α: v' = R_α v.

In the linear mode, the Cordic can be used for MAC (multiply and accumulate) operations, although it is somewhat slower than the array multiplier. Given x, y, and α, we can get the Cordic output y + xα in the linear mode, which is exactly what the MAC operations need. The key advantage of Cordic is that it may implement the sigmoid activation function by setting the Cordic in the hyperbolic mode, where the inputs are set as v = [1, 1] and α = u/2. For a more detailed design, the reader is referred to [4].

3 Conclusion

For real-time processing performance, neural network architectures will require massively parallel processing. Fortunately, today's VLSI and CAD technologies facilitate practical and cost-effective implementation of large-scale computing networks. A unified mathematical formulation of the generic iterative ANN model is proposed to prepare the ground for a versatile neural computing architecture. This leads to a programmable universal ring systolic ANN design. Thanks to the versatility of this systolic design, both the retrieving and learning phases of most neural network models can be efficiently implemented.

References

[1] H. M. Ahmed. Alternative arithmetic unit architectures for VLSI digital signal processors. In VLSI and Modern Signal Processing, chapter 16, pages 277-306, Prentice Hall, Inc., Englewood Cliffs, NJ, 1985.

[2] S. Grossberg. Adaptive pattern classification and universal recoding: Part 1. Parallel development and coding of neural feature detectors. Biological Cybernetics, 23:121-134, 1976.

[3] J. J. Hopfield. Neural networks and physical systems with emergent collective computational abilities. In Proc. Natl. Acad. Sci. USA, pages 2554-2558, 1982.

[4] J. N. Hwang. Algorithms/Applications/Architectures of Artificial Neural Nets. Ph.D. dissertation, Dept. of Electrical Engineering, University of Southern California, December 1988.

[5] T. Kohonen. Self-organized formation of topologically correct feature maps. Biological Cybernetics, 43:59-69, 1982.

[6] S. Y. Kung and J. N. Hwang. A unified systolic architecture for artificial neural networks. To appear in Journal of Parallel and Distributed Computing, Special Issue on Neural Networks, 1989.

[7] R. Linsker. Self-organization in a perceptual network. IEEE Computer Magazine, 21:105-117, March 1988.

[8] L. R. Rabiner and B. H. Juang. An introduction to hidden Markov models. IEEE ASSP Magazine, 3(1):4-16, January 1986.

[9] D. E. Rumelhart, J. L. McClelland, and the PDP Research Group. Parallel Distributed Processing (PDP): Vol. 1 and 2. MIT Press, Cambridge, Massachusetts, 1986.

[10] B. Widrow and R. Winter. Neural nets for adaptive filtering and adaptive pattern recognition. IEEE Computer Magazine, 21:25-39, March 1988.