
In J. W, J. Yang, W. Gao and Y. Li (eds.): Young Computer Scientists. Tsinghua University Press, Beijing, China, July 1993.

ENHANCED THRESHOLD GATE FAN-IN REDUCTION ALGORITHMS①

Valeriu Beiu†,②, Jan Peperstraete†, and Rudy Lauwereins†,③



Katholieke Universiteit Leuven, Department of Electrical Engineering, ESAT–ACCA, Kardinaal Mercierlaan 94, B-3001 Heverlee, Belgium
E-mail: Valeriu.Beiu@esat.kuleuven.ac.be

ABSTRACT

The paper describes and improves on a Boolean neural network (NN) fan-in reduction algorithm, with a view to possible VLSI implementation of NNs using threshold gates (TGs). Constructive proofs are given for: (i) at least halving the size; (ii) reducing the depth from O(N) to O(lg²N). Lastly, a fresh algorithm which reduces the size to polynomial is suggested.

1. INTRODUCTION

The paper is the result of ongoing work at KULeuven focused on reducing the complexity of Boolean NNs, with a view to their efficient VLSI implementation using TGs. "Reducing a NN" means that after applying such an algorithm to an "input NN", the "output NN" will be "simpler" with respect to: (i) the fan-in of the neurons[6,7,15,19]; (ii) the precision of the weights[2-4,8]; and (iii) the approximation of the sigmoid output function[5]. We will discuss two algorithms for fan-in reduction. This particular problem is of great importance, as it shows one way to deal with high fan-in artificial neurons (high connectivity). When a NN is simulated (execution phase) or trained (learning phase), such aspects do not count; but for the VLSI designers who try to map the resulting NNs in silicon, this high connectivity is usually an obstacle.

There are two main trends for hardware implementation of neural networks: (i) analog, and (ii) digital[14,16]. This paper deals with TGs, which borrow from both of them, having binary inputs but analog summation. They are a challenging alternative to classical Boolean solutions due to a solid theoretical background from the 60s[17,23,24,26], renewed interest proven by many articles from the late 80s[27] and 90s[9-13,30-35], as well as proposals of implementation[5,19]. In section 2 we will present a divide and conquer algorithm for reducing the fan-in of symmetric functions (of any function, if we are allowed to repeat each input variable), and will improve on its efficiency in section 3, by showing how we can further reduce the size and the depth of the output NN. Mathematical proofs and simulation results support the claims. Section 4 will shortly present the basic ideas of an even more efficient algorithm with respect to size. Some conclusions end the paper in section 5.

2. A DIVIDE AND CONQUER ALGORITHM

The basic ideas of the algorithm, introduced by the authors[6,7], are the division of the input variables in a first layer, and the joining of the results in two subsequent layers. The reduction algorithm has been designed for majority functions. As has been shown[24], any symmetric function can be built using only majority functions (see Fig.1), and any Boolean function (BF) can be considered as a symmetric function by repeating its input variables[1,35]. It is also known that LT_1 ⊂ MAJ_3[1,32], and that LT̂_k is equivalent with MAJ_k[32]. Here LT_1 is the class of BFs computed by linear TGs (LTGs) with arbitrary real weights; LT̂_1 is the class of BFs computed by LTGs with weights bounded by a polynomial in the number of inputs (|w_i| ≤ N^c)[11]; LT̂_k is the class of BFs computed by a polynomial size depth-k circuit of LT̂_1 gates (the depth being the number of gates on the longest input-output path)[11,30]; MAJ_1 is the class of BFs computed by LTGs having ±1 weights (these compute functions analogous to MAJORITY gates)[21,31]; and MAJ_k is the class of BFs computed by a polynomial size depth-k circuit of MAJ_1 gates[22,32]. It should also be mentioned that more efficient constructions for symmetric functions than the one in Fig.1 are known[23,35]; unfortunately they are not made only of majority functions.

Suppose that we have an N = 2^k fan-in neuron and accept only neurons with fan-in n ≤ 2^l (obviously n < N). The recursive equations (Fig.2)[6,7] are:

    NG(k,1) = 2^{k-1} + (1 + 2^k) · NG(k-1,1)                                  (1)
    NL(k+1,l) = ⌈NL(k,l)/2⌉ × 3 + ⌊NL(k,l)/2⌋,                                 (2)

where NG(k,l) is the size (number of gates) and NL(k,l) is the depth (number of layers). Solving them, we obtain (we use lg instead of log₂, and ln for logₑ):

    NG(k,1) = Σ_{i=1}^{k} [ 2^{i-1} × Π_{j=i+1}^{k} (1 + 2^j) ] = O(N^{lg N})  (3)
    NL(k,l) = 2 · 2^{k-l} − 1 = O(N/n).                                        (4)
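To make eq.1 and eq.2 concrete, here is a minimal Python sketch (our illustration only; the names NG and NL simply mirror the paper's notation) that evaluates the two recurrences and checks them against the closed forms of eq.3 and eq.4:

    from math import prod

    def NG(k):
        # Size recurrence, eq.1: NG(k,1) = 2^(k-1) + (1 + 2^k) * NG(k-1,1)
        return 1 if k == 1 else 2**(k - 1) + (1 + 2**k) * NG(k - 1)

    def NG_closed(k):
        # Closed form, eq.3: sum over i of 2^(i-1) * prod_{j=i+1..k} (1 + 2^j)
        return sum(2**(i - 1) * prod(1 + 2**j for j in range(i + 1, k + 1))
                   for i in range(1, k + 1))

    def NL(k, l):
        # Depth recurrence, eq.2, iterated from the base case NL(l,l) = 1
        d = 1
        for _ in range(k - l):
            d = 3 * ((d + 1) // 2) + d // 2
        return d

    for k in range(1, 9):
        assert NG(k) == NG_closed(k)
        for l in range(1, k + 1):
            assert NL(k, l) == 2 * 2**(k - l) - 1        # eq.4

    print(NG(3), NL(3, 1))    # N = 8, n = 2  ->  67 7

The printed pair (67 gates, 7 layers) is the N = 8, n = 2 entry of the proposed algorithm in Tables 1 and 2, and matches the tree of Fig.3.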

[Figure 1 appears here: majority gates 1, 2, …, m over inputs x1…xN feed a second-layer gate computing F.]

Figure 1. Classical two layered structure for computing any symmetric function with maximum N + 1 majority functions[24] (m ≤ N).

① Research partly carried out under the Belgian Concerted Action Project "Applicable Neural Networks."
② On leave of absence from the "Politehnica" University of Bucharest, Department of Computer Science, Spl. Independentei 313, 77206 Bucharest, România.
③ Senior Research Assistant of the Belgian Fund for Scientific Research.

[Figure 2 appears here: inputs x1…x8 divided into two groups.]

Figure 2. Division of the input variables in two groups in a first layer, and "joining" the intermediate results in the second layer (N=8, n=4).

[Figures 3 and 4 appear here: tree structures over the inputs x1…x8; Fig.3 has depth 7 and size 67, Fig.4 has depth 6 and size 47.]

Figure 3. Tree structure obtained after applying the algorithm twice (N=8, n=2); no enhancement.

Figure 4. Removing the unused gates – tinted circles (N=8, n=2).

In the general case (Fig.3) the size is[6,7]:

    NG(k,l) = [NG(k,1) − NG₂(k,l)] / NG(l,1) + NG₂(k,l)
            = Σ_{i=l+1}^{k} [ 2^{i-1} × Π_{j=i+1}^{k} (1 + 2^j) ] + Π_{i=l+1}^{k} (1 + 2^i)
            = O( N^{lg N} / n^{lg n} ),                                        (5)

which is superpolynomial with respect to N (n has been assumed to be constant). This compares well with classical Boolean decomposition, which has exponential growth. With respect to depth the proposed algorithm is linear, like Boolean decomposition, but the multiplying constant is lower (1/2).

3. IMPROVING THE ALGORITHM

We have improved on these results in several successive steps. First we will prove a very tight bound for the size (eq.3), to be used later. As the decomposition process used to obtain eq.1 was "uniform", treating all gates in an equal manner, an immediate improvement was to simplify the OR-gates decomposition. More sophisticated improvements can be realized if one looks carefully at the way the subfunctions are generated; thus some unused gates can be removed. A final improvement step has been to delete the gates performing the same functions.

3.1 A very tight bound

Starting from eq.3 we can rewrite the sum of products as:

    (1+2²)·(1+2³)·…·(1+2^k) × [ 2⁰ + 2¹/(1+2²) + … + 2^{k-1}/((1+2²)·(1+2³)·…·(1+2^k)) ]    (6)

and use a truncated Taylor series expansion of ln(1+2^i) around 2^i:

    1 + 2^i < 2^{i + 1/(2^i·ln2)},                                             (7)

so being able to prove:

    NG(k,1) < 2^{k(k+1)/2 + (2^{k-1}−1)/(2^k·ln2) − 0.2846} < 2^{k(k+1)/2 + 0.4365} = 2^{0.4365} · √(N^{k+1}).    (8)

3.2 Simplifying decomposition

Using a partial binary tree decomposition for the OR-gates (Fig.5) leads to:

    NG(k,1) = 2^k − 1 + 2^k · NG(k-1,1),                                       (9)
    NL(k+1,l) = NL(k,l) + 2 + k − l,                                           (10)

instead of eq.1 and eq.2. Solving them we obtain:

    NG(k,1) = 2^{k(k+1)/2} − 1 < 2^{k(k+1)/2} = √(N^{k+1})                     (11)
    NL(k,l) = (k−l)²/2 + 3(k−l)/2 + 1 = ½·lg²(N/n) + (3/2)·lg(N/n) + 1 = O(lg²(N/n)),    (12)

having the proof that the size is reduced by 2^{0.4365} (eq.11 and eq.8), and that the depth decreases from linear to squared logarithmic (eq.4 and eq.12).

3.3 Removing unused gates

But not all the gates are used (the unused gates are the tinted ones in Fig.4)! As we compute twice all the possible sums of 1s, the number of gates we need is:

    2 · ( 2^{i-2} + 2 × Σ_{j=1}^{2^{i-2}} 3j ) = 3·2^{2i-3} + 2^{i+1}          (13)

instead of:

    2^i · ( 1 + 2^{i-2} + 2^{i-1} ) = 2^{2i-1} + 2^{2i-2} + 2^i.               (14)

When i → ∞ this reduces the size by 2 (the minimum reduction being 8/7).

3.4 Deleting redundant gates

If now, in the first layer, we keep only one gate for each particular function (see Fig.6), we will have:

    2 · ( 2 × Σ_{j=1}^{2^{i-2}} j + 2 × 2^{i-1} ) = 2^{2i-3} + 2^{i+1}         (15)

instead of eq.13. This reduces the size by 3 (or 5/4 minimum).
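The improvement steps can be checked numerically as well; the sketch below (again ours, not the authors' code) verifies the bound of eq.8 against the recurrence of eq.1, the closed form of eq.11 against eq.9, the depth of eq.12, and the gate-count ratios claimed after eq.13-eq.15:

    def NG(k):
        # eq.1, as in the previous sketch
        return 1 if k == 1 else 2**(k - 1) + (1 + 2**k) * NG(k - 1)

    def NG_simpl(k):
        # eq.9: NG(k,1) = 2^k - 1 + 2^k * NG(k-1,1)
        return 0 if k == 0 else 2**k - 1 + 2**k * NG_simpl(k - 1)

    for k in range(1, 12):
        assert NG(k) < 2 ** (k * (k + 1) / 2 + 0.4365)        # eq.8
        assert NG_simpl(k) == 2 ** (k * (k + 1) // 2) - 1     # eq.11

    # N = 8, n = 2: 63 gates (Fig.5) and 6 layers (eq.12)
    print(NG_simpl(3), ((3 - 1)**2 + 3 * (3 - 1) + 2) // 2)

    # Closed-form gate counts of eq.13-eq.15 and their ratios:
    for i in (2, 3, 8, 16):
        g13 = 3 * 2**(2*i - 3) + 2**(i + 1)        # all sums of 1s computed twice
        g14 = 2**(2*i - 1) + 2**(2*i - 2) + 2**i   # uniform decomposition
        g15 = 2**(2*i - 3) + 2**(i + 1)            # one gate per distinct subfunction
        print(i, g14 / g13, g13 / g15)             # -> 2 and 3 as i grows (8/7 at i = 2)

For N = 8 and n = 2 the combined steps take the size from 67 down to 31, the factor of 67/31 ≈ 2.16 quoted in the conclusions.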

[Figures 5 and 6 appear here: Fig.5 has depth 6 and size 63, Fig.6 has depth 6 and size 31.]

Figure 5. Simplifying OR-gate decomposition – tinted circles (N=8, n=2).

Figure 6. Deleting redundant gates – tinted circles (N=8, n=2).

Table 1. Number of limited fan-in TGs necessary to substitute a high fan-in TG, in the following order: (i) Boolean decomposition; (ii) proposed algorithm; (iii) the enhanced algorithm. Each row lists the values for n = 2^1, 2^2, …, N/2 (i.e. n = 2, 4, 8, 16, 32, 64, 128, 256, 512).

N=2^2    (i)   13
         (ii)  7
         (iii) 7
N=2^3    (i)   253; 41
         (ii)  67; 13
         (iii) 31; 13
N=2^4    (i)   65,535; 10,921; 583
         (ii)  1,147; 229; 25
         (iii) 511; 77; 25
N=2^5    (i)   4.3E9; 7.2E8; 3.8E7; 1.39E5
         (ii)  37,867; 7,573; 841; 49
         (iii) 16,383; 2,493; 217; 49
N=2^6    (i)   1.8E19; 3.1E18; 1.6E17; 6.0E14; 8.9E9
         (ii)  2.5E6; 4.9E5; 54,697; 3,217; 97
         (iii) 1.1E6; 1.6E5; 13,945; 689; 97
N=2^7    (i)   3.4E38; 5.7E37; 3.0E36; 1.1E34; 1.6E29; 3.7E19
         (ii)  3.2E8; 6.4E7; 7.1E6; 4.1E5; 12,557; 193
         (iii) 1.3E8; 2.0E7; 1.8E6; 88,305; 2,401; 193
N=2^8    (i)   1.2E77; 1.9E76; 1.0E75; 3.8E72; 5.6E67; 1.3E58; 6.9E38
         (ii)  8.2E10; 1.6E10; 1.8E9; 1.1E8; 3.2E6; 49,729; 385
         (iii) 3.4E10; 5.2E9; 4.6E8; 2.3E7; 6.0E5; 8,897; 385
N=2^9    (i)   1.3E154; 2.2E153; 1.2E152; 4.4E149; 6.4E144; 1.5E135; 7.9E115; 2.3E77
         (ii)  4.2E13; 8.4E12; 9.3E11; 5.5E10; 1.7E9; 2.5E7; 1.9E5; 769
         (iii) 1.8E13; 2.7E12; 2.3E11; 1.1E10; 3.1E8; 4.5E6; 34,177; 769
N=2^10   (i)   1.8E308; 2.9E307; 1.6E306; 5.8E303; 8.6E298; 1.9E289; 1.0E270; 3.1E231; 2.7E154
         (ii)  4.3E16; 8.6E15; 9.5E14; 5.6E13; 1.7E12; 2.6E10; 2.0E8; 7.9E5; 1,537
         (iii) 1.8E16; 2.7E15; 2.4E14; 1.2E13; 3.2E11; 4.7E9; 3.5E7; 1.3E5; 1,537

Table 2. Number of necessary layers to substitute a high fan-in TG by limited fan-in TGs: (i) Boolean decomposition; (ii) proposed algorithm; (iii) the enhanced algorithm. Each row lists the values for n = 2^1, 2^2, …, N/2.

N=2^2    (i)   5
         (ii)  3
         (iii) 3
N=2^3    (i)   13; 5
         (ii)  7; 3
         (iii) 6; 3
N=2^4    (i)   29; 13; 7
         (ii)  15; 7; 3
         (iii) 10; 6; 3
N=2^5    (i)   61; 29; 17; 9
         (ii)  31; 15; 7; 3
         (iii) 15; 10; 6; 3
N=2^6    (i)   125; 61; 39; 25; 15
         (ii)  63; 31; 15; 7; 3
         (iii) 21; 15; 10; 6; 3
N=2^7    (i)   253; 125; 81; 57; 41; 23
         (ii)  127; 63; 31; 15; 7; 3
         (iii) 28; 21; 15; 10; 6; 3
N=2^8    (i)   509; 253; 167; 121; 91; 65; 39
         (ii)  255; 127; 63; 31; 15; 7; 3
         (iii) 36; 28; 21; 15; 10; 6; 3
N=2^9    (i)   1021; 509; 437; 249; 193; 151; 111; 65
         (ii)  511; 255; 127; 63; 31; 15; 7; 3
         (iii) 45; 36; 28; 21; 15; 10; 6; 3
N=2^10   (i)   2045; 1021; 679; 505; 399; 321; 257; 193; 115
         (ii)  1023; 511; 255; 127; 63; 31; 15; 7; 3
         (iii) 55; 45; 36; 28; 21; 15; 10; 6; 3
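Rows (ii) and (iii) of Table 2 can be regenerated directly from eq.4 and eq.12 (no closed form is given in the paper for row (i), so it is not reproduced); a short sketch of ours:

    # Depth rows (ii) and (iii) of Table 2, for N = 2^k and n = 2^l:
    for k in range(2, 11):
        prop = [2 * 2**(k - l) - 1 for l in range(1, k)]                  # eq.4
        enh = [((k - l)**2 + 3 * (k - l) + 2) // 2 for l in range(1, k)]  # eq.12
        print(f"N=2^{k}: (ii) {prop} (iii) {enh}")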

3.5 Results

For a better understanding of what we have accomplished so far, we have computed in Table 1 the size, and in Table 2 the depth, of an equivalent NN replacing one N-fan-in TG by n-fan-in TGs. We have included the values for Boolean decomposition, for the proposed decomposition algorithm, and for the enhanced decomposition algorithm.

4. USING SORTING NETWORKS

Another algorithm for reducing the fan-in of symmetric functions can be derived if we start from the basic definition of a symmetric function: "a function which depends only on the sum of its input variables" (the function is invariant to permutations of its input variables). It is well known that the evaluation of a symmetric function can be reduced to comparing the sum of the input variables with some constants[17,24,28]. It is thus clear that the basic operation is the SUM! There are many different ways to compute a sum; out of these we should mention the ones summing with TGs[1,30,31]. But all of them use unbounded fan-in TGs. An alternate solution is to sort the inputs and detect the position in the sorted output string where zeros switch to ones[12]. This position is in fact equal to the sum of the inputs. Two separate blocks are needed: one to sort the inputs, and one to detect (search) the position of the 0→1 transition. That is why we will call it a "sort-and-search" algorithm (Fig.7).

A lot of work has been devoted to sorting algorithms[18], as well as to parallel sorting networks and their possible VLSI implementation[20,36]. The classic odd-even merge algorithm[18] can easily be realized out of two-element sorting cells (the primitive of all sorting networks), leading to (see Fig.8):

    NG(k,1) = 2^k·(2^k − 1)/2 cells = N·(N − 1) TGs = O(N²)                    (16)
    NL(k,1) = N = O(N).                                                        (17)

While the depth of this sorting network is linear, the size has been reduced to polynomial. Fortunately, other sorting algorithms can be implemented more efficiently as sorting networks: Batcher's odd-even mergesort and other butterfly implementations of odd-even mergesort, Batcher's bitonic merge, or the balanced sorting network[25] (Fig.9) have:

    NG(k,1) = O(N·lg²N)                                                        (18)
    NL(k,1) = O(lg²N).                                                         (19)

We should mention that there are even NL(k,1) = O(lgN) algorithms, like Ajtai-Komlos-Szemeredi (AKS), but their large multiplying constant does not make them of any practical use (N should be of the order of 10^100 for these algorithms to really become interesting). Lower constants for O(lgN) time algorithms are Leighton's columnsort[20] (which uses AKS) and Bilardi-Preparata bitonic sorting on a mesh of CCC (cube-connected cycles). But they are still too complicated. The interested reader should consult [37,38] for an overview of several algorithms.
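Behaviourally, the sort-and-search scheme of Fig.7 is a sort followed by a lookup on the 0→1 transition index; a small Python sketch (ours; table-driven rather than gate-level, with illustrative names):

    def sort_and_search(bits, table):
        # table[s] is the value of the symmetric function when exactly s inputs are 1
        n = len(bits)
        s = sorted(bits)                      # sorting block: 00...011...1
        # searching block: the 0 -> 1 transition position gives the number of 1s
        ones = next((n - j for j, b in enumerate(s) if b == 1), 0)
        return table[ones]

    # Example: MAJORITY of 8 inputs (1 when at least 4 of them are 1)
    maj8 = [int(s >= 4) for s in range(9)]
    print(sort_and_search([1, 0, 1, 1, 0, 0, 1, 0], maj8))    # -> 1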
[Figure 7 appears here: the inputs x1…xN enter a SORTING NETWORK; its sorted outputs x′1 ≤ x′2 ≤ … ≤ x′N feed a SEARCHING TREE computing F.]

Figure 7. Computing a symmetric function with the "sort-and-search" algorithm.

Figure 8. Sorting network based on "odd-even merge" (N = 8, n = 2).
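One standard network with exactly the cell and layer counts of eq.16 and eq.17 is odd-even transposition sort; the sketch below (our illustration; on 0/1 inputs each two-element sorting cell outputs (min, max), i.e. one AND and one OR threshold gate) simulates it and counts cells and layers:

    def odd_even_transposition(bits):
        # N layers of adjacent compare-and-swap cells ("brick wall")
        a, n, cells = list(bits), len(bits), 0
        for layer in range(n):                    # depth = N  (eq.17)
            for i in range(layer % 2, n - 1, 2):
                cells += 1                        # one two-element sorting cell
                if a[i] > a[i + 1]:
                    a[i], a[i + 1] = a[i + 1], a[i]
        return a, cells

    out, cells = odd_even_transposition([1, 0, 1, 1, 0, 0, 1, 0])
    print(out, cells, 2 * cells)   # N(N-1)/2 = 28 cells -> 56 = N(N-1) TGs (eq.16)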

[Figure 9 appears here.]

Figure 9. Shuffle-exchange sorting network[25] (N = 8, n = 2).

Also, from the VLSI point of view, two extremes are: minimum area[29], Θ(2·lgN), which can be realized by two counters (one for zeros and one for ones); and minimum delay[36], which goes down to just 2 steps but requires a highly connected O(N²) array of binary neurons (TGs).

As the search tree for the sorted sequence of bits can easily be implemented in size O(N) and depth O(lgN), it becomes clear that networks of 2-fan-in TGs of size O(N·lg²N) and depth O(lg²N) can be built. In particular, for a majority function the search tree is just one TG. For N = 2^10 and n = 2^1 we have:

    NG(10,1) = (2^10/2) · (10 + 10²)/2 = 28160
    NL(10,1) = (10 + 10²)/2 = 55.

As can be seen, this solution, while having the same delay as the previous one (55 layers), drastically reduces the number of 2-fan-in TGs (28160 instead of 1.8E16 for the example we have taken).

5. CONCLUSIONS

Two algorithms for fan-in reduction have been introduced and analyzed. Both of them improve on the known ones with respect to size and depth complexity. The first one has superpolynomial size complexity, and we have shown how to reduce the size by a factor of at least 2.16 and up to more than 8, while the depth complexity has been decreased from linear to squared logarithmic.

A better result is suggested by a second algorithm, which decreases the size to polynomial. The depth of the second algorithm is also squared logarithmic. A further decrease to logarithmic depth is possible, but it is precluded by the fact that the size in this case, while still polynomial (the size complexity is also decreased), will have a large constant, making the solution impracticable.

To VLSI designers these results should be interesting, as they directly relate to the area (A ≈ size) and the delay (T ≈ depth) of an integrated circuit, used to estimate its area-time (cost-performance) efficiency: AT².

The algorithms can be used locally (one neuron at a time), or starting from the logical function to be implemented (the function can be extracted from an already trained NN).

Recently an algorithm for reducing the fan-in of BFs belonging to the F_{N,m} class has been proposed[8]. This is the class of BFs of N variables that have exactly m groups of ones. It has linear size and logarithmic depth, thus improving even on the second algorithm suggested in this paper. Still, it is worth mentioning that the F_{N,m} class algorithm cannot be used for any other BFs, while the algorithms for decomposing symmetric functions can be applied to any BF, with the penalty induced by replicating the variables.

REFERENCES

[1] N. Alon and J. Bruck, Explicit Construction of Depth-2 Majority Circuits for Comparison and Addition, Res. Rep. RJ 8300 (75661), IBM Almaden, San Jose, CA, 8/15/91.
[2] C. Alippi, Weight Representation and Network Complexity Reductions in the Digital VLSI Implementation of Neural Nets, Res. Note RN/91/22, Dept. CS, Univ. College London, February 1991.
[3] C. Alippi and M. Nigri, Hardware Requirements for Digital VLSI Implementation of Neural Networks, in Proc. of IJCNN'91 (Singapore), IEEE Press, 1873, 1991.
[4] T. Baker and D. Hammerstrom, Modifications to Artificial Neural Network Models for Digital Hardware Implementation, Tech. Rep. CS/E 88-035, Dept. CS&E, Oregon Graduate Center, 1988.
[5] V. Beiu, J.A. Peperstraete and R. Lauwereins, Using Threshold Gates to Implement Sigmoid Nonlinearity, in Proc. of ICANN'92 (Brighton), Elsevier Science Publishers, vol. 2, 1447, 1992.
[6] V. Beiu, J.A. Peperstraete and R. Lauwereins, Algorithms for Fan-In Reduction, in Proc. of IJCNN'92 (Beijing), IEEE and PHEI Press, vol. 3, 203, 1992.
[7] V. Beiu, J.A. Peperstraete and R. Lauwereins, Simpler Neural Networks by Fan-In Reduction, in Proc. of NeuroNimes'92 (Nimes), EC2, 589, 1992.
[8] V. Beiu, J.A. Peperstraete, J. Vandewalle and R. Lauwereins, Efficient Decomposition of Comparison and Its Applications, in Proc. of the European Symposium on Artificial Neural Networks (Brussels), D facto, 45, 1993.
[9] N.N. Biswas, T.V.M.K. Murthy and M. Chandrasekhar, IMS Algorithm for Learning Representations in Boolean Neural Networks, in Proc. of IJCNN'91 (Singapore), IEEE Press, 1123, 1991.
[10] N.N. Biswas and R. Kumar, A New Algorithm for Learning Representations in Boolean Neural Networks, Current Science, 59(1990), 12, 595.
[11] J. Bruck, Harmonic Analysis of Polynomial Threshold Functions, SIAM J. on Disc. Math., 3(1990), 2, 168.
[12] J. Bruck, personal communication, 1992.
[13] J. Bruck and R. Smolensky, Polynomial Threshold Functions, AC0 Functions and Spectral Norms, SIAM J. on Comput., 21(1992), 1, 33.
[14] H.P. Graf, E. Sackinger, B. Boser and L.D. Jackel, Recent Developments of Electronic Neural Nets in the USA and Canada, in Proc. of MicroNeuro'91 (Münich), Kyrill&Method Verlag, 471, 1991.
[15] T. Hofmeister, W. Hohberg and S. Köhling, Some Notes on Threshold Circuits, and Multiplication in Depth 4, preprint, June 1990.
[16] Y. Hirai, Hardware Implementation of Neural Networks in Japan, in Proc. of MicroNeuro'91 (Münich), Kyrill&Method Verlag, 435, 1991.
[17] S.T. Hu, Threshold Logic, Univ. of California Press, Berkeley, 1965.
[18] D.E. Knuth, The Art of Computer Programming, Volume 3: Sorting and Searching, Addison-Wesley, Reading, 1973.
[19] R. Lauwereins and J. Bruck, Efficient Implementation of a Neural Multiplier, in Proc. of MicroNeuro'91 (Münich), Kyrill&Method Verlag, 217, 1991.
[20] T. Leighton, Tight Bounds on the Complexity of Parallel Sorting, IEEE Trans. on Comp., C-34(1985), 4, 344.
[21] E. Mayoraz, On the Power of Networks of Majority Functions, in A. Prieto (ed.), Lecture Notes in Computer Science 540, Proc. of IWANN'91 (Grenade), Springer-Verlag, 78, 1991.
[22] E. Mayoraz, Representation of Boolean Functions with Democratic Networks, preprint, July 1992.
[23] R.C. Minnick, Linear-Input Logic, IRE Trans. on Electr. Comp., EC-10(1961), 3, 6.
[24] S. Muroga, Threshold Logic and Its Applications, John Wiley & Sons, New York, 1971.
[25] L. Rudolph, A Robust Sorting Network, IEEE Trans. on Comp., C-34(1985), 4, 326.
[26] N.P. Red'kin, Synthesis of Threshold Circuits for Certain Classes of Boolean Functions, Cybernetics (translation of Kibernetika), 6(1973), 5, 540.
[27] A. Sarje and N.N. Biswas, Testing Threshold Functions Using Implied Minterm Structure, Int. J. Systems Sci., 14(1983), 5, 497.
[28] C.L. Sheng, Threshold Logic, Academic Press, New York, 1969.
[29] A.R. Siegel, Minimum Storage Sorting Networks, IEEE Trans. on Comp., C-34(1985), 4, 355.
[30] K.-Y. Siu and J. Bruck, Neural Computation of Arithmetic Functions, Proc. of IEEE, 78(1990), 10, 1669.
[31] K.-Y. Siu and J. Bruck, On the Power of Threshold Circuits with Small Weights, SIAM J. on Disc. Math., 4(1991), 3, 423.
[32] K.-Y. Siu and J. Bruck, On the Dynamic Range of Linear Threshold Elements, SIAM J. on Disc. Math., to appear.
[33] K.-Y. Siu, J. Bruck and T. Kailath, Depth Efficient Neural Networks for Division and Related Problems, Res. Rep. RJ 7946 (72929), IBM Almaden, San Jose, CA, 01/25/91.
[34] K.-Y. Siu, V. Roychowdhury and T. Kailath, Computing with Almost Optimal Size Threshold Circuits, Tech. Rep., Information System Lab., Stanford Univ., June 12, 1990.
[35] K.-Y. Siu, V. Roychowdhury and T. Kailath, Depth-Size Tradeoffs for Neural Computation, IEEE Trans. on Comp., C-40(1991), 12, 1402.
[36] Y. Takefuji and K.-C. Lee, A Super-Parallel Sorting Algorithm Based on Neural Networks, IEEE Trans. on Comp., C-37(1990), 11, 1425.
[37] C.D. Thompson, The VLSI Complexity of Sorting, IEEE Trans. on Comp., C-32(1983), 12, 1171.
[38] L.E. Winslow and Y.-C. Chow, The Analysis and Design of Some New Sorting Machines, IEEE Trans. on Comp., C-32(1983), 7, 677.

