This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

Scribd Selects Books

Hand-picked favorites from

our editors

our editors

Scribd Selects Audiobooks

Hand-picked favorites from

our editors

our editors

Scribd Selects Comics

Hand-picked favorites from

our editors

our editors

Scribd Selects Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

P. 1

Taxonomy Adders|Views: 38|Likes: 2

Published by nettigerman

See more

See less

https://www.scribd.com/doc/44077710/Taxonomy-Adders

03/10/2011

text

original

David Harris

Harvey Mudd College / Sun Microsystems Laboratories 301 E. Twelfth St. Claremont, CA 91711 David_Harris@hmc.edu

Abstract - Parallel prefix networks are widely used in highperformance adders. Networks in the literature represent tradeoffs between number of logic levels, fanout, and wiring tracks. This paper presents a three-dimensional taxonomy that not only describes the tradeoffs in existing parallel pre fix networks but also points to a family of new networks. Adders using these networks are compared using the method of logical effort. The new architecture is competitive in latency and area for some technologies.

A4 B4 A3 B3 A2 B2 A1 B1 C in

Precomputation G4 P4 G3 P3 G2 P2 G1 P1 G0 P0

Prefix Network G3:0 C3 G2:0 C2 G1:0 C1 G0:0 C0 Postcomputation

I. INTRODUCTION A parallel prefix circuit computes N outputs {YN, …, Y1} from N inputs {XN, …, X1} using an arbitrary associative two-input operator o as follows [13]

Y1 = X 1 Y2 = X 2 o X 1

C4 Cout S4 S3 S2 S1

Fig. 1. Prefix computation: 4-bit adder

Y3 = X 3 o X 2 o X 1 M YN = X N o X N -1 oLo X 2 o X 1

(1)

Common prefix computations include addition, incrementation, priority encoding, etc. Most prefix computations precompute intermediate variables {ZN:N, …, Z1:1 } from the inputs. The prefix network combines these intermediate variables to form the prefixes {ZN:1 , …, Z1:1 }. The outputs are postcomputed from the inputs and prefixes. For example, adders take inputs {AN, …, A1}, {BN, …, B1} and Cin and produce a sum output {SN, …, S1} using intermediate generate (G) and propagate (P) prefix signals. The addition logic consists of the following calculations and is shown in Fig. 1. Gi:i = Ai g Bi G0:0 = Cin • Precomputation: ; (2) Pi:i = Ai ¯ Bi P0:0 = 0 • • Prefix: Postcomputation:

Gi: j = Gi: k + Pi: k gGk -1: j Pi: j = Pi: k g P -1: j k S i = Pi ¯ Gi -1:0

(3) (4)

There are many ways to perform the prefix computation. For example, serial-prefix structures like ripple carry adders are compact but have a latency O(N). Single-level carry lookahead structures reduce the latency by a constant factor. Parallel prefix circuits use a tree network to reduce latency to

O(log N) and are widely used in fast adders, priority encoders [ and other prefix computations. This paper 3], focuses on valency-2 prefix operations (i.e. those that use 2-input associative operators), but the results readily generalize to higher valency [1]. Many parallel prefix networks have been described in the literature, especially in the context of addition. The classic networks include Brent-Kung [2], Sklansky [11], and Kogge-Stone [8]. An ideal prefix network would have log2N stages of logic, a fanout never exceeding 2 at each stage, and no more than one horizontal track of wire at each stage. The classic architectures deviate from ideal with 2log2N stages, fanout of N/2+1, and N/2 horizontal tracks, respectively. The Han-Carlson family of networks [5] offer tradeoffs in stages and wiring between Brent-Kung and Kogge-Stone. The Knowles family [7] similarly offers tradeoffs in fanout and wiring between Sklansky and KoggeStone and the Ladner-Fischer family [10] offers tradeoffs between fanout and stages between Sklansky and BrentKung. The Kowalczuk, Tudor, and Mlynek prefix network [9] has also been proposed, but this network is serialized in the middle and hence not as fast for wide adders. This paper develops a taxonomy of parallel prefix networks based on stages, fanout, and wiring tracks. The area of a datapath layout is the product of the number of rows and columns in the network. The latency strongly depends on fanout and wiring capacitance, not just number of logic levels. Therefore, the latency is evaluated using the method of logical effort [12, 6]. The taxonomy suggests new families of networks with different tradeoffs. One of these networks has area comparable with the smallest

0-7803-8104-1/03/$17.00 ©2003 IEEE

2213

0) network saves one 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0 15:12 11:8 7:4 3:0 15:8 13:8 7:0 5:0 15:8 13:0 11:0 9:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 Fig. The Brent-Kung (L-1. in many adder cells. and the Pi:0 signal may be discarded. l. The Ladner-Fischer (L-2. P ARALLEL PREFIX NETWORKS Parallel prefix netwo rks are distinguished by the arrangement of prefix cells. White buffers are used to reduce the loading of later non-critical stages on the critical path. Black cells perform the full prefix operation. saving area and wire length. only part of the intermediate variable is required. Fig. 2. Observe that the Brent-Kung and Han-Carlson never have more than one black or gray cell in each pair of bits on any given row. and t are integers in the range [0. Section III develops the taxonomy. fanout. and wiring tracks.1. This suggests that the datapath layout may use half as many columns. black cells. In the middle. Such gray cells have lower input capacitance. Parallel prefix networks 2214 .1. f. The actual logic levels.0. II. Section II reviews the parallel prefix networks in the literature.t) corresponding to the number of logic levels. The critical path is indicated with a heavy line. For an Nbit parallel prefix structure with L = log2N. fanout. and horizontal wiring tracks. suggesting an inherent tradeoff between logic levels. fanout. The upper box performs the precomputation and the lower box performs the postcomputation. The prefix graphs illustrate the tradeoffs in each network between number of logic levels. which reveals a new family of prefix networks. The span of bits covered by each cell output appears near the output. In certain cases. III. For example. Huang and Ercegovac [4] showed that networks with large number of wiring tracks increase the wiring capacitance because the tracks are packed on a tight pitch to achieve reasonable area.L-1. Sklansky (0.0. All three of these tradeoffs impact latency. and wiring tracks are annotated along each axis in parentheses. as given in EQ (3).L-1) networks occupy vertices.known network and latency comparable with the fastest known network.0). gray cells. and Kogge-Stone (0. TAXONOMY Parallel prefix structures may be classified with a three-dimensional taxonomy (l. and wiring tracks.f.0).1. Performance comparison appears in Section IV and Section V concludes the paper.1] 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 14:13 13:12 12:11 11:10 10:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2:1 1:0 15:12 14:11 13:10 12:9 11:8 10:7 9:6 8:5 7:4 6:3 5:2 4:1 3:0 2:0 • Logic Levels: Fanout: Wiring Tracks: L+l 2 +1 2t f 15:8 14:7 13:6 12:5 11:4 10:3 9:2 8:1 7:0 6:0 5:0 4:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 (f) Ladner-Fischer This taxonomy is illustrated in Fig. 3 for N=16. L-1] indicating: • • (a) Brent-Kung 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 15:12 15:8 13:12 11:10 11:8 9:8 7:6 7:4 7:0 5:4 3:2 3:0 1:0 11:0 13:0 9:0 5:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 (b) Sklansky 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 15:12 14:12 15:8 14:8 13:12 11:10 11:8 10:8 9:8 7:6 7:4 6:4 5:4 3:2 3:0 2:0 1:0 13:8 12:8 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 (c) Kogge-Stone 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 14:13 13:12 12:11 11:10 10:9 9:8 8:7 7:6 6:5 5:4 4:3 3:2 2:1 1:0 15:12 14:11 13:10 12:9 11:8 10:7 9:6 8:5 7:4 6:3 5:2 4:1 3:0 2:0 15:8 14:7 13:6 12:5 11:4 10:3 9:2 8:1 7:0 6:0 5:0 4:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 (d) Han-Carlson 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0 15:12 13:10 11:8 9:6 7:4 5:2 3:0 15:8 13:6 11:4 9:2 7:0 5:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 (e) Knowles [2. fanout. only the Gi:0 signal is required. 2 shows six such networks for N=16. and white buffers comprise the prefix network. The parallel prefix networks from the previous section all fall on the plane l + f + t = L1.

S. t) with l + t = L-1. vol. R. vol. 2003. Sutherland.L-2) network reduces the wiring tracks of the Kogge-Stone network by nearly a factor of two at the expense of an extra level of logic. Han and D.” Proc. f. Logical Effort. Similarly. In general. 786-793. pp.. pp. March 1982. Networks with l > 0 are sparse and require half as many columns of cells. Computers. Electronic Computers. 37th Asilomar Conf. Sept. Reasonable estimates from a trial layout in a 180 nm process are w = 0.” IEEE J. 1. R. 218-225. vol. 226231. f. S. no. June 2001. Z. 101-104. t > 0. Sproull.2. the new networks would include (1.1] network represents the Sklansky and Kogge-Stone extremes and [2. June 1960. and D. Comp. J. 1991. a (0. 34th Asilomar Conf. Aug.1. pp. Systems. June 2001. ACM. 1. 15th IEEE Symp. Harris. pp. the ratio of wire capacitance per column traversed to input capacitance of a unit inverter. Wang. 4 5 6 7 8 9 10 11 12 13 2215 . 1999.22. pp.. Fig. circuit family. Sutherland. Stone. 15th IEEE Symp. this drive is arbitrary and generally greater than minimum. The taxonomy captures the networks used in the parallel prefix adders described in the literature. Kowalczuk. no. In general. 0) with l + f = L-1. Kogge and H. Han and Carlson describe a family of networks along the diagonal (l. The method of logical effort is used to estimate the latency adders built with each prefix network. f. and wire capacitance. Computers.” Proc. For example. 49-56. and Computers.1). 63-76. 831-838. 1987. 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 15:14 13:12 11:10 9:8 7:6 5:4 3:2 1:0 15:12 13:10 11:8 9:6 7:4 5:2 3:0 15:8 13:6 11:4 9:2 7:0 5:0 15:0 14:0 13:0 12:0 11:0 10:0 9:0 8:0 7:0 6:0 5:0 4:0 3:0 2:0 1:0 0:0 Fig. R. D.” IEEE Trans. “Parallel prefix computation. Brent and H. 1]. and the wire capacitance. The delay depends on the number of logic levels. Fischer. The new architecture appears to have competitive latency in many cases. 8.1. C-31. “Design of highperformance CMOS priority encoders and incrementer/decrementers using multilevel lookahead and multilevel folding techniques. Swiss Federal Institute of Technology. EC-9. Carlson. following the assumptions made in [6]. the fanout. 277-281. 2e. 37. t) with f + t = L-1 and Ladner-Fischer occupy the diagonal (l.1. 1713-1717. Arith. and wiring tracks. C.0. I.1). Solid-State Circuits. “A regular layout for parallel adders. 4. which is the Knowles network closest to the diagonal. Arith. Comp. Jan. Ladner and M. Mlynek. t) network corresponds to the Knowles network [2 f. 1980. Lim. “A family of adders. Arith. “Conditional-sum addition logic. The taxonomy suggests yet another family of parallel prefix networks found inside the cube with l. the [8. It also suggests a new family of parallel prefix networks inside the cube. (1. Signals.5 for widely spaced tracks and w = 1 for networks with a large number of tightly spaced wiring tracks. All cells are designed to have the same drive capability. Tables 2-4 shows how the latency depends on adder size. CONCLUSION This paper has presented a three-dimensional taxonomy of parallel prefix networks showing the tradeoffs between number of stages.” IRE Trans. T. Binary Adder Architectures for Cell-Based VLSI and their Synthesis. the Knowles family of networks occupy the diagonal (0. Knowles. Huang and M. vol. and Y. ETH Dissertation 12480. San Francisco: Morgan Kaufmann. and Computers. 2. no.1) network.1] and [1. P. vol.1.” J... C. 0.1. and D. Huang. “Fast area-efficient VLSI adders.” Proc. “A parallel algorithm for the efficient solution of a general class of recurrence relations. “Logical effort analysis of carry propagate adders. Sklansky. Zimmermann. 2002. vol. Beaumont-Smith and C. J. “Effect of wire delay on the design of prefix adders in deep submicron technology. The HanCarlson (1. 260-264.4. 4. REFERENCES 1 A.” Proc. 2f-1. pp. pp. 27. RESULTS Table 1 compares the parallel prefix networks under consideration. and (2. 3.” Proc. Ercegovac. pp. Huang. “Parallel prefix adder design. Signals.2. V. New (1. fanout.1. 2000. Knowles networks are described by L integers specifying the fanout at each stage. European Solid-State Circuits Conf. “A new architecture for an automatic generation of fast pipeline adders. Kung. 1997.1. Comp.level of logic at the expense of greater fanout. …. The wire capacitance depends on layout and process and can be expressed by w. Tudor. no. Oct.” IEEE Trans. IV.1] was shown in Fig. f. J. 1973.2). pp. 8th Symp. For N=32.” Proc. Harris and I.1) parallel prefix network 2 3 R.1. Systems. 4 shows such a (1. pp.

1.1] 1 (2) HanCarlson Knowles [2.1.1) HanCarlson Knowles [4.2. 3.1] 2 (4) Kogge3 (8) Stone t (Wire Tracks) Fig.l (Logic Levels) LadnerFischer 2 (6) 1 (5) 1 (3) 0 (4) 0 (1) BrentKung 3 (7) f (Fanout) LadnerFischer Sklansky 3 (9) 2 (5) 0 (2) New (1. Taxonomy of prefix graphs 2216 .1.1.

5 16.1 10.4 31.0 14.1 / / / / / 24.2 / / / / / / / 13.3 23. L-1.1 38.….1 15.0 9.7 15.1 17.1.0 12.8 / / / / / / / 21. Adder delays: inverting static CMOS.4 18. L-2.1.1 / / / / / / / 15.4 41.5 8.1] Ladner-Fischer (1.4 10.5 Table 2.1 12.5 13.9 14. 1.75 14.9 8.3 14.2 14.6 w = 0.0 16.4 10.9 8.2 / 10.3 23.9 12. 0) (0.4 13.9 8.6 w=1 15.3 19.5 9.8 15.2 14.5 14.4 12.9 12.6 12.8 / / / / / / / 20.1 12.7 21.8 9.1 10. Adder delays: w=0.1] Ladner-Fischer (1.2 Table 4.6 15.3 / / / / / 17.25 12.6 / / / / / / / 22.7 7.4 Footless Domino 10.4 7.3 13.8 7.6 11.1 12.8 23.5 9.7 44. 1) 10.4 10.2 17. 1.1.9 11.7 10.7 / / / / / 13.6 / 8.6 13.9 8.5 14.4 18.5 N = 128 24.9 10.1 16.3 10.5 20.2 21.1 12.9 w = 0.3 16.9 / / / / / / / 18.7 15.3 11.4 18.4 13.2 17.7 15.8 15.4 25.4 9.8 23.1] Ladner-Fischer (1.8 24.9 Noninverting Static CMOS 16.0 13.1] Ladner-Fischer (1.7 20.8 23.7 11.5 13.5 13.0 15.6 27. inverting static CMOS / footed domino Inverting Static CMOS Brent-Kung Sklansky Kogge-Stone Han-Carlson Knowles [2.1 38.7 21.9 70. L-2) (1.3 20.8 19.7 / 8.3 18.5 15.0 19.6 12.2 12. 1. 1.1 21.5 12.0 12.2 22.8 10.3 10.Architecture Brent-Kung Sklansky Kogge-Stone Han-Carlson Knowles [2.4 40.9 13. 0.4 / 20.9 11.8 / 14.3 13.7 / / / / / 9.6 12.0 20.0 13.9 Footed Domino 13.9 N = 32 13.4 20.7 15.5. 1) Classification (L-1.0 15. 0.4 12.5 / 16. N = 32/64 2217 . L-1) (1. 0) (0.9 / 9.4 10. 0) (1. 1.5 23.3 23.8 / / / / / / / 17. 1.8 16. N = 32/64 w=0 Brent-Kung Sklans ky Kogge-Stone Han-Carlson Knowles [2.0 14.9 9.7 35.8 12.0 9. 1) 13.9 / / / / / / / 18.1 38.1.1 17.1 16.1 17.…. L-2) (0.8 16.1 13.2 17.3 w = 0.8 14. L-3) Logic Levels L + (L – 1) L L L+1 L L+1 L+1 Max Fanout 2 N/2 + 1 2 2 3 N/4 + 1 3 Track s 1 1 N/2 N/4 N/4 1 N/8 Col s N/2 N N N/2 N N/2 N/2 Table 1.3 N = 64 18.7 10.1 12.5.3 / / / / / / / 14.3 14.0 9.4 12.3 12.6 Table 3.6 14.7 21.7 25.4 17.9 24.4 12.….9 12.7 7. Adder delays: w=0.4 18.6 12. 0.2 28.4 10.9 / 12.2 12. Comparison of parallel prefix network architectures N = 16 Brent-Kung Sklansky Kogge-Stone Han-Carlson Knowles [2.….0 15. 1) 11.4 18.

- Read and print without ads
- Download to keep your version
- Edit, email or read offline

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

CANCEL

OK

You've been reading!

NO, THANKS

OK

scribd