1. Introduction
The design of high-speed data-path logic systems is one of the most substantial research areas in VLSI system design. High-speed addition and multiplication have always been fundamental requirements of high-performance processors and systems. The major speed limitation in any adder is the generation of carries, and many authors have considered the addition problem [1]-[4]. The basic idea of the proposed work is to use n-bit binary to excess-1 code converters (BEC) to improve the speed of addition. The structure and function of the BEC are discussed in Section 2. This logic can be combined with any type of adder to further improve its speed. The proposed 16-, 32- and 64-bit adders are compared with conventional fast adders, namely the carry save adder (CSA) and the carry look-ahead adder (CLA). This paper realizes the improved performance of the CSA with BEC logic through custom design and layout [5]-[6]. In a parallel multiplier, the final-stage carry propagate adder (CPA) is a dominant component of the delay [7]-[8]. Signals from the partial-product summation tree do not arrive at the final CPA at the same time, because the number of partial-product bits is larger in the middle of the tree. Because of these uneven arrival times, the choice of final adder is an important design decision in parallel multipliers [9]. Reducing the carry propagation delay therefore substantially improves the speed of both the adder and the multiplier [10].

This paper is structured as follows. Section 2 gives an overview of the 4-bit binary to excess-1 logic. Section 3 describes the proposed modified carry save adder (MCSA) architecture. Section 4 presents the ASIC implementation details of the various adders.
2. BEC
The structure of the 4-bit BEC and its truth table are shown in Figure 1 and Table 1, respectively. Figure 2 shows how fast addition is achieved by combining the BEC with a multiplexer (mux): one input of the 8:4 mux receives (B3, B2, B1, B0) directly, and the other receives the BEC output. The two possible partial results are thus produced in parallel, and the mux selects either the BEC output or the direct inputs according to the control signal Cin. The Boolean expressions of the 4-bit BEC are listed below, where an overbar denotes NOT, $\cdot$ denotes AND and $\oplus$ denotes XOR:

$$X_0 = \overline{B_0} \tag{1}$$

$$X_1 = B_0 \oplus B_1 \tag{2}$$

$$X_2 = B_2 \oplus (B_0 \cdot B_1) \tag{3}$$

$$X_3 = B_3 \oplus (B_0 \cdot B_1 \cdot B_2) \tag{4}$$
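As a quick functional check of Eqs. (1)-(4), the sketch below models the 4-bit BEC and the 8:4 mux bit by bit in Python. It is a behavioural model under our own assumptions, not the authors' Verilog, and the function names (`bec4`, `mux_8to4`, `bec_select`, `to_bits`, `from_bits`) are hypothetical. Running it over all 16 inputs with Cin = 1 reproduces Table 1, while Cin = 0 passes B through unchanged, so the pair behaves like a 4-bit incrementer controlled by Cin.

```python
from typing import Tuple

Bits = Tuple[int, int, int, int]  # (B3, B2, B1, B0), most significant bit first


def bec4(b3: int, b2: int, b1: int, b0: int) -> Bits:
    """4-bit binary to excess-1 converter written directly from Eqs. (1)-(4)."""
    x0 = 1 - b0               # (1) X0 = NOT B0
    x1 = b0 ^ b1              # (2) X1 = B0 XOR B1
    x2 = b2 ^ (b0 & b1)       # (3) X2 = B2 XOR (B0 AND B1)
    x3 = b3 ^ (b0 & b1 & b2)  # (4) X3 = B3 XOR (B0 AND B1 AND B2)
    return x3, x2, x1, x0


def mux_8to4(direct: Bits, converted: Bits, cin: int) -> Bits:
    """8:4 mux: passes the direct inputs for cin = 0 and the BEC output for cin = 1."""
    return converted if cin else direct


def to_bits(value: int) -> Bits:
    """Split a 4-bit integer into (B3, B2, B1, B0)."""
    return (value >> 3) & 1, (value >> 2) & 1, (value >> 1) & 1, value & 1


def from_bits(bits: Bits) -> int:
    """Pack (MSB, ..., LSB) bits back into an integer."""
    value = 0
    for bit in bits:
        value = (value << 1) | bit
    return value


def bec_select(value: int, cin: int) -> int:
    """Return (value + cin) mod 16 using the BEC and the mux instead of an adder."""
    direct = to_bits(value)
    return from_bits(mux_8to4(direct, bec4(*direct), cin))


if __name__ == "__main__":
    # Reproduce Table 1: each 4-bit binary input and its excess-1 code.
    for v in range(16):
        print(f"{v:04b} -> {bec_select(v, 1):04b}")
```

Note that the only "carry chain" in the BEC is the AND of the lower input bits, which is why it is cheaper than a second adder computing the Cin = 1 result.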
Table 1: Truth table of the 4-bit binary to excess-1 converter
Binary:   0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111
Excess-1: 0001 0010 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 1101 1110 1111 0000
Figure 1: 4-bit BEC (inputs B3, B2, B1, B0; outputs X3, X2, X1, X0)
3. MCSA
The 16-bit conventional CSA is shown in Figure 3. It has 17 half adders and 15 full adders. Because a ripple carry adder (RCA) is used in its final stage, this structure has a large carry propagation delay. To reduce this delay, the final stage of the CSA is divided into five groups, as shown in Figure 4. The first group contains $1 + \log_2 n$ bits and each of the other groups contains $\log_2 n$ bits, where $n$ is the word size of the adder. The groups are:
i) {c4, s[4:0]}
ii) {c7, x[7:5]}
iii) {c10, x[10:8]}
iv) {c13, x[13:11]}
v) x[17:14]
The first-group outputs s[4:0] are assigned directly to the final output. The second group {c7, x[7:5]} computes its partial result assuming c4 = 0, the third group {c10, x[10:8]} assuming c7 = 0, the fourth group {c13, x[13:11]} assuming c10 = 0, and the fifth group x[17:14] assuming c13 = 0. The actual carry c4 from the first group then drives the second-group mux, which delivers the final result without waiting for the carry to propagate from c4 to c7. In the same way, the final c7 of the second group selects the third-group result (avoiding the propagation from c7 to c10), c10 selects the fourth-group result (c10 to c13), and c13 selects the fifth-group result (c13 to s17).

The main advantage of this logic is that every group computes its partial results in parallel, so the muxes are ready to deliver the final result as soon as each group's carry-in arrives, after only one mux delay. The maximum delay along the carry propagation path is therefore reduced. The same logic is used for the 32- and 64-bit adders to achieve higher speeds.

Table 2 lists the post-layout simulation results of the adders in terms of delay, area and power. The area is the total cell area of the design, and the total power is the sum of the leakage, internal, net and switching power. The results show that the CLA and CSA occupy less area and consume less power than the MCSA. However, the MCSA is significantly faster and has the lowest power-delay product of the three architectures at every word size.
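The group-select behaviour described above can be checked with a small behavioural model. The sketch below is a generic two-operand version under stated assumptions: it adds two 16-bit operands rather than the CSA's internal rows, uses hypothetical group widths (5, 3, 3, 3, 2) chosen to mirror the group boundaries at bits 5, 8, 11 and 14, and all function names are illustrative. The key step is that a w-bit group's result {carry, sum} computed with carry-in 0 is at most 2^(w+1) - 2, so incrementing it with a (w+1)-bit BEC never overflows and yields exactly the carry-in-1 result.

```python
def bec(value: int, width: int) -> int:
    """Binary to excess-1 converter generalised to `width` bits, following the
    pattern of Eqs. (1)-(4): X_i = B_i XOR (B_0 AND ... AND B_{i-1})."""
    result, lower_all_ones = 0, 1  # empty AND is 1, so X_0 = NOT B_0
    for i in range(width):
        bit = (value >> i) & 1
        result |= (bit ^ lower_all_ones) << i
        lower_all_ones &= bit
    return result  # equals (value + 1) mod 2**width


def ripple_add(a: int, b: int, width: int, cin: int = 0) -> int:
    """Bit-level ripple-carry addition; returns {carry_out, sum} as one (width+1)-bit value."""
    carry, total = cin, 0
    for i in range(width):
        x, y = (a >> i) & 1, (b >> i) & 1
        total |= (x ^ y ^ carry) << i
        carry = (x & y) | (carry & (x ^ y))
    return total | (carry << width)


def grouped_bec_add(a: int, b: int, cin: int = 0, widths=(5, 3, 3, 3, 2)) -> int:
    """Add two sum(widths)-bit operands with a grouped, BEC-based select stage.

    Hypothetical model: the default widths give a 16-bit split at bits 5, 8, 11, 14.
    In hardware all groups work in parallel; here they are evaluated in order.
    """
    result, shift, carry = 0, 0, cin
    for k, w in enumerate(widths):
        mask = (1 << w) - 1
        ga, gb = (a >> shift) & mask, (b >> shift) & mask
        if k == 0:
            group = ripple_add(ga, gb, w, carry)   # first group: used directly
        else:
            with_c0 = ripple_add(ga, gb, w)        # {carry, sum} assuming carry-in 0
            with_c1 = bec(with_c0, w + 1)          # BEC gives the carry-in 1 result
            group = with_c1 if carry else with_c0  # mux driven by the real carry
        result |= (group & mask) << shift
        carry = group >> w
        shift += w
    return result | (carry << shift)
```

For example, `grouped_bec_add(0xFFFF, 0, cin=1)` returns `0x10000`: every group's Cin = 1 alternative is selected in turn, yet each selection costs only one mux step once the lower carry is known.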
Figure 3: 16-bit CSA (half adders H, full adders F; inputs a, b and Cin; outputs s0 to s17)
Figure 4: 16-bit MCSA, with the final stage divided into the groups {c4, s[4:0]}, {c7, x[7:5]}, {c10, x[10:8]}, {c13, x[13:11]} and x[17:14]
Table 2: Post-layout simulation results of the 16-, 32- and 64-bit adders
Word size | Adder | Delay (ns) | Area (µm²) | Leakage power (µW) | Switching power (µW) | Total power* (µW) | Power-delay product (×10^-12 J)
16-bit | CSA | 3.6 | 1660 | 0.011 | 207.8 | 415.7 | 1.49
16-bit | CLA | 3.5 | 1118 | 0.008 | 175.5 | 351.0 | 1.22
16-bit | MCSA | 2.3 | 2165 | 0.015 | 254.8 | 529.7 | 1.21
32-bit | CSA | 6.6 | 3363 | 0.023 | 415.4 | 830.9 | 5.48
32-bit | CLA | 6.5 | 2235 | 0.016 | 347.4 | 702.9 | 4.56
32-bit | MCSA | 3.8 | 4737 | 0.032 | 532.9 | 1045.9 | 3.97
64-bit | CSA | 12.7 | 6769 | 0.047 | 821.3 | 1642.6 | 20.86
64-bit | CLA | 12.6 | 4471 | 0.032 | 691.1 | 1382.0 | 17.41
64-bit | MCSA | 6.9 | 9883 | 0.067 | 1049.6 | 2099.2 | 14.48
*Total power = leakage power + internal power + net power + switching power
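As a worked check of the last column of Table 2, each power-delay product (PDP) is the total power multiplied by the delay. Since 1 µW × 1 ns = 10^-15 J, the column is in units of 10^-12 J. For the 64-bit MCSA and CSA:

$$
\mathrm{PDP}_{\mathrm{MCSA}} = 2099.2\,\mu\mathrm{W} \times 6.9\,\mathrm{ns} = 14\,484.48 \times 10^{-15}\,\mathrm{J} \approx 14.48 \times 10^{-12}\,\mathrm{J}
$$

$$
\mathrm{PDP}_{\mathrm{CSA}} = 1642.6\,\mu\mathrm{W} \times 12.7\,\mathrm{ns} = 20\,861.02 \times 10^{-15}\,\mathrm{J} \approx 20.86 \times 10^{-12}\,\mathrm{J}
$$

Relative to the 64-bit CSA, the MCSA draws about 28% more total power but has about 46% less delay, so its power-delay product is about 31% lower.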
4. ASIC Implementation
The proposed designs are described in Verilog HDL and synthesized with Cadence RTL Compiler using the typical libraries of TSMC 0.18 µm technology. The synthesized Verilog netlist and its design constraints file (SDC) are imported into Cadence SoC Encounter, which performs automated standard-cell placement and routing to generate the layout [11]. Parasitic extraction is performed with Encounter's native RC extraction tool. The extracted parasitic RC data (in SPEF format) are back-annotated to the Common Timing Engine of the Encounter platform for static timing analysis. Power analysis is performed with Virtuoso UltraSim [12].
5. Conclusion
This paper proposes a simple approach to improving the speed of addition. In many parallel multipliers, the final addition is sped up by arranging CLAs in the form of a carry select adder (CSLA) [9]. However, the CSLA occupies more chip area because it uses multiple pairs of RCAs (or CLAs) to generate the partial sum and carry for both Cin = 0 and Cin = 1, which makes the final adder structure complex. As demonstrated in this paper, replacing the RCA (or CLA) for Cin = 1 with BEC logic reduces the area and delay of the final adder structure compared with such a CSLA. This BEC logic can be used with any type of adder to enhance the speed of addition. Figure 5 shows the power-delay comparison graphs of the data in Table 2. The comparison shows that the MCSA is faster than the CSA and CLA and has the lowest power-delay product, at the cost of larger area and higher power, which makes it well suited for high-speed VLSI hardware implementation.
Figure 5: Power-delay comparison graphs of CSA, CLA and MCSA versus word size (data from Table 2)
References
[1] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers," in Design of High-Performance Microprocessor Circuits, A. Chandrakasan, Ed., IEEE Press, 2000.
[2] M. J. Flynn and S. F. Oberman, Advanced Computer Arithmetic Design, John Wiley & Sons, 2001.
[3] J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, pp. 226-231, 1960.
[4] O. J. Bedrij, "Carry-Select Adder," IRE Transactions on Electronic Computers, pp. 340-344, 1962.
[5] R. P. Brent and H. T. Kung, "A Regular Layout for Parallel Adders," IEEE Transactions on Computers, vol. C-31, no. 3, pp. 260-264, March 1982.
[6] T. D. Han and D. A. Carlson, "Fast Area-Efficient VLSI Adders," 8th Symposium on Computer Arithmetic, May 1987.
[7] C. S. Wallace, "A Suggestion for a Fast Multiplier," IEEE Transactions on Computers, vol. 13, pp. 14-17, 1964.
[8] L. Dadda, "Some Schemes for Parallel Multipliers," Alta Frequenza, vol. 34, pp. 349-356, 1965.
[9] V. G. Oklobdzija, "Improving Multiplier Design by Using Improved Column Compression Tree and Optimized Final Adder in CMOS Technology," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 3, no. 2, June 1995.
[10] P. F. Stelling, "Design Strategies for Optimal Hybrid Final Adders in Parallel Multiplier," Journal of VLSI Signal Processing, vol. 14, pp. 321-331, 1996.
[11] Encounter™ User Guide, pp. 582-592, February 2006.
[12] Virtuoso UltraSim User Guide, p. 17, June 2004.