

Abstract

Carry Select Adder (CSLA) is one of the fastest adders used in many data-processing processors to perform fast arithmetic functions. From the structure of the CSLA, it is clear that there is scope for reducing its area and power consumption. This work uses a simple and efficient gate-level modification to significantly reduce the area and power of the CSLA. Based on this modification, 8-, 16-, 32-, and 64-b square-root CSLA (SQRT CSLA) architectures have been developed and compared with the regular SQRT CSLA architecture. The proposed design has reduced area and power compared with the regular SQRT CSLA, with only a slight increase in delay. This work evaluates the performance of the proposed designs in terms of delay, area, power, and their products, by hand with logical effort and through custom design and layout in 0.18-μm CMOS process technology. The results show that the proposed CSLA structure is better than the regular SQRT CSLA.

Index Terms: Application-specific integrated circuit (ASIC), area-efficient, CSLA, low power.


Low Power Area Efficient CSLA Dept. of ECE

1. CRITICAL REVIEW

Digital adders are the core block of DSP processors. The final carry-propagation adder (CPA) structure of many adders has a high carry-propagation delay, and this delay reduces the overall performance of the DSP processor. The authors propose a simple and efficient approach to reduce the maximum carry-propagation delay in the final stage. Based on this approach, 16-, 32-, and 64-bit adder architectures have been developed and compared with conventional fast adder architectures. The performance of the proposed designs is assessed in terms of delay, area, and power through custom design and layout in 0.18-μm CMOS process technology [2], [9], [10], [11].

Instead of using dual carry-ripple adders, a carry-select adder scheme that uses an add-one circuit to replace one carry-ripple adder requires 29.2% fewer transistors, with a speed penalty of 5.9%, for bit length n = 64. If speed is crucial for this 64-bit adder, two of the original carry-select adder blocks can be substituted by the proposed scheme with a 6.3% area saving and the same speed [3].

A carry-select adder can be implemented by using a single ripple carry adder and an add-one circuit instead of dual ripple carry adders. A multiplexer-based add-one circuit is proposed to reduce the area with a negligible speed penalty. The proposed 64-bit carry-select adder requires 42% fewer transistors than the conventional carry-select adder [4], [12], [13].

The carry-select method is deemed to be a good compromise between cost and performance in carry-propagation adder design. However, the conventional carry-select adder (CSL) is still area-consuming because of its dual ripple carry adder structure. The excessive area overhead makes the CSL relatively unattractive, but this has been circumvented by the recently introduced add-one circuit. An area-efficient square-root CSL scheme based on a new first-zero detection logic is proposed. The proposed CSL shows a notable improvement in power-delay and area-delay performance through proper exploitation of logic structure and circuit technique. For 64-bit addition, the proposed CSL requires 44% fewer transistors than the conventional one. Simulation results indicate that the proposed CSL can complete a 64-bit addition in 1.50 ns and dissipates only 0.35 mW at 1.8 V in TSMC 0.18-μm CMOS technology [5], [14], [15].


2. INTRODUCTION

VLSI stands for very-large-scale integration, which refers to integrated circuits that contain more than 10^7 transistors. Designing such a circuit is difficult, and the design has to address the usual VLSI design problems: area, speed, power dissipation, design time, and testability.

In digital adders, the speed of addition is limited by the time required to propagate a carry through the adder. The sum for each bit position in an elementary adder is generated sequentially, only after the previous bit position has been summed and a carry propagated into the next position. In earlier years the carry look-ahead adder was used to overcome this delay; it produces all the carries at the same time, but it requires more circuitry. These adders were later replaced by carry select adders (CSA) using dual RCAs, as shown in Fig. 1 [1]. In this scheme the sum is generated for both Cin = 1 and Cin = 0 and, depending on the input carry, one sum is passed as the final sum using a multiplexer. The problem, again, is that it requires more circuitry, because each stage, which adds three bits, needs two full adders. The dual RCAs were therefore replaced by one RCA and one add-one circuit. The same problem remains there, and it is eliminated by the proposed system, a CSLA using BEC.

Fig. 1. Conventional CSA using dual RCAs.

The basic idea of this work is to use a Binary to Excess-1 Converter (BEC) instead of the RCA with Cin = 1 in the regular CSLA to achieve lower area and power consumption [2]–[4]. The main advantage of this BEC logic comes from the smaller number of logic gates compared with the n-bit Full Adder (FA) structure. The details of the BEC logic are discussed in Section 4.

This brief is structured as follows. Section 3 deals with the delay and area evaluation methodology of the basic adder blocks. Section 4 presents the detailed structure and the function of the BEC logic. The SQRT CSLA has been chosen for comparison with the proposed design as it has a more balanced delay and requires lower power and area [5], [6]. The delay and area evaluation methodology of the regular and modified SQRT CSLA are presented in Sections 5 and 6, respectively. The ASIC implementation details and results are analyzed in Section 7. Finally, the work is concluded in Section 8.

3. DELAY AND AREA EVALUATION METHODOLOGY OF THE BASIC ADDER BLOCKS

The AND, OR, and Inverter (AOI) implementation of an XOR gate is shown in Fig. 2. The gates between the dotted lines perform their operations in parallel, and the numeric representation of each gate indicates the delay contributed by that gate. The delay and area evaluation methodology considers all gates to be made up of AND, OR, and Inverter gates, each having delay equal to 1 unit and area equal to 1 unit. We then add up the number of gates in the longest path of a logic block that contributes to the maximum delay. The area evaluation is done by counting the total number of AOI gates required for each logic block. Based on this approach, the CSLA adder blocks of 2:1 mux, Half Adder (HA), and FA are evaluated and listed in Table I.

Fig. 2. Delay and area evaluation of an XOR gate.

TABLE I
DELAY AND AREA COUNT OF THE BASIC BLOCKS OF CSLA

Adder block    Delay    Area
XOR            3        5
2:1 mux        3        4
Half adder     3        6
Full adder     6        13

4. BINARY TO EXCESS-1 CONVERTER

As stated above, the main idea of this work is to use a BEC instead of the RCA with Cin = 1 in order to reduce the area and power consumption of the regular CSLA. To replace the n-bit RCA, an (n+1)-bit BEC is required. The structure and the function table of a 4-b BEC are shown in Fig. 3 and Table II, respectively.
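The substitution described in this section can be checked exhaustively for a small case. The sketch below is only an illustration, not the report's Verilog: it writes the 4-b BEC with the gate-level expressions listed later in this section (X0 = ~B0, X1 = B0 ^ B1, and so on), models a 3-bit RCA behaviourally, and confirms that the BEC applied to the RCA output for Cin = 0 equals the output of a second RCA with Cin = 1. The function names `bec4` and `rca` are hypothetical.

```python
def bec4(b):
    """4-bit Binary to Excess-1 Converter from the gate-level expressions."""
    b0, b1, b2, b3 = b & 1, (b >> 1) & 1, (b >> 2) & 1, (b >> 3) & 1
    x0 = b0 ^ 1                  # X0 = ~B0
    x1 = b0 ^ b1                 # X1 = B0 ^ B1
    x2 = b2 ^ (b0 & b1)          # X2 = B2 ^ (B0 & B1)
    x3 = b3 ^ (b0 & b1 & b2)     # X3 = B3 ^ (B0 & B1 & B2)
    return (x3 << 3) | (x2 << 2) | (x1 << 1) | x0


def rca(a, b, cin, n):
    """Behavioural n-bit ripple carry adder; returns the (n+1)-bit word {carry, sum}."""
    mask = (1 << n) - 1
    return (a & mask) + (b & mask) + cin


# The gate-level BEC adds one (modulo 16) to every 4-bit input.
bec_adds_one = all(bec4(b) == (b + 1) % 16 for b in range(16))

# A 4-bit BEC on the {carry, sum} word of a 3-bit RCA with Cin = 0
# reproduces the {carry, sum} word of a 3-bit RCA with Cin = 1.
bec_replaces_rca = all(
    bec4(rca(a, b, 0, 3)) == rca(a, b, 1, 3)
    for a in range(8)
    for b in range(8)
)

print(bec_adds_one, bec_replaces_rca)
```

The extra bit is what makes the replacement work: the BEC has to process the carry out of the n-bit RCA together with its n sum bits, which is why an n-bit RCA needs an (n+1)-bit BEC.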

Fig. 3. 4-b BEC.

Fig. 4 illustrates how the basic function of the CSLA is obtained by using the 4-bit BEC together with the mux. One input of the 8:4 mux receives (B3, B2, B1, and B0) and the other input of the mux is the BEC output. This produces the two possible partial results in parallel, and the mux is used to select either the BEC output or the direct inputs according to the control signal Cin. The importance of the BEC logic stems from the large silicon area reduction when CSLAs with a large number of bits are designed. The Boolean expressions of the 4-bit BEC are listed below (note the functional symbols: ~ NOT, & AND, ^ XOR):

X0 = ~B0
X1 = B0 ^ B1
X2 = B2 ^ (B0 & B1)
X3 = B3 ^ (B0 & B1 & B2)

In Fig. 4, which shows the 4-bit BEC with the 8:4 multiplexer, one input of the 8:4 mux is the output of the 4-bit BEC and the other input is the output of the 4-bit full adder with input carry equal to zero. The selection line is the carry of the previous stage, which selects one of the inputs as the output: if Cin = 1, the output is the 4-bit BEC output.

Fig. 4. 4-b BEC with 8:4 mux.

TABLE II
FUNCTION TABLE OF THE 4-BIT BEC

B[3:0] (input)    X[3:0] (output)
0000              0001
0001              0010
0010              0011
0011              0100
0100              0101
0101              0110
0110              0111
 :                 :
1101              1110
1110              1111
1111              0000

5. DELAY AND AREA EVALUATION METHODOLOGY OF REGULAR 16-B SQRT CSLA

The structure of the 16-b regular SQRT CSLA is shown in Fig. 5. It has five groups of different size RCA. The delay and area evaluation of each group is shown in Fig. 6, in which the numerals within [] specify the delay values; e.g., sum2 requires 10 gate delays. The steps leading to the evaluation are listed below.

Fig. 5. Regular 16-bit SQRT CSLA.

Fig. 6. Delay and area evaluation of regular SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. F is a Full Adder.

1) The group2 [see Fig. 6(a)] has two sets of 2-b RCA. Based on the consideration of delay values of Table I, the arrival time of selection input c1 [time (t) = 7] of the 6:3 mux is earlier than s3 [t = 8] and later than s2 [t = 6]. Thus, sum3 [t = 11] is the summation of s3 and mux [t = 3], and sum2 [t = 10] is the summation of c1 and mux.

2) Except for group2, the arrival time of the mux selection input is always greater than the arrival time of the data outputs from the RCAs. Thus, the delay of group3 to group5 is determined, respectively, as follows:

{c6, sum[6:4]} = c3 [t = 10] + mux
{c10, sum[10:7]} = c6 [t = 13] + mux
{cout, sum[15:11]} = c10 [t = 16] + mux.

3) One set of 2-b RCA in group2 has 2 FA for Cin = 1 and the other set has 1 FA and 1 HA for Cin = 0. Based on the area count of Table I, the total gate count in group2 is determined as follows:

Gate count = 57 (FA + HA + Mux)
FA = 39 (3 * 13)
HA = 6 (1 * 6)
Mux = 12 (3 * 4).

4) Similarly, the estimated maximum delay and area of the other groups in the regular SQRT CSLA are evaluated and listed in Table III.

TABLE III
DELAY AND AREA COUNT OF REGULAR SQRT CSLA GROUPS

Group     Delay    Area
Group2    11       57
Group3    13       87
Group4    16       117
Group5    19       147

6. DELAY AND AREA EVALUATION METHODOLOGY OF MODIFIED 16-B SQRT CSLA

The structure of the proposed 16-b SQRT CSLA, which uses a BEC in place of the RCA with Cin = 1 to optimize the area and power, is shown in Fig. 7. We again split the structure into five groups. The delay and area estimation of each group is shown in Fig. 8. The steps leading to the evaluation are given below.

Fig. 7. Modified 16-bit SQRT CSLA.
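A behavioural model can make the difference between the regular and the modified structure concrete. The following sketch is an assumption-laden illustration rather than the design itself: the group widths 2, 2, 3, 4, 5 are inferred from the carry labels c1, c3, and c6 used in the text, group1 is assumed to be a plain RCA driven by the external carry, and the names `regular_group`, `modified_group`, and `sqrt_csla` are hypothetical. Adders are modelled arithmetically, not at gate level.

```python
import random

GROUP_WIDTHS = [2, 2, 3, 4, 5]  # assumed split of the 16 bits into group1..group5


def regular_group(ga, gb, carry_in, n):
    """Regular group: two n-bit RCAs (Cin = 0 and Cin = 1) followed by a mux."""
    word0 = ga + gb          # {carry, sum} of the RCA with Cin = 0
    word1 = ga + gb + 1      # {carry, sum} of the RCA with Cin = 1
    return word1 if carry_in else word0


def modified_group(ga, gb, carry_in, n):
    """Modified group: one n-bit RCA (Cin = 0), an (n+1)-bit BEC, and a mux."""
    word0 = ga + gb                               # {carry, sum} of the RCA with Cin = 0
    word1 = (word0 + 1) & ((1 << (n + 1)) - 1)    # BEC: add one within n+1 bits
    return word1 if carry_in else word0


def sqrt_csla(a, b, group, cin=0):
    """Chain the five groups; returns the 17-bit word {carry_out, sum[15:0]}."""
    result, carry, pos = 0, cin, 0
    for idx, n in enumerate(GROUP_WIDTHS):
        mask = (1 << n) - 1
        ga, gb = (a >> pos) & mask, (b >> pos) & mask
        # group1 is modelled as a single RCA fed directly by cin (assumption)
        word = ga + gb + carry if idx == 0 else group(ga, gb, carry, n)
        result |= (word & mask) << pos   # sum bits of this group
        carry = word >> n                # carry that selects the next group's mux
        pos += n
    return result | (carry << 16)


random.seed(1)
samples = [(random.getrandbits(16), random.getrandbits(16)) for _ in range(2000)]
both_correct = all(
    sqrt_csla(a, b, regular_group) == a + b == sqrt_csla(a, b, modified_group)
    for a, b in samples
)
print(both_correct)
```

Both variants produce the same sums; the modification only changes how the Cin = 1 candidate is generated inside each group, which is where the gate savings come from.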

Fig. 8. Delay and area evaluation of modified SQRT CSLA: (a) group2, (b) group3, (c) group4, and (d) group5. H is a Half Adder.

1) The group2 [see Fig. 8(a)] has one 2-b RCA, which has 1 FA and 1 HA for Cin = 0. Instead of another 2-b RCA with Cin = 1, a 3-b BEC is used, which adds one to the output from the 2-b RCA. Based on the consideration of delay values of Table I, the arrival time of selection input c1 [time (t) = 7] of the 6:3 mux is earlier than s3 [t = 9] and c3 [t = 10] and later than s2 [t = 4]. Thus, sum3 and the final c3 (output from the mux) depend on s3 and mux, and on the partial c3 (input to the mux) and mux, respectively. The sum2 depends on c1 and mux.

2) For the remaining groups, the arrival time of the mux selection input is always greater than the arrival time of the data inputs from the BECs. Thus, the delay of the remaining groups depends on the arrival time of the mux selection input and the mux delay.

3) The area count of group2 is determined as follows:

Gate count = 43 (FA + HA + Mux + BEC)
FA = 13 (1 * 13)
HA = 6 (1 * 6)
AND = 1
NOT = 1
XOR = 10 (2 * 5)
Mux = 12 (3 * 4).

4) Similarly, the estimated maximum delay and area of the other groups of the modified SQRT CSLA are evaluated and listed in Table IV.

TABLE IV
DELAY AND AREA COUNT OF MODIFIED SQRT CSLA GROUPS

Group     Delay    Area
Group2    13       43
Group3    16       61
Group4    19       84
Group5    22       107

Comparing Tables III and IV, it is clear that the proposed modified SQRT CSLA saves 113 gate areas compared with the regular SQRT CSLA, with an increase of only 11 gate delays. To further evaluate the performance, we have resorted to ASIC implementation and simulation.
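The hand counts in Sections 5 and 6 can be reproduced with a few lines of arithmetic. The sketch below is a cross-check under stated assumptions: the unit areas and the mux delay come from Table I, the group2 block counts come from the steps above, and the delays of group3 to group5 are taken to be one mux delay after the previous group's carry out. The names (`AREA`, `group_delays`, and so on) are illustrative only.

```python
# Unit-gate areas of the basic blocks (Table I) plus single AND/NOT gates.
AREA = {"XOR": 5, "MUX": 4, "HA": 6, "FA": 13, "AND": 1, "NOT": 1}
MUX_DELAY = 3

# Group2, regular design: 3 FA, 1 HA and a 6:3 mux (three 2:1 muxes).
regular_group2 = 3 * AREA["FA"] + AREA["HA"] + 3 * AREA["MUX"]

# Group2, modified design: 1 FA, 1 HA, a 3-b BEC (1 NOT, 1 AND, 2 XOR) and the same mux.
bec3 = AREA["NOT"] + AREA["AND"] + 2 * AREA["XOR"]
modified_group2 = AREA["FA"] + AREA["HA"] + bec3 + 3 * AREA["MUX"]


def group_delays(group2_delay, group2_carry_out):
    """Delays of group2..group5: each later group adds one mux delay to the incoming carry."""
    delays, carry = [group2_delay], group2_carry_out
    for _ in range(3):
        carry += MUX_DELAY
        delays.append(carry)
    return delays


regular_delays = group_delays(11, 10)    # sum3 at t = 11, c3 at t = 10
modified_delays = group_delays(13, 13)   # final c3 = partial c3 (t = 10) + mux

# (delay, area) per group as listed in Tables III and IV.
TABLE_III = [(11, 57), (13, 87), (16, 117), (19, 147)]
TABLE_IV = [(13, 43), (16, 61), (19, 84), (22, 107)]

area_saved = sum(a for _, a in TABLE_III) - sum(a for _, a in TABLE_IV)
delay_increase = sum(m - r for (r, _), (m, _) in zip(TABLE_III, TABLE_IV))

print(regular_group2, modified_group2, area_saved, delay_increase)
```

The areas of group3 to group5 are taken from the tables as given; only the group2 counts are rebuilt from the basic blocks, since those are the only ones the text itemizes.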

7. ASIC IMPLEMENTATION RESULTS

The design proposed in this paper has been developed using Verilog-HDL and synthesized in Cadence RTL Compiler using typical libraries of TSMC 0.18-μm technology. The synthesized Verilog netlist and the respective design constraints file (SDC) are imported into Cadence SoC Encounter and are used to generate an automated layout from standard cells through placement and routing [7]. Parasitic extraction is performed using Encounter's Native RC extraction tool, and the extracted parasitic RC (SPEF format) is back-annotated to the Common Timing Engine in the Encounter platform for static timing analysis. For each word size of the adder, the same value change dump (VCD) file is generated for all possible input conditions and imported into Cadence Encounter Power Analysis to perform the power simulations. A similar design flow is followed for both the regular and modified SQRT CSLA.

Table V exhibits the simulation results of both CSLA structures in terms of delay, area, and power. The area indicates the total cell area of the design, and the total power is the sum of the leakage power, internal power, and switching power. The percentage reduction in the cell area, total power, power-delay product, and area-delay product as a function of the bit size is shown in Fig. 9(a). Also plotted is the percentage delay overhead in Fig. 9(b).

Fig. 9. (a) Percentage reduction in the cell area, total power, power-delay product, and area-delay product. (b) Percentage of delay overhead.

It is clear that the area of the 8-, 16-, 32-, and 64-b proposed SQRT CSLA is reduced by 9.7%, 15%, 16.7%, and 17.4%, respectively. The total power consumed shows a similar trend of increasing reduction with the bit size: 7.6%, 10.56%, 13.63%, and 15.46%. Interestingly, the delay overhead also decreases with bit size. The delay overhead for the 8-, 16-, and 32-b adders is 14%, 9.8%, and 6.7%, respectively, whereas for the 64-b adder it reduces to only 3.76%. The power-delay product of the proposed 8-b adder is higher than that of the regular SQRT CSLA by 5.2%, and the area-delay product is higher by 2.9%. However, the power-delay product of the proposed 16-b SQRT CSLA reduces by 1.76%, and for the 32-b and 64-b adders by as much as 8.18% and 12.28%, respectively. Similarly, the area-delay product of the proposed design for 16-, 32-, and 64-b is also reduced, by 6.7%, 11%, and 14.4%, respectively.
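The product figures follow from the individual percentages: a quantity reduced by r% combined with a delay increased by d% changes the product by (1 - r/100)(1 + d/100) - 1. The sketch below applies this relation to the percentages quoted in this section. It is a consistency check only; because the inputs are rounded, the recomputed values agree with the reported ones approximately rather than exactly. The dictionary and function names are hypothetical.

```python
# Percentages quoted in this section, keyed by adder word size in bits.
area_reduction = {8: 9.7, 16: 15.0, 32: 16.7, 64: 17.4}
power_reduction = {8: 7.6, 16: 10.56, 32: 13.63, 64: 15.46}
delay_overhead = {8: 14.0, 16: 9.8, 32: 6.7, 64: 3.76}


def product_change(reduction_pct, overhead_pct):
    """Percentage change of (quantity x delay); a negative value means the product shrinks."""
    return ((1 - reduction_pct / 100) * (1 + overhead_pct / 100) - 1) * 100


pdp_change = {n: product_change(power_reduction[n], delay_overhead[n]) for n in delay_overhead}
adp_change = {n: product_change(area_reduction[n], delay_overhead[n]) for n in delay_overhead}

for n in sorted(delay_overhead):
    print(f"{n:2d}-b  PDP {pdp_change[n]:+.2f}%  ADP {adp_change[n]:+.2f}%")
```

For the 8-b adder both products come out positive, meaning slightly worse than the regular design, while for 16-, 32-, and 64-b both are negative, which matches the trend described above.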

8. CONCLUSION

A simple approach is proposed in this paper to reduce the area and power of the SQRT CSLA architecture. The reduced number of gates in this work offers a great advantage in the reduction of area and also of total power. The compared results show that the modified SQRT CSLA has a slightly larger delay (only 3.76%), but the area and power of the 64-b modified SQRT CSLA are significantly reduced, by 17.4% and 15.4%, respectively. The power-delay product and the area-delay product of the proposed design show a decrease for the 16-, 32-, and 64-b sizes, which indicates the success of the method and not a mere tradeoff of delay for power and area. The modified CSLA architecture is therefore low area, low power, simple, and efficient for VLSI hardware implementation. It would be interesting to test the design of the modified 128-b SQRT CSLA.

9. FUTURE SCOPE

Nowadays the Carry Select Adder (CSLA) is used in many data-processing processors to perform fast arithmetic functions. The speed of the SQRT CSLA is greater than that of the modified SQRT CSLA, but the area and power of the modified design are reduced compared to the SQRT CSLA. So, the SQRT CSLA can be replaced by the modified SQRT CSLA where area and power are more important constraints than speed.

10. LITERATURE SURVEY

[1] O. J. Bedrij, "Carry-select adder," IRE Trans. Electron. Comput., pp. 340–344, 1962.
[2] B. Ramkumar, H. M. Kittur, and P. M. Kannan, "ASIC implementation of modified faster carry save adder," Eur. J. Sci. Res., vol. 42, no. 1, pp. 53–58, 2010.
[3] T. Y. Chang and M. J. Hsiao, "Carry-select adder using single ripple carry adder," Electron. Lett., vol. 34, no. 22, pp. 2101–2103, Oct. 1998.
[4] Y. Kim and L.-S. Kim, "64-bit carry-select adder with reduced area," Electron. Lett., vol. 37, no. 10, pp. 614–615, May 2001.
[5] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective. Upper Saddle River, NJ: Prentice-Hall, 2001.
[6] Y. He, C. H. Chang, and J. Gu, "An area efficient 64-bit square root carry-select adder for low power applications," in Proc. IEEE Int. Symp. Circuits Syst., 2005, vol. 4, pp. 4082–4085.
[7] Cadence, "Encounter user guide," Version 6.2.4, March 2008.
[8] Youngjoon Kim and Lee-Sup Kim, "A Low Power Carry Select Adder With Reduced Area."
[9] V. G. Oklobdzija, "High-Speed VLSI Arithmetic Units: Adders and Multipliers," in Design of High-Performance Microprocessor Circuits, book edited by A. Chandrakasan, IEEE Press, 2000.
[10] M. J. Flynn and S. F. Oberman, Advanced Computer Arithmetic Design, John Wiley & Sons, 2001.
[11] J. Sklansky, "Conditional-Sum Addition Logic," IRE Transactions on Electronic Computers, vol. EC-9, no. 2, pp. 226–231, 1960.
[12] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design: A System Perspective, 2nd edn. (Addison-Wesley, 1993), p. 534.
[13] J. M. Rabaey, Digital Integrated Circuits: A Design Perspective (Prentice-Hall, New Jersey, 1996).
[14] D. C. Chen, L. M. Guerra, E. H. Ng, M. Potkonjak, D. P. Schultz, and J. M. Rabaey, "An integrated system for rapid prototyping of high performance algorithm specific data paths," in Proc. Application Specific Array Processors, pp. 134–148, Aug. 1992.
[15] R. K. Kolagotla, H. R. Srinivas, and G. F. Burns, "VLSI implementation of a 200-MHz 16×16 left-to-right carry-free multiplier in 0.35-μm CMOS technology for next-generation DSPs," in Proc. IEEE Custom Integrated Circuits Conf., pp. 469–472, May 1997.
