Attribution Non-Commercial (BY-NC)


A 16×16 MUX-Based Multiplier Design Using Optimized Static CMOS Logic Style

Abhijit Asati* and Chandrashekhar**
* Lecturer, Electrical & Electronics Engineering Group, BITS, Pilani, India
** Director, Central Electronics Engineering Research Institute, Pilani, India

Abstract

The simpler VLSI implementation of array multipliers makes them preferable for smaller operand sizes, in spite of their linear time complexity. In general, array multipliers have poor space complexity, O(n²), and require approximately n² cells to produce the product; as the operand size grows, the circuit therefore takes a large area and power. In this paper we present a MUX-based 16×16 unsigned multiplier circuit, which utilizes an efficient partial product generation and partial product addition technique. The time and space complexity of such a multiplier is much better than that of simpler array multiplier techniques. The multiplier has been designed using optimized static CMOS logic cells to provide the best area, power and delay performance. The multiplier circuit is implemented using conventional CMOS logic in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS, and simulated after parasitic extraction. The simulation results show a large reduction in propagation delay and average power compared to the tree multiplier implementation of [3].

Keywords: MUX based, array, Wallace tree, Booth encoding, partial product, complexity, operand size

Introduction

In digital signal processor implementations, such as standard and ASIC digital signal processors, the multiplier is a fundamental building block. The performance of signal processing algorithms such as frequency domain filtering (FIR and IIR), frequency-time transformations (FFT) and correlation depends on the performance of the multiplier implementation. In most real-time DSP tasks, the multiplier block must operate at high speed while consuming little layout area and power.

Multiplication algorithms differ in their means of partial product generation and partial product addition [1]. Array multipliers have linear time complexity, i.e. O(n), so their delay degrades for larger operand sizes. Array multipliers also have poor space complexity, O(n²): they require approximately n² cells to produce the product, so as the operand size grows the circuit takes a large area and power [2], [4], [5]. The number of partial product rows can be reduced by a factor of n using radix-m Booth encoding (where m = 2ⁿ). With Booth radix-4 encoding (m = 4 = 2²) the partial product rows are halved [3]; the number of logic cells required to generate the partial products is therefore reduced to n²/2 [2]. Further, Wallace tree accumulation reduces the ripple effect and produces the product in far less time; the time complexity is reduced to O(log n), but it requires a larger gate and routing area than a regular array and is hence unsuitable for VLSI implementation [2]. The hardware reduction of the Booth encoding scheme can be combined with accelerated Wallace tree accumulation of the partial products to obtain a time complexity of O(log n), which is well suited to multipliers with large operand sizes [2], [3].

For smaller operand sizes, tree-based architectures may have a smaller gate delay but consume more silicon area due to increased routing and encoding overheads, whereas array multipliers have a larger gate delay but a smaller routing length. MUX-based array multipliers give a faster and more compact implementation due to efficient partial product generation and efficient partial product addition. In this paper we present an implementation of a 16×16 multiplier using the MUX-based array technique and static CMOS logic cells. These static CMOS logic cells provide the best area, power and delay performance, as described in [6]. The VLSI implementation of the multiplier circuit uses the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS and conventional CMOS logic. Simulation results are compared with a faster Booth encoded Wallace tree multiplier implementation, as in [3].

Section II discusses the conventional static CMOS logic design style, Section III explains the design of the MUX-based multiplier algorithm, Section IV illustrates the multiplication logic, and Section V describes the schematics of the 4×4 and 16×16 multipliers. Physical implementation and results are described in Section VI. Section VII concludes the paper.

A static logic gate generates its output corresponding to the applied input voltages after a certain time delay, and it preserves its output level (or state) as long as the power supply is provided. In steady state each gate output is connected to either Vdd or Gnd through a low-resistance path, so for a static input the output levels are preserved, whereas the operation of dynamic logic circuits relies on temporary storage of signal values on the capacitance of dynamic circuit nodes. The conventional static logic style offers a versatile implementation of logic functions based on the static, or steady state, behavior of simple CMOS structures. It is suitable and widely accepted for many VLSI circuit implementations due to important properties such as high speed, low power, large noise margins, no logic degradation and the validity of the logic design style at scaled-down technologies.

A logic gate with a fan-in of n requires 2n devices (n N-type + n P-type). Two logic blocks, the N-block and the P-block, form a CMOS gate, and the topology of the N-block is the dual of that of the P-block. Since both blocks need an equal number of transistors, the transistor count may increase. The channel widths of series-connected n-channel MOS transistors (NMOS) or p-channel MOS transistors (PMOS) have to be increased to obtain a reasonable conducting current to drive capacitive loads. The increase in PMOS size results in a significant area overhead and an increased gate input capacitance, which may lead to high dynamic power dissipation. The higher gate input capacitance also loads the previous stage and thereby increases the delay. The ratio of PMOS to NMOS transistor widths should be chosen optimally to achieve a good noise margin, higher speed and lower power consumption, as described in [7], [9]. The short-circuit currents of a static CMOS gate can be minimized by sizing the transistors appropriately for equal rise and fall times.

The schematics of the 1-bit full adder, 2-input AND, 3-input AND and 2-input MUX cells implemented using the conventional static CMOS logic design style are shown in Figure 1. The full adder cell, designed using the principle of symmetry, has 28 transistors, as described in [6], [8]. The 28-transistor version performs considerably better than the 40-transistor version [6]. A 32-bit adder designed using complementary CMOS has a power-delay product of less than half that of the CPL version [6]. The 2-input AND cell, 3-input AND cell, 2-input MUX and other cells also provide a better power-delay product.

Figure 1: Schematics using the conventional static CMOS logic design style of (a) a complex full adder cell using the principle of symmetry, (b) a 2-input AND gate, (c) a 3-input AND gate, (d) a 2-input MUX.


The MUX-based multiplication algorithm is an unsigned multiplier algorithm in which one bit of the multiplier and one bit of the multiplicand are processed in parallel. The algorithm is symmetric, i.e., the multiplier and multiplicand can be interchanged. In this algorithm the progressively computed sum of the two operands is a useful quantity that is used in the computation of certain partial products. The different quantities are computed one bit at each step of the algorithm, and the appropriate quantity is then selected in the next step if required. The parallel implementation of this algorithm yields an iterative array. Compared to an implementation based on the modified Booth's algorithm, it consumes the same amount of circuitry but yields faster multiplication. This multiplexer-based architecture computes the partial sums of the two operands together in parallel, which simplifies tasks such as compression and accumulation. It also compares favorably in processing speed with other regular array architectures. The multiplication logic can be explained using equation 1, equation 2, equation 3, equation 4 and equation 5.

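Let

$$X = x_{n-1} x_{n-2} \ldots x_0, \qquad Y = y_{n-1} y_{n-2} \ldots y_0, \qquad P = XY \tag{1}$$

$X_j$ and $Y_j$ are the binary numbers after truncation up to the $(j+1)$th bit in $X$ and $Y$ respectively:

$$X_j = x_{j-1} x_{j-2} \ldots x_0, \qquad Y_j = y_{j-1} y_{j-2} \ldots y_0, \qquad P_j = X_j Y_j \tag{2}$$

with $X_0 = Y_0 = 0 = P_0$ and

$$X = X_n = X_{n-1} + 2^{n-1} x_{n-1}, \qquad Y = Y_n = Y_{n-1} + 2^{n-1} y_{n-1}$$

Then

$$
\begin{aligned}
P_n = X_n Y_n &= \left\{ X_{n-1} + 2^{n-1} x_{n-1} \right\} \left\{ Y_{n-1} + 2^{n-1} y_{n-1} \right\} \\
&= 2^{2n-2} x_{n-1} y_{n-1} + 2^{n-1} \left( x_{n-1} Y_{n-1} + X_{n-1} y_{n-1} \right) + X_{n-1} Y_{n-1} \\
&= \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=0}^{n-1} \left( x_j Y_j + X_j y_j \right) 2^{j} + P_0 \\
&= \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=0}^{n-1} \left( x_j Y_j + X_j y_j \right) 2^{j} \\
&= \sum_{j=0}^{n-1} x_j y_j 2^{2j} + \sum_{j=0}^{n-1} Z_j 2^{j}
\end{aligned}
\tag{3}
$$

where $Z_j = x_j Y_j + X_j y_j$, that is,

$$
Z_j =
\begin{cases}
X_j & \text{if } x_j = 0,\ y_j = 1 \\
Y_j & \text{if } x_j = 1,\ y_j = 0 \\
0 & \text{if } x_j = 0,\ y_j = 0 \\
X_j + Y_j & \text{if } x_j = 1,\ y_j = 1
\end{cases}
\tag{4}
$$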
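The decomposition in equations (3) and (4) can be checked numerically. The following is a minimal Python sketch, not the authors' implementation; the function names `truncated` and `mux_based_product` are hypothetical and chosen only for illustration. It evaluates the two sums word by word and compares the result with ordinary multiplication.

```python
def truncated(value: int, j: int) -> int:
    """Return the number formed by the j least significant bits (X_j or Y_j)."""
    return value & ((1 << j) - 1)


def mux_based_product(x: int, y: int, n: int) -> int:
    """Evaluate sum_j x_j*y_j*2^(2j) + sum_j Z_j*2^j for n-bit unsigned x, y."""
    total = 0
    for j in range(n):
        xj = (x >> j) & 1
        yj = (y >> j) & 1
        x_trunc = truncated(x, j)
        y_trunc = truncated(y, j)
        # Equation (4): the bit pair (x_j, y_j) selects 0, X_j, Y_j or X_j + Y_j.
        if xj and yj:
            z = x_trunc + y_trunc
        elif xj:
            z = y_trunc
        elif yj:
            z = x_trunc
        else:
            z = 0
        total += (xj & yj) << (2 * j)  # x_j * y_j * 2^(2j)
        total += z << j                # Z_j * 2^j
    return total


# Exhaustive check of the identity for all 4-bit operand pairs.
assert all(mux_based_product(x, y, 4) == x * y
           for x in range(16) for y in range(16))
```

The same function can be called with n = 16 to spot-check 16-bit operands; the sketch models only the arithmetic identity, not the array hardware.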

Example 1 shows the multiplication process for two 4-bit binary numbers using the MUX-based approach. The number of rows remains the same, but the number of partial product bits to be compressed in any particular column is now restricted to 3, which makes compression much faster and easier. If the carry bits C1, C2, C3 shown in Example 1 are handled separately, only 2 bits remain to be added in each column. Two columns can then be added simultaneously using a 2-bit CLA, which also accepts the carry input C1, C2 or C3 of the particular column (this is possible because these carries occur in alternate columns). Thus the first step of the algorithm generates the partial product rows, and the second step adds these partial products together with compression. Compared to other regular array multipliers it is therefore faster: it produces its output in time T = (n+1) FA_2CLA, where FA_2CLA is the delay of a 2-bit CLA adder, with a timing overhead of one 4:1 MUX delay, while a regular array multiplier takes an approximate delay of T = (2n) FA. The large area overhead is due to the routing needed between these MUXes.
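To see how the two delay expressions above scale with operand size, they can be evaluated side by side. This is only a sketch: the cell delays `T_FA`, `T_CLA2` and `T_MUX4` below are hypothetical normalized values chosen for illustration, not measurements from this work, and the function names are likewise assumptions.

```python
def mux_array_delay(n: int, t_cla2: float, t_mux4: float) -> float:
    """T = (n + 1) * FA_2CLA plus one 4:1 MUX delay, as stated in the text."""
    return (n + 1) * t_cla2 + t_mux4


def regular_array_delay(n: int, t_fa: float) -> float:
    """T = 2n * FA for a regular array multiplier, as stated in the text."""
    return 2 * n * t_fa


# Hypothetical normalized cell delays (assumptions, not values from the paper).
T_FA = 1.0     # one full adder
T_CLA2 = 1.2   # one 2-bit CLA adder
T_MUX4 = 0.8   # one 4:1 MUX

for n in (4, 8, 16):
    print(n, mux_array_delay(n, T_CLA2, T_MUX4), regular_array_delay(n, T_FA))
```

Whether the MUX-based array comes out ahead for a given n depends on the actual ratio of the 2-bit CLA delay to the full adder delay in the target cell library.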

Example 1: The terms X0Y0, X1Y1, X2Y2 and X3Y3, at the positions shown below, have to be added to the appropriate term selected by each 4:1 MUX based on the select lines shown in the first column. Let X = X3X2X1X0 = 0111 = (+7)10 and Y = Y3Y2Y1Y0 = 0011 = (+3)10. The uncolored portion explains the operation to be performed by the algorithm, and the colored portion shows the application of the algorithm to the selected inputs X and Y. Working of the MUX: select lines 00/01/10/11 correspond to inputs I1/I2/I3/I4.

Entries of the worked table:
Select lines for the 4:1 MUXes: X1Y1 = 11, X2Y2 = 10, X3Y3 = 00
AND terms: X3Y3 = 0, X2Y2 = 0, X1Y1 = 1, X0Y0 = 1
MUX inputs: 0/X0/Y0/S0 = 0/1/1/0, 0/X2/Y2/S2 = 0/1/0/0, 0/0/0/C1 = 0/0/0/1, 0/0/0/C3 = 0/0/0/1
Product: P7 P6 P5 P4 P3 P2 P1 P0 = 0 0 0 1 0 1 0 1 = (21)10
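The arithmetic of Example 1 can be traced term by term. The sketch below is an illustration under the definitions of equations (3) and (4); the helper name `partial_terms` is hypothetical. For each bit position j it lists the AND term x_j·y_j·2^(2j) and the selected term Z_j·2^j, then sums them.

```python
def partial_terms(x: int, y: int, n: int):
    """Return (j, x_j*y_j*2^(2j), Z_j*2^j) for each bit position j."""
    rows = []
    for j in range(n):
        xj, yj = (x >> j) & 1, (y >> j) & 1
        mask = (1 << j) - 1
        x_trunc, y_trunc = x & mask, y & mask
        z = xj * y_trunc + yj * x_trunc  # Z_j = x_j*Y_j + X_j*y_j
        rows.append((j, (xj & yj) << (2 * j), z << j))
    return rows


# Operands of Example 1: X = 0111 (7), Y = 0011 (3).
rows = partial_terms(0b0111, 0b0011, 4)
for j, and_term, z_term in rows:
    print(f"j={j}: and_term={and_term:2d}  z_term={z_term:2d}")
product = sum(a + z for _, a, z in rows)
print(f"product = {product} = {product:08b}")
```

For these operands the nonzero contributions are 1 (j = 0 AND term), 4 and 4 (j = 1 AND term and Z term) and 12 (j = 2 Z term), which add up to 21.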


The logic explained in Example 1 can be shown through a schematic that uses 4:1 multiplexers and AND gates, as shown in Figure 2. The multiplexers choose the Zj for the Zj 2^j terms (refer equation 5), while the AND gates produce the xj yj 2^(2j) terms. The complete logic structure that accumulates the partial product terms utilizes Cell-I and Cell-II, which are shown in Figure 3 [2]. A similar technique can be used in the design of the 16×16 multiplier.
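The selection performed by the multiplexers and AND gates can be modeled at bit level. The following Python sketch is a behavioral model under stated assumptions, not the Cell-I/Cell-II netlist: `mux4` and `mux_multiplier` are hypothetical names, S_i and C_i are taken to be the sum and carry bits of the progressively computed sum X + Y, and the accumulation that the adder cells perform in hardware is replaced by plain integer addition.

```python
def mux4(select: int, i1: int, i2: int, i3: int, i4: int) -> int:
    """4:1 MUX: select values 0b00/0b01/0b10/0b11 pick I1/I2/I3/I4."""
    return (i1, i2, i3, i4)[select]


def mux_multiplier(x: int, y: int, n: int) -> int:
    """Bit-level model of the MUX-based unsigned multiplier for n-bit operands."""
    xb = [(x >> i) & 1 for i in range(n)]
    yb = [(y >> i) & 1 for i in range(n)]

    # Progressive sum X + Y: s[i] is sum bit i, c[i] is the carry into bit i.
    s, c = [], [0]
    for i in range(n):
        total = xb[i] + yb[i] + c[i]
        s.append(total & 1)
        c.append(total >> 1)

    product = 0
    for j in range(n):
        select = (xb[j] << 1) | yb[j]          # select lines x_j y_j
        product += (xb[j] & yb[j]) << (2 * j)  # AND2 gate: x_j*y_j*2^(2j)
        for i in range(j):                     # MUXes choosing bit i of Z_j
            product += mux4(select, 0, xb[i], yb[i], s[i]) << (i + j)
        # MUX with inputs 0/0/0/C_j supplies the carry (top) bit of Z_j.
        product += mux4(select, 0, 0, 0, c[j]) << (2 * j)
    return product


# Exhaustive check for 4-bit operands.
assert all(mux_multiplier(x, y, 4) == x * y
           for x in range(16) for y in range(16))
```

With X = 0111 and Y = 0011 the model reproduces the MUX input values listed in Example 1 (S0 = 0, C1 = 1, S2 = 0, C3 = 1) and returns 21.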

Figure 2: Logic for the MUX-based multiplier implementation (AND2 gates producing the terms 2^0 X0Y0, 2^2 X1Y1, 2^4 X2Y2 and 2^6 X3Y3; 4:1 MUXes with inputs 0/Xi/Yi/Si and 0/0/0/Ci producing the terms Z1 2^1, Z2 2^2 and Z3 2^3).

Figure 3: Cell-I and Cell-II (blocks: 4:1 MUX, FA, AND2; signals: Xi, Yi, Si, Xj, Yj, Sin, Cin, Sout, Cout).

The layout of the 16×16 MUX-based unsigned multiplier circuit shown in Figure 4 is implemented in the 0.6 µm N-well CMOS process (SCN_SUBM, lambda = 0.3) of MOSIS, using conventional CMOS logic. A schematic library of 7 functional cells is defined for the static CMOS design style, comprising the 1-bit full adder, 2-input AND, 3-input AND, 2-input MUX, 2-input XOR, 2-input OR and 3-input OR functions. Corresponding to the schematic library, physical libraries were designed in the conventional CMOS logic design style using the design principles of [7], [8], [9], [10]. Three different versions of each physical library were developed by sizing the W/L ratio of the NMOS transistors to 3, 5 and 7 respectively (W/L values smaller than 3 were also tried but not considered further, as they resulted in parasitic-dominated slower speeds due to the weak drive of the transistors and were not good candidates for high performance). The layout assemblies for the 16-bit multiplier were carried out using these cell libraries and the automatic place and route tool L-Edit (SPR) from M/s Tanner Research Inc. The physical library with a W/L ratio of 3 for the NMOS transistors gave the smallest average switching energy-delay product. The generated layouts were simulated after parasitic extraction using the ELDO SPICE circuit simulator. The supply voltage VDD is kept at 3.3 V. Table 1 compares important parameters, namely propagation delay and power dissipation at a 20 MHz data rate, with the tree-based implementation of [3]. Table 2 shows the maximum power, leakage power, transistor count, core area, total routing length and number of vias.

Table 1

| Algorithm (technology) | VDD (V) | Propagation delay (ns) | Average power (mW) |
|---|---|---|---|
| Proposed (0.6 µm) | 3.3 | 14.15 | 22.05 |
| BEWM ref [3] (1.25 µm) | 5 | 60 | 100 |

Table 2

| Algorithm (technology) | Maximum power (mW) | Leakage power (nW) | Transistor count | Core area (mm²) | Total routing length (mm) | Number of vias |
|---|---|---|---|---|---|---|
| Proposed (0.6 µm) | 623.46 | 53.34 | 10168 | 23.76 | 1386.71 | 3452 |

Comparing the two multiplier architectures shows that the proposed MUX-based array multiplier reduces the propagation delay to about 0.235 times, and the average power consumption to about 0.22 times, that of the tree-based implementation of [3]. The maximum instantaneous power, leakage power, transistor count, core area, total routing length and number of vias are also given to help judge the VLSI implementation characteristics.
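The two ratios quoted above follow directly from the Table 1 entries. A short Python check, with variable names chosen here only for illustration:

```python
# Values copied from Table 1.
proposed = {"delay_ns": 14.15, "power_mw": 22.05}   # proposed (0.6 um, 3.3 V)
reference = {"delay_ns": 60.0, "power_mw": 100.0}   # BEWM ref [3] (1.25 um, 5 V)

delay_ratio = proposed["delay_ns"] / reference["delay_ns"]
power_ratio = proposed["power_mw"] / reference["power_mw"]

print(f"delay ratio: {delay_ratio:.4f}")
print(f"power ratio: {power_ratio:.4f}")
```

Both ratios come out close to one quarter, which is the figure quoted in the conclusion.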


Conclusion

This paper presents a 16-bit MUX-based unsigned multiplier implementation using an optimized static CMOS logic style. The multiplier algorithm performs efficient partial product generation and addition, which makes its time and space complexity better than those of other array multipliers. Compared with a faster tree multiplier implementation, the simulation results show that the propagation delay and the average switching power are each reduced to approximately 1/4.

References

[1] A. Hesham, Technology scaling effects on multipliers, IEEE Transactions on Computers, Vol. 47, No. 11, pp. 1201-1215, November 1998.
[2] Z. Kiamal, Multiplexer-based array multipliers, IEEE Transactions on Computers, Vol. 48, No. 1, pp. 15-23, January 1999.
[3] F. Jalil, M×N Booth encoded multiplier generator using optimized Wallace trees, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 1, No. 2, pp. 120-125, June 1993.
[4] V. Chanramouli, Self-timed design in GaAs: case study on a high-speed, parallel multiplier, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, Vol. 4, No. 1, pp. 146-149, March 1996.
[5] P. Kornerup, A systolic, linear-array multiplier for a class of right-shift algorithms, IEEE Transactions on Computers, Vol. 43, No. 8, pp. 892-898, August 1994.
[6] Reto Zimmermann and Wolfgang Fichtner, Low-Power Logic Styles: CMOS Versus Pass-Transistor Logic, IEEE Journal of Solid-State Circuits, Vol. 32, No. 7, pp. 1079-1090, July 1997.
[7] Mohab Anis, Mohamed Allam and Mohamed Elmasry, Impact of Technology Scaling on CMOS Logic Styles, IEEE Transactions on Circuits and Systems-II: Analog and Digital Signal Processing, Vol. 49, No. 8, pp. 577-587, August 2002.
[8] S. M. Kang and Yusuf Leblebici, CMOS Digital Integrated Circuits: Analysis and Design, Third Edition, McGraw-Hill, 2003.
[9] N. Weste and K. Eshraghian, Principles of CMOS VLSI Design, Addison-Wesley, 1994.
[10] Jan M. Rabaey, Anantha Chandrakasan and Borivoje Nikolic, Digital Integrated Circuits, Second Edition, Prentice-Hall of India Private Limited, 2004.
