This action might not be possible to undo. Are you sure you want to continue?

BooksAudiobooksComicsSheet Music### Categories

### Categories

### Categories

Editors' Picks Books

Hand-picked favorites from

our editors

our editors

Editors' Picks Audiobooks

Hand-picked favorites from

our editors

our editors

Editors' Picks Comics

Hand-picked favorites from

our editors

our editors

Editors' Picks Sheet Music

Hand-picked favorites from

our editors

our editors

Top Books

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Audiobooks

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Comics

What's trending, bestsellers,

award-winners & more

award-winners & more

Top Sheet Music

What's trending, bestsellers,

award-winners & more

award-winners & more

Welcome to Scribd! Start your free trial and access books, documents and more.Find out more

4:2 COMPRESSOR DESIGN BASED ON DOMINO LOGIC

**Supervisor: Dr. M. Ahmadi g g Peng Chang Department of Electrical and Computer Engineering University of Windsor 2008.08.01
**

1

RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS - UNIVERSITY OF WINDSOR

Outline

4:2 Compressors Domino logic Logical decompositions of 4:2 compressors Circuit level optimization – Split Domino Logic Simulation results and Conclusion

2

UNIVERSITY OF WINDSOR 4:2 Compressors : Co p esso s 4:2 Compressor 4:2 Compressor Array • The 4:2 compressor takes five equally weighted inputs (CIN. X4) and generate a sum bit (S). 3 . • The 4:2 compressor array is formed by a series of 4:2 compressors cascaded together it together. X3.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . X1. a carry-bit (C) and a carry-propagate-bit (COUT). X2. is used to perform column-wise compression of the partial product.

UNIVERSITY OF WINDSOR Analysis of 3:2 and 4:2 Reduction Scheme ( 12 12 Dadda Tree ) 12×12 dd Stage 1 Stage 1 Stage 2 Stage 2 Stage 3 Stage 4 Stage 5 Stage 3 3:2 Reduction Scheme 4:2 Reduction Scheme 4 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

UNIVERSITY OF WINDSOR Analysis of 3:2 and 4:2 Reduction Scheme Max column height per stage of a 3:2 scheme (carry save array) h 0 1 2 3 4 5 6 7 8 9 10 n(h) 2 3 4 6 9 13 19 28 42 63 94 Max column height per stage of a 4:2 scheme (4:2 compressor) h n(h) 0 3 1 4 2 8 3 16 4 32 5 64 6 128 7 256 “ n(h) ” represents max column height (h) t l h i ht “ h ” represents the number of stages 5 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

Its operation is divided into two major phases: precharge (CLK=0) and evaluation (CLK=1). count. clocked PMOS and NMOS transistors. charge sharing and etc. 6 . no short circuit current. Advantages: lower transistor count faster switching speed.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . speed current Disadvantages: charge leakage.UNIVERSITY OF WINDSOR Domino Logic An example of domino XOR gate It is consist of a pull-down network.

AND Gates and MUXs. 7 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .UNIVERSITY OF WINDSOR Logical Level Decomposition of 4:2 Compressors X 0 + X 1 + X 2 + X 3 + C IN = Sum + 2 ⋅ (Carry + Cout) S = S ⊕ X4 ⊕ CIN = X0 ⊕ X1 ⊕ X2 ⊕ X3 ⊕ CIN C = (S ⊕ 0 0 X X X 1 1 3 ) ⋅C X X IN 2 2 + S ⋅ X ⊕ ⊕ X X 3 3 3 IN 3 = ( X + ( X ⊕ ⊕ ⊕ ⊕ ) ⋅C ) ⋅ X Configuration of 4:2 compressor Cout = ( X 0 ⊕ X1 ) ⋅ X 2 + X 0 ⋅ X1 = ( X 0 ⊕ X1 ) ⋅ X 2 + ( X 0 ⊕ X1 ) ⋅X 0 4:2 compressor could be realized by different combinations of XOR Gates.

UNIVERSITY OF WINDSOR Logical Level Decomposition of 4:2 Compressors Full adder Primitive decomposition of 4:2 compressor (Com_and) It is formed by using 3-input XOR gates and 3-input AND gates. 8 . g y g p The critical path of the compressor is 4 XOR gates. Its regularity lends itself to gains at the architecture level of the multiplier.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .UNIVERSITY OF WINDSOR Logical Level Decomposition of 4:2 compressors Full adder Alternative decomposition of 4:2 compressor (Com_mux) It is composed of six modules: four 2-input XOR gates and two 2:1 MUX gates. The critical path of the compressor is 3 XOR gates. 2:1 MUX gate is used instead of AND gate to generate two carry signals “Carry” and “Cout” “Cout”. 9 .

10 . gates All three outputs: “Sum”.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . “Carry” and “Cout” are generated by using 2:1 MUX gates.UNIVERSITY OF WINDSOR Logical Level Decomposition of 4:2 Compressors Full adder Alternative decomposition of 4:2 compressor (Com_pur_mux) It consist of six 2:1 MUX gates. The critical path delay of the compressor is 3 XOR gates.

UNIVERSITY OF WINDSOR Optimization of 4:2 Compressors Sum= A ⊕ B ⊕ C = ABC+ ABC + ABC + ABC Carry = AB + BC + AC y Carry= AB + BC + AC Configuration of full adder fi i f f ll dd By taking the NOT of Carry.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . Thus the lower transistor count and higher performance of full adder could be achieved. which generates Sum signal. to generate Carry signal. we could use part of the circuit. 11 .

RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .UNIVERSITY OF WINDSOR Optimization of 4:2 Compressors Conventional full adder using Domino Logic Proposed full adder using Domino Logic 12 .

a logical 2-input NAND gate is used to generate the output. The large keeper transistor is also replaced by two smaller transistors transistors. The main advantage of Split Domino is to reduce the dynamic node capacitance and consequently fast evaluation.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . 13 .UNIVERSITY OF WINDSOR Split Domino Logic N-input Split Domino OR gate The pull down network is equally divided into two sub-network.

UNIVERSITY OF WINDSOR Split Domino Logic – XOR Gate 2-input 2 input XOR Gate using Domino Logic (denoted as “2_xor_D”) 2-input 2 input XOR Gate using Split Domino Logic (denoted as “2_xor_SD”) 14 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .UNIVERSITY OF WINDSOR Split Domino Logic – XOR Gate 3-input 3 input XOR Gate using Domino Logic (denoted as “3_xor_D”) 3-input 3 input XOR Gate using Split Domino Logic (denoted as “3_xor_SD”) 15 .

UNIVERSITY OF WINDSOR Split Domino Logic – Full Adder Proposed full adder using Domino Logic (denoted as “FA_new”) Proposed full adder using Split Domino Logic (denoted as “FA_SD”) 16 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

The average delay is the average of delays of all input data The worst case delay is the largest delay data. which offer a realistic simulation environment reflecting the operation in actual applications.8 voltage source.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . All the circuits are targeted for TSMC 0. In the test bench.UNIVERSITY OF WINDSOR Simulation Result 2-input XOR Gate. 3-input XOR Gate. among all input data.18 technologies. 17 . The simulations are performed by using HSPICE in Cadence design tool. each input is driven by buffered signals and each output is loaded with buffers. The delay is measured from the time at which the input signals reaching 50% of its full value to the time when the output signal reaching 50% of its full potential. Full adder and 4:2 Compressors are designed in g p g y p y p Domino Logic and Split Domino Logic style separately. Circuits are thoroughly tested by all the possible input vector combinations at 1.

51 Worst Case Delay (ns) 0.89 0.UNIVERSITY OF WINDSOR Simulation Result Simulation Results for logical decompositions of 4:2 Compressors Cell Name Power Po er Dissipatio n (ns) 2.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .43E13 Worst Case PDP 1.17E13 1.78E13 1.80 Average PDP 1.12E-04 2.25E13 Operatio p n Frequenc y (GHz) 1 0.57 0.46E13 2.59 0.48E-04 3.78E13 2.63 Com_and Com_mux Com_pur_mux Comparison of different logical decompositions of 4:2 Compressors Cell Name Power Dissipatio n (ns) 100% 126% 113% Average Delay (ns) 100% 121% 109% Worst Case Delay (ns) 100% 151% 136% Average PDP 100% 154% 122% Worst Case PDP 100% 190% 154% Operatio n Frequenc y (GHz) 100% 41% 63% Com_and Com_mux Com mux Com_pur_mux 18 .41 0.81E-04 Average A erage Delay (ns) 0.47 0.

28 116% 2.81E-14 364% 2.63GHz 2.21 0.39 165% 1.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .5% 2_xor_SD % Savings Simulation Results for 3-input XOR Gates Cell Name Power Dissipation (w) Average Delay (ns) Worst Case Delay (ns) Average PDP Worst Case PDP Operation p Frequency (GHz) 3_xor_D 3_xor_SD % Savings 1.06E-04 1.26E-04 224% 0.79E-14 80.72E-14 4.54E-14 3.17 2.33E-14 131% 2.24 0.19E-04 112% 0.17GHz 82.01E-04 2.38 109% 19 .4% 0.3% 2.24 0.UNIVERSITY OF WINDSOR Simulation Result Simulation Results for 2-input XOR Gates Cell Name Power Dissipation (w) Average Delay (ns) Worst Case Delay(ns) Average PDP Worst Case PDP Operation Frequency (GHz) 2_xor_D 1.97E-14 288% 2.42E-14 8.23E-14 1.17 0.15 71.22 129% 0.

UNIVERSITY OF WINDSOR Simulation Result Simulation Results for Full Adders Cell Name Power Po er Dissipatio n (ns) 1.90E14 Worst Case PDP 7.29 0.32E-04 Average A erage Delay (ns) 0.39 Average PDP 4.12E14 5.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .92 1.41 0.67 2.20E-04 1.48E14 2.51 0.29E14 6.28 0.98E14 3.15E14 Operatio p n Frequenc y (GHz) 1.78E-04 1.22 Worst Case Delay (ns) 0.17 FA_con FA_new FA_SD Comparison of different Full Adders Cell Name Power Dissipatio n 100% 67% 74% Average Delay 100% 104% 79% Worst Case Delay 100% 124% 95% Average PDP 100% 70% 58% Worst Case PDP 100% 84% 71% Operatio n Frequenc y 100% 87% 113% FA_con _ FA_new FA_SD 20 .

09E13 Operatio n Frequenc y (GHz) 1 1.47 0.96E13 0.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .25 1.67 Com_con Com_new Com_SD 0.27E-04 Average Delay Worst Case Delay 0.17E13 0.48 Average PDP 1.29E-04 2.49E13 1.32 Comparison of different 4:2 Compressors Cell Name Power Dissipatio n 100% 92% 91% Average Delay 100% 89% 68% Worst Case Delay 100% 88% 80% Average PDP 100% 82% 62% Worst Case PDP 100% 81% 73% Operatio n Frequenc y 100% 125% 167% Com_con Com con Com_new Com_SD 21 .73E13 Worst Case PDP 1.UNIVERSITY OF WINDSOR Simulation Result Simulation Results for 4:2 Compressors Cell Name Power Dissipatio n 2.53 0.21E13 1.60 0.42 0.48E-04 2.

simulation results confirm that Split Domino Logic p g y. power and operating speed.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS . 3-input XOR Gate. followed by the simulation results of these circuits. Full adder and 4:2 Compressors are i l implemented i Domino Logic and Split Domino d in i i d li i Logic separately.UNIVERSITY OF WINDSOR Conclusion Three different logical level decompositions of 4:2 compressor are implemented in Domino Logic. 22 . p p g outperform Domino Logic in terms of delay. and used to implement 4:2 compressor in Domino Logic. A new architecture of full adder is proposed. Its property is confirmed by the simulation results results. 2-input XOR Gate.

" IBM Technical Disclosure Bulletin. vol.” IEEE Symposium on Computer Arithmetic. pp. G. “DesignWare IP family reference guide. Systems & Computers. pp. 1991 [6] P.23. 1991 [5] M. 43-50. Swartzlander Jr. Swartzlander. 1320-1324. g p . on Electronic Computers. "A recursive fast multiplier. Danysh.J. y p p Architecture. 1988 [12] Synopsys. 2. "A reconfigurable digital multiplier architecture." Master thesis. April 2002 [ ] [11] S. pp. 197 -201. pp.1981 [4] P. De Micheli. 13.UNIVERSITY OF WINDSOR References [1] C.S. vol. Kim. 30-36. Wallace. V. 1998 [9] J. "Investigation into arithmetic sub-cells for digital multiplication." Asilomar Conference on Signals.Weinberger. "4:2 carry-save adder module. “High-speed multiplier design using multi-input counter and compressor circuits. 1964 [2] Luigi Dadda. 1184-1198. 14-17. 574-580. Dally. pp. Swartzlander Jr. y." lEEE Tran." Alta Frequenza. 1. vol. "A suggestion for a fast multiplier. vol. Michael Howard . 45. E.Mokrian. vol. pp. “A Reconfigurable Coprecessor for Finite Field Multiplication in GF(2^n). Parmar.RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS ." Asilomar Conference on Signals. University of Windsor.” Proceeding of the IEEE Workshop on Heterogeneous Reconfigurable Systems on Chip.Mehta.Song.N. Felix Madlener. 2000 [10] Michael Jung. Fiske.E. vol. E. Sorin A.” March 2007 23 .” IEEE International Symposium on Computer . E. 26. "Some schemes for parallel multipliers. Systems and Computers.1966 [3] A. W. pp. University of Windsor. “The reconfigurable arithmetic processor." Master thesis. 2005 [8] A.” IEEE Journal of Solide-State Circuits. Huss.E. “Circuit and architecture trade-offs for high-speed multiplication. ''Improving the recursive multiplier. Markus Ernst.J. Jan. 2003 [7] G.

UNIVERSITY OF WINDSOR Thank You 24 .RESEARCH CENTRE FOR INTEGRATED MICROSYSTEMS .

Are you sure?

This action might not be possible to undo. Are you sure you want to continue?

We've moved you to where you read on your other device.

Get the full title to continue

Get the full title to continue listening from where you left off, or restart the preview.

scribd