Low-Cost Design of Serial-Parallel Multipliers Over GF(2^m) Using Hybrid Pass-Transistor Logic (PTL) and CMOS Logic

2010 International Symposium on Electronic System Design
Pramod Kumar Meher
Department of Communication Systems, Institute for Infocomm Research, Singapore
aspkmeher@ntu.edu.sg

Shen-Fu Hsiao, Chia-Sheng Wen and Ming-Yu Tsai
Department of Computer Science and Engineering, National Sun Yat-Sen University, Kaohsiung, Taiwan
sfhsiao@cse.nsysu.edu.tw, {sheng, mjtsai}@garfield.cse.nsysu.edu.tw

978-0-7695-4294-2/10 $26.00 © 2010 IEEE. DOI 10.1109/ISED.2010.33

Abstract: We have designed pass-transistor logic (PTL)-based D flip-flop and T flip-flop to be used in finite field multiplication. Since both CMOS and PTL have their respective advantages in area, speed, and power, we have compared two different designs (conventional implementation and improved implementation) of serial-parallel finite field multiplication using pure CMOS, pure PTL, and hybrid PTL/CMOS logic. Experimental results with UMC 90nm technology show that the improved architecture of finite field multiplication composed of PTL-based T flip-flops can substantially reduce the total area, delay and power. Furthermore, the proposed cell-based design flow with hybrid PTL/CMOS cell library can be used to generate any other combinational and sequential logic circuits.

Key words: finite field arithmetic, digital arithmetic, pass transistor logic, sequential logic, standard cell library, logic synthesis.

I. INTRODUCTION

Polynomial basis finite field multipliers over GF(2^m) are widely used for error control coding and cryptography. Among a large number of architectures proposed for efficient finite field multiplication, serial-parallel polynomial-basis multipliers are well suited for embedded systems where the cost and size of hardware are major considerations. Addition in finite field over GF(2^m) is one of the fundamental operations, and has been used to perform the other field operations like multiplication through finite field accumulation, which is usually implemented using D flip-flops (D-FFs) combined with bitwise logical AND and XOR operations. Recently, extended sequential logic for synchronous circuit optimization has been proposed to significantly improve the performance and area complexity of the finite field accumulation using T flip-flops (T-FFs) that are constructed from CMOS D-FFs [1-2]. Although CMOS has been the mainstream logic design style for standard cell library, it has been shown in the past two decades that pass transistor logic (PTL) is an attractive design alternative to CMOS logic in circuit designs involving arithmetic operations or other XOR-rich applications [3-6]. Recently, hybrid PTL/CMOS synthesis based on Synopsys Design Compiler with both PTL and CMOS standard cell library has been shown to achieve promising results for designing combinational logic circuits [6].

In this paper, we present the PTL circuit designs for the fundamental sequential logic circuits, e.g., D-FF and T-FF, to be used in the design of finite field multiplication over GF(2^m). Since the new cell library includes both D-FF and T-FF cells designed using both CMOS and PTL, the logic synthesis tool can better exploit the circuit features of standard cells to generate results based on user-specified design constraints.

The rest of the paper is organized as follows. Section II gives a brief review of finite field serial/parallel multiplier designs using D-FFs and T-FFs respectively. In Section III, the D-FF and T-FF cells are designed using both CMOS and PTL circuits. Section IV compares the synthesis results of the two finite field multiplier designs. Finally, we present the conclusion in Section V.

II. SERIAL-PARALLEL MULTIPLIER USING T FLIP-FLOPS
A. Mathematical Formulation

Let the finite field over GF(2^m) be defined by an irreducible polynomial of degree m, given by

$$Q(z) = z^m + q_{m-1} z^{m-1} + \cdots + q_1 z + 1, \quad q_j \in GF(2). \tag{1}$$

Any two arbitrary elements A and B in GF(2^m) can be represented as

$$A = \sum_{j=0}^{m-1} a_j \alpha^j, \quad B = \sum_{j=0}^{m-1} b_j \alpha^j, \quad a_j, b_j \in GF(2), \tag{2}$$

where $\alpha$ is a root of Q(z). The product of A and B over GF(2^m) is given by

$$C = A \cdot B \bmod Q(z) = \sum_{i=0}^{m-1} b_i \left( \alpha^i A \bmod Q(z) \right). \tag{3}$$

Writing $A^i = \alpha^i A \bmod Q(z)$, where the superscript i is an iteration index and $A^0 = A$, $A^{i+1}$ can be obtained from $A^i$ recursively as

$$\begin{aligned} A^{i+1} &= \alpha A^i \bmod Q(z) \\ &= \left( a_0^i \alpha + a_1^i \alpha^2 + \cdots + a_{m-1}^i \alpha^m \right) \bmod Q(z) \\ &= a_0^{i+1} + a_1^{i+1} \alpha + \cdots + a_{m-1}^{i+1} \alpha^{m-1}. \end{aligned} \tag{4}$$

The derivation of (4) uses the following relation:

$$\alpha^m = q_{m-1} \alpha^{m-1} + \cdots + q_1 \alpha + 1, \quad q_j \in GF(2), \tag{5}$$

because $\alpha$ is a root of Q(z). Thus, we have

$$a_j^{i+1} = a_{j-1}^i \oplus \left( a_{m-1}^i \cdot q_j \right), \quad 1 \le j \le m-1, \qquad a_0^{i+1} = a_{m-1}^i. \tag{6}$$

The finite field multiplication is performed in two stages of recursive operations, where modular reduction is performed according to (6) in the first stage, and AND-accumulate is performed according to (3) and (4) in the second stage. Fig. 1 shows the implementation of the serial-parallel multiplier over GF(2^m) consisting of a reduction section and an AND-accumulate section. The D-FFs of the reduction section are initialized by the bits of the input word A, which get shifted from one D-FF to the next in each cycle across the reduction section. The reduction section consists of m reduction cells (RCs) shown in Fig. 1(b) and m D-FFs to perform successive reductions in every cycle according to (6). The i-th RC cell consists of an XOR gate if $q_i = 1$; otherwise, the RC cell could be removed.
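As an illustration, the two-stage recursion of (3), (4), and (6) can be simulated bit by bit in software. The following Python sketch is a behavioral model only, not the circuit of Fig. 1: the function names, the integer packing of the coefficient bits, and the example field GF(2^8) with Q(z) = z^8 + z^4 + z^3 + z + 1 are assumptions made for the illustration.

```python
def to_bits(value, m):
    """Return the m coefficient bits of value, least significant first."""
    return [(value >> j) & 1 for j in range(m)]


def from_bits(bits):
    """Pack coefficient bits (least significant first) into an integer."""
    return sum(bit << j for j, bit in enumerate(bits))


def serial_parallel_multiply(a, b, q, m):
    """Multiply a and b in GF(2^m), consuming one bit of b per simulated cycle.

    q packs the coefficients q_{m-1}..q_0 of Q(z) (the z^m term is implicit);
    q_0 must be 1, as in (1).
    """
    reg = to_bits(a, m)      # models the reduction-section D-FFs, holds A^i
    acc = [0] * m            # models the AND-accumulate section
    q_bits = to_bits(q, m)
    b_bits = to_bits(b, m)
    for i in range(m):
        # AND-accumulate stage, per (3): C <- C xor (b_i and A^i)
        acc = [c ^ (b_bits[i] & r) for c, r in zip(acc, reg)]
        # Reduction stage, per (6): A^{i+1} = alpha * A^i mod Q(z)
        msb = reg[m - 1]
        reg = [msb] + [reg[j - 1] ^ (msb & q_bits[j]) for j in range(1, m)]
    return from_bits(acc)


# Assumed example field: GF(2^8) with Q(z) = z^8 + z^4 + z^3 + z + 1.
Q_EXAMPLE = 0b00011011
print(hex(serial_parallel_multiply(0x57, 0x83, Q_EXAMPLE, 8)))  # prints 0xc1
```

The list comprehension in the reduction stage makes the role of each RC cell visible: position j needs an XOR only when q_bits[j] is 1, which mirrors the remark that the RC cell can otherwise be removed.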
Fig. 1. (a) Implementation of serial-parallel multiplier over GF(2^m) (overall architecture with D-FF), and (b) function of an RC cell.

B. Simplification with T Flip-Flops

The ((a · b) ⊕ c) operation in the feedback loop with a D-FF in the AND-accumulate section in Fig. 1(a) can be realized more efficiently by an extended sequential logic cell of T-FF along with an AND gate, as shown in Fig. 2(b). The T-FF can be implemented using a D-FF with the inverted output feeding back to the data input plus a gated clock signal, as shown in Fig. 2(c). Fig. 2(a) shows the overall architecture of the improved serial-parallel finite field multiplier based on the T-FF in Fig. 2(b). In general, the T-FF has the same circuit complexity as the D-FF. Note that the gated clock signal in Fig. 2(b) is shared among all the T-FFs in Fig. 2(a). Thus, the serial-parallel finite field multiplier design of Fig. 2(a) has a smaller area cost than the conventional design of Fig. 1(a).

Fig. 2. (a) Improved implementation of serial-parallel multiplier using T-FFs (overall architecture). (b) Realization of ((a · b) ⊕ c) using T-FF, and (c) the T-FF constructed from D-FF with gated clock signal.

III. IMPLEMENTATIONS USING PTL AND CMOS CIRCUITS
A. PTL Designs of Combinational Logic

From the two architectures of finite field serial-parallel multipliers (Figs. 1 and 2), we notice that the basic components are AND and XOR logic gates and D-FFs and/or T-FFs. It has been shown that PTL outperforms CMOS for some particular circuits that extensively use the XOR function [3-4]. For example, Fig. 3(a) shows PTL circuit designs of two-input XOR (XOR2) and two-input NAND (NAND2) gates using only four basic cells of MUX, INV, PINV, and NINV shown in Fig. 3(b). The PTL-based two-input AND (AND2) circuit is simply constructed from NAND2 with an output inverter, as shown in Fig. 3(a). In fact, all PTL-based combinational logic gates can be realized using only the four basic PTL cells in Fig. 3(b) through a multi-level PTL logic cell library [6]. In this paper, however, we only need XOR2 and AND2 to implement the finite field multiplier.

Table 1 compares XOR2, NAND2, and AND2 logic gates designed using CMOS and PTL under UMC 90nm technology. We find that the PTL-based XOR2 has lower overall power consumption (including dynamic AC power and static leakage DC power) and smaller area than the CMOS implementation. As power becomes one of the major design concerns, PTL, although not as robust as CMOS, still has its advantages over CMOS for some applications, even under nano-scale process technologies where leakage power takes a significant portion of total power.

As mentioned before, PTL and CMOS have their relative advantages in terms of area, speed and power. Thus, we can fully exploit the features of these logic design styles by including both PTL and CMOS cells during logic synthesis [5-6]. Fig. 4 shows a hybrid PTL/CMOS logic synthesis method embedded in the traditional standard cell-based design flow, with Synopsys Design Compiler (DC) for the front-end logic synthesis and Cadence SoC Encounter for the back-end placement and routing (P&R). This design flow allows us to perform logic synthesis of large systems based on pure PTL, pure CMOS, or hybrid PTL/CMOS cell library, under various design constraints.
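To make the idea of building gates from a small set of basic cells concrete, the following Python sketch models XOR2, NAND2, and AND2 at the logic level using only a 2:1 multiplexer and an inverter. It shows one common multiplexer-based decomposition and is an assumption for illustration; it does not reproduce the transistor-level circuits of Fig. 3, and the PINV and NINV cells are not modeled.

```python
from itertools import product


def mux(sel, in0, in1):
    """2:1 multiplexer: pass in1 when sel is 1, otherwise pass in0."""
    return in1 if sel else in0


def inv(x):
    """Inverter on a single bit."""
    return 1 - x


def xor2(a, b):
    # a selects between b and its complement.
    return mux(a, b, inv(b))


def nand2(a, b):
    # a = 0 forces the output to 1; a = 1 passes the complement of b.
    return mux(a, 1, inv(b))


def and2(a, b):
    # AND2 is NAND2 followed by an output inverter, as described in the text.
    return inv(nand2(a, b))


# Print the truth table of the three gates.
for a, b in product((0, 1), repeat=2):
    print(a, b, xor2(a, b), nand2(a, b), and2(a, b))
```

In this decomposition XOR2 needs one multiplexer and one inverter, while AND2 needs an extra output inverter on top of NAND2, which is consistent with XOR-rich logic being the case where PTL is reported to be attractive.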
Fig. 3. (a) PTL designs of XOR2, NAND2, and AND2 circuits designed using (b) four basic PTL cells (MUX, INV, PINV, and NINV).

Table 1. Comparison of XOR2, NAND2, and AND2 designs. UMC 90nm (Vdd=1.0V).

Logic circuit    Area A (um^2)   Delay D (ns)   Power P (uW@100MHz)   A*D*P
XOR2   PTL       9.4             0.46           0.24                  1.0
XOR2   CMOS      16.5            0.40           0.32                  2.1
NAND2  PTL       7.1             0.46           0.17                  0.6
NAND2  CMOS      5.5             0.20           0.26                  0.3
AND2   PTL       9.4             0.49           0.21                  0.96
AND2   CMOS      8.6             0.23           0.38                  0.75

Fig. 4. Hybrid PTL/CMOS synthesis flow.

B. PTL Designs for D Flip-Flop and T Flip-Flop

Fig. 5(a) is a CMOS D-FF circuit with active-low synchronous reset. Fig. 5(b) shows the PTL implementation of the D-FF circuit. CMOS and PTL implementations of the T-FF are shown in Fig. 5(c) and Fig. 5(d) respectively. Table 1 compares the post-layout simulation results of various D-FF and T-FF cell circuits based on UMC 90nm process technology. The numbers in parentheses denote the reduction rates of area, delay and power of PTL circuits compared with CMOS implementations. Note that the PTL-based D-FF (T-FF) circuits, requiring only one phase of the (gated) clock signal, have smaller area cost and lower power consumption. The smaller load burden for the clock signal in the PTL implementations is an important consideration in current System-on-a-Chip (SoC) designs because the power dissipated in the clock distribution takes a significant portion of total power in large system designs. All these designs of D-FF and T-FF cells are included in the hybrid PTL/CMOS standard cell library during logic synthesis so that Synopsys Design Compiler can fully exploit the respective circuit features of PTL and CMOS and produce the best synthesized circuits that satisfy user-specified design constraints.

Fig. 5. (a) CMOS D-FF circuit with synchronous reset provision. (b) PTL circuit for D-FF. (c) CMOS circuit for T-FF. (d) PTL circuit for T-FF.

Table 1. Post-layout simulation results of D-FF and T-FF designed using CMOS and PTL under UMC 90nm technology.

Flip-flop circuit   Area A (um^2)   Delay D (ps)   Power P (uW)   A*D*P        clk load (fF)
D-FF  CMOS          22.736          87.21          0.692          1372         6+6
D-FF  PTL           10.192 (55%)    44.32 (49%)    0.545 (21%)    246 (82%)    3
T-FF  CMOS          34.496          124.7          1.478          6356         5.25
T-FF  PTL           19.6 (43%)      98.64 (21%)    1.078 (27%)    2084 (67%)   5.25
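The T-FF cells characterized in this section are intended to replace the D-FF accumulate loop described in Section II.B. A behavioral sketch can help confirm that the two forms compute the same sequence. The Python model below is an illustration under stated assumptions: the T-FF is modeled as a D-FF whose data input is its own inverted output and whose clock is enabled only when (a AND b) is 1. The function names and the choice of gating signal are assumptions, since the exact wiring of Fig. 2 is not reproduced here.

```python
import random


def dff_accumulate(a_seq, b_seq):
    """D-FF with XOR feedback: Q <- (a AND b) XOR Q on every clock edge."""
    q = 0
    trace = []
    for a, b in zip(a_seq, b_seq):
        q = (a & b) ^ q
        trace.append(q)
    return trace


def tff_accumulate(a_seq, b_seq):
    """T-FF built from a D-FF with D = NOT Q and a gated clock.

    The flip-flop only sees a clock edge when the gate (a AND b) is 1,
    so it toggles on those cycles and holds its state otherwise.
    """
    q = 0
    trace = []
    for a, b in zip(a_seq, b_seq):
        clock_enable = a & b      # assumed gating signal
        if clock_enable:
            q = 1 - q             # D = NOT Q is captured
        trace.append(q)
    return trace


# Compare the two models on a reproducible random stimulus.
rng = random.Random(0)
a_bits = [rng.randint(0, 1) for _ in range(32)]
b_bits = [rng.randint(0, 1) for _ in range(32)]
print(dff_accumulate(a_bits, b_bits) == tff_accumulate(a_bits, b_bits))
```

The equivalence follows from the identity x XOR q = NOT q when x = 1 and x XOR q = q when x = 0, so the XOR gate in the feedback path can be absorbed into the toggle behavior of the flip-flop.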
IV. EXPERIMENTAL RESULTS

Table 2 shows the synthesis results for the two serial-parallel finite field multipliers over GF(2^8) of Figs. 1 and 2 using pure CMOS, pure PTL, and hybrid PTL/CMOS cells, with Synopsys Design Compiler as the logic synthesis tool under the same design constraint of area optimization. The post-synthesis simulation results are based on UMC 90nm process technology. Fig. 6 shows the area profiling of the major components (D-FFs, T-FFs, and other combinational logic). Fig. 7 shows the cell utilization rate of the various PTL and CMOS cells in the hybrid PTL/CMOS synthesis results. It is observed that in the hybrid design of Fig. 1, CMOS cells are used only for AND2 logic operations, which take about 13% of the total area cost. All other cells are PTL circuits. In the hybrid design of Fig. 2, since most of the AND2 operations are merged inside the T-FF cells, the remaining CMOS AND2 gates take only about 2% of the total area.

From Table 2, we observe that hybrid PTL/CMOS circuits have almost the same area cost as the pure PTL results because, in the area-optimized synthesis, CMOS logic has advantages only in the design of AND2 gates, which take a small portion of the total area, as shown in Fig. 7. We also notice that the improved finite field multiplier designs of Fig. 2 with T-FFs have smaller area and lower power than their counterpart designs of Fig. 1 with D-FFs. This demonstrates the efficiency of merging the finite field accumulation with D-FFs into T-FFs, mentioned in Section II.
Table 2. Synthesis results of two serial-parallel finite field multipliers over GF(2^8) using PTL and CMOS cells under UMC 90nm technology.

Serial-parallel multiplier over GF(2^8)   Area A (um^2)   Delay D (ns)   Power P (uW@100MHz)   A*D*P
Fig. 1 (D-FF)  CMOS                       683             0.09           34.05                 2093
Fig. 1 (D-FF)  PTL                        363             0.04           24.52                 356
Fig. 1 (D-FF)  hybrid                     350             0.04           28.05                 392
Fig. 2 (T-FF)  CMOS                       584             0.12           25.46                 1784
Fig. 2 (T-FF)  PTL                        313             0.1            19.47                 609
Fig. 2 (T-FF)  hybrid                     311             0.1            19.68                 612

Fig. 7. Cell utilization rate (%) in the hybrid PTL/CMOS synthesis for two serial-parallel finite field multipliers of (a) Fig. 1 and (b) Fig. 2.
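The area and power comparisons drawn from Table 2 can be checked with a few lines of arithmetic. The Python sketch below copies the area, delay, and power values from Table 2, recomputes the A*D*P product, and derives relative savings. The dictionary layout and helper names are assumptions for illustration, and the percentages it prints are derived here rather than quoted from the paper.

```python
# (area um^2, delay ns, power uW@100MHz), copied from Table 2.
TABLE2 = {
    ("Fig. 1", "CMOS"): (683, 0.09, 34.05),
    ("Fig. 1", "PTL"): (363, 0.04, 24.52),
    ("Fig. 1", "hybrid"): (350, 0.04, 28.05),
    ("Fig. 2", "CMOS"): (584, 0.12, 25.46),
    ("Fig. 2", "PTL"): (313, 0.1, 19.47),
    ("Fig. 2", "hybrid"): (311, 0.1, 19.68),
}


def adp(area, delay, power):
    """Area-delay-power product used as the combined figure of merit."""
    return area * delay * power


def saving(baseline, value):
    """Relative saving of value with respect to baseline, in percent."""
    return 100.0 * (baseline - value) / baseline


for style in ("CMOS", "PTL", "hybrid"):
    a1, d1, p1 = TABLE2[("Fig. 1", style)]
    a2, d2, p2 = TABLE2[("Fig. 2", style)]
    print(
        f"{style}: A*D*P Fig. 1 = {adp(a1, d1, p1):.0f}, "
        f"Fig. 2 = {adp(a2, d2, p2):.0f}, "
        f"area saving = {saving(a1, a2):.1f}%, "
        f"power saving = {saving(p1, p2):.1f}%"
    )
```

Keeping area, delay, and power as separate fields rather than only the product makes it easy to see which metric drives a change in A*D*P between the two architectures.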
Fig. 6. Area profiling of major components in the finite field multiplier designs of Fig. 1 and Fig. 2 using pure CMOS, pure PTL, and hybrid PTL/CMOS cells.

V. CONCLUSIONS

We have designed PTL-based D flip-flops and T flip-flops for use in finite field multiplication. Two serial-parallel multipliers over the finite field GF(2^m) have been presented based on standard cells designed using pure CMOS, pure pass transistor logic (PTL), and hybrid PTL/CMOS circuits. The multipliers are synthesized using the conventional ASIC cell-based design flow with Synopsys Design Compiler as the logic synthesis tool. Experimental results show that the proposed finite field multiplier synthesized using PTL or hybrid PTL/CMOS cells can achieve the best results in terms of area, speed, and power. The hybrid PTL/CMOS library can also be employed to synthesize any other large System-on-a-Chip (SoC) design modeled in a hardware description language.

ACKNOWLEDGMENT

This work is supported in part by Taiwan's National Science Council (NSC) under grant NSC 98-2220-E-110-004. The Chip Implementation Center (CIC) in Taiwan provides the support of process technology files and related EDA tools.

REFERENCES

[1] P. K. Meher, "Extended Sequential Logic for Synchronous Circuit Optimization and Its Applications," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 28, no. 4, pp. 469-477, Apr. 2009.
[2] P. K. Meher, "On Efficient Implementation of Accumulation in Finite Field Over GF(2^m) and Its Applications," IEEE Trans. VLSI Systems, vol. 17, no. 4, pp. 541-550, Apr. 2009.
[3] K. Yano, Y. Sasaki, K. Rikino, and K. Seki, "Top-down Pass-Transistor Logic Design," IEEE Journal of Solid-State Circuits, vol. 31, no. 6, pp. 792-803, June 1996.
[4] R. S. Shelar and S. S. Sapatnekar, "BDD Decomposition for Delay Oriented Pass Transistor Logic Synthesis," IEEE Trans. VLSI Systems, vol. 13, no. 8, pp. 957-970, Aug. 2005.
[5] G. R. Cho and T. Chen, "Synthesis of Single/Dual-Rail Mixed PTL/Static Logic for Low-Power Applications," IEEE Trans. Computer-Aided Design of Integrated Circuits and Systems, vol. 23, no. 2, pp. 229-242, Feb. 2004.
[6] S.-F. Hsiao, M.-Y. Tsai, and C.-S. Wen, "Low Area/Power Synthesis Using Hybrid Pass Transistors/CMOS Logic Cells in Standard Cell-Based Design Environment," to appear in IEEE Trans. Circuits and Systems-II.
