Professional Documents
Culture Documents
Fig. 4. The detection logic circuits (at the left-hand side of the dotted line)
using an AND gate to assert the control signals.
2AC9 0010101011001001
006A x) 00000000011010100
3140
power consumption caused by the transient signals. dissipates only 0.0121 mW per MHz when computing the
According to the analysis of the multiplication shown in Fig. multiplication of texture coding in H.264.
5, we propose the SPST-equipped modified-Booth encoder,
which is controlled by a detection unit. The detection unit has
one of the two operands as its input to decide whether the
Booth encoder calculates redundant computations. As shown
in Fig. 6, the latches can respectively freeze the inputs of
MUX-4 to MUX-7 or only those of MUX-6 to MUX-7 when
the PP4 to PP7 or only the PP6 to PP7 are zero, to reduce the
transition power dissipation. Such cases occur frequently in
e.g. FFT/IFFT, DCT/IDCT and Q/IQ adopted in encoding or
decoding multimedia data.
3141
power saving becomes negative under some conditions with and the switching activities are expected to be the same.
large input data bit-widths. From Fig. 8, we can also know Furthermore, we plot the scaled results of the 16 × 16-bit
that the power saving of the SPST-equipped multiplier using multipliers listed in Table 2 in terms of power per unit
registers is slightly larger than that of the SPST-equipped frequency in Fig. 9. From this figure, we can also find that the
multiplier using AND gates. Nevertheless, using registers proposed multiplier has better power efficiency than the
results in longer critical path delay as shown in Table 1. existing multipliers. Designs [2], [5], and [7] are 32-bit
multipliers which are considered as reference indexes only.
TABLE 2
THE PERFORMANCE COMPARISON WITH THE EXISTING MULTIPLIERS V. CONCLUSION
Delay This study proposes a multiplier adopting the new SPST
Design Feature Tech. Power (mW) Area
(ns)
implementing approach. The simulation results show that the
(1) 19.65 for JPEG
Huang
(1) 32b × 32b 0.18 µm (2) 40.65@ 100MHz for 7.25
74598 power reduction of the new approach, i.e. a 40% saving, is
[2] (tr.) very close to that of the former approach. Besides, the new
random data
(1)Coprocessor approach leads to a 40% speed improvement when compared
Liao [5] (2) SIMD 0.18 µm 900@1.6V, 800MHz 1.25 N.A. with the former one. When implemented in a 0.18-µm
(3) 32b MAC
(1) 32b × 32b 0.35 µm/ 19743
CMOS technology, the proposed SPST-equipped multiplier
Wang [7] 79.86 14.01 dissipates only 0.0121 mW per MHz in H.264 texture coding.
(2)Fixed-width 3.3V (gate)
17.30 for normal 0.337 In addition, the performance of the proposed design under the
Chen [1] (1) 16b × 16b 0.25 µm 8.3
distribution inputs (mm2) conditions of different bit-width input data is explored. The
Scalable length 0.13 1.04@100MHz for random 6388 results also show that the new SPST approach not only owns
Lee [4] N.A.
of 4b, 8b, 16b µm/1.2V data (gate)
equivalent low-power performance but also leads to a higher
(1) 16b × 16b (1) 9@1.3V, 1GHz
Hsu [9] (2) Sleep mode 90 nm (1) 7.9 × 10−2 @50MHz, 1
0.03 maximum speed when compared with the former SPST
(mm2) approach. Moreover, the proposed SPST-equipped multiplier
(3) Duel VT 0.57V
(1) 1.21@100MHz, 1.8V also has better power efficiency when compared with the
(1) 16b × 16b
for H.264 texture coding
11028 existing modern multipliers.
Proposed 0.18 µm (2) 2.82@100MHz, 1.8V 5
(2) SPST (tr.)
for avg. normal REFERENCES
distribution inputs
[1] O. Chen, S. Wang, and Y. W. Wu, “Minimization of switching activities
of partial products for designing low-power multipliers,” IEEE Trans.
0.08 VLSI, vol. 11, no. 3, pp. 418-433, June 2003.
0.07 [2] Z. Huang, and M.D. Ercegovac, ”High-performance low-power
0.06 left-to-right array multiplier design,” IEEE Trans. on Computers, vol.
54, no. 3, pp. 272-283, Mar. 2005.
Scaled P/f
0.05
[3] M. C. Wen, S. J. Wang; Y. N. Lin, “Low-power parallel multiplier with
0.04 column bypassing,” Electronic Letters, vol. 41, no.12, pp. 581-583,
0.03 May 2005.
0.02 [4] H. Lee, “A power-aware scalable pipelined Booth multiplier,” IEEE Int.
SOC Conf., pp.123-126, Sep. 2004.
0.01
[5] Y. Liao, and D.B. Roberts, “A high-performance and low-power 32-bit
0 multiply-accumulate unit with single-instruction- multiple-data
[1] [4] [9] Proposed
(SIMD) feature,” IEEE J. of Solid-State Circuits, vol. 37, no. 7, pp.
926-931, July 2002.
Fig. 9. The comparison of power efficiency [6] A. Danysh, and D. Tan, “Architecture and implementation of a
vector/SIMD multiply-accumulate unit,” IEEE Trans. on Computers,
The performance comparison of the proposed multiplier vol. 54, no. 3, pp. 284-293, Mar. 2005.
with some existing designs, which report more complete [7] J. S. Wang, C. N. Kuo, and T. H. Yang, “Low-power fixed-width array
information, is listed in Table 2. To fasten the comparison multipliers,” IEEE Symp. Low Power Electronics and Design, pp.
307-312, 9-11, Aug. 2004.
fairly, we scale the power results to the same technology, i.e. [8] W. C Yeh, C. W. Jen, “High-speed Booth encoded parallel multiplier
0.18 µm technology with 1.8V, according to the geometric design,” IEEE Trans. Computer, vol. 49, no. 7, pp.692-701, July 2000.
feature and the well-known power approximation [9] S. K. Hsu, S. K. Mathew, M. A. Anders, B. R. Zeydel, V. G. Oklobdzija,
R. K. Krishnamurthy, and S. Y. Borkar, “A 110 GOPS/W 16-bit
P = α CV 2 f
, where P, α i , C, V, and f denote power
∑ i multiplier and reconfigurable PLA loop in 90-nm CMOS,” IEEE J.
i Solid-State Circuits, vol. 41, no. 1, pp. 256-264, Jan. 2006.
dissipation, switching activity, capacitance, voltage, and [10] K. H. Chen, K. C. Chao, J. I. Guo, J. S. Wang, and Y. S. Chu, "An
frequency, respectively. The scaling formulation used Efficient Spurious Power Suppression Technique (SPST) and its
Applications on MPEG-4 AVC/H.264 Transform Coding Design,"
is Porg (extended from [12]) IEEE Int. Symp. Low Power Electronics Design, San Diego, California,
Psca =
(Techorg / 0.18) 2 * (Vorg / 1.8) 2 Aug. 8-10 2005.
[11] K. H. Chen, Y. M. Chen, and Y. S. Chu, “A Versatile Multimedia
where Psca , Porg , Techorg , and Vorg denote the scaled power Functional Unit Design Using the Spurious Power Suppression
Technique,” IEEE Asian Solid-State Circuits Conf., Hangzhou, China,
dissipation, original power dissipation, line width in the Nov. 2006.
original technology, and original voltage, respectively. The [12] Y. W. Lin, H. Y. Liu, and C. Y. Lee, “A dynamic scaling FFT processor
capacitance is assumed to be proportional to the silicon area, for DVB-T applications,” IEEE J. Solid-State Circuits, vol. 39, no. 11,
pp. 2005-2013, Nov. 2004.
3142