I. INTRODUCTION
Fixed-width multipliers are widely used in digital signal processing
(DSP) applications [1]–[4], such as the fast Fourier transform [2] and
the discrete cosine transform [3], [4]. To generate an output with the
same width as the input, fixed-width multipliers truncate the least
significant half of the product bits. Thus, truncation errors occur in
fixed-width multiplier designs. The fixed-width multiplier with the
highest accuracy is called a posttruncated (P-T) multiplier, which
truncates the least significant half of the result after calculating
all partial products. However, a P-T multiplier requires a large
circuit area to calculate the partial products of the truncated part.
By contrast, a direct-truncated (D-T) multiplier discards the partial
products of the least significant half directly to conserve circuit
area, but produces a large truncation error.
To achieve a balanced design between accuracy (P-T) and area cost
(D-T), several researchers have presented various error-compensated
circuits to alleviate the truncation errors in Baugh-Wooley (BW)
multipliers [5]–[10] and Booth multipliers [11]–[21]. Because fewer
partial products are truncated after Booth encoding, Booth multipliers
have a smaller truncation error than BW multipliers [17]. Therefore,
many previous works have focused on compensation circuits for
Booth multipliers [11]–[21]. Song et al. [18] present a binary
threshold based on statistical analysis. Their compensation circuit
consumes a large circuit area because of the complex curve fitting
required for the statistical analysis. Wang et al. [19] use more product
information to improve accuracy, but their exhaustive simulation
requires a considerable amount of time to establish the compensation
circuit. To reduce this time, Li et al. [20] present a probabilistic
estimation bias (PEB) circuit that substantially reduces the calculation
time. An adaptive conditional-probability estimator (ACPE) [21]
improves the accuracy by using conditional probability and further
introduces the column information w for adjusting the accuracy
Manuscript received August 23, 2013; revised November 30, 2013 and
January 21, 2014; accepted January 22, 2014. Date of publication February 11,
2014; date of current version January 16, 2015. This work was supported by
the Chip Implementation Center and National Science Council (NSC) under
Project CIC T18-102C-N0001, Project NSC 102-2221-E-033-030, and Project
NSC 101-2218-E-033-005.
The author is with the Department of Information and Computer Engineering, Chung Yuan Christian University, Zhongli 320, Taiwan (e-mail:
yhchen@cycu.edu.tw).
Color versions of one or more of the figures in this paper are available
online at http://ieeexplore.ieee.org.
Digital Object Identifier 10.1109/TVLSI.2014.2302447
TABLE I
MAPPED TABLE OF A MODIFIED BOOTH ENCODER
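\[
\begin{aligned}
A &= -a_{L-1}\,2^{L-1} + \sum_{i=0}^{L-2} a_i\,2^{i} \\
B &= -b_{L-1}\,2^{L-1} + \sum_{i=0}^{L-2} b_i\,2^{i} \\
P &= A \times B.
\end{aligned}
\tag{1}
\]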
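As a concrete check of the two's-complement definitions in (1), the following Python sketch recodes B into radix-4 modified Booth digits and confirms that the digits reproduce both B and the product P = A × B. The recoding rule b'_j = −2·b_{2j+1} + b_{2j} + b_{2j−1} with b_{−1} = 0 is the standard modified Booth rule and is assumed here to agree with Table I; the function names `to_signed` and `booth_digits` are illustrative and do not come from this brief.

```python
def to_signed(x, L):
    """Interpret the low L bits of x as a two's-complement integer, as in (1)."""
    x &= (1 << L) - 1
    return x - (1 << L) if x >> (L - 1) else x


def booth_digits(b_raw, L):
    """Radix-4 modified Booth digits b'_j in {-2, -1, 0, 1, 2}, j = 0..L/2-1.

    Assumes L is even and b_{-1} = 0 (standard modified Booth recoding).
    """
    bits = [(b_raw >> i) & 1 for i in range(L)]
    digits = []
    for j in range(L // 2):
        b_prev = bits[2 * j - 1] if j > 0 else 0
        digits.append(-2 * bits[2 * j + 1] + bits[2 * j] + b_prev)
    return digits


# Exhaustive check for L = 8: the digits rebuild B, and the digit-weighted
# copies of A sum to the full product P = A * B.
L = 8
for a_raw in range(1 << L):
    A = to_signed(a_raw, L)
    for b_raw in range(1 << L):
        B = to_signed(b_raw, L)
        d = booth_digits(b_raw, L)
        assert sum(dj * 4**j for j, dj in enumerate(d)) == B
        assert sum(dj * A * 4**j for j, dj in enumerate(d)) == A * B
```

Each digit selects 0, ±A, or ±2A, so an L-bit multiplication needs only L/2 partial-product rows, which is why fewer products fall into the truncated part than in an array multiplier.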
1063-8210 © 2014 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications_standards/publications/rights/index.html for more information.
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS, VOL. 23, NO. 1, JANUARY 2015
TABLE II
PARTIAL PRODUCTS FOR AN EIGHT-BIT BOOTH ENCODER
Fig. 1.
Fig. 2.
Fig. 3.
\[
\begin{aligned}
T_1 &= 2^{-1}\,(p_{L-1,0} + p_{L-3,1} + \cdots + p_{1,Q-1}) \\
T_2 &= 2^{-2}\,(p_{L-2,0} + p_{L-4,1} + \cdots + n_{Q-1}) \\
&\;\;\vdots \\
T_L &= 2^{-L}\,(p_{0,0} + n_0).
\end{aligned}
\]
With the column information $w$, the terms $TP_{mj}$ and $TP_{mi}$ can be
expressed as the following equations:
\[ TP_{mj} = T_1 + T_2 + \cdots + T_w \tag{6} \]
\[ TP_{mi} = G_0 + G_1 + \cdots + G \tag{7} \]
The major term $TP_{mj}$ provides exact information, whereas the minor term
$TP_{mi}$ is estimated using the proposed MLCP method. Thus,
the compensation bias is obtained by summing the computed $TP_{mj}$ and
the estimated $TP_{mi}$.
\[ G_{Q-1} = 2^{-2}\,(p_{0,Q-1} + n_{Q-1}) \]
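The split of the truncation part into a major term and a minor term can be sketched in Python as follows. The sketch rests on several assumptions that are not taken from this brief: each Booth row is modeled as an (L+1)-bit pattern p_{L,j} ... p_{0,j} that selects 0, A, or 2A and is bitwise inverted for a negative digit, with n_j = 1 supplying the missing +1 in the column of p_{0,j}; the minor term is taken to be the sum of the remaining columns T_{w+1} to T_L; and the names `booth_rows`, `column_sums`, and `split_tp` are hypothetical.

```python
from fractions import Fraction


def booth_rows(a_raw, b_raw, L):
    """Return one (pattern, n_j) pair per Booth row (assumed row model).

    pattern holds the bits p_{L,j} .. p_{0,j}; n_j is 1 for a negative digit.
    """
    mask = (1 << (L + 1)) - 1
    A = a_raw - (1 << L) if a_raw >> (L - 1) else a_raw  # signed value of A
    bits = [(b_raw >> i) & 1 for i in range(L)]
    rows = []
    for j in range(L // 2):
        b_prev = bits[2 * j - 1] if j > 0 else 0
        d = -2 * bits[2 * j + 1] + bits[2 * j] + b_prev   # Booth digit b'_j
        neg = 1 if d < 0 else 0
        pattern = (abs(d) * A) & mask                     # 0, A, or 2A
        if neg:
            pattern ^= mask                               # one's complement
        rows.append((pattern, neg))
    return rows


def column_sums(rows, L):
    """T[t] = number of ones in the column of weight 2^-t, for t = 1..L."""
    T = [0] * (L + 1)
    for j, (pattern, n_j) in enumerate(rows):
        for i in range(L + 1):
            pos = 2 * j + i                               # bit position of p_{i,j}
            if pos < L and (pattern >> i) & 1:
                T[L - pos] += 1
        T[L - 2 * j] += n_j                               # n_j shares the column of p_{0,j}
    return T


def split_tp(a_raw, b_raw, L, w):
    """Return (TP_mj, TP_mi): columns 1..w and columns w+1..L of the truncation part."""
    T = column_sums(booth_rows(a_raw, b_raw, L), L)
    tp_mj = sum(Fraction(T[t], 2**t) for t in range(1, w + 1))
    tp_mi = sum(Fraction(T[t], 2**t) for t in range(w + 1, L + 1))
    return tp_mj, tp_mi


# Sanity check for L = 6, w = 1: the truncated bits, taken together, must agree
# with the low L bits of the exact product.
L, w = 6, 1
for a_raw in range(1 << L):
    for b_raw in range(1 << L):
        tp_mj, tp_mi = split_tp(a_raw, b_raw, L, w)
        low_bits = (tp_mj + tp_mi) * 2**L
        assert low_bits.denominator == 1
        assert low_bits % 2**L == (a_raw * b_raw) % 2**L
```

Raising w moves more columns from the estimated minor term into the exactly computed major term, which is the accuracy-versus-area knob that the column information w provides.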
Fig. 4. Examples of an 8 × 8 MLCP Booth multiplier with w = 1 and nonzero codes 1111, 1110, and 0101. (a) MLCP for z = 1111. (b) MLCP for z = 1110. (c) MLCP for z = 0101.
\[ P[p_{0,1} = 1] = \cdots = \frac{4}{9}. \tag{8} \]
Fig. 4(a) shows the expected values of all elements in $TP_{mi}$. A
regular rule can be observed: the expected values of all products with
corresponding $z_j = 1$ are equal to 1/2, except for $p_{0,j}$ and $n_j$.
The expected values of $p_{0,j}$ and $n_j$ depend strongly on the number
of nonzero codes ($z_j = 1$); that is, the position of the $z_j$ code
affects the expected value. The expected values of $p_{0,j}$ and $n_j$ can
be summarized as follows:
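\[
E[n_j \mid z],\; E[p_{0,j} \mid z] =
\begin{cases}
\dfrac{1}{2}\cdot\dfrac{3^{k} + 1}{3^{k}}, & \text{as } k = \text{odd} \\[2mm]
\dfrac{1}{2}\cdot\dfrac{3^{k} - 1}{3^{k}}, & \text{as } k = \text{even}
\end{cases}
\tag{9}
\]
where
\[ k = \sum_{i=0}^{j} z_i. \]
\[ E[p_{0,3} \mid z = 1111] = \frac{1}{2}\cdot\frac{3^{k} - 1}{3^{k}}\bigg|_{k=4} = \frac{40}{81} \]
\[ E[p_{0,2} \mid z = 1110] = \frac{1}{2}\cdot\frac{3^{k} - 1}{3^{k}}\bigg|_{k=2} = \frac{4}{9} \]
\[ E[n_0 \mid z = 0101] = \frac{1}{2}\cdot\frac{3^{k} + 1}{3^{k}}\bigg|_{k=1} = \frac{2}{3}. \]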
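The closed form in (9) can be checked by exhaustive enumeration. The Python sketch below does this for L = 8 under stated assumptions: B and a_0 are uniformly distributed, n_j is 1 exactly when the Booth digit of row j is negative, and p_{0,j} is the least significant bit of the selected multiple (a_0 for ±A, 0 for ±2A), inverted for a negative digit. This bit-level model and the helper names are assumptions made for illustration, because the partial-product table is not reproduced here.

```python
from fractions import Fraction


def booth_codes(b_raw, L):
    """Radix-4 modified Booth digits of the L-bit word b_raw (b_{-1} = 0)."""
    bit = lambda i: (b_raw >> i) & 1 if i >= 0 else 0
    return [-2 * bit(2 * j + 1) + bit(2 * j) + bit(2 * j - 1) for j in range(L // 2)]


def formula_9(k):
    """Right-hand side of (9): (3^k + 1)/(2*3^k) for odd k, (3^k - 1)/(2*3^k) for even k."""
    return Fraction(3**k + (1 if k % 2 else -1), 2 * 3**k)


def expectations(z, j, L=8):
    """Exact E[n_j | z] and E[p_{0,j} | z] under the assumed bit model.

    z is written most significant code first (z_{Q-1} ... z_0), e.g. "1110".
    """
    target = [int(c) for c in reversed(z)]           # target[i] = z_i
    n_sum = p_sum = count = 0
    for b_raw in range(1 << L):
        d = booth_codes(b_raw, L)
        if [int(x != 0) for x in d] != target:
            continue                                 # keep only words matching z
        neg = int(d[j] < 0)                          # assumed n_j
        for a0 in (0, 1):                            # only a_0 affects p_{0,j}
            lsb = a0 if abs(d[j]) == 1 else 0        # LSB of A, or of 2A (always 0)
            n_sum += neg
            p_sum += lsb ^ neg                       # assumed p_{0,j}
            count += 1
    return Fraction(n_sum, count), Fraction(p_sum, count)


# Compare the enumeration with (9) for every pattern z and every row j with z_j = 1.
Q = 4
for pattern in range(1, 1 << Q):
    z = format(pattern, "04b")
    z_bits = [int(c) for c in reversed(z)]
    for j in range(Q):
        if z_bits[j]:
            k = sum(z_bits[: j + 1])                 # nonzero codes in rows 0..j
            assert expectations(z, j) == (formula_9(k), formula_9(k))
```

Under this model the enumeration reproduces the three worked values above (40/81, 4/9, and 2/3). The dependence on k arises because adjacent Booth codes share one bit of B, so every nonzero code in rows 0 to j shifts the probability that row j is negative.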
(15)
TPmi = 0
(16)
where
=
zi .
(17)
i=0
E i Sone 2w
(18)
i=0
where
\[
S_{one} =
\begin{cases}
\dfrac{1}{2}, & \text{as } z \neq 00\cdots0 \\[2mm]
0, & \text{as } z = 00\cdots0.
\end{cases}
\tag{19}
\]
= E0 + E1 + + E
1
2w .
2
Case 3: = 0
ik=0
E[ p0,3 |z = 1111] =
(14)
Case 2: = even
(9)
2w .
2
TPmi
(20)
Fig. 5.
Fig. 6.
designed by adding all one values for two's-complement representation. Using L = 16 and w = 3 as an example, the $S_{one}$ in (19) can
be expressed as follows:
+ 4 b1111
.
(21)
Sone =
2
The proposed MLCP circuits depend on , thus, various word lengths
L and column information w can use the same MLCP circuit. Using
= 6 as an example, the MLCP circuit (Fig. 5) can be employed,
yielding L = 16 with w = 3, L = 16 with w = 2, and L = 14 with
w = 1, and so on. Fig. 6 shows the use of the MLCP circuit with
corresponding w.
Because the proposed MLCP method is based on conditional probability,
it establishes the compensation circuit in considerably less time than
the exhaustive and time-consuming heuristic simulation methods
[13], [14], [18], [19]. Therefore, the proposed MLCP compensation
circuit can readily be applied to Booth multipliers with large bit
widths (L > 16), and its accuracy can be adjusted by changing the
column information w.
IV. COMPARISONS AND DISCUSSION
This section presents a comparison of the accuracy, area cost, and
computation delay of fixed-width Booth multipliers.
A. Accuracy
In this brief, the average absolute error $|\varepsilon|$ is used to
compare accuracy. It is defined as follows:
\[ |\varepsilon| = E[\,|P - P_q|\,]/2^{L}. \tag{22} \]
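The metric in (22) can be evaluated exhaustively for small word lengths. The following Python sketch does so for the two reference designs, P-T and D-T, under assumptions that are not taken from this brief: P-T is modeled as plain truncation of the exact product (no rounding), D-T is modeled as the exact product minus every Booth partial-product bit below position L, and the Booth rows follow the usual select-and-invert model with n_j added at the row's least significant bit. All function names are hypothetical.

```python
from fractions import Fraction


def signed(x, L):
    """Two's-complement value of the L-bit word x."""
    return x - (1 << L) if x >> (L - 1) else x


def truncated_part(a_raw, b_raw, L):
    """Weighted sum of all Booth partial-product bits at positions below L."""
    mask = (1 << (L + 1)) - 1
    A = signed(a_raw, L)
    total = 0
    for j in range(L // 2):
        b_hi = (b_raw >> (2 * j + 1)) & 1
        b_mid = (b_raw >> (2 * j)) & 1
        b_lo = (b_raw >> (2 * j - 1)) & 1 if j else 0
        d = -2 * b_hi + b_mid + b_lo              # Booth digit of row j
        pattern = (abs(d) * A) & mask             # 0, A, or 2A in L+1 bits
        if d < 0:
            pattern ^= mask                       # one's complement; n_j adds the +1
        width = L - 2 * j                         # row bits that fall below 2^L
        low = pattern & ((1 << width) - 1)
        total += (low + (d < 0)) << (2 * j)
    return total


def post_truncated(a_raw, b_raw, L):
    """P-T model: exact product with the low L bits cleared (assumed: no rounding)."""
    P = signed(a_raw, L) * signed(b_raw, L)
    return (P >> L) << L


def direct_truncated(a_raw, b_raw, L):
    """D-T model: the truncated columns are never added, so their sum is lost."""
    P = signed(a_raw, L) * signed(b_raw, L)
    return P - truncated_part(a_raw, b_raw, L)


def average_absolute_error(fixed_width, L):
    """Evaluate (22) over all 2^(2L) input pairs, as an exact fraction."""
    total = 0
    for a_raw in range(1 << L):
        for b_raw in range(1 << L):
            P = signed(a_raw, L) * signed(b_raw, L)
            total += abs(P - fixed_width(a_raw, b_raw, L))
    return Fraction(total, (1 << (2 * L)) * (1 << L))


L = 8
eps_pt = average_absolute_error(post_truncated, L)
eps_dt = average_absolute_error(direct_truncated, L)
print(float(eps_pt), float(eps_dt))
```

Any compensated design can be scored in the same way by passing a function that returns its fixed-width product, which places it between the D-T and P-T reference points.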
Table III shows the $|\varepsilon|$ for D-T, P-T, the proposed MLCP estimator,
and previous works [17]–[19] and [21]. The $|\varepsilon|$ is the
most crucial metric for comparing the accuracy of fixed-width
Booth multipliers. Table III shows that the proposed MLCP Booth
multipliers perform well for various bit widths L and values of the
column information w. Because of the structure in [19] and [21],
TABLE IV
COMPARISONS OF AREA COST (μm²) AND DELAY (ns) FOR VARIOUS METHODS
Fig. 7.
REFERENCES
[1] K. K. Parhi, VLSI Digital Signal Processing Systems: Design and
Implementation. New York, NY, USA: Wiley, 1999.
[2] S. N. Tang, J. W. Tsai, and T. Y. Chang, "A 2.4-Gs/s FFT processor for
OFDM-based WPAN applications," IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 57, no. 6, pp. 451–455, Jun. 2010.
[3] S. C. Hsia and S. H. Wang, "Shift-register-based data transposition for
cost-effective discrete cosine transform," IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 15, no. 6, pp. 725–728, Jun. 2007.
[4] Y. H. Chen, T. Y. Chang, and C. Y. Li, "High throughput DA-based DCT
with high accuracy error-compensated adder tree," IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 19, no. 4, pp. 709–714, Apr. 2011.
[5] L. D. Van and C. C. Yang, "Generalized low-error area-efficient
fixed-width multipliers," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 52,
no. 8, pp. 1608–1619, Aug. 2005.
[6] L. D. Van, S. S. Wang, and W. S. Feng, "Design of the lower error
fixed-width multiplier and its application," IEEE Trans. Circuits Syst. II,
Exp. Briefs, vol. 47, no. 10, pp. 1112–1118, Oct. 2000.
[7] C. H. Chang and R. K. Satzoda, "A low error and high performance
multiplexer-based truncated multiplier," IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 18, no. 12, pp. 1767–1771, Dec. 2010.
[8] N. Petra, D. D. Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo,
"Truncated binary multipliers with variable correction and minimum
mean square error," IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 57,
no. 6, pp. 1312–1325, Jun. 2010.
[9] N. Petra, D. D. Caro, V. Garofalo, E. Napoli, and A. G. M. Strollo,
"Design of fixed-width multipliers with linear compensation function,"
IEEE Trans. Circuits Syst. I, Reg. Papers, vol. 58, no. 5, pp. 947–960,
May 2011.
[10] I. C. Wey and C. C. Wang, "Low-error and hardware-efficient
fixed-width multiplier by using the dual-group minor input correction vector
to lower input correction vector compensation error," IEEE Trans.
Very Large Scale Integr. (VLSI) Syst., vol. 20, no. 10, pp. 1923–1928,
Oct. 2012.
[11] S. J. Jou, M. H. Tsai, and Y. L. Tsao, "Low-error reduced-width Booth
multipliers for DSP applications," IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 50, no. 11, pp. 1470–1474, Nov. 2003.
[12] H. A. Huang, Y. C. Liao, and H. C. Chang, "A self-compensation
fixed-width Booth multiplier and its 128-point FFT applications," in Proc.
IEEE Int. Symp. Circuits Syst., May 2006, pp. 3538–3541.
[13] Y. H. Chen, T. Y. Chang, and R. Y. Jou, "A statistical error-compensated
Booth multiplier and its DCT applications," in Proc. IEEE Region 10
Conf., Nov. 2010, pp. 1146–1149.
[14] T. B. Juang and S. F. Hsiao, "Low-error carry-free fixed-width
multipliers with low-cost compensation circuits," IEEE Trans. Circuits Syst. II,
Exp. Briefs, vol. 52, no. 6, pp. 299–303, Jun. 2005.
[15] K. J. Cho, K. C. Lee, J. G. Chung, and K. K. Parhi, "Design of low-error
fixed-width modified Booth multiplier," IEEE Trans. Very Large Scale
Integr. (VLSI) Syst., vol. 12, no. 5, pp. 522–531, May 2004.
[16] S. R. Kuang, J. P. Wang, and C. Y. Guo, "Modified Booth multipliers
with a regular partial product array," IEEE Trans. Circuits Syst. II, Exp.
Briefs, vol. 56, no. 5, pp. 404–408, May 2009.
[17] Y. H. Chen, C. Y. Li, and T. Y. Chang, "Area-effective and
power-efficient fixed-width Booth multipliers using generalized probabilistic
estimation bias," IEEE J. Emerging Sel. Topics Circuits Syst., vol. 1,
no. 3, pp. 277–288, Sep. 2011.
[18] M. A. Song, L. D. Van, and S. Y. Kuo, "Adaptive low-error fixed-width
Booth multipliers," IEICE Trans. Fundam., vol. E90-A, no. 6, pp. 1180–1187,
Jun. 2007.
[19] J. P. Wang, S. R. Kuang, and S. C. Liang, "High-accuracy fixed-width
modified Booth multipliers for lossy applications," IEEE Trans. Very
Large Scale Integr. (VLSI) Syst., vol. 19, no. 1, pp. 52–60, Jan. 2011.
[20] C. Y. Li, Y. H. Chen, T. Y. Chang, and J. N. Chen, "A probabilistic
estimation bias circuit for fixed-width Booth multiplier and its DCT
applications," IEEE Trans. Circuits Syst. II, Exp. Briefs, vol. 58, no. 4,
pp. 215–219, Apr. 2011.
[21] Y. H. Chen and T. Y. Chang, "A high-accuracy adaptive
conditional-probability estimator for fixed-width Booth multipliers," IEEE Trans.
Circuits Syst. I, Reg. Papers, vol. 59, no. 3, pp. 594–603, Mar. 2012.
[22] B. Parhami, Computer Arithmetic: Algorithms and Hardware Designs.
Oxford, U.K.: Oxford Univ. Press, 2000.
[23] C. H. Chang, J. Gu, and M. Zhang, "Ultra low-voltage low-power
CMOS 4-2 and 5-2 compressors for fast arithmetic circuits," IEEE Trans.
Circuits Syst. I, Reg. Papers, vol. 51, no. 10, pp. 1985–1997, Oct. 2004.
[24] Y. Wang, J. Ostermann, and Y. Zhang, Video Processing and
Communications, 1st ed. Upper Saddle River, NJ, USA: Prentice-Hall, 2002.