A Novel Area-Power Efficient Design For Approximated Small-Point FFT Architecture
Abstract—Fast Fourier transform (FFT) is an essential algorithm in digital signal processing and advanced mobile communications. With the continuous development of modern technology, the area-power efficient hardware implementation of FFT has attracted a lot of attention. In this article, a novel design for FFT implementation is proposed. The number of resource-expensive multiplications in our design is decreased by a twiddle factor merging technique that reduces the hardware area. Subsequently, a common subexpression sharing scheme is applied to reuse the hardware resources to further save the hardware area. In addition, a magnitude-response aware approximation algorithm is proposed for applications where the transformation accuracy can be slightly compromised in exchange for smaller hardware area and lower power dissipation. Logic synthesis shows that the proposed 16-point FFT architecture can save hardware area and power dissipation on application-specific integrated circuit (ASIC) by up to 65.7% and 53.1% compared with recently published designs. Similarly, the proposed 32-point FFT architecture achieves up to 58.8% reduction on hardware area and 60.0% reduction on power dissipation on ASIC.

Index Terms—Approximated fast Fourier transform (FFT), common subexpression sharing (CSSharing), twiddle factor merging (TFMerging).

Manuscript received October 16, 2019; revised January 8, 2020; accepted February 29, 2020. Date of publication March 6, 2020; date of current version November 20, 2020. This work was supported by the Nanjing University of Aeronautics and Astronautics, Nanjing, China, under Grant 56YAH18043. This article was recommended by Associate Editor R. Drechsler. (Corresponding authors: Jiajia Chen; Susanto Rahardja.)
Xueyu Han and Susanto Rahardja are with the School of Marine Science and Technology, Northwestern Polytechnical University, Xi’an 710072, China (e-mail: susantorahardja@ieee.org).
Jiajia Chen and Boyu Qin are with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing 211106, China (e-mail: jiajia_chen@nuaa.edu.cn).
Digital Object Identifier 10.1109/TCAD.2020.2978839
0278-0070 © 2020 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission. See https://www.ieee.org/publications/rights/index.html for more information.

I. INTRODUCTION

Fast Fourier transform (FFT) is a widely used algorithm in digital signal processing [1]–[3] and wireless communication systems [4]–[8]. In long term evolution (LTE) and its high-speed versions LTE-Advanced/LTE-Advanced Pro [7], FFTs with different transformation sizes are desired. Moreover, FFT generates the required radio frequency (RF) beams for multibeam beamforming, which is one of the significant techniques in the fifth generation (5G) wireless communication [8]. Nowadays, as most mobile communications are designed to be implemented on portable devices, the embedded FFT processor is required to have low hardware area and power dissipation. It is also vital that the computation speed of the FFT processor is high enough to support the high data rate requirement. Therefore, area-power efficient and high speed FFT architectures have been actively researched in recent years [5], [9]–[11].

FFT can be implemented both in high-level software languages and on hardware. The software solution has the advantage of flexibility, but it is usually accompanied by a bulky and complex hardware system which is unfriendly to small-sized and battery-based devices [12], [13]. Motivated by the challenging requirements in hardware area and power budget, many hardware solutions for FFT implementation have been developed for mobile and smart devices [14], [15]. As technology matures, the application-specific integrated circuit (ASIC) has become more cost effective for some generic classes of problems in digital signal processing such as the FFT [16]. An ASIC solution provides the ability to manage power and exploit the robustness after processing in the digital domain.

Existing hardware solutions for FFT implementation can be mainly divided into reconfigurable and fixed architectures. Reconfigurable architecture [17]–[19] is generally developed for variable-length FFTs. Mixed-radix algorithms are common solutions to implementing a variety of transformation lengths. Fixed FFTs can be further categorized into pipelined and parallel architectures. The most classical approaches for pipelined FFTs [20]–[24] are multipath delay commutator and single-path delay feedback. The basic structure of these two approaches consists of a processing element (PE) for data computation and the required memory for data storage. Parallel architectures [25]–[29], on the other hand, can be constructed by decomposing the FFT algorithm into several partitions and using a combination of PEs to compute these partitions in parallel. Different FFT architectures have different advantages and disadvantages in terms of hardware complexity and computation speed. Pipelined FFTs have simpler architectures but larger latency as data are processed sequentially. In addition, a complex controller is required in pipelined architectures. On the contrary, parallel architectures can deal with N inputs simultaneously and have the advantage of easy control, but at the expense of more hardware resources.

It is challenging to implement FFT on hardware to achieve high accuracy and low hardware cost. However, the accuracy can be relaxed in some applications to achieve a substantial reduction on hardware cost and power dissipation. To do this, approximation [30] has become imperative for the tradeoff. Recently published works in [24], [27], and [28] improved existing FFT architectures based on approximate computing. An inexact pipelined FFT accelerator was proposed in [24]. A statistical learning technique based on normalized least mean
Authorized licensed use limited to: GLA University. Downloaded on November 05,2022 at 07:27:50 UTC from IEEE Xplore. Restrictions apply.
HAN et al.: NOVEL AREA-POWER EFFICIENT DESIGN FOR APPROXIMATED SMALL-POINT FFT ARCHITECTURE 4817
4818 IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS, VOL. 39, NO. 12, DECEMBER 2020
TABLE I
REQUIRED RESOURCES IN N-POINT DFT AND DIT-FFT ARCHITECTURES
error $E$ caused by the approximation is evaluated based on the root-mean-square error (RMSE) between $X_{\mathrm{appro}}$ and $X$ as follows:

$$E = \sqrt{\frac{\sum_{k=0}^{N-1} \left| X(k) - X_{\mathrm{appro}}(k) \right|^{2}}{N}}. \tag{4}$$

With the estimated hardware cost final_cost and the computed error $E$, the design problem can then be formulated as a minimization of final_cost provided that $E$ is less than required

$$\text{Minimize}\{\mathit{final\_cost}\}, \quad \text{s.t. } E \le \delta \tag{5}$$

where $\delta$ is the maximum RMSE allowed. To solve (5), we propose an algorithm to perform TFMerging which reduces the number of multiplications in FFT architecture and increases hardware resource sharing to reduce final_cost. Moreover, a magnitude-response aware approximation approach is proposed to further reduce final_cost while $E$ is monitored to be no more than $\delta$. The details of the algorithm are presented in the next section.

$$W_N^{nk} = \exp\left(-j\frac{2\pi nk}{N}\right) = \cos\frac{2\pi nk}{N} - j\sin\frac{2\pi nk}{N} \tag{7}$$

where $\cos(2\pi nk/N)$ and $\sin(2\pi nk/N)$ are twiddle factor coefficients (TFCs). The term $W_N^{s_{p+1}} W_N^{s_p}$ in (6) can then be represented by a single NTF

$$W_N^{s_p} W_N^{s_{p+1}} = \cos\frac{2\pi (s_p + s_{p+1})}{N} - j\sin\frac{2\pi (s_p + s_{p+1})}{N} = W_N^{s_p + s_{p+1}}. \tag{8}$$

Therefore, each TFM in (6) is decomposed into two multiplications with nontrivial TFCs. Similar to nontrivial TFMs, nontrivial TFCs are coefficients which need to be implemented by shift and adder network. When $s_{p+1} = N/4 - (s_p + s_{p+1})$, (6) is rewritten as

$$R = r + \left[\cos\frac{2\pi s_p}{N}\, t + \cos\frac{2\pi s_{p+1}}{N}\, u + \sin\frac{2\pi s_{p+1}}{N}\, v\right] - j\left[\sin\frac{2\pi s_p}{N}\, t + \sin\frac{2\pi s_{p+1}}{N}\, u + \cos\frac{2\pi s_{p+1}}{N}\, v\right]. \tag{9}$$

Benefiting from the distributive operation of multiplication in (6) and the TFMerging in (8), the computation of the FFT
output is finally re-expressed by the sum of terms where the input is multiplied with nontrivial TFCs like (9). In such a case, it is highly possible that these addition terms in (9) can be shared and the number of multiplications with nontrivial TFCs is reduced correspondingly compared to the conventional radix-2 DIT-FFT architecture. For example, in stages 3 and 4 of Fig. 1, the data path in red for computing $X(1)$ corresponds to the data path for computing $R$ which is also marked in red in Fig. 3. The parameters in Fig. 3 are specified as $p = 3$, $W_N^{s_p} = W_{16}^{2}$, and $W_N^{s_{p+1}} = W_{16}^{1}$. $r$, $t$, $u$, and $v$ are computed as

Algorithm 1 Pseudo Code of the CSSharing Algorithm
Input: C
Output: CS_Final
CS_Share = Find_CS(C);
n = Size(CS_Share);
for i from 1 to n
    CS_Selected = CS_Share[i];
    updated_C = Remove(CS_Share[i], C);
    CS_Additional = Find_CS(updated_C);
    CS_Selected = Insert(CS_Selected, CS_Additional);
    cost[i] = FA_count(CS_Selected);
end for
CS_Final = Min_cost(cost);
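The error measure (4), the merging identity (8), and the rewritten form (9) can be checked numerically. The following is a minimal sketch, not the authors' implementation. Since (6) is not reproduced in this excerpt, the product form used in `merged_output` is an assumption inferred from (9); the function names and the input values `r`, `t`, `u`, `v` are hypothetical and chosen only for illustration.

```python
import cmath
import math


def twiddle(n_points, exponent):
    """Twiddle factor W_N^s = exp(-j*2*pi*s/N), as in (7)."""
    return cmath.exp(-2j * math.pi * exponent / n_points)


def rmse(x_exact, x_appro):
    """Root-mean-square error of (4) between two equal-length spectra."""
    n = len(x_exact)
    return math.sqrt(sum(abs(a - b) ** 2 for a, b in zip(x_exact, x_appro)) / n)


def merged_output(n_points, s_p, s_p1, r, t, u, v):
    """Assumed product form of (6):
    R = r + W^{s_p} t + W^{s_{p+1}} u + W^{s_p} W^{s_{p+1}} v."""
    w_p = twiddle(n_points, s_p)
    w_p1 = twiddle(n_points, s_p1)
    return r + w_p * t + w_p1 * u + w_p * w_p1 * v


def rewritten_output(n_points, s_p, s_p1, r, t, u, v):
    """Evaluate R with (9); valid only when s_{p+1} = N/4 - (s_p + s_{p+1})."""
    a = 2 * math.pi * s_p / n_points
    b = 2 * math.pi * s_p1 / n_points
    real = math.cos(a) * t + math.cos(b) * u + math.sin(b) * v
    imag = math.sin(a) * t + math.sin(b) * u + math.cos(b) * v
    return r + real - 1j * imag


# Example from the text: N = 16, W^{s_p} = W_16^2, W^{s_{p+1}} = W_16^1.
N, S_P, S_P1 = 16, 2, 1
# Hypothetical input values, chosen only for illustration.
r, t, u, v = 0.5, -1.25, 2.0, 0.75

# Gap between the two sides of (8).
identity_gap = abs(twiddle(N, S_P) * twiddle(N, S_P1) - twiddle(N, S_P + S_P1))
# Gap between the assumed product form and the rewritten form (9).
form_gap = abs(merged_output(N, S_P, S_P1, r, t, u, v)
               - rewritten_output(N, S_P, S_P1, r, t, u, v))
print(identity_gap, form_gap)
```

Both gaps should be at floating-point rounding level for this parameter choice, because $2 + 1 = 16/4 - 1$ satisfies the condition stated before (9).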
C. Magnitude-Response Aware Approximation to Twiddle Factor Coefficients

All the infinite-precision TFCs are first transformed into M-bit CSD representations by cutting off the insignificant bits of TFCs. If the precision of TFCs is specified as K-bit (K ≤ M), the truncation operation directly cuts off the (M − K) least significant bits (LSBs). The error $E^{*}$ caused by the truncation operation can be evaluated by (4). When a maximum allowed transformation error δ is specified for a certain application, there is usually a small margin between $E^{*}$ and δ. To better utilize the margin, the truncated TFCs can be further approximated by changing some digits in the TFCs to reduce the total FA count as long as the error is not bigger than δ. The challenge is how to develop an efficient measure to minimize the total FA count during the approximation while the error is kept below δ. To address this, a novel magnitude-response aware approximation approach is proposed in this section.

Algorithm 2 Pseudo Code of the AFFT_ECS Algorithm
Input: C, existing_CS, δ
Output: C
E = error(C);
while (E ≤ δ) {
    updated_C = Remove(existing_CS, C);
    D = Count_nonzerodigit(updated_C);
    n = Size(D);
    for i from 1 to n
        (g_comp[i], g_zero[i]) = Gain(updated_C[i]);
    end for
    final_C = Max_Gain(g_comp, g_zero, C);
    E = error(final_C);
    if E ≤ δ
        C = final_C;
    else
        C_set = Rank_Gain(g_comp, g_zero, C);
        for j from 2 to n
            E = error(C_set[j]);
            if E ≤ δ
                C = C_set[j];
                break;
            end if
        end for
    end if }
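The CSD truncation described at the start of this section can be sketched as follows. This is an illustrative sketch only: it assumes that "M-bit" counts fractional bits (with one extra integer digit of weight $2^0$), it uses the standard non-adjacent-form recoding to obtain CSD digits, and the function names `to_csd`, `csd_value`, and `truncate_csd` are hypothetical.

```python
import math


def to_csd(value, frac_bits):
    """Return CSD digits (MSB first) of `value` with `frac_bits` fractional bits.

    The list has frac_bits + 1 entries with weights 2^0, 2^-1, ..., 2^-frac_bits.
    Assumes |value| <= 1, which holds for cosine and sine coefficients.
    """
    quantized = round(abs(value) * (1 << frac_bits))
    sign = -1 if value < 0 else 1
    digits_lsb_first = []
    while quantized != 0:
        if quantized % 2:
            digit = 2 - (quantized % 4)  # +1 for ...01, -1 for ...11
            quantized -= digit
        else:
            digit = 0
        digits_lsb_first.append(sign * digit)
        quantized //= 2
    # Pad with leading zeros so every coefficient has the same length.
    digits_lsb_first += [0] * (frac_bits + 1 - len(digits_lsb_first))
    return digits_lsb_first[::-1]


def csd_value(digits):
    """Value represented by MSB-first digits with weights 2^0, 2^-1, ..."""
    return sum(d * 2.0 ** -i for i, d in enumerate(digits))


def truncate_csd(digits, kept_bits):
    """Keep the integer digit and the K most significant fraction digits,
    i.e. cut off the (M - K) LSBs."""
    return digits[: kept_bits + 1]


M, K = 8, 4
coefficient = math.cos(2 * math.pi / 16)  # one nontrivial TFC of a 16-point FFT
full = to_csd(coefficient, M)
short = truncate_csd(full, K)
print(full, csd_value(full))
print(short, csd_value(short))
```

For this coefficient the M = 8 representation has four nonzero digits and the K = 4 truncation keeps two, which illustrates how truncation lowers the adder count at the price of a larger coefficient error.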
Two methods are considered in the proposed approximation approach. The first one is to change less significant nonzero digits of nontrivial TFCs to zero so that the number of adders used for implementing the corresponding TFMs is reduced. The second method is to change less significant nonzero digits to their complement so that opportunities for sharing CSs between nontrivial TFCs are created. These two methods are simultaneously applied to different nonzero digits of TFCs which have different impacts on the total FA count and transformation error. The effect on the total FA count by approximating the ith TFC at the jth nonzero digit counting from the most significant bit is

$$c_{i,j} = \frac{\mathrm{FA}_{\mathrm{be}} - \mathrm{FA}_{\mathrm{af}}}{\mathrm{FA}_{\mathrm{be}}} \tag{11}$$

where $\mathrm{FA}_{\mathrm{be}}$ and $\mathrm{FA}_{\mathrm{af}}$ are the total FA counts of the corresponding FFT implementation before and after one approximation method is adopted, respectively. By evaluating the transformation error using (4), the sensitivity of the jth nonzero digit in the ith TFC with respect to transformation error is defined as

$$s_{i,j} = \frac{E_{\mathrm{af}} - E_{\mathrm{be}}}{E_{\mathrm{be}}} \tag{12}$$

where $E_{\mathrm{be}}$ and $E_{\mathrm{af}}$ are the transformation errors before and after the nonzero digit is changed, respectively. It is obvious that changing a nonzero digit with a larger $c$ and a smaller $s$ leads to more effective improvement. To evaluate these two measures by using one metric, we define the gain of changing the jth nonzero digit in the ith TFC on the total FA count and transformation error as

$$g_{i,j} = \frac{c_{i,j}}{s_{i,j}}. \tag{13}$$

It is evident that changing the nonzero digit with a larger gain contributes to a more efficient solution. Since the approximation to one nonzero digit can be done by either changing to zero or the complement, the respective gains can be denoted as $g_{\mathrm{zero}}^{i,j}$ and $g_{\mathrm{comp}}^{i,j}$ which are evaluated by (13). With the above denotations and definitions, two approximated algorithms are presented in the next sections.

1) Approximation Based on Existing Common Subexpressions: If CSs exist in different nontrivial TFCs, we propose an approximated FFT algorithm based on existing CSs (named as AFFT_ECS) in this article. First of all, an optimal CSSharing solution is generated by the proposed algorithm. With the optimal CSs unchanged, the AFFT_ECS algorithm iteratively changes some of the remaining nonzero digits in TFCs which do not exist in any CS yet. In each iteration, $g_{\mathrm{zero}}^{i,j}$ and $g_{\mathrm{comp}}^{i,j}$ are computed for all nonzero digits by (13). The approximation is performed to the nonzero digit which has the biggest $g_{\mathrm{zero}}^{i,j}$/$g_{\mathrm{comp}}^{i,j}$, provided that the corresponding transformation error $E_{\mathrm{af}}$ is less than the maximally allowed error δ. If $E_{\mathrm{af}}$ is bigger than δ due to the change of the nonzero digit with the biggest gain, the algorithm seeks the digit with the second biggest gain. The approximation is performed when the error caused by changing the digit with the second biggest gain is smaller than δ. Otherwise, the algorithm continues to seek the digit with the third biggest gain. The iteration continues while $E_{\mathrm{af}}$ is always evaluated to decide if any nonzero digit should be changed. If $E_{\mathrm{af}}$ caused by the changing of the digit with the least gain is still bigger than δ, the algorithm stops and the approximation to this set of TFCs completes. The main steps of the AFFT_ECS are summarized in Algorithm 2.

The function error computes the error of the FFT implementation using the approximated TFC set C. The function Count_nonzerodigit counts the total number of nonzero digits of updated_C after removing existing CSs. For each nonzero digit in updated_C, the function Gain measures its gain. The nonzero digit that has the biggest gain is selected and the TFC set is changed by the function Max_Gain accordingly. The function Rank_Gain ranks the gains in descending order. Finally, the algorithm returns the TFC set where all the qualified nonzero digits are approximated.

2) Approximation by Creating New Common Subexpressions: For small size FFTs, the number of nontrivial TFCs is limited. As a consequence, it is likely that no CS
can be shared by TFCs at the beginning. Moreover, even if CSs exist initially, fixing them as in the AFFT_ECS algorithm may hinder the TFCs from being further approximated to achieve a better solution. For example, if the existing CS is located at a less significant bit position in a TFC, we can only change the remaining nonzero digits at more significant bit positions, which causes a bigger transformation error. In the above-mentioned two circumstances, we propose to approximate TFCs with the freedom of creating a new CS by changing a nonzero digit to its complement. $E_{\mathrm{af}}$ caused by this approximation operation may exceed the maximally allowed error δ. However, this does not mean that the approximation cannot be performed because the unacceptable error can be compensated by changing other nonzero digits in the same TFCs. To achieve this, we propose an error compensation technique to adapt TFCs for compensating the error before it is compared with δ. The algorithm starts with computing the initial transformation error using the TFCs in which one nonzero digit is changed to create a new CS. Each of the remaining nonzero digits not appearing in the new CS is changed to zero and $E_{\mathrm{af}}$ is recomputed correspondingly. The minimum $E_{\mathrm{af}}$ is selected and compared with the initial transformation error after the new CS is generated. If the error decreases, the algorithm moves on to change the next nonzero digit to zero until the transformation error stops decreasing. The main steps of the algorithm are summarized in Algorithm 3. The function Change_to_zero changes a nonzero digit which does not exist in CS and returns a new TFC set compensated_C. The function Min_error selects the minimum error and returns the approximated TFC set which produces this error.

Algorithm 3 Pseudo Code of the Error Compensation Technique
Input: C
Output: C, E_initial
E_initial = error(C);
while (true) {
    D = Count_nonzerodigit(C);
    n = Size(D);
    for i from 1 to n
        compensated_C[i] = Change_to_zero(C[i]);
        E[i] = error(compensated_C[i]);
    end for
    (min_E, new_C) = Min_error(E, compensated_C);
    if min_E ≤ E_initial
        E_initial = min_E;
        C = new_C;
    else
        break;
    end if }

With the error compensation, we propose an approximated FFT algorithm by creating new CS (named as AFFT_NCS). If there are CSs existing in TFCs initially, they are ignored and all nonzero digits are considered equally when creating a new CS. For each nonzero digit in TFCs, the algorithm first changes it to its complement. All the remaining nonzero digits in the same TFC take turns to be examined. Once a new CS is found, it is fixed and the algorithm stops creating more. This is because of the limited number of nonzero digits existing in a TFC. When one CS is fixed, there is little chance that the remaining nonzero digits can form other CSs. The error compensation is applied when the corresponding transformation error exceeds δ. After that, the AFFT_ECS algorithm proposed in Section III-C1 is performed for further approximation.

This process is applied to every nonzero digit in the same way as described above. The total FA count is computed for each implementation and the approximated TFC set which results in the lowest FA count is returned as the final solution by the AFFT_NCS. The main steps of the algorithm are summarized in Algorithm 4. The function Change_to_complement changes a particular nonzero digit to its complement and returns the approximated TFC set. All the CSs are saved in CS_set using the function Generate_CS. The function Select_CS selects all CSs that can be shared and saves them into sharable_CS. For each element in sharable_CS, the function Shortest_CS chooses the CS of the shortest length as the newly created CS for further TFC approximation.

With the proposed TFMerging technique, CSSharing and magnitude-response aware approximation algorithm, a complete approximated FFT architecture design algorithm is established. First of all, the proposed TFMerging technique is applied to an N-point FFT to generate nontrivial TFCs to be approximated. Next, we apply the CSSharing method and magnitude-response aware approximation algorithm to further reduce the hardware complexity, with the maximally allowed transformation error δ. With the nontrivial TFCs, we first check if there are existing CSs that can be shared. If not, the AFFT_NCS algorithm is applied to approximate TFCs. Otherwise, the CSSharing method is applied to provide a solution for resource sharing before the AFFT_ECS algorithm is applied to further approximate nontrivial TFCs. Though a good solution can be returned by the AFFT_ECS algorithm, the fixed CSs create a barrier for further approximation. Therefore, the AFFT_NCS algorithm is also applied in this situation to provide an alternative even though CS exists initially. Two

Algorithm 4 Pseudo Code of the AFFT_NCS Algorithm
Input: C, δ
Output: final_C
D = Count_nonzerodigit(C);
n = Size(D);
for i from 1 to n
    sharable_CS = ∅;
    appro_C = ∅;
    new_C = Change_to_complement(C[i]);
    CS_set = Generate_CS(new_C);
    sharable_CS = Select_CS(CS_set);
    while (sharable_CS ≠ ∅) {
        new_CS = Shortest_CS(sharable_CS);
        E_initial = error(new_C);
        if E_initial ≥ δ
            (adapt_C, E_initial) = Error_Compensate(new_CS, new_C);
        end if
        if E_initial = δ
            appro_C[i] = AFFT_ECS(adapt_C, new_CS, δ);
            cost[i] = FA_count(new_CS);
        end if }
end for
if appro_C ≠ ∅
    (min_cost, final_C) = Min_cost(cost);
else
    final_C = C;
end if
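The gain metric of (11)-(13) and the fallback to the next-largest gain used by AFFT_ECS can be sketched as follows. This is a simplified illustration, not the authors' implementation: the FA counts and errors are supplied as hypothetical numbers instead of being computed from an FFT design, the `Candidate` type and function names are made up for the example, and the handling of a non-positive sensitivity is an assumption because the text does not cover that case.

```python
import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """One possible digit change and the design it would lead to."""
    label: str        # hypothetical identifier of the changed digit
    fa_after: int     # total FA count after the change (FA_af)
    err_after: float  # transformation error after the change (E_af)


def gain(fa_before, fa_after, err_before, err_after):
    """Gain g = c / s from (11)-(13); assumes err_before > 0."""
    c = (fa_before - fa_after) / fa_before
    s = (err_after - err_before) / err_before
    if s <= 0:
        # Assumption: a change that saves adders without increasing the
        # error is treated as having unbounded gain.
        return math.inf if c > 0 else 0.0
    return c / s


def select_change(candidates, fa_before, err_before, delta):
    """Return the highest-gain candidate with E_af <= delta, or None."""
    ranked = sorted(
        candidates,
        key=lambda cand: gain(fa_before, cand.fa_after, err_before, cand.err_after),
        reverse=True,
    )
    for cand in ranked:  # try the biggest gain first, then the next one
        if cand.err_after <= delta:
            return cand
    return None


# Hypothetical numbers: candidate A has the biggest gain but violates delta.
candidates = [
    Candidate("A", fa_after=40, err_after=0.03),
    Candidate("B", fa_after=97, err_after=0.012),
    Candidate("C", fa_after=98, err_after=0.015),
]
chosen = select_change(candidates, fa_before=100, err_before=0.01, delta=0.02)
print(chosen.label)
```

In this example candidate A is skipped because its error exceeds δ, so the change with the second biggest gain is applied, mirroring the fallback described for AFFT_ECS. Returning `None` corresponds to the stopping condition where even the least-gain change violates δ.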
TABLE III
TOTAL FA COUNT AND TRANSFORMATION ERROR OF THE FFT DESIGNS
the proposed FFT designs, AFFT1 and AFFT5 are with the same transformation errors as the 16- and 32-point designs in [14], respectively, while the others are less accurate. To compare the hardware cost more accurately, all designs are described in Verilog HDL and mapped to Xilinx Virtex7, xc7s75fgga484 FPGA device. Xilinx Vivado Design Suite v17.4 is used to synthesize the designs. The number of LUTs (#LUTs), the utilization density of LUTs, the number of IOs (#IOs), the utilization density of IOs and delays in ns of 16- and 32-point FFT designs are shown in Table IV. At least 41.2% and 56.4% improvements are achieved by our 16- and 32-point designs, respectively, over the designs of [5] and [14] in terms of #LUTs. Our designs have shorter delays compared with [5] and [14]. The reason is that the merging technique reduces the number of multiplications and therefore #FAs in the critical path is reduced. The design of [5] has larger delays since the iteration operation in the CORDIC scheme leads to more adders in the critical path. The reason why AFFT4 and AFFT8 reduce #LUTs dramatically over the 16- and 32-point designs in [5] and [14] is that their high transformation error tolerance causes excessively approximated TFCs, with which all the multiplications in the FFT architecture can be implemented by direct hardwiring. Compared with another excessively approximated 16-point FFT design proposed in [28], AFFT4 saves 8.4% FPGA area benefitting from the proposed techniques. The FPGA areas of the 16- and 32-point approximated FFT designs are plotted in Fig. 8. The FPGA areas of the 16- and 32-point FFT designed by using the conventional radix-2 DIT-FFT algorithm are used as the baseline. All other areas of FFT architectures by our algorithm, [5], [14], and [28] are normalized by the baseline.

TABLE IV
COMPARISON BETWEEN THE FFT DESIGNS ON FPGA. (a) 16-POINT FFT DESIGNS. (b) 32-POINT FFT DESIGNS

The on-chip memory requirements of the designs are presented in Table V. The proposed FFT implementations have lower register cost, because the proposed merging technique reduces the number of temporary TFM partial products which require storage.

To verify the performance on ASIC, all the designs are also mapped to 45-nm standard cell library and synthesized by Synopsys design compiler. We choose the same cell library
Fig. 8. Normalized FPGA areas of 16- and 32-point FFT designs.
Fig. 9. Normalized ASIC areas of 16- and 32-point FFT designs.

TABLE VI
COMPARISON BETWEEN THE FFT DESIGNS ON ASIC
[28] V. Ariyarathna et al., “Multibeam digital array receiver using a 16-point multiplierless DFT approximation,” IEEE Trans. Antennas Propag., vol. 67, no. 2, pp. 925–933, Feb. 2019.
[29] Y. Ji-Yang, H. Dan, L. Xin, X. Ke, and W. Lu-Yuan, “Conflict-free architecture for multi-butterfly parallel processing in-place radix-r FFT,” in Proc. IEEE Int. Conf. Signal Process., Chengdu, China, Nov. 2016, pp. 496–501.
[30] S. Mittal, “A survey of techniques for approximate computing,” ACM Comput. Surveys, vol. 48, no. 4, pp. 1–34, Mar. 2016.
[31] J. Ding, J. Chen, and C.-H. Chang, “A new paradigm of common subexpression elimination by unification of addition and subtraction,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 35, no. 10, pp. 1605–1617, Oct. 2016.
[32] J. W. Cooley and J. W. Tukey, “An algorithm for machine calculation of complex Fourier series,” Math. Comput., vol. 19, no. 90, pp. 297–301, Jan. 1965.
[33] J. Chen and C.-H. Chang, “High-level synthesis algorithm for the design of reconfigurable constant multiplier,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 28, no. 12, pp. 1844–1856, Dec. 2009.
[34] K. Moller, M. Kumm, M. Garrido, and P. Zipf, “Optimal shift reassignment in reconfigurable constant multiplication circuits,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 37, no. 3, pp. 710–714, Mar. 2018.
[35] B. Koyada, N. Meghana, M. O. Jaleel, and P. R. Jeripotula, “A comparative study on adders,” in Proc. Int. Conf. Wireless Commun. Signal Process. Netw., Chennai, India, Mar. 2017, pp. 2226–2230.
[36] V. Gupta, D. Mohapatra, A. Raghunathan, and K. Roy, “Low-power digital signal processing using approximated adders,” IEEE Trans. Comput.-Aided Design Integr. Circuits Syst., vol. 32, no. 1, pp. 124–137, Jan. 2013.
[37] R. Kaur and T. Singh, “Design of 32-point mixed radix FFT processor using CSD multiplier,” in Proc. Int. Conf. Parallel Distrib. Grid Comput., Waknaghat, India, Dec. 2016, pp. 538–543.
[38] J. Chen, C. H. Chang, and H. Qian, “New power index model for switching power analysis from adder graph of FIR filter,” in Proc. IEEE Int. Symp. Circuits Syst., Taipei, Taiwan, May 2009, pp. 2197–2200.

Xueyu Han received the B.Eng. degree from Northwestern Polytechnical University, Xi’an, China, in 2017, where she is currently pursuing the Ph.D. degree with the Center of Intelligent Acoustics and Immersive Communications.
Her research interests include algorithms and circuit design for digital signal processing.

Jiajia Chen received the B.Eng. (Hons.) and Ph.D. degrees from Nanyang Technological University, Singapore, in 2004 and 2010, respectively.
From April 2012 to March 2018, he was a Faculty Member with the Singapore University of Technology and Design, Singapore. Since April 2018, he has been with the Nanjing University of Aeronautics and Astronautics, Nanjing, China, where he is currently a Professor. His research interest includes computational transformations of low-complexity digital circuits and digital signal processing.
Prof. Chen served as the Web Chair of the Asia–Pacific Computer Systems Architecture Conference in 2005, the Technical Program Committee Member of European Signal Processing Conference in 2014, and the Third IEEE International Conference on Multimedia Big Data in 2017, and has been serving as an Associate Editor for the EURASIP Journal on Embedded Systems (Springer) since 2016.

Boyu Qin is currently pursuing the B.Eng. degree with the College of Electronic and Information Engineering, Nanjing University of Aeronautics and Astronautics, Nanjing, China.
His research interest includes digital circuits design and implementation.

Susanto Rahardja (Fellow, IEEE) received the B.Eng. degree from the National University of Singapore, Singapore, and the M.Eng. and Ph.D. degrees in electronic engineering from Nanyang Technological University, Singapore.
He is currently the Chair Professor with Northwestern Polytechnical University, Xi’an, China, under the Thousand Talent Plan of People’s Republic of China. He attended the Stanford Executive Programme with the Graduate School of Business, Stanford University, Stanford, CA, USA. He contributed to the development of a series of audio compression technologies, such as Audio Video Standards AVS-L, AVS-2 and ISO/IEC 14496-3:2005/Amd.2:2006, and ISO/IEC 14496-3:2005/Amd.3:2006 in which some have been licensed to several companies. He has more than 15 years of experience in leading research team for media related research that cover areas in signal processing (audio coding, video/image processing), media analysis (text/speech, image, video), media security (biometrics, computer vision, and surveillance), and sensor networks. He has published more than 300 papers and has been granted more than 70 patents worldwide out of which 15 are U.S. patents. His research interests are in multimedia, signal processing, wireless communications, discrete transforms, machine learning, and signal processing algorithms and implementation.
Dr. Rahardja was a recipient of several honors, including the IEE Hartree Premium Award, the Tan Kah Kee Young Inventors’ Open Category Gold Award, the Singapore National Technology Award, the A*STAR Most Inspiring Mentor Award, the Finalist of the 2010 World Technology and Summit Award, the Nokia Foundation Visiting Professor Award, and the ACM Recognition of Service Award. He was an Associate Editor of the IEEE TRANSACTIONS ON AUDIO, SPEECH AND LANGUAGE PROCESSING and the IEEE TRANSACTIONS ON MULTIMEDIA, and the Senior Editor of the IEEE JOURNAL OF SELECTED TOPICS IN SIGNAL PROCESSING. He is currently serving as an Associate Editor for the Journal of Visual Communication and Image Representation (Elsevier) and the IEEE TRANSACTIONS ON MULTIMEDIA. He was the Conference Chair of fifth ACM SIGGRAPHASIA in 2012 and APSIPA second Summit and Conference in 2010 and 2018, as well as other conferences in ACM, SPIE, and IEEE.