
Reconfigurable Radix-2^k×3 Feedforward FFT Architectures
Wei-Lun Tsai (1), Sau-Gee Chen (1), Shen-Jui Huang (2)
(1) Department of Electronics Engineering, Institute of Electronics, National Chiao Tung University, Hsinchu, Taiwan, R.O.C.
(2) Intelligo Corp., Hsinchu, Taiwan, R.O.C.
asd19473652@gmail.com, sgchen@cc.nctu.edu.tw, shenray.ee95g@g2.nctu.edu.tw

Abstract—Due to the increasing demand for high-throughput and low-cost mobile devices, the design of highly parallel reconfigurable FFT processors has become increasingly important. However, because the required FFT lengths vary, designing a multi-length FFT processor that meets all requirements is challenging, especially when the FFT lengths include non-power-of-two values. In this paper, reconfigurable mixed-radix 2^k×3-point feedforward FFT architectures are proposed. They can be realized with any power-of-two parallelism, so a design point can be chosen whose performance is high enough to meet the requirements at a reasonable cost. A proposed feedforward radix-3 FFT is applied in the architecture, enabling the FFT processor to achieve high parallelism. An 8-parallel 128-2048/1536-point FFT processor for the 4G LTE system is implemented with TSMC 90 nm technology. Compared to existing designs, this work offers a high-throughput and highly area-efficient solution for mixed-radix FFT operation.

Keywords—FFT; MDC; mixed-radix; non-power-of-two

I. INTRODUCTION
The fast Fourier transform (FFT) is one of the most crucial algorithms in SC-FDMA and OFDM-based transmission technologies. It is widely adopted in Wi-Fi, Long Term Evolution (LTE), and the 5G New Radio (5G NR) standard. The FFT algorithm reduces the computational complexity of the N-point discrete Fourier transform (DFT) from O(N^2) to O(N log N) by radix factorization [1], which allows designers to realize FFT processors more efficiently.

In terms of hardware design, pipelined FFT architectures, including feedback, feedforward, and hybrid architectures, are widely adopted in high-throughput applications. Feedback architectures include single-path delay feedback (SDF) [2]-[6] and multipath delay feedback (MDF) [7]-[8]. In these architectures, data samples are fed back to a delay element, where they wait for the corresponding samples of the butterfly operation. Feedforward architectures, on the other hand, include the serial commutator (SC) [9]-[10] and the multipath delay commutator (MDC) [11]-[13]. In these architectures, sets of data samples are fed into the butterfly unit simultaneously and passed to the next stage once the current operations are finished. Among the pipelined architectures, MDC has the highest utilization of butterfly units and multipliers, resulting in a lower hardware cost. In addition, the flexibility of its parallelization makes it suitable for multiple-input multiple-output (MIMO) applications [11]. In [12], highly parallel radix-2^k MDC architectures are proposed. They offer higher throughput and smaller area than MDF architectures. However, the FFT lengths they support are limited to powers of two.

For 4G LTE applications, the FFT processor has to support FFT lengths of 128, 256, 512, 1024, 2048, and 1536, and the existing 128-2048/1536-point FFT processors are mainly SDF and MDF architectures. In [2], an SDF architecture designed with improved radix-3 and radix-2^3 algorithms is presented. Although it has low area complexity, its throughput is limited to R, where R is the clock rate. Moreover, its radix-3 butterfly unit cannot be applied in parallel FFT architectures, which limits the achievable throughput. In contrast, the high-throughput MDF architecture proposed in [7] uses mixed-radix algorithms, and its area is partially reduced by hardware sharing in the last stage. However, its throughput is limited to 6R for the 1536-point FFT, instead of the 8R designed for 128-2048-point FFTs. Therefore, in this paper, we propose a low-area and high-parallelism FFT architecture supporting 2^k×3-point FFT operation.

The organization of this paper is as follows. In section II, a feedforward FFT architecture for the radix-3 butterfly operation is proposed. In section III, P-parallel 2^k×3-point FFT architectures are presented, and 48/64-point FFT architectures are shown as examples. In section IV, the implementation result of an 8-parallel 128-2048/1536-point feedforward FFT processor for the 4G LTE system is shown and compared with existing designs. Finally, a brief conclusion is given in section V.

II. PROPOSED RADIX-3 FEEDFORWARD ARCHITECTURE

A. Radix-3 FFT Algorithms
The N-point DFT of an input sequence x[n] is defined as (1), where the symbol W_N^{nk} is defined as (2).

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad k = 0, 1, \ldots, N-1 \tag{1}$$

$$W_N^{nk} = e^{-j\frac{2\pi nk}{N}} = \cos\left(\frac{2\pi nk}{N}\right) - j\sin\left(\frac{2\pi nk}{N}\right) \tag{2}$$

The 3-point DFT computation can be expanded as (3) and is further reformulated as (4) in [2]. The symbol α equals sin(2π/3) and can be approximated as (5) for efficient hardware implementation. The corresponding signal flow graph (SFG) is shown in Fig. 1.

$$\begin{aligned}
X[0] &= x[0] + x[1] + x[2] \\
X[1] &= x[0] + x[1]\, W_3^{1} + x[2]\, W_3^{2} \\
X[2] &= x[0] + x[1]\, W_3^{2} + x[2]\, W_3^{1}
\end{aligned} \tag{3}$$

The reformulation in [2] reduces the number of complex multiplications, which is the main factor dominating the area of an FFT processor. Each complex multiplication in (3) is replaced with two constant multiplications and a shift operation, so a low-complexity design is achieved. As a result, only two constant multiplications by α, two shift operations, and six complex additions/subtractions are required.
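The saving described above can be checked numerically. The following Python sketch is only an illustration, not the authors' implementation: it compares the 3-point DFT computed from the definition with the reformulated butterfly that (4) describes, using W_3^1 = -1/2 - jα and W_3^2 = -1/2 + jα. The function names are hypothetical, and the shift-add form of α is our reading of (5), namely 1 - 2^-3 (1 + 2^-4 (1 + 2^-3)), which is an assumption about the operator signs.

```python
import cmath
import math

ALPHA = math.sin(2 * math.pi / 3)  # exact alpha, approximately 0.866


def alpha_shift_add() -> float:
    """Assumed shift-add form of alpha: 1 - 2^-3 * (1 + 2^-4 * (1 + 2^-3))."""
    return 1 - 2**-3 * (1 + 2**-4 * (1 + 2**-3))


def dft3_direct(x):
    """3-point DFT from the definition, X[k] = sum_n x[n] * W_3^(n*k)."""
    w3 = cmath.exp(-2j * math.pi / 3)
    return [sum(x[n] * w3 ** (n * k) for n in range(3)) for k in range(3)]


def dft3_reformulated(x, alpha=ALPHA):
    """3-point DFT using sums, one halving, and one scaling by alpha."""
    s = x[1] + x[2]        # complex addition
    d = x[1] - x[2]        # complex subtraction
    base = x[0] - 0.5 * s  # halving (a 1-bit right shift in hardware), then subtraction
    rot = 1j * alpha * d   # alpha scales Re and Im of d; j only swaps the two parts
    return [x[0] + s, base - rot, base + rot]


if __name__ == "__main__":
    sample = [1 + 2j, -3 + 0.5j, 0.25 - 1j]
    print(dft3_direct(sample))
    print(dft3_reformulated(sample))
    print(dft3_reformulated(sample, alpha_shift_add()))
```

The reformulated version uses six complex additions/subtractions (s, d, X[0], base, and the two outputs built from base) and one scaling by α applied to the real and imaginary parts of d, which is consistent with the operation count stated in the text.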

978-1-7281-0397-6/19/$31.00 ©2019 IEEE


Authorized licensed use limited to: University of Electronic Science and Tech of China. Downloaded on April 18,2022 at 11:06:33 UTC from IEEE Xplore. Restrictions apply.
Fig. 1. Improved radix-3 butterfly signal flow graph [2]

Fig. 2. Proposed 2-parallel radix-3 FFT architecture (R3FA)

Fig. 3. Butterfly units and switch for FFT architecture

Fig. 4. Constant multiplier for proposed radix-3 FFT architecture

$$\begin{aligned}
X[0] &= x[0] + x[1] + x[2] \\
X[1] &= x[0] + (-1/2)(x[1] + x[2]) + (x[1] - x[2])(-j\alpha) \\
X[2] &= x[0] + (-1/2)(x[1] + x[2]) + (x[1] - x[2])(j\alpha)
\end{aligned} \tag{4}$$

$$\alpha = \sin(2\pi/3) \approx 0.866 \approx 1 - 2^{-3}\left(1 + 2^{-4}\left(1 + 2^{-3}\right)\right) \tag{5}$$

B. Proposed Feedforward Radix-3 FFT Architecture
According to (4) and (5), a 2-parallel radix-3 feedforward FFT architecture (R3FA) is proposed, as shown in Fig. 2. It is composed of three feedforward stages, whose SFG counterparts are marked in Fig. 1. Fig. 3 shows the detailed structure of the butterfly units and the switch. BU1 is one of the most widely used butterfly units in FFT design [2], [12]. It is shown here for the later explanation of the architectures presented in section III. BU2 is a butterfly unit with two multiplexers for data bypassing, and BU3 has an additional right shifter for the multiplication by 0.5. The structure of the constant multiplier W_α is shown in Fig. 4. The multiplication of the data by α appears in stage 2 of Fig. 1, but the corresponding multiplier W_α is placed in stage 3 of the proposed R3FA. This relocation halves the number of required W_α multipliers by taking advantage of the ordering of the data flow. Moreover, since BU2, BU3, the W_α multiplier, and SW have multiplexers before their outputs, all of them can be switched between two operation modes: calculating mode and bypassing mode.

The proposed architecture can process two concurrent 3-point FFT symbols simultaneously and can process sets of FFT symbols continuously. Take two concurrent 3-point FFT symbols as an example. The data scheduling is shown in Fig. 5. The nth data sample of the ith symbol is denoted x_i[n] and is fed into the ith data path. It is processed through the circuit from clock cycle t = 1 to 7. The arrows in the butterfly units and switches indicate the direction of the data flow and the operation mode of these components. The colored components are the ones operating in the clock period shown. For the constant multipliers in the last stage, a circle without the sign α means that the multiplier is in bypassing mode (data is bypassed directly), and a circle with the sign α means that it is in calculating mode, i.e., performing the multiplication. Furthermore, the switches and delay elements reorder the samples so that the proper output order is generated.

Fig. 5. Data scheduling of proposed radix-3 FFT architecture

The hardware requirements of the R3FA are six delay elements, three switches, one constant multiplier, and three butterfly units. The utilization of the butterfly units and constant multipliers is 66%. Compared to the radix-3 FFT proposed in [2], the proposed architecture halves the latency and doubles the throughput by introducing higher parallelism, while keeping the hardware complexity.

III. RECONFIGURABLE MIXED-RADIX FEEDFORWARD FFT ARCHITECTURES
P-parallel radix-2^k×3 feedforward FFT architectures and a reconfigurable mixed-radix FFT architecture are proposed in this section by taking advantage of the proposed R3FA.

A. P-Parallel Radix-2^k×3 Feedforward FFT Architectures
A 2^k×3-point FFT operation can be decomposed into a 2^k-point FFT followed by a 3-point FFT, as shown in (6).

$$\begin{aligned}
X[k] &= \sum_{n=0}^{2^k \times 3 - 1} x[n]\, W_{2^k \times 3}^{nk} \\
&= \sum_{r=0}^{2} \left[ \sum_{m=0}^{2^k - 1} x[3m + r]\, W_{2^k}^{km} \right] W_{2^k \times 3}^{rk}, \quad k < N
\end{aligned} \tag{6}$$

Take a 48-point FFT operation as an example. This operation can be expressed as a 16-point FFT followed by a 3-point FFT. The corresponding SFG using the radix-2^4 FFT algorithm and the improved radix-3 FFT algorithm is shown in Fig. 6. The proposed 2-parallel and 4-parallel radix-2^4×3 feedforward architectures for the 48-point FFT computation are shown in Fig. 7 and Fig. 8, respectively. Each stage in the architecture realizes a stage in the SFG. The first four stages are responsible for the 16-point FFT operation, and the last three stages are in charge of the 3-point FFT operation. The data orderings are shown below the stages by defining the index of the data x as (b_{k-1}, …, b_3, b_2, b_1, t_0), where the symbol t_0 represents a ternary (base-3) digit and the symbols b_4, b_3, b_2, and b_1 represent binary digits. Furthermore, the components represented by circles with a number (or a symbol) are rotators or multipliers, which can be W_4, W_8, W_16, W_α, or W_N (complex multiplier). It can be observed that the proposed architectures require only one non-trivial multiplier every four stages, regardless of the degree of parallelism.

Fig. 6. Signal flow graph of the 48-point radix-2^4×3 FFT

Generally speaking, since the architecture can process P parallel data samples continuously, the throughput equals P samples per clock cycle and the latency equals N/P clock cycles. As for hardware requirements, the proposed P-parallel N-point FFT architecture using the radix-2^k×3 algorithm is composed of log2(N/3)+3 stages and requires only one non-trivial multiplier every k stages in each path. The numbers of butterfly units, complex multipliers, W_α multipliers, and delay elements are P(log2(N/3)+3)/2, P·log2(N/3)/4, P/2, and N, respectively.

B. Reconfigurable Mixed-Radix Feedforward FFT Architectures
When designing a multi-mode FFT processor, one of the most efficient ways to save area is to share the hardware components among the different FFT lengths as much as possible. Following this principle, a reconfigurable 48/64-point architecture is proposed in Fig. 9. Two interconnected switches (SW1 and SW2) are added to the original 48-point FFT architecture. A delay element labeled with two numbers separated by a slash can be configured to either length. When the required FFT length changes, the behavior of the switches and the butterfly units changes accordingly. For example, the switches are set to switching mode when processing the 64-point FFT and to bypassing mode when processing the 48-point FFT. The detailed configurations of the architecture are shown in TABLE I. The symbols b, c, and s represent bypassing mode, calculating mode, and switching mode, respectively. The symbol bc means that the stage is in either bypassing or calculating mode, depending on the timing described in Fig. 5.

TABLE I. THE CONFIGURATIONS OF PROPOSED 48/64-POINT FFT

FFT length | ST1 | ST2 | ST3 | ST4 | SW1 | ST5 | ST6 | SW2 | ST7
48         | c   | c   | c   | c   | b   | bc  | bc  | b   | bc
64         | c   | c   | c   | c   | s   | c   | b   | s   | c

IV. IMPLEMENTATION RESULTS
A pipelined 128-2048/1536-point FFT processor supporting the 4G LTE system is implemented with the proposed design approach. A performance, power, and area (PPA) comparison of the proposed FFT architecture and the existing mixed-radix FFT architectures is presented in TABLE II and TABLE III. The normalization formulas are defined in [2] and [15]. The presented PPA indicators are all normalized to 65 nm technology for a fair comparison. The implementation result shows that the proposed processor occupies 1.843 mm^2 and dissipates 106.77 mW. Compared with the existing mixed-radix FFT processors, this work achieves the highest throughput among the compared designs while retaining high area efficiency.

V. CONCLUSION
In this paper, reconfigurable P-parallel radix-2^k×3 feedforward FFT architectures are proposed. The architectures break the throughput bound of existing designs by applying the proposed radix-3 FFT architecture. They offer a highly area-efficient, low-power solution for 2^k×3-point FFT computation. The implementation result and a qualitative analysis of a 128-2048/1536-point FFT processor are presented as well.

Fig. 7. Proposed 2-parallel radix-2^4×3 feedforward architecture for the computation of the 48-point FFT
[Figure: stages ST1 to ST7; the last three stages are labeled "Radix-3 feedforward FFT architecture (R3FA)".]

Fig. 8. Proposed 4-parallel radix-2^4×3 feedforward architecture for the computation of the 48-point FFT
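The resource expressions given in section III.A can be evaluated for the 2-parallel and 4-parallel 48-point designs of Fig. 7 and Fig. 8. The sketch below simply transcribes those expressions; the function name `r2k3_resources` and the dictionary keys are hypothetical, and `Fraction` is used as a precaution because P·log2(N/3)/4 is not an integer for every combination of N and P.

```python
from fractions import Fraction


def r2k3_resources(n_points: int, parallelism: int) -> dict:
    """Evaluate the stated resource formulas for a P-parallel N-point design, N = 2^k * 3."""
    if n_points % 3 != 0:
        raise ValueError("N must be a multiple of 3")
    m = n_points // 3
    k = m.bit_length() - 1  # log2(N/3) when N/3 is a power of two
    if m < 1 or (1 << k) != m:
        raise ValueError("N/3 must be a power of two")
    p = parallelism
    stages = k + 3  # log2(N/3) + 3
    return {
        "stages": stages,
        "butterflies": Fraction(p * stages, 2),        # P(log2(N/3)+3)/2
        "complex_multipliers": Fraction(p * k, 4),     # P*log2(N/3)/4
        "w_alpha_multipliers": Fraction(p, 2),         # P/2
        "delay_elements": n_points,                    # N
        "throughput_samples_per_cycle": p,             # P
        "latency_cycles": Fraction(n_points, p),       # N/P
    }


if __name__ == "__main__":
    for p in (2, 4):
        print(p, r2k3_resources(48, p))
```

For N = 48 and P = 2 the expressions give seven stages and seven butterfly units, which agrees with the stage labels ST1 to ST7 of the 2-parallel architecture.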

Fig. 9. Proposed reconfigurable 8-parallel radix-2^4×3 feedforward architecture for the computation of the 48/64-point FFT
[Figure: stages ST1 to ST4, SW1, ST5, ST6, SW2, ST7; the last stages are labeled "Radix-3 feedforward FFT architecture (R3FA)".]
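The decomposition in (6), on which the 48-point architectures in these figures are built, can be verified with a few lines of Python. This is a minimal numerical sketch rather than a model of the hardware: the input is split into three sub-sequences x[3m + r], each is transformed by a 2^k-point DFT, and the three results are combined with the twiddle factors W_N^{rq}. To avoid the clash between the radix exponent and the frequency index, the code calls the frequency index `q`; all function names are hypothetical.

```python
import cmath


def twiddle(n_points: int, exponent: int) -> complex:
    """W_N^e = exp(-j * 2 * pi * e / N)."""
    return cmath.exp(-2j * cmath.pi * exponent / n_points)


def dft(x):
    """Direct N-point DFT from the definition (O(N^2), used as the reference)."""
    n_points = len(x)
    return [
        sum(x[n] * twiddle(n_points, n * q) for n in range(n_points))
        for q in range(n_points)
    ]


def dft_2k_times_3(x):
    """N-point DFT, N = 2^k * 3, as three 2^k-point DFTs plus a 3-point combination."""
    n_points = len(x)
    m_points, remainder = divmod(n_points, 3)
    if remainder or m_points < 1 or m_points & (m_points - 1):
        raise ValueError("length must be 3 times a power of two")
    # Sub-sequence r holds x[3m + r]; its DFT is periodic in q with period 2^k.
    subs = [dft(x[r::3]) for r in range(3)]
    return [
        sum(subs[r][q % m_points] * twiddle(n_points, r * q) for r in range(3))
        for q in range(n_points)
    ]


if __name__ == "__main__":
    data = [complex((7 * n) % 11, (3 * n) % 5) for n in range(48)]
    err = max(abs(a - b) for a, b in zip(dft(data), dft_2k_times_3(data)))
    print("max |difference| =", err)
```

The inner transforms here are plain DFTs for brevity; in the architectures of the paper they correspond to the radix-2^4 stages, and the final combination corresponds to the radix-3 stages.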

TABLE II. A COMPARISON OF FFT PROCESSORS

Design | Proposed | [2] | [3] | [4] | [5] | [7] | [11] | [14]
Architecture | MDC | SDF | SDF | SDF | SDF | MDF | MDC | Memory-based
FFT size | 128~2048/1536 | 128~2048/1536 | 128~2048 | 2~2187 | 4~2048 | 128~2048/1536 | 128/512/1024/2048 | 128~2048
Parallelism | 8 | 1 | 1 | 1 | 1 | 6 or 8 | 4 | 1
Word length (bits) | 12 | 12 | 16 | 16 | 14 | 12 | 10 | 16
Process (nm) | 90 | 90 | 180 | 90 | 40 | 65 | 90 | 55
Supply voltage (V) | 0.9 | 0.9 | 1.8 | NA | 0.99 | 0.45 | 1 | 1.08
Frequency (MHz) | 166.67 | 40 | 40 | 200 | 555.56 | 20 | 40 | 122.88
Throughput (MSamples/s) | 1333.33 | 40 | 40 | 200 | 555.56 | 160 | 160 | 122.88
Area (μm^2) | 1,843,441 | 783,000 | 4,520,000 | 1,116,000 | 258,300 | 1,375,000 | 3,100,000 | 6,150,000
Normalized area [2] (mm^2) | 0.12 | 0.408 | 0.589 | 0.582 | 0.682 | 0.172 | 0.404 | 0.859
NAE [15] (samples/mm^2) | 8.32 | 2.448 | 1.697 | 1.718 | 1.466 | 5.818 | 2.474 | 1.164
Power (mW) | 106.77 | 7.19 | 55.64 | NA | 42.95 | 8.55 | 63.72 | 19.2
Normalized power [2] (mW) | 0.146 | 0.328 | 0.318 | NA | 0.263 | 0.540 | 0.589 | 0.324
Implementation results | Pre-layout | Post-layout | Post-layout | Pre-layout | Pre-layout | Measurement | Post-layout | Post-layout
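The exact normalization formulas are given in [2] and [15] and are not reproduced in this paper. As a rough cross-check of TABLE II, the sketch below uses expressions that are our own assumption, inferred from the tabulated numbers and not taken from those references: area is scaled to 65 nm by the square of the feature-size ratio and divided by the parallelism, NAE is taken as the reciprocal of that normalized area, and throughput is the clock frequency times the parallelism. The function names are hypothetical, and the parallelism of [7] is taken as 8.

```python
def normalized_area_mm2(area_um2: float, process_nm: float,
                        parallelism: int, ref_nm: float = 65.0) -> float:
    """Assumed normalization: area scaled to ref_nm, per parallel data path."""
    area_mm2 = area_um2 / 1e6
    return area_mm2 * (ref_nm / process_nm) ** 2 / parallelism


def normalized_area_efficiency(area_um2: float, process_nm: float,
                               parallelism: int, ref_nm: float = 65.0) -> float:
    """Assumed NAE: samples per cycle per normalized mm^2."""
    return 1.0 / normalized_area_mm2(area_um2, process_nm, parallelism, ref_nm)


def throughput_msps(frequency_mhz: float, parallelism: int) -> float:
    """Throughput in MSamples/s for a design producing `parallelism` samples per cycle."""
    return frequency_mhz * parallelism


if __name__ == "__main__":
    # Columns of TABLE II as (area in um^2, process in nm, parallelism).
    designs = {
        "Proposed": (1_843_441, 90, 8),
        "[2]": (783_000, 90, 1),
        "[7]": (1_375_000, 65, 8),
    }
    for name, (area, node, par) in designs.items():
        print(name,
              round(normalized_area_mm2(area, node, par), 3),
              round(normalized_area_efficiency(area, node, par), 3))
    print("Proposed throughput:", throughput_msps(166.67, 8))
```

Under these assumptions the Proposed, [2], and [7] columns come out close to the tabulated normalized area and NAE values; the normalized power row is not modeled here.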

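TABLE III. A COMPARISON OF FFT PROCESSORS SUPPORTING 128-2048/1536 FFT LENGTHS

Columns 1 and 2 describe the architecture, columns 3 to 6 the hardware requirements, and columns 7 to 9 the functionality and performance. For MDF [7], the values in parentheses apply to the 1536-point FFT.

Type | Algorithm | Real adder | Delay element | Constant multiplier (W_4, W_8, W_16, W_32, W_α) | Complex multiplier | Supported FFT lengths | Latency (cycles) | Throughput (samples/cycle)
SDF [2] | Radix-2^3 | 56 | 2050 | 10 | 3 | 128~2048/1536 | N | 1
MDF [7] | Radix-2^4/8/6 | 304 | 2040 | 55 | 16 | 128~2048 (1536) | N/8 (N/6) | 8 (6)
Proposed MDC | Radix-2^4/3 | 48 | 2048 | 14 | 4 | 128~2048/1536 | N/2 | 2
Proposed MDC | Radix-2^4/3 | 96 | 2048 | 25 | 8 | 128~2048/1536 | N/4 | 4
Proposed MDC | Radix-2^4/3 | 192 | 2048 | 45 | 16 | 128~2048/1536 | N/8 | 8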

REFERENCES
[1] J. W. Cooley and J. W. Tukey, "An algorithm for the machine calculation
of complex Fourier series," Math. Comput., vol. 19, pp. 297-301,
1965.
[2] C. Yu and M. H. Yen, "Area-Efficient 128- to 2048/1536-Point
Pipeline FFT Processor for LTE and Mobile WiMAX Systems," in
IEEE Transactions on Very Large Scale Integration (VLSI) Systems,
vol. 23, no. 9, pp. 1793-1800, Sept. 2015.
[3] M. S. Patil, T. D. Chhatbar and A. D. Darji, "An area efficient and low
power implementation of 2048 point FFT/IFFT processor for mobile
WiMAX," 2010 International Conference on Signal Processing and
Communications (SPCOM), Bangalore, 2010, pp. 1-4.
[4] X. Y. Shih, Y. Q. Liu and H. R. Chou, "48-Mode Reconfigurable
Design of SDF FFT Hardware Architecture Using Radix-3^2 and
Radix-2^3 Design Approaches," in IEEE Transactions on Circuits and Systems
I: Regular Papers, vol. 64, no. 6, pp. 1456-1467, June 2017.
[5] X. Y. Shih, H. R. Chou and Y. Q. Liu, "VLSI Design and
Implementation of Reconfigurable 46-Mode Combined-Radix-Based
FFT Hardware Architecture for 3GPP-LTE Applications," in IEEE
Transactions on Circuits and Systems I: Regular Papers, vol. 65, no.
1, pp. 118-129, Jan. 2018.
[6] X. Shih, H. Chou and Y. Liu, "Design and Implementation of Flexible
and Reconfigurable SDF-Based FFT Chip Architecture With
Changeable-Radix Processing Elements," in IEEE Transactions on
Circuits and Systems I: Regular Papers, vol. 65, no. 11, pp. 3942-3955,
Nov. 2018.
[7] C. H. Yang, T. H. Yu and D. Markovic, "Power and Area Minimization
of Reconfigurable FFT Processors: A 3GPP-LTE Example," in IEEE
Journal of Solid-State Circuits, vol. 47, no. 3, pp. 757-768, March 2012.
[8] Sheng-Yeng, Kai-Ting, Chao-Ming and Yuan-Hao, "Energy-efficient
128∼2048/1536-point FFT processor with resource block mapping for
3GPP-LTE system," The 2010 International Conference on Green
Circuits and Systems, Shanghai, 2010, pp. 14-17.
[9] M. Garrido, S. J. Huang, S. G. Chen and O. Gustafsson, "The Serial
Commutator FFT," in IEEE Transactions on Circuits and Systems II:
Express Briefs, vol. 63, no. 10, pp. 974-978, Oct. 2016.
[10] M. Garrido, N. K. Unnikrishnan and K. K. Parhi, "A Serial
Commutator Fast Fourier Transform Architecture for Real-Valued
Signals," in IEEE Transactions on Circuits and Systems II: Express
Briefs, vol. 65, no. 11, pp. 1693-1697, Nov. 2018.
[11] K. J. Yang, S. H. Tsai and G. C. H. Chuang, "MDC FFT/IFFT
Processor With Variable Length for MIMO-OFDM Systems," in IEEE
Transactions on Very Large Scale Integration (VLSI) Systems, vol. 21,
no. 4, pp. 720-731, April 2013.
[12] M. Garrido, J. Grajal, M. A. Sanchez and O. Gustafsson, "Pipelined
Radix-2^k Feedforward FFT Architectures," in IEEE Transactions on
Very Large Scale Integration (VLSI) Systems, vol. 21, no. 1, pp. 23-32,
Jan. 2013.
[13] M. Garrido, S. Huang and S. Chen, "Feedforward FFT Hardware
Architectures Based on Rotator Allocation," in IEEE Transactions on
Circuits and Systems I: Regular Papers, vol. 65, no. 2, pp. 581-592,
Feb. 2018.
[14] K. F. Xia, B. Wu, T. Xiong and T. C. Ye, "A Memory-Based FFT
Processor Design With Generalized Efficient Conflict-Free Address
Schemes," in IEEE Transactions on Very Large Scale Integration
(VLSI) Systems, vol. 25, no. 6, pp. 1919-1929, June 2017.
[15] C.-H. Lin, C.-Y. Chen, and A.-Y. Wu, “Area-efficient scalable MAP
processor design for high-throughput multistandard convolutional
turbo decoding,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst.,
vol. 19, no. 2, pp. 305–318, Feb. 2011.
