Graduate Institute of Electronics Engineering, NTU

FFT VLSI Implementation
VLSI Signal Processing

1. Shousheng He and Mats Torkelson, "A new approach to pipeline FFT processor," IEEE Proc. of IPPS, pp. 766-770, 1996.
2. E. Bidet, D. Castelain, C. Joanblanq, and P. Senn, "A fast single-chip implementation of 8192 complex point FFT," IEEE J. Solid-State Circuits, pp. 300-305, March 1995.

ACCESS IC LAB

FFT Review

$$X(k) = \sum_{n=0}^{N-1} G(n)\, W_N^{nk}, \qquad k = 0, 1, \ldots, N-1, \qquad W_N = e^{-j(2\pi/N)}$$
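As a reference point for the architectures that follow, the definition can be evaluated directly. This is a minimal sketch, assuming the negative-exponent convention for W_N; the function name `dft` is chosen here for illustration and does not come from the slides.

```python
import cmath

def dft(g):
    """Direct O(N^2) evaluation of X(k) = sum over n of G(n) * W_N^(n*k)."""
    n_points = len(g)
    # W_N = exp(-j*2*pi/N); the sign of the exponent is an assumption.
    w = cmath.exp(-2j * cmath.pi / n_points)
    return [sum(g[n] * w ** (n * k) for n in range(n_points))
            for k in range(n_points)]

# A unit impulse has a flat spectrum; a constant input has only a DC term.
impulse_spectrum = dft([1, 0, 0, 0])
constant_spectrum = dft([1, 1, 1, 1])
```

Each output needs N complex multiplications, so the direct form costs N^2 multiplications in total; the FFT structures below reduce that count.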

[Figure: signal-flow graph of the 8-point FFT. Inputs in bit-reversed order G[0], G[4], G[2], G[6], G[1], G[5], G[3], G[7]; outputs X[0] to X[7] in natural order; twiddle factors W_N^0 to W_N^3.]

Implementation -- Two Extreme Methods

             Reuse Single Butterfly    Fully Spread
Speed        Slow                      Fast
Area         Small                     Large
Control      Complicated               Simple

[Figure: fully spread 8-point signal-flow graph, inputs G[0], G[4], G[2], G[6], G[1], G[5], G[3], G[7], outputs X[0] to X[7].]
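The "reuse single butterfly" extreme can be sketched in software as one butterfly routine called repeatedly under loop control. This is an illustrative model only, assuming a decimation-in-time ordering with bit-reversed inputs as in the 8-point flow graph; the names `butterfly`, `bit_reverse` and `fft_single_butterfly` are hypothetical.

```python
import cmath

def butterfly(a, b, w):
    """The single radix-2 processing element that gets reused."""
    t = b * w
    return a + t, a - t

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r

def fft_single_butterfly(g):
    """In-place radix-2 DIT FFT; N is assumed to be a power of two."""
    n = len(g)
    bits = n.bit_length() - 1
    x = [g[bit_reverse(i, bits)] for i in range(n)]  # bit-reversed input order
    calls = 0
    span = 1
    while span < n:                                  # log2(N) stages
        for start in range(0, n, 2 * span):
            for k in range(span):
                w = cmath.exp(-2j * cmath.pi * k / (2 * span))
                i, j = start + k, start + k + span
                x[i], x[j] = butterfly(x[i], x[j], w)
                calls += 1
        span *= 2
    return x, calls

spectrum, calls = fft_single_butterfly([1, 2, 3, 4, 5, 6, 7, 8])
# calls == 12 for N = 8, i.e. (N/2) * log2(N) sequential uses of one PE
```

The fully spread extreme would instantiate all 12 butterflies in hardware at once; the loop nest here plays the role of the more complicated control that a single shared butterfly needs.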

Design Consideration

System requirement, e.g. speed, area, power ...
Trade-off between these two cases; we need:
- More processing elements (PEs)
- Better processing element utilization rate
- Better control scheme

FFT Processor -- Block Diagram

[Figure: block diagram. Blocks: DATA IN, INPUT BUFFER, Processing Element (Butterfly), COEF ROM, FFT RAM, CONTROL (issuing CONTROL SIGNAL), DATA OUT.]

Some Current Themes

Radix-2 Multi-path Delay Commutator (N = 16)
Radix-2 Single-path Delay Feedback (N = 16)
[Figures: pipelines of BF2 stages with delay elements of 8, 4, 2 and 1 words.]
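The single-path delay-feedback idea can be modelled as a chain of stages, each holding a feedback FIFO whose length halves from stage to stage (8, 4, 2, 1 for N = 16). The sketch below is a behavioural model under stated assumptions, not the circuit from the slides: it assumes a decimation-in-frequency ordering, places the twiddle multiplier at each stage output, and uses hypothetical names `sdf_stage` and `r2sdf_fft`.

```python
import cmath
from collections import deque

def sdf_stage(stream, delay):
    """One radix-2 SDF stage with a feedback FIFO of `delay` words."""
    fifo = deque([0j] * delay)
    out = []
    for t, x in enumerate(stream):
        stored = fifo.popleft()
        if (t // delay) % 2 == 0:
            # First half of a block: store the input, and drain the previous
            # block's differences through the twiddle multiplier.
            w = cmath.exp(-2j * cmath.pi * (t % delay) / (2 * delay))
            fifo.append(x)
            out.append(stored * w)
        else:
            # Second half: butterfly between the delayed and current sample.
            fifo.append(stored - x)
            out.append(stored + x)
    return out

def r2sdf_fft(x):
    """N-point FFT through log2(N) SDF stages; N assumed a power of two, N >= 2."""
    n = len(x)
    stream = list(x) + [0j] * (n - 1)   # zero padding flushes the N-1 cycle latency
    delay = n // 2
    while delay >= 1:
        stream = sdf_stage(stream, delay)
        delay //= 2
    bitrev = stream[n - 1:]             # results emerge in bit-reversed order
    bits = n.bit_length() - 1
    return [bitrev[int(format(k, f"0{bits}b")[::-1], 2)] for k in range(n)]

spectrum = r2sdf_fft([complex(i) for i in range(16)])
```

The FIFO lengths sum to N - 1 words, and every stored word is a live intermediate value, which is one way to see why the delay-feedback scheme uses memory efficiently.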

Some Current Themes (cont.)

Radix-4 Single-path Delay Feedback (N = 256)
Radix-4 Multi-path Delay Commutator (N = 256)
Radix-4 Single-path Delay Commutator (N = 256)
[Figures: pipelines of BF4 stages. Recoverable labels: commutators C4; delay-commutators DC6x64, DC6x16, DC6x4, DC6x1; delay lengths 192, 128, 64, 48, 32, 16, 12, 8, 4, 3, 2, 1.]

Distinctive merit of the above

- The delay-feedback architectures are more efficient than the delay-commutator ones in terms of memory utilization.
- Radix-4 has higher multiplier utilization; however, radix-2 has simpler BFs, which are better utilized.

Comparison

Radix                        Low     to  High
Speed                        Low     to  High
Processing ability / unit    Low     to  High
Control scheme               Simple  to  Complex

Combine the advantages: further decompose the high-radix PE.

Decompose Method (1)

Simply "reuse" the repeated micro unit.
[Figure: a radix-4 PE, built by reusing the micro unit 4 times.]
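One way to read "reuse 4 times" is that a radix-4 PE is a 4-point DFT, which can be computed with four uses of a radix-2 butterfly plus one multiplication by -j. The sketch below assumes that reading; `bf2`, `times_minus_j` and `radix4_pe` are illustrative names.

```python
def bf2(a, b):
    """Radix-2 butterfly: the repeated micro unit."""
    return a + b, a - b

def times_minus_j(z):
    """Multiply by -j without a multiplier: swap parts, negate the new imaginary part."""
    return complex(z.imag, -z.real)

def radix4_pe(x0, x1, x2, x3):
    """4-point DFT built from four uses of bf2."""
    s02, d02 = bf2(x0, x2)                    # use 1
    s13, d13 = bf2(x1, x3)                    # use 2
    y0, y2 = bf2(s02, s13)                    # use 3
    y1, y3 = bf2(d02, times_minus_j(d13))     # use 4
    return y0, y1, y2, y3

outputs = radix4_pe(1, 2, 3, 4)
# outputs == (10, (-2+2j), -2, (-2-2j))
```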

Decompose Method (2)

From the algorithm level, apply a 3-index map:

$$n = \left\langle \tfrac{N}{2} n_1 + \tfrac{N}{4} n_2 + n_3 \right\rangle_N, \qquad k = \left\langle k_1 + 2k_2 + 4k_3 \right\rangle_N$$

where $n_1, n_2 \in \{0, 1\}$ and $n_3 \in \{0, \ldots, N/4 - 1\}$.

Summation of n1

Decompose Method (2) cont.

Summation of n2: only real-imaginary swapping and sign inversion.
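The index map can be checked numerically. Substituting n = (N/2)n1 + (N/4)n2 + n3 and k = k1 + 2k2 + 4k3 into W_N^(nk) splits the twiddle into (-1)^(n1 k1), (-j)^(n2 (k1 + 2 k2)), W_N^(n3 (k1 + 2 k2)) and W_(N/4)^(n3 k3). The recursive sketch below follows that factorization; it is a software illustration written for this note (the function name `fft_radix22` is hypothetical), and N is assumed to be a power of four.

```python
import cmath

def fft_radix22(x):
    """Recursive FFT using the 3-index decomposition; len(x) assumed a power of 4."""
    n = len(x)
    if n == 1:
        return list(x)
    q = n // 4
    out = [0j] * n
    for k1 in (0, 1):
        for k2 in (0, 1):
            sign = (-1) ** k1                 # from the summation over n1
            rot = (-1j) ** (k1 + 2 * k2)      # from the summation over n2: trivial
            h = []
            for n3 in range(q):
                top = x[n3] + sign * x[n3 + 2 * q]       # n2 = 0 branch
                bot = x[n3 + q] + sign * x[n3 + 3 * q]   # n2 = 1 branch
                # The only non-trivial multiplication: W_N^(n3 * (k1 + 2*k2)).
                w = cmath.exp(-2j * cmath.pi * n3 * (k1 + 2 * k2) / n)
                h.append((top + rot * bot) * w)
            sub = fft_radix22(h)              # N/4-point DFT over n3
            for k3 in range(q):
                out[k1 + 2 * k2 + 4 * k3] = sub[k3]
    return out

spectrum = fft_radix22([complex(i, 1) for i in range(16)])
```

The factor `rot` only takes the values 1, -j, -1 and j, which is why the summation over n2 needs no multiplier.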

Graphical Explanation (N=16)

[Figure: signal-flow graph for N = 16, with the trivial multiplications marked.]

Graphical Explanation (cont.)

The equations are equivalent to the operations below.
[Figure: a BF4 with its control input, realized as BF2 I followed by BF2 II, each with a control input.]

Circuit of BF2I

[Figure: BF2I circuit. Inputs Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2); outputs Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). Operating phases: first N/2 cycles, second N/2 cycles.]

Circuit of BF2II

[Figure: BF2II circuit. Inputs Xr(n), Xi(n), Xr(n+N/2), Xi(n+N/2); outputs Zr(n), Zi(n), Zr(n+N/2), Zi(n+N/2). Extra logic: swap Re and Im, and sign inversion.]
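The difference between the two butterflies can be sketched on separate real and imaginary wires. This is a behavioural sketch only: the parameter names mirror the signal names in the figures, the `rotate` control flag is a hypothetical name, and applying the -j rotation to the n + N/2 input is an assumption about where the swap sits.

```python
def bf2i(xr, xi, xr_half, xi_half):
    """Plain radix-2 butterfly; (xr_half, xi_half) is the sample at n + N/2."""
    z_n = (xr + xr_half, xi + xi_half)
    z_half = (xr - xr_half, xi - xi_half)
    return z_n, z_half

def bf2ii(xr, xi, xr_half, xi_half, rotate):
    """Same butterfly, with an optional multiplication by -j on the n + N/2 input."""
    if rotate:
        # -j * (a + j*b) = b - j*a: swap Re and Im, then invert one sign.
        xr_half, xi_half = xi_half, -xr_half
    return bf2i(xr, xi, xr_half, xi_half)

z_n, z_half = bf2ii(1, 2, 3, 4, rotate=True)
# z_n == (5, -1) and z_half == (-3, 5)
```

No multiplier is involved: the rotation costs only a pair of multiplexers and a change of add/subtract sign.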

Radix-2^2 Single-path Delay Feedback

FFT architecture using the above technique, for N = 256. Compare with the original architecture for N = 256.
[Figure: x(n) enters the chain BF2i, BF2ii, BF2i, BF2ii, BF2i, BF2ii, BF2i, BF2ii, which produces X(k). Feedback delays 128, 64, 32, 16, 8, 4, 2, 1; twiddle multipliers W1(n), W2(n), W3(n); control taken from clk counter bits 7 to 0.]
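The structure in the figure can be tabulated programmatically. The sketch below only enumerates stages, delays and multiplier positions as read off the N = 256 figure (alternating BF2i and BF2ii, halving delays, a multiplier after every BF2ii except the last); `r22sdf_plan` is a hypothetical helper, and N is assumed to be a power of four.

```python
def r22sdf_plan(n):
    """List (butterfly type, feedback delay, followed by multiplier?) per stage."""
    stages = []
    delay = n // 2
    kind = "BF2i"
    while delay >= 1:
        # A full twiddle multiplier follows each BF2ii except the final one.
        has_multiplier = kind == "BF2ii" and delay > 1
        stages.append((kind, delay, has_multiplier))
        kind = "BF2ii" if kind == "BF2i" else "BF2i"
        delay //= 2
    return stages

plan = r22sdf_plan(256)
memory_words = sum(delay for _, delay, _ in plan)          # 255, i.e. N - 1
multipliers = sum(1 for _, _, flag in plan if flag)        # 3: W1(n), W2(n), W3(n)
```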

Structural advantage

- Radix-2^2 has the same multiplicative complexity as radix-4, but still retains the radix-2 BF structure.
- Only every other stage has non-trivial multiplication.
- Control is simple: synchronization controller and address counter for W.

Conclusions

1. FFT applications: radar signal processing, spectrum estimation, fast convolution, OFDM-based modulation/demodulation.
2. Efficient VLSI architectures (parallel processing) are required for real-time processing.
3. However, most systems still employ DSP processors (e.g., TI C3x/C5x) for computations (fast algorithms like DIT and DIF FFT).
4. VLIW (Very Long Instruction Word)-based processors (TI C6x) need new programming skills to utilize the two parallel MAC units.