64-point Fast Efficient FFT Architecture Using Radix-2³ Single Path Delay Feedback

Trio Adiono, Muh Syafiq Irsyadi, Yan Syafri Hidayat, Ade Irawan

Electrical Engineering and Informatics School, Bandung Institute of Technology, Jl. Ganesha 10, Bandung 40132, Indonesia

tadiono@paume.itb.ac.id, syafiq@students.ee.itb.ac.id, yayan_sh@yahoo.com, ade_gawa@yahoo.com

Abstract: Here we present a new design of a 64-point Fast Fourier Transform circuit. The design is derived from the radix-2³ algorithm and implemented using the Single Path Delay Feedback architecture. This approach ensures high memory and multiplier utilization. The 64-point FFT is realized by decomposing it into a two-dimensional structure of 8-point FFTs. Each of these 8-point FFTs is further decomposed into 4-point and 2-point FFTs. This decomposition reduces the number of non-trivial twiddle factors to just one, so the design needs only one complex multiplier. The complex multiplier is realized using the modified Booth (radix-4) encoding algorithm to achieve faster computation. The validity and efficiency of the proposed circuit have been verified by functional simulation, timing simulation, and FPGA implementation. The proposed design has been successfully synthesized using Synopsys with TSMC 0.18 μm technology. The core area is 0.47 mm², the power consumption is 29.7 mW, and the time delay is 6 ns. The circuit computes one serial-to-serial data set in 116 clock cycles. Thus our design has three advantages: small area, low power consumption, and fast computation.

Keywords: FFT, R2³SDF, radix-2³.

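I. INTRODUCTION

FFTs have been used in innumerable signal processing applications and are often an important building block in such systems. Many of these applications require real-time operation in order to be useful. While Digital Signal Processors (DSPs) are available that can perform an FFT fast enough to keep up with many real-time applications, some systems require additional computation or have speed requirements that exceed the capabilities of a DSP alone. It is in these situations that dedicated logic for computing an FFT proves useful. The pipeline FFT processor is a specific class of processor for DFT computation that uses fast algorithms. It is characterized by real-time, non-stop processing as the data sequence passes through the processor.

II. THEORY

This algorithm is based on the fact that a radix-8 FFT can be decomposed into radix-4 and radix-2 FFTs in order to reduce computational complexity. Recall the DFT,

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk}, \qquad 0 \le k < N .$$

For the 64-point case, the two-dimensional radix-8 form is

$$X(s + 8t) = \sum_{l=0}^{7} \left[ W_{64}^{sl} \sum_{m=0}^{7} x(l + 8m)\, W_8^{sm} \right] W_8^{lt}, \qquad 0 \le s, t \le 7 .$$

For the radix-2³ decomposition the indices are mapped as

$$n = \frac{N}{4} n_1 + \frac{N}{8} n_2 + n_3, \qquad k = k_1 + 4k_2 + 8k_3,$$

$$0 \le n_1 \le 3,\quad 0 \le n_2 \le 1,\quad 0 \le n_3 \le \frac{N}{8} - 1, \qquad 0 \le k_1 \le 3,\quad 0 \le k_2 \le 1,\quad 0 \le k_3 \le \frac{N}{8} - 1 .$$

The DFT then becomes

$$X(k) = \sum_{n_3=0}^{N/8-1} \sum_{n_2=0}^{1} \sum_{n_1=0}^{3} x\!\left(\frac{N}{4} n_1 + \frac{N}{8} n_2 + n_3\right) W_N^{nk}
      = \sum_{n_3=0}^{N/8-1} \sum_{n_2=0}^{1} BF_4\!\left(\frac{N}{8} n_2 + n_3,\; k_1\right) W_N^{\left(\frac{N}{8} n_2 + n_3\right)(k_1 + 4k_2 + 8k_3)},$$

where, because $W_N^{\frac{N}{4} n_1 k} = W_4^{n_1 k_1}$, the radix-4 butterfly is

$$BF_4(m, k_1) = \sum_{n_1=0}^{3} x\!\left(m + \frac{N}{4} n_1\right) W_4^{n_1 k_1} .$$

The remaining twiddle factor splits as

$$W_N^{\left(\frac{N}{8} n_2 + n_3\right)(k_1 + 4k_2 + 8k_3)}
  = W_N^{\left(\frac{N}{8} n_2 + n_3\right) k_1}\; W_N^{\left(\frac{N}{8} n_2 + n_3\right)(4k_2 + 8k_3)}
  = W_8^{n_2 (k_1 + 4k_2)}\; W_N^{n_3 (k_1 + 4k_2)}\; W_{N/8}^{n_3 k_3},$$

so that

$$X(k_1 + 4k_2 + 8k_3) = \sum_{n_3=0}^{N/8-1} \left[ H(n_3, k_1, k_2)\, W_N^{n_3 (k_1 + 4k_2)} \right] W_{N/8}^{n_3 k_3},$$

$$H(n_3, k_1, k_2) = BF_4(n_3, k_1) + BF_4\!\left(n_3 + \frac{N}{8},\; k_1\right) W_8^{(k_1 + 4k_2)} .$$

For $N = 64$ this matches the radix-8 form with $l = n_3$ and $s = k_1 + 4k_2$: the inner 8-point sum $\sum_{m} x(l + 8m) W_8^{sm}$ equals $H(n_3, k_1, k_2)$, and $W_{64}^{sl} = W_N^{n_3 (k_1 + 4k_2)}$.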
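The decomposition above can be checked numerically. The following is a minimal sketch in plain Python, not the authors' model: the function names (`w`, `dft`, `bf4`, `h`, `fft_radix8_split`) are hypothetical, and the forward-DFT convention $W_N = e^{-j 2\pi / N}$ is assumed. It evaluates the first radix-8 stage through $BF_4$ and $H$ and compares the result with a direct DFT.

```python
import cmath


def w(n_pts, e):
    """Twiddle factor W_n^e = exp(-j*2*pi*e/n) (assumed forward-DFT convention)."""
    return cmath.exp(-2j * cmath.pi * e / n_pts)


def dft(x):
    """Direct O(N^2) DFT used as the reference."""
    n = len(x)
    return [sum(x[i] * w(n, i * k) for i in range(n)) for k in range(n)]


def bf4(x, m, k1):
    """Radix-4 butterfly: sum over n1 of x(m + N/4*n1) * W_4^(n1*k1)."""
    n = len(x)
    return sum(x[m + (n // 4) * n1] * w(4, n1 * k1) for n1 in range(4))


def h(x, n3, k1, k2):
    """Radix-2 combination of two radix-4 butterflies with a trivial W_8 twiddle."""
    n = len(x)
    return bf4(x, n3, k1) + bf4(x, n3 + n // 8, k1) * w(8, k1 + 4 * k2)


def fft_radix8_split(x):
    """Evaluate X(k1 + 4*k2 + 8*k3) through H(n3, k1, k2), as in the derivation."""
    n = len(x)
    out = [0j] * n
    for k1 in range(4):
        for k2 in range(2):
            for k3 in range(n // 8):
                out[k1 + 4 * k2 + 8 * k3] = sum(
                    h(x, n3, k1, k2) * w(n, n3 * (k1 + 4 * k2)) * w(n // 8, n3 * k3)
                    for n3 in range(n // 8)
                )
    return out


if __name__ == "__main__":
    samples = [complex(i % 7, -(i % 5)) for i in range(64)]
    err = max(abs(a - b) for a, b in zip(dft(samples), fft_radix8_split(samples)))
    print("max abs difference:", err)
```

The remaining sum over $n_3$ is itself an $N/8$-point DFT, which is where the same split is applied a second time.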

The expression for $H(n_3, k_1, k_2)$ shows that the first stage of the 64-point radix-8 FFT can be decomposed into radix-4 and radix-2 FFTs. The second stage of the radix-8 FFT can be decomposed into radix-4 and radix-2 FFTs by the same method. The real advantage of this method is that $W_8^{sm}$ and $W_8^{lt}$ are trivial twiddle factors: each is an addition/subtraction followed by a multiplication by $1/\sqrt{2}$, which can be realized using only a hardwired shift-and-add operation [2]. The only non-trivial twiddle factor is $W_{64}^{sl}$. Detailed derivations of the radix-8 and radix-2³ FFT algorithms can be found in [2] and [3].

III. DESIGN ARCHITECTURE

The block diagram of the 64-point FFT processor derived in Section II is depicted in figure 1. It consists of four butterfly feedback stages and one reorder stage. The architecture is based on the Single Path Delay Feedback architecture, because the delay-feedback approach is always more efficient than the corresponding delay-commutator approach in terms of memory utilization: the stored butterfly output can be used directly by the multiplier [2]. The unusual mixed-radix structure (a radix-4 butterfly followed by a radix-2 butterfly, then another radix-4 and another radix-2 butterfly) is intended to retain the advantage of the radix-8 FFT, namely that only one non-trivial twiddle factor is needed, while offering a simpler butterfly structure and higher butterfly utilization than radix-8.

Controller: In this design we do not implement a master controller. Each butterfly has its own controller, independent of the others. This approach leads to a modular and general butterfly structure. Each controller is activated by the head signal from the previous stage. The controller itself is a (log₂ N)-bit binary counter. In each butterfly, the count is divided into four groups for a radix-4 butterfly or two groups for a radix-2 butterfly. Each group is called a phase (ph). These phases control the memory modules, the butterfly operation, and the twiddle multiplication. Another control signal called stage (st) is needed by the twiddle stage to choose the multiplicative operation.

Radix-4 butterfly (stage 1 and stage 3): Stage 1 and stage 3 are radix-4 butterfly modules. Four phases control the butterfly operation. In the first three phases the input data is shifted directly into the shift register, while the previous data is taken to the output. The butterfly computation happens only in the last phase.

Radix-2 butterfly (stage 2 and stage 4): Stage 2 and stage 4 are radix-2 butterfly modules. As with stage 1 and stage 3, the only difference between stage 2 and stage 4 is the shift register length. Two phases control the butterfly operation.

Trivial twiddle factor: In this design there are four cases of trivial twiddle factor, one for each phase. From the algorithm in Section II, only the second half of the data in each phase needs to be multiplied by the trivial twiddle factor; the first half remains unchanged. That is why we need another control signal, which changes every eight clock cycles, to tell the twiddle factor mechanism whether the data needs to be multiplied or not.
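To illustrate why these twiddle factors are called trivial, the sketch below multiplies an integer complex sample by $W_8^p$ for $p = 0 \dots 3$ using only sign changes, a real/imaginary swap, and a shift-and-add scaling by roughly $1/\sqrt{2}$. It is a behavioural sketch under stated assumptions, not the circuit itself: the names `mul_inv_sqrt2` and `mul_w8` are hypothetical, the convention $W_8 = e^{-j 2\pi / 8}$ is assumed, and the power-of-two expansion $2^{-1} + 2^{-3} + 2^{-4} + 2^{-6} + 2^{-8}$ is one common choice picked for illustration, so it may differ from the term list used in the design.

```python
def mul_inv_sqrt2(v):
    """Approximate v / sqrt(2) with arithmetic right shifts and additions.

    Assumed expansion: 2^-1 + 2^-3 + 2^-4 + 2^-6 + 2^-8 = 0.70703125.
    """
    return (v >> 1) + (v >> 3) + (v >> 4) + (v >> 6) + (v >> 8)


def mul_w8(re, im, p):
    """Multiply (re + j*im) by W_8^p, p in 0..3, without a general multiplier."""
    if p == 0:   # W_8^0 = 1: no change
        return re, im
    if p == 1:   # W_8^1 = (1 - j)/sqrt(2): add/subtract, then scale
        return mul_inv_sqrt2(re + im), mul_inv_sqrt2(im - re)
    if p == 2:   # W_8^2 = -j: swap parts and negate one of them
        return im, -re
    if p == 3:   # W_8^3 = -(1 + j)/sqrt(2): add/subtract, then scale
        return mul_inv_sqrt2(im - re), -mul_inv_sqrt2(re + im)
    raise ValueError("p must be in 0..3")


if __name__ == "__main__":
    for p in range(4):
        print(p, mul_w8(1024, 0, p))
```

The truncating shifts introduce a small rounding error in addition to the approximation error of the expansion; a hardware design would size the word length with that in mind.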

TABLE 1 TRIVIAL TWIDDLE FACTOR CONSTANT

[Table 1: one trivial twiddle factor constant for each phase 00, 01, 10, 11]

As table 1 shows, in phase 0 and phase 2 the multiplication is either no change at all or just a swap and inversion of the real and imaginary parts. In phase 1 and phase 3 it involves an addition/subtraction and a multiplication by the constant $1/\sqrt{2}$. Following [2], the constant to be multiplied is known a priori. This constant can be decomposed into a sum/difference of powers of 2, which in essence results in a shift-and-add architecture. The constant $1/\sqrt{2}$ is represented as (… $+\, 2^{-3} + 2^{-4} + 2^{-8}$). With this representation, multiplying the input data by the constant turns into adding right-shifted copies of the input data.

Non-trivial twiddle factor: This operation uses ROMs to store the twiddle factors and one complex multiplier to perform the multiplication. The ROMs are very simple: we implement two arrays of constants, with the real parts of the twiddle factors stored in the first array and the imaginary parts in the second. We implement a custom-built multiplier based on the radix-4 recoding technique (modified Booth recoding). This approach proved to be the most efficient multiplier in terms of AT (area × time delay) compared with the Synopsys standard multiplier (using the * operator) and the standard multiplier plus shuffle network version (intended to reduce the number of twiddle factor constants). The complete comparison is presented in table 2.
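The two-array twiddle ROM described above can be sketched as follows. This is an illustrative model only: the names `quantize`, `build_twiddle_rom` and `lookup` are hypothetical, the 16-bit signed fixed-point scaling (Q1.15 with saturation) is an assumption, and a full 64-entry table is generated for simplicity even though a real design may store fewer constants.

```python
import math

WORD_BITS = 16  # assumed word length; the Q1.15 scaling is also an assumption


def quantize(value, bits=WORD_BITS):
    """Round a value in [-1, 1] to a signed fixed-point integer, with saturation."""
    scale = 1 << (bits - 1)
    q = round(value * scale)
    return max(-scale, min(scale - 1, q))


def build_twiddle_rom(n=64, bits=WORD_BITS):
    """Return (rom_re, rom_im) holding W_n^e = cos(2*pi*e/n) - j*sin(2*pi*e/n)."""
    rom_re = [quantize(math.cos(2 * math.pi * e / n), bits) for e in range(n)]
    rom_im = [quantize(-math.sin(2 * math.pi * e / n), bits) for e in range(n)]
    return rom_re, rom_im


def lookup(rom_re, rom_im, s, l):
    """Fetch the non-trivial twiddle W_64^(s*l) for 0 <= s, l <= 7."""
    addr = (s * l) % len(rom_re)
    return rom_re[addr], rom_im[addr]


if __name__ == "__main__":
    re_rom, im_rom = build_twiddle_rom()
    print(lookup(re_rom, im_rom, 2, 4))
```

Keeping the real and imaginary parts in separate arrays lets both operands of the complex multiplier be read with the same address in one access.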

TABLE 2 MULTIPLIER COMPARISONS

[Table 2: area (μm²) and AT of the compared multipliers]

From table 2 it can be clearly seen that radix-4 recoding is the best choice in terms of speed and AT. The other advantage of using a custom multiplier is that the synthesized circuit is independent of the synthesis tool.

Radix-4 recoding is a recoding process intended to reduce the number of partial products. This is achieved by recoding the multiplier from 2's-complement format into a signed-digit representation over the set {0, ±1, ±2} [5]. The radix-4 recoding starts by appending a zero to the right of $x_0$ (the multiplier LSB). Triplets are taken beginning at position $x_{-1}$ and continuing to the MSB, with one bit overlapping between adjacent triplets. If the number of bits in X (excluding $x_{-1}$) is odd, the sign (MSB) is extended one position to ensure that the last triplet contains 3 bits. In every step we get a signed digit that multiplies the multiplicand to generate a partial product. The recoding table is presented in table 3.

TABLE 3 RADIX-4 RECODING

x(i+2) x(i+1) x(i)    Partial product
000                   0Y
001                   +1Y
010                   +1Y
011                   +2Y
100                   -2Y
101                   -1Y
110                   -1Y
111                   0Y

In the straightforward implementation, complex multiplication needs four real multipliers and two adders, so we would need four Booth recoders to implement the multiplication using radix-4 recoding. But if we examine the multiplication formula closely,

$$(a + jb)(c + jd) = (ac - bd) + j(bc + ad),$$

and always keep one pair (a and b, for example) as the multipliers and the other pair (c and d) as the multiplicands, then we need only two radix-4 recoders instead of four [7]. The circuit block diagram is presented in the figure below. There are four inputs. Inputs a and b are recoded to choose the appropriate partial products. Once the radix-4 recoded partial products have been generated, they are shifted and added. To produce the real part, the sum of the second set of partial products is subtracted from the sum of the first. The imaginary part is the sum of the other two sets of partial products. The micro-architecture for radix-4 recoding is presented in figure 2.
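The recoding rule of table 3 and the two-recoder complex multiplication can be sketched in software as below. This is a behavioural illustration with hypothetical names (`booth_radix4_digits`, `partial_sum`, `complex_multiply`) and an assumed 16-bit operand width; it models the arithmetic only, not the shift-and-add hardware structure or its timing.

```python
# Recoding table: triplet value -> signed digit in {0, +-1, +-2}
BOOTH_TABLE = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
               0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}


def booth_radix4_digits(x, bits):
    """Recode a two's-complement multiplier into radix-4 signed digits, LSB first."""
    if bits % 2:                 # odd width: extend the sign by one position
        bits += 1
    pattern = x & ((1 << bits) - 1)   # two's-complement bit pattern of x
    padded = pattern << 1             # append x_(-1) = 0 to the right of the LSB
    # Overlapping triplets: each step advances two bits and reuses one bit.
    return [BOOTH_TABLE[(padded >> i) & 0b111] for i in range(0, bits, 2)]


def partial_sum(digits, y):
    """Shift and add the partial products digit * y * 4^i."""
    return sum((d * y) << (2 * i) for i, d in enumerate(digits))


def complex_multiply(a, b, c, d, bits=16):
    """(a + jb)(c + jd) with only a and b recoded (two recoders instead of four)."""
    da = booth_radix4_digits(a, bits)
    db = booth_radix4_digits(b, bits)
    real = partial_sum(da, c) - partial_sum(db, d)   # ac - bd
    imag = partial_sum(db, c) + partial_sum(da, d)   # bc + ad
    return real, imag


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))
    print(complex_multiply(3, -4, 5, 6))
```

Each recoded digit selects one of 0, ±Y, ±2Y, so an n-bit multiplier produces about n/2 partial products instead of n, and sharing the recoded digits of a and b between the real and imaginary paths is what removes the other two recoders.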


Figure 2 Architecture for a complex multiplier circuit with twiddle factor ROM

Reorder: The reorder stage is an integral part of the design; it realizes in-order serial-to-serial data input and output. We implement the reorder stage using only shift registers and multiplexers. The shift registers store the data temporarily before it is taken out as the output. We need 98 blocks of shift registers for the design. As the selector, we implement a 64-to-1 mapping using multiplexers.

IV. VERIFICATION AND IMPLEMENTATION

The verification process includes functional simulation, waveform simulation, and Signal Tap capture on the FPGA. Functional simulation was done to check that the HDL design matches the model. After the functional simulation was complete, the architecture was synthesized for the TSMC 0.18 μm library using Synopsys. The synthesis result is presented in table 4.

TABLE 4 PERFORMANCE COMPARISON OF THE PROPOSED FFT CIRCUIT WITH THE REFERENCE DESIGN AND WITH AVAILABLE CHIPSETS

FFT circuit                  Word length
Proposed (radix-2³ SDF)      16
Koushik [2] (radix-8)        16
T. Chen, L. Zhu [2]          16
T. Chen, Sunanda [2]         16
McCanny, D. Trainor [2]      24

The FPGA implementation is used to check whether the designed circuit functions correctly in the real world. We use an Altera Cyclone II EP2C35F672C6 board for this design implementation. We load the test vector and the expected result into ROM and compare the results. The output signal is captured using the Signal Tap II function of the Altera Quartus software. In figure 3, we use a 15-cycle complex sinusoid as our test vector. The test vector is fed continuously into the designed circuit. We use the 50 MHz internal clock to produce the clock signal. To capture the signals we use a push button as the trigger. The push button serves only as a trigger and has no connection to the design. The head signal is generated automatically at the beginning of the first data item using a counter.

Figure 3 FFT output from FPGA captured using Signal Tap II
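A software counterpart of this test can be sketched as follows. It is only a plausible reconstruction of the stimulus, with hypothetical names (`test_vector`, `reference_dft`, `peak_bin`) and assumed unit amplitude and positive rotation direction: a 15-cycle complex sinusoid over 64 samples should concentrate all its energy in a single output bin, which gives an easy expected result to compare against.

```python
import cmath

N = 64        # transform length
CYCLES = 15   # number of sinusoid cycles in one 64-sample frame


def test_vector(n=N, cycles=CYCLES):
    """Unit-amplitude complex sinusoid with `cycles` periods over n samples."""
    return [cmath.exp(2j * cmath.pi * cycles * i / n) for i in range(n)]


def reference_dft(x):
    """Direct DFT used as the expected result."""
    n = len(x)
    return [sum(x[i] * cmath.exp(-2j * cmath.pi * i * k / n) for i in range(n))
            for k in range(n)]


def peak_bin(spectrum):
    """Index of the output bin with the largest magnitude."""
    return max(range(len(spectrum)), key=lambda k: abs(spectrum[k]))


if __name__ == "__main__":
    spectrum = reference_dft(test_vector())
    print("peak bin:", peak_bin(spectrum))
```

With floating-point data the other 63 bins are zero up to rounding error; a fixed-point circuit would show small non-zero residues there.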

TABLE 5 AREA AND TIME DELAY SYNTHESIS RESULT

                   Value            Normalized
Area (μm²)         473163.78125     17780.62478
Time delay (ns)    6.03             24.25862069
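As a rough cross-check of these figures, and assuming that the 6.03 ns time delay can be taken as the clock period (an assumption; the operating clock of the synthesized design is not stated here), the 116-cycle serial-to-serial computation quoted in the abstract corresponds to

$$f_{\max} \approx \frac{1}{6.03\ \text{ns}} \approx 165.8\ \text{MHz}, \qquad t_{116} = 116 \times 6.03\ \text{ns} \approx 699.5\ \text{ns},$$

and the synthesized area of 473163.78 μm² is the 0.47 mm² core area quoted elsewhere in the paper.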


V. CONCLUSIONS

A 64-point FFT architecture for high-speed WLAN systems based on OFDM transmission has been presented. The architecture is based on a decomposition of the 64-point FFT into four stages of 4-point and 2-point FFTs. The algorithm offers simple FFT computations, so the resulting algorithm-to-architecture mapping is well suited for hardware implementation. The design exhibits numerous attractive features from a VLSI point of view, including regularity, modularity, and high throughput. The validity and efficiency of the proposed circuit have been verified by functional simulation, timing simulation, and FPGA implementation. The proposed design has been successfully synthesized using Synopsys with the TSMC 0.18 μm technology library. The core area is 0.47 mm², the power consumption is 29.7 mW, and the time delay is 6 ns. The circuit computes one serial-to-serial data set in 116 clock cycles. Thus our design has three advantages: small area, low power consumption, and fast computation in terms of speed and clock latency. These advantages show that the design is well suited for high-performance WLAN systems.

REFERENCES

[1] Shousheng He, Mats Torkelson. A New Approach to Pipeline FFT Processor. Department of Applied Electronics, Lund University.
[2] Koushik Maharatna, Eckhard Grass, Ulrich Jagdhold. A 64-Point Fourier Transform Chip for High-Speed Wireless LAN Application Using OFDM. IEEE Journal of Solid-State Circuits, Vol. 39, No. 3, March 2004.
[3] Modified radix-2³ FFT. Graduate Institute of Electronics Engineering, NTU.
[4] Wada Tomohisa. 64 Point Fast Fourier Transform Circuit (Version 1.0). Available: http://bw-www.ie.uryukyu.ac.jp/~wada/design07/spec_e.html
[5] J. A. Hidalgo. A Radix-8 Multiplier Unit Design For Specific Purpose. Dept. de Electronics, E.T.S.I Industriales.
[6] Joel J. Fster, Karl S. Gugel. Pipelined 64-Point Fast Fourier Transform For Programmable Logic Devices. Dept. of Electrical and Computer Engineering, University of Florida.
[7] Geoff Knagge. ASIC Design for Signal Processing. Available: http://www.geoffknagge.com/
[8] Lo'ai A. Tawalbeh, Alexandre F. Tenca, C. K. Ko. A Radix-4 Design of a Scalable Modular Multiplier With Recoding Techniques. School of Electrical Engineering & Computer Science, Oregon State University.
