You are on page 1of 7


A.Anbarasan1 P.G Scholar Arunai College of Engineering Thiruvannamalai K.Shankar2 Assistant Professor Arunai College of Engineering Thiruvannamalai

Fast Fourier transform (FFT )and Inverse Fast Fourier Transform(IFFT) processing is one of the key procedures in popular orthogonal frequency division multiplexing (OFDM) communication systems. Structured pipeline architectures, low power consumption, high speed and reduced chip area are the main concerns in this VLSI implementation. In this paper, the efficient implementation of FFT/IFFT processor for OFDM applications is presented. This processor can be used in various OFDM-based communication systems, such as Worldwide Interoperability for Microwave access (Wi-Max), digital audio broadcasting (DAB), and digital video broadcastingterrestrial (DVB-T). We adopt single-path delay feedback architecture. To reduce the size of the read only memories (ROM's) used to store the twiddle factors, this proposed architecture applies a reconfigurable complex multiplier to achieve a reduced ROM size. To reduce the truncation error we adopt the fixed width modified booth multiplier. The three processing elements (PEs), delay-line (DL) buffers are used for computing IFFT. Thus we consume the low power, highly accurate and efficient processor with reduced chip size. Keywords : FFT, IFFT, OFDM, Modified Booth Multiplier, Error compensation circuit.

1. Introduction
The Fast Fourier Transform (FFT) and its Inverse Fast Fourier Transform (IFFT) are essential in the field of digital signal processing (DSP), widely used in communication systems, especially in orthogonal frequency division multiplexing(OFDM) systems, wireless-LAN, ADSL, VDSL systems and Wi-MAX. Apart from the applications, the system demands high speed of operation, low power consumption, reduced truncation error and reduced chip size. By considering these facts, we proposed the FFT/IFFT processor with single path delay feedback (SDF) pipeline architecture with reconfigurable complex constant multiplier and fixed-width modified booth multiplier. Discrete Fourier transform(DFT) is the useful method to transform from the time domain to the frequency domain. Here 2N2 complex operations are required for the computation of the N-point DFT, there are a number of different 'Fast Fourier Transform' (FFT) algorithms that enable the calculations much faster than a DFT. FFT algorithms used for quick calculation of discrete Fourier transform of a data vector. The FFT is a DFT algorithm which reduces the number of computations needed for N points from (N2) to (N log N) where log

is the base-2 logarithm. It is the new technique to reduce the order of complexity operations of DFT based on decomposition and breaking the transform into smaller transform and combining them to give the total transform. FFT is used to speed up the DFT, it reduces the computation time required to compute a discrete Fourier transform and improves the performance by factor 100 or more over direct evaluation of DFT. The SDF pipelined architecture is used for the high-throughput in FFT processor. There are three types of pipeline structures; they are singlepath delay feedback (SDF), single-path delay commutator (SDC) and multi-path delay commutator. The advantages of single-path delay feedback (SDF) are (1) This SDF architecture is very simple to implement the different length FFT. (2) The required registers in SDF architecture is less than MDC and SDC architectures. (3) The control unit of SDF architecture is easier. We implement the processor in SDF architecture with radix-4 algorithm. There are various algorithms to implement FFT, such as radix-2, radix-4 and splitradix with arbitrary sizes [1]. Radix-2 algorithm is the simplest one, but its calculation of addition and multiplication is more than radix-4's. Though being

more efficient than radix-2, radix-4 only can process 4n-point FFT. The radix-4 FFT equation essentially combines two stages of a radix-2 FFT into one, so that half as many stages are required. Since the radix4 FFT requires fewer stages and butterflies than the radix 2 FFT, the computations of FFT can be further improved. In order to speed up the FFT computation we increase the radix, for reducing the chip size we use ROM-less architecture and for further low power consumption we implement the reconfigurable complex multiplier and delay line buffers[2]-[4], error compensation is carried out using fixed-width modified booth multiplier. Here reconfigurable complex constant multiplier works simultaneously for the multiplication of two pairwise coefficients. These coefficients are real and imaginary with limited number of points and used for the multiplication of twiddle factors in FFT architectures. Several error compensation methods have been used to reduce the truncation error in the multiplication of twiddle factors in FFT/IFFT processors. Here, this fixed-width modified booth multipliers for lossy applications achieve better error performance in terms of absolute error and mean square error. Using radix-4 algorithm, we propose a 64-point FFT/IFFT processor for OFDM applications. Finally, this paper is organized as follows. Section II describes the FFT/IFFT processor with radix-4 algorithm. Proposed architecture is discussed in Section III. The performance results is then discussed in Section IV. Finally conclusion and future work will see in SectionV.

log416=2stages. A 16-point, radix-4 decimation-infrequency FFT algorithm is shown in Figure 1. Its input is in normal order and its output is in digitreversed order. It has exactly the same computational complexity as the decimation-in-time radix-4 FFT algorithm.

Fig.1 Radix-4 decimation-in-frequency FFT SGF When the number of data points N in the DFT is a power of 4, then is more efficient computationally to employ a radix-4 algorithm instead of radix-2 algorithm. A radix-4 decimation in-time FFT algorithm is obtained by splitting the N-point input sequence x(n)into four sub sequences x(4n), x(4n + 1), x(4n + 2) and x(4n + 3). Its input is in normal order and its output is in digit-reversed order. It has exactly the same computational complexity as the decimation-in-time radix-4 FFT algorithm. A 16point, radix-4 decimation-in-frequency FFT algorithm is shown in Figure.1 The radix-4 decimation in frequency butterfly is constructed by merging 4-point DFT with associated coefficients between DFT stages. The four outputs of the radix-4 butterfly namely X(4n), X(4n+1), X(4N+2) and X(4N+3) are expressed in terms of its inputs x(n), x(n)+N/4 , x(n)+N/2 and x(n)+3N/4. For illustrative purposes, let us re-derive the radix-4 decimation-infrequency algorithm by breaking the N-point DFT formula into four smaller DFTs. We have

2. Radix-4 FFT algorithm

In last three decades, various FFT architectures such as single-memory architecture, dual memory architecture, pipelined architecture, array architecture and cache memory architecture have been proposed. In order to improve the power reduction, we propose a radix-4 64-point pipeline FFT/IFFT processor. In order to speed up the FFT computations, more advanced solutions have been proposed using an increase of the radix. The traditional FFT algorithms of radix 2 type have simple structure and data flow, which are easy to implement and are suitable for generic FFT implementation. These algorithms need large memory to store data at inner stages, which require large power and area consumption. The radix-4 FFT algorithm is most popular and has the potential to satisfy the current need. The radix-4 FFT equation essentially combines two stages of a radix-2 FFT into one, so that half as many stages are required. To calculate 16-point FFT, the radix-2 takes log216=4 stages but the radix-4 takes only

3. Proposed architecture
In this paper, low power techniques are employed for power consumption using reconfigurable complex multiplier. Using radix-4 algorithm with single-path delay feedback pipelined architecture increase the computational speed, further reduce the chip area by three different processing elements (PEs) were proposed in this radix-4 64-point FFT/IFFT processor. Our proposed architecture uses a low complexity reconfigurable complex multiplier instead of ROM tables to generate twiddle factors and fixed width modified booth multiplier to reduce the truncation error. Fig.2 Shows the Block diagram of FFT processor.

From the definition of the twiddle factors, we have

The relation is not an N/4-point DFT because the twiddle factor depends on N and not on N/4. To convert it into an N/4-point DFT we subdivede the DFT sequence into four N/4-point subsequences, X(4k), X(4k+1), X(4k+2), and X(4k+3), k = 0, 1, ..., N/4. Thus we obtain the radix-4 decimation-in frequency DFT as

Fig.2. Block diagram of FFT processor

A. Processing Elements
This proposed architecture consists of three different types of processing elements (PEs), reconfigurable constant complex multiplier, delay line buffers (as shown by a rectangle with a number inside) and some extra processing units for computing IFFT. Here, the conjugate operation is easy to implement, where we have to generate the 2s complement of the imaginary part of a complex value. This new multiplication structure becomes the key component in reducing the chip area and power consumption. Figure.3 shows the Radix-4 single-path delay feedback pipeline architecture, it is characterized by continous processing of output data. In addition, the pipeline architecture is highly regular, making it straightforward to automatically generate FFTs of various lengths[6]-[8].

where we have used the property WN4kn = WknN/4. Note that the input to each N/4-point DFT is a linear combination of four signal samples scaled by a twiddle factor. This procedure is repeated v times, where v = log4N.

Fig.3. A 64-point R4SDF Architecture Fig.6 Circuit diagram of our proposed PE1 Based on the radix-4 FFT algorithm, the three types of processing elements (PE3, PE2, PE1) proposed in our design. Illustrated in fig.4, fig.5 and fig.6 respectively. The PE3 stage is used to implement the radix-4 butterfly structure, and serves as sub-modules of the PE2and PE1 stages. In the PE2 stage, the calculation of multiplication by j or 1 uses the outcome of the PE3 module. Note that a multiplication by -1 is practically to take the 2s complement of the input value. The PE1 stage is responsible for computing the multiplications by j, WNN/8, and WN3N/8, respectively. Since WN3N/8= -jWN3N/8, it can be done with a multiplication by, WNN/8 first and then a multiplication by j. Hence, our designed hardware utilizes this kind of cascaded calculation and multiplexers to realize all the calculations in the PE1 stage.

B. Reconfigurable Multiplier
Fig.4 Circuit diagram of our proposed PE3 stage.



It is based on the trigonometric identities where the coefficients are real and imaginary and number of points are uniformly spread. Hence it is suitable for the multiplication of twiddle factors and highly used for the calculation of FFT architectures.

Fig.5 Circuit diagram of our proposed PE2 stage.

Fig.7. Block diagram of reconfigurable complex multiplier

Arithmetic circuits optimized for the required coefficients rather than general multipliers for the small range of twiddle factors. Here the required coefficient values are shown in table.1. It is based on the trigonometric identity of sin2= 2sincos[13]. As shown in figure.7 these multipliers are slightly modified and those multipliexers are added and trigonometric identities identify the smaller number of constant multiplications are combined to form a remaining coefficients at the output and allow the multipication into single output structure.

Fig.8. shows the partial product matrix of 8x8 modified booth multiplier. Next the simple error compensation function which takes this outputs of the booth encoders as inputs and generates the approximate carry value. The sum (MP) and sum(LP) represent the sum of partial products bits in MP and LP, then 2n-bit output is expressed as Table.1 trigonometry identities. P = A X B = SUM (MP) + SUM (LP) The SUM (LP) calculated as A and B are two bit signed numbers for multiplication operation and can be expressed as

C. Modified Booth Multipliers

This modified Fixed width modified Booth multipliers are used to reduce the truncation error.It is made by numerical algorithms that arises from taking finite number of steps in computation. It is present even with infinite-precision arithmetic, because it is caused by truncation of the infinite algorithm. This multiplier achieve better error performance in terms of absolute error. This modified booth multiplier produces at most n/2+1partial products[15]-[17]. In this multiplier, the partial product matrix of booth multiplication is slightly modified and effective error compensation function was derived. Fig.8. shows the partial product matrix of 8x8 booth multiplier.

To analyze the accuracy of the fixed-width modified Booth multipliers, Error metrics in terms of normalized absolute error , normalized mean error and normailzed mean square error are expressed as

max = Max{ P Pt

n }/2

mean = Ave { p pt}/2 mse = Ave {( P Pt )2} / 22n

Max = maximal operator Ave = average operator P = output product of standard modified booth multiplier

Pt = output product of fixed-width modified booth multiplier Here, mean and mean-square errorcan be significantly reduced so that this multiplier is used for the various applications.

4. Result
The performance of various FFT/IFFT architectures has been studied literally. Our proposed architecture consumes low power, lower hardware cost, as well as highly efficient when compare with other architectures with word length of 16, gate counts of 33600 and this design can also implemented in 0.18m CMOS technology. In order to improve the efficiency of FFT/IFFT processor, the fixed width modified booth multiplier with low error compensation circuit is used in our proposed work for reducing the truncation error and improve the peak signal to noise ratio (PSNR) by 2.0 dB . By the way, efficiency of our proposed work is improved about 10% to 15% when compared with other processors. Figure.9. shows the simulation results of maximum error, mean error and mean square error of fixed width modified booth mulitpliers

requires about 33.6k gates, and has a working frequency up to 80 MHz synthesized by using 0.18m CMOS technology. Since our design requires low-cost and consumes low power, as well as reduce the truncation error, SQNR and highly efficient and also adapted to high-point FFT applications. Hence it can be applied as a powerful FFT/IFFT processor in various wireless communication systems.

[1] Anbarasan.A, Shankar.K Design and implementation of low power FFT/IFFT processor for wireless communication Proceedings of the International Conference on Pattern Recognition, Informatics and Medical Engineering (PRIME-2012), pp. 152-155, March 2012. [2] Chu yu, Mao-Hsu Yen A Low power 64-point FFT/IFFT Processor for OFDM Applications IEEE Transactions on Consumer Electronics,Vol. 57, Feb 2011. [3] N.Kirubanandasarathy, Dr.K.Karthikeyan, VLSI Design of Mixed Radix FFT Processor for MIMIOFDM in Wireless Communication, 2011 IEEE Proceedings. [4] Fahad Qureshi and Oscar Gustafson,Twiddle Factor Memory Switching Activity Analysis of Radix-22 and Equivalent FFT Algorithms, IEEE Proceedings, April 2010. [5] Jianing Su, Zhenghao Lu, Low cost VLSI design of a flexible FFT processor, IEEE Proceedings, April 2010. [6] He Jing, Ma Lanjaun, Xu Xinyu, A Configurable FFT Processor, IEEE proceeding 2010. [7] M.Merlyn, FPGA Implementation of FFT Processor with OFDM Transceiver, 2010 IEEE proceeding. [8] Nuo Li and N.P.van der Meijs, A Radix based Parallel pipelined FFT processor for MB- OFDM UWB system, IEEE Proceedings, 2009. [9] Minhyrok Shin and Hanho Lee, A High-Speed Four Parallel Radix-24 FFT/IFFT Processor for UWB Applications, in Proc. IEEE Int. Symp. Circuits and systems, 2008, pp. 960-963. [10] Yuan Chen, Yu-Wei Lin, A Slock scalin FFT/IFFT processor for WiMAX Applications IEEE proceedings 2006. [11] Wei Han, Ahmet T. Erdogan, Tughrul Arslan, and Mohd. Hasan, High-Performance Low-Power FFT Cores, ETRI Journal, Volume 30, Number 3, June 2008. [12] K. Stevens and B. Suter, A Mathematical Approach to a Low Power FFT Architecture, IEEE Intl Symp. on Circuits and Systems, 2008.

Fig.9. calculation of maximum error, mean error, mean square error.

In this paper, we presented low power high efficient single-path delay feedback pipelined 64-point FFT/IFFT processor for OFDM applications has been described. For improving the speed of the processor we increased the radix size from radix-2 to radix-4. In order reduce the chip size by reducing its area we used the reconfigurable complex constant multiplier which multiplies as well as stores the twiddle factors and hence the ROM memory size is gets shrunked ans chip size is reduced. The efficiency of the processor can be improved by reducing its truncation error, mean error and mean square error of the multipliers used this processors by using fixed width modified booth multiplier. Our designed hardware

[13] Fahad Qureshi and Oscar Gustafson,Low complexity Reconfigurable complex constant multiplication for FFTs , IEEE Proceedings,2009. [14] Jiun-Ping Wang, Shiann-Rong Kuang, HighAccuracy Fixed-Width Modified Booth Multipliers for Lossy Applications, IEEE Transactions on Very Large Scale Integration (VLSI) systems ,Vol. 19 No.1, Jan 2011. [15] M.A.Song, L.D.Van,and S.Y.Kuo Adaptive low error fixed width booth multiplier,IEICE Trans.Fundamentals,vol.E90-A, no.6 pp.1180-1187, Jun.2007 [16] K.J Cho, K.C .Lee, J.G.Chung, and K.K.Parhi,Design of low error fixed width modified booth multiplier, IEEE Trans. Very large scale Integr. (VLSI) syst., vol.12,no.5,pp.522-531, May 2004 [17] High-Accuracy Fixed-Width Modified Booth Multipliers for Lossy Applications, Jiun-Ping Wang, Shiann-Rong Kuang, IEEE Transactions on Very Large Scale Integration (VLSI) systems ,Vol. 19 No.1, Jan 2011.