
Proceedings of International Conference On Current Innovations In Engineering And Technology

ISBN : 978 - 1502851550

Pipelined Parallel FFT Architecture through Folding Transformation


M.S.Krishna priya, M.TECH Student, DEPARTMENT of ECE, Shri Vishnu Engg college for women
D.Murali Krishna, Sr.ASST.PROFESSOR, DEPARTMENT of ECE, Shri Vishnu Engg college for women

ABSTRACT:

This project presents a FUSING FFT system for an OFDM application. It is demonstrated by a software-reconfigurable OFDM system using a programmable floating-point DSP. A new VLSI architecture for a real-time pipeline FFT processor is proposed. High-radix floating-point butterflies are implemented more efficiently with two fused floating-point operations: a two-term dot product and an add-subtract unit. Both discrete and fused radix processors are implemented and compared in terms of area. The clock cycles required to demodulate OFDM data using loop and straight-line FFT programming methods are provided. Higher execution speed is achieved by using straight-line code instead of looped code. The tradeoff of this optimization is the larger program memory requirement of the straight-line assembly code.

KEYWORDS: Fusing, OFDM, FFT, Radix, Dot product, Folding transformation, Optimization, complex valued Fourier transform, Add-subtract unit.

1. INTRODUCTION:

OFDM is a multimode modulation and multiple access technique used in a number of commercial wired and wireless applications. On the wired side, it is used for a variant of digital subscriber line (DSL). For wireless, OFDM is the basis for several television and radio broadcast applications, including the European digital broadcast television standard, as well as digital radio in North America. The FUSING platform provides software control of a variety of modulation schemes, wideband or narrowband operation, communications security functions such as frequency hopping, and the waveform requirements of current and evolving standards over a broad frequency range. It is viewed as a single radio platform, providing services to multiple cellular standards.

The fast Fourier transform (FFT) is widely used in digital signal processing (DSP) applications such as filtering and spectral analysis to compute the discrete Fourier transform (DFT). The FFT plays a critical role in modern digital communications such as digital video broadcasting and orthogonal frequency division multiplexing (OFDM) systems. Much research has been carried out on designing pipelined architectures for computation of the FFT of complex valued signals (CFFT). Various algorithms have been developed to reduce the computational complexity, of which the Cooley-Tukey radix-2 FFT [1] is very popular.

Note that this is not the only way to represent floating-point numbers; it is just the IEEE standard way of doing it. The representation has three fields:

International Association Of Engineering & Technology For Skill Development


72

www.iaetsd.in


| S | E | F |
----------------------------

S is one bit representing the sign of the number
E is an 8-bit biased integer representing the exponent
F is an unsigned integer

The decimal value represented is:

$$(-1)^{S} \times f \times 2^{e}$$

where $e = E - \text{bias}$ and $f = F/2^{n} + 1$.

For single-precision representation (a 32-bit representation, used for the butterflies in this paper): n = 23, bias = 127.
For double-precision representation (a 64-bit representation): n = 52 (there are 52 bits for the mantissa field), bias = 1023 (there are 11 bits for the exponent field).

2. FUSED FLOATING-POINT ADD-SUBTRACT UNIT:

The floating-point fused add-subtract unit (Fused AS) performs an addition and a subtraction in parallel on the same pair of data. The fused add-subtract unit is based on a conventional floating-point adder [8]. Although higher-speed adder designs are available (see [9] for example), the basic design shown here serves to demonstrate the concept. A block diagram of the fused add-subtract unit is shown in Fig. 5 (after the initial design from [10]). Some details, such as the LZA and normalization logic, are omitted here to simplify the figure. The exponent difference calculation, significand swapping, and significand shifting for both the add and the subtract operations are performed with a single set of hardware, and the results are shared by both operations. This significantly reduces the required circuit area. The significand swapping and shifting are done based solely on the values of the exponents (i.e., without comparing the significands).

To demonstrate the utility of the Fused DP and Fused AS units for FFT implementation, FFT butterfly unit designs using both the discrete and the fused units have been made. First, a radix-2 decimation-in-frequency FFT butterfly was designed. All lines carry complex pairs of 32-bit IEEE-754 numbers and all operations are complex. The complex add, subtract, and multiply operations can be realized with a discrete implementation that uses two real adders to perform the complex add or subtract, and four real multipliers and two real adders to perform the complex multiply.
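The radix-2 decimation-in-frequency butterfly described above can be sketched behaviorally in Python. This is only an illustrative model, not the hardware design: the function names (`fused_add_sub`, `fused_dot2`, `butterfly_discrete`, `butterfly_fused`) are hypothetical, the mapping of the butterfly onto two add-subtract and two dot-product operations is an assumption, and ordinary Python floats do not reproduce the rounding behavior or area savings of the fused units.

```python
# Behavioral sketch of a radix-2 DIF butterfly: y0 = x0 + x1, y1 = (x0 - x1) * w.
# Complex values are (real, imag) tuples so every real operation is visible.
# All names here are hypothetical and chosen for illustration only.

def fused_add_sub(a, b):
    """Stand-in for the Fused AS unit: sum and difference of one operand pair."""
    return a + b, a - b

def fused_dot2(a, b, c, d):
    """Stand-in for the Fused DP unit: two-term dot product a*b + c*d."""
    return a * b + c * d

def butterfly_discrete(x0, x1, w):
    """Discrete version: separate real adders and multipliers."""
    # Complex add: two real adders.
    y0 = (x0[0] + x1[0], x0[1] + x1[1])
    # Complex subtract: two real adders.
    dr, di = x0[0] - x1[0], x0[1] - x1[1]
    # Complex multiply: four real multipliers and two real adders.
    y1 = (dr * w[0] - di * w[1], dr * w[1] + di * w[0])
    return y0, y1

def butterfly_fused(x0, x1, w):
    """Fused version (assumed mapping): two add-subtracts, two dot products."""
    sr, dr = fused_add_sub(x0[0], x1[0])   # real parts of sum and difference
    si, di = fused_add_sub(x0[1], x1[1])   # imaginary parts of sum and difference
    y1 = (fused_dot2(dr, w[0], -di, w[1]),  # real part of (x0 - x1) * w
          fused_dot2(dr, w[1], di, w[0]))   # imaginary part of (x0 - x1) * w
    return (sr, si), y1
```

With x0 = 1 + 2j, x1 = 3 - j and twiddle factor w = -j, both functions return y0 = 4 + j and y1 = 3 + 2j, which matches ordinary complex arithmetic.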

The nodes A0..A7 represent the eight butterflies in the first stage of the FFT and B0..B7 represent the butterflies in the second stage. Assume the butterflies have only one multiplier at the bottom output instead of both outputs.

Although there is a multiplicative factor j after the first stage, the first two stages consist of only a real-valued datapath. We just need to combine the real and imaginary parts and send them as an input to the multiplier in the next stage. For this, we do not need a full complex butterfly stage. The factor is handled in the second butterfly stage using bypass logic, which forwards the two samples as real and imaginary parts to the input of the multiplier. The adder and subtractor in the butterfly remain inactive during that time.

Scheduling Method 2: Another way of scheduling is proposed which modifies the architecture slightly and also reduces the required number of delay elements. In this scheduling, the input samples are processed sequentially, instead of processing the even and odd samples separately. This can be derived using the following folding sets:

3. HIGH THROUGHPUT ARCHITECTURE:

The proposed FFT architecture consists of the following main parts, together with their specific novelties and advantages.
(i) A memory unit composed of 16 dual-port memory banks, which facilitates 16-way parallel data access.
(ii) A memory bank index and address generation unit (BAGU), which generates conflict-free and in-place memory bank indexes and addresses for the radix-16 FFT operation.
(iii) Four commutator blocks, located in front of the input side and after the output side of the memory, which provide an efficient data routing mechanism governed by the BAGU signals.
(iv) A scaling unit (SU), which coordinates controlled scaling operations for block floating point (BFP) operations and yields a higher signal-to-quantization-noise ratio (SQNR) than the existing designs.
(v) The kernel processing engine, which is a high-performance computing engine for radix-16 butterfly operations. It contains four radix-16 PEs (i.e., PE_R16 0 through PE_R16 3), two sets of radix-2 PEs (each set contains four radix-2 PEs), and four sets of complex multipliers (each contains four complex multipliers) for twiddle factor multiplications. Those multipliers are optimized with the help of a common-subexpression sharing technique and a new twiddle-factor multiplication scheme. All the function units inside the kernel processing engine are detailed.

To avoid possible conflicts in simultaneously reading (or writing) 16 data from (or to) the memory banks during FFT operations, a proper memory addressing scheme is necessary. The well-known non-conflict memory addressing schemes [5], [7] are only applicable to the radix-2 FFT algorithm. Although the addressing scheme in [6] is for general radix- FFT operations, its FFT size should be a power-of- number. Besides, those schemes are limited to single-PE architectures. On the other hand, the radix-2 addressing scheme for multiple PEs [16] is relatively inefficient compared with higher-radix schemes. The proposed scheme has three special features. First, it ensures conflict-free FFT butterfly executions during the entire FFT operation. Second, it supports parallel data outputs with normal ordering. This feature is always desirable for providing immediate and normal-order FFT outputs to the succeeding functional blocks, such as the channel estimator, for timely operations. Third, like many other designs, the in-place FFT computation strategy is also adopted for low memory overhead.

4. REORDERING OF THE OUTPUT SAMPLES:

Reordering of the output samples is an inherent problem in FFT computation. The outputs are obtained in bit-reversed order [5] in the serial architectures. In general, the problem is solved using a memory of size . Samples are stored in the memory in natural order using a counter for the addresses, and then they are read in bit-reversed order by reversing the bits of the counter. In embedded DSP systems, special memory addressing schemes are developed to solve this problem. But in real-time systems, this leads to an increase in latency and area. The order of the output samples in the proposed architectures is not the bit-reversed order. The output order changes for different architectures because of different folding sets/scheduling schemes.

5. BOOTH ENCODER FOR MULTIPLICATION:

We use the sign extension circuitry developed in [2] and [3]. The conventional MBE partial product array has two drawbacks: 1) an additional partial product term at the (n-2)th bit position; 2) poor performance at the LSB part compared with the non-Booth design when using the TDM algorithm. To remedy the two drawbacks, the LSB part of the partial product array is modified. The Row_LSB (gray circle) and the Neg_cin terms are combined and further simplified using Boolean minimization. All of these are efficiently implemented using this advanced modified Booth algorithm.

The figure below shows the architecture of the commonly used modified Booth multiplier. The inputs of the multiplier are the multiplicand X and the multiplier Y. The Booth encoder encodes input Y and derives the encoded signals as shown in the figure. The Booth decoder generates the partial products according to the logic diagram using the encoded signals and the other input X. The carry-save tree computes the last two rows by adding the generated partial products. The last two rows are then added to generate the final multiplication result using carry-save addition.

Fig: Modified Booth Encoder
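The encode, decode, and sum flow of the modified Booth multiplier can be illustrated with a short behavioral sketch in Python. It assumes the standard radix-4 modified Booth recoding, in which each pair of multiplier bits plus the previous bit yields a digit in {-2, -1, 0, 1, 2}. The function names are hypothetical, a plain `sum` stands in for the carry-save tree and the final adder, and the sign-extension circuitry and the Row_LSB/Neg_cin simplification discussed above are not modeled.

```python
# Behavioral sketch of radix-4 modified Booth multiplication.
# Function names are hypothetical and for illustration only.

def booth_radix4_digits(y, n):
    """Recode an n-bit two's-complement multiplier y (n even) into n/2
    Booth digits in {-2, -1, 0, 1, 2}, least significant digit first."""
    if n % 2 != 0:
        raise ValueError("n must be even for radix-4 recoding")
    y &= (1 << n) - 1          # keep the n-bit two's-complement pattern
    digits = []
    prev = 0                   # implicit bit y[-1] = 0
    for i in range(0, n, 2):
        b0 = (y >> i) & 1
        b1 = (y >> (i + 1)) & 1
        # Booth encoder: digit = -2*y[i+1] + y[i] + y[i-1]
        digits.append(-2 * b1 + b0 + prev)
        prev = b1
    return digits

def booth_multiply(x, y, n):
    """Multiply x by the n-bit two's-complement value y using Booth digits."""
    digits = booth_radix4_digits(y, n)
    # Booth decoder: one partial product per digit, weighted by 4**k.
    partial_products = [(d * x) << (2 * k) for k, d in enumerate(digits)]
    # A plain sum replaces the carry-save tree and final addition.
    return sum(partial_products)
```

For a 4-bit multiplier Y = 7 (binary 0111), the recoding gives the digits [-1, 2], i.e. 7 = -1 + 2 * 4, so only two partial products are generated instead of four.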


6. RESULT:

7. CONCLUSION:

This paper describes the design of two new fused floating-point arithmetic units and their application to the implementation of FFT butterfly operations. Although the fused add-subtract unit is specific to FFT applications, the fused dot product is applicable to a wide variety of signal processing applications. Both the fused dot product unit and the fused add-subtract unit are smaller than parallel implementations constructed with discrete floating-point adders and multipliers. The fused dot product is faster than the conventional implementation, since rounding and normalization are not required as a part of each multiplication. Due to longer interconnections, the fused add-subtract unit is slightly slower than the discrete implementation. An efficient and more flexible architecture of FFT is designed and experimental results are obtained with XILINX.

REFERENCES:
[1] J. W. Cooley and J. Tukey, An algorithm for machine calculation of complex Fourier series, Math. Comput., vol. 19, pp. 297-301, Apr. 1965.
[2] A. V. Oppenheim, R. W. Schafer, and J. R. Buck, Discrete-Time Signal Processing, 2nd ed. Englewood Cliffs, NJ: Prentice-Hall, 1998.
[3] P. Duhamel, Implementation of split-radix FFT algorithms for complex, real, and real-symmetric data, IEEE Trans. Acoust., Speech, Signal Process., vol. 34, no. 2, pp. 285-295, Apr. 1986.
[4] S. He and M. Torkelson, A new approach to pipeline FFT processor, in Proc. of IPPS, 1996, pp. 766-770.
[5] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[6] E. H. Wold and A. M. Despain, Pipeline and parallel-pipeline FFT processors for VLSI implementation, IEEE Trans. Comput., vol. C-33, no. 5, pp. 414-426, May 1984.
[7] A. M. Despain, Fourier transform using CORDIC iterations, IEEE Trans. Comput., vol. C-23, no. 10, pp. 993-1001, Oct. 1974.
[8] E. E. Swartzlander, W. K. W. Young, and S. J. Joseph, A radix-4 delay commutator for fast Fourier transform processor implementation, IEEE J. Solid-State Circuits, vol. SC-19, no. 5, pp. 702-709, Oct. 1984.
[9] E. E. Swartzlander, V. K. Jain, and H. Hikawa, A radix-8 wafer scale FFT processor, J. VLSI Signal Process., vol. 4, no. 2/3, pp. 165-176, May 1992.
[10] G. Bi and E. V. Jones, A pipelined FFT processor for word-sequential data, IEEE Trans. Acoust., Speech, Signal Process., vol. 37, no. 12, pp. 1982-1985, Dec. 1989.
[11] Y. W. Lin, H. Y. Liu, and C. Y. Lee, A 1-GS/s FFT/IFFT processor for UWB applications, IEEE J. Solid-State Circuits, vol. 40, no. 8, pp. 1726-1735, Aug. 2005.
[12] J. Lee, H. Lee, S. I. Cho, and S. S. Choi, A High-Speed two parallel radix- FFT/IFFT processor for MB-OFDM UWB systems, in Proc. IEEE Int. Symp. Circuits Syst., 2006, pp. 4719-4722.
[13] J. Palmer and B. Nelson, A parallel FFT architecture for FPGAs, Lecture Notes Comput. Sci., vol. 3203, pp. 948-953, 2004.
[14] M. Shin and H. Lee, A high-speed four parallel radix- FFT/IFFT processor for UWB applications, in Proc. IEEE ISCAS, 2008, pp. 960-963.
[15] M. Garrido, Efficient hardware architectures for the computation of the FFT and other related signal processing algorithms in real time, Ph.D. dissertation, Dept. Signal, Syst., Radio commun., Univ. Politecnica Madrid, Madrid, Spain, 2009.
