Nr  No. Uppgjord (även faktaansvarig om annan)  Prepared (also subject responsible if other)
Dokansv/Godkänd  Doc respons/Approved Kontr  Checked File
REPORT
/proj/fmd/fft/report02.fm
1(52)
FFT, REALIZATION AND IMPLEMENTATION IN FPGA
Griffith University/Ericsson Microwave Systems AB 2000/2001
by
Magnus Nilsson
Supervisor, EMW: Rune Olsson
Supervisor, GU: Prof. Kuldip K. Paliwal
Signal Processing Laboratory, School of Microelectronic Engineering, Griffith University
Brisbane/Gothenburg 2000/2001
[Figure: plot of cos(2*pi*4.5/16*t)+i*sin(2*pi*4.5/16*t), both axes ranging from −1 to 1]
EMWMSNN (Magnus Nilsson)
EMW/FX/DC (Anders Wanner) 20010212 A1
FX/D2001:007
Abstract
Ericsson Microwave Systems develops radar systems for military and civilian
applications. In military environments high radar resolution and long range
are desired, thus high demands must be met by the generated and
transmitted radar signal.
In this report the design of a parallel Radix-4 Fast Fourier Transform
algorithm is described. A theoretical review of Fourier theory and the
Fast Fourier Transform (Radix-2 and Radix-4) is given.
A complex parallel Radix-4 algorithm is simulated, implemented and realized
in hardware using VHDL and a Xilinx Virtex-E 1000 FPGA circuit.
The VHDL code was simulated and synthesized in the Ease and Synplify
environments. The design was verified and the output was identical with the
Matlab and VHDL simulations, proving speed improvements due to a parallel
approach.
Preface
This thesis is a part of my education towards a Master's degree in Computer
and Information Engineering at Griffith University, Brisbane, Australia.
Project 1, 2 and 3: MEE7097, MEE7098 and MEE7099.
The work has been done at Ericsson Microwave Systems AB in Mölndal,
Sweden, at the department FX/D.
I would like to thank the following people who have been of great help to me
during my work.
My supervisor Rune Olsson, EMW.
My manager Håkan Olsson/Anders Wanner, EMW.
Prof. Kuldip K. Paliwal, supervisor GU.
Daniel Wallström, EMW, for help with VHDL.
Dennis Eriksson, EMW, for help with Logical Analyser/Pattern generator.
Nils Dagås and Gabriel Gitye, EMW, for help with Matlab.
I would also like to thank the remaining staff at EMW/FX and GU who have
been helpful to me.
Contents
1 Introduction
1.1 Background
1.2 Task
1.3 Technical function
2 Jean-Baptiste-Joseph Fourier
3 The Fourier Transform
4 The Discrete Fourier Transform
5 Development of the Fast Fourier Transform
5.1 Theory of the Fast Fourier Transform
5.2 History of the Fast Fourier Transform
6 The Radix-2 Algorithm
Fig.1. FFT-Butterfly
Fig.2. Radix-2 DFT structure
Fig.3. Radix-2 vs. direct calculation in flops
Fig.4. Radix-2 algorithm compared with MATLAB function FFT
7 The Radix-4 Algorithm
Fig.5. Radix-4 Butterfly, also referred to as Dragonfly
Fig.6. Radix-4 FFT algorithm compared with Matlab FFT
8 Implementation and Realization in hardware
8.1 FPGA
Fig.7. CLB, Configurable logic block. Courtesy of Xilinx Inc.
8.2 Complex FFT
Fig.8. Construction configuration
8.3 Bit length
Fig.9. Radix-4 FFT, 12-bit length of samples
Fig.10. Radix-4 FFT, 14-bit length of samples
Fig.11. Radix-4 FFT, 16-bit length of samples
8.4 Radix-4 FFT algorithm, N = 64
Fig.12. Radix-4 FFT, N = 64
Fig.13. First FFT construction vs. Matlab FFT
Fig.14. Timing diagram for Radix-4 FFT, shared multiplier
8.5 Radix-4 FFT algorithm, N = 16
Fig.15. Input signal X1 and X2
Fig.16. Input signal X3 and X4
Fig.17. Radix-4, N = 16
Fig.18. Timing diagram for Radix-4 FFT length 16, 16 bits
Fig.19. Absolute value block
9 Verification and Results
9.1 Test pattern
9.2 Matlab verification
Fig.20. Output graph signal X1, absolute = 1
Fig.21. Output graph signal X2, absolute = 1
Fig.22. Output graph signal X3, absolute = 1
Fig.23. Output graph signal X4, absolute = 1
Fig.24. Output complex and absolute, signal 1 vs. Matlab
Fig.25. Output complex and absolute, signal 2 vs. Matlab
Fig.26. Output complex and absolute, signal 3 vs. Matlab
Fig.27. Output complex and absolute, signal 4 vs. Matlab
10 Conclusion
11 Ideas for further studies
12 References
Appendix A1: Ease block structure of Radix-4 FFT, N = 64
Appendix A2: Ease block structure of Radix-4 FFT, N = 64, shared multiplier
Appendix A3: Ease block structure of Radix-4 FFT, N = 16
Appendix B: Matlab code
Appendix C: Output listing
1 INTRODUCTION
1.1 BACKGROUND
Implementing the DFT (FFT) in hardware for a real-time system has
traditionally required an expensive solution, often an ASIC (Application
Specific Integrated Circuit). With the latest generation of FPGAs (Field
Programmable Gate Arrays) it is possible to implement very large amounts of
logic in a single integrated circuit.
The FPGA manufacturer Xilinx now has a drop-in module for their FPGAs
which can execute a 1024-point FFT. It is interesting to evaluate and
develop such a DFT or similar.
1.2 TASK
To study, implement and evaluate the DFT (Discrete Fourier Transform) in
FPGA or similar.
1.3 TECHNICAL FUNCTION
The DFT shall collect data, execute a DFT or IDFT and output the data. The
implementation shall be optimized for execution time, size (area) and cost.
2 JEAN-BAPTISTE-JOSEPH FOURIER
Jean-Baptiste-Joseph Fourier was born on the 21st of March 1768, in poor
circumstances, in the small town of Auxerre, France.
Jean-Baptiste-Joseph Fourier introduced the idea that an arbitrary function,
even a function defined by different analytic expressions in adjacent
segments of its range (such as a staircase waveform), could nevertheless be
represented by a single analytic expression.
Fourier's ideas encountered resistance at the time but have proven to be
central to many of the later developments in mathematics, science and
engineering. As we all know, they are at the heart of the electrical
engineering curriculum today.
Fourier came across the idea in connection with the problem of the flow of
heat in solid bodies, including the heat from the earth.
We have learned that Fourier was obsessed with heat, keeping his room
uncomfortably hot for visitors while even wearing a heavy coat himself.
Some have traced this obsession back to Egypt, where he went in 1798 with
Napoleon on an expedition to civilize the country. During this time Fourier
worked on his theories in parallel with his official duties as secretary of
the Institut d'Egypte. While in Egypt, Fourier came in contact with the
English physicist Thomas Young (1773-1829), father of linearity, with whom
he discussed his ideas and worked on, among other things, the Rosetta Stone.
After returning to Paris, Fourier had by 1807, despite his official duties,
completed his theory of heat conduction, which depended on the essential
idea of analysing the temperature distribution into spatially sinusoidal
components. He was strongly criticized for his theory by the French
scientists, among them Biot and Poisson. Even so, he received a mathematics
prize in 1811 for his heat theory.
The publication of his report "Théorie analytique de la chaleur" (The
analytical theory of heat) in 1822 was also met with some criticism, and this
might be seen as an indication of the deep uneasiness about Fourier
analysis that was felt by the great mathematicians of that day.
Jean-Baptiste-Joseph Fourier died in Paris on the 16th of May 1830. He never
married.
3 THE FOURIER TRANSFORM
One of today's principal analysis tools for many scientific challenges
is the Fourier Transform. Perhaps the best-known application of this
mathematical technique is the analysis of linear time-invariant systems.
Although this may be the best-known application, the Fourier Transform is
essentially a universal problem-solving technique. Its importance is based on
the fundamental property that one can examine a particular relationship from
an entirely different viewpoint. Simultaneous visualization of a function and
its Fourier Transform is often the key to successful problem solving.
If we define a signal:
$$ \lim_{t \to \infty} y(t) = 0 \tag{EQ 1} $$
The spectrum of a transient signal is characterised by the fact that it is
continuous. This means that it holds an infinite number of frequency
components, although usually they lie in a finite interval.
[Figure: Transient signal, plotted against n]
Mathematically, a signal that varies periodically with time can be defined as
a sum of discrete frequency components, where a simple relationship exists
between the frequency components. This can be written as:
$$ y(t) = \frac{1}{2}A_0 + \sum_{n=1}^{\infty} \left\{ A_n \cos(n\omega_0 t) + B_n \sin(n\omega_0 t) \right\} \tag{EQ 2} $$
where
$$ \omega_0 = \frac{2\pi}{T}, \qquad T = \text{period time} $$
If we look back at our transient signal above, the mathematical consequence
is that the coefficients become continuous functions of the angular
frequency ω. Equation 2 becomes:
$$ y(t) = \int_{0}^{\infty} \left[ A(\omega)\cos(\omega t) + B(\omega)\sin(\omega t) \right] d\omega \tag{EQ 3} $$
If we compare this equation with equation 2 we see that the constant
$A_0/2$ has disappeared. This is because $A_0/2$ represents the time mean
value of the signal and, since the signal is a transient, the time mean value
is zero. The Fourier coefficients A(ω) and B(ω) are defined by the Fourier
integrals:
$$ A(\omega) = 2\int_{-\infty}^{\infty} y(t)\cos(\omega t)\, dt \tag{EQ 4} $$
$$ B(\omega) = 2\int_{-\infty}^{\infty} y(t)\sin(\omega t)\, dt \tag{EQ 5} $$
where
$$ \omega > 0 $$
When one wants to calculate the Fourier coefficients in the general case for
a signal y(t), the calculations are made easier by introducing complex
notation. The starting point for the complex notation of the Fourier
Transform is Euler's formulas, which give a relation between the imaginary
unit j and the trigonometric functions sine and cosine:
$$ \cos\alpha = \frac{e^{j\alpha} + e^{-j\alpha}}{2} \tag{EQ 6} $$
$$ \sin\alpha = \frac{e^{j\alpha} - e^{-j\alpha}}{2j} \tag{EQ 7} $$
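Equations 3, 6 and 7 give us:
$$ \begin{aligned}
y(t) &= \int_{0}^{\infty} \left[ A(\omega)\left(\frac{e^{j\omega t} + e^{-j\omega t}}{2}\right) + B(\omega)\left(\frac{e^{j\omega t} - e^{-j\omega t}}{2j}\right) \right] d\omega \\
y(t) &= \int_{0}^{\infty} \left( A\,\frac{e^{j\omega t} + e^{-j\omega t}}{2} - jB\,\frac{e^{j\omega t} - e^{-j\omega t}}{2} \right) d\omega \\
y(t) &= \int_{0}^{\infty} \left( \frac{1}{2}(A - jB)\,e^{j\omega t} + \frac{1}{2}(A + jB)\,e^{-j\omega t} \right) d\omega
\end{aligned} \tag{EQ 8} $$
If we define Y(ω):
$$ Y(\omega) = \frac{1}{2}\left( A(\omega) - jB(\omega) \right) \tag{EQ 9a} $$
$$ Y(-\omega) = \frac{1}{2}\left( A(\omega) + jB(\omega) \right) \tag{EQ 9b} $$
Equation 8 can then be simplified by extending the integration over the
whole real axis:
$$ y(t) = \int_{-\infty}^{\infty} Y(\omega)\, e^{j\omega t}\, d\omega \tag{EQ 10} $$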
This will then give us, by looking at equations 4 and 5:
$$ \begin{aligned}
Y(\omega) &= \int_{-\infty}^{\infty} y(t)\cos(\omega t)\, dt - j\int_{-\infty}^{\infty} y(t)\sin(\omega t)\, dt \\
Y(\omega) &= \int_{-\infty}^{\infty} y(t)\left( \cos(\omega t) - j\sin(\omega t) \right) dt \\
Y(\omega) &= \int_{-\infty}^{\infty} y(t)\, e^{-j\omega t}\, dt
\end{aligned} \tag{EQ 11} $$
We then define Y(ω) as the Fourier Transform of y(t) and equation 10 as
the Inverse Fourier Transform.
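Equation 11 can be checked numerically for a simple transient. The sketch below is only an illustration: the test signal y(t) = e^(-t) for t >= 0, its closed-form transform 1/(1 + jω), the time grid and the chosen frequencies are all assumptions made for this example and do not come from the report.

```matlab
% Numerical check of EQ 11 for an assumed transient y(t) = exp(-t), t >= 0.
t = 0:1e-3:40;                     % time grid; y(t) is negligible beyond t = 40
y = exp(-t);                       % transient signal, y(t) -> 0 as t -> infinity
w = [0 0.5 1 2 5];                 % a few angular frequencies (illustrative)

Y = zeros(size(w));
for m = 1:numel(w)
    % EQ 11: Y(w) = integral of y(t)*exp(-j*w*t) dt, here by the trapezoidal rule
    Y(m) = trapz(t, y .* exp(-1j*w(m)*t));
end

Yexact = 1 ./ (1 + 1j*w);          % closed-form transform of exp(-t) for t >= 0
err = max(abs(Y - Yexact));        % largest deviation over the tested frequencies
fprintf('max |Y - Yexact| = %.2e\n', err);
```

The trapezoidal rule is used only because it is the simplest quadrature available; a finer grid reduces the remaining error further.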
4 THE DISCRETE FOURIER TRANSFORM
When sampling an arbitrary analog signal, the sampled signal can be
expressed as:
$$ y(t) = y(0)\delta(t) + y(T_S)\delta(t - T_S) + y(2T_S)\delta(t - 2T_S) + \ldots + y((N-1)T_S)\,\delta(t - (N-1)T_S) \tag{EQ 12} $$
where, according to the Nyquist theorem,
$$ \frac{1}{T_S} \ge 2 f_{max} $$
The function described above is a sum of time-delayed delta functions, each
of them with the height $y(nT_S)$:
$$ F\{ y(nT_S)\,\delta(t) \} = y(nT_S)\, F\{ \delta(t) \} $$
The Fourier Transform of each of those functions equals the Fourier
Transform of the undelayed function multiplied by the respective time-delay
factor:
$$ \begin{aligned}
Y(\omega) &= y(0) + y(T_S)\,e^{-j\omega T_S} + \ldots + y((N-1)T_S)\,e^{-j\omega (N-1) T_S} \\
Y(\omega) &= \sum_{n=0}^{N-1} y(nT_S)\, e^{-j\omega n T_S}
\end{aligned} \tag{EQ 13} $$
Since f = ω/2π is a discrete variable when we deal with a sampled signal, it
only adopts the discrete values:
$$ 0,\ \frac{1}{NT_S},\ \frac{2}{NT_S},\ \ldots,\ \frac{N-1}{NT_S} = \frac{k}{NT_S} \tag{EQ 14} $$
where k = 0, 1, 2, ..., N-1.
Equation 13 becomes:
$$ Y\left(\frac{2\pi k}{NT_S}\right) = \sum_{n=0}^{N-1} y(nT_S)\, e^{-jn\frac{2\pi k}{NT_S}T_S} = \sum_{n=0}^{N-1} y(nT_S) \left( e^{-j\frac{2\pi}{N}} \right)^{nk} \tag{EQ 15} $$
Once again k = 0, 1, 2, ..., N-1.
If we simplify the equation:
$$ W_N = e^{-j\frac{2\pi}{N}} \tag{EQ 16} $$
we get the final expression for the Discrete Fourier Transform:
$$ Y(k) = \sum_{n=0}^{N-1} y(n)\, W_N^{nk} \tag{EQ 17} $$
k = 0, 1, 2, ..., N-1
The factor $W_N$ is called the Twiddle Factor.
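Equation 17 can be evaluated directly as a matrix-vector product. The following is a minimal sketch, assuming an illustrative length N = 16 and an arbitrary test signal (both chosen for this example only); it compares the direct sum with MATLAB's built-in `fft`.

```matlab
% Direct evaluation of EQ 17: Y(k) = sum_{n=0}^{N-1} y(n) * W_N^(n*k)
N = 16;                            % illustrative length (assumption)
n = 0:N-1;                         % sample index n (row vector)
k = (0:N-1).';                     % frequency index k (column vector)
y = cos(2*pi*3*n/N) + 0.5*sin(2*pi*5*n/N);   % illustrative test signal

W_N = exp(-1j*2*pi/N);             % twiddle factor, EQ 16
Wnk = W_N .^ (k*n);                % N-by-N matrix with entries W_N^(n*k)
Y   = (Wnk * y.').';               % EQ 17 written as a matrix-vector product

err = max(abs(Y - fft(y)));        % deviation from the built-in FFT
fprintf('max |Y - fft(y)| = %.2e\n', err);
```

The matrix `Wnk` has N^2 entries, which makes the quadratic cost of the direct calculation visible.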
5 DEVELOPMENT OF THE FAST FOURIER TRANSFORM
5.1 THEORY OF THE FAST FOURIER TRANSFORM
If we consider equation 17:
$$ Y(k) = \sum_{n=0}^{N-1} y(n)\, W_N^{nk}, \qquad k = 0, 1, 2, \ldots, N-1 $$
and the number of additions and multiplications needed to compute it, for
instance in the case N = 4:
$$ \begin{aligned}
Y(0) &= y_0 W^0 + y_1 W^0 + y_2 W^0 + y_3 W^0 \\
Y(1) &= y_0 W^0 + y_1 W^1 + y_2 W^2 + y_3 W^3 \\
Y(2) &= y_0 W^0 + y_1 W^2 + y_2 W^4 + y_3 W^6 \\
Y(3) &= y_0 W^0 + y_1 W^3 + y_2 W^6 + y_3 W^9
\end{aligned} $$
or, in compact form:
$$ Y(n) = W^{nk}\, y(k) \tag{EQ 18} $$
If we then consider the twiddle factor and y(k), we will in the worst case
have two complex numbers. This gives $N^2$ complex multiplications and
N(N-1) complex additions. Suppose that we have a microprocessor that
can do an addition or a multiplication in 1 microsecond and that this
processor should compute a DFT on a set of 1024 samples (1 kbyte). With
$N^2$ complex multiplications and N(N-1) complex additions, roughly $2N^2$
additions and multiplications, we get: 2 x $1024^2$ x 1 microsecond = 2.1
seconds. This is without taking into consideration that the processor has to
update pointers and so on. If we want the analysis to be made in real time,
one block of 1024 samples must take longer than 2.1 seconds to collect.
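The estimate in this section can be reproduced with a few lines of arithmetic. This is only a sketch of the calculation as the text describes it; the variable names are chosen for this example.

```matlab
% Cost of a direct 1024-point DFT on a processor needing 1 microsecond per operation.
N      = 1024;                     % number of samples
t_op   = 1e-6;                     % time per addition or multiplication [s]
n_mult = N^2;                      % complex multiplications of the direct DFT
n_add  = N*(N-1);                  % complex additions of the direct DFT
t_dft  = 2*N^2*t_op;               % the text's approximation of about 2*N^2 operations [s]

T_s    = t_dft/N;                  % smallest sample spacing for real-time operation [s]
f_s    = 1/T_s;                    % corresponding maximum sampling frequency [Hz]
f_max  = f_s/2;                    % highest signal frequency by the Nyquist theorem [Hz]

fprintf('multiplications: %d, additions: %d\n', n_mult, n_add);
fprintf('time per DFT: %.2f s, sample spacing: %.2f ms\n', t_dft, 1e3*T_s);
fprintf('max sampling frequency: %.0f Hz, max signal frequency: %.0f Hz\n', f_s, f_max);
```

Like the text, the script treats complex operations as single operations, so the real cost on such a processor would be higher.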
The distance between the samples must therefore be at least
$$ \frac{2.1\ \text{s}}{1024} = 2.05\ \text{ms} $$
which gives us
$$ \frac{1}{2.05\ \text{ms}} = 488\ \text{Hz} $$
as the maximum sampling frequency. Taking the Nyquist theorem into account,
we cannot sample a signal that holds a frequency component exceeding half
the maximum sampling frequency: 488 Hz/2 = 244 Hz.
There are two obvious ways to increase the bandwidth: a faster processor or
an optimized algorithm.
5.2 HISTORY OF THE FAST FOURIER TRANSFORM
In the beginning of the 1960s, during a meeting of the President's Scientific
Advisory Committee, Richard L. Garwin found out that John W. Tukey was
writing about the Fourier Transform. Garwin was, in his own research, in
desperate need of a fast way to compute the Fourier Transform. When
questioned, Tukey outlined to Garwin essentially what has led to the famous
Cooley-Tukey algorithm.
To have the technique programmed, Garwin went to IBM Research in
Yorktown Heights, where he met James W. Cooley, who quickly worked out
a computer program for the algorithm. After a while, requests for copies and
a write-up began accumulating, and Cooley was asked to write a paper
on the algorithm. In 1965 this became the famous paper "An algorithm for
the machine calculation of complex Fourier series", which he published
together with Tukey.
After the paper was published, reports of other people using the same
technique became known, but the original idea is usually ascribed to Runge
and König.
The Cooley-Tukey algorithm is also called the Radix-2 algorithm, due to its
signal splitting.
6 THE RADIX-2 ALGORITHM
Once again consider equation 17:
$$ Y(k) = \sum_{n=0}^{N-1} y(n)\, W_N^{nk}, \qquad k = 0, 1, 2, \ldots, N-1 $$
and suppose we want to analyse the samples:
$$ \{y(n)\} = \{ y(0), y(1), y(2), \ldots, y(N-1) \} $$
If we consider the possibility to split the samples into even and odd
samples:
$$ \{y(2n)\} = \{ y(0), y(2), y(4), \ldots, y(N-2) \} $$
$$ \{y(2n+1)\} = \{ y(1), y(3), y(5), \ldots, y(N-1) \} $$
Doing the DFT for those two sequences will give:
$$ Y(k) = \sum_{n=0}^{\frac{N}{2}-1} y(2n)\, W_N^{2nk} + \sum_{n=0}^{\frac{N}{2}-1} y(2n+1)\, W_N^{(2n+1)k} \tag{EQ 19} $$
By extracting and simplifying the twiddle factor we are able to simplify even
further:
$$ Y(k) = \sum_{n=0}^{\frac{N}{2}-1} y(2n)\, W_N^{2nk} + W_N^{k} \sum_{n=0}^{\frac{N}{2}-1} y(2n+1)\, W_N^{2nk} \tag{EQ 20} $$
$$ W_N^2 = \left( e^{-j\frac{2\pi}{N}} \right)^2 = e^{-j\frac{2\pi}{N}2} = e^{-j\frac{2\pi}{N/2}} = W_{N/2} \tag{EQ 21} $$
$$ Y(k) = \sum_{n=0}^{\frac{N}{2}-1} y(2n)\, W_{N/2}^{nk} + W_N^{k} \sum_{n=0}^{\frac{N}{2}-1} y(2n+1)\, W_{N/2}^{nk} \tag{EQ 22} $$
By comparing this equation with equation 17 we find that these are, by
definition, two DFTs of length N/2:
$$ Y(k) = D(k) + W_N^{k} E(k), \qquad k = 0, 1, 2, \ldots, N-1 \tag{EQ 23} $$
where D and E represent the sums from equation 22. The number of
multiplications after this split (an ordinary DFT needs $N^2$) is:
$$ \left(\frac{N}{2}\right)^2 + \left(\frac{N}{2}\right)^2 = \frac{N^2}{4} + \frac{N^2}{4} = \frac{N^2}{2} $$
This number should be adjusted slightly, since the twiddle factor must also
be multiplied with the odd sum, but this contribution is only of first order
in N.
If we study equation 23, we find that k goes from 0 to N-1 but that D and
E represent DFTs of length N/2. In general, a DFT of length N is periodic
in k with period N. This means that D and E in equation 23 are periodic
with period N/2.
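One step of this even/odd split can be sketched as follows. The length N = 16 and the test signal are assumptions for illustration, and the two half-length DFTs are computed with the built-in `fft` purely for brevity; the point is the recombination according to equation 23 using the N/2-periodicity of D and E.

```matlab
% One radix-2 decimation step: Y(k) = D(k) + W_N^k * E(k), EQ 23.
N = 16;                            % illustrative length (assumption)
n = 0:N-1;
y = cos(2*pi*3*n/N) + 1j*sin(2*pi*5*n/N);    % illustrative complex test signal

D = fft(y(1:2:end));               % N/2-point DFT of the even samples y(0), y(2), ...
E = fft(y(2:2:end));               % N/2-point DFT of the odd samples  y(1), y(3), ...

k = 0:N-1;
W = exp(-1j*2*pi*k/N);             % twiddle factors W_N^k for k = 0..N-1
Y = [D D] + W .* [E E];            % EQ 23; [D D] and [E E] use the N/2-periodicity

err = max(abs(Y - fft(y)));        % deviation from the full-length FFT
mult_direct = N^2;                 % multiplications of the direct DFT
mult_split  = 2*(N/2)^2;           % multiplications of the two half-length DFTs
fprintf('max error: %.2e, multiplications: %d -> %d\n', err, mult_direct, mult_split);
```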
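The periodicity of D and E can be written as:
$$ D\left(k + \frac{N}{2}\right) = D(k) \tag{EQ 24} $$
$$ E\left(k + \frac{N}{2}\right) = E(k) \tag{EQ 25} $$
Calculating the DFT:
$$ \begin{aligned}
Y(0) &= D(0) + W_N^{0} \times E(0) \\
Y(1) &= D(1) + W_N^{1} \times E(1) \\
Y(2) &= D(2) + W_N^{2} \times E(2) \\
&\ \ \vdots \\
Y\left(\tfrac{N}{2}-1\right) &= D\left(\tfrac{N}{2}-1\right) + W_N^{\frac{N}{2}-1} \times E\left(\tfrac{N}{2}-1\right) \\
Y\left(\tfrac{N}{2}\right) &= D\left(\tfrac{N}{2}\right) + W_N^{\frac{N}{2}} \times E\left(\tfrac{N}{2}\right) = D(0) + W_N^{\frac{N}{2}} \times E(0) \\
Y\left(\tfrac{N}{2}+1\right) &= D\left(\tfrac{N}{2}+1\right) + W_N^{\frac{N}{2}+1} \times E\left(\tfrac{N}{2}+1\right) = D(1) + W_N^{\frac{N}{2}+1} \times E(1) \\
&\ \ \vdots \\
Y(N-1) &= D(N-1) + W_N^{N-1} \times E(N-1) = D\left(\tfrac{N}{2}-1\right) + W_N^{N-1} \times E\left(\tfrac{N}{2}-1\right)
\end{aligned} \tag{EQ 26} $$
By symmetry, the twiddle factor can be expressed as: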
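$$ W_N^{k + \frac{N}{2}} = -W_N^{k} \tag{EQ 27} $$
Which gives us:
$$ \begin{aligned}
Y(0) &= D(0) + W_N^{0} \times E(0) \\
Y(1) &= D(1) + W_N^{1} \times E(1) \\
&\ \ \vdots \\
Y\left(\tfrac{N}{2}-1\right) &= D\left(\tfrac{N}{2}-1\right) + W_N^{\frac{N}{2}-1} \times E\left(\tfrac{N}{2}-1\right) \\
Y\left(\tfrac{N}{2}\right) &= D(0) - W_N^{0} \times E(0) \\
Y\left(\tfrac{N}{2}+1\right) &= D(1) - W_N^{1} \times E(1) \\
&\ \ \vdots \\
Y(N-1) &= D\left(\tfrac{N}{2}-1\right) - W_N^{\frac{N}{2}-1} \times E\left(\tfrac{N}{2}-1\right)
\end{aligned} \tag{EQ 28} $$
By looking into equation 28 we find one elementary building block, the
so-called FFT butterfly.
[Figure: butterfly with inputs D(k) and E(k), twiddle factor $W_N^k$, and outputs Y(k) and Y(k+N/2)]
Fig.1. FFT-Butterfly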
Which gives the equations:
$$ Y(k) = D(k) + W_N^{k} \times E(k) \tag{EQ 29} $$
$$ Y\left(k + \frac{N}{2}\right) = D(k) - W_N^{k} \times E(k) \tag{EQ 30} $$
Since dividing the sequences into smaller building blocks reduces the number
of multiplications, we continue to divide the sequences into new blocks.
If we start with equation 22 and divide the sums into four new sums:
$$ \begin{aligned}
Y(k) &= \sum_{n=0}^{\frac{N}{4}-1} y(4n)\, W_{N/2}^{2nk} + \sum_{n=0}^{\frac{N}{4}-1} y(4n+2)\, W_{N/2}^{(2n+1)k} \\
&\quad + W_N^{k} \left\{ \sum_{n=0}^{\frac{N}{4}-1} y(4n+1)\, W_{N/2}^{2nk} + \sum_{n=0}^{\frac{N}{4}-1} y(4n+3)\, W_{N/2}^{(2n+1)k} \right\} \\
&= \sum_{n=0}^{\frac{N}{4}-1} y(4n)\, W_{N/2}^{2nk} + W_{N/2}^{k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+2)\, W_{N/2}^{2nk} \\
&\quad + W_N^{k} \left\{ \sum_{n=0}^{\frac{N}{4}-1} y(4n+1)\, W_{N/2}^{2nk} + W_{N/2}^{k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+3)\, W_{N/2}^{2nk} \right\}
\end{aligned} \tag{EQ 31} $$
And since
$$ W_{N/2}^{2} = W_{N/4} $$
we get
$$ \begin{aligned}
Y(k) &= \sum_{n=0}^{\frac{N}{4}-1} y(4n)\, W_{N/4}^{nk} + W_{N/2}^{k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+2)\, W_{N/4}^{nk} \\
&\quad + W_N^{k} \left\{ \sum_{n=0}^{\frac{N}{4}-1} y(4n+1)\, W_{N/4}^{nk} + W_{N/2}^{k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+3)\, W_{N/4}^{nk} \right\}
\end{aligned} \tag{EQ 32} $$
And by using equation 21 backwards
$$ W_{N/2}^{k} = W_N^{2k} $$
$$ \begin{aligned}
Y(k) &= \sum_{n=0}^{\frac{N}{4}-1} y(4n)\, W_{N/4}^{nk} + W_N^{2k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+2)\, W_{N/4}^{nk} \\
&\quad + W_N^{k} \left\{ \sum_{n=0}^{\frac{N}{4}-1} y(4n+1)\, W_{N/4}^{nk} + W_N^{2k} \sum_{n=0}^{\frac{N}{4}-1} y(4n+3)\, W_{N/4}^{nk} \right\}
\end{aligned} \tag{EQ 33} $$
If we continue to divide into smaller sums until we only have N/2 2-point
DFTs, we get the structure described below. The constraint, though, is that
the length N must be a power of two. The example below shows the
structure for N = 8.
[Figure: signal-flow graph for N = 8 with inputs x(0) to x(7), outputs X(0) to X(7), three stages (Stage 1, Stage 2, Stage 3) and twiddle factors $W_8^0$, $W_8^1$, $W_8^2$, $W_8^3$]
Fig.2. Radix-2 DFT structure
When computing a DFT using a Radix-2 algorithm for the case $N = 2^x$,
the decimation into smaller sums can be done $x = \log_2 N$ times. This
gives a total of $(N/2)\log_2 N$ complex multiplications and $N\log_2 N$
complex additions. The gain compared with a direct calculation is
enormous, as shown in the figure below:
[Figure: "Radix-2 vs. Direct Calculation in flops", number of flops plotted against N for the Radix-2 algorithm and for direct calculation]
Fig.3. Radix-2 vs. direct calculation in flops
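The operation counts quoted above can be tabulated for a few lengths. This is a small sketch that uses only the formulas from the text ($N^2$ and N(N-1) for the direct calculation, $(N/2)\log_2 N$ and $N\log_2 N$ for Radix-2); the chosen range of N is an assumption for illustration.

```matlab
% Complex operation counts: direct DFT versus Radix-2 FFT.
N = 2.^(1:10);                     % lengths 2, 4, ..., 1024 (powers of two)

mult_direct = N.^2;                % direct DFT: N^2 complex multiplications
add_direct  = N.*(N-1);            % direct DFT: N(N-1) complex additions
mult_radix2 = (N/2).*log2(N);      % Radix-2: (N/2)*log2(N) complex multiplications
add_radix2  = N.*log2(N);          % Radix-2: N*log2(N) complex additions

ratio = mult_direct ./ mult_radix2;   % gain in complex multiplications

fprintf('%6s %12s %12s %8s\n', 'N', 'direct', 'radix-2', 'ratio');
fprintf('%6d %12d %12d %8.1f\n', [N; mult_direct; mult_radix2; ratio]);
```

For N = 1024 the ratio shows that the direct calculation needs roughly two hundred times as many complex multiplications.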
Part of the Radix-2 Matlab algorithm.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialize variables.
t = 1:1:1024;
x = sin(2*pi*0.35*t)+sin(2*pi*0.25*t);
N = length(x);
b = bin2dec(fliplr(dec2bin(0:1:length(x)-1)))+1; % bit-reversed indices (1-based)
MC = x(b); % Make in bit reversed order
alfa = N/2; % number of butterfly groups in the current stage
beta = 1; % number of butterflies per group
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Twiddle factor (negative exponent, as in EQ 16)
for n = 1:N/2
    W(n) = exp(-j*2*pi*(n-1)/N);
    W_r(n) = cos(2*pi*(n-1)/N);
    W_i(n) = -sin(2*pi*(n-1)/N);
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate FFT using in-place non-recursive DIT FFT, radix-2
for h = 1:log2(N)
    b = 2^(h-1); % distance between the two inputs of a butterfly
    a = 1;
    aO = 1;
    for d = 1:alfa
        c = 1;
        for e = 1:beta
            temp1 = W(c)*MC(a+b);
            temp2 = MC(a);
            MC(a) = MC(a) + temp1;
            MC(a+b) = temp2 - temp1;
            a = a + 1;
            c = c + alfa;
        end
        a = aO + 2^(h);
        aO = a;
    end
    alfa = alfa/2;
    beta = beta*2;
end
[Figure: two panels plotted against N (0 to 500) for y(x) = sin(2*pi*0.35*t)+sin(2*pi*0.25*t): "Radix-2 FFT algorithm" and "MATLAB FFT algorithm"]
Fig.4. Radix-2 algorithm compared with MATLAB function FFT
7 THE RADIX-4 ALGORITHM
By developing the Radix-2 algorithm even further and using base 4
instead, we get a more complex algorithm that requires less computation.
As we will see, we get the new constraint that the number of data points N
in the DFT has to be a power of 4 (i.e. $N = 4^x$). Proceeding in the same
way as we did with the Radix-2 algorithm, we divide the data sequence into
four subsequences:
$$ \{y(4n)\} = \{ y(0), y(4), y(8), \ldots, y(N-4) \} $$
$$ \{y(4n+1)\} = \{ y(1), y(5), y(9), \ldots, y(N-3) \} $$
$$ \{y(4n+2)\} = \{ y(2), y(6), y(10), \ldots, y(N-2) \} $$
$$ \{y(4n+3)\} = \{ y(3), y(7), y(11), \ldots, y(N-1) \} $$
By using the approach described in [8] and by applying:
$$ X(p,q) = \sum_{l=0}^{3} \left[ W_N^{lq}\, F(l,q) \right] W_4^{lp}, \qquad p = 0, 1, 2, 3 \tag{EQ 34} $$
where F(l,q) is given by:
$$ F(l,q) = \sum_{m=0}^{\frac{N}{4}-1} x(l,m)\, W_{N/4}^{mq}, \qquad l = 0, 1, 2, 3, \qquad q = 0, 1, 2, \ldots, \frac{N}{4}-1 \tag{EQ 35} $$
And where:
$$ x(l,m) = x(4m + l), \qquad X(p,q) = X\left(\frac{N}{4}p + q\right) \tag{EQ 36} $$
The four N/4-point DFTs obtained from equation 35 are combined according to equation 34 to yield the N-point DFT, as described in [8]:

$\begin{bmatrix} X(0,q) \\ X(1,q) \\ X(2,q) \\ X(3,q) \end{bmatrix} = \begin{bmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{bmatrix} \begin{bmatrix} W_N^{0} F(0,q) \\ W_N^{q} F(1,q) \\ W_N^{2q} F(2,q) \\ W_N^{3q} F(3,q) \end{bmatrix}$   (EQ 37)

We also note that $W_N^0 = 1$, which gives three complex multiplications and 12 complex additions per Radix-4 butterfly. As the Radix-4 algorithm consists of v = log(N)/log(4) steps, where each step involves N/4 butterflies, we get 3*v*N/4 = (3N/8)log_2 N complex multiplications and 12*v*N/4 = (3N/2)log_2 N complex additions. Compared with the computational effort of the Radix-2 algorithm in chapter 5, this is a gain of 25% in complex multiplications, but the number of complex additions increases by 50%.
The matrix in equation 37 is better described with a Radix-4 butterfly:
Fig.5. Radix-4 Butterfly, also referred to as Dragonfly
[Figure 5: the inputs in0, in1, in2 and in3 are multiplied by W^0, W^q, W^2q and W^3q and then combined with the weights 1, -1, j and -j into the outputs A, B, C and D.]
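The operation counts quoted above can be tabulated with a short Python sketch. The Radix-4 figures use the three multiplications and 12 additions per butterfly stated in the text; the Radix-2 figures, (N/2)log_2 N multiplications and N log_2 N additions, are assumed here as the usual textbook values, and all function names are illustrative.

```python
def stage_count(n, radix):
    """Number of stages, i.e. the exponent v such that n == radix**v."""
    stages = 0
    while n > 1:
        assert n % radix == 0, "n must be a power of the radix"
        n //= radix
        stages += 1
    return stages

def radix2_ops(n):
    """(complex multiplications, complex additions), assumed textbook Radix-2 counts."""
    v = stage_count(n, 2)
    return (n // 2) * v, n * v

def radix4_ops(n):
    """(complex multiplications, complex additions): 3 and 12 per butterfly, N/4 butterflies per stage."""
    v = stage_count(n, 4)
    butterflies = n // 4
    return 3 * butterflies * v, 12 * butterflies * v

for n in (16, 64, 256, 1024):
    mul2, add2 = radix2_ops(n)
    mul4, add4 = radix4_ops(n)
    print(n, mul2, mul4, mul4 / mul2, add2, add4, add4 / add2)
```

For every listed length the multiplication ratio is 0.75 and the addition ratio is 1.5, which corresponds to the 25% gain and the 50% increase mentioned in the text.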
As we are interested in a complex FFT, we need to derive the equations for the complex Radix-4 algorithm.

$\begin{cases} a = in0 \times 1 \\ b = in1 \times W^{q} \\ c = in2 \times W^{2q} \\ d = in3 \times W^{3q} \end{cases} \Rightarrow \begin{cases} A = a + b + c + d \\ B = a - c - j(b - d) \\ C = a + c - (b + d) \\ D = a - c + j(b - d) \end{cases}$   (EQ 38)

Written out in real and imaginary parts (r = real, i = imag), this gives, starting with the easiest ones:

$Ar = ar + br + cr + dr$
$Ai = ai + bi + ci + di$
$Cr = ar - br + cr - dr$
$Ci = ai - bi + ci - di$   (EQ 39)

And continuing with B gives:

$B = ar + j\,ai - cr - j\,ci - j\,br - j^2\,bi + j\,dr + j^2\,di$
$B = ar + j\,ai - cr - j\,ci - j\,br + bi + j\,dr - di$   (EQ 40)

where the terms multiplied by j form the imaginary part and the remaining terms the real part. Divided into real and imaginary part:

$Br = ar - cr + bi - di$
$Bi = ai - ci - br + dr$   (EQ 41)
And the last one gives:

$D = ar + j\,ai - cr - j\,ci + j\,br + j^2\,bi - j\,dr - j^2\,di$
$D = ar + j\,ai - cr - j\,ci + j\,br - bi - j\,dr + di$   (EQ 42)

where, again, the terms multiplied by j form the imaginary part and the remaining terms the real part. Divided into real and imaginary part:

$Dr = ar - cr - bi + di$
$Di = ai - ci + br - dr$   (EQ 43)

To get the inputs ar, ai, br, bi, cr, ci, dr and di, we have to multiply the inputs in0r, in0i and so on by the twiddle factor. This results in:

$br = in1r \times \cos(x) + in1i \times \sin(x)$
$bi = in1i \times \cos(x) - in1r \times \sin(x)$   (EQ 44)

The same form applies to all input signals, where x is the dragonfly-specific value of the twiddle factor.
As the goal of this project is to implement a very fast Fourier transform in a real-time programmable logic system, we want as few complex multiplications as possible, since each multiplication consumes a lot of logic. With this in mind, the Radix-4 algorithm was the obvious choice for implementation, as it requires fewer complex multiplications than the Radix-2 algorithm.
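The real and imaginary equations 39, 41, 43 and 44 can be cross-checked against plain complex arithmetic. The Python sketch below is only an illustration: the function names, the sample values and the choice N = 64, q = 5 are assumptions, and x is taken as the angle 2*pi*l*q/N of the twiddle factor for input l.

```python
import cmath
import math

def dragonfly_real(inputs, angles):
    """Dragonfly in real arithmetic. inputs: four (re, im) pairs in0..in3; angles: x per input."""
    parts = []
    for (re, im), x in zip(inputs, angles):
        # EQ 44: multiplication by cos(x) - j*sin(x), split into real and imaginary part
        parts.append((re * math.cos(x) + im * math.sin(x),
                      im * math.cos(x) - re * math.sin(x)))
    (ar, ai), (br, bi), (cr, ci), (dr, di) = parts
    out_a = (ar + br + cr + dr, ai + bi + ci + di)  # EQ 39
    out_b = (ar - cr + bi - di, ai - ci - br + dr)  # EQ 41
    out_c = (ar - br + cr - dr, ai - bi + ci - di)  # EQ 39
    out_d = (ar - cr - bi + di, ai - ci + br - dr)  # EQ 43
    return [out_a, out_b, out_c, out_d]

def dragonfly_complex(inputs, angles):
    """The same dragonfly written directly from EQ 38 with complex numbers."""
    a, b, c, d = (complex(re, im) * cmath.exp(-1j * x)
                  for (re, im), x in zip(inputs, angles))
    return [a + b + c + d,
            (a - c) - 1j * (b - d),
            (a + c) - (b + d),
            (a - c) + 1j * (b - d)]

n, q = 64, 5                                           # illustrative FFT length and index
angles = [2 * math.pi * l * q / n for l in range(4)]   # angles of W^0, W^q, W^2q, W^3q
inputs = [(0.5, -0.25), (-1.0, 0.75), (0.125, 0.5), (0.9, -0.6)]
real_form = dragonfly_real(inputs, angles)
complex_form = dragonfly_complex(inputs, angles)
max_diff = max(abs(complex(re, im) - z) for (re, im), z in zip(real_form, complex_form))
print(max_diff)
```

Both forms should agree to within floating-point rounding, which is the property the hardware description relies on when it works with separate real and imaginary signals.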
Part of the Radix-4 Matlab algorithm:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialize variables.
t = 1:1:256;                              % sample index
x = sin(2*pi*0.35*t)+sin(2*pi*0.38*t);    % two-tone test signal
x1 = x;                                   % copy of the input for the comparison with fft
n = length(x);
v = round(log(n)/log(4));                 % number of Radix-4 stages
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Radix-4 Algorithm
for q = 1:v
    L = 4^q;                              % transform length reached after this stage
    r = n/L;                              % number of transforms of length L
    Lx = L/4;
    rx = 4*r;
    y = x;                                % read from y, write to x
    for j = 0:Lx-1
        for k = 0:r-1
            % inputs multiplied by the twiddle factors W_L^0, W_L^j, W_L^2j, W_L^3j
            a = y(j*rx + k + 1);
            b = exp(-i*2*pi*j/L)*y(j*rx + r + k + 1);
            c = exp(-i*2*pi*2*j/L)*y(j*rx + 2*r + k + 1);
            d = exp(-i*2*pi*3*j/L)*y(j*rx + 3*r + k + 1);
            t0 = a + c;
            t1 = a - c;
            t2 = b + d;
            t3 = b - d;
            % dragonfly outputs A, B, C and D
            x(j*r + k + 1) = t0 + t2;
            x((j + Lx)*r + k + 1) = t1 - i*t3;
            x((j + 2*Lx)*r + k + 1) = t0 - t2;
            x((j + 3*Lx)*r + k + 1) = t1 + i*t3;
        end
    end
end
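For readers without Matlab, the loop structure above can be ported to Python and checked against a direct DFT. This is a sketch under the assumption that the twiddle factors are W_L^j = exp(-i*2*pi*j/L); the names `radix4_fft` and `dft` are illustrative.

```python
import cmath
import math

def dft(x):
    """Direct DFT used as reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def radix4_fft(x):
    """Iterative Radix-4 FFT with the same index pattern as the Matlab listing."""
    n = len(x)
    stages, m = 0, n
    while m > 1:
        assert m % 4 == 0, "length must be a power of 4"
        m //= 4
        stages += 1
    x = [complex(value) for value in x]
    for q in range(1, stages + 1):
        L = 4 ** q          # transform length reached after this stage
        r = n // L          # number of transforms of length L
        Lx = L // 4
        rx = 4 * r
        y = x[:]            # read from y, write to x
        for j in range(Lx):
            w1 = cmath.exp(-2j * cmath.pi * j / L)   # W_L^j
            w2, w3 = w1 * w1, w1 * w1 * w1           # W_L^2j, W_L^3j
            for k in range(r):
                a = y[j * rx + k]
                b = w1 * y[j * rx + r + k]
                c = w2 * y[j * rx + 2 * r + k]
                d = w3 * y[j * rx + 3 * r + k]
                t0, t1, t2, t3 = a + c, a - c, b + d, b - d
                x[j * r + k] = t0 + t2
                x[(j + Lx) * r + k] = t1 - 1j * t3
                x[(j + 2 * Lx) * r + k] = t0 - t2
                x[(j + 3 * Lx) * r + k] = t1 + 1j * t3
    return x

# Same two-tone signal as in the Matlab listing, shortened to 64 samples.
samples = [math.sin(2 * math.pi * 0.35 * t) + math.sin(2 * math.pi * 0.38 * t)
           for t in range(1, 65)]
max_diff = max(abs(a - b) for a, b in zip(radix4_fft(samples), dft(samples)))
print(max_diff)
```

Because each stage reads from a copy and writes the results in sorted order, no separate digit-reversal pass is needed in this formulation.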
Fig.6. Radix-4 FFT algorithm compared with Matlab FFT
[Figure 6: two panels, "Radix-4 FFT algorithm" and "Matlab FFT algorithm", each plotted against n from 0 to 250.]
8 IMPLEMENTATION AND REALIZATION IN HARDWARE
Classical implementations of the FFT algorithm, on a processor or in hardware, usually use a sequential and in some cases recursive algorithm because of space and memory requirements. This slows down the execution time. Modern programmable circuits, such as an FPGA, make a parallel realization of the FFT possible.
8.1 FPGA
The real-time FFT construction was meant to be realized in an FPGA, a field programmable gate array, designed and manufactured by Xilinx, Inc. The Xilinx FPGA model Virtex-E is a state-of-the-art programmable gate array for high-speed, highly complex logic designs. The family covers a wide range of models, from small to large circuits. The logic inside an FPGA is built around a building block called a CLB, configurable logic block.
Fig.7. CLB, Configurable logic block. Courtesy of Xilinx Inc.
Each of these blocks is divided into two slices, where each slice consists of two lookup tables and some storage elements. The slices are internally interconnected and form the basic high-speed logic of the circuit.
Available for the implementation in this project was a PCB with a Xilinx Virtex-E 1000 mounted. This circuit holds a CLB array of 64 x 96 = 6144 CLBs.
For more information about the Xilinx Virtex-E, refer to the Xilinx Virtex-E data book [13].
8.2 COMPLEX FFT
The Ericsson Microwave specification for the project was to simulate, realize and implement a complex FFT in an FPGA, a Xilinx Virtex-E 1000. The specifications for the FFT were:
FFT length
Minimum: 16 complex samples
Maximum: 1024 complex samples
Typical: 64 or 256 (16)
Number of bits for the input signal
Minimum: 10 bits
Maximum: 16 bits
Typical: 12
The idea was to implement the FFT as a building block in a construction where the FFT-block is placed after a quadrature-divided, A/D-converted signal, as described in the figure below:
Fig.8. Construction configuration
[Figure 8: signal chain f(x), A/D, x(n), I/Q, with the I and Q signals fed to the FFT in the FPGA (Xilinx Virtex-E 1000), which delivers I and Q outputs.]
8.3 BIT LENGTH
The first thing to consider when implementing something discrete in hardware is the bit length with which the samples are to be represented. The best way to do this is to simulate different bit lengths and compare the phase error and the amplitude error factor with the constraints of the construction.
Fig.9. Radix-4 FFT, 12-bit length of samples
[Figure 9: four panels plotted against n: "Radix-4 FFT, Bits = 12" and "MATLAB FFT" in dB for the signal sin(2*pi*f1*p/Fs), "Amplitude error factor Radix-4 FFT/MATLAB FFT" in dB, and "Phase error".]
Fig.10. Radix-4 FFT, 14-bit length of samples
[Figure 10: four panels plotted against n: "Radix-4 FFT, Bits = 14" and "MATLAB FFT" in dB for the signal sin(2*pi*f1*p/Fs), "Amplitude error factor Radix-4 FFT/MATLAB FFT" in dB, and "Phase error".]
Fig.11. Radix-4 FFT, 16-bit length of samples
[Figure 11: four panels plotted against n: "Radix-4 FFT, Bits = 16" and "MATLAB FFT" in dB for the signal sin(2*pi*f1*p/Fs), "Amplitude error factor Radix-4 FFT/MATLAB FFT" in dB, and "Phase error".]
The constraint for the real-time FFT construction was to minimize the phase and amplitude error as far as possible while keeping the construction realizable. The simulation results pointed towards 16 bits, as this gave a small phase error.
8.4 RADIX-4 FFT ALGORITHM, N = 64
The first attempt in the implementation phase was a Radix-4 FFT algorithm of length 64 complex samples. For a Radix-4 FFT of length N = 64 there are 3 dragonfly ranks, each comprising 16 dragonflies.
In the first revision of the construction, the bit length of the input samples to the first dragonfly rank was 12, due to the precision of the quadrature block in figure 8. Those input samples were then multiplied by the phase factor for the corresponding block, also with a precision of 12 bits. As the multiplication of two 12-bit values generates a 24-bit product (2^12 * 2^12 = 2^24), the complex output of the multiplication is rounded and truncated to 14 bits. This results in a 14-bit input to the second rank of dragonflies, which, using the same model as for the first rank, generates a complex output of 16 bits. As there is also a third rank of dragonflies, the complex output from the FFT construction has 18 bits.
Fig.12. Radix-4 FFT, N = 64
[Figure 12: block diagram "Radix-4 FFT, N = 64" with three dragonfly ranks in series; the I and Q paths are 12 bits wide at the input, 14 bits after the first rank, 16 bits after the second rank and 18 bits at the output.]
The FFT-block was constructed using the software Ease and the programming language VHDL, i.e. Very high speed integrated circuit Hardware Description Language.
Ease is a block-model description tool that lets you construct the algorithm as blocks, takes care of the interconnection between the blocks and then generates the VHDL code for this interconnection [10].
Part of the VHDL code for one of the dragonflies in the first rank is as follows:
begin -- process radix4
  if clk'event and clk = '1' then
    -- Multiply the inputs by the cos/sin constants of the phase factor
    ar_temp <= in0r*cos_0j;
    ai_temp <= in0i*sin_0j;
    br_temp1 <= in1r*cos_1j;
    br_temp2 <= in1i*sin_1j;
    bi_temp1 <= in1i*cos_1j;
    bi_temp2 <= in1r*sin_1j;
    cr_temp1 <= in2r*cos_2j;
    cr_temp2 <= in2i*sin_2j;
    ci_temp1 <= in2i*cos_2j;
    ci_temp2 <= in2r*sin_2j;
    dr_temp1 <= in3r*cos_3j;
    dr_temp2 <= in3i*sin_3j;
    di_temp1 <= in3i*cos_3j;
    di_temp2 <= in3r*sin_3j;
    -- Combine the partial products into real and imaginary parts (EQ 44)
    br_temp <= br_temp1 + br_temp2;
    bi_temp <= bi_temp1 - bi_temp2;
    cr_temp <= cr_temp1 + cr_temp2;
    ci_temp <= ci_temp1 - ci_temp2;
    dr_temp <= dr_temp1 + dr_temp2;
    di_temp <= di_temp1 - di_temp2;
    -- Keep the upper bits of each product and add the rounding constant
    ar_round <= ar_temp((N*2-1) downto (N-3)) + round;
    ai_round <= ai_temp((N*2-1) downto (N-3)) + round;
    br_round <= br_temp((N*2-1) downto (N-3)) + round;
    bi_round <= bi_temp((N*2-1) downto (N-3)) + round;
    cr_round <= cr_temp((N*2-1) downto (N-3)) + round;
    ci_round <= ci_temp((N*2-1) downto (N-3)) + round;
    dr_round <= dr_temp((N*2-1) downto (N-3)) + round;
    di_round <= di_temp((N*2-1) downto (N-3)) + round;
    -- Shared sums and differences a + c and a - c (rounding bit dropped)
    temp1r <= ar_round((N+2) downto 1) + cr_round((N+2) downto 1);
    temp1i <= ai_round((N+2) downto 1) + ci_round((N+2) downto 1);
    temp2r <= ar_round((N+2) downto 1) - cr_round((N+2) downto 1);
    temp2i <= ai_round((N+2) downto 1) - ci_round((N+2) downto 1);
    -- Dragonfly outputs A, B, C and D (EQ 39, 41 and 43)
    ar_out <= temp1r + br_round((N+2) downto 1) + dr_round((N+2) downto 1);
    ai_out <= temp1i + bi_round((N+2) downto 1) + di_round((N+2) downto 1);
    br_out <= temp2r + bi_round((N+2) downto 1) - di_round((N+2) downto 1);
    bi_out <= temp2i - br_round((N+2) downto 1) + dr_round((N+2) downto 1);
    cr_out <= temp1r - br_round((N+2) downto 1) - dr_round((N+2) downto 1);
    ci_out <= temp1i - bi_round((N+2) downto 1) - di_round((N+2) downto 1);
    dr_out <= temp2r - bi_round((N+2) downto 1) + di_round((N+2) downto 1);
    di_out <= temp2i + br_round((N+2) downto 1) - dr_round((N+2) downto 1);
  end if;
end process radix4;
end a0 ; -- of Block1
The VHDL code above describes exactly the dragonfly illustrated in figure 5. For this block the phase/twiddle factor is simple and can easily be realized as a right shift of the input signal, but towards the last dragonfly in the construction the phase factor constant becomes more complex and has to be realized as a high-performance multiplier. The complete block description is given in appendix A1.
The 64 complex input samples are shifted into the FFT-block using a shift register. This register is divided into a real and an imaginary part, and the (complex) input receives a new sample every clock cycle. When the shift register is full, it generates a valid signal that triggers the FFT-block, which starts the FFT process. When the process is done, a new valid signal is generated and an output shift register is started. On every clock cycle a new processed value is delivered to the output. After all 64 values have been delivered, the valid signal goes low, indicating that every sample has been shifted out.
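The rounding step in the first dragonfly listing (slice (N*2-1) downto (N-3), add the constant round, then use bits (N+2) downto 1) can be mimicked with integer arithmetic. The Python sketch below assumes that round equals 1 and ignores any wrap-around of the fixed-width VHDL signals; the function name and the sample values are illustrative only.

```python
def round_product(product, n_bits, round_const=1):
    """Reduce a 2N-bit product to N+2 bits in the way the dragonfly listing slices it.

    Assumptions: round_const = 1 (the value of 'round' is not given in the listing)
    and unbounded Python integers, so overflow wrap-around is not modelled.
    """
    widened = (product >> (n_bits - 3)) + round_const  # bits (2N-1 downto N-3), plus rounding constant
    return widened >> 1                                # bits (N+2 downto 1): drop the rounding bit

N = 12
in0r = 1500      # illustrative 12-bit signed input sample
cos_0j = 2047    # illustrative 12-bit signed phase-factor constant
product = in0r * cos_0j
rounded = round_product(product, N)
print(product, rounded)
```

Under these assumptions the result equals the product divided by 2^(N-2), rounded to the nearest integer, which is how a 24-bit product is brought down to the 14 bits passed on to the second rank.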
Before realization, the code is simulated using Modelsim, a VHDL simulation program. The simulation result agreed well with a MATLAB FFT of a sinusoidal signal.
Fig.13. First FFT construction vs. Matlab FFT
[Figure 13: "Radix-4 and Matlab FFT, x=sin(2*pi*4/16*t), N=64, 12 bits input, 18 bits output", plotted against n, with the curves Matlab and Modelsim.]
What we can see from the plot in figure 13 is that truncation and rounding errors generate noise in the FFT plot, although the frequency peaks are in the correct positions.
The next step after simulation is to realize the construction with the program Synplify, which synthesizes the code to gate level.
The sad conclusion at this stage was that the construction was too large for a Virtex-E 1000 circuit, and even for the next larger circuit, the Virtex-E 2000, with twice as many CLBs. Xilinx has larger circuits, such as the Virtex-E 3200, in which this construction would be implementable. This means that we have to consider another bit length or a shorter FFT length.
The first approach was to consider another bit length. By changing the design and the dragonfly blocks to 12 bits we save a lot of hardware, but we get a larger phase error.
The new dragonfly blocks (still as in figure 5) use 12-bit inputs and 12-bit precision on the phase factor. The multiplication generates 24 bits, which are rounded and truncated back to 12 bits. The same applies to the second and third dragonfly ranks, so the input is 12 bits and the output is also 12 bits.
A further consideration is that, instead of using four complex multipliers (8 real multipliers) as in the above-mentioned implementation, we can reuse the same multiplier for all of the multiplications. This is better described by the VHDL code below.
Part of the VHDL code for the Radix-4 dragonfly with shared multiplier:
begin -- process radix4
  if clk'event and clk = '1' then
    -- Start a new dragonfly computation when the previous block is done
    if done_i = '1' then
      run <= '1';
      radix <= 0;
    end if;
    if run = '1' then
      -- Each state loads the operands for the next product and
      -- stores the product of the operands loaded in the previous state
      case radix is
        when 0 =>
          signal_in0 <= in0r;
          constant_in0 <= cos_0j;
          radix <= 1;
        when 1 =>
          signal_in0 <= in0i;
          constant_in0 <= sin_0j;
          ar_temp <= signal_in0*constant_in0;
          radix <= 2;
        when 2 =>
          signal_in0 <= in1r;
          constant_in0 <= cos_1j;
          signal_in1 <= in1i;
          constant_in1 <= sin_1j;
          ai_temp <= signal_in0*constant_in0;
          radix <= 3;
        when 3 =>
          signal_in0 <= in1i;
          constant_in0 <= cos_1j;
          signal_in1 <= in1r;
          constant_in1 <= sin_1j;
          br_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
          radix <= 4;
        when 4 =>
          signal_in0 <= in2r;
          constant_in0 <= cos_2j;
          signal_in1 <= in2i;
          constant_in1 <= sin_2j;
          bi_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
          radix <= 5;
        when 5 =>
          signal_in0 <= in2i;
          constant_in0 <= cos_2j;
          signal_in1 <= in2r;
          constant_in1 <= sin_2j;
          cr_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
          radix <= 6;
        when 6 =>
          signal_in0 <= in3r;
          constant_in0 <= cos_3j;
          signal_in1 <= in3i;
          constant_in1 <= sin_3j;
          ci_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
          radix <= 7;
        when 7 =>
          signal_in0 <= in3i;
          constant_in0 <= cos_3j;
          signal_in1 <= in3r;
          constant_in1 <= sin_3j;
          dr_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
          radix <= 8;
        when 8 =>
          di_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
          radix <= 9;
        when 9 =>
          -- Dragonfly outputs from the upper N bits of each product
          ar_out <= ar_temp((2*N-1) downto(N)) + cr_temp((2*N-1) downto(N))
                    + br_temp((2*N-1) downto(N)) + dr_temp((2*N-1) downto(N));
          ai_out <= ai_temp((2*N-1) downto(N)) + ci_temp((2*N-1) downto(N))
                    + bi_temp((2*N-1) downto(N)) + di_temp((2*N-1) downto(N));
          br_out <= ar_temp((2*N-1) downto(N)) - cr_temp((2*N-1) downto(N))
                    + bi_temp((2*N-1) downto(N)) - di_temp((2*N-1) downto(N));
          bi_out <= ai_temp((2*N-1) downto(N)) - ci_temp((2*N-1) downto(N))
                    - br_temp((2*N-1) downto(N)) + dr_temp((2*N-1) downto(N));
          cr_out <= ar_temp((2*N-1) downto(N)) + cr_temp((2*N-1) downto(N))
                    - br_temp((2*N-1) downto(N)) - dr_temp((2*N-1) downto(N));
          ci_out <= ai_temp((2*N-1) downto(N)) + ci_temp((2*N-1) downto(N))
                    - bi_temp((2*N-1) downto(N)) - di_temp((2*N-1) downto(N));
          dr_out <= ar_temp((2*N-1) downto(N)) - cr_temp((2*N-1) downto(N))
                    - bi_temp((2*N-1) downto(N)) + di_temp((2*N-1) downto(N));
          di_out <= ai_temp((2*N-1) downto(N)) - ci_temp((2*N-1) downto(N))
                    + br_temp((2*N-1) downto(N)) - dr_temp((2*N-1) downto(N));
          done_o <= '1';
          radix <= 10;
        when others =>
          -- Return to idle and clear the done flag
          radix <= 0;
          run <= '0';
          done_o <= '0';
      end case;
    end if;
  end if;
end process radix4;
As we can see in this code for the first block, we use a SWITCH-CASE structure (the VHDL case statement), with the same multiplier (actually two multipliers, one for the real part and one for the imaginary part) for all the multiplications between the phase factor and the input. This slows down the construction, as more clock cycles are needed to get all the inputs through the dragonfly. The timing diagram below shows how the construction works.
Fig.14. Timing diagram for Radix-4 FFT, shared multiplier
[Figure 14: timing diagram with the signals clk, Shift_in(I/Q), Shift_in_valid, Rank 1 done, Rank 2 done, Rank 3 done, Valid_data and Shift_out(I/Q); samples 0 to 63 are shifted in and out, and the clock-cycle markers are 1, 64, 76, 86, 96 and 160.]
The grey marked area in Shift_out(I/Q) is invalid data. This construction also proved to be too large for a Virtex-E 1000 circuit. Synplify showed, however, that the construction is realizable in a Virtex-E 2000 at a clock rate of 55 MHz, which means that the computation phase alone takes 640 nanoseconds. This can be compared with the Xilinx Virtex LogiCore blocks, which need 1.92 microseconds for the same computation [13].
Another constraint of the LogiCore block is that it needs all the complex input samples in serial. It is easy to change the construction described above so that it receives all the samples on one very wide parallel bus, or simply to speed up the clock rate of the Shift_in and Shift_out registers to $clk_{Shift} = N \cdot clk_{FFT}$, as the clock rate of this construction is limited by the multipliers in the dragonflies and not by the shift registers. The complete block diagram is shown in appendix A2.
As we wanted to implement and realize a construction in the available Virtex-E 1000, as described in figure 8 above, we once again have to reconsider the FFT length and the bit length of the construction. As the above-mentioned 12-bit FFT of length 64 complex samples used almost 75% of a Virtex-E 2000 circuit, we can be quite sure that a 16-bit FFT of length 16 complex samples can be implemented in a Virtex-E 1000.
8.5 RADIX-4 FFT ALGORITHM, N = 16
A new construction with an FFT length of 16 complex samples had to be made. The construction consists of two dragonfly ranks with four dragonflies each. It also contains an input register that holds four predefined signals for the FFT, which are used instead of the quadrature-divided A/D input signal:

$x1 = \sin\left(2\pi \frac{4}{16} \times t\right)$
$x2 = \sin\left(2\pi \frac{4.5}{16} \times t\right)$
$x3 = \cos\left(2\pi \frac{4}{16} \times t\right) + j \sin\left(2\pi \frac{4}{16} \times t\right)$
$x4 = \cos\left(2\pi \frac{4.5}{16} \times t\right) + j \sin\left(2\pi \frac{4.5}{16} \times t\right)$

where t goes from 1 to 16. The signals are then converted to two's complement using a Matlab function, TWOSCOMP(no_of_bits,DATA).
Fig.15. Input signal X1 and X2
[Figure 15: four panels plotted against n: Sequence = "00", sin(2*pi*4/16*t); FFT of "00"; Sequence = "01", sin(2*pi*4.5/16*t); FFT of "01".]
Fig.16. Input signal X3 and X4
[Figure 16: four panels: Sequence = "10", cos(2*pi*4/16*t)+i*sin(2*pi*4/16*t), shown in the complex plane; FFT of "10" against n; Sequence = "11", cos(2*pi*4.5/16*t)+i*sin(2*pi*4.5/16*t), shown in the complex plane; FFT of "11" against n.]
The idea behind those four signals is to let the FFT construction process two real input signals, one within an FFT channel and one outside, and two complex input signals, again one within an FFT channel and one outside. The two signals that lie outside an FFT channel spread across all the channels, as we can see from the FFT plots above.
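The behaviour of the in-channel and out-of-channel signals can be reproduced with a few lines of Python. The sketch below generates the two complex signals x3 and x4 for t = 1 to 16 and takes a direct 16-point DFT. The helper `twos_complement` is only a hypothetical stand-in for the Matlab function TWOSCOMP, whose implementation is not shown in this report, and its scaling to full range is an assumption.

```python
import cmath
import math

def dft(x):
    """Direct DFT used in place of a library FFT."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def twos_complement(value, bits):
    """Hypothetical helper: scale a value in [-1, 1] to a signed integer of the
    given width and return its two's-complement code as an unsigned integer."""
    full_scale = 2 ** (bits - 1) - 1
    code = int(round(value * full_scale))
    return code & (2 ** bits - 1)

t = range(1, 17)
x3 = [cmath.exp(2j * math.pi * 4 / 16 * k) for k in t]    # cos + j*sin, inside channel 4
x4 = [cmath.exp(2j * math.pi * 4.5 / 16 * k) for k in t]  # cos + j*sin, between channels 4 and 5

mag3 = [abs(v) for v in dft(x3)]
mag4 = [abs(v) for v in dft(x4)]
print([round(m, 3) for m in mag3])
print([round(m, 3) for m in mag4])

# 16-bit two's-complement codes of the real part of x3 (illustrative scaling)
codes = [twos_complement(v.real, 16) for v in x3]
print(codes[:4])
```

For x3 all the energy ends up in channel 4, whereas for x4 every channel receives a contribution, with the largest values in channels 4 and 5.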
The FFT construction with bit length 16 and the four predefined signals, x1-x4, is structured as:
Fig.17. Radix-4 N = 16
[Figure 17: block diagram with the blocks "Input signal x1-x4", two "Dragonfly rank" blocks and "Shift_out register", with the output labelled "To logical analyser"; all I and Q buses are 16 bits wide. Further labels: "From Pattern generator", X1-X4, clk, reset' and on.]
The construction was synthesized in Synplify and was realizable at 50 MHz with the timing shown below.
Fig.18. Timing diagram for Radix-4 FFT length 16, 16 bits
[Figure 18: timing diagram with the signals clk, Shift_in(I/Q), Shift_in_valid, FFT done, Shift_out(I/Q), Absolute and Valid data; samples 0 to 15 are shifted in and out, Shift_out(I/Q) carries invalid data before the first valid sample, and the clock-cycle markers are 1, 16, 26 and 41.]
The input connection to the FPGA goes through HOTLINK, a high-speed serial interface, which is connected to a pattern generator, a Hewlett-Packard HP16522A (200 MHz in 32 channels), that generates the input stimuli.
The 16-bit output vectors of the real (I) and imaginary (Q) parts are captured by a logic analyser, a Hewlett-Packard HP16555D (2.0 MSamples, 110/500 MHz). As this instrument can display the output both as a listing and as a graph, an absolute value block was made and implemented after the Shift_out register:
Fig.19. Absolute value block
[Figure 19: block diagram with the blocks "From FFT rank 2", "Shift_out register", "Absolute value" and "To logical analyser", connected by 16-bit I and Q buses, and the control input "Absolute" from the pattern generator.]
As calculating the absolute value exactly is a rather complex operation, an alternative method is used. Consider the following equations:

$A = \max\{I, Q\}$
$B = \min\{I, Q\}$
$Absolutevalue = \max\left\{A, \frac{7}{8}A + \frac{1}{2}B\right\}$   (EQ 45)

This method is quite easy to implement in hardware, and its precision is +1% / -2% of variation on the output. This block is controlled by the Absolute trigger, which is connected to the pattern generator. When the Absolute trigger = 1, the absolute value is delivered on the real part output channel to the logic analyser, and the imaginary part equals zero.
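The approximation in equation 45 can be tried out with a short Python sketch. It assumes that A and B are formed from the magnitudes |I| and |Q|, which is how the formula is applied here to signed values, and it sweeps a unit-length vector through all angles; the function name and the sweep resolution are illustrative.

```python
import math

def approx_abs(i_val, q_val):
    """Absolute value approximation of EQ 45, assuming A and B use |I| and |Q|."""
    a = max(abs(i_val), abs(q_val))
    b = min(abs(i_val), abs(q_val))
    # In hardware, 7/8*A can be formed as A - A/8 and 1/2*B as a one-bit right shift.
    return max(a, 0.875 * a + 0.5 * b)

# Sweep a vector of true magnitude 1.0 around the circle in 0.1 degree steps.
errors = []
for step in range(3600):
    phi = 2 * math.pi * step / 3600
    errors.append(approx_abs(math.cos(phi), math.sin(phi)) - 1.0)

max_err, min_err = max(errors), min(errors)
print(round(100 * max_err, 2), round(100 * min_err, 2))  # relative error in percent
```

In this floating-point sweep the approximation overshoots the true magnitude by a little under 1% and undershoots it by about 3%; a fixed-point implementation may behave slightly differently because of rounding.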
The construction can be compared with the Xilinx LogiCore block [13], which also uses 16 bits of precision on the input and the phase factor. The Xilinx LogiCore block requires 16 clock cycles (at a clock rate of 120 MHz), whereas the construction described above requires 10 clock cycles (at a clock rate of 50 MHz). The difference, once again, is that the Xilinx LogiCore block requires the input data to be delivered in serial, while the block described above can accept 16 new complex 16-bit samples on every clock cycle.

$T_{\mathrm{FFT,Xilinx}} = \frac{1}{120\,\mathrm{MHz}} \times 16 = 133.33\,\mathrm{ns}$   (EQ 46)

$T_{\mathrm{FFT,EMW}} = \frac{1}{50\,\mathrm{MHz}} \times 10 \times \frac{1}{16} = 12.5\,\mathrm{ns}$   (EQ 47)

The construction requires about twice as many CLBs as the Xilinx LogiCore block (963 CLBs / 1876 CLBs). The complete block diagram is shown in appendix A3.
9 VERIFICATION AND RESULTS
9.1 TEST PATTERN
To get a file to download to the configuration PROM, a circuit-specific software called Design Manager is used. When power is then applied to the PCB, the circuit, in this case the Xilinx Virtex-E 1000, downloads the configuration file and loads the design.
A verification/test pattern was programmed in the pattern generator.
Test pattern:
1) reset = 0 (active low)
2) Synchronize HOTLINK
3) absolute = 0 (output = real + imag part)
sequence = 00 (signal x1)
on_signal = 1 (the signal is on)
4) reset = 1
5) When output signal Test_valid_data = 1
Collect fft_out_r (real part, I), and
Collect fft_out_i (imag part, Q) in Logic Analyser
(listing)
6) When Test_valid_data = 0 again
reset = 0
7) Synchronize HOTLINK
8) absolute = 1 (gives real = absolute value, imag = 0)
sequence = 00 (signal x1)
on_signal = 1
9) When Test_valid_data = 1
Collect fft_out_r and display as graph
10) When Test_valid_data = 0 again
Goto 1 but change to next signal (x2-x4)
The output was collected in the Logic Analyser and transferred to Matlab for verification.
9.2 MATLAB VERIFICATION
The files from the Logic Analyser were loaded into Matlab and compared with the result from an FFT made by Matlab itself on the same input signals (x1-x4).
When the absolute trigger from the pattern generator is set to 1, the output graph from the Logic Analyser displayed the following results.
Fig.20. Output graph signal X1, absolute = 1
Fig.21. Output graph signal X2, absolute = 1
Fig.22. Output graph signal X3, absolute = 1
Fig.23. Output graph signal X4, absolute = 1
If we compare figures 15 and 16 with figures 20 to 23, we see that the construction is working properly. The value listings from the verification with signals X1-X4 were also compared with FFT calculations in Matlab.
Fig.24. Output complex and absolute, signal 1 vs. Matlab
[Figure 24: "Signal X1, FFT FPGA complex, FFT FPGA absolut & FFT Matlab", in dB against n from 1 to 16. Error vs. Matlab: FPGA complex -311.0382 dB, FPGA absolut -311.0382 dB.]
Fig.25. Output complex and absolute, signal 2 vs. Matlab
[Figure 25: "Signal X2, FFT FPGA complex, FFT FPGA absolut & FFT Matlab", in dB against n from 1 to 16. Error vs. Matlab: FPGA complex -30.4532 dB, FPGA absolut -30.4225 dB.]
Fig.26. Output complex and absolute, signal 3 vs. Matlab
[Figure 26: "Signal X3, FFT FPGA complex, FFT FPGA absolut & FFT Matlab", in dB against n from 1 to 16. Error vs. Matlab: FPGA complex -301.3538 dB, FPGA absolut -301.3538 dB.]
Fig.27. Output complex and absolute, signal 4 vs. Matlab
[Figure 27: "Signal X4, FFT FPGA complex, FFT FPGA absolut & FFT Matlab", in dB against n from 1 to 16. Error vs. Matlab: FPGA complex -39.2248 dB, FPGA absolut -35.1 dB.]
The error value was calculated according to the following formula:

$Error = 10 \log_{10} \left( \frac{\sum \left( FFT(Matlab) - FFT(FPGA) \right)^2}{\sum \left( FFT(Matlab) \right)^2} \right)$   (EQ 48)
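Equation 48 translates directly into a few lines of Python. In this sketch the squares are taken as squared magnitudes so that complex FFT values can be used, which is an assumption about how the formula was applied; the function name and the example values are illustrative and not measured data.

```python
import math

def error_db(reference, measured):
    """Error of EQ 48 in dB: energy of the difference relative to the reference energy."""
    num = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    den = sum(abs(r) ** 2 for r in reference)
    if num == 0:
        return float("-inf")  # identical sequences: the logarithm is not defined
    return 10.0 * math.log10(num / den)

# Illustrative values only: a reference spectrum and a slightly perturbed copy.
fft_reference = [16.0, 0.0, 0.0, 0.0]
fft_hardware = [15.84, 0.16, 0.0, 0.0]
err = error_db(fft_reference, fft_hardware)
print(round(err, 2))
```

A more negative value means better agreement, so errors of around -300 dB correspond to outputs that agree with the reference to within numerical precision.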
10 CONCLUSION
• As the Radix-4 FFT algorithm uses fewer complex multipliers than the Radix-2 FFT algorithm, the Radix-4 algorithm is preferable for hardware implementation.
• A parallel approach seems to be the model of choice when a real-time system with a high sampling rate is desired.
• To reach an acceptable level of phase error, it is desirable to use 16 bits of precision on the input signal and the phase factor.
• By using a separate clock with clock rate $clk_{Shift} = N \cdot clk_{FFT}$ for the input and output shift registers, it would be possible to process an FFT on a signal of length N every clock cycle of $clk_{FFT}$.
• By using a shared multiplier in the dragonflies, fewer CLBs are used, at the cost of a longer execution time.
11 IDEAS FOR FURTHER STUDIES
As there are two FFT constructions of length N = 64, one with 12 bits of precision and one with 12, 14 and 16 bits of precision (dragonfly ranks 1, 2 and 3), both verified to be correct in Modelsim, it would be desirable to implement and verify those constructions when a circuit board with the necessary Xilinx Virtex-E circuit is available. Improvement and development of the input and output shift registers are also of interest, as this would improve the bandwidth of a real-time sampled signal when computing the FFT.
12 REFERENCES
[1] Bergland, G. D.: 'A guided tour of the fast Fourier transform', IEEE Spectrum, July 1969
[2] Bracewell, R. N.: 'The Fourier Transform and its applications', The McGraw-Hill Companies, Inc, 2000, ISBN: 0073039381
[3] Brigham, E. O.: 'The fast Fourier transform', Prentice-Hall, Inc, 1974, ISBN: 013307496X
[4] Cartwright, M.: 'Fourier Methods for mathematicians, scientists and engineers', Ellis Horwood Limited, 1990, ISBN: 0133270165
[5] Gray, R. M., Goodman, J. W.: 'Fourier Transforms, an introduction for engineers', Kluwer Academic Publishers, 1995, ISBN: 0792395859
[6] Lasser, R.: 'Introduction to Fourier Series', Marcel Dekker, Inc., 1996, ISBN: 0824796101
[7] Ma, Y., Wanhammar, L.: 'A Hardware Efficient Control of Memory Addressing for High-Performance FFT Processors', IEEE Transactions on Signal Processing, Vol. 48, No. 3, March 2000
[8] Proakis, J. G.: 'Digital Signal Processing, Principles, algorithms and applications', Prentice Hall, Inc., 1996, ISBN: 0133942899
[9] Roche, C.: 'A Split-Radix Partial Input/Output Fast Fourier Transform Algorithm',
[10] Translogic: 'Ease and Eale User's Manual', Translogic BV, Ede, The Netherlands, 1998, http://www.translogiciccorp.com, (Acc 20010205)
[11] Van Loan, C.: 'Computational Frameworks for the Fast Fourier Transform', SIAM, 1992, ISBN: 0898712858
[12] Vretblad, A.: 'An introduction to Fourier Analysis and some of its applications', Department of mathematics, Uppsala, Sweden, 1996, ISBN: 9150611712
[13] Xilinx Inc.: 'Xilinx Virtex-E Databook', http://www.xilinx.com, 2000-2001, (Acc 20010205)
[14] Zonst, A. E.: 'Understanding the FFT', Citrus Press, 1995, ISBN: 0964568187
Appendix A1 Ease block structure of Radix-4 FFT, N = 64
Appendix A2 Ease block structure of Radix-4 FFT, N = 64, shared multiplier
Appendix A3 Ease block structure of Radix-4 FFT, N = 16
Appendix B Matlab code
Appendix C Output listing
Preface
This thesis is part of my education towards a Master's degree in Computer and Information Engineering at Griffith University, Brisbane, Australia (Projects 1, 2 and 3: MEE7097, MEE7098 and MEE7099). The work was done at Ericsson Microwave Systems AB in Mölndal, Sweden, at the department FX/D.
I would like to thank the following people, who have been of great help to me during my work:
• My supervisor Rune Olsson, EMW.
• My manager Håkan Olsson/Anders Wanner, EMW.
• Prof. Kuldip K. Paliwal, supervisor GU.
• Daniel Wallström, EMW, for help with VHDL.
• Dennis Eriksson, EMW, for help with the logic analyser/pattern generator.
• Nils Dagås and Gabriel Gitye, EMW, for help with Matlab.
I would also like to thank the remaining staff at EMW/FX and GU who have been helpful to me.
Contents
1 Introduction
1.1 Background
1.2 Task
1.3 Technical function
2 Jean-Baptiste-Joseph Fourier
3 The Fourier Transform
4 The Discrete Fourier Transform
5 Development of the Fast Fourier Transform
5.1 Theory of the Fast Fourier Transform
5.2 History of the Fast Fourier Transform
6 The Radix-2 Algorithm
Fig. 1. FFT Butterfly
Fig. 2. Radix-2 DFT structure
Fig. 3. Radix-2 vs. direct calculation in flops
Fig. 4. Radix-2 algorithm compared with the MATLAB function FFT
7 The Radix-4 Algorithm
Fig. 5. Radix-4 Butterfly, also referred to as Dragonfly
Fig. 6. Radix-4 FFT algorithm compared with Matlab FFT
8 Implementation and Realization in hardware
8.1 FPGA
Fig. 7. CLB, Configurable logic block. Courtesy of Xilinx Inc.
8.2 Complex FFT
Fig. 8. Construction configuration
8.3 Bit length
Fig. 9. Radix-4 FFT, 12-bit length of samples
Fig. 10. Radix-4 FFT, 14-bit length of samples
Fig. 11. Radix-4 FFT, 16-bit length of samples
8.4 Radix-4 FFT algorithm, N = 64
Fig. 12. Radix-4 FFT, N = 64
Fig. 13. First FFT construction vs. Matlab FFT
Fig. 14. Timing diagram for Radix-4 FFT, shared multiplier
8.5 Radix-4 FFT algorithm, N = 16
Fig. 15. Input signal X1 and X2
Fig. 16. Input signal X3 and X4
Fig. 17. Radix-4, N = 16
Fig. 18. Timing diagram for Radix-4 FFT length 16, 16 bits
Fig. 19. Absolute value block
9 Verification and Results
9.1 Test pattern
9.2 Matlab verification
Fig. 20. Output graph signal X1, absolute = 1
Fig. 21. Output graph signal X2, absolute = 1
Fig. 22. Output graph signal X3, absolute = 1
Fig. 23. Output graph signal X4, absolute = 1
Fig. 24. Output complex and absolute, signal 1 vs. Matlab
Fig. 25. Output complex and absolute, signal 2 vs. Matlab
Fig. 26. Output complex and absolute, signal 3 vs. Matlab
Fig. 27. Output complex and absolute, signal 4 vs. Matlab
10 Conclusion
11 Ideas for further studies
12 References
Appendix A1 Ease block structure of Radix-4 FFT, N = 64
Appendix A2 Ease block structure of Radix-4 FFT, N = 64, shared multiplier
Appendix A3 Ease block structure of Radix-4 FFT, N = 16
Appendix B Matlab code
Appendix C Output listing
1 INTRODUCTION
1.1 BACKGROUND
Implementing the DFT (FFT) in hardware for a real-time system has required expensive solutions, often based on an ASIC (Application Specific Integrated Circuit). With the latest generation of FPGAs (Field Programmable Gate Arrays) it is possible to implement very large amounts of logic in a single integrated circuit. The FPGA manufacturer Xilinx now has a drop-in module for its FPGAs that can execute a 1024-point FFT. It is therefore interesting to evaluate and develop such a DFT or similar.
1.2 TASK
To study, implement and evaluate the DFT (Discrete Fourier Transform) in an FPGA or similar.
1.3 TECHNICAL FUNCTION
The DFT shall collect data, execute a DFT or IDFT and output the data. The implementation shall be optimized for execution time, size (area) and cost.
2 JEAN-BAPTISTE-JOSEPH FOURIER
Jean-Baptiste-Joseph Fourier was born on the 21st of March 1768, in poor circumstances, in the small town of Auxerre, France. He introduced the idea that an arbitrary function, even a function defined by different analytic expressions in adjacent segments of its range (such as a staircase waveform), could nevertheless be represented by a single analytic expression. Fourier's ideas encountered resistance at the time but have proven to be central to many of the later developments in mathematics, science and engineering. As we all know, they are at the heart of the electrical curriculum today.
Fourier came across the idea in connection with the problem of the flow of heat in solid bodies, including the heat from the earth. We have learned that Fourier was obsessed with heat, keeping his room really hot, uncomfortably hot for visitors, even while wearing a heavy coat himself. Some have traced this obsession back to Egypt, where he went in 1798 with Napoleon on an expedition to civilize the country. During this time Fourier worked on his theories in parallel with his official duties as secretary of the Institut d'Egypte. While in Egypt, Fourier came in contact with the English physicist Thomas Young (1773-1829), father of linearity, with whom he discussed his ideas and worked on, among other things, the Rosetta Stone.
After returning to Paris, Fourier had by 1807, despite official duties, completed his theory of heat conduction, which depended on the essential idea of analysing the temperature distribution into spatially sinusoidal components. He was heavily criticized for his theory by the French scientists, among them Biot and Poisson. Even so, he received a mathematics prize in 1811 for his heat theory. The publication of his report "Théorie analytique de la chaleur" (The analytical theory of heat) in 1822 was also met with some criticism, and this might be seen as an indication of the deep uneasiness about Fourier analysis that was felt by the great mathematicians of that day.
Jean-Baptiste-Joseph Fourier died in Paris on the 16th of May 1830. He never married.
3 THE FOURIER TRANSFORM
One of the principal analysis tools in many of today's scientific challenges is the Fourier Transform. Perhaps the best known application of this mathematical technique is the analysis of linear time-invariant systems. Although this may be the best known application, the Fourier Transform is essentially a universal problem solving technique. Its importance is based on the fundamental property that one can examine a particular relationship from an entirely different viewpoint. Simultaneous visualization of a function and its Fourier Transform is often the key to successful problem solving.
We define a transient signal as a signal for which:
$$\lim_{t \to \pm\infty} y(t) = 0 \qquad \text{(EQ 1)}$$
Figure: Transient signal (amplitude against n)
The spectrum of a transient signal is characterised by the fact that it is continuous. This means that it holds an infinite number of frequency components, although they usually lie in a finite interval.
Mathematically, a signal that varies periodically with time can be defined as a sum of discrete frequency components, where a simple relationship exists between the frequency components. This can be written as:
$$y(t) = \frac{A_0}{2} + \sum_{n=1}^{\infty} \left\{ A_n \cos(n\omega_0 t) + B_n \sin(n\omega_0 t) \right\} \qquad \text{(EQ 2)}$$
where
$$\omega_0 = \frac{2\pi}{T}, \qquad T = \text{period}$$
If we look back at our transient signal above, the mathematical consequence is that the coefficients become continuous functions of the angular frequency $\omega$. Equation 2 then becomes:
$$y(t) = \int_0^{\infty} \left[ A(\omega)\cos(\omega t) + B(\omega)\sin(\omega t) \right] d\omega \qquad \text{(EQ 3)}$$
If we compare this equation with equation 2 we see that the constant $A_0/2$ has disappeared. The reason is that $A_0/2$ represents the time mean value of the signal, and since the signal is a transient, its time mean value is zero. The Fourier coefficients $A(\omega)$ and $B(\omega)$ are defined by the Fourier integrals:
$$A(\omega) = 2\int_{-\infty}^{\infty} y(t)\cos(\omega t)\, dt \qquad \text{(EQ 4)}$$
$$B(\omega) = 2\int_{-\infty}^{\infty} y(t)\sin(\omega t)\, dt \qquad \text{(EQ 5)}$$
where $\omega > 0$.
When one wants to calculate the Fourier coefficients in the general case for the signal y(t), the calculations are made easier by introducing complex notation. The starting point for the complex notation of the Fourier Transform is Euler's formulas, which relate the imaginary unit j to the trigonometric functions sine and cosine:
$$\cos\alpha = \frac{e^{j\alpha} + e^{-j\alpha}}{2} \qquad \text{(EQ 6)}$$
$$\sin\alpha = \frac{e^{j\alpha} - e^{-j\alpha}}{2j} \qquad \text{(EQ 7)}$$
Equations 3, 6 and 7 give us:
$$y(t) = \int_0^{\infty} \left[ A(\omega)\,\frac{e^{j\omega t} + e^{-j\omega t}}{2} + B(\omega)\,\frac{e^{j\omega t} - e^{-j\omega t}}{2j} \right] d\omega$$
$$y(t) = \int_0^{\infty} \left[ A\,\frac{e^{j\omega t} + e^{-j\omega t}}{2} - jB\,\frac{e^{j\omega t} - e^{-j\omega t}}{2} \right] d\omega$$
$$y(t) = \int_0^{\infty} \left[ \frac{1}{2}(A - jB)\,e^{j\omega t} + \frac{1}{2}(A + jB)\,e^{-j\omega t} \right] d\omega \qquad \text{(EQ 8)}$$
If we define $Y(\omega)$ as:
$$Y(\omega) = \frac{1}{2}\left( A(\omega) - jB(\omega) \right) \qquad \text{(EQ 9a)}$$
$$Y(-\omega) = \frac{1}{2}\left( A(\omega) + jB(\omega) \right) \qquad \text{(EQ 9b)}$$
then equation 8 can be simplified by extending the integration over the whole real axis:
$$y(t) = \int_{-\infty}^{\infty} Y(\omega)\,e^{j\omega t}\, d\omega \qquad \text{(EQ 10)}$$
Using equations 4 and 5, this gives us:
$$Y(\omega) = \int_{-\infty}^{\infty} y(t)\cos(\omega t)\, dt - j\int_{-\infty}^{\infty} y(t)\sin(\omega t)\, dt$$
$$Y(\omega) = \int_{-\infty}^{\infty} y(t)\left( \cos(\omega t) - j\sin(\omega t) \right) dt$$
$$Y(\omega) = \int_{-\infty}^{\infty} y(t)\,e^{-j\omega t}\, dt \qquad \text{(EQ 11)}$$
We then define $Y(\omega)$ as the Fourier Transform of y(t), and equation 10 as the Inverse Fourier Transform.
4 THE DISCRETE FOURIER TRANSFORM
When sampling an arbitrary analog signal, the sampled signal can be expressed as:
$$y(t) = y(0)\,\delta(t) + y(T_S)\,\delta(t - T_S) + y(2T_S)\,\delta(t - 2T_S) + \dots + y((N-1)T_S)\,\delta(t - (N-1)T_S) \qquad \text{(EQ 12)}$$
where, according to the Nyquist theorem,
$$\frac{1}{T_S} \geq 2 f_{max}$$
The function described above is a sum of time-delayed delta functions, each with the height $y(nT_S)$. The Fourier Transform of each of these functions equals the Fourier Transform of the undelayed function, i.e. $F\{y(nT_S)\,\delta(t)\} = y(nT_S)\,F\{\delta(t)\}$, multiplied by the respective time delay factor:
$$Y(\omega) = y(0) + y(T_S)\,e^{-j\omega T_S} + \dots + y((N-1)T_S)\,e^{-j\omega (N-1) T_S}$$
$$Y(\omega) = \sum_{n=0}^{N-1} y(nT_S)\,e^{-j\omega n T_S} \qquad \text{(EQ 13)}$$
Since $f = \omega / 2\pi$ is a discrete variable when we deal with a sampled signal, it only takes the discrete values:
$$f = 0,\ \frac{1}{NT_S},\ \frac{2}{NT_S},\ \dots,\ \frac{N-1}{NT_S} = \frac{k}{NT_S} \qquad \text{(EQ 14)}$$
where k = 0, 1, 2, ..., N-1.
Equation 13 becomes:
$$Y\!\left(\frac{2\pi k}{NT_S}\right) = \sum_{n=0}^{N-1} y(nT_S)\,e^{-jn\frac{2\pi k}{NT_S}T_S} = \sum_{n=0}^{N-1} y(nT_S)\,e^{-j\frac{2\pi}{N}nk} \qquad \text{(EQ 15)}$$
Once again k = 0, 1, 2, ..., N-1.
If we introduce the abbreviation:
$$W_N = e^{-j\frac{2\pi}{N}} \qquad \text{(EQ 16)}$$
we get the final expression for the Discrete Fourier Transform:
$$Y(k) = \sum_{n=0}^{N-1} y(n)\,W_N^{nk}, \qquad k = 0, 1, 2, \dots, N-1 \qquad \text{(EQ 17)}$$
The factor $W_N$ is called the Twiddle Factor.
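As a minimal sketch of EQ 16 and EQ 17, the sum can be evaluated directly in a few lines of Python. The function names `twiddle` and `dft` and the cosine test input are assumptions made for illustration; they do not come from the report.

```python
import cmath
import math

def twiddle(N):
    """Twiddle factor W_N = exp(-j*2*pi/N), as in EQ 16."""
    return cmath.exp(-2j * cmath.pi / N)

def dft(y):
    """Direct evaluation of EQ 17: Y(k) = sum over n of y(n) * W_N^(n*k)."""
    N = len(y)
    W = twiddle(N)
    return [sum(y[n] * W ** (n * k) for n in range(N)) for k in range(N)]

# Illustrative input (an assumption): one cosine period over N = 8 samples.
N = 8
y = [math.cos(2 * math.pi * n / N) for n in range(N)]
Y = dft(y)
print([round(abs(v), 6) for v in Y])  # energy is expected in bins k = 1 and k = N - 1
```

Each of the N outputs needs N products, which is the N² cost that the following chapter sets out to reduce.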
5 DEVELOPMENT OF THE FAST FOURIER TRANSFORM
5.1 THEORY OF THE FAST FOURIER TRANSFORM
Consider equation 17:
$$Y(k) = \sum_{n=0}^{N-1} y(n)\,W_N^{nk}, \qquad k = 0, 1, 2, \dots, N-1$$
and the number of additions and multiplications needed to compute it. For instance, let us consider the case when N = 4:
$$Y(0) = y_0 W^0 + y_1 W^0 + y_2 W^0 + y_3 W^0$$
$$Y(1) = y_0 W^0 + y_1 W^1 + y_2 W^2 + y_3 W^3$$
$$Y(2) = y_0 W^0 + y_1 W^2 + y_2 W^4 + y_3 W^6$$
$$Y(3) = y_0 W^0 + y_1 W^3 + y_2 W^6 + y_3 W^9$$
Or, in compact form:
$$Y(n) = W^{nk}\, y(k) \qquad \text{(EQ 18)}$$
In the worst case both the twiddle factor and y(k) are complex numbers. This gives us N² complex multiplications and N(N-1) complex additions.
Suppose that we have a microprocessor that can do an addition or a multiplication in 1 microsecond, and that this processor should compute a DFT on a set of 1024 samples. With N² complex multiplications and N(N-1) complex additions, i.e. approximately 2N² additions and multiplications, we get:
2 x 1024² x 1 microsecond = 2.1 seconds
This does not take into consideration that the processor also has to update pointers and so on. If we want the analysis to be made in real time, the time needed to collect the N samples must exceed 2.1 seconds, so the distance between two samples must exceed:
$$\frac{2.1\ \text{s}}{1024} = 2.05\ \text{ms}$$
which gives us:
$$\frac{1}{2.05\ \text{ms}} = 488\ \text{Hz}$$
as the maximum sampling frequency. According to the Nyquist theorem, we cannot sample a signal that holds a frequency component exceeding half the maximum sampling frequency, i.e. 488 Hz / 2 = 244 Hz. There are two obvious ways to increase the bandwidth: a faster processor or an optimized algorithm.
5.2 HISTORY OF THE FAST FOURIER TRANSFORM
In the beginning of the 1960s, during a meeting of the President's Scientific Advisory Committee, Richard L. Garwin found out that John W. Tukey was writing about the Fourier Transform. Garwin was, in his own research, in desperate need of a fast way to compute the Fourier Transform. When questioned, Tukey outlined to Garwin essentially what has led to the famous Cooley-Tukey algorithm. To have the technique programmed, Garwin went to IBM Research in Yorktown Heights, where he met James W. Cooley, who quickly worked out a computer program for the algorithm. After a while, requests for copies and a write-up began accumulating, and Cooley was asked to write a paper on the algorithm. In 1965 this became the famous paper "An algorithm for the machine calculation of complex Fourier series", which he published together with Tukey. When the paper was published, reports of other people using the same technique became known, but the original idea is usually ascribed to Runge and König.
The Cooley-Tukey algorithm is also called the Radix-2 algorithm, due to the way it splits the signal in two.
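The real-time estimate at the end of section 5.1 can be rechecked numerically. The sketch below only repeats that arithmetic; the function name `realtime_budget` is an assumption made for illustration, and the 1 microsecond operation time is the value assumed in the text.

```python
def realtime_budget(n_samples, op_time_s):
    """Direct DFT cost from section 5.1: roughly 2*N^2 operations per transform."""
    transform_time = 2 * n_samples ** 2 * op_time_s  # seconds per N-point DFT
    sample_period = transform_time / n_samples       # minimum spacing between samples
    fs_max = 1.0 / sample_period                     # maximum sampling frequency
    return transform_time, sample_period, fs_max, fs_max / 2.0

t_dft, t_s, fs, f_max = realtime_budget(1024, 1e-6)
print(f"{t_dft:.2f} s per DFT, {t_s * 1e3:.2f} ms per sample")
print(f"fs = {fs:.0f} Hz, highest signal frequency = {f_max:.0f} Hz")
```

Without the intermediate rounding to 2.1 s, the sample spacing comes out at about 2.05 ms and the limits at about 488 Hz and 244 Hz, in line with the figures quoted in section 5.1.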
6 THE RADIX-2 ALGORITHM
Once again consider equation 17:
$$Y(k) = \sum_{n=0}^{N-1} y(n)\,W_N^{nk}, \qquad k = 0, 1, 2, \dots, N-1$$
where we want to analyse the samples:
$$\{y(n)\} = \{y(0), y(1), y(2), \dots, y(N-1)\}$$
Consider the possibility of splitting the samples into even and odd samples:
$$\{y(2n)\} = \{y(0), y(2), y(4), \dots, y(N-2)\}$$
$$\{y(2n+1)\} = \{y(1), y(3), y(5), \dots, y(N-1)\}$$
Writing the DFT in terms of these two sequences gives:
$$Y(k) = \sum_{n=0}^{N/2-1} y(2n)\,W_N^{2nk} + \sum_{n=0}^{N/2-1} y(2n+1)\,W_N^{(2n+1)k} \qquad \text{(EQ 19)}$$
By extracting the common twiddle factor from the second sum we can simplify further:
$$Y(k) = \sum_{n=0}^{N/2-1} y(2n)\,W_N^{2nk} + W_N^{k}\sum_{n=0}^{N/2-1} y(2n+1)\,W_N^{2nk} \qquad \text{(EQ 20)}$$
Since
$$W_N^2 = \left(e^{-j\frac{2\pi}{N}}\right)^2 = e^{-j\frac{2\pi}{N}\cdot 2} = e^{-j\frac{2\pi}{N/2}} = W_{N/2} \qquad \text{(EQ 21)}$$
equation 20 can be written as:
$$Y(k) = \sum_{n=0}^{N/2-1} y(2n)\,W_{N/2}^{nk} + W_N^{k}\sum_{n=0}^{N/2-1} y(2n+1)\,W_{N/2}^{nk} \qquad \text{(EQ 22)}$$
By comparing this equation with equation 17 we find that these are, by definition, two DFTs of length N/2. An ordinary DFT requires N² multiplications, whereas the two DFTs of length N/2 require:
$$\left(\frac{N}{2}\right)^2 + \left(\frac{N}{2}\right)^2 = \frac{N^2}{4} + \frac{N^2}{4} = \frac{N^2}{2}$$
This number must be adjusted slightly, since the twiddle factor has to be multiplied with the odd sum, but this only adds a term of first order in N.
$$Y(k) = D(k) + W_N^{k} E(k), \qquad k = 0, 1, 2, \dots, N-1 \qquad \text{(EQ 23)}$$
where D and E represent the sums from equation 22. If we study equation 23, we find that k goes from 0 to N-1 but that D and E represent DFTs of length N/2. In general, a DFT of length N is periodic in k with period N. This means that D and E in equation 23 are periodic with period N/2.
$$D\!\left(k + \frac{N}{2}\right) = D(k) \qquad \text{(EQ 24)}$$
$$E\!\left(k + \frac{N}{2}\right) = E(k) \qquad \text{(EQ 25)}$$
Calculating the DFT:
$$Y(0) = D(0) + W_N^{0}\,E(0)$$
$$Y(1) = D(1) + W_N^{1}\,E(1)$$
$$Y(2) = D(2) + W_N^{2}\,E(2)$$
$$\vdots$$
$$Y\!\left(\frac{N}{2}-1\right) = D\!\left(\frac{N}{2}-1\right) + W_N^{N/2-1}\,E\!\left(\frac{N}{2}-1\right)$$
$$Y\!\left(\frac{N}{2}\right) = D\!\left(\frac{N}{2}\right) + W_N^{N/2}\,E\!\left(\frac{N}{2}\right) = D(0) + W_N^{N/2}\,E(0)$$
$$Y\!\left(\frac{N}{2}+1\right) = D\!\left(\frac{N}{2}+1\right) + W_N^{N/2+1}\,E\!\left(\frac{N}{2}+1\right) = D(1) + W_N^{N/2+1}\,E(1)$$
$$\vdots$$
$$Y(N-1) = D(N-1) + W_N^{N-1}\,E(N-1) = D\!\left(\frac{N}{2}-1\right) + W_N^{N-1}\,E\!\left(\frac{N}{2}-1\right) \qquad \text{(EQ 26)}$$
By symmetry, the twiddle factor can be expressed as:
$$W_N^{k + N/2} = -W_N^{k} \qquad \text{(EQ 27)}$$
Which gives us:
$$Y(0) = D(0) + W_N^{0}\,E(0)$$
$$Y(1) = D(1) + W_N^{1}\,E(1)$$
$$\vdots$$
$$Y\!\left(\frac{N}{2}-1\right) = D\!\left(\frac{N}{2}-1\right) + W_N^{N/2-1}\,E\!\left(\frac{N}{2}-1\right)$$
$$Y\!\left(\frac{N}{2}\right) = D(0) - W_N^{0}\,E(0)$$
$$Y\!\left(\frac{N}{2}+1\right) = D(1) - W_N^{1}\,E(1)$$
$$\vdots$$
$$Y(N-1) = D\!\left(\frac{N}{2}-1\right) - W_N^{N/2-1}\,E\!\left(\frac{N}{2}-1\right) \qquad \text{(EQ 28)}$$
By looking at equation 28 we find one elementary building block, the so-called FFT Butterfly.
Fig. 1. FFT Butterfly (inputs D(k) and E(k), twiddle factor $W_N^{k}$, outputs Y(k) and Y(k+N/2))
Which gives the equations:
$$Y(k) = D(k) + W_N^{k}\,E(k) \qquad \text{(EQ 29)}$$
$$Y\!\left(k + \frac{N}{2}\right) = D(k) - W_N^{k}\,E(k) \qquad \text{(EQ 30)}$$
for k = 0, 1, ..., N/2-1. Since dividing the sequences into smaller building blocks reduces the number of multiplications, we continue to divide the sequences into new blocks. If we start with equation 22 and divide each sum into two new sums, four in total:
$$Y(k) = \sum_{n=0}^{N/4-1} y(4n)\,W_{N/2}^{2nk} + \sum_{n=0}^{N/4-1} y(4n+2)\,W_{N/2}^{(2n+1)k} + W_N^{k}\left[ \sum_{n=0}^{N/4-1} y(4n+1)\,W_{N/2}^{2nk} + \sum_{n=0}^{N/4-1} y(4n+3)\,W_{N/2}^{(2n+1)k} \right]$$
$$= \sum_{n=0}^{N/4-1} y(4n)\,W_{N/2}^{2nk} + W_{N/2}^{k}\sum_{n=0}^{N/4-1} y(4n+2)\,W_{N/2}^{2nk} + W_N^{k}\left[ \sum_{n=0}^{N/4-1} y(4n+1)\,W_{N/2}^{2nk} + W_{N/2}^{k}\sum_{n=0}^{N/4-1} y(4n+3)\,W_{N/2}^{2nk} \right] \qquad \text{(EQ 31)}$$
And since
$$W_{N/2}^{2} = W_{N/4}$$
we get:
$$Y(k) = \sum_{n=0}^{N/4-1} y(4n)\,W_{N/4}^{nk} + W_{N/2}^{k}\sum_{n=0}^{N/4-1} y(4n+2)\,W_{N/4}^{nk} + W_N^{k}\left[ \sum_{n=0}^{N/4-1} y(4n+1)\,W_{N/4}^{nk} + W_{N/2}^{k}\sum_{n=0}^{N/4-1} y(4n+3)\,W_{N/4}^{nk} \right] \qquad \text{(EQ 32)}$$
And by using equation 21 backwards,
$$W_{N/2}^{k} = W_N^{2k}$$
we get:
$$Y(k) = \sum_{n=0}^{N/4-1} y(4n)\,W_{N/4}^{nk} + W_N^{2k}\sum_{n=0}^{N/4-1} y(4n+2)\,W_{N/4}^{nk} + W_N^{k}\left[ \sum_{n=0}^{N/4-1} y(4n+1)\,W_{N/4}^{nk} + W_N^{2k}\sum_{n=0}^{N/4-1} y(4n+3)\,W_{N/4}^{nk} \right] \qquad \text{(EQ 33)}$$
If we continue to divide into smaller sums until we only have N/2 two-point DFTs, we get the structure described below. The constraint is that the length N must be a power of two. The example below shows the structure for N = 8.
Fig. 2. Radix-2 DFT structure (N = 8: inputs x(0) to x(7), butterfly stages 1 to 3, outputs X(0) to X(7))
When computing a DFT using a Radix-2 algorithm for the case N = 2^x, the decimation into smaller sums can be done x = log2(N) times. This gives a total of (N/2)log2(N) complex multiplications and N log2(N) complex additions. The gain compared with a direct calculation is enormous, as shown in the figure below:
Fig. 3. Radix-2 vs. direct calculation in flops (number of flops against N, for N up to 1000)
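The butterfly relations EQ 29 and EQ 30 and the multiplication count above can be checked together with a short recursive sketch. This is an illustration in Python, not the in-place Matlab routine of the report; the names `fft_radix2`, `dft_direct` and `counter` are assumptions, and the test signal reuses the two sine frequencies of the Matlab example.

```python
import cmath
import math

def fft_radix2(y, counter):
    """Recursive decimation-in-time FFT built from EQ 29 and EQ 30.

    counter[0] accumulates the number of twiddle-factor multiplications.
    len(y) must be a power of two.
    """
    N = len(y)
    if N == 1:
        return list(y)
    D = fft_radix2(y[0::2], counter)  # N/2-point DFT of the even samples
    E = fft_radix2(y[1::2], counter)  # N/2-point DFT of the odd samples
    Y = [0j] * N
    for k in range(N // 2):
        t = cmath.exp(-2j * cmath.pi * k / N) * E[k]  # W_N^k * E(k)
        counter[0] += 1
        Y[k] = D[k] + t            # EQ 29
        Y[k + N // 2] = D[k] - t   # EQ 30
    return Y

def dft_direct(y):
    """Direct evaluation of EQ 17, used only as a reference."""
    N = len(y)
    return [sum(y[n] * cmath.exp(-2j * cmath.pi * n * k / N) for n in range(N))
            for k in range(N)]

N = 64
y = [math.sin(2 * math.pi * 0.35 * n) + math.sin(2 * math.pi * 0.25 * n)
     for n in range(N)]
count = [0]
Y = fft_radix2(y, count)
err = max(abs(u - v) for u, v in zip(Y, dft_direct(y)))
print(count[0], (N // 2) * int(math.log2(N)), N * N, err < 1e-9)
```

The counter includes the trivial multiplications by W^0, which is why it lands exactly on (N/2)log2(N); a hardware design would normally skip those.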
Part of the Radix-2 Matlab algorithm:
% Test signal
t = 1:1:1024;
x = sin(2*pi*0.35*t)+sin(2*pi*0.25*t);
N = length(x);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate Twiddle factor W_N^(n-1), with W_N = exp(-j*2*pi/N) (EQ 16)
for n = 1:N/2
    W(n) = exp(-j*2*pi*(n-1)/N);
    W_r(n) = cos(2*pi*(n-1)/N);     % real part of W(n)
    W_i(n) = -sin(2*pi*(n-1)/N);    % imaginary part of W(n)
end
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Reorder the input in bit-reversed order
b = bin2dec(fliplr(dec2bin(0:1:length(x)-1)))+1;
MC = x(b);
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Initialize variables.
alfa = N/2;     % number of butterfly groups in the current stage
beta = 1;       % number of butterflies per group
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% Calculate FFT using in-place non-recursive DIT FFT, radix-2
for h = 1:log2(N)
    b = 2^(h-1);    % distance between the two butterfly inputs
    a = 1;          % index of the upper butterfly input
    aO = a;         % start index of the current group
    for d = 1:alfa
        c = 1;      % twiddle factor index
        for e = 1:beta
            temp1 = W(c)*MC(a+b);
            temp2 = MC(a);
            MC(a) = MC(a) + temp1;
            MC(a+b) = temp2 - temp1;
            a = a + 1;
            c = c + alfa;
        end
        a = aO + 2^(h);
        aO = a;
    end
    alfa = alfa/2;
    beta = beta*2;
end
Fig. 4. Radix-2 algorithm compared with the MATLAB function FFT (two panels, "Radix-2 FFT algorithm" and "MATLAB FFT algorithm", each plotted against N for y(x) = sin(2*pi*0.35*t)+sin(2*pi*0.25*t))
7 THE RADIX-4 ALGORITHM
By developing the Radix-2 algorithm further and using base 4 instead, we get a more complex algorithm that requires less computation. This introduces a new constraint: the number of data points N in the DFT has to be a power of 4 (i.e. N = 4^x). Proceeding in the same way as with the Radix-2 algorithm, we divide the data sequence into four subsequences:
$$\{y(4n)\} = \{y(0), y(4), y(8), \dots, y(N-4)\}$$
$$\{y(4n+1)\} = \{y(1), y(5), y(9), \dots, y(N-3)\}$$
$$\{y(4n+2)\} = \{y(2), y(6), y(10), \dots, y(N-2)\}$$
$$\{y(4n+3)\} = \{y(3), y(7), y(11), \dots, y(N-1)\}$$
Using the approach described in [8], we apply:
$$X(p, q) = \sum_{l=0}^{3} \left[ W_N^{lq}\,F(l, q) \right] W_4^{lp}, \qquad p = 0, 1, 2, 3 \qquad \text{(EQ 34)}$$
where F(l, q) is given by:
$$F(l, q) = \sum_{m=0}^{N/4-1} x(l, m)\,W_{N/4}^{mq}, \qquad l = 0, 1, 2, 3, \quad q = 0, 1, 2, \dots, \frac{N}{4}-1 \qquad \text{(EQ 35)}$$
and where:
$$x(l, m) = x(4m + l), \qquad X(p, q) = X\!\left(\frac{N}{4}p + q\right) \qquad \text{(EQ 36)}$$
The four N/4-point DFTs obtained from equation 35 are combined according to equation 34 to yield the N-point DFT, as described in [8]:

$$\begin{pmatrix} X(0,q) \\ X(1,q) \\ X(2,q) \\ X(3,q) \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1 \\ 1 & -j & -1 & j \\ 1 & -1 & 1 & -1 \\ 1 & j & -1 & -j \end{pmatrix} \begin{pmatrix} W_N^{0} F(0,q) \\ W_N^{q} F(1,q) \\ W_N^{2q} F(2,q) \\ W_N^{3q} F(3,q) \end{pmatrix}$$ (EQ 37)

We also have to note that W_N^0 = 1, which gives us three complex multiplications and 12 complex additions per Radix-4 butterfly. As the Radix-4 algorithm consists of v steps (v = log(N)/log(4)), where each step involves N/4 butterflies, we get 3*v*N/4 = (3N/8)log2N complex multiplications and (3N/2)log2N complex additions. Compared with the computational power used by the Radix-2 algorithm in chapter 5, we find a computational gain of 25% regarding the complex multiplications, but the number of complex additions increases by 50%.

The matrix in equation 37 is better described with a Radix-4 butterfly, also referred to as a Dragonfly:

[Figure: the inputs in0, in1, in2 and in3 are multiplied by W^0, W^q, W^2q and W^3q and combined with the factors 1, -1, j and -j into the outputs A, B, C and D]
Fig. 5. Radix-4 Butterfly
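The operation counts quoted above can be checked with a few lines of Python. This is only a sketch: the function names are hypothetical, and the Radix-2 counts, (N/2)log2N complex multiplications and N*log2N complex additions, are assumed to be the usual ones referred to in chapter 5.

```python
def count_stages(n, radix):
    """Number of stages of a radix-`radix` FFT of length n (n must be a power of radix)."""
    stages = 0
    while n > 1:
        if n % radix:
            raise ValueError("length must be a power of the radix")
        n //= radix
        stages += 1
    return stages


def radix2_counts(n):
    """(complex multiplications, complex additions), assumed Radix-2 counts."""
    stages = count_stages(n, 2)
    return (n // 2) * stages, n * stages


def radix4_counts(n):
    """(complex multiplications, complex additions): 3 and 12 per dragonfly."""
    stages = count_stages(n, 4)
    dragonflies = n // 4
    return 3 * dragonflies * stages, 12 * dragonflies * stages


for n in (16, 64, 256, 1024):
    m2, a2 = radix2_counts(n)
    m4, a4 = radix4_counts(n)
    print(n, "multiplications:", m2, "->", m4, " additions:", a2, "->", a4)
```

For every power of 4 the ratio of multiplications is 3/4 and the ratio of additions is 3/2, which is where the 25% and 50% figures come from.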
As we are interested in a complex FFT, we need to derive the equations for the complex Radix-4 algorithm.

$$\begin{aligned} a &= in_0 \times 1 \\ b &= in_1 \times W^{q} \\ c &= in_2 \times W^{2q} \\ d &= in_3 \times W^{3q} \end{aligned} \quad \Rightarrow \quad \begin{aligned} A &= a + b + c + d \\ B &= a - c - j(b - d) \\ C &= a + c - (b + d) \\ D &= a - c + j(b - d) \end{aligned}$$ (EQ 38)

In complex form this gives us, starting with the easiest ones (r = real, i = imag):

$$\begin{aligned} A_r &= a_r + b_r + c_r + d_r \\ A_i &= a_i + b_i + c_i + d_i \\ C_r &= a_r - b_r + c_r - d_r \\ C_i &= a_i - b_i + c_i - d_i \end{aligned}$$ (EQ 39)

Continuing with B, and using j * j = -1, gives:

$$\begin{aligned} B &= (a_r + j a_i) - (c_r + j c_i) - j\left[(b_r + j b_i) - (d_r + j d_i)\right] \\ &= a_r + j a_i - c_r - j c_i - j b_r + b_i + j d_r - d_i \end{aligned}$$ (EQ 40)

Divided into real and imaginary part:

$$\begin{aligned} B_r &= a_r - c_r + b_i - d_i \\ B_i &= a_i - c_i - b_r + d_r \end{aligned}$$ (EQ 41)
And the last one gives:

$$\begin{aligned} D &= (a_r + j a_i) - (c_r + j c_i) + j\left[(b_r + j b_i) - (d_r + j d_i)\right] \\ &= a_r + j a_i - c_r - j c_i + j b_r - b_i - j d_r + d_i \end{aligned}$$ (EQ 42)

Divided into real and imaginary part:

$$\begin{aligned} D_r &= a_r - c_r - b_i + d_i \\ D_i &= a_i - c_i + b_r - d_r \end{aligned}$$ (EQ 43)

To get the inputs ar, ai, br, bi, cr, ci, dr and di, we have to multiply the inputs in0r, in0i and so on with the twiddle factor. This results in:

$$\begin{aligned} b_r &= in_{1r} \times \cos(x) + in_{1i} \times \sin(x) \\ b_i &= in_{1i} \times \cos(x) - in_{1r} \times \sin(x) \end{aligned}$$ (EQ 44)

where x is the dragonfly-specific value (the angle of the twiddle factor). The same form applies to all input signals.

As the goal of this project is to implement a very fast Fourier transform in a real-time programmable logic system, we want as few complex multiplications as possible, since they require a lot of logic. With this in mind, choosing the Radix-4 algorithm for implementation was obvious, as it has fewer complex multiplications than the Radix-2 algorithm.
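Equations 38 to 44 can be collected into one dragonfly that uses only real arithmetic, which is the form a hardware implementation needs. The Python sketch below is an illustration under stated assumptions: the name `dragonfly` and its argument layout are hypothetical, and each input is rotated by its own angle x as in equation 44.

```python
import cmath
import math


def dragonfly(inputs, angles):
    """Radix-4 butterfly in real arithmetic.

    inputs: four complex samples in0..in3
    angles: four twiddle angles x, one per input (0 for in0)
    """
    rotated = []
    for z, x in zip(inputs, angles):
        # EQ 44: multiplication by cos(x) - j*sin(x), written out in real terms.
        re = z.real * math.cos(x) + z.imag * math.sin(x)
        im = z.imag * math.cos(x) - z.real * math.sin(x)
        rotated.append((re, im))
    (ar, ai), (br, bi), (cr, ci), (dr, di) = rotated
    out_a = complex(ar + br + cr + dr, ai + bi + ci + di)  # EQ 39
    out_b = complex(ar - cr + bi - di, ai - ci - br + dr)  # EQ 41
    out_c = complex(ar - br + cr - dr, ai - bi + ci - di)  # EQ 39
    out_d = complex(ar - cr - bi + di, ai - ci + br - dr)  # EQ 43
    return out_a, out_b, out_c, out_d


samples = [1 + 2j, -0.5 + 0.25j, 3 - 1j, 0.75 + 4j]
# With all angles set to zero the dragonfly is a plain 4-point DFT.
four_point = dragonfly(samples, [0.0, 0.0, 0.0, 0.0])
```

With zero angles the four outputs equal the 4-point DFT of the inputs, which gives a quick consistency check of the signs in equations 39, 41 and 43.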
Part of radix-4 Matlab algorithm

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Initialize variables.
    t = 1:1:256;
    x = sin(2*pi*0.35*t)+sin(2*pi*0.38*t);
    x1 = x;
    n = length(x);
    t = log(n)/log(4);
    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
    % Radix4 Algorithm
    for q = 1:t
        L = 4^q;
        r = n/L;
        Lx = L/4;
        rx = 4*r;
        y = x;
        for j = 0:Lx-1
            for k = 0:r-1
                a = y(j*rx + k + 1);
                b = exp(-i*2*pi*j/L)*y(j*rx + r + k + 1);
                c = exp(-i*2*pi*2*j/L)*y(j*rx + 2*r + k + 1);
                d = exp(-i*2*pi*3*j/L)*y(j*rx + 3*r + k + 1);
                t0 = a + c;
                t1 = a - c;
                t2 = b + d;
                t3 = b - d;
                x(j*r + k + 1) = t0 + t2;
                x((j + Lx)*r + k + 1) = t1 - i*t3;
                x((j + 2*Lx)*r + k + 1) = t0 - t2;
                x((j + 3*Lx)*r + k + 1) = t1 + i*t3;
            end
        end
    end

[Figure: two panels, "Radix-4 FFT algorithm" and "Matlab FFT algorithm", each plotted versus n]
Fig. 6. Radix-4 FFT algorithm compared with Matlab FFT
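The loop structure of the radix-4 listing (no bit reversal, a copy of the data at each of the log4(N) passes) can be ported to Python as a sketch. The names `fft_radix4` and `dft` are hypothetical, indices are 0-based instead of Matlab's 1-based, and the forward sign convention exp(-i*2*pi*j/L) is assumed for the twiddle factors.

```python
import cmath
import math


def fft_radix4(x):
    """Radix-4 FFT following the loop structure of the Matlab listing (sketch)."""
    n = len(x)
    passes, m = 0, n
    while m > 1:
        if m % 4:
            raise ValueError("length must be a power of 4")
        m //= 4
        passes += 1
    x = [complex(v) for v in x]
    for q in range(1, passes + 1):
        L = 4 ** q
        r = n // L
        Lx = L // 4
        rx = 4 * r
        y = list(x)  # each pass reads from a copy and writes into x
        for j in range(Lx):
            w1 = cmath.exp(-2j * cmath.pi * j / L)
            w2 = cmath.exp(-2j * cmath.pi * 2 * j / L)
            w3 = cmath.exp(-2j * cmath.pi * 3 * j / L)
            for k in range(r):
                a = y[j * rx + k]
                b = w1 * y[j * rx + r + k]
                c = w2 * y[j * rx + 2 * r + k]
                d = w3 * y[j * rx + 3 * r + k]
                t0, t1, t2, t3 = a + c, a - c, b + d, b - d
                x[j * r + k] = t0 + t2
                x[(j + Lx) * r + k] = t1 - 1j * t3
                x[(j + 2 * Lx) * r + k] = t0 - t2
                x[(j + 3 * Lx) * r + k] = t1 + 1j * t3
    return x


def dft(x):
    """Direct O(N^2) DFT, used only as a reference."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * k * m / n) for m in range(n))
            for k in range(n)]


signal = [math.sin(2 * math.pi * 0.35 * t) + math.sin(2 * math.pi * 0.38 * t)
          for t in range(1, 65)]
spectrum = fft_radix4(signal)
```

The output is in natural order, so it can be compared element by element with a reference DFT, in the same way the listing is compared with the Matlab FFT.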
8 IMPLEMENTATION AND REALIZATION IN HARDWARE

Classical implementation of the FFT algorithm, with a processor or in hardware, usually requires a sequential algorithm, in some cases recursive, this due to space and memory requirements. This slows down the execution time. By utilizing modern programmable circuits, like an FPGA, a parallel approach to the realization of the FFT is available.

8.1 FPGA

The real-time FFT construction was meant to be realized in an FPGA, a field programmable gate array, constructed and manufactured by Xilinx, Inc. There is a wide range of models, from small to large circuits. Available for implementation of this project was a PCB with a Xilinx Virtex-E 1000 mounted.

The Xilinx FPGA model Virtex-E is a state-of-the-art programmable gate array for high-speed, highly complex logic constructions. The logic inside an FPGA is constructed around a building block called CLB, configurable logic block.

[Figure: configurable logic block]
Fig. 7. Configurable logic block. Courtesy of Xilinx Inc.

This circuit holds a CLB array of 64 x 96 = 6144 CLB blocks. Each of these blocks is divided into two slices, where each slice consists of two lookup tables and some storage elements. The slices are internally connected and are the basic high-speed logic in the circuit. For more information about Xilinx Virtex-E, refer to the Xilinx Virtex-E data book [13].
8.2 COMPLEX FFT

The Ericsson Microwave specification for the project was to simulate, realize and implement a complex FFT in an FPGA, a Xilinx Virtex-E 1000. The specifications for the FFT were:

FFT length
    Minimum: 16 complex samples
    Maximum: 1024 complex samples
    Typical: 64 or 256 (16)
Number of bits for the input signal
    Minimum: 10 bits
    Maximum: 16 bits
    Typical: 12

The idea was to implement the FFT as a building block in a construction, where the FFT block is placed after a quadrature-divided, A/D-converted signal, as described in the figure below:

[Figure: f(x) -> A/D -> x(n) -> I/Q -> (I, Q) -> FFT -> (I, Q); the I/Q and FFT blocks are inside the FPGA, a Xilinx Virtex-E 1000]
Fig. 8. Construction configuration
8.3 BIT LENGTH

The first thing to consider when implementing something discrete in hardware is the bit length with which you want to represent your samples. The best way to do this is to simulate different bit lengths and compare the phase error and the amplitude error factor with the constraints for your construction.

[Figure: Radix-4 FFT, Bits = 12. Panels: Radix-4 FFT of sin(2*pi*f1*p/Fs) (dB versus n), MATLAB FFT (dB versus n), Amplitude error factor Radix-4 FFT/MATLAB FFT (versus n), Phase error (versus n)]
Fig. 9. Radix-4 FFT, 12-bit length of samples

[Figure: Radix-4 FFT, Bits = 14. Same four panels as in figure 9]
Fig. 10. Radix-4 FFT, 14-bit length of samples

[Figure: Radix-4 FFT, Bits = 16. Same four panels as in figure 9]
Fig. 11. Radix-4 FFT, 16-bit length of samples
As the constraint for the real-time FFT construction was to minimize the phase and amplitude error as much as possible, but not more than that the construction could still be realized, the simulation results pointed towards 16 bits, as this result had a small phase error.

8.4 RADIX-4 FFT ALGORITHM, N = 64

The first attempt of the implementation phase was to implement a Radix-4 FFT algorithm with a length of 64 complex samples. For a Radix-4 FFT with length N = 64, there are 3 dragonfly ranks, with each rank comprising 16 dragonflies.

[Figure: I and Q, 12 bits -> Dragonfly rank -> 14 bits -> Dragonfly rank -> 16 bits -> Dragonfly rank -> 18 bits]
Fig. 12. Radix-4 FFT, N = 64

The FFT block was constructed using the software Ease and the programming language VHDL (Very high speed integrated circuit Hardware Description Language). The software Ease is a block model description language that lets you construct the algorithm as blocks, takes care of the interconnection between the blocks and then generates the VHDL code for this interconnection [10].

In the first revision of the construction, the bit length of the input samples to the first dragonfly rank was 12, this due to the precision of the quadrature block in figure 8. Those input samples were then multiplied with the phase factor for the correct block, also with a precision of 12 bits. As the complex output of the multiplication will generate 2^12 * 2^12 => 24 bits, the complex output of the multiplication is rounded off and truncated to 14 bits. This results in a 14-bit input to the second rank of dragonflies, which will, using the same model as for the first rank of dragonflies, generate a complex output with a length of 16 bits. As we also have a third rank of dragonflies, the complex output from our FFT construction will have 18 bits.
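The rounding step described above, a 24-bit product reduced to 14 bits, can be illustrated with integer arithmetic. The Python sketch below is an assumption about how such a step can be modelled and is not taken from the report's VHDL: `round_product` adds half of the last kept bit and then shifts, and `rank_widths` simply adds two bits per dragonfly rank.

```python
def round_product(a, b, in_bits=12, out_bits=14):
    """Multiply two signed in_bits-wide integers and round the 2*in_bits-wide
    product down to out_bits bits (round half up, then drop the low bits)."""
    product = a * b
    shift = 2 * in_bits - out_bits  # number of least significant bits to drop
    # Python's >> on negative integers is an arithmetic shift, which matches
    # the behaviour of a two's complement slice.
    return (product + (1 << (shift - 1))) >> shift


def rank_widths(input_bits, ranks, growth=2):
    """Word length at the input of the first rank and after every rank."""
    return [input_bits + growth * k for k in range(ranks + 1)]


print(rank_widths(12, 3))
print(round_product(2047, 2047))
```

The two-bit growth per rank reflects that each dragonfly output is a sum of four terms.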
Part of the VHDL code for one of the dragonflies in the first of the ranks is as follows:

    begin  -- process radix4
      if clk'event and clk = '1' then
        ar_temp  <= in0r*cos_0j;
        ai_temp  <= in0i*sin_0j;
        br_temp1 <= in1r*cos_1j;
        br_temp2 <= in1i*sin_1j;
        bi_temp1 <= in1i*cos_1j;
        bi_temp2 <= in1r*sin_1j;
        cr_temp1 <= in2r*cos_2j;
        cr_temp2 <= in2i*sin_2j;
        ci_temp1 <= in2i*cos_2j;
        ci_temp2 <= in2r*sin_2j;
        dr_temp1 <= in3r*cos_3j;
        dr_temp2 <= in3i*sin_3j;
        di_temp1 <= in3i*cos_3j;
        di_temp2 <= in3r*sin_3j;

        br_temp <= br_temp1 + br_temp2;
        bi_temp <= bi_temp1 - bi_temp2;
        cr_temp <= cr_temp1 + cr_temp2;
        ci_temp <= ci_temp1 - ci_temp2;
        dr_temp <= dr_temp1 + dr_temp2;
        di_temp <= di_temp1 - di_temp2;

        ar_round <= ar_temp((N*2-1) downto (N-3)) + round;
        ai_round <= ai_temp((N*2-1) downto (N-3)) + round;
        br_round <= br_temp((N*2-1) downto (N-3)) + round;
        bi_round <= bi_temp((N*2-1) downto (N-3)) + round;
        cr_round <= cr_temp((N*2-1) downto (N-3)) + round;
        ci_round <= ci_temp((N*2-1) downto (N-3)) + round;
        dr_round <= dr_temp((N*2-1) downto (N-3)) + round;
        di_round <= di_temp((N*2-1) downto (N-3)) + round;

        temp1r <= ar_round((N+2) downto 1) + cr_round((N+2) downto 1);
        temp1i <= ai_round((N+2) downto 1) + ci_round((N+2) downto 1);
        temp2r <= ar_round((N+2) downto 1) - cr_round((N+2) downto 1);
        temp2i <= ai_round((N+2) downto 1) - ci_round((N+2) downto 1);

        ar_out <= temp1r + br_round((N+2) downto 1) + dr_round((N+2) downto 1);
        ai_out <= temp1i + bi_round((N+2) downto 1) + di_round((N+2) downto 1);
        br_out <= temp2r + bi_round((N+2) downto 1) - di_round((N+2) downto 1);
        bi_out <= temp2i - br_round((N+2) downto 1) + dr_round((N+2) downto 1);
        cr_out <= temp1r - br_round((N+2) downto 1) - dr_round((N+2) downto 1);
        ci_out <= temp1i - bi_round((N+2) downto 1) - di_round((N+2) downto 1);
        dr_out <= temp2r - bi_round((N+2) downto 1) + di_round((N+2) downto 1);
        di_out <= temp2i + br_round((N+2) downto 1) - dr_round((N+2) downto 1);
      end if;
    end process radix4;
    end a0 ; -- of Block1

The VHDL code above describes exactly the dragonfly illustrated in figure 5. For this block the phase/twiddle factor is simple and can easily be realized as a right shift of the input signal, but as we get towards the last dragonfly in the construction, the phase factor constant gets more complex and has to be realized as a high-performance multiplier.

The 64 complex input signals are shifted into the FFT block using a shift register. This register is divided into a real and an imaginary part, where the (complex) input gets a new sample every clock cycle. When the shift register is full, it generates a valid signal that triggers the FFT block, which starts the FFT process. When the process is done, a new valid signal is generated and an output shift register is started. For every clock cycle, a new processed value is delivered to the output. After all 64 values are delivered, the valid signal goes low, which shows that every sample has been shifted out. The total block description is given in appendix A1.
Before realization, the code is simulated using a VHDL simulation program called Modelsim. The result from this part showed a perfect result when compared with a MATLAB FFT of a sinusoidal signal.

[Figure: Radix-4 and Matlab FFT, N=64, x=sin(2*pi*4/16*t), 12 bits input, 18 bits output; curves "Matlab" and "Modelsim" plotted versus n]
Fig. 13. First FFT construction vs. Matlab FFT

What we can see from the plot in figure 13 is that we get a truncation and rounding error that generates noise in the FFT plot, although the frequency peaks are in the correct position.

The next step after simulation is to realize the construction in the next program. This program is called Synplify and translates/synthesizes from code to gate level. The sad conclusion when reaching this level was that the construction was too large for a Virtex-E 1000 circuit and even for the next larger circuit, the Virtex-E 2000, with twice as many CLB blocks. Xilinx has larger circuits, like the Virtex-E 3200, where this construction would be implementable. This means that we have to consider another bit length or a shorter FFT length.

The first try was to consider another bit length. By changing the design and the dragonfly blocks to 12 bits we save lots of hardware, but we get a higher amount of phase error. The new dragonfly blocks (still as in figure 5) use 12 bits as input and 12 bits precision on the phase factor. The output from the multiplication generates 24 bits that get rounded off and truncated back to 12 bits. So the input will be 12 bits and the output will also be 12 bits. The same applies for the second and the third dragonfly ranks.

A new consideration is also, instead of using four complex multipliers (8 real multipliers) as in the above-mentioned implementation, to reuse the same multiplier for all of the multiplications. This is better described by looking at the VHDL code below.
Part of VHDL code for Radix-4 dragonfly with shared multiplier:

    begin  -- process radix4
      if clk'event and clk = '1' then
        if done_i = '1' then
          run    <= '1';
          done_o <= '0';
          radix  <= 0;
        end if;
        if run = '1' then
          case radix is
            when 0 => signal_in0 <= in0r; constant_in0 <= cos_0j;
                      radix <= 1;
            when 1 => signal_in0 <= in0i; constant_in0 <= sin_0j;
                      ar_temp <= signal_in0*constant_in0;
                      radix <= 2;
            when 2 => signal_in0 <= in1r; constant_in0 <= cos_1j;
                      signal_in1 <= in1i; constant_in1 <= sin_1j;
                      ai_temp <= signal_in0*constant_in0;
                      radix <= 3;
            when 3 => signal_in0 <= in1i; constant_in0 <= cos_1j;
                      signal_in1 <= in1r; constant_in1 <= sin_1j;
                      br_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
                      radix <= 4;
            when 4 => signal_in0 <= in2r; constant_in0 <= cos_2j;
                      signal_in1 <= in2i; constant_in1 <= sin_2j;
                      bi_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
                      radix <= 5;
            when 5 => signal_in0 <= in2i; constant_in0 <= cos_2j;
                      signal_in1 <= in2r; constant_in1 <= sin_2j;
                      cr_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
                      radix <= 6;
            when 6 => signal_in0 <= in3r; constant_in0 <= cos_3j;
                      signal_in1 <= in3i; constant_in1 <= sin_3j;
                      ci_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
                      radix <= 7;
            when 7 => signal_in0 <= in3i; constant_in0 <= cos_3j;
                      signal_in1 <= in3r; constant_in1 <= sin_3j;
                      dr_temp <= signal_in0*constant_in0 + signal_in1*constant_in1;
                      radix <= 8;
            when 8 => di_temp <= signal_in0*constant_in0 - signal_in1*constant_in1;
                      radix <= 9;
            when 9 =>
              ar_out <= ar_temp((2*N-1) downto (N)) + cr_temp((2*N-1) downto (N))
                        + br_temp((2*N-1) downto (N)) + dr_temp((2*N-1) downto (N));
              ai_out <= ai_temp((2*N-1) downto (N)) + ci_temp((2*N-1) downto (N))
                        + bi_temp((2*N-1) downto (N)) + di_temp((2*N-1) downto (N));
              br_out <= ar_temp((2*N-1) downto (N)) - cr_temp((2*N-1) downto (N))
                        + bi_temp((2*N-1) downto (N)) - di_temp((2*N-1) downto (N));
              bi_out <= ai_temp((2*N-1) downto (N)) - ci_temp((2*N-1) downto (N))
                        - br_temp((2*N-1) downto (N)) + dr_temp((2*N-1) downto (N));
              cr_out <= ar_temp((2*N-1) downto (N)) + cr_temp((2*N-1) downto (N))
                        - br_temp((2*N-1) downto (N)) - dr_temp((2*N-1) downto (N));
              ci_out <= ai_temp((2*N-1) downto (N)) + ci_temp((2*N-1) downto (N))
                        - bi_temp((2*N-1) downto (N)) - di_temp((2*N-1) downto (N));
              dr_out <= ar_temp((2*N-1) downto (N)) - cr_temp((2*N-1) downto (N))
                        - bi_temp((2*N-1) downto (N)) + di_temp((2*N-1) downto (N));
              di_out <= ai_temp((2*N-1) downto (N)) - ci_temp((2*N-1) downto (N))
                        + br_temp((2*N-1) downto (N)) - dr_temp((2*N-1) downto (N));
              done_o <= '1';
              run    <= '0';
              radix  <= 10;
            when others => radix <= 0;
          end case;
        end if;
      end if;
    end process radix4;
As we can see in this code for the first block, we utilize a SWITCH-CASE structure, using the same multiplier (actually two multipliers, one for the real part and one for the imaginary part) for all the multiplications between the phase factor and the input. This slows down the construction, as we have to use more clock cycles to get all the inputs through the dragonfly. The timing diagram below shows how the construction works.

[Figure: timing diagram with the signals clk, Shift_in(I/Q), Shift_in_valid, Rank 1 done, Rank 2 done, Rank 3 done, Valid_data and Shift_out(I/Q); clock cycles 1, 64, 76, 86, 96 and 160 are marked]
Fig. 14. Timing diagram for Radix-4 FFT, shared multiplier

The grey marked area in Shift_out(I/Q) is invalid data.

This construction also proved to be too large for a Virtex-E 1000 circuit. The code translation program Synplify showed, though, that the construction is realizable in a Virtex-E 2000 at a clock rate of 55 MHz; this means that the computation phase alone is 640 nanoseconds in duration. This can be compared with the Xilinx Virtex LogiCore blocks, which use 1.92 microseconds for the same computation [13]. Another constraint of the LogiCore block is that it has to have all the complex input samples in serial. It is easy to change the above described construction to get all the samples as a gigantic parallel bus, or just to speed up the clock rate on the Shift_in register and the Shift_out register, clkShift=N*clkFFT, as the clock rate constraint for this construction is in the multipliers in the dragonflies and not in the shift registers. The total block diagram description is displayed in appendix A2.

As we wanted to implement and realize a construction in the accessible Virtex-E 1000, described in figure 8 above, we once again have to reconsider the FFT sample length and the bit length of the construction. As the above-mentioned 12-bit FFT with length 64 complex samples utilized almost 75% of a Virtex-E 2000 circuit, we can be quite sure that a 16-bit FFT with FFT length 16 complex samples will be possible to implement in a Virtex-E 1000.
8.5 RADIX-4 FFT ALGORITHM, N = 16

A new construction with an FFT length of 16 complex samples had to be made. The construction consists of two dragonfly ranks with four dragonflies each. It also contains an input register that holds four predefined signals for the FFT, this instead of using the quadrature-divided A/D input signal:

$$\begin{aligned} x_1 &= \sin\left(2\pi \times \frac{4}{16} \times t\right) \\ x_2 &= \sin\left(2\pi \times \frac{4.5}{16} \times t\right) \\ x_3 &= \cos\left(2\pi \times \frac{4}{16} \times t\right) + j \sin\left(2\pi \times \frac{4}{16} \times t\right) \\ x_4 &= \cos\left(2\pi \times \frac{4.5}{16} \times t\right) + j \sin\left(2\pi \times \frac{4.5}{16} \times t\right) \end{aligned}$$

where t goes from 1 to 16. The signals are then converted to two's complement using a Matlab function, TWOSCOMP(no_of_bits, DATA).

[Figure: Sequence = "00", sin(2*pi*4/16*t), and FFT of "00"; Sequence = "01", sin(2*pi*4.5/16*t), and FFT of "01"; each plotted versus n]
Fig. 15. Input signal X1 and X2

[Figure: Sequence = "10", cos(2*pi*4/16*t)+i*sin(2*pi*4/16*t), and FFT of "10"; Sequence = "11", cos(2*pi*4.5/16*t)+i*sin(2*pi*4.5/16*t), and FFT of "11"]
Fig. 16. Input signal X3 and X4
The idea with those four signals is to let the FFT construction consider two real input signals, one within an FFT channel and one outside, and two complex input signals, also here one within an FFT channel and one outside. The two signals that are outside the FFT channel will spread through all the channels, as we can see from the FFT plots above.

The FFT construction with bit length 16 and the four predefined signals, x1-x4, can be defined as:

[Figure: from the Pattern generator (on, clk, reset') -> Input signal x1-x4 -> Dragonfly rank -> Dragonfly rank -> Shift_out register -> to the logic analyser; the I and Q paths are 16 bits wide throughout]
Fig. 17. Radix-4 N = 16

The construction was synthesized in the code translation program Synplify and was realizable at 50 MHz using the timing constraints shown below.

[Figure: timing diagram with the signals clk, Shift_in(I/Q), Shift_in_valid, FFT done, Shift_out(I/Q) and Absolute; valid and invalid data are marked; clock cycles 1, 16, 26 and 41 are marked]
Fig. 18. Timing diagram for Radix-4 FFT length 16
The input connection to the FPGA goes through a high-speed serial interface called HOTLINK, which is connected to a Pattern generator, a Hewlett Packard HP16522A (200 MHz in 32 channels), for generating the input stimuli. The 16-bit output vectors from the real (I) and imaginary (Q) parts are taken care of by a Logic Analyser, a Hewlett Packard HP16555D (2.0 M samples, 110/500 MHz). As this instrument has the possibility to display the output both as a listing and as a graph, an absolute value block was made and implemented after the Shift_out register:

[Figure: from FFT rank 2 -> Shift_out register -> Absolute value -> to the logic analyser; the I and Q paths are 16 bits wide; the Absolute trigger comes from the Pattern generator]
Fig. 19. Absolute value block

This block is controlled by the Absolute trigger, which is connected to the Pattern generator. When the Absolute trigger = 1, the absolute value is delivered in the real part output channel to the Logic Analyser. The imaginary part equals zero.

As the calculation of the absolute value is a quite complex procedure, an alternative method is utilized. Consider the following equations:

$$\begin{aligned} A &= \max\{|I|, |Q|\} \\ B &= \min\{|I|, |Q|\} \\ \text{Absolute value} &= \max\left\{A, \ \frac{7}{8}A + \frac{1}{2}B\right\} \end{aligned}$$ (EQ 45)

This method is quite easy to implement in hardware, and the precision of this method is +1% / -2% of variation on the output.
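The approximation in equation 45 is easy to try out numerically. The Python sketch below is an illustration only: the names `approx_abs` and `relative_error` are hypothetical, and the absolute values of I and Q are taken before the max/min, which is assumed to be the intent of the equation.

```python
import math


def approx_abs(i, q):
    """Approximate sqrt(i^2 + q^2) as max(A, 7/8*A + 1/2*B), see EQ 45."""
    a = max(abs(i), abs(q))
    b = min(abs(i), abs(q))
    # In hardware, 7/8*A can be formed as A - A/8 and 1/2*B as a right shift.
    return max(a, 7 * a / 8 + b / 2)


def relative_error(i, q):
    """Relative deviation of the approximation from the exact magnitude."""
    exact = math.hypot(i, q)
    return (approx_abs(i, q) - exact) / exact


# Sweep the angle of a unit vector to see how the error varies.
errors = [relative_error(math.cos(math.radians(d)), math.sin(math.radians(d)))
          for d in range(0, 361)]
print(min(errors), max(errors))
```

Only comparisons, shifts and additions are needed, so no multiplier or square root is required for this block.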
The construction can be compared with the Xilinx LogiCore block [13], which also uses 16 bits precision on the input and the phase factor. The Xilinx LogiCore block requires 16 clock cycles (at a clock rate of 120 MHz), whereas the one mentioned above requires 10 clock cycles (at a clock rate of 50 MHz).

$$T_{FFT,Xilinx} = \frac{1}{120\,\text{MHz}} \times 16 = 133.33\,\text{ns}$$ (EQ 46)

$$T_{FFT,EMW} = \frac{1}{50\,\text{MHz}} \times 10 \times \frac{1}{16} = 12.5\,\text{ns}$$ (EQ 47)

The construction requires twice as many CLBs as the Xilinx LogiCore block (963 CLBs / 1876 CLBs). The difference once again is that the Xilinx LogiCore block requires that the input data is delivered in serial, whereas the above described block can take care of 16 new complex 16-bit samples on every clock cycle. The total block diagram description is displayed in appendix A3.
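The two timing figures follow directly from the clock period, the number of clock cycles and, for the parallel construction, the 16 samples handled at once. A small Python sketch with a hypothetical helper `fft_time_ns` makes the arithmetic explicit:

```python
def fft_time_ns(clock_mhz, cycles, parallel_samples=1):
    """Time in nanoseconds: clock period * cycles / samples handled in parallel."""
    period_ns = 1000.0 / clock_mhz
    return period_ns * cycles / parallel_samples


t_xilinx = fft_time_ns(120, 16)    # EQ 46
t_emw = fft_time_ns(50, 10, 16)    # EQ 47
print(round(t_xilinx, 2), t_emw)
```

The factor 1/16 in equation 47 is what the `parallel_samples` argument models here; that reading of the equation is an assumption.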
9 VERIFICATION AND RESULTS

9.1 TEST PATTERN

To get a file to download to the configuration PROM, a circuit-specific software called Design Manager is utilized. When you then apply power to your PCB, the circuit, in this case the Xilinx Virtex-E 1000, will download the configuration file and load your design. A verification/test pattern was programmed in the Pattern generator.

Test pattern:
1) reset = 0 (active low)
2) Synchronize HOTLINK
3) absolute = 0 (output = real + imag part)
   sequence = 00 (signal x1)
   on_signal = 1 (the signal is on)
4) reset = 1
5) When output signal Test_valid_data = 1
   Collect fft_out_r (real part, I), and
   Collect fft_out_i (imag part, Q) in Logic Analyser (listing)
6) When Test_valid_data = 0 again
   reset = 0
7) Synchronize HOTLINK
8) absolute = 1 (gives real = absolute value, imag = 0)
   sequence = 00 (signal x1)
   on_signal = 1
9) When Test_valid_data = 1
   Collect fft_out_r and display as graph
10) When Test_valid_data = 0 again
   Goto 1 but change to next signal (x2-x4)

The output was collected in the Logic Analyser and transferred to Matlab for verification.
9.2 MATLAB VERIFICATION

The files from the Logic Analyser were loaded in Matlab and compared with the result from an FFT made by Matlab itself on the same input signals (x1-x4). When the absolute trigger from the Pattern generator is set to 1, the output graph from the Logic Analyser displayed the following results.

[Figure: Logic Analyser graph]
Fig. 20. Output graph signal X1, absolute = 1

[Figure: Logic Analyser graph]
Fig. 21. Output graph signal X2, absolute = 1

[Figure: Logic Analyser graph]
Fig. 22. Output graph signal X3, absolute = 1

[Figure: Logic Analyser graph]
Fig. 23. Output graph signal X4, absolute = 1
If we compare figures 15 and 16 with figures 20 to 23, we see that the construction is working properly. The value listings from the verification with signals X1-X4 were then compared with FFT calculations in Matlab.

[Figure: Signal X1, FFT FPGA complex, FFT FPGA absolut & FFT Matlab; dB versus n. Legend entries: FPGA complex (Error vs. Matlab in dB), FPGA absolut (Error vs. Matlab in dB), Matlab]
Fig. 24. Output complex and absolute, signal 1 vs. Matlab

[Figure: Signal X2, FFT FPGA complex, FFT FPGA absolut & FFT Matlab; dB versus n. Legend entries: FPGA complex (Error vs. Matlab in dB), FPGA absolut (Error vs. Matlab in dB), Matlab]
Fig. 25. Output complex and absolute, signal 2 vs. Matlab
[Figure: Signal X3, FFT FPGA complex, FFT FPGA absolut & FFT Matlab; dB versus n. Legend entries: FPGA complex (Error vs. Matlab in dB), FPGA absolut (Error vs. Matlab in dB), Matlab]
Fig. 26. Output complex and absolute, signal 3 vs. Matlab

[Figure: Signal X4, FFT FPGA complex, FFT FPGA absolut & FFT Matlab; dB versus n. Legend entries: FPGA complex (Error vs. Matlab in dB), FPGA absolut (Error vs. Matlab in dB), Matlab]
Fig. 27. Output complex and absolute, signal 4 vs. Matlab

The Error value calculation was made according to the following formula:

$$Error = 10 \log_{10} \frac{\sum \left( FFT(Matlab) - FFT(FPGA) \right)^2}{\sum \left( FFT(Matlab) \right)^2}$$ (EQ 48)
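The error measure of equation 48 can be written as a short function. The Python sketch below is an illustration with a hypothetical name, `error_db`; for complex spectra the squares are taken as squared magnitudes, which is an assumption about how the formula was applied.

```python
import math


def error_db(reference, measured):
    """10*log10 of the summed squared difference relative to the summed
    squared reference, as in EQ 48. Returns -inf for a perfect match."""
    num = sum(abs(r - m) ** 2 for r, m in zip(reference, measured))
    den = sum(abs(r) ** 2 for r in reference)
    if num == 0:
        return float("-inf")
    return 10 * math.log10(num / den)


# Example: a measured spectrum that deviates by 1% from the reference.
reference = [1 + 0j, 2 - 1j, 0.5j, -3 + 0j]
measured = [1.01 * v for v in reference]
print(error_db(reference, measured))
```

A uniform amplitude deviation of 1% corresponds to about -40 dB on this scale, which gives a feeling for the magnitude of the error values in the legends.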
10 CONCLUSION

• As the Radix-4 FFT algorithm utilizes fewer complex multipliers than the Radix-2 FFT algorithm, the Radix-4 algorithm is preferable for hardware implementation.
• A parallel programming approach seems to be the model when a real-time system with a high sampling rate is desired.
• By using a shared multiplier in the dragonflies, fewer CLBs are utilized, at the cost of a longer execution time.
• To reach an acceptable level of phase error, it is desirable to use 16 bits precision on the input signal and the phase factor.
• By using a separate clock with clock rate clkShift=N*clkFFT for the input and output shift registers, it would be possible to process an FFT on a signal of length N every clock cycle, clkFFT.

11 IDEAS FOR FURTHER STUDIES

As there are two FFT constructions of length N=64, one with precision 12 bits and one with precision 12, 14 and 16 bits (dragonfly rank 1, 2 and 3), verified to be correct in Modelsim, it would be desirable to implement and verify those constructions when a circuit board with the necessary Xilinx Virtex-E circuit is available.

Improvement and development of the input and output shift registers are also interesting, as this would improve the bandwidth of a real-time sampled signal when computing the FFT.
Appendix A1
Ease block structure of Radix4 FFT, N = 64

Appendix A2
Ease block structure of Radix4 FFT, N = 64, shared mult

Appendix A3
Ease block structure of Radix4 FFT, N = 16

Appendix B
Matlab code

Appendix C
Output listing