0% found this document useful (0 votes)

110 views42 pages

FFT Matrix Factorization Techniques

The document discusses the discrete Fourier transform (DFT) and efficient algorithms for computing it, specifically the fast Fourier transform (FFT). It describes how the DFT matrix can be factored into sparse matrices using techniques like the Cooley-Tukey algorithm. This factorization leads to an efficient divide-and-conquer FFT algorithm that uses O(n log n) operations, as opposed to the naive O(n^2) matrix-vector multiplication approach. Bit reversal permutation is also discussed as part of reordering the input for the FFT.

Uploaded by

dangtung.dtt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

110 views42 pages

FFT Matrix Factorization Techniques

Uploaded by

dangtung.dtt

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

The FFT

Via Matrix Factorizations

A Key to Designing High Performance Implementations

Charles Van Loan

Department of Computer Science
Cornell University

A High Level Perspective...

Blocking For Performance

A11 A12 A1q } n1

A
21 22
2q } n2
A = .

.
.
.
.
.
.
.
.

Ap1 Ap2 Apq } nq

|{z}
n1

|{z}
n2

|{z}
nq

A well known strategy for high-performance Ax = b and Ax = x

solvers.

Factoring for Performance

One way to execute a matrix-vector product
y = Fnx
when Fn = At A2A1 is as follows:
y=x
for k = 1:t
y = Ak x
end
A different factorization Fn = At A1 would yield a different
algorithm.

The Discrete Fourier Transform (n = 8)

0
8

0
8
y = F8x =
0
8
0

8
0

8
80

80
81
82
83
84
85
86
87

80
82
84
86
88
810
812
814

80
83
86
89
812
815
818
821

80
84
88
812
816
820
824
828

80
85
810
815
820
825
830
835

8 = cos(2/8) i sin(2/8)

80
86
812
818
824
830
836
842

7
8

14
8

21
8

x
28
8

35
8

42
8

849

The DFT Matrix In General...

If n = cos(2/n) i sin(2/n) then
pq

[Fn]pq = n

= (cos(2/n) i sin(2/n))pq
= cos(2pq/n) i sin(2pq/n)
Fact:
FnH Fn = nIn

Thus, Fn/ n is unitary.

Data Sparse Matrices

An n-by-n matrix A is data sparse if it can be represented with
many fewer than n2 numbers.
Example 1.
A has lots of zeros. (Traditional Sparse)
Example 2.
A is Toeplitz...

a
e
A =
f
g

b
a
e
f

c
b
a
e

d
c

b
a

More Examples of Data Sparse Matrices

A is a Kronecker Product B C, e.g.,

A =

b11C b12C
b21C b22C

If B IRm1m1 and C IRm2m2 then A = B C has m21m22

entries but is parameterized by just m21 + m22 numbers.

Extreme Data Sparsity

A =

n X
n X
n X
n
X

S(i, j, k, `) (2-by-2) (2-by-2)

|
{z
}
i=1 j=1 k=1 `=1
d times

A is 2d -by-2d but is parameterized by O(dn4) numbers.

Factorization of Fn

The DFT matrix can be factored into a short product of sparse

matrices, e.g.,
F1024 = A10 A2A1P1024
where each A-matrix has 2 nonzeros per row and P1024 is a permutation.

From Factorization to Algorithm

If n = 210 and
Fn = A10 A2A1Pn

then
y = Pnx
for k = 1:10
y = Ak x

2n flops.

end
computes y = Fnx and requires O(n log n) flops.

Recursive Block Structure

F8(:, [ 0 2 4 6 1 3 5 7 ]) =

1 0 0 0 1
0
0
0
0 1 0 0 0
0
0

0 0 1 0 0

2
0

F 0
3
0
0 8
0 0 0 1 0
4

1
0
0
0
1
0
0
0

0 F4

0
0 1 0 0 0 8 0

2
0 8 0
0 0 1 0 0
0 0 0 1 0
0
0 83
Fn/2 shows up when you permute the columns of Fn so that
the odd-indexed columns come first.

Recursion...

We build an 8-point DFT from two 4-point DFTs...

F8 x =

1
0
0
0
1
0
0
0

0
1
0
0
0
1
0
0

0
0
1
0
0
0
1
0

0 1
0
0
0
0 0 8 0
0

0 0
0 82 0
"
#

1 0
0
0 83 F4x(0:2:7)

0 1 0
0
0 F4x(1:2:7)

0 0 8 0
0

2
0 0
0 8 0
1 0
0
0 83

Radix-2 FFT: Recursive Implementation

function y =fft(x, n)
if n = 1
y = x
else
m = n/2; = exp(2i/n)
= diag(1, , . . . , m1)
zT = fft(x(0:2:n 1), m)
zB = fft(x(1:2:n 1), m)
y =
end

Im Im
Im Im

zT
zB

Overall: 5n log n flops.

The Divide-and-Conquer Picture

(0:1:15)

H
H

(0:2:15)

Q

Q

(0:4:15)

H
H

(1:2:15)

Q
Q

(2:4:15)

(1:4:15)

@
@

Q

Q

Q
Q
(3:4:15)

@
@

(0:8:15)

(4:8:15)

(2:8:15)

(6:8:15)

(1:8:15)

(5:8:15)

(3:8:15)

(7:8:15)

A
A

A
[0]

[8]

[4]

[12]

[2]

[10]

[6]

[14]

[1]

[9]

[5]

[13]

[3]

[11]

[7]

[15]

Towards a Nonrecursive Implementation

The Radix-2 Factorization...
If n = 2m and
m = diag(1, n, . . . , nm1),
then
Fnn =

mFm

Fm mFm

where n = In(:, [0:2:n 1:2:n]).

Note:

I2 Fm =

Fm 0
.
0 Fm

Im m

(I2 Fm).

The Cooley-Tukey Factorization

n = 2t
Fn = At A1Pn
Pn = the n-by-n bit reversal permutation matrix

Aq = I r
L/2 =

IL/2

L/2

IL/2 L/2

L/21
diag(1, L, . . . , L
)

L = 2q , r = n/L

L = exp(2i/L)

The Bit Reversal Permutation

(0:1:15)

H
H

(0:2:15)

Q

Q

(0:4:15)

H
H

(1:2:15)

Q
Q

(2:4:15)

(1:4:15)

@
@

Q

Q

Q
Q
(3:4:15)

@
@

(0:8:15)

(4:8:15)

(2:8:15)

(6:8:15)

(1:8:15)

(5:8:15)

(3:8:15)

(7:8:15)

A
A

A
[0]

[8]

[4]

[12]

[2]

[10]

[6]

[14]

[1]

[9]

[5]

[13]

[3]

[11]

[7]

[15]

Bit Reversal

x(0000)
x(0)
x(0001)
x(1)

x(0010)
x(2)

x(0011)
x(3)

x(0100)
x(4)

x(0101)
x(5)

x(0110)
x(6)

x(0111)
x(7)

x(8) = x(1000)

x(1001)
x(9)

x(1010)
x(10)

x(1011)
x(11)

x(1100)
x(12)

x(1101)
x(13)

x(1110)
x(14)
x(1111)
x(15)

x(0)
x(0000)
x(8)
x(1000)

x(4)
x(0100)

x(12)
x(1100)

x(2)
x(0010)

x(10)
x(1010)

x(6)
x(0110)

x(14)
x(1110)

x(0001) = x(1)

x(9)
x(1001)

x(5)
x(0101)

x(13)
x(1101)

x(3)
x(0011)

x(11)
x(1011)

x(7)
x(0111)
x(15)
x(1111)

Butterfly Operations
This matrix is block diagonal...
Aq = I r

IL/2

L/2

IL/2 L/2

L = 2q , r = n/L

r copies of things like this

At the Scalar Level...

a sH

b s

H
H

s
a + b

HH
Hs a b

Signal Flow Graph (n = 8)

s
H

s

@

HH
80

@
@
@

s
A

s y0

A

A

s
s y1
80

A A
@

@
A A
@

@
A A
@
@
A

s y2
s
s
@s A
82
80

HH
A

A

H

@
A
A

A

@
80
A
A

A

@
A A
HH
Hs

s
@s A
A s y3
81
A
A
A

A
A
A A
A
A

A A A A
s
s
s A
82
A A s y4

H
@
H
A A A
H
A
@
A
80
A
@
H
AA A A
@
H

Hs

s
s
A A s y5
80
83
@
A A
@
@

@
A A
@

@
A A
@
2
s
s
s

@

A A s y6
8

H
HH

A
@
0
@

A
8
H

@
A
H

Hs
s

s
@
A s y7

s

H
HHs
@

The Transposed Stockham Factorization

If n = 2t, then
Fn = St S2S1,
where for q = 1:t the factor Sq = Aq q1 is defined by
= I r BL ,

L = 2q , r = n/L,

q1 = r IL ,

IL L
BL =
,
IL L

L = L/2, r = 2r,

= diag(1, L, . . . , LL1).

Perfect Shuffle

x0
x0
x1
x1

x2
x4

x3
x5

(4 I2)
x4 = x2

x3
x5

x6
x6
x7
x7

Cooley-Tukey Array Interpretation

Step q:
k

2k 2k+1
L =2q1

8
>
<

>
:

r =n/L

r=n/L

L=2q

Reshaping

x =

x24 =

Transposed Stockham Array Interp

k+r

9
>
=

(q1)

xL r = FL xT
r L =

>
;

x(q) = Sq x(q1)

r =n/L
k

9
>
>
>
>
>
>
>
>
=

(q)

xLr = FL xT
rL =

>
>
>
>
>
>
>
>
;

r=n/L

L=2q

L =2q1 .

2 2 2 Basic Radix-2 Versions

Store intermediate DFTs by row or column

Intermediate DFTs adjacent or not.
How the two butterfly loops are ordered.
x =

IL/2

L/2

IL/2 L/2

L = 2q , r = n/L

The Gentleman-Sande Idea

It can be shown that FnT = Fn and so if

then

Fn = At A1PnT
Fn = FnT = PnAT1 ATt

and we can compute y = Fnx as follows...

y = x
for k = t: 1:1
y = ATk x
end
y = Pny

Convolution and Other Aps

From problem space to DFT space via
for k = t: 1:1
x = ATk x
end
x = Pnx
Do your thing in DFT space. Then inverse transform back to
Problem space via
x = PnT x
for k = 1:t
x = Ak x
end
x = x/n
Can avoid the Pn ops by working in scrambled DFT space.

Radix-4
Can combine four quarter-length DFTs to produce a single fulllength DFT:

(a + c) + (b + d)
I I I I
a
I iI I iI b (a c)i(b d)
,
=
v=
I I I I c (a + c) (b + d)
(a c)+i(b d)
I iI I iI
d

The radix-4 butterfly.

Better re-use of data.
Fewer flops. Radix-4 FFT is 4.25n log n (instead of 5n log n).

Mixed Radix

P

#
cPP

#
PP
c

PP

#
c
PP
#
c

24

@
8

@
@

@
8

Multiple DFTs
Given: n1-by-n2 matrix X.
Multicolumn DFT Problem...
X Fn1 X
Multirow DFT Problem...
X XFn2

Blocked Multiple DFTs

The 4-Step Framework

A matrix reshaping of the x Fnx operation when n = n1n2:
xn1n2 xn1n2 Fn2

Multiple row DFT

xn1n2 Fn(0:n1 1, 0:n2 1). xn1n2

xn2n1 xTn1n2
xn2n1 xn2n1 Fn1

Pointwise multiply

Transpose
Multiple row DFT .

Can be arranged so communication is concentrated in the transpose step.

Distributed Transpose: Example

Initial:

X01
X11
X21
X31

X02
X12
X22
X32

X03
X13
.
X23
X33

T
X01
T
X11
T
X21
T
X31

T
X02
T
X12
T
X22
T
X32

T
X03

T
X13

.
T
X23

T
X33

X00
X10
X =
X20
X30

Transpose each block:

T
X00

XT
10
X
XT
20
T
X30

Now regard as 2-by-2 and block transpose each block:

T XT XT XT
X
00 10 02 12

T
T
T
T
X X X X
01 11 03 13 .
X

T
T
T
T
X X X X
20 30 22 32
T XT XT XT
X21
31 23 33

Now do a 2-by-2 block transpose:

T XT XT XT
X
00 10 20 30
T

T
T
T
X X X X
01 11 21 31 .
X
T

X XT XT XT
02 12 22 32
T XT XT XT
X03
13 23 33

Factorization and Transpose

xnm xTmn
corresponds to
x P (m, n)x
where P (m, n) is a perfect shuffle permutation, e.g.,
P (3, 4) = I12(:, [0 3 6 9 1 4 7 10 2 5 8 11])
Different multi-pass transposition algorithms correspond to different factorizations of P (m, n).

Two-Dimensional FFTs
If X is an n1-by-n2 matrix then is 2D DFT is
X Fn1 XFn2
Option 1.
X Fn1 X
X XFn2
Option 2. Assume n1 = n2 and Fn1 = At A1.
for q = 1:t

end

X Aq XATq

Interminlgling the column and row butterfly computations can

result in better locality.

3-Dimensional DFTs
Given X(1:n1, 1:n2, 1:n3 ), apply DFT in each of the three dimensions.
If
x = reshape(X(1:n1, 1:n2, 1:n3), n1n2n3, 1)
then the problem is to compute
x (Fn3 Fn2 Fn1 )x

i.e.,
x (In3 In2 Fn1 )x
x (In3 Fn2 In1)x
x (Fn3 In2 In1)x

d-Dimensional DFTs
Sample for d = 5:
=1
=2
=3
=4
=5

X(1, 2 , 3, 4, 5)
X(2, 3 , 4, 5, 1)
X(2, 3 , 4, 5, 1)
X(3, 4 , 5, 1, 2)
X(3, 4 , 5, 1, 2)
X(4, 5 , 1, 2, 3)
X(4, 5 , 1, 2, 3)
X(5, 1 , 2, 3, 4)
X(5, 1 , 2, 3, 4)
X(1, 2 , 3, 4, 5)

Fn1
Tn1,n
Fn2
Tn2,n
Fn3
Tn3,n
Fn4
Tn4,n
Fn5
Tn5,n

Intemingling of component DFTs and tensor transpositions.

References

FFTW: http:www.fftw.org
C. Van Loan (1992). Computational Frameworks for the Fast
Fourier Transform, SIAM Publications, Philadelphia, PA.

FFT Matrix Factorization Techniques
No ratings yet
FFT Matrix Factorization Techniques
42 pages
DSP-Lec 4
No ratings yet
DSP-Lec 4
32 pages
Dit FFT
100% (1)
Dit FFT
18 pages
FFT Algorithms in Digital Signal Processing
No ratings yet
FFT Algorithms in Digital Signal Processing
9 pages
Fast Fourier Transform in DAA
100% (3)
Fast Fourier Transform in DAA
32 pages
Radix-2 FFT Algorithm Explained
No ratings yet
Radix-2 FFT Algorithm Explained
15 pages
Fast Fourier Transform Algorithms Explained
No ratings yet
Fast Fourier Transform Algorithms Explained
61 pages
Exp 3 Report
No ratings yet
Exp 3 Report
14 pages
155.FFT Ropec
No ratings yet
155.FFT Ropec
7 pages
Lab.8. DFT and FFT Transforms
No ratings yet
Lab.8. DFT and FFT Transforms
16 pages
Fast Fourier Transform Implementation
No ratings yet
Fast Fourier Transform Implementation
17 pages
You Only Live Once
No ratings yet
You Only Live Once
13 pages
FFT Algorithms for Engineers
No ratings yet
FFT Algorithms for Engineers
23 pages
DIT and DIF Algorithms
0% (1)
DIT and DIF Algorithms
21 pages
Understanding Fast Fourier Transform (FFT)
No ratings yet
Understanding Fast Fourier Transform (FFT)
23 pages
Lab Notes 6
No ratings yet
Lab Notes 6
21 pages
Fast Fourier Transform Course Outline
No ratings yet
Fast Fourier Transform Course Outline
53 pages
FFT Algorithms in Discrete Time Signal Processing
100% (2)
FFT Algorithms in Discrete Time Signal Processing
39 pages
N-Point FFT Algorithm Explained
No ratings yet
N-Point FFT Algorithm Explained
10 pages
ASPA Module 1
No ratings yet
ASPA Module 1
60 pages
FFT and Its Relation to z-Transform
No ratings yet
FFT and Its Relation to z-Transform
27 pages
Pipelined Parallel FFT Architecture
No ratings yet
Pipelined Parallel FFT Architecture
5 pages
DFT and DTFT Relationship Explained
No ratings yet
DFT and DTFT Relationship Explained
48 pages
FFT Algorithms for Engineers
No ratings yet
FFT Algorithms for Engineers
37 pages
Fast Fourier Transform Overview
No ratings yet
Fast Fourier Transform Overview
22 pages
Fourier Family DFTFFT
No ratings yet
Fourier Family DFTFFT
27 pages
DSP Expt 4 Mane 078 B1
No ratings yet
DSP Expt 4 Mane 078 B1
8 pages
FFT for Engineers
No ratings yet
FFT for Engineers
11 pages
FFT Algorithms: Decimation Techniques
No ratings yet
FFT Algorithms: Decimation Techniques
23 pages
FFT Algorithm Implement in C
No ratings yet
FFT Algorithm Implement in C
9 pages
DSP6 FFT
No ratings yet
DSP6 FFT
20 pages
Fast Fourier Transform (FFT) (Theory and Implementation)
No ratings yet
Fast Fourier Transform (FFT) (Theory and Implementation)
47 pages
Dif FFT 121008054352 Phhjggpapp01
No ratings yet
Dif FFT 121008054352 Phhjggpapp01
23 pages
Fast Fourier Transform Overview
No ratings yet
Fast Fourier Transform Overview
19 pages
Fast Fourier Transform in MATLAB
No ratings yet
Fast Fourier Transform in MATLAB
9 pages
Radix-4 FFT-DIF Algorithm Overview
No ratings yet
Radix-4 FFT-DIF Algorithm Overview
7 pages
FFT: Decimation in Time & Frequency
No ratings yet
FFT: Decimation in Time & Frequency
37 pages
IP FFT Processors For OFDM in FPGA (Http://bbwizard - Com)
No ratings yet
IP FFT Processors For OFDM in FPGA (Http://bbwizard - Com)
9 pages
Fast Fourier Transform Methods Explained
No ratings yet
Fast Fourier Transform Methods Explained
18 pages
Title Computational Complexity of FFT Algorithm: Objective
No ratings yet
Title Computational Complexity of FFT Algorithm: Objective
4 pages
Fast Fourier Transform (FFT) Overview
No ratings yet
Fast Fourier Transform (FFT) Overview
20 pages
FFT 2025
No ratings yet
FFT 2025
39 pages
Fast Fourier Transform Explained
No ratings yet
Fast Fourier Transform Explained
14 pages
FFT Algorithms and DFT Basics
No ratings yet
FFT Algorithms and DFT Basics
37 pages
FFT A Brief Overview
No ratings yet
FFT A Brief Overview
30 pages
Decimation in Time and Frequency FFT
No ratings yet
Decimation in Time and Frequency FFT
37 pages
DIT Radix-2 FFT Algorithm Explained
No ratings yet
DIT Radix-2 FFT Algorithm Explained
4 pages
Impact of DPU 2017
No ratings yet
Impact of DPU 2017
6 pages
FFT for Engineers
No ratings yet
FFT for Engineers
25 pages
Fast Fourier Transform Overview
No ratings yet
Fast Fourier Transform Overview
11 pages
Dayananda Sagar College of Engineering
No ratings yet
Dayananda Sagar College of Engineering
7 pages
Pipelined 128-Point FFT Design Project
No ratings yet
Pipelined 128-Point FFT Design Project
70 pages
Implementation of Fast Fourier Transform (FFT) Using VHDL
93% (30)
Implementation of Fast Fourier Transform (FFT) Using VHDL
71 pages
Chapter 2 - FIR Filters - Digital Filter Design
No ratings yet
Chapter 2 - FIR Filters - Digital Filter Design
100 pages
Questions On Digital Signal Proseccesing
No ratings yet
Questions On Digital Signal Proseccesing
16 pages
DFT Properties
No ratings yet
DFT Properties
58 pages
Title: Medical Image Classification Using Artificial Neural Networks
No ratings yet
Title: Medical Image Classification Using Artificial Neural Networks
12 pages
Dynamic DSP System for Egg Weighing
No ratings yet
Dynamic DSP System for Egg Weighing
6 pages
Properties of 2D Systems
No ratings yet
Properties of 2D Systems
19 pages
DSP
100% (1)
DSP
119 pages
Window - To - Viewport Transformation
No ratings yet
Window - To - Viewport Transformation
21 pages
LABVIEW Digital Signal Processing Experiments
No ratings yet
LABVIEW Digital Signal Processing Experiments
4 pages
What Are The Mean and Median Filters
No ratings yet
What Are The Mean and Median Filters
6 pages
Iiirdyear, B.Tech-Electrical Engineering Digital Signal Processing
No ratings yet
Iiirdyear, B.Tech-Electrical Engineering Digital Signal Processing
2 pages
Colorize Hair in Photoshop Tutorial
No ratings yet
Colorize Hair in Photoshop Tutorial
15 pages
Video Encoding: Basic Principles: Felipe Portavales Goldstein
No ratings yet
Video Encoding: Basic Principles: Felipe Portavales Goldstein
33 pages
Pixel Transforms in Computer Vision
No ratings yet
Pixel Transforms in Computer Vision
20 pages
Multirate Filters & Wavelets in FPGAs
No ratings yet
Multirate Filters & Wavelets in FPGAs
22 pages
Digital Signal Processing Exam Paper
No ratings yet
Digital Signal Processing Exam Paper
2 pages
MATLAB Image Processing Techniques
No ratings yet
MATLAB Image Processing Techniques
9 pages
Impulse Invariance Method For Analog-To-Digital Filter Conversion - MATLAB Impinvar - MathWorks India PDF
No ratings yet
Impulse Invariance Method For Analog-To-Digital Filter Conversion - MATLAB Impinvar - MathWorks India PDF
2 pages
Simple Image Saliency Detection From Histogram Backprojection
No ratings yet
Simple Image Saliency Detection From Histogram Backprojection
7 pages
Pyramid Attention Network For Image Restoration
No ratings yet
Pyramid Attention Network For Image Restoration
19 pages
DSP Unit6-Vrc
No ratings yet
DSP Unit6-Vrc
73 pages
Digital Signal Processing Course Overview
No ratings yet
Digital Signal Processing Course Overview
26 pages
Ultrasonic Time of Flight Diffraction 1st Edition - Sample
67% (3)
Ultrasonic Time of Flight Diffraction 1st Edition - Sample
19 pages
Image Registra, On With 3D Slicer: Sonia Pujol, PH.D
No ratings yet
Image Registra, On With 3D Slicer: Sonia Pujol, PH.D
36 pages
ISSCC2022-Short Course-Murmann
No ratings yet
ISSCC2022-Short Course-Murmann
93 pages
Image Segmentation Explained
No ratings yet
Image Segmentation Explained
16 pages
Sampling of Continuous-Time Signals
No ratings yet
Sampling of Continuous-Time Signals
11 pages
Erica Synths Pico DSP User Manual
No ratings yet
Erica Synths Pico DSP User Manual
4 pages
051796F0 Schedule Log Entries
No ratings yet
051796F0 Schedule Log Entries
25 pages
Bec503-Digital-Signal-Processing 67837918 2025 11 02 04 17
No ratings yet
Bec503-Digital-Signal-Processing 67837918 2025 11 02 04 17
6 pages

FFT Matrix Factorization Techniques

Uploaded by

FFT Matrix Factorization Techniques

Uploaded by

The FFT

Via Matrix Factorizations

Charles Van Loan

A High Level Perspective...

Blocking For Performance

A11 A12 A1q } n1

Ap1 Ap2 Apq } nq

A well known strategy for high-performance Ax = b and Ax = x

Factoring for Performance

The Discrete Fourier Transform (n = 8)

The DFT Matrix In General...

Thus, Fn/ n is unitary.

Data Sparse Matrices

More Examples of Data Sparse Matrices

If B IRm1m1 and C IRm2m2 then A = B C has m21m22

Extreme Data Sparsity

S(i, j, k, `) (2-by-2) (2-by-2)

A is 2d -by-2d but is parameterized by O(dn4) numbers.

The DFT matrix can be factored into a short product of sparse

From Factorization to Algorithm

Recursive Block Structure

We build an 8-point DFT from two 4-point DFTs...

Radix-2 FFT: Recursive Implementation

Overall: 5n log n flops.

The Divide-and-Conquer Picture

Towards a Nonrecursive Implementation

where n = In(:, [0:2:n 1:2:n]).

The Cooley-Tukey Factorization

The Bit Reversal Permutation

r copies of things like this

At the Scalar Level...

Signal Flow Graph (n = 8)

The Transposed Stockham Factorization

Cooley-Tukey Array Interpretation

Transposed Stockham Array Interp

2 2 2 Basic Radix-2 Versions

Store intermediate DFTs by row or column

The Gentleman-Sande Idea

and we can compute y = Fnx as follows...

Convolution and Other Aps

The radix-4 butterfly.

Blocked Multiple DFTs

The 4-Step Framework

Multiple row DFT

xn1n2 Fn(0:n1 1, 0:n2 1). xn1n2

Can be arranged so communication is concentrated in the transpose step.

Distributed Transpose: Example

Transpose each block:

Now regard as 2-by-2 and block transpose each block:

Now do a 2-by-2 block transpose:

Factorization and Transpose

Interminlgling the column and row butterfly computations can

Intemingling of component DFTs and tensor transpositions.

You might also like