,
VOL. ASSP22,NO.
2,
APRIL
1974
87
Fast Convolution Using FermatNumber Transforms with
Applications to Digital Filtering
AbsfracfThestructure
of transformshaving the convolution
property is developed. A particular transform is proposed that is defined on a finite ring of integers with arithmetic carried out modulo
Fermat numbers. This Fermat number transform (FNT) is ideally
suited todigital computation, requiringon the order of N log N additions, subtractions and bit shifts, but no multiplications. In addition
to.being efficient, the Fermat number transform implementation of
convolution is exact, i.e., there is no roundoff error. There is a restrictionon sequencelength imposedbyword
length butmultidimensional techniques are discussed which overcome this limitation. Results of an implementationon the IBM 370/155 are presented
andcomparedwith
thefast Fouriertransform(FFT)showing
a
substantial improvement in efficiency and accuracy.
convolution property in a finite field and also gives the
conditions for having transforms with the cyclic convolution property in a finite ring
of integers. Good [5] also
mentioned the use of transforms in a finite
ring of integers.
Schonhage and Strassen [SI defined transformshaving
the cyclic convolution property modulo a Fermat number and discussed their application to fast multiplication
of very large integers. Knuth [7] elaborated on the work
of Schonhage and Strassen. Nicholson [SI presented an
algebraic theory of finite Fourier transfornls in any ring
and established fastFFTtypealgorithmstocompute
these transforms. Rader [SI, [lo] proposed finite transformsinrings
of integersmodulo both Mersenne and
I. INTRODUCTION
Fermatnumbers. He firstproposed the application to
convolution property of certaintransformscan
digital signal processing, showed that thetransforms could
be used to efficiently compute the cyclic convolution be calculated using only additionsand bit shifting,showed
of two long discrete sequences. One such transform is the the ,wordlength constraint,and suggested twodimensional
discrete Fourier transform (DFT) with the fast Fourier transformsas a possible relaxation of thatconstraint.
transform (FFT) implementation [l] which operates on Agamal and Burrus [ll] defined Fermat transforms and
sequences in the complex number field. To compute the also proposed their applicationfor fast digital convolution.
D F T of a sequence of length iV requires on the order of
The purpose of this paperis t o develop the general
( N / Z ) logz ( N / 2 ) complex multiplications, which often structure of transforms with the convolution property and
take most of the computation time. I n addition, the FFT then present the numbertheoretictransformmoduloa
implementation of cyclic convolutionintroduces signifi Fernrat number as a practical scheme for use with digital
cantamounts of roundoff error [ Z ] , thusdeteriorating
computation. The properties are then discussed for the
the signaltonoise ratio. When working with digital ma implementation of digital filters and the results of a softchines, the data are available only with some finite pre ware implementation on an IRM 370/155 are given. It, is
cision and, therefore, without loss of generality, the data believed that thepresentation here is differentfrom prevican be considered to be integers with some upper bound. ous works and gives more insight into these transforms.
In this digital domain, the complex number field of the One result of this approach allows the maximum length
continuous domain can be replaced with a finite field or, of sequences that can be convolved to be increased by a
more generally, a finite ring of integers with addition and factor of 2 over what hadpreviously been reportedin [IO].
multiplicationmodulosomeinteger
F . I n this ring,
The choice of terminology in a new area is always diffiusing transforms similar to the DFT, the cyclic canvolu cult. I n this paper the generalname,numbertheoretic
tion can be performed very efficiently and without any
transform, is usedfor
anytransform
usingnumberroundoff error.Onesuchtransformpresentedhereisa
theoreticconceptsinitsdefinition.
The twoparticular
number theoretic transform which operates in the ring of transforms with arithmetic modulo Mersenne and Fermat
integers modulo a Fermat number. Computation of this numbers are called klersennenumber transforms
(MNT’s)
transform of length N , analogous to an FFI’, requires on and Fermat number transforms (FNT’s) . The particular
the order of N log2 AT additions, bit shifts, and subtrac MNT’s and FNT’s with a basis function a = 2 are called
tions, but no multiplication. This is considerably simpler
Rader transforms (RT’s).
than the FFT.
These transforms are truly digital transforms, taking
Knuth [SI has proposed the use of transforms in finite into account the quantization in amplitude and the finite
fields. Pollard [4] discussed transforms having the cyclic precision of digital signals. They bear the same relation
to digitalsignal astheDFT
does to discretetime or
ManuscriptreceivedJuly16,
1973; revisedNovember19,1973.
sampled data signals and the Fourier or Laplace transThis work was supported by NSF Grant GK23697.
The authors are with the Department of Electrical Engineering, forms do to continuoustime signals. In the same manner
Rice University, Houston, Tex. 77001.
that the relation of discretetime signals to continuous
T””
88
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, APRIL
time signals through sampling involves a possible folding
or aliasing in the frequency domain, the relation of calculationswith the DFT to calculations with the number
theoretictransforms involves a possible folding of the
amplitude that must be taken into account.
I n Section 11, the structures of t'ransforms having the
cyclic andthe noneyclic convolutionpropertiesare
developed. It is shown Ohat what we call the DFT structure
is t,he only structure which supports the cyclic convolution property and any transform
having the DFT structure
will have the cyclic convolution property. I n Section 111,
the necessary and sufficient conditions for the existmce
of transforms in a ring of integers modulo of an integer F
are establishcd. In Section IV, FNT's are defined which,
computationally,appear to be t.he besttransforms for
implementing convolution. I n Section V, reference is given
to a twodimensional transform implementat'ion for long
onedimensional convolution. I n Section VI, a hardware
implementation to compute the FNT is suggested. Comparison of the FNT with the FFT implementation of the
I>FT,to implement convolution, is made in Sect,ion VII.
I n Section VIII, a software implementation of the FNT
on an IBM 370/155 is prescnt,ed and timing comparisons
are made with the PFT.
For cyclic convolution lengths
up to 236, F N T implementations are faster by a factor
of 3 to 5 over the best FFT implementations which make
use of the symmet,ry of the DFT for real data. For computerswith slower multiplication, theadvantages look
even bet'tcr. lGnally, several generalizat,ions and applications are presented.
11. THE STRUCTURE OF TRASSFOILMS HAVING
T H E CONVOLUTION PROPERTY
In this section the structure of transforms having the
cyclic convolution property, i.e., the transform of cyclic
convolution of two sequences is equal to the product of
theirtransforms, is establishcd. The class of transforms
considered is the set of lincar nonsingular (invertible)
t,ransforms which map a sequence of length A r to another
sequencc of length N . I t is shown that what lvc call t,he
D I T structure is tho only structure which supports the
cyclic convolut,ion propert,y and, moreover, any transform
having the DPT structurcwill have the cyclic convolution
property.
Let xn and h,, n = O , l , . ,lV  1, be two sequences
which are to be cyclically convolved as given by
..
N 1
yn
=
xn*hn =
2khnk
n
=
O,l,..
N

1.

X
=
TX
=
Th
Y
=
Tg.
Consider the conditions on
so that
tk,m's
Y = X @ H
(3)
where @ denotes term by term multiplicat'ion of the
vectors.
Equations ( 2 ) and (1) are combined to give
Nl
y k
N1
=
N1
=
tk,nyn
xmhnm
tk,n
n=O
n=O
m=O
x 1 N 1
=
xmhmtk,m+l
m=O
Z=O
N 1
x k
=
tk,mxnz
m=O
A' 1
HA
=
IC
tk,zhl
=
O,l,.*
.,.I:

1.
z=o
4)
The individual terms in ( 3 ) can be rewritten as
Y k = X k H k
which, using (4), becomes
N 1 A' 1
A  1 A'1
&hltk,m+z
5)
xm:mhztk,nztk,z.
z=o
m=o
z=o
m=O
Matching the multiples of xmhl on both sides of (5) gives
k,Z,m
= tk,&,z
tk++l
=
O,l,.
.,X

1.
(6)
Repeated applications of (6) gives
tkVm = t k , P
lc,m = O , l , *
 .,w  1.
(7)
And, since the convolution is cyclic, the illdices in (6)
are added nlodulo N . This gives
tk,mAr =
1
16,772 =
0,1,
*
.,iv 
(8)
1.
Therefore, t.he tic,m's are JVth roots of unity. For T to be
nonsingular, all the t k , 1 7 should
~
bedistinct
and, since
there are only N distinct N t h roots of unity, t'he t k , l ' s
must be these i\; distinct roots. Without loss of generality,
t l . 1 can be taken as a root of order N , i.e., N is t,he least
positive integersuch that tl,l" = 1. Therefore,all t k , l ' s
can be written as some powers of t l , 1 . Again without loss
of generality, the rows of T can be arrangod so that
(1)
tk.1
k=O
This definition assumes the sequences are periodically
extendedwith
period N or the indiccs arcevaluated
modulo N. Here we will describe sequences as vectors of
length N. Since only linear invertible t,ransforms are considered, they can be represented by anN X N nonsingular
matrix T whose clenwnts arc t k , m , k,wz = O , l ,   ,N  1.
Let capital letters denote the transformed sequences, i.e.,
H
1974
(9)
tl,lk.
Combining (7) and (9) and denoting
following
tk,m
akm
k,7?% =
0,1,' *
tl.1
by a gives the
',Av 1.
(10)
The structure thus established causes the transform to
beorthogonal. The elements Zk,m of
are given by
Zk,m
=
Nlakm
To prove this, consider
k,m
=
O,l,.
.,N

1.
(11)
89
AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING
TTl
=
I
Repeated applications of (14) gives
or
tk,m
N1
N1
= 6
(Ykn(Yln
(12)
k.2
7L=O
where 6 k , Z is the Kroneckerdeltafunction.Taking
1 = p , (12) can be rewritten as
1
1 if p
N 1
Lvl
=
=

0 (mod W )
(13)
0 otherwise.
n=O
IC
To prove the orthogonality, we have Do prove (13). If
p = O(mod W), then (YP = 1 and the first part is thus
established. If p # O(mod A') , ( Y P # 1 or (a"  1) # 0.
Multiplying (13) by ( ( Y P  l ) , we obtain
N1
p
1(
( Y~ 1)
( Y ~ n
= l p l
( ( Y~ 1)
~
=
= kk,lm
lc,m
=
0,1,.,N  1.
(15)
Thisis the same as (7), but now we donothave
(8)
which implies cyclic convolution. Because of the absence
of ( 8 ) , t k , 1 7are
~ not restricted to N t h roots of unity. The
only restriction is the nonsignularityconstraintwhich
requiresall t k , 1 ' 8 to be distinct. A similarprocedureis
discussed by Knuth [3] in connection withthemultiplication of large integers which is almost analogous to
noncyclic convolution. When the t k , l ' s are not restricted
to Nth roots of unity, ,in general there is no 1WTtype
algorithm to compute the transform or the inverse transform. As a matt'er of fact, t,he inverse transform docs not
have a simple structure. Because of these limitations we
restrict ourselves to transforms having the cyclic convolution property.
0.
111. TRANSFORMS I N T H E R I N G 011' INTEGERS
MODULO AN I N T E G E R
n=O
Thus (13) is established.
An examination of the preceding development reveals
In anypractical situation, or when working with digital
that the existence of an N X N transformhaving the machines, the data are available only with some finite
cyclic convolution property depends only on the existence precision, and therefore,without loss of generalit,y, the
of an (Y that is a root. of unity of order N , and theexistence data can be considered to be integers wit'h some upper
of 1Vl. The structures of t'he transform matrix T and its bound. To compute corivolution in this digitaldomain,
inverse are given by (10) and (11) respectively, and operations in the complex number field of the continuous
thesearethe
only structures which supportthe cyclic domain can be imitated with a finite field or, rnore genconvolutionproperty.This
structure is called t'he DFT erally, a finitx ring of integers under additions and multistructure. Because of the DFT structure, any transform
plications modulo some integer F ; and an int'eger of
having the cyclic convolutionproperty also has a fast order M replaces ej2T/i\i used in a DIW. In thisring, when
computational algorithm simiiar to theFFT if is highly two integer sequences z ( n ) and h ( h ) are convolved, the
composite Kg]. I n t'he complex number field the DFT,
output integer sequence y ( n )is congruent to the convoluwith = ejZriN, is the only transform having the cyclic tion of x(n) and h ( n ) modulo F . In the ring of integers
convolution property. But, in a finite field or, more gen modulo F , conventionalintcgcrscanbeunambiguously
erally, a ringwe might also have transforms with the
cyclic represent'ed if their absolute value is loss than F / 2 . If the
convolution property if there exists a root of unity of input integer sequences z ( n ) and h ( n ) are so scaled that
order N and if X' exists. In the next section we establish I y ( n ) I never exceeds F / 2 , we would get the same results
resultsfor the special case of a finitering of integers by implementing convolut'ion inthe
ring of integers
modulo of an integer, which seem to be most appropriate modulo F as that obtained with normal arit.hmetic. This
for practical digital computation. Appendix A lists some is similar to the overflow constraint in fixedpoint digital
of the properties of these transforms.
machines.
Thus far wi: have discussed only the transforms having
I n most digit,al filt'ering applications, h ( n ) represents
the cyclic convolutionproperty.
Noncyclic convolution the impulse response and is known a priori; also the maxcan be implemented using cyclic convolution by padding imummagnitude of the input signal isusually known.
the sequences with zeros [l]. If only noncyclic convolu I n this situation, we can bound the peak output magnition is desired, a more general class of transforms is pos tude by,
sible. Consider two sequences of lengths L and M , respecAi 1
t'ively, t'hat are noncyclically convolved giving an output
1
y
(
n
)
I
5
I
z
(
n
)
I m x
I h ( n ) 1.
sequence of length N = L
M  1. First we pad the
n=O
sequences with zeros to makethem of length N each.
When we relatecontinuoustime
signals to discreteFor the N X N transform T , the desired restrictions on
time
signals
through
sampling,
we
obtain an aliasing
t k , m 1 8 are still given by (6), but now the indices are not
effect
in
the
frequency
domain;
all
the
frequencies are
added modulo X ; this modifies (6) to be
obtained
modulo
the sampling
frcquency.
A similar
tk,m+l = tk,mtk,z
k = 0,1," ' , N  1
effect occurs when the amplitude is discrctizod and convolution is performed in the integer domain modulo an
112 = O , l , . . . , M
 1
integer F. We obtain aliasing in amplitude which can be
I = 0,1,.*.,L  1.
(14) avoided as noted before.
(Y
A
i
(Y
+
c
90
1974
IEEE TRANSACTIONS ON
PROCESSING,
APRIL
ACOUSTICS,
SIGNAL ANDSPEECH,
I n order to have a computational advantage in implementing cyclic convolution, wewill derive a transform
having the cyclic convolutionproperty in this ring. In
addition, we would like thistransfornltohave
a fast
cornput'ationalalgorithm.First,
wewill presentresults
about the existence of such transforms which mere shown
to depend on the exist.enceof an a of order N and existence
N  l in the ring. The number theory necessary to understand the derivations is very basic and can be found in
any elementary book on number theory [14].
Let
F = p l r l p 2 r z . PlPL
(16)
..
be the prime factorization of F . Since a is of order N ,
we have
(3.7)
a N = 1 (mod F )
which implies
all' = I(mod
i
piri)
=
..
1,2,. ,1.
(18)
If the transform matrix T is to be nonsingular, no two
rows of T can be the same, which implies
a p
# @(mod piri)
i
=
1,

11,
P #4
0 5 p , q 5 A:  1.
(19)
Equations (18) and (19) imply that a should be of order
N with respect to each prime power factor pjri of F , i.e.,
N is the least positive integer such that
a'v = 1 (mod pi'i)
i = 1,2,. *,Z.
( 20)

If a is of order less than N with respect to any moduli
piT<,then 1' would be singular wit,h respect to that modulus. Also, for 1Vl to exist, N should be relatively prime
t o F , i.e., p i should not be a factor of N . These two considerations give Theorem I, which isproven in Appendix B.
Theorem 1
An N X N transform T having the cyclic convolution
property in the ring of integers modulo an integer F exists
if and only if X divides 0 ( F ) g gcd(pl  1,pZ  I ,
Q L  1).

IV. ECERRIAT NUMBER TRANSFORMS
a ,
choices for F , for which the maximum transform l.ength
is not too small. As noted above, we would like F to
have a very few bit binary representation. The first possibility is P ; it has a prime factor 2 and therefore according to Theorem 1 of Section 111, the maximum possible
transform length is 1. For 2k  1, let IC be a composite
PQ, where 1' is prime. Then 2 p  1 divides 2"Q  1 and
the maximum possible length of the t,ransform will be
governed by the length possible for 2 p  1. Therefore,
only the primes I' need to be considered intcrcsting. Nurnbers of this form are known as Merscnne nunlbers and
Rader [lo] has discussed convolution using Mersenne
numbers in detail. For MNT's [lo], it can beshown that
transfornls of length at least 21' exist and the corresponding a = 2. Mersennc number transforms are not of as
muchinterest because 21' is not highly composito and,
therefore, we do not have fast FYI'type comput,at,ional
algorithms to compute the transforms. For Zk $ I, say IC
is odd. Thcn 3 divides
I andthc largest possible
transform length is 2. Thus k is wen. Let IC be ~ 2 where
~ ,
s is an odd integer. Then 2 z t
I divides 2 8 z t
1 and the
length of the possible transform will be governed by the
length possible for 2zt 1. Therefore, int'egers of the form
2zL 1 are of interest.Thesenumbersare
known as
Fermatnumbers and will be discussed in detail in t'his
paper. E'ennat numbers seem to be optimum in the sense
of having transforms whose length is interesting while t.he
word size is moderate. Numbers of the form 2.2'
1 are
also of limited int'erest and are discussed in Section IX.
A systematic investigat'ion of t'hose F which require rnore
than taw bit reprcsentation is difficult. Our preliminary
investigation in that direction does not seem t o be very
encouraging.
In the light of the above discussion, it is proposed to
use F t 42b $ 1,b = zt, t being a positive integer, as good
choices for I?. Fr is known as thetth Ferrnatnumber.
Arithmetic modulo 17, can be performed using 6bit arithmetic, as will be discussed in Section VI. The nunher of
bits used for the signal and t'he filtercoefficients decide
the value of b to be used so as not to cause error in the
output due tooverflow. The value of b required is slightly
more than double the number of bits used for Dhc signal
representat,ion. One simplc bound on the output was present'cd in t'he lastr section. Computationally, Fcrmat nunlbers sewn to be the best choices for F , therefore, as far as
is practicable, they shouldbc used. If thovalue of h
obtained from the overflow consideration is not a power
os 2, it should be increased to the next power of 2 in
order that Fcrrnatnumberscanbe
used. Some other
possibilities are discussed in Section IX. Since thn arithmetic is done modulo a k'crmat number, we call the associatedtransforms FNT's. Below we define F h T s and
their inverse transforms for sequences of length lz' = 2":
AT
+
+
+
+
+
For number theoretic transforms to be more at,tractive
i n comparison to the PE'T for implementing convolution,
they should bo computationally efficient'. There arc three
requirements that will be considered. First, I\; should be
highly composite (prefcrably a power of 2 ) for a fast FE'T
type algorithm to exist. Second, since complex multiplications take most of the computational effort in calculating
thc FF'T, it is important' that themultiplication by powers
of a be a simple operation. This is possible if the powers
of a have very few bit binary representations; preferably
A1
z(n)anb(modP I ) k = O,l,...,Ar
also a power of two, where multiplication by a powcr of a X ( k ) =
7L=O
reduces to a word shift. Third, in order to facilitate arithmetic modulo P , F should also have a very few bit binary
N 1
representation.
z (n) = (2m)
X(k)a"k(modFt)
k=O
Now we shall make a systematic investigation of good
+
 1
(21)
CONVOLUTION
FAST
AGARWAL
BURRUS: AND
91
AAPPLICATXONS
FILTERING
DIGITAL TO
p , ::26 :+ 1,
b = 2t
TABLE I
(22)
PARAMETERS FOR SEVERAL POSSIBLE IMPLICATIONS FOR THE
__._
is an integer of order N , Le., N is the least posit'ive
integer such that
FNT'S
___..I__
a!
ah' = 1(mod F , )
.
Notethat CY^ =
and 2" = 2b"(mod F,).
As discussed in Sections I1 and 111, we canperform
cyclic convolut,ion of two integer sequences of length hr,
using the FNT, if certain conditions,are satisfied. Now
we discuss the possible values of N for which an FNT
exists, given a choice of F,.As implied by Theorem 1 of
Section 111, for transforms (mod F , ) of length N to exist,
N should divide O(F,).
Fermat numbers up to Fd are all primes, therefore for
FNT for
these cases O ( F , ) = 2b, and we canhavean
any length N = 2", m 5 b. For these Fermat primes the
integer 3 is an of order N = 2b, giving the largest possibletransformlength.
Of course thereare 2b1 other
integers also which are of order 2b.They can be obtained
by taking odd powers of 3 . If an integer a! is of order N ,
then a!P(mod F,)will be of order N / p , if N / p is an integer.
Therefore,usingFermatprimes,
an integer a! of order
2" m 5 b, is given by 32&"(modF,). The integer 2 is of
order 2b = 2t+1,If CY is taken as 2 or a power of 2, all the
powers of a! would be some powers of 2, and, for these
cases, as discussed in t'he beginning of this section, the
FNT can be computed very
efficiently and is called the RT.
There are no other known Fernlat primes. For digital
fiheringapplications, F 5 ( b = 3 2 ) and F6 ( b = 64) seem
to be most practical. Lucas [13] has proven that every
prime factor of composite F , is of the form K2t+2 1.
Therefore, 21+2 divides O(F,)
, for t > 4. I n particular it
can be verified that for F5 and F6,O ( F t ) = 2t+2. Therefore, for these choices of Fermat numbers, the maximum
possible transform 1engt)his 2t+2= 4b.Also, we assert that
a!* given by (23) is of order 2," mod F,,t 2 2.
O(
+
qj 4
=
22t2(221'1
.
1)*
(23)
We denote this aP as fi because
CY;
= 2 mod P,.
The proof that aP given by (23) is of order 21+2 with respect to any factorof F , is given i n Appendix C. Any odd
power of .\/z willalso be of order 2t+2. By raising q
2 to
21+2mthpower, we obtain an integer U! of order 2m, m 5
t 2.
Table I gives Values of 1V for the two most important
values of CY and also gives the maximum possible N for
the most practical values of b. The Fermat number transform has also been defined by Rader [lo], using & = 2
but his development does not indicate the possibility
of
using
Using theresults of Section 111, we havefound
the maximumpossible length N for which an FNT exists,
for the most practical values of b, and have found the
correspondinginteger LY. TJsing CY = fi, themaximum
possible lengths for these transforms
have increased by
a factor of 2, which makes them more useful. In the next
section we discuss a twodimensional convolution scheme
3
4
5
6
8
16
32
256
16
32
32
64
128
12s
256
2*+1
1
+
41
+ 1
2l'
232
2e4
64
64
._
3
65536
...__ 3
128
256
...~_lll___I____
a
fi
__
This case corresponds t o the RT.
whereby very long onedimensional sequences can also be
convolved using a twodimensional FNT.
The basis functions for t,He FNT are integer exponentials mod F,, in comparison t o the complex exponentials
for the DFT. These integer exponentials
fold over after
they cross F J 2 . Because of the nature of modulo arithmetic,the FNT coefficients do not seem to haveany
physical meaning. Although the signal for which the FNT
is being taken may be very small, its FNT coefficients
may lie anywhere between 0 and Ft1. As a matter of
fact, during various stages of the computation each accumulation of the signal "overflows" many times. But still
the end result of the convolution would be exact if the
input signals are properly bounded. FNT's f o l l o ~all the
properties mentioned in AppendixA.
Example
To make the ideas of this section more clear, we now
present an example. This example
will illustrate several
points,treat,ment of negativevaluesinthedata,
t'he
structure of thetransformandthe.inversetransform
matrix, negative powers of CY,frequent "overflow" during
computation,meaningless of the transformvalues,and
exactnegs of the final answer. This example will not demonstrate the efhient implementation of the R T using the
binary arithmatic. That will be illustrated in Section VI
on implementation of the RT.
Consider two sequences 2 = (2,  2,1,0) and h =
(1;2,0,0), whose convolution is desired. From the overflow consideration, it is sufficient 'if we work modulo
Fz = 17. We want N = 4,for F2 the integer 2 is of order
8, therefore i2= 4 is an a! of order 4. The transformation
matrix T is given by
+
11
1
4
16 13
1
16
1 16

11
13 16
4 1
I1 (mod 17)
41
4
92
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, APRIL
Since 41 = 4(mod 17), the inverse transformation matrix T’ is given by
1974
TABLE I1
MAXIMUMONEDIMENSIONAL
CYCLIC
CONVOLUTION
LENGTHS
USINGTWODIMEKSIONAL
FNT OR RT
____
T1
=
1 41
42
43
1
44
46
1
1
=41 1
4
1
1
4
42
‘31;
1;
1 1
16
8192 32
3276864
4 1
46 49
l;
1 1
1
4
1;f3(mod17)*
The transforms of z and h are given by
1
1
4 1
1
1
16 1 1 [ 3
X=Tx=
16
1 16
1 1613
4
1
Note that in x ,  2 was represented by
Similarly,
H
Word length b
41
1 43
=
1
XH

2
N for CY
=
2
N for CY
2 =! %
512
2048
8192
2048
as discussed above.Therefore, thelength of sequences
which can be convolved is proportional to theword length
in bits. Thus,for long sequences, word length requirement
may be excessive. Rader [lo] suggested using atwodimensionalconvolutionscheme
to convolvelong onedimensionalsequences. Agarwal and Burrus [ll], [121
presentedsucha
twodimensional convolutionscheme.
Using this scheme, cyclic convolution of length AT = L M
isimplemented as a twodimensional cyclic convolution
of length (2L  1) by M (or 21, by M ) . This twodimensional cyclic convolution can be implemented using
a twodimensional P N T [ll], [12] defined similar to the
onedimensional transform. Using this twodimensional
scheme, the word length required is proportional to the
square root of the length of the sequences to be convolved
which would give for a maximum sequence length,
8b2,
rather than 4b. If M is taken as the maximum possible
length 4b, and 2L  1 is a small integer, then either direct,
convolution or another highspeed algorithm could be
employed [la],[15]
to computeconvolutionalong the
short dimension and the onedimensional F N T could be
used along the long dimension. Computationally this combinationcanbevery
efficient as will be shown in the
implementation in Section VIII. Table I1 lists the maximum lengths for twodimensional transforms.
+ 17 = 15.
VI. A SUGGESTED HARDWARE
IMPLEMENTATION
Since the structure of the PNT is similar to that of the
DFT, and N is a power of 2, a fast implementation of the
= (3,5,12,5) (mod 17).
FNT’s similar to the FFT with radix 2 exists. As a matter
of fact, all we have to do is replace W = ejZ*IN by Q in
Taking the inverse transform of Y ,
any FFT algorithm [l].
y = (2,2,14,2) (mod 17).
I n computing theFNT,arithmeticisdone
modulo
2b
1.
I
n
this
arithmetic
the
only
allowed
integers
are
According to ourassumption,integersaresupposed
to
O,l,
and
all
integers
whose
absolute
values
do
not
lie between 8 and 8. Therefore, 14 must be represented
exceed 2b1 can be represented unambiguously. Negative
as 14  17 = 3. This gives y = (2,2,3,2), whichis
the correct answer. Also, note that y is a symmetric se integers are represented by adding 2b
1 to them; this
quence, therefore Y is also a symmetric sequence. Other is similar to twos complement and ones complement representation of negative integers. Using a bbit register, all
than this, the transform values have no
significance.
int,egers from0 to 2b  1 can be represented. The problem
V. TWODIMENSIONALCONVOLUTION FOR
remains to represent 2 b . If the dat,a are uncorrelated, the
CONVOLVINGLONGSEQUENCES
probability that this number will appear after an arithmetical operation is approximately 2b. For digit’al filterArithmeticmodulo F , canbe implementedusinga
b = 2t bit representation of integers, as will be discussed ing applications, b would typically be 32 or 64; in these
cases the probability of occurrence of 2b is extremely
later in Section VI, with some provision for representing
2b. Themaximumlength
of sequenceswhichcan
be small. If occasional error in convolution is permissible,
cyclically convolved using the FNT is 46, using Q = f i , for these cases, probabily there is no need of any extra
=
(3,9,16,10) and Y
=
=
(3,90,80,90)
+

+
93
AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING
hardware to represent 2b. If the need exists, an extra bit
could be used to represent 2b at the expense of a more
complicated hardware. The following discussion is based
on the bbit representation of the integers.
Now, we discuss how various basic arithmetical operations can be performed moduloF,.
A . Negation
Let
product; Let Ct be the bbit loworder part of it and CHbe
the bbit highorder part of it., then
A XB
=
CL
+ CW2b= CL  C;i(mod F , ) .
Thus, all we havetodo
is subtractthe higherorder
registerfrom the lower orderregister. Thesubtraction
needs to be done according to C. For example,
(mod 17):13 X 9
=
117 = 15(mod17); 117
h 1
A
ai2i,ai
=
=
0 or 1.
CL
i=O
(
=
0111 0101
7
L'rr
L'L
0101
=
cl{) = 1010
Then
1111 = 15.
b1
A=C
a22
E. Multiplication by a Power of 2
If CY is taken as 2 or a power of 2, RT's are obtained and
h1
the
only multiplications involved in computing them are
=
di2i  (26  1)
those by some powers of 2. These multiplications are pari=O
ticularlysimple to implement in arithmetic
modulo F,.
h 1
Suppose we need to multiply the content's of a register
=
di2i  (26  1)
2b 1
mod F ,
by 2k, 0 < k < 6, all we need to do is leftshift the coni=O
tents of the register by lc bits and subtract the lc overflow
a 1
bits. A convenient way to do this is to append the data
=
di2; 2
mod F,.
register
on the left with a register of zeros, leftshift the
i0
double register by IC positions, dropping the leading zeros,
Thus to negate a number, we have to complement each
and then subtract thehigher order register from the lower
bit and add 2 to the result. For example,
order register, as in D. If IC is outside the range 0 < k < b,
we make use of the fact that 2b =  1 mod F,. Computa( 1011)
tion of the inverse transform required multiplications by
negative powers of 2 which can be converted to positive
(mod 17): 4 = 0100; 4 = [+11:)] = 1101 = 13;
powers by the following relationship, 2k =  2blc mod F,.
An alternate method to multiplyby 2k is to load the data
13 4 = 17. in the higherorderregister
of adoubleregister,
filling
the lower order register with zeros, t,hen rightshift the
13. Addition
double register by Ic positions and subtract the loworder
When we addtwo bbit integers, we obtaina bbit register fromthe highorder register mod P,. For fast
integer and possibly a carry bit. The carry bit represents implementation of the bit shift operation, shift amount k
2b =  1,mod F,. To implement arithmeticmodulo 2b 1, should be expressed in a binary form and a t a clock pulse
shifting should be done by a power of 2. For example,
we simplysubtractthecarrybit.Thusthehardwarc
should be of the carry subtract type. For example,
(mod 17) : 11 X Z3 = 88 = 3 mod 17
1010
11 = 0000 1011.
(mod 17) : 10 9 = 17 = 2(mod 17) +lo01
0101 1000
Shift left 3 positions __
10011
d=O
+ +
+
+
+
+
~
crr
1
1
CI, =
0010
=
2
(CJr)
=
C. Subtraction
Subtraction is implemented as
an addition by
first negating the subtrahend and then adding them. Addition must
be done according to B.
1000
+1100
1 0100
Ll
0011
11 X 23 = 11 X (243)
L). General Multiplication
When we multiply two 6bit integers,
CL
we get a 2bbit
11 = 1011 0000.
=
=
22
3
=
12 mod 17
94
IEEE TRANSAIZTlONS ON ACOUSTICS, SPEECH,
0001 0110
Shift right 3 positions C'H
CL
1100
=
12.
For implementation of the fast RT,unlike the FFT, we
do notneed to store the powers of a. For serial arithmetic,
we could have a register which stores the shift amount k ,
and as we go along the fast. R T flow chart, we continually
update the shift amount.All this is particularly simple to
do in binary arithmetic. If OL is taken as f i , only the odd
powers of a require a twobit representation. In the fast
F N T algorit>hnz,onlyonestagerequiresmultiplications
by odd powers of a. Multiplication by an odd power of
fi would probably be best done by first multiplying by
the just lower even power of f i , which is a power of 2,
then multiplying by f i , which is a twobit number. The
resulting increase in computation effort is very small.
VII. CObilPARISON WITH THE FFT
As noted in the previous section, computing the KT is
a very simple operation on a binary machine. Now lei us
comparet'hecomplexity of variousbasicoperations involved incomputing t h e R T visavis the FFT. If the
two sequences z ( n ) and h ( n ) have 61 and 6 2 bit representations,respectively, and are of length N , then the
output y ( n )would need no morethan a (01 02 logz A')
bit representation. To obtain the correct result b 2 bl
02
log, X . I n Section I11 we have given a.better bound
on the output. Roughlyspeaking, we need twice the number of bits to carry out the convolution using the RT as
compared to t'he fixcdpoint FFT implementation of the
convolution. But in the DFT, every data point is treated
as a complex number and therefore requires two words,
one for the real part andone forthe imaginary part. Thus,
in effect, the hardware requirement for twotransforms
is about the same. Although for real dat>a
it is possible
to make use of the symmetry properties of the D I T S
theyrequireextracomputationandfortshepurposc
of
comparison it will be ignored, even t'hough we have taken
this into account forour IBM 370/155 implementation
to be discussed in the next scct'ion. Therefore we shall
assume, in the FE'T implementation, each data point is
represented by a b / 2 bit real part anda 0/2 bit imaginary
part.
One b / 2 bit complex addition is equivalent to two 6 / 2
bit real additions, which are comparable to a 0bit addition modulo F,. Thus, the complexit'y of additionlsubtraction is the same in both the transforms. Similarly, it can
be shown that a b / 2 bit complex multiplication is comparable to a &bit multiplication modulo F,. Computation
of the RT requires multiplications by powers of 2, which
implemented as bit shifts and subtractions become much
simpler operations compared to complex multiplications
required in the FFT implcn~entat~ion.
+ +
+
+
AND SIGNAL PHOCESSINU,APRIL
1974
To computea lengthN fast KT, 1%' log, N additions/subtract'ions,and ( N / 2 ) log, N / 2 multiplications by some
powers of 2 are requiredwhich are implernentcd as bit shifts
and subtractions. To compute the convolut.ion using the
FFT, most of the time is talcen in computing the complex
A
multiplicat,ionsrequired t o computet,hetransfornls.
comparison with thc RT reveals that these complcx nul.
tiplications are replaced by bit shifts and subtractions
which are much faster operatio~~s. This resultsin considerablecomputationalsavingsin
the implementation
of convolution. This fact has
been verified on a gcneral
purposemachine (IBll.2 370/155) The cornputation required to multiply the two transforms is about, the same
for both the implementations.To convolve long sequences
using the twodimensional P N T or BT, the compuhtional
effort increases by, at t'hc nlost, a factor of 2. Still, the
RT implementation of convolution is much faster as compared to the FE'T implementation.
RT's have some additional advantages ovcr the k'k'T,
First,the FYP implemc!ntation requiresstoringallt,he
powers of W requiringa significant, amount' of storage
which maybeanimportantfactor
for a small minicomputer or a special purpose hardware implcrnentation.
Second, fixedpoint, FF'Y implementation introduces a significant amount of the roundoff noise at the output, (iX
bits depending on the data [a]. This degrades tha signaltonoise rat'ioduring the filteringoperations. The PNT
or RT implementationiserrorfree,
the onlysoum: of
error is input A / D quanti~at~ion.
VIII. I~IPS,EMli:NTATrC)NON THE I B l l 370/155
The word length of t,hc IBM 360 370 series is 32 bits
and, therefore, is well suited for t,heirnplementation of
convolution using the P'NT or 1i.T nloclulo P 5 = 2 3 2
1.
I t has two's compl(:mcnt represcntation of rlegat.ive integers, i.c.,  x isrepresented as
x. W o want. rwgative
integers to be reprcscntedas the conlplernent modulo
232 1, i.e., x is t o be rcprcscnted by F 2 $ 1  x, and
to accomplish this 1 is a.dded whenever a nega.tivo int>eger
is encountered in the data. As noted before, on this machine, we cannot represorrt  l. If a  l is cmmxntcred
in the data, it is roundcd t:ithw to 0 or to 2. This is
equivalent to introducing some initial quantizat,ion noisc.
If the data are uncorrclatt:d, t'hc probability that  1 will
appear after an arithmetical opcration during cornp1~t.ation of the transform is roughly 232=
pcropcmtion.
This crror int,roduced during computationis ratllcr wrious
andprobablytho
corrcsponding output b1oc.k will be
meaningless.Assuming N = 64, thc probability that a
particular block is erroneous is roughly
k'or
most filtering operat'ions, this may be permissible.
Logical add ( A l A ) andsubtractinstructions
(SLIt)
are used to add and subt'ract two intcgcrs modulo Fj. If a
carry is detected aftor add, 1 is subtracted frorn the result,
and if 110 carry is dctectlctl aflcr subtract, I is added t,o
the result. Multiplication by 'Lk is done using tho logical
left shift operation (SLl)I.,) and m~dltiplicat,ionby 2 '; is
+
+

95
AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONSFILTERING
TO DIGITAL
TABLE 111
seem to be the best choice for implementation on digit,al
CYCLICCONVOLUTION
TIMINGS
FOR LENGTHN REALSEQUENCES
computers. Nevertheless, any integer F , for which 0 ( F ) >
1 can be used, was discussed in Section IV. Rader [lo]
ms
N
proposed the use of Mersenne numbers
(2"  1, p a
prime). MNT's have Q =  2 and N = 2 p , and therefore
3.3
32
16
do not have an FFTtype fast computational algorithm.
64
31
7.4
60
16.68
128
In many situations the quantization and overflow con256
123
40 .Ob
siderations may require arithmetic to be done using 10
256
123
8o.G
243
168.00
512
to 20bit arithmetic.Forthissituation,workingin
the
1024
530
340. Oc
integer ring modulo F4( 2 1 6
1) would be insufficient and
1260
720.00
2048
at the same tirne, computing the FiYT modulo F5(232 1 )
a Using
= d.
would require excessive number of bits. I n this situation;
b Using 2 by 128 convolution.
we may perform convoiution modulo zz4 1. Also, many
Using twodimensional RT.
computershave 24bit wordlength.Transformssimilar
to FNT's exist for thissituation.We
could take F =
doneusing logical rightshiftoperation
(SRDL), 0 < 224f 1 and then Q = 8 gives N = 16, and = S1/Z =
k < 32, as explained in Section VI. Similarly, multiplica 27(212 1) gives N = 32 as the maximum length of the
tion of two integers modulo F5 can also be done.
onedimensional transform.
Two assembler ,subprograms were written to compute
I n generalone could take F = 2sZt
1, s is an odd
the fast RT and the inverse for any sequence length from integer. In that case, a = 2" would give N = 2t+', and
to 26, taking a as a power of 2. Two more subprograms
= (p)1/2= 2 [ ( s  l ) / ~ + s 2 t  ~ 1 ( 2 ~ 2 t   l  1) would give iv =
were written to compute the fast FNT and itsinverse for 2t+2.I n many situations it may be possible to have translength 128 sequences, using a: = f i , given by (23). Out forms of greaterlengththan
2t+2. For example, taking
of the 7 stages of the fast algorithm, 6nly one stage re F = 240 1, t'he maxinium possible transform length is
quiresmultiplication byoda powers of ~, which are 256,' and,taking F = 280 1; the maximum possible
implenxnted as suggest'ed in Section VI.
transfotm length is 1024, but the corresponding a's may
To cyclically convolve sequences longer than 128, tyo not besimple.
dimensionalconvolution schemeswereused
[l2].One
As discussed earlier, the FNT implementation requires
suchscheme was 2 by 128 convolutionfor length 256 roughly twice the number of bits for the arithmetic. Most
sequences discussed in Section X of [li]. Another pro minicomputers have short word lengths and in that situgram was written to convolvelong onedimension61 se ation it may not be possible to efficiently use F'KT's, but
quences using twodimensional RT's, as discussed in Sec there is a solution to this problem. We can split the dat'a
tion I11 of [12]. Timings for various implementations of bits int'o two halves and then convolve them.
convolut'ion and their comparisonwith the FFT imple4 % ) = zz(n)
z1(n)2k, I xz(n) I <
mentationsarc shown inTable 111. To computethese
timings it is assumed that transforms of the h sequences
h ( n ) = h2(n) h ( n ) 2 k ,
I hz(n) I < 2k (24)
have been precalculated. Thus, timings arc for computing
y = 2*h
Zl*h1.22k
(21*h2 zz*h1) 2k 22*h2. ( 2 5 )
the 5 transforms, multiplications of the transforms, and
their inverse transforms. For the FE'T implementation, a NOW,since 21, h ~z2,
, and hz have roughly half the number
very efficient algorithm [17] was used, which employed of bits, it should be possible to convolve them using apboth radix 4 and radix 2 steps and took into account the proximately half the number of bit's. If necessary, a more
fact, that the dat'a are real. For sequences up to length precise analysis of the abovesituation could be easily
256, improvements of t,he order of 3 to 5 were obtained performed. I n (25), the last term, in comparison to the
over the FFT implementations.Since thehardware of first term, is very small and can be neglected. We need to
IBM 360/370 series isnot designed todoarithmetic
take two transforms for z and two transforms for h, the
modulo Permat numbers, extra branch instructions
were pummationshownwithin
the parent,hescscan be pernecessaryaft,er add or subtract opcrat'ions to take into
formed in the transform domain. Finally,we need to take
account the carry bit. If the hardware is designed to do two inverse transforms, one for
zl*hl and t,he other for
arithmetic modulo Fermatnumbers,theseinstructions
(z1*hz 52*hl).
could be avoided, resulting in a significant reduction
in
FNT's can also be used forconvolutionsrequired
in
the computationtime.
Copies of t'heassemblersubblock recursive realizations of recursive digital filters[lfi].
programs arc available from the authors.
For these realizations, convolutions form
the main comFFT
FNT or IZT
ms
,
M
,
+
+
+
0
(Y
+
+
+
+
+
=L
+
+
+
+
IX. GENERALIZATIONSAND
APPLICATIONS
OTHER
I n this paper we have discussed convolution in therings
of integersmodulo of Fermat numbers. These numbers
putationalt'ask. Sinceconvolution with the IWT's can
be performed with both efficiency and with perfect accuracy, it should be very attract,ive for this applicat'ion. For
very highorder recursive filters this may reduce roundoff
noise problems associated with these filters. Other appli
96
IEEE TRANSACTIONS
ON ACOUSTICS, SPEECH, ANI) SIGNAL PROCESSING,
APRIL
1974
APPENDIX A
1’ROPERTIES OF THE TRANSFORMS
HAVING
THE CYCLIC
CONVOLUTIOS
PROPERTY
Belom
we
summarize the definition and some important
propertics of the transforms having the cyclic convolution
property. It is assumed that allarithmeticaloperations
are performed in a ring. Most of these properties are similar to the DFT properties [l].
G.
Th’eore’n
If
=
F(W
+m ) }
=
F(K)ocmK.
Ti f(n
If N can be factored as
N
Transform :
k
f(n)a”k
7L=O
=
CYN =
O,l,..,N
1

+ +  +
1.
Inverse Transform:
N 1
f ( n ) = N’
F ( k )ank
n
=
0,1, *
 ,lv 1.
r1r2rm
=
then a fast computational algorithm similar to the FIT
can be used to compute the transform which requires on
the order of N ( r l r2
r,) arithmetical operations.
N 1
=
(A91
H . Fast ~ ~ ~ ~ ~ ~ ~ t ~ t i ~ ~ ~ l , 4 l ~ ~ ~
A , Definition
F(k)
T { f(n) 1
(A2)
k 4
I . Convolut.ion Property
Transform of cyclic convolut,ion of two sequences in
the product of their t’ransforms,
i.e.,
if
(AIO)
C . 0rth.ogonality
The inner product of two basis functions are given by
(13)] [see (12j and
N
if n a
=
n mod N
?J. Parseval’s Theorem
N
B . Periodicity
F ( k ) can also be periodically extended, similar t o $he
extension of f ( n ),
=
F(K).
=
T{h(n)j.
N 1
X(k)N(12)
s ( n ) h ( n )=
and
x 1
N
N 1
s(n)h(n)
case of s(n) = h ( n )
N
N 1
X ( k ) X ( 1 6 )
22(n) =
and
If the signal is symmetric, i.e.,
N 1
N 1
f(N
 n)
=
F(K)
=
F(N  K).
z ( n ) z (n)
JV
n=o
the transformed sequence is also symmetric, Le.,
F(K)
(M3)
k=O
7Z=O
=
(h12)
k=O
A‘ 1
(AS)
X(k)H(k).
=
a=O
E . Symmetry Property
f(n)= f (  n )
(All)
k=O
For
t,he
part’icular
f(n)
=
H(k)
n=O
This implies t,he orthogonality of the basis functions.
+N )
F(K + N )
T(z(n)]
N 1
otherwise
f(n
=
Then
(A41
0
X(k)
=
X(k)2.
(414)
k=O
Note that in
Ohis case, the conventional Parswal’s theorem
(AB)
N1
N 1
If the signal is antisymmetric, i.e.,
f(n)
=
f(n.)
=
f(N  n ) ,
does not hold,
because
in ring
a magnitude
is undefined.
97
AGARWAL AND BURRUS: FAST CONVOLUTION APPLICATIONS TO DIGITAL FILTERING
K. Stretch Property
The stretch property exists
if the transform for the
longer length sequence exists in that ring.
L. Sampling Property
The sampling property exists.
2t+l
aP
PROOF OF THEOREM I
A8 shown before, the existence of such a transform depends on the existence of an a of order N with respect to
each prime power factor moduli, pari. According to Euler’s
Theorem [14], N , order of a, must divide cp (pi7;),where
(o(piTi)is the Euler’s (o function given by
=
p p  y p i  1).
(B1)
Thus, N j ‘p(pi+i)),and since p i isnotafactor
N I ( p i  l ) , i = 1,2,..,1or
of I\,
N I gcd(p1  l,p2  I,.*.,pz  1 )
or
N I O(F).
(B2)
This proves thenecessary part of the theorem. To prove
sufficiency we have to show the existence of an a of order
N if hr [ 0 ( F ). Note that (B2) rules out F being an even
integerbecausein
that case 0 (8’) = 1. If N I 0 ( F ),
N I ( p i 1) or N I (o(pi),therefore, one can find integers
pi of order N modulo piri [14], i = 1,2, *.,1. Then, by
the Chinese Remainder Theorem, one canfind an a of
order N modulo F = p p . *  p f t , such that

a = pi(mod piri),
i
=
1,2,. .,Z.
(B3)
This is our desired a of order N . Since N and p i are relatively prime, NI exists in this ring. This completes the
proof of the theorem.
APPENDIX C
To prove that ap of ( 2 3 ) is of order 2t+2mod F t , we
have to prove that it is of order 2t+2with respect to each
prime power factor of F , (20). Let piri be a prime power
factor of F t , then
ap2t+2
= 22t+l =
=
Ni I 2‘”
(C2>
therefore N ; must be a power of 2. Also,
APPENDIX B
cp(pp)
Let Nibe the order of ap with respect to p p , then by
EuIer’s theorem,
1 mod F t
1 mod p p .
=
22t
=
1mod F ,
=
1 mod p p .
((333
From ( C l ) and (C3) , it is obvious that
Ni= 2t+?
completing the proof.
ACKNOWLEDGMENT
The authors wish to thank C. M. Rader of Lincoln
his comments and
Laboratqries,Cambridge,Mass.,for
discussions.
REFERENCES
B. Gold and C. M. Rader, Digital Processing of Signals. New
York: McGrawHill, 1969.
A. V. Oppenheim and C. Weinstein, “Effects of finite register
length in digital filtering and the fast Fourier transform,’, Proc.
IEEE, V O ~ .60, pp. 957976, Aug. 1972.
D. E. Knuth, The Art of Computer Programming, vol, 2, Seminumerical Algorithms. Reading, Mass. : AddisonWesley, 1969,
p. 555 and pp. 259262.
J. M. Pollard, “The fast Fourier transform in a finite field,”
Math. Compul., vol. 25, pp. 365374, Apr. 1971:
I. J. Good, “The relation between two fast Fourier transforms,”
IEEE Trans. Comput., vol. C20, pp. 310317, Mar. 1971.
A. Schonhage and V. Strassen, “Fast multiplication of large
numbers,” (in German) Comput., vol. 7, pp. 281292, 1971.
D. E. Knuth, ‘lThe art of computing programmingerrata et
addenda,” Comput. Sei. Dep., Stanford Vniversity, Stanford,
Calif., Rep. STANCS71194, pp. 2126, Jan. 1971.
P. J. Nicholson, "Algebraic theory of finite Fourier transforms,” J . Comput. Syst. Sci., vol. 5, pp. 524547, 1971.
C. M. Rader,’ “The number theoretic DFT and exact discrete
convolution,” IEEE Arden House Workshop on Digital Signal
Processing,‘Harriman, N. Y.,. Jan. 11, 1972.
, “Discrete convolution via Mersenne transforms,” IEEE
Trans. Comput., vol. C21, pp. 12691273, Dee. 1972.
R. C. Agarwal and C. S. Burrus, “Fast digital convolution
using Fermat transforms,” inSouthwest
IEEE Conf. Rec.,
Houston, Tex., Apr. 1973, pp. 538543.
,
‘‘Fakt onedimensional digital convolution by multidimensional techniques,” IEEE Trans. Awust., Speech,and
Signal processing, vol. ASSP22, pp. 110, Feb. 1974.
L. E. Dickson, History of the Theory of Numbers, vol. I.
Washington, D. C.: Carnegie Institute, 1919, p. 376.
0. Ore, Number Theory and Its History. New York: McGrawHill, 1948.
D. A. Pitassi, “Fast convolution usingWalsh functions,” in
Proc. Conf. Applications of Walsh Functions; Washington,
D. C., April 1971, pp. 130133.
C. S.Burrus, “Block realization of digital filters,” IEEE Trans.
Audio Electroacoust., vol. AU20, pp. 230235, Oct. 1972.
R. C. Singleton, “An algorithm for computing the mixed radix
fast Fourier transform,” IEEE Trans. Audio Eledroawust.,
vol. AU17. DD. 9.3103. June 1969.