Fast One-Dimensional Digital Convolution by Multidimensional Techniques

RAMESH C. AGARWAL and CHARLES S. BURRUS, Member, IEEE

IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-22, no. 1, February 1974

Abstract: This paper presents two formulations of multidimensional digital signals from one-dimensional digital signals so that multidimensional convolution will implement one-dimensional convolution of the original signals. This has reduced an important word length restriction when used with the Fermat number transform. The formulation is very general and includes block processing and sectioning as special cases and, when used with various fast algorithms for short length convolutions, results in improved multiplication efficiency.

I. Introduction
There are several advantages to formulating a one-dimensional digital convolution as a two or higher dimensional problem. The first involves the Mersenne and Fermat number transforms which have recently been defined [1], [2] and which seem to have some advantages over the discrete Fourier transform (DFT) for implementing convolution on a digital computer. They can be computationally faster than the fast Fourier transform (FFT) implementation of the DFT and result in no roundoff error. These transforms have one limitation: the number of bits required for each word in an implementation is proportional to the length of the sequences to be convolved [1], [2]. It is the purpose of this paper to present a scheme whereby long sequences can be convolved by a two-dimensional convolution as mentioned by Rader [1]. This two-dimensional convolution can be implemented by a two-dimensional transform to allow a high-speed error-free convolution with the word length proportional to the square root of the length of the sequences.
This formulation also allows use of special short convolution algorithms similar to those proposed by Pitassi [7], Rayner [8], Davis [9], and Allwright [10]. These can be extended and combined with others to provide a very general and versatile format for doing fast digital convolution. The case of sequences of different lengths and the case where only part of the output sequence is desired are covered as special cases of the general problem.

II. Two-Dimensional Convolution Based on Overlap-Save
Consider the cyclic convolution of two sequences, $x(n)$ and $h(n)$, giving an output sequence $y(n)$, all of length $N$.
$$h(n) * x(n) = y(n). \tag{1}$$
This is defined by
$$y(n) = \sum_{q=0}^{N-1} h(n-q)\,x(q), \qquad n = 0, 1, \ldots, N-1 \tag{2}$$
where $h(n)$ and $x(n)$ are periodically extended outside their original domains of definition (or their indices evaluated modulo $N$).
In order to convert this one-dimensional problem to two dimensions a change of variables is made. First, it must be possible to factor $N$ into integer factors
$$N = LM. \tag{3}$$
If a change of variables is made such that
$$n = l + mL, \quad q = k + pL, \qquad k, l = 0, 1, \ldots, L-1, \quad m, p = 0, 1, \ldots, M-1$$
then (2) becomes
$$y(l + mL) = \sum_{k=0}^{L-1}\sum_{p=0}^{M-1} h(l + mL - k - pL)\,x(k + pL). \tag{4}$$
Define now a two-dimensional $L \times M$ array $\hat X$ from the original length $N = LM$ signal, $x(n)$, by
$$\hat x(l, m) = x(l + mL) \tag{5}$$
where columns of $\hat X$ are the sections or blocks of $x$ and the rows are samples of $x$ taken every $L$ values of $n$. In a similar way $\hat H$ and $\hat Y$ are defined by
$$\hat h(l, m) = h(l + mL), \qquad \hat y(l, m) = y(l + mL). \tag{6}$$
In terms of the two-dimensional signals, (4) becomes
$$\hat y(l, m) = \sum_{p=0}^{M-1}\sum_{k=0}^{L-1} \hat h(l-k, m-p)\,\hat x(k, p) \tag{7}$$
which is a two-dimensional convolution. Note that values of $\hat H$ outside the $L \times M$ array are required. These values of $\hat H$ that are needed can be seen from (7)
where $1 - L \le l - k \le L - 1$ and $1 - M \le m - p \le M - 1$. Values of $\hat h(l, m)$ outside the $L \times M$ array are defined by (6). We therefore define suitably extended arrays $\tilde H$ and $\tilde X$ so that two-dimensional convolution will give the desired answer. The extension of $\hat H$ is analogous to the overlap extensions used in the overlap-save algorithm for doing sectioning or block processing [3], [4]. Note that along the $m$ dimension the values of $\hat H$ are periodic with period $M$, i.e., the desired convolution in (7) is cyclic in the $m$ dimension (this is because the original desired convolution in (2) was defined as cyclic). Considering the values of $\hat H$ along $l$ shows the convolution in that dimension is not cyclic.

Manuscript received March 15, 1973; revised July 10, 1973 and September 20, 1973. This work was supported by the National Science Foundation under Grant GK23697.
The authors are with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.

An important factor when considering multidimensional convolution is the number of multiplications required in an implementation. If in (7) $m$ and $p$ are held constant, then (7) becomes a scalar convolution along the $l$ dimension of a length-$L$ sequence of $x$ with a length-$(2L-1)$ sequence of $h$, giving a length-$L$ sequence of the output. There will be one of these length-$L$ scalar convolutions for each value of $m$ and $p$ in (7), so that along the $m$ dimension each "operation" will be a length-$L$ convolution rather than a simple scalar multiplication. This is a sort of convolution of convolutions [3]. To count the total number of multiplications, the number needed to compute (7) with $l$ and $k$ held constant is found first. With $l$ and $k$ not constant, this is the number of length-$L$ convolutions necessary, and the total number of multiplications is the number of length-$L$ convolutions times the number of multiplications for a length-$L$ convolution. In this case, along the $m$ dimension the number of multiplications is $M^2$ and along the $l$ dimension it is $L^2$, so the total is $M^2L^2$ or $N^2$, which is the same as a direct calculation of (2) would require. We will later use transforms and other schemes to reduce this number. Note that the convolution can be carried out in either order.
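The index mapping of (5) to (7) can be checked numerically. The following is a minimal sketch, not the authors' implementation: the function names `cyclic_conv` and `cyclic_conv_2d` are hypothetical, and the code evaluates (7) directly with the $N^2$ multiplications counted above, taking values of $\hat h$ outside the $L \times M$ array from (6) with $h$ periodically extended.

```python
def cyclic_conv(h, x):
    """Direct length-N cyclic convolution, eq. (2)."""
    N = len(x)
    return [sum(h[(n - q) % N] * x[q] for q in range(N)) for n in range(N)]


def cyclic_conv_2d(h, x, L, M):
    """Evaluate eq. (7) on the L x M arrays defined by eqs. (5) and (6)."""
    N = L * M
    assert len(h) == len(x) == N

    def x_hat(k, p):
        return x[k + p * L]              # eq. (5)

    def h_hat(l, m):
        # eq. (6); indices outside the array follow the periodic extension of h
        return h[(l + m * L) % N]

    y = [0] * N
    for m in range(M):
        for l in range(L):
            y[l + m * L] = sum(
                h_hat(l - k, m - p) * x_hat(k, p)
                for p in range(M)
                for k in range(L)
            )
    return y
```

Any factorization $N = LM$ gives the same output, including the degenerate cases $L = 1$ and $M = 1$.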
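If (7) is to be carried out by transforms and therefore cyclic convolution, $\hat X$ must be augmented with zeros so that aliasing of the noncyclic convolution along $l$ does not occur and so that all arrays are the same size. Consider the $(2L-1) \times M$ array $\tilde X$ formed by appending $(L-1)$ rows of zeros to the bottom of $\hat X$.
$$\tilde X = \begin{bmatrix} x(0) & x(L) & \cdots & x(N-L) \\ x(1) & x(L+1) & \cdots & x(N-L+1) \\ \vdots & \vdots & & \vdots \\ x(L-1) & x(2L-1) & \cdots & x(N-1) \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix} \tag{8}$$
$\tilde H$ is formed so that the columns of $\tilde H$ contain the periodic extension of the original $h(n)$ with period $N$.
$$\tilde H = \begin{bmatrix} h(N-L+1) & h(1) & \cdots & h(N-2L+1) \\ \vdots & \vdots & & \vdots \\ h(N-1) & h(L-1) & \cdots & h(N-L-1) \\ h(0) & h(L) & \cdots & h(N-L) \\ h(1) & h(L+1) & \cdots & h(N-L+1) \\ \vdots & \vdots & & \vdots \\ h(L-1) & h(2L-1) & \cdots & h(N-1) \end{bmatrix} \tag{9}$$
If two-dimensional cyclic convolution is carried out we have
$$\tilde Y = \tilde H * \tilde X \tag{10}$$
where the lower $L \times M$ partition of $\tilde Y$ is $\hat Y$ and the columns of $\hat Y$ are the desired blocks of $y(n)$ in (2).
Because of ease in implementation with transforms, the arrays would usually be extended one additional row to be $2L \times M$ rather than the minimum $(2L-1) \times M$.

III. Two-Dimensional Transform Implementation
The two-dimensional transform of $\tilde X$ is defined as
$$T\{\tilde X\} = F(j, k) = \sum_{m=0}^{M-1}\sum_{l=0}^{2L-1} \tilde x(l, m)\,\alpha_{2L}^{lj}\,\alpha_M^{mk} \tag{11}$$
and the inverse transform
$$T^{-1}\{F\} = \tilde x(l, m) = (2N)^{-1}\sum_{k=0}^{M-1}\sum_{j=0}^{2L-1} F(j, k)\,\alpha_{2L}^{-lj}\,\alpha_M^{-mk} \tag{12}$$
where $\alpha_M$ is of an order $M$ (i.e., $M$ is the least positive integer such that $(\alpha_M)^M = 1$ [2]). Applying the transform to (10) it can be shown that
$$T\{\tilde Y\} = T\{\tilde H\}\,T\{\tilde X\} \tag{13}$$
so that, similar to the one-dimensional case, (10) can be carried out by
$$\tilde Y = T^{-1}\left[T\{\tilde H\}\,T\{\tilde X\}\right]. \tag{14}$$
If the transform is the DFT, then
$$\alpha_M = e^{-i2\pi/M}.$$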
To compare multiplication efficiencies we assume the DFT of $\tilde H$ is already known and the number of multiplications for one $2L \times M$ transform, one $2L \times M$ complex multiplication, and one $2L \times M$ inverse transform are calculated. The number of complex multiplications is approximately $(2N \log 2N + 2N)$ as compared to $(N \log N + N)$ for a one-dimensional implementation and therefore one would not use the two-dimensional approach with the DFT for improved multiplication efficiency.
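The transform route of (8) to (14) can be illustrated with the DFT. This is a sketch under stated assumptions rather than the paper's procedure: it uses the $2L \times M$ extension mentioned at the end of Section II, places the extra row of $\tilde H$ at the top (index $l = -L$), and uses a naive DFT written with `cmath` so the example stays self-contained. The names `dft`, `dft2`, and `overlap_save_2d` are hypothetical.

```python
import cmath


def dft(seq, inverse=False):
    """Naive one-dimensional DFT (or inverse DFT) of a list."""
    n = len(seq)
    sign = 1 if inverse else -1
    out = [
        sum(v * cmath.exp(sign * 2j * cmath.pi * i * k / n) for i, v in enumerate(seq))
        for k in range(n)
    ]
    return [v / n for v in out] if inverse else out


def dft2(a, inverse=False):
    """Two-dimensional transform: along m (rows) first, then along l (columns)."""
    rows = [dft(r, inverse) for r in a]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def overlap_save_2d(h, x, L, M):
    """Length-N cyclic convolution via a 2L x M two-dimensional cyclic convolution."""
    N = L * M
    # X tilde: the L x M array of eq. (5) followed by L rows of zeros
    x_t = [[x[l + m * L] for m in range(M)] for l in range(L)]
    x_t += [[0] * M for _ in range(L)]
    # H tilde: row i holds h(l + mL) for l = i - L, periodically extended
    h_t = [[h[(i - L + m * L) % N] for m in range(M)] for i in range(2 * L)]
    X, H = dft2(x_t), dft2(h_t)
    prod = [[H[i][m] * X[i][m] for m in range(M)] for i in range(2 * L)]
    y_t = dft2(prod, inverse=True)
    # the lower L x M partition of Y tilde holds y(l + mL)
    return [y_t[L + n % L][n // L].real for n in range(N)]
```

The result agrees with direct evaluation of (2) up to floating-point roundoff, which is the error the number-theoretic transforms discussed next avoid.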
The computational advantage appears when used with the Fermat number transform [2] where word length requirements are a possible restriction. The Fermat number transform is defined in [2] and, although not named, is defined in a restricted form in the latter part of [1, eq. (38)]. It is a transform defined in a finite ring of integers with arithmetic performed modulo Fermat numbers ($2^b + 1$, $b = 2^t$), with $\alpha^2 = 2$ and $\alpha^{4b} = 1$, and having the property that multiplication of Fermat number transforms corresponds to conventional cyclic convolution (2) modulo $2^b + 1$. To perform convolution with the transform requires $N$ real multiplications and a number of additions and word shifts proportional to $N \log N$. Unfortunately the transform requires word lengths proportional to the length of the sequences to be convolved [1], [2]. Since the lengths of the two dimensions are $2L$ and $M$ rather than $N = LM$ for the one-dimensional signal, the word-length requirement using the two-dimensional transform is proportional to the square root of $N$ rather than to $N$ as for the one-dimensional problem. It is this reduction in the necessary word length that makes the two-dimensional formulation attractive with the Fermat number transform.
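To make the transform concrete, here is a small sketch of cyclic convolution by a Fermat number transform. The assumptions are mine, not the paper's: $b = 16$ (modulus $2^{16} + 1 = 65537$), basis $\alpha = 2$, transform length $2b = 32$, and a direct $O(N^2)$ evaluation using Python's built-in `pow`. A practical implementation would use a fast algorithm in which multiplications by powers of 2 become word shifts, as the text describes. The names `fnt` and `fnt_cyclic_conv` are hypothetical.

```python
def fnt(seq, alpha, F, inverse=False):
    """Direct number-theoretic transform modulo F with root alpha."""
    N = len(seq)
    root = pow(alpha, -1, F) if inverse else alpha   # modular inverse needs Python 3.8+
    out = [
        sum(v * pow(root, n * k, F) for n, v in enumerate(seq)) % F
        for k in range(N)
    ]
    if inverse:
        n_inv = pow(N, -1, F)                        # N is a power of two, F is odd
        out = [(v * n_inv) % F for v in out]
    return out


def fnt_cyclic_conv(h, x, b=16):
    """Length-2b cyclic convolution modulo 2**b + 1 using alpha = 2."""
    F = 2 ** b + 1
    alpha = 2                                        # 2**b = -1 (mod F), so alpha has order 2b
    assert len(h) == len(x) == 2 * b
    H, X = fnt(h, alpha, F), fnt(x, alpha, F)
    Y = [(u * v) % F for u, v in zip(H, X)]          # one multiplication per point
    return fnt(Y, alpha, F, inverse=True)
```

The output equals the ordinary cyclic convolution exactly as long as every true output value lies in the range $0$ to $2^b$; negative values would need a signed interpretation of the residues.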
The consequence of this reduction in word length is of considerable practical importance. For example, using a word length of 16 bits and the Fermat number transform [2] with $\alpha = 2$ to compute the complete noncyclic convolution of two sequences of equal length, the one-dimensional implementation restricts the sequence length to a maximum of 16, whereas the two-dimensional implementation increases the maximum to 256. A summary of the restrictions on sequence lengths is shown in Table I for the most practical word lengths and for two values of $\alpha$. Note that the one-dimensional length restrictions would be too severe for many applications but the two-dimensional restrictions would include most practical filters. If cyclic convolution is desired, then all the length restrictions are doubled since the addition of zeros to prevent aliasing is unnecessary.
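The arithmetic behind these limits can be written out. The sketch below rests on an inference, not on a formula given in the paper: the longest cyclic transform length is taken to be $2b$ for $\alpha = 2$ (consistent with the 16-bit example above) and $4b$ for $\alpha^2 = 2$ (the figure quoted later for a $b$-bit implementation). The function name `max_noncyclic_lengths` is hypothetical.

```python
def max_noncyclic_lengths(b, alpha_is_sqrt2=False):
    """Return (one_d, two_d) maximum lengths for noncyclic convolution."""
    # assumed longest cyclic transform length for a b-bit word
    n_max = 4 * b if alpha_is_sqrt2 else 2 * b
    # one dimension: half the transform length is reserved for zeros
    one_d = n_max // 2
    # two dimensions: the array is 2L x M with 2L <= n_max and M <= n_max,
    # and N = L * M is halved again to leave room for the appended zeros
    L, M = n_max // 2, n_max
    two_d = (L * M) // 2
    return one_d, two_d
```

For $b = 16$ and $\alpha = 2$ this reproduces the 16 and 256 quoted in the example.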
The two-dimensional transform or inverse transform can be taken in either order. There is, however, a computational advantage in taking the transform first along the $m$ direction (length $M$) and then along the $l$ direction (length $2L$); half the $\tilde X$ sequences along the $m$ direction are zero and half the $\tilde H$ sequences along the $m$ direction are cyclically shifted by one position of the other half-sequences. Also, while taking the inverse transform, there is an advantage to first taking the inverse transform along $l$, then along $m$, because we need only half the $\tilde Y$ sequences and therefore only half the sequences need be inverted along $m$.

TABLE I
Sequence Length Restrictions for One- and Two-Dimensional Implementation of Noncyclic Convolution by the Fermat Number Transform

Word Length   Transform      Maximum Sequence Length N/2
b (Bits)      Basis α        1-D          2-D
16            2              16           256
16            √2             32           1024
32            2              32           1024
32            √2             64           4096
64            2              64           4096
64            √2             128          16 384

IV. Generalizations and the Inverse Problem
A generalization of the approach in this paper to higher orders is fairly obvious. For example, $N$ could be factored into three integer factors $N = LMP$, as was done with two factors in (3). The signals $x(n)$ and $h(n)$ would then be redefined as three-dimensional $L \times M \times P$ arrays and (2) converted to a three-dimensional convolution by a change of variables as was done in two dimensions in (4)-(7). For $x$ this would be
$$\hat x(l, m, p) = x(l + mL + pML) \tag{15}$$
and with similar definitions for $\hat H$ and $\hat Y$ and after augmentation to prevent aliasing, (2) would become a three-dimensional convolution.
We would then have one-dimensional cyclic convolution of length-$N$ sequences being carried out by three-dimensional cyclic convolution with dimensions of lengths $2L$, $2M$, and $P$. Use of orders higher than two does not seem needed with the Fermat number transform at this time, but will be exploited with other schemes in the next sections.
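The three-dimensional case can be sketched the same way as the two-dimensional one. The example below is illustrative only: it builds the zero-augmented $\hat X$ of (15) and an extended $\hat H$ on a $2L \times 2M \times P$ grid, evaluates the three-dimensional cyclic convolution directly (no transform), and reads the answer from the block with the first two indices offset by $L$ and $M$. The layout of the extension and the name `cyclic_conv_3d` are my assumptions.

```python
from itertools import product


def cyclic_conv_3d(h, x, L, M, P):
    """Length-N cyclic convolution, N = L*M*P, via a 2L x 2M x P cyclic convolution."""
    N = L * M * P
    dims = (2 * L, 2 * M, P)
    # zero-augmented x: only the L x M x P corner of eq. (15) is nonzero
    x_t = {
        (l, m, p): x[l + m * L + p * M * L]
        for l, m, p in product(range(L), range(M), range(P))
    }
    # extended h: index i stands for l = i - L, index j for m = j - M
    h_t = {
        (i, j, p): h[((i - L) + (j - M) * L + p * M * L) % N]
        for i, j, p in product(*map(range, dims))
    }
    y = [0] * N
    for l, m, p in product(range(L), range(M), range(P)):
        acc = 0
        for (k, j, q), xv in x_t.items():
            # cyclic index arithmetic on the extended grid
            idx = ((l + L - k) % dims[0], (m + M - j) % dims[1], (p - q) % dims[2])
            acc += h_t[idx] * xv
        y[l + m * L + p * M * L] = acc
    return y
```

Only the last dimension wraps around in practice; the zeros appended along the first two dimensions keep those convolutions from aliasing.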
Still another variation would apply to the case where the filter is periodically time-varying. This, when converted to a two-dimensional problem with $L$ equal to the period of the filter, becomes time-invariant along one dimension, $m$, and time-varying along the other, $l$ [5]. The DFT or Fermat transform could be applied to the two-dimensional signal along the time-invariant dimension and either direct calculation or another type of transform applied to the other dimension.
The inverse of the above problem can be considered where one wishes to implement two-dimensional convolution by one-dimensional methods. This involves reversing the process presented in Section II by constructing one-dimensional sequences from the given arrays. MacAdams [6] has presented a scheme for doing this that can be seen to be the inverse of the problem addressed in this paper and can be applied to cyclic or noncyclic two-dimensional convolution.

V. M-Dimensional Convolution
If the length of the signals to be cyclically convolved in (2) can be written $N = 2^M$, then the methods used in (5) and (15) can be extended to define an $M$-dimensional signal with each dimension of length two. As before, this is done by a change of variables.
$$\hat x(l, m, p, \ldots) = x(l + 2m + 4p + \cdots), \qquad l, m, p, \ldots = 0, 1. \tag{17}$$
After a similar definition for $\hat H$ and $\hat Y$, cyclic convolution in (2) becomes
$$\hat y(l, m, p, \ldots) = \sum_{k=0}^{1}\sum_{j=0}^{1}\sum_{q=0}^{1}\cdots\,\hat h(l-k, m-j, p-q, \ldots)\,\hat x(k, j, q, \ldots) \tag{18}$$
for $l, m, p, \ldots = 0, 1$.
Both $\hat X$ and $\hat Y$ are $M$-dimensional with each dimension of length two and $\hat H$ is also $M$-dimensional but must be defined with dimensions of length three in order to carry out (18). This is the same as was required in two dimensions where $\tilde H$ was defined in (9). Since the original convolution in (2) was defined as cyclic, the convolution along the last dimension in (18) is also cyclic so that $\hat H$ need not be extended along that dimension but can be evaluated for this index modulo 2.
This is an extremely general and versatile formulation for the original problem that can be used to improve computational efficiency. The convolution along each dimension of length two can be written in terms of scalar variables as
$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} h_0 & h_{-1} \\ h_1 & h_0 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \end{bmatrix}. \tag{19}$$
The convolution of (18) can be viewed as $M$ nested length-2 convolutions, each separately of the form shown in (19) requiring four multiplications. Using the same reasoning that was explained for the two-dimensional formulation, the total number of multiplications is $F = 4^M$. Since $N = 2^M$, this becomes $F = N^2$, which is again the same as would be required by directly calculating (2).

VI. A High Speed Algorithm
With this formulation of scalar convolution in terms of multidimensional short convolutions, various tricks for efficient short convolution can be used. Consider one of several possible algorithms suggested by Pitassi [7] where, rather than directly calculating (19), three intermediate numbers are found:
$$g_0 = (h_0 + h_{-1})\,x_1, \qquad g_1 = h_0\,(x_0 - x_1), \qquad g_2 = (h_1 + h_0)\,x_0. \tag{20}$$
The desired outputs are then calculated by
$$y_0 = g_0 + g_1, \qquad y_1 = g_2 - g_1. \tag{21}$$
This approach uses three multiplications in comparison with four multiplications for a direct calculation. Using this result, the total number of multiplications to calculate (18) becomes
$$F = 3^M. \tag{22}$$
If the fact pointed out earlier, that the convolution along the last dimension is cyclic, is used, then a further reduction is possible. Length-two cyclic convolution is given by
$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} = \begin{bmatrix} h_0 & h_1 \\ h_1 & h_0 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \end{bmatrix}. \tag{23}$$
This can be calculated from two intermediate values by
$$f_0 = (h_0/2 + h_1/2)(x_0 + x_1), \qquad f_1 = (h_0/2 - h_1/2)(x_0 - x_1)$$
to give
$$y_0 = f_0 + f_1, \qquad y_1 = f_0 - f_1 \tag{24}$$
requiring two multiplications (assuming the factors of $1/2$ are either precalculated or obtained by shifting). Using this result on the last dimension reduces (22) to
$$F = 2 \cdot 3^{M-1} \tag{25}$$
which is the same number obtained by Pitassi [7] and Rayner [8].
Pitassi developed his algorithm by relating the cyclic convolution of two sequences to the convolution of subsequences in the same manner that the FFT can be developed based on decimation in time. This was extended by Davis [9] to an approach similar to decimation in frequency where the subsequences are halves of the original sequences. Both of these are special cases of the multidimensional formulation since the values along any dimension are samples or blocks of the original sequence.
Another form of convolution that is sometimes desired is of the same form as (2) but with $h(n)$ not being periodically extended, rather having independent values for negative indices. The transmission matrix formulation for $N = 3$ is
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & h_{-1} & h_{-2} \\ h_1 & h_0 & h_{-1} \\ h_2 & h_1 & h_0 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \tag{26}$$
and because of its structure the operator is called a constant diagonal convolution matrix. This requires $2N - 1$ values of $h(n)$ from $n = -N + 1$ to $n = N - 1$ and $N$ values of $x(n)$ to give $N$ values of $y(n)$.
The same approach that was used for cyclic convolution is applicable here except the reduction described in (23) and (24) does not work since the $M$-dimensional formulation of (18) is no longer cyclic along any dimension. Therefore the number of multiplications necessary to implement a length-$N$ constant diagonal convolution is
$$F = 3^M \tag{27}$$
which is the same as obtained by Allwright [10], using a matrix factorization approach.
Note that cyclic convolution in (2) can be viewed as a special case of constant diagonal convolution. Causal noncyclic convolution where $h(n)$ is zero for $n < 0$ is also a special case.
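The count $3^M$ for constant diagonal convolution can be checked with a short recursion. The sketch below is one possible realization, not the authors' program: it splits the sequences into halves (the block form the text attributes to Davis [9] and describes as a special case of the multidimensional formulation) and applies the three-product rule with intermediate values $g_0$, $g_1$, $g_2$ to blocks instead of scalars. The storage convention `hv[i]` $= h(i - (N - 1))$ and the names `toeplitz_conv`, `vec_add`, and `vec_sub` are assumptions made for the example.

```python
def vec_add(u, v):
    return [a + b for a, b in zip(u, v)]


def vec_sub(u, v):
    return [a - b for a, b in zip(u, v)]


def toeplitz_conv(hv, x, counter):
    """y(n) = sum_q h(n - q) x(q) for n = 0..N-1, with N a power of two.

    hv holds h(-(N-1)), ..., h(N-1); counter[0] accumulates scalar multiplications.
    """
    N = len(x)
    if N == 1:
        counter[0] += 1
        return [hv[0] * x[0]]
    K = N // 2
    # K x K constant diagonal blocks: A(-1), A(0), A(+1), each stored as 2K-1 values
    a_m1, a_0, a_p1 = hv[0:2 * K - 1], hv[K:3 * K - 1], hv[2 * K:4 * K - 1]
    x0, x1 = x[:K], x[K:]
    g0 = toeplitz_conv(vec_add(a_0, a_m1), x1, counter)   # (h0 + h-1) x1
    g1 = toeplitz_conv(a_0, vec_sub(x0, x1), counter)     # h0 (x0 - x1)
    g2 = toeplitz_conv(vec_add(a_p1, a_0), x0, counter)   # (h1 + h0) x0
    return vec_add(g0, g1) + vec_sub(g2, g1)              # first half, then second half
```

For $N = 8$ the counter ends at $27 = 3^3$, against $64$ for direct evaluation. The additions needed to form the block sums are not counted.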
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \tag{28}$$
The most common form of convolution desired is causal noncyclic convolution where all of the output sequence is obtained rather than only the first $N$ points as in (28). For this case two length-$N$ sequences are convolved to give a length-$(2N-1)$ output. The transmission matrix formulation for $N = 3$ is
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \\ 0 & h_2 & h_1 \\ 0 & 0 & h_2 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}. \tag{29}$$
To apply the results from cyclic convolution, all sequences are extended with zeros to length $2N$. Cyclic convolution then gives the desired output of (29) and uses
$$F = 2 \cdot 3^M \tag{30}$$
multiplications.
A further reduction is possible by recognizing that along the last dimension only one multiplication is necessary rather than two for the cyclic case that gave (25) or three for the constant diagonal noncyclic case that gave (27). Consider the $(M+1)$-dimensional convolution of (18) from the extended sequences with all indices except the last one held constant. The resulting length-two scalar convolution is
$$\begin{bmatrix} s_0 \\ s_1 \end{bmatrix} = \begin{bmatrix} \hat h_0 & \hat h_1 \\ \hat h_1 & \hat h_0 \end{bmatrix}\begin{bmatrix} \hat x_0 \\ \hat x_1 \end{bmatrix} \tag{31}$$
which becomes
$$s_0 = \hat h_0\,\hat x_0 + \hat h_1\,\hat x_1, \qquad s_1 = \hat h_1\,\hat x_0 + \hat h_0\,\hat x_1. \tag{32}$$
The $\hat x_1$ terms are always zero since the last half of the length-$2N$ sequence $x(n)$ has been added as zeros. Close examination shows that because $h(n)$ has also been extended with zeros, either $\hat h_0$ or $\hat h_1$ will always be zero, depending on values of the constant indices. Therefore the length-two convolution of (31) will require only one multiplication and the resulting total number of multiplications necessary to noncyclically convolve two length-$N$ sequences giving a length-$2N$ output is
$$F = 3^M \tag{33}$$
which is the same as for the length-$N$ constant diagonal convolution with a length-$N$ output.

VII. Multidimensional Convolution Based on Overlap-Add
Another two-dimensional formulation can be developed that is a generalization of the overlap-add algorithm [3], [4]. In (7) the $\hat H$ function was augmented so that the desired output $y(n)$ is in blocks that are the columns of $\hat Y$. For this formulation $\hat H$ will not be augmented but $\hat Y$ will be and the desired $y(n)$ will be obtained by adding the overlapped columns of $\hat Y$. This formulation is based on noncyclic convolution given by
$$y(n) = \sum_{q=0}^{N-1} h(n-q)\,x(q) \tag{34}$$
where $h(n)$ and $x(n)$ are sequences of length $N$ and $y(n)$ of length $2N - 1$ and all are defined to be zero outside these lengths. The transmission matrix formulation for $N = 3$ is given by (29).
Using a similar factoring of $N$ and change of variables as was done in (3) and (5) we have (34) becoming
$$\hat y(l, m) = \sum_{p=0}^{M-1}\sum_{k=0}^{L-1} \hat h(l-k, m-p)\,\hat x(k, p), \qquad l = 0, 1, \ldots, 2L-2, \quad m = 0, 1, \ldots, 2M-2. \tag{35}$$
In this case both $\hat X$ and $\hat H$ are $L \times M$ arrays and $\hat Y$ is $(2L-1) \times (2M-1)$ with $\hat X$ and $\hat H$ having their columns the blocks of $x(n)$ and $h(n)$ but the columns of $\hat Y$ and the blocks of $y(n)$ have a more complicated relation than in (6). Both $\hat X$ and $\hat H$ are defined to be zero outside the domain of definition. Here we can show
$$y(l + mL) = \hat y(l, m) + \hat y(l + L, m - 1) \tag{36}$$
for $l = 0, 1, \ldots, L-1$ and $m = 0, 1, \ldots, 2M-1$, and $\hat y(l, m) = 0$ outside its domain of definition of
$$l = 0, 1, \ldots, 2L-2, \qquad m = 0, 1, \ldots, 2M-2. \tag{37}$$
This formulation gives the implementation of one-dimensional convolution by the sum of overlapped columns of an array obtained by noncyclic two-dimensional convolution. This can be viewed as a generalization of the overlap-add algorithm [3], [4] used for sectioning or block processing.
If $N = 2^M$, the extension to $M$-dimensional convolution is similar to that done for the overlap-save technique in (17) and (18). Both $\hat X$ and $\hat H$ are $M$-dimensional with each dimension of length two and $\hat Y$ is $M$-dimensional but with each dimension of length three. For
$$\hat x(l, m, p, \ldots) = x(l + 2m + 4p + \cdots), \qquad \hat h(l, m, p, \ldots) = h(l + 2m + 4p + \cdots) \tag{38}$$
(34) becomes
$$\hat y(l, m, p, \ldots) = \sum_{k=0}^{1}\sum_{j=0}^{1}\sum_{q=0}^{1}\cdots\,\hat h(l-k, m-j, p-q, \ldots)\,\hat x(k, j, q, \ldots) \tag{39}$$
for $l, m, p, \ldots = 0, 1, 2$.
The calculation of $y(n)$ from $\hat Y$ is a bit complicated but is a generalization of (36) and involves only additions. For example, if $N = 2^3 = 8$ the $\hat Y$ function would have three dimensions with a total of 27 elements. Along the $l$ dimension there would be nine length-three blocks that, when overlapped and added along the $m$ dimension, would give three length-seven blocks. These, when overlapped and added, would give the single length-fifteen sequence that would be $y(n)$.
An example for $N = 4$ would have a two-dimensional three $\times$ three $\hat Y$
$$\hat Y = \begin{bmatrix} \hat y_{00} & \hat y_{01} & \hat y_{02} \\ \hat y_{10} & \hat y_{11} & \hat y_{12} \\ \hat y_{20} & \hat y_{21} & \hat y_{22} \end{bmatrix} \tag{40}$$
and $y(n)$ would be found by
$$y_0 = \hat y_{00}, \quad y_1 = \hat y_{10}, \quad y_2 = \hat y_{20} + \hat y_{01}, \quad y_3 = \hat y_{11}, \quad y_4 = \hat y_{21} + \hat y_{02}, \quad y_5 = \hat y_{12}, \quad y_6 = \hat y_{22}. \tag{41}$$
Along each dimension of (39) the matrix formulation of the length-two convolution is
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \tag{42}$$
which, if done directly, requires four multiplications and one addition. If an intermediate step is added for $y_1$, then
$$y_0 = h_0 x_0, \qquad y_1 = (h_0 + h_1)(x_0 + x_1) - y_0 - y_2, \qquad y_2 = h_1 x_1 \tag{43}$$
gives the three outputs with three multiplications and four additions/subtractions. Using this algorithm on each dimension of (39) and counting the number of required multiplications by the method used to find (33) we find that (39) can be computed with
$$F = 3^M \tag{44}$$
multiplications, which is the same as that obtained by the overlap-save method in (33).

VIII. Sequences of Different Lengths and Partial Outputs
Two modifications of the usual formulation of convolution are often desired. The first occurs when the two sequences are significantly different in length and the second when one desires only a portion of the output rather than all of it. Both cases can be formulated in terms of multidimensional convolutions and computational savings can be realized.
If $h(n)$ is assumed to be the shorter sequence and if its length can be expressed as $L = 2^S$, the length of the longer sequence $x(n)$ will be expressed as $N = 2^S \cdot M$. It may be necessary to add zeros to both $x$ and $h$. The multidimensional signals are formed to give $S$ length-two dimensions and the last dimension of length $M$. If $m$ is the index for the length-$M$ dimension, then $\hat H$ is formulated to be zero for all $m$ other than $m = 0$. Therefore, if the fast algorithm of (20) or (43) is used along the $S$ length-two dimensions, there will be only $M$ multiplications required along the $M$ dimension. The total number of multiplications is then
$$F = 3^S \cdot M. \tag{45}$$
This could also be seen by considering the problem as requiring $M$ length-$2^S$ noncyclic convolutions with the outputs overlapped and added.
If only a portion of the total output from the convolution is desired, then a similar saving can be obtained. Assume that both $h(n)$ and $x(n)$ are of length $N = LM$, with a noncyclic output $y(n)$ of length $2N - 1$. Out of this only a block of $y(n)$ of length $L$ is desired. First, $N$ extra zeros are appended to both the sequences and cyclic convolution of length $2N$ is formulated. This one-dimensional cyclic convolution is reformulated as a two-dimensional convolution as in (5)-(7). The first dimension is of length $L$ and the convolution in this dimension is noncyclic and can be carried out as cyclic convolution of length $2L - 1$. In the second dimension the convolution is cyclic and of length $2M$. The two-dimensional arrays are formulated so that the desired length-$L$ block of $y(n)$ appears as a column of $\hat Y$ in (6). Thus, $\hat Y$ along the second dimension has to be computed only for one index. This replaces cyclic convolution of length $2M$ with a summation of $2M$ terms along the second dimension. The desired output block can be written as a summation of $2M$ convolutions of length $L$, where each convolution represents convolution of two sequences of lengths $2L - 1$ and $L$, respectively, giving a sequence of length $L$. Because zeros were appended to both $x(n)$ and $h(n)$, out of these $2M$ convolutions, between one and $M$ convolutions would be nonzero, depending on the second index of $\hat Y$ for which the output is desired.
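The block computation just described can be sketched directly. This is a simplified illustration rather than the procedure in the text: it sums over the $M$ blocks of $x$ that can be nonzero instead of all $2M$, and it evaluates each length-$(2L-1)$ by length-$L$ convolution by direct summation instead of a length-$(2L-1)$ cyclic convolution or a transform. The names `linear_conv` and `output_block` are hypothetical.

```python
def linear_conv(a, b):
    """Direct noncyclic convolution of two lists."""
    out = [0] * (len(a) + len(b) - 1)
    for i, u in enumerate(a):
        for j, v in enumerate(b):
            out[i + j] += u * v
    return out


def output_block(h, x, L, m0):
    """Return y(l + m0*L), l = 0..L-1, of the noncyclic convolution of h and x."""
    N = len(x)
    M = N // L

    def h_at(n):
        return h[n] if 0 <= n < N else 0      # h is zero outside 0..N-1

    block = [0] * L
    for p in range(M):                        # one short convolution per block of x
        # the 2L-1 values of h that can meet block p of x in output block m0
        seg = [h_at((m0 - p) * L + d) for d in range(-(L - 1), L)]
        full = linear_conv(seg, x[p * L:(p + 1) * L])
        for l in range(L):
            block[l] += full[l + L - 1]       # keep the L aligned outputs
    return block
```

For most values of `m0` many of the segments `seg` are entirely zero and could be skipped, which is the source of the saving noted in the text.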
The convolutions along the first dimension of length $L$ can be carried out either by transform techniques or by the multidimensional techniques discussed in this paper. If the transform techniques are used, the summation of convolutions can be carried out in the transform domain, thus requiring only one inverse transform to obtain the desired output. Rader [11] has discussed a similar technique for the particular case of estimation of the autocorrelation function for the first few lag values. If $L = 2^S$ and the multidimensional methods are used, the number of multiplications is at most the same as given in (45).
The formulation just discussed can be extended to the situation where a sampled output is desired, sampled at every $L$th value. In this case, for the two-dimensional formulation, the output appears as a row of $\hat Y$ and, as before, the two-dimensional convolution again reduces to a summation of convolutions. If the output $y(n)$ is a narrow-band signal as compared to the sampling frequency, the samples of $y(n)$ at a lower sampling rate are sufficient to reconstruct the analog signal. In this situation the formulation discussed here can result in computational savings.
If a multidimensional formulation is considered, we can obtain partial outputs as combinations of blocks and samples.
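The first case of this section, a short $h(n)$ of length $2^S$ convolved with a long $x(n)$ of length $2^S \cdot M$, can be sketched as $M$ short noncyclic convolutions whose outputs are overlapped and added. The code below is an illustration under my own choices: the three-multiplication rule with the middle output formed as $(h_0 + h_1)(x_0 + x_1) - y_0 - y_2$ is applied recursively to half-length blocks, and a counter records the scalar multiplications so the total can be compared with $3^S \cdot M$. The names `short_conv` and `long_conv` are hypothetical.

```python
def short_conv(h, x, counter):
    """Noncyclic convolution of two length-2**S lists using 3**S multiplications."""
    n = len(h)
    if n == 1:
        counter[0] += 1
        return [h[0] * x[0]]
    k = n // 2
    h0, h1, x0, x1 = h[:k], h[k:], x[:k], x[k:]
    y0 = short_conv(h0, x0, counter)                      # h0 x0
    y2 = short_conv(h1, x1, counter)                      # h1 x1
    mid = short_conv([a + b for a, b in zip(h0, h1)],
                     [a + b for a, b in zip(x0, x1)], counter)
    y1 = [m - a - b for m, a, b in zip(mid, y0, y2)]      # (h0+h1)(x0+x1) - y0 - y2
    out = [0] * (2 * n - 1)
    for i in range(2 * k - 1):                            # overlap and add the three blocks
        out[i] += y0[i]
        out[i + k] += y1[i]
        out[i + 2 * k] += y2[i]
    return out


def long_conv(h, x, counter):
    """Convolve a short h with a long x, len(x) a multiple of len(h)."""
    Ls = len(h)
    M = len(x) // Ls
    y = [0] * (len(x) + Ls - 1)
    for m in range(M):
        blk = short_conv(h, x[m * Ls:(m + 1) * Ls], counter)
        for i, v in enumerate(blk):
            y[m * Ls + i] += v                            # overlap-add of block outputs
    return y
```

With `len(h) == 4` and `len(x) == 12` the counter ends at $3^2 \cdot 3 = 27$, compared with 48 multiplications for direct evaluation.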
IX. Relations to Logical Convolution and Walsh Transforms
Consider logical convolution [12] of two sequences $x(n)$ and $h(n)$, each of length $N = 2^M$, giving an output sequence $y(n)$ also of length $N$. Logical convolution is defined similarly to the cyclic convolution of (2), but the addition and subtraction of indices is done differently. All indices are represented in binary form as an $M$-bit index. When indices have to be added or subtracted, they are added or subtracted bit by bit, modulo 2. Note that in logical convolution, addition and subtraction of indices are equivalent. We can convert this one-dimensional logical convolution problem to an $M$-dimensional convolution as in (17) and (18). Along any dimension, convolution appears as in (19), but since logical convolution is desired, $h(-1) = h(1)$. Therefore, if (18) is implemented as length-two cyclic convolutions along all the dimensions, the $y(n)$ thus obtained is the logical convolution of $x(n)$ and $h(n)$. Alternatively, if (18) is implemented as a noncyclic convolution along all the dimensions, we obtain noncyclic convolution of $x(n)$ and $h(n)$ as in (26).
Length-two cyclic convolution can be implemented using just two multiplications as in (24), which is a length-two DFT implementation. Thus (18) can be implemented as an $M$-dimensional cyclic convolution using $M$-dimensional DFT's of $\hat X$ and $\hat H$, where each dimension is of length two. Alternatively, length-$N$ logical convolution can also be implemented using length-$N$ Walsh transforms of $x(n)$ and $h(n)$ [12]. The preceding development shows that the length-$(N = 2^M)$ Walsh transform is equivalent to the $M$-dimensional DFT. Thus the $M$-dimensional approach establishes the logical convolution theorem for the Walsh transforms and it also establishes the fast Walsh transform algorithm as an $M$-dimensional DFT. These facts have been noted before in the literature.
For a particular formulation of (20), Pitassi [7] and Davis [9] have shown that some of the intermediate products correspond to multiplication of the Walsh transforms of the two sequences.

X. Generalizations and Applications
There are several modifications and generalizations that are possible with the formulation used here. The first will illustrate that fast algorithms exist for sequences of length other than two.
Consider the noncyclic convolution of two length-three sequences.
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \\ 0 & h_2 & h_1 \\ 0 & 0 & h_2 \end{bmatrix}\begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix} \tag{46}$$
This would normally require nine multiplications and four additions. If six intermediate variables are calculated by
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, FEBRUARY 1974
8
go = ho x 0
g3
= (ho + h l
1 (x0 + x1
g1 = hl x1
g4
= ( h + h2
1 (x1 "Ixz)
g2
=hzxz
g5 =(h2
+ h 0 ) @ 2 +x01
In these equations, each summation represents cyclic convolution of length 4b , which can be carried
out by Fermat transforms. Taking length 4 b Fermat
(47) transforms of all the sequences in (51), we obtain
T { P(0, I ) ) = T { A ( O ,I ) } T { 2 ( O ,I ) }
then thedesired output can be obtained by
4 T { A (  l I, ) } T { 2 ( 1 ,l ) }
Yo
=go
Y1
=g3 
Yz
=g5
+g,
y3
=g4

Y4
=g2
go 
gl

T { P(1, I ) } = T(A(1, I ) } T{B(O, I ) }
g,
go
 g
2

+ T { a ( O ,I ) ) T { 8 ( 1 ,E ) } .
g2
(52)
These equations are similar to (19) and we could
employ the tricks of (20) and (21) along the first
(48) dimension, giving
requiring six multiplications.
T { P(0, I ) } = [ T { A ( O I, ) } + T{A(I, I ) } ] T { 2 ( 1 1, ) }
If two sequences of length N = 3M were to be convolved, then using an implementation with M di+
l9)  T { 2 < l 7 E , >TI{ A ( O ,E))
mensions of length three would require fortotal
T { ?(l, I ) } = [T@(O, I ) ) + TCA(1,4}1 T{2(0,I)l
multiplications
 [ T { 8 ( 0 ,I ) }  7'{8(1,
l ) } ] T { A ( O ,1 ) ) .
(53)
F=6M.
(49)
r u m
Note that &(l, rn) =fi(l,
rn l),thus A(l, 1) is
the cyclic shift by one position of $H(1, l)$. We could employ the cyclic shift theorem to compute $T\{H(-1, l)\}$ from $T\{H(1, l)\}$. Assuming the $H$ transforms are precalculated, this method requires computation of two length-$4b$ $X$ transforms, $12b$ multiplications and $12b$ additions/subtractions to compute the $Y$ transforms using (53), and two length-$4b$ $Y$ inverse transforms. This is efficient because the only extra computation is $4b$ extra multiplications, and it is better than the two-dimensional Fermat number transform approach, which requires roughly twice the amount of computation. This use of $2 \times M$ convolution has a small computational advantage even when used with the FFT where, in effect, the last stage of the FFT algorithm is replaced by one of the fast length-two algorithms.

There are many other possibilities of combinations of Fermat and Fourier transforms of various lengths and dimensions with short convolution algorithms or with the use of special hardware. The arbitrary order of the various operations can also be used to advantage. As pointed out by Rader [11], and illustrated in our discussion of partial outputs and in (53), it is often more efficient to take transforms along one dimension, then convolve along another before taking the inverse transform.

For short sequences of this length, this approach is faster than adding zeros and using the next larger power-of-two length with (39). Similar results are possible with other lengths, and a very general scheme can be developed for sequences of length

$$N = 2^M \cdot 3^S \cdots \tag{50}$$

using different algorithms along different dimensions in a manner similar to using a multiple radix FFT. Because of the generality of the multidimensional formulation there are mixtures of high-speed techniques that can be used.

In some situations, it may be advantageous to combine transform techniques with the fast algorithms for short convolutions discussed in this paper. One such situation uses the Fermat number transforms [2]. As discussed in Section III, using a $b$-bit implementation of Fermat number transforms, the maximum cyclic convolution length is $4b$. To cyclically convolve sequences longer than this, we need to formulate a two-dimensional convolution as in Section II. Assume $x(n)$, $h(n)$, and $y(n)$ are sequences of length $N = 8b$ each. Consider the formulation of (7), with $M = 4b$ and $L = 2$. Equation (7) can be rewritten as

$$\begin{aligned}
\hat{y}(0, m) &= \sum_{p=0}^{M-1} \hat{h}(0, m-p)\,\hat{x}(0, p) + \cdots \\
\cdots &+ \sum_{j=0}^{M-1} \hat{h}(0, m-j)\,\hat{x}(1, j), \qquad m = 0, 1, \ldots, M-1.
\end{aligned} \tag{51}$$

To illustrate the efficiencies of some of the techniques of this paper, a comparison of a particular case will be made with direct and FFT implementations. Consider the problem of noncyclic convolution of two length-$N$ sequences, $N = 2^M$, to give an output of length $2N - 1$ as described in (34). The number of multiplications per output point will be calculated for three implementations. First, consider a direct implementation, which requires $N^2$ multiplications for $2N - 1$ output points (we will approximate this by $2N$ for simplification). This gives for multiplications per output point

$$F_0 = \tfrac{1}{2} N.$$

If the FFT is used in an efficient way, taking advantage of the fact that the data are real and using a mixed radix algorithm (2, 4, and 8), the number of multiplications necessary per output point can be calculated as described by Singleton [13] and is denoted $F_1$. Using the $M$-dimensional formulation with the fast algorithm of (33) or (44) gives for the multiplications per output point

$$F_2 = \frac{3^M}{2N}.$$

Table II compares these functions for various lengths up to 1024. Note that the multidimensional implementation is more efficient than a direct implementation for all lengths and requires fewer multiplications than the FFT for lengths up to 128. If a less efficient FFT implementation requiring $4 \log_2 N + 4$ multiplications per output point is used, the crossover length is above 2048.

The multidimensional approach can also be used in the same way the FFT is used to implement ongoing processing or sectioning [4]. In contrast to use with the FFT, this approach is most efficient when the length of the section or block is the same as the length of the convolution operator.

Initial observations indicate that block implementation of recursive filters [3] becomes more attractive when used with the techniques described in this paper. To illustrate this, we will again consider the multiplication efficiencies for three realizations. First consider a recursive filter with an equal-order numerator and denominator of $N = 2^M$. The number of multiplications per output point for a direct implementation is $F_0 = 2N$.

TABLE II
A Comparison of Multiplication Efficiencies for Three Implementations of Length-N Convolution

N       F0      F1      F2
2       1       1.5     0.75
4       2       2.25    1.12
8       4       4.12    1.70
16      8       5.06    2.53
32      16      6.03    3.80
64      32      8.01    5.70
128     64      9.00    8.54
256     128     10.00   12.81
512     256     12.00   19.22
1024    512     13.00   28.83
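
The direct and multidimensional columns of Table II can be checked with a short experiment. The sketch below nests a three-multiplication length-two scheme $M$ times to convolve two length-$2^M$ sequences and counts the scalar multiplications. Equations (33) and (44) are not reproduced in this part of the paper, so the Karatsuba-style split used here is an assumed stand-in for those fast length-two algorithms, and the names `fast_conv` and `mults_per_output` are illustrative only.

```python
import numpy as np

def fast_conv(x, h, counter):
    """Noncyclic convolution of two equal, power-of-two-length sequences.

    Each sequence is split into a low and a high half, and three
    half-length convolutions are used instead of four (an assumed
    Karatsuba-style stand-in for the fast length-two algorithm).
    counter[0] accumulates the number of scalar multiplications.
    """
    n = len(x)
    if n == 1:
        counter[0] += 1
        return np.array([x[0] * h[0]])
    half = n // 2
    x0, x1 = x[:half], x[half:]
    h0, h1 = h[:half], h[half:]
    low = fast_conv(x0, h0, counter)             # x0 * h0
    high = fast_conv(x1, h1, counter)            # x1 * h1
    mid = fast_conv(x0 + x1, h0 + h1, counter)   # (x0 + x1) * (h0 + h1)
    cross = mid - low - high                     # x0 * h1 + x1 * h0
    y = np.zeros(2 * n - 1, dtype=low.dtype)
    y[: n - 1] += low                            # offset 0
    y[half : half + n - 1] += cross              # offset n/2
    y[n:] += high                                # offset n
    return y

def mults_per_output(M):
    """Return (F0, F2) for N = 2**M, using 2N as the output count."""
    N = 2 ** M
    rng = np.random.default_rng(0)
    x = rng.integers(-9, 10, N)
    h = rng.integers(-9, 10, N)
    counter = [0]
    y = fast_conv(x, h, counter)
    assert np.array_equal(y, np.convolve(x, h))  # same result as direct convolution
    f0 = N * N / (2 * N)                         # direct: N^2 multiplications
    f2 = counter[0] / (2 * N)                    # nested scheme: counted multiplications
    return f0, f2

for M in range(1, 11):
    f0, f2 = mults_per_output(M)
    print(2 ** M, f0, round(f2, 2))
```

Under these assumptions the counted multiplications equal $3^M$, and the printed ratios agree with the $F_0$ and $F_2$ columns of Table II to within about 0.02. The $F_1$ column depends on Singleton's FFT operation counts [13] and is not reproduced here.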

TABLE III
A Comparison of Multiplication Efficiencies for Three Implementations of Block Recursive Filters of Order N

N       F0      F1      F2
2       4       9       4.5
4       8       21      6.7
8       16      31      10.1
16      32      38      15.2
32      64      45      22.8
64      128     53      34.2
128     256     58      51.2
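
The $F_0$ and $F_2$ columns of Table III can be compared against simple closed forms. The sketch below takes $F_0 = 2N$ from the text and assumes, as a hypothesis rather than a formula quoted from the paper, that the three convolutions each cost $3^M$ multiplications and are shared over a block of $N$ outputs, so that $F_2 = 3 \cdot 3^M / N$. The function names are illustrative, and the $F_1$ column, which comes from the methods in [3], is not modeled.

```python
# F0 and F2 columns of Table III as printed, keyed by filter order N.
TABLE_III = {
    2: (4, 4.5),
    4: (8, 6.7),
    8: (16, 10.1),
    16: (32, 15.2),
    32: (64, 22.8),
    64: (128, 34.2),
    128: (256, 51.2),
}

def direct_mults(order):
    """F0 = 2N multiplications per output point (direct recursive filter)."""
    return 2 * order

def multidim_mults(order):
    """Assumed closed form: three length-N convolutions of 3**M
    multiplications each, spread over a block of N outputs."""
    m = order.bit_length() - 1        # order = 2**m
    return 3 * 3 ** m / order

for order, (f0, f2) in TABLE_III.items():
    print(order, direct_mults(order), f0, round(multidim_mults(order), 2), f2)

# Smallest tabulated order at which the assumed multidimensional count
# falls below the direct count.
crossover = min(n for n in TABLE_III if multidim_mults(n) < direct_mults(n))
print(crossover)  # 4
```

With this assumed form, the computed values stay within about 0.06 of the printed $F_2$ entries, and the first tabulated order at which the multidimensional count beats the direct count is 4, in line with the statement that the approach wins for orders above three.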

With $F_0 = 2N$ for the direct implementation, the results obtained using efficient FFT algorithms by the methods in [3] are denoted $F_1$. Using three convolutions by (33) or (44) to implement the block recursive filter, with the block length equal to the order, gives the multiplications per output point denoted $F_2$. Table III compares these multiplication efficiencies for orders up to 128. Note that the multidimensional approach is more efficient than the direct for orders above three and more efficient than the FFT for orders up to about 256. This is yet to be explored in detail.

XI. Conclusions

This paper has presented two formulations of convolution in terms of multidimensional convolution: one based on a generalization of the overlap-save algorithm for sectioning and the other on the overlap-add algorithm. The first proved to be well suited for cyclic and constant-diagonal convolution and the second for noncyclic convolution. Fast algorithms were developed based on length-two and length-three convolution that lead to an improvement in multiplication efficiency. The reduction in required word lengths proved to be a valuable feature when used with the Fermat number transform. The formulation proved to be well suited for the special cases where unequal-length sequences were convolved or where only a portion of the output was desired. It was further shown that various mixtures of algorithms could be used along the different dimensions to achieve certain advantages or to fit particular requirements. Finally, examples were presented to compare the multiplication efficiencies of a few implementations.

The formulation is so general that a complete and systematic investigation of all possible applications is difficult. The main ideas and relations to other works that we know of have been presented here. The investigation of word length and storage requirements and a more complete consideration of recursive implementations are still to be made.

Acknowledgment

The authors would like to thank R. A. Meyer for valuable discussions.

References

[1] C. M. Rader, "Discrete convolutions via Mersenne transforms," IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[2] R. C. Agarwal and C. S. Burrus, "Fast digital convolutions using Fermat transforms," in Southwestern IEEE Conf. Rec., Houston, Tex., Apr. 1973, pp. 538-543.
[3] C. S. Burrus, "Block realization of digital filters," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
[4] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969, pp. 208-211.
[5] R. A. Meyer and C. S. Burrus, "Certain properties of periodically time-varying digital filters," in Southwestern IEEE Conf. Rec., Houston, Tex., Apr. 1973, pp. 529-535.
[6] D. P. MacAdam, "Image restoration by constrained deconvolution," J. Opt. Soc. Amer., vol. 60, pp. 1617-1627, Dec. 1970.
[7] D. A. Pitassi, "Fast convolution using the Walsh transforms," in Proc. Conf. Applications of Walsh Functions, Washington, D.C., Apr. 1971, pp. 130-133.
[8] P. J. W. Rayner, "A fast cyclic convolution algorithm," presented at Symp. Digital Filtering, Imperial College, London, England, Aug. 1971.
[9] W. F. Davis, "A class of efficient convolution algorithms," in Proc. Symp. Applications of Walsh Functions, Washington, D.C., Mar. 1972, pp. 318-329.
[10] J. C. Allwright, "Real factorization of noncyclic convolution operators with applications to fast convolution," Electron. Lett., vol. 7, pp. 718-719, Dec. 1971.
[11] C. M. Rader, "An improved algorithm for high speed autocorrelation with applications to spectral estimation," IEEE Trans. Audio Electroacoust., vol. AU-18, pp. 439-441, Dec. 1970.
[12] G. S. Robinson, "Logical convolution and discrete Walsh and Fourier power spectra," IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 271-280, Oct. 1972.
[13] R. C. Singleton, "An algorithm for computing the mixed radix fast Fourier transform," IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969.

Digital Notch Filter Design Procedure

JAMES A. CADZOW

Abstract-An analytical procedure for designing a linear digital notch filter is presented. The resultant filter is sixth order and is implemented by cascading three second-order filters so as to avoid instability which may arise from computer coefficient truncation. The procedure outlined is straightforward, requires only simple algebraic steps, and gives filter parameter selection criteria for reducing the effects of computer coefficient truncation.

Notch filters have utility in situations where a desired signal is corrupted by an additive sinusoidal pickup. One thus must process the noisy signal so as to remove the sinusoid without significantly distorting the desired signal.

I. Introduction

A frequently occurring linear data filtering application occurs when one wishes to process a signal of the form

$$u(t) = s(t) + A \sin \omega_0 t$$

so as to remove the additive sinusoidal component, $A \sin \omega_0 t$, without seriously distorting the desired signal $s(t)$. The situation in which a 60-Hz sinusoidal pickup corrupts a desired measurement signal nicely illustrates how such problems can arise in practice.

The ideal filter for the above application would then have a response $s(t)$ to the input $u(t)$. In the frequency domain, the required linear filter would have a gain of one for all frequencies except at $\omega_0$, where its gain is zero. As such, this processor is typically called a notch filter with notch at $\omega_0$. Its gain-frequency behavior is depicted in Fig. 1.

Unfortunately, the ideal notch filter is not physically realizable and must be approximated in practice. If one attempted to implement a notch filter approximation using analog devices (i.e., resistors, capacitors, and inductors), one would quickly realize the futility of this approach. On the other hand, one may readily design a digital filter whose frequency behavior closely resembles that shown in Fig. 1 (e.g., see [1]-[4]).

The approach to be taken in this paper is to then uniformly sample the signal $u(t)$ (every $T$ seconds) and use the resulting sequence as the input to a digital filter governed by

$$y(k) = b_0 u(k) + b_1 u(k-1) + \cdots + b_m u(k-m) - a_1 y(k-1) - a_2 y(k-2) - \cdots - a_n y(k-n) \tag{1}$$

where $u(k)$ and $y(k)$ denote the values of the filter's input and output signals, respectively, at the $k$th iteration. A procedure for selecting the coefficients $a_i$ and $b_i$ whereby the filter's gain factor-frequency behavior will be similar to that shown in Fig. 1 will be shortly given. One must realize, however, that since the filter is digital, its frequency behavior will be periodic with period $2\pi/T$ (e.g., [1, p. 297]) and will appear as shown in Fig. 2.
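
The difference equation (1) translates directly into a short loop. The sketch below is only an illustration: the function names, the sampling period, and the second-order coefficients (a zero pair on the unit circle at angle $\omega_0 T$ and a pole pair just inside it at an assumed radius `r`) are assumptions chosen for the demonstration, not the sixth-order design developed in this paper. It also checks numerically that the gain repeats with period $2\pi/T$.

```python
import numpy as np

def difference_filter(b, a, u):
    """Run equation (1) with zero initial conditions.

    b = [b_0, ..., b_m] weights the inputs; a = [a_1, ..., a_n] weights
    the past outputs, which enter with a minus sign as in (1).
    """
    y = np.zeros(len(u))
    for k in range(len(u)):
        acc = 0.0
        for i, bi in enumerate(b):             # b_0 u(k) + ... + b_m u(k-m)
            if k - i >= 0:
                acc += bi * u[k - i]
        for i, ai in enumerate(a, start=1):    # - a_1 y(k-1) - ... - a_n y(k-n)
            if k - i >= 0:
                acc -= ai * y[k - i]
        y[k] = acc
    return y

def gain(b, a, w, T):
    """Magnitude of the filter's frequency response at w rad/s."""
    z = np.exp(1j * w * T)
    num = sum(bi * z ** (-i) for i, bi in enumerate(b))
    den = 1 + sum(ai * z ** (-i) for i, ai in enumerate(a, start=1))
    return abs(num / den)

T = 1e-3                   # assumed sampling period in seconds
w0 = 2 * np.pi * 60        # 60-Hz pickup, in rad/s
r = 0.95                   # assumed pole radius (inside the unit circle)
c = np.cos(w0 * T)
b = [1.0, -2 * c, 1.0]     # zeros at exp(+/- j w0 T)
a = [-2 * r * c, r * r]    # poles at r exp(+/- j w0 T)

k = np.arange(2000)
u = np.sin(w0 * k * T)     # pure pickup, no desired signal
y = difference_filter(b, a, u)

print(np.max(np.abs(y[-200:])))                  # residual after the transient
print(gain(b, a, w0, T))                         # gain at the notch
print(gain(b, a, 100.0, T), gain(b, a, 100.0 + 2 * np.pi / T, T))
```

In this toy case the sampled sinusoid is removed once the start-up transient has decayed, and the two gains printed on the last line coincide, which is the periodicity in frequency noted above.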

The selection of the sampling period $T$ to be used is of great importance from a number of viewpoints. Most importantly, it must be chosen small enough so that little distortion results from the analog-to-digital conversion of the desired signal $s(t)$. Quantitatively,