
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. ASSP-22, NO. 1, FEBRUARY 1974

Fast One-Dimensional Digital Convolution by Multidimensional Techniques

RAMESH C. AGARWAL and CHARLES S. BURRUS, Member, IEEE

Abstract-This paper presents two formulations of multidimensional digital signals from one-dimensional digital signals so that multidimensional convolution will implement one-dimensional convolution of the original signals. This has reduced an important word length restriction when used with the Fermat number transform. The formulation is very general and includes block processing and sectioning as special cases and, when used with various fast algorithms for short length convolutions, results in improved multiplication efficiency.

I. Introduction

There are several advantages to formulating a one-dimensional digital convolution as a two- or higher dimensional problem. The first involves the Mersenne and Fermat number transforms which have recently been defined [1], [2] and which seem to have some advantages over the discrete Fourier transform (DFT) for implementing convolution on a digital computer. They can be computationally faster than the fast Fourier transform (FFT) implementation of the DFT and result in no roundoff error. These transforms have one limitation: the number of bits required for each word in an implementation is proportional to the length of the sequences to be convolved [1], [2]. It is the purpose of this paper to present a scheme whereby long sequences can be convolved by a two-dimensional convolution as mentioned by Rader [1]. This two-dimensional convolution can be implemented by a two-dimensional transform to allow a high-speed error-free convolution with the word length proportional to the square root of the length of the sequences.

This formulation also allows use of special short convolution algorithms similar to those proposed by Pitassi [7], Rayner [8], Davis [9], and Allwright [10]. These can be extended and combined with others to provide a very general and versatile format for doing fast digital convolution. The case of sequences of different lengths and the case where only part of the output sequence is desired are covered as special cases of the general problem.

II. Two-Dimensional Convolution Based on Overlap-Save

Consider the cyclic convolution of two sequences, $x(n)$ and $h(n)$, giving an output sequence $y(n)$, all of length $N$:

$$h(n) * x(n) = y(n). \tag{1}$$

This is defined by

$$y(n) = \sum_{q=0}^{N-1} h(n - q)\, x(q), \qquad n = 0, 1, \ldots, N-1 \tag{2}$$

where $h(n)$ and $x(n)$ are periodically extended outside their original domains of definition (or their indices evaluated modulo $N$).

In order to convert this one-dimensional problem to two dimensions a change of variables is made. First, it must be possible to factor $N$ into integer factors

$$N = LM. \tag{3}$$

If a change of variables is made such that

$$n = l + mL, \qquad q = k + pL, \qquad k, l = 0, 1, \ldots, L-1, \qquad m, p = 0, 1, \ldots, M-1$$

then (2) becomes

$$y(l + mL) = \sum_{k=0}^{L-1} \sum_{p=0}^{M-1} h(l + mL - k - pL)\, x(k + pL). \tag{4}$$

Define now a two-dimensional $L \times M$ array $\hat{X}$ from the original length $N = LM$ signal, $x(n)$, by

$$\hat{x}(l, m) = x(l + mL) \tag{5}$$

where columns of $\hat{X}$ are the sections or blocks of $x$ and the rows are samples of $x$ taken every $L$ values of $n$. In a similar way $\hat{H}$ and $\hat{Y}$ are defined by

$$\hat{h}(l, m) = h(l + mL), \qquad \hat{y}(l, m) = y(l + mL). \tag{6}$$

In terms of the two-dimensional signals, (4) becomes

$$\hat{y}(l, m) = \sum_{p=0}^{M-1} \sum_{k=0}^{L-1} \hat{h}(l - k, m - p)\, \hat{x}(k, p) \tag{7}$$

which is a two-dimensional convolution. Note that values of $\hat{H}$ outside the $L \times M$ array are required. The values of $\hat{H}$ that are needed can be seen from (7), where $1 - L \le l - k \le L - 1$ and $1 - M \le m - p \le M - 1$. Values of $\hat{h}(l, m)$ outside the $L \times M$ array are defined by (6). We therefore define suitably extended arrays $\tilde{H}$ and $\tilde{X}$ so that two-dimensional convolution will give the desired answer. The extension of $\hat{H}$ is analogous to the overlap extensions used in the overlap-save algorithm for doing sectioning or block processing [3], [4]. Note that along the $m$ dimension the values of $\hat{H}$ are periodic with period $M$, i.e., the desired convolution in (7) is cyclic in the $m$ dimension (this is because the original desired convolution in (2) was defined as cyclic). Considering the values of $\hat{H}$ along $l$ shows the convolution in that dimension is not cyclic.

Manuscript received March 15, 1973; revised July 10, 1973 and September 20, 1973. This work was supported by the National Science Foundation under Grant GK-23697.
The authors are with the Department of Electrical Engineering, Rice University, Houston, Tex. 77001.
An important factor when considering multidimensional convolution is the number of multiplications required in an implementation. If in (7) $m$ and $p$ are held constant then (7) becomes a scalar convolution along the $l$ dimension of a length-$L$ sequence of $x$ with a length-$(2L-1)$ sequence of $h$ giving a length-$L$ sequence of the output. There will be one of these length-$L$ scalar convolutions for each value of $m$ and $p$ in (7) so that along the $m$ dimension each "operation" will be a length-$L$ convolution rather than a simple scalar multiplication. This is a sort of convolution of convolutions [3]. To count the total number of multiplications to compute (7), the multiplications along the $m$ dimension with $l$ and $k$ constant are found first. With $l$ and $k$ not constant, this will be the number of length-$L$ convolutions necessary, and the total number of multiplications will be the number of length-$L$ convolutions times the number of multiplications for a length-$L$ convolution. In this case along the $m$ dimension the number of multiplications is $M^2$ and along the $l$ dimension it is $L^2$, so the total is $M^2 L^2$ or $N^2$, which is the same as a direct calculation of (2) would require. We will later use transforms and other schemes to reduce this number. Note the convolution can be carried out in either order.
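As a check on the change of variables, the following minimal sketch evaluates (7) directly and compares it with the cyclic convolution (2). The function names and the test data are illustrative assumptions, not part of the paper; $\hat{h}$ outside the $L \times M$ array is taken from (6) with the index reduced modulo $N$.

```python
def cyclic_direct(h, x):
    """Equation (2): y(n) = sum over q of h((n - q) mod N) * x(q)."""
    n_len = len(x)
    return [sum(h[(n - q) % n_len] * x[q] for q in range(n_len))
            for n in range(n_len)]


def cyclic_two_dim(h, x, L, M):
    """Equation (7), with h_hat(l, m) = h((l + m*L) mod N) as in (6).

    Returns the output sequence and the number of scalar multiplications.
    """
    N = L * M
    assert len(h) == N and len(x) == N
    # x_hat(l, m) = x(l + m*L): columns are blocks, rows are every L-th sample.
    x_hat = [[x[l + m * L] for m in range(M)] for l in range(L)]

    def h_hat(l, m):
        # l ranges over 1-L .. L-1 and m over 1-M .. M-1 here.
        return h[(l + m * L) % N]

    y = [0] * N
    mults = 0
    for l in range(L):
        for m in range(M):
            acc = 0
            for p in range(M):
                for k in range(L):
                    acc += h_hat(l - k, m - p) * x_hat[k][p]
                    mults += 1
            y[l + m * L] = acc
    return y, mults
```

Evaluated this way the two-dimensional form uses $L^2 M^2 = N^2$ multiplications, the same as the direct calculation; the savings come only from the transforms and short algorithms discussed later.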

If (7) is to be carried out by transforms and therefore cyclic convolution, $\hat{X}$ must be augmented with zeros so that aliasing of the noncyclic convolution along $l$ does not occur and so that all arrays are the same size. Consider the $(2L-1) \times M$ array $\tilde{X}$ formed by appending $(L-1)$ rows of zeros to the bottom of $\hat{X}$:

$$\tilde{X} = \begin{bmatrix} x(0) & x(L) & \cdots & x(N-L) \\ x(1) & x(L+1) & \cdots & x(N-L+1) \\ \vdots & \vdots & & \vdots \\ x(L-1) & x(2L-1) & \cdots & x(N-1) \\ 0 & 0 & \cdots & 0 \\ \vdots & \vdots & & \vdots \\ 0 & 0 & \cdots & 0 \end{bmatrix}. \tag{8}$$

$\tilde{H}$ is formed so that the columns of $\tilde{H}$ contain the periodic extension of the original $h(n)$ with period $N$:

$$\tilde{H} = \begin{bmatrix} h(N-L+1) & h(1) & \cdots & h(N-2L+1) \\ \vdots & \vdots & & \vdots \\ h(N-1) & h(L-1) & \cdots & h(N-L-1) \\ h(0) & h(L) & \cdots & h(N-L) \\ h(1) & h(L+1) & \cdots & h(N-L+1) \\ \vdots & \vdots & & \vdots \\ h(L-1) & h(2L-1) & \cdots & h(N-1) \end{bmatrix}. \tag{9}$$

If two-dimensional cyclic convolution is carried out we have

$$\tilde{Y} = \tilde{H} * \tilde{X} \tag{10}$$

where the lower $L \times M$ partition of $\tilde{Y}$ is $\hat{Y}$ and the columns of $\hat{Y}$ are the desired blocks of $y(n)$ in (2). Because of ease in implementation with transforms, the arrays would usually be extended one additional row to be $2L \times M$ rather than the minimum $(2L-1) \times M$.

III. Two-Dimensional Transform Implementation

The two-dimensional transform of $\tilde{X}$ is defined as

$$T\{\tilde{X}\} = F(j, k) = \sum_{m=0}^{M-1} \sum_{l=0}^{2L-1} \tilde{x}(l, m)\, \alpha_{2L}^{lj}\, \alpha_{M}^{mk} \tag{11}$$

and the inverse transform

$$T^{-1}\{F\} = \tilde{x}(l, m) = (2N)^{-1} \sum_{k=0}^{M-1} \sum_{j=0}^{2L-1} F(j, k)\, \alpha_{2L}^{-lj}\, \alpha_{M}^{-mk} \tag{12}$$

where $\alpha_M$ is of order $M$ (i.e., $M$ is the least positive integer such that $(\alpha_M)^M = 1$ [2]). Applying the transform to (10) it can be shown that

$$T\{\tilde{Y}\} = T\{\tilde{H}\}\, T\{\tilde{X}\} \tag{13}$$

so that, similar to the one-dimensional case, (10) can be carried out by

$$\tilde{Y} = T^{-1}\left[ T\{\tilde{H}\}\, T\{\tilde{X}\} \right]. \tag{14}$$

If the transform is the DFT, then

$$\alpha_M = e^{-i 2\pi / M}.$$
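The steps (8)-(14) can be sketched with the DFT as the transform. This is only an illustration under stated assumptions: the function names are hypothetical, the transform is a plain $O(n^2)$ double sum rather than an FFT, and the arrays use $2L$ rows with the extra row of $\tilde{H}$ set to zero.

```python
import cmath


def cyclic_direct(h, x):
    """Reference: cyclic convolution (2)."""
    n_len = len(x)
    return [sum(h[(n - q) % n_len] * x[q] for q in range(n_len))
            for n in range(n_len)]


def dft2(a, sign):
    """Unscaled two-dimensional DFT; sign = -1 forward, +1 inverse."""
    rows, cols = len(a), len(a[0])
    out = [[0j] * cols for _ in range(rows)]
    for j in range(rows):
        for k in range(cols):
            acc = 0j
            for l in range(rows):
                for m in range(cols):
                    angle = sign * 2 * cmath.pi * (l * j / rows + m * k / cols)
                    acc += a[l][m] * cmath.exp(1j * angle)
            out[j][k] = acc
    return out


def cyclic_via_2d_dft(h, x, L, M):
    """Length N = L*M cyclic convolution through a 2L x M transform, as in (14)."""
    N = L * M
    rows = 2 * L  # one row more than the minimum 2L - 1
    # X tilde of (8): the blocks of x with rows of zeros appended.
    x_t = [[x[l + m * L] if l < L else 0 for m in range(M)]
           for l in range(rows)]
    # H tilde of (9): row r holds lag l = r - (L - 1); the extra last row is zero.
    h_t = [[h[(r - (L - 1) + m * L) % N] if r < rows - 1 else 0
            for m in range(M)] for r in range(rows)]
    fx, fh = dft2(x_t, -1), dft2(h_t, -1)
    prod = [[fh[j][k] * fx[j][k] for k in range(M)] for j in range(rows)]
    y_t = dft2(prod, +1)
    scale = rows * M  # the (2N)^-1 factor of (12)
    # Y hat occupies rows L-1 .. 2L-2 of Y tilde; y(l + m*L) = y_hat(l, m).
    return [y_t[L - 1 + (n % L)][n // L].real / scale for n in range(N)]
```

With floating-point DFTs the result agrees with the direct cyclic convolution only to within roundoff, which is the error that the Fermat number transform discussed next avoids.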

To compare multiplication efficiencies we assume the DFT of $\tilde{H}$ is already known and the number of multiplications for one $2L \times M$ transform, one $2L \times M$ complex multiplication, and one $2L \times M$ inverse transform are calculated. The number of complex multiplications is approximately $(2N \log_2 2N + 2N)$ as compared to $(N \log_2 N + N)$ for a one-dimensional implementation, and therefore one would not use the two-dimensional approach with the DFT for improved multiplication efficiency.
The computational advantage appears when used with the Fermat number transform [2] where word length requirements are a possible restriction. The Fermat number transform is defined in [2] and, although not named, is defined in a restricted form in the latter part of [1, eq. (38)]. It is a transform defined in a finite ring of integers with arithmetic performed modulo Fermat numbers ($2^b + 1$, $b = 2^t$), with $\alpha_{2b} = 2$ and $\alpha_{4b} = \sqrt{2}$, and having the property that multiplication of Fermat number transforms corresponds to conventional cyclic convolution (2) modulo $2^b + 1$. To perform convolution with the transform requires $N$ real multiplications and a number of additions and word shifts proportional to $N \log N$. Unfortunately the transform requires word lengths proportional to the length of the sequences to be convolved [1], [2]. Since the lengths of the two dimensions are $2L$ and $M$ rather than $N = LM$ for the one-dimensional signal, the word-length requirement using the two-dimensional transform is proportional to the square root of $N$ rather than to $N$ as for the one-dimensional problem. It is this reduction in the necessary word length that makes the two-dimensional formulation attractive with the Fermat number transform.
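A small numerical illustration of the property just described, assuming $b = 16$ so that the modulus is $2^{16} + 1 = 65537$ and $\alpha = 2$ has order $2b = 32$. The names `fnt` and `fnt_cyclic_convolve` are hypothetical, and the transform is written as a plain double sum rather than the fast shift-and-add form of [2]; negative exponents in the three-argument `pow` need Python 3.8 or later.

```python
F = 2 ** 16 + 1   # Fermat number with b = 16
ALPHA = 2         # 2 has order 2b = 32 modulo F
N = 32            # transform length equals the order of ALPHA


def fnt(seq, root):
    """Number-theoretic transform of a length-N sequence modulo F."""
    return [sum(v * pow(root, n * k, F) for n, v in enumerate(seq)) % F
            for k in range(N)]


def fnt_cyclic_convolve(h, x):
    """Cyclic convolution modulo F: transform, multiply pointwise, invert."""
    prod = [(a * b) % F for a, b in zip(fnt(h, ALPHA), fnt(x, ALPHA))]
    inv_root = pow(ALPHA, -1, F)   # inverse of the root modulo F
    inv_n = pow(N, -1, F)          # the N^-1 scale factor of the inverse
    return [(v * inv_n) % F for v in fnt(prod, inv_root)]
```

All arithmetic is on integers, so the result is exact as long as the true convolution values stay below the modulus; the sequence length, however, is tied to the word length through the order of $\alpha$, which is the restriction the two-dimensional formulation relaxes.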
The consequence of this reduction in word length is of considerable practical importance. For example, using a word length of 16 bits and the Fermat number transform [2] with $\alpha = 2$ to compute the complete noncyclic convolution of two sequences of equal length, the one-dimensional implementation restricts the sequence length to a maximum of 16, whereas the two-dimensional implementation increases the maximum to 256. A summary of the restrictions on sequence lengths is shown in Table I for the most practical word lengths and for two values of $\alpha$. Note that one-dimensional length restrictions would be too severe for many applications but the two-dimensional restrictions would include most practical filters. If cyclic convolution is desired, then all the length restrictions are doubled since the addition of zeros to prevent aliasing is unnecessary.
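The arithmetic behind these limits can be sketched as follows, assuming (as in Section III) that the longest transform is $2b$ for $\alpha = 2$ and $4b$ for $\alpha = \sqrt{2}$, and that the two array dimensions $2L$ and $M$ may each be as long as that transform. The function name is hypothetical.

```python
def max_noncyclic_lengths(word_bits, alpha_is_sqrt2):
    """Return (1-D limit, 2-D limit) on the length of each input sequence."""
    # Longest transform length: 4b for alpha = sqrt(2), 2b for alpha = 2.
    order = 4 * word_bits if alpha_is_sqrt2 else 2 * word_bits
    # Noncyclic convolution needs zero padding to twice the sequence length.
    one_d = order // 2
    # Two dimensions of lengths 2L and M, each at most `order`.
    L, M = order // 2, order
    two_d = (L * M) // 2
    return one_d, two_d
```

For a 16-bit word and $\alpha = 2$ this gives 16 and 256, the figures quoted in the text; for cyclic convolution both limits double because no zero padding is needed.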
The two-dimensional transform, or inverse transform, can be taken in either order. There is, however, a computational advantage in taking the transform first along the $m$ direction (length $M$) and then along the $l$ direction (length $2L$); half the $\tilde{X}$ sequences along the $m$ direction are zero and half the $\tilde{H}$ sequences along the $m$ direction are cyclically shifted by one position relative to the other half. Also, while taking the inverse transform, there is an advantage to first taking the inverse transform along $l$, then along $m$, because we need only half the $\tilde{Y}$ sequences and therefore only half the sequences need be inverted along $m$.

TABLE I
Sequence Length Restrictions for One- and Two-Dimensional Implementation of Noncyclic Convolution by the Fermat Number Transform

Word Length    Transform         Maximum Sequence Length N/2
(Bits)         Basis $\alpha$    1-D        2-D
16             2                 16         256
16             $\sqrt{2}$        32         1024
32             2                 32         1024
32             $\sqrt{2}$        64         4096
64             2                 64         4096
64             $\sqrt{2}$        128        16384
IV. Generalizations and the Inverse Problem

A generalization of the approach in this paper to higher orders is fairly obvious. For example, $N$ could be factored into three integer factors $N = LMP$, as was done with two factors in (3). The signals $x(n)$ and $h(n)$ would then be redefined as three-dimensional $L \times M \times P$ arrays and (2) converted to a three-dimensional convolution by a change of variables as was done in two dimensions in (4)-(7). For $x$ this would be

$$\hat{x}(l, m, p) = x(l + mL + pML) \tag{15}$$

and with similar definitions for $\hat{H}$ and $\hat{Y}$, and after augmentation to prevent aliasing, (2) would become a three-dimensional convolution.

We would then have one-dimensional cyclic convolution of $N$-length sequences being carried out by three-dimensional cyclic convolution with dimensions of lengths $2L$, $2M$, and $P$. Use of orders higher than two does not seem needed with the Fermat number transform at this time, but will be exploited with other schemes in the next sections.
Still another variation would apply to the case where the filter is periodically time-varying. This, when converted to a two-dimensional problem with $L$ equal to the period of the filter, becomes time-invariant along one dimension, $m$, and time-varying along the other, $l$ [5]. The DFT or Fermat transform could be applied to the two-dimensional signal along the time-invariant dimension and either direct calculation or another type of transform applied to the other dimension.

The inverse of the above problem can be considered where one wishes to implement two-dimensional convolution by one-dimensional methods. This involves reversing the process presented in Section II by constructing one-dimensional sequences from the given arrays. MacAdams [6] has presented a scheme for doing this that can be seen to be the inverse of the problem addressed in this paper and can be applied to cyclic or noncyclic two-dimensional convolution.

V. M-Dimensional Convolution

If the length of the signals to be cyclically convolved in (2) can be written $N = 2^M$, then the methods used in (5) and (15) can be extended to define an $M$-dimensional signal with each dimension of length two. As before, this is done by a change of variables.

$$\hat{x}(l, m, p, \ldots) = x(l + 2m + 4p + \cdots), \qquad l, m, p, \ldots = 0, 1. \tag{17}$$

After a similar definition for $\hat{H}$ and $\hat{Y}$, cyclic convolution in (2) becomes

$$\hat{y}(l, m, p, \ldots) = \sum_{k=0}^{1} \sum_{j=0}^{1} \sum_{q=0}^{1} \cdots\, \hat{h}(l - k, m - j, p - q, \ldots)\, \hat{x}(k, j, q, \ldots) \tag{18}$$

for $l, m, p, \ldots = 0, 1$. Both $\hat{X}$ and $\hat{Y}$ are $M$-dimensional with each dimension of length two, and $\hat{H}$ is also $M$-dimensional but must be defined with dimensions of length three in order to carry out (18). This is the same as was required in two dimensions where $\tilde{H}$ was defined in (9). Since the original convolution in (2) was defined as cyclic, the convolution along the last dimension in (18) is also cyclic so that $\hat{H}$ need not be extended along that dimension but can be evaluated for this index modulo 2.

This is an extremely general and versatile formulation for the original problem that can be used to improve computational efficiency. The convolution along each dimension of length two can be written in terms of scalar variables as

$$y_0 = h_0 x_0 + h_{-1} x_1, \qquad y_1 = h_1 x_0 + h_0 x_1. \tag{19}$$

The convolution of (18) can be viewed as $M$ nested length-two convolutions, each separately of the form shown in (19) requiring four multiplications. Using the same reasoning that was explained for the two-dimensional formulation, the total number of multiplications is $F = 4^M$. Since $N = 2^M$, this becomes $F = N^2$, which is again the same as would be required by directly calculating (2).

VI. A High Speed Algorithm

With this formulation of scalar convolution in terms of multidimensional short convolutions, various tricks for efficient short convolution can be used. Consider one of several possible algorithms suggested by Pitassi [7] where, rather than directly calculating (19), three intermediate numbers are found:

$$g_0 = (h_0 + h_{-1})\, x_1, \qquad g_1 = h_0\, (x_0 - x_1), \qquad g_2 = (h_1 + h_0)\, x_0. \tag{20}$$

The desired outputs are then calculated by

$$y_0 = g_0 + g_1, \qquad y_1 = g_2 - g_1. \tag{21}$$

This approach uses three multiplications in comparison with four multiplications for a direct calculation. Using this result, the total number of multiplications to calculate (18) becomes

$$F = 3^M. \tag{22}$$

If the fact pointed out earlier, that the convolution along the last dimension is cyclic, is used, then a further reduction is possible. Length-two cyclic convolution is given by

$$y_0 = h_0 x_0 + h_1 x_1, \qquad y_1 = h_1 x_0 + h_0 x_1. \tag{23}$$

This can be calculated from two intermediate values by

$$f_0 = (h_0/2 + h_1/2)(x_0 + x_1), \qquad f_1 = (h_0/2 - h_1/2)(x_0 - x_1) \tag{24}$$

to give

$$y_0 = f_0 + f_1, \qquad y_1 = f_0 - f_1$$

requiring two multiplications (assuming the factors of 1/2 are either precalculated or obtained by shifting). Using this result on the last dimension reduces (22) to

$$F = 2 \cdot 3^{M-1} \tag{25}$$

which is the same number obtained by Pitassi [7] and Rayner [8].

Pitassi developed his algorithm by relating the cyclic convolution of two sequences to the convolution of subsequences in the same manner that the FFT can be developed based on decimation in time. This was extended by Davis [9] to an approach similar to decimation in frequency where the subsequences are halves of the original sequences. Both of these are special cases of the multidimensional formulation since the values along any dimension are samples or blocks of the original sequence.

Another form of convolution that is sometimes desired is of the same form as (2) but with $h(n)$ not being periodically extended, rather having independent values for negative indices. The transmission matrix formulation for $N = 3$ is

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & h_{-1} & h_{-2} \\ h_1 & h_0 & h_{-1} \\ h_2 & h_1 & h_0 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}. \tag{26}$$
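The nesting of the three-multiplication step (20)-(21) over the length-two dimensions can be sketched recursively for the constant diagonal form (26). This is an illustrative sketch, not the authors' program: the function names are hypothetical, $h$ is stored as a list of $2N - 1$ values for lags $-(N-1), \ldots, N-1$, and the sums of $h$ values are formed at run time rather than precalculated.

```python
def toeplitz_direct(h, x):
    """y(n) = sum over q of h(n - q) x(q); h[i] holds lag i - (N - 1)."""
    n = len(x)
    off = n - 1
    return [sum(h[i - q + off] * x[q] for q in range(n)) for i in range(n)]


def toeplitz_fast(h, x, counter):
    """Same result using (20)-(21) on each length-two dimension.

    len(x) must be a power of two; counter[0] accumulates multiplications.
    """
    n = len(x)
    if n == 1:
        counter[0] += 1
        return [h[0] * x[0]]
    half, off = n // 2, n - 1

    def sub(d):
        # Lags d + 2j, j = -(half-1) .. half-1: a half-length problem.
        return [h[d + 2 * j + off] for j in range(-(half - 1), half)]

    h_m1, h_0, h_p1 = sub(-1), sub(0), sub(1)
    x0, x1 = x[0::2], x[1::2]          # x_hat(0, .) and x_hat(1, .)
    g0 = toeplitz_fast([a + b for a, b in zip(h_0, h_m1)], x1, counter)
    g1 = toeplitz_fast(h_0, [a - b for a, b in zip(x0, x1)], counter)
    g2 = toeplitz_fast([a + b for a, b in zip(h_p1, h_0)], x0, counter)
    y = [0] * n
    y[0::2] = [a + b for a, b in zip(g0, g1)]   # y_0 = g_0 + g_1
    y[1::2] = [a - b for a, b in zip(g2, g1)]   # y_1 = g_2 - g_1
    return y
```

Each level replaces one convolution by three of half the length, so a length-$2^M$ problem uses $3^M$ scalar multiplications, at the cost of extra additions.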

Because of its structure, the operator in (26) is called a constant diagonal convolution matrix. This requires $2N - 1$ values of $h(n)$ from $n = -N + 1$ to $n = N - 1$ and $N$ values of $x(n)$ to give $N$ values of $y(n)$.

The same approach that was used for cyclic convolution is applicable here except the reduction described in (23) and (24) does not work since the $M$-dimensional formulation of (18) is no longer cyclic along any dimension. Therefore the number of multiplications necessary to implement a length-$N$ constant diagonal convolution is

$$F = 3^M \tag{27}$$

which is the same as obtained by Allwright [10], using a matrix factorization approach.

Note that cyclic convolution in (2) can be viewed as a special case of constant diagonal convolution. Causal noncyclic convolution where $h(n)$ is zero for $n < 0$ is also a special case:

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}. \tag{28}$$

The most common form of convolution desired is causal noncyclic convolution where all of the output sequence is obtained rather than only the first $N$ points as in (28). For this case two length-$N$ sequences are convolved to give a length-$(2N - 1)$ output. The transmission matrix formulation for $N = 3$ is

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \\ 0 & h_2 & h_1 \\ 0 & 0 & h_2 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}. \tag{29}$$

To apply the results from cyclic convolution, all sequences are extended with zeros to length $2N$. Cyclic convolution then gives the desired output of (29) and uses

$$F = 2 \cdot 3^M \tag{30}$$

multiplications.

A further reduction is possible by recognizing that along the last dimension only one multiplication is necessary rather than two for the cyclic case that gave (25) or three for the constant diagonal noncyclic case that gave (27). Consider the $(M + 1)$-dimensional convolution of (18) from the extended sequences with all indices except the last one held constant. The resulting length-two scalar convolution is

$$\begin{bmatrix} s_0 \\ s_1 \end{bmatrix} = \begin{bmatrix} \hat{h}_0 & \hat{h}_{-1} \\ \hat{h}_1 & \hat{h}_0 \end{bmatrix} \begin{bmatrix} \hat{x}_0 \\ \hat{x}_1 \end{bmatrix} \tag{31}$$

which becomes

$$s_0 = \hat{h}_0 \hat{x}_0 + \hat{h}_{-1} \hat{x}_1, \qquad s_1 = \hat{h}_1 \hat{x}_0 + \hat{h}_0 \hat{x}_1. \tag{32}$$

The $\hat{x}_1$ terms are always zero since the last half of the length-$2N$ sequence $x(n)$ has been added as zeros. Close examination shows that because $h(n)$ has also been extended with zeros, either $\hat{h}_0$ or $\hat{h}_1$ will always be zero, depending on values of the constant indices. Therefore the length-two convolution of (31) will require only one multiplication and the resulting total number of multiplications necessary to noncyclically convolve two length-$N$ sequences giving a length-$2N$ output is

$$F = 3^M \tag{33}$$

which is the same as for the length-$N$ constant diagonal convolution with a length-$N$ output.

VII. Multidimensional Convolution Based on Overlap-Add

Another two-dimensional formulation can be developed that is a generalization of the overlap-add algorithm [3], [4]. In (7) the $\hat{H}$ function was augmented so that the desired output $y(n)$ is in blocks that are the columns of $\hat{Y}$. For this formulation $\hat{H}$ will not be augmented but $\hat{Y}$ will be, and the desired $y(n)$ will be obtained by adding the overlapped columns of $\hat{Y}$. This formulation is based on noncyclic convolution given by

$$y(n) = \sum_{q=0}^{N-1} h(n - q)\, x(q) \tag{34}$$

where $h(n)$ and $x(n)$ are sequences of length $N$ and $y(n)$ of length $2N - 1$ and all are defined to be zero outside these lengths. The transmission matrix formulation for $N = 3$ is given by (29). Using a similar factoring of $N$ and change of variables as was done in (3) and (5) we have (34) becoming

$$\hat{y}(l, m) = \sum_{p=0}^{M-1} \sum_{k=0}^{L-1} \hat{h}(l - k, m - p)\, \hat{x}(k, p), \qquad l = 0, 1, \ldots, 2L - 2, \quad m = 0, 1, \ldots, 2M - 2. \tag{35}$$

In this case both $\hat{X}$ and $\hat{H}$ are $L \times M$ arrays and $\hat{Y}$ is $(2L - 1) \times (2M - 1)$, with $\hat{X}$ and $\hat{H}$ having as their columns the blocks of $x(n)$ and $h(n)$, but the columns of $\hat{Y}$ and the blocks of $y(n)$ have a more complicated relation than in (6). Both $\hat{X}$ and $\hat{H}$ are defined to be zero outside the domain of definition. Here we can show

$$y(l + mL) = \hat{y}(l, m) + \hat{y}(l + L, m - 1) \tag{36}$$

for

$$l = 0, 1, \ldots, L - 1, \qquad m = 0, 1, \ldots, 2M - 1$$

and $\hat{y}(l, m) = 0$ outside its domain of definition of

$$l = 0, 1, \ldots, 2L - 2, \qquad m = 0, 1, \ldots, 2M - 2. \tag{37}$$

This formulation gives the implementation of one-dimensional convolution by the sum of overlapped columns of an array obtained by noncyclic two-dimensional convolution. This can be viewed as a generalization of the overlap-add algorithm [3], [4] used for sectioning or block processing.

If $N = 2^M$, the extension to $M$-dimensional convolution is similar to that done for the overlap-save technique in (17) and (18). Both $\hat{X}$ and $\hat{H}$ are $M$-dimensional with each dimension of length two and $\hat{Y}$ is $M$-dimensional but with each dimension of length three. For

$$\hat{x}(l, m, p, \ldots) = x(l + 2m + 4p + \cdots), \qquad \hat{h}(l, m, p, \ldots) = h(l + 2m + 4p + \cdots) \tag{38}$$

(34) becomes

$$\hat{y}(l, m, p, \ldots) = \sum_{k=0}^{1} \sum_{j=0}^{1} \sum_{q=0}^{1} \cdots\, \hat{h}(l - k, m - j, p - q, \ldots)\, \hat{x}(k, j, q, \ldots) \tag{39}$$

for $l, m, p, \ldots = 0, 1, 2$.

The calculation of $y(n)$ from $\hat{Y}$ is a bit complicated but is a generalization of (36) and involves only additions. For example, if $N = 2^3 = 8$ the $\hat{Y}$ function would have three dimensions with a total of 27 elements. Along the $l$ dimension there would be nine length-three blocks that, when overlapped and added along the $m$ dimension, would give three length-seven blocks. These, when overlapped and added, would give the single length-fifteen sequence that would be $y(n)$.

An example for $N = 4$ would have a two-dimensional three $\times$ three $\hat{Y}$

$$\hat{Y} = \begin{bmatrix} \hat{y}_{00} & \hat{y}_{01} & \hat{y}_{02} \\ \hat{y}_{10} & \hat{y}_{11} & \hat{y}_{12} \\ \hat{y}_{20} & \hat{y}_{21} & \hat{y}_{22} \end{bmatrix} \tag{40}$$

and $y(n)$ would be found by

$$\begin{aligned} y_0 &= \hat{y}_{00} \\ y_1 &= \hat{y}_{10} \\ y_2 &= \hat{y}_{20} + \hat{y}_{01} \\ y_3 &= \hat{y}_{11} \\ y_4 &= \hat{y}_{21} + \hat{y}_{02} \\ y_5 &= \hat{y}_{12} \\ y_6 &= \hat{y}_{22}. \end{aligned} \tag{41}$$

Along each dimension of (39) the matrix formulation of the length-two convolution is

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix} \tag{42}$$

which, if done directly, requires four multiplications and one addition. If an intermediate step is added for $y_1$, then

$$y_0 = h_0 x_0, \qquad y_1 = (h_0 + h_1)(x_0 + x_1) - y_0 - y_2, \qquad y_2 = h_1 x_1 \tag{43}$$

gives the three outputs with three multiplications and four additions/subtractions. Using this algorithm on each dimension of (39) and counting the number of required multiplications by the method used to find (33) we find that (39) can be computed with

$$F = 3^M \tag{44}$$

multiplications, which is the same as that obtained by the overlap-save method in (33).

VIII. Sequences of Different Lengths and Partial Outputs

Two modifications of the usual formulation of convolution are often desired. The first occurs when the two sequences are significantly different in length and the second when one desires only a portion of the output rather than all of it. Both cases can be formulated in terms of multidimensional convolutions and computational savings can be realized.

If $h(n)$ is assumed to be the shorter sequence and if its length can be expressed as $L = 2^S$, the length of the longer sequence $x(n)$ will be expressed as $N = 2^S \cdot M$. It may be necessary to add zeros to both $x$ and $h$. The multidimensional signals are formed to give $S$ length-two dimensions and the last dimension of length $M$. If $m$ is the index for the length-$M$ dimension, then $\hat{H}$ is formulated to be zero for all $m$ other than $m = 0$. Therefore, if the fast algorithm of (20) or (43) is used along the $S$ length-two dimensions, there will be only $M$ multiplications required along the $M$ dimension. The total number of multiplications is then

$$F = 3^S \cdot M. \tag{45}$$

This could also be seen by considering the problem as requiring $M$ length-$2^S$ noncyclic convolutions with the outputs overlapped and added.

If only a portion of the total output from the convolution is desired, then a similar saving can be obtained. Assume that both $h(n)$ and $x(n)$ are of length $N = LM$, with a noncyclic output $y(n)$ of length $2N - 1$. Out of this only a block of $y(n)$ of length $L$ is desired. First, $N$ extra zeros are appended to both the sequences and cyclic convolution of length $2N$ is formulated. This one-dimensional cyclic convolution is reformulated as a two-dimensional convolution as in (5)-(7). The first dimension is of length $L$ and the convolution in this dimension is noncyclic and can be carried out as cyclic convolution of length $2L - 1$. In the second dimension the convolution is cyclic and of length $2M$. The two-dimensional arrays are formulated so that the desired length-$L$ block of $y(n)$ appears as a column of $\hat{Y}$ in (6). Thus, $\hat{Y}$ along the second dimension has to be computed only for one index. This replaces cyclic convolution of length $2M$ with a summation of $2M$ terms along the second dimension. The desired output block can be written as a summation of $2M$ convolutions of length $L$, where each convolution represents convolution of two sequences of lengths $2L - 1$ and $L$, respectively, giving a sequence of length $L$. Because zeros were appended to both $x(n)$ and $h(n)$, out of these $2M$ convolutions, between one and $M$ convolutions would be nonzero, depending on the second index of $\hat{Y}$ for which the output is desired.
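The partial-output idea can be sketched directly: one length-$L$ block of $y(n)$ is accumulated as a sum of short convolutions, each between a length-$(2L - 1)$ segment of $h$ and one length-$L$ block of $x$. The function name and the direct inner loops are illustrative assumptions; in practice each short convolution would be done by a transform or by the fast algorithms above.

```python
def partial_block(h, x, L, block):
    """Return (y[block*L : (block+1)*L], number of nonzero short convolutions)
    for the noncyclic convolution of h and x, both of length N = L*M."""
    N = len(x)
    M = N // L

    def h_at(n):
        # h(n) is zero outside 0 .. N-1.
        return h[n] if 0 <= n < N else 0

    out = [0] * L
    nonzero_terms = 0
    for p in range(M):                               # columns (blocks) of x
        first_lag = (block - p) * L - (L - 1)
        segment = [h_at(first_lag + i) for i in range(2 * L - 1)]
        if not any(segment):
            continue                                 # this term of the sum vanishes
        nonzero_terms += 1
        for l in range(L):
            for k in range(L):
                # segment index (l - k) + (L - 1) holds lag (block - p)*L + l - k.
                out[l] += segment[l - k + L - 1] * x[k + p * L]
    return out, nonzero_terms
```

When every sample of $h$ is nonzero, the count of contributing short convolutions returned here lies between one and $M$, in line with the statement above.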
The convolutions along the first dimension of length $L$ can be carried out either by transform techniques or by the multidimensional techniques discussed in this paper. If the transform techniques are used, the summation of convolutions can be carried out in the transform domain, thus requiring only one inverse transform to obtain the desired output. Rader [11] has discussed a similar technique for the particular case of estimation of the autocorrelation function for the first few lag values. If $L = 2^S$ and the multidimensional methods are used, the number of multiplications is at most the same as given in (45).

The formulation just discussed can be extended for the situation where sampled output is desired, sampled at every $L$th value. In this case for the two-dimensional formulation, the output appears as a row of $\hat{Y}$, and as before, the two-dimensional convolution again reduces to a summation of convolutions. If the output $y(n)$ is a narrow-band signal as compared to the sampling frequency, the samples of $y(n)$ at a lower sampling rate are sufficient to reconstruct the analog signal. In this situation the formulation discussed here can result in computational savings.

If a multidimensional formulation is considered, we can obtain partial outputs as combinations of blocks and samples.
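The count in (45) can be checked with a short sketch that applies the three-multiplication step (43) recursively along the $S$ length-two dimensions and then overlaps and adds the $M$ short convolutions. The function names are hypothetical, and the even/odd index split follows (38).

```python
def noncyclic_fast(h, x, counter):
    """Noncyclic convolution of two length-2**S lists using (43) on each
    length-two dimension; counter[0] accumulates scalar multiplications."""
    n = len(x)
    if n == 1:
        counter[0] += 1
        return [h[0] * x[0]]
    h0, h1 = h[0::2], h[1::2]          # index split n = l + 2m
    x0, x1 = x[0::2], x[1::2]
    y0 = noncyclic_fast(h0, x0, counter)                      # h_0 x_0
    y2 = noncyclic_fast(h1, x1, counter)                      # h_1 x_1
    mid = noncyclic_fast([a + b for a, b in zip(h0, h1)],
                         [a + b for a, b in zip(x0, x1)], counter)
    y1 = [m - a - b for m, a, b in zip(mid, y0, y2)]          # middle output of (43)
    out = [0] * (2 * n - 1)
    for i, (a, b, c) in enumerate(zip(y0, y1, y2)):
        out[2 * i] += a          # overlap and add along this dimension
        out[2 * i + 1] += b
        out[2 * i + 2] += c
    return out


def sectioned_convolve(h, x):
    """h has length L = 2**S and x has length L*M: M short convolutions,
    overlapped and added. Returns (y, multiplication count)."""
    L = len(h)
    M = len(x) // L
    counter = [0]
    y = [0] * (len(x) + L - 1)
    for m in range(M):
        block = noncyclic_fast(h, x[m * L:(m + 1) * L], counter)
        for i, v in enumerate(block):
            y[m * L + i] += v
    return y, counter[0]
```

For $L = 4$ ($S = 2$) and $M = 3$ the count is $3^2 \cdot 3 = 27$, compared with $4 \cdot 12 = 48$ for a direct calculation.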

IX. Relations to Logical Convolution and Walsh Transforms

Consider logical convolution [12] of two sequences $x(n)$ and $h(n)$ of length $N = 2^M$, each giving an output sequence $y(n)$ also of length $N$. Logical convolution is defined similarly to the cyclic convolution of (2), but the addition and subtraction of indices is done differently. All indices are represented in binary form as an $M$-bit index. When indices have to be added or subtracted, they are added or subtracted bit by bit, modulo 2. Note that in logical convolution, addition and subtraction of indices are equivalent. We can convert this one-dimensional logical convolution problem to an $M$-dimensional convolution as in (17) and (18). Along any dimension, convolution appears as in (19), but since logical convolution is desired, $h(-1) = h(1)$. Therefore, if (18) is implemented as length-two cyclic convolutions along all the dimensions, the $y(n)$ thus obtained is the logical convolution of $x(n)$ and $h(n)$. Alternatively, if (18) is implemented as a noncyclic convolution along all the dimensions, we obtain noncyclic convolution of $x(n)$ and $h(n)$ as in (26).

Length-two cyclic convolution can be implemented using just two multiplications as in (24), which is a length-two DFT implementation. Thus (18) can be implemented as an $M$-dimensional cyclic convolution using $M$-dimensional DFT's of $\hat{X}$ and $\hat{H}$, where each dimension is of length two. Alternatively, length-$N$ logical convolution can also be implemented using length-$N$ Walsh transforms of $x(n)$ and $h(n)$ [12]. The preceding development shows that the length-$(N = 2^M)$ Walsh transform is equivalent to the $M$-dimensional DFT. Thus the $M$-dimensional approach establishes the logical convolution theorem for the Walsh transforms and it also establishes the fast Walsh transform algorithm as an $M$-dimensional DFT. These facts have been noted before in the literature.

For a particular formulation of (20), Pitassi [7] and Davis [9] have shown that some of the intermediate products correspond to multiplication of the Walsh transforms of the two sequences.

X. Generalizations and Applications

There are several modifications and generalizations that are possible with the formulation used here. The first will illustrate that fast algorithms exist for sequences of length other than two.

Consider the noncyclic convolution of two length-three sequences.

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \\ y_4 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ h_2 & h_1 & h_0 \\ 0 & h_2 & h_1 \\ 0 & 0 & h_2 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}. \tag{46}$$

This would normally require nine multiplications and four additions. If six intermediate variables are calculated by

IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, FEBRUARY 1974

8

go = ho x 0

g3

= (ho + h l

1 (x0 + x1

g1 = hl x1

g4

= ( h + h2

1 (x1 "I-xz)

g2

=hzxz

g5 =(h2

+ h 0 ) @ 2 +x01

In these equations, each summation represents cyclic convolution of length 4b , which can be carried
out by Fermat transforms. Taking length 4 b Fermat
(47) transforms of all the sequences in (51), we obtain

T { P(0, I ) ) = T { A ( O ,I ) } T { 2 ( O ,I ) }

then thedesired output can be obtained by

4- T { A ( - l I, ) } T { 2 ( 1 ,l ) }

Yo

=go

Y1

=g3 -

Yz

=g5

+g,

y3

=g4

-

Y4

=g2

go -

gl

-

T { P(1, I ) } = T(A(1, I ) } T{B(O, I ) }

g,
go

- g
2

-

+ T { a ( O ,I ) ) T { 8 ( 1 ,E ) } .

g2

(52)

(48)

requiring six multiplications.

If two sequences of length $N = 3^M$ were to be convolved, then using an implementation with $M$ dimensions of length three would require a total number of multiplications

$$F = 6^M. \tag{49}$$

For short sequences of this length, this approach is faster than adding zeros and using the next larger power-of-two length with (39).

Similar results are possible with other lengths and a very general scheme can be developed for sequences of length

$$N = 2^M \cdot 3^S \cdots \tag{50}$$

using different algorithms along different dimensions in a manner similar to using a multiple radix FFT. Because of the generality of the multidimensional formulation there are mixtures of high-speed techniques that can be used.

In some situations, it may be advantageous to combine transform techniques with the fast algorithms for short convolutions discussed in this paper. One such situation uses the Fermat number transforms [2]. As discussed in Section III, using a $b$-bit implementation of Fermat number transforms, the maximum cyclic convolution length is $4b$. To cyclically convolve sequences longer than this, we need to formulate a two-dimensional convolution as in Section II. Assume $x(n)$, $h(n)$, and $y(n)$ are sequences of length $N = 8b$ each. Consider the formulation of (7), with $M = 4b$ and $L = 2$. Equation (7) can be rewritten as

$$
\begin{aligned}
\hat{y}(0, m) &= \sum_{p=0}^{M-1} \hat{h}(0, m-p)\,\hat{x}(0, p) + \sum_{j=0}^{M-1} \hat{h}(-1, m-j)\,\hat{x}(1, j) \\
\hat{y}(1, m) &= \sum_{p=0}^{M-1} \hat{h}(1, m-p)\,\hat{x}(0, p) + \sum_{j=0}^{M-1} \hat{h}(0, m-j)\,\hat{x}(1, j), \qquad m = 0, 1, \cdots, M-1.
\end{aligned}
\tag{51}
$$

These equations are similar to (19) and we could employ the tricks of (20) and (21) along the first dimension, giving

$$
\begin{aligned}
T\{\hat{y}(0, l)\} &= \left[T\{\hat{h}(0, l)\} + T\{\hat{h}(-1, l)\}\right] T\{\hat{x}(1, l)\} + \left[T\{\hat{x}(0, l)\} - T\{\hat{x}(1, l)\}\right] T\{\hat{h}(0, l)\} \\
T\{\hat{y}(1, l)\} &= \left[T\{\hat{h}(0, l)\} + T\{\hat{h}(1, l)\}\right] T\{\hat{x}(0, l)\} - \left[T\{\hat{x}(0, l)\} - T\{\hat{x}(1, l)\}\right] T\{\hat{h}(0, l)\}.
\end{aligned}
\tag{53}
$$

Note that $\hat{h}(-1, m) = \hat{h}(1, m-1)$, thus $\hat{h}(-1, l)$ is the cyclic shift by one position of $\hat{h}(1, l)$. We could employ the cyclic shift theorem to compute $T\{\hat{h}(-1, l)\}$ from $T\{\hat{h}(1, l)\}$. Assuming the $\hat{h}$ transforms are precalculated, this method requires computation of two length-($4b$) $\hat{x}$ transforms, $12b$ multiplications and $12b$ additions/subtractions to compute the $\hat{y}$ transforms using (53), and two length-($4b$) $\hat{y}$ inverse transforms. This is efficient because the only extra computation is $4b$ extra multiplications and is better than the two-dimensional Fermat number transform approach, which requires roughly twice the amount of computation. This use of $2 \times M$ convolution has a small computational advantage even when used with the FFT where, in effect, the last stage of the FFT algorithm is replaced by one of the fast length-two algorithms.

There are many other possibilities of combinations of Fermat and Fourier transforms of various lengths and dimensions with short convolution algorithms or with the use of special hardware. The arbitrary order of the various operations can also be used to advantage. As pointed out by Rader [11], and illustrated in our discussion of partial outputs and in (53), it is often more efficient to take transforms along one dimension, then convolve along another before taking the inverse transform.

To illustrate the efficiencies of some of the techniques of this paper, a comparison of a particular case will be made with direct and FFT implementations. Consider the problem of noncyclic convolution of two length-($N = 2^M$) sequences to give an output of length $2N - 1$ as described in (34). The number of multiplications per output point will be calculated for three implementations. First, consider a direct implementation which requires $N^2$ multiplications for $2N - 1$ output points (we will approximate this by $2N$ for simplification). This gives for multiplications per output point

$$F_0 = \tfrac{1}{2} N.$$

If the FFT is used in an efficient way, taking advantage of the fact that the data are real and using a mixed radix algorithm (2, 4, and 8), the number of multiplications necessary per output point can be calculated as described by Singleton [13] and is denoted $F_1$.

Using the $M$-dimensional formulation with the fast algorithm of (33) or (44) gives for the multiplications per output

$$F_2 = \frac{3^M}{2N}.$$

Table II compares these functions for various lengths up to 1024. Note the multidimensional implementation is more efficient than a direct implementation for all lengths and requires fewer multiplications than the FFT for lengths up to 128. If a less efficient FFT implementation requiring $4 \log N + 4$ multiplications per output point is used, the crossover length is above 2048.

TABLE II
A Comparison of Multiplication Efficiencies for Three Implementations of Length-N Convolution

     N      F0       F1       F2
     2       1     1.5      0.75
     4       2     2.25     1.12
     8       4     4.12     1.70
    16       8     5.06     2.53
    32      16     6.03     3.80
    64      32     8.01     5.70
   128      64     9.00     8.54
   256     128    10.00    12.81
   512     256    12.00    19.22
  1024     512    13.00    28.83

The multidimension approach can also be used in the same way the FFT is used to implement ongoing processing or sectioning [4]. In contrast to use with the FFT, this approach is most efficient when the length of the section or block is the same as the length of the convolution operator.

Initial observations indicate that block implementation of recursive filters [3] becomes more attractive when used with the techniques described in this paper. To illustrate this, we will again consider the multiplication efficiencies for three realizations. First consider a recursive filter with an equal order numerator and denominator of $N = 2^M$. The multiplications per output point for a direct implementation are

$$F_0 = 2N.$$

Using efficient FFT algorithms, the results given by the methods in [3] are denoted $F_1$. Using three convolutions by (33) or (44) to implement the block recursive filter with the block length equal to the order gives as the multiplications per output point

$$F_2 = \frac{3 \cdot 3^M}{N}.$$

Table III compares these multiplication efficiencies for orders up to 128. Note the multidimensional approach is more efficient than the direct for orders above three and more efficient than the FFT for orders up to about 256. This is yet to be explored in detail.

TABLE III
A Comparison of Multiplication Efficiencies for Three Implementations of Block Recursive Filters of Order N

     N      F0      F1      F2
     2       4       9      4.5
     4       8      21      6.7
     8      16      31     10.1
    16      32      38     15.2
    32      64      45     22.8
    64     128      53     34.2
   128     256      58     51.2

XI. Conclusions

This paper has presented two formulations of convolution in terms of multidimensional convolution: one based on a generalization of the overlap-save algorithm for sectioning and the other on the overlap-add algorithm. The first proved to be well suited for cyclic and constant diagonal convolution and the second for noncyclic convolution. Fast algorithms were developed based on length-two and -three convolution that lead to an improvement in multiplication efficiency. The reduction in required word lengths proved to be a valuable feature when used with the Fermat number transform. The formulation proved to be well suited for the special cases where unequal-length sequences were convolved or where only a portion of the output was desired. It was further shown that various mixtures of algorithms could be used along the different dimensions to achieve certain advantages or to fit particular requirements. Finally, examples were presented to compare the multiplication efficiencies of a few implementations.

The formulation is so general that a complete and systematic investigation of all possible applications is difficult. The main ideas and relations to other works that we know of have been presented here. The investigation of word length and storage requirements and a more complete consideration of recursive implementations is still to be made.

Acknowledgment

The authors would like to thank R. A. Meyer for valuable discussions.
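
The three-product structure of (53) can be checked numerically. The following Python sketch is an illustration only, not the authors' implementation. It assumes the index mapping $\hat{x}(l, m) = x(2m + l)$, which is consistent with the relation $\hat{h}(-1, m) = \hat{h}(1, m-1)$ noted after (53), and it replaces each pointwise product of transforms by a direct length-$M$ cyclic convolution, to which such a product corresponds. The function names `cyclic_conv` and `cyclic_conv_2xM` are hypothetical.

```python
def cyclic_conv(h, x):
    """Direct cyclic convolution of two equal-length sequences."""
    n = len(x)
    return [sum(h[(m - p) % n] * x[p] for p in range(n)) for m in range(n)]


def cyclic_conv_2xM(h, x):
    """Length-2M cyclic convolution from three length-M cyclic convolutions.

    Assumed mapping (an illustration, not taken from the paper's (7)):
    x_hat(l, m) = x(2m + l), and likewise for h and y.
    """
    M = len(x) // 2
    x0, x1 = x[0::2], x[1::2]          # x_hat(0, .), x_hat(1, .)
    h0, h1 = h[0::2], h[1::2]          # h_hat(0, .), h_hat(1, .)
    # h_hat(-1, m) = h_hat(1, m - 1): cyclic shift of h1 by one position
    hm1 = [h1[(m - 1) % M] for m in range(M)]

    # The three products of (53), written as cyclic convolutions
    a = cyclic_conv([u + v for u, v in zip(h0, hm1)], x1)
    b = cyclic_conv(h0, [u - v for u, v in zip(x0, x1)])   # shared term
    c = cyclic_conv([u + v for u, v in zip(h0, h1)], x0)

    y0 = [p + q for p, q in zip(a, b)]  # y_hat(0, .)
    y1 = [p - q for p, q in zip(c, b)]  # y_hat(1, .)

    y = [0] * (2 * M)
    y[0::2] = y0
    y[1::2] = y1
    return y


if __name__ == "__main__":
    h = [1, 2, 3, 4, 5, 6, 7, 8]
    x = [2, -1, 0, 3, 5, 1, -4, 2]
    print(cyclic_conv_2xM(h, x) == cyclic_conv(h, x))
```

The shared term `b` is what reduces the four products of (51) to three; the sums involving only $\hat{h}$ can be precomputed when the operator is fixed.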


References
[1] C. M. Rader, “Discrete convolutions via Mersenne transforms,” IEEE Trans. Comput., vol. C-21, pp. 1269-1273, Dec. 1972.
[2] R. C. Agarwal and C. S. Burrus, “Fast digital convolutions using Fermat transforms,” in Southwestern IEEE Conf. Rec., Houston, Tex., Apr. 1973, pp. 538-543.
[3] C. S. Burrus, “Block realization of digital filters,” IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 230-235, Oct. 1972.
[4] B. Gold and C. M. Rader, Digital Processing of Signals. New York: McGraw-Hill, 1969, pp. 208-211.
[5] R. A. Meyer and C. S. Burrus, “Certain properties of periodically time-varying digital filters,” in Southwestern IEEE Conf. Rec., Houston, Tex., Apr. 1973, pp. 529-535.
[6] D. P. MacAdam, “Image restoration by constrained deconvolution,” J. Opt. Soc. Amer., vol. 60, pp. 1617-1627, Dec. 1970.
[7] D. A. Pitassi, “Fast convolution using the Walsh transforms,” in Proc. Conf. Applications of Walsh Functions, Washington, D.C., Apr. 1971, pp. 130-133.
[8] P. J. W. Rayner, “A fast cyclic convolution algorithm,” presented at Symp. Digital Filtering, Imperial College, London, England, Aug. 1971.
[9] W. F. Davis, “A class of efficient convolution algorithms,” in Proc. Symp. Applications of Walsh Functions, Washington, D.C., Mar. 1972, pp. 318-329.
[10] J. C. Allwright, “Real factorization of noncyclic convolution operators with applications to fast convolution,” Electron. Lett., vol. 7, pp. 718-719, Dec. 1971.
[11] C. M. Rader, “An improved algorithm for high speed autocorrelation with applications to spectral estimation,” IEEE Trans. Audio Electroacoust., vol. AU-18, pp. 439-441, Dec. 1970.
[12] G. S. Robinson, “Logical convolution and discrete Walsh and Fourier power spectra,” IEEE Trans. Audio Electroacoust., vol. AU-20, pp. 271-280, Oct. 1972.
[13] R. C. Singleton, “An algorithm for computing the mixed radix fast Fourier transform,” IEEE Trans. Audio Electroacoust., vol. AU-17, pp. 93-103, June 1969.

Digital Notch Filter Design Procedure

JAMES A. CADZOW

Abstract-An analytical procedure for designing a linear digital notch filter is presented. The resultant filter is sixth-order and is implemented by cascading three second-order filters so as to avoid instability which may arise from computer coefficient truncation. The procedure outlined is straightforward, requires only simple algebraic steps, and gives filter parameter selection criteria for reducing the effects of computer coefficient truncation.
Notch filters have utility in situations where a desired signal is corrupted by an additive sinusoidal pickup. One thus must process the noisy signal so as to remove the sinusoid without significantly distorting the desired signal.

I. Introduction

A frequently occurring linear data filtering application occurs when one wishes to process a signal of the form

$$u(t) = s(t) + A \sin \omega_0 t$$

so as to remove the additive sinusoidal component, $A \sin \omega_0 t$, without seriously distorting the desired signal $s(t)$. The situation in which a 60-Hz sinusoidal pickup corrupts a desired measurement signal nicely illustrates how such problems can arise in practice.

The ideal filter for the above application would then have a response $s(t)$ to the input $u(t)$. In the frequency domain, the required linear filter would have a gain of one for all frequencies except at $\omega_0$ where its gain is zero. As such, this processor is typically called a notch filter with notch at $\omega_0$. Its gain-frequency behavior is depicted in Fig. 1.

Unfortunately, the ideal notch filter is not physically realizable and must be approximated in practice. If one attempted to implement a notch filter approximation using analog devices (i.e., resistors, capacitors, and inductors), one would quickly realize the futility of this approach. On the other hand, one may readily design a digital filter whose frequency behavior closely resembles that shown in Fig. 1 (e.g., see [1]-[4]).

The approach to be taken in this paper is to then uniformly sample the signal $u(t)$ (every $T$ seconds) and use the resulting sequence as the input to a digital filter governed by

$$y(k) = b_0 u(k) + b_1 u(k-1) + \cdots + b_m u(k-m) - a_1 y(k-1) - a_2 y(k-2) - \cdots - a_n y(k-n) \tag{1}$$

where $u(k)$ and $y(k)$ denote the values of the filter's input and output signals, respectively, at the $k$th iteration. A procedure for selecting the coefficients $a_i$ and $b_i$ whereby the filter's gain factor-frequency behavior will be similar to that shown in Fig. 1 will be shortly given. One must realize, however, that since the filter is digital, its frequency behavior will be periodic with period $2\pi/T$ (e.g., [1, p. 297]) and will appear as shown in Fig. 2.
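
As a rough illustration of the difference equation (1), the following Python sketch evaluates it directly with zero initial conditions and computes the gain at a given frequency, so that the period of $2\pi/T$ can be checked numerically. The function names `difference_filter` and `gain`, and the first-order coefficients used in the example, are assumptions chosen for illustration; they are not the notch design developed in this paper.

```python
import cmath
import math


def difference_filter(b, a, u):
    """Evaluate y(k) = sum_i b[i] u(k-i) - sum_i a[i-1] y(k-i).

    b = [b0, ..., bm], a = [a1, ..., an]; samples before k = 0 are
    taken as zero (an assumption of this sketch).
    """
    y = []
    for k in range(len(u)):
        acc = sum(b[i] * u[k - i] for i in range(len(b)) if k - i >= 0)
        acc -= sum(a[i - 1] * y[k - i]
                   for i in range(1, len(a) + 1) if k - i >= 0)
        y.append(acc)
    return y


def gain(b, a, w, T):
    """Magnitude of the frequency response of (1) at w rad/s, sampling period T."""
    z_inv = cmath.exp(-1j * w * T)               # z^{-1} on the unit circle
    num = sum(bi * z_inv ** i for i, bi in enumerate(b))
    den = 1 + sum(ai * z_inv ** (i + 1) for i, ai in enumerate(a))
    return abs(num / den)


if __name__ == "__main__":
    # Hypothetical first-order example: y(k) = u(k) + 0.5 y(k-1)
    b, a = [1.0], [-0.5]
    print(difference_filter(b, a, [1.0, 0.0, 0.0, 0.0]))  # [1.0, 0.5, 0.25, 0.125]

    T = 0.01
    w = 40.0
    # The two gains agree up to rounding, reflecting the 2*pi/T periodicity.
    print(gain(b, a, w, T), gain(b, a, w + 2 * math.pi / T, T))
```

Because the gain depends on frequency only through $e^{-j\omega T}$, shifting $\omega$ by $2\pi/T$ leaves it unchanged, which is the periodicity referred to above.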
The selection of the sampling period $T$ to be used is
of great importance from a number of viewpoints.
Most importantly, it must be chosen small enough so
that little distortion results from the analog-to-digital
conversion of the desired signal $s(t)$. Quantitatively,