Two Dimensional Cyclic Convolution Algorithms With Minimal Multiplicative Complexity
Abraham H. Diaz-Perez
Popular University of the Cesar/ Electronic Department, Sabanas Campus, Valledupar-Cesar, Colombia
Email: abraham.diaz@gmail.com
Domingo Rodriguez
University of Puerto Rico/ Electrical and Computer Engineering Department, Mayagüez PR. 00681-9042
E-mail: domingo@ece.uprm.edu
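than (10); thus, it can be computed using 2 multiplications and 6 sums as: [...]

After the multiplication stage, the inverse process is carried out:

$$y[n_1,n_2] = \left( F_{N_2}^{-1} \left( F_{N_1}^{-1} \left( X[k_1,k_2] \circ H[k_1,k_2] \right) \right)^{T} \right)^{T},$$

where the operator "$\circ$" denotes the Hadamard product.

The next example has dimensions $N_1 = 4$, $N_2 = 2$; in column major representation it is:

$$
\begin{bmatrix} y_{00} \\ y_{10} \\ y_{20} \\ y_{30} \\ y_{01} \\ y_{11} \\ y_{21} \\ y_{31} \end{bmatrix}
=
\begin{bmatrix}
x_{00} & x_{30} & x_{20} & x_{10} & x_{01} & x_{31} & x_{21} & x_{11} \\
x_{10} & x_{00} & x_{30} & x_{20} & x_{11} & x_{01} & x_{31} & x_{21} \\
x_{20} & x_{10} & x_{00} & x_{30} & x_{21} & x_{11} & x_{01} & x_{31} \\
x_{30} & x_{20} & x_{10} & x_{00} & x_{31} & x_{21} & x_{11} & x_{01} \\
x_{01} & x_{31} & x_{21} & x_{11} & x_{00} & x_{30} & x_{20} & x_{10} \\
x_{11} & x_{01} & x_{31} & x_{21} & x_{10} & x_{00} & x_{30} & x_{20} \\
x_{21} & x_{11} & x_{01} & x_{31} & x_{20} & x_{10} & x_{00} & x_{30} \\
x_{31} & x_{21} & x_{11} & x_{01} & x_{30} & x_{20} & x_{10} & x_{00}
\end{bmatrix}
\begin{bmatrix} h_{00} \\ h_{10} \\ h_{20} \\ h_{30} \\ h_{01} \\ h_{11} \\ h_{21} \\ h_{31} \end{bmatrix}
\qquad (16)
$$

The matrix $\mathbf{X}$ and the vector $\mathbf{h}$ can be represented as:

$$
\mathbf{y} = \mathbf{X}\mathbf{h} =
\begin{bmatrix} \mathbf{X}_1\mathbf{h}_1 + \mathbf{X}_2\mathbf{h}_2 \\ \mathbf{X}_2\mathbf{h}_1 + \mathbf{X}_1\mathbf{h}_2 \end{bmatrix},
$$

where

$$
\mathbf{X}_1 = \begin{bmatrix}
x_{00} & x_{30} & x_{20} & x_{10} \\
x_{10} & x_{00} & x_{30} & x_{20} \\
x_{20} & x_{10} & x_{00} & x_{30} \\
x_{30} & x_{20} & x_{10} & x_{00}
\end{bmatrix},
\quad
\mathbf{h}_1 = \begin{bmatrix} h_{00} \\ h_{10} \\ h_{20} \\ h_{30} \end{bmatrix},
\quad
\mathbf{X}_2 = \begin{bmatrix}
x_{01} & x_{31} & x_{21} & x_{11} \\
x_{11} & x_{01} & x_{31} & x_{21} \\
x_{21} & x_{11} & x_{01} & x_{31} \\
x_{31} & x_{21} & x_{11} & x_{01}
\end{bmatrix},
\quad
\mathbf{h}_2 = \begin{bmatrix} h_{01} \\ h_{11} \\ h_{21} \\ h_{31} \end{bmatrix}.
$$

The algorithm shown in (9) is used to calculate the vector $\mathbf{y}$:

$$
\mathbf{y} = \begin{bmatrix} \mathbf{y}_1 \\ \mathbf{y}_2 \end{bmatrix}
= \begin{bmatrix} \mathbf{X}_1 & \mathbf{X}_2 \\ \mathbf{X}_2 & \mathbf{X}_1 \end{bmatrix}
\begin{bmatrix} \mathbf{h}_1 \\ \mathbf{h}_2 \end{bmatrix}
= \begin{bmatrix} \mathbf{m}_1 + \mathbf{m}_2 \\ \mathbf{m}_1 - \mathbf{m}_2 \end{bmatrix},
$$

where

$$
\mathbf{m}_1 = \tfrac{1}{2}(\mathbf{X}_1 + \mathbf{X}_2)(\mathbf{h}_1 + \mathbf{h}_2); \qquad
\mathbf{m}_2 = \tfrac{1}{2}(\mathbf{X}_1 - \mathbf{X}_2)(\mathbf{h}_1 - \mathbf{h}_2).
$$

The vectors $\mathbf{m}_1$ and $\mathbf{m}_2$ are found as follows.

For $\mathbf{m}_1$ the following representation can be used:

$$
\begin{bmatrix}
x_0 & x_3 & x_2 & x_1 \\
x_1 & x_0 & x_3 & x_2 \\
x_2 & x_1 & x_0 & x_3 \\
x_3 & x_2 & x_1 & x_0
\end{bmatrix}
=
\begin{bmatrix}
x_{00}+x_{01} & x_{30}+x_{31} & x_{20}+x_{21} & x_{10}+x_{11} \\
x_{10}+x_{11} & x_{00}+x_{01} & x_{30}+x_{31} & x_{20}+x_{21} \\
x_{20}+x_{21} & x_{10}+x_{11} & x_{00}+x_{01} & x_{30}+x_{31} \\
x_{30}+x_{31} & x_{20}+x_{21} & x_{10}+x_{11} & x_{00}+x_{01}
\end{bmatrix},
\qquad
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix}
=
\begin{bmatrix} h_{00}+h_{01} \\ h_{10}+h_{11} \\ h_{20}+h_{21} \\ h_{30}+h_{31} \end{bmatrix}.
$$

Total, 8 sums. Then

$$
\mathbf{m}_1 = \frac{1}{2}
\begin{bmatrix}
x_0 & x_3 & x_2 & x_1 \\
x_1 & x_0 & x_3 & x_2 \\
x_2 & x_1 & x_0 & x_3 \\
x_3 & x_2 & x_1 & x_0
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix}.
$$

For $\mathbf{m}_2$ the next representation can be used (here the symbols $h_i$ denote the differences):

$$
\begin{bmatrix}
x_0' & x_3' & x_2' & x_1' \\
x_1' & x_0' & x_3' & x_2' \\
x_2' & x_1' & x_0' & x_3' \\
x_3' & x_2' & x_1' & x_0'
\end{bmatrix}
=
\begin{bmatrix}
x_{00}-x_{01} & x_{30}-x_{31} & x_{20}-x_{21} & x_{10}-x_{11} \\
x_{10}-x_{11} & x_{00}-x_{01} & x_{30}-x_{31} & x_{20}-x_{21} \\
x_{20}-x_{21} & x_{10}-x_{11} & x_{00}-x_{01} & x_{30}-x_{31} \\
x_{30}-x_{31} & x_{20}-x_{21} & x_{10}-x_{11} & x_{00}-x_{01}
\end{bmatrix},
\qquad
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix}
=
\begin{bmatrix} h_{00}-h_{01} \\ h_{10}-h_{11} \\ h_{20}-h_{21} \\ h_{30}-h_{31} \end{bmatrix}.
$$

Total, 8 sums. Then,

$$
\mathbf{m}_2 = \frac{1}{2}
\begin{bmatrix}
x_0' & x_3' & x_2' & x_1' \\
x_1' & x_0' & x_3' & x_2' \\
x_2' & x_1' & x_0' & x_3' \\
x_3' & x_2' & x_1' & x_0'
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix}.
$$

The representations for $\mathbf{m}_1$ and $\mathbf{m}_2$ have the same properties; moreover, they correspond to the one-dimensional cyclic convolution operation of size 4, and thus can be computed using the algorithms shown in the previous work [3]. Assume the general form for $\mathbf{m}_1$ and $\mathbf{m}_2$ as:

$$
\mathbf{y} = \mathbf{X}\mathbf{h} =
\begin{bmatrix}
x_0 & x_3 & x_2 & x_1 \\
x_1 & x_0 & x_3 & x_2 \\
x_2 & x_1 & x_0 & x_3 \\
x_3 & x_2 & x_1 & x_0
\end{bmatrix}
\begin{bmatrix} h_0 \\ h_1 \\ h_2 \\ h_3 \end{bmatrix}.
\qquad (17)
$$

We can represent the matrix $\mathbf{X}$ and the vector $\mathbf{h}$ as:

$$
\mathbf{X} = \begin{bmatrix} \mathbf{X}_0 & \mathbf{X}_1 \\ \mathbf{X}_1 & \mathbf{X}_0 \end{bmatrix},
\quad \text{and} \quad
\mathbf{h} = \begin{bmatrix} \mathbf{h}_0 \\ \mathbf{h}_1 \end{bmatrix},
\quad \text{then} \quad
\mathbf{y} = \mathbf{X}\mathbf{h} =
\begin{bmatrix} \mathbf{X}_0\mathbf{h}_0 + \mathbf{X}_1\mathbf{h}_1 \\ \mathbf{X}_0\mathbf{h}_1 + \mathbf{X}_1\mathbf{h}_0 \end{bmatrix}.
$$

We can use the same algorithm employed in (9) to calculate the vector $\mathbf{y}$:

$$
\mathbf{y} = \begin{bmatrix} \mathbf{y}_0 \\ \mathbf{y}_1 \end{bmatrix}
= \begin{bmatrix} \mathbf{X}_0 & \mathbf{X}_1 \\ \mathbf{X}_1 & \mathbf{X}_0 \end{bmatrix}
\begin{bmatrix} \mathbf{h}_0 \\ \mathbf{h}_1 \end{bmatrix}
= \begin{bmatrix} \mathbf{n}_1 + \mathbf{n}_2 \\ \mathbf{n}_1 - \mathbf{n}_2 \end{bmatrix},
$$

where

$$
\mathbf{n}_1 = \tfrac{1}{2}(\mathbf{X}_0 + \mathbf{X}_1)(\mathbf{h}_0 + \mathbf{h}_1); \qquad
\mathbf{n}_2 = \tfrac{1}{2}(\mathbf{X}_0 - \mathbf{X}_1)(\mathbf{h}_0 - \mathbf{h}_1).
$$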
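The split of the column major example into $\mathbf{m}_1$ and $\mathbf{m}_2$ can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the helper names `circulant` and `cc2d_direct` are illustrative and do not come from the paper. It uses ordinary matrix products for the two size-4 circulant products, so it verifies the identity $\mathbf{y} = [\mathbf{m}_1 + \mathbf{m}_2;\ \mathbf{m}_1 - \mathbf{m}_2]$ only, and does not implement the reduced-multiplication scheme of [3].

```python
import numpy as np

def circulant(c):
    """Circulant matrix with first column c: C[i, j] = c[(i - j) mod n]."""
    n = len(c)
    return np.array([[c[(i - j) % n] for j in range(n)] for i in range(n)])

def cc2d_direct(x, h):
    """2D cyclic convolution computed straight from its definition."""
    n1, n2 = x.shape
    y = np.zeros((n1, n2))
    for a in range(n1):
        for b in range(n2):
            for k1 in range(n1):
                for k2 in range(n2):
                    y[a, b] += x[(a - k1) % n1, (b - k2) % n2] * h[k1, k2]
    return y

# Arbitrary test data for the N1 = 4, N2 = 2 example; x[n1, n2], h[n1, n2].
rng = np.random.default_rng(0)
x = rng.integers(-5, 6, size=(4, 2)).astype(float)
h = rng.integers(-5, 6, size=(4, 2)).astype(float)

# Column major blocks: X1, X2 are the circulants of the two columns of x.
X1, X2 = circulant(x[:, 0]), circulant(x[:, 1])
h1, h2 = h[:, 0], h[:, 1]

# Two size-4 cyclic convolutions instead of four block products.
m1 = 0.5 * (X1 + X2) @ (h1 + h2)
m2 = 0.5 * (X1 - X2) @ (h1 - h2)

y_split = np.column_stack([m1 + m2, m1 - m2])  # columns are y1 and y2
y_ref = cc2d_direct(x, h)
print(np.allclose(y_split, y_ref))
```

Note that `X1 + X2` is itself the circulant of the element-wise sums $x_{i0} + x_{i1}$, which is why each of $\mathbf{m}_1$ and $\mathbf{m}_2$ reduces to a one-dimensional cyclic convolution of size 4.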
The vectors $\mathbf{n}_1$ and $\mathbf{n}_2$ are found as follows. For $\mathbf{n}_1$ we will call [...]

[...] simultaneously. The constants employed in the whole algorithm are the roots of unity of the polynomials [...]

2) Row major representation:

$$X[k_1,k_2] = \left( F_{N_2} \left( F_{N_1}\, x[n_1,n_2] \right)^{T} \right)^{T};$$

$$H[k_1,k_2] = \left( F_{N_2} \left( F_{N_1}\, h[n_1,n_2] \right)^{T} \right)^{T};$$

$$y[n_1,n_2] = F_{N_1}^{-1} \left( F_{N_2}^{-1} \left( X[k_1,k_2] \circ H[k_1,k_2] \right)^{T} \right)^{T}.$$

[...]

$$
\begin{bmatrix} \vdots \\ y_{21} \\ y_{30} \\ y_{31} \end{bmatrix}
=
\begin{bmatrix}
\vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\
x_{21} & x_{20} & x_{11} & x_{10} & x_{01} & x_{00} & x_{31} & x_{30} \\
x_{30} & x_{31} & x_{20} & x_{21} & x_{10} & x_{11} & x_{00} & x_{01} \\
x_{31} & x_{30} & x_{21} & x_{20} & x_{11} & x_{10} & x_{01} & x_{00}
\end{bmatrix}
\begin{bmatrix} \vdots \\ h_{21} \\ h_{30} \\ h_{31} \end{bmatrix}
$$

By means of a procedure similar to the one described above, which is not reproduced for reasons of space, we can obtain the signal flow diagram of Figure 3 for a row major representation. The same numerical example of the last representation is used in the mapping.

A question arises at this point: which of the two representations is the best?

In terms of the number of multiplications, both representations require the same quantity. The difference appears when we consider the number of identical processors required for the operation. In the column major representation of the example, 2 identical processors of dimension 4 are necessary for each of the sequences in the pre-multiplication stage, plus 2 more of the same dimension in the post-multiplication stage.
In the row major representation, instead, 4 identical processors of dimension 2 are necessary for each of the sequences in the pre-multiplication stage, plus 4 more in the post-multiplication stage. This is evidently a consequence of the original array dimensions. The reader can compare the representations shown in (2), (3) with those shown in (16) and (20) to see the difference between $\ell^2(\mathbb{Z}_2 \times \mathbb{Z}_4)$, that is, $N_1 = 2$, $N_2 = 4$, and $\ell^2(\mathbb{Z}_4 \times \mathbb{Z}_2)$, that is, $N_1 = 4$, $N_2 = 2$.
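The processor counts quoted for the two orderings appear to mirror the block structure of the convolution matrix. The following is a minimal sketch, assuming NumPy; the helpers `bccb` and `is_circulant` are hypothetical names introduced only for this illustration. It builds the 8 by 8 matrix of the $N_1 = 4$, $N_2 = 2$ example in both orderings, checks both against an FFT-based 2D cyclic convolution, and confirms that the column major form is a 2 by 2 arrangement of 4 by 4 circulant blocks while the row major form is a 4 by 4 arrangement of 2 by 2 circulant blocks.

```python
import numpy as np

def bccb(x, order):
    """Matrix of the 2D cyclic convolution by x, acting on h flattened in
    column major ('F') or row major ('C') order (illustrative helper)."""
    n1, n2 = x.shape
    if order == "F":
        idx = [(a, b) for b in range(n2) for a in range(n1)]  # n1 runs fastest
    else:
        idx = [(a, b) for a in range(n1) for b in range(n2)]  # n2 runs fastest
    return np.array([[x[(a - k1) % n1, (b - k2) % n2] for (k1, k2) in idx]
                     for (a, b) in idx])

def is_circulant(m):
    """True if m[i, j] depends only on (i - j) mod n."""
    n = m.shape[0]
    return all(m[i, j] == m[(i + 1) % n, (j + 1) % n]
               for i in range(n) for j in range(n))

rng = np.random.default_rng(1)
x = rng.integers(-5, 6, size=(4, 2)).astype(float)
h = rng.integers(-5, 6, size=(4, 2)).astype(float)

# Reference result via the 2D DFT and the Hadamard product.
y_fft = np.real(np.fft.ifft2(np.fft.fft2(x) * np.fft.fft2(h)))

Xc, Xr = bccb(x, "F"), bccb(x, "C")
y_col = (Xc @ h.flatten(order="F")).reshape((4, 2), order="F")
y_row = (Xr @ h.flatten(order="C")).reshape((4, 2), order="C")

# Column major: 4x4 blocks; row major: 2x2 blocks.
col_blocks = [Xc[i:i + 4, j:j + 4] for i in (0, 4) for j in (0, 4)]
row_blocks = [Xr[i:i + 2, j:j + 2]
              for i in range(0, 8, 2) for j in range(0, 8, 2)]

print(np.allclose(y_col, y_fft), np.allclose(y_row, y_fft))
print(all(is_circulant(b) for b in col_blocks),
      all(is_circulant(b) for b in row_blocks))
```

Both orderings give the same output array; only the size and number of the circulant blocks change, which is consistent with the trade-off between fewer processors of dimension 4 and more processors of dimension 2.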
Figure 3. Signal flow diagram for two dimensional cyclic convolution, row major representation, $N_1 = 4$ by $N_2 = 2$.

Figure 3 shows a new representation for the same CC2D in a signal flow diagram. The blue ones in the pre-multiplication stages now correspond to $F_4\, x[n_1,n_2]$ and $F_4\, h[n_1,n_2]$. The yellow ones in the pre-multiplication stage correspond to:

$$X[k_1,k_2] = \left( F_2 \left( F_4\, x[n_1,n_2] \right)^{T} \right)^{T}, \quad \text{and} \quad H[k_1,k_2] = \left( F_2 \left( F_4\, h[n_1,n_2] \right)^{T} \right)^{T}.$$

After the multiplication stage, the Two Dimensional Inverse Discrete Fourier Transform (IDFT2) of the entire process is performed; in this representation it is:

$$y[n_1,n_2] = F_4^{-1} \left( F_2^{-1} \left( X[k_1,k_2] \circ H[k_1,k_2] \right)^{T} \right)^{T},$$

where the operator "$\circ$" denotes the Hadamard product.

Since the generalization is straightforward, there is no need for a detailed description. The general algorithm uses a field of constants that are always the roots of unity of the polynomials $u^{N_1} - 1$ and $u^{N_2} - 1$, and their use in a signal flow diagram depends only on the class of representation used, that is, row major or column major.

We are interested in comparing the present algorithm with those that use the "Polynomial Transform". In our case, the array is organized in row or column major form and is then processed through multiplications by the roots of unity and sums. Then the two "transformed" arrays are multiplied and, in the last stages, the inverse process is carried out. In the "Polynomial Transform" cases, such as the Fermat number polynomial transform (FNPT) or the Mersenne number polynomial transform (MNPT) [1], the arrays are first organized into polynomial forms, and then polynomial division and the Fermat Number Transform (FNT) or Mersenne Number Transform (MNT) are applied to reach the transformation stage of the sequences. Then the product of the two "transformed" sequences is computed. Next the Inverse Fermat Number
Polynomial Transform (IFNPT) or the Inverse Mersenne Number Polynomial Transform (IMNPT) is applied. Then, at the last stage, the desired result of the CC2D is obtained through the Chinese Remainder Theorem (CRT). The present algorithm shows two major advantages with respect to the algorithms that use the "Polynomial Transform". The first is that, in the matrix representation proposed in this paper, it is not necessary to calculate the polynomial remainders of the sequences. The "Polynomial Transform" approach, instead, uses an algorithm whose mapping to a signal flow diagram depends on the values of each element of the arrays; this requires an adaptive processor, which makes the process difficult to realize in real time. The second advantage is a consequence of the first: the present algorithm needs less memory than the polynomial transform approach, because it does not need to save remainders and perform a reconstruction for every array to be processed. In other words, the algorithm described here follows the "in place" computation technique, and it is the same for different arrays of the same dimension. For all the reasons stated above, in our case the overall algorithm complexity is reduced.

4. CONCLUSION
This document shows a novel algorithm for the fast and efficient computation of the CC2D with minimal mathematical complexity, based on the product of a block circulant matrix with circulant blocks by a vector, and the use of the Chinese Remainder Theorem. The principal goal was to obtain a recursive algorithm that is easy to implement, with the comparative advantage over the algorithms that use the polynomial transform approach that the present algorithm does not require performing the polynomial divisions by the roots of unity in order to obtain a smaller number of multiplications.
This work changes the conceptual framework of the computation of the CC2D using the FFT and places it in a structure of minimum mathematical complexity, changing the sense of such procedures, which constitutes a paradigm shift. The algorithm obtained is limited to sequences whose size is a power of two in both dimensions, but their lengths (in each dimension) are not necessarily equal.

5. ACKNOWLEDGEMENTS

6. REFERENCES
[1] [...] Rings, IEEE Transactions on Signal Processing, Vol. 43, No. 3, March 1995.
[2] J. W. Woods and S. D. O'Neil, Subband Coding of Images, IEEE Transactions on Acoustics, Speech, and Signal Processing, Vol. ASSP-34, No. 5, October 1986.
[3] Abraham H. Diaz-Perez, Domingo Rodriguez, One Dimensional Cyclic Convolution Algorithms with Minimal Multiplicative Complexity, Proceedings, ICASSP 2006, Toulouse, France, May 2006.
[4] Abraham H. Díaz-Pérez, Análisis y Diseño de Algoritmos Para la Computación con Estructuras Circulantes, digital MSc thesis, ECE Dept., UPRM, Mayagüez, PR, May 2004.
[5] P. J. Davis, Circulant Matrices, John Wiley, New York, 1979.
[6] S. Winograd, Arithmetic Complexity of Computations, Society for Industrial and Applied Mathematics, 1980.
[7] M. Heideman, Multiplicative Complexity, Convolution, and the DFT, Springer-Verlag, New York, 1988.
[8] J. McClellan and C. Rader, Number Theory in Digital Signal Processing, Englewood Cliffs, NJ: Prentice Hall, 1979.