Professional Documents
Culture Documents
Research Article
FPGA BASED HIGH-SPEED A-OMS LUT & FIR SYSTEM
DESIGN
G. Sowmya Bala
and both cannot be combined since the words input whose output then add/subtracted from 16A, by
generated are odd numbers. the (+/-)_cell as shown in figure 1.
Consequently a different form of Antisymmetric
product coding combined with a modified form of Odd
Multiple Storage scheme forming A–OMS LUT
method which aims mainly to provide the efficient
memory based computations and to perform operations
for required functional computational. For large word
sizes, the input words are coded in to set of sub-words
and are subjected to the same method describe above.
The advanced method/approach is described briefly in
the section two of this paper. The section three consists Figure 1. Antisymmetric product coding based
of an design process using proposed methodology. In LUT-M circuit.
the section, four and five consists of the memory 2(b) Method 2: Modified odd multiple storage scheme:
optimization process, results, and conclusion. In this method a barrel shifter is used to perform the
2. METHODOLOGY shift operations through which the even multiples are
Conventional LUT-based multipliers, with a fixed computed from the obtained odd multiple values by
coefficient and an input word have been used for simple shift operations which provides LUT optimized
simple memory based multiplication operations that in terms of area by storing only the odd multiple values
hoard in a memory core [3]. This requires increase in rather than whole values.
the LUT size with an increase in the input word length,
which is area inefficient. In order to provide an area
efficient look-up-table for large data operation, some
optimization schemes have been presented, of them in
one method, instead of the entire values only the odd
multiple values are stored and with another one, there
is a reduction in LUT size to half of its original where
the product words are recorded as antisymmentric
pairs.
Combining the above-specified methods, form A-OMS Figure 2. (a) Decoder circuit.
LUT method that further optimizes the LUTs where
modified methods of odd multiple storage and
antisymmetric product coding are used.
2(a) Method 1: Modified antisymmetric product
coding scheme: In this method, 32 x 5-bit input words
are considered. Computing the product word (PW)
values (i.e., input word {X} of length L=5 multiplied
by fixed coefficient value A) results in the negative
mirror symmetry from half of total input words that
facilitates a reduced LUT in size. Hence, for a given 4-
bit addresses the corresponding code words to be Figure 2 (b) Control circuit and Address generation
stored are reduced to half. This is derived from the
circuit (AGC)
antisymmetric behavior of products forming In addition to the shifter circuit, a memory unit for
antisymmetric product coding, where the address bits product values and a decoder circuit for mapping bits
are represented by x{x0, x1, x2, x3} such that as well as a control circuit and address generation
X’=XL, if x4=1; XL’, if x4=0 ... (1) circuit are required. The mapping process of 5-bit input
where XL = (x0, x1, x2, x3) is the four less significant word to a 4-bit LUT address (d0, d1, d2, d3) is done by a
bits of X, and XL’ is the two’s complement of XL. The simple set of mapping relations. The address bits are
product word can be denoted as thus generated from the AGC as shown in figure 2(b)
PW = 16A + (sign value) × (derived word)… (2) using the equation (3) and (4) that are defined below.
where sign value is equal to one for x4 = 1 and is equals Here the YL {y0, y1, y2} denotes all the shifted odd
to −1 for x4 = 0. The product value for X = (10000) integer address bits. The relations used to map are as
corresponds to the derived word i.e., antisymmetric follows.
product code value “zero,” which could be derived by
resetting the LUT output, instead of storing that in the
LUT. A simplified LUT-M circuit for an input word of … (3)
length L=5 is shown in figure 1.it describes both the where X’’ = {x’’0, x’’1, x’’2, x’’3} is generated through
structure and function of LUT-M (look-up-table based address mapping the values after arithmetically right
multiplier). The address mapping circuit generates the shifting the leading zeros of X’ similar to that defined
desired address {x0’, x1’, x2’, x3’} where x4 is a control in equation (1).i.e.,
bit for the (+/-)_cell. The address bit generated through X’’=YL, if x4=1; YL’,if x4=0 …(4)
address mapping selects the required value in LUT
IJAERS/Vol. I/ Issue I/October-December, 2011/146-149
International Journal of Advanced Engineering Research and Studies E-ISSN2249 – 8974
For a given L-bit input word an address encoder maps y[n] = Samples of output sequences,
the L-1 bit addresses of the LUT which consists of nine h[k] = Impulse response of the filter.
words of (W+4) –bit width. Modifying a simple 3 to 8 The z-domain representation of it is as shown below
line decoder circuit (shown in figure 2(a)) produce a 4
to 9 line decoder that generates word select signals
… (8)
through which the required word from LUT is selected.
To reduce the number of register and pipelined stages
In figure 2(b), a control circuit is shown to provide the
w.r.t the direct form, the transposed structure is
control bits and a reset signal bit.
The basic shift operation by barrel shifter is done by considered. The direct form realization and Transposed
form realization of FIR filter is show in figure 4(a),
the control bits s0, s1 from control circuit. From the
4(b).
figure 2(b) the control bits s0, s1 are derived as follows.
… (5)
and the reset signal is derived as
RESET = d3 AND x4 … (6) Figure 4 (a) Direct form realization of FIR filter
Where d3 is defined in equation (3)
The optimized LUT circuit with modified schemes thus
designed and is shown in figure 3. in figure 3 the
address generation and control block is as shown in
figure 2(b).The main reason for approaching this
technique is to optimize the implementation of the sign Figure 4 (b) Transposed structure realization of FIR
modification of the odd LUT output, which does not filter
support the OMS scheme in methods defined 3(b) Design Process with specified methods: FIR filter
previously [6]. The modified circuit is shown in figure impulsive response is the ratio of output sequences to
3. Also it provides the 2’s complement representation that of input sequences. A simple flow of FIR filter
of the product words, that supports computations with with A-OMS based design is shown in figure 5 where
both the signed and unsigned bits, by modifying the the input samples are represented by x[n] and
(+/-)_cell (to perform add/subtract operations) of figure multiplication operation by M, simple arithmetic
1 function by A, D stands for delay operator. The method
specified in section 1 is applied to produce the desired
output.
described in section 2. The input values will be the be reduced to ½ that of the conventional design
input sequences X[N] i.e., (x[0], x[1], x[2],… ,x[n-1]) therefore nearly 20% less area than conventional
of the desired filter and the coefficient values are design methods (DA, Conventional Multiplier) for the
impulse sequences h[0], h[1], h[2],…., , h[N-2], h[N- implementation of a 16-tap FIR filter having the same
1]. The coefficient may be fixed or generic can be a set throughput per cycle. This could be used for memory-
of impulse sequences. The A-OMS method can support based implementation of cyclic and linear
multiple or a set of sequences and generic coefficient convolutions, sinusoidal transforms, and inner-product
values along with fixed values. The desired output computation. The proposed method of the FIR filter
responses are derived through the mapping process that design, when compared with the previous works results
is explained in the section 2 of this paper. Finally A- in a reduction of the adder/subtract, slices, thus reduced
OMS method reduces the computational delay. memory core size. An efficient approach is specified
The FIR filter design based on optimized LUT using A- for optimizing the LUTs and is designed through an
OMS method provide more efficiency than the DA- “A-OMS” method that results in improving
based approach (that is done previously [2, 4]) in terms performance. It can be said with this method that there
of area-complexity for a given throughput and lower is a reduction in the components required in a DSP
latency of implementation. With an increase in the cores of FPGAs FIR filter, when designing it
number of input sample size, the high precision resembling the Matched filter structure, which is
multiplication operations where input code words applicable for many applications especially for
(length of code word) were coded into required word spectrum sensing in cognitive radio of SDR. The
sizes and the A-OMS operations are performed in the specified method and the design process thus provide
same way as previous. This coding method is known to 20% savings of area-delay product and the throughput
be the input coding technique that provides high is twice that of the previous methods. This is shown in
precision operations. tabular re-presentations and explained briefly in the
4. CONCLUSION above sections. The FIR filter design based on the
This paper undergoes on describing the way to extract proposed methodology can be implemented at the
the FPGAs with an resourceful area utilization by receiver part of the system. To further enhance the
describing the methodology rather than DSM (deep- system performance by utilizing the available spectrum
sub- micron (nano meter- tech.,) ). The new approach is efficiently the specified method will be implemented at
tested by designing a filter circuit of a DSP core. The transmitter part of the system.
comparisons in terms of the latencies and area REFERENCES
complexity between the LUT-based design and the 1. P. K. Meher, ‘LUT Optimization for Memory-Based
Computation,’IEEE Trans on Circuits & Systems-II,
other methods based design are shown in table-1, table-
pp.285-289, April 2010.
2. 2. Professor S.K. Sanyal, Wasim Arif, “Designing of a
Table–1 Latencies for different filter order and fast LUT based DDA FIR system with adaptive
input word size coefficient for Spectrum ensing in Cognitive Radio”
A-OMS BASED DA-BASED DESIGN ICGST AIML-11 Conference, Dubai, UAE, 12-14
N DESIGN April 2011.
L = 8 L =16 L = 32 L=8 L =16 L =32 3. P. K. Meher, ‘New Approach to Look-up-Table
Design and Memory-Based Realization of FIR
8 11 12 13 16 17 18
Digital Filter,’ IEEE Trans on Circuits &Systems-I pp
16 18 16 20 5 26 27
592-Systems I, pp.592 603, March 2010.
32 35 36 37 42 43 44
4. Tevfik Y¨ucek and H¨useyin Arslan, ”A Survey of
64 65 66 67 75 76 77 Spectrum Sensing Algorithms for Cognitive Radio
128 129 130 131 140 141 142 Applications”, IEEE Communications Surveys &
Table-2 Area complexity comparisons using Tutorials, Vol. 11, No. 1,First Quarter 2009
different methods 5. H. H. Dam, A. Cantoni, K. L. Teo, and S. Nordholm,
“FIR variable digital filter with signed power-of-two
coefficients,” IEEE Trans. Circuits Syst. I, Reg.
Papers, vol. 54, no. 6, pp. 1348–1357, Jun. 2007.
6. R. Mahesh and A. P. Vinod, “A new common
subexpression elimination algorithm for realizing
low-complexity higher order digital filters,” IEEE
Trans. Computer-Aided Ded. Integr. Circuits Syst.,
vol. 27, no. 2, pp. 217–229, Feb. 2008.
7. Shahnam Mirzaei, Anup Hosangadi, Ryan Kastner,
“FPGA Implementation of High Speed FIR Filters
Using Add and Shift Method” IEEE Trans. Circuits
Syst., 2006.
The overall design process enhances the system 8. M. Yamada, and A. Nishihara, “High-Speed FIR
performance in terms of speed and area that doubles Digital Filter with CSD Coefficients Implemented on
FPGA”, in Proc. IEEE Design Automation
the transmission rate, increasing the overall throughput. Conference (ASP-DAC , 2001, pp. 7-8.
The comparisons show that more than 20% of saving in
area-delay product with a transmission speed of twice
that of the conventional methods. It requires N times
less number of decoders and memory requirement will