
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 6, JUNE 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG


New approach for implementing a FIR Filter Based on AAA Methodology


Blaiech A.G, Ben Khalifa K, Boubaker M, and Bedoui M.H

Abstract

The FIR (finite impulse response) filter is an application that has been well treated in the literature for validating various approaches, and its algorithm is of sufficient interest to be implemented in embedded systems. In this paper, we present a new methodology for optimized implementation on FPGAs (Field-Programmable Gate Arrays). It is based on the Adequacy Algorithm Architecture (AAA) approach and determines the optimal encoding of the various blocks of our filter so as to minimize area and maximize accuracy while respecting the time constraint. We start from an algorithmic specification in the form of a Factorized and Conditioned Data Dependence Graph (FCDDG). We apply transformations to this graph through a defactorization that groups blocks of operators with similar accuracy. The proposed methodology allows the automatic generation of synthesizable VHDL code in multi-width fixed-point coding. This methodology yields a gain of 14% in LUTs compared to the classical AAA methodology.

Index Terms: Systems specification methodology, Multiple precision arithmetic, Algorithms implemented in hardware, Optimization.

Blaiech A.G. is with TIM Team, Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, 5019 Tunisia.
Ben Khalifa K. is with TIM Team, Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, 5019 Tunisia.
Boubaker M. is with TIM Team, Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, 5019 Tunisia.
Bedoui M.H. is with TIM Team, Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, 5019 Tunisia.

1. INTRODUCTION

Filters have been the subject of several implementation approaches because of their importance in several fields. In particular, the FIR filter has been implemented on processors [1],[2] and on ASICs or FPGAs [3]. The implementation of the FIR filter on FPGA has been the subject of various studies, and several methods have been applied [4]. The most classic method is based on conventional multiplications and additions. Among the approaches that optimize the implementation of this filter, we mention those based on distributed arithmetic (DA) [5],[6],[7]. Other works have been oriented towards optimized multiplication, breaking multiplications down into simple operations such as addition, subtraction and shifting, or sharing common sub-expressions [8],[9],[10]. In this category, the authors of [10] have used a signed-digit representation to reduce the number of adders in a FIR filter. A comparison between these three methods has been developed in [11] and led to the conclusion that the third method is the most interesting from the time/area ratio point of view. These methods have focused on optimizing the area while respecting the time constraint, without resorting to a study of the operator accuracy, whose choice is thus left quasi-arbitrary.

For our part, we propose in this paper to improve the optimization approaches of the FIR filter implementation by adopting the Adequacy Algorithm Architecture (AAA) method and then integrating into this methodology an optimization phase which takes accuracy into account. To achieve this last part, a preliminary study is made to define the optimum width of the various components of the filter as a function of the input dynamics.

This paper is organized as follows. In the first section, we describe the AAA methodology, starting from the algorithm graph and finishing with the [...] conversion methodology of FIR filter coding from floating point to fixed point in order to obtain the optimum widths of these components. In the third one, we present the integration of the arithmetic accuracy in the optimization phase of the AAA methodology, with the aim of minimizing the size of the information while retaining the performance and respecting the time constraint imposed by the user. In the last section, we present the results of the filter implementation on an FPGA using the AAA approach and the proposed methodology.

2. ALGORITHM ARCHITECTURE ADEQUATION (AAA)

We have chosen the extension of the AAA methodology to circuits [12],[13],[14], which covers the complex tasks of implementation and optimization from the high-level specification to the circuit synthesis. AAA is an effective means for the rapid prototyping of real-time applications on dedicated architectures. The algorithm to implement is first specified as a Factorized and Conditioned Data Dependence Graph (FCDDG). The FCDDG is an extension of the classical directed data dependence model with dedicated nodes which specify a repetition pattern (like the for/while operators of classical imperative languages) and conditioning (if..then..else). The FCDDG, also called the algorithm graph (Gal), is made up of operation nodes which consume input data and produce output data.

Before generating the optimized hardware implementation, the AAA methodology explores the solution space through space-time transformations applied to the initial specification, looking, for each frontier of Gal, for the defactorization rate that minimizes the area while respecting the time constraint. AAA is based on a greedy heuristic (Algorithm 1) using the characterization information and a dedicated cost function. At the end of this synthesis stage, we have a hardware implementation graph made up of the data path, obtained by direct translation of the transformed algorithm graph, and the control path, a set of equations obtained by analyzing the neighborhood relationships between the factorization frontiers of the algorithm graph. The synchronization between the nodes of the implementation graph is provided by a clock signal. After the synthesis step, we have synthesizable VHDL code representing the interconnected operators, which are synchronized by a clock signal.

ALGORITHM 1: GREEDY ALGORITHM ADOPTED BY AAA [14]
Input: FF, the list of frontiers belonging to the critical path; Ct, the time constraint
Output: Defactorized graph
Begin
  Calculate the latency L;
  Calculate the area S;
  While L > Ct do
    Determine the list FF of frontiers belonging to the critical path;
    For each frontier in FF do
      Determine the optimal defactorization factor (the smallest factor that brings the latency below the time constraint);
      Calculate the new area S';
      Calculate the new latency L';
      Calculate the cost of this defactorization: F = (S' - S) / (L - Max(L', Ct));
    End for
    Sort the list of frontiers in order of increasing cost;
    Defactorize the frontier at the top of the list;
  End while
End
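The greedy loop of Algorithm 1 can be sketched in Python as follows. This is only an illustration under a toy cost model that is not taken from the paper: the `Frontier` class, its fields, the assumption that all frontiers lie in sequence on the critical path, and the rule that a factor `f` replicates the operator `f` times and divides the iteration count by `f` are all hypothetical. The cost is the ratio of added area to gained latency, F = (S' - S) / (L - max(L', Ct)).

```python
from dataclasses import dataclass
from math import ceil


@dataclass
class Frontier:
    """Toy model of a factorization frontier (hypothetical, not the AAA data model)."""
    name: str
    iterations: int   # repetitions of the factorized pattern
    op_latency: int   # clock cycles for one iteration (assumed > 0)
    op_area: int      # area units of one operator instance
    factor: int = 1   # current defactorization factor (operator copies)

    def latency(self, factor=None):
        f = self.factor if factor is None else factor
        return ceil(self.iterations / f) * self.op_latency

    def area(self, factor=None):
        f = self.factor if factor is None else factor
        return f * self.op_area


def greedy_defactorize(frontiers, ct):
    """Defactorize one frontier per pass until the total latency is <= ct."""
    total_l = sum(f.latency() for f in frontiers)
    total_s = sum(f.area() for f in frontiers)
    while total_l > ct:
        best = None  # (cost, frontier, factor)
        for fr in frontiers:
            candidates = range(fr.factor + 1, fr.iterations + 1)
            if not candidates:
                continue  # already fully defactorized
            others = total_l - fr.latency()
            # Smallest factor that meets ct, else full defactorization.
            factor = next((k for k in candidates
                           if others + fr.latency(k) <= ct), fr.iterations)
            new_l = others + fr.latency(factor)
            new_s = total_s - fr.area() + fr.area(factor)
            cost = (new_s - total_s) / (total_l - max(new_l, ct))
            if best is None or cost < best[0]:
                best = (cost, fr, factor)
        if best is None:
            raise ValueError("time constraint cannot be met")
        _, fr, factor = best
        fr.factor = factor
        total_l = sum(f.latency() for f in frontiers)
        total_s = sum(f.area() for f in frontiers)
    return total_l, total_s
```

For instance, `greedy_defactorize([Frontier("F1", 8, 1, 10), Frontier("F2", 2, 1, 4)], 4)` runs the loop on two invented frontiers with a time constraint of 4 cycles; the area and latency figures are placeholders, not characterization data.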

3. STUDY OF ACCURACY

This study consists in determining the optimal width for each block of the FIR filter, as in [15]. It goes through a study of accuracy leading to an implementation with limited accuracy, minimizing the size of the data without affecting the algorithm performance. We have used a uniform precision format for all blocks of the algorithm to implement. To convert an algorithm initially coded in floating point into a fixed-point algorithm, we first have to estimate the data dynamics in order to obtain the size of the integer part, and then evaluate the accuracy in order to determine the size of the fractional part. The simulation process is given in Fig. 1.

[Fig. 1. Conversion to fixed-point coding for the FIR filter. Flow: floating-point coding of the FIR filter algorithm; dynamic data estimation (statistical method); integer part determination; accuracy evaluation (statistical method); the fractional part is changed until $|R_{Floating} - R_{Fixed}| < \varepsilon$; fixed-point coding of the FIR filter algorithm. $R_{Fixed}$: result in fixed point; $R_{Floating}$: result in floating point; $\varepsilon$: accepted margin of error.]

3.1 Dynamic data estimation

The dynamics of a data item correspond to the interval containing all values taken by this data item over time. To determine the dynamics of the variables, two approaches can be used: statistical approaches and analytical approaches.
- Statistical approach: It is based on the analysis of simulation results of the application in floating point [16],[17]. Its disadvantage is that it does not guarantee the absence of overflow [18].
- Analytical approach: It is based on a static analysis of the application [19],[20]. Knowing the range of the input values, this approach computes the range of the output values by propagating the ranges through the data flow graph. This method solves the problem of overflow. Its limitation is the overestimation of the integer part.
In our methodology, we have opted for a statistical approach, calculating the maximum value of the various blocks of our algorithm in floating-point encoding.

3.2 Integer part determination

This first phase determines the minimum size of the integer part. The position of the binary point used for a variable x depends on the range estimated for this variable. Knowing the range $[x_{\min}, x_{\max}]$ that guarantees the absence of overflow, this position is equal to:

$$ m_x = \min\{k \in \mathbb{Z} \mid 2^k > \max(|x_{\min}|, |x_{\max}|)\} = \lceil \log_2 \max(|x_{\min}|, |x_{\max}|) \rceil \qquad (1) $$

In other words, we look for the smallest integer k such that $2^k$ is larger than the absolute values of the bounds of the variable x. Subsequently, we add one bit for negative numbers (two's complement). Given that the statistical method used in estimating the dynamics of the data does not guarantee the absence of overflow, we have integrated indicators, defined as variables, that monitor overflows within the operators and the intermediate signals. These indicators detect overflow dynamically. The determination of the integer part of the data is described in Equation 1.

3.3 Accuracy evaluation

The fractional part is set to the minimum number of bits that gives maximum performance. Thus, the calculation accuracy is assessed as a function of the data widths to ensure that it exceeds a threshold fixed by the user and corresponding to the precision constraint. Several criteria exist to assess the accuracy of a fixed-point implementation. The absolute error, calculated as the difference between the results in fixed-point coding and floating-point coding, reflects the performance criterion of our application. To determine the output noise power of the algorithm for a specific instance of data formats, two approaches can be used: the statistical approach and the analytical approach.
- Statistical approach: It determines the statistical parameters of the quantization error from the simulation of the algorithm in fixed point with different precisions, obtained by varying the fractional part, while always referring to the floating-point result [21],[22].
- Analytical approach: It determines the analytical expression of the noise power by propagating a noise model, depending on the data sizes, within the flow graph of the algorithm [23],[24].
In our approach, we have chosen a statistical method to assess the accuracy. Since our approach takes account of different widths, we have used a method called "adding one bit for all", where we add one bit to the fractional part of all blocks in the algorithm (the inputs, the coefficients, the multipliers and the adders for a FIR filter) until the desired performance is obtained. This method ensures a good performance of the application.
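The conversion flow of Fig. 1 can be sketched in Python as follows. This is a minimal illustration, not the authors' tool: the function names, the `run(q)` callback convention and the `scaled` example algorithm are assumptions. The integer part follows the strict reading of Equation 1 (smallest k with 2^k greater than the peak magnitude observed in simulation), which differs from the ceiling form only when the peak is an exact power of two; the sign bit is not counted here. The fractional part is found with the "adding one bit for all" rule.

```python
import math


def integer_bits(samples):
    """Smallest k >= 0 with 2**k > max|x| over the simulated samples (sign bit excluded)."""
    peak = max(abs(v) for v in samples)
    # frexp returns (m, e) with peak = m * 2**e and 0.5 <= m < 1, so 2**e > peak.
    return max(math.frexp(peak)[1], 0)


def quantize(value, frac_bits):
    """Round value to the nearest multiple of 2**-frac_bits."""
    scale = 1 << frac_bits
    return round(value * scale) / scale


def fractional_bits(run, eps, max_bits=32):
    """'Adding one bit for all': widen every block until max |R_float - R_fixed| < eps.

    run(q) is a hypothetical callback that executes the algorithm and applies
    the quantizer q to every value it stores, returning the list of outputs.
    """
    reference = run(lambda v: v)  # floating-point reference
    for frac in range(1, max_bits + 1):
        fixed = run(lambda v: quantize(v, frac))
        if max(abs(r - f) for r, f in zip(reference, fixed)) < eps:
            return frac
    raise ValueError("eps not reached within max_bits")


def scaled(q):
    """Invented one-multiplier algorithm used as an example: y = 0.25 * x."""
    return [q(q(0.25) * q(x)) for x in (0.5, -0.25, 0.75)]
```

Calling `fractional_bits(scaled, 1e-3)` returns the smallest common fractional width for which the fixed-point outputs of the example stay within the margin; a real study would pass the full filter as `run`.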

4. INTEGRATION OF ACCURACY STUDY IN AAA

The new approach consists in integrating the accuracy study into the AAA methodology. We begin with a complete study of the accuracy, which determines the size of the integer part and of the fractional part and thereby identifies the optimal widths of the different blocks of the FIR filter. The next step adopts the principle of the classical AAA methodology presented in Section 2 for architecture synthesis, which consists in first building the FCDDG and then the implementation graph, a combination of the data path and the control path. After determining the factorized implementation graph, we apply an optimization heuristic by performing a set of frontier defactorizations that take into account the width of the operations. We change the manner in which the operations of a frontier are clustered during defactorization: with uniform accuracy, all operations are alike and the clustering is arbitrary; knowing the optimal size of the various operations in the FCDDG, we instead regroup operations of the same kind that have the closest widths. This process determines a set of groups in which all the operations of a group are performed by the same operator.

To apply this grouping, we go through the graph and locate all frontiers belonging to the critical path. For each operation of a frontier under study, we group the operations that are repeated over time in the factorized graph, using an agglomerative (bottom-up) algorithm [25] with complete linkage. This phase begins by computing the distances between the widths of all the operations to regroup. Then we choose the minimum distance and merge the two groups at this distance, provided that the maximum number of elements in a group is not exceeded. Finally, we recompute the distances. This process is repeated until reaching the minimum number of groups that satisfies the time constraint. We thus have as many ways of grouping as the number of operations in a frontier. The winning grouping is the one with the minimum value of the cost function [6] defined in Equation 2.

$$ F = \frac{S' - S}{L - \max(L', C_t)} \qquad (2) $$

where $S$ is the initial area, $S'$ is the new area, $L$ is the initial latency in clock cycles, $L'$ is the new latency and $C_t$ is the time constraint defined by the user. This step provides a defactorization template for each frontier belonging to the critical path. Afterwards, we order the frontiers by cost and choose the frontier with the lowest cost to be defactorized. Algorithm 2 presents the process of defactorization, which minimizes the area and guarantees the performance while respecting the time constraint.
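The width-based grouping described in this section can be sketched in Python as follows. It is a simplified illustration with assumed names (`complete_linkage`, `group_widths`) and invented input widths; it covers only the complete-linkage merging with a cap on the group size, and leaves out the cost-based choice between alternative groupings.

```python
from itertools import combinations


def complete_linkage(group_a, group_b):
    """Complete-linkage distance: largest width gap between members of the two groups."""
    return max(abs(x - y) for x in group_a for y in group_b)


def group_widths(widths, n_groups, max_size):
    """Merge the closest groups until n_groups remain, capping each group at max_size."""
    groups = [[w] for w in widths]
    while len(groups) > n_groups:
        candidates = [
            (complete_linkage(groups[p], groups[q]), p, q)
            for p, q in combinations(range(len(groups)), 2)
            if len(groups[p]) + len(groups[q]) <= max_size
        ]
        if not candidates:
            raise ValueError("cannot reach n_groups with this max_size")
        _, p, q = min(candidates)  # closest pair; ties go to the lowest indices
        groups[p] = groups[p] + groups[q]
        del groups[q]
    # One shared operator per group, sized for the widest member of the group.
    return [(sorted(g), max(g)) for g in groups]


# Hypothetical multiplier widths in bits for eight iterations of a frontier.
example = group_widths([13, 11, 13, 12, 10, 9, 8, 9], n_groups=3, max_size=3)
```

Each returned pair holds the member widths of a group and the width of the operator that would serve the whole group, which is why grouping close widths limits the bits wasted on narrow operations.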


ALGORITHM 2: METHODOLOGY ADOPTED

Input: FIR filter algorithm
Output: Optimized and defactorized implementation graph

Let:
- $G_{im}(O, D)$: implementation graph.
- $O$: finite set of operations, $O = \{o_i\}_{1 \le i \le n}$.
- $D$: finite set of data dependence edges, called inter-operation edges.
- $G$: finite set of groups, $G = \{g_i\}_{1 \le i \le n}$.
- $A$: association between integer part $I$ and fractional part $F$, written $(I, F)$.
- predecessor: $D \to O$, $d_i \mapsto o_{i1}$, where $d_i = (o_{i1}, \{o_{ij}\}_{2 \le j \le n})$.
- successor: $D \to O$, $d_i \mapsto o_{ij}$, where $d_i = (o_{i1}, \{o_{ij}\}_{2 \le j \le n})$.
- Accuracy $Ac: D \to A$, $d_i \mapsto Ac(d_i) = (I, F)_i$.
- $O_i^{FF_j}$: operation $i$ of the frontier $FF_j$.
- $(I, F)_{d_i I_t}^{FF_j}$: accuracy of $d_i$ at iteration $t$ of the frontier $FF_j$.
- $O_i^C \in FF_j$: arithmetic operation in the frontier $FF_j$.
- $S: O \to D$, $o_i \mapsto S(o_i) = s_i = \{d_i \in D \mid \exists\, o_j \in O$ with predecessor$(d_i) = o_i$ and successor$(d_i) = o_j\}$.
- $N: G \to \mathbb{N}$, $g_i \mapsto N(g_i) = NG_i$, the number of elements in the group $g_i$.
- N_max_i: the maximal number of elements in the group $g_i$.
- $M: G \to P$, $g_i \mapsto M(g_i) = m_i$, the accuracy of the group $g_i$.
- $g_{jit} \in FF_j S_i I_t$: a group containing the accuracy elements of the arc $S_i$ belonging to iteration $I_t$ of the frontier $FF_j$.
- Defactorization factor: $FF_j \mapsto Fact_j$, where $Fact_j$ is the optimized defactorization factor of $FF_j$.

Do a simulation to identify the different widths of the operations;
Construct the Factorized and Conditioned Data Dependence Graph (FCDDG);
Construct the implementation graph corresponding to a combination of an algorithm graph and a neighborhood graph: the width of an operation is the maximum width of all its elementary operations;
Apply the heuristic of optimization (defactorization):
  Calculate the latency L;
  Calculate the area S;
  While L > Ct do
    Determine the list FFlist of frontiers belonging to the critical path CC: FFlist := {FF_j, FF_j in CC};
    For all candidate frontiers FF_j in FFlist do
      Calculate the optimized defactorization factor Fact_j corresponding to a minimum number of groups while checking the time constraint Ct;
      Calculate N_max for all g_t in G;
      For all candidate operations O_i^C in FF_j do
        For all t in [1, N_max_i]: create g_jit in FF_j S_i I_t such that M(g_jit) = (I, F) {initially};
        Calculate the distances, for all p, k in [1, N_max_i]: DS_pk = max{ d | x in g_jip, y in g_jik, d = |x - y| };
        Repeat
          Choose the pair (g_jip, g_jiq) with DS_pq = min{ d | d in DS_pk };
          While ((NG_jit < N_max_ji with t = p or q) and (there exist x in g_jip and y in g_jiq that do not both belong to g_jip and g_jiq)) do
            Put the elements of the group of {g_jip, g_jiq} having m = min{M(g_jip), M(g_jiq)} into the group g_max of {g_jip, g_jiq} having m = max{M(g_jip), M(g_jiq)};
            If all x in g_jip and all y in g_jiq belong to both g_jip and g_jiq then
              Delete g_jip or g_jiq;
            End if
          End while
          Recalculate the distances DS_pk between g_jip and g_jik where NG_jik < N_max_jip and NG_jip < N_max_jip;
        Until the desired number of groups is reached in the final grouping
        Perform the optimal defactorization with a factor given by the number of groups and an accuracy determined by the width of each group;
        Calculate the new area S';
        Calculate the new latency L';
        Calculate the cost of this defactorization: F = (S' - S) / (L - Max(L', Ct));
      End for
      Sort the list of groupings in order of ascending cost;
      Choose the grouping at the top of the list;
    End for
    Sort the list of frontiers in ascending order of cost;
    Defactorize the frontier at the top of the list;
  End while


5. METHODOLOGY VALIDATION BY APPLYING A FIR FILTER

In order to describe this approach better and to validate it, we have chosen to illustrate it on the FIR filter.

5.1 FIR Filter Algorithm

A FIR filter is defined by the coefficients of its impulse response h(n), with h(n) = 0 for n < 0 and n > m, where m is the order of the filter. The filter output out(n) is given in Equation 3:

$$ out(n) = \sum_{i=0}^{m-1} x(n-i)\, h(i) \qquad (3) $$

where $\{h(i) : i = 0, \ldots, m-1\}$ are the filter coefficients.

5.2 Applying the proposed methodology on a FIR filter

We have obtained the algorithm graph of the FIR filter of order 8 (Fig. 2) by applying the classical AAA. We have implemented the FIR filter of order 8 with both the classical AAA and the modified AAA methodology. The starting point is the accuracy study of a FIR filter of order 8. It comprises the estimation of the data dynamics, the determination of the integer part and the evaluation of accuracy, which lead to the optimal widths of the inputs, coefficients, adders and multipliers. Indeed, each component has its own coding, as shown in the totally defactorized Data Flow Graph (DFG) given in Fig. 3, where each format is written (E, F), E being the size of the integer part and F the size of the fractional part.

[Fig. 2. Algorithmic specification of the FIR of order 8 in the form of an FCDDG graph]

[Fig. 3. Results of an accuracy study in a totally defactorized graph for a FIR of order 8]

The FCDDG of Fig. 2 is built from two frontiers. The first frontier, of 8 iterations, has 2 control nodes, FORK and JOIN, for synchronizing the iterations, and a node for the multiplier. The second frontier, with two iterations, has 4 FORK nodes and one JOIN node, and its inputs are the results of the multiplications. This latter frontier has 3 adders. We add an adder that computes the final result. An implementation graph corresponding to the FCDDG is then deduced by combining the data path and the control path.

The methodology rests on searching for the optimal implementation graph by applying an optimization heuristic. We first applied to the filter the heuristic adopted by the classical AAA, designed to defactorize the frontiers so as to minimize the area while respecting the time constraint. We then applied our own heuristic, which extends the first one and takes accuracy into account. In our example, we have set the time constraint to 4 clock cycles.

The optimization process has led to the optimal defactorization given in Table 1. The first frontier is defactorized into 3 parts: the first with 3 iterations and 13 bits of accuracy, the second with 2 iterations and 11 bits of accuracy, and the last with 3 iterations and 9 bits of accuracy. A total defactorization is applied to the second frontier. The optimal implementation graph is given in Fig. 4. We have implemented our application with the classical AAA approach and with our methodology using ISE 13.1, targeting a Virtex-5 XC5VFX30T-2FF665 FPGA. Table 2 presents the results of implementing the FIR filter of order 8 on FPGA with our methodology compared to an implementation with the classical AAA. This table shows the gain in resources obtained by a methodology that takes accuracy into account compared to a methodology which focuses only on latency and area. This gain results from the use of multiple optimal widths (inputs, coefficients, multipliers and adders) instead of the uniform operation accuracy of the classical approach.
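As an illustration of Equation 3 and of the two-frontier structure described for the order-8 filter, the following Python sketch computes one output sample both ways. The function names are assumptions, and the arrangement of the 3 adders of each second-frontier iteration as a small balanced tree is a guess, since the text only gives their number.

```python
def fir_direct(x, h):
    """Equation 3: out(n) = sum_{i=0}^{m-1} x(n-i) h(i), taking x(k) = 0 for k < 0."""
    return [sum(x[n - i] * h[i] for i in range(len(h)) if n - i >= 0)
            for n in range(len(x))]


def fir8_two_frontiers(window, h):
    """One output sample from window[i] = x(n-i), organised like the two frontiers."""
    assert len(window) == len(h) == 8
    # Frontier 1: eight iterations of the multiplier.
    products = [w * c for w, c in zip(window, h)]
    # Frontier 2: two iterations, each reducing four products with 3 adders
    # (balanced tree assumed here).
    partial = []
    for it in range(2):
        a, b, c, d = products[4 * it:4 * it + 4]
        partial.append((a + b) + (c + d))
    # Final adder combining the two partial sums.
    return partial[0] + partial[1]


# Invented integer data, only to compare the two formulations.
h = [1, 2, 3, 4, 5, 6, 7, 8]
x = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
window = [x[9 - i] for i in range(8)]
```

With these values, `fir8_two_frontiers(window, h)` and `fir_direct(x, h)[9]` give the same sample, which shows that the factorized structure is only a reorganisation of the sum in Equation 3.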


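TABLE 1. PROCESS OF OPTIMIZING HEURISTIC METHODS FOR BOTH IMPLEMENTATIONS

Frontier | Group | Parameter | Classical AAA | Modified AAA
Frontier 1 | | Degree of defactorization | 3 | 3
Frontier 1 | F1{1,3,4} | Number of iterations of the multiplier | 3 | 3
Frontier 1 | F1{1,3,4} | Accuracy in bits of the multiplier | 16 | 13
Frontier 1 | F1{2,5} | Number of iterations of the multiplier | 2 | 2
Frontier 1 | F1{2,5} | Accuracy in bits of the multiplier | 16 | 11
Frontier 1 | F1{6,7,8} | Number of iterations of the multiplier | 3 | 3
Frontier 1 | F1{6,7,8} | Accuracy in bits of the multiplier | 16 | 9
Frontier 2 | | Degree of defactorization | 2 | 2
Frontier 2 | F2{1} | Number of iterations | 1 | 1
Frontier 2 | F2{1} | Accuracy in bits, Adder1 | 16 | 14
Frontier 2 | F2{1} | Accuracy in bits, Adder2 | 16 | 10
Frontier 2 | F2{1} | Accuracy in bits, Adder3 | 16 | 15
Frontier 2 | F2{2} | Number of iterations | 1 | 1
Frontier 2 | F2{2} | Accuracy in bits, Adder1 | 16 | 14
Frontier 2 | F2{2} | Accuracy in bits, Adder2 | 16 | 10
Frontier 2 | F2{2} | Accuracy in bits, Adder3 | 16 | 11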

6. CONCLUSION

In this paper, we have presented a new FPGA implementation methodology, based on the classical AAA methodology, which determines the optimal encoding of the various blocks of our filter so as to minimize area and maximize accuracy while respecting the time constraint. This methodology begins with a simulation phase to identify the optimal widths of the various blocks of our application, continues with a set of graph transformations, and finishes with an optimization heuristic that takes accuracy into account. It allows the automatic generation of optimized synthesizable VHDL code. To describe this approach better and validate it, we have completed the FPGA implementation of a FIR filter, an application that has been well treated in the literature for the validation of various approaches. By adopting our methodology, we have obtained a gain of 14% in LUTs for the FIR filter implementation compared to the classical AAA approach.

[Fig. 4. Graph of optimized FIR implementation]

TABLE 2. RESULTS OF IMPLEMENTATION ON VIRTEX-5 XC5VFX30T-2FF665

Logical resources | Classical AAA | Modified AAA | Gain (%)
Number of flip-flops | 171 | 153 | 11
Number of LUTs | 451 | 388 | 14
Number of slices | 332 | 254 | 24
I/O primitives | 148 | 108 | 28
Maximum frequency (MHz) | 190.51 | 207.64 | 9

REFERENCES
[1] S. Mohammed, S.K.N. Mahammad and V. Kamakoti, "Hardware based genetic evolution of self-adaptive arbitrary response FIR filters", Science Direct on Applied Soft Computing, 2010, pp. 842-854.
[2] A. Boudabous, A. Ben Atitallah, L. Khriji, P. Kadionik and N. Masmoudi, "FPGA implementation of vector directional distance filter based on HW/SW environment validation", ScienceDirect on International Journal of Electronics and Communications (AEÜ), 2010, pp. 250-257.
[3] V.S. Rosa, E. Costa, J.C. Monteiro and S. Bampi, "Performance Evaluation of Parallel FIR Filter Optimizations in ASICs and FPGA", Proc. of the 48th Midwest Symposium on Circuits and Systems, 2005, pp. 1481-1484.
[4] V.S. Rosa, F.F. Daitx, E. Costa and S. Bampi, "Design Flow for the Generation of Optimized FIR Filters", Proc. of the 16th IEEE International Conference on Electronics, Circuits, and Systems (ICECS 2009), 2009, pp. 1000-1003.
[5] C.S. Burrus, "Digital Filter Structures Described by Distributed Arithmetic", IEEE Transactions on Circuits and Systems, Vol. CAS-24, No. 12, 1977.
[6] C.-L. Su, Y.-T. Hwang and C.-W. Jen, "A Novel Recursive Digital Filter Based on Signed Digit Distributed Arithmetic", Proc. of the IEEE International Symposium on Circuits and Systems, 1997, pp. 2104-2107.
[7] H. Yoo and D.V. Anderson, "Hardware-efficient distributed arithmetic architecture for high-order digital filters", Proc. of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2005), 2005, Vol. 5, pp. 125-128.
[8] H.T. Nguyen and A. Chatterjee, "Number-splitting with shift-and-add Decomposition for Power and Hardware Optimization in Linear DSP Synthesis", IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2000, Vol. 8, No. 4.
[9] S. Mirzaei, A. Hosangadi and R. Kastner, "FPGA Implementation of High Speed FIR Filters Using Add and Shift Method", Proc. of the International Conference on Computer Design, 2006, pp. 308-313.
[10] R.I. Hartley, "Subexpression Sharing in Filters Using Canonic Signed Digit Multipliers", IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, 1996, Vol. 43, No. 10, pp. 611-623.
[11] Y. Li, C. Peng, D. Vu and X. Zhang, "The Implementation methods of High Speed FIR Filter on FPGA", Proceedings of the 9th International Conference on [...], pp. 2216-219.
[12] P. Niang, T. Grandpierre, M. Akil and Y. Sorel, "AAA and SynDEx-Ic: A Methodology and a Software Framework for the Implementation of Real-Time Applications onto Reconfigurable Circuits", Book Chapter, 2004, pp. 1119-1123.
[13] L. Kaouane, M. Akil, Y. Sorel and T. Grandpierre, "A methodology to implement real-time applications onto reconfigurable circuits", Special issue on Engineering of Configurable Systems of the Journal of Supercomputing, Kluwer Academic Publisher, Vol. 30, No. 3, Dec. 2004.
[14] M. Boubaker, M. Akil, K. Ben Khalifa, T. Grandpierre and M.H. Bedoui, "Implementation of an LVQ neural network with a variable size: algorithmic specification, architectural exploration and optimized implementation on FPGA devices", Neural Comput & Applic, Springer, pp. 283-297, 2009.
[15] A.G. Blaiech, K. Ben Khalifa, M. Boubaker and M.H. Bedoui, "Multi-width fixed-point coding based on reprogrammable hardware implementation of a multi-layer perceptron neural network for alertness classification", Proc. of the 10th International Conference on Intelligent Systems Design and Applications (ISDA), Cairo, Egypt, 2010, pp. 610-614.
[16] S. Kim, K. Kum and W. Sung, "Fixed-Point Optimization Utility for C and C++ Based Digital Signal Processing Programs", IEEE Transactions on Circuits and Systems II, vol. 45, no. 11, November 1998.
[17] K. Kum, J. Kang and W. Sung, "AUTOSCALER for C: An optimizing floating-point to integer C program converter for fixed-point digital signal processors", IEEE Transactions on Circuits and Systems II - Analog and Digital Signal Processing, vol. 47, pp. 840-848, September 2000.
[18] E. Martin, J. Toureilles and C. Nouet, "Conception optimisée d'architecture en précision finie pour les applications de traitement de signal", Proc. Traitement du signal 2001, vol. 18, no. 1, 2001.
[19] L.H. de Figueiredo and J. Stolfi, "Affine Arithmetic: Concepts and Applications", Numerical Algorithms, vol. 37, no. 1, pp. 147-158, 2004.
[20] R. Kearfott, "Interval Computations: Introduction, Uses, and Resources", Euromath Bulletin, vol. 2, no. 1, pp. 95-112, 1996.
[21] S. Kim, K. Kum and W. Sung, "Fixed-Point Optimization Utility for C and C++ Based Digital Signal Processing Programs", in Workshop on VLSI and Signal Processing 95, Osaka, Nov. 1995.
[22] M. Coors, H. Keding, O. Luthje and H. Meyr, "Integer Code Generation For the TI TMS320C62x", Proc. of International Conference on Acoustics, Speech and Signal Processing 2001 (ICASSP 01), Salt Lake City, US, May 2001.
[23] G.A. Constantinides, P.Y.K. Cheung and W. Luk, "Synthesis and Optimization of DSP Algorithms", Kluwer Academic, 2004.
[24] D. Menard, R. Rocher and O. Sentieys, "Analytical Fixed-Point Accuracy Evaluation in Linear Time-Invariant Systems", pp. 3197-3208.
[25] G. Karypis, E.H. Han and V. Kumar, "Chameleon: A hierarchical clustering algorithm using dynamic modeling", Computer, vol. 32, no. 8, pp. 68-75, 1999.

Blaiech Ahmed Ghazi completed his Master's degree in real-time computing and is currently a PhD student attached to the Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, Tunisia. He is an assistant at the FSM of Monastir. He has published four papers and is interested in the impact of arithmetic representation on optimized FPGA implementation.

Ben Khalifa Khaled obtained his doctorate in 2006. He is attached to the Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, Tunisia. He is an assistant professor at the ISSATs of Sousse, University of Sousse, Tunisia.

Boubaker Mohamed obtained his doctorate in 2009. He is attached to the Laboratory of Biophysics, Faculty of Medicine of Monastir, University of Monastir, Tunisia. He is an assistant professor at the ISIM of Monastir, University of Monastir, Tunisia.

Bedoui Mohamed Hedi received his PhD degree from Lille University in 1993. He is currently Professor of Biophysics at the Faculty of Medicine of Monastir (FMM), Tunisia. He is a member of the Medical Technology and Image Processing team (TIM), UR 08-27. His research interests are real-time and embedded systems, image and signal processing, hardware/software design in the medical field, and electronic applications in biomedical instrumentation. He is the president of the Tunisian Association of Promotion of Applied Research.
