Department of Electrical and Computer System Engineering

Implementation of digital filter by using FPGA

by

Yishu Wang

A thesis submitted for the degree of Bachelor of Engineering in Computer Systems Engineering

School of Electrical and Computer Engineering
Title: Implementation of digital filter by using FPGA

Author: Yishu Wang (family name: Wang; given name: Yishu)
Supervisor: Dr Yee Hong Leung, Dr Cesar Ortega-Sanchez
Degree: Bachelor of Engineering
Option: Computer System

Abstract
Implementing hardware designs in Field Programmable Gate Arrays (FPGAs) is a formidable task. There is more than one way to implement a digital FIR filter. Based on the design specification, a careful choice of implementation method and tools can save a lot of time and work. MATLAB is an excellent tool for designing filters. Toolboxes are available that generate VHDL descriptions of the filters, which dramatically reduces the time required to generate a solution. The time saved can be spent evaluating different implementation alternatives. Computation algorithms are required that exploit the FPGA architecture to make the design efficient in terms of speed and/or area.

Indexing Terms
FPGAs, FIR Filter, Distributed Arithmetic


Yishu Wang
56 Lowan Loop
Karawara
Western Australia, 6152

Friday, 10 June 2005

Professor Syed Islam,
Head of Department,
Department of Electrical and Computer Engineering,
Curtin University of Technology,
Bentley, 6102, Australia.

Dear Professor Islam,

In order to partially satisfy the requirements for the Bachelor of Engineering in Computer Systems Engineering degree, I present the following thesis entitled “Implementation of digital filters by using FPGAs”. I declare that this thesis is entirely my own work except where credited otherwise.

Yours sincerely,
Yishu Wang

ACKNOWLEDGEMENTS
I would like to express my appreciation to my project supervisors, Dr Yee Hong Leung, Dr Cesar Ortega-Sanchez and Dr Doug Myers, for their patience and helpful suggestions throughout the life of the project. Their knowledge and support have made this work possible and have proven invaluable in progressing through it.


Synopsis

Implementing hardware designs in Field Programmable Gate Arrays (FPGAs) is a formidable task. There is more than one way to implement a digital FIR filter. Based on the design specification, a careful choice of implementation method and tools can save a lot of time and work. MATLAB is an excellent tool for designing filters. Toolboxes are available that generate VHDL descriptions of the filters, which dramatically reduces the time required to generate a solution. The time saved can be spent evaluating different implementation alternatives. A proper choice of computation algorithms can exploit the FPGA architecture to make the design efficient in terms of speed and/or area.


INDEX

TABLES OF FIGURES
1 INTRODUCTION
1.1 Why implement the digital filter?
1.2 Design and implementation of digital FIR filter
1.3 Design outcome
1.4 Overview of the report
2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER
2.1 Hardware Architecture
2.2 FIR FILTER IMPLEMENT OPTIONS
2.2.1 HDL coder
2.2.2 IP Generators
2.2.3 User Defined
2.3 Solution to the current practice
3 DESIGN SPECIFICATIONS
3.1 Expected features of new system
3.2 Overview of the system
3.3 Design specification of FIR filters
3.4 Overview of UART
3.4 Design specifications of UART
3.5 Design model for transmitter
3.5.1 Design for transmitter
3.5.2 Pseudo code for baud rate clock
3.5.3 Pseudo code for transmitter
3.6 Design model for receiver
3.6.1 Design and pseudo code for serial to parallel converter
3.6.2 Design for finite state machine and pseudo code for finite state machine
3.6.3 Design and pseudo code for 8 bits counter
3.6.4 Finite state machine for receiver
3.6.5 Pseudo code for receiver
3.7 Design for 16 times baud rate clock and its pseudo code
4 Implementation on board
4.1 Overview
4.2 Onboard peripheral
4.2.1 DDR SDRAM, Flash
4.2.2 Clock Sources
4.2.3 10/100/1000 Ethernet PHY, LCD Panel
4.2.4 USB 2.0 to RS232 Port
4.2.5 RS232, Configuration and Debug Ports
4.3 Configuration setup
4.4 Connection with PC
5 Simulation Results
5.1 Device utilization summary
5.2 Simulation of the behavior model
6 CONCLUSION
7 BIBLIOGRAPHY
8 APPENDICES
Appendix A: Background knowledge about filter design
Appendix B: filter design by using FDATool
Appendix B: Circuit diagram
Appendix C: Code


TABLES OF FIGURES

Figure 2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through
Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture
Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form
Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form
Figure 2-5 Transposed Direct form FIR Filter Architecture
Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture
Figure 2-7 True-down DSP Design Flow
Figure 2-8 Bottlenecks in the DSP Design Process
Figure 2-9 Proper implement techniques choices for the different Filter hardware architecture
Figure 3.1 Overview of the system
Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter
Figure 3-4 Function diagram of the UART
Figure 3-5 Finite state machine of the receiver
Figure 4-1 onboard peripheral
Figure 4-2 USB2.0 to RS-232 port
Figure 4-3 Assign Pins for the designed system
Figure 5-1 Simulation behavior of the UART Transmitter
Figure 5-2 Simulation behavior of the FIR filter
Figure 5-3 Simulation behavior of the 16 times baud rate clock
Table 8.1 The four classes of FIR filter and their associate properties
Table 8.2 Suitability of a given class of filter for the four FIR types


1 INTRODUCTION

1.1 Why implement the digital filter?

Digital filters are used extensively in all areas of the electronics industry. This is because digital filters have the potential to attain much better signal-to-noise ratios than analog filters: at each intermediate stage an analog filter adds more noise to the signal, whereas a digital filter performs noiseless mathematical operations at each intermediate step in the transform. Digital filters have therefore emerged as a strong option for removing noise, shaping spectrum, and minimizing inter-symbol interference in communication architectures. These filters have become popular because their precise reproducibility allows design engineers to achieve performance levels that are difficult to obtain with analog filters. FIR and IIR filters are the two common filter forms. A drawback of IIR filters is that the closed-form IIR designs are primarily limited to low pass, band pass, and high pass filters, etc. Furthermore, these designs generally disregard the phase response of the filter. For example, with a relatively simple computational procedure we may obtain excellent amplitude response characteristics with an elliptic low pass filter, while the phase response will be very nonlinear. (For details, please refer to Chapter 7.)

In designing filters and other signal-processing systems that pass some portion of the frequency band undistorted, it is desirable to have approximately constant frequency-response magnitude and zero phase in that band. For causal systems, zero phase is not attainable, and consequently some phase distortion must be allowed. The effect of linear phase with integer slope is a simple time shift. A nonlinear phase, on the other hand, can have a major effect on the shape of a signal, even when the frequency-response magnitude is constant. Thus, in many situations it is particularly desirable to design systems to have exactly or approximately linear phase.

Compared to IIR filters, FIR filters can have precisely linear phase. However, in the case of FIR filters, closed-form design equations do not exist. While the window method can be applied in a straightforward manner, some iteration may be necessary to meet a prescribed specification. The window method and most algorithmic methods afford the possibility of approximating more arbitrary frequency response characteristics with little more difficulty than is encountered in the design of low pass filters. Also, it appears that the design problem for FIR filters is much more under control than the IIR design problem, because there is an optimality theorem for FIR filters that is meaningful in a wide range of practical situations.
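As an illustration of the window method mentioned above, the following minimal Python sketch designs a low pass FIR filter by multiplying the ideal (sinc) impulse response by a Hamming window. The function name, the choice of 9 taps, the cutoff of 0.2 cycles per sample and the Hamming window itself are assumptions made for the example; they are not the filter designed in this project.

```python
import math


def lowpass_fir_window(num_taps, cutoff):
    """Windowed-sinc low pass design (illustrative sketch).

    cutoff is the normalized cutoff frequency in cycles per sample,
    with 0 < cutoff < 0.5.
    """
    centre = (num_taps - 1) / 2.0
    taps = []
    for n in range(num_taps):
        k = n - centre
        # Ideal low pass impulse response; the k == 0 case is the sinc limit.
        if k == 0:
            ideal = 2.0 * cutoff
        else:
            ideal = math.sin(2.0 * math.pi * cutoff * k) / (math.pi * k)
        # Hamming window tapers the truncated response.
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    # Normalize so that the gain at DC is exactly 1.
    total = sum(taps)
    return [t / total for t in taps]


taps = lowpass_fir_window(num_taps=9, cutoff=0.2)
for index, value in enumerate(taps):
    print(index, round(value, 6))
```

The coefficients come out symmetric about the centre tap, which is the property that gives an FIR filter its linear phase. If the response does not meet the specification, the number of taps or the window would be changed and the design repeated, which is the iteration referred to above.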

The magnitude and phase plots provide an estimate of how the filter will perform; however, to determine the true response, the filter must be simulated in a system model using either calculated or recorded input data. The creation and analysis of representative data can be a complex task. Most filter algorithms require multiplication and addition in real time. The unit carrying out this function is called a MAC (multiply-accumulate) unit; the better the MAC, the better the performance that can be obtained. Once a correct filter response has been determined and a coefficient table has been generated, the second step is to design the hardware architecture. The hardware designer must make trade-offs between area, performance, quantization, architecture, and response. (For details, please refer to Chapter 7.)
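The role of the multiply-accumulate operation can be sketched in a few lines of Python. This is a behavioral model only, with hypothetical names and integer test data chosen for the example; it is not the hardware description used in the project.

```python
def fir_mac(samples, coeffs):
    """Direct-form FIR filter computed with an explicit MAC loop."""
    history = [0] * len(coeffs)      # tap delay line, history[k] = x(n-k)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]  # shift the new sample into the delay line
        acc = 0
        for a, h in zip(coeffs, history):
            acc += a * h              # one multiply-accumulate per tap
        outputs.append(acc)
    return outputs


# Feeding a unit impulse returns the coefficient table itself.
impulse_response = fir_mac([1, 0, 0, 0, 0], [3, 2, 1])
print(impulse_response)
```

Each output sample costs one MAC per coefficient, which is why the number and speed of the available MAC units dominate the achievable sample rate.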


1.2 Design and implementation of digital FIR filter

MATLAB combines a high-level mathematical language with an extensive set of pre-defined functions to assist in the creation and analysis of filter data. Toolboxes are available for designing the filter response and generating coefficient tables, each with varying levels of sophistication. Graphical filter design tools provide selections for specifying passband, filter order, and design methods, as well as plots of the response of the filter to various standard forms of input. One example is FDATool from The MathWorks, which can generate a behavioral model and coefficient tables. Once a correct filter response has been determined and the hardware architecture has been defined, the implementation can be carried out. Three choices of technology exist for the implementation of filter algorithms: programmable DSP chips, ASICs and FPGAs. (For details, please refer to Chapter 7.)

At the heart of the filter algorithm is the multiply-accumulate operation. Programmable DSP chips typically have only one MAC unit that can perform one MAC in less than a clock cycle. DSP processors, or programmable DSP chips, are flexible, but they might not be fast enough. The reason is that the DSP processor is general purpose and has an architecture that constantly requires instructions to be fetched, decoded and executed. ASICs can have multiple dedicated MACs that perform DSP functions in parallel. However, their high cost for low-volume production and the inability to make design modifications after production make them less attractive.

FPGAs have been praised for their ability to implement filters since the introduction of DSP-savvy architectures, on which filters can be efficiently realized using the dedicated DSP resources of these devices. More than 500 dedicated multiply-accumulate blocks are now available, making them exceptionally well suited for high-performance, high-order filtering applications that benefit from a parallel, non-resource-shared hardware architecture. In this particular project, the FPGA has been chosen as the implementation technology. (For details, please refer to Appendix B.) To program an FPGA, a hardware description language is needed. VHDL synthesis offers an easy way to target a model towards different implementations.

1.3 Design outcome

The objective of the project is to meet the demand for the development of courseware in the teaching unit Digital Signal Processing 304 together with Computer Engineering 203, for multidisciplinary purposes at the Department of Computer and Electrical Engineering. Several results have been achieved over the life of the project. The first is an examination of design procedures, or the design cycle, from different approaches. There is more than one way to implement the digital FIR filter. The purpose of the project is to identify the different approaches and to compare and contrast the different methods, so that, based on the design specification, a careful choice of implementation method can save the designer a lot of time and work. Further aims are to investigate the relationship among filter order, hardware architecture and FPGA resource usage; to identify the relationship between performance and filter hardware architecture; and to use MATLAB as the major tool to design filters, then compare and contrast the outcome with the outcome from other software packages.

1.4 Overview of the report

Chapter 2 gives a brief overview of the digital filter design and implementation tools and methods in use, and of the problems in creating a digital filter using the FPGA board. Chapter 3 describes the achievements of the project and the methodology that has been used. Chapter 4 describes the implementation of the FIR filter on the FPGA board. Chapter 5 describes the simulation results in detail. Chapter 6 reviews what has been achieved in the project and its future direction.


2 ISSUES IN IMPLEMENTING THE DIGITAL FILTER

2.1 Hardware Architecture

Deciding on an optimal architecture for a particular application involves tradeoffs between area, performance, and response. The abundance of dedicated MAC (multiply-accumulate) blocks in FPGA devices presents the option of implementing an area-efficient “serial” FIR filter, a high-performance “parallel” FIR filter, or something in between. Serial architectures share a single MAC resource, making them very area efficient but at the expense of performance, because one clock cycle is required for each tap delay register. This architecture is an excellent choice for area-sensitive, low-performance applications. (For details, please refer to Chapter 7.)

Figure 2-1 FPGAs Parallel Approach to DSP Enables Higher Computational Through

Although commonly implemented on FPGAs, a serial MAC FIR filter does not exploit the large number of dedicated hardware MAC units available on the newer devices. High-performance, high-order filtering applications that are able to exploit dedicated multiplier or DSP blocks often turn to FPGAs for a solution. By replicating the multiply-add logic once for each tap delay register, the entire filtering calculation can be performed in a single clock cycle. With more than 500 DSP blocks available, even very high-order filters can sustain input sampling rates over 400 MHz. This type of performance exceeds even the fastest DSP processors by orders of magnitude.
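The trade-off between the serial and the parallel arrangement can be put into numbers with a small calculation. The sketch below assumes an idealized model in which each MAC unit completes one tap per clock cycle and ignores pipeline latency; the 64-tap filter length is a hypothetical value chosen for the example, while the 400 MHz clock is the figure quoted above.

```python
import math


def max_sample_rate_hz(clock_hz, num_taps, num_macs):
    """Sustainable input sample rate when num_taps share num_macs MAC units."""
    cycles_per_sample = math.ceil(num_taps / num_macs)
    return clock_hz / cycles_per_sample


CLOCK_HZ = 400e6   # clock rate quoted in the text
NUM_TAPS = 64      # hypothetical filter length

serial_rate = max_sample_rate_hz(CLOCK_HZ, NUM_TAPS, num_macs=1)
parallel_rate = max_sample_rate_hz(CLOCK_HZ, NUM_TAPS, num_macs=NUM_TAPS)
print(serial_rate / 1e6, "MHz with one shared MAC")
print(parallel_rate / 1e6, "MHz with one MAC per tap")
```

Under these assumptions the fully parallel filter sustains the full clock rate, while the serial filter is slower by a factor equal to the number of taps; intermediate values of num_macs give the "something in between" option.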

Below are the block diagrams for an “unrolled” implementation of a direct form FIR filter and its symmetric and anti-symmetric forms. (For details, please refer to Chapter 7.)

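[Block diagram: the input X(n) passes through a chain of three registers producing X(n-1), X(n-2) and X(n-3); the four taps are multiplied by the coefficients a(0), a(1), a(2) and a(3).]

y(n)=a(0)x(n)+a(1)x(n-1)+a(2)x(n-2)+a(3)x(n-3)

Figure 2-2 “Unrolled” Direct Form FIR Filter Architecture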

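[Block diagram: the input X(n) passes through a chain of three registers (taps X(n-1), X(n-2), and so on); the taps are combined using only the two coefficients a(0) and a(1).]

y(n)=a(0)(x(n)+x(n-3))+a(1)(x(n-1)+x(n-2))

Figure 2-3 Direct Form FIR Filter Architecture/Symmetric form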
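
[Block diagram: the input X(n) passes through a chain of three registers (taps X(n-1), X(n-2), and so on); the taps are combined using only the two coefficients a(0) and a(1).]

y(n)=a(0)(x(n)-x(n-3))+a(1)(x(n-1)-x(n-2))

Figure 2-4 Direct Form FIR Filter Architecture/Anti-Symmetric form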
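The saving offered by the symmetric and anti-symmetric forms can be checked numerically. The following Python sketch, with hypothetical function names and test values, compares the folded equations of Figures 2-3 and 2-4 against a plain direct form evaluation; it assumes an even number of taps, as in the four-tap figures.

```python
def fir_direct(x_hist, coeffs):
    """x_hist[k] holds x(n-k); one multiplier per tap."""
    return sum(a * x for a, x in zip(coeffs, x_hist))


def fir_symmetric(x_hist, half_coeffs):
    """Symmetric form: a(k) = a(N-1-k), so paired taps are added first."""
    n = len(x_hist)
    return sum(a * (x_hist[k] + x_hist[n - 1 - k])
               for k, a in enumerate(half_coeffs))


def fir_antisymmetric(x_hist, half_coeffs):
    """Anti-symmetric form: a(k) = -a(N-1-k), so paired taps are subtracted."""
    n = len(x_hist)
    return sum(a * (x_hist[k] - x_hist[n - 1 - k])
               for k, a in enumerate(half_coeffs))


x_hist = [5, -2, 7, 1]   # x(n), x(n-1), x(n-2), x(n-3)
half = [3, 4]            # a(0), a(1)

sym = fir_symmetric(x_hist, half)
antisym = fir_antisymmetric(x_hist, half)
print(sym, fir_direct(x_hist, [3, 4, 4, 3]))
print(antisym, fir_direct(x_hist, [3, 4, -4, -3]))
```

Both folded forms use two multiplications instead of four, at the cost of one extra adder or subtractor per coefficient pair.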


When targeting an FPGA device with dedicated DSP blocks capable of supporting cascaded “multiply-add” operations, such as the Xilinx Virtex-4, the highest performance is achieved using a “transposed” architecture. Utilizing the same resources as a “direct” form FIR filter, data samples are applied in parallel to all tap multipliers through pipeline registers. The products are then applied to a cascaded chain of registered adders, combining the effect of accumulators and registers. (For details, please refer to Chapter 7.)

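[Block diagram: X(n) is applied in parallel to four multipliers with coefficients a(3), a(2), a(1) and a(0); the products feed a chain of adders separated by three registers. The partial sums entering the registers are a(3)X(n), a(3)X(n-1)+a(2)X(n) and a(3)X(n-2)+a(2)X(n-1)+a(1)X(n).]

y(n)=a(3)X(n-3)+a(2)X(n-2)+a(1)X(n-1)+a(0)X(n)

Figure 2-5 Transposed Direct form FIR Filter Architecture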
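A behavioral Python sketch of the transposed structure is given below to show that the chain of registered adders produces the same output as the direct form. The function names and the test data are assumptions made for the example, and the model ignores the optional input pipeline registers mentioned in the text.

```python
def fir_direct(samples, coeffs):
    """Reference direct form: y(n) = sum of a(k) * x(n-k)."""
    history = [0] * len(coeffs)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        outputs.append(sum(a * h for a, h in zip(coeffs, history)))
    return outputs


def fir_transposed(samples, coeffs):
    """Transposed form: every coefficient multiplies the current sample."""
    regs = [0] * (len(coeffs) - 1)   # regs[0] is the register nearest the output
    outputs = []
    for x in samples:
        products = [a * x for a in coeffs]
        y = products[0] + (regs[0] if regs else 0)
        # Each register captures its product plus the value of the register before it.
        next_regs = []
        for i in range(len(regs)):
            carried = regs[i + 1] if i + 1 < len(regs) else 0
            next_regs.append(products[i + 1] + carried)
        regs = next_regs
        outputs.append(y)
    return outputs


samples = [1, 2, 3, 4, 5]
coeffs = [1, 2, 3, 4]                # a(0) .. a(3)
transposed_out = fir_transposed(samples, coeffs)
direct_out = fir_direct(samples, coeffs)
print(transposed_out)
print(direct_out)
```

In hardware the benefit is that only one adder sits between any two registers, so the critical path does not grow with the filter order.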

A fourth FIR filter architecture worth considering for implementation in FPGAs is called “distributed arithmetic” (DA). A simplified view of a DA FIR filter is shown in Figure 2-6. In its most basic form, DA-based computation is bit-serial in nature; this is the serial distributed arithmetic (SDA) FIR filter. These computations can be parallelized using multiple sets of look-up tables to achieve concurrency. This architecture is referred to as parallel distributed arithmetic (PDA).


Figure 2-6 Serial Distributed Arithmetic FIR Filter Architecture
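The bit-serial look-up scheme of the distributed arithmetic filter can be modelled functionally in Python. The sketch below is a minimal model under stated assumptions: inputs are 8-bit two's complement integers, the look-up table holds every possible sum of coefficients, and all names and test values are hypothetical. It reproduces the arithmetic only, not the timing of the hardware.

```python
def build_da_lut(coeffs):
    """Table entry for an address is the sum of the coefficients whose bit is set."""
    taps = len(coeffs)
    return [
        sum(coeffs[k] for k in range(taps) if (address >> k) & 1)
        for address in range(1 << taps)
    ]


def da_fir_output(x_hist, coeffs, bits=8):
    """One output sample computed bit-serially; x_hist[k] holds x(n-k)."""
    lut = build_da_lut(coeffs)
    mask = (1 << bits) - 1
    acc = 0
    for b in range(bits):
        # Gather bit b of every tap into a look-up table address.
        address = 0
        for k, x in enumerate(x_hist):
            address |= (((x & mask) >> b) & 1) << k
        partial = lut[address] * (1 << b)   # shift by the bit weight
        if b == bits - 1:
            acc -= partial                  # sign bit carries negative weight
        else:
            acc += partial
    return acc


x_hist = [5, -3, 100, -128]
coeffs = [2, -1, 4, 3]
da_result = da_fir_output(x_hist, coeffs)
direct_result = sum(a * x for a, x in zip(coeffs, x_hist))
print(da_result, direct_result)
```

No multiplier appears in the loop: each step is a table look-up, a shift and an addition or subtraction. The number of loop iterations depends on the input word length rather than on the number of taps, while the table size grows with the number of taps.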

The advantage of the distributed arithmetic approach is its efficiency. The basic operations required are a sequence of table look-ups, additions, subtractions, and shifts of the input data sequence. These functions map efficiently onto LUT-based FPGAs, such as the Xilinx Virtex-II and Virtex-4™ and the Altera Stratix™ and Stratix II. Distributed arithmetic architectures are an excellent choice for high-performance filter applications targeting FPGA devices that do not include dedicated multipliers or DSP48 blocks, or when these resources are not available. A unique characteristic of the DA architecture is that the sample rate is decoupled from the filter length, making the DA architecture appealing for high-order filters. The trade-off introduced here is one of silicon area (FPGA slices) for performance. As the filter length is increased in a DA FIR filter, more logic resources are consumed, but throughput and performance are maintained. The table below summarizes the benefits of each architecture.

Filter Architecture: Direct
Pros:
• Area efficient when implemented using shared multiply-accumulate operations
• Optimal architecture for devices that offer a dedicated synchronous multiply block, such as the Xilinx Virtex-II and Virtex-II Pro
Cons:
• Performance decreases on large order filters
• Architecture does not allow for maximum efficiency of DSP blocks in the Virtex-4 architecture

Filter Architecture: Transposed
Pros:
• High performance when mapped to DSP blocks that support multiply-add and multiply-accumulate operations, such as the Xilinx Virtex-4 and the Altera Stratix-II
• Filter sample rate does not decrease with an increase in filter order
Cons:
• Not as area efficient as the direct form for small order filters

Filter Architecture: Distributed Arithmetic
Pros:
• Offers highest performance in FPGAs when dedicated DSP or synchronous multiplier blocks are not available
Cons:
• Performance diminishes with increase in bit-widths
• Consumes more area than direct or transposed forms

Table 1 Pros and Cons of the different hardware architecture

Although this discussion of filter hardware architectures has been limited to constant-coefficient, single-rate, single-channel FIR filters, the concepts also apply to programmable-coefficient, multi-rate, and multi-channel variations. The adaptive filters referenced in the overview, however, have distinct architectural characteristics that depart significantly from FIR filters.

2.2 FIR FILTER IMPLEMENT OPTIONS

There is more than one way to reach the solution; a proper choice of the implementation tools and techniques can save the designer a lot of work and time.


2.2.1 HDL coder

HDL Coder is integrated with the graphical user interface and command line of the Filter Design Toolbox to provide a unified design and implementation environment. The Filter Design and Analysis Tool (FDATool) or the MATLAB command line can be used to design a filter and generate VHDL or Verilog code. The design specification input to the Filter Design HDL Coder is a quantized filter that can be created in one of two ways: design and quantize the filter with the Filter Design Toolbox, or design the filter with the Signal Processing Toolbox and then quantize it with the Filter Design Toolbox. The Filter Design HDL Coder supports several important filter structures, including: Direct Form finite impulse response (FIR), Symmetric FIR, Anti-symmetric FIR, Transposed FIR, Direct Form I SOS infinite impulse response (IIR), Direct Form I Transposed SOS IIR, Direct Form II SOS IIR, and Direct Form II Transposed SOS IIR. Each of these IIR and FIR structures supports fixed-point and floating-point (double precision) realizations. In addition, the FIR structures also support unsigned fixed-point coefficients.

2.2.2 IP Generators

When implementing the actual filter on an FPGA, the designer is presented with a fundamental choice of whether to use an IP core or design a custom implementation. Both options have merits and limitations. FIR filter IP cores are readily available from multiple sources, with the most common forms being technology-specific netlists, synthesizable RTL, and synthesizable MATLAB. All these generators can construct a filter using a pre-defined coefficient table.


Xilinx and Altera both offer high-performance but device-specific filter IP in the form of technology-mapped netlists. This form of IP is typically the most cost effective and offers the best results but provides the fewest options. Often each core is hand-crafted to leverage device-specific hardware resources, such as Xilinx Block RAM, and includes placement information which helps maintain performance as filter size grows. The format of the IP, however, precludes user modification of the source or retargeting to a different device. The MathWorks offers device-independent filter IP; however, it does not provide the ability to exploit the high-performing resources of the FPGA. This form of IP tends to be a bit more expensive but offers the user the additional options of retargetability and modification of the RTL source. The quality of the results will vary with the quality of the RTL model and the RTL synthesis tool used for implementation. In general, however, some area and performance are sacrificed to obtain general-purpose, retargetable RTL models.

AccelChip offers both high-performance and device-independent IP, available in the AccelWare® IP Toolkits. AccelWare IP cores are provided as synthesizable MATLAB models that offer several advantages. First, the abstraction level of the IP allows for the greatest range of options. Describing a filter in MATLAB is much easier than creating a technology-specific placed netlist or RTL. This allows AccelChip to build a wider range of hardware architecture options into the model, providing the user the greatest number of choices. Secondly, the architecture of the final hardware can be further modified during the DSP synthesis process with AccelChip through the use of “directives.” This provides an efficient means of leveraging the architectural features of the specific FPGA device, such as dedicated DSP blocks, RAMs, ROMs, and shift registers, to achieve high quality of results in the target device. The table below summarizes the synthesis directives available in AccelChip. (For details, please refer to Chapter 7.)

AccelChip Synthesis Directive: Effect on Results
• Rolling/unrolling of For loops: Improves sustainable data throughput rate by reducing the number of cycles for processing per input sample
• Expansion of vector and matrix additions and multiplications: Improves sustainable data throughput by reducing the number of cycles for processing per input sample
• RAM/ROM memory mapping of 1D and 2D arrays: Improves FPGA utilization by mapping 1D and 2D arrays into dedicated FPGA RAM resources
• Pipeline insertion: Improves sustainable data throughput rate by improving clock frequency performance
• Shift register mapping: Improves FPGA utilization by mapping data buffers to shift register logic

IP generators provide an excellent choice when time-to-market or ease-of-use demands prevail. The general purpose nature of these programmable IP generators, however, seldom yields the optimal hardware for a specific application. For this, a user-defined architecture should be considered.


Figure 2-7 True-down DSP Design Flow


2.2.3 User Defined

FIR filter implementations with aggressive area or performance targets may require the use of a hand-crafted implementation instead of a pre-defined IP block. Doing so allows the designer to exploit application-specific characteristics, such as a unique impulse response, coefficient symmetry, ones in the coefficient table, negative coefficients, coefficient length, or knowledge about the dynamic range of the input data to the filter. Most filter IP generators include some simple coefficient symmetry optimization but do not account for every possible situation. User-defined implementations have traditionally been achieved using hand-coded RTL. This approach offers the designer a high degree of control over the implementation process, but it is decoupled from the MATLAB simulation environment and can be time consuming. A typical 32-tap FIR filter model can range between 300 and 500 lines of hand-written VHDL or Verilog code and will often require the instantiation of FPGA-specific RAMs or DSP blocks to achieve optimal results. The quantization must be determined for each signal and register in the design, which typically requires a rewrite of the original floating-point model in a language that supports fixed-point modeling and filter response analysis. Once complete, the final RTL code must be verified against the original filter model to ensure functional correctness. A recent survey conducted by AccelChip found that designers were evenly split between these major DSP design tasks when asked to identify their most significant bottlenecks.
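The quantization step described above can be illustrated with a short Python sketch that converts floating-point coefficients to fixed-point integers and measures the resulting error. The coefficient values, the choice of 7 fractional bits and the function names are assumptions made for the example and do not come from the filter designed in this project.

```python
def quantize(coeffs, frac_bits):
    """Scale each coefficient by 2**frac_bits and round to the nearest integer."""
    scale = 1 << frac_bits
    return [round(c * scale) for c in coeffs]


def max_quantization_error(coeffs, frac_bits):
    """Largest absolute difference between a coefficient and its fixed-point value."""
    scale = 1 << frac_bits
    quantized = quantize(coeffs, frac_bits)
    return max(abs(c - q / scale) for c, q in zip(coeffs, quantized))


coeffs = [0.1, 0.25, 0.3, 0.25, 0.1]   # hypothetical symmetric coefficient table
FRAC_BITS = 7                          # hypothetical fixed-point format

fixed = quantize(coeffs, FRAC_BITS)
error = max_quantization_error(coeffs, FRAC_BITS)
print(fixed)
print(error)
```

Rounding to the nearest integer keeps the error of every coefficient within half of one least significant bit, and a symmetric table stays symmetric after quantization. Whether that error is acceptable has to be judged by re-analysing the filter response with the quantized coefficients.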


Figure 2-8 Bottlenecks in the DSP Design Process

2.3 Solution to the current practice

Implementing hardware designs in Field Programmable Gate Arrays (FPGAs) is a formidable task. There is more than one way to implement a digital FIR filter. Based on the design specification, a careful choice of implementation method and tools can save a lot of time and work. MATLAB is an excellent tool for designing filters. Toolboxes are available that generate VHDL descriptions of the filters, which dramatically reduces the time required to generate a solution. The time saved can be spent evaluating different implementation alternatives. A proper choice of computation algorithms can exploit the FPGA architecture to make the design efficient in terms of speed and/or area.


[Diagram: FIR filter design method. Design tools (MATLAB, MATCAD, ...): FDATool, FVTool, Accelware Generations. Implementation tools (Xilinx, Altera, ...): look-up table, hardware description language, C language. Filter forms shown: Multi-Rate FIR filter (Idea form); Direct, Direct-Transposed, Symmetric and Anti-Symmetric; Distributed Arithmetic, with symmetric and non-symmetric FIR using SDA and using PDA; Multi-Rate FIR filter (user-defined input data).]

Figure 2-9 Proper implement techniques choices for the different Filter hardware architecture

Investigation during the first phase of the project found that the current market offers many design and implementation tools, each with advantages and disadvantages in some respects. The conclusion that can be drawn from the investigation is that, to implement a linear-phase filter, MATLAB or MATCAD is the best choice, as several toolboxes are available which remove much of the complexity of design and implementation. To design and implement a multi-rate FIR filter, the Filter Visualization Tool from MATLAB and the Accelware Generations software package are the best choice. For the distributed arithmetic hardware architecture, the best way to implement it is to use the look-up table technique; the distributed arithmetic algorithm is also the best choice for implementing FIR filters with Xilinx FPGAs. One of these implementations, namely Serial Distributed Arithmetic (SDA), is efficient in terms of area. The other implementation, Parallel Distributed Arithmetic (PDA), achieves a very high sample rate. Linear-phase FIR filters were also developed and implemented in an FPGA using the DA technique.


3 DESIGN SPECIFICATIONS

3.1 Expected features of new system

The objective of this system is to provide a hardware platform to test FIR filters that have been generated using MATLAB. The system performs the following functions:
1. Communicate with a PC using a standard RS-232 serial interface.
2. Receive a data file from the PC. Data will be received in binary form so that they can be applied to the FIR filter without further transformation. For this exercise, the FIR filter is assumed to receive 8-bit samples.
3. Return the outputs of the FIR filter to the PC, where they will be stored in a file. Again, the outputs of the FIR filter are assumed to be sent in binary form (without transformation).
4. To send and receive the files to and from the PC, a program like “HyperTerminal” can be used. The serial port must be configured with the following settings: baud rate 57600; 1 start bit; 8 data bits; 1 stop bit; no parity; no flow control. A clock running at 16 times this baud rate (921.6 kHz) is about 54 times slower than the 50 MHz on-board crystal clock.
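The relationship between the 50 MHz clock and the 57600 baud rate can be checked with a short calculation. The sketch below assumes that the clocks are produced by integer dividers and that the receiver uses a clock at 16 times the baud rate, as suggested by the section on the 16 times baud rate clock; the constant and function names are hypothetical.

```python
CLOCK_HZ = 50_000_000   # on-board crystal clock
BAUD = 57_600           # serial port setting
OVERSAMPLE = 16         # assumed receiver oversampling factor


def nearest_divider(clock_hz, target_hz):
    """Integer division ratio that comes closest to the target frequency."""
    return round(clock_hz / target_hz)


baud_divider = nearest_divider(CLOCK_HZ, BAUD)
oversample_divider = nearest_divider(CLOCK_HZ, BAUD * OVERSAMPLE)

# Baud rate actually obtained when the 16x clock uses the integer divider.
actual_baud = CLOCK_HZ / (oversample_divider * OVERSAMPLE)
error_percent = 100.0 * (actual_baud - BAUD) / BAUD

print(baud_divider, oversample_divider)
print(round(actual_baud, 2), round(error_percent, 3))
```

Under these assumptions the baud rate itself corresponds to a division by about 868, while the 16 times clock corresponds to a division by about 54. Because 54 is not the exact ratio, the resulting baud rate deviates from 57600 by a little under half a percent.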


3.2 Overview of the system

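[Block diagram: the PC connects through the RS-232 interface to the UART on the FPGA development board. The UART passes 8-bit data with DRdy to the FIR filter; the FIR filter returns 8-bit data with WrD and EOF. A clock divider driven by the 50 MHz on-board oscillator supplies the clock for the baud rate.]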

Figure 3.1 Overview of the system

In the figure above, two blocks do not have to be designed: the RS-232 interface, which is provided as part of the development board, and the FIR filter, which is generated using MATLAB. The blocks that have to be developed are the Universal Asynchronous Receiver-Transmitter (UART) and the clock divider. A brief description of each of these blocks is provided next.

The UART must receive and transmit data in a serial format. Every frame that is transmitted contains 1 start bit, 8 bits of information and 1 stop bit. The function of the UART can be divided in two: the transmission and the reception circuits. In this particular application, the FIR filter receives 8-bit numbers at its input. Hence, it is possible to feed the filter directly from the UART.

The receiver must perform the following functions:
• Detect the start of a frame by sensing the start bit.
• Sample the 8 bits of information at the frequency corresponding to the baud rate, least significant bit first.
• Present the 8 bits of information as one byte to the FIR filter.
• Generate a signal DRdy to latch the new data into the filter. This signal must work as a “write” signal to the filter.

The transmitter must perform the following functions:
• Receive 8-bit data from the FIR filter. Data will be latched in the transmitter using the WrD signal generated by the data manager.
• Transmit the data in serial format at the baud rate, least significant bit first. 1 start bit at the beginning and 1 stop bit at the end of the frame must be inserted.

The clock divider in this circuit must divide the 50 MHz clock available on the board to deliver a frequency related to the baud rate. This new clock is used to receive and send data over the serial channel at the correct speed.

The function of the FIR filter is to receive 8-bit data from the UART and apply the filtering process to the incoming data. This circuit must provide an interface compatible with the UART. In the figure above, this interface is represented by the signal EOF (End Of Filtering), which indicates when the filter has new data available on its output.
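The data path described in this section can be sketched in software as a loopback: each byte from the PC is framed, recovered by the receiver, passed through the filter, and framed again by the transmitter. The following Python sketch is only an illustration of that flow. The function names are hypothetical, the filter is a placeholder, and the check on the start and stop bits is an addition for the example (the design itself has no framing error detection).

```python
def frame_byte(value):
    """Return 10 line levels: start bit, 8 data bits (LSB first), stop bit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in 8 bits")
    data_bits = [(value >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]


def unframe_bits(bits):
    """Recover the byte from a 10-bit frame (start/stop check is illustrative)."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("malformed frame")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


def loopback(samples, filter_fn=lambda x: x):
    """PC -> receiver -> filter -> transmitter -> PC, one byte at a time."""
    out = []
    for sample in samples:
        received = unframe_bits(frame_byte(sample))      # UART receiver side
        filtered = filter_fn(received) & 0xFF            # placeholder for the FIR filter
        out.append(unframe_bits(frame_byte(filtered)))   # UART transmitter side
    return out
```

With the default placeholder filter, `loopback([1, 2, 255])` returns the input unchanged, which is a quick way to check the serial path before a real filter is attached.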


3.3 Design specification of FIR filters

A low-pass FIR filter has been implemented with the following specification:
• Response type: Low pass
• Design method: Equiripple
• Density factor: 20
• Filter order: 5
• Hardware architecture: Direct form
• Sampling frequency: 48000 Hz
• Pass band frequency: 9600 Hz
• Stop band frequency: 12000 Hz
• Input data length: 8 bits
• Output data length: 8 bits
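As an illustration of what a direct form filter with 8-bit input and 8-bit output computes, the following Python sketch may help. It is not the filter generated by MATLAB: the six coefficients are hypothetical values chosen only to give unity gain, and signed samples, floor rounding and saturation are assumptions made for the example.

```python
# Hypothetical symmetric 6-tap (order 5) coefficients, scaled by 2**7.
# They sum to 128, so the DC gain is 1.0. Not the MATLAB-designed values.
COEFFS = [4, 20, 40, 40, 20, 4]


def fir_direct_form(samples, coeffs=COEFFS, frac_bits=7):
    """Direct form FIR: y[n] = sum(coeffs[k] * x[n - k]) on signed 8-bit samples."""
    delay_line = [0] * len(coeffs)
    outputs = []
    for x in samples:
        delay_line = [x] + delay_line[:-1]                # shift in the newest sample
        acc = sum(c * d for c, d in zip(coeffs, delay_line))
        y = acc >> frac_bits                              # rescale (floor rounding)
        y = max(-128, min(127, y))                        # saturate to 8 bits
        outputs.append(y)
    return outputs
```

Feeding a single non-zero sample shows the scaled coefficients at the output, and a constant input settles to the same constant once the delay line is full.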

Figure 3-2 Magnitude response of the 8 bit-input and 8 bit-output FIR filter


Figure 3-3 Phase response of the 8 bit-input and 8 bit-output FIR filter


3.4 Overview of UART

Figure 3-4 Function diagram of the UART

The UART interfaces computers or microprocessors to an asynchronous serial data channel. The receiver converts serial start, data, parity and stop bits into parallel data. The transmitter converts parallel data into serial form and automatically adds start, parity and stop bits. The data word length can be 5, 6, 7 or 8 bits. Parity may be odd or even, and parity checking and generation can be inhibited. There may be one or two stop bits, or one and one-half when transmitting a 5-bit code. The UART can be used in a wide range of applications including modems, printers, peripherals and remote data acquisition systems. (For details please refer to the references in chapter 7.)

3.4 Design specifications of UART

In the UART (Universal Asynchronous Receiver-Transmitter) all data are transmitted and received bit by bit. The start bit and the stop bit mark the beginning and the end of the data frame respectively.


Figure 3-4 Flow chart representation of the system

The start bit is logic 0. It always comes at the front of the data frame to alert the receiver that data are arriving, and it is removed in the process of receiving. The asynchronous serial format allows 5 to 8 data bits per frame. Usually the data field is 7 or 8 bits, as required for transmitting ASCII data (all data are assumed to be binary numbers); in this design the data field is 8 bits long. Data are transmitted starting from the LSB, with the MSB at the end of the data field. For example, the letter C in ASCII code is 67 in decimal and 01000011 in binary; on the line it is sent as 11000010. A frame that does not follow this format is faulty.

The stop bit is logic 1. It always comes at the end of the data frame, and it can be 1 bit, 1.5 bits or 2 bits long. In this project it is 1 bit. There is no parity bit and no framing error bit.

The baud rate is the number of bits that can be received or transmitted per second. If the time needed to transmit or receive one bit is t, then the baud rate is 1/t and the corresponding transmit and receive clock is 1/t Hz. The baud rate of the transmitter and the receiver must be the same, otherwise errors can occur. In this particular design the clock running at 16 times the baud rate is taken to be 54 times slower than the clock signal on the board.
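The arithmetic behind the frame timing and the bit order of the letter C can be checked with a few lines of Python. This is only a worked example of the figures given above; the variable names are chosen for illustration.

```python
BAUD_RATE = 57600                    # bits per second, from the design specification

bit_time = 1.0 / BAUD_RATE           # t = 1 / baud rate, about 17.36 microseconds
frame_time = 10 * bit_time           # 1 start bit + 8 data bits + 1 stop bit
bytes_per_second = BAUD_RATE / 10    # each byte occupies one 10-bit frame

value = ord("C")                     # ASCII code of the letter C
msb_first = format(value, "08b")     # conventional binary notation
lsb_first = msb_first[::-1]          # order in which the data bits leave the UART
```

Here `value` is 67, `msb_first` is "01000011" and `lsb_first` is "11000010", in agreement with the example in the text. At 57600 baud at most 5760 bytes can be transferred per second.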

3.5 Design model for transmitter

To simplify the design, each frame has a start bit, 8 data bits and a stop bit, with no parity bit. The clock running at 16 times the baud rate is assumed to be 54 times slower than the clock signal from the board.

3.5.1 Design for transmitter

According to the design specification, the transmitter needs to send 10 bits (1 start bit, 8 data bits, 1 stop bit). After the stop bit the transmitter stops, the output returns to logic 1, and the transmitter stays in the idle state waiting for the next transmission.

The transmitter section accepts parallel data, formats the data and transmits them in serial form on the serial data output (sdo) terminal. Data are loaded from the inputs din7 to din0 into the transmitter buffer register by applying a logic high on the write input port. Valid data must be present at least tset prior to, and thold following, the rising edge of the write signal. If words of less than 8 bits are used, only the least significant bits are transmitted. The character is right justified, so the least significant bit corresponds to txbr(0). The rising edge of the write signal clears the transmitter buffer register. 0 to 1 clock cycles later, data are transferred to the transmitter register, the tx_rd pin goes to a low state, tx_rd is set high and the serial data are transmitted. The output data are clocked at a clock rate 16 times the data rate.

A second high-level pulse on write loads data into the transmitter buffer register. Data transfer to the transmitter register is delayed until transmission of the current data is complete. Data are then automatically transferred to the transmitter register and transmission of that character begins one clock cycle later.
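The interaction between the transmitter buffer register and the transmitter (shift) register can be illustrated with a small bit-level model. The Python sketch below is a simplification under stated assumptions: it advances one bit period per call instead of using the 16 times clock, it does not model tx_rd or the setup and hold times, and the class and method names are hypothetical.

```python
class Transmitter:
    """Bit-level model: a buffer register feeding a shift register."""

    def __init__(self):
        self.buffer = None   # transmitter buffer register (None means empty)
        self.shift = []      # bits of the current frame still to be sent
        self.sdo = 1         # serial data output idles at logic 1

    def write(self, byte):
        """Load the buffer register; return False if it is still occupied."""
        if self.buffer is not None:
            return False
        self.buffer = byte & 0xFF
        return True

    def tick(self):
        """Advance one bit period and return the level driven on sdo."""
        if not self.shift and self.buffer is not None:
            # Current frame is complete: move the buffered byte to the shift register.
            data = [(self.buffer >> i) & 1 for i in range(8)]   # LSB first
            self.shift = [0] + data + [1]                       # add start and stop bits
            self.buffer = None
        self.sdo = self.shift.pop(0) if self.shift else 1
        return self.sdo
```

In this model a second byte written while a frame is in progress waits in the buffer register and is sent immediately after the stop bit of the current frame, while a third write is refused until the buffer has been emptied.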

3.5.2 Pseudo code for baud rate clock

If logic level 1 is detected at the input port reset then
    The clk signal is logic 0
    Variable count is initialized to 000
Elsif there is a rising edge of the clock running at 16 times the baud rate then
    If the variable count is in the initial state 000 then
        Invert the clk signal
    Variable count is incremented by 1

3.5.3 Pseudo code for transmitter

If logic level 1 is detected at the input port reset then
    The clk signal is logic 0
    Variable count is initialized to 000
Elsif there is a rising edge of the clock running at 16 times the baud rate then
    If the variable count is in the initial state 000 then
        Invert the clk signal
    Variable count is incremented by 1

If logic level 1 is detected at the input port reset then
    The 8-bit serial register is 00000000
    The empty shift register signal is logic 0
    The tag bits signal for detecting the start bit is logic 0
    The output of the transmitter is logic 0
Elsif there is a rising edge of the baud rate clock then
    If shifting of the byte is done and data is ready in txhold then
        Load the transmitter register from the 8-bit buffer
        Tag bits for detecting are on
        Empty shift register is on
        Send the logic 0 start bit to the output port sdo
    Else
        Shift data
        If one byte of data has been shifted out then
            Shift out the stop bit or idle state
        Else
            Shift out the least significant bit of the serial register txsr(0)

If logic level 1 is detected at the input port reset then
    Send logic 0 to the signal txrd
    Variable wr1 (write signal delayed 1) is logic 0
    Variable wr2 (write signal delayed 2) is logic 0
    Variable txdone1 is logic 0
Elsif there is a rising edge of the clock running at 16 times the baud rate then
    If there is a falling edge on the write signal then
        Signal txrd is in the high state
    Elsif there is a falling edge on the txdone signal (txbr has been read) then
        Send logic 0 to the signal txrd
    Variable wr2 and variable wr1 are made equal
    Send the write strobe signal from the input port wr to the variable wr2
    Send the logic level of txdone to the variable txdone1

Latch the data bus during write
Transmitter is ready for write when no data is in the transmitter buffer register

3.6 Design model for receiver

Data are received in serial form at the input port Din. When no data are being received, Din must remain high. The data are clocked in by the clock, whose rate is 16 times the data rate. The UART receiver is implemented as a structural model. Its parts include:
1) Finite state machine
2) 8-bit counter
3) Serial to parallel converter

3.6.1 Design and pseudo code for serial to parallel converter

The serial to parallel converter is controlled by a finite state machine which has only two states: S0 is the idle state and S1 is the data tapping state. The output of the FSM is the control signal for the serial to parallel converter, which uses a counter to keep track of the position within the frame while tapping the data into a 10-bit logic vector. After the tenth bit has been tapped, the counter signals the finite state machine to go back to state S0.

When the current state is state S0
    If the start bit of the incoming data has been detected then
        Go to the next state S1
        The datardy signal is in the low state 0
    If the start bit of the incoming data has not been detected (idle state) then
        The next state is state S0
        The datardy signal is in the low state 0

When the current state is state S1 then
    If the counter is in the state hex number 18 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(0)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 28 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(1)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 38 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(2)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 48 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(3)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 58 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(4)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 68 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(5)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 78 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(6)
        The datardy signal is in the low state 0
        The next state is state S1
    ElsIf the counter is in the state hex number 88 then
        Sample the incoming data Din into the tmpdata register in bit tmpdata(7)
        The datardy signal is in the low state 0
        Output the data from the buffer register to the output port
        The next state is state S1
    ElsIf the counter is in the state hex number 95 then
        The datardy signal is in the high state 1
        The next state is state S1
    ElsIf the counter is in the state hex number 98 then
        The datardy signal is in the high state 1
        The next state is state S1
    ElsIf the counter is in the state hex number 9D then
        The datardy signal is in the low state 0
        Go back to the state S0
    Else (the counter is in any other state)
        Stay in the state S1
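The counter values in the pseudo code place each sample in the middle of a bit period: with 16 clock ticks per bit, count hex 18 (24) is the middle of the first data bit and count hex 88 (136) is the middle of the eighth. The following Python sketch models this behaviour on a list of line samples. It is a behavioural illustration only; the Drdy pulse is reduced to appending the byte at count hex 95, the start bit is detected by level rather than by edge, and `serialize` is a hypothetical helper used to build test input.

```python
SAMPLE_COUNTS = [0x18 + 0x10 * i for i in range(8)]   # 0x18, 0x28, ..., 0x88
DRDY_COUNT = 0x95                                      # Drdy goes high: byte is valid
END_COUNT = 0x9D                                       # return to the idle state S0


def receive(line):
    """Decode bytes from Din levels sampled at 16 times the baud rate."""
    received = []
    state, counter, tmpdata = "S0", 0, 0
    for din in line:
        if state == "S0":
            if din == 0:                               # start bit detected
                state, counter, tmpdata = "S1", 0, 0
        else:
            counter += 1
            if counter in SAMPLE_COUNTS:
                bit_index = SAMPLE_COUNTS.index(counter)
                tmpdata |= din << bit_index            # tap the middle of the data bit
            elif counter == DRDY_COUNT:
                received.append(tmpdata)
            elif counter == END_COUNT:
                state = "S0"
    return received


def serialize(byte, oversample=16):
    """Hypothetical helper: one frame expanded to 16 line samples per bit."""
    bits = [0] + [(byte >> i) & 1 for i in range(8)] + [1]
    return [b for b in bits for _ in range(oversample)]
```

For example, `receive(serialize(0x43) + serialize(0xA5))` recovers both bytes in order, since the state machine returns to S0 inside the stop bit, before the next start bit arrives.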


3.6.2 Design for finite state machine and pseudo code for finite state machine

The finite state machine moves from state S0 to state S1 on a rising edge of the clock signal; it stays in state S0 while the reset signal is high.

If logic 1 is detected at the input port reset then
    The current state is state S0
If there is a rising edge of the clock signal then
    The current state changes to the next state

3.6.3 Design and pseudo code for 8 bits counter

The 8-bit counter signals at predefined states; the serial to parallel converter uses these signals to tap the data. There are 11 such signals from the 8-bit counter.

If logic 1 is detected at the reset input port or the current state is state S0 then
    The counter initializes itself to “00000000”
Elsif there is a rising edge of the clock signal then
    The counter is incremented by 1


3.6.4 Finite state machine for receiver

[State diagram: states S0 to S10 arranged in a ring. S0 does nothing until a falling edge of the incoming data Din is detected. The following states load the incoming data into bre(1) to bre(8) in turn, and the last state sends the data before returning to S0.]

Figure 3-5 Finite state machine of the receiver

3.6.5 Pseudo code for receiver

If the current state is state S0 then
    If a falling edge of the incoming data has been detected (start bit 0 has been found) then
        Go to the state S1
    Else
        Stay in the state S0 and wait until the start bit has been detected

When the current state is state S1 then
    If the state control signal hex number 18 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 0
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 28 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 1
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 38 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 2
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 48 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 3
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 58 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 4
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 68 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 5
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 78 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 6
        Apply logic 0 to Drdy
    ElsIf the state control signal hex number 88 has been detected then
        Stay in the state S1
        Sample the incoming data into the tmpdata register, bit 7
        Apply logic 0 to Drdy
        Output all the sampled data to the port dout
    ElsIf the state control signal hex number 95 has been detected then
        Stay in the state S1
        Apply logic 1 to Drdy
    ElsIf the state control signal hex number 98 has been detected then
        Stay in the state S1
        Apply logic 1 to Drdy
    ElsIf the state control signal hex number 9D has been detected then
        Go back to the state S0
        Apply logic 0 to Drdy
    Else
        Go back to the state S1

When the state is neither state S0 nor state S1 then
    Go back to the state S0


3.7 Design for 16 times baud rate clock and its pseudo code

To generate a clock 16 times faster than the baud rate of 57600, the frequency of the on-board clock must be divided down to 16 × 57600 = 921600 Hz. The division ratio is 50 MHz / 921600 Hz, which is approximately 54.25. After rounding it becomes the integer 54.

If logic 1 is detected at the input port en then
    If there is a rising edge of the 50 MHz clock then
        Count is incremented by 1
        If 54 pulses have passed then
            Output logic 1
            The variable count is reset to 0
        Else
            Output logic 0
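The division ratio and the effect of rounding it to 54 can be checked numerically. The Python sketch below repeats the calculation and models the counter of the pseudo code, assuming the enable input en is held at logic 1; the names are chosen for illustration only.

```python
F_BOARD = 50_000_000        # on-board oscillator frequency in Hz
BAUD_RATE = 57_600
OVERSAMPLE = 16

ideal_divisor = F_BOARD / (OVERSAMPLE * BAUD_RATE)    # about 54.25
DIVISOR = round(ideal_divisor)                         # integer used in the design
actual_baud = F_BOARD / (DIVISOR * OVERSAMPLE)         # baud rate actually produced
error_percent = 100 * (actual_baud - BAUD_RATE) / BAUD_RATE


def clk16x_pulses(n_cycles, divisor=DIVISOR):
    """Output level for each 50 MHz clock cycle, with en assumed to be 1."""
    count, out = 0, []
    for _ in range(n_cycles):
        count += 1
        if count == divisor:       # 54 pulses have passed
            out.append(1)
            count = 0
        else:
            out.append(0)
    return out
```

With a divisor of 54 the resulting baud rate is slightly above 57600, an error of less than 0.5 percent, which is small compared with the half-bit margin available when each bit is sampled at its centre.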


4 Implementation on board

4.1 Overview

The Virtex-4™ FX12 LC Development kit provides a complete development platform for designing and verifying applications based on the Xilinx Virtex-4 FPGA family. This kit enables designers to implement DSP and embedded processor based applications with extreme flexibility using IP cores and customized modules. The Virtex-4 FPGA along with its integrated PowerPC processor core makes it possible to prototype processor based applications, giving software designers early access to a hardware platform prior to working with the final product/target board. The Virtex-4 FX12 LC system board utilizes the Xilinx XC4VFX12-10FF668C FPGA. The board includes 64MB of DDR SDRAM, 4MB of Flash, USB-RS232 Bridge, a 10/100/1000 Ethernet PHY, 100 MHz clock source, RS-232 port, and additional user support circuitry to develop a complete system. The board also supports the P160 expansion module standard, allowing application specific expansion modules to be easily added.


4.2 Onboard peripheral

Figure 4-1 Onboard peripheral

4.2.1 DDR SDRAM, Flash

The Virtex-4™ FX12 LC development board provides 64MB of DDR SDRAM memory (x16). A high-level block diagram of the DDR SDRAM interface is shown below followed by a table describing the SDRAM memory interface signals. The Virtex-4™ FX12 LC development board provides 4MB of flash memory (x16). A high-level block diagram of the flash interface is shown below followed by a table describing the flash memory interface signals.


4.2.2 Clock Sources

The Clock Generation section of the Virtex-4 FX12 LC board provides all the necessary clocks for the PowerPC processor, the I/O devices located on the board, as well as the DDR SDRAM memory. An on-board 100 MHz oscillator provides the system clock input to the processor section. This 100 MHz clock will be used by the Virtex-4 Digital Clock Managers (DCMs) to generate various processor clocks. In addition to the above clock inputs, a socket is provided on the board that can be used to provide a single-ended LVTTL clock input to the FPGA via an 8 or 4-pin oscillator.

4.2.3 10/100/1000 Ethernet PHY, LCD Panel

The Virtex-4 FX12 LC development board provides a 10/100/1000 Ethernet port for network connection. This interface uses the Virtex-4 embedded 10/100/1000 MAC. A high-level block diagram of the 10/100/1000 Ethernet interface is shown in the following figure followed by FPGA pin assignments for this interface. The Virtex-4 FX12 LC development board provides an 8-bit interface to a 2x16 LCD panel (MYTECH MOC-16216B-B). The following table shows the LCD interface signals.

4.2.4 USB 2.0 to RS232 Port

The Virtex-4 FX12 LC development board implements a USB 2.0 port. This is accomplished using the Cygnal CP2102 USB-to-UART Bridge Controller. The FPGA interfaces to the CP2102 as a simple UART. The UART interface to the CP2102 can run at speeds ranging from 300 to 921,600 baud. The CP2102 is a highly integrated USB-to-UART Bridge Controller, providing a simple solution for USB serial communications using a minimum of components and PCB space. The CP2102 includes a USB 2.0 full-speed function controller, USB transceiver, oscillator, EEPROM, and asynchronous serial data bus (UART) with full modem control signals in a compact 5mm X 5mm MLP-28 package. No other external USB components are required. The on-chip EEPROM may be used to customize the USB Vendor ID, Product ID, Product Description String, Power Descriptor, Device Release Number, and Device Serial Number as desired. The EEPROM is programmed on-board via the USB, allowing the programming step to be easily integrated into the product manufacturing and testing process. Royalty-free Virtual COM Port (VCP) device drivers provided by Cygnal allow the Virtex-4 FX12 LC development board to appear as a COM port to PC applications. The CP2102 UART interface implements all RS232 signals, including control and handshaking signals. These signals are interfaced to the Virtex-4 FPGA as follows:

Figure 4-2 USB2.0 to RS-232 port

4.2.5 RS232, Configuration and Debug Ports

The Virtex-4 FX12 LC development board provides an RS232 interface with RX and TX signals and jumpers for connecting the RTS and CTS signals. The following figure shows the RS232 interface to the Virtex-4 FX12 FPGA.

Various methods of configuration and debug support are provided on the Virtex-4 FX12 development board to assist designers during the testing and debugging of their applications. The following sections provide brief descriptions of each of these interfaces.


4.3 Configuration setup

The configuration for implementation on the board has been set up as follows:
• 50 MHz on-board clock: Pin 184
• USB switch: P16
• Reset button: Pin 22
• Serial data input to the receiver: Pin 11
• Serial data output from the transmitter: Pin 3

(For details please refer to the reference reading in chapter 7.)

Figure 4-3 Assign Pins for the designed system


4.4 Connection with PC

The design and implementation must be completed on the host computer system before the configuration file is downloaded into the FPGA. A connection cable links the host computer system to the target system.

Figure 4-3 Connection between the host system and the target system

HyperTerminal is a terminal program that enables a PC to communicate directly with a modem. This can be useful for testing the overall system design and diagnosing problems. For this particular design, HyperTerminal has been set up for a baud rate of 57600. (For details please refer to the reference reading in chapter 7.)


5 Simulation Results

5.1 Device utilization summary

One of the objectives of the project is to investigate resource usage inside the FPGA. The results below show the amount of each FPGA resource used by the linear-phase FIR filter for order 5, order 10 and order 15, and for each hardware architecture. Each entry gives the number used and the percentage of the available total.

Selected Device : 3s400pq208-4 (all designs)

Order 5
Resource                      Available   Direct       Transposed   Symmetric    Anti-Symmetric
Number of Slices              3584        143 (3%)     117 (3%)     117 (3%)     143 (3%)
Number of Slice Flip Flops    7168        131 (1%)     221 (3%)     131 (1%)     131 (1%)
Number of 4 input LUTs        7168        172 (2%)     173 (2%)     120 (1%)     172 (2%)
Number of bonded IOBs         141         54 (38%)     54 (38%)     55 (39%)     54 (38%)
Number of MULT18X18s          16          6 (37%)      3 (18%)      3 (18%)      6 (37%)
Number of GCLKs               8           1 (12%)      1 (12%)      1 (12%)      1 (12%)

Order 10
Resource                      Available   Direct       Transposed   Symmetric    Anti-Symmetric
Number of Slices              3584        206 (5%)     213 (5%)     180 (5%)     188 (5%)
Number of Slice Flip Flops    7168        212 (2%)     398 (5%)     212 (2%)     212 (2%)
Number of 4 input LUTs        7168        210 (2%)     213 (2%)     156 (2%)     174 (2%)
Number of bonded IOBs         141         55 (39%)     55 (39%)     56 (39%)     55 (39%)
Number of MULT18X18s          16          7 (43%)      4 (25%)      4 (25%)      6 (37%)
Number of GCLKs               8           1 (12%)      1 (12%)      1 (12%)      1 (12%)

Order 15
Resource                      Available   Direct       Transposed   Symmetric    Anti-Symmetric
Number of Slices              3584        413 (11%)    298 (8%)     346 (9%)     413 (11%)
Number of Slice Flip Flops    7168        292 (4%)     587 (8%)     293 (4%)     292 (4%)
Number of 4 input LUTs        7168        534 (7%)     537 (7%)     389 (5%)     534 (7%)
Number of bonded IOBs         141         55 (39%)     55 (39%)     56 (39%)     55 (39%)
Number of MULT18X18s          16          16 (100%)    8 (50%)      8 (50%)      16 (100%)
Number of GCLKs               8           1 (12%)      1 (12%)      1 (12%)      1 (12%)

From the results, the following conclusions can be drawn. For the same filter order, the direct architecture uses more multipliers than the transposed architecture, although the transposed architecture uses more slice flip-flops. The anti-symmetric architecture takes more resources than the symmetric architecture; since the symmetric architecture shares some components, this result is reasonable.
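One way to see why a symmetric structure can share components is that, for symmetric coefficients, each pair of mirrored samples can be added first and multiplied once. The Python sketch below illustrates this folding under that assumption; it is not the structure generated by the tools, and the function names and coefficient values are hypothetical.

```python
def fir_direct(window, coeffs):
    """One output sample using one multiplication per tap."""
    return sum(c * x for c, x in zip(coeffs, window))


def fir_symmetric(window, coeffs):
    """Same output for symmetric coeffs, pre-adding mirrored samples."""
    n = len(coeffs)
    half = n // 2
    acc = sum(coeffs[k] * (window[k] + window[n - 1 - k]) for k in range(half))
    if n % 2:                          # odd length: the centre tap stands alone
        acc += coeffs[half] * window[half]
    return acc


def multiplier_count(n_taps, symmetric):
    """Multiplications per output sample for each structure."""
    return (n_taps + 1) // 2 if symmetric else n_taps
```

For an order 5 filter (6 taps) this count gives 6 multiplications for the direct form and 3 for the symmetric form, and for order 15 (16 taps) it gives 16 and 8, which agrees with the MULT18X18 figures reported for those two orders. The order 10 figures do not follow this simple count.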


[Bar chart: Resources Usage for Direct FIR filter. Vertical axis from 0 to 600; bars for Slice, FFs, LUTs, IOBs, MULT and GCLKs at Order 5, Order 10 and Order 15.]

The graph shows that, for the direct form FIR filter architecture, resource usage increases correspondingly as the filter order increases.

[Bar chart: Resources Usage for Transposed form FIR filter. Vertical axis from 0 to 600; bars for Slice, FFs, LUTs, IOBs, MULT and GCLKs at Order 5, Order 10 and Order 15.]

The graph shows that, for the transposed form FIR filter architecture, resource usage increases correspondingly as the filter order increases.

[Bar chart: Resources Usage for Symmetric FIR filter. Vertical axis from 0 to 400; bars for Slice, FFs, LUTs, IOBs, MULT and GCLKs at Order 5, Order 10 and Order 15.]

The graph shows that, for the symmetric form FIR filter architecture, resource usage increases correspondingly as the filter order increases.

[Bar chart: Resources Usage for Anti-Symmetric FIR filter. Vertical axis from 0 to 600; bars for Slice, FFs, LUTs, IOBs, MULT and GCLKs at Order 5, Order 10 and Order 15.]

The graph shows that, for the anti-symmetric form FIR filter architecture, resource usage increases correspondingly as the filter order increases.


5.2 Simulation of the behavior model

The simulation result for the UART transmitter is correct and the behaviour model meets the design requirement: the result shows that the bits of the 8-bit register loaded from the parallel port appear at the serial output in reverse order, so that the least significant bit is sent first and the most significant bit last. There is no parity check bit and no frame error bit.

Figure 5-1 Simulation behavior of the UART Transmitter

It is hard to judge the behaviour of the FIR low-pass filter from this simulation data. However, the behaviour model was generated from the design specification in section 3.3, so it should be correct.


Figure 5-2 Simulation behavior of the FIR filter

The simulation result for the 16 times baud rate clock is correct: clk16x produces exactly one pulse for every 54 pulses of the 50 MHz clock.

Figure 5-3 Simulation behavior of the 16 times baud rate clock

The simulation result for the receiver is correct.

Figure 5-4 Simulation behavior of the UART receiver


6 CONCLUSION

The traditional methodology for designing a filter consists of two phases: system specification and hardware implementation. Both phases require multiple iterations and can involve a four-to-six-week process just to ensure that a single functional block operates to specification within the system. Creating this filter from the ground up, using either VHDL or other design entry methods, would have required too much development time. MATLAB is an excellent tool for designing filters. Toolboxes are available that generate VHDL descriptions of the filters, which dramatically reduces the time required to generate a solution. There is more than one way to implement a digital FIR filter. Based on the design specification, a careful choice of implementation method and tools can save a lot of time and work, and the time saved can be spent evaluating different implementation alternatives. A suitable computation algorithm can exploit the FPGA architecture to make the design efficient in terms of speed and/or area.

As the filter order increases, resource usage increases correspondingly. The non-symmetric FIR filter took a larger amount of resources than the symmetric FIR filter, because more hardware components are needed to implement it.

For future work, a digital-to-analogue converter, an analogue-to-digital converter and an IIR filter could be implemented on the FPGA board.


7 BIBLIOGRAPHY

Abdelhak M. Zoubir, “Digital Signal Processing and Spectral Analysis”, Manuscript for DSP 304, pp. 18, 60, 75, 1999.
AccelChip, Inc., “AccelWare IP for DSP Design Targeting FPGAs and ASICs”, pp. 5, 2004.
AccelChip, Inc., “Comparison of Methods for Implementing DSP Algorithms”, pp. 5, July 2004.
AccelChip, Inc., “Filter Design Methods for FPGAs”, pp. 1-10, 2004.
AccelChip, Inc., “Automated Conversion of Floating-point to Fixed-point MATLAB”, pp. 6, 2004.
AccelChip, Inc., “What’s the Right Language for DSP System-level Design?”, pp. 3, 2004.
AccelChip, Inc., “MATLAB as a System Modeling Language for Hardware”, pp. 9, 2004.
AccelChip, Inc., “Top-Down DSP Design Flow to Silicon Implementation”, pp. 6-8, 2004.
Nirman Kendra, “SI40UART01 Universal Asynchronous Receiver/Transmitter Core Factsheet”, 12/20/2002.
Intersil, Inc., “CMOS Universal Asynchronous Receiver Transmitter (UART)”, March 1997.
Memec, Inc., “Virtex-4™ FX12 LC Development Board User’s Guide”, 2005.


8 APPENDICES

Appendix A: Background knowledge about filter design

Table 8.1 The four classes of FIR filter and their associated properties

Table 8.2 Suitability of a given class of filter for the four FIR types


Figure 8.3 (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).
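The windows plotted in the figure can be reproduced numerically. The following Python sketch uses the standard Kaiser window definition, w[n] = I0(β·sqrt(1 − (2n/(N−1) − 1)²)) / I0(β), with the zeroth-order modified Bessel function I0 evaluated from its power series. The function names are hypothetical, and the series length of 40 terms is an assumption that is ample for the β values used in the figure.

```python
import math


def bessel_i0(x, terms=40):
    """Zeroth-order modified Bessel function: sum over k of ((x/2)^k / k!)^2."""
    half_sq = (x / 2.0) ** 2
    term = 1.0
    total = 1.0
    for k in range(1, terms):
        term *= half_sq / (k * k)   # ratio between consecutive series terms
        total += term
    return total


def kaiser_window(n_taps, beta):
    """Return the n_taps samples of a Kaiser window with shape parameter beta."""
    if n_taps == 1:
        return [1.0]
    window = []
    for n in range(n_taps):
        ratio = 2.0 * n / (n_taps - 1) - 1.0          # runs from -1 to +1
        arg = beta * math.sqrt(max(0.0, 1.0 - ratio * ratio))
        window.append(bessel_i0(arg) / bessel_i0(beta))
    return window


rectangular = kaiser_window(21, 0.0)   # beta = 0 gives a rectangular window
tapered = kaiser_window(21, 6.0)       # beta = 6 tapers strongly towards the ends
```

With β = 0 every sample equals one, which is the rectangular window. Increasing β lowers the end samples, which trades a wider main lobe for lower side lobes in the Fourier transform.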


Appendix B: Filter design using FDATool

FDATool was used to enter the design specification. Start the FDATool by entering the fdatool command in the MATLAB Command Window. MATLAB displays the Filter Design & Analysis Tool dialog.

In the Filter Design & Analysis Tool dialog, check that the following filter options are set:

Option                      Value
Response Type               Lowpass
Design Method               FIR Equiripple
Filter Order                Specify order: 5
Options                     Density Factor: 20
Frequency Specifications    Units: kHz
                            Fs: 10
                            Fpass: 2
                            Fstop: 3
Magnitude Specifications    Units: dB
                            Apass: 1
                            Astop: 80
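As a quick check of what these settings mean, the Python sketch below converts the band edges to fractions of the Nyquist frequency and the stopband attenuation to a linear gain. The constants are copied from the table above; the helper names are hypothetical and the sketch is not part of the FDATool flow.

```python
# Values from the option table (frequencies in kHz, attenuation in dB).
FS_KHZ = 10.0
FPASS_KHZ = 2.0
FSTOP_KHZ = 3.0
ASTOP_DB = 80.0
FILTER_ORDER = 5


def normalise(freq_khz, fs_khz):
    """Express a frequency as a fraction of the Nyquist frequency (fs / 2)."""
    return freq_khz / (fs_khz / 2.0)


def db_to_linear(attenuation_db):
    """Convert an attenuation in dB to a linear amplitude ratio."""
    return 10.0 ** (-attenuation_db / 20.0)


wpass = normalise(FPASS_KHZ, FS_KHZ)        # passband edge, 0.4 of Nyquist
wstop = normalise(FSTOP_KHZ, FS_KHZ)        # stopband edge, 0.6 of Nyquist
n_taps = FILTER_ORDER + 1                   # an order-N FIR filter has N + 1 taps
stop_gain = db_to_linear(ASTOP_DB)          # 80 dB corresponds to a gain of 1e-4
```

An order of 5 gives six coefficients, which matches the six constants coeff1 to coeff6 in the VHDL listing in Appendix C.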

Click the Set Quantization Parameters button in the left-side tool bar. The FDATool displays a Filter arithmetic menu in the bottom half of its dialog.

Select Fixed-point from the Filter arithmetic list. The FDATool displays the first of three tabbed panels of quantization parameters across the bottom half of its dialog. You use the quantization options to test the effects of various settings with a goal of optimizing the quantized filter's performance and accuracy.
Tab                Parameter                                        Setting
Coefficients       Numerator word length                            16
                   Best-precision fraction lengths                  Selected
                   Use unsigned representation                      Cleared
                   Scale the numerator coefficients to fully
                   utilize the entire dynamic range                 Cleared
Input/Output       Input word length                                16
                   Input fraction length                            15
                   Output word length                               16
                   Avoid overflow                                   Selected
Filter Internals   Round towards                                    Floor
                   Overflow mode                                    Saturate
                   Product mode                                     Full precision
                   Accum. mode                                      Keep MSB
                   Accum. word length                               40
                   Cast signals before accum                        Selected

Click Apply
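The effect of the word length, fraction length, rounding and overflow settings can be sketched in a few lines of Python. This is a minimal model that assumes signed two's-complement values with floor rounding and saturation, as selected in the table; the function names are hypothetical and the model is not MATLAB's own quantizer.

```python
import math


def quantize(value, word_length, fraction_length):
    """Convert a real value to a signed fixed-point integer code.

    Rounds towards negative infinity (Floor) and clips to the
    representable range on overflow (Saturate).
    """
    scaled = math.floor(value * (1 << fraction_length))
    max_code = (1 << (word_length - 1)) - 1
    min_code = -(1 << (word_length - 1))
    return max(min_code, min(max_code, scaled))


def to_real(code, fraction_length):
    """Convert a fixed-point integer code back to its real value."""
    return code / (1 << fraction_length)


half = quantize(0.5, 16, 15)             # 16-bit word, 15 fraction bits
clipped = quantize(1.0, 16, 15)          # +1.0 is not representable and saturates
small_negative = quantize(-0.00001, 16, 15)   # floor rounds away from zero here
```

With 16-bit words and 15 fraction bits the representable range is −1 to just under +1, so an input of exactly 1.0 saturates to the largest code. Floor rounding always moves a value towards negative infinity, which is why the small negative input becomes −1 rather than 0.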

Start the Filter Design HDL Coder by selecting Targets–>Generate HDL in the FDATool dialog. The FDATool displays the Generate HDL dialog.


Find the Filter Design HDL Coder online help. Use the help to learn about product details or to get answers to questions as you work with the designer.
a. In the MATLAB window, click the Help button in the toolbar or click Help–>Full Product Family Help.
b. In the Help browser's Contents pane, select Filter Design HDL Coder.
c. Minimize the Help browser.

Click the Help button. The FDATool displays context-sensitive help for the dialog. As necessary, use the Help button on the other Filter Design HDL Coder dialogs for context-sensitive help on those dialog views. Close the Help window.

Place your cursor over the Name label or text box in the HDL filter pane of the Generate HDL dialog and right-click. A What's This? button appears.

Click What's This? The Filter Design HDL Coder opens context-sensitive help that describes the Name option. Use the context-sensitive help as needed while using the GUI to configure options that control the contents and style of the generated HDL code and test bench. A help topic is available for each option and pane.

In the Name text box of the HDL filter pane, replace the default name with basicfir. This option names the VHDL entity and the file that is to contain the filter's VHDL code.

In the Name text box of the Test bench types pane, replace the default name with basicfir_tb. This option names the generated test bench file.

Click HDL Options. The Filter Design HDL Coder displays an HDL Options dialog.

In the Comment in header text box, type Tutorial - Basic FIR Filter and then click Apply. The Filter Design HDL Coder adds the comment to the end of the header comment block in each generated file.


Select the Ports tab. The Ports pane appears.

Change the names of the input and output ports. Replace filter_in with data_in and filter_out with data_out. Clear the check box for the Add input register option. The Ports pane should now look like the following.

Click Apply and then OK to register your changes and close the HDL Options dialog. Click Test Bench Options. The Filter Design HDL Coder displays a Test Bench Options dialog.


You use this dialog to customize the generated test bench. For this tutorial, apply the default settings by clicking OK. In the Generate HDL dialog, click Generate to start the code generation process. The Filter Design HDL Coder displays the following messages in the MATLAB Command Window as it generates the filter and test bench VHDL files.
### Starting VHDL code generation process for filter: basicfir
### Generating basicfir.vhd file in: hdlsrc
### Starting generation of basicfir VHDL entity
### Starting generation of basicfir VHDL architecture
### HDL latency is 1 samples
### Successful completion of VHDL code generation process for filter: basicfir
### Starting generation of VHDL Test Bench
### Generating input stimulus
### Done generating input stimulus; length 3429 samples.
### Generating VHDL file basicfir_tb.vhd in: hdlsrc
### Please wait ........
### Done generating VHDL test bench.

As the messages indicate, the Filter Design HDL Coder creates the directory hdlsrc under your current working directory and places the files basicfir.vhd and basicfir_tb.vhd in that directory. The generated VHDL code has the following characteristics:

• VHDL entity named basicfir.
• Registers that use asynchronous resets when the reset signal is active high (1).
• Ports have the following names:

  VHDL Port             Name
  Input                 data_in
  Output                data_out
  Clock input           clk
  Clock enable input    clk_enable
  Reset input           reset

• An extra register for handling filter output.
• Clock input, clock enable input and reset ports are of type STD_LOGIC and data input and output ports are of type STD_LOGIC_VECTOR.
• Coefficients are named coeffn, where n is the coefficient number, starting with 1.
• Type safe representation is used when zeros are concatenated: '0' & '0'...
• Registers are generated with the statement ELSIF clk'event AND clk='1' THEN rather than with the rising_edge function.
• The postfix string _process is appended to process names.

The generated test bench:
• Is a portable VHDL file.
• Forces clock, clock enable, and reset input signals.
• Forces the clock enable input signal to active high.
• Drives the clock input signal high (1) for 5 nanoseconds and low (0) for 5 nanoseconds.
• Forces the reset signal for two cycles plus a hold time of 2 nanoseconds.
• Applies a hold time of 2 nanoseconds to data input signals.
• Applies impulse, step, ramp, chirp, and white noise stimulus types.

When you have finished generating code, click Close to close the Generate HDL dialog. Low-pass FIR filter has been implemented with


Appendix B: Circuit diagram

Figure 8-4 Circuit diagram of the overall system


Appendix C: Code

16 times baud rate clock

library ieee;
use ieee.std_logic_1164.all;

entity count54 is
  port(clk50Mx, en : in  std_logic;
       clk16x      : out std_logic);
end count54;

architecture count54_arc of count54 is
begin
  process(clk50Mx, en)
    variable count : integer range 0 to 54 := 0;
  begin
    if en = '0' then
      clk16x <= '0';
    elsif (rising_edge(clk50Mx)) then
      count := count + 1;
      if count = 54 then
        clk16x <= '1';
        count := 0;
      else
        clk16x <= '0';
      end if;
    end if;
  end process;
end count54_arc;

Direct form FIR filter

LIBRARY IEEE;
USE IEEE.std_logic_1164.all;
USE IEEE.numeric_std.ALL;

ENTITY filter IS
  PORT( clk        : IN  std_logic;
        clk_enable : IN  std_logic;
        reset      : IN  std_logic;
        data_in    : IN  std_logic_vector(7 DOWNTO 0); -- sfix8_En7
        data_out   : OUT std_logic_vector(7 DOWNTO 0)  -- sfix8_En7
  );
END filter;

----------------------------------------------------------------
-- Module Architecture: filter
----------------------------------------------------------------
ARCHITECTURE rtl OF filter IS
  -- Local Functions

  -- Type Definitions
  TYPE delay_pipeline_type IS ARRAY (NATURAL range <>) OF signed(7 DOWNTO 0); -- sfix8_En7

  -- Constants
  CONSTANT coeff1 : signed(15 DOWNTO 0) := to_signed(-4423, 16); -- sfix16_En16
  CONSTANT coeff2 : signed(15 DOWNTO 0) := to_signed(18623, 16); -- sfix16_En16
  CONSTANT coeff3 : signed(15 DOWNTO 0) := to_signed(29114, 16); -- sfix16_En16
  CONSTANT coeff4 : signed(15 DOWNTO 0) := to_signed(29114, 16); -- sfix16_En16
  CONSTANT coeff5 : signed(15 DOWNTO 0) := to_signed(18623, 16); -- sfix16_En16
  CONSTANT coeff6 : signed(15 DOWNTO 0) := to_signed(-4423, 16); -- sfix16_En16

  -- Signals
  SIGNAL delay_pipeline     : delay_pipeline_type(0 TO 4); -- sfix8_En7
  SIGNAL data_in_regtype    : signed(7 DOWNTO 0);  -- sfix8_En7
  SIGNAL product6           : signed(31 DOWNTO 0); -- sfix32_En31
  SIGNAL mul_temp           : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL product5           : signed(31 DOWNTO 0); -- sfix32_En31
  SIGNAL mul_temp_1         : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL product4           : signed(31 DOWNTO 0); -- sfix32_En31
  SIGNAL mul_temp_2         : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL product3           : signed(31 DOWNTO 0); -- sfix32_En31
  SIGNAL mul_temp_3         : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL product2           : signed(31 DOWNTO 0); -- sfix32_En31
  SIGNAL mul_temp_4         : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL product1           : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL mul_temp_5         : signed(23 DOWNTO 0); -- sfix24_En23
  SIGNAL sum1               : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL add_temp           : signed(35 DOWNTO 0); -- sfix36_En31
  SIGNAL sum2               : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL add_temp_1         : signed(35 DOWNTO 0); -- sfix36_En31
  SIGNAL sum3               : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL add_temp_2         : signed(35 DOWNTO 0); -- sfix36_En31
  SIGNAL sum4               : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL add_temp_3         : signed(35 DOWNTO 0); -- sfix36_En31
  SIGNAL sum5               : signed(34 DOWNTO 0); -- sfix35_En31
  SIGNAL add_temp_4         : signed(35 DOWNTO 0); -- sfix36_En31
  SIGNAL output_typeconvert : signed(7 DOWNTO 0);  -- sfix8_En7
  SIGNAL output_register    : signed(7 DOWNTO 0);  -- sfix8_En7

BEGIN

  -- Block Statements
  Delay_Pipeline_process : PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      delay_pipeline(0 TO 4) <= (OTHERS => (OTHERS => '0'));
    ELSIF clk'event AND clk = '1' THEN
      IF clk_enable = '1' THEN
        delay_pipeline(0) <= signed(data_in);
        delay_pipeline(1 TO 4) <= delay_pipeline(0 TO 3);
      END IF;
    END IF;
  END PROCESS Delay_Pipeline_process;

  data_in_regtype <= signed(data_in);

  mul_temp <= delay_pipeline(4) * coeff6;
  product6 <= resize( mul_temp & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

  mul_temp_1 <= delay_pipeline(3) * coeff5;
  product5 <= resize( mul_temp_1 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

  mul_temp_2 <= delay_pipeline(2) * coeff4;
  product4 <= resize( mul_temp_2 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

  mul_temp_3 <= delay_pipeline(1) * coeff3;
  product3 <= resize( mul_temp_3 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

  mul_temp_4 <= delay_pipeline(0) * coeff2;
  product2 <= resize( mul_temp_4 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 32);

  mul_temp_5 <= data_in_regtype * coeff1;
  product1 <= resize( mul_temp_5 & '0' & '0' & '0' & '0' & '0' & '0' & '0' & '0', 35);

  add_temp <= resize(product1, 36) + resize(product2, 36);
  sum1 <= (34 => '0', OTHERS => '1') WHEN add_temp(35) = '0' AND add_temp(34) /= '0'
     ELSE (34 => '1', OTHERS => '0') WHEN add_temp(35) = '1' AND add_temp(34) /= '1'
     ELSE (add_temp(34 DOWNTO 0));

  add_temp_1 <= resize(sum1, 36) + resize(product3, 36);
  sum2 <= (34 => '0', OTHERS => '1') WHEN add_temp_1(35) = '0' AND add_temp_1(34) /= '0'
     ELSE (34 => '1', OTHERS => '0') WHEN add_temp_1(35) = '1' AND add_temp_1(34) /= '1'
     ELSE (add_temp_1(34 DOWNTO 0));

  add_temp_2 <= resize(sum2, 36) + resize(product4, 36);
  sum3 <= (34 => '0', OTHERS => '1') WHEN add_temp_2(35) = '0' AND add_temp_2(34) /= '0'
     ELSE (34 => '1', OTHERS => '0') WHEN add_temp_2(35) = '1' AND add_temp_2(34) /= '1'
     ELSE (add_temp_2(34 DOWNTO 0));

  add_temp_3 <= resize(sum3, 36) + resize(product5, 36);
  sum4 <= (34 => '0', OTHERS => '1') WHEN add_temp_3(35) = '0' AND add_temp_3(34) /= '0'
     ELSE (34 => '1', OTHERS => '0') WHEN add_temp_3(35) = '1' AND add_temp_3(34) /= '1'
     ELSE (add_temp_3(34 DOWNTO 0));

  add_temp_4 <= resize(sum4, 36) + resize(product6, 36);
  sum5 <= (34 => '0', OTHERS => '1') WHEN add_temp_4(35) = '0' AND add_temp_4(34) /= '0'
     ELSE (34 => '1', OTHERS => '0') WHEN add_temp_4(35) = '1' AND add_temp_4(34) /= '1'
     ELSE (add_temp_4(34 DOWNTO 0));

  output_typeconvert <= (7 => '0', OTHERS => '1') WHEN sum5(34) = '0' AND sum5(33 DOWNTO 31) /= "000"
     ELSE (7 => '1', OTHERS => '0') WHEN sum5(34) = '1' AND sum5(33 DOWNTO 31) /= "111"
     ELSE (sum5(31 DOWNTO 24));

  Output_Register_process : PROCESS (clk, reset)
  BEGIN
    IF reset = '1' THEN
      output_register <= (OTHERS => '0');
    ELSIF clk'event AND clk = '1' THEN
      IF clk_enable = '1' THEN
        output_register <= output_typeconvert;
      END IF;
    END IF;
  END PROCESS Output_Register_process;

  -- Assignment Statements
  data_out <= std_logic_vector(output_register);

END rtl;

UART_Receiver

----------------------------------------------------
library ieee;

use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;
----------------------------------------------------
entity rx is
  port( Din   : in  std_logic;
        clock : in  std_logic;
        reset : in  std_logic;
        Dout  : out std_logic_vector(7 downto 0);
        Drdy  : out std_logic
  );
end rx;
----------------------------------------------------
architecture FSM of rx is
  -- define the states of FSM model
  type state_type is (S0, S1);
  signal next_state, current_state : state_type;
  signal tmpdata : std_logic_vector(9 downto 0);
  signal datardy : std_logic;
  signal count   : std_logic_vector(7 downto 0);
  signal start   : std_logic;
begin
  -- concurrent process#1: state registers
  process(clock, reset)
  begin
    if (reset='1') then
      current_state <= S0;
    elsif (clock'event and clock='1') then
      current_state <= next_state;
    end if;
  end process;

  process(reset, Din, current_state)
  begin
    if (reset='1') or current_state=S0 then
      start <= '0';
    elsif current_state = S0 and (Din'event and Din='0') then
      start <= '1';
    end if;
  end process;

  -- count is held at zero in S0 and counts 16x clock ticks in S1
  process(clock, reset, current_state)
  begin
    if reset='1' or current_state = S0 then
      count <= "00000000";
    elsif (clock'event and clock='1') then
      count <= count + 1;
    end if;
  end process;

  process(clock, count, Din, current_state)
  begin
    next_state <= current_state;
    --if next_state<=current_state then
    case current_state is
      when S0 =>
        -- note: "start <= '1'" is a less-than-or-equal comparison here
        if start <= '1' then
          next_state <= S1;
          datardy <= '0';
        else
          next_state <= S0;
        end if;
      when S1 =>
        -- each data bit is sampled at the middle of its 16-tick bit period
        if count=x"18" then
          next_state <= S1;
          tmpdata(0) <= Din;
          datardy <= '0';
        elsif count=x"28" then
          next_state <= S1;
          tmpdata(1) <= Din;
          datardy <= '0';
        elsif count=x"38" then
          next_state <= S1;
          tmpdata(2) <= Din;
          datardy <= '0';
        elsif count=x"48" then
          next_state <= S1;
          tmpdata(3) <= Din;
          datardy <= '0';
        elsif count=x"58" then
          next_state <= S1;
          tmpdata(4) <= Din;
          datardy <= '0';
        elsif count=x"68" then
          next_state <= S1;
          tmpdata(5) <= Din;
          datardy <= '0';
        elsif count=x"78" then
          next_state <= S1;
          tmpdata(6) <= Din;
          datardy <= '0';
        elsif count=x"88" then
          next_state <= S1;
          tmpdata(7) <= Din;
          datardy <= '0';
          dout <= tmpdata(7 downto 0);
        elsif count=x"95" then
          next_state <= S1;
          datardy <= '1';
        elsif count=x"98" then
          next_state <= S1;
          datardy <= '1';
        elsif count=x"9D" then
          next_state <= S0;
          datardy <= '0';
        else
          next_state <= S1;
        end if;
      when others =>
        next_state <= S0;
    end case;
    -- end if;
  end process;

  Drdy <= datardy;
end FSM;

UART_Transmitter

library ieee;
use ieee.std_logic_1164.all;
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;

entity Tx is
  port ( clk16 : in  std_logic;                    -- Input clock with 50 MHz clock signal
         wr    : in  std_logic;                    -- Write strobe signal
         rst   : in  std_logic;                    -- Reset button
         din   : in  std_logic_vector(7 downto 0); -- 8 Bits data bus Input
         sdo   : out std_logic;                    -- Serial transmit data line
         tx_rd : out std_logic);                   -- Transmitter ready to Tx next byte

end Tx;

architecture Tx_arc of Tx is
  signal txbr   : std_logic_vector(7 downto 0); -- 8 Bits buffer register
  signal txsr   : std_logic_vector(7 downto 0); -- 8 Bits serial register
  signal txd2   : std_logic; -- tag bits for detecting
  signal txd1   : std_logic; -- empty shift register
  signal clk    : std_logic; -- Transmit clock or baud rate clock
  signal txdone : std_logic; -- '1' when shifting of byte is done
  signal txrd   : std_logic; -- '1' when data is ready in txhold
begin

  process (rst, clk16)
    -- Toggle clk every 8 counts, which divides the clock frequency by 16
    variable count : std_logic_vector(2 downto 0);
  begin
    if rst='1' then
      clk <= '0';
      count := (others=>'0');
    elsif clk16'event and clk16='1' then
      if (count = "000") then
        clk <= not clk;
      end if;
      count := count + "001";
    end if;
  end process;

  process (rst, clk)
  begin
    if rst='1' then
      txsr <= (others=>'0');
      txd1 <= '0';
      txd2 <= '0';
      sdo  <= '0';
    elsif clk'event and clk = '1' then
      if (txdone and txrd) = '1' then -- Initialize registers and load next byte of data
        txsr <= txbr; -- Load tx register from txbr
        txd2 <= '1';  -- Tag bits for detecting
        txd1 <= '1';  -- when shifting is done
        sdo  <= '0';  -- Start bit to the serial port
      else
        txsr <= txd1 & txsr(7 downto 1); -- Shift data
        txd1 <= txd2;
        txd2 <= '0';
        if txdone = '1' then -- Shift out data
          sdo <= '1';        -- Stop or idle bit
        else
          sdo <= txsr(0);    -- Shift data bit
        end if;
      end if;
    end if;
  end process;

  txdone <= not(txd2 or txd1 or txsr(7) or txsr(6) or txsr(5) or txsr(4)
                or txsr(3) or txsr(2) or txsr(1) or txsr(0));
  -- txdone = 1 when done shifting (When txtag2 has reached tx)

  process (rst, clk16)
    variable wr1, wr2 : std_logic; -- write signal delayed 1 and 2 cycles, or current state and previous state
    variable txdone1  : std_logic; -- txdone signal delayed one cycle
  begin
    if rst='1' then
      txrd <= '0';
      wr1 := '0';
      wr2 := '0';
      txdone1 := '0';
    elsif clk16'event and clk16 = '1' then
      if wr1 = '0' and wr2 = '1' then -- Falling edge on write signal. New data in txbr register
        txrd <= '1';
      elsif txdone = '0' and txdone1 = '1' then -- Falling edge on txdone signal. Txbr has been read.
        txrd <= '0';
      end if;
      wr2 := wr1; -- Delayed versions of write and txdone signals for edge detection
      wr1 := wr;
      txdone1 := txdone;
    end if;
  end process;

  txbr <= din when wr = '1' else txbr; -- Latch data bus during write
  tx_rd <= not txrd; -- Transmitter ready for write when no data is in txbr
end Tx_arc;
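The arithmetic of the direct form FIR filter listing can be checked with a small bit-level model. The Python sketch below is not part of the design: it assumes integer inputs already in the signed 8-bit range, reuses the six coefficient constants from the listing, and models the combinational signal output_typeconvert only, so the one-clock delay of the output register is left out. The function names are hypothetical.

```python
# coeff1..coeff6 from the VHDL listing (sfix16_En16).
COEFFS = [-4423, 18623, 29114, 29114, 18623, -4423]


def saturate(value, bits):
    """Clip an integer to the signed two's-complement range of the given width."""
    lo = -(1 << (bits - 1))
    hi = (1 << (bits - 1)) - 1
    return max(lo, min(hi, value))


def fir_model(samples):
    """Return the value of output_typeconvert for each input sample.

    samples: integers assumed to lie in -128..127 (sfix8_En7).
    """
    # history[0] models data_in, history[1..5] model delay_pipeline(0..4).
    history = [0] * len(COEFFS)
    outputs = []
    for x in samples:
        history = [x] + history[:-1]
        acc = 0
        for c, h in zip(COEFFS, history):
            product = (c * h) << 8                # 24-bit product padded to En31
            acc = saturate(acc + product, 35)     # each adder saturates to 35 bits
        outputs.append(saturate(acc >> 24, 8))    # keep bits 31..24, saturate to 8 bits
    return outputs


impulse_response = fir_model([64, 0, 0, 0, 0, 0, 0])
step_response = fir_model([100] * 6)
```

An impulse of amplitude 64 (0.5 in sfix8_En7) returns the coefficients scaled and floored to 8 bits. The coefficients sum to about 1.32, so a constant input of 100 drives the output into saturation at 127 once the pipeline is full.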

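The timing of the UART listings can be summarised in a short Python sketch. It takes the divide-by-54 ratio from count54, the 50 MHz clock implied by the clk50Mx port name, and the receiver sample counts x"18" to x"88" from the rx listing. It assumes that the receiver count is zero at the falling edge of the start bit, which idealises the state machine, and the function names are hypothetical.

```python
CLOCK_HZ = 50_000_000   # assumed board clock, from the clk50Mx port name
DIVIDER = 54            # count54 pulses clk16x once every 54 input clocks
OVERSAMPLE = 16         # clk16x ticks per bit period


def baud_rate(clock_hz=CLOCK_HZ, divider=DIVIDER, oversample=OVERSAMPLE):
    """Bit rate that results from the 16x clock."""
    return clock_hz / divider / oversample


def uart_frame(byte):
    """Start bit (0), eight data bits LSB first, stop bit (1)."""
    return [0] + [(byte >> k) & 1 for k in range(8)] + [1]


def oversampled_line(byte, idle_ticks=5):
    """Serial line as seen on every clk16x tick, with a short idle period first."""
    line = [1] * idle_ticks
    for bit in uart_frame(byte):
        line += [bit] * OVERSAMPLE
    return line


# Receiver sample points: 0x18, 0x28, ..., 0x88 ticks after the start edge.
SAMPLE_COUNTS = [0x18 + 0x10 * k for k in range(8)]


def receive(line):
    """Recover one byte by sampling the middle of each data bit."""
    start = line.index(0)   # first low tick marks the start bit
    byte = 0
    for k, count in enumerate(SAMPLE_COUNTS):
        byte |= line[start + count] << k
    return byte
```

Under these assumptions the divider gives about 57 870 baud. Each sample count equals 16(k + 1) + 8 ticks, which is one start-bit period plus k data-bit periods plus half a bit, so every data bit is read at its centre.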