FPGA Implementation Of Adaptive Noise Cancellation

PROJECT REPORT Submitted in partial fulfillment of the requirements for the award of M.Tech Degree in Electronics and Communication Engineering (Applied Electronics and Instrumentation Engg.) of the University of Kerala

Submitted by JERIN K ANTONY Second Semester M.Tech, Applied Electronics and Instrumentation Engineering




This is to certify that this project entitled “FPGA Implementation Of Adaptive Noise Cancellation” is a bona fide record of the work done by Muhamed Shereef P, under our guidance towards partial fulfillment of the requirements for the award of Master of Technology Degree in Electronics and Communication Engineering (Applied Electronics and Instrumentation), of the University of Kerala during the year 2012.

Mr. Prajith C.A. Assistant Professor Dept. of ECE CET (Project Coordinator)

Dr. Jiji C.V. Professor Dept. of ECE CET (P.G. Co-ordinator)

Prof. J David Professor Dept. of ECE CET (Head of the Department)

I would like to express my sincere gratitude and heartfelt indebtedness to the project coordinator, Mr. Prajith, Assistant Professor, Department of Electronics and Communication Engineering, for his valuable guidance and encouragement in pursuing this project.

I am thankful to Prof. J David, Head of the Department, and Prof. Jiji C.V., P.G. Coordinator, Department of Electronics and Communication Engineering, for their help and support.

Above all I am thankful to the God Almighty.

Jerin K Antony


This paper presents a hardware implementation of a least mean square (LMS) adaptive filter based Adaptive Noise Canceller (ANC) structure on an FPGA using the VHDL hardware description language. First, the adaptive parameters are obtained by simulating the ANC in MATLAB. Second, the data processed by the FPGA, such as the step size, the input and output signals, the desired signal, and the coefficients of the ANC, are expressed exactly in a fixed-point data representation. Finally, the functions of the FPGA-based system structure for the LMS algorithm are synthesized, simulated, and implemented in time sequence on a Xilinx XC3S500E FPGA using the Xilinx ISE 9.2i development tool. The research results show that it is feasible to implement, train on chip, and use an adaptive LMS filter based ANC in a single FPGA chip.




2.1 Random Signals
2.2 Stationary Signals
2.3 Speech Signals

3.1 Structure of Adaptive Filter
3.2 LMS Adaptive Algorithm

4.1 General description on VHDL
4.2 Steps in VHDL Design Process
4.3 Spartan-3E FPGA Chip





Figure 1.1: Noise cancellation scheme.
Figure 1.2: Filter system.
Figure 1.3: Noise cancellation
Figure 2.1: Speech signal representation.
Figure 3.1: Adaptive filter block diagram.
Figure 3.2: Error surface and error contour.
Figure 4.1: Overview of the design and implementation sequence of a VHDL project.
Figure 4.2: Spartan-3E Family Architecture.



Adaptive filters are widely used in digital signal processing systems, both in the communication industry and in applications such as adaptive noise cancellation, adaptive beamforming, and channel equalization. Implementing them efficiently is demanding, and this has become an important topic in digital system design. An adaptive filter is usually implemented on a DSP processor because of its capability to perform fast floating-point arithmetic. However, as FPGAs (Field Programmable Gate Arrays) have grown in capacity and offer more facilities to designers, they have become an important competitor in the signal processing market. In addition, an FPGA is a form of programmable logic that can be reconfigured repeatedly. Since an FPGA consists of slices organized as an array of rows and columns, a great deal of parallelism can be exploited. Although floating-point arithmetic is not efficient on an FPGA because of the large area it needs, fixed-point arithmetic is sufficient for the adaptive filter to work well.

In general, the FIR structure has been used more successfully than the IIR structure in adaptive filters. The output of an FIR filter is the convolution of its input with its coefficients, which have constant values. An adaptive FIR filter, however, requires an appropriate algorithm to update the filter's coefficients. The algorithm used here is the Least Mean Square (LMS) algorithm, which is known for its simplicity, low computational complexity, and good performance in different running environments. Compared with other algorithms used for implementing adaptive filters, the LMS algorithm performs well in terms of the number of iterations required for convergence. The Recursive Least Squares (RLS) algorithm, for example, converges faster than LMS but is much more complex to implement, which degrades system performance in terms of speed and FPGA area used.

One of the adaptive filter applications is the adaptive noise canceller. Figure 1.1 describes its structure, where the desired response is composed of a signal plus noise that is uncorrelated with the signal. The filter input is a sequence of noise which is correlated with the noise in the desired signal. By using the LMS algorithm inside the adaptive filter, the error term e(n) produced by this system is then the original signal with the noise signal cancelled.

Figure 1.1: Noise cancellation scheme.

Figure 1.2: Filter system.

The method used to cancel the noise signal is known as adaptive filtering. Adaptive filters are dynamic filters which iteratively alter their characteristics in order to achieve an optimal desired output. An adaptive filter algorithmically alters its parameters in order to minimize a function of the difference between the desired output d(n) and its actual output y(n). This function is known as the objective function of the adaptive algorithm. Figure 1.3 shows a block diagram of the adaptive echo cancellation model, where x(n) is the input signal, the filter H(n) represents the impulse response of the acoustic environment, and W(n) represents the adaptive filter used to cancel the echo signal. The adaptive filter aims to equate its output y(n) to the desired output d(n) (the signal reverberated within the acoustic environment). The external noise input no(n) is neglected here. At each iteration the error signal, e(n) = d(n) − y(n), is fed back into the filter, where the filter characteristics are altered accordingly.
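As an illustration of what such an objective function can look like, the following sketch writes out the mean-square-error objective that is commonly associated with the LMS algorithm. The symbols J(n) and X(n) (the vector of the most recent input samples) are notation assumed here for the example and do not appear in the text above.

```latex
% Mean-square-error objective and its instantaneous LMS approximation
\begin{align}
  J(n) &= E\left[ e^{2}(n) \right], \qquad e(n) = d(n) - y(n), \qquad y(n) = W^{T}(n)\, X(n) \\
  \hat{J}(n) &= e^{2}(n)
    \quad\Rightarrow\quad
    \frac{\partial \hat{J}(n)}{\partial W(n)} = -2\, e(n)\, X(n) \\
  W(n+1) &= W(n) - \mu\, \frac{\partial \hat{J}(n)}{\partial W(n)}
          = W(n) + 2\mu\, e(n)\, X(n)
\end{align}
```

Under these assumptions, one steepest-descent step on the instantaneous squared error gives the familiar LMS weight update with step size µ.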



The equalization algorithm is essential to the performance of an equalizer. As equalization technology has developed, many adaptive algorithms have become available. Among them, the Least Mean Square (LMS) algorithm is known for its simplicity, low computational complexity, and good performance in different running environments. The LMS algorithm may be described using the following equations:

$$y_k = X_k^T W_k = W_k^T X_k \qquad (2.1)$$

$$\varepsilon_k = d_k - y_k = d_k - W_k^T X_k \qquad (2.2)$$

where $y_k$ denotes the output of the filter, $X_k = [x_{1k}\; x_{2k}\; \cdots\; x_{Lk}]^T$ is the input vector to the filter, $W_k = [w_{1k}\; w_{2k}\; \cdots\; w_{Lk}]^T$ represents the weight vector, $\varepsilon_k$ is the error signal of the filter output, and $d_k$ denotes the desired output signal. The weight vector of the LMS algorithm is updated by

$$W_{k+1} = W_k + 2\mu\, \varepsilon_k X_k \qquad (2.3)$$

where $\mu$ is a constant step size. The structure of the LMS algorithm is shown below.

Figure 2.1: Structure of LMS algorithm
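The LMS equations can be tried out in software before committing to hardware. The following is a minimal sketch in plain Python, not the implementation used in this project: the function name `lms_noise_canceller`, the two-tap noise path, the test signal, and the step size are all assumptions chosen for illustration. The variable `e` plays the role of the error signal in the equations above.

```python
import math
import random


def lms_noise_canceller(reference, primary, num_taps=2, mu=0.005):
    """Apply the LMS update W[k+1] = W[k] + 2*mu*e[k]*X[k] sample by sample.

    reference: noise-only input fed to the adaptive filter
    primary:   desired input, i.e. signal plus correlated noise
    Returns (error, weights); the error sequence is the cleaned signal.
    """
    weights = [0.0] * num_taps
    taps = [0.0] * num_taps              # input vector, newest sample first
    error = []
    for x, d in zip(reference, primary):
        taps = [x] + taps[:-1]           # shift the new reference sample in
        y = sum(w * t for w, t in zip(weights, taps))    # filter output
        e = d - y                                        # error signal
        weights = [w + 2.0 * mu * e * t for w, t in zip(weights, taps)]
        error.append(e)
    return error, weights


# Hypothetical test case: a slow sine wave corrupted by filtered white noise.
random.seed(0)
n_samples = 5000
signal = [0.5 * math.sin(2 * math.pi * n / 50) for n in range(n_samples)]
noise = [random.uniform(-1.0, 1.0) for _ in range(n_samples)]
path = [0.8, -0.3]                       # assumed 2-tap path from noise source to primary input
corrupting = [path[0] * noise[n] + (path[1] * noise[n - 1] if n else 0.0)
              for n in range(n_samples)]
primary = [s + c for s, c in zip(signal, corrupting)]

cleaned, final_weights = lms_noise_canceller(noise, primary)
```

In this setup the adapted weights should approach the assumed noise path, so that the error sequence approximates the clean sine wave once the filter has converged.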

FPGA Implementation of LMS Algorithm


Processing of positive and negative number
The data processed by the FPGA, such as the input signals and the filter coefficients, may be positive or negative. Accordingly, signed numbers must be used to express all data inside the FPGA. In the signed-number representation, the highest bit is used as the sign bit: a binary "0" in the highest bit denotes a positive number, and a binary "1" denotes a negative number.


Floating-point representation of Data

All the data are represented as floating-point numbers in IEEE 754 format. Each value is a 32-bit binary number whose 32nd bit is the sign bit: "1" indicates a negative number and "0" a positive number. The value of an IEEE 754 number is computed as:

$$\text{value} = \text{sign} \times 2^{\text{exponent}} \times \text{mantissa}$$

The sign is stored in bit 32. The exponent is computed from bits 24-31 by subtracting 127. The mantissa (also known as the significand or fraction) is stored in bits 1-23. An invisible leading bit (i.e. it is not actually stored) with value 1.0 is placed in front; bit 23 then has a value of 1/2, bit 22 a value of 1/4, and so on. As a result, the mantissa has a value between 1.0 and 2. If the exponent reaches -127 (binary 00000000), the leading 1 is no longer used, which enables gradual underflow.
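The field layout described above can be checked with a few lines of Python. The sketch below is only an illustration: the helper names `to_bits` and `decode_float32` are made up for this example, and the three words decoded at the end are the initial weight value, the value 1.0, and the step-size constant that appear in the VHDL listings of this report. The standard `struct` module is used as an independent cross-check.

```python
import struct


def to_bits(word):
    """Format a 32-bit integer as a 32-character bit string, MSB first."""
    return format(word, "032b")


def decode_float32(bits):
    """Evaluate sign * 2**exponent * mantissa from the three bit fields."""
    if len(bits) != 32 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of 32 binary digits")
    sign = -1.0 if bits[0] == "1" else 1.0        # bit 32
    biased_exponent = int(bits[1:9], 2)           # bits 31..24
    fraction = int(bits[9:], 2) / 2.0 ** 23       # bits 23..1
    if biased_exponent == 255:
        raise ValueError("infinities and NaNs are not handled in this sketch")
    if biased_exponent == 0:                      # no hidden leading 1
        return sign * 2.0 ** -126 * fraction
    return sign * 2.0 ** (biased_exponent - 127) * (1.0 + fraction)


def decode_with_struct(word):
    """Cross-check using the standard library's single-precision codec."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


initial_weight = decode_float32(to_bits(0xC0000000))
one = decode_float32(to_bits(0x3F800000))
step = decode_float32(to_bits(0x3E19999A))
```

Decoding by hand and decoding through `struct` should agree for every normal and denormal input, which makes the helper a convenient way to read the bit patterns used in the test bench.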


FPGA-based system structure of LMS algorithm
The whole system is divided into a data storage module, a state control module, an output computation module, an error adjustment module, and a weight update module, each with its own function. The whole system is shown in Figure 3.1.

Figure 3.1: FPGA-based system structure of LMS algorithm.

3.3.1 Data storage module

The data storage module stores the input signal $x_k$, the desired signal $d_k$, and the weight value $w_k$. To supply data to the later operations, the data words are also extended in bit width as required by the number system of the floating-point operations. An Embedded Array Block (EAB) inside the FPGA provides the data storage.

3.3.2 State control module

The state control module initializes the system and controls the timing of each module. Based on a detailed analysis of the computation process of the LMS algorithm and of the timing relations among the modules, a strict state control method is used: driven by the clock signal and the function signals, the system works as a pipeline that combines serial and parallel operation.

3.3.3 Output computation module

Under the action of the control signals, the output computation module performs the convolution of the input signal $x_k$ with the weight values and passes the result to the error adjustment module or to the output port. This logic cell requires a large number of multiply-accumulate operations. A fully parallel system improves real-time performance but consumes more FPGA resources; for example, an eighth-order FIR filter needs eight multipliers to compute one output in parallel. A partially parallel scheme is therefore adopted: the weight vector is divided into groups of two weight values, with serial operation within a group and parallel operation among groups. The multiplier of each group is time-multiplexed, which reduces the number of multipliers, improves the utilization of FPGA resources, and exploits the modularity of the FPGA, so the system is easy to expand.

3.3.4 Error adjustment module

First, the error adjustment module obtains the output error $\varepsilon_k$ from the desired signal $d_k$ and the filter output $y_k$. Then the term $2\mu\varepsilon_k$ in Eq. (2.3) is calculated and used as the iterative factor of the weight vector. This logic cell therefore implements a subtraction and a multiplication.
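The grouping scheme of the output computation module and the iterative factor of the error adjustment module can be mimicked in software to see how many multipliers and serial steps are involved. The sketch below is a behavioural illustration only; the function names, the eight-tap example data, and the step size are assumptions, and no timing of the real hardware is modelled.

```python
def grouped_fir_output(taps, weights, group_size=2):
    """Compute the filter output with one time-shared multiplier per group.

    Returns (y, serial_steps, multipliers). Groups work side by side in
    hardware; inside a group the products are formed one after another.
    """
    if len(taps) != len(weights) or len(taps) % group_size:
        raise ValueError("taps and weights must match and fill whole groups")
    starts = range(0, len(taps), group_size)
    partial = [0.0 for _ in starts]              # one accumulator per group
    for step in range(group_size):               # serial steps inside a group
        for g, start in enumerate(starts):       # parallel across groups
            i = start + step
            partial[g] += taps[i] * weights[i]   # one multiply per group per step
    return sum(partial), group_size, len(partial)


def error_factor(desired, output, mu):
    """Iterative factor 2*mu*error used by the weight update."""
    return 2.0 * mu * (desired - output)


# Hypothetical eight-tap example.
taps = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
weights = [0.5] * 8
y, serial_steps, multipliers = grouped_fir_output(taps, weights)
factor = error_factor(20.0, y, 0.125)
```

With groups of two, the eight-tap example uses four multipliers for two serial steps instead of eight multipliers for one, which is the resource trade-off described for the output computation module.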




General description on VHDL
Hardware description languages (HDLs) are used to describe hardware for the purpose of simulation, modelling, design, testing and documentation. These languages are used to represent the functional and wiring details of digital systems in a compact form. An HDL must uniquely and unambiguously define the hardware at various levels of abstraction. The IEEE standard VHDL is such a language. VHDL stands for VHSIC HDL, where VHSIC stands for Very High Speed Integrated Circuit. It was developed in 1981 by the US Department of Defence as a language to describe the structure and function of hardware. IEEE standardized it in 1987. It is widely used as a standard tool for the design, manufacturing and documentation of digital circuits of various levels of complexity. The only other major HDL which is prevalent is the IEEE standard Verilog HDL.

VHDL is a general-purpose hardware description language which can be used to describe and simulate the operation of a wide range of digital systems, from those containing only a few gates up to systems formed by the interconnection of many complex integrated circuits. VHDL can describe a system at the behavioural, data flow and structural levels. At the behavioural level, the digital system is described by the functions that it performs, with no details as to how the design is implemented. Complex hardware units are first described at the behavioural level to simulate and test the design ideas. At the data flow level, the system is described in terms of the flow of control and the movement of data. This involves the architecture of busses and control hardware. The structural description is the lowest and most detailed level of description and is the simplest to synthesize into hardware. It includes a list of concurrently active components and their interconnections.

A VHDL description has two main parts: the entity part and the architecture part. The entity describes the system as a single component as seen by external devices. This part provides information about the system’s interface while revealing nothing about its internal structure and behaviour. The architecture part of a VHDL description is used to specify the behaviour and the internal structure of the system.


Steps in VHDL Design Process
VHDL is conducive to top-down design. The whole system is first described at the block diagram level, and each component in the diagram is then described in further detail. The system can be simulated at each level of design to observe the output. The design flow is shown in Figure 4.1.

Figure 4.1: Overview of the design and implementation sequence of a VHDL project

Once the design of the system is completed, the corresponding program is entered and synthesized. The Xilinx ISE 9.2i software package is used in this project to implement the VHDL program. The system can be simulated to observe and verify its behaviour. The design is then implemented on the FPGA chip.



Spartan-3E FPGA Chip
The Spartan-3E family architecture consists of five fundamental programmable functional elements:

• Configurable Logic Blocks (CLBs)
• Input/Output Blocks (IOBs)
• Block RAM
• Multiplier Blocks
• Digital Clock Manager (DCM) Blocks

Figure 4.2: Spartan-3E Family Architecture

These elements are organized as shown in Figure 4.2. A ring of IOBs surrounds a regular array of CLBs. Each device has two columns of block RAM except for the XC3S100E, which has one column. Each RAM column consists of several 18-Kbit RAM blocks. Each block RAM is associated with a dedicated multiplier. The DCMs are positioned in the center with two at the top and two at the bottom of the device. The XC3S100E has only one DCM at the top and bottom, while the XC3S1200E and XC3S1600E add two DCMs in the middle of the left and right sides.


An adaptive filter module is designed using VHDL. The inputs to this module are:

1. Distorted signal r(n) (32 bit)
2. Desired signal d(n) (32 bit)
3. Clock (1 bit)
4. New data (1 bit)

The outputs are:

1. Output y(n) (32 bit)
2. Output ready (1 bit)

The schematic diagram of adaptive filter is given below.

Figure 5.1: Adaptive filter

Based on the LMS algorithm, an adaptive equalizer is designed and implemented using the hardware description language VHDL. The design is simulated on Xilinx ISE 12.3i and a satisfactory result is obtained.



----------------------------------------------------------------
-- Mainfile.vhd
----------------------------------------------------------------
library IEEE;
use IEEE.STD_LOGIC_1164.ALL;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
use IEEE.NUMERIC_STD.ALL;

-- Uncomment the following library declaration if instantiating
-- any Xilinx primitives in this code.
--library UNISIM;
--use UNISIM.VComponents.all;

-- Two-tap LMS adaptive filter built from 32-bit floating-point cores.
entity adaptivfilter is
  port (
    afi      : in  std_logic_vector(31 downto 0);  -- filter input sample
    bfi      : in  std_logic_vector(31 downto 0);  -- desired signal sample
    mu2      : in  std_logic_vector(31 downto 0);  -- step-size factor multiplying the error
    ndfi     : in  std_logic;                      -- new data strobe
    rfdfi    : out std_logic;                      -- ready for data
    clk      : in  std_logic;
    resultfi : out std_logic_vector(31 downto 0);
    rdyfi    : out std_logic                       -- result ready
  );
end adaptivfilter;

architecture Behavioral of adaptivfilter is

  -- Tap inputs and weights (initial bit pattern encodes -2.0)
  signal xin0 : std_logic_vector(31 downto 0) := "11000000000000000000000000000000";
  signal xin1 : std_logic_vector(31 downto 0) := "11000000000000000000000000000000";
  signal wt0  : std_logic_vector(31 downto 0) := "11000000000000000000000000000000";
  signal wt1  : std_logic_vector(31 downto 0) := "11000000000000000000000000000000";

  -- Ready-for-data flags of the arithmetic cores
  signal rfdmul0, rfdmul1, rfdmul2, rfdmul3, rfdmul4,
         rfdad0, rfdad1, rfdad2, rfdsub0 : std_logic := '1';

  -- Results of the arithmetic cores
  signal resultad0, resultad1, resultad2,
         resultmul0, resultmul1, resultmul2, resultmul3, resultmul4,
         resultsub0 : std_logic_vector(31 downto 0) := "11000000000000000000000000000000";

  -- New-data strobes and result-ready flags
  signal ndfi2, ndad, ndsub0,
         rdymul0, rdymul1, rdymul2, rdymul3, rdymul4,
         rdyad0, rdyad1, rdyad2, rdysub0 : std_logic := '0';

  -- Floating-point adder/subtractor core
  component fadder
    port (
      a             : in  std_logic_vector(31 downto 0);
      b             : in  std_logic_vector(31 downto 0);
      operation     : in  std_logic_vector(5 downto 0);
      operation_nd  : in  std_logic;
      operation_rfd : out std_logic;
      clk           : in  std_logic;
      result        : out std_logic_vector(31 downto 0);
      rdy           : out std_logic
    );
  end component;

  -- Floating-point multiplier core
  component fmultiplier
    port (
      a             : in  std_logic_vector(31 downto 0);
      b             : in  std_logic_vector(31 downto 0);
      operation_nd  : in  std_logic;
      operation_rfd : out std_logic;
      clk           : in  std_logic;
      result        : out std_logic_vector(31 downto 0);
      rdy           : out std_logic
    );
  end component;


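begin

  -- Filter output: y = xin0*wt0 + xin1*wt1
  mul0 : fmultiplier port map (xin0, wt0, ndfi2, rfdmul0, clk, resultmul0, rdymul0);
  mul1 : fmultiplier port map (xin1, wt1, ndfi2, rfdmul1, clk, resultmul1, rdymul1);

  -- ndad is never driven in the original listing; starting the adder when
  -- both products are ready is an assumption.
  ndad <= rdymul0 and rdymul1;
  add0 : fadder port map (resultmul0, resultmul1, "000000", ndad, rfdad0, clk, resultad0, rdyad0);

  -- Error: bfi - y (operation "000001" on the instance labelled sub)
  sub : fadder port map (bfi, resultad0, "000001", rdyad0, rfdsub0, clk, resultsub0, rdysub0);

  -- Iterative factor: mu2 * error
  mul2 : fmultiplier port map (mu2, resultsub0, rdysub0, rfdmul2, clk, resultmul2, rdymul2);

  -- Weight corrections: factor * xin0 and factor * xin1
  mul3 : fmultiplier port map (resultmul2, xin0, rdymul2, rfdmul3, clk, resultmul3, rdymul3);
  mul4 : fmultiplier port map (resultmul2, xin1, rdymul2, rfdmul4, clk, resultmul4, rdymul4);

  -- Updated weights: wt0 + correction0 and wt1 + correction1
  add1 : fadder port map (wt0, resultmul3, "000000", rdymul4, rfdad1, clk, resultad1, rdyad1);
  add2 : fadder port map (wt1, resultmul4, "000000", rdymul4, rfdad2, clk, resultad2, rdyad2);

  -- resultfi is not assigned in the original listing; routing the filter
  -- output y to it is an assumption based on the module description.
  resultfi <= resultad0;

  rdyfi <= rdyad2;  -- last ready output
  rfdfi <= rdyad0;  -- last ready for data; not giving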

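  -- Forward the new-data strobe to the first multiplier stage.
  process (ndfi)
  begin
    ndfi2 <= ndfi;
  end process;

  -- Shift a new input sample into the two-tap delay line.
  READ_NET : process
  begin
    wait until ndfi = '1';
    xin1 <= xin0;
    xin0 <= afi;
  end process READ_NET;

  -- Latch the updated weights once the last adder has finished.
  READ_wt : process
  begin
    wait until rdyad2 = '1';
    wt0 <= resultad1;
    wt1 <= resultad2;
  end process READ_wt;

end Behavioral;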

----------------------------------------------------------------
-- testbench.vhd
----------------------------------------------------------------

LIBRARY ieee;
USE ieee.std_logic_1164.ALL;

-- Uncomment the following library declaration if using
-- arithmetic functions with Signed or Unsigned values
USE ieee.numeric_std.ALL;

-- The entity declaration and architecture header are missing from the
-- extracted listing; the entity name "testbench" is assumed from the file name.
ENTITY testbench IS
END testbench;

ARCHITECTURE behavior OF testbench IS

  -- Component Declaration for the Unit Under Test (UUT)
  COMPONENT adaptivfilter
    PORT(
      afi      : IN  std_logic_vector(31 downto 0);
      bfi      : IN  std_logic_vector(31 downto 0);
      mu2      : IN  std_logic_vector(31 downto 0);
      ndfi     : IN  std_logic;
      rfdfi    : OUT std_logic;
      clk      : IN  std_logic;
      resultfi : OUT std_logic_vector(31 downto 0);
      rdyfi    : OUT std_logic
    );
  END COMPONENT;


  -- Inputs
  signal afi  : std_logic_vector(31 downto 0) := (others => '0');
  signal bfi  : std_logic_vector(31 downto 0) := (others => '0');
  signal mu2  : std_logic_vector(31 downto 0) := "00111110000110011001100110011010";  -- (0.15)
  signal ndfi : std_logic := '0';
  signal clk  : std_logic := '0';

  -- Outputs
  signal rfdfi    : std_logic;
  signal resultfi : std_logic_vector(31 downto 0);
  signal rdyfi    : std_logic;

  -- Clock period definitions
  constant clk_period : time := 2000 ns;
  constant nd_period  : time := 0.2 ms;


BEGIN

  -- Instantiate the Unit Under Test (UUT)
  uut : adaptivfilter PORT MAP (
    afi      => afi,
    bfi      => bfi,
    mu2      => mu2,
    ndfi     => ndfi,
    rfdfi    => rfdfi,
    clk      => clk,
    resultfi => resultfi,
    rdyfi    => rdyfi
  );

  -- Clock process definitions
  clk_process : process
  begin
    clk <= '0';
    wait for clk_period/2;
    clk <= '1';
    wait for clk_period/2;
  end process;

  -- New-data strobe: one pulse of one clock period per stimulus.
  -- The original listing ends with "wait for 0.175 ms", which gives a strobe
  -- period of 177.5 us and lets it drift against the 0.2 ms stimulus period.
  ndfi_process : process
  begin
    ndfi <= '0';
    wait for 500 ns;
    ndfi <= '1';
    wait for clk_period;
    ndfi <= '0';
    wait for nd_period - clk_period - 500 ns;
  end process;

  -- Stimulus: the patterns encode 1.0, about 0.309 and about -0.809.
  REA_NET : process
  begin
    afi <= "00111111100000000000000000000000";
    bfi <= "00111110100111100011011101111010";
    wait for nd_period;
    afi <= "00111110100111100011011101111010";
    bfi <= "00111111100000000000000000000000";
    wait for nd_period;
    afi <= "10111111010011110001101110111101";
    bfi <= "00111110100111100011011101111010";
    wait for nd_period;
    afi <= "10111111010011110001101110111101";
    bfi <= "10111111010011110001101110111101";
    wait for nd_period;
    afi <= "00111110100111100011011101111010";
    bfi <= "10111111010011110001101110111101";
    wait for nd_period;
  end process REA_NET;

END behavior;
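To judge whether the simulated waveforms are plausible, the data flow of the adaptive filter listing can be mirrored in a small software reference model. The sketch below is an assumption-laden aid rather than part of the project: the names `word_to_float`, `f32`, and `reference_model` are hypothetical, every intermediate result is rounded to single precision to imitate 32-bit arithmetic, and the latency and handshake signals of the hardware are ignored. The stimulus words are the five input pairs and the step-size constant from the test bench.

```python
import struct


def word_to_float(word):
    """Interpret a 32-bit integer as an IEEE 754 single-precision value."""
    return struct.unpack(">f", word.to_bytes(4, "big"))[0]


def f32(value):
    """Round a Python float to the nearest single-precision value."""
    return struct.unpack(">f", struct.pack(">f", value))[0]


def reference_model(stimuli, mu2, init=-2.0):
    """Follow the multiplier/adder chain of the listing for each (afi, bfi) pair."""
    x0 = x1 = w0 = w1 = init                      # initial value of taps and weights
    trace = []
    for afi, bfi in stimuli:
        x1, x0 = x0, afi                          # READ_NET: shift in the new sample
        y = f32(f32(x0 * w0) + f32(x1 * w1))      # mul0, mul1, add0
        e = f32(bfi - y)                          # sub
        g = f32(mu2 * e)                          # mul2
        w0 = f32(w0 + f32(g * x0))                # mul3, add1
        w1 = f32(w1 + f32(g * x1))                # mul4, add2
        trace.append((y, e, w0, w1))
    return trace


ONE, A, B = 0x3F800000, 0x3E9E377A, 0xBF4F1BBD    # stimulus words from the test bench
MU2 = word_to_float(0x3E19999A)                   # step-size constant from the test bench
STIMULI = [(word_to_float(a), word_to_float(b))
           for a, b in [(ONE, A), (A, ONE), (B, A), (B, B), (A, B)]]
trace = reference_model(STIMULI, MU2)
```

Each entry of `trace` holds the filter output, the error, and the two updated weights for one stimulus, which can be compared with the corresponding core results in the simulator once they are converted from their bit patterns.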