Nomenclature

λ      Eigenvalues of the input data correlation matrix
∇      Gradient
M      Number of filter taps
S      Power spectral density (PSD)
U      LMS algorithm input vector
u(n)   Input signal value
Y      LMS algorithm output vector
y(n)   Output signal value
W      LMS algorithm weight vector
E      Estimation error
e(n)   Error signal value
D      Desired reference vector
d(n)   Desired signal value
MSE    Mean square error

I. Introduction

Recently, requests for portable and embedded digital signal processing (DSP) systems have increased dramatically. Applications such as audio devices, hearing aids, cell phones and active noise control systems, with constraints on speed, area and power consumption, need an implementation by which these constraints are met with the shortest time to market [1]. Some possible solutions are ASIC chips, general purpose processors (GPP) and digital signal processors (DSP). Field programmable gate arrays (FPGAs) can reduce the gap between flexibility and high performance. New FPGAs include many primitives that support DSP applications, such as embedded multipliers, multiply-and-accumulate (MAC) units, digital clock management (DCM), DSP blocks, and soft/hard processor cores (such as the PPC). These facilities are embedded in the FPGA fabric and optimized for high-performance applications and low power consumption.

The availability of soft/hard processor cores in new FPGAs allows implementation of DSP algorithms without difficulty. An alternative choice is to move some parts of the algorithm into hardware (HW) to improve performance; this is called HW/SW co-design. This solution results in a more efficient implementation, as part of the algorithm is accelerated in HW while flexibility is maintained. Another, more efficient and more complex, choice is to convert the whole algorithm into hardware as a pure HW implementation. Although this is an attractive option in terms of area, speed, performance and power consumption, the design is much more complex [2].

Studies on the LMS algorithm mainly concentrate on two aspects. One is the convergence time, from the theoretical perspective; several modified LMS algorithms were proposed in references [3]-[4]. The other is hardware implementation; in order to improve data throughput, many modified
Manuscript received and revised June 2010, accepted July 2010 Copyright © 2010 Praise Worthy Prize S.r.l. - All rights reserved
Omid Sharifi Tehrani, Mohsen Ashourian, Payman Moallem
architectures for the LMS algorithm, such as the pipeline technique, were proposed in references [5]-[6]. This paper belongs to the latter category.

In this paper we first describe the theory of adaptive signal processing and the LMS algorithm. Then, in Sections III and IV, the data entry problem of the LMS algorithm and the design of the fixed-point Standard-LMS algorithm are described, respectively. Section V presents simulation and implementation results, and in Section VI a comparison is made with other works. Finally, Section VII draws conclusions from the obtained results.

II. LMS Algorithm

Adaptive filters learn the characteristics of their environment and continually adjust their parameters accordingly. Because of their ability to perform well in unknown environments and to track statistical time variations, adaptive filters are employed in a wide range of fields. The adjustable parameters that depend on the application are the number of filter taps, the selection of an FIR or IIR structure, the choice of training algorithm, and the convergence speed (learning rate). Beyond these, the underlying architecture needed for realization is application independent.

The main goal of any filter is to extract useful information from noisy data. Whereas a normal fixed filter is designed in advance with knowledge of the statistics of both the clean signal and the unwanted noise, the adaptive filter continually adjusts to a changing environment by the use of recursive algorithms [2]-[7]. This is useful when the characteristics of the signals are not known beforehand or change with time.

Fig. 1. Block diagram of the adaptive filtering problem: the input u(n) drives an FIR filter with weights w0, w1, ...; the filter output is subtracted from the desired signal d(n) to form the estimation error e(n)

The discrete adaptive filter in Fig. 1 receives u(n) and produces y(n) by a convolution with the filter weights w(k). Then d(n) is compared to y(n) to get e(n). This signal is used to incrementally adjust the filter coefficients for the next time instant. Several algorithms exist for the weight update, such as the least mean square (LMS) and the recursive least squares (RLS) algorithms. The selection of the algorithm depends on the required convergence speed and the computational complexity available, as well as on the statistics of the operating environment.

There are several methods for performing the weight update of an adaptive filter. There is the Wiener filter, which is the optimum linear filter in terms of mean squared error, and several algorithms that try to approximate it, such as the method of steepest descent. There is also the least mean square algorithm, used in artificial neural networks (ANN). Finally, there are other techniques such as the recursive least squares algorithm and the Kalman filter. The choice of algorithm is highly dependent on the signals of interest and the working environment, as well as on the required convergence speed and the computational complexity available.

The least mean square (LMS) algorithm is similar to the method of steepest descent in that it updates the weights by iteratively approaching the MSE minimum. Widrow and Hoff invented this method in 1960 for use in neural network training. The key is that instead of calculating the gradient at every time interval, the LMS algorithm uses a rough approximation of the gradient. The error at the filter output can be expressed as (1):

    e(n) = d(n) - W_n^T u_n    (1)

This is simply the desired output minus the filter output. By using this definition for e(n), an approximation of the gradient ∇ is found by (2):

    ∇ = -2 e(n) u_n    (2)

Substituting (2) for ∇ into the steepest-descent weight update W_{n+1} = W_n - μ∇ gives (3):

    W_{n+1} = W_n + 2μ e(n) u_n    (3)

This is the Widrow-Hoff LMS algorithm. As with the method of steepest descent, convergence depends on the step size; the stability criterion (4) can be used:

    0 < μ < 2 / (M S_max)    (4)

where M is the number of filter taps and S_max is the maximum value of the power spectral density of the tap inputs u. The good performance of the LMS algorithm and its simplicity have made it the most widely used algorithm in practice. For an N-tap filter, the number of operations is reduced to 2N multiplications and N additions per coefficient update. This is suitable for real-time applications and is the reason for the popularity of the LMS algorithm.

In the normalized LMS (NLMS) [8]-[9], the gradient step factor μ is normalized by the energy of the data vector. NLMS usually converges much faster than LMS at little extra cost. In this paper, the Standard-LMS algorithm is used.
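The recursion (1)-(3) translates almost line for line into software. The following Python sketch is illustrative only: the 4-tap system, the step size mu = 0.05 and the uniform random test input are assumptions for the demo, not settings from this paper.

```python
import random

def lms_filter(u, d, num_taps, mu):
    """Standard LMS: y(n) = W_n^T u_n (filter output),
    e(n) = d(n) - y(n) (eq. 1), W_{n+1} = W_n + 2*mu*e(n)*u_n (eq. 3)."""
    w = [0.0] * num_taps
    y_hist, e_hist = [], []
    for n in range(len(u)):
        # Tap-input vector u_n = [u(n), u(n-1), ..., u(n-M+1)]
        u_n = [u[n - k] if n - k >= 0 else 0.0 for k in range(num_taps)]
        y = sum(wk * uk for wk, uk in zip(w, u_n))
        e = d[n] - y
        w = [wk + 2.0 * mu * e * uk for wk, uk in zip(w, u_n)]
        y_hist.append(y)
        e_hist.append(e)
    return y_hist, e_hist, w

# Toy run: identify an "unknown" 4-tap FIR system from input/output data.
random.seed(0)
h = [0.5, -0.3, 0.2, 0.1]                        # unknown system weights
u = [random.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h[k] * (u[n - k] if n - k >= 0 else 0.0) for k in range(4))
     for n in range(len(u))]
_, e, w = lms_filter(u, d, num_taps=4, mu=0.05)
# After convergence w approaches h and the residual error is near zero.
```

Because the example is noiseless and the filter order matches the unknown system, the weights converge essentially exactly; with observation noise the weights would instead fluctuate around h with a misadjustment proportional to mu.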
Copyright © 2010 Praise Worthy Prize S.r.l. - All rights reserved International Review on Computers and Software, Vol. 5, N. 4
III. Data Entry Problem of the LMS Algorithm

The data entry problem is an important issue in the LMS algorithm. Data should be converted into binary form in order to be processed by digital systems. In fixed-point digital systems, the data problems mainly involve binary code representation, limited word-length selection, rounding and overflow.

III.1. Binary Code Representation

There are several binary code representations, and the performance of a system differs depending on the representation chosen. Subtractions are part of the LMS algorithm, so results might be negative; for this reason, data should be represented as signed binary codes. The most popular representation of signed binary codes is two's complement. Calculations in the LMS algorithm are mainly done on fractional values, so these values should first be converted into integers. When initializing the weight vector, the initial values should be the values after this conversion [10].

III.2. Limited Word-Length Selection

In digital systems, every number is represented by a binary code of limited word-length, so dynamic range and precision are finite. For the LMS algorithm, the effect of limited word-length is that it produces three kinds of error: quantization errors of the input vectors, quantization errors of the weight vectors, and quantization errors of calculation.
- Natural signals are analog signals that cannot be processed by digital systems directly. Analog signals should be converted into digital signals using an analog-to-digital converter (ADC). The samples of the ADC are represented in limited word-length, so there are differences between the actual values and their representations. These differences are the quantization errors of the input vectors. They can be reduced by improving the sampling precision of the ADC.
- The initial values of the weight vectors also have to be represented by binary codes, so the weight vectors are quantized according to the limited word-length. Quantization errors of the weight vectors are produced in this process. They can cause many problems: the actual results of the filter deviate from the theoretical results, degrading the performance.
- Multiplication is one of the arithmetic operations in the LMS algorithm, and rounding is needed when multiplying two binary numbers of limited word-length. If the length of each operand is N, the length of the product will be 2N. The result should be rounded back to N bits and the remaining N bits discarded. This is called calculation noise, and it is also a quantization error. This noise can slow down the convergence speed, cause divergence of the weight vectors and even lead the entire system to collapse.
In order to make the results more accurate, some measures can be taken. An appropriate algorithm structure can reduce the word-length effect, and an appropriate word-length can reduce the calculation noise. For implementation in hardware, long word-length numbers utilize more resources than short word-length numbers. Performance and resource utilization should be balanced according to the requirements of the designed system [11].

III.3. Overflow

Errors are also produced when overflow happens, and these errors can slow down the convergence speed. Overflow can be avoided by two methods: extending the word-length of the accumulator, and scaling the data before calculation. The latter can be realized by shifting.
- Extending the word-length of the accumulator: when the word-length of the input vectors and weight vectors is N, the representable number range is [-2^(N-1), 2^(N-1) - 1] if two's complement is adopted. The bigger N is, the wider the range, but a bigger N means a higher cost.
- Scaling: scaling can be realized by shift operations. Because the binary codes are in two's complement, the sign bit must be handled appropriately. When left shifting, the sign bit is not changed; the other bits are shifted left, the most significant bits are discarded and zeros are shifted in. When right shifting, the least significant bits are discarded and the sign bit is shifted in from the left. The final output results should be descaled, that is, shifted in the reverse direction [12].

IV. Fixed-Point Standard LMS Model

In our developed system, the ADC unit provides 12-bit binary output data. Table I shows the LMS input data model used in our system. The one-bit fraction length is a dummy, not used for input/output data, but it is necessary for the weight updates.

TABLE I
INPUT DATA BIT-ALLOCATION
Sign Bit | Guard Bits | Word Length | Fraction Length
1        | 3          | 12          | 1

Table II shows the LMS weight bit-allocation.

TABLE II
WEIGHTS BIT-ALLOCATION
Sign Bit | Word Length | Fraction Length
1        | 1           | 15
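The word-length effects above (two's-complement coding, rounding the 2N-bit product back to N bits, and shift-based scaling) can be illustrated with a small Python sketch. The 16-bit Q1.15 format, saturation on overflow and round-half-up rule used here are illustrative assumptions, not necessarily the exact choices made in the designed core.

```python
def to_fixed(x, word_len, frac_len):
    """Quantize a real value to a word_len-bit two's-complement integer
    with frac_len fractional bits, saturating instead of overflowing."""
    lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    q = int(round(x * (1 << frac_len)))
    return max(lo, min(hi, q))

def to_real(q, frac_len):
    """Convert a fixed-point integer back to a real value."""
    return q / (1 << frac_len)

def fixed_mul(a, b, word_len, frac_len):
    """N-bit x N-bit multiply: the full 2N-bit product is rounded back
    to N bits; the discarded low bits are the 'calculation noise'."""
    full = a * b                                     # 2N-bit intermediate
    q = (full + (1 << (frac_len - 1))) >> frac_len   # round, drop frac bits
    lo, hi = -(1 << (word_len - 1)), (1 << (word_len - 1)) - 1
    return max(lo, min(hi, q))                       # saturate on overflow

# Q1.15 example: 0.75 * -0.5 = -0.375, exactly representable.
a = to_fixed(0.75, 16, 15)
b = to_fixed(-0.5, 16, 15)
p = fixed_mul(a, b, 16, 15)
print(to_real(p, 15))                                # -0.375

# Scaling by a power of two is an arithmetic shift; Python's >> on a
# negative int already replicates the sign bit, as described above.
x = to_fixed(-0.5, 16, 15)
assert to_real(x >> 2, 15) == -0.125                 # right shift = divide by 4
```

Values that are not exactly representable (e.g. 0.1 in Q1.15) come back with a small quantization error, which is exactly the input-vector quantization error described in Section III.2.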
    Y = W U    (5)

    ΔW = μ E U    (6)
Fig. 2. Entity of the Fixed-Point Standard LMS Core

Fig. 3. Flowchart of the LMS-Based FIR Core
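In hardware, the update equations (5) and (6) reduce to pure integer operations when the step factor is a power of two, because the multiplication by μ becomes an arithmetic right shift. The sketch below shows one such iteration in plain Python integers; the Q1.15 weight format follows the sign/fraction split of Table II, while the 4-tap length and μ = 2^-4 are illustrative assumptions.

```python
FRAC = 15          # Q1.15 weights: 1 sign bit, 15 fraction bits
MU_SHIFT = 4       # mu = 2**-4; multiplying by mu == right shift by 4

def fixed_lms_step(w, u_vec, d):
    """One iteration of Y = W*U (eq. 5) and dW = mu*E*U (eq. 6),
    entirely in two's-complement integer arithmetic."""
    # eq. (5): MAC into a wide accumulator so the sum cannot overflow
    acc = sum(wi * ui for wi, ui in zip(w, u_vec))
    y = acc >> FRAC                      # rescale the Q2.30 sum to Q1.15
    e = d - y                            # estimation error E, Q1.15
    # eq. (6): e*ui is a Q2.30 product; >> (FRAC + MU_SHIFT) rescales it
    # back to Q1.15 and applies mu in the same shift
    w_new = [wi + ((e * ui) >> (FRAC + MU_SHIFT)) for wi, ui in zip(w, u_vec)]
    return w_new, y, e

# One step from zero weights: the output is 0, so the error equals d.
w = [0, 0, 0, 0]
u_vec = [16384, -8192, 4096, 0]   # 0.5, -0.25, 0.125, 0.0 in Q1.15
d = 8192                          # 0.25 in Q1.15
w, y, e = fixed_lms_step(w, u_vec, d)
print(w)                          # [256, -128, 64, 0]
```

Implementing μ as a shift is exactly the scaling technique of Section III.3, and it removes one multiplier per tap from the datapath.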
Fig. 4(c). Test Data Creation with Uniform-Random Number Noise

V.2. Software Simulation Results

In the first simulation a 200 hertz sine signal which is

Fig. 5(b). Noisy Input and Filter Output Signals

Fig. 5(c). Residual Error (Learning Curve)

TABLE III

Fig. 6(b). Noisy Input and Filter Output Signals

TABLE IV
2ND SIMULATION PERFORMANCE
Input SNR  | Output SNR | SNR Enhancement
1.1399 dB  | 5.9211 dB  | 4.7812 dB

Fig. 7(b). Noisy Input and Filter Output Signals

Fig. 7(c). Residual Error (Learning Curve)

TABLE V
3RD SIMULATION PERFORMANCE
Input SNR  | Output SNR | SNR Enhancement
1.0831 dB  | 8.7515 dB  | 7.6684 dB

Fig. 8(a). Desired and Filter Output Signals

Fig. 8(c). Residual Error (Learning Curve)
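The SNR enhancement reported in the simulation performance tables is simply the difference between output and input SNR in decibels, which can be checked directly; the values below are taken from Table IV (second simulation):

```python
input_snr_db = 1.1399     # Table IV, noisy input
output_snr_db = 5.9211    # Table IV, filter output
enhancement_db = output_snr_db - input_snr_db
print(round(enhancement_db, 4))   # 4.7812
```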
Fig. 9(a). Desired and Filter Output Signals

TABLE VII
5TH SIMULATION PERFORMANCE
Input SNR  | Output SNR | SNR Enhancement
1.1573 dB  | 8.2417 dB  | 7.0844 dB

TABLE IX
IMPLEMENTATION-RESOURCE UTILIZATION ON XC4VSX25-12FF668
                 | Used  | Available | Utilization
Slice Flip Flops | 2,702 | 20,480    | 13%
4-input LUTs     | 3,445 | 20,480    | 16%
Occupied Slices  | 3,164 | 10,240    | 30%
Bonded IOBs      | 54    | 320       | 16%
BUFG/BUFGCTRLs   | 1     | 32        | 3%
DSP48Es          | 2     | 128       | 1%
Maximum Frequency: 80.017 MHz

TABLE X
IMPLEMENTATION-RESOURCE UTILIZATION ON XC5VLX50T-3FF665
                 | Used  | Available | Utilization
Slice Registers  | 2,618 | 28,800    | 9%
Slice LUTs       | 2,564 | 28,800    | 8%
Occupied Slices  | 1,264 | 7,200     | 17%
Bonded IOBs      | 54    | 360       | 15%
BUFG/BUFGCTRLs   | 1     | 32        | 3%
DSP48Es          | 2     | 48        | 4%
Maximum Frequency: 103.581 MHz
Omid Sharifi Tehrani was born in Isfahan, Iran, in 1984 and received the B.Sc. degree in telecommunication engineering from the Islamic Azad University of Majlesi in 2007 and the M.Sc. degree in telecommunication engineering (systems) from the Islamic Azad University of Najafabad in 2010. He has published three books about microcontrollers and FPGAs. His research interests are adaptive signal processing, active noise control, artificial neural networks and FPGA implementations. Mr. Sharifi is an active member of the Young Researchers Club (YRC) of the Islamic Azad University (IAU) and a reviewer for the MJEE journal.

Payman Moallem was born in Isfahan, Iran, in 1970 and received the B.Sc. degree in electrical engineering from Isfahan University of Technology in 1992, the M.Sc. degree in electrical engineering from Amir-Kabir University of Technology in 1996 and the PhD in electrical engineering from Amir-Kabir University of Technology in 2003. His research interests are neural networks, image processing and machine vision. Mr. Moallem is an active member of the Image Processing and Machine Vision Society of Iran.