Abstract—The QR decomposition-based recursive least-squares (RLS) adaptive filtering (QRD-RLS) algorithm is suitable for VLSI implementation since it has good numerical properties and can be mapped onto a systolic array. Recently, a new fine-grain pipelinable STAR-RLS algorithm was developed. The pipelined STAR-RLS algorithm (PSTAR-RLS) is useful for high-speed applications. The stability of QRD-RLS, STAR-RLS, and PSTAR-RLS has been proved, but the performance of these algorithms in finite-precision arithmetic has not yet been analyzed. The aim of this paper is to determine expressions for the degradation in the performance of these algorithms due to finite precision. By exploiting the steady-state properties of these algorithms, simple expressions are obtained that depend only on known parameters. This analysis can be used to compare the algorithms and to decide the wordlength to be used in an implementation. Since floating- or fixed-point arithmetic representations may be used in practice, both representations are considered in this paper. The results show that the three algorithms have about the same finite-precision performance, with PSTAR-RLS performing better than STAR-RLS, which does better than QRD-RLS. These algorithms can be implemented with as few as 8 bits for the fractional part, depending on the filter size and the forgetting factor used. The theoretical expressions are found to be in good agreement with the simulation results.

Manuscript received November 17, 1993; revised July 21, 1996. This work was supported by the Office of Naval Research under Contract N00014-91-J-1008. The associate editor coordinating the review of this paper and approving it for publication was Dr. Fuyun Ling.

K. J. Raghunath is with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA (e-mail: raghu@aloft.att.com).

K. K. Parhi is with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: parhi@ee.umn.edu).

Publisher Item Identifier S 1053-587X(97)03354-0.

I. INTRODUCTION

RECURSIVE least squares (RLS) based adaptive filters are used in applications such as channel equalization, voiceband modems, digital mobile radio, beamforming, and speech and image processing. The QRD-RLS algorithm [1], [2], [3, p. 494] is the most promising algorithm since it is known to have very good numerical properties and can be mapped to a systolic array. The speed (or sample rate) of the QRD-RLS algorithm is limited by the recursive equations in its cells. A new STAR-RLS algorithm [4] was recently developed that can be used for high-speed applications, for example, in communications, magnetic recording, image processing [5], etc. This algorithm uses scaled tangent rotations (STAR) instead of the Givens rotations, which are normally used. These rotations are designed so that lookahead [6] can be applied with little increase in hardware complexity. A single STAR rotation can be given as (see [4])

(1.1)

where one rotation parameter is similar to a tangent and the other is a scaling factor. The sines and cosines of the Givens rotation have been replaced by tangents and scaling factors. The scaling factor has the effect of weighting down the input data such that the tangent is never above 1. As is shown in [4], the STAR-RLS has a lookahead equation that is simpler as compared with that of QRD-RLS. The STAR rotations effectively solve an approximate weighted least-squares problem. The scaling factors are effective only for a few samples after initialization. The STAR rotations are not orthogonal but tend to become orthogonal. It can be shown that the STAR-RLS algorithm asymptotically converges to the same solution. In fact, it is found that there is little difference in the performance of the STAR-RLS and QRD-RLS algorithms. The STAR-RLS algorithm is square-root free; however, it is different from the square-root-free algorithms presented in [7]–[9]. The square-root-free algorithms of [7]–[9] can be used to perform exact RLS, but these suffer from the same pipelinability problem as the QRD-RLS. On the other hand, STAR-RLS is designed for ease of pipelinability, and it does not generate an exact RLS solution.
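For reference, the classical Givens rotation that STAR replaces computes a sine and cosine that annihilate one element of a 2-by-1 vector; the square root in this computation is one reason the rotation is hard to pipeline. The following is a minimal sketch of that standard rotation (it is not the STAR rotation of [4], whose exact parameterization is given by (1.1); the function and variable names here are illustrative only):

```python
import math

def givens(a, b):
    """Standard Givens rotation: returns (c, s, r) such that
    [ c  s] [a]   [r]
    [-s  c] [b] = [0]
    Note the square root, which square-root-free and scaled-tangent
    formulations are designed to avoid."""
    if b == 0.0:
        return 1.0, 0.0, a
    r = math.hypot(a, b)          # r = sqrt(a^2 + b^2)
    return a / r, b / r, r

# Example: rotate the pair (3, 4) so that the second entry becomes zero.
c, s, r = givens(3.0, 4.0)
print(c, s, r)                    # 0.6 0.8 5.0
print(-s * 3.0 + c * 4.0)         # ~0.0, the annihilated element
```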
The pipelined version of the STAR-RLS algorithm has also been developed, and this is referred to as PSTAR-RLS [10]. The PSTAR-RLS algorithm can be pipelined to operate at arbitrarily high speeds. The STAR-RLS algorithm [4] and the pipelined STAR-RLS algorithm (or PSTAR-RLS) [10] have lower complexity and half the intercell communication as compared with the QRD-RLS algorithm. The PSTAR-RLS algorithm can take a longer time for initialization, depending on the level of pipelining used. Thus, we are trading off between performance and speed of operation.

The accumulation of quantization noise is an important concern in the implementation of adaptive digital filters. The cost of implementation is strongly dependent on the number of bits used to represent the signals in the filter. Hence, one would like to use as few bits as possible for the words without affecting the performance of the algorithm. This motivates the finite-precision error analysis of the QRD-RLS and STAR-RLS algorithms. The stability of QRD-RLS, STAR-RLS, and PSTAR-RLS was recently proved in [10]–[12]. The dynamic range of the quantities in these algorithms has also been
evaluated in [13] and [10]. The dynamic range determines the number of bits for the integral part in a fixed-point implementation and the number of bits for the exponent in a floating-point representation. However, the representation for the fractional part (or mantissa) can be determined only by a finite-precision analysis. The effect of finite precision on adaptive filters has been considered by many authors, for example, for the LMS algorithm [14], for lattice filters [15]–[18], and for recursive least-squares (RLS) [19]–[22]. The RLS algorithms considered in [19]–[22] refer to the original algorithm, which does not use any decomposition techniques. Reference [18] also gives a comparative study of some of the previous finite-precision analysis papers. It has been found that using a QR decomposition instead of a Cholesky decomposition, for solving a set of normal equations, would reduce the wordlength requirement by about half [23]. This explains the popularity of the QRD-RLS algorithm. It was also shown in [10] that the STAR-RLS algorithm has good numerical properties. Simulation results have been used before to demonstrate that the QRD-RLS algorithm can be implemented with as few as 12 bits in floating-point arithmetic [24]. A comprehensive theoretical finite-precision analysis of the QRD-RLS algorithm has, however, not been done before. The only papers on this issue, to the best of our knowledge, are [25] and [12]. The analysis of [12] is for a floating-point implementation of the QRD-RLS algorithm. There, the classical backward error analysis (see [18]) is used to determine bounds on the errors resulting from rounding. In addition, it is shown that the QRD-RLS algorithm is unconditionally stable in the presence of rounding errors. In [25], the error propagation of quantization noise is investigated, assuming a single error has occurred.

The aim of this paper is to make a comprehensive stochastic error analysis of the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. Thus, we would like to find expressions for the mean and variance of the rounding errors in the different quantities of interest in the algorithms. This stochastic error analysis is, hence, a different approach as compared with the numerical error analysis of [12], using the analysis categories given in [16]. In fact, stochastic analysis is probably preferred in this case since the input samples are from a stochastic process [16]. Such a general stochastic analysis of the finite-precision effects in the QRD-RLS systolic algorithm would require keeping track of all the different parameters as these are passed from one cell to the other in the systolic array. Even if such an analysis is carried out, the resulting expressions would be unwieldy and of little use. This is true for all three algorithms (QRD-RLS, STAR-RLS, and PSTAR-RLS). We would like to keep the final expressions simple. To do this, we exploit the steady-state properties of the algorithms determined in [13], [4], and [10].

In a practical implementation, either floating- or fixed-point arithmetic may be used. The errors resulting from these representations have different statistics. In a VLSI implementation, floating-point arithmetic units are more complex than those of fixed-point arithmetic. This is mainly because of the presence of two fields: exponent and mantissa. In a floating-point addition, for example, the exponents of the two quantities have to be made equal by shifting the mantissa of one of them. This implies extra hardware overhead as well as more clock cycles to complete the operation. The main advantage of floating-point arithmetic is the large dynamic range it offers as compared with fixed-point arithmetic [26]. Keeping these things in view, we have carried out the analysis separately for the fixed- and floating-point arithmetic.

We would like to derive a figure of merit for finite-precision implementation that can be used to decide the wordlength as well as to compare the three algorithms. The estimation error can be directly obtained as the final output from the systolic array for the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. The estimation error is itself the only output that is required in many applications such as prediction error filtering and beamforming [3, p. 494]. Hence, we use the deviation in the estimation error due to finite precision as the figure of merit. Thus, we find expressions for the average deviation in the estimation error for the three algorithms for fixed- and floating-point arithmetic. The deviation is a function of the number of bits used, the filter size, and the forgetting factor used. The analysis here is based on assumptions about the roundoff error, such as being independent of the operands. These are not always satisfied (see [27]). However, we assume that the deviation from the assumed model is not serious enough to affect the results. Such roundoff error models have been successfully used in many papers [20], [21], [14]–[16], etc.

By exploiting the steady-state properties of the algorithms and by making a number of simplifying approximations, we obtain simple expressions for the deviation in the estimation error. The deviation expressions for the three algorithms have a similar structure. The finite-precision performance (fixed- and floating-point) degrades dramatically as the forgetting factor approaches 1. Traditionally, the value of the forgetting factor decides the tradeoff between tracking property and misadjustment error [3, p. 405]. A low value of the forgetting factor implies better tracking but a higher misadjustment noise. Our results show that the finite-precision error is also a factor in the decision of the value of the forgetting factor, especially when a hardware implementation is required. See [18] for some more discussion and results regarding the optimal choice of forgetting factor and precision. The theoretical expressions agree well with our simulation results. In general, the PSTAR-RLS and STAR-RLS algorithms perform a little better than the QRD-RLS algorithm, with PSTAR-RLS having the best performance. It is interesting to note that the pipelined algorithm has better performance than the serial version. For the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms, we find that it would probably be more advantageous to use fixed-point arithmetic. The dynamic range of the quantities in these algorithms is not too wide, and it can be evaluated beforehand [10], [13]. The algorithms can be implemented with very few bits; for example, for a filter size of 11, about 10 bits for the fractional part would be sufficient for these algorithms. This requirement would vary with the filter size and the forgetting factor used.

This paper is organized as follows. In Section II, the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms are introduced, and some of their steady-state properties are developed.
II. BACKGROUND

In this section, we briefly introduce the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms and develop some steady-state properties of these algorithms. The notation of [3], [4], and [10] is closely followed here. In a linear least-squares estimation problem, given a time series of observation vectors, we want to estimate some desired signal as a linear combination of the observations

(2.1)

The estimation error is

(2.2)

and the weighting applied to the data is described by the diagonal matrix

(2.3)

An orthogonal triangularization of the weighted data matrix can be achieved using Givens rotations. The resulting algorithm is referred to as the QRD-RLS algorithm [1], [3, p. 494]. The systolic implementation of the QRD-RLS algorithm is shown in Fig. 1. The systolic array has two types of cells: boundary and internal cells. The triangularized matrix is collected in the triangular portion of the array, and the right-hand-side vector is available in the last column of internal cells.

To carry out the finite-precision analysis effectively, we need to make use of some steady-state properties of the algorithms. In [13], it was shown that the QRD-RLS algorithm tends to reach a quasi-steady-state, and the contents of the boundary cells and the rotation parameters generated by them reach a steady-state value, regardless of the input, if the forgetting factor is chosen close to unity. Some of the steady-state properties of the QRD-RLS were also derived in [15] and [29]. It should also be noted that these asymptotic expressions would not hold if the inputs are nonstationary or poorly excited. Before we get into the steady-state properties, we define the concept of an effective forgetting factor that is used for uniformity. The QRD-RLS and STAR-RLS (or PSTAR-RLS) are different algorithms, and the convergence parameter for the two algorithms can also have different significance. We found that using a given value of the forgetting factor for QRD-RLS is equivalent to using a different value for STAR-RLS (or PSTAR-RLS), where the forgetting factor has a value less than 1. For example, the convergence plot for QRD-RLS with one value of the forgetting factor is similar to the convergence plot of STAR-RLS with a different value. This difference arises because of the way the STAR-RLS algorithm has been developed. To make it easy to compare the algorithms, we introduce a common parameter, the effective forgetting factor, with one substitution made for QRD-RLS and another for the STAR-RLS (and PSTAR-RLS). Now, for a given value of the effective forgetting factor, the performance of QRD-RLS and STAR-RLS is similar.

The expected value of the sine parameter reaches a steady-state value, and the cosine parameter reaches a steady-state value, given by

(2.4)

In Appendix A, we show that the contents of the first boundary cell in QRD-RLS tend to reach a steady-state value given by

(2.5)

where the quantity appearing in (2.5) is the power of the input data signal. The average cell content of the last column of internal cells (the right-hand-side vector) is found to be (see Appendix A)

(2.6)
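The quasi-steady-state behavior summarized by (2.4)–(2.6) is easy to observe numerically. The sketch below iterates the standard Givens-based boundary-cell recursion of a QRD-RLS array (the textbook form of [3], not necessarily the paper's own notation) on white input data and prints the long-run averages of the cell content and the cosine parameter; the internal-cell update is included for completeness. The variable names and the chosen forgetting factor and input power are assumptions made for illustration.

```python
import random, math

def boundary_cell(r, x, lam):
    """Boundary cell of a Givens-rotation QRD-RLS array:
    updates the stored cell content r and emits (c, s)."""
    if x == 0.0:
        return r * math.sqrt(lam), 1.0, 0.0
    r_new = math.sqrt(lam * r * r + x * x)
    c = math.sqrt(lam) * r / r_new
    s = x / r_new
    return r_new, c, s

def internal_cell(r, x_in, c, s, lam):
    """Internal cell: rotates the incoming sample against the stored
    content and passes the rotated output to the cell below."""
    x_out = c * x_in - s * math.sqrt(lam) * r
    r_new = s * x_in + c * math.sqrt(lam) * r
    return r_new, x_out

random.seed(0)
lam, sigma_x = 0.99, 1.0
r, c_sum, r_sum, n_avg = 0.0, 0.0, 0.0, 0
for n in range(20000):
    x = random.gauss(0.0, sigma_x)
    r, c, s = boundary_cell(r, x, lam)
    if n > 5000:                      # skip the initial transient
        c_sum += c
        r_sum += r
        n_avg += 1
print("average cell content:", r_sum / n_avg)
print("average cosine      :", c_sum / n_avg)
```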
Fig. 3. Cell equations for PSTAR-RLS (the systolic array is the same as
given in Fig. 2 for STAR-RLS).
For PSTAR-RLS, the corresponding quantities are

PSTAR: (2.11)

and

PSTAR: (2.12)

… and then find the average value of the same. All symbols with an overbar represent finite-precision quantities.
A. STAR-RLS Algorithm

1) Boundary Cell: Consider the computations that are performed in a given boundary cell of the STAR-RLS algorithm. Since we are considering a steady-state analysis, the first case in the “if” statement is always valid. The tangent is then calculated in fixed-point arithmetic as

(3.2)

Fig. 4. Finite-precision deviation model for the cells in STAR-RLS and PSTAR-RLS.
quantities involved). Thus, in the above expectation, the slowly varying cell quantity can be considered to be a constant, and we have

(3.7)

with an assumption that can be proved to hold in a straightforward manner. We need the expectation of the square of this deviation. To get an expression for this, we use the following:

1) the elementary roundoff errors are uncorrelated with each other and with the other variables, and each has the same elementary roundoff variance;
2) the cell quantities are approximately constant, by the quasi-steady-state assumption [10];
3) the roundoff error in a quantity is uncorrelated with the quantity it perturbs;
4) the steady-state relations of Section II hold.

We finally have

(3.8)

where the effective forgetting factor is the one defined in Section II for STAR-RLS. We similarly find that (see Appendix B)

(3.9)–(3.11)

For the cell output, we have

(3.12)

2) Internal Cell: For the internal cell, we find

(3.21), (3.22)

and

(3.23), (3.24)

3) Deviation in Final Estimation Error: Now, for any cell, we have the statistics of the deviation due to fixed-point arithmetic in the outputs in terms of the deviation in the inputs. The output from the boundary cell is the tangent, which is input to the internal cells in the same row. The output of the internal cell is the input to the cell below. Hence, we have the following relationship:

(3.14)

Using (3.14), (3.13) can be rewritten as

(3.15)

This recursive equation can be propagated to obtain the deviation in the final output in terms of the initial inputs to the systolic array of Fig. 2. Note that for the cells in the first row, the deviations in the inputs are zero at any time instant. As mentioned before, the figure of merit used here is the deviation in the estimation error (note that the a priori estimation error for the STAR-RLS algorithm is the output from the last internal cell). From (3.15), we find that

(3.16)–(3.18)

and

(3.19), (3.20)

For the deviation in the conversion factor, we find the following:

(3.32)

2) Internal Cell: For the internal cell, we have

(3.33)

and

(3.34), (3.35)
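The fixed-point analysis above rests on the usual roundoff model: every arithmetic result is rounded to a given number of fractional bits, and the resulting errors are treated as zero-mean, mutually uncorrelated, and uncorrelated with the data (assumptions that, as noted in the Introduction, are not always exact [27]). A small numerical check of this model is sketched below; the 2^(-2b)/12 variance used for comparison is the standard round-to-nearest figure and is stated here as an assumption, not as the paper's expression.

```python
import random

def quantize(x, b):
    """Round x to the nearest multiple of 2^-b (b fractional bits)."""
    step = 2.0 ** (-b)
    return round(x / step) * step

random.seed(1)
b = 10
errors = []
for _ in range(100000):
    x = random.uniform(-1.0, 1.0)
    errors.append(quantize(x, b) - x)

mean = sum(errors) / len(errors)
var = sum(e * e for e in errors) / len(errors) - mean * mean
print("empirical mean    :", mean)          # close to 0
print("empirical variance:", var)           # close to 2^(-2b)/12
print("2^(-2b)/12        :", 2.0 ** (-2 * b) / 12.0)
```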
TABLE I
EXPRESSIONS FOR THE FIRST BOUNDARY CELL
(4.2)

where one deviation term occurs due to the multiplication and the other occurs due to the division. We carry out the analysis as was done before for the fixed-point case, and the results are as shown below. The mean of all the deviations is found to be zero, as in the case of fixed-point arithmetic. Hence, only the variance is given. We have

(4.3)

and

(4.4)

2) Internal Cells: For the computations in the internal cell, we find

(4.5)

B. PSTAR-RLS Algorithm

1) Boundary Cell: The update equation of the cell content has a summation term over several quantities. Each summation adds a relative error with the elementary roundoff variance. Since these errors appear as multiplicative terms, the analysis gets complicated to a large degree. We have to take into account the correlation of the same random variable appearing at different places. We find that

(4.8)

and

(4.9)
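The summation over several quantities in the PSTAR-RLS cell update comes from the lookahead transformation [6] that makes pipelining possible: a first-order recursion is rewritten so that the new state depends on the state several samples earlier plus a multi-term sum of inputs. The sketch below shows this transformation for a plain linear recursion; it is not the actual PSTAR-RLS cell equation, and the recursion coefficient, pipelining level, and variable names are illustrative assumptions.

```python
def serial(a, u):
    """Original recursion: x[n] = a*x[n-1] + u[n]."""
    x, out = 0.0, []
    for un in u:
        x = a * x + un
        out.append(x)
    return out

def lookahead(a, u, p):
    """p-step lookahead form:
    x[n] = a^p * x[n-p] + sum_{k=0}^{p-1} a^k * u[n-k].
    The p delays in the transformed loop can be retimed to pipeline it by
    p stages; the price is the p-term sum added per output."""
    x_hist = [0.0] * p                 # x[n-1], ..., x[n-p]
    out = []
    for n, un in enumerate(u):
        acc = sum((a ** k) * u[n - k] for k in range(p) if n - k >= 0)
        x = (a ** p) * x_hist[-1] + acc
        out.append(x)
        x_hist = [x] + x_hist[:-1]     # shift the state history
    return out

u = [1.0, -0.5, 2.0, 0.25, 1.5, -1.0, 0.5, 0.0]
print(serial(0.9, u))
print(lookahead(0.9, u, 3))            # matches serial() up to roundoff
```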
Fig. 6. Average deviation in estimation error in fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
2) Internal Cell:

(4.10)

(4.11)

3) Deviation in Final Estimation Error: The final deviation in the estimation error is

(4.12)

Again, we have a closed-form expression similar to that of STAR-RLS.

C. QRD-RLS Algorithm

1) Boundary Cell: As before, the mean of the deviations is approximately zero. The variances of the quantities are found to be

(4.13)

together with the further expressions (4.14)–(4.18).

3) Deviation in Final Estimation Error: Finally,

(4.19)

The above expression was simplified assuming that the estimation error is small. A more general expression is given in (B.3) in Appendix B.
Fig. 7. Average deviation in cell content of the first boundary cell in fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
V. ANALYTICAL COMPARISON OF THE ALGORITHMS

In this section, the equations developed in the last two sections are simplified so that we can comment on the relative performance of the algorithms as well as compare the fixed-point implementation with the floating-point implementation. Before doing that, we consider the deviations in the cell content and the rotation parameters (tangent, cosines, or sines) generated by the boundary cell. It may be noted that the expressions derived for the deviation in cell content and the deviation in the rotation parameters are not in closed form. We have developed closed-form expressions only for the estimation error since that was of prime importance. However, by carrying out the analysis further, the deviations for every cell can be found. To get an idea of the quantities involved, we show the expressions for the first boundary cell in Table I.

It may be noted that the error in the rotation parameter is of about the same order for most of the cases [except PSTAR-RLS(FLOAT)]. The error in cell content is higher for the floating-point calculations than for the fixed-point calculations. This is due to the fact that in fixed point, there are extra bits available for the integral part.

To compare the performance of the different algorithms, we can use the expression for the deviation in the estimation error. Note that the expressions are similar in form, and this greatly simplifies the comparisons. First, we compare the floating- and fixed-point implementations for the same algorithm. It should be kept in mind that the elementary roundoff variance is different for the two implementations. For STAR-RLS and QRD-RLS, we find that the fixed-point implementation is always better than floating point. The same is true for PSTAR-RLS if

(5.1)

For the parameter values of interest, this condition reduces to a simpler equivalent form.

Next, we compare the different algorithms with the same arithmetic being used. In the case of fixed-point arithmetic, it
Fig. 8. Average deviation in tangent calculation of the first boundary cell in fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
can be shown that PSTAR-RLS always performs better than STAR-RLS. The same would be true for the floating-point implementation if

(5.2)

The above would be satisfied as long as the quantity involved is not too small. Thus, in general, PSTAR-RLS has better numerical properties than the STAR-RLS algorithm. This would not be expected, since more computations are involved in PSTAR-RLS. However, it should be noted that the computation of the tangent in PSTAR-RLS is simpler. In addition, the effect of the forgetting factor is to multiply the delayed version of the cell content by a different factor, and this controls the finite-precision deviation since it leads to a correspondingly different factor in the error expressions.

Next, we compare STAR-RLS and QRD-RLS. In the case of fixed point, it can be shown that STAR-RLS performs better than QRD-RLS if

(5.3)

The right-hand side (RHS) is positive only for sufficiently large values of the forgetting factor. The condition (5.3) will hence be satisfied most of the time, except when the forgetting factor is only slightly above 0.5. In the case of floating-point arithmetic, STAR-RLS has better performance than QRD-RLS under a similar condition. Since the forgetting factor is close to 1, this condition would be satisfied for most applications.

VI. SIMULATION RESULTS

Next, we show results of the simulations conducted to verify the theoretical expressions developed in this paper. We use the 11-tap equalizer example of [3, p. 494], which is also used in many recent papers, including [4] and [10]. A random binary sequence, which takes values of ±1, is input to the channel. The channel has the impulse response of a raised cosine

(6.1)

The output of the channel is given by

(6.2)

where the additive noise is modeled as a zero-mean Gaussian random variable with a variance of 0.001.
Fig. 9. Average deviation in estimation error in floating-point implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
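The simulation setup described above can be reproduced along the following lines. The raised-cosine impulse response written in the sketch is the form commonly used for this 11-tap equalizer example in [3]; since (6.1) and (6.2) are not written out above, the channel parameter W, the tap indexing, and the function names are assumptions made for illustration.

```python
import random, math

def channel_example(n_samples, W=2.9, noise_var=0.001, seed=0):
    """Generate the equalizer test data: a random +/-1 sequence passed
    through a 3-tap raised-cosine channel plus white Gaussian noise
    (assumed form of (6.1)-(6.2); see [3, p. 494])."""
    rng = random.Random(seed)
    # Raised-cosine channel taps (assumed indexing k = 1, 2, 3).
    h = [0.5 * (1.0 + math.cos(2.0 * math.pi * (k - 2) / W)) for k in (1, 2, 3)]
    a = [rng.choice((-1.0, 1.0)) for _ in range(n_samples)]       # binary input
    x = []
    for n in range(n_samples):
        y = sum(h[k] * a[n - k] for k in range(3) if n - k >= 0)  # convolution
        x.append(y + rng.gauss(0.0, math.sqrt(noise_var)))        # additive noise
    return a, x

a, x = channel_example(5000)
print(a[:5])
print(x[:5])
```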
Using simulation results, we first verify the infinite-precision expressions developed in Section II. We find that the simulated results are close to the predicted values. For example, the theoretical and simulated values of the steady-state quantity considered are as follows:

• for STAR-RLS: simulated 9.29, theoretical 9.14;
• for QRD-RLS: simulated 9.07, theoretical 9.09;
• for PSTAR-RLS: simulated 9.51, theoretical 9.23.

To simulate the finite-precision behavior of the algorithms, we carry out the computations as would be done in an actual circuit. We have developed subroutines in the C language to mimic the operations of finite-precision arithmetic. For every operation of add, multiply, etc., the inputs are quantized to the given format (fixed- or floating-point) and the wordlength. The actual operation is carried out, and the result is then quantized. For example, if two quantities are being multiplied in fixed-point arithmetic, then the result is obtained by quantizing each operand, multiplying, and quantizing the product, where the quantization operation is the one for the fixed-point representation. To accomplish quantization, the number is represented in binary format and rounded off to the wordlength specified (overflows generate error messages). The position of the binary point is known beforehand. In the case of floating point, each number is represented as a vector of length 2: one entry for the exponent and the other for the mantissa. The handling of the mantissa and exponent is done as in hardware. To quantize a given quantity in floating-point format, the number is represented in binary and shifted right or left until its absolute value is between 0.5 and 1. The number of shifts is given as the exponent (positive for right shifts and negative for left shifts). The shifted binary value is rounded to the wordlength specified, and this becomes the mantissa. For a multiplication, the mantissas of the two numbers are multiplied, and the exponents are added. The outputs are again quantized to a floating-point format. Overflows at any point generate error messages. The rounding scheme used for both the fixed- and floating-point cases is the round-to-the-nearest scheme in [31, p. A.16]. Since the forgetting factor is very close to 1, at low wordlengths it gets rounded to 1. In such situations, we use the closest possible value that is not rounded to 1.

The simulations are run on a 5000-long data stream, and the average results are computed by averaging over the last 1250 samples. This ensures that the results are calculated after steady state has been reached. The infinite-precision results are also calculated alongside, and the deviation is determined.
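The quantization procedure described above is straightforward to mirror in software. The sketch below follows the description directly: fixed-point values are rounded to a given number of fractional bits (so a product is computed as quantize(quantize(a) * quantize(b))), while floating-point values are normalized so the mantissa magnitude lies in [0.5, 1), rounded, and carried as a (mantissa, exponent) pair. This is a simplified model written for illustration, not the authors' C routines; the integer-part width used for overflow checking is an assumed value.

```python
def q_fixed(x, b, int_bits=8):
    """Round-to-nearest fixed-point quantizer with b fractional bits."""
    step = 2.0 ** (-b)
    y = round(x / step) * step
    if abs(y) >= 2.0 ** int_bits:
        raise OverflowError("fixed-point overflow")
    return y

def q_float(x, b):
    """Quantize x to a (mantissa, exponent) pair with a b-bit mantissa,
    |mantissa| normalized to [0.5, 1)."""
    if x == 0.0:
        return 0.0, 0
    m, e = x, 0
    while abs(m) >= 1.0:       # shift right -> positive exponent
        m /= 2.0
        e += 1
    while abs(m) < 0.5:        # shift left -> negative exponent
        m *= 2.0
        e -= 1
    m = round(m * 2.0 ** b) / 2.0 ** b
    if abs(m) >= 1.0:          # rounding carried past 1: renormalize
        m /= 2.0
        e += 1
    return m, e

def fixed_mul(a, c, b):
    """Fixed-point product as described: quantize, multiply, requantize."""
    return q_fixed(q_fixed(a, b) * q_fixed(c, b), b)

def float_mul(x, y, b):
    """Floating-point product: multiply mantissas, add exponents, requantize."""
    mx, ex = q_float(x, b)
    my, ey = q_float(y, b)
    return q_float(mx * my * 2.0 ** (ex + ey), b)

print(fixed_mul(0.7071, 1.4142, 8))
print(float_mul(0.7071, 1.4142, 8))
```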
Fig. 10. Average deviation in cell content of the first boundary cell in floating-point implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
The plots (Figs. 6–12) show results for STAR-RLS, QRD-RLS, PSTAR-RLS with p = 5, and PSTAR-RLS with p = 10, in that order. This enables a comparison of the algorithms. The deviation values are plotted as a function of the number of bits used for the fractional part (for both floating- and fixed-point). The plots in Figs. 6–8 are for fixed-point arithmetic, and Figs. 9–12 are for floating-point arithmetic. Fig. 6 shows the plots for the deviation in the estimation error with fixed-point arithmetic. The theoretical plots and the actual performance are very close for all the algorithms, except at low wordlengths. It should be noted that PSTAR-RLS degrades more gracefully at low wordlengths. The three algorithms have similar performance. However, as predicted in Section V, PSTAR-RLS performs better than STAR-RLS, and STAR-RLS performs better than QRD-RLS. There is little difference in performance in increasing p from 5 to 10 in PSTAR-RLS. There is a 6-dB improvement for every bit increase in the wordlength. Thus, the main use of these plots would be to decide the wordlength that needs to be used for the level of performance required. In Fig. 7, plots are shown for the deviation of the cell content of the first cell in the array. The theoretical equations are those shown in Table I. Again, we see that the theoretical plots agree well with the simulated results. Fig. 8 shows the plots for the deviation in the tangents (or cosine in the case of QRD-RLS) generated by the first cell. Again, the theoretical plots and simulation results match well.

Next, we consider the plots for floating-point arithmetic. In Fig. 9, we again see a good match between the theoretical and simulation results, except at low wordlengths. The nonlinearity in the theoretical plots for STAR-RLS and QRD-RLS is due to the rounding problems associated with representing the forgetting factor at low wordlengths, as pointed out before. Thus, the quantized value of the forgetting factor varies with the wordlength initially. Again, we see that PSTAR-RLS performs better than QRD-RLS and STAR-RLS, and the difference is noticeable in this case. In the plots shown in Figs. 10 and 11, we see the deviation in cell content and tangent (or cosine in the case of QRD-RLS). These plots are not as close as in the case of fixed-point arithmetic. This shows that it is more difficult to predict performance in floating-point arithmetic.

An important use of the expressions developed in this paper is to be able to decide the wordlength that is required in an implementation. As an example, we consider the case of the QRD-RLS algorithm in fixed-point arithmetic with a fixed choice of forgetting factor. The infinite-precision estimation error variance for this case is −26 dB (this is assumed to be known a priori). Since the steady-state estimation error variance is −26 dB, we would
Fig. 11. Average deviation in tangent calculation of the first boundary cell in floating-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). (“——” theoretical; “+” simulation results).
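The wordlength decision described in this example can be mechanized directly from the plots: the deviation curves in Fig. 6 improve by roughly 6 dB for each added fractional bit, so the smallest wordlength whose predicted deviation sits acceptably far below the −26-dB infinite-precision error variance can be read off. The sketch below illustrates that arithmetic; the reference deviation at 8 bits and the 20-dB margin are assumed values for illustration, not numbers taken from the paper.

```python
def min_wordlength(target_db, dev_at_ref_db, ref_bits, slope_db_per_bit=6.0):
    """Smallest number of fractional bits whose predicted deviation
    (dev_at_ref_db at ref_bits, improving by slope_db_per_bit per extra bit)
    is at or below target_db."""
    b, dev = ref_bits, dev_at_ref_db
    while dev > target_db:
        b += 1
        dev -= slope_db_per_bit
    return b

# Infinite-precision error variance is -26 dB; require the finite-precision
# deviation to sit 20 dB below it (assumed margin), i.e. at -46 dB.
# The -34 dB deviation at 8 bits is an assumed illustrative reference point.
print(min_wordlength(target_db=-46.0, dev_at_ref_db=-34.0, ref_bits=8))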
APPENDIX A

For the first boundary cell, the input is the data input itself. At steady state, the above equation can be written as (after taking expectation)

(A.2)

We finally obtain

QRD: (A.3)

STAR: (A.5)

QRD: (A.11)

To derive the corresponding quantity for the STAR-RLS, we can use an indirect method. Let the matrix in question represent the transformation resulting from the scaled tangent rotations. The first column of the data matrix
[23] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[24] I. K. Proudler, J. G. McWhirter, and T. J. Shepherd, “Computationally efficient QR decomposition approach to least squares adaptive filters,” Proc. Inst. Elec. Eng., pt. F, vol. 138, pp. 341–353, Aug. 1991.
[25] H. Dedieu and M. Hasler, “Error propagation in recursive QRD LS filter,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 1991, pp. 1841–1844.
[26] A. B. Sripad and D. L. Snyder, “Quantization errors in floating-point arithmetic,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 456–463, Oct. 1978.
[27] C. W. Barnes, B. N. Tran, and S. H. Leung, “On the statistics of fixed-point roundoff error,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 595–605, June 1985.
[28] K. J. Raghunath and K. K. Parhi, “Fixed and floating point error analysis of QRD-RLS and STAR-RLS adaptive filters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Apr. 1994, pp. 81–84.
[29] P. A. Regalia and M. Bellanger, “On the duality between fast QR methods and lattice methods in least squares adaptive filtering,” IEEE Trans. Signal Processing, vol. 39, pp. 879–891, Apr. 1991.
[30] G. Long, F. Ling, and J. G. Proakis, “The LMS with delayed coefficient adaptation,” IEEE Trans. Acoust., Speech, Signal Processing, pp. 1397–1405, Sept. 1989.
[31] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1990.
[32] F. Ling, D. Manolakis, and J. G. Proakis, “A recursive modified Gram-Schmidt algorithm for least-squares estimation,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 829–835, Aug. 1986.

Keshab K. Parhi (S’85–M’88–SM’91–F’96) received the B.S. degree from the Indian Institute of Technology, Kharagpur, India, in 1982, the M.S. degree from the University of Pennsylvania, Philadelphia, in 1984, and the Ph.D. degree from the University of California, Berkeley, in 1988.

He is a Professor of Electrical Engineering at the University of Minnesota, Minneapolis. His research interests include concurrent algorithm and architecture designs for communications, signal and image processing systems, digital integrated circuits, VLSI digital filters, computer arithmetic, high-level DSP synthesis, and multiprocessor prototyping and task scheduling for programmable software systems. He has published over 160 papers in these areas.

Dr. Parhi received the 1994 Darlington and the 1993 Guillemin–Cauer best paper awards from the IEEE Circuits and Systems Society, the 1991 paper award from the IEEE Signal Processing Society, the 1991 Browder Thompson Prize Paper Award from the IEEE, the 1992 Young Investigator Award of the National Science Foundation, the 1992–1994 McKnight–Land Grant professorship of the University of Minnesota, and the 1987 Eliahu Jury award for excellence in systems research at the University of California, Berkeley. He is a former associate editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, is a current associate editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: ANALOG AND DIGITAL SIGNAL PROCESSING, and an editor of the Journal of VLSI Signal Processing.