
IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 45, NO. 5, MAY 1997

Finite-Precision Error Analysis of QRD-RLS and STAR-RLS Adaptive Filters

Kalavai J. Raghunath, Member, IEEE, and Keshab K. Parhi, Fellow, IEEE

Abstract—The QR decomposition-based recursive least-squares (RLS) adaptive filtering (QRD-RLS) algorithm is suitable for VLSI implementation since it has good numerical properties and can be mapped onto a systolic array. Recently, a new fine-grain pipelinable STAR-RLS algorithm was developed. The pipelined STAR-RLS algorithm (PSTAR-RLS) is useful for high-speed applications. The stability of QRD-RLS, STAR-RLS, and PSTAR-RLS has been proved, but the performance of these algorithms in finite-precision arithmetic has not yet been analyzed. The aim of this paper is to determine expressions for the degradation in the performance of these algorithms due to finite precision. By exploiting the steady-state properties of these algorithms, simple expressions are obtained that depend only on known parameters. This analysis can be used to compare the algorithms and to decide the wordlength to be used in an implementation. Since floating- or fixed-point arithmetic representations may be used in practice, both representations are considered in this paper. The results show that the three algorithms have about the same finite-precision performance, with PSTAR-RLS performing better than STAR-RLS, which does better than QRD-RLS. These algorithms can be implemented with as few as 8 bits for the fractional part, depending on the filter size and the forgetting factor used. The theoretical expressions are found to be in good agreement with the simulation results.

Manuscript received November 17, 1993; revised July 21, 1996. This work was supported by the Office of Naval Research under Contract N00014-91-J-1008. The associate editor coordinating the review of this paper and approving it for publication was Dr. Fuyun Ling.
K. J. Raghunath is with Lucent Technologies, Bell Laboratories, Murray Hill, NJ 07974 USA (e-mail: raghu@aloft.att.com).
K. K. Parhi is with the Department of Electrical Engineering, University of Minnesota, Minneapolis, MN 55455 USA (e-mail: parhi@ee.umn.edu).
Publisher Item Identifier S 1053-587X(97)03354-0.

I. INTRODUCTION

RECURSIVE least-squares (RLS) based adaptive filters are used in applications such as channel equalization, voiceband modems, digital mobile radio, beamforming, and speech and image processing. The QRD-RLS algorithm [1], [2], [3, p. 494] is the most promising algorithm since it is known to have very good numerical properties and can be mapped to a systolic array. The speed (or sample rate) of the QRD-RLS algorithm is limited by the recursive equations in its cells. A new STAR-RLS algorithm [4] was recently developed that can be used for high-speed applications, for example, in communications, magnetic recording, image processing [5], etc. This algorithm uses scaled tangent rotations (STAR) instead of the Givens rotations that are normally used. These rotations are designed so that lookahead [6] can be applied with little increase in hardware complexity. A single STAR rotation can be given as in (1.1) (see [4]). One rotation parameter is similar to a tangent, and the other is a scaling factor; the sines and cosines of the Givens rotation have been replaced by tangents and scaling factors. The scaling factor has the effect of weighting down the input data such that the tangent is never above 1. As is shown in [4], STAR-RLS has a lookahead equation that is simpler than that of QRD-RLS. The STAR rotations effectively solve an approximate weighted least-squares problem. The scaling factors are effective only for a few samples after initialization. The STAR rotations are not orthogonal but tend to become orthogonal, and it can be shown that the STAR-RLS algorithm asymptotically converges to the same least-squares solution. In fact, it is found that there is little difference in the performance of the STAR-RLS and QRD-RLS algorithms. The STAR-RLS algorithm is square-root free; however, it is different from the square-root-free algorithms presented in [7]–[9]. The square-root-free algorithms of [7]–[9] can be used to perform exact RLS, but they suffer from the same pipelinability problem as the QRD-RLS. On the other hand, STAR-RLS is designed for ease of pipelinability, and it does not generate an exact RLS solution.

A pipelined version of the STAR-RLS algorithm has also been developed; it is referred to as PSTAR-RLS [10] and can be pipelined to operate at arbitrarily high speeds. The STAR-RLS algorithm [4] and the pipelined STAR-RLS algorithm (PSTAR-RLS) [10] have lower complexity and half the intercell communication of the QRD-RLS algorithm. The PSTAR-RLS algorithm can take a longer time for initialization, depending on the level of pipelining used. Thus, we are trading off between performance and speed of operation.

The accumulation of quantization noise is an important concern in the implementation of adaptive digital filters. The cost of implementation is strongly dependent on the number of bits used to represent the signals in the filter. Hence, one would like to use as few bits per word as possible without affecting the performance of the algorithm. This motivates the finite-precision error analysis of the QRD-RLS and STAR-RLS algorithms. The stability of QRD-RLS, STAR-RLS, and PSTAR-RLS was recently proved in [10]–[12]. The dynamic range of the quantities in these algorithms has also been

evaluated in [13] and [10]. The dynamic range determines the number of bits for the integral part in a fixed-point implementation and the number of bits for the exponent in a floating-point representation. However, the representation for the fractional part (or mantissa) can be determined only by a finite-precision analysis. The effect of finite precision on adaptive filters has been considered by many authors, for example, for the LMS algorithm [14], for lattice filters [15]–[18], and for recursive least-squares (RLS) [19]–[22]. The RLS algorithms considered in [19]–[22] refer to the original algorithm, which does not use any decomposition techniques. Reference [18] also gives a comparative study of some of the previous finite-precision analysis papers. It has been found that using a QR decomposition instead of a Cholesky decomposition for solving a set of normal equations would reduce the wordlength requirement by about half [23]. This explains the popularity of the QRD-RLS algorithm. It was also shown in [10] that the STAR-RLS algorithm has good numerical properties. Simulation results have been used before to demonstrate that the QRD-RLS algorithm can be implemented with as few as 12 bits in floating-point arithmetic [24]. A comprehensive theoretical finite-precision analysis of the QRD-RLS algorithm has, however, not been done before. The only papers on this issue, to the best of our knowledge, are [25] and [12]. The analysis of [12] is for a floating-point implementation of the QRD-RLS algorithm, where the classical backward error analysis (see [18]) is used to determine bounds on the errors resulting from rounding. In addition, it is shown there that the QRD-RLS algorithm is unconditionally stable in the presence of rounding errors. In [25], the error propagation of quantization noise is investigated, assuming a single error has occurred.

The aim of this paper is to make a comprehensive stochastic error analysis of the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. Thus, we would like to find expressions for the mean and variance of the rounding errors in the different quantities of interest in the algorithms. This stochastic error analysis is, hence, a different approach as compared with the numerical error analysis of [12], using the analysis categories given in [16]. In fact, stochastic analysis is probably preferred in this case since the input samples are from a stochastic process [16]. A fully general stochastic analysis of the finite-precision effects in the QRD-RLS systolic algorithm would require keeping track of all the different parameters as they are passed from one cell to another in the systolic array. Even if such an analysis were carried out, the resulting expressions would be unwieldy and of little use. This is true for all three algorithms (QRD-RLS, STAR-RLS, and PSTAR-RLS). We would like to keep the final expressions simple. To do this, we exploit the steady-state properties of the algorithms determined in [13], [4], and [10].

In a practical implementation, either floating- or fixed-point arithmetic may be used. The errors resulting from these representations have different statistics. In a VLSI implementation, floating-point arithmetic units are more complex than fixed-point units, mainly because of the presence of two fields: exponent and mantissa. In a floating-point addition, for example, the exponents of the two quantities have to be made equal by shifting the mantissa of one of them. This implies extra hardware overhead as well as more clock cycles to complete the operation. The main advantage of floating-point arithmetic is the large dynamic range it offers as compared with fixed-point arithmetic [26]. Keeping these things in view, we have carried out the analysis separately for fixed- and floating-point arithmetic.

We would like to derive a figure of merit for a finite-precision implementation that can be used to decide the wordlength as well as to compare the three algorithms. The estimation error can be directly obtained as the final output from the systolic array for the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. The estimation error is itself the only output that is required in many applications, such as prediction-error filtering and beamforming [3, p. 494]. Hence, we use the deviation in the estimation error due to finite precision as the figure of merit. Thus, we find expressions for the average deviation in the estimation error for the three algorithms for fixed- and floating-point arithmetic. The deviation is a function of the number of bits used, the filter size, and the forgetting factor used. The analysis here is based on assumptions about the roundoff error, such as its being independent of the operands. These are not always satisfied (see [27]). However, we assume that the deviation from the assumed model is not serious enough to affect the results. Such roundoff error models have been successfully used in many papers [20], [21], [14]–[16].

By exploiting the steady-state properties of the algorithms and by making a number of simplifying approximations, we obtain simple expressions for the deviation in the estimation error. The deviation expressions for the three algorithms have a similar structure. The finite-precision performance (fixed- and floating-point) degrades dramatically as the forgetting factor approaches 1. Traditionally, the value of the forgetting factor decides the tradeoff between tracking ability and misadjustment error [3, p. 405]: a low value of the forgetting factor implies better tracking but a higher misadjustment noise. Our results show that the finite-precision error is also a factor in the choice of the forgetting factor, especially when a hardware implementation is required. See [18] for more discussion and results regarding the optimal choice of forgetting factor and precision. The theoretical expressions agree well with our simulation results. In general, the PSTAR-RLS and STAR-RLS algorithms perform a little better than the QRD-RLS algorithm, with PSTAR-RLS having the best performance. It is interesting to note that the pipelined algorithm has better performance than the serial version. For the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms, we find that it would probably be more advantageous to use fixed-point arithmetic. The dynamic range of the quantities in these algorithms is not too wide, and it can be evaluated beforehand [10], [13]. The algorithms can be implemented with very few bits; for example, for a filter size of 11, about 10 bits for the fractional part would be sufficient. This requirement varies with the filter size and the forgetting factor used.

This paper is organized as follows. In Section II, the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms are introduced,

and some steady-state properties are derived, which are used


later in the paper. The fixed-point arithmetic and the floating-
point arithmetic results are presented in Sections III and IV,
respectively. The relative performance characteristics of the
algorithms are compared in Section V, and simulation results
are presented in Section VI. Some parts of this paper have
appeared in [28].
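The fixed- versus floating-point tradeoff argued above can be illustrated with a small numerical sketch. This is our own illustration, not from the paper: for a fixed total word budget, a floating-point format must spend bits on the exponent field, leaving a shorter mantissa, whereas a fixed-point format with a known, modest dynamic range spends all of its bits on precision. The 12-bit split below (1 sign + 11 fraction bits for fixed point versus 1 sign + 4 exponent + 7 mantissa bits for floating point) is an arbitrary assumption for the demonstration.

```python
import math
import random

def fix_round(v, b):
    """Fixed point: round v to b fractional bits."""
    s = 1 << b
    return round(v * s) / s

def float_round(v, b):
    """Floating point: round the significand of v to b bits."""
    if v == 0.0:
        return 0.0
    m, e = math.frexp(v)          # v = m * 2**e with 0.5 <= |m| < 1
    s = 1 << b
    return round(m * s) / s * 2.0 ** e

def rms_error(quant, b, trials=50_000, seed=5):
    """RMS quantization error over unit-scale operands (cf. Section II-A)."""
    rng = random.Random(seed)
    acc = 0.0
    for _ in range(trials):
        v = rng.uniform(-1.0, 1.0)
        acc += (quant(v, b) - v) ** 2
    return math.sqrt(acc / trials)

if __name__ == "__main__":
    # Same 12-bit budget: 11 fraction bits (fixed) vs. 7 mantissa bits (float).
    print(rms_error(fix_round, 11), rms_error(float_round, 7))
```

For unit-scale signals the fixed-point RMS error is close to the textbook value 2^(-11)/sqrt(12), well below the floating-point error of the same budget; this is the mechanism behind the paper's conclusion that fixed point is preferable when the dynamic range is known in advance.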

II. BACKGROUND

In this section, we briefly introduce the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms and develop some steady-state properties of these algorithms. The notation of [3], [4], and [10] is closely followed here. In a linear least-squares estimation problem, we are given a time series of observation vectors and want to estimate some desired signal as a weighted combination of the observations; the estimation error is defined in (2.1). The weight vector is chosen to minimize the index of performance in (2.2), where λ is the forgetting factor. This problem can be solved in a numerically sound way by using the QR decomposition of the weighted data matrix, where the weighting matrix is

diag(λ^(n-1), λ^(n-2), ..., λ, 1).   (2.3)

Such an orthogonal triangularization of the weighted data matrix can be achieved using Givens rotations. The resulting algorithm is referred to as the QRD-RLS algorithm [1], [3, p. 494]. The systolic implementation of the QRD-RLS algorithm is shown in Fig. 1. The systolic array has two types of cells: boundary and internal cells. The triangularized matrix is collected in the triangular portion of the array, and the right-hand-side vector is available in the last column of internal cells.

Fig. 1. Systolic array implementation of QRD-RLS.

To carry out the finite-precision analysis effectively, we need to make use of some steady-state properties of the algorithms. In [13], it was shown that the QRD-RLS algorithm tends to reach a quasi-steady state: the content of the boundary cells and the rotation parameters generated by them reach steady-state values, regardless of the input, if the forgetting factor is chosen close to unity. Some of the steady-state properties of the QRD-RLS were also derived in [15] and [29]. It should also be noted that these asymptotic expressions would not hold if the inputs are nonstationary or poorly excited. Before we get into the steady-state properties, we define the concept of an effective forgetting factor, which is used for uniformity. The QRD-RLS and STAR-RLS (or PSTAR-RLS) are different algorithms, and the convergence parameter of the two algorithms can have different significance. We found that using one value of the forgetting factor for QRD-RLS is equivalent to using a different value for STAR-RLS (or PSTAR-RLS), where the latter has a value less than 1; for example, the convergence plot for QRD-RLS with one value of the forgetting factor is similar to the convergence plot of STAR-RLS with a corresponding different value. This difference arises because of the way the STAR-RLS algorithm has been developed. To make it easy to compare the algorithms, we introduce a common parameter, the effective forgetting factor, which replaces the algorithm-specific forgetting factors of QRD-RLS and of STAR-RLS (and PSTAR-RLS). For a given value of the effective forgetting factor, the performance of QRD-RLS and STAR-RLS is similar.

The expected values of the sine and cosine parameters reach steady-state values s_QRD and c_QRD given by (2.4). In Appendix A, we show that the content of the first boundary cell in QRD-RLS tends to reach a steady-state value r_QRD given by (2.5), where the power of the input data signal appears as a parameter. The average cell content of the last column of internal cells (the RHS vector), referred to here as z_QRD, is found to be given by (2.6) (see Appendix A).
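The QRD-RLS cell operations described above can be sketched in code. The following is a minimal, illustrative model of a Gentleman-Kung/McWhirter-style array cell pair, written by us under the standard textbook conventions rather than taken from the paper, so it is not necessarily the authors' exact cell equations. The boundary cell computes the Givens rotation parameters from its stored content and the incoming sample; the internal cells apply the rotation and pass the residual down. All names (`boundary_cell`, `internal_cell`, `lam`) are ours.

```python
import math

def boundary_cell(r, x, lam):
    """Boundary cell: update stored content r and emit rotation (c, s).

    r: stored cell content; x: input from above; lam: forgetting factor.
    """
    r_new = math.sqrt(lam * r * r + x * x)
    if r_new == 0.0:
        return 0.0, 1.0, 0.0              # degenerate case: identity rotation
    c = math.sqrt(lam) * r / r_new        # cosine parameter
    s = x / r_new                         # sine parameter
    return r_new, c, s

def internal_cell(r, x, c, s, lam):
    """Internal cell: rotate (sqrt(lam)*r, x) and pass the residual down."""
    sl = math.sqrt(lam)
    r_new = c * sl * r + s * x
    x_out = -s * sl * r + c * x
    return r_new, x_out
```

In exact arithmetic the rotation is orthogonal (c^2 + s^2 = 1) and norm-preserving, which is the property the finite-precision analysis of Section III perturbs: each multiply and divide in these recursions injects a small rounding error.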

In STAR-RLS, scaled tangent rotations (STAR) are used instead of Givens rotations. The STAR-RLS systolic algorithm is shown in Fig. 2. In [10], it was shown that the tangents generated in the STAR-RLS algorithm tend to reach a steady-state value t_STAR given by (2.7). The steady-state value r_STAR of the content of the first boundary cell and the average cell content z_STAR of the last column of internal cells are found to be given by (2.8) and (2.9), respectively (see Appendix A).

Fig. 2. Systolic array implementation of STAR-RLS. A dummy parameter is used to simplify the hardware structure: given this parameter and the ratio t(n)/z(n), the quantities t(n) and z(n) can be uniquely determined.

The PSTAR-RLS algorithm is a pipelined version of the STAR-RLS algorithm. It is derived by applying lookahead and using delayed adaptation [30]; its properties can hence be different from those of the STAR-RLS algorithm. The cell equations of the algorithm are shown in Fig. 3. The steady-state value of the tangent in this case is t_PSTAR, given by (2.10) (see [10]), where p is the speedup or the level of pipelining used. In Appendix A, it is shown that the corresponding steady-state boundary-cell content r_PSTAR and average RHS-column content z_PSTAR are given by (2.11) and (2.12).

Fig. 3. Cell equations for PSTAR-RLS (the systolic array is the same as given in Fig. 2 for STAR-RLS).

A. Assumptions for the Analysis

In order to carry out the analysis, we make some simplifying assumptions. We assume that the forgetting factor is close to unity (as is usually the case) and that the number of bits used is not unreasonably small. We have also assumed that the desired signal has a power of unity and that the input data signals (the elements of the data vector) have a power of roughly the same order. In some applications the desired signal satisfies this directly, and hence, the assumption is not restrictive. If the assumption is not valid, the inputs can be scaled to meet the requirements approximately (by shifting the inputs, for example). All the input signals are assumed to be zero mean. The forgetting factor is a constant parameter, and it is assumed to be chosen a priori such that it is not affected by the finite-precision quantization.

B. Finite-Precision Deviation Model

When the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms are implemented in finite precision (fixed point or floating point), there is a deviation in the values of the different quantities in the systolic array. The finite-precision deviation is represented by an additive term, as shown in Fig. 4 for STAR-RLS and PSTAR-RLS and in Fig. 5 for QRD-RLS. The subscripts represent the row and column number of the cell in the systolic array (for example, the cell content of the cell in the ith row and jth column, and the tangent for the cells in the ith row).

The analysis here is carried out assuming that steady state has already been reached. There are two reasons for this. First, during convergence, the behavior of the algorithm is highly unpredictable, and hence, a finite-precision analysis is not possible. Second, the convergence period is of short duration, and

therefore, the finite-precision error during this period would not contribute much to the overall error.

It should be noted that for the STAR-RLS and PSTAR-RLS algorithms at steady state, the scaling factor is always unity and the dummy parameter is 0. Hence, these quantities are removed from the finite-precision model, and the remaining quantities (the cell contents, the tangents, and the cell outputs) have deviations associated with them. The aim of the analysis will be to determine the statistics of these deviations as a function of the wordlength.

Fig. 4. Finite-precision deviation model for the cells in STAR-RLS and PSTAR-RLS.

Fig. 5. Finite-precision deviation model for the cells in QRD-RLS.

III. FIXED-POINT ARITHMETIC PERFORMANCE

In this section, we analyze the performance of the three algorithms when fixed-point arithmetic is used. All quantities in the algorithm are represented with a fixed number of bits for the integer part and b bits for the fractional part (i.e., bits before and after the binary point). It is assumed that the number of integer bits is sufficient to avoid any overflow (this was the case for the simulation examples presented in this paper). Thus, the fixed-point representation requires the integer bits and the b fractional bits, with one additional bit being used for the sign. We also assume that rounding is used in the operations. Fixed-point errors due to multiplication and division can then be modeled as a zero-mean additive white noise process with variance 2^(-2b)/12 [14], [21]. If # represents a multiply or divide operation, then

fix[x # y] = x # y + e   (3.1)

where e is a zero-mean white noise process, independent of x and y, with this variance. Additions and subtractions do not introduce any error (assuming there is no overflow). The fixed-point deviation expressions are derived separately for the three algorithms below. The approach used is to follow the arithmetic operations in the different cells, keep track of the errors that arise due to finite-precision quantization, and then find the average value of the same. All symbols with an overbar represent finite-precision quantities.

A. STAR-RLS Algorithm

1) Boundary Cell: Consider the computations performed in the ith boundary cell of the STAR-RLS algorithm. Since we are considering a steady-state analysis, the first case in the "if" statement is always valid. The tangent is then calculated in fixed-point arithmetic as in (3.2), where the error terms are the fixed-point errors introduced by a single operation (one due to the multiplication and one due to the division). We assume that the deviation due to finite precision is small compared with the original infinite-precision quantity. Using this and further simplifications that hold at steady state, we can reduce (3.2) to (3.3).

Consider the update equation for the cell content used in the ith boundary cell. In a fixed-point implementation, the update equation can be written as (3.4). Expanding this using (3.3), we get (3.5), where we have neglected higher-order error terms, which is justified as long as the forgetting factor is close to unity.

The mean and variance of the deviation in the cell content are of interest to us. The mean of the rounding-error terms is zero according to the model. Consider the expectation in (3.6). According to the steady-state model [10], the content of the boundary cell tends to achieve a constant value [10] (actually, its variation is slow compared with the other

quantities involved). Thus, in the above expectation, the cell content can be considered to be a constant, and we have (3.7), with an assumption that can be proved in a straightforward manner.

We also need the expectation of the square of the deviation. To get an expression for this, we use the following:

1) the rounding-error terms are uncorrelated with each other and with the other variables, and each has the variance of a single rounding error;
2) the quasi-steady-state assumption [10] holds;
3) the input deviation is uncorrelated with the current rounding errors;
4) the steady-state values of Section II hold.

We finally have (3.8), where the effective forgetting factor equals the STAR-RLS forgetting factor in this case. We similarly find (3.9) (see Appendix B).

2) Internal Cell: Now, consider any internal cell in the ith row and jth column. The quantities of interest in an internal cell are the cell content and the cell output, and we want to find the statistics of the deviations due to finite precision in these quantities. We find that the deviation in the cell content has the statistics given in (3.10) and (3.11) (see Appendix B). For the cell output, we have (3.12) and (3.13).

3) Deviation in Final Estimation Error: For any cell, we now have the statistics of the deviation due to fixed-point arithmetic in the outputs in terms of the deviation in the inputs. The output from the boundary cell is the tangent, which is input to the internal cells in the same row; the output of each internal cell is the input to the cell below. Hence, we have the relationship (3.14). Using (3.14), (3.13) can be rewritten as (3.15). This recursive equation can be propagated to obtain the deviation in the final output in terms of the initial inputs to the systolic array of Fig. 2. Note that the input deviations are zero for the cells in the first row. As mentioned before, the figure of merit used here is the deviation in the estimation error (note that the a priori estimation error for the STAR-RLS algorithm is the output from the last internal cell). From (3.15), we find (3.16).

We thus have a closed-form expression for the finite-precision performance of the STAR-RLS algorithm. It depends on the forgetting factor, the array size, and the wordlength used.

B. PSTAR-RLS Algorithm

The computations involved in the PSTAR-RLS algorithm are shown in Fig. 3, and the finite-precision deviation model is the same as shown in Fig. 4. The notation is hence the same as that for the STAR-RLS algorithm.

1) Boundary Cell: The computation of the tangent is very similar to that in STAR-RLS, but the cell-content update has extra add operations involved. We carry out the analysis for PSTAR-RLS as before and find (3.17)-(3.20), where p is the speedup or level of pipelining used. It is interesting to note that these expressions reduce to those for the STAR-RLS algorithm if p is set equal to 1. This is because the PSTAR-RLS algorithm with p = 1 becomes very similar (but not the same) to the STAR-RLS algorithm.
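The elementary rounding model (3.1), on which all of these deviation expressions rest, is easy to check numerically. The sketch below is our own illustration, not from the paper: it quantizes products to b fractional bits with rounding and verifies that the resulting error behaves like zero-mean noise with variance close to 2^(-2b)/12.

```python
import random

def fix(value, b):
    """Round value to b fractional bits (fixed-point rounding)."""
    scale = 1 << b
    return round(value * scale) / scale

def roundoff_stats(b, trials=200_000, seed=1):
    """Empirical mean and variance of e = fix(x*y) - x*y for random operands."""
    rng = random.Random(seed)
    errs = []
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        errs.append(fix(x * y, b) - x * y)
    mean = sum(errs) / trials
    var = sum((e - mean) ** 2 for e in errs) / trials
    return mean, var

if __name__ == "__main__":
    b = 8
    mean, var = roundoff_stats(b)
    predicted = 2.0 ** (-2 * b) / 12.0     # variance of a single rounding error
    print(mean, var, predicted)
```

With b = 8, the measured variance lands within a few percent of the predicted 2^(-16)/12, supporting the white-noise modeling assumption of Section III (its limitations for correlated operands are discussed in [27]).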

2) Internal Cell: For the internal cell, we find (3.21)-(3.24).

3) Deviation in Final Estimation Error: As in the case of STAR-RLS, we find the deviation in the estimation error to be given by (3.25). Again, we have a closed-form expression.

C. QRD-RLS Algorithm

Now, we consider the deviation due to fixed-point arithmetic in the QRD-RLS algorithm. The computations carried out in each of the cells are shown in Fig. 1, and the model for the finite-precision deviation is shown in Fig. 5.

1) Boundary Cell: The arithmetic operation in the boundary cell of the QRD-RLS is very different from that of the STAR-RLS and PSTAR-RLS. In STAR-RLS, the rotation parameter (tangent) is first computed, and the cell content is updated using it. In the QRD-RLS algorithm, the cell content is first updated, and this is then used to generate the rotation parameters (sine and cosine). In addition to the rotation parameters, a conversion-factor term is also generated by the boundary cell. This factor is required to be able to obtain the estimation error directly without backsubstitution (the McWhirter type [2] of QRD-RLS array). It is found that the cell-content deviation satisfies (3.26) with (3.27). The deviations in the rotation parameters, the sine and the cosine, are found to be given by (3.28)-(3.31). For the deviation in the conversion factor, we find (3.32).

2) Internal Cell: For the internal cell, we have (3.33)-(3.36).

3) Deviation in Final Estimation Error: The deviation due to finite precision propagates down the systolic array and can be evaluated as done before for STAR-RLS and PSTAR-RLS. Note that the expression for the propagated deviation has the same general form as in STAR-RLS. However, in the case of QRD-RLS, there is an extra final cell, and the deviation due to the operations in this cell also has to be included in the final expression. We find the approximate deviation in the estimation error to have the statistics given in (3.37) and (3.38).

The finite-precision performance of QRD-RLS thus also has a closed-form expression similar to those of STAR-RLS and PSTAR-RLS. In the above, we assumed that the estimation error is small, which is quite reasonable in most cases. A more general expression is given in Appendix B.

IV. FLOATING-POINT ARITHMETIC PERFORMANCE

In the last section, the deviation due to finite-precision fixed-point arithmetic was analyzed. In this section, the analysis is repeated for a floating-point representation. A floating-point representation consists of a mantissa field and an exponent field. For the mantissa, 1 bit is used as a sign bit, and rounding is used on the mantissa to bring it down to the allotted number of bits. The exponent is an integer, and it also has a sign bit. In contrast to fixed-point arithmetic, in a floating-point representation any operation, including addition or subtraction, involves a quantization error. If "#" represents a division, multiplication, addition, or subtraction operation, then quantizing the result to a floating-point format involves a relative error ε, i.e.,

float[x # y] = (x # y)(1 + ε)   (4.1)
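The relative-error model (4.1) can likewise be checked numerically. The sketch below is our own, with an assumed mantissa width `bm`: it rounds the significand of a product to bm bits and confirms that the error is bounded relative to the result's magnitude, i.e., a relative rather than additive error, regardless of the operands' scale.

```python
import math
import random

def float_round(value, bm):
    """Round value to a bm-bit significand (significand scaled to [0.5, 1))."""
    if value == 0.0:
        return 0.0
    m, e = math.frexp(value)          # value = m * 2**e, 0.5 <= |m| < 1
    scale = 1 << bm
    return round(m * scale) / scale * (2.0 ** e)

def relative_error_spread(bm, trials=100_000, seed=3):
    """Max |float_round(x*y) - x*y| / |x*y| over operands of mixed scales."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        x = rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-6, 6)
        y = rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-6, 6)
        z = x * y
        if z != 0.0:
            worst = max(worst, abs(float_round(z, bm) - z) / abs(z))
    return worst

if __name__ == "__main__":
    print(relative_error_spread(12))   # bounded by about 2**-12
```

Because the significand lies in [0.5, 1), a half-ulp rounding of bm bits gives a relative error of at most 2^(-bm), independent of the exponent; this is exactly the multiplicative (1 + ε) behavior assumed in (4.1).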

TABLE I
EXPRESSIONS FOR THE FIRST BOUNDARY CELL

where ε is assumed to be a zero-mean noise process, independent of x and y, with a variance given in [20] and [26]. Note that the quantization error is not additive as in the case of fixed-point arithmetic; it is a relative error, as shown above. The notation used for the deviations in the different quantities is the same as that used for fixed-point arithmetic; the type of representation used should be clear from the context.

A. STAR-RLS Algorithm

1) Boundary Cell: Consider the computation of the tangent in the STAR-RLS algorithm, as was done for the fixed-point arithmetic case in (3.2). The floating-point version is given in (4.2), where one relative error occurs due to the multiplication and another due to the division. We carry out the analysis as done before for the fixed-point case, and the results are shown below. The means of all the deviations are found to be zero, as in the case of fixed-point arithmetic; hence, only the variances are given. We have (4.3) and (4.4).

2) Internal Cells: For the computations in the internal cell, we find (4.5) and (4.6).

3) Propagation of Error in the Systolic Array: Propagating down the systolic array, we find the deviation in the output estimation error to be given by (4.7). Note that the expression has a format that is very similar to that for the fixed-point case.

B. PSTAR-RLS Algorithm

1) Boundary Cell: The update equation of the cell content has a summation term over p quantities. Each summation adds a relative error with the variance of a single rounding error. Since these errors appear as multiplicative terms, the analysis gets complicated to a large degree; we have to take into account the correlation of the same random variable appearing at different places. We find (4.8) and (4.9).

2) Internal Cell: We have (4.10)

Fig. 6. Average deviation in estimation error in a fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10). ("——" theoretical; "+" simulation results.)

and (4.11).

3) Deviation in Final Estimation Error: The final deviation in the estimation error is given by (4.12). Again, we have a closed-form expression similar to that of STAR-RLS.

C. QRD-RLS Algorithm

1) Boundary Cell: As before, the means of the deviations are approximately zero. The variances of the quantities are found to be given by (4.13)-(4.16).

2) Internal Cell: Here, we have (4.17) (shown on the next page) and (4.18).

3) Deviation in Final Estimation Error: Finally, we obtain (4.19). This expression was simplified assuming that the estimation error is small. A more general expression is given in (B.3) in Appendix B.
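Both the fixed- and floating-point deviation expressions derived above grow rapidly as the forgetting factor approaches 1. A scalar stand-in of our own construction (not the paper's recursion) shows the mechanism: inject a fresh rounding error e(n) of variance sigma^2 into an exponentially weighted update d(n) = lam * d(n-1) + e(n), as the propagation step (3.15) effectively does cell by cell. The accumulated error variance converges to sigma^2 / (1 - lam^2), which blows up as lam tends to 1.

```python
import random

def accumulated_error_variance(lam, sigma, steps=60_000, seed=7):
    """Monte-Carlo variance of the accumulated error d(n),
    where d(n) = lam * d(n-1) + e(n) and e(n) ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    d = 0.0
    samples = []
    for n in range(steps):
        d = lam * d + rng.gauss(0.0, sigma)
        if n > 1000:                       # skip the start-up transient
            samples.append(d)
    mean = sum(samples) / len(samples)
    return sum((s - mean) ** 2 for s in samples) / len(samples)

if __name__ == "__main__":
    sigma = 1e-3
    for lam in (0.9, 0.99):
        est = accumulated_error_variance(lam, sigma)
        pred = sigma ** 2 / (1.0 - lam ** 2)
        print(lam, est, pred)
```

Moving lam from 0.9 to 0.99 roughly multiplies the accumulated variance by ten, which is the same qualitative tradeoff the paper identifies: a forgetting factor near 1 improves misadjustment but amplifies the finite-precision deviation.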

(a) (b)

(c) (d)
Fig. 7. Average deviation in cell content of the first boundary cell in fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS
(p = 5). (d) PSTAR-RLS (p = 10): (“——” theoretical; “+” Simulation results).

V. ANALYTICAL COMPARISON OF THE ALGORITHMS

In this section, the equations developed in the last two sections are simplified so that we can comment on the relative performance of the algorithms and compare the fixed-point implementation with the floating-point implementation. Before doing that, we consider the deviations in the cell content and in the rotation parameters (tangents, cosines, or sines) generated by the boundary cell. It may be noted that the expressions derived for the deviation in cell content and the deviation in the rotation parameters are not in closed form. We have developed closed-form expressions only for the estimation error, since that was of prime importance. However, by carrying the analysis further, the deviations for every cell can be found. To get an idea of the quantities involved, we show the expressions for the first boundary cell in Table I.

It may be noted that the error in the rotation parameter is of about the same order for most of the cases [except PSTAR-RLS (FLOAT)]. The error in cell content is higher for the floating-point calculations than for the fixed-point calculations. This is due to the fact that, in fixed point, extra bits are available for the integral part.

To compare the performance of the different algorithms, we can use the expressions for the deviation in the estimation error. Note that the expressions are similar in form, and this greatly simplifies the comparisons. First, we compare the floating- and fixed-point implementations of the same algorithm. It should be kept in mind that the variance is different for the two implementations. For STAR-RLS and QRD-RLS, we find that the fixed-point implementation is always better than the floating-point one. The same is true for PSTAR-RLS if

(5.1)

With the appropriate substitutions, this condition can be put in an equivalent, simpler form.

Next, we compare the different algorithms with the same arithmetic being used.

Fig. 8. Average deviation in tangent calculation of the first boundary cell in fixed-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10) ("——" theoretical; "+" simulation results).
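The comparisons in this section repeatedly involve the gain with which a forgetting-factor-weighted recursion accumulates injected round-off noise. A first-order illustration, using a scalar leaky accumulator rather than the actual cell equations (a toy model, our own):

```c
#include <assert.h>
#include <math.h>

/* Steady-state variance of y(n) = lambda*y(n-1) + e(n), with e(n) white
 * noise of variance ve: iterate the variance recursion v <- lambda^2*v + ve. */
double accumulated_variance(double lambda, double ve, int n) {
    double v = 0.0;
    for (int i = 0; i < n; i++)
        v = lambda * lambda * v + ve;
    return v;
}
```

The limit is ve/(1 - lambda^2), which grows sharply as the forgetting factor approaches 1; replacing the factor by its pth power, as pipelining effectively does for the delayed cell content, reduces this accumulation gain, which is one way to see why PSTAR-RLS can fare slightly better numerically.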

In the case of fixed-point arithmetic, it can be shown that PSTAR-RLS always performs better than STAR-RLS. The same would be true for the floating-point implementation if

(5.2)

The above would be satisfied if the forgetting factor is not too small. Thus, in general, PSTAR-RLS has better numerical properties than the STAR-RLS algorithm. This would not be expected, since more computations are involved in PSTAR-RLS. However, it should be noted that the computation of the tangent in PSTAR-RLS is simpler. In addition, the effect of the forgetting factor is to multiply the delayed version of the cell content by its pth power rather than by the factor itself, and this controls the finite-precision deviation, since it leads to a correspondingly different factor in the error expressions.

Next, we compare STAR-RLS and QRD-RLS. In the case of fixed point, it can be shown that STAR-RLS performs better than QRD-RLS if

(5.3)

The right-hand side (RHS) of (5.3) is positive only under a condition on the forgetting factor. The condition (5.3) will hence be satisfied most of the time, except when the forgetting factor is close to, but above, 0.5. In the case of floating-point arithmetic, STAR-RLS has a better performance than QRD-RLS under a similar condition. Since the forgetting factor is close to 1, this condition would be satisfied for most applications.

VI. SIMULATION RESULTS

Next, we show results of the simulations conducted to verify the theoretical expressions developed in this paper. We use the 11-tap equalizer example of [3, p. 494], which is also used in many recent papers, including [4] and [10]. A random binary sequence that takes values ±1 is input to the channel. The channel has the impulse response of a raised cosine

(6.1)

The output of the channel is given by

(6.2)

where the additive noise is modeled as a zero-mean Gaussian random variable with a variance of 0.001.
Fig. 9. Average deviation in estimation error in floating-point implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10) ("——" theoretical; "+" simulation results).

Using simulation results, we first verify the infinite precision expression developed in Section II. We find that the simulated results are close to the predicted values. For example, the theoretical and simulated values are as follows:

• for STAR-RLS: simulated 9.29, theoretical 9.14;
• for QRD-RLS: simulated 9.07, theoretical 9.09;
• for PSTAR-RLS: simulated 9.51, theoretical 9.23.

To simulate the finite-precision behavior of the algorithms, we carry out the computations as they would be done in an actual circuit. We have developed subroutines in the C language to mimic the operations of finite-precision arithmetic. For every operation of add, multiply, etc., the inputs are quantized to the given format (fixed- or floating-point) and the wordlength. The actual operation is carried out, and the result is then quantized. For example, if two quantities a and b are being multiplied in fixed-point arithmetic, then the result is obtained as Q[ab], where Q[·] is the quantization operation for fixed-point representation. To accomplish quantization, the number is represented in binary format and rounded off to the wordlength specified (overflows generate error messages). The position of the binary point is known beforehand.

In the case of floating point, each number is represented as a vector of length 2: one entry for the exponent and the other for the mantissa. The handling of the mantissa and exponent is done as in hardware. To quantize a given quantity in floating-point format, the number is represented in binary and shifted right or left until its absolute value is between 0.5 and 1. The number of shifts is given as the exponent (positive for right shift and negative for left shift). The shifted binary value is rounded to the wordlength specified, and this becomes the mantissa. For a multiplication, the mantissas of the two numbers are multiplied, and the exponents are added. The outputs are again quantized to a floating-point format. Overflows at any point generate error messages. The rounding scheme used for both the fixed- and floating-point simulations is the round-to-the-nearest scheme in [31, p. A.16]. Since the forgetting factor is very close to 1, at low wordlengths it gets rounded to 1. In such situations, we use the closest possible value that is not rounded to 1.

The simulations are run on a 5000-long data stream, and the average results are computed by averaging over the last 1250 samples. This ensures that the results are calculated after steady state has been reached. The infinite precision results are also calculated alongside, and the deviation is determined. The plots (Figs. 6–12) show results for STAR-RLS, QRD-RLS, PSTAR-RLS with p = 5, and PSTAR-RLS with p = 10, in that order.
Fig. 10. Average deviation in cell content of the first boundary cell in floating-point implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10) ("——" theoretical; "+" simulation results).

This enables a comparison of the algorithms. The deviation values are plotted as a function of the number of bits used for the fractional part (for both floating and fixed point). The plots in Figs. 6–8 are for fixed-point arithmetic, and Figs. 9–12 are for floating-point arithmetic.

Fig. 6 shows the plots for the deviation in estimation error with fixed-point arithmetic. The theoretical plots and the actual performance are very close for all the algorithms, except at low wordlengths. It should be noted that PSTAR-RLS degrades more gracefully at low wordlengths. The three algorithms have similar performance. However, as predicted in Section V, PSTAR-RLS performs better than STAR-RLS, and STAR-RLS performs better than QRD-RLS. There is little difference in performance in increasing p from 5 to 10 in PSTAR-RLS. There is a 6-dB improvement for every bit increase in the wordlength. Thus, the main use of these plots would be to decide the wordlength that needs to be used for the level of performance required.

In Fig. 7, plots are shown for the deviation of the cell content of the first cell in the array. The theoretical equations are those shown in Table I. Again, we see that the theoretical plots agree well with the simulated results. Fig. 8 shows the plots for the deviation in the tangents (or cosines, in the case of QRD-RLS) generated by the first cell. Again, the theoretical plots and simulation results match well.

Next, we consider the plots for floating-point arithmetic. In Fig. 9, we again see a good match between the theoretical and simulation results, except at low wordlengths. The nonlinearity in the theoretical plots for STAR-RLS and QRD-RLS is due to the rounding problems associated with representing the value of the forgetting factor at low wordlengths, as pointed out before. Thus, the quantized value of the forgetting factor varies with the wordlength initially. Again, we see that PSTAR-RLS performs better than QRD-RLS and STAR-RLS, and the difference is noticeable in this case. In the plots shown in Figs. 10 and 11, we see the deviation in cell content and tangent (or cosine, in the case of QRD-RLS). These plots are not as close as in the case of fixed-point arithmetic. This shows that it is more difficult to predict performance in floating-point arithmetic.

An important use of the expressions developed in this paper is to be able to decide the wordlength that is required in an implementation. As an example, we consider the case of the QRD-RLS algorithm in fixed-point arithmetic. The infinite precision estimation error variance for this case is −26 dB (this is assumed to be known a priori). Since the steady-state estimation error variance is −26 dB, we would like the deviation to be well below this value.
Fig. 11. Average deviation in tangent calculation of the first boundary cell in floating-point arithmetic implementation. (a) STAR-RLS. (b) QRD-RLS. (c) PSTAR-RLS (p = 5). (d) PSTAR-RLS (p = 10) ("——" theoretical; "+" simulation results).

Depending on the application, this decision can be made. For a 10-dB difference, we need the deviation to be −36 dB. From the theoretical plot (or from the expression (3.38) directly), we see that we would need about 8 or 9 bits at the minimum. In Fig. 12, we have plotted the convergence curves for different numbers of bits. The plots are over a data window of 300 samples, and the results are averaged over 200 runs. With wordlengths of 5, 6, and 7 bits, the finite-precision curve diverges. The curve with 8 bits appears to be close to the infinite-precision result.

Finally, in Fig. 13, we plot the fixed-point estimation error deviation for the recursive modified Gram–Schmidt algorithm with error feedback (RMGS-EF) [32]. Only the simulation results are plotted. The performance of the RMGS-EF algorithm seems to be similar to that of the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. More accurate comparisons will require further study.

Fig. 12. Convergence curves for QRD-RLS in fixed-point implementation with different wordlengths (k) for the fractional part.

VII. CONCLUSIONS

It is well known that QR-based RLS systolic algorithms have very good numerical properties, but the actual performance of these algorithms as a function of the number of bits has not been analyzed before. In this paper, we have considered the QRD-RLS algorithm and the high-speed algorithms STAR-RLS and PSTAR-RLS. By using simple approximations and exploiting the properties of the algorithms, we have evaluated expressions for finite-precision performance
for fixed- and floating-point implementations. The expressions are closed-form and simple. In a practical situation, given the required performance, the wordlengths can readily be determined. We find that PSTAR-RLS, STAR-RLS, and QRD-RLS have about the same finite-precision performance, with PSTAR-RLS doing a little better than STAR-RLS, which does better than QRD-RLS. It is interesting to note that pipelining also improves numerical performance. The finite-precision deviation increases dramatically as the forgetting factor becomes close to 1. Hence, lower values of the forgetting factor would be preferred.

The analysis in this paper is based on the assumption that the round-off error is independent of the operands. This assumption may not be valid in some practical applications. The results of this paper would not be very accurate in such applications.

APPENDIX A

In this Appendix, we derive some of the infinite precision properties of the QRD-RLS, STAR-RLS, and PSTAR-RLS algorithms. Note that the results obtained here use the original notation of [4] and [10]. These are used in the main text by converting to the effective forgetting factor notation.

According to the quasisteady-state model [13], the contents of the boundary cells in the QRD-RLS systolic array tend to reach a steady-state value, regardless of the input data, if the forgetting factor is close to unity. Thus, the content of the first boundary cell will reach a steady value. Consider the boundary cell equation

(A.1)

At steady state, the above equation can be written as (after taking expectation)

(A.2)

For the first boundary cell, the input is the data input itself. We finally obtain, for QRD-RLS

(A.3)

For the STAR-RLS algorithm, note the steady-state values taken by the rotation parameters. Substituting these into the equation for the boundary cell content, we have, after some manipulation

(A.4)

In [10], it was shown that STAR-RLS also reaches a steady state similar to the QRD-RLS algorithm. Thus, at steady state, we have, after taking expectation, for STAR-RLS

(A.5)

Similarly, for the PSTAR-RLS, we find

(A.6)

Fig. 13. Average deviation in estimation error in fixed-point arithmetic for the RMGS-EF algorithm [32].

Next, we consider the derivation of the average content of the internal cells in the last column in the QRD-RLS algorithm. Consider the orthogonal matrix representing all the Givens rotations used in the triangularization. The desired signal can be put in a vector, which is weighted by the exponential weighting matrix and then transformed by the orthogonal matrix as

(A.7)

where the result contains the vector that appears in the last column of cells in the systolic array. Now, since the transformation is unit norm, we have, asymptotically

(A.8)

From [3], we know that, asymptotically

(A.9)

Now, taking the expectation of the norm of (A.7) and using (A.8) and (A.9), we have

(A.10)

since the estimation error is a small quantity at steady state when the adaptive filter is tracking well. The cell content of interest is the average of the elements of this vector. Hence, for QRD-RLS

(A.11)
To derive the corresponding result for STAR-RLS, we can use an indirect method. Consider the transformation resulting from the scaled tangent rotations. The first column of the data matrix is reduced to a single component, which is the content of the first boundary cell, so the norm of that column is carried over to the cell content. It may be noted that the first column has components that are formed by the input data. The desired signal vector, after transformation, would hence be expected to have a norm equal to its norm before transformation; this is approximately the case, as was found for the QRD-RLS in (A.9) and (A.10). Thus, for STAR-RLS

(A.12)

We can similarly find, for the PSTAR-RLS

(A.13)

APPENDIX B

In this section, we give an outline of the derivation of the equations in Section III. Equation (3.9) can be derived from (3.3) by using (3.8). To obtain (3.11), we again go through the computations for the internal cell and determine the deviation. In this case, we assume that the deviations involved are mutually independent. When the equation for the deviation is derived, it is found to depend on two other deviation terms. For later manipulation, it would be very convenient if it could be expressed as a function of one of them alone. The influence of the other term is found to be quite weak, and hence, approximations can be made in this case. We assume that

(B.1)

and simplify the equation. Since the two quantities equated in (B.1) are the deviations at different points along the same row, the average errors are likely to be of roughly the same magnitude (they also pass through the same number of cells before reaching that point). Using this additional approximation, we arrive at (3.13).

The expression (3.38), for QRD-RLS in fixed-point arithmetic, assumes that the steady-state estimation error is small, and hence, the expression was simplified. A more general expression is

(B.2)

Similarly, in a floating-point implementation for QRD-RLS, a more general expression instead of (4.19) is

(B.3)

ACKNOWLEDGMENT

The authors would like to thank the anonymous reviewers for their valuable comments that led to the improvement of this paper.

REFERENCES

[1] W. M. Gentleman and H. T. Kung, "Matrix triangularization by systolic arrays," Proc. SPIE, Real Time Signal Processing IV, vol. 298, pp. 298–303, 1981.
[2] J. G. McWhirter, "Recursive least-squares minimization using a systolic array," Proc. SPIE, Real Time Signal Processing IV, vol. 431, pp. 105–112, 1983.
[3] S. Haykin, Adaptive Filter Theory. Englewood Cliffs, NJ: Prentice-Hall, 1986.
[4] K. J. Raghunath and K. K. Parhi, "High-speed RLS using scaled tangent rotations (STAR)," in Proc. IEEE Int. Symp. Circuits Syst. (ISCAS-93), May 1993, pp. 1959–1962.
[5] A. Nosratinia and M. T. Orchard, "Discrete formulation of pel-recursive motion compensation with recursive least squares updates," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 1993, vol. V, pp. 229–232.
[6] K. K. Parhi and D. G. Messerschmitt, "Pipeline interleaving and parallelism in recursive digital filters—Part I: Pipelining using scattered look-ahead and decomposition," IEEE Trans. Acoust., Speech, Signal Processing, pp. 1118–1134, July 1989.
[7] W. M. Gentleman, "Least-squares computations by Givens transformations without square-roots," J. Inst. Math. Applications, vol. 12, pp. 329–336, 1973.
[8] S. F. Hsieh, K. J. R. Liu, and K. Yao, "A unified sqrt-free rank-1 up/down dating approach for recursive least-squares problems," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 1990, pp. 1017–1021.
[9] G. H. Golub and C. F. Van Loan, Matrix Computations. Baltimore, MD: Johns Hopkins Univ. Press, 1989.
[10] K. J. Raghunath and K. K. Parhi, "Pipelined implementation of high-speed STAR-RLS adaptive filters," Proc. SPIE, Advanced Signal Processing Algorithms, Architectures, Implementations IV, 1993, vol. 2027.
[11] H. Leung and S. Haykin, "Stability of recursive QRD LS algorithms using finite precision systolic array implementation," IEEE Trans. Acoust., Speech, Signal Processing, vol. 37, pp. 760–763, May 1989.
[12] G. W. Stewart, "Error analysis of QR updating with exponential windowing," Math. Comput., vol. 59, pp. 135–140, 1992.
[13] K. J. R. Liu, K. Yao, and C. T. Chiu, "Dynamic range, stability, and fault-tolerant capability of finite-precision RLS systolic array based on Givens rotation," IEEE Trans. Circuits Syst., vol. 38, pp. 625–636, June 1991.
[14] C. Caraiscos and B. Liu, "A roundoff error analysis of the LMS adaptive algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 34–41, Feb. 1984.
[15] M. A. Syed and V. J. Mathews, "Finite precision error analysis of a QR-decomposition based lattice predictor," Opt. Eng., vol. 31, pp. 1170–1180, June 1992.
[16] R. C. North, J. R. Zeidler, W. H. Ku, and T. R. Albert, "A floating-point arithmetic analysis of direct and indirect coefficient updating techniques for adaptive lattice filters," IEEE Trans. Signal Processing, vol. 41, pp. 1809–1823, May 1993.
[17] C. G. Samson and V. U. Reddy, "Fixed point error analysis of the normalized ladder algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1177–1191, Oct. 1983.
[18] J. R. Bunch and R. C. LeBorne, "Error accumulation effects for the a posteriori RLSL prediction filter," IEEE Trans. Signal Processing, vol. 43, pp. 150–159, Jan. 1995.
[19] S. Ljung and L. Ljung, "Error propagation properties of recursive least-squares algorithms," Automatica, vol. 21, no. 2, pp. 157–167, 1985.
[20] S. H. Ardalan, "Floating-point error analysis of recursive least-squares and least-mean-squares adaptive filters," IEEE Trans. Circuits Syst., vol. CAS-33, pp. 1192–1208, Dec. 1986.
[21] S. H. Ardalan and S. T. Alexander, "Fixed-point error analysis of the exponential windowed RLS algorithm for time-varying systems," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 770–783, June 1987.
[22] M. H. Verhaegen, "Round-off error propagation in four generally-applicable, recursive, least-squares estimation schemes," Automatica, vol. 25, no. 3, pp. 437–444, 1989.
[23] C. L. Lawson and R. J. Hanson, Solving Least Squares Problems. Englewood Cliffs, NJ: Prentice-Hall, 1974.
[24] I. K. Proudler, J. G. McWhirter, and T. J. Shepherd, "Computationally efficient QR decomposition approach to least squares adaptive filters," Proc. Inst. Elec. Eng., pt. F, vol. 138, pp. 341–353, Aug. 1991.
[25] H. Dedieu and M. Hasler, "Error propagation in recursive QRD LS filter," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), 1991, pp. 1841–1844.
[26] A. B. Sripad and D. L. Snyder, "Quantization errors in floating-point arithmetic," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 456–463, Oct. 1978.
[27] C. W. Barnes, B. N. Tran, and S. H. Leung, "On the statistics of fixed-point roundoff error," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 595–605, June 1985.
[28] K. J. Raghunath and K. K. Parhi, "Fixed and floating point error analysis of QRD-RLS and STAR-RLS adaptive filters," in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing (ICASSP), Apr. 1994, pp. 81–84.
[29] P. A. Regalia and M. Bellanger, "On the duality between fast QR methods and lattice methods in least squares adaptive filtering," IEEE Trans. Signal Processing, vol. 39, pp. 879–891, Apr. 1991.
[30] G. Long, F. Ling, and J. G. Proakis, "The LMS algorithm with delayed coefficient adaptation," IEEE Trans. Acoust., Speech, Signal Processing, pp. 1397–1405, Sept. 1989.
[31] J. L. Hennessy and D. A. Patterson, Computer Architecture: A Quantitative Approach. San Mateo, CA: Morgan Kaufmann, 1990.
[32] F. Ling, D. Manolakis, and J. G. Proakis, "A recursive modified Gram–Schmidt algorithm for least-squares estimation," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 829–835, Aug. 1986.

Kalavai J. Raghunath (S'87–M'96) received the B.E. degree in electronics and communications engineering from Osmania University, Hyderabad, India, in 1988 and the M.S. (Engg.) degree in electrical communication engineering from the Indian Institute of Science, Bangalore, India, in 1990, where he was awarded the Modawala award for his M.S. thesis. He received the Ph.D. degree in electrical engineering from the University of Minnesota, Minneapolis, in 1994.

Currently, he is a Member of Technical Staff (MTS) in the Microelectronics Division of Lucent Technologies, Bell Laboratories (formerly AT&T), Murray Hill, NJ. His present work involves design of demodulators for video applications. His interests include VLSI algorithms and architectures for signal processing, digital communications, high-speed arithmetic units for DSP processors and ASIC's, neural networks, and statistical signal processing.

Keshab K. Parhi (S'85–M'88–SM'91–F'96) received the B.S. degree from the Indian Institute of Technology, Kharagpur, India, in 1982, the M.S. degree from the University of Pennsylvania, Philadelphia, in 1984, and the Ph.D. degree from the University of California, Berkeley, in 1988.

He is a Professor of Electrical Engineering at the University of Minnesota, Minneapolis. His research interests include concurrent algorithm and architecture designs for communications, signal and image processing systems, digital integrated circuits, VLSI digital filters, computer arithmetic, high-level DSP synthesis, and multiprocessor prototyping and task scheduling for programmable software systems. He has published over 160 papers in these areas.

Dr. Parhi received the 1994 Darlington and the 1993 Guillemin–Cauer best paper awards from the IEEE Circuits and Systems Society, the 1991 paper award from the IEEE Signal Processing Society, the 1991 Browder Thompson Prize Paper Award from the IEEE, the 1992 Young Investigator Award of the National Science Foundation, the 1992–1994 McKnight–Land Grant professorship of the University of Minnesota, and the 1987 Eliahu Jury award for excellence in systems research at the University of California, Berkeley. He is a former associate editor of the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS, is a current associate editor of the IEEE TRANSACTIONS ON SIGNAL PROCESSING and the IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—PART II: ANALOG AND DIGITAL SIGNAL PROCESSING, and an editor of the Journal of VLSI Signal Processing.