
Communication Signal Processing I

8. Recursive Least Squares Algorithm

Markku Juntti

Overview: Kalman filtering is briefly reviewed. The method of least squares is modified to a recursive form suitable for adaptive filtering applications. Its properties are then evaluated.

Source: The material is mainly based on Chapters 7 and 13 of the course book [1] (Chapters 9 and 10 of [1A]).

Course Contents

Part I – Background
1. Introduction
2. Optimum receiver design problem and equalization
3. Mathematical tools

Part II – Linear and Adaptive Filters and Equalizers
4. Optimum linear filters
5. Matrix algorithms
6. Stochastic gradient and LMS algorithms
7. Method of least squares
8. Recursive least squares algorithm
9. Rotations and reflections
10. Square-root and order recursive adaptive filters

Part III – Nonlinear Equalizers
11. Decision-directed equalization
12. Iterative joint equalization and decoding

Part IV – Other Applications
13. Spectrum estimation
14. Array processing
15. Summary

Contents

• Review of last lecture
• Review of Kalman filters
  – Kalman filtering problem
  – Innovations process
  – State estimation by the innovations process
  – Summary of Kalman filtering
  – Summary and discussion
• Introduction to RLS algorithm
• Matrix inversion lemma
• Exponentially weighted RLS algorithm
• Weighted error squares
• Convergence analysis
• Application example ─ equalization
• Relation to Kalman filter
• Summary

Review of Last Lecture

• Method of least squares (LS): no statistical assumptions on the observations (data).
  ⇒ An alternative to the Wiener filter theory.
• Minimize the sum of squared modeling errors.
• The least squares estimate is
  – a model-dependent block-by-block method.
  – the BLUE (best linear unbiased estimator).
  – the MVUE (minimum variance unbiased estimator) for Gaussian signals.
• Robust computation can be based on the singular value decomposition (SVD) of the data matrix to calculate the pseudoinverse.

Multiple Linear Regression Model

• Assume an unknown underlying model to be estimated, with u(i) and d(i) known.
• The estimation error is e(i) = d(i) − y(i), where the filter output is
  y(i) = \sum_{k=0}^{M-1} w_k^* u(i-k).
⇒ Error: e(i) = d(i) − \sum_{k=0}^{M-1} w_k^* u(i-k).
⇒ Sum of error squares:
  E(w_0, w_1, …, w_{M-1}) = \sum_{i=i_1}^{i_2} |e(i)|^2.

Principle of Orthogonality

• The error signal: e(i) = d(i) − \sum_{k=0}^{M-1} w_k^* u(i-k) = d(i) − w^H u(i).
• The cost function (the sum of error squares):
  E(w_0, w_1, …, w_{M-1}) = \sum_{i=M}^{N} |e(i)|^2 = \sum_{i=M}^{N} e(i) e^*(i).
⇒ Principle of orthogonality with time averages:
  \sum_{i=M}^{N} u(i-k) e_{min}^*(i) = 0,  for all k = 0, 1, …, M−1.
⇒ The filter output provides the linear LS estimate of the desired response d(i).
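The time-averaged orthogonality principle is easy to check numerically. A minimal sketch in Python/NumPy (an illustrative synthetic example, not from the course material): it fits LS tap weights to data generated by a hypothetical model and verifies that every regressor is orthogonal to the minimum error.

import numpy as np

rng = np.random.default_rng(0)
M, N = 4, 200                        # filter length and number of samples

u = rng.standard_normal(N)           # input sequence u(i)
w_true = rng.standard_normal(M)      # unknown underlying model (illustrative)
# Data matrix A: row i holds [u(i), u(i-1), ..., u(i-M+1)]
A = np.column_stack([np.roll(u, k) for k in range(M)])[M - 1:]
d = A @ w_true + 0.1 * rng.standard_normal(A.shape[0])   # desired response

# Least-squares tap weights and the corresponding minimum error
w_hat, *_ = np.linalg.lstsq(A, d, rcond=None)
e_min = d - A @ w_hat

# Principle of orthogonality: sum_i u(i-k) e_min*(i) = 0 for every tap k
print(A.conj().T @ e_min)            # ~ zero vector (up to numerical precision)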

Matrix Formulation of the Normal Equations

Φ ŵ = z  ⇔  ŵ = Φ^{-1} z, if Φ is nonsingular  ⇔

\begin{bmatrix}
\phi(0,0) & \phi(1,0) & \cdots & \phi(M-1,0) \\
\phi(0,1) & \phi(1,1) & \cdots & \phi(M-1,1) \\
\vdots & \vdots & \ddots & \vdots \\
\phi(0,M-1) & \phi(1,M-1) & \cdots & \phi(M-1,M-1)
\end{bmatrix}
\begin{bmatrix} \hat{w}_0 \\ \hat{w}_1 \\ \vdots \\ \hat{w}_{M-1} \end{bmatrix}
=
\begin{bmatrix} z(0) \\ z(-1) \\ \vdots \\ z(-M+1) \end{bmatrix}

Here Φ = \sum_{i=M}^{N} u(i) u^H(i) is the time-averaged (auto)correlation matrix and z is the time-averaged cross-correlation vector.

⇒ ŵ = (A^H A)^{-1} A^H d.

Review of Kalman Filters

• Wiener filters are optimal for stationary environments.
• Kalman filters enable efficient recursive computation based on a state-space model.
• Kalman filters are optimal in the MMSE sense for nonstationary environments described by a state-space model.
• Related (similar) to recursive least squares (RLS) adaptive filtering algorithms.
• Summary herein; details in Statistical Signal Processing.
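The closed-form solution ŵ = (A^H A)^{-1} A^H d and the SVD-based pseudoinverse mentioned in the review can be compared numerically. A minimal Python/NumPy sketch with randomly generated data (an illustrative example, not from the course material):

import numpy as np

rng = np.random.default_rng(1)
M, N = 4, 200
A = rng.standard_normal((N, M))      # data matrix (rows: regressor vectors)
d = rng.standard_normal(N)           # desired response

# Normal-equation solution w_hat = (A^H A)^{-1} A^H d
Phi = A.conj().T @ A                 # time-averaged correlation matrix
z = A.conj().T @ d                   # time-averaged cross-correlation vector
w_normal = np.linalg.solve(Phi, z)

# SVD-based pseudoinverse (numerically more robust for ill-conditioned A)
w_svd = np.linalg.pinv(A) @ d

print(np.allclose(w_normal, w_svd))  # True for well-conditioned A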
Kalman Filtering Problem

[Block diagram: the process equation generates the state x(n+1) from x(n) via the state transition matrix F(n+1, n) and the process noise v_1(n); the observation y(n) is obtained from x(n) via the measurement matrix C(n) and the measurement noise v_2(n); z^{-1}I denotes a unit delay.]

• Process equation: x(n+1) = F(n+1, n) x(n) + v_1(n).
  – Special case: time-invariant system, F(n+1, n) = F.
• Measurement equation: y(n) = C(n) x(n) + v_2(n).
• Problem: find the MMSE estimates of the states x(i), 1 ≤ i ≤ n, using all the observations y(i), 1 ≤ i ≤ n.

Innovations Process

• The MMSE estimate of the observed data y(n) is ŷ(n | Y_{n-1}), where Y_{n-1} is the space spanned by the observations y(i), 1 ≤ i ≤ n−1.
• The estimation error process
  α(n) = y(n) − ŷ(n | Y_{n-1}),  n = 1, 2, …
  is called the innovations process, since by the orthogonality principle:
  1. E[α(n) y^H(k)] = 0,  1 ≤ k ≤ n−1,
  2. E[α(n) α^H(k)] = 0,  1 ≤ k ≤ n−1,
  3. {y(1), y(2), …, y(n)} ↔ {α(1), α(2), …, α(n)}.

Correlation Matrix of the Innovations Process

• Correlation matrix:
  R(n) = E[α(n) α^H(n)] = C(n) K(n, n-1) C^H(n) + Q_2(n),
  where Q_2(n) = E[v_2(n) v_2^H(n)], the predicted state-error correlation matrix is
  K(n, n-1) = E[ε(n, n-1) ε^H(n, n-1)],
  and the predicted state error is
  ε(n, n-1) = x(n) − x̂(n | Y_{n-1}).

State Estimation by the Innovations Process

• Recall that {y(1), y(2), …, y(n)} ↔ {α(1), α(2), …, α(n)}.
⇒ The MMSE state estimate can be expressed as
  x̂(i | Y_n) = \sum_{k=1}^{n} B_i(k) α(k).
⇒ Recursive estimate update:
  x̂(n+1 | Y_n) = F(n+1, n) x̂(n | Y_{n-1}) + G(n) α(n),
  where G(n) is the Kalman gain, α(n) is the innovation, and G(n) α(n) is the correction term.
Recursive One-Step Predictor

  x̂(n+1 | Y_n) = F(n+1, n) x̂(n | Y_{n-1}) + G(n) α(n).

[Block diagram: model of the dynamic system; the observation y(n) minus the predicted observation C(n) x̂(n | Y_{n-1}) forms the innovation α(n), which is scaled by the Kalman gain G(n), added to F(n+1, n) x̂(n | Y_{n-1}), and delayed by z^{-1}I to produce the next prediction.]

Kalman Gain

• The Kalman gain matrix G(n) = E[x(n+1) α^H(n)] R^{-1}(n) can also be computed recursively:
  G(n) = F(n+1, n) K(n, n-1) C^H(n) R^{-1}(n).

[Block diagram: K(n, n-1) is post-multiplied by C^H(n) and R^{-1}(n) and pre-multiplied by F(n+1, n) to form G(n); R^{-1}(n) is computed from C(n), K(n, n-1), and Q_2(n).]

Riccati Equation

• The predicted state-error correlation matrix K(n, n-1) can also be computed recursively:
  K(n+1, n) = F(n+1, n) K(n) F^H(n+1, n) + Q_1(n),
  K(n) = K(n, n-1) − F(n, n+1) G(n) C(n) K(n, n-1).

[Block diagram: the Riccati equation recursion built from a unit delay and the factors F(n+1, n), F(n, n+1), C(n), G(n), and the process noise correlation Q_1(n).]

Summary of Kalman Filtering

[Block diagram: the observation y(n) drives a one-step predictor producing x̂(n+1 | Y_n) and the filtered estimate x̂(n | Y_n) = F(n, n+1) x̂(n+1 | Y_n); a Kalman gain computer produces G(n) from K(n, n-1); a Riccati equation solver updates K(n+1, n). Initial conditions: x̂(1 | Y_0) and K(1, 0).]
Kalman Variables

x(n)             state vector at time n (M×1)
y(n)             observation vector at time n (N×1)
F(n+1, n)        state transition matrix from time n to n+1 (M×M)
C(n)             measurement matrix at time n (N×M)
Q1(n)            correlation matrix of the process noise vector v1(n) (M×M)
Q2(n)            correlation matrix of the observation noise v2(n) (N×N)
x̂(n+1 | Y_n)     predicted estimate of the state vector at time n+1 (M×1)
x̂(n | Y_n)       filtered estimate of the state vector at time n (M×1)
G(n)             Kalman gain matrix at time n (M×N)
α(n)             innovations vector at time n (N×1)
R(n)             correlation matrix of the innovations vector at time n (N×N)
K(n+1, n)        correlation matrix of the error in x̂(n+1 | Y_n) (M×M)
K(n)             correlation matrix of the error in x̂(n | Y_n) (M×M)

Kalman Computations

Known parameters: F(n+1, n), C(n), Q_1(n), Q_2(n).

  G(n) = F(n+1, n) K(n, n-1) C^H(n) [ C(n) K(n, n-1) C^H(n) + Q_2(n) ]^{-1}
  α(n) = y(n) − C(n) x̂(n | Y_{n-1})
  x̂(n+1 | Y_n) = F(n+1, n) x̂(n | Y_{n-1}) + G(n) α(n)
  K(n) = K(n, n-1) − F(n, n+1) G(n) C(n) K(n, n-1)
  K(n+1, n) = F(n+1, n) K(n) F^H(n+1, n) + Q_1(n)
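A minimal sketch of one iteration of the computations above, in Python/NumPy. The function and variable names as well as the scalar test model are illustrative assumptions, not from the course material; the filtered error correlation uses F(n, n+1) = F^{-1}(n+1, n), which assumes an invertible state transition matrix.

import numpy as np

def kalman_step(x_pred, K_pred, y, F, C, Q1, Q2):
    """One iteration of the one-step Kalman predictor.

    x_pred : predicted state x̂(n | Y_{n-1});  K_pred : K(n, n-1).
    Returns x̂(n+1 | Y_n) and K(n+1, n).
    """
    R = C @ K_pred @ C.conj().T + Q2                    # innovations correlation R(n)
    G = F @ K_pred @ C.conj().T @ np.linalg.inv(R)      # Kalman gain G(n)
    alpha = y - C @ x_pred                               # innovation α(n)
    x_next = F @ x_pred + G @ alpha                      # state prediction update
    K_filt = K_pred - np.linalg.inv(F) @ G @ C @ K_pred  # K(n), with F(n, n+1) = F^{-1}
    K_next = F @ K_filt @ F.conj().T + Q1                # Riccati update K(n+1, n)
    return x_next, K_next

# Example with a time-invariant scalar random-walk model (illustrative)
F = np.array([[1.0]]); C = np.array([[1.0]])
Q1 = np.array([[0.01]]); Q2 = np.array([[1.0]])
x_pred, K_pred = np.zeros(1), np.eye(1)
for y in np.random.default_rng(2).standard_normal(50):
    x_pred, K_pred = kalman_step(x_pred, K_pred, np.atleast_1d(y), F, C, Q1, Q2)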

Summary of Kalman Filtering

• Efficient recursive computation based on a state-space model.
• Optimal in the MMSE sense: Kalman filters minimize the trace of the filtered state-error correlation matrix K(n).
• Widely applied in control systems.
• A framework for RLS algorithms in adaptive filtering.

Variants of the Kalman Filter

• Covariance filtering.
  – Studied so far.
• Information filtering: propagate K^{-1}(n) (~ Fisher's information matrix) instead of K(n+1, n).
• Square-root filtering: propagate the Cholesky factorization K(n) = K^{1/2}(n) K^{H/2}(n) (covariance form) or its inverse K^{-1}(n) = K^{-1/2}(n) K^{-H/2}(n) (information form).
  ⇒ Improved numerical stability.
• UD factorization or fast Kalman algorithm: modifications of square-root filtering to reduce computational complexity.
  ⇒ The numerical stability advantage is lost.
Extended Kalman Filter

• Sometimes the underlying system model is nonlinear.
• The Kalman filter can be extended to such a case as well:
  1. Linearize the problem approximately by a Taylor series expansion.
  2. Approximate the state equations as
     x(n+1) ≈ F(n+1, n) x(n) + v_1(n) + d(n),
     y(n) ≈ C(n) x(n) + v_2(n),
     where d(n) is a deterministic (non-random) term.
⇒ Kalman filtering still applies, except for a few modifications.

Introduction to RLS Algorithm

• The next step is to apply the method of least squares to update the tap weights of adaptive transversal filters.
• We search for a recursive least squares (RLS) algorithm to update the filter tap weights when new observations (data, input samples) are fed into the filter.
  ⇒ More efficient utilization of data than in the LMS algorithm.
  ⇒ Improved convergence.
  ⇒ Increased complexity.
• Close relationship to Kalman filtering, but the RLS algorithm is treated here on its own.


Problem Set-Up

• The cost function to be minimized at time n is
  E(n) = \sum_{i=1}^{n} β(n, i) |e(i)|^2,
  where β(n, i) is a weighting or forgetting factor, and e(i) = d(i) − y(i) is the error between the desired response d(i) and the transversal filter output y(i).
• Recall:
  y(i) = \sum_{k=0}^{M-1} w_k^*(n) u(i-k) = w^H(n) u(i),
  u(i) = [u(i)  u(i-1)  ⋯  u(i-M+1)]^T,
  w(n) = [w_0(n)  w_1(n)  ⋯  w_{M-1}(n)]^T.
• Block processing: the taps are held fixed over the observation interval 1 ≤ i ≤ n.

Exponentially Weighted Least Squares

• The weighting factor must satisfy 0 < β(n, i) ≤ 1.
  – Forgetting is needed to cope with statistical variations (nonstationarity).
⇒ Exponential weighting factor: β(n, i) = λ^{n-i}.
⇒ Exponentially weighted least squares cost: E(n) = \sum_{i=1}^{n} λ^{n-i} |e(i)|^2.
⇒ Normal equations:
  Φ(n) ŵ(n) = z(n),  Φ(n) = \sum_{i=1}^{n} λ^{n-i} u(i) u^H(i),  z(n) = \sum_{i=1}^{n} λ^{n-i} u(i) d^*(i).
  – The sample correlations are weighted by λ^{n-i}.
  – Prewindowing (data before i = 1 is zero) is assumed.
⇒ Recursions (checked numerically in the sketch below):
  Φ(n) = \sum_{i=1}^{n} λ^{n-i} u(i) u^H(i) = λ \sum_{i=1}^{n-1} λ^{n-1-i} u(i) u^H(i) + u(n) u^H(n) = λ Φ(n-1) + u(n) u^H(n),
  z(n) = λ z(n-1) + u(n) d^*(n).
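A short Python/NumPy sketch (an illustrative check with random data, not part of the course material) confirming that the recursions above reproduce the directly computed weighted sums Φ(n) and z(n):

import numpy as np

rng = np.random.default_rng(3)
M, n, lam = 3, 100, 0.98
u = rng.standard_normal((n + 1, M))     # regressor vectors u(1), ..., u(n)
d = rng.standard_normal(n + 1)          # desired response d(1), ..., d(n)

# Recursive updates: Phi(i) = λ Phi(i-1) + u(i)u^H(i), z(i) = λ z(i-1) + u(i)d*(i)
Phi, z = np.zeros((M, M)), np.zeros(M)
for i in range(1, n + 1):
    Phi = lam * Phi + np.outer(u[i], u[i].conj())
    z = lam * z + u[i] * np.conj(d[i])

# Direct weighted sums over i = 1, ..., n
Phi_direct = sum(lam ** (n - i) * np.outer(u[i], u[i].conj()) for i in range(1, n + 1))
z_direct = sum(lam ** (n - i) * u[i] * np.conj(d[i]) for i in range(1, n + 1))

print(np.allclose(Phi, Phi_direct), np.allclose(z, z_direct))   # True True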
The Impact of the Value of λ

[Figure: four panels illustrating the impact of the forgetting factor, for λ = 0.999, 0.99, 0.98, and 0.97.]

Matrix Inversion Lemma

• General form [S. M. Kay, Fundamentals of Statistical Signal Processing, Prentice Hall, 1993, p. 571]:
  (A + BCD)^{-1} = A^{-1} − A^{-1} B (D A^{-1} B + C^{-1})^{-1} D A^{-1}.
• Textbook's [1] special case:
  A = B^{-1} + C D^{-1} C^H  ⇒  A^{-1} = B − B C (D + C^H B C)^{-1} C^H B,
  where A and B are positive definite M×M matrices.
• Another useful special case (Woodbury's identity):
  (A + u u^H)^{-1} = A^{-1} − (A^{-1} u u^H A^{-1}) / (1 + u^H A^{-1} u),
  where u is a vector.
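Woodbury's identity is easy to verify numerically. A minimal Python/NumPy check with a randomly generated positive definite A and a random vector u (an illustrative sketch):

import numpy as np

rng = np.random.default_rng(4)
M = 5
B = rng.standard_normal((M, M))
A = B @ B.conj().T + M * np.eye(M)       # a positive definite matrix A
u = rng.standard_normal(M)

lhs = np.linalg.inv(A + np.outer(u, u.conj()))
A_inv = np.linalg.inv(A)
rhs = A_inv - (A_inv @ np.outer(u, u.conj()) @ A_inv) / (1 + u.conj() @ A_inv @ u)

print(np.allclose(lhs, rhs))             # True: (A + uu^H)^{-1} matches the identity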

Exponentially Weighted RLS Algorithm

• Apply Woodbury's identity to the exponentially weighted least squares problem:
  Φ(n) = λ Φ(n-1) + u(n) u^H(n)
  ⇒ Φ^{-1}(n) = λ^{-1} Φ^{-1}(n-1) − (λ^{-2} Φ^{-1}(n-1) u(n) u^H(n) Φ^{-1}(n-1)) / (1 + λ^{-1} u^H(n) Φ^{-1}(n-1) u(n)).
• Let (for notational convenience) the inverse correlation matrix be P(n) = Φ^{-1}(n):
  P(n) = λ^{-1} P(n-1) − λ^{-1} k(n) u^H(n) P(n-1),
  k(n) = (λ^{-1} P(n-1) u(n)) / (1 + λ^{-1} u^H(n) P(n-1) u(n)).

Riccati Equation of the RLS Algorithm

• The gain vector can be updated via the Riccati equation of the RLS algorithm (compare to the Kalman filter):
  k(n) = (λ^{-1} P(n-1) u(n)) / (1 + λ^{-1} u^H(n) P(n-1) u(n))
  ⇔ k(n) = [λ^{-1} P(n-1) − λ^{-1} k(n) u^H(n) P(n-1)] u(n) = P(n) u(n) = Φ^{-1}(n) u(n).
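A small numerical sketch (illustrative assumptions: random data, λ = 0.98, matching initializations P(0) = δI and Φ(0) = δ^{-1}I) checking that the recursively updated P(n) tracks Φ^{-1}(n) and that the gain vector satisfies k(n) = P(n) u(n), in Python/NumPy:

import numpy as np

rng = np.random.default_rng(5)
M, n_iter, lam, delta = 3, 200, 0.98, 1e3
P = delta * np.eye(M)                        # P(0) = δI
Phi = np.eye(M) / delta                      # matching Φ(0) = δ^{-1} I

for _ in range(n_iter):
    u = rng.standard_normal(M)
    k = (P @ u) / (lam + u.conj() @ P @ u)        # k(n) = λ^{-1}P(n-1)u / (1 + λ^{-1}u^H P(n-1)u)
    P = (P - np.outer(k, u.conj() @ P)) / lam     # P(n) = λ^{-1}[P(n-1) - k(n)u^H(n)P(n-1)]
    Phi = lam * Phi + np.outer(u, u.conj())       # direct update of Φ(n)

print(np.allclose(P, np.linalg.inv(Phi)))    # P(n) = Φ^{-1}(n) (matching initializations)
print(np.allclose(k, P @ u))                 # k(n) = P(n) u(n) = Φ^{-1}(n) u(n)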
Time Update of the Tap-Weight Vector

• The tap-weight vector update:
  ŵ(n) = Φ^{-1}(n) z(n) = P(n) z(n) = P(n) [λ z(n-1) + u(n) d^*(n)]
       = λ P(n) z(n-1) + P(n) u(n) d^*(n)
       = λ [λ^{-1} P(n-1) − λ^{-1} k(n) u^H(n) P(n-1)] z(n-1) + P(n) u(n) d^*(n)
       = P(n-1) z(n-1) − k(n) u^H(n) P(n-1) z(n-1) + P(n) u(n) d^*(n)
       = ŵ(n-1) − k(n) u^H(n) ŵ(n-1) + k(n) d^*(n)
       = ŵ(n-1) + k(n) [d^*(n) − u^H(n) ŵ(n-1)]
       = ŵ(n-1) + k(n) ξ^*(n),
  where the a priori estimation error differs from the a posteriori estimation error:
  ξ(n) = d(n) − ŵ^H(n-1) u(n)  ≠  e(n) = d(n) − ŵ^H(n) u(n).

RLS Algorithm Illustrations

[Figures: signal-flow graph representations of the RLS algorithm.]

Summary of the RLS Algorithm

  k(n) = (λ^{-1} P(n-1) u(n)) / (1 + λ^{-1} u^H(n) P(n-1) u(n))
  ξ(n) = d(n) − ŵ^H(n-1) u(n)
  ŵ(n) = ŵ(n-1) + k(n) ξ^*(n)
  P(n) = λ^{-1} P(n-1) − λ^{-1} k(n) u^H(n) P(n-1)

• Typical simple initializations: P(0) = δI, ŵ(0) = 0.

Weighted Error Squares

• Recall that E_min(n) = E_d(n) − z^H(n) ŵ(n),
  where E_d(n) = \sum_{i=1}^{n} λ^{n-i} |d(i)|^2 = λ E_d(n-1) + |d(n)|^2.
  ⇒ E_min(n) = λ E_d(n-1) + |d(n)|^2 − [λ z^H(n-1) + d(n) u^H(n)] [ŵ(n-1) + k(n) ξ^*(n)]
             = λ [E_d(n-1) − z^H(n-1) ŵ(n-1)] + d(n) [d^*(n) − u^H(n) ŵ(n-1)] − z^H(n) k(n) ξ^*(n)
               ⋮
    E_min(n) = λ E_min(n-1) + ξ^*(n) e(n).
• Note that ξ^*(n) e(n) = ξ(n) e^*(n), i.e., the product is real valued.
⇒ Conversion factor: γ(n) = e(n) / ξ(n) = 1 − k^H(n) u(n).
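Putting the recursions above together, a compact Python/NumPy sketch of the exponentially weighted RLS algorithm might look as follows. The function name, parameter names, and default values are illustrative assumptions; the initializations follow the slide (P(0) = δI, ŵ(0) = 0).

import numpy as np

def rls(u, d, M, lam=0.99, delta=100.0):
    """Exponentially weighted RLS for an M-tap transversal filter.

    u, d : input and desired-response sequences (1-D arrays of equal length).
    Returns the final tap-weight vector and the a priori error sequence ξ(n).
    """
    P = delta * np.eye(M)                       # P(0) = δI
    w = np.zeros(M, dtype=complex)              # ŵ(0) = 0
    xi = np.zeros(len(u), dtype=complex)
    u_vec = np.zeros(M, dtype=complex)          # regressor [u(n), u(n-1), ..., u(n-M+1)]
    for n in range(len(u)):
        u_vec = np.concatenate(([u[n]], u_vec[:-1]))
        k = (P @ u_vec) / (lam + u_vec.conj() @ P @ u_vec)   # gain vector k(n)
        xi[n] = d[n] - w.conj() @ u_vec                      # a priori error ξ(n)
        w = w + k * np.conj(xi[n])                           # ŵ(n) = ŵ(n-1) + k(n)ξ*(n)
        P = (P - np.outer(k, u_vec.conj() @ P)) / lam        # Riccati update of P(n)
    return w, xi

For example, with u a training sequence observed at the output of an unknown channel and d the corresponding desired response, the returned ŵ approaches the solution of the exponentially weighted normal equations.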
Convergence Analysis

• The convergence analysis here is rigorous.
  – The direct averaging method (as in the case of the LMS algorithm) is not used.
• The multiple linear regression model is applied:
  d(n) = e_o(n) + w_o^H u(n).
  – Regression parameter: w_o.
  – Measurement error: e_o(n).
• The analysis is carried out for λ = 1, i.e., β(n, i) = λ^{n-i} = 1.

Mean Value

• Similarly to the unbiasedness of the LS estimator, the RLS algorithm is convergent in the mean value (illustrated numerically in the sketch below):
  E[ŵ(n)] = w_o,  n ≥ M.
  – Proof:
    z(n) = \sum_{i=1}^{n} u(i) d^*(i) = \sum_{i=1}^{n} u(i) [e_o(i) + w_o^H u(i)]^* = \sum_{i=1}^{n} u(i) u^H(i) w_o + \sum_{i=1}^{n} u(i) e_o^*(i)
         = Φ(n) w_o + \sum_{i=1}^{n} u(i) e_o^*(i)
    ⇒ ŵ(n) = Φ^{-1}(n) z(n) = w_o + Φ^{-1}(n) \sum_{i=1}^{n} u(i) e_o^*(i).
  – The claim follows from the above by noting that the expectation of the latter term is zero.
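A quick Monte Carlo illustration of convergence in the mean (an illustrative experiment under the multiple linear regression model above, with λ = 1; the batch LS solution is used, which the RLS recursion with λ = 1 reproduces up to the effect of the initialization P(0) = δI), in Python/NumPy:

import numpy as np

rng = np.random.default_rng(6)
M, n, trials = 4, 50, 2000
w_o = rng.standard_normal(M)            # regression parameter of the underlying model

w_hat_sum = np.zeros(M)
for _ in range(trials):
    U = rng.standard_normal((n, M))     # rows are the regressor vectors
    d = U @ w_o + 0.5 * rng.standard_normal(n)     # d(i) = w_o^H u(i) + e_o(i)
    w_hat, *_ = np.linalg.lstsq(U, d, rcond=None)  # LS / RLS solution with λ = 1
    w_hat_sum += w_hat

print(np.round(w_hat_sum / trials - w_o, 3))   # ≈ 0: E[ŵ(n)] = w_o for n ≥ M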

Mean-Squared Tap-Weight Error

• Two independence assumptions: the input vectors u(1), u(2), …, u(n) are IID and jointly Gaussian.
⇒ The covariance matrix K(n) = E[ε(n) ε^H(n)] of the tap-weight error vector ε(n) = ŵ(n) − w_o is
  K(n) = σ^2 E[Φ^{-1}(n)] = σ^2/(n − M − 1) R^{-1},  n > M + 1,
  ⇒ E[ε^H(n) ε(n)] = tr[K(n)] = σ^2/(n − M − 1) \sum_{i=1}^{M} 1/λ_i,  n > M + 1,
  where λ_i are the eigenvalues of R.
  – Proof: see [1, pp. 576–578].
⇒ Consequences:
  1. The MSE is magnified by 1/λ_min ⇒ ill-conditioned correlation matrices cause problems.
  2. The MSE decreases with time, inversely proportional to n − M − 1.

Learning Curve ─ The Output MSE

• Two kinds of filter output MSE measures:
  – a priori estimation error ξ(n)
    • Large value (the MSE of d(1)) at time n = 1, then decays.
  – a posteriori estimation error e(n)
    • Small value at time n = 1, then rises.
⇒ The a priori estimation error ξ(n) is more descriptive:
  J'(n) = E[|ξ(n)|^2] = σ^2 + tr[R K(n-1)] = σ^2 + M σ^2/(n − M − 1),  n > M + 1.
  – Proof: see [1, pp. 578–579].
Learning Curve ─ The Output MSE: Consequences

1. The learning curve converges in about 2M iterations
   ⇒ about an order of magnitude faster than the LMS algorithm.
2. As the number of iterations approaches infinity, the MSE approaches the variance σ^2 of the optimum measurement error e_o(n)
   ⇒ zero excess MSE in WSS environments.
3. The MSE convergence is independent of the eigenvalue spread of the input data correlation matrix.
⇒ Remarkable convergence improvements over the LMS algorithm at the price of increased complexity.

Application Example ─ Equalization

• Transmitted signal: a random sequence of ±1's.
• Channel impulse response (see the simulation sketch below):
  h_n = (1/2) [1 + cos(2π/W (n − 2))],  n = 1, 2, 3,
  h_n = 0 otherwise.
• An 11-tap FIR equalizer is used.
• Two SNR values:
  – SNR = 30 dB
  – SNR = 10 dB.

[Figure: channel impulse response.]
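A sketch of the corresponding simulation set-up in Python/NumPy. The parameter choices here (W = 3.1, decision delay 7, δ = 250, training length 500) and the use of the rls() routine sketched after the RLS summary above are illustrative assumptions, not specified by the course material:

import numpy as np

rng = np.random.default_rng(7)
W, M, n_train, snr_db = 3.1, 11, 500, 30       # channel parameter, taps, samples, SNR

# Channel: h_n = 0.5*(1 + cos(2*pi/W*(n-2))), n = 1, 2, 3
h = 0.5 * (1 + np.cos(2 * np.pi / W * (np.arange(1, 4) - 2)))

a = rng.choice([-1.0, 1.0], size=n_train)       # random ±1 transmitted sequence
noise_var = np.sum(h ** 2) * 10 ** (-snr_db / 10)
u = np.convolve(a, h)[:n_train] + np.sqrt(noise_var) * rng.standard_normal(n_train)

delay = 7                                       # decision delay (channel + equalizer)
d = np.concatenate((np.zeros(delay), a[:-delay]))   # desired response: delayed symbols

w_eq, xi = rls(u, d, M, lam=1.0, delta=250.0)   # rls() as sketched earlier
print(np.mean(np.abs(xi[-100:]) ** 2))          # steady-state a priori MSE estimate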

Example: Impact of Eigenvalue Spread at High SNR = 30 dB

• Convergence in about 20 (≈ 2M) iterations.
• Relatively insensitive to the eigenvalue spread χ(R) ≈ 6, 11, 21, 46.
• Clearly faster convergence and smaller steady-state error than those of the LMS algorithm.

[Figure: RLS learning curves for eigenvalue spreads χ(R) ≈ 6, 11, 21, 46.]

Example: RLS and LMS Algorithm Comparison at Low SNR = 10 dB

• The RLS algorithm has clearly faster convergence and smaller steady-state error than those of the LMS algorithm, with less oscillation.

[Figure: RLS and LMS learning curves at SNR = 10 dB (χ(R) ≈ 11).]
Relation to Kalman Filter

• The RLS algorithm has many similarities to Kalman filtering, but also some differences.
  – RLS: derivation based on a deterministic mathematical model.
  – Kalman: derivation based on a stochastic mathematical model.
⇒ A unified approach is based on stochastic state-space models.
• The Kalman filtering approaches in the literature are then readily available for RLS algorithms.

Relations of RLS Algorithm and Kalman Filter Variables

[Table: correspondence between the RLS algorithm variables and the Kalman filter variables.]

Summary

• The RLS algorithm was derived as a natural application of the method of least squares to the linear filter adaptation problem.
  – Based on the matrix inversion lemma.
• Difference to the LMS algorithm: the step-size parameter µ is replaced by P(n) = Φ^{-1}(n).
⇒ The rate of convergence of the RLS algorithm is typically
  1. an order of magnitude better than that of the LMS algorithm, and
  2. invariant to the eigenvalue spread;
  3. the excess MSE converges to zero.
• The case λ ≠ 1 is considered later ⇒ a change in the last property.
