
Optimal filtering and adaptive filters

Sverre Holm
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters 5-18

2. Optimal (Wiener) filtering: the stationary case 19-33


3. Adaptive filtering with the Least-Mean-
Squares algorithm 35-53

4. Convergence properties of LMS 54-63

5. Variants and alternatives to LMS 64-87

04.04.2022 2
IN3190

Literature
Diniz, Adaptive Filtering Algorithms and Practical Implementation 5th Ed
https://link.springer.com/book/10.1007/978-3-030-29057-3
– Ch 2.4 Wiener filter (p. 26-30)
– Ch 2.6-9 MSE Surface, Steepest Descent (p. 35-41)
– Ch 2.10 Applications revisited (p. 41-52)
– Ch 3.1-3.4, 3.6-3.7 (p. 61-74, 75-97)
– Ch 4.4 The Normalized LMS (p. 114-115)

IN3190: Rao & Swamy, Digital Signal Processing, Theory and Practice
https://link-springer-com.ezproxy.uio.no/book/10.1007/978-981-10-8081-4, Chapter 11: Adaptive Digital Filters
– 11.1-11.4 FIR Adaptive Filters, LMS, NLMS. [RLS]
– 11.5 Comparison, Convergence of LMS: result given but not derived
– 11.6 Applications: ECG, Echo Cancellation

04.04.2022 3
IN3190

Literature
Monson H. Hayes: Statistical Digital Signal Processing and Modeling
• Chapter 7: Optimum Filters
– 7.1 Introduction (from p. 335)
– 7.2 The FIR Wiener Filter (to p. 353)
• Mainly 7.2.1 Filtering & 7.2.3 Noise Cancellation. Little emphasis on 7.2.2 & 7.2.4

• Chapter 9: Adaptive Filtering


– 9.1 Introduction (from p. 493)
– 9.2 FIR Adaptive Filters (to p. 534)
– [9.4 Recursive Least Squares (p. 541-552)]

04.04.2022 4
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters
2. Optimal (Wiener) filtering: stationary case
3. Adaptive filtering with the Least-Mean-
Squares algorithm
4. Convergence properties of LMS
5. Variants and alternatives to LMS

04.04.2022 5
Electrocardiogram (ECG)

• http://www.atenmedicalart.com/anim/heart.htm
Images: Wikipedia

04.04.2022 6
Noise filtering of the ECG signal

• Matlab: Aase.m
• Ottar Aase,
• Cardiology, Ullevål University Hospital

04.04.2022 7
Time and frequency analysis

04.04.2022 8
ECG: Filter for noise reduction

04.04.2022 9
ECG: Before and after filtering

04.04.2022 10
Optimal vs conventional filtering

• Ex: ECG with 50 Hz noise

• Optimal filtering
– Signal and noise overlap
• Wiener filter
– Filter requirements and noise
spectrum may vary with time
• Adaptive Wiener filter
• Conventional filtering
– Analog or digital
• Usually a digital filter
04.04.2022 11
Filtering
• Conventional filter (FIR, IIR)
– Separate frequency bands
• Optimum filter (Wiener filter)
– Minimum error relative to reference
– Adaptive filter = Time-varying Wiener filter
• Matched filter
– Best signal to noise ratio
– Sonar, radar: correlate with the time-reversed transmitted waveform
04.04.2022 12
Adaptive filter

• Digital filter with adjustable coefficients


– Usually FIR
• Adaptive algorithm that adjusts the coefficients
– Least Mean Squares: LMS and variants of it (normalized LMS, leaky
LMS)
– Recursive Least Squares: RLS
• Two inputs
– Often the reference signal can be found from the primary signal (see
examples)

04.04.2022 13
Examples – adaptive filtering
• Removal of a tone from a broadband signal
– Unwanted carrier on a shortwave link:
“HF SSB automatic notch filter example #1”
• Removal of broadband noise from a broadband signal
– Atmospheric noise that disturbs the shortwave link:
”HF SSB noise reduction example #1”
– VHF FM link close to the lower SNR limit of usability:
”2 meter FM noise reduction example #1”
– VHF FM weather service close to SNR threshold:
”VHF FM (NOAA weather) noise reduction example #1”

• Source: http://www.w9gr.com/ (Sound Clips) and


https://www-int.uio.no/studier/emner/matnat/ifi/IN5340/v22/login-protected-info/soundclips/index.html
• Project: implement these filters and find the «best» parameters
04.04.2022 14
Echo cancellation in telephone
• 2-wire link for telephone
– Full duplex
– Sent and received sound
on same pair of wires
• Own signal ’leaks’ back
e.g. due to mismatch
• Mobile phone:
– Received signal in
speaker leaks back to
sender via own
microphone
– Especially if speaker
phone

04.04.2022 15
Echoes: signal processing and
psychoacoustics
• 0.1-10 ms: Comb filter (D = 0.25 ms)
– Adds at multiples of 1/0.25 ms = 4 kHz (360°)
– Subtracts at 2 kHz (180°)
– Flanging: time-varying comb filter
• 10-50 ms: reverberation – creates ’volume’
• > 50 ms: Discrete echoes
• 138 ms: radio signal around the world (shortwave):
– LA3ZA: Three paths from Japan to Norway along the grayline
• ~200 ms: unpleasant, causes stuttering
– Kurihara, T., "A System Utilizing Artificial Speech Disturbance with Delayed Auditory Feedback", arXiv, 2012

• 480 ms:
– Geostationary satellite: 4·36000 km / (3·10⁵ km/s) = 480 ms
04.04.2022 16
Echo cancellation in telephone
• One adaptive filter, AF, on each side
• An estimate of the echo, ŷ(k), is made from the incoming signal, x(k), and then subtracted
• The output, e(k), is minimized

[Figure: echo canceller block diagram with input and reference signals]
04.04.2022 17
Speaker phone (~any mobile phone)
• Acoustic coupling between
loudspeaker and
microphone
• May use PTT – Push to
Talk, but that does not
give full duplex
• Typically 512-tap filters
– A fast algorithm is important
• Tele conference reference
– Larger distance microphone -
loudspeaker
– Must avoid positive feedback
(howling)
– 250-1000 taps dependent on
the size of the room, fast input
convergence

04.04.2022 18
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters
2. Optimal (Wiener) filtering: stationary case
3. Adaptive filtering with the Least-Mean-
Squares algorithm
4. Convergence properties of LMS
5. Variants and alternatives to LMS

04.04.2022 19
Wiener filter

x(n)=d(n)+v(n)

• Adaptive filter is an approximation to a Wiener filter

• Norbert Wiener (1894-1964):


– Pioneer in the study of stochastic and noise processes, contributions in electronic
engineering, communication, and control systems.
– Wiener was also central in cybernetics, a field that formalizes the notion of feedback
and has implications for engineering, systems control, computer science, biology,
philosophy, and the organization of society (Wikipedia)
– The Wiener–Khintchine theorem: the power spectral density of a wide-sense-
stationary random process is the Fourier transform of the corresponding
autocorrelation function.

04.04.2022 20
Wiener filtering
• Clean up desired signal, d(n), contaminated by
noise, v(n): x(n)=d(n)+v(n)
• Wiener filter, W(z):
– In: x(n)
– Out: d̂(n)
– Criterion: min E[|e(n)|²], where e(n) = d(n) − d̂(n)
• Related to Kalman filter:
– Wiener filter: LTI filter for stationary processes
– Kalman filter: time-varying, also applicable to nonstationary
processes
• Ch. 7.1-7.2.1 in Hayes: Statistical digital signal
processing and modeling

04.04.2022 21
Wiener filter
• p’th order FIR filter: W(z) = Σ_{k=0}^{p−1} w(k) z^(−k)

• Output: d̂(n) = Σ_{k=0}^{p−1} w(k) x(n−k)

• Minimize: ξ = E{|e(n)|²} = E{|d(n) − d̂(n)|²}

– Similar to “least squares (direct) method” in Signal Modeling, but


simplified because an MA model (FIR filter) is assumed

04.04.2022 22
Theorem 2.3.10 – Optimization Theory

If f(z,z*) is a real-valued function of z and z* and


if f(z,z*) is analytic (smooth, infinitely
differentiable) with respect to both z and z*,
then the stationary point of f(z,z*) may be
found by setting the derivative of f(z,z*) with
respect to either z or z* equal to zero and
solving for z.

04.04.2022 23
Orthogonality principle
• Set the derivative to 0 (e is a function of w, e* of w*):
∂ξ/∂w*(k) = E{e(n) ∂e*(n)/∂w*(k)} = 0
• The error is:
e(n) = d(n) − Σ_{l=0}^{p−1} w(l) x(n−l)
• The derivative is:
∂e*(n)/∂w*(k) = −x*(n−k), which gives E{e(n)x*(n−k)} = 0, k = 0, …, p−1
• Orthogonality principle or projection theorem


04.04.2022 24
Wiener-Hopf
• Important result: the prediction error is orthogonal to the observed samples, E{e(n)x*(n−k)} = 0
[Figure: the estimate ŷ is the projection of the desired y onto the plane spanned by the observations, x]
• Inserting for the prediction error:
Σ_{l=0}^{p−1} w(l) E{x(n−l)x*(n−k)} = E{d(n)x*(n−k)}
• x(n), d(n): jointly WSS (wide sense stationary):
Σ_{l=0}^{p−1} w(l) r_x(k−l) = r_dx(k)

Figure: http://en.wikipedia.org/wiki/Least-squares_estimation_of_linear_regression_coefficients
04.04.2022 25
04.04.2022 26
Wiener-Hopf equations
• p linear equations in p unknowns, w(k):
Σ_{l=0}^{p−1} w(l) r_x(k−l) = r_dx(k), k = 0, 1, …, p−1

• In matrix form: R_x w = r_dx

• R_x is a p×p Hermitian Toeplitz matrix
• w is the vector of filter coefficients
• r_dx is the vector of cross-correlations
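
A minimal Matlab sketch of solving these equations numerically (illustrative values; in practice the correlations would be estimated from data, e.g. with xcorr):

    rx  = [1.0 0.5 0.25];    % hypothetical autocorrelation r_x(0..2)
    rdx = [0.9 0.45 0.2];    % hypothetical cross-correlation r_dx(0..2)
    Rx  = toeplitz(rx);      % p x p Hermitian Toeplitz matrix (p = 3)
    w   = Rx \ rdx(:);       % FIR Wiener filter coefficients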
04.04.2022 27
Prediction error
• ξ_min = E{|e(n)|²} = E{e(n) e*(n)}, with e*(n) = d*(n) − Σ_l w*(l) x*(n−l)

• But due to the orthogonality principle, the sum disappears:
ξ_min = E{e(n) d*(n)}

• or: ξ_min = r_d(0) − Σ_{l=0}^{p−1} w(l) r*_dx(l)

04.04.2022 28
04.04.2022 29
Wiener filter – frequency domain
• Wiener-Hopf: w(k) ∗ r_x(k) = r_dx(k)

⇒ W(ω) P_x(ω) = P_dx(ω)

• Wiener filter: W(ω) = P_dx(ω) / P_x(ω)

04.04.2022 30
Wiener filter – frequency domain
• Additive uncorrelated noise: x(n) = d(n) + v(n)
– P_x(ω) = P_d(ω) + P_v(ω)
– P_dx(ω) = P_d(ω): uncorrelated signal, d, and noise, v

• Wiener filter:
– W(ω) = P_dx(ω) / P_x(ω) = P_d(ω) / [P_d(ω) + P_v(ω)]
• Interpretation:
– High SNR: P_v(ω) → 0 ⇒ W(ω) → 1
– Low SNR: P_v(ω) >> P_d(ω) ⇒ W(ω) → 0
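
A minimal Matlab sketch of this interpretation, with hypothetical spectra (added for illustration):

    f  = linspace(0, 0.5, 256);    % normalized frequency axis
    Pd = 1 ./ (1 + (f/0.05).^2);   % hypothetical low-pass signal spectrum
    Pv = 0.1*ones(size(f));        % hypothetical flat noise spectrum
    W  = Pd ./ (Pd + Pv);          % -> 1 at high SNR, -> 0 at low SNR
    plot(f, W)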
04.04.2022 31
AR(1) process in noise

[Two panels: SNR = 0 dB and SNR = −10 dB]

Ex 7.2.1 in Hayes, matlab code: AR1.m


04.04.2022 32
Other scenarios than filtering
• Filtering: Classical problem: estimate d(n) with a causal filter,
i.e. from current and past values of x(n)
• Smoothing: Like filtering, but W(z) can be noncausal
• Prediction: d(n)=x(n+1) and W(z) is causal: linear predictor
which provides optimal estimate of x(n+1) based on past values
of x(n)
• Deconvolution: Let x(n) = d(n) ∗ g(n) + v(n), where g(n) is an LTI filter. Then W(z) is the optimal deconvolution filter

04.04.2022 33
Different from Matched filter
• Transmit s, receive x=s+v (radar, …)
• Optimal linear filter for max SNR in additive noise:
– Convolve received signal with a conjugated time-reversed template
– y = h ∗ x, where h(t) = s*(T − t)
• SNR increases by TB (time-bandwidth product)
• For radar & sonar which may be peak power limited
• Called ’pulse compression’ in previous exercises
– Linear Frequency Modulation in Exercise II
• Limitation: blind zone of length T (limits use in medical
ultrasound)
• In contrast, the Wiener filter minimizes the mean-square error between the desired and the estimated signal
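
A Matlab sketch of matched filtering with a hypothetical LFM pulse (illustrative parameters, not the exercise waveform):

    N = 128; n = 0:N-1;
    s = exp(1j*pi*0.25*n.^2/N);                        % hypothetical LFM pulse
    x = [zeros(1,50) s zeros(1,50)];                   % received: delayed pulse
    x = x + 0.5*(randn(size(x)) + 1j*randn(size(x)));  % additive noise
    h = conj(flip(s));                                 % conjugated, time-reversed template
    y = conv(x, h);                                    % output peaks near the pulse location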
04.04.2022 34
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters
2. Optimal (Wiener) filtering: stationary case
3. Adaptive filtering with the Least-Mean-
Squares algorithm
4. Convergence properties of LMS
5. Variants and alternatives to LMS

04.04.2022 35
1975: Adaptive noise cancelling

• Widrow, et al
"Adaptive noise
cancelling:
Principles and
applications,“
Proc. IEEE, pp.
1692-1716, Dec.
1975
• 2020: >5000 citations
[Photo: my project at NTNU, 1977: a 2-tap analog LMS filter built with 4× MC1494 four-quadrant multipliers and MC1456/715 HC op amps]

04.04.2022 36
Time-varying signal example: Speech
• Short-Time Fourier Transform: frame length 32 ms, time shift 1 ms
• Frequency resolution: 1/32 ms = 31.2 Hz
[Spectrogram: frequency 0-8000 Hz vs. time 0-0.7 s]

04.04.2022 37
LMS algorithm: Hayes 9.1 – 9.2

• Wiener-Hopf equations: R_x w = r_dx
• Nonstationary signal ⇒
w(k) → w_n(k): time-varying filter
• Relax the requirement that w minimize the m.s. error at each time instant
• Instead let: w_{n+1} = w_n + Δw_n

04.04.2022 38
Desirable properties
1. When signals are stationary, the solution
should converge to the Wiener filter:

2. It should be possible to find the weight correction term, Δw_n, without knowing the signal statistics, i.e. directly from the samples

3. For nonstationary signals, the filter should


be able to adapt and track the changing
statistics
04.04.2022 39
Steepest Descent Adaptive Filter
• Quadratic surface: =E{|e(n)|2}
• The correction wn involves
taking a step of size  in the
direction of steepest descent
• Must find the gradient of the
error, (n), which points in
the direction of steepest
ascent
• Thus
wn+1=wn+wn= wn- (n)
• Step size:
– Small : slow adaptation
– Larger : faster
– Too large : unstable

04.04.2022 40
Steepest Descent Algorithm
1. Initialize with an initial estimate w0
2. Evaluate the gradient of (n) for wn
3. Update the weight at time n by taking a step
of size : wn+1 = wn- (n)
4. Increment n, go back to 2) and repeat

04.04.2022 41
Gradient
• Assuming that w is complex, the gradient is the derivative of E{|e(n)|²} with respect to w*
• Gradient vector:
∇ξ(n) = ∇E{|e(n)|²} = E{∇|e(n)|²} = E{e(n) ∇e*(n)}

• But e(n) = d(n) − w^T x(n)

• The gradient: ∇e*(n) = −x*(n) ⇒ ∇ξ(n) = −E{e(n)x*(n)}

• w_{n+1} = w_n − μ∇ξ(n) = w_n + μ E{e(n)x*(n)}
04.04.2022 42
Steepest descent – LMS algorithm
• wn+1 = wn +  E{e(n)x*(n)}
• Simplest form of correlation = one-sample estimate
• This is the least-mean-square (LMS) algorithm:
wn+1 = wn +  e(n)x*(n)

• The update requires only one multiplication and one addition


per coefficient
• p+1 coeffs filter:
– Update: p+1 mults, p+1 adds
– One add for e(n)=d(n)-y(n)
– One mult for e(n)
– Filtering: p+1 mults, p adds
– Total 2p+3 mults, 2p+2 adds
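
A minimal Matlab sketch of the LMS loop (the input x, the desired signal d, and the step size mu are assumed given):

    p  = 31;  mu = 0.01;             % illustrative order and step size
    w  = zeros(p+1,1);               % p+1 filter coefficients
    xv = zeros(p+1,1);               % input vector x(n), ..., x(n-p)
    e  = zeros(size(x));
    for n = 1:length(x)
        xv   = [x(n); xv(1:p)];      % shift in the newest sample
        y    = w.'*xv;               % filter output: p+1 mults, p adds
        e(n) = d(n) - y;             % error: one add
        w    = w + mu*e(n)*conj(xv); % LMS update: p+1 mults, p+1 adds
    end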

04.04.2022 43
Gradient algorithm and stationarity
• Looks very empirical
• But for a stationary signal, the gradient is:
– E{e(n)x*(n)} = E{d(n)x*(n)} − E{(w_n^T x(n))x*(n)} = r_dx − R_x w_n
• The steepest descent algorithm then becomes:
w_{n+1} = w_n + μ(r_dx − R_x w_n)
• Stationarity, Wiener-Hopf satisfied: w_n = R_x⁻¹ r_dx
• The correction term is then 0 and w_{n+1} = w_n for all n

04.04.2022 44
Knowing the desired signal, d(n)
• Sometimes hard
• System identification:
– The plant produces d(n)
– Estimate W = a model of
the plant
• Noise cancellation:
– d(n) may be a narrowband
stochastic process
– v(n) may be a wideband
stochastic process with an
autocorrelation that is ≈ 0
for lags > k0
– May use delay >k0 to get a
signal that does the same
job as the desired signal in
the equations
– Achieve decorrelation
04.04.2022 45
Cancellation of periodic noise
• No reference: found from signal
• The M-step predictor structure can
also be seen as a ‘correlation
canceller’. This is because the
output of the prediction filter
consists of the portion of the signal
that is strongly autocorrelated at
delay lag M or greater and this
output is removed from the
received signal.
• By adjusting the lag M, we can
determine what components are
removed and what are preserved.
• Communications on short wave
– Autonotcher: removes a sinusoid of unknown frequency: 24-128 taps
– Typ. M₁ = 63 samples @ fs = 16 kHz ≈ 3.9 ms
• http://www.tapr.org/
– “HF SSB automatic notch filter example #1”
[Figure: predictor structure; the output is the estimate of the speech signal, and d̂ is the estimate of the periodic signal. Note that the output is different from the usual one!]
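
A Matlab sketch of this structure (an adaptive line enhancer) under an assumed input x; parameters are illustrative:

    M = 63; p = 63; mu = 0.005;         % delay, order, step size (illustrative)
    w  = zeros(p+1,1); xv = zeros(p+1,1);
    dh = zeros(size(x)); e = zeros(size(x));
    for n = M+1:length(x)
        xv    = [x(n-M); xv(1:p)];      % delayed input is the filter input
        dh(n) = w.'*xv;                 % estimate of the periodic signal, d^
        e(n)  = x(n) - dh(n);           % output: estimate of the speech signal
        w     = w + mu*e(n)*conj(xv);   % LMS update
    end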

04.04.2022 46
Automatic notch filter
• First a 1 kHz signal is
fed to the notcher and
then the frequency is
changed to 2 kHz.
• The LMS filter adapts
in ~40 ms to the new
signal.
• http://www.tapr.org/

• (Leaky LMS)

04.04.2022 47
Removal of broadband noise
[Figure: denoiser structure; estimate of noise, and estimate of the speech signal, d̂]
• No reference: found from the signal
• Uncorrelated component is
suppressed
• Correlated signal is kept
– E.g. speech signal
• Cleaning up recordings made in
secrecy
– Forensic signal processing: Recovering speech
that is masked by distortion, bad recording
techniques, background sounds, or other voices.
• Communication over short wave
– Typical parameters:
– Denoiser: 24-128 taps
– M2=1 @ fs=16 kHz  62 s
– The delay is short because we want to reject
those signal components that do not correlate,
i.e. are pure white noise, and pass components
that have small correlation, i.e. speech signal.
– http://www.tapr.org/
• ”HF SSB noise reduction
example #1”

04.04.2022 48
Forensic … https://www.budstikka.no/nyheter/nyheter/avfyrte-
skudd-mot-asker-bolig/s/2-2.310-1.3469135
04.04.2022 49
”Correlation” filter
[Figure: |r_xx| vs. lag, with regions: ~white noise near lag 0, speech at intermediate lags, periodic signal at large lags. Previous examples: M₂ = 62 µs, M₁ = 3.9 ms]

Correlation-wise one can make a reference signal by delaying by M:
r_xx(−M) = E[x(n−M)x*(n)]
= E[{d(n−M) + v(n−M)}x*(n)]
≈ E[d(n−M)x*(n)] = r_dx(−M) = r*_dx(M)
or r_xx(M) ≈ r_dx(M)
04.04.2022 50
Change in correlation length
(speech+tone)

[Spectrogram; 1 s time scale]

04.04.2022 Project of Håkon Mørk Solaas, 2015 51


Separate mother’s ECG from that of
the fetus

04.04.2022 52
Adaptive noise cancellation of
engine noise on sonar
• Cand Scient thesis, Jens
Kristian Haug, Nov. 1995
• Collaboration with FFI
(Norwegian Defense),
Horten
• Sonar’s frequency range:
500-5000 Hz also has
components from the ship’s
own noise
• 1996: the 500-1000 Hz band was simply removed as unusable
• But this info is of particular
interest since it penetrates
deepest into the bottom
• Performed measurements
on board w.r.t. placement of
reference sensor
• Best: in the front of the engine room

J. Haug, T. Knudsen and S. Holm, "Adaptive noise cancellation of self noise on sonar," in Proc. Nordic Symp. on Physical Acoustics, Ustaoset, Norway, p. 72, February 1996.
04.04.2022 53
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters
2. Optimal (Wiener) filtering: stationary case
3. Adaptive filtering with the Least-Mean-
Squares algorithm
4. Convergence properties of LMS
5. Variants and alternatives to LMS

04.04.2022 54
Convergence property (9.2.1)
• The algorithm converges for stationary processes if the step size satisfies 0 < μ < 2/λ_max, where λ_max is the largest eigenvalue of the correlation matrix R_x

• This will be shown on the next slides:
– Weight → weight error (single input)
– Decouple the weights by rotation → uncoupled update equations

04.04.2022 55
Convergence – stationarity
Rewrite the update equation (ideal algorithm, not LMS):
w_{n+1} = w_n + μ(r_dx − R_x w_n)

Weight error vector c_n = w_n − w, using r_dx = R_x w:
c_{n+1} = c_n − μ R_x c_n

This simplifies to an equation driven by c_n alone:
c_{n+1} = (I − μR_x) c_n

To simplify further, decouple the terms in c_{n+1}, i.e. diagonalize:
R_x = VΛV^H

04.04.2022 56
Convergence – stationarity, 2
• Rx = VVH is a correlation matrix: Hermitian and non-negative
definite
– All eigenvalues k ≥ 0
– V may be unitary: V VH = I
• Weight error vector, cn → 0 for stationary signals:

• Rotated weight error vector: independent terms un = VH cn:

• And starting from initial weight error:

04.04.2022 57
Convergence – stationarity, 3
• Rotated weight error: u_n = V^H(w_n − w) = V^H c_n:
u_{n+1} = (I − μΛ) u_n
• This is expressed in terms of a diagonal matrix, so all components of u_n are independent:
u_{n+1}(k) = (1 − μλ_k) u_n(k)
• Convergence to the stationary solution if the parenthesis is small enough:
|1 − μλ_k| < 1 for all k
• Final result: the step size must satisfy 0 < μ < 2/λ_max

04.04.2022 58
Convergence rate - stationarity
• Weight vector w_n = w + c_n = w + V u_n:
w_n = w + Σ_k v_k (1 − μλ_k)ⁿ u_0(k)
• A sum of p independent modes, with a time constant per mode:
(1 − μλ_k)ⁿ = e^(−n/τ_k)
• Time constant (μλ_k << 1): τ_k ≈ 1/(μλ_k)
• The slowest mode determines convergence: τ ≈ 1/(μλ_min)
04.04.2022 59
Convergence rate – stationarity, 2
• Step size: 0 <  < 2/max
• Convergence given by

• Let step-size be a fraction, 0<<1, of its max value,


 = 2/max. Then the convergence time is:

• i.e. given by the condition number of the correlation


matrix = eigenvalue spread:

04.04.2022 60
Summary – LMS algorithm
• wn+1 = wn +  E{e(n)x*(n)}

• The simplest form of the correlation is a one-sample


estimate
• The least-mean-square (LMS) algorithm:
wn+1 = wn +  e(n)x*(n)

• The update requires only one multiplication and one


addition per coefficient
• p+1 coeffs filter:
– Update: p+1 mults, p+1 adds
– One add for e(n)=d(n)-y(n)
– One mult for e(n)
– Filtering: p+1 mults, p adds
– Total 2p+3 mults, 2p+2 adds
04.04.2022 61
Convergence of LMS algorithm
• Steepest descent: 0 <  < 2/max
• Also the condition for LMS to converge in
the mean
• Bound is however of only limited value, but
gives a ballpark estimate:
1. In practice it is too large to ensure stability,
convergence in the mean may imply a large
variance of the weight vector
2. Must know Rx which one avoids to estimate in the
LMS-algorithm, but that can be overcome …
04.04.2022 62
Convergence of LMS algorithm
• Upper bound for max:

– The trace of a square matrix is the sum of the elements on the main
diagonal
– Since V is unitary, the sum of eigenvalues = trace
• Wide-sense stationary =>
Rx is Toeplitz and tr(Rx}) = (p+1)rx(0)
• Steepest descent: 0 <  < 2/max
• LMS converges in the mean if 0<< 2/[(p+1)E{|x(n)|2}]
• For exercise: Estimate max to get an idea of range of .

04.04.2022 63
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters
2. Optimal (Wiener) filtering: stationary case
3. Adaptive filtering with the Least-Mean-
Squares algorithm
4. Convergence properties of LMS
5. Variants and alternatives to LMS

04.04.2022 64
Variants
• LMS variants:
– Normalized LMS
– Leaky LMS
– Single-bit data, error, or both, for faster performance
• Recursive Least Squares

04.04.2022 65
Normalized LMS (1)
• LMS
– wn+1 = wn +  e(n)x*(n)
– Convergence for 0 <  < 2/[(p+1)E{|x(n)|2}]
• LMS may have gradient noise amplification when
x(n) is large as the step adjustment is proportional to
x(n)
• May be avoided by adjusting  depending on the
signal

04.04.2022 66
Normalized LMS (2)
• First, rewrite LMS: w_{n+1} = w_n + μ e(n)x*(n)

• LMS convergence if 0 < μ < 2/[(p+1)E{|x(n)|²}]

• Estimate E{|x(n)|²} by ‖x(n)‖²/(p+1)

• This implies μ_max ≈ 2/‖x(n)‖²
– where the numerator of the estimate, ‖x(n)‖², is the power in a p+1 samples long signal vector

• Use this estimate in exercise VI

04.04.2022 67
Normalized LMS (3)
• Normalized LMS: replace μ by
μ(n) = β μ_max/2 = β/‖x(n)‖², where 0 < β < 2

• Advantage: less gradient noise amplification when x(n) is large

• Disadvantage: NLMS may instead amplify gradient noise when x(n) is too small
• Therefore, in practice:
– μ(n) = β/[ε + ‖x(n)‖²], where ε = 0.0001 in code

• Slight increase in computations compared to LMS
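
A sketch of the normalized update as a drop-in replacement for the update line in the LMS loop sketched earlier (w, xv, mu, and e(n) as defined there):

    beta    = 0.5;                       % 0 < beta < 2
    eps_reg = 1e-4;                      % regularization, cf. epsilon above
    mu_n    = beta/(eps_reg + xv'*xv);   % ||x(n)||^2 = xv'*xv
    w       = w + mu_n*e(n)*conj(xv);    % normalized LMS update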


04.04.2022 68
Leaky LMS
• wn+1 = (1-)wn +  e(n)x*(n) where 0<<<1
• The leakage coefficient, , forces w to zero if the
error or the signal becomes zero.
• Forces any undamped modes to zero
• Drawback: for stationary input, the algorithm no
longer converges to the Wiener filter
• Explanation:
– Especially when the convergence parameter, , is large, there is a noise
build-up effect where hiss components are noticeably amplified. When the
input signal goes to zero, the coefficients of the standard LMS algorithm do
not change, and when there is a lot of noise in the signal, the coefficients
will tend to wander aimlessly and may become quite large, increasing the
unwanted noise part in the signal.
– The coefficient decay factor, (1-) controls the forgetting factor
– With this decay parameter, the LMS algorithm can slowly recover and reset
itself over a period of several seconds.
– https://www.tapr.org/
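
A sketch of the leaky update, again as a drop-in replacement in the LMS loop sketched earlier:

    gamma = 1e-3;                               % leakage, 0 < gamma << 1
    w = (1 - mu*gamma)*w + mu*e(n)*conj(xv);    % leaky LMS update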

04.04.2022 69
Simplified correlation for speeding up
the LMS algorithm
• Simplify correlation estimate,  e(n)x*(n), by using just a single
bit for one or both. Examples from Matlab filter design toolbox:
– adaptfilt.sd Sign-data LMS FIR adaptive filter algorithm
– adaptfilt.se Sign-error LMS FIR adaptive filter algorithm
– adaptfilt.ss Sign-sign LMS FIR adaptive filter algorithm
– Multiply => either sum or xor operation
– Gabriel, K. (1983). Comparison of three correlation coefficient estimators
for Gaussian stationary processes. IEEE Trans Acoust, Speech, Sign Proc.

• Block-oriented algorithm using FFT


– adaptfilt.blmsfft; FFT-based Block LMS FIR adaptive filter algorithm
– Slower convergence due to blocking (can never converge faster than block
length)
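
A sketch of the three sign-simplified updates (alternative update lines for the LMS loop sketched earlier, assuming real-valued signals):

    w = w + mu*sign(e(n))*xv;            % sign-error LMS
    w = w + mu*e(n)*sign(xv);            % sign-data LMS
    w = w + mu*sign(e(n))*sign(xv);      % sign-sign LMS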

04.04.2022 70
Another fast method for correlation
Average magnitude difference function (AMDF):
D_x(k) = (1/N) Σ_n |x(n) − x(n−k)|
– The scale factor may be found analytically for a Gaussian signal

Estimator:
– Especially well suited for peak finding

Also called Sum of Absolute Differences (SAD)


– Ross, M., Shaffer, H., Cohen, A., Freudberg, R., & Manley, H. (1974). Average magnitude difference
function pitch extractor. IEEE Trans Acoustics, Speech, and Signal Processing.
– Bohs, L. N., & Trahey, G. E. (1991). A novel method for angle independent ultrasonic imaging of
blood flow and tissue motion. IEEE Trans Biomed Eng.
– Barnea, D. I., & Silverman, H. F. (1972). A class of algorithms for fast digital image registration.
IEEE Trans Computers.
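
A minimal Matlab sketch of the AMDF over lags 0..K for a real signal x (assumed given):

    N = length(x);  K = 200;       % K: largest lag examined (illustrative)
    D = zeros(K+1,1);
    for k = 0:K
        D(k+1) = mean(abs(x(1+k:N) - x(1:N-k)));   % AMDF at lag k
    end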

04.04.2022 71
Recursive Least Squares - ch 9.4
Wiener – min mean square: (n) =E{|e(n)|2}
• Need correlation and cross-correlation
• LMS: instantaneous values instead:
E {e(n)x*(n-k)} => e(n)x*(n-k)

Alternative: use an error measure that depends directly on


available data – least squares error:

• Filter depends directly on data, not on statistics


• Philosophy behind RLS-method

04.04.2022 72
Recursive Least Squares
• Properties
– Order p² mults/adds per update (LMS: order p)
– RLS often converges faster than LMS
– Less sensitive to the eigenvalue distribution of the autocorrelation
– RLS is not as good at tracking nonstationary processes
– No step size μ to set

04.04.2022 73
Conditions for adaptive filter to work
• Coherence = normalized cross spectrum:
γ_xy(f) = P_xy(f) / √(P_x(f) P_y(f))

• Magnitude squared coherence, MSC:
0 ≤ |γ_xy(f)|² ≤ 1

• Examining the relation between two signals


• See Spectral estimation II
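
A sketch using Matlab's mscohere (Signal Processing Toolbox); x, the reference r, and the sample rate fs are assumed given:

    [Cxy, f] = mscohere(x, r, hann(1024), 512, 1024, fs);  % Welch-averaged MSC
    plot(f, Cxy), xlabel('Frequency (Hz)'), ylabel('MSC')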
04.04.2022 74
Coherence – example
(w/ 95% confidence interval)

S. Holm, "Prudence in estimating coherence between planetary, solar and climate oscillations," Astrophys. Space Sci., 2015.

04.04.2022 75
Orbit of the center of mass of the solar
system from 1980 to 2014

[Figure: orbit trace, with the years 1980 and 2010 marked]

Generated by the Horizons system of JPL; Holm, Astrophys. Space Sci., 2015
04.04.2022 76
Global temperature anomaly HadCRUT3 +
speed of center of mass of the solar system
(SCMSS)

HadCRUT3: Brohan et al. (2006); Holm, Astrophys. Space Sci., 2015
04.04.2022 77
Coherence and adaptive noise
canceller

• One can show that the suppression in an adaptive noise canceller depends on the coherence:
– Attenuation = 10 log₁₀(1 − |γ_xy(f)|²)

– Low coherence: |γ|² → 0 ⇔ attenuation → 0 dB
– High coherence: |γ|² = 0.9 ⇔ attenuation = −10 dB
– High coherence: |γ|² = 0.99 ⇔ attenuation = −20 dB

04.04.2022 78
Reasons for lack of coherence
1. Independent noise on signal and reference
2. Nonlinear signal path
3. Several signal paths

Example: suppression of aircraft noise inside the pilot's oxygen mask:
– Noise in the cockpit is ~diffuse, i.e. beyond a certain distance it is uncorrelated = #1
– Noise gets inside the mask via many paths with more or less independent noise = #3

04.04.2022 79
Coherence: pilot oxygen mask

• Darlington et al "Adaptive
noise reduction in aircraft
communication systems"
IEEE ICASSP, 1985
• Adaptive noise cancellation
only succeeds at low
frequencies

04.04.2022 80
Multiple-input noise canceller
• Could solve the situation where the
coherence is lacking
• Griffiths, L., Adaptive structures for multiple-
input noise cancelling applications, IEEE
ICASSP 1979.

04.04.2022 81
Sub-band adaptive noise canceller

• Avoids a very long


filter and instead
uses several
shorter ones
• Faster
convergence
– Kuo and Kunduru,
Subband adaptive
noise canceler for
hands-free cellular
phone applications,
1993 IEEE Workshop
on Applications of
Signal Processing to
Audio and Acoustics

04.04.2022 82
Alternative: Spectral Subtraction
• Spectral subtraction is an alternative to the LMS adaptive noise cancelling filter
– Adobe Audition
– Audacity (freeware) – Noise Removal
– Search for «How to remove noise with Audacity»
– Used for cleaning up portions where signal and
noise are overlaid
• Applications:
– Forensic signal processing,
– Broadband noise cancellation e.g. shortwave
communications
04.04.2022 83
Spectral Subtraction
• Basic spectral noise subtraction algorithm:
D(ω) = P_s(ω) − P_n(ω)

P'_s(ω) = D(ω) if D(ω) ≥ 0, and 0 otherwise

• P_s(ω) is the spectrum of the noise-corrupted input speech
• P_n(ω) is the smoothed estimate of the noise spectrum
• P'_s(ω) is the modified signal spectrum
• Assumes uncorrelated signal and noise.

04.04.2022 84
Modified Spectral Subtraction
• Modification of the basic spectral subtraction algorithm to combat the musical noise resulting from its implementation:
D(ω) = P_s(ω) − αP_n(ω)

P'_s(ω) = D(ω) if D(ω) > βP_n(ω), and βP_n(ω) otherwise

• The α parameter is the subtraction factor that reduces the spectral noise peaks by subtracting an overestimate of the noise spectrum, α ≥ 1
• β is the spectral floor parameter that prevents the resulting spectral components from going below a spectral floor, 0 ≤ β < 1
• Original method: α = 1, β = 0
• M. Berouti, R. Schwartz, and J. Makhoul (1979). “Enhancement of speech corrupted by acoustic
noise”. Proc. IEEE Int. Conf. Acoust., Speech, and Sign. Proc.
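
A Matlab sketch of the modified method on one signal frame (frame and the smoothed noise spectrum estimate Pn are assumed given as column vectors; alpha = 1, beta = 0 gives the basic method):

    alpha = 2;  beta = 0.02;                    % illustrative parameters
    X    = fft(frame .* hann(length(frame)));   % windowed frame spectrum
    Ps   = abs(X).^2;                           % noisy speech power spectrum
    D    = Ps - alpha*Pn;                       % oversubtract the noise estimate
    Pmod = max(D, beta*Pn);                     % apply the spectral floor
    Y    = sqrt(Pmod) .* exp(1j*angle(X));      % keep the noisy phase
    y    = real(ifft(Y));                       % enhanced frame (use overlap-add)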

04.04.2022 85
Modified Spectral Subtraction

[Figure: spectra P_s(ω), αP_n(ω) with α > 1, and P_n(ω); D(ω) = P_s(ω) − αP_n(ω), with the floor βP_n(ω), β < 1]

04.04.2022 86
Spectral Subtraction & Gunshot
Acoustics

• A. L. L. Ramos, S. Holm, S. Gudvangen, R. Otterlei, "A Spectral Subtraction Based Algorithm for Real-time Noise Cancellation with Application to Gunshot Acoustics," Int. J. Electron. and Telecomm., vol. 59, no. 1, 2013.

04.04.2022 87
Optimal and adaptive filters
1. Optimal vs conventional filtering and
applications of adaptive filters 5-18

2. Optimal (Wiener) filtering: the stationary case 19-33


3. Adaptive filtering with the Least-Mean-
Squares algorithm 35-53

4. Convergence properties of LMS 54-63

5. Variants and alternatives to LMS 64-87

04.04.2022 88
