What is a filter?
Consider the signal, {x(t), x[n]}, and desired modifications or operations on the signal to produce the modified signal, {y(t), y[n]}. We model the modifications/operations as a causal, stable LTI system {h(t), h[n]}, which we refer to as a filter with {x(t), x[n]} as the input and {y(t), y[n]} as the output such that:
y(t) = h(t) ∗ x(t) → Y(jω) = H(jω)X(jω)
y[n] = h[n] ∗ x[n] → Y(e^{jΩ}) = H(e^{jΩ})X(e^{jΩ})
Analysis and design of filters involve the zero-state (or steady-state) magnitude and phase responses. Filters which operate on analog signals are called analog filters. Filters which operate on discrete signals are called digital filters.
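As a concrete illustration of y[n] = h[n] ∗ x[n], the MATLAB sketch below convolves a noisy test signal with a simple 5-point moving-average impulse response; the signal and the choice of h[n] are illustrative, not taken from the text.

    % Filtering as convolution: y[n] = h[n] * x[n] (base MATLAB)
    n = 0:99;
    x = sin(2*pi*0.02*n) + 0.5*randn(1,100);  % noisy sinusoid (example input)
    h = ones(1,5)/5;                          % moving-average impulse response h[n]
    y = conv(x, h, 'same');                   % smoothed output y[n]
    H = fft(h, 256);                          % sampled H(e^{jW}) for the frequency-domain view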
Prove that |H(jω)|² = H(s)H(−s)|_{s=jω} and |H(e^{jΩ})|² = H(z)H(z⁻¹)|_{z=e^{jΩ}} by showing that H(−jω) = H*(jω) and H(e^{−jΩ}) = H*(e^{jΩ}), given that the poles and zeros are either real or form complex conjugate pairs.
Linear-phase filter
A linear-phase filter possesses constant group delay for all ω, that is:
t_g(ω) = −dφ(ω′)/dω′|_{ω′=ω} = t_g ⇒ φ(ω) = −ωt_g + C
Symmetry property of impulse response for linear-phase filters
• If the filter impulse response {h(t), h[n]} is even symmetric about the point {t₀, n₀} then the phase response of the filter is {−ωt₀, −Ωn₀} (excluding ±π “wraps”)
• If the filter impulse response {h(t), h[n]} is odd symmetric about the point {t₀, n₀} then the phase response of the filter is {−ωt₀ ± π/2, −Ωn₀ ± π/2} (excluding ±π “wraps”)
Prove the above result by considering the Fourier transform of the impulse response that is
symmetric about the origin and then the Fourier transform for a delayed version of the
impulse response.
Consider a digital filter with transfer function in the following factored form:
H(z) = K(z − z₁)(z − z₂)⋯(z − z_M) / [(z − p₁)(z − p₂)⋯(z − p_N)]
For the filter to be stable all the poles have to lie inside the unit circle in the z-plane, i.e. |pᵢ| < 1. If the filter also has all its zeros inside or on the unit circle in the z-plane, |zᵢ| ≤ 1, then it is minimum-phase: it has the smallest group delay, and smallest deviation from zero phase, at every frequency of interest, of any filter with the same magnitude response |H(e^{jΩ})|.
Digital Filters
• can only be used on discrete signals; since most real-world signals are analog, an extra analog-to-digital and digital-to-analog conversion is necessary, making all processed signals digital (i.e. quantised) rather than simply discrete.
• are more expensive than analog filters due to the need for digital hardware and software.
• are implemented directly in computer logic and/or software, and are sensitive to both data
and coefficient quantisation effects.
• are very flexible and versatile due to direct implementation of any approximation in
hardware or software and unconstrained realisation of any rational transfer function.
[Figure: filter attenuation specifications, showing passband attenuation A_p and stopband attenuation A_s]
Exercise 11.1
Answer:
H(s) = 2012.4 / (s⁵ + 14.82s⁴ + 109.8s³ + 502.6s² + 1422.3s + 2012.4)
Figure 11-3 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.3, pg. 409)
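The response in Figure 11-3 can be reproduced by evaluating the quoted H(s) directly; a minimal MATLAB sketch (base MATLAB; the frequency range is an assumption):

    a = [1, 14.82, 109.8, 502.6, 1422.3, 2012.4];
    w = linspace(0, 10, 1000);          % rad/s axis (range assumed)
    H = 2012.4 ./ polyval(a, 1j*w);     % H(jw) for the quoted H(s)
    plot(w, abs(H));                    % linear magnitude
    figure; plot(w, 20*log10(abs(H)));  % dB magnitude, cf. Figure 11-3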
Example 11.2
Question: Design a Butterworth bandstop filter with 2-dB passband edges of 30 Hz and 100
Hz and 40 dB stopband edges of 50 Hz and 60 Hz.
Answer:
H(s) = [s⁶ + 3.55(10)⁵s⁴ + 4.21(10)¹⁰s² + 1.66(10)¹⁵] / [s⁶ + 8.04(10)²s⁵ + 6.79(10)⁵s⁴ + 2.56(10)⁸s³ + 8.04(10)¹⁰s² + 1.13(10)¹³s + 1.66(10)¹⁵]
Figure 11-4: Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.3C, pg. 411)
Figure 11-5 Chebyshev low-pass prototype magnitude response ([1], Figure 13.10, pg.
413)
Example 11.3
Question: A Chebyshev I lowpass filter is to meet the following specifications: A_p ≤ 1 dB for ω ≤ 4 rad/s and A_s ≥ 20 dB for ω ≥ 8 rad/s.
Answer:
H(s) = 31.4436 / (s³ + 3.9534s² + 19.8145s + 31.4436)
Figure 11-6 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.5, pg. 419)
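A sketch re-deriving this design with MATLAB's cheb1ord/cheby1 analog-prototype options (Signal Processing Toolbox); the returned denominator should match the quoted H(s) up to round-off:

    [n, wp] = cheb1ord(4, 8, 1, 20, 's');  % minimum analog order for the A_p, A_s specs
    [b, a]  = cheby1(n, 1, wp, 's');       % 1-dB ripple analog low-pass
    w = linspace(0, 12, 1000);
    H = freqs(b, a, w);
    plot(w, 20*log10(abs(H)));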
Example 11.4
Answer:
To ensure geometric symmetry the upper stopband edge, ω_s, is decreased to 2.54.
H(s) = 0.34s⁴ / (s⁸ + 0.86s⁷ + 10.87s⁶ + 6.75s⁵ + 39.39s⁴ + 15.27s³ + 55.69s² + 9.99s + 26.25)
Figure 11-7 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.5C, pg. 421)
Example 11.5
Answer:
H(s) = (2.4121s² + 205.8317) / (s³ + 11.2431s² + 60.2942s + 205.8317)
Figure 11-9 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.6, pg. 426)
Figure 11-10 Elliptic low-pass prototype magnitude response ([1], Figure 13.14, pg. 427)
The analysis and design of the elliptic filter is detailed in pgs. 427-432 of [1] and is fairly
complex and beyond the scope of the examinable material (but these may be covered in
required laboratory or assignment work).
Bessel Filters
The analysis and design of the Bessel filter is fairly complex and beyond the scope of the
examinable material (but these may be covered in required laboratory or assignment work).
Properties of Bessel Filters
• Monotonic frequency response (no ripples in either the passband or stopband)
• Provides control over the phase response in the passband by being able to design for a specified value of the group delay, t_g.
• Can design for passband attenuation, A_p, but not stopband attenuation, A_s.
First-order low-pass
H_i(s) = Hω₀/(s + ω₀) where ω₀ is the 3-dB passband cut-off frequency and H_i(0) = H
First-order high-pass
H_i(s) = Hs/(s + ω₀) where ω₀ is the 3-dB passband cut-off frequency and H_i(∞) = H
Second-order low-pass
H_i(s) = Hω₀²/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the passband cut-off frequency, H_i(0) = H, and Q is the quality factor such that for Q < 1/√2 the passband response is monotonic, but for Q > 1/√2 the passband response exhibits overshoot near ω₀.
Second-order band-pass
H_i(s) = H·2ζω₀s/(s² + 2ζω₀s + ω₀²) = H(ω₀/Q)s/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the bandpass centre frequency, H_i(jω₀) = H, ζ is the damping factor and Q = 1/(2ζ) is the quality factor such that B = ω₂ − ω₁ = ω₀/Q where |H_i(jω₁)| = |H_i(jω₂)| = H/√2 (3-dB bandpass attenuation), and ω₁ω₂ = ω₀². Thus, since Q = ω₀/B, a higher Q implies a more peaked response (smaller bandwidth) relative to the centre frequency.
Second-order high-pass
H_i(s) = Hs²/(s² + 2ζω₀s + ω₀²) = Hs²/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the passband cut-off frequency, H_i(∞) = H, and for ζ > 0.707 (Q < 1/√2) the passband response is monotonic, but for ζ < 0.707 (Q > 1/√2) the passband response exhibits overshoot near ω₀.
Biquadratic
H_i(s) = H[s² + (ω_z/Q_z)s + ω_z²]/[s² + (ω_p/Q_p)s + ω_p²] is a general form where both numerator and denominator are second-order polynomials.
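The effect of Q on the second-order low-pass section can be visualised with a short MATLAB sketch (base MATLAB; H = 1, ω₀ = 1 and the two Q values are illustrative):

    H0 = 1; w0 = 1; w = logspace(-2, 1, 2000);
    for Q = [0.5, 2]               % Q < 1/sqrt(2): monotonic; Q > 1/sqrt(2): peaked
        den = [1, w0/Q, w0^2];
        Hw = (H0*w0^2) ./ polyval(den, 1j*w);
        semilogx(w, 20*log10(abs(Hw))); hold on;
    end
    legend('Q = 0.5', 'Q = 2');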
Features
• Due to the buffering provided by the op-amp, a complex H(s) can be realised by an appropriate cascade of standard first-order and second-order filter circuits (modular design)
• Greater control and versatility offered by the modular design approach
• The absence of bulky inductor components makes IC implementation of active RC filters possible, and they are the preferred filters for electronic devices
• The presence of the op-amp requires filter circuits to be powered and limits use to low-power applications.
Exercise 11.2 For an IIR filter to exhibit linear phase the impulse response has to exhibit symmetry, e.g. even symmetry h[n] = h[−n]. Show that this implies H(z) = H(1/z) and that for every pole |pᵢ| < 1 there will be a corresponding pole |pⱼ| > 1, and hence a stable, causal linear-phase IIR filter is not possible.
FIR filters:
• yield larger filter lengths for a given application (i.e. longer delays, more expensive)
• absence of poles implies filter will always be stable (and causal)
• transient response is of limited duration due to finite memory and nonrecursive nature of
filter
• can realise a response with exactly linear phase, thus a distortionless filter is possible
The main advantages of digital filters over analog filters arise when considering FIR filters
since these possess transient responses of limited duration, are always stable, and can be
designed for exactly linear phase. The properties of IIR filters are analogous to analog filters
whereas FIR filters are in a “class of their own”.
Furthermore, FIR filters can also be used to design more than just LP, HP, BP and BS filters.
They are also used for:
• Functional filters like differentiators, phase shifters, etc.
• Adaptive filters used for noise cancellation, etc.
0.8.1 Differentiators
Since x[n] = (1/2π)∫_{−π}^{π} X(e^{jΩ})e^{jΩn} dΩ, then we have
y[n] = dx[n]/dn = (1/2π)∫_{−π}^{π} jΩX(e^{jΩ})e^{jΩn} dΩ
and as y[n] = (1/2π)∫_{−π}^{π} Y(e^{jΩ})e^{jΩn} dΩ we can see that Y(e^{jΩ}) = jΩX(e^{jΩ}) where H(e^{jΩ}) = jΩ, and thus the ideal differentiator is described by:
H(e^{jΩ}) = jΩ, |Ω| ≤ π
Since h[n] has infinite duration, to design an FIR differentiator we must truncate h[n], by application of an appropriate window function, to:
h_N[n] = { h[n]w[n], −n₀ ≤ n ≤ n₀; 0, otherwise }
where n₀ = 0.5(N − 1). A delay of n₀ is then introduced, producing h_N[n − n₀], to ensure causality. The distortion arising from the truncation is controlled by the choice of window function, w[n], and filter length, N.
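As a sketch of this procedure (assuming the ideal differentiator impulse response h[n] = (−1)ⁿ/n for n ≠ 0 and h[0] = 0, which follows from the inverse DTFT of jΩ), the MATLAB fragment below builds a windowed differentiator; the length N and the Hamming window are illustrative choices:

    N = 31; n0 = (N-1)/2;
    n = -n0:n0;
    h = zeros(1, N);
    h(n ~= 0) = ((-1).^n(n ~= 0)) ./ n(n ~= 0);  % ideal h[n] = (-1)^n / n, h[0] = 0
    w = 0.54 + 0.46*cos(pi*n/n0);                % centred Hamming window (explicit)
    hN = h .* w;                                 % truncated/windowed h_N[n]
    % Shifting by n0 samples makes the filter causal with linear phase.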
The above difference equation for the IIR filter can be easily realised by the appropriate
combination of unit delay, summer and multiplier units to produce the output 𝑦𝑦[𝑛𝑛] (see Section
1.3.1). For simplicity, and without loss of generality, we assume 𝑀𝑀 = 𝑁𝑁 (if 𝑀𝑀 ≠ 𝑁𝑁 then we
simply make the respective co-efficients zero). There are two possible realisations as shown
by Figure 11-11:
• Direct form II is a direct implementation of the difference equation in canonical form using
only N delay elements and 2N summers
• Transposed realisation reverses the signal flow thereby producing a realisation with only
N delay elements and N+1 summers.
Figure 11-11 Direct form II (left) and Transposed (right) realisation of an IIR digital
filter ([1], Figure 18.3, pg. 639)
The difference equation for the FIR filter can be easily realised as shown in Figure 11-12.
Figure 11-12 Realisation of an FIR digital filter ([1], Figure 18.1(left), pg. 638)
Cascade realisation
𝐻𝐻(𝑧𝑧) = 𝐻𝐻1 (𝑧𝑧)𝐻𝐻2 (𝑧𝑧) … 𝐻𝐻𝑛𝑛 (𝑧𝑧)
where 𝐻𝐻𝑖𝑖 (𝑧𝑧) is usually a first-order or second-order transfer function unit and 𝐻𝐻(𝑧𝑧) is
represented as a product of such units (e.g. by grouping conjugate roots for the second-order
units and real roots for the first-order units).
Parallel realisation
𝐻𝐻(𝑧𝑧) = 𝐻𝐻1 (𝑧𝑧) + 𝐻𝐻2 (𝑧𝑧) + ⋯ + 𝐻𝐻𝑛𝑛 (𝑧𝑧)
where Hᵢ(z) is usually a first-order or second-order transfer function unit and H(z) is represented as a partial fraction expansion of such units (e.g. by grouping conjugate poles for the second-order units and real poles for the first-order units).
Example 11.6
Question: Realise the digital filter transfer function:
H(z) = (6 − 2z⁻¹) / [(1 − z⁻¹)(1 − (1/6)z⁻¹ − (1/6)z⁻²)]
directly as one structure, in cascade and in parallel.
Cascade:
Parallel:
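The cascade and parallel structure diagrams are not reproduced here, but the decomposition can be checked numerically; a sketch using the H(z) as reconstructed above (residuez is from the Signal Processing Toolbox):

    b = [6, -2];
    a = conv([1, -1], [1, -1/6, -1/6]);   % (1 - z^-1)(1 - (1/6)z^-1 - (1/6)z^-2)
    [r, p, k] = residuez(b, a);           % residues/poles of the parallel sections
    x = [1, zeros(1, 19)];                % unit impulse
    y_direct = filter(b, a, x);           % direct realisation
    y_par = zeros(size(x));
    for i = 1:numel(r)
        y_par = y_par + real(filter(r(i), [1, -p(i)], x));  % sum of 1st-order sections
    end
    max(abs(y_direct - y_par))            % ~0: the two realisations agree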
Software implementation
IIR and FIR filters can be easily implemented in software by storing the digital signal values, x[n], as an array variable and looping through the difference equation at each time instant, for example (runnable MATLAB sketch, with a = [1 A1 … AN], b = [B0 B1 … BM] and y(1:max(N,M)) initialised):

for n = max(N,M)+1 : T
    y(n) = -a(2:N+1)*y(n-1:-1:n-N).' + b*x(n:-1:n-M).';
end
MATLAB also has the following function for implementing a general IIR filter:
Y = FILTER(B,A,X) filters the data in vector X with the filter described
by vectors A and B to create the filtered data Y.
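A minimal usage sketch (the coefficient values are arbitrary examples):

    b = [0.2, 0.2]; a = [1, -0.6];   % example coefficients
    x = randn(1, 100);               % example input signal
    y = filter(b, a, x);             % y[n] = 0.2x[n] + 0.2x[n-1] + 0.6y[n-1]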
And various tools to automate the analysis, design and use of digital filters on signals:
Graphical User Interfaces
fdatool - Filter Design and Analysis Tool.
fvtool - Filter Visualization Tool.
sptool - Signal Processing Tool.
wintool - Window Design and Analysis Tool.
wvtool - Window Visualization Tool.
You can use the sptool GUI to import data signals, design filters, apply the filter to the
signal and view the spectrum of the original and filtered signal and fdatool to visualise and
design filters.
0.10 References
1. A. Ambardar, “Analog and Digital Signal Processing”, 2nd Ed., Brooks/Cole, 1999.
2. S. Haykin, B.V. Veen, “Signals and Systems”, 2nd Ed., Wiley, 2003.
3. B.P. Lathi, “Linear Systems and Signals”, 2nd Ed., Oxford University Press, 2005.
4. L.P. Huelsman, “Active and Passive Analog Filter Design”, McGraw-Hill, 1993.
Random Variable: We are given an experiment specified by the space S, a field of subsets of S called events, and the probability assigned to these events. To every outcome ζ of this experiment, we assign a number X(ζ). We have thus created a function X with domain the set S and range a set of numbers. This function is called a random variable.
Cumulative Distribution Function: The elements of the set S that are contained in the event {X ≤ x} change as the number x takes various values. The probability P{X ≤ x} of the event {X ≤ x} is, therefore, a number that depends on x. This number is denoted by F_X(x) and is called the cumulative distribution function (cdf).
Let 𝑋𝑋 be conditioned on the single event 𝐴𝐴 or multiple events {𝐴𝐴1 , 𝐴𝐴2 , … , 𝐴𝐴𝑛𝑛 }. Then:
Example 1-5: EXAMPLE 4-19 from [1], pgs. 103-105. (a priori vs a posteriori)
Expectation of g(X):
E[g(X)] = ∫_{−∞}^{∞} g(x)f_X(x) dx ≡ Σᵢ g(xᵢ)P[X = xᵢ]
The expectation provides a measure of the average or mean behaviour of g(X).
A complete list of common types of random variables and their distributions can be found in
the front inside cover of [1].
For the uniform distribution on [a, b]:
μ = (a + b)/2, σ² = (b − a)²/12
Memoryless Property:
P{X > t + s | X > s} = P{X > t + s}/P{X > s} = e^{−λ(t+s)}/e^{−λs} = e^{−λt} = P{X > t}
What does it mean? The probability that an event will arrive after time t + s, P{X > t + s | X > s}, given the event has not yet arrived by time s, {X > s}, is the same as the probability the event will arrive after time t; that is, it does not depend on s.
NOTE: We can show that F_X(x) = P{X ≤ x} = 1 − e^{−λx} and hence P{X > x} = e^{−λx}
Meaning: Let 𝑋𝑋 be an event of interest with two outcomes, 𝑋𝑋 = 1 for success, 𝑋𝑋 = 0 for
failure, then the Bernoulli distribution describes individual random failure events where 𝑝𝑝 is
the probability of success and 𝑞𝑞 = 1 − 𝑝𝑝 is the probability of failure.
Poisson Distribution(λ)
P[X = k] = (λᵏ/k!)e^{−λ}, for k = 0, 1, 2, …
μ = λ, σ² = λ
NOTE: Can you see how the Binomial distribution and Poisson distribution are related?
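As a quick numerical illustration of that relation (a sketch; n and p are arbitrary, and nchoosek may warn about precision for very large n), the Binomial(n, p) pmf approaches the Poisson(λ = np) pmf for large n and small p:

    n = 1000; p = 0.005; lambda = n*p;
    k = 0:20;
    pm_binom   = arrayfun(@(kk) nchoosek(n, kk)*p^kk*(1-p)^(n-kk), k);
    pm_poisson = lambda.^k .* exp(-lambda) ./ factorial(k);
    max(abs(pm_binom - pm_poisson))   % small: the two pmfs nearly coincide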
Joint cdf
F_XY(x, y) = P{X ≤ x, Y ≤ y}
Joint pdf
f_XY(x, y) = ∂²F_XY(x, y)/∂x∂y ≡ Σᵢ Σₖ p_ik δ(x − xᵢ, y − yₖ)
where p_ik = P[X = xᵢ, Y = yₖ] and Σᵢ Σₖ p_ik = 1.
Important Properties/Relations
1. F_XY(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_XY(u, v) du dv
2. P{(X, Y) ∈ D} = ∬_D f_XY(x, y) dx dy
3. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1
Marginal pdf
f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy ≡ Σᵢ pᵢ δ(x − xᵢ)
f_Y(y) = ∫_{−∞}^{∞} f_XY(x, y) dx ≡ Σₖ qₖ δ(y − yₖ)
where pᵢ = Σₖ p_ik = Σₖ P[X = xᵢ, Y = yₖ] = P[X = xᵢ],
qₖ = Σᵢ p_ik = Σᵢ P[X = xᵢ, Y = yₖ] = P[Y = yₖ] and Σᵢ pᵢ = Σₖ qₖ = 1
Independence
If X and Y are independent then we have that:
P{X ≤ x, Y ≤ y} = P{X ≤ x}P{Y ≤ y}
from which it follows that:
F_XY(x, y) = F_X(x)F_Y(y)
f_XY(x, y) = f_X(x)f_Y(y)
Conditional pdf
From Bayes’ theorem for conditional probability:
f_X(x|y) = f_XY(x, y)/f_Y(y) = f_Y(y|x)f_X(x)/f_Y(y) = f_Y(y|x)f_X(x) / ∫_{−∞}^{∞} f_Y(y|x)f_X(x) dx
p_X(xᵢ|yₖ) = P[X = xᵢ | Y = yₖ] = P[X = xᵢ, Y = yₖ]/P[Y = yₖ] = p_ik/qₖ = p_XY(xᵢ, yₖ)/p_Y(yₖ)
Example 1-9: EXAMPLE 6-42 from [1], pg, 224,225 (a priori vs a posteriori)
Covariance of X and Y
C_XY = c₁₁ = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
The covariance is a measure of the joint spread or variation of (X, Y) about the mean (E[X], E[Y]). NOTE: COV(X, Y) = C_XY
THINK: Show that E[X + Y] = E[X] + E[Y]. What is VAR[X + Y]?
Important Properties
• If E[XY] = 0 then X and Y are orthogonal.
• If COV(X, Y) = 0 then X and Y are uncorrelated.
• If X and Y are independent then E[XY] = E[X]E[Y], thus independent random variables are uncorrelated (since COV(X, Y) = 0); a numerical check follows below.
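A sanity-check sketch of the last property (the distributions are chosen arbitrarily):

    Nsamp = 1e6;
    X = randn(1, Nsamp);                 % N(0,1)
    Y = rand(1, Nsamp);                  % Uniform[0,1], generated independently of X
    Cxy = mean(X.*Y) - mean(X)*mean(Y);  % sample estimate of COV(X,Y)
    % Cxy is ~0, within sampling error of order 1/sqrt(Nsamp)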
The correlation co-efficient ρ_XY measures the degree of correlation between X and Y, as can be seen from Figure 1-6.
Figure 1-6: Jointly Gaussian pdf (a) ρ = 0; (b) ρ = −0.9 [2, Figure 5.25]
NOTE: If X and Y are jointly Gaussian and uncorrelated (ρ = 0) they are also independent. (Show this!)
Define n jointly related random variables, X₁, X₂, …, Xₙ, as a vector random variable:
X = [X₁ X₂ ⋯ Xₙ]ᵀ with instances x = [x₁ x₂ ⋯ xₙ]ᵀ and integration increments dx = dx₁dx₂⋯dxₙ
Independence
The collection of random variables X₁, X₂, …, Xₙ are independent if:
f_X(x) = f_{Xn}(xₙ)f_{Xn−1}(xₙ₋₁)⋯f_{X2}(x₂)f_{X1}(x₁)
This also implies:
f_{Xn}(xₙ | x₁, x₂, …, xₙ₋₁) = f_{Xn}(xₙ)
Expectation of g(X)
E[g(X)] = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} g(x)f_X(x) dx ≡ Σ_{x₁} Σ_{x₂} ⋯ Σ_{xₙ} g(x)p_X(x)
Mean Vector, μ
μ_X = E[X] = [E[X₁] E[X₂] ⋯ E[Xₙ]]ᵀ
Correlation Matrix, R
Let R_ij = E[XᵢXⱼ] define the correlation between Xᵢ and Xⱼ; then R_X = E[XXᵀ] is the n×n matrix whose (i, j) entry is R_ij.
Covariance Matrix, C
Let C_ij = E[(Xᵢ − μᵢ)(Xⱼ − μⱼ)] = R_ij − μᵢμⱼ be the covariance between Xᵢ and Xⱼ for i ≠ j, and C_ii = E[(Xᵢ − μᵢ)²] be the variance of Xᵢ; then:
C_X = E[(X − μ_X)(X − μ_X)ᵀ] = R_X − μ_X μ_Xᵀ
where the diagonal terms {C_ii} are the variances of the respective random variables in X and the off-diagonal terms {C_ij}, i ≠ j, are the covariances between each pair of random variables.
The covariance matrix is a very important quantity in pattern recognition and the analysis of high-dimensional random data/measurements as it defines the spread and tendencies/patterns of the data.
The random variables X = [X₁, X₂, …, Xₙ]ᵀ are said to be jointly Gaussian if their joint pdf is given by the multivariate Gaussian distribution function:
f_X(x) = exp{−½(x − μ_X)ᵀC_X⁻¹(x − μ_X)} / [(2π)^{n/2}|C_X|^{1/2}]
Consider applying a linear transformation matrix, A, to n-dimensional random data samples represented by the vector random variable, X:
Y = [Y₁ Y₂ ⋯ Yₙ]ᵀ = AX, i.e. Yᵢ = Σⱼ a_ij Xⱼ
What can we say about the transformed data samples and the corresponding transformed vector random variable, Y?
Expectation of Y
μ_Y = E[Y] = E[AX] = A·E[X] = Aμ_X
Covariance matrix of Y
C_Y = A C_X Aᵀ
Exercise: Show that C_Y = A C_X Aᵀ and C_XY = C_X Aᵀ
Let Sₙ = X₁ + X₂ + ⋯ + Xₙ be the sum of n iid random variables each with mean E[X] = μ_X and variance σ_X². Let Zₙ be the zero-mean, unit-variance random variable defined by:
Zₙ = (Sₙ − μ_Sn)/σ_Sn = (Sₙ − nμ_X)/(σ_X√n)
then for any random variable (continuous or discrete):
lim_{n→∞} P[Zₙ ≤ z] = lim_{n→∞} F_Zn(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx
which is the cdf of the zero-mean, unit-variance Gaussian distribution. Furthermore, for any continuous random variable we can also state:
lim_{n→∞} f_Zn(z) = N(0, 1) = (1/√(2π)) e^{−z²/2}
That is, the normalised sum of any sequence of iid random variables, with arbitrary pdf, will converge to a Gaussian.
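A minimal simulation sketch of this convergence (the underlying distribution and n are arbitrary choices):

    n = 30; trials = 1e5;
    mu = 0.5; sigma = sqrt(1/12);           % mean/std of Uniform[0,1]
    S = sum(rand(n, trials), 1);            % S_n for each trial
    Z = (S - n*mu) ./ (sigma*sqrt(n));      % normalised sums Z_n
    histogram(Z, 'Normalization', 'pdf'); hold on;
    z = linspace(-4, 4, 200);
    plot(z, exp(-z.^2/2)/sqrt(2*pi));       % N(0,1) pdf for comparison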
In engineering, estimation is the task of recovering a desired signal of interest from noisy observations of it. In this context the signals are random and, being time-varying, are deemed random signals. However, the idea of estimation can also be applied to random variables and observations or sequences of random variables.
Assume Y is the observation from which we want to form an estimate of X. For our purposes we assume X and Y are jointly distributed random variables. From Bayes’ theorem we have:
f_X(x|y) = f_Y(y|x)f_X(x)/f_Y(y) ↔ posterior = (likelihood · prior)/evidence
The problem with MAP estimation is that we may not have any knowledge of the prior
distribution. In such cases we assume the most unbiased opinion of the prior, that is, a
uniform prior distribution, and form the ML estimate.
We want to form an estimate for X as a function of the observation, Y, that is:
X̂ = g(Y)
such that the mean square error (MSE):
e = E[(X − g(Y))²]
is minimised. Thus X̂ is known as the minimum MSE (MMSE) estimator and is given by g(·) as follows:
x̂ = g*(y) = argmin_{g(·)} E[(X − g(Y))²] = E[X | Y = y]
The function g*(y) = E[X | Y = y] is known as the regression curve.
In cases where E[X | Y = y] is not known, or is not analytically tractable to derive, we consider specific functional forms based on linear functions.
It is intuitively satisfying that the optimal constant estimate of X is its expectation and, not surprisingly, the observations Y are ignored; this Case 1 is not very practical.
Case 3: X̂ = g(Y) = a*Y + b* (linear curve, unbiased) linear MMSE estimator
e* = min_{a,b} E[(X − aY − b)²]
By taking dE[(X − aY − b)²]/db = E[2(X − aY − b)(−1)] = 0 we get:
b* = E[X] − aE[Y]
Then by taking dE[(X − aY − b*)²]/da = E[2(X − aY − b*)(−Y)] = 0 we get:
a* = COV(X, Y)/VAR[Y] = ρ_XY σ_X/σ_Y
and thus:
X̂ = a*Y + b* = ρ_XY σ_X (Y − E[Y])/σ_Y + E[X]
from which we see that E[X̂] = E[X] and this is an unbiased estimator.
The MSE can be shown to be (see pg. 335 of [2] for the details):
e* = E[(X − (a*Y + b*))²] = VAR[X] − a*COV(X, Y) = VAR[X](1 − ρ²_XY)
The estimator relies on how correlated Y is with X. If there is maximum correlation (ρ_XY = ±1) then the best estimator simply rescales and shifts the mean and variance of Y to match those of X. Conversely, if there is no correlation (ρ_XY = 0) then the best estimator is just the mean E[X], as we had in Case 1; that is, the observations are not used since they provide no information or knowledge about X.
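A sketch of the linear MMSE estimator computed from sample moments (the joint model for X and Y is an arbitrary illustrative choice):

    Nsamp = 1e5;
    X = randn(1, Nsamp);
    Y = 2*X + 0.5*randn(1, Nsamp);                 % noisy observation of X (assumed model)
    a = (mean(X.*Y) - mean(X)*mean(Y)) / var(Y);   % a* = COV(X,Y)/VAR[Y]
    b = mean(X) - a*mean(Y);                       % b* = E[X] - a*E[Y]
    Xhat = a*Y + b;
    mse = mean((X - Xhat).^2);                     % approaches VAR[X](1 - rho^2)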
Example 1-22: EXAMPLE 6.28 from [2], pgs. 337
NOTE: The “gotcha” with such statistical estimators is that we require knowledge of the first- and second-order statistics of both the given observations Y and the signal that needs to be estimated, X.
The orthogonality principle for MSE estimation states that the estimation error X − X̂ is orthogonal to the observation data Y, that is:
E[(X − X̂)Y] = 0
Consider forming an estimate of the desired random variable, D, as a linear combination of n observations, X₁, X₂, …, Xₙ:
D̂ = a₁X₁ + a₂X₂ + ⋯ + aₙXₙ
The n observations could be from n joint random variables or n observations of a random process. We need to choose the {aᵢ} such that the MSE:
e = E[(D − D̂)²]
is minimised. By invoking the principle of orthogonality we can state the answer comes from solving:
E[(D − {a₁X₁ + a₂X₂ + ⋯ + aₙXₙ})Xᵢ] = 0, i = 1, 2, …, n
Define R_ji = E[XⱼXᵢ] and R_Di = E[DXᵢ]; then we are required to solve the following n simultaneous equations:
R₁₁a₁ + R₂₁a₂ + ⋯ + Rₙ₁aₙ = R_D1
R₁₂a₁ + R₂₂a₂ + ⋯ + Rₙ₂aₙ = R_D2
⋯
R₁ₙa₁ + R₂ₙa₂ + ⋯ + Rₙₙaₙ = R_Dn
Let X = [X₁ X₂ ⋯ Xₙ]ᵀ, a = [a₁ a₂ ⋯ aₙ]ᵀ, R_X = E[XXᵀ] be the correlation matrix and r_DX = E[DX] = [R_D1 R_D2 ⋯ R_Dn]ᵀ; then we can express the equations in matrix form:
R_X a = r_DX
The optimal linear estimator is then:
a* = R_X⁻¹ r_DX and D̂ = a*ᵀX
And the mean square error of the optimum linear estimator is:
e* = E[(D − a*ᵀX)²] = E[(D − a*ᵀX)D] − E[(D − a*ᵀX)a*ᵀX]
= E[(D − a*ᵀX)D] = VAR[D] − a*ᵀ r_DX
since E[(D − a*ᵀX)a*ᵀX] = 0 due to the orthogonality principle.
Example 1-23: EXAMPLE 6.30 from [2], pg. 340,341
NOTE: In the context of the observations arising from sampling a random process across time, and the estimator being represented as an optimal FIR filter, the equations
R_X a = r_DX
are known as the Wiener-Hopf equations and their solution
a* = R_X⁻¹ r_DX
gives the co-efficients of the Wiener FIR filter.
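A compact numerical sketch of solving the Wiener-Hopf equations (the signal model and filter order are illustrative assumptions):

    Nsamp = 1e5; order = 3;
    s = filter(1, [1, -0.8], randn(1, Nsamp));  % AR(1) desired process (assumed)
    x = s + 0.5*randn(1, Nsamp);                % noisy observations
    X = zeros(order, Nsamp-order+1);            % stacked observation vectors
    for k = 1:order
        X(k, :) = x(order-k+1 : end-k+1);       % row k holds x(n-k+1)
    end
    d  = s(order:end);                          % aligned desired samples
    Rx = (X*X.')/size(X, 2);                    % sample correlation matrix R_X
    rd = (X*d.')/size(X, 2);                    % sample cross-correlation r_DX
    a  = Rx \ rd;                               % Wiener solution a* = R_X^{-1} r_DX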
1.8 References
1. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic Processes”, 4th Ed., McGraw-Hill, 2002.
2. A. Leon-Garcia, “Probability and Random Processes for Electrical Engineering”, 3rd Ed.,
Addison Wesley, 2008.
Consider an experiment specified by the outcomes or sample point, 𝑠𝑠𝑖𝑖 , from some sample
space S.
Example 2-1: Consider throwing a die, then 𝑆𝑆 = {1,2,3,4,5,6} is a discrete space and 𝑠𝑠𝑖𝑖 can
take on any one of the 6 discrete values.
Example 2-2: Consider any real number, 𝑠𝑠𝑖𝑖 , from 0 to 1, then 𝑆𝑆 = [0,1]
We further assume that each sample point of the sample space evolves with time, that is we
assign each sample point, 𝑠𝑠𝑖𝑖 , a function of time, 𝑥𝑥𝑖𝑖 (𝑡𝑡).
The evolution of 𝑠𝑠𝑖𝑖 with respect to time, 𝑥𝑥𝑖𝑖 (𝑡𝑡), is called a realisation or sample function. The
sample space or ensemble consisting of the collection of sample functions constitute a
random process or stochastic process. The “randomness” arises both from the random
selection of the sample point and the possible random evolution with time. This is depicted in
Figure 2-1.
Figure 2-1 An ensemble of sample functions ([3], Figure 1.1, pg. 32)
Example 2-3: Consider the sample space S = {1,2,3,4,5,6}, define the realisation as 𝑥𝑥𝑖𝑖 (𝑡𝑡) =
𝑠𝑠𝑖𝑖 𝑒𝑒 −𝑡𝑡 𝑢𝑢(𝑡𝑡). See Figure 2-2 for example realisations
Figure 2-2 Sample functions 𝒙𝒙𝒊𝒊 (𝒕𝒕) = 𝒔𝒔𝒊𝒊 𝒆𝒆−𝒕𝒕 𝒖𝒖(𝒕𝒕) ([4], Figure 4.11, pg. 161)
(NOTE: 𝒙𝒙(𝒕𝒕; 𝛚𝛚𝒊𝒊 ) ≡ 𝒙𝒙𝒊𝒊 (𝒕𝒕))
At each time instant, 𝑡𝑡0 , we have the value 𝑥𝑥𝑖𝑖 (𝑡𝑡0 ). For each different possible sample point,
𝑠𝑠𝑖𝑖 , at the same fixed time instant, 𝑡𝑡0 , the collection of values 𝑥𝑥𝑖𝑖 (𝑡𝑡0 ) over the sample space
constitute a random variable denoted by 𝑋𝑋(𝑡𝑡0 ). That is, at each time instant, the value of the
random process across all possible realisations, constitutes a random variable.
Example 2-4: From Example 2-3, X(1) is a random variable taking values e⁻¹, 2e⁻¹, …, 6e⁻¹, each with probability 1/6 assuming a fair roll of the die.
Exercise 2-1
What is 𝑋𝑋(3)?
Exercise 2-2
Sketch a realisation of {Xₙ}, n ≥ 0, from Example 2-6. Sketch another realisation and explain why the realisations are different. Does it make sense to talk about a sample point and its evolution with time in this context? Explain!
An important special case is that of M=2 in which only the second-order statistics need be
specified.
Example 2-7: The random process is defined by 𝑋𝑋(𝑡𝑡) = θ where θ is a random variable
uniformly distributed on [−1,1]. For this random process, each realisation is a constant signal
of amplitude θ.
Example 2-8: The random process is defined by 𝑋𝑋(𝑡𝑡) = θ𝑡𝑡 where θ is a random variable
uniformly distributed on [−1,1]. For this random process, each realisation is a line of slope θ
which goes through the origin.
Figure 2-3 The mean of a random process ([4], Figure 4.15, pg. 165)
(NOTE: 𝒙𝒙(𝒕𝒕; 𝛚𝛚𝒊𝒊 ) ≡ 𝒙𝒙𝒊𝒊 (𝒕𝒕))
Exercise 2-3
Repeat Example 2-9 for the random process of Example 2-8
Answer:
m_X(t) = ∫_{−1}^{1} θt·(1/2) dθ = [tθ²/4]_{−1}^{1} = 0
Example 2-10: Consider the random process from Example 2-7. We note that f_Θ(θ) = 1/2 for −1 ≤ θ ≤ 1 (and 0 otherwise) and we also have that X(t) = g(t; θ) = θ, hence:
R_XX(t₁, t₂) = E[X(t₁)X(t₂)] = E[θ²] = ∫_{−1}^{1} θ²·(1/2) dθ = [θ³/6]_{−1}^{1} = 1/3
which we note is independent of t₁ and t₂ (are we surprised?).
The autocovariance C_XX(t₁, t₂) of the random process, X(t), can also be defined as:
C_XX(t₁, t₂) = E[{X(t₁) − m_X(t₁)}{X(t₂) − m_X(t₂)}] = R_XX(t₁, t₂) − m_X(t₁)m_X(t₂)
Equation 2-3
NOTE 1: For zero-mean random processes the autocovariance is equal to the autocorrelation.
NOTE 2: We also have that VAR[X(t)] = C_XX(t, t) = E[(X(t) − E[X(t)])²] = E[X²(t)] − E[X(t)]², which is the variance of X(t). For zero-mean signals, VAR[X(t)] = E[X²(t)].
Exercise 2-5
Let 𝑔𝑔(𝑡𝑡) be the rectangular pulse shown below:
The random process X(t) is defined as X(t) = A·g(t) where A assumes the values ±1 with equal probability. (a) Find the pmf of X(t) and hence the expression for m_X(t).
Answer:
Since g(t) is zero outside the interval [0,1], then P[X(t) = 0] = 1 for t ∉ [0,1]. On the other hand, for t ∈ [0,1] we have P[X(t) = 1] = P[X(t) = −1] = 0.5. Hence we can state:
m_X(t) = (1)P[X(t) = 1] + (−1)P[X(t) = −1] = 0 for t ∈ [0,1], and m_X(t) = 0 for t ∉ [0,1]
(b) Find the joint pmf of X(t₁), X(t₂) and hence R_XX(t₁, t₂)
Answer:
For t₁ ∈ [0,1], t₂ ∈ [0,1]:
P[X(t₁) = ±1, X(t₂) = ±1] = 0.5 for (+1, +1) and (−1, −1) (i.e. same value)
P[X(t₁) = ±1, X(t₂) = ∓1] = 0 for (+1, −1) and (−1, +1)
For t₁ ∈ [0,1], t₂ ∉ [0,1]:
P[X(t₁) = ±1, X(t₂) = 0] = 0.5 for (+1, 0) and (−1, 0)
For t₁ ∉ [0,1], t₂ ∈ [0,1]:
P[X(t₁) = 0, X(t₂) = ±1] = 0.5 for (0, +1) and (0, −1)
For t₁ ∉ [0,1], t₂ ∉ [0,1]:
P[X(t₁) = 0, X(t₂) = 0] = 1 for (0, 0)
Considering only the non-zero contributions to the summation from Equation 2-2, which occur for the case t₁ ∈ [0,1], t₂ ∈ [0,1]:
R_XX(t₁, t₂) = (+1·+1)P[X(t₁) = +1, X(t₂) = +1] + (−1·−1)P[X(t₁) = −1, X(t₂) = −1] = 1
and thus we can state:
R_XX(t₁, t₂) = 1 for t₁ ∈ [0,1] and t₂ ∈ [0,1], and 0 otherwise
Independent / Uncorrelated
Two random processes, 𝑋𝑋(𝑡𝑡) and 𝑌𝑌(𝑡𝑡), are deemed independent if for all 𝑡𝑡1 and for all 𝑡𝑡2 the
random variables 𝑋𝑋(𝑡𝑡1 ) and 𝑌𝑌(𝑡𝑡2 ) are independent. Similarly, 𝑋𝑋(𝑡𝑡) and 𝑌𝑌(𝑡𝑡) are uncorrelated
if 𝑋𝑋(𝑡𝑡1 ) and 𝑌𝑌(𝑡𝑡2 ) are uncorrelated for all 𝑡𝑡1 , 𝑡𝑡2
NOTE: Two random variables, 𝑋𝑋 and 𝑌𝑌, are independent if
𝑓𝑓𝑋𝑋𝑋𝑋 (𝑥𝑥, 𝑦𝑦) = 𝑓𝑓𝑋𝑋 (𝑥𝑥)𝑓𝑓𝑌𝑌 (𝑦𝑦) and they are uncorrelated if COV[𝑋𝑋, 𝑌𝑌] = 𝐸𝐸[(𝑋𝑋 − 𝐸𝐸[𝑋𝑋])(𝑌𝑌 −
𝐸𝐸[𝑌𝑌])] = 𝐸𝐸[𝑋𝑋𝑋𝑋] − 𝐸𝐸[𝑋𝑋]𝐸𝐸[𝑌𝑌] = 0, where for zero-mean random variables this reduces to
𝐸𝐸[𝑋𝑋𝑋𝑋] = 0.
NOTE: If X and Y are independent then they are also uncorrelated, but if X and Y are
uncorrelated this does NOT imply they are independent, thus independence is a more powerful
constraint/property than (un)correlation.
A random process X(t) is a Gaussian random process if the k-dimensional random variable vector [X₁ X₂ ⋯ Xₖ], where Xⱼ = X(tⱼ), is jointly Gaussian for all k and for any and all choices of tⱼ.
Exercise 2-7
What is an IID Gaussian random process? What is its joint pdf?
Answer: An IID Gaussian sequence is a discrete-time random process Xₙ which is an IID random process with Gaussian pdf f_X(x) = e^{−(x−m)²/2σ²}/√(2πσ²), so that:
f_X(x) = f_{X1X2⋯Xk}(x₁, x₂, …, xₖ) = f_X(x₁)f_X(x₂)⋯f_X(xₖ) = e^{−Σᵢ₌₁ᵏ(xᵢ−m)²/2σ²}/(2πσ²)^{k/2}
The IID Gaussian random process has m_X(t) = m and C_XX(tᵢ, tⱼ) = σ²δ_ij, i.e. K = σ²I.
Figure 2-4 (a) Realisation of Bernoulli process, 𝑰𝑰𝒏𝒏 , indicating that a light bulb fails (𝑰𝑰𝒏𝒏 =
𝟏𝟏) and is replaced on day n. (b) Realisation of the Binomial process, 𝑺𝑺𝒏𝒏 , counting the
number of light bulbs that have failed up to day n. ([1], Figure 9.4, pg. 499)
Random Step
Define the random step as Dₙ = 2Iₙ − 1 where Iₙ is a Bernoulli random process. Then Dₙ is an IID random process taking values from the discrete-valued sample space {−1, +1}, where −1 is a step to the left and +1 a step to the right (see Figure 2-5(a)), with P[D = 1] = p, P[D = −1] = 1 − p and:
m_D(n) = E[Dₙ] = E[2Iₙ − 1] = 2E[Iₙ] − 1 = 2p − 1,
VAR[Dₙ] = VAR[2Iₙ − 1] = 4VAR[Iₙ] = 4p(1 − p).
Random Walk
The corresponding sum process, Wₙ = Σⱼ₌₁ⁿ Dⱼ, is known as the one-dimensional random walk, and depicts a random walk comprising a sequence of random left and right steps; see Figure 2-5(b).
Figure 2-5 (a) Realisation of the random step process, 𝑫𝑫𝒏𝒏 , (b) Realisation of the
corresponding random walk process, 𝑾𝑾𝒏𝒏 ([1], Figure 9.5, pg. 500)
The AR(p), depicted in Figure 2-7 for p = 1, is a key process for time series signal modelling.
In the Poisson process events occur at random instants of time and at an average rate of λ
events per second. Practical, real-world examples of this process:
• modelling the arrival of network packets at a router or bridge in a communications
network and the consequent effect of flow and throughput (queuing theory)
• modelling the random breakdown of components in a system (reliability theory)
• modelling of car traffic flow and arrival of cars at a junction (queuing theory)
The Poisson process can be considered the continuous-time version of the Binomial Counting
process and is derived as follows:
• Assume the time interval [0, t] is divided into n subintervals of very short duration δ = t/n. Each subinterval is sufficiently short that the events can be treated as Bernoulli random variables (i.e. only one event or none in each subinterval, each event occurring with probability p)
• Let N(t) be the number of event occurrences in the time interval [0, t]. The expected number of event occurrences or indications over the interval [0, t] (comprising n subintervals) is the mean of the binomial counting or sum process, E[Sₙ] = np. Since events occur at the rate of λ events per second we must have that np = λt.
• If we let n → ∞ (i.e. δ → 0) and p → 0 while np = λt remains fixed, then we can show that:
P[Sₙ = k] = C(n, k)pᵏ(1 − p)ⁿ⁻ᵏ ≅ ((np)ᵏ/k!)e^{−np} ⇒ P[N(t) = k] = ((λt)ᵏ/k!)e^{−λt}
Thus we have the Poisson process, N(t) (so named since the pmf of the random variable N(t₀) is the Poisson random variable), of rate λ:
P[N(t) = k] = ((λt)ᵏ/k!)e^{−λt} for k = 0, 1, …
where we note that E[N(t)] = λt and we can show that VAR[N(t)] = λt.
Since the Poisson process is based on the Binomial sum process it possesses stationary and independent increments, that is, we can say that:
P[N(t₁) = i, N(t₂) = j] = P[N(t₁) = i]P[N(t₂) − N(t₁) = j − i]
= P[N(t₁) = i]P[N(t₂ − t₁) = j − i]
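A simulation sketch of the Poisson counting process via iid exponential inter-arrival gaps (the rate and horizon are arbitrary choices):

    lambda = 0.25; Tmax = 60;
    gaps = -log(rand(1, 1000))/lambda;   % iid Exp(lambda) inter-arrival times
    S = cumsum(gaps);                    % event (arrival) times S_n
    S = S(S <= Tmax);
    N10 = sum(S <= 10);                  % N(10): events in [0, 10]
    % Over many runs, mean(N10) -> lambda*10 and var(N10) -> lambda*10.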
Example 2-12: Inquiries arrive at a recorded message device at the Poisson rate of 15 enquiries
per minute. Find the expression for the probability that in a 1-minute period, 3 inquiries arrive
during the first 10 seconds and 2 enquiries arrive during the last 15 seconds.
The arrival rate in seconds is λ = 15/60 = 0.25 inquiries per second and the probability of
interest is: 𝑃𝑃[𝑁𝑁(10) = 3, 𝑁𝑁(60) − 𝑁𝑁(45) = 2]
Exercise 2-8
Evaluate the expression in Example 2-12 making use of independent increments.
Answer:
P[N(10) = 3, N(60) − N(45) = 2]
= P[N(10) = 3]P[N(60) − N(45) = 2]
= P[N(10) = 3]P[N(15) = 2]
= [((0.25·10)³e^{−0.25·10})/3!]·[((0.25·15)²e^{−0.25·15})/2!]
Event arrival times in [0, t] are distributed uniformly and independently
Consider a single event arrival (i.e. network packet, telephone call, etc.), X, which we are told arrives sometime in the interval [0, t]. Consider the cdf, P[X ≤ x]: this is the probability that the event arrives by time x where 0 < x < t (i.e. [N(x) = 1]), given the event must have arrived by time t (i.e. [N(t) = 1]). That is:
P[X ≤ x] = P[N(x) = 1 | N(t) = 1]
= P[N(x) = 1, N(t) = 1]/P[N(t) = 1]
= P[N(x) = 1, N(t) − N(x) = 0]/P[N(t) = 1]
= P[N(x) = 1]P[N(t − x) = 0]/P[N(t) = 1]
= (λx e^{−λx} e^{−λ(t−x)})/(λt e^{−λt}) = x/t
Hence the event arrival time must be uniformly distributed in the interval [0, t] (since the cdf corresponds to that of a uniform distribution). It can be shown that if the number of arrivals in the interval [0, t] is k then the individual arrival times are distributed independently and uniformly in the interval.
Exercise 2-9
Suppose two customers arrive at a shop during a two-minute period. Find the probability that
both customers arrived during the first minute.
Answer:
Since the arrival times of each customer are uniformly distributed and independent, customer a arrives during the first minute with probability P[X_a ≤ 1] = 1/2, and thus both customers, a and b, arrive during the first minute with probability P[X_a ≤ 1]P[X_b ≤ 1] = (1/2)(1/2) = 1/4.
Consider a Poisson process where events arrive randomly at times Sₙ, n = 1, 2, …, and Sₙ is the time at which the nth event occurs, that is:
Sₙ = T₁ + T₂ + ⋯ + Tᵢ + ⋯ + Tₙ
where the Tᵢ are the iid exponential inter-arrival times between the (i−1)th and ith events.
Assume that each event results in a system response, h(t). Thus we can define a new random process, the shot noise process, X(t), given by:
X(t) = Σₙ₌₁^∞ h(t − Sₙ)
The term ‘shot noise’ is applied to the fluctuations in electronic and photonic circuits due to the effect of random electrons/photons. For example, define X(t) as the shot noise effect from an idle photodetector where events correspond to sporadic photoelectrons hitting the detector, giving rise to a temporary current pulse h(t) at each event.
Figure 2-8 Shot noise process ([1], Figure 9.11(b), pg. 513)
The transmitter will transmit the desired signal of interest, 𝑋𝑋(𝑡𝑡), and the receiver will receive
the observed signal, 𝑌𝑌(𝑡𝑡), which has been corrupted by noise, usually in simple additive
fashion:
𝑌𝑌(𝑡𝑡) = 𝑋𝑋(𝑡𝑡) + 𝑁𝑁(𝑡𝑡)
In theory both X(t) and N(t) should be considered as random processes; in practice it is the noise random process, N(t), and its properties, that are important. Also, the noise process N(t) is assumed to be uncorrelated with the signal process X(t), and usually E[X(t)] = E[N(t)] = 0 (the signals are zero-mean).
Example 2-13
Specifically, the random signal representing the carrier is X(t) = A cos(ω_c t + Θ) where Θ is uniformly distributed over the interval (−π, π), that is, f_Θ(θ) = 1/2π for −π ≤ θ ≤ π (and 0 otherwise). We have that:
m_X(t) = E[A cos(ω_c t + Θ)] = ∫_{−π}^{π} A cos(ω_c t + θ)f_Θ(θ) dθ = (A/2π)∫_{−π}^{π} cos(ω_c t + θ) dθ = 0
and we can derive that:
C_XX(t₁, t₂) = R_XX(t₁, t₂) = E[A cos(ω_c t₁ + Θ)A cos(ω_c t₂ + Θ)]
= (A²/2π)∫_{−π}^{π} (1/2){cos(ω_c(t₁ − t₂)) + cos(ω_c(t₁ + t₂) + 2θ)} dθ
R_XX(t₁, t₂) = (A²/2)cos(ω_c(t₁ − t₂))
Equation 2-5
NOTE 1: The identity 2cos A cos B = cos(A + B) + cos(A − B)
NOTE 2: Is it obvious that ∫_{−π}^{π} cos(ωt + mθ) dθ = 0 for any non-zero integer m?
Exercise 2-10
Sketch 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ) as a function of τ = 𝑡𝑡1 − 𝑡𝑡2 .
In a baseband digital transmission system a random sequence of 0’s and 1’s need to be encoded
for transmission over a wired network. The transmitted signal can be viewed as a random
binary wave, 𝑋𝑋(𝑡𝑡), as follows (see Figure 2-9):
1. The symbols 1 and 0 are transmitted every 𝑇𝑇 seconds by pulses of amplitude +𝐴𝐴 and
−𝐴𝐴 respectively, of duration 𝑇𝑇 seconds.
Figure 2-9 Sample function of the random binary wave ([3], Figure 1.6, pg. 39)
The random binary wave has a key property called stationarity (to be formally defined later), which is that the statistical properties of the process are independent of time or “origin”. This implies that the joint statistical properties at times (t₁, t₂, …, tₖ) are the same as the joint statistical properties at times (t₁ + τ, t₂ + τ, …, tₖ + τ); that is, we can analyse the signal at whatever absolute time is convenient without loss of generality.
Since the amplitude levels +𝐴𝐴 and −𝐴𝐴 occur with equal probability for any realisation at a
particular time instant then:
𝑚𝑚𝑋𝑋 (𝑡𝑡) = 𝐸𝐸[𝑋𝑋(𝑡𝑡)] = 0
The calculation of the autocorrelation, 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡𝑖𝑖 , 𝑡𝑡𝑘𝑘 ) = 𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )𝑋𝑋(𝑡𝑡𝑘𝑘 )], however, is more tricky,
but here goes:
Case 1: |𝑡𝑡𝑘𝑘 − 𝑡𝑡𝑖𝑖 | > 𝑇𝑇 (easy)
Under this condition the random variables 𝑋𝑋(𝑡𝑡𝑘𝑘 ) and 𝑋𝑋(𝑡𝑡𝑖𝑖 ) occur in different pulse intervals
and are thus independent from which we get:
𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )𝑋𝑋(𝑡𝑡𝑘𝑘 )] = 𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )]𝐸𝐸[𝑋𝑋(𝑡𝑡𝑘𝑘 )] = 0
Case 2: |tₖ − tᵢ| < T (tricky, so concentrate!)
Since the signal is stationary we can analyse at whatever absolute time is convenient, so we choose tₖ = 0 and tᵢ < tₖ. From Figure 2-9 we observe that X(tᵢ) and X(tₖ) will only be in the same pulse interval if and only if |tₖ − tᵢ| < T − t_d, that is t_d < T − |tₖ − tᵢ|. In this case we can derive the following conditional expectation:
E[X(tᵢ)X(tₖ) | t_d] = A² for t_d < T − |tₖ − tᵢ|, and 0 otherwise
So to remove the conditioning we average over all possible t_d:
Exercise 2-11
Sketch 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ) as a function of τ = 𝑡𝑡1 − 𝑡𝑡2 .
Consider a random process that assumes values ±1, where 𝑋𝑋(0) = ±1 with probability 0.5,
i.e. 𝑃𝑃[𝑋𝑋(0) = ±1] = 0.5, and 𝑋𝑋(𝑡𝑡) changes “polarity” with each occurrence of an event in a
Poisson process of rate α. A realisation of this process is shown in Figure 2-10.
Figure 2-10 Realisation of a random telegraph signal where the 𝑿𝑿𝒋𝒋 are iid exponential
random variables ([1], Figure 9.10, pg. 511)
Since X(t) will have the same polarity as X(0) when an even number of Poisson events occur and the opposite polarity otherwise, we have that:
P[X(t) = ±1 | X(0) = ±1] = P[N(t) = even integer] = Σ_{j=0}^{∞} ((αt)^{2j}/(2j)!)e^{−αt} = (1/2){1 + e^{−2αt}}
P[X(t) = ±1 | X(0) = ∓1] = P[N(t) = odd integer] = Σ_{j=0}^{∞} ((αt)^{2j+1}/(2j+1)!)e^{−αt} = (1/2){1 − e^{−2αt}}
From which we have that:
P[X(t) = ±1] = P[X(t) = ±1 | X(0) = 1]P[X(0) = 1] + P[X(t) = ±1 | X(0) = −1]P[X(0) = −1]
⇒ P[X(t) = ±1] = 0.5
We can also show that:
m_X(t) = 0, VAR[X(t)] = 1
and:
C_XX(t₁, t₂) = R_XX(t₁, t₂) = (1)P[X(t₁) = X(t₂)] + (−1)P[X(t₁) ≠ X(t₂)] = e^{−2α|t₁−t₂|}
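A simulation sketch estimating this autocorrelation (the rate, time step and lag range are arbitrary choices; the Poisson flips are approximated by one Bernoulli trial per step):

    alpha = 1; dt = 0.01; T = 1000; n = round(T/dt);
    flips = rand(1, n) < alpha*dt;                    % ~Poisson(alpha) events
    x = (2*(rand < 0.5) - 1) * (-1).^cumsum(flips);   % +/-1 signal, flipping at events
    lags = 0:200;
    Rhat = arrayfun(@(k) mean(x(1:end-k).*x(1+k:end)), lags);
    plot(lags*dt, Rhat, lags*dt, exp(-2*alpha*lags*dt));  % estimate vs theory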
2.6.1 Definition
A complete statistical description of the Mth-order statistics of a random process gives, for any k ≤ M and at any choice of times (t₁, t₂, …, tₖ), the joint pdf f_{X1X2⋯Xk}(x₁, x₂, …, xₖ), where Xⱼ = X(tⱼ). In general the joint pdf so stated depends on the time origin. In a very important class of random processes the joint pdf does not depend on the time origin, only on the relative times.
Thus the same statistical properties will hold no matter at which specific time or time interval
is analysed, that is the properties of the random process do not change with time. If this is the
case then we have a stationary random process.
But let’s be more formal …
Strict stationarity is a very strong condition which is difficult to prove. However, if one restricts attention to 2nd-order stationarity, we find that there are many signals that satisfy this and for which the most useful properties for analysis and design arise.
Specifically …
Wide-Sense Stationary (WSS)
If only the 2nd-order statistics exhibit stationarity then we have that:
1. m_X(t) = E[X(t)] = m_X is independent of t
2. R_XX(t₁, t₂) ≡ R_X(t₁ − t₂) = R_X(τ) depends only on the time difference τ = t₁ − t₂.
3. C_XX(t₁, t₂) = C_X(τ) = R_X(τ) − m_X², as a consequence of 1 and 2,
and a random process exhibiting these characteristics will be a wide-sense stationary or WSS process.
For discrete-time: 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑛𝑛1 , 𝑛𝑛2 ) ≡ 𝑅𝑅𝑋𝑋 (𝑛𝑛1 − 𝑛𝑛2 ) = 𝑅𝑅𝑋𝑋 (𝑘𝑘) where 𝑘𝑘 = 𝑛𝑛1 − 𝑛𝑛2
From now on any reference to stationary can be assumed to imply wide-sense stationary unless
otherwise specified.
Definition
Since τ = t₁ − t₂, if t₂ = t then t₁ = t + τ. Hence for a WSS process we can define the general form of the autocorrelation function as:
R_X(τ) = E[X(t + τ)X(t)]
to keep with standard mathematical convention. This convention is especially important to
remember when dealing with cross-correlations.
For discrete-time: 𝑅𝑅𝑋𝑋 (𝑘𝑘) = 𝐸𝐸[𝑋𝑋𝑛𝑛+𝑘𝑘 𝑋𝑋𝑛𝑛 ]
Figure 2-11 Autocorrelation and the rate of change of a random process ([3], Figure
1.4, pg. 37)
For the sinusoid with random phase from Section 1.5.3 we had:
1. m_X(t) = 0
2. C_XX(t₁, t₂) = R_XX(t₁, t₂) = (A²/2)cos(ω_c(t₂ − t₁)) → R_X(τ) = (A²/2)cos(ω_c τ)
Exercise 2-12
What is the expression for the pdf of a stationary Gaussian random process for k=2 (consult
Section 1.4.1)?
f_{X1X2}(x₁, x₂) = e^{−½(x − m)ᵀK⁻¹(x − m)}/(2π|K|^{1/2})
where:
m = [m_X, m_X]ᵀ, K = [C_X(0), C_X(τ); C_X(τ), C_X(0)]
Many random signals in communications systems arise from periodic processes. For example,
a data modulator or modem processes a random waveform every T seconds. Such signals
exhibit cyclostationary behaviour.
2.7.1 Definition
Consider a WSS random process, 𝑋𝑋(𝑡𝑡), for which we are interested in forming estimates of
the mean, 𝑚𝑚𝑋𝑋 (𝑡𝑡) = 𝑚𝑚 𝑋𝑋 , and autocorrelation, 𝑅𝑅𝑋𝑋 (τ). How can we do this?
Ensemble Average
Consider an ensemble or collection of different realisations of the random process. Then we have a random variable, X(t), with pdf f_{X(t)}(x) representing the distribution at time t over all the realisations. We define the mean and the autocorrelation as the following expectations or ensemble averages of the random variable:
m_X = E[X(t)] = ∫_{−∞}^{∞} x f_{X(t)}(x) dx, R_X(τ) = E[X(t + τ)X(t)]
Time Average
For a WSS random process it is also possible to consider taking the time average of a single realisation:
⟨X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t) dt
⟨X(t + τ)X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t + τ)x(t) dt
where x(t) is any one realisation of the random process of interest.
Ergodic Process
The WSS random process X(t) is ergodic in the mean, that is:
lim_{T→∞} ⟨X(t)⟩_T = m_X(t) = m_X (the time average approaches the ensemble average)
in the mean square sense if the following can be shown to be true:
lim_{T→∞} VAR[⟨X(t)⟩_T] = 0 (the time average converges to the ensemble average)
If a WSS random process is both mean ergodic and ergodic in the autocorrelation function then it can be considered an ergodic process.
Example 2-16: Let X(t) = A cos(ω_c t + Θ) where Θ is uniformly distributed over the interval (−π, π). We know that m_X(t) = E[X(t)] = 0, so:
⟨X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t) dt = (1/2T)∫_{−T}^{T} A cos(ω_c t + Θ) dt = (A/2T)∫_{−T}^{T} cos(ω_c t + Θ) dt
from which we have that lim_{T→∞} ⟨X(t)⟩_T = 0 = m_X(t). It remains to be shown that lim_{T→∞} VAR[⟨X(t)⟩_T] = 0 for the random process to be deemed ergodic in the mean. Now we know R_X(τ) = (A²/2)cos(ω_c τ). Then:
⟨X(t + τ)X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t + τ)x(t) dt = (A²/2T)∫_{−T}^{T} cos(ω_c(t + τ) + Θ)cos(ω_c t + Θ) dt
= (A²/4T)∫_{−T}^{T} [cos(ω_c(2t + τ) + 2Θ) + cos(ω_c τ)] dt → (A²/2)cos(ω_c τ)
as T → ∞ and hence the random process may be ergodic in the autocorrelation.
NOTE: The importance of realising a random process is ergodic is that one need only have a
single realisation of the process to estimate the mean and autocorrelation, whereas otherwise
an ensemble of realisations would be needed.
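A simulation sketch comparing the time and ensemble averages for this process (amplitude, frequency and durations are arbitrary):

    wc = 2*pi; A = 1; t = 0:0.001:100;
    theta = 2*pi*rand - pi;                        % one realisation's random phase
    x = A*cos(wc*t + theta);
    time_avg = mean(x);                            % time average of one realisation, ~0
    ens = A*cos(wc*1 + (2*pi*rand(1, 1e5) - pi));  % ensemble of X(1) values
    ens_avg = mean(ens);                           % ensemble average, also ~0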
Assume that the WSS process 𝑋𝑋(𝑡𝑡) is applied as the input to a linear, time-invariant (LTI)
system with impulse response ℎ(𝑡𝑡). What can we say about the response of the system to a
WSS random input?
From signal and systems theory we know that the output can be formulated in terms of the
following convolution integral:
Y(t) = ∫_{−∞}^{∞} h(r)X(t − r) dr = X(t) ∗ h(t)
NOTE 1: The use and definition of the convolution operator ∗.
NOTE 2: From signals and systems theory we can also define the transfer function H(f) = ℑ{h(t)} = ∫_{−∞}^{∞} h(t)e^{−j2πft} dt as the Fourier transform of the impulse response.
m_Y(t) = E[Y(t)] = E[X(t) ∗ h(t)] = E[X(t)] ∗ h(t) = m_X(t) ∗ h(t) = ∫_{−∞}^{∞} h(r)m_X(t − r) dr
Definition
The cross-correlation between the input X(t) and output Y(t) is defined as:
R_XY(τ) = E[X(t + τ)Y(t)] = E[Y(t)X(t + τ)] = R_YX(−τ)
since X(t) and Y(t) are jointly stationary.
R_XY(τ) = E[X(t + τ)Y(t)] = E[X(t + τ)∫_{−∞}^{∞} h(r)X(t − r) dr]
= ∫_{−∞}^{∞} h(r)E[X(t + τ)X(t − r)] dr = ∫_{−∞}^{∞} h(r)R_X(τ + r) dr
= ∫_{−∞}^{∞} h(−r)R_X(τ − r) dr = R_X(τ) ∗ h(−τ)
and:
R_YX(τ) = R_XY(−τ) = R_X(τ) ∗ h(τ)
R_Y(τ) = E[Y(t + τ)Y(t)] = E[∫_{−∞}^{∞} h(s)X(t + τ − s) ds ∫_{−∞}^{∞} h(r)X(t − r) dr]
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s)h(r)E[X(t + τ − s)X(t − r)] ds dr
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s)h(r)R_X(τ − s + r) ds dr
which can also be expressed:
R_Y(τ) = E[Y(t + τ)Y(t)] = E[{∫_{−∞}^{∞} h(s)X(t + τ − s) ds}Y(t)]
= ∫_{−∞}^{∞} h(s)E[X(t + τ − s)Y(t)] ds = ∫_{−∞}^{∞} h(s)R_XY(τ − s) ds
= R_XY(τ) ∗ h(τ) = R_X(τ) ∗ h(−τ) ∗ h(τ)
Since R_Y(τ) is a function of τ and m_Y(t) = m_Y, the output process, Y(t), is WSS.
NOTE: This result will only hold if the input is continuously applied (i.e. applied from an infinite time in the past). It does not hold if the input is applied at t = 0 (see EXAMPLE 9-18 from [2], pgs. 400-401).
Discrete-time case
R_Y(k) = R_X(k) ∗ h(−k) ∗ h(k)
R_YX(k) = R_X(k) ∗ h(k)
R_XY(k) = R_X(k) ∗ h(−k)
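A numerical sketch of the first discrete-time relation for a white input, where R_X(k) = δ(k) so that R_Y(k) = h(k) ∗ h(−k) (the filter is an arbitrary example; xcorr is from the Signal Processing Toolbox):

    h = [1, 0.5, 0.25];              % example FIR impulse response
    x = randn(1, 1e6);               % unit-variance white input, R_X(k) ~ delta(k)
    y = filter(h, 1, x);
    Rhat = xcorr(y, 2, 'biased');    % sample R_Y(k) for lags -2..2
    Rtheory = conv(h, fliplr(h));    % h(k) * h(-k)
    [Rhat; Rtheory]                  % rows agree to within sampling error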
2.9.1 Definition
The so-called Einstein-Wiener-Khinchin theorem states that the power spectral density, S_X(f), and the autocorrelation function, R_X(τ), of a WSS process form a Fourier transform pair (where ℑ{·} is the Fourier transform operator):
S_X(f) = ℑ{R_X(τ)} = ∫_{−∞}^{∞} R_X(τ)e^{−j2πfτ} dτ ↔ R_X(τ) = ℑ⁻¹{S_X(f)} = ∫_{−∞}^{∞} S_X(f)e^{j2πfτ} df
NOTE: To derive the PSD we simply take the Fourier transform of the autocorrelation
function of the random process. Consult your nearest signals and systems textbook for Fourier
transform tables of standard functions.
Property 1
The DC bias of the autocorrelation (NOT the random signal) is given by:
S_X(0) = ∫_{−∞}^{∞} R_X(τ) dτ
BE CAREFUL! The PSD of a random process is not always equivalent to what we expect
from the power spectrum of a deterministic signal. In particular the PSD evaluated at 𝑓𝑓 = 0
does NOT represent the DC bias, m, of the random signal itself. The DC bias of a random
signal is given by the non-zero mean of the autocorrelation function (i.e. 𝑹𝑹𝑿𝑿 (𝛕𝛕) → 𝒎𝒎𝟐𝟐 as
𝛕𝛕 → ∞) which manifests itself as a DC power impulse at 𝒇𝒇 = 𝟎𝟎 in the PSD.
Property 2
The average power of the random process is given by:
E[X²(t)] = R_X(0) = ∫_{−∞}^{∞} S_X(f) df
Property 3
The PSD is always non-negative, that is:
𝑆𝑆𝑋𝑋 (𝑓𝑓) ≥ 0
Property 4
The PSD of a real-valued random process is an even, real function:
𝑆𝑆𝑋𝑋 (−𝑓𝑓) = 𝑆𝑆𝑋𝑋 (𝑓𝑓)
From Section 2.6.4 we had that for a sinusoid with random phase:
R_X(τ) = (A²/2)cos(ω_c τ) = (A²/2)cos(2πf_c τ)
and the PSD is given as:
S_X(f) = ℑ{(A²/2)cos(2πf_c τ)} = (A²/4)δ(f − f_c) + (A²/4)δ(f + f_c)
using either Appendix B of [2], Appendix C.5 from [5], or Table A6.3 from Appendix A of [3]. From Figure 2-14 it is evident that spectrally all the energy is concentrated at the frequency f_c, which is to be expected. [NOTE: δ(f), and in the time domain δ(t), is the impulse or Dirac delta function]
Figure 2-14 The PSD of a sinusoid with random phase ([3], Figure 1.10, pg. 48)
Figure 2-15 The PSD of the random binary wave ([3], Figure 1.11, pg. 49)
NOTE: sinc(x) = sin(πx)/(πx) (where sinc(0) = 1) is an important function that you will see again in the signals and systems and communication systems units.
Figure 2-16 The PSD of the random telegraph signal ([1], Figure 10.1, pg. 580)
When we speak of the energy or power of a signal it is typically assumed that the signal is a
voltage or current source feeding a 1 ohm resistor. Thus the power and energy are direct
functions of the signal amplitude (voltage).
The energy content of a signal is defined by:
E_x = ∫_{−∞}^{∞} |x(t)|² dt
and the power content of a signal is defined by:
P_x = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} |x(t)|² dt
Energy-Type Signals
A signal is called energy-type if E_x < ∞, and it can be shown that in such a case P_x = 0. Typically a finite-duration signal is an energy-type signal. Using Parseval’s theorem we see that (where X(f) = ℑ{x(t)} is the Fourier transform of x(t)):
E_x = ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df
where G_x(f) = |X(f)|² is the energy spectral density (or energy spectrum) of the signal x(t), expressed in Joules per Hertz.
Power-Type Signals
A signal is called power-type if 0 < 𝑃𝑃𝑥𝑥 < ∞ and it can be shown that in such a case 𝐸𝐸𝑥𝑥 = ∞.
Typically infinite-duration signals are power-type signals.
NOTE: A signal cannot be both energy-type and power-type. Most signals of interest are
one type or the other.
Define the “autocorrelation” function of a real-valued, power-type signal as:
R_x(τ) = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} x(t + τ)x(t) dt
from which we can state:
P_x = R_x(0) = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} |x(t)|² dt
Define the power-spectral density or power spectrum of the signal as:
S_x(f) = ℑ{R_x(τ)} = ∫_{−∞}^{∞} R_x(τ)e^{−j2πfτ} dτ = lim_{T→∞} (1/T)|X_T(f)|²
where:
X_T(f) = ℑ{x(t)rect(t/T)} = ℑ_T{x(t)} = ∫_{−T/2}^{T/2} x(t)e^{−j2πft} dt
Most practical infinite-duration, deterministic signals are periodic. For periodic signals we have the following relations:
R_x(τ) = (1/T₀)∫_{−T₀/2}^{T₀/2} x(t + τ)x(t) dt = ℑ⁻¹{S_x(f)} = Σ_{n=−∞}^{∞} |xₙ|² e^{j2πnτ/T₀}
S_x(f) = Σ_{n=−∞}^{∞} |xₙ|² δ(f − n/T₀)
P_x = Σ_{n=−∞}^{∞} |xₙ|²
where xₙ are the Fourier series co-efficients of x(t).
NOTE: For periodic signals the power spectrum is given by the magnitude-squares of the sequence of Fourier series co-efficients, i.e. a discrete-frequency or “sampled” spectrum.
For random signals we define the energy and power of the signal as the following expectations:
E_X = E[∫_{−∞}^{∞} X²(t) dt] = ∫_{−∞}^{∞} E[X²(t)] dt = ∫_{−∞}^{∞} R_XX(t, t) dt
P_X = E[lim_{T→∞} (1/T)∫_{−T/2}^{T/2} X²(t) dt] = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} E[X²(t)] dt = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} R_XX(t, t) dt
For WSS processes we have that R_XX(t, t) = R_X(0) is independent of time t and hence:
P_X = R_X(0)
E_X = ∫_{−∞}^{∞} R_X(0) dt = ∞
and hence WSS random signals are power-type signals.
Note that we also similarly define the power spectrum or power-spectral density as:
S_X(f) = ℑ{R_X(τ)}
which follows from noting that:
P_X = R_X(0) = ∫_{−∞}^{∞} S_X(f) df
where S_X(f) is usually measured in Watts/Hz.
Exercise 2-13
What is the difference in the frequency-domain representation of the following similar signals:
(a) A rectangular pulse
(b) A square wave
(c) A random binary wave
Answer:
(a) Energy-type deterministic signal, hence we have an energy spectrum given by G_x(f) = |X(f)|² ∝ sinc²(f) (continuous-frequency)
(b) Power-type deterministic signal, hence we have a power spectrum given by S_x(f) = Σ_{n=−∞}^{∞} |xₙ|² δ(f − n/T₀) ∝ sinc²(k) at the discrete harmonics (discrete-frequency)
(c) Power-type random signal, hence we have a power spectrum given by S_X(f) = ℑ{R_X(τ)} ∝ sinc²(f) (continuous-frequency)
NOTE: The deterministic, energy-type rectangular pulse has the same spectral characteristic as the power-type random binary wave! But one is an energy spectrum, the other a power spectrum.
We now reconsider the relations established from Section 2.8 for the output of a LTI system
with a random WSS signal as input but in the frequency domain. To do this we simply take
the Fourier transform of the quantities:
𝑆𝑆(𝑓𝑓) = ℑ{𝑅𝑅(τ)}, 𝐻𝐻(𝑓𝑓) = ℑ{ℎ(τ)}, 𝐻𝐻∗ (𝑓𝑓) = ℑ{ℎ(−τ)}
and using the results of Section 2.3 obtain the following key result:
𝑆𝑆𝑌𝑌 (𝑓𝑓) = ℑ{𝑅𝑅𝑌𝑌 (τ)} = ℑ{𝑅𝑅𝑋𝑋 (τ) ∗ ℎ(−τ) ∗ ℎ(τ)} = 𝑆𝑆𝑋𝑋 (𝑓𝑓)𝐻𝐻∗ (𝑓𝑓)𝐻𝐻(𝑓𝑓) = 𝑆𝑆𝑋𝑋 (𝑓𝑓)|𝐻𝐻(𝑓𝑓)|2
That is:
𝑆𝑆𝑌𝑌 (𝑓𝑓) = |𝐻𝐻(𝑓𝑓)|2 𝑆𝑆𝑋𝑋 (𝑓𝑓)
Equation 2-7
and also:
𝑆𝑆𝑌𝑌𝑌𝑌 (𝑓𝑓) = 𝐻𝐻(𝑓𝑓)𝑆𝑆𝑋𝑋 (𝑓𝑓)
𝑆𝑆𝑋𝑋𝑋𝑋 (𝑓𝑓) = 𝐻𝐻∗ (𝑓𝑓)𝑆𝑆𝑋𝑋 (𝑓𝑓)
where 𝑆𝑆𝑋𝑋𝑋𝑋 (𝑓𝑓) and 𝑆𝑆𝑌𝑌𝑌𝑌 (𝑓𝑓) are the cross-spectral densities and 𝑆𝑆𝑌𝑌 (𝑓𝑓) is the output power-
spectral density.
For discrete-time systems, using the z-transform representation:
S_Y(z) = ZT{R_Y(k)} = ZT{R_X(k) ∗ h(−k) ∗ h(k)} = H(z) H(z^{−1}) S_X(z)
S_YX(z) = H(z) S_X(z)
S_XY(z) = H(z^{−1}) S_X(z)
where ZT{.} is the z-transform, H(z^{−1}) = ZT{h(−k)} and S_X(F) = S_X(z)|_{z=e^{j2πF}}
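The key result S_Y(f) = |H(f)|² S_X(f) can be verified numerically. The following Python/SciPy sketch (the first-order system H(z) = 1/(1 − 0.9z^{−1}) and the Welch estimator are illustrative choices, not from the notes) filters unit-variance white noise and compares the estimated output PSD with the theoretical |H|²:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
w = rng.standard_normal(200_000)    # unit-variance white noise: S_X(f) = 1 (two-sided, fs = 1)

# Illustrative LTI system H(z) = 1 / (1 - 0.9 z^-1)
b, a = [1.0], [1.0, -0.9]
y = signal.lfilter(b, a, w)

# Two-sided Welch PSD estimate of the output vs the theory S_Y = |H|^2 * S_X
f, S_est = signal.welch(y, fs=1.0, nperseg=4096, return_onesided=False)
_, H = signal.freqz(b, a, worN=2 * np.pi * f)
print(np.mean(S_est / np.abs(H) ** 2))   # ~1.0, confirming S_Y(f) = |H(f)|^2 S_X(f)
```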
Following the introduction of the Gaussian random process from Section 1.4.1 we note /
summarise the following important properties:
Property 1
For Gaussian processes knowledge of the mean and autocorrelation (𝑚𝑚𝑋𝑋 (𝑡𝑡) and 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ))
gives a complete statistical description of the process.
Property 2
If the Gaussian random process 𝑋𝑋(𝑡𝑡) is passed through an LTI system, then the output process
𝑌𝑌(𝑡𝑡) will also be a Gaussian process.
Property 3
If a Gaussian process is WSS then it is also strictly stationary.
Property 4
If random variables obtained by sampling a Gaussian random process
{𝑋𝑋(𝑡𝑡1 ), 𝑋𝑋(𝑡𝑡2 ), … , 𝑋𝑋(𝑡𝑡𝑘𝑘 )}are uncorrelated, then they are also independent.
Property 5
A sufficient condition for the ergodicity of the stationary zero-mean Gaussian process 𝑋𝑋(𝑡𝑡)
∞
is that ∫−∞ �𝑅𝑅𝑋𝑋 (τ)� 𝑑𝑑τ < ∞
The term “white noise” is used to denote the zero-mean random process, 𝑊𝑊(𝑡𝑡), in which all
frequency components appear with equal power. i.e. the power spectral density (the power
spectrum of the signal) is constant or “flat”, typically designated as:
S_W(f) = N_0/2
and by taking the inverse Fourier transform we have:
R_W(τ) = (N_0/2) δ(τ)
which implies that samples from the white noise process are uncorrelated.
The N_0/2 (or N_0) is termed the noise power (spectral) density. The ½ factor is needed to
account for the dual contribution of the negative frequencies (see Example 2-14).
Figure 2-17 White noise (a) power spectral density, (b) autocorrelation
([3], Figure 1.16, pg. 61)
The “white noise” process provides a simple mathematical approximation to thermal noise
which is a pervasive source of interference in all communications and electronics systems.
From quantum mechanical analysis one can derive the power spectrum of thermal noise as
shown in Figure 2-18.
Figure 2-18 Power spectrum of thermal noise ([4], Figure 4.20, pg. 189)
We make the following comments about the “white noise” process 𝑊𝑊(𝑡𝑡) typically analysed in
the electrical engineering context:
• Samples from the noise process are uncorrelated with each other.
• The noise process is zero-mean and is at least wide-sense stationary; it can be shown to
also be ergodic.
In communications engineering the noise process is used to model the interference experienced
by a signal transmitted through a communications channel (fibre optic cable, telephone cable,
satellite, over-the-air, etc.). The white noise process is the 𝑁𝑁(𝑡𝑡) in the signal plus noise
described previously:
𝑌𝑌(𝑡𝑡) = 𝑋𝑋(𝑡𝑡) + 𝑁𝑁(𝑡𝑡)
In communications engineering white noise in the “signal plus noise” context is termed an
AWGN (Additive White Gaussian Noise) process. Also standard notation for the AWGN
process can be 𝑊𝑊(𝑡𝑡), 𝑁𝑁(𝑡𝑡), or 𝑉𝑉(𝑡𝑡) so students should expect to see different notation being
adopted.
Although the white noise process has, in theory, infinite power in practical electronic
systems the noise process will be bandwidth limited (bandlimited) and hence possess finite
power. Indeed if one considers practical white noise one can make the following statements:
• In nature the white noise process doesn’t really exist and only approximates the true
underlying physical process (e.g. electrons in thermal noise, etc.) which is not truly
flat and will have finite power.
• In electrical engineering circuits the noise will be bandwidth limited and hence it is
safe to assume the ensuing process is a bandlimited or filtered white noise.
Example 2-14
Question: White noise, W(t), with power spectral density N_0/2 is applied to an ideal low-pass
filter of bandwidth B. (a) Find the power spectral density of the filtered noise, N(t). (b) Given
the measured output noise power, P, find N_0. (c) Find the autocorrelation function of N(t).
Answer:
(a) Ideal low-pass filter transfer function: H(f) = 1 for |f| < B, and 0 otherwise; hence
S_N(f) = S_W(f)|H(f)|² = N_0/2 for |f| < B, and 0 otherwise; a sketch is shown in Figure 2-19(a).
Figure 2-19 Low-pass filtered white noise (a) power spectral density (b) autocorrelation
function ([2], Figure 1.17, pg. 62)
(b) P = ∫_{−B}^{B} S_N(f) df = N_0 B, hence N_0 = P/B [Watts/Hz]
(c) From Fourier transform tables (Appendix B of [1]) we have that
R_N(τ) = (N_0/2)·2B·sin(2πBτ)/(2πBτ) = N_0 B sinc(2Bτ); a sketch is shown in Figure 2-19(b).
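A numeric sketch of result (c) follows (Python/SciPy; the bandwidth B = 0.1 cycles/sample and the long FIR approximation of the ideal low-pass filter are assumptions for illustration). It estimates the autocorrelation of the filtered noise and compares it with N_0 B sinc(2Bτ):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
B = 0.1                                  # assumed bandwidth (cycles/sample, fs = 1)
w = rng.standard_normal(500_000)         # white noise, sigma^2 = 1, so N0 = 2 (S_W = N0/2 = 1)

# Approximate the ideal low-pass filter with a long linear-phase FIR
h = signal.firwin(2001, B, fs=1.0)
n = signal.lfilter(h, 1.0, w)

# Estimated autocorrelation vs R_N(tau) = N0*B*sinc(2*B*tau)
lags = np.arange(31)
r_est = np.array([np.dot(n[: n.size - k], n[k:]) / (n.size - k) for k in lags])
r_theory = 2.0 * B * np.sinc(2 * B * lags)   # np.sinc(x) = sin(pi x)/(pi x)
print(np.max(np.abs(r_est - r_theory)))      # small; r_est[0] ~ N0*B = 0.2
```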
Note that the discrete-time white noise process, W_n, is a zero-mean IID process. Taking
z-transforms (for one realisation of the random process), where W(z) = ZT{W_n} is the input
and Y(z) = ZT{Y_n} is the output, the system transfer function is H(z) = Y(z)/W(z). Since W_n
is WSS, and Y_n is the output process when applying W_n to the LTI system with transfer
function H(z), the AR(p), MA(q) and ARMA(p,q) processes are WSS with PSD given by:
S_Y(z) = H(z) H(z^{−1}) S_W(z) = H(z) H(z^{−1}) σ²
assuming the input white noise process has variance σ².
Finally in digital communications systems a more useful measure is the normalised “SNR
per bit” defined as:
SNR per bit:
SNR_bit = E_b/N_0 = (transmitted energy per bit)/(noise power spectral density)
Note that if the data rate is R_b then the bit duration is T_b = 1/R_b and hence
P_signal = E_b/T_b = E_b R_b, and over a system bandwidth of B we also have P_noise = N_0 B, hence:
SNR = P_signal/P_noise = (E_b R_b)/(N_0 B) = SNR_bit · (R_b/B)
Thus 𝑆𝑆𝑆𝑆𝑅𝑅𝑏𝑏𝑏𝑏𝑏𝑏 is a normalised “SNR” since it is independent of the data rate and system
bandwidth.
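A small arithmetic sketch of these relations (all numeric values below are hypothetical, chosen only to exercise the formulas):

```python
# Hypothetical system values
Eb = 1e-6        # transmitted energy per bit, J
N0 = 1e-8        # noise power spectral density, W/Hz
Rb = 1e6         # data rate, bits/s
B  = 2e6         # system bandwidth, Hz

snr_bit = Eb / N0                 # normalised SNR per bit (= 100, i.e. 20 dB)
snr = snr_bit * Rb / B            # conventional SNR = (Eb*Rb)/(N0*B) (= 50)
print(snr_bit, snr)
```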
Quantisers are devices that operate on a signal and represent the signal amplitude by a finite
number of levels or quantisation levels. Quantisation is needed to represent a discrete signal
(or each sample of a sampled analog signal) digitally in a finite number of bits (i.e. each sample
can then be stored on digital media as an 8-bit byte or 16-bit integer value).
It is common to use uniform quantisers with equal quantisation levels. There are various ways
a signal can be quantised as shown in Figure 2-20
Assume the quantiser has been designed to handle amplitudes in the range 𝑥𝑥min ≤ 𝑥𝑥[𝑛𝑛] ≤
𝑥𝑥max , then if 𝑥𝑥[𝑛𝑛] < 𝑥𝑥min or 𝑥𝑥[𝑛𝑛] > 𝑥𝑥max the signal amplitude will get clipped and either
saturate at the highest level or be zeroed (see Figure 2-21).
Assume each sample is to be stored as a B-bit word, then the number of quantisation levels is:
𝐿𝐿 = 2𝐵𝐵
Define 𝐷𝐷 = 𝑥𝑥max − 𝑥𝑥min as the dynamic range of the signal 𝑥𝑥[𝑛𝑛], then the quantisation step-
size, or resolution, Δ, that results when quantising a signal with the dynamic range 𝐷𝐷 in 𝐿𝐿
levels is:
Δ = D/L
Let x_Q[n] represent the quantised signal, then e[n] = x[n] − x_Q[n] is the quantisation error.
For quantisation with rounding the quantisation error can be assumed to be a uniformly
distributed random variable between −Δ/2 and Δ/2 (the pdf is f(e) = 1/Δ). The noise power is
then the variance of this uniform density:
P_e = E{e²[n]} = ∫_{−Δ/2}^{Δ/2} e² (1/Δ) de = Δ²/12
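The Δ²/12 noise power is easy to confirm empirically. A minimal sketch (Python/NumPy; the 8-bit word length and the ±1 dynamic range are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
B = 8                              # bits per sample (assumed)
x_min, x_max = -1.0, 1.0           # assumed dynamic range D = 2
L = 2 ** B                         # number of quantisation levels
delta = (x_max - x_min) / L        # step size Delta = D/L

x = rng.uniform(x_min, x_max, 1_000_000)
xq = np.round(x / delta) * delta   # quantisation with rounding
e = x - xq                         # quantisation error

print(np.var(e), delta**2 / 12)    # empirical vs theoretical noise power
```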
2.14 References
1. A. Leon-Garcia, “Probability and Random Processes for Electrical Engineering”, 3rd Ed.,
Addison Wesley, 2008.
2. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
3. S. Haykin, “Communication Systems”, 4th Ed., Wiley, 2001.
4. J.G. Proakis, M. Salehi, “Communication Systems Engineering”, 2nd Ed., Prentice-Hall,
2002.
5. S Haykin, B Van Veen, “Signals and Systems”, 2nd Ed., John Wiley, 2002
6. Ambardar, “Analog and Digital Signal Processing”, 2nd Ed., Brooks/Cole, 1999.
Block Diagram
The adaptive filter provides an estimate of the desired signal or system response.
Examples
• echo cancellation: remove undesirable echo from two-way communications system by
subtracting an estimate of the echo
• adaptive control: estimate parameters and/or state of the plant in order to design a
controller
• channel modeling: provide estimate of the unknown system response over the regions of
operation of interest
Problem solution: the adaptive filter forms an estimate of the interfering signal picked up by the
microphone, e(t), given the incoming loudspeaker signal, x(t), and subtracts this from the
outgoing microphone signal, y(t) = s(t) + e(t), so that only the wanted signal, s(t), is
transmitted.
Figure 3-2 Principle of acoustic echo cancellation using an adaptive echo canceler
(Figure 1.17 [1])
Solution caveats:
1. The echo is not just a simple, linear function of the loudspeaker signal. Room acoustics
and speaker and microphone transducer effects will modify the signal echo in a complex
way.
2. The room acoustics, and to a lesser extent transducer effects, change with time as the
talker moves, microphone/speaker are re-positioned, etc.
Block Diagram
The adaptive filter provides an estimate of the system response and applies its inverse.
Examples
• adaptive equalization: apply inverse of communication channel transfer function in
order to equalise or remove the unwanted effects or distortions arising from transmission
through the channel.
• blind deconvolution/separation: apply the inverse of the corrupting convolution or
mixing operations to the resultant output signal(s) in order to reconstitute the original
signal. The system convolution/mixing response is usually unknown and this makes the
problem “blind”.
• adaptive inverse control: estimate the inverse response of the plant in order to design
controllers in series rather than in feedback with the plant
Problem solution: compensate for ISI by using an adaptive filter that restores the received
pulse to the original shape by estimating and applying the inverse of the communication
channel response characteristics.
Figure 3-4 Channel equalizer with training and decision-directed modes of operation
(Figure 12.6 [1])
Solution caveats
1. The filter has to both “learn” the inverse of the particular channel and “track” its variation
with time
2. Requires knowledge of the correct symbol sequence (e.g. a training sequence) initially to
prime or reset the operation of the filter. In cases where this is not available then there is
the additional problem of “blind equalization”.
Block Diagram
Estimate a signal at time t = n₀ by using past and/or future values of the same signal
{x(t): n₁ ≤ t ≤ n₂}
• forward prediction (linear prediction) if n₀ > n₂
• backward prediction if n₀ < n₁
• smoothing/interpolation if n₁ < n₀ < n₂ (excluding n₀ itself!)
Examples
• adaptive predictive coding: form estimate of current value of signal based on previous
samples and store/transmit the error in the prediction rather than the signal itself
Problem solution: to reduce the quantisation noise the current sample is predicted based on
the previous samples and the error in the prediction is stored/transmitted together with the
predictor filter co-efficients. The dynamic range of these quantities should be smaller and
hence subject to reduced quantisation noise effects.
Figure 3-6 Predictive linear coding system, (a) coder, (b) decoder
(Figure 10.5 [1])
Solution caveats
1. As real-word sources are non-stationary the filter has to recalculate the predictor co-
efficients and retransmit these to the receiver, but this is done on a frame-by-frame rather
than sample-by-sample basis
Block Diagram
Use of multiple sensors that provide reference signals for the estimate and removal of
interference and noise from a primary signal.
Examples
• active noise control: provide an inverted estimate of the unwanted signal or noise and
remove it from the zone of interest by destructive wave interference
• array processing: collect signals from a group of spatially positioned sensors and
emphasise signals arriving from specific directions (i.e. adaptive beamforming) as used in
radar, direction finding, antenna steering, etc.
Problem solution: a reference signal, x(t) = G_x v(t), from the interfering environment is
used to provide an exact out-of-phase estimate of the interfering signal(s), y(t) = G_y v(t),
that is ŷ(t) = f(x(t)) ≡ G_y G_x^{−1} x(t), which when played out through a loudspeaker will
completely cancel the unwanted signal(s) in the listening zone (i.e. y(t) + ŷ(t) = 0).
Figure 3-8 Basic components of an active noise control system (Figure 1.21 [1])
Solution Caveats
1. The filter has to provide an exact out-of-phase estimate of the interfering signal in order
to cancel the unwanted signal, otherwise more noise is added!
2. The reference signal must be a filtered version of the interfering signal but must not
include any desired signals in the listening zone of interest, otherwise these will also be
removed.
3. The acoustic environment is unknown and highly time-varying and the filter must be able
to rapidly adapt to any changes.
3.5 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
In the following analysis we assume discrete-time random signals and adopt the following
notation:
• y(n) is the value of the signal of interest (or desired response) at time n
• x_k(n) is the set of M values (observations or data), for 1 ≤ k ≤ M, at time n, being either:
  – the signal of the kth sensor at time n (array processing), or
  – x(n − k), the kth delay of the signal at time n (signal prediction, system inversion and identification)
Given the set of data, 𝑥𝑥𝑘𝑘 (𝑛𝑛), the signal estimation problem is to determine an estimate 𝑦𝑦�(𝑛𝑛),
of the desired response, 𝑦𝑦(𝑛𝑛) using:
ŷ(n) ≡ h{x_k(n), 1 ≤ k ≤ M}
where in the case of 𝑥𝑥𝑘𝑘 (𝑛𝑛) = 𝑥𝑥(𝑛𝑛 − 𝑘𝑘), the estimator takes the form of a discrete-time filter.
We want to find an optimum estimator that approximates the desired response as closely as
possible according to certain performance criterion, most commonly minimisation of the
error signal:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Design an estimator that provides an estimate 𝑦𝑦�(𝑛𝑛) of the desired response 𝑦𝑦(𝑛𝑛) using a
linear combination of the data 𝑥𝑥𝑘𝑘 (𝑛𝑛) for 1 ≤ k ≤ M, such that the MSE 𝐸𝐸{|𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)|2 } is
minimised. That is, our linear MSE estimator (dropping the time index n, i.e. 𝑦𝑦� = 𝑦𝑦�(𝑛𝑛), 𝑦𝑦 =
𝑦𝑦(𝑛𝑛), and 𝑥𝑥𝑘𝑘 = 𝑥𝑥𝑘𝑘 (𝑛𝑛)) is defined by:
ŷ = Σ_{k=1}^{M} c_k x_k = c^T x
where:
c = [c_1 c_2 ⋯ c_M]^T is the M × 1 coefficient vector, and
x = [x_1 x_2 ⋯ x_M]^T is the M × 1 data or observation vector.
We next rewrite the expression for P(c) in the form of a “perfect square” as follows:
P(c) = P_y − d^T R^{−1} d + (Rc − d)^T R^{−1} (Rc − d)
Exercise 4.1 Show This!
The necessary and sufficient conditions that determine the linear MMSE estimator, cO , are:
𝐑𝐑𝐜𝐜𝑂𝑂 = 𝐝𝐝
Equation 4-2
which can be written as the set of normal equations:
[ r_11  r_12  ⋯  r_1M ] [ c_1 ]   [ d_1 ]
[ r_21  r_22  ⋯  r_2M ] [ c_2 ] = [ d_2 ]
[  ⋮     ⋮    ⋱    ⋮  ] [  ⋮  ]   [  ⋮  ]
[ r_M1  r_M2  ⋯  r_MM ] [ c_M ]   [ d_M ]
Equation 4-3
We note:
1. We assume that x and y are zero mean. If not, we replace x and y by x − E(x) and
y − E(y) respectively.
2. If x and y are uncorrelated then d = 0 and P_O = P_y, which means that no linear
estimator can be found to reduce the MSE, since the desired response y is
uncorrelated with the data vector x. This is the worst result.
3. If y can be perfectly estimated from x then ŷ = y and P_O = 0. This is the best result.
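As a sketch of how the normal equations are used in practice (Python/NumPy; the M = 3 statistics R, d and P_y below are illustrative values, not taken from the notes), we solve Equation 4-2, compute the minimum MSE, and confirm the orthogonality condition of Equation 4-4:

```python
import numpy as np

# Assumed (illustrative) second-order statistics for an M = 3 problem
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])   # R = E{x x^T}
d = np.array([0.8, 0.4, 0.1])    # d = E{x y}
Py = 1.0                         # P_y = E{y^2}

c_o = np.linalg.solve(R, d)      # normal equations R c_O = d (Equation 4-2)
P_o = Py - d @ c_o               # minimum MSE P_O = P_y - d^T c_O

# Orthogonality check: E{x e_O} = d - R c_O should be ~0 (Equation 4-4)
print(c_o, P_o, d - R @ c_o)
```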
The normal equations can be derived more directly by making use of the principle of
orthogonality as follows. We consider the correlation between the data vector x and the MSE
error, e_O:
E{x e_O} = E{x(y − x^T c_O)} = E{xy} − E{xx^T} c_O = d − R c_O = 0
that is:
E{x e_O} = 0
Equation 4-4
or:
E{x_i e_O} = 0 for 1 ≤ i ≤ M
Orthogonality Principle
The estimation error, 𝑒𝑒0 , is orthogonal to the data, x, used for the estimation, i.e. 𝐸𝐸{𝐱𝐱𝑒𝑒𝑂𝑂 } = 0
A convenient way to view this is to consider the abstract vector space where a vector is a
zero-mean random variable with the following associations:
‖x‖² = ⟨x, x⟩ ≡ E{|x|²}, for the (squared) length of the vector
‖x‖·‖y‖ cos θ_xy = ⟨x, y⟩ ≡ E{xy}, where θ_xy is the angle between x and y
We illustrate this principle for the case of M = 2 in Figure 4-1 where, for the given y and the
space spanned by {x_i: 1 ≤ i ≤ M}, the minimum error occurs when x_i ⊥ e_O for all 1 ≤ i ≤ M,
and hence E{x_i e_O} = 0 for 1 ≤ i ≤ M is the condition for the optimum estimator.
Figure 4-1 Illustration of orthogonality principle (Figure 6.9 [1]). Note that x_i ⊥ e_O,
1 ≤ i ≤ M, but it is not true that x_i ⊥ x_j unless the data itself is uncorrelated.
The MMSE is P_O = E{|e_O|²} = E{e_O(y − ŷ_O)} = E{y e_O} = E{y(y − x^T c_O)} = P_y − d^T c_O
(where we use E{e_O ŷ_O} = 0, since the estimation error is also orthogonal to the estimate).
A numerical method for solution of the normal equations (Equation 4-2) and computation of
the minimum error is the lower-diagonal-upper decomposition, or LDLᵀ decomposition for
short, where the correlation matrix is written as R = LDLᵀ.
Full details of the LDLᵀ solution method can be found in [1, pgs 274-278].
The optimum FIR filter (also known as the Wiener filter) forms an estimate ŷ(n) of the desired
response, y(n), by using finite samples from a related input signal, x(n). That is:
ŷ(n) = Σ_{k=0}^{M−1} c_k(n) x(n − k) = c^T(n) x(n)
Equation 4-5
We form the LMMSE estimate by solving the corresponding set of normal equations for the
filter coefficients, cO (n), at each time n:
𝐑𝐑(𝑛𝑛)𝐜𝐜𝑂𝑂 (𝑛𝑛) = 𝐝𝐝(𝑛𝑛)
or in matrix form:
[ r_{0,0}(n)    r_{0,1}(n)    ⋯  r_{0,M−1}(n)   ] [ c_0(n)     ]   [ d_0(n)     ]
[ r_{1,0}(n)    r_{1,1}(n)    ⋯  r_{1,M−1}(n)   ] [ c_1(n)     ] = [ d_1(n)     ]
[    ⋮             ⋮          ⋱       ⋮         ] [    ⋮       ]   [    ⋮       ]
[ r_{M−1,0}(n)  r_{M−1,1}(n)  ⋯  r_{M−1,M−1}(n) ] [ c_{M−1}(n) ]   [ d_{M−1}(n) ]
where R(n) = E{x(n) x^T(n)}, r_ij(n) = E{x(n−i) x(n−j)}, d(n) = E{x(n) y(n)} and
d_i(n) = E{x(n−i) y(n)}, and then computing the estimate ŷ(n) using a discrete-time filter
structure based upon Equation 4-5, primed by the c_O(n), as shown in Figure 4-2.
Figure 4-2 Design and implementation of a time-varying optimum FIR filter (Figure 6.11
[1])
• The FIR filter coefficients have to be estimated and loaded into the filter at each sample
time n.
• Although the desired response y(n) is not available, the known relationship between y(n)
and x(n) can be used to derive or estimate the cross-correlation vector
d(n) = E{x(n) y(n)}.
Example 4–1
The most common use of optimum filtering is for estimating a signal in noise, that is:
x(n) = y(n) + v(n)
where v(n) is a noise signal which is assumed uncorrelated with y(n). Thus:
d_k(n) = E{x(n−k) y(n)} = E{(y(n−k) + v(n−k)) y(n)} = E{y(n−k) y(n)}
requiring only knowledge of the second-order statistics of the desired response y(n).
Most useful FIR filters are designed when the input and desired response stochastic
processes are jointly wide-sense stationary (WSS), in which case the correlation matrix, R(n)
= R, and cross-correlation vector, d(n) = d, no longer depend explicitly on the time-index n.
We form the LMMSE estimate by solving the corresponding set of normal equations for the
filter coefficients, 𝐜𝐜𝑂𝑂 = 𝐡𝐡𝑂𝑂 :
𝐑𝐑𝐡𝐡𝑂𝑂 = 𝐝𝐝
or in matrix form:
[ r_x(0)    r_x(1)    ⋯  r_x(M−1) ] [ h_O(0)   ]   [ r_yx(0)   ]
[ r_x(1)    r_x(0)    ⋯  r_x(M−2) ] [ h_O(1)   ] = [ r_yx(1)   ]
[   ⋮         ⋮       ⋱      ⋮    ] [    ⋮     ]   [    ⋮      ]
[ r_x(M−1)  r_x(M−2)  ⋯  r_x(0)   ] [ h_O(M−1) ]   [ r_yx(M−1) ]
Equation 4-6
That is, a time-invariant optimum FIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=0}^{M−1} h_O(k) x(n − k)
with MMSE power:
P_O = r_y(0) − Σ_{k=0}^{M−1} h_O(k) r_yx(k) = r_y(0) − h_O^T d
Equation 4-9
Example 4–2
Problem: Consider the harmonic random process:
𝑦𝑦(𝑛𝑛) = 𝐴𝐴cos(𝜔𝜔𝑂𝑂 𝑛𝑛 + 𝜙𝜙)
with fixed, but unknown, amplitude and frequency and random uniformly distributed phase.
The process is corrupted by additive white Gaussian noise 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, 𝜎𝜎𝑣𝑣2 ) that is
uncorrelated with 𝑦𝑦(𝑛𝑛). The resulting signal 𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) is observable. It is
required to design an optimum FIR filter with input 𝑥𝑥(𝑛𝑛) to remove the noise and produce an
estimate of the desired response 𝑦𝑦(𝑛𝑛)
Solution: To design the optimum filter the second-order statistics, r_x(l) and r_yx(l), are derived
as follows. We note that since v(n) and y(n) are uncorrelated we have:
r_x(l) = r_y(l) + r_v(l) = (A²/2) cos(ω_O l) + σ_v² δ(l)
r_yx(l) = E{y(n)[y(n−l) + v(n−l)]} = r_y(l) = (A²/2) cos(ω_O l)
and by solving for h_O(k) in Equation 4-6 we form the optimum FIR filter given by:
ŷ(n) = Σ_{k=0}^{M−1} h_O(k) x(n − k)
Exercise 4.2: Show that r_y(l) = E{y(n) y(n−l)} = (A²/2) cos(ω_O l)
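A numeric sketch of this design follows (Python/SciPy; the values A = 1, ω_O = 0.5, σ_v² = 0.5 and M = 20 are assumed for illustration). It builds R and d from the derived second-order statistics, solves Equation 4-6, and checks the theoretical MMSE of Equation 4-9 against one simulated realisation:

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed numeric values: amplitude, frequency, noise variance, filter order
A, w0, sv2, M = 1.0, 0.5, 0.5, 20

lags = np.arange(M)
ry = 0.5 * A**2 * np.cos(w0 * lags)     # r_y(l) = (A^2/2) cos(w0 l) = r_yx(l)
rx = ry.copy()
rx[0] += sv2                            # r_x(l) = r_y(l) + sigma_v^2 delta(l)

h_o = np.linalg.solve(toeplitz(rx), ry) # Equation 4-6 (Toeplitz normal equations)
P_o = ry[0] - ry @ h_o                  # Equation 4-9: P_O = r_y(0) - h_O^T d

# Empirical check on one realisation of x(n) = y(n) + v(n)
rng = np.random.default_rng(3)
n = np.arange(200_000)
y = A * np.cos(w0 * n + rng.uniform(0, 2 * np.pi))
x = y + np.sqrt(sv2) * rng.standard_normal(n.size)
y_hat = np.convolve(x, h_o)[: n.size]
print(P_o, np.mean((y[M:] - y_hat[M:]) ** 2))   # theoretical vs empirical MMSE
```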
We are given a set of samples x(n), x(n−1), …, x(n−M) of a stochastic process of
interest and wish to estimate the value of the sample x(n−i), for some i ∈ [0..M], using a
linear combination of the remaining (known) samples:
Figure 4-3 Illustration showing the samples used in linear signal estimation (Figure 6.16[1])
e^{(i)}(n) = Σ_{k=0}^{i−1} c_k(n) x(n−k) + x(n−i) + Σ_{k=i+1}^{M} c_k(n) x(n−k)
          = c_i^T(n) x_i(n) + x(n−i) = x_i^T(n) c_i(n) + x(n−i)
Equation 4-10
where:
x_i(n) = [x(n) x(n−1) … x(n−(i−1)) x(n−(i+1)) … x(n−M)]^T
c_i(n) = [c_0(n) c_1(n) … c_{i−1}(n) c_{i+1}(n) … c_M(n)]^T
If i = L and M = 2L then we have an Mth order symmetric linear smoother (SLS) that
produces an estimate of the middle sample, x(n−k) for k = L, by using the L past
(L+1 ≤ k ≤ 2L) and L future (0 ≤ k ≤ L−1) samples. The smoother co-efficients c_L(n) are the
solution of the corresponding normal equations (Equation 4-11):
R_L(n) c_L(n) = −r_L(n)
where R_L(n) = E{x_L(n) x_L^T(n)} and r_L(n) = E{x_L(n) x(n−L)}, and the MMSE power
(Equation 4-12) is:
P_O^{(L)}(n) = P_x(n−L) + r_L^T(n) c_L(n)
A one-step Mth order forward linear prediction (FLP) involves the estimation of the sample,
𝑥𝑥(𝑛𝑛), by using the M past samples, 𝑥𝑥(𝑛𝑛 − 1), 𝑥𝑥(𝑛𝑛 − 2), … , 𝑥𝑥(𝑛𝑛 − 𝑀𝑀). From Equation
4-10 this corresponds to the case of i = 0, and adopting the following change in notation for
the special case of FLP:
x(n−1) = [x(n−1) x(n−2) … x(n−M)]^T ≡ x_0(n)
a(n) = [a_1(n) a_2(n) … a_M(n)]^T ≡ c_0(n)
r^f(n) = E{x(n−1) x(n)} ≡ r_0(n)
R(n−1) = E{x(n−1) x^T(n−1)} = R_0(n)
the forward predictor co-efficients a(n) are the solution of Equation 4-11, that is:
R(n−1) a(n) = −r^f(n)
and the MMSE power from Equation 4-12 is given by:
P_O^f(n) = P_x(n) + r^{fT}(n) a(n)
where P_x(n) = E{x(n) x(n)}. An estimate of x(n) is then provided by:
x̂(n) = −Σ_{k=1}^{M} a_k(n) x(n−k)
and the prediction error is:
e^f(n) = e^{(0)}(n) = x(n) + Σ_{k=1}^{M} a_k(n) x(n−k) = Σ_{k=0}^{M} a_k(n) x(n−k)
• It is standard notational practice to indicate the order of analysis by a_k(n) ≡ a_k^{(M)}(n),
which identifies the kth co-efficient of the Mth order FLP.
• Since a_1(n) is the first co-efficient we have by definition a_0^{(M)}(n) = 1.
A one-step Mth order backward linear prediction (BLP) involves the estimation of the
sample, x(n−M), by using the M future samples, x(n), x(n−1), …, x(n−(M−1)).
From Equation 4-10 this corresponds to the case of i = M, and adopting the following change
in notation for the special case of BLP (by analogy with the FLP):
x(n) = [x(n) x(n−1) … x(n−(M−1))]^T ≡ x_M(n)
b(n) = [b_0(n) b_1(n) … b_{M−1}(n)]^T ≡ c_M(n)
r^b(n) = E{x(n) x(n−M)} ≡ r_M(n)
R(n) = E{x(n) x^T(n)} = R_M(n)
the backward predictor co-efficients b(n) are the solution of Equation 4-11, that is:
R(n) b(n) = −r^b(n)
and the MMSE power from Equation 4-12 is given by:
P_O^b(n) = P_x(n−M) + r^{bT}(n) b(n)
where P_x(n−M) = E{x(n−M) x(n−M)}. An estimate of x(n−M) is then provided by:
x̂(n−M) = −Σ_{k=0}^{M−1} b_k(n) x(n−k)
and the prediction error is:
e^b(n) = e^{(M)}(n) = Σ_{k=0}^{M−1} b_k(n) x(n−k) + x(n−M) = Σ_{k=0}^{M} b_k(n) x(n−k)
• It is standard notational practice to indicate the order of analysis by b_k(n) ≡ b_k^{(M)}(n),
which identifies the kth co-efficient of the Mth order BLP.
• Since b_{M−1}(n) is the last co-efficient we have by definition b_M^{(M)}(n) = 1.
If the process x(n) is WSS, then the elements of the correlation matrix R(n) and correlation
vector r(n) no longer depend explicitly on the time-index n but only on the lag. That
is, r(l) = E{x(n) x(n−l)} does not depend on n, only on l. Define:
r = [r(1) r(2) … r(M)]^T
It is evident that:
r^f = E{x(n−1) x(n)} = r
r^b = E{x(n) x(n−M)} = Jr = [r(M) r(M−1) … r(1)]^T
where J is the exchange matrix:
[ 0 0 ⋯ 1 ]
[ ⋮ ⋮ ⋰ ⋮ ]
[ 0 1 ⋯ 0 ]
[ 1 0 ⋯ 0 ]
which simply reverses the order of the vector elements of r.
Note that J^T = J. Since R is a symmetric Toeplitz matrix it can be shown that:
RJ = JR
and hence we have that for WSS signals:
b_O = J a_O
P_O^f = P_O^b
That is, the BLP coefficient vector is the reverse of the FLP coefficient vector and the MMSE
powers are identical. In other words the FLP and BLP co-efficients are duals of one
another. This duality property only holds for stationary processes.
Exercise 4.3 Show that RJ = JR and hence show that b_O = J a_O and P_O^f = P_O^b
4.5.5 Properties
Property 1
If the signal x(n) is stationary, then the symmetric, linear smoother has linear phase.
Proof: Using the fact that for stationary signals RJ = JR, we can show that for the
symmetric linear smoother:
r = Jr which implies c = Jc
and hence the smoother impulse response, c, has even symmetry and so has, by definition, linear
phase.
Exercise 4.4 Show that c = Jc and hence that the smoother has linear phase.
Property 2
If the signal x(n) is stationary, then the FLP error filter {1, a_1, a_2, …, a_M} is minimum-
phase and the BLP error filter {b_0, b_1, …, b_{M−1}, 1} is maximum-phase.
Proof: The system function of the FLP error filter, A(z) = 1 + Σ_{k=1}^{M} a_k z^{−k}, can be shown to
have all zeros inside the unit circle and hence is, by definition, minimum-phase. Since
b = Ja, we have that B(z) = z^{−M} A(1/z), which implies that all zeros are outside the unit
circle and hence the BLP error filter is maximum-phase.
Exercise 4.5 Show that b = Ja implies B(z) = z^{−M} A(1/z)
Example 4–3
Problem: A random sequence x(n) is generated by passing a white Gaussian noise process
𝑤𝑤(𝑛𝑛)~𝑁𝑁(0,1) through the filter:
1
𝑥𝑥(𝑛𝑛) = 𝑤𝑤(𝑛𝑛) + 𝑤𝑤(𝑛𝑛 − 1)
2
Determine the second-order FLP, BLP and SLS.
Solution: Since x(n) is a WSS signal, for M = 2 we need to calculate r(0), r(1) and r(2) to
fully specify the filters. The complex power spectrum, R_x(z), where R_w(z) = 1, is:
R_x(z) = H(z) H(z^{−1}) R_w(z) = (1 + ½z^{−1})(1 + ½z)(1) = ½z + 5/4 + ½z^{−1} ≡ Σ_{k=−1}^{1} r(k) z^{−k}
and thus r(0) = 5/4, r(1) = 1/2, r(2) = 0. From Equation 4-13 for the FLP we have:
[5/4 1/2; 1/2 5/4][a_1; a_2] = −[1/2; 0]  →  [a_1; a_2] = [−0.476; 0.190]
and:
P_O^f = 5/4 + [1/2 0][−0.476; 0.190] = 1.012
and from Equation 4-14 for the BLP we have:
[5/4 1/2; 1/2 5/4][b_0; b_1] = −[0; 1/2]  →  [b_0; b_1] = [0.190; −0.476]
and:
P_O^b = 5/4 + [0 1/2][0.190; −0.476] = 1.012
For the SLS, L = 1 and M = 2, and thus:
x_1(n) = [x(n) x(n−2)]^T
R_1(n) = E{x_1(n) x_1^T(n)} = [r(0) r(2); r(2) r(0)]
r_1(n) = E{x_1(n) x(n−1)} = [r(1); r(1)]
Hence we have:
[5/4 0; 0 5/4][c_0; c_2] = −[1/2; 1/2]  →  [c_0; c_2] = [−0.4; −0.4]
and from Equation 4-12:
P_O^s = 5/4 + [1/2 1/2][−0.4; −0.4] = 0.85
In summary:
Forward Linear Predictor: {a_0, a_1, a_2} → {1, −0.476, 0.190}, P_O^f = 1.012
Backward Linear Predictor: {b_0, b_1, b_2} → {0.190, −0.476, 1}, P_O^b = 1.012
Symmetric Linear Smoother: {c_0, c_1, c_2} → {−0.4, 1, −0.4}, P_O^s = 0.85
We note:
1. The BLP co-efficient vector is the reverse of the FLP co-efficient vector.
2. The BLP and FLP MMSE powers are identical.
3. The SLS co-efficients are symmetric.
4. The SLS MMSE power is less than either the FLP or BLP and hence the SLS performs
better.
5. It can be shown that the FLP is minimum-phase, the BLP is maximum-phase and the SLS
has linear phase.
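The numbers in Example 4-3 are easy to reproduce. A minimal Python/NumPy check:

```python
import numpy as np

# Example 4-3 statistics: (r(0), r(1), r(2)) = (5/4, 1/2, 0)
r0, r1, r2 = 1.25, 0.5, 0.0
R = np.array([[r0, r1], [r1, r0]])

a = np.linalg.solve(R, -np.array([r1, r2]))    # FLP: R a = -r^f
b = np.linalg.solve(R, -np.array([r2, r1]))    # BLP: reversed cross-correlation vector
Rs = np.array([[r0, r2], [r2, r0]])
c = np.linalg.solve(Rs, -np.array([r1, r1]))   # SLS: R_1 c = -r_1

Pf = r0 + np.array([r1, r2]) @ a
Pb = r0 + np.array([r2, r1]) @ b
Ps = r0 + np.array([r1, r1]) @ c
print(a, Pf)   # [-0.476  0.190], 1.012
print(b, Pb)   # [ 0.190 -0.476], 1.012
print(c, Ps)   # [-0.4 -0.4], 0.85
```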
NOTE: We will define the complex power spectrum or power spectral density (PSD) for the
discrete-time signal, x[n], as R_x(z) = ZT{r_x(k)}, where ZT is the z-transform. Previously
we would have used S_X(z) or S_X(F) when considering the theory of random signals.
For optimum IIR filters the presence of both zeros and poles in the filter transfer function,
𝐻𝐻(𝑧𝑧), implies an infinite impulse response sequence, ℎ𝑂𝑂 (𝑘𝑘), hence the term IIR filter. The
theory for nonstationary signals is complicated and beyond the scope of these notes. For
stationary signals, the optimum IIR filter equation, Wiener-Hopf equation, and expression
for the MMSE power are the same as Equation 4-7, Equation 4-8, and Equation 4-9
respectively with the exception that the summation depends on whether we are interested in a
noncausal or causal IIR filter.
The noncausal optimum IIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=−∞}^{∞} h_nc(k) x(n−k)
The causal optimum IIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=0}^{∞} h_c(k) x(n−k)
where the complexity of the solution depends on the range of m that is applicable.
From Chapter 2 we had the following results, restated using our new notational framework.
Let y(k) = x(k) ∗ h(k), and define R_y(z) = ZT{r_y(k)}, R_x(z) = ZT{r_x(k)}, H(z) = ZT{h(k)}; then:
R_xy(z) = H(z^{−1}) R_x(z),  R_yx(z) = H(z) R_x(z),  R_y(z) = H(z) H(z^{−1}) R_x(z)
Since the limits of the summation that apply to Equation 4-16 are −∞ < m < ∞, the
convolution property of the z-transform can be invoked to give:
H_nc(z) R_x(z) = R_yx(z)
thus:
H_nc(z) = R_yx(z) / R_x(z)
Equation 4-17
where H_nc(z) is the optimum IIR filter transfer function, R_x(z) = ZT[r_x(l)] =
Σ_{l=−∞}^{∞} r_x(l) z^{−l} is the power-spectral density (PSD) of x(n), and R_yx(z) = ZT[r_yx(l)] =
Σ_{l=−∞}^{∞} r_yx(l) z^{−l} is the cross-PSD between y(n) and x(n).
Example 4–4
Problem: Consider the problem of estimating a desired signal 𝑦𝑦(𝑛𝑛) that is corrupted by
additive noise, 𝑣𝑣(𝑛𝑛). The goal is to design the optimum IIR filter to extract 𝑦𝑦(𝑛𝑛) from the
noisy observations:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛)
Equation 4-18
given that 𝑦𝑦(𝑛𝑛) and 𝑣𝑣(𝑛𝑛) are uncorrelated signals with known autocorrelation sequences
r_y(l) = α^{|l|}, −1 < α < 1, and r_v(l) = σ_v² δ(l) respectively, where α = 4/5 and σ_v² = 1.
Solution for the noncausal IIR filter: The expressions for the autocorrelation 𝑟𝑟𝑥𝑥 (𝑙𝑙) of the
input signal 𝑥𝑥(𝑛𝑛) and the cross-correlation 𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙) between 𝑦𝑦(𝑛𝑛) and 𝑥𝑥(𝑛𝑛) are needed. From
Equation 4-18 and noting the 𝑦𝑦(𝑛𝑛) and 𝑣𝑣(𝑛𝑛) are uncorrelated:
𝑟𝑟𝑥𝑥 (𝑙𝑙) = 𝐸𝐸{𝑥𝑥(𝑛𝑛)𝑥𝑥(𝑛𝑛 − 𝑙𝑙)} = 𝑟𝑟𝑦𝑦 (𝑙𝑙) + 𝑟𝑟𝑣𝑣 (𝑙𝑙)
𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙) = 𝐸𝐸{𝑦𝑦(𝑛𝑛)𝑥𝑥(𝑛𝑛 − 𝑙𝑙)} = 𝑟𝑟𝑦𝑦 (𝑙𝑙)
Taking the z-transform of the above:
𝑅𝑅𝑥𝑥 (𝑧𝑧) = 𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧)
𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝑅𝑅𝑦𝑦 (𝑧𝑧)
Before deriving an exact expression of the noncausal IIR filter we can examine its behaviour
by noting its form. From Equation 4-17 :
H_nc(z) = R_yx(z)/R_x(z) = R_y(z)/(R_y(z) + R_v(z))
and for z = e^{jω} this shows that for those values of ω for which R_y(z) >> R_v(z), i.e. for high
SNR, |H_nc(e^{jω})| ≈ 1. Conversely, if R_y(z) << R_v(z), i.e. for low SNR, then
|H_nc(e^{jω})| ≈ 0. Thus, the optimum filter “passes” its input in bands with high SNR (where
the desired signal dominates) but attenuates in bands with low SNR (where the noise
dominates).
To obtain an exact expression for the noncausal IIR filter, exact expressions for 𝑅𝑅𝑥𝑥 (𝑧𝑧) =
𝛧𝛧𝛧𝛧[𝑟𝑟𝑥𝑥 (𝑙𝑙)] = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙) + 𝑟𝑟𝑣𝑣 (𝑙𝑙)] = 𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧) and 𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙)] = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙)] =
𝑅𝑅𝑦𝑦 (𝑧𝑧) are needed. Since 𝑟𝑟𝑣𝑣 (𝑙𝑙) = 𝜎𝜎𝑣𝑣2 𝛿𝛿(𝑙𝑙) then 𝑅𝑅𝑣𝑣 (𝑧𝑧) = 𝜎𝜎𝑣𝑣2 = 1.
The expression for 𝑅𝑅𝑦𝑦 (𝑧𝑧) = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙)] requires derivation and simplification from first
principles as follows:
r_y(l) = (4/5)^{|l|} = (4/5)^l u(l) + (4/5)^{−l} u(−l−1) = (4/5)^l u(l) + (5/4)^l u(−l−1)
Referring to a common z-transform pairs table (found in most good textbooks on discrete-
time signal processing):
a^n u(n) ↔ 1/(1 − a z^{−1}), |z| > |a|;  a^n u(−n−1) ↔ a^{−1}z/(1 − a^{−1}z), |z| < |a|
a^{|n|} ↔ (1 − a²)/((1 − a z^{−1})(1 − a z)), |a| < |z| < |a^{−1}|
thus:
R_y(z) = ZT[r_y(l)] = 1/(1 − (4/5)z^{−1}) + (4/5)z/(1 − (4/5)z), 4/5 < |z| < 5/4
and upon the appropriate algebraic simplification:
R_y(z) ≡ R_yx(z) = (3/5)² / ((1 − (4/5)z^{−1})(1 − (4/5)z)), 4/5 < |z| < 5/4
Equation 4-19
Finally the exact expression for R_x(z) = R_y(z) + R_v(z) in simplified form is needed:
R_x(z) = (3/5)²/((1 − (4/5)z^{−1})(1 − (4/5)z)) + 1
       = [(3/5)² + (1 − (4/5)z^{−1})(1 − (4/5)z)] / [(1 − (4/5)z^{−1})(1 − (4/5)z)]
       = (8/5)(1 − ½z^{−1})(1 − ½z) / ((1 − (4/5)z^{−1})(1 − (4/5)z))
Equation 4-20
Note: The simplified forms for 𝑅𝑅𝑦𝑦 (𝑧𝑧) and 𝑅𝑅𝑥𝑥 (𝑧𝑧) have been deliberately structured so that
𝑅𝑅𝑥𝑥 (𝑧𝑧) can be expressed as 𝜎𝜎𝑥𝑥2 𝐻𝐻𝑥𝑥 (𝑧𝑧)𝐻𝐻𝑥𝑥 (𝑧𝑧 −1 ) which is necessary when considering the causal
IIR filter.
The noncausal optimum IIR filter can now be expressed as the following all-pole filter:
H_nc(z) = R_yx(z)/R_x(z) = (9/40) / ((1 − ½z^{−1})(1 − ½z)), 1/2 < |z| < 2
where the ROC has been chosen to ensure the filter is stable. Evaluating the inverse z-
transform:
h_nc(n) = (3/10)(1/2)^{|n|}, −∞ < n < ∞
which clearly corresponds to a noncausal filter with corresponding stable, but noncausal,
difference equation:
y(n) = (9/50)x(n) + (2/5)y(n−1) + (2/5)y(n+1) = Σ_{k=−∞}^{∞} (3/10)(1/2)^{|k|} x(n−k)
The MMSE power is:
P_nc = r_y(0) − Σ_{k=−∞}^{∞} h_nc(k) r_yx(k) = 1 − Σ_{k=−∞}^{∞} (3/10)(1/2)^{|k|}(4/5)^{|k|} = 3/10
Since the limits of the summation that apply to Equation 4-16 are 0 ≤ m < ∞, we cannot use
the convolution property of the z-transform to provide an analytic expression for the causal
IIR filter. An alternative methodology is to note the following:
1. Any regular process can be transformed to an equivalent white process
2. The solution to the Wiener-Hopf equations for 0 ≤ m < ∞ is trivial if the input is white
x(n) = Σ_{k=0}^{∞} h_x(k) w(n−k) ⇒ R_x(z) = H_x(z) H_x(z^{−1}) R_w(z) = H_x(z) H_x(z^{−1}) σ_x²
Equation 4-24
For a real-valued regular signal, x(n), the PSD can be factored as:
R_x(z) = σ_x² H_x(z) H_x(z^{−1}) → σ_x² = H_x^{−1}(z) H_x^{−1}(z^{−1}) R_x(z)
where H_x^{−1}(z) = 1/H_x(z) is known as the whitening filter, since applying it to x(n)
recovers the white process w(n).
Figure 4-4 Block diagram of optimum IIR filter design (Figure 6.18[1])
To express H_c′(z) in terms of R_yx(z), a relationship between R_yw(z) and R_yx(z) is needed.
The non-causal form of Equation 4-24 allows us to state:
r_yx(l) = E{y(n) x(n−l)} = E{y(n) Σ_{k=−∞}^{∞} h_x(k) w(n−l−k)} = Σ_{k=−∞}^{∞} h_x(k) r_yw(l+k)
Taking the z-transform, noting that x(−n) ⟺ X(z^{−1}), this relationship is:
R_yx(z) = ZT[Σ_{k=−∞}^{∞} h_x(−k) r_yw(l−k)] = ZT[r_yw(k) ∗ h_x(−k)] = R_yw(z) H_x(z^{−1})
which gives:
R_yw(z) = R_yx(z) / H_x(z^{−1})
Hence:
H_c′(z) = (1/σ_x²) [R_yx(z)/H_x(z^{−1})]_+
and thus:
H_c(z) = (1/(σ_x² H_x(z))) [R_yx(z)/H_x(z^{−1})]_+
Equation 4-25
Since R_x(z) = σ_x² H_x(z) H_x(z^{−1}), from Equation 4-17 the optimum noncausal IIR filter is:
H_nc(z) = R_yx(z)/R_x(z) = (1/σ_x²) · R_yx(z)/(H_x(z) H_x(z^{−1}))
and the MMSE power, from Equation 4-23, where w(n) is the linearly equivalent white noise
process, can be expressed:
P_nc = r_y(0) − Σ_{k=−∞}^{∞} h_nc(k) r_yx(k) = r_y(0) − (1/σ_x²) Σ_{k=−∞}^{∞} |r_yw(k)|²
• Since |r_yw(k)|² ≥ 0, as the order of the filter increases the MMSE decreases, due to
more r_yw(k) co-efficients contributing to reduce the initial signal variance
P_y = r_y(0) in the expressions for P_c and P_nc.
• How can one realise a non-causal IIR filter? The standard approach is to use a two-step
process with block buffering: 1) apply the causal filter H(z) and buffer the output, then
2) run the data “backwards” through the non-causal filter, H(z^{−1}).
Example 4–5
The optimum causal IIR filter solution for the problem described in Example 4–4 is now
derived and compared with the optimum noncausal IIR filter.
From Equation 4-20 we can rewrite R_x(z) = σ_x² H_x(z) H_x(z^{−1}) where:
σ_x² = 8/5,  H_x(z) = (1 − ½z^{−1})/(1 − (4/5)z^{−1}),  H_x(z^{−1}) = (1 − ½z)/(1 − (4/5)z)
Equation 4-26
This together with the expression for R_yx(z) from Equation 4-19 gives:
R_yx(z)/H_x(z^{−1}) = (3/5)² / ((1 − (4/5)z^{−1})(1 − ½z)) = 0.6/(1 − (4/5)z^{−1}) + 0.3z/(1 − ½z)
where the first term of the partial fraction expansion is stable for |z| > 4/5 and is thus causal,
and the second term is stable for |z| < 2 and is thus noncausal. Hence taking the causal part
gives:
[R_yx(z)/H_x(z^{−1})]_+ = (3/5)/(1 − (4/5)z^{−1})
From Equation 4-25 the expression for the optimal causal IIR filter is:
H_c(z) = (1/(σ_x² H_x(z))) [R_yx(z)/H_x(z^{−1})]_+
       = (5/8) · ((1 − (4/5)z^{−1})/(1 − ½z^{−1})) · ((3/5)/(1 − (4/5)z^{−1}))
       = (3/8)/(1 − ½z^{−1}), |z| > 1/2
and evaluating the inverse z-transform the filter impulse response is:
h_c(n) = (3/8)(1/2)^n u(n)
which corresponds to a causal filter with corresponding stable and causal difference
equation:
y(n) = (3/8)x(n) + (1/2)y(n−1) = Σ_{k=0}^{∞} (3/8)(1/2)^k x(n−k)
The MMSE power is:
P_c = r_y(0) − Σ_{k=0}^{∞} h_c(k) r_yx(k) = 1 − Σ_{k=0}^{∞} (3/8)(1/2)^k (4/5)^k = 3/8
and, as expected, the optimum causal IIR filter has a larger MMSE than the optimum
noncausal IIR filter.
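Both filters, and the two-pass realisation of the noncausal filter described in the bullet above, can be checked by simulation. In the following Python/SciPy sketch, y(n) is generated as an AR(1) process with r_y(l) = (4/5)^{|l|}, and the noncausal filter is factored as H_nc(z) = (9/40)G(z)G(z^{−1}) with G(z) = 1/(1 − ½z^{−1}), applied forward and then backwards over the buffered data:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
N = 500_000
w = rng.standard_normal(N) * np.sqrt(1 - 0.8**2)   # chosen so r_y(l) = 0.8^|l|, r_y(0) = 1
y = signal.lfilter([1.0], [1.0, -0.8], w)          # desired signal
x = y + rng.standard_normal(N)                     # observation, sigma_v^2 = 1

# Causal optimum IIR filter: h_c(n) = (3/8)(1/2)^n u(n)
y_c = signal.lfilter([3.0 / 8.0], [1.0, -0.5], x)

# Noncausal optimum filter via the two-step forward/backward method:
# H_nc(z) = (9/40) G(z) G(1/z) with G(z) = 1/(1 - 0.5 z^-1)
u = signal.lfilter([1.0], [1.0, -0.5], x)
y_nc = (9.0 / 40.0) * signal.lfilter([1.0], [1.0, -0.5], u[::-1])[::-1]

print(np.mean((y - y_c) ** 2))    # ~ 3/8  = 0.375 (causal MMSE)
print(np.mean((y - y_nc) ** 2))   # ~ 3/10 = 0.300 (noncausal MMSE)
```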
The one-step forward IIR linear predictor is a causal IIR optimum filter with desired
response y(n) ≡ x(n+1). The prediction error is:
e^f(n) = x(n) − Σ_{k=0}^{∞} h_lp(k) x(n−1−k) → E^f(z) = H_pef(z) X(z)
Of more interest is the prediction error filter transfer function; that is, from Equation 4-27:
H_pef(z) = 1 − z^{−1} H_lp(z) = 1/H_x(z)
That is, the prediction error filter is identical to the whitening filter of the process and hence
the prediction error process is white. It can be shown that the MMSE power is given by
P_O^f = P_e = E{|e^f(n)|²} = σ_x², which is as expected.
The inverse filtering or deconvolution problem involves the design of an optimum inverse
filter for linearly distorted signals observed in the presence of additive noise. The typical
configuration of such a system is shown in Figure 4-5, where G(z) is the known system
response of the linear distortion, H(z) is the optimum inverse filter we are designing, y(n) is
the desired signal we are trying to recover, s(n) is the linearly distorted signal, v(n) is
additive white noise and ŷ(n) is an estimate of the desired signal from the noisy, distorted
input x(n).
The delay element is required since the linear distortion filter is causal and its output is
delayed by D samples. Usually this is unknown and has to be determined empirically for
improved performance. The optimum noncausal IIR filter is derived by:
H_nc(z) = z^{−D} R_yx(z) / R_x(z)
where the z^{−D} factor arises because the desired response is y(n−D). Since y(n) and v(n) are
uncorrelated, x(n) = s(n) + v(n) and X(z) = S(z) + V(z) = G(z)Y(z) + V(z), this gives:
𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝐺𝐺(𝑧𝑧 −1 )𝑅𝑅𝑦𝑦 (𝑧𝑧)
𝑅𝑅𝑥𝑥 (𝑧𝑧) = 𝐺𝐺(𝑧𝑧)𝐺𝐺(𝑧𝑧 −1 )𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧)
and thus the optimum inverse filter is:
H_nc(z) = z^{−D} G(z^{−1}) R_y(z) / (G(z) G(z^{−1}) R_y(z) + R_v(z))
which, in the absence of noise, yields the expected result:
H_nc(z) = z^{−D} / G(z)
If we assume that system is driven by a white noise signal y(n) with variance 𝜎𝜎𝑦𝑦2 and the
additive noise v(n) is white with variance 𝜎𝜎𝑣𝑣2 then:
H_nc(z) = z^{−D} / (G(z) + [1/G(z^{−1})](σ_v²/σ_y²))
• In most practical cases the system response, G(z) is unknown and the more difficult
problem of blind deconvolution applies.
This important problem is beyond the scope of these notes and students specialising in
digital communications are referred to [1, pages 310-319]. Of particular interest is Example
6.8.1 [1, page 317] which discusses the equalisation problem in the context of optimum FIR
filtering.
An important class of optimum filters are those that maximise the output signal-to-noise
ratio. Such filters are used to detect signals in additive noise in many applications, including
digital communications and radar. Detailed analysis of this problem is beyond the scope of
these notes and interested students should refer to [1, pages 319-325]. However we can
illustrate the basic tenets of the problem. Consider the observation vector, x(n), of a desired
signal, s(n) subject to interfering and/or noise v(n). That is:
𝐱𝐱(𝑛𝑛) = 𝐬𝐬(𝑛𝑛) + 𝐯𝐯(𝑛𝑛)
The optimum linear filter is designed to produce estimates 𝐲𝐲(𝑛𝑛) = 𝐬𝐬�(𝑛𝑛) from the observation
vector x(n). For the optimum FIR filter with co-efficient vector c(n) the filter output is given
by:
𝑦𝑦(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 𝐱𝐱(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 𝐬𝐬(𝑛𝑛) + 𝐜𝐜 𝑇𝑇 𝐯𝐯(𝑛𝑛)
The output signal power is defined as P_s(n) = E{|c^T s(n)|²} = c^T R_s(n) c and the output
noise power is given by P_v(n) = E{|c^T v(n)|²} = c^T R_v(n) c. The signal-to-noise ratio for
WSS signals as a function of c is then:
SNR(c) = P_s/P_v = (c^T R_s c)/(c^T R_v c)
Of interest is the special case for deterministic signals, 𝐬𝐬(𝑛𝑛) = 𝛼𝛼𝐬𝐬𝑂𝑂 in additive white noise
(𝐑𝐑 𝑣𝑣 = 𝑃𝑃𝑣𝑣 𝐈𝐈). It can be shown [1, page 320] that SNR(c) is maximised when 𝐜𝐜𝑂𝑂 = 𝜅𝜅𝐬𝐬𝑂𝑂 , that is
the filter co-efficients are a scaled replica of the known signal and this type of filter is
usually referred to as the matched filter.
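A minimal numeric illustration of the matched-filter property (Python/NumPy; the signal shape s_O and the white-noise model are assumed for illustration): for white noise, SNR(c) is maximised when c is proportional to s_O, and any other choice of c does worse:

```python
import numpy as np

rng = np.random.default_rng(5)
s0 = np.sin(0.3 * np.arange(32))          # assumed known signal shape s_O
Rv = np.eye(32)                           # white noise, R_v = P_v I with P_v = 1

def snr(c):
    # SNR(c) = (c^T s0)^2 / (c^T R_v c) for the deterministic-signal case
    return (c @ s0) ** 2 / (c @ Rv @ c)

print(snr(s0))                            # matched filter: equals s0^T s0 (the maximum)
for _ in range(3):
    print(snr(rng.standard_normal(32)))   # any other c gives a smaller SNR
```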
4.8 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
3. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
5. Kalman Filters
Most real-world problems involve non-stationary processes and more useful estimates are
obtained based on all the available data (e.g. infinite past) and as we have seen efficient
algorithms based on the infinite-time Wiener filter are not easily obtainable.
The Kalman filter deals with the problem by providing an efficient time-recursive and order-
recursive solution to the optimal linear filtering problem in cases where the nonstationarity
can be modelled by dynamic or state-space models and where all the available data from
time n=0 is used. Furthermore vector state-space models are used which enhance the
applicability of Kalman filtering to a wider set of problems (e.g. an auto-regressive process
can be described by a vector state-space model).
Although the student is not required to remember how the Kalman equations are derived the
process of derivation is in itself instructive and will allow the student to appreciate the
importance of rigour and a systematic approach to derive the most complex of signal
processing and control theory equations from the most basic of assumptions. Being
proficient and confident in such derivation will allow the student to customise and adapt
algorithms to specific applications and research, rather than just accept algorithms published
in the literature for the most general cases.
We extend some concepts from stochastic processes and relevant notation is introduced
before the main development of the Kalman filtering equations.
We define:
ŷ_{n+1}(n) = ŷ(n|n) = E{y(n)|x(n), x(n−1), …, x(0)} = E{y(n)|x_{n+1}(n)}
x̂_n(n) = x̂(n|n−1) = E{x(n)|x(n−1), x(n−2), …, x(0)} = E{x(n)|x_n(n−1)}
That is, ŷ(n|n) is the optimal estimate of the desired signal y(n) based on all the available
data up to time n. We know that the optimum estimate at time n is given by
ŷ(n|n) = ŷ_{n+1}(n) = c_{n+1}(n)^T x_{n+1}(n), that is, the (n+1)th order filter estimate.
Similarly, x̂(n|n−1) is the optimal forward prediction of the observation data x(n) based
on all the available data prior to time n. We know that the optimum prediction at time n is
given by x̂(n|n−1) = x̂_n(n) = −a_n(n)^T x_n(n−1), that is, the nth order predictor.
We now define the estimation/prediction error or innovation process for the observation data
x(n): x̃(n|n−1) = x(n) − x̂(n|n−1)
In the following we deal with vector valued observation data, 𝐱𝐱(𝑛𝑛), and desired signal or
state, 𝐲𝐲(𝑛𝑛), at time n. We also assume, without loss of generality, that 𝐸𝐸{𝐲𝐲(𝑛𝑛)} = 𝐸𝐸{𝐱𝐱(𝑛𝑛)} =
0 but note that the resulting equations can be shown to hold for the case 𝐸𝐸{𝐲𝐲(𝑛𝑛)} =
𝐸𝐸{𝐱𝐱(𝑛𝑛)} ≠ 0 and thus a correlation matrix will be referred to as the covariance matrix.
The non-stationarity of the desired signal is modelled by the following signal (state-space or
plant) model:
𝐲𝐲(𝑛𝑛) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲(𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝒖𝒖(𝑛𝑛)
Equation 5.2
where:
𝐲𝐲(𝑛𝑛) = 𝑘𝑘 × 1 signal state vector at time n
𝐀𝐀(𝑛𝑛 − 1) = 𝑘𝑘 × 𝑘𝑘 matrix that relates 𝐲𝐲(𝑛𝑛 − 1) to 𝐲𝐲(𝑛𝑛) in the absence of a driving input
𝒖𝒖(𝑛𝑛) = 𝑗𝑗 × 1 zero-mean white noise “disturbance” with covariance matrix 𝐑𝐑 𝑢𝑢 (𝑛𝑛)
𝐁𝐁(𝑛𝑛) = 𝑘𝑘 × 𝑗𝑗 input matrix
The observation data is assumed to be related to the desired signal by the following
observation (or measurement) model:
𝐱𝐱(𝑛𝑛) = 𝐇𝐇(𝑛𝑛)𝐲𝐲(𝑛𝑛) + 𝒗𝒗(𝑛𝑛)
Equation 5.3
where:
𝐱𝐱(𝑛𝑛) = 𝑚𝑚 × 1 observation data vector at time n
𝐇𝐇(𝑛𝑛) = 𝑚𝑚 × 𝑘𝑘 matrix that provides ideal linear relationship between 𝐲𝐲(𝑛𝑛) and 𝐱𝐱(𝑛𝑛)
𝒗𝒗(𝑛𝑛) = 𝑚𝑚 × 1 zero-mean white noise “interference” with covariance matrix 𝐑𝐑 𝑣𝑣 (𝑛𝑛)
Our goal is to derive a recursive algorithm that provides filtered estimates for the state
𝑦𝑦(𝑛𝑛|𝑛𝑛) given the observation data, 𝐱𝐱(𝑛𝑛), system matrices 𝐀𝐀(𝑛𝑛 − 1), 𝐇𝐇(𝑛𝑛), 𝐁𝐁(𝑛𝑛), and
covariances 𝐑𝐑 𝑢𝑢 (𝑛𝑛), 𝐑𝐑 𝑣𝑣 (𝑛𝑛).
It can be shown from Equation 5.3, noting that E{v(n)} = 0, that:
x̂(n|n−1) = H(n) ŷ(n|n−1)
Equation 5.8
Hence from Equation 5.3 and Equation 5.8:
x̃(n|n−1) = x(n) − x̂(n|n−1) = H(n)y(n) + v(n) − H(n)ŷ(n|n−1) = H(n)ỹ(n|n−1) + v(n)
Equation 5.9
where ỹ(n|n−1) = y(n) − ŷ(n|n−1) represents the state error.
Thus:
R_x̃(n) = H(n) R_ỹ(n|n−1) H^T(n) + R_v(n)
Equation 5.10
where R_ỹ(n|n−1) = E{ỹ(n|n−1) ỹ^T(n|n−1)} represents the predicted or a priori
state error covariance.
Finally, from Equation 5.7 together with Equation 5.10 and Equation 5.11 we have an
expression for the Kalman gain matrix:
K(n) = R_ỹ(n|n−1) H^T(n) [H(n) R_ỹ(n|n−1) H^T(n) + R_v(n)]^{−1}
Equation 5.12
and hence a recursive estimate for the state y(n|n):
ŷ(n|n) = ŷ(n|n−1) + K(n)[x(n) − x̂(n|n−1)]
Equation 5.13
where from Equation 5.5 and Equation 5.8:
ŷ(n|n−1) = A(n−1) ŷ(n−1|n−1)
x̂(n|n−1) = H(n) ŷ(n|n−1)
Thus we have a time-recursive and order-recursive set of equations for 𝐲𝐲�(𝑛𝑛|𝑛𝑛) at time n
based on the previous estimate 𝐲𝐲�(𝑛𝑛 − 1|𝑛𝑛 − 1) at time n-1. However we now require a
similar recursive equation for the state error covariance 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛) which involves the a priori
estimate 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1).
Consider:
ỹ(n|n−1) = y(n) − ŷ(n|n−1)
         = A(n−1)y(n−1) + B(n)u(n) − A(n−1)ŷ(n−1|n−1)
         = A(n−1)ỹ(n−1|n−1) + B(n)u(n)
which provides a prediction for the state error based on the previous estimate.
A similar prediction can be derived for the state error covariance:
R_ỹ(n|n−1) = E{ỹ(n|n−1) ỹ^T(n|n−1)} = A(n−1) R_ỹ(n−1|n−1) A^T(n−1) + B(n) R_u(n) B^T(n)
Equation 5.14
We define:
ỹ(n|n) = y(n) − ŷ(n|n) = y(n) − {ŷ(n|n−1) + K(n)[x(n) − x̂(n|n−1)]}
       = ỹ(n|n−1) − K(n) x̃(n|n−1)
where we have used Equation 5.13 for ŷ(n|n). Hence the corresponding state error
covariance expression becomes:
R_ỹ(n|n) = E{ỹ(n|n) ỹ^T(n|n)} = R_ỹ(n|n−1) − K(n) R_x̃(n) K^T(n)
         = R_ỹ(n|n−1) − K(n) R_x̃(n) [R_ỹ(n|n−1) H^T(n) R_x̃^{−1}(n)]^T
Equation 5.15
Equation 5.14 and Equation 5.15 provide a time-recursive and order-recursive set of
equations for R_ỹ(n|n) at time n based on the previous estimate R_ỹ(n−1|n−1) at time n−1,
with the a priori estimate R_ỹ(n|n−1) provided by Equation 5.14 as needed in the calculation
of the Kalman gain in Equation 5.12.
System Description
State/Signal process: 𝐲𝐲(𝑛𝑛) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲(𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝒖𝒖(𝑛𝑛)
Observation process: 𝐱𝐱(𝑛𝑛) = 𝐇𝐇(𝑛𝑛)𝐲𝐲(𝑛𝑛) + 𝒗𝒗(𝑛𝑛)
Input
(a) State-space model parameters: 𝐀𝐀(𝑛𝑛 − 1), 𝐁𝐁(𝑛𝑛), 𝐑𝐑 𝑢𝑢 (𝑛𝑛); for n = 0, 1, 2, …
(b) Observation model parameters: 𝐇𝐇(𝑛𝑛), 𝐑𝐑 𝑣𝑣 (𝑛𝑛); for n = 0, 1, 2, …
(c) Observation data: 𝐱𝐱(𝑛𝑛); for n = 0, 1, 2, …
Initialisation:
(a) State vector: 𝐲𝐲�(−1| − 1) = 𝐸𝐸{𝐲𝐲(−1)}, the expected value of the state at time −1
(b) Error covariance: 𝐑𝐑 𝑦𝑦� (−1| − 1) = 𝐑𝐑 𝑦𝑦� (−1)the error covariance of the state at time −1
Recursion for n = 0, 1, 2, …
(a) Prediction
𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲�(𝑛𝑛 − 1|𝑛𝑛 − 1)
𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1) = 𝐀𝐀(𝑛𝑛 − 1)𝐑𝐑 𝑦𝑦� (𝑛𝑛 − 1|𝑛𝑛 − 1)𝐀𝐀𝑇𝑇 (𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝐑𝐑 𝑢𝑢 (𝑛𝑛)𝐁𝐁 𝑇𝑇 (𝑛𝑛)
(c) Update
𝐲𝐲�(𝑛𝑛|𝑛𝑛) = 𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1) + 𝐊𝐊(𝑛𝑛)[𝐱𝐱(𝑛𝑛) − 𝐇𝐇(𝑛𝑛)𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1)]
𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛) = [𝐈𝐈 − 𝐊𝐊(𝑛𝑛)𝐇𝐇(𝑛𝑛)]𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1)
Output
(a) Estimated state vector: 𝐲𝐲�(𝑛𝑛|𝑛𝑛); for n = 0, 1, 2, …
(b) Estimated MMS error covariance: 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛); for n = 0, 1, 2, …
For the measurement model the simplest is to assume we have noisy observations of the AR
process as follows:
x(n) = [1 0 ⋯ 0] [y(n) y(n−1) ⋯ y(n−(q−1))]^T + v(n) = H y(n) + v(n)
and we have H = [1 0 ⋯ 0], the 1 × q observation matrix. Note that all the system
matrices are independent of time.
It should be noted that the Kalman gain, 𝐊𝐊(𝑛𝑛), and state error covariance, 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛), can be
calculated off-line and pre-loaded into the filter since the equations do not depend on the data
𝐱𝐱(𝑛𝑛).
Thus the operation of the Kalman filter is sensitive to 𝐑𝐑 𝑢𝑢 (𝑛𝑛) and 𝐑𝐑 𝑣𝑣 (𝑛𝑛) as well as the
initialisation 𝐑𝐑 𝑦𝑦� (−1| − 1). In most practical cases 𝐑𝐑 𝑢𝑢 (𝑛𝑛) and 𝐑𝐑 𝑣𝑣 (𝑛𝑛) will be unknown and
will either have to be judiciously selected arbitrarily or become parameters that need to be
estimated as part of the system identification process.
R_ỹ(n|n−1) = A(n−1) R_ỹ(n−1|n−2) {I − H^T(n−1)[H(n−1) R_ỹ(n−1|n−2) H^T(n−1) + R_v(n−1)]^{−1} H(n−1) R_ỹ(n−1|n−2)} A^T(n−1) + B(n) R_u(n) B^T(n)
Equation 5.16
This horrible looking equation is known as the matrix Riccati equation for R_ỹ(n|n−1).
Example 5–1
Problem: Let 𝑦𝑦(𝑛𝑛) be defined by the AR(2) process described by:
𝑦𝑦(𝑛𝑛) = 1.8𝑦𝑦(𝑛𝑛 − 1) − 0.81𝑦𝑦(𝑛𝑛 − 2) + 0.1𝑢𝑢(𝑛𝑛) 𝑛𝑛 ≥ 0
where 𝑢𝑢(𝑛𝑛)~𝑁𝑁(0,1) and 𝑦𝑦(−1) = 𝑦𝑦(−2) = 0. The signal is observed in the presence of
additive noise:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) 𝑛𝑛 ≥ 0
where 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0,1) is uncorrelated with 𝑢𝑢(𝑛𝑛). Form the linear MMSE estimate for
𝑦𝑦(𝑛𝑛), 𝑛𝑛 ≥ 0
Solution: The AR(2) process and observations are reformulated as the follow state-space
model:
y(n) = [y(n); y(n−1)] = [1.8 −0.81; 1 0][y(n−1); y(n−2)] + [0.1; 0] u(n)
x(n) = [1 0][y(n); y(n−1)] + v(n)
and hence the Kalman filter algorithm can be run with the following model parameters:
A(n) = [1.8 −0.81; 1 0],  B(n) = [0.1; 0],  R_u(n) = 1
H(n) = [1 0],  R_v(n) = 1
100 samples of 𝑥𝑥(𝑛𝑛) and 𝑦𝑦(𝑛𝑛) were generated. The resulting estimate 𝑦𝑦�(𝑛𝑛) by running the
Kalman filter on the observation data 𝑥𝑥(𝑛𝑛) is shown in Figure 5-2. It can be seen that even
in the presence of very noisy data the Kalman filter is able to accurately track the signal.
Figure 5-2 Estimation of AR(2) process using the Kalman filter (Figure 7.12[1])
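A sketch implementation of the recursion for Example 5-1 follows (Python/NumPy; the zero initial state ŷ(−1|−1) = 0 and identity initial covariance R_ỹ(−1|−1) = I are assumed initialisation choices):

```python
import numpy as np

rng = np.random.default_rng(6)

# Model of Example 5-1: AR(2) state, scalar noisy observation
A = np.array([[1.8, -0.81], [1.0, 0.0]])
B = np.array([[0.1], [0.0]])
H = np.array([[1.0, 0.0]])
Ru, Rv = 1.0, 1.0

# Simulate 100 samples of the state and the observations
N = 100
y_true, x_obs = np.zeros((N, 2, 1)), np.zeros(N)
y = np.zeros((2, 1))
for n in range(N):
    y = A @ y + B * rng.standard_normal()
    y_true[n] = y
    x_obs[n] = (H @ y)[0, 0] + np.sqrt(Rv) * rng.standard_normal()

# Kalman recursion: prediction then update (Equations 5.12-5.14)
y_hat, P = np.zeros((2, 1)), np.eye(2)
est = np.zeros(N)
for n in range(N):
    y_pred = A @ y_hat                          # state prediction
    P_pred = (A @ P @ A.T) + (B * Ru) @ B.T     # a priori covariance
    S = H @ P_pred @ H.T + Rv                   # innovation covariance
    K = P_pred @ H.T / S                        # Kalman gain
    y_hat = y_pred + K * (x_obs[n] - (H @ y_pred)[0, 0])
    P = (np.eye(2) - K @ H) @ P_pred            # a posteriori covariance
    est[n] = y_hat[0, 0]

print(np.mean((est - y_true[:, 0, 0]) ** 2))    # filtered MSE, well below R_v = 1
```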
Example 5–2
Problem: Consider an object travelling in a straight-line with initial velocity 𝑦𝑦𝑣𝑣 (−1), initial
displacement 𝑦𝑦𝑝𝑝 (−1) and subject to a random acceleration. The measured position of the
object at the nth sampling instant:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦𝑝𝑝 (𝑛𝑛) + 𝑣𝑣(𝑛𝑛)
is made in the presence of additive noise 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, 𝜎𝜎𝑣𝑣2 ) where 𝑦𝑦𝑝𝑝 (𝑛𝑛) is the true position of
the object at the nth sampling instant (i.e. 𝑦𝑦𝑝𝑝 (𝑛𝑛) = 𝑦𝑦𝑐𝑐 (𝑛𝑛𝑛𝑛) where 𝑦𝑦𝑐𝑐 (𝑛𝑛𝑛𝑛) is the instantaneous
position and T is the sampling interval in seconds). Form the linear MMSE estimate for the
trajectory 𝑦𝑦𝑝𝑝 (𝑛𝑛), 𝑛𝑛 ≥ 0, that is, track the target in the presence of random acceleration and
measurement noise.
Solution: Let y_v(n) = ẏ_c(nT) be the true velocity at the nth sampling instant and
y_a(n) = ÿ_c(nT) represent the random acceleration that is in effect during the nth sampling
interval (i.e. for nT ≤ t ≤ (n+1)T). The following equations of motion will then apply:
y_v(n) = y_v(n−1) + y_a(n−1) T
y_p(n) = y_p(n−1) + y_v(n−1) T + ½ y_a(n−1) T²
We formulate the problem using the following state-space model:
y(n) = [y_p(n); y_v(n)] = [1 T; 0 1][y_p(n−1); y_v(n−1)] + [T²/2; T] u(n) = A y(n−1) + B u(n), n ≥ 0
where u(n) = y_a(n−1) treats the random acceleration as the state model noise, and:
x(n) = [1 0] y(n) + v(n) = H y(n) + v(n), n ≥ 0
is the observation model.
is the observation model.
Estimates for both the position and velocity can be obtained for 𝑛𝑛 ≥ 0 by using the Kalman
filter algorithm initialised by 𝑦𝑦𝑣𝑣 (−1) and 𝑦𝑦𝑝𝑝 (−1).
Figure 5-3 Estimation of positions and velocities using the Kalman filter (Figure 7.14[1])
5.2 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000 (Chapter 10).
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
3. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
We discuss modeling of real world signals using parametric pole-zero (PZ) signal models
described by:
x(n) = −Σ_{k=1}^{P} a_k x(n−k) + Σ_{k=0}^{Q} b_k w(n−k)
where w(n) is a white noise excitation. With Q = 0 this reduces to the all-pole (AP) or
autoregressive (AR) model, which is equivalent to forward linear prediction where b_0 w(n)
can be considered as the prediction error. If P > 0 and Q > 0 we have the pole-zero (PZ) or
autoregressive moving average (ARMA) model above, with both the a_k and b_k terms present.
Model Selection: Which model type (MA, AR, ARMA) and which order (values of P and Q)
should we use to model the given data {x(n)}?
Model Validation: For the selected model and order, is the estimated model accurate and
representative?
Usually larger values of (P, Q) will yield more accurate models, but perhaps not efficient
models which can generalise sufficiently, especially if parameters have been estimated from
the actual available data {x(n)} (i.e. least-squares methods) rather than known second-
order statistics (i.e. optimum filters), and for which a lower order model can be just as good.
So how do we measure the goodness of fit of a model?
Let 𝑥𝑥�(𝑛𝑛) be the estimate of the data generated from a model which has been selected and
parameters estimated. Then 𝑒𝑒(𝑛𝑛) = 𝑥𝑥(𝑛𝑛) − 𝑥𝑥�(𝑛𝑛) is the model error or residual.
NOTE: Use the MATLAB script [a,g,eps]=rtoam(r) (from Lab 2) to calculate eps
(ε_j) or g (Γ_k) for any order m > P, given the m × m autocorrelation matrix r.
Problem
Find the AR(P) signal model that best describes the sunspot data in Figure 6-1.
Model Selection
The Partial AutoCorrelation Sequence (PACS) of the sunspot data is plotted in Figure 6-2.
The horizontal dashed lines represent the significance threshold for determining whether Γ_m ≈ 0.
We see that Γ_m ≈ 0 for m > 2, hence we select an AP model with P = 2.
Figure 6-2 The PACS values of the sunspot numbers (Figure 9.6[1])
Model Estimation
Using LS error analysis with full-windowing [1: 449-453] yields the following AR(2) model
for the sunspot data:
x̂(n) = 1.318 x(n−1) − 0.634 x(n−2) + w(n),  σ̂_w² = 289.2
NOTE: In the following, ω refers to digital radians (principal period −π < ω < π and, due
to even symmetry, defined for 0 < ω < π); normally we use Ω but we adopt the same notation as
the textbook.
The power spectrum of a wide-sense stationary random process is given by the discrete-time
Fourier transform (DTFT) of the autocorrelation sequence:
P_x(e^{jω}) = Σ_{l=−∞}^{∞} r_x(l) e^{−jωl}
In practice only an N-sample windowed record x_N(n) = x(n) w_R(n) is available, with DTFT:
X_N(e^{jω}) = Σ_{n=−∞}^{∞} x_N(n) e^{−jωn} = Σ_{n=−∞}^{∞} x(n) w_R(n) e^{−jωn} = Σ_{n=0}^{N−1} x(n) e^{−jωn}
Equation 6.5
and the periodogram estimate of the power spectrum is P̂_per(e^{jω}) = (1/N)|X_N(e^{jω})|².
Thus to compute the periodogram we form the DFT of the data sequence and square the
magnitude response:
x(n) → (window w_R(n)) → x_N(n) → (DFT) → X_N(k) → (1/N)|X_N(k)|² = P̂_per(e^{j2πk/N})
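A minimal periodogram sketch (Python/NumPy; the two-sinusoid test signal and record length are illustrative assumptions) computes exactly this DFT-and-square recipe:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 512
n = np.arange(N)
# Assumed test signal: two sinusoids in unit-variance white noise
x = np.sin(2 * np.pi * 0.1 * n) + np.sin(2 * np.pi * 0.3 * n) + rng.standard_normal(N)

# Periodogram: DFT of the (rectangularly windowed) data, magnitude squared over N
X = np.fft.fft(x)
P_per = np.abs(X) ** 2 / N            # P_per(e^{j 2 pi k / N})

freqs = np.fft.fftfreq(N)             # digital frequency k/N
print(freqs[np.argsort(P_per)[-4:]])  # peaks near +/-0.1 and +/-0.3
```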
Since the periodogram is a function of the random process x(n), we need to establish the
performance of the periodogram, P̂_per(e^{jω}), as an estimate of the true power spectrum,
P_x(e^{jω}), on statistical grounds. The standard performance measure for statistical
estimates is whether such estimates converge in the mean-square sense for a sufficiently
large sample size, that is, whether or not:
lim_{N→∞} E{|P̂_per(e^{jω}) − P_x(e^{jω})|²} = 0
Convergence in the mean-square sense requires two properties to be satisfied:

1. Asymptotically Unbiased

\lim_{N \to \infty} E\{\hat{P}_{per}(e^{j\omega})\} = P_x(e^{j\omega})

2. Mean-Square Convergence

\lim_{N \to \infty} E\left\{ \left| \hat{P}_{per}(e^{j\omega}) - E\{\hat{P}_{per}(e^{j\omega})\} \right|^2 \right\} = \lim_{N \to \infty} \mathrm{var}\{\hat{P}_{per}(e^{j\omega})\} = 0
That is, the periodogram \hat{P}_{per}(e^{j\omega}) must be a consistent estimate of the power spectrum P_x(e^{j\omega}): its mean must converge to P_x(e^{j\omega}) and its variance must converge to zero as N → ∞. Another important measure we can investigate is the resolution of the periodogram.
Mean of Periodogram
It can be shown [1: 216-217; 2:398-399] that:
E\{\hat{P}_{per}(e^{j\omega})\} = \frac{1}{2\pi} P_x(e^{j\omega}) * |W_R(e^{j\omega})|^2

Equation 6.6

where:

|W_R(e^{j\omega})|^2 = \frac{1}{N} \left[ \frac{\sin(N\omega/2)}{\sin(\omega/2)} \right]^2

As N → ∞, |W_R(e^{j\omega})|^2 converges to an impulse of unit area, so we can state:

E\{\hat{P}_{per}(e^{j\omega})\} = \frac{1}{2\pi} P_x(e^{j\omega}) * |W_R(e^{j\omega})|^2 \quad \text{(biased for finite } N\text{)}

\lim_{N \to \infty} E\{\hat{P}_{per}(e^{j\omega})\} = P_x(e^{j\omega}) \quad \text{(asymptotically unbiased)}
As we can see, the window spectrum |W_R(e^{j\omega})|^2 is critical in determining the form of bias present in the periodogram.
Resolution of Periodogram
In addition to smoothing the spectrum and introducing spurious peaks, the window spectrum |W_R(e^{j\omega})|^2 also reduces the resolution of the spectrum. That is, if two sinusoids are closely spaced in frequency, the mainlobe of |W_R(e^{j\omega})|^2 will smooth out the two peaks, making it difficult to resolve the two frequencies. The wider the mainlobe, the worse this effect.
We define the resolution as the width, Δω, of the mainlobe of |W_R(e^{j\omega})|^2 at its "half-power" or 3 dB point. The resolution of the periodogram is given by:

\mathrm{Res}\{\hat{P}_{per}(e^{j\omega})\} = 0.89 \, \frac{2\pi}{N}

Equation 6.7
from which we see that the resolution improves (Δω decreases) with increasing N (longer window). This is a fundamental result in spectral analysis: for non-stationary signals (e.g. speech) there is a tradeoff between better spectral frequency resolution (longer window) and better temporal resolution (shorter window).
Variance of Periodogram
We require the variance of the periodogram to converge to zero as N → ∞. Unfortunately, deriving an expression for the variance for an arbitrary process x(n) is difficult due to the presence of fourth-order moments of the process. From the derivation in [2: 403-407] it can be shown that:

\mathrm{var}\{\hat{P}_{per}(e^{j\omega})\} \approx P_x^2(e^{j\omega})
Equation 6.8
Thus the variance does not go to zero as 𝑁𝑁 → ∞ and the periodogram is not a consistent
estimate of the power spectrum.
The modified periodogram is defined for the case where the window function, w(n), is other than the rectangular window:

\hat{P}_M(e^{j\omega}) = \frac{1}{NU} \left| \sum_{n=-\infty}^{\infty} x(n) w(n) e^{-j\omega n} \right|^2

Equation 6.9

where:

U = \frac{1}{N} \sum_{n=0}^{N-1} |w(n)|^2

Equation 6.10
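A minimal MATLAB sketch of Equations 6.9 and 6.10 with a Hamming window (our own construction; hamming() assumes the Signal Processing Toolbox):

N = 256; x = randn(N,1);            % stand-in data record
w = hamming(N);                     % data window w(n)
U = sum(abs(w).^2) / N;             % Equation 6.10
PM = abs(fft(x.*w)).^2 / (N*U);     % Equation 6.9 on the DFT frequency grid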
Thus by being able to choose the window function we can control the level and type of bias in the spectral estimate. Some of the popular data windows used for the modified periodogram are listed in Figure 6-4, and

\mathrm{Res}\{\hat{P}_M(e^{j\omega})\} = (\Delta\omega)_{3dB}
which depends on the window function. Furthermore the level of the peak sidelobe is also
important in establishing the level of spectral leakage in the ensuing spectral estimate. These
values are tabulated in Figure 6-4.
Figure 6-4 Properties of selected Window functions ([1], Table 8.2, pg. 411)
We know from statistical theory that if we average K uncorrelated observations (e.g. K uncorrelated measurements from the same random process x, with the nth observation denoted by x_n) then the variance of the average is given by:

\mathrm{var}\left\{ \frac{1}{K} \sum_{n=1}^{K} x_n \right\} = \frac{1}{K^2} \mathrm{var}\left\{ \sum_{n=1}^{K} x_n \right\} = \frac{\mathrm{var}\{x_n\}}{K}
Equation 6.11
That is, the variance is reduced by a factor of K. Thus if we can derive K uncorrelated
measurements of the modified periodogram for the same random process, then by forming
the average of these (which we know produces an asymptotically unbiased estimate of the
power spectrum) we will obtain a power spectrum estimate with variance ∝ 1/𝐾𝐾, i.e. the
variance will approach zero as 𝐾𝐾 → ∞.
Given the N-length realisation of the signal, x(n), assume that successive sequences are offset by D sample times and that each sequence is L samples long. Then the ith sequence is given by:

x_i(n) = x(n + iD), \quad n = 0, 1, \dots, L-1
thus, the amount of overlap between 𝑥𝑥𝑖𝑖 (𝑛𝑛) and 𝑥𝑥𝑖𝑖+1 (𝑛𝑛) is 𝐿𝐿 − 𝐷𝐷 samples.
Assume K of the sequences cover the entire N data samples, then we must have that:
𝑁𝑁 = 𝐿𝐿 + 𝐷𝐷(𝐾𝐾 − 1)
Consider the following cases:
• No overlap (D = L): K = N/L sections of length L
• 50% overlap (D = L/2): K = 2(N/L) − 1 sections of length L
In Welch’s Method the N-length realisation of the signal is partitioned into K sequences of
length L which are offset by 𝐷𝐷 samples, i.e. 𝑥𝑥𝑖𝑖 (𝑛𝑛) = 𝑥𝑥(𝑛𝑛 + 𝑖𝑖𝑖𝑖), such that 𝑁𝑁 = 𝐿𝐿 + 𝐷𝐷(𝐾𝐾 −
1) and the K modified periodogram estimates are formed using the L-length subsequences
and averaged, to yield the estimate:
\hat{P}_W(e^{j\omega}) = \frac{1}{KLU} \sum_{i=0}^{K-1} \left| \sum_{n=0}^{L-1} w(n) x(n+iD) e^{-j\omega n} \right|^2

where U = \frac{1}{L} \sum_{n=0}^{L-1} |w(n)|^2 and w(n) is the window of length L. Hence it is evident that:

E\{\hat{P}_W(e^{j\omega})\} = \frac{1}{2\pi L U} P_x(e^{j\omega}) * |W(e^{j\omega})|^2
For D = L/2 (50% overlap) and a Bartlett window function, w_B(n), it can be shown that:

\mathrm{var}\{\hat{P}_W(e^{j\omega})\} \approx \frac{9}{16} \frac{L}{N} P_x^2(e^{j\omega}) \approx \frac{P_x^2(e^{j\omega})}{K}

\mathrm{Res}\{\hat{P}_W(e^{j\omega})\} = 1.28 \, \frac{2\pi}{L}
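A minimal MATLAB sketch of Welch averaging with 50% overlap (our own construction; pwelch() in the Signal Processing Toolbox implements the same estimate):

N = 1024; x = randn(N,1);           % stand-in data record
L = 256; D = L/2;                   % segment length and 50% offset
w = bartlett(L); U = sum(w.^2)/L;   % window and its power U
K = floor((N-L)/D) + 1;             % K = 2(N/L) - 1 segments here
PW = zeros(L,1);
for i = 0:K-1
    xi = x(i*D+1 : i*D+L);          % x_i(n) = x(n + iD)
    PW = PW + abs(fft(w.*xi)).^2 / (L*U);   % modified periodogram of segment i
end
PW = PW / K;                        % average over the K segments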
Note however that w_a(n) is an infinite-duration sinc( ) function and thus we cannot realise this smoothing operation. The Blackman-Tukey method implements a practical form of smoothing of the periodogram and is defined by:

\hat{P}_{BT}(e^{j\omega}) = \sum_{k=-M}^{M} w(k) \, \hat{r}_x(k) \, e^{-j\omega k}

where w(k) is a lag window applied to the sample autocorrelation sequence \hat{r}_x(k).
We note that since 𝑟𝑟̂𝑥𝑥 (𝑘𝑘) is a real and even symmetric function then 𝑃𝑃�𝐵𝐵𝐵𝐵 (𝑒𝑒 𝑗𝑗𝑗𝑗 ) should be a
real-valued, non-negative function of ω. These requirements restrict our choice of ideal
window functions, 𝑤𝑤(𝑘𝑘), to those that:
• are conjugate symmetric, 𝑤𝑤(𝑘𝑘) = 𝑤𝑤 ∗ (−𝑘𝑘), so that 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) is real-valued
• have a Fourier transform 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) ≥ 0 (non-negative function of ω)
All window functions listed in Table 6.1 are even and symmetric, but only the Bartlett
window function has 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) ≥ 0, whereas the popular Hamming and Hanning window
functions do not. Hence the Blackman-Tukey Method will use a Bartlett window function by
default.
We see that, as always, there is a trade-off. To decrease the variance we require a small value of M. This corresponds to a wider mainlobe in W(e^{j\omega}), which produces greater smoothing but at the cost of reduced resolution. One recommendation is to ensure M ≤ N/5.
NOTE: In the following 𝜔𝜔 refers to digital radians (principal period of −𝜋𝜋 < 𝜔𝜔 < 𝜋𝜋 and due
to even symmetry defined 0 < 𝜔𝜔 < 𝜋𝜋), normally we use Ω but we adopt the same notation as
the textbook.
The nonparametric methods make no assumption on the signal, 𝑥𝑥(𝑛𝑛), and can thus be used
with any random signal process. However if we have some knowledge on the signal model
used to generate 𝑥𝑥(𝑛𝑛) then parametric spectrum estimation methods can be used.
Parametric methods provide more accurate, higher resolution spectrum estimates BUT only
if the underlying model is correct, otherwise the spectrum estimate will be worse than that
obtained by nonparametric means.
An AR(p) signal is generated by filtering a unit variance white noise process by:
H(z) = \frac{b(0)}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}
We have a signal, x(n), which we assume is the output of an AR(p) model. Let \{\hat{a}_p(k)\}_{k=1}^{p}, \hat{b}(0) be the estimates of the model parameters that we obtain from the signal data; then, noting the power spectrum of a signal is given by |H(e^{j\omega})|^2, we obtain the parametric spectrum estimate:

\hat{P}_{AR}(e^{j\omega}) = \frac{|\hat{b}(0)|^2}{\left| 1 + \sum_{k=1}^{p} \hat{a}_p(k) e^{-j\omega k} \right|^2}
Equation 6.12
𝑝𝑝 𝑝𝑝
where the �𝑎𝑎�𝑝𝑝 (𝑘𝑘)�𝑘𝑘=1 ≡ {𝑎𝑎𝑘𝑘 }𝑘𝑘=1 = 𝐚𝐚𝑂𝑂 , are the forward linear prediction parameters, and
2 𝑓𝑓
�𝑏𝑏�(0)� = 𝑃𝑃𝑂𝑂 = 𝑟𝑟(0) + 𝐫𝐫 𝑇𝑇 𝐚𝐚𝑂𝑂 is the MMSE power from stationary FLP analysis (see Section
4.5.4).
An MA(q) signal is generated by filtering a unit variance white noise process by:

H(z) = \sum_{k=0}^{q} b_q(k) z^{-k}

An alternative formulation is possible if the autocorrelation sequence is known (or has been estimated). Since for an MA(q) process r_x(k) = 0 for |k| > q, a natural estimate based on Equation 6.2 is:

\hat{P}_{MA}(e^{j\omega}) = \sum_{k=-q}^{q} \hat{r}_x(k) e^{-j\omega k}
An ARMA(p,q) signal is generated by filtering a unit variance white noise process by:
H(z) = \frac{\sum_{k=0}^{q} b_q(k) z^{-k}}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}
We have a signal, x(n), which we assume is the output of an ARMA(p,q) model. Let \{\hat{a}_p(k)\}_{k=1}^{p}, \{\hat{b}_q(k)\}_{k=0}^{q} be the estimates of the model parameters that we obtain from the signal data; then, noting the power spectrum of a signal is given by |H(e^{j\omega})|^2, we obtain the parametric spectrum estimate:

\hat{P}_{ARMA}(e^{j\omega}) = \left| \frac{\sum_{k=0}^{q} \hat{b}_q(k) e^{-j\omega k}}{1 + \sum_{k=1}^{p} \hat{a}_p(k) e^{-j\omega k}} \right|^2
Equation 6.15
Example 6.1
Consider the signal generated by:
x_a(n) = 5\sin(0.45\pi n + \phi_1) + 5\sin(0.55\pi n + \phi_2) + w(n)
where 𝑤𝑤(𝑛𝑛) is a unit variance white noise process. The spectrum estimate in Figure 6-5
clearly shows that parametric AR spectral estimation has superior resolution to
nonparametric Blackman-Tukey spectral estimation since the sinusoidal peaks have been
resolved. This is a result of the process 𝑥𝑥𝑎𝑎 (𝑛𝑛) being accurately modelled as an AR process.
Figure 6-5 Spectrum estimate for the signal 𝐱𝐱 𝐚𝐚 (𝐧𝐧) using the Blackman-Tukey method
(dashed lines) and AR spectral estimation (solid lines) ([2], Figure 8.23(a), pg. 441)
Figure 6-6 Spectrum estimate for the signal 𝒙𝒙𝒃𝒃 (𝒏𝒏) using the Blackman-Tukey method
(dashed lines) and AR spectral estimation (solid lines) ([1], Figure 8.23(b), pg. 441)
The PSD of an ARMA(4,2) process is estimated by using the LS AP/PZ methods of order AP(10) (aka AR(10)) and PZ(4,2) (aka ARMA(4,2)) on a 300-sample segment. The actual PSD and estimated PSDs are plotted in Figure 6-7. The effect of the model mismatch is evident when using the AP(10) method (especially in the spectral valley between the two spectral peaks and in the high-frequency response) compared to the accuracy of the estimated PSD when using the correct model in the PZ(4,2) method. However, even with a model mismatch the AP(10) model accurately identifies the spectral peaks in the spectrum. Indeed it can be shown that AP models tend to accurately model spectral peaks but not spectral valleys.
Figure 6-7: Actual PSD and estimated PSD from the AP(10) and PZ(4,2) methods for
an ARMA(4,2) process ([1], Figure 9.13)
6.4 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
7. Adaptive Filters
7.1 Introduction
[1: 499-516][2: 493-499]
Filters (which we will also use to refer to predictors) that incorporate a systematic way for the filter co-efficients to evolve (or adapt) as a function of the input data and desired signal (when available) are termed adaptive filters. Adaptive filters are important for several reasons:
• The second-order moments of the signal operating environment (SOE) are usually unknown, so an optimum filter design is not possible
• The SOE may change with time, so a fixed digital filter implementation will not work
• Practical implementations are possible with minimum computational and memory
requirements (compared to solving the normal equations in an optimum or LS filter
directly)
The following real-world problems in signal processing and communications can only be
solved by the use of a properly designed adaptive filter:
• echo cancellation in communications
• equalisation of data communication channels
• linear predictive coding
• noise cancellation
Filter Structure
Forms the output of the filter, 𝑦𝑦�(𝑛𝑛), which is an estimate of a desired response, using
measurements of the input signal or signals, 𝐱𝐱(𝑛𝑛), and driven by the filter parameters, 𝐜𝐜(𝑛𝑛).
The filtering structure can be linear (e.g. FIR filter) or non-linear. We will only consider the
linear FIR structure.
Performance Evaluation
The output of the adaptive filter, 𝑦𝑦�(𝑛𝑛), and the desired response, 𝑦𝑦(𝑛𝑛) (when available), are
used to assess the quality of the adaptive filter response with respect to the requirements of
the particular application and produce a criterion of system performance.
With supervised adaptation the desired response is available and the criterion of
performance is derived based on some average or instantaneous form of the square error (i.e.
|𝑒𝑒(𝑛𝑛)|2 where 𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛) is the performance parameter of interest).
With unsupervised adaptation the desired response is not known and the performance is
evaluated based on some measurable property of the input signal and/or generated response
as part of the adaptation algorithm process.
Adaptation Algorithm
Uses the information from the input signal, 𝐱𝐱(𝑛𝑛), and performance parameter, 𝑒𝑒(𝑛𝑛), or some
function of it, to modify the filter parameters, 𝐜𝐜(𝑛𝑛), in such a way that performance, as
measured by the criterion of system performance, is improved.
Our goal is to find the coefficients, 𝐜𝐜(𝑛𝑛) = [𝑐𝑐1 (𝑛𝑛) 𝑐𝑐2 (𝑛𝑛) … 𝑐𝑐𝑀𝑀 (𝑛𝑛)]𝑇𝑇 , to form the estimate
of a desired signal, 𝑦𝑦�(𝑛𝑛) from the M observation data samples,
𝐱𝐱(𝑛𝑛) = [𝑥𝑥(𝑛𝑛) 𝑥𝑥(𝑛𝑛 − 1) ⋯ 𝑥𝑥(𝑛𝑛 − 𝑀𝑀 + 1)]𝑇𝑇 :
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛)𝐱𝐱(𝑛𝑛)
such that the Mean Square Error (MSE) is minimised:
𝑃𝑃(𝑛𝑛) = 𝐸𝐸{|𝑒𝑒(𝑛𝑛)|2 }
where the error is simply:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Optimum filters
If we know the second-order moments of the SOE we can design the optimum filter by
solving the normal equations:
𝐑𝐑(𝑛𝑛)𝐜𝐜𝑂𝑂 (𝑛𝑛) = 𝐝𝐝(𝑛𝑛)
where:
\mathbf{R}(n) = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}
\mathbf{d}(n) = E\{\mathbf{x}(n) y(n)\}
If the SOE is stationary, the optimum filter is computed once and is used with all realisations
{𝐱𝐱(𝑛𝑛), 𝑦𝑦(𝑛𝑛)}. For nonstationary environments, the optimum filter design is repeated at every
time instant n because the optimum filter is time-varying.
Least-Squares filters
If the second-order moments are not known we collect a sufficient amount of data \{\mathbf{x}(n), y(n)\}_{n=0}^{N-1} and obtain an acceptable estimate of the optimum filter in the LSE sense by computing:

\hat{\mathbf{R}} = \mathbf{X}^T \mathbf{X} = \sum_{n=0}^{N-1} \mathbf{x}(n) \mathbf{x}^T(n)

\hat{\mathbf{d}} = \mathbf{X}^T \mathbf{y} = \sum_{n=0}^{N-1} \mathbf{x}(n) y(n)

and then solving the normal equations:

\hat{\mathbf{R}} \, \mathbf{c}_{ls} = \hat{\mathbf{d}}
The obtained co-efficients can be used to filter the collected (batch) data in the interval 0 ≤
𝑛𝑛 ≤ 𝑁𝑁 − 1 and/or to start filtering the data for 𝑛𝑛 > 𝑁𝑁, on a sample-by-sample basis, in real-
time. LS filtering is a form of adaptive filtering called block adaptive filtering where the co-
efficients should be re-estimated each time the properties of the SOE change significantly.
Block adaptive filters suffer from the following two problems:
1. The filter cannot track statistical variations within the operating block and hence provides an "average" performance which may be poor on a sample-by-sample basis.
2. A decision has to be made when the properties of the SOE change significantly to trigger
a re-estimation. Alternatively the co-efficients can be re-estimated continuously (as with
standard LS estimation using blocks of length N and overlap 𝑁𝑁𝑂𝑂 ) with a consequent
increase in the computational requirements to calculate the correlation matrix and solve
the normal equations (see lecture notes on Least Squares Filtering and Prediction).
Adaptive filters
In applications which require sample-by-sample filtering (e.g. channel equalisation) the
adaptive filter operation starts immediately at time 𝑛𝑛 = 0 after the observation of the pair
{𝐱𝐱(0), 𝑦𝑦(0)} using an initial “guess” 𝐜𝐜(−1) for the filter co-efficients.
The co-efficients are then updated at each time-step and the performance improves as the
filter co-efficients converge to the optimum values (transient acquisition phase) and then, in
the case of non-stationary signals, track any subsequent changes to the optimum values
(steady-state tracking phase). This operation of the adaptive filter is shown in Figure 7-2.
Figure 7-2 Modes of operation of adaptive filter in stationary and nonstationary SOE
(Figure 10.12[1])
For the case where the desired signal is available, the general adaptive filter, at each time n,
performs the following operations.
a priori type adaptive algorithms
1. Filtering:
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
2. Error formation:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
3. Adaptive algorithm:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), 𝑒𝑒(𝑛𝑛)}
The increment Δ𝐜𝐜(𝑛𝑛) = Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), 𝑒𝑒(𝑛𝑛)} is in general a non-linear function of the input 𝐱𝐱(𝑛𝑛)
and error 𝑒𝑒(𝑛𝑛) and is designed to bring the filter co-efficient, 𝐜𝐜(𝑛𝑛), close to the optimum
filter co-efficient, 𝐜𝐜𝑂𝑂 , with the passage of time (acquisition phase). A key requirement is that
Δ𝐜𝐜(𝑛𝑛) must vanish if the error 𝑒𝑒(𝑛𝑛) vanishes. Hence 𝑒𝑒(𝑛𝑛) plays a major role in determining
the increment Δ𝐜𝐜(𝑛𝑛). Most adaptive algorithms use a direct linear dependency on 𝑒𝑒(𝑛𝑛) as
follows:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + 𝐠𝐠(𝑛𝑛)𝑒𝑒(𝑛𝑛)
Equation 7.1
where 𝐠𝐠(𝑛𝑛) is the gain adaptation vector, usually a function of the input data vector 𝐱𝐱(𝑛𝑛).
The a priori refers to the fact that the estimate 𝑦𝑦�(𝑛𝑛) is based on the current input 𝐱𝐱(𝑛𝑛) and
the previously estimated filter co-efficients 𝐜𝐜(𝑛𝑛 − 1), that is predicted or a priori estimates
for 𝑦𝑦(𝑛𝑛) and 𝑒𝑒(𝑛𝑛) are used, rather than the actual or a posteriori estimates.
If we used the actual estimates, obtained using the current filter co-efficient 𝐜𝐜(𝑛𝑛) then we
have the following.
a posteriori type adaptive algorithms
1. Filtering:
𝑦𝑦�𝑎𝑎 (𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛)𝐱𝐱(𝑛𝑛)
2. Error formation:
ε(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�𝑎𝑎 (𝑛𝑛)
3. Adaptive algorithm:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), ε(𝑛𝑛)}
The a posteriori type adaptive algorithms are usually more difficult to formulate due to the coupling of c(n) between ε(n) = y(n) − c^T(n)x(n) and c(n) = c(n−1) + Δc{x(n), ε(n)}.
The goal of an adaptive filter is to find and then track the optimum filter 𝐜𝐜𝑂𝑂 (𝑛𝑛) as quickly
and accurately as possible. Define:
𝐜𝐜�(𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐜𝐜𝑂𝑂 (𝑛𝑛)
as the deviation from the optimal filter. The MSE for the adaptive filter can be decomposed
as:
𝑃𝑃(𝑛𝑛) = 𝑃𝑃𝑂𝑂 (𝑛𝑛) + 𝑃𝑃𝑒𝑒𝑒𝑒 (𝑛𝑛)
where 𝑃𝑃𝑂𝑂 (𝑛𝑛) is the optimum filter MSE (the best we can do) and 𝑃𝑃𝑒𝑒𝑒𝑒 (𝑛𝑛) measures the excess
MSE (EMSE) and indicates the deviation of the error response from the optimum filter case.
In stationary SOE’s the MSE can be further decomposed as:
𝑃𝑃(𝑛𝑛) = 𝑃𝑃𝑂𝑂 + 𝑃𝑃𝑡𝑡𝑡𝑡 (𝑛𝑛) + 𝑃𝑃𝑒𝑒𝑒𝑒 (∞)
where:
𝑃𝑃𝑂𝑂 is the stationary (i.e. constant) MSE of the optimum filter
𝑃𝑃𝑡𝑡𝑡𝑡 (𝑛𝑛) represents the transient MSE which dominates during the acquisition phase.
𝑃𝑃𝑒𝑒𝑒𝑒 (∞) represents the steady-state excess MSE which dominates during the tracking phase.
Stability
An acceptable adaptive filter should be stable in the bounded-input bounded-output (BIBO) sense. Assuming that the optimum filter is, by definition, stable, one measure of stability is whether the adaptive filter eventually converges to the optimum filter:

Convergence in the mean-square (MS) sense

\lim_{n \to \infty} E\{\|\tilde{\mathbf{c}}(n)\|^2\} = 0

Equation 7.2

where \mathrm{MSD}(n) = E\{\|\tilde{\mathbf{c}}(n)\|^2\} is the mean square deviation (MSD).
Speed of adaptation
The speed of adaptation or rate of convergence during the acquisition phase can be
considered proportional to the total amount of transient MSE. Thus:
Total transient MSE (measure of speed of adaptation)
P_{tr}^{total} = \sum_{n=0}^{\infty} P_{tr}(n)
Equation 7.3
Quality of Adaptation
The EMSE and steady-state EMSE (in the stationary SOE case) are a measure of how well
the adaptive filter is working, i.e. the quality of adaptation.
Misadjustment (measure of quality of adaptation)
\mathrm{M} = \frac{P_{ex}(n)}{P_O(n)} \quad \text{or} \quad \frac{P_{ex}(\infty)}{P_O}
Equation 7.4
7.2.1 Derivation
The LMS adaptive filter is a form of stochastic steepest-descent algorithm (SDA) that
minimises the instantaneous MSE. For details see [1: 516-524][2: 499-505].
The SDA attempts to find the optimum filter, \mathbf{c}_O(n), by taking steps in the direction of the negative gradient of the MSE cost function, -\nabla\xi(n) = -\frac{\partial \xi(n)}{\partial \mathbf{c}(n-1)}, where the MSE cost function \xi(n) = E\{|e(n)|^2\} \approx |e(n)|^2 is approximated by the instantaneous MSE.
For a data input vector with M values the algorithm requires 2M multiplications and 2M
additions at each time-step.
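The resulting coefficient update appears in the linear-prediction example below: c(n) = c(n−1) + 2µe(n)x(n). A minimal MATLAB sketch of the a priori LMS filter follows (function and variable names are ours, not from the notes; save as lms_sketch.m):

function [c, e] = lms_sketch(x, y, M, mu)
% A priori LMS: estimate y(n) from x(n) = [x(n) ... x(n-M+1)]'.
N = length(x);
c = zeros(M, 1);                 % initial guess c(-1) = 0
e = zeros(N, 1);
xv = zeros(M, 1);                % data vector x(n)
for n = 1:N
    xv = [x(n); xv(1:M-1)];      % shift in the new sample
    yhat = c.' * xv;             % 1. filtering with c(n-1): M multiplies
    e(n) = y(n) - yhat;          % 2. a priori error formation
    c = c + 2*mu*e(n)*xv;        % 3. update: further M multiplies (~2M total)
end
end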
The FIR adaptive filter realisation using the LMS algorithm is shown in Figure 7-3.
Figure 7-3 An FIR adaptive filter realisation using the LMS algorithm (Figure 10.18[1])
Speed of adaptation
The theoretical analysis is beyond the scope of these notes but the defining relation is that:
P_{tr}^{total} = \sum_{n=0}^{\infty} P_{tr}(n) \cong \frac{\|\tilde{\mathbf{c}}(0)\|^2}{4\mu}
The smaller the step size and the farther the initial co-efficients are from their optimum
settings, the more iterations it takes for the LMS algorithm to converge.
Another factor which contributes to a slow rate of convergence which arises from analysis of
the convergence of the underlying SDA is the eigenvalue spread (condition number) of the
input correlation matrix 𝐑𝐑 𝑥𝑥 :
X(\mathbf{R}_x) = \frac{\lambda_{max}}{\lambda_{min}}
For the LMS algorithm the transient decay time-constant is lower bounded by:
𝜏𝜏 > Χ(𝐑𝐑 𝑥𝑥 )
Thus the LMS algorithm will converge faster if the contours of the error surface are circular (i.e. R_x → σ_x² I or X(R_x) → 1, small eigenvalue spread) than when they are elliptical (i.e. X(R_x) ≫ 1, large eigenvalue spread).
Quality of adaptation
The theoretical analysis is beyond the scope of these notes but the steady-state excess MSE
and hence misadjustment is given by:
\mathrm{M} = \frac{P_{ex}(\infty)}{P_O} \cong \mu \, \mathrm{tr}(\mathbf{R}_x) = \mu \sum_{k=1}^{M} E\{|x_k(n)|^2\} \equiv \mu M E\{|x(n)|^2\}

The larger the step-size and the larger the tap input power (defined as M P_x = M E\{|x(n)|^2\}), the greater the deviation of the error from the optimum error.
Linear Prediction
Consider a signal, x(n), generated by the following AR(2) model:

x(n) = -a_1 x(n-1) - a_2 x(n-2) + w(n)

where w(n) ~ N(0, σ_w²) and two sets of parameter co-efficients are chosen as follows:

X(R) = λ₁/λ₂ = 1.1/0.9 = 1.22 :  a₁ = −0.1950, a₂ = 0.95, σ_w² = 0.0965
X(R) = λ₁/λ₂ = 1.818/0.182 = 10 :  a₁ = −1.5955, a₂ = 0.95, σ_w² = 0.0322
The adaptive LMS algorithm was used to provide estimates of

\mathbf{c}(n) = \begin{bmatrix} c_1(n) \\ c_2(n) \end{bmatrix} \equiv \begin{bmatrix} -a_1 \\ -a_2 \end{bmatrix}

from 1000 realisations of x(n) for each of the above sets of parameters. The LMS algorithm for this problem is:
Filtering
𝑥𝑥�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
= 𝑐𝑐1 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 1) + 𝑐𝑐2 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 2)
Error Formation
𝑒𝑒(𝑛𝑛) = 𝑥𝑥(𝑛𝑛) − 𝑥𝑥�(𝑛𝑛)
= 𝑥𝑥(𝑛𝑛) − 𝑐𝑐1 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 1) − 𝑐𝑐2 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 2)
Co-efficient Updating
\mathbf{c}(n) = \begin{bmatrix} c_1(n) \\ c_2(n) \end{bmatrix} = \begin{bmatrix} c_1(n-1) \\ c_2(n-1) \end{bmatrix} + \begin{bmatrix} 2\mu e(n) x(n-1) \\ 2\mu e(n) x(n-2) \end{bmatrix} = \mathbf{c}(n-1) + 2\mu e(n) \mathbf{x}(n)

where \mathbf{x}(n) = [x(n-1) \;\; x(n-2)]^T, µ is the step-size, and the adaptive predictor is initialised by x(−1) = x(−2) = 0 and c_1(−1) = c_2(−1) = 0.
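As a rough numerical check (our own construction, not from the notes), the lms_sketch function given earlier reproduces this predictor if the input is delayed by one sample so that the data vector holds [x(n−1); x(n−2)]:

a = [-0.195 0.95]; sw = sqrt(0.0965);      % first parameter set, X(R) = 1.22
x = filter(1, [1 a], sw*randn(1000,1));    % one AR(2) realisation
xd = [0; x(1:end-1)];                      % delay => xv = [x(n-1); x(n-2)]
[c, e] = lms_sketch(xd, x, 2, 0.04);       % c(n) should approach [0.195; -0.95]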
Plots of the 𝐜𝐜(𝑛𝑛) learning curve and the effect of step-size for the case of a small eigenvalue
spread (Χ(𝐑𝐑) = 1.22) are shown in Figure 7-4 and for a large eigenvalue spread
(Χ(𝐑𝐑) = 10) are shown in Figure 7-5.
Figure 7-4 The performance of the LMS adaptive algorithm used in linear prediction of an AR(2) process for an eigenvalue spread of X(R) = 1.22. (left plot) The average and sample c(n) learning curve for µ = 0.04. (right plot) Effect of step-size on MSE learning curve. (Figure 10.20[1])
Figure 7-5 The performance of the LMS adaptive algorithm used in linear prediction of an AR(2) process for an eigenvalue spread of X(R) = 10. (left plot) The average and sample c(n) learning curve. (right plot) Effect of step-size on MSE learning curve. (Figure 10.21[1])
An echo canceller based on the LMS adaptive algorithm is needed to remove the unwanted
echoes from the incoming signal.
Figure 7-6 Block diagram of adaptive echo cancellation filter (Figure 10.23[1])
Echo cancellation in a communications system can be investigated from the block diagram
given by Figure 7-6 where:
• 𝑥𝑥(𝑛𝑛) is the transmitted signal from the local handset which is assumed to be an IID binary
data sequence,
• FIR echo path, c_O, is the FIR filter structure modelling the generation of the combined near-end and far-end echo signal, y(n), arising from the originating signal source, x(n),
• 𝑢𝑢(𝑛𝑛) = 𝑧𝑧(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) is the “uncancelable” desired signal received from the remote
transmitted signal, 𝑠𝑠(𝑛𝑛), subject to the effect of the transmission path, 𝑔𝑔(𝑛𝑛), and additive
noise, 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, σ2𝑣𝑣 ),
• 𝑠𝑠𝑟𝑟 (𝑛𝑛) is the received signal at the local handset subject to echo interference from 𝑦𝑦(𝑛𝑛),
• adaptive echo canceller is an FIR filter structure that attempts to form an estimate of the
unwanted echo signal, 𝑦𝑦�(𝑛𝑛), which is then subtracted from the received signal 𝑠𝑠𝑟𝑟 (𝑛𝑛).
The formulation of a practical LMS adaptive algorithm is based on the observation that the
data sequence 𝐱𝐱(𝑛𝑛) is correlated with 𝑦𝑦(𝑛𝑛) but not 𝑠𝑠(𝑛𝑛) or 𝑣𝑣(𝑛𝑛). Thus 𝐸𝐸{𝐱𝐱(𝑛𝑛)𝑠𝑠(𝑛𝑛)} = 0
which also implies 𝐸𝐸{𝑦𝑦(𝑛𝑛)𝑢𝑢(𝑛𝑛)} = 𝐸𝐸{𝑦𝑦�(𝑛𝑛)𝑢𝑢(𝑛𝑛)} = 0 (why?). As shown by Figure 7-6 we
instead use 𝑒𝑒(𝑛𝑛) = 𝑠𝑠𝑟𝑟 (𝑛𝑛) − 𝑦𝑦�(𝑛𝑛). So what happens when we use the LMS algorithm to
minimise this error?
E\{e^2(n)\} = E\{(s_r(n) - \hat{y}(n))^2\}
= E\left\{\left(y(n) + u(n) - \hat{y}(n)\right)^2\right\} = E\{([y(n) - \hat{y}(n)] + u(n))^2\}
= E\{u^2(n)\} + E\left\{\left(y(n) - \hat{y}(n)\right)^2\right\} + 2E\left\{u(n)\left(y(n) - \hat{y}(n)\right)\right\}
= E\{u^2(n)\} + E\left\{\left(y(n) - \hat{y}(n)\right)^2\right\}
Thus minimising E{e²(n)} is equivalent to minimising E{(y(n) − ŷ(n))²}, the MSE for y(n). As ŷ(n) → y(n) we have E{e²(n)} → E{u²(n)}, and the output of the system is in fact e(n) → u(n), which is the desired received signal!
Thus the following LMS algorithm for adaptive echo cancellation can be formulated:
Filtering
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
Error Formation (and also the output!)
𝑒𝑒(𝑛𝑛) = 𝑠𝑠𝑟𝑟 (𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Co-efficient Updating
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + 2µ 𝑒𝑒(𝑛𝑛)𝐱𝐱(𝑛𝑛)
We incorporate this knowledge into the step-size by defining β and generating the adaptive step-size as:

\mu(n) = \frac{\beta}{2\|\mathbf{x}(n)\|^2}
Hence we formulate the normalised LMS (NLMS):

\mathbf{c}(n) = \mathbf{c}(n-1) + \beta \, \frac{\mathbf{x}(n)}{\|\mathbf{x}(n)\|^2} \, e(n)
where the choice of the NLMS step-size, β, will ensure convergence for 0 < β < 2. To use the NLMS we also need the following recursion for the normalisation term ‖x(n)‖²:

\|\mathbf{x}(n)\|^2 = \|\mathbf{x}(n-1)\|^2 + |x(n)|^2 - |x(n-M)|^2
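A minimal NLMS sketch in MATLAB (names are ours; for clarity the norm is recomputed per step instead of using the O(1) recursion above, and a small eps guards against ‖x(n)‖² = 0; save as nlms_sketch.m):

function [c, e] = nlms_sketch(x, y, M, beta)
% Normalised LMS: same loop as the LMS but with a normalised step, 0 < beta < 2.
N = length(x); c = zeros(M,1); e = zeros(N,1); xv = zeros(M,1);
for n = 1:N
    xv = [x(n); xv(1:M-1)];                  % data vector x(n)
    e(n) = y(n) - c.'*xv;                    % a priori error
    c = c + beta*e(n)*xv/(xv.'*xv + eps);    % normalised update
end
end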
Advanced Techniques
The LMS algorithm attains its best performance when the input correlation matrix is
diagonal with equal eigenvalues (i.e. Χ(𝐑𝐑 𝑥𝑥 ) = 1 implying no eigenvalue spread) and for FIR
filters this implies that the input data signal is white. Where this is not the case the following
two variations to the basic LMS algorithm are possible:
• Transform-domain LMS algorithm which applies a whitening transformation to the input
data based on the known input correlation matrix.
• Suboptimal decorrelation of the input data via the discrete cosine transform (DCT) or
discrete wavelet transform (DWT) when the input correlation matrix is unknown.
In applications that require filters with a large number of coefficients, then the real-time
implementation of the LMS adaptive algorithm becomes difficult. One solution is to resort to
a block adaptive filter structure where the filter co-efficients are updated on a block-by-
block rather than sample-by-sample basis.
7.3.1 Introduction
Whereas with the LMS adaptive filter the instantaneous MSE is used:
𝜉𝜉(𝑛𝑛) = 𝐸𝐸{|𝑒𝑒(𝑛𝑛)|2 } ≈ |𝑒𝑒(𝑛𝑛)|2
with the RLS adaptive algorithm a weighted MSE is used:

\xi(n) = \sum_{j=0}^{n} \lambda^{n-j} |e(j)|^2, \quad 0 < \lambda \le 1

Equation 7.5
To derive the LS filter coefficients that minimise Equation 7.5 we proceed by setting the derivative of ξ(n) with respect to c(n) to zero:

\frac{\partial \xi(n)}{\partial \mathbf{c}(n)} = 2 \sum_{j=0}^{n} \lambda^{n-j} e(j) \frac{\partial e(j)}{\partial \mathbf{c}(n)} = -2 \sum_{j=0}^{n} \lambda^{n-j} e(j) \mathbf{x}(j) = \mathbf{0}

which yields the time-averaged normal equations \hat{\mathbf{R}}(n)\mathbf{c}(n) = \hat{\mathbf{d}}(n), with \hat{\mathbf{R}}(n) = \sum_{j=0}^{n} \lambda^{n-j} \mathbf{x}(j)\mathbf{x}^T(j) and \hat{\mathbf{d}}(n) = \sum_{j=0}^{n} \lambda^{n-j} \mathbf{x}(j) y(j).
The LS adaptive filter summarised in Figure 7-7 requires the solution of the M × M normal equations (for an M-length FIR filter) for the adaptation gain vector at each time-step. This makes practical implementation of the LS adaptive filter difficult without a more efficient method to calculate the adaptation gain vector. By formulating a time-recursive equation for the inverse of the correlation matrix, \mathbf{P}(n) = \hat{\mathbf{R}}^{-1}(n), the adaptation gain can be derived directly as \bar{\mathbf{g}}(n) = \mathbf{P}(n-1)\mathbf{x}(n). The CRLS algorithm summarised in Figure 7-8 represents a practical implementation of the a priori RLS algorithm. For the derivation see [1: 552-553][2: 543-544].
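A minimal a priori exponential-memory RLS sketch in MATLAB (our own construction, not the textbook's listing; it propagates P(n) = R̂⁻¹(n) directly so no normal equations are solved per step; save as rls_sketch.m):

function [c, e] = rls_sketch(x, y, M, lambda, delta)
% lambda: forgetting factor (0 < lambda <= 1); delta: large initial scaling.
N = length(x); c = zeros(M,1); e = zeros(N,1); xv = zeros(M,1);
P = delta*eye(M);                     % P(0), weak initial regularisation
for n = 1:N
    xv = [x(n); xv(1:M-1)];           % data vector x(n)
    g = P*xv / (lambda + xv.'*P*xv);  % adaptation gain vector
    e(n) = y(n) - c.'*xv;             % a priori error
    c = c + g*e(n);                   % coefficient update
    P = (P - g*(xv.'*P)) / lambda;    % matrix-inversion-lemma update of P(n)
end
end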
NOTE
1) The growing memory RLS algorithm yields no deviation and no excess MSE in the
steady-state whereas the exponential memory RLS algorithm exhibits a finite deviation and
excess MSE in the steady-state which decreases to zero as λ approaches 1.
2) However the growing memory RLS algorithm is not able to track changes to new statistical environments and should only be deployed in stationary SOEs. The exponential memory RLS algorithm is the more practical implementation as λ can be used to trade off performance and tracking ability.
3) Unlike the LMS algorithm the convergence of the RLS algorithm is not sensitive to the
eigenvalue spread.
Consider the steady-state co-efficient error vector which can be written as:
𝐜𝐜�(𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐜𝐜𝑂𝑂 (𝑛𝑛)
= [𝐜𝐜(𝑛𝑛) − 𝐸𝐸{𝐜𝐜(𝑛𝑛)}] + [𝐸𝐸{𝐜𝐜(𝑛𝑛)} − 𝐜𝐜𝑂𝑂 (𝑛𝑛)]
≡ 𝐜𝐜�𝑒𝑒 (𝑛𝑛) + 𝐜𝐜�𝑙𝑙 (𝑛𝑛)
where
• 𝐜𝐜�𝑒𝑒 (𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐸𝐸{𝐜𝐜(𝑛𝑛)} is the estimation error which represents the fluctuations of the
adaptive filter parameter vector about its mean.
• 𝐜𝐜�𝑙𝑙 (𝑛𝑛) = 𝐸𝐸{𝐜𝐜(𝑛𝑛)} − 𝐜𝐜𝑂𝑂 (𝑛𝑛) is the lag error which represents the bias in 𝐜𝐜(𝑛𝑛) with respect
to the optimal 𝐜𝐜𝑂𝑂 (𝑛𝑛).
We note that:
1. In stationary SOEs the estimation error manifests itself as the noisy fluctuations about the
constant optimal 𝐜𝐜𝑂𝑂 and the lag error is zero.
2. In nonstationary SOEs the estimation error manifests itself as the noisy fluctuations about
the mean trajectory of 𝐜𝐜(𝑛𝑛) and the lag error manifests itself as the deviation between the
𝐜𝐜(𝑛𝑛) curve and the optimal 𝐜𝐜𝑂𝑂 (𝑛𝑛) curve.
7.5 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
The optimum MMSE estimator assumes knowledge of the second-order moments (i.e. the
autocorrelation sequence 𝑟𝑟(𝑙𝑙)). In practice such second-order moments are not available and
have to be estimated from the available data. Thus we resort to techniques applicable to
deterministic signals where we have access to actual data (although the origin may be
stochastic) rather than estimated statistics.
For ergodic stationary data, a widely used measure is the time-averaged sample
autocorrelation sequence:
\hat{r}_x(l) = \begin{cases} \dfrac{1}{N} \sum_{n=l}^{N-1} x(n) x(n-l) & 0 \le l \le N-1 \\[4pt] \hat{r}_x(-l) & -(N-1) \le l \le 0 \\[4pt] 0 & \text{otherwise} \end{cases}

Equation 8.1

which makes use only of the available data \{x(n)\}_{n=0}^{N-1}. Hence estimates for |l| close to N will not contain enough data samples to be reliable (i.e. N − l samples is a small number). A good rule of thumb is to restrict |l| ≤ N/4.
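A minimal MATLAB sketch of Equation 8.1 (function and variable names are ours; save as acorr_sketch.m):

function r = acorr_sketch(x, L)
% Biased (1/N) sample autocorrelation r^_x(l) for lags l = 0..L;
% negative lags follow from r^_x(-l) = r^_x(l). Restrict L <= N/4.
N = length(x); r = zeros(L+1, 1);
for l = 0:L
    r(l+1) = sum(x(l+1:N) .* x(1:N-l)) / N;   % (1/N) sum_{n=l}^{N-1} x(n)x(n-l)
end
end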
Linear LSE estimation is based on the availability of measurements of the desired response
𝑦𝑦(𝑛𝑛) and the M input signals 𝑥𝑥𝑘𝑘 (𝑛𝑛) for 1 ≤ 𝑘𝑘 ≤ 𝑀𝑀 over the measurement interval or analysis
frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1. As was the case with optimum MMSE estimation the problem is to
estimate the desired response 𝑦𝑦(𝑛𝑛) using a linear combination of the M input signals as
follows:
\hat{y}(n) = \sum_{k=1}^{M} c_k x_k(n)

such that the total squared error over the frame is minimised:

E = \sum_{n=0}^{N-1} |e(n)|^2, \quad e(n) = y(n) - \hat{y}(n)
For this minimisation to be possible, the coefficient vector 𝒄𝒄(𝑛𝑛) has to be held constant over
the analysis frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1, i.e. 𝒄𝒄(𝑛𝑛) ≡ 𝒄𝒄. Furthermore the linear LSE estimator 𝒄𝒄𝒍𝒍𝒍𝒍
so obtained depends on the measurement set or particular analysis frame and a different
value will be obtained with different sets of data. This should be contrasted with the optimum
MMSE estimate 𝒄𝒄𝑶𝑶 which only depends on the second-order moments (e.g. it is time-
invariant for stationary signals).
where the N × 1 columns \bar{\mathbf{x}}_k of X are called the data records (collected for input "sensor" k):

\bar{\mathbf{x}}_k = [x_k(0) \;\; x_k(1) \;\; \cdots \;\; x_k(N-1)]^T

and the 1 × M rows \mathbf{x}^T(n) of X are called the snapshots (of all "sensors" at time n):

\mathbf{x}^T(n) = [x_1(n) \;\; x_2(n) \;\; \cdots \;\; x_M(n)]
Equation 8.4 represents a system of N equations in M unknowns. The practical case of interest
for LS analysis that will be considered is for overdetermined systems when N > M.
The LSE estimator operates in block processing mode. That is, it processes a frame of N snapshots of the data, where the data is blocked into frames of length N samples with successive frames overlapping by N_O samples. The required estimates or error signals are unblocked at the final stage of the processor. The values of N and N_O and the interpolation between overlapping estimates depend on the application. This is illustrated in Figure 8-1.
Figure 8-1 Block processing implementation of general linear LSE estimator (Figure 8.2[1])
A geometric derivation to the normal equations will be provided which will highlight
important interpretations of the variables involved, especially the input data matrix, X.
The desired response vector, y, and data records, \bar{\mathbf{x}}_k, for 1 ≤ k ≤ M are considered as vectors in an N-dimensional vector space, with the inner product defined by:

\langle \bar{\mathbf{x}}_i, \bar{\mathbf{x}}_j \rangle = \bar{\mathbf{x}}_i^T \bar{\mathbf{x}}_j = \sum_{n=0}^{N-1} x_i(n) x_j(n), \quad \text{with the special case} \quad \langle \bar{\mathbf{x}}, \bar{\mathbf{x}} \rangle = \|\bar{\mathbf{x}}\|^2

The estimate of the desired response can be expressed as:

\hat{\mathbf{y}} = \sum_{k=1}^{M} c_k \bar{\mathbf{x}}_k

which lies in the estimation space spanned by the data records. The LSE is minimised when the error vector \mathbf{e}_{ls} = \mathbf{y} - \hat{\mathbf{y}}_{ls} is orthogonal to the estimation space, that is \mathbf{e}_{ls} \perp \bar{\mathbf{x}}_k, 1 \le k \le M, which is the case when \hat{\mathbf{y}}_{ls} is the projection of y onto the estimation space. This is illustrated in Figure 8-2.
Figure 8-2 Vector space interpretation of LSE estimation for N=3 (dimension of data space)
and M=2 (dimension of estimation space) (Figure 8.5[1])
where \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} is the time-average correlation matrix and \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} is the time-average cross-correlation vector. From Figure 8-2 we have the following trigonometric identity:

\|\mathbf{y}\|^2 = \|\hat{\mathbf{y}}_{ls}\|^2 + \|\mathbf{e}_{ls}\|^2
Now we define E_{ls} = \mathbf{e}_{ls}^T \mathbf{e}_{ls} = \|\mathbf{e}_{ls}\|^2 and E_y = \mathbf{y}^T \mathbf{y} = \|\mathbf{y}\|^2, and note that from Equation 8.5 \hat{\mathbf{y}}_{ls} = \mathbf{X}\mathbf{c}_{ls}. This together with Equation 8.6 gives \|\hat{\mathbf{y}}_{ls}\|^2 = \hat{\mathbf{y}}_{ls}^T \hat{\mathbf{y}}_{ls} = \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{X} \mathbf{c}_{ls} = \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{y}, and thus we can say:

E_{ls} = E_y - \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{y} = E_y - \mathbf{c}_{ls}^T \hat{\mathbf{d}} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls}

Equation 8.7
For linear LSE estimation the computational requirements for the calculation of \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} and \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} are as important as the solution of the normal equations of Equation 8.6 themselves. The formulation of the normal equations for LS estimation is illustrated by Figure 8-3.
Since \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} is symmetric, only the upper triangular elements \hat{r}_{ij} = \bar{\mathbf{x}}_i^T \bar{\mathbf{x}}_j for j ≥ i need to be calculated, requiring M(M+1)/2 dot products with N arithmetic operations per dot product. Forming \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} requires calculation of \hat{d}_i = \bar{\mathbf{x}}_i^T \mathbf{y}, i.e. M dot products with N operations per dot product. Thus, to form the normal equations requires a total of:

\frac{1}{2} M(M+1)N + MN = \frac{1}{2} M^2 N + \frac{3}{2} MN
arithmetic operations. Solution of the normal equations by standard techniques like LDL𝐻𝐻 or
Cholesky decomposition [1, Section 6.3] requires 𝑂𝑂(𝑀𝑀3 ) operations. For over-determined
systems of interest where 𝑁𝑁 > 𝑀𝑀 this may mean that more computational work will be
involved in forming the normal equations 𝑂𝑂(𝑀𝑀2 𝑁𝑁) than in solving them 𝑂𝑂(𝑀𝑀 3 )!
Uniqueness Theorem
The over-determined (𝑁𝑁 > 𝑀𝑀) LS problem has a unique solution provided by the normal
equations of Equation 8.6 if the time-average correlation matrix 𝐑𝐑� = 𝐗𝐗 𝑇𝑇 𝐗𝐗 is positive definite,
or equivalently if the data matrix 𝐗𝐗 has linearly independent columns.
Example 8.1
Problem: Estimate the sequence 𝐲𝐲 = [1 2 3 2]𝑇𝑇 from the observation data records 𝐱𝐱�1 =
[1 2 1 1]𝑇𝑇 and 𝐱𝐱� 2 = [2 1 2 3]𝑇𝑇 by determining the optimum filter, the error vector,
𝐞𝐞𝑙𝑙𝑙𝑙 , and LSE 𝐸𝐸𝑙𝑙𝑙𝑙 .
� = 𝐗𝐗 𝑇𝑇 𝐗𝐗 = �7 9 �,
𝐑𝐑
10
𝐝𝐝̂ = 𝐗𝐗 𝑇𝑇 𝐲𝐲 = � �
9 18 16
where:
1 2
2 1
𝐗𝐗 = [𝐱𝐱�1 𝐱𝐱� 2 ] = � �
1 2
1 3
and then solve the normal equations to obtain the LS estimator:
\mathbf{c}_{ls} = \hat{\mathbf{R}}^{-1} \hat{\mathbf{d}} = \begin{bmatrix} \frac{2}{5} & -\frac{1}{5} \\ -\frac{1}{5} & \frac{7}{45} \end{bmatrix} \begin{bmatrix} 10 \\ 16 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} \\[2pt] \frac{22}{45} \end{bmatrix}

and the LSE:

E_{ls} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = \|\mathbf{y}\|^2 - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = 18 - [10 \;\; 16] \begin{bmatrix} \frac{4}{5} \\[2pt] \frac{22}{45} \end{bmatrix} = \frac{98}{45}
The projection matrix is:

\mathbf{P} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \begin{bmatrix} \frac{2}{9} & \frac{1}{9} & \frac{2}{9} & \frac{1}{3} \\[2pt] \frac{1}{9} & \frac{43}{45} & \frac{1}{9} & -\frac{2}{15} \\[2pt] \frac{2}{9} & \frac{1}{9} & \frac{2}{9} & \frac{1}{3} \\[2pt] \frac{1}{3} & -\frac{2}{15} & \frac{1}{3} & \frac{3}{5} \end{bmatrix}

which can be used to determine the error vector:

\mathbf{e}_{ls} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \begin{bmatrix} -\frac{7}{9} & -\frac{4}{45} & \frac{11}{9} & -\frac{4}{15} \end{bmatrix}^T

from which we get \|\mathbf{e}_{ls}\|^2 = \frac{98}{45} = E_{ls}.
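As a quick numerical verification of this example (our own construction), in MATLAB:

X = [1 2; 2 1; 1 2; 1 3];  y = [1 2 3 2]';
cls = (X'*X) \ (X'*y);              % [4/5; 22/45]
P   = X * ((X'*X) \ X');            % projection matrix above
els = (eye(4) - P) * y;             % [-7/9; -4/45; 11/9; -4/15]
Els = y'*y - (X'*y)'*cls;           % 98/45 = norm(els)^2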
Linear LS estimation of FIR filters adopts the same framework used in optimum MMSE FIR filter estimation. Previously we had:

e(n) = y(n) - \hat{y}(n) = y(n) - \sum_{k=0}^{M-1} c_k x(n-k) = y(n) - \mathbf{c}^T \mathbf{x}(n)

Equation 8.8
With linear LS estimation we assume availability of the desired response 𝑦𝑦(𝑛𝑛) and input
signal 𝑥𝑥(𝑛𝑛) over the measurement interval or analysis frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1 and hold the
linear LS estimators or filter coefficients 𝐜𝐜 = [𝑐𝑐0 𝑐𝑐1 ⋯ 𝑐𝑐𝑀𝑀−1 ]𝑇𝑇 constant over this
interval. However unlike the general linear LS estimation case (Section 8.2) Equation 8.8
holds for 𝑁𝑁𝑖𝑖 ≤ 𝑛𝑛 ≤ 𝑁𝑁𝑓𝑓 where 𝑁𝑁𝑖𝑖 and 𝑁𝑁𝑓𝑓 depend on the type of “windowing” that is applied.
The need for this windowing arises by noting that data outside the measurement interval may
be required. For example, at 𝑛𝑛 = 0 we require 𝐱𝐱(0) = [𝑥𝑥(0) 𝑥𝑥(−1) ⋯ 𝑥𝑥(−𝑀𝑀 + 1)]𝑇𝑇
and the data 𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀 + 1) lies outside the measurement interval 0 ≤ 𝑛𝑛 ≤
𝑁𝑁 − 1. In matrix form the system of equations can be written as:
\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{c} \quad \text{for } N_i \le n \le N_f

where e, y and X are given by:

\mathbf{e} = [e(N_i) \;\; e(N_i+1) \;\; \cdots \;\; e(N_f)]^T
\mathbf{y} = [y(N_i) \;\; y(N_i+1) \;\; \cdots \;\; y(N_f)]^T

\mathbf{X} = \begin{bmatrix} \mathbf{x}^T(N_i) \\ \mathbf{x}^T(N_i+1) \\ \vdots \\ \mathbf{x}^T(N_f) \end{bmatrix} = \begin{bmatrix} x(N_i) & x(N_i-1) & \cdots & x(N_i-M+1) \\ x(N_i+1) & x(N_i) & \cdots & x(N_i-M+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(N_f) & x(N_f-1) & \cdots & x(N_f-M+1) \end{bmatrix} = [\bar{\mathbf{x}}_1 \;\; \bar{\mathbf{x}}_2 \;\; \cdots \;\; \bar{\mathbf{x}}_M]

where e and y are (N_f − N_i + 1)-length vectors and X is a (N_f − N_i + 1) × M matrix with columns:

\bar{\mathbf{x}}_l = [x(N_i+1-l) \;\; \cdots \;\; x(n+1-l) \;\; \cdots \;\; x(N_f+1-l)]^T

and the LS criterion is:

E = \sum_{n=N_i}^{N_f} |e(n)|^2 = \mathbf{e}^T \mathbf{e}
which is minimised when the LS FIR filter co-efficients are chosen so as to satisfy the normal
equations:
(\mathbf{X}^T\mathbf{X}) \mathbf{c}_{ls} = \mathbf{X}^T \mathbf{y} \quad \Leftrightarrow \quad \hat{\mathbf{R}} \, \mathbf{c}_{ls} = \hat{\mathbf{d}}
with an LS error of:

E_{ls} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = \|\mathbf{y}\|^2 - \hat{\mathbf{d}}^T \mathbf{c}_{ls}

8.3.2 Efficient computation of the correlation matrix \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X}
This recursion holds because the columns of X are obtained by shifting the first column. The
recursion suggests the following efficient way to compute 𝐑𝐑 �:
1. Compute the first row of 𝐑𝐑 � using Equation 8.9. This requires M dot products and a total of
𝑀𝑀(𝑁𝑁𝑓𝑓 − 𝑁𝑁𝑖𝑖 ) operations.
2. Compute the remaining elements in the upper triangular part of 𝐑𝐑 � using the recursion
Equation 8.10. The required number of operations are 𝑂𝑂(𝑀𝑀2 ).
3. Compute the lower triangular part of \hat{\mathbf{R}} from \hat{r}_{ji} = \hat{r}_{ij} since \hat{\mathbf{R}} is symmetric.
In the following we assume an over-determined system N > M. The data, x(n) and y(n), are also assumed to be derived by windowing with a rectangular window covering 0 ≤ n ≤ N − 1. That is, only data in the analysis frame 0 ≤ n ≤ N − 1 are available and data outside that interval are assumed to be zero.
Case 1: No windowing
𝑁𝑁𝑖𝑖 = 𝑀𝑀 − 1 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1 implying:
𝐲𝐲 = [𝑦𝑦(𝑀𝑀 − 1) 𝑦𝑦(𝑀𝑀) ⋯ 𝑦𝑦(𝑁𝑁 − 1)]𝑇𝑇
\mathbf{X} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = [\mathbf{X}_{nowi}]
That is, all the available data is used without any distortions arising from using data outside
the measurement interval. In the signal processing literature this is sometimes referred to as
the covariance method.
Case 2: Prewindowing
N_i = 0 and N_f = N − 1 implying:

\mathbf{y} = [y(0) \;\; y(1) \;\; \cdots \;\; y(N-1)]^T

\mathbf{X} = \begin{bmatrix} x(0) & x(-1) & \cdots & x(-M+1) \\ x(1) & x(0) & \cdots & x(-M+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(M-1) & x(M-2) & \cdots & x(0) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = \begin{bmatrix} x(0) & 0 & \cdots & 0 \\ x(1) & x(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ x(M-1) & x(M-2) & \cdots & x(0) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{prew} \\ \mathbf{X}_{nowi} \end{bmatrix}
That is, access is required to the data 𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀 + 1) which is outside the
measurement interval and are all set equal to zero. This method is widely used in LS adaptive
filtering.
Case 3: Postwindowing
N_i = M − 1 and N_f = N + M − 2 implying:

\mathbf{y} = [y(M-1) \;\; y(M) \;\; \cdots \;\; y(N-1) \;\; y(N) \;\; \cdots \;\; y(N+M-2)]^T = [y(M-1) \;\; y(M) \;\; \cdots \;\; y(N-1) \;\; 0 \;\; \cdots \;\; 0]^T

\mathbf{X} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \\ x(N) & x(N-1) & \cdots & x(N-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N+M-2) & x(N+M-3) & \cdots & x(N-1) \end{bmatrix} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \\ 0 & x(N-1) & \cdots & x(N-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x(N-1) \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{nowi} \\ \mathbf{X}_{post} \end{bmatrix}
That is, access is required to the data 𝑥𝑥(𝑁𝑁), 𝑥𝑥(𝑁𝑁 + 1), … , 𝑥𝑥(𝑁𝑁 + 𝑀𝑀 − 2) and signal 𝑦𝑦(𝑁𝑁),
𝑦𝑦(𝑁𝑁 + 1), … , 𝑦𝑦(𝑁𝑁 + 𝑀𝑀 − 2) which are outside the measurement interval and are all set
equal to zero. This method is not widely used in practice.
The matrix \hat{\mathbf{R}} and vector \hat{\mathbf{d}} for the various windowing methods are computed by using the SASP MATLAB function [1] [R,d]=lsmatvec(method,x,M,y) for filter order M, input data vector x, desired response vector y, and method = "prew", "post", "full" or "nowi".
Figure 8-4 shows the FIR LSE filter operating in block processing mode.
Figure 8-4 Block processing implementation of FIR LSE filter (Figure 8.6[1])
Adopting the same notation and theoretical development previously for the FLP, BLP and
smoother MMSE estimators we formulate the LSE estimators by restating the following
equation for estimating the 𝑙𝑙 𝑡𝑡ℎ sample of the signal using M other samples:
e^{(l)}(n) = \sum_{k=0}^{l-1} c_k x(n-k) + x(n-l) + \sum_{k=l+1}^{M} c_k x(n-k) = x(n-l) + \mathbf{x}_l^T(n) \, \mathbf{c}^{(l)}

Equation 8.11

where:

\mathbf{x}_l(n) = [x(n) \;\; x(n-1) \;\; \dots \;\; x(n-(l-1)) \;\; x(n-(l+1)) \;\; \dots \;\; x(n-M)]^T
\mathbf{c}^{(l)} = [c_0 \;\; c_1 \;\; \dots \;\; c_{l-1} \;\; c_{l+1} \;\; \dots \;\; c_M]^T
are (𝑀𝑀 × 1) vectors.
For LSE estimation the predictor co-efficients, c^{(l)}, are held constant over the measurement interval or frame 0 ≤ n ≤ N − 1, and Equation 8.11 can be written compactly as:

\mathbf{e}^{(l)} = \mathbf{x}^{(-l)} + \mathbf{X}_l \, \mathbf{c}^{(l)} \quad \text{for } N_i \le n \le N_f

where:

\mathbf{e}^{(l)} = [e^{(l)}(N_i) \;\; e^{(l)}(N_i+1) \;\; \cdots \;\; e^{(l)}(N_f)]^T
\mathbf{x}^{(-l)} = [x(N_i-l) \;\; x(N_i+1-l) \;\; \cdots \;\; x(N_f-l)]^T

are (N_f − N_i + 1) × 1 vectors, and:

\mathbf{X}_l = \begin{bmatrix} \mathbf{x}_l^T(N_i) \\ \mathbf{x}_l^T(N_i+1) \\ \vdots \\ \mathbf{x}_l^T(N_f) \end{bmatrix}

is a (N_f − N_i + 1) × M matrix.
Hence:

\mathbf{e}^{(l)} = \begin{bmatrix} e^{(l)}(N_i) \\ e^{(l)}(N_i+1) \\ \vdots \\ e^{(l)}(N_f) \end{bmatrix} = \begin{bmatrix} x(N_i-l) \\ x(N_i+1-l) \\ \vdots \\ x(N_f-l) \end{bmatrix} + \begin{bmatrix} \mathbf{x}_l^T(N_i) \\ \mathbf{x}_l^T(N_i+1) \\ \vdots \\ \mathbf{x}_l^T(N_f) \end{bmatrix} \mathbf{c}^{(l)} = \mathbf{x}^{(-l)} + \mathbf{X}_l \, \mathbf{c}^{(l)}
where:
for full windowing : 𝑁𝑁𝑖𝑖 = 0, 𝑁𝑁𝑓𝑓 = 𝑁𝑁 + 𝑀𝑀 − 1,
for no windowing : 𝑁𝑁𝑖𝑖 = 𝑀𝑀, 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1, and
𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀), 𝑥𝑥(𝑁𝑁), 𝑥𝑥(𝑁𝑁 + 1), … , 𝑥𝑥(𝑁𝑁 + 𝑀𝑀 − 1) are all set to zero
The solution for the LSE estimator is given by the usual normal equations:

(\mathbf{X}_l^T \mathbf{X}_l) \, \mathbf{c}_{ls}^{(l)} = -\mathbf{X}_l^T \mathbf{x}^{(-l)} \quad \Leftrightarrow \quad \hat{\mathbf{R}}_l \, \mathbf{c}_{ls}^{(l)} = \hat{\mathbf{d}}_l

Equation 8.12

with an LS error of:

E_{ls}^{(l)} = E_{x^{(-l)}} - \hat{\mathbf{d}}_l^T \mathbf{c}_{ls}^{(l)} = \left\|\mathbf{x}^{(-l)}\right\|^2 + \left(\mathbf{x}^{(-l)}\right)^T \mathbf{X}_l \, \mathbf{c}_{ls}^{(l)}
Special cases:
• LSE SLS (symmetric linear smoother): E_{ls}^{(M/2)} = E^s and \mathbf{c}_{ls}^{(M/2)} = \mathbf{c}_{ls}^{s}
• LSE FLP: E_{ls}^{(0)} = E^f and \mathbf{c}_{ls}^{(0)} = \mathbf{a}_{ls}
• LSE BLP: E_{ls}^{(M)} = E^b and \mathbf{c}_{ls}^{(M)} = \mathbf{b}_{ls}
Example 8.2
Problem: Observations over the interval 0 ≤ n ≤ N − 1 are available for the signal x(n) = α^n, where α is an arbitrary constant. Determine the first-order (M = 1) one-step forward linear predictor via LSE estimation, using the full-windowing and no-windowing methods.

Answer: For the full windowing case the normal equations give:

\hat{r}_{11} \, a_1^{(1)} = -\hat{r}_{12}
E^f = \hat{r}_{22} + \hat{r}_{21} \, a_1^{(1)}
where:

\hat{r}_{11} = \hat{r}_{22} = \sum_{n=0}^{N-1} |x(n)|^2 = \sum_{n=0}^{N-1} |\alpha|^{2n} = \frac{1 - |\alpha|^{2N}}{1 - |\alpha|^2}

\hat{r}_{12} = \hat{r}_{21} = \sum_{n=0}^{N-2} x(n) x(n+1) = \sum_{n=0}^{N-2} |\alpha|^{2n} \alpha = \alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

and by carrying out the required algebraic simplification the solution gives:

a_1^{(1)} = -\frac{\hat{r}_{21}}{\hat{r}_{11}} = -\alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^{2N}}, \quad E^f = \frac{1 - |\alpha|^{2(2N-1)}}{1 - |\alpha|^{2N}}
From which we see:
• The FLP is minimum-phase since it can be shown that |a_1^{(1)}| ≤ 1
• If |α| < 1, then for full windowing \lim_{N\to\infty} a_1^{(1)} = -\alpha and \lim_{N\to\infty} E^f = 1 = x(0), which implies that for large N the LSE FLP approaches the optimum MMSE FLP.
For the no windowing case:

\hat{r}_{11} = \sum_{n=0}^{N-2} |x(n)|^2 = \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}, \quad \hat{r}_{22} = \sum_{n=1}^{N-1} |x(n)|^2 = |\alpha|^2 \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

\hat{r}_{12} = \hat{r}_{21} = \sum_{n=0}^{N-2} x(n) x(n+1) = \alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

By carrying out the required algebraic simplification the solution gives:

a_1^{(1)} = -\frac{\hat{r}_{21}}{\hat{r}_{11}} = -\alpha, \quad E^f = 0
From which we see:
• The FLP is minimum-phase only when |α| < 1
• The no windowing LSE predictor is identical to the optimum MMSE predictor.
For FIR filtering and prediction the following observations can be made with regards to
adopting the order-recursive algorithms developed in [1: 355-360][2: 215-240].
• In the full windowing case (𝑁𝑁𝑖𝑖 = 0 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 + 𝑀𝑀 − 2) the order m correlation matrix
� 𝑚𝑚 is Toeplitz and the order-recursive algorithms of Levinson and Levinson-Durbin can
𝐑𝐑
be applied.
• In the prewindowing case (N_i = 0 and N_f = N − 1) the order-recursive algorithms will require time updatings. A similar result holds for postwindowing but is not of practical interest.
• In the no windowing case (𝑁𝑁𝑖𝑖 = 𝑀𝑀 − 1 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1) the correlation matrix 𝐑𝐑 � 𝑚𝑚
depends on both M and N resulting in complicated time updatings.
In numerical analysis, orthogonal decomposition methods [1, pages 422-431] applied directly to the data matrix X are preferable to the computation and solution of the normal equations whenever numerical stability is important. The "squaring" \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} of the data to form the time-average correlation matrix results in a loss of information. Algorithms that compute the Cholesky factor used in QR factorization directly from X are known as square-root methods.
Now:
‖𝐞𝐞‖ = ‖𝐲𝐲 − 𝐗𝐗𝐗𝐗‖ = ‖𝐲𝐲 − 𝐔𝐔Σ𝐕𝐕𝐻𝐻 𝐜𝐜‖ = ‖𝐔𝐔 𝐻𝐻 𝐲𝐲 − Σ𝐕𝐕𝐻𝐻 𝐜𝐜‖ = ‖𝐲𝐲′ − Σ𝐜𝐜′‖
where 𝐲𝐲′ = 𝐔𝐔 𝐻𝐻 𝐲𝐲, 𝐜𝐜′ = 𝐕𝐕𝐻𝐻 𝐜𝐜 and we have used that fact that ‖𝐔𝐔 𝐻𝐻 𝐞𝐞‖ = ‖𝐞𝐞‖ since U is unitary.
Thus:

\|\mathbf{e}\|^2 = \sum_{i=1}^{r} |y_i' - \sigma_i c_i'|^2 + \sum_{i=r+1}^{N} |y_i'|^2

where r is the rank of X and σ_i are its singular values.
The solution of the LS problem using the SVD method from the above analysis is outlined
below [1, pages 431-438].
8.6 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000 (Chapter 8).
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.