What is a filter?
Consider the signal, {x(t), x[n]}, and desired modifications or operations on the signal to produce the modified signal, {y(t), y[n]}. We model the modifications/operations as a causal, stable LTI system {h(t), h[n]}, which we refer to as a filter with {x(t), x[n]} as the input and {y(t), y[n]} as the output such that:
y(t) = h(t) ∗ x(t) → Y(jω) = H(jω)X(jω)
y[n] = h[n] ∗ x[n] → Y(e^{jΩ}) = H(e^{jΩ})X(e^{jΩ})
Analysis and design of filters involve the zero-state (or steady-state) magnitude and phase responses. Filters which operate on analog signals are called analog filters. Filters which operate on discrete signals are called digital filters.
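As a concrete illustration of y[n] = h[n] ∗ x[n], the MATLAB sketch below convolves a noisy test signal with a simple 5-point moving-average impulse response; the signal and the choice of h[n] are illustrative, not taken from the text.

    % Filtering as convolution: y[n] = h[n] * x[n] (base MATLAB)
    n = 0:99;
    x = sin(2*pi*0.02*n) + 0.5*randn(1,100);  % noisy sinusoid (example input)
    h = ones(1,5)/5;                          % moving-average impulse response h[n]
    y = conv(x, h, 'same');                   % smoothed output y[n]
    H = fft(h, 256);                          % sampled H(e^{jW}) for the frequency-domain view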
Prove that |H(jω)|² = H(s)H(−s)|_{s=jω} and |H(e^{jΩ})|² = H(z)H(z⁻¹)|_{z=e^{jΩ}} by showing that H(−jω) = H*(jω) and H(e^{−jΩ}) = H*(e^{jΩ}), given that the poles and zeros are either real or form complex conjugate pairs.
Linear-phase filter
A linear-phase filter possesses constant group delay for all ω, that is:
t_g(ω) = −dφ(ω′)/dω′|_{ω′=ω} = t_g ⇒ φ(ω) = −ωt_g + C
Symmetry property of impulse response for linear-phase filters
• If the filter impulse response {h(t), h[n]} is even symmetric about the point {t₀, n₀} then the phase response of the filter is {−ωt₀, −Ωn₀} (excluding ±π “wraps”)
• If the filter impulse response {h(t), h[n]} is odd symmetric about the point {t₀, n₀} then the phase response of the filter is {−ωt₀ ± π/2, −Ωn₀ ± π/2} (excluding ±π “wraps”)
Prove the above result by considering the Fourier transform of the impulse response that is
symmetric about the origin and then the Fourier transform for a delayed version of the
impulse response.
Consider a digital filter with transfer function in the following factored form:
H(z) = K(z − z₁)(z − z₂)⋯(z − z_M) / [(z − p₁)(z − p₂)⋯(z − p_N)]
For the filter to be stable all the poles have to lie inside the unit circle in the z-plane, i.e. |pᵢ| < 1. If the filter also has all its zeros inside or on the unit circle in the z-plane, |zᵢ| ≤ 1, then it is minimum-phase: it has the smallest group delay, and smallest deviation from zero phase, at every frequency of interest, of any filter with the same magnitude response |H(e^{jΩ})|.
Digital Filters
• can only be used on discrete signals; since most real-world signals are analog, an extra analog-to-digital and digital-to-analog conversion is necessary, making all processed signals digital (i.e. quantised) rather than simply discrete.
• are more expensive than analog filters due to the need for digital hardware and software.
• are implemented directly in computer logic and/or software, and are sensitive to both data
and coefficient quantisation effects.
• are very flexible and versatile due to direct implementation of any approximation in
hardware or software and unconstrained realisation of any rational transfer function.
[Figure: filter attenuation specifications, showing passband attenuation A_p and stopband attenuation A_s]
Exercise 11.1
Answer:
H(s) = 2012.4 / (s⁵ + 14.82s⁴ + 109.8s³ + 502.6s² + 1422.3s + 2012.4)
Figure 11-3 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.3, pg. 409)
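The response in Figure 11-3 can be reproduced by evaluating the quoted H(s) directly; a minimal MATLAB sketch (base MATLAB; the frequency range is an assumption):

    a = [1, 14.82, 109.8, 502.6, 1422.3, 2012.4];
    w = linspace(0, 10, 1000);          % rad/s axis (range assumed)
    H = 2012.4 ./ polyval(a, 1j*w);     % H(jw) for the quoted H(s)
    plot(w, abs(H));                    % linear magnitude
    figure; plot(w, 20*log10(abs(H)));  % dB magnitude, cf. Figure 11-3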
Example 11.2
Question: Design a Butterworth bandstop filter with 2-dB passband edges of 30 Hz and 100
Hz and 40 dB stopband edges of 50 Hz and 60 Hz.
Answer:
H(s) = [s⁶ + 3.55(10)⁵s⁴ + 4.21(10)¹⁰s² + 1.66(10)¹⁵] / [s⁶ + 8.04(10)²s⁵ + 6.79(10)⁵s⁴ + 2.56(10)⁸s³ + 8.04(10)¹⁰s² + 1.13(10)¹³s + 1.66(10)¹⁵]
Figure 11-4: Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.3C, pg. 411)
Figure 11-5 Chebyshev low-pass prototype magnitude response ([1], Figure 13.10, pg.
413)
Example 11.3
Question: A Chebyshev I lowpass filter is to meet the following specifications: A_p ≤ 1 dB for ω ≤ 4 rad/s and A_s ≥ 20 dB for ω ≥ 8 rad/s.
Answer:
H(s) = 31.4436 / (s³ + 3.9534s² + 19.8145s + 31.4436)
Figure 11-6 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.5, pg. 419)
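A sketch re-deriving this design with MATLAB's cheb1ord/cheby1 analog-prototype options (Signal Processing Toolbox); the returned denominator should match the quoted H(s) up to round-off:

    [n, wp] = cheb1ord(4, 8, 1, 20, 's');  % minimum analog order for the A_p, A_s specs
    [b, a]  = cheby1(n, 1, wp, 's');       % 1-dB ripple analog low-pass
    w = linspace(0, 12, 1000);
    H = freqs(b, a, w);
    plot(w, 20*log10(abs(H)));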
Example 11.4
Answer:
To ensure geometric symmetry the upper stopband edge, ω_s, is decreased to 2.54.
H(s) = 0.34s⁴ / (s⁸ + 0.86s⁷ + 10.87s⁶ + 6.75s⁵ + 39.39s⁴ + 15.27s³ + 55.69s² + 9.99s + 26.25)
Figure 11-7 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.5C, pg. 421)
Example 11.5
Answer:
H(s) = (2.4121s² + 205.8317) / (s³ + 11.2431s² + 60.2942s + 205.8317)
Figure 11-9 Linear and dB magnitude response of 𝐇𝐇(𝐬𝐬) ([1], Figure E13.6, pg. 426)
Figure 11-10 Elliptic low-pass prototype magnitude response ([1], Figure 13.14, pg. 427)
The analysis and design of the elliptic filter is detailed in pgs. 427-432 of [1] and is fairly
complex and beyond the scope of the examinable material (but these may be covered in
required laboratory or assignment work).
Bessel Filters
The analysis and design of the Bessel filter is fairly complex and beyond the scope of the
examinable material (but these may be covered in required laboratory or assignment work).
Properties of Bessel Filters
• Monotonic frequency response (no ripples in either the passband or stopband)
• Provides control over the phase response in the passband by being able to design for a specified value of the group delay, t_g.
• Can design for passband attenuation, A_p, but not stopband attenuation, A_s.
First-order low-pass
H_i(s) = Hω₀/(s + ω₀) where ω₀ is the 3-dB passband cut-off frequency and H_i(0) = H
First-order high-pass
H_i(s) = Hs/(s + ω₀) where ω₀ is the 3-dB passband cut-off frequency and H_i(∞) = H
Second-order low-pass
H_i(s) = Hω₀²/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the passband cut-off frequency, H_i(0) = H, and Q is the quality factor such that for Q < 1/√2 the passband response is monotonic, but for Q > 1/√2 the passband response exhibits overshoot near ω₀.
Second-order band-pass
H_i(s) = H·2ζω₀s/(s² + 2ζω₀s + ω₀²) = H(ω₀/Q)s/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the bandpass centre frequency, H_i(jω₀) = H, ζ is the damping factor and Q = 1/(2ζ) is the quality factor such that B = ω₂ − ω₁ = ω₀/Q where |H_i(jω₁)| = |H_i(jω₂)| = H/√2 (3-dB bandpass attenuation), and ω₁ω₂ = ω₀². Thus, since Q = ω₀/B, a higher Q implies a more peaked response (smaller bandwidth) relative to the centre frequency.
Second-order high-pass
H_i(s) = Hs²/(s² + 2ζω₀s + ω₀²) = Hs²/(s² + (ω₀/Q)s + ω₀²) where ω₀ is the passband cut-off frequency, H_i(∞) = H, and for ζ > 0.707 (Q < 1/√2) the passband response is monotonic, but for ζ < 0.707 (Q > 1/√2) the passband response exhibits overshoot near ω₀.
Biquadratic
H_i(s) = H[s² + (ω_z/Q_z)s + ω_z²]/[s² + (ω_p/Q_p)s + ω_p²] is a general form where both numerator and denominator are second-order polynomials.
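The effect of Q on the second-order low-pass section can be visualised with a short MATLAB sketch (base MATLAB; H = 1, ω₀ = 1 and the two Q values are illustrative):

    H0 = 1; w0 = 1; w = logspace(-2, 1, 2000);
    for Q = [0.5, 2]               % Q < 1/sqrt(2): monotonic; Q > 1/sqrt(2): peaked
        den = [1, w0/Q, w0^2];
        Hw = (H0*w0^2) ./ polyval(den, 1j*w);
        semilogx(w, 20*log10(abs(Hw))); hold on;
    end
    legend('Q = 0.5', 'Q = 2');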
Features
• Due to the buffering provided by the op-amp, a complex H(s) can be realised by an appropriate cascade of standard first-order and second-order filter circuits (modular design)
• Greater control and versatility offered by the modular design approach
• The absence of bulky inductor components makes IC implementation of active RC filters possible, and they are the preferred filters for electronic devices
• The presence of the op-amp requires filter circuits to be powered and limits use to low-power applications.
Exercise 11.2 For an IIR filter to exhibit linear phase the impulse response has to exhibit symmetry, e.g. even symmetry h[n] = h[−n]. Show that this implies H(z) = H(1/z) and that for every pole |pᵢ| < 1 there will be a corresponding pole |pⱼ| > 1, and hence a stable, causal linear-phase IIR filter is not possible.
FIR filters:
• yield larger filter lengths for a given application (i.e. longer delays, more expensive)
• absence of poles implies filter will always be stable (and causal)
• transient response is of limited duration due to finite memory and nonrecursive nature of
filter
• can realise a response with exactly linear phase, thus a distortionless filter is possible
The main advantages of digital filters over analog filters arise when considering FIR filters
since these possess transient responses of limited duration, are always stable, and can be
designed for exactly linear phase. The properties of IIR filters are analogous to analog filters
whereas FIR filters are in a “class of their own”.
Furthermore, FIR filters can also be used to design more than just LP, HP, BP and BS filters.
They are also used for:
• Functional filters like differentiators, phase shifters, etc.
• Adaptive filters used for noise cancellation, etc.
0.8.1 Differentiators
Since x[n] = (1/2π)∫_{−π}^{π} X(e^{jΩ})e^{jΩn} dΩ, then we have
y[n] = dx[n]/dn = (1/2π)∫_{−π}^{π} jΩX(e^{jΩ})e^{jΩn} dΩ
and as y[n] = (1/2π)∫_{−π}^{π} Y(e^{jΩ})e^{jΩn} dΩ we can see that Y(e^{jΩ}) = jΩX(e^{jΩ}) where H(e^{jΩ}) = jΩ, and thus the ideal differentiator is described by:
H(e^{jΩ}) = jΩ, |Ω| ≤ π
Since h[n] has infinite duration, to design an FIR differentiator we must truncate h[n], by application of an appropriate window function, to:
h_N[n] = { h[n]w[n], −n₀ ≤ n ≤ n₀; 0, otherwise }
where n₀ = 0.5(N − 1). A delay of n₀ is then introduced, producing h_N[n − n₀], to ensure causality. The distortion arising from the truncation is controlled by the choice of window function, w[n], and filter length, N.
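As a sketch of this procedure (assuming the ideal differentiator impulse response h[n] = (−1)ⁿ/n for n ≠ 0 and h[0] = 0, which follows from the inverse DTFT of jΩ), the MATLAB fragment below builds a windowed differentiator; the length N and the Hamming window are illustrative choices:

    N = 31; n0 = (N-1)/2;
    n = -n0:n0;
    h = zeros(1, N);
    h(n ~= 0) = ((-1).^n(n ~= 0)) ./ n(n ~= 0);  % ideal h[n] = (-1)^n / n, h[0] = 0
    w = 0.54 + 0.46*cos(pi*n/n0);                % centred Hamming window (explicit)
    hN = h .* w;                                 % truncated/windowed h_N[n]
    % Shifting by n0 samples makes the filter causal with linear phase.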
The above difference equation for the IIR filter can be easily realised by the appropriate
combination of unit delay, summer and multiplier units to produce the output 𝑦𝑦[𝑛𝑛] (see Section
1.3.1). For simplicity, and without loss of generality, we assume 𝑀𝑀 = 𝑁𝑁 (if 𝑀𝑀 ≠ 𝑁𝑁 then we
simply make the respective co-efficients zero). There are two possible realisations as shown
by Figure 11-11:
• Direct form II is a direct implementation of the difference equation in canonical form using
only N delay elements and 2N summers
• Transposed realisation reverses the signal flow thereby producing a realisation with only
N delay elements and N+1 summers.
Figure 11-11 Direct form II (left) and Transposed (right) realisation of an IIR digital
filter ([1], Figure 18.3, pg. 639)
The difference equation for the FIR filter can be easily realised as shown in Figure 11-12.
Figure 11-12 Realisation of an FIR digital filter ([1], Figure 18.1(left), pg. 638)
Cascade realisation
𝐻𝐻(𝑧𝑧) = 𝐻𝐻1 (𝑧𝑧)𝐻𝐻2 (𝑧𝑧) … 𝐻𝐻𝑛𝑛 (𝑧𝑧)
where 𝐻𝐻𝑖𝑖 (𝑧𝑧) is usually a first-order or second-order transfer function unit and 𝐻𝐻(𝑧𝑧) is
represented as a product of such units (e.g. by grouping conjugate roots for the second-order
units and real roots for the first-order units).
Parallel realisation
𝐻𝐻(𝑧𝑧) = 𝐻𝐻1 (𝑧𝑧) + 𝐻𝐻2 (𝑧𝑧) + ⋯ + 𝐻𝐻𝑛𝑛 (𝑧𝑧)
where Hᵢ(z) is usually a first-order or second-order transfer function unit and H(z) is represented as a partial fraction expansion of such units (e.g. by grouping conjugate poles for the second-order units and real poles for the first-order units).
Example 11.6
Question: Realise the digital filter transfer function:
H(z) = (6 − 2z⁻¹) / [(1 − z⁻¹)(1 − (1/6)z⁻¹ − (1/6)z⁻²)]
directly as one structure, in cascade and in parallel.
Cascade:
Parallel:
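The cascade and parallel structure diagrams are not reproduced here, but the decomposition can be checked numerically; a sketch using the H(z) as reconstructed above (residuez is from the Signal Processing Toolbox):

    b = [6, -2];
    a = conv([1, -1], [1, -1/6, -1/6]);   % (1 - z^-1)(1 - (1/6)z^-1 - (1/6)z^-2)
    [r, p, k] = residuez(b, a);           % residues/poles of the parallel sections
    x = [1, zeros(1, 19)];                % unit impulse
    y_direct = filter(b, a, x);           % direct realisation
    y_par = zeros(size(x));
    for i = 1:numel(r)
        y_par = y_par + real(filter(r(i), [1, -p(i)], x));  % sum of 1st-order sections
    end
    max(abs(y_direct - y_par))            % ~0: the two realisations agree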
Software implementation
IIR and FIR filters can be easily implemented in software by storing the digital signal values, x[n], as an array variable and looping through the difference equation at each time instant, for example (runnable MATLAB sketch, with a = [1 A1 … AN], b = [B0 B1 … BM] and y(1:max(N,M)) initialised):

for n = max(N,M)+1 : T
    y(n) = -a(2:N+1)*y(n-1:-1:n-N).' + b*x(n:-1:n-M).';
end
MATLAB also has the following function for implementing a general IIR filter:
Y = FILTER(B,A,X) filters the data in vector X with the filter described
by vectors A and B to create the filtered data Y.
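A minimal usage sketch (the coefficient values are arbitrary examples):

    b = [0.2, 0.2]; a = [1, -0.6];   % example coefficients
    x = randn(1, 100);               % example input signal
    y = filter(b, a, x);             % y[n] = 0.2x[n] + 0.2x[n-1] + 0.6y[n-1]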
And various tools to automate the analysis, design and use of digital filters on signals:
Graphical User Interfaces
fdatool - Filter Design and Analysis Tool.
fvtool - Filter Visualization Tool.
sptool - Signal Processing Tool.
wintool - Window Design and Analysis Tool.
wvtool - Window Visualization Tool.
You can use the sptool GUI to import data signals, design filters, apply the filter to the
signal and view the spectrum of the original and filtered signal and fdatool to visualise and
design filters.
0.10 References
1. A. Ambardar, “Analog and Digital Signal Processing”, 2nd Ed., Brooks/Cole, 1999.
2. S. Haykin, B.V. Veen, “Signals and Systems”, 2nd Ed., Wiley, 2003.
3. B.P. Lathi, “Linear Systems and Signals”, 2nd Ed., Oxford University Press, 2005.
4. L.P. Huelsman, “Active and Passive Analog Filter Design”, McGraw-Hill, 1993.
Random Variable: We are given an experiment specified by the space S, a field of subsets of S called events, and the probability assigned to these events. To every outcome ζ of this experiment, we assign a number X(ζ). We have thus created a function X with domain the set S and range a set of numbers. This function is called a random variable.
Cumulative Distribution Function: The elements of the set S that are contained in the event {X ≤ x} change as the number x takes various values. The probability P{X ≤ x} of the event {X ≤ x} is, therefore, a number that depends on x. This number is denoted by F_X(x) and is called the cumulative distribution function (cdf).
Let 𝑋𝑋 be conditioned on the single event 𝐴𝐴 or multiple events {𝐴𝐴1 , 𝐴𝐴2 , … , 𝐴𝐴𝑛𝑛 }. Then:
Example 1-5: EXAMPLE 4-19 from [1], pgs. 103-105. (a priori vs a posteriori)
Expectation of g(X):
E[g(X)] = ∫_{−∞}^{∞} g(x)f_X(x) dx ≡ Σᵢ g(xᵢ)P[X = xᵢ]
The expectation provides a measure of the average or mean behaviour of g(X).
A complete list of common types of random variables and their distributions can be found in
the front inside cover of [1].
For the uniform distribution on [a, b]:
μ = (a + b)/2, σ² = (b − a)²/12
Memoryless Property:
P{X > t + s | X > s} = P{X > t + s}/P{X > s} = e^{−λ(t+s)}/e^{−λs} = e^{−λt} = P{X > t}
What does it mean? The probability that an event will arrive after time t + s, P{X > t + s | X > s}, given the event has not yet arrived by time s, {X > s}, is the same as the probability the event will arrive after time t; that is, it does not depend on s.
NOTE: We can show that F_X(x) = P{X ≤ x} = 1 − e^{−λx} and hence P{X > x} = e^{−λx}
Meaning: Let 𝑋𝑋 be an event of interest with two outcomes, 𝑋𝑋 = 1 for success, 𝑋𝑋 = 0 for
failure, then the Bernoulli distribution describes individual random failure events where 𝑝𝑝 is
the probability of success and 𝑞𝑞 = 1 − 𝑝𝑝 is the probability of failure.
Poisson Distribution(λ)
P[X = k] = (λᵏ/k!)e^{−λ}, for k = 0, 1, 2, …
μ = λ, σ² = λ
NOTE: Can you see how the Binomial distribution and Poisson distribution are related?
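As a quick numerical illustration of that relation (a sketch; n and p are arbitrary, and nchoosek may warn about precision for very large n), the Binomial(n, p) pmf approaches the Poisson(λ = np) pmf for large n and small p:

    n = 1000; p = 0.005; lambda = n*p;
    k = 0:20;
    pm_binom   = arrayfun(@(kk) nchoosek(n, kk)*p^kk*(1-p)^(n-kk), k);
    pm_poisson = lambda.^k .* exp(-lambda) ./ factorial(k);
    max(abs(pm_binom - pm_poisson))   % small: the two pmfs nearly coincide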
Joint cdf
F_XY(x, y) = P{X ≤ x, Y ≤ y}
Joint pdf
f_XY(x, y) = ∂²F_XY(x, y)/∂x∂y ≡ Σᵢ Σₖ p_ik δ(x − xᵢ, y − yₖ)
where p_ik = P[X = xᵢ, Y = yₖ] and Σᵢ Σₖ p_ik = 1.
Important Properties/Relations
1. F_XY(x, y) = ∫_{−∞}^{x} ∫_{−∞}^{y} f_XY(u, v) du dv
2. P{(X, Y) ∈ D} = ∬_D f_XY(x, y) dx dy
3. ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) dx dy = 1
Marginal pdf
f_X(x) = ∫_{−∞}^{∞} f_XY(x, y) dy ≡ Σᵢ pᵢ δ(x − xᵢ)
f_Y(y) = ∫_{−∞}^{∞} f_XY(x, y) dx ≡ Σₖ qₖ δ(y − yₖ)
where pᵢ = Σₖ p_ik = Σₖ P[X = xᵢ, Y = yₖ] = P[X = xᵢ],
qₖ = Σᵢ p_ik = Σᵢ P[X = xᵢ, Y = yₖ] = P[Y = yₖ] and Σᵢ pᵢ = Σₖ qₖ = 1
Independence
If X and Y are independent then we have that:
P{X ≤ x, Y ≤ y} = P{X ≤ x}P{Y ≤ y}
from which it follows that:
F_XY(x, y) = F_X(x)F_Y(y)
f_XY(x, y) = f_X(x)f_Y(y)
Conditional pdf
From Bayes’ theorem for conditional probability:
f_X(x|y) = f_XY(x, y)/f_Y(y) = f_Y(y|x)f_X(x)/f_Y(y) = f_Y(y|x)f_X(x) / ∫_{−∞}^{∞} f_Y(y|x)f_X(x) dx
p_X(xᵢ|yₖ) = P[X = xᵢ | Y = yₖ] = P[X = xᵢ, Y = yₖ]/P[Y = yₖ] = p_ik/qₖ = p_XY(xᵢ, yₖ)/p_Y(yₖ)
Example 1-9: EXAMPLE 6-42 from [1], pg, 224,225 (a priori vs a posteriori)
Covariance of X and Y
C_XY = c₁₁ = E[(X − E[X])(Y − E[Y])] = E[XY] − E[X]E[Y]
The covariance is a measure of the joint spread or variation of (X, Y) about the mean (E[X], E[Y]). NOTE: COV(X, Y) = C_XY
THINK: Show that E[X + Y] = E[X] + E[Y]. What is VAR[X + Y]?
Important Properties
• If E[XY] = 0 then X and Y are orthogonal.
• If COV(X, Y) = 0 then X and Y are uncorrelated.
• If X and Y are independent then E[XY] = E[X]E[Y], thus independent random variables are uncorrelated (since COV(X, Y) = 0); a numerical check follows below.
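A sanity-check sketch of the last property (the distributions are chosen arbitrarily):

    Nsamp = 1e6;
    X = randn(1, Nsamp);                 % N(0,1)
    Y = rand(1, Nsamp);                  % Uniform[0,1], generated independently of X
    Cxy = mean(X.*Y) - mean(X)*mean(Y);  % sample estimate of COV(X,Y)
    % Cxy is ~0, within sampling error of order 1/sqrt(Nsamp)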
The correlation co-efficient ρ_XY measures the degree of correlation between X and Y, as can be seen from Figure 1-6.
Figure 1-6: Jointly Gaussian pdf (a) ρ = 0; (b) ρ = −0.9 [2, Figure 5.25]
NOTE: If X and Y are jointly Gaussian and uncorrelated (ρ = 0) they are also independent. (Show this!)
Define n jointly related random variables, X₁, X₂, …, Xₙ, as a vector random variable:
X = [X₁ X₂ ⋯ Xₙ]ᵀ with instances x = [x₁ x₂ ⋯ xₙ]ᵀ and integration increments dx = dx₁dx₂⋯dxₙ
Independence
The collection of random variables X₁, X₂, …, Xₙ are independent if:
f_X(x) = f_{Xn}(xₙ)f_{Xn−1}(xₙ₋₁)⋯f_{X2}(x₂)f_{X1}(x₁)
This also implies:
f_{Xn}(xₙ | x₁, x₂, …, xₙ₋₁) = f_{Xn}(xₙ)
Expectation of g(X)
E[g(X)] = ∫_{−∞}^{∞} ⋯ ∫_{−∞}^{∞} g(x)f_X(x) dx ≡ Σ_{x₁} Σ_{x₂} ⋯ Σ_{xₙ} g(x)p_X(x)
Mean Vector, μ
μ_X = E[X] = [E[X₁] E[X₂] ⋯ E[Xₙ]]ᵀ
Correlation Matrix, R
Let R_ij = E[XᵢXⱼ] define the correlation between Xᵢ and Xⱼ; then R_X = E[XXᵀ] is the n×n matrix whose (i, j) entry is R_ij.
Covariance Matrix, C
Let C_ij = E[(Xᵢ − μᵢ)(Xⱼ − μⱼ)] = R_ij − μᵢμⱼ be the covariance between Xᵢ and Xⱼ for i ≠ j, and C_ii = E[(Xᵢ − μᵢ)²] be the variance of Xᵢ; then:
C_X = E[(X − μ_X)(X − μ_X)ᵀ] = R_X − μ_X μ_Xᵀ
where the diagonal terms {C_ii} are the variances of the respective random variables in X and the off-diagonal terms {C_ij}, i ≠ j, are the covariances between each pair of random variables.
The covariance matrix is a very important quantity in pattern recognition and the analysis of high-dimensional random data/measurements as it defines the spread and tendencies/patterns of the data.
The random variables X = [X₁, X₂, …, Xₙ]ᵀ are said to be jointly Gaussian if their joint pdf is given by the multivariate Gaussian distribution function:
f_X(x) = exp{−½(x − μ_X)ᵀC_X⁻¹(x − μ_X)} / [(2π)^{n/2}|C_X|^{1/2}]
Consider applying a linear transformation matrix, A, to n-dimensional random data samples represented by the vector random variable, X:
Y = [Y₁ Y₂ ⋯ Yₙ]ᵀ = AX, i.e. Yᵢ = Σⱼ a_ij Xⱼ
What can we say about the transformed data samples and the corresponding transformed vector random variable, Y?
Expectation of Y
μ_Y = E[Y] = E[AX] = A·E[X] = Aμ_X
Covariance matrix of Y
C_Y = A C_X Aᵀ
Exercise: Show that C_Y = A C_X Aᵀ and C_XY = C_X Aᵀ
Let Sₙ = X₁ + X₂ + ⋯ + Xₙ be the sum of n iid random variables each with mean E[X] = μ_X and variance σ_X². Let Zₙ be the zero-mean, unit-variance random variable defined by:
Zₙ = (Sₙ − μ_Sn)/σ_Sn = (Sₙ − nμ_X)/(σ_X√n)
then for any random variable (continuous or discrete):
lim_{n→∞} P[Zₙ ≤ z] = lim_{n→∞} F_Zn(z) = (1/√(2π)) ∫_{−∞}^{z} e^{−x²/2} dx
which is the cdf of the zero-mean, unit-variance Gaussian distribution. Furthermore, for any continuous random variable we can also state:
lim_{n→∞} f_Zn(z) = N(0, 1) = (1/√(2π)) e^{−z²/2}
That is, the normalised sum of any sequence of iid random variables, with arbitrary pdf, will converge to a Gaussian.
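A minimal simulation sketch of this convergence (the underlying distribution and n are arbitrary choices):

    n = 30; trials = 1e5;
    mu = 0.5; sigma = sqrt(1/12);           % mean/std of Uniform[0,1]
    S = sum(rand(n, trials), 1);            % S_n for each trial
    Z = (S - n*mu) ./ (sigma*sqrt(n));      % normalised sums Z_n
    histogram(Z, 'Normalization', 'pdf'); hold on;
    z = linspace(-4, 4, 200);
    plot(z, exp(-z.^2/2)/sqrt(2*pi));       % N(0,1) pdf for comparison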
In engineering, estimation is the task of recovering a desired signal of interest from noisy observations of it. In this context the signals are random and, being time-varying, are deemed random signals. However, the idea of estimation can also be applied to random variables and observations or sequences of random variables.
Assume Y is the observation from which we want to form an estimate of X. For our purposes we assume X and Y are jointly distributed random variables. From Bayes’ theorem we have:
f_X(x|y) = f_Y(y|x)f_X(x)/f_Y(y) ↔ posterior = (likelihood · prior)/evidence
The problem with MAP estimation is that we may not have any knowledge of the prior
distribution. In such cases we assume the most unbiased opinion of the prior, that is, a
uniform prior distribution, and form the ML estimate.
We want to form an estimate for X as a function of the observation, Y, that is:
X̂ = g(Y)
such that the mean square error (MSE):
e = E[(X − g(Y))²]
is minimised. Thus X̂ is known as the minimum MSE (MMSE) estimator and is given by g(·) as follows:
x̂ = g*(y) = argmin_{g(·)} E[(X − g(Y))²] = E[X | Y = y]
The function g*(y) = E[X | Y = y] is known as the regression curve.
In cases where E[X | Y = y] is not known, or is not analytically tractable to derive, we consider specific functional forms based on linear functions.
It is intuitively satisfying that the optimal constant estimate of X is its expectation and, not surprisingly, the observations Y are ignored; this Case 1 is not very practical.
Case 3: X̂ = g(Y) = a*Y + b* (linear curve, unbiased) linear MMSE estimator
e* = min_{a,b} E[(X − aY − b)²]
By taking dE[(X − aY − b)²]/db = E[2(X − aY − b)(−1)] = 0 we get:
b* = E[X] − aE[Y]
Then by taking dE[(X − aY − b*)²]/da = E[2(X − aY − b*)(−Y)] = 0 we get:
a* = COV(X, Y)/VAR[Y] = ρ_XY σ_X/σ_Y
and thus:
X̂ = a*Y + b* = ρ_XY σ_X (Y − E[Y])/σ_Y + E[X]
from which we see that E[X̂] = E[X] and this is an unbiased estimator.
The MSE can be shown to be (see pg. 335 of [2] for the details):
e* = E[(X − (a*Y + b*))²] = VAR[X] − a*COV(X, Y) = VAR[X](1 − ρ²_XY)
The estimator relies on how correlated Y is with X. If there is maximum correlation (ρ_XY = ±1) then the best estimator simply rescales and shifts the mean and variance of Y to match those of X. Conversely, if there is no correlation (ρ_XY = 0) then the best estimator is just the mean E[X], as we had in Case 1; that is, the observations are not used since they provide no information or knowledge about X.
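A sketch of the linear MMSE estimator computed from sample moments (the joint model for X and Y is an arbitrary illustrative choice):

    Nsamp = 1e5;
    X = randn(1, Nsamp);
    Y = 2*X + 0.5*randn(1, Nsamp);                 % noisy observation of X (assumed model)
    a = (mean(X.*Y) - mean(X)*mean(Y)) / var(Y);   % a* = COV(X,Y)/VAR[Y]
    b = mean(X) - a*mean(Y);                       % b* = E[X] - a*E[Y]
    Xhat = a*Y + b;
    mse = mean((X - Xhat).^2);                     % approaches VAR[X](1 - rho^2)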
Example 1-22: EXAMPLE 6.28 from [2], pgs. 337
NOTE: The “gotcha” with such statistical estimators is that we require knowledge of the first- and second-order statistics of both the given observations Y and the signal that needs to be estimated, X.
The orthogonality principle for MSE estimation states that the estimation error X − X̂ is orthogonal to the observation data Y, that is:
E[(X − X̂)Y] = 0
Consider forming an estimate of the desired random variable, D, as a linear combination of n observations, X₁, X₂, …, Xₙ:
D̂ = a₁X₁ + a₂X₂ + ⋯ + aₙXₙ
The n observations could be from n joint random variables or n observations of a random process. We need to choose the {aᵢ} such that the MSE:
e = E[(D − D̂)²]
is minimised. By invoking the principle of orthogonality we can state the answer comes from solving:
E[(D − {a₁X₁ + a₂X₂ + ⋯ + aₙXₙ})Xᵢ] = 0, i = 1, 2, …, n
Define R_ji = E[XⱼXᵢ] and R_Di = E[DXᵢ]; then we are required to solve the following n simultaneous equations:
R₁₁a₁ + R₂₁a₂ + ⋯ + Rₙ₁aₙ = R_D1
R₁₂a₁ + R₂₂a₂ + ⋯ + Rₙ₂aₙ = R_D2
⋯
R₁ₙa₁ + R₂ₙa₂ + ⋯ + Rₙₙaₙ = R_Dn
Let X = [X₁ X₂ ⋯ Xₙ]ᵀ, a = [a₁ a₂ ⋯ aₙ]ᵀ, R_X = E[XXᵀ] be the correlation matrix and r_DX = E[DX] = [R_D1 R_D2 ⋯ R_Dn]ᵀ; then we can express the equations in matrix form:
R_X a = r_DX
The optimal linear estimator is then:
a* = R_X⁻¹ r_DX and D̂ = a*ᵀX
And the mean square error of the optimum linear estimator is:
e* = E[(D − a*ᵀX)²] = E[(D − a*ᵀX)D] − E[(D − a*ᵀX)a*ᵀX]
= E[(D − a*ᵀX)D] = VAR[D] − a*ᵀ r_DX
since E[(D − a*ᵀX)a*ᵀX] = 0 due to the orthogonality principle.
Example 1-23: EXAMPLE 6.30 from [2], pg. 340,341
NOTE: In the context of the observations arising from sampling a random process across time, and the estimator being represented as an optimal FIR filter, the equations
R_X a = r_DX
are known as the Wiener-Hopf equations and their solution
a* = R_X⁻¹ r_DX
gives the co-efficients of the Wiener FIR filter.
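A compact numerical sketch of solving the Wiener-Hopf equations (the signal model and filter order are illustrative assumptions):

    Nsamp = 1e5; order = 3;
    s = filter(1, [1, -0.8], randn(1, Nsamp));  % AR(1) desired process (assumed)
    x = s + 0.5*randn(1, Nsamp);                % noisy observations
    X = zeros(order, Nsamp-order+1);            % stacked observation vectors
    for k = 1:order
        X(k, :) = x(order-k+1 : end-k+1);       % row k holds x(n-k+1)
    end
    d  = s(order:end);                          % aligned desired samples
    Rx = (X*X.')/size(X, 2);                    % sample correlation matrix R_X
    rd = (X*d.')/size(X, 2);                    % sample cross-correlation r_DX
    a  = Rx \ rd;                               % Wiener solution a* = R_X^{-1} r_DX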
1.8 References
1. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic Processes”, 4th Ed., McGraw-Hill, 2002.
2. A. Leon-Garcia, “Probability and Random Processes for Electrical Engineering”, 3rd Ed.,
Addison Wesley, 2008.
Consider an experiment specified by the outcomes or sample point, 𝑠𝑠𝑖𝑖 , from some sample
space S.
Example 2-1: Consider throwing a die, then 𝑆𝑆 = {1,2,3,4,5,6} is a discrete space and 𝑠𝑠𝑖𝑖 can
take on any one of the 6 discrete values.
Example 2-2: Consider any real number, 𝑠𝑠𝑖𝑖 , from 0 to 1, then 𝑆𝑆 = [0,1]
We further assume that each sample point of the sample space evolves with time, that is we
assign each sample point, 𝑠𝑠𝑖𝑖 , a function of time, 𝑥𝑥𝑖𝑖 (𝑡𝑡).
The evolution of 𝑠𝑠𝑖𝑖 with respect to time, 𝑥𝑥𝑖𝑖 (𝑡𝑡), is called a realisation or sample function. The
sample space or ensemble consisting of the collection of sample functions constitute a
random process or stochastic process. The “randomness” arises both from the random
selection of the sample point and the possible random evolution with time. This is depicted in
Figure 2-1.
Figure 2-1 An ensemble of sample functions ([3], Figure 1.1, pg. 32)
Example 2-3: Consider the sample space S = {1,2,3,4,5,6}, define the realisation as 𝑥𝑥𝑖𝑖 (𝑡𝑡) =
𝑠𝑠𝑖𝑖 𝑒𝑒 −𝑡𝑡 𝑢𝑢(𝑡𝑡). See Figure 2-2 for example realisations
Figure 2-2 Sample functions 𝒙𝒙𝒊𝒊 (𝒕𝒕) = 𝒔𝒔𝒊𝒊 𝒆𝒆−𝒕𝒕 𝒖𝒖(𝒕𝒕) ([4], Figure 4.11, pg. 161)
(NOTE: 𝒙𝒙(𝒕𝒕; 𝛚𝛚𝒊𝒊 ) ≡ 𝒙𝒙𝒊𝒊 (𝒕𝒕))
At each time instant, 𝑡𝑡0 , we have the value 𝑥𝑥𝑖𝑖 (𝑡𝑡0 ). For each different possible sample point,
𝑠𝑠𝑖𝑖 , at the same fixed time instant, 𝑡𝑡0 , the collection of values 𝑥𝑥𝑖𝑖 (𝑡𝑡0 ) over the sample space
constitute a random variable denoted by 𝑋𝑋(𝑡𝑡0 ). That is, at each time instant, the value of the
random process across all possible realisations, constitutes a random variable.
Example 2-4: From Example 2-3, X(1) is a random variable taking values e⁻¹, 2e⁻¹, …, 6e⁻¹, each with probability 1/6 assuming a fair roll of the die.
Exercise 2-1
What is 𝑋𝑋(3)?
Exercise 2-2
Sketch a realisation of {Xₙ}, n ≥ 0, from Example 2-6. Sketch another realisation and explain why the realisations are different. Does it make sense to talk about a sample point and its evolution with time in this context? Explain!
An important special case is that of M=2 in which only the second-order statistics need be
specified.
Example 2-7: The random process is defined by 𝑋𝑋(𝑡𝑡) = θ where θ is a random variable
uniformly distributed on [−1,1]. For this random process, each realisation is a constant signal
of amplitude θ.
Example 2-8: The random process is defined by 𝑋𝑋(𝑡𝑡) = θ𝑡𝑡 where θ is a random variable
uniformly distributed on [−1,1]. For this random process, each realisation is a line of slope θ
which goes through the origin.
Figure 2-3 The mean of a random process ([4], Figure 4.15, pg. 165)
(NOTE: 𝒙𝒙(𝒕𝒕; 𝛚𝛚𝒊𝒊 ) ≡ 𝒙𝒙𝒊𝒊 (𝒕𝒕))
Exercise 2-3
Repeat Example 2-9 for the random process of Example 2-8
Answer:
m_X(t) = ∫_{−1}^{1} θt·(1/2) dθ = [tθ²/4]_{−1}^{1} = 0
Example 2-10: Consider the random process from Example 2-7. We note that f_Θ(θ) = 1/2 for −1 ≤ θ ≤ 1 (and 0 otherwise) and we also have that X(t) = g(t; θ) = θ, hence:
R_XX(t₁, t₂) = E[X(t₁)X(t₂)] = E[θ²] = ∫_{−1}^{1} θ²·(1/2) dθ = [θ³/6]_{−1}^{1} = 1/3
which we note is independent of t₁ and t₂ (are we surprised?).
The autocovariance C_XX(t₁, t₂) of the random process, X(t), can also be defined as:
C_XX(t₁, t₂) = E[{X(t₁) − m_X(t₁)}{X(t₂) − m_X(t₂)}] = R_XX(t₁, t₂) − m_X(t₁)m_X(t₂)
Equation 2-3
NOTE 1: For zero-mean random processes the autocovariance is equal to the autocorrelation.
NOTE 2: We also have that VAR[X(t)] = C_XX(t, t) = E[(X(t) − E[X(t)])²] = E[X²(t)] − E[X(t)]², which is the variance of X(t). For zero-mean signals, VAR[X(t)] = E[X²(t)].
Exercise 2-5
Let 𝑔𝑔(𝑡𝑡) be the rectangular pulse shown below:
The random process X(t) is defined as X(t) = A·g(t) where A assumes the values ±1 with equal probability. (a) Find the pmf of X(t) and hence the expression for m_X(t).
Answer:
Since g(t) is zero outside the interval [0,1], then P[X(t) = 0] = 1 for t ∉ [0,1]. On the other hand, for t ∈ [0,1] we have P[X(t) = 1] = P[X(t) = −1] = 0.5. Hence we can state:
m_X(t) = (1)P[X(t) = 1] + (−1)P[X(t) = −1] = 0 for t ∈ [0,1], and m_X(t) = 0 for t ∉ [0,1]
(b) Find the joint pmf of X(t₁), X(t₂) and hence R_XX(t₁, t₂)
Answer:
For t₁ ∈ [0,1], t₂ ∈ [0,1]:
P[X(t₁) = ±1, X(t₂) = ±1] = 0.5 for (+1, +1) and (−1, −1) (i.e. same value)
P[X(t₁) = ±1, X(t₂) = ∓1] = 0 for (+1, −1) and (−1, +1)
For t₁ ∈ [0,1], t₂ ∉ [0,1]:
P[X(t₁) = ±1, X(t₂) = 0] = 0.5 for (+1, 0) and (−1, 0)
For t₁ ∉ [0,1], t₂ ∈ [0,1]:
P[X(t₁) = 0, X(t₂) = ±1] = 0.5 for (0, +1) and (0, −1)
For t₁ ∉ [0,1], t₂ ∉ [0,1]:
P[X(t₁) = 0, X(t₂) = 0] = 1 for (0, 0)
Considering only the non-zero contributions to the summation from Equation 2-2, which occur for the case t₁ ∈ [0,1], t₂ ∈ [0,1]:
R_XX(t₁, t₂) = (+1·+1)P[X(t₁) = +1, X(t₂) = +1] + (−1·−1)P[X(t₁) = −1, X(t₂) = −1] = 1
and thus we can state:
R_XX(t₁, t₂) = 1 for t₁ ∈ [0,1] and t₂ ∈ [0,1], and 0 otherwise
Independent / Uncorrelated
Two random processes, 𝑋𝑋(𝑡𝑡) and 𝑌𝑌(𝑡𝑡), are deemed independent if for all 𝑡𝑡1 and for all 𝑡𝑡2 the
random variables 𝑋𝑋(𝑡𝑡1 ) and 𝑌𝑌(𝑡𝑡2 ) are independent. Similarly, 𝑋𝑋(𝑡𝑡) and 𝑌𝑌(𝑡𝑡) are uncorrelated
if 𝑋𝑋(𝑡𝑡1 ) and 𝑌𝑌(𝑡𝑡2 ) are uncorrelated for all 𝑡𝑡1 , 𝑡𝑡2
NOTE: Two random variables, 𝑋𝑋 and 𝑌𝑌, are independent if
𝑓𝑓𝑋𝑋𝑋𝑋 (𝑥𝑥, 𝑦𝑦) = 𝑓𝑓𝑋𝑋 (𝑥𝑥)𝑓𝑓𝑌𝑌 (𝑦𝑦) and they are uncorrelated if COV[𝑋𝑋, 𝑌𝑌] = 𝐸𝐸[(𝑋𝑋 − 𝐸𝐸[𝑋𝑋])(𝑌𝑌 −
𝐸𝐸[𝑌𝑌])] = 𝐸𝐸[𝑋𝑋𝑋𝑋] − 𝐸𝐸[𝑋𝑋]𝐸𝐸[𝑌𝑌] = 0, where for zero-mean random variables this reduces to
𝐸𝐸[𝑋𝑋𝑋𝑋] = 0.
NOTE: If X and Y are independent then they are also uncorrelated, but if X and Y are
uncorrelated this does NOT imply they are independent, thus independence is a more powerful
constraint/property than (un)correlation.
A random process X(t) is a Gaussian random process if the k-dimensional random variable vector [X₁ X₂ ⋯ Xₖ], where Xⱼ = X(tⱼ), is jointly Gaussian for all k and for any and all choices of tⱼ.
Exercise 2-7
What is an IID Gaussian random process? What is its joint pdf?
Answer: An IID Gaussian sequence is a discrete-time random process Xₙ which is an IID random process with Gaussian pdf f_X(x) = e^{−(x−m)²/2σ²}/√(2πσ²), so that:
f_X(x) = f_{X1X2⋯Xk}(x₁, x₂, …, xₖ) = f_X(x₁)f_X(x₂)⋯f_X(xₖ) = e^{−Σᵢ₌₁ᵏ(xᵢ−m)²/2σ²}/(2πσ²)^{k/2}
The IID Gaussian random process has m_X(t) = m and C_XX(tᵢ, tⱼ) = σ²δ_ij, i.e. K = σ²I.
Figure 2-4 (a) Realisation of Bernoulli process, 𝑰𝑰𝒏𝒏 , indicating that a light bulb fails (𝑰𝑰𝒏𝒏 =
𝟏𝟏) and is replaced on day n. (b) Realisation of the Binomial process, 𝑺𝑺𝒏𝒏 , counting the
number of light bulbs that have failed up to day n. ([1], Figure 9.4, pg. 499)
Random Step
Define the random step as Dₙ = 2Iₙ − 1 where Iₙ is a Bernoulli random process. Then Dₙ is an IID random process taking values from the discrete-valued sample space {−1, +1}, where −1 is a step to the left and +1 a step to the right (see Figure 2-5(a)), with P[D = 1] = p, P[D = −1] = 1 − p and:
m_D(n) = E[Dₙ] = E[2Iₙ − 1] = 2E[Iₙ] − 1 = 2p − 1,
VAR[Dₙ] = VAR[2Iₙ − 1] = 4VAR[Iₙ] = 4p(1 − p).
Random Walk
The corresponding sum process, Wₙ = Σⱼ₌₁ⁿ Dⱼ, is known as the one-dimensional random walk, and depicts a random walk comprising a sequence of random left and right steps; see Figure 2-5(b).
Figure 2-5 (a) Realisation of the random step process, 𝑫𝑫𝒏𝒏 , (b) Realisation of the
corresponding random walk process, 𝑾𝑾𝒏𝒏 ([1], Figure 9.5, pg. 500)
The AR(p), depicted in Figure 2-7 for p = 1, is a key process for time series signal modelling.
In the Poisson process events occur at random instants of time and at an average rate of λ
events per second. Practical, real-world examples of this process:
• modelling the arrival of network packets at a router or bridge in a communications
network and the consequent effect of flow and throughput (queuing theory)
• modelling the random breakdown of components in a system (reliability theory)
• modelling of car traffic flow and arrival of cars at a junction (queuing theory)
The Poisson process can be considered the continuous-time version of the Binomial Counting
process and is derived as follows:
• Assume the time interval [0, t] is divided into n subintervals of very short duration δ = t/n. Each subinterval is sufficiently short that the events can be treated as Bernoulli random variables (i.e. only one event or none in each subinterval, each event occurring with probability p)
• Let N(t) be the number of event occurrences in the time interval [0, t]. The expected number of event occurrences or indications over the interval [0, t] (comprising n subintervals) is the mean of the binomial counting or sum process, E[Sₙ] = np. Since events occur at the rate of λ events per second we must have that np = λt.
• If we let n → ∞ (i.e. δ → 0) and p → 0 while np = λt remains fixed, then we can show that:
P[Sₙ = k] = C(n, k)pᵏ(1 − p)ⁿ⁻ᵏ ≅ ((np)ᵏ/k!)e^{−np} ⇒ P[N(t) = k] = ((λt)ᵏ/k!)e^{−λt}
Thus we have the Poisson process, N(t) (so named since the pmf of the random variable N(t₀) is the Poisson random variable), of rate λ:
P[N(t) = k] = ((λt)ᵏ/k!)e^{−λt} for k = 0, 1, …
where we note that E[N(t)] = λt and we can show that VAR[N(t)] = λt.
Since the Poisson process is based on the Binomial sum process it possesses stationary and independent increments, that is, we can say that:
P[N(t₁) = i, N(t₂) = j] = P[N(t₁) = i]P[N(t₂) − N(t₁) = j − i]
= P[N(t₁) = i]P[N(t₂ − t₁) = j − i]
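A simulation sketch of the Poisson counting process via iid exponential inter-arrival gaps (the rate and horizon are arbitrary choices):

    lambda = 0.25; Tmax = 60;
    gaps = -log(rand(1, 1000))/lambda;   % iid Exp(lambda) inter-arrival times
    S = cumsum(gaps);                    % event (arrival) times S_n
    S = S(S <= Tmax);
    N10 = sum(S <= 10);                  % N(10): events in [0, 10]
    % Over many runs, mean(N10) -> lambda*10 and var(N10) -> lambda*10.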
Example 2-12: Inquiries arrive at a recorded message device at the Poisson rate of 15 enquiries
per minute. Find the expression for the probability that in a 1-minute period, 3 inquiries arrive
during the first 10 seconds and 2 enquiries arrive during the last 15 seconds.
The arrival rate in seconds is λ = 15/60 = 0.25 inquiries per second and the probability of
interest is: 𝑃𝑃[𝑁𝑁(10) = 3, 𝑁𝑁(60) − 𝑁𝑁(45) = 2]
Exercise 2-8
Evaluate the expression in Example 2-12 making use of independent increments.
Answer:
P[N(10) = 3, N(60) − N(45) = 2]
= P[N(10) = 3]P[N(60) − N(45) = 2]
= P[N(10) = 3]P[N(15) = 2]
= [((0.25·10)³e^{−0.25·10})/3!]·[((0.25·15)²e^{−0.25·15})/2!]
Event arrival times in [0, t] are distributed uniformly and independently
Consider a single event arrival (i.e. network packet, telephone call, etc.), X, which we are told arrives sometime in the interval [0, t]. Consider the cdf, P[X ≤ x]: this is the probability that the event arrives by time x where 0 < x < t (i.e. [N(x) = 1]), given the event must have arrived by time t (i.e. [N(t) = 1]). That is:
P[X ≤ x] = P[N(x) = 1 | N(t) = 1]
= P[N(x) = 1, N(t) = 1]/P[N(t) = 1]
= P[N(x) = 1, N(t) − N(x) = 0]/P[N(t) = 1]
= P[N(x) = 1]P[N(t − x) = 0]/P[N(t) = 1]
= (λx e^{−λx} e^{−λ(t−x)})/(λt e^{−λt}) = x/t
Hence the event arrival time must be uniformly distributed in the interval [0, t] (since the cdf corresponds to that of a uniform distribution). It can be shown that if the number of arrivals in the interval [0, t] is k then the individual arrival times are distributed independently and uniformly in the interval.
Exercise 2-9
Suppose two customers arrive at a shop during a two-minute period. Find the probability that
both customers arrived during the first minute.
Answer:
Since the arrival times of each customer are uniformly distributed and independent, customer a arrives during the first minute with probability P[X_a ≤ 1] = 1/2, and thus both customers, a and b, arrive during the first minute with probability P[X_a ≤ 1]P[X_b ≤ 1] = (1/2)(1/2) = 1/4.
Consider a Poisson process where events arrive randomly at times Sₙ, n = 1, 2, …, and Sₙ is the time at which the nth event occurs, that is:
Sₙ = T₁ + T₂ + ⋯ + Tᵢ + ⋯ + Tₙ
where the Tᵢ are the iid exponential inter-arrival times between the (i−1)th and ith events.
Assume that each event results in a system response, h(t). Thus we can define a new random process, the shot noise process, X(t), given by:
X(t) = Σₙ₌₁^∞ h(t − Sₙ)
The term ‘shot noise’ is applied to the fluctuations in electronic and photonic circuits due to the effect of random electrons/photons. For example, define X(t) as the shot noise effect from an idle photodetector where events correspond to sporadic photoelectrons hitting the detector, giving rise to a temporary current pulse h(t) at each event.
Figure 2-8 Shot noise process ([1], Figure 9.11(b), pg. 513)
The transmitter will transmit the desired signal of interest, 𝑋𝑋(𝑡𝑡), and the receiver will receive
the observed signal, 𝑌𝑌(𝑡𝑡), which has been corrupted by noise, usually in simple additive
fashion:
𝑌𝑌(𝑡𝑡) = 𝑋𝑋(𝑡𝑡) + 𝑁𝑁(𝑡𝑡)
In theory both X(t) and N(t) should be considered as random processes; in practice it is the noise random process, N(t), and its properties, that are important. Also, the noise process N(t) is assumed to be uncorrelated with the signal process X(t), and usually E[X(t)] = E[N(t)] = 0 (the signals are zero-mean).
Example 2-13
Specifically, the random signal representing the carrier is X(t) = A cos(ω_c t + Θ) where Θ is uniformly distributed over the interval (−π, π), that is, f_Θ(θ) = 1/2π for −π ≤ θ ≤ π (and 0 otherwise). We have that:
m_X(t) = E[A cos(ω_c t + Θ)] = ∫_{−π}^{π} A cos(ω_c t + θ)f_Θ(θ) dθ = (A/2π)∫_{−π}^{π} cos(ω_c t + θ) dθ = 0
and we can derive that:
C_XX(t₁, t₂) = R_XX(t₁, t₂) = E[A cos(ω_c t₁ + Θ)A cos(ω_c t₂ + Θ)]
= (A²/2π)∫_{−π}^{π} (1/2){cos(ω_c(t₁ − t₂)) + cos(ω_c(t₁ + t₂) + 2θ)} dθ
R_XX(t₁, t₂) = (A²/2)cos(ω_c(t₁ − t₂))
Equation 2-5
NOTE 1: The identity 2cos A cos B = cos(A + B) + cos(A − B)
NOTE 2: Is it obvious that ∫_{−π}^{π} cos(ωt + mθ) dθ = 0 for any non-zero integer m?
Exercise 2-10
Sketch 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ) as a function of τ = 𝑡𝑡1 − 𝑡𝑡2 .
In a baseband digital transmission system a random sequence of 0’s and 1’s need to be encoded
for transmission over a wired network. The transmitted signal can be viewed as a random
binary wave, 𝑋𝑋(𝑡𝑡), as follows (see Figure 2-9):
1. The symbols 1 and 0 are transmitted every 𝑇𝑇 seconds by pulses of amplitude +𝐴𝐴 and
−𝐴𝐴 respectively, of duration 𝑇𝑇 seconds.
Figure 2-9 Sample function of the random binary wave ([3], Figure 1.6, pg. 39)
The random binary wave has a key property called stationarity (to be formally defined later), which is that the statistical properties of the process are independent of time or “origin”. This implies that the joint statistical properties at times (t₁, t₂, …, tₖ) are the same as the joint statistical properties at times (t₁ + τ, t₂ + τ, …, tₖ + τ); that is, we can analyse the signal at whatever absolute time is convenient without loss of generality.
Since the amplitude levels +𝐴𝐴 and −𝐴𝐴 occur with equal probability for any realisation at a
particular time instant then:
𝑚𝑚𝑋𝑋 (𝑡𝑡) = 𝐸𝐸[𝑋𝑋(𝑡𝑡)] = 0
The calculation of the autocorrelation, 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡𝑖𝑖 , 𝑡𝑡𝑘𝑘 ) = 𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )𝑋𝑋(𝑡𝑡𝑘𝑘 )], however, is more tricky,
but here goes:
Case 1: |𝑡𝑡𝑘𝑘 − 𝑡𝑡𝑖𝑖 | > 𝑇𝑇 (easy)
Under this condition the random variables 𝑋𝑋(𝑡𝑡𝑘𝑘 ) and 𝑋𝑋(𝑡𝑡𝑖𝑖 ) occur in different pulse intervals
and are thus independent from which we get:
𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )𝑋𝑋(𝑡𝑡𝑘𝑘 )] = 𝐸𝐸[𝑋𝑋(𝑡𝑡𝑖𝑖 )]𝐸𝐸[𝑋𝑋(𝑡𝑡𝑘𝑘 )] = 0
Case 2: |tₖ − tᵢ| < T (tricky, so concentrate!)
Since the signal is stationary we can analyse at whatever absolute time is convenient, so we choose tₖ = 0 and tᵢ < tₖ. From Figure 2-9 we observe that X(tᵢ) and X(tₖ) will only be in the same pulse interval if and only if |tₖ − tᵢ| < T − t_d, that is t_d < T − |tₖ − tᵢ|. In this case we can derive the following conditional expectation:
E[X(tᵢ)X(tₖ) | t_d] = A² for t_d < T − |tₖ − tᵢ|, and 0 otherwise
So to remove the conditioning we average over all possible t_d:
Exercise 2-11
Sketch 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ) as a function of τ = 𝑡𝑡1 − 𝑡𝑡2 .
Consider a random process that assumes values ±1, where 𝑋𝑋(0) = ±1 with probability 0.5,
i.e. 𝑃𝑃[𝑋𝑋(0) = ±1] = 0.5, and 𝑋𝑋(𝑡𝑡) changes “polarity” with each occurrence of an event in a
Poisson process of rate α. A realisation of this process is shown in Figure 2-10.
Figure 2-10 Realisation of a random telegraph signal where the 𝑿𝑿𝒋𝒋 are iid exponential
random variables ([1], Figure 9.10, pg. 511)
Since X(t) will have the same polarity as X(0) when an even number of Poisson events occur and the opposite polarity otherwise, we have that:
P[X(t) = ±1 | X(0) = ±1] = P[N(t) = even integer] = Σ_{j=0}^{∞} ((αt)^{2j}/(2j)!)e^{−αt} = (1/2){1 + e^{−2αt}}
P[X(t) = ±1 | X(0) = ∓1] = P[N(t) = odd integer] = Σ_{j=0}^{∞} ((αt)^{2j+1}/(2j+1)!)e^{−αt} = (1/2){1 − e^{−2αt}}
From which we have that:
P[X(t) = ±1] = P[X(t) = ±1 | X(0) = 1]P[X(0) = 1] + P[X(t) = ±1 | X(0) = −1]P[X(0) = −1]
⇒ P[X(t) = ±1] = 0.5
We can also show that:
m_X(t) = 0, VAR[X(t)] = 1
and:
C_XX(t₁, t₂) = R_XX(t₁, t₂) = (1)P[X(t₁) = X(t₂)] + (−1)P[X(t₁) ≠ X(t₂)] = e^{−2α|t₁−t₂|}
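A simulation sketch estimating this autocorrelation (the rate, time step and lag range are arbitrary choices; the Poisson flips are approximated by one Bernoulli trial per step):

    alpha = 1; dt = 0.01; T = 1000; n = round(T/dt);
    flips = rand(1, n) < alpha*dt;                    % ~Poisson(alpha) events
    x = (2*(rand < 0.5) - 1) * (-1).^cumsum(flips);   % +/-1 signal, flipping at events
    lags = 0:200;
    Rhat = arrayfun(@(k) mean(x(1:end-k).*x(1+k:end)), lags);
    plot(lags*dt, Rhat, lags*dt, exp(-2*alpha*lags*dt));  % estimate vs theory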
2.6.1 Definition
A complete statistical description of the Mth-order statistics of a random process gives, for any k ≤ M and at any choice of times (t₁, t₂, …, tₖ), the joint pdf f_{X1X2⋯Xk}(x₁, x₂, …, xₖ), where Xⱼ = X(tⱼ). In general the joint pdf so stated depends on the time origin. In a very important class of random processes the joint pdf does not depend on the time origin, only on the relative times.
Thus the same statistical properties will hold no matter at which specific time or time interval
is analysed, that is the properties of the random process do not change with time. If this is the
case then we have a stationary random process.
But let’s be more formal …
Strict stationarity is a very strong condition which is difficult to prove. However, if one restricts attention to 2nd-order stationarity, we find that there are many signals that satisfy this and for which the most useful properties for analysis and design arise.
Specifically …
Wide-Sense Stationary (WSS)
If only the 2nd-order statistics exhibit stationarity then we have that:
1. m_X(t) = E[X(t)] = m_X is independent of t
2. R_XX(t₁, t₂) ≡ R_X(t₁ − t₂) = R_X(τ) depends only on the time difference τ = t₁ − t₂.
3. C_XX(t₁, t₂) = C_X(τ) = R_X(τ) − m_X², as a consequence of 1 and 2,
and a random process exhibiting these characteristics will be a wide-sense stationary or WSS process.
For discrete-time: 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑛𝑛1 , 𝑛𝑛2 ) ≡ 𝑅𝑅𝑋𝑋 (𝑛𝑛1 − 𝑛𝑛2 ) = 𝑅𝑅𝑋𝑋 (𝑘𝑘) where 𝑘𝑘 = 𝑛𝑛1 − 𝑛𝑛2
From now on any reference to stationary can be assumed to imply wide-sense stationary unless
otherwise specified.
Definition
Since τ = t₁ − t₂, if t₂ = t then t₁ = t + τ. Hence for a WSS process we can define the general form of the autocorrelation function as:
R_X(τ) = E[X(t + τ)X(t)]
to keep with standard mathematical convention. This convention is especially important to
remember when dealing with cross-correlations.
For discrete-time: 𝑅𝑅𝑋𝑋 (𝑘𝑘) = 𝐸𝐸[𝑋𝑋𝑛𝑛+𝑘𝑘 𝑋𝑋𝑛𝑛 ]
Figure 2-11 Autocorrelation and the rate of change of a random process ([3], Figure
1.4, pg. 37)
For the sinusoid with random phase from Section 1.5.3 we had:
1. m_X(t) = 0
2. C_XX(t₁, t₂) = R_XX(t₁, t₂) = (A²/2)cos(ω_c(t₂ − t₁)) → R_X(τ) = (A²/2)cos(ω_c τ)
Exercise 2-12
What is the expression for the pdf of a stationary Gaussian random process for k=2 (consult
Section 1.4.1)?
f_{X1X2}(x₁, x₂) = e^{−½(x − m)ᵀK⁻¹(x − m)}/(2π|K|^{1/2})
where:
m = [m_X, m_X]ᵀ, K = [C_X(0), C_X(τ); C_X(τ), C_X(0)]
Many random signals in communications systems arise from periodic processes. For example,
a data modulator or modem processes a random waveform every T seconds. Such signals
exhibit cyclostationary behaviour.
2.7.1 Definition
Consider a WSS random process, 𝑋𝑋(𝑡𝑡), for which we are interested in forming estimates of
the mean, 𝑚𝑚𝑋𝑋 (𝑡𝑡) = 𝑚𝑚 𝑋𝑋 , and autocorrelation, 𝑅𝑅𝑋𝑋 (τ). How can we do this?
Ensemble Average
Consider an ensemble or collection of different realisations of the random process. Then we have a random variable, X(t), with pdf f_{X(t)}(x) representing the distribution at time t over all the realisations. We define the mean and the autocorrelation as the following expectations or ensemble averages of the random variable:
m_X = E[X(t)] = ∫_{−∞}^{∞} x f_{X(t)}(x) dx, R_X(τ) = E[X(t + τ)X(t)]
Time Average
For a WSS random process it is also possible to consider taking the time average of a single realisation:
⟨X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t) dt
⟨X(t + τ)X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t + τ)x(t) dt
where x(t) is any one realisation of the random process of interest.
Ergodic Process
The WSS random process X(t) is ergodic in the mean, that is:
lim_{T→∞} ⟨X(t)⟩_T = m_X(t) = m_X (the time average approaches the ensemble average)
in the mean square sense if the following can be shown to be true:
lim_{T→∞} VAR[⟨X(t)⟩_T] = 0 (the time average converges to the ensemble average)
If a WSS random process is both mean ergodic and ergodic in the autocorrelation function then it can be considered an ergodic process.
Example 2-16: Let X(t) = A cos(ω_c t + Θ) where Θ is uniformly distributed over the interval (−π, π). We know that m_X(t) = E[X(t)] = 0, so:
⟨X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t) dt = (1/2T)∫_{−T}^{T} A cos(ω_c t + Θ) dt = (A/2T)∫_{−T}^{T} cos(ω_c t + Θ) dt
from which we have that lim_{T→∞} ⟨X(t)⟩_T = 0 = m_X(t). It remains to be shown that lim_{T→∞} VAR[⟨X(t)⟩_T] = 0 for the random process to be deemed ergodic in the mean. Now we know R_X(τ) = (A²/2)cos(ω_c τ). Then:
⟨X(t + τ)X(t)⟩_T = (1/2T)∫_{−T}^{T} x(t + τ)x(t) dt = (A²/2T)∫_{−T}^{T} cos(ω_c(t + τ) + Θ)cos(ω_c t + Θ) dt
= (A²/4T)∫_{−T}^{T} [cos(ω_c(2t + τ) + 2Θ) + cos(ω_c τ)] dt → (A²/2)cos(ω_c τ)
as T → ∞ and hence the random process may be ergodic in the autocorrelation.
NOTE: The importance of realising a random process is ergodic is that one need only have a
single realisation of the process to estimate the mean and autocorrelation, whereas otherwise
an ensemble of realisations would be needed.
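A simulation sketch comparing the time and ensemble averages for this process (amplitude, frequency and durations are arbitrary):

    wc = 2*pi; A = 1; t = 0:0.001:100;
    theta = 2*pi*rand - pi;                        % one realisation's random phase
    x = A*cos(wc*t + theta);
    time_avg = mean(x);                            % time average of one realisation, ~0
    ens = A*cos(wc*1 + (2*pi*rand(1, 1e5) - pi));  % ensemble of X(1) values
    ens_avg = mean(ens);                           % ensemble average, also ~0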
Assume that the WSS process 𝑋𝑋(𝑡𝑡) is applied as the input to a linear, time-invariant (LTI)
system with impulse response ℎ(𝑡𝑡). What can we say about the response of the system to a
WSS random input?
From signal and systems theory we know that the output can be formulated in terms of the
following convolution integral:
Y(t) = ∫_{−∞}^{∞} h(r)X(t − r) dr = X(t) ∗ h(t)
NOTE 1: The use and definition of the convolution operator ∗.
NOTE 2: From signals and systems theory we can also define the transfer function H(f) = ℑ{h(t)} = ∫_{−∞}^{∞} h(t)e^{−j2πft} dt as the Fourier transform of the impulse response.
m_Y(t) = E[Y(t)] = E[X(t) ∗ h(t)] = E[X(t)] ∗ h(t) = m_X(t) ∗ h(t) = ∫_{−∞}^{∞} h(r)m_X(t − r) dr
Definition
The cross-correlation between the input X(t) and output Y(t) is defined as:
R_XY(τ) = E[X(t + τ)Y(t)] = E[Y(t)X(t + τ)] = R_YX(−τ)
since X(t) and Y(t) are jointly stationary.
R_XY(τ) = E[X(t + τ)Y(t)] = E[X(t + τ)∫_{−∞}^{∞} h(r)X(t − r) dr]
= ∫_{−∞}^{∞} h(r)E[X(t + τ)X(t − r)] dr = ∫_{−∞}^{∞} h(r)R_X(τ + r) dr
= ∫_{−∞}^{∞} h(−r)R_X(τ − r) dr = R_X(τ) ∗ h(−τ)
and:
R_YX(τ) = R_XY(−τ) = R_X(τ) ∗ h(τ)
R_Y(τ) = E[Y(t + τ)Y(t)] = E[∫_{−∞}^{∞} h(s)X(t + τ − s) ds ∫_{−∞}^{∞} h(r)X(t − r) dr]
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s)h(r)E[X(t + τ − s)X(t − r)] ds dr
= ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(s)h(r)R_X(τ − s + r) ds dr
which can also be expressed:
R_Y(τ) = E[Y(t + τ)Y(t)] = E[{∫_{−∞}^{∞} h(s)X(t + τ − s) ds}Y(t)]
= ∫_{−∞}^{∞} h(s)E[X(t + τ − s)Y(t)] ds = ∫_{−∞}^{∞} h(s)R_XY(τ − s) ds
= R_XY(τ) ∗ h(τ) = R_X(τ) ∗ h(−τ) ∗ h(τ)
Since R_Y(τ) is a function of τ and m_Y(t) = m_Y, the output process, Y(t), is WSS.
NOTE: This result will only hold if the input is continuously applied (i.e. applied from an infinite time in the past). It does not hold if the input is applied at t = 0 (see EXAMPLE 9-18 from [2], pgs. 400-401).
Discrete-time case
R_Y(k) = R_X(k) ∗ h(−k) ∗ h(k)
R_YX(k) = R_X(k) ∗ h(k)
R_XY(k) = R_X(k) ∗ h(−k)
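A numerical sketch of the first discrete-time relation for a white input, where R_X(k) = δ(k) so that R_Y(k) = h(k) ∗ h(−k) (the filter is an arbitrary example; xcorr is from the Signal Processing Toolbox):

    h = [1, 0.5, 0.25];              % example FIR impulse response
    x = randn(1, 1e6);               % unit-variance white input, R_X(k) ~ delta(k)
    y = filter(h, 1, x);
    Rhat = xcorr(y, 2, 'biased');    % sample R_Y(k) for lags -2..2
    Rtheory = conv(h, fliplr(h));    % h(k) * h(-k)
    [Rhat; Rtheory]                  % rows agree to within sampling error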
2.9.1 Definition
The so-called Einstein-Wiener-Khinchin theorem states that the power spectral density, S_X(f), and the autocorrelation function, R_X(τ), of a WSS process form a Fourier transform pair (where ℑ{·} is the Fourier transform operator):
S_X(f) = ℑ{R_X(τ)} = ∫_{−∞}^{∞} R_X(τ)e^{−j2πfτ} dτ ↔ R_X(τ) = ℑ⁻¹{S_X(f)} = ∫_{−∞}^{∞} S_X(f)e^{j2πfτ} df
NOTE: To derive the PSD we simply take the Fourier transform of the autocorrelation
function of the random process. Consult your nearest signals and systems textbook for Fourier
transform tables of standard functions.
Property 1
The DC bias of the autocorrelation (NOT the random signal) is given by:
S_X(0) = ∫_{−∞}^{∞} R_X(τ) dτ
BE CAREFUL! The PSD of a random process is not always equivalent to what we expect
from the power spectrum of a deterministic signal. In particular the PSD evaluated at 𝑓𝑓 = 0
does NOT represent the DC bias, m, of the random signal itself. The DC bias of a random
signal is given by the non-zero mean of the autocorrelation function (i.e. 𝑹𝑹𝑿𝑿 (𝛕𝛕) → 𝒎𝒎𝟐𝟐 as
𝛕𝛕 → ∞) which manifests itself as a DC power impulse at 𝒇𝒇 = 𝟎𝟎 in the PSD.
Property 2
The average power of the random process is given by:
E[X²(t)] = R_X(0) = ∫_{−∞}^{∞} S_X(f) df
Property 3
The PSD is always non-negative, that is:
𝑆𝑆𝑋𝑋 (𝑓𝑓) ≥ 0
Property 4
The PSD of a real-valued random process is an even, real function:
𝑆𝑆𝑋𝑋 (−𝑓𝑓) = 𝑆𝑆𝑋𝑋 (𝑓𝑓)
From Section 2.6.4 we had that for a sinusoid with random phase:
R_X(τ) = (A²/2)cos(ω_c τ) = (A²/2)cos(2πf_c τ)
and the PSD is given as:
S_X(f) = ℑ{(A²/2)cos(2πf_c τ)} = (A²/4)δ(f − f_c) + (A²/4)δ(f + f_c)
using either Appendix B of [2], Appendix C.5 from [5], or Table A6.3 from Appendix A of [3]. From Figure 2-14 it is evident that spectrally all the energy is concentrated at the frequency f_c, which is to be expected. [NOTE: δ(f), and in the time domain δ(t), is the impulse or Dirac delta function]
Figure 2-14 The PSD of a sinusoid with random phase ([3], Figure 1.10, pg. 48)
Figure 2-15 The PSD of the random binary wave ([3], Figure 1.11, pg. 49)
NOTE: sinc(x) = sin(πx)/(πx) (where sinc(0) = 1) is an important function that you will see again in the signals and systems and communication systems units.
Figure 2-16 The PSD of the random telegraph signal ([1], Figure 10.1, pg. 580)
When we speak of the energy or power of a signal it is typically assumed that the signal is a
voltage or current source feeding a 1 ohm resistor. Thus the power and energy are direct
functions of the signal amplitude (voltage).
The energy content of a signal is defined by:
E_x = ∫_{−∞}^{∞} |x(t)|² dt
and the power content of a signal is defined by:
P_x = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} |x(t)|² dt
Energy-Type Signals
A signal is called energy-type if E_x < ∞, and it can be shown that in such a case P_x = 0. Typically a finite-duration signal is an energy-type signal. Using Parseval’s theorem we see that (where X(f) = ℑ{x(t)} is the Fourier transform of x(t)):
E_x = ∫_{−∞}^{∞} |x(t)|² dt = ∫_{−∞}^{∞} |X(f)|² df
where G_x(f) = |X(f)|² is the energy spectral density (or energy spectrum) of the signal x(t), expressed in Joules per Hertz.
Power-Type Signals
A signal is called power-type if 0 < 𝑃𝑃𝑥𝑥 < ∞ and it can be shown that in such a case 𝐸𝐸𝑥𝑥 = ∞.
Typically infinite-duration signals are power-type signals.
NOTE: A signal cannot be both energy-type and power-type. Most signals of interest are
one type or the other.
Define the “autocorrelation” function of a real-valued, power-type signal as:
R_x(τ) = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} x(t + τ)x(t) dt
from which we can state:
P_x = R_x(0) = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} |x(t)|² dt
Define the power-spectral density or power spectrum of the signal as:
S_x(f) = ℑ{R_x(τ)} = ∫_{−∞}^{∞} R_x(τ)e^{−j2πfτ} dτ = lim_{T→∞} (1/T)|X_T(f)|²
where:
X_T(f) = ℑ{x(t)rect(t/T)} = ℑ_T{x(t)} = ∫_{−T/2}^{T/2} x(t)e^{−j2πft} dt
Most practical infinite-duration, deterministic signals are periodic. For periodic signals we have the following relations:
R_x(τ) = (1/T₀)∫_{−T₀/2}^{T₀/2} x(t + τ)x(t) dt = ℑ⁻¹{S_x(f)} = Σ_{n=−∞}^{∞} |xₙ|² e^{j2πnτ/T₀}
S_x(f) = Σ_{n=−∞}^{∞} |xₙ|² δ(f − n/T₀)
P_x = Σ_{n=−∞}^{∞} |xₙ|²
where xₙ are the Fourier series co-efficients of x(t).
NOTE: For periodic signals the power spectrum is given by the magnitude-squares of the sequence of Fourier series co-efficients, i.e. a discrete-frequency or “sampled” spectrum.
For random signals we define the energy and power of the signal as the following expectations:
E_X = E[∫_{−∞}^{∞} X²(t) dt] = ∫_{−∞}^{∞} E[X²(t)] dt = ∫_{−∞}^{∞} R_XX(t, t) dt
P_X = E[lim_{T→∞} (1/T)∫_{−T/2}^{T/2} X²(t) dt] = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} E[X²(t)] dt = lim_{T→∞} (1/T)∫_{−T/2}^{T/2} R_XX(t, t) dt
For WSS processes we have that R_XX(t, t) = R_X(0) is independent of time t and hence:
P_X = R_X(0)
E_X = ∫_{−∞}^{∞} R_X(0) dt = ∞
and hence WSS random signals are power-type signals.
Note that we also similarly define the power spectrum or power-spectral density as:
S_X(f) = ℑ{R_X(τ)}
which follows from noting that:
P_X = R_X(0) = ∫_{−∞}^{∞} S_X(f) df
where S_X(f) is usually measured in Watts/Hz.
Exercise 2-13
What is the difference in the frequency-domain representation of the following similar signals:
(a) A rectangular pulse
(b) A square wave
(c) A random binary wave
Answer:
(a) Energy-type deterministic signal, hence we have an energy spectrum given by G_x(f) = |X(f)|² ∝ sinc²(f) (continuous-frequency)
(b) Power-type deterministic signal, hence we have a power spectrum given by S_x(f) = Σ_{n=−∞}^{∞} |xₙ|² δ(f − n/T₀) ∝ sinc²(k) at the discrete harmonics (discrete-frequency)
(c) Power-type random signal, hence we have a power spectrum given by S_X(f) = ℑ{R_X(τ)} ∝ sinc²(f) (continuous-frequency)
NOTE: The deterministic, energy-type rectangular pulse has the same spectral characteristic as the power-type random binary wave! But one is an energy spectrum, the other a power spectrum.
We now reconsider the relations established from Section 2.8 for the output of a LTI system
with a random WSS signal as input but in the frequency domain. To do this we simply take
the Fourier transform of the quantities:
𝑆𝑆(𝑓𝑓) = ℑ{𝑅𝑅(τ)}, 𝐻𝐻(𝑓𝑓) = ℑ{ℎ(τ)}, 𝐻𝐻∗ (𝑓𝑓) = ℑ{ℎ(−τ)}
and using the results of Section 2.3 obtain the following key result:
𝑆𝑆𝑌𝑌 (𝑓𝑓) = ℑ{𝑅𝑅𝑌𝑌 (τ)} = ℑ{𝑅𝑅𝑋𝑋 (τ) ∗ ℎ(−τ) ∗ ℎ(τ)} = 𝑆𝑆𝑋𝑋 (𝑓𝑓)𝐻𝐻∗ (𝑓𝑓)𝐻𝐻(𝑓𝑓) = 𝑆𝑆𝑋𝑋 (𝑓𝑓)|𝐻𝐻(𝑓𝑓)|2
That is:
𝑆𝑆𝑌𝑌 (𝑓𝑓) = |𝐻𝐻(𝑓𝑓)|2 𝑆𝑆𝑋𝑋 (𝑓𝑓)
Equation 2-7
and also:
𝑆𝑆𝑌𝑌𝑌𝑌 (𝑓𝑓) = 𝐻𝐻(𝑓𝑓)𝑆𝑆𝑋𝑋 (𝑓𝑓)
𝑆𝑆𝑋𝑋𝑋𝑋 (𝑓𝑓) = 𝐻𝐻∗ (𝑓𝑓)𝑆𝑆𝑋𝑋 (𝑓𝑓)
where 𝑆𝑆𝑋𝑋𝑋𝑋 (𝑓𝑓) and 𝑆𝑆𝑌𝑌𝑌𝑌 (𝑓𝑓) are the cross-spectral densities and 𝑆𝑆𝑌𝑌 (𝑓𝑓) is the output power-
spectral density.
For discrete-time systems, using the z-transform representation:
S_Y(z) = ZT{R_Y(k)} = ZT{R_X(k) ∗ h(−k) ∗ h(k)} = H(z) H(z^{−1}) S_X(z)
S_YX(z) = H(z) S_X(z)
S_XY(z) = H(z^{−1}) S_X(z)
where ZT{.} is the z-transform, H(z^{−1}) = ZT{h(−k)} and S_X(F) = S_X(z)|_{z=e^{j2πF}}
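The key result S_Y(f) = |H(f)|² S_X(f) can be verified numerically. The following Python/SciPy sketch (the first-order system H(z) = 1/(1 − 0.9z^{−1}) and the Welch estimator are illustrative choices, not from the notes) filters unit-variance white noise and compares the estimated output PSD with the theoretical |H|²:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(0)
w = rng.standard_normal(200_000)    # unit-variance white noise: S_X(f) = 1 (two-sided, fs = 1)

# Illustrative LTI system H(z) = 1 / (1 - 0.9 z^-1)
b, a = [1.0], [1.0, -0.9]
y = signal.lfilter(b, a, w)

# Two-sided Welch PSD estimate of the output vs the theory S_Y = |H|^2 * S_X
f, S_est = signal.welch(y, fs=1.0, nperseg=4096, return_onesided=False)
_, H = signal.freqz(b, a, worN=2 * np.pi * f)
print(np.mean(S_est / np.abs(H) ** 2))   # ~1.0, confirming S_Y(f) = |H(f)|^2 S_X(f)
```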
Following the introduction of the Gaussian random process from Section 1.4.1 we note /
summarise the following important properties:
Property 1
For Gaussian processes knowledge of the mean and autocorrelation (𝑚𝑚𝑋𝑋 (𝑡𝑡) and 𝑅𝑅𝑋𝑋𝑋𝑋 (𝑡𝑡1 , 𝑡𝑡2 ))
gives a complete statistical description of the process.
Property 2
If the Gaussian random process 𝑋𝑋(𝑡𝑡) is passed through an LTI system, then the output process
𝑌𝑌(𝑡𝑡) will also be a Gaussian process.
Property 3
If a Gaussian process is WSS then it is also strictly stationary.
Property 4
If random variables obtained by sampling a Gaussian random process
{𝑋𝑋(𝑡𝑡1 ), 𝑋𝑋(𝑡𝑡2 ), … , 𝑋𝑋(𝑡𝑡𝑘𝑘 )}are uncorrelated, then they are also independent.
Property 5
A sufficient condition for the ergodicity of the stationary zero-mean Gaussian process 𝑋𝑋(𝑡𝑡)
∞
is that ∫−∞ �𝑅𝑅𝑋𝑋 (τ)� 𝑑𝑑τ < ∞
The term “white noise” is used to denote the zero-mean random process, 𝑊𝑊(𝑡𝑡), in which all
frequency components appear with equal power. i.e. the power spectral density (the power
spectrum of the signal) is constant or “flat”, typically designated as:
S_W(f) = N_0/2
and by taking the inverse Fourier transform we have:
R_W(τ) = (N_0/2) δ(τ)
which implies that samples from the white noise process are uncorrelated.
The N_0/2 (or N_0) is termed the noise power (spectral) density. The ½ factor is needed to
account for the dual contribution of the negative frequencies (see Example 2-14).
Figure 2-17 White noise (a) power spectral density, (b) autocorrelation
([3], Figure 1.16, pg. 61)
The “white noise” process provides a simple mathematical approximation to thermal noise
which is a pervasive source of interference in all communications and electronics systems.
From quantum mechanical analysis one can derive the power spectrum of thermal noise as
shown in Figure 2-18.
Figure 2-18 Power spectrum of thermal noise ([4], Figure 4.20, pg. 189)
We make the following comments about the “white noise” process 𝑊𝑊(𝑡𝑡) typically analysed in
the electrical engineering context:
• Samples from the noise process are uncorrelated with each other.
• The noise process is zero-mean and is at least wide-sense stationary; it can be shown to
also be ergodic.
In communications engineering the noise process is used to model the interference experienced
by a signal transmitted through a communications channel (fibre optic cable, telephone cable,
satellite, over-the-air, etc.). The white noise process is the 𝑁𝑁(𝑡𝑡) in the signal plus noise
described previously:
𝑌𝑌(𝑡𝑡) = 𝑋𝑋(𝑡𝑡) + 𝑁𝑁(𝑡𝑡)
In communications engineering white noise in the “signal plus noise” context is termed an
AWGN (Additive White Gaussian Noise) process. Also standard notation for the AWGN
process can be 𝑊𝑊(𝑡𝑡), 𝑁𝑁(𝑡𝑡), or 𝑉𝑉(𝑡𝑡) so students should expect to see different notation being
adopted.
Although the white noise process has, in theory, infinite power in practical electronic
systems the noise process will be bandwidth limited (bandlimited) and hence possess finite
power. Indeed if one considers practical white noise one can make the following statements:
• In nature the white noise process doesn’t really exist and only approximates the true
underlying physical process (e.g. electrons in thermal noise, etc.) which is not truly
flat and will have finite power.
• In electrical engineering circuits the noise will be bandwidth limited and hence it is
safe to assume the ensuing process is a bandlimited or filtered white noise.
Example 2-14
Question: White noise, W(t), with power spectral density N_0/2 is applied to an ideal low-pass
filter of bandwidth B. (a) Find the power spectral density of the filtered noise, N(t). (b) Given
the measured output noise power, P, find N_0. (c) Find the autocorrelation function of N(t).
Answer:
(a) Ideal low-pass filter transfer function: H(f) = 1 for |f| < B, and 0 otherwise; hence
S_N(f) = S_W(f)|H(f)|² = N_0/2 for |f| < B, and 0 otherwise; a sketch is shown in Figure 2-19(a).
Figure 2-19 Low-pass filtered white noise (a) power spectral density (b) autocorrelation
function ([2], Figure 1.17, pg. 62)
(b) P = ∫_{−B}^{B} S_N(f) df = N_0 B, hence N_0 = P/B [Watts/Hz]
(c) From Fourier transform tables (Appendix B of [1]) we have that
R_N(τ) = (N_0/2)·2B·sin(2πBτ)/(2πBτ) = N_0 B sinc(2Bτ); a sketch is shown in Figure 2-19(b).
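A numeric sketch of result (c) follows (Python/SciPy; the bandwidth B = 0.1 cycles/sample and the long FIR approximation of the ideal low-pass filter are assumptions for illustration). It estimates the autocorrelation of the filtered noise and compares it with N_0 B sinc(2Bτ):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(1)
B = 0.1                                  # assumed bandwidth (cycles/sample, fs = 1)
w = rng.standard_normal(500_000)         # white noise, sigma^2 = 1, so N0 = 2 (S_W = N0/2 = 1)

# Approximate the ideal low-pass filter with a long linear-phase FIR
h = signal.firwin(2001, B, fs=1.0)
n = signal.lfilter(h, 1.0, w)

# Estimated autocorrelation vs R_N(tau) = N0*B*sinc(2*B*tau)
lags = np.arange(31)
r_est = np.array([np.dot(n[: n.size - k], n[k:]) / (n.size - k) for k in lags])
r_theory = 2.0 * B * np.sinc(2 * B * lags)   # np.sinc(x) = sin(pi x)/(pi x)
print(np.max(np.abs(r_est - r_theory)))      # small; r_est[0] ~ N0*B = 0.2
```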
Note that the discrete-time white noise process, W_n, is a zero-mean IID process. Taking
z-transforms (for one realisation of the random process), where W(z) = ZT{W_n} is the input
and Y(z) = ZT{Y_n} is the output, the system transfer function is H(z) = Y(z)/W(z). Since W_n
is WSS, and Y_n is the output process when applying W_n to the LTI system with transfer
function H(z), the AR(p), MA(q) and ARMA(p,q) processes are WSS with PSD given by:
S_Y(z) = H(z) H(z^{−1}) S_W(z) = H(z) H(z^{−1}) σ²
assuming the input white noise process has variance σ².
Finally in digital communications systems a more useful measure is the normalised “SNR
per bit” defined as:
SNR per bit:
SNR_bit = E_b/N_0 = (transmitted energy per bit)/(noise power spectral density)
Note that if the data rate is R_b then the bit duration is T_b = 1/R_b and hence
P_signal = E_b/T_b = E_b R_b, and over a system bandwidth of B we also have P_noise = N_0 B, hence:
SNR = P_signal/P_noise = (E_b R_b)/(N_0 B) = SNR_bit · (R_b/B)
Thus 𝑆𝑆𝑆𝑆𝑅𝑅𝑏𝑏𝑏𝑏𝑏𝑏 is a normalised “SNR” since it is independent of the data rate and system
bandwidth.
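A small arithmetic sketch of these relations (all numeric values below are hypothetical, chosen only to exercise the formulas):

```python
# Hypothetical system values
Eb = 1e-6        # transmitted energy per bit, J
N0 = 1e-8        # noise power spectral density, W/Hz
Rb = 1e6         # data rate, bits/s
B  = 2e6         # system bandwidth, Hz

snr_bit = Eb / N0                 # normalised SNR per bit (= 100, i.e. 20 dB)
snr = snr_bit * Rb / B            # conventional SNR = (Eb*Rb)/(N0*B) (= 50)
print(snr_bit, snr)
```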
Quantisers are devices that operate on a signal and represent the signal amplitude by a finite
number of levels or quantisation levels. Quantisation is needed to represent a discrete signal
(or each sample of a sampled analog signal) digitally in a finite number of bits (i.e. each sample
can then be stored on digital media as an 8-bit byte or 16-bit integer value).
It is common to use uniform quantisers with equal quantisation levels. There are various ways
a signal can be quantised as shown in Figure 2-20
Assume the quantiser has been designed to handle amplitudes in the range 𝑥𝑥min ≤ 𝑥𝑥[𝑛𝑛] ≤
𝑥𝑥max , then if 𝑥𝑥[𝑛𝑛] < 𝑥𝑥min or 𝑥𝑥[𝑛𝑛] > 𝑥𝑥max the signal amplitude will get clipped and either
saturate at the highest level or be zeroed (see Figure 2-21).
Assume each sample is to be stored as a B-bit word, then the number of quantisation levels is:
𝐿𝐿 = 2𝐵𝐵
Define 𝐷𝐷 = 𝑥𝑥max − 𝑥𝑥min as the dynamic range of the signal 𝑥𝑥[𝑛𝑛], then the quantisation step-
size, or resolution, Δ, that results when quantising a signal with the dynamic range 𝐷𝐷 in 𝐿𝐿
levels is:
Δ = D/L
Let x_Q[n] represent the quantised signal, then e[n] = x[n] − x_Q[n] is the quantisation error.
For quantisation with rounding the quantisation error can be assumed to be a uniformly
distributed random variable between −Δ/2 and Δ/2 (the pdf is f(e) = 1/Δ). The noise power is
then the variance of this uniform density:
P_e = E{e²[n]} = ∫_{−Δ/2}^{Δ/2} e² (1/Δ) de = Δ²/12
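The Δ²/12 noise power is easy to confirm empirically. A minimal sketch (Python/NumPy; the 8-bit word length and the ±1 dynamic range are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(2)
B = 8                              # bits per sample (assumed)
x_min, x_max = -1.0, 1.0           # assumed dynamic range D = 2
L = 2 ** B                         # number of quantisation levels
delta = (x_max - x_min) / L        # step size Delta = D/L

x = rng.uniform(x_min, x_max, 1_000_000)
xq = np.round(x / delta) * delta   # quantisation with rounding
e = x - xq                         # quantisation error

print(np.var(e), delta**2 / 12)    # empirical vs theoretical noise power
```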
2.14 References
1. A. Leon-Garcia, “Probability and Random Processes for Electrical Engineering”, 3rd Ed.,
Addison Wesley, 2008.
2. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
3. S. Haykin, “Communication Systems”, 4th Ed., Wiley, 2001.
4. J.G. Proakis, M. Salehi, “Communication Systems Engineering”, 2nd Ed., Prentice-Hall,
2002.
5. S Haykin, B Van Veen, “Signals and Systems”, 2nd Ed., John Wiley, 2002
6. Ambardar, “Analog and Digital Signal Processing”, 2nd Ed., Brooks/Cole, 1999.
Block Diagram
The adaptive filter provides an estimate of the desired signal or system response.
Examples
• echo cancellation: remove undesirable echo from two-way communications system by
subtracting an estimate of the echo
• adaptive control: estimate parameters and/or state of the plant in order to design a
controller
• channel modeling: provide estimate of the unknown system response over the regions of
operation of interest
Problem solution: the adaptive filter forms an estimate of the interfering signal picked up by the
microphone, e(t), given the incoming loudspeaker signal, x(t), and subtracts this from the
outgoing microphone signal, y(t) = s(t) + e(t), so that only the wanted signal, s(t), is
transmitted.
Figure 3-2 Principle of acoustic echo cancellation using an adaptive echo canceler
(Figure 1.17 [1])
Solution caveats:
1. The echo is not just a simple, linear function of the loudspeaker signal. Room acoustics
and speaker and microphone transducer effects will modify the signal echo in a complex
way.
2. The room acoustics, and to a lesser extent transducer effects, change with time as the
talker moves, microphone/speaker are re-positioned, etc.
Block Diagram
The adaptive filter provides an estimate of the system response and applies its inverse.
Examples
• adaptive equalization: apply inverse of communication channel transfer function in
order to equalise or remove the unwanted effects or distortions arising from transmission
through the channel.
• blind deconvolution/separation: apply the inverse of the corrupting convolution or
mixing operations to the resultant output signal(s) in order to reconstitute the original
signal. The system convolution/mixing response is usually unknown and this makes the
problem “blind”.
• adaptive inverse control: estimate the inverse response of the plant in order to design
controllers in series rather than in feedback with the plant
Problem solution: compensate for ISI by using an adaptive filter that restores the received
pulse to the original shape by estimating and applying the inverse of the communication
channel response characteristics.
Figure 3-4 Channel equalizer with training and decision-directed modes of operation
(Figure 12.6 [1])
Solution caveats
1. The filter has to both “learn” the inverse of the particular channel and “track” its variation
with time
2. Requires knowledge of the correct symbol sequence (e.g. a training sequence) initially to
prime or reset the operation of the filter. In cases where this is not available then there is
the additional problem of “blind equalization”.
Block Diagram
Estimate a signal at time t = n₀ by using past and/or future values of the same signal
{x(t): n₁ ≤ t ≤ n₂}
• forward prediction (linear prediction) if n₀ > n₂
• backward prediction if n₀ < n₁
• smoothing/interpolation if n₁ < n₀ < n₂ (excluding n₀ itself!)
Examples
• adaptive predictive coding: form estimate of current value of signal based on previous
samples and store/transmit the error in the prediction rather than the signal itself
Problem solution: to reduce the quantisation noise the current sample is predicted based on
the previous samples and the error in the prediction is stored/transmitted together with the
predictor filter co-efficients. The dynamic range of these quantities should be smaller and
hence subject to reduced quantisation noise effects.
Figure 3-6 Predictive linear coding system, (a) coder, (b) decoder
(Figure 10.5 [1])
Solution caveats
1. As real-word sources are non-stationary the filter has to recalculate the predictor co-
efficients and retransmit these to the receiver, but this is done on a frame-by-frame rather
than sample-by-sample basis
Block Diagram
Use of multiple sensors that provide reference signals for the estimate and removal of
interference and noise from a primary signal.
Examples
• active noise control: provide an inverted estimate of the unwanted signal or noise and
remove it from the zone of interest by destructive wave interference
• array processing: collect signals from a group of spatially positioned sensors and
emphasise signals arriving from specific directions (i.e. adaptive beamforming) as used in
radar, direction finding, antenna steering, etc.
Problem solution: a reference signal, x(t) = G_x v(t), from the interfering environment is
used to provide an exact out-of-phase estimate of the interfering signal(s), y(t) = G_y v(t),
that is ŷ(t) = f(x(t)) ≡ G_y G_x^{−1} x(t), which when played out through a loudspeaker will
completely cancel the unwanted signal(s) in the listening zone (i.e. y(t) + ŷ(t) = 0).
Figure 3-8 Basic components of an active noise control system (Figure 1.21 [1])
Solution Caveats
1. The filter has to provide an exact out-of-phase estimate of the interfering signal in order
to cancel the unwanted signal, otherwise more noise is added!
2. The reference signal must be a filtered version of the interfering signal but must not
include any desired signals in the listening zone of interest, otherwise these will also be
removed.
3. The acoustic environment is unknown and highly time-varying and the filter must be able
to rapidly adapt to any changes.
3.5 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
In the following analysis we assume discrete-time random signals and adopt the following
notation:
• y(n) is the value of the signal of interest (or desired response) at time n
• x_k(n) is the set of M values (observations or data), for 1 ≤ k ≤ M, at time n, being either:
  – the signal of the kth sensor at time n (array processing), or
  – x(n − k), the kth delay of the signal at time n (signal prediction, system inversion and identification)
Given the set of data, 𝑥𝑥𝑘𝑘 (𝑛𝑛), the signal estimation problem is to determine an estimate 𝑦𝑦�(𝑛𝑛),
of the desired response, 𝑦𝑦(𝑛𝑛) using:
ŷ(n) ≡ h{x_k(n), 1 ≤ k ≤ M}
where in the case of 𝑥𝑥𝑘𝑘 (𝑛𝑛) = 𝑥𝑥(𝑛𝑛 − 𝑘𝑘), the estimator takes the form of a discrete-time filter.
We want to find an optimum estimator that approximates the desired response as closely as
possible according to certain performance criterion, most commonly minimisation of the
error signal:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Design an estimator that provides an estimate 𝑦𝑦�(𝑛𝑛) of the desired response 𝑦𝑦(𝑛𝑛) using a
linear combination of the data 𝑥𝑥𝑘𝑘 (𝑛𝑛) for 1 ≤ k ≤ M, such that the MSE 𝐸𝐸{|𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)|2 } is
minimised. That is, our linear MSE estimator (dropping the time index n, i.e. 𝑦𝑦� = 𝑦𝑦�(𝑛𝑛), 𝑦𝑦 =
𝑦𝑦(𝑛𝑛), and 𝑥𝑥𝑘𝑘 = 𝑥𝑥𝑘𝑘 (𝑛𝑛)) is defined by:
ŷ = Σ_{k=1}^{M} c_k x_k = c^T x
where:
c = [c_1 c_2 ⋯ c_M]^T is the M × 1 coefficient vector, and
x = [x_1 x_2 ⋯ x_M]^T is the M × 1 data or observation vector.
We next rewrite the expression for P(c) in the form of a “perfect square” as follows:
P(c) = P_y − d^T R^{−1} d + (Rc − d)^T R^{−1} (Rc − d)
Exercise 4.1 Show This!
The necessary and sufficient conditions that determine the linear MMSE estimator, cO , are:
𝐑𝐑𝐜𝐜𝑂𝑂 = 𝐝𝐝
Equation 4-2
which can be written as the set of normal equations:
[ r_11  r_12  ⋯  r_1M ] [ c_1 ]   [ d_1 ]
[ r_21  r_22  ⋯  r_2M ] [ c_2 ] = [ d_2 ]
[  ⋮     ⋮    ⋱    ⋮  ] [  ⋮  ]   [  ⋮  ]
[ r_M1  r_M2  ⋯  r_MM ] [ c_M ]   [ d_M ]
Equation 4-3
We note:
1. We assume that x and y are zero mean. If not, we replace x and y by x − E(x) and
y − E(y) respectively.
2. If x and y are uncorrelated then d = 0 and P_O = P_y, which means that no linear
estimator can be found to reduce the MSE, since the desired response y is
uncorrelated with the data vector x. This is the worst result.
3. If y can be perfectly estimated from x then ŷ = y and P_O = 0. This is the best result.
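As a sketch of how the normal equations are used in practice (Python/NumPy; the M = 3 statistics R, d and P_y below are illustrative values, not taken from the notes), we solve Equation 4-2, compute the minimum MSE, and confirm the orthogonality condition of Equation 4-4:

```python
import numpy as np

# Assumed (illustrative) second-order statistics for an M = 3 problem
R = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.5],
              [0.2, 0.5, 1.0]])   # R = E{x x^T}
d = np.array([0.8, 0.4, 0.1])    # d = E{x y}
Py = 1.0                         # P_y = E{y^2}

c_o = np.linalg.solve(R, d)      # normal equations R c_O = d (Equation 4-2)
P_o = Py - d @ c_o               # minimum MSE P_O = P_y - d^T c_O

# Orthogonality check: E{x e_O} = d - R c_O should be ~0 (Equation 4-4)
print(c_o, P_o, d - R @ c_o)
```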
The normal equations can be derived more directly by making use of the principle of
orthogonality as follows. We consider the correlation between the data vector x and the MSE
error, e_O:
E{x e_O} = E{x(y − x^T c_O)} = E{xy} − E{xx^T} c_O = d − R c_O = 0
that is:
E{x e_O} = 0
Equation 4-4
or:
E{x_i e_O} = 0 for 1 ≤ i ≤ M
Orthogonality Principle
The estimation error, 𝑒𝑒0 , is orthogonal to the data, x, used for the estimation, i.e. 𝐸𝐸{𝐱𝐱𝑒𝑒𝑂𝑂 } = 0
A convenient way to view this is to consider the abstract vector space where a vector is a
zero-mean random variable with the following associations:
‖x‖² = ⟨x, x⟩ ≡ E{|x|²}, for the (squared) length of the vector
‖x‖·‖y‖ cos θ_xy = ⟨x, y⟩ ≡ E{xy}, where θ_xy is the angle between x and y
We illustrate this principle for the case of M = 2 in Figure 4-1 where, for the given y and the
space spanned by {x_i: 1 ≤ i ≤ M}, the minimum error occurs when x_i ⊥ e_O for all 1 ≤ i ≤ M,
and hence E{x_i e_O} = 0 for 1 ≤ i ≤ M is the condition for the optimum estimator.
Figure 4-1 Illustration of orthogonality principle (Figure 6.9 [1]). Note that x_i ⊥ e_O,
1 ≤ i ≤ M, but it is not true that x_i ⊥ x_j unless the data itself is uncorrelated.
The MMSE is P_O = E{|e_O|²} = E{e_O(y − ŷ_O)} = E{y e_O} = E{y(y − x^T c_O)} = P_y − d^T c_O
(where we use E{e_O ŷ_O} = 0, since the estimation error is also orthogonal to the estimate).
A numerical method for solution of the normal equations (Equation 4-2) and computation of
the minimum error is the lower-diagonal-upper decomposition, or LDLᵀ decomposition for
short, where the correlation matrix is written as R = LDLᵀ.
Full details of the LDLᵀ solution method can be found in [1, pgs 274-278].
The optimum FIR filter (also known as the Wiener filter) forms an estimate ŷ(n) of the desired
response, y(n), by using finite samples from a related input signal, x(n). That is:
ŷ(n) = Σ_{k=0}^{M−1} c_k(n) x(n − k) = c^T(n) x(n)
Equation 4-5
We form the LMMSE estimate by solving the corresponding set of normal equations for the
filter coefficients, cO (n), at each time n:
𝐑𝐑(𝑛𝑛)𝐜𝐜𝑂𝑂 (𝑛𝑛) = 𝐝𝐝(𝑛𝑛)
or in matrix form:
[ r_{0,0}(n)    r_{0,1}(n)    ⋯  r_{0,M−1}(n)   ] [ c_0(n)     ]   [ d_0(n)     ]
[ r_{1,0}(n)    r_{1,1}(n)    ⋯  r_{1,M−1}(n)   ] [ c_1(n)     ] = [ d_1(n)     ]
[    ⋮             ⋮          ⋱       ⋮         ] [    ⋮       ]   [    ⋮       ]
[ r_{M−1,0}(n)  r_{M−1,1}(n)  ⋯  r_{M−1,M−1}(n) ] [ c_{M−1}(n) ]   [ d_{M−1}(n) ]
where R(n) = E{x(n) x^T(n)}, r_ij(n) = E{x(n−i) x(n−j)}, d(n) = E{x(n) y(n)} and
d_i(n) = E{x(n−i) y(n)}, and then computing the estimate ŷ(n) using a discrete-time filter
structure based upon Equation 4-5, primed by the c_O(n), as shown in Figure 4-2.
Figure 4-2 Design and implementation of a time-varying optimum FIR filter (Figure 6.11
[1])
• The FIR filter coefficients have to be estimated and loaded into the filter at each sample
time n.
• Although the desired response y(n) is not available, the known relationship between y(n)
and x(n) can be used to derive or estimate the cross-correlation vector
d(n) = E{x(n) y(n)}.
Example 4–1
The most common use of optimum filtering is for estimating a signal in noise, that is:
x(n) = y(n) + v(n)
where v(n) is a noise signal which is assumed uncorrelated with y(n). Thus:
d_k(n) = E{x(n−k) y(n)} = E{(y(n−k) + v(n−k)) y(n)} = E{y(n−k) y(n)}
requiring only knowledge of the second-order statistics of the desired response y(n).
Most useful FIR filters are designed when the input and desired response stochastic
processes are jointly wide-sense stationary (WSS), in which case the correlation matrix, R(n)
= R, and cross-correlation vector, d(n) = d, no longer depend explicitly on the time-index n.
We form the LMMSE estimate by solving the corresponding set of normal equations for the
filter coefficients, 𝐜𝐜𝑂𝑂 = 𝐡𝐡𝑂𝑂 :
𝐑𝐑𝐡𝐡𝑂𝑂 = 𝐝𝐝
or in matrix form:
[ r_x(0)    r_x(1)    ⋯  r_x(M−1) ] [ h_O(0)   ]   [ r_yx(0)   ]
[ r_x(1)    r_x(0)    ⋯  r_x(M−2) ] [ h_O(1)   ] = [ r_yx(1)   ]
[   ⋮         ⋮       ⋱      ⋮    ] [    ⋮     ]   [    ⋮      ]
[ r_x(M−1)  r_x(M−2)  ⋯  r_x(0)   ] [ h_O(M−1) ]   [ r_yx(M−1) ]
Equation 4-6
That is, a time-invariant optimum FIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=0}^{M−1} h_O(k) x(n − k)
with MMSE power:
P_O = r_y(0) − Σ_{k=0}^{M−1} h_O(k) r_yx(k) = r_y(0) − h_O^T d
Equation 4-9
Example 4–2
Problem: Consider the harmonic random process:
𝑦𝑦(𝑛𝑛) = 𝐴𝐴cos(𝜔𝜔𝑂𝑂 𝑛𝑛 + 𝜙𝜙)
with fixed, but unknown, amplitude and frequency and random uniformly distributed phase.
The process is corrupted by additive white Gaussian noise 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, 𝜎𝜎𝑣𝑣2 ) that is
uncorrelated with 𝑦𝑦(𝑛𝑛). The resulting signal 𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) is observable. It is
required to design an optimum FIR filter with input 𝑥𝑥(𝑛𝑛) to remove the noise and produce an
estimate of the desired response 𝑦𝑦(𝑛𝑛)
Solution: To design the optimum filter the second-order statistics, r_x(l) and r_yx(l), are derived
as follows. We note that since v(n) and y(n) are uncorrelated we have:
r_x(l) = r_y(l) + r_v(l) = (A²/2) cos(ω_O l) + σ_v² δ(l)
r_yx(l) = E{y(n)[y(n−l) + v(n−l)]} = r_y(l) = (A²/2) cos(ω_O l)
and by solving for h_O(k) in Equation 4-6 we form the optimum FIR filter given by:
ŷ(n) = Σ_{k=0}^{M−1} h_O(k) x(n − k)
Exercise 4.2: Show that r_y(l) = E{y(n) y(n−l)} = (A²/2) cos(ω_O l)
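A numeric sketch of this design follows (Python/SciPy; the values A = 1, ω_O = 0.5, σ_v² = 0.5 and M = 20 are assumed for illustration). It builds R and d from the derived second-order statistics, solves Equation 4-6, and checks the theoretical MMSE of Equation 4-9 against one simulated realisation:

```python
import numpy as np
from scipy.linalg import toeplitz

# Assumed numeric values: amplitude, frequency, noise variance, filter order
A, w0, sv2, M = 1.0, 0.5, 0.5, 20

lags = np.arange(M)
ry = 0.5 * A**2 * np.cos(w0 * lags)     # r_y(l) = (A^2/2) cos(w0 l) = r_yx(l)
rx = ry.copy()
rx[0] += sv2                            # r_x(l) = r_y(l) + sigma_v^2 delta(l)

h_o = np.linalg.solve(toeplitz(rx), ry) # Equation 4-6 (Toeplitz normal equations)
P_o = ry[0] - ry @ h_o                  # Equation 4-9: P_O = r_y(0) - h_O^T d

# Empirical check on one realisation of x(n) = y(n) + v(n)
rng = np.random.default_rng(3)
n = np.arange(200_000)
y = A * np.cos(w0 * n + rng.uniform(0, 2 * np.pi))
x = y + np.sqrt(sv2) * rng.standard_normal(n.size)
y_hat = np.convolve(x, h_o)[: n.size]
print(P_o, np.mean((y[M:] - y_hat[M:]) ** 2))   # theoretical vs empirical MMSE
```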
We are given a set of samples x(n), x(n−1), …, x(n−M) of a stochastic process of
interest and wish to estimate the value of the sample x(n−i), for some i ∈ [0..M], using a
linear combination of the remaining (known) samples:
Figure 4-3 Illustration showing the samples used in linear signal estimation (Figure 6.16[1])
e^{(i)}(n) = Σ_{k=0}^{i−1} c_k(n) x(n−k) + x(n−i) + Σ_{k=i+1}^{M} c_k(n) x(n−k)
          = c_i^T(n) x_i(n) + x(n−i) = x_i^T(n) c_i(n) + x(n−i)
Equation 4-10
where:
x_i(n) = [x(n) x(n−1) … x(n−(i−1)) x(n−(i+1)) … x(n−M)]^T
c_i(n) = [c_0(n) c_1(n) … c_{i−1}(n) c_{i+1}(n) … c_M(n)]^T
If i = L and M = 2L then we have an Mth order symmetric linear smoother (SLS) that
produces an estimate of the middle sample, x(n−k) for k = L, by using the L past
(L+1 ≤ k ≤ 2L) and L future (0 ≤ k ≤ L−1) samples. The smoother co-efficients c_L(n) are the
solution of the corresponding normal equations (Equation 4-11):
R_L(n) c_L(n) = −r_L(n)
where R_L(n) = E{x_L(n) x_L^T(n)} and r_L(n) = E{x_L(n) x(n−L)}, and the MMSE power
(Equation 4-12) is:
P_O^{(L)}(n) = P_x(n−L) + r_L^T(n) c_L(n)
A one-step Mth order forward linear prediction (FLP) involves the estimation of the sample,
𝑥𝑥(𝑛𝑛), by using the M past samples, 𝑥𝑥(𝑛𝑛 − 1), 𝑥𝑥(𝑛𝑛 − 2), … , 𝑥𝑥(𝑛𝑛 − 𝑀𝑀). From Equation
4-10 this corresponds to the case of i = 0, and adopting the following change in notation for
the special case of FLP:
x(n−1) = [x(n−1) x(n−2) … x(n−M)]^T ≡ x_0(n)
a(n) = [a_1(n) a_2(n) … a_M(n)]^T ≡ c_0(n)
r^f(n) = E{x(n−1) x(n)} ≡ r_0(n)
R(n−1) = E{x(n−1) x^T(n−1)} = R_0(n)
the forward predictor co-efficients a(n) are the solution of Equation 4-11, that is:
R(n−1) a(n) = −r^f(n)
and the MMSE power from Equation 4-12 is given by:
P_O^f(n) = P_x(n) + r^{fT}(n) a(n)
where P_x(n) = E{x(n) x(n)}. An estimate of x(n) is then provided by:
x̂(n) = −Σ_{k=1}^{M} a_k(n) x(n−k)
and the prediction error is:
e^f(n) = e^{(0)}(n) = x(n) + Σ_{k=1}^{M} a_k(n) x(n−k) = Σ_{k=0}^{M} a_k(n) x(n−k)
• It is standard notational practice to indicate the order of analysis by a_k(n) ≡ a_k^{(M)}(n),
which identifies the kth co-efficient of the Mth order FLP.
• Since a_1(n) is the first co-efficient we have by definition a_0^{(M)}(n) = 1.
A one-step Mth order backward linear prediction (BLP) involves the estimation of the
sample, x(n−M), by using the M future samples, x(n), x(n−1), …, x(n−(M−1)).
From Equation 4-10 this corresponds to the case of i = M, and adopting the following change
in notation for the special case of BLP (by analogy with the FLP):
x(n) = [x(n) x(n−1) … x(n−(M−1))]^T ≡ x_M(n)
b(n) = [b_0(n) b_1(n) … b_{M−1}(n)]^T ≡ c_M(n)
r^b(n) = E{x(n) x(n−M)} ≡ r_M(n)
R(n) = E{x(n) x^T(n)} = R_M(n)
the backward predictor co-efficients b(n) are the solution of Equation 4-11, that is:
R(n) b(n) = −r^b(n)
and the MMSE power from Equation 4-12 is given by:
P_O^b(n) = P_x(n−M) + r^{bT}(n) b(n)
where P_x(n−M) = E{x(n−M) x(n−M)}. An estimate of x(n−M) is then provided by:
x̂(n−M) = −Σ_{k=0}^{M−1} b_k(n) x(n−k)
and the prediction error is:
e^b(n) = e^{(M)}(n) = Σ_{k=0}^{M−1} b_k(n) x(n−k) + x(n−M) = Σ_{k=0}^{M} b_k(n) x(n−k)
• It is standard notational practice to indicate the order of analysis by b_k(n) ≡ b_k^{(M)}(n),
which identifies the kth co-efficient of the Mth order BLP.
• Since b_{M−1}(n) is the last co-efficient we have by definition b_M^{(M)}(n) = 1.
If the process x(n) is WSS, then the elements of the correlation matrix R(n) and correlation
vector r(n) no longer depend explicitly on the time-index n but only on the lag. That
is, r(l) = E{x(n) x(n−l)} does not depend on n, only on l. Define:
r = [r(1) r(2) … r(M)]^T
It is evident that:
r^f = E{x(n−1) x(n)} = r
r^b = E{x(n) x(n−M)} = Jr = [r(M) r(M−1) … r(1)]^T
where J is the exchange matrix:
[ 0 0 ⋯ 1 ]
[ ⋮ ⋮ ⋰ ⋮ ]
[ 0 1 ⋯ 0 ]
[ 1 0 ⋯ 0 ]
which simply reverses the order of the vector elements of r.
Note that J^T = J. Since R is a symmetric Toeplitz matrix it can be shown that:
RJ = JR
and hence we have that for WSS signals:
b_O = J a_O
P_O^f = P_O^b
That is, the BLP coefficient vector is the reverse of the FLP coefficient vector and the MMSE
powers are identical. In other words the FLP and BLP co-efficients are duals of one
another. This duality property only holds for stationary processes.
Exercise 4.3 Show that RJ = JR and hence show that b_O = J a_O and P_O^f = P_O^b
4.5.5 Properties
Property 1
If the signal x(n) is stationary, then the symmetric, linear smoother has linear phase.
Proof: Using the fact that for stationary signals RJ = JR, we can show that for the
symmetric linear smoother:
r = Jr which implies c = Jc
and hence the smoother impulse response, c, has even symmetry and so has, by definition, linear
phase.
Exercise 4.4 Show that c = Jc and hence that the smoother has linear phase.
Property 2
If the signal x(n) is stationary, then the FLP error filter {1, a_1, a_2, …, a_M} is minimum-
phase and the BLP error filter {b_0, b_1, …, b_{M−1}, 1} is maximum-phase.
Proof: The system function of the FLP error filter, A(z) = 1 + Σ_{k=1}^{M} a_k z^{−k}, can be shown to
have all zeros inside the unit circle and hence is, by definition, minimum-phase. Since
b = Ja, we have that B(z) = z^{−M} A(1/z), which implies that all zeros are outside the unit
circle and hence the BLP error filter is maximum-phase.
Exercise 4.5 Show that b = Ja implies B(z) = z^{−M} A(1/z)
Example 4–3
Problem: A random sequence x(n) is generated by passing a white Gaussian noise process
𝑤𝑤(𝑛𝑛)~𝑁𝑁(0,1) through the filter:
1
𝑥𝑥(𝑛𝑛) = 𝑤𝑤(𝑛𝑛) + 𝑤𝑤(𝑛𝑛 − 1)
2
Determine the second-order FLP, BLP and SLS.
Solution: Since x(n) is a WSS signal, for M = 2 we need to calculate r(0), r(1) and r(2) to
fully specify the filters. The complex power spectrum, R_x(z), where R_w(z) = 1, is:
R_x(z) = H(z) H(z^{−1}) R_w(z) = (1 + ½z^{−1})(1 + ½z)(1) = ½z + 5/4 + ½z^{−1} ≡ Σ_{k=−1}^{1} r(k) z^{−k}
and thus r(0) = 5/4, r(1) = 1/2, r(2) = 0. From Equation 4-13 for the FLP we have:
[5/4 1/2; 1/2 5/4][a_1; a_2] = −[1/2; 0]  →  [a_1; a_2] = [−0.476; 0.190]
and:
P_O^f = 5/4 + [1/2 0][−0.476; 0.190] = 1.012
and from Equation 4-14 for the BLP we have:
[5/4 1/2; 1/2 5/4][b_0; b_1] = −[0; 1/2]  →  [b_0; b_1] = [0.190; −0.476]
and:
P_O^b = 5/4 + [0 1/2][0.190; −0.476] = 1.012
For the SLS, L = 1 and M = 2, and thus:
x_1(n) = [x(n) x(n−2)]^T
R_1(n) = E{x_1(n) x_1^T(n)} = [r(0) r(2); r(2) r(0)]
r_1(n) = E{x_1(n) x(n−1)} = [r(1); r(1)]
Hence we have:
[5/4 0; 0 5/4][c_0; c_2] = −[1/2; 1/2]  →  [c_0; c_2] = [−0.4; −0.4]
and from Equation 4-12:
P_O^s = 5/4 + [1/2 1/2][−0.4; −0.4] = 0.85
In summary:
Forward Linear Predictor: {a_0, a_1, a_2} → {1, −0.476, 0.190}, P_O^f = 1.012
Backward Linear Predictor: {b_0, b_1, b_2} → {0.190, −0.476, 1}, P_O^b = 1.012
Symmetric Linear Smoother: {c_0, c_1, c_2} → {−0.4, 1, −0.4}, P_O^s = 0.85
We note:
1. The BLP co-efficient vector is the reverse of the FLP co-efficient vector.
2. The BLP and FLP MMSE powers are identical.
3. The SLS co-efficients are symmetric.
4. The SLS MMSE power is less than either the FLP or BLP and hence the SLS performs
better.
5. It can be shown that the FLP is minimum-phase, the BLP is maximum-phase and the SLS
has linear phase.
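The numbers in Example 4-3 are easy to reproduce. A minimal Python/NumPy check:

```python
import numpy as np

# Example 4-3 statistics: (r(0), r(1), r(2)) = (5/4, 1/2, 0)
r0, r1, r2 = 1.25, 0.5, 0.0
R = np.array([[r0, r1], [r1, r0]])

a = np.linalg.solve(R, -np.array([r1, r2]))    # FLP: R a = -r^f
b = np.linalg.solve(R, -np.array([r2, r1]))    # BLP: reversed cross-correlation vector
Rs = np.array([[r0, r2], [r2, r0]])
c = np.linalg.solve(Rs, -np.array([r1, r1]))   # SLS: R_1 c = -r_1

Pf = r0 + np.array([r1, r2]) @ a
Pb = r0 + np.array([r2, r1]) @ b
Ps = r0 + np.array([r1, r1]) @ c
print(a, Pf)   # [-0.476  0.190], 1.012
print(b, Pb)   # [ 0.190 -0.476], 1.012
print(c, Ps)   # [-0.4 -0.4], 0.85
```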
NOTE: We will define the complex power spectrum or power spectral density (PSD) for the
discrete-time signal, x[n], as R_x(z) = ZT{r_x(k)}, where ZT is the z-transform. Previously
we would have used S_X(z) or S_X(F) when considering the theory of random signals.
For optimum IIR filters the presence of both zeros and poles in the filter transfer function,
𝐻𝐻(𝑧𝑧), implies an infinite impulse response sequence, ℎ𝑂𝑂 (𝑘𝑘), hence the term IIR filter. The
theory for nonstationary signals is complicated and beyond the scope of these notes. For
stationary signals, the optimum IIR filter equation, Wiener-Hopf equation, and expression
for the MMSE power are the same as Equation 4-7, Equation 4-8, and Equation 4-9
respectively with the exception that the summation depends on whether we are interested in a
noncausal or causal IIR filter.
The noncausal optimum IIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=−∞}^{∞} h_nc(k) x(n−k)
The causal optimum IIR filter is implemented based upon the convolution:
ŷ(n) = Σ_{k=0}^{∞} h_c(k) x(n−k)
where the complexity of the solution depends on the range of m that is applicable.
From Chapter 2 we had the following results, restated using our new notational framework.
Let y(k) = x(k) ∗ h(k), and define R_y(z) = ZT{r_y(k)}, R_x(z) = ZT{r_x(k)}, H(z) = ZT{h(k)}; then:
R_xy(z) = H(z^{−1}) R_x(z),  R_yx(z) = H(z) R_x(z),  R_y(z) = H(z) H(z^{−1}) R_x(z)
Since the limits of the summation that apply to Equation 4-16 are −∞ < m < ∞, the
convolution property of the z-transform can be invoked to give:
H_nc(z) R_x(z) = R_yx(z)
thus:
H_nc(z) = R_yx(z) / R_x(z)
Equation 4-17
where H_nc(z) is the optimum IIR filter transfer function, R_x(z) = ZT[r_x(l)] =
Σ_{l=−∞}^{∞} r_x(l) z^{−l} is the power-spectral density (PSD) of x(n), and R_yx(z) = ZT[r_yx(l)] =
Σ_{l=−∞}^{∞} r_yx(l) z^{−l} is the cross-PSD between y(n) and x(n).
Example 4–4
Problem: Consider the problem of estimating a desired signal 𝑦𝑦(𝑛𝑛) that is corrupted by
additive noise, 𝑣𝑣(𝑛𝑛). The goal is to design the optimum IIR filter to extract 𝑦𝑦(𝑛𝑛) from the
noisy observations:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛)
Equation 4-18
given that 𝑦𝑦(𝑛𝑛) and 𝑣𝑣(𝑛𝑛) are uncorrelated signals with known autocorrelation sequences
r_y(l) = α^{|l|}, −1 < α < 1, and r_v(l) = σ_v² δ(l) respectively, where α = 4/5 and σ_v² = 1.
Solution for the noncausal IIR filter: The expressions for the autocorrelation 𝑟𝑟𝑥𝑥 (𝑙𝑙) of the
input signal 𝑥𝑥(𝑛𝑛) and the cross-correlation 𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙) between 𝑦𝑦(𝑛𝑛) and 𝑥𝑥(𝑛𝑛) are needed. From
Equation 4-18 and noting the 𝑦𝑦(𝑛𝑛) and 𝑣𝑣(𝑛𝑛) are uncorrelated:
𝑟𝑟𝑥𝑥 (𝑙𝑙) = 𝐸𝐸{𝑥𝑥(𝑛𝑛)𝑥𝑥(𝑛𝑛 − 𝑙𝑙)} = 𝑟𝑟𝑦𝑦 (𝑙𝑙) + 𝑟𝑟𝑣𝑣 (𝑙𝑙)
𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙) = 𝐸𝐸{𝑦𝑦(𝑛𝑛)𝑥𝑥(𝑛𝑛 − 𝑙𝑙)} = 𝑟𝑟𝑦𝑦 (𝑙𝑙)
Taking the z-transform of the above:
𝑅𝑅𝑥𝑥 (𝑧𝑧) = 𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧)
𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝑅𝑅𝑦𝑦 (𝑧𝑧)
Before deriving an exact expression of the noncausal IIR filter we can examine its behaviour
by noting its form. From Equation 4-17 :
H_nc(z) = R_yx(z)/R_x(z) = R_y(z)/(R_y(z) + R_v(z))
and for z = e^{jω} this shows that for those values of ω for which R_y(z) >> R_v(z), i.e. for high
SNR, |H_nc(e^{jω})| ≈ 1. Conversely, if R_y(z) << R_v(z), i.e. for low SNR, then
|H_nc(e^{jω})| ≈ 0. Thus, the optimum filter “passes” its input in bands with high SNR (where
the desired signal dominates) but attenuates in bands with low SNR (where the noise
dominates).
To obtain an exact expression for the noncausal IIR filter, exact expressions for 𝑅𝑅𝑥𝑥 (𝑧𝑧) =
𝛧𝛧𝛧𝛧[𝑟𝑟𝑥𝑥 (𝑙𝑙)] = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙) + 𝑟𝑟𝑣𝑣 (𝑙𝑙)] = 𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧) and 𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦𝑦𝑦 (𝑙𝑙)] = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙)] =
𝑅𝑅𝑦𝑦 (𝑧𝑧) are needed. Since 𝑟𝑟𝑣𝑣 (𝑙𝑙) = 𝜎𝜎𝑣𝑣2 𝛿𝛿(𝑙𝑙) then 𝑅𝑅𝑣𝑣 (𝑧𝑧) = 𝜎𝜎𝑣𝑣2 = 1.
The expression for 𝑅𝑅𝑦𝑦 (𝑧𝑧) = 𝛧𝛧𝛧𝛧[𝑟𝑟𝑦𝑦 (𝑙𝑙)] requires derivation and simplification from first
principles as follows:
r_y(l) = (4/5)^{|l|} = (4/5)^l u(l) + (4/5)^{−l} u(−l−1) = (4/5)^l u(l) + (5/4)^l u(−l−1)
Referring to a common z-transform pairs table (found in most good textbooks on discrete-
time signal processing):
a^n u(n) ↔ 1/(1 − a z^{−1}), |z| > |a|;  a^n u(−n−1) ↔ a^{−1}z/(1 − a^{−1}z), |z| < |a|
a^{|n|} ↔ (1 − a²)/((1 − a z^{−1})(1 − a z)), |a| < |z| < |a^{−1}|
thus:
R_y(z) = ZT[r_y(l)] = 1/(1 − (4/5)z^{−1}) + (4/5)z/(1 − (4/5)z), 4/5 < |z| < 5/4
and upon the appropriate algebraic simplification:
R_y(z) ≡ R_yx(z) = (3/5)² / ((1 − (4/5)z^{−1})(1 − (4/5)z)), 4/5 < |z| < 5/4
Equation 4-19
Finally the exact expression for R_x(z) = R_y(z) + R_v(z) in simplified form is needed:
R_x(z) = (3/5)²/((1 − (4/5)z^{−1})(1 − (4/5)z)) + 1
       = [(3/5)² + (1 − (4/5)z^{−1})(1 − (4/5)z)] / [(1 − (4/5)z^{−1})(1 − (4/5)z)]
       = (8/5)(1 − ½z^{−1})(1 − ½z) / ((1 − (4/5)z^{−1})(1 − (4/5)z))
Equation 4-20
Note: The simplified forms for 𝑅𝑅𝑦𝑦 (𝑧𝑧) and 𝑅𝑅𝑥𝑥 (𝑧𝑧) have been deliberately structured so that
𝑅𝑅𝑥𝑥 (𝑧𝑧) can be expressed as 𝜎𝜎𝑥𝑥2 𝐻𝐻𝑥𝑥 (𝑧𝑧)𝐻𝐻𝑥𝑥 (𝑧𝑧 −1 ) which is necessary when considering the causal
IIR filter.
The noncausal optimum IIR filter can now be expressed as the following all-pole filter:
H_nc(z) = R_yx(z)/R_x(z) = (9/40) / ((1 − ½z^{−1})(1 − ½z)), 1/2 < |z| < 2
where the ROC has been chosen to ensure the filter is stable. Evaluating the inverse z-
transform:
h_nc(n) = (3/10)(1/2)^{|n|}, −∞ < n < ∞
which clearly corresponds to a noncausal filter with corresponding stable, but noncausal,
difference equation:
y(n) = (9/50)x(n) + (2/5)y(n−1) + (2/5)y(n+1) = Σ_{k=−∞}^{∞} (3/10)(1/2)^{|k|} x(n−k)
The MMSE power is:
P_nc = r_y(0) − Σ_{k=−∞}^{∞} h_nc(k) r_yx(k) = 1 − Σ_{k=−∞}^{∞} (3/10)(1/2)^{|k|}(4/5)^{|k|} = 3/10
Since the limits of the summation that apply to Equation 4-16 are 0 ≤ m < ∞, we cannot use
the convolution property of the z-transform to provide an analytic expression for the causal
IIR filter. An alternative methodology is to note the following:
1. Any regular process can be transformed to an equivalent white process
2. The solution to the Wiener-Hopf equations for 0 ≤ m < ∞ is trivial if the input is white
x(n) = Σ_{k=0}^{∞} h_x(k) w(n−k) ⇒ R_x(z) = H_x(z) H_x(z^{−1}) R_w(z) = H_x(z) H_x(z^{−1}) σ_x²
Equation 4-24
For a real-valued regular signal, x(n), the PSD can be factored as:
R_x(z) = σ_x² H_x(z) H_x(z^{−1}) → σ_x² = H_x^{−1}(z) H_x^{−1}(z^{−1}) R_x(z)
where H_x^{−1}(z) = 1/H_x(z) is known as the whitening filter, since applying it to x(n)
recovers the white process w(n).
Figure 4-4 Block diagram of optimum IIR filter design (Figure 6.18[1])
To express H_c′(z) in terms of R_yx(z), a relationship between R_yw(z) and R_yx(z) is needed.
The non-causal form of Equation 4-24 allows us to state:
r_yx(l) = E{y(n) x(n−l)} = E{y(n) Σ_{k=−∞}^{∞} h_x(k) w(n−l−k)} = Σ_{k=−∞}^{∞} h_x(k) r_yw(l+k)
Taking the z-transform, noting that x(−n) ⟺ X(z^{−1}), this relationship is:
R_yx(z) = ZT[Σ_{k=−∞}^{∞} h_x(−k) r_yw(l−k)] = ZT[r_yw(k) ∗ h_x(−k)] = R_yw(z) H_x(z^{−1})
which gives:
R_yw(z) = R_yx(z) / H_x(z^{−1})
Hence:
H_c′(z) = (1/σ_x²) [R_yx(z)/H_x(z^{−1})]_+
and thus:
H_c(z) = (1/(σ_x² H_x(z))) [R_yx(z)/H_x(z^{−1})]_+
Equation 4-25
Since R_x(z) = σ_x² H_x(z) H_x(z^{−1}), from Equation 4-17 the optimum noncausal IIR filter is:
H_nc(z) = R_yx(z)/R_x(z) = (1/σ_x²) · R_yx(z)/(H_x(z) H_x(z^{−1}))
and the MMSE power, from Equation 4-23, where w(n) is the linearly equivalent white noise
process, can be expressed:
P_nc = r_y(0) − Σ_{k=−∞}^{∞} h_nc(k) r_yx(k) = r_y(0) − (1/σ_x²) Σ_{k=−∞}^{∞} |r_yw(k)|²
• Since |r_yw(k)|² ≥ 0, as the order of the filter increases the MMSE decreases, due to
more r_yw(k) co-efficients contributing to reduce the initial signal variance
P_y = r_y(0) in the expressions for P_c and P_nc.
• How can one realise a non-causal IIR filter? The standard approach is to use a two-step
process with block buffering: 1) apply the causal filter H(z) and buffer the output, then
2) run the data “backwards” through the non-causal filter, H(z^{−1}).
Example 4–5
The optimum causal IIR filter solution for the problem described in Example 4–4 is now
derived and compared with the optimum noncausal IIR filter.
From Equation 4-20 we can rewrite R_x(z) = σ_x² H_x(z) H_x(z^{−1}) where:
σ_x² = 8/5,  H_x(z) = (1 − ½z^{−1})/(1 − (4/5)z^{−1}),  H_x(z^{−1}) = (1 − ½z)/(1 − (4/5)z)
Equation 4-26
This together with the expression for R_yx(z) from Equation 4-19 gives:
R_yx(z)/H_x(z^{−1}) = (3/5)² / ((1 − (4/5)z^{−1})(1 − ½z)) = 0.6/(1 − (4/5)z^{−1}) + 0.3z/(1 − ½z)
where the first term of the partial fraction expansion is stable for |z| > 4/5 and is thus causal,
and the second term is stable for |z| < 2 and is thus noncausal. Hence taking the causal part
gives:
[R_yx(z)/H_x(z^{−1})]_+ = (3/5)/(1 − (4/5)z^{−1})
From Equation 4-25 the expression for the optimal causal IIR filter is:
H_c(z) = (1/(σ_x² H_x(z))) [R_yx(z)/H_x(z^{−1})]_+
       = (5/8) · ((1 − (4/5)z^{−1})/(1 − ½z^{−1})) · ((3/5)/(1 − (4/5)z^{−1}))
       = (3/8)/(1 − ½z^{−1}), |z| > 1/2
and evaluating the inverse z-transform the filter impulse response is:
h_c(n) = (3/8)(1/2)^n u(n)
which corresponds to a causal filter with corresponding stable and causal difference
equation:
y(n) = (3/8)x(n) + (1/2)y(n−1) = Σ_{k=0}^{∞} (3/8)(1/2)^k x(n−k)
The MMSE power is:
P_c = r_y(0) − Σ_{k=0}^{∞} h_c(k) r_yx(k) = 1 − Σ_{k=0}^{∞} (3/8)(1/2)^k (4/5)^k = 3/8
and, as expected, the optimum causal IIR filter has a larger MMSE than the optimum
noncausal IIR filter.
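Both filters, and the two-pass realisation of the noncausal filter described in the bullet above, can be checked by simulation. In the following Python/SciPy sketch, y(n) is generated as an AR(1) process with r_y(l) = (4/5)^{|l|}, and the noncausal filter is factored as H_nc(z) = (9/40)G(z)G(z^{−1}) with G(z) = 1/(1 − ½z^{−1}), applied forward and then backwards over the buffered data:

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(4)
N = 500_000
w = rng.standard_normal(N) * np.sqrt(1 - 0.8**2)   # chosen so r_y(l) = 0.8^|l|, r_y(0) = 1
y = signal.lfilter([1.0], [1.0, -0.8], w)          # desired signal
x = y + rng.standard_normal(N)                     # observation, sigma_v^2 = 1

# Causal optimum IIR filter: h_c(n) = (3/8)(1/2)^n u(n)
y_c = signal.lfilter([3.0 / 8.0], [1.0, -0.5], x)

# Noncausal optimum filter via the two-step forward/backward method:
# H_nc(z) = (9/40) G(z) G(1/z) with G(z) = 1/(1 - 0.5 z^-1)
u = signal.lfilter([1.0], [1.0, -0.5], x)
y_nc = (9.0 / 40.0) * signal.lfilter([1.0], [1.0, -0.5], u[::-1])[::-1]

print(np.mean((y - y_c) ** 2))    # ~ 3/8  = 0.375 (causal MMSE)
print(np.mean((y - y_nc) ** 2))   # ~ 3/10 = 0.300 (noncausal MMSE)
```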
The one-step forward IIR linear predictor is a causal IIR optimum filter with desired
response y(n) ≡ x(n+1). The prediction error is:
e^f(n) = x(n) − Σ_{k=0}^{∞} h_lp(k) x(n−1−k) → E^f(z) = H_pef(z) X(z)
Of more interest is the prediction error filter transfer function; that is, from Equation 4-27:
H_pef(z) = 1 − z^{−1} H_lp(z) = 1/H_x(z)
That is, the prediction error filter is identical to the whitening filter of the process and hence
the prediction error process is white. It can be shown that the MMSE power is given by
P_O^f = P_e = E{|e^f(n)|²} = σ_x², which is as expected.
The inverse filtering or deconvolution problem involves the design of an optimum inverse
filter for linearly distorted signals observed in the presence of additive noise. The typical
configuration of such a system is shown in Figure 4-5, where G(z) is the known system
response of the linear distortion, H(z) is the optimum inverse filter we are designing, y(n) is
the desired signal we are trying to recover, s(n) is the linearly distorted signal, v(n) is
additive white noise and ŷ(n) is an estimate of the desired signal from the noisy, distorted
input x(n).
The delay element is required since the linear distortion filter is causal and its output is
delayed by D samples. Usually this is unknown and has to be determined empirically for
improved performance. The optimum noncausal IIR filter is derived by:
H_nc(z) = z^{−D} R_yx(z) / R_x(z)
where the z^{−D} factor arises because the desired response is y(n−D). Since y(n) and v(n) are
uncorrelated, x(n) = s(n) + v(n) and X(z) = S(z) + V(z) = G(z)Y(z) + V(z), this gives:
𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝑅𝑅𝑦𝑦𝑦𝑦 (𝑧𝑧) = 𝐺𝐺(𝑧𝑧 −1 )𝑅𝑅𝑦𝑦 (𝑧𝑧)
𝑅𝑅𝑥𝑥 (𝑧𝑧) = 𝐺𝐺(𝑧𝑧)𝐺𝐺(𝑧𝑧 −1 )𝑅𝑅𝑦𝑦 (𝑧𝑧) + 𝑅𝑅𝑣𝑣 (𝑧𝑧)
and thus the optimum inverse filter is:
H_nc(z) = z^{−D} G(z^{−1}) R_y(z) / (G(z) G(z^{−1}) R_y(z) + R_v(z))
which, in the absence of noise, yields the expected result:
H_nc(z) = z^{−D} / G(z)
If we assume that system is driven by a white noise signal y(n) with variance 𝜎𝜎𝑦𝑦2 and the
additive noise v(n) is white with variance 𝜎𝜎𝑣𝑣2 then:
H_nc(z) = z^{−D} / (G(z) + [1/G(z^{−1})](σ_v²/σ_y²))
• In most practical cases the system response, G(z) is unknown and the more difficult
problem of blind deconvolution applies.
This important problem is beyond the scope of these notes and students specialising in
digital communications are referred to [1, pages 310-319]. Of particular interest is Example
6.8.1 [1, page 317] which discusses the equalisation problem in the context of optimum FIR
filtering.
An important class of optimum filters are those that maximise the output signal-to-noise
ratio. Such filters are used to detect signals in additive noise in many applications, including
digital communications and radar. Detailed analysis of this problem is beyond the scope of
these notes and interested students should refer to [1, pages 319-325]. However we can
illustrate the basic tenets of the problem. Consider the observation vector, x(n), of a desired
signal, s(n) subject to interfering and/or noise v(n). That is:
𝐱𝐱(𝑛𝑛) = 𝐬𝐬(𝑛𝑛) + 𝐯𝐯(𝑛𝑛)
The optimum linear filter is designed to produce estimates 𝐲𝐲(𝑛𝑛) = 𝐬𝐬�(𝑛𝑛) from the observation
vector x(n). For the optimum FIR filter with co-efficient vector c(n) the filter output is given
by:
𝑦𝑦(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 𝐱𝐱(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 𝐬𝐬(𝑛𝑛) + 𝐜𝐜 𝑇𝑇 𝐯𝐯(𝑛𝑛)
The output signal power is defined as P_s(n) = E{|c^T s(n)|²} = c^T R_s(n) c and the output
noise power is given by P_v(n) = E{|c^T v(n)|²} = c^T R_v(n) c. The signal-to-noise ratio for
WSS signals as a function of c is then:
SNR(c) = P_s/P_v = (c^T R_s c)/(c^T R_v c)
Of interest is the special case for deterministic signals, 𝐬𝐬(𝑛𝑛) = 𝛼𝛼𝐬𝐬𝑂𝑂 in additive white noise
(𝐑𝐑 𝑣𝑣 = 𝑃𝑃𝑣𝑣 𝐈𝐈). It can be shown [1, page 320] that SNR(c) is maximised when 𝐜𝐜𝑂𝑂 = 𝜅𝜅𝐬𝐬𝑂𝑂 , that is
the filter co-efficients are a scaled replica of the known signal and this type of filter is
usually referred to as the matched filter.
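A minimal numeric illustration of the matched-filter property (Python/NumPy; the signal shape s_O and the white-noise model are assumed for illustration): for white noise, SNR(c) is maximised when c is proportional to s_O, and any other choice of c does worse:

```python
import numpy as np

rng = np.random.default_rng(5)
s0 = np.sin(0.3 * np.arange(32))          # assumed known signal shape s_O
Rv = np.eye(32)                           # white noise, R_v = P_v I with P_v = 1

def snr(c):
    # SNR(c) = (c^T s0)^2 / (c^T R_v c) for the deterministic-signal case
    return (c @ s0) ** 2 / (c @ Rv @ c)

print(snr(s0))                            # matched filter: equals s0^T s0 (the maximum)
for _ in range(3):
    print(snr(rng.standard_normal(32)))   # any other c gives a smaller SNR
```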
4.8 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
3. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
5. Kalman Filters
Most real-world problems involve non-stationary processes and more useful estimates are
obtained based on all the available data (e.g. infinite past) and as we have seen efficient
algorithms based on the infinite-time Wiener filter are not easily obtainable.
The Kalman filter deals with the problem by providing an efficient time-recursive and order-
recursive solution to the optimal linear filtering problem in cases where the nonstationarity
can be modelled by dynamic or state-space models and where all the available data from
time n=0 is used. Furthermore vector state-space models are used which enhance the
applicability of Kalman filtering to a wider set of problems (e.g. an auto-regressive process
can be described by a vector state-space model).
Although the student is not required to remember how the Kalman equations are derived the
process of derivation is in itself instructive and will allow the student to appreciate the
importance of rigour and a systematic approach to derive the most complex of signal
processing and control theory equations from the most basic of assumptions. Being
proficient and confident in such derivation will allow the student to customise and adapt
algorithms to specific applications and research, rather than just accept algorithms published
in the literature for the most general cases.
We extend some concepts from stochastic processes and relevant notation is introduced
before the main development of the Kalman filtering equations.
We define:
ŷ_{n+1}(n) = ŷ(n|n) = E{y(n)|x(n), x(n−1), …, x(0)} = E{y(n)|x_{n+1}(n)}
x̂_n(n) = x̂(n|n−1) = E{x(n)|x(n−1), x(n−2), …, x(0)} = E{x(n)|x_n(n−1)}
That is, ŷ(n|n) is the optimal estimate of the desired signal y(n) based on all the available
data up to time n. We know that the optimum estimate at time n is given by
ŷ(n|n) = ŷ_{n+1}(n) = c_{n+1}(n)^T x_{n+1}(n), that is, the (n+1)th order filter estimate.
Similarly, x̂(n|n−1) is the optimal forward prediction of the observation data x(n) based
on all the available data prior to time n. We know that the optimum prediction at time n is
given by x̂(n|n−1) = x̂_n(n) = −a_n(n)^T x_n(n−1), that is, the nth order predictor.
We now define the estimation/prediction error or innovation process for the observation data
x(n): x̃(n|n−1) = x(n) − x̂(n|n−1)
In the following we deal with vector valued observation data, 𝐱𝐱(𝑛𝑛), and desired signal or
state, 𝐲𝐲(𝑛𝑛), at time n. We also assume, without loss of generality, that 𝐸𝐸{𝐲𝐲(𝑛𝑛)} = 𝐸𝐸{𝐱𝐱(𝑛𝑛)} =
0 but note that the resulting equations can be shown to hold for the case 𝐸𝐸{𝐲𝐲(𝑛𝑛)} =
𝐸𝐸{𝐱𝐱(𝑛𝑛)} ≠ 0 and thus a correlation matrix will be referred to as the covariance matrix.
The non-stationarity of the desired signal is modelled by the following signal (state-space or
plant) model:
𝐲𝐲(𝑛𝑛) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲(𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝒖𝒖(𝑛𝑛)
Equation 5.2
where:
𝐲𝐲(𝑛𝑛) = 𝑘𝑘 × 1 signal state vector at time n
𝐀𝐀(𝑛𝑛 − 1) = 𝑘𝑘 × 𝑘𝑘 matrix that relates 𝐲𝐲(𝑛𝑛 − 1) to 𝐲𝐲(𝑛𝑛) in the absence of a driving input
𝒖𝒖(𝑛𝑛) = 𝑗𝑗 × 1 zero-mean white noise “disturbance” with covariance matrix 𝐑𝐑 𝑢𝑢 (𝑛𝑛)
𝐁𝐁(𝑛𝑛) = 𝑘𝑘 × 𝑗𝑗 input matrix
The observation data is assumed to be related to the desired signal by the following
observation (or measurement) model:
𝐱𝐱(𝑛𝑛) = 𝐇𝐇(𝑛𝑛)𝐲𝐲(𝑛𝑛) + 𝒗𝒗(𝑛𝑛)
Equation 5.3
where:
𝐱𝐱(𝑛𝑛) = 𝑚𝑚 × 1 observation data vector at time n
𝐇𝐇(𝑛𝑛) = 𝑚𝑚 × 𝑘𝑘 matrix that provides ideal linear relationship between 𝐲𝐲(𝑛𝑛) and 𝐱𝐱(𝑛𝑛)
𝒗𝒗(𝑛𝑛) = 𝑚𝑚 × 1 zero-mean white noise “interference” with covariance matrix 𝐑𝐑 𝑣𝑣 (𝑛𝑛)
Our goal is to derive a recursive algorithm that provides filtered estimates for the state
𝑦𝑦(𝑛𝑛|𝑛𝑛) given the observation data, 𝐱𝐱(𝑛𝑛), system matrices 𝐀𝐀(𝑛𝑛 − 1), 𝐇𝐇(𝑛𝑛), 𝐁𝐁(𝑛𝑛), and
covariances 𝐑𝐑 𝑢𝑢 (𝑛𝑛), 𝐑𝐑 𝑣𝑣 (𝑛𝑛).
It can be shown from Equation 5.3, noting that E{v(n)} = 0, that:
x̂(n|n−1) = H(n) ŷ(n|n−1)
Equation 5.8
Hence from Equation 5.3 and Equation 5.8:
x̃(n|n−1) = x(n) − x̂(n|n−1) = H(n)y(n) + v(n) − H(n)ŷ(n|n−1) = H(n)ỹ(n|n−1) + v(n)
Equation 5.9
where ỹ(n|n−1) = y(n) − ŷ(n|n−1) represents the state error.
Thus:
R_x̃(n) = H(n) R_ỹ(n|n−1) H^T(n) + R_v(n)
Equation 5.10
where R_ỹ(n|n−1) = E{ỹ(n|n−1) ỹ^T(n|n−1)} represents the predicted or a priori
state error covariance.
Finally, from Equation 5.7 together with Equation 5.10 and Equation 5.11 we have an
expression for the Kalman gain matrix:
K(n) = R_ỹ(n|n−1) H^T(n) [H(n) R_ỹ(n|n−1) H^T(n) + R_v(n)]^{−1}
Equation 5.12
and hence a recursive estimate for the state y(n|n):
ŷ(n|n) = ŷ(n|n−1) + K(n)[x(n) − x̂(n|n−1)]
Equation 5.13
where from Equation 5.5 and Equation 5.8:
ŷ(n|n−1) = A(n−1) ŷ(n−1|n−1)
x̂(n|n−1) = H(n) ŷ(n|n−1)
Thus we have a time-recursive and order-recursive set of equations for 𝐲𝐲�(𝑛𝑛|𝑛𝑛) at time n
based on the previous estimate 𝐲𝐲�(𝑛𝑛 − 1|𝑛𝑛 − 1) at time n-1. However we now require a
similar recursive equation for the state error covariance 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛) which involves the a priori
estimate 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1).
Consider:
ỹ(n|n−1) = y(n) − ŷ(n|n−1)
         = A(n−1)y(n−1) + B(n)u(n) − A(n−1)ŷ(n−1|n−1)
         = A(n−1)ỹ(n−1|n−1) + B(n)u(n)
which provides a prediction for the state error based on the previous estimate.
A similar prediction can be derived for the state error covariance:
R_ỹ(n|n−1) = E{ỹ(n|n−1) ỹ^T(n|n−1)} = A(n−1) R_ỹ(n−1|n−1) A^T(n−1) + B(n) R_u(n) B^T(n)
Equation 5.14
We define:
ỹ(n|n) = y(n) − ŷ(n|n) = y(n) − {ŷ(n|n−1) + K(n)[x(n) − x̂(n|n−1)]}
       = ỹ(n|n−1) − K(n) x̃(n|n−1)
where we have used Equation 5.13 for ŷ(n|n). Hence the corresponding state error
covariance expression becomes:
R_ỹ(n|n) = E{ỹ(n|n) ỹ^T(n|n)} = R_ỹ(n|n−1) − K(n) R_x̃(n) K^T(n)
         = R_ỹ(n|n−1) − K(n) R_x̃(n) [R_ỹ(n|n−1) H^T(n) R_x̃^{−1}(n)]^T
Equation 5.15
Equation 5.14 and Equation 5.15 provide a time-recursive and order-recursive set of
equations for R_ỹ(n|n) at time n based on the previous estimate R_ỹ(n−1|n−1) at time n−1,
with the a priori estimate R_ỹ(n|n−1) provided by Equation 5.14 as needed in the calculation
of the Kalman gain in Equation 5.12.
System Description
State/Signal process: 𝐲𝐲(𝑛𝑛) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲(𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝒖𝒖(𝑛𝑛)
Observation process: 𝐱𝐱(𝑛𝑛) = 𝐇𝐇(𝑛𝑛)𝐲𝐲(𝑛𝑛) + 𝒗𝒗(𝑛𝑛)
Input
(a) State-space model parameters: 𝐀𝐀(𝑛𝑛 − 1), 𝐁𝐁(𝑛𝑛), 𝐑𝐑 𝑢𝑢 (𝑛𝑛); for n = 0, 1, 2, …
(b) Observation model parameters: 𝐇𝐇(𝑛𝑛), 𝐑𝐑 𝑣𝑣 (𝑛𝑛); for n = 0, 1, 2, …
(c) Observation data: 𝐱𝐱(𝑛𝑛); for n = 0, 1, 2, …
Initialisation:
(a) State vector: 𝐲𝐲�(−1| − 1) = 𝐸𝐸{𝐲𝐲(−1)}, the expected value of the state at time −1
(b) Error covariance: 𝐑𝐑 𝑦𝑦� (−1| − 1) = 𝐑𝐑 𝑦𝑦� (−1)the error covariance of the state at time −1
Recursion for n = 0, 1, 2, …
(a) Prediction
𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1) = 𝐀𝐀(𝑛𝑛 − 1)𝐲𝐲�(𝑛𝑛 − 1|𝑛𝑛 − 1)
𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1) = 𝐀𝐀(𝑛𝑛 − 1)𝐑𝐑 𝑦𝑦� (𝑛𝑛 − 1|𝑛𝑛 − 1)𝐀𝐀𝑇𝑇 (𝑛𝑛 − 1) + 𝐁𝐁(𝑛𝑛)𝐑𝐑 𝑢𝑢 (𝑛𝑛)𝐁𝐁 𝑇𝑇 (𝑛𝑛)
(c) Update
𝐲𝐲�(𝑛𝑛|𝑛𝑛) = 𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1) + 𝐊𝐊(𝑛𝑛)[𝐱𝐱(𝑛𝑛) − 𝐇𝐇(𝑛𝑛)𝐲𝐲�(𝑛𝑛|𝑛𝑛 − 1)]
𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛) = [𝐈𝐈 − 𝐊𝐊(𝑛𝑛)𝐇𝐇(𝑛𝑛)]𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛 − 1)
Output
(a) Estimated state vector: 𝐲𝐲�(𝑛𝑛|𝑛𝑛); for n = 0, 1, 2, …
(b) Estimated MMS error covariance: 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛); for n = 0, 1, 2, …
For the measurement model the simplest is to assume we have noisy observations of the AR
process as follows:
x(n) = [1 0 ⋯ 0] [y(n) y(n−1) ⋯ y(n−(q−1))]^T + v(n) = H y(n) + v(n)
and we have H = [1 0 ⋯ 0], the 1 × q observation matrix. Note that all the system
matrices are independent of time.
It should be noted that the Kalman gain, 𝐊𝐊(𝑛𝑛), and state error covariance, 𝐑𝐑 𝑦𝑦� (𝑛𝑛|𝑛𝑛), can be
calculated off-line and pre-loaded into the filter since the equations do not depend on the data
𝐱𝐱(𝑛𝑛).
Thus the operation of the Kalman filter is sensitive to 𝐑𝐑 𝑢𝑢 (𝑛𝑛) and 𝐑𝐑 𝑣𝑣 (𝑛𝑛) as well as the
initialisation 𝐑𝐑 𝑦𝑦� (−1| − 1). In most practical cases 𝐑𝐑 𝑢𝑢 (𝑛𝑛) and 𝐑𝐑 𝑣𝑣 (𝑛𝑛) will be unknown and
will either have to be judiciously selected arbitrarily or become parameters that need to be
estimated as part of the system identification process.
R_ỹ(n|n−1) = A(n−1) R_ỹ(n−1|n−2) {I − H^T(n−1)[H(n−1) R_ỹ(n−1|n−2) H^T(n−1) + R_v(n−1)]^{−1} H(n−1) R_ỹ(n−1|n−2)} A^T(n−1) + B(n) R_u(n) B^T(n)
Equation 5.16
This horrible looking equation is known as the matrix Riccati equation for R_ỹ(n|n−1).
Example 5–1
Problem: Let 𝑦𝑦(𝑛𝑛) be defined by the AR(2) process described by:
𝑦𝑦(𝑛𝑛) = 1.8𝑦𝑦(𝑛𝑛 − 1) − 0.81𝑦𝑦(𝑛𝑛 − 2) + 0.1𝑢𝑢(𝑛𝑛) 𝑛𝑛 ≥ 0
where 𝑢𝑢(𝑛𝑛)~𝑁𝑁(0,1) and 𝑦𝑦(−1) = 𝑦𝑦(−2) = 0. The signal is observed in the presence of
additive noise:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) 𝑛𝑛 ≥ 0
where 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0,1) is uncorrelated with 𝑢𝑢(𝑛𝑛). Form the linear MMSE estimate for
𝑦𝑦(𝑛𝑛), 𝑛𝑛 ≥ 0
Solution: The AR(2) process and observations are reformulated as the follow state-space
model:
y(n) = [y(n); y(n−1)] = [1.8 −0.81; 1 0][y(n−1); y(n−2)] + [0.1; 0] u(n)
x(n) = [1 0][y(n); y(n−1)] + v(n)
and hence the Kalman filter algorithm can be run with the following model parameters:
A(n) = [1.8 −0.81; 1 0],  B(n) = [0.1; 0],  R_u(n) = 1
H(n) = [1 0],  R_v(n) = 1
100 samples of 𝑥𝑥(𝑛𝑛) and 𝑦𝑦(𝑛𝑛) were generated. The resulting estimate 𝑦𝑦�(𝑛𝑛) by running the
Kalman filter on the observation data 𝑥𝑥(𝑛𝑛) is shown in Figure 5-2. It can be seen that even
in the presence of very noisy data the Kalman filter is able to accurately track the signal.
Figure 5-2 Estimation of AR(2) process using the Kalman filter (Figure 7.12[1])
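A sketch implementation of the recursion for Example 5-1 follows (Python/NumPy; the zero initial state ŷ(−1|−1) = 0 and identity initial covariance R_ỹ(−1|−1) = I are assumed initialisation choices):

```python
import numpy as np

rng = np.random.default_rng(6)

# Model of Example 5-1: AR(2) state, scalar noisy observation
A = np.array([[1.8, -0.81], [1.0, 0.0]])
B = np.array([[0.1], [0.0]])
H = np.array([[1.0, 0.0]])
Ru, Rv = 1.0, 1.0

# Simulate 100 samples of the state and the observations
N = 100
y_true, x_obs = np.zeros((N, 2, 1)), np.zeros(N)
y = np.zeros((2, 1))
for n in range(N):
    y = A @ y + B * rng.standard_normal()
    y_true[n] = y
    x_obs[n] = (H @ y)[0, 0] + np.sqrt(Rv) * rng.standard_normal()

# Kalman recursion: prediction then update (Equations 5.12-5.14)
y_hat, P = np.zeros((2, 1)), np.eye(2)
est = np.zeros(N)
for n in range(N):
    y_pred = A @ y_hat                          # state prediction
    P_pred = (A @ P @ A.T) + (B * Ru) @ B.T     # a priori covariance
    S = H @ P_pred @ H.T + Rv                   # innovation covariance
    K = P_pred @ H.T / S                        # Kalman gain
    y_hat = y_pred + K * (x_obs[n] - (H @ y_pred)[0, 0])
    P = (np.eye(2) - K @ H) @ P_pred            # a posteriori covariance
    est[n] = y_hat[0, 0]

print(np.mean((est - y_true[:, 0, 0]) ** 2))    # filtered MSE, well below R_v = 1
```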
Example 5–2
Problem: Consider an object travelling in a straight-line with initial velocity 𝑦𝑦𝑣𝑣 (−1), initial
displacement 𝑦𝑦𝑝𝑝 (−1) and subject to a random acceleration. The measured position of the
object at the nth sampling instant:
𝑥𝑥(𝑛𝑛) = 𝑦𝑦𝑝𝑝 (𝑛𝑛) + 𝑣𝑣(𝑛𝑛)
is made in the presence of additive noise 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, 𝜎𝜎𝑣𝑣2 ) where 𝑦𝑦𝑝𝑝 (𝑛𝑛) is the true position of
the object at the nth sampling instant (i.e. 𝑦𝑦𝑝𝑝 (𝑛𝑛) = 𝑦𝑦𝑐𝑐 (𝑛𝑛𝑛𝑛) where 𝑦𝑦𝑐𝑐 (𝑛𝑛𝑛𝑛) is the instantaneous
position and T is the sampling interval in seconds). Form the linear MMSE estimate for the
trajectory 𝑦𝑦𝑝𝑝 (𝑛𝑛), 𝑛𝑛 ≥ 0, that is, track the target in the presence of random acceleration and
measurement noise.
Solution: Let y_v(n) = ẏ_c(nT) be the true velocity at the nth sampling instant and
y_a(n) = ÿ_c(nT) represent the random acceleration that is in effect during the nth sampling
interval (i.e. for nT ≤ t ≤ (n+1)T). The following equations of motion will then apply:
y_v(n) = y_v(n−1) + y_a(n−1) T
y_p(n) = y_p(n−1) + y_v(n−1) T + ½ y_a(n−1) T²
We formulate the problem using the following state-space model:
y(n) = [y_p(n); y_v(n)] = [1 T; 0 1][y_p(n−1); y_v(n−1)] + [T²/2; T] u(n) = A y(n−1) + B u(n), n ≥ 0
where u(n) = y_a(n−1) treats the random acceleration as the state model noise, and:
x(n) = [1 0] y(n) + v(n) = H y(n) + v(n), n ≥ 0
is the observation model.
is the observation model.
Estimates for both the position and velocity can be obtained for 𝑛𝑛 ≥ 0 by using the Kalman
filter algorithm initialised by 𝑦𝑦𝑣𝑣 (−1) and 𝑦𝑦𝑝𝑝 (−1).
Figure 5-3 Estimation of positions and velocities using the Kalman filter (Figure 7.14[1])
5.2 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000 (Chapter 10).
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
3. A. Papoulis, S. Unnikrishna Pillai, “Probability, Random Variables and Stochastic
Processes”, 4th Ed., McGraw-Hill, 2002.
We discuss modeling of real world signals using parametric pole-zero (PZ) signal models
described by:
x(n) = −Σ_{k=1}^{P} a_k x(n−k) + Σ_{k=0}^{Q} b_k w(n−k)
where w(n) is a white noise excitation. With Q = 0 this reduces to the all-pole (AP) or
autoregressive (AR) model, which is equivalent to forward linear prediction where b_0 w(n)
can be considered as the prediction error. If P > 0 and Q > 0 we have the pole-zero (PZ) or
autoregressive moving average (ARMA) model above, with both the a_k and b_k terms present.
Model Selection: Which model type (MA, AR, ARMA) and which order (values of P and Q)
should we use to model the given data {x(n)}?
Model Validation: For the selected model and order, is the estimated model accurate and
representative?
Usually larger values of (P, Q) will yield more accurate models, but perhaps not efficient
models which can generalise sufficiently, especially if parameters have been estimated from
the actual available data {x(n)} (i.e. least-squares methods) rather than known second-
order statistics (i.e. optimum filters), and for which a lower order model can be just as good.
So how do we measure the goodness of fit of a model?
Let 𝑥𝑥�(𝑛𝑛) be the estimate of the data generated from a model which has been selected and
parameters estimated. Then 𝑒𝑒(𝑛𝑛) = 𝑥𝑥(𝑛𝑛) − 𝑥𝑥�(𝑛𝑛) is the model error or residual.
NOTE: Use the MATLAB script [a,g,eps]=rtoam(r) (from Lab 2) to calculate eps
(ε_j) or g (Γ_k) for any order m > P, given the m × m autocorrelation matrix r.
Problem
Find the AR(P) signal model that best describes the sunspot data in Figure 6-1.
Model Selection
The Partial AutoCorrelation Sequence (PACS) of the sunspot data is plotted in Figure 6-2.
The horizontal dashed lines represent the significance threshold for determining whether Γ_m ≈ 0.
We see that Γ_m ≈ 0 for m > 2, hence we select an AP model with P = 2.
Figure 6-2 The PACS values of the sunspot numbers (Figure 9.6[1])
Model Estimation
Using LS error analysis with full-windowing [1: 449-453] yields the following AR(2) model
for the sunspot data:
x̂(n) = 1.318 x(n−1) − 0.634 x(n−2) + w(n),  σ̂_w² = 289.2
NOTE: In the following, ω refers to digital radians (principal period −π < ω < π and, due
to even symmetry, defined for 0 < ω < π); normally we use Ω but we adopt the same notation as
the textbook.
The power spectrum of a wide-sense stationary random process is given by the discrete-time
Fourier transform (DTFT) of the autocorrelation sequence:
P_x(e^{jω}) = Σ_{l=−∞}^{∞} r_x(l) e^{−jωl}
In practice only an N-sample windowed record x_N(n) = x(n) w_R(n) is available, with DTFT:
X_N(e^{jω}) = Σ_{n=−∞}^{∞} x_N(n) e^{−jωn} = Σ_{n=−∞}^{∞} x(n) w_R(n) e^{−jωn} = Σ_{n=0}^{N−1} x(n) e^{−jωn}
Equation 6.5
and the periodogram estimate of the power spectrum is P̂_per(e^{jω}) = (1/N)|X_N(e^{jω})|².
Thus to compute the periodogram we form the DFT of the data sequence and square the
magnitude response:
x(n) → (window w_R(n)) → x_N(n) → (DFT) → X_N(k) → (1/N)|X_N(k)|² = P̂_per(e^{j2πk/N})
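A minimal periodogram sketch (Python/NumPy; the two-sinusoid test signal and record length are illustrative assumptions) computes exactly this DFT-and-square recipe:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 512
n = np.arange(N)
# Assumed test signal: two sinusoids in unit-variance white noise
x = np.sin(2 * np.pi * 0.1 * n) + np.sin(2 * np.pi * 0.3 * n) + rng.standard_normal(N)

# Periodogram: DFT of the (rectangularly windowed) data, magnitude squared over N
X = np.fft.fft(x)
P_per = np.abs(X) ** 2 / N            # P_per(e^{j 2 pi k / N})

freqs = np.fft.fftfreq(N)             # digital frequency k/N
print(freqs[np.argsort(P_per)[-4:]])  # peaks near +/-0.1 and +/-0.3
```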
Since the periodogram is a function of the random process x(n), we need to establish the
performance of the periodogram, P̂_per(e^{jω}), as an estimate of the true power spectrum,
P_x(e^{jω}), on statistical grounds. The standard performance measure for statistical
estimates is whether such estimates converge in the mean-square sense for a sufficiently
large sample size, that is, whether or not:
lim_{N→∞} E{|P̂_per(e^{jω}) − P_x(e^{jω})|²} = 0
Convergence in the mean-square sense requires two properties to be satisfied:

1. Asymptotically Unbiased

\lim_{N \to \infty} E\{\hat{P}_{per}(e^{j\omega})\} = P_x(e^{j\omega})

2. Mean-Square Convergence

\lim_{N \to \infty} E\left\{ \left| \hat{P}_{per}(e^{j\omega}) - E\{\hat{P}_{per}(e^{j\omega})\} \right|^2 \right\} = \lim_{N \to \infty} \mathrm{var}\{\hat{P}_{per}(e^{j\omega})\} = 0
That is, the periodogram \hat{P}_{per}(e^{j\omega}) must be a consistent estimate of the power spectrum P_x(e^{j\omega}): its mean must converge to P_x(e^{j\omega}) and its variance must converge to zero as N → ∞. Another important measure we can investigate is the resolution of the periodogram.
Mean of Periodogram
It can be shown [1: 216-217; 2:398-399] that:
E\{\hat{P}_{per}(e^{j\omega})\} = \frac{1}{2\pi} P_x(e^{j\omega}) * |W_R(e^{j\omega})|^2

Equation 6.6

where:

|W_R(e^{j\omega})|^2 = \frac{1}{N} \left[ \frac{\sin(N\omega/2)}{\sin(\omega/2)} \right]^2

As N → ∞, |W_R(e^{j\omega})|^2 converges to an impulse of unit area, so we can state:

E\{\hat{P}_{per}(e^{j\omega})\} = \frac{1}{2\pi} P_x(e^{j\omega}) * |W_R(e^{j\omega})|^2 \quad \text{(biased for finite } N\text{)}

\lim_{N \to \infty} E\{\hat{P}_{per}(e^{j\omega})\} = P_x(e^{j\omega}) \quad \text{(asymptotically unbiased)}
As we can see, the window spectrum |W_R(e^{j\omega})|^2 is critical in determining the form of bias present in the periodogram.
Resolution of Periodogram
In addition to smoothing the spectrum and introducing spurious peaks, the window spectrum |W_R(e^{j\omega})|^2 also reduces the resolution of the spectrum. That is, if two sinusoids are closely spaced in frequency, the mainlobe of |W_R(e^{j\omega})|^2 will smooth out the two peaks, making it difficult to resolve the two frequencies. The wider the mainlobe, the worse this effect.
We define the resolution as the width, Δω, of the mainlobe of |W_R(e^{j\omega})|^2 at its "half-power" or 3 dB point. The resolution of the periodogram is given by:

\mathrm{Res}\{\hat{P}_{per}(e^{j\omega})\} = 0.89 \, \frac{2\pi}{N}

Equation 6.7
from which we see that the resolution improves (Δω decreases) with increasing N (longer window). This is a fundamental result in spectral analysis: for non-stationary signals (e.g. speech) there is a tradeoff between better spectral frequency resolution (longer window) and better temporal resolution (shorter window).
Variance of Periodogram
We require the variance of the periodogram to converge to zero as N → ∞. Unfortunately, deriving an expression for the variance for an arbitrary process x(n) is difficult due to the presence of fourth-order moments of the process. From the derivation in [2: 403-407] it can be shown that:

\mathrm{var}\{\hat{P}_{per}(e^{j\omega})\} \approx P_x^2(e^{j\omega})
Equation 6.8
Thus the variance does not go to zero as 𝑁𝑁 → ∞ and the periodogram is not a consistent
estimate of the power spectrum.
The modified periodogram is defined for the case where the window function, w(n), is other than the rectangular window:

\hat{P}_M(e^{j\omega}) = \frac{1}{NU} \left| \sum_{n=-\infty}^{\infty} x(n) w(n) e^{-j\omega n} \right|^2

Equation 6.9

where:

U = \frac{1}{N} \sum_{n=0}^{N-1} |w(n)|^2

Equation 6.10
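A minimal MATLAB sketch of Equations 6.9 and 6.10 with a Hamming window (our own construction; hamming() assumes the Signal Processing Toolbox):

N = 256; x = randn(N,1);            % stand-in data record
w = hamming(N);                     % data window w(n)
U = sum(abs(w).^2) / N;             % Equation 6.10
PM = abs(fft(x.*w)).^2 / (N*U);     % Equation 6.9 on the DFT frequency grid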
Thus by being able to choose the window function we can control the level and type of bias in the spectral estimate. Some of the popular data windows used for the modified periodogram are listed in Figure 6-4, and

\mathrm{Res}\{\hat{P}_M(e^{j\omega})\} = (\Delta\omega)_{3dB}
which depends on the window function. Furthermore the level of the peak sidelobe is also
important in establishing the level of spectral leakage in the ensuing spectral estimate. These
values are tabulated in Figure 6-4.
Figure 6-4 Properties of selected Window functions ([1], Table 8.2, pg. 411)
We know from statistical theory that if we average K uncorrelated observations (e.g. K uncorrelated measurements from the same random process x, with the nth observation denoted by x_n) then the variance of the average is given by:

\mathrm{var}\left\{ \frac{1}{K} \sum_{n=1}^{K} x_n \right\} = \frac{1}{K^2} \mathrm{var}\left\{ \sum_{n=1}^{K} x_n \right\} = \frac{\mathrm{var}\{x_n\}}{K}
Equation 6.11
That is, the variance is reduced by a factor of K. Thus if we can derive K uncorrelated
measurements of the modified periodogram for the same random process, then by forming
the average of these (which we know produces an asymptotically unbiased estimate of the
power spectrum) we will obtain a power spectrum estimate with variance ∝ 1/𝐾𝐾, i.e. the
variance will approach zero as 𝐾𝐾 → ∞.
Given the N-length realisation of the signal, x(n), assume that successive sequences are offset by D sample times and that each sequence is L samples long. Then the ith sequence is given by:

x_i(n) = x(n + iD), \quad n = 0, 1, \dots, L-1
thus, the amount of overlap between 𝑥𝑥𝑖𝑖 (𝑛𝑛) and 𝑥𝑥𝑖𝑖+1 (𝑛𝑛) is 𝐿𝐿 − 𝐷𝐷 samples.
Assume K of the sequences cover the entire N data samples, then we must have that:
𝑁𝑁 = 𝐿𝐿 + 𝐷𝐷(𝐾𝐾 − 1)
Consider the following cases:
• No overlap (D = L): K = N/L sections of length L
• 50% overlap (D = L/2): K = 2(N/L) − 1 sections of length L
In Welch’s Method the N-length realisation of the signal is partitioned into K sequences of
length L which are offset by 𝐷𝐷 samples, i.e. 𝑥𝑥𝑖𝑖 (𝑛𝑛) = 𝑥𝑥(𝑛𝑛 + 𝑖𝑖𝑖𝑖), such that 𝑁𝑁 = 𝐿𝐿 + 𝐷𝐷(𝐾𝐾 −
1) and the K modified periodogram estimates are formed using the L-length subsequences
and averaged, to yield the estimate:
\hat{P}_W(e^{j\omega}) = \frac{1}{KLU} \sum_{i=0}^{K-1} \left| \sum_{n=0}^{L-1} w(n) x(n+iD) e^{-j\omega n} \right|^2

where U = \frac{1}{L} \sum_{n=0}^{L-1} |w(n)|^2 and w(n) is the window of length L. Hence it is evident that:

E\{\hat{P}_W(e^{j\omega})\} = \frac{1}{2\pi L U} P_x(e^{j\omega}) * |W(e^{j\omega})|^2
For D = L/2 (50% overlap) and a Bartlett window function, w_B(n), it can be shown that:

\mathrm{var}\{\hat{P}_W(e^{j\omega})\} \approx \frac{9}{16} \frac{L}{N} P_x^2(e^{j\omega}) \approx \frac{P_x^2(e^{j\omega})}{K}

\mathrm{Res}\{\hat{P}_W(e^{j\omega})\} = 1.28 \, \frac{2\pi}{L}
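A minimal MATLAB sketch of Welch averaging with 50% overlap (our own construction; pwelch() in the Signal Processing Toolbox implements the same estimate):

N = 1024; x = randn(N,1);           % stand-in data record
L = 256; D = L/2;                   % segment length and 50% offset
w = bartlett(L); U = sum(w.^2)/L;   % window and its power U
K = floor((N-L)/D) + 1;             % K = 2(N/L) - 1 segments here
PW = zeros(L,1);
for i = 0:K-1
    xi = x(i*D+1 : i*D+L);          % x_i(n) = x(n + iD)
    PW = PW + abs(fft(w.*xi)).^2 / (L*U);   % modified periodogram of segment i
end
PW = PW / K;                        % average over the K segments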
Note however that w_a(n) is an infinite-duration sinc( ) function and thus we cannot realise this smoothing operation. The Blackman-Tukey method implements a practical form of smoothing of the periodogram and is defined by:

\hat{P}_{BT}(e^{j\omega}) = \sum_{k=-M}^{M} w(k) \, \hat{r}_x(k) \, e^{-j\omega k}

where w(k) is a lag window applied to the sample autocorrelation sequence \hat{r}_x(k).
We note that since 𝑟𝑟̂𝑥𝑥 (𝑘𝑘) is a real and even symmetric function then 𝑃𝑃�𝐵𝐵𝐵𝐵 (𝑒𝑒 𝑗𝑗𝑗𝑗 ) should be a
real-valued, non-negative function of ω. These requirements restrict our choice of ideal
window functions, 𝑤𝑤(𝑘𝑘), to those that:
• are conjugate symmetric, 𝑤𝑤(𝑘𝑘) = 𝑤𝑤 ∗ (−𝑘𝑘), so that 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) is real-valued
• have a Fourier transform 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) ≥ 0 (non-negative function of ω)
All window functions listed in Table 6.1 are even and symmetric, but only the Bartlett
window function has 𝑊𝑊(𝑒𝑒 𝑗𝑗𝑗𝑗 ) ≥ 0, whereas the popular Hamming and Hanning window
functions do not. Hence the Blackman-Tukey Method will use a Bartlett window function by
default.
We see that, as always, there is a trade-off. To decrease the variance we require a small value of M. This corresponds to a wider mainlobe in W(e^{j\omega}), which produces greater smoothing but at the cost of reduced resolution. One recommendation is to ensure M ≤ N/5.
NOTE: In the following 𝜔𝜔 refers to digital radians (principal period of −𝜋𝜋 < 𝜔𝜔 < 𝜋𝜋 and due
to even symmetry defined 0 < 𝜔𝜔 < 𝜋𝜋), normally we use Ω but we adopt the same notation as
the textbook.
The nonparametric methods make no assumption on the signal, 𝑥𝑥(𝑛𝑛), and can thus be used
with any random signal process. However if we have some knowledge on the signal model
used to generate 𝑥𝑥(𝑛𝑛) then parametric spectrum estimation methods can be used.
Parametric methods provide more accurate, higher resolution spectrum estimates BUT only
if the underlying model is correct, otherwise the spectrum estimate will be worse than that
obtained by nonparametric means.
An AR(p) signal is generated by filtering a unit variance white noise process by:
H(z) = \frac{b(0)}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}
We have a signal, x(n), which we assume is the output of an AR(p) model. Let \{\hat{a}_p(k)\}_{k=1}^{p}, \hat{b}(0) be the estimates of the model parameters that we obtain from the signal data; then, noting the power spectrum of a signal is given by |H(e^{j\omega})|^2, we obtain the parametric spectrum estimate:

\hat{P}_{AR}(e^{j\omega}) = \frac{|\hat{b}(0)|^2}{\left| 1 + \sum_{k=1}^{p} \hat{a}_p(k) e^{-j\omega k} \right|^2}
Equation 6.12
𝑝𝑝 𝑝𝑝
where the �𝑎𝑎�𝑝𝑝 (𝑘𝑘)�𝑘𝑘=1 ≡ {𝑎𝑎𝑘𝑘 }𝑘𝑘=1 = 𝐚𝐚𝑂𝑂 , are the forward linear prediction parameters, and
2 𝑓𝑓
�𝑏𝑏�(0)� = 𝑃𝑃𝑂𝑂 = 𝑟𝑟(0) + 𝐫𝐫 𝑇𝑇 𝐚𝐚𝑂𝑂 is the MMSE power from stationary FLP analysis (see Section
4.5.4).
An MA(q) signal is generated by filtering a unit variance white noise process by:

H(z) = \sum_{k=0}^{q} b_q(k) z^{-k}

An alternative formulation is possible if the autocorrelation sequence is known (or has been estimated). Since for an MA(q) process r_x(k) = 0 for |k| > q, a natural estimate based on Equation 6.2 is:

\hat{P}_{MA}(e^{j\omega}) = \sum_{k=-q}^{q} \hat{r}_x(k) e^{-j\omega k}
An ARMA(p,q) signal is generated by filtering a unit variance white noise process by:
H(z) = \frac{\sum_{k=0}^{q} b_q(k) z^{-k}}{1 + \sum_{k=1}^{p} a_p(k) z^{-k}}
We have a signal, x(n), which we assume is the output of an ARMA(p,q) model. Let \{\hat{a}_p(k)\}_{k=1}^{p}, \{\hat{b}_q(k)\}_{k=0}^{q} be the estimates of the model parameters that we obtain from the signal data; then, noting the power spectrum of a signal is given by |H(e^{j\omega})|^2, we obtain the parametric spectrum estimate:

\hat{P}_{ARMA}(e^{j\omega}) = \left| \frac{\sum_{k=0}^{q} \hat{b}_q(k) e^{-j\omega k}}{1 + \sum_{k=1}^{p} \hat{a}_p(k) e^{-j\omega k}} \right|^2
Equation 6.15
Example 6.1
Consider the signal generated by:
x_a(n) = 5\sin(0.45\pi n + \phi_1) + 5\sin(0.55\pi n + \phi_2) + w(n)
where 𝑤𝑤(𝑛𝑛) is a unit variance white noise process. The spectrum estimate in Figure 6-5
clearly shows that parametric AR spectral estimation has superior resolution to
nonparametric Blackman-Tukey spectral estimation since the sinusoidal peaks have been
resolved. This is a result of the process 𝑥𝑥𝑎𝑎 (𝑛𝑛) being accurately modelled as an AR process.
Figure 6-5 Spectrum estimate for the signal 𝐱𝐱 𝐚𝐚 (𝐧𝐧) using the Blackman-Tukey method
(dashed lines) and AR spectral estimation (solid lines) ([2], Figure 8.23(a), pg. 441)
Figure 6-6 Spectrum estimate for the signal 𝒙𝒙𝒃𝒃 (𝒏𝒏) using the Blackman-Tukey method
(dashed lines) and AR spectral estimation (solid lines) ([1], Figure 8.23(b), pg. 441)
The PSD of an ARMA(4,2) process is estimated by using the LS AP/PZ methods of order AP(10) (aka AR(10)) and PZ(4,2) (aka ARMA(4,2)) on a 300-sample segment. The actual PSD and estimated PSDs are plotted in Figure 6-7. The effect of the model mismatch is evident when using the AP(10) method (especially in the spectral valley between the two spectral peaks and in the high-frequency response) compared to the accuracy of the estimated PSD when using the correct model in the PZ(4,2) method. However, even with a model mismatch the AP(10) model accurately identifies the spectral peaks in the spectrum. Indeed it can be shown that AP models tend to accurately model spectral peaks but not spectral valleys.
Figure 6-7: Actual PSD and estimated PSD from the AP(10) and PZ(4,2) methods for
an ARMA(4,2) process ([1], Figure 9.13)
6.4 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
7. Adaptive Filters
7.1 Introduction
[1: 499-516][2: 493-499]
Filters (which we will also use to refer to predictors) that incorporate a systematic way for the filter co-efficients to evolve (or adapt) as a function of the input data and desired signal (when available) are termed adaptive filters. Adaptive filters are important for several reasons:
• The second-order moments of the signal operating environment (SOE) are usually unknown, so an optimum filter design is not possible
• The SOE may change with time, so a fixed digital filter implementation will not work
• Practical implementations are possible with minimum computational and memory
requirements (compared to solving the normal equations in an optimum or LS filter
directly)
The following real-world problems in signal processing and communications can only be
solved by the use of a properly designed adaptive filter:
• echo cancellation in communications
• equalisation of data communication channels
• linear predictive coding
• noise cancellation
Filter Structure
Forms the output of the filter, 𝑦𝑦�(𝑛𝑛), which is an estimate of a desired response, using
measurements of the input signal or signals, 𝐱𝐱(𝑛𝑛), and driven by the filter parameters, 𝐜𝐜(𝑛𝑛).
The filtering structure can be linear (e.g. FIR filter) or non-linear. We will only consider the
linear FIR structure.
Performance Evaluation
The output of the adaptive filter, 𝑦𝑦�(𝑛𝑛), and the desired response, 𝑦𝑦(𝑛𝑛) (when available), are
used to assess the quality of the adaptive filter response with respect to the requirements of
the particular application and produce a criterion of system performance.
With supervised adaptation the desired response is available and the criterion of
performance is derived based on some average or instantaneous form of the square error (i.e.
|𝑒𝑒(𝑛𝑛)|2 where 𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛) is the performance parameter of interest).
With unsupervised adaptation the desired response is not known and the performance is
evaluated based on some measurable property of the input signal and/or generated response
as part of the adaptation algorithm process.
Adaptation Algorithm
Uses the information from the input signal, 𝐱𝐱(𝑛𝑛), and performance parameter, 𝑒𝑒(𝑛𝑛), or some
function of it, to modify the filter parameters, 𝐜𝐜(𝑛𝑛), in such a way that performance, as
measured by the criterion of system performance, is improved.
Our goal is to find the coefficients, 𝐜𝐜(𝑛𝑛) = [𝑐𝑐1 (𝑛𝑛) 𝑐𝑐2 (𝑛𝑛) … 𝑐𝑐𝑀𝑀 (𝑛𝑛)]𝑇𝑇 , to form the estimate
of a desired signal, 𝑦𝑦�(𝑛𝑛) from the M observation data samples,
𝐱𝐱(𝑛𝑛) = [𝑥𝑥(𝑛𝑛) 𝑥𝑥(𝑛𝑛 − 1) ⋯ 𝑥𝑥(𝑛𝑛 − 𝑀𝑀 + 1)]𝑇𝑇 :
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛)𝐱𝐱(𝑛𝑛)
such that the Mean Square Error (MSE) is minimised:
𝑃𝑃(𝑛𝑛) = 𝐸𝐸{|𝑒𝑒(𝑛𝑛)|2 }
where the error is simply:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Optimum filters
If we know the second-order moments of the SOE we can design the optimum filter by
solving the normal equations:
𝐑𝐑(𝑛𝑛)𝐜𝐜𝑂𝑂 (𝑛𝑛) = 𝐝𝐝(𝑛𝑛)
where:
\mathbf{R}(n) = E\{\mathbf{x}(n)\mathbf{x}^T(n)\}
\mathbf{d}(n) = E\{\mathbf{x}(n) y(n)\}
If the SOE is stationary, the optimum filter is computed once and is used with all realisations
{𝐱𝐱(𝑛𝑛), 𝑦𝑦(𝑛𝑛)}. For nonstationary environments, the optimum filter design is repeated at every
time instant n because the optimum filter is time-varying.
Least-Squares filters
If the second-order moments are not known we collect a sufficient amount of data \{\mathbf{x}(n), y(n)\}_{n=0}^{N-1} and obtain an acceptable estimate of the optimum filter in the LSE sense by computing:

\hat{\mathbf{R}} = \mathbf{X}^T \mathbf{X} = \sum_{n=0}^{N-1} \mathbf{x}(n) \mathbf{x}^T(n)

\hat{\mathbf{d}} = \mathbf{X}^T \mathbf{y} = \sum_{n=0}^{N-1} \mathbf{x}(n) y(n)

and then solving the normal equations:

\hat{\mathbf{R}} \, \mathbf{c}_{ls} = \hat{\mathbf{d}}
The obtained co-efficients can be used to filter the collected (batch) data in the interval 0 ≤
𝑛𝑛 ≤ 𝑁𝑁 − 1 and/or to start filtering the data for 𝑛𝑛 > 𝑁𝑁, on a sample-by-sample basis, in real-
time. LS filtering is a form of adaptive filtering called block adaptive filtering where the co-
efficients should be re-estimated each time the properties of the SOE change significantly.
Block adaptive filters suffer from the following two problems:
1. The filter cannot track statistical variations within the operating block and hence provides an "average" performance which may be poor on a sample-by-sample basis.
2. A decision has to be made when the properties of the SOE change significantly to trigger
a re-estimation. Alternatively the co-efficients can be re-estimated continuously (as with
standard LS estimation using blocks of length N and overlap 𝑁𝑁𝑂𝑂 ) with a consequent
increase in the computational requirements to calculate the correlation matrix and solve
the normal equations (see lecture notes on Least Squares Filtering and Prediction).
Adaptive filters
In applications which require sample-by-sample filtering (e.g. channel equalisation) the
adaptive filter operation starts immediately at time 𝑛𝑛 = 0 after the observation of the pair
{𝐱𝐱(0), 𝑦𝑦(0)} using an initial “guess” 𝐜𝐜(−1) for the filter co-efficients.
The co-efficients are then updated at each time-step and the performance improves as the
filter co-efficients converge to the optimum values (transient acquisition phase) and then, in
the case of non-stationary signals, track any subsequent changes to the optimum values
(steady-state tracking phase). This operation of the adaptive filter is shown in Figure 7-2.
Figure 7-2 Modes of operation of adaptive filter in stationary and nonstationary SOE
(Figure 10.12[1])
For the case where the desired signal is available, the general adaptive filter, at each time n,
performs the following operations.
a priori type adaptive algorithms
1. Filtering:
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
2. Error formation:
𝑒𝑒(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
3. Adaptive algorithm:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), 𝑒𝑒(𝑛𝑛)}
The increment Δ𝐜𝐜(𝑛𝑛) = Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), 𝑒𝑒(𝑛𝑛)} is in general a non-linear function of the input 𝐱𝐱(𝑛𝑛)
and error 𝑒𝑒(𝑛𝑛) and is designed to bring the filter co-efficient, 𝐜𝐜(𝑛𝑛), close to the optimum
filter co-efficient, 𝐜𝐜𝑂𝑂 , with the passage of time (acquisition phase). A key requirement is that
Δ𝐜𝐜(𝑛𝑛) must vanish if the error 𝑒𝑒(𝑛𝑛) vanishes. Hence 𝑒𝑒(𝑛𝑛) plays a major role in determining
the increment Δ𝐜𝐜(𝑛𝑛). Most adaptive algorithms use a direct linear dependency on 𝑒𝑒(𝑛𝑛) as
follows:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + 𝐠𝐠(𝑛𝑛)𝑒𝑒(𝑛𝑛)
Equation 7.1
where 𝐠𝐠(𝑛𝑛) is the gain adaptation vector, usually a function of the input data vector 𝐱𝐱(𝑛𝑛).
The a priori refers to the fact that the estimate 𝑦𝑦�(𝑛𝑛) is based on the current input 𝐱𝐱(𝑛𝑛) and
the previously estimated filter co-efficients 𝐜𝐜(𝑛𝑛 − 1), that is predicted or a priori estimates
for 𝑦𝑦(𝑛𝑛) and 𝑒𝑒(𝑛𝑛) are used, rather than the actual or a posteriori estimates.
If we used the actual estimates, obtained using the current filter co-efficient 𝐜𝐜(𝑛𝑛) then we
have the following.
a posteriori type adaptive algorithms
1. Filtering:
𝑦𝑦�𝑎𝑎 (𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛)𝐱𝐱(𝑛𝑛)
2. Error formation:
ε(𝑛𝑛) = 𝑦𝑦(𝑛𝑛) − 𝑦𝑦�𝑎𝑎 (𝑛𝑛)
3. Adaptive algorithm:
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + Δ𝐜𝐜{𝐱𝐱(𝑛𝑛), ε(𝑛𝑛)}
The a posteriori type adaptive algorithms are usually more difficult to formulate due to the coupling of c(n) between ε(n) = y(n) − c^T(n)x(n) and c(n) = c(n−1) + Δc{x(n), ε(n)}.
The goal of an adaptive filter is to find and then track the optimum filter 𝐜𝐜𝑂𝑂 (𝑛𝑛) as quickly
and accurately as possible. Define:
𝐜𝐜�(𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐜𝐜𝑂𝑂 (𝑛𝑛)
as the deviation from the optimal filter. The MSE for the adaptive filter can be decomposed
as:
𝑃𝑃(𝑛𝑛) = 𝑃𝑃𝑂𝑂 (𝑛𝑛) + 𝑃𝑃𝑒𝑒𝑒𝑒 (𝑛𝑛)
where 𝑃𝑃𝑂𝑂 (𝑛𝑛) is the optimum filter MSE (the best we can do) and 𝑃𝑃𝑒𝑒𝑒𝑒 (𝑛𝑛) measures the excess
MSE (EMSE) and indicates the deviation of the error response from the optimum filter case.
In stationary SOE’s the MSE can be further decomposed as:
𝑃𝑃(𝑛𝑛) = 𝑃𝑃𝑂𝑂 + 𝑃𝑃𝑡𝑡𝑡𝑡 (𝑛𝑛) + 𝑃𝑃𝑒𝑒𝑒𝑒 (∞)
where:
𝑃𝑃𝑂𝑂 is the stationary (i.e. constant) MSE of the optimum filter
𝑃𝑃𝑡𝑡𝑡𝑡 (𝑛𝑛) represents the transient MSE which dominates during the acquisition phase.
𝑃𝑃𝑒𝑒𝑒𝑒 (∞) represents the steady-state excess MSE which dominates during the tracking phase.
Stability
An acceptable adaptive filter should be stable in the bounded-input bounded-output (BIBO) sense. Assuming that the optimum filter is, by definition, stable, one measure of stability is whether the adaptive filter eventually converges to the optimum filter:

Convergence in the mean-square (MS) sense

\lim_{n \to \infty} E\{\|\tilde{\mathbf{c}}(n)\|^2\} = 0

Equation 7.2

where \mathrm{MSD}(n) = E\{\|\tilde{\mathbf{c}}(n)\|^2\} is the mean square deviation (MSD).
Speed of adaptation
The speed of adaptation or rate of convergence during the acquisition phase can be
considered proportional to the total amount of transient MSE. Thus:
Total transient MSE (measure of speed of adaptation)
P_{tr}^{total} = \sum_{n=0}^{\infty} P_{tr}(n)
Equation 7.3
Quality of Adaptation
The EMSE and steady-state EMSE (in the stationary SOE case) are a measure of how well
the adaptive filter is working, i.e. the quality of adaptation.
Misadjustment (measure of quality of adaptation)
\mathrm{M} = \frac{P_{ex}(n)}{P_O(n)} \quad \text{or} \quad \frac{P_{ex}(\infty)}{P_O}
Equation 7.4
7.2.1 Derivation
The LMS adaptive filter is a form of stochastic steepest-descent algorithm (SDA) that
minimises the instantaneous MSE. For details see [1: 516-524][2: 499-505].
The SDA attempts to find the optimum filter, \mathbf{c}_O(n), by taking steps in the direction of the negative gradient of the MSE cost function, -\nabla\xi(n) = -\frac{\partial \xi(n)}{\partial \mathbf{c}(n-1)}, where the MSE cost function \xi(n) = E\{|e(n)|^2\} \approx |e(n)|^2 is approximated by the instantaneous MSE.
For a data input vector with M values the algorithm requires 2M multiplications and 2M
additions at each time-step.
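The resulting coefficient update appears in the linear-prediction example below: c(n) = c(n−1) + 2µe(n)x(n). A minimal MATLAB sketch of the a priori LMS filter follows (function and variable names are ours, not from the notes; save as lms_sketch.m):

function [c, e] = lms_sketch(x, y, M, mu)
% A priori LMS: estimate y(n) from x(n) = [x(n) ... x(n-M+1)]'.
N = length(x);
c = zeros(M, 1);                 % initial guess c(-1) = 0
e = zeros(N, 1);
xv = zeros(M, 1);                % data vector x(n)
for n = 1:N
    xv = [x(n); xv(1:M-1)];      % shift in the new sample
    yhat = c.' * xv;             % 1. filtering with c(n-1): M multiplies
    e(n) = y(n) - yhat;          % 2. a priori error formation
    c = c + 2*mu*e(n)*xv;        % 3. update: further M multiplies (~2M total)
end
end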
The FIR adaptive filter realisation using the LMS algorithm is shown in Figure 7-3.
Figure 7-3 An FIR adaptive filter realisation using the LMS algorithm (Figure 10.18[1])
Speed of adaptation
The theoretical analysis is beyond the scope of these notes but the defining relation is that:
P_{tr}^{total} = \sum_{n=0}^{\infty} P_{tr}(n) \cong \frac{\|\tilde{\mathbf{c}}(0)\|^2}{4\mu}
The smaller the step size and the farther the initial co-efficients are from their optimum
settings, the more iterations it takes for the LMS algorithm to converge.
Another factor which contributes to a slow rate of convergence which arises from analysis of
the convergence of the underlying SDA is the eigenvalue spread (condition number) of the
input correlation matrix 𝐑𝐑 𝑥𝑥 :
X(\mathbf{R}_x) = \frac{\lambda_{max}}{\lambda_{min}}
For the LMS algorithm the transient decay time-constant is lower bounded by:
𝜏𝜏 > Χ(𝐑𝐑 𝑥𝑥 )
Thus the LMS algorithm will converge faster if the contours of the error surface are circular (i.e. R_x → σ_x² I or X(R_x) → 1, small eigenvalue spread) than when they are elliptical (i.e. X(R_x) ≫ 1, large eigenvalue spread).
Quality of adaptation
The theoretical analysis is beyond the scope of these notes but the steady-state excess MSE
and hence misadjustment is given by:
\mathrm{M} = \frac{P_{ex}(\infty)}{P_O} \cong \mu \, \mathrm{tr}(\mathbf{R}_x) = \mu \sum_{k=1}^{M} E\{|x_k(n)|^2\} \equiv \mu M E\{|x(n)|^2\}

The larger the step-size and the larger the tap input power (defined as M P_x = M E\{|x(n)|^2\}), the greater the deviation of the error from the optimum error.
Linear Prediction
Consider a signal, x(n), generated by the following AR(2) model:

x(n) = -a_1 x(n-1) - a_2 x(n-2) + w(n)

where w(n) ~ N(0, σ_w²) and two sets of parameter co-efficients are chosen as follows:

X(R) = λ₁/λ₂ = 1.1/0.9 = 1.22 :  a₁ = −0.1950, a₂ = 0.95, σ_w² = 0.0965
X(R) = λ₁/λ₂ = 1.818/0.182 = 10 :  a₁ = −1.5955, a₂ = 0.95, σ_w² = 0.0322
The adaptive LMS algorithm was used to provide estimates of

\mathbf{c}(n) = \begin{bmatrix} c_1(n) \\ c_2(n) \end{bmatrix} \equiv \begin{bmatrix} -a_1 \\ -a_2 \end{bmatrix}

from 1000 realisations of x(n) for each of the above sets of parameters. The LMS algorithm for this problem is:
Filtering
𝑥𝑥�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
= 𝑐𝑐1 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 1) + 𝑐𝑐2 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 2)
Error Formation
𝑒𝑒(𝑛𝑛) = 𝑥𝑥(𝑛𝑛) − 𝑥𝑥�(𝑛𝑛)
= 𝑥𝑥(𝑛𝑛) − 𝑐𝑐1 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 1) − 𝑐𝑐2 (𝑛𝑛 − 1)𝑥𝑥(𝑛𝑛 − 2)
Co-efficient Updating
\mathbf{c}(n) = \begin{bmatrix} c_1(n) \\ c_2(n) \end{bmatrix} = \begin{bmatrix} c_1(n-1) \\ c_2(n-1) \end{bmatrix} + \begin{bmatrix} 2\mu e(n) x(n-1) \\ 2\mu e(n) x(n-2) \end{bmatrix} = \mathbf{c}(n-1) + 2\mu e(n) \mathbf{x}(n)

where \mathbf{x}(n) = [x(n-1) \;\; x(n-2)]^T, µ is the step-size, and the adaptive predictor is initialised by x(−1) = x(−2) = 0 and c_1(−1) = c_2(−1) = 0.
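As a rough numerical check (our own construction, not from the notes), the lms_sketch function given earlier reproduces this predictor if the input is delayed by one sample so that the data vector holds [x(n−1); x(n−2)]:

a = [-0.195 0.95]; sw = sqrt(0.0965);      % first parameter set, X(R) = 1.22
x = filter(1, [1 a], sw*randn(1000,1));    % one AR(2) realisation
xd = [0; x(1:end-1)];                      % delay => xv = [x(n-1); x(n-2)]
[c, e] = lms_sketch(xd, x, 2, 0.04);       % c(n) should approach [0.195; -0.95]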
Plots of the 𝐜𝐜(𝑛𝑛) learning curve and the effect of step-size for the case of a small eigenvalue
spread (Χ(𝐑𝐑) = 1.22) are shown in Figure 7-4 and for a large eigenvalue spread
(Χ(𝐑𝐑) = 10) are shown in Figure 7-5.
Figure 7-4 The performance of the LMS adaptive algorithm used in linear prediction of an AR(2) process for an eigenvalue spread of X(R) = 1.22. (left plot) The average and sample c(n) learning curve for µ = 0.04. (right plot) Effect of step-size on MSE learning curve. (Figure 10.20[1])
Figure 7-5 The performance of the LMS adaptive algorithm used in linear prediction of an AR(2) process for an eigenvalue spread of X(R) = 10. (left plot) The average and sample c(n) learning curve. (right plot) Effect of step-size on MSE learning curve. (Figure 10.21[1])
An echo canceller based on the LMS adaptive algorithm is needed to remove the unwanted
echoes from the incoming signal.
Figure 7-6 Block diagram of adaptive echo cancellation filter (Figure 10.23[1])
Echo cancellation in a communications system can be investigated from the block diagram
given by Figure 7-6 where:
• 𝑥𝑥(𝑛𝑛) is the transmitted signal from the local handset which is assumed to be an IID binary
data sequence,
• FIR echo path, c_O, is the FIR filter structure modelling the generation of the combined near-end and far-end echo signal, y(n), arising from the originating signal source, x(n),
• 𝑢𝑢(𝑛𝑛) = 𝑧𝑧(𝑛𝑛) + 𝑣𝑣(𝑛𝑛) is the “uncancelable” desired signal received from the remote
transmitted signal, 𝑠𝑠(𝑛𝑛), subject to the effect of the transmission path, 𝑔𝑔(𝑛𝑛), and additive
noise, 𝑣𝑣(𝑛𝑛)~𝑁𝑁(0, σ2𝑣𝑣 ),
• 𝑠𝑠𝑟𝑟 (𝑛𝑛) is the received signal at the local handset subject to echo interference from 𝑦𝑦(𝑛𝑛),
• adaptive echo canceller is an FIR filter structure that attempts to form an estimate of the
unwanted echo signal, 𝑦𝑦�(𝑛𝑛), which is then subtracted from the received signal 𝑠𝑠𝑟𝑟 (𝑛𝑛).
The formulation of a practical LMS adaptive algorithm is based on the observation that the
data sequence 𝐱𝐱(𝑛𝑛) is correlated with 𝑦𝑦(𝑛𝑛) but not 𝑠𝑠(𝑛𝑛) or 𝑣𝑣(𝑛𝑛). Thus 𝐸𝐸{𝐱𝐱(𝑛𝑛)𝑠𝑠(𝑛𝑛)} = 0
which also implies 𝐸𝐸{𝑦𝑦(𝑛𝑛)𝑢𝑢(𝑛𝑛)} = 𝐸𝐸{𝑦𝑦�(𝑛𝑛)𝑢𝑢(𝑛𝑛)} = 0 (why?). As shown by Figure 7-6 we
instead use 𝑒𝑒(𝑛𝑛) = 𝑠𝑠𝑟𝑟 (𝑛𝑛) − 𝑦𝑦�(𝑛𝑛). So what happens when we use the LMS algorithm to
minimise this error?
E\{e^2(n)\} = E\{(s_r(n) - \hat{y}(n))^2\}
= E\left\{\left(y(n) + u(n) - \hat{y}(n)\right)^2\right\} = E\{([y(n) - \hat{y}(n)] + u(n))^2\}
= E\{u^2(n)\} + E\left\{\left(y(n) - \hat{y}(n)\right)^2\right\} + 2E\left\{u(n)\left(y(n) - \hat{y}(n)\right)\right\}
= E\{u^2(n)\} + E\left\{\left(y(n) - \hat{y}(n)\right)^2\right\}
Thus minimising E{e²(n)} is equivalent to minimising E{(y(n) − ŷ(n))²}, the MSE for y(n). As ŷ(n) → y(n) we have E{e²(n)} → E{u²(n)}, and the output of the system is in fact e(n) → u(n), which is the desired received signal!
Thus the following LMS algorithm for adaptive echo cancellation can be formulated:
Filtering
𝑦𝑦�(𝑛𝑛) = 𝐜𝐜 𝑇𝑇 (𝑛𝑛 − 1)𝐱𝐱(𝑛𝑛)
Error Formation (and also the output!)
𝑒𝑒(𝑛𝑛) = 𝑠𝑠𝑟𝑟 (𝑛𝑛) − 𝑦𝑦�(𝑛𝑛)
Co-efficient Updating
𝐜𝐜(𝑛𝑛) = 𝐜𝐜(𝑛𝑛 − 1) + 2µ 𝑒𝑒(𝑛𝑛)𝐱𝐱(𝑛𝑛)
We incorporate this knowledge into the step-size by defining β and generating the adaptive step-size as:

\mu(n) = \frac{\beta}{2\|\mathbf{x}(n)\|^2}
Hence we formulate the normalised LMS (NLMS):

\mathbf{c}(n) = \mathbf{c}(n-1) + \beta \, \frac{\mathbf{x}(n)}{\|\mathbf{x}(n)\|^2} \, e(n)
where the choice of the NLMS step-size, β, will ensure convergence for 0 < β < 2. To use the NLMS we also need the following recursion for the normalisation term ‖x(n)‖²:

\|\mathbf{x}(n)\|^2 = \|\mathbf{x}(n-1)\|^2 + |x(n)|^2 - |x(n-M)|^2
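A minimal NLMS sketch in MATLAB (names are ours; for clarity the norm is recomputed per step instead of using the O(1) recursion above, and a small eps guards against ‖x(n)‖² = 0; save as nlms_sketch.m):

function [c, e] = nlms_sketch(x, y, M, beta)
% Normalised LMS: same loop as the LMS but with a normalised step, 0 < beta < 2.
N = length(x); c = zeros(M,1); e = zeros(N,1); xv = zeros(M,1);
for n = 1:N
    xv = [x(n); xv(1:M-1)];                  % data vector x(n)
    e(n) = y(n) - c.'*xv;                    % a priori error
    c = c + beta*e(n)*xv/(xv.'*xv + eps);    % normalised update
end
end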
Advanced Techniques
The LMS algorithm attains its best performance when the input correlation matrix is
diagonal with equal eigenvalues (i.e. Χ(𝐑𝐑 𝑥𝑥 ) = 1 implying no eigenvalue spread) and for FIR
filters this implies that the input data signal is white. Where this is not the case the following
two variations to the basic LMS algorithm are possible:
• Transform-domain LMS algorithm which applies a whitening transformation to the input
data based on the known input correlation matrix.
• Suboptimal decorrelation of the input data via the discrete cosine transform (DCT) or
discrete wavelet transform (DWT) when the input correlation matrix is unknown.
In applications that require filters with a large number of coefficients, then the real-time
implementation of the LMS adaptive algorithm becomes difficult. One solution is to resort to
a block adaptive filter structure where the filter co-efficients are updated on a block-by-
block rather than sample-by-sample basis.
7.3.1 Introduction
Whereas with the LMS adaptive filter the instantaneous MSE is used:
𝜉𝜉(𝑛𝑛) = 𝐸𝐸{|𝑒𝑒(𝑛𝑛)|2 } ≈ |𝑒𝑒(𝑛𝑛)|2
with the RLS adaptive algorithm a weighted MSE is used:

\xi(n) = \sum_{j=0}^{n} \lambda^{n-j} |e(j)|^2, \quad 0 < \lambda \le 1

Equation 7.5
To derive the LS filter coefficients that minimise Equation 7.5 we proceed by setting the derivative of ξ(n) with respect to c(n) to zero:

\frac{\partial \xi(n)}{\partial \mathbf{c}(n)} = 2 \sum_{j=0}^{n} \lambda^{n-j} e(j) \frac{\partial e(j)}{\partial \mathbf{c}(n)} = -2 \sum_{j=0}^{n} \lambda^{n-j} e(j) \mathbf{x}(j) = \mathbf{0}

which yields the time-averaged normal equations \hat{\mathbf{R}}(n)\mathbf{c}(n) = \hat{\mathbf{d}}(n), with \hat{\mathbf{R}}(n) = \sum_{j=0}^{n} \lambda^{n-j} \mathbf{x}(j)\mathbf{x}^T(j) and \hat{\mathbf{d}}(n) = \sum_{j=0}^{n} \lambda^{n-j} \mathbf{x}(j) y(j).
The LS adaptive filter summarised in Figure 7-7 requires the solution of the M × M normal equations (for an M-length FIR filter) for the adaptation gain vector at each time-step. This makes practical implementation of the LS adaptive filter difficult without a more efficient method to calculate the adaptation gain vector. By formulating a time-recursive equation for the inverse of the correlation matrix, \mathbf{P}(n) = \hat{\mathbf{R}}^{-1}(n), the adaptation gain can be derived directly as \bar{\mathbf{g}}(n) = \mathbf{P}(n-1)\mathbf{x}(n). The CRLS algorithm summarised in Figure 7-8 represents a practical implementation of the a priori RLS algorithm. For the derivation see [1: 552-553][2: 543-544].
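A minimal a priori exponential-memory RLS sketch in MATLAB (our own construction, not the textbook's listing; it propagates P(n) = R̂⁻¹(n) directly so no normal equations are solved per step; save as rls_sketch.m):

function [c, e] = rls_sketch(x, y, M, lambda, delta)
% lambda: forgetting factor (0 < lambda <= 1); delta: large initial scaling.
N = length(x); c = zeros(M,1); e = zeros(N,1); xv = zeros(M,1);
P = delta*eye(M);                     % P(0), weak initial regularisation
for n = 1:N
    xv = [x(n); xv(1:M-1)];           % data vector x(n)
    g = P*xv / (lambda + xv.'*P*xv);  % adaptation gain vector
    e(n) = y(n) - c.'*xv;             % a priori error
    c = c + g*e(n);                   % coefficient update
    P = (P - g*(xv.'*P)) / lambda;    % matrix-inversion-lemma update of P(n)
end
end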
NOTE
1) The growing memory RLS algorithm yields no deviation and no excess MSE in the
steady-state whereas the exponential memory RLS algorithm exhibits a finite deviation and
excess MSE in the steady-state which decreases to zero as λ approaches 1.
2) However the growing memory RLS algorithm is not able to track changes to new statistical environments and should only be deployed in stationary SOEs. The exponential memory RLS algorithm is the more practical implementation as λ can be used to trade off performance and tracking ability.
3) Unlike the LMS algorithm the convergence of the RLS algorithm is not sensitive to the
eigenvalue spread.
Consider the steady-state co-efficient error vector which can be written as:
𝐜𝐜�(𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐜𝐜𝑂𝑂 (𝑛𝑛)
= [𝐜𝐜(𝑛𝑛) − 𝐸𝐸{𝐜𝐜(𝑛𝑛)}] + [𝐸𝐸{𝐜𝐜(𝑛𝑛)} − 𝐜𝐜𝑂𝑂 (𝑛𝑛)]
≡ 𝐜𝐜�𝑒𝑒 (𝑛𝑛) + 𝐜𝐜�𝑙𝑙 (𝑛𝑛)
where
• 𝐜𝐜�𝑒𝑒 (𝑛𝑛) = 𝐜𝐜(𝑛𝑛) − 𝐸𝐸{𝐜𝐜(𝑛𝑛)} is the estimation error which represents the fluctuations of the
adaptive filter parameter vector about its mean.
• 𝐜𝐜�𝑙𝑙 (𝑛𝑛) = 𝐸𝐸{𝐜𝐜(𝑛𝑛)} − 𝐜𝐜𝑂𝑂 (𝑛𝑛) is the lag error which represents the bias in 𝐜𝐜(𝑛𝑛) with respect
to the optimal 𝐜𝐜𝑂𝑂 (𝑛𝑛).
We note that:
1. In stationary SOEs the estimation error manifests itself as the noisy fluctuations about the
constant optimal 𝐜𝐜𝑂𝑂 and the lag error is zero.
2. In nonstationary SOEs the estimation error manifests itself as the noisy fluctuations about
the mean trajectory of 𝐜𝐜(𝑛𝑛) and the lag error manifests itself as the deviation between the
𝐜𝐜(𝑛𝑛) curve and the optimal 𝐜𝐜𝑂𝑂 (𝑛𝑛) curve.
7.5 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000.
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.
The optimum MMSE estimator assumes knowledge of the second-order moments (i.e. the
autocorrelation sequence 𝑟𝑟(𝑙𝑙)). In practice such second-order moments are not available and
have to be estimated from the available data. Thus we resort to techniques applicable to
deterministic signals where we have access to actual data (although the origin may be
stochastic) rather than estimated statistics.
For ergodic stationary data, a widely used measure is the time-averaged sample
autocorrelation sequence:
\hat{r}_x(l) = \begin{cases} \dfrac{1}{N} \sum_{n=l}^{N-1} x(n) x(n-l) & 0 \le l \le N-1 \\[4pt] \hat{r}_x(-l) & -(N-1) \le l \le 0 \\[4pt] 0 & \text{otherwise} \end{cases}

Equation 8.1

which makes use only of the available data \{x(n)\}_{n=0}^{N-1}. Hence estimates for |l| close to N will not contain enough data samples to be reliable (i.e. N − l samples is a small number). A good rule of thumb is to restrict |l| ≤ N/4.
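A minimal MATLAB sketch of Equation 8.1 (function and variable names are ours; save as acorr_sketch.m):

function r = acorr_sketch(x, L)
% Biased (1/N) sample autocorrelation r^_x(l) for lags l = 0..L;
% negative lags follow from r^_x(-l) = r^_x(l). Restrict L <= N/4.
N = length(x); r = zeros(L+1, 1);
for l = 0:L
    r(l+1) = sum(x(l+1:N) .* x(1:N-l)) / N;   % (1/N) sum_{n=l}^{N-1} x(n)x(n-l)
end
end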
Linear LSE estimation is based on the availability of measurements of the desired response
𝑦𝑦(𝑛𝑛) and the M input signals 𝑥𝑥𝑘𝑘 (𝑛𝑛) for 1 ≤ 𝑘𝑘 ≤ 𝑀𝑀 over the measurement interval or analysis
frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1. As was the case with optimum MMSE estimation the problem is to
estimate the desired response 𝑦𝑦(𝑛𝑛) using a linear combination of the M input signals as
follows:
\hat{y}(n) = \sum_{k=1}^{M} c_k x_k(n)

such that the total squared error over the frame is minimised:

E = \sum_{n=0}^{N-1} |e(n)|^2, \quad e(n) = y(n) - \hat{y}(n)
For this minimisation to be possible, the coefficient vector 𝒄𝒄(𝑛𝑛) has to be held constant over
the analysis frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1, i.e. 𝒄𝒄(𝑛𝑛) ≡ 𝒄𝒄. Furthermore the linear LSE estimator 𝒄𝒄𝒍𝒍𝒍𝒍
so obtained depends on the measurement set or particular analysis frame and a different
value will be obtained with different sets of data. This should be contrasted with the optimum
MMSE estimate 𝒄𝒄𝑶𝑶 which only depends on the second-order moments (e.g. it is time-
invariant for stationary signals).
where the N × 1 columns \bar{\mathbf{x}}_k of X are called the data records (collected for input "sensor" k):

\bar{\mathbf{x}}_k = [x_k(0) \;\; x_k(1) \;\; \cdots \;\; x_k(N-1)]^T

and the 1 × M rows \mathbf{x}^T(n) of X are called the snapshots (of all "sensors" at time n):

\mathbf{x}^T(n) = [x_1(n) \;\; x_2(n) \;\; \cdots \;\; x_M(n)]
Equation 8.4 represents a system of N equations in M unknowns. The practical case of interest
for LS analysis that will be considered is for overdetermined systems when N > M.
The LSE estimator operates in block processing mode. That is, it processes a frame of N snapshots of the data, where the data is blocked into frames of length N samples with successive frames overlapping by N_O samples. The required estimates or error signals are unblocked at the final stage of the processor. The values of N and N_O and the interpolation between overlapping estimates depend on the application. This is illustrated in Figure 8-1.
Figure 8-1 Block processing implementation of general linear LSE estimator (Figure 8.2[1])
A geometric derivation to the normal equations will be provided which will highlight
important interpretations of the variables involved, especially the input data matrix, X.
The desired response vector, y, and data records, \bar{\mathbf{x}}_k, for 1 ≤ k ≤ M are considered as vectors in an N-dimensional vector space, with the inner product defined by:

\langle \bar{\mathbf{x}}_i, \bar{\mathbf{x}}_j \rangle = \bar{\mathbf{x}}_i^T \bar{\mathbf{x}}_j = \sum_{n=0}^{N-1} x_i(n) x_j(n), \quad \text{with the special case} \quad \langle \bar{\mathbf{x}}, \bar{\mathbf{x}} \rangle = \|\bar{\mathbf{x}}\|^2

The estimate of the desired response can be expressed as:

\hat{\mathbf{y}} = \sum_{k=1}^{M} c_k \bar{\mathbf{x}}_k

which lies in the estimation space spanned by the data records. The LSE is minimised when the error vector \mathbf{e}_{ls} = \mathbf{y} - \hat{\mathbf{y}}_{ls} is orthogonal to the estimation space, that is \mathbf{e}_{ls} \perp \bar{\mathbf{x}}_k, 1 \le k \le M, which is the case when \hat{\mathbf{y}}_{ls} is the projection of y onto the estimation space. This is illustrated in Figure 8-2.
Figure 8-2 Vector space interpretation of LSE estimation for N=3 (dimension of data space)
and M=2 (dimension of estimation space) (Figure 8.5[1])
where \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} is the time-average correlation matrix and \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} is the time-average cross-correlation vector. From Figure 8-2 we have the following trigonometric identity:

\|\mathbf{y}\|^2 = \|\hat{\mathbf{y}}_{ls}\|^2 + \|\mathbf{e}_{ls}\|^2
Now we define E_{ls} = \mathbf{e}_{ls}^T \mathbf{e}_{ls} = \|\mathbf{e}_{ls}\|^2 and E_y = \mathbf{y}^T \mathbf{y} = \|\mathbf{y}\|^2, and note that from Equation 8.5 \hat{\mathbf{y}}_{ls} = \mathbf{X}\mathbf{c}_{ls}. This together with Equation 8.6 gives \|\hat{\mathbf{y}}_{ls}\|^2 = \hat{\mathbf{y}}_{ls}^T \hat{\mathbf{y}}_{ls} = \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{X} \mathbf{c}_{ls} = \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{y}, and thus we can say:

E_{ls} = E_y - \mathbf{c}_{ls}^T \mathbf{X}^T \mathbf{y} = E_y - \mathbf{c}_{ls}^T \hat{\mathbf{d}} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls}

Equation 8.7
For linear LSE estimation the computational requirements for the calculation of \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} and \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} are as important as the solution of the normal equations of Equation 8.6 themselves. The formulation of the normal equations for LS estimation is illustrated by Figure 8-3.
Since \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} is symmetric, only the upper triangular elements \hat{r}_{ij} = \bar{\mathbf{x}}_i^T \bar{\mathbf{x}}_j for j ≥ i need to be calculated, requiring M(M+1)/2 dot products with N arithmetic operations per dot product. Forming \hat{\mathbf{d}} = \mathbf{X}^T\mathbf{y} requires calculation of \hat{d}_i = \bar{\mathbf{x}}_i^T \mathbf{y}, i.e. M dot products with N operations per dot product. Thus, to form the normal equations requires a total of:

\frac{1}{2} M(M+1)N + MN = \frac{1}{2} M^2 N + \frac{3}{2} MN
arithmetic operations. Solution of the normal equations by standard techniques like LDL𝐻𝐻 or
Cholesky decomposition [1, Section 6.3] requires 𝑂𝑂(𝑀𝑀3 ) operations. For over-determined
systems of interest where 𝑁𝑁 > 𝑀𝑀 this may mean that more computational work will be
involved in forming the normal equations 𝑂𝑂(𝑀𝑀2 𝑁𝑁) than in solving them 𝑂𝑂(𝑀𝑀 3 )!
Uniqueness Theorem
The over-determined (𝑁𝑁 > 𝑀𝑀) LS problem has a unique solution provided by the normal
equations of Equation 8.6 if the time-average correlation matrix 𝐑𝐑� = 𝐗𝐗 𝑇𝑇 𝐗𝐗 is positive definite,
or equivalently if the data matrix 𝐗𝐗 has linearly independent columns.
Example 8.1
Problem: Estimate the sequence 𝐲𝐲 = [1 2 3 2]𝑇𝑇 from the observation data records 𝐱𝐱�1 =
[1 2 1 1]𝑇𝑇 and 𝐱𝐱� 2 = [2 1 2 3]𝑇𝑇 by determining the optimum filter, the error vector,
𝐞𝐞𝑙𝑙𝑙𝑙 , and LSE 𝐸𝐸𝑙𝑙𝑙𝑙 .
� = 𝐗𝐗 𝑇𝑇 𝐗𝐗 = �7 9 �,
𝐑𝐑
10
𝐝𝐝̂ = 𝐗𝐗 𝑇𝑇 𝐲𝐲 = � �
9 18 16
where:
1 2
2 1
𝐗𝐗 = [𝐱𝐱�1 𝐱𝐱� 2 ] = � �
1 2
1 3
and then solve the normal equations to obtain the LS estimator:
\mathbf{c}_{ls} = \hat{\mathbf{R}}^{-1} \hat{\mathbf{d}} = \begin{bmatrix} \frac{2}{5} & -\frac{1}{5} \\ -\frac{1}{5} & \frac{7}{45} \end{bmatrix} \begin{bmatrix} 10 \\ 16 \end{bmatrix} = \begin{bmatrix} \frac{4}{5} \\[2pt] \frac{22}{45} \end{bmatrix}

and the LSE:

E_{ls} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = \|\mathbf{y}\|^2 - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = 18 - [10 \;\; 16] \begin{bmatrix} \frac{4}{5} \\[2pt] \frac{22}{45} \end{bmatrix} = \frac{98}{45}
The projection matrix is:

\mathbf{P} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = \begin{bmatrix} \frac{2}{9} & \frac{1}{9} & \frac{2}{9} & \frac{1}{3} \\[2pt] \frac{1}{9} & \frac{43}{45} & \frac{1}{9} & -\frac{2}{15} \\[2pt] \frac{2}{9} & \frac{1}{9} & \frac{2}{9} & \frac{1}{3} \\[2pt] \frac{1}{3} & -\frac{2}{15} & \frac{1}{3} & \frac{3}{5} \end{bmatrix}

which can be used to determine the error vector:

\mathbf{e}_{ls} = (\mathbf{I} - \mathbf{P})\mathbf{y} = \begin{bmatrix} -\frac{7}{9} & -\frac{4}{45} & \frac{11}{9} & -\frac{4}{15} \end{bmatrix}^T

from which we get \|\mathbf{e}_{ls}\|^2 = \frac{98}{45} = E_{ls}.
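As a quick numerical verification of this example (our own construction), in MATLAB:

X = [1 2; 2 1; 1 2; 1 3];  y = [1 2 3 2]';
cls = (X'*X) \ (X'*y);              % [4/5; 22/45]
P   = X * ((X'*X) \ X');            % projection matrix above
els = (eye(4) - P) * y;             % [-7/9; -4/45; 11/9; -4/15]
Els = y'*y - (X'*y)'*cls;           % 98/45 = norm(els)^2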
Linear LS estimation of FIR filters adopts the same framework used in optimum MMSE FIR filter estimation. Previously we had:

e(n) = y(n) - \hat{y}(n) = y(n) - \sum_{k=0}^{M-1} c_k x(n-k) = y(n) - \mathbf{c}^T \mathbf{x}(n)

Equation 8.8
With linear LS estimation we assume availability of the desired response 𝑦𝑦(𝑛𝑛) and input
signal 𝑥𝑥(𝑛𝑛) over the measurement interval or analysis frame 0 ≤ 𝑛𝑛 ≤ 𝑁𝑁 − 1 and hold the
linear LS estimators or filter coefficients 𝐜𝐜 = [𝑐𝑐0 𝑐𝑐1 ⋯ 𝑐𝑐𝑀𝑀−1 ]𝑇𝑇 constant over this
interval. However unlike the general linear LS estimation case (Section 8.2) Equation 8.8
holds for 𝑁𝑁𝑖𝑖 ≤ 𝑛𝑛 ≤ 𝑁𝑁𝑓𝑓 where 𝑁𝑁𝑖𝑖 and 𝑁𝑁𝑓𝑓 depend on the type of “windowing” that is applied.
The need for this windowing arises by noting that data outside the measurement interval may
be required. For example, at 𝑛𝑛 = 0 we require 𝐱𝐱(0) = [𝑥𝑥(0) 𝑥𝑥(−1) ⋯ 𝑥𝑥(−𝑀𝑀 + 1)]𝑇𝑇
and the data 𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀 + 1) lies outside the measurement interval 0 ≤ 𝑛𝑛 ≤
𝑁𝑁 − 1. In matrix form the system of equations can be written as:
\mathbf{e} = \mathbf{y} - \mathbf{X}\mathbf{c} \quad \text{for } N_i \le n \le N_f

where e, y and X are given by:

\mathbf{e} = [e(N_i) \;\; e(N_i+1) \;\; \cdots \;\; e(N_f)]^T
\mathbf{y} = [y(N_i) \;\; y(N_i+1) \;\; \cdots \;\; y(N_f)]^T

\mathbf{X} = \begin{bmatrix} \mathbf{x}^T(N_i) \\ \mathbf{x}^T(N_i+1) \\ \vdots \\ \mathbf{x}^T(N_f) \end{bmatrix} = \begin{bmatrix} x(N_i) & x(N_i-1) & \cdots & x(N_i-M+1) \\ x(N_i+1) & x(N_i) & \cdots & x(N_i-M+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(N_f) & x(N_f-1) & \cdots & x(N_f-M+1) \end{bmatrix} = [\bar{\mathbf{x}}_1 \;\; \bar{\mathbf{x}}_2 \;\; \cdots \;\; \bar{\mathbf{x}}_M]

where e and y are (N_f − N_i + 1)-length vectors and X is a (N_f − N_i + 1) × M matrix with columns:

\bar{\mathbf{x}}_l = [x(N_i+1-l) \;\; \cdots \;\; x(n+1-l) \;\; \cdots \;\; x(N_f+1-l)]^T

and the LS criterion is:

E = \sum_{n=N_i}^{N_f} |e(n)|^2 = \mathbf{e}^T \mathbf{e}
which is minimised when the LS FIR filter co-efficients are chosen so as to satisfy the normal
equations:
(\mathbf{X}^T\mathbf{X}) \mathbf{c}_{ls} = \mathbf{X}^T \mathbf{y} \quad \Leftrightarrow \quad \hat{\mathbf{R}} \, \mathbf{c}_{ls} = \hat{\mathbf{d}}
with an LS error of:

E_{ls} = E_y - \hat{\mathbf{d}}^T \mathbf{c}_{ls} = \|\mathbf{y}\|^2 - \hat{\mathbf{d}}^T \mathbf{c}_{ls}

8.3.2 Efficient computation of the correlation matrix \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X}
This recursion holds because the columns of X are obtained by shifting the first column. The
recursion suggests the following efficient way to compute 𝐑𝐑 �:
1. Compute the first row of 𝐑𝐑 � using Equation 8.9. This requires M dot products and a total of
𝑀𝑀(𝑁𝑁𝑓𝑓 − 𝑁𝑁𝑖𝑖 ) operations.
2. Compute the remaining elements in the upper triangular part of 𝐑𝐑 � using the recursion
Equation 8.10. The required number of operations are 𝑂𝑂(𝑀𝑀2 ).
3. Compute the lower triangular part of \hat{\mathbf{R}} from \hat{r}_{ji} = \hat{r}_{ij} since \hat{\mathbf{R}} is symmetric.
In the following we assume an over-determined system N > M. The data, x(n) and y(n), are also assumed to be derived by windowing with a rectangular window covering 0 ≤ n ≤ N − 1. That is, only data in the analysis frame 0 ≤ n ≤ N − 1 are available and data outside that interval are assumed to be zero.
Case 1: No windowing
𝑁𝑁𝑖𝑖 = 𝑀𝑀 − 1 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1 implying:
𝐲𝐲 = [𝑦𝑦(𝑀𝑀 − 1) 𝑦𝑦(𝑀𝑀) ⋯ 𝑦𝑦(𝑁𝑁 − 1)]𝑇𝑇
\mathbf{X} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = [\mathbf{X}_{nowi}]
That is, all the available data is used without any distortions arising from using data outside
the measurement interval. In the signal processing literature this is sometimes referred to as
the covariance method.
Case 2: Prewindowing
N_i = 0 and N_f = N − 1 implying:

\mathbf{y} = [y(0) \;\; y(1) \;\; \cdots \;\; y(N-1)]^T

\mathbf{X} = \begin{bmatrix} x(0) & x(-1) & \cdots & x(-M+1) \\ x(1) & x(0) & \cdots & x(-M+2) \\ \vdots & \vdots & \ddots & \vdots \\ x(M-1) & x(M-2) & \cdots & x(0) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = \begin{bmatrix} x(0) & 0 & \cdots & 0 \\ x(1) & x(0) & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ x(M-1) & x(M-2) & \cdots & x(0) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{prew} \\ \mathbf{X}_{nowi} \end{bmatrix}
That is, access is required to the data 𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀 + 1) which is outside the
measurement interval and are all set equal to zero. This method is widely used in LS adaptive
filtering.
Case 3: Postwindowing
N_i = M − 1 and N_f = N + M − 2 implying:

\mathbf{y} = [y(M-1) \;\; y(M) \;\; \cdots \;\; y(N-1) \;\; y(N) \;\; \cdots \;\; y(N+M-2)]^T = [y(M-1) \;\; y(M) \;\; \cdots \;\; y(N-1) \;\; 0 \;\; \cdots \;\; 0]^T

\mathbf{X} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \\ x(N) & x(N-1) & \cdots & x(N-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N+M-2) & x(N+M-3) & \cdots & x(N-1) \end{bmatrix} = \begin{bmatrix} x(M-1) & x(M-2) & \cdots & x(0) \\ x(M) & x(M-1) & \cdots & x(1) \\ \vdots & \vdots & \ddots & \vdots \\ x(N-1) & x(N-2) & \cdots & x(N-M) \\ 0 & x(N-1) & \cdots & x(N-M+1) \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x(N-1) \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{nowi} \\ \mathbf{X}_{post} \end{bmatrix}
That is, access is required to the data 𝑥𝑥(𝑁𝑁), 𝑥𝑥(𝑁𝑁 + 1), … , 𝑥𝑥(𝑁𝑁 + 𝑀𝑀 − 2) and signal 𝑦𝑦(𝑁𝑁),
𝑦𝑦(𝑁𝑁 + 1), … , 𝑦𝑦(𝑁𝑁 + 𝑀𝑀 − 2) which are outside the measurement interval and are all set
equal to zero. This method is not widely used in practice.
The matrix \hat{\mathbf{R}} and vector \hat{\mathbf{d}} for the various windowing methods are computed by using the SASP MATLAB function [1] [R,d]=lsmatvec(method,x,M,y) for filter order M, input data vector x, desired response vector y, and method = "prew", "post", "full" or "nowi".
Figure 8-4 shows the FIR LSE filter operating in block processing mode.
Figure 8-4 Block processing implementation of FIR LSE filter (Figure 8.6[1])
Adopting the same notation and theoretical development previously for the FLP, BLP and
smoother MMSE estimators we formulate the LSE estimators by restating the following
equation for estimating the 𝑙𝑙 𝑡𝑡ℎ sample of the signal using M other samples:
e^{(l)}(n) = \sum_{k=0}^{l-1} c_k x(n-k) + x(n-l) + \sum_{k=l+1}^{M} c_k x(n-k) = x(n-l) + \mathbf{x}_l^T(n) \, \mathbf{c}^{(l)}

Equation 8.11

where:

\mathbf{x}_l(n) = [x(n) \;\; x(n-1) \;\; \dots \;\; x(n-(l-1)) \;\; x(n-(l+1)) \;\; \dots \;\; x(n-M)]^T
\mathbf{c}^{(l)} = [c_0 \;\; c_1 \;\; \dots \;\; c_{l-1} \;\; c_{l+1} \;\; \dots \;\; c_M]^T
are (𝑀𝑀 × 1) vectors.
For LSE estimation the predictor co-efficients, c^{(l)}, are held constant over the measurement interval or frame 0 ≤ n ≤ N − 1, and Equation 8.11 can be written compactly as:

\mathbf{e}^{(l)} = \mathbf{x}^{(-l)} + \mathbf{X}_l \, \mathbf{c}^{(l)} \quad \text{for } N_i \le n \le N_f

where:

\mathbf{e}^{(l)} = [e^{(l)}(N_i) \;\; e^{(l)}(N_i+1) \;\; \cdots \;\; e^{(l)}(N_f)]^T
\mathbf{x}^{(-l)} = [x(N_i-l) \;\; x(N_i+1-l) \;\; \cdots \;\; x(N_f-l)]^T

are (N_f − N_i + 1) × 1 vectors, and:

\mathbf{X}_l = \begin{bmatrix} \mathbf{x}_l^T(N_i) \\ \mathbf{x}_l^T(N_i+1) \\ \vdots \\ \mathbf{x}_l^T(N_f) \end{bmatrix}

is a (N_f − N_i + 1) × M matrix.
Hence:

\mathbf{e}^{(l)} = \begin{bmatrix} e^{(l)}(N_i) \\ e^{(l)}(N_i+1) \\ \vdots \\ e^{(l)}(N_f) \end{bmatrix} = \begin{bmatrix} x(N_i-l) \\ x(N_i+1-l) \\ \vdots \\ x(N_f-l) \end{bmatrix} + \begin{bmatrix} \mathbf{x}_l^T(N_i) \\ \mathbf{x}_l^T(N_i+1) \\ \vdots \\ \mathbf{x}_l^T(N_f) \end{bmatrix} \mathbf{c}^{(l)} = \mathbf{x}^{(-l)} + \mathbf{X}_l \, \mathbf{c}^{(l)}
where:
for full windowing : 𝑁𝑁𝑖𝑖 = 0, 𝑁𝑁𝑓𝑓 = 𝑁𝑁 + 𝑀𝑀 − 1,
for no windowing : 𝑁𝑁𝑖𝑖 = 𝑀𝑀, 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1, and
𝑥𝑥(−1), 𝑥𝑥(−2), … , 𝑥𝑥(−𝑀𝑀), 𝑥𝑥(𝑁𝑁), 𝑥𝑥(𝑁𝑁 + 1), … , 𝑥𝑥(𝑁𝑁 + 𝑀𝑀 − 1) are all set to zero
The solution for the LSE estimator is given by the usual normal equations:

(\mathbf{X}_l^T \mathbf{X}_l) \, \mathbf{c}_{ls}^{(l)} = -\mathbf{X}_l^T \mathbf{x}^{(-l)} \quad \Leftrightarrow \quad \hat{\mathbf{R}}_l \, \mathbf{c}_{ls}^{(l)} = \hat{\mathbf{d}}_l

Equation 8.12

with an LS error of:

E_{ls}^{(l)} = E_{x^{(-l)}} - \hat{\mathbf{d}}_l^T \mathbf{c}_{ls}^{(l)} = \left\|\mathbf{x}^{(-l)}\right\|^2 + \left(\mathbf{x}^{(-l)}\right)^T \mathbf{X}_l \, \mathbf{c}_{ls}^{(l)}
Special cases:
• LSE SLS (symmetric linear smoother): E_{ls}^{(M/2)} = E^s and \mathbf{c}_{ls}^{(M/2)} = \mathbf{c}_{ls}^{s}
• LSE FLP: E_{ls}^{(0)} = E^f and \mathbf{c}_{ls}^{(0)} = \mathbf{a}_{ls}
• LSE BLP: E_{ls}^{(M)} = E^b and \mathbf{c}_{ls}^{(M)} = \mathbf{b}_{ls}
Example 8.2
Problem: Observations over the interval 0 ≤ n ≤ N − 1 are available for the signal x(n) = α^n, where α is an arbitrary constant. Determine the first-order (M = 1) one-step forward linear predictor via LSE estimation, using the full-windowing and no-windowing methods.

Answer: For the full windowing case the normal equations give:

\hat{r}_{11} \, a_1^{(1)} = -\hat{r}_{12}
E^f = \hat{r}_{22} + \hat{r}_{21} \, a_1^{(1)}
where:

\hat{r}_{11} = \hat{r}_{22} = \sum_{n=0}^{N-1} |x(n)|^2 = \sum_{n=0}^{N-1} |\alpha|^{2n} = \frac{1 - |\alpha|^{2N}}{1 - |\alpha|^2}

\hat{r}_{12} = \hat{r}_{21} = \sum_{n=0}^{N-2} x(n) x(n+1) = \sum_{n=0}^{N-2} |\alpha|^{2n} \alpha = \alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

and by carrying out the required algebraic simplification the solution gives:

a_1^{(1)} = -\frac{\hat{r}_{21}}{\hat{r}_{11}} = -\alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^{2N}}, \quad E^f = \frac{1 - |\alpha|^{2(2N-1)}}{1 - |\alpha|^{2N}}
From which we see:
• The FLP is minimum-phase since it can be shown that |a_1^{(1)}| ≤ 1
• If |α| < 1, then for full windowing \lim_{N\to\infty} a_1^{(1)} = -\alpha and \lim_{N\to\infty} E^f = 1 = x(0), which implies that for large N the LSE FLP approaches the optimum MMSE FLP.
For the no windowing case:

\hat{r}_{11} = \sum_{n=0}^{N-2} |x(n)|^2 = \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}, \quad \hat{r}_{22} = \sum_{n=1}^{N-1} |x(n)|^2 = |\alpha|^2 \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

\hat{r}_{12} = \hat{r}_{21} = \sum_{n=0}^{N-2} x(n) x(n+1) = \alpha \, \frac{1 - |\alpha|^{2(N-1)}}{1 - |\alpha|^2}

By carrying out the required algebraic simplification the solution gives:

a_1^{(1)} = -\frac{\hat{r}_{21}}{\hat{r}_{11}} = -\alpha, \quad E^f = 0
From which we see:
• The FLP is minimum-phase only when |α| < 1
• The no windowing LSE predictor is identical to the optimum MMSE predictor.
For FIR filtering and prediction the following observations can be made with regards to
adopting the order-recursive algorithms developed in [1: 355-360][2: 215-240].
• In the full windowing case (𝑁𝑁𝑖𝑖 = 0 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 + 𝑀𝑀 − 2) the order m correlation matrix
� 𝑚𝑚 is Toeplitz and the order-recursive algorithms of Levinson and Levinson-Durbin can
𝐑𝐑
be applied.
• In the prewindowing case (N_i = 0 and N_f = N − 1) the order-recursive algorithms will require time updatings. A similar result holds for postwindowing but is not of practical interest.
• In the no windowing case (𝑁𝑁𝑖𝑖 = 𝑀𝑀 − 1 and 𝑁𝑁𝑓𝑓 = 𝑁𝑁 − 1) the correlation matrix 𝐑𝐑 � 𝑚𝑚
depends on both M and N resulting in complicated time updatings.
In numerical analysis, orthogonal decomposition methods [1, pages 422-431] applied directly to the data matrix X are preferable to the computation and solution of the normal equations whenever numerical stability is important. The "squaring" \hat{\mathbf{R}} = \mathbf{X}^T\mathbf{X} of the data to form the time-average correlation matrix results in a loss of information. Algorithms that compute the Cholesky factor used in QR factorization directly from X are known as square-root methods.
Now:
‖𝐞𝐞‖ = ‖𝐲𝐲 − 𝐗𝐗𝐗𝐗‖ = ‖𝐲𝐲 − 𝐔𝐔Σ𝐕𝐕𝐻𝐻 𝐜𝐜‖ = ‖𝐔𝐔 𝐻𝐻 𝐲𝐲 − Σ𝐕𝐕𝐻𝐻 𝐜𝐜‖ = ‖𝐲𝐲′ − Σ𝐜𝐜′‖
where 𝐲𝐲′ = 𝐔𝐔 𝐻𝐻 𝐲𝐲, 𝐜𝐜′ = 𝐕𝐕𝐻𝐻 𝐜𝐜 and we have used that fact that ‖𝐔𝐔 𝐻𝐻 𝐞𝐞‖ = ‖𝐞𝐞‖ since U is unitary.
Thus:

\|\mathbf{e}\|^2 = \sum_{i=1}^{r} |y_i' - \sigma_i c_i'|^2 + \sum_{i=r+1}^{N} |y_i'|^2

where r is the rank of X and σ_i are its singular values.
The solution of the LS problem using the SVD method from the above analysis is outlined
below [1, pages 431-438].
8.6 References
1. D.G. Manolakis, V.K. Ingle, S.M. Kogon, “Statistical and Adaptive Signal Processing”,
McGraw-Hill, 2000 (Chapter 8).
2. M.H. Hayes, “Statistical Digital Signal Processing and Modeling”, Wiley, 1996.