© Abdelhak Zoubir 2006
Last revised: October 2006.
Contents

Preface xi

3 Filter Design 19
3.1 Introduction 19
3.2 Digital Filters 20
3.3 Linear Phase Filters 21
3.3.1 Generalised Linear Phase for FIR Filters 22
3.4 Filter Specification 27
3.5 Properties of IIR Filters 28
3.5.1 Stability 30
3.5.2 Linear phase 31
3.11 Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop. 30
9.1 115
9.2 Transfer function of an ideal band-pass filter 116
10.1 The window function (1/N)|∆N(e^{jω})|² for N = 20, N = 10 and N = 5. 122
10.2 The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate. 123
10.3 The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency ω/2π while the ordinate shows the estimate. 125
10.4 Function Am(e^{jω}) for N = 15 and m = 0, 1, 3, 5. 129
10.5 The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to the bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate. 130
10.6 Function ASW(e^{jω}) for N = 15 and m = 4 with weights Wk in the shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window. 132
12.1 Relationship between system state and measurements assumed for the Kalman filter. 146
List of Tables
3.1 The four classes of FIR filters and their associated properties 24
4.1 Suitability of a given class of filter for the four FIR types 33
4.2 Properties of different windows 37
Preface
This draft is provided as an aid for students studying the subject Digital Signal Processing. It
covers the material presented during lectures. Useful references on this subject include:
1. Oppenheim, A.V. & Schafer, R.W., Discrete-Time Signal Processing, Prentice-Hall, 1989.
2. Proakis, J.G. & Manolakis, D.G., Digital Signal Processing: Principles, Algorithms, and Applications, Maxwell MacMillan Editions, 1992.
4. Diniz, P., da Silva, E. & Netto, S., Digital Signal Processing: System Analysis and Design, Cambridge University Press, 2002.
5. Porat, B., A Course on Digital Signal Processing, J. Wiley & Sons Inc., 1997.
9. Brillinger, D.R., Time Series: Data Analysis and Theory, Holden Day Inc., 1981.
10. Priestley, M.B., Spectral Analysis and Time Series, Academic Press, 2001.
Chapter 1

Discrete-Time Signals and Systems
1.1 Signals
Most signals can in practice be classified into three broad groups:
1. Analog or continuous-time signals are continuous in both time and amplitude. Examples:
speech, image, seismic and radar signals. An example of a continuous time signal is shown
in Figure 1.1.
There are some sequences which play an important role in digital signal processing. We
discuss these next.
For example, the sequence p(n) of Figure 1.4, with samples a₋₁, a₁ and a₃ at n = −1, 1 and 3, can be written as

p(n) = a₋₁ δ(n + 1) + a₁ δ(n − 1) + a₃ δ(n − 3).

Figure 1.4: The sequence p(n).

In general, any sequence can be expressed as a sum of scaled, delayed unit samples,

x(n) = ∑_{k=−∞}^{∞} x(k) δ(n − k),

as shown in Figure 1.5.
[Figure 1.5: the sequence x(n) and its components x(k)δ(n − k).]
or alternatively

u(n) = ∑_{k=0}^{∞} δ(n − k).

Conversely, the impulse sequence can be expressed as the first backward difference of the unit step sequence, i.e.,

δ(n) = u(n) − u(n − 1) = ∑_{k=−∞}^{n} δ(k) − ∑_{k=−∞}^{n−1} δ(k).
x(n) = A · αⁿ, (1.1)

where A and α may be complex numbers. For real A > 0 and real 0 < α < 1, x(n) is a decreasing
sequence similar in appearance to Figure 1.6.
For −1 < α < 0, the sequence values alternate in sign but again decrease in magnitude with
increasing n, as shown in Figure 1.7. If |α| > 1, then x(n) grows in magnitude as n increases.
A sinusoidal sequence has the general form

x(n) = A · cos(ω₀n + φ),

with A and φ real. The sinusoidal sequence is shown in Figure 1.8 for φ = 3π/2.
Now let A and α be complex, so that A = |A|e^{jφ} and α = |α|e^{jω₀}. Then

x(n) = A · αⁿ = |A||α|ⁿ e^{j(ω₀n+φ)} = |A||α|ⁿ cos(ω₀n + φ) + j|A||α|ⁿ sin(ω₀n + φ).

A plot of the real part of x(n) = A · αⁿ is given in Figure 1.9 for |α| > 1. In this case we set
ω₀ = π and φ = 0, i.e., Re{x(n)} = |A||α|ⁿ cos(πn).
Note. In the continuous-time case, the quantity ω0 is called the frequency and has the di-
mension radians per second. In the discrete-time case n is a dimensionless integer. Thus the
dimension of ω0 must be radians. To maintain a closer analogy with the continuous-time case we
can specify the units of ω0 to be radians per sample and the units of n to be samples. Following
this rationale we can define the period of a complex exponential sequence as 2π/ω0 which has
the dimension samples.
1.2 Systems
A system T that maps an input x(n) to an output y(n) is represented by:
y(n) = T[x(n)]
Example 1.2.1 The ideal delay system is defined by y(n) = x(n − n_d), n_d ∈ Z⁺. The system
shifts the input to the right by n_d samples, corresponding to a time delay.
This definition of a system is very broad. Without some restrictions, knowing the output of a
system for a certain set of inputs does not allow us to characterise the system completely. Two
types of restrictions greatly simplify the characterisation and analysis of systems: linearity and
shift invariance (LSI). Many systems can be approximated in practice by a linear and shift-invariant
system. A system T is said to be linear if

T[a x₁(n) + b x₂(n)] = a T[x₁(n)] + b T[x₂(n)] = a y₁(n) + b y₂(n)

holds, where T[x₁(n)] = y₁(n), T[x₂(n)] = y₂(n), and a and b are any scalar constants. The
property

T[a x(n)] = a T[x(n)]

is called homogeneity, and

T[x₁(n) + x₂(n)] = T[x₁(n)] + T[x₂(n)]

is called additivity. Additivity and homogeneity are collectively known as the Principle of
Superposition. A system is said to be shift invariant (SI) if

T[x(n − n₀)] = y(n − n₀)

for every shift n₀.
where g(n) is an arbitrary sequence. This system is shown in Figure 1.11. Linearity of the
system is easily verified since
[Figure 1.11: the modulator: the input x(n) is multiplied by g(n) to produce y(n).]
T[x(n − n₀)] = x²(n − n₀),

and

y(n − n₀) = x²(n − n₀).
and the linearity property, the output y(n) of a linear system T for an input x(n) is given by

y(n) = T[x(n)] = T[ ∑_{k=−∞}^{∞} x(k) δ(n − k) ]
     = T[· · · + x(−1)δ(n + 1) + x(0)δ(n) + x(1)δ(n − 1) + · · · ]
     = ∑_{k=−∞}^{∞} x(k) T[δ(n − k)].
This result indicates that a linear system can be completely characterised by the response of
the system to a unit sample sequence δ(n) and its delays δ(n − k). Moreover, if the system is
shift-invariant, we have:
h(n − n0 ) = T[δ(n − n0 )]
where h(n) = T[δ(n)] is the unit sample response of a system T to an input δ(n).
1.3 Convolution
The input-output relation of a linear and shift-invariant (LSI) system T is given by

y(n) = T[x(n)] = ∑_{k=−∞}^{∞} x(k) h(n − k).

This equation shows that knowledge of h(n) completely characterises the system, i.e., knowing
the unit sample response, h(n), of an LSI system T allows us to completely determine the output
y(n) for a given input x(n). This equation is referred to as convolution and is denoted by the
convolution operator ⋆. For an LSI system, we have

y(n) = x(n) ⋆ h(n) = ∑_{k=−∞}^{∞} x(k) h(n − k).
Remark 1: The unit sample response h(n) loses its significance for a nonlinear or shift-variant
system.
Remark 2: The choice of δ(n) as an input for characterising an LSI system is the simplest, both
conceptually and in practice.
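The convolution sum can be sketched directly in code. The following is a minimal illustration, not from the notes; the function name and the convention that both finite sequences start at n = 0 are my own choices.

```python
def convolve(x, h):
    # Convolution sum y(n) = sum_k x(k) h(n - k) for finite sequences
    # x and h, both taken to start at n = 0.
    y = [0.0] * (len(x) + len(h) - 1)
    for n in range(len(y)):
        for k in range(len(x)):
            if 0 <= n - k < len(h):
                y[n] += x[k] * h[n - k]
    return y

# A 3-point moving-average unit sample response applied to a short input:
print(convolve([1.0, 2.0, 3.0], [1/3, 1/3, 1/3]))
```

Convolving with the unit sample sequence δ(n) (here `[1.0]`) returns the input unchanged, which mirrors Remark 2 above.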
Example 1.3.2 To compute the relationship between the input and the output of a transversal
filter, h(−k) is first plotted against k; h(−k) is simply h(k) reflected or “flipped” around k = 0,
as shown in Figure 1.13. Replacing k by k − n, where n is a fixed integer, leads to a shift in the
origin of the sequence h(−k) to k = n. y(n) is then obtained as the sum ∑_k x(k)h(n − k). See
Figure 1.13.
[Figure 1.13: graphical convolution: x(k), the reflected response h(−k), and the resulting output y(n).]
1.4.1 Stability
A system is considered stable in the bounded-input bounded-output (BIBO) sense if and only
if a bounded input always leads to a bounded output. A necessary and sufficient condition for
an LSI system to be stable is that the unit sample response h(n) is absolutely summable, i.e.,

∑_{n=−∞}^{∞} |h(n)| < ∞.
Such a sequence h(n) is sometimes referred to as a stable sequence. The proof is quite simple.
First let x(n) be bounded by M, i.e., |x(n)| ≤ M < ∞ ∀ n; then, using convolution,

|y(n)| = | ∑_{k=−∞}^{∞} h(k) x(n − k) |
       ≤ ∑_{k=−∞}^{∞} |h(k)| |x(n − k)|
       ≤ M ∑_{k=−∞}^{∞} |h(k)|.

So given a bounded input |x(n)| ≤ M, the output y(n) is bounded if and only if h(n) is absolutely
summable.
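As a numerical illustration of absolute summability (the exponential unit sample response h(n) = αⁿu(n) is my example, not the text's): for |α| < 1 the partial sums of ∑|h(n)| converge to 1/(1 − |α|), while for |α| ≥ 1 they grow without bound, so the system is not BIBO stable.

```python
def abs_sum(alpha, terms):
    # Partial sum of sum_{n=0}^{terms-1} |alpha|^n for h(n) = alpha^n u(n).
    return sum(abs(alpha) ** n for n in range(terms))

print(abs_sum(0.5, 100))   # approaches 1/(1 - 0.5) = 2: absolutely summable
print(abs_sum(1.5, 100))   # astronomically large: not absolutely summable
```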
1.4.2 Causality
Causality is a fundamental law of physics stating that information cannot travel from the future
to the past. In the context of systems we say that a system is causal if and only if the current
output y(n) does not depend on any future values of the input such as x(n + 1), x(n + 2), . . ..
Causality holds for any practical system regardless of linearity or shift invariance.
A necessary and sufficient condition for an LSI system to be causal is that h(n) = 0 for
n < 0. The proof follows from convolution,

y(n) = ∑_{k=−∞}^{∞} x(n − k) h(k).
Chapter 2

Digital Processing of Continuous-Time Signals
Discrete-time signals arise in many ways, but they most commonly occur in representations
of continuous-time signals. This is partly due to the fact that the processing of continuous-
time signals is often carried out on discrete-time sequences obtained by sampling continuous-
time signals. It is remarkable that under reasonable constraints a continuous-time signal can
be adequately represented by a sequence. In this chapter we discuss the process of periodic
sampling.
Herein, T is the sampling period; its reciprocal, fs = 1/T, is the sampling frequency in
samples per second; and Ωs = 2πfs is the sampling frequency in rad/sec. This equation is the
input-output relationship of an ideal A/D converter. It is generally not invertible since many
continuous-time signals can produce the same output sequence of samples. This ambiguity can
be removed by restricting the class of input signals to the sampler, however.
It is convenient to represent the sampling process in two stages, as shown in Figure 2.1. This
consists of an impulse train modulator which is followed by conversion of the impulse train into
a discrete sequence. The system G in Figure 2.1 represents a system that converts an impulse
train into a discrete-time sequence. The modulation signal p(t) is a periodic impulse train,

p(t) = ∑_{n=−∞}^{∞} δ(t − nT).
The Fourier transform of xs (t), Xs (jΩ), is obtained by convolving Xc (jΩ) and P (jΩ). The
Fourier transform of a periodic impulse train is a periodic impulse train, i.e.,
P(jΩ) = (2π/T) ∑_{k=−∞}^{∞} δ(Ω − 2πk/T) = (2π/T) ∑_{k=−∞}^{∞} δ(Ω − kΩs).
Since

Xs(jΩ) = (1/2π) Xc(jΩ) ∗ P(jΩ),

it follows that

Xs(jΩ) = (1/T) ∑_{k=−∞}^{∞} Xc(j(Ω − 2πk/T)) = (1/T) ∑_{k=−∞}^{∞} Xc(jΩ − jkΩs).
The same result can be obtained by noting that p(t) is periodic so that it can be represented
as a Fourier series with coefficients 1/T, i.e.,

p(t) = (1/T) ∑_{n=−∞}^{∞} e^{jnΩst}.

Thus

xs(t) = xc(t) p(t) = (1/T) ∑_{n=−∞}^{∞} xc(t) e^{jnΩst}.
Figure 2.2 depicts the frequency-domain representation of impulse train sampling. Figure
2.2(a) represents the Fourier transform of a band-limited signal whose highest non-zero
frequency component in Xc(jΩ) is at ΩB. Figure 2.2(b) shows the periodic impulse train P(jΩ),
and Figure 2.2(c) shows Xs(jΩ), which is the result of convolving Xc(jΩ) with P(jΩ). From Figure
2.2(c) it is evident that when Ωs − ΩB > ΩB, i.e., Ωs > 2ΩB,
the replicas of Xc(jΩ) do not overlap, and therefore xc(t) can be recovered from xs(t) with an
ideal low-pass filter with cut-off frequency of at least ΩB but not exceeding Ωs − ΩB. However, in
Figure 2.2(d) the replicas of Xc(jΩ) overlap, and as a result the original signal cannot be recovered
by ideal low-pass filtering. The reconstructed signal is related to the original continuous-time
signal through a distortion referred to as aliasing.
If xc(t) is a band-limited signal with no components above ΩB rad/sec, then the sampling
frequency has to be equal to or greater than 2ΩB rad/sec. This is known as the Nyquist criterion,
and 2ΩB is known as the Nyquist rate.
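A quick numerical check of aliasing (the frequencies and sampling rate below are illustrative choices): two sinusoids whose frequencies differ by exactly the sampling frequency are indistinguishable after sampling.

```python
import math

T = 0.1                  # sampling period in seconds, so fs = 10 Hz
f0 = 3.0                 # a 3 Hz tone, below half the sampling frequency
f1 = f0 + 1.0 / T        # a 13 Hz tone, differing from f0 by exactly fs

x0 = [math.cos(2 * math.pi * f0 * n * T) for n in range(8)]
x1 = [math.cos(2 * math.pi * f1 * n * T) for n in range(8)]

# Both continuous-time signals produce the same sample sequence:
print(all(abs(a - b) < 1e-9 for a, b in zip(x0, x1)))  # True
```

Sampling at 2 · 13 = 26 Hz or above would keep the two tones distinguishable.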
where T is the sampling interval. The reconstruction filter commonly has a gain of T and a
cutoff frequency of Ωc = Ωs/2 = π/T. This choice is appropriate for any relationship between
Ωs and ΩB that avoids aliasing, i.e., so long as Ωs > 2ΩB. The impulse response of such a filter
is given by

hr(t) = sin(πt/T) / (πt/T).

Consequently, the relation between xr(t) and x(n) is given by

xr(t) = ∑_{n=−∞}^{∞} x(n) sin[π(t − nT)/T] / [π(t − nT)/T].
Example 2.3.1 Assume that we wanted to restore the signal sc (t) from yc (t) = sc (t) + nc (t),
where nc (t) is a background noise. The spectral characteristics of these signals are shown in
Figure 2.4.
Figure 2.2: Effect in the frequency domain of sampling in the time domain: (a) Spectrum of
the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled
signal with Ωs > 2ΩB , (d) Spectrum of the sampled signal with Ωs < 2ΩB .
We can perform the processing for this problem using either continuous-time or digital
techniques. Using a continuous-time approach, we would filter yc(t) with a continuous-time
low-pass filter to recover sc(t). Alternatively, we could adopt a digital approach, as shown in
Figure 2.3. We would first low-pass filter yc(t) to prevent aliasing, and then convert the
continuous-time signal into a discrete-time signal using an A/D converter. Any processing is
then performed digitally on the discrete-time signal. The processed signal is then converted
back to continuous form using a D/A converter.
[Figures 2.3 and 2.4: the spectra Sc(jΩ), Nc(jΩ) and Yc(jΩ), and the low-pass filter H(jΩ) with cutoff ΩB applied to yc(t) to yield the processed signal.]
The advantages of processing signals digitally as compared with an analog approach are:

1. Flexibility: if we want to change the continuous-time filter because of a change in signal and
noise characteristics, we would have to change the hardware components. Using the digital
approach, we only need to modify the software.
Chapter 3

Filter Design
3.1 Introduction
Filters are an important class of linear time-invariant systems. Their objective is to modify the
input signal. In practice filters are frequency selective, meaning they pass certain frequency
components of the input signal and totally reject others. The objective of this chapter and the
following ones is to introduce the concepts of frequency selective filters and the most popular
filter design techniques.
The design of filters (continuous-time or digital) involves the following three steps.
1. Filter specification
This step consists of setting the required characteristics of the filter. For example, to
restore a signal which has been degraded by background noise, the filter characteristics will
depend on the spectral characteristics of both the signal and background noise. Typically
(but not always) specifications are given in the frequency domain.
2. Filter design
Determine the unit sample response h(n) of the filter, or equivalently its system function
H(z) or frequency response H(e^{jω}).
3. Filter implementation
Implement a discrete time system with the given h(n) or H(z) or one which approaches
it.
These three steps are closely related. For instance, it does not make much sense to specify a
filter that cannot be designed or implemented. So, we shall consider a particular class of digital
filters which possess the following practical properties:
1. h(n) is real.
In a practical setting, the desired filter is often implemented digitally on a chip and is used
to filter a signal derived from a continuous-time signal by means of periodic sampling followed
by analog-to-digital conversion. For this reason, we refer to digital filters as discrete-time filters
even though the underlying design techniques relate to the discrete-time nature of the signals
and systems.
Figure 3.1: A basic system for discrete-time filtering of a continuous-time signal.
A basic system for discrete-time filtering of continuous-time signals is given in Figure 3.1.
Assuming the input, xc (t), is band-limited and the sampling frequency is high enough to avoid
aliasing, the overall system behaves as a linear time-invariant system with frequency response
Heff(jΩ) = H(e^{jΩT}) for |Ω| < π/T, and Heff(jΩ) = 0 for |Ω| > π/T,
where x(n) and y(n), n = 0, ±1, . . . are the input and output of the system and max(N − 1, M )
is the order.
The unit sample response of the system may be of either finite duration or infinite duration.
It is useful to distinguish between these two classes as the properties of the digital processing
techniques for each differ. If the unit sample response is of finite duration, we have a finite
impulse response (FIR) filter, and if the unit sample response is of infinite duration, we have an
infinite impulse response (IIR) filter.
If M = 0, so that

y(n) = (1/a₀) ∑_{r=0}^{N−1} b_r x(n − r),   (3.2)
then the system corresponds to an FIR filter. In fact, comparison with the convolution sum

y(n) = ∑_{k=−∞}^{∞} h(k) x(n − k)

shows that Equation (3.2) is identical to the convolution sum, hence it follows that

h(n) = b_n/a₀ for n = 0, . . . , N − 1, and h(n) = 0 otherwise.
An FIR filter can always be described by a difference equation of the form (3.1) with M = 0.
M must be greater than zero for an IIR filter.
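The difference-equation view can be made concrete with a short recursive implementation. This sketch assumes the usual form of (3.1), a₀y(n) = ∑ᵣ bᵣx(n − r) − ∑_{k≥1} aₖy(n − k), with initial rest conditions; the function and variable names are mine.

```python
def lccde_filter(b, a, x):
    # Recursively compute y(n) from
    #   a[0] y(n) = sum_r b[r] x(n-r) - sum_{k>=1} a[k] y(n-k)
    # with initial rest conditions. len(a) - 1 = M = 0 gives an FIR filter.
    y = []
    for n in range(len(x)):
        acc = sum(b[r] * x[n - r] for r in range(len(b)) if n - r >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc / a[0])
    return y

# FIR case (M = 0): a two-point average, so h(n) = b(n)/a0.
print(lccde_filter([0.5, 0.5], [1.0], [1.0, 1.0, 1.0, 1.0]))
# IIR case (M = 1): y(n) = x(n) + 0.5 y(n-1); the response to delta(n)
# is h(n) = 0.5^n u(n), which never dies out exactly.
print(lccde_filter([1.0], [1.0, -0.5], [1.0, 0.0, 0.0, 0.0]))
```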
Simply put, a linear phase filter preserves the shape of the signal. Should the filter have the
property that |H(e^{jω})| = 1 but not have linear phase, then the shape of the signal will be
distorted. We generalise the notion of linear phase to

H(e^{jω}) = H_M(e^{jω}) e^{−jωα + jβ},

where H_M(e^{jω}) is a real function of ω, and α and β are constants. The case where β = 0 is
referred to as linear phase as opposed to generalised linear phase where we have β ≠ 0,
leading to

tan(β − ωα) = sin(β − ωα)/cos(β − ωα) = − [ ∑_{n=−∞}^{∞} h(n) sin(ωn) ] / [ ∑_{n=−∞}^{∞} h(n) cos(ωn) ]
and

∑_{n=−∞}^{∞} h(n) sin(ω(n − α) + β) = 0.   (3.3)
To have φ(ω) = −β + ωα, (3.3) must hold for all ω; this is a necessary, but not sufficient,
condition on h(n), α and β. For example, one can show that

β = 0 or π,  2α = N − 1,  h(2α − n) = h(n)   (3.4)

satisfy (3.3). Alternatively,

β = π/2 or 3π/2,  2α = N − 1,  h(2α − n) = −h(n)   (3.5)

also satisfy (3.3).
To have a causal system we require

∑_{n=0}^{∞} h(n) sin(ω(n − α) + β) = 0
to hold for all ω. Causality and the previous conditions (3.4) and (3.5) imply that h(n) = 0, n < 0
and n > N −1, i.e., causal FIR filters have generalised linear phase if they have impulse response
length N and satisfy either h(2α − n) = h(n) or h(2α − n) = −h(n). Specifically, if

h(n) = h(N − 1 − n) for 0 ≤ n ≤ N − 1, and h(n) = 0 otherwise,
then
H(ejω ) = He (ejω )e−jω(N −1)/2 ,
where He(e^{jω}) is a real, even, periodic function of ω. Similarly, if

h(n) = −h(N − 1 − n) for 0 ≤ n ≤ N − 1, and h(n) = 0 otherwise,
then
H(e^{jω}) = jH_O(e^{jω}) e^{−jω(N−1)/2} = H_O(e^{jω}) e^{−jω(N−1)/2 + jπ/2},

where H_O(e^{jω}) is a real, odd, periodic function of ω. The above two equations for h(n) are
sufficient, but not necessary, conditions guaranteeing a causal system with generalised linear
phase.
Type I FIR linear phase filter. A type I filter has a symmetric impulse response with N an
odd integer.
Type II FIR linear phase filter. A type II filter has a symmetric impulse response with N
an even integer. The corresponding frequency response is

H(e^{jω}) = e^{−jω(N−1)/2} ∑_{k=1}^{N/2} b(k) cos[ω(k − 1/2)],

where

b(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.
Type III FIR linear phase filter. A type III filter has an antisymmetric impulse response
with N an odd integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} ∑_{k=1}^{(N−1)/2} c(k) sin(ωk),

where

c(k) = 2h((N − 1)/2 − k), k = 1, 2, . . . , (N − 1)/2.
Type IV FIR linear phase filter. A type IV filter has an antisymmetric impulse response
with N an even integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} ∑_{k=1}^{N/2} d(k) sin[ω(k − 1/2)],

where

d(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.
Table 3.1: The four classes of FIR filters and their associated properties
Remarks

1. A filter will have no phase distortion if H(e^{jω}) is real and even, since h(n) will then be real
and even.

2. Non-causal FIR filters can be made causal by an appropriate time shift so that the first
non-zero value occurs at n = 0. The point of symmetry (N − 1)/2 then falls either on a sample
point or halfway between two sample points, depending on whether N is odd or even.
Example 3.3.1 Example of a type I FIR filter. Given a filter with unit sample response

h(n) = 1 for 0 ≤ n ≤ 4, and h(n) = 0 otherwise,

we can evaluate the frequency response to be

H(e^{jω}) = ∑_{n=0}^{4} e^{−jωn} = (1 − e^{−j5ω})/(1 − e^{−jω}) = e^{−j2ω} sin(5ω/2)/sin(ω/2).
The magnitude and the phase of the frequency response of the filter are depicted in Figure 3.4
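The closed form can be checked numerically against the direct sum; this small verification script is mine, not part of the notes.

```python
import cmath, math

def H_direct(w):
    # Direct DTFT of h(n) = 1 for 0 <= n <= 4.
    return sum(cmath.exp(-1j * w * n) for n in range(5))

def H_closed(w):
    # Closed form from Example 3.3.1: e^{-j2w} sin(5w/2)/sin(w/2).
    return cmath.exp(-2j * w) * math.sin(5 * w / 2) / math.sin(w / 2)

w = 0.7
print(abs(H_direct(w) - H_closed(w)))  # agrees to machine precision
```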
Figure 3.3: Examples of the four types of linear phase FIR filters
Example 3.3.3 Example of a type III FIR filter. Consider a filter with unit impulse response
The magnitude and the phase of H(ejω ) are depicted in Figure 3.6.
Figure 3.4: Frequency response of the type I filter of Example 3.3.1. Magnitude (left). Phase (right).
Figure 3.5: Frequency response of the type II filter of Example 3.3.2. Magnitude (left). Phase (right).
Figure 3.6: Frequency response of the type III filter of Example 3.3.3. Magnitude (left). Phase (right).
Example 3.3.4 Example of a type IV FIR filter. Consider a filter with unit impulse response

h(n) = δ(n) − δ(n − 1).

We can then evaluate the frequency response as

H(e^{jω}) = 1 − e^{−jω} = j[2 sin(ω/2)] e^{−jω/2} = 2 e^{−j(ω/2 − π/2)} sin(ω/2).

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.7.
Figure 3.7: Frequency response of the type IV filter of Example 3.3.4. Magnitude (left). Phase (right).
Figure 3.8: Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.
A similar arrangement holds for high-pass, band-stop and band-pass filters (see Figure 3.11).
1. have a unit sample response h(n) that is real, causal, and stable.
[Figure: magnitude response |H(e^{jω})| with the transition band between the passband edge ωp and the stopband edge ωs.]
A causal h(n) with a rational z-transform can be realised by a linear constant coefficient
difference equation (LCCDE) with initial rest conditions (IRC) where the output is com-
puted recursively. Thus IIR filters can be implemented efficiently.
Figure 3.11: Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop.
3.5.1 Stability
An FIR filter is stable if h(n) is finite for all n; therefore stability is not a problem in practical
design or implementation. With an IIR filter, all the poles of H(z) must lie inside the unit circle
to ensure filter stability.
Chapter 4

Finite Impulse Response Filter Design

4.1 Introduction
The filter design problem is that of determining h(n), H(ejω ) or H(z) to meet the design speci-
fication. The first step is to choose the type of filter from the previously discussed four types of
FIR filters.
Some types of filters may not be suitable for a given design specification. For example,
from Table 3.1 a Type II filter is not suitable for a high-pass filter design, as it will always have
H(e^{jω}) = 0 at ω = π since cos[π(k − 1/2)] = 0. Table 4.1 shows which filter types are not
suitable for a given class of filter.
Table 4.1: Suitability of a given class of filter for the four FIR types
The simplest way to obtain a causal FIR filter from hd(n) is to define a new system with unit
sample response

h(n) = hd(n) for 0 ≤ n ≤ N − 1, and h(n) = 0 elsewhere,

which can be re-written as

h(n) = hd(n) · w(n),

and therefore

H(e^{jω}) = (1/2π) ∫_{−π}^{π} Hd(e^{jθ}) W(e^{j(ω−θ)}) dθ.
where α = (N − 1)/2, ωc = (ωp + ωs)/2, and N is the filter length (odd). The inverse Fourier
transform of Hd(e^{jω}) is

hd(n) = sin[ωc(n − α)] / [π(n − α)].

If we apply a rectangular window to hd(n), the resulting filter impulse response h(n) is

h(n) = sin[ωc(n − α)] / [π(n − α)] for 0 ≤ n ≤ N − 1, and h(n) = 0 otherwise.
The sequence h(n) is shown in Figure 4.1 for N = 7 and ωc = π/2. Note that this filter may or
may not meet the design specification.
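The design above can be sketched directly (the function name is mine); for N = 7 and ωc = π/2 it reproduces the sample values 0.5, 0.318 and −0.106 visible in Figure 4.1.

```python
import math

def windowed_lowpass(N, wc):
    # h(n) = sin[wc (n - alpha)] / [pi (n - alpha)] for 0 <= n <= N-1,
    # alpha = (N - 1)/2, i.e. the ideal response under a rectangular window.
    alpha = (N - 1) / 2
    h = []
    for n in range(N):
        if n == alpha:
            h.append(wc / math.pi)  # limiting value at n = alpha
        else:
            h.append(math.sin(wc * (n - alpha)) / (math.pi * (n - alpha)))
    return h

h = windowed_lowpass(7, math.pi / 2)
print([round(v, 3) for v in h])  # symmetric about n = 3, as for a type I filter
```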
H(e^{jω}) = Hd(e^{jω}) ⋆ W(e^{jω}) = (1/2π) ∫_{−π}^{π} Hd(e^{jθ}) W(e^{j(ω−θ)}) dθ   (4.1)
If we do not truncate at all so that w(n) = 1 for all n, W (ejω ) is a periodic impulse train with
period 2π, and therefore H(ejω ) = Hd (ejω ). This interpretation suggests that if w(n) is chosen
so that W (ejω ) is concentrated in a narrow band of frequencies around ω = 0, then H(ejω )
approximates Hd (ejω ) very closely except where Hd (ejω ) changes very abruptly. Thus, the
choice of window is governed by the desire to have w(n) as short as possible in duration so as to
minimise computation in the implementation of the filter while having W (ejω ) approximate an
impulse; that is, we want W (ejω ) to be highly concentrated in frequency so that the convolution
(4.1) faithfully reproduces the desired frequency response. These are conflicting requirements.
Take, for example, the case of the rectangular window, where

W(e^{jω}) = ∑_{n=0}^{N−1} e^{−jωn} = (1 − e^{−jωN})/(1 − e^{−jω}) = e^{−jω(N−1)/2} sin(ωN/2)/sin(ω/2).
The magnitude of the function sin(ωN/2)/ sin(ω/2) is shown in Figure 4.2 for the case N = 8.
Note that W (ejω ) for the rectangular window has a generalised linear phase. As N increases,
the width of the “main-lobe” decreases. The main-lobe is usually defined as the region between
the first zero crossing on either side of the origin. For the rectangular window, the main-lobe
width is ∆ω = 4π/N . However, for the rectangular window, the side lobes are large and, in
fact, as N increases, the peak amplitudes of the main-lobe and the side-lobes grow in a manner
such that the area under each lobe is a constant even though the width of each lobe decreases
with N . Consequently, as W (ej(ω−θ) ) “slides by” a discontinuity of Hd (ejω ) with increasing ω,
the integral of W (ej(ω−θ) )Hd (ejω ) will oscillate as each side-lobe of W (ej(ω−θ) ) moves past the
discontinuity.
[Figure 4.2: |sin(ωN/2)/sin(ω/2)| for N = 8, with the main-lobe and the peak side-lobe indicated.]
We can modify this effect by tapering the window smoothly to zero at each end so that
the height of the side-lobes are diminished; however this is achieved at the expense of a wider
main-lobe and thus a wider transition at the discontinuity.
Five windows are commonly used in practice: the rectangular, Bartlett, Hanning, Hamming
and Blackman windows. The formulas for these windows are given below.
Rectangular

w(n) = 1 for 0 ≤ n ≤ N − 1, and w(n) = 0 elsewhere.

Bartlett

w(n) = 2n/(N − 1) for 0 ≤ n ≤ (N − 1)/2, and w(n) = 2 − 2n/(N − 1) for (N − 1)/2 ≤ n ≤ N − 1.

Hanning

w(n) = (1/2)[1 − cos(2πn/(N − 1))], 0 ≤ n ≤ N − 1.

Hamming

w(n) = 0.54 − 0.46 cos(2πn/(N − 1)), 0 ≤ n ≤ N − 1.

Blackman

w(n) = 0.42 − 0.5 cos(2πn/(N − 1)) + 0.08 cos(4πn/(N − 1)), 0 ≤ n ≤ N − 1.
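The formulas above translate directly into code; the following is a minimal sketch (the dispatch-by-name interface is my own choice):

```python
import math

def window(name, N):
    # The five window functions, following the formulas above.
    w = []
    for n in range(N):
        r = 2 * math.pi * n / (N - 1)
        if name == "rectangular":
            w.append(1.0)
        elif name == "bartlett":
            w.append(2 * n / (N - 1) if n <= (N - 1) / 2 else 2 - 2 * n / (N - 1))
        elif name == "hanning":
            w.append(0.5 * (1 - math.cos(r)))
        elif name == "hamming":
            w.append(0.54 - 0.46 * math.cos(r))
        elif name == "blackman":
            w.append(0.42 - 0.5 * math.cos(r) + 0.08 * math.cos(2 * r))
    return w

# Every tapered window peaks at the midpoint n = (N - 1)/2 for odd N:
for name in ("bartlett", "hanning", "hamming", "blackman"):
    print(name, round(max(window(name, 51)), 6))
```

Note that the Hamming window does not taper all the way to zero at the ends (w(0) = 0.54 − 0.46 = 0.08), which is visible in Figure 4.3.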
Figure 4.3 compares these window functions for N = 200. A good choice of the window
corresponds to choosing a window with small main-lobe width and small side-lobe amplitude.
Clearly, the shape of a window affects both main-lobe and side-lobe behavior.
Figures 4.4, 4.5, 4.6, 4.7 and 4.8 show these windows in the frequency domain for N = 51,
using a 256-point FFT. For comparison, Figures 4.9 and 4.10 show |W (ejω )| of the Hamming
Figure 4.3: Commonly used windows for FIR filter design (N = 200).
window for N = 101 and 201, respectively. This shows that main-lobe behavior is affected
primarily by window length.
The effect of the shape and size of the window can be summarised as follows:
Since side-lobe behavior, and therefore the passband and stop-band tolerances, are affected only
by the shape of the window, the passband and stop-band requirements dictate which window is
chosen. The window size is then determined from the transition width requirements. Table 4.2
(columns 1 and 2) contains information useful in choosing the shape and length of the window.
Figure 4.9: Fourier transform of a Hamming window with window length of N = 101.
40 CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
Figure 4.10: Fourier transform of a Hamming window with window length of N = 201.
We have to design a low-pass filter with a required stop-band attenuation of −50dB. Which
filter is most appropriate?
The third column of Table 4.2 contains information on the best stop-band (or passband)
attenuation we can expect from a given window shape. Therefore, we can see that the rectangular,
Bartlett and Hanning windows will not meet the specifications. The Hamming window may just
meet the requirements, but the Blackman window will clearly be the most appropriate.
From the second column of Table 4.2 the length of the window required can be derived. The
distance between the peak ripples on either side of the discontinuity is approximately equal to the
main-lobe width ∆ωm. The transition width ∆ω = ωs − ωp is therefore somewhat less than
the main-lobe width. For example, for the Blackman window, N can be determined from the
relationship 12π/(N − 1) ≃ ∆ω, so that N ≈ 12π/∆ω + 1 points.
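As a sketch, the rule of thumb above can be wrapped in a small helper (the function name `blackman_length` is ours, not from the notes):

```python
import math

def blackman_length(delta_w):
    """Window length for a Blackman design from the transition width,
    using the main-lobe rule 12*pi/(N - 1) ~= delta_w."""
    return math.ceil(12 * math.pi / delta_w) + 1
```

For example, a transition width of ∆ω = 0.2π calls for N = 61 points.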
The Kaiser window is defined as

    w(n) = { I0[β(1 − [(n − α)/α]²)^(1/2)] / I0(β),   0 ≤ n ≤ N − 1,
           { 0,                                        elsewhere,

where α = (N − 1)/2 and I0(·) is the zeroth-order modified Bessel function of the first kind. In
contrast to the previously discussed windows, this one has two parameters: the length N and
a shape parameter β. Thus, by varying N and β we can trade side-lobe amplitude for main-lobe
width. Figure 4.11 shows the trade-off that can be achieved. If the window is tapered more,
the side-lobes of the Fourier transform become smaller, but the main-lobe becomes wider.
Alternatively, we can hold β constant and increase N, causing the main-lobe to decrease in
width without affecting the amplitude of the side-lobes.
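A minimal sketch of the Kaiser window itself, using `np.i0` for the Bessel function; it matches numpy's built-in `np.kaiser`, which uses the same definition with α = (N − 1)/2:

```python
import numpy as np

def kaiser_window(N, beta):
    """Kaiser window of length N with shape parameter beta,
    alpha = (N-1)/2 as in the notes."""
    n = np.arange(N)
    alpha = (N - 1) / 2.0
    return np.i0(beta * np.sqrt(1 - ((n - alpha) / alpha) ** 2)) / np.i0(beta)

w = kaiser_window(21, 6.0)   # N = 21, beta = 6, as in Figure 4.11(a)
```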
4.4. KAISER WINDOW FILTER DESIGN 41
Figure 4.11: (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21.
(b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows
with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).
Kaiser obtained formulae that permit the filter designer to estimate the values of N and
β needed to meet a given filter specification. Consider a frequency selective filter
with the specifications shown in Figure 3.11(a). Kaiser found that, over a usefully wide range
of conditions, α1 is determined by the choice of β. Given α1, the passband cutoff frequency ωp
of the low-pass filter is defined to be the highest frequency such that |H(e^jω)| ≥ 1 − α1. The
stop-band cutoff frequency ωs is defined to be the lowest frequency such that |H(e^jω)| ≤ α2.
The transition region width is therefore

    ∆ω = ωs − ωp

for the low-pass filter approximation. With A being the negative peak approximation error (pae) in dB,

    A = −20 log10 α1 = 60,

Kaiser's formulae give

    β = 5.635,   N = 38.
Then,

    h(n) = { [sin(ωc(n − α)) / (π(n − α))] · I0[β(1 − [(n − α)/α]²)^(1/2)] / I0(β),   0 ≤ n ≤ N − 1,
           { 0,                                                                       elsewhere,

where α = (N − 1)/2 = 37/2 = 18.5 and ωc = (ωs + ωp)/2. Since N is an even integer, the resulting
linear phase system is of type II. Figure 4.12 shows h(n) and H(e^jω).
For a high-pass design, let

    ωs = 0.35π,   ωp = 0.5π,   α1 = 0.021.

Again using the formulae proposed by Kaiser, we get β = 2.6 and N = 25. Figure 4.13 shows
h(n) and H(e^jω). The resulting peak error is 0.0213 > α1 = α2 = 0.021 as specified. Thus we
increase N to 26. In this case the filter would be of type II, which is not appropriate for a
high-pass design. Therefore we use N = 27, which exceeds the required specifications.
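The β and N values quoted above can be reproduced with Kaiser's standard empirical formulae (the helper names are ours; the formulae are the usual published ones, which give numbers very close to, but not always identical with, those in the text):

```python
import math

def kaiser_beta(A):
    """Kaiser's empirical formula for the shape parameter beta as a
    function of the stop-band attenuation A in dB (standard formula,
    not stated explicitly in the notes)."""
    if A > 50:
        return 0.1102 * (A - 8.7)
    if A >= 21:
        return 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    return 0.0

def kaiser_length(A, delta_w):
    """Kaiser's estimate of the required window length N."""
    return math.ceil((A - 8) / (2.285 * delta_w)) + 1

A = -20 * math.log10(0.021)              # alpha1 = alpha2 = 0.021
dw = 0.5 * math.pi - 0.35 * math.pi      # transition width 0.15*pi
```

For the high-pass example (α1 = 0.021, ∆ω = 0.15π) this yields β ≈ 2.6 and N = 25, as stated.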
Figure 4.12: Response functions. (a) Impulse response. (b) Log magnitude.
The design of FIR filters by windowing is straightforward to apply and quite general, even
though it has a number of limitations. However, we often wish to design a filter that is the
"best" that can be achieved for a given value of N. We describe two methods that attempt to
improve upon the windowing method by introducing a new criterion of optimality.
Figure 4.13: Kaiser window estimate. (a) Impulse response. (b) Log magnitude.
4.5. OPTIMAL FILTER DESIGN. 45
h(n) = h(−n), the frequency response is

    H(e^jω) = Σ_{n=−L}^{L} h(n) e^{−jωn}
            = h(0) + Σ_{n=1}^{L} 2h(n) cos(ωn)
            = Σ_{n=0}^{L} a(n) cos(ωn)
            = Σ_{n=0}^{L} a(n) [ Σ_{k=0}^{n} βkn cos^k(ω) ]
            = Σ_{n=0}^{L} ǎ(n) cos^n(ω),
which is real and even. The filter specifications for Hd(e^jω) are shown in Fig. 4.14.
To design the filter, the Parks-McClellan algorithm formulates the problem as one of polynomial
approximation, which can be solved by Chebyshev approximation over the disjoint set of
frequencies F = {0 ≤ ω ≤ ωp, ωs ≤ ω ≤ π}. The approximation error function E(ω) is defined
as

    E(ω) := W(ω) [Hd(e^jω) − H(e^jω)].                                    (4.2)

E(ω) measures the error between the optimum and attained frequency response, weighted
by W(ω), where

    W(ω) = { α1/α2,   0 ≤ ω ≤ ωp,
           { 1,       ωs ≤ ω ≤ π.
[Figure 4.14: tolerance scheme for the optimal low-pass design, with passband limits 1 ± α1, stopband limits ±α2, and band edges ωp and ωs.]
The optimal H(e^jω) minimises the maximum weighted error, which means we have to search for
an H(e^jω) which fulfills

    min_{ǎ(n): 0≤n≤L} max_{ω∈F} |E(ω)| = min_{ǎ(n): 0≤n≤L} max_{ω∈F} W(ω) |Hd(e^jω) − H(e^jω)|.

By the alternation theorem, a necessary and sufficient condition for H(e^jω)
to be the unique, best weighted Chebyshev approximation of Hd(e^jω) in F is that the error
function E(ω) has at least L + 2 alternating frequencies in F. This means that there must exist
at least L + 2 frequencies ωi in F such that ω0 < ω1 < · · · < ωL+1 with E(ωi) = −E(ωi+1) and
|E(ωi)| = max_{ω∈F} |E(ω)|.
The maxima of the alternating frequencies can be found from the derivative of E(ω). If we
calculate the derivative of E(ω) and take into account that Hd(e^jω) is constant in F, we get for
the maximal frequencies

    dE(ω)/dω = (d/dω) { W(ω) [Hd(e^jω) − H(e^jω)] }
             = −W(ω) dH(e^jω)/dω                                          (4.3)
             = W(ω) sin(ω) Σ_{n=1}^{L} n ǎ(n) cos^{n−1}(ω) = 0.
This means that there are at most L + 1 maxima belonging to the polynomial structure and
to the frequencies at ω = 0 and ω = π. Two more maxima will occur at ω = ωp and ω = ωs
On a set of L + 2 extremal frequencies ω0 < ω1 < · · · < ωL+1, the deviation ρ is

    ρ = [ Σ_{k=0}^{L+1} γk Hd(e^{jωk}) ] / [ Σ_{k=0}^{L+1} (−1)^k γk / W(ωk) ]        (4.4)

with

    γk = Π_{n=0, n≠k}^{L+1}  1 / (cos(ωk) − cos(ωn)).
Since E(ω) has the same extremal frequencies as H(e^jω), we can write

    Ĥ(e^{jωi}) = Hd(e^{jωi}) − (−1)^i ρ / W(ωi).

Using Lagrange interpolation it follows that

    H(e^jω) = [ Σ_{k=0}^{L} Ĥ(e^{jωk}) βk / (cos(ω) − cos(ωk)) ] / [ Σ_{k=0}^{L} βk / (cos(ω) − cos(ωk)) ]

with

    βk = Π_{n=0, n≠k}^{L}  1 / (cos(ωk) − cos(ωn)).
With H(e^jω) it is possible to calculate the error E(ω) according to Eq. (4.2). After calculating
E(ω), the L + 2 extremal frequencies belonging to the largest alternating peaks can be selected
and the algorithm can start again by using Eq. (4.4). The algorithm is summarised in Fig. 4.15.
By using the Remez algorithm we are able to find the ρ that defines the upper bound of |E(ω)|,
which is the optimum solution for the Chebyshev approximation. After the algorithm converges,
the filter coefficients h(n) can be calculated by taking L + 1 frequency samples of H(e^jω) at
ωk = 2πk/(L + 1) and computing

    h(n) = [1/(L + 1)] Σ_{k=0}^{L} H(e^{jωk}) e^{j2πkn/(L+1)}.
Remarks
1. Designing an optimal filter requires much more computation than the windowing method.
2. The length of the filter designed by the window method is 20% to 50% longer than the
length of the optimal filter.
3. Optimal filter design is the most widely used FIR filter design method. In MATLAB©,
the function that performs optimum filter design is called "firpm".
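Outside MATLAB, the same Parks-McClellan design is available as `scipy.signal.remez`. A sketch (scipy is our assumption, not something the notes use; the band edges are chosen to match the earlier low-pass examples):

```python
import numpy as np
from scipy.signal import remez

# Low-pass design: passband [0, 0.1], stopband [0.15, 0.5] in
# cycles/sample (i.e. omega_p = 0.2*pi, omega_s = 0.3*pi), 25 taps.
h = remez(25, [0, 0.1, 0.15, 0.5], [1, 0], fs=1)

# The result is a linear phase (type I) filter: h(n) = h(N-1-n).
```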
i.e., the output y(n) represents a sum of weighted and delayed versions of the input x(n). The
signal flow graph for a single delay element is shown in Figure 4.16. The convolution sum can
be represented as in Figure 4.17.
The computational efficiency of the algorithm in Figure 4.17 may be improved by exploiting
the symmetry (or anti-symmetry) of h(n). For type I filters, we have h(n) = h(N − 1 − n) and hence

    y(n) = Σ_{k=0}^{(N−3)/2} h(k) [x(n − k) + x(n − (N − 1) + k)] + h((N−1)/2) x(n − (N−1)/2).
4.6. IMPLEMENTATION OF FIR FILTERS 49
This realisation is shown by the flow graph in Figure 4.18. This method reduces the number of
multiplications by 50% without affecting the number of additions or storage elements required.
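A sketch of the folded realisation in Python (our own illustration, using numpy only): each symmetric tap pair shares one multiplication, and the result agrees with ordinary convolution.

```python
import numpy as np

def fir_folded(h, x):
    """Convolve x with an odd-length symmetric (type I) FIR filter h,
    sharing each multiplication between taps k and N-1-k."""
    N = len(h)
    assert N % 2 == 1 and np.allclose(h, h[::-1])
    M = (N - 1) // 2
    xs = lambda m: x[m] if 0 <= m < len(x) else 0.0  # zero outside x
    y = np.zeros(len(x) + N - 1)
    for n in range(len(y)):
        acc = h[M] * xs(n - M)                        # centre tap
        for k in range(M):                            # one multiply per pair
            acc += h[k] * (xs(n - k) + xs(n - (N - 1) + k))
        y[n] = acc
    return y
```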
[Figure 4.15: Flowchart of the Remez exchange algorithm:
1. Initial guess of (L + 2) extremal frequencies.
2. Calculate the optimum ρ on the extremal set.
3. Interpolate through (L + 1) points to obtain H(e^jω).
4. Calculate the error E(ω) and find the local maxima where |E(ω)| ≥ ρ.
5. If the extremal points changed, return to step 2; if they are unchanged, the best approximation has been found.]
[Figure 4.16: signal flow graph of a single delay element, y(n) = x(n − 1).]

[Figure 4.17: direct form signal flow graph of the convolution sum, with taps h(0), h(1), . . . , h(N − 1).]
Figure 4.18: Efficient signal flow graph for the realisation of a type I FIR filter.
Chapter 5

Design of Infinite Impulse Response Filters

5.1 Introduction
When designing IIR filters one must determine a rational H(z) which meets a given magnitude
specification. The two different approaches are :
1. Design a digital filter from an analog or continuous-time filter.
Remarks
1. Step 2 depends on Step 4 since translation of the digital filter specification to a continuous-
time filter specification depends on how the continuous-time filter is transformed back to
the digital domain.
2. It is desirable for Step 4 to always transform a causal and stable Hc (s) to a causal and
stable H(z), and also that the transformation results in a digital filter with a rational H(z)
when the continuous-time filter has a rational Hc (s).
3. The continuous-time filter frequency response Hc(jΩ) = Hc(s)|_{s=jΩ} should map to the digital
filter frequency response H(e^jω) = H(z)|_{z=e^jω}. Typically Butterworth, Chebyshev and
elliptic filters are used as continuous-time prototypes.
The impulse invariance method and the bilinear transformation are two approaches that satisfy
all the above properties, and are often used in designing IIR filters. We discuss their
implementation in the following sections.
54 CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS
2. The sequence h(n) is obtained by h(n) = Td · hc (nTd ), where Td is the design sampling
interval which does not need to be the same as the sampling period T associated with
A/D and D/A conversion.
So, H(e^jω) is not simply related to Hc(jΩ) because of aliasing. If there were no aliasing effect,

    H(e^jω) = Hc(jω/Td)   for −π ≤ ω ≤ π,                                  (5.2)
and the shape of H(e^jω) would be the same as Hc(jΩ). Note that in practice Hc(jΩ) is not
band limited and therefore aliasing is present, which introduces some difficulty in the design. In
practice, we suppose that the relation H(e^jω) = Hc(jω/Td) holds over [−π, π) and we carry out
5.2. IMPULSE INVARIANCE METHOD 55
the remaining steps, instead of using (5.1). Note that the discrete-time system frequency ω is
related to the continuous-time system frequency Ω by ω = ΩTd . Because of the approximation,
the resulting filter may not satisfy the digital filter specifications and some iterations may be
needed. Also due to aliasing, it is difficult to design band stop or high-pass filters that have no
significant aliasing.
In the impulse invariance design procedure, we transform the continuous-time filter specifi-
cations through the use of Eq. (5.2) to the discrete-time filter specifications. We illustrate this
in the following example.
Example 5.2.2 Assume that we have the low-pass filter specifications as shown in Figure 5.1
with α1 = 0.10875, α2 = 0.17783, ωp = 0.2π, Td = 1 and ωs = 0.3π. In this case the passband is
between 1 − α1 and 1 rather than between 1 − α1 and 1 + α1.

[Figure 5.1: low-pass tolerance scheme for |H(e^jω)|, with passband limits 1 − α1 and 1 for 0 ≤ ω ≤ ωp and stopband limit α2 for ωs ≤ ω ≤ π.]

Historically, most continuous-time filter approximation methods were developed for the design of passive systems (having gain
less than 1). Therefore, the design specifications for continuous time filters have typically been
formulated so that the passband is less than or equal to 1.
We will design a filter by applying impulse invariance to an appropriate Butterworth continuous-
time filter. A Butterworth low-pass filter is defined by the property that the magnitude-squared
function is

    |Hc(jΩ)|² = 1 / [1 + (Ω/Ωc)^{2N}],
where N is an integer corresponding to the order of the filter. The magnitude response of a But-
terworth filter of N th order is maximally flat in the passband, i.e., the first (2N − 1) derivatives
of the magnitude-squared function are zero at Ω = 0. An example of this characteristic is shown
in Figure 5.2. The first step is to transform the discrete-time specifications to continuous-time
filter specifications. Assuming that the aliasing involved in the transformation from Hc (jΩ) to
H(e^jω) will be negligible, we obtain the specification on Hc(jΩ) by applying the relation
Ω = ω/Td (with Td = 1):

    0.89125 ≤ |Hc(jΩ)| ≤ 1,   0 ≤ |Ω| ≤ 0.2π,

and

    |Hc(jΩ)| ≤ 0.17783 = α2,   0.3π ≤ |Ω| ≤ π.
[Figure 5.2: magnitude response of Butterworth low-pass filters of order N = 2, 4 and 8.]
Figure 5.3: Filter specifications for Example 5.2.2.
With equality at the band edges,

    |Hc(j0.2π)| = 0.89125   and   |Hc(j0.3π)| = 0.17783.

Now, we determine N and Ωc, which are parameters of

    |Hc(jΩ)|² = 1 / [1 + (Ω/Ωc)^{2N}],

from

    1 + (0.2π/Ωc)^{2N} = (1/0.89125)²   and   1 + (0.3π/Ωc)^{2N} = (1/0.17783)²,
which leads to N = 5.8858 and Ωc = 0.70474. Rounding N up to 6, we find Ωc = 0.7032. With
these values, the passband specification will be met exactly and the stopband specifications will
be exceeded, reducing the potential of aliasing. With N = 6, the 12 poles of Hc(s)Hc(−s)
are uniformly distributed around a circle of radius Ωc = 0.7032. Consequently, the poles of Hc(s)
are the 3 pole pairs in the left half of the s-plane with coordinates
s1 = −0.182 ± j0.679
s2 = −0.497 ± j0.497
s3 = −0.679 ± j0.182
so that

    Hc(s) = 0.12093 / [(s² + 0.364s + 0.494)(s² + 0.994s + 0.494)(s² + 1.358s + 0.494)].
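The numbers of this example are easy to reproduce (a sketch; variable names are ours):

```python
import cmath, math

# Example 5.2.2 (impulse invariance, Td = 1).
a1, a2 = 0.10875, 0.17783
wp, ws = 0.2 * math.pi, 0.3 * math.pi

d_p = (1 / (1 - a1)) ** 2 - 1            # (1/0.89125)^2 - 1
d_s = (1 / a2) ** 2 - 1                  # (1/0.17783)^2 - 1
N_exact = math.log(d_s / d_p) / (2 * math.log(ws / wp))   # 5.8858
N = math.ceil(N_exact)                   # round up to 6
Wc = wp / d_p ** (1 / (2 * N))           # passband met exactly: 0.7032

# Left half-plane poles of Hc(s)Hc(-s), uniformly spaced on |s| = Wc.
poles = [Wc * cmath.exp(1j * math.pi * (0.5 + (2 * k + 1) / (2 * N)))
         for k in range(N)]
```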
If we express Hc(s) as a partial fraction expansion, we transform each term 1/(s − sk) into
1/(1 − e^{sk Td} z^{−1}) (here Td = 1) and so obtain H(z).

5.3 Bilinear Transformation

The bilinear transformation is defined by

    s = (2/Td) · (1 − z^{−1}) / (1 + z^{−1}).

On the unit circle z = e^jω it becomes

    s = (2/Td) · (1 − e^{−jω}) / (1 + e^{−jω}) = j (2/Td) tan(ω/2).
The bilinear transformation transforms the imaginary axis of the s-plane to the unit circle
as shown in Figure 5.4. The inverse transform is given by

    z = (2/Td + s) / (2/Td − s) = [1 + (Td/2)s] / [1 − (Td/2)s] = [1 + σTd/2 + jΩTd/2] / [1 − σTd/2 − jΩTd/2].
The region ℜ{s} < 0 is transformed to |z| < 1, i.e. the left-hand half plane ℜ{s} < 0 of the
s-plane is mapped inside the unit circle. This means the bilinear transformation will preserve
the stability of systems.
Figure 5.4: Mapping of the s-plane onto the z-plane using the bilinear transformation
In the definition of the transformation, 2/Td is a scaling factor. It controls the deformation of
the frequency axis when we use the bilinear transformation to obtain a digital transfer function
H(z) from the continuous-time transfer function Hc(s). From s = j(2/Td) tan(ω/2) = σ + jΩ, it
follows that

    Ω = (2/Td) tan(ω/2)   and   σ = 0.

For low frequencies, the deformation between Ω and ω is very small, i.e. for small ω, we can
assume Ω ≈ ω/Td. This was the criterion for choosing 2/Td as a scaling factor. This is shown in
Figure 5.5. The choice of another criterion would yield another scaling factor.
Figure 5.5: Mapping of the continuous-time frequency axis onto the unit circle by bilinear
transformation (Td = 1).
Remarks

1. This transformation is the most widely used in practice. It calculates digital filters from
known continuous-time filter responses.

3. The group delay is distorted according to

    τd = τc [1 + ((ωc Td)/2)²].

If the continuous-time filter has a constant delay time, the resulting digital filter will not
have this property.
4. To design a digital filter using this method one must first transform the discrete time filter
specifications to the continuous time specifications, design a continuous-time filter with
these new requirements and then apply the bilinear transformation.
Example 5.3.1 Consider the discrete-time filter specifications of Example 5.2.2, where we il-
lustrated the impulse invariance technique for the design of a discrete-time filter. In carrying out
the design using the bilinear transformation, the critical frequencies of the discrete-time filter
must be prewarped to the corresponding continuous-time frequencies using Ω = (2/Td ) tan(ω/2)
so that the frequency distortion inherent in bilinear transformation will map them back to the
correct discrete-time critical frequencies. For this specific filter, with |Hc (jΩ)| representing the
magnitude response function of the continuous-time filter, we require that
    0.89125 ≤ |Hc(jΩ)| ≤ 1,   0 ≤ |Ω| ≤ (2/Td) tan(0.2π/2),

    |Hc(jΩ)| ≤ 0.17783,   (2/Td) tan(0.3π/2) ≤ |Ω| ≤ ∞.

With Td = 1, the band-edge conditions become

    |Hc(j2 tan(0.1π))| ≥ 0.89125   and   |Hc(j2 tan(0.15π))| ≤ 0.17783.
The form of the magnitude-squared function for the Butterworth filter is

    |Hc(jΩ)|² = 1 / [1 + (Ω/Ωc)^{2N}].

With equality at the band edges,

    1 + (2 tan(0.1π)/Ωc)^{2N} = (1/0.89125)²

and

    1 + (2 tan(0.15π)/Ωc)^{2N} = (1/0.17783)²,                             (5.4)

and solving for N gives

    N = log[((1/0.17783)² − 1) / ((1/0.89125)² − 1)] / (2 log[tan(0.15π)/tan(0.1π)]) = 5.30466.
Since N must be an integer, we choose N = 6. Substituting this N into Eq. ( 5.4) we obtain
Ωc = 0.76622. For this value of Ωc , the passband specifications are exceeded and stop band
specifications are met exactly. This is reasonable for the bilinear transformation since we do not
have to be concerned with aliasing. That is, with proper prewarping, we can be certain that the
resulting discrete-time filter will meet the specifications exactly at the desired stopband edge.
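The prewarped design values can be checked the same way as in the impulse invariance example (a sketch with our own variable names):

```python
import math

# Example 5.3.1: band edges prewarped with Omega = (2/Td) tan(omega/2),
# Td = 1.
Wp = 2 * math.tan(0.1 * math.pi)
Ws = 2 * math.tan(0.15 * math.pi)
d_p = (1 / 0.89125) ** 2 - 1
d_s = (1 / 0.17783) ** 2 - 1
N_exact = math.log(d_s / d_p) / (2 * math.log(Ws / Wp))   # about 5.305
N = math.ceil(N_exact)                    # -> 6
Wc = Ws / d_s ** (1 / (2 * N))            # stopband met exactly: 0.76622
```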
In the s-plane, the 12 poles of the magnitude-squared function are uniformly distributed in
angle on a circle of radius 0.7662. The system function of the continuous-time filter obtained
by selecting the left half-plane poles is
    Hc(s) = 0.20238 / [(s² + 0.3966s + 0.5871)(s² + 1.0836s + 0.5871)(s² + 1.4802s + 0.5871)].
We then obtain the system function H(z) for the discrete-time filter by applying the bilinear
transformation to Hc (s) with Td = 1,
    H(z) = 0.0007378 (1 + z^{−1})⁶ / [(1 − 1.268z^{−1} + 0.7051z^{−2})(1 − 1.0106z^{−1} + 0.3583z^{−2})(1 − 0.9044z^{−1} + 0.2155z^{−2})].
Since the bilinear transformation maps the entire jΩ-axis of the s-plane onto the unit circle,
the magnitude response of the discrete-time filter falls off much more rapidly than the original
continuous-time filter, i.e. the entire imaginary axis of the s-plane is compressed onto the unit
circle. Hence, the behaviour of H(ejω ) at ω = π corresponds to the behaviour of Hc (jΩ) at
Ω = ∞. Therefore, the discrete-time filter will have a sixth-order zero at z = −1 since the
continuous-time Butterworth filter has a sixth-order zero at s = ∞.
Note that as with impulse invariance, the parameter Td is of no consequence in the design
procedure since we assume that the design problem always begins with discrete-time filters
specifications. When these specifications are mapped to continuous-time specifications and then
the continuous-time filter is mapped back to a discrete-time filter, the effect of Td is cancelled.
Although we retained the parameter Td in our discussion, in specific problems any convenient
value of Td can be chosen.
Figure 5.6: Signal flow graph of direct form I structure for an (N − 1)th-order system.
we can change the order of H1 (z) and H2 (z) without affecting the overall system function. This
is because the delay elements are redundant and can be combined. We then obtain the direct
form II structure. The advantage of the second form lies in the reduction in the number of
storage elements required, while the number of arithmetic operations required is constant. The
flow graph of the direct form II structure is depicted in Figure 5.7.
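A minimal direct form II implementation (our own sketch in Python; the notes use signal flow graphs): a single delay line is shared between the feedback and feedforward parts.

```python
import numpy as np

def df2_filter(b, a, x):
    """Direct form II realisation of
    y(n) = sum_k b_k x(n-k) - sum_{k>=1} a_k y(n-k), with a[0] = 1:
    one shared delay line w(n)."""
    b = np.asarray(b, dtype=float)
    a = np.asarray(a, dtype=float)
    order = max(len(a), len(b)) - 1
    b = np.pad(b, (0, order + 1 - len(b)))
    a = np.pad(a, (0, order + 1 - len(a)))
    state = np.zeros(order)                     # state[k-1] holds w(n-k)
    y = np.zeros(len(x))
    for n in range(len(x)):
        wn = x[n] - np.dot(a[1:], state)        # feedback section first
        y[n] = b[0] * wn + np.dot(b[1:], state) # then the feedforward taps
        state = np.concatenate(([wn], state[:-1]))
    return y
```

Only `order` storage elements are needed, as the text notes, while the arithmetic count is unchanged compared with direct form I.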
Figure 5.7: Signal flow graph of direct form II structure for an N − 1th-order system.
Figure 5.8: Cascade structure for a 6th-order system with a direct form II realisation of each
second-order subsystem.
Leaving aside phase considerations, a given amplitude response specification will, in general, be met most
efficiently with an IIR filter. However, in many cases, the linear phase available with an FIR
filter may be well worth the extra cost.
Thus a multitude of tradeoffs must be considered in designing a digital filter. Clearly, the
final choice will most often be made in terms of engineering judgement on such questions as
the formulation of specifications, the method of implementation, and computational facilities
available for carrying out the design.
Chapter 6

Introduction to Digital Spectral Analysis
6.2 Applications
We encounter many areas where spectral analysis is needed. A few examples are given below.
66 CHAPTER 6. INTRODUCTION TO DIGITAL SPECTRAL ANALYSIS
[Figure: Michelson interferometer. Light S(t) is divided at a beam splitter between a fixed mirror and a moveable mirror displaced by d, producing AS(t − t0) and AS(t − t0 − τ) with τ = 2d/c. The detector output V(t) is low-pass filtered to give w(τ), whose Fourier transform yields W(e^jω).]
Thus, the output of a Michelson interferometer is proportional to the spectrum of the light
source. One of the objectives in spectroscopic interferometry is to get a good estimate of
the spectrum of S(t) without moving the mirror over a large distance d.
Electrical Engineering. Measurement of the power in various frequency bands of some elec-
tromagnetic signal of interest. In radar there is the problem of signal detection and inter-
pretation.
Acoustics. The power spectrum plays the role of a descriptive statistic. We may use the
spectrum to calculate the direction of arrival (DOA) of an incoming wave or to characterise
the source.
Geophysics. Earthquakes give rise to a number of different waves such as pressure waves, shear
waves and surface waves. These waves are received by an array of sensors. Even for a 100 km
distance between the earthquake and the array, these types of waves can arrive at almost the
same time at the array. These waves must be characterised.
{X ≤ x} := {ζ : X(ζ) ≤ x}
with probability
P ({X ≤ x}) := P ({ζ : X(ζ) ≤ x})
7.1.2 Properties
The probability distribution function has the following properties:

1. 0 ≤ FX(x) ≤ 1,
2. FX(∞) = 1 and FX(−∞) = 0,
3. FX is continuous from the right: lim_{ǫ→0⁺} FX(x + ǫ) = FX(x).
70 CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
[Figure: probability density function of the uniform distribution, fX(x) = 1/(b − a) for a ≤ x ≤ b, and the corresponding distribution function FX(x), which rises from 0 at x = a to 1 at x = b.]
Example 7.1.2 A random variable X ∼ N (1000, 50). Find the probability that X is between
900 and 1050. We have

    P({900 ≤ X ≤ 1050}) = ΦX((1050 − 1000)/50) − ΦX((900 − 1000)/50) = ΦX(1) − ΦX(−2).

Numerical values of ΦX(x) over a range of x are shown in Table 1. Using Table 1 and the relation
ΦX(−x) = 1 − ΦX(x), we conclude that

    P({900 ≤ X ≤ 1050}) = ΦX(1) + ΦX(2) − 1 = 0.819.
Often we define the following as the error function. Note that erf(x) = ΦX(x) − 1/2, though
there are other definitions in the literature:

    erf(x) = (1/√2π) ∫₀ˣ e^{−u²/2} du = ΦX(x) − 1/2.
Table 1: Numerical values for erf(x) function.
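The probability in Example 7.1.2 can be checked with the standard library's `math.erf`; note that `math.erf(x) = (2/√π)∫₀ˣ e^{−t²} dt`, which differs from the notes' definition only by scaling.

```python
import math

def Phi(x):
    """Standard normal distribution function, built from math.erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Example 7.1.2, reading N(1000, 50) as mean 1000, standard deviation 50:
p = Phi((1050 - 1000) / 50) - Phi((900 - 1000) / 50)   # Phi(1) - Phi(-2)
```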
7.1.6 χ² Distribution

Let X = Σ_{k=1}^{ν} Yk² be a random variable where the Yk ∼ N(0, 1). X is then χ² distributed with
ν degrees of freedom. The probability density function of a χ²_ν random variable is displayed in
Figure 7.2 for ν = 1, 2, 3, 5 and 10. The probability density function is

    fX(x) = [x^{ν/2 − 1} / (2^{ν/2} Γ(ν/2))] e^{−x/2} u(x),

where Γ(x) is the gamma function,

    Γ(x) = ∫₀^∞ ξ^{x−1} e^{−ξ} dξ.
7.1. RANDOM VARIABLES 73
[Figure 7.2: probability density functions of the χ²_ν distribution for ν = 1, 2, 3, 5 and 10.]
The mean of X is E[X] = ν and the variance is σ²_X = 2ν.
We must introduce a new concept to include the probability of the joint event {X1 ≤ x1 , X2 ≤
x2 } = {(X1 , X2 ) ∈ D}, where x1 and x2 are two arbitrary real numbers and D is the quadrant
shown in Figure 7.3.
The joint distribution FX1X2(x1, x2) of two random variables X1 and X2 is the probability
of the event {X1 ≤ x1, X2 ≤ x2},

    FX1X2(x1, x2) = P({X1 ≤ x1, X2 ≤ x2}).

[Figure 7.3: the quadrant D = {(X1, X2) : X1 ≤ x1, X2 ≤ x2}.]
7.1.8 Properties
The multivariate probability distribution function has the following properties.
1)

    lim_{x1→∞, …, xn→∞} FX1X2…Xn(x1, …, xn) = F(∞, ∞, …, ∞) = 1,

    lim_{xi→−∞} FX1X2…Xn(x1, …, xi−1, xi, xi+1, …, xn) = FX1X2…Xn(x1, …, xi−1, −∞, xi+1, …, xn) = 0.
2) F is continuous from the right in each component xi when the remaining components are
fixed.
3) F is a non-decreasing function of each component xi when the remaining components are
fixed.
4) For the nth difference,

    ∆ᵇₐ F = Σ_{k∈{0,1}ⁿ} (−1)^{n − Σᵢ kᵢ} F(b₁^{k₁}a₁^{1−k₁}, …, bₙ^{kₙ}aₙ^{1−kₙ}) ≥ 0,

where the ith argument is bᵢ if kᵢ = 1 and aᵢ if kᵢ = 0.
The random variables X1, …, Xn are independent if and only if

    fX1…Xn(x1, …, xn) = Π_{i=1}^{n} fXi(xi),

or equivalently

    FX1…Xn(x1, …, xn) = Π_{i=1}^{n} FXi(xi),
which means that the events {X1 ≤ x1 }, . . . , {Xn ≤ xn } are independent. For the special case
for n = 2, we have the following properties:
1) FX1X2(∞, ∞) = 1 and FX1X2(−∞, x2) = FX1X2(x1, −∞) = 0.

2) lim_{ǫ→0⁺} FX1X2(x1 + ǫ, x2) = lim_{ǫ→0⁺} FX1X2(x1, x2 + ǫ) = FX1X2(x1, x2).
3) The event {x11 < X1 ≤ x12, X2 ≤ x2} consists of all the points (X1, X2) in D1, and the
event {X1 ≤ x1, x21 < X2 ≤ x22} consists of all the points in D2. From Figure 7.5, we see
that {x11 < X1 ≤ x12, X2 ≤ x2} = {X1 ≤ x12, X2 ≤ x2} \ {X1 ≤ x11, X2 ≤ x2}, and therefore

    P({x11 < X1 ≤ x12, X2 ≤ x2}) = FX1X2(x12, x2) − FX1X2(x11, x2).

[Figure 7.5: the strips D1 and D2 in the (X1, X2) plane.]

Similarly,

    P({X1 ≤ x1, x21 < X2 ≤ x22}) = FX1X2(x1, x22) − FX1X2(x1, x21).
4)

    P({x11 < X1 ≤ x12, x21 < X2 ≤ x22}) =
    FX1X2(x12, x22) − FX1X2(x11, x22) − FX1X2(x12, x21) + FX1X2(x11, x21) ≥ 0.

This is the probability that (X1, X2) lies in the rectangle D3 (see Figure 7.6).
[Figure 7.6: the rectangle D3 = {x11 < X1 ≤ x12, x21 < X2 ≤ x22}.]
where µX1 and µX2 are respectively the means of X1 and X2, σ²X1 > 0 and σ²X2 > 0 the
corresponding variances, and −1 < ρ < 1 is the correlation coefficient. N(µX1, σ²X1) and
N(µX2, σ²X2) are the marginal distributions.

The conditional density function fX2(x2 | x1) of X2 given X1 = x1 is
N(µX2 + ρ(σX2/σX1)(x1 − µX1), σ²X2(1 − ρ²)). If X1 and X2 are independent then ρ = 0, and clearly
fX1X2(x1, x2) = fX1(x1) fX2(x2).
7.1.10 Expectation
The mathematical expectation of a random variable X,
E [X]
which may be read “the expected value of X” or “the mean value of X” is defined as
    E[X] := ∫_{−∞}^{∞} x fX(x) dx.
Example 7.1.4 i) Normal distribution: Let X ∼ N(µ, σ²). Then

    E[X] = ∫_{−∞}^{∞} x fX(x) dx
         = [1/(σ√2π)] ∫_{−∞}^{∞} x exp{−(x − µ)²/(2σ²)} dx
         = [1/(σ√2π)] ∫_{−∞}^{∞} (σu + µ) exp{−u²/2} σ du        (u = (x − µ)/σ)
         = (1/√2π) [ σ ∫_{−∞}^{∞} u exp{−u²/2} du + µ ∫_{−∞}^{∞} exp{−u²/2} du ]
         = (1/√2π) [ σ · 0 + µ√2π ]
         = µ.
ii) Uniform distribution: If X is uniformly distributed on the interval [a, b], then

    fX(x) = [1/(b − a)] · rect_{[a,b]}(x),

where

    rect_{[a,b]}(x) = { 1, a ≤ x ≤ b,
                      { 0, else.

Then

    E[X] = ∫_{−∞}^{∞} x fX(x) dx = [1/(b − a)] ∫_a^b x dx
         = [1/(b − a)] · (1/2)(b² − a²)
         = (b − a)(b + a) / [2(b − a)]
         = (a + b)/2,

which can be verified graphically.
iii) Exponential distribution: Let fX(x) = (1/b) exp{−(x − a)/b} u(x − a). Then

    E[X] = ∫_{−∞}^{∞} x fX(x) dx
         = ∫_{−∞}^{∞} x (1/b) exp{−(x − a)/b} u(x − a) dx
         = (1/b) ∫_a^∞ x exp{−(x − a)/b} dx
         = (1/b) exp(a/b) ∫_a^∞ x exp(−x/b) dx
         = a + b.
Note: if a random variable's density is symmetrical about the line x = a, i.e.
fX(a + x) = fX(a − x), then E[X] = a.
where V and fV (v) are respectively the random voltage and its probability density function.
For

    g(X) = Xⁿ,   n = 0, 1, 2, …,

the expectation of g(X) is known as the nth order moment of X and denoted by µn:

    µn := E[Xⁿ] = ∫_{−∞}^{∞} xⁿ fX(x) dx.
It is also of importance to use central moments of X around the mean. These are defined as

    E[(X − E[X])ⁿ],   n = 0, 1, 2, … .

The positive constant σ is called the standard deviation of X. From the definition it follows
that σ² is the mean of the random variable (X − E[X])². Thus,

    σ² = E[(X − E[X])²] = E[X² − 2X·E[X] + (E[X])²]
       = E[X²] − 2(E[X])² + (E[X])²
       = E[X²] − (E[X])²
       = E[X²] − µ²,

where µ = E[X].
Example 7.1.6
We will now find the variance of three common distributions.
i) Normal distribution:
Let X ∼ N(µ, σ²). The probability density function of X is

    fX(x) = [1/(σ√2π)] exp{−(x − µ)²/(2σ²)}.
Clearly

    ∫_{−∞}^{∞} exp{−(x − µ)²/(2σ²)} dx = σ√2π.

Differentiating with respect to σ, we obtain

    ∫_{−∞}^{∞} [(x − µ)²/σ³] exp{−(x − µ)²/(2σ²)} dx = √2π.

Multiplying both sides by σ²/√2π, we conclude that

    E[(X − µ)²] = E[(X − E[X])²] = σ².
ii) Uniform distribution:

    E[(X − E[X])²] = E[X²] − (E[X])²
                   = ∫_{−∞}^{∞} x² fX(x) dx − [(a + b)/2]²
                   = ∫_a^b [x²/(b − a)] dx − [(a + b)/2]²
                   = (b − a)²/12.
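Both uniform moments can be verified numerically with a midpoint-rule integration (a quick sketch, not from the notes; interval and grid size are our choices):

```python
import numpy as np

# Midpoint-rule check of E[X] = (a+b)/2 and Var[X] = (b-a)^2/12
# for the uniform density on [a, b].
a, b = 1.0, 5.0
n = 100000
dx = (b - a) / n
x = a + (np.arange(n) + 0.5) * dx        # midpoints of the n subintervals
f = 1.0 / (b - a)                        # constant density on [a, b]
mean = np.sum(x * f) * dx                # -> (a+b)/2 = 3
var = np.sum((x - mean) ** 2 * f) * dx   # -> (b-a)^2/12
```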
For a monotonically increasing g,

    FY(y) = P({Y ≤ y}) = P({g(X) ≤ y}) = P({X ≤ g⁻¹(y)}) = ∫_{−∞}^{g⁻¹(y)} fX(x) dx

holds. The density function of Y = g(X) is obtained via differentiation, which leads to

    fY(y) = fX(g⁻¹(y)) · |dg⁻¹(y)/dy|.
This can be interpreted as the expectation of a function g(X) = ejωX , or as the Fourier transform
(with reversed sign of the exponent) of the probability density function (pdf) of X. Consid-
ering Equation (7.2) as a Fourier transform, we can readily apply all the properties of Fourier
transforms to the characteristic function. Most importantly, the inverse relationship of Equation
(7.2) is given by

    fX(x) = (1/2π) ∫_{−∞}^{∞} ΦX(ω) e^{−jωx} dω.

Using Schwarz's inequality in (7.2) shows that |ΦX(ω)| ≤ 1, and thus the characteristic function
always exists.
Example 7.1.9 We shall derive the cf of a Gaussian distributed random variable. First, we
consider Z ∼ N (0, 1).
    ΦZ(ω) = ∫ fZ(z) e^{jωz} dz
          = (1/√2π) ∫_{−∞}^{∞} e^{−z²/2} e^{jωz} dz
          = (1/√2π) ∫_{−∞}^{∞} e^{−(z − jω)²/2 + (jω)²/2} dz
          = e^{−ω²/2} ∫ fZ(z − jω) dz
          = e^{−ω²/2}.
The cf is also known as the moment-generating function as it may be used to determine nth
order moment of a random variable. This can be seen from the following derivation. Consider
the power series expansion of an exponential term for |x| < ∞:

    eˣ = Σ_{n=0}^{∞} xⁿ/n!.

Applying this to ΦX(ω) = E[e^{jωX}] gives

    ΦX(ω) = Σ_{n=0}^{∞} (jω)ⁿ E[Xⁿ]/n!,                                    (7.3)

which means that the nth order moment of X can be determined according to

    E[Xⁿ] = (1/jⁿ) · dⁿΦX(ω)/dωⁿ |_{ω=0}.
This is known as the moment theorem. When the power series expansion of Equation (7.3)
converges the characteristic function and hence the pdf of X are completely determined by the
moments of X.
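The Gaussian characteristic function of Example 7.1.9 can be verified by numerical integration of E[e^{jωZ}] (a sketch using numpy; the grid and tolerance are our choices):

```python
import numpy as np

# Numerical check of Phi_Z(omega) = e^{-omega^2/2} for Z ~ N(0, 1):
# integrate f_Z(z) e^{j omega z} on a fine grid.
z = np.linspace(-12.0, 12.0, 240001)
dz = z[1] - z[0]
fz = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)

def cf(omega):
    """Riemann-sum approximation of the characteristic function."""
    return np.sum(fz * np.exp(1j * omega * z)) * dz

val = cf(1.0)   # should be close to e^{-1/2}
```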
7.2 Covariance
If the statistical properties of the two random variables X1 and X2 are described by their joint
probability density function fX1 X2 (x1 , x2 ), then we have
    E[g(X1, X2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) fX1X2(x1, x2) dx1 dx2.
The joint central moments

    µ′20 = σ²X1   and   µ′02 = σ²X2

are the variances of X1 and X2, respectively. The joint moment µ′11 is called the covariance of X1
and X2 and denoted by cX1X2.
Clearly
cX1 X2 = rX1 X2 − E [X1 ] E [X2 ] .
If two random variables are independent, i.e. fX1X2(x1, x2) = fX1(x1) fX2(x2), then cX1X2 = 0.
The converse, however, is not always true (except for the Gaussian case). The normalized second-
order moment

    ρ = µ′11 / √(µ′20 µ′02) = cX1X2 / (σX1 σX2)

is known as the correlation coefficient of X1 and X2. It can be shown that

    −1 ≤ ρ ≤ 1.
We will show that two uncorrelated random variables are not necessarily independent.
¹ In the engineering literature rX1X2 is often called correlation. We reserve this term to describe the normalized
covariance, as is common in the statistical literature.
Example: let X1 = cos U and X2 = sin U, where U is uniformly distributed on [0, 2π); these two
random variables are uncorrelated but clearly not independent.
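This pair can be checked numerically (our sketch; U is discretised on a uniform grid over one period): the covariance vanishes, while E[X1²X2²] = 1/8 differs from E[X1²]·E[X2²] = 1/4, so X1 and X2 cannot be independent.

```python
import numpy as np

# U discretised on a uniform grid over one period [0, 2*pi).
u = (np.arange(100000) + 0.5) * (2.0 * np.pi / 100000)
x1, x2 = np.cos(u), np.sin(u)

cov = np.mean(x1 * x2) - np.mean(x1) * np.mean(x2)   # -> 0 (uncorrelated)
m22 = np.mean(x1 ** 2 * x2 ** 2)                     # -> 1/8
prod = np.mean(x1 ** 2) * np.mean(x2 ** 2)           # -> 1/4 != 1/8
```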
For a linear combination Y = Σ_{i=1}^{N} αi Xi,

    σ²Y = E[(Y − E[Y])²] = Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj cXiXj.

If the Xi are mutually uncorrelated, this reduces to

    σ²Y = Σ_{i=1}^{N} αi² σ²Xi.
Definition 7.3.1 A real random process is an indexed set of real functions of time that has
certain statistical properties. We characterise a real random process by a set of real functions
and associated probability description.
[Figure: ensemble of realisations xi(n) = X(n, ζi) of a random process; at fixed times n1 and n2,
X1 = X(n1) = X(n1, ζ) and X2 = X(n2) = X(n2, ζ) are random variables.]

Consider a random process X(n) that describes a noise source. A random process can be seen as a function
of two variables, time n and elementary event ζ. When n is fixed, X(n, ζ) is simply a random
variable. When ζ is fixed, X(n, ζ) = x(n) is a function of time known as “sample function” or
realisation. If n = n1 then X(n1 , ζ) = X(n1 ) = X1 is a random variable. If the noise source has
a Gaussian distribution, then any of the random variables would be described by
    f_{Xi}(x_i) = (1 / √(2π σ_i²)) · exp(−(x_i − µ_i)² / (2σ_i²))
where Xi = X(ni , ζ). Also, all joint distributions are multivariate Gaussian. At each point
in time, the value of the waveform is a random variable described by its probability density
function at that time n, fX (x; n).
The probability that the process X(n) will have a value in the range [a, b] at n is given by
    P({a ≤ X(n) ≤ b}) = ∫_a^b f_X(x; n) dx
where µ1 (n) is the mean of X(n), and µ2 (n)−µ1 (n)2 is the variance of X(n). Because in general,
fX (x; n) depends on n, the moments will depend on time. The probability that a process X(n)
lies in the interval [a, b] at time n1 and in the interval [c, d] at time n2 is given by the joint
density function fX1 X2 (x1 , x2 ; n1 , n2 ),
    P({a ≤ X1 ≤ b, c ≤ X2 ≤ d}) = ∫_a^b ∫_c^d f_{X1X2}(x1, x2; n1, n2) dx1 dx2 .
A complete description of a random process requires knowledge of all such joint densities for all
possible point locations and orders fX1 X2 ...XN (x1 , . . . , xN ; n1 , . . . , nN ).
• A continuous random process consists of a random process with associated continuously
distributed random variables, X(n), n ∈ Z. For example, noise in communication circuits
at a given time; the voltage measured can have any value.
• A discrete random process consists of a random process with associated discrete random
variables, e.g. the output of an ideal limiter.
holds or equivalently:
fX (x1 , x2 , . . . , xN ; n1 , . . . , nN ) = fX (x1 , x2 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) .
The process is said to be strictly stationary if it is stationary to the infinite order. If a process
is stationary, it can be translated in time without changing its statistical description.
Ergodicity is a topic dealing with the relationship between statistical averages and time
averages. Suppose that, for example, we wish to determine the mean µ1 (n) of a process X(n).
For this purpose, we observe a large number of samples X(n, ζi ) and we use their ensemble
average as the estimate of µ1 (n) = E [X(n)]
    µ̂1(n) = (1/L) Σ_{i=1}^{L} X(n, ζ_i)
Suppose, however, that we have access only to a single sample x(n) = X(n, ζ) of X(n) for
−N ≤ n ≤ N . Can we then use the time average

    ⟨X(n)⟩ = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} X(n, ζ)

as the estimate of µ1(n)? This is, of course, impossible if µ1(n) depends on n. However, if
µ1(n) = µ1 is a constant, then under general conditions ⟨X(n)⟩ approaches µ1.
Definition 7.3.3 A random process is said to be ergodic if all time averages of any sample
function are equal to the corresponding ensemble averages (expectation).
Example 7.3.1 DC and RMS values are defined in terms of time averages, but if the process
is ergodic, they may be evaluated by use of ensemble averages. The DC value of X(n) is:
    X_dc = E[X(n)] ,    X_rms = √(E[X(n)²]) .
If X(n) is ergodic, the time average is equal to the ensemble average, that is
    X_dc = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} X(n) = E[X(n)] = ∫_{−∞}^{∞} x f_X(x; n) dx = µ_X
Furthermore,
    E[X(n)²] = ∫_0^{2π} A² cos²(ω₀ n + ϕ) (1/2π) dϕ = A²/2 .
In this example, the time parameter n disappeared when the ensemble averages were evaluated.
The process is stationary up to the second order.
Now, evaluate the corresponding time averages using a sample function of the random process,
for instance X(n, ζ1) = A cos(ω₀ n), which occurs when Φ = 0. The zero phase Φ = 0 corresponds to one
of the elementary events. The time average for any of the sample functions can be calculated by letting ϕ
be any value between 0 and 2π. The time averages are:
• First moment:  ⟨X(n)⟩ = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} A cos(ω₀ n + ϕ) = 0

• Second moment:  ⟨X(n)²⟩ = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} [A cos(ω₀ n + ϕ)]² = A²/2
In this example, ϕ disappears when the time average is calculated. We see that the time average
is equal to the ensemble average for the first and second moments.
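The agreement between time and ensemble averages for the random-phase cosine can be illustrated by simulation. A minimal sketch assuming NumPy; the amplitude, frequency, the fixed time index n0 = 5, and the sample sizes are arbitrary choices, and finite sums stand in for the limits:

```python
import numpy as np

rng = np.random.default_rng(1)
A, w0 = 2.0, 0.3
n = np.arange(100_000)

# Time averages over one realization (one fixed phase):
phi = rng.uniform(0, 2*np.pi)
x = A * np.cos(w0*n + phi)
time_mean, time_power = x.mean(), (x**2).mean()

# Ensemble averages at a fixed time n0 = 5, over many realizations of Phi:
phis = rng.uniform(0, 2*np.pi, size=100_000)
e = A * np.cos(w0*5 + phis)
ens_mean, ens_power = e.mean(), (e**2).mean()

print(abs(time_mean) < 0.01 and abs(ens_mean) < 0.02)   # True: both ~ 0
print(abs(time_power - A**2/2) < 0.01)                  # True: ~ A^2/2
print(abs(ens_power - A**2/2) < 0.05)                   # True: ~ A^2/2
```

Both averaging routes recover the first moment 0 and the second moment A²/2, as derived above.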
We have not yet proved that the process is ergodic because we have to evaluate all the orders
of moments and averages. In general, it is difficult to prove that a process is ergodic. In this
manuscript we will focus on stationary processes.
Example 7.3.3 Suppose that the amplitude in the previous example is now random. Let us
assume that A and Φ are independent. In this case, the process is stationary but not ergodic,
because if E[A²] = σ², then

    E[X(n)²] = E[A²] · E[cos²(ω₀ n + Φ)] = σ²/2

but

    ⟨X(n)²⟩ = lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} X(n)² = A²/2 ,

which depends on the particular realisation of A.
Summary:
• An ergodic process has to be stationary.
• However, if a process is stationary, it may or may not be ergodic.
If the process is stationary to the second order, the SOMF is only a function of the time difference
κ = |n2 − n1 |, and
rXX (κ) = E [X(n + κ)X(n)]
1. E [X(n)] is a constant.
Proofs for 1 and 2 follow from the definition. The third property holds because
    E[(X(n + κ) ± X(n))²] ≥ 0
    E[X(n + κ)²] ± 2 E[X(n + κ)X(n)] + E[X(n)²] ≥ 0
    r_XX(0) ± 2 r_XX(κ) + r_XX(0) ≥ 0 .
Definition 7.3.6 The “cross-SOMF” for two real processes X(n) and Y (n) is:

    r_XY(n1, n2) = E[X(n1)Y(n2)] ,

which for jointly wide-sense stationary processes depends only on κ = n1 − n2 .
3.  |r_XY(κ)| ≤ (1/2) [r_XX(0) + r_YY(0)]
Property 2 constitutes a tighter bound than that of property 3, because the geometric mean of
two positive numbers cannot exceed the arithmetic mean, that is
    √(r_XX(0) r_YY(0)) ≤ (1/2) [r_XX(0) + r_YY(0)]
• Two random processes X(n) and Y (n) are said to be uncorrelated if:
rXY (κ) = 0, ∀κ ∈ Z
• If the random processes X(n) and Y (n) are jointly ergodic, the time average may be used
to replace the ensemble average.
If X(n) and Y (n) are jointly ergodic
    r_XY(κ) ≜ E[X(n + κ)Y(n)] = E[X(n)Y(n − κ)] ≡ lim_{N→∞} (1/(2N+1)) Σ_{n=−N}^{N} X(n) Y(n − κ)     (7.4)
In this case, cross-SOMFs and auto-SOMFs may be measured by using a system composed
of a delay element, a multiplier and an accumulator. This is illustrated in Figure 7.8, where
the block z −κ is a delay line of order κ.
[Figure 7.8: Y(n) is passed through a delay z^{−κ}, multiplied with X(n), and averaged over 2N + 1 samples to produce r_XY(κ).]
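The delay–multiply–average structure of Figure 7.8 translates directly into a few lines of code. A sketch assuming NumPy; the test signal y(n) = x(n + 3) and the sample size are arbitrary choices made so the estimate has a known peak:

```python
import numpy as np

def cross_somf(x, y, kappa):
    """Time-average estimate of r_XY(kappa) = E[X(n+kappa) Y(n)]:
    align y with a kappa-delayed x, multiply, and average (cf. Figure 7.8)."""
    n = len(x)
    if kappa >= 0:
        return np.mean(x[kappa:] * y[:n - kappa])
    return np.mean(x[:n + kappa] * y[-kappa:])

rng = np.random.default_rng(2)
x = rng.standard_normal(200_000)        # unit-variance white noise
y = np.roll(x, -3)                      # y(n) = x(n+3): r_XY peaks at kappa = 3

print(abs(cross_somf(x, y, 3) - 1.0) < 0.05)   # True: peak ~ 1 at kappa = 3
print(abs(cross_somf(x, y, 0)) < 0.05)         # True: ~ 0 elsewhere
```

For jointly ergodic processes this single-realization average converges to the ensemble cross-SOMF of Equation (7.4).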
The covariance function for two processes X(n) and Y (n) is defined by

    c_XY(n + κ, n) = E[(X(n + κ) − E[X(n + κ)]) (Y(n) − E[Y(n)])]

or equivalently

    c_XY(n + κ, n) = r_XY(n + κ, n) − E[X(n + κ)] E[Y(n)] .

For processes that are at least jointly wide-sense stationary, we have c_XY(n + κ, n) = c_XY(κ)
and

    c_XY(κ) = r_XY(κ) − E[X(n)] E[Y(n)] .
The variance of a random process is obtained from c_XX(κ) by setting κ = 0. If the process is
stationary,

    σ²_X = E[(X(n) − E[X(n)])²] = r_XX(0) − (E[X(n)])² = r_XX(0) − µ²_X .
For two random processes, if cY X (n, n + κ) = 0, then they are called uncorrelated, which is
equivalent to rXY (n + κ, n) = E [X(n + κ)] E [Y (n)].
A complex process Z(n) = X(n) + jY(n) is stationary if X(n) and Y (n) are jointly stationary, i.e.
fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 , . . . , n2N )
= fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 + n0 , . . . , n2N + n0 )
for all x1 , . . . , xN , y1 , . . . , yN and all n1 , . . . , n2N , n0 and N . The mean of a complex random
process is defined as:
E [Z(n)] = E [X(n)] + jE [Y (n)]
Definition 7.3.9 The complex random process is stationary in the wide sense if E [Z(n)] is a
complex constant and rZZ (n1 , n2 ) = rZZ (κ) where κ = |n2 − n1 |.
The SOMF of a wide-sense stationary complex process has the symmetry property:
rZZ (−κ) = rZZ (κ)∗
as it can be easily seen from the definition.
Definition 7.3.10 The cross-SOMF for two complex random processes Z1 (n) and Z2 (n) is:
rZ1 Z2 (n1 , n2 ) = E [Z1 (n1 )Z2 (n2 )∗ ]
If the complex random processes are jointly wide-sense stationary, the cross-SOMF becomes:
rZ1 Z2 (n1 , n2 ) = rZ1 Z2 (κ) where κ = |n2 − n1 |
The covariance function of a complex random process is defined as
cZZ (n + κ, n) = E [(Z(n + κ) − E [Z(n + κ)]) (Z(n) − E [Z(n)])∗ ]
and is a function of κ only if Z(n) is wide-sense stationary. The cross-covariance function of
Z1 (n) and Z2 (n) is given as:
cZ1 Z2 (n + κ, n) = E [(Z1 (n + κ) − E [Z1 (n + κ)]) (Z2 (n) − E [Z2 (n)])∗ ]
Z1 (n) and Z2 (n) are called uncorrelated if cZ1 Z2 (n + κ, n) = 0. They are orthogonal if rZ1 Z2 (n +
κ, n) = 0.
Consider the complex process Z(n) = Σ_{m=1}^{M} A_m e^{j(ω₀n + Φ_m)}. Here ω₀ is constant, A_m is a zero-mean random amplitude of the mth signal and Φ_m is its
random phase. We assume all variables Am and Φm for m = 1, 2, . . . , M to be statistically
independent and the random phases uniformly distributed on [0, 2π). Find the mean and SOMF.
Is the process wide sense stationary ?
    E[Z(n)] = Σ_{m=1}^{M} E[A_m] e^{jω₀n} E[e^{jΦ_m}]
            = Σ_{m=1}^{M} E[A_m] e^{jω₀n} {E[cos(Φ_m)] + jE[sin(Φ_m)]}
            = 0

Since

    E[e^{j(Φ_m − Φ_l)}] = 1 for m = l, and 0 for m ≠ l .

The latter holds because, for m ≠ l,

    E[e^{j(Φ_m − Φ_l)}] = E[e^{jΦ_m}] E[e^{−jΦ_l}] = 0 .

Thus only the m = l terms survive, and

    r_ZZ(n + κ, n) = Σ_{m=1}^{M} E[A_m²] e^{jω₀κ} = e^{jω₀κ} Σ_{m=1}^{M} E[A_m²] ,

which depends only on κ; together with the constant mean, the process is wide-sense stationary.
where Parseval’s theorem has been used to obtain the second integral. By dividing the expression
EN by N we obtain the average power PN in xN (n) over the interval [−M, M ]:
    P_N = (1/(2M+1)) Σ_{n=−M}^{M} x_N(n)² = ∫_{−π}^{π} (|X_N(e^{jω})|² / (2M+1)) dω/2π     (7.5)
From Equation (7.5), we see that the average power PN , in the truncated ensemble member
xN (n), can be obtained by the integration of |XN (ejω )|2 /N . To obtain the average power for the
entire sample function, we must consider the limit as M → ∞. PN provides a measure of power
for a single sample function. If we wish to determine the average power of the random process,
we must also take the average (expected value) over the entire ensemble of sample functions.
The average power of the process X(n, ζ) is defined as
    P_XX = lim_{M→∞} (1/(2M+1)) Σ_{n=−M}^{M} E[X(n, ζ)²] = lim_{M→∞} ∫_{−π}^{π} (E[|X_N(e^{jω}, ζ)|²] / (2M+1)) dω/2π
Remark 1: The average power PXX in a random process X(n) is given by the time average
of its second moment.
Remark 2: If X(n) is wide-sense stationary, E[X(n)²] is a constant and P_XX = E[X(n)²].
P_XX can be obtained by a frequency-domain integration. If we define the power spectral density
(PSD) of X(n) by

    S_XX(e^{jω}, ζ) = lim_{M→∞} E[|X_N(e^{jω}, ζ)|²] / (2M+1)

where

    X_N(e^{jω}, ζ) = Σ_{n=−M}^{M} X(n, ζ) e^{−jωn} ,

then the normalised power is given by

    P_XX = ∫_{−π}^{π} S_XX(e^{jω}) dω/2π = r_XX(0)
Example 7.4.1 Consider the random process
X(n) = A · cos(ω0 n + Φ)
where A and ω0 are real constants and Φ is a random variable uniformly distributed on [0, π/2).
We shall find P_XX for X(n) using

    P_XX = lim_{M→∞} (1/(2M+1)) Σ_{n=−M}^{M} E[X(n)²] .

Using

    P_XX = lim_{M→∞} ∫_{−π}^{π} (E[|X_N(e^{jω})|²] / (2M+1)) dω/2π

yields P_XX = A²/2, which agrees with the above; this should be shown by the reader as an exercise.
The results derived for the PSD can be obtained similarly for two processes X(n) and Y (n),
say. The cross-power density spectrum is then given by
    S_XY(e^{jω}) = lim_{M→∞} E[X_N(e^{jω}) Y_N(e^{jω})*] / (2M+1)

and thus the total average cross power is

    P_XY = ∫_{−π}^{π} S_XY(e^{jω}) dω/2π
and conversely,

    r_XX(κ) = F⁻¹{S_XX(e^{jω})} = ∫_{−π}^{π} S_XX(e^{jω}) e^{jωκ} dω/2π
Consequence: there are two distinctly different methods that may be used to evaluate the
PSD of a random process:

1. The PSD is obtained directly from its definition, as the limit of E[|X_N(e^{jω})|²]/(2M+1).

2. The PSD is obtained by evaluating the Fourier transform of r_XX(κ), where r_XX(κ) has
to be obtained first.
Example 7.4.2
X(n) = A sin(ω0 n + Φ)
where A, ω₀ are constants and Φ is uniformly distributed on [0, 2π). The mean of X(n) is given
by

    E[X(n)] = ∫_0^{2π} A sin(ω₀ n + ϕ) (1/2π) dϕ = 0

and its SOMF is

    r_XX(κ) = E[X(n + κ)X(n)] = (A²/2) cos(ω₀ κ) .

By taking the Fourier transform of r_XX(κ) we obtain the PSD

    S_XX(e^{jω}) = F{r_XX(κ)} = π (A²/2) {η(ω − ω₀) + η(ω + ω₀)}

where

    η(ω) = Σ_{k=−∞}^{∞} δ(ω + 2πk) .
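The SOMF r_XX(κ) = (A²/2) cos(ω₀κ) of a random-phase sinusoid can be verified by ensemble averaging over realizations of Φ. A sketch assuming NumPy; amplitude, frequency, the fixed time index n0, and the tested lags are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(3)
A, w0, n0 = 1.5, 0.8, 7                        # arbitrary constants
phi = rng.uniform(0, 2*np.pi, size=500_000)    # realizations of Phi

for kappa in (0, 1, 4):
    # ensemble estimate of r_XX(kappa) = E[X(n0+kappa) X(n0)]
    r_hat = np.mean(A*np.cos(w0*(n0 + kappa) + phi) * A*np.cos(w0*n0 + phi))
    assert abs(r_hat - (A**2/2)*np.cos(w0*kappa)) < 0.01
print("r_XX(kappa) matches (A^2/2) cos(w0 kappa)")
```

The result is independent of the chosen time index n0, consistent with wide-sense stationarity.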
Example 7.4.3 Suppose that the random variables Ai are uncorrelated with zero mean and
variance σi2 , i = 1, . . . , P . Define
    Z(n) = Σ_{i=1}^{P} A_i exp{jω_i n} .

It follows that

    r_ZZ(κ) = Σ_{i=1}^{P} σ_i² exp{jω_i κ}
and
    S_ZZ(e^{jω}) = 2π Σ_{i=1}^{P} σ_i² η(ω − ω_i)
Example 7.4.4 Suppose now that the random variables Ai and Bi , i = 1, . . . , P are pairwise
uncorrelated with zero mean and common variance
    E[A_i²] = E[B_i²] = σ_i² ,   i = 1, . . . , P.
Define
    X(n) = Σ_{i=1}^{P} [A_i cos(ω_i n) + B_i sin(ω_i n)] .

We find

    r_XX(κ) = Σ_{i=1}^{P} σ_i² cos(ω_i κ)

and therefore

    S_XX(e^{jω}) = π Σ_{i=1}^{P} σ_i² [η(ω − ω_i) + η(ω + ω_i)]
Definition 7.4.1 A random process X(n) is said to be a “white noise process” if the PSD is
constant over all frequencies:
    S_XX(e^{jω}) = N₀/2

where N₀ is a positive constant.
The SOMF for a white process is obtained by taking the inverse Fourier transform of S_XX(e^{jω}):

    r_XX(κ) = (N₀/2) δ(κ)

where δ(κ) is the Kronecker delta function given by

    δ(κ) = 1 for κ = 0, and 0 for κ ≠ 0 ,
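The Kronecker-delta shape of the white-noise SOMF can be checked on simulated data. A sketch assuming NumPy; N₀ = 2 and the sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(4)
N0 = 2.0
x = rng.normal(0.0, np.sqrt(N0/2), size=300_000)   # white, variance N0/2

def r_hat(x, kappa):
    """Time-average estimate of r_XX(kappa)."""
    return np.mean(x[kappa:] * x[:len(x) - kappa]) if kappa else np.mean(x*x)

print(round(r_hat(x, 0), 1))                                # 1.0  (= N0/2)
print(abs(r_hat(x, 1)) < 0.01 and abs(r_hat(x, 5)) < 0.01)  # True: ~ 0 off-zero
```

Only the zero lag carries power; all other lags average out, matching r_XX(κ) = (N₀/2) δ(κ).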
and

    Y_N(e^{jω}) = Σ_{n=−M}^{M} y_N(n) e^{−jωn} ,
provided −M ≤ n2 + κ ≤ M so that the delta function is in the range of the summation. This
condition can be relaxed in the limit as M → ∞. The above result shows that the time average
of the SOMF and the PSD form a Fourier transform pair. Clearly, if X(n) and Y (n) are jointly
stationary then this result also implies
    S_XY(e^{jω}) = Σ_{κ=−∞}^{∞} r_XY(κ) e^{−jωκ}
Definition 7.4.2 For two jointly stationary processes X(n) and Y (n) the cross-power spectral
density is given by
    S_XY(e^{jω}) = F{r_XY(κ)} = Σ_{κ=−∞}^{∞} r_XY(κ) e^{−jωκ}
Properties:
1. SXY (ejω ) = SY X (ejω )∗ . In addition, if X(n) and Y (n) are real, then SXY (ejω ) =
SY X (e−jω ).
2. Re SXY (ejω ) and Re SY X (ejω ) are even functions of ω if X(n) and Y (n) are real.
3. Im SXY (ejω ) and Im SY X (ejω ) are odd functions of ω if X(n) and Y (n) are real.
Example 7.4.5 Let the SOMF function of two processes X(n) and Y (n) be
    r_XY(n + κ, n) = (AB/2) [sin(ω₀ κ) + cos(ω₀ [2n + κ])]

where A, B and ω₀ are constants. We find

    lim_{M→∞} (1/(2M+1)) Σ_{n=−M}^{M} r_XY(n + κ, n)
        = (AB/2) sin(ω₀ κ) + (AB/2) lim_{M→∞} (1/(2M+1)) Σ_{n=−M}^{M} cos(ω₀ [2n + κ])
        = (AB/2) sin(ω₀ κ) ,

since the second limit is zero.
3. Convergence in Probability
X_k converges in probability to X if, for all ε > 0,

    lim_{k→∞} P(|X_k − X| > ε) = 0
4. Convergence in Distribution
Xk converges in distribution to X (or Xk is asymptotically FX distributed) if
lim FXk (x) = FX (x)
k→∞
for all continuous points of FX .
The convergences 1 to 4 are not equally weighted.
i) Convergence with probability 1 implies convergence in probability.
ii) Convergence with probability 1 implies convergence in the MSS, provided second order
moments exist.
iii) Convergence in the MSS implies convergence in probability.
iv) Convergence in probability implies convergence in distribution.
With the concepts of convergence one may define continuity, differentiability and so on.
If X(n) is stationary and E[|X(n)|] < ∞, then Y (n) is stationary. If X(n) is white noise,
i.e. E[X(n)] = 0 and E[X(n)X(k)] = σ²_X δ(n − k), with σ²_X the power of the noise and δ(n) the
Kronecker delta function

    δ(n) = 1 for n = 0, and 0 otherwise,

Y (n) is called a linear process. If we do not require the filter to be stable but ∫ |H(e^{jω})|² dω < ∞,
where

    H(e^{jω}) = Σ_n h(n) e^{−jωn}

is the transfer function of the system, and if for X(n), Σ |c_XX(n)| < ∞, then

    Y(n) = Σ_k h(k) X(n − k) = Σ_k h(n − k) X(k)

converges in the MSS for N, M → ∞ and Y (n) is stationary in the wide sense with

    µ_Y ≜ E[Y(n)] = Σ_k h(k) E[X(n − k)] = µ_X H(e^{j0})
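The relation µ_Y = µ_X H(e^{j0}) = µ_X Σ_k h(k) is easy to confirm numerically for an FIR system. A sketch assuming NumPy; the impulse response, input mean, and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(5)
h = np.array([0.5, 0.3, 0.2, -0.1])       # a stable FIR impulse response
mu_X = 2.0
x = mu_X + rng.standard_normal(200_000)   # stationary input, mean mu_X

y  = np.convolve(x, h, mode="valid")      # Y(n) = sum_k h(k) X(n-k)
H0 = h.sum()                              # H(e^{j0}) = sum_k h(k)

print(abs(y.mean() - mu_X * H0) < 0.01)   # True: mu_Y = mu_X H(e^{j0})
```

The DC gain of the filter scales the input mean, exactly as the formula states.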
As an example, consider the ideal Hilbert transformer

    H(e^{jω}) = −j for 0 < ω < π,   0 for ω = 0, ±π,   j for −π < ω < 0,

with impulse response

    h(n) = 0 for n = 0,   (1 − cos(πn))/(πn) otherwise.

The filter is not stable, but ∫ |H(e^{jω})|² dω < ∞. Y (n) is stationary in the wide sense. For
example, if X(n) = a cos(ω₀ n + φ), then Y (n) = a sin(ω₀ n + φ).
where X̃(n) , X(n) − µX . In the general case, the system impulse response and the input
process may be complex.
The covariance function of the output is determined according to
    c_YY(κ) = E[Ỹ(n + κ) Ỹ(n)*]
            = E[(Σ_k h(k) X̃(n + κ − k)) (Σ_l h(l) X̃(n − l))*]
            = Σ_k Σ_l h(k) h(l)* E[X̃(n + κ − k) X̃(n − l)*]
            = Σ_k Σ_l h(k) h(l)* c_XX(κ − k + l)
and thus,

    C_YY(e^{jω}) = |H(e^{jω})|² C_XX(e^{jω}) .     (7.7)
Example 7.6.3 Consider a system as depicted in Figure 7.9 where X(n) is a stationary, zero-
mean white noise process with variance σ²_X. The impulse response of the system is given by

    h(n) = [(1/3)(1/4)^n + (2/3)(−1/2)^n] u(n)

with transfer function

    H(e^{jω}) = F{h(n)} = (1/3)/(1 − (1/4)e^{−jω}) + (2/3)/(1 + (1/2)e^{−jω})
              = 1/(1 + (1/4)e^{−jω} − (1/8)e^{−j2ω}) .
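Given the expression H(e^{jω}) = 1/(1 + (1/4)e^{−jω} − (1/8)e^{−j2ω}), the impulse response follows from partial fractions as h(n) = [(1/3)(1/4)^n + (2/3)(−1/2)^n] u(n); this can be checked by running the corresponding difference equation. A sketch assuming NumPy:

```python
import numpy as np

# Impulse response of H(z) = 1 / (1 + (1/4) z^-1 - (1/8) z^-2), obtained by
# iterating the difference equation y(n) = x(n) - (1/4)y(n-1) + (1/8)y(n-2)
h = np.zeros(20)
for n in range(20):
    x = 1.0 if n == 0 else 0.0
    h[n] = x - 0.25*(h[n-1] if n >= 1 else 0.0) + 0.125*(h[n-2] if n >= 2 else 0.0)

# Closed form from the partial-fraction expansion
n = np.arange(20)
h_closed = (1/3)*(1/4)**n + (2/3)*(-1/2)**n

print(np.allclose(h, h_closed))   # True
```

The two pole terms 1/4 and −1/2 of the denominator appear directly as the geometric bases of the closed form.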
Example 7.6.4 Consider the system shown in Figure 7.11 where Z(n) is stationary, zero-mean
white noise with variance σZ2 . Let h(n) be the same system as discussed in Example 7.6.3 and
g(n) is given by

    g(n) = δ(n) + (1/2) δ(n − 1) − (1/2) δ(n − 2)

where δ(n) is the Kronecker delta function. We wish to find the cross-spectrum C_XY(e^{jω}).

    G(e^{jω}) = Σ_n g(n) e^{−jωn} = 1 + (1/2) e^{−jω} − (1/2) e^{−j2ω}
[Figure 7.11: white noise Z(n) drives two systems in parallel: h(n) produces X(n) and g(n) produces Y(n).]
Example 7.6.5 We consider again the system shown in Figure 7.11 where we wish to find the
cross spectrum CXY (ejω ). Instead of using Equation (7.9) with X1 (n) = X2 (n) = Z(n) as done
[Figure 7.13: Y(n) is passed through the inverse system 1/G(e^{jω}) to recover Z(n), which is then filtered by H(e^{jω}) to give X(n).]
in Example 7.6.4, we reformulate the problem as a cascade of two systems as shown in Figure
7.13.
Remembering that C_YY(e^{jω}) = |G(e^{jω})|² C_ZZ(e^{jω}) (from Figure 7.11 and Equation (7.7)) we
obtain

    C_XY(e^{jω}) = (H(e^{jω}) / G(e^{jω})) C_YY(e^{jω})
                 = (H(e^{jω}) / G(e^{jω})) |G(e^{jω})|² C_ZZ(e^{jω})
                 = H(e^{jω}) G(e^{jω})* C_ZZ(e^{jω})
    c_YX(κ) = E[Y(n + κ)X(n)]
            = E[(Σ_k h(k) X(n + κ − k) + V(n + κ)) X(n)]
            = Σ_k h(k) E[X(n + κ − k) X(n)]
            = Σ_k h(k) c_XX(κ − k)
Applying the Fourier transform to both sides of the above expression yields

    C_YX(e^{jω}) = H(e^{jω}) C_XX(e^{jω}) .
Note that when the additive noise at the output, V (n) is uncorrelated with the input process
X(n), it has no effect on the cross-spectrum of the input and output.
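This noise-insensitivity is easy to demonstrate: for white input, c_XX(κ) = δ(κ), so the cross-covariance c_YX(κ) = Σ_k h(k) c_XX(κ − k) reduces to h(κ) even when strong independent noise V(n) corrupts the output. A sketch assuming NumPy; the impulse response, noise level, and sample size are arbitrary choices:

```python
import numpy as np

rng = np.random.default_rng(6)
h = np.array([1.0, 0.5, 0.25])                 # known impulse response
x = rng.standard_normal(400_000)               # white input, sigma^2 = 1
v = 0.5 * rng.standard_normal(400_000)         # output noise, independent of x

y = np.convolve(x, h, mode="full")[:len(x)] + v

# For white input c_XX(kappa) = delta(kappa), so c_YX(kappa) = h(kappa);
# the additive noise V(n) drops out of the cross-covariance.
for kappa in range(3):
    c_hat = np.mean(y[kappa:] * x[:len(x) - kappa])
    assert abs(c_hat - h[kappa]) < 0.02
print("c_YX(kappa) = h(kappa), unaffected by V(n)")
```

This is the basis of cross-correlation system identification: the impulse response can be read off directly from c_YX despite output noise.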
Consider the process

    X(n) = Σ_{k=1}^{K} A_k e^{j(ω₀ n + φ_k)} ,

where ω₀ is a constant and A_k is a zero-mean random amplitude of the kth signal with variance
σ_k², k = 1, 2, . . . , K. The phase φ_k is uniformly distributed on [−π, π). The random variables
A_k and φ_k are statistically pairwise independent for all k = 1, . . . , K.
X(n) is the input to a linear, stable time-invariant system with known impulse response
h(n). The output of the system is buried in zero-mean noise V (n) of known covariance c_VV(n),
independent of X(n) (refer to Figure 7.14). We wish to determine the cross-spectrum C_YX(e^{jω})
and the auto-spectrum C_YY(e^{jω}).
We first calculate the covariance function of the input process X(n):

    E[X(n)] = E[Σ_{k=1}^{K} A_k e^{j(ω₀ n + φ_k)}] = Σ_{k=1}^{K} E[A_k] E[e^{j(ω₀ n + φ_k)}] = 0 .

Since E[e^{j(φ_k − φ_l)}] = 1 for k = l and 0 for k ≠ l,

    c_XX(n + κ, n) = Σ_{k=1}^{K} E[A_k²] e^{jω₀ κ} = e^{jω₀ κ} Σ_{k=1}^{K} σ_k² = c_XX(κ)
Let X(0), . . . , X(N − 1) be a model for the observations x(0), . . . , x(N − 1) of a stationary random
process X(n), n = 0, ±1, ±2, . . . . The function

    X_N(e^{jω}) = Σ_{n=0}^{N−1} X(n) e^{−jωn} ,   −∞ < ω < ∞ ,

is called the finite Fourier transform of the data.
Properties
The finite Fourier transform has some properties which are listed below.
1. XN (ej(ω+2π) ) = XN (ejω )
3. The finite Fourier transform of aX(n) + bY (n) is aXN (ejω ) + bYN (ejω ), where a and b are
constants.
4. If y(n) = Σ_m h(m) x(n − m), Σ_n (1 + |n|)|h(n)| < ∞ and |X(n)| < M , then there exists
a constant K > 0 such that |Y_N(e^{jω}) − H(e^{jω}) X_N(e^{jω})| < K ∀ ω.

5. X(n) = (1/2π) ∫_{−π}^{π} X_N(e^{jω}) e^{jωn} dω, n = 0, 1, . . . , N − 1, and 0 otherwise.
given that µ = E[(X(0), . . . , X(N − 1))^T], Σ = (Cov[X_i, X_k])_{i,k=0,...,N−1} = E[(X − µ)(X − µ)^T]
and |Σ| > 0.
Complex normal distribution: A vector X is called complex normally distributed with mean µ
and covariance Σ (X ∼ N^C_N(µ, Σ)) if the 2N-vector (ℜX^T, ℑX^T)^T of real-valued random
variables is normally distributed.
where k is an integer.
(b) For pairwise distinct frequencies ωi , i = 1, . . . , M with 0 ≤ ωi ≤ π, the XN (ejωi ) are
asymptotically independent variables. This is also valid when the ωi are approximated
by 2πki /N → ωi as N → ∞.
This states that the discrete Fourier transformation provides N independent variables
at frequencies ωi which are normally distributed with mean 0 or N µX and variance
N CXX (ejωi ).
2. If we use a window w(t), t ∈ R, with w(t) = 0 for t < 0 and t > 1, that is bounded and
satisfies ∫_0^1 w(t)² dt < ∞, then for ω_i, i = 0, . . . , N − 1,

    X_{W,N}(e^{jω_i}) = Σ_{n=0}^{N−1} w(n/N) X(n) e^{−jω_i n}
3. If N = M L and X(0), . . . , X(N − 1) are divided into L segments each of length M , then

    X_{W,M}(e^{jω}, l) = Σ_{m=0}^{M−1} w(m/M) X(lM + m) e^{−jω(lM + m)}

This states that finite Fourier transforms of data segments of size M each at ω_i are inde-
pendently normally distributed random variables with variance M · C_XX(e^{jω_i}) ∫_0^1 w(t)² dt
as N → ∞
Interpretation
To 1.(a): Take a broad-band noise with a normally distributed amplitude and pass it through an
ideal narrow band-pass filter,

    H(e^{jω}) = 1 for |ω ± ω₀| < π/N, and 0 else.

[Figure: the band-pass transfer function H(e^{jω}), with passbands of width 2π/N around ±ω₀.]
If X(n) is filtered, then Y(n) = Σ_m h(m) X(n − m). For large N

    Y(n) ≈ (1/N) X_N(2πs/N) e^{j2πsn/N} + (1/N) X_N(−2πs/N) e^{−j2πsn/N}

where 2πs/N ≈ ω₀; also Y(n) is approximately N₁(0, (1/N) C_XX(e^{jω₀})) distributed.
To 1.(b):

[Figure: two ideal band-pass filters H₀(e^{jω}) around ±ω₀ and H₁(e^{jω}) around ±ω₁ with non-overlapping frequency ranges, producing outputs Y(n) and U(n).]
Chapter 9
Spectrum Estimation
Y(n) = X(n)²

then

    E[Y(n)] = E[X(n)²] = c_XX(0) = ∫_{−π}^{π} C_XX(e^{jω}) dω/2π
where H_{B0} will be specified later and ω₀ is the center frequency of the filter. The low-pass filter
is an FIR filter with impulse response

    h_l(n) = 1/N for n = 0, 1, 2, . . . , N − 1, and 0 otherwise,
[Figure 9.1: X(n) is passed through a band-pass filter producing V(n); after squaring, U(n) is passed through a low-pass filter to give Y(n).]
[Figure 9.2: transfer function H_B(ω) of an ideal band-pass filter with passband height H_{B0}.]

The low-pass filter has the transfer function

    H_L(e^{jω}) = (sin(ωN/2) / (N sin(ω/2))) e^{−jω(N−1)/2}
To measure CXX (ejω ) in a frequency band, we would need a filter bank with a cascade of
squarers and integrators. In acoustics, one uses such a circuit, where the center frequencies are
taken to be ω_{k+1} = 2^{1/3} ω_k, k = 1, . . . , K − 1 (third-octave spacing). This is the principle of a spectral analyser,
especially when it is used for analysing analog signals. Such a spectral analyser yields the
spectrum at frequencies ω_1, . . . , ω_K simultaneously. In measurement technology, one uses simpler
systems that work with a band-pass filter having a variable center frequency. The signal at the system
output gives an estimate of the spectrum in a certain frequency band. The accuracy of the
measurement depends on how slowly the center frequency is changed and on the smoothness of
the spectrum of X(n).
Chapter 10
Non-Parametric Spectrum
Estimation
Because it is not possible to create an ideal estimator, we try to reach this goal asymptotically
as the number N of observed samples X(n), n = 0, 1, . . . , N −1 tends to infinity. We require that
ĈXX (ejω ) is a consistent estimator for CXX (ejω ), i.e., ĈXX (ejω ) → CXX (ejω ) in probability as
N → ∞. A sufficient condition for consistency is that the mean square error (MSE) of ĈXX (ejω )
converges to zero as N → ∞ (see 7.6.1), i.e.,
    MSE[Ĉ_XX(e^{jω})] = E[(Ĉ_XX(e^{jω}) − C_XX(e^{jω}))²] → 0   as N → ∞ .
Equivalently, we may say that a sufficient condition for ĈXX (ejω ) to be consistent is that
    Var[Ĉ_XX(e^{jω})] → 0   as N → ∞

and

    Bias[Ĉ_XX(e^{jω})] = E[Ĉ_XX(e^{jω})] − C_XX(e^{jω}) → 0   as N → ∞ ,

because

    MSE[Ĉ_XX(e^{jω})] = E[(Ĉ_XX(e^{jω}) − C_XX(e^{jω}))²]
        = E[Ĉ_XX(e^{jω})²] − 2 E[Ĉ_XX(e^{jω})] C_XX(e^{jω}) + C_XX(e^{jω})²
          − (E[Ĉ_XX(e^{jω})])² + (E[Ĉ_XX(e^{jω})])²
        = E[Ĉ_XX(e^{jω})²] − (E[Ĉ_XX(e^{jω})])² + (E[Ĉ_XX(e^{jω})] − C_XX(e^{jω}))²
        = Var[Ĉ_XX(e^{jω})] + Bias[Ĉ_XX(e^{jω})]² .
If E[Ĉ_XX(e^{jω})] ≠ C_XX(e^{jω}), the estimator has a bias which results in a systematic error of the
method. On the other hand, if Var[Ĉ_XX(e^{jω})] ≠ 0, we get a random error in the estimator, i.e.,
the estimator varies from data set to data set.
and bounded.
As seen in Chapter 8, the finite Fourier transform satisfies the following:
    X_N(e^{jω}) ∼ N^C₁(0, N·C_XX(e^{jω})) for ω ≠ kπ,
                 N₁(0, N·C_XX(e^{jπ})) for ω = (2k + 1)π,
                 N₁(N·µ_X, N·C_XX(e^{j0})) for ω = 2kπ,
which suggests the so-called periodogram I^N_XX(e^{jω}) as an estimator for the spectrum C_XX(e^{jω}),
i.e.,

    I^N_XX(e^{jω}) = (1/N) |X_N(e^{jω})|² = (1/N) |Σ_{n=0}^{N−1} X(n) e^{−jωn}|² .     (10.1)
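Evaluated at the Fourier frequencies ω_k = 2πk/N, the periodogram of Equation (10.1) is a one-liner with the FFT. A sketch assuming NumPy; the white-noise test signal and length are arbitrary choices:

```python
import numpy as np

def periodogram(x):
    """I_XX^N at the Fourier frequencies omega_k = 2*pi*k/N, per (10.1)."""
    N = len(x)
    return np.abs(np.fft.fft(x))**2 / N

rng = np.random.default_rng(7)
x = rng.standard_normal(4096)      # white Gaussian noise, C_XX(e^{jw}) = 1
I = periodogram(x)

# Individual ordinates fluctuate strongly (roughly chi^2_2-distributed),
# but their average over frequency matches the flat spectrum:
print(abs(I.mean() - 1.0) < 0.1)   # True
```

The strong ordinate-to-ordinate fluctuation seen here is exactly the inconsistency problem discussed below.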
    E[I^N_XX(e^{jω})] = (1/N) E[X_N(e^{jω}) X_N(e^{jω})*]                                        (10.2)
        = (1/N) E[Σ_{n=0}^{N−1} X(n) e^{−jωn} Σ_{m=0}^{N−1} X(m)* e^{jωm}]
        = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} E[X(n)X(m)*] e^{−jω(n−m)}
        = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} (c_XX(n − m) + µ_X²) e^{−jω(n−m)}
        = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} c_XX(n − m) e^{−jω(n−m)} + (1/N) |∆_N(e^{jω})|² µ_X²
        = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} e^{−jω(n−m)} ∫_{−π}^{π} C_XX(e^{jλ}) e^{jλ(n−m)} dλ/2π + (1/N) |∆_N(e^{jω})|² µ_X²
        = (1/N) ∫_{−π}^{π} |∆_N(e^{j(ω−λ)})|² C_XX(e^{jλ}) dλ/2π + (1/N) |∆_N(e^{jω})|² µ_X²
where

    ∆_N(e^{jω}) = Σ_{n=0}^{N−1} e^{−jωn} = e^{−jω(N−1)/2} · sin(ωN/2) / sin(ω/2)
is the Fourier transform of a rectangular window. Thus, for a zero-mean stationary process
X(n), the expected value of the periodogram, E[I^N_XX(e^{jω})], is equal to the spectrum
C_XX(e^{jω}) convolved with the magnitude squared of the Fourier transform of the rectangular win-
dow, |∆_N(e^{jω})|². This suggests that lim_{N→∞} E[I^N_XX(e^{jω})] = C_XX(e^{jω}) + µ_X² · 2π Σ_k δ(ω + 2πk),
because lim_{N→∞} (1/N)|∆_N(e^{jω})|² = 2π Σ_k δ(ω + 2πk), as illustrated in Fig. 10.1.
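The limiting behaviour of (1/N)|∆_N(e^{jω})|² can be checked numerically: its peak at ω = 0 grows like N while its integral over dω/2π stays 1, which is the behaviour of an approximate (scaled) delta. A sketch assuming NumPy; the grid resolution and the values of N are arbitrary choices:

```python
import numpy as np

def fejer(omega, N):
    """(1/N) |Delta_N(e^{j omega})|^2, evaluated via the explicit sum."""
    n = np.arange(N)
    d = np.exp(-1j * np.outer(omega, n)).sum(axis=1)
    return np.abs(d)**2 / N

w = np.linspace(-np.pi, np.pi, 20001)
dw = w[1] - w[0]
for N in (5, 10, 20):
    k = fejer(w, N)
    peak = k.max()                       # equals N at omega = 0
    area = k.sum() * dw / (2*np.pi)      # integral over d omega / 2 pi
    print(round(peak), round(area, 2))   # N and 1.0
```

As N grows the mass concentrates at ω = 0 (mod 2π) with constant total area, as illustrated in Fig. 10.1.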
In (10.2) we calculated the mean of the periodogram of a rectangular windowed data sequence
X(n), n = 0, . . . , N −1. Advantages of using different window types are stated in the filter design
theory part (chapter 4). We now consider periodograms of sequences X(n), n = 0, . . . , N − 1,
multiplied by a real-valued window (also called a taper) w_N(n) of length N, having a Fourier transform
W_N(e^{jω}). Then,

    E[I^N_{WX,WX}(e^{jω})] = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} w_N(n) w_N(m) E[X(n)X(m)*] e^{−jω(n−m)}     (10.3)
        = (1/N) Σ_{n=0}^{N−1} Σ_{m=0}^{N−1} w_N(n) w_N(m) c_XX(n − m) e^{−jω(n−m)} + (1/N) |W_N(e^{jω})|² µ_X²
        = (1/N) ∫_{−π}^{π} |W_N(e^{j(ω−λ)})|² C_XX(e^{jλ}) dλ/2π + (1/N) |W_N(e^{jω})|² µ_X²
[Figure 10.1: The window function (1/N)|∆_N(e^{jω})|² for N = 20, N = 10 and N = 5.]
Comparing this result with the one for rectangular windowed data, we observe the following:
if C_XX(e^{jα}) has a significant peak at an α in the neighborhood of ω, the expected values of
I^N_XX(e^{jω}) and I^N_{WX,WX}(e^{jω}) can differ substantially from C_XX(e^{jω}). The advantage of employing
a taper is now apparent: it can be considered as a means of reducing the effect of neighboring
peaks.
If the process X(n) is non-white and non-Gaussian the covariance function can be found in
[Brillinger 1981] as

    Cov[I^N_XX(e^{jω}), I^N_XX(e^{jλ})] ≈ C_XX(e^{jω})² · (1/N²) [|∆_N(e^{j(ω+λ)})|² + |∆_N(e^{j(ω−λ)})|²] ,     (10.4)

which for λ = ω reduces to the variance of the periodogram

    Var[I^N_XX(e^{jω})] ≈ C_XX(e^{jω})² · (1/N²) [|∆_N(e^{j2ω})|² + N²] .

This suggests that no matter how large N is, the variance of I^N_XX(e^{jω}) will tend to remain at the
level C_XX(e^{jω})². Because the variance does not tend to zero for N → ∞, according to Section
10.1, the periodogram is not a consistent estimator. If an estimator with a variance smaller than
this is desired, it cannot be obtained by simply increasing the sample length N .
The properties of the finite Fourier transform suggest that periodogram ordinates asymp-
totically (for large N ) are independent random variates, which explains the irregularity of the
periodogram in Figure 10.2. From the asymptotic distribution of the finite Fourier transform,
[Figure 10.2: The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted
line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.]
If we assume that the covariance function c_XX(k) of the process X(n) is finite or can be
viewed as finite with a small error, and that the length of c_XX(k) is much smaller than the length
N of the data stretch, then we can divide the process X(n) of length N into L parts of
length M each, where M ≪ N . With this assumption we can calculate L nearly independent
periodograms I^M_XX(e^{jω}, l), l = 0, . . . , L − 1, of the sequences

    X_l(n) = X(n + l · M) ,   n = 0, . . . , M − 1 .

This method of estimating the spectral density function C_XX(e^{jω}) is called the Bartlett method,
after its introduction by Bartlett in 1953.
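Bartlett's segment-averaging takes only a few lines. A sketch assuming NumPy; the segment count L = 16 and the white-noise test signal are arbitrary choices:

```python
import numpy as np

def bartlett_psd(x, L):
    """Average the periodograms of L non-overlapping length-M segments."""
    M = len(x) // L
    segs = x[:L*M].reshape(L, M)
    return np.mean(np.abs(np.fft.fft(segs, axis=1))**2 / M, axis=0)

rng = np.random.default_rng(8)
x = rng.standard_normal(8192)              # white noise, true spectrum = 1

raw  = np.abs(np.fft.fft(x))**2 / len(x)   # single periodogram
bart = bartlett_psd(x, L=16)

# Averaging L nearly independent periodograms cuts the variance roughly by L
print(bart.var() < raw.var() / 4)          # True
```

The price for the variance reduction is coarser frequency resolution, since each segment is only M = N/L samples long.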
which is the same as the mean of the periodogram of the process X(n) for n = 0, . . . , M − 1.
From Section 10.2.1, it can be seen that Bartlett’s estimator is an unbiased estimator of the
spectrum for M → ∞ because the periodogram is asymptotically unbiased.
To check the performance of the estimator we also have to derive the variance of the estimator.
[Figure 10.3: The periodogram as an estimate of the spectrum using 4 non-overlapping segments.
The abscissa displays the frequency ω/2π while the ordinate shows the estimate.]
with an increasing L.
With this result and according to the distribution of the periodogram, Bartlett's estimator
is asymptotically

    Ĉ^B_XX(e^{jω}) ∼ (C_XX(e^{jω})/(2L)) χ²_{2L} for ω ≠ πk,   (C_XX(e^{jω})/L) χ²_L for ω = πk,   k ∈ Z,

distributed. This is due to the fact that the sum of L independent χ²₂ distributed variables is
χ²_{2L} distributed.
    X_l(n) = X(n + l · D) ,

where l · D is the starting point of the lth sequence. Accordingly, the periodograms are
defined similarly to Equation (10.5), except for a multiplication by the window w_M(n), i.e.,

    I^M_{WX,WX}(e^{jω}, l) = (1/M) |Σ_{n=0}^{M−1} X_l(n) · w_M(n) · e^{−jωn}|² .
To calculate the normalization factor A we have to derive the mean of the estimator:

    E[Ĉ^W_XX(e^{jω})] = E[(1/(L·A)) Σ_{l=0}^{L−1} I^M_{WX,WX}(e^{jω}, l)]
                      = (1/(L·A)) Σ_{l=0}^{L−1} E[I^M_{WX,WX}(e^{jω}, l)]
                      = (1/A) E[I^M_{WX,WX}(e^{jω})]

From (10.3) we get

    E[Ĉ^W_XX(e^{jω})] = (1/(A·M)) · (1/2π) ∫_{−π}^{π} |W_M(e^{j(ω−λ)})|² C_XX(e^{jλ}) dλ + µ_X² · (1/(A·M)) |W_M(e^{jω})|² .
For M → ∞ the variance of the Welch method for non-overlapping data segments with
D = M is asymptotically the same as for the Bartlett method (unwindowed data):

    Var[Ĉ^W_XX(e^{jω})] ≈ Var[Ĉ^B_XX(e^{jω})] ≈ (1/L) C_XX(e^{jω})²   (M → ∞, D = M).

In the case of 50% overlap (D = M/2) Welch calculated the asymptotic variance of the estimator
to be

    Var[Ĉ^W_XX(e^{jω})] ≈ (9/8) · (1/L) · C_XX(e^{jω})²   (M → ∞, D = M/2),
which, for the same number of segments L, is higher than the variance of Bartlett's estimator. The advantage of using Welch's
estimator with overlap is that one can use more segments for the same amount of data. This
reduces the variance of the estimate while keeping a bias comparable to that of
the Bartlett estimator. For example, a 50% overlap doubles the number of
segments and thus reduces the variance by roughly a factor of 2. Additionally, the shape of the window
only affects the bias of the estimate, not the variance. So even for non-overlapping segments,
the selection of suitable window functions can result in an estimator of reduced bias. These
advantages make Welch's estimator one of the most used non-parametric spectrum estimators.
The method is also implemented in the "pwelch" function of MATLAB.
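In Python, the same method is available as scipy.signal.welch. A minimal usage sketch, assuming SciPy is installed; the segment length, window, and white-noise test signal are arbitrary choices:

```python
import numpy as np
from scipy.signal import welch

rng = np.random.default_rng(9)
x = rng.standard_normal(16384)    # unit-variance white noise

# Hann window, segments of 256 samples, 50% overlap as in Welch's method
f, Pxx = welch(x, fs=1.0, window="hann", nperseg=256, noverlap=128)

# scipy returns a one-sided density, so a flat unit-variance spectrum
# appears as Pxx ~ 2 across the band (total power ~ 1)
print(abs(Pxx[1:-1].mean() - 2.0) < 0.2)   # True
```

Note the one-sided scaling convention: SciPy folds the power at negative frequencies into the positive half, which doubles the density relative to the two-sided spectrum used in this chapter.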
frequencies ω_k = 2πk/N , k ∈ Z, in the neighborhood of a value of ω at which we wish to estimate
C_XX(e^{jω}). This motivates the estimate

    Ĉ^S_XX(e^{jω}) = (1/(2m+1)) Σ_{k=−m}^{m} I^N_XX(e^{j(2π/N)[k(ω)+k]})

where (2π/N)·k(ω) is the nearest Fourier frequency to the ω at which the spectrum C_XX(e^{jω}) is to be estimated.
Given the above property, we can expect to reduce the variance of the estimation by averag-
ing over adjacent periodogram ordinates. While for ω ≠ kπ, k ∈ Z, I^N_XX(e^{jω}) is
(C_XX(e^{jω})/2) χ²₂ distributed, we note that I^N_XX(e^{jω}) for ω = πk is not (C_XX(e^{jω})/2) χ²₂
but C_XX(e^{jω}) χ²₁ distributed. Therefore we exclude these values in the smoothing, which means

    k(ω) + k ≠ l · N,   l ∈ Z.
Because the spectrum C_XX(e^{jω}) of a real process X(n) is of even symmetry and periodic in 2π,
it is natural to calculate only values of the periodogram I^N_XX(e^{jω}) for 0 ≤ ω < π. This results
in a corresponding estimate, which is the same as the estimate at ω = (2l + 1)π when N is even. For the estimation at
the frequency ω = (2l + 1)π when N is odd we use
    Ĉ^S_XX(e^{jω}) = 1/(2m) · Σ_{k=1}^{m} [ I^N_XX(e^{j(ω − π/N + 2kπ/N)}) + I^N_XX(e^{j(ω + π/N − 2kπ/N)}) ]

                   = 1/m · Σ_{k=1}^{m} I^N_XX(e^{j(ω − π/N + 2kπ/N)}),    ω = (2l + 1)π and N odd,
which implies that the variance of the estimator at the frequencies ω = lπ, l ∈ Z, will be higher
than at other frequencies. In the sequel we will only consider frequencies ω ≠ lπ; the treatment
of the frequencies ω = lπ is similar.
The smoothing method is nothing but a sampling of the periodogram at discrete frequencies with
2π/N spacing, followed by averaging. Thus, the method can easily be implemented using the FFT,
which makes it computationally efficient.
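A minimal sketch of this smoother (for brevity it ignores the special handling at ω = lπ and simply extends the ordinates periodically):

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Average 2m+1 adjacent periodogram ordinates (Daniell-type smoothing)."""
    N = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / N            # periodogram at omega_k = 2*pi*k/N
    kernel = np.full(2 * m + 1, 1.0 / (2 * m + 1))
    padded = np.concatenate([I[-m:], I, I[:m]])   # periodic extension of the ordinates
    return np.convolve(padded, kernel, mode='valid')

rng = np.random.default_rng(1)
x = rng.standard_normal(1024)
I_raw = np.abs(np.fft.fft(x)) ** 2 / len(x)       # raw periodogram for comparison
C_s = smoothed_periodogram(x, m=8)                # visibly less variable than I_raw
```

For white noise the smoothed estimate fluctuates far less around the true flat spectrum than the raw periodogram, as the χ² degrees-of-freedom argument above predicts.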
In the following calculation we will assume that 2πk(ω)/N = ω.

10.5. SMOOTHING THE PERIODOGRAM 129

Figure 10.4: Function A_m(e^{jω}) for N = 15 and m = 0, 1, 3, 5. The abscissa displays ω/π.

Comparing the formula for the mean of the periodogram (10.2) with the mean of the smoothed
periodogram (10.7) suggests that the bias of Ĉ^S_XX(e^{jω}) will most often be greater than the
bias of I^N_XX(e^{jω}), as the integral extends over a greater range. Both will be the same if the
spectrum C_XX(e^{jω}) is flat; however, Ĉ^S_XX(e^{jω}) is unbiased as N → ∞.
130 CHAPTER 10. NON-PARAMETRIC SPECTRUM ESTIMATION
Figure 10.5: The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to the
bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
With this result and the asymptotic behavior of the periodogram ordinates, we find the
asymptotic distribution of Ĉ^S_XX(e^{jω}) as

    Ĉ^S_XX(e^{jω}) ~ (C_XX(e^{jω})/(4m+2)) · χ²_{4m+2},    ω ≠ πk,
    Ĉ^S_XX(e^{jω}) ~ (C_XX(e^{jω})/(2m)) · χ²_{2m},        ω = πk,    k ∈ Z.
The smoothed periodogram weights all ordinates near ω equally, which for a nearly constant
spectrum is reasonable. However, if the spectrum C_XX(e^{jω}) varies, this method creates a larger
bias. In this case it is more appropriate to weight periodogram ordinates near ω more than those
further away. Analogously to Section 10.5, we construct the estimator as

    Ĉ^{SW}_XX(e^{jω}) = (1/A) · Σ_{k=−m}^{m} W_k · I^N_XX(e^{j(2π/N)[k(ω)+k]}),    k(ω)+k ≠ l·N, ω ≠ l·π, l ∈ Z,    (10.8)

where A is a constant to be chosen for an unbiased estimator. The case ω = lπ can easily be
derived as in Section 10.5. For further investigation of the estimator in (10.8) we will only
consider the case ω ≠ l·π, l ∈ Z.
where

    A_{SW}(e^{jω}) = (1 / Σ_{l=−m}^{m} W_l) · (1/N) · Σ_{k=−m}^{m} W_k |Δ_N(e^{j(ω + 2πk/N)})|².

This expression is very similar to the function A_m(e^{jω}) from Equation (10.7). Indeed, if we set
W_k = 1/(2m+1) for all k, it follows immediately that A_{SW}(e^{jω}) = A_m(e^{jω}), i.e. A_m(e^{jω})
is a special case of A_{SW}(e^{jω}). However, the weights W_k give us more control over the shape of
A_{SW}(e^{jω}), which also leads to better control of the bias of the estimator itself. The effect of
different weightings on the shape of A_{SW}(e^{jω}) can be seen in Figure 10.6.
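A sketch of this weighted smoother in the spirit of (10.8); the Hamming weights are an illustrative choice, A is taken as the sum of the weights so that a flat spectrum is estimated without bias, and the exclusion of the ordinates at multiples of N is again omitted for brevity:

```python
import numpy as np

def weighted_smoothed_periodogram(x, weights):
    """Weight nearby periodogram ordinates more heavily than distant ones."""
    N = len(x)
    I = np.abs(np.fft.fft(x)) ** 2 / N
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # divide by A = sum of W_k for unbiasedness
    m = (len(w) - 1) // 2
    padded = np.concatenate([I[-m:], I, I[:m]])   # periodic extension of the ordinates
    return np.convolve(padded, w, mode='valid')

rng = np.random.default_rng(2)
x = rng.standard_normal(512)
C_sw = weighted_smoothed_periodogram(x, np.hamming(9))   # m = 4, Hamming-shaped W_k
```

Swapping `np.hamming` for `np.bartlett` or `np.blackman` reproduces the different window shapes compared in Figure 10.6.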
Figure 10.6: Function A_{SW}(e^{jω}) for N = 15 and m = 4 with weights W_k in the shape of (from
the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.
The variance of the estimator Ĉ^{SW}_XX(e^{jω}) is given by

    Var[Ĉ^{SW}_XX(e^{jω})] = E[Ĉ^{SW}_XX(e^{jω})²] − E[Ĉ^{SW}_XX(e^{jω})]²

    = (1/A²) Σ_{k=−m}^{m} Σ_{l=−m}^{m} W_k W_l E[ I^N_XX(e^{j(2π/N)[k(ω)+k]}) · I^N_XX(e^{j(2π/N)[k(ω)+l]}) ]
      − (1/A²) Σ_{q=−m}^{m} Σ_{μ=−m}^{m} W_q W_μ E[ I^N_XX(e^{j(2π/N)k(ω)}) ]²

    = (1/A²) Σ_{p=−m}^{m} W_p² E[ I^N_XX(e^{j(2π/N)[k(ω)+p]})² ]
      + (1/A²) Σ_{k=−m}^{m} Σ_{l=−m, l≠k}^{m} W_k W_l E[ I^N_XX(e^{j(2π/N)k(ω)}) ]²
      − (1/A²) Σ_{q=−m}^{m} Σ_{μ=−m}^{m} W_q W_μ E[ I^N_XX(e^{j(2π/N)k(ω)}) ]²

    = (1/A²) Σ_{p=−m}^{m} W_p² Var[ I^N_XX(e^{j(2π/N)k(ω)}) ].
10.7. THE LOG-SPECTRUM 133
With

    Σ_{k=−m}^{m} W_k² = Σ_{k=−m}^{m} ( W_k − (Σ_{i=−m}^{m} W_i)/(2m+1) )² + (Σ_{k=−m}^{m} W_k)²/(2m+1) ≥ 0    (10.9)

we are able to calculate the weighting coefficients W_k that yield the lowest variance of
Ĉ^{SW}_XX(e^{jω}). From (10.9) it is easy to see that for

    W_k = (Σ_{i=−m}^{m} W_i)/(2m+1) = A/(2m+1),    k = 0, ±1, …, ±m,

we reach the lowest possible variance for the estimator Ĉ^{SW}_XX(e^{jω}). Because this result is
nothing other than the smoothed periodogram (see Section 10.5), we see that Ĉ^S_XX(e^{jω}) has the
largest bias for a fixed N but the lowest variance within the general class of spectrum estimators
covered in this section.
The above equations show that the variance is approximately constant and does not depend
on C_XX(e^{jω}).
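The decomposition (10.9) can be checked numerically: among weight vectors with a fixed sum, the uniform weights minimise Σ W_k² and hence the variance. A small random search (our own illustration) confirms this:

```python
import numpy as np

m = 4
rng = np.random.default_rng(3)
uniform = np.full(2 * m + 1, 1.0 / (2 * m + 1))   # W_k = A/(2m+1) with A = 1
best = np.sum(uniform ** 2)                        # = 1/(2m+1), the minimum of sum W_k^2
for _ in range(1000):
    w = rng.random(2 * m + 1)
    w /= w.sum()                                   # enforce the same sum of weights (A = 1)
    # by (10.9): sum W_k^2 exceeds the uniform value by sum (W_k - mean)^2 >= 0
    assert np.sum(w ** 2) >= best - 1e-12
```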
where X̄ = (1/N) Σ_{k=0}^{N−1} X(k) is an estimate of the mean μ_X = E[X(k)]. As can be seen from
(10.10), the estimator of the covariance function ĉ_XX(n) is only defined for n ≥ 0. Estimates for
n < 0 follow naturally from the symmetry property of the covariance function.
Motivated by the fact that the spectrum C_XX(e^{jω}) is the Fourier transform of the covariance
function c_XX(n), it is intuitive to suggest an estimator of C_XX(e^{jω}) as

    Ĉ_XX(e^{jω}) = Σ_{n=−∞}^{∞} ĉ_XX(n) e^{−jωn}

                = (1/N) Σ_{n=−(N−1)}^{N−1} Σ_{l=−∞}^{∞} [X(l+n) − X̄][X(l) − X̄]* · e^{−jωn}

                = (1/N) Σ_{n=−(N−1)}^{N−1} Σ_{l=−∞}^{∞} [X(n−l) − X̄][X(−l) − X̄]* · e^{−jωn}

which is equivalent to

    Ĉ_XX(e^{jω}) = I^N_{X−X̄,X−X̄}(e^{jω})    (10.11)

and should be shown by the reader as an exercise.
The derivation in (10.11) shows that taking the Fourier transform of the sample covari-
ance function is equivalent to computing the periodogram as in (10.1). This is therefore not a
consistent method for estimating the spectrum.
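The equivalence (10.11) is easy to verify numerically: the Fourier transform of the biased sample covariance of the mean-removed data, evaluated at the Fourier frequencies, reproduces the periodogram exactly (a sketch using direct O(N²) sums for clarity):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(64)
N = len(x)
xc = x - x.mean()                                  # remove the sample mean

# biased sample covariance for lags 0..N-1, extended to negative lags by symmetry
c = np.array([np.sum(xc[n:] * xc[:N - n]) / N for n in range(N)])
lags = np.arange(-(N - 1), N)
c_full = np.concatenate([c[:0:-1], c])

# Fourier transform of the sample covariance at omega_k = 2*pi*k/N
omega = 2 * np.pi * np.arange(N) / N
C_hat = np.array([np.sum(c_full * np.exp(-1j * w * lags)) for w in omega]).real

# periodogram of the mean-removed data: identical to C_hat, per (10.11)
I = np.abs(np.fft.fft(xc)) ** 2 / N
```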
Since lim_{N→∞} (1/N)|Δ_N(e^{jω})|² = 2π Σ_k δ(ω + 2πk), we find

    lim_{N→∞} (1/2π) ∫_{−∞}^{∞} (1/N)|Δ_N(e^{jϑ})|² · W_{2M−1}(e^{j(ω−θ−ϑ)}) dϑ = W_{2M−1}(e^{j(ω−θ)})

which reduces the term for the mean of the estimator to a simple convolution

    E[Ĉ^{BT}_XX(e^{jω})] = (1/2π) ∫_{−π}^{π} C_XX(e^{jθ}) · W_{2M−1}(e^{j(ω−θ)}) dθ.    (10.13)

Based on this approximation, the variance of the Blackman-Tukey method tends to zero if the
window function w_{2M−1}(n) has finite energy.
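A sketch of the Blackman-Tukey estimator along these lines, with a Bartlett lag window w_{2M−1}(n) as an illustrative choice (direct transform used for clarity):

```python
import numpy as np

def blackman_tukey(x, M, nfft=512):
    """Window the sample covariance to lags |n| < M, then Fourier transform."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    c = np.array([np.sum(x[n:] * x[:N - n]) / N for n in range(M)])
    lags = np.arange(-(M - 1), M)
    c_full = np.concatenate([c[:0:-1], c])         # extend to negative lags by symmetry
    w = np.bartlett(2 * M - 1)                     # lag window w_{2M-1}(n)
    omega = 2 * np.pi * np.arange(nfft) / nfft
    return np.array([np.sum(w * c_full * np.exp(-1j * om * lags))
                     for om in omega]).real

rng = np.random.default_rng(5)
x = rng.standard_normal(4096)                      # white noise, flat spectrum
C_bt = blackman_tukey(x, M=16)                     # M << N: low variance, some bias
```

Choosing M much smaller than N is what buys the variance reduction over the raw periodogram; the Bartlett lag window additionally guarantees a nonnegative estimate.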
Given X(0), …, X(N−1), Y(0), …, Y(N−1), the objective is to estimate the cross-spectrum
C_YX(e^{jω}) for −π < ω < π. The finite Fourier transforms are given by

    Y_N(e^{jω}) = Σ_{n=0}^{N−1} Y(n) e^{−jωn},    X_N(e^{jω}) = Σ_{n=0}^{N−1} X(n) e^{−jωn}.

For large N, X_N(e^{jω}) and Y_N(e^{jω}) are normally distributed random variables, as discussed
above. At distinct frequencies 0 < ω₁ < ω₂ < … < ω_N < π the random variables X_N(e^{jω}) and
Y_N(e^{jω}) are independent.
Considering L data stretches of the series X(0), …, X(N−1) and Y(0), …, Y(N−1), so that M·L = N,

    Y_M(e^{jω}, l) = Σ_{m=0}^{M−1} Y([l−1]M + m) e^{−jωm},    l = 1, …, L, and

    X_M(e^{jω}, l) = Σ_{m=0}^{M−1} X([l−1]M + m) e^{−jωm},    l = 1, …, L.
            ⎛  C_XX(e^{jω})    ℜC_YX(e^{jω})   0               ℑC_YX(e^{jω}) ⎞
    E[V(e^{jω})V(e^{jω})^H] = (N/2) ⎜  ℜC_YX(e^{jω})  C_YY(e^{jω})   −ℑC_YX(e^{jω})  0             ⎟ .
            ⎜  0               ℑC_YX(e^{jω})   C_XX(e^{jω})    ℜC_YX(e^{jω}) ⎟
            ⎝  −ℑC_YX(e^{jω})  0               ℜC_YX(e^{jω})   C_YY(e^{jω})  ⎠
Also, we have

    E[Y_N(e^{jω}) X_N(e^{jω})] = 0

and

    Var[I^N_YX(e^{jω})] = C_YY(e^{jω}) C_XX(e^{jω}).

10.9. CROSS-SPECTRUM ESTIMATION 137

Because

    |C_YX(e^{jω})|² ≤ C_YY(e^{jω}) C_XX(e^{jω}),

the variance of I^N_YX(e^{jω}) is never smaller than |C_YX(e^{jω})|² for any N.
To construct estimates for C_YX(e^{jω}) with lower variance, we use the properties given above.
We calculate
    Ĉ_YX(e^{jω}) = (1/L) Σ_{l=1}^{L} (1/M) Y_M(e^{jω}, l) X_M(e^{jω}, l)*
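A sketch of this segment-averaged cross-spectrum estimator with non-overlapping segments (the variable names and the toy signals are ours):

```python
import numpy as np

def cross_spectrum(x, y, L):
    """Average the cross-periodograms of L non-overlapping segments."""
    M = len(x) // L
    acc = np.zeros(M, dtype=complex)
    for l in range(L):
        X = np.fft.fft(x[l * M:(l + 1) * M])
        Y = np.fft.fft(y[l * M:(l + 1) * M])
        acc += Y * np.conj(X) / M                  # cross-periodogram of segment l
    return acc / L                                 # averaging lowers the variance

rng = np.random.default_rng(6)
x = rng.standard_normal(4096)
y = x + 0.1 * rng.standard_normal(4096)            # strongly correlated with x
C_yx = cross_spectrum(x, y, L=16)                  # close to C_XX = 1, nearly real
```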
Consider the system of Figure 11.1 with output X(n) such that
    X(n) + Σ_{k=1}^{p} a_k X(n−k) = Z(n)
where E [Z(n)] = 0, E [Z(n)Z(k)] = σZ2 δ(n−k) and ap 6= 0. It is assumed that the filter with unit
impulse response h(n) is stable. The process X(n) is then called an autoregressive process of
order p or in short notation AR(p). The recursive filter possesses the following transfer function
    H(e^{jω}) = 1 / (1 + Σ_{k=1}^{p} a_k e^{−jωk}).
Because we have assumed the filter to be stable, the roots of the characteristic equation

    z^p + Σ_{k=1}^{p} a_k z^{p−k} = 0

lie inside the unit disc, i.e. |z_l| < 1 for all roots z_l of the above equation. The spectrum is
thus given by

    C_XX(e^{jω}) = σ_Z² / |1 + Σ_{k=1}^{p} a_k e^{−jωk}|².
140 CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION
The objective is to estimate CXX (ejω ) based on observations of the process X(n) for n =
0, . . . , N − 1.
Multiplying

    X(n) + Σ_{k=1}^{p} a_k X(n−k) = Z(n)

by X(n − l) and taking expectations yields

    c_XX(l) + Σ_{k=1}^{p} a_k E[X(n−k)X(n−l)] = E[Z(n)X(n−l)].
Assuming the recursive filter to be causal, i.e. h(n) = 0 for n < 0, leads to the relationship

    X(n) = Σ_{k=0}^{∞} h(k) Z(n−k) = h(0)Z(n) + Σ_{k=1}^{∞} h(k) Z(n−k).
The difference equation in cXX (l) has a similar form to the system describing difference equation
in X(n).
Having observed X(n), one can estimate c_XX(n). Using ĉ_XX(0), …, ĉ_XX(p), one obtains a unique
solution for a_1, …, a_p. The solution is unique if the matrix with elements c_XX(l − k) is
invertible, which is guaranteed if the filter is stable. Given estimates â_1, …, â_p of
a_1, …, a_p, one can use the Yule-Walker equation for l = 0 to obtain σ̂_Z². Then it is
straightforward to find an estimate of C_XX(e^{jω}), given by

    Ĉ_XX(e^{jω}) = σ̂_Z² / |1 + Σ_{k=1}^{p} â_k e^{−jωk}|².
11.2. SPECTRUM ESTIMATION USING A MOVING AVERAGE PROCESS 141
If Z(n) is a white noise process, i.e., E [Z(n)] = 0 and cZZ (k) = σZ2 δ(k), then X(n) is a Moving
Average (MA) Process of order q, MA(q).
Let ĉ_XX(0), …, ĉ_XX(q) be available (consistent) estimates of c_XX(0), …, c_XX(q). Then

    ĉ_XX(n) = Σ_{m=0}^{q−n} b̂(m) b̂(m+n),    n = 0, …, q.

The estimates b̂(0), …, b̂(q) are consistent because of the consistency of ĉ_XX(k), k = 0, …, q.
Then

    ĥ(1) = b̂(1)/b̂(0), …, ĥ(q) = b̂(q)/b̂(0),    σ̂_Z² = b̂(0)².
    Ĉ_XX(z) = B̂(z) B̂(1/z)

As an example,

    b̂(1)/b̂(0) = ĥ(1) = − Σ_{i=1}^{q} z_i.
Example 11.2.1 Consider a receiver input signal X(n) from a wireless transmission. To find
the moving average estimate of the spectrum CXX (ejω ), we first calculate the sample covariance
ĉXX (n) and take the z-transform ĈXX (z). By taking P (z) = z N −1 ĈXX (z) we find N − 1 roots
zi , i = 1, . . . , N − 1 inside the unit circle and assign them to B̂(z). The estimator ĈXX (ejω )
can then be found according to (11.1).
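The root-assignment idea of Example 11.2.1 can be sketched numerically (an approximate factorisation, our own illustration: only the lag-0 covariance is matched exactly when scaling):

```python
import numpy as np

def ma_fit(x, q):
    """Fit b_hat(0..q) by factoring the sample-covariance polynomial C_hat(z)."""
    x = np.asarray(x, dtype=float) - np.mean(x)
    N = len(x)
    c = np.array([np.sum(x[n:] * x[:N - n]) / N for n in range(q + 1)])
    # coefficients of z^q * C_hat(z), from power z^{2q} down to z^0
    poly = np.concatenate([c[::-1], c[1:]])
    roots = np.roots(poly)
    inside = roots[np.abs(roots) < 1]              # assign inside-circle roots to B_hat(z)
    b = np.poly(inside).real                       # monic coefficients [b(0), ..., b(q)]
    b = b * np.sqrt(c[0] / np.sum(b ** 2))         # scale so that sum b^2 = c_hat(0)
    return b

rng = np.random.default_rng(7)
z = rng.standard_normal(20000)
x = z.copy()
x[1:] += 0.5 * z[:-1]                              # MA(1): X(n) = Z(n) + 0.5 Z(n-1)
b_hat = ma_fit(x, 1)                               # ratio b_hat(1)/b_hat(0) near 0.5
```

The roots of z^q Ĉ(z) come in (z, 1/z) pairs; keeping the ones inside the unit circle is exactly the assignment to B̂(z) described in the example.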
11.3. MODEL ORDER SELECTION 143
[Figure: Moving average estimate Ĉ_XX(e^{jω}) and true spectrum C_XX(e^{jω}). The abscissa
displays ω/π while the ordinate shows the estimate.]
    AIC(m) = log σ̂²_{Z,m} + m·(2/N),    m = 1, …, M.

The order estimate is the smallest m minimising AIC(m).
Another, more accurate estimate is given by the minimum description length (MDL) criterion,
obtained by taking the smallest m minimising

    MDL(m) = log σ̂²_{Z,m} + m·(log N)/N,    m = 1, …, M.
Examples
We generate realisations from the N (0, 1) distribution. We filter the data recursively to get 1024
values. The filter has poles
Figure 11.4: True and estimated AR(6) spectrum. The abscissa displays the frequency ω/2π while the
ordinate shows the estimate.
Figure 11.5: Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.
Chapter 12
The Kalman filter has become an indispensable tool in modern control and signal processing.
Its simplest use is as an estimation tool. However, it is capable of revealing more than just
a standard estimate; it provides a complete statistical description of the degree of knowledge
available. This can be highly useful in system analysis. In contrast to many other estimation
techniques, the Kalman filter operates in the state space domain. This intuitive structure has
given it physical motivation in a variety of applications. In addition, its implementation can be
computationally efficient.
As a result, it has found application in a wide variety of fields; especially in process control
and target tracking, but also in many other signal processing topics.
where

• X(n) ∈ ℜ^{N×1} is the process state at time n
146 CHAPTER 12. DISCRETE KALMAN FILTER
Figure 12.1: Relationship between system state and measurements assumed for Kalman filter.
In addition, we assume that the noise quantities W(n) and V(n) are independent of each other and
of U(n), white, and Gaussian distributed:

    W(n) ∼ N(0, Q),
    V(n) ∼ N(0, R).
The matrices A, B and H are assumed known from the physics of the system.
The control input U (n) is the intervention or “steering” of the process by an outside source
(e.g. a human operator). It is assumed that U (n) is known precisely. Therefore, its effect on
the process, through (12.1) is also known.
Now, in order to derive the Kalman filter equations, we need to consider the "best" estimate
X̂(n) of the new state X(n) based on the previous state estimate X̂(n − 1), the previous control
input U(n − 1) and the current observation Z(n). This estimate of X(n) may be calculated in two
steps:
12.2. DERIVATION OF KALMAN FILTER EQUATIONS 147
where

    P(n) = E[E(n)E(n)^T]    (12.7)

with

    E(n) = X(n) − X̂(n)    (12.8)

being the (state) estimation error, i.e. the difference between the true state vector and our
a posteriori state estimate.
To find the K(n) that minimises (12.6), start by writing the a posteriori error in terms of the
a priori error, E⁻(n) = X(n) − X̂⁻(n), and its covariance, P⁻(n) = E[E⁻(n)E⁻(n)^T]:

    E(n) = X(n) − X̂(n)
         = E⁻(n) − (X̂(n) − X̂⁻(n))
         = E⁻(n) − K(n)(Z(n) − H X̂⁻(n))
         = E⁻(n) − K(n)(H(X(n) − X̂⁻(n)) + V(n))
         = E⁻(n) − K(n)(H E⁻(n) + V(n))    (12.9)
It is easy to see that as R → 0, i.e. as the measurement noise becomes less influential,
K(n) → H⁻¹. Thus the residual between actual and predicted measurements is trusted more and is
given a heavy weighting in correcting the state estimate through (12.5).
Alternatively, if it is known that the a priori error covariance approaches 0, i.e. P − (n) → 0,
or if R is large, then K(n) → 0. This means that the residual is given less weight in updating
the state estimate – the a priori estimate is given priority.
Recall that

    X(n) = A X(n−1) + B U(n−1) + W(n−1),    X̂⁻(n) = A X̂(n−1) + B U(n−1);

then

    P⁻(n) = E[E⁻(n)E⁻(n)^T]
          = E[(X(n) − X̂⁻(n))(X(n) − X̂⁻(n))^T]
          = E[(A(X(n−1) − X̂(n−1)) + W(n−1))(A(X(n−1) − X̂(n−1)) + W(n−1))^T]
          = A E[(X(n−1) − X̂(n−1))(X(n−1) − X̂(n−1))^T] A^T + E[W(n−1)W(n−1)^T]
          = A E[E(n−1)E(n−1)^T] A^T + E[W(n−1)W(n−1)^T]
          = A P(n−1) A^T + Q    (12.11)
¹The following lemmas are used in the differentiation:

    ∂ tr{AX}/∂X = A^T,    ∂ tr{AX^T}/∂X = A,    ∂ tr{XAX^T}/∂X = X(A + A^T).
12.3. EXTENDED KALMAN FILTER 149
so

    E[E(n)E(n)^T] = E[((I − K(n)H)E⁻(n) − K(n)V(n))((I − K(n)H)E⁻(n) − K(n)V(n))^T]

    P(n) = (I − K(n)H) E[E⁻(n)E⁻(n)^T] (I − K(n)H)^T + K(n) E[V(n)V(n)^T] K(n)^T
         = (I − K(n)H) P⁻(n) (I − K(n)H)^T + K(n) R K(n)^T
         = P⁻(n) − K(n)H P⁻(n) − P⁻(n)H^T K(n)^T + K(n)H P⁻(n)H^T K(n)^T + K(n)R K(n)^T
         = (I − K(n)H) P⁻(n) − P⁻(n)H^T K(n)^T + K(n)(H P⁻(n)H^T + R)K(n)^T
         = (I − K(n)H) P⁻(n) − P⁻(n)H^T K(n)^T + P⁻(n)H^T K(n)^T
         = (I − K(n)H) P⁻(n)    (12.12)

Note that we have used the solution for K(n) in (12.10) to simplify the above equation.
It is now possible to draw all the equations derived above together to complete the description
of the Kalman filter:
1. Predict the next state and covariance using the previous estimate.
The distinction between the two main steps is quite clear. The prediction is based on the a priori
information available, i.e. the estimates from the previous time instance, which were based on all
measurements up to, but not including, the current time instance n, while the correction is made
using the current measurement Z(n) to yield an a posteriori estimate.
In fact, the suitability of the Kalman filter for computer implementation is, in large part, due
to this feature. The procedure is recursive – it is run after each measurement is made, meaning
intermediate estimates are always available. This is in contrast to some techniques that must
process the entire measurement sequence to finally yield an estimate. It also compresses all the
information from previous measurements into the current state estimate and its corresponding
covariance matrix estimate – there is no need to retain a record of previous measurements. This
can result in significant memory savings.
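The predict-correct cycle described above can be sketched as follows (a generic implementation, not tied to any particular application; the scalar tracking example at the end is our own):

```python
import numpy as np

def kalman_step(x_est, P, z, A, B, u, H, Q, R):
    """One predict-correct cycle of the discrete Kalman filter."""
    # 1. predict the next state and covariance from the previous estimate
    x_pred = A @ x_est + B @ u
    P_pred = A @ P @ A.T + Q
    # 2. correct with the current measurement Z(n)
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(len(x_est)) - K @ H) @ P_pred             # per (12.12)
    return x_new, P_new

# track a constant scalar state from noisy measurements
A = np.eye(1); B = np.zeros((1, 1)); H = np.eye(1)
Q = np.array([[1e-6]]); R = np.array([[0.1]])
rng = np.random.default_rng(8)
x_est, P = np.zeros(1), np.eye(1)
for _ in range(300):
    z = np.array([5.0]) + np.sqrt(0.1) * rng.standard_normal(1)
    x_est, P = kalman_step(x_est, P, z, A, B, np.zeros(1), H, Q, R)
```

Note the recursive structure: only the current `x_est` and `P` are carried forward, exactly the memory-saving property described above.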
The assumption of linear state and measurement equations does not hold in many applications. In
these cases it becomes necessary to use the Extended Kalman Filter (EKF). As the name suggests,
this is a generalisation or extension of the standard Kalman filter framework, whereby the
equations are "linearised" around the estimates in a way analogous to the Taylor series.
Let the state transition now be defined as (cf. (12.1)):
As for the Kalman filter case, the noise processes, W (n − 1) and V (n − 1), are unobservable
and only an estimate of the previous state, X̂(n − 1), is available. However, the control input,
U (n − 1) is known. Therefore, estimates of the state and measurement vectors can be found by
This means that (12.13) and (12.14) can be linearised around these initial estimates
and
where A(n), W(n), H(n) and V(n) are Jacobian matrices of the partial derivatives of f and h, i.e.

    A_[i,j] = ∂f_[i]/∂x_[j] (X̂(n−1), U(n−1), 0),    W_[i,j] = ∂f_[i]/∂w_[j] (X̂(n−1), U(n−1), 0),

    H_[i,j] = ∂h_[i]/∂x_[j] (X̂(n−1), 0),    V_[i,j] = ∂h_[i]/∂v_[j] (X̂(n−1), 0).
Hopefully the interpretation of these equations is clear. The nonlinear state and measurement
equations are linearised about an initial, a priori estimate. The four Jacobian matrices describe
the rate of change of the nonlinear functions at these estimates.
Due to the non-linearities, the random variables in the EKF cannot be guaranteed to be
Gaussian. Further, no general optimality criterion can be associated with the resulting estimate.
One way to interpret the EKF is that it is simply an adaptation of the Kalman filter to the non-
linear case through the use of a first order Taylor series expansion about the a priori estimate.
By using the equations above and following the methodology in deriving the Kalman filter,
the complete algorithm for implementing an EKF is:
1. Predict the next state and covariance using the previous estimate.
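A scalar sketch of this linearise-then-filter cycle; the nonlinear functions f, h and their derivatives below are illustrative choices, not from the lecture:

```python
import numpy as np

def ekf_step(x_est, P, z, f, h, df, dh, Q, R):
    """One cycle of a scalar extended Kalman filter."""
    # predict through the nonlinear state equation, linearised at the estimate
    x_pred = f(x_est)
    A = df(x_est)                                  # Jacobian of f at the previous estimate
    P_pred = A * P * A + Q
    # correct using the measurement equation linearised at the prediction
    H = dh(x_pred)                                 # Jacobian of h at the predicted state
    K = P_pred * H / (H * P_pred * H + R)
    x_new = x_pred + K * (z - h(x_pred))
    P_new = (1.0 - K * H) * P_pred
    return x_new, P_new

# constant state observed through a nonlinear sensor h(x) = sin(x)
rng = np.random.default_rng(9)
x_true, x_est, P = 0.5, 0.2, 1.0
for _ in range(200):
    z = np.sin(x_true) + 0.05 * rng.standard_normal()
    x_est, P = ekf_step(x_est, P, z, lambda x: x, np.sin,
                        lambda x: 1.0, np.cos, 1e-6, 0.0025)
```

As the text notes, no optimality is guaranteed here: the first-order linearisation is only as good as the local fit of the Jacobians.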
    x*(−n)                ⟷    X(e^{jω})*
    x(n) real and even    ⟷    X(e^{jω}) real and even
    x(n) real and odd     ⟷    X(e^{jω}) purely imaginary and odd

Parseval's Theorem

    Σ_{n=−∞}^{∞} |x(n)|² = (1/2π) ∫_{−π}^{π} |X(e^{jω})|² dω

    Σ_{n=−∞}^{∞} x(n) y(n)* = (1/2π) ∫_{−π}^{π} X(e^{jω}) Y(e^{jω})* dω
154 APPENDIX A. DISCRETE-TIME FOURIER TRANSFORM