
Darmstadt University of Technology

Digital Signal Processing


Abdelhak M. Zoubir
Signal Processing Group
Institute of Telecommunications
Darmstadt University of Technology
Merckstr. 25, 64283 Darmstadt, Germany
e-mail: zoubir@spg.tu-darmstadt.de
URL: http://www.spg.tu-darmstadt.de

Manuscript draft for Digital Signal Processing

© Abdelhak Zoubir 2006
Last revised: October 2006.
Contents

Preface

1 Discrete-Time Signals and Systems
  1.1 Signals
      1.1.1 The unit sample sequence
      1.1.2 The unit step sequence
      1.1.3 Exponential and sinusoidal sequences
      1.1.4 Periodic Sequences
  1.2 Systems
      1.2.1 The unit sample response of linear systems
  1.3 Convolution
  1.4 Stability and Causality
      1.4.1 Stability
      1.4.2 Causality

2 Digital Processing of Continuous-Time Signals
  2.1 Periodic Sampling
  2.2 Reconstruction of Band-Limited Signals
  2.3 Discrete-time Processing of Continuous-Time Signals

3 Filter Design
  3.1 Introduction
  3.2 Digital Filters
  3.3 Linear Phase Filters
      3.3.1 Generalised Linear Phase for FIR Filters
  3.4 Filter Specification
  3.5 Properties of IIR Filters
      3.5.1 Stability
      3.5.2 Linear phase

4 Finite Impulse Response Filter Design
  4.1 Introduction
  4.2 Design of FIR Filters by Windowing
  4.3 Importance of the Window Shape and Length
  4.4 Kaiser Window Filter Design
  4.5 Optimal Filter Design
  4.6 Implementation of FIR Filters

5 Design of Infinite Impulse Response Filters
  5.1 Introduction
  5.2 Impulse Invariance Method
  5.3 Bilinear Transformation
  5.4 Implementation of IIR Filters
      5.4.1 Direct Form I
      5.4.2 Direct Form II
      5.4.3 Cascade form
  5.5 A Comparison of FIR and IIR Digital Filters

6 Introduction to Digital Spectral Analysis
  6.1 Why Spectral Analysis?
  6.2 Applications

7 Random Variables and Stochastic Processes
  7.1 Random Variables
      7.1.1 Distribution Functions
      7.1.2 Properties
      7.1.3 Uniform Distribution
      7.1.4 Exponential Distribution
      7.1.5 Normal Distribution
      7.1.6 χ² Distribution
      7.1.7 Multiple Random Variables
      7.1.8 Properties
      7.1.9 Operations on Random Variables
      7.1.10 Expectation
      7.1.11 Expected Value of a Function of Random Variables
      7.1.12 The Variance
      7.1.13 Transformation of a Random Variable
      7.1.14 The Characteristic Function
  7.2 Covariance
  7.3 Stochastic Processes (Random Processes)
      7.3.1 Stationarity and Ergodicity
      7.3.2 Second-Order Moment Functions and Wide-Sense Stationarity
  7.4 Power Spectral Density
      7.4.1 Definition of Power Spectral Density
      7.4.2 Relationship between the PSD and SOMF: Wiener-Khintchine Theorem
      7.4.3 Properties of the PSD
      7.4.4 Cross-Power Spectral Density
  7.5 Definition of the Spectrum
  7.6 Random Signals and Linear Systems
      7.6.1 Convergence of Random Variables
      7.6.2 Linear filtering
      7.6.3 Linear filtered process plus noise

8 The Finite Fourier Transform
  8.1 Statistical Properties of the Finite Fourier Transform
      8.1.1 Preliminaries

9 Spectrum Estimation
  9.1 Use of bandpass filters

10 Non-Parametric Spectrum Estimation
  10.1 Consistency of Spectral Estimates
  10.2 The Periodogram
      10.2.1 Mean of the Periodogram
      10.2.2 Variance of the Periodogram
  10.3 Averaging Periodograms - Bartlett Estimator
  10.4 Windowing Periodograms - Welch Estimator
  10.5 Smoothing the Periodogram
  10.6 A General Class of Spectral Estimates
  10.7 The Log-Spectrum
  10.8 Estimation of the Spectrum using the Sample Covariance
      10.8.1 Blackman-Tukey Method
  10.9 Cross-Spectrum Estimation
      10.9.1 The Cross-Periodogram

11 Parametric Spectrum Estimation
  11.1 Spectrum Estimation using an Autoregressive Process
  11.2 Spectrum Estimation using a Moving Average Process
  11.3 Model order selection

12 Discrete Kalman Filter
  12.1 State Estimation
  12.2 Derivation of Kalman Filter Equations
  12.3 Extended Kalman Filter

.1 Discrete-Time Fourier Transform
List of Figures

1.1 Example of a continuous-time signal.
1.2 Example of a discrete-time signal.
1.3 Example of a sampled continuous-time signal.
1.4 The sequence p(n).
1.5 Discrete Time Signal.
1.6 Exponential Sequence with 0 < α < 1.
1.7 Exponential Sequence with −1 < α < 0.
1.8 Sinusoidal Sequence.
1.9 Real and Imaginary parts of x(n).
1.10 Representation of a discrete-time system.
1.11 A linear but shift-variant system.
1.12 Transversal Filter input and output for q = 3.
1.13 Convolution operation.
2.1 Continuous-Time to digital converter.
2.2 Effect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB, (d) Spectrum of the sampled signal with Ωs < 2ΩB.
2.3 Discrete-time processing of continuous-time signals.
2.4 Frequency Spectra.
2.5 Continuous-Time Filter.
3.1 A basic system for discrete-time filtering of a continuous-time signal.
3.2 Linear and time-invariant filtering of s(n).
3.3 Examples of the four types of linear phase FIR filters.
3.4 Frequency response of type I filter of Example 3.3.1. Magnitude (left). Phase (right).
3.5 Frequency response of type II filter of Example 3.3.2. Magnitude (left). Phase (right).
3.6 Frequency response of type III filter of Example 3.3.3. Magnitude (left). Phase (right).
3.7 Frequency response of type IV filter of Example 3.3.4. Magnitude (left). Phase (right).
3.8 Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.
3.9 Specifications of a low-pass filter.
3.10 Low-pass filter with transition band.
3.11 Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop.
4.1 The sequence h(n).
4.2 Magnitude of the Fourier transform of a rectangular window (N = 8).
4.3 Commonly used windows for FIR filter design (N = 200).
4.4 Fourier transform (magnitude) of rectangular window (N = 51).
4.5 Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).
4.6 Fourier transform (magnitude) of Hanning window (N = 51).
4.7 Fourier transform (magnitude) of Hamming window (N = 51).
4.8 Fourier transform (magnitude) of Blackman window (N = 51).
4.9 Fourier transform of a Hamming window with window length of N = 101.
4.10 Fourier transform of a Hamming window with window length of N = 201.
4.11 (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).
4.12 Response functions. (a) Impulse response. (b) Log magnitude.
4.13 Kaiser window estimate. (a) Impulse response. (b) Log magnitude.
4.14 Filter Specification.
4.15 Flowchart of Parks-McClellan algorithm.
4.16 Signal flow graph for a delay element.
4.17 Signal Flow graph representing the convolution sum.
4.18 Efficient signal flow graph for the realisation of a type I FIR filter.
5.1 Low-pass filter specifications.
5.2 Magnitude transfer function Hc(jΩ).
5.3 Filter specifications for Example 5.2.2.
5.4 Mapping of the s-plane onto the z-plane using the bilinear transformation.
5.5 Mapping of the continuous-time frequency axis onto the unit circle by bilinear transformation (Td = 1).
5.6 Signal flow graph of direct form I structure for an (N − 1)th-order system.
5.7 Signal flow graph of direct form II structure for an (N − 1)th-order system.
5.8 Cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.
6.1 A Michelson Interferometer.
7.1 Probability function.
7.2 χ²_ν distribution with ν = 1, 2, 3, 5, 10.
7.3 The quadrant D of two arbitrary real numbers x and y.
7.4 Bivariate probability distribution function.
7.5 Properties for n = 2.
7.6 Probability that (X1, X2) is in the rectangle D3.
7.7 Voltage waveforms emitted from a noise source.
7.8 Measurement of SOMF.
7.9 A linear filter.
7.10 Interaction of two linear systems.
7.11 Interaction of two linear systems.
7.12 Cascade of L linear systems.
7.13 Cascade of two linear systems.
7.14 A linear filter plus noise.
8.1 Transfer function of an ideal bandpass filter.
9.1 (untitled)
9.2 Transfer function of an ideal band-pass filter.
10.1 The window function (1/N)|∆N(e^{jω})|² for N = 20, N = 10 and N = 5.
10.2 The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.3 The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.4 Function Am(e^{jω}) for N = 15 and m = 0, 1, 3, 5.
10.5 The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to bottom right). The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
10.6 Function ASW(e^{jω}) for N = 15 and m = 4 with weights Wk in shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.
11.1 Linear and time-invariant filtering of white noise.
11.2 Filter model for a moving average process.
11.3 Estimate of an MA process with N = 100 and q = 5.
11.4 True and estimated AR(6) spectrum. The abscissa displays the frequency ω/2π while the ordinate shows the estimate.
11.5 Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.
12.1 Relationship between system state and measurements assumed for Kalman filter.
List of Tables

3.1 The four classes of FIR filters and their associated properties.
4.1 Suitability of a given class of filter for the four FIR types.
4.2 Properties of different windows.
Preface

This draft is provided as an aid for students studying the subject Digital Signal Processing. It
covers the material presented during lectures. Useful references on this subject include:

1. Oppenheim, A.V. & Schafer, R.W. Discrete-Time Signal Processing, Prentice-Hall, 1989.

2. Proakis, J.G. & Manolakis, D.G. Digital Signal Processing: Principles, Algorithms, and Applications, Maxwell MacMillan Editions, 1992.

3. Mitra, S.K. Digital Signal Processing: A Computer-Based Approach, Prentice-Hall, 1998.

4. Diniz, P., da Silva, E. & Netto, S. Digital Signal Processing: System Analysis and Design, Cambridge University Press, 2002.

5. Porat, B. A Course on Digital Signal Processing, J. Wiley & Sons Inc., 1997.

6. Kammeyer, K.D. & Kroschel, K. Digitale Signalverarbeitung, Teubner Studienbücher, 1998.

7. Hänsler, E. Statistische Signale, Springer Verlag, 3rd edition, 2001.

8. Böhme, J.F. Stochastische Signale, Teubner Studienbücher, 1998.

9. Brillinger, D.R. Time Series: Data Analysis and Theory, Holden Day Inc., 1981.

10. Priestley, M.B. Spectral Analysis and Time Series, Academic Press, 2001.

Chapter 1

Discrete-Time Signals and Systems

1.1 Signals
Most signals can in practice be classified into three broad groups:

1. Analog or continuous-time signals are continuous in both time and amplitude. Examples:
speech, image, seismic and radar signals. An example of a continuous-time signal is shown
in Figure 1.1.

Figure 1.1: Example of a continuous-time signal.

2. Discrete-time signals are discrete in time and continuous in amplitude. An example of
such a signal is shown in Figure 1.2. A common way to generate discrete-time signals is
by sampling continuous-time signals. There exist, however, signals which are intrinsically
discrete in time, such as the average monthly rainfall.
3. Digital signals are discrete in both time and amplitude. Digital signals may be generated
through the amplitude quantisation of discrete-time signals. The signals used by computers
are digital, being discrete in both time and amplitude.
The development of digital signal processing concepts based on digital signals requires a
detailed treatment of amplitude quantisation, which is extremely difficult and tedious. In
addition, many useful insights are lost in such a treatment because of the mathematical
complexity.
For this reason, digital signal processing concepts have been developed based on discrete-
time signals and continuous-time signals. Experience shows that the theories developed
based on continuous-time or discrete-time signals are often applicable to digital signals.

Figure 1.2: Example of a discrete-time signal.

A discrete-time signal (sequence) will be denoted by functions whose arguments are integers.
For example, x(n) represents a sequence which is defined for all integer values of n. If
n is not an integer, then x(n) is not zero; rather, it is undefined. The notation x(n) refers
either to the discrete-time function x or to the value of the function x at a specific n. Figure 1.3 gives
an example of a sampled signal with sampling period T. Herein x(n) = x(t)|t=nT = x(nT),
n integer and T > 0.

Figure 1.3: Example of a sampled continuous-time signal.

There are some sequences which play an important role in digital signal processing. We
discuss these next.

1.1.1 The unit sample sequence


The unit sample sequence is denoted by δ(n) and is defined as

δ(n) = { 1, n = 0;  0, otherwise }.
The sequence δ(n) may be regarded as an analog to the impulse function used in continuous-
time system analysis. An important aspect of the impulse sequence is that any sequence can
be represented as a linear combination of delayed impulses. For example, the sequence p(n) in
Figure 1.4 can be expressed as
p(n) = a−1 δ(n + 1) + a1 δ(n − 1) + a3 δ(n − 3) .

Figure 1.4: The sequence p(n).

In general,

x(n) = ∑_{k=−∞}^{∞} x(k) δ(n − k) ,

as shown in Figure 1.5.

Figure 1.5: Discrete Time Signal.

1.1.2 The unit step sequence


The unit step sequence, denoted by u(n), is defined as

u(n) = { 1, n ≥ 0;  0, otherwise }.

The sequence u(n) is related to δ(n) by

u(n) = ∑_{k=−∞}^{n} δ(k) ,

or alternatively by

u(n) = ∑_{k=0}^{∞} δ(n − k) .

Conversely, the impulse sequence can be expressed as the first backward difference of the unit
step sequence, i.e.,

δ(n) = u(n) − u(n − 1) = ∑_{k=−∞}^{n} δ(k) − ∑_{k=−∞}^{n−1} δ(k) .
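These identities are easy to check numerically. The following short NumPy sketch (an illustration added here, not part of the original lecture notes) builds δ(n) and u(n) on a finite index grid and verifies the relations above, including the representation of an arbitrary sequence as a weighted sum of delayed impulses.

```python
import numpy as np

# Finite index grid standing in for n = -inf..inf; purely illustrative.
n = np.arange(-10, 11)

delta = (n == 0).astype(float)   # unit sample sequence delta(n)
u = (n >= 0).astype(float)       # unit step sequence u(n)

# u(n) as the running sum of delta(k) for k <= n
assert np.allclose(np.cumsum(delta), u)

# delta(n) as the first backward difference u(n) - u(n-1)
u_prev = np.concatenate(([0.0], u[:-1]))    # u(n-1) on the same grid
assert np.allclose(u - u_prev, delta)

# Any sequence as a weighted sum of delayed impulses: x(n) = sum_k x(k) delta(n-k)
x = np.random.randn(n.size)
recon = np.zeros_like(x)
for k, nk in enumerate(n):
    recon += x[k] * (n == nk)               # x(k) * delta(n - k)
assert np.allclose(recon, x)
```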

1.1.3 Exponential and sinusoidal sequences


Exponential and sinusoidal sequences play an important role in digital signal processing. The
general form of an exponential sequence is

x(n) = A · αn , (1.1)

where A and α may be complex numbers. For real A > 0 and real 0 < α < 1, x(n) is a decreasing
sequence similar in appearance to Figure 1.6.

Figure 1.6: Exponential Sequence with 0 < α < 1.

For −1 < α < 0, the sequence values alternate in sign but again decrease in magnitude with
increasing n, as shown in Figure 1.7. If |α| > 1, then x(n) grows in magnitude as n increases.
A sinusoidal sequence has the general form

x(n) = A · cos(ω0 n + φ)

with A and φ real. A sinusoidal sequence is shown in Figure 1.8 for φ = π/2.
Now let A and α be complex, so that A = |A| e^{jφ} and α = |α| e^{jω0}. Then

x(n) = A α^n = |A| e^{jφ} |α|^n e^{jnω0}
     = |A| · |α|^n e^{j(ω0 n + φ)}
     = |A| · |α|^n [cos(ω0 n + φ) + j sin(ω0 n + φ)] .

A plot of the real part of x(n) = A · α^n is given in Figure 1.9 for |α| > 1. In this
case we set ω0 = π and φ = 0, i.e.,

ℜ{x(n)} = |A| · |α|^n cos(πn) = |A| · |α|^n (−1)^n .

Figure 1.7: Exponential Sequence with −1 < α < 0.

Figure 1.8: Sinusoidal Sequence.

The imaginary part is zero for all n.

For |α| = 1, x(n) is referred to as the complex exponential sequence, and has the form

x(n) = |A| e^{j(ω0 n + φ)} = |A| [cos(ω0 n + φ) + j sin(ω0 n + φ)] ,

i.e., the real and imaginary parts of e^{jω0 n} vary sinusoidally with n.

Note. In the continuous-time case, the quantity ω0 is called the frequency and has the di-
mension radians per second. In the discrete-time case n is a dimensionless integer. Thus the
dimension of ω0 must be radians. To maintain a closer analogy with the continuous-time case we
can specify the units of ω0 to be radians per sample and the units of n to be samples. Following
this rationale we can define the period of a complex exponential sequence as 2π/ω0 which has
the dimension samples.

Figure 1.9: Real and Imaginary parts of x(n).

1.1.4 Periodic Sequences


A sequence x(n) is called periodic with period N if x(n) satisfies the following condition:

x(n) = x(n + N) for all n,

where N is a positive integer. For example, the sequence cos(πn) is periodic with period 2,
but the sequence cos(n) is not periodic. Note that a sinusoidal discrete-time sequence
A cos(ω0 n) is periodic with period N if ω0 N = 2πk, k ∈ Z. A similar statement holds for the
complex exponential sequence C · e^{jω0 n}, i.e., periodicity with period N requires

e^{jω0 (n+N)} = e^{jω0 n} ,

which is true only for

ω0 N = 2πk, k ∈ Z .   (1.2)

Consequently, complex exponential and sinusoidal sequences are not necessarily periodic with
period 2π/ω0 and, depending on the value of ω0, may not be periodic at all. For example, for
ω0 = 3π/4, the smallest value of N for which Equation (1.2) can be satisfied with k an integer
is N = 8 (corresponding to k = 3). For ω0 = 1, there are no integer values of N or k for which
Equation (1.2) is satisfied.
When we combine the condition of Equation (1.2) with the observation that ω0 and ω0 +2πr,
r ∈ Z, are indistinguishable frequencies, then it is clear that there are only N distinguishable
frequencies for which the corresponding sequences are periodic with period N . One set of
frequencies is ωk = 2πk/N , k = 0, 1, . . . , N − 1. These properties of complex exponential and
sinusoidal sequences are basic to both the theory and the design of computational algorithms
for discrete-time Fourier analysis.
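The periodicity condition (1.2) is easy to probe numerically. Below is a minimal sketch (an added illustration, not from the original text; the search bound and tolerance are arbitrary choices) that looks for the smallest N with ω0 N = 2πk for integer k. It reproduces N = 8 for ω0 = 3π/4 and finds no period for ω0 = 1.

```python
import numpy as np

def smallest_period(omega0, n_max=1000, tol=1e-12):
    """Smallest N with omega0*N = 2*pi*k for some integer k, or None if none up to n_max."""
    for N in range(1, n_max + 1):
        k = omega0 * N / (2 * np.pi)
        if abs(k - round(k)) < tol:      # is k (numerically) an integer?
            return N
    return None

print(smallest_period(3 * np.pi / 4))    # 8  (corresponding to k = 3)
print(smallest_period(1.0))              # None: cos(n) is not periodic
```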

1.2 Systems
A system T that maps an input x(n) to an output y(n) is represented by:
y(n) = T[x(n)]

x(n) → T[·] → y(n)

Figure 1.10: Representation of a discrete-time system.

Example 1.2.1 The ideal delay system is defined by y(n) = x(n − nd), nd ∈ Z+. The system
shifts the input to the right by nd samples, corresponding to a time delay.

This definition of a system is very broad. Without some restrictions, knowing the output of a
system for a certain set of inputs does not allow us to characterise the system completely. Two types
of restrictions greatly simplify the characterisation and analysis of systems: linearity and shift
invariance (LSI). Many systems can be approximated in practice by a linear and shift-invariant
system. A system T is said to be linear if

T[a x1(n) + b x2(n)] = a y1(n) + b y2(n)

holds, where T[x1 (n)] = y1 (n), T[x2 (n)] = y2 (n), and a and b are any scalar constants. The
property
T[ax(n)] = aT[x(n)]
is called homogeneity, and

T[x1 (n) + x2 (n)] = T[x1 (n)] + T[x2 (n)]

is called additivity. Additivity and homogeneity are collectively known as the Principle of
Superposition. A system is said to be shift invariant (SI) if

T[x(n − n0 )] = y(n − n0 )

where y(n) = T[x(n)] and n0 is any integer.

Example 1.2.2 A system described by the relation

y(n) = ∑_{k=−∞}^{n} x(k)

is called an accumulator. This system is both linear and shift invariant.

Example 1.2.3 Consider a system which is described by

y(n) = T[x(n)] = x(n) · g(n)

where g(n) is an arbitrary sequence. This system is shown in Figure 1.11. Linearity of the
system is easily verified since

T [ax1 (n) + bx2 (n)] = g(n) [ax1 (n) + bx2 (n)]


= ag(n)x1 (n) + bg(n)x2 (n)
= aT [x1 (n)] + bT [x2 (n)] .

x(n) → (×) → y(n), with g(n) feeding the multiplier

Figure 1.11: A linear but shift-variant system.

However, the system is shift variant because

T [x(n − n0 )] = x(n − n0 )g(n)

does not equal


y(n − n0 ) = x(n − n0 )g(n − n0 ) .

Example 1.2.4 Consider a system described by

y(n) = T[x(n)] = x²(n) .

Although this system is shift-invariant, it is non-linear. The system is non-linear because

T[x1(n) + x2(n)] = [x1(n) + x2(n)]² ,

which is different from

y1(n) + y2(n) = x1²(n) + x2²(n) .

The system is shift-invariant because

T[x(n − n0)] = x²(n − n0) ,

and

y(n − n0) = x²(n − n0) .
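The properties of Examples 1.2.3 and 1.2.4 can also be checked empirically. The sketch below is an added illustration; the circular shift via np.roll is a finite-length stand-in for a true time shift, which is harmless here because both systems are memoryless.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 32
g = rng.standard_normal(N)        # arbitrary modulating sequence g(n)

def T_mod(x):                     # y(n) = x(n) * g(n), Example 1.2.3
    return x * g

def T_square(x):                  # y(n) = x(n)^2, Example 1.2.4
    return x ** 2

def is_linear(T):
    x1, x2 = rng.standard_normal(N), rng.standard_normal(N)
    a, b = 2.0, -3.0
    return np.allclose(T(a * x1 + b * x2), a * T(x1) + b * T(x2))

def is_shift_invariant(T, n0=5):
    x = rng.standard_normal(N)
    return np.allclose(T(np.roll(x, n0)), np.roll(T(x), n0))

print(is_linear(T_mod), is_shift_invariant(T_mod))          # True, False
print(is_linear(T_square), is_shift_invariant(T_square))    # False, True
```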

1.2.1 The unit sample response of linear systems


Consider a linear system T. Using the relationship

x(n) = ∑_{k=−∞}^{∞} x(k) δ(n − k)

and the linearity property, the output y(n) of a linear system T for an input x(n) is given by

y(n) = T[x(n)]
     = T[ ∑_{k=−∞}^{∞} x(k) δ(n − k) ]
     = T[· · · + x(−1)δ(n + 1) + x(0)δ(n) + x(1)δ(n − 1) + · · · ]
     = ∑_{k=−∞}^{∞} x(k) T[δ(n − k)] .

This result indicates that a linear system can be completely characterised by the response of
the system to a unit sample sequence δ(n) and its delays δ(n − k). Moreover, if the system is
shift-invariant, we have:
h(n − n0 ) = T[δ(n − n0 )]
where h(n) = T[δ(n)] is the unit sample response of a system T to an input δ(n).

1.3 Convolution
The input-output relation of a linear and shift-invariant (LSI) system T is given by

y(n) = T[x(n)] = ∑_{k=−∞}^{∞} x(k) h(n − k) .

This equation shows that knowledge of h(n) completely characterises the system, i.e., knowing
the unit sample response h(n) of an LSI system T allows us to completely determine the output
y(n) for a given input x(n). This equation is referred to as convolution and is denoted by the
convolution operator ⋆. For an LSI system, we have

y(n) = x(n) ⋆ h(n) = ∑_{k=−∞}^{∞} x(k) h(n − k) .

Remark 1: The unit sample response h(n) loses its significance for a nonlinear or shift-variant
system.
Remark 2: The choice of δ(n) as an input in characterising an LSI system is the simplest, both
conceptually and in practice.

Example 1.3.1 A linear transversal filter is given by

y(n) = ∑_{k=0}^{q} a(k) x(n − k) = a(0)x(n) + a(1)x(n − 1) + . . . + a(q)x(n − q) .

If we take a(0) = a(1) = . . . = a(q) = 1/(q + 1), the filter output sequence is an average of q + 1
samples of the input sequence around the nth sample. A transversal filter input and output for
q = 3 are shown in Figure 1.12.

Example 1.3.2 To compute the relationship between the input and the output of a transversal
filter, h(−k) is first plotted against k; h(−k) is simply h(k) reflected or “flipped” around k = 0,
as shown in Figure 1.13. Replacing k by k − n, where n is a fixed integer, leads to a shift of the
origin of the sequence h(−k) to k = n. y(n) is then obtained as ∑_k x(k) h(n − k). See Figure
1.13.
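As a concrete check of Example 1.3.1, the following sketch (added for illustration; input values are arbitrary) evaluates the transversal-filter sum directly and compares it with NumPy's built-in convolution.

```python
import numpy as np

q = 3
a = np.full(q + 1, 1.0 / (q + 1))      # a(0) = ... = a(q) = 1/(q+1): moving average
x = np.random.randn(50)                # arbitrary input sequence

# Direct evaluation of y(n) = sum_{k=0}^{q} a(k) x(n-k), with x taken as zero
# outside the recorded samples
y_direct = np.zeros_like(x)
for n in range(x.size):
    for k in range(q + 1):
        if 0 <= n - k < x.size:
            y_direct[n] += a[k] * x[n - k]

# The same output via the convolution sum y = x * h with h(n) = a(n)
y_conv = np.convolve(x, a)[: x.size]

assert np.allclose(y_direct, y_conv)
```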

1.4 Stability and Causality


To implement an LSI system in practice (realisability), we have to impose two additional
constraints: stability and causality.

Figure 1.12: Transversal Filter input and output for q = 3.


Figure 1.13: Convolution operation.

1.4.1 Stability
A system is considered stable in the bounded-input bounded-output (BIBO) sense if and only
if a bounded input always leads to a bounded output. A necessary and sufficient condition for

an LSI system to be stable is that the unit sample response h(n) is absolutely summable, i.e.,

∑_{n=−∞}^{∞} |h(n)| < ∞ .

Such a sequence h(n) is sometimes referred to as a stable sequence. The proof is quite simple.
First let x(n) be bounded by M, i.e., |x(n)| ≤ M < ∞ for all n. Then, using convolution,

|y(n)| = | ∑_{k=−∞}^{∞} h(k) x(n − k) |
       ≤ ∑_{k=−∞}^{∞} |h(k)| |x(n − k)|
       ≤ M ∑_{k=−∞}^{∞} |h(k)| .

So given a bounded input |x(n)| ≤ M, the output y(n) is bounded if h(n) is absolutely
summable. Conversely, if h(n) is not absolutely summable, the bounded input x(n) = sign(h(−n))
produces y(0) = ∑_{k=−∞}^{∞} |h(k)|, which is unbounded; absolute summability is therefore also
necessary.

1.4.2 Causality
Causality is a fundamental law of physics stating that information cannot travel from the future
to the past. In the context of systems we say that a system is causal if and only if the current
output y(n) does not depend on any future values of the input such as x(n + 1), x(n + 2), . . ..
Causality holds for any practical system regardless of linearity or shift invariance.
A necessary and sufficient condition for an LSI system to be causal is that h(n) = 0 for
n < 0. The proof follows from the convolution sum,

y(n) = ∑_{k=−∞}^{∞} x(n − k) h(k) .

To obtain y at index n, causality implies that we cannot assume knowledge of x at index n − k
for n − k > n, or simply, for k < 0. Setting h(k) = 0 for k < 0 ensures this.
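A quick numerical illustration of the stability condition (added here; the exponential sequence anticipates Example 3.5.1): the partial sums of |h(n)| for h(n) = α^n u(n) converge for |α| < 1 and grow without bound otherwise.

```python
import numpy as np

def partial_abs_sum(alpha, n_terms=10_000):
    """Partial sum of |h(n)| for the causal sequence h(n) = alpha^n * u(n)."""
    n = np.arange(n_terms)
    return np.sum(np.abs(alpha) ** n)

print(partial_abs_sum(0.5))   # ~2.0 = 1/(1 - 0.5): absolutely summable, BIBO stable
print(partial_abs_sum(1.0))   # equals n_terms and keeps growing: not absolutely summable
```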
Chapter 2

Digital Processing of Continuous-Time Signals

Discrete-time signals arise in many ways, but they most commonly occur in representations
of continuous-time signals. This is partly due to the fact that the processing of continuous-
time signals is often carried out on discrete-time sequences obtained by sampling continuous-
time signals. It is remarkable that under reasonable constraints a continuous-time signal can
be adequately represented by a sequence. In this chapter we discuss the process of periodic
sampling.

2.1 Periodic Sampling


Let xc(t) be a continuous-time signal and let its Fourier transform be denoted by Xc(jΩ),

Xc(jΩ) = ∫_{−∞}^{+∞} xc(t) e^{−jΩt} dt .

The signal xc(t) is obtained from Xc(jΩ) by

xc(t) = (1/2π) ∫_{−∞}^{+∞} Xc(jΩ) e^{+jΩt} dΩ .

The typical method of obtaining a discrete-time representation of a continuous-time signal is
through periodic sampling, wherein a sequence of samples x(n) is obtained from the continuous-
time signal xc(t) according to

x(n) = xc(t)|t=nT = xc(nT),  n = 0, ±1, ±2, . . . ,  T > 0.

Herein, T is the sampling period; its reciprocal, fs = 1/T, is the sampling frequency in
samples per second, and Ωs = 2πfs is the sampling frequency in rad/sec. This equation is the
input-output relationship of an ideal A/D converter. It is generally not invertible, since many
continuous-time signals can produce the same output sequence of samples. This ambiguity can,
however, be removed by restricting the class of input signals to the sampler.
It is convenient to represent the sampling process in two stages, as shown in Figure 2.1. This
consists of an impulse train modulator which is followed by conversion of the impulse train into
a discrete sequence. The system G in Figure 2.1 represents a system that converts an impulse


xc(t) → (×) → xs(t) → G → x(n) = xc(nT), with the multiplier driven by p(t)

Figure 2.1: Continuous-Time to digital converter.

train into a discrete-time sequence. The modulation signal p(t) is a periodic impulse train,

p(t) = ∑_{n=−∞}^{∞} δ(t − nT) ,

where δ(t) is called the Dirac delta function. Consequently, we have

xs(t) = xc(t) · p(t) = xc(t) · ∑_{n=−∞}^{∞} δ(t − nT) = ∑_{n=−∞}^{∞} xc(nT) δ(t − nT) = ∑_{n=−∞}^{∞} x(n) δ(t − nT) .

The Fourier transform of xs(t), Xs(jΩ), is obtained by convolving Xc(jΩ) and P(jΩ). The
Fourier transform of a periodic impulse train is a periodic impulse train, i.e.,

P(jΩ) = (2π/T) ∑_{k=−∞}^{∞} δ(Ω − 2πk/T) = (2π/T) ∑_{k=−∞}^{∞} δ(Ω − kΩs) .

Since

Xs(jΩ) = (1/2π) Xc(jΩ) ∗ P(jΩ) ,

it follows that

Xs(jΩ) = (1/T) ∑_{k=−∞}^{∞} Xc(j(Ω − 2πk/T)) = (1/T) ∑_{k=−∞}^{∞} Xc(jΩ − jkΩs) .

The same result can be obtained by noting that p(t) is periodic, so that it can be represented
as a Fourier series with coefficients 1/T, i.e.,

p(t) = (1/T) ∑_{n=−∞}^{∞} e^{jnΩs t} .

Thus

xs(t) = xc(t) p(t) = (1/T) ∑_{n=−∞}^{∞} xc(t) e^{jnΩs t} .

Taking the Fourier transform of xs(t) leads to

Xs(jΩ) = (1/T) ∑_{k=−∞}^{∞} Xc(j(Ω − kΩs)) .

Figure 2.2 depicts the frequency domain representation of impulse train sampling. Figure
2.2(a) represents the Fourier transform of a band-limited signal where the highest non-zero
frequency component in Xc (jΩ) is at ΩB . Figure 2.2(b) shows the periodic impulse train P (jΩ),
Figure 2.2(c) shows Xs (jΩ), which is the result of convolving Xc (jΩ) with P (jΩ). From Figure
2.2(c) it is evident that when

Ωs − ΩB > ΩB , or Ωs > 2ΩB

the replicas of Xc(jΩ) do not overlap, and therefore xc(t) can be recovered from xs(t) with an
ideal low-pass filter with cut-off frequency of at least ΩB but not exceeding Ωs − ΩB. In
Figure 2.2(d), however, the replicas of Xc(jΩ) overlap, and as a result the original signal cannot be recovered
by ideal low-pass filtering. The reconstructed signal is then related to the original continuous-time
signal through a distortion referred to as aliasing.
If xc(t) is a band-limited signal with no components above ΩB rad/sec, then the sampling
frequency has to be equal to or greater than 2ΩB rad/sec. This is known as the Nyquist criterion;
ΩB is referred to as the Nyquist frequency and 2ΩB as the Nyquist rate.
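The ambiguity caused by undersampling is easy to demonstrate. In the sketch below (all numerical values are assumptions chosen for illustration), a 3 Hz tone and a 13 Hz tone sampled at fs = 10 Hz produce sample-for-sample identical sequences: the 13 Hz tone violates the Nyquist criterion and aliases onto the 3 Hz tone.

```python
import numpy as np

fs = 10.0                      # assumed sampling frequency in Hz
T = 1.0 / fs
n = np.arange(50)

f1 = 3.0                       # below the Nyquist frequency fs/2 = 5 Hz
f2 = f1 + fs                   # 13 Hz: violates fs > 2*f2

x1 = np.cos(2 * np.pi * f1 * n * T)
x2 = np.cos(2 * np.pi * f2 * n * T)

# Two different continuous-time signals, but identical samples:
assert np.allclose(x1, x2)     # aliasing: f2 is indistinguishable from f1
```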

2.2 Reconstruction of Band-Limited Signals


If x(n) is the input to an ideal low pass filter with frequency response Hr (jΩ) and impulse
response hr (t), then the output of the filter will be

X
xr (t) = x(n)hr (t − nT ) .
n=−∞

where T is the sampling interval. The reconstruction filter commonly has a gain of T and a
cutoff frequency of Ωc = Ωs /2 = π/T . This choice is appropriate for any relationship between
Ωs and ΩB that avoids aliasing, i.e., so long as Ωs > 2ΩB . The impulse response of such a filter
is given by
sin (πt/T )
hr (t) = .
πt/T
Consequently, the relation between xr (t) and x(n) is given by

X sin[π(t − nT )/T ]
xr (t) = x(n) ,
n=−∞
π(t − nT )/T

with xc (nT ) = x(n).
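A minimal sketch of this reconstruction formula (illustrative values only; the infinite sum is truncated to a finite record) interpolates the samples of a 1 Hz tone between the sampling instants. Note that np.sinc(u) computes sin(πu)/(πu), which matches the form of hr(t) above.

```python
import numpy as np

fs, T = 8.0, 1.0 / 8.0                  # assumed sampling rate (well above 2 Hz)
n = np.arange(-40, 41)                  # truncated stand-in for the infinite sum
x = np.cos(2 * np.pi * 1.0 * n * T)     # samples of a 1 Hz tone

def reconstruct(t):
    # x_r(t) = sum_n x(n) * sin(pi (t - nT)/T) / (pi (t - nT)/T)
    return np.sum(x * np.sinc((t - n * T) / T))

t_test = 0.137                           # a point between sampling instants
print(reconstruct(t_test), np.cos(2 * np.pi * 1.0 * t_test))  # close agreement
```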

2.3 Discrete-time Processing of Continuous-Time Signals.


A continuous-time signal is processed by digital signal processing techniques using analog-to-
digital (A/D) and digital-to-analog (D/A) converters before and after processing. This concept
is illustrated in Figure 2.3. The low-pass filter limits the bandwidth of the continuous-time
signal to reduce the effect of aliasing. The spectral characteristics (magnitude) of the low-pass
filter are shown in Figure 2.5.

Example 2.3.1 Assume that we want to restore the signal sc(t) from yc(t) = sc(t) + nc(t),
where nc(t) is background noise. The spectral characteristics of these signals are shown in
Figure 2.4.

(a) X c (j Ω)
1

− ΩB ΩB Ω

(b)
P(j Ω)

−−−
T

−2 Ω s − Ωs Ωs 2 Ωs Ω

(c)
1 X s (j Ω)
−−−
T

ΩB Ωs Ω

(d)
X s (j Ω)
1
−−−
T

Ωs Ω

Figure 2.2: Effect in the frequency domain of sampling in the time domain: (a) Spectrum of
the continuous-time signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled
signal with Ωs > 2ΩB , (d) Spectrum of the sampled signal with Ωs < 2ΩB .

We can perform the processing for this problem using either continuous-time or digital tech-
niques. Using a continuous-time approach, we would filter yc(t) using a continuous-time
low-pass filter. Alternatively, we could adopt a digital approach, as shown in Figure 2.3.
We would first low-pass filter yc(t) to prevent aliasing, and then convert the continuous-time
signal into a discrete-time signal using an A/D converter. Any processing is then performed

Analog (or Continuous-Time) Signal → Low-pass Filter → A/D → Digital Signal Processing → D/A → Processed Continuous-Time Signal

Figure 2.3: Discrete-time processing of continuous-time signals.

digitally on the discrete-time signal. The processed signal is then converted back to continuous
form using a D/A converter.

Figure 2.4: Frequency Spectra.

Figure 2.5: Continuous-Time Filter.

The advantages of processing signals digitally as compared with an analog approach are:

1. Flexibility: if we want to change a continuous-time filter because of a change in signal and
noise characteristics, we have to change hardware components. Using the digital approach,
we only need to modify the software.

2. Better control of accuracy requirements. Tolerances in continuous-time circuit components
make it extremely difficult for the system designer to control the accuracy of the system.

3. The signals can be stored without deterioration or loss of signal quality.

4. Lower cost of the digital implementation.


Chapter 3

Filter Design

3.1 Introduction
Filters are an important class of linear time-invariant systems. Their objective is to modify the
input signal. In practice filters are frequency selective, meaning they pass certain frequency
components of the input signal and totally reject others. The objective of this chapter and the
following ones is to introduce the concepts of frequency selective filters and the most popular
filter design techniques.
The design of filters (continuous-time or digital) involves the following three steps.

1. Filter specification
This step consists of setting the required characteristics of the filter. For example, to
restore a signal which has been degraded by background noise, the filter characteristics will
depend on the spectral characteristics of both the signal and background noise. Typically
(but not always) specifications are given in the frequency domain.

2. Filter design
Determine the unit sample response h(n) of the filter, or its system function H(z) and
frequency response H(ejω).

3. Filter implementation
Implement a discrete-time system with the given h(n) or H(z), or one which approximates
it.

These three steps are closely related. For instance, it does not make much sense to specify a
filter that cannot be designed or implemented. So, we shall consider a particular class of digital
filters which possess the following practical properties:

1. h(n) is real.

2. h(n) is causal, i.e., h(n) = 0 for n < 0.

3. h(n) is stable, i.e.,

∑_{n=0}^{∞} |h(n)| < ∞ .


In a practical setting, the desired filter is often implemented digitally on a chip and is used
to filter a signal derived from a continuous-time signal by means of periodic sampling followed
by analog-to-digital conversion. For this reason, we refer to digital filters as discrete-time filters
even though the underlying design techniques relate to the discrete-time nature of the signals
and systems.

xc(t) → A/D → x(n) → Discrete-Time System → y(n) → D/A → yc(t), with both converters operating with sampling period T

Figure 3.1: A basic system for discrete-time filtering of a continuous-time signal.

A basic system for discrete-time filtering of continuous-time signals is given in Figure 3.1.
Assuming the input, xc(t), is band-limited and the sampling frequency is high enough to avoid
aliasing, the overall system behaves as a linear time-invariant system with frequency response

Heff(jΩ) = { H(e^{jΩT}), |Ω| < π/T;  0, |Ω| > π/T }.

In such cases, it is straightforward to convert from specifications on the effective continuous-time
filter to specifications on the discrete-time filter through the relation ω = ΩT. That is, H(e^{jω})
is specified over one period by the equation

H(e^{jω}) = Heff(jω/T) ,  |ω| < π .
This concept will be treated in detail in Chapter 5.

3.2 Digital Filters


Throughout the text we will use the terms system and filter interchangeably.
Linear time-invariant discrete-time systems can be characterised by linear constant coefficient
difference equations with initial rest conditions of the type

∑_{k=0}^{M} a_k y(n − k) = ∑_{r=0}^{N−1} b_r x(n − r) ,   (3.1)

where x(n) and y(n), n = 0, ±1, . . ., are the input and output of the system and max(N − 1, M)
is the order.
The unit sample response of the system may be of either finite duration or infinite duration.
It is useful to distinguish between these two classes, as the properties of the digital processing
techniques for each differ. If the unit sample response is of finite duration, we have a finite
impulse response (FIR) filter, and if the unit sample response is of infinite duration, we have an
infinite impulse response (IIR) filter.
If M = 0, so that

y(n) = (1/a_0) ∑_{r=0}^{N−1} b_r x(n − r) ,   (3.2)

then the system corresponds to an FIR filter. In fact, comparison with the convolution sum

y(n) = ∑_{k=−∞}^{∞} h(k) x(n − k)

shows that Equation (3.2) is identical to the convolution sum, hence it follows that

h(n) = { b_n/a_0, n = 0, . . . , N − 1;  0, otherwise }.

An FIR filter can always be described by a difference equation of the form (3.1) with M = 0.
M must be greater than zero for an IIR filter.
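The FIR case of Equation (3.2) can be verified with a standard filtering routine. The sketch below (the coefficients are arbitrary illustrative values, not from the text) feeds a unit sample into scipy.signal.lfilter and confirms that the resulting unit sample response is b_n/a_0.

```python
import numpy as np
from scipy.signal import lfilter

b = np.array([1.0, 2.0, 3.0, 2.0, 1.0])   # assumed example coefficients b_r
a0 = 2.0

impulse = np.zeros(10)
impulse[0] = 1.0
h = lfilter(b, [a0], impulse)              # unit sample response of (3.2)

# For M = 0 the filter is FIR with h(n) = b_n / a0 for n = 0..N-1, else 0
expected = np.concatenate((b / a0, np.zeros(10 - b.size)))
assert np.allclose(h, expected)
```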

3.3 Linear Phase Filters


A digital filter with unit sample response h(n) is said to have linear phase if H(ejω ) can be
expressed as
H(e^{jω}) = HM(e^{jω}) e^{−jωα} ,  |ω| < π ,
where HM (ejω ) is a real function of ω and α is a constant. The significance of a linear phase
filter lies in the result that if a discrete-time signal is filtered by a linear-shift-invariant (LSI)
system whose transfer function has magnitude HM (ejω ) = 1 and linear phase (see Figure 3.2),
then the spectrum of the output is

s(n) → [HM(e^{jω}) = 1, delay α] → y(n)

Figure 3.2: Linear and time-invariant filtering of s(n).

S(ejω )e−jωα ↔ s(n − α) = y(n) .

Simply put, a linear phase filter preserves the shape of the signal. Should the filter have the
property that |H(ejω )| = 1, but does not have linear phase, then the shape of the signal will be
distorted. We generalise the notion of linear phase to

H(e^{jω}) = HM(e^{jω}) e^{−jωα} e^{jβ} = HM(e^{jω}) e^{−jφ(ω)} ,   φ(ω) = −β + ωα ,

where HM(e^{jω}) is a real function of ω, and α and β are constants. The case where β = 0 is
referred to as linear phase, as opposed to generalised linear phase, where we have

H(e^{jω}) = HM(e^{jω}) cos(β − ωα) + j HM(e^{jω}) sin(β − ωα)
          = ∑_{n=−∞}^{∞} h(n) e^{−jωn}
          = ∑_{n=−∞}^{∞} h(n) cos(ωn) − j ∑_{n=−∞}^{∞} h(n) sin(ωn) ,   h(n) real,

leading to

tan(β − ωα) = sin(β − ωα) / cos(β − ωα) = − [ ∑_{n=−∞}^{∞} h(n) sin(ωn) ] / [ ∑_{n=−∞}^{∞} h(n) cos(ωn) ]

and

∑_{n=−∞}^{∞} h(n) sin(ω(n − α) + β) = 0 .   (3.3)
To have φ(ω) = −β + ωα, (3.3) must hold for all ω, and is a necessary, but not sufficient,
condition on h(n), α and β. For example, one can show that
β = 0 or π , 2α = N − 1 , h(2α − n) = h(n) (3.4)
satisfy (3.3). Alternatively,

β = π/2 or 3π/2 ,   2α = N − 1 ,   h(2α − n) = −h(n)   (3.5)

also satisfies (3.3).
To have a causal system we require

∑_{n=0}^{∞} h(n) sin(ω(n − α) + β) = 0

to hold for all ω. Causality and the previous conditions (3.4) and (3.5) imply that h(n) = 0 for n < 0
and n > N − 1, i.e., causal FIR filters have generalised linear phase if they have impulse response
length N and satisfy either h(2α − n) = h(n) or h(2α − n) = −h(n). Specifically, if

h(n) = { h(N − 1 − n), 0 ≤ n ≤ N − 1;  0, otherwise },

then

H(e^{jω}) = He(e^{jω}) e^{−jω(N−1)/2} ,

where He(e^{jω}) is a real, even, periodic function of ω. Similarly, if

h(n) = { −h(N − 1 − n), 0 ≤ n ≤ N − 1;  0, otherwise },

then

H(e^{jω}) = j HO(e^{jω}) e^{−jω(N−1)/2} = HO(e^{jω}) e^{−jω(N−1)/2 + jπ/2} ,

where HO(e^{jω}) is a real, odd, periodic function of ω. The above two equations for h(n) are
sufficient, but not necessary, conditions which guarantee a causal system with generalised linear
phase.

3.3.1 Generalised Linear Phase for FIR Filters


We shall restrict our discussion to linear phase FIR filters. We classify FIR filters into four
types depending on whether h(n) is symmetric (h(n) = h(N − 1 − n)) or anti-symmetric (h(n) =
−h(N − 1 − n)) about the point of symmetry or anti-symmetry, and whether the filter length is
odd or even. The four types of FIR filters and their associated properties are discussed below
and summarised in Table 3.1. Figure 3.3 shows examples of the four types of FIR filters.

Type I FIR linear phase filter. A type I filter has a symmetric impulse response

h(n) = h(N − 1 − n) ,   0 ≤ n ≤ N − 1 ,

with N an odd integer. The corresponding frequency response is

H(e^{jω}) = ∑_{n=0}^{N−1} h(n) e^{−jωn} = e^{−jω(N−1)/2} ∑_{k=0}^{(N−1)/2} a(k) cos(ωk) ,

where a(0) = h((N−1)/2) and a(k) = 2h((N−1)/2 − k), k = 1, 2, . . . , (N−1)/2.

Type II FIR linear phase filter. A type II filter has a symmetric impulse response with N
an even integer. The corresponding frequency response is

H(e^{jω}) = e^{−jω(N−1)/2} ∑_{k=1}^{N/2} b(k) cos[ω(k − 1/2)] ,

where

b(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.

Type III FIR linear phase filter. A type III filter has an antisymmetric impulse response

h(n) = −h(N − 1 − n) ,   0 ≤ n ≤ N − 1 ,

with N an odd integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} ∑_{k=1}^{(N−1)/2} c(k) sin(ωk) ,

where

c(k) = 2h((N − 1)/2 − k), k = 1, 2, . . . , (N − 1)/2.

Type IV FIR linear phase filter. A type IV filter has an antisymmetric impulse response
with N an even integer. The corresponding frequency response is

H(e^{jω}) = j e^{−jω(N−1)/2} ∑_{k=1}^{N/2} d(k) sin[ω(k − 1/2)] ,

where

d(k) = 2h(N/2 − k), k = 1, 2, . . . , N/2.

Table 3.1: The four classes of FIR filters and their associated properties

Type | Symmetry of h(n)                     | N    | α         | β           | Form of HM(e^{jω})                  | Constraints on HM(e^{jω}) · e^{jβ}
-----|--------------------------------------|------|-----------|-------------|-------------------------------------|------------------------------------
I    | h(n) = h(N − 1 − n), symmetric       | odd  | (N − 1)/2 | 0 or π      | ∑_{n=0}^{(N−1)/2} a(n) cos(ωn)      | real
II   | h(n) = h(N − 1 − n), symmetric       | even | (N − 1)/2 | 0 or π      | ∑_{n=1}^{N/2} b(n) cos(ω(n − 1/2))  | real, HM(e^{jπ}) = 0
III  | h(n) = −h(N − 1 − n), anti-symmetric | odd  | (N − 1)/2 | π/2 or 3π/2 | ∑_{n=1}^{(N−1)/2} c(n) sin(ωn)      | pure imaginary, HM(e^{j0}) = HM(e^{jπ}) = 0
IV   | h(n) = −h(N − 1 − n), anti-symmetric | even | (N − 1)/2 | π/2 or 3π/2 | ∑_{n=1}^{N/2} d(n) sin(ω(n − 1/2))  | pure imaginary, HM(e^{j0}) = 0

Remarks
1. A filter will have no phase distortion if H(ejω ) is real and even since h(n) will be real and
even.
2. Non-causal FIR filters can be made causal by an appropriate time shift so that the first
non-zero value occurs at n = 0. The point of symmetry (N − 1)/2 can therefore be either an
integer or lie halfway between two samples, depending on whether N is odd or even.

Example 3.3.1 Example of a type I FIR filter. Given a filter with unit sample response

h(n) = { 1, 0 ≤ n ≤ 4;  0, otherwise },

we can evaluate the frequency response to be

H(e^{jω}) = ∑_{n=0}^{4} e^{−jωn} = (1 − e^{−j5ω}) / (1 − e^{−jω}) = e^{−j2ω} · sin(5ω/2) / sin(ω/2) .

The magnitude and the phase of the frequency response of the filter are depicted in Figure 3.4.
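The closed form in Example 3.3.1 can be cross-checked numerically, e.g. with scipy.signal.freqz; the sketch below is added for illustration.

```python
import numpy as np
from scipy.signal import freqz

h = np.ones(5)                      # type I impulse response of Example 3.3.1
w, H = freqz(h, worN=1024)          # H(e^{jw}) on a grid over [0, pi)

# Closed form from the example: e^{-j2w} * sin(5w/2) / sin(w/2)
with np.errstate(invalid="ignore", divide="ignore"):
    H_closed = np.exp(-2j * w) * np.sin(5 * w / 2) / np.sin(w / 2)
H_closed[0] = 5.0                   # limit of the 0/0 expression at w = 0

assert np.allclose(H, H_closed, atol=1e-9)
```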

Example 3.3.2 Example of a type II FIR filter. Consider an extension, by one sample, of the
impulse response of the filter in Example 3.3.1. Then the frequency response of the system is
given by

H(e^{jω}) = ∑_{n=0}^{5} e^{−jωn} = e^{−j5ω/2} · sin(3ω) / sin(ω/2) ,

whose magnitude and phase are depicted in Figure 3.5.



Figure 3.3: Examples of the four types of linear phase FIR filters.

Example 3.3.3 Example of a type III FIR filter. Consider a filter with unit impulse response

h(n) = δ(n) − δ(n − 2) ,

where δ(n) is Kronecker's delta, defined as δ(n) = { 1, n = 0;  0, n ≠ 0 }.

We can then evaluate the frequency response as

H(e^{jω}) = 1 − e^{−j2ω} = j[2 sin(ω)] e^{−jω} = 2 e^{−j(ω − π/2)} sin(ω) .

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.6.
Figure 3.4: Frequency response of type I filter of Example 3.3.1. Magnitude (left). Phase (right).
Figure 3.5: Frequency response of type II filter of Example 3.3.2. Magnitude (left). Phase (right).
Figure 3.6: Frequency response of type III filter of Example 3.3.3. Magnitude (left). Phase (right).

Example 3.3.4 Example of a type IV FIR filter. Consider a filter with unit impulse response

h(n) = δ(n) − δ(n − 1) .

We can then evaluate the frequency response as

H(e^{jω}) = 1 − e^{−jω} = j[2 sin(ω/2)] e^{−jω/2} = 2 e^{−j(ω/2 − π/2)} sin(ω/2) .

The magnitude and the phase of H(e^{jω}) are depicted in Figure 3.7.

Figure 3.7: Frequency response of type IV filter of Example 3.3.4. Magnitude (left). Phase (right).

3.4 Filter Specification


Like continuous-time filters, digital filters are specified in the frequency domain. Since H(ejω ) =
H(ej(ω+2π) ) for all ω, H(ejω ) is completely specified for 0 ≤ ω < 2π. In addition, since h(n) is
real, we have H(ejω ) = H(e−jω )∗ . This implies that specifying H(ejω ) for 0 ≤ ω < π completely
specifies H(ejω ) for all ω. To uniquely specify the complex frequency response H(ejω ), we need
to specify both the magnitude and the phase of H(ejω ). However, since we are dealing with
linear phase FIR filters, we need only to specify the magnitude.
The ideal structure (magnitude of the frequency response) of a filter is given in Figure 3.8
for the low-pass, high-pass, band-pass and stop-band.

Example 3.4.1 Specification of a low-pass filter.


The ideal low-pass filter is composed of a pass-band and stop-band region. In practice we also
have a transition band, as a sharp transition cannot be achieved. Ideally, |H(ejω )| is unity in
the passband and zero in the stopband. In practice, we supply error tolerances that require
1 − α1 ≤ |H(ejω )| ≤ 1 + α1 for 0 ≤ ω ≤ ωp ,
and
|H(ejω )| < α2 for ωs ≤ ω ≤ π ,
where α1 and α2 are called the passband tolerance and stopband tolerance, respectively. A low-
pass filter is then completely specified by 4 parameters α1 , α2 , ωp and ωs . Figures 3.9 and 3.10
illustrate the concept of tolerance bands in filter design.
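In practice, one checks a designed filter against such a specification by sampling its frequency response on a dense grid. The sketch below uses a Hamming-window design as a stand-in; all numerical values (α1, α2, ωp, ωs and the filter length) are assumptions chosen for illustration, not values from the text.

```python
import numpy as np
from scipy.signal import firwin, freqz

# Illustrative specification (all numbers assumed)
wp, ws = 0.4 * np.pi, 0.6 * np.pi      # passband / stopband edges in rad
alpha1, alpha2 = 0.05, 0.01            # passband / stopband tolerances

h = firwin(numtaps=41, cutoff=0.5, window="hamming")  # cutoff relative to Nyquist
w, H = freqz(h, worN=2048)
mag = np.abs(H)

passband = mag[w <= wp]
stopband = mag[w >= ws]
print("passband ok:", np.all((passband >= 1 - alpha1) & (passband <= 1 + alpha1)))
print("stopband ok:", np.all(stopband < alpha2))
```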

Figure 3.8: Frequency response of ideal low-pass, high-pass, band-pass, and band-stop filters.

A similar arrangement holds for high-pass, band-stop and band-pass filters (see Figure 3.11).

3.5 Properties of IIR Filters


As discussed, an IIR filter has a unit sample response of infinite duration. In general, an IIR
filter with arbitrary unit sample response h(n) cannot be realised, since computing each output
sample via the convolution sum would require an unbounded number of arithmetic operations. Therefore we restrict ourselves to the
class of filters which:

1. have a unit sample response h(n) that is real, causal, and stable.

Figure 3.9: Specifications of a low-pass filter.

Figure 3.10: Low-pass filter with transition band.

2. possess a rational z-transform with a0 = 1, i.e.,

H(z) = [ ∑_{k=0}^{N−1} b_k z^{−k} ] / [ 1 − ∑_{k=1}^{M} a_k z^{−k} ] .

A causal h(n) with a rational z-transform can be realised by a linear constant coefficient
difference equation (LCCDE) with initial rest conditions (IRC), where the output is computed recursively. Thus IIR filters can be implemented efficiently.

Example 3.5.1 Consider a causal impulse response described by

h(n) = (1/2)^n · u(n) ,

with H(z) given by

H(z) = 1 / (1 − (1/2) z^{−1}) .



Figure 3.11: Tolerance scheme for digital filter design. (a) Low-pass. (b) High-pass. (c) Band-pass. (d) Band-stop.

This can be realised by

   y(n) = \frac{1}{2}\, y(n-1) + x(n)

assuming IRC. Note that although h(n) is of infinite extent, the number of arithmetic operations required per output sample is only one multiplication and one addition using an LCCDE formulation.
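A minimal sketch of this recursion in Python (the function name and test signal are illustrative, and zero initial conditions are assumed):

```python
import numpy as np

def iir_first_order(x, a=0.5):
    """Recursive realisation of y(n) = a*y(n-1) + x(n) under initial rest conditions."""
    y = np.zeros(len(x))
    prev = 0.0                      # y(-1) = 0 (initial rest)
    for n, xn in enumerate(x):
        prev = a * prev + xn        # one multiplication and one addition per output
        y[n] = prev
    return y

# Filtering a unit sample recovers h(n) = (1/2)^n u(n):
delta = np.zeros(8)
delta[0] = 1.0
print(iir_first_order(delta))       # [1. 0.5 0.25 0.125 ...]
```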

3.5.1 Stability
An FIR filter is stable if h(n) is finite for all n; therefore stability is not a problem in practical design or implementation. With an IIR filter, all the poles of H(z) must lie inside the unit circle to ensure filter stability.

3.5.2 Linear phase


It is simple to achieve linear phase with FIR filters, to the point that we require all FIR filters
to possess this property. It is much more difficult to control the phase of an IIR filter. As a
result, we specify only the magnitude response for the IIR filter and accept the resulting phase
characteristics.
Chapter 4

Finite Impulse Response Filter Design

4.1 Introduction
The filter design problem is that of determining h(n), H(ejω ) or H(z) to meet the design speci-
fication. The first step is to choose the type of filter from the previously discussed four types of
FIR filters.
Some types of filters may not be suitable for a given design specification. For example,
from Table 3.1 a Type II filter is not suitable for a high-pass filter design as it will always have
H(ejω ) = 0 at ω = π, since cos(π(n − 1/2)) = 0. Table 4.1 shows which types of filter are not suitable for a given class of filter.

Table 4.1: Suitability of a given class of filter for the four FIR types

Type  | Low-pass     | High-pass    | Band-pass | Band-stop
I     |              |              |           |
II    |              | Not suitable |           | Not suitable
III   | Not suitable | Not suitable |           | Not suitable
IV    | Not suitable |              |           | Not suitable

The two standard approaches for FIR filter design are:

1. The window method.

2. The optimal filter design method.

We now discuss each of these approaches in turn.

4.2 Design of FIR Filters by Windowing


Many idealised systems are defined by piecewise-constant or piecewise-functional frequency responses with discontinuities at the boundaries between bands. As a result, they have non-causal impulse responses that are infinitely long. The most straightforward approach to obtaining a causal FIR approximation to such systems is to truncate the ideal response.


Let Hd (ejω ) be the desired frequency response of the filter. Then

   H_d(e^{j\omega}) = \sum_{n=-\infty}^{\infty} h_d(n)\, e^{-j\omega n} ,

where hd (n) is the corresponding unit sample response sequence given by

   h_d(n) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_d(e^{j\omega})\, e^{j\omega n}\, d\omega .

The simplest way to obtain a causal FIR filter from hd (n) is to define a new system with unit sample response

   h(n) = \begin{cases} h_d(n), & 0 \le n \le N-1 , \\ 0, & \text{elsewhere} , \end{cases}

which can be re-written as

   h(n) = h_d(n) \cdot w(n) ,

and therefore

   H(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_d(e^{j\theta})\, W(e^{j(\omega-\theta)})\, d\theta .

Note that H(ejω ) represents a smeared version of Hd (ejω ). If hd (n) is symmetric (or anti-symmetric) with respect to a certain point of symmetry (or anti-symmetry) and w(n) is also symmetric with respect to the same point, then hd (n) · w(n) is the unit sample response of a linear phase filter.

Example 4.2.1 Low-pass filter design with specifications α1 , α2 , ωp , ωs .


We choose a Type I filter for the design. Since the desired frequency response Hd (ejω ) should be consistent with the type of filter used, from Table 3.1, Hd (ejω ) is given by

   H_d(e^{j\omega}) = \begin{cases} e^{-j\omega\alpha}, & 0 \le |\omega| \le \omega_c , \\ 0, & \omega_c < |\omega| \le \pi , \end{cases}

where α = (N − 1)/2, ωc = (ωp + ωs )/2, and N is the filter length (odd). The inverse Fourier transform of Hd (ejω ) is

   h_d(n) = \frac{\sin \omega_c (n-\alpha)}{\pi (n-\alpha)} .

If we apply a rectangular window to hd (n), the resulting filter impulse response h(n) is

   h(n) = \begin{cases} \dfrac{\sin \omega_c (n-\alpha)}{\pi (n-\alpha)} , & 0 \le n \le N-1 , \\ 0 , & \text{otherwise} . \end{cases}

The sequence h(n) is shown in Figure 4.1 for N = 7 and ωc = π/2. Note that this filter may or may not meet the design specification.


Figure 4.1: The sequence h(n).
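A short sketch of this computation (the function name is hypothetical; NumPy's normalised sinc avoids the 0/0 case at n = α):

```python
import numpy as np

def ideal_lowpass_fir(N, wc):
    """Rectangular-windowed ideal low-pass impulse response (Type I: N odd)."""
    alpha = (N - 1) / 2
    n = np.arange(N)
    # (wc/pi)*sinc(wc*(n-alpha)/pi) equals sin(wc*(n-alpha))/(pi*(n-alpha))
    return (wc / np.pi) * np.sinc(wc * (n - alpha) / np.pi)

print(np.round(ideal_lowpass_fir(7, np.pi / 2), 3))
# [-0.106  0.     0.318  0.5    0.318  0.    -0.106], cf. Figure 4.1
```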

4.3 Importance of the Window Shape and Length


In the time-domain we have

   h(n) = h_d(n) \cdot w(n) ,

which corresponds in the frequency domain to

   H(e^{j\omega}) = H_d(e^{j\omega}) \star W(e^{j\omega}) = \frac{1}{2\pi}\int_{-\pi}^{\pi} H_d(e^{j\theta})\, W(e^{j(\omega-\theta)})\, d\theta .   (4.1)

If we do not truncate at all so that w(n) = 1 for all n, W (ejω ) is a periodic impulse train with
period 2π, and therefore H(ejω ) = Hd (ejω ). This interpretation suggests that if w(n) is chosen
so that W (ejω ) is concentrated in a narrow band of frequencies around ω = 0, then H(ejω )
approximates Hd (ejω ) very closely except where Hd (ejω ) changes very abruptly. Thus, the
choice of window is governed by the desire to have w(n) as short as possible in duration so as to
minimise computation in the implementation of the filter while having W (ejω ) approximate an
impulse; that is, we want W (ejω ) to be highly concentrated in frequency so that the convolution
(4.1) faithfully reproduces the desired frequency response. These are conflicting requirements.
Take, for example, the case of the rectangular window, where

   W(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = \frac{1 - e^{-j\omega N}}{1 - e^{-j\omega}} = e^{-j\omega (N-1)/2}\, \frac{\sin(\omega N/2)}{\sin(\omega/2)} .

The magnitude of the function sin(ωN/2)/ sin(ω/2) is shown in Figure 4.2 for the case N = 8.
Note that W (ejω ) for the rectangular window has a generalised linear phase. As N increases,
the width of the “main-lobe” decreases. The main-lobe is usually defined as the region between
the first zero crossing on either side of the origin. For the rectangular window, the main-lobe
width is ∆ω = 4π/N . However, for the rectangular window, the side lobes are large and, in
fact, as N increases, the peak amplitudes of the main-lobe and the side-lobes grow in a manner
such that the area under each lobe is a constant even though the width of each lobe decreases
with N . Consequently, as W (ej(ω−θ) ) “slides by” a discontinuity of Hd (ejω ) with increasing ω,
the integral of W (ej(ω−θ) )Hd (ejω ) will oscillate as each side-lobe of W (ej(ω−θ) ) moves past the
discontinuity.

Figure 4.2: Magnitude of the Fourier transform of a rectangular window (N = 8).

We can reduce this effect by tapering the window smoothly to zero at each end so that the height of the side-lobes is diminished; however, this is achieved at the expense of a wider main-lobe and thus a wider transition at the discontinuity. Five windows are commonly used in practice: the rectangular, Bartlett, Hanning, Hamming and Blackman windows. The formulas for these windows are given below.

Rectangular

   w(n) = \begin{cases} 1, & 0 \le n \le N-1 , \\ 0, & \text{elsewhere}. \end{cases}

Bartlett

   w(n) = \begin{cases} \dfrac{2n}{N-1}, & 0 \le n \le \dfrac{N-1}{2} , \\ 2 - \dfrac{2n}{N-1}, & \dfrac{N-1}{2} \le n \le N-1 . \end{cases}

Hanning

   w(n) = \frac{1}{2}\left(1 - \cos\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1 .

Hamming

   w(n) = 0.54 - 0.46\cos\frac{2\pi n}{N-1}, \quad 0 \le n \le N-1 .

Blackman

   w(n) = 0.42 - 0.5\cos\frac{2\pi n}{N-1} + 0.08\cos\frac{4\pi n}{N-1}, \quad 0 \le n \le N-1 .
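NumPy provides the tapered windows under the same definitions (np.bartlett, np.hanning, np.hamming, np.blackman), so the formulas above can be checked directly; a small sketch for the Hamming case:

```python
import numpy as np

N = 51
n = np.arange(N)
# Hamming window from the formula above ...
w_formula = 0.54 - 0.46 * np.cos(2 * np.pi * n / (N - 1))
# ... agrees with NumPy's built-in definition
assert np.allclose(w_formula, np.hamming(N))
```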

Figure 4.3 compares these window functions for N = 200. A good choice of the window
corresponds to choosing a window with small main-lobe width and small side-lobe amplitude.
Clearly, the shape of a window affects both main-lobe and side-lobe behavior.
Figures 4.4, 4.5, 4.6, 4.7 and 4.8 show these windows in the frequency domain for N = 51,
using a 256-point FFT. For comparison, Figures 4.9 and 4.10 show |W (ejω )| of the Hamming


Figure 4.3: Commonly used windows for FIR filter design (N = 200).

window for N = 101 and 201, respectively. This shows that main-lobe behavior is affected
primarily by window length.
The effect of the shape and size of the window can be summarised as follows:

1. Window shape controls side-lobe as well as main-lobe behavior.

2. Window size controls main-lobe behavior only.

Since side-lobe behavior and therefore passband and stop-band tolerance is affected only by
the shape of the window, the passband and stop-band requirements dictate which window is
chosen. The window size is then determined from the transition width requirements. Table 4.2
(column 1 and 2) contains information useful in choosing the shape and length of the window.

Table 4.2: Properties of different windows.

Window Type  | Peak Side-lobe Amplitude (dB, relative) | Approximate Main-lobe Width ∆ωm (rad) | Peak Approximation Error 20 log10(min(α1 , α2 )) (dB)
Rectangular  | -13 | 4π/N        | -21
Bartlett     | -25 | 8π/(N − 1)  | -25
Hanning      | -31 | 8π/(N − 1)  | -44
Hamming      | -41 | 8π/(N − 1)  | -53
Blackman     | -57 | 12π/(N − 1) | -74
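The side-lobe column can be reproduced numerically: locate the main-lobe edge at the first local minimum of |W(e^{jω})| and take the largest value beyond it. A sketch under these assumptions (window length and FFT size are arbitrary choices):

```python
import numpy as np

def peak_sidelobe_db(w, nfft=8192):
    """Peak side-lobe level of a window, in dB relative to the main lobe."""
    W = np.abs(np.fft.rfft(w, nfft))
    W /= W.max()
    edge = np.argmax(np.diff(W) > 0)    # first index where |W| starts rising again
    return 20 * np.log10(W[edge:].max())

N = 51
for name, w in [("rectangular", np.ones(N)), ("bartlett", np.bartlett(N)),
                ("hanning", np.hanning(N)), ("hamming", np.hamming(N)),
                ("blackman", np.blackman(N))]:
    print(f"{name:12s} {peak_sidelobe_db(w):6.1f} dB")  # approx -13, -25, -31, -41, -57
```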

Example 4.3.1 Filter design problem.




Figure 4.4: Fourier transform (magnitude) of rectangular window (N = 51).


Figure 4.5: Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).


Figure 4.6: Fourier transform (magnitude) of Hanning window (N = 51).




Figure 4.7: Fourier transform (magnitude) of Hamming window (N = 51).


Figure 4.8: Fourier transform (magnitude) of Blackman window (N = 51).


Figure 4.9: Fourier transform of a Hamming window with window length of N = 101.


Figure 4.10: Fourier transform of a Hamming window with window length of N = 201.

We have to design a low-pass filter with a required stop-band attenuation of −50 dB. Which window is most appropriate?

The third column of Table 4.2 contains information on the best stop-band (or passband)
attenuation we can expect from a given window shape. Therefore, we can see that the rectangular,
Bartlett and Hanning windows will not meet the specifications. The Hamming window may just
meet the requirements, but the Blackman window will clearly be the most appropriate.
From the second column of Table 4.2, the required window length can be derived. The distance between the peak ripples on either side of a discontinuity is approximately equal to the main-lobe width ∆ωm . The transition width ∆ω = ωs − ωp is therefore somewhat less than the main-lobe width. For example, for the Blackman window, N can be determined from the relationship 12π/(N − 1) ≃ ∆ω, so that N ≈ 12π/∆ω + 1 points.
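As a quick numerical check of this rule of thumb (the transition width below is an assumed example value):

```python
import numpy as np

dw = 0.2 * np.pi                          # assumed transition width ws - wp
N = int(np.ceil(12 * np.pi / dw)) + 1     # from 12*pi/(N - 1) ~ dw for the Blackman window
print(N)                                  # 61
```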

4.4 Kaiser Window Filter Design


We can quantify the trade-off between main-lobe width and side-lobe area by seeking the window
function that is maximally concentrated around ω = 0 in the frequency domain. Kaiser proposed
the window function

   w(n) = \begin{cases} \dfrac{I_0\!\left[\beta\left(1 - \left[(n-\alpha)/\alpha\right]^2\right)^{1/2}\right]}{I_0(\beta)}, & 0 \le n \le N-1 , \\ 0, & \text{elsewhere} , \end{cases}

where α = (N − 1)/2, and I0 (·) is the zeroth-order modified Bessel function of the first kind. In
contrast to the previously discussed windows, this one has two parameters; the length N , and
a shape parameter β. Thus, by varying N and β we can trade side-lobe amplitude for main-
lobe width. Figure 4.11 shows the trade-off that can be achieved. If the window is tapered
more, the side-lobes of the Fourier transform become smaller, but the main-lobe becomes wider.
Alternatively, we can hold β constant and increase N , causing the main-lobe to decrease in
width without affecting the amplitude of the side-lobes.

Figure 4.11: (a) Kaiser windows for β = 0 (solid), 3 (semi-dashed), and 6 (dashed) and N = 21.
(b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows
with β = 6 and N = 11 (solid), 21 (semi-dashed), and 41 (dashed).

Kaiser obtained formulae that permit the filter designer to estimate the values of N and β needed to meet a given filter specification. Consider a frequency selective filter
with the specifications shown in Figure 3.11(a). Kaiser found that over a usefully wide range
of conditions α1 is determined by the choice of β. Given α1 , the passband cutoff frequency ωp
of the low-pass filter is defined to be the highest frequency such that |H(ejω )| ≥ 1 − α1 . The
stop-band cutoff frequency ωs is defined to be the lowest frequency such that |H(ejω )| ≤ α2 .
The transition region width is therefore

∆ω = ωs − ωp

for the low-pass filter approximation. With A being the negative peak approximation error (pae)

A = −20 log10 min (α1 , α2 ) ,

Kaiser determined empirically that, to achieve a specified value of A, we require

   \beta = \begin{cases} 0.1102(A - 8.7), & A > 50 , \\ 0.5842(A-21)^{0.4} + 0.07886(A-21), & 21 \le A \le 50 , \\ 0, & A < 21 . \end{cases}

To achieve prescribed values of A and ∆ω, N must satisfy

   N = \frac{A - 8}{2.285\,\Delta\omega} + 1 ,

rounded up to the nearest integer. With these formulae, no iteration or trial and error is required.
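These empirical formulae translate directly into code. The sketch below (function name hypothetical) reproduces the values of Example 4.4.1; SciPy users can alternatively call scipy.signal.kaiserord, which implements essentially the same estimate:

```python
import numpy as np

def kaiser_params(alpha1, alpha2, dw):
    """Kaiser's design formulae: tolerances -> (beta, N); dw is the transition width in rad."""
    A = -20 * np.log10(min(alpha1, alpha2))       # peak approximation error in dB
    if A > 50:
        beta = 0.1102 * (A - 8.7)
    elif A >= 21:
        beta = 0.5842 * (A - 21) ** 0.4 + 0.07886 * (A - 21)
    else:
        beta = 0.0
    N = int(np.ceil((A - 8) / (2.285 * dw))) + 1
    return beta, N

print(kaiser_params(0.001, 0.001, 0.2 * np.pi))   # (5.653..., 38), cf. Example 4.4.1
```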

Example 4.4.1 Given the specifications of a low-pass filter ωp = 0.4π, ωs = 0.6π, α1 = α2 = 0.001 and ∆ω = 0.2π, find the unit sample response using Kaiser’s method. We find

   A = -20\log_{10}\alpha_1 = 60

and β = 5.653, N = 38. Then

   h(n) = \begin{cases} \dfrac{\sin\omega_c(n-\alpha)}{\pi(n-\alpha)}\cdot\dfrac{I_0\!\left[\beta\left(1-[(n-\alpha)/\alpha]^2\right)^{1/2}\right]}{I_0(\beta)}, & 0 \le n \le N-1 , \\ 0, & \text{elsewhere} , \end{cases}

where α = (N − 1)/2 = 37/2 = 18.5 and ωc = (ωs + ωp )/2. Since N is an even integer, the resulting linear phase system is of type II. Figure 4.12 shows h(n) and H(ejω ).

Example 4.4.2 Design of a high-pass filter.


In the following, we consider a high-pass filter with

   ωs = 0.35π, ωp = 0.5π, α1 = α2 = 0.021.

Again using the formulae proposed by Kaiser, we get β = 2.6 and N = 25. Figure 4.13 shows h(n) and H(ejω ). The resulting peak error is 0.0213 > α1 = α2 = 0.021, so the specification is not met, and we increase N to 26. In this case, however, the filter would be of type II, which is not appropriate for a high-pass design. Therefore we use N = 27, which exceeds the required specifications.

Figure 4.12: Response functions. (a) Impulse response. (b) Log magnitude.

The design of FIR filters by windowing is straightforward to apply and it is quite general
even though it has a number of limitations. However, we often wish to design a filter that is the
“best” that can be achieved for a given value of N . We describe two methods that attempt to
improve upon the windowing method by introducing a new criterion of optimality.


Figure 4.13: Kaiser window estimate. (a) Impulse response. (b) Log magnitude.

4.5 Optimal Filter Design.


As seen, the design of a filter with the window method is straightforward and easy to handle, but it is not optimal in the sense of minimizing the maximum error. Through the Fourier approximation, any filter design which uses windowing attempts to minimize the integrated error

   \epsilon^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi}\left|H_d(e^{j\omega}) - H(e^{j\omega})\right|^2 d\omega .
In many applications we get a better result by minimizing the maximum error. Using the
windowing method and Fourier approximation we always get the Gibbs phenomenon which
implies that the maximum error will be at discontinuities of Hd (ejω ). Better results can be
achieved if the approximation error is spaced equally over the whole frequency band.
In the following section only type I linear phase FIR filters are taken into account but the
implementation of other linear phase FIR filters is straightforward.
Take a non-causal FIR filter with an impulse response satisfying h(n) = h(−n). Its frequency response is

   H(e^{j\omega}) = \sum_{n=-L}^{L} h(n)\, e^{-j\omega n}
                  = h(0) + \sum_{n=1}^{L} 2h(n)\cos(\omega n)
                  = \sum_{n=0}^{L} a(n)\cos(\omega n)
                  = \sum_{n=0}^{L} a(n)\left[\sum_{k=0}^{n} \beta_{kn}\cos^k(\omega)\right]
                  = \sum_{n=0}^{L} \check{a}(n)\cos^n(\omega) ,

which is real and even. The filter specifications for Hd (ejω ) are shown in Fig. 4.14.
To design the filter, the Parks-McClellan algorithm formulates the problem as one of polynomial approximation, which can be solved by Chebyshev approximation over disjoint sets of frequencies (F = {0 ≤ ω ≤ ωp } ∪ {ωs ≤ ω ≤ π}). The approximation error function E(ω) is defined as

   E(\omega) := W(\omega)\left[H_d(e^{j\omega}) - H(e^{j\omega})\right] .   (4.2)

E(ω) measures the error between the optimum and the attained frequency response, weighted by W(ω), where

   W(\omega) = \begin{cases} \alpha_1/\alpha_2 , & 0 \le \omega \le \omega_p , \\ 1 , & \omega_s \le \omega \le \pi . \end{cases}


Figure 4.14: Filter Specification

The optimal H(ejω ) minimizes the maximum weighted error, which means we have to search for an H(ejω ) which fulfills

   \min_{\check{a}(n):\,0\le n\le L}\;\max_{\omega\in F}|E(\omega)| = \min_{\check{a}(n):\,0\le n\le L}\;\max_{\omega\in F}\; W(\omega)\left|H_d(e^{j\omega}) - H(e^{j\omega})\right| .

Theorem 4.5.1 (Alternation Theorem)

Let F be a closed, disjoint subset of the real axis. A necessary and sufficient condition for

   H(e^{j\omega}) = \sum_{n=0}^{L} \check{a}(n)\cos^n(\omega)

to be the unique, best weighted Chebyshev approximation of Hd (ejω ) in F is that the error function E(ω) exhibits at least L + 2 alternating frequencies in F. This means that there must exist at least L + 2 frequencies ωi in F with ω0 < ω1 < · · · < ωL+1 such that E(ωi ) = −E(ωi+1 ) and

   |E(\omega_i)| = \max_{\omega\in F}|E(\omega)| , \qquad i = 0, \ldots, L+1 .

The extrema of the alternating frequencies can be found from the derivative of E(ω). If we calculate the derivative of E(ω) and take into account that Hd (ejω ) is constant in F, we get for the extremal frequencies

   \frac{dE(\omega)}{d\omega} = \frac{d}{d\omega}\left\{W(\omega)\left[H_d(e^{j\omega}) - H(e^{j\omega})\right]\right\}
                              = -W(\omega)\,\frac{dH(e^{j\omega})}{d\omega}   (4.3)
                              = W(\omega)\sin(\omega)\sum_{n=1}^{L} n\,\check{a}(n)\cos^{n-1}(\omega) = 0 .

This means that there are at most L + 1 maxima belonging to the polynomial structure and
to the frequencies at ω = 0 and ω = π. Two more maxima will occur at ω = ωp and ω = ωs

because of the discontinuity and the inability to approximate this.


As can be seen from Eq. (4.3), the error function E(ω) and the approximation function H(ejω ) possess maxima at the same frequencies ωi . Thus, from Theorem 4.5.1, we can write the error E(ω) at the ωi , i = 0, ..., L + 1, as

   E(\omega_i) = W(\omega_i)\left[H_d(e^{j\omega_i}) - H(e^{j\omega_i})\right] = (-1)^i\rho , \qquad i = 0, \ldots, L+1 ,
and can express the ideal frequency response as

   H_d(e^{j\omega_i}) = H(e^{j\omega_i}) + \frac{(-1)^i\rho}{W(\omega_i)} = \sum_{n=0}^{L} a(n)\cos(\omega_i n) + \frac{(-1)^i\rho}{W(\omega_i)}

or, in matrix form,

   \begin{bmatrix}
   1 & \cos(\omega_0) & \cdots & \cos(L\omega_0) & \frac{1}{W(\omega_0)} \\
   \vdots & \vdots & & \vdots & \vdots \\
   1 & \cos(\omega_{L+1}) & \cdots & \cos(L\omega_{L+1}) & \frac{(-1)^{L+1}}{W(\omega_{L+1})}
   \end{bmatrix}
   \begin{bmatrix} a_0 \\ \vdots \\ a_L \\ \rho \end{bmatrix}
   =
   \begin{bmatrix} H_d(e^{j\omega_0}) \\ \vdots \\ H_d(e^{j\omega_{L+1}}) \end{bmatrix} .
To solve these equations for the parameters an , we use the “Remez” exchange algorithm, which was developed by Rabiner et al. in 1975.

After an initial guess of the L + 2 extremal frequencies of H(ejω ), we can compute ρ according to

   \rho = \frac{\gamma_0 H_d(e^{j\omega_0}) + \cdots + \gamma_{L+1} H_d(e^{j\omega_{L+1}})}{\frac{\gamma_0}{W(\omega_0)} - \frac{\gamma_1}{W(\omega_1)} + \cdots + \frac{(-1)^{L+1}\gamma_{L+1}}{W(\omega_{L+1})}}   (4.4)

with

   \gamma_k = \prod_{n=0,\,n\ne k}^{L+1} \frac{1}{\cos(\omega_k) - \cos(\omega_n)} .

Since E(ω) has the same extremal frequencies as H(ejω ), we can write

   \hat{H}(e^{j\omega_i}) = H_d(e^{j\omega_i}) - \frac{(-1)^i\rho}{W(\omega_i)} .

Using Lagrange interpolation, it follows that

   H(e^{j\omega}) = \frac{\sum_{k=0}^{L} \hat{H}(e^{j\omega_k})\,\frac{\beta_k}{\cos(\omega)-\cos(\omega_k)}}{\sum_{k=0}^{L} \frac{\beta_k}{\cos(\omega)-\cos(\omega_k)}}

with

   \beta_k = \prod_{n=0,\,n\ne k}^{L} \frac{1}{\cos(\omega_k) - \cos(\omega_n)} .

With H(ejω ) it is possible to calculate the error E(ω) according to Eq. (4.2). After calculating E(ω), the L + 2 extremal frequencies belonging to the largest alternating peaks can be selected and the algorithm can start again from Eq. (4.4). The algorithm is summarised in Fig. 4.15.

By using the Remez algorithm we are able to find the ρ that defines the upper bound of |E(ω)|, which is the optimum solution for the Chebyshev approximation. After the algorithm converges, the filter coefficients h(n) can be calculated from L + 1 frequency samples of H(ejω ) via the inverse discrete Fourier transform

   h(n) = \frac{1}{L+1}\sum_{k=0}^{L} H(e^{j\omega_k})\, e^{j\frac{2\pi k n}{L+1}} , \qquad \omega_k = \frac{2\pi k}{L+1} .

Property of Optimal Filters

The error between the desired and attained filter frequency responses achieves minima and maxima alternately at a finite number of frequencies in the frequency range of interest (passband and stopband regions), called alternation frequencies. The minimum number of alternating frequencies for a Type I optimum filter is (N + 3)/2 (with L = (N − 1)/2). Because of this characteristic, optimal filters are called equi-ripple filters.

Remarks

1. Designing an optimal filter requires much more computation than the windowing method.

2. The length of the filter designed by the window method is 20% to 50% longer than the
length of the optimal filter.

3. Optimal filter design is the most widely used FIR filter design method. In MATLAB, the function that performs optimum filter design is called “firpm”.
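In Python, scipy.signal.remez plays the role of MATLAB's firpm. A sketch of an equi-ripple low-pass with assumed band edges and an assumed length (band edges are expressed as fractions of the sampling rate):

```python
import numpy as np
from scipy.signal import remez, freqz

# Passband edge 0.2*pi rad, stopband edge 0.3*pi rad -> 0.1 and 0.15 with fs = 1
h = remez(numtaps=39, bands=[0, 0.1, 0.15, 0.5], desired=[1, 0], fs=1.0)
w, H = freqz(h, worN=2048)
stop = w >= 0.3 * np.pi
print("peak stopband level:", 20 * np.log10(np.abs(H[stop]).max()), "dB")
```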

4.6 Implementation of FIR Filters


Here we discuss the implementation of a discrete-time system with unit sample response h(n), where h(n) may be a filter. The input-output relationship is given by the convolution sum

   y(n) = \sum_{k=-\infty}^{\infty} h(k)\,x(n-k) = \sum_{k=0}^{N-1} h(k)\,x(n-k) ,

i.e., the output y(n) represents a sum of weighted and delayed versions of the input x(n). The signal flow graph for a single delay element is shown in Figure 4.16. The convolution sum can be represented as in Figure 4.17.
The computational efficiency of the algorithm in Figure 4.17 may be improved by exploiting the symmetry (or anti-symmetry) of h(n). For Type I filters, we have h(n) = h(N − 1 − n) and N odd. This leads to

   y(n) = \sum_{k=0}^{N-1} h(k)\,x(n-k)
        = \sum_{k=0}^{\frac{N-1}{2}-1} h(k)\,x(n-k) + h\!\left(\frac{N-1}{2}\right)x\!\left(n-\frac{N-1}{2}\right) + \sum_{k=\frac{N-1}{2}+1}^{N-1} h(k)\,x(n-k)

and hence

   y(n) = \sum_{k=0}^{\frac{N-1}{2}-1} h(k)\left[x(n-k) + x(n-(N-1-k))\right] + h\!\left(\frac{N-1}{2}\right)x\!\left(n-\frac{N-1}{2}\right) .

This realisation is shown by the flow graph in Figure 4.18. This method reduces the number of
multiplications by 50% without affecting the number of additions or storage elements required.
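A direct sketch of this structure (function name hypothetical; zero initial conditions assumed):

```python
import numpy as np

def fir_type1(x, h):
    """Type I FIR filter (N odd, h(k) = h(N-1-k)) with halved multiplications."""
    N = len(h)
    assert N % 2 == 1 and np.allclose(h, h[::-1])
    M = (N - 1) // 2
    xp = np.concatenate([np.zeros(N - 1), x])     # zero initial conditions
    y = np.zeros(len(x))
    for n in range(len(x)):
        past = xp[n:n + N][::-1]                  # [x(n), x(n-1), ..., x(n-N+1)]
        # pair the symmetric taps: (N+1)/2 multiplications instead of N
        y[n] = h[:M] @ (past[:M] + past[N - 1:M:-1]) + h[M] * past[M]
    return y

h = np.array([1.0, 2.0, 3.0, 2.0, 1.0])
x = np.random.randn(20)
assert np.allclose(fir_type1(x, h), np.convolve(x, h)[:len(x)])
```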

[The flowchart comprises the following steps: (1) input filter parameters; (2) make an initial guess of the (L + 2) extremal frequencies; (3) calculate the optimum ρ on the extremal set; (4) interpolate through (L + 1) points to obtain H(ejω ); (5) calculate the error E(ω) and find the local maxima where |E(ω)| ≥ ρ; (6) if more than (L + 2) extrema are found, retain the (L + 2) largest; (7) if the extremal points changed, return to step (3), otherwise the best approximation has been found.]

Figure 4.15: Flowchart of Parks-McClellan algorithm.




Figure 4.16: Signal flow graph for a delay element.


Figure 4.17: Signal Flow graph representing the convolution sum.


Figure 4.18: Efficient signal flow graph for the realisation of a type I FIR filter.
Chapter 5

Design of Infinite Impulse Response Filters

5.1 Introduction
When designing IIR filters, one must determine a rational H(z) which meets a given magnitude specification. There are two different approaches:
1. Design a digital filter from an analog or continuous-time filter.

2. Design the filter directly.


Only the first approach is considered here, since it is much more widely used. It consists of the following four steps:
Step 1 : Specification of the digital filter.

Step 2 : Translation to a continuous-time filter specification.

Step 3 : Determination of the continuous-time filter system function Hc (s).

Step 4 : Transformation of Hc (s) to a digital filter system function H(z).

Remarks
1. Step 2 depends on Step 4 since translation of the digital filter specification to a continuous-
time filter specification depends on how the continuous-time filter is transformed back to
the digital domain.

2. It is desirable for Step 4 to always transform a causal and stable Hc (s) to a causal and
stable H(z), and also that the transformation results in a digital filter with a rational H(z)
when the continuous-time filter has a rational Hc (s).

3. The continuous-time filter frequency response Hc (jΩ) = Hc (s)|s=jΩ should map to the digital filter frequency response H(ejω ) = H(z)|z=ejω . Typically, Butterworth, Chebyshev and elliptic filters are used as continuous-time prototypes.
The impulse invariance method and bilinear transformation are two approaches that satisfy
all the above properties, and are often used in designing IIR filters. We discuss their implemen-
tation in the following sections.


5.2 Impulse Invariance Method


In the impulse invariance method, we obtain the unit sample response of the digital filter by
sampling the impulse response of the continuous-time filter:
1. From Hc (s) and Hc (jΩ), we determine hc (t).

2. The sequence h(n) is obtained by h(n) = Td · hc (nTd ), where Td is the design sampling
interval which does not need to be the same as the sampling period T associated with
A/D and D/A conversion.

3. H(z) is obtained from h(n).


Example 5.2.1 A continuous-time filter is described by

   H_c(s) = \frac{1}{s - s_c} .

The inverse Laplace transform is

   h_c(t) = e^{s_c t}\, u(t) .

Therefore,

   h(n) = h_c(t)|_{t=nT_d} = e^{s_c n}\, u(n)

assuming Td = 1, which leads to

   H(z) = \frac{1}{1 - e^{s_c} z^{-1}} .
In general, to transform Hc (s) to H(z), Hc (s) is first expressed as a linear combination of system functions of the form 1/(s − sc ), each of which is then transformed as above, i.e., we perform a partial fraction expansion and transform each component. If Hc (s) has multiple poles at s = sc , Hc (s) will have a component of the form 1/(s − sc )^p with p > 1. Its z-transform can be obtained by following the same steps as in the above example.
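A sketch of this procedure using SciPy's partial fraction routine (simple poles assumed; the function name is hypothetical, and scipy.signal.cont2discrete with method='impulse' offers a ready-made state-space alternative):

```python
import numpy as np
from scipy import signal

def impulse_invariance(b, a, Td=1.0):
    """Impulse-invariant map of Hc(s) = b(s)/a(s) with simple poles (a sketch).

    Each partial-fraction term r/(s - p) becomes Td*r / (1 - e^{p*Td} z^{-1}).
    Returns the numerator and denominator of H(z) in powers of z^{-1}.
    """
    r, p, _ = signal.residue(b, a)                    # partial fraction expansion
    az = np.array([1.0 + 0j])
    for pk in p:                                      # common denominator
        az = np.convolve(az, [1.0, -np.exp(pk * Td)])
    bz = np.zeros(max(len(p), 1), dtype=complex)
    for k, rk in enumerate(r):
        other = np.array([1.0 + 0j])
        for j, pj in enumerate(p):
            if j != k:                                # all factors except the k-th
                other = np.convolve(other, [1.0, -np.exp(pj * Td)])
        bz += Td * rk * other
    return np.real_if_close(bz), np.real_if_close(az)

# Hc(s) = 1/(s - sc) with sc = -0.5 reproduces H(z) = 1/(1 - e^{sc} z^{-1})
print(impulse_invariance([1.0], [1.0, 0.5]))
```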
In previous chapters, we have seen that

   H(e^{j\omega}) = \sum_{k=-\infty}^{\infty} H_c\!\left(j\frac{\omega}{T_d} + j\frac{2\pi}{T_d}k\right) .   (5.1)

So, H(ejω ) is not simply related to Hc (jΩ) because of aliasing. If there were no aliasing effect,

   H(e^{j\omega}) = H_c\!\left(j\frac{\omega}{T_d}\right) \quad \text{for } -\pi \le \omega \le \pi ,   (5.2)

and the shape of H(ejω ) would be the same as that of Hc (jΩ). Note that in practice Hc (jΩ) is not band-limited, and therefore aliasing is present, which introduces some difficulty into the design. In practice, we suppose that the relation H(ejω ) = Hc (jω/Td ) holds over [−π, π) and we carry out

the remaining steps, instead of using (5.1). Note that the discrete-time system frequency ω is
related to the continuous-time system frequency Ω by ω = ΩTd . Because of the approximation,
the resulting filter may not satisfy the digital filter specifications and some iterations may be
needed. Also due to aliasing, it is difficult to design band stop or high-pass filters that have no
significant aliasing.
In the impulse invariance design procedure, we transform the discrete-time filter specifications through the use of Eq. (5.2) to continuous-time filter specifications. We illustrate this in the following example.

Example 5.2.2 Assume that we have the low-pass filter specifications as shown in Figure 5.1
with α1 = 0.10875, α2 = 0.17783, ωp = 0.2π, Td = 1 and ωs = 0.3π. In this case the passband is
between 1 − α1 and 1 rather than between 1 − α1 and 1 + α1 . Historically, most continuous-


Figure 5.1: Low-pass filter specifications.

time filter approximation methods were developed for the design of passive systems (having gain
less than 1). Therefore, the design specifications for continuous time filters have typically been
formulated so that the passband is less than or equal to 1.
We will design a filter by applying impulse invariance to an appropriate Butterworth continuous-
time filter. A Butterworth low-pass filter is defined by the property that the magnitude-squared
function is
   |H_c(j\Omega)|^2 = \frac{1}{1 + (\Omega/\Omega_c)^{2N}} ,
where N is an integer corresponding to the order of the filter. The magnitude response of a But-
terworth filter of N th order is maximally flat in the passband, i.e., the first (2N − 1) derivatives
of the magnitude-squared function are zero at Ω = 0. An example of this characteristic is shown
in Figure 5.2. The first step is to transform the discrete-time specifications to continuous-time
filter specifications. Assuming that the aliasing involved in the transformation from Hc (jΩ) to
H(ejω ) will be negligible, we obtain the specification on Hc (jΩ) by applying the relation

Ω = ω/Td

to obtain the continuous-time filter specifications shown in Figure 5.3.


The Butterworth filter should have the following specifications,

1 − α1 = 0.89125 ≤ |Hc (jΩ)| ≤ 1 , 0 ≤ |Ω| ≤ 0.2π ,

and
|Hc (jΩ)| ≤ 0.17783 = α2 , 0.3π ≤ |Ω| ≤ π .

Figure 5.2: Magnitude transfer function |Hc (jΩ)| of Butterworth filters (N = 2, 4, 8).

Figure 5.3: Filter specifications for Example 5.2.2.

Because |Hc (jΩ)| is a monotonic function of Ω, the specifications will be satisfied if

|Hc (j0.2π)| ≥ 0.89125

and
|Hc (j0.3π)| ≤ 0.17783
Now, we determine N and Ωc , which are the parameters of

   |H_c(j\Omega)|^2 = \frac{1}{1 + (\Omega/\Omega_c)^{2N}} ,

to meet the specifications. Thus, we have

   1 + \left(\frac{0.2\pi}{\Omega_c}\right)^{2N} = \left(\frac{1}{0.89125}\right)^2

and

   1 + \left(\frac{0.3\pi}{\Omega_c}\right)^{2N} = \left(\frac{1}{0.17783}\right)^2 ,
which leads to N = 5.8858 and Ωc = 0.70474. Rounding N up to 6, we find Ωc = 0.7032. With these values, the passband specification is met exactly and the stopband specification is exceeded, reducing the potential for aliasing. With N = 6, the 12 poles of

   H_c(s)H_c(-s) = \frac{1}{1 + (s/j\Omega_c)^{2N}}

are uniformly distributed around a circle of radius Ωc = 0.7032. Consequently, the poles of Hc (s) are the 3 pole pairs in the left half of the s-plane with coordinates

   s_1 = -0.182 \pm j0.679 , \quad s_2 = -0.497 \pm j0.497 , \quad s_3 = -0.679 \pm j0.182 ,

so that

   H_c(s) = \frac{0.12093}{(s^2 + 0.364s + 0.4945)(s^2 + 0.994s + 0.4945)(s^2 + 1.358s + 0.4945)} .
If we express Hc (s) as a partial fraction expansion, transform each term 1/(s − sk ) into 1/(1 − e^{sk} z^{−1}), and recombine, we get

   H(z) = \frac{0.2871 - 0.4466z^{-1}}{1 - 1.2971z^{-1} + 0.6949z^{-2}} + \frac{-2.1428 + 1.1455z^{-1}}{1 - 1.0691z^{-1} + 0.3699z^{-2}} + \frac{1.8557 - 0.6303z^{-1}}{1 - 0.9972z^{-1} + 0.2570z^{-2}} .
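The order and cutoff calculation of this example is easily reproduced numerically; a sketch (variable names arbitrary):

```python
import numpy as np

a1, a2 = 0.10875, 0.17783
wp, ws = 0.2 * np.pi, 0.3 * np.pi          # Td = 1, so Omega = omega
d1 = (1 / (1 - a1)) ** 2 - 1               # passband constraint at wp
d2 = (1 / a2) ** 2 - 1                     # stopband constraint at ws
N = np.log(d2 / d1) / (2 * np.log(ws / wp))
print(N)                                   # 5.8858...
N = int(np.ceil(N))                        # round up to N = 6
Wc = wp / d1 ** (1 / (2 * N))              # meet the passband spec exactly
print(Wc)                                  # 0.7032...
```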

5.3 Bilinear Transformation


The other main approach to the design problem is the bilinear transformation. With this approach, H(z) is obtained directly from the continuous-time system function Hc (s) by replacing s by

   s = \frac{2}{T_d}\,\frac{1 - z^{-1}}{1 + z^{-1}} ,

so that

   H(z) = H_c\!\left(\frac{2}{T_d}\,\frac{1 - z^{-1}}{1 + z^{-1}}\right) .
A point on the unit circle, z = ejω , corresponds to

   s = \frac{2}{T_d}\cdot\frac{1 - e^{-j\omega}}{1 + e^{-j\omega}} = j\,\frac{2}{T_d}\tan(\omega/2) .
The bilinear transformation maps the imaginary axis of the s-plane onto the unit circle, as shown in Figure 5.4. The inverse transform is given by

   z = \frac{\frac{2}{T_d} + s}{\frac{2}{T_d} - s} = \frac{1 + (T_d/2)s}{1 - (T_d/2)s} = \frac{1 + \sigma T_d/2 + j\Omega T_d/2}{1 - \sigma T_d/2 - j\Omega T_d/2} .

The region ℜ{s} < 0 is transformed to |z| < 1, i.e. the left-hand half plane ℜ{s} < 0 of the
s-plane is mapped inside the unit circle. This means the bilinear transformation will preserve
the stability of systems.
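SciPy implements exactly this substitution (with fs = 1/Td, so s = 2fs(1 − z⁻¹)/(1 + z⁻¹)); a minimal sketch for an assumed first-order example Hc(s) = 1/(s + 1):

```python
from scipy import signal

# Bilinear transform of Hc(s) = 1/(s + 1) with Td = 1 (fs = 1/Td)
bz, az = signal.bilinear([1.0], [1.0, 1.0], fs=1.0)
print(bz, az)   # [0.333 0.333] [1. -0.333]
```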


Figure 5.4: Mapping of the s-plane onto the z-plane using the bilinear transformation

In the definition of the transformation, 2/Td is a scaling factor. It controls the deformation of the frequency axis when we use the bilinear transformation to obtain a digital transfer function H(z) from the continuous-time transfer function Hc (s). From s = j(2/Td ) tan(ω/2) = σ + jΩ, it follows that

   \Omega = \frac{2}{T_d}\tan(\omega/2) \quad \text{and} \quad \sigma = 0 .

For low frequencies, the deformation between Ω and ω is very small, i.e., for small ω we can assume Ω ≈ ω/Td . This was the criterion for choosing 2/Td as the scaling factor; it is illustrated in Figure 5.5. The choice of another criterion would yield another scaling factor.

Figure 5.5: Mapping of the continuous-time frequency axis onto the unit circle by bilinear
transformation (Td = 1).

Remarks

1. This transformation is the one most used in practice. It derives digital filters from known continuous-time filter responses.

2. Because of the frequency warping

   \Omega = \frac{2}{T_d}\tan\frac{\omega}{2} ,

the detailed shape of Hc (jΩ) is not preserved.

3. The delay time is also modified

   \tau_d = \tau_c\left[1 + \left((\omega_c T_d)/2\right)^2\right] .

If the continuous-time filter has a constant delay time, the resulting digital filter will not
have this property.

4. To design a digital filter using this method one must first transform the discrete time filter
specifications to the continuous time specifications, design a continuous-time filter with
these new requirements and then apply the bilinear transformation.

Example 5.3.1 Consider the discrete-time filter specifications of Example 5.2.2, where we il-
lustrated the impulse invariance technique for the design of a discrete-time filter. In carrying out
the design using the bilinear transformation, the critical frequencies of the discrete-time filter
must be prewarped to the corresponding continuous-time frequencies using Ω = (2/Td ) tan(ω/2)
so that the frequency distortion inherent in bilinear transformation will map them back to the
correct discrete-time critical frequencies. For this specific filter, with |Hc (jΩ)| representing the
magnitude response function of the continuous-time filter, we require that
 
   0.89125 \le |H_c(j\Omega)| \le 1 , \quad 0 \le |\Omega| \le \frac{2}{T_d}\tan\left(\frac{0.2\pi}{2}\right) ,

   |H_c(j\Omega)| \le 0.17783 , \quad \frac{2}{T_d}\tan\left(\frac{0.3\pi}{2}\right) \le |\Omega| \le \infty .

For convenience we choose Td = 1. Also, as with Example 5.2.2, since a continuous-time


Butterworth filter has monotonic magnitude response, we can equivalently require that

|Hc (j2 tan(0.1π))| ≥ 0.89125

and
|Hc (j2 tan(0.15π))| ≤ 0.17783.
The form of the magnitude-squared function for the Butterworth filter is

   |H_c(j\Omega)|^2 = \frac{1}{1 + (\Omega/\Omega_c)^{2N}} .

Solving for N and Ωc with the equality sign, we obtain

   1 + \left(\frac{2\tan(0.1\pi)}{\Omega_c}\right)^{2N} = \left(\frac{1}{0.89125}\right)^2   (5.3)

and

   1 + \left(\frac{2\tan(0.15\pi)}{\Omega_c}\right)^{2N} = \left(\frac{1}{0.17783}\right)^2 ,   (5.4)

and solving for N gives

   N = \frac{\log\left[\left((1/0.17783)^2 - 1\right)\big/\left((1/0.89125)^2 - 1\right)\right]}{2\log\left[\tan(0.15\pi)/\tan(0.1\pi)\right]} = 5.30466 .

Since N must be an integer, we choose N = 6. Substituting this N into Eq. (5.4), we obtain Ωc = 0.76622. For this value of Ωc , the passband specifications are exceeded and the stopband specifications are met exactly. This is reasonable for the bilinear transformation since we do not
have to be concerned with aliasing. That is, with proper prewarping, we can be certain that the
resulting discrete-time filter will meet the specifications exactly at the desired stopband edge.
In the s-plane, the 12 poles of the magnitude-squared function are uniformly distributed in
angle on a circle of radius 0.7662. The system function of the continuous-time filter obtained
by selecting the left half-plane poles is
   H_c(s) = \frac{0.20238}{(s^2 + 0.3966s + 0.5871)(s^2 + 1.0836s + 0.5871)(s^2 + 1.4802s + 0.5871)} .

We then obtain the system function H(z) for the discrete-time filter by applying the bilinear
transformation to Hc (s) with Td = 1,

   H(z) = \frac{0.0007378\,(1 + z^{-1})^6}{(1 - 1.268z^{-1} + 0.7051z^{-2})(1 - 1.0106z^{-1} + 0.3583z^{-2})(1 - 0.9044z^{-1} + 0.2155z^{-2})} .

Since the bilinear transformation maps the entire jΩ-axis of the s-plane onto the unit circle,
the magnitude response of the discrete-time filter falls off much more rapidly than the original
continuous-time filter, i.e. the entire imaginary axis of the s-plane is compressed onto the unit
circle. Hence, the behaviour of H(ejω ) at ω = π corresponds to the behaviour of Hc (jΩ) at
Ω = ∞. Therefore, the discrete-time filter will have a sixth-order zero at z = −1 since the
continuous-time Butterworth filter has a sixth-order zero at s = ∞.

Note that as with impulse invariance, the parameter Td is of no consequence in the design
procedure since we assume that the design problem always begins with discrete-time filters
specifications. When these specifications are mapped to continuous-time specifications and then
the continuous-time filter is mapped back to a discrete-time filter, the effect of Td is cancelled.
Although we retained the parameter Td in our discussion, in specific problems any convenient
value of Td can be chosen.
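The whole procedure of Example 5.3.1 (prewarp, analog Butterworth design, bilinear transform) can be reproduced with SciPy; the tolerances are converted to dB (0.89125 corresponds to a 1 dB passband loss, 0.17783 to 15 dB attenuation). Note that buttord may place the exactly-met band edge differently from the hand calculation. A sketch:

```python
import numpy as np
from scipy import signal

Td = 1.0
Wp = (2 / Td) * np.tan(0.2 * np.pi / 2)       # prewarped passband edge
Ws = (2 / Td) * np.tan(0.3 * np.pi / 2)       # prewarped stopband edge
gpass = -20 * np.log10(0.89125)               # = 1 dB
gstop = -20 * np.log10(0.17783)               # = 15 dB
N, Wn = signal.buttord(Wp, Ws, gpass, gstop, analog=True)
print(N)                                      # 6, as in the example
b, a = signal.butter(N, Wn, analog=True)      # continuous-time prototype Hc(s)
bz, az = signal.bilinear(b, a, fs=1 / Td)     # digital H(z)
```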

5.4 Implementation of IIR Filters


An IIR filter has an impulse response of infinite extent and therefore cannot be implemented by direct convolution in practice. For this reason, we require the IIR filter to have a rational z-transform which, as we have seen, can be realised by a difference equation.

5.4.1 Direct Form I


The difference equation that describes the system is given by

   y(n) = \sum_{k=1}^{M} a_k\,y(n-k) + \sum_{r=0}^{N-1} b_r\,x(n-r) .

This can be realised in a straightforward manner, as depicted in Figure 5.6, which represents the pair v(n) = \sum_{r=0}^{N-1} b_r x(n-r) and y(n) = \sum_{k=1}^{M} a_k y(n-k) + v(n) for (N − 1) = M. This representation can be viewed as an implementation of H(z) through the decomposition

   H(z) = H_2(z)H_1(z) = \left(\frac{1}{1 - \sum_{k=1}^{M} a_k z^{-k}}\right)\left(\sum_{r=0}^{N-1} b_r z^{-r}\right) .



Figure 5.6: Signal flow graph of direct form I structure for an (N − 1)th-order system.

5.4.2 Direct Form II


The previous system can also be viewed as a cascade of two systems whose system functions are
given by H1 (z) and H2 (z). Since

H1 (z)H2 (z) = H(z) = H2 (z)H1 (z) ,

we can change the order of H1 (z) and H2 (z) without affecting the overall system function. This
is because the delay elements are redundant and can be combined. We then obtain the direct
form II structure. The advantage of the second form lies in the reduction in the number of
storage elements required, while the number of arithmetic operations required is constant. The
flow graph of the direct form II structure is depicted in Figure 5.7.
Figure 5.7: Signal flow graph of direct form II structure for an (N − 1)th-order system.
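A sketch of the direct form II recursion (sign convention as in the text, where the ak enter with a plus sign; scipy.signal.lfilter implements the equivalent structure but expects the denominator as [1, -a1, ..., -aM]):

```python
import numpy as np

def direct_form_ii(x, b, a):
    """Direct form II: w(n) = x(n) + sum_k a_k w(n-k); y(n) = sum_r b_r w(n-r)."""
    D = max(len(a), len(b) - 1)                # number of shared delay elements
    a = np.pad(a, (0, D - len(a)))             # [a1, ..., aD]
    b = np.pad(b, (0, D + 1 - len(b)))         # [b0, ..., bD]
    w = np.zeros(D)                            # delay line [w(n-1), ..., w(n-D)]
    y = np.zeros(len(x))
    for n, xn in enumerate(x):
        w0 = xn + a @ w
        y[n] = b[0] * w0 + b[1:] @ w
        w[1:] = w[:-1]
        w[0] = w0
    return y

# First-order check: reproduces h(n) = (1/2)^n from Example 3.5.1
delta = np.zeros(6); delta[0] = 1.0
print(direct_form_ii(delta, [1.0], [0.5]))
```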

5.4.3 Cascade form


The rational system function

   H(z) = \frac{\sum_{r=0}^{N-1} b_r z^{-r}}{1 - \sum_{k=1}^{M} a_k z^{-k}}

can be expressed in the general form

   H(z) = A \prod_k H_k(z) ,

where Hk (z) is given by

   H_k(z) = \frac{1 + \beta_{1k} z^{-1} + \beta_{2k} z^{-2}}{1 - \alpha_{1k} z^{-1} - \alpha_{2k} z^{-2}} .
The constants A, β1k , β2k , α1k and α2k are real, since h(n) is real. The previous equations show that H(z) can be represented by a cascade of second-order sections Hk (z). The cascade form is less sensitive to coefficient quantisation: if we change a branch transmittance in the two direct forms, all the poles and zeros of the system function are affected, whereas in the cascade form a change in one branch transmittance affects only the poles or zeros of the second-order section in which it occurs. Figure 5.8 gives the cascade structure for a 6th-order system with a direct form II realisation of each second-order subsystem.

Figure 5.8: Cascade structure for a 6th-order system with a direct form II realisation of each
second-order subsystem.
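In SciPy, the cascade form corresponds to second-order sections (sos); a sketch with an assumed sixth-order design and an arbitrary input signal:

```python
import numpy as np
from scipy import signal

b, a = signal.butter(6, 0.25)          # an example 6th-order digital low-pass
sos = signal.tf2sos(b, a)              # factor H(z) into second-order sections
x = np.random.randn(1000)
y = signal.sosfilt(sos, x)             # run the cascade of second-order sections
```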

5.5 A Comparison of FIR and IIR Digital Filters


After having discussed a wide range of methods for the design of both infinite and finite duration impulse response filters, several questions naturally arise: Which type of system is best, IIR or FIR? Which method yields the best results? The answer is that neither is universally best. We have discussed a wide range of methods for both IIR and FIR filters because no single type of filter nor single design method is best for all circumstances.
The choice between an FIR filter and an IIR filter depends upon the relative weight that one
attaches to the advantages and disadvantages of each type of filter. For example, IIR filters have
the advantage that a variety of frequency selective filters can be designed using closed-form design
formulae. That is, once the problem has been defined in terms of appropriate specifications for
a given kind of filter (e.g., Butterworth, Chebyshev, or elliptic), then the coefficients (or poles
and zeros) of the desired digital filter are obtained by straightforward substitution into a set
of design equations. This kind of simplicity in the design procedure is attractive if only a few
filters are to be designed or if limited computational facilities are available.
In the case of FIR filters, closed-form design equations do not exist. While the window
method can be applied in a straightforward manner, some iteration may be necessary to meet a
prescribed specification. Most of the other FIR design methods are iterative procedures which
require relatively powerful computational facilities for their implementation. In contrast, it is
often possible to design frequency selective IIR digital filters using only a hand calculator and
tables of continuous-time filter design parameters. A drawback of IIR filters is that the closed-form IIR designs are primarily limited to low-pass, band-pass, and high-pass filters, etc.
Furthermore, these designs generally disregard the phase response of the filter. For example,
with a relatively simple computational procedure we may obtain excellent amplitude response
characteristics with an elliptic low-pass filter while the phase response will be very nonlinear
(especially at the band edge).
In contrast, FIR filters can have precise linear phase. Also, the window method and most
algorithmic methods afford the possibility of approximating more arbitrary frequency response
characteristics with little more difficulty than is encountered in the design of low-pass filters.
Also, it appears that the design problem for FIR filters is much more under control than the
IIR design problem because there is an optimality theorem for FIR filters that is meaningful in
a wide range of practical situations.
Finally, there are questions of economics in implementing a digital filter. Concerns of this
type are usually measured in terms of hardware complexity or computational speed. Both these
factors are strongly related to the filter order required to meet a given specification. Putting

aside phase considerations, a given amplitude response specification will, in general, be met most
efficiently with an IIR filter. However, in many cases, the linear phase available with an FIR
filter may be well worth the extra cost.
Thus a multitude of tradeoffs must be considered in designing a digital filter. Clearly, the
final choice will most often be made in terms of engineering judgement on such questions as
the formulation of specifications, the method of implementation, and computational facilities
available for carrying out the design.
Chapter 6

Introduction to Digital Spectral Analysis

6.1 Why Spectral Analysis?


There are three principal reasons for spectral analysis.

1. To provide useful descriptive statistics.

2. As a diagnostic tool to indicate which further analysis method might be relevant.

3. To check postulated theoretical models.

6.2 Applications
We encounter many areas where spectral analysis is needed. A few examples are given below.

Physics. In spectroscopy we wish to investigate the distribution of power in a radiation field


as a function of frequency (wavelength). Consider, for example, the situation shown in
Figure 6.1 where a Michelson interferometer is used to measure the spectrum of a light
source. Let S(t) be a zero-mean stochastic stationary process which may be complex. It
can be seen that
X(t) = A (S(t − t0 ) + S(t − t0 − τ ))
and
V (t) = |X(t)|2 = A2 |S(t − t0 ) + S(t − t0 − τ )|2 .
Using the notation t̃ for t − t0 , we get
 
V (t) = A2 S(t̃)S(t̃)∗ + S(t̃)S(t̃ − τ )∗ + S(t̃ − τ )S(t̃)∗ + S(t̃ − τ )S(t̃ − τ )∗ .

After low-pass filtering, we obtain (ideally)

w(τ ) = E [V (t)] = A2 [cSS (0) + cSS (τ ) + cSS (−τ ) + cSS (0)]


= A2 [2cSS (0) + 2ℜ{cSS (τ )}]

Performing a Fourier transform leads to

W (ejω ) = F{w(τ )} = A2 (2cSS (0)δ(ejω ) + CSS (ejω ) + CSS (e−jω ))


[Figure 6.1 shows the set-up: the source S(t) passes a beam splitter towards a fixed mirror and a moveable mirror (displacement d, corresponding to a delay τ = 2d/c); the recombined beam X(t) reaches a detector producing V (t), which is low-pass filtered to give w(τ ) and Fourier transformed to give W (ejω ).]

Figure 6.1: A Michelson Interferometer.

Thus, the output of a Michelson interferometer is proportional to the spectrum of the light source. One of the objectives in spectroscopic interferometry is to get a good estimate of the spectrum of S(t) without moving the mirror over a large distance d.
Electrical Engineering. Measurement of the power in various frequency bands of some elec-
tromagnetic signal of interest. In radar there is the problem of signal detection and inter-
pretation.
Acoustics. The power spectrum plays the role of a descriptive statistic. We may use the
spectrum to calculate the direction of arrival (DOA) of an incoming wave or to characterise
the source.
Geophysics. Earthquakes give rise to a number of different waves such as pressure waves, shear
waves and surface waves. These waves are received by an array of sensors. Even for a

100km distance between the earthquake and the array, these types of waves can arrive at
almost the same time at the array. These waves must be characterised.

Mechanical Engineering. Detection of abnormal combustion in a spark ignition engine. The


power spectrum is a descriptive statistic to test if a combustion cycle is a knocking one.

Medicine. EEG (electroencephalogram) and ECG (electrocardiogram) analysis. It is well


known that the adult brain exhibits several major modes of oscillation; the level of brain activity in these various frequency bands may be used as a diagnostic tool.
Chapter 7

Random Variables and Stochastic Processes

7.1 Random Variables


A random variable is a number X(ζ) assigned to every outcome ζ ∈ S of an experiment. This
could be the gain in a game of chance, the voltage of a random source, or the cost of a random
component.
Given ζ ∈ S, the mapping ζ → X(ζ) ∈ R is a random variable, if for all x ∈ R the set
{ζ : X(ζ) ≤ x} is also an event

{X ≤ x} := {ζ : X(ζ) ≤ x}

with probability
P ({X ≤ x}) := P ({ζ : X(ζ) ≤ x})

7.1.1 Distribution Functions


The mapping x ∈ R → FX (x) = P ({X ≤ x}) is called the probability distribution function of
the random variable X.

7.1.2 Properties
The probability distribution function has the following properties
1. 0 ≤ FX (x) ≤ 1

FX (∞) = 1

FX (−∞) = 0

2. FX (x) is continuous from the right i.e.,

lim FX (x + ǫ) = FX (x)
ǫ→0

3. If x1 < x2 then FX (x1 ) ≤ FX (x2 ), i.e., FX (x) is a non-decreasing function of x

P ({x1 < X ≤ x2 }) = FX (x2 ) − FX (x1 ) ≥ 0


4. The probability density function is defined as

   f_X(x) := \frac{dF_X(x)}{dx}

and, if it exists, has the property fX (x) ≥ 0. To find FX (x), we integrate fX (x), i.e.,

   P(\{X \le x\}) = F_X(x) = \int_{-\infty}^{x} f_X(u)\,du ,

and because FX (∞) = 1,

   \int_{-\infty}^{\infty} f_X(x)\,dx = 1 .

Further,

   P(\{x_1 < X \le x_2\}) = F_X(x_2) - F_X(x_1) = \int_{x_1}^{x_2} f_X(x)\,dx .

5. For a continuous variable X, P(x1 < X < x2 ) = P(x1 < X ≤ x2 ) because


P({x}) = P ((−∞, x]) − P ((−∞, x)) = FX (x) − FX (x − 0) = 0

7.1.3 Uniform Distribution

X is said to be uniformly distributed on [a, b] if it admits the probability density function

   f_X(x) = \frac{1}{b-a}\left[u(x-a) - u(x-b)\right] , \quad a < b ,

and probability distribution function

   F_X(x) = \int_{-\infty}^{x} f_X(u)\,du = \frac{1}{b-a}\left[(x-a)u(x-a) - (x-b)u(x-b)\right] .

Both FX (x) and fX (x) are displayed in Figure 7.1.

7.1.4 Exponential Distribution

Let X be exponentially distributed with parameter λ. Then

   f_X(x) = \lambda e^{-\lambda x}\, u(x) , \quad \lambda > 0 ,

and

   F_X(x) = \left(1 - e^{-\lambda x}\right) u(x) ,

where

   u(x) = \begin{cases} 1, & x \ge 0 , \\ 0, & x < 0 . \end{cases}

The exponential distribution is useful in many engineering applications, for example, to describe the life-time X of a transistor. The breakdown rate of a transistor is defined by

   \lim_{\Delta\to 0^+} \frac{1}{\Delta}\,\frac{P\{x < X \le x + \Delta\}}{P\{x < X\}}

and is given by

   \frac{f_X(x)}{1 - F_X(x)} = \lambda = \text{constant} .


Figure 7.1: Probability density and distribution functions of the uniform distribution.

7.1.5 Normal Distribution



X is normally distributed with mean µ and variance σ², X ∼ N(µ, σ²), if

   f_X(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2} , \quad \sigma^2 > 0 , \; -\infty < \mu < \infty .

A closed-form expression for FX (x) does not exist. For µ = 0, σ² = 1, X is said to be standard normally distributed, and its distribution function is denoted by ΦX (x), i.e., we have

   F_X(x) = \Phi_X\!\left(\frac{x-\mu}{\sigma}\right) .
Example 7.1.1 Show that for the normal distribution \int_{-\infty}^{\infty} f_X(x)\,dx = 1. We have

   \int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2\sigma^2}(x-\mu)^2}\,dx .

If we let s = (x − µ)/σ, then

   \int_{-\infty}^{\infty} f_X(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}s^2}\,ds .

Because \frac{1}{\sqrt{2\pi}} e^{-\frac{1}{2}x^2} > 0, we calculate

   \left(\int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{1}{2}x^2}\,dx\right)^2
      = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}(x^2+y^2)}\,dx\,dy
      = \int_{0}^{2\pi}\int_{0}^{\infty} \frac{1}{2\pi}\, e^{-\frac{1}{2}r^2}\, r\,dr\,d\varphi
      = \int_{0}^{\infty} e^{-\frac{1}{2}r^2}\, r\,dr
      = \int_{0}^{\infty} e^{-s}\,ds = 1 ,

where r = \sqrt{x^2 + y^2} and \varphi = \arctan(y/x).

Example 7.1.2 A random variable X is normally distributed with µ = 1000 and σ = 50. Find the probability that X lies between 900 and 1050. We have

   P(\{900 \le X \le 1050\}) = F_X(1050) - F_X(900) .

Numerical values of ΦX (x) over a range of x are shown in Table 1. Since ΦX (−x) = 1 − ΦX (x), using Table 1 we conclude that

   P(\{900 \le X \le 1050\}) = \Phi_X(1) + \Phi_X(2) - 1 = 0.819 .

Often the following is defined as the error function; note that erf(x) = ΦX (x) − 1/2, though other definitions exist in the literature:

   \operatorname{erf}(x) = \frac{1}{\sqrt{2\pi}}\int_{0}^{x} e^{-u^2/2}\,du = \Phi_X(x) - \frac{1}{2} .
Table 1: Numerical values for the erf(x) function.

x      erf(x)    x      erf(x)    x      erf(x)    x      erf(x)
0.05   0.01994   0.80   0.28814   1.55   0.43943   2.30   0.48928
0.10   0.03983   0.85   0.30234   1.60   0.44520   2.35   0.49061
0.15   0.05962   0.90   0.31594   1.65   0.45053   2.40   0.49180
0.20   0.07926   0.95   0.32894   1.70   0.45543   2.45   0.49286
0.25   0.09871   1.00   0.34134   1.75   0.45994   2.50   0.49379
0.30   0.11791   1.05   0.35314   1.80   0.46407   2.55   0.49461
0.35   0.13683   1.10   0.36433   1.85   0.46784   2.60   0.49534
0.40   0.15542   1.15   0.37493   1.90   0.47128   2.65   0.49597
0.45   0.17364   1.20   0.38493   1.95   0.47441   2.70   0.49653
0.50   0.19146   1.25   0.39495   2.00   0.47726   2.75   0.49702
0.55   0.20884   1.30   0.40320   2.05   0.47982   2.80   0.49744
0.60   0.22575   1.35   0.41149   2.10   0.48214   2.85   0.49781
0.65   0.24215   1.40   0.41924   2.15   0.48422   2.90   0.49813
0.70   0.25804   1.45   0.42647   2.20   0.48610   2.95   0.49841
0.75   0.27337   1.50   0.43319   2.25   0.48778   3.00   0.49865
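The result of Example 7.1.2 can be checked with SciPy's normal distribution (scale is the standard deviation):

```python
from scipy.stats import norm

p = norm.cdf(1050, loc=1000, scale=50) - norm.cdf(900, loc=1000, scale=50)
print(round(p, 3))   # 0.819
```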

7.1.6 χ² Distribution

Let X = \sum_{k=1}^{\nu} Y_k^2 be a random variable where Yk ∼ N(0, 1). X is then χ² distributed with ν degrees of freedom. The probability density function of a χ²ν random variable is displayed in Figure 7.2 for ν = 1, 2, 3, 5 and 10. The probability density function is

   f_X(x) = \frac{x^{\frac{\nu}{2}-1}}{2^{\nu/2}\,\Gamma(\nu/2)}\, e^{-x/2}\, u(x) ,

where Γ(x) is the gamma function,

   \Gamma(x) = \int_0^{\infty} \xi^{x-1} e^{-\xi}\,d\xi .


Figure 7.2: χ2ν distribution with ν = 1, 2, 3, 5, 10

The mean of X is

   E[X] = \nu

and the variance is

   \sigma_X^2 = 2\nu .
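A quick Monte Carlo check of these moments (sample size chosen arbitrarily):

```python
import numpy as np

rng = np.random.default_rng(0)
nu = 5
X = (rng.standard_normal((100_000, nu)) ** 2).sum(axis=1)   # chi-squared with nu dof
print(X.mean(), X.var())    # approximately nu = 5 and 2*nu = 10
```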

7.1.7 Multiple Random Variables


The probabilities of the two events A = {X1 ≤ x1 } and B = {X2 ≤ x2 } have already been defined as functions of x1 and x2 , respectively, and called probability distribution functions:

   F_{X_1}(x_1) = P(\{X_1 \le x_1\}) , \qquad F_{X_2}(x_2) = P(\{X_2 \le x_2\}) .

We must introduce a new concept to include the probability of the joint event {X1 ≤ x1 , X2 ≤
x2 } = {(X1 , X2 ) ∈ D}, where x1 and x2 are two arbitrary real numbers and D is the quadrant
shown in Figure 7.3.
The joint distribution FX1 X2 (x1 , x2 ) of two random variables X1 and X2 is the probability
of the event {X1 ≤ x1 , X2 ≤ x2 },

FX1 X2 (x1 , x2 ) = P ({X1 ≤ x1 , X2 ≤ x2 }) .

For n random variables X1 , . . . , Xn the joint distribution function is denoted by

FX1 X2 ...Xn (x1 , . . . , xn ) = P ({X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn })



Figure 7.3: The quadrant D for two arbitrary real numbers x1 and x2 .

7.1.8 Properties

The multivariate probability distribution function has the following properties.

1)

   \lim_{x_1\to\infty,\,\ldots,\,x_n\to\infty} F_{X_1 X_2 \ldots X_n}(x_1,\ldots,x_n) = F(\infty, \infty, \ldots, \infty) = 1

   \lim_{x_i\to-\infty} F_{X_1 X_2 \ldots X_n}(x_1,\ldots,x_{i-1},x_i,x_{i+1},\ldots,x_n) = F_{X_1 X_2 \ldots X_n}(x_1,\ldots,x_{i-1},-\infty,x_{i+1},\ldots,x_n) = 0

2) F is continuous from the right in each component xi when the remaining components are
fixed.
3) F is a non decreasing function of each component xi when the remaining components are
fixed.
4) For the nth difference,

   \Delta_a^b F = \sum_{k\in\{0,1\}^n} (-1)^{\left[n - \sum_{i=1}^n k_i\right]}\, F\!\left(b_1^{k_1}a_1^{1-k_1}, \ldots, b_n^{k_n}a_n^{1-k_n}\right) \ge 0 ,

   where a = (a_1, \ldots, a_n)^T and b = (b_1, \ldots, b_n)^T with a_i < b_i (i = 1, \ldots, n),


where T denotes transpose. For n = 1, ∆_a^b F is equivalent to F(b) − F(a), and for n = 2, ∆_a^b F is equivalent to F(b1 , b2 ) − F(b1 , a2 ) − F(a1 , b2 ) + F(a1 , a2 ). To show that (1)–(3) are not sufficient to define a probability, one should consider the function

   F_{X_1X_2}(x_1, x_2) = \begin{cases} 0, & x_1 + x_2 < 0 , \\ 1, & \text{elsewhere} , \end{cases}

which is depicted in Figure 7.4.
Suppose that in a combined experiment the result of experiment i depends on the first to (i − 1)th experiments. With the density function fX1 (x1 ) for the first experiment, one constructs the density function of the total experiment as

   f_{X_1\ldots X_n}(x_1,\ldots,x_n) = f_{X_1}(x_1)\prod_{i=2}^{n} f_{X_i}(x_i \mid x_1,\ldots,x_{i-1}) = f_{X_1}(x_1)\prod_{i=2}^{n} f_{X_i}(x_i \mid X_1 = x_1, \ldots, X_{i-1} = x_{i-1}) .

Figure 7.4: Bivariate probability distribution function.

If the ith experiment is independent of the previous ones, then

   f_{X_i}(x_i \mid x_1, \ldots, x_{i-1}) = f_{X_i}(x_i)

and

   f_{X_1\ldots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} f_{X_i}(x_i) ,

or equivalently

   F_{X_1\ldots X_n}(x_1, \ldots, x_n) = \prod_{i=1}^{n} F_{X_i}(x_i) ,
which means that the events {X1 ≤ x1 }, . . . , {Xn ≤ xn } are independent. For the special case
for n = 2, we have the following properties:
1)

FX1 X2 (∞, ∞) = P(X1 ≤ ∞, X2 ≤ ∞) = 1


FX1 X2 (−∞, x2 ) = P(X1 ≤ −∞, X2 ≤ x2 ) = 0 = FX1 X2 (x1 , −∞)

because {X1 = −∞, X2 ≤ x2 } ⊂ {X1 = −∞}, {X1 ≤ x1 , X2 = −∞} ⊂ {X2 = −∞}.

2)
lim FX1 X2 (x1 + ǫ, x2 ) = lim FX1 X2 (x1 , x2 + ǫ) = FX1 X2 (x1 , x2 ).
ǫ→0 ǫ→0

3) The event {x11 < X1 ≤ x12 , X2 ≤ x2 } consists of all the points (X1 , X2 ) in D1 , and the
event {X1 ≤ x1 , x21 < X2 ≤ x22 } consists of all the points in D2 . From Figure 7.5, we see
that

{X1 ≤ x12 , X2 ≤ x2 } = {X1 ≤ x11 , X2 ≤ x2 } ∪ {x11 < X1 ≤ x12 , X2 ≤ x2 }.

The two summands are mutually exclusive, hence

P({X1 ≤ x12 , X2 ≤ x2 }) = P({X1 ≤ x11 , X2 ≤ x2 }) + P({x11 < X1 ≤ x12 , X2 ≤ x2 })

and therefore,

P({x11 < X1 ≤ x12 , X2 ≤ x2 }) = FX1 X2 (x12 , x2 ) − FX1 X2 (x11 , x2 ).


Figure 7.5: Properties for n = 2; the regions D1 = {x11 < X1 ≤ x12, X2 ≤ x2} and
D2 = {X1 ≤ x1, x21 < X2 ≤ x22}.


Similarly,

   P({X1 ≤ x1, x21 < X2 ≤ x22}) = FX1X2(x1, x22) − FX1X2(x1, x21)

4)
   P({x11 < X1 ≤ x12, x21 < X2 ≤ x22})
      = FX1X2(x12, x22) − FX1X2(x11, x22) − FX1X2(x12, x21) + FX1X2(x11, x21) ≥ 0
This is a probability that (X1 , X2 ) is in the rectangle D3 (see Figure 7.6).
Figure 7.6: Probability that (X1, X2) is in the rectangle D3 = {x11 < X1 ≤ x12, x21 < X2 ≤ x22}.

Example 7.1.3 The bivariate normal distribution

   fX1X2(x1, x2) = 1/(2π σX1 σX2 √(1 − ρ²))
      × exp{ −1/(2(1 − ρ²)) [ ((x1 − µX1)/σX1)² − 2ρ (x1 − µX1)(x2 − µX2)/(σX1 σX2) + ((x2 − µX2)/σX2)² ] }

where µX1 and µX2 are respectively the means of X1 and X2, σX1² > 0 and σX2² > 0 are the
corresponding variances, and −1 < ρ < 1 is the correlation coefficient. N(µX1, σX1²) and
N(µX2, σX2²) are the marginal distributions.
The conditional density function fX2(x2|x1) of X2 given X1 = x1 is that of
N(µX2 + ρ (σX2/σX1)(x1 − µX1), σX2²(1 − ρ²)). If X1 and X2 are independent then ρ = 0, and
clearly fX1X2(x1, x2) = fX1(x1) fX2(x2).


7.1.9 Operations on Random Variables


The concept of a random variable was previously introduced as a means of providing a systematic
definition of events on a sample space, and as a mathematical model for describing characteristics
of real, physical-world random phenomena. Many operations on random variables are based on a
single concept – expectation.

7.1.10 Expectation
The mathematical expectation of a random variable X, E [X], which may be read "the expected
value of X" or "the mean value of X", is defined as

   E [X] := ∫_{−∞}^{∞} x · fX(x) dx

where fX(x) is the probability density function of X. If X happens to be discrete with N
possible values xi having probabilities P(xi) of occurrence, then

   fX(x) = Σ_{i=1}^{N} P(xi) · δ(x − xi)

   E [X] = Σ_{i=1}^{N} xi P(xi)

Example 7.1.4

i) Normal Distribution: If X ∼ N(µ, σ²), then

   fX(x) = (1/(σ√(2π))) · exp{−(x − µ)²/(2σ²)}

   E [X] = ∫_{−∞}^{∞} x fX(x) dx
         = (1/(σ√(2π))) ∫_{−∞}^{∞} x exp{−(x − µ)²/(2σ²)} dx
         = (1/(σ√(2π))) ∫_{−∞}^{∞} (σu + µ) exp{−u²/2} σ du        (substituting u = (x − µ)/σ)
         = (1/√(2π)) [ σ ∫_{−∞}^{∞} u exp{−u²/2} du + µ ∫_{−∞}^{∞} exp{−u²/2} du ]
         = (1/√(2π)) [ σ · 0 + µ √(2π) ]
         = µ

ii) Uniform distribution: If X is uniformly distributed on the interval [a, b], then

   fX(x) = (1/(b − a)) · rect[a,b](x)

   where rect[a,b](x) = { 1, a ≤ x ≤ b
                        { 0, else.

   E [X] = ∫_{−∞}^{∞} x fX(x) dx = (1/(b − a)) ∫_{a}^{b} x dx
         = (1/(b − a)) · (1/2)(b² − a²)
         = (b − a)(b + a)/(2(b − a))
         = (a + b)/2

which can be verified graphically.

iii) Exponential Distribution: If X is exponentially distributed, then

   fX(x) = (1/b) exp{−(x − a)/b} u(x − a)

   E [X] = ∫_{−∞}^{∞} x · fX(x) dx
         = ∫_{−∞}^{∞} x · (1/b) exp{−(x − a)/b} u(x − a) dx
         = (1/b) ∫_{a}^{∞} x exp{−(x − a)/b} dx
         = ∫_{0}^{∞} (a + bv) exp{−v} dv        (substituting v = (x − a)/b)
         = a + b

Note: If a random variable’s density is symmetrical about a line x = a, then E [X] = a, i.e.

E [X] = a, if fX (x + a) = fX (−x + a)

7.1.11 Expected Value of a Function of Random Variables


Suppose we are interested in the mean of the random variable Y = g(X),

   E [Y ] = ∫_{−∞}^{∞} y · fY(y) dy

and we are given fX(x). Then,

   E [Y ] = E [g(X)] = ∫_{−∞}^{∞} g(x) · fX(x) dx
Example 7.1.5 The average power in a 1 Ω resistor can be found as

   E [V²] = ∫_{−∞}^{∞} v² fV(v) dv

where V and fV(v) are respectively the random voltage and its probability density function.

In particular, if we apply the transformation g(·) to the random variable X, defined as

g(X) = X n , n = 0, 1, 2, . . . ,

the expectation of g(X) is known as the nth order moment of X and denoted by µn:

   µn := E [X^n] = ∫_{−∞}^{∞} x^n fX(x) dx .

It is also of importance to use central moments of X around the mean. These are defined as

   µ′n := E [(X − E [X])^n] = ∫_{−∞}^{∞} (x − E [X])^n · fX(x) dx = ∫_{−∞}^{∞} (x − µ)^n fX(x) dx,
   n = 0, 1, 2, . . .

Clearly the first order central moment is zero.

7.1.12 The Variance


The variance of a random variable X is by definition

   σ² := ∫_{−∞}^{∞} (x − E [X])² · fX(x) dx

The positive constant σ is called the standard deviation of X. From the definition it follows
that σ² is the mean of the random variable (X − E [X])². Thus,

   σ² = E [(X − E [X])²] = E [X² − 2X · E [X] + (E [X])²]
      = E [X²] − 2E [X] E [X] + (E [X])²
      = E [X²] − 2(E [X])² + (E [X])²
      = E [X²] − (E [X])²
      = E [X²] − µ²
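
The identity σ² = E[X²] − (E[X])² is easy to check numerically. Below is a minimal added sketch (not part of the original text), assuming NumPy and using a Gaussian test distribution chosen purely for illustration:

```python
import numpy as np

# Monte Carlo check of sigma^2 = E[X^2] - (E[X])^2 for X ~ N(2, 3^2)
rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100_000)

var_definition = np.mean((x - x.mean()) ** 2)    # E[(X - E[X])^2]
var_identity = np.mean(x ** 2) - x.mean() ** 2   # E[X^2] - (E[X])^2

print(var_definition, var_identity)              # both close to sigma^2 = 9
```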

Example 7.1.6
We will now find the variance of three common distributions.

i) Normal distribution:
   Let X ∼ N(µ, σ²). The probability density function of X is

   fX(x) = (1/(σ√(2π))) exp{−(x − µ)²/(2σ²)}.
   Clearly

      ∫_{−∞}^{∞} exp{−(x − µ)²/(2σ²)} dx = σ√(2π)

   Differentiating both sides with respect to σ, we obtain

      ∫_{−∞}^{∞} ((x − µ)²/σ³) exp{−(x − µ)²/(2σ²)} dx = √(2π)

   Multiplying both sides by σ²/√(2π), we conclude that

      E [(X − µ)²] = E [(X − E [X])²] = σ².

ii) Uniform distribution:

   
   E [(X − E [X])²] = E [X²] − (E [X])²
                    = ∫_{−∞}^{∞} x² fX(x) dx − ((a + b)/2)²
                    = ∫_{a}^{b} (x²/(b − a)) dx − ((a + b)/2)²
                    = (b − a)²/12 .

iii) Exponential distribution:

   fX(x) = λ exp{−λx} u(x),   λ > 0

   E [X] = 1/λ,    σ² = 1/λ²

7.1.13 Transformation of a Random Variable


In practice, one may wish to transform one random variable X into a new random variable Y
by means of a transformation
Y = g(X),
where the probability density function fX(x) or distribution function of X is known. The
problem is to find the probability density function fY(y) or distribution function FY(y) of Y.
Let us assume that g(·) is continuous and differentiable at all values of x for which fX(x) ≠ 0.
Furthermore, we assume that g(·) is monotone, so that the inverse g⁻¹(·) exists. Then, for
monotonically increasing g(·),

   FY(y) = P({Y ≤ y}) = P({g(X) ≤ y}) = P({X ≤ g⁻¹(y)}) = ∫_{−∞}^{g⁻¹(y)} fX(x) dx

holds. The density function of Y = g(X) is obtained via differentiation (for increasing or
decreasing g), which leads to

   fY(y) = fX(g⁻¹(y)) · |dg⁻¹(y)/dy|
Example 7.1.7 Let

   Y = aX + b.

To find the probability density function of Y knowing FX(x) or fX(x), we calculate

   FY(y) = P({Y ≤ y}) = P({aX + b ≤ y})
         = P({X ≤ (y − b)/a}) = FX((y − b)/a),   a > 0

and

   FY(y) = 1 − FX((y − b)/a),   a < 0.

By differentiating FY(y), we obtain the probability density function

   fY(y) = (1/|a|) · fX((y − b)/a)
Example 7.1.8 Let

   Y = X².

To find the probability density function of Y knowing FX(x) or fX(x), we calculate

   FY(y) = P({X² ≤ y}) = { 0,                     y < 0
                         { P({−√y ≤ X ≤ √y}),     y ≥ 0

   FY(y) = FX(√y) − FX(−√y),   y ≥ 0.

By differentiating FY(y), we obtain the probability density function

   fY(y) = { 0,                            y < 0
           { (fX(√y) + fX(−√y))/(2√y),     y > 0

FY(y) is not differentiable at y = 0. We may arbitrarily choose fY(y) = 0 at y = 0.
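
The transformation formula can be verified by simulation. The following added sketch (an illustration, not from the manuscript) draws samples of X ∼ N(0, 1), forms Y = X², and compares a histogram estimate of fY(y) with (fX(√y) + fX(−√y))/(2√y) away from the singularity at y = 0:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(500_000)
y = x ** 2                                   # Y = X^2

def f_X(t):                                  # standard normal pdf
    return np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)

# histogram estimate of f_Y(y) on [0.5, 5], avoiding the 1/sqrt(y) singularity
hist, edges = np.histogram(y, bins=200, range=(0.5, 5.0), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
hist /= np.mean(y <= 5.0) - np.mean(y <= 0.5) + (1 - 1)  # histogram is normalised on the range
hist *= np.mean((y >= 0.5) & (y <= 5.0))     # rescale to an absolute density estimate

# density predicted by the transformation formula, for y > 0
f_Y = (f_X(np.sqrt(centers)) + f_X(-np.sqrt(centers))) / (2 * np.sqrt(centers))

print(np.max(np.abs(hist - f_Y)))            # small, up to Monte Carlo error
```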

7.1.14 The Characteristic Function


The characteristic function (cf) of a random variable X is defined as follows:

   ΦX(ω) := E [e^{jωX}]                                   (7.1)
          = ∫_{−∞}^{∞} fX(x) e^{jωx} dx.                  (7.2)

This can be interpreted as the expectation of a function g(X) = e^{jωX}, or as the Fourier transform
(with reversed sign of the exponent) of the probability density function (pdf) of X. Considering
Equation (7.2) as a Fourier transform, we can readily apply all the properties of Fourier
transforms to the characteristic function. Most importantly, the inverse relationship of Equation
(7.2) is given by

   fX(x) = (1/2π) ∫_{−∞}^{∞} ΦX(ω) e^{−jωx} dω.

Since |ΦX(ω)| ≤ ∫ |fX(x)| dx = 1, the characteristic function always exists.
Example 7.1.9 We shall derive the cf of a Gaussian distributed random variable. First, we
consider Z ∼ N(0, 1).

   ΦZ(ω) = ∫ fZ(z) e^{jωz} dz
         = (1/√(2π)) ∫_{−∞}^{∞} e^{−z²/2} e^{jωz} dz
         = (1/√(2π)) ∫_{−∞}^{∞} e^{−(z − jω)²/2 + (jω)²/2} dz
         = e^{−ω²/2} ∫ fZ(z − jω) dz
         = e^{−ω²/2}

Now we consider the more general case of X = σZ + µ ∼ N(µ, σ²):

   ΦX(ω) = E [e^{jω(σZ + µ)}]
         = e^{jωµ} ΦZ(σω)
         = e^{jωµ − σ²ω²/2}

The cf is also known as the moment-generating function, as it may be used to determine the nth
order moment of a random variable. This can be seen from the following derivation. Consider
the power series expansion of the exponential, valid for all finite x:

   e^x = Σ_{n=0}^{∞} x^n / n! .

Substituting this into Equation (7.2) yields

   ΦX(ω) = ∫ fX(x) (1 + jωx + (jωx)²/2! + · · ·) dx
         = 1 + jωE [X] + ((jω)²/2!) E [X²] + · · ·                       (7.3)
Differentiating the above expression n times and evaluating at the origin gives

   (d/dω)^n ΦX(ω) |_{ω=0} = j^n E [X^n]

which means that the nth order moment of X can be determined according to

   E [X^n] = (1/j^n) (d/dω)^n ΦX(ω) |_{ω=0} .

This is known as the moment theorem. When the power series expansion of Equation (7.3)
converges the characteristic function and hence the pdf of X are completely determined by the
moments of X.
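
The moment theorem is easy to check numerically. The added sketch below uses the closed-form cf of a Gaussian (derived in Example 7.1.9) and finite-difference approximations of its derivatives at ω = 0; the parameter values are arbitrary examples:

```python
import numpy as np

mu, sigma = 1.5, 2.0

def phi(w):                                   # cf of N(mu, sigma^2), Example 7.1.9
    return np.exp(1j * w * mu - 0.5 * sigma ** 2 * w ** 2)

h = 1e-5
d1 = (phi(h) - phi(-h)) / (2 * h)             # dPhi/dw at w = 0
d2 = (phi(h) - 2 * phi(0.0) + phi(-h)) / h ** 2

print((d1 / 1j).real)                         # ~ mu          = E[X]
print((d2 / 1j ** 2).real)                    # ~ mu^2+sigma^2 = E[X^2]
```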
7.2 Covariance
If the statistical properties of the two random variables X1 and X2 are described by their joint
probability density function fX1 X2 (x1 , x2 ), then we have
   E [g(X1, X2)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} g(x1, x2) · fX1X2(x1, x2) dx1 dx2

For g(X1, X2) = X1^n X2^k,

   µn,k := E [X1^n X2^k] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1^n x2^k fX1X2(x1, x2) dx1 dx2,   n, k = 0, 1, 2, . . .

are called the joint moments of X1 and X2. Clearly µ0,k = E [X2^k], while µn,0 = E [X1^n]. The
second order moment E [X1X2] of X1 and X2 we denote¹ by rX1X2,

   rX1X2 = E [X1X2] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX1X2(x1, x2) dx1 dx2

If rX1X2 = E [X1X2] = E [X1] · E [X2], then X1 and X2 are said to be uncorrelated. If
rX1X2 = 0, then X1 and X2 are orthogonal.
   The moments µ′n,k, n, k = 0, 1, 2, . . . are known as joint central moments and defined by

   µ′n,k = E [(X1 − E [X1])^n (X2 − E [X2])^k]
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x1 − E [X1])^n (x2 − E [X2])^k fX1X2(x1, x2) dx1 dx2,   n, k = 0, 1, 2, . . .

The second order central moments

   µ′2,0 = σX1²,    µ′0,2 = σX2²

are the variances of X1 and X2, respectively. The joint moment µ′1,1 is called the covariance of X1
and X2 and denoted by cX1X2:

   cX1X2 = E [(X1 − E [X1])(X2 − E [X2])]
         = ∫_{−∞}^{∞} ∫_{−∞}^{∞} (x1 − E [X1])(x2 − E [X2]) · fX1X2(x1, x2) dx1 dx2

Clearly

   cX1X2 = rX1X2 − E [X1] E [X2].

If two random variables are independent, i.e. fX1X2(x1, x2) = fX1(x1)fX2(x2), then cX1X2 = 0.
The converse, however, is not always true (except for the Gaussian case). The normalized second-
order central moment ρ = µ′1,1/√(µ′2,0 µ′0,2) = cX1X2/(σX1 σX2) is known as the correlation
coefficient of X1 and X2. It can be shown that

   −1 ≤ ρ ≤ 1
We will show that two uncorrelated random variables are not necessarily independent.
¹In the engineering literature rX1X2 is often called correlation. We reserve this term to describe the normalized
covariance, as is common in the statistical literature.

Example 7.2.1 U is uniformly distributed on [−π, π) and

   X1 = cos U,   X2 = sin U.

X1 and X2 are not independent because X1² + X2² = 1. However,

   cX1X2 = E [(X1 − E [X1])(X2 − E [X2])]
         = E [X1X2] = E [cos U sin U]
         = (1/2) E [sin 2U]
         = 0

Thus X1 and X2 are uncorrelated but not independent.
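
This can be illustrated by simulation. The added sketch below (assuming NumPy) estimates the covariance of X1 = cos U and X2 = sin U and exhibits their deterministic dependence:

```python
import numpy as np

rng = np.random.default_rng(2)
u = rng.uniform(-np.pi, np.pi, size=1_000_000)
x1, x2 = np.cos(u), np.sin(u)

c = np.mean(x1 * x2) - x1.mean() * x2.mean()   # covariance estimate
print(c)                                       # ~ 0: uncorrelated
print(np.allclose(x1**2 + x2**2, 1.0))         # True: fully dependent
```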

Example 7.2.2 Let Xi, i = 1, . . . , N be random variables and αi, i = 1, . . . , N be scalar values.
We calculate the mean and the variance of

   Y = Σ_{i=1}^{N} αi Xi

given the means and variances (and covariances) of the Xi.

   E [Y ] = Σ_{i=1}^{N} αi E [Xi]

   σY² = E [(Y − E [Y ])²] = Σ_{i=1}^{N} Σ_{j=1}^{N} αi αj cXiXj

For the special case cXiXj = 0, i ≠ j,

   σY² = Σ_{i=1}^{N} αi² σXi² .
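
A short numerical check of the last formula for uncorrelated components (added illustration; the weights and standard deviations below are arbitrary example values):

```python
import numpy as np

rng = np.random.default_rng(3)
alpha = np.array([0.5, -1.0, 2.0])           # example weights
sigma = np.array([1.0, 2.0, 0.5])            # example standard deviations

X = sigma[:, None] * rng.standard_normal((3, 200_000))  # independent, zero mean
Y = alpha @ X

print(Y.var(), np.sum(alpha**2 * sigma**2))  # both ~ 5.25
```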

7.3 Stochastic Processes (Random Processes)


Random processes naturally arise in many engineering applications. For example, the voltage
waveform of a noise source or the bit stream of a binary communications signal are continuous-
time random processes. The monthly rainfall values or the daily stock market index are examples
of discrete-time random processes. However, random sequences are most commonly encountered
when sampling continuous-time processes, as discussed in earlier chapters. The remainder of the
manuscript focuses solely on discrete-time random processes; however, the results presented
herein apply in a similar fashion to the continuous-time case.

Definition 7.3.1 A real random process is an indexed set of real functions of time that has
certain statistical properties. We characterise a real random process by a set of real functions
and associated probability description.

In general, xi(n) = X(n, ζi), n ∈ Z, ζi ∈ S, denotes the waveform that is obtained when the
event ζi of the process occurs. xi(n) is called a sample function of the process (see Figure 7.7).
The set of all possible sample functions {X(n, ζi)} is called the ensemble and defines the random
process X(n) that describes a noise source.

Figure 7.7: Voltage waveforms emitted from a noise source: sample functions xi(n) = X(n, ζi);
at the fixed times n1 and n2 one obtains the random variables X1 = X(n1, ζ) and X2 = X(n2, ζ).

A random process can be seen as a function
of two variables, time n and elementary event ζ. When n is fixed, X(n, ζ) is simply a random
variable. When ζ is fixed, X(n, ζ) = x(n) is a function of time known as “sample function” or
realisation. If n = n1 then X(n1 , ζ) = X(n1 ) = X1 is a random variable. If the noise source has
a Gaussian distribution, then any of the random variables would be described by

   fXi(xi) = (1/√(2πσi²)) · e^{−(xi − µi)²/(2σi²)}

where Xi = X(ni, ζ). Also, all joint distributions are multivariate Gaussian. At each point
in time, the value of the waveform is a random variable described by its probability density
function at that time n, fX(x; n).

The probability that the process X(n) will have a value in the range [a, b] at time n is given by

   P({a ≤ X(n) ≤ b}) = ∫_{a}^{b} fX(x; n) dx
The first moments of X(n) are

   µ1(n) = E [X(n)]   = ∫_{−∞}^{∞} x fX(x; n) dx
   µ2(n) = E [X(n)²]  = ∫_{−∞}^{∞} x² fX(x; n) dx
   ...
   µl(n) = E [X(n)^l] = ∫_{−∞}^{∞} x^l fX(x; n) dx,
where µ1 (n) is the mean of X(n), and µ2 (n)−µ1 (n)2 is the variance of X(n). Because in general,
fX (x; n) depends on n, the moments will depend on time. The probability that a process X(n)
lies in the interval [a, b] at time n1 and in the interval [c, d] at time n2 is given by the joint
density function fX1 X2 (x1 , x2 ; n1 , n2 ),
   P({a ≤ X1 ≤ b, c ≤ X2 ≤ d}) = ∫_{a}^{b} ∫_{c}^{d} fX1X2(x1, x2; n1, n2) dx1 dx2 .

A complete description of a random process requires knowledge of all such joint densities for all
possible point locations and orders fX1 X2 ...XN (x1 , . . . , xN ; n1 , . . . , nN ).
• A continuous random process consists of a random process with associated continuously
distributed random variables, X(n), n ∈ Z. For example, noise in communication circuits
at a given time; the voltage measured can have any value.
• A discrete random process consists of a random process with associated discrete random
variables, e.g. the output of an ideal limiter.

7.3.1 Stationarity and Ergodicity


Definition 7.3.2 A random process X(n), n ∈ Z is said to be stationary to the order N if, for
any n1 , n2 , . . . nN , and x1 , x2 , . . . , xN , and any n0

   FX(x1, . . . , xN; n1, . . . , nN) = P({X(n1) ≤ x1, X(n2) ≤ x2, . . . , X(nN) ≤ xN})
      = P({X(n1 + n0) ≤ x1, X(n2 + n0) ≤ x2, . . . , X(nN + n0) ≤ xN})
      = FX(x1, . . . , xN; n1 + n0, . . . , nN + n0)

holds or equivalently:

fX (x1 , x2 , . . . , xN ; n1 , . . . , nN ) = fX (x1 , x2 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) .

The process is said to be strictly stationary if it is stationary to the infinite order. If a process
is stationary, it can be translated in time without changing its statistical description.

Ergodicity is a topic dealing with the relationship between statistical averages and time
averages. Suppose that, for example, we wish to determine the mean µ1 (n) of a process X(n).
For this purpose, we observe a large number of samples X(n, ζi ) and we use their ensemble
average as the estimate of µ1 (n) = E [X(n)]
   µ̂1(n) = (1/L) Σ_{i=1}^{L} X(n, ζi)
Suppose, however, that we have access only to a single sample x(n) = X(n, ζ) of X(n) for
−N ≤ n ≤ N. Can we then use the time average

   ⟨X(n)⟩ = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} X(n, ζ)

as the estimate of µ1(n)? This is, of course, impossible if µ1(n) depends on n. However, if
µ1(n) = µ1 is a constant, then under general conditions ⟨X(n)⟩ approaches µ1.

Definition 7.3.3 A random process is said to be ergodic if all time averages of any sample
function are equal to the corresponding ensemble averages (expectation).

Example 7.3.1 DC and RMS values are defined in terms of time averages, but if the process
is ergodic, they may be evaluated by use of ensemble averages. The DC value of X(n) is
Xdc = E [X(n)] and the RMS value is Xrms = √(E [X(n)²]). If X(n) is ergodic, the time average
is equal to the ensemble average, that is

   Xdc = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} X(n) = E [X(n)] = ∫_{−∞}^{∞} x fX(x; n) dx = µX

Similarly, we have for the rms value:

   Xrms = √( lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} X(n)² ) = √(E [X(n)²]) = √(σX² + µX²)
        = √( ∫_{−∞}^{∞} x² fX(x; n) dx )

where σX² is the variance of X(n).

Ergodicity and Stationarity


If a process is ergodic, all time and ensemble averages are interchangeable. The time average is
by definition independent of time. As it equals the ensemble averages (such as moments), the
latter are also independent of time. Thus, an ergodic process must be stationary. But not all
stationary processes are ergodic.

Example 7.3.2 An Ergodic Random Process

Let the random process be given by X(n) = A cos(ω0 n + Φ), where A and ω0 are constants
and Φ is a random variable that is uniformly distributed over [0, 2π). First we evaluate some
ensemble averages (expectations).

   E [X(n)] = ∫_{−∞}^{∞} A cos(ω0 n + ϕ) fΦ(ϕ) dϕ
            = ∫_{0}^{2π} A cos(ω0 n + ϕ) (1/2π) dϕ
            = 0.
Furthermore,

   E [X(n)²] = ∫_{0}^{2π} A² cos²(ω0 n + ϕ) (1/2π) dϕ = A²/2
In this example, the time parameter n disappeared when the ensemble averages were evaluated.
The process is stationary up to the second order.
Now, evaluate the corresponding time averages using a sample function of the random process,
e.g. X(n, ζ1) = A cos(ω0 n), which occurs when Φ = 0. The zero phase Φ = 0 corresponds to one
of the events. The time average for any of the sample functions can be calculated by letting ϕ
be any value between 0 and 2π. The time averages are:

   • First moment:  ⟨X(n)⟩  = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} A cos(ω0 n + ϕ) = 0

   • Second moment: ⟨X(n)²⟩ = lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} [A cos(ω0 n + ϕ)]² = A²/2
In this example, ϕ disappears when the time average is calculated. We see that the time average
is equal to the ensemble average for the first and second moments.
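
A simulation makes this concrete. The added sketch below compares the ensemble average over many realisations of Φ at a fixed time with the time average of a single sample function; A and ω0 are arbitrary example values:

```python
import numpy as np

rng = np.random.default_rng(4)
A, w0 = 2.0, 0.3

# ensemble averages at the fixed time n = 0, over realisations of Phi
phi = rng.uniform(0.0, 2 * np.pi, size=100_000)
print(np.mean(A * np.cos(phi)), np.mean((A * np.cos(phi)) ** 2))  # ~0, ~A^2/2

# time averages of one sample function (one fixed phase)
n = np.arange(-50_000, 50_001)
x = A * np.cos(w0 * n + phi[0])
print(x.mean(), np.mean(x ** 2))                                  # ~0, ~A^2/2
```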
We have not yet proved that the process is ergodic because we have to evaluate all the orders
of moments and averages. In general, it is difficult to prove that a process is ergodic. In this
manuscript we will focus on stationary processes.
Example 7.3.3 Suppose that the amplitude in the previous example is now random. Let us
assume that A and Φ are independent. In this case, the process is stationary but not ergodic,
because if E [A²] = σ², then

   E [X(n)²] = E [A²] · E [cos²(ω0 n + Φ)] = σ²/2

but

   lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} X(n)² = (1/2) A²,

which depends on the realised amplitude A and thus differs from realisation to realisation.

Summary:
• An ergodic process has to be stationary.
• However, if a process is stationary, it may or may not be ergodic.

7.3.2 Second-Order Moment Functions and Wide-Sense Stationarity


Definition 7.3.4 The second-order moment function (SOMF) of a real random process X(n)
is defined as:

   rXX(n1, n2) = E [X(n1)X(n2)]
               = ∫_{−∞}^{∞} ∫_{−∞}^{∞} x1 x2 fX1X2(x1, x2; n1, n2) dx1 dx2

where X1 = X(n1) and X2 = X(n2).



If the process is stationary to the second order, the SOMF is only a function of the time difference
κ = |n2 − n1 |, and
rXX (κ) = E [X(n + κ)X(n)]

Definition 7.3.5 A random process is said to be wide-sense stationary if:

1. E [X(n)] is a constant.

2. rXX (n1 , n2 ) = rXX (κ), where κ = |n2 − n1 |

A process that is stationary to order 2 or greater is certainly wide-sense stationary. However, a
wide-sense stationary process is not necessarily stationary to order 2 (or any higher order).

Properties of the SOMF


 
1. rXX(0) = E [X(n)²] = σX² + µX² is the second order moment

2. rXX(κ) = rXX(−κ)

3. rXX(0) ≥ |rXX(κ)|, |κ| > 0

Proofs for 1 and 2 follow from the definition. The third property holds because

   E [(X(n + κ) ± X(n))²] ≥ 0
   E [X(n + κ)²] ± 2E [X(n + κ)X(n)] + E [X(n)²] ≥ 0
   rXX(0) ± 2 rXX(κ) + rXX(0) ≥ 0.

Definition 7.3.6 The “cross-SOMF” for two real processes X(n) and Y (n) is:

rXY (n1 , n2 ) = E [X(n1 )Y (n2 )]

If X(n) and Y (n) are jointly stationary, the cross-SOMF becomes:

rXY (n1 , n2 ) = rXY (n1 − n2 ) = rXY (κ)

where κ = n1 − n2 .

Properties of the Cross-SOMF of a Jointly Stationary Real Process


1. rXY(−κ) = rYX(κ)

2. |rXY(κ)| ≤ √(rXX(0) rYY(0)).

   This holds because

      E [(X(n + κ) − aY(n))²] ≥ 0   ∀ a ∈ R
      E [X(n + κ)²] + a² E [Y(n)²] − 2a E [X(n + κ)Y(n)] ≥ 0   ∀ a ∈ R
      rXX(0) + a² rYY(0) − 2a rXY(κ) ≥ 0

   Since this quadratic form in a is non-negative for all a, it cannot have two distinct real
   roots, so its discriminant must satisfy

      rXY(κ)² − rXX(0) rYY(0) ≤ 0

   and therefore

      rXY(κ)² ≤ rXX(0) rYY(0)  ⇒  |rXY(κ)| ≤ √(rXX(0) rYY(0)).

   Sometimes rXY(κ)/√(rXX(0) rYY(0)) is denoted by ρXY(κ).

3. |rXY(κ)| ≤ (1/2)[rXX(0) + rYY(0)]

Property 2 constitutes a tighter bound than that of property 3, because the geometric mean of
two positive numbers cannot exceed the arithmetic mean, that is

   √(rXX(0) rYY(0)) ≤ (1/2)[rXX(0) + rYY(0)]
• Two random processes X(n) and Y (n) are said to be uncorrelated if:

rXY (κ) = E [X(n + κ)] E [Y (n)] = µX µY ∀κ ∈ Z

• Two random processes X(n) and Y (n) are orthogonal if:

rXY (κ) = 0, ∀κ ∈ Z

• If X(n) = Y (n), the cross-SOMF becomes the SOMF.

• If the random processes X(n) and Y (n) are jointly ergodic, the time average may be used
to replace the ensemble average.
  If X(n) and Y(n) are jointly ergodic,

      rXY(κ) := E [X(n + κ)Y(n)] = E [X(n)Y(n − κ)]
              ≡ lim_{N→∞} (1/(2N + 1)) Σ_{n=−N}^{N} X(n) Y(n − κ)                  (7.4)

  In this case, cross-SOMFs and auto-SOMFs may be measured by using a system composed
  of a delay element, a multiplier and an accumulator. This is illustrated in Figure 7.8, where
  the block z^{−κ} is a delay line of order κ.

Figure 7.8: Measurement of the SOMF: Y(n) is delayed by z^{−κ}, multiplied by X(n), and
averaged over 2N + 1 samples to produce rXY(κ).
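
In discrete time, this measurement system amounts to a delay, a pointwise product and an average. Below is a minimal added sketch under the joint-ergodicity assumption (the function name is ours, and the delayed-white-noise test case is an arbitrary example):

```python
import numpy as np

def cross_somf(x, y, kappa):
    """Time-average estimate of r_XY(kappa) = E[X(n+kappa)Y(n)]
    from single realisations x and y (delay, multiply, accumulate)."""
    x, y = np.asarray(x), np.asarray(y)
    if kappa < 0:
        return cross_somf(y, x, -kappa)      # r_XY(-k) = r_YX(k) for real processes
    return np.mean(x[kappa:] * y[:len(y) - kappa])

rng = np.random.default_rng(5)
x = rng.standard_normal(200_000)
y = np.roll(x, 3)                            # Y(n) = X(n-3), up to edge effects

print(cross_somf(x, y, -3))                  # ~ 1: peak where the delay matches
print(cross_somf(x, y, 0))                   # ~ 0
```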



Central Second-Order Moment Function (Covariance Function)


The central second-order moment function is called the covariance function and is defined by

cXX (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(X(n) − E [X(n)])]

which can be re-written in the form

cXX (n + κ, n) = rXX (n + κ, n) − E [X(n + κ)] E [X(n)]

The covariance function for two processes X(n) and Y (n) is defined by

cXY (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(Y (n) − E [Y (n)])]

or equivalently
cXY (n + κ, n) = rXY (n + κ, n) − E [X(n + κ)] E [Y (n)]
For processes that are at least jointly wide sense stationary, we have

cXX (κ) = rXX (κ) − (E [X(n)])2

and
cXY (κ) = rXY (κ) − E [X(n)] E [Y (n)]
The variance of a random process is obtained from cXX(κ) by setting κ = 0. If the process is
stationary,

   σX² = E [(X(n) − E [X(n)])²] = rXX(0) − (E [X(n)])² = rXX(0) − µX²
For two random processes, if cY X (n, n + κ) = 0, then they are called uncorrelated, which is
equivalent to rXY (n + κ, n) = E [X(n + κ)] E [Y (n)].

Complex Random Processes



Definition 7.3.7 A complex random process is Z(n) = X(n) + j Y (n) where X(n) and Y (n)
are real random processes.

A complex process is stationary if X(n) and Y (n) are jointly stationary, i.e.

fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 , . . . , n2N )
= fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 + n0 , . . . , n2N + n0 )

for all x1 , . . . , xN , y1 , . . . , yN and all n1 , . . . , n2N , n0 and N . The mean of a complex random
process is defined as:
E [Z(n)] = E [X(n)] + jE [Y (n)]

Definition 7.3.8 The SOMF for a complex random process is:

rZZ (n1 , n2 ) = E [Z(n1 )Z(n2 )∗ ]



Definition 7.3.9 The complex random process is stationary in the wide sense if E [Z(n)] is a
complex constant and rZZ (n1 , n2 ) = rZZ (κ) where κ = |n2 − n1 |.

The SOMF of a wide-sense stationary complex process has the symmetry property

   rZZ(−κ) = rZZ(κ)*

as can easily be seen from the definition.

Definition 7.3.10 The cross-SOMF for two complex random processes Z1 (n) and Z2 (n) is:
rZ1 Z2 (n1 , n2 ) = E [Z1 (n1 )Z2 (n2 )∗ ]

If the complex random processes are jointly wide-sense stationary, the cross-SOMF becomes:
rZ1 Z2 (n1 , n2 ) = rZ1 Z2 (κ) where κ = |n2 − n1 |
The covariance function of a complex random process is defined as
cZZ (n + κ, n) = E [(Z(n + κ) − E [Z(n + κ)]) (Z(n) − E [Z(n)])∗ ]
and is a function of κ only if Z(n) is wide-sense stationary. The cross-covariance function of
Z1 (n) and Z2 (n) is given as:
cZ1 Z2 (n + κ, n) = E [(Z1 (n + κ) − E [Z1 (n + κ)]) (Z2 (n) − E [Z2 (n)])∗ ]
Z1 (n) and Z2 (n) are called uncorrelated if cZ1 Z2 (n + κ, n) = 0. They are orthogonal if rZ1 Z2 (n +
κ, n) = 0.

Example 7.3.4 A complex process Z(n) is comprised of a sum of M complex signals

   Z(n) = Σ_{m=1}^{M} Am e^{j(ω0 n + Φm)}

Here ω0 is constant, Am is a zero-mean random amplitude of the mth signal and Φm is its
random phase. We assume all variables Am and Φm for m = 1, 2, . . . , M to be statistically
independent and the random phases uniformly distributed on [0, 2π). Find the mean and SOMF.
Is the process wide-sense stationary?

   E [Z(n)] = Σ_{m=1}^{M} E [Am] e^{jω0 n} E [e^{jΦm}]
            = Σ_{m=1}^{M} E [Am] e^{jω0 n} {E [cos(Φm)] + jE [sin(Φm)]}
            = 0

   rZZ(n + κ, n) = E [Z(n + κ)Z(n)*]
                 = E [ Σ_{m=1}^{M} Am e^{jω0(n+κ)} e^{jΦm} · Σ_{l=1}^{M} Al e^{−jω0 n} e^{−jΦl} ]
                 = Σ_{m=1}^{M} Σ_{l=1}^{M} E [Am Al] · e^{jω0 κ} · E [e^{j(Φm − Φl)}]

Since

   E [e^{j(Φm − Φl)}] = { 1, m = l
                        { 0, m ≠ l

(the latter holds because for m ≠ l, E [e^{j(Φm − Φl)}] = E [e^{jΦm}] E [e^{−jΦl}] = 0), only the
terms with m = l remain:

   rZZ(n + κ, n) = Σ_{m=1}^{M} E [Am²] e^{jω0 κ} = e^{jω0 κ} Σ_{m=1}^{M} E [Am²]

The mean is constant and the SOMF depends only on κ, so the process is wide-sense stationary.

7.4 Power Spectral Density


7.4.1 Definition of Power Spectral Density
Let X(n, ζi), n ∈ Z, represent a sample function of a real random process X(n, ζ) ≡ X(n). We
define a truncated version of this sample function as follows:

   xN(n) = { X(n, ζi), |n| ≤ M, M ∈ Z+
           { 0,        elsewhere

where N = 2M + 1 is the length of the truncation window. As long as M is finite, we presume
that xN(n) will satisfy

   Σ_{n=−M}^{M} |xN(n)| < ∞

Then the Fourier transform of xN(n) becomes:

   XN(e^{jω}) = Σ_{n=−∞}^{∞} xN(n) e^{−jωn} = Σ_{n=−M}^{M} X(n, ζi) e^{−jωn}

The normalised energy in the time interval [−M, M] is

   EN = Σ_{n=−M}^{M} xN(n)² = ∫_{−π}^{π} |XN(e^{jω})|² dω/2π

where Parseval's theorem has been used to obtain the second expression. By dividing EN by N,
we obtain the average power PN in xN(n) over the interval [−M, M]:

   PN = (1/(2M + 1)) Σ_{n=−M}^{M} xN(n)² = ∫_{−π}^{π} (|XN(e^{jω})|²/(2M + 1)) dω/2π      (7.5)

From Equation (7.5), we see that the average power PN , in the truncated ensemble member
xN (n), can be obtained by the integration of |XN (ejω )|2 /N . To obtain the average power for the
entire sample function, we must consider the limit as M → ∞. PN provides a measure of power
for a single sample function. If we wish to determine the average power of the random process,
we must also take the average (expected value) over the entire ensemble of sample functions.
The average power of the process X(n, ζ) is defined as

   PXX = lim_{M→∞} (1/(2M + 1)) Σ_{n=−M}^{M} E [X(n, ζ)²]
       = lim_{M→∞} ∫_{−π}^{π} (E [|XN(e^{jω}, ζ)|²]/(2M + 1)) dω/2π

Remark 1: The average power PXX in a random process X(n) is given by the time average
of its second moment.
   
Remark 2: If X(n) is wide-sense stationary, E [X(n)²] is a constant and PXX = E [X(n)²].
PXX can be obtained by a frequency domain integration. If we define the power spectral density
(PSD) of X(n) by

   SXX(e^{jω}) = lim_{M→∞} E [|XN(e^{jω}, ζ)|²] / (2M + 1)

where

   XN(e^{jω}, ζ) = Σ_{n=−M}^{M} X(n, ζ) e^{−jωn}

then the normalised power is given by

   PXX = ∫_{−π}^{π} SXX(e^{jω}) dω/2π = rXX(0)
Example 7.4.1 Consider the random process

   X(n) = A · cos(ω0 n + Φ)

where A and ω0 are real constants and Φ is a random variable uniformly distributed on [0, π/2).
We shall find PXX for X(n) using

   PXX = lim_{M→∞} (1/(2M + 1)) Σ_{n=−M}^{M} E [X(n)²] .

The mean-squared value of X(n) is

   E [X(n)²] = E [A² cos²(ω0 n + Φ)]
             = E [A²/2 + (A²/2) cos(2ω0 n + 2Φ)]
             = A²/2 + (A²/2) ∫_{0}^{π/2} cos(2ω0 n + 2ϕ) (2/π) dϕ
             = A²/2 + (A²/2) (2/π) [sin(2ω0 n + 2ϕ)/2]_{0}^{π/2}
             = A²/2 + (A²/2) (1/π) [sin(2ω0 n + π) − sin(2ω0 n)]
             = A²/2 + (A²/2) (1/π) (−2 sin(2ω0 n))
             = A²/2 − (A²/π) sin(2ω0 n)
X(n) is non-stationary because E [X(n)²] is a function of time. The time average of the above
expression is

   lim_{M→∞} (1/(2M + 1)) Σ_{n=−M}^{M} [A²/2 − (A²/π) sin(2ω0 n)] = A²/2

Using

   PXX = lim_{M→∞} ∫_{−π}^{π} (E [|XN(e^{jω})|²]/(2M + 1)) dω/2π

yields PXX = A²/2, which agrees with the above and should be shown by the reader as an exercise.

The results derived for the PSD can be obtained similarly for two processes X(n) and Y(n).
The cross-power density spectrum is then given by

   SXY(e^{jω}) = lim_{M→∞} E [XN(e^{jω}) YN(e^{jω})*] / (2M + 1)

and thus the total average cross power is

   PXY = ∫_{−π}^{π} SXY(e^{jω}) dω/2π

7.4.2 Relationship between the PSD and SOMF: Wiener-Khintchine Theorem

Theorem 7.4.1 Wiener-Khintchine Theorem
When X(n) is a wide-sense stationary process, the PSD can be obtained from the Fourier
transform of the SOMF:

   SXX(e^{jω}) = F{rXX(κ)} = Σ_{κ=−∞}^{∞} rXX(κ) e^{−jωκ}

and conversely,

   rXX(κ) = F⁻¹{SXX(e^{jω})} = ∫_{−π}^{π} SXX(e^{jω}) e^{jωκ} dω/2π

Consequence: there are two distinctly different methods that may be used to evaluate the
PSD of a random process:

1. The PSD is obtained directly from the definition

      SXX(e^{jω}) = lim_{M→∞} E [|XN(e^{jω})|²] / (2M + 1)

   where

      XN(e^{jω}) = Σ_{n=−M}^{M} X(n) e^{−jωn}

2. The PSD is obtained by evaluating the Fourier transform of rXX(κ), where rXX(κ) has
   to be obtained first.
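
The two routes can be compared numerically. The added sketch below assumes NumPy and an AR(1) process as a test case, for which rXX(κ) = σ² a^{|κ|}/(1 − a²) and hence SXX(e^{jω}) = σ²/|1 − a e^{−jω}|²; it estimates the PSD from the definition by ensemble-averaging |XN(e^{jω})|²/N, and compares the result with the closed-form Fourier transform of the SOMF:

```python
import numpy as np

rng = np.random.default_rng(6)
a, sigma2 = 0.8, 1.0
L, M = 2000, 512                             # ensemble size, record length

e = np.sqrt(sigma2) * rng.standard_normal((L, M + 200))
x = np.zeros_like(e)
x[:, 0] = e[:, 0]
for n in range(1, e.shape[1]):               # AR(1): X(n) = a X(n-1) + e(n)
    x[:, n] = a * x[:, n - 1] + e[:, n]
x = x[:, 200:]                               # drop the start-up transient

# route 1: definition, E[|X_N|^2]/N estimated by an ensemble average
S_def = np.mean(np.abs(np.fft.fft(x, axis=1)) ** 2, axis=0) / M

# route 2: Fourier transform of the SOMF (closed form for AR(1))
w = 2 * np.pi * np.arange(M) / M
S_somf = sigma2 / np.abs(1 - a * np.exp(-1j * w)) ** 2

print(np.mean(np.abs(S_def - S_somf) / S_somf))   # small relative error
```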

7.4.3 Properties of the PSD

The PSD has a number of important properties:

1. The PSD is always real, even if X(n) is complex:

      SXX(e^{jω})* = Σ_{κ=−∞}^{∞} rXX(κ)* e^{+jωκ}
                   = Σ_{κ=−∞}^{∞} rXX(−κ) e^{+jωκ}
                   = Σ_{κ′=−∞}^{∞} rXX(κ′) e^{−jωκ′}     (κ′ = −κ)
                   = SXX(e^{jω})

2. SXX(e^{jω}) ≥ 0, even if X(n) is complex.

3. When X(n) is real, SXX(e^{−jω}) = SXX(e^{jω}).

4. ∫_{−π}^{π} SXX(e^{jω}) dω/2π = PXX is the total normalised average power.

Example 7.4.2

   X(n) = A sin(ω0 n + Φ)

where A, ω0 are constants and Φ is uniformly distributed on [0, 2π). The mean of X(n) is given by

   E [X(n)] = ∫_{0}^{2π} A sin(ω0 n + ϕ) (1/2π) dϕ = 0

and its SOMF is

   rXX(κ) = E [X(n + κ)X(n)] = (A²/2) cos(ω0 κ)

By taking the Fourier transform of rXX(κ) we obtain the PSD

   SXX(e^{jω}) = F{rXX(κ)} = π (A²/2) {η(ω − ω0) + η(ω + ω0)}

where

   η(ω) = Σ_{k=−∞}^{∞} δ(ω + 2πk)

Example 7.4.3 Suppose that the random variables Ai are uncorrelated with zero mean and
variance σi², i = 1, . . . , P. Define

   Z(n) = Σ_{i=1}^{P} Ai exp{jωi n}.

It follows that

   rZZ(κ) = Σ_{i=1}^{P} σi² exp{jωi κ}

and

   SZZ(e^{jω}) = 2π Σ_{i=1}^{P} σi² η(ω − ωi)
Example 7.4.4 Suppose now that the random variables Ai and Bi, i = 1, . . . , P are pairwise
uncorrelated with zero mean and common variance

   E [Ai²] = E [Bi²] = σi²,   i = 1, . . . , P.

Define

   X(n) = Σ_{i=1}^{P} [Ai cos(ωi n) + Bi sin(ωi n)].

We find

   rXX(κ) = Σ_{i=1}^{P} σi² cos(ωi κ)

and therefore

   SXX(e^{jω}) = π Σ_{i=1}^{P} σi² [η(ω − ωi) + η(ω + ωi)]

Definition 7.4.1 A random process X(n) is said to be a “white noise process” if the PSD is
constant over all frequencies:
N0
SXX (ejω ) =
2
where N0 is a positive constant.

The SOMF for a white process is obtained by taking the inverse Fourier transform of SXX(e^{jω}):

   rXX(κ) = (N0/2) δ(κ)

where δ(κ) is the Kronecker delta function given by

   δ(κ) = { 1, κ = 0
          { 0, κ ≠ 0

7.4.4 Cross-Power Spectral Density


Suppose we have observations xN(n) and yN(n) of random processes X(n) and Y(n), respec-
tively. Let XN(e^{jω}) and YN(e^{jω}) be their respective Fourier transforms, i.e.

   XN(e^{jω}) = Σ_{n=−M}^{M} xN(n) e^{−jωn}

and

   YN(e^{jω}) = Σ_{n=−M}^{M} yN(n) e^{−jωn}

The cross-spectral density function is given as

   SXY(e^{jω}) = lim_{M→∞} E [XN(e^{jω}) YN(e^{jω})*] / (2M + 1)
Substituting XN(e^{jω}) and YN(e^{jω}):

   SXY(e^{jω}) = lim_{M→∞} (1/(2M + 1)) Σ_{n1=−M}^{M} Σ_{n2=−M}^{M} rXY(n1, n2) e^{−jω(n1−n2)}

Taking the inverse Fourier transform of both sides,

   ∫_{−π}^{π} SXY(e^{jω}) e^{jωκ} dω/2π
      = ∫_{−π}^{π} lim_{M→∞} (1/(2M + 1)) Σ_{n1=−M}^{M} Σ_{n2=−M}^{M} rXY(n1, n2) e^{−jω(n1−n2)} e^{jωκ} dω/2π
      = lim_{M→∞} (1/(2M + 1)) Σ_{n1=−M}^{M} Σ_{n2=−M}^{M} rXY(n1, n2) ∫_{−π}^{π} e^{jω(n2+κ−n1)} dω/2π
      = lim_{M→∞} (1/(2M + 1)) Σ_{n1=−M}^{M} Σ_{n2=−M}^{M} rXY(n1, n2) δ(n2 + κ − n1)
      = lim_{M→∞} (1/(2M + 1)) Σ_{n2=−M}^{M} rXY(n2 + κ, n2)                              (7.6)

provided −M ≤ n2 + κ ≤ M so that the delta function is in the range of the summation. This
condition can be relaxed in the limit as M → ∞. The above result shows that the time average
of the SOMF and the PSD form a Fourier transform pair. Clearly, if X(n) and Y (n) are jointly
stationary then this result also implies

X

SXY (e ) = rXY (κ)e−jωκ
κ=−∞

Definition 7.4.2 For two jointly stationary processes X(n) and Y(n) the cross-power spectral
density is given by

   SXY(e^{jω}) = F{rXY(κ)} = Σ_{κ=−∞}^{∞} rXY(κ) e^{−jωκ}

and is in general complex.

Properties:

1. SXY (ejω ) = SY X (ejω )∗ . In addition, if X(n) and Y (n) are real, then SXY (ejω ) =
SY X (e−jω ).
 
2. Re{SXY(e^{jω})} and Re{SYX(e^{jω})} are even functions of ω if X(n) and Y(n) are real.

3. Im{SXY(e^{jω})} and Im{SYX(e^{jω})} are odd functions of ω if X(n) and Y(n) are real.

4. SXY(e^{jω}) = 0 and SYX(e^{jω}) = 0 if X(n) and Y(n) are orthogonal.


Property 1 results from the fact that

   SYX(e^{jω}) = Σ_{κ=−∞}^{∞} rYX(κ) e^{−jωκ}
              = Σ_{κ=−∞}^{∞} rXY(−κ)* e^{−jωκ}
              = [ Σ_{κ=−∞}^{∞} rXY(κ) e^{−jωκ} ]*
              = SXY(e^{jω})*

Example 7.4.5 Let the SOMF of two processes X(n) and Y(n) be

   rXY(n + κ, n) = (AB/2) [sin(ω0 κ) + cos(ω0 [2n + κ])]

where A, B and ω0 are constants. We find

   lim_{M→∞} (1/(2M + 1)) Σ_{n=−M}^{M} rXY(n + κ, n)
      = (AB/2) sin(ω0 κ) + (AB/2) lim_{M→∞} (1/(2M + 1)) Σ_{n=−M}^{M} cos(ω0 [2n + κ])
      = (AB/2) sin(ω0 κ)

so, using the result of Equation (7.6),

   SXY(e^{jω}) = F{(AB/2) sin(ω0 κ)}
              = (π/j) (AB/2) [η(ω − ω0) − η(ω + ω0)]

7.5 Definition of the Spectrum


The PSD of a stationary random process has been defined as the Fourier transform of the SOMF.
We now define the 'spectrum' of a stationary random process X(n) as the Fourier transform of
the covariance function:

   CXX(e^{jω}) := Σ_{n=−∞}^{∞} cXX(n) e^{−jωn}

If Σn |cXX(n)| < ∞, CXX(e^{jω}) exists and is bounded and uniformly continuous. The inversion
is given by

   cXX(n) = (1/2π) ∫_{−π}^{π} CXX(e^{jω}) e^{jωn} dω

CXX(e^{jω}) is real, 2π-periodic and non-negative.

The cross-spectrum of two jointly stationary random processes X(n) and Y(n) is defined by

   CXY(e^{jω}) := Σ_{n=−∞}^{∞} cXY(n) e^{−jωn}
with the property

   CXY(e^{jω}) = CYX(e^{jω})*

The inversion is given by

   cXY(n) = (1/2π) ∫_{−π}^{π} CXY(e^{jω}) e^{jωn} dω

If X(n) and Y(n) are real,

   CXX(e^{jω}) = CXX(e^{−jω})
   CXY(e^{jω}) = CXY(e^{−jω})* = CYX(e^{−jω}) = CYX(e^{jω})*

The spectrum of a real process is completely specified on the interval [0, π].

7.6 Random Signals and Linear Systems


7.6.1 Convergence of Random Variables
In the following we consider definitions of convergence of a sequence of random variables Xk,
k = 0, 1, 2, . . . . They are given in decreasing order of strength.

1. Convergence with probability one (almost sure convergence)

   A sequence Xk converges to X with probability 1 if

      P( lim_{k→∞} |Xk − X| = 0 ) = 1

2. Convergence in the mean square sense (MSS)

   A sequence Xk converges to X in the MSS if

      lim_{k→∞} E [|Xk − X|²] = 0

3. Convergence in probability

   Xk converges in probability to X if, ∀ ε > 0,

      lim_{k→∞} P(|Xk − X| > ε) = 0

4. Convergence in distribution

   Xk converges in distribution to X (or Xk is asymptotically FX distributed) if

      lim_{k→∞} FXk(x) = FX(x)

   for all continuity points of FX.

The four modes of convergence are not equivalent:

i) Convergence with probability 1 implies convergence in probability.

ii) Convergence with probability 1 implies convergence in the MSS, provided second order
moments exist.

iii) Convergence in the MSS implies convergence in probability.

iv) Convergence in probability implies convergence in distribution.

With these concepts of convergence one may define continuity, differentiability and so on.

7.6.2 Linear filtering

We are given two stationary processes X(n) and Y(n) and a linear time-invariant (LTI) system
with impulse response h(n), where the filter is stable (Σ |h(n)| < ∞) as in Figure 7.9. Then for
such a system,

   Y(n) = Σk h(k)X(n − k) = Σk h(n − k)X(k)

exists with probability one, i.e. Σ_{k=−N}^{M} h(k)X(n − k) converges with probability one to Y(n)
for N, M → ∞.

Figure 7.9: A linear filter: X(n) → h(n) → Y(n).

If X(n) is stationary and E [|X(n)|] < ∞, then Y(n) is stationary. If X(n) is white noise,
i.e. E [X(n)] = 0 and E [X(n)X(k)] = σX² δ(n − k), with σX² the power of the noise and δ(n) the
Kronecker delta function

   δ(n) = { 1, n = 0
          { 0, else,

then Y(n) is called a linear process. If we do not require the filter to be stable but only
∫ |H(e^{jω})|² dω < ∞, where

   H(e^{jω}) = Σn h(n) e^{−jωn}

is the transfer function of the system, and if for X(n), Σ |cXX(n)| < ∞, then

   Y(n) = Σk h(k)X(n − k) = Σk h(n − k)X(k)

exists in the MSS, i.e.

   Σ_{k=−N}^{M} h(k)X(n − k)

converges in the MSS for N, M → ∞, and Y(n) is stationary in the wide sense with

   µY := E [Y(n)] = Σk h(k) E [X(n − k)] = µX H(e^{j0})
Example 7.6.1 Hilbert Transform

   H(e^{jω}) = { −j, 0 < ω < π
              {  0, ω = 0, −π
              {  j, −π < ω < 0

   h(n) = { 0,                    n = 0
          { (1 − cos(πn))/(πn),   else

The filter is not stable, but ∫ |H(e^{jω})|² dω < ∞. Y(n) is stationary in the wide sense. For
example, if X(n) = a cos(ω0 n + φ), then Y(n) = a sin(ω0 n + φ).

Output covariance function and spectrum

We now determine the relationship between the covariance function and the spectrum at the
input and at the output. First, let us define

   Ỹ(n) := Y(n) − µY = Σk h(k)X(n − k) − Σk h(k) E [X(n − k)]
         = Σk h(k)(X(n − k) − µX) = Σk h(k) X̃(n − k)

where X̃(n) := X(n) − µX. In the general case, the system impulse response and the input
process may be complex.
   The covariance function of the output is determined according to

   cYY(κ) = E [Ỹ(n + κ) Ỹ(n)*]
          = E [ (Σk h(k) X̃(n + κ − k)) (Σl h(l) X̃(n − l))* ]
          = Σk Σl h(k) h(l)* E [X̃(n + κ − k) X̃(n − l)*]
          = Σk Σl h(k) h(l)* cXX(κ − k + l)

Taking the Fourier transform of both sides leads to

   CYY(e^{jω}) = |H(e^{jω})|² CXX(e^{jω})                                  (7.7)

and applying the inverse Fourier transform formula yields²

   cYY(κ) = ∫_{−π}^{π} |H(e^{jω})|² CXX(e^{jω}) e^{jωκ} dω/2π

(²The integral exists because CXX(e^{jω}) is bounded and ∫ |H(e^{jω})|² dω < ∞.)

Example 7.6.2 FIR filter

Let X(n) be white noise with power σ² and

   Y(n) = X(n) − X(n − 1).

From the above, one can see that

   h(n) = {  1, n = 0
          { −1, n = 1
          {  0, else

Taking the Fourier transform leads to

   H(e^{jω}) = Σ_{n=−∞}^{∞} h(n) e^{−jωn} = 1 − e^{−jω}

and thus,

   CYY(e^{jω}) = |H(e^{jω})|² CXX(e^{jω}) = |1 − e^{−jω}|² σ²
              = |2j e^{−jω/2} sin(ω/2)|² σ²
              = 4 sin²(ω/2) σ²
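
The result is easy to confirm by simulation. In this added sketch (assuming NumPy), white noise records are passed through the difference filter and an ensemble-averaged periodogram is compared with 4σ² sin²(ω/2):

```python
import numpy as np

rng = np.random.default_rng(7)
sigma2 = 2.0
L, M = 2000, 256
x = np.sqrt(sigma2) * rng.standard_normal((L, M + 1))
y = x[:, 1:] - x[:, :-1]                     # Y(n) = X(n) - X(n-1)

S_hat = np.mean(np.abs(np.fft.fft(y, axis=1)) ** 2, axis=0) / M
w = 2 * np.pi * np.arange(M) / M

err = np.max(np.abs(S_hat - 4 * np.sin(w / 2) ** 2 * sigma2))
print(err)                                   # small compared with the peak 4*sigma2 = 8
```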

Cross-covariance function and cross-spectrum

The cross-covariance function of the output is determined according to

   cYX(κ) = E [Ỹ(n + κ) X̃(n)*]
          = E [ (Σk h(k) X̃(n + κ − k)) X̃(n)* ]
          = Σk h(k) E [X̃(n + κ − k) X̃(n)*]
          = Σk h(k) cXX(κ − k)

Taking the Fourier transform of both sides leads to

   CYX(e^{jω}) = H(e^{jω}) CXX(e^{jω})

and applying the inverse Fourier transform formula yields

   cYX(κ) = ∫_{−π}^{π} H(e^{jω}) CXX(e^{jω}) e^{jωκ} dω/2π .

Example 7.6.3 Consider a system as depicted in Figure 7.9 where X(n) is a stationary, zero-
mean white noise process with variance σX². The impulse response of the system is given by

   h(n) = [ (1/3)(1/4)^n + (2/3)(−1/2)^n ] u(n)

where u(n) is the unit step function

   u(n) = { 1, n ≥ 0
          { 0, n < 0

and we wish to determine CYY(e^{jω}) and CYX(e^{jω}).

The system transfer function is given by

   H(e^{jω}) = F{h(n)} = (1/3)/(1 − (1/4)e^{−jω}) + (2/3)/(1 + (1/2)e^{−jω})
             = 1 / (1 + (1/4)e^{−jω} − (1/8)e^{−j2ω})

The spectrum and cross-spectrum are given respectively by

   CYY(e^{jω}) = |H(e^{jω})|² CXX(e^{jω}) = σX² / |1 + (1/4)e^{−jω} − (1/8)e^{−j2ω}|²

   CYX(e^{jω}) = H(e^{jω}) CXX(e^{jω}) = σX² / (1 + (1/4)e^{−jω} − (1/8)e^{−j2ω})

Interaction of Two Linear Systems


The previous result may be generalised to obtain the cross-covariance or cross-spectrum of two
linear systems. Let X1 (n) and Y1 (n) be the input and output processes of the first system, with
impulse response h1 (n), and X2 (n) and Y2 (n) be the input and output processes of the second
system with impulse response h2 (n) (see Figure 7.10). Suppose that X1 (n) and X2 (n) are wide-
sense stationary and the systems are time-invariant and linear. Then the output cross-covariance
function is:

   cY1Y2(κ) = Σk Σl h1(k) h2(l)* cX1X2(κ − k + l)                          (7.8)

It can be seen from (7.8) that

cY1 Y2 (κ) = h1 (κ) ⋆ h2 (−κ)∗ ⋆ cX1 X2 (κ)

where ⋆ denotes convolution. Taking the Fourier transform leads to

CY1 Y2 (ejω ) = H1 (ejω )H2 (ejω )∗ CX1 X2 (ejω ) (7.9)

Figure 7.10: Interaction of two linear systems: X1(n) → h1(n), H1(e^{jω}) → Y1(n) = Σκ h1(κ)X1(n − κ)
and X2(n) → h2(n), H2(e^{jω}) → Y2(n) = Σκ h2(κ)X2(n − κ); the input cross-statistics cX1X2(κ),
CX1X2(e^{jω}) map to the output cross-statistics cY1Y2(κ), CY1Y2(e^{jω}).



Example 7.6.4 Consider the system shown in Figure 7.11 where Z(n) is stationary, zero-mean
white noise with variance σZ². Let h(n) be the same system as discussed in Example 7.6.3 and
g(n) be given by

   g(n) = δ(n) + (1/2) δ(n − 1) − (1/2) δ(n − 2)

where δ(n) is the Kronecker delta function. We wish to find the cross-spectrum CXY(e^{jω}).

   G(e^{jω}) = Σn g(n) e^{−jωn} = 1 + (1/2)e^{−jω} − (1/2)e^{−j2ω}

   CXY(e^{jω}) = H(e^{jω}) G(e^{jω})* CZZ(e^{jω})
              = [ (1 + (1/2)e^{jω} − (1/2)e^{j2ω}) / (1 + (1/4)e^{−jω} − (1/8)e^{−j2ω}) ] σZ²

Figure 7.11: Interaction of two linear systems: Z(n) drives h(n) to produce X(n) and g(n) to
produce Y(n).

Cascade of linear systems

We consider a cascade of L linear systems in serial connection, as illustrated by Figure 7.12.
From linear system theory, it is known that a cascade of linear systems can be equivalently
represented by a single system whose transfer function H(e^{jω}) is the product of the L individual
system functions:

   H(e^{jω}) = H1(e^{jω}) H2(e^{jω}) · · · HL(e^{jω}).

When a random process is the input to such a system, the spectrum of the output and the cross-
spectrum of the output with the input are given respectively by

   CYY(e^{jω}) = CXX(e^{jω}) | Π_{i=1}^{L} Hi(e^{jω}) |² = CXX(e^{jω}) Π_{i=1}^{L} |Hi(e^{jω})|²

   CYX(e^{jω}) = CXX(e^{jω}) Π_{i=1}^{L} Hi(e^{jω})

Example 7.6.5 We consider again the system shown in Figure 7.11, where we wish to find the
cross-spectrum CXY(e^{jω}). Instead of using Equation (7.9) with X1(n) = X2(n) = Z(n) as done
in Example 7.6.4, we reformulate the problem as a cascade of two systems as shown in Figure
7.13.

Figure 7.12: Cascade of L linear systems: X(n) → H1(e^{jω}) → H2(e^{jω}) → · · · → HL(e^{jω}) → Y(n).

Figure 7.13: Cascade of two linear systems: Y(n) → 1/G(e^{jω}) → H(e^{jω}) → X(n).

Remembering that CYY(e^{jω}) = |G(e^{jω})|² CZZ(e^{jω}) (from Figure 7.11 and Equation (7.7)), we
obtain

   CXY(e^{jω}) = (H(e^{jω})/G(e^{jω})) CYY(e^{jω})
              = (H(e^{jω})/G(e^{jω})) |G(e^{jω})|² CZZ(e^{jω})
              = H(e^{jω}) G(e^{jω})* CZZ(e^{jω})

7.6.3 Linear filtered process plus noise

Consider the system shown in Figure 7.14. It is assumed that X(n) is stationary with E [X(n)] =
0, V(n) is stationary with E [V(n)] = 0, and cVX(n) = 0 (X(n) and V(n) uncorrelated). It is also
assumed that h(n), X(n) and V(n) are real.

Figure 7.14: A linear filter plus noise: Y(n) = Σk h(k)X(n − k) + V(n).


The expected value of the output is derived below.

   Y(n) = Σ_{k=−∞}^{∞} h(k)X(n − k) + V(n)

   E [Y(n)] = Σ_{k=−∞}^{∞} h(k) E [X(n − k)] + E [V(n)] = 0

Output covariance function and spectrum

The covariance function of the output is determined according to

   cYY(κ) = E [Y(n + κ)Y(n)]
          = E [ (Σk h(k)X(n + κ − k) + V(n + κ)) (Σl h(l)X(n − l) + V(n)) ]
          = Σk Σl h(k)h(l) E [X(n + κ − k)X(n − l)]
            + Σk h(k) E [X(n + κ − k)V(n)]
            + Σl h(l) E [V(n + κ)X(n − l)]
            + E [V(n + κ)V(n)]
          = Σk Σl h(k)h(l) cXX(κ − k + l) + cVV(κ)

Applying the Fourier transform to both sides of the above expression yields

   CYY(e^{jω}) = F{cYY(κ)} = |H(e^{jω})|² CXX(e^{jω}) + CVV(e^{jω})

Cross-covariance function and cross-spectrum

The cross-covariance function of the output is determined according to

   cYX(κ) = E [Y(n + κ)X(n)]
          = E [ (Σk h(k)X(n + κ − k) + V(n + κ)) X(n) ]
          = Σk h(k) E [X(n + κ − k)X(n)]
          = Σk h(k) cXX(κ − k)

Applying the Fourier transform to both sides of the above expression yields

   CYX(e^{jω}) = H(e^{jω}) CXX(e^{jω})

Note that when the additive noise at the output, V(n), is uncorrelated with the input process
X(n), it has no effect on the cross-spectrum of the input and output.
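
This property underlies cross-spectral system identification: the ratio ĈYX(e^{jω})/ĈXX(e^{jω}) estimates H(e^{jω}) without bias from the additive output noise. A minimal added sketch (filter taps and noise level are arbitrary example values, and segment-edge effects are ignored):

```python
import numpy as np

rng = np.random.default_rng(8)
h = np.array([1.0, 0.5, -0.25])              # example impulse response
L, M = 2000, 256

x = rng.standard_normal((L, M))
v = 0.5 * rng.standard_normal((L, M))        # output noise, uncorrelated with x
y = np.array([np.convolve(r, h)[:M] for r in x]) + v

X = np.fft.fft(x, axis=1)
Y = np.fft.fft(y, axis=1)
C_yx = np.mean(Y * np.conj(X), axis=0) / M   # cross-spectrum estimate
C_xx = np.mean(np.abs(X) ** 2, axis=0) / M   # auto-spectrum estimate

H_hat = C_yx / C_xx
print(np.max(np.abs(H_hat - np.fft.fft(h, M))))  # small: V(n) averages out
```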
Example 7.6.6 A complex-valued process X(n) is comprised of a sum of K complex signals

   X(n) = Σ_{k=1}^{K} Ak e^{j(ω0 n + φk)}

where ω0 is a constant and Ak is a zero-mean random amplitude of the kth signal with variance
σk², k = 1, 2, . . . , K. The phase φk is uniformly distributed on [−π, π). The random variables
Ak and φk are statistically pairwise independent for all k = 1, . . . , K.

X(n) is the input to a linear, stable, time-invariant system with known impulse response
h(n). The output of the system is buried in zero-mean noise V(n) of known covariance cVV(n),
independent of X(n) (refer to Figure 7.14). We wish to determine the cross-spectrum CYX(e^{jω})
and the auto-spectrum CYY(e^{jω}).

We first calculate the covariance function of the input process X(n):

   E [X(n)] = E [ Σ_{k=1}^{K} Ak e^{j(ω0 n + φk)} ]

and, since Ak and φk are independent for all k = 1, . . . , K,

   E [X(n)] = Σ_{k=1}^{K} E [Ak] E [e^{j(ω0 n + φk)}] = 0

   cXX(n + κ, n) = E [X(n + κ)X(n)*]
                 = E [ Σ_{k=1}^{K} Σ_{l=1}^{K} Ak e^{j(ω0(n+κ) + φk)} Al e^{−j(ω0 n + φl)} ]
                 = Σ_{k=1}^{K} Σ_{l=1}^{K} E [Ak Al] e^{jω0 κ} E [e^{j(φk − φl)}]

Since E [e^{j(φk − φl)}] = 1 for k = l and 0 for k ≠ l,

   cXX(n + κ, n) = Σ_{k=1}^{K} E [Ak²] e^{jω0 κ} = e^{jω0 κ} Σ_{k=1}^{K} σk² = cXX(κ)

   CYX(e^{jω}) = H(e^{jω}) CXX(e^{jω}) = H(e^{jω}) F{cXX(κ)}
              = H(e^{jω}) (Σ_{k=1}^{K} σk²) 2π η(ω − ω0)
              = 2π H(e^{jω0}) (Σ_{k=1}^{K} σk²) η(ω − ω0),

since H(e^{jω}) is 2π-periodic and the delta functions in η(ω − ω0) are concentrated at ω = ω0
(mod 2π). Similarly,

   CYY(e^{jω}) = |H(e^{jω})|² CXX(e^{jω}) + CVV(e^{jω})
              = 2π |H(e^{jω0})|² (Σ_{k=1}^{K} σk²) η(ω − ω0) + CVV(e^{jω})
Chapter 8

The Finite Fourier Transform

Let X(0), . . . , X(N − 1) be a model for the observations x(0), . . . , x(N − 1) of a stationary random
process X(n), n = 0, ±1, ±2, . . . . The function

   XN(e^{jω}) = Σ_{n=0}^{N−1} X(n) e^{−jωn},   −∞ < ω < ∞

is called the finite Fourier transform.

Properties
The finite Fourier transform has some properties which are listed below.

1. XN (ej(ω+2π) ) = XN (ejω )

2. XN (e−jω ) = XN (ejω )∗ ∀ X(n) ∈ R

3. The finite Fourier transform of aX(n) + bY (n) is aXN (ejω ) + bYN (ejω ), where a and b are
constants.
4. If Y(n) = Σm h(m)X(n − m), Σn (1 + |n|)|h(n)| < ∞ and |X(n)| < M, then there exists
   a constant K > 0 such that |YN(e^{jω}) − H(e^{jω})XN(e^{jω})| < K ∀ ω.

5. X(n) = (1/2π) ∫_{−π}^{π} XN(e^{jω}) e^{jωn} dω for n = 0, 1, . . . , N − 1, and 0 otherwise.

The finite Fourier transform can be evaluated at the discrete frequencies ωk = 2πk/N:

   XN(e^{j2πk/N}) = Σ_{n=0}^{N−1} X(n) e^{−j2πkn/N},   k = 0, 1, . . . , N − 1,

which can be computed by means of the Fast Fourier Transform (FFT).
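
A quick added sketch (assuming NumPy) of evaluating the finite Fourier transform at ωk = 2πk/N via the FFT, checking one bin against the defining sum:

```python
import numpy as np

rng = np.random.default_rng(9)
N = 1024
x = rng.standard_normal(N)                   # observations X(0),...,X(N-1)

X_fft = np.fft.fft(x)                        # X_N(e^{j 2 pi k / N}), k = 0,...,N-1

k = 17                                       # check one frequency against the definition
n = np.arange(N)
direct = np.sum(x * np.exp(-2j * np.pi * k * n / N))
print(np.allclose(X_fft[k], direct))         # True
```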

8.1 Statistical Properties of the Finite Fourier Transform


8.1.1 Preliminaries
Real normal distribution: X = (X(0), . . . , X(N − 1))^T is said to be normally distributed
(X ∼ NN(µ, Σ)) if

   fX(x) = (2π)^{−N/2} |Σ|^{−1/2} e^{−(1/2)(x − µ)^T Σ^{−1} (x − µ)}

where µ = E [(X(0), . . . , X(N − 1))^T], Σ = (Cov[Xi, Xk])_{i,k=0,...,N−1} = E [(X − µ)(X − µ)^T],
and |Σ| > 0.
Complex normal distribution: A vector X is called complex normally distributed with mean µ
and covariance Σ (X ∼ NN^C(µ, Σ)) if the 2N-vector of real-valued random variables
(ℜX^T, ℑX^T)^T is a normal vector with mean (ℜµ^T, ℑµ^T)^T and covariance matrix

   (1/2) [ ℜΣ  −ℑΣ ]
         [ ℑΣ   ℜΣ ]

One should note that Σ is Hermitian, so (ℑΣ)^T = −ℑΣ, and

   E [(X − µ)(X − µ)^T] = 0

If Σ is nonsingular, then the probability density function is

   fX(x) = π^{−N} |Σ|^{−1} e^{−(x − µ)^H Σ^{−1} (x − µ)}

For N = 1, Σ = E [(X − µ)(X − µ)*] is real and identical to σX². Then Cov[ℜX, ℑX] = 0, i.e.
ℜX and ℑX are independent, N1(ℜµ, σX²/2) and N1(ℑµ, σX²/2) distributed respectively.

Asymptotic Normal Distribution

We say that a sequence X(n), n = 1, 2, . . . of N-dimensional random vectors is asymptotically
NN^C(µ(n), Σ(n)) distributed if the distribution function of [G(n)]^{−1} (X(n) − µ(n)) converges to
the distribution NN^C(0, I) as n → ∞. Herein G(n) is a matrix with G(n)[G(n)]^H = Σ(n).
   In the following, we assume that X(n) is real and stationary with finite moments of all
orders and absolutely summable cumulants, that is, the statistical dependence between X(n)
and X(m) decreases sufficiently fast as |n − m| increases; this is the case, for example, if X(n) is
a linear process. Then we have the following properties for the finite Fourier transform.

1. (a) For large N, XN(e^{jω}) is distributed as

       XN(e^{jω}) ∼ { N1^C(0, N CXX(e^{jω})),   ω ≠ kπ
                    { N1(0, N CXX(e^{jπ})),     ω = (2k + 1)π
                    { N1(N µX, N CXX(e^{j0})),  ω = 2kπ

    where k is an integer.

   (b) For pairwise distinct frequencies ωi, i = 1, . . . , M with 0 ≤ ωi ≤ π, the XN(e^{jωi}) are
    asymptotically independent variables. This is also valid when the ωi are approximated
    by 2πki/N → ωi as N → ∞.
    This states that the discrete Fourier transform provides N independent variables at
    the frequencies ωi which are normally distributed with mean 0 or N µX and variance
    N CXX(e^{jωi}).
2. If we use a window w(t), t ∈ R, with w(t) = 0 for t < 0 and t > 1, that is bounded and
   satisfies ∫_0^1 w(t)² dt < ∞, then for ωi, i = 0, . . . , N − 1,

      XW,N(e^{jωi}) = Σ_{n=0}^{N−1} w(n/N) X(n) e^{−jωi n}

   are asymptotically independently distributed as

      XW,N(e^{jωi}) ∼ { N1^C(0, N ∫_0^1 w(t)² dt · CXX(e^{jωi})),                  ωi ≠ kπ
                      { N1(0, N ∫_0^1 w(t)² dt · CXX(e^{jπ})),                    ωi = (2k + 1)π
                      { N1(N µX ∫_0^1 w(t) dt, N ∫_0^1 w(t)² dt · CXX(e^{j0})),   ωi = 2kπ

   The above is also valid for ωi ≃ 2πki/N.

   This states that for large N, the Fourier transforms of windowed stationary processes are
   distributed similarly to those of non-windowed processes, except that the variance is
   multiplied by the factor ∫_0^1 w(t)² dt and the mean at ωi = 2kπ is multiplied by the
   factor ∫_0^1 w(t) dt.

3. If N = ML and X(0), . . . , X(N − 1) are divided into L segments, each of length M, then

      XW,M(e^{jω}, l) = Σ_{m=0}^{M−1} w(m/M) X(lM + m) e^{−jω(lM + m)}

   for l = 0, . . . , L − 1 are asymptotically independently distributed as

      XW,M(e^{jωi}, l) ∼ { N1^C(0, M CXX(e^{jωi}) ∫_0^1 w(t)² dt),                  ωi ≠ kπ
                         { N1(0, M CXX(e^{jπ}) ∫_0^1 w(t)² dt),                    ωi = (2k + 1)π
                         { N1(M µX ∫_0^1 w(t) dt, M CXX(e^{j0}) ∫_0^1 w(t)² dt),   ωi = 2kπ

   This states that the finite Fourier transforms of data segments of size M are, at each ωi,
   independently normally distributed random variables with variance M · CXX(e^{jωi}) ∫_0^1 w(t)² dt.

4. If X(n) is Gaussian, then the distributions are exact.

5. Under stronger conditions with respect to the cumulants, one can show that the Fourier
   transform has a rate of growth at most of order √(N log N). If X(n) is bounded by a
   constant, M say, then

      sup_ω |XN(e^{jω})| ≤ M N

   giving a growth rate of order N.


P P∞
6. If Y (n) = ∞k=−∞ h(k)X(n − k), E [X(n)] = 0 and n=−∞ (1 + |n|)|h(n)| < ∞,
then with probability 1
p 
YN (ejω ) = H(ejω )XN (ejω ) + O log N , (8.1)
r !
jω jω jω log N
YW,N (e ) = H(e )XW,N (e ) + O (8.2)
N

as N → ∞
Interpretation

To 1.(a): Take a broad-band noise with a normally distributed amplitude and pass it through an
ideal narrow bandpass filter,

   H(e^{jω}) = { 1, |ω ± ω0| < π/N
              { 0, else.

Figure 8.1: Transfer function of an ideal bandpass filter of width 2π/N around ±ω0.

If X(n) is filtered, then Y(n) = Σm h(m)X(n − m). For large N,

   Y(n) ≈ (1/N) XN(e^{j2πs/N}) e^{j2πsn/N} + (1/N) XN(e^{−j2πs/N}) e^{−j2πsn/N}

where 2πs/N ≈ ω0; also, Y(n) is approximately N1(0, (1/N) CXX(e^{jω0})) distributed.
To 1.(b): If X(n) is passed through two ideal bandpass filters H0(e^{jω}) and H1(e^{jω}) with
non-overlapping frequency ranges around ω0 and ω1, producing the outputs Y(n) and U(n)
respectively, then Y(n) and U(n) are approximately independent.
Chapter 9

Spectrum Estimation

9.1 Use of bandpass filters

X(n) is a stationary stochastic process with covariance function cXX(n) and spectrum CXX(e^{jω}).
Assume E [X(n)] = 0. If we transform X(n) by

   Y(n) = X(n)²

then

   E [Y(n)] = E [X(n)²] = cXX(0) = ∫_{−π}^{π} CXX(e^{jω}) dω/2π

This motivates the following construction to estimate the spectrum (see Figure 9.1).

The bandpass filter has the transfer function

   HB(e^{jω}) = { HB⁰, |ω ± ω0| ≤ B/2, for |ω| ≤ π
               { 0,   else,

where HB⁰ will be specified later and ω0 is the center frequency of the filter (see Figure 9.2).
The low-pass filter is an FIR filter with impulse response

   hl(n) = { 1/N, n = 0, 1, 2, . . . , N − 1
           { 0,   otherwise,

Figure 9.1: Spectrum estimation by band-pass filtering: X(n) → band-pass filter → V(n) →
squarer → U(n) = V(n)² → low-pass filter → Y(n).

Figure 9.2: Transfer function of an ideal band-pass filter: gain HB⁰ for ω0 − B/2 ≤ |ω| ≤ ω0 + B/2.

The low-pass filter is used to calculate the mean. We have

   HL(e^{jω}) = (sin(ωN/2)/(N sin(ω/2))) e^{−jω(N−1)/2}

The spectrum of V(n) is given by

   CVV(e^{jω}) = |HB(e^{jω})|² CXX(e^{jω})

The output process is given by

   Y(n) = (1/N) Σ_{m=0}^{N−1} U(n − m) = (1/N) Σ_{m=0}^{N−1} V(n − m)²

The mean of Y(n) is given by

   µY = E [Y(n)] = (1/N) Σ_{m=0}^{N−1} E [V(n − m)²] = cVV(0)
      = ∫_{−π}^{π} CVV(e^{jω}) dω/2π
      = ∫_{−π}^{π} |HB(e^{jω})|² CXX(e^{jω}) dω/2π
      = 2 (HB⁰)² ∫_{ω0−B/2}^{ω0+B/2} CXX(e^{jω}) dω/2π
      ≈ (HB⁰)² CXX(e^{jω0}) B/π

If we choose HB⁰ = 1, µY = E [Y(n)] measures the power of X(n) in the frequency band
|ω − ω0| < B/2. The spectral value CXX(e^{jω0}) will be measured if HB⁰ = √(π/B), provided
B ≪ ω0 and CXX(e^{jω}) is continuous at ω = ω0.


Under the assumption that X(n) is Gaussian, one can show that

   σY² ≈ (2π/(BN)) CXX(e^{jω0})².
To measure CXX(e^{jω0}), we choose HB⁰ so that µY = CXX(e^{jω0}). However, the power of the
output process is then proportional to CXX(e^{jω0})².

To measure CXX(e^{jω}) in a whole frequency band, we would need a filter bank with a cascade of
squarers and integrators. In acoustics, one uses such a circuit, where the center frequencies are
taken to be ωk+1 = 2^{1/3} ωk, k = 1, . . . , K − 1 (third-octave bands). This is the principle of a
spectral analyser, especially when it is used for analysing analog signals. Such a spectral analyser
yields the spectrum at the frequencies ω1, . . . , ωK simultaneously. In measurement technology,
one uses simpler systems that work with a band-pass filter having a variable center frequency.
The output signal of such a system gives an estimate of the spectrum in a certain frequency band.
The accuracy of the measurement depends on how slowly the center frequency is changed and on
the smoothness of the spectrum of X(n).
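
A sketch of this measurement principle follows (an added illustration; the ideal band-pass filtering is done in the frequency domain for convenience, and white noise is used as the test input so that the true spectrum is known to be constant):

```python
import numpy as np

def bandpass_power(x, w0, B):
    """Estimate C_XX(e^{j w0}) with the scheme of Figure 9.1:
    ideal band-pass (gain H_B^0 = sqrt(pi/B)), squarer, averager."""
    N = len(x)
    w = 2 * np.pi * np.fft.fftfreq(N)                 # frequencies in [-pi, pi)
    passband = np.minimum(np.abs(w - w0), np.abs(w + w0)) <= B / 2
    H = np.where(passband, np.sqrt(np.pi / B), 0.0)
    v = np.real(np.fft.ifft(H * np.fft.fft(x)))       # band-pass output V(n)
    return np.mean(v ** 2)                            # low-pass of V(n)^2

rng = np.random.default_rng(10)
x = rng.standard_normal(200_000)                      # white noise, C_XX = 1
print(bandpass_power(x, w0=1.0, B=0.2))               # ~ 1
```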
Chapter 10

Non-Parametric Spectrum
Estimation

10.1 Consistency of Spectral Estimates


An estimate ĈXX(e^{jω}) of the spectrum CXX(e^{jω}) is a random variable. If we could design an
ideal estimator for the spectrum CXX(e^{jω}), the pdf of ĈXX(e^{jω}) would be a delta function at
CXX(e^{jω}), i.e. the mean E [ĈXX(e^{jω})] would be CXX(e^{jω}) and the variance Var[ĈXX(e^{jω})]
would be zero.

Because it is not possible to create an ideal estimator, we try to reach this goal asymptotically
as the number N of observed samples X(n), n = 0, 1, . . . , N − 1 tends to infinity. We require that
ĈXX(e^{jω}) be a consistent estimator for CXX(e^{jω}), i.e., ĈXX(e^{jω}) → CXX(e^{jω}) in probability
as N → ∞. A sufficient condition for consistency is that the mean square error (MSE) of
ĈXX(e^{jω}) converges to zero as N → ∞ (see Section 7.6.1), i.e.,

   MSE(ĈXX(e^{jω})) = E [(ĈXX(e^{jω}) − CXX(e^{jω}))²] → 0   as N → ∞ .

Equivalently, we may say that a sufficient condition for ĈXX(e^{jω}) to be consistent is that

   Var[ĈXX(e^{jω})] → 0   as N → ∞

and

   Bias(ĈXX(e^{jω})) = E [ĈXX(e^{jω})] − CXX(e^{jω}) → 0   as N → ∞

because

   MSE(ĈXX(e^{jω})) = E [(ĈXX(e^{jω}) − CXX(e^{jω}))²]
      = E [ĈXX(e^{jω})²] − 2E [ĈXX(e^{jω})] CXX(e^{jω}) + CXX(e^{jω})²
        − (E [ĈXX(e^{jω})])² + (E [ĈXX(e^{jω})])²
      = E [ĈXX(e^{jω})²] − (E [ĈXX(e^{jω})])² + (E [ĈXX(e^{jω})] − CXX(e^{jω}))²
      = Var[ĈXX(e^{jω})] + [Bias(ĈXX(e^{jω}))]² .

If E [ĈXX(e^{jω})] ≠ CXX(e^{jω}), the estimator has a bias which results in a systematic error of the
method. On the other hand, if Var[ĈXX(e^{jω})] ≠ 0, we get a random error in the estimator, i.e.,
the estimator varies from data set to data set.

10.2 The Periodogram


Suppose $X(n)$, $n = 0, \pm 1, \dots$ is stationary with mean $\mu_X$ and covariance function $c_{XX}(k) = \mathrm{Cov}[X(n+k), X(n)]$, such that the spectrum $C_{XX}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} c_{XX}(k)\, e^{-j\omega k}$, $-\pi < \omega \le \pi$, is uniformly continuous and bounded.

As seen in Chapter 8, the finite Fourier transform satisfies the following:
$$X_N(e^{j\omega}) \sim \begin{cases} \mathcal{N}_1^C\big(0,\, N C_{XX}(e^{j\omega})\big)\,, & \omega \neq k\pi \\ \mathcal{N}_1\big(0,\, N C_{XX}(e^{j\pi})\big)\,, & \omega = (2k+1)\pi \\ \mathcal{N}_1\big(N\mu_X,\, N C_{XX}(e^{j0})\big)\,, & \omega = 2k\pi\,, \end{cases}$$
which suggests¹ the so-called periodogram $I^N_{XX}(e^{j\omega})$ as an estimator for the spectrum $C_{XX}(e^{j\omega})$, i.e.,
$$I^N_{XX}(e^{j\omega}) = \frac{1}{N}\,|X_N(e^{j\omega})|^2 = \frac{1}{N}\,\Bigg|\sum_{n=0}^{N-1} X(n)\, e^{-j\omega n}\Bigg|^2\,. \tag{10.1}$$

Note that $I^N_{XX}(e^{j\omega})$ has the same symmetry, non-negativity and periodicity properties as $C_{XX}(e^{j\omega})$.
The periodogram was introduced by Schuster (1898) as a tool for the identification of hidden periodicities. Indeed, in the case of
$$X(n) = \sum_{j=1}^{J} a_j \cos(\omega_j n + \phi_j)\,, \qquad n = 0, \dots, N-1\,,$$
$I^N_{XX}(e^{j\omega})$ has peaks at the frequencies $\omega = \pm\omega_j \pmod{2\pi}$, $j = 1, \dots, J$, provided $N$ is large.
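To illustrate Schuster's idea numerically (a sketch; the amplitudes, frequencies, phases and noise level are arbitrary choices), the periodogram (10.1) can be computed with the FFT:

```python
import numpy as np

rng = np.random.default_rng(1)
N = 1024
n = np.arange(N)
# Two hidden periodicities in additive white noise
x = 2.0 * np.cos(0.4 * np.pi * n + 0.7) + 1.5 * np.cos(0.75 * np.pi * n) \
    + rng.standard_normal(N)

# Periodogram (10.1) evaluated at the Fourier frequencies 2*pi*k/N
I = np.abs(np.fft.fft(x))**2 / N
omega = 2 * np.pi * np.arange(N) / N

# The two largest ordinates on (0, pi) sit at the sinusoid frequencies
half = slice(1, N // 2)
peaks = omega[half][np.argsort(I[half])[-2:]]
print("peak frequencies / pi:", np.sort(peaks) / np.pi)   # ~ [0.4, 0.75]
```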

10.2.1 Mean of the Periodogram

The mean of the periodogram can be calculated as follows:

¹From the asymptotic distribution of $X_N(e^{j\omega})$ we note that $\mathrm{Var}[X_N(e^{j\omega})] = N\, C_{XX}(e^{j\omega})$. Thus
$$\frac{1}{N}\,\mathrm{Var}\big[X_N(e^{j\omega})\big] = \frac{1}{N}\, E\big[|X_N(e^{j\omega})|^2\big] = E\Big[\tfrac{1}{N}\,|X_N(e^{j\omega})|^2\Big]\,.$$

$$\begin{aligned}
E\big[I^N_{XX}(e^{j\omega})\big] &= \frac{1}{N}\cdot E\big[X_N(e^{j\omega})\, X_N(e^{j\omega})^*\big] \qquad (10.2) \\
&= \frac{1}{N}\cdot E\Bigg[\sum_{n=0}^{N-1} X(n)\, e^{-j\omega n}\sum_{m=0}^{N-1} X(m)^*\, e^{j\omega m}\Bigg] \\
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} E\big[X(n)X(m)^*\big]\, e^{-j\omega(n-m)} \\
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} \big(c_{XX}(n-m) + \mu_X^2\big)\, e^{-j\omega(n-m)} \\
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} c_{XX}(n-m)\, e^{-j\omega(n-m)} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2 \\
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} e^{-j\omega(n-m)} \int_{-\pi}^{\pi} C_{XX}(e^{j\lambda})\, e^{j\lambda(n-m)}\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2 \\
&= \frac{1}{N}\int_{-\pi}^{\pi} |\Delta^N(e^{j(\omega-\lambda)})|^2\, C_{XX}(e^{j\lambda})\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2\,,
\end{aligned}$$
where
$$\Delta^N(e^{j\omega}) = \sum_{n=0}^{N-1} e^{-j\omega n} = e^{-j\omega\frac{N-1}{2}}\cdot\frac{\sin(\omega N/2)}{\sin(\omega/2)}$$
is the Fourier transform of a rectangular window. Thus, for a zero-mean stationary process $X(n)$, the expected value of the periodogram, $E[I^N_{XX}(e^{j\omega})]$, is equal to the spectrum $C_{XX}(e^{j\omega})$ convolved with the magnitude squared of the Fourier transform of the rectangular window, $|\Delta^N(e^{j\omega})|^2$. This suggests that $\lim_{N\to\infty} E[I^N_{XX}(e^{j\omega})] = C_{XX}(e^{j\omega}) + \mu_X^2 \cdot 2\pi\sum_k \delta(\omega + 2\pi k)$, because $\lim_{N\to\infty} \frac{1}{N}|\Delta^N(e^{j\omega})|^2 = 2\pi\sum_k \delta(\omega + 2\pi k)$, as illustrated in Fig. 10.1.

Therefore the periodogram estimate is asymptotically unbiased for $\omega \neq 2\pi k$, $k \in \mathbb{Z}$. We may obtain an asymptotically unbiased estimator for $C_{XX}(e^{j\omega})$ for all $\omega$ by subtracting the sample mean $\bar{X}$ of the data before computing the periodogram, i.e., $I^N_{X-\bar{X},X-\bar{X}}(e^{j\omega}) = \frac{1}{N}\big|\sum_{n=0}^{N-1}\big(X(n) - \bar{X}\big)e^{-j\omega n}\big|^2$ is asymptotically unbiased.

In (10.2) we calculated the mean of the periodogram of a rectangularly windowed data sequence $X(n)$, $n = 0, \dots, N-1$. The advantages of using different window types are stated in the filter design part (Chapter 4). We now consider periodograms of sequences $X(n)$, $n = 0, \dots, N-1$, multiplied by a real-valued window (also called a taper) $w_N(n)$ of length $N$, having the Fourier transform $W_N(e^{j\omega})$. Then,
$$\begin{aligned}
E\big[I^N_{WX,WX}(e^{j\omega})\big] &= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} w_N(n)\, w_N(m)\, E\big[X(n)X(m)^*\big]\, e^{-j\omega(n-m)} \qquad (10.3) \\
&= \frac{1}{N}\sum_{n=0}^{N-1}\sum_{m=0}^{N-1} w_N(n)\, w_N(m)\, c_{XX}(n-m)\, e^{-j\omega(n-m)} + \frac{1}{N}\,|W_N(e^{j\omega})|^2\,\mu_X^2 \\
&= \frac{1}{N}\int_{-\pi}^{\pi} |W_N(e^{j(\omega-\lambda)})|^2\, C_{XX}(e^{j\lambda})\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|W_N(e^{j\omega})|^2\,\mu_X^2\,.
\end{aligned}$$

Figure 10.1: The window function $\frac{1}{N}|\Delta^N(e^{j\omega})|^2$ for $N = 20$, $N = 10$ and $N = 5$.

Comparing this result with the one for rectangularly windowed data, we observe the following: if $C_{XX}(e^{j\alpha})$ has a significant peak at a frequency $\alpha$ in the neighborhood of $\omega$, the expected values of $I^N_{XX}(e^{j\omega})$ and $I^N_{WX,WX}(e^{j\omega})$ can differ substantially from $C_{XX}(e^{j\omega})$. The advantage of employing a taper is now apparent: it can be considered a means of reducing the effect of neighboring peaks.
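The leakage-reduction effect of a taper can be checked numerically. The following sketch uses a Hann window as the taper and an arbitrarily chosen two-tone signal (both are assumptions for illustration only):

```python
import numpy as np

N = 512
n = np.arange(N)
# A strong tone next to a weak one: leakage from the strong peak can mask the weak one
x = 1.0 * np.cos(0.30 * np.pi * n) + 0.01 * np.cos(0.38 * np.pi * n)

w = np.hanning(N)                             # taper w_N(n)
I_rect = np.abs(np.fft.fft(x))**2 / N         # rectangularly windowed periodogram
I_taper = np.abs(np.fft.fft(w * x))**2 / N    # tapered periodogram I_WX,WX

k_weak = round(0.38 * np.pi * N / (2 * np.pi))   # bin nearest the weak tone
k_off = round(0.45 * np.pi * N / (2 * np.pi))    # bin away from both tones
for name, I in [("rect", I_rect), ("hann", I_taper)]:
    print(name, "weak-tone bin:", 10 * np.log10(I[k_weak]),
          "dB, leakage floor:", 10 * np.log10(I[k_off]), "dB")
# With the rectangular window the leakage floor is comparable to the weak tone;
# with the taper the weak tone stands tens of dB above the floor.
```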

10.2.2 Variance of the Periodogram

To find the variance of the periodogram, we consider the covariance function of the periodogram. For simplicity, let us assume that $X(n)$ is a real white Gaussian process with variance $\sigma^2$ and absolutely summable covariance function $c_{XX}(n)$, and that $\omega, \lambda \neq 2k\pi$ with $k \in \mathbb{Z}$. Then,
$$\begin{aligned}
\mathrm{Cov}\big[I^N_{XX}(e^{j\omega}),\, I^N_{XX}(e^{j\lambda})\big] &= E\big[I^N_{XX}(e^{j\omega})\, I^N_{XX}(e^{j\lambda})\big] - E\big[I^N_{XX}(e^{j\omega})\big]\, E\big[I^N_{XX}(e^{j\lambda})\big] \\
&= \frac{1}{N^2}\Big(|\Delta^N(e^{j(\omega+\lambda)})|^2 + |\Delta^N(e^{j(\omega-\lambda)})|^2\Big)\,\sigma^4\,.
\end{aligned}$$
This result follows from the Gaussian moment-factoring theorem, which can be proven using the characteristic function:
$$E[X_1 X_2 X_3 X_4] = E[X_1 X_2]\, E[X_3 X_4] + E[X_1 X_3]\, E[X_2 X_4] + E[X_1 X_4]\, E[X_2 X_3]\,.$$



If the process $X(n)$ is non-white and non-Gaussian, the covariance function of the periodogram can be found in [Brillinger 1981] as
$$\mathrm{Cov}\big[I^N_{XX}(e^{j\omega}),\, I^N_{XX}(e^{j\lambda})\big] \approx C_{XX}(e^{j\omega})^2 \cdot \frac{1}{N^2}\Big(|\Delta^N(e^{j(\omega+\lambda)})|^2 + |\Delta^N(e^{j(\omega-\lambda)})|^2\Big)\,, \tag{10.4}$$
which for $\lambda = \omega$ reduces to the variance of the periodogram,
$$\mathrm{Var}\big[I^N_{XX}(e^{j\omega})\big] \approx C_{XX}(e^{j\omega})^2 \cdot \frac{1}{N^2}\Big(|\Delta^N(e^{j2\omega})|^2 + N^2\Big)\,.$$
This suggests that no matter how large $N$ is, the variance of $I^N_{XX}(e^{j\omega})$ will tend to remain at the level $C_{XX}(e^{j\omega})^2$. Because the variance does not tend to zero as $N \to \infty$, according to Section 10.1 the periodogram is not a consistent estimator. An estimator with a smaller variance cannot be obtained by simply increasing the sample length $N$.

The properties of the finite Fourier transform suggest that periodogram ordinates are asymptotically (for large $N$) independent random variables, which explains the irregularity of the periodogram in Figure 10.2.

Figure 10.2: The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted line). The abscissa displays the frequency $\frac{\omega}{2\pi}$ while the ordinate shows the estimate.

From the asymptotic distribution of the finite Fourier transform, we can deduce the asymptotic distribution of the periodogram ordinates as
$$I^N_{XX}(e^{j\omega}) \sim \begin{cases} \frac{C_{XX}(e^{j\omega})}{2}\,\chi^2_2\,, & \omega \neq \pi k \\ C_{XX}(e^{j\omega})\,\chi^2_1\,, & \omega = \pi k \end{cases} \qquad k \in \mathbb{Z}\,.$$
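A quick numerical check of this inconsistency (a sketch assuming white Gaussian noise, for which $C_{XX}(e^{j\omega}) = 1$, so the target value is known): the spread of the periodogram ordinates does not shrink as $N$ grows.

```python
import numpy as np

rng = np.random.default_rng(3)
for N in (128, 1024, 8192):
    x = rng.standard_normal(N)                # white noise, true spectrum = 1
    I = np.abs(np.fft.fft(x))**2 / N          # periodogram at the Fourier frequencies
    inner = I[1:N // 2]                       # avoid omega = 0 and omega = pi
    # The mean is ~1 (asymptotic unbiasedness) but the variance stays ~1 for every N
    print(f"N={N:5d}  mean={inner.mean():.3f}  var={inner.var():.3f}")
```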

10.3 Averaging Periodograms - Bartlett Estimator


One way to reduce the variance of an estimator is to average estimators obtained from independent measurements.

If we assume that the covariance function $c_{XX}(k)$ of the process $X(n)$ is finite (or can be viewed as finite with a small error) and that the length of $c_{XX}(k)$ is much smaller than the length $N$ of the data stretch, then we can divide the process $X(n)$ of length $N$ into $L$ parts of length $M$ each, where $M \ll N$. With this assumption we can calculate $L$ nearly independent periodograms $I^M_{XX}(e^{j\omega}, l)$, $l = 0, \dots, L-1$, of the sequences
$$X_l(n) = X(n + l \cdot M)\,, \qquad n = 0, \dots, M-1\,.$$

The periodograms are written as
$$I^M_{XX}(e^{j\omega}, l) = \frac{1}{M}\,\Bigg|\sum_{n=0}^{M-1} X_l(n)\, e^{-j\omega n}\Bigg|^2 \tag{10.5}$$
and the estimator as
$$\hat{C}^B_{XX}(e^{j\omega}) = \frac{1}{L}\sum_{l=0}^{L-1} I^M_{XX}(e^{j\omega}, l)\,.$$

This method of estimating the spectral density function $C_{XX}(e^{j\omega})$ is called the Bartlett method, after its inventor Bartlett (1953).

The mean of $\hat{C}^B_{XX}(e^{j\omega})$ is easily obtained as
$$\begin{aligned}
E\big[\hat{C}^B_{XX}(e^{j\omega})\big] &= E\Bigg[\frac{1}{L}\sum_{l=0}^{L-1} I^M_{XX}(e^{j\omega}, l)\Bigg] \\
&= \frac{1}{L}\sum_{l=0}^{L-1} E\big[I^M_{XX}(e^{j\omega}, l)\big] \\
&= E\big[I^M_{XX}(e^{j\omega})\big]\,,
\end{aligned}$$
which is the same as the mean of the periodogram of the process $X(n)$ for $n = 0, \dots, M-1$. From Section 10.2.1 it can be seen that Bartlett's estimator is an asymptotically unbiased estimator of the spectrum as $M \to \infty$, because the periodogram is asymptotically unbiased.

To assess the performance of the estimator, we also have to derive its variance.

Figure 10.3: The periodogram as an estimate of the spectrum using 4 non-overlapping segments. The abscissa displays the frequency $\frac{\omega}{2\pi}$ while the ordinate shows the estimate.

The variance of Bartlett's estimator can be calculated, using the independence of the segments, as
$$\begin{aligned}
\mathrm{Var}\big[\hat{C}^B_{XX}(e^{j\omega})\big] &= E\Bigg[\frac{1}{L^2}\sum_{k=0}^{L-1}\sum_{m=0}^{L-1} I^M_{XX}(e^{j\omega}, k)\, I^M_{XX}(e^{j\omega}, m)\Bigg] - \Bigg(E\Bigg[\frac{1}{L}\sum_{l=0}^{L-1} I^M_{XX}(e^{j\omega}, l)\Bigg]\Bigg)^2 \\
&= \frac{1}{L^2}\sum_{p=0}^{L-1} E\big[I^M_{XX}(e^{j\omega}, p)^2\big] + \frac{1}{L^2}\sum_{k=0}^{L-1}\sum_{\substack{m=0 \\ m\neq k}}^{L-1} E\big[I^M_{XX}(e^{j\omega}, k)\big]\, E\big[I^M_{XX}(e^{j\omega}, m)\big] - E\big[I^M_{XX}(e^{j\omega})\big]^2 \\
&= \frac{1}{L^2}\sum_{p=0}^{L-1} E\big[I^M_{XX}(e^{j\omega}, p)^2\big] + \frac{L(L-1)}{L^2}\, E\big[I^M_{XX}(e^{j\omega})\big]^2 - E\big[I^M_{XX}(e^{j\omega})\big]^2 \\
&= \frac{1}{L}\Big(E\big[I^M_{XX}(e^{j\omega})^2\big] - E\big[I^M_{XX}(e^{j\omega})\big]^2\Big) \\
&= \frac{1}{L}\,\mathrm{Var}\big[I^M_{XX}(e^{j\omega})\big]\,. \tag{10.6}
\end{aligned}$$
Note that we can proceed similarly with a windowed periodogram. We would then consider
$$\hat{C}^B_{WX,WX}(e^{j\omega}) = \frac{1}{L}\sum_{l=0}^{L-1} I^M_{WX,WX}(e^{j\omega}, l)\,,$$
which has properties similar to the above.
The variance of the Bartlett estimator is thus the variance of the periodogram divided by the number of data segments $L$, so the variance of the estimator $\hat{C}^B_{XX}(e^{j\omega})$ decreases with increasing $L$.

With this result and according to the distribution of the periodogram, Bartlett's estimator is asymptotically distributed as
$$\hat{C}^B_{XX}(e^{j\omega}) \sim \begin{cases} \frac{C_{XX}(e^{j\omega})}{2L}\,\chi^2_{2L}\,, & \omega \neq \pi k \\ \frac{C_{XX}(e^{j\omega})}{L}\,\chi^2_{L}\,, & \omega = \pi k \end{cases} \qquad k \in \mathbb{Z}\,.$$
This is due to the fact that the sum of $L$ independent $\chi^2_2$ distributed variables is $\chi^2_{2L}$ distributed.
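The Bartlett estimator is a few lines of code. In the sketch below (white noise and the segment counts are arbitrary illustrative choices), the variance of the estimate visibly decreases roughly as $1/L$:

```python
import numpy as np

def bartlett_psd(x, L):
    """Average the periodograms of L non-overlapping segments of x."""
    M = len(x) // L
    segments = x[:L * M].reshape(L, M)
    I = np.abs(np.fft.fft(segments, axis=1))**2 / M   # I_XX^M(e^{jw}, l)
    return I.mean(axis=0)                             # C^B_XX at 2*pi*k/M

rng = np.random.default_rng(4)
x = rng.standard_normal(4096)            # white noise, true spectrum = 1
for L in (1, 4, 16):
    C = bartlett_psd(x, L)
    inner = C[1:len(C) // 2]
    print(f"L={L:2d}  var={inner.var():.3f}")   # variance shrinks roughly as 1/L
```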

10.4 Windowing Periodograms - Welch Estimator


Another method of averaging periodograms was developed by Welch in 1967. Welch derived a modified version of the Bartlett method, called Welch's method, which allows the data segments to overlap. Additionally, the segments are multiplied by a window function, which affects the bias of the estimator. For each periodogram $l$, $l = 0, \dots, L-1$, the data sequence is now
$$X_l(n) = X(n + l \cdot D)\,,$$
where $l \cdot D$ is the starting point of the $l$-th sequence. Accordingly, the periodograms are defined similarly to Equation (10.5), except for a multiplication by the window $w_M(n)$, i.e.,
$$I^M_{WX,WX}(e^{j\omega}, l) = \frac{1}{M}\,\Bigg|\sum_{n=0}^{M-1} X_l(n)\, w_M(n)\, e^{-j\omega n}\Bigg|^2\,.$$

Welch defined his estimator as
$$\hat{C}^W_{XX}(e^{j\omega}) = \frac{1}{L \cdot A}\sum_{l=0}^{L-1} I^M_{WX,WX}(e^{j\omega}, l)$$
with a normalization factor $A$ chosen to obtain an asymptotically unbiased estimate. If we set $A = 1$, take $w_M(n)$ as a rectangular window of length $M$ and let $D = M$, the Welch method is equivalent to the Bartlett method. Thus, the Bartlett method is a special case of the Welch method.

To calculate the normalization factor $A$, we have to derive the mean of the estimator:
$$\begin{aligned}
E\big[\hat{C}^W_{XX}(e^{j\omega})\big] &= E\Bigg[\frac{1}{L \cdot A}\sum_{l=0}^{L-1} I^M_{WX,WX}(e^{j\omega}, l)\Bigg] \\
&= \frac{1}{L \cdot A}\sum_{l=0}^{L-1} E\big[I^M_{WX,WX}(e^{j\omega}, l)\big] \\
&= \frac{1}{A}\, E\big[I^M_{WX,WX}(e^{j\omega})\big]\,.
\end{aligned}$$
From (10.3) we get
$$E\big[\hat{C}^W_{XX}(e^{j\omega})\big] = \frac{1}{A}\cdot\frac{1}{M}\Bigg[\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_M(e^{j(\omega-\lambda)})|^2\, C_{XX}(e^{j\lambda})\, d\lambda + \mu_X^2 \cdot |W_M(e^{j\omega})|^2\Bigg]\,.$$

If we assume that $M$ is large, the energy of $W_M(e^{j\omega})$ is concentrated around $\omega = 0$. Thus, we can state that approximately
$$E\big[\hat{C}^W_{XX}(e^{j\omega})\big] \approx \frac{1}{A}\cdot\frac{1}{M}\Bigg[C_{XX}(e^{j\omega})\,\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_M(e^{j\lambda})|^2\, d\lambda + \mu_X^2 \cdot |W_M(e^{j\omega})|^2\Bigg]\,.$$

Using Parseval's theorem,
$$\frac{1}{2\pi}\int_{-\pi}^{\pi} |W_M(e^{j\lambda})|^2\, d\lambda = \sum_{n=0}^{M-1} |w_M(n)|^2\,,$$
we rewrite the above as
$$E\big[\hat{C}^W_{XX}(e^{j\omega})\big] \approx \frac{1}{A}\cdot\frac{1}{M}\Bigg[C_{XX}(e^{j\omega})\sum_{n=0}^{M-1} |w_M(n)|^2 + \mu_X^2 \cdot |W_M(e^{j\omega})|^2\Bigg]\,.$$

To get an asymptotically unbiased estimator for $\omega \neq 2k\pi$, $k \in \mathbb{Z}$, $A$ has to be chosen as
$$A = \frac{1}{M}\sum_{n=0}^{M-1} |w_M(n)|^2\,.$$

With this choice and $M \to \infty$, the Welch estimator is asymptotically unbiased for $\omega \neq 2k\pi$.

For $M \to \infty$, the variance of the Welch method for non-overlapping data segments with $D = M$ is asymptotically the same as for the Bartlett method (unwindowed data):
$$\mathrm{Var}\big[\hat{C}^W_{XX}(e^{j\omega})\big]\Big|_{M\to\infty,\, D=M} \approx \mathrm{Var}\big[\hat{C}^B_{XX}(e^{j\omega})\big] \approx \frac{1}{L}\, C_{XX}(e^{j\omega})^2\,.$$
In the case of 50% overlap ($D = \frac{M}{2}$), Welch calculated the asymptotic variance of the estimator to be
$$\mathrm{Var}\big[\hat{C}^W_{XX}(e^{j\omega})\big]\Big|_{M\to\infty,\, D=\frac{M}{2}} \approx \frac{9}{8}\cdot\frac{1}{L}\cdot C_{XX}(e^{j\omega})^2\,,$$

which, for the same number of segments $L$, is higher than the variance of Bartlett's estimator. The advantage of using Welch's estimator with overlap is that one can use more segments for the same amount of data. This reduces the variance of the estimate while keeping a bias comparable to that of the Bartlett estimator. For example, a 50% overlap doubles the number of segments and thus reduces the variance by almost a factor of 2. Additionally, the shape of the window only affects the bias of the estimate, not the variance; so even for non-overlapping segments, the selection of a suitable window function can result in an estimator of reduced bias. These advantages make Welch's estimator one of the most widely used non-parametric spectrum estimators. The method is also implemented in MATLAB's `pwelch` function.
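For reference, a sketch of Welch's estimator using SciPy's `scipy.signal.welch` (the test process, segment length, overlap and window are illustrative choices):

```python
import numpy as np
from scipy import signal

rng = np.random.default_rng(5)
# AR(1)-like colored noise as a test process (illustrative choice)
x = signal.lfilter([1.0], [1.0, -0.9], rng.standard_normal(16384))

# Welch: Hann-windowed segments of length M = 256 with 50% overlap (D = M/2)
f, C_welch = signal.welch(x, window="hann", nperseg=256, noverlap=128)

# Bartlett as the special case: rectangular window, no overlap (D = M)
f, C_bart = signal.welch(x, window="boxcar", nperseg=256, noverlap=0)

print("Welch    peak:", C_welch.max())
print("Bartlett peak:", C_bart.max())
```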

10.5 Smoothing the Periodogram


The asymptotic independence of the finite Fourier transform at discrete frequencies (see property 1b in Section 8.1) suggests that the periodogram ordinates $I^N_{XX}(e^{j\omega_1}), \dots, I^N_{XX}(e^{j\omega_L})$ for $0 \le \omega_1 < \omega_2 < \dots < \omega_L \le \pi$ are asymptotically independent. These frequencies may be discrete frequencies $\omega_k = \frac{2\pi k}{N}$, $k \in \mathbb{Z}$, in the neighborhood of a value of $\omega$ at which we wish to estimate $C_{XX}(e^{j\omega})$. This motivates the estimate
$$\hat{C}^S_{XX}(e^{j\omega}) = \frac{1}{2m+1}\sum_{k=-m}^{m} I^N_{XX}\Big(e^{j\frac{2\pi}{N}[k(\omega)+k]}\Big)\,,$$
where $\frac{2\pi}{N}\cdot k(\omega)$ is the discrete frequency nearest to the frequency $\omega$ at which the spectrum $C_{XX}(e^{j\omega})$ is to be estimated. Given the above independence property, we can expect to reduce the variance of the estimate by averaging over adjacent periodogram ordinates. While for $\omega \neq k\pi$, $k \in \mathbb{Z}$, $I^N_{XX}(e^{j\omega})$ is $\frac{C_{XX}(e^{j\omega})}{2}\chi^2_2$ distributed, $I^N_{XX}(e^{j\omega})$ for $\omega = \pi k$ is not $\frac{C_{XX}(e^{j\omega})}{2}\chi^2_2$ but $C_{XX}(e^{j\omega})\chi^2_1$ distributed. Therefore we exclude these values from the smoothing, which means
$$k(\omega) + k \neq l \cdot N\,, \qquad l \in \mathbb{Z}\,.$$

Because the spectrum $C_{XX}(e^{j\omega})$ of a real process $X(n)$ has even symmetry and is periodic in $2\pi$, it is natural to calculate values of the periodogram $I^N_{XX}(e^{j\omega})$ only for $0 \le \omega < \pi$. This results in the estimate of $C_{XX}(e^{j\omega})$ for $\omega = 2\pi l$, $l \in \mathbb{Z}$,
$$\begin{aligned}
\hat{C}^S_{XX}(e^{j\omega}) &= \frac{1}{2m}\Bigg[\sum_{k=-m}^{-1} I^N_{XX}\Big(e^{j(\omega + \frac{2\pi k}{N})}\Big) + \sum_{k=1}^{m} I^N_{XX}\Big(e^{j(\omega + \frac{2\pi k}{N})}\Big)\Bigg] \\
&= \frac{1}{m}\sum_{k=1}^{m} I^N_{XX}\Big(e^{j(\omega + \frac{2\pi k}{N})}\Big)\,, \qquad [\omega = 2l\pi] \text{ or } [\omega = (2l+1)\pi \text{ and } N \text{ even}]\,,
\end{aligned}$$

which is the same as for the estimation at $\omega = (2l+1)\pi$ when $N$ is even. For the estimation at frequency $\omega = (2l+1)\pi$ when $N$ is odd we use
$$\begin{aligned}
\hat{C}^S_{XX}(e^{j\omega}) &= \frac{1}{2m}\sum_{k=1}^{m}\Big[I^N_{XX}\Big(e^{j(\omega - \frac{\pi}{N} + \frac{2k\pi}{N})}\Big) + I^N_{XX}\Big(e^{j(\omega + \frac{\pi}{N} - \frac{2k\pi}{N})}\Big)\Big] \\
&= \frac{1}{m}\sum_{k=1}^{m} I^N_{XX}\Big(e^{j(\omega - \frac{\pi}{N} + \frac{2k\pi}{N})}\Big)\,, \qquad \omega = (2l+1)\pi \text{ and } N \text{ odd}\,,
\end{aligned}$$
which implies that the variance of the estimator at the frequencies $\omega = l\pi$, $l \in \mathbb{Z}$, will be higher than at other frequencies. In the sequel we will only consider frequencies $\omega \neq l\pi$; the treatment for $\omega = l\pi$ is similar.

The smoothing method is nothing else but sampling the periodogram at discrete frequencies with $\frac{2\pi}{N}$ spacing and averaging. Thus, the method can easily be implemented using the FFT, which makes it computationally efficient.
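A sketch of this FFT-based implementation with uniform weights (the special treatment of the frequencies $\omega = l\pi$ is omitted here for brevity):

```python
import numpy as np

def smoothed_periodogram(x, m):
    """Average 2m+1 adjacent periodogram ordinates around each Fourier frequency."""
    N = len(x)
    I = np.abs(np.fft.fft(x))**2 / N
    kernel = np.zeros(N)
    kernel[:m + 1] = 1.0 / (2 * m + 1)    # offsets k = 0, 1, ..., m
    kernel[-m:] = 1.0 / (2 * m + 1)       # offsets k = -m, ..., -1
    # Circular convolution respects the 2*pi periodicity of the periodogram
    return np.real(np.fft.ifft(np.fft.fft(I) * np.fft.fft(kernel)))

rng = np.random.default_rng(6)
x = rng.standard_normal(2048)             # white noise, true spectrum = 1
for m in (2, 5, 10, 20):
    C = smoothed_periodogram(x, m)[1:1024]
    print(f"m={m:2d}  var={C.var():.4f}")  # variance shrinks roughly as 1/(2m+1)
```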

The mean of the smoothed periodogram is given by
$$E\big[\hat{C}^S_{XX}(e^{j\omega})\big] = \frac{1}{2m+1}\sum_{k=-m}^{m} E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}(k(\omega)+k)}\Big)\Big]\,.$$
In the following calculation we will assume that $\frac{2\pi k(\omega)}{N} = \omega$. Using the formula for the mean of

the periodogram (10.2), we obtain
$$\begin{aligned}
E\big[\hat{C}^S_{XX}(e^{j\omega})\big] &= \frac{1}{2m+1}\sum_{k=-m}^{m} \frac{1}{N}\int_{-\pi}^{\pi} |\Delta^N(e^{j(\omega + \frac{2\pi k}{N} - \lambda)})|^2\, C_{XX}\big(e^{j\lambda}\big)\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2 \qquad (10.7) \\
&= \int_{-\pi}^{\pi} \underbrace{\frac{1}{2m+1}\cdot\frac{1}{N}\sum_{k=-m}^{m} |\Delta^N(e^{j(\omega + \frac{2\pi k}{N} - \lambda)})|^2}_{A_m(e^{j(\omega-\lambda)})}\; C_{XX}\big(e^{j\lambda}\big)\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2 \\
&= \int_{-\pi}^{\pi} A_m\big(e^{j(\omega-\lambda)}\big)\, C_{XX}\big(e^{j\lambda}\big)\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2\,.
\end{aligned}$$
Examining the shape of $A_m(e^{j\omega})$ in Figure 10.4 for a fixed value of $N$ and different values of $m$, we can describe $A_m(e^{j\omega})$ as having approximately a rectangular shape.
Figure 10.4: Function $A_m(e^{j\omega})$ for $N = 15$ and $m = 0, 1, 3, 5$.

A comparison of the mean of the periodogram (10.2) with the mean of the smoothed periodogram (10.7) suggests that the bias of $\hat{C}^S_{XX}(e^{j\omega})$ will most often be greater than the bias of $I^N_{XX}(e^{j\omega})$, as the integral extends over a greater range. Both will be the same if the spectrum $C_{XX}(e^{j\omega})$ is flat; in any case, $\hat{C}^S_{XX}(e^{j\omega})$ is asymptotically unbiased as $N \to \infty$.

The calculation of the variance of $\hat{C}^S_{XX}(e^{j\omega})$ is similar to the calculation in (10.6) and results in
$$\mathrm{Var}\big[\hat{C}^S_{XX}(e^{j\omega})\big] = \frac{1}{2m+1}\,\mathrm{Var}\big[I^N_{XX}(e^{j\omega})\big]\,.$$
This can be confirmed from Figure 10.5, where the estimate $\hat{C}^S_{XX}(e^{j\omega})$ is plotted for different values of $m$.

Figure 10.5: The smoothed periodogram for values of $m = 2, 5, 10$ and $20$ (from top left to bottom right). The abscissa displays the frequency $\frac{\omega}{2\pi}$ while the ordinate shows the estimate.

With this result and the asymptotic behavior of the periodogram ordinates, we find the asymptotic distribution of $\hat{C}^S_{XX}(e^{j\omega})$ as
$$\hat{C}^S_{XX}(e^{j\omega}) \sim \begin{cases} \frac{C_{XX}(e^{j\omega})}{4m+2}\,\chi^2_{4m+2}\,, & \omega \neq \pi k \\ \frac{C_{XX}(e^{j\omega})}{2m}\,\chi^2_{2m}\,, & \omega = \pi k \end{cases} \qquad k \in \mathbb{Z}\,.$$

10.6 A General Class of Spectral Estimates


In Section 10.5 we considered an estimator which utilised the independence property of the periodogram ordinates. The smoothed periodogram weights all periodogram ordinates near $\omega$ equally, which is reasonable for a nearly constant spectrum. However, if the spectrum $C_{XX}(e^{j\omega})$ varies, this method creates a larger bias. In this case it is more appropriate to weight periodogram ordinates near $\omega$ more heavily than those further away. Analogously to Section 10.5, we construct the estimator as
$$\hat{C}^{SW}_{XX}(e^{j\omega}) = \frac{1}{A}\sum_{k=-m}^{m} W_k\, I^N_{XX}\Big(e^{j\frac{2\pi}{N}[k(\omega)+k]}\Big)\,, \qquad k(\omega)+k \neq l\cdot N\,,\;\; \omega \neq l\cdot\pi\,,\;\; l \in \mathbb{Z}\,, \tag{10.8}$$
where $A$ is a constant to be chosen such that the estimator is asymptotically unbiased. The case $\omega = l\pi$ can easily be derived from Section 10.5. For further investigation of the estimator in (10.8) we will only consider the case $\omega \neq l\cdot\pi$, $l \in \mathbb{Z}$.

The mean of the estimator $\hat{C}^{SW}_{XX}(e^{j\omega})$ can be calculated, with the assumption $\frac{2\pi k(\omega)}{N} = \omega$, as
$$E\big[\hat{C}^{SW}_{XX}(e^{j\omega})\big] = \frac{1}{A}\sum_{k=-m}^{m} W_k\, E\Big[I^N_{XX}\Big(e^{j[\omega + \frac{2\pi k}{N}]}\Big)\Big]\,.$$
To get an asymptotically unbiased estimator for $\hat{C}^{SW}_{XX}(e^{j\omega})$ we set
$$A = \sum_{k=-m}^{m} W_k\,,$$
because $E[I^N_{XX}(e^{j\omega})] \to C_{XX}(e^{j\omega})$ as $N \to \infty$. The mean of the estimator can be derived as
$$\begin{aligned}
E\big[\hat{C}^{SW}_{XX}(e^{j\omega})\big] &= \int_{-\pi}^{\pi} \underbrace{\frac{1}{AN}\sum_{k=-m}^{m} W_k\, |\Delta^N(e^{j(\omega + \frac{2\pi k}{N} - \lambda)})|^2}_{A_{SW}(e^{j(\omega-\lambda)})}\; C_{XX}\big(e^{j\lambda}\big)\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2 \\
&= \int_{-\pi}^{\pi} A_{SW}\big(e^{j(\omega-\lambda)}\big)\, C_{XX}\big(e^{j\lambda}\big)\,\frac{d\lambda}{2\pi} + \frac{1}{N}\,|\Delta^N(e^{j\omega})|^2\,\mu_X^2\,,
\end{aligned}$$
where
$$A_{SW}\big(e^{j\omega}\big) = \frac{1}{\sum_{l=-m}^{m} W_l}\cdot\frac{1}{N}\sum_{k=-m}^{m} W_k\, |\Delta^N(e^{j(\omega + \frac{2\pi k}{N})})|^2\,.$$

This expression is very similar to the function $A_m(e^{j\omega})$ from Equation (10.7). Indeed, if we set $W_k = \frac{1}{2m+1}$ for all $k$, it follows immediately that $A_{SW}(e^{j\omega}) = A_m(e^{j\omega})$, i.e. $A_m(e^{j\omega})$ is a special case of $A_{SW}(e^{j\omega})$. However, with the weights $W_k$ we have more control over the shape of $A_{SW}(e^{j\omega})$, which also leads to better control of the bias of the estimator itself. The effect of different weightings on the shape of $A_{SW}(e^{j\omega})$ can be seen in Figure 10.6.

Figure 10.6: Function $A_{SW}(e^{j\omega})$ for $N = 15$ and $m = 4$ with weights $W_k$ in the shape of (from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.

The variance of the estimator $\hat{C}^{SW}_{XX}(e^{j\omega})$ is given by
$$\begin{aligned}
\mathrm{Var}\big[\hat{C}^{SW}_{XX}(e^{j\omega})\big] &= E\big[\hat{C}^{SW}_{XX}(e^{j\omega})^2\big] - \Big(E\big[\hat{C}^{SW}_{XX}(e^{j\omega})\big]\Big)^2 \\
&= \frac{1}{A^2}\sum_{k=-m}^{m}\sum_{l=-m}^{m} W_k W_l\, E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}[k(\omega)+k]}\Big)\, I^N_{XX}\Big(e^{j\frac{2\pi}{N}[k(\omega)+l]}\Big)\Big] - \frac{1}{A^2}\sum_{q=-m}^{m}\sum_{\mu=-m}^{m} W_q W_\mu\, \Big(E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}k(\omega)}\Big)\Big]\Big)^2 \\
&= \frac{1}{A^2}\sum_{p=-m}^{m} W_p^2\, E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}[k(\omega)+p]}\Big)^2\Big] + \frac{1}{A^2}\sum_{k=-m}^{m}\sum_{\substack{l=-m \\ l\neq k}}^{m} W_k W_l\, \Big(E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}k(\omega)}\Big)\Big]\Big)^2 - \frac{1}{A^2}\sum_{q=-m}^{m}\sum_{\mu=-m}^{m} W_q W_\mu\, \Big(E\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}k(\omega)}\Big)\Big]\Big)^2 \\
&= \frac{1}{A^2}\sum_{p=-m}^{m} W_p^2\, \mathrm{Var}\Big[I^N_{XX}\Big(e^{j\frac{2\pi}{N}k(\omega)}\Big)\Big]\,.
\end{aligned}$$

With
$$\sum_{k=-m}^{m} W_k^2 = \sum_{k=-m}^{m}\Bigg(W_k - \frac{\sum_{i=-m}^{m} W_i}{2m+1}\Bigg)^2 + \frac{\Big(\sum_{k=-m}^{m} W_k\Big)^2}{2m+1} \;\ge\; 0 \tag{10.9}$$
we are able to determine the weighting coefficients $W_k$ giving the lowest variance of $\hat{C}^{SW}_{XX}(e^{j\omega})$. From (10.9) it is easy to see that if
$$W_k = \frac{\sum_{i=-m}^{m} W_i}{2m+1} = \frac{A}{2m+1}\,, \qquad k = 0, \pm 1, \dots, \pm m\,,$$
we reach the lowest possible variance for the estimator $\hat{C}^{SW}_{XX}(e^{j\omega})$. Because this choice is nothing other than the smoothed periodogram (see Section 10.5), we see that $\hat{C}^{S}_{XX}(e^{j\omega})$ has the largest bias for a fixed $N$ but the lowest variance within the general class of spectrum estimators covered in this section.
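A sketch of the weighted smoother (10.8), comparing uniform weights with Hamming-shaped weights (one of the choices shown in Figure 10.6); the white-noise test signal is an arbitrary assumption:

```python
import numpy as np

def weighted_smooth(x, weights):
    """Weighted circular smoothing of periodogram ordinates, cf. (10.8)."""
    N = len(x)
    I = np.abs(np.fft.fft(x))**2 / N
    m = (len(weights) - 1) // 2
    W = np.asarray(weights) / np.sum(weights)    # normalize by A = sum of weights
    kernel = np.zeros(N)
    kernel[:m + 1] = W[m:]                       # offsets k = 0, ..., m
    kernel[-m:] = W[:m]                          # offsets k = -m, ..., -1
    return np.real(np.fft.ifft(np.fft.fft(I) * np.fft.fft(kernel)))

rng = np.random.default_rng(7)
x = rng.standard_normal(2048)
m = 5
C_uniform = weighted_smooth(x, np.ones(2 * m + 1))     # the smoothed periodogram
C_hamming = weighted_smooth(x, np.hamming(2 * m + 1))  # down-weights distant ordinates
print("var uniform:", C_uniform[1:1024].var())         # lowest variance
print("var hamming:", C_hamming[1:1024].var())         # slightly higher variance
```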

10.7 The Log-Spectrum


Instead of estimating the spectrum directly, it is possible to estimate $10\log C_{XX}(e^{j\omega})$ from $10\log \hat{C}_{XX}(e^{j\omega})$. The log-function can be used as a variance stabilizing technique [Priestley 2001]. We then find
$$E\big[10\log \hat{C}_{XX}(e^{j\omega})\big] \approx 10\log C_{XX}(e^{j\omega})\,,$$
$$\mathrm{Var}\big[10\log \hat{C}_{XX}(e^{j\omega})\big] \approx \frac{(10\log e)^2}{C}\,,$$
where $C$ is a constant that controls the variance. If we use Bartlett's estimator, for example, $C$ equals $L$.

The above equations show that the variance is approximately constant and does not depend on $C_{XX}(e^{j\omega})$.

10.8 Estimation of the Spectrum using the Sample Covariance


The covariance function $c_{XX}(n)$ is defined in Chapter 7 as
$$c_{XX}(n) = E\big[(X(k+n) - \mu_X)(X(k) - \mu_X)^*\big]\,.$$
Given $X(0), \dots, X(N-1)$, we can estimate $c_{XX}(n)$ by
$$\hat{c}_{XX}(n) = \begin{cases} \frac{1}{N}\sum_{k=0}^{N-1-|n|} \big(X(k+|n|) - \bar{X}\big)\big(X(k) - \bar{X}\big)^*\,, & 0 \le n \le N-1 \\ 0\,, & \text{otherwise}\,, \end{cases} \tag{10.10}$$
where $\bar{X} = \frac{1}{N}\sum_{k=0}^{N-1} X(k)$ is an estimate of the mean $\mu_X = E[X(k)]$. As can be seen from (10.10), the estimator $\hat{c}_{XX}(n)$ of the covariance function is only defined for $n \ge 0$; estimates for $n < 0$ follow from the symmetry property of the covariance function,
$$c_{XX}(n) = c_{XX}(-n)^*\,.$$
Equation (10.10) is a computationally efficient implementation of $\hat{c}_{XX}(n) = \frac{1}{N}\sum_{k=-\infty}^{\infty}(X(k+n) - \bar{X})(X(k) - \bar{X})^*$ if $X(n)$ is set to zero outside the interval $n = 0, \dots, N-1$. It can be proven that $\hat{c}_{XX}(n)$ is an asymptotically consistent estimator of the covariance function $c_{XX}(n)$.

Motivated by the fact that the spectrum $C_{XX}(e^{j\omega})$ is the Fourier transform of the covariance function $c_{XX}(n)$, it is intuitive to suggest as an estimator of $C_{XX}(e^{j\omega})$
$$\begin{aligned}
\hat{C}_{XX}(e^{j\omega}) &= \sum_{n=-(N-1)}^{N-1} \hat{c}_{XX}(n)\, e^{-j\omega n} \\
&= \frac{1}{N}\sum_{n=-(N-1)}^{N-1}\sum_{l=-\infty}^{\infty} \big[X(l+n) - \bar{X}\big]\big[X(l) - \bar{X}\big]^* \cdot e^{-j\omega n} \\
&= \frac{1}{N}\sum_{n=-(N-1)}^{N-1}\sum_{l=-\infty}^{\infty} \big[X(n-l) - \bar{X}\big]\big[X(-l) - \bar{X}\big]^* \cdot e^{-j\omega n}\,,
\end{aligned}$$
which is equivalent to
$$\hat{C}_{XX}(e^{j\omega}) = I^N_{X-\bar{X},X-\bar{X}}(e^{j\omega})\,, \tag{10.11}$$
as the reader should show as an exercise.

The derivation in (10.11) shows that taking the Fourier transform of the sample covariance function is equivalent to computing the periodogram as in (10.1). This is therefore not a consistent method for estimating the spectrum.

10.8.1 Blackman-Tukey Method


In 1958 Blackman and Tukey developed a spectral density estimator, here referred to as the Blackman-Tukey method, which is based on the sample covariance function. The basic idea of the method is to estimate the sample covariance function $\hat{c}_{XX}(k)$ and window it with a function $w_{2M-1}(k)$ of the following form:
$$w_{2M-1}(k) = \begin{cases} w_{2M-1}(-k)\,, & |k| \le M-1\,, \text{ with } M \ll N \\ 0\,, & \text{otherwise}\,. \end{cases}$$
This symmetry condition ensures that the Fourier transform of the window is real-valued and that the estimator $\hat{C}^{BT}_{XX}(e^{j\omega})$ will not be complex. We then get
$$\hat{C}^{BT}_{XX}(e^{j\omega}) = \sum_{n=-(M-1)}^{M-1} \hat{c}_{XX}(n)\cdot w_{2M-1}(n)\cdot e^{-j\omega n}\,.$$
Using (10.11), this is equivalent to
$$\hat{C}^{BT}_{XX}(e^{j\omega}) = \int_{-\pi}^{\pi} I^N_{X-\bar{X},X-\bar{X}}(e^{j\lambda})\cdot W_{2M-1}\big(e^{j(\omega-\lambda)}\big)\,\frac{d\lambda}{2\pi}\,, \tag{10.12}$$
which is the convolution of the periodogram $I^N_{X-\bar{X},X-\bar{X}}(e^{j\omega})$ with the Fourier transform of the window $w_{2M-1}(n)$. From (10.12) we get another constraint for the window,
$$W_{2M-1}(e^{j\omega}) \ge 0 \quad \forall\,\omega\,,$$

to ensure that the estimate $\hat{C}^{BT}_{XX}(e^{j\omega})$ is non-negative for all $\omega$.

The mean of the estimator $\hat{C}^{BT}_{XX}(e^{j\omega})$ can easily be calculated as
$$E\big[\hat{C}^{BT}_{XX}(e^{j\omega})\big] = E\big[I^N_{X-\bar{X},X-\bar{X}}(e^{j\omega})\big] \star W_{2M-1}(e^{j\omega})\,,$$
which, according to Section 10.2.1, is given by
$$\begin{aligned}
E\big[\hat{C}^{BT}_{XX}(e^{j\omega})\big] &= \Big(\frac{1}{N}\,|\Delta^N(e^{j\omega})|^2 \star W_{2M-1}(e^{j\omega})\Big) \star C_{XX}(e^{j\omega}) \\
&= \frac{1}{2\pi}\int_{-\pi}^{\pi} C_{XX}(e^{j\theta})\,\frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{1}{N}\,|\Delta^N(e^{j\vartheta})|^2\cdot W_{2M-1}\big(e^{j(\omega-\theta-\vartheta)}\big)\, d\vartheta\, d\theta\,.
\end{aligned}$$
Since $\lim_{N\to\infty} \frac{1}{N}\big|\Delta^N(e^{j\omega})\big|^2 = 2\pi\sum_k \delta(\omega + 2\pi k)$, we find
$$\lim_{N\to\infty} \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{1}{N}\,|\Delta^N(e^{j\vartheta})|^2\cdot W_{2M-1}\big(e^{j(\omega-\theta-\vartheta)}\big)\, d\vartheta = W_{2M-1}\big(e^{j(\omega-\theta)}\big)\,,$$
which reduces the mean of the estimator to a simple convolution:
$$E\big[\hat{C}^{BT}_{XX}(e^{j\omega})\big] = \frac{1}{2\pi}\int_{-\pi}^{\pi} C_{XX}(e^{j\theta})\cdot W_{2M-1}\big(e^{j(\omega-\theta)}\big)\, d\theta\,. \tag{10.13}$$
Therefore the Blackman-Tukey estimator is asymptotically unbiased as $M \to \infty$.

The variance of the estimator is obtained from Eq. (10.12) as
$$\begin{aligned}
\mathrm{Var}\big[\hat{C}^{BT}_{XX}(e^{j\omega})\big] &\approx \frac{1}{N}\, C_{XX}(e^{j\omega})^2 \int_{-\pi}^{\pi} W_{2M-1}(e^{j\omega})^2\,\frac{d\omega}{2\pi} \\
&\approx \frac{1}{N}\, C_{XX}(e^{j\omega})^2 \sum_{n=-(M-1)}^{M-1} w_{2M-1}(n)^2\,.
\end{aligned}$$
Based on this approximation, the variance of the Blackman-Tukey method tends to zero as $N \to \infty$, provided the window function $w_{2M-1}(n)$ has finite energy.
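A sketch of the Blackman-Tukey estimator (the triangular lag window, whose Fourier transform is non-negative as required, and the values of $M$ and $N$ are illustrative assumptions):

```python
import numpy as np

def blackman_tukey(x, M, nfft=1024):
    """Window the sample covariance with a triangular (Bartlett) lag window."""
    N = len(x)
    x = x - x.mean()
    # Biased sample covariance (10.10) for lags 0 .. M-1, computed via the FFT
    c = np.fft.ifft(np.abs(np.fft.fft(x, 2 * N))**2).real[:M] / N
    w = 1.0 - np.arange(M) / M           # triangular lag window (transform >= 0)
    # Assemble the symmetric windowed covariance c(n) w(n) for n = -(M-1)..M-1
    seq = np.zeros(nfft)
    seq[:M] = c * w
    seq[-(M - 1):] = (c * w)[1:][::-1]
    return np.fft.fft(seq).real          # C^BT_XX at the frequencies 2*pi*k/nfft

rng = np.random.default_rng(8)
x = rng.standard_normal(4096)            # white noise, true spectrum = 1
C = blackman_tukey(x, M=64)
print("mean:", C[1:512].mean(), " var:", C[1:512].var())  # mean ~1, small variance
```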

10.9 Cross-Spectrum Estimation


If $X(n)$ and $Y(n)$ are jointly stationary and real, the cross-covariance function is defined as
$$c_{YX}(n) = E\big[(Y(n+k) - E[Y(n+k)])(X(k) - E[X(k)])\big] = E[Y(n+k)X(k)] - E[Y(n)]\, E[X(n)]\,.$$

The cross-spectrum is defined by
$$C_{YX}(e^{j\omega}) = \sum_{n=-\infty}^{\infty} c_{YX}(n)\, e^{-j\omega n} = C_{XY}(e^{j\omega})^* = C_{XY}(e^{-j\omega})\,.$$
n=−∞

Given $X(0), \dots, X(N-1)$ and $Y(0), \dots, Y(N-1)$, the objective is to estimate the cross-spectrum $C_{YX}(e^{j\omega})$ for $-\pi < \omega < \pi$. The finite Fourier transforms are given by
$$Y_N(e^{j\omega}) = \sum_{n=0}^{N-1} Y(n)\, e^{-j\omega n}\,, \qquad X_N(e^{j\omega}) = \sum_{n=0}^{N-1} X(n)\, e^{-j\omega n}\,.$$
$X_N(e^{j\omega})$ and $Y_N(e^{j\omega})$ are, for large $N$, normally distributed random variables, as discussed above. At distinct frequencies $0 < \omega_1 < \omega_2 < \dots < \omega_K < \pi$ the random variables $X_N(e^{j\omega_k})$ and $Y_N(e^{j\omega_k})$ are independent.

Considering $L$ data stretches of the series $X(0), \dots, X(N-1)$ and $Y(0), \dots, Y(N-1)$, so that $ML = N$,
$$Y_M(e^{j\omega}, l) = \sum_{m=0}^{M-1} Y\big([l-1]M + m\big)\, e^{-j\omega m}\,, \qquad l = 1, \dots, L\,,$$
$$X_M(e^{j\omega}, l) = \sum_{m=0}^{M-1} X\big([l-1]M + m\big)\, e^{-j\omega m}\,, \qquad l = 1, \dots, L\,,$$
are for large $M$ stochastically independent for $0 < \omega \le \pi$.


Defining $V(e^{j\omega}) = \big[\Re X_N(e^{j\omega}),\, \Re Y_N(e^{j\omega}),\, \Im X_N(e^{j\omega}),\, \Im Y_N(e^{j\omega})\big]^T$, we have
$$E\big[V(e^{j\omega})\, V(e^{j\omega})^H\big] = \frac{N}{2}\begin{pmatrix} C_{XX}(e^{j\omega}) & \Re C_{YX}(e^{j\omega}) & 0 & \Im C_{YX}(e^{j\omega}) \\ \Re C_{YX}(e^{j\omega}) & C_{YY}(e^{j\omega}) & -\Im C_{YX}(e^{j\omega}) & 0 \\ 0 & -\Im C_{YX}(e^{j\omega}) & C_{XX}(e^{j\omega}) & \Re C_{YX}(e^{j\omega}) \\ \Im C_{YX}(e^{j\omega}) & 0 & \Re C_{YX}(e^{j\omega}) & C_{YY}(e^{j\omega}) \end{pmatrix}.$$

10.9.1 The Cross-Periodogram


The cross-periodogram is defined as
$$I^N_{YX}(e^{j\omega}) = \frac{1}{N}\, Y_N(e^{j\omega})\, X_N(e^{j\omega})^*\,.$$
It can be shown that
$$E\big[I^N_{YX}(e^{j\omega})\big] = C_{YX}(e^{j\omega})\,.$$
Also, we have
$$E\big[Y_N(e^{j\omega})\, X_N(e^{j\omega})\big] = 0$$
and
$$\mathrm{Var}\big[I^N_{YX}(e^{j\omega})\big] = C_{YY}(e^{j\omega})\, C_{XX}(e^{j\omega})\,.$$
Because
$$\big|C_{YX}(e^{j\omega})\big|^2 \le C_{YY}(e^{j\omega})\, C_{XX}(e^{j\omega})\,,$$
the variance of $I^N_{YX}(e^{j\omega})$ is never smaller than $|C_{YX}(e^{j\omega})|^2$, for any $N$.

To construct estimates of $C_{YX}(e^{j\omega})$ with lower variance, we use the properties given above. We calculate
$$\hat{C}_{YX}(e^{j\omega}) = \frac{1}{L}\sum_{l=1}^{L} \frac{1}{M}\, Y_M(e^{j\omega}, l)\, X_M(e^{j\omega}, l)^*$$
and have, for large $M$,
$$E\big[\hat{C}_{YX}(e^{j\omega})\big] \approx C_{YX}(e^{j\omega})\,, \qquad \mathrm{Var}\big[\hat{C}_{YX}(e^{j\omega})\big] \approx \frac{1}{L}\, C_{YY}(e^{j\omega})\, C_{XX}(e^{j\omega})\,.$$
We can also smooth the cross-periodogram over neighboring frequencies and obtain similar results.
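A sketch of this segment-averaged cross-spectrum estimator (the delayed-signal test case is an arbitrary assumption; SciPy's `scipy.signal.csd` implements the same averaging idea):

```python
import numpy as np

def cross_spectrum(y, x, L):
    """Average the cross-periodograms of L non-overlapping segments."""
    M = len(x) // L
    X = np.fft.fft(x[:L * M].reshape(L, M), axis=1)
    Y = np.fft.fft(y[:L * M].reshape(L, M), axis=1)
    return (Y * np.conj(X)).mean(axis=0) / M     # C_YX at 2*pi*k/M

rng = np.random.default_rng(9)
x = rng.standard_normal(8192)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(8192)   # Y is X delayed by 3 samples

C_yx = cross_spectrum(y, x, L=32)
omega = 2 * np.pi * np.arange(len(C_yx)) / len(C_yx)
# The phase of C_YX should be approximately -3*omega (linear phase of the delay)
k = 10
print("phase slope estimate:", np.angle(C_yx[k]) / omega[k])   # ~ -3
```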
Chapter 11

Parametric Spectrum Estimation

11.1 Spectrum Estimation using an Autoregressive Process

Figure 11.1: Linear and time-invariant filtering of white noise: $Z(n)$ is the input to a filter with unit sample response $h(n)$, whose output is $X(n)$.

Consider the system of Figure 11.1 with output $X(n)$ such that
$$X(n) + \sum_{k=1}^{p} a_k X(n-k) = Z(n)\,,$$
where $E[Z(n)] = 0$, $E[Z(n)Z(k)] = \sigma_Z^2\,\delta(n-k)$ and $a_p \neq 0$. It is assumed that the filter with unit impulse response $h(n)$ is stable. The process $X(n)$ is then called an autoregressive process of order $p$, or in short notation AR($p$). The recursive filter possesses the transfer function
$$H(e^{j\omega}) = \frac{1}{1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k}}\,.$$
Because we have assumed the filter to be stable, the roots of the characteristic equation
$$z^p + \sum_{k=1}^{p} a_k\, z^{p-k} = 0$$
lie in the unit disc, i.e. $|z_l| < 1$ for all roots $z_l$ of the above equation. The spectrum is given as
$$C_{XX}(e^{j\omega}) = |H(e^{j\omega})|^2\, C_{ZZ}(e^{j\omega})\,,$$
thus
$$C_{XX}(e^{j\omega}) = \frac{\sigma_Z^2}{\big|1 + \sum_{k=1}^{p} a_k\, e^{-j\omega k}\big|^2}\,.$$


The objective is to estimate $C_{XX}(e^{j\omega})$ based on observations of the process $X(n)$ for $n = 0, \dots, N-1$.

Multiplying
$$X(n) + \sum_{k=1}^{p} a_k X(n-k) = Z(n)$$
by $X(n-l)$ (from the right), $l \ge 0$, and taking expectations leads to
$$E[X(n)X(n-l)] + \sum_{k=1}^{p} a_k\, E[X(n-k)X(n-l)] = E[Z(n)X(n-l)]\,.$$
Thus,
$$c_{XX}(l) + \sum_{k=1}^{p} a_k\, c_{XX}(l-k) = E[Z(n)X(n-l)]\,.$$
Assuming the recursive filter to be causal, i.e. $h(n) = 0$ for $n < 0$, leads to the relationship
$$X(n) = \sum_{k=0}^{\infty} h(k)\, Z(n-k) = h(0)\, Z(n) + \sum_{k=1}^{\infty} h(k)\, Z(n-k)\,.$$
Thus, shifting $X(n)$ by $l$ gives
$$X(n-l) = h(0)\, Z(n-l) + \sum_{k=1}^{\infty} h(k)\, Z(n-l-k)\,.$$
Taking the expected value of the product $Z(n)X(n-l)$ for $l \ge 0$ leads to
$$E[Z(n)X(n-l)] = h(0)\, c_{ZZ}(l) + \underbrace{\sum_{k=1}^{\infty} h(k)\, c_{ZZ}(l+k)}_{=0} = h(0)\, c_{ZZ}(l) = h(0)\,\sigma_Z^2\,\delta(l)\,.$$
Assuming $h(0) = 1$ leads to the so-called Yule-Walker equations
$$c_{XX}(l) + \sum_{k=1}^{p} a_k\, c_{XX}(l-k) = \begin{cases} \sigma_Z^2\,, & l = 0 \\ 0\,, & l = 1, 2, \dots \end{cases}$$
The difference equation in $c_{XX}(l)$ has a form similar to that of the difference equation describing the system output $X(n)$.
Having observed $X(n)$, one can estimate $c_{XX}(n)$. Using $c_{XX}(0), \dots, c_{XX}(p)$, one gets a unique solution for $a_1, \dots, a_p$. The solution is unique if the matrix with the elements $c_{XX}(l-k)$ is invertible, which is guaranteed if the filter is stable. Given estimates $\hat{a}_1, \dots, \hat{a}_p$ of $a_1, \dots, a_p$, one can use the Yule-Walker equation for $l = 0$ to get $\hat{\sigma}_Z^2$. Then it is straightforward to find an estimate of $C_{XX}(e^{j\omega})$, given by
$$\hat{C}_{XX}(e^{j\omega}) = \frac{\hat{\sigma}_Z^2}{\big|1 + \sum_{k=1}^{p} \hat{a}_k\, e^{-j\omega k}\big|^2}\,.$$
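A sketch of this Yule-Walker procedure for an AR(2) example (the AR coefficients are an arbitrary stable choice):

```python
import numpy as np
from scipy import signal
from scipy.linalg import toeplitz

rng = np.random.default_rng(10)
a_true = np.array([-1.3, 0.6])        # X(n) - 1.3 X(n-1) + 0.6 X(n-2) = Z(n)
x = signal.lfilter([1.0], np.r_[1.0, a_true], rng.standard_normal(8192))

p = 2
N = len(x)
# Sample covariances c_XX(0), ..., c_XX(p) (biased estimator, cf. (10.10))
c = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(p + 1)])

# Yule-Walker equations for l = 1, ..., p
a_hat = np.linalg.solve(toeplitz(c[:p]), -c[1:])
# The l = 0 equation gives the innovation variance
sigma2_hat = c[0] + np.dot(a_hat, c[1:])

omega = np.linspace(0, np.pi, 512)
denom = np.abs(1 + sum(a_hat[k] * np.exp(-1j * omega * (k + 1)) for k in range(p)))**2
C_hat = sigma2_hat / denom            # the AR spectrum estimate above

print("a_hat:", a_hat, " sigma2_hat:", sigma2_hat)   # ~ [-1.3, 0.6] and ~1
```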

11.2 Spectrum Estimation using a Moving Average Process


Consider a transversal filter with impulse response $h(n)$, input $Z(n)$ and output $X(n) \in \mathbb{R}$, as shown in Fig. 11.2:
$$X(n) = \sum_{k=0}^{q} h(k)\, Z(n-k) = Z(n) + \sum_{k=1}^{q} h(k)\, Z(n-k)\,.$$
If $Z(n)$ is a white noise process, i.e., $E[Z(n)] = 0$ and $c_{ZZ}(k) = \sigma_Z^2\,\delta(k)$, then $X(n)$ is a moving average (MA) process of order $q$, MA($q$).

Figure 11.2: Filter model for a moving average process: a tapped delay line ($z^{-1}$ elements) applied to $Z(n)$, with tap weights $h(1), \dots, h(q)$ summed together with $Z(n)$ to form $X(n)$.

The spectrum of $X(n)$ can be calculated as
$$C_{XX}(e^{j\omega}) = \big|H(e^{j\omega})\big|^2\,\sigma_Z^2\,.$$
The covariance function of $X(n)$ is
$$\begin{aligned}
c_{XX}(n) &= E[X(n+k)X(k)] \\
&= E\Bigg[\Bigg(\sum_{r=0}^{q} h(r)\, Z(n+k-r)\Bigg)\Bigg(\sum_{m=0}^{q} h(m)\, Z(k-m)\Bigg)\Bigg] \\
&= \sum_{r=0}^{q}\sum_{m=0}^{q} h(r)\, h(m)\,\sigma_Z^2\,\delta(n+m-r)\,,
\end{aligned}$$
i.e.
$$c_{XX}(n) = \begin{cases} \sigma_Z^2\sum_{m=0}^{q-|n|} h(m)\, h(m+|n|)\,, & 0 \le |n| \le q \\ 0\,, & \text{else}\,. \end{cases}$$
If we are given $c_{XX}(0), \dots, c_{XX}(q)$, we would need to solve non-linear equations in $h(0), \dots, h(q)$ and $\sigma_Z^2$, which in general do not have a unique solution.

Let $b(m) = \sigma_Z\, h(m)$ for $m = 0, \dots, q$; then
$$c_{XX}(n) = \begin{cases} \sum_{m=0}^{q-|n|} b(m)\, b(m+|n|)\,, & 0 \le |n| \le q \\ 0\,, & \text{else}\,. \end{cases}$$

Let $\hat{c}_{XX}(0), \dots, \hat{c}_{XX}(q)$ be available (consistent) estimates of $c_{XX}(0), \dots, c_{XX}(q)$. Then
$$\hat{c}_{XX}(n) = \sum_{m=0}^{q-n} \hat{b}(m)\,\hat{b}(m+n)\,, \qquad n = 0, \dots, q\,.$$
The estimates $\hat{b}(0), \dots, \hat{b}(q)$ are consistent, because of the consistency of $\hat{c}_{XX}(k)$, $k = 0, \dots, q$. Then
$$\hat{h}(1) = \frac{\hat{b}(1)}{\hat{b}(0)}\,,\;\dots\,,\; \hat{h}(q) = \frac{\hat{b}(q)}{\hat{b}(0)}\,, \qquad \hat{\sigma}_Z^2 = \hat{b}(0)^2\,.$$

The implementation of the solution is as follows. With
$$\hat{B}(z) = \sum_{n=0}^{q} \hat{b}(n)\, z^{-n}\,, \qquad \hat{C}_{XX}(z) = \sum_{n=-q}^{q} \hat{c}_{XX}(n)\, z^{-n} = \hat{B}(z)\,\hat{B}(1/z)\,:$$

1. Calculate $P(z) = z^q\, \hat{C}_{XX}(z)$.

2. Assign all roots $z_i$ of $P(z)$ inside the unit circle to $\hat{B}(z)$, i.e. $\hat{B}(z) = \hat{b}(0)\prod_{i=1}^{q}(1 - z_i z^{-1})$ with $\hat{b}(0) > 0$ and $|z_i| \le 1$.

3. Calculate $\hat{h}(m) = \frac{\hat{b}(m)}{\hat{b}(0)}$ for $m = 1, \dots, q$.

As an example,
$$\frac{\hat{b}(1)}{\hat{b}(0)} = \hat{h}(1) = -\sum_{i=1}^{q} z_i\,.$$

To find $\hat{\sigma}_Z^2 = \hat{b}(0)^2$ we use
$$\hat{c}_{XX}(n) = \sum_{m=0}^{q-n} \hat{b}(m)\,\hat{b}(m+n)\,, \qquad n = 0, 1, \dots, q\,,$$
and find for $n = 0$
$$\hat{b}(0)^2 = \frac{\hat{c}_{XX}(0)}{1 + \sum_{m=1}^{q} \hat{h}(m)^2}$$
as an estimate of the power $\sigma_Z^2$ of $Z(n)$.

An alternative estimate of the spectrum $C_{XX}(e^{j\omega})$ of $X(n)$ is
$$\hat{C}_{XX}(e^{j\omega}) = \sum_{n=-q}^{q} \hat{c}_{XX}(n)\, e^{-j\omega n} = \hat{c}_{XX}(0) + 2\sum_{n=1}^{q} \hat{c}_{XX}(n)\cos(\omega n)\,. \tag{11.1}$$
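A sketch of the direct estimate (11.1) for an MA(3) example (the filter coefficients are an arbitrary assumption):

```python
import numpy as np

rng = np.random.default_rng(11)
h = np.array([1.0, 0.8, -0.4, 0.2])      # h(0..3): an MA(3) filter
x = np.convolve(rng.standard_normal(16384), h, mode="valid")

q = 3
N = len(x)
# Biased sample covariances for lags 0 .. q
c_hat = np.array([np.dot(x[:N - n], x[n:]) / N for n in range(q + 1)])

omega = np.linspace(0, np.pi, 512)
# Direct MA spectrum estimate (11.1): real-valued by symmetry of c_XX(n)
C_hat = c_hat[0] + 2 * sum(c_hat[n] * np.cos(omega * n) for n in range(1, q + 1))

# True spectrum for comparison: |H(e^{jw})|^2 * sigma_Z^2 with sigma_Z^2 = 1
H = sum(h[k] * np.exp(-1j * omega * k) for k in range(q + 1))
print("max abs error:", np.max(np.abs(C_hat - np.abs(H)**2)))
```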

Example 11.2.1 Consider a receiver input signal $X(n)$ from a wireless transmission. To find the moving average estimate of the spectrum $C_{XX}(e^{j\omega})$, we first calculate the sample covariance $\hat{c}_{XX}(n)$ and take its z-transform $\hat{C}_{XX}(z)$. By forming $P(z) = z^{N-1}\hat{C}_{XX}(z)$ we find $N-1$ roots $z_i$, $i = 1, \dots, N-1$, inside the unit circle and assign them to $\hat{B}(z)$. The estimate $\hat{C}_{XX}(e^{j\omega})$ can then be found according to (11.1).

Figure 11.3: Estimated spectrum $\hat{C}_{XX}(e^{j\omega})$ and true spectrum $C_{XX}(e^{j\omega})$ of an MA process with $N = 100$ and $q = 5$. The abscissa displays the frequency $\frac{\omega}{\pi}$.

11.3 Model order selection

A problem arising with the parametric approach is that of estimating the number of coefficients required, i.e. $p$ or $q$. Having observed $X(0), \dots, X(N-1)$, there are several techniques for doing this. Akaike's information criterion (AIC) suggests minimising the following function with respect to $m$:
$$\mathrm{AIC}(m) = \log \hat{\sigma}^2_{Z,m} + m\,\frac{2}{N}\,, \qquad m = 1, \dots, M\,.$$
The order estimate is the smallest $m$ minimising $\mathrm{AIC}(m)$. Another, more accurate, estimate is given by the minimum description length (MDL) criterion, obtained by taking the smallest $m$ minimising
$$\mathrm{MDL}(m) = \log \hat{\sigma}^2_{Z,m} + m\,\frac{\log N}{N}\,, \qquad m = 1, \dots, M\,.$$
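A sketch of AIC/MDL order selection for an AR model, reusing the Yule-Walker fit of Section 11.1 to obtain $\hat{\sigma}^2_{Z,m}$ for each candidate order $m$ (the AR(2) test data are an arbitrary assumption):

```python
import numpy as np
from scipy import signal
from scipy.linalg import toeplitz

def yule_walker_sigma2(x, m):
    """Innovation variance of an AR(m) Yule-Walker fit."""
    N = len(x)
    c = np.array([np.dot(x[:N - k], x[k:]) / N for k in range(m + 1)])
    a = np.linalg.solve(toeplitz(c[:m]), -c[1:])
    return c[0] + np.dot(a, c[1:])

rng = np.random.default_rng(12)
x = signal.lfilter([1.0], [1.0, -1.3, 0.6], rng.standard_normal(1024))  # AR(2) data

N = len(x)
orders = range(1, 11)
aic = [np.log(yule_walker_sigma2(x, m)) + m * 2 / N for m in orders]
mdl = [np.log(yule_walker_sigma2(x, m)) + m * np.log(N) / N for m in orders]
print("AIC order:", 1 + int(np.argmin(aic)))   # expected: 2
print("MDL order:", 1 + int(np.argmin(mdl)))   # expected: 2
```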
N
Examples
We generate realisations from the N (0, 1) distribution. We filter the data recursively to get 1024
values. The filter has poles

z1,2 = 0.7e±j0.4 , z3,4 = 0.75e±j1.3 , z5,6 = 0.9e±j1.9

It is a recursive filter of order 6. The process is an AR(6) process.



Figure 11.4: True and estimated AR(6) spectrum. The abscissa displays the frequency $\frac{\omega}{2\pi}$ while the ordinate shows the estimate.


Figure 11.5: Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.
Chapter 12

Discrete Kalman Filter

The Kalman filter has become an indispensable tool in modern control and signal processing. Its simplest use is as an estimation tool. However, it is capable of revealing more than just a standard estimate: it provides a complete statistical description of the degree of knowledge available, which can be highly useful in system analysis. In contrast to many other estimation techniques, the Kalman filter operates in the state space domain. This intuitive structure has given it physical motivation in a variety of applications. In addition, its implementation can be computationally efficient. As a result, it has found application in a wide variety of fields, especially in process control and target tracking, but also in many other signal processing topics.

12.1 State Estimation


The purpose of a Kalman filter is to estimate the state of a process. In many (or most) physical
applications, this state will consist of multiple random variables. For example, in a remote
sensing scenario, the task may be to estimate the position and velocity of a vehicle. In this
case information may be available in the form of measurements such as acoustic signals, radar
returns or visual images.
Once an appropriate model of the system is constructed, it may be possible to derive an
algorithm for the estimation of the state, based on a finite set of observations.

12.2 Derivation of Kalman Filter Equations


Consider a multi-dimensional system where the time evolution of the (unobservable) process state can be described by
$$\begin{pmatrix} X_1(n) \\ \vdots \\ X_N(n) \end{pmatrix} = \begin{pmatrix} a_{11} & \dots & a_{1N} \\ \vdots & \ddots & \vdots \\ a_{N1} & \dots & a_{NN} \end{pmatrix}\begin{pmatrix} X_1(n-1) \\ \vdots \\ X_N(n-1) \end{pmatrix} + \begin{pmatrix} b_{11} & \dots & b_{1L} \\ \vdots & \ddots & \vdots \\ b_{N1} & \dots & b_{NL} \end{pmatrix}\begin{pmatrix} U_1(n-1) \\ \vdots \\ U_L(n-1) \end{pmatrix} + \begin{pmatrix} W_1(n-1) \\ \vdots \\ W_N(n-1) \end{pmatrix}$$
or, in vector notation,
$$X(n) = A\,X(n-1) + B\,U(n-1) + W(n-1)\,, \tag{12.1}$$
where

• $X(n) \in \mathbb{R}^{N\times 1}$ is the process state at time $n$

• A is the state transition matrix


• U (n) ∈ ℜL×1 is the control input
• B governs the effect of the control input on the state
• W (n) ∈ ℜN ×1 is the process noise.
The measurements of the process are determined by
$$\begin{pmatrix} Z_1(n) \\ \vdots \\ Z_M(n) \end{pmatrix} = \begin{pmatrix} h_{11} & \dots & h_{1N} \\ \vdots & \ddots & \vdots \\ h_{M1} & \dots & h_{MN} \end{pmatrix}\begin{pmatrix} X_1(n) \\ \vdots \\ X_N(n) \end{pmatrix} + \begin{pmatrix} V_1(n) \\ \vdots \\ V_M(n) \end{pmatrix}$$
$$Z(n) = H\,X(n) + V(n)\,, \tag{12.2}$$

where
• Z(n) ∈ ℜM ×1 is the measurement
• H maps the process state to a measurement
• V (n) ∈ ℜM ×1 is the measurement noise
The relationships between the system components can be seen in Figure 12.1.
Figure 12.1: Relationship between system state and measurements assumed for the Kalman filter: the control input $U(n)$ (through $B$) and the process noise $W(n)$ drive the state $X(n)$, which propagates through a unit delay; the measurement $Z(n)$ is formed through $H$ with additive noise $V(n)$.

In addition, we assume that the noise quantities $W(n)$ and $V(n)$ are independent of each other and of $U(n)$, white and Gaussian distributed,
$$W(n) \sim \mathcal{N}(0, Q)\,, \qquad V(n) \sim \mathcal{N}(0, R)\,.$$
The matrices $A$, $B$ and $H$ are assumed known from the physics of the system. The control input $U(n)$ is the intervention or "steering" of the process by an outside source (e.g. a human operator). It is assumed that $U(n)$ is known precisely; therefore, its effect on the process through (12.1) is also known.
Now, in order to derive the Kalman filter equations, we need to consider what the "best" estimate $\hat{X}(n)$ of the new state $X(n)$ is, based on the previous state estimate $\hat{X}(n-1)$, the previous control input $U(n-1)$ and the current observation $Z(n)$. This estimate of $X(n)$ may be calculated in two steps:

1. Predict forward in time from the previous estimate:
$$\hat{X}^-(n) = A\,\hat{X}(n-1) + B\,U(n-1)\,. \tag{12.3}$$
From (12.1) it can be seen that the new state is a linear function of the previous state with the addition of the (known) control input and (unknown) zero-mean noise. To predict the new state it is reasonable to use the previous state estimate in place of the true state and to ignore the additive noise component, since its expected value is zero. The quantity $\hat{X}^-(n)$ is referred to as the a priori state estimate, since it is determined from the knowledge available before time instant $n$. It is a projection of the previous state estimate forward in time.
2. Predict the new measurement value,
$$\hat{Z}^-(n) = H\,\hat{X}^-(n)\,, \tag{12.4}$$
and correct the state estimate based on the residual between predicted and actual measurements:
$$\hat{X}(n) = \hat{X}^-(n) + K(n)\big(Z(n) - \hat{Z}^-(n)\big)\,. \tag{12.5}$$
The predicted measurement $\hat{Z}^-(n)$ is calculated from the a priori state estimate. It can be seen that (12.4) is similar to (12.2), without the zero-mean measurement noise and using the state estimate rather than the true, unknown value. Since the true measurement $Z(n)$ is available, the measurement residual $Z(n) - \hat{Z}^-(n)$ is a measure of how correct the state estimate is: a small error is more likely when the estimate is good, whereas a poor estimate will be more likely to produce a large difference between predicted and actual measurements. The state estimate is now corrected by the weighted residual $K(n)(Z(n) - \hat{Z}^-(n))$.
The $N \times M$ matrix $K(n)$ is referred to as the Kalman gain. It controls how the measurement residual affects the state estimate. In order to optimise the choice of $K(n)$, a criterion must be specified. A logical choice would be to minimise the mean squared error between the a posteriori estimate $\hat{X}(n)$ and the true state vector $X(n)$, i.e.
$$\min_{K(n)} E\big[\|X(n) - \hat{X}(n)\|^2\big] = \min_{K(n)} \mathrm{tr}\{P(n)\}\,, \tag{12.6}$$
where
$$P(n) = E\big[E(n)\,E(n)^T\big] \tag{12.7}$$
with
$$E(n) = X(n) - \hat{X}(n) \tag{12.8}$$
is the (state) estimate error, i.e. the difference between the true state vector and our a posteriori state estimate.
To find the $K(n)$ that minimises (12.6), start by writing the a posteriori error in terms of the a priori error $E^-(n) = X(n) - \hat{X}^-(n)$ and its covariance $P^-(n) = E\big[E^-(n)E^-(n)^T\big]$:
$$\begin{aligned}
E(n) &= X(n) - \hat{X}(n) \\
&= E^-(n) - \big(\hat{X}(n) - \hat{X}^-(n)\big) \\
&= E^-(n) - K(n)\big(Z(n) - H\hat{X}^-(n)\big) \\
&= E^-(n) - K(n)\big(H(X(n) - \hat{X}^-(n)) + V(n)\big) \\
&= E^-(n) - K(n)\big(H E^-(n) + V(n)\big)\,. \tag{12.9}
\end{aligned}$$

Now substitute into (12.7):
$$\begin{aligned}
P(n) &= E\big[E(n)E(n)^T\big] \\
&= E\Big[\big(E^-(n) - K(n)(HE^-(n) + V(n))\big)\big(E^-(n) - K(n)(HE^-(n) + V(n))\big)^T\Big] \\
&= E\big[E^-(n)E^-(n)^T\big] - E\big[K(n)\big(HE^-(n) + V(n)\big)E^-(n)^T\big] \\
&\quad - E\big[E^-(n)\big(HE^-(n) + V(n)\big)^T K(n)^T\big] \\
&\quad + E\big[K(n)\big(HE^-(n) + V(n)\big)\big(HE^-(n) + V(n)\big)^T K(n)^T\big]\,.
\end{aligned}$$

Differentiate¹ $\mathrm{tr}\{P(n)\}$ with respect to $K(n)$ and set to 0:
$$\begin{aligned}
0 &= 0 - P^-(n)H^T - P^-(n)H^T + 2K(n)\big(HP^-(n)H^T + R\big) \\
P^-(n)H^T &= K(n)\big(HP^-(n)H^T + R\big) \\
K(n) &= P^-(n)H^T\big(HP^-(n)H^T + R\big)^{-1}\,. \tag{12.10}
\end{aligned}$$

It is easy to see that as $R \to 0$, i.e. as the measurement noise becomes less influential, $K(n) \to H^{-1}$. Thus, the residual between actual and predicted measurements is trusted more and is given a heavy weighting in correcting the state estimate through (12.5). Alternatively, if the a priori error covariance approaches 0, i.e. $P^-(n) \to 0$, or if $R$ is large, then $K(n) \to 0$. This means that the residual is given less weight in updating the state estimate: the a priori estimate is given priority.
Recall that
$$X(n) = A\,X(n-1) + B\,U(n-1) + W(n-1)\,, \qquad \hat{X}^-(n) = A\,\hat{X}(n-1) + B\,U(n-1)\,;$$
then
$$\begin{aligned}
P^-(n) &= E\big[E^-(n)E^-(n)^T\big] \\
&= E\Big[\big(X(n) - \hat{X}^-(n)\big)\big(X(n) - \hat{X}^-(n)\big)^T\Big] \\
&= E\Big[\big(A(X(n-1) - \hat{X}(n-1)) + W(n-1)\big)\big(A(X(n-1) - \hat{X}(n-1)) + W(n-1)\big)^T\Big] \\
&= A\,E\Big[\big(X(n-1) - \hat{X}(n-1)\big)\big(X(n-1) - \hat{X}(n-1)\big)^T\Big]A^T + E\big[W(n-1)W(n-1)^T\big] \\
&= A\,E\big[E(n-1)E(n-1)^T\big]A^T + E\big[W(n-1)W(n-1)^T\big] \\
&= A\,P(n-1)\,A^T + Q\,. \tag{12.11}
\end{aligned}$$
¹The following lemmas are used in the differentiation:
$$\frac{\partial\,\mathrm{tr}\{AX\}}{\partial X} = A^T\,, \qquad \frac{\partial\,\mathrm{tr}\{AX^T\}}{\partial X} = A\,, \qquad \frac{\partial\,\mathrm{tr}\{XAX^T\}}{\partial X} = X\big(A + A^T\big)\,.$$

Starting from (12.9),
$$E(n) = E^-(n) - K(n)\big(HE^-(n) + V(n)\big) = \big(I - K(n)H\big)E^-(n) - K(n)V(n)\,,$$
so
$$\begin{aligned}
P(n) &= E\Big[\big((I - K(n)H)E^-(n) - K(n)V(n)\big)\big((I - K(n)H)E^-(n) - K(n)V(n)\big)^T\Big] \\
&= \big(I - K(n)H\big)P^-(n)\big(I - K(n)H\big)^T + K(n)RK(n)^T \\
&= P^-(n) - K(n)HP^-(n) - P^-(n)H^TK(n)^T + K(n)HP^-(n)H^TK(n)^T + K(n)RK(n)^T \\
&= \big(I - K(n)H\big)P^-(n) - P^-(n)H^TK(n)^T + K(n)\big(HP^-(n)H^T + R\big)K(n)^T \\
&= \big(I - K(n)H\big)P^-(n) - P^-(n)H^TK(n)^T + P^-(n)H^TK(n)^T \\
&= \big(I - K(n)H\big)P^-(n)\,. \tag{12.12}
\end{aligned}$$
Note that we have used the solution for $K(n)$ in (12.10) to simplify the above equation.
It is now possible to draw all the equations derived above together to complete the description of the Kalman filter:

1. Predict the next state and covariance using the previous estimate.

(a) $\hat{X}^-(n) = A\hat{X}(n-1) + BU(n-1)$

(b) $P^-(n) = AP(n-1)A^T + Q$

2. Correct the predictions using the measurement.

(a) $K(n) = P^-(n)H^T\big(HP^-(n)H^T + R\big)^{-1}$

(b) $\hat{X}(n) = \hat{X}^-(n) + K(n)\big(Z(n) - \hat{Z}^-(n)\big)$

(c) $P(n) = \big(I - K(n)H\big)P^-(n)$

The distinction between the two main steps is quite clear. The prediction is based on the a priori information available, i.e. the estimates from the previous time instant, which were based on all measurements up to, but not including, the current time instant $n$, while the correction is made using the current measurement $Z(n)$ to yield an a posteriori estimate.

In fact, the suitability of the Kalman filter for computer implementation is, in large part, due to this feature. The procedure is recursive: it is run after each measurement is made, meaning intermediate estimates are always available. This is in contrast to some techniques that must process the entire measurement sequence to finally yield an estimate. It also compresses all the information from previous measurements into the current state estimate and its corresponding covariance matrix estimate, so there is no need to retain a record of previous measurements. This can result in significant memory savings.
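The recursion translates directly into code. Below is a sketch for a constant-velocity tracking example; the state is position and velocity, and the model matrices and noise covariances are illustrative assumptions, not prescribed by the text:

```python
import numpy as np

# Constant-velocity model: state X = [position, velocity], position is measured
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])           # state transition matrix
H = np.array([[1.0, 0.0]])           # measurement matrix
Q = 1e-4 * np.eye(2)                 # process noise covariance
R = np.array([[0.25]])               # measurement noise covariance

def kalman_step(x_hat, P, z):
    """One predict/correct cycle of the discrete Kalman filter (no control input)."""
    # 1. Predict
    x_pred = A @ x_hat                                        # (12.3) with B U = 0
    P_pred = A @ P @ A.T + Q                                  # (12.11)
    # 2. Correct
    K = P_pred @ H.T @ np.linalg.inv(H @ P_pred @ H.T + R)    # (12.10)
    x_new = x_pred + K @ (z - H @ x_pred)                     # (12.5)
    P_new = (np.eye(2) - K @ H) @ P_pred                      # (12.12)
    return x_new, P_new

rng = np.random.default_rng(13)
x_true = np.array([0.0, 0.5])
x_hat, P = np.zeros(2), np.eye(2)    # initial guess and its covariance
for n in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.normal(0.0, np.sqrt(R[0, 0]), 1)
    x_hat, P = kalman_step(x_hat, P, z)
print("true state:", x_true, " estimate:", x_hat)
```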

12.3 Extended Kalman Filter


The Kalman filter has very strict assumptions on the system model. In particular, it uses linear stochastic difference equations to describe the state transition and the state-to-measurement mapping. This linearity assumption is violated by many or, by strict interpretation, all real-life applications. In these cases it becomes necessary to use the Extended Kalman Filter (EKF). As the name suggests, this is a generalisation or extension of the standard Kalman filter framework, whereby the equations are "linearised" around the estimates in a way analogous to the Taylor series.
Let the state transition now be defined as (cf. (12.1))
$$X(n) = f\big(X(n-1), U(n-1), W(n-1)\big) \tag{12.13}$$
and the measurement as (cf. (12.2))
$$Z(n) = h\big(X(n), V(n)\big)\,. \tag{12.14}$$
As in the Kalman filter case, the noise processes $W(n-1)$ and $V(n-1)$ are unobservable, and only an estimate of the previous state, $\hat{X}(n-1)$, is available. However, the control input $U(n-1)$ is known. Therefore, estimates of the state and measurement vectors can be found by
$$\hat{X}^-(n) = f\big(\hat{X}(n-1), U(n-1), 0\big) \tag{12.15}$$
$$\hat{Z}^-(n) = h\big(\hat{X}^-(n), 0\big)\,. \tag{12.16}$$

This means that (12.13) and (12.14) can be linearised around these initial estimates,
$$X(n) \approx \hat{X}^-(n) + A(n)\big(X(n-1) - \hat{X}(n-1)\big) + C(n)\,W(n-1)$$
and
$$Z(n) \approx \hat{Z}^-(n) + H(n)\big(X(n) - \hat{X}^-(n)\big) + D(n)\,V(n)\,,$$
where $A(n)$, $C(n)$, $H(n)$ and $D(n)$ are Jacobian matrices of the partial derivatives of $f$ and $h$, i.e.
$$\begin{aligned}
A_{[i,j]} &= \frac{\partial f_{[i]}}{\partial x_{[j]}}\big(\hat{X}(n-1), U(n-1), 0\big)\,, & C_{[i,j]} &= \frac{\partial f_{[i]}}{\partial w_{[j]}}\big(\hat{X}(n-1), U(n-1), 0\big)\,, \\
H_{[i,j]} &= \frac{\partial h_{[i]}}{\partial x_{[j]}}\big(\hat{X}^-(n), 0\big)\,, & D_{[i,j]} &= \frac{\partial h_{[i]}}{\partial v_{[j]}}\big(\hat{X}^-(n), 0\big)\,.
\end{aligned}$$

Hopefully the interpretation of these equations is clear. The nonlinear state and measurement
equations are linearised about an initial, a priori estimate. The four Jacobian matrices describe
the rate of change of the nonlinear functions at these estimates.
Due to the non-linearities, the random variables in the EKF cannot be guaranteed to be
Gaussian. Further, no general optimality criterion can be associated with the resulting estimate.
One way to interpret the EKF is that it is simply an adaptation of the Kalman filter to the non-
linear case through the use of a first order Taylor series expansion about the a priori estimate.
By using the equations above and following the methodology in deriving the Kalman filter,
the complete algorithm for implementing an EKF is:

1. Predict the next state and covariance using the previous estimate.

(a) $\hat{X}^-(n) = f\big(\hat{X}(n-1), U(n-1), 0\big)$

(b) $P^-(n) = A(n)P(n-1)A(n)^T + C(n)Q(n-1)C(n)^T$

2. Correct the predictions using the measurement.

(a) $K(n) = P^-(n)H(n)^T\big(H(n)P^-(n)H(n)^T + D(n)RD(n)^T\big)^{-1}$

(b) $\hat{X}(n) = \hat{X}^-(n) + K(n)\big(Z(n) - h(\hat{X}^-(n), 0)\big)$

(c) $P(n) = \big(I - K(n)H(n)\big)P^-(n)$
Appendix A

Discrete-Time Fourier Transform

Sequence                                      Fourier Transform

$ax(n) + by(n)$                               $aX(e^{j\omega}) + bY(e^{j\omega})$
$x(n - n_d)$ ($n_d$ integer)                  $e^{-j\omega n_d}\, X(e^{j\omega})$
$e^{-j\omega_0 n}\, x(n)$                     $X(e^{j(\omega+\omega_0)})$
$x(-n)$                                       $X(e^{-j\omega})$; $X(e^{j\omega})^*$ if $x(n)$ real
$n\,x(n)$                                     $j\,\frac{dX(e^{j\omega})}{d\omega}$
$x(n) \star y(n)$                             $X(e^{j\omega})\, Y(e^{j\omega})$
$x(n)\,y(n)$                                  $\frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\theta})\, Y(e^{j(\omega-\theta)})\, d\theta$
$-jn\,x(n)$                                   $\frac{dX(e^{j\omega})}{d\omega}$

Symmetry properties:
$x(n)^*$                                      $X(e^{-j\omega})^*$
$x(n)$ real                                   $X(e^{j\omega}) = X(e^{-j\omega})^*$
$x(n)$ real                                   $X_R(e^{j\omega})$ and $|X(e^{j\omega})|$ are even; $X_I(e^{j\omega})$ and $\arg X(e^{j\omega})$ are odd
$x(-n)^*$                                     $X(e^{j\omega})^*$
$x(n)$ real and even                          $X(e^{j\omega})$ real and even
$x(n)$ real and odd                           $X(e^{j\omega})$ purely imaginary and odd

Parseval's Theorem:
$$\sum_{n=-\infty}^{\infty} |x(n)|^2 = \frac{1}{2\pi}\int_{-\pi}^{\pi} |X(e^{j\omega})|^2\, d\omega$$
$$\sum_{n=-\infty}^{\infty} x(n)\, y(n)^* = \frac{1}{2\pi}\int_{-\pi}^{\pi} X(e^{j\omega})\, Y(e^{j\omega})^*\, d\omega$$

Sequence                                      Fourier Transform

$\delta(n)$                                   $1$
$\delta(n - n_0)$                             $e^{-j\omega n_0}$
$1$ $(-\infty < n < \infty)$                  $\sum_{k=-\infty}^{\infty} 2\pi\,\delta(\omega + 2\pi k)$
$a^n u(n)$ $(|a| < 1)$                        $\frac{1}{1 - ae^{-j\omega}}$
$u(n)$                                        $\frac{1}{1 - e^{-j\omega}} + \sum_{k=-\infty}^{\infty} \pi\,\delta(\omega + 2\pi k)$
$(n+1)\,a^n u(n)$ $(|a| < 1)$                 $\frac{1}{(1 - ae^{-j\omega})^2}$
$\frac{r^n \sin(\omega_p(n+1))}{\sin(\omega_p)}\, u(n)$ $(|r| < 1)$   $\frac{1}{1 - 2r\cos(\omega_p)\, e^{-j\omega} + r^2 e^{-j2\omega}}$
$\frac{\sin(\omega_c n)}{\pi n}$              $X(e^{j\omega}) = \begin{cases} 1\,, & |\omega| < \omega_c \\ 0\,, & \omega_c < |\omega| \le \pi \end{cases}$
$x(n) = \begin{cases} 1\,, & 0 \le n \le M \\ 0\,, & \text{otherwise} \end{cases}$   $\frac{\sin[\omega(M+1)/2]}{\sin(\omega/2)}\, e^{-j\omega M/2}$
$e^{j\omega_0 n}$                             $\sum_{k=-\infty}^{\infty} 2\pi\,\delta(\omega - \omega_0 + 2\pi k)$
$\cos(\omega_0 n + \phi)$                     $\pi\sum_{k=-\infty}^{\infty}\big[e^{j\phi}\,\delta(\omega - \omega_0 + 2\pi k) + e^{-j\phi}\,\delta(\omega + \omega_0 + 2\pi k)\big]$
