This action might not be possible to undo. Are you sure you want to continue?
Digital Signal Processing
Abdelhak M. Zoubir
Signal Processing Group Institute of Telecommunications Darmstadt University of Technology Merckstr. 25, 64283 Darmstadt, Germany email: zoubir@spg.tudarmstadt.de URL: http://www.spg.tudarmstadt.de
Manuscript draft for Digital Signal Processing
c Abdelhak Zoubir 2006
Last revised: October 2006.
Contents
Preface 1 DiscreteTime Signals and Systems 1.1 Signals . . . . . . . . . . . . . . . . . . . . . . . . . 1.1.1 The unit sample sequence . . . . . . . . . . 1.1.2 The unit step sequence . . . . . . . . . . . . 1.1.3 Exponential and sinusoidal sequences . . . 1.1.4 Periodic Sequences . . . . . . . . . . . . . . 1.2 Systems . . . . . . . . . . . . . . . . . . . . . . . . 1.2.1 The unit sample response of linear systems 1.3 Convolution . . . . . . . . . . . . . . . . . . . . . . 1.4 Stability and Causality . . . . . . . . . . . . . . . . 1.4.1 Stability . . . . . . . . . . . . . . . . . . . . 1.4.2 Causality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . xi 1 1 2 3 4 6 6 8 9 9 10 11 13 13 15 15 19 19 20 21 22 27 28 30 31 33 33 33 35 40 45 48
2 Digital Processing of ContinuousTime Signals 2.1 Periodic Sampling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2.2 Reconstruction of BandLimited Signals . . . . . . . . . . . . . . . . . . . . . . . 2.3 Discretetime Processing of ContinuousTime Signals. . . . . . . . . . . . . . . . 3 Filter Design 3.1 Introduction . . . . . . . . . . . . . . . . 3.2 Digital Filters . . . . . . . . . . . . . . . 3.3 Linear Phase Filters . . . . . . . . . . . 3.3.1 Generalised Linear Phase for FIR 3.4 Filter Speciﬁcation . . . . . . . . . . . . 3.5 Properties of IIR Filters . . . . . . . . . 3.5.1 Stability . . . . . . . . . . . . . . 3.5.2 Linear phase . . . . . . . . . . . . . . . . . . . . . . . . . . Filters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
4 Finite Impulse Response Filter Design 4.1 Introduction . . . . . . . . . . . . . . . . . . . 4.2 Design of FIR Filters by Windowing . . . . . 4.3 Importance of the Window Shape and Length 4.4 Kaiser Window Filter Design . . . . . . . . . 4.5 Optimal Filter Design. . . . . . . . . . . . . . 4.6 Implementation of FIR Filters . . . . . . . . .
i
ii 5 Design of Inﬁnite Impulse Response Filters 5.1 Introduction . . . . . . . . . . . . . . . . . . . 5.2 Impulse Invariance Method . . . . . . . . . . 5.3 Bilinear Transformation . . . . . . . . . . . . 5.4 Implementation of IIR Filters . . . . . . . . . 5.4.1 Direct Form I . . . . . . . . . . . . . . 5.4.2 Direct Form II . . . . . . . . . . . . . 5.4.3 Cascade form . . . . . . . . . . . . . . 5.5 A Comparison of FIR and IIR Digital Filters
CONTENTS 53 53 54 57 60 60 61 62 63 65 65 65 69 69 69 69 70 70 71 72 73 74 77 77 78 79 80 81 83 84 86 88 93 93 95 96 97 99 100 100 101 106
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
. . . . . . . .
6 Introduction to Digital Spectral Analysis 6.1 Why Spectral Analysis? . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6.2 Applications . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7 Random Variables and Stochastic Processes 7.1 Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.1 Distribution Functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.3 Uniform Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.4 Exponential Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.5 Normal Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.6 χ2 Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.7 Multiple Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.8 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.9 Operations on Random Variables . . . . . . . . . . . . . . . . . . . . . . 7.1.10 Expectation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.11 Expected Value of a Function of Random Variables . . . . . . . . . . . . 7.1.12 The Variance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.1.13 Transformation of a Random Variable . . . . . . . . . . . . . . . . . . . 7.1.14 The Characteristic Function . . . . . . . . . . . . . . . . . . . . . . . . . 7.2 Covariance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.3 Stochastic Processes (Random Processes) . . . . . . . . . . . . . . . . . . . . . 7.3.1 Stationarity and Ergodicity . . . . . . . . . . . . . . . . . . . . . . . . . 7.3.2 SecondOrder Moment Functions and WideSense Stationarity . . . . . 7.4 Power Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.1 Deﬁnition of Power Spectral Density . . . . . . . . . . . . . . . . . . . . 7.4.2 Relationship between the PSD and SOMF: WienerKhintchine Theorem 7.4.3 Properties of the PSD . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.4.4 CrossPower Spectral Density . . . . . . . . . . . . . . . . . . . . . . . . 7.5 Deﬁnition of the Spectrum . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.6 Random Signals and Linear Systems . . . . . . . . . . . . . . . . . . . . . . . . 7.6.1 Convergence of Random Variables . . . . . . . . . . . . . . . . . . . . . 7.6.2 Linear ﬁltering . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7.6.3 Linear ﬁltered process plus noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
8 The Finite Fourier Transform 111 8.1 Statistical Properties of the Finite Fourier Transform . . . . . . . . . . . . . . . . 111 8.1.1 Preliminaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
CONTENTS
iii
9 Spectrum Estimation 115 9.1 Use of bandpass ﬁlters . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 10 NonParametric Spectrum Estimation 10.1 Consistency of Spectral Estimates . . . . . . . . . . . . . 10.2 The Periodogram . . . . . . . . . . . . . . . . . . . . . . . 10.2.1 Mean of the Periodogram . . . . . . . . . . . . . . 10.2.2 Variance of the Periodogram . . . . . . . . . . . . 10.3 Averaging Periodograms  Bartlett Estimator . . . . . . . 10.4 Windowing Periodograms  Welch Estimator . . . . . . . 10.5 Smoothing the Periodogram . . . . . . . . . . . . . . . . . 10.6 A General Class of Spectral Estimates . . . . . . . . . . . 10.7 The LogSpectrum . . . . . . . . . . . . . . . . . . . . . . 10.8 Estimation of the Spectrum using the Sample Covariance 10.8.1 BlackmanTukey Method . . . . . . . . . . . . . . 10.9 CrossSpectrum Estimation . . . . . . . . . . . . . . . . . 10.9.1 The CrossPeriodogram . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119 119 120 120 122 124 126 127 130 133 133 134 135 136 139 139 141 143 145 145 145 149 152
11 Parametric Spectrum Estimation 11.1 Spectrum Estimation using an Autoregressive Process . . . . . . . . . . . . . . . 11.2 Spectrum Estimation using a Moving Average Process . . . . . . . . . . . . . . . 11.3 Model order selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12 Discrete Kalman Filter 12.1 State Estimation . . . . . . . . . . . . 12.2 Derivation of Kalman Filter Equations 12.3 Extended Kalman Filter . . . . . . . . .1 DiscreteTime Fourier Transform . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
iv
CONTENTS
List of Figures
1.1 1.2 1.3 1.4 1.5 1.6 1.7 1.8 1.9 1.10 1.11 1.12 1.13 2.1 2.2 Example of a continuoustime signal. . . . . Example of a continuoustime signal. . . . . Example of a sampled continuoustime signal. The sequence p(n). . . . . . . . . . . . . . . Discrete Time Signal. . . . . . . . . . . . . . Exponential Sequence with 0 < α < 1. . . . . Exponential Sequence with −1 < α < 0. . . . Sinusoidal Sequence. . . . . . . . . . . . . . . Real and Imaginary parts of x(n). . . . . . . Representation of a discretetime system. . . Linear Shift Invariant System. . . . . . . . . Transversal Filter input and output for q = 3. Convolution operation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1 2 2 3 3 4 5 5 6 7 8 10 10 14
2.3 2.4 2.5 3.1 3.2 3.3 3.4
ContinuousTime to digital converter. . . . . . . . . . . . . . . . . . . . . . . . . Eﬀect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuoustime signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB , (d) Spectrum of the sampled signal with Ωs < 2ΩB . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Discretetime processing of continuoustime signals. . . . . . . . . . . . . . . . . . Frequency Spectra. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ContinuousTime Filter. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
16 17 17 17 20 21 25 26 26 26 27 28 29 29
A basic system for discretetime ﬁltering of a continuoustime signal. . . . . . . . Linear and timeinvariant ﬁltering of s(n). . . . . . . . . . . . . . . . . . . . . . . Examples of the four types of linear phase FIR ﬁlters . . . . . . . . . . . . . . . . Frequency response of type I ﬁlter of Example 9.3.1. Magnitude (left). Phase (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.5 Frequency response of type II ﬁlter of Example 9.3.2. Magnitude (left). Phase (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.6 Frequency response of type III ﬁlter of Example 9.3.3. Magnitude (left). Phase (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.7 Frequency response of type IV ﬁlter of Example 9.3.4. Magnitude (left). Phase (right). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3.8 Frequency response of ideal lowpass, highpass, bandpass, and bandstop ﬁlters. 3.9 Speciﬁcations of a lowpass ﬁlter. . . . . . . . . . . . . . . . . . . . . . . . . . . 3.10 Lowpass ﬁlter with transition band. . . . . . . . . . . . . . . . . . . . . . . . .
v
vi
LIST OF FIGURES 3.11 Tolerance scheme for digital ﬁlter design. (a) Lowpass. (b) High pass. (c) Bandpass. (d) Stopband. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4.1 4.2 4.3 4.4 4.5 4.6 4.7 4.8 4.9 4.10 4.11 The sequence h(n). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Magnitude of the Fourier transform of a rectangular window (N = 8). . . . . . . Commonly used windows for FIR ﬁlter design N = 200. . . . . . . . . . . . . . . Fourier transform (magnitude) of rectangular window (N = 51). . . . . . . . . . Fourier transform (magnitude) of Bartlett (triangular) window (N = 51). . . . . Fourier transform (magnitude) of Hanning window (N = 51). . . . . . . . . . . . Fourier transform (magnitude) of Hamming window (N = 51) . . . . . . . . . . . Fourier transform (magnitude) of Blackman window (N = 51). . . . . . . . . . . Fourier transform of a Hamming window with window length of N = 101. . . . Fourier transform of a Hamming window with window length of N = 201. . . . (a) Kaiser windows for β = 0 (solid), 3 (semidashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semidashed), and 41 (dashed). Response functions. (a) Impulse response. (b) Log magnitude. . . . . . . . . . . Kaiser window estimate. (a) Impulse response. (b) Log magnitude. . . . . . . . . Filter Speciﬁcation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Flowchart of ParksMcClellan algorithm. . . . . . . . . . . . . . . . . . . . . . . . Signal ﬂow graph for a delay element. . . . . . . . . . . . . . . . . . . . . . . . . Signal Flow graph representing the convolution sum. . . . . . . . . . . . . . . . Eﬃcient signal ﬂow graph for the realisation of a type I FIR ﬁlter. . . . . . . . . Lowpass ﬁlter speciﬁcations. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Magnitude transfer function Hc (jΩ). . . . . . . . . . . . . . . . . . . . . . . . . . Filter speciﬁcations for Example 5.2.2. . . . . . . . . . . . . . . . . . . . . . . . . Mapping of the splane onto the zplane using the bilinear transformation . . . . Mapping of the continuoustime frequency axis onto the unit circle by bilinear transformation (Td = 1). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . Signal ﬂow graph of direct form I structure for an (N − 1)thorder system. . . . Signal ﬂow graph of direct form II structure for an N − 1thorder system. . . . Cascade structure for a 6thorder system with a direct form II realisation of each secondorder subsystem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . A Michelson Interferometer. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . y. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
30 35 36 37 38 38 38 39 39 39 40
4.12 4.13 4.14 4.15 4.16 4.17 4.18 5.1 5.2 5.3 5.4 5.5 5.6 5.7 5.8 6.1 7.1 7.2 7.3 7.4 7.5 7.6 7.7 7.8 7.9 7.10 7.11 7.12
41 43 44 46 50 51 51 51 55 56 56 58 58 61 62 63 66 71 73 74 75 76 76 85 90 101 104 105 106
Probability function. . . . . . . . . . . . . . . . . . . χ2 distribution with ν = 1, 2, 3, 5, 10 . . . . . . . . . ν The quadrant D of two arbitrary real numbers x and Bivariate probability distribution function. . . . . . . Properties for n=2 . . . . . . . . . . . . . . . . . . . Probability that (X1 , X2 ) is in the rectangle D3 . . . Voltage waveforms emitted from a noise source. . . . Measurement of SOMF. . . . . . . . . . . . . . . . . A linear ﬁlter. . . . . . . . . . . . . . . . . . . . . . . Interaction of two linear systems . . . . . . . . . . . Interaction of two linear systems. . . . . . . . . . . . Cascade of L linear systems. . . . . . . . . . . . . . .
LIST OF FIGURES
vii
7.13 Cascade of two linear systems. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 7.14 A linear ﬁlter plus noise. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106 8.1 9.1 9.2 Transfer function of an ideal bandpass ﬁlter. . . . . . . . . . . . . . . . . . . . . 114
. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115 Transfer function of an ideal bandpass ﬁlter . . . . . . . . . . . . . . . . . . . . 116 122
1 10.1 The window function N ∆N (ejω )2 for N = 20, N = 10 and N = 5. . . . . . . . 10.2 The periodogram estimate of the spectrum (solid line) and the true spectrum ω (dotted line). The abscissa displays the frequency 2π while the ordinate shows the estimate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.3 The periodogram as an estimate of the spectrum using 4 nonoverlapping segω ments. The abscissa displays the frequency 2π while the ordinate shows the estimate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.4 Function Am (ejω ) for N = 15 and m = 0, 1, 3, 5. . . . . . . . . . . . . . . . . . . 10.5 The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to ω bottom right). The abscissa displays the frequency 2π while the ordinate shows the estimate. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10.6 Function ASW (ejω ) for N = 15 and m = 4 with weights Wk in shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
123
125 129
130
132 139 141 143 144 144
Linear and timeinvariant ﬁltering of white noise . . . . . . . . . . . . . . . . . . Filter model for a moving average process . . . . . . . . . . . . . . . . . . . . . . Estimate of an MA process with N = 100 and q = 5 . . . . . . . . . . . . . . . . ω True and estimated AR(6) spectrum. The abscissa displays the frequency 2π while the ordinate shows the estimate. . . . . . . . . . . . . . . . . . . . . . . . 11.5 Akaike and MDL information criteria for the AR(6) process for orders 1 through 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
11.1 11.2 11.3 11.4
12.1 Relationship between system state and measurements assumed for Kalman ﬁlter. 146
viii
LIST OF FIGURES
List of Tables
3.1 4.1 4.2 The four classes of FIR ﬁlters and their associated properties . . . . . . . . . . . Suitability of a given class of ﬁlter for the four FIR types . . . . . . . . . . . . . Properties of diﬀerent windows. . . . . . . . . . . . . . . . . . . . . . . . . . . . 24 33 37
ix
x
LIST OF TABLES
Preface
This draft is provided as an aid for students studying the subject Digital Signal Processing. It covers the material presented during lectures. Useful references on this subject include: 1. Oppenheim, A.V. & Schafer, R.W. DiscreteTime Signal Processing, PrenticeHall, 1989. 2. Proakis, J.G. & Manolakis, D.G. Digital Signal Processing Principles, Algorithms, and Applications, Maxwell MacMillan Editions, 1992. 3. Mitra, S.K. Digital Signal Processing: A ComputerBased Approach, PrenticeHall, 1998. 4. Diniz, P. & da Silva, E. & Netto, S. Digital Signal Processing, System Analysis and Design, Cambridge University Press, 2002 5. Porat, B. A Course on Digital Signal Processing, J. Wiley & Sons Inc., 1997 6. Kammeyer, K.D. & Kroschel, K. Digitale Signalverarbeitung, Teubner Studienb¨cher, 1998 u 7. H¨nsler, E. Statistische Signale, Springer Verlag, 3. Auﬂage, 2001 a 8. B¨hme, J.F. Stochastische Signale, Teubner Studienb¨cher, 1998 o u 9. Brillinger, D.R. Time Series, Data Analysis and Theory, Holden Day Inc., 1981 10. Priestley, M.B. Spectral Analysis and Time Series, Academic Press, 2001
xi
Chapter 1
DiscreteTime Signals and Systems
1.1 Signals
Most signals can in practice be classiﬁed into three broad groups: 1. Analog or continuoustime signals are continuous in both time and amplitude. Examples: speech, image, seismic and radar signals. An example of a continuous time signal is shown in Figure 1.1.
x(t)
t
Figure 1.1: Example of a continuoustime signal. 2. Discretetime signals are discrete in time and continuous in amplitude. An example of such a signal is shown in Figure 1.2. A common way to generate discretetime signals is by sampling continuoustime signals. There exist, however, signals which are intrinsically discrete in time, such as the average monthly rainfall. 3. Digital signals are discrete in both time and amplitude. Digital signals may be generated through the amplitude quantisation of discretetime signals. The signals used by computers are digital, being discrete in both time and amplitude. The development of digital signal processing concepts based on digital signals requires a detailed treatment of amplitude quantisation, which is extremely diﬃcult and tedious. In addition, many useful insights are lost in such a treatment because of the mathematical complexity. For this reason, digital signal processing concepts have been developed based on discretetime signals and continuoustime signals. Experience shows that the theories developed 1
2
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS
T x(n)
E
1 2 3
n
Figure 1.2: Example of a continuoustime signal. based on continuoustime or discretetime signals are often applicable to digital signals. A discretetime signal (sequence) will be denoted by functions whose arguments are integers. For example, x(n) represents a sequence which is deﬁned for all integer values of n. If n is not integer then x(n) is not zero, but is undeﬁned. The notation x(n) refers to the discretetime function x or to the value of the function x, at a speciﬁc n. Figure 1.3 gives an example of a sampled signal with sampling period T . Herein x(n) = x(t)t=nT = x(nT ), n integer and T > 0.
x(n)
1Τ
2Τ
3Τ
t
Figure 1.3: Example of a sampled continuoustime signal. There are some sequences which play an important role in digital signal processing. We discuss these next.
1.1.1
The unit sample sequence
The unit sample sequence is denoted by δ(n) and is deﬁned as δ(n) = 1, n=0, 0, otherwise.
The sequence δ(n) may be regarded as an analog to the impulse function used in continuoustime system analysis. An important aspect of the impulse sequence is that any sequence can be represented as a linear combination of delayed impulses. For example, the sequence p(n) in Figure 1.4 can be expressed as p(n) = a−1 δ(n + 1) + a1 δ(n − 1) + a3 δ(n − 3) .
1.1. SIGNALS In general,
T
3
p(n) a1 a3
E n
a−1
−1
1
3
Figure 1.4: The sequence p(n).
∞ k=−∞
x(n) = as shown in Figure 1.5.
x(k)δ(n − k) ,
Tx(n)
E
1
T
n
x(k) δ(n − k)
E
n Figure 1.5: Discrete Time Signal.
k
1.1.2
The unit step sequence
1, n ≥ 0, 0, otherwise.
n
The unit step sequence denoted by u(n) is deﬁned as u(n) = The sequence u(n) is related to δ(n) by u(n) =
k=−∞
δ(k) ,
4 or alternatively
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS
∞ k=0
u(n) =
δ(n − k) .
Conversely, the impulse sequence can be expressed as the ﬁrst backward diﬀerence of the unit step sequence, i.e.,
n n−1
δ(n) = u(n) − u(n − 1) =
k=−∞
δ(k) −
δ(k)
k=−∞
1.1.3
Exponential and sinusoidal sequences
Exponential and sinusoidal sequences play an important role in digital signal processing. The general form of an exponential sequence is x(n) = A · αn , (1.1)
where A and α may be complex numbers. For real A > 0 and real 0 < α < 1, x(n) is a decreasing sequence similar in appearance to Figure 1.6.
x(n)
n
Figure 1.6: Exponential Sequence with 0 < α < 1. For −1 < α < 0, the sequence values alternate in sign but again decrease in magnitude with increasing n, as shown in Figure 1.7. If α > 1, then x(n) grows in magnitude as n increases. A sinusoidal sequence has the general form x(n) = A · cos(ωo n + φ) with A and φ real. The sinusoidal sequence is shown in Figure 1.8 for φ = Now let A and α be complex so that A = Aejφ and α = αejω0 , then x(n) = Aαn = Aejφ αn ejnω0 = A · αn ej(ω0 n+φ)
3π 2 .
= A · αn [cos(ω0 n + φ) + j sin(ω0 n + φ)] A plot of the real part of x(n) = A · αn for A = Aejω0 is given in Figure 1.9 for α > 1. In this case we set ω0 = π and φ = 0, i.e., ℜ{x(n)} = A · αn cos(πn) = A · αn (−1)n .
1.1. SIGNALS
x(n)
5
n
Figure 1.7: Exponential Sequence with −1 < α < 0.
Tx(n)
E
π 2ω0
n
Figure 1.8: Sinusoidal Sequence. The imaginary part is zero for all n. For α = 1, x(n) is referred to as the complex exponential sequence, and has the form x(n) = A · ej(ω0 n+φ) = A [cos(πn + φ) + j sin(πn + φ)] , i.e., real and imaginary parts of ejω0 n vary sinusoidally with n. Note. In the continuoustime case, the quantity ω0 is called the frequency and has the dimension radians per second. In the discretetime case n is a dimensionless integer. Thus the dimension of ω0 must be radians. To maintain a closer analogy with the continuoustime case we can specify the units of ω0 to be radians per sample and the units of n to be samples. Following this rationale we can deﬁne the period of a complex exponential sequence as 2π/ω0 which has the dimension samples.
6
x(n)
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS
n
Figure 1.9: Real and Imaginary parts of x(n).
1.1.4
Periodic Sequences
∀ n,
A sequence x(n) is called periodic with period N if x(n) satisﬁes the following condition: x(n) = x(n + N )
where N is a positive integer. For example, the sequence cos(πn) is periodic with period 2k, k ∈ Z, but the sequence cos(n) is not periodic. Note that a sinusoidal discretetime sequence A cos (ω0 n) is periodic with period 2π if ω0 N = 2πk, k ∈ Z. A similar statement holds for the complex exponential sequence C · ejω0 n , i.e., periodicity with period N requires ejω0 (n+N ) = ejω0 n , which is true only for Consequently, complex exponential and sinusoidal sequences are not necessarily periodic with period 2π/ω0 and, depending on the value of ω0 , may not be periodic at all. For example, for ω0 = 3π/4, the smallest value of N for which equation (1.2) can be satisﬁed with k an integer is N = 8 ( corresponding to k = 3). For ω0 = 1, there are no integer values of N or k for which Equation (1.2) is satisﬁed. When we combine the condition of Equation (1.2) with the observation that ω0 and ω0 +2πr, r ∈ Z, are indistinguishable frequencies, then it is clear that there are only N distinguishable frequencies for which the corresponding sequences are periodic with period N . One set of frequencies is ωk = 2πk/N , k = 0, 1, . . . , N − 1. These properties of complex exponential and sinusoidal sequences are basic to both the theory and the design of computational algorithms for discretetime Fourier analysis. ω0 N = 2πk, k ∈ Z . (1.2)
1.2
Systems
y(n) = T[x(n)]
A system T that maps an input x(n) to an output y(n) is represented by:
1.2. SYSTEMS x(n)
E E y(n)
7
T[·]
Figure 1.10: Representation of a discretetime system. Example 1.2.1 The ideal delay system is deﬁned by y(n) = x(n − nd ), nd ∈ Z+ . The system would shift the input to the right by nd of samples corresponding to a time delay. This deﬁnition of a system is very broad. Without some restrictions, we cannot completely characterise a system when we know the output of a system to a certain set of inputs. Two types of restrictions greatly simplify the characterisation and analysis of systems; linearity and shift invariance (LSI). Many systems can be approximated in practice by a linear and shift invariant system. A system T is said to be linear if T[ax1 (n) + bx2 (n)] = ay1 (n) + by2 (n) holds, where T[x1 (n)] = y1 (n), T[x2 (n)] = y2 (n), and a and b are any scalar constants. The property T[ax(n)] = aT[x(n)] is called homogeneity, and T[x1 (n) + x2 (n)] = T[x1 (n)] + T[x2 (n)] is called additivity. Additivity and homogeneity are collectively known as the Principle of Superposition. A system is said to be shift invariant (SI) if T[x(n − n0 )] = y(n − n0 ) where y(n) = T[x(n)] and n0 is any integer. Example 1.2.2 A system described by the relation
n
y(n) =
k=−∞
x(k)
is called an accumulator. This system is both linear and shift invariant. Example 1.2.3 Consider a system which is described by y(n) = T[x(n)] = x(n) · g(n) where g(n) is an arbitrary sequence. This system is shown in Figure 1.11. system is easily veriﬁed since T [ax1 (n) + bx2 (n)] = g(n) [ax1 (n) + bx2 (n)] = ag(n)x1 (n) + bg(n)x2 (n) = aT [x1 (n)] + bT [x2 (n)] . Linearity of the
8
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS x(n)
E
×
E y(n)
T
T g(n)
Figure 1.11: Linear Shift Invariant System. However, the system is shift variant because T [x(n − n0 )] = x(n − n0 )g(n) does not equal y(n − n0 ) = x(n − n0 )g(n − n0 ) . Example 1.2.4 Consider a system described by y(n) = T [x(n)] = x2 (n) . Although this system is shiftinvariant, it is nonlinear. The system is nonlinear because T [x1 (n) + x2 (n)] = [x1 (n) + x2 (n)]2 , which is diﬀerent from y1 (n) + y2 (n) = x2 (n) + x2 (n) . 1 2 The system is shiftinvariant because T[x(n − n0 )] = x2 (n − n0 ) , and y(n − n0 ) = x2 (n − n0 ) .
1.2.1
The unit sample response of linear systems
∞ k=−∞
Consider a linear system T. Using the relationship x(n) = x(k)δ(n − k)
and the linearity property, the output y(n) of a linear system T for an input x(n) is given by y(n) = T[x(n)] = T
∞ k=−∞ ∞
x(k)δ(n − k)
= T[· · · + x(−1)δ(n + 1) + x(0)δ(n) + x(1)δ(n − 1) + · · · ] =
k=−∞
x(k)T[δ(n − k)]
1.3. CONVOLUTION
9
where h(n) = T[δ(n)] is the unit sample response of a system T to an input δ(n).
This result indicates that a linear system can be completely characterised by the response of the system to a unit sample sequence δ(n) and its delays δ(n − k). Moreover, if the system is shiftinvariant, we have: h(n − n0 ) = T[δ(n − n0 )]
1.3
Convolution
∞ k=−∞
The inputoutput relation of a linear and shiftinvariant (LSI) system T is given by y(n) = T[x(n)] = x(k)h(n − k)
This equation shows that knowledge of h(n) completely characterises the system, i.e., knowing the unit sample response, h(n), of a LSI system T, allows us to completely determine the output y(n) for a given input x(n). This equation is referred to as convolution and is denoted by the convolution operator ⋆. For a LSI system, we have y(n) = x(n) ⋆ h(n) =
∞ k=−∞
x(k)h(n − k)
Remark 1: The unit sample response h(n) loses its signiﬁcance for a nonlinear or shiftvariant system. Remark 2: The choice of δ(n) as an input in characterising a LSI system is the simplest, both conceptually and in practice. Example 1.3.1 A linear transversal ﬁlter is given by
q
y(n) =
k=0
x(n − k)a(k)
= a(0)x(n) + a(1)x(n − 1) + . . . + a(q)x(n − q)
1 If we take a(0) = a(1) = . . . = a(q) = q+1 , the ﬁlter output sequence is an average of q + 1 samples of the input sequence around the nth sample. A transversal ﬁlter input and output for q = 3 are shown in Figure 1.12.
Example 1.3.2 To compute the relationship between the input and the output with a transversal ﬁlter, h(−k) is ﬁrst plotted against k; h(−k) is simply h(k) reﬂected or “ﬂipped” around k = 0, as shown in Figure 1.13. Replacing k by k − n, where n is a ﬁxed integer, leads to a shift in the origin of the sequence h(−k) to k = n. y(n) is then obtained by the x(k)h(n − k). See Figure 1.13.
1.4
Stability and Causality
To implement a LSI system in practice (realisability), we have to impose two additional constraints – stability and causality.
10
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS
x(k)
E
n−3
n
k y(k)
E
n−3
n
k
Figure 1.12: Transversal Filter input and output for q = 3. x(k) T 3 2 1
E
h(k) T
2 1
E
k
T Th(−k)
k y(n) 7 6 4
2 1
E
1
E
k Figure 1.13: Convolution operation.
n
1.4.1
Stability
A system is considered stable in the boundedinput boundedoutput (BIBO) sense if and only if a bounded input always leads to a bounded output. A necessary and suﬃcient condition for
1.4. STABILITY AND CAUSALITY
11
an LSI system to be stable is that the unit sample response h(n) is absolutely summable, i.e.,
∞ n=−∞
h(n) < ∞
Such a sequence h(n) is sometimes referred to as stable sequence. The proof is quite simple. First let x(n) be bounded by M , i.e., x(n) ≤ M < ∞ ∀ n, then using convolution y(n) = ≤
∞ k=−∞ ∞
h(k)x(n − k)
≤ M
k=−∞ ∞
h(k)x(n − k) h(k)
k=−∞
So given a bounded input x(n) ≤ M , the output y(n) is bounded if and only if h(n) is absolutely summable.
1.4.2
Causality
Causality is a fundamental law of physics stating that information cannot travel from the future to the past. In the context of systems we say that a system is causal if and only if the current output y(n) does not depend on any future values of the input such as x(n + 1), x(n + 2), . . .. Causality holds for any practical system regardless of linearity or shift invariance. A necessary and suﬃcient condition for an LSI system to be causal is that h(n) = 0, for n < 0. The proof follows from convolution, y(n) =
∞ k=−∞
x(n − k)h(k).
To obtain y, at index n, causality implies that we cannot assume knowledge of x at index n − k for n − k > n, or simply, for k < 0. Setting h(k) = 0 for k < 0 ensures this.
12
CHAPTER 1. DISCRETETIME SIGNALS AND SYSTEMS
Chapter 2
Digital Processing of ContinuousTime Signals
Discretetime signals arise in many ways, but they most commonly occur in representations of continuoustime signals. This is partly due to the fact that the processing of continuoustime signals is often carried out on discretetime sequences obtained by sampling continuoustime signals. It is remarkable that under reasonable constraints a continuoustime signal can be adequately represented by a sequence. In this chapter we discuss the process of periodic sampling.
2.1
Periodic Sampling
+∞
Let xc (t) be a continuoustime signal and its Fourier transform be denoted by Xc (jΩ) Xc (jΩ) =
−∞
xc (t)e−jΩt dt .
The signal xc (t) is obtained from Xc (jΩ) by xc (t) = 1 2π
+∞ −∞
Xc (jΩ)e+jΩt dΩ .
The typical method of obtaining a discretetime representation of a continuoustime signal is through periodic sampling, wherein a sequence of samples x(n) is obtained from the continuoustime signal xc (t) according to x(n) = xc (t)t=nT = xc (nT ), n = 0, ±1, ±2, . . . , T > 0.
Herein, T is the sampling period, and its reciprocal, fs = 1/T , is the sampling frequency, in samples per second and Ωs = 2πfs is the sampling frequency in rad/sec. This equation is the inputoutput relationship of an ideal A/D converter. It is generally not invertible since many continuoustime signals can produce the same output sequence of samples. This ambiguity can be removed by restricting the class of input signals to the sampler, however. It is convenient to represent the sampling process in two stages, as shown in Figure 2.1. This consists of an impulse train modulator which is followed by conversion of the impulse train into a discrete sequence. The system G in Figure 2.1 represents a system that converts an impulse
13
14
CHAPTER 2. DIGITAL PROCESSING OF CONTINUOUSTIME SIGNALS
E
xc (t)
×
E
G
E x(n) = xc (nT )
T xs (t)
p(t) Figure 2.1: ContinuousTime to digital converter. train into a discretetime sequence. The modulation signal p(t) is a periodic impulse train, p(t) =
∞ n=−∞
δ(t − nT ) ,
where δ(t) is called the Dirac delta function. Consequently, we have xs (t) = xc (t) · p(t) = xc (t) ·
∞ n=−∞
δ(t − nT ) =
∞ n=−∞
xc (nT )δ(t − nT ) =
∞ n=−∞
x(n)δ(t − nT ).
The Fourier transform of xs (t), Xs (jΩ), is obtained by convolving Xc (jΩ) and P (jΩ). The Fourier transform of a periodic impulse train is a periodic impulse train, i.e., 2π P (jΩ) = T Since Xs (jΩ) = it follows that Xs (jΩ) = 1 T
∞ k=−∞ ∞ k=−∞
2πk 2π δ(Ω − )= T T
∞ k=−∞
δ(Ω − kΩs ).
1 Xc (jΩ) ∗ P (jΩ), 2π 2πk T 1 T
∞ k=−∞
Xc j Ω −
=
Xc (jΩ − kjΩs ) .
The same result can be obtained by noting that p(t) is periodic so that it can be represented 1 as a Fourier series with coeﬃcients T , i.e., p(t) = Thus xs (t) = xc (t)p(t) = 1 T
∞ n=−∞
1 T
∞ n=−∞
ejnΩs t .
xc (t)ejnΩs t .
Taking the Fourier transform of xs (t) leads to 1 Xs (jΩ) = T
∞ k=−∞
Xc (j (Ω − kΩs )) .
2.2. RECONSTRUCTION OF BANDLIMITED SIGNALS
15
Figure 2.2 depicts the frequency domain representation of impulse train sampling. Figure 2.2(a) represents the Fourier transform of a bandlimited signal where the highest nonzero frequency component in Xc (jΩ) is at ΩB . Figure 2.2(b) shows the periodic impulse train P (jΩ), Figure 2.2(c) shows Xs (jΩ), which is the result of convolving Xc (jΩ) with P (jΩ). From Figure 2.2(c) it is evident that when Ωs − ΩB > ΩB , or Ωs > 2ΩB the replica of Xc (jΩ) do not overlap, and therefore xc (t) can be recovered from xs (t) with an ideal lowpass ﬁlter with cutoﬀ frequency of at least ΩB but not exceeding ΩS −ΩB . However, in Figure 2.2(d) the replica of Xc (jΩ) overlap, and as a result the original signal cannot be recovered by ideal low pass ﬁltering. The reconstructed signal is related to the original continuoustime signal through a distortion referred to as aliasing. If xc (t) is a bandlimited signal with no components above ΩB rad/sec, then the sampling frequency has to be equal to or greater than 2ΩB rad/sec. This is known as the Nyquist criterion and 2ΩB is known as the Nyquist frequency.
2.2
Reconstruction of BandLimited Signals
If x(n) is the input to an ideal low pass ﬁlter with frequency response Hr (jΩ) and impulse response hr (t), then the output of the ﬁlter will be xr (t) =
∞ n=−∞
x(n)hr (t − nT ) .
where T is the sampling interval. The reconstruction ﬁlter commonly has a gain of T and a cutoﬀ frequency of Ωc = Ωs /2 = π/T . This choice is appropriate for any relationship between Ωs and ΩB that avoids aliasing, i.e., so long as Ωs > 2ΩB . The impulse response of such a ﬁlter is given by sin (πt/T ) hr (t) = . πt/T Consequently, the relation between xr (t) and x(n) is given by xr (t) = with xc (nT ) = x(n).
∞ n=−∞
x(n)
sin[π(t − nT )/T ] , π(t − nT )/T
2.3
Discretetime Processing of ContinuousTime Signals.
A continuoustime signal is processed by digital signal processing techniques using analogtodigital (A/D) and digitaltoanalog (D/A) converters before and after processing. This concept is illustrated in Figure 2.3. The low pass ﬁlter limits the bandwidth of the continuoustime signal to reduce the eﬀect of aliasing. The spectral characteristics (magnitude) of the low pass ﬁlter is shown in Figure 2.5. Example 2.3.1 Assume that we wanted to restore the signal sc (t) from yc (t) = sc (t) + nc (t), where nc (t) is a background noise. The spectral characteristics of these signals are shown in Figure 2.4.
16
CHAPTER 2. DIGITAL PROCESSING OF CONTINUOUSTIME SIGNALS
(a) 1 X c (j Ω)
− ΩB (b) 2π −−− T
ΩB P(j Ω)
Ω
−2 Ω s (c)
− Ωs X s (j Ω)
Ωs
2 Ωs
Ω
1 −−− T
ΩB (d) 1 −−− T X s (j Ω)
Ωs
Ω
Ωs
Ω
Figure 2.2: Eﬀect in the frequency domain of sampling in the time domain: (a) Spectrum of the continuoustime signal, (b) Spectrum of the sampling function, (c) Spectrum of the sampled signal with Ωs > 2ΩB , (d) Spectrum of the sampled signal with Ωs < 2ΩB . We can perform the processing for this problem using either continuoustime or digital techniques. Using a continuoustime approach, we would ﬁlter the signal sc (t) using an continuoustime low pass ﬁlter. Alternatively, we could adopt a digital approach, as shown in Figure 2.3. We would ﬁrst low pass ﬁlter yc (t) to prevent aliasing, and then convert the continuoustime signal into a discretetime signal using an A/D converter. Any processing is then performed
2.3. DISCRETETIME PROCESSING OF CONTINUOUSTIME SIGNALS. Analog (or Continuous Time) Signal Analog
E Low pass E
17 Processed
Digital A/D
E Signal E
D/A
E
ContinuousTime Signal
Filter
Processing
Figure 2.3: Discretetime processing of continuoustime signals. digitally on the discretetime signal. The processed signal is then converted back to continuous form using an D/A converter. Sc (jΩ) T 1 Nc (jΩ) T 1
E
E
ΩB
+ΩB
Ω
ΩB1
ΩB
ΩB
ΩB1 Ω
Yc (jΩ) T 1
E
ΩB1
ΩB
ΩB
ΩB1 Ω
Figure 2.4: Frequency Spectra.
T
H(jΩ)
Processed Signal
E E
yc (t)
E
ΩB
0
ΩB Ω
Figure 2.5: ContinuousTime Filter. The advantages of processing signals digitally as compared with an analog approach are:
18
CHAPTER 2. DIGITAL PROCESSING OF CONTINUOUSTIME SIGNALS 1. Flexibility, if we want to change the continuoustime ﬁlter because of change in signal and noise characteristics, we would have to change the hardware components. Using the digital approach, we only need to modify the software. 2. Better control of accuracy requirements. Tolerances in continuoustime circuit components make it extremely diﬃcult for the system designer to control the accuracy of the system. 3. The signals can be stored without deterioration or loss of signal quality. 4. Lower cost of the digital implementation.
Chapter 3
Filter Design
3.1 Introduction
Filters are an important class of linear timeinvariant systems. Their objective is to modify the input signal. In practice ﬁlters are frequency selective, meaning they pass certain frequency components of the input signal and totally reject others. The objective of this chapter and the following ones is to introduce the concepts of frequency selective ﬁlters and the most popular ﬁlter design techniques. The design of ﬁlters (continuoustime or digital) involves the following three steps. 1. Filter speciﬁcation This step consists of setting the required characteristics of the ﬁlter. For example, to restore a signal which has been degraded by background noise, the ﬁlter characteristics will depend on the spectral characteristics of both the signal and background noise. Typically (but not always) speciﬁcations are given in the frequency domain. 2. Filter design Determine the unit sample response, h(n) of the ﬁlter or its system function, H(z) and H(ejω ). 3. Filter implementation Implement a discrete time system with the given h(n) or H(z) or one which approaches it. These three steps are closely related. For instance, it does not make much sense to specify a ﬁlter that cannot be designed or implemented. So, we shall consider a particular class of digital ﬁlters which possess the following practical properties: 1. h(n) is real. 2. h(n) is causal, i.e., h(n) = 0 for n < 0. 3. h(n) is stable, i.e,
∞ n=0
h(n) < ∞ .
19
20
CHAPTER 3. FILTER DESIGN
In a practical setting, the desired ﬁlter is often implemented digitally on a chip and is used to ﬁlter a signal derived from a continuoustime signal by means of periodic sampling followed by analogtodigital conversion. For this reason, we refer to digital ﬁlters as discretetime ﬁlters even though the underlying design techniques relate to the discretetime nature of the signals and systems.
E E DiscreteTime System E E
xc (t)
A/D
T
x(n)
y(n)
D/A
T
yc (t)
T
T
Figure 3.1: A basic system for discretetime ﬁltering of a continuoustime signal. A basic system for discretetime ﬁltering of continuoustime signals is given in Figure 3.1. Assuming the input, xc (t), is bandlimited and the sampling frequency is high enough to avoid aliasing, the overall system behaves as a linear timeinvariant system with frequency response Hef f (jΩ) = H(ejΩT ), Ω < π/T 0, Ω > π/T
In such cases, it is straightforward to convert from speciﬁcations on the eﬀective continuoustime ﬁlter to speciﬁcations on the discretetime ﬁlter through the relation ω = ΩT . That is H(ejω ) is speciﬁed over one period by the equation H(ejω ) = Hef f j ω , T ω < π .
This concept will be treated in detail in Chapter 5.
3.2
Digital Filters
Throughout the text we will interchangeably use system and ﬁlter. Linear timeinvariant discretetime systems can be characterised by linear constant coeﬃcient diﬀerence equations with initial rest condition of the type
M k=0
ak y(n − k) =
N −1 r=0
br x(n − r),
(3.1)
where x(n) and y(n), n = 0, ±1, . . . are the input and output of the system and max(N − 1, M ) is the order. The unit sample response of the system may be of either ﬁnite duration or inﬁnite duration. It is useful to distinguish between these two classes as the properties of the digital processing techniques for each diﬀer. If the unit sample response is of ﬁnite duration, we have a ﬁnite impulse response (FIR) ﬁlter, and if the unit sample response is of inﬁnite duration, we have an inﬁnite impulse response (IIR) ﬁlter. If M = 0 so that N −1 1 y(n) = br x(n − r) (3.2) a0
r=0
3.3. LINEAR PHASE FILTERS then the system corresponds to an FIR ﬁlter. In fact comparison with the convolution sum y(n) =
∞ k=−∞
21
h(k)x(n − k)
shows that Equation (3.2) is identical to the convolution sum, hence it follows that h(n) =
bn a0 ,
0,
n = 0, . . . , N − 1 otherwise.
An FIR ﬁlter can be always described by a diﬀerence equation of the form (3.1) with M = 0. M must be greater than zero for an IIR ﬁlter.
3.3
Linear Phase Filters
where HM (ejω ) is a real function of ω and α is a constant. The signiﬁcance of a linear phase ﬁlter lies in the result that if a discretetime signal is ﬁltered by a linearshiftinvariant (LSI) system whose transfer function has magnitude HM (ejω ) = 1 and linear phase (see Figure 3.2), then the spectrum of the output is s(n) HM (ejω ) = 1, α
 y(n)
A digital ﬁlter with unit sample response h(n) is said to have linear phase if H(ejω ) can be expressed as H(ejω ) = HM (ejω )e−jωα ω < π

Figure 3.2: Linear and timeinvariant ﬁltering of s(n). S(ejω )e−jωα ↔ s(n − α) = y(n) . Simply put, a linear phase ﬁlter preserves the shape of the signal. Should the ﬁlter have the property that H(ejω ) = 1, but does not have linear phase, then the shape of the signal will be distorted. We generalise the notion of linear phase to H(ejω ) = HM (ejω )e−jωα ejβ = HM (ejω )e−jφ(ω) , φ(ω) = −β + ωα where HM (ejω ) is a real function of ω, and α and β are constants. The case where β = 0 is referred to as linear phase as opposed to generalised linear phase where we have H(ejω ) = HM (ejω ) cos(β − ωα) + jHM (ejω ) sin(β − ωα) = =
∞
h(n)e−jωn h(n) cos(ωn) − j
∞ n=−∞
n=−∞ ∞ n=−∞
h(n) sin(ωn),
h(n) real
22 leading to
∞
CHAPTER 3. FILTER DESIGN
h(n) sin(ωn) h(n) cos(ωn)
sin(β − ωα) n=−∞ tan(β − ωα) = =− ∞ cos(β − ωα)
n=−∞
and
∞ n=−∞
h(n) sin(ω(n − α) + β) = 0.
(3.3)
To have φ(ω) = −β + ωα, (3.3) must hold for all ω, and is a necessary, but not suﬃcient, condition on h(n), α and β. For example, one can show that β = 0 or π , 2α = N − 1 , h(2α − n) = h(n) satisfy (3.3). Alternatively β= π 3π , or , 2α = N − 1 , h(2α − n) = −h(n) 2 2 (3.5) (3.4)
also satisﬁes (3.3). To have a causal system we require
∞ n=0
h(n) sin(ω(n − α) + β) = 0
to hold for all ω. Causality and the previous conditions (3.4) and (3.5) imply that h(n) = 0, n < 0 and n > N −1, i.e., causal FIR ﬁlters have generalised linear phase if they have impulse response length N and satisfy either h(2α − n) = h(n) or h(2α − n) = −h(n). Speciﬁcally if, h(n) = then H(ejω ) = He (ejω )e−jω(N −1)/2 , where He (ejω ) is a real, even, periodic function of ω. Similarly if h(n) = then H(ejω ) = jHO (ejω )e−jω(N −1)/2 = HO (ejω )e−jω(N −1)/2+jπ/2 , where HO (ejω ) is a real, odd, periodic function of ω. The above two equations for h(n) are suﬃcient, but not necessary conditions, which guarantee a causal system with generalised linear phase. −h(N − 1 − n), 0 ≤ n ≤ N − 1 , 0, otherwise , h(N − 1 − n), 0 ≤ n ≤ N − 1, 0, otherwise ,
3.3.1
Generalised Linear Phase for FIR Filters
We shall restrict our discussion to linear phase FIR ﬁlters. We classify FIR ﬁlters into four types depending on whether h(n) is symmetric (h(n) = h(N − 1 − n)) or antisymmetric (h(n) = −h(N − 1 − n)) about the point of symmetry or antisymmetry, and whether the ﬁlter length is odd or even. The four types of FIR ﬁlters and their associated properties are discussed below and summarised in Table 3.1. Figure 3.3 shows examples of the four types of FIR ﬁlters.
3.3. LINEAR PHASE FILTERS Type I FIR linear phase ﬁlter. A type I ﬁlter has a symmetric impulse response 0≤n≤N −1 N −1
2
23
h(n) = h(N − 1 − n),
with N an odd integer. The corresponding frequency response is H(ejω ) =
N −1 n=0
h(n)e−jωn = e−jω
N −1 2
−1 −1 −1 where a(0) = h( N 2 ) and a(k) = 2h( N 2 − k), k = 1, 2, . . . , N 2 .
k=0
a(k) cos(ωk) ,
where
Type II FIR linear phase ﬁlter. A type II ﬁlter has a symmetric impulse response with N an even integer. The corresponding frequency response is N/2 N −1 H(ejω ) = e−jω 2 b(k) cos[ω(k − 1/2)] ,
k=1
b(k) = 2h(N/2 − k), Type III FIR linear phase ﬁlter.
k = 1, 2, . . . , N/2.
A type III ﬁlter has an antisymmetric impulse response 0≤n≤N −1
h(n) = −h(N − 1 − n),
where
with N an odd integer. The corresponding frequency response is (N −1)/2 N −1 H(ejω ) = je−jω 2 c(k) sin(ωk) ,
k=1
c(k) = 2h((N − 1)/2 − k),
k = 1, 2, . . . , (N − 1)/2.
Type IV FIR linear phase ﬁlter. A type IV ﬁlter has an antisymmetric impulse response with N an even integer. The corresponding frequency response is N/2 N −1 d(k) sin[ω(k − 1/2)] , H(ejω ) = je−jω 2
k=1
where
d(k) = 2h(N/2 − k),
k = 1, 2, . . . , N/2.
24
CHAPTER 3. FILTER DESIGN Table 3.1: The four classes of FIR ﬁlters and their associated properties
Type
Symmetry of h(n) h(n) = h(N − 1 − n) symmetric
N
α
β
N −1 2
Form of HM (ejω ) a(n) · cos ωn
Constraints on HM (ejω ) · ejβ real
I
odd
N −1 2
0
n=0
π
N 2
II
h(n) = h(N − 1 − n) symmetric
even
N −1 2
0
n=1
1 b(n) · cos ω(n − ) 2
real HM (ejπ ) = 0
π
N −1 2
III
h(n) = −h(N − 1 − n) antisymmetric
odd
N −1 2
π/2
n=1
c(n) · sin ωn
Pure Imaginary HM (ej0 ) = HM (ejπ ) = 0 Pure Imaginary HM (ej0 ) = 0
3π/2
N 2
IV
h(n) = −h(N − 1 − n) antisymmetric
even
N −1 2
π/2
n=1
1 d(n) · sin ω(n − ) 2
3π/2
Remarks 1. A ﬁlter will have no phase distortion if H(ejω ) is real and even since h(n) will be real and even. 2. Noncausal FIR ﬁlters can be made causal by an appropriate time shift so that the ﬁrst nonzero value occurs at n = 0. The point of symmetry can therefore occur at two discrete−1 time points N 2 depending on whether N was odd or even. Example 3.3.1 Example of a type I FIR ﬁlter. Given a ﬁlter with unit sample response h(n) = we can evaluate the frequency response to be
4
1, 0 ≤ n ≤ 4 , 0, otherwise
H(ejω ) =
n=0
e−jωn =
1 − e−j5ω sin(5 · ω/2) = e−j2ω . 1 − e−jω sin(ω/2)
The magnitude and the phase of the frequency response of the ﬁlter are depicted in Figure 3.4 Example 3.3.2 Example of a type II FIR ﬁlter. Consider an extension, by one sample of the impulse response of the ﬁlter in example 3.3.1. Then the frequency response of the system is given by
5
H(e ) =
n=0
jω
e−jωn = e−jω 2
5
sin(3 · ω) , sin(ω/2)
whose magnitude and phase are depicted in Figure 3.5
3.3. LINEAR PHASE FILTERS h(n) 3 T 2 1
'
25
TYPE I
E
n h(n) 3 T 2 1
'
TYPE II
E
h(n)
T
n 2 TYPE III
E
1 n
1 h(n)
T 3
2
2 1
TYPE IV
E
1 2 3
n
Figure 3.3: Examples of the four types of linear phase FIR ﬁlters Example 3.3.3 Example of a type III FIR ﬁlter. Consider a ﬁlter with unit impulse response h(n) = δ(n) − δ(n − 2) , where δ(n) is Kronecker’s delta deﬁned as δ(n) = 1 0 , , n=0 . n=0
We can then evaluate the frequency response as H(ejω ) = 1 − e−j2ω = j[2 sin(ω)]e−jω = 2e−j (ω− 2 ) sin(ω). The magnitude and the phase of H(ejω ) are depicted in Figure 3.6.
π
26
5 4.5 4 2 Phase (Radians) 3.5 3 2.5 2 1.5 −2 1 0.5 0 0 −3 4
CHAPTER 3. FILTER DESIGN
3
1
Magnitude
0
−1
1
2
3 w rad/s
4
5
6
−4 0
1
2
3 w rad/s
4
5
6
Figure 3.4: Frequency response of type I ﬁlter of Example 9.3.1. Magnitude (left). Phase (right).
6 4
3 5 2 4 Magnitude Phase (Radians) 1 2 3 w rad/s 4 5 6
1
3
0
−1 2 −2 1 −3
0 0
−4 0
1
2
3 w rad/s
4
5
6
Figure 3.5: Frequency response of type II ﬁlter of Example 9.3.2. Magnitude (left). Phase (right).
3
3
2
2.5
Phase (Radians)
2 Magnitude
1
1.5
0
1
−1
0.5
−2
0 0
1
2
3 w rad/s
4
5
6
−3 0
1
2
3 w rad/s
4
5
6
Figure 3.6: Frequency response of type III ﬁlter of Example 9.3.3. Magnitude (left). Phase (right).
3.4. FILTER SPECIFICATION
27
Example 3.3.4 Example of a type IV FIR ﬁlter. Consider a ﬁlter with unit impulse response h(n) = δ(n) − δ(n − 1). We can then evaluate the frequency response as H(ejω ) = 1 − e−jω = j[2 sin(ω/2)]e−jω/2 = 2e−j ( 2 − 2 ) sin The magnitude and the phase of H(ejω ) are depicted in Figure 3.7.
3
ω
π
ω . 2
3
2
2.5
Phase (Radians)
2 Magnitude
1
1.5
0
1
−1
0.5
−2
0 0
1
2
3 w rad/s
4
5
6
−3 0
1
2
3 w rad/s
4
5
6
Figure 3.7: Frequency response of type IV ﬁlter of Example 9.3.4. Magnitude (left). Phase (right).
3.4
Filter Speciﬁcation
Like continuoustime ﬁlters, digital ﬁlters are speciﬁed in the frequency domain. Since H(ejω ) = H(ej(ω+2π) ) for all ω, H(ejω ) is completely speciﬁed for 0 ≤ ω < 2π. In addition, since h(n) is real, we have H(ejω ) = H(e−jω )∗ . This implies that specifying H(ejω ) for 0 ≤ ω < π completely speciﬁes H(ejω ) for all ω. To uniquely specify the complex frequency response H(ejω ), we need to specify both the magnitude and the phase of H(ejω ). However, since we are dealing with linear phase FIR ﬁlters, we need only to specify the magnitude. The ideal structure (magnitude of the frequency response) of a ﬁlter is given in Figure 3.8 for the lowpass, highpass, bandpass and stopband. Example 3.4.1 Speciﬁcation of a lowpass ﬁlter. The ideal lowpass ﬁlter is composed of a passband and stopband region. In practice we also have a transition band, as a sharp transition cannot be achieved. Ideally, H(ejω ) is unity in the passband and zero in the stopband. In practice, we supply error tolerances that require 1 − α1 ≤ H(ejω ) ≤ 1 + α1 for 0 ≤ ω ≤ ωp , and where α1 and α2 are called the passband tolerance and stopband tolerance, respectively. A lowpass ﬁlter is then completely speciﬁed by 4 parameters α1 , α2 , ωp and ωs . Figures 3.9 and 3.10 illustrate the concept of tolerance bands in ﬁlter design. H(ejω ) < α2 for ωs ≤ ω ≤ π ,
28
T H(ejω )
CHAPTER 3. FILTER DESIGN
1
E
π
T H(ejω )
ω
1
E
π
jω T H(e )
ω
1
E
ω1
jω T H(e )
ω2
π
ω
1
E
ω1
ω2
π
ω
Figure 3.8: Frequency response of ideal lowpass, highpass, bandpass, and bandstop ﬁlters. A similar arrangement holds for high pass, bandstop and band pass ﬁlters (see Figure 3.11).
3.5
Properties of IIR Filters
As discussed an IIR ﬁlter has a unit sample response of inﬁnite duration. In general, an IIR ﬁlter with arbitrary unit sample response h(n) cannot be realised, since computing each output sample requires a large amount of arithmetic operations. Therefore we restrict ourselves to the class of ﬁlters which: 1. have a unit sample response h(n) that is real, causal, and stable.
3.5. PROPERTIES OF IIR FILTERS
29
Figure 3.9: Speciﬁcations of a lowpass ﬁlter. H(ejω ) T Transition Band
c E
ωp
ωs
ω
Figure 3.10: Lowpass ﬁlter with transition band. 2. possess a rational ztransform with a0 = 1, i.e.,
N −1
bk z −k . ak z −k
H(z) =
k=0 M
1−
r=1
A causal h(n) with a rational ztransform can be realised by a linear constant coeﬃcient diﬀerence equation (LCCDE) with initial rest conditions (IRC) where the output is computed recursively. Thus IIR ﬁlters can be implemented eﬃciently. Example 3.5.1 Consider a causal impulse response described by h(n) = with H(z) given by H(z) = 1 2
n
· u(n) , 1
1 1 − 2 z −1
.
30
jω T H(e )
CHAPTER 3. FILTER DESIGN (a)
1 + α1 ............................... .............. . .............................. 1 − α1 ........... ... α2
....................................................... ................................................................................................................. ωp ωs π ω
E
1 + α1 1 − α1
jω T H(e )
(b) ....................................................... ................................................................................................................. ....................................................... ................................................................................................................ .
α2 ............................... .............. ωs 1 + α1 1 − α1 α2 ............................... .............. ωs1 ωp1 1 + α1 ............................... .............. .............. .............................. 1 − α1 . α2
jω T H(e )
E
ωp .............. ............................... ............................... ..............
π
ω (c)
T H(ejω )
....................................... ................................................................................. ωp2 ωs2 π ω
E
(d) ....................................... ................................................................................. ................................................................................ ........................................ .............. ...............................
E
ωp1 ωs1
ωs2 ωp2
π
ω
Figure 3.11: Tolerance scheme for digital ﬁlter design. (a) Lowpass. (b) High pass. (c) Bandpass. (d) Stopband. This can be realised by
1 y(n) = y(n − 1) + x(n) 2 assuming IRC. Note that although h(n) is of inﬁnite extent, the number of arithmetic operations required per output is only one multiplication and one addition using a LCCDE formulation.
3.5.1
Stability
An FIR is stable if h(n) is ﬁnite for all n, therefore stability is not a problem in practical design or implementation. With an IIR ﬁlter, all the poles of H(z) must lie inside the unit circle to ensure ﬁlter stability.
3.5. PROPERTIES OF IIR FILTERS
31
3.5.2
Linear phase
It is simple to achieve linear phase with FIR ﬁlters, to the point that we require all FIR ﬁlters to possess this property. It is much more diﬃcult to control the phase of an IIR ﬁlter. As a result, we specify only the magnitude response for the IIR ﬁlter and accept the resulting phase characteristics.
32
CHAPTER 3. FILTER DESIGN
Chapter 4
Finite Impulse Response Filter Design
4.1 Introduction
The ﬁlter design problem is that of determining h(n), H(ejω ) or H(z) to meet the design speciﬁcation. The ﬁrst step is to choose the type of ﬁlter from the previously discussed four types of FIR ﬁlters. Some types of ﬁlters may not be suitable for a given design speciﬁcation. For example, from Table 3.1 a Type II ﬁlter is not suitable for a highpass ﬁlter design as it will always have H(ejω ) = 0 at ω = π since cos π(n − 1/2) = 0. Table 4.1 shows what type of ﬁlters are not suitable for a given class of ﬁlter. Table 4.1: Suitability of a given class of ﬁlter for the four FIR types Type I II III IV Lowpass Highpass Not suitable Not suitable Bandpass Band stop Not suitable Not suitable Not suitable
Not suitable Not suitable
The two standard approaches for FIR ﬁlter design are: 1. The window method. 2. The optimal ﬁlter design method. We now discuss each of these approaches in turn.
4.2
Design of FIR Filters by Windowing
Many idealised systems are deﬁned by piecewiseconstant or piecewisefunctional frequency responses with discontinuities at the boundaries between bands. As a result, they have non causal impulse responses that are inﬁnitely long. The most straightforward approach for obtaining a causal FIR approximation to such systems is to truncate the ideal response. 33
34
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN Let Hd (ejω ) be the desired frequency response of the ﬁlter. Then Hd (ejω ) =
∞ n=−∞
hd (n)e−jωn ,
where hd (n) is the corresponding unit sample response sequence given by hd (n) = 1 2π
π −π
Hd (ejω )ejωn dω .
The simplest way to obtain a causal FIR ﬁlter from hd (n) is to deﬁne a new system with unit sample response hd (n), 0 ≤ n ≤ N − 1 , h(n) = 0, elsewhere , which can be rewritten as h(n) = hd (n) · w(n) , and therefore H(ejω ) = H(ejω ) 1 2π
π −π
Hd (ejθ ) · W (ej(ω−θ) )dθ .
Note that represents a smeared version of Hd (ejω ). If hd (n) is symmetric (or antisymmetric) with respect to a certain point of symmetry (or antisymmetry) and w(n) is also symmetric with respect to the same point of symmetry, then hd (n) · w(n) is the unit sample response of a linear phase ﬁlter. Example 4.2.1 Lowpass ﬁlter design with speciﬁcations α1 ,α2 , ωp , ωs . We choose a Type I ﬁlter for the design. Since the desired frequency response Hd (ejω ) should be consistent with the type of ﬁlter used, from Table 3.1, Hd (ejω ) is given by Hd (ejω ) = where α = Hd (ejω ) is
N −1 2 ,
e−jωα , 0 ≤ ω ≤ ωc , 0, ωc ≤ ω ≤ π
ωc =
ωp +ωs 2 ,
and N is the ﬁlter length (odd). The inverse Fourier transform of hd (n) = sin ωc (n − α) π(n − α) , ,
If we apply a rectangular window to hd (n), the resulting ﬁlter impulse response h(n) is h(n) =
sin ωc (n−α) π(n−α)
0
for 0 ≤ n ≤ N − 1 , otherwise .
The sequence h(n) is shown in Figure 4.1 for N = 7 and ωc = π/2. Note that this ﬁlter may or may not meet the design speciﬁcation.
4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH
T h(n)
35
0.5 . 0.318 0.318
E
n −0.106 0.106
Figure 4.1: The sequence h(n).
4.3
Importance of the Window Shape and Length
h(n) = hd (n) · w(n) ,
In the timedomain we have which corresponds in the frequency domain to H(ejω ) = Hd (ejω ) ⋆ W (ejω ) π 1 Hd (ejθ )W (ej(ω−θ) )dθ = 2π −π
(4.1)
If we do not truncate at all so that w(n) = 1 for all n, W (ejω ) is a periodic impulse train with period 2π, and therefore H(ejω ) = Hd (ejω ). This interpretation suggests that if w(n) is chosen so that W (ejω ) is concentrated in a narrow band of frequencies around ω = 0, then H(ejω ) approximates Hd (ejω ) very closely except where Hd (ejω ) changes very abruptly. Thus, the choice of window is governed by the desire to have w(n) as short as possible in duration so as to minimise computation in the implementation of the ﬁlter while having W (ejω ) approximate an impulse; that is, we want W (ejω ) to be highly concentrated in frequency so that the convolution (4.1) faithfully reproduces the desired frequency response. These are conﬂicting requirements. Take, for example, the case of the rectangular window, where W (ejω ) =
N −1 n=0
e−jωn =
1 − e−jωN sin(ωN/2) = e−jω(N −1)/2 . −jω 1−e sin(ω/2)
The magnitude of the function sin(ωN/2)/ sin(ω/2) is shown in Figure 4.2 for the case N = 8. Note that W (ejω ) for the rectangular window has a generalised linear phase. As N increases, the width of the “mainlobe” decreases. The mainlobe is usually deﬁned as the region between the ﬁrst zero crossing on either side of the origin. For the rectangular window, the mainlobe width is ∆ω = 4π/N . However, for the rectangular window, the side lobes are large and, in fact, as N increases, the peak amplitudes of the mainlobe and the sidelobes grow in a manner such that the area under each lobe is a constant even though the width of each lobe decreases with N . Consequently, as W (ej(ω−θ) ) “slides by” a discontinuity of Hd (ejω ) with increasing ω, the integral of W (ej(ω−θ) )Hd (ejω ) will oscillate as each sidelobe of W (ej(ω−θ) ) moves past the discontinuity.
36
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
10 9 8 7 6 5 4 3 2 1 0 Mainlobe
Magnitude
Peak sidelobe
−2
0
2 w rad/s
4
6
8
Figure 4.2: Magnitude of the Fourier transform of a rectangular window (N = 8). We can modify this eﬀect by tapering the window smoothly to zero at each end so that the height of the sidelobes are diminished; however this is achieved at the expense of a wider mainlobe and thus a wider transition at the discontinuity. Five common windows are used in practice: the rectangular, Bartlett, Hanning, Hamming and Blackman window. The formula for these windows are given below. Rectangular w(n) = Bartlett 1, 0 ≤ n ≤ N − 1 , 0, elsewhere.
Hanning
2n N −1 , 0≤n≤ , N −1 2 w(n) = 2 − 2n , N − 1 ≤ n ≤ N − 1 . N −1 2 w(n) = 1 2πn 1 − cos( ) , 0 ≤ n ≤ N − 1. 2 N −1 2πn N −1 , 0 ≤ n ≤ N − 1.
Hamming w(n) = 0.54 − 0.46 cos Blackman w(n) = 0.42 − 0.5 cos 2πn N −1 + 0.08 cos 4πn N −1 , 0 ≤ n ≤ N − 1.
Figure 4.3 compares these window functions for N = 200. A good choice of the window corresponds to choosing a window with small mainlobe width and small sidelobe amplitude. Clearly, the shape of a window aﬀects both mainlobe and sidelobe behavior. Figures 4.4, 4.5, 4.6, 4.7 and 4.8 show these windows in the frequency domain for N = 51, using a 256point FFT. For comparison, Figures 4.9 and 4.10 show W (ejω ) of the Hamming
4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH
37
1.2
1 Rectangular 0.8 Bartlett w(n) 0.6 Hamming
0.4 Blackman 0.2 Hanning 0 0
50
100 n
150
200
Figure 4.3: Commonly used windows for FIR ﬁlter design N = 200. window for N = 101 and 201, respectively. This shows that mainlobe behavior is aﬀected primarily by window length. The eﬀect of the shape and size of the window can be summarised as follows: 1. Window shape controls sidelobe as well as mainlobe behavior. 2. Window size controls mainlobe behavior only. Since sidelobe behavior and therefore passband and stopband tolerance is aﬀected only by the shape of the window, the passband and stopband requirements dictate which window is chosen. The window size is then determined from the transition width requirements. Table 4.2 (column 1 and 2) contains information useful in choosing the shape and length of the window. Table 4.2: Properties of diﬀerent windows. Peak Amplitude of Sidelobe (dB) (relative) 13 25 31 41 57 Approximate Width (rad) of mainlobe ∆ωm 4π/N 8π/(N − 1) 8π/(N − 1) 8π/(N − 1) 12π/(N − 1) Peak Approximation Error (dB) 20 log10 (min(α1 , α2 )) 21 25 44 53 74
Window Type Rectangular Bartlett Hanning Hamming Blackman
Example 4.3.1 Filter design problem.
38
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
0
10
20 20 logW
30
40
50
60 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.4: Fourier transform (magnitude) of rectangular window (N = 51).
0 20 40 60 20 logW 80 100 120 140 160 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.5: Fourier transform (magnitude) of Bartlett (triangular) window (N = 51).
0
20
40
20 logW
60
80
100
120
140 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.6: Fourier transform (magnitude) of Hanning window (N = 51).
4.3. IMPORTANCE OF THE WINDOW SHAPE AND LENGTH
39
0 10 20 30 20 logW 40 50 60 70 80 90 100 0 0.5 1 1.5 2 w rad/s 2.5 3 3.5
Figure 4.7: Fourier transform (magnitude) of Hamming window (N = 51)
0
20
40
20 logW
60
80
100
120
140 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.8: Fourier transform (magnitude) of Blackman window (N = 51).
0
20
40 20 logW
60
80
100
120 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.9: Fourier transform of a Hamming window with window length of N = 101.
40
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
0
20
40 20 logW
60
80
100
120 0
0.5
1
1.5 2 w rad/s
2.5
3
3.5
Figure 4.10: Fourier transform of a Hamming window with window length of N = 201. We have to design a lowpass ﬁlter with a required stopband attenuation of −50dB. Which ﬁlter is most appropriate? The third column of Table 4.2 contains information on the best stopband (or passband) attenuation we can expect from a given window shape. Therefore, we can see that the rectangular, Bartlett and Hanning windows will not meet the speciﬁcations. The Hamming window may just meet the requirements, but the Blackman window will clearly be the most appropriate. From the second column of Table 4.2 the length of the window required can be derived. The distance between the peak ripples on either side of discontinuity is approximately equal to the mainlobe width ∆ωm . The transition width ∆ω = ωs − ωp is therefore somewhat less than the mainlobe width. For example, for the Blackman window, N can be determined from the relationship 12π/(N − 1) ≃ ∆ω. Therefore N ≈ 12π + 1 points. ∆ω
4.4
Kaiser Window Filter Design
We can quantify the tradeoﬀ between mainlobe width and sidelobe area by seeking the window function that is maximally concentrated around ω = 0 in the frequency domain. Kaiser proposed the window function I0 [β(1 − [(n − α)/α]2 )1/2 ] , 0 ≤ n ≤ N − 1, w(n) = I0 (β) 0, elsewhere ,
−1 where α = N 2 , and I0 (·) is the zerothorder modiﬁed Bessel function of the ﬁrst kind. In contrast to the previously discussed windows, this one has two parameters; the length N , and a shape parameter β. Thus, by varying N and β we can trade sidelobe amplitude for mainlobe width. Figure 4.11 shows the tradeoﬀ that can be achieved. If the window is tapered more, the sidelobes of the Fourier transform become smaller, but the mainlobe becomes wider. Alternatively, we can hold β constant and increase N , causing the mainlobe to decrease in width without aﬀecting the amplitude of the sidelobes.
4.4. KAISER WINDOW FILTER DESIGN
1.2
41
1
0.8 Amplitude
0.6
0.4
0.2
0 0
2
4
6
8
10 Samples
12
14
16
18
20
(a)
0 −10 −20 −30 −40 −50 −60 −70 −80 −90 −100 0
20 log W
0.5
1
1.5 w rad/s
2
2.5
3
(b)
0 −10 −20 −30 −40 −50 −60 −70 −80 −90 −100 0
20 log W
0.5
1
1.5 w rad/s
2
2.5
3
(c) Figure 4.11: (a) Kaiser windows for β = 0 (solid), 3 (semidashed), and 6 (dashed) and N = 21. (b) Fourier transforms corresponding to windows in (a). (c) Fourier transforms of Kaiser windows with β = 6 and N = 11 (solid), 21 (semidashed), and 41 (dashed).
42
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
Kaiser obtained formula that permitted the ﬁlter designer to estimate the values of N and β which were needed to meet a given ﬁlter speciﬁcation. Consider a frequency selective ﬁlter with the speciﬁcations shown in Figure 3.11(a). Kaiser found that over a usefully wide range of conditions α1 is determined by the choice of β. Given α1 , the passband cutoﬀ frequency ωp of the lowpass ﬁlter is deﬁned to be the highest frequency such that H(ejω ) ≥ 1 − α1 . The stopband cutoﬀ frequency ωs is deﬁned to be the lowest frequency such that H(ejω ) ≤ α2 . The transition region width is therefore ∆ω = ωs − ωp for the lowpass ﬁlter approximation. With A being the negative peak approximation error (pae) A = −20 log10 min (α1 , α2 ) , Kaiser determined empirically that to achieve a speciﬁed value 0.1102(A − 8.7), 0.5842(A − 21)0.4 + 0.07886(A − 21), β= 0, To achieve prescribed values of A and ∆ω, N must satisfy N= A−8 +1 . 2.285∆ω of A, we required that A > 50, 21 ≤ A ≤ 50, A < 21.
With these formula no iterations or trial is required. Example 4.4.1 Given the speciﬁcations of a lowpass ﬁlter ωp = 0.4π, ωs = 0.6π, α1 = α2 = 0.001 and ∆ω = 0.2π, ﬁnd the unit sample response using Kaiser’s method. We ﬁnd A = −20 log10 α1 = 60 and β = 5.635, N = 38 . Then, h(n) =
sin ωc (n−α) π(n−α)
0,
·
I0 [β(1−[(n−α)/α]2 )1/2 ] , I0 (β)
0 ≤ n ≤ N − 1, elsewhere
−1 where α = N 2 = 37 = 18.5 and ωc = (ωs + ωp ) /2. Since N is an even integer, the resulting 2 linear phase system would be of type II. Figure 4.12 shows h(n) and H(ejω ).
Example 4.4.2 Design of a highpass ﬁlter. In the following, we consider a highpass ﬁlter with ωs = 0.35π ωp = 0.5π α1 = 0.021 Again using the formulae proposed by Kaiser, we get β = 2.6 and N = 25. Figure 4.13 shows h(n) and H(ejω ). The resulting peak error is 0.0213 > α1 = α2 = 0.021 as speciﬁed. Thus we increase N by 26. In this case the ﬁlter would be of type II and is not appropriate for designing a highpass. Therefore we use N = 27, which exceeds the required speciﬁcations.
4.4. KAISER WINDOW FILTER DESIGN
0.6
43
0.5
0.4
0.3 Amplitude
0.2
0.1
0
−0.1
−0.2 0
5
10
15 20 25 Sample number (n)
30
35
40
(a)
20
0
−20
20 log H
−40
−60
−80
−100
−120 0
0.5
1
1.5 w rad/s
2
2.5
3
(b) Figure 4.12: Response functions. (a) Impulse response. (b) Log magnitude. The design of FIR ﬁlters by windowing is straightforward to apply and it is quite general even though it has a number of limitations. However, we often wish to design a ﬁlter that is the “best” that can be achieved for a given value of N . We describe two methods that attempt to improve upon the windowing method by introducing a new criterion of optimality.
44
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
0.7 0.6 0.5 0.4 Amplitude 0.3 0.2 0.1 0 −0.1 −0.2 −0.3 −0.4 0 5 10 15 Sample number (n) 20 25 30
(a)
20
0
−20 20 log H
−40
−60
−80
−100 0
0.5
1
1.5 w rad/s
2
2.5
3
(b) Figure 4.13: Kaiser window estimate. (a) Impulse response. (b) Log magnitude.
4.5. OPTIMAL FILTER DESIGN.
45
4.5
Optimal Filter Design.
As seen, the design of a ﬁlter with the window method is straightforward and easy to handle but it is not optimal in the sense of minimizing the maximum error. Through the Fourier approximation, any ﬁlter design which uses windowing attempts to minimize the integrated error ǫ2 = 1 2π
π −π
Hd (ejω ) − H(ejω ) dω.
2
In many applications we get a better result by minimizing the maximum error. Using the windowing method and Fourier approximation we always get the Gibbs phenomenon which implies that the maximum error will be at discontinuities of Hd (ejω ). Better results can be achieved if the approximation error is spaced equally over the whole frequency band. In the following section only type I linear phase FIR ﬁlters are taken into account but the implementation of other linear phase FIR ﬁlters is straightforward. Take a noncausal FIR ﬁlter with the impulse response h(n) = h(−n), the frequency response is
L
H(e ) =
n=−L
jω
h(n)e−jωn
L
= h(0) +
n=1 L
2h(n) cos(ωn)
=
n=0 L
a(n) cos(ωn)
n
=
n=0 L
a(n)
k=0
βkn cos(ω)k
=
n=0
a(n) cos(ω)n ˇ
which is real and even. The ﬁlter speciﬁcations for Hd (ejω ) are shown in Fig. 4.14 To design the ﬁlter the ParksMcClellan algorithm formulates the problem as one of polynomial approximation which can be solved by the Chebyshev approximation over disjoint sets of frequencies (F = {0 ≤ ω ≤ ωp , ωs ≤ ω ≤ π}). The approximation error function, E(ω), is deﬁned as E(ω)=W (ω) Hd (ejω ) − H(ejω ) . ˆ (4.2)
E(ω) measures the error between the optimum and attained frequency response, which is weighted by W (ω) where W (ω) =
α1 α2
: 0 ≤ ω ≤ ωp . 1 : ωs ≤ ω ≤ π
46
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
1 + α1 1 1 − α1
α2 −α2
ωp
ωs
Figure 4.14: Filter Speciﬁcation The optimal H(ejω ) minimizes the maximum weighted error which means we have to search for an H(ejω ) which fulﬁlls min max E(ω)
ω∈F
a(n) : 0≤n≤L ˇ
=
a(n) : 0≤n≤L ˇ
min
max W (ω) Hd (ejω ) − H(ejω )
ω∈F
.
Theorem 4.5.1 (Alternation Theorem) With a closed disjoint subset F on the real axis a necessary and suﬃcient condition for
L
H(e ) =
n=0
jω
a(n) cos(ω)n ˇ
to be the unique, best weighted Chebyshev approximation of Hd (ejω ) in F is that the error function E(ω) has at least L + 2 alternating frequencies in F . This means that there must exist at least L + 2 frequencies ωi in F such that ω0 < ω1 < · · · < ωL+1 with E(ωi ) = −E(ωi+1 ) and E(ωi ) = max E(ω)
ω∈F
i = 0, . . . , L + 1.
The maximum of the alternating frequencies can be found from the derivative of E(ω). If we calculate the derivative of E(ω) and take into account that Hd (ejω ) is constant in F , we get for the maximal frequencies dE(ω) dω d W (ω) Hd (ejω ) − H(ejω ) dω dH(ejω ) W (ω) = − dω
L
=
(4.3)
= W (ω) sin(ω)
n=1
nˇ(n) cos(ω)n−1 = 0. a
This means that there are at most L + 1 maxima belonging to the polynomial structure and to the frequencies at ω = 0 and ω = π. Two more maxima will occur at ω = ωp and ω = ωs
4.5. OPTIMAL FILTER DESIGN.
47
because of the discontinuity and the inability to approximate this. As it can be seen in Eq. (4.3), the error function E(ω) and the approximation function H(ejω ) possess maxima at the same frequencies ωi . Thus from Theorem 4.5.1 we can write the error E(ω) at the ωi , i = 0 . . . L + 1, as E(ωi ) = W (ωi ) Hd (ejωi ) − H(ejωi ) = (−1)i ρ and can express the ideal frequency response as Hd (e or 1 cos(ω0 ) · · · . . .. . . . . . 1 cos(ωL+1 ) · · · cos(Lω0 ) . . . cos(LωL+1 )
1 W (ω0 ) jωi
i = 0...L + 1
) = H(e
jωi
(−1)i ρ )+ = W (ωi )
L
a(n) cos(ωi n) +
n=0
(−1)i ρ W (ωi )
. . .
(−1)L+1 W (ωL+1 )
To solve these equations for the parameter an , we use an algorithm called ”Remez” which was developed by Rabiner et. al. in 1975. After an initial guess of L + 2 extremal frequencies of H(ejω ) we can compute ρ according to ρ= with
L+1
a0 Hd (ejω0 ) . . . . = . . . aL jωL+1 ) Hd (e ρ
γ0 Hd (ejω0 ) + . . . + γL+1 Hd (ejωL+1 )
γ0 W (ejω0 )
+ ... +
(−1)L+1 γL+1 W (ejωL+1 )
(4.4)
γk =
n=0, n=k
1 . cos(ωk ) − cos(ωn )
Since E(ω) has the same extremal frequencies as H(ejω ) we can write (−1)i ρ ˆ . H(ejωi ) = Hd (ejωi ) − W (ωi ) Using Lagrange interpolation it follows that H(ejω ) = with
L L βk ˆ jωi ) k=0 H(e cos(ω)−cos(ωk ) L βk k=0 cos(ω)−cos(ωk )
βk =
n=0, n=k
1 . cos(ωk ) − cos(ωn )
With H(ejω ) it is possible to calculate the error E(ω) according to Eq. (4.2). After calculating E(ω), L + 2 extremal frequencies belonging to the largest alternating peaks can be selected and the algorithm can start again by using Eq. (4.4). The Algorithm is summarised in Fig. 4.15.
48
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
By using the Remez algorithm we are able to ﬁnd the δ that deﬁnes the upper bound of E(ω), which is the optimum solution for the Chebyshev approximation. After the algorithm converges the ﬁlter coeﬃcients h(n) can be calculated by taking L + 1 frequencies from H(ejω ) with h(n) = 1 L+1
L n=0
H(eωn )ej L+1 .
2πkn
Property of Optimal Filters The error between the desired and attained ﬁlter frequency response achieve a minimum and maximum consecutively at a ﬁnite number of frequencies in the frequency range of interest (passband and stopband regions) called alternation frequencies. The minimum number of alternating +3 −1 frequencies for a Type I optimum ﬁlter is N 2 (with L = N 2 ). Because of this characteristic, optimal ﬁlters are called equiripple ﬁlters. Remarks 1. Designing an optimal ﬁlter requires much more computation than the windowing method. 2. The length of the ﬁlter designed by the window method is 20% to 50% longer than the length of the optimal ﬁlter. 3. Optimal ﬁlter design is the most widely used FIR ﬁlter design method. In c MATLAB the function that performs optimum ﬁlter design is called ”ﬁrpm”.
4.6
Implementation of FIR Filters
Here we discuss the implementation of a discrete time system with unit sample response h(n), where h(n) may be a ﬁlter. The inputoutput relationship is given by the convolution sum y(n) =
∞ k=−∞
h(k)x(n − k) =
N −1 k=0
h(k)x(n − k) ,
i.e., the output y(n) represents a sum of weighted and delayed versions of the input x(n). The signal ﬂow graph for a single delay element is shown in Figure 4.16. The convolution sum can be represented as in Figure 4.17. The computational eﬃciency of the algorithm in Figure 4.17 may be improved by exploiting the symmetry (or antisymmetry) of h(n). For type I Filters, we have h(n) = h(N − 1 − n) and
4.6. IMPLEMENTATION OF FIR FILTERS N is odd. This leads to y(n) =
N −1 k=0
49
h(k) · x(n − k) h(k) · x(n − k) + h h(k) · x(n − k) N −1 2 x n− N −1 2
N −1 −1 2
=
k=0 N −1
+
−1 k= N 2 +1 N −1 −1 2
y(n) =
k=0
h(k) · [x(n − k) + x(n − (N − 1 − k))] x n− N −1 2
+ h
N −1 2
This realisation is shown by the ﬂow graph in Figure 4.18. This method reduces the number of multiplications by 50% without aﬀecting the number of additions or storage elements required.
50
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
Input ﬁlter parameters
c Initial guess of (L + 2) extremal frequencies E c Calculate the optimum ρ on extremal set c Interpolate through (L + 1) points to obtain H(ejω ) c Calculate error E(ω) and ﬁnd local maxima where E(ω) ≥ ρ c
More than (L + 2) extrema?
Yes
E
Retain (L + 2) largest extrema
No Changed
' c
Check whether the extremal points changed Unchanged c Best approximation
Figure 4.15: Flowchart of ParksMcClellan algorithm.
4.6. IMPLEMENTATION OF FIR FILTERS
51
x(n)
E z −1
E y(n)
Figure 4.16: Signal ﬂow graph for a delay element.
x(n) E
E c h(0)
z −1
E
E c h(1)
z −1
E
E
E c h(2)
E E
z −1
c h(N − 1) E y(n)
Figure 4.17: Signal Flow graph representing the convolution sum.
x(n)
E )
E z −1 )
E
E z −1 )
E )
E z −1
c
'
'
'
z −1 '
z −1 ' h( −3 c N2 )
z −1 ' h( −1 c N2 )
h(0) c y(n)
' '
h(1) c
'
h(2) c
Figure 4.18: Eﬃcient signal ﬂow graph for the realisation of a type I FIR ﬁlter.
52
CHAPTER 4. FINITE IMPULSE RESPONSE FILTER DESIGN
Chapter 5
Design of Inﬁnite Impulse Response Filters
5.1 Introduction
When designing IIR ﬁlters one must determine a rational H(z) which meets a given magnitude speciﬁcation. The two diﬀerent approaches are : 1. Design a digital ﬁlter from an analog or continuoustime ﬁlter. 2. Design the ﬁlter directly. Only the ﬁrst approach is considered since it is much more useful. It consists of the following four steps: Step 1 : Speciﬁcation of the digital ﬁlter. Step 2 : Translation to a continuoustime ﬁlter speciﬁcation. Step 3 : Determination of the continuoustime ﬁlter system function Hc (s). Step 4 : Transformation of Hc (s) to a digital ﬁlter system function H(z). Remarks 1. Step 2 depends on Step 4 since translation of the digital ﬁlter speciﬁcation to a continuoustime ﬁlter speciﬁcation depends on how the continuoustime ﬁlter is transformed back to the digital domain. 2. It is desirable for Step 4 to always transform a causal and stable Hc (s) to a causal and stable H(z), and also that the transformation results in a digital ﬁlter with a rational H(z) when the continuoustime ﬁlter has a rational Hc (s). 3. The continuoustime ﬁlter frequency response Hc (jΩ) = Hc (s)s=jΩ should map the digital ﬁlter frequency response H(ejω ) = H(z = ejω ). Typically Butterworth, Chebyshev and elliptic ﬁlters are used as continuoustime prototypes. The impulse invariance method and bilinear transformation are two approaches that satisfy all the above properties, and are often used in designing IIR ﬁlters. We discuss their implementation in the following sections. 53
54
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS
5.2
Impulse Invariance Method
In the impulse invariance method, we obtain the unit sample response of the digital ﬁlter by sampling the impulse response of the continuoustime ﬁlter: 1. From Hc (s) and Hc (jΩ), we determine hc (t). 2. The sequence h(n) is obtained by h(n) = Td · hc (nTd ), where Td is the design sampling interval which does not need to be the same as the sampling period T associated with A/D and D/A conversion. 3. H(z) is obtained from h(n). Hc (s) H(z)
hc (t)
h(n)
Example 5.2.1 A continuoustime ﬁlter is described by Hc (s) = The inverse Laplace transform is hc (t) = esc t · u(t) . Therefore, h(n) = hc (t)t=nTd = esc n · u(n) assuming Td = 1, which leads to H(z) = 1 1 − esc z −1 1 . s − sc
In general, to transform Hc (s) to H(z), Hc (s) is ﬁrst expressed as a linear combination of 1 system functions of the form (s−sc ) , which is then transformed to give H(z), i.e. perform a partial fraction expansion and transform each component. 1 If Hc (s) has multiple poles at s = sc , Hc (s) would have a component of the form (s−sc )p with p > 1 . Its ztransform can be obtained by following the same steps as the above example. In previous chapters, we have seen that H(e ) =
jω ∞ k=−∞
Hc j
ω 2π +j k . Td Td
(5.1)
So, H(ejω ) is not simply related to Hc (jΩ) because of aliasing. If there were no aliasing eﬀect, H(ejω ) = Hc j ω Td for − π ≤ ω ≤ π , (5.2)
and the shape of H(ejω ) would be the same as Hc (jΩ). Note that in practice Hc (jΩ) is not band limited and therefore aliasing is present which involves some diﬃculty in the design. In ω practice, we suppose that the relation H(ejω ) = Hc (j Td ) holds over [−π, π) and we carry out
5.2. IMPULSE INVARIANCE METHOD
55
the remaining steps, instead of using (5.1). Note that the discretetime system frequency ω is related to the continuoustime system frequency Ω by ω = ΩTd . Because of the approximation, the resulting ﬁlter may not satisfy the digital ﬁlter speciﬁcations and some iterations may be needed. Also due to aliasing, it is diﬃcult to design band stop or highpass ﬁlters that have no signiﬁcant aliasing. In the impulse invariance design procedure, we transform the continuoustime ﬁlter speciﬁcations through the use of Eq. (5.2) to the discretetime ﬁlter speciﬁcations. We illustrate this in the following example. Example 5.2.2 Assume that we have the lowpass ﬁlter speciﬁcations as shown in Figure 5.1 with α1 = 0.10875, α2 = 0.17783, ωp = 0.2π, Td = 1 and ωs = 0.3π. In this case the passband is between 1 − α1 and 1 rather than between 1 − α1 and 1 + α1 . Historically, most continuous1 .............................. 1 − α1 ..............................
jω TH(e )
α2 ωp
.................................
E
ωs
π
ω
Figure 5.1: Lowpass ﬁlter speciﬁcations. time ﬁlter approximation methods were developed for the design of passive systems (having gain less than 1). Therefore, the design speciﬁcations for continuous time ﬁlters have typically been formulated so that the passband is less than or equal to 1. We will design a ﬁlter by applying impulse invariance to an appropriate Butterworth continuoustime ﬁlter. A Butterworth lowpass ﬁlter is deﬁned by the property that the magnitudesquared function is 1 Hc (jΩ)2 = 1 + (Ω/Ωc )2N where N is an integer corresponding to the order of the ﬁlter. The magnitude response of a Butterworth ﬁlter of N th order is maximally ﬂat in the passband, i.e., the ﬁrst (2N − 1) derivatives of the magnitudesquared function are zero at Ω = 0. An example of this characteristic is shown in Figure 5.2. The ﬁrst step is to transform the discretetime speciﬁcations to continuoustime ﬁlter speciﬁcations. Assuming that the aliasing involved in the transformation from Hc (jΩ) to H(ejω ) will be negligible, we obtain the speciﬁcation on Hc (jΩ) by applying the relation Ω = ω/Td to obtain the continuoustime ﬁlter speciﬁcations shown in Figure 5.3. The Butterworth ﬁlter should have the following speciﬁcations, 1 − α1 = 0.89125 ≤ Hc (jΩ) ≤ 1 , 0 ≤ Ω ≤ 0.2π , and Hc (jΩ) ≤ 0.17783 = α2 , 0.3π ≤ Ω ≤ π .
56
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS
1 0.9 0.8 0.7 0.6 Magnitude N=2 0.5 0.4 0.3 0.2 0.1 0 0 N=4
N=8
1
2
3 Frequency
4
5
6
Figure 5.2: Magnitude transfer function Hc (jΩ).
THc (jΩ)
1 1 − α1
α2 0
ωp Td ωs Td π Td
E
Ω
Figure 5.3: Filter speciﬁcations for Example 5.2.2. Because Hc (jΩ) is a monotonic function of Ω, the speciﬁcations will be satisﬁed if Hc (j0.2π) ≥ 0.89125 and Hc (j0.3π) ≤ 0.17783 Now, we determine N and Ωc , which are parameters of Hc (jΩ)2 = to meet the speciﬁcations. Thus, we have 1+ 0.2π Ωc
2N
1 , 1 + (Ω/Ωc )2N
=
1 0.89125
2
5.3. BILINEAR TRANSFORMATION and 1+ 0.3π Ωc
2N
57 1 0.17783
2
=
which leads to N = 5.8858 and Ωc = 0.70474. Rounding N up to 6, we ﬁnd Ωc = 0.7032. With these values, the passband speciﬁcation will be met exactly and the stopband speciﬁcations will be exceeded, reducing the potential of aliasing. With N = 6, the 12 poles of Hc (s)Hc (−s) = 1/[1 + (s/jΩc )2N ] are uniformly distributed around a circle of radius ωc = 0.7032. Consequently, the poles of Hc (s) are the 3 pole pairs in the left half of the splane with coordinates s1 = −0.182 ± j0.679
s2 = −0.497 ± j0.497 s3 = −0.679 ± j0.182 so that Hc (s) = 0.12093 . (s2 + 0.364s + 0.4942)(s2 + 0.994s + 0.494)(s2 + 1.358s + 0.4542)
1 (s−sk )
If we express Hc (s) as a partial fraction expansion, we transform H(z) =
into
1 (1−esk z −1 )
and get
0.2871 − 0.4466z −1 −2.1428 + 1.1455z −1 1.8557 − 0.6303z −1 + + . 1 − 1.2971z −1 + 0.6949z −2 1 − 1.0691z −1 + 0.3699z −2 1 − 0.9972z −1 + 0.2570z −2
5.3
Bilinear Transformation
The other main approach to the design problem is the bilinear transformation. With this approach, H(z) is obtained directly from the continuoustime frequency response Hc (s) by replacing s by 2 1 − z −1 s= Td 1 + z −1 so that H(z) = Hc 2 Td 1 − z −1 1 + z −1 .
A sample on the unit circle, z = ejω , corresponds to the point s= 2 2 1 − e−jω = j tan(ω/2). · −jω Td 1 + e Td
The bilinear transformation transforms the imaginary axis of the splane to the unit circle as shown in Figure 5.4. The inverse transform is given by z=
2 Td 2 Td
+s −s
=
1 + σTd /2 + jΩTd /2 1 + (Td /2)s = . 1 − (Td /2)s 1 − σTd /2 − jΩTd /2
The region ℜ{s} < 0 is transformed to z < 1, i.e. the lefthand half plane ℜ{s} < 0 of the splane is mapped inside the unit circle. This means the bilinear transformation will preserve the stability of systems.
58
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS jΩ T splane ℑ T zplane
Image of © s = jΩ (unit circle)
E E
σ Image of left halfplane
ℜ
Figure 5.4: Mapping of the splane onto the zplane using the bilinear transformation
2 In the deﬁnition of the transformation, Td is a scaling factor. It controls the deformation of the frequency axis when we use the bilinear transformation to obtain a digital transfer function 2 H(z) from the continuoustime transfer function Hc (s). From s = j Td tan(ω/2) = σ + jΩ, it follows that 2 Ω= tan(ω/2) and σ = 0 . Td
For low frequencies, the deformation between Ω and ω is very small,i.e. for small ω, we can assume Ω ≈ ω . This was the criterion for choosing 2/Td as a scaling factor. This is shown in 2 Figure 5.5. The choice of another criterion will yield another scaling factor.
4
3
2 Discrete−time frequency
1
0
−1
−2
−3
−4 −20
−15
−10
−5 0 5 Continuous−time frequency
10
15
20
Figure 5.5: Mapping of the continuoustime frequency axis onto the unit circle by bilinear transformation (Td = 1).
Remarks 1. This transformation is the most used in practice. It calculates digital ﬁlters from known continuoustime ﬁlter responses. 2. By taking into account the frequency deformation, Ω= ω 2 tan , Td 2
5.3. BILINEAR TRANSFORMATION the detailed shape of Hc (jΩ) is not preserved. 3. The delay time is also modiﬁed τd = τc [1 + ((ωc Td )/2)2 ] .
59
If the continuoustime ﬁlter has a constant delay time, the resulting digital ﬁlter will not have this property. 4. To design a digital ﬁlter using this method one must ﬁrst transform the discrete time ﬁlter speciﬁcations to the continuous time speciﬁcations, design a continuoustime ﬁlter with these new requirements and then apply the bilinear transformation. Example 5.3.1 Consider the discretetime ﬁlter speciﬁcations of Example 5.2.2, where we illustrated the impulse invariance technique for the design of a discretetime ﬁlter. In carrying out the design using the bilinear transformation, the critical frequencies of the discretetime ﬁlter must be prewarped to the corresponding continuoustime frequencies using Ω = (2/Td ) tan(ω/2) so that the frequency distortion inherent in bilinear transformation will map them back to the correct discretetime critical frequencies. For this speciﬁc ﬁlter, with Hc (jΩ) representing the magnitude response function of the continuoustime ﬁlter, we require that 0.89125 ≤ Hc (jΩ) ≤ 1, Hc (jΩ) ≤ 0.17783, 0 ≤ Ω ≤ 2 tan Td 2 tan Td 0.2π 2 ,
0.3π 2
≤ Ω ≤ ∞.
For convenience we choose Td = 1. Also, as with Example 5.2.2, since a continuoustime Butterworth ﬁlter has monotonic magnitude response, we can equivalently require that Hc (j2 tan(0.1π)) ≥ 0.89125 and Hc (j2 tan(0.15π)) ≤ 0.17783. The form of the magnitudesquared function for the Butterworth ﬁlter is Hc (jΩ)2 = 1 . 1 + (Ω/Ωc )2N
Solving for N and Ωc with the equality sign we obtain 1+ and 1+ and solving for N gives N=
1 1 log[(( 0.17783 )2 − 1)/(( 0.89125 )2 − 1)] = 5.30466. 2 log[tan(0.15π)/ tan(0.1π)]
2 tan(0.1π) Ωc 2 tan(0.15π) Ωc
2N
=
1 0.89125 1 0.17783
2
(5.3)
2N
2
=
(5.4)
60
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS
Since N must be an integer, we choose N = 6. Substituting this N into Eq. ( 5.4) we obtain Ωc = 0.76622. For this value of Ωc , the passband speciﬁcations are exceeded and stop band speciﬁcations are met exactly. This is reasonable for the bilinear transformation since we do not have to be concerned with aliasing. That is, with proper prewarping, we can be certain that the resulting discretetime ﬁlter will meet the speciﬁcations exactly at the desired stopband edge. In the splane, the 12 poles of the magnitudesquared function are uniformly distributed in angle on a circle of radius 0.7662. The system function of the continuoustime ﬁlter obtained by selecting the left halfplane poles is Hc (s) = (s2 + 0.3966s + 0.5871)(s2 0.20238 . + 1.0836s + 0.5871)(s2 + 1.4802s + 0.5871)
We then obtain the system function H(z) for the discretetime ﬁlter by applying the bilinear transformation to Hc (s) with Td = 1, H(z) = 0.0007378(1 + z −1 )6 . (1 − 1.268z −1 + 0.7051z −2 )(1 − 1.0106z −1 + 0.3583z −2 )(1 − 0.9044z −1 + 0.2155z −2 )
Since the bilinear transformation maps the entire jΩaxis of the splane onto the unit circle, the magnitude response of the discretetime ﬁlter falls oﬀ much more rapidly than the original continuoustime ﬁlter, i.e. the entire imaginary axis of the splane is compressed onto the unit circle. Hence, the behaviour of H(ejω ) at ω = π corresponds to the behaviour of Hc (jΩ) at Ω = ∞. Therefore, the discretetime ﬁlter will have a sixthorder zero at z = −1 since the continuoustime Butterworth ﬁlter has a sixthorder zero at s = ∞. Note that as with impulse invariance, the parameter Td is of no consequence in the design procedure since we assume that the design problem always begins with discretetime ﬁlters speciﬁcations. When these speciﬁcations are mapped to continuoustime speciﬁcations and then the continuoustime ﬁlter is mapped back to a discretetime ﬁlter, the eﬀect of Td is cancelled. Although we retained the parameter Td in our discussion, in speciﬁc problems any convenient value of Td can be chosen.
5.4
Implementation of IIR Filters
An IIR ﬁlter has an inﬁnite extent and cannot be implemented by direct convolution in practice. For this reason, we require the IIR ﬁlter to have a rational ztransform, as we have seen that this can be realised by a diﬀerence equation.
5.4.1
Direct Form I
The diﬀerence equation that describes the system is given by
M
y(n) =
k=1
ak y(n − k) +
N −1 r=0
br x(n − r)
This can be realized in a straightforward manner as depicted in Figure 5.6 which represents the pair v(n) = N −1 br x(n − r) and y(n) = M ak y(n − k) + v(n) for (N − 1) = M . This r=0 k=1
5.4. IMPLEMENTATION OF IIR FILTERS representation can be viewed as an implementation of H(z) through the decomposition H(z) = H2 (z)H1 (z) = 1 −
E z −1 E
61
1
M k=1
Herein we have represented the delay element
ak z −k
N −1 r=0
br z −r
.
by
E E
E
z −1
E c−1 z
x(n)
E
E
b0
v(n)
y(n)
z −1 c x(n − 1) z −1 c x(n − 2)
E E
T
T
b1
a1
'
y(n − 1) z c
−1
T
T
b2
a2
'
y(n − 2)
x(n − N + 2) z −1 c x(n − N + 1)
bN −2
E
aN −2
'
y(n − N + 2) z c
−1
bN −1
E
aN −1
'
y(n − N + 1)
Figure 5.6: Signal ﬂow graph of direct form I structure for an (N − 1)thorder system.
5.4.2
Direct Form II
The previous system can also be viewed as a cascade of two systems whose system functions are given by H1 (z) and H2 (z). Since H1 (z)H2 (z) = H(z) = H2 (z)H1 (z) , we can change the order of H1 (z) and H2 (z) without aﬀecting the overall system function. This is because the delay elements are redundant and can be combined. We then obtain the direct form II structure. The advantage of the second form lies in the reduction in the number of storage elements required, while the number of arithmetic operations required is constant. The ﬂow graph of direct form II structure is depicted in Figure 5.7
62
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS w(n) b0 E E E x(n) E y(n)
T c−1 z T E
a1
'
b1
T
z c a2
'
−1
T E
b2
aN −2
'
bN −2
E
z c aN −1
'
−1
bN −1
E
Figure 5.7: Signal ﬂow graph of direct form II structure for an N − 1thorder system.
5.4.3
Cascade form
N −1
The rational system function br z −r ak z −k
k=1
H(z) =
r=0 M
1− can be expressed in the general form, H(z) = A
Hk (z)
k
where Hk (z) is given by : Hk (z) =
The constants A, β1k , β2k , α1k and α2k are real, since h(n) is real. The previous equations have shown that H(z) can be represented by a cascade of second order sections Hk (z). The cascade form is less sensitive to coeﬃcient quantisation. If we change the branch transmittance in the two direct forms, the poles and zeroes of the system function will be aﬀected. In the cascade form, however, the change in one transmittance branch will aﬀect the poles or zeroes only in the second order section where the branch transmittance is aﬀected. Figure 5.8 gives the cascade structure for a 6thorder system with a direct form II realisation of each secondorder subsystem.
1 + β1k z −1 + β2k z −2 1 − α1k z −1 − α2k z −2
5.5. A COMPARISON OF FIR AND IIR DIGITAL FILTERS w1 (n) y1 (n) w2 (n) y2 (n) w3 (n) β01 β02 β03 ©E ©E E E E E E E x(n) E
T α11 ' c−1 z E
63 y3 (n)
©E T
y(n)
β11 β21
T
T α12 '
c−1 z E
β12 β22
T
T α13 '
c−1 z E
β13 β23
α21
c−1 z E
'
α22
c−1 z E
'
α23
c−1 z E
'
Figure 5.8: Cascade structure for a 6thorder system with a direct form II realisation of each secondorder subsystem.
5.5
A Comparison of FIR and IIR Digital Filters
After having discussed a wide range of methods for the design of both inﬁnite and ﬁnite duration impulse response ﬁlters several questions naturally arise: What type of system is best, IIR or FIR? Which methods yields the best results? The answer is neither. We have discussed a wide range of methods for both IIR and FIR ﬁlter because no single type of ﬁlter nor single design method is best for all circumstances. The choice between an FIR ﬁlter and an IIR ﬁlter depends upon the relative weight that one attaches to the advantages and disadvantages of each type of ﬁlter. For example, IIR ﬁlters have the advantage that a variety of frequency selective ﬁlters can be designed using closedform design formulae. That is, once the problem has been deﬁned in terms of appropriate speciﬁcations for a given kind of ﬁlter (e.g., Butterworth, Chebyshev, or elliptic), then the coeﬃcients (or poles and zeros) of the desired digital ﬁlter are obtained by straightforward substitution into a set of design equations. This kind of simplicity in the design procedure is attractive if only a few ﬁlters are to be designed or if limited computational facilities are available. In the case of FIR ﬁlters, closedform design equations do not exist. While the window method can be applied in a straightforward manner, some iteration may be necessary to meet a prescribed speciﬁcation. Most of the other FIR design methods are iterative procedures which require relatively powerful computational facilities for their implementation. In contrast, it is often possible to design frequency selective IIR digital ﬁlters using only a hand calculator and tables of continuoustime ﬁlter design parameters. A drawback of IIR ﬁlters is that the closedform IIR designs are preliminary limited to lowpass, bandpass, and highpass ﬁlters, etc. Furthermore, these designs generally disregard the phase response of the ﬁlter. For example, with a relatively simple computational procedure we may obtain excellent amplitude response characteristics with an elliptic lowpass ﬁlter while the phase response will be very nonlinear (especially at the band edge). In contrast, FIR ﬁlters can have precise linear phase. Also, the window method and most algorithmic methods aﬀord the possibility of approximating more arbitrary frequency response characteristics with little more diﬃculty than is encountered in the design of lowpass ﬁlters. Also, it appears that the design problem for FIR ﬁlters is much more under control than the IIR design problem because there is an optimality theorem for FIR ﬁlters that is meaningful in a wide range of practical situations. Finally, there are questions of economics in implementing a digital ﬁlter. Concerns of this type are usually measured in terms of hardware complexity or computational speed. Both these factors are strongly related to the ﬁlter order required to meet a given speciﬁcation. Putting
64
CHAPTER 5. DESIGN OF INFINITE IMPULSE RESPONSE FILTERS
aside phase considerations, a given amplitude response speciﬁcation will, in general, be met most eﬃciently with an IIR ﬁlter. However, in many cases, the linear phase available with an FIR ﬁlter may be well worth the extra cost. Thus a multitude of tradeoﬀs must be considered in designing a digital ﬁlter. Clearly, the ﬁnal choice will most often be made in terms of engineering judgement on such questions as the formulation of speciﬁcations, the method of implementation, and computational facilities available for carrying out the design.
Chapter 6
Introduction to Digital Spectral Analysis
6.1 Why Spectral Analysis?
There are mainly three principal reasons for spectral analysis. 1. To provide useful descriptive statistics. 2. As a diagnostic tool to indicate which further analysis method might be relevant. 3. To check postulated theoretical models.
6.2
Applications
We encounter many areas where spectral analysis is needed. A few examples are given below. Physics. In spectroscopy we wish to investigate the distribution of power in a radiation ﬁeld as a function of frequency (wavelength). Consider, for example, the situation shown in Figure 6.1 where a Michelson interferometer is used to measure the spectrum of a light source. Let S(t) be a zeromean stochastic stationary process which may be complex. It can be seen that X(t) = A (S(t − t0 ) + S(t − t0 − τ )) and ˜ Using the notation t for t − t0 , we get V (t) = X(t)2 = A2 S(t − t0 ) + S(t − t0 − τ )2 .
˜ ˜ ˜ ˜ ˜ ˜ ˜ ˜ V (t) = A2 S(t)S(t)∗ + S(t)S(t − τ )∗ + S(t − τ )S(t)∗ + S(t − τ )S(t − τ )∗ . After lowpass ﬁltering, we obtain (ideally) w(τ ) = E [V (t)] = A2 [cSS (0) + cSS (τ ) + cSS (−τ ) + cSS (0)] = A2 [2cSS (0) + 2ℜ{cSS (τ )}] Performing a Fourier transform leads to W (ejω ) = F{w(τ )} = A2 (2cSS (0)δ(ejω ) + CSS (ejω ) + CSS (e−jω )) 65
66
CHAPTER 6. INTRODUCTION TO DIGITAL SPECTRAL ANALYSIS
11111 00000 11111 00000 11111 00000
2
Fixed mirror Moveable mirror
Beam splitter
1 1
S(t)
X(t) AS(t − t0 )
111 000
d=0 τ = 2d/c
AS(t − t0 − τ ) Detector
V (t) Lowpass ﬁlter w(τ ) Fourier transform W (ejω )
Figure 6.1: A Michelson Interferometer.
Thus, the output of a Michelson interferometer is proportional to the spectrum of the light source. One of the objectives in spectroscopic interferometry is to get a good estimate of the spectrum of S(t) without moving the mirror a large d. Electrical Engineering. Measurement of the power in various frequency bands of some electromagnetic signal of interest. In radar there is the problem of signal detection and interpretation. Acoustics. The power spectrum plays the role of a descriptive statistic. We may use the spectrum to calculate the direction of arrival (DOA) of an incoming wave or to characterise the source. Geophysics. Earthquakes give rise to a number of diﬀerent waves such as pressure waves, shear waves and surface waves. These waves are received by an array of sensors. Even for a
6.2. APPLICATIONS
67
100km distance between the earthquake and the array, these types of waves can arrive at almost the same time at the array. These waves must be characterised. Mechanical Engineering. Detection of abnormal combustion in a spark ignition engine. The power spectrum is a descriptive statistic to test if a combustion cycle is a knocking one. Medicine. EEG (electroencephalogram) and ECG (electrocardiogram) analysis. It is well known that the adult brain exhibits several major modes of oscillation, the level of brain activity in these various frequency bands may be used as a diagnostic tool.
68
CHAPTER 6. INTRODUCTION TO DIGITAL SPECTRAL ANALYSIS
Chapter 7
Random Variables and Stochastic Processes
7.1 Random Variables
A random variable is a number X(ζ) assigned to every outcome ζ ∈ S of an experiment. This could be the gain in a game of chance, the voltage of a random source, or the cost of a random component. Given ζ ∈ S, the mapping ζ → X(ζ) ∈ R is a random variable, if for all x ∈ R the set {ζ : X(ζ) ≤ x} is also an event {X ≤ x} := {ζ : X(ζ) ≤ x} with probability P ({X ≤ x}) := P ({ζ : X(ζ) ≤ x})
7.1.1
Distribution Functions
The mapping x ∈ R → FX (x) = P ({X ≤ x}) is called the probability distribution function of the random variable X.
7.1.2
Properties
The probability distribution function has the following properties 1. 0 ≤ FX (x) ≤ 1 FX (∞) = 1 FX (−∞) = 0 2. FX (x) is continuous from the right i.e.,
ǫ→0
lim FX (x + ǫ) = FX (x)
3. If x1 < x2 then FX (x1 ) ≤ FX (x2 ), i.e., FX (x) is a nondecreasing function of x P ({x1 < X ≤ x2 }) = FX (x2 ) − FX (x1 ) ≥ 0 69
70
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES 4. The probability density function is deﬁned as: fX (x) := and if it exists has the property To ﬁnd FX (x), we integrate fX (x), i.e., fX (x) ≥ 0.
x
dFX (x) dx
P ({X ≤ x}) = FX (x) = and because FX (∞) = 1,
∞ −∞
fX (u)du
−∞
fX (x)dx = 1.
x2
Further,
P ({x1 < X ≤ x2 }) = FX (x2 ) − FX (x1 ) =
fX (x) dx
x1
5. For a continuous variable X, P(x1 < X < x2 ) = P(x1 < X ≤ x2 ) because P({x}) = P ((−∞, x]) − P ((−∞, x)) = FX (x) − FX (x − 0) = 0
7.1.3
Uniform Distribution
X is said to be uniformly distributed on [a, b] if it admits the probability density function 1 [u(x − a) − u(x − b)] , a < b b−a and probability distribution function fX (x) =
x
FX (x) =
−∞
fX (u)du =
1 [(x − a)u(x − a) − (x − b)u(x − b)] b−a
Both FX (x) and fX (x) are displayed in Figure 7.1.
7.1.4
Exponential Distribution
fX (x) = λe−λx u(x), λ>0
Let X be exponentially distributed with parameter λ. Then,
and FX (x) = 1 − e−λx u(x), where u(x) = 1, x ≥ 0 0, x < 0.
The exponential distribution is useful in many applications in engineering, for example, to describe the lifetime X of a transistor. The breakdown rate of a transistor is deﬁned by
∆→0+
lim
and is given by
1 P {x < X ≤ x + ∆} ∆ P {x < X}
fX (x) = λ = constant 1 − FX (x)
7.1. RANDOM VARIABLES
Tfx (x)
1 b−a
71
E
a
T1
b
x
E
a
b
x
Figure 7.1: Probability function.
7.1.5
Normal Distribution
X is normally distributed with mean µ and variance σ 2 X ∼ N (µ, σ 2 ) if
1 2 1 fX (x) = √ e− 2σ2 (x−µ) σ 2 > 0, −∞ < µ < ∞ σ 2π
A closedform expression for FX (x) does not exist. For µ = 0, σ 2 = 1, X is said to be standard normally distributed, and its distribution function is denoted by ΦX (x), i.e., we have FX (x) = ΦX x−µ σ
∞ −∞ fX (x) dx
Example 7.1.1 Show that for the normal distribution
∞ −∞
= 1.
fX (x) dx =
∞ −∞
∞ −∞
1 2 1 √ e− 2σ2 (x−µ) dx σ 2π
If we let s =
x−µ σ ,
then
fX (x)dx =
∞ −∞
1 2 1 √ e− 2 s ds 2π
Because
√1 e 2π
− 1 x2 2
> 0, we calculate
∞ −∞
1 2 1 √ e− 2 x dx 2π
2
= =
0
∞ −∞ 2π
∞ −∞ ∞ 0
1 − 1 (x2 +y2 ) e 2 dx dy 2π 1 − 1 r2 e 2 r dr dϕ 2π
=
0
∞ ∞
e− 2 r r dr e−s ds
1 2
=
0
= 1 where r = x2 + y 2 and ϕ = arctan
y x
.
72
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Example 7.1.2 A random variable X ∼ N (1000, 50). Find the probability that X is between 900 and 1050. We have P({900 ≤ X ≤ 1050}) = FX (1050) − FX (900) Numerical values of ΦX (x) over a range of x are shown in Table 1. Using Table 1, we conclude that P({900 ≤ X ≤ 1050}) = ΦX (1) + ΦX (2) − 1 = 0.819. since ΦX (−x) = 1 − ΦX (x). Often we deﬁne the following as the errorfunction. Note that erf(x) = ΦX (x) − there are other deﬁnitions in the literature. 1 erf(x) = √ 2π
x 0 1 2
though
e−x
2 /2
du = ΦX (x) −
1 2
Table 1: Numerical values for erf(x) function. x 0.05 0.10 0.15 0.20 0.25 0.30 0.35 0.40 0.45 0.50 0.55 0.60 0.65 0.70 0.75 erf(x) 0.01994 0.03983 0.05962 0.07926 0.09871 0.11791 0.13683 0.15542 0.17364 0.19146 0.20884 0.22575 0.24215 0.25804 0.27337 x 0.80 0.85 0.90 0.95 1.00 1.05 1.10 1.15 1.20 1.25 1.30 1.35 1.40 1.45 1.50 erf(x) 0.28814 0.30234 0.31594 0.32894 0.34134 0.35314 0.36433 0.37493 0.38493 0.39495 0.40320 0.41149 0.41924 0.42647 0.43319 x 1.55 1.60 1.65 1.70 1.75 1.80 1.85 1.90 1.95 2.00 2.05 2.10 2.15 2.20 2.25 erf(x) 0.43943 0.44520 0.45053 0.45543 0.45994 0.46407 0.46784 0.47128 0.47441 0.47726 0.47982 0.48214 0.48422 0.48610 0.48778 x 2.30 2.35 2.40 2.45 2.50 2.55 2.60 2.65 2.70 2.75 2.80 2.85 2.90 2.95 3.00 erf(x) 0.48928 0.49061 0.49180 0.49286 0.49379 0.49461 0.49534 0.49597 0.49653 0.49702 0.49744 0.49781 0.48813 0.49841 0.49865
7.1.6
χ2 Distribution
Let X = ν Yk2 be a random variable where Yk ∼ N (0, 1). X is then χ2 distributed with k=1 ν degrees of freedom. The probability density function of a χ2 random variable is displayed in ν Figure 7.2 for ν = 1, 2, 3, 5 and 10. The probability density function as
x x 2 −1 fX (x) = ν ν e− 2 u(x) , 2 2 Γ( 2 ) ν
where Γ(x) is the gamma function, Γ(x) =
0 ∞
ξ x−1 e−ξ dξ .
7.1. RANDOM VARIABLES
73
ν=1
0.5
0.4
χ2 ν
ν=2
0.3
ν=3
0.2
ν=5 ν = 10
0.1
0
0
1
2
3
4
5
6
7
8
9
10
x Figure 7.2: χ2 distribution with ν = 1, 2, 3, 5, 10 ν The mean of X is E [X] = ν and the variance is
2 σX = 2ν .
7.1.7
Multiple Random Variables
The probability of two events A = {X1 ≤ x1 } and B = {X2 ≤ x2 } have already been deﬁned as functions of x and y, respectively called probability distribution functions FX1 (x1 ) = P({X1 ≤ x1 })
FX2 (x2 ) = P({X2 ≤ x2 })
We must introduce a new concept to include the probability of the joint event {X1 ≤ x1 , X2 ≤ x2 } = {(X1 , X2 ) ∈ D}, where x1 and x2 are two arbitrary real numbers and D is the quadrant shown in Figure 7.3. The joint distribution FX1 X2 (x1 , x2 ) of two random variables X1 and X2 is the probability of the event {X1 ≤ x1 , X2 ≤ x2 }, FX1 X2 (x1 , x2 ) = P ({X1 ≤ x1 , X2 ≤ x2 }) . For n random variables X1 , . . . , Xn the joint distribution function is denoted by FX1 X2 ...Xn (x1 , . . . , xn ) = P ({X1 ≤ x1 , X2 ≤ x2 , . . . , Xn ≤ xn })
74
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES X2
T
x2 D x1
E X1
Figure 7.3: The quadrant D of two arbitrary real numbers x and y.
7.1.8
1)
Properties
The multivariate probability distribution function has the following properties.
x1 →∞,...,xn →∞ xi →−∞
lim
FX1 X2 ...Xn (x1 , . . . , xn ) = F (∞, ∞, . . . , ∞) = 1
lim FX1 X2 ...Xn (x1 , . . . , xi−1 , xi , xi+1 , . . . , xn ) = Fx1 ,x2 ,...,xn (x1 , . . . , xi−1 , −∞, xi+1 , . . . , xn ) = 0
2) F is continuous from the right in each component xi when the remaining components are ﬁxed. 3) F is a non decreasing function of each component xi when the remaining components are ﬁxed. 4) For the nth diﬀerence ∆b F = a
k∈{0,1}n
(−1)[n−
Pn
i=1
ki ]
F (bk1 a1−k1 , . . . , bkn a1−kn ) ≥ 0 n n 1 1
where a = (a1 , . . . , an )T , b = (b1 , . . . , bn )T with ai < bi (i = 1, . . . , n) where T denotes transpose. For n = 1, ∆b F is equivalent to F (b) − F (a), and for n = 2, ∆b F is a a equivalent to F (b1 , b2 ) − F (b1 , a2 ) − F (a1 , b2 ) + F (a1 , a2 ). To show that (1)–(3) are not suﬃcient to deﬁne a probability, one should discuss the function FX1 X2 (x1 , x2 ), deﬁned as: FX1 X2 (x1 , x2 ) = 0, 1, x1 + x2 < 0, elsewhere.
and is depicted in Figure 7.4. Suppose that in a combined experiment the result of an experiment i depends on the ﬁrst to (i − 1)th experiments. With the density function fX1 (x1 ) for the ﬁrst experiment one constructs the density function of the total experiment
n
fX1 ...Xn (x1 , . . . , xn ) = fX1 (x1 )
i=2 n
fXi (xi x1 , . . . , xi−1 ) fXi (xi X1 = x1 , . . . , Xi−1 = xi−1 )
= fX1 (x1 )
i=2
7.1. RANDOM VARIABLES
x2
T
75
FX1 ,X2 (x1 , x2 ) = 1
E x1
FX1 ,X2 (x1 , x2 ) = 0
Figure 7.4: Bivariate probability distribution function. If the ith experiment is independent of the previous ones, then fXi (xi x1 , . . . , xi−1 ) = fXi (xi ) and fX1 ...Xn (x1 , . . . , xn ) =
i=1 n
fXi (xi )
n
or equivalently FX1 ...Xn (x1 , . . . , xn ) =
FXi (xi )
i=1
which means that the events {X1 ≤ x1 }, . . . , {Xn ≤ xn } are independent. For the special case for n = 2, we have the following properties: 1) FX1 X2 (∞, ∞) = P(X1 ≤ ∞, X2 ≤ ∞) = 1
FX1 X2 (−∞, x2 ) = P(X1 ≤ −∞, X2 ≤ x2 ) = 0 = FX1 X2 (x1 , −∞) because {X1 = −∞, X2 ≤ x2 } ⊂ {X1 = −∞}, {X1 ≤ x1 , X2 = −∞} ⊂ {X2 = −∞}. 2)
ǫ→0
lim FX1 X2 (x1 + ǫ, x2 ) = lim FX1 X2 (x1 , x2 + ǫ) = FX1 X2 (x1 , x2 ).
ǫ→0
3) The event {x11 < X1 ≤ x12 , X2 ≤ x2 } consists of all the points (X1 , X2 ) in D1 , and the event {X1 ≤ x1 , x21 < X2 ≤ x22 } consists of all the points in D2 . From Figure 7.5, we see that {X1 ≤ x12 , X2 ≤ x2 } = {X1 ≤ x11 , X2 ≤ x2 } ∪ {x11 < X1 ≤ x12 , X2 ≤ x2 }. The two summands are mutually exclusive, hence P({X1 ≤ x12 , X2 ≤ x2 }) = P({X1 ≤ x11 , X2 ≤ x2 }) + P({x11 < X1 ≤ x12 , X2 ≤ x2 }) and therefore, P({x11 < X1 ≤ x12 , X2 ≤ x2 }) = FX1 X2 (x12 , x2 ) − FX1 X2 (x11 , x2 ).
76
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES X2
T
x22 D2 x2 x21 x1 x11 D1 x12
E X1
Figure 7.5: Properties for n=2 Similarly, P({X1 < x1 , x21 < X2 ≤ x22 }) = FX1 X2 (x1 , x22 ) − FX1 X2 (x1 , x21 ) 4) P({x11 < X1 ≤ x12 , x21 < X2 ≤ x2 2}) = FX1 X2 (x12 , x22 ) − FX1 X2 (x11 , x22 ) − FX1 X2 (x12 , x21 ) + FX1 X2 (x11 , x21 ) > 0
This is a probability that (X1 , X2 ) is in the rectangle D3 (see Figure 7.6). X2
T
x22 x21 x11 D3
EX1
x12
Figure 7.6: Probability that (X1 , X2 ) is in the rectangle D3 . Example 7.1.3 The bivariate normal distribution fX1 X1 (x1 , x2 ) = 1 2πσX1 σX2 1 − ρ2 x1 − µX1 σ X1
2
1 × exp − 2(1 − ρ2 )
(x1 − µX1 )(x2 − µX2 ) − 2ρ + σ X2 σ X2
x2 − µX2 σ X2
2
2 2 where µX1 and µX2 are respectively the means of X1 and X2 , σX1 > 0 and σX2 > 0 the corre2 2 sponding variances, and −1 < ρ < 1 is the correlation coeﬃcient. N (µX1 , σX1 ) and N (µX2 , σX2 ) are the marginal distributions.
7.1. RANDOM VARIABLES
σ
2
77
2 µX2 ), σX2 (1 − ρ2 )). If X1 and X2 are independent then ρ = 0, and clearly fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ).
The conditional density function fX2 (x2 x1 ) of X2 when X1 = x1 is N (µX2 + ρ σX2 (x1 − X
7.1.9
Operations on Random Variables
The concept of a random variable was previously introduced as a means of providing a symmetric deﬁnition of events deﬁned on a sample space. It formed a mathematical model for describing characteristics of some real, physical world random variables, which are mostly based on a single concept – expectation.
7.1.10
Expectation
The mathematical expectation of a random variable X, E [X] which may be read “the expected value of X” or “the mean value of X” is deﬁned as E [X] :=
∞ −∞
x · fX (x)dx
where fX (x) is the probability density function of X. If X happens to be discrete with N possible values xi having probabilities P(xi ) of occurrence, then
N
fX (x) =
i=1 N
P(xi ) · δ(x − xi ) xi P(xi )
E [X] =
i=1
Example 7.1.4 i) Normal Distribution: If X ∼ N (µ, σ 2 ), then 1 1 fX (x) = √ · exp − (x − µ)2 /σ 2 2 σ 2π
∞ −∞
E [X] = = = = = =
xfX (x)dx
∞ 1 1 √ x exp{− (x − µ)2 /σ 2 }dx 2 σ 2π −∞ ∞ 1 1 √ (σu + µ) exp{− u2 }σdu 2 σ 2π −∞ ∞ ∞ 1 1 1 √ exp{− u2 }du u · exp{− u2 }du + µ σ 2 2 2π −∞ −∞ √ 1 √ σ · 0 + µ 2π 2π µ
78
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
ii) Uniform distribution: If X is uniformly distributed on the interval [a, b], then fX (x) = where rect[a,b] = 1 · rect[a,b] (x) b−a 1, a ≤ x ≤ b 0, else. 1 b−a
b
E [X] = = = = which can be veriﬁed graphically.
∞ −∞
xfX (x)dx =
xdx
a
1 1 2 (b − a2 ) b−a2 (b − a)(b + a) 2(b − a) a+b 2
iii) Exponential Distribution: If X is exponentially distributed, then fX (x) =
∞ −∞ ∞
x−a 1 exp − u(x − a) b b x · fX (x)dx x·
E [X] = =
x−a 1 exp − u(x − a)dx b b −∞ 1 ∞ x−a = x · exp − dx b a b x a 1 ∞ x · exp − dx exp = b a b b E [X] = a + b Note: If a random variable’s density is symmetrical about a line x = a, then E [X] = a, i.e. E [X] = a, if fX (x + a) = fX (−x + a)
7.1.11
Expected Value of a Function of Random Variables
∞ −∞
Suppose we are interested in the mean of the random variable Y = g(X) E [Y ] = and we are given fX (x). Then, E [Y ] = E [g(X)] =
∞ −∞
y · fY (y)dx
g(x) · fX (x)dx
7.1. RANDOM VARIABLES Example 7.1.5 The average power in a 1 Ω resistor can be found as E V2 =
∞ −∞
79
v 2 fV (v)dv
where V and fV (v) are respectively the random voltage and its probability density function. In particular, if we apply the transformation g(·) to the random variable X, deﬁned as g(X) = X n , n = 0, 1, 2, . . . , the expectation of g(X) is known as the nth order moment of X and denoted by µn : µn := E [X n ] =
∞ −∞
xn fX (x)dx .
It is also of importance to use central moments of X around the mean. These are deﬁned as µ′ := E [(X − E [X])n ] , n = 0, 1, 2 . . . n
∞ −∞
(x − E [X])n · fX (x)dx =
(x − µ)n fX (x)dx
Clearly the ﬁrst order central moment is zero.
7.1.12
The Variance
∞ −∞
The variance of a random variable X is by deﬁnition σ 2 := (x − E [X])2 · fX (x)dx
The positive constant σ, is called the standard deviation of X. From the deﬁnition it follows that σ 2 is the mean of the random variable (X − E [X])2 . Thus, σ 2 = E (X − E [X])2 = E X 2 − 2X · (E [X]) + (E [X])2 = E X 2 − 2E [XE [X]] + (E [X])2 = E X 2 − 2 (E [X])2 + (E [X])2
σ 2 = E X 2 − (E [X])2 = E X 2 − µ2
Example 7.1.6 We will now ﬁnd the variance of three common distributions. i) Normal distribution: Let X ∼ N (µ, σ 2 ). The probability density function of X is 1 1 (x − µ)2 fX (x) = √ exp − . 2 σ2 σ 2π
80 Clearly
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
∞ −∞
exp −
1 (x − µ)2 2 σ2
√ dx = σ 2π
Diﬀerentiating with respect to σ, we obtain 1 (x − µ)2 (x − µ)2 exp − σ3 2 σ2 −∞ √ Multiplying both sides by σ 2 / 2π, we conclude that
∞
dx =
√
2π
E (X − µ)2 = E (X − E [X])2 = σ 2 . ii) Uniform distribution:
E [(X − E [X])2
= E X 2 − (E [X])2 = = =
∞ −∞ b
x2 fX (x)dx −
a+b 2
2
2
x2 dx − a b−a (b − a)2 . 12
a+b 2
iii) Exponential distribution: fX (x) = λ exp{−λx}u(x), x > 0, λ > 0 E [X] = σ2 = 1 λ
1 λ2
7.1.13
Transformation of a Random Variable
In practice, one may wish to transform one random variable X into a new random variable Y by means of a transformation Y = g(X), where the probability density function fX (x) or distribution function of X is known. The problem is to ﬁnd the probability density function of fY (y) or distribution function FY (y) of Y . Let us assume that g(·) is continuous and diﬀerentiable at all values of x for which fX (x) = 0. Furthermore, we assume that g(·) is monotone for which the inverse g −1 (·) exists. Then: FY (y) = P ({Y ≤ y}) = P ({g(X) ≤ y}) = P {X ≤ g
−1 g −1 (y)
(y)} =
−∞
fX (x)dx
holds. The density function of Y = g(X) is obtained via diﬀerentiation which leads to fY (y) = fX g −1 (y) · dg −1 (y) dy
7.1. RANDOM VARIABLES Example 7.1.7 Let Y = aX + b. To ﬁnd the probability density function of Y knowing FX (x) or fX (x), we calculate FY (y) = P ({Y ≤ y}) = P ({aX + b ≤ y}) y−b = P X≤ , a>0 a y−b = FX , a and FY (y) = 1 − FX y−b a , a<0
81
By diﬀerentiating FY (y), we obtain the probability density function fY (y) = Example 7.1.8 Let Y = X2 To ﬁnd the probability density function of Y knowing FX (x) or fX (x), we calculate FY (y) = P {X 2 ≤ y} = 0, y<0 √ √ P ({− y ≤ X ≤ y}) , y ≥ 0 √ √ FY (y) = FX ( y) − FX (− y) 1 · fX a y−b a
FY (y) is non diﬀerentiable at y = 0. We may choose arbitrarily fY (y) = 0 at y = 0.
By diﬀerentiating FY (y), we obtain the probability density function 0, y<0 √ √ fY (y) = fX ( y) + fX (− y) √ , y>0 2 y
7.1.14
The Characteristic Function
ΦX (ejω ) := E ejωX =
∞ −∞
The characteristic function (cf) of a random variable X is deﬁned as follows: (7.1) (7.2)
fX (x)ejωx dx.
Using Schwarz’s inequality in (7.2) shows that ΦX (ejω ) ≤ 1 and thus it always exists.
This can be interpreted as the expectation of a function g(X) = ejωX , or as the Fourier transform (with reversed sign of the exponent) of the probability density function (pdf) of X. Considering Equation (7.2) as a Fourier transform, we can readily apply all the properties of Fourier transforms to the characteristic function. Most importantly, the inverse relationship of Equation (7.2) is given by ∞ 1 fX (x) = ΦX (ω)e−jωx dω. 2π −∞
82
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Example 7.1.9 We shall derive the cf of a Gaussian distributed random variable. First, we consider Z ∼ N (0, 1). ΦZ (ω) = = = fZ (z)ejωz dz 1 √ 2π 1 √ 2π
1 2
∞ −∞ ∞ −∞
e− 2 z ejωz dz e− 2 (z−jω)
1 2 + 1 (jω)2 2
1 2
dz
= e− 2 ω = e− 2 ω
1
fZ (z − jω)dz
2
Now we consider the more general case of X = σZ + µ ∼ N (µ, σ 2 ): ΦX (ω) = E ejω(σZ+µ) = ejωµ ΦZ (σω) = ejωµ− 2 σ
1 2 ω2
The cf is also known as the momentgenerating function as it may be used to determine nth order moment of a random variable. This can be seen from the following derivation. Consider the power series expansion of an exponential term for x < ∞: e = Substituting this into Equation (7.2) yields ΦX (ω) = fX (x) 1 + jωx + (jωx)2 + ··· 2! dx (7.3)
x ∞ i=0
xn . n!
= 1 + jωE [X] +
(jω)2 E X2 + · · · 2!
Diﬀerentiating the above expression n times and evaluating at the origin gives d dω
n
ΦX (ω)
ω=0
= j n E [X n ]
which means that the nth order moment of X can be determined according to E [X n ] = 1 jn d dω
n
ΦX (ω)
ω=0
.
This is known as the moment theorem. When the power series expansion of Equation (7.3) converges the characteristic function and hence the pdf of X are completely determined by the moments of X.
7.2. COVARIANCE
83
7.2
Covariance
If the statistical properties of the two random variables X1 and X2 are described by their joint probability density function fX1 X2 (x1 , x2 ), then we have E [g(X1 , X2 )] =
n k For g(X1 , X2 ) = X1 X2 n k µn,k := E X1 X2 = ∞ −∞ ∞ −∞ ∞ −∞ ∞ −∞
g(x1 , x2 ) · fX1 X2 (x1 , x2 )dx1 dx2
xn xk fX1 X2 (x1 , x2 )dx1 dx2 , n, k = 0, 1, 2, . . . 1 2
k n are called the joint moments of X1 and X2 . Clearly µ0k = E X2 , while µn0 = E [X1 ]. The second order moment E [X1 X2 ] of X1 and X2 we denote1 by rX1 X2 ,
rX1 X2 = E [X1 X2 ] =
∞ −∞
∞ −∞
x1 x2 fX1 X2 (x1 , x2 )dx1 dx2
If rX1 X2 = E [X1 X2 ] = E [X1 ] · E [X2 ], then X1 and X2 are known to be uncorrelated. If rX1 X2 = 0, then X1 and X2 are orthogonal. The moments µ′ , k = 0, 1, 2, . . . are known to be joint central moments and deﬁned by nk
n k µ′ nk = E (X1 − E [X1 ]) (X2 − E [X2 ]) )
=
∞
∞
−∞
−∞
(x1 − E [X1 ])n (x2 − E [X2 ])k fX1 X2 (x1 , x2 )dx1 dx2 , n, k = 0, 1, 2, . . .
The second order central moments
2 µ′ 20 = σX1 2 µ′ 02 = σX2
are the variances of X1 and X2 , respectively. The joint moment µ′ is called covariance of X1 11 and X2 and denoted by cX1 X2 c X1 X2 = E [(X1 − E [X1 ])(X2 − E [X2 ])] =
∞ −∞
(x1 − E [X1 ])(x2 − E [Y2 ]) · fX1 X2 (x1 , x2 )dx1 dx2
Clearly cX1 X2 = rX1 X2 − E [X1 ] E [X2 ] . If two random variables are independent, i.e. fX1 X2 (x1 , x2 ) = fX1 (x1 )fX2 (x2 ), then cX1 X2 = 0. The converse however is not always true (except for the Gaussian case). The normalized secondc X X2 order moment ρ = µ′ / µ′ µ′ = σX 1σX is known as the correlation coeﬃcient of X1 and X2 . 11 20 02 1 2 It can be shown that −1 ≤ ρ ≤ 1 We will show that two uncorrelated random variables are not necessarily independent.
In the engineering literature rX1 X2 is often called correlation. We reserve this term to describe the normalized covariance as it is common in the statistical literature.
1
84
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Example 7.2.1 U is uniformly distributed on [−π, π) X1 = cos U, X2 = sin U
2 2 X1 and X2 are not independent because X1 + X2 = 1
cXY
= E [(X1 − E [X1 ])(X2 − E [X2 ])] = E [X1 Y1 ] = E [cos U sin U ] = 1 E [sin 2U ] 2 = 0
Thus X1 and X2 are uncorrelated but not independent. Example 7.2.2 Let Xi , i = 1, . . . , N be random variables and αi , i = 1, . . . , N be scalar values. We calculate the mean and the variance of
N
Y =
i=1
αi Xi
given the means and variances (and covariances) of Xi .
N
E [Y ] =
i=1
αi E [Xi ]
N N
2 σY
= E (Y − E [Y ])
2
=
i=1 j=1
αi αj cXi Xj
For the special case cXi Xj = 0, i = j
N 2 σY = i=1 2 2 αi σXi .
7.3
Stochastic Processes (Random Processes)
Random processes naturally arise in many engineering applications. For example, the voltage waveform of a noise source or the bit stream of a binary communications signal are continuoustime random processes. The monthly rainfall values or the daily stock market index are examples of discretetime random processes. However, random sequences are most commonly encountered when sampling continuoustime processes, as discussed in the preceding chapter. The remainder of the manuscript focuses solely on discretetime random processes. However the results presented herein apply in a similar fashion to the continuous time case. Deﬁnition 7.3.1 A real random process is an indexed set of real functions of time that has certain statistical properties. We characterise a real random process by a set of real functions and associated probability description. In general, xi (n) = X(n, ζi ), n ∈ Z, ζi ∈ S, denotes the waveform that is obtained when the event ζi of the process occurs. xi (n) is called a sample function of the process (see Figure 7.7). The set of all possible sample functions {X(n, ζi )} is called the ensemble and deﬁnes the
7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES) xi−1 (n) = X(n, ζi−1 ) . . .
85
n
xi (n) = X(n, ζi )
n
xi+1 (n) = X(n, ζi+1 )
n n1 X1 = X(n1 ) = X(n1 , ζ) . . . n2 X2 = X(n2 ) = X(n2 , ζ)
Figure 7.7: Voltage waveforms emitted from a noise source. random process X(n) that describes a noise source. A random process can be seen as a function of two variables, time n and elementary event ζ. When n is ﬁxed, X(n, ζ) is simply a random variable. When ζ is ﬁxed, X(n, ζ) = x(n) is a function of time known as “sample function” or realisation. If n = n1 then X(n1 , ζ) = X(n1 ) = X1 is a random variable. If the noise source has a Gaussian distribution, then any of the random variables would be described by fXi (xi ) = 1
2 2πσi
·e
−
(xi −µi )2 2σ 2 i
where Xi = X(ni , ζ). Also, all joint distributions are multivariate Gaussian. At each point in time, the value of the waveform is a random variable described by its probability density function at that time n, fX (x; n). The probability that the process X(n) will have a value in the range [a, b] at n is given by
b
P({a ≤ X(n) ≤ b}) =
fX (x; n)dx
a
86
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
The ﬁrst moments of X(n) are µ1 (n) = E [X(n)] µ2 (n) = E X(n)2 . . . µl (n) = E X(n)l = = . . . =
∞ −∞ ∞ −∞ ∞ −∞
x fX (x; n)dx x2 fX (x; n)dx
xl fX (x; n)dx,
where µ1 (n) is the mean of X(n), and µ2 (n)−µ1 (n)2 is the variance of X(n). Because in general, fX (x; n) depends on n, the moments will depend on time. The probability that a process X(n) lies in the interval [a, b] at time n1 and in the interval [c, d] at time n2 is given by the joint density function fX1 X2 (x1 , x2 ; n1 , n2 ),
b d c
P ({a ≤ X1 ≤ b, c ≤ X2 ≤ d}) =
a
fX1 X2 (x1 , x2 ; n1 , n2 )dx1 dx2 .
A complete description of a random process requires knowledge of all such joint densities for all possible point locations and orders fX1 X2 ...XN (x1 , . . . , xN ; n1 , . . . , nN ). • A continuous random process consists of a random process with associated continuously distributed random variables, X(n), n ∈ Z. For example, noise in communication circuits at a given time; the voltage measured can have any value. • A discrete random process consists of a random process with associated discrete random variables, e.g. the output of an ideal limiter.
7.3.1
Stationarity and Ergodicity
Deﬁnition 7.3.2 A random process X(n), n ∈ Z is said to be stationary to the order N if, for any n1 , n2 , . . . nN , and x1 , x2 , . . . , xN , and any n0 FX (x1 , . . . , xN ; n1 , . . . , nN ) = P ({X(n1 ) ≤ x1 , X(n2 ) ≤ x2 , . . . , X(nN ) ≤ xN }) = = FX (x1 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) holds or equivalently: fX (x1 , x2 , . . . , xN ; n1 , . . . , nN ) = fX (x1 , x2 , . . . , xN ; n1 + n0 , . . . , nN + n0 ) . The process is said to be strictly stationary if it is stationary to the inﬁnite order. If a process is stationary, it can be translated in time without changing its statistical description. Ergodicity is a topic dealing with the relationship between statistical averages and time averages. Suppose that, for example, we wish to determine the mean µ1 (n) of a process X(n). For this purpose, we observe a large number of samples X(n, ζi ) and we use their ensemble average as the estimate of µ1 (n) = E [X(n)] µ1 (n) = ˆ 1 L
L
P({X(n1 + n0 ) ≤ x1 , X2 (n2 + n0 ) ≤ x2 , . . . , X(nN + n0 ) ≤ xN })
X(n, ζi )
i=1
7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)
87
Suppose, however, that we have access only to a single sample x(n) = X(n, ζ) of X(n) for −N ≤ n ≤ N . Can we use then the time average 1 X(n) = lim N →∞ 2N + 1
N
X(n, ζ)
n=−N
as the estimate of µ1 (n)? This is, of course, impossible if µ1 (n) depends on n. However, if µ1 (n) = µ1 is a constant, then under the general conditions, X(n) approaches µ1 . Deﬁnition 7.3.3 A random process is said to be ergodic if all time averages of any sample function are equal to the corresponding ensemble averages (expectation). Example 7.3.1 DC and RMS values are deﬁned in terms of time averages, but if the process is ergodic, they may be evaluated by use of ensemble averages. The DC value of X(n) is: Xdc = E [X(n)] Xrms = E [X(n)2 ].
If X(n) is ergodic, the time average is equal to the ensemble average, that is Xdc = lim 1 N →∞ 2N + 1
N
X(n) = E [X(n)] =
n=−N
∞ −∞
xfX (x; n)dx = µX
Similarly, we have for the rms value: Xrms = 1 N →∞ 2N + 1 lim
N
X(n)2 =
n=−N
E [X(n)2 ] =
2 σX + µ2 = X
∞ −∞
x2 fX (x; n)dx
2 where σX is the variance of X(n).
Ergodicity and Stationarity If a process is ergodic, all time and ensemble averages are interchangeable. The time average is by deﬁnition independent of time. As it equals the ensemble averages (such as moments), the latter are also independent of time. Thus, an ergodic process must be stationary. But not all stationary processes are ergodic. Example 7.3.2 An Ergodic Random Process Let the random process be given by: X(n) = A cos(ω0 n + Φ), where A and ω0 are constants and Φ is a random variable that is uniformly distributed over [0, 2π). First we evaluate some ensemble averages (expectation). E [X(n)] = =
0 ∞ −∞ 2π
x(ϕ) fΦ (ϕ) dϕ A cos(ω0 n + ϕ) 1 dϕ 2π
= 0.
88 Furthermore,
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
E X(n)2
2π
=
0
A2 cos(ω0 n + ϕ)2
1 dϕ 2π
A2 = 2 In this example, the time parameter n disappeared when the ensemble averages were evaluated. The process is stationary up to the second order. Now, evaluate the corresponding time averages using a sample function of the random process. X(n, ζ1 ) = A cos ω0 n that occurs when Φ = 0. The zero phase Φ = 0 corresponds to one of the events. The time average for any of the sample functions can be calculated by letting Φ be any value between 0 and 2π. The time averages are: 1 • First moment: X(n) = lim N →∞ 2N + 1 • Second moment: X(n)2
N
A cos(ω0 n + ϕ) = 0
n=−N N
1 = lim N →∞ 2N + 1
[A cos(ω0 n + ϕ)]2 =
n=−N
A2 2
In this example, ϕ disappears when the time average is calculated. We see that the time average is equal to the ensemble average for the ﬁrst and second moments. We have not yet proved that the process is ergodic because we have to evaluate all the orders of moments and averages. In general, it is diﬃcult to prove that a process is ergodic. In this manuscript we will focus on stationary processes. Example 7.3.3 Suppose that the amplitude in the previous example is now random. Let us assume that A and Φ are independent. In this case, the process is stationary but not ergodic because if E A2 = σ 2 , then E X(n)2 = E A2 · E cos(ω0 n + Φ)2 = but 1 lim T →∞ 2N + 1 Summary: • An ergodic process has to be stationary. • However, if a process is stationary, it may or may not be ergodic.
N n=−N
σ2 2
1 X(n)2 = A2 2
7.3.2
SecondOrder Moment Functions and WideSense Stationarity
Deﬁnition 7.3.4 The secondorder moment function (SOMF) of a real stationary process X(n) is deﬁned as: rXX (n1 , n2 ) = E [X(n1 )X(n2 )] = where X1 = X(n1 ) and X2 = X(n2 ).
∞ −∞ ∞ −∞
x1 x2 fX1 X2 (x1 , x2 ; n1 , n2 ) dx1 dx2
7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES)
89
If the process is stationary to the second order, the SOMF is only a function of the time diﬀerence κ = n2 − n1 , and rXX (κ) = E [X(n + κ)X(n)] Deﬁnition 7.3.5 A random process is said to be widesense stationary if: 1. E [X(n)] is a constant. 2. rXX (n1 , n2 ) = rXX (κ), where κ = n2 − n1  A process that is stationary to order 2 or greater is certainly widesense stationary. However, a ﬁnite order stationary process is not necessarily stationary. Properties of the SOMF
2 1. rXX (0) = E X(n)2 = σX + µ2 is the second order moment X
2. rXX (κ) = rXX (−κ) 3. rXX (0) ≥ rXX (κ), κ > 0 Proofs for 1 and 2 follow from the deﬁnition. The third property holds because E (X(n + κ) ± X(n))2 ≥ 0 E X(n + κ)2 ± 2E [X(n + κ)X(n)] + E X(n)2 ≥ 0 rXX (0) ± 2 rXX (κ) + rXX (0) ≥ 0. Deﬁnition 7.3.6 The “crossSOMF” for two real processes X(n) and Y (n) is: rXY (n1 , n2 ) = E [X(n1 )Y (n2 )] If X(n) and Y (n) are jointly stationary, the crossSOMF becomes: rXY (n1 , n2 ) = rXY (n1 − n2 ) = rXY (κ) where κ = n1 − n2 . Properties of the CrossSOMF of a Jointly Stationary Real Process 1. rXY (−κ) = rY X (κ) 2. rXY (κ) ≤ This holds because rXX (0)rY Y (0). E (X(n + κ) − aY (n))2 ≥ 0 ∀a ∈ R rXX (0) + a2 rY Y (0) − 2arXY (κ) ≥ 0 rXY (κ)2 − rXX (0)rY Y (0) ≤ 0
E X(n + κ)2 + a2 E Y (n)2 − 2aE [X(n + κ)Y (n)] ≥ 0 ∀a ∈ R This quadratic form in a is valid if the roots are not real (except as a double real root). Therefore
90
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES rXY (κ)2 ≤ rXX (0)rY Y (0) ⇒ rXY (κ) ≤ rXX (0)rY Y (0)
Sometimes
rXY (κ) rXX (0)rY Y (0)
is denoted by ρXY (κ). 3. rXY (κ) ≤ 1 [rXX (0) + rY Y (0)] 2
Property 2 constitutes a tighter bound than that of property 3, because the geometric mean of two positive numbers cannot exceed the arithmetic mean, that is rXX (0)rY Y (0) ≤ 1 [rXX (0) + rY Y (0)] 2
• Two random processes X(n) and Y (n) are said to be uncorrelated if: rXY (κ) = E [X(n + κ)] E [Y (n)] = µX µY ∀κ ∈ Z • Two random processes X(n) and Y (n) are orthogonal if: rXY (κ) = 0, ∀κ ∈ Z • If X(n) = Y (n), the crossSOMF becomes the SOMF. • If the random processes X(n) and Y (n) are jointly ergodic, the time average may be used to replace the ensemble average. If X(n) and Y (n) are jointly ergodic
1 rXY (κ) = E [X(n + κ)Y (n)] = E [X(n)Y (n − κ)] ≡ lim N →∞ 2N + 1
△
N n=−N
X(n) Y (n − κ) (7.4)
In this case, crossSOMFs and autoSOMFs may be measured by using a system composed of a delay element, a multiplier and an accumulator. This is illustrated in Figure 7.8, where the block z −κ is a delay line of order κ. X(n)
c
Y (n)
E
z −κ
E
Y (n − κ)
×
E
1 2N + 1
N n=−N
E
rXY (κ)
Figure 7.8: Measurement of SOMF.
7.3. STOCHASTIC PROCESSES (RANDOM PROCESSES) Central SecondOrder Moment Function (Covariance Function) The central secondorder moment functionis called the covariance function and is deﬁned by cXX (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(X(n) − E [X(n)])] which can be rewritten in the form cXX (n + κ, n) = rXX (n + κ, n) − E [X(n + κ)] E [X(n)] The covariance function for two processes X(n) and Y (n) is deﬁned by cXY (n + κ, n) = E [(X(n + κ) − E [X(n + κ)])(Y (n) − E [Y (n)])] or equivalently cXY (n + κ, n) = rXY (n + κ, n) − E [X(n + κ)] E [Y (n)] For processes that are at least jointly wide sense stationary, we have cXX (κ) = rXX (κ) − (E [X(n)])2 and cXY (κ) = rXY (κ) − E [X(n)] E [Y (n)]
91
The variance of a random process is obtained from cXX (κ) by setting κ = 0. If the process is stationary 2 σX = E (X(n) − E [X(n)])2 = rXX (0) − (E [X(n)])2 = rXX (0) − µ2 X
For two random processes, if cY X (n, n + κ) = 0, then they are called uncorrelated, which is equivalent to rXY (n + κ, n) = E [X(n + κ)] E [Y (n)]. Complex Random Processes Deﬁnition 7.3.7 A complex random process is Z(n) = X(n) + j Y (n) where X(n) and Y (n) are real random processes. A complex process is stationary if X(n) and Y (n) are jointly stationary, i.e. fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 , . . . , n2N ) = fZ (x1 , . . . , xN , y1 , . . . , yN ; n1 + n0 , . . . , n2N + n0 )
△
for all x1 , . . . , xN , y1 , . . . , yN and all n1 , . . . , n2N , n0 and N . The mean of a complex random process is deﬁned as: E [Z(n)] = E [X(n)] + jE [Y (n)] Deﬁnition 7.3.8 The SOMF for a complex random process is: rZZ (n1 , n2 ) = E [Z(n1 )Z(n2 )∗ ]
92
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Deﬁnition 7.3.9 The complex random process is stationary in the wide sense if E [Z(n)] is a complex constant and rZZ (n1 , n2 ) = rZZ (κ) where κ = n2 − n1 . The SOMF of a widesense stationary complex process has the symmetry property: rZZ (−κ) = rZZ (κ)∗ as it can be easily seen from the deﬁnition. Deﬁnition 7.3.10 The crossSOMF for two complex random processes Z1 (n) and Z2 (n) is: rZ1 Z2 (n1 , n2 ) = E [Z1 (n1 )Z2 (n2 )∗ ] If the complex random processes are jointly widesense stationary, the crossSOMF becomes: rZ1 Z2 (n1 , n2 ) = rZ1 Z2 (κ) where κ = n2 − n1  The covariance function of a complex random process is deﬁned as cZZ (n + κ, n) = E [(Z(n + κ) − E [Z(n + κ)]) (Z(n) − E [Z(n)])∗ ] and is a function of κ only if Z(n) is widesense stationary. The crosscovariance function of Z1 (n) and Z2 (n) is given as: cZ1 Z2 (n + κ, n) = E [(Z1 (n + κ) − E [Z1 (n + κ)]) (Z2 (n) − E [Z2 (n)])∗ ] Z1 (n) and Z2 (n) are called uncorrelated if cZ1 Z2 (n + κ, n) = 0. They are orthogonal if rZ1 Z2 (n + κ, n) = 0. Example 7.3.4 A complex process Z(n) is comprised of a sum of M complex signals
M
Z(n) =
m=1
Am ej(ω0 n+Φm )
Here ω0 is constant, Am is a zeromean random amplitude of the mth signal and Φm is its random phase. We assume all variables Am and Φm for m = 1, 2, . . . , M to be statistically independent and the random phases uniformly distributed on [0, 2π). Find the mean and SOMF. Is the process wide sense stationary ?
M
E [Z(n)] =
m=1 M
E [Am ] ejω0 n E ejΦm E [Am ] ejω0 n {E [cos(Φm )] + jE [sin(Φm )]}
=
m=1
= 0 rZZ (n + κ, n) = E [Z(n + κ)Z(n)∗ ]
M M
= E
m=1 M M
Am ejω0 (n+κ) ejΦm ·
Al e−jω0 n e−jΦl
l=1
=
m=1 l=1
E [Am Al ] · ejω0 κ · E ej(Φm −Φl )
7.4. POWER SPECTRAL DENSITY Since E ej(Φm −Φl ) = The former holds because for m = l, E ejΦm −Φl = E ejΦm E e−jΦl = 0 Thus if m = l,
M M
93 1 m=l 0 m=l
rZZ (n + κ, n) =
m=1
E
A2 m
e
jω0 κ
=e
jω0 κ m=1
E A2 m
7.4
7.4.1
Power Spectral Density
Deﬁnition of Power Spectral Density
Let X(n, ζi ), n ∈ Z, represent a sample function of a real random process X(n, ζ) ≡ X(n). We deﬁne a truncated version of this sample function as follows: xN (n) = X(n, ζi ), 0, n ≤ M, M ∈ Z+ elsewhere
where N = 2M + 1 is the length of the truncation window. As long as M is ﬁnite, we presume that xN (n) will satisfy
M n=−M
xN (n) < ∞
Then the Fourier transform of xN (n) becomes: XN (ejω ) = =
n=−M ∞ n=−∞ M
xN (n) e−jωn X(n, ζi ) e−jωn
The normalised energy in the time interval [−M, M ] is
M
EN =
n=−M
xN (n)2 =
π −π
XN (ejω )
2
dω 2π
where Parseval’s theorem has been used to obtain the second integral. By dividing the expression EN by N we obtain the average power PN in xN (n) over the interval [−M, M ]: PN = 1 2M + 1
M
xN (n)2 =
n=−M
π −π
XN (ejω )2 dω 2M + 1 2π
(7.5)
From Equation (7.5), we see that the average power PN , in the truncated ensemble member xN (n), can be obtained by the integration of XN (ejω )2 /N . To obtain the average power for the entire sample function, we must consider the limit as M → ∞. PN provides a measure of power for a single sample function. If we wish to determine the average power of the random process,
94
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
we must also take the average (expected value) over the entire ensemble of sample functions. The average power of the process X(n, ζ) is deﬁned as PXX = lim 1 M →∞ 2M + 1
M
E X(n, ζ)2 = lim
n=−M
π
M →∞ −π
E XN (ejω , ζ)2 dω 2M + 1 2π
Remark 1: The average power PXX in a random process X(n) is given by the time average of its second moment. Remark 2: If X(n) is widesense stationary, E X(n)2 is a constant and PXX = E X(n)2 . PXX can be obtained by a frequency domain integration. If we deﬁne the power spectral density (PSD) of X(n) by 2 E XN (ejω , ζ) SXX (ejω , ζ) = lim M →∞ 2M + 1 where
M
XN (ejω , ζ) =
n=−M
X(n, ζ) e−jωn
then the normalised power is given by
π
PXX =
−π
SXX (ejω )
dω = rXX (0) 2π
Example 7.4.1 Consider the random process X(n) = A · cos(ω0 n + Φ) where A and ω0 are real constants and Φ is a random variable uniformly distributed on [0, π/2). We shall ﬁnd PXX for X(n) using
M
PXX = lim The meansquared value of X(n) is E X(n)2
M →∞
n=−M
E X(n)2 . 2M + 1
= E A2 cos (ω0 n + Φ)2 = E = = = = = A2 A2 + cos (2ω0 n + 2Φ) 2 2
π/2
A2 A2 + 2 2 A2 2 + A2 2
cos (2ω0 n + 2ϕ)
0
2 dϕ π 2 π
sin (2ω0 n + 2ϕ) 2
π/2 0
A2 A2 sin (2ω0 n + π) − sin (2ω0 n) 2 + 2 2 2 π 2 2 2 −2 sin (2ω n) A A 0 + 2 2 π 2 A2 A2 − sin (2ω0 n) 2 π
7.4. POWER SPECTRAL DENSITY
95
X(n) is non stationary because E X(n)2 is a function of time. The time average of the above expression is M 1 A2 A2 A2 lim − sin(2ω0 n) = M →∞ 2M + 1 2 π 2
n=−M
Using
π
PXX = lim yields PXX =
A2 2
M →∞ −π
E XN (ejω )2 2M + 1
dω 2π
which agrees with the above, which should be shown by the reader as an exercise.
The results derived for the PSD can be obtained similarly for two processes X(n) and Y (n), say. The crosspower density spectrum is then given by SXY (ejω ) = lim and thus the total average cross power is
π
E XN (ejω )YN (ejω )∗ M →∞ 2M + 1 dω 2π
PXY =
−π
SXY (ejω )
7.4.2
Relationship between the PSD and SOMF: WienerKhintchine Theorem
Theorem 7.4.1 WienerKhintchine Theorem When X(n) is a widesense stationary process, the PSD can be obtained from the Fourier Transform of the SOMF: SXX (ejω ) = F {rXX (κ)} = and conversely, rXX (κ) = F −1 SXX (ejω ) =
∞ κ=−∞ π −π
rXX (κ) e−jωκ
SXX (ejω ) ejωκ
dω 2π
Consequence: there are two distinctly diﬀerent methods that may be used to evaluate the PSD of a random process: 1. The PSD is obtained directly from the deﬁnition SXX (e ) = lim where XN (ejω ) =
n=−M jω
E XN (ejω ) 2M + 1
2
M →∞ M
X(n) e−jωn
2. The PSD is obtained by evaluating the Fourier transform of rXX (κ), where rXX (κ) has to be obtained ﬁrst.
96
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
7.4.3
Properties of the PSD
The PSD has a number of important properties: 1. The PSD is always real, even if X(n) is complex SXX (e )
jω ∗ ∞ κ=−∞ ∞ κ=−∞ ∞ κ=−∞
= = =
rXX (κ)∗ e+jωκ rXX (−κ)e+jωκ rXX (κ′ )e−jωκ
′
= SXX (ejω ) 2. SXX (ejω ) ≥ 0, even if X(n) is complex. 3. When X(n) is real, SXX (e−jω ) = SXX (ejω ). 4.
π −π
SXX (ejω ) dω = PXX is the total normalised average power. 2π X(n) = A sin(ω0 n + Φ)
Example 7.4.2 where A, ω0 are constants and Φ is uniformly distributed on [0, 2π). The mean of X(n) is given by 2π 1 E [X(n)] = A sin(ω0 n + ϕ) dϕ = 0 2π 0 and its SOMF is A2 cos(ω0 κ) rXX (κ) = E [X(n + κ)X(n)] = 2 By taking the Fourier transform of rXX (κ) we obtain the PSD SXX (ejω ) = F{rXX (κ)} = where η(ω) =
∞ k=−∞
A2 π {η(ω − ω0 ) + η(ω + ω0 )} 2 δ(ω + k2π)
Example 7.4.3 Suppose that the random variables Ai are uncorrelated with zero mean and 2 variance σi , i = 1, . . . , P . Deﬁne
P
Z(n) =
i=1
Ai exp{jωi n}.
It follows that
P
rZZ (κ) =
i=1
2 σi exp{jωi κ}
and SZZ (e ) =
jω
P i=1 2 σi η(ω − ωi )
7.4. POWER SPECTRAL DENSITY
97
Example 7.4.4 Suppose now that the random variables Ai and Bi , i = 1, . . . , P are pairwise uncorrelated with zero mean and common variance
2 2 E A2 = E Bi = σi , i
i = 1, . . . , P.
Deﬁne
P
X(n) =
i=1
[Ai cos(ωi n) + Bi sin(ωi n)] .
We ﬁnd
P
rXX (κ) =
i=1
2 σi cos(ωi κ)
and therefore
P
SXX = π
i=1
2 σi [η(ω − ωi ) + η(ω + ωi )]
Deﬁnition 7.4.1 A random process X(n) is said to be a “white noise process” if the PSD is constant over all frequencies: N0 SXX (ejω ) = 2 where N0 is a positive constant. The SOMF for a white process is obtained by taking the inverse Fourier transform of SXX (ejω ). rXX (κ) = N0 δ(κ) 2
where δ(κ) is the Kronecker delta function given by δ(κ) = 1, 0, κ=0 κ=0
7.4.4
CrossPower Spectral Density
Suppose we have observations xN (n) and yN (n) of a random process X(n) and Y (n), respectively. Let XN (ejω ) and YN (ejω ) be their respective Fourier transforms, i.e.
M
XN (ejω ) =
n=−M
xN (n)e−jωn
and YN (ejω ) =
M
yN (n)e−jωn
n=−M
The crossspectral density function is given as SXY (ejω ) = lim E XN (ejω )YN (ejω )∗ M →∞ 2M + 1
98
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Substituting XN (ejω ) and YN (ejω ): 1 SXY (e ) = lim M →∞ 2M + 1
jω M M
rXY (n1 , n2 )e−jω(n1 −n2 )
n1 =−M n2 =−M
Taking the inverse Fourier transform of both sides
π −π
SXY (ejω )ejωκ
dω 2π
=
1 −π M →∞ 2M + 1 lim
M
π
M
M
rXY (n1 , n2 )e−jω(n1 −n2 ) ejωκ
n1 =−M n2 =−M M π
dω 2π
1 = lim M →∞ 2M + 1 1 = lim M →∞ 2M + 1 1 = lim M →∞ 2M + 1
rXY (n1 , n2 )
n1 =−M n2 =−M M M −π
ejω(n2 +κ−n1 )
dω 2π
n1 =−M n2 =−M M
rXY (n1 , n2 )δ(n2 + κ − n1 ) (7.6)
rXY (n2 + κ, n2 )
n2 =−M
provided −M ≤ n2 + κ ≤ M so that the delta function is in the range of the summation. This condition can be relaxed in the limit as M → ∞. The above result shows that the time average of the SOMF and the PSD form a Fourier transform pair. Clearly, if X(n) and Y (n) are jointly stationary then this result also implies SXY (e ) =
jω ∞ κ=−∞
rXY (κ)e−jωκ
Deﬁnition 7.4.2 For two jointly stationary processes X(n) and Y (n) the crosspower spectral density is given by SXY (ejω ) = F{rXY (κ)} = and is in general complex. Properties: 1. SXY (ejω ) = SY X (ejω )∗ . SY X (e−jω ). In addition, if X(n) and Y (n) are real, then SXY (ejω ) =
∞
rXY (κ)e−jωκ
κ=−∞
2. Re SXY (ejω ) and Re SY X (ejω ) are even functions of ω if X(n) and Y (n) are real. 3. Im SXY (ejω ) and Im SY X (ejω ) are odd functions of ω if X(n) and Y (n) are real. 4. SXY (ejω ) = 0 and SY X (ejω ) = 0 if X(n) and Y (n) are orthogonal.
7.5. DEFINITION OF THE SPECTRUM Property 1 results from the fact that: SY X (e ) = = = =
jω ∞ κ=−∞ ∞ κ=−∞ ∞
99
rY X (κ)e−jωκ rXY (−κ)∗ e−jωκ
−jωκ ∗
rXY (κ)e κ=−∞ SXY (ejω )∗
Example 7.4.5 Let the SOMF function of two processes X(n) and Y (n) be rXY (n + κ, n) = AB [sin(ω0 κ) + cos(ω0 [2n + κ])] 2
where A, B and ω0 are constants. We ﬁnd 1 M →∞ 2M + 1 lim
M
rXY (n + κ, n) =
n=−M
1 AB AB sin(ω0 κ) + lim 2 2 M →∞ 2M + 1
M
cos(ω0 [2n + κ])
n=−M =0
so using the result of Equation (7.6) SXY (ejω ) = F = AB sin(ω0 κ) 2 π AB [η(ω − ω0 ) − η(ω + ω0 )] j 2
7.5
Deﬁnition of the Spectrum
The PSD of a stationary random process has been deﬁned as the Fourier transform of the SOMF. We now deﬁne the ‘spectrum’ of a stationary random process X(n) as the Fourier transform of the covariance function: CXX (ejω ) :=
∞ n=−∞
cXX (n)e−jωn
If n cXX (n) < ∞, CXX (ejω ) exists and is bounded and uniformly continuous. The inversion is given by cXX (n) = 1 2π
π −π
CXX (ejω )ejωn dω
CXX (ejω ) is real, 2π periodic and nonnegative. The crossspectrum of two jointly stationary random processes X(n) and Y (n) is deﬁned by CXY (e ) :=
jω ∞ n=−∞
cXY (n)e−jωn
100 with the property
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
CXY (ejω ) = CY X (ejω )∗ The inversion is given by cXY (n) = If X(n) and Y (n) are real, 1 2π
π −π
CXY (ejω )ejωn dω
CXX (ejω ) = CXX (e−jω ) CXY (ejω ) = CXY (e−jω )∗ = CY X (e−jω ) = CY X (ejω )∗ The spectrum of a real process is completely speciﬁed on the interval [0, π].
7.6
7.6.1
Random Signals and Linear Systems
Convergence of Random Variables
In the following we consider deﬁnitions of convergence of a sequence of random variables Xk , k = 0, 1, 2, . . . . They are given in order of power. 1. Convergence with probability one or almost surely convergence A variable Xk converges to X with probability 1 if P
k→∞
lim Xk − X = 0
=1
2. Convergence in the Mean Square Sense A variable Xk converges to X in the MSS if
k→∞
lim E Xk − X2 = 0
3. Convergence in Probability Xk converges in probability to X if ∀ ǫ > 0
k→∞
lim P (Xk − X > ǫ) = 0
4. Convergence in Distribution Xk converges in distribution to X (or Xk is asymptotically FX distributed) if
k→∞
lim FXk (x) = FX (x)
for all continuous points of FX . The convergences 1 to 4 are not equally weighted. i) Convergence with probability 1 implies convergence in probability. ii) Convergence with probability 1 implies convergence in the MSS, provided second order moments exist. iii) Convergence in the MSS implies convergence in probability. iv) Convergence in probability implies convergence in distribution. With the concepts of convergence one may deﬁne continuity, diﬀerentiability and so on.
7.6. RANDOM SIGNALS AND LINEAR SYSTEMS
101
7.6.2
Linear ﬁltering
We are given two stationary processes X(n) and Y (n) and a linear timeinvariant (LTI) system with impulse response h(n), where the ﬁlter is stable ( h(n) < ∞) as in ﬁgure 7.9. Then for such a system Y (n) =
k
h(k)X(n − k) =
M k=−N
k
h(n − k)X(k)
exists with probability one i.e. for N, M → ∞.
h(k)X(n − k) converges with probability one to Y (n)
X(n)
h(n)
Y(n)
Figure 7.9: A linear ﬁlter. If X(n) is stationary and E [X(n)] < ∞, then Y (n) is stationary. If X(n) is white noise, 2 2 i.e. E [X(n)] = 0, E [X(n)X(k)] = σX δ(n − k), with σX the power of the noise and δ(n) the Kronecker delta function δ(n) = 1, n = 0 0, else H(ejω )2 dω < ∞,
Y (n) is called a linear process. If we do not require the ﬁlter to be stable but where H(ejω ) =
n
h(n)e−jωn cXX (n) < ∞ then h(n − k)X(k)
is the transfer function of the system, and if for X(n), Y (n) =
k
h(k)X(n − k) =
k
exists in the MSS, i.e,
M k=−N
h(k)X(n − k)
converges in the MSS for N, M → ∞ and Y (n) is stationary in the wide sense with µY E [Y (n)] =
k
h(k)E [X(n − k)] = µX H(ej0 )
102
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Example 7.6.1 Hilbert Transforms −j, 0 < ω < π 0, ω = 0, −π H(ejω ) = j, −π < ω < 0 h(n) =
0, n=0 (1 − cos(πn))/(πn), else
The ﬁlter is not stable, but H(ejω )2 dω < ∞. Y (n) is stationary in the wide sense. For example if X(n) = a cos(ωo n + φ), then Y (n) = a sin(ωo n + φ). Output covariance function and spectrum We now determine the relationship between the covariance function and the spectrum, at the input and at the output. First, let us deﬁne ˜ Y (n) Y (n) − µY =
k
h(k)X(n − k) −
k
h(k)E [X(n − k)]
k
=
k
h(k) (X(n − k) − µX ) =
˜ h(k)X(n − k)
˜ where X(n) X(n) − µX . In the general case, the system impulse response and the input process may be complex. The covariance function of the output is determined according to ˜ ˜ cY Y (κ) = E Y (n + κ)Y (n)∗ = E
k
˜ h(k)X(n + κ − k)
∗
l
˜ h(l)X(n − l)
∗
=
k l
˜ ˜ h(k)h(l) E X(n + κ − k)X(n − l)∗ h(k)h(l)∗ cXX (κ − k + l)
=
k l
Taking the Fourier transform of both sides leads to CY Y (ejω ) = H(ejω )2 CXX (ejω ) and applying the inverse Fourier transform formula yields2
π
(7.7)
cY Y (κ) =
−π
H(ejω )2 CXX (ejω )ejωκ
dω 2π
Example 7.6.2 FIR ﬁlter Let X(n) be white noise with power σ 2 and Y (n) = X(n) − X(n − 1).
2
The integral exists because CXX (ejω ) is bounded and
R
H(ejω )2 dω < ∞.
7.6. RANDOM SIGNALS AND LINEAR SYSTEMS From the above, one can see that n=0 1, −1, n = 1 h(n) = 0, else
∞ n=−∞
103
Taking the Fourier transform leads to H(ejω ) = and thus,
h(n)e−jωn = 1 − e−jω
CY Y (ejω ) = H(ejω )2 CXX (ejω ) = 1 − e−jω 2 σ 2 = 2je−jω/2 sin(ω/2)2 σ 2 = 4 sin(ω/2)2 σ 2
Crosscovariance function and crossspectrum The crosscovariance function of the output is determined according to ˜ ˜ cY X (κ) = E Y (n + κ)X(n)∗ = E
k
˜ ˜ h(k)X(n + κ − k) X(n)∗
=
k
˜ ˜ h(k)E X(n + κ − k)X(n)∗ h(k)cXX (κ − k)
=
k
Taking the Fourier transform of both sides leads to CY X (ejω ) = H(ejω )CXX (ejω ) and applying the inverse Fourier transform formula yields
π
cY X (κ) =
−π
H(ejω )CXX (ejω )ejωκ
dω . 2π
Example 7.6.3 Consider a system as depicted in Figure 7.9 where X(n) is a stationary, zero2 mean white noise process with variance σX . The impulse response of the system is given by h(n) = where u(n) is the unit step function u(n) = 1, n ≥ 0 0, n < 0 1 3 1 2
n
+
2 3
−
1 2
n
u(n)
and we wish to determine CY Y (ejω ) and CY X (ejω ).
104
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES The system transfer function is given by H(ejω ) = F{h(n)} = = 1 1+
1 −jω 4e
1/3 2/3 1 −jω + 1 − 4e 1 + 1 e−jω 2
1 − 8 e−j2ω
The spectrum and crossspectrum are given respectively by CY Y (ejω ) = H(ejω )2 CXX (ejω ) 2 σX = 1 1 1 + 4 e−jω − 8 e−j2ω CY X (ejω ) = H(ejω )CXX (ejω ) 2 σX = 1 1 1 + 4 e−jω − 8 e−j2ω Interaction of Two Linear Systems The previous result may be generalised to obtain the crosscovariance or crossspectrum of two linear systems. Let X1 (n) and Y1 (n) be the input and output processes of the ﬁrst system, with impulse response h1 (n), and X2 (n) and Y2 (n) be the input and output processes of the second system with impulse response h2 (n) (see Figure 7.10). Suppose that X1 (n) and X2 (n) are widesense stationary and the systems are timeinvariant and linear. Then the output crosscovariance function is: cY1 Y2 (κ) =
k l
2
h1 (k)h2 (l)∗ cX1 X2 (κ − k + l)
(7.8)
It can be seen from (7.8) that cY1 Y2 (κ) = h1 (κ) ⋆ h2 (−κ)∗ ⋆ cX1 X2 (κ) where ⋆ denotes convolution. Taking the Fourier transform leads to CY1 Y2 (ejω ) = H1 (ejω )H2 (ejω )∗ CX1 X2 (ejω ) (7.9)
X1 (n) cX1 X2 (κ) CX1 X2 (ejω ) X2 (n)
E h1 (n), H1 (ejω )
E Y1 (n) =
κ h1 (κ)X1 (n
− κ)
cY1 Y2 (κ)
CY1 Y2 (ejω )
E h2 (n), H2 (ejω ) E Y2 (n) =
κ h2 (κ)X2 (n
− κ)
Figure 7.10: Interaction of two linear systems
7.6. RANDOM SIGNALS AND LINEAR SYSTEMS
105
Example 7.6.4 Consider the system shown in Figure 7.11 where Z(n) is stationary, zeromean 2 white noise with variance σZ . Let h(n) be the same system as discussed in Example 7.6.3 and g(n) is given by 1 1 g(n) = δ(n) + δ(n − 1) − δ(n − 2) 2 2 where δ(n) is the Kronecker delta function. We wish to ﬁnd the crossspectrum CXY (ejω ). G(ejω ) =
n
g(n)e−jωn
1 1 = 1 + e−jω − e−j2ω 2 2 CXY (ejω ) = H(ejω )G(ejω )∗ CZZ (ejω ) =
1 1 1 + 2 e−jω − 2 e−j2ω 2 1 −j2ω σZ 1 −jω − 8e 1 + 4e
h(n) Z(n) g(n)
X(n)
Y (n)
Figure 7.11: Interaction of two linear systems.
Cascade of linear systems We consider a cascade of L linear systems in serial connection, as illustrated by Figure 7.12. From linear system theory, it is known that a cascade of linear systems can be equivalently represented by a system whose transfer function H(ejω ) is the product of the L individual system functions H(ejω ) = H1 (ejω )H2 (ejω ) · · · HL (ejω ). When a random process is the input to such a system, the spectrum of the output and cross spectrum of the output with the input are given respectively by
L 2 L
CY Y (e ) = CXX (e )
i=1 L
jω
jω
Hi (e ) Hi (ejω )
i=1
jω
= CXX (ejω )
i=1
Hi (ejω )
2
CY X (ejω ) = CXX (ejω )
Example 7.6.5 We consider again the system shown in Figure 7.11 where we wish to ﬁnd the cross spectrum CXY (ejω ). Instead of using Equation (7.9) with X1 (n) = X2 (n) = Z(n) as done
106 X(n)
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES H1 (ejω ) H2 (ejω ) ··· HL (ejω ) Y (n)
Figure 7.12: Cascade of L linear systems.
Y (n)
1 G(ejω )
H(ejω )
X(n)
Figure 7.13: Cascade of two linear systems. in Example 7.6.4, we reformulate the problem as a cascade of two systems as shown in Figure 7.13. Remembering that CY Y (ejω ) = G(ejω )2 CZZ (ejω ) (from Figure 7.11 and Equation (7.7)) we obtain CXY (ejω ) = = H(ejω ) CY Y (ejω ) G(ejω ) H(ejω ) G(ejω )2 CZZ (ejω ) G(ejω )
= H(ejω )G(ejω )∗ CZZ (ejω )
7.6.3
Linear ﬁltered process plus noise
Consider the system shown in Figure 7.14. It is assumed that X(n) is stationary with E [X(n)] = 0, V (n) stationary with E [V (n)] = 0 and cV X (n) = 0 (X(n) and V (n) uncorrelated). It is also assumed that h(n), X(n) and V (n) are real.
V(n)
X(n)
h(n)
+
Y(n)
Figure 7.14: A linear ﬁlter plus noise.
7.6. RANDOM SIGNALS AND LINEAR SYSTEMS The expected value of the output is derived below. Y (n) = E [Y (n)] =
∞ k=−∞ ∞ k=−∞
107
h(k)X(n − k) + V (n) h(k)E [X(n − k)] + E [V (n)] = 0
Output covariance function and spectrum The covariance function of the output is determined according to cY Y (κ) = E [Y (n + κ)Y (n)] = E
k
h(k)X(n + κ − k) + V (n + κ) h(k)h(l)E [X(n + κ − k)X(n − l)]
l
h(l)X(n − l) + V (n)
=
k l
+
k
h(k)E [X(n + κ − k)V (n)] h(l)E [V (n + κ)X(n − l)] h(k)h(l)cXX (κ − k + l) + cV V (κ)
+
l
+ E [V (n + κ)V (n)] =
k l
Applying the Fourier transform to both sides of the above expression yields CY Y (ejω ) = F{cY Y (κ)} = H(ejω )2 CXX (ejω ) + CV V (ejω ) Crosscovariance function and crossspectrum The crosscovariance function of the output is determined according to cY X (κ) = E [Y (n + κ)X(n)] = E
k
h(k)X(n + κ − k) + V (n + κ) X(n)
=
k
h(k)E [X(n + κ − k)X(n)] h(k)cXX (κ − k)
cY X (κ) =
k
Applying the Fourier transform to both sides of the above expression yields CY X (ejω ) = H(ejω )CXX (ejω ) Note that when the additive noise at the output, V (n) is uncorrelated with the input process X(n), it has no eﬀect on the crossspectrum of the input and output.
108
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Example 7.6.6 A complexvalued process X(n) is comprised of a sum of K complex signals
K
X(n) =
k=1
Ak ej(ω0 n+φk )
where ω0 is a constant and Ak is a zeromean random amplitude of the kth signal with variance 2 σk , k = 1, 2, . . . , K. The phase φk is uniformly distributed on [−π, π). The random variables Ak and φk are statistically pairwise independent for all k = 1, . . . , K. Z(n) is the input to a linear, stable timeinvariant system with known impulse response h(n). The output of the system is buried in zeromean noise V (n) of known covariance cV V (n), independent of Z(n) (refer to Figure 7.14). We wish to determine the crossspectrum CY X (ejω ) and the autospectrum CY Y (ejω ). We ﬁrst calculate the covariance function of the input process X(n) E [X(n)] = E
K j(ω0 n+φk ) k=1 Ak e
and since Ak and φk are independent for all k = 1, . . . , K
K
=
k=1
E [Ak ] E ej(ω0 n+φk ) = 0
cXX (n + κ, n) = E [X(n + κ)X(n)∗ ]
K K
= E
K k=1 K
Ak e
j(ω0 (n+κ)+φk ) l=1
Al e−j(ω0 n+φl )
=
k=1 l=1
E [Ak Al ] ejω0 κ E ej(φk −φl ) 1, 0, k=l k=l
Since
E ej(φk −φl ) =
K
cXX (n + κ, n) =
k=1
E A2 ejω0 κ k
K 2 σk = cXX (κ) k=1
= ejω0 κ
CY X (ejω ) = H(ejω )CXX (ejω ) = H(ejω )F {cXX (κ)}
K
= H(e )
k=1 K 2 σk k=1 l
jω
2 σk 2π l
δ(ω + ω0 + 2πl)
=
H(ω − ω0 + 2πl)δ(ω − ω0 + 2πl)
7.6. RANDOM SIGNALS AND LINEAR SYSTEMS CY Y (ejω ) = H(ejω )2 CXX (ejω ) + CV V (ejω )
K
109
=
k=1
2 σk l
H(ω − ω0 + 2πl)2 δ(ω − ω0 + 2πl) + CV V (ejω )
110
CHAPTER 7. RANDOM VARIABLES AND STOCHASTIC PROCESSES
Chapter 8
The Finite Fourier Transform
Let X(0), . . . , X(n−1) be a model for the observations x(0), . . . , x(N −1) of a stationary random process X(n), n = 0, ±1, ±2, . . . . The function XN (ejω ) =
N −1 n=0
X(n)e−jωn ,
−∞ < ω < ∞
is called the ﬁnite Fourier transform. Properties The ﬁnite Fourier transform has some properties which are listed below. 1. XN (ej(ω+2π) ) = XN (ejω ) 2. XN (e−jω ) = XN (ejω )∗ ∀ X(n) ∈ R 3. The ﬁnite Fourier transform of aX(n) + bY (n) is aXN (ejω ) + bYN (ejω ), where a and b are constants. 4. If y(n) = m h(m)x(n − m), n (1 + n)h(n) < ∞ and X(n) < M , then there exists a constant K > 0 such that YN (ejω ) − H(ejω )XN (ejω ) < K ∀ ω. 5. X(n) = (1/2π)
π −π
XN (ejω )ejωn dω, n = 0, 1, . . . , N − 1, and 0 otherwise. X(n)e−j2πkn/N
The ﬁnite Fourier transform can be taken at discrete frequencies ωk = 2πk/N XN (2πk/N ) = k = 0, 1, . . . , N − 1 ,
which can be computed by means of the Fast Fourier Transform (FFT).
8.1
8.1.1
Statistical Properties of the Finite Fourier Transform
Preliminaries
Real normal distribution: X = (X(0), . . . , X(N − 1))T is said to be normally distributed (X ∼ NN (µ, Σ)) if fX (X) = (2π)−N/2 Σ−1/2 e− 2 (X−µ) 111
1 T Σ−1 (X−µ)
112
CHAPTER 8. THE FINITE FOURIER TRANSFORM
given that µ = E (X(0), . . . , X(N − 1))T , Σ = (Cov [Xi , Xk ])i,k=0,...,N −1 = E (X − µ)(X − µ)T and Σ > 0. Complex normal distribution: A vector X is called complex normally distributed with mean µ C and covariance Σ (X ∼ NN (µ, Σ)) if the 2N vector of real valued random variables ℜX ℑX is a normal vector with mean ℜµ ℑµ and covariance matrix 1 2 ℜΣ −ℑΣ ℑΣ ℜΣ
One should note that Σ is Hermitian and so (ℑΣ)T = −ℑΣ, and E (X − µ)(X − µ)T = 0 If Σ is nonsingular, then the probability density function is
c fX (x) = π −N Σ−1 e−(x−µ)
H
Σ−1 (x−µ)
2 For N = 1, Σ = E [(X − µ)(X − µ)∗ ] is real and identical to σX . Then Cov [ℜX, ℑX] = 0, i.e. 2 /2) and N (ℑX, σ 2 /2) distributed respectively. ℜX and ℑX are independent N1 (ℜX, σX 1 X
Asymptotic Normal Distribution We say that a sequence X(n), n = 1, 2, . . . of N dimensional random vectors is asymptotically −1 C NN (µ(n) , Σ(n) ) distributed if the distribution function of G(n) X(n) − µ(n) converges to C the distribution NN (0, I) as n → ∞. Herein G(n) is a matrix with G(n) [G(n) ]H = Σ(n) . In the following, we assume that X(n) is real and stationary with ﬁnite moments of all orders and absolutely summable cumulants, that is, the statistical dependence between X(n) and X(m) decreases suﬃciently fast as n − m increases, for example, if X(n) is a linear process. Then we have the following properties for the ﬁnite Fourier transform. 1. (a) For large N , XN (ejω ) is distributed as C N1 (0, N CXX (ejω )), ω = kπ N1 (0, N CXX (ejπ )), ω = (2k + 1)π XN (ejω ) ∼ N1 (N µX , N CXX (0)), ω = 2kπ where k is an integer. (b) For pairwise distinct frequencies ωi , i = 1, . . . , M with 0 ≤ ωi ≤ π, the XN (ejωi ) are asymptotically independent variables. This is also valid when the ωi are approximated by 2πki /N → ωi as N → ∞.
This states that the discrete Fourier transformation provides N independent variables at frequencies ωi which are normally distributed with mean 0 or N µX and variance N CXX (ejωi ).
8.1. STATISTICAL PROPERTIES OF THE FINITE FOURIER TRANSFORM
113
2. If we use a window w(t), t ∈ R with w(t) = 0 for t < 0 and t > 1 that is bounded and 1 2 0 w(t) dt < ∞, then for ωi , i = 0, . . . , N − 1 XW,N (ejωi ) =
N −1 n=0
w(n/N )X(n)e−jωi n ,
−∞ < ω < ∞
The above is also valid for ωi ≃ 2πki /N . This states that for large N , the Fourier transform of windowed stationary processes are distributed similarly as nonwindowed processes, except that the variance is multiplied by 1 1 the factor 0 w(t)2 dt and the mean at ωi = 2kπ is multiplied by the factor 0 w(t) dt. 3. If N = M L and X(0), . . . , X(N − 1) are divided into L segments each of length M , then XW,M (ejω , l) =
M −1 m=0
are asymptotically independently distributed as C N1 (0, N 01 w(t)2 dt CXX (ejωi )), ωi = kπ 1 jωi 2 dt C jπ )), XN (e ) ∼ N (0, N 0 w(t) ωi = (2k + 1)π XX (e 1 1 1 N (N µ w(t) dt, N 0 w(t)2 dt CXX (0)), ωi = 2kπ 1 X 0
w(m/M )X(lM + m)e−jω(lM +m)
This states that ﬁnite Fourier transforms of data segments of size M each at ωi are inde1 pendently normally distributed random variable with variance M · CXX (ejωi ) 0 w(t)2 dt. 4. If X(n) is Gaussian then the distributions are exact. 5. Under stronger conditions with respect to the cumulants, one can show that the Fourier √ transform has a rate of growth at most of order N log N . If X(n) is bounded by a constant, M say, then sup XN (ejω ) ≤ M N
ω
for l = 0, . . . , L − 1 are asymptotically independently distributed as C N1 (0, M CXX (ejωi ) 01 w(t)2 dt), ωi = kπ jωi jπ ) 1 w(t)2 dt), XW,M (e , l) ∼ N (0, M CXX (e ωi = (2k + 1)π 0 1 1 1 N (M µ w(t) dt, M CXX (0) 0 w(t)2 dt), ωi = 2kπ 1 X 0
giving growth rate of order N . 6. If Y (n) = ∞ k=−∞ h(k)X(n − k), E [X(n)] = 0 and then with probability 1
∞ n=−∞ (1
+ n)h(n) < ∞, (8.1) (8.2)
YN (ejω ) = H(ejω )XN (ejω ) + O YW,N (ejω ) = H(ejω )XW,N (ejω ) + O as N → ∞
log N , log N N
114 Interpretation
CHAPTER 8. THE FINITE FOURIER TRANSFORM
To 1.(a): Take a broadband noise with a normally distributed amplitude and pass it through an ideal narrow bandpass ﬁlter, H(ejω ) = 1, ω ± ωo  < π/N 0, else .
H(ejω ) 2π/Ν
−ω0
ω0
ω
Figure 8.1: Transfer function of an ideal bandpass ﬁlter. If X(n) is ﬁltered, then Y (n) = Y (n) ≈ h(m)X(n − m). For large N
1 1 XN (2πs/N )e2πsn/N + XN (−2πs/N )e−2πsn/N N N
1 where 2πs/N ≈ ωo , also Y (n) is approximately N1 (0, N CXX (ejωo )) distributed.
To 1.(b):
H(e j ω ) 0 Y(n) −ω0 ω0 ω
X(n)
Y(n) and U(n) are approximately independent
H(e j ω ) 1 U(n) −ω1 ω1 ω
non−overlapping frequency ranges
Chapter 9
Spectrum Estimation
9.1 Use of bandpass ﬁlters
X(n) is a stationary stochastic process with covariance function cXX (n) and spectrum CXX (ejω ). Assume E [X(n)] = 0. If we transform X(n) by Y (n) = X(n)2 then E [Y (n)] = E X(n)2 = cXX (0) =
π −π
CXX (ejω )
dω 2π
This motivates the following construction to estimate the spectrum. The bandpass ﬁlter has the transfer function HB (ejω ) =
0 HB , ω ± ω0  ≤ 0, else , B 2
for ω ≤ π
0 where HB will be speciﬁed later and ω0 is the center frequency of the ﬁlter. The lowpass ﬁlter is an FIR ﬁlter with impulse response
hl (n) =
0,
1 N,
n = 0, 1, 2, . . . , N − 1 otherwise ,
X(n)
V(n) Band−pass filter
U(n) Low−pass filter
Y(n)
Figure 9.1:
115
116
CHAPTER 9. SPECTRUM ESTIMATION
HΒ (ω)
0 HΒ
−ω0
ω0−Β/2 ω0 ω0+Β/2
Figure 9.2: Transfer function of an ideal bandpass ﬁlter
used to calculate the mean. We have HL (e ) = The spectrum of V (n) is given by CV V (ejω ) = HB (ejω )2 CXX (ejω ) The output process is given by Y (n) = The mean of Y (n) is given by µY 1 = E [Y (n)] = N
π N −1 m=0 N −1 m=0 jω
sin( ωN ) −ω N −1 2 2 e N sin(ω/2)
1 1 U (n − m) = N N
N −1 m=0
V (n − m)2
E V (n − m)2 = cV V (0)
=
−π π
CV V (ejω )
dω 2π dω 2π dω 2π
=
−π
HB (ejω )2 CXX (ejω )
ω0 +B/2 ω0 −B/2
0 = 2(HB )2
CXX (ejω )
B 0 ≈ (HB )2 CXX (ejω0 ) π
0 If we choose HB = 1, µY = E [Y (n)] measures the power of X(n) in the frequency band 0 ω − ω0 < B/2. The spectral value CXX (ejω0 ) will be measured if HB = π/B, provided jω ) is continuous at ω = ω . B ≪ ω0 and CXX (e 0 Under the assumption that X(n) is Gaussian, one can show that 2 σY ≈
2π CXX (ejω0 )2 . BN
9.1. USE OF BANDPASS FILTERS
117
0 To measure CXX (ejω0 ), we choose HB so that µY = CXX (ejω0 ). However, the power of the jω0 )2 . output process is proportional to CXX (e To measure CXX (ejω ) in a frequency band, we would need a ﬁlter bank with a cascade of squarers and integrators. In acoustics, one uses such a circuit, where the center frequencies are taken to be ωk+1 = (2ωk )1/3 , k = 1, . . . , K − 1. This is the principle of a spectral analyser, especially when it is used for analysing analog signals. Such a spectral analyser yields the spectrum at frequencies ω1 , . . . , ωk simultaneously. In measurement technology, one uses simpler systems that work with a bandpass having variable center frequency. The signal of the system output gives an estimate of the spectrum in a certain frequency band. The accuracy of the measurement depends on the slowness of changing the center frequency and the smoothness of the spectrum of X(n).
118
CHAPTER 9. SPECTRUM ESTIMATION
Chapter 10
NonParametric Spectrum Estimation
10.1 Consistency of Spectral Estimates
ˆ An estimate CXX (ejω ) of the spectrum CXX (ejω ) is a random variable. If we could design an ideal estimator for the spectrum CXX (ejω ), the pdf fCXX (ejω ) (α) would be a delta function at ˆ ˆ ˆ CXX (ejω ), i.e. the mean E CXX (ejω ) would be CXX (ejω ) and the variance Var CXX (ejω ) would be zero. Because it is not possible to create an ideal estimator, we try to reach this goal asymptotically as the number N of observed samples X(n), n = 0, 1, . . . , N −1 tends to inﬁnity. We require that ˆ ˆ CXX (ejω ) is a consistent estimator for CXX (ejω ), i.e., CXX (ejω ) → CXX (ejω ) in probability as ˆ N → ∞. A suﬃcient condition for consistency is that the mean square error (MSE) of CXX (ejω ) converges to zero as N → ∞ (see 7.6.1), i.e., ˆ M SE CXX (ejω ) = E ˆ CXX (ejω ) − CXX (ejω )
2
→0
as
N → ∞.
ˆ Equivalently, we may say that a suﬃcient condition for CXX (ejω ) to be consistent is that ˆ Var CXX (ejω ) → 0 and because ˆ M SE CXX (ejω ) = E ˆ CXX (ejω ) − CXX (ejω )
2 2
as
N →∞ →0 as N →∞
ˆ ˆ Bias CXX (ejω ) = E CXX (ejω ) − CXX (ejω )
ˆ ˆ = E CXX (ejω )2 − 2E CXX (ejω ) · CXX (ejω ) + CXX (ejω )2 ˆ − E CXX (ejω ) ˆ + E CXX (ejω )
2 2 2
ˆ ˆ = E CXX (ejω )2 − E CXX (ejω )
ˆ + E CXX (ejω ) − CXX (ejω )
2
ˆ ˆ = Var CXX (ejω ) + Bias CXX (ejω ) 119
.
120
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
ˆ If E CXX (ejω ) = CXX (ejω ) the estimator has a bias which results in a systematic error of the ˆ method. On the other hand if Var CXX (ejω ) = 0, we get a random error in the estimator, i.e., the estimator varies from data set to data set.
10.2
The Periodogram
N which suggests 1 the so called Periodogram IXX (ejω ) as an estimator for the spectrum CXX (ejω ), i.e., N IXX (ejω ) =
Suppose X(n), n = 0, ±1, . . . is stationary with mean µX and cXX (k) = Cov [X(n + k), X(n)], −jωk , for −π < ω ≤ π is uniformly continuous i.e. the spectrum CXX (ejω ) = ∞ k=−∞ cXX (k)e and bounded. As seen in Chapter 8, the ﬁnite Fourier transform satisﬁes the following: C N1 (0, N CXX (ejω )), ω = kπ jω N1 (0, N CXX (ejπ )), ω = (2k + 1)π XN (e ) ∼ N1 (N µX , N CXX (0)), ω = 2kπ 1 XN (ejω )2 N 1 N
N −1 n=0 2
(10.1)
−jωn
=
X(n)e
.
N Note that IXX (ejω ) has the same symmetry, nonnegativity and periodicity properties as CXX (ejω ). The periodogram was introduced by Schuster (1898) as a tool for the identiﬁcation of hidden periodicities. Indeed, in the case of J
X(n) =
j=1
aj cos(ωj n + φj ),
n = 0, . . . , N − 1
N IXX (ejω ) has peaks at frequencies ω = ±ωj (mod 2π), j = 1, . . . , J, provided N is large.
10.2.1
1
Mean of the Periodogram
The mean of the periodogram can be calculated as follows:
` ´˜ ˆ ` ´ = N CXX (ejω ). From the asymptotic distribution of XN ejω we note that Var XN ejω i h ˛ h˛ i ` jω ´˜ ` jω ´˛2 ˆ ` jω ´˛2 1 1 1 = N E ˛XN e ˛ = E N ˛XN e ˛ . Var XN e N Thus
10.2. THE PERIODOGRAM
121
N E IXX (ω)
= =
1 · E XN (ejω ) · XN (ejω )∗ N 1 ·E N 1 N 1 N 1 N 1 N 1 N
N −1
(10.2) X(m)∗ ejωm
X(n)e
−jωn
N −1 m=0
=
n=0 N −1 N −1 n=0 m=0 N −1 N −1 n=0 m=0 N −1 N −1 n=0 m=0 N −1 N −1 n=0 m=0 π N −π
E [X(n)X(m)∗ ] e−jω(n−m)
=
(cXX (n − m) + µ2 )e−jω(n−m) X cXX (n − m)e−jω(n−m) + e−jω(n−m)
π −π
=
1 N jω 2 2 ∆ (e ) µX N 1 dλ + ∆N (ejω )2 µ2 X 2π N
= = where
CXX (ejλ )ejλ(n−m)
∆ (ej(ω−λ) )2 CXX (ejλ )
N −1 n=0
dλ 1 + ∆N (ejω )2 µ2 X 2π N
N −1 2
∆N (ejω ) =
e−jωn = e−jω
·
sin( ωN ) 2 sin( ω ) 2
is the Fourier transform of a rectangular window. Thus, for a zero mean stationary process N N X(n), the expected value of the periodogram IXX (ejω ), E IXX (ejω ) is equal to the spectrum jω ) convolved with the magnitude squared of the Fourier transform of the rectangular winCXX (e N dow, ∆N (ejω )2 . This suggests that limN →∞ E IXX (ejω ) = CXX (ejω )+µ2 ·2π k δ(ω +2πk), X 1 because limN →∞ N ∆N (ejω )2 = 2π k δ(ω + 2πk), as illustrated in Fig. 10.1. Therefore the periodogram estimate is asymptotically unbiased for ω = 2πk, k ∈ Z. We may ﬁnd an asymptotic unbiased estimator for CXX (ejω ) ∀ ω by subtracting the sample mean of the ∞ 1 N −jωn data, before computing the periodogram, i.e., IX−µX ,X−µX (ejω ) = N n=−∞ (X(n) − µX )e is asymptotically unbiased. In (10.2) we calculated the mean of the periodogram of a rectangular windowed data sequence X(n), n = 0, . . . , N −1. Advantages of using diﬀerent window types are stated in the ﬁlter design theory part (chapter 4). We now consider periodograms of sequences X(n), n = 0, . . . , N − 1, multiplied by a realvalued window (also taper) wN (n) of length N , having a Fourier transform WN (ejω ). Then, 1 N 1 N 1 N
N −1 N −1 n=0 m=0 N −1 N −1 n=0 m=0 π −π
2
N E IW X,W X (ejω )
=
wN (n)wN (n)E [X(n)X(m)∗ ] e−jω(n−m) wN (n)wN (n)cXX (n − m)e−jω(n−m) +
(10.3)
= =
1 WN (ejω )2 µ2 X N
WN (ej(ω−λ) )2 CXX (ejλ )
dλ 1 + WN (ejω )2 µ2 X 2π N
122
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
20
1 N jω 2 N ∆ (e )
N = 20
15
Window function
10
N = 10
5
N =5
0 −1
−0.5
0
Frequency
ω 2π
0.5
1
Figure 10.1: The window function
1 N jω 2 N ∆ (e )
for N = 20, N = 10 and N = 5.
Comparing this result with the one for rectangular windowed data, we observe the following: If CXX (ejα ) has a signiﬁcant peak at α in the neighborhood of ω, the expected value of N N IXX (ejω ) and IW X,W X (ejω ) can diﬀer substantially from CXX (ejω ). The advantage of employing a taper is now apparent. It can be considered as a means to reducing the eﬀect of neighboring peaks.
10.2.2
Variance of the Periodogram
To ﬁnd the variance of the periodogram we consider the covariance function of the periodogram. For simplicity, let us assume that X(n) is a real white Gaussian process with variance σ 2 and absolute summable covariance function cXX (n) and ω, λ = 2kπ with k ∈ Z. Then,
N N Cov IXX (ejω ), IXX (ejλ ) N N N N = E IXX (ejω )IXX (ejλ ) − E IXX (ejω ) E IXX (ejλ )
=
1 ∆N (ej(ω+λ) )2 + ∆N (ej(ω−λ) )2 σ 4 . N2
Using the characteristic function, this can be proven to equal E [X1 X2 X3 X4 ] = E [X1 X2 ] E [X3 X4 ] + E [X1 X3 ] E [X2 X4 ] + E [X1 X4 ] E [X2 X3 ] .
10.2. THE PERIODOGRAM
123
If the process X(n) is nonwhite and nonGaussian the covariance function can be found in [Brillinger 1981] as
N N Cov IXX (ejω ), IXX (ejλ ) ≈
(10.4)
CXX (ejω )2 ·
1 ∆N (ej(ω+λ) )2 + ∆N (ej(ω−λ) )2 , N2
which reduces to the variance of the periodogram to
N Var IXX (ejω ) ≈ CXX (ejω )2 ·
1 ∆N (ej2ω )2 + N 2 . N2
N This suggests that no matter how large N is, the variance of IXX (ejω ) will tend to remain at the jω )2 . Because the variance does not tend to zero for N → ∞, according to Section level CXX (e 10.1, the periodogram is not a consistent estimator. If an estimator with a variance smaller than this is desired, it cannot be obtained by simply increasing the sample length N .
The properties of the ﬁnite Fourier transform suggest that periodogram ordinates asymptotically (for large N ) are independent random variates, which explains the irregularity of the periodogram in Figure 10.2. From the asymptotic distribution of the ﬁnite Fourier transform,
16
14
12
10
8
6
4
2
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Figure 10.2: The periodogram estimate of the spectrum (solid line) and the true spectrum (dotted ω line). The abscissa displays the frequency 2π while the ordinate shows the estimate. we can deduce the asymptotic distribution of periodogram ordinates as
N IXX (ejω )
∼
CXX (ejω ) 2 χ2 , 2 CXX (ejω )χ2 , 1
ω = πk ω = πk
k∈Z .
124
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
10.3
Averaging Periodograms  Bartlett Estimator
One way to reduce the variance of an estimator is to average estimators obtained from independent measurements. If we assume that the covariance function cXX (k) of the process X(n) is ﬁnite or can be viewed as ﬁnite with a small error and the length of cXX (k) is much smaller than the length of the data stretch N , then we can divide the process X(n) of length N into L parts with length M each, where M ≪ N . With this assumption we can calculate L nearly independent M periodograms IXX (ejω , l) l = 0, . . . , L − 1 of sequences Xl (n) = X(n + l · M ) , The periodograms are written as
M IXX (ejω , l)
n = 0, . . . , N − 1
2
1 = M
M −1 n=0
Xl (n)e
−jωn
(10.5)
and the estimator as 1 ˆB CXX (ejω ) = L
L−1 M IXX (ejω , l) . l=0
This method of estimating the spectral density function CXX (ejω ) is called Bartlett method due to its invention by Bartlett in 1953. ˆB The mean of CXX (ejω ) can be easily obtained as ˆB E CXX (ejω ) = E 1 L 1 L
L−1 M IXX (ejω , l)
l=0 L−1 M E IXX (ejω , l) l=0 M IXX (ejω )
=
= E
which is the same as the mean of the periodogram of the process X(n) for n = 0, . . . , M − 1. From Section 10.2.1, it can be seen that Bartlett’s estimator is an unbiased estimator of the spectrum for M → ∞ because the periodogram is asymptotically unbiased. To check the performance of the estimator we also have to derive the variance of the estimator.
10.3. AVERAGING PERIODOGRAMS  BARTLETT ESTIMATOR
16
125
14
12
10
8
6
4
2
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Figure 10.3: The periodogram as an estimate of the spectrum using 4 nonoverlapping segments. ω The abscissa displays the frequency 2π while the ordinate shows the estimate.
The variance of Bartlett’s estimator can be calculated as ˆB Var CXX (ejω ) 1 = E · L2 = 1 · L2
L−1 L−1 M M IXX (ejω , k)IXX (ejω , m)
k=0 m=0 L−1 M E IXX (ejω , p)2 p=0 2
−
1 E · L
L−1 M IXX (ejω , l) l=0
2
+
1 · L2
L−1 L−1 M M E IXX (ejω , k) E IXX (ejω , m) k=0 m=0
m=k
M − E IXX (ejω )
= = =
1 · L2
L−1 p=0
M E IXX (ejω , p)2 +
L · (L − 1) M E IXX (ejω ) L2
2
2
M − E IXX (ejω )
2
1 M M · E IXX (ejω )2 − E IXX (ejω ) L 1 M · Var IXX (ejω ) . L
(10.6)
Note that we can proceed similarly as above with a windowed periodogram. We would then consider L−1 1 M ˆB IW X,W X (ejω , l) CW X,W X (ejω ) = L
l=0
which has similar properties as the above. The variance of the Bartlett estimator is equivalent to the variance of the periodogram diˆB vided by the number of data segments L. So the variance of the estimator CXX (ejω ) decreases
126 with an increasing L.
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
With this result and according to the distribution of the periodogram, Bartlett’s estimator is asymptotically CXX (ejω ) 2 χ2L , ω = πk B 2L ˆXX (ejω ) ∼ C k∈Z . CXX (ejω ) 2 χL , ω = πk L distributed. This is due to the fact that the sum of L independent χ2 distributed variables is 2 χ2 distributed. 2L
10.4
Windowing Periodograms  Welch Estimator
Another method of averaging periodograms was developed by Welch in 1967. Welch derived a modiﬁed version of the Bartlett method called Welch’s method, which allows the data segments to overlap. Additionally, the segments are multiplied by a window function which aﬀects the bias of the estimator. For each periodogram l, l = 0, . . . , L − 1 the data sequence is now Xl (n) = X(n + l · D) , where l · D is the starting point of the lth sequence. According to this, the periodograms are deﬁned similarly to Equation (10.5), except for a multiplication by the window wM (n), i.e.,
M IW X,W X (ejω , l)
1 = M
M −1 n=0
2
Xl (n) · wM (n) · e
−jωn
.
Welch deﬁned his estimator as ˆW CXX (ejω ) = 1 L·A
L−1 M IW X,W X (ejω , l) l=0
with a normalization factor A to get an unbiased estimation. If we set A = 1 and wM (n) as a rectangular window of length M and D = M , the Welch method is equivalent to the Bartlett method. Thus, the Bartlett method is a special case of the Welch method. To calculate the normalization factor A we have to derive the mean of the estimator. ˆW E CXX (ejω ) 1 · = E L·A = = From (10.3) we get 1 1 1 ˆW · E CXX (ejω ) = · A M 2π
π −π L−1 M IW X,W X (ejω , l)
1 · L·A
l=0 L−1 M E IW X,W X (ejω , l) l=0
1 M E IW X,W X (ejω ) A
WM (ej(ω−λ) )2 CXX (ejλ )d λ + µ2 · WM (ejω )2 . X
10.5. SMOOTHING THE PERIODOGRAM
127
If we assume that M is large, the energy of WM (ejω ) is concentrated around ω = 0. Thus, we can state that approximately 1 1 1 ˆW E CXX (ejω ) ≈ · · CXX (ejω ) A M 2π Using Parseval’s theorem, 1 2π we rewrite the above as 1 1 ˆW E CXX (ejω ) ≈ · · CXX (ejω ) A M
M −1 n=0 π −π π −π
WM (ejλ )2 d λ + µ2 · WM (ejω )2 . X
WM (e ) dλ =
jλ 2
M −1 n=0
wM (n)2 ,
wM (n)2 + µ2 · WM (ejω )2 . X
To get an asymptotically unbiased estimator for ω = 2kπ, k ∈ Z, A has to be chosen as A= 1 · M
M −1 n=0
wM (n)2 .
With this and M → ∞, the Welch estimator is unbiased for ω = 2kπ. For M → ∞ the variance of the Welch method for nonoverlapping data segments with D = M is asymptotically the same as for the Bartlett method (unwindowed data) ˆW Var CXX (ejω ) ≈ 1 ˆB Var CXX (ejω ) ≈ CXX (ejω )2 . L
M →∞, D=M
In the case of 50% overlap (D = M ) Welch calculated the asymptotic variance of the estimator 2 to be 9 1 ˆW ≈ Var CXX (ejω ) · · CXX (ejω )2 , M 8 L M →∞, D= 2 which is higher than the variance of Bartlett’s estimator. The advantage of using Welch’s estimator with overlap is that one can use more segments for the same amount of data. This reduces the variance of the estimate while having a bias which is comparable to the one obtained by the Bartlett estimator. For example, using a 50% overlap leads to doubling the number of segments, thus a reduction of the variance by the factor 2. Additionally, the shape of the window only aﬀects the bias of the estimate, not the variance. So even for nonoverlapping segments, the selection of suitable window functions can result in an estimator of reduced bias. These advantages makes Welch’s estimator one of the most used nonparametric spectrum estimators. The method is also implemented in the ”pwelch” function of c MATLAB.
10.5
Smoothing the Periodogram
The asymptotic independence of the ﬁnite Fourier transform at discrete frequencies (see propN N erties 1b in section 8.1) suggests that the periodogram ordinates IXX (ejω1 ), . . . , IXX (ejωL ) for 0 ≤ ω1 < ω2 < . . . < ωL ≤ π are asymptotically independent. These frequencies may be discrete
128
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
frequencies ωk = 2πk , k ∈ Z in the neighborhood of a value of ω at which we wish to estimate N CXX (ejω ). This motivates the estimate ˆS CXX (ejω ) = 1 2m + 1
m k=−m N IXX ej N [k(ω)+k]
2π
N distributed, we note that IXX (ejω ) for ω = πk is not CXX2(e ) χ2 distributed but CXX (ejω )χ2 2 1 distributed. Therefore we exclude these values in the smoothing, which means
where 2π ·k(ω) is the nearest frequency to ω at which the spectrum CXX (ejω ) is to be estimated. N Given the above property, we can expect to reduce the variance of the estimation by averagjω N ing over adjacent periodogram ordinates. While for ω = kπ, k ∈ Z IXX (ejω ) ∼ CXX2(e ) χ2 2
jω
k(ω) + k = l · N,
l ∈ Z.
Because the spectrum CXX (ejω ) of a real process X(n) is of even symmetry and periodic in 2π N it is natural to calculate only values of the periodogram IXX (ejω ) for 0 ≤ ω < π. This results in the estimation of CXX (ejω ) for ω = 2πl, l ∈ Z, by ˆS CXX (ejω ) = = 1 2m 1 m
m k=1 −1 k=−m N IXX (ej (ω+
2πk N
N IXX (ej (ω+
2πk N
)) +
m k=1
N IXX (ej (ω+
2πk N
))
))
[ω = 2lπ] or [ω = (2l + 1)π and N even]
which is the same as for the estimation of ω = (2l + 1)π when N is even. For the estimation at frequency ω = (2l + 1)π when N is odd we use ˆS CXX (ejω ) = = 1 2m 1 m
m k=1 m N IXX (ej (ω− N + N IXX (ej (ω− N +
π π 2kπ N π ) ) + I N (ej (ω+ N − 2kπ ) ) N XX
2kπ N
))
ω = (2l + 1)π and N odd
k=1
which implies that the variance of the estimator at frequencies ω = lπ, l ∈ Z will be higher than at other frequencies. In the sequel, we will only consider frequencies ω = lπ, but the method is similar to frequencies ω = lπ. The smoothing method is nothing else but sampling the periodogram at discrete frequencies with the 2π spacing and averaging. Thus, the method can be easily implemented using the FFT N which makes it computational eﬃcient. The mean of the smoothed periodogram is given by ˆS E CXX (ejω ) = 1 2m + 1
m k=−m 2πk(ω) N N E IXX ej N (k(ω)+k)
2π
.
In the following calculation we will assume that
= ω. Using the formula for the mean of
10.5. SMOOTHING THE PERIODOGRAM the periodogram (10.2), we obtain ˆS E CXX (ejω ) = 1 2m + 1 + =
m k=−m
129
1 N
π −π
∆N (ej(ω+
2πk −λ) N
)2 CXX ejλ
dλ 2π
(10.7)
1 N jω 2 2 ∆ (e ) µX N π 1 1 · · −π 2m + 1 N
π
m
k=−m
∆N (ej(ω+
2πk −λ) N
)2 CXX ejλ
dλ 1 + ∆N (ejω )2 µ2 X 2π N
Am (ej(ω−λ) )
=
−π
Am ej(ω−λ) CXX ejλ
dλ 1 + ∆N (ejω )2 µ2 . X 2π N
Examining the shape of Am (ejω ) in Figure 10.4 for a ﬁxed value of N , and diﬀerent values of m, we can describe Am (ejω ) as having approximately a rectangular shape. A comparison of
15
5 4.5
Am (ejω )N =15, m=0
10
Am (ejω )N =15, m=1
4
3.5 3
2.5 2
5
1.5 1
0.5
0 0
2.5
0.2
0.4
ω π
0.6
0.8
1
0 0 1.4 1.2
0.2
0.4
ω π
0.6
0.8
1
Am (ejω )N =15, m=3
Am (ejω )N =15, m=5
0.2 0.4 0.6 0.8 1
2
1
1.5
0.8 0.6 0.4 0.2
1
0.5
0 0
ω π
0 0
0.2
0.4
ω π
0.6
0.8
1
Figure 10.4: Function Am (ejω ) for N = 15 and m = 0, 1, 3, 5. the mean of the periodogram (10.2) and the mean of the smoothed periodogram (10.7) suggests N ˆS that the bias of CXX (ejω ) will most often be greater than the bias of IXX (ejω ) as the integral extends over a greater range. Both will be the same if the spectrum CXX (ejω ) is ﬂat, however ˆS CXX (ejω ) is unbiased as N → ∞.
130
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION ˆS The calculation of the variance for CXX (ejω ) is similar to the calculation in (10.6) and results
in
1 N Var IXX (ejω ) . 2m + 1 ˆS This can be conﬁrmed from Figure 10.5 where the estimate CXX (ejω ) is plotted for diﬀerent values of m. ˆS Var CXX (ejω ) =
14 12 10 8 6 4 2 0 0.1 0.2 0.3 0.4 0.5 2 8 6 4 10
0
0.1
0.2
0.3
0.4
0.5
10 8 6 4 2 8 6 4 2
0
0.1
0.2
0.3
0.4
0.5
0
0.1
0.2
0.3
0.4
0.5
Figure 10.5: The smoothed periodogram for values of m = 2, 5, 10 and 20 (from the top left to ω bottom right). The abscissa displays the frequency 2π while the ordinate shows the estimate. With this result and the asymptotic behavior of the periodogram ordinates, we ﬁnd the ˆS asymptotic distribution of CXX (ejω ) as jω CXX (e ) χ2 4m+2 , ω = πk 4m+2 S jω ˆ CXX (e ) ∼ k∈Z . CXX (ejω ) 2 χ2m , ω = πk 2m
10.6
A General Class of Spectral Estimates
In section 10.5 we considered an estimator which utilised the independence property of the periodogram ordinates. The smoothed periodogram weighted equally all periodogram ordinates
10.6. A GENERAL CLASS OF SPECTRAL ESTIMATES
131
near ω, which, for a nearly constant spectrum is reasonable. However, if the spectrum CXX (ejω ) varies, this method creates a larger bias. In this case it is more appropriate to weigh periodogram ordinates near ω more than those further away. Analogously to Section 10.5, we construct the estimator as 1 ˆ SW CXX (ejω ) = A
m k=−m N Wk IXX ej N [k(ω)+k]
2π
k(ω) + k = l · N ω =l·π
l ∈ Z,
(10.8)
where A is a constant to be chosen for an unbiased estimator. The case where ω = lπ can be easily derived from Section 10.5. For further investigations of the estimator in (10.8) we will only consider the case where ω = l · π, l ∈ Z. ˆ SW The mean of the estimator CXX (ejω ) can be calculated with the assumption that by 1 ˆ SW E CXX (ejω ) = A
m k=−m N Wk E IXX ej [ω+
2πk N
2πk(ω) N
=ω
]
.
ˆ SW To get an asymptotically unbiased estimator for CXX (ejω ) we set
m
A=
k=−m
Wk
N because E IXX (ejω ) = CXX (ejω ) as N → ∞. The mean of the estimator can be derived as
ˆ SW E CXX (ejω )
π
=
−π
1 AN
m k=−m
Wk ∆N (ej(ω+
ASW (ej(ω−λ) )
2πk −λ) N
)2 CXX ejλ
dλ 2π
+ =
1 N jω 2 2 ∆ (e ) µX N
π
ASW (ej(ω−λ) )CXX ejλ
−π
dλ 1 + ∆N (ejω )2 µ2 X 2π N
where ASW e
jω
=
1 · m l=−m Wl N
1
m k=−m
Wk ∆N (ej(ω+
2πk ) N
)2 .
This expression is very similar to the function Am (ejω ) from Equation (10.7). Indeed if we set 1 Wk = 2m+1 ∀ k, it follows automatically that ASW (ejω ) = Am (ejω ), i.e. Am (ejω ) is a special case of ASW (ejω ). However with the weights Wk we have more control over the shape of ASW (ejω ) which also leads to a better control of the bias of the estimator itself. The eﬀect of diﬀerent weightings on the shape of ASW (ejω ) can be seen in Figure 10.6.
132
1.8 1.6
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
3
2.5 1.4
1
0.8 0.6 0.4
ASW (ejω )
0.2 0.4 0.6 0.8 1
ASW (ejω )
1.2
2
1.5
1
0.5 0.2 0 0 4 3.5 3 2.5 2
ω π
0 0 4 3.5 3
0.2
0.4
ω π
0.6
0.8
1
ASW (ejω )
0.2 0.4 0.6 0.8 1
ASW (ejω )
2.5 2
1.5 1 0.5 0 0
1.5 1 0.5 0 0
ω π
0.2
0.4
ω π
0.6
0.8
1
Figure 10.6: Function ASW (ejω ) for N = 15 and m = 4 with weights Wk in shape of (starting from the upper left to the lower right) a rectangular, Bartlett, Hamming and Blackman window.
ˆ SW The variance of the estimator CXX (ejω ) is given by ˆ SW Var CXX (ejω ) ˆ SW ˆ SW = E CXX (ejω )2 − E CXX (ejω ) = 1 A2 − =
m m
2π
2
N N Wk Wl E IXX ej N [k(ω)+k] IXX ej N [k(ω)+l] k=−m l=−m m m 2
2π
q=−m µ=−m m 1 N W 2 E IXX A2 p=−m p
1 A2
N Wq Wµ E IXX ej N [k(ω)] 2
2π
e
j 2π [k(ω)+p] N
1 + 2 A
2π
m
m
N Wk Wl E IXX ej N [k(ω)] k=−m l=−m
l=k
2π
2
− =
q=−m µ=−m m 1 W 2 Var A2 p=−m p
1 A2
m
m
N Wq Wµ E IXX ej N [k(ω)]
2
N IXX ej N [k(ω)]
2π
.
10.7. THE LOGSPECTRUM With
m 2 Wk k=−m m m i=−m Wi 2 m k=−m Wk
133
=
k=−m
Wk −
2m + 1
+
2m + 1
≥0
(10.9)
ˆ SW we are able to calculate the weighting coeﬃcients Wk for the lowest variance of CXX (ejω ). From (10.9) it is easy to see that if Wk =
m i=−m Wi
2m + 1
=
A , 2m + 1
k = 0, ±1, . . . , ±m ,
ˆ SW we reach the lowest possible variance for the estimator CXX (ejω ). Because this result is nothing ˆS other than the smoothed periodogram (see section 10.5) we see that CXX (ejω ) has the largest bias for a ﬁxed N but the lowest variance of the general class of spectrum estimators covered in this section.
10.7
The LogSpectrum
Instead of estimating the spectrum directly, it is possible to estimate 10 log CXX (ejω ) from ˆ 10 log CXX (ejω ). The logfunction can be used as a variance stabilizing technique [Priestley 2001]. We then ﬁnd ˆ E 10 log CXX ejω ˆ Var 10 log CXX ejω ≈ 10 log CXX ejω ≈ (10 log e)2 C
where C is a constant that controls the variance. If we use Bartlett’s estimator, for example, C would be equal to L. The above equations show that the variance is approximately a constant and does not depend on CXX (ejω ).
10.8
Estimation of the Spectrum using the Sample Covariance
cXX (n) = E [(X(k + n) − µX )(X(k) − µX )∗ ] .
The covariance function cXX (n) is deﬁned in Chapter 7 as
¯ where X =
Given X(0), . . . , X(N − 1) we can estimate cXX (k) by 1 N −1−n ¯ ¯ X(k + n) − X X(k) − X N cXX (n) = ˆ k=0 0,
1 N N −1 k=0
∗
, 0≤n≤N −1 otherwise
(10.10)
X(k) is an estimate of the mean µX = E [X(k)]. As it can be seen from
(10.10) the estimator of the covariance function cXX (n) is only deﬁned for n ≥ 0. It is obvious ˆ to calculate estimates for n < 0 via the symmetry property of the covariance function cXX (n) = cXX (−n)∗ .
134
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
1 Equation (10.10) is a computationally eﬃcient implementation for cXX (k) = N ∞ ˆ k=−∞ (X(k + ∗ if X(n) is zero outside the interval n = 0, . . . , N − 1. It can be proven that ¯ ¯ n) − X)(X(k) − X) cXX (n) is an asymptotically consistent estimator of the covariance function cXX (n). ˆ
Motivated by the fact that the spectrum CXX (ejω ) is the Fourier transform of the covariance function cXX (n) it is intuitive to suggest an estimator of CXX (ejω ) as ˆ CXX (ejω ) = =
∞ n=−∞
cXX (n)e−jωn ˆ
N −1 ∞
1 N 1 N
n=−(N −1) l=−∞ N −1 ∞
¯ ¯ [X(l + n) − X][X(l) − X]∗ · e−jωn ¯ ¯ [X(n − l) − X][X(−l) − X]∗ · e−jωn
=
n=−(N −1) l=−∞
which is equivalent to
N ˆ CXX (ejω ) = IX−X,X−X (ejω ) ¯ ¯
(10.11)
and should be shown by the reader as an exercise. The derivation in (10.11) shows that taking the Fourier transform of the sample covariance function is equivalent to computing the periodogram as in (10.1). This is therefore not a consistent method for estimating the spectrum.
10.8.1
BlackmanTukey Method
In 1958 Blackman and Tukey developed a spectral density estimator, here referred to the BlackmanTukey method, which is based on the sample covariance function. The basic idea of the method is to estimate the sample covariance function cXX (k) and window it with a ˆ function w2M −1 (k) of the following form w2M −1 (k) = w2M −1 (−k) , for k ≤ M − 1 with M ≪ N 0, otherwise .
This condition ensures that the Fourier transform of the window is realvalued and that the ˆ BT estimator CXX (ejω ) will not be complex. We then get ˆ BT CXX (ejω ) = Using (10.11) this is equivalent to
π M −1 n=−(M −1)
cXX (n) · w2M −1 (n) · e−jωn . ˆ
ˆ BT CXX (ejω )
=
−π
N IX−X,X−X (ejλ ) · W2M −1 ej(ω−λ) ¯ ¯
dλ 2π
(10.12)
N which is the convolution of the Periodogram IX−X,X−X (ejω ) with the Fourier transform of the ¯ ¯ window w2M −1 (n). From (10.12) we get another constraint for the window
W2M −1 (ejω ) ≥ 0
∀ω ,
10.9. CROSSSPECTRUM ESTIMATION ˆ BT to ensure the estimate CXX (ejω ) is nonnegative for all ω. ˆ BT The mean of the estimator CXX (ejω ) can be easily calculated as
N ˆ BT E CXX (ejω ) = E IX−X,X−X (ejω ) ⋆ W2M −1 (ejω ) ¯ ¯
135
which, according to Section 10.2.1, is given by ˆ BT E CXX (ejω ) = = 1 N jω 2 ∆ (e ) ⋆ W2M −1 (ejω ) ⋆ CXX (ejω ) N 1 2π
2 π
−π
1 CXX (ejθ ) 2π
∞
−∞
1 N jϑ 2 ∆ (e ) · W2M −1 (ej(ω−θ−ϑ) )d ϑd θ . N
Since limN →∞
1 N
∆N (ejω )
∞ −∞
= 2π
k
δ(ω + 2πk), we ﬁnd
1 N →∞ 2π lim
1 N jϑ 2 ∆ (e ) · W2M −1 ej(ω−θ−ϑ) d ϑ = W2M −1 ej(ω−θ) N
which reduces the term for the mean of the estimator to a simple convolution 1 ˆ BT E CXX (ejω ) = 2π
π −π
CXX (ejθ ) · W2M −1 ej(ω−θ) d θ .
(10.13)
Therefore the BlackmanTukey estimator is unbiased for M → ∞. The variance of the estimator is given from Eq. (10.12) as ˆ BT Var CXX (ejω ) ≈ CXX (ejω )2 · ≈ CXX (ejω )2 · 1 N 1 N
π
−π M −1
W2M −1 (ejω )2 d ω w2M −1 (n)2 .
n=−(M −1)
Based on this approximation the variance of the Blackman Tukey method tends to zero if the window function w2M −1 (n) has ﬁnite energy.
10.9
CrossSpectrum Estimation
If X(n) and Y (n) are jointly stationary and real, the crosscovariance function is deﬁned as cY X (n) = E [(Y (n + k) − E [Y (n)])(X(k) − E [X(k)])] cY X (n) = E [Y (n + k)X(k)] − E [Y (n)] E [X(n)]
∞ n=−∞
The crossspectrum is deﬁned by CY X (e ) =
jω
cY X (n)e−jωn = CXY (ejω )∗ = CXY (e−jω ) .
136
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
Given X(0), . . . , X(N − 1), Y (0), . . . , Y (N − 1), the objective is to estimate the crossspectrum CY X (ejω ) for −π < ω < π. The ﬁnite Fourier transforms are given by YN (ejω ) = XN (ejω ) =
N −1 n=0 N −1 n=0
Y (n)e−jωn X(n)e−jωn
XN (ejω ) and YN (ejω ) are for large N normally distributed random variables as discussed above. At distinct frequencies 0 < ω1 < ω2 < . . . < ωN < π the random variables XN (ejω ) and YN (ejω ) are independent. Considering L data stretches of the series X(0), . . . , X(N − 1) and Y (0), . . . , Y (N − 1), so that M L = N , YN (e , l) = XN (ejω , l) =
jω M −1 m=0 M −1 m=0
Y ([l − 1]M + m) e−jωm X ([l − 1]M + m) e−jωm
l = 1, . . . , L and
l = 1, . . . , L
are for large M stochastically independent for 0 < ω ≤ π. Deﬁning V(ejω ) = ℜXN (ejω ), ℜYN (ejω ), ℑXN (ejω ), ℑYN (ejω )
T
, we have
E V(ejω )V(ejω )H
CXX (ejω ) ℜCY X (ejω ) 0 ℑCY X (ejω ) N ℜCY X (ejω ) CY Y (ejω ) −ℑCY X (ejω ) 0 = jω ) jω ) jω ) . 0 ℑCY X (e CXX (e ℜCY X (e 2 −ℑCY X (ejω ) 0 ℜCY X (ejω ) CY Y (ejω )
10.9.1
The CrossPeriodogram
The crossperiodogram is deﬁned as
N IY X (ejω ) =
1 YN (ejω )XN (ejω )∗ N
It can be shown that
N E IY X (ejω ) = CY X (ejω )
Also, we have E YN (ejω )XN (ejω ) = 0 and
N Var IY X (ejω ) = CY Y (ejω )CXX (ejω )
10.9. CROSSSPECTRUM ESTIMATION Because CY X (ejω )
2
137
≤ CY Y (ejω )CXX (ejω )
2
N The variance of IY X (ejω ) is never smaller than CY X (ejω )
for any N .
To construct estimates for CY X (ejω ) with lower variance, we use the properties given above. We calculate 1 ˆ CY X (ejω ) = L and we have (for large M ) ˆ E CY X (ejω ) ≈ CY X (ejω )
L l=1
1 YM (ejω , l)XM (ejω , l)∗ M
1 ˆ Var CY X (ejω ) ≈ CY Y (ejω )CXX (ejω ) L Also we can smooth the crossperiodogram over neighboring frequencies and get similar results.
138
CHAPTER 10. NONPARAMETRIC SPECTRUM ESTIMATION
Chapter 11
Parametric Spectrum Estimation
11.1 Spectrum Estimation using an Autoregressive Process
Z(n)
h(n)
X(n)
Figure 11.1: Linear and timeinvariant ﬁltering of white noise Consider the system of Figure 11.1 with output X(n) such that
p
X(n) +
k=1
ak X(n − k) = Z(n)
2 where E [Z(n)] = 0, E [Z(n)Z(k)] = σZ δ(n−k) and ap = 0. It is assumed that the ﬁlter with unit impulse response h(n) is stable. The process X(n) is then called an autoregressive process of order p or in short notation AR(p). The recursive ﬁlter possesses the following transfer function
H(ejω ) =
1 1+
p −jωk k=1 ak e
Because we have assumed the ﬁlter to be stable, the roots of the characteristic equation
p
zp +
k=1
ak z p−k = 0,
z = ejω
lie in the unit disc, i.e. zl  < 1 for all l roots of the above equation. The spectrum is given as CXX (ejω ) = H(ejω )2 CZZ (ejω ) thus, CXX (ejω ) = 1 +
2 σZ p −jωk 2 k=1 ak e
139
140
CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION
The objective is to estimate CXX (ejω ) based on observations of the process X(n) for n = 0, . . . , N − 1. Multiplying
p
X(n) +
k=1
ak X(n − k) = Z(n)
by X(n − l) (from the right), l ≥ 0 and taking expectation, leads to
p
E [X(n)X(n − l)] + Thus, cXX (l) +
p
k=1
ak E [X(n − k)X(n − l)] = E [Z(n)X(n − l)]
k=1
ak E [X(n − k)X(n − l)] = E [Z(n)X(n − l)]
∞ k=0
Assuming the recursive ﬁlter to be causal, i.e. h(n) = 0 for n < 0 leads to the relationship X(n) = h(k)Z(n − k)
∞ k=1
= h(0)Z(n) + Thus, shifting X(n) by l gives X(n − l) = h(0)Z(n − l) +
h(k)Z(n − k)
∞ k=1
h(k)Z(n − l − k)
Taking the expected value of the product Z(n)X(n − l) leads to E [Z(n)X(n − l)] = h(0)cZZ (l) + E [Z(n)X(n − l)] = h(0)cZZ (l) =
p ∞
h(k)cZZ (l + k)
k=1 =0 2 h(0)σZ δ(l)
Assuming h(0) = 1, leads to the so called YuleWalker equations cXX (l) +
k=1
ak cXX (l − k) =
2 σZ , l = 0 0, l = 1, 2, . . .
The diﬀerence equation in cXX (l) has a similar form to the system describing diﬀerence equation in X(n). Having observed X(n), one can estimate cXX (n). Using cXX (0), . . . , cXX (p), one would get a unique solution for a1 , . . . , ap . The solution is unique if the inverse matrix with the elements cXX (l − k) exists, which is guaranteed if the ﬁlter is stable. Given estimates a1 , . . . , ap of ˆ ˆ a1 , . . . , ap , one can use the YuleWalker equations for l = 0 to get σZ . Then it is straightforward ˆ2 to ﬁnd an estimate for CXX (ejω ) given by ˆ CXX (ejω ) = 1 + σZ ˆ2 p ˆ −jωk 2 k=1 ak e
11.2. SPECTRUM ESTIMATION USING A MOVING AVERAGE PROCESS
141
11.2
Spectrum Estimation using a Moving Average Process
Consider a transversal ﬁlter with impulse response h(n), input Z(n) and output X(n) ∈ R as shown in Fig. 11.2:
q q
X(n) =
k=0
h(k)Z(n − k) = Z(n) +
k=1
h(k)Z(n − k) .
2 If Z(n) is a white noise process, i.e., E [Z(n)] = 0 and cZZ (k) = σZ δ(k), then X(n) is a Moving Average (MA) Process of order q, MA(q).
Z(n) z −1 z −1 h(1) h(2) z −1 h(q) X(n)
Figure 11.2: Filter model for a moving average process The spectrum of X(n) can be calculated as
2 CXX (ejω ) = H(ejω ) σZ . 2
The covariance function of X(n) is cXX (n) = E [X(n + k)X(k)]
q q
= E
q r=0 q
h(r)Z(n + k − r)
m=0
h(m)Z(k − m)
=
r=0 m=0
2 h(r)h(m)σZ δ(n + m − r) q−n m=0
cXX (n) =
2 σZ
h(m)h(m + n), 0 ≤ n ≤ q 0, else
.
If we have cXX (0), . . . , cXX (q), we would need to solve nonlinear equations in h(0), . . . , h(q) 2 and σZ , which makes it impossible to ﬁnd a unique solution. Let b(m) = σZ h(m) for m = 0, . . . , q, then cXX (n) =
q−n m=0
b(m)b(m + n), 0 ≤ n ≤ q 0, else
.
142
CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION
Let cXX (0), . . . , cXX (q) be available (consistent) estimates for cXX (0), . . . , cXX (q). Then ˆ ˆ
q−n
cXX (n) = ˆ
m=0
ˆ b(m)ˆ + n), b(m
n = 0, . . . , q .
The estimates ˆ b(0), . . . , ˆ b(q) are consistent, because of the consistency of cXX (k), k = 0, . . . , q. ˆ ˆ ˆ b(1) b(q) 2 =ˆ 2. ˆ ˆ Then h(1) = , . . . , h(q) = ,σ ˆ b(0)
ˆ b(0) ˆ b(0) Z
The implementation of the solution is as follows: With
q
ˆ B(z) =
n=0 q
ˆ b(n)z −n cXX (n)z −n ˆ
n=−q
ˆ CXX (z) =
ˆ ˆ = B(z)B (1/z) ˆ 1. Calculate P (z) = z q CXX (z) ˆ ˆ 2. Assign all roots zi of P (z) inside the unit circle to B(z), i.e. B(z) = ˆ b(0) ˆ with b(0) > 0 and zi  ≤ 1. ˆ 3. Calculate h(m) = As an example ˆ b(1) ˆ = h(1) = − ˆ b(0) To ﬁnd σZ ˆ2 = ˆ2 we use b0
q−k ˆ b(m) ˆ b(0) q i=1 (1
− zi z −1 )
for m = 1, . . . , q
q
zi .
i=1
cXX (n) = ˆ
m=0
ˆ b(m)ˆ + n) , b(m
n = 0, 1, . . . , q
and ﬁnd for n = 0 ˆ 2= b(0)
2 as an estimate for the power σZ of Z(n).
cXX (0) ˆ 2 ˆ 1+ q m=1 h(m)
An alternative to estimate the spectrum CXX (ejω ) of X(n) is
q q
ˆ CXX (ejω ) =
n=−q
cXX (n)e−jωn = cXX (0) + 2 ˆ ˆ
n=1
cXX (n)e−jωn . ˆ
(11.1)
Example 11.2.1 Consider a receiver input signal X(n) from a wireless transmission. To ﬁnd the moving average estimate of the spectrum CXX (ejω ), we ﬁrst calculate the sample covariance ˆ ˆ cXX (n) and take the ztransform CXX (z). By taking P (z) = z N −1 CXX (z) we ﬁnd N − 1 roots ˆ ˆ ˆ zi , i = 1, . . . , N − 1 inside the unit circle and assign them to B(z). The estimator CXX (ejω ) can then be found according to (11.1).
11.3. MODEL ORDER SELECTION
143
12
10
8
6
ˆ CXX (ejω )
4
CXX (ejω )
2
0 0
0.5
1
ω π
1.5
2
Figure 11.3: Estimate of an MA process with N = 100 and q = 5
11.3
Model order selection
A problem arising with the parametric approach is that of estimating the number of coeﬃcients required, i.e. p or q. Having observed X(0), . . . , X(N − 1) there are several techniques for doing this. Akaike’s information criterion (AIC) suggests minimising the following function with respect to m. AIC(m) = log σZ,m + m ˆ2 2 , N m = 1, . . . , M
The order estimate is the smallest m minimising AIC(m). Another more accurate estimate is given by the minimum description length (MDL) criterion, obtained by taking the smallest m minimising MDL(m) = log σZ,m + m ˆ2 log N , N m = 1, . . . , M
Examples We generate realisations from the N (0, 1) distribution. We ﬁlter the data recursively to get 1024 values. The ﬁlter has poles z1,2 = 0.7e±j0.4 , z3,4 = 0.75e±j1.3 , z5,6 = 0.9e±j1.9 It is a recursive ﬁlter of order 6. The process is an AR(6) process.
144
CHAPTER 11. PARAMETRIC SPECTRUM ESTIMATION
10 9 8 7 6 5 4 3 2 1
True Estimate
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
0.4
0.45
0.5
Figure 11.4: True and estimated AR(6) spectrum. The abscissa displays the frequency ordinate shows the estimate.
ω 2π
while the
0.9
0.9
0.8
0.8
0.7
0.7
0.6
0.6
0.5
0.4
MDL
0.3 0.2 0.1 0 2 4 6 Order 8 10
AIC
0.5
0.4
0.3
0.2
0.1
0 2 4 6 Order 8 10
Figure 11.5: Akaike and MDL information criteria for the AR(6) process for orders 1 through 10.
Chapter 12
Discrete Kalman Filter
The Kalman ﬁlter has become an indispensable tool in modern control and signal processing. Its simplest use is as an estimation tool. However, it is capable of revealing more than just a standard estimate; it provides a complete statistical description of the degree of knowledge available. This can be highly useful in system analysis. In contrast to many other estimation techniques, the Kalman ﬁlter operates in the state space domain. This intuitive structure has given it physical motivation in a variety of applications. In addition, its implementation can be computationally eﬃcient. As a result, it has found application in a wide variety of ﬁelds; especially in process control and target tracking, but also in many other signal processing topics.
12.1
State Estimation
The purpose of a Kalman ﬁlter is to estimate the state of a process. In many (or most) physical applications, this state will consist of multiple random variables. For example, in a remote sensing scenario, the task may be to estimate the position and velocity of a vehicle. In this case information may be available in the form of measurements such as acoustic signals, radar returns or visual images. Once an appropriate model of the system is constructed, it may be possible to derive an algorithm for the estimation of the state, based on a ﬁnite set of observations.
12.2
Derivation of Kalman Filter Equations
Consider a multidimensional system where the time evolution of the (unobservable) process state can be described by a11 . . . a1N X1 (n − 1) b11 . . . b1L U1 (n − 1) W1 (n − 1) X1 (n) . . . . . . . .. .. . . . . . = . + . + . . . . . . . . . XN (n) aN 1 . . . aN N XN (n − 1) bN 1 . . . bN L UL (n − 1) WN (n − 1) and in vector notation X(n) = AX(n − 1) + BU (n − 1) + W (n − 1) where • X(n) ∈ ℜN ×1 is the process state at time n 145 (12.1)
146 • A is the state transition matrix • U (n) ∈ ℜL×1 is the control input
CHAPTER 12. DISCRETE KALMAN FILTER
• B governs the eﬀect of the control input on the state • W (n) ∈ ℜN ×1 is the process noise. The measurements of the process are determined by Z1 (n) h11 . . . h1N X1 (n) V1 (n) . . . . + . .. . . . . = . . . . . . . ZM (n) hM 1 . . . hM N XN (n) VM (n) Z(n) = HX(n) + V (n) where • Z(n) ∈ ℜM ×1 is the measurement • H maps the process state to a measurement • V (n) ∈ ℜM ×1 is the measurement noise The relationships between the system components can be seen in Figure 12.1. W (n) V (n) (12.2)
U (n)
B
δK (n − 1) A
X(n)
H
Z(n)
Figure 12.1: Relationship between system state and measurements assumed for Kalman ﬁlter. In addition, we assume that the noise quantities W (n) and V (n) are independent to each other and to U (n), white and Gaussian distributed, W (n) ∼ N (0, Q) V (n) ∼ N (0, R) .
The matrices A, B and H are assumed known from the physics of the system. The control input U (n) is the intervention or “steering” of the process by an outside source (e.g. a human operator). It is assumed that U (n) is known precisely. Therefore, its eﬀect on the process, through (12.1) is also known. Now, in order to derive the Kalman ﬁlter equations we need to consider what is the “best” ˆ ˆ estimate X(n) of the new state X(n) based on the previous state estimate X(n − 1), the previous control input U (n − 1) and the current observation Z(n). This estimate of X(n) may be calculated in two steps:
12.2. DERIVATION OF KALMAN FILTER EQUATIONS 1. Predict forward in time from previous estimate ˆ ˆ X − (n) = AX(n − 1) + BU (n − 1) .
147
(12.3)
From (12.1) it can be seen that the new state is a linear function of the previous state with the addition of the (known) control input and (unknown) zeromean noise. To predict the new state it is reasonable to use the previous state estimate in place of the true state and to ignore the additive noise component since its expected value is zero. The quantity ˆ X − (n) is referred to a the a priori state estimate since it is determined from the knowledge available before time instant n. It is a projection of the previous state estimate forward in time. 2. Predict the new measurement value ˆ ˆ Z − (n) = H X − (n) (12.4)
and correct the state estimate based on the residual between predicted and actual measurements ˆ The predicted measurement, Z − (n), is calculated from the a priori state estimate. It can be seen that (12.4) is similar to (12.2) without the zeromean measurement noise and using the previous state estimate rather than the true, unknown value. Since the true ˆ measurement, Z(n), is available, the measurement residual Z(n) − Z − (n) is a measure of how correct the state estimate is – a small error is more likely when the estimate is good, whereas a poor estimate will be more likely to produce a large diﬀerence between predicted and actual measurements. The state estimate is now corrected by the weighted ˆ residual K(n)(Z(n) − Z − (n)). ˆ ˆ ˆ X(n) = X − (n) + K(n)(Z(n) − Z − (n)) . (12.5)
The N × M matrix K(n) is referred to as the Kalman gain. It controls how the measurement residual aﬀects the state estimate. In order to optimise the choice of K(n), a criterion must be speciﬁed. A logical choice would be to minimise the mean squared error between the a posteriori ˆ estimate X(n) and the true state vector X(n), i.e. min E
K(n)
ˆ X(n) − X(n)
2
= min tr {P (n)}
K(n)
(12.6)
where P (n) = E E(n)E(n)T with ˆ E(n) = X(n) − X(n) (12.7) (12.8)
is the (state) estimate error – i.e. the diﬀerence between the true state vector and our a posteriori state estimate. To ﬁnd the K(n) that minimises (12.6), start by writing the a posteriori error in terms of ˆ the a priori error, E − (n) = X(n) − X − (n), and its covariance, P − (n) = E E − (n)E − (n)T ˆ E(n) = X(n) − X(n) ˆ ˆ = E − (n) − X(n) − X − (n)
ˆ = E − (n) − K(n) Z(n) − H X − (n) = E − (n) − K(n) HE − (n) + V (n) ˆ = E − (n) − K(n) H(X(n) − X − (n)) + V (n) (12.9)
148 Now substitute into (12.7) P (n) = E E(n)E(n)T = E
CHAPTER 12. DISCRETE KALMAN FILTER
E − (n) − K(n) HE − (n) + V (n)
T
E − (n) − K(n) HE − (n) + V (n)
T
= E E − (n)E − (n)T − E K(n) HE − (n) + V (n) E − (n)T . . . −E E − (n) HE − (n) + V (n) K(n)T . . . +E K(n) HE − (n) + V (n) HE − (n) + V (n)
T
K(n)T
Diﬀerentiate1 tr {P (n)} with respect to K(n) and set to 0 P − (n)H T 0 = 0 − P − (n)H T − P − (n)H T + 2K(n) HP − (n)H T + R = K(n) HP − (n)H T + R
−1
K(n) = P − (n)H T HP − (n)H T + R
.
(12.10)
It is easy to see that as R → 0, i.e. the measurement noise becomes less inﬂuential , K(n) → H −1 . Thus, the residual between actual and predicted measurements is more trusted and is given a heavy weighting in correcting the state estimate through (12.5). Alternatively, if it is known that the a priori error covariance approaches 0, i.e. P − (n) → 0, or if R is large, then K(n) → 0. This means that the residual is given less weight in updating the state estimate – the a priori estimate is given priority. Recall that X(n) = AX(n − 1) + BU (n − 1) + W (n − 1) ˆ ˆ X − (n) = AX(n − 1) + BU (n − 1) then P − (n) = E E − (n)E − (n)T = E = E = AE ˆ X(n) − X − (n) ˆ X(n) − X − (n)
T
ˆ A(X(n − 1) − X(n − 1)) + W (n − 1) ˆ X(n − 1) − X(n − 1)
ˆ A(X(n − 1) − X(n − 1)) + W (n − 1)
T
T
ˆ X(n − 1) − X(n − 1)
AT + E W (n − 1)W (n − 1)T (12.11)
= AP (n − 1)AT + Q
1
= AE E(n − 1)E(n − 1)T AT + E W (n − 1)W (n − 1)T
The following lemmas are used in the diﬀerentiation: ∂tr {AX} ∂X ˘ ¯ ∂tr AX T ∂X ∂XAX T ∂X = = = AT A X(A + AT )
12.3. EXTENDED KALMAN FILTER Starting from (12.9), E(n) = E − (n) − K(n)(HE − (n) + V (n)) = (I − K(n)H)E − (n) − K(n)V (n) so E E(n)E(n)T = E (I − K(n)H)E − (n) − K(n)V (n)
149
(I − K(n)H)E − (n) − K(n)V (n)
T
P (n) = E (I − K(n)H) E − (n)E − (n)T (I − K(n)H)T + K(n)V (n)V (n)T K(n)T = P − (n) − K(n)HP − (n) − P − (n)H T K(n)T + K(n)HP − (n)H T K(n)T + K(n)RK(n)T = (I − K(n)H) P − (n) (I − K(n)H)T + K(n)RK(n)T
= (I − K(n)H) P − (n) − P − (n)H T K(n)T + P − (n)H T K(n)T = (I − K(n)H) P − (n)
= (I − K(n)H) P − (n) − P − (n)H T K(n)T + K(n)(HP − (n)H T + R)K(n)T
(12.12)
Note that we have used the solution for K(n) in (12.10) to simplify the above equation. It is now possible to draw all the equations derived above together to complete the description of the Kalman ﬁlter: 1. Predict the next state and covariance using the previous estimate. ˆ ˆ (a) X − (n) = AX(n − 1) + BU (n − 1)
(b) P − (n) = AP (n − 1)AT + Q
2. Correct the predictions using the measurement. (a) K(n) = P − (n)H T (HP − (n)H + R)−1 ˆ ˆ ˆ (b) X(n) = X − (n) + K(n)(Z(n) − Z − (n)) (c) P (n) = (I − K(n)H)P − (n) The distinction between the two main steps is quite clear. The prediction is based on the a priori information available, i.e. the estimates from the previous time instance which was based on all measurements up to, but not including, the current time instance n. While the correction is made using the current measurement Z(n) to yield an a posteriori estimate. In fact, the suitability of the Kalman ﬁlter for computer implementation is, in large part, due to this feature. The procedure is recursive – it is run after each measurement is made, meaning intermediate estimates are always available. This is in contrast to some techniques that must process the entire measurement sequence to ﬁnally yield an estimate. It also compresses all the information from previous measurements into the current state estimate and its corresponding covariance matrix estimate – there is no need to retain a record of previous measurements. This can result in signiﬁcant memory savings.
12.3
Extended Kalman Filter
The Kalman ﬁlter has very strict assumptions on the system model. In particular it uses linear stochastic diﬀerence equations to describe the state transition and the statetomeasurement mapping. This linearity assumption is violated by many or, by strict interpretation, all reallife
150
CHAPTER 12. DISCRETE KALMAN FILTER
applications. In these cases it becomes necessary to use the Extended Kalman Filter (EKF). As the name suggests, this is a generalisation or extension to the standard Kalman ﬁlter framework whereby the equations are “linearised” around the estimates in a way analogous to the Taylor series. Let the state transition now be deﬁned as: (c.f. (12.1)) X(n) = f (X(n − 1), U (n − 1), W (n − 1)) and the measurement is found by: (c.f. (12.2)) Z(n) = h(X(n), V (n)). (12.14) (12.13)
As for the Kalman ﬁlter case, the noise processes, W (n − 1) and V (n − 1), are unobservable ˆ and only an estimate of the previous state, X(n − 1), is available. However, the control input, U (n − 1) is known. Therefore, estimates of the state and measurement vectors can be found by ˆ ˆ X − (n) = f (X(n − 1), U (n − 1), 0) ˆ ˆ Z − (n) = h(X − (n), 0) . This means that (12.13) and (12.14) can be linearised around these initial estimates X(n) = f (X(n − 1), U (n − 1), W (n − 1)) ˆ ˆ ≈ X − (n) + A(n)(X(n − 1) − X(n − 1)) + C(n)W (n − 1) and ˆ ˆ Z(n) ≈ Z − (n) + H(n)(X(n) − X − (n)) + D(n)V (n) where A(n), C(n), H(n) and D(n) are Jacobian matrices of the partial derivatives of f and h, i.e. A[i,j] = W[i,j] = H[i,j] = V[i,j] = ∂f[i] ˆ (X(n − 1), U (k − 1), 0) ∂x[j] ∂f[i] ˆ (X(n − 1), U (k − 1), 0) ∂w[j] ∂h[i] ˆ (X(n − 1), 0) ∂x[j] ∂h[i] ˆ (X(n − 1), 0) . ∂v[j] (12.15) (12.16)
Hopefully the interpretation of these equations is clear. The nonlinear state and measurement equations are linearised about an initial, a priori estimate. The four Jacobian matrices describe the rate of change of the nonlinear functions at these estimates. Due to the nonlinearities, the random variables in the EKF cannot be guaranteed to be Gaussian. Further, no general optimality criterion can be associated with the resulting estimate. One way to interpret the EKF is that it is simply an adaptation of the Kalman ﬁlter to the nonlinear case through the use of a ﬁrst order Taylor series expansion about the a priori estimate. By using the equations above and following the methodology in deriving the Kalman ﬁlter, the complete algorithm for implementing an EKF is:
12.3. EXTENDED KALMAN FILTER 1. Predict the next state and covariance using the previous estimate. ˆ ˆ (a) X − (n) = f (X(n − 1), U (n − 1), 0)
151
(b) P − (n) = A(n)P (n − 1)A(n)T + C(n)Q(n − 1)C(n)T 2. Correct the predictions using the measurement. (a) K(n) = P − (n)H(n)T (H(n)P − (n)H(n)T + V RV T )−1 ˆ ˆ ˆ (b) X(n) = X − (n) + K(n)(Z(n) − h(X − (n), 0)) (c) P (n) = (I − KH(n))P − (n)
152
CHAPTER 12. DISCRETE KALMAN FILTER
Appendix A
DiscreteTime Fourier Transform
Sequence ax(n) + by(n) x(n − nd ) (nd integer) e−jω0 n x(n) x(−n) nx(n) x(n) ⋆ y(n) x(n)y(n) −jnx(n) Symmetry properties: x(n)∗ x(n) real x(n) real Fourier Transform aX(ejω ) + bY (ejω ) e−jωnd X(ejω ) X(ej(ω+ω0 ) ) X(e−jω ) X(ejω )∗ if x(n) real dX(ejω ) j dω X(ejω )Y (ejω ) π 1 X(ejθ )Y (ej(ω−θ )dθ 2π −π
dX(ejω ) dω
X(e−jω )∗ X(ejω ) = X(e−jω )∗ XR (ejω ) andX(ejω ) are even XI (ejω ) and argX(ejω ) are odd X(ejω )∗ X(ejω ) real and even X(ejω ) pure imaginary and odd
x(−n)∗ x(n) real and even x(n) real and odd Parseval’s Theorem ∞ 1 x(n)2 = 2π n=−∞
∞ n=−∞
π −π
X(ejω ) dω
π −π
2
x(n) · y(n)∗ =
1 2π
X(ejω ) · Y (ejω )∗ dω
153
154
APPENDIX A. DISCRETETIME FOURIER TRANSFORM
Sequence δ(n) δ(n − n0 ) 1, (−∞ < n < ∞) an u(n), (a < 1) u(n) (n + 1)an u(n) (a < 1) rn sin(ωp (n + 1)) u(n) (r < 1) sin(ωp ) sin(ωc n) πn 1, 0 ≤ n ≤ M x(n) = 0, otherwise ejω0 m cos(ω0 n + φ)
Fourier Transform 1 e−jωn0
∞
2πδ(ω + 2πk)
k=−∞
1 1 − ae−jω ∞ 1 πδ(ω + 2πk) + 1 − e−jω k=−∞ 1 (1 − ae−jω )2 1 1 − 2r cos(ωp e−jω ) + r2 e−j2ω 1, ω < ωc X(ejω ) = 0, ωc < ω ≤ π sin[ω(M + 1)/2] −jωM/2 e sin(ω/2)
∞ k=−∞ ∞
2πδ(ω − ω0 + 2πk) [ejφ δ(ω − ω0 + 2πk) + e−jφ δ(ω + ω0 + 2πk)
π
k=−∞
This action might not be possible to undo. Are you sure you want to continue?
We've moved you to where you read on your other device.
Get the full title to continue reading from where you left off, or restart the preview.