The Theory of Linear Prediction

Copyright © 2008 by Morgan & Claypool
All rights reserved. No part of this publication may be reproduced, stored in a retrieval system, or transmitted in
any form or by any means---electronic, mechanical, photocopy, recording, or any other except for brief quotations
in printed reviews, without the prior permission of the publisher.
The Theory of Linear Prediction
P. P. Vaidyanathan
www.morganclaypool.com
ISBN: 1598295756 paperback
ISBN: 9781598295757 paperback
ISBN: 1598295764 ebook
ISBN: 9781598295764 ebook
DOI: 10.2200/S00086ED1V01Y200712SPR03
A Publication in the Morgan & Claypool Publishers series
SYNTHESIS LECTURES ON SIGNAL PROCESSING # 3
Lecture #3
Series Editor: José Moura, Carnegie Mellon University
Series ISSN
ISSN 1932-1236 print
ISSN 1932-1694 electronic
The Theory of Linear Prediction
P. P. Vaidyanathan
California Institute of Technology
SYNTHESIS LECTURES ON SIGNAL PROCESSING #3
Morgan & Claypool Publishers
To Usha, Vikram, and Sagar and my parents.
ABSTRACT
Linear prediction theory has had a profound impact in the field of digital signal processing.
Although the theory dates back to the early 1940s, its influence can still be seen in applications
today. The theory is based on very elegant mathematics and leads to many beautiful insights into
statistical signal processing. Although prediction is only a part of the more general topics of linear
estimation, filtering, and smoothing, this book focuses on linear prediction. This has enabled
detailed discussion of a number of issues that are normally not found in texts. For example, the
theory of vector linear prediction is explained in considerable detail and so is the theory of line
spectral processes. This focus and its small size make the book different from many excellent texts
which cover the topic, including a few that are actually dedicated to linear prediction. There are
several examples and computer-based demonstrations of the theory. Applications are mentioned
wherever appropriate, but the focus is not on the detailed development of these applications. The
writing style is meant to be suitable for self-study as well as for classroom use at the senior and
first-year graduate levels. The text is self-contained for readers with introductory exposure to signal
processing, random processes, and the theory of matrices, and a historical perspective and detailed
outline are given in the first chapter.
KEYWORDS
Linear prediction theory, vector linear prediction, linear estimation, filtering, smoothing,
line spectral processes, Levinson’s recursion, lattice structures, autoregressive models
Preface
Linear prediction theory has had a profound impact in the field of digital signal processing.
Although the theory dates back to the early 1940s, its influence can still be seen in applications
today. The theory is based on very elegant mathematics and leads to many beautiful insights into
statistical signal processing. Although prediction is only a part of the more general topics of linear
estimation, filtering, and smoothing, I have focused on linear prediction in this book. This has
enabled me to discuss in detail a number of issues that are normally not found in texts. For example,
the theory of vector linear prediction is explained in considerable detail and so is the theory of
line spectral processes. This focus and its small size make the book different from many excellent
texts that cover the topic, including a few that are actually dedicated to linear prediction. There are
several examples and computer-based demonstrations of the theory. Applications are mentioned
wherever appropriate, but the focus is not on the detailed development of these applications.
The writing style is meant to be suitable for self-study as well as for classroom use at the
senior and first-year graduate levels. Indeed, the material here emerged from classroom lectures that
I had given over the years at the California Institute of Technology. So, the text is self-contained
for readers with introductory exposure to signal processing, random processes, and the theory of
matrices. A historical perspective and a detailed outline are given in Chapter 1.
Acknowledgments
The pleasant academic environment provided by the California Institute of Technology and the
generous support from the National Science Foundation and the Office of Naval Research have
been crucial in developing some of the advanced materials covered in this book.
During my ‘‘young’’ days, I was deeply influenced by an authoritative tutorial on linear
filtering by Prof. Tom Kailath (1974) and a wonderful tutorial on linear prediction by John
Makhoul (1975). These two articles, among other excellent references, have taught me a lot and
so has the book by Anderson and Moore (1979). My ‘‘love for linear prediction’’ was probably
kindled by these three references. The small contribution I have made here would not have been
possible were it not for these references and other excellent ones mentioned in the introduction in
Chapter 1.
It is impossible to reduce to words my gratitude to Usha, who has shown infinite patience
during my busy days with research and book projects. She has endured many evenings and
weekends of my ‘‘disappearance’’ to work. Her sincere support and the enthusiasm and love from
my uncomplaining sons Vikram and Sagar are much appreciated!
P. P. Vaidyanathan
California Institute of Technology
Contents
1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 History of Linear Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Scope and Outline . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.2.1 Notations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
2. The Optimal Linear Prediction Problem. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.2 Prediction Error and Prediction Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2.3 The Normal Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.3.1 Expression for the Minimized Mean Square Error . . . . . . . . . . . . . . . . . . . . . . . 8
2.3.2 The Augmented Normal Equation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
2.4 Properties of the Autocorrelation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2.4.1 Relation Between Eigenvalues and the Power Spectrum . . . . . . . . . . . . . . . . 12
2.4.2 Singularity of the Autocorrelation Matrix. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.4.3 Determinant of the Autocorrelation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
2.5 Estimating the Autocorrelation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.6 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
3. Levinson’s Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Derivation of Levinson’s Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2.1 Summary of Levinson’s Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.2.2 The Partial Correlation Coefficient . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.3 Simple Properties of Levinson’s Recursion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.4 The Whitening Effect . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.5 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
4. Lattice Structures for Linear Prediction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 The Backward Predictor . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2.1 All-Pass Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
4.2.2 Orthogonality of the Optimal Prediction Errors. . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Lattice Structures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.3.1 The IIR LPC Lattice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4.3.2 Stability of the IIR Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.3.3 The Upward and Downward Recursions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38
4.4 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
5. Autoregressive Modeling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.2 Autoregressive Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
5.3 Approximation by an AR(N) Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.3.1 If a Process is AR, Then LPC Will Reveal It . . . . . . . . . . . . . . . . . . . . . . . . . . 45
5.3.2 Extrapolation of the Autocorrelation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.4 Autocorrelation Matching Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
5.5 Power Spectrum of the AR Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.6 Application in Signal Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
5.7 MA and ARMA Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.8 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
6. Prediction Error Bound and Spectral Flatness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.2 Prediction Error for an AR Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
6.3 A Measure of Spectral Flatness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
6.4 Spectral Flatness of an AR Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
6.5 Case Where Signal Is Not AR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
6.5.1 Error Spectrum Gets Flatter as Predictor Order Grows . . . . . . . . . . . . . . . . . 76
6.5.2 Mean Square Error and Determinant . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
6.6 Maximum Entropy and Linear Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
6.6.1 Connection to the Notion of Entropy . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
6.6.2 A Direct Maximization Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
6.6.3 Entropy of the Prediction Error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
6.7 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
7. Line Spectral Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
7.2 Autocorrelation of a Line Spectral Process. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
7.2.1 The Characteristic Polynomial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
7.2.2 Rank Saturation for a Line Spectral Process . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3 Time Domain Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.1 Extrapolation of Autocorrelation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.2 All Zeros on the Unit Circle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 91
7.3.3 Periodicity of a Line Spectral Process . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
7.3.4 Determining the Parameters of a Line Spectral Process . . . . . . . . . . . . . . . . . 93
7.4 Further Properties of Time Domain Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 93
7.5 Prediction Polynomial of Line Spectral Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
7.6 Summary of Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
7.7 Identifying a Line Spectral Process in Noise . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
7.7.1 Eigenstructure of the Autocorrelation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . 103
7.7.2 Computing the Powers at the Line Frequencies . . . . . . . . . . . . . . . . . . . . . . . 105
7.8 Line Spectrum Pairs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
7.9 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 115
8. Linear Prediction Theory for Vector Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.2 Formulation of the Vector LPC Problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
8.3 Normal Equations: Vector Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.4 Backward Prediction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
8.5 Levinson’s Recursion: Vector Case . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 122
8.6 Properties Derived from Levinson’s Recursion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.6.1 Properties of Matrices $F_m^f$ and $F_m^b$ . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126
8.6.2 Monotone Properties of the Error Covariance Matrices . . . . . . . . . . . . . . . . 128
8.6.3 Summary of Properties Relating to Levinson’s Recursion . . . . . . . . . . . . . . 128
8.7 Transfer Matrix Functions in Vector LPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 129
8.8 The FIR Lattice Structure for Vector LPC. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
8.8.1 Toward Rearrangement of the Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 131
8.8.2 The Symmetrical Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
8.9 The IIR Lattice Structure for Vector LPC . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 135
8.10 The Normalized IIR Lattice . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
8.11 The Paraunitary or MIMO All-Pass Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138
8.11.1 Unitarity of Building Blocks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
8.11.2 Propagation of Paraunitary Property . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
8.11.3 Poles of the MIMO IIR Lattice. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 142
8.12 Whitening Effect and Stalling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
8.13 Properties of Transfer Matrices in LPC Theory. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.13.1 Review of Matrix Fraction Descriptions . . . . . . . . . . . . . . . . . . . . . . . . . . 146
8.13.2 Relation Between Predictor Polynomials . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
8.14 Concluding Remarks . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
A. Linear Estimation of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
A.1 The Orthogonality Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 151
A.2 Closed-Form Solution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 152
A.3 Consequences of Orthogonality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154
A.4 Singularity of the Autocorrelation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 156
B. Proof of a Property of Autocorrelations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
C. Stability of the Inverse Filter . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
D. Recursion Satisfied by AR Autocorrelations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 161
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163
References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 171
Author Biography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 177
Index . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
C H A P T E R 1
Introduction
Digital signal processing has influenced modern technology in many wonderful ways. During the
course of signal processing history, several elegant theories have evolved on a variety of important
topics. These theoretical underpinnings, of which we can be very proud, are certainly at the heart
of the crucial contributions that signal processing has made to the modern technological society we
live in.
One of these is the theory of linear prediction. This theory, dating back to the 1940s, is
fundamental to a number of signal processing applications. For example, it is at the center of many
modern power spectrum estimation techniques. A variation of the theory arises in the identification
of the direction of arrival of an electromagnetic wave, which is important in sensor networks, array
processing, and radar. The theory has been successfully used for the representation, modeling,
compression, and computer generation of speech waveforms. It has also given rise to the idea of
line spectrum pairs, which are used in speech compression based on perceptual measures. More
recently, vector versions of linear prediction theory have been applied for the problem of blind
identification of noisy communication channels.
Another product of the theory is a class of filtering structures called lattice structures. These
have been found to be important in speech compression. The same structures find use in the design
of robust adaptive digital filter structures. The infinite impulse response (IIR) version of the linear
prediction lattice is identical to the well-known all-pass lattice structure that arises in digital filter
theory. The lattice has been of interest because of its stability and robustness properties despite
quantization.
In this book, we give a detailed presentation of the theory of linear prediction and place in
evidence some of the applications mentioned above. Before discussing the scope and outline, it is
important to have a brief glimpse of the history of linear prediction.
1.1 HISTORY OF LINEAR PREDICTION
Historically, linear prediction theory can be traced back to the 1941 work of Kolmogorov, who
considered the problem of extrapolation of discrete time random processes. Other early pioneers
are Levinson (1947), Wiener (1949), and Wiener and Masani (1958), who showed how to extend
the ideas to the case of multivariate processes. One of Levinson's contributions, which for some
reason he regarded as ‘‘mathematically trivial,'' is still in wide use today (Levinson's recursion). For
a very detailed scholarly review of the history of statistical filtering, we have to refer the reader to
the classic article by Kailath (1974), wherein the history of linear estimation is traced back to its
very roots.
An influential early tutorial is the article by John Makhoul (1975), which reviews the
mathematics of linear prediction, Levinson’s recursion, and so forth and makes the connection to
spectrum estimation and the representation of speech signals. The connection to power spectrum
estimation is studied in great detail in a number of early articles [Kay and Marple, 1981; Robinson,
1982; Marple, 1987; Kay, 1988]. Connections to maximum entropy and spectrum estimation
techniques can be found in Kay and Marple (1981), Papoulis (1981), and Robinson (1982). Articles
that show the connection to direction of arrival and array processing include Schmidt (1979),
Kumaresan (1983), and Paulraj et al. (1986).
Pioneering work that explored the application in speech coding includes the work of Atal
and Schroeder (1970) and that of Itakura and Saito (1970). Other excellent references for this
are Makhoul (1975), Rabiner and Schafer (1978), Jayant and Noll (1984), Deller et al. (1993),
and Schroeder (1999). Applications of multivariable models can be found even in the early image
processing literature (e.g., see Chellappa and Kashyap, 1985).
The connection to lattice structures was studied by Gray and Markel (1973) and Makhoul
(1977), and their applications in adaptive filtering and channel equalization were studied by a
number of authors (e.g., Satorius and Alexander, 1979). This application can be found in a number
of books (e.g., Haykin, 2002; Sayed, 2003). All-pass lattice structures that arise in digital filter
theory are explained in detail in standard signal processing texts (Vaidyanathan, 1993; Proakis and
Manolakis, 1996; Oppenheim and Schafer, 1999; Mitra, 2001; Antoniou, 2006).
Although the history of linear prediction can be traced back to the 1940s, it still finds new
applications. This is the beauty of any solid mathematical theory. An example is the application of
the vector version of linear prediction theory in blind identification of noisy finite impulse response
(FIR) channels (Gorokhov and Loubaton, 1999; Lopez-Valcarce and Dasgupta, 2001).
1.2 SCOPE AND OUTLINE
Many signal processing books include a good discussion of linear prediction theory, for example,
Markel and Gray (1976), Rabiner and Schafer (1978), Therrien (1992), Deller et al. (1993),
Anderson and Moore (1979), Kailath et al. (2000), Haykin (2002), and Sayed (2003). The book
by Strobach (1990) is dedicated to an extensive discussion of linear prediction, and so is the early
book by Markel and Gray (1976), which also discusses application to speech. The book by Therrien
(1992) not only has a nice chapter on linear prediction, it also explains the applications in spectrum
estimation extensively. Another excellent book is the one by Kailath et al. (2000), which focuses on
the larger topic of linear estimation. Connections to array processing are also covered by Therrien
(1992) and Van Trees (2002).
This short book focuses on the theory of linear prediction. The tight focus allows us to
include under one cover details that are not normally found in other books. The style and emphasis
here are different from the above references. Unlike most of the other references, we have included
a thorough treatment of the vector case (multiple-input/multiple-output linear predictive coding,
or MIMO LPC) in a separate chapter. Another novelty is the inclusion of a detailed chapter on the
theory of line spectral processes. There are several examples and computer-based demonstrations
throughout the book to enhance the theoretical ideas. Applications are briefly mentioned so that
the reader can see the connections. The reader interested in these applications should peruse some
of the references mentioned above and references in the individual chapters.
Chapter 2 introduces the optimal linear prediction problem and develops basic equations for
optimality, called the normal equations. A number of properties of the solution are also studied.
Chapter 3 introduces Levinson’s recursion, which is a fast procedure to solve the normal equations.
This recursion places in evidence further properties of the solution. In Chapter 4, we develop lattice
structures for linear prediction. Chapter 5 is dedicated to the topic of autoregressive modeling,
which has applications in signal representation and compression.
In Chapter 6, we develop the idea of flatness of a power spectrum and relate it to predictability
of a process. Line spectral processes are discussed in detail in Chapter 7. One application is in the
identification of sinusoids in noise, which is similar to the problem of identifying the direction of
arrival of an electromagnetic wave. The chapter also discusses the theory of line spectrum pairs,
which are used in speech compression.
In Chapter 8, a detailed discussion of linear prediction for the MIMO case (case of vector
processes) is presented. The MIMO lattice and its connection to paraunitary matrices is also
explored in this chapter. The appendices include some review material on linear estimation as well
as a few details pertaining to some of the proofs in the main text.
A short list of homework problems covering most of the chapters is included at the end,
before the bibliography section.
1.2.1 Notations
Boldface letters such as A and v indicate matrices and vectors. Superscripts $T$, $*$, and $\dagger$, as in $\mathbf{A}^T$, $\mathbf{A}^*$,
and $\mathbf{A}^\dagger$, denote, respectively, the transpose, conjugate, and transpose-conjugate of a matrix. The
tilde notation on a function of $z$ is defined as follows:
$$\tilde{H}(z) = H^\dagger(1/z^*).$$
Thus,
$$H(z) = \sum_n h(n)\, z^{-n} \;\Longrightarrow\; \tilde{H}(z) = \sum_n h^\dagger(n)\, z^{n},$$
so that the tilde notation effectively replaces all coefficients with the transpose conjugates and
replaces z with 1/z. For example,
$$H(z) = h(0) + h(1)z^{-1} \;\Longrightarrow\; \tilde{H}(z) = h^*(0) + h^*(1)z$$
and
$$H(z) = \frac{a_0 + a_1 z^{-1}}{1 + b_1 z^{-1}} \;\Longrightarrow\; \tilde{H}(z) = \frac{a_0^* + a_1^* z}{1 + b_1^* z}.$$
Note that the tilde notation reduces to transpose conjugation on the unit circle:
$$\tilde{H}(e^{j\omega}) = H^\dagger(e^{j\omega}).$$
As mentioned in the preface, the text is self-contained for readers with introductory exposure to
signal processing, random processes, and matrices. The determinant of a square matrix A is denoted
as det A and the trace as Tr A. Given two Hermitian matrices A and B, the notation A ≥ B means
that A −B is positive semidefinite, and A > B means that A −B is positive definite.
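For a scalar FIR filter, the tilde operation amounts to conjugating the coefficients and replacing $z^{-1}$ by $z$, so on the unit circle it reduces to plain complex conjugation. A minimal numerical check of this identity in Python/NumPy, using arbitrary illustrative coefficients (not taken from the text):

```python
import numpy as np

h = np.array([1.0 + 0.5j, -0.25j, 0.75])      # illustrative coefficients h(0), h(1), h(2)
w = np.linspace(0, 2 * np.pi, 16, endpoint=False)
n = np.arange(len(h))

H = np.array([np.sum(h * np.exp(-1j * wk * n)) for wk in w])                 # H(e^{jw})
H_tilde = np.array([np.sum(np.conj(h) * np.exp(1j * wk * n)) for wk in w])   # sum_n h*(n) e^{jwn}

assert np.allclose(H_tilde, np.conj(H))       # tilde{H}(e^{jw}) = H*(e^{jw}) on the unit circle
```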
• • • •
C H A P T E R 2
The Optimal Linear Prediction
Problem
2.1 INTRODUCTION
In this chapter, we introduce the optimal linear prediction problem. We develop the equations for
optimality and discuss some properties of the solution.
2.2 PREDICTION ERROR AND PREDICTION POLYNOMIAL
Let x(n) be a wide sense stationary (WSS) random process (Papoulis, 1965), possibly complex.
Suppose we wish to predict the value of the sample x(n) using a linear combination of the N most
recent past samples. The estimate has the form
$$\hat{x}^f_N(n) = -\sum_{i=1}^{N} a_{N,i}^*\, x(n-i). \qquad (2.1)$$
The integer N is called the prediction order. Notice the use of two subscripts for a, the first one
being the prediction order. The superscript f on the left is a reminder that we are discussing the
‘‘forward’’ predictor, in contrast to the ‘‘backward predictor’’ to be introduced later. The estimation
error is
$$e^f_N(n) = x(n) - \hat{x}^f_N(n), \qquad (2.2)$$
that is,
$$e^f_N(n) = x(n) + \sum_{i=1}^{N} a_{N,i}^*\, x(n-i). \qquad (2.3)$$
We denote the mean squared error as $\mathcal{E}^f_N$:
$$\mathcal{E}^f_N \triangleq E\big[\,|e^f_N(n)|^2\,\big]. \qquad (2.4)$$
In view of the WSS property, this is independent of time. The optimum predictor (i.e., the optimum
set of coefficients $a_{N,i}^*$) is the one that minimizes this mean squared value. From Eq. (2.3), we see
that the prediction error $e^f_N(n)$ can be regarded as the output of an FIR filter $A_N(z)$ in response to
the WSS input x(n). See Fig. 2.1. The FIR filter transfer function is given by
$$A_N(z) = 1 + \sum_{i=1}^{N} a_{N,i}^*\, z^{-i}. \qquad (2.5)$$
FIGURE 2.1: (a) The FIR prediction filter $A_N(z)$ and (b) the IIR inverse filter $1/A_N(z)$.
The IIR filter $1/A_N(z)$ can therefore be used to reconstruct x(n) from the error signal $e^f_N(n)$ (Fig.
2.1(b)). The conjugate sign on $a_{N,i}$ in the definition (2.5) is for future convenience. Because $e^f_N(n)$
is the output of a filter in response to a WSS input, we see that $e^f_N(n)$ is itself a WSS random
process. $A_N(z)$ is called the prediction polynomial, although its output is only the prediction error.
Thus, linear prediction essentially converts the signal x(n) into the set of N numbers $\{a_{N,i}\}$
and the error signal $e^f_N(n)$. We will see later that the error $e^f_N(n)$ has a relatively flat power spectrum
compared with x(n). For large N, the error is nearly white, and the spectral information of x(n) is
mostly contained in the coefficients $\{a_{N,i}\}$. This fact is exploited in data compression applications.
The technique of linear predictive coding (LPC) is the process of converting segments of a real-time
signal into the small set of numbers $\{a_{N,i}\}$ for storage and transmission (Section 5.6).
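In implementation terms, Fig. 2.1(a) is just an FIR filtering operation: convolving the data with the coefficient vector of $A_N(z)$ produces the prediction error. A minimal sketch in Python/NumPy (the helper name and the test coefficients are illustrative, not from the text):

```python
import numpy as np

def prediction_error(x, a):
    """Output of the FIR filter A_N(z) = 1 + sum_i a[i-1]^* z^{-i} driven by x(n).

    x : samples of the process; a : predictor coefficients [a_{N,1}, ..., a_{N,N}].
    Zero initial conditions are assumed for the first N samples.
    """
    taps = np.concatenate(([1.0], np.conj(a)))    # coefficients of A_N(z)
    return np.convolve(x, taps)[: len(x)]

x = np.random.randn(1000)                         # arbitrary test signal
e = prediction_error(x, np.array([-0.8, 0.2]))    # illustrative 2nd-order coefficients
```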
2.3 THE NORMAL EQUATIONS
From Appendix A.1, we know that the optimal value of $a_{N,i}$ should be such that the error $e^f_N(n)$ is
orthogonal to x(n - i), that is,
$$E[e^f_N(n)\, x^*(n-i)] = 0, \quad 1 \le i \le N. \qquad (2.6)$$
This condition gives rise to N equations similar to Eq. (A.13). The elements of the matrices R and
r, defined in Eq. (A.13), now have a special form. Thus,
$$[\mathbf{R}]_{im} = E[x(n-1-i)\, x^*(n-1-m)], \quad 0 \le i,m \le N-1.$$
Define R(k) to be the autocorrelation sequence of the WSS process x(n), that is,¹
$$R(k) = E[x(n)\, x^*(n-k)]. \qquad (2.7)$$
Using the fact that $R(k) = R^*(-k)$, we can then simplify Eq. (A.13) to obtain
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & \cdots & R(N-1) \\
R^*(1) & R(0) & \cdots & R(N-2) \\
\vdots & \vdots & \ddots & \vdots \\
R^*(N-1) & R^*(N-2) & \cdots & R(0)
\end{bmatrix}}_{\mathbf{R}_N}
\begin{bmatrix} a_{N,1} \\ a_{N,2} \\ \vdots \\ a_{N,N} \end{bmatrix}
= \underbrace{-\begin{bmatrix} R^*(1) \\ R^*(2) \\ \vdots \\ R^*(N) \end{bmatrix}}_{-\mathbf{r}} \qquad (2.8)$$
For example, with N = 3, we get
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & R(2) \\
R^*(1) & R(0) & R(1) \\
R^*(2) & R^*(1) & R(0)
\end{bmatrix}}_{\mathbf{R}_3}
\begin{bmatrix} a_{3,1} \\ a_{3,2} \\ a_{3,3} \end{bmatrix}
= -\begin{bmatrix} R^*(1) \\ R^*(2) \\ R^*(3) \end{bmatrix}. \qquad (2.9)$$
These equations have been known variously in the literature as normal equations, Yule--Walker
equations, and Wiener--Hopf equations. We shall refer to them as normal equations. We can find a
unique set of optimal predictor coefficients $a_{N,i}$, as long as the N × N matrix $\mathbf{R}_N$ is nonsingular.
Singularity of this matrix will be analyzed in Section 2.4.2. Note that the matrix $\mathbf{R}_N$ is Toeplitz,
that is, all the elements on any line parallel to the main diagonal are identical. We will elaborate
more on this later.
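For a quick numerical check, the normal equations (2.8) can be solved with a general-purpose linear solver (Levinson's recursion of Chapter 3 exploits the Toeplitz structure and is far more efficient). The sketch below assumes the autocorrelation values are already available; the function name is mine, not from the text:

```python
import numpy as np
from scipy.linalg import toeplitz

def solve_normal_equations(R):
    """Solve Eq. (2.8) given R = [R(0), R(1), ..., R(N)], with R(k) = E[x(n)x*(n-k)].

    Returns (a, err): coefficients [a_{N,1}, ..., a_{N,N}] and the error E_N^f of Eq. (2.10).
    """
    R = np.asarray(R)
    N = len(R) - 1
    RN = toeplitz(np.conj(R[:N]), R[:N])           # Hermitian Toeplitz matrix R_N
    a = np.linalg.solve(RN, -np.conj(R[1:N + 1]))  # right-hand side of Eq. (2.8)
    err = np.real(R[0] + np.sum(a * R[1:N + 1]))   # Eq. (2.10), derived in Section 2.3.1
    return a, err
```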
Minimum-phase property. The optimal predictor polynomial $A_N(z)$ in Eq. (2.5), which
is obtained from the solution to the normal equations, has a very interesting property: all its
zeros $z_k$ satisfy $|z_k| \le 1$. That is, $A_N(z)$ is a minimum-phase polynomial. In fact, the
zeros are strictly inside the unit circle, $|z_k| < 1$, unless x(n) is a very restricted process called a
line spectral process (Section 2.4.2). The minimum-phase property guarantees that the IIR filter
$1/A_N(z)$ is stable. The proof of the minimum-phase property follows automatically as a corollary
of Levinson's recursion, which will be presented in Section 3.2. A more direct proof
is given in Appendix C.
¹A more appropriate notation would be $R_{xx}(k)$, but we have omitted the subscript for simplicity.
2.3.1 Expression for the Minimized Mean Square Error
We can use Eq. (A.17) to arrive at the following expression for the minimized mean square error:
$$\mathcal{E}^f_N = R(0) + \sum_{i=1}^{N} a_{N,i}^*\, R^*(i).$$
Because $\mathcal{E}^f_N$ is real, we can conjugate this to obtain
$$\mathcal{E}^f_N = R(0) + \sum_{i=1}^{N} a_{N,i}\, R(i). \qquad (2.10)$$
Next, because $e^f_N(n)$ is orthogonal to the past N samples x(n - k), it is also orthogonal to the
estimate $\hat{x}^f_N(n)$. Thus, from $x(n) = \hat{x}^f_N(n) + e^f_N(n)$, we obtain
$$E[\,|x(n)|^2\,] = E[\,|\hat{x}(n)|^2\,] + \underbrace{E[\,|e^f_N(n)|^2\,]}_{\mathcal{E}^f_N}. \qquad (2.11)$$
Thus, the mean square value of x(n) is the sum of the mean square values of the estimate and the
estimation error.
Example 2.1: Second-Order Optimal Predictor. Consider a real WSS process with autocorrelation
sequence
$$R(k) = (24/5)\times 2^{-|k|} - (27/10)\times 3^{-|k|}. \qquad (2.12)$$
The values of the first few coefficients of R(k) are
$$R(0) = 2.1, \quad R(1) = 1.5, \quad R(2) = 0.9, \ldots \qquad (2.13)$$
The first-order predictor produces the estimate
$$\hat{x}^f_1(n) = -a_{1,1}\, x(n-1), \qquad (2.14)$$
and the optimal value of $a_{1,1}$ is obtained from $R(0)a_{1,1} = -R(1)$, that is,
$$a_{1,1} = -\frac{R(1)}{R(0)} = -\frac{5}{7}.$$
The optimal predictor polynomial is
$$A_1(z) = 1 + a_{1,1}z^{-1} = 1 - (5/7)z^{-1}. \qquad (2.15)$$
The minimized mean square error is, from Eq. (2.10),
$$\mathcal{E}^f_1 = R(0) + a_{1,1}R(1) = 36/35. \qquad (2.16)$$
The second-order optimal predictor coefficients are obtained by solving the normal equations Eq.
(2.8) with N = 2, that is,
$$\begin{bmatrix} 2.1 & 1.5 \\ 1.5 & 2.1 \end{bmatrix}
\begin{bmatrix} a_{2,1} \\ a_{2,2} \end{bmatrix}
= -\begin{bmatrix} 1.5 \\ 0.9 \end{bmatrix}. \qquad (2.17)$$
The result is $a_{2,1} = -5/6$ and $a_{2,2} = 1/6$. Thus, the optimal predictor polynomial is
$$A_2(z) = 1 - (5/6)z^{-1} + (1/6)z^{-2}. \qquad (2.18)$$
The minimized mean square error, computed from Eq. (2.10), is
$$\mathcal{E}^f_2 = R(0) + a_{2,1}R(1) + a_{2,2}R(2) = 1.0. \qquad (2.19)$$
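Using the `solve_normal_equations` sketch from Section 2.3 (a hypothetical helper, not part of the text), the numbers of this example can be reproduced directly:

```python
a1, err1 = solve_normal_equations([2.1, 1.5])        # first order:  a1 = [-5/7],      err1 = 36/35
a2, err2 = solve_normal_equations([2.1, 1.5, 0.9])   # second order: a2 = [-5/6, 1/6], err2 = 1.0
```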
2.3.2 The Augmented Normal Equation
We can augment the error information in Eq. (2.10) to the normal equations (2.8) simply by
moving the right-hand side in Eq. (2.8) to the left and adding an extra row at the top of the matrix
$\mathbf{R}_N$. The result is
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & \cdots & R(N) \\
R^*(1) & R(0) & \cdots & R(N-1) \\
\vdots & \vdots & \ddots & \vdots \\
R^*(N) & R^*(N-1) & \cdots & R(0)
\end{bmatrix}}_{\mathbf{R}_{N+1}}
\underbrace{\begin{bmatrix} 1 \\ a_{N,1} \\ \vdots \\ a_{N,N} \end{bmatrix}}_{\mathbf{a}_N}
= \begin{bmatrix} \mathcal{E}^f_N \\ 0 \\ \vdots \\ 0 \end{bmatrix} \qquad (2.20)$$
This is called the augmented normal equation for the Nth-order optimal predictor and will be used
in many of the following sections. We conclude this section with a couple of remarks about normal
equations.
1. Is any Toeplitz matrix an autocorrelation? Premultiplying both sides of the augmented
equation by the vector $\mathbf{a}_N^\dagger$, we obtain
$$\mathbf{a}_N^\dagger \mathbf{R}_{N+1} \mathbf{a}_N = \mathcal{E}^f_N. \qquad (2.21)$$
Because $\mathbf{R}_{N+1}$ is positive semidefinite, this gives a second verification of the (obvious) fact
that $\mathcal{E}^f_N \ge 0$. This observation, however, has independent importance. It can be used to
prove that any positive definite Toeplitz matrix is an autocorrelation of some WSS process
(see Problem 14).
2. From predictor coefficients to autocorrelations. Given the set of autocorrelation coefficients
R(0), R(1), . . . , R(N), we can uniquely identify the N predictor coefficients $a_{N,i}$ and the
mean square error $\mathcal{E}^f_N$ by solving the normal equations (assuming nonsingularity) and then
using Eq. (2.10). Conversely, suppose we are given the solution $a_{N,i}$ and the error $\mathcal{E}^f_N$.
Then, we can work backward and uniquely identify the autocorrelation coefficients R(k),
0 ≤ k ≤ N. This result, perhaps not obvious, is justified as part of the proof of Theorem
5.2 later.
2.4 PROPERTIES OF THE AUTOCORRELATION MATRIX
The N × N autocorrelation matrix $\mathbf{R}_N$ of a WSS process can be written as
$$\mathbf{R}_N = E[\mathbf{x}(n)\mathbf{x}^\dagger(n)], \qquad (2.22)$$
where
$$\mathbf{x}(n) = \begin{bmatrix} x(n) & x(n-1) & \cdots & x(n-N+1) \end{bmatrix}^T. \qquad (2.23)$$
For example, the 3 × 3 matrix $\mathbf{R}_3$ in Eq. (2.9) is
$$\mathbf{R}_3 = E\left\{\begin{bmatrix} x(n) \\ x(n-1) \\ x(n-2) \end{bmatrix}
\begin{bmatrix} x^*(n) & x^*(n-1) & x^*(n-2) \end{bmatrix}\right\}.$$
This follows from the definition $R(k) = E[x(n)x^*(n-k)]$ and from the property $R(k) = R^*(-k)$.
We first observe some of the simple properties of $\mathbf{R}_N$:
1. Positive definiteness. Because $\mathbf{x}(n)\mathbf{x}^\dagger(n)$ is Hermitian and positive semidefinite, $\mathbf{R}_N$ is also
Hermitian and positive semidefinite. In fact, it is positive definite as long as it is nonsingular.
Singularity is discussed in Section 2.4.2.
2. Diagonal elements. The quantity R(0) appears on all the diagonal elements and is the mean
square value of the random process, that is, $R(0) = E[\,|x(n)|^2\,]$.
3. Toeplitz property. We observed earlier that $\mathbf{R}_N$ has the Toeplitz property, that is, the (k, m)
element of $\mathbf{R}_N$ depends only on the difference m - k. To prove the Toeplitz property
formally, simply observe that
$$[\mathbf{R}_N]_{km} = E[x(n-k)\,x^*(n-m)] = E[x(n)\,x^*(n-m+k)] = R(m-k).$$
The Toeplitz property is a consequence of the WSS property of x(n). In Section 3.2, we
will derive a fast procedure called Levinson's recursion to solve the normal equations.
This recursion is made possible because of the Toeplitz property of $\mathbf{R}_N$.
4. A filtering interpretation of eigenvalues. The eigenvalues of the Toeplitz matrix $\mathbf{R}_N$ can be
given a nice interpretation. Consider an FIR filter
$$V(z) = v_0^* + v_1^* z^{-1} + \cdots + v_{N-1}^* z^{-(N-1)} \qquad (2.24)$$
with input x(n). Its output can be expressed as
$$y(n) = v_0^*\,x(n) + v_1^*\,x(n-1) + \cdots + v_{N-1}^*\,x(n-N+1) = \mathbf{v}^\dagger\mathbf{x}(n),$$
where
$$\mathbf{v}^\dagger = \begin{bmatrix} v_0^* & v_1^* & \cdots & v_{N-1}^* \end{bmatrix}.$$
The mean square value of y(n) is $E[\,|\mathbf{v}^\dagger\mathbf{x}(n)|^2\,] = \mathbf{v}^\dagger E[\mathbf{x}(n)\mathbf{x}^\dagger(n)]\mathbf{v} = \mathbf{v}^\dagger\mathbf{R}_N\mathbf{v}$. Thus,
$$E[\,|y(n)|^2\,] = \mathbf{v}^\dagger\mathbf{R}_N\mathbf{v}. \qquad (2.25)$$
That is, given the Toeplitz autocorrelation matrix $\mathbf{R}_N$, the quadratic form $\mathbf{v}^\dagger\mathbf{R}_N\mathbf{v}$ is the
mean square value of the output of the FIR filter V(z), with input signal x(n). In particular,
let v be a unit-norm eigenvector of $\mathbf{R}_N$ with eigenvalue λ, that is,
$$\mathbf{R}_N\mathbf{v} = \lambda\mathbf{v}, \quad \mathbf{v}^\dagger\mathbf{v} = 1.$$
Then,
$$\mathbf{v}^\dagger\mathbf{R}_N\mathbf{v} = \lambda. \qquad (2.26)$$
Thus, any eigenvalue of $\mathbf{R}_N$ can be regarded as the mean square value of the output y(n) for
an appropriate choice of the unit-energy filter V(z) driven by x(n).
In the next few subsections, we study some of the deeper properties of the autocorrelation matrix.
These will be found to be useful for future discussions.
2.4.1 Relation Between Eigenvalues and the Power Spectrum
Suppose $\lambda_i$, $0 \le i \le N-1$, are the eigenvalues of $\mathbf{R}_N$. Because the matrix is positive semidefinite,
we know that these are real and nonnegative. Now, let $S_{xx}(e^{j\omega})$ be the power spectrum of the
process x(n), that is,
$$S_{xx}(e^{j\omega}) = \sum_{k=-\infty}^{\infty} R(k)\,e^{-j\omega k}. \qquad (2.27)$$
We know $S_{xx}(e^{j\omega}) \ge 0$. Now, let $S_{\min}$ and $S_{\max}$ denote the extreme values of the power spectrum
(Fig. 2.2). We will show that the eigenvalues $\lambda_i$ are bounded as follows:
$$S_{\min} \le \lambda_i \le S_{\max}. \qquad (2.28)$$
For example, if x(n) is zero-mean white, then $S_{xx}(e^{j\omega})$ is constant, and all the eigenvalues are equal
(consistent with the fact that $\mathbf{R}_N$ is diagonal with all diagonal elements equal to R(0)).
Proof of Eq. (2.28). Consider again a filter V(z) as in (2.24) with input x(n) and denote the
output as y(n). Because the mean square value is the integral of the power spectrum, we have
$$E[\,|y(n)|^2\,] = \frac{1}{2\pi}\int_0^{2\pi} S_{yy}(e^{j\omega})\,d\omega
= \frac{1}{2\pi}\int_0^{2\pi} S_{xx}(e^{j\omega})\,|V(e^{j\omega})|^2\,d\omega
\le S_{\max}\,\frac{1}{2\pi}\int_0^{2\pi} |V(e^{j\omega})|^2\,d\omega
= S_{\max}\,\mathbf{v}^\dagger\mathbf{v},$$
where the last equality follows from Parseval's theorem. Similarly, $E[\,|y(n)|^2\,] \ge S_{\min}\,\mathbf{v}^\dagger\mathbf{v}$. With v
constrained to have unit norm ($\mathbf{v}^\dagger\mathbf{v} = 1$), we then have
$$S_{\min} \le E[\,|y(n)|^2\,] \le S_{\max}. \qquad (2.29)$$
FIGURE 2.2: The relation between power spectrum and eigenvalues of the autocorrelation matrix (all eigenvalues lie between $S_{\min}$ and $S_{\max}$).
FIGURE 2.3: Examples of power spectra that are well and poorly conditioned.
But we have already shown that any eigenvalue of $\mathbf{R}_N$ can be regarded as the mean square value
$E[\,|y(n)|^2\,]$ for an appropriate choice of the unit-energy filter V(z) (see the remark after Eq. (2.26)).
Because (2.29) holds for every possible output y(n) with unit-energy V(z), Eq. (2.28) therefore follows.
With $\lambda_{\min}$ and $\lambda_{\max}$ denoting the extreme eigenvalues of $\mathbf{R}_N$, the ratio
$$\mathcal{N} = \frac{\lambda_{\max}}{\lambda_{\min}} \ge 1 \qquad (2.30)$$
is called the condition number of the matrix. It is well known (Golub and Van Loan, 1989) that if
this ratio is large, then numerical errors tend to have a more severe effect during matrix inversion
(or when trying to solve Eq. (2.8)). It is also known that the condition number cannot decrease as
the size N increases (Problem 12). If the condition number is close to unity, we say that the system
of equations is well-conditioned. By comparison with Eq. (2.28), we see that
$$1 \le \mathcal{N} = \frac{\lambda_{\max}}{\lambda_{\min}} \le \frac{S_{\max}}{S_{\min}}. \qquad (2.31)$$
Thus, if a random process has a nearly flat power spectrum (i.e., $S_{\max}/S_{\min} \approx 1$), it can be
considered to be a well-conditioned process. If the power spectrum has a wide range, it is possible
that the process is poorly conditioned (i.e., $\mathcal{N}$ could be very large). See Fig. 2.3 for a demonstration.
Also see Problem 11.
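The bound (2.28) and the condition-number estimate (2.31) are easy to verify numerically. The sketch below uses an assumed autocorrelation $R(k) = \rho^{|k|}$ (chosen only for illustration), whose power spectrum $(1-\rho^2)/|1-\rho e^{-j\omega}|^2$ has a large dynamic range when $\rho$ is close to 1:

```python
import numpy as np
from scipy.linalg import toeplitz

rho, N = 0.9, 8
R = rho ** np.arange(N)                       # illustrative autocorrelation R(k) = rho^|k|
RN = toeplitz(R)                              # real process: symmetric Toeplitz R_N

lam = np.linalg.eigvalsh(RN)
w = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
Sxx = (1 - rho**2) / np.abs(1 - rho * np.exp(-1j * w))**2

print(Sxx.min() <= lam.min(), lam.max() <= Sxx.max())   # S_min <= lambda_i <= S_max
print("condition number:", lam.max() / lam.min(),
      "spectral dynamic range:", Sxx.max() / Sxx.min())
```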
2.4.2 Singularity of the Autocorrelation Matrix
If the autocorrelation matrix is singular, then the corresponding random variables are linearly
dependent (Section A.4). To be more quantitative, let us assume that $\mathbf{R}_{L+1}$ [the (L+1) × (L+1)
matrix] is singular. Then, proceeding as in Section A.4, we conclude that
$$v_0^*\,x(n) + v_1^*\,x(n-1) + \cdots + v_L^*\,x(n-L) = 0, \qquad (2.32)$$
where not all $v_i$'s are zero. This means, in particular, that if we measure a set of L successive samples
of x(n), then all the future samples can be computed recursively, with no error. That is, the process
is fully predictable.
FIGURE 2.4: The FIR filter V(z), which annihilates a line spectral process.
Eq. (2.32) implies that if we pass the WSS random process x(n) through an FIR filter (see
Fig. 2.4),
$$V(z) = v_0^* + v_1^* z^{-1} + \cdots + v_L^* z^{-L}, \qquad (2.33)$$
then the output y(n) is zero for all time! Its power spectrum $S_{yy}(e^{j\omega})$, therefore, is zero for all ω.
Thus,
$$S_{yy}(e^{j\omega}) = S_{xx}(e^{j\omega})\,|V(e^{j\omega})|^2 \equiv 0. \qquad (2.34)$$
Because V(z) has at most L zeros on the unit circle (i.e., $|V(e^{j\omega})|^2$ has, at most, L distinct zeros in
$0 \le \omega < 2\pi$), we conclude that $S_{xx}(e^{j\omega})$ can be nonzero only at these points. That is, it has the
form
$$S_{xx}(e^{j\omega}) = 2\pi\sum_{i=1}^{L} c_i\,\delta_a(\omega - \omega_i), \quad 0 \le \omega < 2\pi, \qquad (2.35)$$
which is a linear combination of Dirac delta functions. This is demonstrated in Fig. 2.5. The
autocorrelation of the process x(n), which is the inverse Fourier transform of $S_{xx}(e^{j\omega})$, then takes
the form
$$R(k) = \sum_{i=1}^{L} c_i\,e^{j\omega_i k}. \qquad (2.36)$$
A WSS process characterized by the power spectrum Eq. (2.35) (equivalently, the autocorrelation
(2.36)) is said to be a line spectral process, and the frequencies $\omega_i$ are called the line frequencies.
FIGURE 2.5: Power spectrum of a line spectral process (Dirac delta functions at the line frequencies).
Because the process is fully predictable, it is the exact ‘‘opposite’’ of a white process, which has no
correlation between any pair of samples. In terms of frequency domain, for a white process, the
power spectrum is constant, whereas for a fully predictable process, the power spectrum can only
have impulses---it cannot have any smooth component.
In Chapter 7, we present a more complete study of these and address the problem of
identifying the parameters ω
i
and c
i
when such a process is buried in noise.
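The following sketch illustrates full predictability numerically, under assumed line frequencies and powers (the values are mine, chosen only for this demonstration): the $(L+1)\times(L+1)$ matrix built from (2.36) is singular, and its null-space vector yields a filter V(z) that annihilates every realization of the process, as in (2.32):

```python
import numpy as np
from scipy.linalg import toeplitz

omega = np.array([0.3, 1.1])                    # L = 2 assumed line frequencies
c = np.array([1.0, 0.5])                        # assumed powers c_i
L = len(omega)

# Autocorrelation (2.36) for k = 0, ..., L, and the (L+1) x (L+1) matrix R_{L+1}
k = np.arange(L + 1)
R = (c[:, None] * np.exp(1j * np.outer(omega, k))).sum(axis=0)
R_L1 = toeplitz(np.conj(R), R)

eigval, eigvec = np.linalg.eigh(R_L1)
print("smallest eigenvalue:", eigval[0])        # essentially zero: R_{L+1} is singular

# A realization of the process and the annihilating filter of Eqs. (2.32)-(2.33)
n = np.arange(200)
theta = np.random.uniform(0, 2 * np.pi, L)      # random phases
x = (np.sqrt(c)[:, None] * np.exp(1j * (np.outer(omega, n) + theta[:, None]))).sum(axis=0)

v = eigvec[:, 0]                                # null-space vector of R_{L+1}
y = np.convolve(x, np.conj(v))                  # output of V(z) driven by x(n)
print("max |y(n)| after the transient:", np.abs(y[L:len(x)]).max())   # ~0
```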
2.4.3 Determinant of the Autocorrelation Matrix
We now show that the minimized mean square error $\mathcal{E}^f_N$ can be expressed directly in terms of the
determinants of the autocorrelation matrices $\mathbf{R}_{N+1}$ and $\mathbf{R}_N$. This result is of considerable value in
theoretical analysis.
Consider the augmented normal equation Eq. (2.20) for the Nth-order predictor. We can
further augment this equation to include the information about the lower-order predictors by
appending more columns. To demonstrate, let N = 3. We then obtain four sets of augmented
normal equations, one for each order, which can be written elegantly together as follows:
$$\mathbf{R}_4 \times
\begin{bmatrix}
1 & 0 & 0 & 0 \\
a_{3,1} & 1 & 0 & 0 \\
a_{3,2} & a_{2,1} & 1 & 0 \\
a_{3,3} & a_{2,2} & a_{1,1} & 1
\end{bmatrix}
=
\begin{bmatrix}
\mathcal{E}^f_3 & \times & \times & \times \\
0 & \mathcal{E}^f_2 & \times & \times \\
0 & 0 & \mathcal{E}^f_1 & \times \\
0 & 0 & 0 & \mathcal{E}^f_0
\end{bmatrix} \qquad (2.37)$$
This uses the fact that $\mathbf{R}_3$ is a submatrix of $\mathbf{R}_4$, that is,
$$\mathbf{R}_4 = \begin{bmatrix} R(0) & \times \\ \times & \mathbf{R}_3 \end{bmatrix}, \qquad (2.38)$$
and similarly, $\mathbf{R}_2$ is a submatrix of $\mathbf{R}_3$ and so forth. The entries × are possibly nonzero but will not
enter our discussion. Taking determinants, we arrive at
$$\det \mathbf{R}_4 = \mathcal{E}^f_3\,\mathcal{E}^f_2\,\mathcal{E}^f_1\,\mathcal{E}^f_0,$$
where we have used the fact that the determinant of a triangular matrix is equal to the product
of its diagonal elements (Horn and Johnson, 1985). Extending this for arbitrary N, we have the
following result:
$$\det \mathbf{R}_{N+1} = \mathcal{E}^f_N\,\mathcal{E}^f_{N-1}\cdots\mathcal{E}^f_0. \qquad (2.39)$$
In a similar manner, we have
$$\det \mathbf{R}_N = \mathcal{E}^f_{N-1}\,\mathcal{E}^f_{N-2}\cdots\mathcal{E}^f_0. \qquad (2.40)$$
Taking the ratio, we arrive at
$$\mathcal{E}^f_N = \frac{\det \mathbf{R}_{N+1}}{\det \mathbf{R}_N}. \qquad (2.41)$$
Thus, the minimized mean square prediction errors can be expressed directly in terms of the
determinants of two autocorrelation matrices. In Section 6.5.2, we will further show that
$$\lim_{N\to\infty}\,(\det \mathbf{R}_N)^{1/N} = \lim_{N\to\infty}\,\mathcal{E}^f_N. \qquad (2.42)$$
That is, the limiting value of the mean squared prediction error is identical to the limiting value of
$(\det \mathbf{R}_N)^{1/N}$.
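Relation (2.41) is easy to confirm numerically, for instance with the autocorrelation values of Example 2.1 (R(0) = 2.1, R(1) = 1.5, R(2) = 0.9):

```python
import numpy as np
from scipy.linalg import toeplitz

R = np.array([2.1, 1.5, 0.9])
R2, R3 = toeplitz(R[:2]), toeplitz(R[:3])          # autocorrelation matrices R_2 and R_3

# Eq. (2.41): the second-order prediction error is det R_3 / det R_2
print(np.linalg.det(R3) / np.linalg.det(R2))       # 1.0, matching E_2^f of Example 2.1
```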
2.5 ESTIMATING THE AUTOCORRELATION
In any practical application that involves linear prediction, the autocorrelation samples R(k) have
to be estimated from (possibly noisy) measured data x(n) representing the random process. There
are many methods for this, two of which are described here.
The autocorrelation method. In this method, we define the truncated version of measured
data
$$x_L(n) = \begin{cases} x(n), & 0 \le n \le L-1, \\ 0, & \text{outside}, \end{cases}$$
and compute its deterministic autocorrelation. Thus, the estimate of R(k) has the form
$$\widehat{R}(k) = \sum_{n=0}^{L-1} x_L(n)\,x_L^*(n-k). \qquad (2.43)$$
There are variations of this method; for example, one could divide the summation by the number
of terms in the sum (which depends on k) and so forth. From the estimated value $\widehat{R}(k)$, we form
the Toeplitz matrix $\mathbf{R}_N$ and proceed with the computation of the predictor coefficients.
A simple matrix interpretation of the estimation of $\mathbf{R}_N$ is useful. For example, if we have
L = 5 and wish to estimate the 3 × 3 autocorrelation matrix $\mathbf{R}_3$, the computation (2.43) is
equivalent to defining the data matrix
$$\mathbf{X} = \begin{bmatrix}
x(0) & 0 & 0 \\
x(1) & x(0) & 0 \\
x(2) & x(1) & x(0) \\
x(3) & x(2) & x(1) \\
x(4) & x(3) & x(2) \\
0 & x(4) & x(3) \\
0 & 0 & x(4)
\end{bmatrix} \qquad (2.44)$$
and forming the estimate of $\mathbf{R}_3$ using
$$\widehat{\mathbf{R}}_3 = (\mathbf{X}^\dagger\mathbf{X})^*. \qquad (2.45)$$
This is a Toeplitz matrix whose top row is $\widehat{R}(0), \widehat{R}(1), \ldots\,$. Furthermore, it is positive definite by
virtue of its form $\mathbf{X}^\dagger\mathbf{X}$. So, all the properties from linear prediction theory continue to be satisfied
by the polynomial $A_N(z)$. For example, we can use Levinson's recursion (Section 3.2) to solve for
$A_N(z)$, and all zeros of $A_N(z)$ are guaranteed to be in |z| < 1 (Appendix C).
Note that the number of samples of x(n) used in the estimate of R(k) decreases as k increases,
so the estimates are of good quality only if k is small compared with the number of available data
samples L. There are variations of this method that use a tapering window on the given data instead
of abruptly truncating it (Rabiner and Schafer, 1978; Therrien, 1992).
The covariance method. In a variation called the ‘‘covariance method,’’ the data matrix X is
formed differently:
X =



x(2) x(1) x(0)
x(3) x(2) x(1)
x(4) x(3) x(2)



(2.46)
and the autocorrelation estimated as
´
R
3
= (X

X)
*
. If necessary we can divide each element of the
estimate by a fixed integer so that this looks like a time average.
In the covariance method, each $\widehat{R}(k)$ is an average of N possibly nonzero samples of data,
unlike the autocorrelation method, where the number of samples used in the estimate of R(k)
decreases as k increases. So, the estimates, in general, tend to be better than in the autocorrelation
method, but the matrix $\widehat{\mathbf{R}}_N$ in this case is not necessarily Toeplitz. So, Levinson's recursion
(Section 3.2) cannot be applied for solving the normal equations, and we have to solve the normal
equations directly. Another problem with non-Toeplitz covariances is that the solution $A_N(z)$ is
not guaranteed to have all zeros inside the unit circle.
More detailed discussions on the relative advantages and disadvantages of these methods
can be found in many references (e.g., Makhoul, 1975; Rabiner and Schafer, 1978; Kay, 1988;
Therrien, 1992).
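The two estimates are easy to express through their data matrices. The sketch below is a minimal Python/NumPy rendition of (2.44)-(2.46); the function names are mine, and no windowing or normalization is applied:

```python
import numpy as np

def autocorrelation_method(x, N):
    """Estimate R_N via Eqs. (2.44)-(2.45): zero-padded data matrix, R = (X^H X)*."""
    x = np.asarray(x)
    L = len(x)
    X = np.zeros((L + N - 1, N), dtype=complex)
    for j in range(N):                      # column j holds x(n) delayed by j samples
        X[j:j + L, j] = x
    return np.conj(X.conj().T @ X)          # Toeplitz and positive (semi)definite

def covariance_method(x, N):
    """Estimate R_N via Eq. (2.46): rows use only measured samples (no zero padding)."""
    x = np.asarray(x)
    X = np.array([x[n - np.arange(N)] for n in range(N - 1, len(x))])
    return np.conj(X.conj().T @ X)          # generally not Toeplitz
```

With the same data, the first estimate is always Toeplitz and positive semidefinite, so Levinson's recursion and the minimum-phase guarantee apply; the second generally is not, as discussed above.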
2.6 CONCLUDING REMARKS
In this chapter, we introduced the linear prediction problem and discussed its solution. The solution
appears in the form of a set of linear equations called the normal equations. An efficient way to solve
the normal equations using a recursive procedure, called Levinson's recursion, will be introduced in
the next chapter. This recursion will place in evidence a structure called the lattice structure for linear
prediction. Deeper discussions on linear prediction will follow in later chapters. The extension of
the optimal prediction problem for the case of vector processes, the MIMO LPC problem, will be
considered in Chapter 8.
• • • •
C H A P T E R 3
Levinson’s Recursion
3.1 INTRODUCTION
The optimal linear predictor coefficients $a_{N,i}$ are solutions to the set of normal equations given
by Eq. (2.8). Traditional techniques to solve these equations require computations of the order of
$N^3$ (see, e.g., Golub and Van Loan, 1989). However, the N × N matrix $\mathbf{R}_N$ in these equations is
not arbitrary, but Toeplitz. This property can be exploited to solve these equations in an efficient
manner, requiring computations of the order of $N^2$. A recursive procedure for this, due to Levinson
(1947), will be described in this chapter. This procedure, in addition to being efficient, also places
in evidence many useful properties of the optimal predictor, as we shall see in the next several
sections.
3.2 DERIVATIONOF LEVINSON’S RECURSION
Levinson’s recursion is based on the observation that if the solution to the predictor problem is
known for order m, then the solution for order (m + 1) can be obtained by a simple updating
process. In this way, we obtain not only the solution to the Nth-order problem, but also all the
lower orders. To demonstrate the basic idea, consider the third-order prediction problem. The
augmented normal Eq. (2.20) becomes





R(0) R(1) R(2) R(3)
R*(1) R(0) R(1) R(2)
R*(2) R*(1) R(0) R(1)
R*(3) R*(2) R*(1) R(0)





. ¸¸ .
R
4





1
a
3,1
a
3,2
a
3,3





. ¸¸ .
a
3
=





E
f
3
0
0
0





. (3.1)
This set of four equations describes the third-order optimal predictor. Our aim is to show how we
can pass from the third-order to the fourth-order case. For this, note that we can append a fifth
equation to the above set of four and write it as
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & R(2) & R(3) & R(4) \\
R^*(1) & R(0) & R(1) & R(2) & R(3) \\
R^*(2) & R^*(1) & R(0) & R(1) & R(2) \\
R^*(3) & R^*(2) & R^*(1) & R(0) & R(1) \\
R^*(4) & R^*(3) & R^*(2) & R^*(1) & R(0)
\end{bmatrix}}_{\mathbf{R}_5}
\begin{bmatrix} 1 \\ a_{3,1} \\ a_{3,2} \\ a_{3,3} \\ 0 \end{bmatrix}
= \begin{bmatrix} \mathcal{E}^f_3 \\ 0 \\ 0 \\ 0 \\ \alpha_3 \end{bmatrix}. \qquad (3.2)$$
where
$$\alpha_3 \triangleq R^*(4) + a_{3,1}R^*(3) + a_{3,2}R^*(2) + a_{3,3}R^*(1). \qquad (3.3)$$
The matrix $\mathbf{R}_5$ above is Hermitian and Toeplitz. Using this, we verify that its elements satisfy
(Problem 9)
$$[\mathbf{R}_5]^*_{4-i,\,4-k} = [\mathbf{R}_5]_{ik}, \quad 0 \le i,k \le 4. \qquad (3.4)$$
In other words, if we reverse the order of all rows, then reverse the order of all columns and then
conjugate the elements, the result is the same matrix! As a result, Eq. (3.2) also implies
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & R(2) & R(3) & R(4) \\
R^*(1) & R(0) & R(1) & R(2) & R(3) \\
R^*(2) & R^*(1) & R(0) & R(1) & R(2) \\
R^*(3) & R^*(2) & R^*(1) & R(0) & R(1) \\
R^*(4) & R^*(3) & R^*(2) & R^*(1) & R(0)
\end{bmatrix}}_{\mathbf{R}_5}
\begin{bmatrix} 0 \\ a^*_{3,3} \\ a^*_{3,2} \\ a^*_{3,1} \\ 1 \end{bmatrix}
= \begin{bmatrix} \alpha^*_3 \\ 0 \\ 0 \\ 0 \\ \mathcal{E}^f_3 \end{bmatrix}. \qquad (3.5)$$
If we now take a linear combination of Eqs. (3.2) and (3.5) such that the last element on the
right-hand side becomes zero, we will obtain the equations governing the fourth-order predictor
indeed! Thus, consider the operation
$$\text{Eq. (3.2)} + k_4^* \times \text{Eq. (3.5)}, \qquad (3.6)$$
where $k_4$ is a constant. If we choose
$$k_4^* = \frac{-\alpha_3}{\mathcal{E}^f_3}, \qquad (3.7)$$
then the result has the form
$$\underbrace{\begin{bmatrix}
R(0) & R(1) & R(2) & R(3) & R(4) \\
R^*(1) & R(0) & R(1) & R(2) & R(3) \\
R^*(2) & R^*(1) & R(0) & R(1) & R(2) \\
R^*(3) & R^*(2) & R^*(1) & R(0) & R(1) \\
R^*(4) & R^*(3) & R^*(2) & R^*(1) & R(0)
\end{bmatrix}}_{\mathbf{R}_5}
\begin{bmatrix} 1 \\ a_{4,1} \\ a_{4,2} \\ a_{4,3} \\ a_{4,4} \end{bmatrix}
= \begin{bmatrix} \times \\ 0 \\ 0 \\ 0 \\ 0 \end{bmatrix}, \qquad (3.8)$$
where
$$a_{4,1} = a_{3,1} + k_4^*\,a_{3,3}^*, \quad
a_{4,2} = a_{3,2} + k_4^*\,a_{3,2}^*, \quad
a_{4,3} = a_{3,3} + k_4^*\,a_{3,1}^*, \quad
a_{4,4} = k_4^*.$$
By comparison with Eq. (3.1), we conclude that the element denoted × on the right-hand side of
Eq. (3.8) is $\mathcal{E}^f_4$, which is the minimized forward prediction error for the fourth-order predictor.
From the above construction, we see that this is related to $\mathcal{E}^f_3$ as
$$\mathcal{E}^f_4 = \mathcal{E}^f_3 + k_4^*\alpha_3^*
= \mathcal{E}^f_3 - k_4 k_4^*\,\mathcal{E}^f_3 \quad (\text{from Eq. (3.7)})
= (1 - |k_4|^2)\,\mathcal{E}^f_3.$$
Summarizing, if we know the coefficients $a_{3,i}$ and the mean square error $\mathcal{E}^f_3$ for the third-order
optimal predictor, we can find the corresponding quantities for the fourth-order predictor from the
above equations.
Polynomial Notation. The preceding computation of $\{a_{4,k}\}$ from $\{a_{3,k}\}$ can be written
more compactly if we use polynomial notations and define the FIR filter
$$A_m(z) = 1 + a_{m,1}^* z^{-1} + a_{m,2}^* z^{-2} + \cdots + a_{m,m}^* z^{-m} \quad (\text{prediction polynomial}). \qquad (3.9)$$
With this notation, we can rewrite the computation of $a_{4,k}$ from $a_{3,k}$ as
$$A_4(z) = A_3(z) + k_4\,z^{-1}\big[z^{-3}\tilde{A}_3(z)\big], \qquad (3.10)$$
where the tilde notation is as defined in Section 1.2. Thus,
$$z^{-m}\tilde{A}_m(z) = a_{m,m} + a_{m,m-1}z^{-1} + \cdots + a_{m,1}z^{-(m-1)} + z^{-m}. \qquad (3.11)$$
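In coefficient terms, $z^{-m}\tilde{A}_m(z)$ is simply the conjugated, order-reversed coefficient vector of $A_m(z)$. A one-line NumPy illustration (the coefficient values are arbitrary):

```python
import numpy as np

A3 = np.array([1.0, -0.5 + 0.2j, 0.3, 0.1j])   # [1, a_{3,1}^*, a_{3,2}^*, a_{3,3}^*], illustrative
A3_flip = np.conj(A3[::-1])                    # coefficients of z^{-3} tilde{A}_3(z), cf. Eq. (3.11)
```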
3.2.1 Summary of Levinson’s Recursion
The recursion demonstrated above for the third-order case can be generalized easily to arbitrary
predictor orders. Thus, let $A_m(z)$ be the predictor polynomial for the mth-order optimal predictor
and $\mathcal{E}^f_m$ the corresponding mean square value of the prediction error. Then, we can find the
corresponding quantities for the (m + 1)th-order optimal predictor as follows:
$$k_{m+1} = \frac{-\alpha_m^*}{\mathcal{E}^f_m}, \qquad (3.12)$$
$$A_{m+1}(z) = A_m(z) + k_{m+1}\,z^{-1}\big[z^{-m}\tilde{A}_m(z)\big] \quad (\text{order update}), \qquad (3.13)$$
$$\mathcal{E}^f_{m+1} = (1 - |k_{m+1}|^2)\,\mathcal{E}^f_m \quad (\text{error update}), \qquad (3.14)$$
where
$$\alpha_m = R^*(m+1) + a_{m,1}R^*(m) + a_{m,2}R^*(m-1) + \cdots + a_{m,m}R^*(1). \qquad (3.15)$$
Initialization. Once this recursion is initialized for small m (e.g., m = 0), it can be used to
solve the optimal predictor problem for any m. Note that for m = 0, the predictor polynomial is
$$A_0(z) = 1, \qquad (3.16)$$
and the error is, from Eq. (2.3), $e^f_0(n) = x(n)$. Thus,
$$\mathcal{E}^f_0 = R(0). \qquad (3.17)$$
Also, from the definition of $\alpha_m$, we have
$$\alpha_0 = R^*(1). \qquad (3.18)$$
The above three equations are used to initialize Levinson's recursion. For example, we can compute
$$k_1 = -\alpha_0^*/\mathcal{E}^f_0 = -R(1)/R(0), \qquad (3.19)$$
and evaluate $A_1(z)$ from Eq. (3.13) and so forth.
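The recursion (3.12)-(3.15), together with the initialization above, translates into a few lines of code. The following Python/NumPy sketch (function and variable names are mine, not from the text) returns the coefficients of $A_N(z)$, the parcor coefficients, and the final prediction error; with R = [2.1, 1.5, 0.9] it reproduces Example 3.1 below ($k_1 = -5/7$, $k_2 = 1/6$, error 1.0).

```python
import numpy as np

def levinson(R):
    """Levinson's recursion for the normal equations (2.8).

    R : [R(0), R(1), ..., R(N)] autocorrelation values, R(k) = E[x(n)x*(n-k)].
    Returns (A, parcor, err): coefficients of A_N(z) as [1, a_{N,1}*, ..., a_{N,N}*],
    the parcor coefficients k_1, ..., k_N, and the prediction error E_N^f.
    """
    R = np.asarray(R, dtype=complex)
    N = len(R) - 1
    A = np.array([1.0 + 0j])                 # A_0(z) = 1                        (Eq. 3.16)
    err = R[0].real                          # E_0^f = R(0)                      (Eq. 3.17)
    parcor = []
    for m in range(N):
        a = np.conj(A[1:])                   # a_{m,1}, ..., a_{m,m}
        # alpha_m = R*(m+1) + a_{m,1} R*(m) + ... + a_{m,m} R*(1)                (Eq. 3.15)
        alpha = np.conj(R[m + 1]) + np.sum(a * np.conj(R[m:0:-1]))
        k = -np.conj(alpha) / err            #                                   (Eq. 3.12)
        # order update: A_{m+1}(z) = A_m(z) + k z^{-1} [z^{-m} tilde{A}_m(z)]    (Eq. 3.13)
        A = np.concatenate((A, [0])) + k * np.concatenate(([0], np.conj(A[::-1])))
        err = (1 - abs(k) ** 2) * err        #                                   (Eq. 3.14)
        parcor.append(k)
    return A, np.array(parcor), err
```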
3.2.2 The Partial Correlation Coefficient
The quantity $k_i$ in Levinson's recursion has a nice ‘‘physical'' significance. Recall from Section 2.3
that the error $e^f_m(n)$ of the optimal predictor is orthogonal to the past samples
$$x(n-1), x(n-2), \ldots, x(n-m). \qquad (3.20)$$
The ‘‘next older'' sample x(n - m - 1), however, is not necessarily orthogonal to the error, that
is, the correlation $E[e^f_m(n)x^*(n-m-1)]$ could be nonzero. It can be shown (Problem 6) that the
quantity $\alpha_m^*$ in the numerator of $k_{m+1}$ satisfies the relation
$$\alpha_m^* = E[e^f_m(n)\,x^*(n-m-1)]. \qquad (3.21)$$
The coefficient $k_{m+1}$ represents this correlation, normalized by the mean square error $\mathcal{E}^f_m$.
The correction to the prediction polynomial in the order-update equation (second term in
Eq. (3.13)) is proportional to this normalized correlation.
In the literature, the coefficients $k_i$ have been called the partial correlation coefficients and
abbreviated as parcor coefficients. They are also known as the lattice or reflection coefficients, for
reasons that will become clear in later chapters.
Example 3.1: Levinson's Recursion. Consider again the real WSS process with the autocorrelation (2.12). We initialize Levinson's recursion as described above, that is,
A_0(z) = 1, α_0 = R(1) = 1.5, and E^f_0 = R(0) = 2.1.   (3.22)
We can now compute the quantities
k_1 = −α_0/E^f_0 = −5/7,
A_1(z) = A_0(z) + k_1 z^{−1} Ã_0(z) = 1 − (5/7)z^{−1},
E^f_1 = (1 − k_1²) E^f_0 = 36/35.
We have now obtained all information about the first-order predictor. To obtain the second-order predictor, we compute the appropriate quantities in the following order:
α_1 = R(2) + a_{1,1} R(1) = (9/10) − (5/7) × 1.5 = −6/35,
k_2 = −α_1/E^f_1 = 1/6,
A_2(z) = A_1(z) + k_2 z^{−2} Ã_1(z) = 1 − (5/6)z^{−1} + (1/6)z^{−2},
E^f_2 = (1 − k_2²) E^f_1 = 1.0.
We can proceed in this way to compute optimal predictors of any order. The above results agree with those of Example 2.1, where we used a direct approach to solve the normal equations.
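Assuming the levinson() sketch given in Section 3.2.1 is available, the numbers in this example are reproduced by a short check:

import numpy as np

a2, k, E = levinson([2.1, 1.5, 0.9], 2)
print(np.real(a2))   # expected: [1, -5/6, 1/6], the coefficients of A_2(z)
print(np.real(k))    # expected: [-5/7, 1/6]
print(E)             # expected: [2.1, 36/35, 1.0]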
3.3 SIMPLE PROPERTIES OF LEVINSON’S RECURSION
Inspection of Levinson’s recursion reveals many interesting properties. Some of these are discussed
below. Deeper properties will be studied in the next several sections.
1. Computational complexity. The computation of A_{m+1}(z) from A_m(z) requires nearly m multiplication and addition operations. As a result, the amount of computation required for N repetitions (i.e., to find a_{N,i}) is proportional to N². This should be compared with traditional methods for solving Eq. (2.8) (such as Gaussian elimination), which require computations proportional to N³. In addition to reducing the computation, Levinson's recursion also reveals the optimal solution to all the predictors of orders ≤ N.
2. Parcor coefficients are bounded. Because E^f_m is the mean square value of the prediction error, we know E^f_m ≥ 0 for any m. From Eq. (3.14), it therefore follows that |k_i|² ≤ 1 for any i.
3. Strict bound on parcor coefficients. Let |k_{m+1}| = 1. Then Eq. (3.14) says that E^f_{m+1} = 0. In other words, the mean square value of the prediction error sequence e^f_{m+1}(n) is zero, which means that e^f_{m+1}(n) is identically zero for all n; that is, the output of A_{m+1}(z) in response to x(n) is zero (Fig. 2.1(a)). Using an argument similar to the one in Section 2.4.2, we conclude that this is not possible unless x(n) is a line spectral process. Summarizing, we have
|k_i| < 1 for any i,   (3.23)
unless x(n) is a line spectral process. Recall also that the assumption that x(n) is not line spectral ensures that the autocorrelation matrix R_N in the normal equations is nonsingular for all N (Section 2.4.2).
4. Minimized error is monotone. From Eq. (3.14), it also follows that the mean square error is a monotone function, that is,
E^f_{m+1} ≤ E^f_m,   (3.24)
with equality if and only if k_{m+1} = 0. It is interesting to note that repeated application of Eq. (3.14) yields the following expression for the mean square prediction error of the Nth-order optimal predictor:
E^f_N = (1 − |k_N|²)(1 − |k_{N−1}|²) ⋯ (1 − |k_1|²) E^f_0
      = (1 − |k_N|²)(1 − |k_{N−1}|²) ⋯ (1 − |k_1|²) R(0).
Example 3.2: Levinson's recursion. Now, consider a WSS process x(n) with power spectrum S_xx(e^{jω}) as shown in Fig. 3.1 (top). The first few samples of the autocorrelation are
R(0) R(1) R(2) R(3) R(4) R(5) R(6)
0.1482 0.0500 0.0170 −0.0323 −0.0629 0.0035 −0.0087
FIGURE 3.1: Example 3.2. The input power spectrum S_xx(e^{jω}) (top), the prediction error E^f_m plotted for a few prediction orders m (middle), and the prediction error E^f_m shown for more orders (bottom).
If we perform Levinson's recursion with this, we obtain the parcor coefficients
k_1       k_2       k_3      k_4      k_5       k_6
−0.3374   −0.0010   0.2901   0.3253   −0.3939   0.1908
for the first six optimal predictors. Notice that these coefficients satisfy |k_m| < 1 as expected from theory. The corresponding prediction errors are
E^f_0    E^f_1    E^f_2    E^f_3    E^f_4    E^f_5    E^f_6
0.1482   0.1314   0.1314   0.1203   0.1076   0.0909   0.0876
The prediction errors are also plotted in Fig. 3.1 (middle). The error decreases monotonically as the prediction order increases. The same trend continues for higher prediction orders as demonstrated in the bottom plot. The optimal prediction polynomials A_0(z) through A_6(z) have the coefficients given in the following table:
A_0(z)   1.0
A_1(z)   1.0   −0.3374
A_2(z)   1.0   −0.3370   −0.0010
A_3(z)   1.0   −0.3373   −0.0988    0.2901
A_4(z)   1.0   −0.2429   −0.1309    0.1804   0.3253
A_5(z)   1.0   −0.3711   −0.2020    0.2320   0.4210   −0.3939
A_6(z)   1.0   −0.4462   −0.1217    0.2762   0.3825   −0.4647   0.1908
The zeros of these polynomials can be verified to be inside the unit circle, as expected from our theoretical development (Appendix C). For example, the zeros of A_6(z) are complex conjugate pairs with magnitudes 0.9115, 0.7853, and 0.6102.
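A quick numerical confirmation is possible with any root finder. The snippet below (our own code, not from the text) prints the root magnitudes of A_6(z) taken from the table above; they should come out near 0.9115, 0.7853, and 0.6102, each value appearing twice because the roots occur in conjugate pairs.

import numpy as np

A6 = [1.0, -0.4462, -0.1217, 0.2762, 0.3825, -0.4647, 0.1908]   # A_6(z) from the table
print(np.sort(np.abs(np.roots(A6))))                            # all magnitudes should be < 1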
3.4 THE WHITENING EFFECT
Suppose we compute optimal linear predictors of increasing order using Levinson's recursion. We know the prediction error decreases with order, that is, E^f_{k+1} ≤ E^f_k. Assume that after the predictor has reached the order m, the error does not decrease any further, that is, suppose that
E^f_m = E^f_{m+1} = E^f_{m+2} = E^f_{m+3} = ⋯   (3.25)
This represents a stalling condition, that is, increasing the number of past samples does not help to increase the prediction accuracy any further. Whenever such a stalling occurs, it turns out that the error e^f_m(n) satisfies a very interesting property, namely, any pair of samples are mutually orthogonal. That is,
E[e^f_m(n)[e^f_m(n + i)]*] = 0 for i ≠ 0, and = E^f_m for i = 0.   (3.26)
In particular, if x(n) has zero mean, then e^f_m(n) will also have zero mean, and the preceding equation means that e^f_m(n) is white. Now, from Section 2.2, we know that x(n) can be represented as the output of a filter, excited with e^f_m(n) (see Fig. 2.1(b)). Thus, whenever stalling occurs, x(n) is the output of an all-pole filter 1/A_m(z) with white input; such a zero-mean process x(n) is said to be autoregressive (AR). We will present a more complete discussion of AR processes in Section 5.2. For the case of nonzero mean, Eq. (3.26) still holds, and we say that e^f_m(n) is an orthogonal (rather than white) random process.
Theorem 3.1. Stalling and whitening. Consider the optimal prediction of a WSS process x(n), with successively increasing predictor orders. Then, the iteration stalls after the mth-order predictor (i.e., the mean square error stays constant as shown by Eq. (3.25)) if and only if the prediction error e^f_m(n) satisfies the orthogonality condition Eq. (3.26). ♦
Proof. Stalling implies, in particular, that E^f_{m+1} = E^f_m, that is, k_{m+1} = 0 (from Eq. (3.14)). As a result, A_{m+1}(z) = A_m(z). Repeating this, we see that
A_m(z) = A_{m+1}(z) = A_{m+2}(z) = ⋯   (3.27)
The prediction error sequence e^f_ℓ(n), therefore, is the same for all ℓ ≥ m. The condition k_{m+1} = 0 implies α_m = 0 from Eq. (3.12). In other words, the cross-correlation Eq. (3.21) is zero. Repeating this argument, we see that whenever stalling occurs, we have
E[e^f_{m+ℓ}(n)x*(n − m − ℓ − 1)] = 0,  ℓ ≥ 0.   (3.28)
But e^f_{m+ℓ}(n) = e^f_m(n) for any ℓ ≥ 0, so that
E[e^f_m(n)x*(n − m − ℓ − 1)] = 0,  ℓ ≥ 0.   (3.29)
By the orthogonality principle, we already know that E[e^f_m(n)x*(n − i)] = 0 for 1 ≤ i ≤ m. Combining the preceding two equations, we conclude that
E[e^f_m(n)x*(n − ℓ)] = 0,  ℓ ≥ 1.   (3.30)
In other words, the error e^f_m(n) is orthogonal to all the past samples of x(n). We also know that e^f_m(n) is a linear combination of present and past samples of x(n), that is,
e^f_m(n) = x(n) + Σ_{i=1}^m a_{m,i}^* x(n − i).   (3.31)
Similarly,
e^f_m(n − ℓ) = x(n − ℓ) + Σ_{i=1}^m a_{m,i}^* x(n − ℓ − i).   (3.32)
Because e^f_m(n) is orthogonal to all the past samples of x(n), we therefore conclude that e^f_m(n) is orthogonal to e^f_m(n − ℓ), ℓ > 0. Summarizing, we have proved E[e^f_m(n)[e^f_m(n − ℓ)]*] = 0 for ℓ > 0. This proves Eq. (3.26) indeed. By reversing the above argument, we can show that if e^f_m(n) has the property (3.26), then the recursion stalls (i.e., Eq. (3.25) holds).
Example 3.3: Stalling and Whitening. Consider a real WSS process with autocorrelation
R(k) = ρ^{|k|},   (3.33)
where −1 < ρ < 1. The first-order predictor coefficient a_{1,1} is obtained by solving
R(0)a_{1,1} = −R(1),   (3.34)
so that a_{1,1} = −ρ. Thus, the optimal predictor polynomial is A_1(z) = 1 − ρz^{−1}. To compute the second-order predictor, we first evaluate
α_1 = R(2) + a_{1,1}R(1) = ρ² − ρ² = 0.   (3.35)
Using Levinson's recursion, we find k_2 = −α_1^*/E^f_1 = 0, so that A_2(z) = A_1(z). To find the third-order predictor, note that
α_2 = R(3) + a_{2,1}R(2) + a_{2,2}R(1) = R(3) − ρR(2) = 0.   (3.36)
Thus, k_3 = 0 and A_3(z) = A_1(z). No matter how far we continue, we will find in this case that A_m(z) = A_1(z) for all m. That is, the recursion has stalled. Let us double-check this by testing whether e^f_1(n) satisfies Eq. (3.26). Because e^f_1(n) is the output of A_1(z) in response to x(n), we have
e^f_1(n) = x(n) − ρx(n − 1).   (3.37)
Thus, for k > 0,
E[e^f_1(n)e^f_1(n − k)] = R(k) + ρ²R(k) − ρR(k − 1) − ρR(k + 1) = 0,   (3.38)
as anticipated. Because R(∞) = 0, the process x(n) has zero mean. So, e^f_1(n) is a zero-mean white process.
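The stalling behavior of this example is easy to reproduce numerically with the levinson() sketch of Section 3.2.1 (the value ρ = 0.5 below is an arbitrary choice):

import numpy as np

rho, N = 0.5, 5
R = rho ** np.abs(np.arange(N + 1))     # R(k) = rho^{|k|}
a, k, E = levinson(R, N)
print(np.real(k))   # expected: [-0.5, 0, 0, 0, 0]; only k_1 = -rho is nonzero
print(np.real(a))   # expected: [1, -0.5, 0, 0, 0, 0], i.e., A_m(z) stays equal to A_1(z)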
3.5 CONCLUDING REMARKS
Levinson's work was done in 1947 and, in his own words, was a ''mathematically trivial procedure.'' However, it is clear from this chapter that Levinson's recursion is very elegant and insightful. It can be related to early work by other famous authors (e.g., Chandrasekhar, 1947). It is also related to the Berlekamp--Massey algorithm (Berlekamp, 1969). For a fascinating history, the reader should study the scholarly review by Kailath (1974; in particular, see p. 160). The derivation of Levinson's recursion in this chapter used the properties of the autocorrelation matrix. However, the method can be extended to the case of Toeplitz matrices, which are not necessarily positive definite (Blahut, 1985).
In fact, even the Toeplitz structure is not necessary if the goal is to obtain an O(N²) algorithm. In 1979, Kailath et al. introduced the idea of displacement rank for matrices. They showed that as long as the displacement rank is a fixed number independent of matrix size, O(N²) algorithms can be found for solving linear equations involving these matrices. It turns out that Toeplitz matrices have displacement rank 2, regardless of the matrix size. The same is true for inverses of Toeplitz matrices (which are not necessarily Toeplitz).
One of the outcomes of Levinson’s recursion is that it gives rise to an elegant structure for
linear prediction called the lattice structure. This will be the topic of discussion for the next chapter.
• • • •
C H A P T E R 4
Lattice Structures for Linear
Prediction
4.1 INTRODUCTION
In this chapter, we present lattice structures for linear prediction. These structures essentially follow from Levinson's recursion. Lattice structures have fundamental importance not only in linear prediction theory but, more generally, in signal processing. For example, they arise in the theory of all-pass filters: any stable rational all-pass filter can be represented using an IIR lattice structure similar to the IIR LPC lattice. As a preparation for the main topic, we first discuss the idea of backward linear prediction.
4.2 THE BACKWARD PREDICTOR
Let x(n) represent a WSS random process as usual. A backward linear predictor for this process estimates the sample x(n − N − 1) based on the ''future values'' x(n − 1), ..., x(n − N). The predicted value is a linear combination of the form
x̂^b_N(n − N − 1) = −Σ_{i=1}^N b_{N,i}^* x(n − i),   (4.1)
and the prediction error is
e^b_N(n) = x(n − N − 1) − x̂^b_N(n − N − 1) = Σ_{i=1}^N b_{N,i}^* x(n − i) + x(n − N − 1).
The superscript b is meant to be a reminder of ''backward.'' Figure 4.1 demonstrates the difference between the forward and backward predictors. Notice that both x̂^f_N(n) and x̂^b_N(n − N − 1) are based on the same set of N measurements. The backward prediction problem is primarily of theoretical interest, but it helps us to attach a ''physical'' significance to the polynomial z^{−m} Ã_m(z) in Levinson's recursion (3.13), as we shall see.
FIGURE 4.1: Comparison of the forward and backward predictors.
As in the forward predictor problem, we again define the predictor polynomial
B_N(z) = Σ_{i=1}^N b_{N,i}^* z^{−i} + z^{−(N+1)}.   (4.2)
The output of this FIR filter in response to x(n) is equal to the prediction error e^b_N(n).
To find the optimal predictor coefficients b_{N,i}^*, we again apply the orthogonality principle, which says that e^b_N(n) must be orthogonal to x(n − k), for 1 ≤ k ≤ N. Using this, we can show that the solution is closely related to that of the forward predictor of Section 2.3. With a_{N,i} denoting the optimal forward predictor coefficients, it can be shown (Problem 15) that
b_{N,i} = a_{N,N+1−i}^*,  1 ≤ i ≤ N.   (4.3)
This equation represents time-reversal and conjugation. Thus, the optimum backward predictor polynomial is given by
B_N(z) = z^{−(N+1)} Ã_N(z),   (4.4)
where the tilde notation is as defined in Section 1.2.1. Figure 4.2 summarizes how the forward and backward prediction errors are derived from x(n) by using the two FIR filters, A_N(z) and B_N(z). It is therefore clear that we can derive the backward prediction error sequence e^b_N(n) from the forward prediction error sequence e^f_N(n) as shown in Fig. 4.3. In view of Eq. (4.4), we have
B_N(z)/A_N(z) = z^{−(N+1)} Ã_N(z)/A_N(z).   (4.5)
FIGURE 4.2: The forward and the backward prediction filters.
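In coefficient terms, Eqs. (4.3) and (4.4) simply say: conjugate the coefficient vector of A_N(z), reverse it, and prepend one unit of delay. A one-line sketch (our own helper name, not from the text):

import numpy as np

def backward_from_forward(aN):
    # aN = [1, a_{N,1}^*, ..., a_{N,N}^*], the coefficients of A_N(z)
    aN = np.asarray(aN, dtype=complex)
    # B_N(z) = z^{-(N+1)} A~_N(z): coefficients of z^0, z^{-1}, ..., z^{-(N+1)}
    return np.concatenate(([0.0], np.conj(aN[::-1])))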
4.2.1 All-Pass Property
We now show that the function
G_N(z) ≜ z^{−N} Ã_N(z)/A_N(z)   (4.6)
is all-pass, that is,
|G_N(e^{jω})| = 1,  ∀ ω.   (4.7)
To see this, simply observe that
G̃_N(z) G_N(z) = [z^N A_N(z)/Ã_N(z)] × [z^{−N} Ã_N(z)/A_N(z)] = 1,  ∀ z.
But because H̃(e^{jω}) = H*(e^{jω}) for any transfer function H(z), the preceding implies that |G_N(e^{jω})|² = 1, which proves Eq. (4.7).
FIGURE 4.3: The backward prediction error sequence, derived from the forward prediction error sequence.
From Fig. 4.3, we therefore see that the backward error e^b_N(n) is the output of the all-pass filter z^{−1}G_N(z) in response to the
input e^f_N(n), which is the forward error. The power spectrum of e^f_N(n) is therefore identical to that of e^b_N(n). In particular, therefore, the mean square values of the two errors are identical, that is,
E^b_N = E^f_N,   (4.8)
where E^b_N = E[|e^b_N(n)|²] and E^f_N = E[|e^f_N(n)|²]. Notice that the all-pass filter (4.6) is stable because the zeros of A_N(z) are inside the unit circle (Appendix C).
4.2.2 Orthogonality of the Optimal Prediction Errors
Consider the set of backward prediction error sequences of various orders, that is,
e^b_0(n), e^b_1(n), e^b_2(n), ...   (4.9)
We will show that any two of these are orthogonal. More precisely,
Theorem 4.1. Orthogonality of errors. The backward prediction error sequences satisfy the property
E[e^b_m(n)[e^b_k(n)]*] = 0 for k ≠ m, and = E^b_m for k = m,   (4.10)
for all k, m ≥ 0. ♦
Proof. It is sufficient to prove this for m > k. According to the orthogonality principle, e^b_m(n) is orthogonal to
x(n − 1), ..., x(n − m).   (4.11)
From its definition,
e^b_k(n) = x(n − k − 1) + b_{k,1}^* x(n − 1) + b_{k,2}^* x(n − 2) + ⋯ + b_{k,k}^* x(n − k).
From these two equations, we conclude that E[e^b_m(n)[e^b_k(n)]*] = 0 for m > k. From this result, Eq. (4.10) follows immediately.
In a similar manner, it can be shown (Problem 16) that the forward prediction error sequences have the following orthogonality property:
E[e^f_m(n)[e^f_k(n − 1)]*] = 0,  m > k.   (4.12)
The above orthogonality properties have applications in adaptive filtering, specifically in improving the convergence of adaptive filters (Satorius and Alexander, 1979; Haykin, 2002). The process of generating the above set of orthogonal signals from the random process x(n) has also been interpreted as a kind of Gram--Schmidt orthogonalization (Haykin, 2002). These details are beyond the scope of this chapter, but the above references provide further details and bibliography.
4.3 LATTICE STRUCTURES
In Section 3.2, we presented Levinson's recursion, which computes the optimal predictor polynomial A_{m+1}(z) from A_m(z) according to
A_{m+1}(z) = A_m(z) + k_{m+1} z^{−1}[z^{−m} Ã_m(z)].   (4.13)
On the other hand, we know from Section 4.2 that the optimal backward predictor polynomial is given by Eq. (4.4). We can therefore rewrite Levinson's order update equation as
A_{m+1}(z) = A_m(z) + k_{m+1} B_m(z).   (4.14)
From this, we also obtain
B_{m+1}(z) = z^{−1}[k_{m+1}^* A_m(z) + B_m(z)],   (4.15)
by using B_{m+1}(z) = z^{−(m+2)} Ã_{m+1}(z). The preceding relations can be schematically represented as in Fig. 4.4(a). Because e^f_m(n) and e^b_m(n) are the outputs of the filters A_m(z) and B_m(z) in response to the common input x(n), the prediction error signals are related as shown in Fig. 4.4(b).
FIGURE 4.4: (a) The lattice section that generates the (m + 1)th-order predictor polynomials from the mth-order predictor polynomials. (b) The same system shown with error signals indicated as the inputs and outputs.
FIGURE 4.5: (a) The FIR LPC lattice structure that generates the prediction errors e^f_m(n) and e^b_m(n) for all optimal predictors with order 1 ≤ m ≤ N. (b) Details of the mth lattice section.
By using the facts that A_0(z) = 1 and B_0(z) = z^{−1}, we obtain the structure of Fig. 4.5(a). Here, each rectangular box is the two-multiplier lattice section shown in Fig. 4.5(b). In this structure, the transfer functions A_m(z) and B_m(z) are generated by repeated use of the lattice sections in Fig. 4.4. This is a cascaded FIR lattice structure, often referred to as the LPC FIR lattice. If we apply the input x(n) to the FIR lattice structure, the optimal forward and backward prediction error sequences e^f_m(n) and e^b_m(n) appear at various nodes as indicated. Lattice structures for linear prediction have been studied in detail by Makhoul (1977).
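A minimal sketch of this FIR lattice in Python/NumPy follows (our own function name and conventions; zero initial conditions, i.e., x(n) = 0 for n < 0, are assumed). It uses the error-signal relations implied by Eqs. (4.14) and (4.15), which are written out explicitly at the start of Section 4.3.1.

import numpy as np

def fir_lpc_lattice(x, k):
    # x: input samples x(n);  k: parcor coefficients k_1, ..., k_N.
    # Returns lists ef, eb with ef[m][n] = e^f_m(n) and eb[m][n] = e^b_m(n).
    x = np.asarray(x, dtype=complex)
    delay = lambda s: np.concatenate(([0.0], s[:-1]))   # one-sample delay, zero initial state
    ef = [x.copy()]       # e^f_0(n) = x(n)
    eb = [delay(x)]       # e^b_0(n) = x(n-1), since B_0(z) = z^{-1}
    for km in k:
        f, b = ef[-1], eb[-1]
        ef.append(f + km * b)                          # e^f_{m+1}(n) = e^f_m(n) + k_{m+1} e^b_m(n)
        eb.append(np.conj(km) * delay(f) + delay(b))   # e^b_{m+1}(n) = k_{m+1}^* e^f_m(n-1) + e^b_m(n-1)
    return ef, eb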
4.3.1 The IIR LPC Lattice
From Fig. 4.4(b), we see that the prediction errors are related as
e^f_{m+1}(n) = e^f_m(n) + k_{m+1} e^b_m(n),
e^b_{m+1}(n) = k_{m+1}^* e^f_m(n − 1) + e^b_m(n − 1).
Suppose we rearrange the first equation as e^f_m(n) = e^f_{m+1}(n) − k_{m+1} e^b_m(n); then these two equations become
e^f_m(n) = e^f_{m+1}(n) − k_{m+1} e^b_m(n),
e^b_{m+1}(n) = k_{m+1}^* e^f_m(n − 1) + e^b_m(n − 1).
These equations have the structural representation shown in Fig. 4.6(a). Repeated application of this gives rise to the IIR structure of Fig. 4.6(b). This is precisely the IIR all-pass lattice structure
studied in the signal processing literature (Gray and Markel, 1973; Vaidyanathan, 1993; Oppenheim and Schafer, 1999). Thus, if we apply the forward prediction error e^f_N(n) as an input to this structure, then the forward and backward error sequences of all the lower-order optimal predictors appear at various internal nodes, as indicated. In particular, the 0th-order prediction error e^f_0(n) appears at the rightmost node. Because e^f_0(n) = x(n), we therefore obtain the original random process x(n) as an output of the IIR lattice, in response to the input e^f_N(n). We already showed that the transfer function
z^{−1}G_N(z) = z^{−(N+1)} Ã_N(z)/A_N(z)
is all-pass. This is the transfer function from the input e^f_N(n) to the node e^b_N(n) in the IIR lattice. The transfer function from the input to the rightmost node in Fig. 4.6(b) is the all-pole filter 1/A_N(z), because this transfer function produces x(n) in response to the input e^f_N(n) (see Fig. 4.2).
FIGURE 4.6: (a) The IIR LPC lattice section. (b) The IIR LPC lattice structure that generates the signal x(n) from the optimal prediction error e^f_N(n). The lower-order forward prediction errors e^f_m(n) and the backward prediction errors are also generated in the process automatically.
4.3.2 Stability of the IIR Filter
It is clear from these developments that the ''parcor'' coefficients k_m derived from Levinson's recursion are also the lattice coefficients for the all-pass function z^{−N} Ã_N(z)/A_N(z). Now, it is well known from the signal processing literature (Gray and Markel, 1973; Vaidyanathan, 1993) that the polynomial A_N(z) has all its zeros strictly inside the unit circle if and only if
|k_m| < 1,  1 ≤ m ≤ N.   (4.16)
In fact, under this condition, all the polynomials A_m(z), 1 ≤ m ≤ N, have all zeros strictly inside the unit circle.
On the other hand, Levinson's recursion (Section 3.2) has shown that the condition |k_m| < 1 is indeed true as long as x(n) is not a line spectral process. This proves that the IIR filters 1/A_m(z) and z^{−m} Ã_m(z)/A_m(z) indeed have all poles strictly inside the unit circle so that Fig. 4.6(b) is, in particular, stable. Although the connection to the lattice structure is insightful, it is also possible to prove stability of 1/A_m(z) directly without the use of Levinson's recursion or the lattice. This is done in Appendix C.
4.3.3 The Upward and Downward Recursions
Levinson's recursion (Eqs. (4.14) and (4.15)) computes the optimal predictor polynomials of increasing order, from a knowledge of the autocorrelation R(k) of the WSS process x(n). This is called the upward recursion. These equations can be inverted to obtain the relations
(1 − |k_{m+1}|²) A_m(z) = A_{m+1}(z) − k_{m+1} z B_{m+1}(z),
(1 − |k_{m+1}|²) B_m(z) = −k_{m+1}^* A_{m+1}(z) + z B_{m+1}(z).   (4.17)
Given A_{m+1}(z), the polynomial B_{m+1}(z) is known, and so is
k_{m+1} = a_{m+1,m+1}^*.   (4.18)
Using this, we can uniquely identify A_m(z), and hence B_m(z), from (4.17). Thus, if we know the Nth-order optimal prediction polynomial A_N(z), we can uniquely identify all the lower-order optimal predictors. So Eq. (4.17) is called the downward recursion. This process automatically reveals the lower-order lattice coefficients k_i. If we know the prediction error E^f_N, we can therefore also identify the lower-order errors E^f_i by repeated use of Eq. (3.14), that is,
E^f_m = E^f_{m+1}/(1 − |k_{m+1}|²).   (4.19)
4.4 CONCLUDING REMARKS
The properties of the IIR lattice have been analyzed in depth by Gray and Markel (1973), and lattice structures for linear prediction have been discussed in detail by Makhoul (1977). The advantage of lattice structures arises from the fact that even when the coefficients k_m are quantized, the IIR lattice remains stable, as long as the quantized numbers continue to satisfy |k_m| < 1. This is a very useful result in compression and coding, because it guarantees stable reconstruction of the data segment that was compressed. We will return to this point in Sections 5.6 and 7.8.
• • • •
C H A P T E R 5
Autoregressive Modeling
5.1 INTRODUCTION
Linear predictive coding of a random process reveals a model for the process, called the autoregressive (AR) model. This model is very useful both conceptually and for approximating the process with a simple model. In this chapter, we describe this model.
5.2 AUTOREGRESSIVE PROCESSES
A WSS random process w(n) is said to be autoregressive (AR) if it can be generated by using the recursive difference equation
w(n) = −Σ_{i=1}^N d_i^* w(n − i) + e(n),   (5.1)
where
1. e(n) is a zero-mean white WSS process, and
2. the polynomial D(z) = 1 + Σ_{i=1}^N d_i^* z^{−i} has all zeros inside the unit circle.
In other words, we can generate w(n) as the output of a causal, stable, all-pole IIR filter 1/D(z) in response to white input (Fig. 5.1). If d_N ≠ 0, we say that the process is AR(N), that is, AR of order N. Because e(n) has zero mean, the AR process has zero mean according to the above definition.
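As a concrete illustration (a sketch only; the filter coefficients and seed below are arbitrary choices, not taken from the text), an AR(2) process can be generated by filtering unit-variance white noise through 1/D(z):

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(0)
D = [1.0, -0.5, 0.25]            # D(z) = 1 - 0.5 z^{-1} + 0.25 z^{-2}, zeros at radius 0.5
e = rng.standard_normal(10000)   # white input e(n)
w = lfilter([1.0], D, e)         # w(n): the AR(2) process, output of 1/D(z)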
Now, how does the AR process enter our discussion of linear prediction? Given a WSS process x(n), let us assume that we have found the Nth-order optimal predictor polynomial A_N(z). We know we can then represent the process as the output of an IIR filter as shown in Fig. 2.1(b). The input to this filter is the prediction error e^f_N(n). In the time domain, we can write
x(n) = −Σ_{i=1}^N a_{N,i}^* x(n − i) + e^f_N(n).   (5.2)
FIGURE 5.1: Generation of an AR process from white noise and an all-pole IIR filter.
In Section 3.4, we saw that if the error stalls, that is, E^f_m does not decrease anymore as m increases beyond some value N, then e^f_N(n) is white (assuming x(n) has zero mean). Thus, the stalling phenomenon implies that x(n) is AR(N).
Summarizing, suppose the optimal predictors of various orders for a zero-mean process x(n) are such that the minimized mean square errors satisfy
E^f_1 ≥ E^f_2 ≥ ⋯ ≥ E^f_{N−1} > E^f_N = E^f_m,  m > N.   (5.3)
Then, x(n) is AR(N).
Example 5.1: Levinson's recursion for an AR process. Consider a WSS AR process generated by using an all-pole filter c/D(z) with complex conjugate pairs of poles at 0.8e^{±0.2jπ} and 0.85e^{±0.4jπ}, so that
D(z) = 1.0000 − 1.8198z^{−1} + 2.0425z^{−2} − 1.2714z^{−3} + 0.4624z^{−4}.
The power spectrum S_ww(e^{jω}) of this AR(4) process is shown in Fig. 5.2 (top plot). The constant c in c/D(z) is such that S_ww(e^{jω}) has maximum magnitude unity. The first few autocorrelation coefficients for this process are as follows:
R(0)     R(1)     R(2)      R(3)      R(4)      R(5)
0.2716   0.1758   −0.0077   −0.1091   −0.0848   −0.0226
If we perform Levinson's recursion with this, we obtain the lattice coefficients
k_1       k_2      k_3       k_4      k_5      k_6
−0.6473   0.7701   −0.5469   0.4624   0.0000   0.0000
for the first six optimal predictors. Notice that these coefficients satisfy |k_m| < 1 as expected from theory. The optimal prediction polynomials A_0(z) through A_6(z) have the coefficients given in the following table:
A_0(z)   1.0
A_1(z)   1.0   −0.6473
A_2(z)   1.0   −1.1457   0.7701
A_3(z)   1.0   −1.5669   1.3967   −0.5469
A_4(z)   1.0   −1.8198   2.0425   −1.2714   0.4624
A_5(z)   1.0   −1.8198   2.0425   −1.2714   0.4624   0
A_6(z)   1.0   −1.8198   2.0425   −1.2714   0.4624   0   0
The prediction filter does not change after A_4(z) because the original process is AR(4). This is also consistent with the fact that the lattice coefficients past k_4 are all zero. The prediction errors are plotted in Fig. 5.2 (middle). The error E^f_m decreases monotonically and then becomes a constant for m ≥ 4. For comparison, the prediction error E^f_m for the non-AR process of Ex. 3.2 is also shown in the figure (bottom); this error continues to decrease.
FIGURE 5.2: Example 5.1. The input power spectrum S_ww(e^{jω}) of an AR(4) process (top) and the prediction error E^f_m plotted for a few prediction orders m (middle). The prediction error for the non-AR process of Ex. 3.2 is also reproduced (bottom).
5.3 APPROXIMATION BY AN AR(N) PROCESS
Recall that an arbitrary WSS process x(n) (AR or otherwise) can always be represented in terms of the optimal linear prediction error e^f_N(n) as in Eq. (5.2). In many practical cases, the error e^f_N(n) tends to be nearly white for reasonably large N. If we replace e^f_N(n) with zero-mean white noise e(n) having the mean square value E^f_N, we obtain the so-called AR model y(n) for the process x(n).
FIGURE 5.3: Generation of the AR(N) approximation y(n) for a WSS process x(n). (a) The exact process x(n) expressed in terms of the Nth-order LPC parameters and (b) the AR(N) approximation y(n).
This is shown in Fig. 5.3. The process y(n) satisfies
y(n) = −Σ_{i=1}^N a_{N,i}^* y(n − i) + e(n),   (5.4)
where the driving term e(n) is white. The model process y(n) is also called the AR(N) approximation (Nth-order autoregressive approximation) of x(n).
5.3.1 If a Process is AR, Then LPC Will Reveal It
If a zero-mean WSS process x(n) is such that the optimal LPC error e^f_N(n) is white, then we know that x(n) is AR. Now, consider the converse. That is, let x(n) be AR(N). This means that we can express it as
x(n) = −Σ_{i=1}^N c_{N,i}^* x(n − i) + e(n),   (5.5)
for some set of coefficients c_{N,i}, where e(n) is white. This is a causal difference equation, so that, at any time n, the sample x(n) is a linear combination
x(n) = g(0)e(n) + g(1)e(n − 1) + g(2)e(n − 2) + ⋯   (5.6)
Similarly, x(n − i) is a linear combination of
e(n − i), e(n − i − 1), ...
Now consider
E[e(n)x*(n − i)] = E[e(n)[g(0)e(n − i) + g(1)e(n − i − 1) + ⋯]*].
Because e(n) is white (i.e., E[e(n)e*(n − ℓ)] = 0, ℓ > 0), this means that
E[e(n)x*(n − i)] = 0,  i > 0.   (5.7)
In other words, e(n) is orthogonal to all the past samples of x(n). In view of the orthogonality principle, this means that the linear combination
−Σ_{i=1}^N c_{N,i}^* x(n − i)
is the optimal linear prediction of x(n). As the solution to the optimal Nth-order prediction problem is unique, the solution {a_{N,i}} to the optimal predictor problem is therefore equal to c_{N,i}. Thus, the coefficients c_{N,i} of the AR(N) process can be identified simply by solving the Nth-order optimal prediction problem. The resulting prediction error e^f_N(n) is white and equal to the ''input term'' e(n) in the AR equation Eq. (5.5). Summarizing, we have:
Theorem 5.1. LPC of an AR process. Let x(n) be an AR(N) WSS process, that is, a zero-mean process representable as in Eq. (5.5), where e(n) is white. Then,
1. The AR coefficients c_{N,i} can be found simply by performing Nth-order LPC on the process x(n). The optimal LPC coefficients a_{N,i} will turn out to be c_{N,i}. Thus, the AR model y(n) resulting from the optimal LPC (Fig. 5.3) is the same as x(n), that is, y(n) = x(n).
2. Moreover, the prediction error e^f_N(n) is white and the same as the white noise sequence e(n) appearing in the AR(N) description Eq. (5.5) of x(n). ♦
5.3.2 Extrapolation of the Autocorrelation
Because e^f_N(n) is white, the autocorrelation R(k) of an AR(N) process x(n) is completely determined by the filter 1/A_N(z) and the variance E^f_N. Because the quantities A_N(z) and E^f_N can be found from the N + 1 coefficients
R(0), R(1), ..., R(N)
using Levinson's recursion, we conclude that for AR(N) processes, all the coefficients R(k), |k| > N, are determined by the above N + 1 coefficients. So, the determination of A_N(z) and E^f_N provides us a way to extrapolate an autocorrelation. In other words, we can find R(k) for |k| > N from the first N + 1 values in such a way that the Fourier transform S_xx(e^{jω}) of the extrapolated sequence is nonnegative for all ω.
Extrapolation equation. An explicit equation for extrapolation of R(k) can readily be written down. Thus, given R(0), R(1), ..., R(N) for an AR(N) process, we can find R(k) for any k > N using the equation
R(k) = −a_{N,1}^* R(k − 1) − a_{N,2}^* R(k − 2) − ⋯ − a_{N,N}^* R(k − N).
This equation follows directly from the normal equations. See Appendix D for details.
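In code, the extrapolation is a one-line recursion (a sketch; the function name and conventions are ours):

import numpy as np

def extrapolate_autocorrelation(R, aN, num_extra):
    # R = [R(0), ..., R(N)];  aN = [1, a_{N,1}^*, ..., a_{N,N}^*], coefficients of A_N(z).
    R = list(R)
    c = np.asarray(aN)
    N = len(c) - 1
    for _ in range(num_extra):
        # R(k) = -a_{N,1}^* R(k-1) - ... - a_{N,N}^* R(k-N)
        R.append(-np.dot(c[1:], R[:-N - 1:-1]))
    return np.array(R)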
In many practical situations, a signal x(n) that is not AR can be modeled well by an AR process. A common example is speech (Markel and Gray, 1976; Rabiner and Schafer, 1978; Jayant and Noll, 1984). The fact that x(n) is not AR means that the prediction error e^f_m(n) never gets white, but its power spectrum often gets flatter and flatter as m increases. This will be demonstrated later in Section 6.3, where we define a mathematical measure for spectral flatness and study it in the context of linear prediction.
5.4 AUTOCORRELATION MATCHING PROPERTY
Let x(n) be a WSS process, A_N(z) be its optimal Nth-order predictor polynomial, and y(n) be the AR(N) approximation of x(n) generated as in Fig. 5.3. In what mathematical sense does y(n) approximate x(n), that is, in what respect is y(n) ''close'' to x(n)? We now provide a quantitative answer:
Theorem 5.2. Let R(k) and r(k) be the autocorrelations of x(n) and the AR(N) approximation y(n), respectively. Then,
R(k) = r(k),  |k| ≤ N.   (5.8)
Thus, the AR(N) model y(n) is an approximation of x(n) in the sense that the first N + 1 autocorrelation coefficients for the two processes are equal to each other. ♦
So, as the approximation order N increases, more and more values of R(k) are matched to r(k). If the process x(n) is itself AR(N), then we know (Theorem 5.1) that the AR(N) model y(n) satisfies y(n) = x(n), so that Eq. (5.8) holds for all k, not just |k| ≤ N. A simple corollary of Theorem 5.2 is
R(0) = r(0),
which can be used to prove that for any WSS process x(n),
R(0) = E^f_N ∫_0^{2π} [1/|A_N(e^{jω})|²] dω/2π.   (5.9)
To see this, observe first that the AR process y(n) in Fig. 5.3(b) is generated using white noise e(n) with variance E^f_N, so that
r(0) = ∫_0^{2π} [E^f_N/|A_N(e^{jω})|²] dω/2π.
But because r(0) = R(0), Eq. (5.9) follows readily.
Proof of Theorem 5.2. By construction, the AR(N) process y(n) is representable as in Fig. 5.3(b) and therefore satisfies the difference equation Eq. (5.4). Now consider the sample y(n − k). Because Fig. 5.3(b) is a causal system,
y(n − k) = linear combination of e(n − k − i),  i ≥ 0.   (5.10)
But because e(n) is zero-mean white, we have E[e(n)e*(n − k)] = 0 for k ≠ 0. Thus,
E[e(n)y*(n − k)] = 0,  k > 0.   (5.11)
Multiplying Eq. (5.4) by y*(n − k), taking expectations, and using the above equality, we obtain the following set of equations:

    [ r(0)    r(1)     ⋯   r(N)   ] [ 1       ]   [ E^f_N ]
    [ r*(1)   r(0)     ⋯   r(N−1) ] [ a_{N,1} ]   [ 0     ]
    [   ⋮       ⋮      ⋱     ⋮    ] [   ⋮     ] = [  ⋮    ]   (5.12)
    [ r*(N)   r*(N−1)  ⋯   r(0)   ] [ a_{N,N} ]   [ 0     ]
where r(k) = E[y(n)y*(n − k)]. This set of equations has exactly the same appearance as the augmented normal equations Eq. (2.20). Thus, the coefficients a_{N,i} are the solutions to two optimal predictor problems: one based on the process x(n) and the other based on y(n).
We now claim that Eqs. (2.20) and (5.12) together imply R(k) = r(k) for the range of values 0 ≤ k ≤ N. This is proved by observing that the coefficients a_{N,i} and E^f_N completely determine the lower-order optimal prediction coefficients a_{m,i} and errors E^f_m (see end of Section 4.3.3, ''Upward and Downward Recursions''). Thus, a_{m,i} and E^f_m satisfy smaller sets of equations of the form Eq. (2.20). Collecting them together, we obtain a set of equations resembling Eq. (2.37). We obtain a similar relation if R(k) is replaced by r(k). Thus, we have the following two sets of equations (demonstrated for N = 3):

    [ r(0)   r(1)   r(2)   r(3) ] [ 1       0       0       0 ]   [ E^f_3   ×      ×      ×     ]
    [ r*(1)  r(0)   r(1)   r(2) ] [ a_{3,1} 1       0       0 ]   [ 0       E^f_2  ×      ×     ]
    [ r*(2)  r*(1)  r(0)   r(1) ] [ a_{3,2} a_{2,1} 1       0 ] = [ 0       0      E^f_1  ×     ]
    [ r*(3)  r*(2)  r*(1)  r(0) ] [ a_{3,3} a_{2,2} a_{1,1} 1 ]   [ 0       0      0      E^f_0 ]

    [ R(0)   R(1)   R(2)   R(3) ] [ 1       0       0       0 ]   [ E^f_3   ×      ×      ×     ]
    [ R*(1)  R(0)   R(1)   R(2) ] [ a_{3,1} 1       0       0 ]   [ 0       E^f_2  ×      ×     ]
    [ R*(2)  R*(1)  R(0)   R(1) ] [ a_{3,2} a_{2,1} 1       0 ] = [ 0       0      E^f_1  ×     ]
    [ R*(3)  R*(2)  R*(1)  R(0) ] [ a_{3,3} a_{2,2} a_{1,1} 1 ]   [ 0       0      0      E^f_0 ]

Here, the 4 × 4 autocorrelation matrices on the left are denoted r_{N+1} and R_{N+1}, the common lower triangular matrix of predictor coefficients is denoted Δ_ℓ, and the two upper triangular matrices on the right are denoted Δ_u1 and Δ_u2, respectively.
The symbol × stands for possibly nonzero entries whose values are irrelevant for the discussion. Note that Δ_ℓ is lower triangular and Δ_u1 and Δ_u2 are upper triangular matrices. We have not shown that Δ_u1 = Δ_u2, so we will take a different route to prove the desired claim Eq. (5.8). By premultiplying the preceding equations with Δ_ℓ^†, we get
Δ_ℓ^† r_{N+1} Δ_ℓ = Δ_ℓ^† Δ_u1,
Δ_ℓ^† R_{N+1} Δ_ℓ = Δ_ℓ^† Δ_u2.
The right-hand side in each of these equations is a product of two upper triangular matrices and is therefore upper triangular. The left-hand sides, on the other hand, are Hermitian. As a result, the right-hand sides must be diagonal! It is easy to verify that these diagonal matrices have E^f_m as diagonal elements, so that the right-hand sides of the above two equations are identical. This proves that Δ_ℓ^† R_{N+1} Δ_ℓ = Δ_ℓ^† r_{N+1} Δ_ℓ. Since Δ_ℓ is nonsingular, we can invert it and obtain R_{N+1} = r_{N+1}. This proves Eq. (5.8) indeed.
In a nutshell, given {a_{N,i}} and the error E^f_N, we can use the downward recursion (Section 4.3.3) to compute a_{m,i} and E^f_m for m = N − 1, N − 2, .... So we uniquely identify R(0) = E^f_0. Next, from the normal equation with m = 1, we can identify R(1). In general, if R(0), ..., R(m − 1) are known, we can use the normal equation for the mth order and identify R(m) from the mth equation in this set (refer to Eq. (2.8)). Thus, the coefficients R(0), R(1), ..., R(N), which lead to a given set of {a_{N,i}} and E^f_N, are unique, so r(k) and R(k) in Eqs. (2.20) and (5.12) must be the same!
R(k) is Matched to the Autocorrelation of the IIR Filter. The AR(N) model of Fig. 5.3(b) can be redrawn as in Fig. 5.4, where e_1(n) is white noise with variance equal to unity and where
H(z) = √(E^f_N)/A_N(z).   (5.13)
H(z) is a causal IIR filter with impulse response, say, h(n). The deterministic autocorrelation of the sequence h(n) is defined as
Σ_n h(n)h*(n − k).
Because e_1(n) is white with E[|e_1(n)|²] = 1, the autocorrelation r(k) of the AR model process y(n) is given by
r(k) = Σ_n h(n)h*(n − k).   (5.14)
So we can rephrase Eq. (5.8) as follows:
FIGURE 5.4: Autoregressive model y(n) of a signal x(n).
Corollary 5.1. Let x(n) be a WSS process with autocorrelation R(k) and A_N(z) be the Nth-order optimal predictor polynomial. Let h(n) be the impulse response of the causal stable IIR filter √(E^f_N)/A_N(z). Then,
R(k) = Σ_{n=0}^∞ h(n)h*(n − k),   (5.15)
for |k| ≤ N. That is, the first N + 1 coefficients of the deterministic autocorrelation of h(n) are matched to the values R(k). In particular, if x(n) is AR(N), then Eq. (5.15) holds for all k. ♦
5.5 POWER SPECTRUM OF THE AR MODEL
We just proved that the AR(N) approximation y(n) is such that its autocorrelation coefficients r(k) match those of the original process x(n) for the first N + 1 values of k. As N increases, we therefore expect the power spectrum S_yy(e^{jω}) of y(n) to resemble the power spectrum S_xx(e^{jω}) of x(n). This is demonstrated next.
Example 5.2: AR approximations. In this example, we consider a random process generated as the output of the fourth-order IIR filter
G_4(z) = (3.9 − 2.7645z^{−1} + 1.4150z^{−2} − 0.5515z^{−3}) / (1 − 0.9618z^{−1} + 0.7300z^{−2} − 0.5315z^{−3} + 0.5184z^{−4}).
This system has complex conjugate pairs of poles inside the unit circle at 0.9e^{±0.2jπ} and 0.8e^{±0.6jπ}. With white noise of variance σ²_e driving this system, the output has the power spectrum
S_xx(e^{jω}) = σ²_e |G_4(e^{jω})|².
This is shown in Fig. 5.5 (dotted plots), with σ²_e assumed to be such that the peak value of S_xx(e^{jω}) is unity. Clearly, x(n) is not an AR process because the numerator of G_4(z) is not a constant.
Figure 5.5 shows the AR(N) approximations S_yy(e^{jω}) of the spectrum S_xx(e^{jω}) for various orders N (solid plots). We see that the AR(4) approximation is quite unacceptable, whereas the AR(5) approximation shows significant improvement. The successive approximations are better and better until we find that the AR(9) approximation is nearly perfect.
FIGURE 5.5: AR approximations of a power spectrum: (a) AR(4), (b) AR(5), (c) AR(6), (d) AR(7), (e) AR(8), and (f) AR(9).
From the preceding theory, we know that the AR(9) approximation and the original process have matching
autocorrelation coefficients, that is, R(m) = r(m) for 0 ≤ m ≤ 9. This is indeed seen from the
following table:
m R(m) r(m)
0 0.1667 0.1667
1 0.0518 0.0518
2 −0.0054 −0.0054
3 0.0031 0.0031
4 −0.0519 −0.0519
5 −0.0819 −0.0819
6 −0.0364 −0.0364
7 −0.0045 −0.0045
8 0.0057 0.0057
9 0.0318 0.0318
10 0.0430 0.0441
11 0.0234 0.0241
The coefficients R(m) and r(m) begin to differ starting from m = 10 as expected. So the AR(9) approximation is not perfect either, although the plots of S_xx(e^{jω}) and S_yy(e^{jω}) are almost indistinguishable.
The spectrum S_yy(e^{jω}) of the AR approximation y(n) is called an AR-model-based estimate of S_xx(e^{jω}). This is also said to be the maximum entropy estimate, for reasons explained later in Section 6.6.1. We can say that the estimate S_yy(e^{jω}) is obtained by extrapolating the finite autocorrelation segment
R(k),  |k| ≤ N,   (5.16)
using an AR(N) model. The estimate S_yy(e^{jω}) is nothing but the Fourier transform of the extrapolated autocorrelation r(k). Note that if R(k) were ''extrapolated'' by forcing it to be zero for |k| > N, then the result may not even be a valid autocorrelation (i.e., its Fourier transform may not be nonnegative everywhere). Further detailed discussions on power spectrum estimation techniques can be found in Kay and Marple (1981), Marple (1987), Kay (1988), and Therrien (1992).
Peaks are well matched. We know that the AR(N) model approximates the power spectrum S_xx(e^{jω}) with the all-pole power spectrum
S_yy(e^{jω}) = E^f_N/|A_N(e^{jω})|².   (5.17)
If S_xx(e^{jω}) has sharp peaks, then A_N(z) has to have zeros close to the unit circle to approximate these peaks. If, on the other hand, S_xx(e^{jω}) has zeros (or sharp dips) on the unit circle, then S_yy(e^{jω}) cannot approximate these very well because it cannot have zeros on the unit circle. Thus, the AR(N) model can be used to obtain a good match of the power spectrum S_xx(e^{jω}) near its peaks but not near its valleys. This is demonstrated in the example shown in Fig. 5.6. Such approximations, however, are useful in some applications such as the AR representation of speech.
FIGURE 5.6: Demonstrating that the AR approximation of a power spectrum shows good agreement near the peaks. In the AR(18) approximation, the peaks are more nicely matched than the valleys. The AR(31) approximation is nearly perfect. The original power spectrum is generated with a pole-zero filter as in Ex. 5.2, but instead of G_4(z), we have in this example a 10th-order real-coefficient filter G_10(z) with all poles inside the unit circle.
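Combining Levinson's recursion with Eq. (5.17) gives a simple AR-model-based (maximum entropy) spectrum estimator. The sketch below builds on the levinson() function of Section 3.2.1 (our own code and naming):

import numpy as np

def ar_spectrum_estimate(R, N, num_freqs=512):
    # Returns (omega, S_yy) with S_yy(e^{jw}) = E^f_N / |A_N(e^{jw})|^2, Eq. (5.17).
    a, _, E = levinson(np.asarray(R, dtype=complex), N)
    omega = np.linspace(0, np.pi, num_freqs)
    A = np.array([np.dot(a, np.exp(-1j * w * np.arange(N + 1))) for w in omega])
    return omega, E[-1] / np.abs(A) ** 2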
Spectral factorization. If the process x(n) is indeed AR(N), then the autoregressive model is such that S_xx(e^{jω}) = S_yy(e^{jω}). So,
S_xx(e^{jω}) = E^f_N/|A_N(e^{jω})|².   (5.18)
In other words, we have found a rational transfer function H_N(z) = c/A_N(z) such that
S_xx(e^{jω}) = |H_N(e^{jω})|².
We see that H_N(z) is a spectral factor of S_xx(e^{jω}). Thus, when x(n) is AR(N), the Nth-order prediction problem places in evidence a stable spectral factor H_N(z) of the power spectrum S_xx(e^{jω}). When x(n) is not AR(N), the filter H_N(z) is an AR(N) approximation to the spectral factor.
Example 5.3: Estimation of AR model parameters. We now consider an AR(4) process x(n) derived by driving the filter 1/A_4(z) with white noise, where
A_4(z) = 1.0000 − 1.8198z^{−1} + 2.0425z^{−2} − 1.2714z^{−3} + 0.4624z^{−4}.
This has roots
0.2627 ± 0.8084j,  0.6472 ± 0.4702j,
with absolute values 0.85 and 0.8. A sample of the AR(N) process can be generated by first generating a WSS white process e(n) using the Matlab command randn and driving the filter 1/A_4(z) with input e(n). The purpose of the example is to show that, from the measured values of the random process x(n), we can actually estimate the autocorrelation and, hence, the model parameters (coefficients of A_4(z)) accurately.
First, we estimate the autocorrelation R(k) using (2.43), where L is the number of samples of x(n) available. To see the effect of noise (which is always present in practice), we add random noise to the output x(n), so the actual data used are
x_noisy(n) = x(n) + η(n).
The measurement noise η(n) is assumed to be zero-mean white Gaussian. Its variance will be denoted as σ²_η. With L = 2^10 and σ²_η = 10^{−4}, and x_noisy(n) used instead of x(n), the estimated autocorrelation R(k) for 0 ≤ k ≤ 4 is as follows:
R(0) = 8.203, R(1) = 4.984, R(2) = −1.032, R(3) = −3.878, R(4) = −2.250.
FIGURE 5.7: Example 5.3. Estimation of AR model parameters. Magnitude responses of the original system 1/A_4(z) and the estimate 1/Â_4(z) are shown for three values of the measurement noise variance σ²_η = 10^{−4}, 0.0025, and 0.01, respectively, from top to bottom. The segment length used to estimate the autocorrelation is L = 2^10.
FIGURE 5.8: Example 5.3. Repetition of Fig. 5.7 with segment length L = 2^8 instead of 2^10.
From this, the estimated prediction polynomial can be obtained as
Â_4(z) = 1.0000 − 1.8038z^{−1} + 2.0131z^{−2} − 1.2436z^{−3} + 0.4491z^{−4}.
This should be compared with the original A_4(z) given above. The roots of Â_4(z) are 0.2616 ± 0.8036j and 0.6403 ± 0.4679j, which are inside the unit circle as expected.
Figure 5.7 (top plot) shows the magnitude responses of the filters 1/A_4(z) and 1/Â_4(z), which are seen to be in good agreement. The middle and lower plots show these responses with the measurement noise variance increased to σ²_η = 0.0025 and σ²_η = 0.01, respectively. Thus, the estimate deteriorates with increasing measurement noise. For the same three levels of measurement noise, Fig. 5.8 shows similar plots for the case where the number of measured samples of x(n) is reduced to L = 2^8. Decreasing L results in poorer estimates of A_4(z) because the estimated R(k) is less accurate.
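A comparable experiment is easy to set up; the sketch below reuses the levinson() function of Section 3.2.1 (our own code; because the noise realization differs, the numerical estimates will not match the text exactly):

import numpy as np
from scipy.signal import lfilter

rng = np.random.default_rng(1)
A4 = np.array([1.0, -1.8198, 2.0425, -1.2714, 0.4624])
L = 2 ** 10
x = lfilter([1.0], A4, rng.standard_normal(L))          # the AR(4) process
x_noisy = x + np.sqrt(1e-4) * rng.standard_normal(L)    # add measurement noise of variance 1e-4

# estimated autocorrelation R(k) = (1/L) sum_n x(n) x(n-k), k = 0, ..., 4
R = np.array([np.dot(x_noisy[m:], x_noisy[:L - m]) / L for m in range(5)])

a_hat, k_hat, E_hat = levinson(R, 4)
print(np.real(a_hat))                                   # estimated coefficients of A_4(z)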
5.6 APPLICATION IN SIGNAL COMPRESSION
The AR(N) model gives an approximate representation of the entire WSS process x(n), using only N + 1 numbers (i.e., the N autoregressive coefficients a_{N,i}^* and the mean square value E^f_N). In practical applications such as speech coding, this modeling is commonly used for compressing the signal waveform. Signals of practical interest can be modeled by WSS processes only over short periods. The model has to be updated as time evolves.
It is typical to subdivide the sampled speech waveform x(n) (evidently, a real-valued sequence) into consecutive segments (Fig. 5.9).¹ Each segment might be typically 20-ms long, corresponding to 200 samples at a sampling rate of 10 kHz. This sequence of 200 samples is viewed as representing a WSS random process, and the first N autocorrelation coefficients R(k) are estimated. With N << 200 (usually N ≈ 10), the finiteness of duration of the segment is of secondary importance in affecting the accuracy of the estimates. There are several sophisticated ways to perform this estimation (Section 2.5). The simplest would be to use the average
R(k) ≈ (1/L) Σ_{n=0}^{L−1} x(n)x(n − k),  0 ≤ k ≤ N.   (5.19)
Here, L is the length of the segment. If N << L, the end effects (caused by the use of finite summation) are negligible. In practice, one multiplies the segment with a smooth window to reduce the end effects. Further details about these techniques can be found in Rabiner and Schafer (1978).
¹Note that when x(n) is real-valued as in speech, the polynomial coefficients a_{N,i}, the parcor coefficients k_i, and the prediction error sequence e^f_N(n) are real as well.
FIGURE 5.9: (a) Compression of a signal segment using LPC and (b) reconstruction of an approximation of the original segment from the LPC parameters A_N(z) and E^f_N. The kth segment of the original signal x(n) is approximated by a segment of the AR signal y(n). For the next segment of x(n), the LPC parameters A_N(z) and E^f_N are generated again.
Note that with segment length L = 200 and the AR model order N = 10, a segment of 200 samples has been compressed into 11 coefficients! The encoded message
[ a_{N,1}  a_{N,2}  ⋯  a_{N,N}  E^f_N ]   (5.20)
is then transmitted, possibly after quantization. At the receiver end, one generates a local white-noise source e(n) with variance E^f_N and drives the IIR filter
1/A_N(z) = 1/(1 + Σ_{i=1}^N a_{N,i}^* z^{−i})   (5.21)
with this noise source as in Fig. 5.9(b). A segment of the output y(n) is then taken as the AR(N) approximation of the original speech segment.
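The per-segment encode/decode loop of Fig. 5.9 can be sketched as follows (our own function names; parameter quantization and windowing are omitted). Each call to lpc_encode_segment() produces the message (5.20) for one segment, and lpc_decode_segment() regenerates an AR(N) approximation of it; levinson() is the sketch from Section 3.2.1.

import numpy as np
from scipy.signal import lfilter

def lpc_encode_segment(segment, N):
    # Estimate R(k) by Eq. (5.19), then run Levinson's recursion.
    x = np.asarray(segment, dtype=float)
    L = len(x)
    R = np.array([np.dot(x[m:], x[:L - m]) / L for m in range(N + 1)])
    a, _, E = levinson(R, N)
    return np.real(a), E[-1]          # coefficients of A_N(z) and E^f_N

def lpc_decode_segment(a, EfN, L, rng=None):
    # Drive 1/A_N(z) with locally generated white noise of variance E^f_N, as in Fig. 5.9(b).
    rng = rng or np.random.default_rng()
    e = np.sqrt(EfN) * rng.standard_normal(L)
    return lfilter([1.0], a, e)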
Pitch-excited coders and noise-excited coders. Strictly speaking, the preceding discussion tells only half the story. The locally generated white noise e(n) is satisfactory only when the speech segment under consideration is unvoiced. Such sounds have no pitch or periodicity (e.g., the part ''Sh'' in ''Should I''). By contrast, voiced sounds such as vowel utterances (e.g., ''I'') are characterized by a pitch, which is the local periodicity of the utterance. This periodicity T of the speech segment has to be transmitted so that the receiver can reproduce faithful voiced sounds. The modification required at the receiver is that the AR(N) model is now driven by a periodic waveform such as an impulse train, with period T. Summarizing, the AR(N) model is excited by a noise generator or a periodic impulse train generator, according to whether the speech segment is unvoiced or voiced. This is summarized in Fig. 5.10. There are further complications in practice, such as transitions from voiced to unvoiced regions and so forth.
Many variations of linear prediction have been found to be useful in speech compression. A
technique called differential pulse code modulation is widely used in many applications; this uses
the predictive coder in a feedback loop in an ingenious way. The reader interested in the classical
literature on speech compression techniques should study (Atal and Schroeder, 1970; Rabiner and
Schafer, 1978; Jayant and Noll, 1984; Deller et al., 1993) or the excellent, although old, tutorial
paper on vector quantization by Makhoul et al. (1985).
Quantization and stability. It is shown in Appendix C that the unquantized coefficients a_{N,i}, which result from the normal equations, are such that 1/A_N(z) has all poles inside the unit circle (i.e., it is causal and stable). However, if we quantize a_{N,i} before transmission, this may not continue to remain true, particularly in low bit-rate coding. A solution to this problem is to transmit the ''parcor'' or lattice coefficients k_m. From Section 3.3, we know that |k_m| < 1. In view of the relation between Levinson's recursion and lattice structures (Section 4.3.1), we also know that 1/A_N(z) is stable if and only if |k_m| < 1. Suppose the coefficients k_m are quantized in such a way that |k_m| < 1 continues to hold after quantization. At the receiver end, one recomputes an approximation A_N^{(q)}(z) of A_N(z) from the quantized versions of k_m. Then, the AR(N) model 1/A_N^{(q)}(z) is guaranteed to be stable.
FIGURE 5.10: Reconstruction of a speech segment from its AR(N) model. The source of excitation depends on whether the segment is voiced or unvoiced (see text).
In Section 7.8, we introduce the idea of the line-spectrum pair (LSP). Instead of quantizing and transmitting the coefficients of A_N(z) or the lattice coefficients k_m, one can quantize and transmit the LSPs. Like the lattice coefficients, the LSP coefficients preserve stability even after quantization. Other advantages of using the LSP coefficients will be discussed in Section 7.8.
5.7 MA AND ARMA PROCESSES
We know that a WSS random process x(n) is said to be AR if it satisfies a recursive (IIR) difference equation of the form
x(n) = −Σ_{i=1}^N d_i^* x(n − i) + e(n),   (5.22)
where e(n) is a zero-mean white WSS process and the polynomial D(z) = 1 + Σ_{i=1}^N d_i^* z^{−i} has all zeros inside the unit circle. We say that a WSS process x(n) is a moving average (MA) process if it satisfies a nonrecursive (FIR) difference equation of the form
x(n) = Σ_{i=0}^N p_i^* e(n − i),   (5.23)
where e(n) is a zero-mean white WSS process. Finally, we say that a WSS process x(n) is an ARMA process if
x(n) = −Σ_{i=1}^N d_i^* x(n − i) + Σ_{i=0}^N p_i^* e(n − i),   (5.24)
where e(n) is a zero-mean white WSS process. Defining the polynomials
D(z) = 1 + Σ_{i=1}^N d_i^* z^{−i}  and  P(z) = Σ_{i=0}^N p_i^* z^{−i},   (5.25)
we see that the above processes can be represented as in Fig. 5.11. In each of the three cases, x(n) is the output of a rational discrete-time filter, driven by zero-mean white noise. For the AR process, the filter is an all-pole filter. For the MA process, the filter is FIR. For the ARMA process, the filter is IIR with both poles and zeros.
We will not have occasion to discuss MA and ARMA processes further. However, the theory and practice of linear prediction has been modified to obtain approximate models for these types of processes. Some discussions can be found in Marple (1987) and Kay (1988). In Section 5.5, we saw that the AR model provides an exact spectral factorization for an AR process. An approximate spectral factorization algorithm for MA processes, based on the theory of linear prediction, can be found in Friedlander (1983).
FIGURE 5.11: (a) AR, (b) MA, and (c) ARMA processes.
5.8 SUMMARY
We conclude the chapter with a summary of the properties of AR processes.
1. An AR process x(n) of order N (AR(N) process) satisfies an Nth-order recursive difference
equation of the form Eq. (5.22), where e(n) is zero-mean white noise. Equivalently, x(n) is
the output of a causal stable IIR all-pole filter 1/D(z) (Fig. 5.11(a)) excited by white noise
e(n).
2. If we perform Nth-order linear prediction on a WSS process x(n), we obtain the representation of Fig. 5.3(a), where e^f_N(n) is the optimal prediction error and A_N(z) is the predictor polynomial. By replacing e^f_N(n) with white noise, we obtain the AR(N) model of Fig. 5.3(b). Here, y(n) is an AR(N) process, and is the AR(N) approximation (or model) for x(n).
3. The AR(N) model y(n) approximates x(n) in the sense that the autocorrelations of the two processes, denoted r(k) and R(k), respectively, are matched for 0 ≤ |k| ≤ N (Theorem 5.2).
4. The AR(N) model y(n) is such that its power spectrum S_yy(e^{jω}) tends to approximate the power spectrum S_xx(e^{jω}) of x(n). This approximation improves as N increases and is particularly good near the peaks of S_xx(e^{jω}) (Fig. 5.6). Near the valleys of S_xx(e^{jω}), it is not so good, because the AR spectrum S_yy(e^{jω}) is an all-pole spectrum and cannot have zeros on or near the unit circle.
5. If x(n) itself is AR(N), then the optimal Nth-order prediction error e^f_N(n) is white. Thus, x(n) can be written as in Eq. (5.22), where e(n) is white and d_i are the solutions (usually denoted a_{N,i}) to the normal equations Eq. (2.8). Furthermore, the AR(N) power spectrum E^f_N/|A_N(e^{jω})|² is exactly equal to S_xx(e^{jω}). Thus, we can compute the power spectrum S_xx(e^{jω}) exactly, simply by knowing the autocorrelation coefficients R(k), |k| ≤ N, and identifying A_N(z) using the normal equations (see the sketch following this list). This means, in turn, that all the autocorrelation coefficients R(k) of an AR(N) process x(n) are determined, if R(k) is known for |k| ≤ N.
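To illustrate item 5, here is a minimal numerical sketch for the real-valued case: given R(k) for |k| ≤ N of an AR(N) process, solve the normal equations and evaluate the exact power spectrum E^f_N/|A_N(e^{jω})|². The autocorrelation values are borrowed from the AR(4) example of Section 6.5 (Ex. 6.3).

```python
import numpy as np
from scipy.linalg import solve_toeplitz

# Autocorrelation lags R(0)..R(4) of an AR(4) process (values from Ex. 6.3)
R = np.array([7.674, 4.967, -0.219, -3.083, -2.397])
N = len(R) - 1

# Normal equations (real case): Toeplitz system with solution a_1, ..., a_N
a = solve_toeplitz(R[:N], -R[1:N + 1])
A = np.r_[1.0, a]                           # A_N(z) = 1 + a_1 z^-1 + ... + a_N z^-N
Ef = R[0] + np.dot(a, R[1:N + 1])           # minimized prediction error E_N^f (about 1.0 here)

# Exact AR(N) power spectrum S_xx(e^{jw}) = E_N^f / |A_N(e^{jw})|^2
w = np.linspace(0, np.pi, 512)
Sxx = Ef / np.abs(np.polyval(A, np.exp(1j * w))) ** 2   # |polyval| = |A_N(e^{jw})| up to a unit-modulus factor
```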
• • • •
C H A P T E R 6
Prediction Error Bound and Spectral Flatness
6.1 INTRODUCTION
In this chapter, we first derive a closed-form expression for the minimum mean-squared prediction error for an AR(N) process. We then introduce an important concept called spectral flatness. This measures the extent to which the power spectrum S_xx(e^{jω}) of a WSS process x(n), not necessarily AR, is ‘‘flat.’’ The flatness γ²_x is such that 0 < γ²_x ≤ 1, with γ²_x = 1 for a flat power spectrum (white process). We will then prove two results:
1. For fixed mean square value R(0), as the flatness measure of an AR(N) process x(n) gets smaller, the mean square optimal prediction error E^f_N also gets smaller (Section 6.4). So, a less flat process is more predictable.
2. For any WSS process (not necessarily AR), the flatness measure of the optimal prediction error e^f_m(n) increases as m increases. So, the power spectrum of the prediction error gets flatter and flatter as the prediction order m increases (Section 6.5).
6.2 PREDICTION ERROR FOR AN AR PROCESS
If x(n) is an AR(N) process, we know that the optimal prediction error e^f_N(n) is white. We will show that the mean square error E^f_N can be expressed in closed form as

    E^f_N = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ),        (6.1)

where S_xx(e^{jω}) is the power spectrum of x(n). Here, Ln is the natural logarithm of a positive number as normally defined in elementary calculus and demonstrated in Fig. 6.1. Because x(n) is AR, the spectrum S_xx(e^{jω}) > 0 for all ω, and Ln S_xx(e^{jω}) is real and finite for all ω. The derivation of the above expression depends on the following lemma.
FIGURE 6.1: A plot of the logarithm Ln y as a function of y.
Lemma 6.1. Consider a polynomial in z^{−1} of the form A(z) = 1 + Σ_{n=1}^{N} a_n z^{−n}, and let all the zeros of A(z) be strictly inside the unit circle. Then,

    (1/2π) ∫_0^{2π} ln A(e^{jω}) dω = j2πk        (6.2)

for some integer k. This implies ∫_0^{2π} ln|A(e^{jω})|² dω/2π = j2πm for some integer m, so that

    (1/2π) ∫_0^{2π} Ln|A(e^{jω})|² dω = 0,        (6.3)

where Ln(.) stands for the principal value of the logarithm. ♦
Because the logarithm arises frequently in our discussions in the next few sections, it is important to bear in mind some of the subtleties in its definition. Consider the function ln w, with w = Re^{jθ}. Here, R > 0 and θ is the phase of w. Clearly, replacing θ with θ + 2πk for integer k does not change the value of w. However, the value of ln w will depend on k. Thus,

    ln w = Ln R + jθ + j2πk,        (6.4)

where Ln is the logarithm of a positive number as normally defined in calculus (Fig. 6.1). Thus, ln w has an infinite number of branches, each branch being defined by one value of the arbitrary integer k (Kreyszig, 1972; Churchill and Brown, 1984). If we restrict θ to be in the range −π < θ ≤ π and set k = 0, then the logarithm ln w is said to be evaluated on the principal branch and denoted as Ln w (called the principal value). For real and positive w, this agrees with Fig. 6.1. Note that Ln 1 = 0, but ln 1 = j2πk, and its value depends on the branch number k.
Because the branch of the function ln w has not been specified in the statement of Lemma
6.1, the integer k is undetermined (and unimportant for our discussions). Similarly, the integer m is
undetermined. However, in practically all our applications, we deal with quantities of the form Eq.
(6.1), so that it is immaterial whether we use ln(.) or Ln(.) (because exp(j 2πk) = 1 for any integer
k). The distinction should be kept in mind only when trying to follow detailed proofs.
Proof of Lemma 6.1. Note first that because A(z) has no zeros on the unit circle, ln A(e^{jω}) is well defined and finite there (except for branch ambiguity). We can express

    A(e^{jω}) = Π_{i=1}^{N} (1 − α_i e^{−jω}),        (6.5)

so that

    ln A(e^{jω}) = Σ_{i=1}^{N} ln(1 − α_i e^{−jω}) + j2πk.        (6.6)

It is therefore sufficient to prove that

    (1/2π) ∫_0^{2π} ln(1 − α_i e^{−jω}) dω = j2πk_i,        (6.7)

where k_i is some integer. This proof depends crucially on the condition that the zeros of A(z) are inside the unit circle, that is, |α_i| < 1. This condition enables us to expand the principal value Ln(1 − α_i e^{−jω}) using the power series¹

    Ln(1 − v) = − Σ_{n=1}^{∞} vⁿ/n,   |v| < 1.        (6.8)

Identifying v as v = α_i e^{−jω}, we have

    ln(1 − α_i e^{−jω}) = − Σ_{n=1}^{∞} α_iⁿ e^{−jωn}/n + j2πk_i.        (6.9)

The power series Eq. (6.8) has region of convergence |v| < 1. We know that any power series converges uniformly in its region of convergence and can therefore be integrated term by term (Kreyszig, 1972; Churchill and Brown, 1984). Because

    ∫_0^{2π} e^{−jωn} dω = 0,   n = 1, 2, 3, . . . ,        (6.10)

Eq. (6.7) readily follows, proving (6.2). Next, A*(e^{jω}) = Π_{i=1}^{N} (1 − α_i* e^{jω}). Because |α_i| < 1, the above argument can be repeated to obtain a similar result. Thus,

    (1/2π) ∫_0^{2π} ln|A(e^{jω})|² dω = (1/2π) ∫_0^{2π} ln A(e^{jω}) dω + (1/2π) ∫_0^{2π} ln A*(e^{jω}) dω = j2πm

for some integer m. This completes the proof.
¹This expansion evaluates the logarithm on the principal branch because it reduces to zero when v = 0.
Derivation of Eq. (6.1). Recall that e^f_N(n) is the output of the FIR filter A_N(z), in response to the process x(n). So, the power spectrum of e^f_N(n) is given by S_xx(e^{jω})|A_N(e^{jω})|². Because e^f_N(n) is white, its spectrum is constant, and its value at any frequency equals the mean square value E^f_N of e^f_N(n). Thus,

    E^f_N = S_xx(e^{jω})|A_N(e^{jω})|²
          = exp( Ln[ S_xx(e^{jω})|A_N(e^{jω})|² ] )
          = exp( (1/2π) ∫_0^{2π} Ln[ S_xx(e^{jω})|A_N(e^{jω})|² ] dω )
          = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω + (1/2π) ∫_0^{2π} Ln|A_N(e^{jω})|² dω )
          = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ),

because ∫_0^{2π} Ln|A_N(e^{jω})|² dω = 0 by Lemma 6.1. This establishes the desired result. The third equality above was possible because S_xx(e^{jω})|A_N(e^{jω})|² is constant for all ω.
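A quick numerical check of Eq. (6.1): for an AR process synthesized as white noise of variance σ² driven through 1/D(z), the right-hand side of (6.1) should reproduce σ². The AR(2) coefficients below are a hypothetical example.

```python
import numpy as np

# Hypothetical AR(2) process: white noise of variance sigma2 through 1/D(z),
# so S_xx(e^{jw}) = sigma2 / |D(e^{jw})|^2 and the optimal E_2^f equals sigma2.
sigma2 = 0.7
D = np.array([1.0, -1.2, 0.81])            # both zeros of D(z) at radius 0.9

w = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
Sxx = sigma2 / np.abs(np.polyval(D, np.exp(1j * w))) ** 2

rhs = np.exp(np.mean(np.log(Sxx)))         # Riemann sum for exp((1/2pi) int Ln S_xx dw)
print(rhs, sigma2)                         # the two agree to several decimal places
```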
6.3 A MEASURE OF SPECTRAL FLATNESS
Consider a zero-mean WSS process x(n). We know that x(n) is said to be white if its power spectrum S_xx(e^{jω}) is constant for all ω. When x(n) is not white, it is of interest to introduce a ‘‘degree of whiteness’’ or ‘‘measure of flatness’’ for the spectrum. Although such a measure is by no means unique, the one introduced in this section proves to be useful in linear prediction theory, signal compression, and other signal processing applications.
Motivation from AM/GM ratio. The arithmetic-mean geometric-mean inequality (abbreviated as the AM--GM inequality) says that any set of positive numbers x_1, . . . , x_N satisfies the inequality

    (1/N) Σ_{i=1}^{N} x_i  ≥  ( Π_{i=1}^{N} x_i )^{1/N},        (6.11)

where the left side is the AM and the right side the GM, with equality if and only if all x_i are identical (Bellman, 1960). Thus,

    0 ≤ GM/AM ≤ 1        (6.12)

and the ratio GM/AM is a measure of how widely different the numbers {x_i} are. As the numbers get closer to each other, this ratio grows and becomes unity when the numbers are identical. The ratio GM/AM can therefore be regarded as a measure of ‘‘flatness.’’ The larger this ratio, the flatter the distribution is.
By extending the concept of the GM/AM ratio to functions of a continuous variable (i.e., where the integer subscript i in x_i is replaced with a continuous variable, say frequency), we can obtain a useful measure of flatness for the power spectrum.
Definition 6.1. Spectral flatness. For a WSS process with power spectrum S_xx(e^{jω}), we define the spectral flatness measure as

    γ²_x = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ) / ( (1/2π) ∫_0^{2π} S_xx(e^{jω}) dω ),        (6.13)

where it is assumed that S_xx(e^{jω}) > 0 for all ω. ♦
We will see that the ratio γ²_x can be regarded as the GM/AM ratio of the spectrum S_xx(e^{jω}). Clearly, γ²_x > 0. The denominator of γ²_x is

    (1/2π) ∫_0^{2π} S_xx(e^{jω}) dω = R(0),

which is the mean square value of x(n). We will first show that

    (1/2π) ∫_0^{2π} S_xx(e^{jω}) dω ≥ exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ),        (6.14)

with equality if and only if S_xx(e^{jω}) is constant (i.e., x(n) is white). Because of this result, we have 0 < γ²_x ≤ 1, with γ²_x = 1 if and only if x(n) is white. A rigorous proof of a more general version of (6.14) can be found in Rudin (1974, pp. 63--64). The justification below is based on extending the AM--GM inequality for functions of continuous argument.
Justification of Eq. (6.14). Consider a function f(y) > 0 for 0 ≤ y ≤ a. Suppose we divide the interval 0 ≤ y ≤ a into N uniform subintervals of length Δy as shown in Fig. 6.2. From the AM--GM inequality, we obtain

    (1/N) Σ_{n=0}^{N−1} f(nΔy) ≥ ( Π_{n=0}^{N−1} f(nΔy) )^{1/N}
                               = exp( Ln ( Π_{n=0}^{N−1} f(nΔy) )^{1/N} )
                               = exp( (1/N) Σ_{n=0}^{N−1} Ln f(nΔy) ).

Equality holds if and only if f(nΔy) is identical for all n. We can replace 1/N with Δy/a (Fig. 6.2) so that the above becomes

    (1/a) Σ_{n=0}^{N−1} f(nΔy)Δy ≥ exp( (1/a) Σ_{n=0}^{N−1} Ln f(nΔy)Δy ).

If we let Δy → 0 (i.e., N → ∞), the two summations in the above equation become integrals. We therefore have

    (1/a) ∫_0^{a} f(y) dy ≥ exp( (1/a) ∫_0^{a} Ln f(y) dy ).        (6.15)

With f(y) identified as S_xx(e^{jω}) > 0 and a = 2π, we immediately obtain Eq. (6.14).

FIGURE 6.2: Pertaining to the proof of Eq. (6.14). The function f(y) is sampled at the points 0, Δy, 2Δy, . . . , NΔy = a, where Δy = a/N.
Example 6.1. Spectral Flatness Measure. Consider a real, zero-mean WSS process x(n) with the power spectrum S_xx(e^{jω}) shown in Fig. 6.3(a). Fig. 6.3(b) shows the spectrum for various values of α. It can be verified that

    R(0) = (1/2π) ∫_0^{2π} S_xx(e^{jω}) dω = 2.

The quantity in the numerator of Eq. (6.13) can be computed as

    exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ) = exp( α Ln(1/α + 1) ),

where α = ω_c/π. We can simplify this and substitute into the definition of γ²_x to obtain the flatness measure

    γ²_x = 0.5 (1 + 1/α)^α.

This is plotted in Fig. 6.4 as a function of α. We see that as α increases from zero to unity, the spectrum gets flatter, as expected from an examination of Fig. 6.3(b).

FIGURE 6.3: (a) Power spectrum for Ex. 6.1, which takes the value 1 + (π/ω_c) for |ω| < ω_c and unity for ω_c ≤ |ω| ≤ π, with α = ω_c/π, and (b) three special cases of this power spectrum (small α, medium α, and the maximum value α = 1).
6.4 SPECTRAL FLATNESS OF AN AR PROCESS
We know that if x(n) is an AR(N) process, then the mean square value of the optimal prediction error is given by Eq. (6.1). The spectral flatness measure γ²_x defined in Eq. (6.13) therefore becomes

    γ²_x = E^f_N / R(0).        (6.16)

For fixed R(0), the mean square prediction error E^f_N is smaller if the flatness measure γ²_x is smaller. Qualitatively speaking, this means that if x(n) has a very nonflat power spectrum, then the prediction error E^f_N can be made very small. If, on the other hand, the spectrum S_xx(e^{jω}) of x(n) is nearly flat, then the prediction error E^f_N is nearly as large as the mean square value of x(n) itself.

FIGURE 6.4: Flatness measure γ²_x in Ex. 6.1, plotted as a function of α.

Expression in Terms of the Impulse Response. An equivalent and useful way to express the above flatness measure is provided by the fact that x(n) is the output of the IIR filter G(z) = 1/A_N(z),
in response to the input e^f_N(n) (Fig. 2.1(b)). Because x(n) is AR(N), the error e^f_N(n) is white, and therefore, the power spectrum of x(n) is

    S_xx(e^{jω}) = E^f_N / |A_N(e^{jω})|².

The mean square value of x(n) is therefore given by

    R(0) = E^f_N × (1/2π) ∫_0^{2π} dω / |A_N(e^{jω})|² = E^f_N Σ_{n=0}^{∞} |g(n)|²,

where g(n) is the causal impulse response of G(z). So, Eq. (6.16) becomes

    γ²_x = 1 / Σ_{n=0}^{∞} |g(n)|² = 1/E_g,

where

    E_g = Σ_{n=0}^{∞} |g(n)|² = energy of the filter G(z).

Summarizing, we have
Summarizing, we have
Theorem6.1. Spectral flatness of AR process. Let x(n) be an AR(N) process with mean square
value R(0) = E[|x(n)|
2
]. Let A
N
(z) be the optimal Nth-order prediction polynomial and E
f
N
the
corresponding mean square prediction error. Then, the spectral flatness γ
2
x
of the process x(n) can
be expressed in either of the following two forms:
1. γ
2
x
= E
f
N
/R(0), or
2. γ
2
x
= 1/


n=0
|g(n)|
2
,
where g(n) is the causal impulse response of G(z) = 1/A
N
(z). ♦
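The two forms in Theorem 6.1 can be cross-checked numerically: form 2 uses the impulse response energy of 1/A_N(z), while Definition (6.13) applied to E^f_N/|A_N(e^{jω})|² gives the same number. The AR(2) polynomial below is a hypothetical example with E^f_2 = 1.

```python
import numpy as np
from scipy.signal import lfilter

A = np.array([1.0, -1.2, 0.81])            # hypothetical A_2(z); poles of 1/A_2(z) at radius 0.9

# Form 2 of Theorem 6.1: 1 / sum |g(n)|^2, where g(n) is the impulse response of 1/A_2(z)
g = lfilter([1.0], A, np.r_[1.0, np.zeros(999)])   # 1000 terms is plenty at radius 0.9
form2 = 1.0 / np.sum(g ** 2)

# Definition (6.13) applied directly to S_xx(e^{jw}) = 1 / |A_2(e^{jw})|^2
w = np.linspace(0, 2 * np.pi, 8192, endpoint=False)
Sxx = 1.0 / np.abs(np.polyval(A, np.exp(1j * w))) ** 2
direct = np.exp(np.mean(np.log(Sxx))) / np.mean(Sxx)

print(form2, direct)                        # the two numbers agree closely
```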
Example 6.2. Spectral Flatness of an AR(1) Process. In Ex. 3.3, we considered a process x(n) with autocorrelation

    R(k) = ρ^{|k|},

with −1 < ρ < 1. The optimal first-order prediction polynomial was found to be

    A_1(z) = 1 − ρz^{−1}.

We also saw that the first-order prediction error sequence e^f_1(n) is white. This means that the process x(n) is AR(1). The IIR all-pole filter G(z) defined in Theorem 6.1 is

    G(z) = 1/A_1(z) = 1/(1 − ρz^{−1}),

and its impulse response is g(n) = ρⁿU(n). So Σ_{n=0}^{∞} g²(n) = 1/(1 − ρ²), and the flatness measure for the AR(1) process x(n) is

    γ²_x = 1 − ρ².

This is plotted in Fig. 6.5. So, the process gets flatter and flatter as ρ² gets smaller. The power spectrum of the process x(n), which is the Fourier transform of R(k), is given by

    S_xx(e^{jω}) = (1 − ρ²)/|1 − ρe^{−jω}|² = (1 − ρ²)/(1 + ρ² − 2ρ cos ω).

This is depicted in Fig. 6.6 for two (extreme) values of ρ in the range 0 < ρ < 1. (For −1 < ρ < 0, it is similar except that the peak occurs at π.) As ρ increases from zero to unity (i.e., the pole of G(z) gets closer to the unit circle), the spectrum gets more and more peaky (less and less flat), consistent with the fact that γ²_x decreases as in Fig. 6.5.

FIGURE 6.5: The flatness of an AR(1) process as a function of the correlation coefficient ρ.
FIGURE 6.6: The power spectrum of an AR(1) process for two values of the correlation coefficient ρ (ρ = 0.1 and ρ = 0.9).
6.5 CASE WHERE SIGNAL IS NOT AR
We showed that if x(n) is an AR(N) process, then the Nth-order mean square prediction error is given by (6.1). When x(n) is not AR, it turns out that this same quantity acts as a lower bound for the mean square prediction error, as demonstrated in Fig. 6.7. We now prove this result.
Theorem 6.2. Prediction error bound. Let x(n) be a WSS process with power spectrum S_xx(e^{jω}) finite and nonzero for all ω. Let E^f_m be the mean square value of the optimal mth-order prediction error. Then,

    E^f_m ≥ exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ).        (6.17)

Furthermore, equality holds if and only if x(n) is AR(N) and m ≥ N. ♦
Proof. We know that the prediction error e^f_m(n) is the output of the FIR filter A_m(z) with input equal to x(n) (Fig. 2.1(a)). So, its power spectrum is given by S_xx(e^{jω})|A_m(e^{jω})|², so that

    E^f_m = (1/2π) ∫_0^{2π} S_xx(e^{jω})|A_m(e^{jω})|² dω
          ≥ exp( (1/2π) ∫_0^{2π} Ln[ S_xx(e^{jω})|A_m(e^{jω})|² ] dω )    (by Eq. (6.15))
          = exp( (1/2π) [ ∫_0^{2π} Ln S_xx(e^{jω}) dω + ∫_0^{2π} Ln|A_m(e^{jω})|² dω ] )
          = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω )    (by Lemma 6.1).

Equality holds if and only if S_xx(e^{jω})|A_m(e^{jω})|² is constant, or equivalently, x(n) is AR with order ≤ m. The proof is therefore complete.

FIGURE 6.7: The optimal prediction error E^f_m decreases monotonically with increasing prediction order m but is lower bounded by Eq. (6.1) for any WSS process.
6.5.1 Error Spectrum Gets Flatter as Predictor Order Grows
We will now show that the power spectrum of the error e^f_m(n) gets ‘‘flatter’’ as the prediction order m increases. With S_m(e^{jω}) denoting the power spectrum of e^f_m(n), we have

    S_m(e^{jω}) = S_xx(e^{jω})|A_m(e^{jω})|².

The flatness measure γ²_m for the process e^f_m(n) is

    γ²_m = exp( (1/2π) ∫_0^{2π} Ln S_m(e^{jω}) dω ) / ( (1/2π) ∫_0^{2π} S_m(e^{jω}) dω )
         = exp( (1/2π) ∫_0^{2π} Ln[ S_xx(e^{jω})|A_m(e^{jω})|² ] dω ) / E^f_m.

Using ∫_0^{2π} Ln|A_m(e^{jω})|² dω = 0 (Lemma 6.1), this simplifies to

    γ²_m = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ) / E^f_m.        (6.18)

The numerator is fixed for a given process. As the prediction order increases, the error E^f_m decreases. This proves that the flatness is an increasing function of the prediction order m (Fig. 6.8).

FIGURE 6.8: The flatness γ²_m of the optimal prediction error increases monotonically with increasing m. It is bounded above by unity.

We can rewrite γ²_m in terms of the flatness measure γ²_x of the original process x(n) as follows:

    γ²_m = γ²_x R(0) / E^f_m.

Here, γ²_x and R(0) are constants determined by the input process x(n). The flatness γ²_m of the error process e^f_m(n) is inversely proportional to the mean square value of the prediction error E^f_m.
Example 6.3: Spectral Flattening With Linear Prediction. In this example, we consider an AR(4) process with power spectrum

    S_xx(e^{jω}) = 1/|A(e^{jω})|²,

where

    A(z) = 1 − 1.8198z^{−1} + 2.0425z^{−2} − 1.2714z^{−3} + 0.4624z^{−4}.

The first few elements of the autocorrelation are

    R(0)    R(1)    R(2)     R(3)     R(4)     R(5)     R(6)     R(7)
    7.674   4.967   −0.219   −3.083   −2.397   −0.639   −0.087   −0.474

If we do linear prediction on this process, the prediction polynomials are

    A_0(z)   1.0
    A_1(z)   1.0   −0.6473   0        0         0        0   0
    A_2(z)   1.0   −1.1457   0.7701   0         0        0   0
    A_3(z)   1.0   −1.5669   1.3967   −0.5469   0        0   0
    A_4(z)   1.0   −1.8198   2.0425   −1.2714   0.4624   0   0
    A_5(z)   1.0   −1.8198   2.0425   −1.2714   0.4624   0   0
    A_6(z)   1.0   −1.8198   2.0425   −1.2714   0.4624   0   0

Thus, after A_4(z), the polynomial does not change, because the original process is AR(4). The lattice coefficients are

    k_1       k_2      k_3       k_4      k_5   k_6   k_7
    −0.6473   0.7701   −0.5469   0.4624   0.0   0.0   0.0

Again, after k_4, all the coefficients are zero because the original process is AR(4).
Fig. 6.9 shows the power spectrum S_m(e^{jω}) of the prediction error e^f_m(n) for 1 ≤ m ≤ 4. The original power spectrum S_xx(e^{jω}) = S_0(e^{jω}) is shown in each plot for comparison. For convenience, the plots are shown normalized so that S_xx(e^{jω}) has peak value equal to unity. It is seen that the spectrum S_m(e^{jω}) gets flatter as m grows, and S_4(e^{jω}) is completely flat, again because the original process is AR(4). The flatness measure, calculated for each of these power spectra, is as follows (see the computational sketch after the table):

    γ²_0     γ²_1     γ²_2     γ²_3     γ²_4   γ²_5   γ²_6
    0.1309   0.2248   0.5517   0.7862   1.0    1.0    1.0
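The tables above can be reproduced with a short Levinson--Durbin recursion driven by the listed autocorrelation values; this is only a sketch of the computation (real-valued case), not of any particular toolbox routine.

```python
import numpy as np

# Autocorrelation values listed above
R = np.array([7.674, 4.967, -0.219, -3.083, -2.397, -0.639, -0.087, -0.474])

a = np.array([1.0])                          # A_0(z) = 1
E = R[0]                                     # E_0^f = R(0)
for m in range(1, 7):
    k = -np.dot(a, R[m:0:-1]) / E            # lattice coefficient k_m
    a_ext = np.r_[a, 0.0]
    a = a_ext + k * a_ext[::-1]              # step-up: A_m(z) from A_{m-1}(z)
    E = E * (1.0 - k * k)                    # E_m^f = E_{m-1}^f (1 - k_m^2)
    print(f"m={m}  k_m={k:+.4f}  A_m={np.round(a, 4)}  E_m^f={E:.4f}")

# For this process, exp((1/2pi) int Ln S_xx dw) = E_4^f = 1, so the flatness values
# gamma_m^2 tabulated above are simply 1/E_m^f (up to the rounding of the R(k) values).
```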
FIGURE 6.9: Original AR(4) power spectrum (dotted) and power spectrum of the prediction error (solid), plotted against ω/π. (a) Prediction order = 1, (b) prediction order = 2, (c) prediction order = 3, and (d) prediction order = 4.
6.5.2 Mean Square Error and Determinant
In Section 2.4.3, we found that the minimized mean square prediction error can be written in terms of the determinant of the autocorrelation matrix. Thus,

    det R_m = E^f_{m−1} E^f_{m−2} · · · E^f_0.        (6.19)

We know that the sequence E^f_m is monotone nonincreasing and lower bounded by zero. So it reaches a limit E^f_∞ as m → ∞. We will now show that

    lim_{m→∞} (det R_m)^{1/m} = E^f_∞.        (6.20)

The quantity (det R_m)^{1/m} can be related to the differential entropy of a Gaussian random process (Cover and Thomas, 1991, p. 273). We will say more about the entropy in Section 6.6.1.
Proof of (6.20). If a sequence of numbers α_0, α_1, α_2, . . . tends to a limit α, then

    lim_{M→∞} (1/M) Σ_{i=0}^{M−1} α_i = α.        (6.21)

That is, the average value of the numbers tends to α as well (Problem 26). Defining α_i = Ln E^f_i, we have from Eq. (6.19)

    Ln(det R_m) = Ln E^f_{m−1} + Ln E^f_{m−2} + · · · + Ln E^f_0.

Because Ln(x) is a continuous function when x > 0, we can say

    lim_{i→∞} Ln E^f_i = Ln E^f_∞        (6.22)

(Problem 26). We therefore have

    lim_{m→∞} (1/m) Ln(det R_m) = lim_{m→∞} (1/m) [ Ln E^f_{m−1} + Ln E^f_{m−2} + · · · + Ln E^f_0 ] = Ln E^f_∞,

by invoking Eqs. (6.21) and (6.22). This implies

    exp( lim_{m→∞} (1/m) Ln(det R_m) ) = exp( Ln E^f_∞ ).

Because exp(x) is a continuous function of the argument x, we can interchange the limit with exp(.). This results in (6.20) indeed.
A deeper result, called Szego's theorem (Haykin, 1986, pp. 96), can be used to prove the further fact that the limit in Eq. (6.20) is really equal to the bound on the right-hand side of Eq. (6.17). That is,

    lim_{m→∞} (det R_m)^{1/m} = E^f_∞ = exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ).        (6.23)

In other words, the lower bound on the mean square prediction error, which is achieved for the AR(N) case for finite prediction order, is approached asymptotically for the non-AR case. For further details on the asymptotic behavior, see Grenander and Szego (1958) and Gray (1972).
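The limit (6.23) can be watched numerically for a simple non-AR example. The sketch below uses a hypothetical MA(1) process x(n) = e(n) + 0.8e(n − 1), for which the right-hand side of (6.23) equals 1 by Lemma 6.1, so (det R_m)^{1/m} should creep toward 1 as m grows.

```python
import numpy as np
from scipy.linalg import toeplitz

# MA(1): R(0) = 1 + b^2, R(1) = b, R(k) = 0 for |k| > 1, with e(n) unit-variance white
b = 0.8
for m in (2, 5, 10, 50, 200):
    r = np.zeros(m)
    r[0], r[1] = 1 + b * b, b
    Rm = toeplitz(r)                       # m x m autocorrelation matrix R_m
    sign, logdet = np.linalg.slogdet(Rm)   # log-determinant avoids under/overflow for large m
    print(m, np.exp(logdet / m))           # (det R_m)^{1/m}, slowly approaching 1
```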
Predictability measure. We now see that the flatness measure in Eq. (6.13) can be written in the form

    γ²_x = E^f_∞ / R(0).        (6.24)

For very small γ²_x, the power E^f_∞ in the prediction error is therefore much smaller than the power R(0) in the original process, and we say that the process is very predictable. On the other hand, if γ²_x is close to unity, then the ultimate prediction error is only slightly smaller than R(0), and we say that the process is not very predictable. For white noise, γ²_x = 1, and the process is not predictable, as one would expect. The reciprocal 1/γ²_x can be regarded as a convenient measure of predictability of a process.
6.6 MAXIMUM ENTROPY AND LINEAR PREDICTION
There is a close relation between linear prediction and the so-called maximum entropy methods (Burg, 1972). We now briefly outline this. In Section 6.5, we showed that the mth-order prediction error E^f_m for a WSS process x(n) is bounded as

    E^f_m ≥ exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ).        (6.25)

We also know that if the process x(n) is AR(N), then equality is achieved in Eq. (6.25) whenever the prediction order m ≥ N. Now consider two WSS processes x(n) and y(n) with autocorrelations R(k) and r(k), respectively, and power spectra S_xx(e^{jω}) and S_yy(e^{jω}), respectively. Assume that

    R(k) = r(k),   |k| ≤ N.

Then the predictor polynomials A_m(z) and the mean square prediction errors E^f_m will be identical for the processes, for 1 ≤ m ≤ N.
Suppose now that y(n) is AR(N), but x(n) is not necessarily AR. Then, from Section 6.2, we know that

    exp( (1/2π) ∫_0^{2π} Ln S_yy(e^{jω}) dω ) = E^f_N.

From Theorem 6.2, on the other hand, we know that

    E^f_N ≥ exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ).

Comparison of the two preceding equations shows that

    exp( (1/2π) ∫_0^{2π} Ln S_yy(e^{jω}) dω ) ≥ exp( (1/2π) ∫_0^{2π} Ln S_xx(e^{jω}) dω ).

Moreover, they are equal if and only if x(n) is also AR(N) (by Theorem 6.2). Because exp(v) is a monotone-increasing function of real v, the above also means

    ∫_0^{2π} Ln S_yy(e^{jω}) dω ≥ ∫_0^{2π} Ln S_xx(e^{jω}) dω.
Summarizing, we have proved the following result.
Theorem 6.3. Let x(n) and y(n) be two WSS processes with autocorrelations R(k) and r(k), respectively, and power spectra S_xx(e^{jω}) > 0 and S_yy(e^{jω}) > 0, respectively. Assume that r(k) = R(k) for |k| ≤ N and that y(n) is AR(N). Then,

    ∫_0^{2π} Ln S_yy(e^{jω}) dω ≥ ∫_0^{2π} Ln S_xx(e^{jω}) dω,        (6.26)

where the left-hand side pertains to the AR(N) process, with equality if and only if x(n) is also AR. ♦
6.6.1 Connection to the Notion of Entropy
The importance of the above result lies in the fact that the integrals in Eq. (6.26) are related to the notion of entropy in information theory. If a WSS process y(n) is Gaussian, then it can be shown (Cover and Thomas, 1991) that the quantity

    φ = a + b ∫_0^{2π} Ln S_yy(e^{jω}) dω        (6.27)

is equal to the entropy per sample (entropy rate) of the process, where a = ln √(2πe) and b = 1/4π.
Details. For discrete-amplitude random variables, entropy is related to the extent of ‘‘uncertainty’’ or ‘‘randomness.’’ For a continuous-amplitude random variable, the entropy is more appropriately called differential entropy and should be interpreted carefully. Thus, given a random variable x with differential entropy H_x, consider the quantized version x_Δ with a fixed quantization step size Δ. Then the number of bits required to represent x_Δ is

    b_Δ = H_x + c,

where c depends only on the step size Δ (Cover and Thomas, 1991, p. 229). Thus, for fixed step size, larger differential entropy implies a larger number of bits, more uncertainty, or more randomness.
Although the integral in the second term of (6.27) will be referred to as ‘‘entropy’’ and interpreted as ‘‘randomness’’ in the following discussions, the details given above should be kept in mind. The problem of maximizing φ under appropriate constraints is said to be an entropy maximization problem. Theorem 6.3 says that, among all Gaussian WSS processes with fixed autocorrelation for |k| ≤ N, the entropy has the maximum value for AR(N) processes. Furthermore, all AR(N) processes with autocorrelation fixed as above have the same entropy.
Now assume that y(n) is the AR(N) model for x(n) obtained by use of linear prediction as in Fig. 5.3. We know that in this model e(n) is white noise with variance E^f_N. If we also make e(n) Gaussian, then the output y(n) will be Gaussian as well (Papoulis, 1965). In other words, we have obtained an AR(N) model y(n) for the process x(n), with the following properties.
1. The AR(N) process y(n) is Gaussian.
2. The autocorrelation r(k) of the AR(N) process y(n) matches the autocorrelation R(k) of the given process x(n) (i.e., R(k) = r(k)), for |k| ≤ N (as shown in Section 5.4).
3. Under this matching constraint, the entropy ∫_0^{2π} Ln S_yy(e^{jω}) dω of the process y(n) is the largest possible. In other words, y(n) is ‘‘as random as it could be,’’ under the constraint r(k) = R(k) for |k| ≤ N.
The relation between linear prediction and maximum entropy methods is also discussed by Burg (1972), Robinson (1982), and Cover and Thomas (1991). Robinson (1982) also gives a good perspective on the history of spectrum estimation. Also see Papoulis (1981).
6.6.2 A Direct Maximization Problem
A second justification of Theorem 6.3 will now be mentioned, primarily because it might be more insightful for some readers. Suppose we forget about the linear prediction problem and consider the following maximization problem directly: we are given a WSS process x(n) with autocorrelation R(k). We wish to find a WSS process y(n) with the following properties.
1. Its autocorrelation r(k) satisfies r(k) = R(k) for |k| ≤ N.
2. With S_yy(e^{jω}) denoting the power spectrum of y(n), the entropy ∫_0^{2π} Ln S_yy(e^{jω}) dω is as large as possible. This must be achieved by optimizing the coefficients r(k) for |k| > N.
A necessary condition for the maximization² of the entropy φ defined in Eq. (6.27) is ∂φ/∂r(k) = 0 for |k| > N. Now,

    ∂φ/∂r(k) = b ∫_0^{2π} ∂[Ln S_yy(e^{jω})]/∂r(k) dω
             = b ∫_0^{2π} [1/S_yy(e^{jω})] ∂S_yy(e^{jω})/∂r(k) dω
             = b ∫_0^{2π} [1/S_yy(e^{jω})] e^{−jωk} dω.

The last equality follows by using S_yy(e^{jω}) = Σ_{k=−∞}^{∞} r(k)e^{−jωk}. Setting ∂φ/∂r(k) = 0, we therefore obtain

    ∫_0^{2π} [1/S_yy(e^{jω})] e^{−jωk} dω = 0,   |k| > N.

This says that the inverse Fourier transform of 1/S_yy(e^{jω}) should be of finite duration, restricted to the region |k| ≤ N. In other words, we must be able to express 1/S_yy(e^{jω}) as

    1/S_yy(e^{jω}) = c_1 |A_N(e^{jω})|²,

or equivalently,

    S_yy(e^{jω}) = c/|A_N(e^{jω})|²,

where A_N(z) is an FIR function of the form A_N(z) = 1 + Σ_{i=1}^{N} a*_{N,i} z^{−i}. Thus, S_yy(e^{jω}) is an autoregressive spectrum.
Because the desired solution has been proved to be AR(N), we can find it simply by finding the AR(N) model of the process x(n) using linear prediction techniques (Section 5.3). This is because we already know that the AR(N) model y(n) arising out of linear prediction satisfies r(k) = R(k) for |k| ≤ N (Section 5.4) and, furthermore, that an AR(N) spectrum satisfying the constraint r(k) = R(k), |k| ≤ N is unique (because the solution to the normal equations is unique).
²The fact that this maximizes rather than minimizes φ is verified by examining the Hessian matrix (Chong and Żak, 2001; Antoniou and Lu, 2007).
6.6.3 Entropy of the Prediction Error
Let e^f_m(n) be the mth-order optimal prediction error for a WSS process x(n), and let S_m(e^{jω}) be the power spectrum of e^f_m(n). With S_xx(e^{jω}) denoting the power spectrum of x(n), we have

    S_m(e^{jω}) = S_xx(e^{jω})|A_m(e^{jω})|²

and S_0(e^{jω}) = S_xx(e^{jω}). Because ∫_0^{2π} Ln|A_m(e^{jω})|² dω = 0 (Lemma 6.1), we have

    ∫_0^{2π} Ln S_m(e^{jω}) dω = ∫_0^{2π} Ln S_xx(e^{jω}) dω

for all m. If x(n) is Gaussian, then so is e^f_m(n), and the integral above represents the entropy of e^f_m(n).
Thus, in the Gaussian case, the entropy of the prediction error is the same for all prediction orders m and equal to the entropy of the input x(n). From Eq. (6.18), we see that the flatness γ²_m of the error e^f_m(n) is given by

    γ²_m = K/E^f_m,        (6.28)

where K is independent of m and depends only on the entropy of x(n). So, the flatness of the prediction error is proportional to 1/E^f_m. Thus, as the prediction order m increases, the error spectrum becomes flatter, not because the entropy increases, but because the mean square error E^f_m decreases.
6.7 CONCLUDING REMARKS
The derivation of the optimal linear predictor was based only on the idea that the mean-squared
prediction error should be minimized. We also pointed out earlier that the solution has the
minimum-phase property and that it has structural interpretations in terms of lattices. We now
see that the solution has interesting connections to flatness measures, entropy, and so forth. It is
fascinating to see how all these connections arise starting from the ‘‘simple’’ objective of minimizing
a mean square error.
Another interesting topic we have not discussed here is the prediction of a continuous-time
band-limited signal from a finite number of past samples. With sufficiently large sampling rate, this
can be done accurately, with predictor coefficients independent of the signal! This result dates back
to Wainstein and Zubakov (1962). Further results can be found in Vaidyanathan (1987), along
with a discussion of the history of this problem.
• • • •
C H A P T E R 7
Line Spectral Processes
7.1 INTRODUCTION
A WSS random process is said to be line spectral if the power spectrum S_xx(e^{jω}) consists only of Dirac delta functions, that is,

    S_xx(e^{jω}) = 2π Σ_{i=1}^{L} c_i δ_a(ω − ω_i),   0 ≤ ω < 2π,        (7.1)

where {ω_i} is a set of L distinct frequencies (called line frequencies) and c_i ≥ 0. If c_i > 0 for all i, then L represents the number of lines in the spectrum, as demonstrated in Fig. 7.1, and we say that L is the degree of the process. We also sometimes abbreviate this as a Linespec(L) process. Note that the power of the signal x(n) around frequency ω_i is c_i, that is,

    (1/2π) ∫_{ω_i−ε}^{ω_i+ε} S_xx(e^{jω}) dω = c_i        (7.2)

for sufficiently small ε > 0. So we say that c_i is the power at the line frequency ω_i.
In this chapter, we study line spectral processes in detail. We first express the Linespec(L) property concisely in terms of the autocorrelation matrix. We then study the time domain properties and descriptions of Linespec(L) processes. For example, we show that a Linespec(L) process satisfies a homogeneous difference equation. Using this, we can predict, with zero error, all of its samples, if we know the values of L successive samples. We then consider the problem of identifying a Linespec(L) process in noise. More specifically, suppose we have a signal y(n), which is a sum of a Linespec(L) process x(n) and uncorrelated white noise e(n):

    y(n) = x(n) + e(n).

We will show how the line frequencies ω_i and the line powers c_i in Eq. (7.1) can be extracted from the noisy signal.
FIGURE 7.1: Power spectrum of a line spectral process with L lines at frequencies ω_1, ω_2, . . . , ω_L in [0, 2π). The kth line is a Dirac delta function with height 2πc_k.
7.2 AUTOCORRELATION OF A LINE SPECTRAL PROCESS
The autocorrelation of the line spectral process is the inverse Fourier transform of Eq. (7.1):

    R(k) = Σ_{i=1}^{L} c_i e^{jω_i k}.        (7.3)

This is a superposition of single-frequency sequences. Consider the (L + 1) × (L + 1) autocorrelation matrix R_{L+1} of a WSS process:

    R_{L+1} = [ R(0)     R(1)       . . .   R(L)
                R*(1)    R(0)       . . .   R(L−1)
                  .        .          .       .
                R*(L)    R*(L−1)    . . .   R(0)   ].        (7.4)

The Linespec(L) property can be stated entirely in terms of R_{L+1} and R_L, as shown by the following result.
Theorem 7.1. Line spectral processes. A WSS process x(n) is line spectral with degree L if and only if the matrix R_{L+1} is singular and R_L is nonsingular. ♦
Proof. We already showed in Section 2.4.2 that if R_{L+1} is singular, then the power spectrum has the form Eq. (7.1) (line spectral, with degree ≤ L). Conversely, assume x(n) is line spectral with degree ≤ L. If we pass x(n) through an FIR filter V(z) of order ≤ L with zeros at the line frequencies ω_i, the output is zero for all n. With the filter written as

    V(z) = Σ_{i=0}^{L} v_i* z^{−i},        (7.5)
the output is

    v_0* x(n) + v_1* x(n − 1) + · · · + v_L* x(n − L) = 0.        (7.6)

Defining

    x(n) = [ x(n)  x(n−1)  . . .  x(n−L) ]^T   and   v = [ v_0  v_1  . . .  v_L ]^T,

we can rewrite (7.6) as v†x(n) = 0. From this, we obtain v†E[x(n)x†(n)]v = 0, that is,

    v† R_{L+1} v = 0.

Because R_{L+1} is positive semidefinite, the above equation implies that R_{L+1} is singular. Summarizing, R_{L+1} is singular if and only if x(n) is line spectral with degree ≤ L. By replacing L + 1 with L, it also follows that R_L is singular if and only if x(n) is line spectral with degree ≤ L − 1. This means, in particular, that if x(n) is line spectral with degree equal to L, then R_L has to be nonsingular and, of course, R_{L+1} singular.
Conversely, assume R_L is nonsingular and R_{L+1} singular. The latter implies that the process is line spectral with degree ≤ L, as we already showed. The former implies that the degree is not ≤ L − 1. So the degree has to be exactly L.
Notice, by the way, that R_L is a principal submatrix of R_{L+1} in two ways:

    R_{L+1} = [ R_L   ×    ]  =  [ R(0)   ×    ],        (7.7)
              [ ×     R(0) ]     [ ×      R_L  ]

where × denotes entries not needed here. We now show that if x(n) is Linespec(L), there is a unique nonzero vector v (up to a scale factor) that annihilates R_{L+1}:

    R_{L+1} v = 0.        (7.8)

Proof. Because R_{L+1} is singular, it is obvious that there exists an annihilating vector v. To prove uniqueness, we assume there are two linearly independent vectors u ≠ 0 and w ≠ 0, such that R_{L+1}u = R_{L+1}w = 0, and bring about a contradiction. The 0th elements of these vectors must be nonzero, that is, u_0 ≠ 0 and w_0 ≠ 0, for otherwise the remaining components annihilate R_L in view of (7.7), contradicting nonsingularity of R_L. Now consider the vector y = u/u_0 − w/w_0. This is nonzero (because u and w are linearly independent), but its zeroth component is zero:

    y = [ 0 ]  ≠  0.
        [ z ]

Because R_{L+1}y = 0, it follows that R_L z = 0, z ≠ 0, which violates nonsingularity of R_L.
The result can also be proved by using the eigenvalue interlace property for principal
submatrices of Hermitian matrices (Horn and Johnson, 1985). But the above proof is more direct.
7.2.1 The Characteristic Polynomial
The unique vector v that annihilates R_{L+1} has zeroth component v_0 ≠ 0. For, if v_0 = 0, then the smaller vector

    w = [ v_1  v_2  . . .  v_L ]^T

would satisfy R_L w = 0 (in view of Eq. (7.7)), contradicting the fact that R_L is nonsingular. So, without loss of generality, we can assume v_0 = 1. So the FIR filter V(z) in Eq. (7.5), which annihilates the line spectral process x(n), can be written as

    V(z) = 1 + Σ_{i=1}^{L} v_i* z^{−i}.

This filter, determined by the unique eigenvector v, is itself unique and is called the characteristic polynomial or characteristic filter of the line spectral process. From Eq. (7.6), we see that a Linespec(L) process x(n) also satisfies the homogeneous difference equation

    x(n) = − Σ_{i=1}^{L} v_i* x(n − i).        (7.9)

This means, in particular, that all future samples can be predicted with zero error if a block of L samples is known.

FIGURE 7.2: The rank of the autocorrelation matrix R_m as a function of the size m, for a Linespec(L) process: the rank equals m for m ≤ L and saturates at L thereafter.
7.2.2 Rank Saturation for a Line Spectral Process
From the above discussions, we see that if x(n) is Linespec(L), then R_L as well as R_{L+1} have rank L. It can, in fact, be proved (Problem 24) that

    rank R_m = m   for 1 ≤ m ≤ L,
             = L   for m ≥ L.        (7.10)

This is indicated in Fig. 7.2. Thus, the rank saturates as soon as m exceeds the value L.
7.3 TIME DOMAIN DESCRIPTIONS
Line spectral processes exhibit a number of special properties, which are particularly useful when
viewed in the time domain. In this section, we discuss some of these.
7.3.1 Extrapolation of Autocorrelation
We showed that a Linespec(L) process x(n) satisfies the homogeneous difference equation Eq. (7.9). If we multiply both sides of (7.9) by x*(n − k) and take expectations, we get

    R(k) = − Σ_{i=1}^{L} v_i* R(k − i),        (7.11)

where we have used R(k) = E[x(n)x*(n − k)]. Conversely, we will see that whenever the autocorrelation satisfies the above recursion, the random process x(n) itself satisfies the same recursion, that is, (7.9) holds.
Thus, if we know R(k) for 0 ≤ k ≤ L − 1 for a Linespec(L) process x(n), we can compute R(k) for all k using (7.11). This is similar in spirit to the case where x(n) is AR(N), in which case the knowledge of R(k) for 0 ≤ k ≤ N would reveal the polynomial A_N(z) and therefore the value of R(k) for all k (Section 5.3.2).
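As a small sketch of this extrapolation, take the Linespec(2) autocorrelation R(k) = cos(ω_0 k) of a single real sinusoid (cf. Ex. 7.2 with P = 1). Its characteristic polynomial is V(z) = 1 − 2cos(ω_0)z^{−1} + z^{−2}, with zeros at e^{±jω_0}, and the recursion (7.11) regenerates R(k) exactly from two initial values.

```python
import numpy as np

w0 = 0.3 * np.pi
v = np.array([-2 * np.cos(w0), 1.0])        # v_1, v_2 of V(z) = 1 + v_1 z^-1 + v_2 z^-2

R = [1.0, np.cos(w0)]                       # R(0), R(1): L = 2 successive values
for k in range(2, 20):
    R.append(-v[0] * R[k - 1] - v[1] * R[k - 2])   # recursion (7.11)

print(np.max(np.abs(np.array(R) - np.cos(w0 * np.arange(20)))))   # ~ 1e-15
```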
7.3.2 All Zeros on the Unit Circle
From the proof of Theorem 7.1, we know that V(z) annihilates the process x(n). This implies, in particular, that all the L zeros of

    V(z) = 1 + Σ_{m=1}^{L} v_m* z^{−m}        (7.12)

are on the unit circle, at the L distinct points ω_i where S_xx(e^{jω}) has the lines.¹ This means, in particular, that the numbers v_i satisfy the property

    v_i = c v*_{L−i},   |c| = 1,   0 ≤ i ≤ L,        (7.13)

where v_0 = 1. That is, V(z) is a linear phase filter. Because V(z) has the L distinct zeros e^{jω_i}, the solutions to the homogeneous equation Eq. (7.9) necessarily have the form

    x(n) = Σ_{i=1}^{L} d_i e^{jω_i n}.        (7.14)

Note that there are only L random variables d_i in Eq. (7.14), although the ‘‘random’’ process x(n) is an infinite sequence. The WSS property of x(n) imposes some strong conditions on the joint behavior of the random variables d_i, which we shall derive soon. The deterministic quantity c_i in Eq. (7.3) is the power of the process x(n) at the line frequency ω_i, whereas the random variable d_i is said to be the random amplitude (possibly complex) at the frequency ω_i.
¹Another way to see this would be as follows: R(k) satisfies the homogeneous difference equation (7.11). At the same time, it also has the form (7.3). From the theory of homogeneous equations, it then follows that the L zeros of V(z) have the form e^{jω_i}.
7.3.3 Periodicity of a Line Spectral Process
The process (7.14) is periodic if and only if all the frequencies ω_i are rational multiples of 2π (Oppenheim and Schafer, 1999). That is,

    ω_i = (K_i/M_i) 2π

for integers K_i, M_i. In this case, we can write ω_i = 2πL_i/M, where M is the least common multiple (LCM) of the set {M_i} and the L_i are integers. Thus, all the frequencies are harmonics of the fundamental frequency 2π/M. Under this condition, we say that the process is periodic with period M or just harmonic. Because R(k) has the same form as x(n) (compare Eqs. (7.3) and (7.14)), we see that R(k) is periodic if and only if x(n) is periodic. In this connection, the following result is particularly interesting.
Theorem 7.2. Let x(n) be a WSS random process with autocorrelation R(k). Suppose R(0) = R(M) for some M > 0. Then, x(n) is periodic with period M, and so is R(k). ♦
Proof. Because R(0) = R(M), we have

    E[x(n)x*(n)] = E[x(n)x*(n − M)].        (7.15)

The mean square value of x(n) − x(n − M), namely,

    E[ (x(n) − x(n − M))(x*(n) − x*(n − M)) ],

can be written as

    μ = E[|x(n)|²] − E[x(n)x*(n − M)] − E[x(n − M)x*(n)] + E[|x(n − M)|²].

The first and the last terms are equal because of the WSS property. Substituting further from Eq. (7.15), we get μ = 0. So the random variable x(n) − x(n − M) has mean square value equal to zero. This implies x(n) = x(n − M). So x(n) is periodic with period M. Using the definition R(k) = E[x(n)x*(n − k)], it is readily verified that R(k) is also periodic, with period M.
7.3.4 Determining the Parameters of a Line Spectral Process
Given the autocorrelation R(k), 0 ≤ k ≤ L, of the Linespec(L) process x(n), we can identify the unique eigenvector v satisfying R_{L+1}v = 0. Then, the unique characteristic polynomial V(z) is determined (Eq. (7.12)). All its L zeros are guaranteed to be on the unit circle (i.e., they have the form e^{jω_i}). The line frequencies ω_i can therefore be identified. From Eq. (7.3), we can solve for the constants c_i (line powers) by writing a set of linear equations. For example, if L = 3, we have

    [ 1           1           1         ] [ c_1 ]   [ R(0) ]
    [ e^{jω_1}    e^{jω_2}    e^{jω_3}  ] [ c_2 ] = [ R(1) ]        (7.16)
    [ e^{2jω_1}   e^{2jω_2}   e^{2jω_3} ] [ c_3 ]   [ R(2) ]

The 3 × 3 matrix is a Vandermonde matrix and is nonsingular because the ω_i are distinct (Horn and Johnson, 1985). So, we can uniquely identify the line powers c_i. Thus, the line frequencies and powers in Fig. 7.1 have been completely determined from the set of coefficients R(k), 0 ≤ k ≤ L (a computational sketch follows at the end of this subsection).
Identifying amplitudes of sinusoids in x(n). Similarly, consider the samples x(n) of the Linespec(L) random process x(n). If we are given the values of L successive samples, say x(0), x(1), . . . , x(L − 1), we can uniquely identify the coefficients d_i appearing in Eq. (7.14) by solving a set of equations similar to (7.16). Thus, once we have measured L samples of the random process, we can find all the future and past samples exactly, with zero error! The randomness is only in the L coefficients {d_i}, and once these have been identified, x(n) is known for all n.
7.4 FURTHER PROPERTIES OF TIME DOMAIN DESCRIPTIONS
Starting from the assumption that x(n) is a line spectral process, we derived the time domain expression Eq. (7.14). Conversely, consider a sequence of the form Eq. (7.14), where d_i are random variables. Is it necessarily a line spectral process? Before addressing this question, we first have to address the WSS property. The WSS property of Eq. (7.14) itself imposes some severe restrictions on the random variables d_i, as shown next.
Lemma 7.1. Consider a random process of the form

    x(n) = Σ_{i=1}^{L} d_i e^{jω_i n},        (7.17)

where d_i are random variables and ω_i are distinct constants. Then, x(n) is a WSS process if and only if
1. E[d_i] = 0 whenever ω_i ≠ 0 (zero-mean condition).
2. E[d_i d_m*] = 0 for i ≠ m (orthogonality condition). ♦
Proof. Recall that x(n) is WSS if E[x(n)] and E[x(n)x*(n − k)] are independent of n. First, assume that the two conditions in the lemma are satisfied. We have E[x(n)] = Σ_{i=1}^{L} E[d_i]e^{jω_i n}. By the first condition of the lemma, this is constant for all n. Next,

    E[x(n)x*(n − k)] = Σ_{i=1}^{L} Σ_{m=1}^{L} E[d_i d_m*] e^{jω_i n} e^{−jω_m(n−k)}.        (7.18)

With d_i satisfying the second condition of the lemma, this reduces to

    E[x(n)x*(n − k)] = Σ_{i=1}^{L} E[|d_i|²] e^{jω_i k},        (7.19)

which is independent of n. Thus, the two conditions of the lemma imply that x(n) is WSS. Now consider the converse. Suppose x(n) is WSS. Because E[x(n)] is independent of n, we see from Eq. (7.17) that E[d_i] = 0 whenever ω_i ≠ 0. So the first condition of the lemma is necessary. Next, Eq. (7.18) can be rewritten as

    E[x(n)x*(n − k)] = Σ_{i=1}^{L} Σ_{m=1}^{L} e^{jω_i n} E[d_i d_m*] e^{−jω_m n} e^{jω_m k}.
Express the right-hand side as

    [ e^{jω_1 n}  · · ·  e^{jω_L n} ] D diag( e^{−jω_1 n}, e^{−jω_2 n}, . . . , e^{−jω_L n} ) t(k),

where t(k) = [ e^{jω_1 k}  e^{jω_2 k}  · · ·  e^{jω_L k} ]^T, D is an L × L matrix with [D]_{im} = E[d_i d_m*], and A(n) denotes the row vector formed by the product of the first three factors. The WSS property requires that A(n)t(k) be independent of n, for any fixed choice of k. Thus, defining the matrix

    T = [ t(0)  t(1)  · · ·  t(L − 1) ],

we see that A(n)T must be independent of n. But T is a Vandermonde matrix and is nonsingular because the ω_i are distinct. So A(n) must itself be independent of n. Consider now its mth column:

    [A(n)]_m = Σ_{i=1}^{L} E[d_i d_m*] e^{j(ω_i − ω_m)n}.

Because the ω_i are distinct, the differences ω_i − ω_m are distinct for fixed m. So the above sum is independent of n if and only if E[d_i d_m*] = 0 for i ≠ m. The second condition of the lemma is, therefore, necessary as well.
From Eq. (7.19), it follows that R(k) has the form

    R(k) = Σ_{i=1}^{L} E[|d_i|²] e^{jω_i k},        (7.20)

so that the power spectrum is a line spectrum. Thus, if d_i satisfies the conditions of the lemma, then x(n) is not only WSS, but is also a line spectral process of degree L. Summarizing, we have proved this:
Theorem 7.3. Consider a random process of the form

    x(n) = Σ_{i=1}^{L} d_i e^{jω_i n},        (7.21)

where d_i are random variables and ω_i are distinct constants. Then, x(n) is a WSS and, hence, a line spectral process if and only if
1. E[d_i] = 0 whenever ω_i ≠ 0 (zero-mean condition).
2. E[d_i d_m*] = 0 for i ≠ m (orthogonality condition).
Under this condition, R(k) is as in Eq. (7.20), so that E[|d_i|²] represents the power at the line frequency ω_i. ♦
Example 7.1: Stationarity and Ergodicity of Line Spectral Processes. Consider the random process x(n) = Ae^{jω_0 n}, where ω_0 ≠ 0 is a constant and A is a zero-mean random variable. We see that E[x(n)] = E[A]e^{jω_0 n} = 0 and E[x(n)x*(n − k)] = E[|A|²]e^{jω_0 k}. Because these are independent of n, the process x(n) is WSS.
If a process x(n) is ergodic (Papoulis, 1965), then, in particular, E[|x(n)|²] must be equal to the time average. In our case, because |x(n)| = |A|, we see that

    (1/(2N + 1)) Σ_{n=−N}^{N} |x(n)|² = |A|²

and is, in general, a random variable! So the time average will in general not be equal to the ensemble average E[|A|²], unless |A|² is nonrandom. (For example, if A = ce^{jθ}, where c is a constant and θ is a random variable, then |A|² would be a constant.) So we see that a WSS line spectral process is not ergodic unless we impose some strong restriction on the random amplitudes.
Example 7.2: Real Random Sinusoid. Next consider a real random process of the form

    x(n) = A cos(ω_0 n + θ),

where A and θ are real random variables and ω_0 > 0 is a constant. We have

    x(n) = 0.5Ae^{jθ}e^{jω_0 n} + 0.5Ae^{−jθ}e^{−jω_0 n}.

From Lemma 7.1, we know that this is WSS if and only if

    E[Ae^{jθ}] = 0,   E[Ae^{−jθ}] = 0,   and   E[A²e^{j2θ}] = 0.        (7.22)

This is satisfied, for example, if A and θ are statistically independent random variables and θ is uniformly distributed in [0, 2π). Thus, statistical independence implies

    E[Ae^{jθ}] = E[A]E[e^{jθ}]   and   E[A²e^{j2θ}] = E[A²]E[e^{j2θ}],

and the uniform distribution of θ implies E[e^{jθ}] = E[e^{j2θ}] = 0, so that Eq. (7.22) is satisfied. Notice that Lemma 7.1 requires that Ae^{jθ} and Ae^{−jθ} have zero mean for the WSS property, but it is not necessary for A itself to have zero mean! Summarizing, x(n) = A cos(ω_0 n + θ) is WSS whenever the real random variables A and θ are statistically independent and θ is uniformly distributed in [0, 2π). It is easily verified that E[x(n)] = 0 and

    R(k) = E[x(n)x(n − k)] = 0.5E[A²]cos(ω_0 k) = P cos(ω_0 k),

where P = 0.5E[A²]. So the power spectrum is

    S_xx(e^{jω}) = (P/2) × 2π [ δ_a(ω − ω_0) + δ_a(ω + ω_0) ],   0 ≤ ω < 2π.

The quantity P is the total power in the two line frequencies ±ω_0.
Example 7.3: Sum of Sinusoids. Consider a real random process of the form

    x(n) = Σ_{i=1}^{N} A_i cos(ω_i n + θ_i).        (7.23)

Here A_i and θ_i are real random variables and ω_i are distinct positive constants. Suppose we make the following statistical assumptions:
1. A_i and θ_m are statistically independent for any i, m.
2. θ_i and θ_m are statistically independent for i ≠ m.
3. θ_m is uniformly distributed in 0 ≤ θ_m < 2π for any m.
We can then show that x(n) is WSS. For this, we only have to rewrite x(n) in the complex form Eq. (7.21) and verify that the conditions of Theorem 7.3 are satisfied (Problem 25). Notice that no assumption has been made about the joint statistics of A_i and A_m, for i ≠ m. Under the above conditions, it can be verified that E[x(n)] = 0. How about the autocorrelation? We have

    R(k) = E[x(n)x(n − k)]
         = E[x(0)x(−k)]   (using WSS)
         = E[ Σ_{i=1}^{N} A_i cos(θ_i) Σ_{m=1}^{N} A_m cos(−kω_m + θ_m) ].

Using the assumptions listed above, the cross terms for i ≠ m reduce to

    E[A_i A_m]E[cos(θ_i)]E[cos(−kω_m + θ_m)] = 0.

When i = m, the terms reduce to P_i cos(ω_i k) as in Ex. 7.2. Thus,

    R(k) = E[x(n)x(n − k)] = Σ_{i=1}^{N} P_i cos(ω_i k),

where P_i = 0.5E[A²_i] > 0. The power spectrum is therefore

    S_xx(e^{jω}) = π Σ_{i=1}^{N} P_i [ δ_a(ω − ω_i) + δ_a(ω + ω_i) ],   0 ≤ ω < 2π.

The total power at the frequencies ω_i and −ω_i is given by 0.5P_i + 0.5P_i = P_i.
FIGURE 7.3: Plot of the cosine waveform cos(ω_0 n) as a function of n.

Philosophical Discussions. A random process of the form Eq. (7.21), where there are only L random variables d_i, always appears to hold some conceptual mystery. If we measure the L random variables d_i by some means, then we know the entire waveform x(n). So the samples are fully predictable, as also seen from the difference equation Eq. (7.9). Furthermore, a plot of x(n) does not look ‘‘random.’’ For example, consider A cos(ω_0 n + θ) discussed in Ex. 7.2. Here, the random variables A and θ do not depend on n, and the plot of x(n) as a function of n (Fig. 7.3) is just a nice and smooth cosine!
So, where does the randomness manifest? Recall that a random process, by definition, is a collection (or ensemble) of waveforms (Papoulis, 1965; Peebles, 1987). Each waveform is an outcome of an experiment. For example, the result of an experiment determines the values of the random variables A and θ to be A_0 and θ_0, and the outcome A_0 cos(ω_0 n + θ_0) is fully determined for all n and is said to be a realization of the random process. It is for this reason that it is somewhat tricky to force ergodicity (Ex. 7.1); ergodicity says that time averages are equal to ensemble averages, but if a particular outcome (time function) of the random process does not exhibit ‘‘randomness’’ along time, then the time average cannot tell us much about the ensemble averages indeed.
7.5 PREDICTION POLYNOMIAL OF LINE SPECTRAL PROCESSES
Let x(n) be Linespec(L) and let R_{L+1} be its autocorrelation matrix of size (L + 1) × (L + 1). In Section 7.2, we argued that there exists a unique vector v of the form

    v = [ 1  v_1  . . .  v_L ]^T

such that R_{L+1}v = 0. By comparison with the augmented normal equation Eq. (2.20), we see that if we take the coefficients of the Lth-order predictor to be

    a_{L,i} = v_i,

then the optimal Lth-order prediction error E^f_L becomes zero. The optimal predictor polynomial for the process x(n) is therefore

    A_L(z) = 1 + v_1* z^{−1} + · · · + v_L* z^{−L} = V(z).

In Section 7.3.2, we saw that V(z) has all the L zeros on the unit circle, at the points e^{jω_i}, where ω_i are the line frequencies. We therefore conclude that the optimal Lth-order predictor polynomial A_L(z) of a Linespec(L) process has all L zeros on the unit circle. This also implies

    a_n = c a*_{L−n}

for some c with |c| = 1. For example, in the real case, a_n is symmetrical or antisymmetrical. In what follows, we present two related results.
Lemma 7.2. Let R(k) be the autocorrelation of a WSS process x(n), and let R(k) satisfy the difference equation

    R(k) = − Σ_{i=1}^{L} a_i* R(k − i).        (7.24)

Furthermore, let there be no such equation of smaller degree satisfied by R(k). Then the process x(n) is line spectral with degree L, and the polynomial A_L(z) = 1 + Σ_{i=1}^{L} a_i* z^{−i} necessarily has all the L zeros on the unit circle. ♦
Proof. By using the property R(k) = R*(−k), we can verify that Eq. (7.24) implies

    R_{L+1} [ 1  a_1  . . .  a_L ]^T = 0,

where R_{L+1} is the (L + 1) × (L + 1) autocorrelation matrix of Eq. (7.4). So R_{L+1} is singular. If R_L were singular, we could proceed as in Section 7.2.1 to show that x(n) satisfies a difference equation of the form Eq. (7.9) (but with lower degree L − 1); from this, we could derive an equation of the form Eq. (7.24) but with lower degree L − 1. Because this violates the conditions of the lemma, we conclude that R_L is nonsingular. Using Theorem 7.1, we therefore conclude that the process x(n) is Linespec(L). It is clear from Section 7.3 that the polynomial A_L(z) is the characteristic polynomial V(z) of the process x(n) and therefore has all zeros on the unit circle.
We now present a related result directly in terms of the samples x(n) of the random process. This is a somewhat surprising conclusion, although its proof is very simple in light of the above discussions. (For a more direct proof, see Problem 23.)
Theorem 7.4. Let x(n) be a random process satisfying

    x(n) = − Σ_{i=1}^{L} a_i* x(n − i).        (7.25)

Furthermore, let there be no such equation of smaller degree satisfied by x(n). If the process x(n) is WSS, then
1. The polynomial A_L(z) = 1 + Σ_{i=1}^{L} a_i* z^{−i} has all the L zeros on the unit circle.
2. x(n) is line spectral with degree L. ♦
Thus, if a process satisfies the difference equation Eq. (7.25) (and if L is the smallest such number), then imposing the WSS property on x(n) immediately forces the zeros of A_L(z) to be all on the unit circle!
Proof. Suppose the process is WSS. If we multiply both sides of Eq. (7.25) by x*(n − k) and take expectations, this results in Eq. (7.24). If there existed an equation of the form Eq. (7.24) with L replaced by L − 1, we could then prove that R_L is singular and proceed as in Section 7.2.1 to derive an equation of the form Eq. (7.25) with L replaced by L − 1. Because this violates the conditions of the theorem, there is no smaller equation of the form Eq. (7.24). Applying Lemma 7.2, the claim of the theorem immediately follows.
7.6 SUMMARY OF PROPERTIES
Before we proceed to line spectral processes buried in noise, we summarize, at the expense of being
somewhat repetitive, the main properties of line spectral processes.
1. Spectrum has only impulses. We say that a WSS random process is Linespec(L) if the power
spectrum is made of L impulses, that is,
S
xx
(e

) = 2π
L

i=1
c
i
δ
a
(ω −ω
i
), 0 ≤ ω < 2π, (7.26)
with c
i
> 0 for any i. Here, ω
i
are distinct and called the line frequencies and c
i
are the
powers at these line frequencies. In this case, the process x(n) as well as its autocorrelation
LINESPECTRAL PROCESSES 101
R(k) satisfy an Lth-order recursive, homogeneous, difference equation:
x(n) = −
L

i=1
v
i
*x(n −i), R(k) = −
L

i=1
v
i
*R(k − i). (7.27)
So all the samples of the sequence x(n) can be predicted with zero error, if L successive
samples are known. Similarly, all the autocorrelation coefficients R(k) of the process x(n)
can be computed if R(k) is known for 0 ≤ k ≤ L − 1.
2. Characteristic polynomial . In the above, v
i
* are the coefficients of the polynomial
V(z) = 1 +
L

i=1
v
i
* z
−i
. (7.28)
This is called the characteristicpolynomial of the Linespec(L) process x(n). This polynomial
has all its L zeros on the unit circle at z = e

i
.
3. Sum of sinusoids. The random process x(n) as well as the autocorrelation R(k) can be
expressed as a sum of L single-frequency signals
x(n) =
L

i=1
d
i
e

i
n
, R(k) =
L

i=1
c
i
e

i
k
, (7.29)
where d
i
are random variables (amplitudes at the line frequencies ω
i
) and c
i
= E[|d
i
|
2
]
(powers at the line frequencies ω
i
).
4. Autocorrelation matrix. A WSS process x(n) is Linespec(L) if and only if the autocorrelation
   matrix R_{L+1} is singular and R_L nonsingular. So R_{L+1} has rank L. There is a unique vector
   v of the form

       v = [1  v_1  \cdots  v_L]^T    (7.30)

   satisfying R_{L+1} v = 0, and this vector uniquely determines the characteristic polynomial
   V(z). Furthermore, the rank has the saturation behavior sketched in Fig. 7.2.
5. Sum of sinusoids, complex. A random process of the form x(n) = \sum_{i=1}^{L} d_i e^{j\omega_i n} is WSS and,
   hence, line spectral, if and only if (a) E[d_i] = 0 whenever ω_i ≠ 0 and (b) E[d_i d_m^*] = 0 for
   i ≠ m. Under this condition, R(k) = \sum_{i=1}^{L} E[|d_i|^2] e^{j\omega_i k} (Theorem 7.3).
6. Sum of sinusoids, real. Consider the process x(n) = \sum_{i=1}^{N} A_i \cos(\omega_i n + \theta_i), where A_i and
   θ_i are real random variables and ω_i are distinct positive constants. This is WSS, hence,
   line spectral, under the conditions stated in Ex. 7.3. The autocorrelation is then
   R(k) = \sum_{i=1}^{N} P_i \cos(\omega_i k), where P_i = 0.5 E[A_i^2] is the total power in the line frequencies ±ω_i.
7. Identifying the parameters. Given the set of autocorrelation coefficients R(k), 0 ≤ k ≤ L, of
   the Linespec(L) process x(n), we can determine the v_i in Eq. (7.28) from the eigenvector v
   of R_{L+1} corresponding to the zero eigenvalue. The line frequencies ω_i can then be determined
   because the e^{j\omega_i} are the zeros of the characteristic polynomial V(z). Once the line frequencies
   are known, the constants c_i are determined as described in Section 7.3.4.
8. Periodicity. If a WSS process x(n) is such that R(0) = R(M) for some M > 0, then x(n) is
periodic with period M, and so is R(k) (Theorem 7.2). So the process is line spectral.
9. Prediction polynomial. If we perform optimal linear prediction on a Linespec(L) process,
   we will find that the Lth-order prediction polynomial A_L(z) is equal to the characteristic
   polynomial V(z) and therefore has all its zeros on the unit circle. The prediction error
   E^f_L = 0.
7.7 IDENTIFYING A LINE SPECTRAL PROCESS IN NOISE
Identification of sinusoids in noise is an important problem in signal processing. It is also related
to the problem of finding the direction of arrival of a propagating wave using an array of sensors.
An excellent treatment can be found in Therrien (1992). We will be content with giving a brief
introduction to this problem here. Consider a random process

    y(n) = x(n) + e(n),    (7.31)

where x(n) is Linespec(L) (i.e., a line spectral process with degree L). Let e(n) be zero-mean white
noise with variance σ_e^2, that is,

    E[e(n) e^*(n − k)] = σ_e^2 δ(k).

We thus have a line spectral process buried in noise (popularly referred to as sinusoids buried in
noise). The power spectrum of x(n) has line frequencies (impulses), whereas that of e(n) is flat,
with value σ_e^2 for all frequencies. The total power spectrum is demonstrated in Fig. 7.4. Notice that
some of the line frequencies could be very closely spaced, and some could be buried beneath the
noise level.

Assume that x(n) and e(m) are uncorrelated, that is,

    E[x(n) e^*(m)] = 0, for any n, m.
FIGURE 7.4: A Linespec(L) process with additive white noise of variance σ_e^2.
Let r(k) and R(k) denote the autocorrelation sequences of y(n) and x(n), respectively. Using this
and the whiteness of e(n), we get

    r(k) = E[y(n) y^*(n − k)]
         = E[x(n) x^*(n − k)] + E[e(n) e^*(n − k)]
         = R(k) + σ_e^2 δ(k),

which yields

    r(k) = R(k) + σ_e^2 δ(k).    (7.32)

Note, in particular, that r(k) = R(k), k ≠ 0. Thus, the m × m autocorrelation matrix r_m of the
process y(n) is given by

    r_m = R_m + σ_e^2 I_m,    (7.33)

where R_m is the m × m autocorrelation matrix of x(n). We say that R_m is the signal autocorrelation
matrix and r_m is the signal-plus-noise autocorrelation matrix.
7.7.1 Eigenstructure of the Autocorrelation Matrix
Let η_i be an eigenvalue of R_m with eigenvector u_i, so that

    R_m u_i = η_i u_i.

Then

    [R_m + σ_e^2 I_m] u_i = [η_i + σ_e^2] u_i,

where the matrix in brackets is r_m. Thus, the eigenvalues of r_m are

    λ_i = η_i + σ_e^2.

The corresponding eigenvectors of r_m are the same as those of R_m.
FIGURE 7.5: (a) The decreasing minimum eigenvalue λ_min(R_m) of the autocorrelation matrix R_m as the size m increases and (b) the corresponding behavior of the minimum eigenvalue λ_min(r_m).
The Minimum Eigenvalue. Because x(n) is Linespec(L), its autocorrelation matrix R_m
is nonsingular for 1 ≤ m ≤ L and singular for m = L + 1 (Theorem 7.1). Thus, the smallest
eigenvalue of R_m is positive for m ≤ L and equals zero for m > L. It is easily shown (Problem
12) that the smallest eigenvalue λ_min(R_m) of the matrix R_m cannot increase with m. So it behaves
in a monotone manner as depicted in Fig. 7.5(a).

Accordingly, the smallest eigenvalue of r_m behaves as in Fig. 7.5(b). In particular, for m > L
it attains a constant value equal to σ_e^2. We can use this as a means of identifying the number of
line frequencies L in the process x(n), as well as the variance σ_e^2 of the noise e(n). (This could be
misleading in some cases, as we cannot keep measuring the eigenvalue for an unlimited number of
values of m; see Problem 29.)
Eigenvector corresponding to the smallest eigenvalue. Let v be the eigenvector of r_{L+1}
corresponding to the smallest eigenvalue σ_e^2, so that

    r_{L+1} v = σ_e^2 v.    (7.34)

In view of Eq. (7.33), this implies

    R_{L+1} v + σ_e^2 v = σ_e^2 v,

that is,

    R_{L+1} v = 0.
Thus, v is also the eigenvector that annihilates R_{L+1}. The eigenvector corresponding to the
minimum eigenvalue of r_{L+1} can be computed using any standard method, such as, for example,
the power method (Golub and Van Loan, 1989). Once we identify the vector v, we can find
the characteristic polynomial V(z) (Eq. (7.28)). Its roots are guaranteed to be on the unit circle
(i.e., have the form e^{j\omega_i}), revealing the line frequencies. The smallest eigenvalue of r_{L+1} gives the
noise variance σ_e^2.
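As a concrete illustration of these steps, here is a minimal sketch in Python/NumPy (the book's own demonstrations used Matlab). Given an estimate of the (L + 1) × (L + 1) signal-plus-noise autocorrelation matrix r_{L+1}, it extracts the noise variance and the line frequencies from the minimum-eigenvalue eigenvector. The function name and the use of a full eigendecomposition (rather than the power method mentioned above) are choices made only for this sketch.

```python
import numpy as np

def line_frequencies_from_autocorrelation(r_matrix):
    """Given the (L+1)x(L+1) Hermitian autocorrelation matrix r_{L+1}
    of y(n) = x(n) + e(n), return (estimated noise variance,
    estimated line frequencies in [0, 2*pi))."""
    # eigh returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(r_matrix)
    sigma_e2 = eigvals[0]            # smallest eigenvalue = noise variance
    v = eigvecs[:, 0]                # eigenvector annihilating R_{L+1}, Eq. (7.34)
    # v holds (up to scale) the coefficients of V(z) = 1 + v1 z^{-1} + ... + vL z^{-L}.
    zeros = np.roots(v)              # zeros should lie on the unit circle
    omegas = np.mod(np.angle(zeros), 2 * np.pi)
    return sigma_e2, np.sort(omegas)
```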
7.7.2 Computing the Powers at the Line Frequencies
The autocorrelation R(k) of the process x(n) has the form (7.3), where c_i are the powers at the line
frequencies ω_i. Because we already know ω_i, we can compute c_i by solving linear equations (as in
Eq. (7.16)), if we know enough samples of R(k). For this, note that R(k) is related to the given
autocorrelation r(k) as in Eq. (7.32). So we know all the samples of R(k) (because σ_e^2 has been
identified as described above).

What if we do not know the noise variance? Even if the noise variance σ_e^2 and, hence,
R(0) are not known, we can compute the line powers as follows. From Eq. (7.32), we have
R(k) = r(k), k ≠ 0. So we know the values of R(1), ..., R(L). We can write a set of L linear
equations

    \begin{bmatrix}
    e^{j\omega_1} & e^{j\omega_2} & \cdots & e^{j\omega_L} \\
    e^{j2\omega_1} & e^{j2\omega_2} & \cdots & e^{j2\omega_L} \\
    \vdots & \vdots & \ddots & \vdots \\
    e^{jL\omega_1} & e^{jL\omega_2} & \cdots & e^{jL\omega_L}
    \end{bmatrix}
    \begin{bmatrix} c_1 \\ c_2 \\ \vdots \\ c_L \end{bmatrix}
    =
    \begin{bmatrix} R(1) \\ R(2) \\ \vdots \\ R(L) \end{bmatrix}.    (7.35)
The L × L matrix above is Vandermonde and nonsingular because the line frequencies ω_i are
distinct (Horn and Johnson, 1985). So we can solve for the line powers c_i uniquely.

The technique described in this section to estimate the parameters of a line spectral process
is commonly referred to as Pisarenko's harmonic decomposition. Notice that the line frequencies need
not be harmonically related for the method to work (i.e., the ω_k need not be integer multiples of a
fundamental ω_0).
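Under the same assumptions as the previous sketch, the line powers can be recovered by forming and solving the system (7.35). Below is a minimal Python/NumPy sketch; the function name is invented for illustration.

```python
import numpy as np

def line_powers(omegas, R_values):
    """Solve Eq. (7.35) for the line powers c_i.

    omegas   : array of the L estimated line frequencies.
    R_values : array [R(1), ..., R(L)] of autocorrelation samples
               (equal to r(1), ..., r(L), since the noise is white).
    """
    L = len(omegas)
    k = np.arange(1, L + 1).reshape(-1, 1)       # row index k = 1, ..., L
    V = np.exp(1j * k * np.asarray(omegas))      # matrix with entries e^{j k w_i}
    c = np.linalg.solve(V, np.asarray(R_values, dtype=complex))
    return c.real                                # powers c_i are real and positive
```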
Case of Real Processes. The case where x(n) is a real line spectral process is of considerable
importance. From Ex. 7.3, recall that a process of the form

    x(n) = \sum_{i=1}^{N} A_i \cos(\omega_i n + \theta_i)

is WSS and hence, harmonic, under some conditions. If x(n) in Eq. (7.31) has this form, then the
above discussions continue to hold. We can identify the eigenvector v of r_{L+1} corresponding to
its minimum eigenvalue (Eq. (7.34)) and, hence, the corresponding line frequencies as described
previously. In this case, it is more convenient to write R(k) in the form

    R(k) = \sum_{i=1}^{N} P_i \cos(\omega_i k),

where P_i > 0 is the total power at the frequencies ±ω_i. Because R(k) is known for 1 ≤ k ≤ N, we
can identify P_i by writing linear equations, as we did for the complex case.
Example 7.4: Estimation of sinusoids in noise. Consider the sum of sinusoids

    x(n) = 0.3 sin(0.1πn) + 1.6 sin(0.5πn) + 2.1 sin(0.9πn),    (7.36)

and assume that we have noisy samples

    y(n) = x(n) + e(n),

where e(n) is zero-mean white noise with variance σ_e^2 = 0.1. We assume that 100 samples of y(n)
are available. The signal component x(n) and the noise component e(n) generated using the Matlab
command randn are shown in Fig. 7.6. The noisy data y(n) can be regarded as a process with a line
spectrum (six lines because each sinusoid has two lines), buried in a background noise spectrum
(as in Fig. 7.4). We need the 7 × 7 autocorrelation matrix R_7 of the process y(n) to estimate the
lines. So, from the noisy data y(n), we estimate the autocorrelation R(k) for 0 ≤ k ≤ 6 using the
autocorrelation method described in Section 2.5. This estimates the top row of R_7:

    [3.6917  −2.0986  0.5995  −1.4002  2.1572  −0.1149  −1.8904].

From these coefficients, we form the 7 × 7 Toeplitz matrix R_7 and compute its smallest eigenvalue
and the corresponding eigenvector

    v = [0.6065  0.0083  −0.3619  0.0461  −0.3619  0.0083  0.6065]^T.

With the elements of v regarded as the coefficients of the polynomial V(z), we now compute the
zeros of V(z) and obtain

    0.0068 ± 0.9999j,  0.9414 ± 0.3373j,  −0.9550 ± 0.2965j.

From the angles of these zeros, we identify the six line frequencies ±ω_1, ±ω_2, and ±ω_3. The three
sinusoidal frequencies ω_1, ω_2, and ω_3 are estimated to be

    0.1029π, 0.4985π, and 0.9034π,
FIGURE 7.6: Example 7.4. Estimation of sinusoids in noise. The plots show the signal and the noise components used in the example.
which are quite close to the correct values given in Eq. (7.36), namely, 0.1π, 0.5π, and 0.9π. With
ω_k thus estimated and R(k) already estimated, we can use (7.35) to estimate c_k and finally obtain
the amplitudes of the sinusoids (positive square roots of c_k in this example). The result is

    0.3058, 1.6258, and 2.1120,
which should be compared with 0.3, 1.6, and 2.1 in Eq. (7.36). Owing to the randomness of noise,
these estimates vary from experiment to experiment. The preceding numbers represent averages
over several repetitions of this experiment, so that the reader sees an averaged estimate.
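The steps of this example are easy to reproduce. The following is a minimal Python/NumPy sketch (the book generated its figures with Matlab); the biased estimator below stands in for the autocorrelation method of Section 2.5, and the exact numbers will differ from run to run because of the random noise. The amplitudes are recovered here from the total power P_i = c_i + c_{-i} at ±ω_i, using P_i = 0.5 A_i^2.

```python
import numpy as np
from scipy.linalg import toeplitz

rng = np.random.default_rng(0)
n = np.arange(100)
x = 0.3*np.sin(0.1*np.pi*n) + 1.6*np.sin(0.5*np.pi*n) + 2.1*np.sin(0.9*np.pi*n)
y = x + np.sqrt(0.1) * rng.standard_normal(len(n))      # noisy samples, sigma_e^2 = 0.1

# Biased (autocorrelation-method) estimate of r(k), 0 <= k <= 6.
r = np.array([np.dot(y[k:], y[:len(y) - k]) for k in range(7)]) / len(y)
R7 = toeplitz(r)                                        # 7 x 7 Toeplitz matrix

eigvals, eigvecs = np.linalg.eigh(R7)
v = eigvecs[:, 0]                                       # minimum-eigenvalue eigenvector
zeros = np.roots(v)                                     # zeros of V(z), near the unit circle
omegas = np.sort(np.angle(zeros))                       # the six line frequencies +/- w_i
pos = omegas[omegas > 0]                                # the three positive frequencies

# Solve Eq. (7.35) for the six line powers, then convert to sinusoid amplitudes.
k = np.arange(1, 7).reshape(-1, 1)
V6 = np.exp(1j * k * omegas)
c = np.real(np.linalg.solve(V6, r[1:7].astype(complex)))
P = c[omegas > 0] + c[omegas < 0][::-1]                 # total power at +/- w_i
amplitudes = np.sqrt(2 * P)                             # since P_i = 0.5 * A_i^2
print(pos / np.pi, amplitudes)
```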
Large number of measurements. In the preceding example, we had 100 noisy measurements
available. If this is increased to a large number such as 5000, then the accuracy of the estimate
improves. The result (again averaged over several experiments) is
0.1001π, 0.5000π, and 0.9001π
for the line frequencies and
0.2977, 1.5948, and 2.1019
for the line amplitudes. If the noise is large, the estimates will obviously be inaccurate, but this
effect can be combated if there is a large record of available measurements. Thus, consider the above
example with noise variance increased to unity. Fig. 7.7 shows the signal and noise components
for the first few samples---the noise is very significant indeed. With 10,000 measured samples y(n)
used in the estimation process (an unrealistically large number), we obtain the estimate
0.1005π, 0.4999π, 0.9000π,
for the line frequencies and
0.3301, 1.6140, and 2.1037
for the line amplitudes. Considering how large the noise is, this estimate is quite good indeed.
Figure 7.8 shows even larger noise, almost comparable with the signal (noise variance σ_e^2 = 2).
With 10,000 measured samples y(n) used in the estimation process, we obtain the estimate

    0.1030π, 0.5039π, and 0.9010π

for the line frequencies and

    0.2808, 1.6162, and 2.0725

for the line amplitudes.

In this section, we have demonstrated that sinusoids buried in additive noise can be identified
effectively from the zeros of an eigenvector of the estimated autocorrelation. The method outlined
above reveals the fundamental principles as advanced originally by Pisarenko (1973). Since then,
many improved methods have been proposed, which can obtain more accurate estimates from a small
FIGURE 7.7: Example 7.4. Estimation of sinusoids in large noise. The plots show the signal and the noise components used in the example.
number of noisy measurements. Notable among these are minimum-norm methods (Kumaresan and
Tufts, 1983), MUSIC (Schmidt, 1979, 1986), and ESPRIT (Paulraj et al., 1986; Roy and Kailath,
1989). These methods can also be applied to the estimation of direction of arrival information in
FIGURE 7.8: Example 7.4. Estimation of sinusoids in very large noise. The plots show the signal and the noise components used in the example.
array processing applications (Van Trees, 2002). In fact, many of these ideas originated in the array
processing literature (Schmidt, 1979). A review of these methods can be found in the very readable
account given by Therrien (1992).
7.8 LINE SPECTRUM PAIRS
In Section 5.6, we discussed the application of linear prediction theory in signal compression. We
explained the usefulness of the lattice coefficients {k_m} in the process. Although the quantization
and transmission of lattice coefficients {k_m} guarantees that the reconstruction filter 1/A_N(z)
remains stable, the coefficients k_m are difficult to interpret from a perceptual viewpoint. If we knew
the relative importance of various coefficients k_m in preserving perceptual quality, we could assign
them bits according to that. But this has not been found to be possible.

In speech coding practice, one uses an important variation of the prediction coefficients called
line-spectrum pairs (LSP). The motivation comes from the fact that the connection between LSP
and perceptual properties is better understood (Itakura, 1975; Soong and Juang, 1984, 1993; Kang
and Fransen, 1995). The set of LSP coefficients not only guarantees stability of the reconstruction
filter under quantization, but, in addition, a better perceptual interpretation in the frequency domain
is obtained. To define the LSP coefficients, we construct two new polynomials
    P(z) = A_N(z) + z^{-(N+1)} \tilde{A}_N(z),
    Q(z) = A_N(z) - z^{-(N+1)} \tilde{A}_N(z).    (7.37)

Notice that these can be regarded as the polynomial A_{N+1}(z) in Levinson's recursion for the cases
k_{N+1} = 1 and k_{N+1} = −1, respectively. Equivalently, they are the transfer functions of the FIR
lattice shown in Fig. 7.9.
Let us examine the zeros of these polynomials. The zeros of P(z) and Q(z) are solutions of
the equations

    z^{-(N+1)} \frac{\tilde{A}_N(z)}{A_N(z)} = -1 = e^{j(2m+1)\pi}   [for P(z)]

    z^{-(N+1)} \frac{\tilde{A}_N(z)}{A_N(z)} = 1 = e^{j2m\pi}   [for Q(z)].
FIGURE 7.9: Interpretation of the LSP polynomials P(z) and Q(z) in terms of the FIR LPC lattice.
Because the left-hand side is a stable all-pass function of order N + 1 (the zeros of A_N(z) being in
|z| < 1), it satisfies the modulus property

    \left| z^{-(N+1)} \frac{\tilde{A}_N(z)}{A_N(z)} \right|
    \begin{cases} < 1 & \text{for } |z| > 1 \\ > 1 & \text{for } |z| < 1 \\ = 1 & \text{for } |z| = 1. \end{cases}

See Vaidyanathan (1993, p. 75) for a proof. Thus, the solutions of the preceding equations are
necessarily on the unit circle. That is, all the N + 1 zeros of P(z) and similarly those of Q(z) are
on the unit circle.
Alternation Property. In fact, we can say something deeper. It is well known (Vaidyanathan,
1993, p. 76) that the phase response φ(ω) of the stable all-pass filter

    z^{-(N+1)} \frac{\tilde{A}_N(z)}{A_N(z)}

is a monotone decreasing function, spanning a total range of 2(N + 1)π (Fig. 7.10(a)). Thus, the
N + 1 zeros of P(z) and Q(z) alternate with each other as demonstrated in Fig. 7.10(b) for N = 4
and N = 5. The angles of these zeros are denoted as ω_i and θ_i.

FIGURE 7.10: Development of LSPs. (a) Monotone phase response of z^{-(N+1)} \tilde{A}_N(z)/A_N(z) and (b) zeros of P(z) and Q(z) for the cases N = 4 and N = 5.

FIGURE 7.11: Examples of LSP coefficients for N = 4 and N = 5. The unit-circle coefficients in the lower half are redundant and therefore dropped.
Because A_N(z) has real coefficients in speech applications, the zeros come in complex
conjugate pairs. Note that P(−1) = 0 for even N and Q(−1) = 0 for odd N. Moreover,
Q(1) = 0 and P(1) ≠ 0 regardless of N. These follow from the fact that A_N(±1) = \tilde{A}_N(±1).
Thus, if we know the zeros of P(z) and Q(z) in the upper half of the unit circle (and excluding
z = ±1), we can fully determine these polynomials (their constant coefficients being unity by the
definition (7.37)). There are N such zeros (ω_k and θ_k), as demonstrated in Fig. 7.11 for N = 4
and N = 5. These are called line spectrum pairs (LSP) associated with the predictor polynomial A_N(z).
For example, the LSP parameters for N = 4 are the four ordered frequencies

    ω_1 < θ_1 < ω_2 < θ_2    (7.38)

and the LSP parameters for N = 5 are the five ordered frequencies

    ω_1 < θ_1 < ω_2 < θ_2 < ω_3.    (7.39)
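The LSP frequencies are straightforward to compute numerically from the coefficients of A_N(z). The following Python/NumPy sketch (function name invented; real coefficients assumed, so that the tilde operation amounts to coefficient reversal) forms P(z) and Q(z) as in (7.37) and keeps the zero angles in the open upper half of the unit circle:

```python
import numpy as np

def lsp_frequencies(a):
    """LSP frequencies of A_N(z) = 1 + a[1] z^{-1} + ... + a[N] z^{-N},
    with real coefficients and a[0] = 1; follows the definition (7.37)."""
    a = np.asarray(a, dtype=float)
    a_rev = a[::-1]                                   # coefficients of z^{-(N+1)} A_N(1/z)
    p = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a_rev])   # P(z)
    q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a_rev])   # Q(z)
    wp = np.angle(np.roots(p))                        # zeros of P(z), on the unit circle
    wq = np.angle(np.roots(q))                        # zeros of Q(z), on the unit circle
    eps = 1e-8                                        # drop the trivial zeros at z = +/- 1
    wp = np.sort(wp[(wp > eps) & (wp < np.pi - eps)])
    wq = np.sort(wq[(wq > eps) & (wq < np.pi - eps)])
    return wp, wq                                     # the two sets interlace on (0, pi)
```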
Properties and Advantages of LSPs. Given the N coefficients of the polynomial A_N(z),
the N LSP parameters can be uniquely identified by finding the zeros of P(z) and Q(z). These
parameters are quantized, making sure that the ordering (e.g., Eq. (7.39)) is preserved in the
process. The quantized coefficients are then transmitted. Because P(z) and Q(z) can be computed
uniquely from the LSP parameters, approximations of these polynomials can be computed at the
receiver. From these, we can identify an approximation of A_N(z) because

    A_N(z) = (P(z) + Q(z))/2.

The speech segment can then be reconstructed from the stable filter 1/A_N(z) as described in
Section 5.6. Several features of the LSP coding scheme are worth noting.
1. Stability preserved. As long as the ordering (e.g., Eq. (7.39)) is preserved in the quantization,
   the reconstructed version of A_N(z) is guaranteed to have all zeros in |z| < 1. Thus, stability
   of 1/A_N(z) is preserved despite quantization, as in the case of lattice coefficient quantization.
2. Perceptual spectral interpretation. Unlike the lattice coefficients, the LSP coefficients are
   perceptually better understood. To explain this, recall first that for sufficiently large N, the
   AR(N) model gives a good approximation of the speech power spectrum S_{xx}(e^{j\omega}). This
   approximation is especially good near the peak frequencies, called the formants of speech.
   Now, the peak locations correspond approximately to the pole angles of the filter 1/A_N(z).
   Near these locations, the phase response tends to change rapidly. The same is true of the
   phase response φ(ω) of the all-pass filter, as shown in Fig. 7.12(b). The LSP coefficients,
   which are intersections of the horizontal lines (multiples of π) with the plot of φ(ω),
   therefore tend to get crowded near the formant frequencies. Thus, the crucial features of
   the power spectrum tend to get coded into the density information of the LSP coefficients
   at various points on the unit circle. This information allows us to perform bit allocation
   among the LSP coefficients (quantize the crowded LSP coefficients with greater accuracy).
   For a given bit rate, this results in perceptually better speech quality, as compared with
   the quantization of lattice coefficients. Equivalently, for a given perceptual quality, we can
   reduce the bit rate; in early work, an approximately 25% saving has been reported using
   this idea, and more savings have been obtained in a series of other papers. Furthermore, as
   the bit rate is reduced, the speech degradation is found to be more gradual compared with
   lattice coefficient quantization.
3. Acoustical tube models. It has been shown in the speech literature that the lattice structure is
   related to an acoustical tube model of the vocal tract (Markel and Gray, 1976). This is
   the origin of the term reflection coefficients. The values k_{N+1} = ±1 in Fig. 7.9 indicate, for
   example, situations where one end of the tube is open or closed. Thus, the LSP frequencies
   are related to the open- and close-ended acoustic tube models.
4. Connection to circuit theory. In the theory of passive electrical circuits, the concept of reactances
   is well known (Balabanian and Bickart, 1969). Reactances are input impedances of electrical
   LC networks. It turns out that reactances have a pole-zero alternation property similar to the
   alternation of the zeros of the LSP polynomials P(z) and Q(z). In fact, the ratio P(z)/Q(z)
   is nothing but a discrete-time version of the reactance function.

FIGURE 7.12: Explanation of how the LSP coefficients tend to get crowded near the formant regions of the speech spectrum. (a) A toy power spectrum with one formant and (b) the phase response of the all-pass filter z^{-(N+1)} \tilde{A}_N(z)/A_N(z), where A_N(z) is the prediction polynomial (see text).
7.9 CONCLUDING REMARKS
In this chapter, we presented a rather detailed study of line spectral processes. The theory finds
applications in the identification of sinusoids buried in noise. A variation of this problem is
immediately applicable in array signal processing, where one seeks to identify the direction of arrival
of a signal using an antenna array (Van Trees, 2002). Further applications of the concepts in signal
compression, using LSPs, were briefly reviewed.
• • • •
C H A P T E R 8
Linear Prediction Theory for Vector Processes

8.1 INTRODUCTION
In this chapter, we consider the linear prediction problem for vector WSS processes. Let x(n)
denote the L × 1 vector sequence representing a zero-mean WSS process. Thus, E[x(n)] = 0 and
the autocorrelation is

    R(k) = E[x(n) x^†(n − k)],    (8.1)

where x^†(n) represents transpose conjugation, as usual. Note that R(k) is independent of n (owing
to the WSS property) and depends only on k. The autocorrelation is a matrix sequence, with each
element R(k) representing an L × L matrix. Furthermore, it is readily shown from the definition
(8.1) that

    R^†(−k) = R(k).    (8.2)

For a vector WSS process characterized by autocorrelation R(k), we now formulate the linear
prediction problem. Most of the developments will be along lines similar to the scalar case. But
there are some differences between the scalar and vector cases that need to be emphasized. A short
and sweet introduction to this topic was given in Section 9.6 of Anderson and Moore (1979). We
shall go into considerably greater detail in this chapter.
8.2 FORMULATION OF THE VECTOR LPC PROBLEM
The forward predictor of order N predicts the sample x(n) from a linear combination of the past N
samples:

    \hat{x}(n) = -a^†_{N,1} x(n-1) - a^†_{N,2} x(n-2) - \cdots - a^†_{N,N} x(n-N).    (8.3)

The forward prediction error is therefore

    e^f_N(n) = x(n) - \hat{x}(n) = x(n) + a^†_{N,1} x(n-1) + a^†_{N,2} x(n-2) + \cdots + a^†_{N,N} x(n-N)    (8.4)

and the forward prediction polynomial is

    A_N(z) = I_L + a^†_{N,1} z^{-1} + a^†_{N,2} z^{-2} + \cdots + a^†_{N,N} z^{-N}.    (8.5)

The optimal predictor has the L × L matrix coefficients a_{N,k} chosen such that the total mean square
error in all components is minimized, that is,

    \mathcal{E}^f_N \triangleq E[(e^f_N(n))^† e^f_N(n)]    (8.6)

is minimized. This quantity is also equal to the trace (i.e., sum of diagonal elements) of the forward
prediction error covariance matrix

    E^f_N = E[e^f_N(n) (e^f_N(n))^†].    (8.7)

Note that this is an L × L matrix. The goal therefore is to minimize

    \mathcal{E}^f_N \triangleq Tr(E^f_N)    (8.8)

by choosing a_{N,k} appropriately. The theory of optimal predictors for the vector case is again based
on the orthogonality principle:
Theorem 8.1. Orthogonality principle (vector processes). The Nth-order forward linear predictor
for a WSS vector process x(n) is optimal if and only if the error e^f_N(n) at any time n is orthogonal
to the N past observations x(n − 1), ..., x(n − N), that is,

    E[e^f_N(n) x^†(n − k)] = 0,   1 ≤ k ≤ N,    (8.9)

where the right-hand side 0 represents the L × L zero matrix (with L denoting the size of the
column vectors x(n)). ♦
Proof. Let \hat{x}^\perp(n) be the predicted value for which the error e^\perp(n) satisfies the orthogonality
condition (8.9), and let \hat{x}(n) be another predicted value with error e(n) (superscript f and subscript
N deleted for simplicity). Now,

    e(n) = x(n) - \hat{x}(n) = x(n) - \hat{x}^\perp(n) + \hat{x}^\perp(n) - \hat{x}(n),

so that

    e(n) = e^\perp(n) + \hat{x}^\perp(n) - \hat{x}(n).    (8.10)

Now, the estimates \hat{x}^\perp(n) and \hat{x}(n) are, by construction, linear combinations of x(n − k), 1 ≤ k ≤
N. Because the error e^\perp(n), by definition, is orthogonal to these past samples, it follows that

    \underbrace{E[e(n) e^†(n)]}_{\text{call this } E}
    = \underbrace{E[e^\perp(n) (e^\perp(n))^†]}_{\text{call this } E^\perp}
    + E[(\hat{x}^\perp(n) - \hat{x}(n))(\hat{x}^\perp(n) - \hat{x}(n))^†].    (8.11)

Because the second term on the right-hand side is a covariance, it is positive semidefinite. The
preceding therefore shows that

    E - E^\perp ≥ 0,    (8.12)

where the notation A ≥ 0 means that the Hermitian matrix A is positive semidefinite. Taking trace
on both sides, we therefore obtain

    \mathcal{E} ≥ \mathcal{E}^\perp.    (8.13)

When does equality arise? Observe first that \mathcal{E} - \mathcal{E}^\perp = Tr(E - E^\perp). Because E - E^\perp has been
shown to be positive semidefinite, its trace is zero only when^1 E - E^\perp = 0. From (8.11), we see
that this is equivalent to the condition

    E[(\hat{x}^\perp(n) - \hat{x}(n))(\hat{x}^\perp(n) - \hat{x}(n))^†] = 0,

which, in turn, is equivalent to \hat{x}^\perp(n) - \hat{x}(n) = 0. This proves that equality is possible if and only
if \hat{x}(n) = \hat{x}^\perp(n).
Optimal Error Covariance. The error covariance E^f_N for the optimal predictor can be
expressed in an elegant way using the orthogonality conditions:

    E^f_N = E[e^f_N(n)(e^f_N(n))^†] = E[e^f_N(n)(x(n) - \hat{x}(n))^†] = E[e^f_N(n) x^†(n)].

The third equality follows by observing that the optimal \hat{x}(n) is a linear combination of samples
x(n − k), which are orthogonal to e^f_N(n). Using Eq. (8.4), this can be rewritten as

    E^f_N = E[(x(n) + a^†_{N,1} x(n-1) + a^†_{N,2} x(n-2) + \cdots + a^†_{N,N} x(n-N)) x^†(n)]
          = R(0) + a^†_{N,1} R^†(1) + a^†_{N,2} R^†(2) + \cdots + a^†_{N,N} R^†(N).

Because the error covariance E^f_N is Hermitian anyway, we can rewrite this as

    E^f_N = R(0) + R(1) a_{N,1} + R(2) a_{N,2} + \cdots + R(N) a_{N,N}.    (8.14)

We shall find this useful when we write the augmented normal equations for optimal linear
prediction.

^1 The trace of a matrix A is the sum of its eigenvalues. When A is positive semidefinite, all eigenvalues are
nonnegative. So, zero trace implies all eigenvalues are zero. This implies that the matrix itself is zero (because any
positive semidefinite matrix can be written as A = UΛU^†, where Λ is the diagonal matrix of eigenvalues and U is
unitary).
8.3 NORMAL EQUATIONS: VECTOR CASE
Substituting the expression (8.4) for the prediction error into the orthogonality condition (8.9), we
get N matrix equations. By using the definition of autocorrelation given in Eq. (8.1), we find these
equations to be

    R(k) + a^†_{N,1} R(k-1) + a^†_{N,2} R(k-2) + \cdots + a^†_{N,N} R(k-N) = 0,    (8.15)

for 1 ≤ k ≤ N. Here the 0 on the right is an L × L matrix of zeros. Solving these equations, we
can obtain the N optimal predictor coefficient matrices a_{N,k}. For convenience, we shall rewrite the
equations in the form

    R^†(k-1) a_{N,1} + R^†(k-2) a_{N,2} + \cdots + R^†(k-N) a_{N,N} = -R^†(k)    (8.16)

for 1 ≤ k ≤ N. By using the fact that R^†(−k) = R(k), these equations can be written in the
following elegant form:

    \begin{bmatrix}
    R(0) & R(1) & \cdots & R(N-1) \\
    R(-1) & R(0) & \cdots & R(N-2) \\
    \vdots & \vdots & \ddots & \vdots \\
    R(-N+1) & R(-N+2) & \cdots & R(0)
    \end{bmatrix}
    \begin{bmatrix} a_{N,1} \\ a_{N,2} \\ \vdots \\ a_{N,N} \end{bmatrix}
    = -
    \begin{bmatrix} R(-1) \\ R(-2) \\ \vdots \\ R(-N) \end{bmatrix}.    (8.17)

These represent the normal equations for the vector LPC problem. Note that these equations reduce
to the normal equations for scalar processes given in Eq. (2.8) (remembering R^†(−k) = R(k)). If
we move the vector on the right to the left-hand side and then prefix Eq. (8.14) at the top, we get
the following augmented normal equations for the optimal vector LPC problem:

    \begin{bmatrix}
    R(0) & R(1) & \cdots & R(N) \\
    R(-1) & R(0) & \cdots & R(N-1) \\
    \vdots & \vdots & \ddots & \vdots \\
    R(-N) & R(-N+1) & \cdots & R(0)
    \end{bmatrix}
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \end{bmatrix}
    =
    \begin{bmatrix} E^f_N \\ 0 \\ \vdots \\ 0 \end{bmatrix}.    (8.18)
We will soon return to this equation and solve it. But first, we have to introduce the idea of
backward prediction.
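For reference, the normal equations (8.17) can of course be solved directly, without the recursion developed below, by assembling the block-Toeplitz coefficient matrix. A minimal Python/NumPy sketch (names invented; R is assumed to be a callable returning the L × L matrix R(k)):

```python
import numpy as np

def vector_lpc_direct(R, N):
    """Solve the normal equations (8.17) directly.

    R : callable with R(k) returning the L x L autocorrelation matrix,
        needed for -N <= k <= N (so that R(-k) = R(k) conjugate-transposed).
    N : predictor order.
    Returns ([a_{N,1}, ..., a_{N,N}], E^f_N), the latter from Eq. (8.14).
    """
    L = R(0).shape[0]
    # Block (i, j) of the coefficient matrix is R(j - i), 0 <= i, j <= N-1.
    A = np.block([[R(j - i) for j in range(N)] for i in range(N)])
    rhs = -np.vstack([R(-k) for k in range(1, N + 1)])
    a = np.linalg.solve(A, rhs)                      # stacked [a_{N,1}; ...; a_{N,N}]
    a_blocks = [a[k * L:(k + 1) * L, :] for k in range(N)]
    Ef = R(0) + sum(R(k + 1) @ a_blocks[k] for k in range(N))   # Eq. (8.14)
    return a_blocks, Ef
```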
8.4 BACKWARD PREDICTION
The formulation of the backward predictor problem for the vector case allows us to derive a
recursion similar to Levinson's recursion in the scalar case (Chapter 3). This also results in elegant
lattice structures. The Nth-order backward linear predictor predicts the L × 1 vector x(n − N − 1)
as follows:

    \hat{x}(n-N-1) = -b^†_{N,1} x(n-1) - b^†_{N,2} x(n-2) - \cdots - b^†_{N,N} x(n-N),

where b_{N,k} are L × L matrices. The backward prediction error is

    e^b_N(n) = x(n-N-1) - \hat{x}(n-N-1),

so that

    e^b_N(n) = x(n-N-1) + b^†_{N,1} x(n-1) + b^†_{N,2} x(n-2) + \cdots + b^†_{N,N} x(n-N).

So the backward prediction polynomial takes the form

    B_N(z) = b^†_{N,1} z^{-1} + b^†_{N,2} z^{-2} + \cdots + b^†_{N,N} z^{-N} + z^{-(N+1)} I.    (8.19)

The optimal coefficients b_{N,k} are again found by imposing the orthogonality condition (8.9),
namely,

    E[e^b_N(n) x^†(n − k)] = 0,   1 ≤ k ≤ N.

This results in the N equations

    b^†_{N,1} R(k-1) + b^†_{N,2} R(k-2) + \cdots + b^†_{N,N} R(k-N) + R(k-N-1) = 0    (8.20)

for 1 ≤ k ≤ N. The error covariance for the optimal backward predictor is

    E^b_N = E[e^b_N(n)(e^b_N(n))^†] = E[e^b_N(n)(x(n-N-1) - \hat{x}(n-N-1))^†]
          = E[e^b_N(n) x^†(n-N-1)].

The third equality follows by observing that the optimal \hat{x}(n − N − 1) is a linear combination of
samples x(n − k) that are orthogonal to e^b_N(n). So E^b_N can be rewritten as

    E^b_N = E[(x(n-N-1) + b^†_{N,1} x(n-1) + \cdots + b^†_{N,N} x(n-N)) x^†(n-N-1)].

Using Eq. (8.1) and the facts that E^b_N is Hermitian and R^†(−k) = R(k), this can further be
rewritten as

    E^b_N = R(-N) b_{N,1} + \cdots + R(-1) b_{N,N} + R(0).    (8.21)
Combining this with the N equations in Eq. (8.20) and rearranging slightly, we obtain the following
augmented normal equations for the backward vector predictor:

    \begin{bmatrix}
    R(0) & R(1) & \cdots & R(N) \\
    R(-1) & R(0) & \cdots & R(N-1) \\
    \vdots & \vdots & \ddots & \vdots \\
    R(-N) & R(-N+1) & \cdots & R(0)
    \end{bmatrix}
    \begin{bmatrix} b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    =
    \begin{bmatrix} 0 \\ 0 \\ \vdots \\ E^b_N \end{bmatrix}.    (8.22)
8.5 LEVINSON'S RECURSION: VECTOR CASE
We now combine the normal equations for the forward and backward predictors to obtain
Levinson's recursion for solving the optimal predictors recursively. For this, we start from Eq.
(8.18) and add an extra column and row^2 to obtain

    ^2 Actually L extra columns and rows, because each entry is itself an L × L matrix.

    \underbrace{\begin{bmatrix}
    R(0) & R(1) & \cdots & R(N) & R(N+1) \\
    R(-1) & R(0) & \cdots & R(N-1) & R(N) \\
    \vdots & \vdots & & \vdots & \vdots \\
    R(-N) & R(-N+1) & \cdots & R(0) & R(1) \\
    R(-N-1) & R(-N) & \cdots & R(-1) & R(0)
    \end{bmatrix}}_{\text{call this } R_{N+2}}
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    =
    \begin{bmatrix} E^f_N \\ 0 \\ \vdots \\ 0 \\ F^f_N \end{bmatrix},    (8.23)

where F^f_N is an L × L matrix, possibly nonzero, which comes from the extra row added at the
bottom. Similarly, starting from Eq. (8.22), we add an extra row and column to obtain

    \underbrace{\begin{bmatrix}
    R(0) & R(1) & \cdots & R(N) & R(N+1) \\
    R(-1) & R(0) & \cdots & R(N-1) & R(N) \\
    \vdots & \vdots & & \vdots & \vdots \\
    R(-N) & R(-N+1) & \cdots & R(0) & R(1) \\
    R(-N-1) & R(-N) & \cdots & R(-1) & R(0)
    \end{bmatrix}}_{\text{this is } R_{N+2} \text{ too!}}
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    =
    \begin{bmatrix} F^b_N \\ 0 \\ \vdots \\ 0 \\ E^b_N \end{bmatrix},    (8.24)

where F^b_N comes from the extra row added at the top. We now postmultiply Eq. (8.24) by an L × L
matrix (K^f_{N+1})^† and add it to Eq. (8.23) with the hope that the last entry F^f_N on the right-hand side is
replaced with zero. This will then lead to the augmented normal equation for the (N + 1)th-order
forward predictor. We have
    R_{N+2} \left(
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    (K^f_{N+1})^† \right)
    =
    \begin{bmatrix} E^f_N \\ 0 \\ \vdots \\ 0 \\ F^f_N \end{bmatrix}
    +
    \begin{bmatrix} F^b_N \\ 0 \\ \vdots \\ 0 \\ E^b_N \end{bmatrix}
    (K^f_{N+1})^†.    (8.25)

Clearly, the choice of K^f_{N+1} which achieves the aforementioned purpose should be such that

    F^f_N + E^b_N (K^f_{N+1})^† = 0.

At this point, we make the assumption that the L × L error covariances E^b_N and, for future purposes,
E^f_N are nonsingular. The solution K^f_{N+1} is therefore given by

    K^f_{N+1} = -[F^f_N]^† [E^b_N]^{-1}.    (8.26)
We can similarly postmultiply Eq. (8.23) by an L × L matrix K^b_{N+1} and add it to Eq. (8.24):

    R_{N+2} \left(
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    K^b_{N+1}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    \right)
    =
    \begin{bmatrix} E^f_N \\ 0 \\ \vdots \\ 0 \\ F^f_N \end{bmatrix}
    K^b_{N+1}
    +
    \begin{bmatrix} F^b_N \\ 0 \\ \vdots \\ 0 \\ E^b_N \end{bmatrix}.    (8.27)

To create the augmented normal equation for the (N + 1)th-order backward predictor, we have to
reduce this to the form (8.22). For this, K^b_{N+1} should be chosen such that

    [E^f_N] K^b_{N+1} + F^b_N = 0,

that is,

    K^b_{N+1} = -[E^f_N]^{-1} F^b_N.    (8.28)
The matrices K^f_{N+1} and K^b_{N+1} are analogous to the parcor coefficients k_m used in scalar linear
prediction. With K^f_{N+1} and K^b_{N+1} chosen as above, Eqs. (8.25) and (8.27) reduce to the form

    R_{N+2} \left(
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    (K^f_{N+1})^† \right)
    =
    \begin{bmatrix} E^f_{N+1} \\ 0 \\ \vdots \\ 0 \\ 0 \end{bmatrix}    (8.29)

and

    R_{N+2} \left(
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    K^b_{N+1}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    \right)
    =
    \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 0 \\ E^b_{N+1} \end{bmatrix},    (8.30)
respectively. These are the augmented normal equations for the (N + 1)th-order optimal forward
and backward vector predictors, respectively. Thus, the (N + 1)th-order forward predictor
coefficients are obtained from

    \begin{bmatrix} I_L \\ a_{N+1,1} \\ \vdots \\ a_{N+1,N} \\ a_{N+1,N+1} \end{bmatrix}
    =
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}
    (K^f_{N+1})^†.
Similarly, the (N + 1)th-order backward predictor coefficients are obtained from

    \begin{bmatrix} b_{N+1,1} \\ b_{N+1,2} \\ \vdots \\ b_{N+1,N+1} \\ I_L \end{bmatrix}
    =
    \begin{bmatrix} I_L \\ a_{N,1} \\ \vdots \\ a_{N,N} \\ 0 \end{bmatrix}
    K^b_{N+1}
    +
    \begin{bmatrix} 0 \\ b_{N,1} \\ \vdots \\ b_{N,N} \\ I_L \end{bmatrix}.
The forward and backward prediction polynomials defined in Eqs. (8.5) and (8.19) are therefore
given by

    A_{N+1}(z) = A_N(z) + K^f_{N+1} B_N(z)    (8.31)

and

    B_{N+1}(z) = z^{-1} [(K^b_{N+1})^† A_N(z) + B_N(z)].    (8.32)

The extra z^{-1} arises because the 0th coefficient in the backward predictor, by definition, is zero
(see Eq. (8.19)). By comparing Eqs. (8.25) and (8.29), we see that the optimal prediction error
covariance matrix is updated as follows:

    E^f_{N+1} = E^f_N + F^b_N (K^f_{N+1})^†.    (8.33)

Similarly, by comparing Eqs. (8.27) and (8.30), we obtain

    E^b_{N+1} = E^b_N + F^f_N K^b_{N+1}.    (8.34)
By substituting from Eqs. (8.26) and (8.28) and using the fact that the error covariances are
Hermitian anyway, we obtain the error covariance update equations

    E^f_{N+1} = [I_L - K^f_{N+1} (K^b_{N+1})^†] E^f_N    (8.35)

and

    E^b_{N+1} = [I_L - (K^b_{N+1})^† K^f_{N+1}] E^b_N.    (8.36)
Summary of Levinson's Recursion. We now summarize the key equations of Levinson's
recursion. The subscript m is used instead of N for future convenience. Let R(k) be the L × L
autocorrelation matrix sequence of a WSS process x(n). Then, the optimal linear predictor
polynomials A_N(z) and B_N(z) and the error covariances E^f_N and E^b_N can be calculated recursively
as follows: first, initialize the recursion according to

    A_0(z) = I,  B_0(z) = z^{-1} I,  and  E^f_0 = E^b_0 = R(0).    (8.37(a))

Given A_m(z), B_m(z), E^f_m, and E^b_m (all L × L matrices), proceed as follows:

1. Calculate the L × L matrices

       F^f_m = R(-m-1) + R(-m) a_{m,1} + \cdots + R(-1) a_{m,m}    (8.37(b))
       F^b_m = R(1) b_{m,1} + \cdots + R(m) b_{m,m} + R(m+1).    (8.37(c))

2. Calculate the parcor coefficients (L × L matrices)

       K^f_{m+1} = -[F^f_m]^† [E^b_m]^{-1}    (8.37(d))
       K^b_{m+1} = -[E^f_m]^{-1} F^b_m.    (8.37(e))
3. Update the prediction polynomials according to

       A_{m+1}(z) = A_m(z) + K^f_{m+1} B_m(z)    (8.37(f))
       B_{m+1}(z) = z^{-1} [(K^b_{m+1})^† A_m(z) + B_m(z)].    (8.37(g))

4. Update the error covariances according to

       E^f_{m+1} = [I_L - K^f_{m+1} (K^b_{m+1})^†] E^f_m    (8.37(h))
       E^b_{m+1} = [I_L - (K^b_{m+1})^† K^f_{m+1}] E^b_m.    (8.37(i))

This recursion can be repeated to obtain the optimal predictors for any order N.
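The recursion (8.37(a))-(8.37(i)) translates almost line by line into code. The following is a minimal Python/NumPy sketch under the stated nonsingularity assumptions; the function name is invented, and R is assumed to be a callable returning the L × L matrix R(k) with R(−k) = R(k) conjugate-transposed. The coefficient-domain updates in step 3 are the stacked-vector updates written out before Eq. (8.31).

```python
import numpy as np

def vector_levinson(R, N):
    """Levinson's recursion for vector linear prediction, Eqs. (8.37(a))-(8.37(i)).

    R : callable returning the L x L autocorrelation matrix R(k).
    N : desired predictor order.
    Returns (a, b, Ef, Eb) with a = [a_{N,1}, ..., a_{N,N}],
    b = [b_{N,1}, ..., b_{N,N}], and the order-N error covariances.
    """
    L = R(0).shape[0]
    a, b = [], []                        # order 0 has no coefficients, Eq. (8.37(a))
    Ef = Eb = R(0)
    for m in range(N):
        # Step 1: residual correlation matrices, Eqs. (8.37(b)) and (8.37(c)).
        Ff = R(-m - 1) + sum((R(i - m) @ a[i] for i in range(m)), np.zeros((L, L)))
        Fb = sum((R(i + 1) @ b[i] for i in range(m)), np.zeros((L, L))) + R(m + 1)
        # Step 2: parcor coefficients, Eqs. (8.37(d)) and (8.37(e)); Ef, Eb are Hermitian.
        Kf = -np.linalg.solve(Eb, Ff).conj().T       # = -(Ff^H)(Eb^{-1})
        Kb = -np.linalg.solve(Ef, Fb)                # = -(Ef^{-1}) Fb
        # Step 3: coefficient updates implied by Eqs. (8.37(f)) and (8.37(g)).
        Kf_dag = Kf.conj().T
        a_next = [a[i] + b[i] @ Kf_dag for i in range(m)] + [Kf_dag]
        b_next = [Kb] + [a[i] @ Kb + b[i] for i in range(m)]
        # Step 4: error covariance updates, Eqs. (8.37(h)) and (8.37(i)).
        Ef = (np.eye(L) - Kf @ Kb.conj().T) @ Ef
        Eb = (np.eye(L) - Kb.conj().T @ Kf) @ Eb
        a, b = a_next, b_next
    return a, b, Ef, Eb
```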
8.6 PROPERTIES DERIVED FROM LEVINSON'S RECURSION
We now derive some properties of the matrices that are involved in Levinson's recursion. This gives
additional insight and places the mathematical equations in proper context.

8.6.1 Properties of Matrices F^f_m and F^b_m
Eq. (8.37(b)) can be rewritten as

    (F^f_m)^† = R(m+1) + a^†_{m,1} R(m) + \cdots + a^†_{m,m} R(1)
              = E[(x(n) + a^†_{m,1} x(n-1) + \cdots + a^†_{m,m} x(n-m)) x^†(n-m-1)]
              = E[e^f_m(n) x^†(n-m-1)].

We know the error e^f_m(n) is orthogonal to the samples x(n − 1), ..., x(n − m), but not necessarily
to x(n − m − 1). This ''residual correlation'' between e^f_m(n) and x(n − m − 1) is precisely what is
captured by the matrix F^f_m. The coefficient K^f_{m+1} in Eq. (8.37(d)) is therefore called the partial
correlation or parcor coefficient, as in the scalar case. From Eq. (8.37(d)), we see that if this ''residual
correlation'' is zero, then K^f_{m+1} = 0, and as a consequence, there is no progress in the recursion,
that is,

    A_{m+1}(z) = A_m(z)  and  E^f_{m+1} = E^f_m.

How about F^b_m? Does it have a similar significance? Observe from (8.37(c)) that

    (F^b_m)^† = b^†_{m,1} R(-1) + \cdots + b^†_{m,m} R(-m) + R(-m-1)
              = E[(b^†_{m,1} x(n-1) + \cdots + b^†_{m,m} x(n-m) + x(n-m-1)) x^†(n)]
              = E[e^b_m(n) x^†(n)].
The backward error e^b_m(n) is the error in the estimation of x(n − m − 1) based on the samples
x(n − m), ..., x(n − 1). Thus, e^b_m(n) is orthogonal to x(n − m), ..., x(n − 1) but not necessarily
to the sample x(n). This residual correlation is captured by F^b_m. If this is zero, then K^b_{m+1} is zero, as
seen from Eq. (8.37(e)). That is, there is no progress in prediction as we go from stage m to m + 1.

Relation Between F^f_m and F^b_m. We have shown that

    (F^f_m)^† = E[e^f_m(n) x^†(n-m-1)]    (8.38)

and

    (F^b_m)^† = E[e^b_m(n) x^†(n)].    (8.39)

For the scalar case, the parameter α_m played the role of the matrices F^f_m and F^b_m. But for the
vector case, we have the two matrices F^f_m and F^b_m, which appear to be unrelated. This gives the
first impression that there is a lack of symmetry between the forward and backward predictors.
However, there is a simple relation between F^f_m and F^b_m credited to Burg (Kailath, 1974); namely,

    F^b_m = (F^f_m)^†.    (8.40)

This is readily proved as follows:
Proof of Eq. (8.40). We can rewrite Eq. (8.38) as

    (F^f_m)^† = E[e^f_m(n) (x(n-m-1) + b^†_{m,1} x(n-1) + \cdots + b^†_{m,m} x(n-m))^†]
              = E[e^f_m(n)(e^b_m(n))^†],

because E[e^f_m(n) x^†(n − k)] = 0 for 1 ≤ k ≤ m anyway. Similarly, from Eq. (8.39), we get

    (F^b_m)^† = E[e^b_m(n) (x(n) + a^†_{m,1} x(n-1) + \cdots + a^†_{m,m} x(n-m))^†]
              = E[e^b_m(n)(e^f_m(n))^†].

So we have shown that

    F^f_m = E[e^b_m(n)(e^f_m(n))^†]  and  F^b_m = E[e^f_m(n)(e^b_m(n))^†],    (8.41)

which proves (8.40).
Relation Between Error Covariance Matrices. From the definitions of K^f_{m+1} and K^b_{m+1}
given in Eqs. (8.37(d)) and (8.37(e)), we have

    [F^f_m]^† = -K^f_{m+1} E^b_m  and  F^b_m = -E^f_m K^b_{m+1}.

In view of the symmetry relation (8.40), it then follows that

    K^f_{m+1} E^b_m = E^f_m K^b_{m+1}.    (8.42)
8.6.2 Monotone Properties of the Error Covariance Matrices
We now show that the error covariances of the optimal linear predictor satisfy the following:

    E^f_m ≥ E^f_{m+1}  and  E^b_m ≥ E^b_{m+1},    (8.43)

that is, the differences E^f_m − E^f_{m+1} and E^b_m − E^b_{m+1} are positive semidefinite. Because the total
mean square error \mathcal{E}^f_m is the trace of E^f_m, Eq. (8.43) means in particular that

    \mathcal{E}^f_m ≥ \mathcal{E}^f_{m+1}  and  \mathcal{E}^b_m ≥ \mathcal{E}^b_{m+1}.    (8.44)
Proof of (8.43). Substituting (8.37(d)) into Eq. (8.33), we see that E^f_{m+1} = E^f_m −
F^b_m (E^b_m)^{-1} F^f_m. In view of the symmetry relation (8.40), it then follows that

    E^f_{m+1} = E^f_m - (F^f_m)^† (E^b_m)^{-1} F^f_m.    (8.45)

Because (E^b_m)^{-1} is Hermitian and positive definite, it follows that the second term on the right-hand
side is Hermitian and positive semidefinite. This proves E^f_m ≥ E^f_{m+1}. Similarly, substituting Eq.
(8.37(e)) into Eq. (8.34), we get E^b_{m+1} = E^b_m − F^f_m (E^f_m)^{-1} F^b_m, from which it follows that

    E^b_{m+1} = E^b_m - F^f_m (E^f_m)^{-1} (F^f_m)^†.    (8.46)

This shows that E^b_m ≥ E^b_{m+1}, and the proof of Eq. (8.43) is complete.
8.6.3 Summary of Properties Relating to Levinson's Recursion
Here is a summary of what we have shown so far:

1. F^f_m and F^b_m have the following interpretations:

       (F^f_m)^† = E[e^f_m(n) x^†(n-m-1)],  (F^b_m)^† = E[e^b_m(n) x^†(n)].    (8.47(a))

2. We can also express these as

       F^f_m = E[e^b_m(n)(e^f_m(n))^†]  and  F^b_m = E[e^f_m(n)(e^b_m(n))^†],    (8.47(b))

   so that

       F^b_m = (F^f_m)^†.    (8.47(c))

3. The parcor coefficients are related as

       K^f_{m+1} E^b_m = E^f_m K^b_{m+1}.    (8.47(d))

4. The error covariance matrices satisfy

       E^f_m ≥ E^f_{m+1},  E^b_m ≥ E^b_{m+1},    (8.47(e))

   where A ≥ B means that A − B is positive semidefinite.
5. Traces of the error covariances satisfy \mathcal{E}^f_m ≥ \mathcal{E}^f_{m+1} and \mathcal{E}^b_m ≥ \mathcal{E}^b_{m+1}.
6. For completeness, we list one more relation that will only be proved in Section 8.8.1. The
   error covariances for successive prediction orders are related as

       (E^f_{m+1})^{1/2} = (E^f_m)^{1/2} [I_L - P_{m+1} P^†_{m+1}]^{1/2}    (8.47(f))
       (E^b_{m+1})^{1/2} = (E^b_m)^{1/2} [I_L - P^†_{m+1} P_{m+1}]^{1/2},    (8.47(g))

   where

       P_{m+1} \triangleq (E^f_m)^{-1/2} K^f_{m+1} (E^b_m)^{1/2} = (E^f_m)^{†/2} K^b_{m+1} (E^b_m)^{-†/2}    (8.47(h))

   and the matrices P_m satisfy

       P_m P^†_m < I,  P^†_m P_m < I.    (8.47(i))
8.7 TRANSFER MATRIX FUNCTIONS IN VECTOR LPC
The forward prediction polynomial (8.5) is a multi-input multi-output (MIMO) FIR filter, which
produces the output e^f_N(n) in response to the input x(n). Similarly, the backward prediction polynomial
(8.19) produces e^b_N(n) in response to the input x(n). These are schematically shown in Fig. 8.1(a).
It then follows that the backward error e^b_N(n) can be generated from the forward error e^f_N(n) as
shown in part (b) of the figure. We therefore see that the transfer function from e^f_N(n) to e^b_N(n) is
given by

    z^{-1} H_N(z) \triangleq B_N(z) A_N^{-1}(z).    (8.48)

A modification of this is shown in part (c), where the quantities g^f_N(n) and g^b_N(n) are given by

    g^f_N(n) = (E^f_N)^{-1/2} e^f_N(n),  g^b_N(n) = (E^b_N)^{-1/2} e^b_N(n).    (8.49)
FIGURE 8.1: (a) Generation of the forward and backward errors using the prediction polynomials A_N(z) and B_N(z), (b) generation of the backward error e^b_N(n) from the forward error e^f_N(n), and (c) generation of the normalized backward error g^b_N(n) from the normalized forward error g^f_N(n).
These are called normalized errors because their covariance matrices are equal to I. The transfer
function from g^f_N(n) to g^b_N(n) is given by

    z^{-1} G_N(z) \triangleq [E^b_N]^{-1/2} B_N(z) A_N^{-1}(z) [E^f_N]^{1/2}.    (8.50)

The properties of G_N(z) will be studied in later sections.
8.8 THE FIR LATTICE STRUCTURE FOR VECTOR LPC
The update equations for the predictor polynomials, given in Eqs. (8.37(f)) and (8.37(g)), can be used
to derive lattice structures for vector linear prediction, similar to the scalar case. Recall that the
error e^f_m(n) is the output of A_m(z) in response to the input x(n), and similarly, e^b_m(n) is the output
of B_m(z). Thus, the update Eqs. (8.37(f)) and (8.37(g)) show that the prediction errors for order
m + 1 can be derived from those for order m using the MIMO lattice section shown in Fig. 8.2.

Comparing this with the lattice structure for the scalar case (Fig. 4.5), we see that there is
a great deal of similarity. One difference is that, in the scalar case, the lattice coefficients were
k_{m+1} and its conjugate k^*_{m+1}. But in the MIMO case, the lattice coefficients are the two different
matrices K^f_{m+1} and (K^b_{m+1})^†.
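In the time domain, the lattice section of Fig. 8.2 simply combines the two order-m error sequences, with a one-sample delay on the backward branch. Below is a minimal Python/NumPy sketch of one such section operating on whole error sequences; the function name and the row-vector storage convention are choices made for this sketch, and zero initial conditions are assumed.

```python
import numpy as np

def fir_lattice_section(ef_m, eb_m, Kf, Kb):
    """One MIMO FIR lattice section (Fig. 8.2).

    ef_m, eb_m : arrays of shape (T, L); row n holds e^f_m(n) and e^b_m(n).
    Kf, Kb     : the L x L coefficient matrices K^f_{m+1} and K^b_{m+1}.
    Implements e^f_{m+1}(n) = e^f_m(n) + K^f_{m+1} e^b_m(n) and
               e^b_{m+1}(n+1) = (K^b_{m+1})^H e^f_m(n) + e^b_m(n).
    """
    ef_next = ef_m + eb_m @ Kf.T                      # row-vector form of K^f e^b
    eb_next = np.zeros(eb_m.shape, dtype=np.result_type(ef_m, eb_m, Kb))
    eb_next[1:] = ef_m[:-1] @ np.conj(Kb) + eb_m[:-1]  # delayed by one sample
    return ef_next, eb_next
```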
FIGURE 8.2: The MIMO lattice section that generates prediction errors for order m + 1 from prediction errors for order m.
8.8.1 Toward Rearrangement of the Lattice
We now show how to bring more symmetry into the lattice structure by rewriting the equations a little
bit. Recall that the coefficients K^f_{m+1} and K^b_{m+1} are related as in Eq. (8.42). We first rearrange this.
Because error covariances are positive semidefinite matrices, they can be factored into the form

    E = E^{1/2} E^{†/2},

where E^{1/2} can be regarded as the left 'square root' and its transpose conjugate E^{†/2} the right
square root. Using this notation and remembering that the error covariances are assumed to be
nonsingular, we can rewrite Eq. (8.42) as

    (E^f_m)^{-1/2} K^f_{m+1} (E^b_m)^{1/2} = (E^f_m)^{†/2} K^b_{m+1} (E^b_m)^{-†/2}.

We will indicate this quantity by the notation P_{m+1}, that is,

    P_{m+1} \triangleq (E^f_m)^{-1/2} K^f_{m+1} (E^b_m)^{1/2} = (E^f_m)^{†/2} K^b_{m+1} (E^b_m)^{-†/2},    (8.51)

so that

    K^f_{m+1} = (E^f_m)^{1/2} P_{m+1} (E^b_m)^{-1/2}  and  K^b_{m+1} = (E^f_m)^{-†/2} P_{m+1} (E^b_m)^{†/2}.    (8.52)
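The normalized coefficient P_{m+1} of Eq. (8.51) is easy to compute from K^f_{m+1} and the error covariances, for example using a lower-triangular Cholesky factor as the 'left square root'. A minimal Python/NumPy sketch, with an invented function name, assuming E^f_m and E^b_m are Hermitian positive definite:

```python
import numpy as np

def normalized_parcor(Kf, Ef, Eb):
    """P_{m+1} = (E^f_m)^{-1/2} K^f_{m+1} (E^b_m)^{1/2}, Eq. (8.51),
    using lower-triangular Cholesky factors as the left square roots."""
    Ef_half = np.linalg.cholesky(Ef)          # Ef = Ef_half @ Ef_half^H
    Eb_half = np.linalg.cholesky(Eb)
    P = np.linalg.solve(Ef_half, Kf @ Eb_half)
    # Boundedness, Eq. (8.57): singular values of P are below 1
    # whenever the covariances are nonsingular.
    assert np.all(np.linalg.svd(P, compute_uv=False) < 1.0)
    return P
```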
From the definition of P_{m+1}, we see that

    P_{m+1} P^†_{m+1} = (E^f_m)^{-1/2} K^f_{m+1} (K^b_{m+1})^† (E^f_m)^{1/2}    (8.53)

and

    P^†_{m+1} P_{m+1} = (E^b_m)^{-1/2} (K^b_{m+1})^† K^f_{m+1} (E^b_m)^{1/2}.    (8.54)
To understand the role of these matrices, recall the error update Eq. (8.35). This can be
rewritten as

    E^f_{m+1} = [I_L - K^f_{m+1}(K^b_{m+1})^†] E^f_m
              = [I_L - K^f_{m+1}(K^b_{m+1})^†] (E^f_m)^{1/2} (E^f_m)^{†/2}
              = [(E^f_m)^{1/2} - K^f_{m+1}(K^b_{m+1})^† (E^f_m)^{1/2}] (E^f_m)^{†/2}
              = (E^f_m)^{1/2} [I_L - (E^f_m)^{-1/2} K^f_{m+1}(K^b_{m+1})^† (E^f_m)^{1/2}] (E^f_m)^{†/2},

so that

    E^f_{m+1} = (E^f_m)^{1/2} [I_L - P_{m+1} P^†_{m+1}] (E^f_m)^{†/2}.    (8.55)

Similarly, from Eq. (8.36), we have

    E^b_{m+1} = [I_L - (K^b_{m+1})^† K^f_{m+1}] E^b_m
              = [I_L - (K^b_{m+1})^† K^f_{m+1}] (E^b_m)^{1/2} (E^b_m)^{†/2}
              = [(E^b_m)^{1/2} - (K^b_{m+1})^† K^f_{m+1} (E^b_m)^{1/2}] (E^b_m)^{†/2}
              = (E^b_m)^{1/2} [I_L - (E^b_m)^{-1/2} (K^b_{m+1})^† K^f_{m+1} (E^b_m)^{1/2}] (E^b_m)^{†/2},

so that

    E^b_{m+1} = (E^b_m)^{1/2} [I_L - P^†_{m+1} P_{m+1}] (E^b_m)^{†/2}.    (8.56)
Thus, the modified lattice coefficient P_{m+1} enters both the forward and backward error updates in
a symmetrical way. Under the assumption that the error covariances are nonsingular, we now prove
that P_{m+1} satisfies a boundedness property:

    P_{m+1} P^†_{m+1} < I    (8.57)

and similarly

    P^†_{m+1} P_{m+1} < I.    (8.58)

These can equivalently be written as

    (I - P_{m+1} P^†_{m+1}) > 0,  (I - P^†_{m+1} P_{m+1}) > 0.    (8.59)
Proof. Eq. (8.55) can be written as

    E^f_{m+1} = U^† Q U,

where Q is Hermitian and U is nonsingular. Because E^f_{m+1} is assumed to be nonsingular, it is
positive definite, so that v^† E^f_{m+1} v > 0 for any nonzero vector v. Thus, v^† U^† Q U v > 0. Because
U = (E^f_m)^{†/2} is nonsingular, this implies that w^† Q w > 0 for any vector w ≠ 0, which is equivalent
to the statement that Q is positive definite. This proves that I − P_{m+1} P^†_{m+1} is positive definite,
which is equivalent to Eq. (8.57). Eq. (8.58) follows similarly.
Because E^f_{m+1} is Hermitian and positive definite, it can be written in the form

    E^f_{m+1} = (E^f_{m+1})^{1/2} (E^f_{m+1})^{†/2}.

From Eq. (8.55), we therefore see that the left square root can be expressed as

    (E^f_{m+1})^{1/2} = (E^f_m)^{1/2} [I_L - P_{m+1} P^†_{m+1}]^{1/2}.    (8.60)

The existence of the square root (I_L − P_{m+1} P^†_{m+1})^{1/2} is guaranteed by the fact that (I_L −
P_{m+1} P^†_{m+1}) is positive definite (as shown by Eq. (8.59)). Similarly, from Eq. (8.56), we have

    (E^b_{m+1})^{1/2} = (E^b_m)^{1/2} [I_L - P^†_{m+1} P_{m+1}]^{1/2}.    (8.61)
8.8.2 The Symmetrical Lattice
The advantage of defining the L × L matrix P_{m+1} from the L × L matrices K^f_{m+1} and K^b_{m+1} is
that it allows us to redraw the lattice more symmetrically, in terms of P_{m+1}. Because P_{m+1} also has
the boundedness property (8.57), the lattice becomes very similar to the scalar LPC lattice, which
had the boundedness property |k_m| < 1. We now derive this symmetrical lattice. From Eq. (8.51),
we see that the lattice coefficients can be expressed as

    K^f_{m+1} = (E^f_m)^{1/2} P_{m+1} (E^b_m)^{-1/2}    (8.62)

and

    K^b_{m+1} = (E^f_m)^{-†/2} P_{m+1} (E^b_m)^{†/2}.    (8.63)

Substituting these into Fig. 8.2 and rearranging, we obtain the lattice section shown in Fig. 8.3(b).
The quantities g^f_m(n) and g^b_m(n) indicated in Fig. 8.3(b) are given by

    g^f_m(n) = (E^f_m)^{-1/2} e^f_m(n),  g^b_m(n) = (E^b_m)^{-1/2} e^b_m(n).    (8.64)

These are normalized prediction error vectors, in the sense that their covariance matrix is the
identity. We can generate the set of all forward and backward normalized error sequences by using
the cascaded FIR lattice structure, as demonstrated in Fig. 8.4 for three stages.
FIGURE 8.3: (a) The original MIMO lattice section and (b) an equivalent section in a more symmetrical form.
The multipliers Q^f_m in the figure are given by

    Q^f_m = (E^f_m)^{-1/2} (E^f_{m-1})^{1/2},  m ≥ 1,    (8.65)

and Q^f_0 = (E^f_0)^{-1/2}. The multipliers Q^b_m are similar (just replace the superscript f with b everywhere).
In view of Eqs. (8.60) and (8.61), these multipliers can be rewritten as

    Q^f_m = [I_L - P_m P^†_m]^{-1/2},  Q^b_m = [I_L - P^†_m P_m]^{-1/2},  m ≥ 1.    (8.66)

The unnormalized error sequence for any order can be obtained simply by inserting matrix
multipliers (E^f_m)^{1/2} and (E^b_m)^{1/2} at appropriate places. Because

    e^f_3(n) = (E^f_3)^{1/2} g^f_3(n)  and  e^b_3(n) = (E^b_3)^{1/2} g^b_3(n),

the extra (rightmost) multipliers in Fig. 8.4(a) are therefore (E^f_3)^{1/2} and (E^b_3)^{1/2}. The lattice
sections in Fig. 8.4 will be referred to as normalized lattice sections because the outputs at each of
the stages represent the normalized error sequences.
FIGURE 8.4: (a) The MIMO FIR LPC lattice structure with normalized sections and (b) details of the box labeled P_m.
8.9 THE IIR LATTICE STRUCTURE FOR VECTOR LPC
The IIR LPC lattice for the vector case can be obtained similar to the scalar case by starting from
Fig. 8.2, reproduced in Fig. 8.5(a), and rearranging the equations. From Fig. 8.5(a), we have

    e^f_{m+1}(n) = e^f_m(n) + K^f_{m+1} e^b_m(n)

and

    e^b_{m+1}(n+1) = [K^b_{m+1}]^† e^f_m(n) + e^b_m(n).

The first equation can be rewritten as

    e^f_m(n) = e^f_{m+1}(n) - K^f_{m+1} e^b_m(n).

Substituting this into the second equation, we get

    e^b_{m+1}(n+1) = [K^b_{m+1}]^† e^f_{m+1}(n) + [I - [K^b_{m+1}]^† K^f_{m+1}] e^b_m(n).

These equations can be ''drawn'' as shown in Fig. 8.5(b). By repeated application of this idea, we
obtain the IIR LPC lattice shown in Fig. 8.6. This is called the MIMO IIR LPC lattice. The
transfer functions indicated as H_k(z) in the figure will be discussed later.
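The cascade of Fig. 8.6 can be simulated sample by sample: at each time step the forward errors are propagated from order N down to order 0, and the delayed backward errors are then updated upward. The sketch below (Python/NumPy, names invented, zero initial conditions assumed) reproduces x(n) = e^f_0(n) from a given forward error sequence e^f_N(n).

```python
import numpy as np

def mimo_iir_lattice(ef_N, Kf_list, Kb_list):
    """MIMO IIR LPC lattice of Fig. 8.6.

    ef_N    : array of shape (T, L); row n is e^f_N(n).
    Kf_list : [K^f_1, ..., K^f_N], Kb_list : [K^b_1, ..., K^b_N] (L x L matrices).
    Returns the reconstructed signal x(n) = e^f_0(n), assuming zero initial state.
    """
    T, L = ef_N.shape
    N = len(Kf_list)
    dtype = np.result_type(ef_N, *Kf_list, *Kb_list)
    state = [np.zeros(L, dtype=dtype) for _ in range(N)]   # state[m] = e^b_m(n)
    x = np.zeros(ef_N.shape, dtype=dtype)
    for n in range(T):
        ef = [None] * (N + 1)
        ef[N] = ef_N[n]
        for m in range(N - 1, -1, -1):        # e^f_m(n) = e^f_{m+1}(n) - K^f_{m+1} e^b_m(n)
            ef[m] = ef[m + 1] - Kf_list[m] @ state[m]
        x[n] = ef[0]                          # e^f_0(n) = x(n)
        # e^b_0(n+1) = x(n); e^b_{m+1}(n+1) = (K^b_{m+1})^H e^f_m(n) + e^b_m(n)
        state = [ef[0]] + [Kb_list[m].conj().T @ ef[m] + state[m] for m in range(N - 1)]
    return x
```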
FIGURE 8.5: (a) The MIMO FIR lattice section and (b) the corresponding IIR lattice section.

FIGURE 8.6: (a) The MIMO IIR cascaded LPC lattice and (b) details of the building block labeled K_m.
8.10 THE NORMALIZED IIR LATTICE
Starting from the symmetrical MIMO FIR LPC lattice in Fig. 8.3(b), we can derive an IIR LPC
lattice with a slightly different structural arrangement. We will call this the normalized IIR LPC
lattice for reasons that will become clear. To obtain this structure, consider again the MIMO FIR
lattice section shown separately in Fig. 8.7(a). This FIR lattice section can be represented using the
equations

    g^f_{m+1}(n) = Q^f_{m+1} [g^f_m(n) + P_{m+1} g^b_m(n)]    (8.67)

and

    g^b_{m+1}(n+1) = Q^b_{m+1} [P^†_{m+1} g^f_m(n) + g^b_m(n)].    (8.68)

The first equation can be rearranged to express g^f_m(n) in terms of the other quantities:

    g^f_m(n) = (Q^f_{m+1})^{-1} g^f_{m+1}(n) - P_{m+1} g^b_m(n).    (8.69)

The second equation (8.68) can then be rewritten by substituting from Eq. (8.69):

    g^b_{m+1}(n+1) = Q^b_{m+1} P^†_{m+1} (Q^f_{m+1})^{-1} g^f_{m+1}(n) + (Q^b_{m+1})^{-†} g^b_m(n),    (8.70)
FIGURE 8.7: (a) The normalized MIMO FIR lattice section and (b) the corresponding IIR normalized lattice section.
where we have used the definitions (8.66) for Q^f_m and Q^b_m, that is,

    Q^f_m = [I_L - P_m P^†_m]^{-1/2},  Q^b_m = [I_L - P^†_m P_m]^{-1/2},  m ≥ 1.

The preceding two equations can be represented using the normalized IIR lattice section shown in
Fig. 8.7(b).

The normalized MIMO IIR LPC lattice therefore takes the form shown in Fig. 8.8 for three
sections. For an N-stage IIR lattice, the signal entering at the left is the forward error e^f_N(n), and
the signal at the right end is g^f_0(n). Upon scaling with the multiplier

    (Q^f_0)^{-1} = (E^f_0)^{1/2},

the error signal g^f_0(n) becomes e^f_0(n), which is nothing but the original signal x(n). Thus, while
the FIR lattice produced the (normalized) error signals from the original signal x(n), the IIR lattice
reproduces x(n) back from the error signals.
8.11 THE PARAUNITARY OR MIMO ALL-PASS PROPERTY

FIGURE 8.8: (a) The normalized MIMO IIR LPC lattice shown for three stages and (b) details of the normalized IIR lattice sections. Here, Q^f_m = (I_L − P_m P^†_m)^{-1/2}, Q^b_m = (I_L − P^†_m P_m)^{-1/2}, m ≥ 1, and Q^f_0 = (E^f_0)^{-1/2}.

FIGURE 8.9: The normalized MIMO IIR lattice section.

Figure 8.9 shows one section of the IIR lattice structure. Here, G_m(z) is the transfer function from
the previous section, and P_{m+1} is the constant matrix shown in Fig. 8.8(b). Comparing with Fig.
8.8(a), we see that the transfer matrices G_m(z) convert the forward error signals to corresponding
backward error signals. More precisely, if g
f
m
(n) is input to the system G
m
(z), then the output is
g
b
m
(n + 1).
In the scalar case, the transfer function G
m
(z) was all-pass, and this was used to show that
G
m+1
(z) was all-pass. Furthermore, these all-pass filters had all their poles inside the unit circle
(Section 4.3.2). We now prove an analogous property for the vector case. This is somewhat more
complicated than in the scalar case because the transfer functions are MIMO, and the all-pass
property generalizes to the so-called paraunitary property to be defined below (Vaidyanathan,
1993). We will first show that the building block indicated as P
m+1
has a unitary transfer matrix
T
m+1
. We then show that in an interconnection such as this, whenever G
m
(z) is paraunitary
and P
m+1
unitary, then G
m+1
(z) is paraunitary as well. Because G
0
(z) = I is paraunitary, it will
therefore follow that all the transfer functions G
m
(z) in the normalized IIR lattice are paraunitary.
At this point, we ask the reader to review the tilde notation
¯
G(z) described in Section 1.2.1.
Definition 8.1. Paraunitary Systems. An M × M transfer matrix G(z) is said to be paraunitary if
$$G^\dagger(e^{j\omega})\,G(e^{j\omega}) = I \qquad (8.71)$$
for all ω. That is, the transfer matrix is unitary for all frequencies. If G(z) is rational (i.e., the entries G_{km}(z) are ratios of polynomials in z^{-1}), then the paraunitary property is equivalent to
$$\tilde{G}(z)\,G(z) = I \qquad (8.72)$$
for all z. ♦
With x(n) and y(n) denoting the input and output of an LTI system (Fig. 8.10), their Fourier transforms are related as
$$Y(e^{j\omega}) = G(e^{j\omega})X(e^{j\omega}).$$
FIGURE 8.10: An LTI system with input x(n) and output y(n).
A number of important properties of paraunitary matrices are derived in Vaidyanathan (1993). The following properties are especially relevant for our discussions here:
1. If G(e^{jω}) is unitary, it follows that
$$Y^\dagger(e^{j\omega})Y(e^{j\omega}) = X^\dagger(e^{j\omega})X(e^{j\omega}), \quad \text{for all } \omega \qquad (8.73)$$
for any Fourier-transformable input x(n).
2. It can, in fact, be shown that a rational LTI system G(z) is paraunitary if and only if (8.73) holds for all Fourier-transformable inputs x(n).
3. By integrating both sides of (8.73) and using Parseval's relation, it also follows that, for a paraunitary system, the input energy is equal to the output energy, that is,
$$\sum_n x^\dagger(n)x(n) = \sum_n y^\dagger(n)y(n) \qquad (8.74)$$
for all finite-energy inputs.
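These properties are easy to check numerically. The short Python/NumPy sketch below (an illustration of ours, not an example from the text) builds a simple 2 × 2 paraunitary system, a unitary rotation followed by a one-sample delay on the second channel, verifies Eq. (8.71) on a frequency grid, and confirms the energy balance (8.74) for a random finite-energy input.

```python
import numpy as np

# A simple 2x2 paraunitary FIR example: a rotation followed by a delay z^{-1}
# on the second output, i.e., G(z) = diag(1, z^{-1}) R(theta).
theta = 0.3
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])

def G_of(z):
    """Evaluate G(z) = diag(1, 1/z) @ R at a complex point z."""
    return np.diag(np.array([1.0, 1.0 / z])) @ R

# Check G^dagger(e^{jw}) G(e^{jw}) = I on a frequency grid (Eq. 8.71).
for w in np.linspace(0, 2 * np.pi, 64, endpoint=False):
    Gw = G_of(np.exp(1j * w))
    assert np.allclose(Gw.conj().T @ Gw, np.eye(2), atol=1e-12)

# Energy preservation (Eq. 8.74): filter a random finite-energy input.
rng = np.random.default_rng(0)
x = rng.standard_normal((2, 50))            # 2-channel input, 50 samples
y_rot = R @ x                               # memoryless rotation (energy preserving)
y = np.zeros((2, x.shape[1] + 1))
y[0, :-1] = y_rot[0]                        # channel 1: no delay
y[1, 1:]  = y_rot[1]                        # channel 2: one-sample delay (z^{-1})
print(np.allclose(np.sum(x**2), np.sum(y**2)))   # True: input and output energies agree
```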
8.11.1 Unitarity of Building Blocks
The building block P_{m+1} in Fig. 8.9 has the form shown in Fig. 8.8(b). Its transfer matrix has the form
$$T = \begin{bmatrix} Q^b P^\dagger (Q^f)^{-1} & (Q^b)^{-\dagger} \\ (Q^f)^{-1} & -P \end{bmatrix}, \qquad (8.75)$$
where all subscripts have been omitted for simplicity. Recall here that the matrices Q^b and Q^f are related to P as in Eq. (8.66), that is,
$$Q^f = \bigl(I_L - PP^\dagger\bigr)^{-1/2}, \qquad Q^b = \bigl(I_L - P^\dagger P\bigr)^{-1/2}. \qquad (8.76)$$
We will now show that T is unitary, that is, T^†T = I.
Proof of Unitarity of T. We have
$$T^\dagger T = \begin{bmatrix} A & B \\ B^\dagger & D \end{bmatrix},$$
where the symbols above are defined as
$$\begin{aligned}
A &= (Q^f)^{-\dagger}P(Q^b)^\dagger Q^b P^\dagger (Q^f)^{-1} + (Q^f)^{-\dagger}(Q^f)^{-1},\\
B &= (Q^f)^{-\dagger}P(Q^b)^\dagger (Q^b)^{-\dagger} - (Q^f)^{-\dagger}P = 0,\\
D &= (Q^b)^{-1}(Q^b)^{-\dagger} + P^\dagger P.
\end{aligned}$$
So, B is trivially zero. To simplify D, we substitute from Eq. (8.76) to obtain
$$D = \bigl(I_L - P^\dagger P\bigr)^{1/2}\bigl(I_L - P^\dagger P\bigr)^{\dagger/2} + P^\dagger P = \bigl(I_L - P^\dagger P\bigr) + P^\dagger P = I_L.$$
Next, substituting from Eq. (8.76), the first term of A becomes
$$\bigl(I_L - PP^\dagger\bigr)^{\dagger/2} P \bigl(I_L - P^\dagger P\bigr)^{-\dagger/2}\bigl(I_L - P^\dagger P\bigr)^{-1/2} P^\dagger \bigl(I_L - PP^\dagger\bigr)^{1/2}
= \bigl(I_L - PP^\dagger\bigr)^{\dagger/2} P \bigl(I_L - P^\dagger P\bigr)^{-1} P^\dagger \bigl(I_L - PP^\dagger\bigr)^{1/2},$$
so that
$$A = \bigl(I_L - PP^\dagger\bigr)^{\dagger/2}\underbrace{\Bigl[P\bigl(I_L - P^\dagger P\bigr)^{-1}P^\dagger + I\Bigr]}_{\text{call this } A_1}\bigl(I_L - PP^\dagger\bigr)^{1/2}.$$
We now claim that A_1 = (I_L - PP^†)^{-1}:
$$\begin{aligned}
A_1(I_L - PP^\dagger) &= P\bigl(I_L - P^\dagger P\bigr)^{-1}\bigl(P^\dagger - P^\dagger PP^\dagger\bigr) + \bigl(I_L - PP^\dagger\bigr)\\
&= P\bigl(I_L - P^\dagger P\bigr)^{-1}\bigl(I_L - P^\dagger P\bigr)P^\dagger + \bigl(I_L - PP^\dagger\bigr)\\
&= PP^\dagger + \bigl(I_L - PP^\dagger\bigr) = I_L.
\end{aligned}$$
Thus, A = (I_L - PP^†)^{†/2}(I_L - PP^†)^{-1}(I_L - PP^†)^{1/2} = I, where we have used (I_L - PP^†) = (I_L - PP^†)^{1/2}(I_L - PP^†)^{†/2}. This completes the proof that T is unitary.
8.11.2 Propagation of Paraunitary Property
Return now to Fig. 8.9. We will show that if the rational transfer matrix G_m(z) is paraunitary and the box labeled P_{m+1} has a unitary transfer matrix T_{m+1}, then G_{m+1}(z) is also paraunitary.
Proof of Paraunitarity of G_{m+1}(z). From Fig. 8.9, we have
$$\begin{bmatrix} Y_1(e^{j\omega}) \\ Y_2(e^{j\omega}) \end{bmatrix} = T_{m+1}\begin{bmatrix} X_1(e^{j\omega}) \\ X_2(e^{j\omega}) \end{bmatrix}.$$
The unitary property of T_{m+1} implies that
$$Y_1^\dagger(e^{j\omega})Y_1(e^{j\omega}) + Y_2^\dagger(e^{j\omega})Y_2(e^{j\omega}) = X_1^\dagger(e^{j\omega})X_1(e^{j\omega}) + X_2^\dagger(e^{j\omega})X_2(e^{j\omega}), \quad \text{for all } \omega.$$
Because G_m(z) is paraunitary, we also have Y_2^†(e^{jω})Y_2(e^{jω}) = X_2^†(e^{jω})X_2(e^{jω}) for all ω. So it follows that
$$Y_1^\dagger(e^{j\omega})Y_1(e^{j\omega}) = X_1^\dagger(e^{j\omega})X_1(e^{j\omega}), \quad \text{for all } \omega.$$
This shows that the rational function G_{m+1}(z) is paraunitary.
8.11.3 Poles of the MIMO IIR Lattice
Figure 8.11 is a reproduction of Fig. 8.9 with details of the building block P_{m+1} shown explicitly (with subscripts deleted for simplicity). Here, the matrices Q^f and Q^b are as in Eq. (8.66), that is,
$$Q^f = \bigl(I_L - PP^\dagger\bigr)^{-1/2} \quad\text{and}\quad Q^b = \bigl(I_L - P^\dagger P\bigr)^{-1/2}.$$
In Section 8.8.1, we showed that the LPC lattice satisfies the property
$$P^\dagger P < I \qquad (8.77)$$
(see Eq. (8.58)). Under this assumption, we will prove an important property about the poles of the transfer matrix G_{m+1}(z). (The poles of a MIMO transfer function F(z) are nothing but the poles of the individual elements F_{km}(z); for this section, the reader may want to review the theory on poles of MIMO systems in Kailath, 1980, and Vaidyanathan, 1993, Chapter 13.)
Lemma 8.1. Poles of the IIR Lattice. If the rational paraunitary matrix G_m(z) has all poles in |z| < 1 and if the matrix multiplier P is bounded as in Eq. (8.77), then the poles of G_{m+1}(z) are also confined to the region |z| < 1. ♦
Proof. It is sufficient to show that the transfer function F(z) indicated in Fig. 8.11 has all poles in |z| < 1. To find an expression for F(z), note that
$$Y(z) = z^{-1}G_m(z)W(z) \quad\text{and}\quad W(z) = -z^{-1}PG_m(z)W(z) + X(z),$$
FIGURE 8.11: The normalized MIMO IIR lattice section with some of the details indicated. Subscripts on P and Q are omitted for simplicity.
where w(n) is the signal indicated in the figure. Eliminating W(z), we get Y(z) = z^{-1}G_m(z)(I + z^{-1}PG_m(z))^{-1}X(z). This shows that
$$F(z) = z^{-1}G_m(z)\bigl(I + z^{-1}PG_m(z)\bigr)^{-1}. \qquad (8.78)$$
Because the poles of G_m(z) are assumed to be in |z| < 1, it suffices to show that all the poles of (I + z^{-1}PG_m(z))^{-1} are in |z| < 1. Because
$$\bigl(I + z^{-1}PG_m(z)\bigr)^{-1} = \frac{\mathrm{Adj}\bigl(I + z^{-1}PG_m(z)\bigr)}{\det\bigl(I + z^{-1}PG_m(z)\bigr)},$$
where Adj is the adjugate matrix, it is sufficient to show that all zeros of the denominator above are in |z| < 1. We will use Eq. (8.77) and the assumption that G_m(z) is paraunitary with all poles in |z| < 1. The latter implies
$$G_m^\dagger(z)G_m(z) \le I \qquad (8.79)$$
for all z in |z| ≥ 1 (see Section 14.8 of Vaidyanathan, 1993). Now, if z_0 is a zero of the determinant of (I + z^{-1}PG_m(z)), then I + z_0^{-1}PG_m(z_0) is singular. So there exists a unit-norm vector v such that z_0^{-1}PG_m(z_0)v = -v. This implies
$$v^\dagger G_m^\dagger(z_0)P^\dagger P\,G_m(z_0)v = |z_0|^2. \qquad (8.80)$$
If |z_0| ≥ 1, then G_m(z_0)v has norm ≤ 1 (in view of Eq. (8.79)). So the left-hand side of Eq. (8.80) is < 1 in view of (8.77), whereas the right-hand side would be ≥ 1; this is a contradiction. This proves that all zeros of the said determinant are in |z| < 1.
By induction, it therefore follows that the MIMO IIR LPC lattice is stable. Summarizing, we have
proved the following:
Theorem 8.2. Stability of the IIR Lattice. Assume all the matrix multipliers indicated as P_m in the MIMO IIR lattice (demonstrated in Fig. 8.8 for three stages) satisfy P_m^†P_m < I and define Q^f_m and Q^b_m as in (8.66). Then, the poles of all the paraunitary matrices G_m(z) in the LPC lattice are confined to be in |z| < 1. ♦
Equivalently, all poles of the MIMO IIR LPC lattice are in |z| < 1 (i.e., the IIR lattice is a causal stable system). This is the MIMO version of the stability property we saw in Section 4.3.2 for the scalar LPC lattice.
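For a quick numerical illustration of the pole result (ours, under the simplifying assumption G_m(z) = I, which is trivially paraunitary and has no poles), Eq. (8.78) reduces to F(z) = (zI + P)^{-1}, whose poles are the eigenvalues of -P:

```python
import numpy as np

rng = np.random.default_rng(2)
L = 4
P = rng.standard_normal((L, L)) + 1j * rng.standard_normal((L, L))
P = 0.95 * P / np.linalg.norm(P, 2)     # spectral norm < 1, so P^dagger P < I

# With G_m(z) = I, Eq. (8.78) gives F(z) = z^{-1}(I + z^{-1}P)^{-1} = (zI + P)^{-1},
# whose poles are the eigenvalues of -P.
poles = np.linalg.eigvals(-P)
print(np.abs(poles))                    # all magnitudes are below 1
print(np.all(np.abs(poles) < 1))        # True
```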
8.12 WHITENING EFFECT AND STALLING
Consider again Levinson's recursion for the MIMO linear predictor described in Section 8.5. Recall that the error covariances are updated according to Eqs. (8.37(h)) and (8.37(i)), which can be rewritten using (8.37(d)) and (8.37(e)) as follows:
$$E^f_{N+1} = E^f_N - (F^f_N)^\dagger (E^b_N)^{-1} F^f_N \qquad (8.81)$$
$$E^b_{N+1} = E^b_N - F^f_N (E^f_N)^{-1} (F^f_N)^\dagger. \qquad (8.82)$$
We say that there is no progress as we move from order N to order N + 1 if E^f_{N+1} = E^f_N, that is,
$$\operatorname{Tr} E^f_{N+1} = \operatorname{Tr} E^f_N. \qquad (8.83)$$
We know that the optimal prediction error e^f_N(n) at time n is orthogonal to the samples x(n - k), for 1 ≤ k ≤ N. We now claim the following:
Lemma 8.2. No-Progress Situation. The condition (8.83) arises if and only if
$$(F^f_N)^\dagger \stackrel{\Delta}{=} E\bigl[e^f_N(n)\,x^\dagger(n-N-1)\bigr] = 0, \qquad (8.84)$$
that is, the error e^f_N(n) is orthogonal to x(n - N - 1) in addition to being orthogonal to x(n - 1), ..., x(n - N). ♦
Proof. Eq. (8.83) is equivalent to Tr((F^f_N)^†(E^b_N)^{-1}F^f_N) = 0, as seen from Eq. (8.45). Because the matrix (F^f_N)^†(E^b_N)^{-1}F^f_N is Hermitian and positive semidefinite, the zero-trace condition is equivalent to the statement
$$(F^f_N)^\dagger (E^b_N)^{-1} F^f_N = 0.$$
(The trace of a matrix A is the sum of its eigenvalues. When A is positive semidefinite, all eigenvalues are nonnegative, so zero trace implies that all eigenvalues are zero. This implies that the matrix itself is zero, because any positive semidefinite matrix can be written as A = UΛU^†, where Λ is the diagonal matrix of eigenvalues and U is unitary.)
This is equivalent to F^f_N = 0 (because (E^b_N)^{-1} is Hermitian and nonsingular). This completes the proof.
A number of points should be noted.
1. From Eqs. (8.82) and (8.84), we see that when there is no progress in forward prediction, then there is no progress in backward prediction either (and vice versa).
2. When Eq. (8.84) holds, we have K^f_{N+1} = 0 (from (8.26)) so that A_{N+1}(z) = A_N(z). Similarly, K^b_{N+1} = 0 and B_{N+1}(z) = z^{-1}B_N(z).
We say that linear prediction ''stalls'' at order N if there is no further progress after stage N, that is, if
$$E^f_N = E^f_{N+1} = E^f_{N+2} = \cdots. \qquad (8.85)$$
We now prove the following:
Lemma 8.3. Stalling and Whitening. The stalling situation (8.85) occurs if and only if
$$E\bigl[e^f_N(n)\,(e^f_N)^\dagger(n-m)\bigr] = E^f_N\,\delta(m), \qquad (8.86)$$
that is, if and only if e^f_N(n) is white. ♦
Proof. From Lemma 8.2, it is clear that stalling arises if and only if
$$F^f_m = 0, \qquad m \ge N,$$
or equivalently,
$$E\bigl[e^f_N(n)\,x^\dagger(n-m)\bigr] = 0, \qquad m \ge N+1.$$
Because this equality already holds for 1 ≤ m ≤ N by the orthogonality principle, we can say that stalling happens if and only if
$$E\bigl[e^f_N(n)\,x^\dagger(n-m)\bigr] = 0, \qquad m \ge 1. \qquad (8.87)$$
But e^f_N(n) is a linear combination of x(n - m), m ≥ 0, so the preceding equation implies
$$E\bigl[e^f_N(n)\,(e^f_N)^\dagger(n-k)\bigr] = 0, \qquad k \ge 1. \qquad (8.88)$$
Conversely, because x(n) is a linear combination of e^f_N(n - k), k ≥ 0, we see that Eq. (8.88) implies Eq. (8.87) as well. But because (8.88) is equivalent to Eq. (8.86), we have shown that the stalling condition is equivalent to Eq. (8.86).
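The no-progress condition of Lemma 8.2 can be checked directly from data. The sketch below (ours; the vector AR(1) model and sample sizes are illustrative) fits a first-order MIMO predictor by least squares and estimates F^f_1 = E[e^f_1(n) x^†(n-2)]; because the simulated process really is AR(1), the estimate is close to zero, which is precisely the stalling/whitening situation described above.

```python
import numpy as np

rng = np.random.default_rng(3)
L, n_samp = 2, 100_000

# Generate a real vector AR(1) process x(n) = A1 x(n-1) + w(n), w(n) white.
A1 = np.array([[0.5, 0.2],
               [-0.1, 0.3]])
x = np.zeros((L, n_samp))
w = rng.standard_normal((L, n_samp))
for n in range(1, n_samp):
    x[:, n] = A1 @ x[:, n - 1] + w[:, n]

# First-order forward predictor estimated by least squares (normal equations).
X0, X1 = x[:, 1:], x[:, :-1]                   # x(n) and x(n-1)
A_hat = (X0 @ X1.T) @ np.linalg.inv(X1 @ X1.T)
e = X0 - A_hat @ X1                            # forward error e^f_1(n)

# Estimate F^f_1 = E[e^f_1(n) x^T(n-2)] (real data); near zero => stalling at order 1.
F1 = e[:, 1:] @ x[:, :-2].T / (n_samp - 2)
print(np.round(F1, 3))                         # entries are close to 0
```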
8.13 PROPERTIES OF TRANSFER MATRICES IN LPC THEORY
In the scalar case, the properties of the transfer functions A_N(z) and B_N(z) are well understood. For example, A_N(z) has all zeros strictly inside the unit circle (i.e., it is a minimum-phase polynomial). Furthermore, we showed that
$$B_N(z) = z^{-(N+1)}\tilde{A}_N(z).$$
This implies that B_N(z) has all its zeros outside the unit circle and the filter B_N(z)/A_N(z) is stable and all-pass. That is, the forward and backward error sequences are related by an all-pass filter, so that E^f_N = E^b_N. For the case of vector linear prediction, it is trickier to prove similar properties because the transfer functions are matrices. Let us first recall what we have already done. The transfer matrix H_N(z) from the prediction error e^f_N(n) to the prediction error e^b_N(n) is given in Eq. (8.48) and is reproduced below:
$$z^{-1}H_N(z) \stackrel{\Delta}{=} B_N(z)A_N^{-1}(z). \qquad (8.89)$$
The transfer matrix z^{-1}G_N(z) from the normalized prediction error g^f_N(n) to the normalized prediction error g^b_N(n) is given in Eq. (8.50) and reproduced below:
$$z^{-1}G_N(z) \stackrel{\Delta}{=} [E^b_N]^{-1/2}B_N(z)A_N^{-1}(z)[E^f_N]^{1/2}. \qquad (8.90)$$
This also appears in the MIMO IIR lattice of Fig. 8.8(a) (indicated also in Fig. 8.11). We showed in Section 8.11.2 that G_N(z) is paraunitary, which is an extension of the all-pass property to the MIMO case. We also showed (Section 8.11.3) that all poles of G_N(z) are in |z| < 1.
We will start from these results and establish some more. The fact that the poles of G_N(z) are in |z| < 1 does not automatically imply that the zeros of the determinant of A_N(z) are in |z| < 1 unless we establish that the rational form in Eq. (8.90) is ''irreducible'' or ''minimal'' in some sense. We now discuss this issue more systematically. We will also establish a relation between the forward and backward matrix polynomials A_N(z) and B_N(z).
8.13.1 Review of Matrix Fraction Descriptions
An expression of the form
$$H(z) = B(z)A^{-1}(z), \qquad (8.91)$$
where A(z) and B(z) are polynomial matrices (in this section, all polynomials are polynomials in z^{-1}), is called a matrix fraction description (MFD). The expressions (8.89) and (8.90) for z^{-1}H_N(z) and z^{-1}G_N(z) are therefore MFDs. We know that the
rational expression H(z) = B(z)/A(z) for a transfer function is not unique because there can be
cancellations between the polynomials A(z) and B(z). Similarly, the MFD expression for a rational
transfer matrix is not unique. If there are ‘‘common factors’’ between A(z) and B(z), then we can
cancel these to obtain a reduced MFD description for the same H(z). To be more quantitative,
we first need to define ‘‘common factors’’ for matrix polynomials. If the two polynomials A(z) and
B(z) can be written as
$$B(z) = B_1(z)R(z) \quad\text{and}\quad A(z) = A_1(z)R(z),$$
where B_1(z), A_1(z), and R(z) are polynomial matrices, then we say that R(z) is a right common divisor or factor between A(z) and B(z), abbreviated as rcd. In this case, we can rewrite Eq. (8.91) as
$$H(z) = B_1(z)A_1^{-1}(z). \qquad (8.92)$$
Because
$$\det A(z) = \det A_1(z)\,\det R(z),$$
it follows that det A_1(z) has smaller degree than det A(z) (unless det R(z) is constant). Thus, cancellation of all these common factors results in a reduced MFD (i.e., one with smaller degree for det A(z)). We say that the MFD (8.91) is irreducible if every rcd R(z) has unit determinant,
$$\det R(z) = 1, \qquad (8.93)$$
or more generally, a nonzero constant. A matrix polynomial R(z) satisfying Eq. (8.93) is said to be unimodular. Thus, an MFD (8.91) is irreducible if the matrices A(z) and B(z) can have only unimodular rcds. In this case, we also say that A(z) and B(z) are right coprime.
Any unimodular matrix is an rcd. Given A(z) and B(z), we can always write
$$A(z) = \underbrace{A(z)U^{-1}(z)}_{\text{polynomial } A_1(z)}\times U(z), \qquad B(z) = \underbrace{B(z)U^{-1}(z)}_{\text{polynomial } B_1(z)}\times U(z)$$
for any unimodular polynomial U(z). The inverse U^{-1}(z) remains a polynomial because det U(z) = 1. Thus, A_1(z) and B_1(z) are still polynomials. This shows that any unimodular matrix can be regarded as an rcd of any pair of matrix polynomials.
The following results are well-known in linear system theory (e.g., see Lemmas 13.5.2 and 13.6.1 in Vaidyanathan, 1993).
Lemma 8.4. Irreducible MFDs and Poles. Let H(z) = B(z)A^{-1}(z) be an irreducible MFD. Then, the set of poles of H(z) is precisely the set of zeros of [det A(z)]. ♦
Lemma 8.5. Irreducible and Reducible MFDs. Let H(z) = B(z)A^{-1}(z) be an MFD and H(z) = B_red(z)A_red^{-1}(z) an irreducible MFD for the same system H(z). Then, B(z) = B_red(z)R(z) and A(z) = A_red(z)R(z) for some polynomial matrix R(z). ♦
8.13.2 Relation Between Predictor Polynomials
We now prove a crucial property of the optimal MIMO predictor polynomials. The property depends on the vector version of Levinson's upward recursion, Eqs. (8.37(f))--(8.37(g)). First note that we can solve for A_m(z) and B_m(z) in terms of A_{m+1}(z) and B_{m+1}(z) and obtain
$$A_m(z) = \bigl[I - K^f_{m+1}(K^b_{m+1})^\dagger\bigr]^{-1}\bigl[A_{m+1}(z) - zK^f_{m+1}B_{m+1}(z)\bigr] \qquad (8.94)$$
and
$$B_m(z) = \bigl[I - (K^b_{m+1})^\dagger K^f_{m+1}\bigr]^{-1}\bigl[zB_{m+1}(z) - (K^b_{m+1})^\dagger A_{m+1}(z)\bigr]. \qquad (8.95)$$
This is called the downward recursion because we go from m + 1 to m using these equations. The inverses indicated in Eqs. (8.94) and (8.95) exist by virtue of the assumption that the error covariances at stages m and m + 1 are nonsingular and related as in Eqs. (8.35) and (8.36).
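The following NumPy sketch (ours, on hypothetical coefficient values) spells out Eqs. (8.94) and (8.95), with each L × L polynomial matrix stored as an array of coefficients of z^{-k}. To have something to recover, it first builds A_{m+1}(z), B_{m+1}(z) from arbitrary A_m(z), B_m(z) using the order-update form obtained by inverting (8.94)-(8.95), and then checks that the downward recursion returns the originals.

```python
import numpy as np

rng = np.random.default_rng(4)
L, m = 2, 3                        # L x L polynomial matrices, order m

def polmul(K, B):
    """Multiply a constant matrix K into a polynomial matrix B(z) (coeffs of z^{-k})."""
    return np.einsum('ij,kjl->kil', K, B)

# Hypothetical lattice matrices and order-m predictor polynomials (real case, so dagger = T).
Kf = 0.4 * rng.standard_normal((L, L))
Kb = 0.4 * rng.standard_normal((L, L))
A_m = rng.standard_normal((m + 1, L, L));  A_m[0] = np.eye(L)
B_m = rng.standard_normal((m + 2, L, L));  B_m[0] = 0     # B_m(z) has no constant term

# Order update obtained by inverting Eqs. (8.94)-(8.95):
#   A_{m+1}(z) = A_m(z) + K^f B_m(z),   B_{m+1}(z) = z^{-1}[B_m(z) + (K^b)^T A_m(z)]
A_n = np.zeros((m + 2, L, L));  A_n[:m + 1] += A_m;       A_n += polmul(Kf, B_m)
B_n = np.zeros((m + 3, L, L));  B_n[1:m + 3] = B_m;       B_n[1:m + 2] += polmul(Kb.T, A_m)

# Downward recursion, Eqs. (8.94)-(8.95): recover A_m(z), B_m(z) from A_{m+1}, B_{m+1}.
zB = B_n[1:]                       # z B_{m+1}(z); the constant term of B_{m+1} is zero
A_rec = polmul(np.linalg.inv(np.eye(L) - Kf @ Kb.T), A_n[:m + 1] - polmul(Kf, zB[:m + 1]))
B_rec = polmul(np.linalg.inv(np.eye(L) - Kb.T @ Kf), zB - polmul(Kb.T, A_n))
print(np.allclose(A_rec, A_m), np.allclose(B_rec, B_m))   # True True
```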
Lemma 8.6. Coprimality of Predictor Polynomials. The optimal predictor polynomials A_N(z) and B_N(z) are right coprime for any N, so that the MFD given by B_N(z)A_N^{-1}(z) is irreducible. ♦
Proof. If R(z) is an rcd of A_{m+1}(z) and B_{m+1}(z), then we see from Eqs. (8.94) and (8.95) that it is also an rcd of A_m(z) and B_m(z). Repeating this argument, we see that any rcd R(z) of A_N(z) and B_N(z) for N > 0 must be an rcd of A_0(z) and B_0(z). But A_0(z) = I and B_0(z) = z^{-1}I, so these have no rcd other than unimodular matrices. Thus, R(z) has to be unimodular, or equivalently, A_N(z) and B_N(z) are right coprime.
We are now ready to prove the following result. The only assumption is that the error covariances E^f_N and E^b_N are nonsingular (as assumed throughout the chapter).
Theorem 8.3. Minimum-Phase Property. The optimal predictor polynomial matrix A_N(z) is a minimum-phase polynomial, that is, all the zeros of [det A_N(z)] are in |z| < 1. ♦
Proof. From Eq. (8.90), we have
$$B_N(z)A_N^{-1}(z) = z^{-1}(E^b_N)^{1/2}G_N(z)(E^f_N)^{-1/2}.$$
From Theorem 8.2, we know that G_N(z) is a stable transfer matrix (all poles restricted to |z| < 1). This shows that B_N(z)A_N^{-1}(z) is stable as well. Because B_N(z)A_N^{-1}(z) is an irreducible MFD (Lemma 8.6), all the zeros of [det A_N(z)] are poles of B_N(z)A_N^{-1}(z) (Lemma 8.4). So all zeros of [det A_N(z)] are in |z| < 1.
Refer again to Eq. (8.90), which is reproduced below:
$$z^{-1}G_N(z) \stackrel{\Delta}{=} \underbrace{[E^b_N]^{-1/2}B_N(z)}_{\text{call this } B(z)}\;\underbrace{A_N^{-1}(z)[E^f_N]^{1/2}}_{\text{call this } A^{-1}(z)} = B(z)A^{-1}(z). \qquad (8.96)$$
Because G_N(z) is paraunitary, it follows that B(z)A^{-1}(z) is paraunitary as well. Thus,
$$\tilde{A}^{-1}(z)\tilde{B}(z)B(z)A^{-1}(z) = I,$$
from which it follows that
$$\tilde{B}(z)B(z) = \tilde{A}(z)A(z) \quad\text{for all } z. \qquad (8.97)$$
That is,
$$\tilde{B}_N(z)[E^b_N]^{-1}B_N(z) = \tilde{A}_N(z)[E^f_N]^{-1}A_N(z). \qquad (8.98)$$
So, the forward and backward predictor polynomials are related by the above equation. For the scalar case, note that we had E^b_N = E^f_N, so that this reduces to $\tilde{B}_N(z)B_N(z) = \tilde{A}_N(z)A_N(z)$, which, of course, is equivalent to the statement that B_N(z)/A_N(z) is all-pass. Summarizing, the main
properties of the optimal predictor polynomials for vector LPC are the following:
1. A_N(z) and B_N(z) are right coprime, that is, B_N(z)A_N^{-1}(z) is irreducible (Lemma 8.6).
2. A_N(z) has the minimum-phase property (Theorem 8.3).
3. $\tilde{B}_N(z)[E^b_N]^{-1}B_N(z) = \tilde{A}_N(z)[E^f_N]^{-1}A_N(z)$, or equivalently,
$$z^{-1}G_N(z) \stackrel{\Delta}{=} [E^b_N]^{-1/2}B_N(z)A_N^{-1}(z)[E^f_N]^{1/2} \qquad (8.99)$$
is paraunitary.
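Property 2 can be tested numerically for any given polynomial matrix. The sketch below (ours; the example A(z) is a hypothetical product (I - C1 z^{-1})(I - C2 z^{-1}) with ||Ci|| < 1, which is minimum-phase by construction) evaluates det A(z) on an FFT grid, recovers the coefficients of the scalar polynomial in z^{-1}, and inspects its zeros.

```python
import numpy as np

def det_poly_zeros(A):
    """Zeros of det A(z) for a polynomial matrix A(z) = sum_k A[k] z^{-k}.

    A has shape (deg+1, L, L).  det A(z) is evaluated on an FFT grid, its
    coefficients in z^{-1} are recovered by an inverse FFT, and the zeros are
    the roots of the corresponding polynomial in z.
    """
    degp1, L, _ = A.shape
    D = (degp1 - 1) * L                      # det A(z) has degree at most D in z^{-1}
    M = D + 1
    w = 2 * np.pi * np.arange(M) / M
    dets = np.array([np.linalg.det(np.tensordot(np.exp(-1j * w[i] * np.arange(degp1)),
                                                A, axes=(0, 0))) for i in range(M)])
    d = np.fft.ifft(dets)                    # d[k] = coefficient of z^{-k}
    return np.roots(d)                       # zeros of d0 z^D + d1 z^{D-1} + ... + dD

# Hypothetical example: A(z) = (I - C1 z^{-1})(I - C2 z^{-1}) with ||Ci|| < 1.
rng = np.random.default_rng(5)
mats = rng.standard_normal((2, 2, 2))
C1, C2 = (0.8 * Ci / np.linalg.norm(Ci, 2) for Ci in mats)
A = np.stack([np.eye(2), -(C1 + C2), C1 @ C2])
zeros = det_poly_zeros(A)
print(np.abs(zeros))                         # all magnitudes below 1
print(np.all(np.abs(zeros) < 1))             # True: det A(z) is minimum-phase
```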
8.14 CONCLUDING REMARKS
Although the vector LPC theory proceeds along the same lines as does the scalar LPC theory, there
are a number of important differences as shown in this chapter. The scalar all-pass lattice structure
introduced in Section 4.3 has applications in the design of robust digital filter structures (Gray
and Markel, 1973, 1975, 1980; Vaidyanathan and Mitra, 1984, 1985; Vaidyanathan et al., 1986).
Similarly, the paraunitary lattice structure for MIMO linear prediction can be used for the design of
low sensitivity digital filter structures, as shown in the pioneering work of Rao and Kailath (1984).
The MIMO lattice is also studied by Vaidyanathan and Mitra (1985).
• • • •
APPENDIX A
Linear Estimation of Random Variables
Let x_1, x_2, ..., x_N be a set of N random variables, possibly complex. Let x be another random variable, possibly correlated to x_i in some way. We wish to obtain an estimate of x from the observed values of x_i, using a linear combination of the form
$$\hat{x} = -a_1^* x_1 - a_2^* x_2 - \cdots - a_N^* x_N. \qquad (A.1)$$
The use of -a_i^* rather than just a_i might appear to be an unnecessary complication, but will become notationally convenient. We say that x̂ is a linear estimate of x in terms of the ''observed'' variables x_i. The estimation error is defined to be
$$e = x - \hat{x}, \qquad (A.2)$$
so that
$$e = x + a_1^* x_1 + a_2^* x_2 + \cdots + a_N^* x_N. \qquad (A.3)$$
A classic problem in estimation theory is the identification of the constants a_i such that the mean square error
$$E = E[\,|e|^2\,] \qquad (A.4)$$
is minimized. The estimate x̂ is then said to be the minimum mean square error (MMSE) linear estimate.
A.1 THE ORTHOGONALITY PRINCIPLE
The minimization of mean square error is achieved with the help of the following fundamental result:
Theorem A.1. Orthogonality Principle. The estimate x̂ in Eq. (A.1) results in minimum mean square error among all linear estimates, if and only if the estimation error e is orthogonal to the N random variables x_i, that is, E[e x_i^*] = 0 for 1 ≤ i ≤ N. ♦
Proof. Let x̂_⊥ denote the estimate obtained when the orthogonality condition is satisfied. Thus, the error
$$e_\perp \stackrel{\Delta}{=} x - \hat{x}_\perp \qquad (A.5)$$
satisfies
$$E[e_\perp x_i^*] = 0, \qquad 1 \le i \le N. \qquad (A.6)$$
Let x̂ be some other linear estimate, with error e. Then,
$$e = x - \hat{x} = e_\perp + \hat{x}_\perp - \hat{x} \quad (\text{from Eq. (A.5)}).$$
This has the mean square value
$$E[\,|e|^2\,] = E[\,|e_\perp|^2\,] + E[\,|\hat{x}_\perp - \hat{x}|^2\,] + E[(\hat{x}_\perp - \hat{x})e_\perp^*] + E[e_\perp(\hat{x}_\perp^* - \hat{x}^*)]. \qquad (A.7)$$
Because x̂_⊥ and x̂ are both linear combinations as in Eq. (A.1), we can write
$$\hat{x}_\perp - \hat{x} = \sum_{i=1}^{N} c_i x_i. \qquad (A.8)$$
We therefore have
$$E[e_\perp(\hat{x}_\perp^* - \hat{x}^*)] = E\Bigl[e_\perp \sum_{i=1}^{N} c_i^* x_i^*\Bigr] \quad (\text{from Eq. (A.8)}) = \sum_{i=1}^{N} c_i^*\,E[e_\perp x_i^*] = 0 \quad (\text{from Eq. (A.6)}),$$
so that Eq. (A.7) becomes
$$E[\,|e|^2\,] = E[\,|e_\perp|^2\,] + E[\,|\hat{x}_\perp - \hat{x}|^2\,].$$
The second term on the right-hand side above is nonnegative, and we conclude that
$$E[\,|e|^2\,] \ge E[\,|e_\perp|^2\,].$$
Equality holds in the above, if and only if x̂_⊥ = x̂.
A.2 CLOSED-FORM SOLUTION
If we impose the orthogonality condition on the error (A.3), we obtain the N equations
$$E[e\,x_i^*] = 0, \qquad 1 \le i \le N \quad (\text{orthogonality}). \qquad (A.9)$$
We use these to find the coefficients a_i, which result in the optimal estimate. (The subscript ⊥, which was used in the above proof, will be discontinued for simplicity.) Defining the column vectors
$$\mathbf{x} = [\,x_1\; x_2\; \ldots\; x_N\,]^T, \qquad \mathbf{a} = [\,a_1\; a_2\; \ldots\; a_N\,]^T,$$
we can write the error e as
$$e = x + \mathbf{a}^\dagger\mathbf{x}, \qquad (A.10)$$
and the orthogonality condition Eq. (A.9) as
$$E[e\,\mathbf{x}^\dagger] = 0. \qquad (A.11)$$
Substituting from Eq. (A.10), we arrive at
$$E[x\,\mathbf{x}^\dagger] + \mathbf{a}^\dagger E[\mathbf{x}\mathbf{x}^\dagger] = 0. \qquad (A.12)$$
In the above equation, the matrix
$$\mathbf{R} \stackrel{\Delta}{=} E[\mathbf{x}\mathbf{x}^\dagger]$$
is the correlation matrix for the random vector x. This matrix is Hermitian, that is, R^† = R. Defining the vector
$$\mathbf{r} = E[\mathbf{x}\,x^*],$$
Eq. (A.12) simplifies to
$$\mathbf{R}\mathbf{a} = -\mathbf{r}.$$
More explicitly, in terms of the matrix elements, this can be written as
$$\underbrace{\begin{bmatrix}
E[\,|x_1|^2\,] & E[x_1x_2^*] & \cdots & E[x_1x_N^*]\\
E[x_2x_1^*] & E[\,|x_2|^2\,] & \cdots & E[x_2x_N^*]\\
\vdots & \vdots & \ddots & \vdots\\
E[x_Nx_1^*] & E[x_Nx_2^*] & \cdots & E[\,|x_N|^2\,]
\end{bmatrix}}_{\mathbf{R}}
\underbrace{\begin{bmatrix} a_1\\ a_2\\ \vdots\\ a_N \end{bmatrix}}_{\mathbf{a}}
= \underbrace{-\begin{bmatrix} E[x_1x^*]\\ E[x_2x^*]\\ \vdots\\ E[x_Nx^*] \end{bmatrix}}_{-\mathbf{r}}. \qquad (A.13)$$
The vector r is the cross-correlation vector. Its ith element represents the correlation between the random variable x and the observations x_i. If the correlation matrix R is nonsingular, we can solve for a and obtain
$$\mathbf{a} = -\mathbf{R}^{-1}\mathbf{r} \quad (\text{optimal linear estimator}). \qquad (A.14)$$
The autocorrelation matrix R is positive semidefinite because xx^† is positive semidefinite. If in addition R is nonsingular, then it is positive definite. The case of singular R will be discussed in Section A.4.
The above development does not assume that x or x_i have zero mean. Notice that if the correlation between x and the observation x_i is zero for each i, then r = 0. Under this condition, a = 0. This means that the best estimate is x̂ = 0 and the error is e = x itself!
A.3 CONSEQUENCES OF ORTHOGONALITY
Because the estimate x̂ is a linear combination of the random variables x_i, the orthogonality property (A.9) implies
$$E[e\,\hat{x}^*] = 0. \qquad (A.15)$$
That is, the optimal linear estimate x̂ is itself orthogonal to the error e. The random variable x and its estimate x̂ are related as x = x̂ + e. From this, we obtain
$$E[\,|x|^2\,] = E[\,|\hat{x}|^2\,] + \underbrace{E[\,|e|^2\,]}_{E} \qquad (A.16)$$
by exploiting Eq. (A.15).
Geometric Interpretation. A geometric interpretation of the orthogonality principle can be given by visualizing the random variables as vectors. Imagine we have a three-dimensional Euclidean vector x, which should be approximated in the two-dimensional plane (x_1, x_2) (Fig. A.1). We see that the approximation error is given by the vector e and that the length of this vector is minimized if and only if it is perpendicular to the (x_1, x_2) plane. Under this condition, the lengths of the vector x, the error e, and the estimate x̂ are indeed related as
$$(\text{length of } x)^2 = (\text{length of } \hat{x})^2 + (\text{length of } e)^2,$$
which is similar to Eq. (A.16). The above relation is also reminiscent of the Pythagorean theorem.
Expression for the Minimized Error. Assuming that the coefficients a_i have been found, we can compute the minimized mean square error as follows:
$$E = E[\,|e|^2\,] = E[ee^*] = E[e(x^* + \mathbf{x}^\dagger\mathbf{a})] = E[ex^*] \quad (\text{by orthogonality (A.11)}).$$
FIGURE A.1: Pertains to the discussion of the orthogonality principle.
Using Eq. (A.10) and the definition of r, this can be rewritten as
$$\underbrace{E[\,|e|^2\,]}_{E} = E[\,|x|^2\,] + \mathbf{a}^\dagger\mathbf{r}. \qquad (A.17)$$
Example A.1: Estimating a Random Variable. Let x and x_1 be real random variables. We wish to estimate x from the observed values of x_1 using the linear estimate
$$\hat{x} = -a_1 x_1.$$
The normal equations are obtained by setting N = 1 in Eq. (A.13), that is,
$$E[x_1^2]\times a_1 = -E[x_1 x].$$
Thus,
$$a_1 = \frac{-E[x_1 x]}{E[x_1^2]}.$$
The mean square value of the estimation error is
$$E[e^2] = E[x^2] - \frac{(E[x_1 x])^2}{E[x_1^2]} \quad [\text{from Eq. (A.17)}].$$
Thus, a_1 is proportional to the cross-correlation E[x_1 x]. If E[x_1 x] = 0, then a_1 = 0, that is, the best estimate is x̂ = 0, and the mean square error is
$$E[e^2] = E[x^2] = \text{mean square value of } x \text{ itself!}$$
A.4 SINGULARITY OF THE AUTOCORRELATION MATRIX
If the autocorrelation matrix R defined in Eq. (A.13) is singular (i.e., [det R] = 0), we cannot invert it to obtain a unique solution a as in Eq. (A.14). To analyze the meaning of singularity, recall (Horn and Johnson, 1985) that R is singular if and only if Rv = 0 for some vector v ≠ 0. Thus, singularity of R implies v^†Rv = 0, that is,
$$\mathbf{v}^\dagger E[\mathbf{x}\mathbf{x}^\dagger]\mathbf{v} = 0, \qquad \mathbf{v} \ne 0.$$
Because v is a constant with respect to the expectation operation, we can rewrite this as E[v^†xx^†v] = 0. That is,
$$E[\,|\mathbf{v}^\dagger\mathbf{x}|^2\,] = 0.$$
This implies that the scalar random variable v^†x is zero, that is,
$$\mathbf{v}^\dagger\mathbf{x} = v_1^*x_1 + v_2^*x_2 + \cdots + v_N^*x_N = 0,$$
with x = [x_1 x_2 ... x_N]^T and v = [v_1 v_2 ... v_N]^T. Because v ≠ 0, at least one of the components v_i is nonzero. In other words, the N random variables x_i are linearly dependent. Assuming, for example, that v_N is nonzero, we can write
$$x_N = -[v_1^*x_1 + \cdots + v_{N-1}^*x_{N-1}]/v_N^*.$$
That is, we can drop the term a_N^* x_N from Eq. (A.3) and solve a smaller estimation problem, without changing the value of the optimal estimate. We can continue to eliminate the linear dependence among random variables in this manner until we finally arrive at a smaller set of random variables, say, x_i, 1 ≤ i ≤ L, which do not have linear dependence. The L × L correlation matrix of these random variables is then nonsingular, and we can solve a set of L equations (normal equations) similar to Eq. (A.13), to find the unique set of L coefficients a_i.
Example A.2: Singularity of Autocorrelation. Consider the estimation of x from two real random variables x_1 and x_2. Let the 2 × 2 autocorrelation matrix be
$$\mathbf{R} = \begin{bmatrix} 1 & a\\ a & a^2 \end{bmatrix}.$$
We have [det R] = a^2 - a^2 = 0, so that R is singular. We see that the vector
$$\mathbf{v} = \begin{bmatrix} a\\ -1 \end{bmatrix}$$
has the property that Rv = 0. From the above theory, we conclude that x_1 and x_2 are linearly dependent and ax_1 - x_2 = 0, that is,
$$x_2 = ax_1.$$
In other words, x_2 is completely determined by x_1.
APPENDIX B
Proof of a Property of Autocorrelations
The autocorrelation R(k) of a WSS process satisfies the inequality
$$R(0) \ge |R(k)|. \qquad (B.1)$$
To prove this, first observe that
$$E\bigl|x(0) - e^{j\phi}x(-k)\bigr|^2 \ge 0 \qquad (B.2)$$
for any real constant φ. The left-hand side can be rewritten as
$$E\Bigl[\,|x(0)|^2 + |x(-k)|^2 - e^{j\phi}x^*(0)x(-k) - e^{-j\phi}x(0)x^*(-k)\Bigr] = 2R(0) - e^{j\phi}R^*(k) - e^{-j\phi}R(k).$$
Because this holds for any real φ, let us choose φ so that R^*(k)e^{jφ} is real and nonnegative, that is, R^*(k)e^{jφ} = |R(k)|. Thus,
$$E\bigl|x(0) - e^{j\phi}x(-k)\bigr|^2 = 2R(0) - 2|R(k)| \ge 0,$$
from (B.2). This proves Eq. (B.1). Equality in (B.1) is possible if and only if
$$E\bigl|x(0) - e^{j\phi}x(-k)\bigr|^2 = 0,$$
that is,
$$E\bigl|x(n) - e^{j\phi}x(n-k)\bigr|^2 = 0$$
for all n (by the WSS property). Thus,
$$x(n) = e^{j\phi}x(n-k),$$
which means that all samples of x(n) can be predicted perfectly if we know the first k samples x(0), ..., x(k - 1). Summarizing, as long as the process is not perfectly predictable, we have
$$R(0) > |R(k)| \qquad (B.3)$$
for k ≠ 0.
APPENDIX C
Stability of the Inverse Filter
We now give a direct proof that the optimum prediction polynomial A_N(z) has all zeros p_k inside the unit circle, that is, |p_k| < 1, as long as x(n) is not a line spectral process (i.e., not fully predictable). This gives a direct proof (without appealing to Levinson's recursion) that the causal IIR filter 1/A_N(z) is stable. This proof appeared in Vaidyanathan et al. (1997). Also see Lang and McClellan (1979).
Proof. The prediction filter reproduced in Fig. C.1(a) can be redrawn as in Fig. C.1(b), where q is some zero of A_N(z) and C_{N-1}(z) is causal FIR with order N - 1. First observe that 1 - qz^{-1} is the optimum first-order prediction polynomial for the WSS process y(n), for otherwise, the mean square value of its output could be made smaller by using a different q, which contradicts the fact that A_N(z) is optimal. Thus, q is the optimal coefficient
$$q = \frac{R_{yy}(1)}{R_{yy}(0)}, \qquad (C.1)$$
where R_{yy}(k) is the autocorrelation of the WSS process y(n). From Appendix B, we know that R_{yy}(0) ≥ |R_{yy}(k)| for any k, which shows
$$|q| \le 1.$$
This proves that all zeros of A_N(z) satisfy |z_k| ≤ 1. To show that, in fact, |q| < 1 when x(n) is not a line spectral process, we have to work a little harder. Recall that the optimal e^f_N(n) is orthogonal to x(n - 1), ..., x(n - N) (Section 2.3). Because y(n - 1) is a linear combination of x(n - 1), ..., x(n - N), it is orthogonal to e^f_N(n) as well:
$$E[e^f_N(n)\,y^*(n-1)] = 0. \qquad (C.2)$$
FIGURE C.1: (a) The prediction filter and (b) an equivalent drawing, with a factor (1 - qz^{-1}) shown separately.
Thus, using the fact that e^f_N(n) = y(n) - qy(n - 1), we can write
$$\begin{aligned}
E^f_N = E[\,|e^f_N(n)|^2\,] &= E[e^f_N(n)(y(n) - qy(n-1))^*]\\
&= E[e^f_N(n)y^*(n)] \quad (\text{from (C.2)})\\
&= E[(y(n) - qy(n-1))y^*(n)]\\
&= R_{yy}(0) - qR_{yy}^*(1)\\
&= R_{yy}(0)\bigl(1 - |q|^2\bigr) \quad (\text{from (C.1)}).
\end{aligned}$$
Because x(n) is not a line spectral process, we have E^f_N ≠ 0, that is, E^f_N > 0. Thus, (1 - |q|^2) > 0, which proves |q| < 1 indeed.
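A numerical companion to this proof (ours; the AR(2) model used to generate the autocorrelation is an arbitrary choice) solves the order-N normal equations from a sample autocorrelation and confirms that the zeros of A_N(z) lie strictly inside the unit circle.

```python
import numpy as np

# Autocorrelation of an illustrative AR(2) process x(n) = 1.2 x(n-1) - 0.5 x(n-2) + w(n),
# estimated from a long realization (biased estimator, which is positive semidefinite).
rng = np.random.default_rng(8)
n_samp, N = 100_000, 4
x = np.zeros(n_samp)
w = rng.standard_normal(n_samp)
for n in range(2, n_samp):
    x[n] = 1.2 * x[n - 1] - 0.5 * x[n - 2] + w[n]
R = np.array([x[: n_samp - k] @ x[k:] / n_samp for k in range(N + 1)])

# Normal equations R_N a = -r (Section 2.3), then A_N(z) = 1 + a_1 z^{-1} + ... + a_N z^{-N}.
RN = np.array([[R[abs(i - j)] for j in range(N)] for i in range(N)])
a = np.linalg.solve(RN, -R[1 : N + 1])
zeros = np.roots(np.concatenate([[1.0], a]))
print(np.abs(zeros))                  # all strictly less than 1
print(np.all(np.abs(zeros) < 1))      # True
```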
• • • •
APPENDIX D
Recursion Satisfied by AR Autocorrelations
Let a_{N,n}^* denote the optimal Nth-order prediction coefficients and E^f_N the corresponding prediction error for a WSS process x(n) with autocorrelation R(k). Then, the autocorrelation satisfies the following recursive equations:
$$R(k) = \begin{cases}
-a_{N,1}^*R(k-1) - a_{N,2}^*R(k-2) - \cdots - a_{N,N}^*R(k-N) + E^f_N, & k = 0\\
-a_{N,1}^*R(k-1) - a_{N,2}^*R(k-2) - \cdots - a_{N,N}^*R(k-N), & 1 \le k \le N.
\end{cases} \qquad (D.1)$$
In particular, if the process x(n) is AR(N), then the second equation can be made stronger:
$$R(k) = -a_{N,1}^*R(k-1) - a_{N,2}^*R(k-2) - \cdots - a_{N,N}^*R(k-N), \quad \text{for all } k > 0. \qquad (D.2)$$
Proof. The above equations are nothing but the augmented normal equations rewritten. To see this, note that the 0th row in Eq. (2.20) is precisely the k = 0 case in (D.1). For 1 ≤ k ≤ N, the kth equation in (2.20) is
$$R^*(k) + a_{N,1}R^*(k-1) + \cdots + a_{N,N}R^*(k-N) = 0 \quad (\text{using } R(i) = R^*(-i)).$$
Conjugating both sides, we get the second equation in Eq. (D.1). For the case of the AR(N) process, we know that the (N + m)th-order optimal predictor for any m > 0 is the same as the Nth-order optimal predictor, that is,
$$a_{N+m,n} = \begin{cases} a_{N,n}, & 1 \le n \le N\\ 0, & n > N. \end{cases}$$
So, the second equation in Eq. (D.1), written for N + m instead of N, yields
$$R(k) = -a_{N,1}^*R(k-1) - a_{N,2}^*R(k-2) - \cdots - a_{N,N}^*R(k-N), \qquad 1 \le k \le N + m$$
for any m ≥ 0. This proves Eq. (D.2) for AR(N) processes.
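Eq. (D.2) gives a simple way to extend the autocorrelation of an AR(N) process beyond lag N. The sketch below (ours; the AR(2) coefficients and unit error variance are illustrative) first obtains R(0), R(1), R(2) from (D.1) and then extrapolates with (D.2).

```python
import numpy as np

def extrapolate_ar_autocorr(R_init, a, num_extra):
    """Extend R(k) using Eq. (D.2): R(k) = -sum_i a_{N,i}^* R(k-i), k > 0.

    R_init: array [R(0), ..., R(N)];  a: predictor coefficients [a_{N,1}, ..., a_{N,N}].
    """
    R = list(R_init)
    N = len(a)
    for k in range(len(R_init), len(R_init) + num_extra):
        R.append(-sum(np.conj(a[i]) * R[k - 1 - i] for i in range(N)))
    return np.array(R)

# Illustrative AR(2) example: A_2(z) = 1 - 0.9 z^{-1} + 0.2 z^{-2}, prediction error variance 1.
a = np.array([-0.9, 0.2])
# R(0), R(1), R(2) obtained from (D.1) with k = 0, 1, 2 and E_2^f = 1:
M = np.array([[1.0, -0.9, 0.2],       # k = 0:  R0 + a1 R1 + a2 R2 = E_2^f
              [-0.9, 1.2, 0.0],       # k = 1:  R1 + a1 R0 + a2 R1 = 0
              [0.2, -0.9, 1.0]])      # k = 2:  R2 + a1 R1 + a2 R0 = 0
R012 = np.linalg.solve(M, [1.0, 0.0, 0.0])
R = extrapolate_ar_autocorr(R012, a, num_extra=5)
print(np.round(R, 4))                 # R(0), ..., R(7), decaying as expected
```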
• • • •
Problems
1. Suppose x(n) is a zero-mean white WSS random process with variance σ^2. What are the coefficients a_{N,i} of the Nth-order optimal predictor? What is the prediction error variance?
2. Consider the linear prediction problem for a real WSS process x(n) with autocorrelation R(k). The prediction coefficients a_{N,i} are now real. Show that the mean square prediction error E^f_N can be written as
$$E^f_N = R(0) + \mathbf{a}^T\mathbf{R}_N\mathbf{a} + 2\mathbf{a}^T\mathbf{r},$$
where
$$\mathbf{a} = [\,a_{N,1}\; a_{N,2}\; \ldots\; a_{N,N}\,]^T,$$
and R_N and r are as in Eq. (2.8). By differentiating E^f_N with respect to each a_{N,i} and setting it to zero, show that we again obtain the normal equations (derived in Section 2.3 using the orthogonality principle).
3. Consider a WSS process x(n) with autocorrelation
$$R(k) = \begin{cases} b^{\,0.5|k|} & \text{for } k \text{ even}\\ 0 & \text{for } k \text{ odd} \end{cases} \qquad (P.3)$$
where 0 < b < 1.
a) Compute the power spectral density S_{xx}(e^{jω}).
b) Find the predictor polynomials A_1(z), A_2(z), the lattice coefficients k_1, k_2, and the error variances E^f_1, E^f_2 using Levinson's recursion.
c) Show that e^f_2(n) is white.
4. Someone performs optimal linear prediction on a WSS process x(n) and finds that the first two lattice coefficients are k_1 = 0.5 and k_2 = 0.25 and that the second-order mean square prediction error is E^f_2 = 1.0. Using these values, compute the following.
a) The predictor polynomials A_1(z) and A_2(z).
b) The mean square errors E^f_0 and E^f_1.
c) The autocorrelation coefficients R(m) of the process x(n), for m = 0, 1, 2.
5. Someone performs optimal linear prediction on a WSS process x(n) and finds that the first four lattice coefficients are
$$k_1 = 0, \quad k_2 = 0.5, \quad k_3 = 0, \quad k_4 = 0.25, \qquad (P.5)$$
and that the fourth-order mean square prediction error is E^f_4 = 2.0. Using these values, compute the following:
a) The predictor polynomials A_m(z) for m = 1, 2, 3, 4.
b) The mean square errors E^f_m for m = 0, 1, 2, 3.
c) The autocorrelation coefficients R(m) of the process x(n), for m = 0, 1, 2, 3, 4.
6. Show that the quantity α_m arising in Levinson's recursion can be expressed in terms of a cross-correlation, that is,
$$\alpha_m^* = E[e^f_m(n)\,x^*(n-m-1)]. \qquad (P.6)$$
7. Estimation From Noisy Data. Let v be a noisy measurement of a random variable u, that is,
$$v = u + \epsilon, \qquad (P.7a)$$
where the random variable ε represents noise. Assume that this noise has zero mean and that it is uncorrelated with u. From this noisy measurement v, we would like to find an estimate of u of the form û = av. Let e = u - û be the estimation error. Compute the value of a, which minimizes the mean square error E[|e|^2]. Hence, show that the best estimate of u is
$$\hat{u} = \left(\frac{R_{uu}}{R_{uu} + \sigma_\epsilon^2}\right)v, \qquad (P.7b)$$
where R_{uu} = E[|u|^2] and σ_ε^2 = E[|ε|^2] (noise variance). Show also that the minimized mean square error is
$$E[|e|^2] = \frac{R_{uu}\,\sigma_\epsilon^2}{R_{uu} + \sigma_\epsilon^2}. \qquad (P.7c)$$
Thus, when the noise variance σ_ε^2 is very large, the best estimate approaches zero; this means that we should not trust the measurement and just take the estimate to be zero. In this extreme case, the error E[|e|^2] approaches R_{uu}. Thus, with the above estimate, the mean square estimation error will never exceed the mean square value R_{uu} of the random variable u to be estimated, no matter how strong the noise is!
8. Valid Autocorrelations. Let R be a Hermitian positive definite matrix. Show that it is a valid autocorrelation matrix. That is, show that there exists a random vector x such that R = E[xx^†].
9. Let J denote the N × N antidiagonal matrix, demonstrated below for N = 3:
$$J = \begin{bmatrix} 0 & 0 & 1\\ 0 & 1 & 0\\ 1 & 0 & 0 \end{bmatrix}. \qquad (P.9)$$
If A is any N × N matrix, then JA represents a matrix whose rows are identical to those of A but renumbered in reverse order. Similarly, AJ corresponds to column reversal.
a) Now assume that A is Hermitian and Toeplitz. Show that JAJ = A^*.
b) Conversely, suppose JAJ = A^*. Does it mean that A is Hermitian and Toeplitz? Justify.
c) Suppose JAJ = A^* and A is Hermitian. Does it mean that A is Toeplitz? Justify.
10. Consider a WSS process x(n) with power spectrum
$$S_{xx}(e^{j\omega}) = \frac{1 - \rho^2}{1 + \rho^2 - 2\rho\cos\omega}, \qquad -1 < \rho < 1. \qquad (P.10)$$
Let R_2 represent the 2 × 2 autocorrelation matrix of x(n) as usual. Compute the eigenvalues of this matrix and verify that they are bounded by the extrema of the power spectrum (i.e., as in Eq. (2.28)).
11. Condition number and spectral dynamic range. Eq. (2.31) shows that if the power spectrum has wide variations (large S_max/S_min), then the condition number of R_N can, in general, be very large. However, this does not necessarily always happen: we can create examples where S_max/S_min is arbitrarily large, and yet the condition number of R_N is small. For example, given the autocorrelation R(k) with Fourier transform S_{xx}(e^{jω}), define a new sequence as follows:
$$R_{yy}(k) = \begin{cases} R(k/N) & \text{if } k \text{ is a multiple of } N\\ 0 & \text{otherwise.} \end{cases}$$
a) Find the Fourier transform S_{yy}(e^{jω}) of R_{yy}(k), in terms of S_{xx}(e^{jω}).
b) Argue that R_{yy}(k) represents a valid autocorrelation for a WSS process y(n).
c) For the process y(n), write down the N × N autocorrelation matrix. What is its condition number?
d) For the process x(n), let γ denote the ratio S_max/S_min in its power spectrum S_{xx}(e^{jω}). For the process y(n), what is the corresponding ratio?
12. Consider two Hermitian matrices A and B such that A is the upper left submatrix of B, that is,
$$B = \begin{bmatrix} A & \times\\ \times & \times \end{bmatrix}. \qquad (P.12)$$
a) Show that the minimum eigenvalue of B cannot be larger than that of A. Similarly, show that the maximum eigenvalue of B cannot be smaller than that of A. Hint: Use Rayleigh's principle (Horn and Johnson, 1985).
b) Prove that the condition number N of the autocorrelation matrix R_m of a WSS process (Section 2.4.1) cannot decrease as the matrix size m grows.
13. Consider a WSS process x(n) with the first three autocorrelation coefficients
$$R(0) = 3, \quad R(1) = 2, \quad R(2) = 1. \qquad (P.13)$$
a) Compute the coefficients of the first- and second-order optimal predictor polynomials A_1(z) and A_2(z) by directly solving the normal equations.
b) By using Levinson's recursion, compute the predictor polynomials and mean square prediction errors for the first- and second-order predictors.
c) Draw the second-order FIR LPC lattice structure representing the optimal predictor and indicate the forward and backward prediction errors at the appropriate nodes. Indicate the values of the lattice coefficients k_1 and k_2.
d) Draw the second-order IIR lattice structure representing the optimal predictor and indicate the forward and backward prediction errors at the appropriate nodes. Indicate the values of the lattice coefficients k_1 and k_2.
14. Valid Toeplitz autocorrelation. Consider a set of numbers R(k), 0 ≤ k ≤ N, and define the matrix R_{N+1} as in Eq. (2.20). By construction, this matrix is Hermitian and Toeplitz. Suppose, in addition, it turns out that it is also positive definite. Then, show that the numbers R(k), 0 ≤ k ≤ N, are valid autocorrelation coefficients of some WSS process x(n). Give a constructive proof, that is, show how you would generate a WSS process x(n) such that its first N + 1 autocorrelation coefficients would be the elements R(0), R(1), ..., R(N).
15. Show that the coefficients b_{N,i} of the optimal backward predictor are indeed given by reversing and conjugating the coefficients of the forward predictor, as shown in Eq. (4.3).
16. Let e^f_m(n) denote the forward prediction error for the optimal mth-order predictor as usual. Show that the following orthogonality condition is satisfied:
$$E\Bigl[e^f_m(n)\,[e^f_k(n-1)]^*\Bigr] = 0, \qquad m > k. \qquad (P.16)$$
17. In Problem 4, suppose you are informed that x(n) is actually AR(2). Then, compute the autocorrelation R(m) for m = 3, 4.
18. In Example 2.1, the autocorrelation R(k) of a WSS process x(n) was supplied in closed form. Show that the process x(n) is in fact AR(2), that is, the prediction error e^f_2(n) is white.
19. Compute and sketch the entropy of a Gaussian WSS process, with autocorrelation R(k) = ρ^{|k|}, -1 < ρ < 1. Make qualitative remarks explaining the nature of the plot.
20. Consider a Gaussian AR(2) process with R(0) = 1, for which the second-order predictor polynomial is
$$A_2(z) = \Bigl(1 - \tfrac{1}{2}z^{-1}\Bigr)\Bigl(1 - \tfrac{1}{3}z^{-1}\Bigr). \qquad (P.20)$$
Compute the entropy of the process and the flatness measure γ_x^2.
21. Consider a real WSS Gaussian process x(n) with power spectrum sketched as shown in Fig. P.21.
a) Compute the entropy of the process.
FIGURE P.21: Sketch of the power spectrum S_{xx}(e^{jω}) (the original figure shows a piecewise-constant spectrum with levels 1, 2, 4 and breakpoints at ω = π/4, π/2, π).
b) Compute the limiting value E^f_∞ of the mean square prediction error.
c) Compute the flatness measure γ_x^2.
22. Consider a WSS process x(n) with autocorrelation R(k). Let the 3 × 3 autocorrelation matrix have the form
$$R_3 = \begin{bmatrix} 1 & \rho & \alpha\\ \rho & 1 & \rho\\ \alpha & \rho & 1 \end{bmatrix}, \qquad (P.22)$$
where ρ and α are real and -1 < ρ < 1. (Thus, R(0) = 1, R(1) = ρ, and R(2) = α.) If you are told that x(n) is a line spectral process of degree 2, what would be the permissible values of α? For each of these permissible values of α, do the following:
a) Find the line frequencies ω_1 and ω_2 and the powers at these line frequencies.
b) Find the recursive difference equation satisfied by R(k).
c) Find a closed-form expression for R(k).
d) Compute R(3) and R(4).
23. We now provide a direct, simpler proof of the result in Theorem 7.4. Let y(n) be a WSS random process such that if it is input to the FIR filter 1 - αz^{-1}, the output is identically zero (see Fig. P.23).
a) Show that the constant α must have unit magnitude. (Assume, of course, that y(n) itself is not identically zero.)
b) Now let x(n) be a WSS process satisfying the difference equation
$$x(n) = -\sum_{i=1}^{L} a_i^*\,x(n-i). \qquad (P.23)$$
That is, if x(n) is input to the FIR filter A(z) = 1 + Σ_{i=1}^{L} a_i^* z^{-i}, the output is identically zero. Furthermore, let there be no such FIR filter of lower order. Show then that all the zeros of A(z) must be on the unit circle.
FIGURE P.23: The filter 1 - αz^{-1} with input y(n) and output identically zero.
24. Let R_m be the m × m autocorrelation matrix of a Linespec(L) process x(n). We know that R_L has rank L and R_{L+1} is singular, with rank L. Using (some or all of) the facts that R_m is Hermitian, Toeplitz, and positive semidefinite for any m, show the following.
a) R_m has rank m for all m ≤ L.
b) R_m has rank exactly L for any m ≥ L.
25. Show that the sum of sinusoids considered in Example 7.3 is indeed WSS under the three conditions stated in the example.
26. Suppose a sequence of numbers
$$\alpha_0, \alpha_1, \alpha_2, \ldots \qquad (P.26a)$$
tends to a limit α. This means that, given ε > 0, there exists a finite integer N such that
$$|\alpha_i - \alpha| < \epsilon \quad \text{for } i > N. \qquad (P.26b)$$
a) Show that
$$\lim_{M\to\infty} \frac{1}{M}\sum_{i=0}^{M-1} \alpha_i = \alpha. \qquad (P.26c)$$
In other words, define a sequence β_M = (1/M) Σ_{i=0}^{M-1} α_i and show that it tends to α.
b) Suppose f(x) is a continuous function for all x in a given range R. Thus, for any fixed x in the range, if we are given a number ε > 0, we can find a number δ > 0 such that
$$|f(x_1) - f(x)| < \epsilon \quad \text{for } |x_1 - x| < \delta. \qquad (P.26d)$$
Let all members α_i of the sequence (P.26a) be in the range R. Show then that
$$\lim_{i\to\infty} f(\alpha_i) = f\Bigl(\lim_{i\to\infty}\alpha_i\Bigr) = f(\alpha). \qquad (P.26e)$$
27. Let R_1(k) and R_2(k) represent the autocorrelations of two WSS random processes. Which of the following sequences represent valid autocorrelations?
a) R_1(k) + R_2(k)
b) R_1(k) - R_2(k)
c) R_1(k)R_2(k)
d) Σ_n R_1(n)R_2(k - n)
e) R_1(2k)
f) $R(k) = \begin{cases} R_1(k/2) & \text{for } k \text{ even}\\ 0 & \text{for } k \text{ odd.} \end{cases}$
28. Let R(k) be the autocorrelation of a zero-mean WSS process and suppose R(k_1) = R(k_2) ≠ 0 for some k_2 > k_1 > 0. Does this mean that R(k) is periodic? Note: If k_1 = 0 and k_2 > 0, then the answer is yes, as asserted by Theorem 7.2.
29. Find an example of the autocorrelation R(k) for a WSS process such that, for some m, the autocorrelation matrices R_m and R_{m+1} have the same minimum eigenvalue, but the minimum eigenvalue of R_{m+2} is smaller.
30. Let x and y be real random variables such that E[xy] = E[x]E[y], that is, they are uncorrelated. This does not, in general, imply that E[x^k y^n] = E[x^k]E[y^n] (i.e., that x^k and y^n are uncorrelated) for arbitrary positive integers k, n. Prove this statement with an example.
31. It is well-known that the power spectrum S_{xx}(e^{jω}) of a scalar WSS process is nonnegative (Papoulis, 1965; Peebles, 1987). Using this, show that the power spectral matrix S_{xx}(e^{jω}) of a WSS vector process x(n) is positive semidefinite.
32. Let x(n) and y(n) be zero-mean WSS processes, with power spectra S_{xx}(e^{jω}) and S_{yy}(e^{jω}). Suppose S_{xx}(e^{jω})S_{yy}(e^{jω}) = 0 for all ω, that is, these spectra do not overlap, as demonstrated in the figure.
a) This does not mean that the processes are necessarily uncorrelated. Demonstrate this.
b) Assume, however, that the above processes are not only WSS but also jointly WSS (i.e., E[x(n)y^*(n - k)] does not depend on n). Show then that they are indeed uncorrelated.
FIGURE P.32: Nonoverlapping power spectra S_{xx}(e^{jω}) and S_{yy}(e^{jω}).
• • • •
References
1. Anderson, B. D. O., and Moore, J. B. Optimal filtering, Prentice-Hall, Englewood Cliffs, NJ,
1979.
2. Antoniou, A. Digital signal processing: signals, systems, and filters, McGraw-Hill, New York,
2006.
3. Antoniou, A., and Lu, W.-S. Practical optimization: algorithms and engineering applications,
Springer, New York, 2007.
4. Atal, B. S., and Schroeder, M. R. ‘‘Adaptive predictive coding of speech signals,’’ Bell. Sys.
Tech. J., vol. 49, pp. 1973--1986, October 1970. doi:10.1109/ICASSP.1980.1170967
5. Atal, B. S., and Schroeder, M. R. ‘‘Predictive coding of speech signals and subjective error
criteria,’’ IEEE Trans. Acoust. Speech Signal Process., vol. 27, pp. 247--254, June 1979.
6. Balabanian, N., and Bickart, T. A. Electrical network theory, John Wiley & Sons, New York,
1969.
7. Bellman, R. Introduction to matrix analysis, McGraw-Hill, New York, 1960.
8. Berlekamp, E. R. Algebraic coding theory, McGraw-Hill, New York, 1969.
9. Blahut, R. E. Fast algorithms for digital signal processing, Addison-Wesley, Reading, MA, 1985.
10. Burg, J. P. ‘‘The relationship between maximum entropy spectra and maximum likelihood
spectra,’’ Geophysics, vol. 37, pp. 375--376, April 1972. doi:10.1190/1.1440265
11. Chandrasekhar, S. ‘‘On the radiative equilibrium of a stellar atmosphere,’’ Astrophys. J., vol.
106, pp. 152--216, 1947.
12. Chellappa, R., and Kashyap, R. L. ‘‘Texture synthesis using 2-D noncausal autoregressive
models,’’ IEEE Trans. Acoust. Speech Signal Process., vol. 33, pp. 194--203, February 1985.
13. Chong, E. K. P., and Żak, S. H. An introduction to optimization, John Wiley & Sons, New York, 2001. doi:10.1109/MAP.1996.500234
14. Churchill, R. V., and Brown, J. W. Introduction to complex variables and applications, McGraw-
Hill, New York, 1984.
15. Cover, T. M., and Thomas, J. A. Elements of information theory, John Wiley & Sons, New
York, 1991.
16. Deller, J. R., Proakis, J. G., and Hansen, J. H. L. Discrete-time processing of speech signals,
Macmillan, New York, 1993.
17. Friedlander, B. ‘‘A lattice algorithm for factoring the spectrum of a moving average pro-
cess,’’ IEEE Trans. Automat. Control, vol. 28, pp. 1051--1055, November 1983. doi:10.1109/
TAC.1983.1103175
18. Gersho, A., and Gray, R. M. Vector quantization and signal compression, Kluwer Academic
Press, Boston, 1992.
19. Golub, G. H., and Van Loan, C. F. Matrix computations, Johns Hopkins University Press,
Baltimore, MD, 1989.
20. Gorokhov, A., and Loubaton, P. ‘‘Blind identification of MIMO-FIR systems: a general-
ized linear prediction approach,’’ Signal Process., vol. 73, pp. 105--124, 1999. doi:10.1016/
S0165-1684(98)00187-X
21. Gray, Jr., A. H. ‘‘Passive cascaded lattice digital filters,’’ IEEE Trans. Circuits Syst., vol. CAS-27, pp. 337--344, May 1980.
22. Gray, R. M. ‘‘On asymptotic eigenvalue distribution of Toeplitz matrices,’’ IEEE Trans. Inform. Theory, vol. 18, pp. 725--730, November 1972.
23. Gray, Jr., A. H., and Markel, J. D. ‘‘Digital lattice and ladder filter synthesis,’’ IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 491--500, December 1973. doi:10.1109/TAU.1973.1162522
24. Gray, Jr., A. H., and Markel, J. D. ‘‘A normalized digital filter structure,’’ IEEE Trans.
Acoust. Speech Signal Process., vol. ASSP-23, pp. 268--277, June 1975. doi:10.1109/TASSP.
1975.1162680
25. Grenander, U., and Szego, G. Toeplitz forms and their applications, University of California
Press, Berkeley, CA, 1958.
26. Hayes, M. H. Statistical digital signal processing and modeling, John Wiley & Sons, New York,
1996.
27. Haykin, S. Adaptive filter theory, Prentice-Hall, Upper Saddle River, NJ, 1986 and 2002.
28. Horn R. A., and Johnson, C. R. Matrix analysis, Cambridge University Press, Cambridge,
1985.
29. Itakura, F. ‘‘Line spectrum representation of linear predictive coefficients of speech signals,’’ J.
Acoust. Soc. Am., vol. 57, 1975.
30. Itakura, F., and Saito, S. ‘‘A statistical method for estimation of speech spectral density and
formant frequencies,’’ Trans IECE Jpn., vol. 53-A, pp. 36--43, 1970.
31. Jayant, N. S., and Noll, P. Digital coding of waveforms, Prentice-Hall, Englewood Cliffs, NJ,
1984. doi:10.1016/0165-1684(85)90053-2
32. Kailath, T. ‘‘A view of three decades of linear filtering theory,’’ IEEE Trans. Inform. Theory,
vol. 20, pp. 146--181, March 1974. doi:10.1109/TIT.1974.1055174
33. Kailath, T. Linear systems, Prentice Hall, Inc., Englewood Cliffs, NJ, 1980.
34. Kailath, T., Kung S-Y., and Morf, M. ‘‘Displacement ranks of a matrix,’’ Am. Math. Soc., vol.
1, pp. 769--773, September 1979. doi:10.1016/0022-247X(79)90124-0
35. Kailath, T., Sayed, A. H., and Hassibi, B. Linear estimation, Prentice-Hall, Englewood Cliffs,
NJ, 2000.
36. Kang, G. S., and Fransen, L. J. ‘‘Application of line-spectrum pairs to low-bit-rate speech
coders,’’ Proc. IEEE Int. Conf. Acoust. Speech Signal Process., pp. 7.3.1--7.3.4, May 1995.
37. Kay, S. M., and Marple, S. L., Jr., ‘‘Spectrum analysis---a modern perspective,’’ Proc. IEEE,
vol. 69, pp. 1380--1419, November 1981.
38. Kay, S. M. Modern spectral estimation: theory and application, Prentice-Hall, Englewood Cliffs,
NJ, 1988.
39. Kolmogorov, A. N. ‘‘Interpolation and extrapolation of stationary random sequences,’’ Izv.
Akad. Nauk SSSR Ser. Mat. 5, pp. 3--14, 1941.
40. Kreyszig, E. Advanced engineering mathematics, John Wiley & Sons, New York, 1972.
41. Kumaresan, R., and Tufts, D. W. ‘‘Estimating the angles of arrival of multiple plane waves,’’
IEEE Trans. Aerosp. Electron. Syst., vol. 19, pp. 134--139, January 1983.
42. Lang, S. W., and McClellan, J. H. ‘‘A simple proof of stability for all-pole linear prediction
models,’’ Proc. IEEE, vol. 67, pp. 860--861, May 1979.
43. Lepschy, A., Mian, G. A., and Viaro, U. ‘‘A note on line spectral frequencies,’’ IEEE Trans.
Acoust. Speech Signal Process., vol. 36, pp. 1355--1357, August 1988. doi:10.1109/29.1664
44. Levinson, N. ‘‘The Wiener rms error criterion in filter design and prediction,’’ J. Math. Phys.,
vol. 25, pp. 261--278, January 1947.
45. Lopez-Valcarce, R., and Dasgupta, S. ‘‘Blind channel equalization with colored sources based
on second-order statistics: a linear prediction approach,’’ IEEE Trans. Signal Process., vol. 49,
pp. 2050--2059, September 2001. doi:10.1109/78.942633
46. Makhoul, J. ‘‘Linear prediction: a tutorial review,’’ Proc. IEEE, vol. 63, pp. 561--580, April
1975.
47. Makhoul, J. ‘‘Stable and efficient lattice methods for linear prediction,’’ IEEE Trans. Acoust.
Speech Signal Process., vol. 25, pp. 423--428, October 1977.
48. Makhoul, J., Roucos, S., and Gish, H. ‘‘Vector quantization in speech coding,’’ Proc. IEEE,
vol. 73, pp. 1551--1588, November 1985. doi:10.1109/ISCAS.2003.1206020
49. Manolakis, D. G., Ingle, V. K., and Kogon, S. M. Statistical and adaptive signal processing,
McGraw-Hill, New York, 2000.
50. Markel, J. D., and Gray, A. H., Jr. Linear prediction of speech, Springer-Verlag, New York,
1976.
51. Marple, S. L., Jr. Digital spectral analysis, Prentice-Hall, Englewood Cliffs, NJ, 1987.
doi:10.1121/1.398548
52. Mitra, S. K. Digital signal processing: a computer-based approach, McGraw-Hill, New York,
2001.
53. Moon, T. K., and Stirling, W. C. Mathematical methods and algorithms for signal processing,
Prentice-Hall, Upper Saddle River, NJ, 2000.
54. Oppenheim, A. V., and Schafer, R. W. Discrete-time signal processing, Prentice-Hall, Englewood
Cliffs, NJ, 1999.
55. Papoulis, A. Probability, random variables and stochastic processes, McGraw-Hill, New York,
1965.
56. Papoulis, A. ‘‘Maximum entropy and spectrum estimation: a review,’’ IEEE Trans. Acoust.
Speech Signal Process., vol. 29, pp. 1176--1186, December 1981.
57. Paulraj, A., Roy, R., and Kailath, T. ‘‘A subspace rotation approach to signal parameter
estimation,’’ Proc. IEEE, vol. 74, pp. 1044--1046, July 1986.
58. Peebles, Jr., P. Z. Probability, random variables and random signal principles, McGraw-Hill,
New York, 1987.
59. Pisarenko, V. F. ‘‘The retrieval of harmonics from a covariance function,’’ Geophys. J. R. Astron.
Soc., vol. 33, pp. 347--366, 1973. doi:10.1111/j.1365-246X.1973.tb03424.x
60. Proakis, J. G., and Manolakis, D. G. Digital signal processing: principles, alogrithms and
applications, Prentice-Hall, Upper Saddle River, NJ, 1996.
61. Rabiner, L. R., and Schafer, R. W. Digital processing of speech signals, Prentice-Hall, Englewood
Cliffs, NJ, 1978.
62. Rao, S. K., and Kailath, T. ‘‘Orthogonal digital lattice filters for VLSI implementation,’’ IEEE
Trans. Circuits Syst., vol. CAS-31, pp. 933--945, November 1984.
63. Robinson, E. A. ‘‘A historical perspective of spectrum estimation,’’ Proc. IEEE, vol. 70, pp.
885--907, September 1982.
64. Roy, R., and Kailath, T. ‘‘ESPRIT---estimation of signal parameters via rotational invariance
techniques,’’ IEEE Trans. Acoust. Speech Signal Process., vol. 37, pp. 984--995, July 1989.
doi:10.1109/29.32276
65. Rudin, W. Real and complex analysis, McGraw-Hill, New York, 1974.
66. Satorius, E. H., and Alexander S. T. ‘‘Channel equalization using adaptive lattice algorithms,’’
IEEE Trans. Commun., vol. 27, pp. 899--905, June 1979. doi:10.1109/TCOM.1979.1094477
67. Sayed, A. H. Fundamentals of adaptive filtering, IEEE Press, John Wiley & Sons, Hoboken,
NJ, 2003.
68. Schmidt, R. O. ‘‘Multiple emitter location and signal parameter estimation,’’ IEEE Trans.
Antennas Propagation, vol. 34, pp. 276--280, March 1986 (reprinted from the Proceedings of
the RADC Spectrum Estimation Workshop, 1979). doi:10.1109/TAP.1986.1143830
69. Schroeder, M. Computer speech: recognition, compression and synthesis, Springer-Verlag, Berlin,
1999.
70. Soong, F. K., and Juang B.-H, ‘‘Line spectrum pair (LSP) and speech data compression,’’ Proc.
IEEE Int. Conf. Acoust. Speech Signal Process., pp. 1.10.1--1.10.4, March 1984.
71. Soong, F. K., and Juang B.-H, ‘‘Optimal quantization of LSP parameters,’’ IEEE Trans. Speech
Audio Process., vol. 1, pp. 15--24, January 1993.
72. Strobach, P. Linear prediction theory, Springer-Verlag, Berlin, 1990.
73. Therrien, C. W. Discrete random signals and statistical signal processing, Prentice-Hall, Upper
Saddle River, NJ, 1992.
74. Vaidyanathan, P. P. ‘‘The discrete-time bounded-real lemma in digital filtering,’’ IEEE. Trans.
Circuits Syst., vol. CAS-32, pp. 918--924, September 1985. doi:10.1109/TCS.1985.1085815
75. Vaidyanathan, P. P. ‘‘Passive cascaded lattice structures for low sensitivity FIR filter design,
with applications to filter banks,’’ IEEE Trans. Circuits Syst., vol. CAS-33, pp. 1045--1064,
November 1986. doi:10.1109/TCS.1986.1085867
76. Vaidyanathan, P. P. ‘‘On predicting a band-limited signal based on past sample values,’’ Proc.
IEEE, vol. 75, pp. 1125--1127, August 1987.
77. Vaidyanathan, P. P. Multirate systems and filter banks, Prentice-Hall, Englewood Cliffs, NJ,
1993. doi:10.1109/78.382395
78. Vaidyanathan, P. P., and Mitra, S. K. ‘‘Low passband sensitivity digital filters: A generalized
viewpoint and synthesis procedures,’’ Proc. IEEE, vol. 72, pp. 404--423, April 1984.
79. Vaidyanathan, P. P., and Mitra, S. K. ‘‘A general family of multivariable digital lattice filters,’’
IEEE. Trans. Circuits Syst., vol. CAS-32, pp. 1234--1245, December 1985. doi:10.1109/
TCS.1985.1085661
80. Vaidyanathan, P. P., Mitra, S. K., and Neuvo, Y. ‘‘A new approach to the realization of low
sensitivity IIR digital filters,’’ IEEE Trans. Acoust. Speech Signal Process., vol. ASSP-34, pp.
350--361, April 1986. doi:10.1109/TASSP.1986.1164829
81. Vaidyanathan, P. P., Tuqan, J., and Kirac, A. ‘‘On the minimum phase property of prediction-
error polynomials,’’ IEEE Signal Process. Lett., vol. 4, pp. 126--127, May 1997. doi:10.1109/
97.575554
82. Van Trees, H. L. Optimum array processing, John Wiley & Sons, New York, 2002.
83. Wainstein, L. A., and Zubakov, V. D. Extraction of signals from noise, Prentice-Hall, London,
1962. doi:10.1119/1.1969250
84. Weiner, N. Extrapolation, interpolation and smoothing of stationary time series, John Wiley &
Sons, New York, 1949.
85. Weiner, N., and Masani, P. ‘‘The prediction theory of multivariate stochastic processes,’’ Acta
Math., part 1: vol. 98, pp. 111--150, 1957; part 2: vol. 99, pp. 93--137, 1958.
Author Biography
P. P. Vaidyanathan received his bachelor of science, bachelor of technology, and master of
technology degrees from the University of Calcutta, India. He obtained his doctor of philosophy
degree from the University of California at Santa Barbara and joined the faculty of the Department
of Electrical Engineering at the California Institute of Technology in 1983. He served as the
executive officer of the Department of Electrical Engineering at Caltech for the period 2002–2005.
He has received several awards for excellence in teaching at Caltech. He is the author of the book
Multirate Systems and Filter Banks and has authored or coauthored more than 370 articles in the
signal processing area. His research interests include digital filter banks, digital communications,
image processing, genomic signal processing, and radar signal processing. He is an IEEE Fellow
(1991), past distinguished lecturer of the IEEE SP society, and recipient of the IEEE ASSP Senior
Paper Award and the S. K. Mitra Memorial Award (IETE, India). He was the Eliahu and Joyce
Jury lecturer at the University of Miami for 2007. He received the F. E. Terman Award (ASEE)
in 1995, the IEEE CAS Society’s Golden Jubilee Medal in 1999, and the IEEE Signal Processing
Society’s Technical Achievement Award in 2001.
Index
Acoustical tube, 114
All-pass
phase response, 112
All-pass filters, 31
All-pass property, 33
MIMO, 138
All-pole filter, 37
Alternation property, 112
AM--GM inequality, 69
Antidiagonal matrix, 165
AR (autoregressive), 27, 41
AR approximation, 44
AR model
estimation, 55
power spectrum, 50
AR modeling, 44
AR process
maximum entropy, 83
summary, 62
AR property and stalling, 42
Augmented normal equation, 9
Autocorrelation
determinant, 15
eigenstructure, 103
estimating, 16
recursion, 161
singularity, 13
Autocorrelation and periodicity, 92
Autocorrelation extrapolation, 46, 53, 91
Autocorrelation inequality, 157
Autocorrelation matching, 47
filter, 49
Autocorrelation matrix
properties, 10
Autocorrelation method, 16
Autocorrelation sequence, 7
Autoregressive (AR), 27, 41
Backward prediction polynomial, 32
Backward predictor, 31
Band-limited signal
prediction of, 85
Burg’s relation, 127
Compression, 58
Condition number, 13
Condition number and spectrum, 165
Coprimality
predictor polynomials, 148
Coprime matrices, 147
Covariance method, 17
Dagger notation, 3
Determinant and MSE, 79
Determinant of autocorrelation, 15
Differential entropy, 82
Displacement rank, 29
Downward recursion, 38
Eigenstructure of autocorrelation, 103
Eigenvalues and power spectrum, 12
Entropy
differential, 82
Gaussian, 82
Entropy maximization, 83
Entropy maximum
AR process, 83
Entropy of prediction error, 85
Ergodicity, 96
Error covariance matrix, 118
Error spectrum
flatness, 76
ESPRIT, 109
Estimating autocorrelations, 16
Estimation
AR model, 55
Estimation from noisy data, 164
Extrapolation of autocorrelation, 46, 53
FIR lattice, 36
Flatness, 65
error spectrum, 76
Formants in speech, 114
Forward predictor, 5
Full predictability, 14
Harmonic process, 92
History of linear prediction, 1
Homogeneous difference equation, 90
IIR lattice, 36
stability, 37
Inverse filter
stability proof, 159
Inverse filter (IIR), 6
Irreducible MFDs, 147
Irreducible MFDs and poles, 147
Kailath, 2
Kolmogorov, 1
Lattice coefficients, 23
Lattice structure
derivation, 35
FIR, 36
FIR, vector LPC, 130
IIR, 36
IIR, vector LPC, 135
normalized, vector LPC, 137
stability, 37
symmetrical, vector LPC, 133
Lattice structures, 31
Levinson, 2
Levinson’s recursion, 19
downward, 38
history, 29
initialization, 22
properties, 23
summary, 22
upward, 38
vector case, summary, 128
Line frequencies, 15, 87
Line spectra
extrapolation, 91
periodicity, 92
unit-circle zeros, 91
Line spectral parameters, 93
Line spectral process, 15, 87
autocorrelation, 88
characteristic polynomial, 90
rank saturation, 91
Line spectrum
amplitude, 92
conditions for, 95
ergodicity, 96
power, 92, 105
prediction polynomial, 98
summary, 100
Line spectrum in noise, 102
Line spectrum pairs
definition, 113
Line-spectrum pair (LSP), 61
Linear dependence, 156
Linear estimation, 151
Linear prediction
MIMO, 117
vector process, 117
Linear predictive coding (LPC), 6
Linespec(L), 87
LPC for compression, 58
LSP, 113
LSPs
properties, 113
Makhoul, 2
Masani, 1
Matrix fraction descriptions (MFD), 146
Maximum entropy, 81
Maximum entropy estimate, 53
Maximum entropy method, 83
Mean squared error, 5
MFD (matrix fraction descriptions), 146
Minimum mean square error estimate, 151
Minimum-norm methods, 109
Minimum-phase
vector LPC, 148
Minimum-phase property, 7
Modulus property, 112
Moving average (MA) process, 61
MSE and determinant, 79
MUSIC, 109
Noise-excited coders, 60
Normal equations, 7
Normalized error, 130
Notations, 3
Optimal error covariance, 119
Optimal linear estimator, 153
Optimum predictor, 5
Orthogonality
geometric interpretation, 154
prediction errors, 34
Orthogonality of error, 6
Orthogonality principle, 151
Paraunitary property, 138
propagation, 141
Parcor, 22
bound, 24
vector LPC, 124
Partial correlation, 22
Perceptual quality and LSP, 114
Periodic autocorrelation, 92
Periodic process, 92
Phase response
all-pass, 112
Pisarenko’s harmonic decomposition, 105
Pitch-excited coders, 60
Poles of MIMO lattice, 142
Poles, MIMO case, 142
Positivity of power spectrum, 170
Power spectrum, 12
AR model, 50
estimate, 53
positivity, 170
Power spectrum and
eigenvalues, 12
Predictability and
spectral flatness, 65
Predictability measure, 81
Predictability, full, 14
Prediction error, 6
closed form (AR), 65
limit, 16
orthogonality, 34
Prediction error bound, 75
Prediction error limit
integral, 81
proof, 79
Prediction filter (FIR), 6
Prediction order, 5
Prediction polynomial, 6
backward, 32
line spectral, 98
Predictor
backward, 31
forward, 5
Quantization and stability, 60
Rank saturation
line spectral process, 91
RCD (right common divisor), 147
Reactance in circuits, 114
Real random sinusoid, 96
Recursion
AR autocorrelation, 161
Recursive difference equation, 41
Reflection coefficients, 23
origin, 114
Right common divisor (rcd), 147
Right coprime matrices, 147
Singularity of autocorrelation, 13, 156
Sinusoids
identification, 93
in noise, 102
Spectral factorization, 55
Spectral flatness, 65
and predictability, 65
AR case, 72
definition, 69
Stability
inverse filter, proof, 159
of MIMO lattice, 144
and quantization, 60
inverse filter, 7
lattice structure, 37
Stalling, 26
and AR property, 42
vector LPC, 144
Symmetrical lattice, 133
Szegö's theorem, 80
Tilde notation, 3
Toeplitz, 7
Toeplitz autocorrelation, validity, 166
Toeplitz property
proof, 11
Transfer matrices, 129
properties, 146
Unimodular matrices, 147
Upward recursion, 38
Vector LPC, 117
augmented normal equations, 120
augmented normal equations, backward, 122
backward prediction, 121
error covariance, 118
Levinson Properties, 126
Levinson summary, 125
Levinson’s recursion, 122
normal equations, 120
parcor, 124
prediction polynomial, 118
Vector processes, 117
Voiced and unvoiced sounds, 60
Whitening effect, 26
vector LPC, 144
Wiener, 1
Wiener-Hopf equations, 7
WSS
conditions for, 94, 95
Yule-Walker equations, 7


This procedure.20) becomes ⎤⎡ ⎤ ⎡ f⎤ R(0) R(1) R(2) R(3) 1 E3 ⎢ R*(1) R(0) R(1) R(2) ⎥ ⎢ a ⎥ ⎢ ⎥ ⎥ ⎢ 3.g. A recursive procedure for this. Golub and Van Loan. the N × N matrix RN in these equations is not arbitrary.1) This set of four equations describes the third-order optimal predictor. However.i are solutions to the set of normal equations given by Eq. requiring computations of the order of N 2 . also places in evidence many useful properties of the optimal predictor. in addition to being efficient. as we shall see in the next several sections. due to Levinson (1947).1 ⎥ ⎢ 0 ⎥ ⎢ ⎥⎢ ⎢ ⎥ = ⎢ ⎥. (2. but Toeplitz. For this. (2. To demonstrate the basic idea. consider the third-order prediction problem. e. note that we can append a fifth equation to the above set of four and write it as . Traditional techniques to solve these equations require computations of the order of N 3 (see. ⎣ R*(2) R*(1) R(0) R(1) ⎦ ⎣ a3. will be described in this chapter.3 0 ⎡ R4 a3 (3. we obtain not only the solution to the Nth-order problem. Our aim is to show how we can pass from the third-order to the fourth-order case. In this way.1 INTRODUCTION The optimal linear predictor coefficients aN. but also all the lower orders. 1989). This property can be exploited to solve these equations in an efficient manner.2 DERIVATION OF LEVINSON’S RECURSION Levinson’s recursion is based on the observation that if the solution to the predictor problem is known for order m..2 ⎦ ⎣ 0 ⎦ R*(3) R*(2) R*(1) R(0) a3.19 CHAPTER 3 Levinson’s Recursion 3. 3.8). The augmented normal Eq. then the solution for order (m + 1) can be obtained by a simple updating process.

3 R*(1). k ≤ 4.3 ⎥⎢ R*(1) R(0) R(1) R(2) ⎥ ⎢ a3. consider the operation Eq.2) also implies ⎡ R(0) ⎢ R*(1) ⎢ ⎢ ⎢ R*(2) ⎢ ⎣ R*(3) R*(4) ⎤⎡ R(1) R(2) R(3) R(4) 0 ⎥ ⎢ a* R(0) R(1) R(2) R(3) ⎥ ⎢ 3.2) where k4 is a constant. (3. 0 ≤ i.3 ⎦ ⎣ 0 ⎦ R*(4) R*(3) R*(2) R*(1) R(0) 0 α3 ⎡ R5 (3.5) If we now take a linear combination of Eqs.1 * R*(3) R*(2) R*(1) R(0) 1 R5 ⎤ ⎤ * α3 ⎥ ⎢ 0 ⎥ ⎥ ⎢ ⎥ ⎥ ⎢ ⎥ ⎥ = ⎢ 0 ⎥.5).4−k = Rik . if we reverse the order of all rows. Eq.7) . then reverse the order of all columns and then conjugate the elements.1 ⎥ ⎢ ⎥ ⎢ ⎥⎢ ⎢ ⎥ ⎢ ⎥ ⎢ R*(2) R*(1) R(0) R(1) R(2) ⎥ ⎢ a3.1 R*(3) + a3.3) The matrix R5 above is Hermitian and Toeplitz. we will obtain the equations governing the fourth-order predictor indeed! Thus.5) such that the last element on the right-hand side becomes zero. (3.6) −α3 E3 f . If we choose * k4 = + k4 × Eq.2 ⎥ = ⎢ 0 ⎥ .4) In other words. ⎥⎢ ⎢ ⎥ ⎢ ⎥ ⎣ R*(3) R*(2) R*(1) R(0) R(1) ⎦ ⎣ a3. Δ (3.2) and (3. * (3. the result is the same matrix! As a result.2 R*(2) + a3. we verify that its elements satisfy (Problem 9) R4*−i. (3.2 * ⎥⎢ R*(2) R*(1) R(0) R(1) ⎦ ⎣ a3. ⎥ ⎢ ⎥ ⎦ ⎣ 0 ⎦ f E3 ⎡ (3.20 THE THEORY OF LINEAR PREDICTION ⎤⎡ ⎤ ⎡ f⎤ R(0) R(1) R(2) R(3) R(4) 1 E3 ⎢ R*(1) R(0) R(1) R(2) R(3) ⎥ ⎢ a ⎥ ⎢ 0 ⎥ ⎥ ⎢ 3 . (3.2) where α3 =R*(4) + a3. Using this. (3. (3.

we can rewrite the computation of a4. .9) With this notation.4 0 (3.LEVINSON’S RECURSION 21 then the result has the form ⎡ R(0) R(1) R(2) R(3) ⎢ R*(1) R(0) R(1) R(2) ⎢ ⎢ ⎢ R*(2) R*(1) R(0) R(1) ⎢ ⎣ R*(3) R*(2) R*(1) R(0) R*(4) R*(3) R*(2) R*(1) R5 where a4.3 + k4 a3.m−1 z−1 + .2 * * = a3. (3. if we know the coefficients a3. Summarizing.k as A4 (z) = A3 (z) + k4 z−1 [z−3 A3 (z)]. .m z−m m m (prediction polynomial).k } can be written more compactly if we use polynomial notations and define the FIR filter * Am (z) = 1 + a* .8) is E4 .1 a4.1 + k4 a3. we can find the corresponding quantities for the fourth-order predictor from the above equations. we see that this is related to E3 as E4 = E3 + k4 α3 * * f f = E3 − k4 k4 E3 (from Eq.2.1 z−1 + a* .3 a4.11) . which is the minimized forward prediction error for the fourth-order predictor.m + am.1 ⎥ ⎢ ⎥ ⎥⎢ ⎥ ⎢ ⎥ R(2) ⎥ ⎢ a4. z−m Am (z) = am. we conclude that the element denoted × on the right-hand side of f Eq.7)).10) (3. (3.4 ⎤⎡ ⎤ ⎡ ⎤ R(4) 1 × ⎥⎢a ⎥ ⎢ 0 ⎥ R(3) ⎥ ⎢ 4.2 + k4 a3.1). f f f (3. + am.i and the mean square error E3 for the third-order optimal predictor.1 z−(m−1) + z−m . .8) = a3.k from a3.2 ⎥ = ⎢ 0 ⎥ .2 a4. Polynomial Notation. + am.2 z−2 + .3 ⎦ ⎣ 0 ⎦ R(0) a4. . * By comparison with Eq. f From the above construction. * f = (1 − |k4 |2 )E3 . (3. The preceding computation of {a4.3 * * = a3. (3.1 * * = k4 . where the tilde notation is as defined in Section 1.k } from {a3. ⎥⎢ ⎥ ⎢ ⎥ R(1) ⎦ ⎣ a4. Thus.

12) Am+1 (z) = Am (z) + km+1 z−1 [z−m Am (z)] (order update).. x(n − m). (3. f (3. (3. x(n − 2).2 R*(m − 1) + .13) Em+1 = (1 − |km+1 |2 )Emf . we have f (3.. (3.2 The Partial Correlation Coefficient The quantity ki in Levinson’s recursion has a nice ‘‘physical’’ significance.15) Initialization. .18) The above three equations are used to initialize Levinson’s recursion.14) αm = R*(m + 1) + am. (2.19) 3.17) α0 = R*(1).2. (3.22 THE THEORY OF LINEAR PREDICTION 3. we can find the corresponding quantities for the (m + 1)th-order optimal predictor as follows: km+1 = −αm * Em f . Thus.20) . Then. (3.1 R*(m) + am. and evaluate A1 (z) from Eq. Recall from Section 2.3).16) E0 = R(0). the predictor polynomial is A0 (z) = 1. m = 0). . Also. f (3. . from the definition of αm . Thus.g.m R*(1). from Eq. (3.3 f that the error em (n) of the optimal predictor is orthogonal to the past samples x(n − 1). e0 (n) = x(n).2. Once this recursion is initialized for small m (e. . Note that for m = 0. and the error is. let Am (z) be the predictor polynomial for the mth-order optimal predictor f and Em the corresponding mean square value of the prediction error. For example.13) and so forth. it can be used to solve the optimal predictor problem for any m. .1 Summary of Levinson’s Recursion The recursion demonstrated above for the third-order case can be generalized easily to arbitrary predictor orders. we can compute * k1 = −α0 /E0 = −R(1)/R(0). where f (error update). + am. (3.

is not necessarily orthogonal to the error. that is.1.1. the coefficients ki have been called the partial correlation coefficients and abbreviated as parcor coefficients. 2 We can proceed in this way to compute optimal predictors of any order. Computational complexity.3 SIMPLE PROPERTIES OF LEVINSON’S RECURSION Inspection of Levinson’s recursion reveals many interesting properties.21) f The coefficient km+1 represents this correlation. To obtain the second-order predictor. the amount of computation required . f (3. (3.13)) is proportional to this normalized correlation.5 = −6/35 f k2 = −α1 /E1 = 1/6 A2 (z) = A1 (z) + k2 z−2 A 1 (z) = 1 − (5/6)z−1 + (1/6)z−2 f f E2 = (1 − k2 )E1 = 1.LEVINSON’S RECURSION 23 The ‘‘next older’’ sample x(n − m − 1). Consider again the real WSS process with the autocorrelation (2.0.1 R(1) = (9/10) − (5/7) × 1. the correlation E[em (n)x*(n − m − 1)] could be nonzero.12). where we used a direct approach to solve the normal equations. normalized by the mean square error Em . The correction to the prediction polynomial in the order-update equation (second term in Eq. that f is. for reasons that will become clear in later chapters. In the literature. We initialize Levinson’s recursion as described above. Some of these are discussed below. We can now compute the quantities k1 = −α0 /E0 = −5/7 A1 (z) = A0 (z) + k1 z−1 A0 (z) = 1 − (5/7)z−1 . however. α0 = R(1) = 1. A0 (z) = 1. Example 3. It can be shown (Problem 6) that the * quantity αm in the numerator of km+1 satisfies the relation f αm = E[em (n)x*(n − m − 1)].22) α1 = R(2) + a1. f f E1 = (1 − k2 )E0 = 36/35. They are also known as the lattice or reflection coefficients.5. we compute the appropriate quantities in the following order: f and E0 = R(0) = 2. 3. As a result. 1. The above results agree with those of Example 2. * (3. The computation of Am+1 (z) from Am (z) requires nearly m multiplication and addition operations.1: Levinson’s Recursion. 1 We have now obtained all information about the first-order predictor. Deeper properties will be studied in the next several sections.

1 − |k1 |2 E0 = 1 − |kN |2 1 − |kN−1 |2 . This should be compared with traditional methods for solving Eq. which require computations proportional to N 3 .14) yields the following expression for the mean square prediction error of the N th-order optimal predictor: EN = 1 − |kN |2 1 − |kN−1 |2 . f 3. Recall also that the assumption that x(n) is not line spectral also ensures that the autocorrelation matrix RN in the normal equations is nonsingular for all N (Section 2. Then Eq. (3. Because Em is the mean square value of the prediction error.23) unless x(n) is a line spectral process. (3.2: Levinson’s recursion. . In addition to reducing the computation. Em+1 ≤ Emf . it. . It is interesting to note that repeated application of Eq. therefore. . This f means that em+1 (n) is identically zero for all n.24 THE THEORY OF LINEAR PREDICTION for N repetitions (i. for any i. to find aN.4. (3. 3. .0170 R(3) −0. follows that |ki |2 ≤ 1.1482 R(1) 0.24) with equality if and only if km+1 = 0.0087 . the output of Am+1 (z) in response to x(n) is zero (Fig. Using an argument similar to the one in Section 2.4.14). that is.0323 R(4) −0. Let |km+1 | = 1.1(a)).1 (top). In f other words.14) says that Em+1 = 0. we conclude that this is not possible unless x(n) is a line spectral process. Summarizing. Now.2). (2.. From Eq.2.0500 R(2) 0. the mean square value of the prediction error sequence em+1 (n) is zero.14) it also follows that the mean square error is a monotone function. we have | ki | < 1 . f we know Em ≥ 0 for any m. The first few samples of the autocorrelation are jω f f R(0) 0. Parcor coefficients are bounded. for any i.i) is proportional to N 2 . From Eq. (3.0629 R(5) 0.0035 R(6) −0. In other words. Levinson’s recursion also reveals the optimal solution to all the predictors of orders ≤ N. 1 − |k1 |2 R(0) Example 3. (3.8) (such as Gaussian elimination). Strict bound on parcor coefficients.e. 4. 2. f (3. f 2. consider a WSS process x(n) with power spectrum Sxx (e ) as shown in Fig. Minimized error is monotone.

2.16 Prediction error 0. and the prediction error Em shown for more orders (bottom).1: Example 3.08 0 1 2 Prediction order m 0. f .5 0 0 0.08 0 1 2 Prediction order m 14 FIGURE 3.LEVINSON’S RECURSION 1 25 Power spectrum 0. The input power spectrum Sxx (e jω ) (top).2 0.6 0.12 0.12 0.16 Prediction error 0.4 ω/π 0. the prediction error Em plotted f for a few prediction orders m (middle).8 1 0.

The same trend continues for higher prediction orders as demonstrated in the bottom plot. f f f (3.1076 f E5 0.3374 k2 −0. Notice that these coefficients satisfy |km | < 1 as expected from theory.0 1. that is. and 0.3253 k5 −0.3939 k6 0. we obtain the parcor coefficients k1 −0.0876 f The prediction errors are also plotted in Fig.0 1.1203 f E4 0.2020 −0. 3.3253 0.0010 −0. suppose that Emf = Em+1 = Em+2 = Em+3 = .0988 −0. 3.4647 0.0909 f E6 0. 0. it turns out that the . The error decreases monotonically as the prediction order increases.0010 k3 0.4210 0. the zeros of A6 (z) are complex conjugate pairs with magnitudes 0.2762 0.1804 0.1908 The zeros of these polynomials can be verified to be inside the unit circle. . For example.1314 f E3 0.3373 −0. that is.1314 f E2 0.3711 −0. as expected from our theoretical development (Appendix C).2901 k4 0.0 1.3939 −0. the error does not decrease any further.3370 −0.0 1.7853.6102.3825 −0.1908 for the first six optimal predictors.0 1.4 THE WHITENING EFFECT Suppose we compute optimal linear predictors of increasing order using Levinson’s recursion.1309 −0.26 THE THEORY OF LINEAR PREDICTION If we perform Levinson’s recursion with this. The corresponding prediction errors are E0 0.2901 0.9115.2320 0.2429 −0. Ek+1 ≤ Ek .1 (middle). increasing the numberof past samples does not help to increase the prediction accuracy any further.3374 −0.0 1.1217 0.25) This represents a stalling condition. . We know the prediction error decreases with order.4462 −0.1482 f E1 0. Whenever such a stalling occurs.0 −0. Assume that after the predictor has reached the order m. The optimal prediction polynomials A0 (z) through A6 (z) have the coefficients given in the following table: A0 ( z ) A1 ( z ) A2 ( z ) A3 ( z ) A4 ( z ) A5 ( z ) A6 ( z ) 1. that is.

In other words. f For the case of nonzero mean.1. (3. whenever stalling occurs. is the same for all ≥ m. The condition km+1 = 0 implies αm = 0 from Eq. That is. if and only if the f ♦ prediction error em (n) satisfies the orthogonality condition Eq. in particular.26) still holds. Theorem 3.26). We will present a more complete discussion of AR processes in Section 5. As a result. we know that x(n) can be represented as the f output of a filter. therefore.30) .28) ≥ 0. the iteration stalls after the mth-order predictor (i. 2. Thus. we see that whenever stalling occurs. (3. Repeating this argument. Repeating this.29) By orthogonality principle. Now. Proof.LEVINSON’S RECURSION f error em (n) 27 satisfies a very interesting property. x(n) is the output of an all-pole filter 1/Am (z) with white input. (3. any pair of samples are mutually orthogonal. such a zero-mean process x(n) is said to be autoregressive (AR). km+1 = 0 (from Eq. Combining the preceding two equations.2. (3.21) is zero. so that f E[em (n)x*(n − m − − 1)] = 0. (3. Consider the optimal prediction of a WSS process x(n).27) The prediction error sequence e (n). (3. Then.e. . we already know that E[em (n)x*(n − i)] = 0 for 1 ≤ i ≤ m. Stalling and whitening.1(b)). (3. we conclude that f E[em (n)x*(n − )] = 0. the cross-correlation Eq. But em+ (n) = em (n) for any ≥ 0.14)).. with successively increasing predictor orders. f f E em (n)[em (n + i)]* = 0 f Em i=0 i = 0. the mean square error stays constant as shown by Eq. excited with em (n) (see Fig. and the preceding equation f means that em (n) is white. we see that Am (z) = Am+1 (z) = Am+2 (z) = . Am+1 (z) = Am (z). ≥ 1. Eq. f f f f (3. that is. (3. (3. and we say that em (n) is an orthogonal (rather than white) random process. f f f f ≥ 0. namely. (3. from Section 2. we have E[em+ (n)x*(n − m − − 1)] = 0. if x(n) has zero mean. Stalling implies.25)).2. then em (n) will also have zero mean.12). . that Em+1 = Em .26) In particular.

Summarizing. where −1 < ρ < 1.2 R(1) = R(3) − ρR(2) = 0.37) . That is.e. we have e1 (n) = x(n) − ρx(n − 1). m f em (n − ) = x(n − ) + i=1 am. Let us double-check this by testing f f whether e1 (n) satisfies Eq. so that A2 (z) = A1 (z). (3. f (3. we will find in this case that Am (z) = A1 (z) for all m.25) holds). Eq. (3. To find the * third-order predictor. note that α2 = R(3) + a2. we can show that if em (n) has the property (3.1 = −R(1). (3. * (3. the recursion has stalled. we have proved.i x(n − − i). that is.1 is obtained by solving R(0)a1.33) (3.3: Stalling and Whitening. therefore.i x(n − i). f (3. f (3.35) Using Levinson’s recursion. To compute the second-order predictor.1 R(1) = ρ2 − ρ2 = 0.28 THE THEORY OF LINEAR PREDICTION In other words. Because e1 (n) is the output of A1 (z) in response to x(n). we.. Consider a real WSS process with autocorrelation R(k) = ρ|k| .36) Thus. conclude that em (n) f f f is orthogonal to em (n − ). we first evaluate α1 = R(2) + a1.26) then the recursion stalls (i.26).1 R(2) + a2.32) f Because em (n) is orthogonal to all the past samples of x(n). E[em (n)[em (n − )]*] = 0 for f > 0.26) indeed. we find k2 = −α1 /E1 = 0. > 0. Example 3. No matter how far we continue. This proves Eq. the error em (n) is orthogonal to all the past samples of x(n). By reversing the above argument. Thus.1 = −ρ. the optimal predictor polynomial is A1 (z) = 1 − ρz−1 . m f em (n) f = x(n) + i=1 am. We also know that f em (n) is a linear combination of present and past samples of x(n). The first-order predictor coefficient a1. * (3.34) so that a1. k3 = 0 and A3 (z) = A1 (z). (3.31) Similarly.

• • • • . Chandrasekhar. They showed that as long as the displacement rank is a fixed number independent of matrix size.’’ However. f f f (3. in particular. for k > 0. see p. regardless of the matrix size.LEVINSON’S RECURSION 29 Thus. One of the outcomes of Levinson’s recursion is that it gives rise to an elegant structure for linear prediction called the lattice structure. For a fascinating history.. 3. E[e1 (n)e1 (n − k)] = R(k) + ρ2R(k) − ρR(k − 1) − ρR(k + 1) = 0. The same is true for inverses of Toeplitz matrices (which are not necessarily Toeplitz). 1969). even the Toeplitz structure is not necessary if the goal is to obtain an O(N 2 ) algorithm. It is also related to the Berlekamp--Massey algorithm (Berlekamp. 1985). However. It can be related to early work by other famous authors (e. Because R(∞) = 0. e1 (n) is a zero-mean white process. it is clear from this chapter that Levinson’s recursion is very elegant and insightful. Kailath et al. introduced the idea of displacement rank for matrices.5 CONCLUDING REMARKS Levinson’s work was done in 1947 and. In fact. It turns out that Toeplitz matrices have displacement rank 2. 160). 1947).g. This will be the topic of discussion for the next chapter. In 1979. the reader should study the scholarly review by Kaliath (1974. So. in his own words.38) as anticipated. the process x(n) has zero mean. the method can be extended to the case of Toeplitz matrices. O(N 2 ) algorithms can be found for solving linear equations involving these matrices. The derivation of Levinson’s recursion in this chapter used the properties of the autocorrelation matrix. was a ‘‘mathematically trivial procedure. which are not necessarily positive definite (Blahut.

.

For example.1 INTRODUCTION In this chapter. As a preparation for the main topic.1 demonstrates the difference f b between the forward and backward predictors.’’ Figure 4. they arise in the theory of all-pass filters: any stable rational all-pass filter can be represented using an IIR lattice structure similar to the IIR LPC lattice. . we first discuss the idea of backward linear prediction.i x(n − i) + x(n − N − 1). Notice that both xN (n) and xN(n − N − 1) are based on the same set of N measurements. Lattice structures have fundamental importance not only in linear prediction theory but. . * The superscript b is meant to be a reminder of ‘‘backward.31 CHAPTER 4 Lattice Structures for Linear Prediction 4.1) and the prediction error is eb (n) = x(n − N − 1) − xN (n − N − 1) N N b = i=1 bN. .i x(n − i). 4. in signal processing. but it helps us to attach a ‘‘physical’’ significance to the polynomial z−m Am (z) in Levinson’s . The predicted value is a linear combination of the form N b x N (n − N − 1) = − i=1 bN. . we present lattice structures for linear prediction. x(n − N). more generally. The backward prediction problem is primarily of theoretical interest. A backward linear predictor for this process estimates the sample x(n − N − 1) based on the ‘‘future values’’ x(n − 1). * (4.2 THE BACKWARD PREDICTOR Let x(n) represent a WSS random process as usual. These structures essentially follow from Levinson’s recursion.

the optimum backward predictor polynomial is given by BN (z) = z−(N+1) AN (z). for 1 ≤ k ≤ N. It is therefore clear that we can derive the backward prediction error sequence eb (n) from the forward N f prediction error sequence eN (n) as shown in Fig. (4. * (4.5) . 1 ≤ i ≤ N. we again apply the orthogonality principle. we can show that N the solution is closely related to that of the forward predictor of Section 2. we again define the predictor polynomial N BN (z) = i=1 bN. as we shall see. Thus.32 THE THEORY OF LINEAR PREDICTION observed data backward predictor estimates this forward predictor estimates this time n− N − 1 n−N n−1 n FIGURE 4.i z−i + z−(N+1) . In view of Eq.1.4). (4.i .3) This equation represents time-reversal and conjugation. which says that eb (n) must be orthogonal to x(n − k). 4.4) where the tilde notation is as defined in Section 1.i = aN.1: Comparison of the forward and backward predictors. recursion (3.2) The output of this FIR filter in response to x(n) is equal to the predictor error eb (n). AN (z) and BN (z). (4.2.3. Figure 4. With aN.2 summarizes how the forward and backward prediction errors are derived from x(n) by using the two FIR filters.N+1−i. Using this.i denoting the optimal forward predictor coefficients. As in the forward predictor problem.13). N * To find the optimal predictor coefficients bN. we have z−(N+1) AN(z) BN (z) = AN (z) AN ( z ) (4.3. it can be shown (Problem 15) that * bN.

simply observe that GN (z)GN(z) = But because zN AN (z) AN (z) ∀ ω. From Fig. which proves Eq. we therefore see that the backward error eb (n) is the output of the all-pass filter z−1 GN(z) in response to the N eNf (n) b eN (n) 1/ A N (z) inverse of forward prediction filter x(n) B N (z) backward prediction filter FIGURE 4. (4.6) |GN (e jω )| = 1.2.2: The forward and the backward prediction filters.LATTICE STRUCTURES FOR LINEAR PREDICTION 33 x(n) A N (z) forward prediction filter eNf (n) B N (z) backward prediction filter b eN (n) FIGURE 4.7). Δ z − N AN ( z ) AN ( z ) (4. the preceding implies that |GN (e jω )|2 = 1. 4. ∀ z AN (z) * HN (e jω ) = HN (e jω ). To see this. .3: The backward prediction error sequence.1 All-Pass Property We now show that the function GN ( z ) = is all-pass. that is.3.7) × z−N AN (z) = 1. derived from the forward prediction error sequence. (4. 4.

12) The above orthogonality properties have applications in adaptive filtering. eb (n).1 x(n − 1) + bk. According to the orthogonality principle. In particular. it can be shown (Problem 16) that the forward predictor error sequences have the following orthogonality property: f E em (n)[ek (n − 1)]* = 0.34 THE THEORY OF LINEAR PREDICTION input eN(n).8) b where EN = E[|eb (n)|2 ] and EN = E[|eN(n)|2]. Eq. 1979. k From these two equations. .10) follows immediately. . x(n − m). The process of generating the above set of orthogonal signals from the random process x(n) has also been .1. that is. m k (4. specifically in improving the convergence of adaptive filters (Satorius and Alexander. Haykin. . . Orthogonality of errors. The power spectrum of eN (n) is therefore identical to that of eb (n). From its definition. . which is the forward error.10) ♦ Proof. eb (n) m is orthogonal to x(n − 1). N b EN = E N f f f f f (4.9) We will show that any two of these are orthogonal. More precisely. f (4. . eb (n). It is sufficient to prove this for m > k. 0 b Em for k = m for k = m. therefore. From this result. The backward prediction error sequences satisfy the property E eb (n)[eb (n)]* = m k for all k. 4. .2 x(n − 2) + . * * * eb (n) = x(n − k − 1) + bk. that is. we conclude that E[eb (n)[eb (n)]∗] = 0. Theorem 4.6) is stable because N the zeros of AN (z) are inside the unit circle (Appendix C). 0 1 2 (4. (4. Notice that the all-pass filter (4.11) m > k.2 Orthogonality of the Optimal Prediction Errors Consider the set of backward prediction error sequences of various orders. the mean square values of the two errors are identical.k x(n − k). In a similar manner. + bk. 2002). eb (n). for m > k.2. . . (4. m ≥ 0.

By A m (z) A m+1 (z) k* m+1 k m+1 B m (z) (a) z −1 Bm+1 (z) f em(n) f em+1 (n) k* m+1 k m+1 b em (n) (b) z −1 b em+1 (n) FIGURE 4. (4. we also obtain * Bm+1 (z) = z−1 [km+1 Am (z) + Bm (z)].4). 4. which computes the optimal predictor polynomial Am+1 (z) from Am (z) according to Am+1 (z) = Am (z) + km+1 z−1 [z−m Am (z)] (4.15) by using Bm+1 (z) = z−(m+2) Am+1 (z). 4. . are the outputs of the filters Am (z) and Bm (z) in response m to the common input x(n).4: (a) The lattice section that generates the (m + 1)th-order predictor polynomials from the mth-order predictor polynomials. we presented Levinson’s recursion.2.4(a).4(b). (4. 4.13) On the other hand.LATTICE STRUCTURES FOR LINEAR PREDICTION 35 interpreted as a kind of Gram--Schmidt orthogonalization (Haykin. These details are beyond scope of this chapter.2. The preceding relations can be schematically represented as f in Fig. We can therefore rewrite Levinson’s order update equation as Am+1 (z) = Am (z) + km+1 Bm (z). From this. but the above references provide further details and bibliography. we know from Section 4. 2002). the prediction error signals are related as shown in Fig. Because em (n) and eb (n).3 LATTICE STRUCTURES In Section 3. (b) The same system shown with error signals indicated as the inputs and outputs.14) (4. that the optimal backward predictor polynomial is given by Eq.

4. the transfer functions Am (z) and Bm (z) are generated by repeated use of the lattice sections in Fig.5(b).4(b). This is a cascaded FIR lattice structure. If we apply the input x(n) to the FIR lattice structure.3. we see that the prediction errors are related as f em+1 (n) = em (n) + km+1 eb (n).5(a). Lattice structures for linear m prediction have been studied in detail by Makhoul (1977). 4. m m f f f These equations have the structural representation shown in Fig. (b) Details of the mth lattice section. m f * eb +1 (n) = km+1 em (n − 1) + eb (n − 1). 4.6(b). 4.5: (a) The FIR LPC lattice structure that generates the prediction errors em (n) and eb (n) m for all optimal predictors with order 1 ≤ m ≤ N. m b f * em+1 (n) = km+1 em (n − 1) + eb (n − 1).4. often referred to as the LPC FIR lattice. we obtain the structure of Fig. 4. then these two equations m become f em (n) = em+1 (n) − km+1 eb (n). using the facts that A0 (z) = 1 and B0 (z) = z−1 .1 The IIR LPC Lattice From Fig. each rectangular box is the two multiplier lattice section shown in Fig. This is precisely the IIR all-pass lattice structure . m f Suppose we rearrange the first equation as em (n) = em+1 (n) − km+1 eb (n).6(a). 4. the optimal forward and backward prediction f error sequences em (n) and eb (n) appear at various nodes as indicated.36 THE THEORY OF LINEAR PREDICTION x(n) e0 (n) A0 (z) e0 (n) B0 (z) b f e1 (n) A1 (z) f e2 (n) A2 (z) f eN(n) f k1 z −1 z −1 (a) e1 (n) B1 (z) b k2 z −1 e2 (n) B2 (z) b … AN(z) kN z −1 eN(n) BN(z) b k* m (b) km km f FIGURE 4. Repeated application of this gives rise to the IIR structure of Fig. In this structure. 4. Here.

N The transfer function from the input to the rightmost node in Fig. as indicated. 4. This is the transfer function from the input eN(n) to the node eb (n) in the IIR lattice.6(b) is the all-pole filter f 1/AN (z). then the forward and backward error sequences of all the lower-order optimal predictors appear at f various internal nodes. Thus.2 Stability of the IIR Filter It is clear from these developments that the ‘‘parcor’’ coefficients km derived from Levinson’s recursion are also the lattice coefficients for the all-pass function z−NAN (z)/AN (z).3. 4. We already showed that the transfer function z−1 GN(z) = z−(N+1) AN ( z ) AN ( z ) f is all-pass. Now. in response to the input eN (n). studied in signal processing literature (Gray and Markel. 1993. 1999). Oppenheim f and Schafer. it is well . (b) The IIR LPC lattice structure that generates the f f signal x(n) from the optimal prediction error eN (n). Vaidyanathan. because this transfer function produces x(n) in response to the input eN (n) (see Fig. Because e0 (n) = x(n). The lower-order forward prediction errors em (n) and the backward prediction errors are also generated in the process automatically. the 0th-order prediction error e0 (n) appears at f the rightmost node. 1973.6: (a) The IIR LPC lattice section. if we apply the forward prediction error eN (n) as an input to this structure.LATTICE STRUCTURES FOR LINEAR PREDICTION em+1 (n) f 37 em(n) f −k m+1 (a) b em+1 (n) k* m+1 z −1 This box is labelled as b em (n) k m+1 f eN (n) f e f (n) N −1 e1 (n) f e0 (n) = x(n) eN (n) (b) b kN z −1 e b (n) N −1 k N −1 … z −1 e1 (n) b k1 z −1 e0 (n) b z −1 FIGURE 4.2). we therefore obtain the original random process x(n) f as an output of the IIR lattice. In particular. 4.

17) is called the downward recursion. Emf = Em+1 /(1 − |km+1 |2 ) f (4.15)) computes the optimal predictor polynomials of increasing order.17). Although the connection to the lattice structure is insightful. Vaidyanathan. Levinson’s recursion (Section 3.14).2) has shown that the condition |km | < 1 is indeed true as long as x(n) is not a line spectral process. Given Am+1 (z). from a knowledge of the autocorrelation R(k) of the WSS process x(n). This proves that the IIR filters 1/Am (z) and z−m Am (z)/Am (z) indeed have all poles strictly inside the unit circle so that Fig. in particular.m+1 . If we know the prediction error EN . * (1 − |km+1 |2 ) × Bm (z) = −km+1 Am+1 (z) + zBm+1 (z). Bm (z).3 The Upward and Downward Recursions Levinson’s recursion (Eq. it is also possible to prove stability of 1/Am (z) directly without the use of Levinson’s recursion or the lattice. by repeated use of Eq.17) (4. This process automatically f reveals the lower-order lattice coefficients ki .3. 4. the polynomial Bm+1 (z) is known. (4. (4. that is. stable. we can uniquely identify Am (z). On the other hand. So Eq. 1973. we can uniquely identify all the lower-order optimal predictors. (3. from (4.14) and (4. under this condition. and so is * km+1 = am+1.18) Using this. 1993) that the polynomial AN(z) has all its zeros strictly inside the unit circle if and only if |km | < 1. These equations can be inverted to obtain the relations (1 − |km+1 |2 ) × Am (z) = Am+1 (z) − km+1 zBm+1 (z). (4. hence. Thus. 1 ≤ m ≤ N have all zeros strictly inside the unit circle. This is called the upward recursion. we can therefore f also identify the lower-order errors Ei . 4.38 THE THEORY OF LINEAR PREDICTION known from the signal processing literature (Gray and Markel. all the polynomials Am (z). if we know the N th-order optimal prediction polynomial AN (z). (4. This is done in Appendix C.6(b) is.16) In fact. 1 ≤ m ≤ N.19) .

This is a very useful result in compression and coding. We will return to this point in Sections 5. The advantage of lattice structures arises from the fact that even when the coefficients km are quantized the IIR lattice remains stable.LATTICE STRUCTURES FOR LINEAR PREDICTION 39 4. • • • • . as long as the quantized numbers continue to satisfy |km | < 1.6 and 7. and lattice structures for linear prediction have been discussed in detail by Makhoul (1977).8. because it guarantees stable reconstruction of the data segment that was compressed.4 CONCLUDING REMARKS The properties of the IIR lattice have been analyzed in depth by Gray and Markel (1973).

.

If dN = 0. let us assume that we have found the Nth-order optimal predictor polynomial AN (z). we can then represent the process as the output of an IIR filter as shown in Fig. In the time domain. 2. all-pole IIR filter 1/D(z) in response to white input (Fig. that is. (5.1).41 CHAPTER 5 Autoregressive Modeling 5. This model is very useful both conceptually and for approximating the process with a simple model. we can write N x(n) = − i=1 a∗ . Now. 5. the AR process has zero mean according to the above definition. we say that the process is AR(N ).2 AUTOREGRESSIVE PROCESSES A WSS random process w(n) is said to be autoregressive (AR) if it can be generated by using the recursive difference equation N w(n) = − i=1 di *w(n − i) + e(n).1 INTRODUCTION Linear predictive coding of a random process reveals a model for the process. and N 2. e(n) is a zero-mean white WSS process.1(b). AR of order N. the polynomial D(z) = 1 + i=1 di*z−i has all zeros inside the unit circle. called the autoregressive (AR) model.i x(n − i) + eN (n). In other words. Because e(n) has zero mean. In this chapter. 5. we can generate w(n) as the output of a causal. stable.2) . how does the AR process enter our discussion of linear prediction? Given a WSS process x(n).1) where 1. N f (5. f The input to this filter is the prediction error eN (n). we describe this model. We know.

The constant c in c/D(z) is such that Sww (e jω ) has maximum magnitude unity. Thus. we saw that if the error stalls. .4624z−4 . Em does not decrease anymore as m increases f beyond some value N.0000 − 1.0226 If we perform Levinson’s recursion with this.6473 k2 0.2716 R(1) 0.1: Generation of an AR process from white noise and an all-pole IIR filter. ≥ EN−1 > EN = Emf .8198z−1 + 2. Example 5.1758 R(2) −0. suppose the optimal predictors of various orders for a zero-mean process x(n) are such that the minimized mean square errors satisfy f E1 ≥ E2 ≥ .4 jπ so that D(z) = 1. Summarizing. .1: Levinson’s recursion for an AR process.2 (top plot). (5.3) Then. Consider a WSS AR process generated by using an all-pole filter c/D(z) with complex conjugate pairs of poles at 0. we obtain the lattice coefficients k1 −0. 0.0000 k6 0. f f f f m > N.0000 . that is.8e±0. In Section 3.0077 R(3) −0. The first few autocorrelation coefficients for this process are as follows: R(0) 0.2714z−3 + 0. x(n) is AR(N ). the stalling phenomenon implies that x(n) is AR(N ).42 THE THEORY OF LINEAR PREDICTION e(n) white process IIR allpole filter 1 /D(z) w(n) AR process FIGURE 5.2 jπ .4. The power spectrum Sww (e jω ) of this AR(4) process is shown in Fig.1091 R(4) −0.0848 R(5) −0.85e±0.0425z−2 − 1.5469 k4 0.7701 k3 −0.4624 k5 0. then eN (n) is white (assuming x(n) has zero mean). 5.

5 0 0 0.AUTOREGRESSIVE MODELING 1 43 Power spectrum 0. .12 0.2 0. The input power spectrum Sww (e jω ) of an AR(4) process (top) and the f prediction error Em plotted for a few prediction orders m (middle).16 Prediction error 0.4 ω/π 0. 3.1.8 1 Prediction error 0.2 is also reproduced (bottom).2 0.3 0.6 0.2: Example 5.08 0 1 2 Prediction order m 14 FIGURE 5.1 0 0 1 2 3 4 Prediction order m 14 0. The prediction error for the non-AR process of Ex.

2 is also shown in the figure (bottom).2 (middle).5469 −1. The prediction errors are plotted in Fig.8198 −1.4624 0 0 0 0 0 0 0 0 0 0 0 0 The prediction filter does not change after A4 (z) because the original process is AR(4). The optimal prediction polynomials A0 (z) through A6 (z) have the coefficients given in the following table: A0 ( z ) A1 ( z ) A2 ( z ) A3 ( z ) A4 ( z ) A5 ( z ) A6 ( z ) 1. 3.0425 2. can always be represented in terms f f of the optimal linear prediction error eN (n) as in Eq.1457 −1. 5. This is also consistent with the fact that the lattice coefficients past k4 are all zero.0 1. (a) eNf (n) 1 /A N (z) IIR allpole filter x(n) original process e(n) (b) white process 1 /A N (z) IIR allpole filter y(n) AR approximation FIGURE 5.2714 −1. (5.0 1.3 APPROXIMATION BY AN AR(N) PROCESSES Recall that an arbitrary WSS process x(n) (AR or otherwise).8198 0 0.0 1. If we replace eN (n) with zero-mean white noise f e(n) having the mean square value EN .44 THE THEORY OF LINEAR PREDICTION for the first six optimal predictors. The error Em decreases monotonically and then becomes a constant for m ≥ 4. For comparison. In many practical cases. the prediction error Em for the non-AR process of Ex. Notice that these coefficients satisfy |km | < 1 as expected from theory.0425 2. (a) The exact process x(n) expressed in terms of the Nth-order LPC parameters and (b) the AR(N ) approximation y(n). .3: Generation of the AR(N ) approximation y(n) for a WSS process x(n).8198 −1.0425 0 0 −0. we obtain the so-called AR model y(n) for the process x(n).0 −0.7701 1.4624 0.5669 −1.2).4624 0.0 1.2714 −1.3967 2.0 1. 5.2714 0 0 0 0.6473 −1. the error eN(n) f tends to be nearly white for reasonably large N.0 1. this error continues to decrease.

3. the coefficients cN. x(n − i) is a linear combination of e(n − i). where e(n) is white. .e. at any time n. This means that we can express it as N f x(n) = − i=1 cN. then we know that x(n) is AR. e(n − i − 1) . the solution {aN.i . 5. this means that the linear combination N − i=1 * cN. Then LPC Will Reveal It If a zero-mean WSS process x(n) is such that the optimal LPC error eN(n) is white. The process y(n) satisfies N y(n) = − i=1 aN.1 If a Process is AR.i of the AR(N ) process can be identified simply by solving the N th-order . Now consider E[e(n)x*(n − i)] = E e(n)[ g(0)e(n − i) + g(1)e(n − i − 1) + .i x(n − i) + e(n).i x(n − i) is the optimal linear prediction of x(n). E[e(n)e*(n − )] = 0.5) for some set of coefficients cN. > 0). let x(n) be AR(N ). As the solution to the optimal N th-order prediction problem is unique. * (5.i. This is a causal difference equation.AUTOREGRESSIVE MODELING 45 This is shown in Fig. i > 0. In view of the orthogonality principle. consider the converse. . Now. the sample x(n) is a linear combination x(n) = g(0)e(n) + g(1)e(n − 1) + g(2)e(n − 2) + .. Thus.7) In other words. . . 5. . this means that E[e(n)x*(n − i)] = 0. . That is. Similarly.3.i } to the optimal predictor problem is therefore equal to cN. so that.6) (5.i y(n − i) + e(n) * white (5. (5.4) The model process y(n) is also called the AR(N ) approximation (N th-order autoregressive approximation) of x(n).]* Because e(n) is white (i. e(n) is orthogonal to all the past samples of x(n).

N N N This equation follows directly from normal equations. 1. Let x(n) be an AR(N ) WSS process. Summarizing. See Appendix D for details. that is.5). Extrapolation equation. A common example is speech (Markel and Gray. all the coefficients R(k). R(N ) for an AR(N ) process. The AR coefficients cN. the determination of AN (z) and EN provides us a way to extrapolate an autocorrelation.N R(k − N). 1978.i . . . we conclude that for AR(N ) processes. So. LPC of an AR process. we can find R(k) for |k| > N from the first N + 1 values in such a way that the Fourier transform Sxx (e jω ) of the extrapolated sequence is nonnegative for all ω.i will turn out to be cN. The fact that x(n) is not AR means that the prediction error em (n) never gets white. Thus.2 R(k − 2) . The resulting prediction error eN (n) is white and equal to the ‘‘input term’’ e(n) in the AR equation Eq. . we have: Theorem 5. Because the quantities AN (z) and EN can be found from the N + 1 coefficients R(0). (5. a signal x(n) that is not AR can be modeled well by an AR process. the prediction error eN (n) is white and the same as the white noise sequence e(n) appearing in the AR(N ) description Eq.5). the autocorrelation R(k) of an AR(N ) process x(n) is completely f f determined by the filter 1/AN (z) and the variance EN . a zero-mean process representable as in Eq. This will be demonstrated later in Section 6.5) of x(n). . An explicit equation for extrapolation of R(k) can readily be written down. Jayant f and Noll. .i can be found simply by performing N th-order LPC on the process x(n). . but its power spectrum often gets flatter and flatter as m increases. R(1). 5.2 Extrapolation of the Autocorrelation Because eN (n) is white. (5. f 2. (5. Rabiner and Schafer. where e(n) is white. ♦ f 5.3. Then. we can find R(k) for any k > N using the equation R(k) = −a∗ . R(1).3) is the same as x(n). f .3 where we define a mathematical measure for spectral flatness and study it in the context of linear prediction. 1976.. . R(N ) using Levinson’s recursion. Moreover. In other words. the AR model y(n) resulting from the optimal LPC (Fig. Thus. |k| > N f are determined by the above N + 1 coefficients. In many practical situations.1 R(k − 1) − a∗ . given R(0).46 THE THEORY OF LINEAR PREDICTION optimal prediction problem. The optimal LPC coefficients aN. 1984). .1. − a∗ . that is y(n) = x(n).

respectively.2. If the process x(n) is itself AR(N ). and y(n) be the AR(N ) approximation of x(n) generated as in Fig.AUTOREGRESSIVE MODELING 47 5. as the approximation order N increases.9) To see this.8) Thus.3(b) is generated using white noise e(n) f with variance EN .10) But because e(n) is zero-mean white. |AN (e jω )|2 2π f But because r(0) = R(0). (5. 5. (5. R(0) = EN f 0 2π 1 dω |AN (e jω )|2 2π (5. Because Fig.9) follows readily. then we know (Theorem 5. the AR(N ) model y(n) is an approximation of x(n) in the sense that the first N + 1 ♦ autocorrelation coefficients for the two processes are equal to each other.11) . Now consider the sample y(n − k).4). (5. k > 0. 5. (5. Proof of Theorem 5. 5. So.2. Then. AN (z) be its optimal N th-order predictor polynomial. i ≥ 0. | k| ≤ N . observe first that the AR process y(n) in Fig. R(k) = r(k). in what respect is y(n) ‘‘close’’ to x(n)? We now provide a quantitative answer: Theorem 5.3(b) and therefore satisfies the difference equation Eq. 5.3(b) is a causal system. Let R(k) and r(k) be the autocorrelations of x(n) and the AR(N ) approximation y(n). the AR(N ) process y(n) is representable as in Fig. E[e(n)y*(n − k)] = 0. we have E[e(n)e∗ (n − k)] = 0 for k = 0. A simple corollary of Theorem 5.1) that the AR(N ) model y(n) satisfies y(n) = x(n) so that Eq. so that 2π r(0) = 0 EN dω . (5. that is. Thus. (5.8) holds for all k. By construction.4 AUTOCORRELATION MATCHING PROPERTY Let x(n) be a WSS process. which can be used to prove that for any WSS process x(n). more and more values of R(k) are matched to r(k).3. not just |k| ≤ N. Eq. y(n − k) = linear combination of e(n − k − i). In what mathematical sense does y(n) approximate x(n).2 is R(0) = r(0).

where r(k) = E[ y(n)y*(n − k)].20).1 ⎥ ⎢ 0 ⎥ ⎥⎢ .2 R(0) a3.1 ⎤ ⎡ f E3 0 ⎥ ⎢ 0 0⎥ ⎢ ⎥=⎢ 0⎦ ⎣ 0 1 0 × f E2 0 0 × × f E1 0 ⎤ × ×⎥ ⎥ ⎥ ×⎦ f E0 Δ Δu2 The symbol × stands for possibly nonzero entries whose values are irrelevant for the discussion.i and errors Em (see end of Section 4. We .20). . . . Thus. ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ r(0) r*(1) . Thus. ⎥ = ⎢ . for the range of values f 0 ≤ k ≤ N. .1 ⎢ ⎥⎢ ⎢ ⎣ r*(2) r*(1) r(0) r(1) ⎦ ⎣ a3. am.3 ⎡ rN+1 0 1 a2. (2.i and EN completely determine the f lower-order optimal prediction coefficients am. r(N − 1) ⎥ ⎢ aN.3. (2. (2.2 0 0 1 a1. . We obtain a similar relation if R(k) is replaced by r(k). ⎦ ⎣ . we obtain the following set of equations. We now claim that Eq. (5. r (N ) 1 EN ⎥⎢ ⎥ ⎢ ⎥ . ⎥ . ⎦ r(0) aN. ⎤⎡ ⎤ ⎡ f⎤ .2 r*(3) r*(2) r*(1) r(0) a3. This is proved by observing that the coefficients aN.20) and (5. r(1) r(0) . ⎥ ⎢ . we have the following two sets of equations (demonstrated for N = 3) ⎤⎡ r(0) r(1) r(2) r(3) 1 ⎢ r*(1) r(0) r(1) r(2) ⎥ ⎢ a ⎥ ⎢ 3. (2.1 a2. ‘‘Upward f and Downward Recursions’’). . the coefficients aN.4) by y*(n − k). we obtain a set of equations resembling Eq.12) together imply R(k) = r(k). . This set of equations has exactly the same appearance as the augmented normal equations Eq. ⎥ . Thus. taking expectations..i and Em satisfy smaller sets of equations of the form Eq.12) r*(N ) r*(N − 1) .2 0 0 1 a1 .N 0 (5. .37). Note that Δ is lower triangular and Δu1 and Δu2 are upper triangular matrices. .48 THE THEORY OF LINEAR PREDICTION Multiplying Eq.i are the solutions to two optimal predictor problems: one based on the process x(n) and the other based on y(n).1 ⎥⎢ R(1) ⎦ ⎣ a3. ⎦⎣ . and using the above equality. Collecting them together. . . .1 ⎤ ⎡ f E3 0 ⎥ ⎢ 0 0⎥ ⎢ ⎥=⎢ 0⎦ ⎣ 0 1 0 × f E2 0 0 × × f E1 0 ⎤ × ×⎥ ⎥ ⎥ ×⎦ f E0 Δ Δu1 ⎡ R(0) R(1) R(2) ⎢ R∗ (1) R(0) R(1) ⎢ ⎢ ∗ ⎣ R (2) R∗ (1) R(0) R∗ (3) R∗ (2) R∗ (1) RN+1 ⎤⎡ R(3) 1 ⎥⎢ R(2) ⎥ ⎢ a3. . ⎥⎢ .3 0 1 a2.3.1 a2.

and obtain RN+1 = rN+1 .13) H(z) is a causal IIR filter with impulse response.8) as follows: . So we uniquely identify R(0) = E0 .3. h(n).i } and EN . The AR(N ) model of Fig.8)). The left-hand sides. n Because e1 (n) is white with E[|e1 (n)|2 ] = 1. N − 2.4 where e1 (n) is white noise with variance equal to unity and where f H(z) = EN f AN ( z ) (5. This proves Eq. so we will take a different route to prove the desired claim Eq. we can use the normal equation for mth order and identify R(m) from the mth equation in this set (refer to Eq. f the right-hand sides must be diagonal! It is easy to verify that these diagonal matrices have Em as diagonal elements. we get Δ†rN+1 Δ = Δ† Δu1 . so that the right-hand sides of the above two equations are identical. . the autocorrelation r(k) of the AR model process y(n) is given by r(k) = n h(n)h*(n − k). on the other hand. In a nutshell. The deterministic autocorrelation of the sequence h(n) is defined as h(n)h*(n − k).12) must be the same! R(k) is Matched to the Autocorrelation of the IIR Filter. (5. By premultiplying the preceding equations with Δ† . we can identify R(1). R(1).i and Em for m = N − 1. are Hermitian. . the coefficients R(0). As a result. . Thus.8) indeed. from the normal equation with m = 1. (5. . R(N ). .3) to compute am. if R(0). we can use the downward recursion (Section f f 4. Since Δ is nonsingular. . Next. .3(b) can be redrawn as in Fig. we can invert it. In general.8). so r(k) and R(k) in Eqs. (5. given {aN. (2. The right-hand side in each of these equations is a product of two upper triangular matrices and is therefore upper triangular.. (2.i } and the error EN . are unique. . 5. 5.20) and (5. which lead to a given set f of {aN. . (5.AUTOREGRESSIVE MODELING 49 have not shown that Δu1 = Δu2 . R(m − 1) are known. Δ†RN+1 Δ = Δ† Δu2 . This † † proves that Δ RN+1 Δ = Δ rN+1 Δ .14) So we can rephrase Eq. say.

Then.5 POWER SPECTRUM OF THE AR MODEL We just proved that the AR(N ) approximation y(n) is such that its autocorrelation coefficients r(k) match that of the original process x(n) for the first N + 1 values of k. (5. From the preceding theory. the output has the power spectrum Sxx (e jω ) = σe2 |G4 (e jω )|2 .9e±0.2 jπ and 0. the first N + 1 coefficients of the deterministic autocorrelation of h(n) are ♦ matched to the values R(k).4150z−2 − 0. In particular.5 shows the AR(N ) approximations Syy (e jω ) of the spectrum Sxx (e jω ) for various orders N (solid plots). 5. As N increases.15) for |k| ≤ N. we consider a random process generated as the output of the fourth-order IIR filter G4 (z) = 3. we know that the AR(9) approximation and the original process have matching . In this example. we therefore expect the power spectrum Syy (e jω ) of y(n) to resemble the power spectrum Sxx (e jω ) of x(n).9 − 2. This is shown in Fig. That is. with σe2 assumed to be such that the peak value of Sxx (e jω ) is unity. if x(n) is AR(N ).4: Autoregressive model y(n) of a signal x(n).5 (dotted plots).5515z−3 1 − 0.5315z−3 + 0. Let h(n) be the impulse response of the causal stable IIR filter EN /AN (z).9618z−1 + 0. then Eq. Let x(n) be a WSS process with autocorrelation R(k) and AN (z) be the N th-order optimal predictor polynomial.7645z−1 + 1.7300z−2 − 0.50 THE THEORY OF LINEAR PREDICTION e1 (n) white. With white noise of variance σe2 driving this system. We see that the AR(4) approximation is quite unacceptable. 5. ∞ f R(k) = n=0 h(n)h*(n − k).5184z−4 This system has complex conjugate pairs of poles inside the unit circle at 0. (5.8e±0. Figure 5. Example 5.2: AR approximations.15) holds for all k. variance = 1 IIR allpole filter H(z) y(n) AR model of x(n) FIGURE 5. whereas the AR(5) approximation shows significant improvement. Corollary 5.1. Clearly Sxx (e jω ) is not an AR process because the numerator of G4 (z) is not a constant.6 jπ . The successive approximations are better and better until we find that the AR(9) approximation is nearly perfect. This is demonstrated next.

4 ω/π 0.8 1 1 original Power spectrum AR(6) 0.6 0.8 1 1 original Power spectrum AR(5) 0.4 ω/π 0.8 1 FIGURE 5.5: AR approximations of a power spectrum: (a) AR(4).5 0 0 (c) 0.AUTOREGRESSIVE MODELING (a) 51 1 original Power spectrum AR(4) 0. (e)AR(8). and (f) AR(9). (c) AR(6).2 0. (d) AR(7).5 0 0 (b) 0. .6 0.5 0 0 0.2 0.2 0. (b) AR(5).6 0.4 ω/π 0.

4 ω/π 0.8 1 1 Power spectrum original AR(8) 0.6 0.5: Continued. .6 0.8 1 1 original Power spectrum AR(9) 0.5 0 0 (f) 0.2 0.5 0 0 (e) 0.6 0.52 THE THEORY OF LINEAR PREDICTION (d) 1 Power spectrum original AR(7) 0.4 ω/π 0.2 0.8 1 FIGURE 5.4 ω/π 0.5 0 0 0.2 0.

for reasons explained later in Section 6.17) .0234 r(m) 0. although the plots of Sxx (e jω ) and Syy (e jω ) are almost indistinguishable.0518 −0. Kay (1988).0057 0.0364 −0. This is indeed seen from the following table: m 0 1 2 3 4 5 6 7 8 9 10 11 R(m) 0.0430 0. R(m) = r(m) for 0 ≤ m ≤ 9.0054 0.0318 0.6. So the AR(9) approximation is not perfect either. |k| ≤ N (5. then the result may not even be a valid autocorrelation (i. We know that the AR(N ) model approximates the power spectrum Sxx (e jω ) with the all-pole spectrum power spectrum Syy (e jω ) = EN |AN (e jω )|2 f (5.0519 −0. This is also said to be the maximum entropy estimate.0045 0.0441 0.0519 −0.0057 0. Further detailed discussions on power spectrum estimation techniques can be found in Kay and Marple (1981).0031 −0.16) using an AR(N ) model.0518 −0.0241 The coefficients R(m) and r(m) begin to differ starting from m = 10 as expected.0364 −0.0054 0. that is..0819 −0. Peaks are well matched. We can say that the estimate Syy (e jω ) is obtained by extrapolating the finite autocorrelation segment R(k).1667 0.1667 0. The spectrum Syy (e jω ) of the AR approximation y(n) is called an AR-model-based estimate of Sxx (e jω ).0318 0.e.AUTOREGRESSIVE MODELING 53 autocorrelation coefficients. The estimate Syy (e jω ) is nothing but the Fourier transform of the extrapolated autocorrelation r(k). its Fourier transform may not be nonnegative everywhere). Marple (1987). and Therrien (1992). Note that if R(k) were ‘‘extrapolated’’ by forcing it to be zero for |k| > N.0819 −0.0031 −0.0045 0.1.

In the AR(18) approximation. The AR(31) approximation is nearly perfect.4 ω/π 0. then Syy (e jω ) cannot approximate these very well because it cannot have zeros on the unit circle. we have in this example a 10th-order real-coefficient filter G10 (z) with all poles inside the unit circle. The original power spectrum is generated with a pole-zero filter as in Ex. . but instead of G4 (z). the peaks are more nicely matched than the valleys. Sxx (e jω ) has zeros (or sharp dips) on the unit circle.6 0.2 0.2 0.5 0 0 0.8 1 FIGURE 5.2.8 1 1 Original Power spectrum AR(31) 0. on the other hand.5 0 0 0. then AN (z) has to have zeros close to the unit circle to approximate these peaks. If.6 0. 5.54 THE THEORY OF LINEAR PREDICTION If Sxx (e jω ) has sharp peaks.4 ω/π 0. Thus.6: Demonstrating that the AR approximation of a power spectrum shows good agreement near the peaks. the AR(N ) model can be used to obtain a good match of the power spectrum 1 Power spectrum Original AR(18) 0.

878. where L is the number of samples of x(n) available. from the measured values of the random process x(n). we can actually estimate the autocorrelation and hence. we add random noise to the output x(n).AUTOREGRESSIVE MODELING 55 Sxx (e ) near its peaks but not near its valleys.6. R(4) = −2.984. First.85 and 0. then the autoregressive model is such that Sxx (e jω ) = Syy (e jω ).18) In other words. however.2714z−3 + 0. Its variance will be 2 2 denoted as ση .203. are useful in some applications such as the AR representation of speech.3: Estimation of AR model paramaters. 5.4624z−4 This has roots 0. .8084 j. To see the effect of noise (which is always present in practice). The purpose of the example is to show that.2627 ± 0. the N th-order prediction problem places in evidence a stable spectral factor HN (z) of the power spectrum Sxx (e jω ).032. R(3) = −3. Such approximations. A sample of the AR(N ) process can be generated by first generating a WSS white process e(n) using the Matlab command randn and driving the filter 1/A4 (z) with input e(n). the model parameters (coefficients of A4 (z)) accurately. Sxx (e jω ) = jω EN |AN (e jω )|2 f (5. Thus. R(2) = −1.6472 ± 0. So. Example 5.0000 − 1. The measurement noise η(n) is assumed to be zero-mean white Gaussian. When x(n) is not AR(N ). With L = 210 and ση = 10−4 . where A4 (z) = 1. We now consider an AR(4) process x(n) derived by driving the filter 1/A4 (z) with white noise.4702 j with absolute values 0. we have found a rational transfer function HN (z) = c/AN (z) such that Sxx (e jω ) = |HN (e jω )|2. If the process x(n) is indeed AR(N ). This is demonstrated in the example shown in Fig. the estimated autocorrelation R(k) for 0 ≤ k ≤ 4 is as follows: R(0) = 8. so the actual data used are xnoisy (n) = x(n) + η(n). We see that HN (z) is a spectral factor of Sxx (e jω ). R(1) = 4. the filter HN (z) is an AR(N ) approximation to the spectral factor. and xnoisy (n) used instead of x(n). we estimate the autocorrelation R(k) using (2. 0. when x(n) is AR(N ). Spectral factorization.8.8198z−1 + 2.0425z−2 − 1.250.43).

Magnitude responses of the original system 1/A4 (z) and the estimate 1/A4 (z) are shown for three values of the measurement noise variance 2 2 ση = 10−4 . respectively.2 0.2 0. The segment length used to estimate autocorrelation is L = 210 . Estimation of AR model parameters. . from top to bottom.7: Example 5.2 0.6 0.4 ω/π 0.6 0.4 ω/π 0.8 1 Original 5 Magnitude response Estimate 4 3 2 1 0 0 0.0025.8 1 Original 5 Magnitude response Estimate 4 3 2 1 0 0 0.3.01. 0.6 0.8 1 FIGURE 5. and ση = 0.56 THE THEORY OF LINEAR PREDICTION Original 5 Magnitude response Estimate 4 3 2 1 0 0 0.4 ω/π 0.

6 0.8 1 Original 5 Magnitude response 4 3 2 1 0 0 Estimate 0.4 ω/π 0.4 ω/π 0.8 1 Original 5 Magnitude response 4 3 2 1 0 0 Estimate 0.3.7 with segment length L = 28 instead of 210 .8: Example 5.2 0.AUTOREGRESSIVE MODELING 57 Original 5 Magnitude response 4 3 2 1 0 0 Estimate 0.2 0.6 0. Repetition of Fig.4 ω/π 0.6 0.2 0. . 5.8 1 FIGURE 5.

(i.7 (top plot) shows the magnitude responses of the filters 1/A4 (z) and 1/A4 (z). the parcor coefficients ki .19) Here. The roots of A4 (z) are 0. the estimate deteriorates with increasing measurement noise.i . the polynomial coefficients aN.8038z−1 + 2.6 APPLICATION IN SIGNAL COMPRESSION The AR(N ) model gives an approximate representation of the entire WSS process x(n).4679 j. the end effects (caused by the use of finite summation) are negligible.i and the mean square value EN ).2616 ± 0. It is typical to subdivide the sampled speech waveform x(n) (evidently.e. L is the length of the segment. The model has to be updated as time evolves. a real-valued sequence) into consecutive segments (Fig. 5. corresponding to 200 samples at a sampling rate of 10 kHz. which are seen to be in good agreement. 5. If N << L.5). This sequence of 200 samples is viewed as representing a WSS random process. the finiteness of duration of the segment is of secondary importance in affecting the accuracy of the estimates.6403 ± 0. . Figure 5.8 shows similar plots for the case where the number of measured samples of x(n) are reduced to L = 28 . (5. The simplest would be to use the average R(k) ≈ 1 L L−1 x(n)x(n − k). this modeling is commonly used for compressing the signal waveform. 1 Note that when x(n) is real-valued as in speech. In practice. using f only N + 1 numbers.58 THE THEORY OF LINEAR PREDICTION From this.0025 and ση = 0. the N autoregressive coefficients a∗ .01. Decreasing L results in poorer estimates of A4 (z) because the estimated R(k) is less accurate. respectively.4491z−4 This should be compared with the original A4 (z) given above.0131z−2 − 1. Fig. With N << 200 (usually N ≈ 10). one multiplies the segment with a smooth window to reduce the end effects. the estimated prediction polynomial can be obtained as A4 (z) = 1. There are several sophisticated ways to perform this estimation (Section 2.1 Each segment might be typically 20-ms long.0000 − 1. which are inside the unit circle as expected.9). n=0 0 ≤ k ≤ N. and the first N autocorrelation coefficients R(k) are estimated.. 5.8036 j and 0. and the f prediction error sequence eN (n) are real as well. N In practical applications such as speech coding. Further details about these techniques can be found in Rabiner and Schafer (1978). The middle and lower plots show these responses with 2 2 the measurement noise variance increased to ση = 0.2436z−3 + 0. For the same three levels of measurement noise. Thus. Signals of practical interest can be modeled by WSS processes only over short periods.

N EN f (5.2 . The locally generated white noise e(n) is satisfactory only when the speech segment under consideration is unvoiced . .9(b). 5.AUTOREGRESSIVE MODELING 59 x(n) signal to be compressed segment k 20 ms k+1 k+2 … (a) Perform LPC compression A (z) N and f N LPC parameters (b) e(n) white.20) is then transmitted. A segment of the output y(n) is then taken as the AR(N ) approximation of the original speech segment. Note that with segment length L = 200 and the AR model order N = 10.i z (5. For the next segment of x(n).g. Pitch-excited coders and noise-excited coders. one generates a local white-noise f source e(n) with variance EN and drives the IIR filter 1 = AN (z) 1+ 1 N −i i=1 aN. variance f N 1/ A N (z) IIR allpole filter y(n) reconstruction FIGURE 5.9: (a) Compression of a signal segment using LPC and (b) reconstruction of an approximaf tion of the original segment from the LPC parameters AN (z) and EN . . possibly after quantization. the part . the LPC parameters AN (z) and EN are generated again. Strictly speaking.. Such sounds have no pitch or periodicity (e.21) with this noise source as in Fig. aN. a segment of 200 samples has been compressed into 11 coefficients! The encoded message aN. At the receiver end. The k th segment of the original signal x(n) is approximated by a segment of the AR signal y(n).1 aN. the preceding discussion tells only half the story.

with period T.i . 1993) or the excellent. Many variations of linear prediction have been found to be useful in speech compression. The reader interested in the classical literature on speech compression techniques should study (Atal and Schroeder. white noise generator for unvoiced signals 1/ A N (z) periodic impulse train generator AR(N) model of speech segment y(n) reconstructed speech segment for voiced signals FIGURE 5. it is causal and stable). 5. A solution to this problem is to transmit the ‘‘parcor’’ or lattice coefficients km .. the AR(N ) model is excited by a noise generator or a periodic impulse train generator. from the quantized versions km .g. At the receiver end.10. one recomputes an approximation AN (z) (q ) of AN (z). this uses the predictive coder in a feedback loop in an ingenious way. we know that |km | < 1. A technique called differential pulse code modulation is widely used in many applications.1).3. tutorial paper on vector quantization by Makhoul et al. The modification required at the receiver is that the AR(N ) model is now driven by a periodic waveform such as an impulse train. 1984. This is summarized in Fig. .. we also know that 1/AN (z) is stable if and only if |km | < 1. Deller et al. The source of excitation depends on whether the segment is voiced or unvoiced (see text).. From Section 3. Rabiner and Schafer.i before transmission. although old.e. according to whether the speech segment is unvoiced or voiced. There are further complications in practice such as transitions from voiced to unvoiced regions and so forth. It is shown in Appendix C that the unquantized coefficients aN. Then. this may not continue to remain true. if we quantize aN.10: Reconstruction of a speech segment from its AR(N ) model. particularly in low bit-rate coding. voiced sounds such as vowel utterances (e. (1985). In view of the relation between Levinson’s recursion and lattice structures (Section 4. However. 1978. Summarizing.60 THE THEORY OF LINEAR PREDICTION ‘‘Sh’’ in ‘‘Should I. which is the local periodicity of the utterance. This periodicity T of the speech segment has to be transmitted so that the receiver can reproduce faithful voiced sounds. which result from the normal equations are such that 1/AN (z) has all poles inside the unit circle (i. Jayant and Noll. Quantization and stability. 1970. the AR(N ) model 1/AN (z) is guaranteed to be stable.3. ‘‘I’’) are characterized by a pitch.’’ By contrast. Suppose the coefficients km are quantized in such a way that |km | < 1 (q ) continues to hold after quantization.

AUTOREGRESSIVE MODELING

61

In Section 7.8, we introduce the idea of line-spectrum pair (LSP). Instead of quantizing and transmitting the coefficients of AN (z) or the lattice coefficients km , one can quantize and transmit the LSPs. Like the lattice coefficients, the LSP coefficients preserve stability even after quantization. Other advantages of using the LSP coefficients will be discussed in Section 7.8.

5.7

MA AND ARMA PROCESSES

We know that a WSS random process x(n) is said to be AR if it satisfies a recursive (IIR) difference equation of the form
N

x(n) = −
i=1

di*x(n − i) + e(n),
N

(5.22)

where e(n) is a zero-mean white WSS process, and the polynomial D(z) = 1 + i=1 di*z−i has all zeros inside the unit circle. We say that a WSS process x(n) is a moving average (MA) process if it satisfies a nonrecursive (FIR) difference equation of the form
N

x(n) =
i=0

pi*e(n − i),

(5.23)

where e(n) is a zero-mean white WSS process. Finally, we say that a WSS process x(n) is an ARMA process if
N N

x(n) = −
i=1

di*x(n − i) +
i=0

pi*e(n − i),

(5.24)

where e(n) is a zero-mean white WSS process. Defining the polynomials
N N

D( z ) = 1 +
i=1

di*z−i

and P(z) =
i=0

pi*z−i ,

(5.25)

we see that the above processes can be represented as in Fig. 5.11. In each of the three cases, x(n) is the output of a rational discrete time filter, driven by zero-mean white noise. For the AR process, the filter is an all-pole filter. For the MA process, the filter is FIR. For the ARMA process, the filter is IIR with both poles and zeros. We will not have occassion to discuss MA and ARMA processes further. However, the theory and practice of linear prediction has been modified to obtain approximate models for these types of processes. Some discussions can be found in Marple (1987) and Kay (1988). In Section 5.5, we saw that the AR model provides an exact spectral factorization for an AR process. An

62

THE THEORY OF LINEAR PREDICTION
(a)

e(n)
white

1 /D(z)
IIR allpole filter

x(n)

AR process

(b)

e(n)
white

P(z)
FIR filter

x(n)

MA process

(c)

e(n)
white

P(z) /D(z)
IIR pole-zero filter

x(n)

ARMA process

FIGURE 5.11: (a) AR, (b) MA, and (c) ARMA processes.

approximate spectral factorization algorithm for MA processes, based on the theory of linear prediction, can be found in Friedlander (1983).

5.8

SUMMARY
1. An AR process x(n) of order N (AR(N ) process) satisfies an N th-order recursive difference equation of the form Eq. (5.22), where e(n) is zero-mean white noise. Equivalently, x(n) is the output of a causal stable IIR all-pole filter 1/D(z) (Fig. 5.11(a)) excited by white noise e(n). 2. If we perform N th-order linear prediction on a WSS process x(n), we obtain the represenf tation of Fig. 5.3(a), where eN (n) is the optimal prediction error and AN (z) is the predictor f polynomial. By replacing eN (n) with white noise, we obtain the AR(N ) model of Fig. 5.3(b). Here, y(n) is an AR(N ) process, and is the AR(N ) approximation (or model) for x(n). 3. The AR(N ) model y(n) approximates x(n) in the sense that the autocorrelations of the two processes, denoted r(k) and R(k), respectively, are matched for 0 ≤ |k| ≤ N (Theorem 5.2). 4. The AR(N ) model y(n) is such that its power spectrum Syy (e jω ) tends to approximate the power spectrum Sxx (e jω ) of x(n). This approximation improves as N increases and is particularly good near the peaks of Sxx (e jω ) (Fig. 5.6). Near the valleys of Sxx (e jω ), it is not so good, because the AR spectrum Syy (e jω ) is an all-pole spectrum and cannot have zeros on or near the unit circle.

We conclude the chapter with a summary of the properties of AR processes.

AUTOREGRESSIVE MODELING

63

5. If x(n) itself is AR(N ), then the optimal N th-order prediction error is white. Thus, x(n) can be written as in Eq. (5.22), where e(n) is white and di are the solutions (usually denoted aN,i ) to the normal equations Eq. (2.8). Furthermore, the AR(N ) power spectrum f EN /|AN (e jω )|2 is exactly equal to Sxx (e jω ). Thus, we can compute the power spectrum Sxx (e jω ) exactly, simply by knowing the autocorrelation coefficients R(k), |k| ≤ N and identifying AN (z) using normal equations. This means, in turn, that all the autocorrelation coefficients R(k) of an AR(N ) process x(n) are determined, if R(k) is known for |k| ≤ N.
• • • •

f eN (n)

65

CHAPTER 6

Prediction Error Bound and Spectral Flatness
6.1 INTRODUCTION
In this chapter, we first derive a closed-form expression for the minimum mean-squared prediction error for an AR(N ) process. We then introduce an important concept called spectral flatness. This measures the extent to which the power spectrum Sxx (e jω ) of a WSS process x(n), not necessarily 2 2 2 AR, is ‘‘flat.’’ The flatness γx is such that 0 < γx ≤ 1, with γx = 1 for a flat power spectrum (white process). We will then prove two results: 1. For fixed mean square value R(0), as the flatness measure of an AR(N ) process x(n) gets f smaller, the mean square optimal prediction error EN also gets smaller (Section 6.4). So, a less flat process is more predictable. 2. For any WSS process (not necessarily AR), the flatness measure of the optimal prediction f error em (n) increases as m increases. So, the power spectrum of the prediction error gets flatter and flatter as the prediction order m increases (Section 6.5).

6.2

PREDICTION ERROR FOR AN AR PROCESS

If x(n) is an AR(N ) process, we know that the optimal prediction error eN (n) is white. We will f show that the mean square error EN can be expressed in closed form as

f

EN = exp

f

1 2π

2π 0

LnSxx (e jω ) dω

(6.1)

where Sxx (e jω ) is the power spectrum of x(n). Here, Ln is the natural logarithm of a positive number as normally defined in elementary calculus and demonstrated in Fig. 6.1. Because x(n) is AR, the spectrum Sxx (e jω ) > 0 for all ω , and LnSxx (e jω ) is real and finite for all ω . The derivation of the above expression depends on the following lemma.

66

THE THEORY OF LINEAR PREDICTION
3 2

Ln y

1 0 -1 0

1

5

10

y

FIGURE 6.1: A plot of the logarithm.

Lemma 6.1. Consider a polynomial in z−1 of the form A(z) = 1 + the zeros of A(z) be strictly inside the unit circle. Then, 1 2π for some integer k. This implies
2π 0 2π 0

N n=1

anz−n , and let all

lnA(e jω ) dω = j 2π k

(6.2)

ln|A(e jω )|2dω/2π = j 2π m for some integer m, so that
2π 0

1 2π

Ln|A(e jω )|2 dω = 0,

(6.3) ♦

where Ln(.) stands for the principal value of the logarithm.

Because the logarithm arises frequently in our discussions in the next few sections, it is important to bear in mind some of the subtleties in its definition. Consider the function ln w, with w = Re jθ . Here, R > 0 and θ is the phase of w. Clearly, replacing θ with θ + 2π k for integer k does not change the value of w. However, the value of ln w will depend on k. Thus, lnw = LnR + jθ + j 2π k,

(6.4)

where Ln is the logarithm of a positive number as normally defined in calculus (Fig. 6.1). Thus, ln w has an infinite number of branches, each branch being defined by one value of the arbitrary integer k (Kreyszig, 1972; Churchill and Brown, 1984). If we restrict θ to be in the range −π < θ ≤ π and set k = 0, then the logarithm ln w is said to be evaluated on the principal branch and denoted as Lnw (called the principal value). For real and positive w, this agrees with Fig. 6.1. Note that Ln1 = 0, but ln1 = j 2π k, and its value depends on the branch number k. Because the branch of the function ln w has not been specified in the statement of Lemma 6.1, the integer k is undetermined (and unimportant for our discussions). Similarly, the integer m is

1984).9) The power series Eq.7) where ki is some integer. (6. .) (because exp(j 2π k) = 1 for any integer k). Proof of Lemma 6. (6. we have ∞ ln(1 − αi e− jω ) = − n=1 αn e− jωn i + j 2 π ki . (6. n (6. 2. we deal with quantities of the form Eq. ln A(e jω ) is well defined and finite there (except for branch ambiguity). This proof depends crucially on the condition that the zeros of A(z) are inside the unit circle. n |v| < 1. . . Churchill and Brown. (6. 1972. in practically all our applications. However. 3 .PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 67 undetermined. so that it is immaterial whether we use ln(.1). (6. . This condition enables us to expand the principal value Ln(1 − αi e− jω ) using the power series1 ∞ Ln(1 − v) = − n=1 vn .1. Note first that because A(z) has no zeros on the unit circle. n = 1. that is |αi | < 1.) or Ln(.10) 1 This expansion evaluates the logarithm on the principal branch because it reduces to zero when v = 0. We know that any power series converges uniformly in its region of convergence and can therefore be integrated term by term (Kreyszig. (6. We can express N A( e ) = i=1 jω (1 − αi e− jω ). The distinction should be kept in mind only when trying to follow detailed proofs.6) It is therefore sufficient to prove that 1 2π 2π 0 ln(1 − αi e− jω ) dω = j 2π ki.8) Identifying v as v = αi e− jω . Because 2π 0 e− jωn dω = 0.8) has region of convergence |v| < 1.5) so that N ln A(e ) = i=1 jω ln(1 − αi e− jω ) + j 2π k. (6.

the power spectrum of eN (n) is given by Sxx (e jω )|AN (e jω )|2 . 2π 6.68 THE THEORY OF LINEAR PREDICTION Eq. We know that x(n) is said to be white if its power spectrum Sxx (e jω ) is constant for all ω . proving (6. Thus. The third equality above was possible because Sxx (e jω )|AN (e jω )|2 is constant for all ω . Derivation of Eq. (6. 1 2π 2π 0 = exp = exp 1 2π 1 2π Ln|AN (e jω )|2 dω because 0 Ln|AN (e jω )|2 dω = 0 by Lemma 6. Although such a measure is by no means unique.7) readily follows.2). the i above argument can be repeated to obtain a similar result. A *(e jω ) = i=1 (1 − α* e jω ). When x(n) is not white. its spectrum is constant. 2π 0 2π 0 2π 0 N 1 2π ln|A(e jω )|2 dω = 1 2π ln A(e jω )dω + 1 2π ln A∗ (e jω )dω = j 2π m for some integer m.3 A MEASURE OF SPECTRAL FLATNESS Consider a zero-mean WSS process x(n). (6.1). Thus. So. This establishes the desired result. signal compression. . This completes the proof. Recall that eN (n) is the output of the FIR filter AN (z). Because |αi| < 1. Because eN(n) f is white. Next. the one introduced in this section proves to be useful in linear prediction theory. it is of interest to introduce a ‘‘degree of whiteness’’ or ‘‘measure of flatness’’ for the spectrum. f EN = Sxx (e jω )|AN (e jω )|2 = exp Ln Sxx (e jω )|AN (e jω )|2 1 = exp 2π 2π 0 2π 0 2π 0 f Ln Sxx (e jω )|AN (e jω )|2 dω LnSxx (e jω ) dω + LnSxx (e jω ) dω .1. and its value at any frequency equals the mean square value EN of f eN (n). and other signal processing applications. in response f f to the process x(n).

where the integer subscript i in xi is replaced with a continuous variable. which is the mean square value of x(n). .. Definition 6.13) Sxx (e ) dω jω where it is assumed that Sxx (e jω ) > 0 for all ω . The arithmetic-mean geometric mean inequality (abbreviated as the AM--GM inequality) says that any set of positive numbers x1 . . γx > 0. 0≤ GM ≤1 AM (6. xN satisfies the inequality 1 N N N 1/N xi ≥ i=1 AM i=1 GM xi (6. the flatter the distribution is.14) .’’ The larger this ratio. The denominator of γx is 1 2π 2π 0 Sxx (e jω )dω = R(0). For a WSS process with power spectrum Sxx (e jω ). 2 2 Clearly.11) with equality if and only if all xi are identical (Bellman. . this ratio grows and becomes unity when the numbers are identical. say frequency) we can obtain a useful measure of flatness for the power spectrum. As the numbers get closer to each other. ♦ 2 We will see that the ratio γx can be regarded as the GM/AM ratio of the spectrum Sxx (e jω ).12) and the ratio GM/AM is a measure of how widely different the numbers {xi } are. 1960). Spectral flatness. By extending the concept of the GM/AM ratio to functions of a continuous variable (i.e.PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 69 Motivation from AM/GM ratio. Thus. . The ratio GM/AM can therefore be regarded as a measure of ‘‘flatness.1. We will first show that 1 2π 2π 0 Sxx (e jω ) dω ≥ exp 1 2π 2π 0 LnSxx (e jω ) dω (6. we define the spectral flatness measure as exp 2 γx = 1 2π 1 2π 2π 0 2π 0 LnSxx (e jω ) dω (6.

Suppose we divide the interval 0 ≤ y ≤ a into N uniform subintervals of length Δy as shown in Fig.. 2 with γx = 1 if and only if x(n) is white. (6. n=0 f(y) … 0 Δy =a/N y 2Δy ΝΔy = a FIGURE 6.70 THE THEORY OF LINEAR PREDICTION with equality if and only if Sxx (e jω ) is constant (i.2.14).2) so that the above becomes 1 a N −1 f (nΔy)Δy ≥ exp n=0 1 a N −1 Ln f (nΔy)Δy . we obtain N −1 N −1 1 N f (nΔy) ≥ n=0 n=0 f (nΔy) N −1 1/N = exp Ln = exp 1 N n=0 N −1 f (nΔy) 1/N Ln f (nΔy) .e. 6. (6. pp.2: Pertaining to the proof of Eq.14). we have 2 0 < γ x ≤ 1. . 63--64). Because of this result.14) can be found in Rudin (1974. The justification below is based on extending the AM--GM inequality for functions of continuous argument. Consider a function f (y) > 0 for 0 ≤ y ≤ a. A rigorous proof of a more general version of (6. n=0 Equality holds if and only if f (nΔy) is identical for all n. We can replace 1/N with Δy/a (Fig. x(n) is white). From the AM--GM inequality. 6. Justification of Eq.

1 and (b) three special cases of this power spectrum.. (6. 6.13) can be computed as 1 2π 2π 0 exp LnSxx (e jω ) dω = exp αLn 1 +1 α . It can be verified that R(0) = 1 2π 2π 0 Sxx (e jω ) dω = 2. Spectral Flatness Measure.1. (6. Sxx (e (a) jω ) 1 + (π/ω c ) α = ω c /π 1 ω 0 ωc jω π jω Sxx (e (b) jω ) Sxx (e ) medium α Sxx (e ) max α (= 1) small α 1 0 ω π 0 1 ω π 0 2 ω π FIGURE 6. N → ∞).15) With f ( y) identified as Sxx (e jω ) > 0 and a = 2π .3(b) shows the spectrum for various values of α. Fig.e. Consider a real. We therefore have 1 a a 0 f ( y) dy ≥ exp 1 a a 0 Ln f ( y) dy . the two summations in the above equation become integrals. 6. (6. The quantity in the numerator of Eq. zero-mean WSS process x(n) with the power spectrum Sxx (e jω ) shown in Fig. 6. we immediately obtain Eq. Example 6. .3(a).14).PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 71 If we let Δy → 0 (i.3: (a) Power spectrum for Ex.

6. the mean square prediction error EN is smaller if the flatness measure γx is smaller. We can simplify this and substitute into the definition of γx to obtain the flatness measure 2 γx 1 = 0. The spectral flatness measure γx defined in Eq.16) 2 For fixed R(0). then the mean square value of the optimal prediction 2 error is given by Eq. the spectrum gets flatter.13) therefore becomes 2 γx = EN R(0) f f (6.6 0. Qualitatively speaking. Expression in Terms of the Impulse Response.1.5 1 + α α This is plotted in Fig.2 0. this means that if x(n) has a very nonflat power spectrum.6 0.8 γ x2 0.2 0 0.4 0. 6. 6. then the f prediction error EN can be made very small. An equivalent and useful way to express the above flatness measure is provided by the fact that x(n) is the output of the IIR filter G(z) = 1 . (6.4: Flatness measure γx in Ex. 6. then the prediction error EN is nearly as large as the mean square value of x(n) itself. We see that as α increases from zero to unity. .1).8 1 α 2 FIGURE 6.4 SPECTRAL FLATNESS OF AN AR PROCESS We know that if x(n) is an AR(N ) process. AN ( z ) 1 0. (6.4 0.72 THE THEORY OF LINEAR PREDICTION 2 where α = ωc /π .4 as a function of α. If on the other hand the spectrum Sxx (e jω ) of x(n) is f nearly flat.3(b). as expected from an examination of Fig.

PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 73 in response to the input (Fig. Because x(n) is AR(N ). Spectral flatness of AR process. with −1 < ρ < 1.3. we considered a process x(n) with autocorrelation R(k) = ρ|k| .2. the spectral flatness γx of the process x(n) can be expressed in either of the following two forms: 2 1.1(b)). we have Theorem 6. So. the power spectrum of x(n) is Sxx (e jω ) = f eN (n) f eN (n) is white.1. n=0 where g(n) is the causal impulse response of G(z). (6. and EN .16) becomes 2 γx = ∞ n=0 1 1 = . ♦ Example 6. The optimal first-order prediction polynomial was found to be A1 (z) = 1 − ρz−1 . Spectral Flatness of an AR(1) Process. Eq. 3. Let AN (z) be the optimal N th-order prediction polynomial and EN the 2 corresponding mean square prediction error. |AN (e jω )|2 f The mean square value of x(n) is therefore given by R(0) = EN × f 1 2π 2π 0 1 f dω = EN |AN (e jω )|2 ∞ |g(n)|2 . or ∞ 2 2. γx = 1/ n=0 |g(n)|2 . 2. . Let x(n) be an AR(N ) process with mean square f value R(0) = E[|x(n)|2]. the error therefore. f where g(n) is the causal impulse response of G(z) = 1/AN(z). γx = EN /R(0). Then. Summarizing. In Ex. 2 |g(n)| Eg where ∞ Eg = n=0 |g(n)|2 = energy of the filter G(z).

6: The power spectrum of an AR(1) process for two values of the correlation coefficient ρ.5 0 0 ω π 0 0 ω π FIGURE 6. which is the Fourier transform of R(k). So for the AR(1) process x(n) is = 1/(1 − ρ 2). the process gets flatter and flatter as ρ 2 gets smaller. The power spectrum of the process x(n).5.1 is G( z ) = 1 1 = A1 ( z ) 1 − ρ z −1 ∞ 2 n=0 g (n) f and its impulse response is g(n) = ρnU (n). is given by Sxx (e jω ) = 1 − ρ2 1 − ρ2 = −jω |2 |1 − ρ e 1 + ρ 2 − 2ρ cos ω This is depicted in Fig. and the flatness measure 2 γx = 1 − ρ 2 .5 1 ρ FIGURE 6.6 0.2 0 -1 -0. .5 0 0. (For −1 < ρ < 0.4 0. We also saw that the first-order prediction error sequence e1 (n) is white.8 2 γx 0. 6. This means that the process x(n) is AR(1).) As ρ increases from zero to unity (i.74 THE THEORY OF LINEAR PREDICTION 1 0. This is plotted in Fig. the pole of 20 1 ρ = 0.5: The flatness of an AR(1) process as a function of the correlation coefficient ρ.6 for two (extreme) values of ρ in the range 0 < ρ < 1.1 ρ = 0. The IIR all-pole filter G(z) defined in Theorem 6. So. 6.e. it is similar except that the peak occurs at π ..9 Sxx (e jω ) Sxx (e jω ) 10 0.

We now prove this result.17) ♦ Furthermore.5 CASE WHERE SIGNAL IS NOT AR We showed that if x(n) is an AR(N ) process. 6.1). (6. equality holds if and only if x(n) is AR(N ) and m ≥ N. so that Emf = 1 2π 2π 0 Sxx (e jω )|Am (e jω )|2 dω 2π 0 2π 0 2π 0 ≥ exp = exp = exp 1 2π 1 2π 1 2π Ln Sxx (e jω )|Am (e jω )|2 dω LnSxx (e jω ) dω + 0 2π (by Eq. (6.1) for any WSS process. 6. its power spectrum is given by Sxx (e jω )|Am (e jω )|2 . f m theoretical bound 0 1 2 f m (prediction order) FIGURE 6. We know that the prediction error is the output of the FIR filter Am (z) with input equal to x(n) (Fig.PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 75 G(z) gets closer to the unit circle). 2 consistent with the fact that γx decreases as in Fig. Let Em be the mean square value of the optimal mth-order prediction error. So.5.7: The optimal prediction error Em decreases monotonically with increasing m but is lower bounded by Eq.1(a)). Emf ≥ exp 1 2π 2π 0 LnSxx (e jω ) dω . Prediction error bound. 6. Theorem 6. f em (n) Proof. (6. then the Nth-order mean square prediction error is given by (6.15)) Ln|Am (e jω )|2 dω LnSxx (e jω ) dω (by Lemma 6. Let x(n) be a WSS process with power spectrum f jω Sxx (e ) finite and nonzero for all ω . 2.1). When x(n) is not AR.7.2. the spectrum gets more and more peaky (less and less flat). Then. it turns out that this same quantity acts as a lower bound for the mean square prediction error as demonstrated in Fig. .

8: The flatness γm of the optimal prediction error increases monotonically with increasing m. (6. the error Em decreases.1).18) The numerator is fixed for a given process. 6. This proves that the flatness is an increasing function of prediction order m (Fig. 6. x(n) is AR with order ≤ m.5. this simplifies to exp 2 γm = 1 2π 2π 0 LnSxx (e jω ) dω Em f . 1 2 γm flatness of error 0 1 2 m (prediction order) 2 FIGURE 6. or equivalently. 2 The flatness measure γm for the process em (n) is f f exp 2 γm = 1 2π 2π 0 2π 0 LnSm (e jω ) dω = Using 2π 0 1 2π 1 exp 2π Sm (e jω ) dω Ln Sxx (e jω )|Am (e jω )|2 dω 2π 0 Em f Ln|Am (e jω )|2 dω = 0 (Lemma 6.8). As the prediction order increases.76 THE THEORY OF LINEAR PREDICTION Equality holds if and only if Sxx (e jω )|Am (e jω )|2 is constant. With Sm (e jω ) denoting the power spectrum of em (n).1 Error Spectrum Gets Flatter as Predictor Order Grows We will now show that the power spectrum of the error em (n) gets ‘‘flatter’’ as the prediction order f m increases. It is bounded above by unity. The proof is therefore complete. we have Sm (e jω ) = Sxx (e jω )|Am (e jω )|2 . .

The flatness γm of the error f f process em (n) is inversely proportional to the mean square value of the prediction error Em .219 R(3) −3.3: Spectral Flattening With Linear Prediction. .5469 Again.0 −0.7701 1.0 1. γx and R(0) are constants determined by the input process x(n).474 If we do linear prediction on this process. The lattice coefficients are k1 k2 0.5469 −1.8198 −1.2714z−3 + 0.0 1. after k4 .4624 0 0 0 0 0 0 0 0 0 0 0 0 Thus.8198 0 0.PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 77 We can rewrite 2 γm in terms of the flatness measure 2 γm = 2 γx of the original process x(n) as follows: 2 γx R(0) Em f .967 R(2) −0.0425 2. Example 6.1457 −1.397 R(5) −0.4624 k5 0. the prediction polynomials are A0 ( z ) A1 ( z ) A2 ( z ) A3 ( z ) A4 ( z ) A5 ( z ) A6 ( z ) 1. because the original process is AR(4).0 1.5669 −1.4624 0.0425 2.0 k6 0.4624 0.639 R(6) −0.087 R(7) −0.674 R(1) 4.0 1.8198 −1.0 1.083 R(4) −2.2714 −1. the polynomial does not change.0 1. 2 2 Here.6473 −1. |A(e jω )|2 A(z) = 1 − 1. after A4 (z).4624z−4 The first few elements of the autocorrelation are R(0) 7.0425z−2 − 1.2714 0 0 0 0.0425 0 0 −0.8198z−1 + 2. In this example. all the coefficients are zero because the original process is AR(4).0 k7 0.7701 k3 k4 0.6473 −0.2714 −1. we consider an AR(4) process with power spectrum Sxx (e jω ) = where 1 .3967 2.0 −0.

5 0 0 0. It is seen that the spectrum Sm (e jω ) gets flatter as m grows.9: Original AR(4) power spectrum (dotted) and power spectrum of prediction error (solid). and S4 (e jω ) is completely flat.9 shows the power spectrum Sm (e jω ) of the prediction error em (n) for 1 ≤ m ≤ 4.78 THE THEORY OF LINEAR PREDICTION Fig. (c) prediction order = 3.0 1.8 1 (b) 1 Error power-spectrum m=0 m=2 0. (a) Prediction order = 1. .0 (a) 1 Error power-spectrum m=0 m=1 0.1309 0. and (d) prediction order = 4.6 0. is as follows: 2 γ0 2 γ1 2 γ2 2 γ3 2 γ4 2 γ5 2 γ6 f 0.8 1 FIGURE 6.6 0.2 0.2 0. 6.7862 1. calculated for each of these power spectra.5517 0. The original power spectrum Sxx (e jω ) = S0 (e jω ) is shown in each plot for comparison.4 ω/π 0. (b) prediction order = 2.0 1.2248 0.5 0 0 0. For convenience. the plots are shown normalized so that Sxx (e jω ) has peak value equal to unity. again because the original process is AR(4). The flatness measure.4 ω/π 0.

PREDICTION ERROR BOUND AND SPECTRAL FLATNESS
(c)

79

1

Error power-spectrum

m=0 m=3
0.5

0 0

0.2

0.4

ω/π

0.6

0.8

1

(d)

1

Error power-spectrum

m=0 m=4
0.5

0 0

0.2

0.4

ω/π

0.6

0.8

1

FIGURE 6.9: Continued.

6.5.2 Mean Square Error and Determinant
In Section 2.4.3, we found that the minimized mean square prediction error can be written in terms of the determinant of the autocorrelation matrix. Thus, det Rm = Em−1 Em−2 . . . E0 .
f f f f

(6.19)

We know that the sequence Em is monotone nonincreasing and lower bounded by zero. So it f reaches a limit E∞ as m → ∞. We will now show that
1 /m
m→∞

lim

det Rm

f = E∞

(6.20)

80

THE THEORY OF LINEAR PREDICTION

The quantity (det Rm )1/m can be related to the differential entropy of a Gaussian random process (Cover and Thomas, 1991, p. 273). We will say more about the entropy in Section 6.6.1. Proof of (6.20). If a sequence of numbers α0 , α1, α2 , . . . tends to a limit α, then 1 M→∞ M
M−1

lim

αi = α.
i=0

(6.21)
f

That is, the average value of the numbers tends to α as well (Problem 26). Defining αi = Ln Ei , we have from Eq. (6.19) Ln det Rm = Ln Em−1 + Ln Em−2 . . . + Ln E0 . Because Ln(x) is a continuous function when x > 0, we can say
i→∞

f

f

f

f lim Ln Ei = Ln E∞ .

f

(6.22)

(Problem 26). We therefore have

m→∞

lim

1 1 f f f Ln Em−1 + Ln Em−2 . . . + Ln E0 Ln det Rm = lim m→∞ m m f = Ln E∞ ,

by invoking Eq. (6.21) and (6.22). This implies 1 Ln det Rm m
f = exp Ln E∞ .

exp

m→∞

lim

Because exp(x) is a continuous function of the argument x, we can interchange the limit with exp(.). This results in (6.20) indeed. A deeper result, called Szego’s theorem (Haykin, 1986, pp. 96), can be used to prove the further fact that the limit in Eq. (6.20) is really equal to the bound on the right-hand side of Eq. (6.17). That is,
1/m
m→∞

lim

det Rm

f = E∞ = exp

1 2π

2π 0

Ln Sxx (e jω ) dω .

(6.23)

PREDICTION ERROR BOUND AND SPECTRAL FLATNESS

81

In other words, the lower bound on the mean square prediction error, which is achieved for the AR(N ) case for finite prediction order, is approached asymptotically for the non-AR case. For further details on the asymptotic behavior, see Grenander and Szego (1958) and Gray (1972). Predictability measure. We now see that the flatness measure in Eq. (6.13) can be written in the form
2 γx = f

E∞ R(0)

f

(6.24)

2 For very small γx , the power E∞ in the prediction error is therefore much smaller than the power R(0) in the original process, and we say that the process is very predictable. On the other hand, if 2 γx is close to unity then the ultimate prediction error is only slightly smaller that R(0), and we say 2 that the process is not very predictable. For white noise, γx = 1, and the process is not predictable, 2 as one would expect. The reciprocal 1/γx can be regarded as a convenient measure of predictability of a process.

6.6

MAXIMUM ENTROPY AND LINEAR PREDICTION

There is a close relation between linear prediction and the so-called maximum entropy methods (Burg, 1972). We now briefly outline this. In Section 6.5, we showed that the mth-order prediction f error Em for a WSS process x(n) is bounded as

Emf ≥ exp

1 2π

2π 0

LnSxx (e jω ) dω .

(6.25)

We also know that if the process x(n) is AR(N ), then equality is achieved in Eq. (6.25) whenever the prediction order m ≥ N. Now consider two WSS processes x(n) and y(n) with autocorrelations R(k) and r(k), respectively, and power spectra Sxx (e jω ) and Syy (e jω ), respectively. Assume that R(k) = r(k),

| k| ≤ N .
f

Then the predictor polynomials Am (z) and the mean square prediction errors Em will be identical for the processes, for 1 ≤ m ≤ N. Suppose now that y(n) is AR(N ), but x(n) is not necessarily AR, then from Section 6.2, we know that exp 1 2π
2π 0

LnSyy (e jω ) dω

= EN

f

1991) that the quantity φ = a+b 0 2π LnSyy (e jω ) dω (6. Thus. we have proved the following result. Because exp(v) is a monotone-increasing function of real v. Theorem 6.2. (6. Let x(n) and y(n) be two WSS processes with autocorrelations R(k) and r(k). For discrete-amplitude random variables.27) √ is equal to the entropy per sample (entropy rate) of the process. the entropy is more appropriately called differential entropy and should be interpreted carefully.82 THE THEORY OF LINEAR PREDICTION From Theorem 6. then it can be shown (Cover and Thomas.26) process with equality if and only if x(n) is also AR. then 2π 0 AR(N) LnSyy (e jω ) dω ≥ 0 2π LnSxx (e jω ) dω (6. entropy is related to the extent of ‘‘uncertainty’’ or ‘‘randomness. If a WSS process y(n) is Gaussian. Moreover.1 Connection to the Notion of Entropy The importance of the above result lies in the fact that the integrals in Eq. given a random . the above also means 2π 0 LnSyy (e jω )dω ≥ 0 2π LnSxx (e jω )dω. respectively. Summarizing.26) are related to the notion of entropy in information theory.2). and power spectra Sxx (e jω ) > 0 and Syy (e jω ) > 0. on the other hand. they are equal if and only if x(n) is also AR(N ) (by Theorem 6. Details.3.’’ For a continuous-amplitude random variable. respectively. where a = ln 2π e and b = 1/4π . Assume that r(k) = R(k) for |k| ≤ N and that y(n) is AR(N ).6. we know that EN ≥ exp f 1 2π 2π 0 LnSxx (e jω ) dω Comparison of the two preceding equations shows that exp 1 2π 2π 0 LnSyy (e jω ) dω ≥ exp 1 2π 2π 0 LnSxx (e jω ) dω . ♦ 6.

6. Now assume that y(n) is the AR(N ) model for x(n) obtained by use of linear prediction as f in Fig. The problem of maximizing φ under appropriate constraints is said to be an entropy maximization problem. Theorem 6. Suppose we forget about the linear prediction problem and consider the following maximization problem directly: we are given a WSS process x(n) with autocorrelation R(k). 6.e. then the output y(n) will be Gaussian as well (Papoulis. Then the number of bits required to represent xΔ is bΔ = Hx + c. The AR(N ) process y(n) is Gaussian.27) will be referred to as ‘‘entropy’’ and interpreted as ‘‘randomness’’ in the following discussions. Also see Papoulis (1981). consider the quantized version xΔ with a fixed quantization step size Δ.. The relation between linear prediction and maximum entropy methods is also discussed by Burg (1972). We wish to find a WSS process y(n) with the following properties.2 A Direct Maximization Problem A second justification of Theorem 6. with the following properties.3. R(k) = r(k)). and Cover and Thomas (1991). 1. Robinson (1982) also gives a good perspective on the history of spectrum estimation. Although the integral in the second term of (6. 1991. the entropy 0 LnSyy (e jω ) dω of the process y(n) is the largest possible. larger differential entropy implies more number of bits. 1965). Thus. Furthermore. 1. If we also make e(n) Gaussian. Robinson (1982). The autocorrelation r(k) of the AR(N ) process y(n) matches the autocorrelation R(k) of the given process x(n) (i. the details given above should be kept in mind. 2. 2π 3.PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 83 variable x with differential entropy Hx . for fixed step size. y(n) is ‘‘as random as it could be. we have obtained an AR(N ) model y(n) for the process x(n). the entropy has the maximum value for AR(N ) processes. We know that in this model e(n) is white noise with variance EN . for |k| ≤ N (as shown in Section 5. p. or randomness. In other words. 229). where c depends only on the step size Δ (Cover and Thomas. In other words. Under this matching constraint.4). . all AR(N ) processes with autocorrelation fixed as above have the same entropy.’’ under the constraint r(k) = R(k) for |k| ≤ N .3 will now be mentioned primarily because it might be more insightful for some readers. Its autocorrelation r(k) satisfies r(k) = R(k) for |k| ≤ N. 5. among all Gaussian WSS processes with fixed autocorrelation for |k| ≤ N.3 says that. uncertainty.

(6. 2 The fact that this maximizes rather than minimizes φ is verified by examining the Hessian matrix (Chong and ˙ Zak. A necessary condition for the maximization2 of the entropy φ defined in Eq. 2007). Because the desired solution has been proved to be AR(N ). 2001. that an AR(N ) spectrum satisfying the constraint r(k) = R(k). In other words. furthermore. This says that the inverse Fourier transform of 1/Syy (e jω ) should be of finite duration. Now. This must be achieved by optimizing the coefficients r(k) for |k| > N. we 1 e−jωk dω = 0. Antoniou and Lu.27) is ∂φ/∂ r (k) = 0 for |k| > N. |AN (e jω )|2 N where AN (z) is an FIR function of the form AN (z) = 1 + i=1 a∗ .3). restricted to the region |k| ≤ N. With Syy (e jω ) denoting the power spectrum of y(n). we can find it simply by finding the AR(N ) model of the process x(n) using linear prediction techniques (Section 5. . Setting ∂φ/∂ r(k) = 0.4) and. This is because we already know that the AR(N ) model y(n) arising out of linear prediction satisfies r(k) = R(k) for |k| ≤ N (Section 5. Syy (e jω ) or equivalently Syy (e jω ) = c . Thus. Syy (e jω ) is an N autoregressive spectrum. we must be able to express 1/Syy (e jω ) as 1 = c1 |AN (e jω )|2. the entropy 0 LnSyy (e jω ) dω is as large as possible. |k| ≤ N is unique (because the solution to the normal equations is unique). 2π ∂φ =b ∂ r(k) =b 2π 0 2π 0 2π =b 0 ∂[LnSyy (e jω )] dω ∂ r(k) ∂ Syy (e jω ) 1 dω Syy (e jω ) ∂ r(k) 1 e−jωk dω Syy (e jω ) ∞ k=−∞ The last equality follows by using Syy (e jω ) = therefore obtain 2π 0 r(k)e−jωk . Syy (e jω ) | k| > N .i z−i .84 THE THEORY OF LINEAR PREDICTION 2.

along with a discussion of the history of this problem. the flatness of the f prediction error is proportional to 1/Em . as the prediction order m increases. It is fascinating to see how all these connections arise starting from the ‘‘simple’’ objective of minimizing a mean square error. With sufficiently large sampling rate. the error f spectrum becomes flatter. From Eq. with predictor coefficients independent of the signal! This result dates back to Wainstein and Zubakov (1962). 6. but because the mean square error Em decreases. (6.18). we have Sm (e jω ) = Sxx (e jω )|Am (e jω )|2 and S0 (e jω ) = Sxx (e jω ). entropy. and let Sm (e jω ) be f the power spectrum for em (n). then so is f em (n). and the integral above represents the entropy of Thus. Because 2π 0 2π 0 f Ln|Am (e jω )|2 dω = 0 (Lemma 6. not because the entropy increases. Thus. (6.7 CONCLUDING REMARKS The derivation of the optimal linear predictor was based only on the idea that the mean-squared prediction error should be minimized. So.6. in the Gaussian case. With Sxx (e jω ) denoting the power spectrum of x(n). We now see that the solution has interesting connections to flatness measures. • • • • . we see that the flatness γm f of the error em (n) is given by 2 γm = K Em f. this can be done accurately. LnSm (e jω ) dω = LnSxx (e jω ) dω.28) where K is independent of m and depends only on the entropy of x(n). Further results can be found in Vaidyanathan (1987).3 Entropy of the Prediction Error Let em (n) be the mth-order optimal prediction error for a WSS process x(n).PREDICTION ERROR BOUND AND SPECTRAL FLATNESS 85 6. Another interesting topic we have not discussed here is the prediction of a continuous-time band-limited signal from a finite number of past samples. for all ω . and so forth. If x(n) is Gaussian. we have 2π 0 f em (n). the entropy of the prediction error is the same for all prediction 2 orders m and equal to the entropy of the input x(n). We also pointed out earlier that the solution has the minimum-phase property and that it has structural interpretations in terms of lattices.1).

.

1 2π ωi + ωi − Sxx (e jω ) dω = ci (7. Using this. suppose we have a signal y(n). all of its samples. L Sxx (e jω ) = 2π i=1 ci δa (ω − ωi ). We then study the time domain properties and descriptions of Linespec(L) processes. We first express the Linespec(L) property concisely in terms of the autocorrelation matrix. (7. For example.1 INTRODUCTION A WSS random process is said to be line spectral. We also sometimes abbreviate this as a Linespec(L) process. if we know the values of L successive samples. if the power spectrum Sxx (e jω ) consists only of Dirac delta functions. In this chapter. we study line spectral processes in detail.1) can be extracted from the noisy signal. we can predict. and we say that L is the degree of the process.1.2) for sufficiently small > 0. 7. We will show how the line frequencies ωi and the line powers ci in Eq. (7. that is.87 CHAPTER 7 Line Spectral Processes 7. we show that a Linespec(L) process satisfies a homogeneous difference equation. that is. . with zero error. If ci > 0 for all i. Note that the power of the signal x(n) around frequency ωi is ci . We then consider the problem of identifying a Linespec(L) process in noise. then L represents the number of lines in the spectrum as demonstrated in Fig. 0 ≤ ω < 2π. More specifically. which is a sum of a Linespec(L) process x(n) and uncorrelated white noise e(n): y(n) = x(n) + e(n). So we say that ci is the power at the line frequency ωi .1) where {ωi } is a set of L distinct frequencies (called line frequencies) and ci ≥ 0.

the output is zero for all n.4) . R(L) ⎥ ⎢ R(0) . i (7. ⎦ ⎣ . (7. Line spectral processes. R(L − 1) ⎥ ⎢ R*(1) ⎥ RL+1 = ⎢ ...1): R(k) = i=1 ci e jωi k .3) This is a superposition of single-frequency sequences.1: Power spectrum of a line spectral process with L lines. (7. The kth line is a Dirac delta function with height 2πck .2 AUTOCORRELATION OF A LINE SPECTRAL PROCESS L The autocorrelation of the line spectral process is the inverse Fourier transform of Eq. Consider the (L + 1) × (L + 1) autocorrelation matrix RL+1 of a WSS process: ⎤ ⎡ R(0) R(1) .88 THE THEORY OF LINEAR PREDICTION 2 π c1 Sxx (e jω ) 2 π c2 … 0 ω1 ω2 ω ω L 2π FIGURE 7. A WSS process x(n) is line spectral with degree L if and ♦ only if the matrix RL+1 is singular and RL is nonsingular. .2 that if RL+1 is singular. R(0) The Linespec(L) property can be stated entirely in terms of RL+1 and RL as shown by the following result. We already showed in Section 2. R*(L) R*(L − 1) ..1) (line spectral. With the filter written as L V(z) = i=0 v* z−i . . Proof. (7. then the power spectrum has the form Eq. (7. assume x(n) is line spectral with degree ≤ L. . 7. .. with degree ≤ L). . Theorem 7. . ⎥ ⎢ . Conversely. If we pass x(n) through an FIR filter V(z) of order ≤ L with zeros at the line frequencies ωi . .4.1. ..5) . .

The former implies that the degree is not ≤ L − 1. By replacing L + 1 with L. we can rewrite (7. Summarizing. .7) We now show that if x(n) is Linespec(L). x(n − L) and v = v0 v1 . Defining x(n) = x(n) x(n − 1) . Because RL+1 is positive semidefinite. Notice. we obtain v† RL+1 v = 0. RL+1 is singular if and only if x(n) is line spectral with degree ≤ L. that is. such that RL+1 u = RL+1 w = 0. u0 = 0 and w0 = 0. assume RL is nonsingular and RL+1 singular. 0 z =0 z = 0. Conversely. vL T T (7. that RL is a principal submatrix of RL+1 in two ways: RL+1 = RL × × R(0) = R(0) × . This means. as we already showed.LINE SPECTRAL PROCESSES 89 the output is * * * v0 x(n) + v1 x(n − 1) + . of course. it is obvious that there exists an annihilating vector v. Now consider the vector y = u/u0 − w/w0 . This is nonzero (because u and w are linearly independent).7). (7. by the way. contradicting nonsingularity of RL . So the degree has to be exactly L. . it also follows that RL is singular if and only if x(n) is line spectral with degree ≤ L − 1. The latter implies that the process is line spectral with degree ≤ L. and bring about a contradiction. × RL (7. . for otherwise the remaining components annihilate RL in view of (7.6) . .8) Proof. that is. Because RL+1 is singular. it follows that RL z = 0. the above equation implies that RL+1 is singular. To prove uniqueness. then RL has to be nonsingular and. . we assume there are two linearly independent vectors u = 0 and w = 0. but its zeroth component is zero: y= Because RL+1 y = 0. The 0th elements of these vectors must be nonzero. v† E[x(n)x† (n)]v = 0.6) as v† x(n) = 0. which violates nonsingularity of RL . + vL x(n − L) = 0. . in particular. From this. there is a unique nonzero vector v (up to a scale factor) that annihilates RL+1 : RL+1 v = 0. RL+1 singular. that if x(n) is line spectral with degree equal to L. .

then the smaller vector w = v1 v2 .2: The rank of the autocorrelation matrix Rm as a function of size m. So the FIR filter V(z) in Eq.90 THE THEORY OF LINEAR PREDICTION The result can also be proved by using the eigenvalue interlace property for principal submatrices of Hermitian matrices (Horn and Johnson. . . From Eq. is itself unique and is called the characteristic polynomial or characteristic filter of the line spectral process. (7. vL T would satisfy RL w = 0 (in view of Eq.5). which annihilates the line spectral process x(n). determined by the unique eigenvector v. we can assume v0 = 1.2. 7. (7. can be written as L V(z) = 1 + i=1 v* z−i . L rank of Rm 1 1 2 L m FIGURE 7. For. for a Linespec(L) process. 1985). if v0 = 0. So. (7. that all future samples can be predicted with zero error if a block of L samples is known. i (7. we see that a Linespec(L) process x(n) also satisfies the homogeneous difference equation L x(n) = − i=1 v* x(n − i). But the above proof is more direct.6).9) This means. in particular. . without loss of generality. i This filter.7)) contradicting the fact that that RL is nonsingular.1 The Characteristic Polynomial The unique vector v that annihilates RL+1 has zeroth component v0 = 0.

we know that V(z) annihilates the process x(n).2.12) . rank Rm = (7.9) holds. 7.2. if we know R(k) for 0 ≤ k ≤ L − 1 for a Linespec(L) process x(n).10) This is indicated in Fig. the random process x(n) itself satisfies the same recursion.1 Extrapolation of Autocorrelation We showed that a Linespec(L) process x(n) sastisfies the homogeneous difference equation Eq.3. in particular. we discuss some of these. we see that if x(n) is Linespec(L). (7.3. Thus. If we multiply both sides of (7. (7.9).LINE SPECTRAL PROCESSES 91 7. 7. we will see that whenever the autocorrelation satisfies the above recursion. 7. Thus. This is similar in spirit to the case where x(n) is AR(N). In this section.1. It can. i (7. the knowledge of R(k) for 0 ≤ k ≤ N would reveal the polynomial AN (z) and therefore the value of R(k) for all k (Section 5.11) where we have used R(k) = E[x(n)x*(n − k)].3 TIME DOMAIN DESCRIPTIONS Line spectral processes exhibit a number of special properties. in which case.11). that is. we get L R(k) = − i=1 v* R(k − i). in fact.2 All Zeros on the Unit Circle From the proof of Theorem 7. then RL as well as RL+1 have rank L. that all the L zeros of L V(z) = 1 + m=1 v* z−m m (7. which are particularly useful when viewed in the time domain. This implies. 7. we can compute R(k) for all k using (7.2).9) by x*(n − k) and take expectations.3. be proved (Problem 24) that m L for 1 ≤ m ≤ L for m ≥ L. the rank saturates as soon as m exceeds the value L.2 Rank Saturation for a Line Spectral Process From the above discussions. Conversely.

we can write ωi = 2π Li /M where M is the least common multiple (LCM) of the set {Mi }. 0 ≤ i ≤ L. V(z) is a linear phase filter. L |c| = 1. Thus.14) Note that there are only L random variables di in Eq.9) necessarily have the form L x(n) = i=1 di e jωi n . x(n) is periodic with period M. That is. 1999). (7. ωi = Ki 2π Mi for integers Ki . we see that R(k) is periodic if and only if x(n) is periodic. 7.3). Because R(0) = R(M).3) is the power of the process x(n) at the line frequency ωi . Suppose ♦ R(0) = R(M) for some M > 0. whereas the random variable di is said to be the random amplitude (possibly complex) at the frequency ωi . Then. Because R(k) has the same form as x(n) (compare Eqs.14)). Proof. the following result is particularly interesting. (7.2. that the numbers vi satisfy the property vi = cv* −i . . it then follows that the L zeros of V(z) have the form e jωi . The WSS property of x(n) imposes some strong conditions on the joint behavior of the random variables di. all the frequencies are harmonics of the fundamental frequency 2π/M.15) Another way to see this would be as follows: R(k) satisfies the homogeneous difference equation (7.13) where v0 = 1. Mi . in particular. Theorem 7. although the ‘‘random’’ process x (n) is an infinite sequence.1 This means. (7.3) and (7. Let x(n) be a WSS random process with autocorrelation R(k). and Li are integers. the solutions to the homogeneous equation Eq.14). we say that the process is periodic with period M or just harmonic.3 Periodicity of a Line Spectral Process The process (7.14) is periodic if and only if all the frequencies ωi are rational multiples of 2π (Oppenheim and Schafer. (7. In this connection. and so is R(k). where Sxx (e jω ) has the lines. That is. (7. which we shall derive soon. 1 (7. it also has the form (7. The deterministic quantity ci in Eq.3. Under this condition.11). From the theory of homogeneous equations.92 THE THEORY OF LINEAR PREDICTION are on the unit circle at the L distinct points ωi . Because V(z) has the L distinct zeros e jωi . (7. At the same time. In this case. we have E[x(n)x*(n)] = E[x(n)x*(n − M)].

This implies x(n) = x(n − M).14).4 FURTHER PROPERTIES OF TIME DOMAIN DESCRIPTIONS Starting from the assumption that x(n) is a line spectral process. 0 ≤ k ≤ L of the Linespec(L) process x(n). Substituting further from Eq. if L = 3. we have ⎤ ⎡ ⎤⎡ ⎤ ⎡ c1 R(0) 1 1 1 ⎥ ⎢ jω1 jω2 jω3 ⎥ ⎢ ⎥ ⎢ (7. we derived the time domain expression Eq.LINE SPECTRAL PROCESSES 93 The mean square value of x(n) − x(n − M). The line frequencies ωi can therefore be identified. we can find all the future and past samples exactly. If we are given the values of L successive samples. Using the definition R(k) = E[x(n)x*(n − k)]. consider a sequence of the form Eq. So the random variable x(n) − x(n − M) has mean square value equal to zero. Then.16) e e ⎦ ⎣ c2 ⎦ = ⎣ R(1) ⎦ . For example.. So.4 Determining the Parameters of a Line Spectral Process Given the autocorrelation R(k). . x(1). 7. they have the form e jωi ). once we have measured L samples of the random process.14) by solving a set of equations similar to (7. 7. From Eq. x(L − 1). Thus. . the line frequencies and powers in Fig. we can uniquely identify the coefficients di appearing in Eq. with period M. The first and the last terms are equal because of WSS property.e. All its L zeros are guaranteed to be on the unit circle (i. (7. Similarly. we can uniquely identify the line powers ci . it is readily verified that R(k) is also periodic.16). we get μ = 0. say x(0). we can identify the unique eigenvector v satisfying RL+1 v = 0. (7.14). with zero error! The randomness is only in the L coefficients {di }. (7. where di are random . (7. and once these have been identified. . 7. consider the samples x(n) of the Linespec(L) random process x(n). Conversely. 0 ≤ k ≤ L. (7.3. ⎣e 2jω1 2jω2 2jω3 e e e c3 R(2) The 3 × 3 matrix is a Vandermonde matrix and is nonsingular because ωi are distinct (Horn and Johnson.12)). the unique characteristic polynomial V(z) is determined (Eq. 1985). we can solve for the constants ci (line powers) by writing a set of linear equations. namely. x(n) is known for all n. So x(n) is periodic with period M.15). (7. Identifying amplitudes of sinusoids in x (n).1 have been completely determined from the set of coefficients R(k). Thus. E can be written as x(n) − x(n − M) x*(n) − x*(n − M) . μ = E[|x(n)|2 ] − E[x(n)x*(n − M)] −E[x(n − M)x*(n)] + E[|x(n − M)|2].3).

Thus. Now consider the converse. Suppose x(n) is WSS. So the first condition of the lemma is necessary.1. We have E[x(n)] = i=1 E[di ]e jωi n . L assume that the two conditions in the lemma are satisfied. Because E[x(n)] is independent of n. Next.18) can be rewritten as L L E[x(n)x*(n − k)] = i=1 m=1 e jωi n E[di dm ]e−jωmn e jωm k . By the first condition of the lemma. E[didm ] = 0 for i = m (orthogonality condition).18) With di satisfying the second condition of the lemma. First. (7.17) where di are random variables and ωi are distinct constants. E[di] = 0 whenever ωi = 0 (zero-mean condition). this reduces to L E[x(n)x*(n − k)] = i=1 E[|di |2 ]e jωi k . Then. (7. ♦ Proof. the two conditions of the lemma imply that x(n) is WSS.94 THE THEORY OF LINEAR PREDICTION variables. as shown next. this is constant for all n.14) itself imposes some severe restrictions on the random variables di.19) which is independent of n. * . we see from Eq. Recall that x(n) is WSS if E[x(n)] and E[x(n)x*(n − k)] are independent of n. (7. x(n) is a WSS process if and only if 1. Eq.17) that E[di ] = 0 whenever ωi = 0. L L E[x(n)x*(n − k)] = i=1 m=1 E[didm ]e jωi n e−jωm (n−k) * (7. we first have to address WSS property. Next. * 2. The WSS property of Eq. (7. Consider a random process of the form L x(n) = i=1 d i e j ωi n . Is it necessarily a line spectral process? Before addressing this question. (7. Lemma 7.

LINE SPECTRAL PROCESSES

95

Express the right-hand side as

[ e jω1 n

e−jω1 n 0 ⎢ ⎢ 0 e−jω2 n jωL n ]D ⎢ ... e . ⎢ . . . . ⎣ . 0 0 call this A(n)

⎤ e jω1 k ⎥ ⎢ jω2 k ⎥ ⎥⎢ e ⎥ ⎥ ⎢ . ⎥, ⎥⎢ . ⎥ ⎦⎣ . ⎦ −jωL n ...e e jωL k ... ... .. .
0 0 . . . t(k)

⎤⎡

where D is an L × L matrix with [D]im = E[di dm ]. WSS property requires that A(n)t(k) be * independent of n, for any fixed choice of k. Thus, defining the matrix T = [ t(0) t(1) . . . t(L − 1) ], we see that A(n)T must be independent of n. But T is a Vandermonde matrix and is nonsingular because ωi are distinct. So A(n) must itself be independent of n. Consider now its mth column:
L

[A(n)]m =
i=1

E[di dm *]e j(ωi −ωm )n .

Because ωi are distinct, the differences ωi − ωm are distinct for fixed m. So the above sum is * independent of n if and only if E[di dm ] = 0 for i = m. The second condition of the lemma is, therefore, necessary as well. From Eq. (7.19), it follows that R(k) has the form
L

R(k) =
i=1

E[|di |2 ]e jωi k

(7.20)

so that the power spectrum is a line spectrum. Thus, if di satisfies the conditions of the lemma, then x(n) is not only WSS, but is also a line spectral process of degree L. Summarizing, we have proved this: Theorem 7.3. Consider a random process of the form
L

x(n) =
i=1

di e jωi n ,

(7.21)

where di are random variables and ωi are distinct constants. Then, x(n) is a WSS and, hence, a line spectral process if and only if 1. E[di ] = 0 whenever ωi = 0 (zero-mean condition). * 2. E[di dm ] = 0 for i = m (orthogonality condition).

96

THE THEORY OF LINEAR PREDICTION

Under this condition, R(k) is as in Eq. (7.20), so that E[|di |2 ] represents the power at the line ♦ frequency ωi . Example 7.1: Stationarity and Ergodicity of Line spectral Processes. Consider the random process x(n) = Ae jω0 n , where ω0 = 0 is a constant and A is a zero-mean random variable. We see that E[x(n)] = E[A]e jω0 n = 0 and E[x(n)x*(n − k)] = E[|A|2 ]e jω0 k . Because these are independent of n, the process x(n) is WSS. If a process x(n) is ergodic (Papoulis, 1965), then, in particular, E[|x(n)|2] must be equal to the time average. In our case, because |x(n)| = |A|, we see that 1 2N + 1
N

|x(n)|2 = |A|2
n=−N

and is, in general, a random variable! So the time average will in general not be equal to the ensemble average E[|A|2 ], unless |A|2 is nonrandom. (For example, if A = ce jθ , where c is a constant and θ is a random variable, then |A|2 would be a constant.) So we see that, a WSS line spectral process is not ergodic unless we impose some strong restriction on the random amplitudes. Example 7.2: Real Random Sinusoid. Next consider a real random process of the form x(n) = A cos(ω0 n + θ), where A and θ are real random variables and ω0 > 0 is a constant. We have x(n) = 0.5Ae jθ e jω0 n + 0.5Ae−jθ e−jω0 n . From Lemma 7.1, we know that this is WSS if and only if E[Ae jθ ] = 0, E[Ae−jθ ] = 0, and E[A 2 e j2θ ] = 0.

(7.22)

This is satisfied, for example, if A and θ are statistically independent random variables and θ is uniformly distributed in [0, 2π). Thus, statistical independence implies E[Ae jθ ] = E[A]E[e jθ ] and E[A 2 e j2θ ] = E[A 2 ]E[e j2θ ],

and the uniform distribution of θ implies E[e jθ ] = E[e j2θ ] = 0, so that Eq. (7.22) is satisfied. Notice that Lemma 7.1 requires that Ae jθ and Ae−jθ have zero mean for WSS property, but it is not necessary for A itself to have zero mean! Summarizing, x(n) = A cos(ω0 n + θ) is WSS whenever the real random variables A and θ are statistically independent and θ is uniformly distributed in [0, 2π]. It is easily verified that E[x(n)] = 0 and R(k) = E[x(n)x(n − k)] = 0.5E[A 2 ]cos(ω0 k) = P cos(ω0 k),

LINE SPECTRAL PROCESSES

97

where P = 0.5E[A ]. So the power spectrum is Sxx (e jω ) = P × 2π δa (ω − ω0 ) + δa (ω + ω0 ) , 0 ≤ ω < 2π. 2

2

The quantity P is the total power in the two line frequencies ±ω0 . Example 7.3: Sum of Sinusoids. Consider a real random process of the form
N

x(n) =
i=1

Ai cos(ωi n + θi ).

(7.23)

Here Ai and θi are real random variables and ωi are distinct positive constants. Suppose we make the following statistical assumptions: 1. Ai and θm are statistically independent for any i, m. 2. θi and θm are statistically independent for i = m. 3. θm is uniformly distributed in 0 ≤ θm < 2π for any m. We can then show that x(n) is WSS. For this, we only have to rewrite x(n) in the complex form Eq. (7.21) and verify that the conditions of Theorem 7.3 are satisfied (Problem 25). Notice that no assumption has been made about the joint statistics of Ai and Am , for i = m. Under the above conditions, it can be verified that E[x(n)] = 0. How about the autocorrelation? We have R(k) = E[x(n)x(n − k)] = E[x(0)x(−k)] (using WSS)
N N

=E
i=1

Ai cos(θi )
m=1

Am cos(−kωm + θm ).

Using the assumptions listed above, the cross terms for i = m reduce to E[Ai Am ]E[cos(θi )]E[cos(−kωm + θm )] = 0. When i = m, the terms reduce to Pi cos(ωi k) as in Ex. 7.2. Thus,
N

R(k) = E[x(n)x(n − k)] =
i=1

Pi cos(ωi k),

where Pi =

0.5E[Ai2 ]

> 0. The power spectrum is therefore
N

Sxx (e jω ) = π
i=1

Pi δa (ω − ωi ) + δa (ω + ωi ) ,

0 ≤ ω < 2π.

The total power at the frequencies ωi and −ωi is given by 0.5Pi + 0.5Pi = Pi .

and the outcome A0 cos(ω0 n + θ0 ) is fully determined for all n and is said to be a realization of the random process.2. a plot of x(n) does not look ‘‘random. vL ] T such that RL+1 v = 0. where there are only L random variables di . the result of an experiment determines the values of the random variables A and θ to be A0 and θ0 . then we know the entire waveform x(n). and the plot of x(n) as a function of n (Fig. then the time average cannot tell us much about the ensemble averages indeed. 1965. By comparison with the augmented normal equation Eq. 1987).i = vi .1).21). 7. Each waveform is an outcome of an experiment.3: Plot of the cosine. 7. . Here.’’ For example. .2. always appears to hold some conceptual mystery. (7. So the samples are fully predictable. 7.98 THE THEORY OF LINEAR PREDICTION 1 cos(ω 0 n) 0 n FIGURE 7. by definition. we see that if we take the coefficients of the Lth-order predictor to be aL. For example. we argued that there exists a unique vector v of the form v = [ 1 v1 . is a collection (or ensemble) of waveforms (Papoulis. Philosophical Discussions. as also seen from the difference equation Eq. . Peebles. A random process of the form Eq. where does the randomness manifest? Recall that a random process.3) is just a nice and smooth cosine! So. (2. the random variables A and θ do not depend on n. consider A cos(ω0 n + θ) discussed in Ex. If we measure the L random variables di by some means. Furthermore. It is for this reason that it is somewhat tricky to force ergodicity (Ex. In Section 7.9).20). (7.5 PREDICTION POLYNOMIAL OF LINE SPECTRAL PROCESSES Let x(n) be Linespec(L) and let RL+1 be its autocorrelation matrix of size (L + 1) × (L + 1). 7. but if a particular outcome (time function) of the random process does not exhibit ‘‘randomness’’ along time. ergodicity says that time averages are equal to ensemble averages.

then the optimal Lth-order prediction error E_L^f for the process x(n) becomes zero. The optimal predictor polynomial is

AL(z) = 1 + v1* z^{-1} + . . . + vL* z^{-L} = V(z).

In Section 7.2, we saw that V(z) has all its L zeros on the unit circle, at the points e^{jωi}, where ωi are the line frequencies. We therefore conclude that the optimal Lth-order predictor polynomial AL(z) of a Linespec(L) process has all L zeros on the unit circle. This also implies an = c a*_{L−n} for some c with |c| = 1; in the real case, the coefficient sequence an is symmetrical or antisymmetrical. In what follows, we present two related results.

Lemma 7.2. Let R(k) be the autocorrelation of a WSS process x(n), and let R(k) satisfy the difference equation

R(k) = − Σ_{i=1}^{L} ai* R(k − i).    (7.24)

Furthermore, let there be no such equation of smaller degree satisfied by R(k). Then the process x(n) is line spectral with degree L, and the polynomial AL(z) = 1 + Σ_{i=1}^{L} ai* z^{-i} necessarily has all the L zeros on the unit circle. ♦

Proof. By using the property R(k) = R*(−k), we can verify that Eq. (7.24) implies

[ R(0) R(1) . . . R(L) ;  R*(1) R(0) . . . R(L − 1) ;  . . . ;  R*(L) R*(L − 1) . . . R(0) ] [ 1  aL,1  . . .  aL,L ]^T = 0,

that is, RL+1 [1 aL,1 . . . aL,L]^T = 0. So RL+1 is singular. If RL were singular, we could derive an equation of the form Eq. (7.24) but with lower degree L − 1. Because this violates the conditions of the lemma, we conclude that RL is nonsingular. Using Theorem 7.2, we therefore conclude that the process x(n) is Linespec(L). It is clear from Section 7.3 that the polynomial AL(z) is then the characteristic polynomial V(z) of the process x(n) and therefore has all zeros on the unit circle.

We now present a related result directly in terms of the samples x(n) of the random process.

Theorem 7.4. Let x(n) be a random process satisfying

x(n) = − Σ_{i=1}^{L} ai* x(n − i).    (7.25)

Furthermore, let there be no such equation of smaller degree satisfied by x(n). If the process x(n) is WSS, then

1. x(n) is line spectral with degree L, and
2. the polynomial AL(z) = 1 + Σ_{i=1}^{L} ai* z^{-i} has all the L zeros on the unit circle. ♦

Thus, if a process satisfies the difference equation Eq. (7.25) (and if L is the smallest such number), then imposing the WSS property on x(n) immediately forces the zeros of AL(z) to be on the unit circle! This is a somewhat surprising conclusion, although its proof is very simple in light of the above discussions.

Proof. If we multiply both sides of Eq. (7.25) by x*(n − k) and take expectations, this results in Eq. (7.24). Furthermore, there is no smaller equation of the form Eq. (7.24); for if there existed an equation of the form Eq. (7.24) with L replaced by L − 1, we could prove that RL is singular and proceed as in Section 7.2 to show that x(n) satisfies an equation of the form Eq. (7.25) with L replaced by L − 1. Because this violates the conditions of the theorem, there is no smaller equation of the form Eq. (7.24) satisfied by R(k). Applying Lemma 7.2, the claim of the theorem immediately follows. (For a more direct proof, see Problem 23.)

7.6 SUMMARY OF PROPERTIES

Before we proceed to line spectral processes buried in noise, we summarize, at the expense of being somewhat repetitive, the main properties of line spectral processes.

1. Spectrum has only impulses. We say that a WSS random process is Linespec(L) if the power spectrum is made of L impulses, that is,

Sxx(e^{jω}) = 2π Σ_{i=1}^{L} ci δa(ω − ωi),   0 ≤ ω < 2π,    (7.26)

with ci > 0 for any i. Here, the ωi are distinct and are called the line frequencies, and the ci are the powers at these line frequencies. In this case, the process x(n) as well as its autocorrelation

R(k) satisfy an Lth-order recursive (homogeneous) difference equation:

x(n) = − Σ_{i=1}^{L} vi* x(n − i),   R(k) = − Σ_{i=1}^{L} vi* R(k − i).    (7.27)

So all the samples of the sequence x(n) can be predicted with zero error if L successive samples are known. Similarly, all the autocorrelation coefficients R(k) of the process x(n) can be computed if R(k) is known for 0 ≤ k ≤ L − 1.

2. Autocorrelation matrix. A WSS process x(n) is Linespec(L) if and only if the autocorrelation matrix RL+1 is singular and RL is nonsingular. So RL+1 has rank L; as the size m grows, the rank of Rm has the saturation behavior sketched in Fig. 7.1.

3. Characteristic polynomial. There is a unique vector v of the form

v = [1 v1 . . . vL]^T    (7.30)

satisfying RL+1 v = 0, and this vector uniquely determines the polynomial

V(z) = 1 + Σ_{i=1}^{L} vi* z^{-i}.    (7.28)

This is called the characteristic polynomial of the Linespec(L) process x(n). It has all its L zeros on the unit circle, at z = e^{jωi}, and its coefficients vi* are precisely the coefficients appearing in the difference equation (7.27).

4. Sum of single-frequency signals. The random process x(n) as well as the autocorrelation R(k) can be expressed as a sum of L single-frequency signals:

x(n) = Σ_{i=1}^{L} di e^{jωi n},   R(k) = Σ_{i=1}^{L} ci e^{jωi k},    (7.29)

where the di are random variables (amplitudes at the line frequencies ωi) and ci = E[|di|²] (powers at the line frequencies ωi).

5. Sum of sinusoids, complex. A random process of the form x(n) = Σ_{i=1}^{L} di e^{jωi n} is WSS if and only if (a) E[di] = 0 whenever ωi ≠ 0 and (b) E[di dm*] = 0 for i ≠ m (Theorem 7.3). Under this condition, R(k) = Σ_{i=1}^{L} E[|di|²] e^{jωi k}; hence, the process is line spectral.

6. Sum of sinusoids, real. Consider the process x(n) = Σ_{i=1}^{N} Ai cos(ωi n + θi), where Ai and θi are real random variables and ωi are distinct positive constants. This is WSS, hence line spectral, under the conditions stated in Ex. 7.3. The autocorrelation is then R(k) = Σ_{i=1}^{N} Pi cos(ωi k), where Pi = 0.5E[Ai²] is the total power in the line frequencies ±ωi.

7. Prediction polynomial. If we perform optimal linear prediction on a Linespec(L) process, we find that the Lth-order prediction polynomial AL(z) is equal to the characteristic polynomial V(z) and therefore has all its zeros on the unit circle. The prediction error is zero: E_L^f = 0.

8. Periodicity. If a WSS process x(n) is such that R(0) = R(M) for some M > 0, then x(n) is periodic with period M for any n, and so is R(k) (Theorem 7.2).

9. Identifying the parameters. Given the autocorrelation coefficients R(k), 0 ≤ k ≤ L, of the Linespec(L) process x(n), we can determine the vi in Eq. (7.28) from the eigenvector v of RL+1 corresponding to the zero eigenvalue. The line frequencies ωi can then be determined because the e^{jωi} are the zeros of the characteristic polynomial V(z). Once the line frequencies are known, the constants ci are determined as described in Section 7.2.

7.7 IDENTIFYING A LINE SPECTRAL PROCESS IN NOISE

Identification of sinusoids in noise is an important problem in signal processing. It is also related to the problem of finding the direction of arrival of a propagating wave using an array of sensors. An excellent treatment can be found in Therrien (1992); we will be content with giving a brief introduction to the problem here. Consider a random process

y(n) = x(n) + e(n),    (7.31)

where x(n) is Linespec(L) (i.e., a line spectral process with degree L). Let e(n) be zero-mean white noise with variance σe², that is, E[e(n)e*(n − k)] = σe² δ(k). Assume that x(n) and e(m) are uncorrelated, that is, E[x(n)e*(m)] = 0 for any n, m. We thus have a line spectral process buried in noise (popularly referred to as sinusoids buried in noise). The power spectrum of x(n) has line frequencies (impulses), whereas that of e(n) is flat, with value σe² for all frequencies. The total power spectrum is demonstrated in Fig. 7.4. Notice that some of the line frequencies could be very closely spaced, and some could be buried beneath the noise level.

FIGURE 7.4: A Linespec(L) process with additive white noise of variance σe² (line powers ci shown above the flat noise floor).

Let r(k) and R(k) denote the autocorrelation sequences of y(n) and x(n), respectively. Using the whiteness of e(n) and the fact that x(n) and e(n) are uncorrelated, we get

r(k) = E[y(n)y*(n − k)] = E[x(n)x*(n − k)] + E[e(n)e*(n − k)],

which yields

r(k) = R(k) + σe² δ(k).    (7.32)

Note, in particular, that r(k) = R(k) for k ≠ 0. Thus, the m × m autocorrelation matrix rm of the process y(n) is given by

rm = Rm + σe² Im,    (7.33)

where Rm is the m × m autocorrelation matrix of x(n). We say that Rm is the signal autocorrelation matrix and rm is the signal-plus-noise autocorrelation matrix.

7.7.1 Eigenstructure of the Autocorrelation Matrix

Let ηi be an eigenvalue of Rm with eigenvector ui, so that Rm ui = ηi ui. Then

[Rm + σe² Im] ui = [ηi + σe²] ui,

that is, rm ui = [ηi + σe²] ui. Thus, the eigenvalues of rm are λi = ηi + σe², and the corresponding eigenvectors of rm are the same as those of Rm.
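The eigenvalue-shift property above is easy to verify numerically. The sketch below uses an arbitrary Hermitian positive semidefinite matrix as a stand-in for the signal autocorrelation Rm; the matrix and the noise variance are illustrative choices, not taken from the text.

```python
# Adding white noise of variance sigma_e^2 shifts every eigenvalue of the
# autocorrelation matrix by sigma_e^2 and leaves the eigenvectors unchanged.
import numpy as np

rng = np.random.default_rng(1)
m, sigma_e2 = 4, 0.25
B = rng.standard_normal((m, m)) + 1j * rng.standard_normal((m, m))
R_m = B @ B.conj().T                  # Hermitian, positive semidefinite "signal" matrix
r_m = R_m + sigma_e2 * np.eye(m)      # signal-plus-noise autocorrelation matrix

eta, U = np.linalg.eigh(R_m)          # eigenvalues of R_m (ascending)
lam, V = np.linalg.eigh(r_m)          # eigenvalues of r_m (ascending)
print(np.allclose(lam, eta + sigma_e2))   # True: lambda_i = eta_i + sigma_e^2
```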

The Minimum Eigenvalue. It is easily shown (Problem 12) that the smallest eigenvalue λmin(Rm) of the matrix Rm cannot increase with m. So it behaves in a monotone manner as depicted in Fig. 7.5(a). Because x(n) is Linespec(L), its autocorrelation matrix Rm is nonsingular for 1 ≤ m ≤ L and singular for m = L + 1 (Theorem 7.1). Thus, the smallest eigenvalue of Rm is positive for m ≤ L and equals zero for m > L. Accordingly, the smallest eigenvalue of rm behaves as in Fig. 7.5(b); in particular, for m > L it attains a constant value equal to σe². We can use this as a means of identifying the number of line frequencies L in the process x(n), as well as the variance σe² of the noise e(n). (This could be misleading in some cases, as we cannot keep measuring the eigenvalue for an unlimited number of values of m; see Problem 29.)

Eigenvector corresponding to the smallest eigenvalue. Let v be the eigenvector of rL+1 corresponding to the smallest eigenvalue σe², so that rL+1 v = σe² v. In view of Eq. (7.33), this implies RL+1 v + σe² v = σe² v, that is,

RL+1 v = 0.    (7.34)

FIGURE 7.5: (a) The decreasing minimum eigenvalue of the autocorrelation matrix Rm as the size m increases and (b) the corresponding behavior of the minimum eigenvalue of rm, which saturates at the noise variance σe² for m > L.

Thus, v is also the eigenvector that annihilates RL+1. Once we identify the vector v, we can find the characteristic polynomial V(z) (Eq. (7.28)). Its roots are guaranteed to be on the unit circle (i.e., to have the form e^{jωi}), revealing the line frequencies. The smallest eigenvalue of rL+1 gives the noise variance σe². The eigenvector corresponding to the minimum eigenvalue of rL+1 can be computed using any standard method, such as, for example, the power method (Golub and Van Loan, 1989).

7.7.2 Computing the Powers at the Line Frequencies

The autocorrelation R(k) of the process x(n) has the form (7.29), where the ci are the powers at the line frequencies ωi. Because we already know the ωi, we can compute the ci by solving linear equations (as in Eq. (7.16)), provided we know enough samples of R(k). For this, note that R(k) is related to the given autocorrelation r(k) as in Eq. (7.32): R(k) = r(k) for k ≠ 0, so we know the values of R(1), . . . , R(L), and R(0) is known as well because σe² has been identified as described above. So we know all the required samples of R(k). We can write a set of L linear equations

[ e^{jω1}   e^{jω2}   . . .  e^{jωL}  ] [ c1 ]   [ R(1) ]
[ e^{j2ω1}  e^{j2ω2}  . . .  e^{j2ωL} ] [ c2 ] = [ R(2) ]    (7.35)
[    .         .                .     ] [  . ]   [   .  ]
[ e^{jLω1}  e^{jLω2}  . . .  e^{jLωL} ] [ cL ]   [ R(L) ]

The L × L matrix above is Vandermonde and nonsingular because the line frequencies ωi are distinct (Horn and Johnson, 1985). So we can solve for the line powers ci uniquely. What if we do not know the noise variance? Even if the noise variance σe² and, hence, R(0) are not known, the above discussion continues to hold, because the L equations above involve only R(1), . . . , R(L). Notice also that the line frequencies need not be harmonically related for the method to work (i.e., the ωk need not be integer multiples of a fundamental ω0). The technique described in this section to estimate the parameters of a line spectral process is commonly referred to as Pisarenko's harmonic decomposition.

Case of Real Processes. The case where x(n) is a real line spectral process is of considerable importance. From Ex. 7.3, recall that a process of the form x(n) = Σ_{i=1}^{N} Ai cos(ωi n + θi) is WSS, and hence line spectral, under some conditions on Ai and θi. Suppose x(n) in Eq. (7.31) has this form.

Then everything proceeds as in the complex case: we identify the eigenvector v of rL+1 corresponding to its minimum eigenvalue (Eq. (7.34)) and, hence, the corresponding line frequencies as described previously. In this case, it is more convenient to write R(k) in the form

R(k) = Σ_{i=1}^{N} Pi cos(ωi k),

where Pi > 0 is the total power at the frequencies ±ωi. Because R(k) is known for 1 ≤ k ≤ N, we can identify the Pi by writing linear equations, as we did for the complex case.

Example 7.4: Estimation of sinusoids in noise. Consider a sum of three real sinusoids

x(n) = A1 sin(0.1πn) + A2 sin(0.5πn) + A3 sin(0.9πn),    (7.36)

and assume that we have the noisy samples y(n) = x(n) + e(n), where e(n) is zero-mean white noise. The noisy data y(n) can be regarded as a process with a line spectrum (six lines, because each sinusoid contributes two lines), buried in a background noise spectrum (as in Fig. 7.4). We assume that 100 samples of y(n) are available; the signal component x(n) and the noise component e(n), generated using the Matlab command randn, are shown in Fig. 7.6. To estimate the lines, we need the 7 × 7 autocorrelation matrix R7 of the process y(n). So, from the noisy data y(n), we estimate the autocorrelation R(k) for 0 ≤ k ≤ 6 using the autocorrelation method described in Chapter 2; this estimates the top row of R7. From these estimates, we form the 7 × 7 Toeplitz matrix R7 and compute its smallest eigenvalue and the corresponding eigenvector v. With the elements of v regarded as the coefficients of the polynomial V(z), we now compute the zeros of V(z) and obtain

0.9550 ± 0.2965j,   0.0068 ± 0.9999j,   −0.9414 ± 0.3373j.

From the angles of these zeros, we identify the six line frequencies ±ω1, ±ω2, and ±ω3; the three sinusoidal frequencies ω1, ω2, and ω3 are estimated to be 0.1029π, 0.4985π, and 0.9034π,

which are quite close to the correct values used in Eq. (7.36), namely, 0.1π, 0.5π, and 0.9π. With the ωk thus estimated and R(k) already estimated, we can use (7.35) to estimate the line powers ck and finally obtain the amplitudes of the sinusoids (the positive square roots associated with the ck in this example); the resulting amplitude estimates are then compared with the correct values in Eq. (7.36).

FIGURE 7.6: Example 7.4. Estimation of sinusoids in noise. The plots show the signal and the noise components used in the example (100 samples each).

Owing to the randomness of the noise, these estimates vary from experiment to experiment; the numbers quoted here represent averages over several repetitions of the experiment, so that the reader sees an averaged estimate. In the preceding example, we had 100 noisy measurements available. If the noise is large, the estimates will obviously be less accurate, but this effect can be combated if a large record of measurements is available. For example, when the noise variance is increased to unity, the averaged frequency estimates (values such as 0.1005π, 0.5039π, and 0.9010π) are still close to the correct values, and they improve further if the number of measured samples is increased to a large number such as 5000. Figure 7.7 shows the signal and noise components for the case of even larger noise, almost comparable with the signal (noise variance σe² = 2); the noise is very significant indeed, and Fig. 7.8 shows a still larger noise level. Even then, with 10,000 measured samples y(n) used in the estimation process (an unrealistically large number), the estimated line frequencies (such as 0.1001π, 0.4999π, and 0.9001π) and the corresponding amplitude estimates remain quite close to the correct values; considering how large the noise is, such estimates are quite good indeed. In this section, we have demonstrated that sinusoids buried in additive noise can be identified effectively from the zeros of an eigenvector of the estimated autocorrelation. The method outlined above reveals the fundamental principles as advanced originally by Pisarenko (1973).
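A minimal sketch of the above procedure for real sinusoids is given below, assuming a biased sample-autocorrelation estimator and NumPy in place of the MATLAB experiment described in the text; the function name, the test amplitudes, and the noise level are illustrative choices, not the book's exact setup.

```python
import numpy as np

def pisarenko_real(y, num_sines):
    """Estimate frequencies, amplitudes, and noise variance of real sinusoids in white noise."""
    L = 2 * num_sines                            # number of spectral lines
    M = len(y)
    # Biased autocorrelation estimates r(0), ..., r(L)
    r = np.array([np.dot(y[k:], y[:M - k]) / M for k in range(L + 1)])
    R = np.array([[r[abs(i - j)] for j in range(L + 1)] for i in range(L + 1)])
    evals, evecs = np.linalg.eigh(R)             # eigenvalues in ascending order
    sigma_e2 = evals[0]                          # smallest eigenvalue ~ noise variance
    v = evecs[:, 0]                              # eigenvector that (nearly) annihilates R_{L+1}
    zeros = np.roots(v)                          # zeros of V(z), ideally on |z| = 1
    ang = np.angle(zeros)
    omegas = np.sort(ang[ang > 0])[:num_sines]   # one frequency per conjugate pair
    # Solve R(k) = sum_i P_i cos(omega_i k), k = 1..num_sines (recall r(k) = R(k) for k != 0)
    k = np.arange(1, num_sines + 1)
    P = np.linalg.solve(np.cos(np.outer(k, omegas)), r[1:num_sines + 1])
    amps = np.sqrt(2 * np.maximum(P, 0))         # A_i = sqrt(2 P_i) for a real sinusoid
    return omegas, amps, sigma_e2

# Illustrative use on synthetic data (100 noisy samples, in the spirit of Example 7.4)
rng = np.random.default_rng(0)
n = np.arange(100)
x = 0.3 * np.sin(0.1 * np.pi * n) + 1.0 * np.sin(0.5 * np.pi * n) + 2.0 * np.sin(0.9 * np.pi * n)
y = x + np.sqrt(0.1) * rng.standard_normal(n.size)
print(pisarenko_real(y, num_sines=3))
```

As discussed above, the accuracy of such estimates depends on the noise level and the length of the measurement record.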

Since then, many improved methods have been proposed, which can obtain more accurate estimates from a small number of noisy measurements. Notable among these are minimum-norm methods (Kumaresan and Tufts, 1983), MUSIC (Schmidt, 1979, 1986), and ESPRIT (Paulraj et al., 1986; Roy and Kailath, 1989).

FIGURE 7.7: Example 7.4. Estimation of sinusoids in large noise. The plots show the signal and the noise components used in the example.

These methods can also be applied to the estimation of direction-of-arrival information in array processing applications (Van Trees, 2002). In fact, many of these ideas originated in the array processing literature (Schmidt, 1979). A review of these methods can be found in the very readable account given by Therrien (1992).

FIGURE 7.8: Example 7.4. Estimation of sinusoids in very large noise. The plots show the signal and the noise components used in the example.

7.8 LINE SPECTRUM PAIRS

In Section 5.6, we discussed the application of linear prediction theory in signal compression and explained the usefulness of the lattice coefficients {km} in that process. Although the quantization and transmission of the lattice coefficients {km} guarantees that the reconstruction filter 1/AN(z) remains stable, the coefficients km are difficult to interpret from a perceptual viewpoint. If we knew the relative importance of the various coefficients km in preserving perceptual quality, we could assign them bits accordingly, but this has not been found to be possible. In speech coding practice, one therefore uses an important variation of the prediction coefficients called line-spectrum pairs (LSP). The motivation comes from the fact that the connection between LSP and perceptual properties is better understood (Itakura, 1975; Kang and Fransen, 1984; Soong and Juang, 1993, 1995). The set of LSP coefficients not only guarantees stability of the reconstruction filter under quantization but, in addition, gives a better perceptual interpretation in the frequency domain.

To define the LSP coefficients, we construct two new polynomials

P(z) = AN(z) + z^{-(N+1)} ÃN(z),   Q(z) = AN(z) − z^{-(N+1)} ÃN(z).    (7.37)

Notice that these can be regarded as the polynomial AN+1(z) in Levinson's recursion for the cases kN+1 = 1 and kN+1 = −1, respectively. Equivalently, they are the transfer functions of the FIR lattice shown in Fig. 7.9. Let us examine the zeros of these polynomials. The zeros of P(z) and Q(z) are solutions of the equations

z^{-(N+1)} ÃN(z)/AN(z) = −1 = e^{j(2m+1)π}   [for P(z)],
z^{-(N+1)} ÃN(z)/AN(z) =  1 = e^{j2mπ}       [for Q(z)].

FIGURE 7.9: Interpretation of the LSP polynomials P(z) and Q(z) in terms of the FIR LPC lattice, with lattice coefficients k1, . . . , kN followed by a final coefficient equal to ±1.

Because the left-hand side is an all-pass function of order N + 1 with all its poles in |z| < 1, it satisfies the modulus property

|z^{-(N+1)} ÃN(z)/AN(z)|   < 1 for |z| > 1,   > 1 for |z| < 1,   = 1 for |z| = 1.

See Vaidyanathan (1993, p. 75) for a proof. That is, the solutions of the preceding equations are necessarily on the unit circle; all the N + 1 zeros of P(z), and similarly those of Q(z), are on the unit circle. In fact, we can say something deeper.

Alternation Property. It is well known (Vaidyanathan, 1993, p. 76) that the phase response φ(ω) of the stable all-pass filter z^{-(N+1)} ÃN(z)/AN(z) is a monotone decreasing function of ω, spanning a total range of 2(N + 1)π as ω goes from 0 to 2π (Fig. 7.10(a)).

FIGURE 7.10: Development of LSPs. (a) Monotone phase response of z^{-(N+1)} ÃN(z)/AN(z) and (b) zeros of P(z) and Q(z) for the cases N = 4 (where P(−1) = 0) and N = 5 (where Q(−1) = 0).

Therefore, as φ(ω) decreases through successive multiples of π, the N + 1 zeros of P(z) and the N + 1 zeros of Q(z) alternate with each other on the unit circle, as demonstrated in Fig. 7.10(b) and in Fig. 7.11 for N = 4 and N = 5. Because AN(z) has real coefficients in speech applications, the zeros come in complex conjugate pairs; the zeros in the lower half of the unit circle are redundant and are therefore dropped. Note also that P(−1) = 0 for even N and Q(−1) = 0 for odd N, and that Q(1) = 0 regardless of N; these follow from the fact that AN(±1) = ÃN(±1). The angles of the remaining zeros of P(z) and Q(z) in the upper half of the unit circle (excluding z = ±1) are denoted as ωi and θi, respectively. There are N such angles in all, and they satisfy the alternation property

ω1 < θ1 < ω2 < θ2 < ω3 < . . .    (7.38)

These are called the line spectrum pairs (LSP) associated with the predictor polynomial AN(z). Thus, the LSP parameters for N = 4 are the four ordered frequencies ω1 < θ1 < ω2 < θ2, and the LSP parameters for N = 5 are the five ordered frequencies

ω1 < θ1 < ω2 < θ2 < ω3.    (7.39)

FIGURE 7.11: Examples of LSP coefficients for N = 4 and N = 5, showing the zeros of P(z) and Q(z) alternating on the unit circle.

Given the N coefficients of the polynomial AN(z), the N LSP parameters can be uniquely identified by finding the zeros of P(z) and Q(z). Conversely, if we know the zeros of P(z) and Q(z) in the upper half of the unit circle (excluding z = ±1), we can fully determine these polynomials (their constant coefficients being unity by the definition (7.37)). In speech coding, the LSP parameters are quantized, making sure that the ordering (e.g., Eq. (7.39)) is preserved in the process, and the quantized coefficients are then transmitted.
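The computation of the LSP frequencies from AN(z) can be sketched as follows. This is only an illustration: the helper name and the example polynomial (built from arbitrary zeros inside the unit circle) are assumptions introduced here, and real coefficients are assumed so that ÃN(z) reduces to coefficient reversal.

```python
import numpy as np

def lsp_frequencies(a):
    """a = [1, a1, ..., aN]: real coefficients of A_N(z) in powers of z^{-1}."""
    a = np.asarray(a, dtype=float)
    a_tilde = a[::-1]                           # coefficients of z^{-N} A_N(1/z)
    # P(z) = A_N(z) + z^{-(N+1)} A~_N(z),  Q(z) = A_N(z) - z^{-(N+1)} A~_N(z)
    p = np.concatenate([a, [0.0]]) + np.concatenate([[0.0], a_tilde])
    q = np.concatenate([a, [0.0]]) - np.concatenate([[0.0], a_tilde])
    omega = np.angle(np.roots(p))
    theta = np.angle(np.roots(q))
    # keep the zeros strictly inside the upper half of the unit circle (exclude z = +-1)
    omega = np.sort(omega[(omega > 1e-8) & (omega < np.pi - 1e-8)])
    theta = np.sort(theta[(theta > 1e-8) & (theta < np.pi - 1e-8)])
    return omega, theta

# Illustrative A_4(z) with zeros inside |z| < 1
zeros = np.array([0.8 * np.exp(1j * 0.3 * np.pi), 0.8 * np.exp(-1j * 0.3 * np.pi),
                  0.6 * np.exp(1j * 0.7 * np.pi), 0.6 * np.exp(-1j * 0.7 * np.pi)])
a = np.real(np.poly(zeros))
w, t = lsp_frequencies(a)
print(np.sort(np.concatenate([w, t])))          # the two sets of angles interlace on (0, pi)
```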

Properties and Advantages of LSPs. Because P(z) and Q(z) can be computed uniquely from the LSP parameters, approximations of these polynomials can be computed at the receiver from the quantized LSP parameters, and from these we can identify an approximation of AN(z), because AN(z) = (P(z) + Q(z))/2. Several features of the LSP coding scheme are worth noting.

1. Stability preserved. As long as the ordering (e.g., Eq. (7.39)) is preserved in the quantization, the reconstructed version of AN(z) is guaranteed to have all its zeros in |z| < 1. Thus, stability of 1/AN(z) is preserved despite quantization, as in the case of lattice coefficient quantization, and the speech segment can then be reconstructed from the stable filter 1/AN(z) as described in Section 5.6.

2. Perceptual spectral interpretation. Unlike the lattice coefficients, the LSP coefficients are perceptually better understood. To explain this, recall first that for sufficiently large N, the AR(N) model gives a good approximation of the speech power spectrum Sxx(e^{jω}). This approximation is especially good near the peak frequencies, called the formants of speech; the peak locations correspond approximately to the pole angles of the filter 1/AN(z). Near these locations, the phase response of the all-pass filter tends to change rapidly, so the LSP coefficients, which are intersections of the horizontal lines (multiples of π) with the plot of φ(ω), tend to get crowded near the formant frequencies, as shown in Fig. 7.12. Thus, the crucial features of the power spectrum tend to get coded into the density information of the LSP coefficients at various points on the unit circle. This information allows us to perform bit allocation among the LSP coefficients (quantize the crowded LSP coefficients with greater accuracy), so that, for a given perceptual quality, we can reduce the bit rate. In early work, an approximately 25% saving was reported using this idea, as compared with the quantization of lattice coefficients, and more savings have been obtained in a series of other papers. Furthermore, as the bit rate is reduced, the speech degradation is found to be more gradual compared with lattice coefficient quantization; this results in perceptually better speech quality.

3. Acoustical tube models. It has been shown in the speech literature that the lattice structure is related to an acoustical tube model of the vocal tract (Markel and Gray, 1976); this is the origin of the term reflection coefficients. The values kN+1 = ±1 in Fig. 7.9 correspond to situations where one end of the tube is open or closed. Thus, the LSP frequencies are related to the open- and close-ended acoustic tube models.

4. Connection to circuit theory. In the theory of passive electrical circuits, the concept of reactances is well known (Balabanian and Bickart, 1969). Reactances are input impedances of electrical LC networks. It turns out that reactances have a pole-zero alternation property similar to the alternation of the zeros of the LSP polynomials P(z) and Q(z); in fact, the ratio P(z)/Q(z) is nothing but a discrete-time version of a reactance function.

FIGURE 7.12: Explanation of how the LSP coefficients tend to get crowded near the formant regions of the speech spectrum. (a) A toy power spectrum with one formant and (b) the phase response of the all-pass filter z^{-(N+1)} ÃN(z)/AN(z), where AN(z) is the prediction polynomial (see text).

7.9 CONCLUDING REMARKS

In this chapter, we presented a rather detailed study of line spectral processes. The theory finds applications in the identification of sinusoids buried in noise. A variation of this problem is immediately applicable in array signal processing, where one seeks to identify the direction of arrival of a signal using an antenna array (Van Trees, 2002). Further application of these concepts in signal compression, using LSPs, was briefly reviewed.

• • • •



1 z−1 + a† . f 1 ≤ k ≤ N. Proof. by definition. . (8.5) The optimal predictor has the L × L matrix coefficients aN. . so that e(n) = e⊥ (n) + x⊥ (n) − x(n). and let x(n) be another predicted value with error e(n) (superscript f and subscript N deleted for simplicity). by construction.Nz−N . E[eN (n)x† (n − k)] = 0.k appropriately.k chosen such that the total mean square error in all components is minimized. . Orthogonality principle (vector processes). linear combinations of x(n − k).7) EN = Tr(EN ) f Δ f (8. . Let x⊥ (n) be the predicted value for which the error e⊥ (n) satisfies the orthogonality condition (8. This quantity is also equal to the trace (i.8) by choosing aN. . The goal therefore is to minimize f f f (8. it follows that E[e(n)e† (n)] = E[e⊥ (n)e† (n)] +E[(x⊥ (n) − x(n))(x⊥ (n) − x(n))† ] ⊥ call this E call this E⊥ (8. Note that this is an L × L matrix. 1 ≤ k ≤ N.. x(n − N ). is orthogonal to these past samples. (8.1. The Nth-order forward linear predictor f for a WSS vector process x(n) is optimal if and only if the error eN (n) at any time n is orthogonal to the N past observations x(n − 1).e.10) Now. Now. that is.9).118 THE THEORY OF LINEAR PREDICTION and the forward prediction polynomial is AN (z) = IL + a† . Because the error e⊥ (n). sum of diagonal elements) of the forward prediction error covariance matrix EN = E[eN (n)(eN (n))† ]. e(n) = x(n) − x(n) = x(n) − x⊥ (n) + x⊥ (n) − x(n).2 z−2 + . EN = E[(eN (n))†eN (n)] f Δ f f (8. . N N N (8.11) . the estimates x⊥ (n) and x(n) are. The theory of optimal predictors for the vector case is again based on the orthogonality principle: Theorem 8.9) where the right-hand side 0 represents the L × L zero-matrix (with L denoting the size of the ♦ column vectors x(n)).6) is minimized. that is. + a† .

N x(n − N) x† (n) = R(0) + a† . + a† . . in turn. which are orthogonal to eN(n). So. is equivalent to x⊥ (n) − x(n) = 0. (8. The error covariance EN for the optimal predictor can be expressed in an elegant way using the orthogonality conditions: EN = E[eN(n)(eN (n))†] = E[eN (n) x(n) − x(n) ] = E[eN(n)x† (n)]. .12) where the notation A ≥ 0 means that the Hermitian matrix A is positive semidefinite. From (8. all eigenvalues are nonnegative. (8.11). . The third equality follows by observing that the optimal x(n) is a linear combination of samples f x(n − k). it is positive semidefinite. . Optimal Error Covariance. + R(N)aN. which.NR† (N) N N N Because the error covariance EN is Hermitian anyway.1 + R(2)aN.1 x(n − 1) + aN. Taking trace on both sides. This implies that the matrix itself is zero (because any positive semidefinite matrix can be written as A = UΛU† . . its trace is zero only when1 E − E⊥ = 0. where Λ is the diagonal matrix of eigenvalues and U is unitary). The trace of a matrix A is the sum of its eigenvalues.N.1 R† (1) + a† . Using Eq. The preceding therefore shows that E − E⊥ ≥ 0 (8. 1 . When A is positive semidefinite. we see that this is equivalent to the condition E[(x⊥ (n) − x(n))(x⊥ (n) − x(n))† ] = 0. .2 + .13) When does equality arise? Observe first that E − E⊥ = Tr(E − E⊥ ). zero-trace implies all eigenvalues are zero. we can rewrite this as EN = R(0) + R(1)aN.4).14) We shall find this useful when we write the augmented normal equations for optimal linear prediction. Because E − E⊥ has been shown to be positive semidefinite. we therefore obtain E ≥ E⊥ .LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 119 Because the second term on the right-hand side is a covariance. This proves that equality is possible if and only if x(n) = x⊥ (n). this can be rewritten as EN = E x(n) + aN.2 R† (2) + . f f f † † † f f f f f † f (8.2 x(n − 2) . + aN.

⎥ ⎦ ⎣ .1 + R† (k − 2)aN. ⎤⎡ .17) These represent the normal equations for the vector LPC problem.8) (remembering R† (−k) = R(k)). By using the fact that R (−k) = R(k). . . we shall rewrite the equations in the form R† (k − 1)aN. R(0) ⎡ R(0) R(−1) .4 BACKWARD PREDICTION The formulation of the backward predictor problem for the vector case allows us to derive a recursion similar to Levinson’s recursion in the scalar case (Chapter 3). . . . .1 ⎥⎢ . ⎥⎢ . . R(N ) IL ⎥⎢ . R(−N ) R(−N + 1) . . . we have to introduce the idea of backward prediction. . .NR(k − N ) = 0.15) for 1 ≤ k ≤ N. . . . . .1 R(−1) ⎥⎢ ⎥ ⎢ ⎥ ⎢ . .4) for the prediction error into the orthogonality condition (8. . . . .1).120 THE THEORY OF LINEAR PREDICTION 8.14) at the top. N N N (8. But first. This also results in elegant . R(N − 2) ⎥ ⎢ aN.N ⎤ ⎤ f EN ⎥ ⎢ ⎥ ⎥ ⎢ 0 ⎥ ⎥ = ⎢ .N = −R† (k) † (8. + R† (k − N )aN. we get the following augmented normal equations for the optimal vector LPC problem: ⎡ ⎢ ⎢ ⎢ ⎢ ⎣ R(0) R(−1) . . (8. ⎦⎣ . If we move the vector on the right to the left-hand side and then prefix Eq. ⎦ 0 ⎡ (8.2 R(k − 2) + . R(N − 1) ⎥ ⎢ aN. .k .2 + . (8. Note that these equations reduce to the normal equations for scalar processes given in Eq. . . . . ⎥. . these equations can be written in the following elegant form: ⎤⎡ ⎤ ⎤ ⎡ . ⎢ . . ..16) for 1 ≤ k ≤ N. . ⎥ ⎢ . we get N matrix equations. Solving these equations. R(1) R(0) .N R(−N ) R(−N + 1) R(−N + 2) . (2. we find these equations to be R(k ) + a† . By using the definition of autocorrelation given in Eq. For convenience.2 ⎥ ⎢ ⎢ R(−2) ⎥ ⎥⎢ . ⎥⎢ . (8. ⎦⎣ .18) We will soon return to this equation and solve it. R(0) aN. R(1) R(0) . . 8. ⎦ ⎦ ⎣ ⎣ aN . . Here the 0 on the right is an L × L matrix of zeros.. ⎥ = −⎢ ⎥. . R(N − 1) aN . + a† .9).3 NORMAL EQUATIONS: VECTOR CASE Substituting the expression (8. .1 R(k − 1) + a† . ⎥ ⎥ ⎢ ⎢ . we can obtain the N optimal predictor coefficient matrices aN.

1) and the facts that EN is Hermitian and R (−k) = R(k).2 z−2 .21) .N x(n − N ). . . .1 x(n − 1) + bN. N This results in the N equations bN. The backward prediction error is eb (n) = x(n − N − 1) − x(n − N − 1).1 + . .N + R(0). N so that eb (n) = x(n − N − 1) + bN. . − bN.N R(k − N ) + R(k − N − 1) = 0 for 1 ≤ k ≤ N.1 x(n − 1) + . (8. The third equality follows by observing that the optimal x(n − N − 1) is a linear combination of b samples x(n − k) that are orthogonal to eb (n).9). The Nth-order backward linear predictor predicts the L × 1 vector x(n − N − 1) as follows: x(n − N − 1) = −bN. .1 R(k − 1) + bN. . + bN.2 x(n − 2) + . E[eb (n)x† (n − k)] = 0.2 x(n − 2) − . . N So the backward prediction polynomial takes the form BN (z) = b† . .k are L × L matrices.1 x(n − 1) − bN. . N N N † † † † † † (8. namely.N x(n − N) x† (n − N − 1) N N b † † † † † 1≤k≤N (8.N x(n − N ). where bN. + b† . So EN can be rewritten as N Eb = E N x(n − N − 1) + b† . + R(−1)bN.N z −N + z−(N+1) I. . The error covariance for the optimal backward predictor is Eb = E[eb (n)(eb (n))†] = E[eb (n) (x(n − N − 1) − x(n − N − 1)) ] N N N N b † = E[eN (n)(x (n − N − 1)]. + b† . this can further be rewritten as Eb = R(−N)bN.20) Using Eq.1 z−1 + b† . bN.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 121 lattice structures.19) The optimal coefficients bN. N (8.k are again found by imposing the orthogonality condition (8. .2 R(k − 2) + .

23) with the hope that the last entry FN in the right-hand side is b 2 Actually L extra columns and rows because each entry is itself an L × L matrix. (8. ⎦ IL Eb R(−N) R(−N + 1) .24) where FN comes from the extra row added at the top. possibly nonzero. . ⎥ = ⎢ . ⎥ . . . .22) 8. . (8. (8.23) where FN is an L × L matrix. R(N) bN . R(N − 1) R(N) ⎥ ⎢ N.1 ⎥ ⎢ 0 ⎥ ⎢ ⎥⎢ . ⎥ ⎢ ⎥⎢ ⎢ ⎥ ⎢ ⎥ R(1) ⎦ ⎣ aN. . ⎥⎢ . .. . . ⎥ ⎢ . .20) and rearranging slightly. .24) by an L × L f f matrix (KN+1 )† and add it to Eq.22). (8. . . R(1) R(0) . (8. ⎥ ⎢ .122 THE THEORY OF LINEAR PREDICTION Combining this with the N equations in Eq. . Similarly. ⎥ ⎢ 0 ⎥ . . . ⎥ ⎢ . R(0) b R(−N − 1) R(−N) .N ⎦ ⎣ 0 ⎦ ⎣ R(−N) R(−N + 1) . . ⎥⎢ . . R(−1) R(0) 0 FN ⎡ R(0) R(−1) . ⎥ ⎢ . ⎥. ⎥ ⎢ . . we add an extra row and column to obtain f ⎤⎡ ⎤ ⎡ b⎤ .1 0 ⎥⎢ ⎥ ⎢ ⎥ . . . . (8. we obtain the following augmented normal equations for the backward vector predictor: ⎡ R(0) ⎢ ⎢ R(−1) ⎢ . . R(0) N R(1) R(0) . ⎢ . ⎥ = ⎢ . . . . R(N) R(N + 1) IL EN ⎥⎢ a ⎥ ⎢ 0 ⎥ ⎢ . ⎢ . . . . . R(N − 1) R(N) ⎥ ⎢ bN. . . . ⎥⎢ ⎥ ⎢ ⎥ . ⎥ = ⎢ . R(1) R(0) . .5 LEVINSON’S RECURSION: VECTOR CASE We now combine the normal equations for the forward and backward predictors to obtain Levinson’s recursion for solving the optimal predictors recursively.18) and add an extra column and row 2 to obtain ⎤⎡ ⎤ ⎡ f⎤ . ⎥⎢ . ⎥⎢ . starting from Eq. ⎦ ⎣ bN. . this is RN+2 too! (8. . . ⎥ ⎢ . we start from Eq. . R(N) R(N + 1) 0 FN ⎥⎢ ⎢ ⎥ ⎢ ⎥ . ⎥⎢ . call this RN+2 (8. R(0) f R(−N − 1) R(−N) . . . R(N − 1) ⎥ ⎢ . ⎣ ⎤⎡ ⎤ ⎡ ⎤ .N ⎦ ⎣ . We now postmultiply Eq. ⎥ ⎢ .. ⎥ ⎢ ⎥⎢ ⎢ ⎥ ⎢ ⎥ R(1) ⎦ ⎣ bN. . . .. For this. which comes from the extra row added at the bottom. R(−1) R(0) IL EN ⎡ R(0) R(−1) .1 ⎥ ⎢ ⎥ ⎢ ⎥⎢ .N ⎦ ⎣ 0 ⎦ ⎣ R(−N) R(−N + 1) . .

⎥⎟ = ⎢ . ⎥ ⎢ .22).27) To create the augmented normal equation for the (N + 1)-order backward predictor. ⎥ + ⎢ . we have to b reduce this to the form (8. ⎥ N ⎢ . N f f f f (8. RN+2 ⎜⎢ .26) We can similarly postmultiply Eq. KN+1 should be chosen as [EN ]Kb +1 + Fb = 0.N ⎦⎠ ⎣ 0 ⎦ ⎣ 0 ⎦ ⎝⎣ aN. .1 ⎥ ⎢ bN. ⎥ ( K f ) † RN+2 ⎜⎢ .1 ⎥ ⎟ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎜⎢ .N ⎦ ⎠ ⎣ 0 ⎦ ⎣ 0 ⎦ f 0 IL Eb FN N f ⎛⎡ (8. We have ⎤ ⎡ ⎤ ⎞ ⎡ f⎤ ⎡ b ⎤ IL 0 EN FN ⎜⎢ ⎥ ⎢ ⎥ ⎟ ⎢ ⎥ ⎢ ⎥ ⎜⎢ aN. we make the assumption that the L × L error covariance Eb . ⎥ ⎢ ⎢ ⎥ ⎜⎢ ⎥ ⎥⎟ ⎢ ⎥ ⎣ bN. ⎥ ⎢ . ⎥ . ⎥⎟ ⎢ .24): N ⎡ ⎡ b⎤ ⎤ ⎤⎞ ⎡ f ⎤ IL 0 EN FN ⎢ b ⎥⎟ ⎢ 0 ⎥ ⎢ 0 ⎥ ⎜⎢ a ⎥ ⎢ N. ⎥ N+1 ⎟ ⎢ .28) . N At this point. For this. ⎥ ⎢ . and for future purposes. This will then lead to the augmented normal equation for the (N + 1)th-order forward predictor. are nonsingular. N N that is. ⎥ + ⎢ . ⎥ N+1 . Kb +1 = −[EN]−1 Fb . ⎜⎢ ⎥ ⎢ ⎥ ⎟ ⎢ ⎥ ⎢ ⎥ ⎝ ⎣ aN . (8.1 ⎥⎟ ⎢ ⎥ ⎢ ⎥ ⎜ ⎢ N . ⎥ ⎟ ⎢ ⎥ ⎢ ⎥ ⎜⎢ .25) Clearly. The solution KN+1 is therefore given by KN+1 = −[FN ]† [Eb ]−1 . N N f f (8.N ⎦ f b 0 IL EN FN ⎛⎡ (8.23) by an L × L matrix Kb +1 and add it to Eq. ⎥ Kb +1 + ⎢ . ⎢ . ⎥ ( K f ) † ⎟ = ⎢ . . (8.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 123 replaced with zero. ⎥ b ⎥⎟ ⎢ ⎥ ⎜⎢ . ⎥ . which achieves the aforementioned purpose should be such that FN + Eb (KN+1 )† = 0.1 ⎥ ⎢ ⎢ ⎥ ⎜⎢ . ⎥ KN+1 + ⎢ . . N f f EN . the choice of KN+1 .N ⎦ ⎣ bN .

.N ⎦ aN+1. ⎥ ⎢ . ⎥⎟ = ⎢ . ⎥ ⎢ ⎢ ⎥ ⎢ ⎢ ⎢ ⎥ ⎢ ⎥ ⎥ ⎣ bN . ⎥ . ⎜⎢ .27) reduce to the form N ⎛⎡ ⎤ ⎡ ⎤ ⎞ ⎡ f ⎤ IL 0 EN+1 ⎜⎢ a ⎥ ⎢ b ⎥ ⎟ ⎢ ⎥ ⎜ ⎢ N . ⎢ . ⎥ N+1 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎣ aN+1. ⎥ (KN+1 )† ⎟ = ⎢ .N ⎦ ⎣ bN. respectively.N ⎦ ⎠ ⎣ 0 ⎦ 0 IL 0 and ⎡ ⎤ ⎤⎞ ⎡ IL 0 0 ⎢ b ⎥⎟ ⎢ 0 ⎜⎢ a ⎥ ⎢ N .N ⎦⎠ ⎣ 0 ⎝ ⎣ aN . the (N + 1)th-order forward predictor coefficients are obtained from ⎡ ⎤ ⎡ ⎤ ⎡ ⎤ IL IL 0 ⎢ ⎥ ⎢ ⎥ ⎢ ⎥ ⎢ aN+1.25) and (8. ⎥ + ⎢ . ⎜⎢ . ⎥ ⎢ ⎜⎢ ⎥ ⎥⎟ ⎢ ⎣ bN.5) and (8. ⎢ ⎥ ⎢ . ⎥ ⎥ ⎦ (8. ⎥ f RN+2 ⎜⎢ .N ⎦ ⎣ bN.1 ⎥ ⎢ .1 IL 0 ⎢b ⎥ ⎢ b ⎥ ⎢ ⎥ ⎢ N .2 ⎥ ⎢ aN. ⎢ ⎥ = ⎢ . . .1 ⎥ ⎢ N . ⎥ .1 ⎥ ⎢ ⎢ ⎥ ⎢ . . Eqs.N ⎦ ⎣ aN. ⎥ ⎟ ⎢ .N ⎦ ⎣ bN+1. ⎥⎟ ⎢ . (8. ⎥ ⎜⎢ ⎥ ⎢ ⎥ ⎟ ⎢ ⎥ ⎝⎣ aN.N ⎦ 0 IL Eb +1 N ⎛⎡ ⎤ ⎥ ⎥ ⎥ ⎥.30) respectively. ⎥ . ⎥ b ⎥ .29) ⎜⎢ .1 ⎥ ⎟ ⎢ 0 ⎥ ⎜⎢ . These are the augmented normal equations for the (N + 1)th-order optimal forward and backward vector predictors.19) are therefore given by AN+1 (z) = AN (z) + KN+1 BN(z) and f (8. ⎥⎟ ⎢ .N+1 0 IL Similarly.1 ⎥ ⎢ ⎥ ⎢ . . (8.31) .124 THE THEORY OF LINEAR PREDICTION f b The matrices KN+1 and KN+1 are analogous to the parcor coefficients km used in scalar linear f prediction. ⎥ + ⎢ . ⎥ KN+1 + ⎢ .1 ⎥ ⎢ bN. ⎥ KN+1 + ⎢ .1 ⎥ ⎟ ⎢ ⎜ ⎢ N . ⎥ (8.1 ⎥ ⎢ aN. ⎥ ⎢ . . ⎢ ⎥ = ⎢ . ⎥ ⎢ .N+1 ⎦ ⎣ aN. Thus. ⎥ b RN+2 ⎜⎢ . ⎥ ⎟ ⎢ .N ⎦ IL 0 IL The forward and backward prediction polynomials defined in Eqs. With KN+1 and Kb +1 chosen as above. ⎥ (K f )† . ⎥ ⎢ .1 ⎥ ⎢ N+1. the (N + 1)th-order backward predictor coefficients are obtained from ⎡ ⎡ ⎤ ⎡ ⎤ ⎤ bN+1.

proceed as follows:

1. Calculate the L × L matrices

F^f_m = R(−m − 1) + R(−m)am,1 + . . . + R(−1)am,m    (8.37(b))
F^b_m = R(1)bm,1 + . . . + R(m)bm,m + R(m + 1).    (8.37(c))

2. Calculate the parcor coefficients (L × L matrices)

K^f_{m+1} = −[F^f_m]† [E^b_m]^{-1}    (8.37(d))
K^b_{m+1} = −[E^f_m]^{-1} F^b_m.    (8.37(e))

3. Update the prediction polynomials according to

Am+1(z) = Am(z) + K^f_{m+1} Bm(z)    (8.37(f))
Bm+1(z) = z^{-1} [ (K^b_{m+1})† Am(z) + Bm(z) ].    (8.37(g))

4. Update the error covariances according to

E^f_{m+1} = [ IL − K^f_{m+1} (K^b_{m+1})† ] E^f_m    (8.37(h))
E^b_{m+1} = [ IL − (K^b_{m+1})† K^f_{m+1} ] E^b_m.    (8.37(i))

This recursion can be repeated to obtain the optimal predictors for any order N.
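A compact sketch of this recursion is given below, using the stacked-coefficient convention aN,k, bN,k of Eqs. (8.3) and (8.19). It is an illustrative implementation, not the author's code; for a sanity check, its output can be compared against a direct solution of the normal equations (8.17).

```python
import numpy as np

def vector_levinson(R, N):
    """R: list [R(0), ..., R(N)] of L x L autocorrelation matrices, with R(-k) = R(k)^H.
    Returns the forward coefficients a_{N,1..N} and the error covariances E_N^f, E_N^b."""
    H = lambda M: M.conj().T
    Rm = lambda k: R[k] if k >= 0 else H(R[-k])
    a, b = [], []                       # a[k-1] = a_{m,k},  b[k-1] = b_{m,k}
    Ef = Eb = R[0].copy()               # order-0 error covariances (Eq. (8.37a))
    for m in range(N):
        # Step 1: residual-correlation matrices (Eqs. (8.37b), (8.37c))
        Ff = Rm(-m - 1) + sum(Rm(k - m - 1) @ a[k - 1] for k in range(1, m + 1))
        Fb = Rm(m + 1) + sum(Rm(k) @ b[k - 1] for k in range(1, m + 1))
        # Step 2: parcor coefficient matrices (Eqs. (8.37d), (8.37e))
        Kf = -H(Ff) @ np.linalg.inv(Eb)
        Kb = -np.linalg.solve(Ef, Fb)
        # Step 3: coefficient updates (equivalent to the polynomial updates (8.37f), (8.37g))
        a_new = [a[k - 1] + b[k - 1] @ H(Kf) for k in range(1, m + 1)] + [H(Kf)]
        b_new = [Kb] + [b[k - 1] + a[k - 1] @ Kb for k in range(1, m + 1)]
        a, b = a_new, b_new
        # Step 4: error covariance updates (Eqs. (8.33), (8.34))
        Ef = Ef + Fb @ H(Kf)
        Eb = Eb + Ff @ Kb
    return a, Ef, Eb
```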

8.6 PROPERTIES DERIVED FROM LEVINSON'S RECURSION

We now derive some properties of the matrices involved in Levinson's recursion. This gives additional insight and places the mathematical equations in proper context.

8.6.1 Properties of Matrices F^f_m and F^b_m

Eq. (8.37(b)) can be rewritten as

(F^f_m)† = R(m + 1) + a†m,1 R(m) + . . . + a†m,m R(1)
         = E[ (x(n) + a†m,1 x(n − 1) + . . . + a†m,m x(n − m)) x†(n − m − 1) ]
         = E[ e^f_m(n) x†(n − m − 1) ].

We know the error e^f_m(n) is orthogonal to the samples x(n − 1), . . . , x(n − m), but not necessarily to x(n − m − 1). This ''residual correlation'' between e^f_m(n) and x(n − m − 1) is precisely what is captured by the matrix F^f_m. The coefficient K^f_{m+1} in Eq. (8.37(d)) is therefore called the partial correlation or parcor coefficient, as in the scalar case. From Eq. (8.37(d)), we see that if this residual correlation is zero, then K^f_{m+1} = 0, and as a consequence Am+1(z) = Am(z) and E^f_{m+1} = E^f_m; that is, there is no progress in the recursion. Does F^b_m have a similar significance? Observe from Eq. (8.37(c)),

+ b† .41) which proves (8.1 x(n − 1) + . This residual correlation is captured by Fm . Thus.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 127 The backward error is the error in the estimation of x(n − m − 1) based on the samples x(n − m). we have f [Fm ]† = −Km+1 Eb m f f and Fb = −Em Kb +1 . m † So we have shown that f f Fm = E[eb (n)(em (n))†] and m f Fb = E[em (n)(eb (n))†]. If this is zero. then Km+1 is zero as seen from Eq. 1974). eb (n) is orthogonal to x(n − m).39).38) as f f (Fm )† = E em (n) x(n − m − 1) + b† . . from Eq. m m (8. We have shown that eb (n) m f f (Fm )† = E[em (n)x† (n − m − 1)] (8. .40). . . there is a simple relation between Fm and Fm credited to Burg (Kailath. (8. But for the m f vector case.m x(n − m) m m f = E[em (n)(eb (n))† ] m † because E[em (n)x(n − k)] = 0 for 1 ≤ k ≤ m anyway. m m (8. the parameter αm played the role of the matrices Fm and Fb . b f (8. (8. Relation Between Error Covariance Matrices. . . x(n − 1) but not necessarily m b b to the sample x(n).39) f For the scalar case. there is no progress in prediction as we go from stage m to m + 1. x(n − 1). Fm = (Fm )† .40). f b Relation Between Fm and Fm . which appear to be unrelated.1 x(n − 1) + . . . f b However.40) This is readily proved as follows: Proof of Eq. (8. We can rewrite Eq. namely. Similarly. (8. From the definitions of Km+1 and Kb +1 m given in Eqs. + a† .37(e)). m m f . we have the two matrices Fm and Fb . . we get f (Fb )† = E eb (n) x(n) + a† . That is.37(d)) and (8.m x(n − m) m m m m f = E[eb (n)(em (n))† ]. .. This gives the m first impression that there is a lack of symmetry between the forward and backward predictors.37(e)).38) and (Fb )† = E[eb (n)x† (n)]. (8. .

43).40). We can also express these as f f Fm = E[eb (n)(em (n))† ] and Fm = E[em (n)(eb (n))† ].46) This shows that Em ≥ Em+1 and the proof of Eq. (8.42) 8.47(a)) 2. substituting Eq. it then follows that f Km+1 Eb = Em Kb +1 . m m (8. Because the total m m f f mean square error Em is the trace of Em . In view of the symmetry relation (8. m m (8.6. the differences Em − Em+1 and Eb − Eb +1 are positive semidefinite.34).3 Summary of Properties Relating to Levinson’s Recursion Here is a summary of what we have shown so far: f 1. from which it follows that Em+1 = Em − Fm (Em )−1 (Fm )† . b b 8.6. Substituting (8. Eq. it follows that the second term on the right-hand m f f side is Hermitian and positive semidefinite.2 Monotone Properties of the Error Covariance Matrices We now show that the error covariances of the optimal linear predictor satisfy the following: f Em ≥ Em+1 f f and Eb ≥ Eb +1 . m m f (8.43) is complete.45) Because (Eb )−1 is Hermitian and positive definite. m m f b (8. b f (8. b b f f f (8. Similarly. (8.47(b)) so that F m = (F m )† . Fm and Fb have the following interpretations: m f f (Fm )† = E[em (n)x† (n − m − 1)].33). This proves Em ≥ Em+1 . b b f f b (8.44) f Proof of (8. we see that Em+1 = Em − b b f Fm (Em )−1 Fm . (8.43) means in particular that Emf ≥ Em+1 f b b and Em ≥ Em+1 f (8.128 THE THEORY OF LINEAR PREDICTION In view of the symmetry relation (8. it then follows that Em+1 = Em − (Fm )† (Em )−1 Fm .43) f that is.47(c)) . (Fb )† = E[eb (n)x† (n)]. f f b f f (8. (8. we get Em+1 = Em − Fm (Em )−1 Fm .37(d)) into Eq.40).37(e)) into Eq.

Similarly backward prediction polynomial (8. These are schematically shown in Fig. 6. m m f (8. We therefore see that the transfer function from eN (n) to eb (n) is N given by z−1 HN (z)=BN (z)AN (z) f Δ −1 (8.8. N f It then follows that the backward error eb (n) can be generated from the forward error eN(n) as N f shown in part (b) of the figure. (8. For completeness. m P† Pm < I.47(e)) where A ≥ B means that A − B is positive semidefinite. which f produces the output eN (n) in response to the input x(n). we list one more relation that will only be proved in Section 8.47(h)) and the matrices Pm satisfy Pm P† < I.47(i)) 8. The parcor coefficients are related as f Km+1 Eb = Em Kb +1 .47(d)) 4. m (8. The error covariances matrices satisfy Em ≥ Em+1 . f f Em ≥ Em+1 . b b (8. f f f gb (n) = (Eb )−1/2 eb (n) N N N (8. The error covariances for successive prediction orders are related as f (Em+1 )1/2 = (Em )1/2 IL − Pm+1 P† +1 m f 1/2 (8.1(a). f f b b 5.47(g)) f f Pm+1 = (Em )−1/2 Km+1 (Eb )1/2 = (Em )†/2 Kb +1 (Eb )−†/2 m m m f Δ (8.49) .7 TRANSFER MATRIX FUNCTIONS IN VECTOR LPC The forward prediction polynomial (8.47( f )) (Eb +1 )1/2 = (Eb )1/2 IL − P† +1 Pm+1 m m m where 1 /2 .5) is a multi-input multi-output (MIMO) FIR filter.19) produces eb (n) in response to the input x(n). 8.1.48) A modification of this is shown in part (c) where the quantities gN (n) and gb (n) are given by N gN (n) = (EN)−1/2 eN (n).LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 129 3. Traces of the error covariances satisfy Em ≥ Em+1 and Em ≥ Em+1 .

8. the lattice coefficients are the two different * f b matrices Km+1 and (Km+1 )† . Recall that the f error em (n) is the output of Am (z) in response to the input x(n).37(g)) show that the prediction errors for order m + 1 can be derived from those for order m using the MIMO lattice section shown in Fig. and similarly. given in Eqs. 8.1: (a) Generation of the forward and backward errors using the prediction polynomials f AN (z) and BN (z).5). (8. (8. b Δ −1 f (8. .2. Comparing this with the lattice structure for the scalar case (Fig. we see that there is a great deal of similarity.8 THE FIR LATTICE STRUCTURE FOR VECTOR LPC The update equations for predictor polynomials.130 THE THEORY OF LINEAR PREDICTION x(n) (a) A N (z) BN (z) eNf (n) b eN (n) (b) eNf (n) -1 AN(z) x(n) BN (z) b eN (n) (c) f gN (n) eNf (n) [ ENf ]1/2 x(n) -1 AN(z) BN (z) b eN (n) b [ EN ]−1/2 b gN (n) FIGURE 8. These are called normalized errors because their covariance matrices are equal to I. (b) generation of the backward error eb (n) from the forward error eN (n). Thus.37(f )) and (8. 4. One difference is that. and (c) N f b generation of the normalized backward error gN (n) from the normalized forward error gN (n). the update Eqs. eb (n) is the output m of Bm (z). But in the MIMO case. can be used to derive lattice structures for vector linear prediction. the lattice coefficients were km+1 and its conjugate km+1 . The transfer f function from gN (n) to gb (n) is given by N z−1 GN (z)=[EN ]−1/2 BN (z)AN (z)[EN ]1/2 .37(f )) and (8.37(g)). in the scalar case. similar to the scalar case.50) The properties of GN (z) will be studied in later sections.

42) as f f (Em )−1/2 Km+1 (Eb )1/2 = (Em )†/2 Kb +1 (Eb )−†/2 . Because error covariances are positive semidefinite matrices.54) . we see that f f Pm+1 P† +1 = (Em )−1/2 (Km+1 )(Kb +1 )† (Em )1/2 m m f (8.42).2: The MIMO lattice section that generates prediction errors for order m + 1 from prediction errors for order m.53) and P† +1 Pm+1 = (Eb )−1/2 (Kb +1 )† Km+1 (Eb )1/2 . (8. Pm+1 = (Em )−1/2 Km+1 (Em )1/2 = (Em )†/2 Km+1 (Em )−†/2 .8. f b f b b Δ f (8. they can be factored into the form E = E1/2 E†/2 . where E1/2 can be regarded as the left ‘square root and its transpose conjugate E†/2 the right square root.52) From the definition of Pm+1 . we can rewrite Eq. Recall that the coefficients Km+1 and Km+1 are related as in Eq.51) so that f f Km+1 = (Em )1/2 Pm+1 (Eb )−1/2 and Kb +1 = (Em )−†/2 Pm+1 (Eb )†/2 . m m m m f (8. (8.1 Toward Rearrangement of the Lattice We now show how to bring more symmetry into the lattice structure by rewriting equations a little f b bit. m m m f (8. m m m f We will indicate this quantity by the notation Pm+1 . that is. 8. We first rearrange this. Using this notation and remembering that the error covariances are assumed to be nonsingular.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES f em(n) f em+1 (n) 131 K m+1 K m+1 b em (n) b f z −1 I b em+1 (n) FIGURE 8.

m These can equivalently be written as † (8. f f f † (8. the modified lattice coefficient Pm+1 enters both the forward and backward error updates in a symmetrical way.56) Thus. This can be rewritten as f Em+1 = IL − Km+1 (Kb +1 )† Em m f f = IL − Km+1 (Kb +1 )† (Em )1/2 (Em )†/2 m f f f = (Em )1/2 − Km+1 (Kb +1 )† (Em )1/2 (Em )†/2 m f f f f = (Em )1/2 IL − (Em )−1/2 Km+1 (Kb +1 )† (Em )1/2 (Em )†/2 . Eq. m m m m m so that Em+1 = (Em )1/2 IL − Pm+1 Pm+1 (Em )†/2 . (8. we now prove that Pm+1 satisfies a boundedness property: Pm+1 Pm+1 < I and similarly P† +1 Pm+1 < I. f .132 THE THEORY OF LINEAR PREDICTION To understand the role of these matrices. (8. from Eq.35). recall the error update Eq. (8.55) Similarly.36). m (8. we have Eb +1 = IL − (Kb +1 )† Km+1 Eb m m m f f = IL − (Kb +1 )† Km+1 (Eb )1/2 (Eb )†/2 m m m = (Eb )1/2 − (Kb +1 )† Km+1 (Eb )1/2 (Eb )†/2 m m m m f = (Eb )1/2 IL − (Eb )−1/2 (Kb +1 )† Km+1 (Eb )1/2 (Eb )†/2 . b b b † f (8. Under the assumption that the error covariances are nonsingular.57) (8.55) can be written as (I − P† +1 Pm+1 ) > 0. m Proof.59) Em+1 = U† QU.58) (I − Pm+1 P† +1 ) > 0. m f f f f f so that Em+1 = (Em )1/2 IL − Pm+1 Pm+1 (Em )†/2 .

Eq. Because is assumed to be nonsingular.58) follows similarly. (8.62) and f Kb +1 = (Em )−†/2 Pm+1 (Eb )†/2 . 8. (8.55). which is equivalent † to the statement that Q is positive definite. it is f positive definite. We can generate the set of all forward and backward normalized error sequences by using the cascaded FIR lattice structure.2 The Symmetrical Lattice The advantage of defining the L × L matrix Pm+1 from the L × L matrices Km+1 and Kb +1 is m that it allows us to redraw the lattice more symmetrically. which is equivalent to Eq. Because f U = (Em )†/2 is nonsingular. Because Em+1 is Hermitian and positive definite. gb (n) = (Eb )−1/2 eb (n) m m m (8. it can be written in the form Em+1 = (Em+1 )1/2 (Em+1 )†/2 From Eq. f b f f (8. From Eq. (8.63) Substituting these into Fig. Because Pm+1 also has the boundedness property (8. in terms of Pm+1 .51). Thus. we therefore see that the left square root can be expressed as f (Em+1 )1/2 = (Em )1/2 IL − Pm+1 P† +1 m f † f f f f 1/2 . 8. (8. we obtain the lattice section shown in Fig. f The quantities gm (n) and gb (n) indicated in Fig.3(b).57).56). 8. (8. in the sense that their covariance matrix is identity. v† U† QUv > 0. this implies that w† Qw > 0 for any vector w = 0.8.59)).57). Similarly. (8. from Eq. 8. we have m (Eb +1 )1/2 = (Eb )1/2 IL − P† +1 Pm+1 m m m 1/2 (8. This proves that I − Pm+1 Pm+1 is positive definite. m m (8. We now derive this symmetrical lattice. we see that the lattice coefficients can be expressed as Km+1 = (Em )1/2 Pm+1 (Em )−1/2 . the lattice becomes very similar to the scalar LPC lattice.2 and rearranging. as demonstrated in Fig. (8.60) The existence of the square root (IL − Pm+1 Pm+1 )1/2 is guaranteed by the fact that (IL − Pm+1 P† +1 ) is positive definite (as shown by Eq. which had the boundedness property |km | < 1.4 for three stages.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES f Em+1 133 where Q is Hermitian and U is nonsingular.3(b) are given by m f f f gm (n) = (Em )−1/2 em (n).61) 8. . so that v† Em+1 v > 0 for any nonzero vector v.64) These are normalized prediction error vectors.

f f f f m≥1 (8. m In view of Eqs. these multipliers can be rewritten as Q m = I L − Pm Pm f † −1/2 .134 THE THEORY OF LINEAR PREDICTION f em(n) f em+1 (n) K m+1 (a) b em (n) b K m+1 f z −1 I b em+1 (n) f ( Em)−1/2 f em(n) g (n) m f f (Em )1/2 f em+1 (n) P (b) m+1 (Eb )−1/2 m b em (n) Pm+1 (Eb )1/2 m z −1 I b em+1 (n) g b(n) m FIGURE 8. 8.66) The unnormalized error sequence for any order can be obtained simply by inserting matrix f multipliers (Em )1/2 and (Eb )1/2 at appropriate places.61). b f .65) and Q 0 = (E0 )−1/2 . Q m = IL − Pm Pm b † − 1 /2 m ≥ 1. The multipliers Q b are similar (just replace superscript f with b everywhere). 3 3 b f f f the extra (rightmost) multipliers in Fig. (8. The lattice sections in Fig.60) and (8.4 will be referred to as normalized lattice sections because the outputs at each of the stages represent the normalized error sequences. Because m e3 (n) = (E3 )1/2 g3 (n) and eb (n) = (E3 )1/2 gb (n).4(a) are therefore (E3 )1/2 and (E3 )1/2 . The multipliers Q m in the figure are given by f f Q m = (Em )−1/2 (Em−1 )1/2 . 8. (8.3: (a) The original MIMO lattice section and (b) an equivalent section in a more symmetrical form.

LINEAR PREDICTION THEORY FOR VECTOR PROCESSES x(n) e 0(n) b e 0 (n) f 135 g 0(n) f Q0 f g 1 (n) f g2 (n) f g 3 (n) f e 3 (n) f z −1 I (a) b g 0(n) P1 z −1 I f Q1 b g 1 (n) P2 z −1 I f Q2 b g2 (n) P3 z −1 I f Q3 b g3 (n) e b (n) 3 Qb 0 Qb 1 Qb 2 Qb 3 (b) Pm Pm Pm FIGURE 8.4: (a) The MIMO FIR LPC lattice structure with normalized sections and (b) details of the box labeled Pm . m m b The first equation can be rewritten as f em (n) = em+1 (n) − Km+1 eb (n).9 THE IIR LATTICE STRUCTURE FOR VECTOR LPC The IIR LPC lattice for the vector case can be obtained similar to the scalar case by starting from Fig. From Fig. 8.5(b). 8. 8. . we have f em+1 (n) = em (n) + Km+1 eb (n) m f f and f eb +1 (n + 1) = [Km+1 ]† em (n) + eb (n).2 reproduced in Fig. This is called the MIMO IIR LPC lattice. we get eb +1 (n + 1) = [Km+1 ]†em+1 (n) + I − [Km+1 ]†Km+1 eb (n).6. m m b b f f These equations can be ‘‘drawn’’ as shown in Fig.5(a).5(a) and rearranging equations. m f f Substituting this into the second equation. 8. 8. 8. The transfer functions indicated as Hk (z) in the figure will be discussed later. By repeated application of this idea. we obtain the IIR LPC lattice shown in Fig.

6: (a) The MIMO IIR cascaded LPC lattice.136 THE THEORY OF LINEAR PREDICTION f em(n) f em+1 (n) K m+1 (a) b K m+1 b em (n) f z −1 I b em+1 (n) f em+1 (n) f em (n) f − K m+1 (b) K m+1 b em+1 (n) b z −1 I b em (n) FIGURE 8. . and (b) details of the building block labeled Km .5: (a) The MIMO FIR lattice section and (b) the corresponding IIR lattice section. e 3(n) f e 2 (n) f e1 (n) f e0 (n) f x(n) b e 3(n) K3 z −1 I H 3 (z) b e 2(n) K2 z −1 I H 2 (z) e1b(n) K1 z −1 I H 1 (z) e0b(n) z −1 I H0 (a) (b) Km f −K m Km b FIGURE 8.

To obtain this structure. We will call this the normalized IIR LPC lattice for reasons that will become clear.68) (8.70) g m (n) (a) f g f Pm+1 Pm+1 b g m (n) m +1 (n) b Q m +1 z −1 I g b (n) m +1 g (b) f f (Q m +1 ) −1 m +1 (n) g m (n) − f Q b +1 m m +1 Pm+1 (Q m +1 ) f −1 g b (n) z −1 I (Q b +1 ) m −P m+1 g b (n) m FIGURE 8. This FIR lattice section can be represented using the equations f gm+1 (n) = Q m+1 gm (n) + Pm+1 gb (n) m f f (8. (8. m m m m f The first equation can be rearranged to express gm (n) in terms of the other quantities: f gm (n) = (Q m+1 )−1 gm+1 (n) − Pm+1 gb (n).69): gb +1 (n + 1) = Q m+1 Pm+1 (Q m+1 )−1 gm+1 (n) + (Q m+1 )−† gb (n).68) can then be rewritten by substituting from Eq.67) and f gb +1 (n + 1) = Q b +1 P† +1 gm (n) + gb (n) . we can derive an IIR LPC lattice with slightly different structural arrangement. m m b b f Q m +1 † f f (8. consider again the MIMO FIR lattice section shown separately in Fig.7(a).7: (a) The normalized MIMO FIR lattice section and (b) the corresponding IIR normalized lattice section.10 THE NORMALIZED IIR LATTICE Starting from the symmetrical MIMO FIR LPC lattice in Fig.3(b). 8.69) The second equation (8. . 8. m f f (8.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 137 8.

8(b). f f f f 8.11 THE PARAUNITARY OR MIMO ALL-PASS PROPERTY Figure 8.138 THE THEORY OF LINEAR PREDICTION f where we have used the definitions (8. Q m = IL − Pm Pm b † − 1 /2 m ≥ 1. the error signal g0 (n) becomes e0 (n). 8. the signal entering at the left is the forward error eN (n). m m m f f −1/2 and Q 0 = (E0 ) . Gm (z) is the transfer function from the pervious section. Q b = (IL − P† Pm )−1/2 .66) for Q m and Q b . The normalized MIMO IIR LPC lattice therefore takes the form shown in Fig. Upon scaling with the multiplier (Q 0 )−1 = (E0 )1/2 . The preceding two equations can be represented using the normalized IIR lattice section shown in Fig. Q m = (IL − Pm P† )−1/2 . m ≥ 1. we see that the transfer matrices Gm (z) convert the forward error signals to corresponding f f f f f ( Q 0 )−1 g 3 (n) g2 (n) g 1 (n) g 0(n) x(n) b g3 (n) P3 z −1 I G3 (z) b g2 (n) P2 z −1 I G2 (z) b g 1 (n) P1 z −1 I G1 (z) b g 0(n) z −1 I G0 (a) f (Q m )−1 (b) Pm Q b Pm (Q fm ) m −1 (Q b ) m − −P m FIGURE 8. and f the signal at the right end is g0 (n).7(b). that is. the IIR lattice reproduces x(n) back from the error signals. m Q m = I L − Pm Pm f † −1/2 . 8.8: (a) The normalized MIMO IIR LPC lattice shown for three stages and (b) details of f the normalized IIR lattice sections.8 for three f sections. which is nothing but the original signal x(n). Here.8(a). 8. and Pm+1 is the constant matrix shown in Fig. For an N-stage IIR lattice. Comparing with Fig. Here. .9 shows one section of the IIR lattice structure. while the FIR lattice produced the (normalized) error signals from the original signal x(n). Thus. 8.

the transfer function Gm (z) was all-pass. This is somewhat more complicated than in the scalar case because the transfer functions are MIMO..9: The normalized MIMO IIR lattice section. then Gm+1 (z) is paraunitary as well. 1993). f backward error signals. and this was used to show that Gm+1 (z) was all-pass.10). then the output is gb ( n + 1 ) . We then show that in an interconnection such as this. Because G0 (z) = I is paraunitary. Definition 8.1. We will first show that the building block indicated as Pm+1 has a unitary transfer matrix Tm+1 . If G(z) is rational (i. if gm (n) is input to the system Gm (z). Furthermore. we ask the reader to review the tilde notation G(z) described in Section 1. m In the scalar case. their Fourier transforms are related as Y(e jω ) = G(e jω )X(e jω ) . and the all-pass property generalizes to the so-called paraunitary property to be defined below (Vaidyanathan.71) for all ω .2. it will therefore follow that all the transfer functions Gm (z) in the normalized IIR lattice are paraunitary. (8.e. whenever Gm (z) is paraunitary and Pm+1 unitary.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 139 x 1 (n) y 2 (n) y1 (n) Pm+1 x (n) 2 z−1 Gm(z) Gm+1 (z) FIGURE 8. 8.1. these all-pass filters had all their poles inside the unit circle (Section 4. the transfer matrix is unitary for all frequencies.2). That is. More precisely. At this point.72) ♦ With x(n) and y(n) denoting the input and output of an LTI system (Fig. An M × M transfer matrix G(z) is said to be paraunitary if G† (e jω )G(e jω ) = I (8. the entries Gkm (z) are ratios of polynomials in z−1 ) then the paraunitary property is equivalent to G(z)G(z) = I for all z.3. Paraunitary Systems. We now prove an analogous property for the vector case.

x† (n)x(n) = n n y† (n)y(n) (8.73) and using Parseval’s relation. that is.11.140 THE THEORY OF LINEAR PREDICTION x(n) G(z) y(n) FIGURE 8. Recall here that the matrices Q and Q are related to P as in Eq. Q f b f = IL − PP† −1/2 .74) for all finite energy inputs. 8. for a paraunitary system.73) holds for all Fourier-transformable inputs x(n).1 Unitarity of Building Blocks The building block Pm+1 in Fig. 3. be shown that a rational LTI system G(z) is paraunitary if and only if (8.66).76) We will now show that T is unitary. the input energy is equal to the output energy.10: An LTI system with input x(n) and output y(n).75) ⎦. (8. (8. † .9 has the form shown in Fig. it follows that Y† (e jω )Y(e jω ) = X† (e jω )X(e jω ). it also follows that. It can. Its transfer matrix has the form ⎡ ⎤ Q b P† (Q f )−1 (Q b )−† ⎢ ⎥ T=⎣ (8.8(b). 2.73) for any Fourier transformable input x(n). If G(e jω ) is unitary. 8. that is. f −1 (Q ) −P where all subscripts have been omitted for simplicity. in fact. that is. A number of important properties of paraunitary matrices are derived in Vaidyanathan (1993). By integrating both sides of (8. for all ω (8. Q = IL − P P b † −1/2 . 8. T T = I. The following properties are especially relevant for our discussions here: 1.

the first term of A becomes IL − PP† †/2 †/2 P IL − P† P P IL − P P † −†/2 −1 IL − P† P † −1/2 P† IL − PP† 1/2 = IL − PP† so that P IL − PP † 1/2 . To simplify D. (8.76) to obtain D = IL − P† P 1/2 IL − P† P †/2 + P† P = IL − P† P + P† P = IL Next.11. We will show that if the rational transfer matrix Gm (z) is paraunitary and the box labeled Pm+1 has a unitary transfer matrix Tm+1 (z) then Gm+1 (z) is also paraunitary. Thus. where we have used (IL − PP ) = (IL − PP† )1/2 (IL − PP† )†/2 .LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 141 Proof of Unitarity of T. 8. † † † † 8. D = (Qb )−1 (Qb )−† + P† P. call this A1 We now claim that A1 = (IL − PP† )−1 : A1 (IL − PP† ) = P IL − P† P −1 −1 P† − P† PP† + IL − PP† IL − P† P P† + IL − PP† = P I L − P† P = PP† + IL − PP† = IL . We have T T= where the symbols above are defined as A = (Q )−† P(Q )† Q P (Q )−1 + (Q )−† (Q )−1 . B is trivially zero. (8. . B = (Q f )−† P(Qb )† (Qb )−† − (Q f )−† P = 0.9. A = (IL − PP )†/2 (IL − PP )−1 (IL − PP )1/2 = I.2 Propagation of Paraunitary Property Return now to Fig. This completes the proof that T is unitary. substituting from Eq.76). we substitute from Eq. 1/2 A = IL − PP† †/2 P IL − P† P −1 P† + I IL − PP† . f b f f f b † † A B . B† D So.

For this section. we also have Y† (e jω )Y2 (e jω ) = X† (e jω )X2 (e jω ) for all ω . This shows that the rational function Gm+1 (z) is paraunitary.58)). Lemma 8. Here. Because Gm (z) is paraunitary. 1980. From Fig. Chapter 13).11 is a reproduction of Fig. Proof.77) (see Eq. that is. 8.77). † † for all ω.9 with details of the building block Pm+1 shown explicitly (with subscripts deleted for simplicity). we have Y1 (e jω ) Y2 (e jω ) The unitary property of Tm+1 implies that Y† (e jω )Y1 (e jω ) + Y† (e jω )Y2 (e jω ) 1 2 = Tm+1 X1 (e jω ) X2 (e jω ) = X† (e jω )X1 (e jω ) + X† (e jω )X2 (e jω ).9. Q f = IL − PP† −1/2 and Qb = IL − P† P −1/2 . 8. we will prove an important property about the poles3 of the transfer matrix Gm+1 (z). we showed that the LPC lattice satisfies the property P† P < I (8.11. It is sufficient to show that the transfer function F(z) indicated in Fig.66). then the poles of Gm+1 (z) are ♦ also confined to the region |z| < 1. the matrices Q f and Qb are as in Eq. .1.142 THE THEORY OF LINEAR PREDICTION Proof of Paraunitarity of Gm+1 (z). Vaidyanathan.3 Poles of the MIMO IIR Lattice Figure 8. 8. (8. note that Y(z) = z−1 Gm (z)W(z) and 3 W(z) = −z−1 PGm (z)W(z) + X(z).1. If the rational paraunitary matrix Gm (z) has all poles in |z| < 1 and if the matrix multiplier P is bounded as in Eq. Under this assumption. (8. Poles of the IIR Lattice. The poles of a MIMO transfer function F(z) are nothing but the poles of the individual elements Fkm (z). To find an expression for F(z). In Section 8. the reader may want to review the theory on poles of MIMO systems (Kailath. So it 2 2 follows that Y1 (e jω )Y1 (e jω ) = X1 (e jω )X1 (e jω ).11 has all poles in |z| < 1. 1993. 8. (8. 1 2 for all ω.8.

† † (8. The latter implies Gm (z)Gm (z) ≤ I † (8. if z0 is a zero of the determinant − of (I + z−1 PGm (z)).78) Because the poles of Gm (z) are assumed to be in |z| < 1. (8.79) for all z in |z| ≥ 1 (see Section 14. we get Y(z) = z−1 Gm (z)(I + z−1 PGm (z))−1 X(z). 1993). Now. So the left-hand side in Eq.80) is < 1 in view of (8.80) If |z0 | ≥ 1.8 of Vaidyanathan. This contradicts the statement that the right-hand side of Eq. where w(n) is the signal indicated in the figure. (8.77). it suffices to show that all the poles of (I + z−1 PGm (z))−1 are in |z| < 1. (8.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 143 (Q ) Qb P f −1 x(n) w(n) −P ( Q f )−1 ( Q b )− y(n) z−1 Gm(z) Gm+1 (z) F(z) FIGURE 8. This proves that all zeros of the said determinant are in |z| < 1. By induction. We will use Eq.80) satisfies |z0 | ≥ 1. it therefore follows that the MIMO IIR LPC lattice is stable. (8. det (I + z−1 PGm (z)) where Adj is the adjugate matrix.79)). So there exists a unit-norm vector v such − that z0 1 PGm (z0 )v = −v. then I + z0 1 PGm (z0 ) is singular. we have proved the following: . then Gm (z0 )v has norm ≤ 1 (in view of Eq. This implies v† Gm (z0 )P PGm (z0 )v = |z0 |2 . Subscripts on P and Q are omitted for simplicity. Summarizing.11: The normalized MIMO IIR lattice section with some of the details indicated. (8. Because I + z−1 PGm (z) −1 = Adj I + z−1 PGm (z) .77) and the assumption that Gm (z) is paraunitary with all poles in |z| < 1. This shows that F(z) = z−1 Gm (z)(I + z−1 PGm (z))−1 . Eliminating W(z). it is sufficient to show that all zeros of the denominator above are in |z| < 1.

37(d)) and (8.82) We say that there is no progress as we move from order N to order N + 1. . f f f (8.81) (8. Because N f f the matrix (FN )† (Eb )−1 FN is Hermitian and positive semidefinite. N N f f f f f f f f f (8. 8. all poles of the MIMO IIR LPC lattice are in |z| < 1 (i.84) that is. This implies that the matrix itself is zero (because any positive semidefinite matrix can be written as A = UΛU† where Λ is the diagonal matrix of eigenvalues and U unitary). No-Progress Situation. Tr EN+1 = Tr EN . x(n − N).83) We know that the optimal prediction eN(n) at time n is orthogonal to the samples x(n − k). if EN+1 = EN .37(h)) and (8. Stability of the IIR Lattice. (8. the IIR lattice is a causal stable system).e. the zero-trace condition is N equivalent to the statement4 f f (FN)† (Eb )−1 (FN ) = 0. zero-trace implies all eigenvalues are zero.8 for three stages) satisfy P† Pm < I and define m f b Q m and Q m as in (8. 8. as seen from Eq. 4 f f . When A is positive semidefinite. Proof.3. So.2 for the scalar LPC lattice. for 1 ≤ k ≤ N.5. which can be rewritten using (8. The condition (8. Equivalently. . Assume all the matrix multipliers indicated as Pm in the MIMO IIR lattice (demonstrated in Fig. that is. the error eN (n) is orthogonal to x(n − N − 1) in addition to being orthogonal to ♦ x(n − 1).2.83) is equivalent to Tr ((FN )† (Eb )−1FN ) = 0. Recall that the error covariances are updated according to Eqs.144 THE THEORY OF LINEAR PREDICTION Theorem 8. N The trace of a matrix A is the sum of its eigenvalues.12 WHITENING EFFECT AND STALLING Consider again Levinson’s recursion for the MIMO linear predictor described in Section 8.37(e)) as follows: EN+1 = EN − (FN )† (Eb )−1 FN N Eb +1 = Eb − FN (EN )−1 (FN )† . all eigenvalues are nonnegative.. (8. This is the MIMO version of the stability property we saw in Section 4.66). . the poles of all the paraunitary matrices Gm (z) in the LPC lattice ♦ are confined to be in |z| < 1.83) arises if and only if (FN )† =E[eN (n)x† (n − N − 1)] = 0 f f Δ f (8. (8. Eq. Then.45). We now claim the following: Lemma 8.37(i)).2. .

that is. m ≥ N. Because this equality holds for 1 ≤ m ≤ N directly because of orthogonality principle. From Eqs. f f f f f f f (8. we see that Eq. so the preceding equation implies E[eN (n)(eN )† (n − k)] = 0. f f f k ≥ 1. When Eq.88) Conversely. if and only if eN (n) is white. that is.86). E[eN (n)x† (n − m)] = 0. f 2. (8. k ≥ 0. because x(n) is a linear combination of eN(n − k). we see that when there is no progress in forward prediction. m ≥ 0. We now prove the following: Lemma 8. (8. we have KN+1 = 0 (from (8. The stalling situation (8. From Lemma 8.87) But eN (n) is a linear combination of x(n − m). f m ≥ N + 1. Stalling and Whitening. it is clear that stalling arises if and only if f Fm = 0.85) (8.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 145 This is equivalent to proof.2. (8. . f FN = 0 (because (Eb )−1 N is Hermitian and nonsingular). (8.86) ♦ or equivalently.84) holds. . This completes the A number of points should be noted. .26)) so that AN+1 (z) = AN (z).87) as well. We say that linear prediction ‘‘stalls’’ at order N if there is no further progress after stage N. (8. .86). f f m ≥ 1. then there is no progress in backward prediction either (and vice versa). 1. (8.84).88) implies Eq. (8. we can say that stalling happens if and only if E[eN (n)x† (n − m)] = 0. (8.85) occurs if and only if E[eN (n)(eN )† (n − m)] = EN δ(m). if EN = EN+1 = EN+2 = .88) is equivalent to Eq. Proof.3. we have shown that the stalling condition is equivalent to Eq. But because (8. b Similarly KN+1 = 0 and BN+1 (z) = z−1 BN(z).82) and (8.

we showed that BN (z) = z−(N+1) AN (z). (8.89) and (8. . This implies that BN(z) has all its zeros outside the unit circle and the filter BN(z)/AN (z) is stable and all-pass. all polynomials are polynomials in z−1 . We will start from these results and establish some more. f b so that EN = EN . Let us first recall what we have already done.. (8. the forward and backward error sequences are related by an all-pass filter.11.48) and is reproduced below: z−1 HN(z)=BN (z)AN (z). it is trickier to prove similar properties because the transfer functions are matrices.8(a) (indicated also in Fig.5 The expressions (8.11.146 THE THEORY OF LINEAR PREDICTION 8.2 that GN (z) is paraunitary.11). the properties of the transfer functions AN (z) and BN(z) are well understood. We now discuss this issue more systematically. N (8.50) and reproduced below: N − z−1 GN (z)=[Eb ]−1/2 BN (z)AN 1 (z)[EN ]1/2 . We will also establish a relation between the forward and backward matrix polynomials AN (z) and BN (z). We also showed (Section 8. which is an extension of the all-pass property to the MIMO case. Furthermore. The fact that the poles of GN (z) are in |z| < 1 does not automatically imply that the zeros of the determinant of AN (z) are in |z| < 1 unless we establish that the rational form in Eq.13. We know that the 5 In this section.90) is ‘‘irreducible’’ or ‘‘minimal’’ in some sense.e. 8.91) where A(z) and B(z) are polynomial matrices is called a matrix fraction description (MFD). For example. AN(z) has all zeros strictly inside the unit circle (i. it is a minimum-phase polynomial).90) for z−1 HN(z) and z−1 GN (z) are therefore MFDs. That is. (8. N Δ f (8. f Δ −1 (8. 8.90) This also appears in the MIMO IIR lattice of Fig.3) that all poles of GN (z) are in |z| < 1.13 PROPERTIES OF TRANSFER MATRICES IN LPC THEORY In the scalar case. For the case of vector linear prediction. 8.1 Review of Matrix Fraction Descripions An expression of the form H(z) = B(z)A−1 (z).89) The transfer matrix z−1 GN (z) from the normalized prediction error gN (n) to the normalized prediction error gb (n) is given in Eq. We showed in Section 8. The f transfer matrix HN (z) from the prediction error eN (n) to the prediction error eb (n) is given in Eq.

To be more quantitative. Similarly. Thus. The following results are well-known in linear system theory (e. In this case. the set of poles of H(z) is precisely the set of zeros of [det A(z)].5. The inverse U−1 (z) remains a polynomial because det U(z) = 1. we first need to define ‘‘common factors’’ for matrix polynomials.2 and 13.93) or more generally. det R(z) = 1.g. we can always write A(z) = A(z)U−1 (z) ×U(z).92) Because det A(z) = det A1 (z)det R(z). If there are ‘‘common factors’’ between A(z) and B(z). the MFD expression for a rational transfer matrix is not unique.1 in Vaidyanathan. (8. then we can cancel these to obtain a reduced MFD description for the same H(z). 1993). Lemma 8. Let H(z) = B(z)A−1 (z) be an irreducible MFD. Given A(z) and B(z).91) is irreducible if the matrices A(z) and B(z) can have only unimodular rcds. A matrix polynomial R(z) satisfying Eq. In this case. Any unimodular matrix is an rcd. (8..91) is irreducible if every rcd R(z) has unit determinant. polynomial A1 (z) B(z) = B(z)U−1 (z) ×U(z) polynomial B1 (z) for any unimodular polynomial U(z). abbreviated as rcd. cancellation of all these common factors results in a reduced MFD (i.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 147 rational expression H(z) = B(z)/A(z) for a transfer function is not unique because there can be cancellations between the polynomials A(z) and B(z). then we say that R(z) is a right common divisor or factor between A(z) and B(z).6. Thus.91) as − H(z) = B1 (z)A1 1 (z). Thus. and R(z) are polynomial matrices. we can rewrite Eq. This shows that any unimodular matrix can be regarded as an rcd of any pair of matrix polynomials. A1 (z) and B1 (z) are still polynomials. one with smaller degree for det A(z)). . We say that the MFD (8. (8. ♦ Then. we also say that A(z) and B(z) are right coprime.4.e.93) is said to be unimodular. an MFD (8. see Lemmas 13. (8. it follows that det A1 (z) has smaller degree than det A(z) (unless det R(z) is constant). Irreducible MFDs and Poles. If the two polynomials A(z) and B(z) can be written as B(z) = B1 (z)R(z) and A(z) = A1 (z)R(z) where B1 (z). A1 (z).. a nonzero constant.

Minimum-Phase Property. N f From Theorem 8. or equivalently.95) that it is also an rcd of Am (z) and Bm (z).3. Proof. that is. The optimal predictor polynomial matrix AN (z) is ♦ a minimum-phase polynomial. (8.94) z Bm+1 (z) − (Kb +1 )† Am+1 (z) . . we have − BN (z)AN 1 (z) = z−1 (Eb )1/2 GN (z)(EN )−1/2 . Lemma 8. (8.2 Relation Between Predictor Polynomials We now prove a crucial property of the optimal MIMO predictor polynomials. AN (z) and BN (z) are right coprime. m (8. Because BN (z)AN 1 (z) is an irreducible MFD −1 (Lemma 8. so these have no rcd other than unimodular matrices.13. (8. The inverses indicated in Eq. so that the MFD given by BN (z)AN 1 (z) is irreducible. ♦ Proof.5. we see that any rcd R(z) of AN (z) and BN (z) for N > 0 must be an rcd of A0 (z) and B0 (z). First note that we can solve for Am (z) and Bm (z) in terms of Am+1 (z) and Bm+1 (z) and obtain Am (z) = I − Km+1 (Kb +1 )† m and Bm (z) = I − (Km+1 )† Km+1 b f −1 f −1 Am+1 (z) − z Km+1 Bm+1 (z) f (8.6). From Eq. Repeating this argument. we know that GN (z) is a stable transfer matrix (all poles restricted to |z| < 1).94) and (8. (8. We are now ready to prove the following result.90).2. N Theorem 8.4). R(z) has to be unimodular. all the zeros of [det AN (z)] are poles of BN(z)AN (z) (Lemma 8. Irreducible and Reducible MFDs.35) and (8.94) and (8.37(g)). ♦ 8.95) This is called the downward recursion because we go from m + 1 to m using these equations. Coprimality of Predictor Polynomials. Thus. If R(z) is an rcd of Am+1 (z) and Bm+1 (z).148 THE THEORY OF LINEAR PREDICTION Lemma 8.95) exist by virtue of the assumption that the error covariances at stage m and m + 1 are nonsingular and related as in Eqs. Let H(z) = B(z)A−1 (z) be an MFD and −1 H(z) = Bred (z)Ared (z) an irreducible MFD for the same system H(z). The only assumption is that the error covariances f EN and Eb are nonsingular (as assumed throughout the chapter). The optimal predictor polynomials AN (z) − and BN (z) are right coprime for any N.6. then we see from Eqs.36). all the zeros of [det AN (z)] are in |z| < 1. So all zeros of [det AN (z)] are in |z| < 1. The property depends on the vector version of Levinson’s upward recursion Eqs. (8.37(f ))--(8. Then. B(z) = Bred (z)R(z) and A(z) = Ared (z)R(z) for some polynomial matrix R(z). But A0 (z) = I and B0 (z) = z−1 I. − − This shows that BN (z)AN 1 (z) is stable as well.

1973.96) call this B(z) call this A−1 (z) −1 Because GN (z) is paraunitary.97) (8.99) is paraunitary. 2.LINEAR PREDICTION THEORY FOR VECTOR PROCESSES 149 Refer again to Eq. which is reproduced below: − z−1 GN (z)= [Eb ]−1/2 BN(z) AN 1 (z)[EN ]1/2 = B(z)A−1 (z) N Δ f (8. or equivalently. 1986). that is. The scalar all-pass lattice structure introduced in Section 4. is equivalent to the statement that BN (z)/AN(z) is all-pass. 1975. the main properties of the optimal predictor polynomials for vector LPC are the following: − 1. BN (z)[EN ]−1 BN (z) = AN (z)[EN ]−1 AN (z). BN (z)[EN ]−1 BN (z) = AN (z)[EN ]−1 AN (z). − z−1 GN(z)=[Eb ]−1/2 BN (z)AN 1 (z)[EN ]1/2 N Δ f (8. Summarizing. (8. For the f b scalar case. it follows that B(z)A (z) is paraunitary as well. which. Vaidyanathan et al. note that we had EN = EN . BN (z)AN 1 (z) is irreducible (Lemma 8. The MIMO lattice is also studied by Vaidyanathan and Mitra (1985). (8. 1980. 8. • • • • .90). Thus.. 1985. f b 3.14 CONCLUDING REMARKS Although the vector LPC theory proceeds along the same lines as does the scalar LPC theory. of course. so that this reduces to BN (z)BN (z) = AN (z)AN (z). the forward and backward predictor polynomials are related by the above equation. A (z)B(z)B(z)A (z) = I from which it follows that B(z)B(z) = A(z)A(z) That is.98) So. AN (z) and BN (z) are right coprime. 1984.3).3 has applications in the design of robust digital filter structures (Gray and Markel. there are a number of important differences as shown in this chapter. the paraunitary lattice structure for MIMO linear prediction can be used for the design of low sensitivity digital filter structures.6). AN (z) has minimum-phase property (Theorem 8. Vaidyanathan and Mitra. as shown in the pioneering work of Rao and Kailath (1984). b f −1 −1 for all z. Similarly.

.

1) results in minimum mean square error among all linear estimates.2) (A. using a linear combination of the form x = −a1 x1 − a2 x2 . Let x be another random variable. x2 . (A. .151 APPENDIX A Linear Estimation of Random Variables Let x1 .3) A classic problem in estimation theory is the identification of the constants ai such that the mean square error E = E[ |e|2 ]. We say that x is a linear estimate of x in terms of the ‘‘observed’’ variables xi . The estimate x in Eq. that is. i . * * * (A. * * * (A. possibly complex. + aN xN .1.1 THE ORTHOGONALITY PRINCIPLE The minimization of mean square error is achieved with the help of the following fundamental result: Theorem A. (A. xN be a set of N random variables. . but will become notationally convenient. E[ex* ] = 0 for 1 ≤ i ≤ N. We wish to obtain an estimate of x from the observed values of xi . possibly correlated to xi in some way.4) is minimized. .1) The use of −ai* rather than just ai might appear to be an unnecessary complication. if and only if the estimation error e is orthogonal to the N ♦ random variables xi . . Orthogonality Principle. so that e = x + a1 x1 + a2 x2 + . A. . . . The estimation error is defined to be e = x − x. The estimate x is then said to be the minimum mean square error (MMSE) linear estimate. − aN xN. .

(A.5) (A.8) We therefore have N E[e⊥ (x⊥ − x *)] = E[e⊥ * i=1 N ci*xi* ] (from Eq. Δ (A.6) Let x be some other linear estimate. if and only if x⊥ = x. A. with error e. we obtain the N equations (A. The second term on the right-hand side above is nonnegative. (A. (A. the error e⊥ =x − x⊥ satisfies E[e⊥ xi*] = 0.6)).8)) = =0 so that Eq. (A.9) .152 THE THEORY OF LINEAR PREDICTION Proof. 1 ≤ i ≤ N i (orthogonality).3).1). e = x − x = e⊥ + x⊥ − x This has the mean square value * * E[ |e|2 ] = E[ |e⊥ |2] + E[ |x⊥ − x|2 ] + E[(x⊥ − x)e⊥ ] + E[e⊥ (x⊥ − x *)]. and we conclude that E[ |e|2 ] ≥ E[|e⊥ |2 ]. (A. (A.2 CLOSED-FORM SOLUTION E[ex∗ ] = 0.5)). (A. If we impose the orthogonality condition on the error (A. Then. Let x⊥ denote the estimate obtained when the orthogonality condition is satisfied. Equality holds in the above. 1 ≤ i ≤ N. Thus. E[ |e|2 ] = E[ |e⊥ |2 ] + E[ |x⊥ − x|2 ].7) becomes i=1 ci* E[e⊥ xi* ] (from Eq. we can write N (from Eq.7) x⊥ − x = i=1 c i xi . Because x⊥ and x are both linear combinations as in Eq.

we can solve for a and obtain a = −R−1 r (optimal linear estimator).13) −r The vector r is the cross-correlation vector. . ⎦⎣ . . More explicitly. . . If the correlation matrix R is nonsingular.12) (A. (A. xN ]T . . . . . . a = [ a1 a2 .14) . we can write the error e as e = x + a† x .2 ⎥ . ⎦ ⎣ . This matrix is Hermitian. we arrive at E[x x† ] + a† E[xx† ] = 0. . R = R. E[x1 xN ] * * a1 E[x1 x*] ⎥⎢ ⎥ ⎥ ⎢ ⎢ * * ⎢ E[x x*] ⎥ ⎢ E[x2 x1 ] E[ |x2 |2 ] . . Its ith element represents the correlation between the random variable x and the observations xi . . . . which result in the optimal estimate. E[ |xN | ] R a † Δ (A. this can be written as ⎤⎡ ⎤ ⎤ ⎡ ⎡ E[ |x1 |2 ] E[x1 x2 ] . (A. aN ]T . (The subscript ⊥ which was used in the above proof will be discontinued for simplicity. ⎥ = − ⎢ . the matrix R=E[xx† ] is the correlation matrix for the random vector x.10). . . ⎥ ⎢ . in terms of the matrix elements. In the above equation. and the orthogonality condition Eq.10) (A. ⎦ ⎣ 2 aN E[xN x*] * * E[xN x1 ] E[xN x2 ] . . .) Defining the column vectors x = [ x1 x2 . that is. .11) (A. ⎢ . ⎥⎢ . (A.9) as E[e x† ] = 0.. E[x2 xN ] ⎥ ⎢ a2 ⎥ ⎥ ⎢ .12) simplifies to Ra = −r. ⎥ ⎢ . Substituting from Eq. Eq. Defining the vector r = E[x x*].LINEAR ESTIMATION OF RANDOM VARIABLES 153 We use these to find the coefficients ai. (A.

the lengths of the vector x. (A. The random variable x and its estimate x are related as x = x + e.16) by exploiting Eq. the optimal linear estimate x is itself orthogonal to the error e. Under this condition. The above development does not assume that x or xi have zero mean. Assuming that the coefficients ai have been found. (A. a = 0. Geometric Interpretation. If in addition R is nonsingular. The case of singular R will be discussed in Section A. x2 ) (Fig.16). Expression for the Minimized Error.3 CONSEQUENCES OF ORTHOGONALITY Because the estimate x is a linear combination of the random variables xi . (A. which is similar to Eq. then r = 0. From this.1). . the error e. which should be approximated in the two-dimensional plane (x1 . A geometric interpretation of orthogonality principle can be given by visualizing the random variables as vectors. the orthogonality property (A. Imagine we have a three-dimensional Euclidean vector x.11)).154 THE THEORY OF LINEAR PREDICTION The autocorrelation matrix R is positive semidefinite because xx† is positive semidefinite.9) implies E[e x *] = 0. we can compute the minimized mean square error as follows: E = E[|e|2 ] = E[ee*] = E[e(x* + x† a)] = E[ex*] (by orthogonality (A. x2 plane. A. Notice that if the correlation between x and the observation xi is zero for each i. we obtain E[ |x|2 ] = E[ |x|2 ] + E[ |e|2 ] E (A. Under this condition. This means that the best estimate is x = 0 and the error is e = x itself! A. The above relation is also reminiscent of the Pythagorean theorem. We see that the approximation error is given by the vector e and that the length of this vector is minimized if and only if it is perpendicular to the x1 .4. and the estimate x are indeed related as (length of x)2 = (length of x )2 + (length of e)2 .15). then it is positive definite.15) That is.

Let x and x1 be real random variables. The normal equations are obtained by setting N = 1 in Eq.13). a1 is proportional to the cross-correlation E[x1 x]. (A. and the mean square error is E[e 2 ] = E[x 2 ] = mean square value of x itself! .17) Example A. Thus. E[x2 ] 1 The mean square value of the estimation error is E[e 2 ] = E[x 2 ] − (E[x1 x])2 2 E[x1 ] [from Eq. Using Eq. If E[x1 x] = 0. Thus. that is. this can be rewritten as E[ |e|2 ] = E[ |x|2 ] + a† r E (A.1: Estimating a Random Variable. 2 E[x1 ] × a1 = −E[x1 x]. We wish to estimate x from the observed values of x1 using the linear estimate x = −a1 x1 . the best estimate is x = 0. (A. (A.LINEAR ESTIMATION OF RANDOM VARIABLES 155 x e x1 x x2 FIGURE A.1: Pertains to the discussion of orthogonality principle. a1 = −E[x1 x] .17)].10) and the definition of r. then a1 = 0. that is.

That is. say. 1985) that R is singular if and only if Rv = 0 for some vector v = 0. singularity of R implies v† Rv = 0. for example. . v† E[xx† ]v = 0. E[ |v† x|2 ] = 0. without * changing the value of the optimal estimate. xi . x2 is completely determined by x1 . . Example A. . (A. [det R] = 0). In other words. recall (Horn and Johnson. (A. Let the 2 × 2 autocorrelation matrix be R= 1 a . and we can solve a set of L equations (normal equations) similar to Eq. + vN xN = 0. .14). Thus.2: Singularity of Autocorrelation.. . the N random variables xi are linearly dependent.13) is singular. that is.156 THE THEORY OF LINEAR PREDICTION A. we can drop the term aN xN from Eq. . . We can continue to eliminate the linear dependence among random variables in this manner until we finally arrive at a smaller set of random variables. In other words. we can rewrite this as E[v† xx† v] = 0.3) and solve a smaller estimation problem. a a2 T T a has the −1 property that Rv = 0. we cannot invert it to obtain a unique solution a as in Eq. (i. This implies that the scalar random variable v† x is zero. . To analyze the meaning of singularity. at least one of components vi is nonzero. that vN is nonzero. Assuming. From the above theory. v = 0. we conclude that x1 and x2 are linearly dependent and ax1 − x2 = 0.13). (A. .e. with x = x1 x2 . The L × L correlation matrix of these random variables is then nonsingular. 1 ≤ i ≤ L. which do not have linear dependence. We see that the vector v = x2 = ax1 . We have [det R] = a 2 − a 2 = 0 so that R is singular. Because v is a constant with respect to the expectation operation. to find the unique set of L coefficients ai . xN and v = v1 v2 . Because v = 0. Consider the estimation of x from two real random variables x1 and x2 . (A.4 SINGULARITY OF THE AUTOCORRELATION MATRIX If the autocorrelation matrix R defined in Eq. vN . *x *x * v† x = v1 1 + v2 2 + . we can write xN = −[v1 1 + . + vN−1 xN−1 ]/vN . that is. that is. *x * * That is.

2) (B. let us choose φ so that R∗ (k)e jφ is real and nonnegative.1) (B. . (B. The left-hand side can be rewritten as E |x(0)|2 + |x(−k)|2 − e jφ x*(0)x(−k) − e−jφ x(0)x*(−k) = 2R(0) − e jφ R*(k) − e−jφ R(k). Equality in (B. This proves Eq.1) is possible if and only if E|x(0) − e jφ x(−k)|2 = 0. we have R(0) > |R(k)| for k = 0. Thus. . E|x(0) − e jφ x(−k)|2 = 2R(0) − 2|R(k)| ≥ 0. that is. (B. Thus.2). as long as the process is not perfectly predictable. that is. from (B.1). R∗ (k)e jφ = |R(k)|.157 APPENDIX B Proof of a Property of Autocorrelations The autocorrelation R(k) of a WSS process satisfies the inequality R(0) ≥ |R(k)|. Because this holds for any real φ. E|x(n) − e jφ x(n − k)|2 = 0 for all n (by WSS property). which means that all samples of x(n) can be predicted perfectly if we know the first k samples x(0). x(k − 1).. . first observe that E|x(0) − e jφ x(−k)|2 ≥ 0 for any real constant φ.3) . Summarizing. x(n) = e jφ x(n − k). To prove this.

.

. . in fact. x(n − N ) (Section 2.. we have to work a little harder. x(n − N ). . we know that Ryy (0) ≥ |Ryy (k)| for any k that shows |q| ≤ 1.159 APPENDIX C Stability of the Inverse Filter We now give a direct proof that the optimum prediction polynomial AN (z) has all zeros pk inside the unit circle. q is the optimal coefficient Ryy (1) . This proof appeared in Vaidyanathan et al. | pk | < 1. for otherwise. . .1(b). . Also see Lang and McClellan (1979). Recall that the optimal eN (n) is orthogonal to x(n − 1).. C. |q| < 1 when x(n) f is not a line spectral process.1) where Ryy (k) is the autocorrelation of the WSS process y(n). From Appendix B.1(a) can be redrawn as in Fig. (1997). as long as x(n) is not a line spectral process (i.2) . C.3). To show that. where q is some zero of AN (z) and CN−1 (z) is causal FIR with order N − 1. f (C. the mean square value of its output can be made smaller by using a different q. This proves that all zeros of AN (z) satisy |zk | ≤ 1. not fully predictable). The prediction filter reproduced in Fig. that is. This gives a direct proof (without appealing to Levinson’s recursion) that the causal IIR filter 1/AN (z) is stable. First observe that 1 − qz−1 is the optimum first-order prediction polynomial for the WSS process y(n). Ryy (0) q= (C. Because y(n − 1) is a linear combination of f x(n − 1).e. it is orthogonal to eN (n) as well: E[eN(n)y*(n − 1)] = 0. Thus. Proof. which contradicts the fact that AN (z) is optimal..

2)) = E[(y(n) − qy(n − 1))y*(n)] * = Ryy (0) − qRyy(1) = Ryy (0)(1 − |q|2 ) (from (C.1)). Thus. with a factor (1 − qz−1 ) shown separately.1: (a) The prediction filter and (b) an equivalent drawing.160 THE THEORY OF LINEAR PREDICTION (a) x(n) A N (z) FIR prediction filter eNf (n) x(n) (b) CN − 1 (z) y(n) 1 − qz −1 eNf (n) FIGURE C. (1 − |q|2 ) > 0. using the fact that eN(n) = y(n) − qy(n − 1). we have EN = 0. that is. we can write f EN = E[|eN (n)|2 ] = E[eN (n)(y(n) − qy(n − 1))*] f = E[eN (n)y*(n)] (from (C. EN > 0. which proves |q| < 1 indeed. • • • • f f f f f . Thus. Because x(n) is not a line spectral process.

To see this.2 R(k − 2) . (D.n = aN . written for N + m instead of N yields R(k) = −aN. 1 ≤ k ≤ N. * * * f (D. then the second equation can be made stronger: R(k) = −aN.2) Proof. . . . .20) is R*(k) + aN.2 R(k − 2) .N R(k − N ). (2. • • • • . the kth equation in (2.2) for AR(N ) processes. k = 0 * * * −aN. aN+m. 1 ≤ k ≤ N + m * * * for any m ≥ 0.1 R(k − 1) − aN. Conjugating both sides.NR*(k − N ) = 0 (using R(i) = R*(−i)).1). for all k > 0 * * * (D. For 1 ≤ k ≤ N.1 R*(k − 1) + . note that the 0th row in Eq. + aN. the second equation in Eq. − aN.1) In particular. This proves Eq. that is.n 0 1≤n≤N n > N.N R(k − N ).1 R(k − 1) − aN. . So.2 R(k − 2) .1 R(k − 1) − aN.1 R(k − 1) − aN. The above equations are nothing but the augmented normal equations rewritten. . we know that the (N + m)th-order optimal predictor for any m > 0 is the same as the N th-order optimal predictor. − aN.20) is precisely the k = 0 case in (D.N R(k − N ) + EN . the autocorrelation satisfies the following recursive equations: R(k) = f −aN.161 APPENDIX D Recursion Satisfied by AR Autocorrelations Let ak. . − aN.n denote the optimal N th-order prediction coefficients and EN the corresponding prediction * error for a WSS process x(n) with autocorrelation R(k). (D. (D.1). . we get the second equation in Eq.N R(k − N ). For the case of the AR(N ) process. .1). − aN.2 R(k − 2) . if the process x(n) is AR(N ). Then. .

.

. Suppose x(n) is a zero-mean white WSS random process with variance σ 2 .3 using orthogonality principle).N f T f . Someone performs optimal linear prediction on a WSS process x(n) and finds that the first two lattice coefficients are k1 = 0.5 and k2 = 0. b) Find the predictor polynomials A1 (z). f c) Show that e2 (n) is white. (2. a) The predictor polynomials A1 (z) and A2 (z). compute the following. By differentiating EN with respect to each aN.3) where 0 < b < 1. . Consider a WSS process x(n) with autocorrelation R(k) = b0. the lattice coefficients k1 . for m = 0. where a = aN. E2 using Levinson’s recursion. A2 (z). The prediction coefficients aN. 3.8). and the error f f variances E1 .5|k| 0 for k even for k odd (P. . show that we again obtain the normal equations (derived in Section 2.i and setting it to zero.25 and that the second-order mean square f prediction error is E2 = 1. 1. 4. f f b) The mean square errors E0 and E1 .1 aN. Show that the mean square prediction f error EN can be written as EN = R(0) + aT RN a + 2aT r.i are now real. Using these values.0.2 .i of the N th-order optimal predictor? What is the prediction error variance? 2. and RN and r are as in Eq. c) The autocorrelation coefficients R(m) of the process x(n). 2. a) Compute the power spectral density Sxx (e jω ).163 Problems 1. aN. What are the coefficients aN. Consider the linear prediction problem for a real WSS process x(n) with autocorrelation R(k). k2 .

7b) where Ruu = E[|u|2] and σ 2 = E[| |2 ] (noise variance).2. this means that we should not trust the measurement and just take the estimate to be zero.5) and that the fourth-order mean square prediction error is E4 = 2. Hence.3. 6. Using these values.5. that is. the mean square estimation error will never exceed the mean square value Ruu of the random variable u to be estimated.4. (P. f α* = E[em (n)x*(n − m − 1)]. Let e = u − u be the estimation error. that is. we would like to find an estimate of u of the form u = av.1.3. c) The autocorrelation coefficients R(m) of the process x(n).7c) Thus.2. k2 = 0. k4 = 0. From this noisy measurement v.25. Ruu + σ 2 (P.7a) where the random variable represents noise . f (P.6) 7. for m = 0. Thus. which minimizes the mean square error E[|e|2 ].4. the best estimate approaches zero. show that the best estimate of u is u= Ruu Ruu + σ 2 v.0. Let v be a noisy measurement of a random variable u. Compute the value of a. Show that the quantity αm arising in Levinson’s recursion can be expressed in terms of a cross-correlation. Someone performs optimal linear prediction on a WSS process x(n) and finds that the first four lattice coefficients are k1 = 0. the error E[|e|2] approaches Ruu. no matter how strong the noise is! . m (P. k3 = 0. when the noise variance σ 2 is very large. Show also that the minimized mean square error is E[|e|2 ] = Ruuσ 2 .3. (P.164 THE THEORY OF LINEAR PREDICTION 5.2. f b) The mean square errors Em for m = 0. with the above estimate. Assume that this noise has zero mean and that it is uncorrelated with u. v =u+ . compute the following: a) The predictor polynomials Am (z) for m = 1. Estimation From Noisy Data.1. In this extreme case.

demonstrated below for N = 3. (2. Condition number and spectral dynamic range. in general. 10. Show that JAJ = A*. then JA represents a matrix whose rows are identical to those of A but renumbered in reverse order. AJ corresponds to column reversal. this does not necessarily always happen: we can create examples where Smax /Smin is arbitrarily large. a) Find the Fourier transform Syy (e jω ) of Ryy (k). Let J denote the N × N antidiagonal matrix. given the autocorrelation R(k) with Fourier transform Sxx (e jω ). then the condition number of RN can. Does it mean that A is Hermitian and Toeplitz? Justify. Eq. be very large.9) If A is any N × N matrix. (P. a) Now assume that A is Hermitian and Toeplitz. That is. For example. 9. b) Argue that Ryy (k) represents a valid autocorrelation for a WSS process y(n). (2. ⎡ ⎤ 001 ⎢ ⎥ J = ⎣0 1 0⎦ 100 (P.e. Does it mean that A is Toeplitz? Justify. define a new sequence as follows: Ryy (k) = R(k/N ) if k is multiple of N 0 otherwise. Consider a WSS process x(n) with power spectrum Sxx (e jω ) = 1 − ρ2 1 + ρ2 − 2ρ cos ω − 1 < ρ < 1.31) shows that if the power spectrum has wide variations (large Smax /Smin ). and yet the condition number of RN is small. as in Eq. suppose JAJ = A*. write down the N × N autocorrelation matrix. in terms of Sxx (e jω ). c) For the process y(n). However.. show that there exists a random vector x such that R = E[xx† ]. c) Suppose JAJ = A* and A is Hermitian. b) Conversely. Compute the eigenvalues of this matrix and verify that they are bounded by the extrema of the power spectrum (i. Show that it is a valid autocorrelation matrix. Let R be a Hermitian positive definite matrix.28)). Similarly. 11.10) Let R2 represent the 2 × 2 autocorrelation matrix of x(n) as usual. What is its condition number? .PROBLEMS 165 8. Valid Autocorrelations.

show that the numbers R(k). Show that the coefficients bN. let γ denote the ratio Smax /Smin in its power spectrum Sxx (e jω ).13) a) Compute the coefficients of the first and the second-order optimal predictor polynomials A1 (z) and A2 (z) by directly solving the normal equations. Give a constructive proof. that is. R(1) = 2. . 13. show how you would generate a WSS process x(n) such that its first N + 1 autocorrelation coefficients would be the elements R(0). show that the maximum eigenvalue of B cannot be smaller than that of A. 1985). For the process y(n). 0 ≤ k ≤ N. Similarly. in addition. ×× (P. . Indicate the values of the lattice coefficients k1 and k2 . (4. (P. .and second-order predictors.20). compute the predictor polynomials and mean square prediction errors for the first. it turns out that it is also positive definite.3). d) Draw the second-order IIR lattice structure representing the optimal predictor and indicate the forward and backward prediction errors at the appropriate nodes. b) By using Levinson’s recursion. Indicate the values of the lattice coefficients k1 and k2 .1) cannot decrease as the matrix size m grows. Valid Toeplitz autocorrelation. . Use Rayleigh’s principle (Horn and Johnson. what is the corresponding ratio? 12. R(1).12) a) Show that the minimum eigenvalue of B cannot be larger than that of A.i of the optimal backward predictor are indeed given by reversing and conjugating the coefficients of the forward predictor. b) Prove that the condition number N of the autocorrelation matrix Rm of a WSS process (Section 2. . R(N ). (2. Consider two Hermitian matrices A and B such that A is the upper left submatrix of B. c) Draw the second-order FIR LPC lattice structure representing the optimal predictor and indicate the forward and backward prediction errors at the appropriate nodes. this matrix is Hermitian and Toeplitz. Consider a WSS process x(n) with the first three autocorrelation coefficients R(0) = 3. Suppose. Then. as shown in Eq. that is B= A× . Consider a set of numbers R(k). By construction. Hint. 14. 0 ≤ k ≤ N and define the matrix RN+1 as in Eq. R(2) = 1. are valid autocorrelation coefficients of some WSS process x(n).166 THE THEORY OF LINEAR PREDICTION d) For the process x(n). 15.4.

P. 20. Consider a real WSS Gaussian process x(n) with power spectrum sketched as shown in Fig. for which the second-order predictor polynomial is A2 (z) = 1 1 − z −1 2 1 1 − z−1 3 (P. Then.PROBLEMS f em (n) 167 16. (P. the autocorrelation R(k) of a WSS process x(n) was supplied in closed f form. 18. Consider a Gaussian AR(2) process with R(0) = 1.1.20) 2 Compute the entropy of the process and the flatness measure γx . the prediction error e2 (n) is white. that is. −1 < ρ < 1. f m > k.4. 19. Make qualitative remarks explaining the nature of the plot. compute the autocorrelation R(m) for m = 3. Let denote the forward prediction error for the optimal mth-order predictor as usual.16) 17.21. 21.21 . Show that the following orthogonality condition is satisfied: f E em (n)[ek (n − 1)]* = 0. with autocorrelation R(k) = ρ|k| . Show that the process x(n) is in fact AR(2). Sxx (e jω 4 ) 2 1 ω 0 π /4 π /2 π FIGURE P. a) Compute the entropy of the process. Compute and sketch the entropy of a Gaussian WSS process. In Problem 4. suppose you are informed that x(n) is actually AR(2). In Example 2.

Let the 3 × 3 autocorrelation matrix have the form f ⎤ 1ρα ⎢ ⎥ R3 = ⎣ ρ 1 ρ ⎦ .4.23) That is. R(1) = ρ. P. let there be no such FIR filter of lower order.168 THE THEORY OF LINEAR PREDICTION b) Compute the limiting value E∞ of the mean square prediction error. and R(2) = α. simpler. R(0) = 1. Show then that all the zeros of A(z) must be on the unit circle. Consider a WSS process x(n) with autocorrelation R(k).) If you are told that x(n) is a line spectral process of degree = 2. Furthermore. do the following: a) Find the line frequencies ω1 and ω2 and the powers at these line frequencies.) b) Now let x(n) be a WSS process satisfying the difference equation L x(n) = − i=1 ai*x(n − i). L (P. (Thus. proof of the result in Theorem 7.23) a) Show that the constant α must have unit magnitude. αρ1 ⎡ (P. Let y(n) be a WSS random process such that if it is input to the FIR filter 1 − αz−1 the output is identically zero (see Fig. y(n) 1 − α z −1 = 0 for all n FIGURE P. We now provide a direct. 23. the output is identically zero.23 . of course. 22.22) where ρ and α are real and −1 < ρ < 1. c) Find a closed form expression for R(k). 2 c) Compute the flatness measure γx . b) Find the recursive difference equation satisfied by R(k). if x(n) is input to the FIR filter A(z) = 1 + i=1 ai*z−i . (Assume. d) Compute R(3) and R(4). what would be the permissible values of α? For each of these permissible values of α. that y(n) itself is not identically zero.

.26b) M−1 αi = α. 25. define a sequence βM = (1/M) i=0 αi and show that it tends to α. .26a) be in the range R. and positive semidefinite for any m. (P. such that |αi − α| < a) Show that 1 lim M→∞ M for i > N. Toeplitz. α2 . we can find a number δ > 0 such that | f (x1 ) − f (x)| < for |x1 − x| < δ. We know that RL has rank L and RL+1 is singular. Thus. Let R1 (k) and R2 (k) represent the autocorrelations of two WSS random processes. i=0 M −1 (P.26c) In other words. (P. Using (some or all of) the facts that Rm is Hermitian. there exits a finite integer N. . Show then that i→∞ lim f (αi ) = f lim αi = f (α). b) Suppose f (x) is a continuous function for all x in a given range R. (P. show the following. This means that. Which of the following sequences represent valid autocorrelations? a) R1 (k) + R2 (k) b) R1 (k) − R2 (k) c) R1 (k)R2 (k) d) n R1 (n)R2 (k − n) e) R1 (2k) .26a) tends to a limit α. b) Rm has rank exactly L for any m ≥ L. if we are given a number > 0. given > 0. α1 . i→∞ (P. for any fixed x in the range.PROBLEMS 169 24.26e) 27. Let Rm be the m × m autocorrelation matrix of a Linespec(L) process x(n). with rank L.26d ) Let all members αi of the sequence (P. Show that the sum of sinusoids considered in Example 7. a) Rm has rank m for all m ≤ L. Suppose a sequence of numbers α0 .3 is indeed WSS under the three conditions stated in the example. 26.

. these spectra do not overlap as demonstrated in the figure. Find an example of the autocorrelation R(k) for a WSS process such that.e. E[x(n)y*(n − k)] does not depend on n). for some m.170 THE THEORY OF LINEAR PREDICTION R1 (k/2) for k even 0 for k odd. Let R(k) be the autocorrelation of a zero mean WSS process and suppose R(k1 ) = R(k2 ) = 0 for some k2 > k1 > 0. imply that E[x k yn ] = E[x k ]E[yn ] (i. then the answer is yes. however. If k1 = 0 and k2 > 0. n. that x k and yn are uncorrelated) for arbitrary positive integers k. in general. 28. 30. Demonstrate this. Peebles. 32. that the above processes are not only WSS but also jointly WSS (i. b) Assume.. 1987). f) R(k) = 29. that is. with power spectra Sxx (e jω ) and Syy (e jω ). Let x(n) and y(n) be zero mean WSS processes. It is well-known that the power spectrum Sxx (e jω ) of a scalar WSS process is nonnegative (Papoulis. show that the power spectral matrix Sxx (e jω ) of a WSS vector process x(n) is positive semidefinite. Sxx (e jω ) Syy (e jω ) ω 0 2π FIGURE P. Suppose Sxx (e jω )Syy (e jω ) = 0 for all ω . but the minimum eigenvalue of Rm+2 is smaller. Using this. 31.2.32 • • • • .e. a) This does not mean that the processes are necessarily uncorrelated. they are uncorrelated. Show then that they are indeed uncorrelated. the autocorrelation matrices Rm and Rm+1 have the same minimum eigenvalue. Let x and y be real random variables such that E[xy] = E[x]E[y]. that is. Does this mean that R(k) is periodic? Note. as asserted by Theorem 7. This does not. 1965. Prove this statement with an example.

16. J. . M. H. K. O. 8. New York. October 1970. and Kashyap. 1984.-S. Churchill. ‘‘The relationship between maximum entropy spectra and maximum likelihood spectra. 49. doi:10.. 27. 106. 1969. vol. 1947. G. New York. John Wiley & Sons. S. ‘‘Predictive coding of speech signals and subjective error criteria. McGraw-Hill. 37. June 1979..1980. 2006. Acoust. R.1170967 5. B. McGrawHill. February 1985. E.’’ Bell. 1960. Electrical network theory. Englewood Cliffs. J. Elements of information theory. Chong. and Moore. Algebraic coding theory. An introduction to optimization. MA.’’ IEEE Trans. and Zak. Acoust. Fast algorithms for digital signal processing.. 10. Addison-Wesley. ‘‘On the radiative equilibrium of a stellar atmosphere.. D. Practical optimization: algorithms and engineering applications. W. H. and Thomas. S. pp. doi:10. Berlekamp. 2. M. Atal. NJ. A.. 15. Chandrasekhar. 152--216. Blahut. McGraw-Hill. New York. pp. J.. R. E. Cover. Springer. 1979. M. 2007. 375--376.. Balabanian. New York. 1969. R. 4. Antoniou. and Brown. 247--254. 194--203. 3. pp. Burg. P. vol. ˙ 13. 2001. and Schroeder. vol.’’ IEEE Trans. R. New York. B. New York. John Wiley & Sons. systems. P. 6. Bellman. Prentice-Hall. McGraw-Hill.. Macmillan. John Wiley & Sons. 1993. Chellappa. and Hansen. Discrete-time processing of speech signals. April 1972. Reading. J. 1985. R. T. and filters.’’ Geophysics. J. New York. and Schroeder. Sys. 1991. L.1109/ICASSP. Deller.1440265 11. vol. R. Atal.1109/MAP. New York. R. B. 9. R.. B. 33.. vol..500234 14. Anderson. L. V. J. A. pp.171 References 1. Introduction to matrix analysis. and Lu. ‘‘Adaptive predictive coding of speech signals. R. 1973--1986. Antoniou. W. J.’’ Astrophys.. pp. A. Optimal filtering. New York. S. N. S. E. and Bickart.. A. Speech Signal Process. doi:10. 7. 12. Digital signal processing: signals. J.. ‘‘Texture synthesis using 2-D noncausal autoregressive models. Speech Signal Process. Introduction to complex variables and applications. Proakis..1996. J. Tech. T.1190/1.

CA. vol. ‘‘Blind identification of MIMO-FIR systems: a generalized linear prediction approach. Prentice-Hall. ‘‘Digital lattice and ladder filter synthesis.1103175 18. Gray. P. vol. pp. Theory.1109/TASSP. vol. 57. 31. 491--500. 725--730. Inform. A. ‘‘A view of three decades of linear filtering theory. 1975. doi:10. D. John Wiley & Sons. Golub. NJ. pp. vol. Jr.. 1975. Kailath. doi:10. Gray. Grenander. Acoust. Jr. pp. 1992. R.. A. Speech Signal Process.. pp. Itakura. H. Matrix computations. 23.. pp. F. Upper Saddle River. A. NJ. 1980. H. and Markel. vol.. 146--181. Hayes. and Noll. 22. 105--124. J. New York.1055174 33. Inc. November 1983.’’ IEEE Trans. June 1975. C. 36--43. AU-21. 1989. vol. A. G. ‘‘Line spectrum representation of linear predictive coefficients of speech signals. Cambridge University Press. M.172 THE THEORY OF LINEAR PREDICTION 17. Gorokhov. ‘‘A normalized digital filter structure. University of California Press. 53-A. H. December 1973. Kailath. Audio Electroacoust. S. S. Haykin. 20. May 1980. T. H. doi:10. 1985. and Szego.1974. Friedlander. Digital coding of waveforms. ‘‘A statistical method for estimation of speech spectral density and formant frequencies. Acoust. J.. 1970.. Boston. S. 1986 and 2002. CAS-27. 1984. November 1972. Soc. 1958. Toeplitz forms and their applications. 337--344.1162522 24.. Gray.. ‘‘Passive cascaded lattice digital filters. doi:10. vol. R.’’ J.’’ Trans IECE Jpn. Horn R. Englewood Cliffs.. and Gray. pp. Jr. N. Prentice Hall. MD. C. Statistical digital signal processing and modeling..’’ IEEE Trans. 28. doi:10. 1051--1055.. P.. 73. Vector quantization and signal compression. vol. 18. 30. 20. 1996.1973. 1999. U.. F. NJ. Automat.1109/ TAC. doi:10. March 1974. M. D. ‘‘A lattice algorithm for factoring the spectrum of a moving average process.. Englewood Cliffs.’’ Signal Process. Matrix analysis. and Markel. T. 28. Theory.1016/ S0165-1684(98)00187-X 21. Linear systems. B. Johns Hopkins University Press. and Saito.’’ IEEE Trans. Gersho. M.’’ IEEE Trans. Kluwer Academic Press. A. Jayant. 27. and Loubaton.1109/TIT. pp. pp.’’ IEEE Trans.. 26. Gray. G. 19. Berkeley. Inform. Adaptive filter theory. R. and Van Loan. Control.1016/0165-1684(85)90053-2 32. Am. Circuits Syst. ASSP-23.1162680 25. 268--277. Prentice-Hall.. and Johnson. Itakura. . Baltimore.1109/TAU. F. Cambridge.1983. 29. vol. A. H.. ‘‘On asymptotic eigenvalue distribution of Toeplitz matrices.’’ IEEE Trans.


Author Biography

P. P. Vaidyanathan received his bachelor of science, bachelor of technology, and master of technology degrees from the University of Calcutta, India. He obtained his doctor of philosophy degree from the University of California at Santa Barbara and became a faculty member of the Department of Electrical Engineering at the California Institute of Technology in 1983. He served as the executive officer of the Department of Electrical Engineering at Caltech for the period 2002–2005. His research interests include digital filter banks, digital communications, image processing, genomic signal processing, and radar signal processing. He is the author of the book Multirate Systems and Filter Banks and has authored or coauthored more than 370 articles in the signal processing area. He is an IEEE Fellow (1991), past distinguished lecturer of the IEEE SP society, and recipient of the IEEE ASSP Senior Paper Award and the S. K. Mitra Memorial Award (IETE, India). He received the F. E. Terman Award (ASEE) in 1995, the IEEE CAS Society's Golden Jubilee Medal in 1999, and the IEEE Signal Processing Society's Technical Achievement Award in 2001. He has received several awards for excellence in teaching at Caltech. He was the Eliahu and Joyce Jury lecturer at the University of Miami for 2007.

