Robert M. Gray
Information Systems Laboratory
Department of Electrical Engineering
Stanford University
Stanford, California 94305
Revised March 2000
This document is available as an Adobe portable document format (pdf) file at
http://wwwisl.stanford.edu/~gray/toeplitz.pdf
© Robert M. Gray, 1971, 1977, 1993, 1997, 1998, 2000.
The preparation of the original report was ﬁnanced in part by the National
Science Foundation and by the Joint Services Program at Stanford. Since then it
has been done as a hobby.
Abstract

In this tutorial report the fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of "finite section" Toeplitz matrices and Toeplitz matrices with absolutely summable elements are derived. Mathematical elegance and generality are sacrificed for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject. By limiting the generality of the matrices considered, the essential ideas and results can be conveyed in a more intuitive manner without the mathematical machinery required for the most general cases. As an application the results are applied to the study of the covariance matrices and their factors of linear models of discrete time random processes.
Acknowledgements

The author gratefully acknowledges the assistance of Ronald M. Aarts of the Philips Research Labs in correcting many typos and errors in the 1993 revision, Liu Mingyu in pointing out errors corrected in the 1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa for pointing out an incorrect corollary and providing the correction, and to David Neuhoff of the University of Michigan for pointing out several typographical errors and some confusing notation.
Contents

1 Introduction
2 The Asymptotic Behavior of Matrices
3 Circulant Matrices
4 Toeplitz Matrices
  4.1 Finite Order Toeplitz Matrices
  4.2 Toeplitz Matrices
  4.3 Toeplitz Determinants
5 Applications to Stochastic Time Series
  5.1 Moving Average Sources
  5.2 Autoregressive Processes
  5.3 Factorization
  5.4 Differential Entropy Rate of Gaussian Processes
Bibliography
Chapter 1
Introduction
A Toeplitz matrix is an n × n matrix T_n = [t_{k,j}] where t_{k,j} = t_{k-j}, i.e., a matrix of the form

T_n = \begin{bmatrix}
t_0     & t_{-1} & t_{-2} & \cdots & t_{-(n-1)} \\
t_1     & t_0    & t_{-1} &        &            \\
t_2     & t_1    & t_0    &        & \vdots     \\
\vdots  &        &        & \ddots &            \\
t_{n-1} &        & \cdots &        & t_0
\end{bmatrix} .   (1.1)
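To make the defining rule concrete, here is a minimal Python sketch (not part of the original report; the sequence t_k = 2^{-|k|} is an arbitrary illustrative choice) that builds T_n entry by entry from t_{k,j} = t_{k-j}:

```python
def toeplitz_matrix(t, n):
    # T[k][j] = t(k - j): every entry depends only on k - j,
    # so each diagonal of the matrix is constant.
    return [[t(k - j) for j in range(n)] for k in range(n)]

# Example sequence t_k = 2^{-|k|} (an arbitrary illustrative choice).
T4 = toeplitz_matrix(lambda d: 2.0 ** (-abs(d)), 4)
for row in T4:
    print(row)
```

Reading off the output, the main diagonal is constantly t_0 = 1 and the subdiagonal constantly t_1 = 1/2, exactly the constant-diagonal structure the definition demands.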
Examples of such matrices are covariance matrices of weakly stationary stochastic time series and matrix representations of linear time-invariant discrete time filters. There are numerous other applications in mathematics, physics, information theory, estimation theory, etc. A great deal is known about the behavior of such matrices — the most common and complete references being Grenander and Szegő [1] and Widom [2]. A more recent text devoted to the subject is Böttcher and Silbermann [15]. Unfortunately, however, the necessary level of mathematical sophistication for understanding reference [1] is frequently beyond that of one species of applied mathematician for whom the theory can be quite useful but is relatively little understood. This caste consists of engineers doing relatively mathematical (for an engineering background) work in any of the areas mentioned. This apparent dilemma provides the motivation for attempting a tutorial introduction on Toeplitz matrices that proves the essential theorems using the simplest possible and most intuitive mathematics. Some simple and fundamental methods that are deeply buried (at least to the untrained mathematician) in [1] are here made explicit.
In addition to the fundamental theorems, several related results that naturally follow but do not appear to be collected together anywhere are presented.
The essential prerequisites for this report are a knowledge of matrix theory, an engineer's knowledge of Fourier series and random processes, calculus (Riemann integration), and hopefully a first course in analysis. Several of the occasional results required of analysis are usually contained in one or more courses in the usual engineering curriculum, e.g., the Cauchy-Schwarz and triangle inequalities. Hopefully the only unfamiliar results are a corollary to the Courant-Fischer Theorem and the Weierstrass Approximation Theorem. The latter is an intuitive result which is easily believed even if not formally proved. More advanced results from Lebesgue integration, functional analysis, and Fourier series are not used.
The main approach of this report is to relate the properties of Toeplitz matrices to those of their simpler, more structured cousin — the circulant or cyclic matrix. These two matrices are shown to be asymptotically equivalent in a certain sense, and this is shown to imply that eigenvalues, inverses, products, and determinants behave similarly. This approach provides a simplified and direct path (from the author's point of view) to the basic eigenvalue distribution and related theorems. This method is implicit but not immediately apparent in the more complicated and more general results of Grenander in Chapter 7 of [1]. The basic results for the special case of a finite order Toeplitz matrix appeared in [16], a tutorial treatment of the simplest case which was in turn based on the first draft of this work. The results were subsequently generalized using essentially the same simple methods, but they remain less general than those of [1].
As an application several of the results are applied to study certain models of discrete time random processes. Two common linear models are studied and some intuitively satisfying results on covariance matrices and their factors are given. As an example from Shannon information theory, the Toeplitz results regarding the limiting behavior of determinants are applied to find the differential entropy rate of a stationary Gaussian random process.

We sacrifice mathematical elegance and generality for conceptual simplicity in the hope that this will bring an understanding of the interesting and useful properties of Toeplitz matrices to a wider audience, specifically to those who have lacked either the background or the patience to tackle the mathematical literature on the subject.
Chapter 2
The Asymptotic Behavior of Matrices
In this chapter we begin with relevant definitions and a prerequisite theorem and proceed to a discussion of the asymptotic eigenvalue, product, and inverse behavior of sequences of matrices. The remaining chapters of this report will largely be applications of the tools and results of this chapter to the special cases of Toeplitz and circulant matrices.
The eigenvalues λ_k and the eigenvectors (n-tuples) x_k of an n × n matrix M are the solutions to the equation

Mx = λx   (2.1)

and hence the eigenvalues are the roots of the characteristic equation of M:

det(M − λI) = 0 .   (2.2)
If M is Hermitian, i.e., if M = M^*, where the asterisk denotes conjugate transpose, then a more useful description of the eigenvalues is the variational description given by the Courant-Fischer Theorem [3, p. 116]. While we will not have direct need of this theorem, we will use the following important corollary which is stated below without proof.
Corollary 2.1 Define the Rayleigh quotient of an Hermitian matrix H and a vector (complex n-tuple) x by

R_H(x) = (x^* H x)/(x^* x) .   (2.3)
Let η_M and η_m be the maximum and minimum eigenvalues of H, respectively. Then

η_m = min_x R_H(x) = min_{x^* x = 1} x^* H x   (2.4)

η_M = max_x R_H(x) = max_{x^* x = 1} x^* H x   (2.5)
This corollary will be useful in specifying the interval containing the eigenvalues of an Hermitian matrix.

The following lemma is useful when studying non-Hermitian matrices and products of Hermitian matrices. Its proof is given since it introduces and manipulates some important concepts.
Lemma 2.1 Let A be a matrix with eigenvalues α_k. Define the eigenvalues of the Hermitian matrix A^* A to be λ_k. Then

\sum_{k=0}^{n-1} λ_k ≥ \sum_{k=0}^{n-1} |α_k|^2 ,   (2.6)

with equality iff (if and only if) A is normal, that is, iff A^* A = A A^*. (If A is Hermitian, it is also normal.)
Proof.
The trace of a matrix is the sum of the diagonal elements of a matrix. The trace is invariant to unitary operations so that it also is equal to the sum of the eigenvalues of a matrix, i.e.,

Tr{A^* A} = \sum_{k=0}^{n-1} (A^* A)_{k,k} = \sum_{k=0}^{n-1} λ_k .   (2.7)
Any complex matrix A can be written as

A = W R W^* ,   (2.8)

where W is unitary and R = {r_{k,j}} is an upper triangular matrix [3, p. 79]. The eigenvalues of A are the principal diagonal elements of R. We have

Tr{A^* A} = Tr{R^* R} = \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |r_{j,k}|^2
= \sum_{k=0}^{n-1} |α_k|^2 + \sum_{k ≠ j} |r_{j,k}|^2 ≥ \sum_{k=0}^{n-1} |α_k|^2 .   (2.9)
Equation (2.9) will hold with equality iff R is diagonal and hence iff A is normal. Lemma 2.1 is a direct consequence of Schur's Theorem [3, pp. 229-231] and is also proved in [1, p. 106].
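Lemma 2.1 is easy to spot-check numerically. The following sketch (my own illustration, not from the report; the random matrices are arbitrary) compares Tr{A^* A}, which equals the sum of the λ_k, with the sum of the squared magnitudes of the eigenvalues of A, for a generic non-normal matrix and for a Hermitian one:

```python
import numpy as np

rng = np.random.default_rng(0)

def trace_vs_eigs(A):
    # Tr{A* A} = sum of eigenvalues of A* A, versus sum of |alpha_k|^2.
    lhs = np.trace(A.conj().T @ A).real
    rhs = np.sum(np.abs(np.linalg.eigvals(A)) ** 2)
    return lhs, rhs

# A generic real matrix is almost surely not normal: strict inequality.
A = rng.standard_normal((5, 5))
lhs, rhs = trace_vs_eigs(A)
assert lhs >= rhs - 1e-9

# A Hermitian (hence normal) matrix: equality in (2.6).
B = A + A.T
lhs_n, rhs_n = trace_vs_eigs(B)
assert abs(lhs_n - rhs_n) < 1e-8
```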
To study the asymptotic equivalence of matrices we require a metric or, equivalently, a norm of the appropriate kind. Two norms — the operator or strong norm and the Hilbert-Schmidt or weak norm — will be used here [1, pp. 102-103].
Let A be a matrix with eigenvalues α_k and let λ_k be the eigenvalues of the Hermitian matrix A^* A. The strong norm ‖A‖ is defined by

‖A‖ = max_x R_{A^* A}(x)^{1/2} = max_{x^* x = 1} [x^* A^* A x]^{1/2} .   (2.10)
From Corollary 2.1,

‖A‖^2 = max_k λ_k ≜ λ_M .   (2.11)
The strong norm of A can be bounded below by letting e_M be the eigenvector of A corresponding to α_M, the eigenvalue of A having largest absolute value:

‖A‖^2 = max_{x^* x = 1} x^* A^* A x ≥ (e_M^* A^*)(A e_M) = |α_M|^2 .   (2.12)
If A is itself Hermitian, then its eigenvalues α_k are real and the eigenvalues λ_k of A^* A are simply λ_k = α_k^2. This follows since if e^{(k)} is an eigenvector of A with eigenvalue α_k, then A^* A e^{(k)} = α_k A^* e^{(k)} = α_k^2 e^{(k)}. Thus, in particular, if A is Hermitian then

‖A‖ = max_k |α_k| = |α_M| .   (2.13)
The weak norm of an n × n matrix A = {a_{k,j}} is defined by

|A| = ( n^{-1} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |a_{k,j}|^2 )^{1/2} = ( n^{-1} Tr[A^* A] )^{1/2} = ( n^{-1} \sum_{k=0}^{n-1} λ_k )^{1/2} .   (2.14)
From Lemma 2.1 we have

|A|^2 ≥ n^{-1} \sum_{k=0}^{n-1} |α_k|^2 ,   (2.15)

with equality iff A is normal.
The Hilbert-Schmidt norm is the "weaker" of the two norms since

‖A‖^2 = max_k λ_k ≥ n^{-1} \sum_{k=0}^{n-1} λ_k = |A|^2 .   (2.16)
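In numpy terms (a sketch of my own, not from the report), the strong norm is the largest singular value and the weak norm is the Frobenius norm scaled by n^{-1/2}, so inequality (2.16) can be checked directly:

```python
import numpy as np

def strong_norm(A):
    # ||A|| = largest singular value = (max eigenvalue of A* A)^{1/2}.
    return np.linalg.norm(A, 2)

def weak_norm(A):
    # |A| = (n^{-1} Tr[A* A])^{1/2} = Frobenius norm / sqrt(n).
    return np.linalg.norm(A, 'fro') / np.sqrt(A.shape[0])

A = np.random.default_rng(1).standard_normal((6, 6))
assert weak_norm(A) <= strong_norm(A) + 1e-12   # inequality (2.16)
```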
A matrix is said to be bounded if it is bounded in both norms.
Note that both the strong and the weak norms are in fact norms in the linear space of matrices, i.e., both satisfy the following three axioms:

1. ‖A‖ ≥ 0, with equality iff A = 0, the all-zero matrix.
2. ‖A + B‖ ≤ ‖A‖ + ‖B‖
3. ‖cA‖ = |c| ‖A‖ .   (2.17)

The triangle inequality in (2.17) will be used often, as is the following direct consequence:

‖A − B‖ ≥ | ‖A‖ − ‖B‖ | .   (2.18)
The weak norm is usually the more useful and easier to handle of the two, but the strong norm is handy in providing a bound for the product of two matrices, as shown in the next lemma.
Lemma 2.2 Given two n × n matrices G = {g_{k,j}} and H = {h_{k,j}}, then

|GH| ≤ ‖G‖ |H| .   (2.19)
Proof.

|GH|^2 = n^{-1} \sum_i \sum_j | \sum_k g_{i,k} h_{k,j} |^2
       = n^{-1} \sum_i \sum_j \sum_k \sum_m g_{i,k} \bar{g}_{i,m} h_{k,j} \bar{h}_{m,j}
       = n^{-1} \sum_j h_j^* G^* G h_j ,   (2.20)
where ∗ denotes conjugate transpose and h_j is the j-th column of H. From (2.10),

(h_j^* G^* G h_j)/(h_j^* h_j) ≤ ‖G‖^2

and therefore

|GH|^2 ≤ n^{-1} ‖G‖^2 \sum_j h_j^* h_j = ‖G‖^2 |H|^2 .
Lemma 2.2 is the matrix equivalent of 7.3a of [1, p. 103]. Note that the
lemma does not require that G or H be Hermitian.
We will be considering n × n matrices that approximate each other when n is large. As might be expected, we will use the weak norm of the difference of two matrices as a measure of the "distance" between them. Two sequences of n × n matrices A_n and B_n are said to be asymptotically equivalent if

1. A_n and B_n are uniformly bounded in strong (and hence in weak) norm:

‖A_n‖, ‖B_n‖ ≤ M < ∞   (2.21)

and

2. A_n − B_n = D_n goes to zero in weak norm as n → ∞:

lim_{n→∞} |A_n − B_n| = lim_{n→∞} |D_n| = 0 .
Asymptotic equivalence of A_n and B_n will be abbreviated A_n ∼ B_n. If one of the two matrices is Toeplitz, then the other is said to be asymptotically Toeplitz. We can immediately prove several properties of asymptotic equivalence, which are collected in the following theorem.
Theorem 2.1

1. If A_n ∼ B_n, then

lim_{n→∞} |A_n| = lim_{n→∞} |B_n| .   (2.22)

2. If A_n ∼ B_n and B_n ∼ C_n, then A_n ∼ C_n.

3. If A_n ∼ B_n and C_n ∼ D_n, then A_n C_n ∼ B_n D_n.
4. If A_n ∼ B_n and ‖A_n^{-1}‖, ‖B_n^{-1}‖ ≤ K < ∞, i.e., A_n^{-1} and B_n^{-1} exist and are uniformly bounded by some constant independent of n, then A_n^{-1} ∼ B_n^{-1}.

5. If A_n B_n ∼ C_n and ‖A_n^{-1}‖ ≤ K < ∞, then B_n ∼ A_n^{-1} C_n.
Proof.

1. Eq. (2.22) follows directly from (2.18).

2. |A_n − C_n| = |A_n − B_n + B_n − C_n| ≤ |A_n − B_n| + |B_n − C_n| → 0 as n → ∞.

3. Applying Lemma 2.2 yields

|A_n C_n − B_n D_n| = |A_n C_n − A_n D_n + A_n D_n − B_n D_n| ≤ ‖A_n‖ |C_n − D_n| + ‖D_n‖ |A_n − B_n| → 0 as n → ∞.

4. |A_n^{-1} − B_n^{-1}| = |B_n^{-1} B_n A_n^{-1} − B_n^{-1} A_n A_n^{-1}| ≤ ‖B_n^{-1}‖ ‖A_n^{-1}‖ |B_n − A_n| → 0 as n → ∞.

5. |B_n − A_n^{-1} C_n| = |A_n^{-1} A_n B_n − A_n^{-1} C_n| ≤ ‖A_n^{-1}‖ |A_n B_n − C_n| → 0 as n → ∞.
The above results will be useful in several of the later proofs.

Asymptotic equivalence of matrices will be shown to imply that eigenvalues, products, and inverses behave similarly. The following lemma provides a prelude of the type of result obtainable for eigenvalues and will itself serve as the essential part of the more general theorem to follow.
Lemma 2.3 Given two sequences of asymptotically equivalent matrices A_n and B_n with eigenvalues α_{n,k} and β_{n,k}, respectively, then

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} α_{n,k} = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} β_{n,k} .   (2.23)
Proof.
Let D_n = {d_{k,j}} = A_n − B_n. Eq. (2.23) is equivalent to

lim_{n→∞} n^{-1} Tr(D_n) = 0 .   (2.24)

Applying the Cauchy-Schwarz inequality [4, p. 17] to Tr(D_n) yields

|Tr(D_n)|^2 = | \sum_{k=0}^{n-1} d_{k,k} |^2 ≤ n \sum_{k=0}^{n-1} |d_{k,k}|^2 ≤ n \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |d_{k,j}|^2 = n^2 |D_n|^2 .

Dividing by n^2 and taking the limit results in

0 ≤ |n^{-1} Tr(D_n)|^2 ≤ |D_n|^2 → 0 as n → ∞,   (2.25)

which implies (2.24) and hence (2.23).
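Lemma 2.3 can be watched in action numerically. In this toy sketch (my own illustration; the matrices are arbitrary), B_n is a scaled random matrix and D_n is a fixed rank-one perturbation, so |D_n| = n^{-1/2} → 0; the eigenvalue averages of A_n = B_n + D_n and B_n then differ by exactly Tr(D_n)/n = 1/n:

```python
import numpy as np

rng = np.random.default_rng(2)

def eig_mean(M):
    # Arithmetic average of the eigenvalues (= trace / n).
    return np.mean(np.linalg.eigvals(M)).real

gaps = []
for n in [10, 40, 160]:
    B = rng.standard_normal((n, n)) / np.sqrt(n)  # uniformly bounded
    D = np.zeros((n, n)); D[0, 0] = 1.0           # fixed perturbation
    A = B + D                                     # |D_n| = n^{-1/2} -> 0
    gaps.append(abs(eig_mean(A) - eig_mean(B)))   # equals 1/n here

assert abs(gaps[0] - 0.1) < 1e-6
assert gaps[2] < gaps[1] < gaps[0]
```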
Similarly to (2.23), if A_n and B_n are Hermitian then (2.22) and (2.15) imply that

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} α_{n,k}^2 = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} β_{n,k}^2 .   (2.26)
Note that (2.23) and (2.26) relate limiting sample (arithmetic) averages of
eigenvalues or moments of an eigenvalue distribution rather than individual
eigenvalues. Equations (2.23) and (2.26) are special cases of the following
fundamental theorem of asymptotic eigenvalue distribution.
Theorem 2.2 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Assume that the eigenvalue moments of either matrix converge, e.g., lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} α_{n,k}^s exists and is finite for any positive integer s. Then

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} α_{n,k}^s = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} β_{n,k}^s .   (2.27)
Proof.
Let A_n = B_n + D_n as in Lemma 2.3 and consider A_n^s − B_n^s ≜ ∆_n. Since the eigenvalues of A_n^s are α_{n,k}^s, (2.27) can be written in terms of ∆_n as

lim_{n→∞} n^{-1} Tr ∆_n = 0 .   (2.28)

The matrix ∆_n is a sum of several terms, each being a product of D_n's and B_n's but containing at least one D_n. Repeated application of Lemma 2.2 thus gives

|∆_n| ≤ K' |D_n| → 0 as n → ∞,   (2.29)

where K' does not depend on n. Equation (2.29) allows us to apply Lemma 2.3 to the matrices A_n^s and B_n^s to obtain (2.28) and hence (2.27).
Theorem 2.2 is the fundamental theorem concerning asymptotic eigenvalue behavior. Most of the succeeding results on eigenvalues will be applications or specializations of (2.27).

Since (2.27) holds for any positive integer s we can add sums corresponding to different values of s to each side of (2.27). This observation immediately yields the following corollary.
Corollary 2.2 Let A_n and B_n be asymptotically equivalent sequences of matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, and let f(x) be any polynomial. Then

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} f(α_{n,k}) = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} f(β_{n,k}) .   (2.30)
Whether or not A_n and B_n are Hermitian, Corollary 2.2 implies that (2.30) can hold for any analytic function f(x) since such functions can be expanded into complex Taylor series, i.e., into polynomials. If A_n and B_n are Hermitian, however, then a much stronger result is possible. In this case the eigenvalues of both matrices are real and we can invoke the Stone-Weierstrass approximation theorem [4, p. 146] to immediately generalize Corollary 2.2. This theorem, our one real excursion into analysis, is stated below for reference.
Theorem 2.3 (Stone-Weierstrass) If F(x) is a continuous complex function on [a, b], there exists a sequence of polynomials p_n(x) such that

lim_{n→∞} p_n(x) = F(x)

uniformly on [a, b].
Stated simply, any continuous function deﬁned on a real interval can be
approximated arbitrarily closely by a polynomial. Applying Theorem 2.3 to
Corollary 2.2 immediately yields the following theorem:
Theorem 2.4 Let A_n and B_n be asymptotically equivalent sequences of Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively. Since A_n and B_n are bounded there exist finite numbers m and M such that

m ≤ α_{n,k}, β_{n,k} ≤ M ,  n = 1, 2, . . . ,  k = 0, 1, . . . , n − 1 .   (2.31)

Let F(x) be an arbitrary function continuous on [m, M]. Then

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} F[α_{n,k}] = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} F[β_{n,k}]   (2.32)

if either of the limits exists.
Theorem 2.4 is the matrix equivalent of Theorem (7.4a) of [1]. When two real sequences {α_{n,k}; k = 0, 1, . . . , n − 1} and {β_{n,k}; k = 0, 1, . . . , n − 1} satisfy (2.31)-(2.32), they are said to be asymptotically equally distributed [1, p. 62].
As an example of the use of Theorem 2.4 we prove the following corollary
on the determinants of asymptotically equivalent matrices.
Corollary 2.3 Let A_n and B_n be asymptotically equivalent Hermitian matrices with eigenvalues α_{n,k} and β_{n,k}, respectively, such that α_{n,k}, β_{n,k} ≥ m > 0. Then

lim_{n→∞} (det A_n)^{1/n} = lim_{n→∞} (det B_n)^{1/n} .   (2.33)
Proof.
From Theorem 2.4 we have, for F(x) = ln x,

lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} ln α_{n,k} = lim_{n→∞} n^{-1} \sum_{k=0}^{n-1} ln β_{n,k}

and hence

lim_{n→∞} exp[ n^{-1} ln \prod_{k=0}^{n-1} α_{n,k} ] = lim_{n→∞} exp[ n^{-1} ln \prod_{k=0}^{n-1} β_{n,k} ]

or equivalently

lim_{n→∞} exp[n^{-1} ln det A_n] = lim_{n→∞} exp[n^{-1} ln det B_n] ,

from which (2.33) follows.
With suitable mathematical care the above corollary can be extended to the case where α_{n,k}, β_{n,k} > 0 but there is no m satisfying the hypothesis of the corollary, i.e., where the eigenvalues can get arbitrarily small but are still strictly positive.
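A small numerical sketch of Corollary 2.3 (my own illustration, not from the report; the matrices and sizes are arbitrary choices): a fixed Hermitian perturbation whose weak norm dies out leaves the normalized determinant (det A_n)^{1/n} asymptotically unchanged, even though det A_n and det B_n themselves differ by a constant factor.

```python
import numpy as np

vals = []
for n in [8, 32, 128]:
    B = np.eye(n) * 2.0                  # eigenvalues bounded away from 0
    D = np.zeros((n, n)); D[0, 0] = 0.5  # fixed perturbation, |D_n| -> 0
    A = B + D
    da = np.linalg.det(A) ** (1.0 / n)   # (det A_n)^{1/n}
    db = np.linalg.det(B) ** (1.0 / n)   # (det B_n)^{1/n} = 2 exactly
    vals.append(abs(da - db))

# The gap between the normalized determinants shrinks as n grows.
assert vals[0] > vals[1] > vals[2]
```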
In this chapter the concept of asymptotic equivalence of matrices was defined and its implications studied. The main consequences have been the behavior of inverses and products (Theorem 2.1) and eigenvalues (Theorems 2.2 and 2.4). These theorems do not concern individual entries in the matrices or individual eigenvalues; rather they describe an "average" behavior. Thus saying A_n^{-1} ∼ B_n^{-1} means that |A_n^{-1} − B_n^{-1}| → 0 as n → ∞ and says nothing about convergence of individual entries in the matrix. In certain cases stronger results on a type of elementwise convergence are possible using the stronger norm of Baxter [7, 8]. Baxter's results are beyond the scope of this report.

The major use of the theorems of this chapter is that we can often study the asymptotic behavior of complicated matrices by studying a more structured and simpler asymptotically equivalent matrix.
Chapter 3
Circulant Matrices
The properties of circulant matrices are well known and easily derived [3, p.
267],[19]. Since these matrices are used both to approximate and explain the
behavior of Toeplitz matrices, it is instructive to present one version of the
relevant derivations here.
A circulant matrix C is one having the form

C = \begin{bmatrix}
c_0     & c_1     & c_2    & \cdots & c_{n-1} \\
c_{n-1} & c_0     & c_1    & \cdots & c_{n-2} \\
c_{n-2} & c_{n-1} & c_0    & \cdots & c_{n-3} \\
\vdots  &         &        & \ddots & \vdots  \\
c_1     & c_2     & c_3    & \cdots & c_0
\end{bmatrix} ,   (3.1)
where each row is a cyclic shift of the row above it. The matrix C is itself a special type of Toeplitz matrix. The eigenvalues ψ_k and the eigenvectors y^{(k)} of C are the solutions of

C y = ψ y   (3.2)

or, equivalently, of the n difference equations

\sum_{k=0}^{m-1} c_{n-m+k} y_k + \sum_{k=m}^{n-1} c_{k-m} y_k = ψ y_m ;  m = 0, 1, . . . , n − 1 .   (3.3)
Changing the summation dummy variable results in

\sum_{k=0}^{n-1-m} c_k y_{k+m} + \sum_{k=n-m}^{n-1} c_k y_{k-(n-m)} = ψ y_m ;  m = 0, 1, . . . , n − 1 .   (3.4)
One can solve difference equations as one solves differential equations — by guessing a (hopefully) intuitive solution and then proving that it works. Since the equation is linear with constant coefficients, a reasonable guess is y_k = ρ^k (analogous to y(t) = e^{sτ} in linear time invariant differential equations). Substitution into (3.4) and cancellation of ρ^m yields

\sum_{k=0}^{n-1-m} c_k ρ^k + ρ^{-n} \sum_{k=n-m}^{n-1} c_k ρ^k = ψ .
Thus if we choose ρ^{-n} = 1, i.e., ρ is one of the n distinct complex n-th roots of unity, then we have an eigenvalue

ψ = \sum_{k=0}^{n-1} c_k ρ^k   (3.5)

with corresponding eigenvector

y = n^{-1/2} ( 1, ρ, ρ^2, . . . , ρ^{n-1} ) ,   (3.6)
where the normalization is chosen to give the eigenvector unit energy. Choosing ρ_m as the complex n-th root of unity, ρ_m = e^{-2πim/n}, we have eigenvalue

ψ_m = \sum_{k=0}^{n-1} c_k e^{-2πimk/n}   (3.7)

and eigenvector

y^{(m)} = n^{-1/2} ( 1, e^{-2πim/n}, . . . , e^{-2πim(n-1)/n} ) .
From (3.7) we can write

C = U Ψ U^* ,   (3.8)

where

U = [ y^{(0)} | y^{(1)} | · · · | y^{(n-1)} ] = n^{-1/2} [ e^{-2πimk/n} ;  k, m = 0, 1, . . . , n − 1 ]

Ψ = {ψ_k δ_{k,j}} .
To verify (3.8) we note that the (k, j)-th element of C, say a_{k,j}, is

a_{k,j} = n^{-1} \sum_{m=0}^{n-1} e^{-2πim(k-j)/n} ψ_m
        = n^{-1} \sum_{m=0}^{n-1} e^{-2πim(k-j)/n} \sum_{r=0}^{n-1} c_r e^{-2πimr/n}
        = n^{-1} \sum_{r=0}^{n-1} c_r \sum_{m=0}^{n-1} e^{-2πim(k-j+r)/n} .   (3.9)

But we have

\sum_{m=0}^{n-1} e^{-2πim(k-j+r)/n} = { n   if k − j = −r mod n ;  0   otherwise }

so that a_{k,j} = c_{-(k-j) mod n}. Thus (3.8) and (3.1) are equivalent. Furthermore (3.9) shows that any matrix expressible in the form (3.8) is circulant.
Since C is unitarily similar to a diagonal matrix it is normal. Note that
all circulant matrices have the same set of eigenvectors. This leads to the
following properties.
Theorem 3.1 Let C = {c_{k-j}} and B = {b_{k-j}} be circulant n × n matrices with eigenvalues

ψ_m = \sum_{k=0}^{n-1} c_k e^{-2πimk/n} ,  β_m = \sum_{k=0}^{n-1} b_k e^{-2πimk/n} ,

respectively.
1. C and B commute and

CB = BC = U γ U^* ,

where γ = {ψ_m β_m δ_{k,m}}, and CB is also a circulant matrix.
2. C + B is a circulant matrix and

C + B = U Ω U^* ,

where Ω = {(ψ_m + β_m) δ_{k,m}}.

3. If ψ_m ≠ 0 for m = 0, 1, . . . , n − 1, then C is nonsingular and

C^{-1} = U Ψ^{-1} U^*

so that the inverse of C can be straightforwardly constructed.
Proof.
We have C = U Ψ U^* and B = U Φ U^* where Ψ and Φ are diagonal matrices with elements ψ_m δ_{k,m} and β_m δ_{k,m}, respectively.

1. CB = U Ψ U^* U Φ U^* = U Ψ Φ U^* = U Φ Ψ U^* = BC. Since Ψ Φ is diagonal, (3.9) implies that CB is circulant.

2. C + B = U (Ψ + Φ) U^*.

3. C^{-1} = (U Ψ U^*)^{-1} = U Ψ^{-1} U^* if Ψ is nonsingular.
Circulant matrices are an especially tractable class of matrices since inverses, products, and sums are also circulants and hence both straightforward to construct and normal. In addition the eigenvalues of such matrices can easily be found exactly.
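This tractability has a practical payoff: because the diagonalization uses the DFT matrix, item 3 of Theorem 3.1 turns a circulant linear system into FFTs. The sketch below is my own illustration (not from the report); the first row is an arbitrary choice, taken real and symmetric (c_k = c_{n-k}) so that C is Hermitian and the eigenvalues ψ_m are real:

```python
import numpy as np

# Solve C x = b for a circulant C in O(n log n) via C = U Psi U^*.
c = np.array([4.0, 1.0, 0.0, 1.0])   # real, symmetric first row
n = len(c)
C = np.array([np.roll(c, k) for k in range(n)])
b = np.array([1.0, 2.0, 3.0, 4.0])

psi = np.fft.fft(c)                        # eigenvalues; all nonzero here,
x = np.fft.ifft(np.fft.fft(b) / psi).real  # so C is invertible

assert np.allclose(C @ x, b)               # same answer as a direct solve
```

For large n this replaces an O(n^3) general solve with two FFTs and a pointwise division, which is one reason circulant approximations to Toeplitz matrices are so useful in practice.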
In the next chapter we shall see that certain circulant matrices asymptotically approximate Toeplitz matrices and hence from Chapter 2 results similar to those in Theorem 3.1 will hold asymptotically for Toeplitz matrices.
Chapter 4
Toeplitz Matrices
In this chapter the asymptotic behavior of inverses, products, eigenvalues, and determinants of finite Toeplitz matrices is derived by constructing an asymptotically equivalent circulant matrix and applying the results of the previous chapters. Consider the infinite sequence {t_k ; k = 0, ±1, ±2, . . .} and define the finite (n × n) Toeplitz matrix T_n = {t_{k-j}} as in (1.1). Toeplitz matrices can be classified by the restrictions placed on the sequence t_k. If there exists a finite m such that t_k = 0 for |k| > m, then T_n is said to be a finite order Toeplitz matrix. If t_k is an infinite sequence, then there are two common constraints. The most general is to assume that the t_k are square summable, i.e., that

\sum_{k=-\infty}^{\infty} |t_k|^2 < ∞ .   (4.1)
Unfortunately this case requires mathematical machinery beyond that assumed in this paper, i.e., Lebesgue integration and a relatively advanced knowledge of Fourier series. We will make the stronger assumption that the t_k are absolutely summable, i.e.,

\sum_{k=-\infty}^{\infty} |t_k| < ∞ .   (4.2)
This assumption greatly simplifies the mathematics but does not alter the fundamental concepts involved. As the main purpose here is tutorial and we wish chiefly to relay the flavor and an intuitive feel for the results, this paper will be confined to the absolutely summable case. The main advantage of (4.2) over (4.1) is that it ensures the existence and continuity of the Fourier series f(λ) defined by

f(λ) = \sum_{k=-\infty}^{\infty} t_k e^{ikλ} = lim_{n→∞} \sum_{k=-n}^{n} t_k e^{ikλ} .   (4.3)
Not only does the limit in (4.3) converge if (4.2) holds, it converges uniformly for all λ, that is, we have that

| f(λ) − \sum_{k=-n}^{n} t_k e^{ikλ} | = | \sum_{k=-∞}^{-n-1} t_k e^{ikλ} + \sum_{k=n+1}^{∞} t_k e^{ikλ} |
≤ | \sum_{k=-∞}^{-n-1} t_k e^{ikλ} | + | \sum_{k=n+1}^{∞} t_k e^{ikλ} |
≤ \sum_{k=-∞}^{-n-1} |t_k| + \sum_{k=n+1}^{∞} |t_k| ,

where the right-hand side does not depend on λ and goes to zero as n → ∞ from (4.2). Thus given ε > 0 there is a single N, not depending on λ, such that

| f(λ) − \sum_{k=-n}^{n} t_k e^{ikλ} | ≤ ε ,  all λ ∈ [0, 2π] ,  if n ≥ N .   (4.4)
Note that (4.2) is indeed a stronger constraint than (4.1) since

\sum_{k=-\infty}^{\infty} |t_k|^2 ≤ ( \sum_{k=-\infty}^{\infty} |t_k| )^2 .
Note also that (4.2) implies that f(λ) is bounded since

|f(λ)| ≤ \sum_{k=-\infty}^{\infty} |t_k e^{ikλ}| ≤ \sum_{k=-\infty}^{\infty} |t_k| ≜ M_{|f|} < ∞ .

The matrix T_n will be Hermitian if and only if f is real, in which case we denote the least upper bound and greatest lower bound of f(λ) by M_f and m_f, respectively. Observe that max(|m_f|, |M_f|) ≤ M_{|f|}.
Since f(λ) is the Fourier series of the sequence t_k, we could alternatively begin with a bounded and hence Riemann integrable function f(λ) on [0, 2π] (|f(λ)| ≤ M_{|f|} < ∞ for all λ) and define the sequence of n × n Toeplitz matrices

T_n(f) = { (2π)^{-1} \int_0^{2π} f(λ) e^{-i(k-j)λ} dλ ;  k, j = 0, 1, . . . , n − 1 } .   (4.5)
As before, the Toeplitz matrices will be Hermitian iff f is real. The assumption that f(λ) is Riemann integrable implies that f(λ) is continuous except possibly at a countable number of points. Which assumption is made depends on whether one begins with a sequence t_k or a function f(λ) — either assumption will be equivalent for our purposes since it is the Riemann integrability of f(λ) that simplifies the bookkeeping in either case. Before finding a simple asymptotic equivalent matrix to T_n, we use Corollary 2.1 to find a bound on the eigenvalues of T_n when it is Hermitian and an upper bound to the strong norm in the general case.
Lemma 4.1 Let τ_{n,k} be the eigenvalues of a Toeplitz matrix T_n(f). If T_n(f) is Hermitian, then

m_f ≤ τ_{n,k} ≤ M_f .   (4.6)

Whether or not T_n(f) is Hermitian,

‖T_n(f)‖ ≤ 2M_{|f|}   (4.7)

so that the matrix is uniformly bounded over n if f is bounded.
Proof.
Property (4.6) follows from Corollary 2.1:

max_k τ_{n,k} = max_x (x^* T_n x)/(x^* x)   (4.8)

min_k τ_{n,k} = min_x (x^* T_n x)/(x^* x)
so that

x^* T_n x = \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} t_{k-j} \bar{x}_k x_j
= \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} [ (2π)^{-1} \int_0^{2π} f(λ) e^{-i(k-j)λ} dλ ] \bar{x}_k x_j
= (2π)^{-1} \int_0^{2π} | \sum_{k=0}^{n-1} x_k e^{ikλ} |^2 f(λ) dλ   (4.9)
and likewise

x^* x = \sum_{k=0}^{n-1} |x_k|^2 = (2π)^{-1} \int_0^{2π} | \sum_{k=0}^{n-1} x_k e^{ikλ} |^2 dλ .   (4.10)
Combining (4.9)-(4.10) results in

m_f ≤ [ \int_0^{2π} f(λ) | \sum_{k=0}^{n-1} x_k e^{ikλ} |^2 dλ ] / [ \int_0^{2π} | \sum_{k=0}^{n-1} x_k e^{ikλ} |^2 dλ ] = (x^* T_n x)/(x^* x) ≤ M_f ,   (4.11)

which with (4.8) yields (4.6). Alternatively, observe in (4.11) that if e^{(k)} is the eigenvector associated with τ_{n,k}, then the quadratic form with x = e^{(k)} yields x^* T_n x = τ_{n,k} \sum_{j=0}^{n-1} |x_j|^2. Thus (4.11) implies (4.6) directly.
We have already seen in (2.13) that if T_n(f) is Hermitian, then ‖T_n(f)‖ = max_k |τ_{n,k}| ≜ |τ_{n,M}|, which we have just shown satisfies |τ_{n,M}| ≤ max(|M_f|, |m_f|), which in turn must be less than M_{|f|}. This proves (4.7) for Hermitian matrices. Suppose that T_n(f) is not Hermitian or, equivalently, that f is not real. Any function f can be written in terms of its real and imaginary parts, f = f_r + i f_i, where both f_r and f_i are real. In particular, f_r = (f + f^*)/2 and f_i = (f − f^*)/2i. Since the strong norm is a norm,

‖T_n(f)‖ = ‖T_n(f_r + i f_i)‖ = ‖T_n(f_r) + i T_n(f_i)‖ ≤ ‖T_n(f_r)‖ + ‖T_n(f_i)‖ ≤ M_{|f_r|} + M_{|f_i|} .

Since |(f ± f^*)/2| ≤ (|f| + |f^*|)/2 ≤ M_{|f|}, we have M_{|f_r|} + M_{|f_i|} ≤ 2M_{|f|}, proving (4.7).
Note for later use that the weak norm between Toeplitz matrices has a simpler form than (2.14). Let T_n = {t_{k-j}} and T_n' = {t'_{k-j}} be Toeplitz; then by collecting equal terms we have

|T_n − T_n'|^2 = n^{-1} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |t_{k-j} − t'_{k-j}|^2
= n^{-1} \sum_{k=-(n-1)}^{n-1} (n − |k|) |t_k − t'_k|^2
= \sum_{k=-(n-1)}^{n-1} (1 − |k|/n) |t_k − t'_k|^2 .   (4.12)
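Formula (4.12) just counts how many times each lag k appears in an n × n Toeplitz matrix (n − |k| times). A quick numerical check (my own sketch; the two sequences are arbitrary choices):

```python
import numpy as np

def weak_norm_sq(A):
    # |A|^2 = n^{-1} * sum of squared magnitudes of the entries.
    return np.sum(np.abs(A) ** 2) / A.shape[0]

n = 6
t  = lambda d: 2.0 ** (-abs(d))          # two illustrative sequences
tp = lambda d: 3.0 ** (-abs(d))
T  = np.array([[t(k - j)  for j in range(n)] for k in range(n)])
Tp = np.array([[tp(k - j) for j in range(n)] for k in range(n)])

# Right-hand side of (4.12): a sum over lags weighted by (1 - |k|/n).
rhs = sum((1 - abs(k) / n) * abs(t(k) - tp(k)) ** 2
          for k in range(-(n - 1), n))
assert np.isclose(weak_norm_sq(T - Tp), rhs)
```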
We are now ready to put all the pieces together to study the asymptotic behavior of T_n. If we can find an asymptotically equivalent circulant matrix then all of the results of Chapters 2 and 3 can be instantly applied. The main difference between the derivations for the finite and infinite order case is the circulant matrix chosen. Hence to gain some feel for the matrix chosen we first consider the simpler finite order case where the answer is obvious, and then generalize in a natural way to the infinite order case.
4.1 Finite Order Toeplitz Matrices
Let T_n be a sequence of finite order Toeplitz matrices of order m + 1, that is, t_i = 0 unless |i| ≤ m. Since we are interested in the behavior of T_n for large n we choose n >> m. A typical Toeplitz matrix will then have the appearance of the following matrix, possessing a band of nonzero entries down the central diagonal and zeros everywhere else. With the exception of the upper left and lower right hand corners, T_n looks like a circulant matrix, i.e., each row is the row above shifted to the right one place.
T_n = \begin{bmatrix}
t_0    & t_{-1} & \cdots & t_{-m} &        &        &        & 0      \\
t_1    & t_0    &        &        & \ddots &        &        &        \\
\vdots &        & \ddots &        &        & \ddots &        &        \\
t_m    &        &        & \ddots &        &        & \ddots &        \\
       & \ddots &        &        & \ddots &        &        & t_{-m} \\
       &        & \ddots &        &        & \ddots &        & \vdots \\
       &        &        & \ddots &        &        & t_0    & t_{-1} \\
0      &        &        &        & t_m    & \cdots & t_1    & t_0
\end{bmatrix} .   (4.13)
We can make this matrix exactly into a circulant if we fill in the upper right and lower left corners with the appropriate entries. Define the circulant matrix C in just this way, i.e.,

C = \begin{bmatrix}
t_0    & t_{-1} & \cdots & t_{-m} &        & 0      & t_m    & \cdots & t_1    \\
t_1    & t_0    &        &        & \ddots &        & \ddots & \ddots & \vdots \\
\vdots &        & \ddots &        &        & \ddots &        & \ddots & t_m    \\
t_m    &        &        & \ddots &        &        & \ddots &        & 0      \\
       & \ddots &        &        & \ddots &        &        & \ddots &        \\
0      &        & \ddots &        &        & \ddots &        &        & t_{-m} \\
t_{-m} & \ddots &        & \ddots &        &        & \ddots &        & \vdots \\
\vdots & \ddots & \ddots &        & \ddots &        &        & t_0    & t_{-1} \\
t_{-1} & \cdots & t_{-m} & 0      &        & t_m    & \cdots & t_1    & t_0
\end{bmatrix}

= \begin{bmatrix}
c_0^{(n)}     & c_1^{(n)}     & \cdots        & c_{n-1}^{(n)} \\
c_{n-1}^{(n)} & c_0^{(n)}     &               & \vdots        \\
\vdots        &               & \ddots        &               \\
c_1^{(n)}     & \cdots        & c_{n-1}^{(n)} & c_0^{(n)}
\end{bmatrix} .   (4.14)

Equivalently, C consists of cyclic shifts of (c_0^{(n)}, . . . , c_{n-1}^{(n)}) where

c_k^{(n)} = \begin{cases} t_{-k} & k = 0, 1, . . . , m \\ t_{n-k} & k = n − m, . . . , n − 1 \\ 0 & \text{otherwise} \end{cases}   (4.15)
Given $T_n(f)$, the circulant matrix defined as in (4.14)-(4.15) is denoted $C_n(f)$.
The matrix $C_n(f)$ is intuitively a candidate for a simple matrix asymptotically
equivalent to $T_n(f)$; we need only prove that it is indeed both
asymptotically equivalent and simple.
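The construction (4.14)-(4.15) is easy to check numerically. The following sketch (not from the original report; it assumes NumPy is available and uses a made-up band of coefficients) builds a banded Toeplitz matrix and its circulant counterpart and confirms that they differ only in the $m(m+1)$ corner entries, so the weak norm of the difference shrinks with $n$:

```python
import numpy as np

def banded_toeplitz(t, n):
    # T[i, j] = t_{i-j}; t maps offset -> value, zero off the band
    return np.array([[t.get(i - j, 0.0) for j in range(n)] for i in range(n)])

def circulant_from(t, m, n):
    # top row c_k from (4.15): c_k = t_{-k} for k <= m, c_k = t_{n-k} for k >= n-m
    c = np.zeros(n)
    for k in range(m + 1):
        c[k] = t.get(-k, 0.0)
    for k in range(n - m, n):
        c[k] = t.get(n - k, 0.0)
    # row j of a circulant is the top row cyclically shifted right j places
    return np.array([[c[(k - j) % n] for k in range(n)] for j in range(n)])

def weak_norm(A):
    # |A| = (n^{-1} sum |a_ij|^2)^{1/2}
    return np.sqrt(np.mean(np.abs(A) ** 2))

m = 2
t = {0: 2.0, 1: 0.5, -1: 0.5, 2: 0.25, -2: 0.25}   # hypothetical band, m = 2

norms = {}
for n in (8, 64):
    T = banded_toeplitz(t, n)
    C = circulant_from(t, m, n)
    # T and C agree except in the two "wrapped" corners: m(m+1) entries
    assert int((T != C).sum()) == m * (m + 1)
    norms[n] = weak_norm(T - C)
assert norms[64] < norms[8]
```

The fixed number of differing corner entries is what makes Lemma 4.2 below almost immediate.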
Lemma 4.2 The matrices $T_n$ and $C_n$ defined in (4.13) and (4.14) are asymptotically
equivalent, i.e., both are bounded in the strong norm and

$$
\lim_{n\to\infty} |T_n - C_n| = 0 . \qquad (4.16)
$$
Proof. The $t_k$ are obviously absolutely summable, so the $T_n$ are uniformly
bounded by $2M_f$ from Lemma 4.1. The matrices $C_n$ are also uniformly
bounded since $C_n^* C_n$ is a circulant matrix with eigenvalues $|f(2\pi k/n)|^2 \leq 4M_f^2$.
The weak norm of the difference is

$$
|T_n - C_n|^2 = n^{-1} \sum_{k=0}^{m} k \left( |t_k|^2 + |t_{-k}|^2 \right)
\leq m\, n^{-1} \sum_{k=0}^{m} \left( |t_k|^2 + |t_{-k}|^2 \right)
\mathop{\longrightarrow}_{n\to\infty} 0 .
$$
The above lemma is almost trivial since the matrix $T_n - C_n$ has fewer
than $m^2$ nonzero entries and hence the $n^{-1}$ in the weak norm drives $|T_n - C_n|$
to zero.
From Lemma 4.2 and Theorem 2.2 we have the following lemma:
Lemma 4.3 Let $T_n$ and $C_n$ be as in (4.13) and (4.14) and let their eigenvalues
be $\tau_{n,k}$ and $\psi_{n,k}$, respectively; then for any positive integer $s$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s
= \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s . \qquad (4.17)
$$

In fact, for finite $n$,

$$
\left| n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s - n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s \right|
\leq K n^{-1/2} , \qquad (4.18)
$$

where $K$ is not a function of $n$.
Proof.

Equation (4.17) is direct from Lemma 4.2 and Theorem 2.2. Equation
(4.18) follows from Lemma 2.3 and Lemma 4.2.

Lemma 4.3 is of interest in that for finite order Toeplitz matrices one can
find the rate of convergence of the eigenvalue moments. It can be shown that
$K \leq s M_f^{s-1}$.
The above two lemmas show that we can immediately apply the results
of Chapter 2 to $T_n$ and $C_n$. Although Theorem 2.1 gives us immediate hope
of fruitfully studying inverses and products of Toeplitz matrices, it is not yet
clear that we have simplified the study of the eigenvalues. The next lemma
clarifies that we have indeed found a useful approximation.
Lemma 4.4 Let $C_n(f)$ be constructed from $T_n(f)$ as in (4.14) and let $\psi_{n,k}$
be the eigenvalues of $C_n(f)$; then for any positive integer $s$ we have

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} f^s(\lambda)\, d\lambda . \qquad (4.19)
$$

If $T_n(f)$ and hence $C_n(f)$ are Hermitian, then for any function $F(x)$ continuous
on $[m_f, M_f]$ we have

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\psi_{n,k})
= (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda . \qquad (4.20)
$$
Proof.

From Chapter 3 we have exactly

$$
\psi_{n,j} = \sum_{k=0}^{n-1} c_k^{(n)} e^{-2\pi i jk/n}
= \sum_{k=0}^{m} t_{-k} e^{-2\pi i jk/n} + \sum_{k=n-m}^{n-1} t_{n-k} e^{-2\pi i jk/n}
= \sum_{k=-m}^{m} t_k e^{2\pi i jk/n} = f(2\pi j/n) . \qquad (4.21)
$$

Note that the eigenvalues of $C_n$ are simply the values of $f(\lambda)$ with $\lambda$ uniformly
spaced between $0$ and $2\pi$. Defining $2\pi k/n = \lambda_k$ and $2\pi/n = \Delta\lambda$ we have

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s
= \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(2\pi k/n)^s
= \lim_{n\to\infty} \sum_{k=0}^{n-1} f(\lambda_k)^s\, \Delta\lambda/(2\pi)
= (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda , \qquad (4.22)
$$

where the continuity of $f(\lambda)$ guarantees the existence of the limit of (4.22)
as a Riemann integral. If $T_n$ and $C_n$ are Hermitian, then the $\psi_{n,k}$ and $f(\lambda)$
are real and application of the Stone-Weierstrass theorem to (4.22) yields
(4.20). Lemma 4.2 and (4.21) ensure that $\psi_{n,k}$ and $\tau_{n,k}$ are in the real interval
$[m_f, M_f]$.
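The identity (4.21) says that the DFT of the top row (4.15) reproduces samples of $f$, which can be checked directly. A small sketch (NumPy assumed; the band values are made up):

```python
import numpy as np

m, n = 2, 16
# hypothetical symmetric band, so f is real: f(lam) = 2 + cos(lam) - 0.5 cos(2 lam)
t = {0: 2.0, 1: 0.5, -1: 0.5, 2: -0.25, -2: -0.25}

# top row of C_n from (4.15)
c = np.zeros(n)
for k in range(m + 1):
    c[k] = t.get(-k, 0.0)
for k in range(n - m, n):
    c[k] = t[n - k]

# eigenvalues of a circulant: psi_j = sum_k c_k e^{-2 pi i jk/n}, i.e. the DFT
psi = np.fft.fft(c)
lam = 2 * np.pi * np.arange(n) / n
f = sum(t[k] * np.exp(1j * k * lam) for k in range(-m, m + 1))
assert np.allclose(psi, f)   # psi_{n,j} = f(2 pi j / n), as in (4.21)
```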
Combining Lemmas 4.2 through 4.4 and Theorem 2.2 we have the following special
case of the fundamental eigenvalue distribution theorem.
Theorem 4.1 If $T_n(f)$ is a finite order Toeplitz matrix with eigenvalues $\tau_{n,k}$,
then for any positive integer $s$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda . \qquad (4.23)
$$

Furthermore, if $T_n(f)$ is Hermitian, then for any function $F(x)$ continuous
on $[m_f, M_f]$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\tau_{n,k})
= (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda ; \qquad (4.24)
$$

i.e., the sequences $\tau_{n,k}$ and $f(2\pi k/n)$ are asymptotically equally distributed.

This behavior should seem reasonable since the equations $T_n x = \tau x$ and
$C_n x = \psi x$, $n > 2m+1$, are essentially the same $n$th order difference equation
with different boundary conditions. It is in fact the "nice" boundary conditions
that make $\psi$ easy to find exactly while exact solutions for $\tau$ are usually
intractable.
With the eigenvalue problem in hand we could next write down theorems
on inverses and products of Toeplitz matrices using Lemma 4.2 and the results
of Chapters 2 and 3. Since these theorems are identical in statement and proof
with the infinite order absolutely summable Toeplitz case, we defer these
theorems momentarily and generalize Theorem 4.1 to more general Toeplitz
matrices with no assumption of finite order.
4.2 Toeplitz Matrices
Obviously the choice of an appropriate circulant matrix to approximate a
Toeplitz matrix is not unique, so we are free to choose a construction with
the most desirable properties. It will, in fact, prove useful to consider two
slightly different circulant approximations to a given Toeplitz matrix. Say
we have an absolutely summable sequence $\{t_k;\ k = 0, \pm 1, \pm 2, \dots\}$ with

$$
f(\lambda) = \sum_{k=-\infty}^{\infty} t_k e^{ik\lambda} , \qquad
t_k = (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{-ik\lambda}\, d\lambda . \qquad (4.25)
$$
Define $C_n(f)$ to be the circulant matrix with top row $(c_0^{(n)}, c_1^{(n)}, \dots, c_{n-1}^{(n)})$
where

$$
c_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n) e^{2\pi i jk/n} . \qquad (4.26)
$$

Since $f(\lambda)$ is Riemann integrable, we have that for fixed $k$

$$
\lim_{n\to\infty} c_k^{(n)}
= \lim_{n\to\infty} n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n) e^{2\pi i jk/n}
= (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{ik\lambda}\, d\lambda = t_{-k} \qquad (4.27)
$$

and hence the $c_k^{(n)}$ are simply the Riemann sum approximations to the integral
giving $t_{-k}$.
Equations (4.26), (3.7), and (3.9) show that the eigenvalues $\psi_{n,m}$
of $C_n$ are simply $f(2\pi m/n)$; that is, from (3.7) and (3.9)

$$
\psi_{n,m} = \sum_{k=0}^{n-1} c_k^{(n)} e^{-2\pi i mk/n}
= \sum_{k=0}^{n-1} \left[ n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n) e^{2\pi i jk/n} \right] e^{-2\pi i mk/n}
= \sum_{j=0}^{n-1} f(2\pi j/n) \left[ n^{-1} \sum_{k=0}^{n-1} e^{2\pi i k(j-m)/n} \right]
= f(2\pi m/n) . \qquad (4.28)
$$
Thus, $C_n(f)$ has the useful property (4.21) of the circulant approximation
(4.15) used in the finite case. As a result, the conclusions of Lemma 4.4
hold for the more general case with $C_n(f)$ constructed as in (4.26). Equation
(4.28) in turn defines $C_n(f)$ since, if we are told that $C_n$ is a circulant matrix
with eigenvalues $f(2\pi m/n)$, $m = 0, 1, \dots, n-1$, then from (3.9)

$$
c_k^{(n)} = n^{-1} \sum_{m=0}^{n-1} \psi_{n,m} e^{2\pi i mk/n}
= n^{-1} \sum_{m=0}^{n-1} f(2\pi m/n) e^{2\pi i mk/n} , \qquad (4.29)
$$
as in (4.26). Thus, either (4.26) or (4.28) can be used to define $C_n(f)$.
The fact that Lemma 4.4 holds for $C_n(f)$ yields several useful properties
as summarized by the following lemma.
Lemma 4.5

1. Given a function $f$ of (4.25) and the circulant matrix $C_n(f)$ defined by
(4.26), then

$$
c_k^{(n)} = \sum_{m=-\infty}^{\infty} t_{-k+mn} , \qquad k = 0, 1, \dots, n-1 . \qquad (4.30)
$$

(Note, the sum exists since the $t_k$ are absolutely summable.)

2. Given $T_n(f)$ where $f(\lambda)$ is real and $f(\lambda) \geq m_f > 0$, then

$$ C_n(f)^{-1} = C_n(1/f) . $$

3. Given two functions $f(\lambda)$ and $g(\lambda)$, then

$$ C_n(f) C_n(g) = C_n(fg) . $$
Proof.

1. Since $e^{-2\pi i mk/n}$ is periodic with period $n$, we have that

$$
f(2\pi j/n) = \sum_{m=-\infty}^{\infty} t_m e^{i 2\pi jm/n}
= \sum_{m=-\infty}^{\infty} t_{-m} e^{-i 2\pi jm/n}
= \sum_{l=0}^{n-1} \sum_{m=-\infty}^{\infty} t_{-l+mn} e^{-2\pi i jl/n}
$$

and hence from (4.26) and (3.9)

$$
c_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n) e^{2\pi i jk/n}
= n^{-1} \sum_{j=0}^{n-1} \sum_{l=0}^{n-1} \sum_{m=-\infty}^{\infty} t_{-l+mn} e^{2\pi i j(k-l)/n}
= \sum_{l=0}^{n-1} \sum_{m=-\infty}^{\infty} t_{-l+mn} \left[ n^{-1} \sum_{j=0}^{n-1} e^{2\pi i j(k-l)/n} \right]
= \sum_{m=-\infty}^{\infty} t_{-k+mn} .
$$

2. Since $C_n(f)$ has eigenvalues $f(2\pi k/n) > 0$, by Theorem 3.1, $C_n(f)^{-1}$
has eigenvalues $1/f(2\pi k/n)$, and hence from (4.29) and the fact that
$C_n(f)^{-1}$ is circulant we have $C_n(f)^{-1} = C_n(1/f)$.

3. Follows immediately from Theorem 3.1 and the fact that, if $f(\lambda)$ and
$g(\lambda)$ are Riemann integrable, so is $f(\lambda) g(\lambda)$.
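Parts 2 and 3 of Lemma 4.5 are exact finite-$n$ identities and can be verified directly in floating point. A sketch (NumPy assumed; $f$ and $g$ here are arbitrary positive test functions, not from the report):

```python
import numpy as np

def C_n(fvals):
    # circulant with eigenvalues fvals[j] = f(2 pi j / n);
    # top row from (4.26) is the inverse DFT of the samples
    c = np.fft.ifft(fvals)
    n = len(c)
    return np.array([[c[(k - j) % n] for k in range(n)] for j in range(n)])

n = 12
lam = 2 * np.pi * np.arange(n) / n
f = 2.0 + np.cos(lam)            # real, f >= 1 > 0
g = 1.5 + 0.5 * np.sin(lam)      # real, g >= 1 > 0

# Lemma 4.5, part 2: C_n(f)^{-1} = C_n(1/f)
assert np.allclose(np.linalg.inv(C_n(f)), C_n(1.0 / f))
# Lemma 4.5, part 3: C_n(f) C_n(g) = C_n(fg)
assert np.allclose(C_n(f) @ C_n(g), C_n(f * g))
```

Both identities hold exactly (to rounding) because all the circulants share the DFT eigenvectors of Theorem 3.1.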
Equation (4.30) points out a shortcoming of $C_n(f)$ for applications as
a circulant approximation to $T_n(f)$: it depends on the entire sequence
$\{t_k;\ k = 0, \pm 1, \pm 2, \dots\}$ and not just on the finite collection of elements
$\{t_k;\ k = 0, \pm 1, \dots, \pm(n-1)\}$ of $T_n(f)$. This can cause problems in practical
situations where we wish a circulant approximation to a Toeplitz matrix
$T_n$ when we only know $T_n$ and not $f$. Pearl [13] discusses several coding and
filtering applications where this restriction is necessary for practical reasons.
A natural such approximation is to form the truncated Fourier series

$$
\hat{f}_n(\lambda) = \sum_{k=-n}^{n} t_k e^{ik\lambda} , \qquad (4.31)
$$

which depends only on the elements $t_k$ with $|k| \leq n$, and then define the
circulant matrix

$$
\hat{C}_n = C_n(\hat{f}_n) ; \qquad (4.32)
$$
that is, the circulant matrix having as top row $(\hat{c}_0^{(n)}, \dots, \hat{c}_{n-1}^{(n)})$ where

$$
\hat{c}_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} \hat{f}_n(2\pi j/n) e^{2\pi i jk/n}
= n^{-1} \sum_{j=0}^{n-1} \left[ \sum_{m=-n}^{n} t_m e^{2\pi i jm/n} \right] e^{2\pi i jk/n}
= \sum_{m=-n}^{n} t_m \left[ n^{-1} \sum_{j=0}^{n-1} e^{2\pi i j(k+m)/n} \right] . \qquad (4.33)
$$

The last term in brackets is, from (3.9), 1 if $m = -k$ or $m = n-k$, and
hence

$$
\hat{c}_k^{(n)} = t_{-k} + t_{n-k} , \qquad k = 0, 1, \dots, n-1 .
$$
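The aliasing formula $\hat{c}_k^{(n)} = t_{-k} + t_{n-k}$ can be sanity-checked numerically when the sequence has short support, so that the $|m| = n$ edge terms of (4.31) vanish. A sketch under that assumption (NumPy assumed; the geometric sequence is made up):

```python
import numpy as np

n, K = 8, 3
# made-up absolutely summable sequence, supported on |k| <= K < n
t = lambda k: 0.5 ** abs(k) if abs(k) <= K else 0.0

lam = 2 * np.pi * np.arange(n) / n
fhat = sum(t(m) * np.exp(1j * m * lam) for m in range(-n, n + 1))
# (4.33): chat_k = n^{-1} sum_j fhat(2 pi j/n) e^{2 pi i jk/n} = inverse DFT
chat = np.fft.ifft(fhat)

expected = np.array([t(-k) + t(n - k) for k in range(n)])
assert np.allclose(chat, expected)
```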
Note that both $C_n(f)$ and $\hat{C}_n = C_n(\hat{f}_n)$ reduce to the $C_n(f)$ of (4.15) for an
$r$th order Toeplitz matrix if $n > 2r + 1$.
The matrix $\hat{C}_n$ does not have the property (4.28) of having eigenvalues
$f(2\pi k/n)$ in the general case (its eigenvalues are $\hat{f}_n(2\pi k/n)$, $k = 0, 1, \dots, n-1$),
but it does have the desirable property of depending only on the
entries of $T_n$. The following lemma shows that these circulant matrices are
asymptotically equivalent to each other and to $T_n$.
Lemma 4.6 Let $T_n = \{t_{k-j}\}$ where

$$ \sum_{k=-\infty}^{\infty} |t_k| < \infty $$

and define as usual

$$ f(\lambda) = \sum_{k=-\infty}^{\infty} t_k e^{ik\lambda} . $$

Define the circulant matrices $C_n(f)$ and $\hat{C}_n = C_n(\hat{f}_n)$ as in (4.26) and (4.31)-(4.32). Then,

$$ C_n(f) \sim \hat{C}_n \sim T_n . \qquad (4.34) $$
Proof.

Since both $C_n(f)$ and $\hat{C}_n$ are circulant matrices with the same eigenvectors
(Theorem 3.1), we have from part 2 of Theorem 3.1 and (2.14) and the
comment following it that

$$
|C_n(f) - \hat{C}_n|^2 = n^{-1} \sum_{k=0}^{n-1} |f(2\pi k/n) - \hat{f}_n(2\pi k/n)|^2 .
$$

Recall from (4.4) and the related discussion that $\hat{f}_n(\lambda)$ uniformly converges
to $f(\lambda)$, and hence given $\epsilon > 0$ there is an $N$ such that for $n \geq N$ we have
for all $k$, $n$ that

$$
|f(2\pi k/n) - \hat{f}_n(2\pi k/n)|^2 \leq \epsilon
$$

and hence for $n \geq N$

$$
|C_n(f) - \hat{C}_n|^2 \leq n^{-1} \sum_{i=0}^{n-1} \epsilon = \epsilon .
$$

Since $\epsilon$ is arbitrary,

$$ \lim_{n\to\infty} |C_n(f) - \hat{C}_n| = 0 $$

proving that

$$ C_n(f) \sim \hat{C}_n . \qquad (4.35) $$
Next, write $\hat{C}_n = \{t'_{k-j}\}$ and use (4.12) to obtain

$$
|T_n - \hat{C}_n|^2 = \sum_{k=-(n-1)}^{n-1} (1 - |k|/n)\, |t_k - t'_k|^2 .
$$

From (4.33) we have that

$$
t'_k = \begin{cases}
\hat{c}_{-k}^{(n)} = t_k + t_{n+k} & k \leq 0 \\
\hat{c}_{n-k}^{(n)} = t_{-(n-k)} + t_k & k > 0
\end{cases} \qquad (4.36)
$$

and hence

$$
|T_n - \hat{C}_n|^2 = |t_{n-1}|^2 + \sum_{k=0}^{n-1} (1 - k/n) \left( |t_{n-k}|^2 + |t_{-(n-k)}|^2 \right)
= |t_{n-1}|^2 + \sum_{k=0}^{n-1} (k/n) \left( |t_k|^2 + |t_{-k}|^2 \right) .
$$
Since the $\{t_k\}$ are absolutely summable,

$$ \lim_{n\to\infty} |t_{n-1}|^2 = 0 $$

and given $\epsilon > 0$ we can choose an $N$ large enough so that

$$ \sum_{k=N}^{\infty} \left( |t_k|^2 + |t_{-k}|^2 \right) \leq \epsilon $$

and hence

$$
\lim_{n\to\infty} |T_n - \hat{C}_n|^2
= \lim_{n\to\infty} \sum_{k=0}^{n-1} (k/n) \left( |t_k|^2 + |t_{-k}|^2 \right)
$$
$$
= \lim_{n\to\infty} \left[ \sum_{k=0}^{N-1} (k/n)(|t_k|^2 + |t_{-k}|^2)
+ \sum_{k=N}^{n-1} (k/n)(|t_k|^2 + |t_{-k}|^2) \right]
$$
$$
\leq \lim_{n\to\infty} n^{-1} \left[ \sum_{k=0}^{N-1} k (|t_k|^2 + |t_{-k}|^2) \right]
+ \sum_{k=N}^{\infty} (|t_k|^2 + |t_{-k}|^2) \leq \epsilon .
$$
Since $\epsilon$ is arbitrary,

$$ \lim_{n\to\infty} |T_n - \hat{C}_n| = 0 $$

and hence

$$ T_n \sim \hat{C}_n , \qquad (4.37) $$

which with (4.35) and Theorem 2.1 proves (4.34).

We note that Pearl [13] develops a circulant matrix similar to $\hat{C}_n$ (depending
only on the entries of $T_n$) such that (4.37) holds in the more general
case where (4.1) instead of (4.2) holds.
We now have a circulant matrix $C_n(f)$ asymptotically equivalent to $T_n$
and whose eigenvalues, inverses and products are known exactly. We can
now use Lemmas 4.2 through 4.4 and Theorems 2.2 and 2.3 to immediately generalize
Theorem 4.1.
Theorem 4.2 Let $T_n(f)$ be a sequence of Toeplitz matrices such that $f(\lambda)$
is Riemann integrable, e.g., $f(\lambda)$ is bounded or the sequence $t_k$ is absolutely
summable. Then if $\tau_{n,k}$ are the eigenvalues of $T_n$ and $s$ is any positive integer

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda . \qquad (4.38)
$$

Furthermore, if $T_n(f)$ is Hermitian ($f(\lambda)$ is real) then for any function $F(x)$
continuous on $[m_f, M_f]$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\tau_{n,k})
= (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda . \qquad (4.39)
$$
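As a concrete illustration of (4.38), take $f(\lambda) = 2 + \cos\lambda$, i.e., $t_0 = 2$ and $t_{\pm 1} = 1/2$; then $(2\pi)^{-1}\int_0^{2\pi}(2+\cos\lambda)^2 d\lambda = 4.5$, and the second eigenvalue moment of $T_n$ should approach this value. A minimal numerical sketch (NumPy assumed; the example spectrum is our choice, not from the report):

```python
import numpy as np

# tridiagonal Toeplitz matrix for f(lam) = 2 + cos(lam): t_0 = 2, t_{+-1} = 1/2
n = 200
T = 2.0 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))
tau = np.linalg.eigvalsh(T)

s = 2
moment = np.mean(tau ** s)       # n^{-1} sum of tau^s
integral = 4.5                    # (2 pi)^{-1} int (2 + cos)^2 = 4 + 1/2
assert abs(moment - integral) < 0.01
```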
Theorem 4.2 is the fundamental eigenvalue distribution theorem of Szegő
[1]. The approach used here is essentially a specialization of Grenander's [1,
ch. 7].

Theorem 4.2 yields the following two corollaries.
Corollary 4.1 Let $T_n(f)$ be Hermitian and define the eigenvalue distribution
function $D_n(x) = n^{-1} (\text{number of } \tau_{n,k} \leq x)$. Assume that

$$ \int_{\lambda : f(\lambda) = x} d\lambda = 0 . $$

Then the limiting distribution $D(x) = \lim_{n\to\infty} D_n(x)$ exists and is given by

$$ D(x) = (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda . $$

The technical condition of a zero integral over the set of $\lambda$
for which $f(\lambda) = x$ is needed to ensure that $x$ is a point of continuity of the
limiting distribution.
Proof.

Define the characteristic function

$$
1_x(\alpha) = \begin{cases}
1 & m_f \leq \alpha \leq x \\
0 & \text{otherwise}
\end{cases} .
$$

We have

$$ D(x) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x(\tau_{n,k}) . $$

Unfortunately, $1_x(\alpha)$ is not a continuous function and hence Theorem 4.2 cannot
be immediately applied. To get around this problem we mimic Grenander
and Szegő p. 115 and define two continuous functions that provide upper
and lower bounds to $1_x$ and will converge to it in the limit. Define
$$
1_x^+(\alpha) = \begin{cases}
1 & \alpha \leq x \\
1 - \dfrac{\alpha - x}{\epsilon} & x < \alpha \leq x + \epsilon \\
0 & x + \epsilon < \alpha
\end{cases}
$$

$$
1_x^-(\alpha) = \begin{cases}
1 & \alpha \leq x - \epsilon \\
1 - \dfrac{\alpha - x + \epsilon}{\epsilon} & x - \epsilon < \alpha \leq x \\
0 & x < \alpha
\end{cases}
$$

The idea here is that the upper bound has an output of 1 everywhere $1_x$ does,
but then it drops in a continuous linear fashion to zero at $x + \epsilon$ instead of
immediately at $x$. The lower bound is 0 everywhere $1_x$ is and it rises
linearly from $x - \epsilon$ to $x$ to the value of 1 instead of instantaneously as does
$1_x$. Clearly

$$ 1_x^-(\alpha) < 1_x(\alpha) < 1_x^+(\alpha) $$

for all $\alpha$.
Since both $1_x^+$ and $1_x^-$ are continuous, Theorem 4.2 can be used to conclude
that

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x^+(\tau_{n,k})
= (2\pi)^{-1} \int 1_x^+(f(\lambda))\, d\lambda
$$
$$
= (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
+ (2\pi)^{-1} \int_{x < f(\lambda) \leq x+\epsilon} \left( 1 - \frac{f(\lambda) - x}{\epsilon} \right) d\lambda
$$
$$
\leq (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
+ (2\pi)^{-1} \int_{x < f(\lambda) \leq x+\epsilon} d\lambda
$$
and

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x^-(\tau_{n,k})
= (2\pi)^{-1} \int 1_x^-(f(\lambda))\, d\lambda
$$
$$
= (2\pi)^{-1} \int_{f(\lambda) \leq x-\epsilon} d\lambda
+ (2\pi)^{-1} \int_{x-\epsilon < f(\lambda) \leq x} \left( 1 - \frac{f(\lambda) - (x - \epsilon)}{\epsilon} \right) d\lambda
$$
$$
= (2\pi)^{-1} \int_{f(\lambda) \leq x-\epsilon} d\lambda
+ (2\pi)^{-1} \int_{x-\epsilon < f(\lambda) \leq x} \frac{x - f(\lambda)}{\epsilon}\, d\lambda
$$
$$
\geq (2\pi)^{-1} \int_{f(\lambda) \leq x-\epsilon} d\lambda
= (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
- (2\pi)^{-1} \int_{x-\epsilon < f(\lambda) \leq x} d\lambda .
$$
These inequalities imply that for any $\epsilon > 0$, as $n$ grows the sample average
$n^{-1} \sum_{k=0}^{n-1} 1_x(\tau_{n,k})$ will be sandwiched between

$$
(2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
+ (2\pi)^{-1} \int_{x < f(\lambda) \leq x+\epsilon} d\lambda
$$

and

$$
(2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
- (2\pi)^{-1} \int_{x-\epsilon < f(\lambda) \leq x} d\lambda .
$$
Since $\epsilon$ can be made arbitrarily small, this means the sum will be sandwiched
between

$$ (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda $$

and

$$
(2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda
- (2\pi)^{-1} \int_{f(\lambda) = x} d\lambda .
$$

Thus if

$$ \int_{f(\lambda) = x} d\lambda = 0 , $$

then

$$
D(x) = (2\pi)^{-1} \int_0^{2\pi} 1_x[f(\lambda)]\, d\lambda
= (2\pi)^{-1} \int_{f(\lambda) \leq x} d\lambda .
$$
Corollary 4.2 For $T_n(f)$ Hermitian we have

$$
\lim_{n\to\infty} \max_k \tau_{n,k} = M_f , \qquad
\lim_{n\to\infty} \min_k \tau_{n,k} = m_f .
$$
Proof.

From Corollary 4.1 we have for any $\epsilon > 0$

$$
D(m_f + \epsilon) = (2\pi)^{-1} \int_{f(\lambda) \leq m_f + \epsilon} d\lambda > 0 .
$$

The strict inequality follows from the continuity of $f(\lambda)$. Since

$$
\lim_{n\to\infty} n^{-1} \{\text{number of } \tau_{n,k} \text{ in } [m_f, m_f + \epsilon]\} > 0
$$

there must be eigenvalues in the interval $[m_f, m_f + \epsilon]$ for arbitrarily small $\epsilon$.
Since $\tau_{n,k} \geq m_f$ by Lemma 4.1, the minimum result is proved. The maximum
result is proved similarly.
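For the example $f(\lambda) = 2 + \cos\lambda$ ($m_f = 1$, $M_f = 3$), Lemma 4.1 keeps all eigenvalues inside $[m_f, M_f]$ while Corollary 4.2 says the extremes approach the endpoints. A sketch (NumPy assumed; the example is ours):

```python
import numpy as np

def trid(n):
    # Toeplitz matrix of f(lam) = 2 + cos(lam): t_0 = 2, t_{+-1} = 1/2
    return 2.0 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

# eigenvalues always stay in [m_f, M_f] = [1, 3] (Lemma 4.1)
for n in (10, 50, 200):
    tau = np.linalg.eigvalsh(trid(n))
    assert tau.min() >= 1.0 - 1e-9 and tau.max() <= 3.0 + 1e-9

# and the extremes approach m_f and M_f as n grows (Corollary 4.2)
tau = np.linalg.eigvalsh(trid(200))
assert tau.min() < 1.001 and tau.max() > 2.999
```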
We next consider the inverse of an Hermitian Toeplitz matrix.
Theorem 4.3 Let $T_n(f)$ be a sequence of Hermitian Toeplitz matrices such
that $f(\lambda)$ is Riemann integrable and $f(\lambda) \geq 0$ with equality holding at most
at a countable number of points.

1. $T_n(f)$ is nonsingular.

2. If $f(\lambda) \geq m_f > 0$, then

$$ T_n(f)^{-1} \sim C_n(f)^{-1} , \qquad (4.40) $$

where $C_n(f)$ is defined in (4.29). Furthermore, if we define $T_n(f) - C_n(f) = D_n$ then $T_n(f)^{-1}$ has the expansion

$$
T_n(f)^{-1} = [C_n(f) + D_n]^{-1}
= C_n(f)^{-1} \left[ I + D_n C_n(f)^{-1} \right]^{-1}
= C_n(f)^{-1} \left[ I - D_n C_n(f)^{-1} + \left( D_n C_n(f)^{-1} \right)^2 - \cdots \right] \qquad (4.41)
$$

and the expansion converges (in weak norm) for sufficiently large $n$.
3. If $f(\lambda) \geq m_f > 0$, then

$$
T_n(f)^{-1} \sim T_n(1/f)
= \left\{ (2\pi)^{-1} \int_{-\pi}^{\pi} \frac{e^{i(k-j)\lambda}}{f(\lambda)}\, d\lambda \right\} ; \qquad (4.42)
$$

that is, if the spectrum is strictly positive then the inverse of a Toeplitz
matrix is asymptotically Toeplitz. Furthermore, if $\rho_{n,k}$ are the eigenvalues
of $T_n(f)^{-1}$ and $F(x)$ is any continuous function on $[1/M_f, 1/m_f]$,
then

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\rho_{n,k})
= (2\pi)^{-1} \int_{-\pi}^{\pi} F[1/f(\lambda)]\, d\lambda . \qquad (4.43)
$$

4. If $m_f = 0$, $f(\lambda)$ has at least one zero, and the derivative of $f(\lambda)$ exists
and is bounded, then $T_n(f)^{-1}$ is not bounded, $1/f(\lambda)$ is not integrable
and hence $T_n(1/f)$ is not defined and the integrals of (4.43) may not
exist. For any finite $\theta$, however, the following similar fact is true: if
$F(x)$ is a continuous function on $[1/M_f, \theta]$, then

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F[\min(\rho_{n,k}, \theta)]
= (2\pi)^{-1} \int_0^{2\pi} F[\min(1/f(\lambda), \theta)]\, d\lambda . \qquad (4.44)
$$
Proof.

1. Since $f(\lambda) > 0$ except possibly at a countable number of points, we have
from (4.9)

$$
x^* T_n x = \frac{1}{2\pi} \int_{-\pi}^{\pi}
\left| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \right|^2 f(\lambda)\, d\lambda > 0
$$

so that for all $n$

$$ \min_k \tau_{n,k} > 0 $$

and hence

$$ \det T_n = \prod_{k=0}^{n-1} \tau_{n,k} \neq 0 , $$

so that $T_n(f)$ is nonsingular.
2. From Lemma 4.6, $T_n \sim C_n$ and hence (4.40) follows from Theorem 2.1
since $f(\lambda) \geq m_f > 0$ ensures that

$$ \| T_n^{-1} \|, \| C_n^{-1} \| \leq 1/m_f < \infty . $$

The series of (4.41) will converge in weak norm if

$$ |D_n C_n^{-1}| < 1 ; \qquad (4.45) $$

since

$$
|D_n C_n^{-1}| \leq \| C_n^{-1} \|\, |D_n| \leq (1/m_f) |D_n|
\mathop{\longrightarrow}_{n\to\infty} 0 ,
$$

(4.45) must hold for large enough $n$. From (4.40), moreover, if $n$ is large
enough then the first term of the series alone may be a sufficient approximation.
3. We have

$$
|T_n(f)^{-1} - T_n(1/f)|
\leq |T_n(f)^{-1} - C_n(f)^{-1}| + |C_n(f)^{-1} - T_n(1/f)| .
$$

From part 2, for any $\epsilon > 0$ we can choose $n$ large enough so that

$$ |T_n(f)^{-1} - C_n(f)^{-1}| \leq \epsilon/2 . \qquad (4.46) $$

From Theorem 3.1 and Lemma 4.5, $C_n(f)^{-1} = C_n(1/f)$ and from
Lemma 4.6 $C_n(1/f) \sim T_n(1/f)$. Thus again we can choose $n$ large
enough to ensure that

$$ |C_n(f)^{-1} - T_n(1/f)| \leq \epsilon/2 , \qquad (4.47) $$

so that for any $\epsilon > 0$ from (4.46)-(4.47) we can choose $n$ such that

$$ |T_n(f)^{-1} - T_n(1/f)| \leq \epsilon , $$

which is (4.42). Equation (4.43) follows from (4.42) and Theorem 2.4.
Alternatively, (4.43) follows directly from Lemma 4.6 and Theorem 2.4
applied to $G(1/x)$, where $G(x)$ is any continuous function on $[1/M_f, 1/m_f]$.
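Part 3 can be observed numerically: the entries of $T_n(1/f)$ are computable by a fine Riemann sum, and the weak norm of $T_n(f)^{-1} - T_n(1/f)$ shrinks as $n$ grows. A sketch (NumPy assumed; $f = 2 + \cos\lambda$ is a made-up strictly positive example):

```python
import numpy as np

def toeplitz_mat(tfun, n):
    # T[i, j] = t_{i-j} with tfun giving the Fourier coefficients
    return np.array([[tfun(i - j) for j in range(n)] for i in range(n)])

def weak_norm(A):
    return np.sqrt(np.mean(np.abs(A) ** 2))

# Fourier coefficients of 1/f for f(lam) = 2 + cos(lam), via a fine Riemann sum
M = 4096
grid = 2 * np.pi * np.arange(M) / M
inv_f = 1.0 / (2.0 + np.cos(grid))
ks = np.arange(-127, 128)
vals = np.array([np.mean(inv_f * np.exp(-1j * k * grid)).real for k in ks])
inv_coef = lambda k: vals[k + 127]

band = lambda k: {0: 2.0, 1: 0.5, -1: 0.5}.get(k, 0.0)   # t_k of f
norms = []
for n in (8, 32, 128):
    T = toeplitz_mat(band, n)
    norms.append(weak_norm(np.linalg.inv(T) - toeplitz_mat(inv_coef, n)))
# |T_n(f)^{-1} - T_n(1/f)| decreases toward 0, illustrating (4.42)
assert norms[0] > norms[1] > norms[2]
```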
4. When $f(\lambda)$ has zeros ($m_f = 0$), then from Corollary 4.2
$\lim_{n\to\infty} \min_k \tau_{n,k} = 0$ and hence

$$
\| T_n^{-1} \| = \max_k \rho_{n,k} = 1 / \min_k \tau_{n,k} \qquad (4.48)
$$

is unbounded as $n \to \infty$. To prove that $1/f(\lambda)$ is not integrable and
hence that $T_n(1/f)$ does not exist we define the sets

$$
E_k = \{\lambda : 1/k \geq f(\lambda)/M_f > 1/(k+1)\}
= \{\lambda : k \leq M_f/f(\lambda) < k+1\} . \qquad (4.49)
$$

Since $f(\lambda)$ is continuous and has at least one zero, all of these
sets are nonempty intervals of size, say, $|E_k|$. From (4.49)

$$
\int_{-\pi}^{\pi} \frac{d\lambda}{f(\lambda)} \geq \sum_{k=1}^{\infty} |E_k|\, k/M_f . \qquad (4.50)
$$

Since $f(\lambda)$ is differentiable there is some finite value $\eta$ such that

$$ \left| \frac{df}{d\lambda} \right| \leq \eta . \qquad (4.51) $$

From (4.50) and (4.51)

$$
\int_{-\pi}^{\pi} \frac{d\lambda}{f(\lambda)}
\geq \sum_{k=1}^{\infty} (k/M_f) \left( 1/k - 1/(k+1) \right)/\eta
= (M_f \eta)^{-1} \sum_{k=1}^{\infty} \frac{1}{k+1} , \qquad (4.52)
$$

which diverges, so that $1/f(\lambda)$ is not integrable. To prove (4.44) let $F(x)$
be continuous on $[1/M_f, \theta]$; then $F[\min(1/x, \theta)]$ is continuous on $[0, M_f]$
and hence Theorem 2.4 yields (4.44). Note that (4.44) implies that the
eigenvalues of $T_n^{-1}$ are asymptotically equally distributed up to any
finite $\theta$ as the eigenvalues of the sequence of matrices $T_n[\min(1/f, \theta)]$.
A special case of part 4 is when $T_n(f)$ is finite order and $f(\lambda)$ has at least
one zero. Then the derivative exists and is bounded since

$$
\left| \frac{df}{d\lambda} \right|
= \left| \sum_{k=-m}^{m} i k\, t_k e^{ik\lambda} \right|
\leq \sum_{k=-m}^{m} |k|\, |t_k| < \infty .
$$
The series expansion of part 2 is due to Rino [6]. The proof of part 4
is motivated by one of Widom [2]. Further results along the lines of part 4
regarding unbounded Toeplitz matrices may be found in [5].

Extending part 1 to the case of non-Hermitian matrices can be somewhat
difficult, i.e., finding conditions on $f(\lambda)$ to ensure that $T_n(f)$ is invertible.
Parts 1 through 4 can be straightforwardly extended if $f(\lambda)$ is continuous. For a
more general discussion of inverses the interested reader is referred to Widom
[2] and the references listed in that paper. It should be pointed out that when
discussing inverses Widom is concerned with the asymptotic behavior of finite
matrices. As one might expect, the results are similar. The results of Baxter
[7] can also be applied to consider the asymptotic behavior of finite inverses
in quite general cases.
We next combine Theorem 2.1 and Lemma 4.6 to obtain the asymptotic
behavior of products of Toeplitz matrices. The case of only two matrices is
considered first since it is simpler.
Theorem 4.4 Let $T_n(f)$ and $T_n(g)$ be defined as in (4.5) where $f(\lambda)$ and
$g(\lambda)$ are two bounded Riemann integrable functions. Define $C_n(f)$ and $C_n(g)$
as in (4.29) and let $\rho_{n,k}$ be the eigenvalues of $T_n(f) T_n(g)$.

1.

$$ T_n(f) T_n(g) \sim C_n(f) C_n(g) = C_n(fg) . \qquad (4.53) $$

$$ T_n(f) T_n(g) \sim T_n(g) T_n(f) . \qquad (4.54) $$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \rho_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} [f(\lambda) g(\lambda)]^s\, d\lambda ,
\qquad s = 1, 2, \dots . \qquad (4.55)
$$

2. If $T_n(f)$ and $T_n(g)$ are Hermitian, then for any $F(x)$ continuous on
$[m_f m_g, M_f M_g]$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\rho_{n,k})
= (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda) g(\lambda)]\, d\lambda . \qquad (4.56)
$$

3.

$$ T_n(f) T_n(g) \sim T_n(fg) . \qquad (4.57) $$

4. Let $f_1(\lambda), \dots, f_m(\lambda)$ be Riemann integrable. Then if the $C_n(f_i)$ are
defined as in (4.29),

$$
\prod_{i=1}^{m} T_n(f_i) \sim C_n\left( \prod_{i=1}^{m} f_i \right)
\sim T_n\left( \prod_{i=1}^{m} f_i \right) . \qquad (4.58)
$$

5. If $\rho_{n,k}$ are the eigenvalues of $\prod_{i=1}^{m} T_n(f_i)$, then for any positive integer $s$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \rho_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} \left[ \prod_{i=1}^{m} f_i(\lambda) \right]^s d\lambda . \qquad (4.59)
$$

If the $T_n(f_i)$ are Hermitian, then the $\rho_{n,k}$ are asymptotically real, i.e.,
the imaginary part converges to a distribution at zero, so that

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \left( \mathrm{Re}[\rho_{n,k}] \right)^s
= (2\pi)^{-1} \int_0^{2\pi} \left[ \prod_{i=1}^{m} f_i(\lambda) \right]^s d\lambda . \qquad (4.60)
$$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \left( \mathrm{Im}[\rho_{n,k}] \right)^2 = 0 . \qquad (4.61)
$$
Proof.

1. Equation (4.53) follows from Lemmas 4.5 and 4.6 and Theorems 2.1
and 3.1. Equation (4.54) follows from (4.53). Note that while Toeplitz
matrices do not in general commute, asymptotically they do. Equation
(4.55) follows from (4.53), Theorem 2.2, and Lemma 4.4.

2. Proof follows from (4.53) and Theorem 2.4. Note that the eigenvalues
of the product of two Hermitian matrices are real [3, p. 105].
3. Applying Lemmas 4.5 and 4.6 and Theorem 2.1,

$$
|T_n(f) T_n(g) - T_n(fg)|
= |T_n(f) T_n(g) - C_n(f) C_n(g) + C_n(f) C_n(g) - T_n(fg)|
$$
$$
\leq |T_n(f) T_n(g) - C_n(f) C_n(g)| + |C_n(fg) - T_n(fg)|
\mathop{\longrightarrow}_{n\to\infty} 0 .
$$

4. Follows from repeated application of (4.53) and part 3.
5. Equation (4.59) follows from part 4 and Theorem 2.2. For the Hermitian
case, however, we cannot simply apply Theorem 2.4 since the eigenvalues
$\rho_{n,k}$ of $\prod_i T_n(f_i)$ may not be real. We can show, however, that they
are asymptotically real. Let $\rho_{n,k} = \alpha_{n,k} + i\beta_{n,k}$ where $\alpha_{n,k}$ and $\beta_{n,k}$ are
real. Then from Theorem 2.2 we have for any positive integer $s$

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k} + i\beta_{n,k})^s
= \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} \left[ \prod_{i=1}^{m} f_i(\lambda) \right]^s d\lambda , \qquad (4.62)
$$

where $\psi_{n,k}$ are the eigenvalues of $C_n\left( \prod_{i=1}^{m} f_i \right)$. From (2.14)

$$
n^{-1} \sum_{k=0}^{n-1} |\rho_{n,k}|^2
= n^{-1} \sum_{k=0}^{n-1} \left( \alpha_{n,k}^2 + \beta_{n,k}^2 \right)
\leq \left| \prod_{i=1}^{m} T_n(f_i) \right|^2 .
$$

From (4.57), Theorem 2.1 and Lemma 4.4

$$
\lim_{n\to\infty} \left| \prod_{i=1}^{m} T_n(f_i) \right|^2
= \lim_{n\to\infty} \left| C_n\left( \prod_{i=1}^{m} f_i \right) \right|^2
= (2\pi)^{-1} \int_0^{2\pi} \left[ \prod_{i=1}^{m} f_i(\lambda) \right]^2 d\lambda . \qquad (4.63)
$$

Subtracting (4.62) for $s = 2$ from (4.63) yields

$$ \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \beta_{n,k}^2 \leq 0 . $$

Thus the distribution of the imaginary parts tends to the origin and
hence

$$
\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s
= (2\pi)^{-1} \int_0^{2\pi} \left[ \prod_{i=1}^{m} f_i(\lambda) \right]^s d\lambda .
$$

Parts 4 and 5 are here proved as in Grenander and Szegő [1, pp. 105-106].
We have developed theorems on the asymptotic behavior of eigenvalues,
inverses, and products of Toeplitz matrices. The basic method has been
to ﬁnd an asymptotically equivalent circulant matrix whose special simple
4.3. TOEPLITZ DETERMINANTS 45
structure as developed in Chapter 3 could be directly related to the Toeplitz
matrices using the results of Chapter 2. We began with the ﬁnite order case
since the appropriate circulant matrix is there obvious and yields certain
desirable properties that suggest the corresponding circulant matrix in the
inﬁnite case. We have limited our consideration of the inﬁnite order case
to absolutely summable coeﬃcients or to bounded Riemann integrable func
tions f(λ) for simplicity. The more general case of square summable t
k
or
bounded Lebesgue integrable f(λ) treated in Chapter 7 of [1] requires sig
niﬁcantly more mathematical care but can be interpreted as an extension of
the approach taken here.
4.3 Toeplitz Determinants
The fundamental Toeplitz eigenvalue distribution theory has an interesting
application for characterizing the limiting behavior of determinants. Suppose
now that T
n
(f) is a sequence of Hermitian Toeplitz matrices such that that
f(λ) ≥ m
f
> 0. Let C
n
= C
n
(f) denote the sequence of circulant matrices
constructed from f as in (4.26). Then from (4.28) the eigenvalues of C
n
are
f(2πm/n) for m = 0, 1, . . . , n −1 and hence detC
n
=
n−1
m=0
f(2πm/n). This
in turn implies that
ln (det(C
n
))
1
n
=
1
n
ln detC
n
=
1
n
n−1
m=0
ln f(2π
m
n
).
These sums are the Riemann approximations to the limiting integral, whence
lim
n→∞
ln (det(C
n
))
1
n
=
_
1
0
ln f(2πλ) dλ.
Exponentiating, using the continuity of the logarithm for strictly positive
arguments, and changing the variables of integration yields
lim
n→∞
(det(C
n
))
1
n
= e
1
2π
_
2π
0
ln f(λ) dλ.
This integral, the asymptotic equivalence of C
n
and T
n
(f) (Lemma 4.6), and
Corollary 2.3 togther yield the following result ([1], p. 65)
Theorem 4.5 Let T
n
(f) be a sequence of Hermitian Toeplitz matrices such
that ln f(λ) is Riemann integrable and f(λ) ≥ m
f
> 0. Then
lim
n→∞
(det(T
n
(f)))
1
n
= e
1
2π
_
2π
0
ln f(λ) dλ.
(4.64)
46 CHAPTER 4. TOEPLITZ MATRICES
Chapter 5
Applications to Stochastic
Time Series
Toeplitz matrices arise quite naturally in the study of discrete time random
processes. Covariance matrices of weakly stationary processes are Toeplitz
and triangular Toeplitz matrices provide a matrix representation of causal
linear time invariant ﬁlters. As is well known and as we shall show, these
two types of Toeplitz matrices are intimately related. We shall take two
viewpoints in the ﬁrst section of this chapter section to show how they are
related. In the ﬁrst part we shall consider two common linear models of
random time series and study the asymptotic behavior of the covariance ma
trix, its inverse and its eigenvalues. The well known equivalence of moving
average processes and weakly stationary processes will be pointed out. The
lesser known fact that we can deﬁne something like a power spectral density
for autoregressive processes even if they are nonstationary is discussed. In
the second part of the ﬁrst section we take the opposite tack — we start with
a Toeplitz covariance matrix and consider the asymptotic behavior of its tri
angular factors. This simple result provides some insight into the asymptotic
behavior or system identiﬁcation algorithms and WienerHopf factorization.
The second section provides another application of the Toeplitz distri
bution theorem to stationary random processes by deriving the Shannon
information rate of a stationary Gaussian random process.
Let ¦X
k
; k ∈ 1¦ be a discrete time random process. Generally we take
1 = Z, the space of all integers, in which case we say that the process
is twosided, or 1 = Z
+
, the space of all nonnegative integers, in which
case we say that the process is onesided. We will be interested in vector
47
48 CHAPTER 5. APPLICATIONS TO STOCHASTIC TIME SERIES
representations of the process so we deﬁne the column vector (n−tuple) X
n
=
(X
0
, X
1
, . . . , X
n−1
)
t
, that is, X
n
is an ndimensional column vector. The
mean vector is deﬁned by m
n
= E(X
n
), which we usually assume is zero for
convenience. The n n covariance matrix R
n
= ¦r
j,k
¦ is deﬁned by
R
n
= E[(X
n
−m
n
)(X
n
−m
n
)
∗
]. (5.1)
This is the autocorrelation matrix when the mean vector is zero. Subscripts
will be dropped when they are clear from context. If the matrix R
n
is
Toeplitz, say R
n
= T
n
(f), then r
k,j
= r
k−j
and the process is said to be
weakly stationary. In this case we can deﬁne f(λ) =
∞
k=−∞
r
k
e
ikλ
as the
power spectral density of the process. If the matrix R
n
is not Toeplitz but
is asymptotically Toeplitz, i.e., R
n
∼ T
n
(f), then we say that the process is
asymptotically weakly stationary and once again deﬁne f(λ) as the power
spectral density. The latter situation arises, for example, if an otherwise sta
tionary process is initialized with X
k
= 0, k ≤ 0. This will cause a transient
and hence the process is strictly speaking nonstationary. The transient dies
out, however, and the statistics of the process approach those of a weakly
stationary process as n grows.
The results derived herein are essentially trivial if one begins and deals
only with doubly inﬁnite matrices. As might be hoped the results for asymp
totic behavior of ﬁnite matrices are consistent with this case. The problem is
of interest since one often has ﬁnite order equations and one wishes to know
the asymptotic behavior of solutions or one has some function deﬁned as a
limit of solutions of ﬁnite equations. These results are useful both for ﬁnding
theoretical limiting solutions and for ﬁnding reasonable approximations for
ﬁnite order solutions. So much for philosophy. We now proceed to investigate
the behavior of two common linear models. For simplicity we will assume
the process means are zero.
5.1 Moving Average Sources
By a linear model of a random process we mean a model wherein we pass
a zero mean, independent identically distributed (iid) sequence of random
variables W
k
with variance σ
2
through a linear time invariant discrete time
ﬁltered to obtain the desired process. The process W
k
is discrete time “white”
5.1. MOVING AVERAGE SOURCES 49
noise. The most common such model is called a moving average process and
is deﬁned by the diﬀerence equation
U
n
=
n
k=0
b
k
W
n−k
=
n
k=0
b
n−k
W
k
(5.2)
U
n
= 0; n < 0.
We assume that b
0
= 1 with no loss of generality since otherwise we can
incorporate b
0
into σ
2
. Note that (5.2) is a discrete time convolution, i.e.,
U
n
is the output of a ﬁlter with “impulse response” (actually Kronecker δ
response) b
k
and input W
k
. We could be more general by allowing the ﬁlter
b
k
to be noncausal and hence act on future W
k
’s. We could also allow the
W
k
’s and U
k
’s to extend into the inﬁnite past rather than being initialized.
This would lead to replacing of (5.2) by
U
n
=
∞
k=−∞
b
k
W
n−k
=
∞
k=−∞
b
n−k
W
k
. (5.3)
We will restrict ourselves to causal ﬁlters for simplicity and keep the initial
conditions since we are interested in limiting behavior. In addition, since
stationary distributions may not exist for some models it would be diﬃcult
to handle them unless we start at some ﬁxed time. For these reasons we take
(5.2) as the deﬁnition of a moving average.
Since we will be studying the statistical behavior of U
n
as n gets arbitrarily
large, some assumption must be placed on the sequence b
k
to ensure that (5.2)
converges in the meansquared sense. The weakest possible assumption that
will guarantee convergence of (5.2) is that
∞
k=0
[b
k
[
2
< ∞. (5.4)
In keeping with the previous sections, however, we will make the stronger
assumption
∞
k=0
[b
k
[ < ∞. (5.5)
As previously this will result in simpler mathematics.
50 CHAPTER 5. APPLICATIONS TO STOCHASTIC TIME SERIES
Equation (5.2) can be rewritten as a matrix equation by deﬁning the
lower triangular Toeplitz matrix
B
n
=
_
¸
¸
¸
¸
¸
¸
¸
¸
¸
_
1 0
b
1
1
b
2
b
1
.
.
. b
2
.
.
.
.
.
.
b
n−1
. . . b
2
b
1
1
_
¸
¸
¸
¸
¸
¸
¸
¸
¸
_
(5.6)
so that
U
n
= B
n
W
n
. (5.7)
If the ﬁlter b
n
were not causal, then B
n
would not be triangular. If in addition
(5.3) held, i.e., we looked at the entire process at each time instant, then (5.7)
would require inﬁnite vectors and matrices as in Grenander and Rosenblatt
[12]. Since the covariance matrix of W
k
is simply σ
2
I
n
, where I
n
is the nn
identity matrix, we have for the covariance of U
n
:
R
(n)
U
= EU
n
(U
n
)
∗
= EB
n
W
n
(W
n
)
∗
B
∗
n
= σB
n
B
∗
n
(5.8)
or, equivalently
r
k,j
= σ
2
n−1
=0
b
−k
¯
b
−j
= σ
2
min(k,j)
=0
b
+(k−j)
¯
b
. (5.9)
From (5.9) it is clear that $r_{k,j}$ is not Toeplitz because of the $\min(k,j)$ in the sum. However, as we next show, as $n \to \infty$ the upper limit becomes large and $R_U^{(n)}$ becomes asymptotically Toeplitz. If we define
$$b(\lambda) = \sum_{k=0}^{\infty} b_k e^{ik\lambda} \qquad (5.10)$$
then
$$B_n = T_n(b) \qquad (5.11)$$
so that
$$R_U^{(n)} = \sigma^2 T_n(b) T_n(b)^*. \qquad (5.12)$$
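The covariance calculation in (5.8)–(5.9) and its asymptotically Toeplitz character are easy to see numerically. A minimal sketch (ours, not the report's; Python with NumPy assumed, and real coefficients so conjugates can be dropped):

```python
import numpy as np

def toeplitz_lower(c, n):
    """Lower triangular Toeplitz matrix with first column c (as in (5.6))."""
    M = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            if k - j < len(c):
                M[k, j] = c[k - j]
    return M

b = np.array([1.0, 0.5, 0.25])   # b_0 = 1
sigma2 = 2.0
n = 30
B = toeplitz_lower(b, n)
R = sigma2 * B @ B.T             # (5.8) for a real filter

# (5.9) for k >= j: r_{k,j} = sigma^2 sum_{l=0}^{min(k,j)} b_{l+(k-j)} b_l
k, j = 10, 8
r_kj = sigma2 * sum(b[l + (k - j)] * b[l] for l in range(len(b) - (k - j)))
assert np.isclose(R[k, j], r_kj)

# Entries far from the upper-left corner depend only on k - j ...
assert np.isclose(R[10, 8], R[20, 18])
# ... but corner entries do not: the min(k, j) truncates the sum there
assert not np.isclose(R[1, 0], R[20, 19])
```

The interior of $R_U^{(n)}$ is already Toeplitz for a finite-length filter; only a fixed-size corner deviates, which is why the deviation washes out as $n \to \infty$.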
We can now apply the results of the previous sections to obtain the following theorem.

Theorem 5.1 Let $U^n$ be a moving average process with covariance matrix $R_U^{(n)}$. Let $\rho_{n,k}$ be the eigenvalues of $R_U^{(n)}$. Then
$$R_U^{(n)} \sim \sigma^2 T_n(|b|^2) = T_n(\sigma^2 |b|^2) \qquad (5.13)$$
so that $U_n$ is asymptotically stationary. If $m \le |b(\lambda)|^2 \le M$ and $F(x)$ is any continuous function on $[m, M]$, then
$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\rho_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F\left(\sigma^2 |b(\lambda)|^2\right)\, d\lambda. \qquad (5.14)$$
If $|b(\lambda)|^2 \ge m > 0$, then
$$\left(R_U^{(n)}\right)^{-1} \sim \sigma^{-2}\, T_n(1/|b|^2). \qquad (5.15)$$
Proof.
(Theorems 4.2–4.4 and 2.4.)

If the process $U^n$ had been initiated with its stationary distribution then we would have had exactly
$$R_U^{(n)} = \sigma^2 T_n(|b|^2).$$
More knowledge of the inverse $(R_U^{(n)})^{-1}$ can be gained from Theorem 4.3, e.g., circulant approximations. Note that the spectral density of the moving average process is $\sigma^2 |b(\lambda)|^2$ and that sums of functions of eigenvalues tend to an integral of a function of the spectral density. In effect the spectral density determines the asymptotic density function for the eigenvalues of $R_n$ and $T_n$.
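The eigenvalue distribution statement (5.14) can be observed at modest $n$. The following sketch (an illustration of ours, not the report's; NumPy assumed) compares eigenvalue averages of $R_U^{(n)}$ against grid averages of the spectral density $\sigma^2|b(\lambda)|^2$ for two test functions $F$:

```python
import numpy as np

def toeplitz_lower(b, n):
    B = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            if k - j < len(b):
                B[k, j] = b[k - j]
    return B

b = np.array([1.0, 0.5])
sigma2 = 1.0
n = 256
B = toeplitz_lower(b, n)
R = sigma2 * B @ B.T
eig = np.linalg.eigvalsh(R)          # eigenvalues rho_{n,k} of R_U^{(n)}

# Right-hand side of (5.14): sample sigma^2 |b(lambda)|^2 on a fine grid
lam = np.linspace(0, 2 * np.pi, 4096, endpoint=False)
spec = sigma2 * np.abs(b[0] + b[1] * np.exp(1j * lam)) ** 2

for F in (lambda x: x, lambda x: x ** 2):
    lhs = np.mean(F(eig))            # n^{-1} sum_k F(rho_{n,k})
    rhs = np.mean(F(spec))           # (2 pi)^{-1} integral of F(spec)
    assert abs(lhs - rhs) < 0.05
```

The gap between the two averages is already of order $1/n$ here, reflecting that only a fixed corner of $R_U^{(n)}$ differs from the Toeplitz matrix $T_n(\sigma^2|b|^2)$.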
5.2 Autoregressive Processes
Let $W_k$ be as previously defined; then an autoregressive process $X_n$ is defined by
$$X_n = -\sum_{k=1}^{n} a_k X_{n-k} + W_n, \quad n = 0, 1, \ldots, \qquad X_n = 0, \quad n < 0. \qquad (5.16)$$
Autoregressive processes include nonstationary processes such as the Wiener process. Equation (5.16) can be rewritten as a vector equation by defining the lower triangular matrix
$$A_n = \begin{bmatrix}
1      &        &       &     & 0 \\
a_1    & 1      &       &     &   \\
a_2    & a_1    & 1     &     &   \\
\vdots &        & \ddots& \ddots &  \\
a_{n-1}& \cdots & a_2   & a_1 & 1
\end{bmatrix} \qquad (5.17)$$
so that
$$A_n X^n = W^n.$$
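A short numerical sketch (ours; NumPy assumed, with an illustrative AR(1) filter) confirms that running the recursion (5.16) and solving the triangular system $A_n X^n = W^n$ produce the same sample path:

```python
import numpy as np

def toeplitz_lower(first_col, n):
    A = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            if k - j < len(first_col):
                A[k, j] = first_col[k - j]
    return A

rng = np.random.default_rng(1)
n = 12
a = [1.0, -0.9]               # a_0 = 1; recursion reads X_n = 0.9 X_{n-1} + W_n
W = rng.standard_normal(n)

# Recursion (5.16): X_n = -sum_{k>=1} a_k X_{n-k} + W_n, with X_n = 0 for n < 0
X = np.zeros(n)
for m in range(n):
    X[m] = W[m] - sum(a[k] * X[m - k] for k in range(1, min(m, len(a) - 1) + 1))

# Matrix form: A_n X^n = W^n with A_n the lower triangular Toeplitz matrix (5.17)
A = toeplitz_lower(a, n)
assert np.allclose(A @ X, W)
assert np.allclose(X, np.linalg.solve(A, W))
```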
We have
$$R_W^{(n)} = A_n R_X^{(n)} A_n^*. \qquad (5.18)$$
Since $\det A_n = 1 \ne 0$, $A_n$ is nonsingular, so that
$$R_X^{(n)} = \sigma^2 A_n^{-1} \left(A_n^{-1}\right)^* \qquad (5.19)$$
or
$$\left(R_X^{(n)}\right)^{-1} = \sigma^{-2} A_n^* A_n \qquad (5.20)$$
or equivalently, if $(R_X^{(n)})^{-1} = \{t_{k,j}\}$, then
$$t_{k,j} = \sigma^{-2} \sum_{m=0}^{n-1} \bar{a}_{m-k}\, a_{m-j} = \sigma^{-2} \sum_{m=0}^{n-1-\max(k,j)} \bar{a}_{m}\, a_{m+(k-j)}.$$
Unlike the moving average process, we have that the inverse covariance matrix is the product of triangular Toeplitz matrices. Defining
$$a(\lambda) = \sum_{k=0}^{\infty} a_k e^{ik\lambda} \qquad (5.21)$$
we have that
$$\left(R_X^{(n)}\right)^{-1} = \sigma^{-2} T_n(a)^* T_n(a) \qquad (5.22)$$
and hence the following theorem.
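Before stating the theorem, the exactness of (5.20) (as opposed to the merely asymptotic (5.8)–(5.13) situation for the moving average covariance) can be checked numerically. A sketch of ours, NumPy assumed:

```python
import numpy as np

def toeplitz_lower(c, n):
    M = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            if k - j < len(c):
                M[k, j] = c[k - j]
    return M

a = [1.0, -0.9]
sigma2 = 1.0
n = 10
A = toeplitz_lower(a, n)

R_X = sigma2 * np.linalg.inv(A) @ np.linalg.inv(A).T   # (5.19), real case
R_X_inv = np.linalg.inv(R_X)

# (5.20): the inverse covariance is exactly sigma^{-2} A_n^* A_n for every n --
# no asymptotics needed
assert np.allclose(R_X_inv, A.T @ A / sigma2)

# Entry formula: t_{k,j} = sigma^{-2} sum_m a_{m-k} a_{m-j} (real filter)
k, j = 5, 4
t_kj = sum(A[m, k] * A[m, j] for m in range(n)) / sigma2
assert np.isclose(R_X_inv[k, j], t_kj)
```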
Theorem 5.2 Let $X_n$ be an autoregressive process with covariance matrix $R_X^{(n)}$ with eigenvalues $\rho_{n,k}$. Then
$$\left(R_X^{(n)}\right)^{-1} \sim \sigma^{-2} T_n(|a|^2). \qquad (5.23)$$
If $m \le |a(\lambda)|^2 \le M$, then for any function $F(x)$ continuous on $[m, M]$ we have
$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(1/\rho_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F\left(\sigma^{-2} |a(\lambda)|^2\right)\, d\lambda, \qquad (5.24)$$
where $1/\rho_{n,k}$ are the eigenvalues of $(R_X^{(n)})^{-1}$. If $|a(\lambda)|^2 \ge m > 0$, then
$$R_X^{(n)} \sim \sigma^2 T_n(1/|a|^2) \qquad (5.25)$$
so that the process is asymptotically stationary.
Proof.
(Theorem 5.1.)

Note that if $|a(\lambda)|^2 > 0$, then $\sigma^2/|a(\lambda)|^2$ is the spectral density of $X_n$. If $|a(\lambda)|^2$ has a zero, then $R_X^{(n)}$ may not be even asymptotically Toeplitz and hence $X_n$ may not be asymptotically stationary (since $1/|a(\lambda)|^2$ may not be integrable), so that strictly speaking $X_k$ will not have a spectral density. It is often convenient, however, to define $\sigma^2/|a(\lambda)|^2$ as the spectral density, and it often is useful for studying the eigenvalue distribution of $R_n$. We can relate $\sigma^2/|a(\lambda)|^2$ to the eigenvalues of $R_X^{(n)}$ even in this case by using Theorem 4.3 part 4.
Corollary 5.1 If $X_k$ is an autoregressive process and $\rho_{n,k}$ are the eigenvalues of $R_X^{(n)}$, then for any finite $\theta$ and any function $F(x)$ continuous on $[\sigma^2/M, \theta]$,
$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F\left[\min(\rho_{n,k}, \theta)\right] = (2\pi)^{-1} \int_0^{2\pi} F\left[\min\left(\sigma^2/|a(\lambda)|^2, \theta\right)\right] d\lambda. \qquad (5.26)$$
Proof.
(Theorem 5.2 and Theorem 4.3.)

If we consider two models of a random process to be asymptotically equivalent if their covariances are asymptotically equivalent, then from Theorems 5.1 and 5.2 we have the following corollary.
Corollary 5.2 Consider the moving average process defined by
$$U^n = T_n(b) W^n$$
and the autoregressive process defined by
$$T_n(a) X^n = W^n.$$
Then the processes $U_n$ and $X_n$ are asymptotically equivalent if
$$a(\lambda) = 1/b(\lambda)$$
and $M \ge |a(\lambda)| \ge m > 0$, so that $1/b(\lambda)$ is integrable.
Proof.
(Theorem 4.3 and Theorem 4.5.)
$$R_X^{(n)} = \sigma^2 T_n(a)^{-1} \left(T_n(a)^{-1}\right)^* \sim \sigma^2 T_n(1/a)\, T_n(1/a)^* \sim \sigma^2 T_n(1/a)^*\, T_n(1/a). \qquad (5.27)$$
Comparison of (5.27) with (5.12) completes the proof.
The methods above can also easily be applied to study the mixed autoregressive
moving average linear models [2].
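Corollary 5.2 can be illustrated numerically (our sketch, NumPy assumed; the AR(1) filter and its power-series inverse are illustrative choices). In the causal triangular case the finite-section inverse of $T_n(a)$ is again lower triangular Toeplitz with symbol $1/a$, so the two covariances here agree exactly at every $n$; the corollary's asymptotic statement covers the general case:

```python
import numpy as np

def toeplitz_lower(c, n):
    M = np.zeros((n, n))
    for k in range(n):
        for j in range(k + 1):
            if k - j < len(c):
                M[k, j] = c[k - j]
    return M

n = 40
a = [1.0, -0.5]                       # a(lambda) = 1 - 0.5 e^{i lambda}
b = [0.5 ** k for k in range(n)]      # 1/a(lambda) = sum_k 0.5^k e^{ik lambda}

A = toeplitz_lower(a, n)
B = toeplitz_lower(b, n)

R_X = np.linalg.inv(A) @ np.linalg.inv(A).T   # AR covariance (sigma^2 = 1)
R_U = B @ B.T                                 # MA covariance with b = 1/a

# T_n(a)^{-1} = T_n(b) exactly for triangular Toeplitz matrices, so the
# MA and AR models here are not merely asymptotically equivalent but equal
assert np.allclose(np.linalg.inv(A), B)
assert np.allclose(R_X, R_U)
```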
5.3 Factorization

As a final example we consider the problem of the asymptotic behavior of triangular factors of a sequence of Hermitian covariance matrices $T_n(f)$. It is well known that any such matrix can be factored into the product of a lower triangular matrix and its conjugate transpose [12, p. 37]; in particular,
$$T_n(f) = \{t_{k,j}\} = B_n B_n^*, \qquad (5.28)$$
where $B_n$ is a lower triangular matrix with entries
$$b_{k,j}^{(n)} = \left\{(\det T_k)(\det T_{k-1})\right\}^{-1/2} \gamma(j,k), \qquad (5.29)$$
where $\gamma(j,k)$ is the determinant of the matrix $T_k$ with the right-hand column replaced by $(t_{j,0}, t_{j,1}, \ldots, t_{j,k-1})^t$. Note in particular that the diagonal elements are given by
$$b_{k,k}^{(n)} = \left\{(\det T_k)/(\det T_{k-1})\right\}^{1/2}. \qquad (5.30)$$
Equation (5.29) is the result of a Gaussian elimination or a Gram–Schmidt procedure. The factorization of $T_n$ allows the construction of a linear model of a random process and is useful in system identification and other recursive procedures. Our question is how $B_n$ behaves for large $n$; specifically, is $B_n$ asymptotically Toeplitz?
Assume that $f(\lambda) \ge m > 0$. Then $\ln f(\lambda)$ is integrable and we can perform a Wiener–Hopf factorization of $f(\lambda)$, i.e.,
$$f(\lambda) = \sigma^2 |b(\lambda)|^2, \qquad \bar{b}(\lambda) = b(-\lambda), \qquad b(\lambda) = \sum_{k=0}^{\infty} b_k e^{ik\lambda}, \qquad b_0 = 1. \qquad (5.31)$$
From (5.28) and Theorem 4.4 we have
$$B_n B_n^* = T_n(f) \sim T_n(\sigma b)\, T_n(\sigma b)^*. \qquad (5.32)$$
We wish to show that (5.32) implies that
$$B_n \sim T_n(\sigma b). \qquad (5.33)$$
Proof.

Since $\det T_n(\sigma b) = \sigma^n \ne 0$, $T_n(\sigma b)$ is invertible. Likewise, since $\det B_n = [\det T_n(f)]^{1/2}$, we have from Theorem 4.3 part 1 that $\det T_n(f) \ne 0$, so that $B_n$ is invertible. Thus from Theorem 2.1 (e) and (5.32) we have, writing $T_n$ for $T_n(\sigma b)$,
$$T_n^{-1} B_n = [B_n^{-1} T_n]^{-1} \sim T_n^* B_n^{*-1} = [B_n^{-1} T_n]^*. \qquad (5.34)$$
Since $B_n$ and $T_n$ are both lower triangular matrices, so is $B_n^{-1}$, and hence so are $B_n^{-1} T_n$ and $[B_n^{-1} T_n]^{-1}$. Thus (5.34) states that a lower triangular matrix is asymptotically equivalent to an upper triangular matrix. This is only possible if both matrices are asymptotically equivalent to a diagonal matrix, say $G_n = \{g_{k,k}^{(n)} \delta_{k,j}\}$. Furthermore, from (5.34) we have $G_n \sim G_n^{*-1}$, so that
$$\left\{\left|g_{k,k}^{(n)}\right|^2 \delta_{k,j}\right\} \sim I_n. \qquad (5.35)$$
Since $T_n(\sigma b)$ is lower triangular with main diagonal element $\sigma$, $T_n(\sigma b)^{-1}$ is lower triangular with all its main diagonal elements equal to $1/\sigma$, even though the matrix $T_n(\sigma b)^{-1}$ is not Toeplitz. Thus $g_{k,k}^{(n)} = b_{k,k}^{(n)}/\sigma$. Since $T_n(f)$ is Hermitian, $b_{k,k}^{(n)}$ is real, so that taking the trace in (5.35) yields
$$\lim_{n\to\infty} \sigma^{-2} n^{-1} \sum_{k=0}^{n-1} \left(b_{k,k}^{(n)}\right)^2 = 1. \qquad (5.36)$$
From (5.30) and Corollary 2.3, and the fact that $T_n(\sigma b)$ is triangular, we have that
$$\lim_{n\to\infty} \sigma^{-1} n^{-1} \sum_{k=0}^{n-1} b_{k,k}^{(n)}
= \sigma^{-1} \lim_{n\to\infty} \left\{\frac{\det T_n(f)}{\det T_{n-1}(f)}\right\}^{1/2}
= \sigma^{-1} \lim_{n\to\infty} \left\{\det T_n(f)\right\}^{1/2n}
= \sigma^{-1} \lim_{n\to\infty} \left\{\det T_n(\sigma b)\right\}^{1/n}
= \sigma^{-1} \sigma = 1. \qquad (5.37)$$
Combining (5.36) and (5.37) yields
$$\lim_{n\to\infty} \left| B_n^{-1} T_n - I_n \right| = 0. \qquad (5.38)$$
Applying Theorem 2.1 yields (5.33).
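The conclusion (5.33) can be watched happening at finite $n$. In the sketch below (ours, NumPy assumed; the choice $f(\lambda) = |1 + 0.5 e^{i\lambda}|^2$ with $\sigma = 1$ is illustrative), $T_n(f)$ is the symmetric tridiagonal Toeplitz matrix with $t_0 = 1.25$ and $t_{\pm 1} = 0.5$, and its Cholesky factor is compared with the lower bidiagonal $T_n(\sigma b)$:

```python
import numpy as np

# f(lambda) = |b(lambda)|^2 with b(lambda) = 1 + 0.5 e^{i lambda}, sigma = 1
n = 60
T = 1.25 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))  # T_n(f)

L = np.linalg.cholesky(T)          # the lower triangular factor B_n of (5.28)

# T_n(sigma b) is lower bidiagonal with 1 on the diagonal and 0.5 below it;
# the Cholesky factor approaches it away from the upper-left corner
assert abs(L[0, 0] - np.sqrt(1.25)) < 1e-12    # corner entry differs
assert np.isclose(L[-1, -1], 1.0)              # deep rows match T_n(sigma b)
assert np.isclose(L[-1, -2], 0.5)
```

The corner rows of $B_n$ never equal those of $T_n(\sigma b)$, but the discrepancy is confined to a boundary region whose weak-norm contribution vanishes, exactly as the proof of (5.33) requires.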
Since the only real requirements for the proof were the existence of the Wiener–Hopf factorization and the limiting behavior of the determinant, this result could easily be extended to the more general case that $\ln f(\lambda)$ is integrable. The theorem can also be derived as a special case of more general results of Baxter [8] and is similar to a result of Rissanen and Barbosa [11].
5.4 Differential Entropy Rate of Gaussian Processes
As a final application of the Toeplitz eigenvalue distribution theorem, we consider a property of a random process that arises in Shannon information theory. Given a random process $\{X_n\}$ for which a probability density function $f_{X^n}(x^n)$ is defined for the random vector $X^n = (X_0, X_1, \ldots, X_{n-1})$ for all positive integers $n$, the Shannon differential entropy $h(X^n)$ is defined by the integral
$$h(X^n) = -\int f_{X^n}(x^n) \log f_{X^n}(x^n)\, dx^n$$
and the differential entropy rate is defined by the limit
$$h(X) = \lim_{n\to\infty} \frac{1}{n} h(X^n)$$
if the limit exists. (See, for example, Cover and Thomas [14].) The logarithm is usually taken as base 2 and the units are bits. We will use the Toeplitz theorem to evaluate the differential entropy rate of a stationary Gaussian random process.

A stationary zero-mean Gaussian random process is completely described by its covariance function $R_X(k,m) = R_X(k-m) = E[X_k X_m]$ or, equivalently, by its power spectral density function
$$S(f) = \sum_{n=-\infty}^{\infty} R_X(n) e^{-2\pi i n f},$$
the Fourier transform of the covariance function. For a fixed positive integer $n$, the probability density function is
$$f_{X^n}(x^n) = \frac{1}{(2\pi)^{n/2} \det(R_n)^{1/2}}\, e^{-\frac{1}{2} (x^n - m^n)^t R_n^{-1} (x^n - m^n)},$$
where $R_n$ is the $n \times n$ covariance matrix with entries $R_X(k,m)$, $k, m = 0, 1, \ldots, n-1$. A straightforward multidimensional integration using the properties of Gaussian random vectors yields the differential entropy
$$h(X^n) = \frac{1}{2} \log\left[(2\pi e)^n \det R_n\right].$$
If we now identify the covariance matrix $R_n$ as the Toeplitz matrix generated by the power spectral density, $T_n(S)$, then from Theorem 4.5 we have immediately that
$$h(X) = \frac{1}{2} \log\left(2\pi e\, \sigma_\infty^2\right) \qquad (5.39)$$
where
$$\sigma_\infty^2 = \exp\left\{\frac{1}{2\pi} \int_0^{2\pi} \ln S(f)\, df\right\}. \qquad (5.40)$$
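As a numerical sketch of the entropy-rate formula (ours, not the report's; NumPy assumed, natural logarithms used throughout so both sides are in nats, and the tridiagonal spectral density $S(\lambda) = 1.25 + \cos\lambda$ is an illustrative choice):

```python
import numpy as np

n = 200
# T_n(S) for S(lambda) = |1 + 0.5 e^{i lambda}|^2 = 1.25 + cos(lambda)
T = 1.25 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

# Left side: (1/n) h(X^n) = (1/2) ln(2 pi e) + (1/(2n)) ln det T_n
sign, logdet = np.linalg.slogdet(T)
lhs = 0.5 * np.log(2 * np.pi * np.e) + logdet / (2 * n)

# Right side: (1/2) ln(2 pi e sigma_inf^2) with sigma_inf^2 from (5.40)
lam = np.linspace(0, 2 * np.pi, 100000, endpoint=False)
sigma_inf2 = np.exp(np.mean(np.log(1.25 + np.cos(lam))))
rhs = 0.5 * np.log(2 * np.pi * np.e * sigma_inf2)

assert sign > 0
assert abs(lhs - rhs) < 1e-2
```

For this spectral density $\sigma_\infty^2 = 1$, and the per-sample entropy of the finite block is already within about $10^{-3}$ nats of the limiting rate at $n = 200$.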
The Toeplitz distribution theorems have also found application in more complicated information-theoretic evaluations, including the channel capacity of Gaussian channels [17, 18] and the rate-distortion functions of autoregressive sources [9].
Bibliography
[1] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications, University of California Press, Berkeley and Los Angeles, 1958.
[2] H. Widom, “Toeplitz Matrices,” in Studies in Real and Complex Analysis, edited by I. I. Hirschman, Jr., MAA Studies in Mathematics, Prentice-Hall, Englewood Cliffs, NJ, 1965.
[3] P. Lancaster, Theory of Matrices, Academic Press, NY, 1969.
[4] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, NY, 1964.
[5] R.M. Gray, “On Unbounded Toeplitz Matrices and Nonstationary Time
Series with an Application to Information Theory,” Information and
Control, 24, pp. 181–196, 1974.
[6] C. L. Rino, “The Inversion of Covariance Matrices by Finite Fourier Transforms,” IEEE Trans. on Info. Theory, IT-16, No. 2, March 1970, pp. 230–232.
[7] G. Baxter, “A Norm Inequality for a ‘Finite-Section’ Wiener-Hopf Equation,” Illinois J. Math., 1962, pp. 97–103.
[8] G. Baxter, “An Asymptotic Result for the Finite Predictor,” Math.
Scand., 10, pp. 137–144, 1962.
[9] R. M. Gray, “Information Rates of Autoregressive Processes,” IEEE Trans. on Info. Theory, IT-16, No. 4, July 1970, pp. 412–421.
[10] F. R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY, 1960.
[11] J. Rissanen and L. Barbosa, “Properties of Inﬁnite Covariance Matrices
and Stability of Optimum Predictors,” Information Sciences, 1, 1969,
pp. 221–236.
[12] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary
Time Series, Wiley and Sons, NY, 1966, Chapter 1.
[13] J. Pearl, “On Coding and Filtering Stationary Signals by Discrete Fourier Transform,” IEEE Trans. on Info. Theory, IT-19, pp. 229–232, 1973.
[14] T. M. Cover and J. A. Thomas, Elements of Information Theory, Wiley, New York, 1991.
[15] A. Böttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer, New York, 1999.
[16] R. M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Transactions on Information Theory, Vol. 18, November 1972, pp. 725–730.
[17] B. S. Tsybakov, “On the transmission capacity of a discrete-time Gaussian channel with filter,” (in Russian), Probl. Peredach. Inform., Vol. 6, pp. 78–82, 1970.
[18] B. S. Tsybakov, “Transmission capacity of memoryless Gaussian vector channels,” (in Russian), Probl. Peredach. Inform., Vol. 1, pp. 26–40, 1965.
[19] P. J. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979.
ii Abstract In this tutorial report the fundamental theorems on the asymptotic behavior of eigenvalues, inverses, and products of “ﬁnite section” Toeplitz matrices and Toeplitz matrices with absolutely summable elements are derived. Mathematical elegance and generality are sacriﬁced for conceptual simplicity and insight in the hopes of making these results available to engineers lacking either the background or endurance to attack the mathematical literature on the subject. By limiting the generality of the matrices considered the essential ideas and results can be conveyed in a more intuitive manner without the mathematical machinery required for the most general cases. As an application the results are applied to the study of the covariance matrices and their factors of linear models of discrete time random processes.
Acknowledgements The author gratefully acknowledges the assistance of Ronald M. Aarts of the Philips Research Labs in correcting many typos and errors in the 1993 revision, Liu Mingyu in pointing out errors corrected in the 1998 revision, Paolo Tilli of the Scuola Normale Superiore of Pisa for pointing out an incorrect corollary and providing the correction, and to David Neuhoﬀ of the University of Michigan for pointing out several typographical errors and some confusing notation.
Contents
1 Introduction 2 The Asymptotic Behavior of Matrices 3 Circulant Matrices 4 Toeplitz Matrices 4.1 Finite Order Toeplitz Matrices . . . . . . . . . . . . . . . . . . 4.2 Toeplitz Matrices . . . . . . . . . . . . . . . . . . . . . . . . . 4.3 Toeplitz Determinants . . . . . . . . . . . . . . . . . . . . . . 5 Applications to Stochastic Time Series 5.1 Moving Average Sources . . . . . . . . . . . . . 5.2 Autoregressive Processes . . . . . . . . . . . . . 5.3 Factorization . . . . . . . . . . . . . . . . . . . 5.4 Diﬀerential Entropy Rate of Gaussian Processes Bibliography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3 5 15 19 23 28 45 47 48 51 54 57 58
1
2 CONTENTS .
estimation theory. the necessary level of mathematical sophistication for understanding reference [1] is frequently beyond that of one species of applied mathematician for whom the theory can be quite useful but is relatively little understood. tn−1 ··· t0 Examples of such matrices are covariance matrices of weakly stationary stochastic time series and matrix representations of linear timeinvariant discrete time ﬁlters. 3 . etc. physics. A more recent text o devoted to the subject is B¨ttcher and Silbermann [15].j = tk−j .1) Tn = t2 . (1.. t1 t0 . a matrix of the form t0 t−1 t−2 · · · t−(n−1) t t0 t−1 1 . howo ever. A great deal is known about the behavior of such matrices — the most common and complete references being Grenander and Szeg¨ [1] and Widom [2]. This apparent dilemma provides the motivation for attempting a tutorial introduction on Toeplitz matrices that proves the essential theorems using the simplest possible and most intuitive mathematics. . i. . .j where tk. .Chapter 1 Introduction A toeplitz matrix is an n × n matrix Tn = tk. Some simple and fundamental methods that are deeply buried (at least to the untrained mathematician) in [1] are here made explicit.e. . Unfortunately. There are numerous other applications in mathematics. This caste consists of engineers doing relatively mathematical (for an engineering background) work in any of the areas mentioned.. information theory. .
The basic results for the special case of a ﬁnite order Toeplitz matrix appeared in [16]. The latter is an intuitive result which is easily believed even if not formally proved. The main approach of this report is to relate the properties of Toeplitz matrices to those of their simpler. e. more structured cousin — the circulant or cyclic matrix. This method is implicit but not immediately apparent in the more complicated and more general results of Grenander in Chapter 7 of [1]. inverses. . and hopefully a ﬁrst course in analysis. More advanced results from Lebesgue integration. The essential prerequisites for this report are a knowledge of matrix theory. and determinants behave similarly. an engineer’s knowledge of Fourier series and random processes. the CauchySchwarz and triangle inequalities. This approach provides a simpliﬁed and direct path (to the author’s point of view) to the basic eigenvalue distribution and related theorems. a tutorial treatment of the simplest case which was in turn based on the ﬁrst draft of this work. calculus (Riemann integration).g. Two common linear models are studied and some intuitively satisfying results on covariance matrices and their factors are given. several related results that naturally follow but do not appear to be collected together anywhere are presented. speciﬁcally to those who have lacked either the background or the patience to tackle the mathematical literature on the subject. As an example from Shannon information theory. We sacriﬁces mathematical elegance and generality for conceptual simplicity in the hope that this will bring an understanding of the interesting and useful properties of Toeplitz matrices to a wider audience. Several of the occasional results required of analysis are usually contained in one or more courses in the usual engineering curriculum. but they remain less general than those of [1]. 
the Toeplitz results regarding the limiting behavior of determinants is applied to ﬁnd the diﬀerential entropy rate of a stationary Gaussian random process. products. and Fourier series are not used. The results were subsequently generalized using essentially the same simple methods. These two matrices are shown to be asymptotically equivalent in a certain sense and this is shown to imply that eigenvalues.. Hopefully the only unfamiliar results are a corollary to the CourantFischer Theorem and the Weierstrass Approximation Theorem. functional analysis. As an application several of the results are applied to study certain models of discrete time random processes.4 CHAPTER 1. INTRODUCTION In addition to the fundamental theorems.
i. where the asterisk denotes conjugate transpose.. The eigenvalues λk and the eigenvectors (ntuples) xk of an n × n matrix M are the solutions to the equation M x = λx (2.1 Deﬁne the Rayleigh quotient of an Hermitian matrix H and a vector (complex n−tuple) x by RH (x) = (x∗ Hx)/(x∗ x). While we will not have direct need of this theorem. 116]. if M = M ∗ . and inverse behavior of sequences of matrices. Corollary 2. p. (2.e.3) .1) and hence the eigenvalues are the roots of the characteristic equation of M : det(M − λI) = 0 . The remaining chapters of this report will largely be applications of the tools and results of this chapter to the special cases of Toeplitz and circulant matrices.Chapter 2 The Asymptotic Behavior of Matrices In this chapter we begin with relevant deﬁnitions and a prerequisite theorem and proceed to a discussion of the asymptotic eigenvalue. we will use the following important corollary which is stated below without proof. 5 (2.2) If M is Hermitian. then a more useful description of the eigenvalues is the variational description given by the CourantFischer Theorem [3. product.
k = n−1 λk . (2. The trace of a matrix is the sum of the diagonal elements of a matrix.1 Let A be a matrix with eigenvalues αk . Lemma 2.e.7) Any complex matrix A can be written as A = W RW ∗ . Tr{A∗ A} = n−1 k=0 (A∗ A)k. Then ηm = min RH (x) = min x∗ Hx (2. (If A is Hermitian. that is. i. THE ASYMPTOTIC BEHAVIOR OF MATRICES Let ηM and ηm be the maximum and minimum eigenvalues of H. p.9) n−1 k=0 = k=0 αk 2 + k=j rj. The eigenvalues of A are the principal diagonal elements of R.8) where W is unitary and R = {rk. (2.5) This corollary will be useful in specifying the interval containing the eigenvalues of an Hermitian matrix. The following lemma is useful when studying nonHermitian matrices and products of Hermitian matrices..) Proof. Deﬁne the eigenvalues of the Hermitian matrix A∗ A to be λk .6 CHAPTER 2.6) with equality iﬀ (if and only if ) A is normal. We have Tr{A∗ A} = Tr{R∗ R} = n−1 n−1 n−1 k=0 j=0 rj. k=0 (2. The trace is invariant to unitary operations so that it also is equal to the sum of the eigenvalues of a matrix. Its proof is given since it introduces and manipulates some important concepts. Then n−1 k=0 λk ≥ n−1 k=0 αk 2 . 79].4) ∗ x x x=1 ηM = max RH (x) = max x∗ Hx ∗ x x x=1 (2. iﬀ A∗ A = AA∗ . (2.k 2 ≥ αk 2 .k 2 . it is also normal.j } is an upper triangular matrix [3. respectively.
the eigenvalue of A having largest absolute value: A 2 = max x∗ A∗ Ax ≥ (e∗ A∗ )(AeM ) = αM 2 . The strong norm A is deﬁned by A = max RA∗ A (x)1/2 = max [x∗ A∗ Ax]1/2 . p. (2. This follows since if e(k) is an eigenvector of A 2 with eigenvalue αk .j  2 = (n Tr[A A]) −1 ∗ 1/2 = n −1 n−1 1/2 λk k=0 . 102103].12) If A is itself Hermitian. pp.15) with equality iﬀ A is normal.13) The weak norm of an n × n matrix A = {ak. Lemma 2. if A is Hermitian then A = max αk  = αM .1 A 2 = max λk = λM . .14) From Lemma 2. then its eigenvalues αk are real and the eigenvalues 2 λk of A∗ A are simply λk = αk . To study the asymptotic equivalence of matrices we require a metric or equivalently a norm of the appropriate kind. Two norms — the operator or strong norm and the HilbertSchmidt or weak norm — will be used here [1. Thus. 229231] and is also proved in [1. pp.1 we have A ≥ n 2 −1 n−1 k=0 αk 2 . then A∗ Ae(k) = αk A∗ e(k) = αk e(k) .9) will hold with equality iﬀ R is diagonal and hence iﬀ A is normal.7 Equation (2. k (2. Let A be a matrix with eigenvalues αk and let λk be the eigenvalues of the Hermitian matrix A∗ A.11) The strong norm of A can be bounded below by letting eM be the eigenvector of A corresponding to αM . M ∗ x x=1 (2. (2. 106].10) From Corollary 2. ∗ x x x=1 (2.1 is a direct consequence of Shur’s Theorem [3.j } is deﬁned by A = n−1 n−1 n−1 k=0 j=0 1/2 ak. in particular. k ∆ (2.
j } and H = {hk.20) = n−1 j h∗ G∗ Ghj .17) The triangle inequality in (2.j }. .k hk. (2.j ¯ = n−1 i j k m (2. 3. A+B ≤ A cA = c· A + B (2. Lemma 2. (2. both satisfy the following three axioms: 1.16) A matrix is said to be bounded if it is bounded in both norms.m hk. j .17) will be used often as is the following direct consequence: A−B ≥ A − B . i. the all zero matrix.e.k gi.18) The weak norm is usually the most useful and easiest to handle of the two but the strong norm is handy in providing a bound for the product of two matrices as shown in the next lemma. then GH ≤ G ·H.2 Given two n × n matrices G = {gk. (2. 2.19) Proof.8 CHAPTER 2. GH2 = n−1 i j  k gi. A ≥ 0 .j hm. with equality iﬀ A = 0 . THE ASYMPTOTIC BEHAVIOR OF MATRICES The HilbertSchmidt norm is the “weaker” of the two norms since A 2 = max λk ≥ n k −1 n−1 k=0 λk = A2 .. Note that both the strong and the weak norms are in fact norms in the linear space of matrices.j 2 ¯ gi.
Lemma 2. then n→∞ lim An  = lim Bn .22) 2. If An ∼ Bn and Bn ∼ Cn . As might be expected. Two sequences of n × n matrices An and Bn are said to be asymptotically equivalent if 1. 3. . We can immediately prove several properties of asymptotic equivalence which are collected in the following theorem. If one of the two matrices is Toeplitz. If An ∼ Bn . then An Cn ∼ Bn Dn . Bn ≤ M < ∞ (2.1 1.3a of [1. n→∞ (2.9 where ∗ denotes conjugate transpose and hj is the j th column of H. we will use the weak norm of the diﬀerence of two matrices as a measure of the “distance” between them. We will be considering n × n matrices that approximate each other when n is large. An and Bn are uniformly bounded in strong (and hence in weak) norm: An and 2. An − Bn = Dn goes to zero in weak norm as n → ∞: n→∞ .2 is the matrix equivalent of 7. then An ∼ Cn . From (2. Note that the lemma does not require that G or H be Hermitian.10) (h∗ G∗ Ghj )/(h∗ hj ) ≤ G 2 j j and therefore GH2 ≤ n−1 G 2 j h∗ hj = G j 2 ·H2 . p. 103]. Asymptotic equivalence of An and Bn will be abbreviated An ∼ Bn . then the other is said to be asymptotically Toeplitz.21) lim lim An − Bn  = n→∞ Dn  = 0. If An ∼ Bn and Cn ∼ Dn . Theorem 2.
(2. Bn − A−1 Cn = A−1 An Bn − A−1 Cn n n n ≤ A−1 n −→ ·An Bn − Cn  n→∞ 0.e. n 5. i. 5.23) .k and βn. k=0 (2. Lemma 2. THE ASYMPTOTIC BEHAVIOR OF MATRICES −1 −1 4.17). The following lemma provides a prelude of the type of result obtainable for eigenvalues and will itself serve as the essential part of the more general theorem to follow. A−1 ≤ K < ∞. then Bn ∼ A−1 Cn . −→ 2.3 Given two sequences of asymptotically equivalent matrices An and Bn with eigenvalues αn.22) follows directly from (2. then lim n−1 n−1 k=0 n→∞ αn.k = lim n−1 n→∞ n−1 βn.k . If An Bn ∼ Cn and Proof. A−1 and Bn exist n n and are uniformly bounded by some constant independent of n. The above results will be useful in several of the later proofs. n n 1. Asymptotic equality of matrices will be shown to imply that eigenvalues. products. A−1 − Bn  = Bn Bn An − Bn An A−1 n n −1 ≤ Bn · A−1 n −→ ·Bn − An  n→∞ 0. then −1 A−1 ∼ Bn . Eqs.k . If An ∼ Bn and A−1 .2 yields An Cn − Bn Dn  = An Cn − An Dn + An Dn − Bn Dn  ≤ An ·Cn − Dn + Dn −→ ·An − Bn  n→∞ 0. −1 −1 −1 4. respectively.. An − Cn  = An − Bn + Bn − Cn  ≤ An − Bn  + Bn − Cn  n→∞ 0 3. Bn ≤ K < ∞. Applying Lemma 2.10 CHAPTER 2. and inverses behave similarly.
Similarly to (2.j 2 = n2 Dn 2 .k n−1 n−1 k=0 j=0 ≤n n−1 k=0 dk. ≤ n dk. k=0 (2.23) is equivalent to n→∞ lim n−1 Tr(Dn ) = 0.g.k = n→∞ n−1 lim n−1 s αn. results in 0 ≤ n−1 Tr(Dn )2 ≤ Dn 2 −→ n→∞ 0. respectively. (2. Then lim n−1 n→∞ n−1 k=0 s αn.26) Note that (2. Dividing by n2 . Eq. n→∞n−1 lim for any positive integer s.23). Equations (2.k .22) and (2.24) Applying the CauchySchwartz inequality [4. e. 17] to Tr(Dn ) yields Tr(Dn ) n−1 2 2 = k=0 dk.k . if An and Bn are Hermitian then (2. and taking the limit.k exists and is ﬁnite k=0 n−1 s βn.k .23).23) and (2.26) are special cases of the following fundamental theorem of asymptotic eigenvalue distribution.11 Proof.23) and (2.k k=0 = n→∞ n lim −1 n−1 2 βn.k 2 .15) imply that n→∞ lim n −1 n−1 2 αn.25) which implies (2.j } = An − Bn .24) and hence (2.k and βn. (2. Theorem 2. (2.27) . Assume that the eigenvalue moments of either matrix converge.2 Let An and Bn be asymptotically equivalent sequences of matrices with eigenvalues αn.. Let Dn = {dk. p. k=0 (2.26) relate limiting sample (arithmetic) averages of eigenvalues or moments of an eigenvalue distribution rather than individual eigenvalues.
This theorem.29) allows us to apply Lemma s 2.30) Whether or not An and Bn are Hermitian. If An and Bn are Hermitian. Theorem 2. into polynomials.3 to the matrices As and Dn to obtain (2.3 and consider As − Bn = ∆n .. s ∆ Let An = Bn + Dn as in Lemma 2. (2. Since n s the eigenvalues of As are αn.2 is the fundamental theorem concerning asymptotic eigenvalue behavior.2 Let An and Bn be asymptotically equivalent sequences of matrices with eigenvalues αn. is stated below for reference. 146] to immediately generalize Corollary 2. there exists a sequence of polynomials pn (x) such that n→∞ lim pn (x) = F (x) .k ) = n→∞ n−1 lim n−1 f (βn.26) holds for any positive integer s we can add sums corresponding to diﬀerent values of s to each side of (2. Corollary 2.28) and hence (2.k . our one real excursion into analysis. respectively.k and βn. p.3 (StoneWeierstrass) If F (x) is a continuous complex function on [a.2 thus gives −→ ∆n  ≤ K Dn  n→∞ 0. k=0 (2.3.27). Since (2. This observation immediately yields the following corollary. Equation (2. In this case the eigenvalues of both matrices are real and we can invoke the StoneWeierstrass approximation Theorem [4.26).k .28) The matrix ∆n is a sum of several terms each being a product of ∆n s and Bn s but containing at least one Dn .12 CHAPTER 2.30) can hold for any analytic function f (x) since such functions can be expanded into complex Taylor series. Then lim n−1 n→∞ n−1 k=0 f (αn. n Theorem 2.27). Repeated application of Lemma 2. and let f (x) be any polynomial. Most of the succeeding results on eigenvalues will be applications or specializations of (2.2 implies that (2.e. THE ASYMPTOTIC BEHAVIOR OF MATRICES Proof. (2. (2.29) where K does not depend on n. however. b].k ) . i.27) can be written in terms of ∆n as n n→∞ lim n−1 Tr∆n = 0. Corollary 2. then a much stronger result is possible.
k . p. Corollary 2.k and βn. 1.k . k = 0. Then lim n −1 n−1 k=0 (2. . b]. (2.k .13 uniformly on [a.4 we prove the following corollary on the determinants of asymptotically equivalent matrices.4 we have for F (x) = ln x lim n−1 n→∞ n−1 k=0 ln αn. n − 1.31)(2. Theorem 2.k and βn.31) n→∞ F [αn. βn. Since An and Bn are bounded there exist ﬁnite numbers m and M such that m ≤ αn. respectively. .k .4a) of [1]. n−1} satisfy (2.k .33) n→∞ n→∞ Proof. .k ] = n→∞ n lim −1 n−1 F [βn. As an example of the use of Theorem 2. such that αn. n = 1. .4 is the matrix equivalent of Theorem (7.k . . 1. 2. . 62]. βn. 1. M ]. Stated simply. . .k ≥ m > 0. .32).3 Let An and Bn be asymptotically equivalent Hermitian matrices with eigenvalues αn.4 Let An and Bn be asymptotically equivalent sequences of Hermitian matrices with eigenvalues αn. respectively.32) if either of the limits exists.k ] k=0 (2.2 immediately yields the following theorem: Theorem 2. . . they are said to be asymptotically equally distributed [1. Then lim (det An )1/n = lim (det Bn )1/n . . n−1} and {βn. Let F (x) be an arbitrary function continuous on [m. Applying Theorem 2. When two real sequences {αn.k ≤ M . any continuous function deﬁned on a real interval can be approximated arbitrarily closely by a polynomial. . . From Theorem 2. k = 0. . k = 0.3 to Corollary 2.k k=0 .k = n→∞ n−1 lim n−1 ln βn.
In certain cases stronger results on a type of elementwise convergence are possible using the stronger norm of Baxter [7..2 and 2. where the eigenvalues can get arbitrarily small but are still strictly positive. Baxter’s results are beyond the scope of this report. The main consequences have been the behavior of inverses and products (Theorem 2.k = lim exp n−1 ln n→∞ n−1 βn. These theorems do not concern individual entries in the matrices or individual eigenvalues.k . The major use of the theorems of this chapter is that we can often study the asymptotic behavior of complicated matrices by studying a more structured and simpler asymptotically equivalent matrix. . βn. In the preceding chapter the concept of asymptotic equivalence of matrices was deﬁned and its implications studied. THE ASYMPTOTIC BEHAVIOR OF MATRICES and hence lim exp n−1 ln n→∞ or equivalently n→∞ n−1 k=0 αn.e.14 CHAPTER 2.33) follows. 8]. n→∞ from which (2.k k=0 lim exp[n−1 ln det An ] = lim exp[n−1 ln det Bn ]. i.4). rather they describe an “average” −1 −1 −→ behavior. With suitable mathematical care the above corollary can be extended to the case where αn.k > 0. Thus saying A−1 ∼ Bn means that that A−1 − Bn  n→∞ 0 and n n says nothing about convergence of individual entries in the matrix. but there is no m satisfying the hypothesis of the corollary.1) and eigenvalues (Theorems 2.
of the n diﬀerence equations m−1 n−1 cn−m+k yk + k=0 k=m ck−m yk = ψ ym .1) where each row is a cyclic shift of the row above it. 1. (3. . . n − 1. A circulant matrix C is one having the form c0 c1 c2 c1 c2 cn−1 c0 . . c2 c1 c0 .. n − 1. m = 0. ··· cn−1 · · · cn−1 . . ..Chapter 3 Circulant Matrices The properties of circulant matrices are well known and easily derived [3. . m = 0.4) . . The matrix C is itself a special type of Toeplitz matrix. equivalently.. p. Since these matrices are used both to approximate and explain the behavior of Toeplitz matrices. . . 1. (3. .. . 15 (3. c1 C= cn−1 c0 c1 . 267]. ..[19].. The eigenvalues ψk and the eigenvectors y (k) of C are the solutions of Cy = ψ y (3. . .3) Changing the summation dummy variable results in n−1−m n−1 ck yk+m + k=0 k=n−m ck yk−(n−m) = ψ ym .2) or. it is instructive to present one version of the relevant derivations here. ...
One can solve difference equations as one solves differential equations — by guessing a (hopefully) intuitive solution and then proving that it works. Since the equation is linear with constant coefficients, a reasonable guess is $y_k = \rho^k$ (analogous to $y(t) = e^{s\tau}$ in linear time invariant differential equations). Substitution into (3.4) and cancellation of $\rho^m$ yields

$$\sum_{k=0}^{n-1-m} c_k \rho^k + \rho^{-n} \sum_{k=n-m}^{n-1} c_k \rho^k = \psi.$$

Thus if we choose $\rho^{-n} = 1$, i.e., $\rho$ is one of the $n$ distinct complex $n$th roots of unity, then we have an eigenvalue

$$\psi = \sum_{k=0}^{n-1} c_k \rho^k \qquad (3.5)$$

with corresponding eigenvector

$$y = n^{-1/2}\,(1, \rho, \rho^2, \ldots, \rho^{n-1})', \qquad (3.6)$$

where the normalization is chosen to give the eigenvector unit energy. Choosing $\rho_m$ as the complex $n$th root of unity, $\rho_m = e^{-2\pi i m/n}$, we have eigenvalue

$$\psi_m = \sum_{k=0}^{n-1} c_k e^{-2\pi i mk/n} \qquad (3.7)$$

and eigenvector

$$y^{(m)} = n^{-1/2}\,\big(1, e^{-2\pi i m/n}, \ldots, e^{-2\pi i m(n-1)/n}\big)'. \qquad (3.8)$$

From (3.7) we can write

$$C = U \Psi U^*, \qquad (3.9)$$

where

$$U = \big[\, y^{(0)} \,|\, y^{(1)} \,|\, \cdots \,|\, y^{(n-1)} \,\big] = \{\, n^{-1/2} e^{-2\pi i mk/n};\; m, k = 0, 1, \ldots, n-1 \,\}, \qquad \Psi = \{\psi_k \delta_{k,j}\}.$$
To verify (3.9) we note that the $(k,j)$th element of $U\Psi U^*$, say $a_{k,j}$, is

$$a_{k,j} = n^{-1} \sum_{m=0}^{n-1} e^{2\pi i m(j-k)/n}\, \psi_m = n^{-1} \sum_{m=0}^{n-1} e^{2\pi i m(j-k)/n} \sum_{r=0}^{n-1} c_r e^{-2\pi i mr/n} = n^{-1} \sum_{r=0}^{n-1} c_r \sum_{m=0}^{n-1} e^{2\pi i m(j-k-r)/n}. \qquad (3.10)$$

But we have

$$\sum_{m=0}^{n-1} e^{2\pi i m(j-k-r)/n} = \begin{cases} n, & r = (j-k) \bmod n\\ 0, & \text{otherwise,}\end{cases}$$

so that $a_{k,j} = c_{(j-k) \bmod n}$, which is exactly the circulant of (3.1). Thus (3.9) and (3.1) are equivalent, and furthermore (3.10) shows that any matrix expressible in the form (3.9) is circulant. Note that all circulant matrices have the same set of eigenvectors. Since $C$ is unitarily similar to a diagonal matrix, it is normal. This leads to the following properties.

Theorem 3.1 Let $C = \{c_{(j-k) \bmod n}\}$ and $B = \{b_{(j-k) \bmod n}\}$ be circulant $n \times n$ matrices with eigenvalues

$$\psi_m = \sum_{k=0}^{n-1} c_k e^{-2\pi i mk/n}, \qquad \beta_m = \sum_{k=0}^{n-1} b_k e^{-2\pi i mk/n},$$

respectively.

1. $C$ and $B$ commute and $CB = BC = U\gamma U^*$, where $\gamma = \{\psi_m \beta_m \delta_{k,m}\}$, and $CB$ is also a circulant matrix.
2. $C + B$ is a circulant matrix and $C + B = U\Omega U^*$, where $\Omega = \{(\psi_m + \beta_m)\delta_{k,m}\}$.

3. If $\psi_m \ne 0$, $m = 0, 1, \ldots, n-1$, then $C$ is nonsingular and $C^{-1} = U\Psi^{-1}U^*$, so that the inverse of $C$ can be straightforwardly constructed.

Proof. We have $C = U\Psi U^*$ and $B = U\Phi U^*$, where $\Psi$ and $\Phi$ are diagonal matrices with elements $\psi_m \delta_{k,m}$ and $\beta_m \delta_{k,m}$, respectively.

1. $CB = U\Psi U^* U\Phi U^* = U\Psi\Phi U^* = U\Phi\Psi U^* = BC$. Since $\Psi\Phi$ is diagonal, (3.10) implies that $CB$ is circulant.

2. $C + B = U(\Psi + \Phi)U^*$.

3. $C^{-1} = (U\Psi U^*)^{-1} = U\Psi^{-1}U^*$ if $\Psi$ is nonsingular. $\Box$

Circulant matrices are an especially tractable class of matrices since inverses, products, and sums are also circulants and hence both straightforward to construct and normal. In addition, the eigenvalues of such matrices can easily be found exactly. In the next chapter we shall see that certain circulant matrices asymptotically approximate Toeplitz matrices, and hence from Chapter 2 results similar to those in Theorem 3.1 will hold asymptotically for Toeplitz matrices.
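The computations of this chapter are easy to mirror numerically. The sketch below (using NumPy; the particular top row is an arbitrary choice) builds the circulant of (3.1), checks that its eigenvalues are $\psi_m = \sum_k c_k e^{-2\pi i mk/n}$ — i.e., the DFT of the top row — and that the inverse is again circulant with eigenvalues $1/\psi_m$, as Theorem 3.1 states:

```python
import numpy as np

def circulant(c):
    """Circulant of (3.1): row k is the top row cyclically shifted right k places."""
    c = np.asarray(c)
    return np.array([np.roll(c, k) for k in range(len(c))])

c = np.array([4.0, 1.0, 0.5, 0.25, 0.5, 1.0])   # arbitrary top row
C = circulant(c)

# Eigenvalues psi_m = sum_k c_k e^{-2*pi*i*m*k/n}: exactly the DFT of the top row.
psi = np.fft.fft(c)
assert np.allclose(np.sort_complex(np.linalg.eigvals(C)), np.sort_complex(psi))

# The inverse is the circulant with eigenvalues 1/psi_m (Theorem 3.1, part 3).
Cinv = circulant(np.fft.ifft(1.0 / psi).real)
assert np.allclose(Cinv, np.linalg.inv(C))
```

The same diagonalization underlies the circulant approximations to Toeplitz matrices developed in the next chapter.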
Chapter 4

Toeplitz Matrices

In this chapter the asymptotic behavior of inverses, products, eigenvalues, and determinants of finite Toeplitz matrices is derived by constructing an asymptotically equivalent circulant matrix and applying the results of the previous chapters. Consider the infinite sequence $\{t_k;\ k = 0, \pm 1, \pm 2, \ldots\}$ and define the finite $(n \times n)$ Toeplitz matrix $T_n = \{t_{k-j}\}$ as in (1.1).

Toeplitz matrices can be classified by the restrictions placed on the sequence $t_k$. If there exists a finite $m$ such that $t_k = 0$ for $|k| > m$, then $T_n$ is said to be a finite order Toeplitz matrix. If $t_k$ is an infinite sequence, then there are two common constraints. The most general is to assume that the $t_k$ are square summable, i.e., that

$$\sum_{k=-\infty}^{\infty} |t_k|^2 < \infty. \qquad (4.1)$$

Unfortunately this case requires mathematical machinery beyond that assumed in this paper: Lebesgue integration and a relatively advanced knowledge of Fourier series. As the main purpose here is tutorial and we wish chiefly to relay the flavor and an intuitive feel for the results, this paper will be confined to the stronger assumption that the $t_k$ are absolutely summable, i.e., that

$$\sum_{k=-\infty}^{\infty} |t_k| < \infty. \qquad (4.2)$$

This assumption greatly simplifies the mathematics but does not alter the fundamental concepts involved. The main advantage of (4.2) over (4.1) is that it ensures the existence and continuity of the Fourier
series $f(\lambda)$ defined by

$$f(\lambda) = \sum_{k=-\infty}^{\infty} t_k e^{ik\lambda} = \lim_{n\to\infty} \sum_{k=-n}^{n} t_k e^{ik\lambda}. \qquad (4.3)$$

Not only does the limit in (4.3) converge if (4.2) holds, it converges uniformly for all $\lambda$; that is, we have that

$$\Big| f(\lambda) - \sum_{k=-n}^{n} t_k e^{ik\lambda} \Big| = \Big| \sum_{k=-\infty}^{-n-1} t_k e^{ik\lambda} + \sum_{k=n+1}^{\infty} t_k e^{ik\lambda} \Big| \le \Big| \sum_{k=-\infty}^{-n-1} t_k e^{ik\lambda} \Big| + \Big| \sum_{k=n+1}^{\infty} t_k e^{ik\lambda} \Big| \le \sum_{k=-\infty}^{-n-1} |t_k| + \sum_{k=n+1}^{\infty} |t_k|,$$

where the right-hand side does not depend on $\lambda$ and goes to zero as $n \to \infty$ from (4.2). Thus given $\epsilon > 0$ there is a single $N$, not depending on $\lambda$, such that

$$\Big| f(\lambda) - \sum_{k=-n}^{n} t_k e^{ik\lambda} \Big| \le \epsilon, \quad \text{all } \lambda \in [0, 2\pi], \text{ if } n \ge N. \qquad (4.4)$$

Note that (4.2) is indeed a stronger constraint than (4.1) since

$$\sum_{k=-\infty}^{\infty} |t_k|^2 \le \Big( \sum_{k=-\infty}^{\infty} |t_k| \Big)^2.$$

Note also that (4.2) implies that $f(\lambda)$ is bounded since

$$|f(\lambda)| \le \sum_{k=-\infty}^{\infty} |t_k e^{ik\lambda}| \le \sum_{k=-\infty}^{\infty} |t_k| \stackrel{\Delta}{=} M_{|f|} < \infty.$$

The matrix $T_n$ will be Hermitian if and only if $f$ is real, in which case we denote the least upper bound and greatest lower bound of $f(\lambda)$ by $M_f$ and $m_f$, respectively. Observe that $\max(|m_f|, |M_f|) \le M_{|f|}$.
Since $f(\lambda)$ is the Fourier series of the sequence $t_k$, we could alternatively begin with a bounded and hence Riemann integrable function $f(\lambda)$ on $[0, 2\pi]$ ($|f(\lambda)| \le M_{|f|} < \infty$ for all $\lambda$) and define the sequence of $n \times n$ Toeplitz matrices

$$T_n(f) = \Big\{ (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{-i(k-j)\lambda}\, d\lambda;\ k, j = 0, 1, \ldots, n-1 \Big\}. \qquad (4.5)$$

As before, the Toeplitz matrices will be Hermitian iff $f$ is real. The assumption that $f(\lambda)$ is Riemann integrable implies that $f(\lambda)$ is continuous except possibly at a countable number of points. Which assumption is made depends on whether one begins with a sequence $t_k$ or a function $f(\lambda)$ — either assumption will be equivalent for our purposes, since it is the Riemann integrability of $f(\lambda)$ that simplifies the bookkeeping in either case. Before finding a simple asymptotically equivalent matrix to $T_n$, we use Corollary 2.1 to find a bound on the eigenvalues of $T_n$ when it is Hermitian and an upper bound to the strong norm in the general case.
Lemma 4.1 Let $\tau_{n,k}$ be the eigenvalues of a Toeplitz matrix $T_n(f)$. If $T_n(f)$ is Hermitian, then

$$m_f \le \tau_{n,k} \le M_f. \qquad (4.6)$$

Whether or not $T_n(f)$ is Hermitian,

$$\|T_n(f)\| \le 2M_{|f|}, \qquad (4.7)$$

so that the matrix is uniformly bounded over $n$ if $f$ is bounded.
Proof. Property (4.6) follows from Corollary 2.1:

$$\max_k \tau_{n,k} = \max_x\ (x^* T_n x)/(x^* x), \qquad \min_k \tau_{n,k} = \min_x\ (x^* T_n x)/(x^* x), \qquad (4.8)$$

so that, using (4.5),

$$x^* T_n x = \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \bar{x}_k\, t_{k-j}\, x_j = \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} \bar{x}_k\, x_j\, (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{-i(k-j)\lambda}\, d\lambda = (2\pi)^{-1} \int_0^{2\pi} \Big| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \Big|^2 f(\lambda)\, d\lambda \qquad (4.9)$$

and likewise

$$x^* x = \sum_{k=0}^{n-1} |x_k|^2 = (2\pi)^{-1} \int_0^{2\pi} \Big| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \Big|^2 d\lambda. \qquad (4.10)$$

Combining (4.9)–(4.10) results in

$$m_f \le \frac{\int_0^{2\pi} f(\lambda) \big| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \big|^2 d\lambda}{\int_0^{2\pi} \big| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \big|^2 d\lambda} = \frac{x^* T_n x}{x^* x} \le M_f, \qquad (4.11)$$

which with (4.8) yields (4.6). Alternatively, observe in (4.11) that if $e^{(k)}$ is the eigenvector associated with $\tau_{n,k}$, then the quadratic form with $x = e^{(k)}$ yields $x^* T_n x = \tau_{n,k} \sum_{k=0}^{n-1} |x_k|^2$, so that (4.11) implies (4.6) directly.

We have already seen in (2.13) that if $T_n(f)$ is Hermitian, then $\|T_n(f)\| = \max_k |\tau_{n,k}| \stackrel{\Delta}{=} |\tau_{n,M}|$, which we have just shown satisfies $|\tau_{n,M}| \le \max(|M_f|, |m_f|)$, which in turn must be less than $M_{|f|}$; this proves (4.7) for Hermitian matrices.

Suppose that $T_n(f)$ is not Hermitian or, equivalently, that $f$ is not real. Any function $f$ can be written in terms of its real and imaginary parts, $f = f_r + i f_i$, where both $f_r$ and $f_i$ are real. In particular, $f_r = (f + f^*)/2$ and $f_i = (f - f^*)/2i$. Since the strong norm is a norm,

$$\|T_n(f)\| = \|T_n(f_r + i f_i)\| = \|T_n(f_r) + i T_n(f_i)\| \le \|T_n(f_r)\| + \|T_n(f_i)\| \le M_{|f_r|} + M_{|f_i|}.$$

Since $|(f \pm f^*)/2| \le (|f| + |f^*|)/2 \le M_{|f|}$, we have $M_{|f_r|} + M_{|f_i|} \le 2 M_{|f|}$, proving (4.7). $\Box$

Note for later use that the weak norm between Toeplitz matrices has a simpler form than (2.14). Let $T_n = \{t_{k-j}\}$ and $T'_n = \{t'_{k-j}\}$ be Toeplitz; then by collecting equal terms we have

$$|T_n - T'_n|^2 = n^{-1} \sum_{k=0}^{n-1} \sum_{j=0}^{n-1} |t_{k-j} - t'_{k-j}|^2 = n^{-1} \sum_{k=-(n-1)}^{n-1} (n - |k|)\, |t_k - t'_k|^2 = \sum_{k=-(n-1)}^{n-1} (1 - |k|/n)\, |t_k - t'_k|^2. \qquad (4.12)$$

We are now ready to put all the pieces together to study the asymptotic behavior of $T_n$. If we can find an asymptotically equivalent circulant matrix, then all of the results of Chapters 2 and 3 can be instantly applied. The main difference between the derivations for the finite and infinite order cases is the circulant matrix chosen. Hence to gain some feel for the matrix chosen, we first consider the simpler finite order case, where the answer is obvious, and then generalize in a natural way to the infinite order case.

4.1 Finite Order Toeplitz Matrices

Let $T_n$ be a sequence of finite order Toeplitz matrices of order $m + 1$, i.e., $t_i = 0$ unless $|i| \le m$. Since we are interested in the behavior of $T_n$ for large $n$, we choose $n \gg m$. A typical Toeplitz matrix will then have the appearance of the following matrix, possessing a band of nonzero entries down the central diagonal and zeros everywhere else. With the exception of the upper left and lower right hand corners, $T_n$ looks like a circulant matrix; that is, each row
is the row above shifted to the right one place:

$$T_n = \begin{bmatrix}
t_0 & t_{-1} & \cdots & t_{-m} & & & 0\\
t_1 & t_0 & t_{-1} & & \ddots & &\\
\vdots & t_1 & \ddots & \ddots & & \ddots &\\
t_m & & \ddots & \ddots & \ddots & & t_{-m}\\
 & \ddots & & \ddots & \ddots & & \vdots\\
 & & \ddots & & t_1 & t_0 & t_{-1}\\
0 & & & t_m & \cdots & t_1 & t_0
\end{bmatrix}. \qquad (4.13)$$

We can make this matrix exactly into a circulant if we fill in the upper right and lower left corners with the appropriate entries. Define the circulant matrix $C_n$ in just this way:
$$C_n = \begin{bmatrix}
c_0^{(n)} & c_1^{(n)} & \cdots & c_{n-1}^{(n)}\\
c_{n-1}^{(n)} & c_0^{(n)} & & \vdots\\
\vdots & & \ddots & \\
c_1^{(n)} & \cdots & c_{n-1}^{(n)} & c_0^{(n)}
\end{bmatrix}. \qquad (4.14)$$

Equivalently, $C_n$ consists of cyclic shifts of $(c_0^{(n)}, \ldots, c_{n-1}^{(n)})$, where

$$c_k^{(n)} = \begin{cases} t_{-k}, & k = 0, 1, \ldots, m\\ t_{n-k}, & k = n-m, \ldots, n-1\\ 0, & \text{otherwise.}\end{cases} \qquad (4.15)$$

Given $T_n(f)$, the circulant matrix defined as in (4.14)–(4.15) is denoted $C_n(f)$. The matrix $C_n(f)$ is intuitively a candidate for a simple matrix asymptotically equivalent to $T_n(f)$ — we need only prove that it is indeed both asymptotically equivalent and simple.

Lemma 4.2 The matrices $T_n$ and $C_n$ defined in (4.13) and (4.14) are asymptotically equivalent, i.e., both are bounded in the strong norm and

$$\lim_{n\to\infty} |T_n - C_n| = 0. \qquad (4.16)$$

Proof. The $t_k$ are obviously absolutely summable, so the $T_n$ are uniformly bounded by $2M_{|f|}$ from Lemma 4.1. The matrices $C_n$ are also uniformly bounded since $C_n^* C_n$ is a circulant matrix with eigenvalues $|f(2\pi k/n)|^2 \le 4M_{|f|}^2$. The weak norm of the difference is

$$|T_n - C_n|^2 = n^{-1} \sum_{k=0}^{m} k\,(|t_k|^2 + |t_{-k}|^2) \le m\, n^{-1} \sum_{k=0}^{m} (|t_k|^2 + |t_{-k}|^2) \to 0 \text{ as } n \to \infty. \quad \Box$$
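Lemma 4.2 is easy to check numerically. The sketch below (the band values are an arbitrary choice) builds a banded Toeplitz matrix, its circulant completion (4.15), and verifies that the weak norm of the difference — which lives entirely in the two corners — shrinks like $n^{-1/2}$:

```python
import numpy as np

t = {-2: 0.25, -1: 0.5, 0: 2.0, 1: 0.5, 2: 0.25}   # arbitrary band, t_k = 0 for |k| > 2
m = 2

def toeplitz_T(n):
    # T_n = {t_{k-j}} of (4.13)
    return np.array([[t.get(k - j, 0.0) for j in range(n)] for k in range(n)])

def circulant_C(n):
    # Top row from (4.15): c_k = t_{-k} for k <= m, t_{n-k} for k >= n-m, else 0.
    c = np.zeros(n)
    for k in range(n):
        if k <= m:
            c[k] = t.get(-k, 0.0)
        elif k >= n - m:
            c[k] = t.get(n - k, 0.0)
    return np.array([np.roll(c, i) for i in range(n)])

def weak_norm(A):
    return np.sqrt((np.abs(A) ** 2).sum() / len(A))

norms = [weak_norm(toeplitz_T(n) - circulant_C(n)) for n in (8, 32, 128, 512)]
# |T_n - C_n|^2 = n^{-1} sum_{k=1}^{m} k (|t_k|^2 + |t_{-k}|^2), so it decays like n^{-1/2}.
assert all(a > b for a, b in zip(norms, norms[1:]))
assert norms[-1] < 0.1
```

Note that the strong norm of the difference does not vanish — only the weak (averaged) norm does, which is exactly what asymptotic equivalence requires.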
The above lemma is almost trivial since the matrix $T_n - C_n$ has fewer than $m^2$ nonzero entries, and hence the $n^{-1}$ in the weak norm drives $|T_n - C_n|$ to zero.

Lemma 4.2 gives us immediate hope of fruitfully studying inverses and products of Toeplitz matrices, and from Lemma 4.2 and Theorem 2.2 we have the following lemma.

Lemma 4.3 Let $T_n$ and $C_n$ be as in (4.13) and (4.14) and let their eigenvalues be $\tau_{n,k}$ and $\psi_{n,k}$, respectively; then for any positive integer $s$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s. \qquad (4.17)$$

In fact, for finite $n$,

$$\Big| n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s - n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s \Big| \le K n^{-1/2}, \qquad (4.18)$$

where $K$ is not a function of $n$.

Proof. Equation (4.17) is direct from Lemma 4.2 and Theorem 2.2. Equation (4.18) follows from Lemma 2.3 and Lemma 4.2. $\Box$

The above two lemmas show that we can immediately apply the results of Chapter 2 to $T_n$ and $C_n$. Although (4.18) is of interest in that for finite order Toeplitz matrices one can find the rate of convergence of the eigenvalue moments, it is not yet clear that we have simplified the study of the eigenvalues. The next lemma clarifies that we have indeed found a useful approximation.

Lemma 4.4 Let $C_n(f)$ be constructed from $T_n(f)$ as in (4.14) and let $\psi_{n,k}$ be the eigenvalues of $C_n(f)$; then for any positive integer $s$ we have

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} f^s(\lambda)\, d\lambda. \qquad (4.19)$$
If $T_n(f)$ and hence $C_n(f)$ are Hermitian, then for any function $F(x)$ continuous on $[m_f, M_f]$ we have

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\psi_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda. \qquad (4.20)$$

Proof. From Chapter 3 we have exactly

$$\psi_{n,j} = \sum_{k=0}^{n-1} c_k^{(n)} e^{-2\pi i jk/n} = \sum_{k=0}^{m} t_{-k} e^{-2\pi i jk/n} + \sum_{k=n-m}^{n-1} t_{n-k} e^{-2\pi i jk/n} = \sum_{k=-m}^{m} t_k e^{2\pi i jk/n} = f(2\pi j/n). \qquad (4.21)$$

Note that the eigenvalues of $C_n$ are simply the values of $f(\lambda)$ with $\lambda$ uniformly spaced between $0$ and $2\pi$. Defining $2\pi k/n = \lambda_k$ and $2\pi/n = \Delta\lambda$, we have

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} f(2\pi k/n)^s = \lim_{n\to\infty} \sum_{k=0}^{n-1} f(\lambda_k)^s\, \Delta\lambda/(2\pi) = (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda, \qquad (4.22)$$

where the continuity of $f(\lambda)$ guarantees the existence of the limit of (4.22) as a Riemann integral. If $T_n$ and $C_n$ are Hermitian, then the $\psi_{n,k}$ and $f(\lambda)$ are real, and application of the Stone–Weierstrass theorem to (4.22) yields (4.20); equations (4.6) and (4.21) ensure that the $\psi_{n,k}$ and $\tau_{n,k}$ are in the real interval $[m_f, M_f]$. $\Box$

Combining Lemmas 4.2–4.4 and Theorem 2.2 we have the following special case of the fundamental eigenvalue distribution theorem.
Theorem 4.1 If $T_n(f)$ is a finite order Toeplitz matrix with eigenvalues $\tau_{n,k}$, then for any positive integer $s$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda. \qquad (4.23)$$

Furthermore, if $T_n(f)$ is Hermitian, then for any function $F(x)$ continuous on $[m_f, M_f]$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\tau_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda; \qquad (4.24)$$

i.e., the sequences $\tau_{n,k}$ and $f(2\pi k/n)$ are asymptotically equally distributed.

This behavior should seem reasonable since the equations $T_n x = \tau x$ and $C_n x = \psi x$, $n > 2m + 1$, are essentially the same $n$th order difference equation with different boundary conditions. It is in fact the "nice" boundary conditions that make $\psi$ easy to find exactly while exact solutions for $\tau$ are usually intractable.

With the eigenvalue problem in hand we could next write down theorems on inverses and products of Toeplitz matrices using Lemma 4.2 and the results of Chapters 2–3. Since these theorems are identical in statement and proof with the infinite order absolutely summable Toeplitz case, we defer them momentarily and generalize Theorem 4.1 to more general Toeplitz matrices with no assumption of finite order.

4.2 Toeplitz Matrices

Obviously the choice of an appropriate circulant matrix to approximate a Toeplitz matrix is not unique, so we are free to choose a construction with the most desirable properties. It will, in fact, prove useful to consider two slightly different circulant approximations to a given Toeplitz matrix. Say we have an absolutely summable sequence $\{t_k;\ k = 0, \pm 1, \pm 2, \ldots\}$ with

$$f(\lambda) = \sum_{k=-\infty}^{\infty} t_k e^{ik\lambda}, \qquad t_k = (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{-ik\lambda}\, d\lambda. \qquad (4.25)$$
Define $C_n(f)$ to be the circulant matrix with top row $(c_0^{(n)}, c_1^{(n)}, \ldots, c_{n-1}^{(n)})$, where

$$c_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n)\, e^{2\pi i jk/n}. \qquad (4.26)$$

Since $f(\lambda)$ is Riemann integrable, we have that for fixed $k$

$$\lim_{n\to\infty} c_k^{(n)} = \lim_{n\to\infty} n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n)\, e^{2\pi i jk/n} = (2\pi)^{-1} \int_0^{2\pi} f(\lambda) e^{ik\lambda}\, d\lambda = t_{-k}, \qquad (4.27)$$

and hence the $c_k^{(n)}$ are simply the sum approximations to the Riemann integral giving $t_{-k}$. Equations (4.26), (3.7), and (3.9) show that the eigenvalues $\psi_{n,m}$ of $C_n$ are simply $f(2\pi m/n)$; that is, from (3.7) and (4.26),

$$\psi_{n,m} = \sum_{k=0}^{n-1} c_k^{(n)} e^{-2\pi i mk/n} = \sum_{j=0}^{n-1} f(2\pi j/n)\, n^{-1} \sum_{k=0}^{n-1} e^{2\pi i k(j-m)/n} = f(2\pi m/n). \qquad (4.28)$$

Thus $C_n(f)$ has the useful property (4.21) of the circulant approximation (4.15) used in the finite case, and as a result the conclusions of Lemma 4.4 hold for the more general case with $C_n(f)$ constructed as in (4.26). Equation (4.28) in turn defines $C_n(f)$ since, if we are told that $C_n$ is a circulant matrix with eigenvalues $f(2\pi m/n)$, $m = 0, 1, \ldots, n-1$, then from (3.9)

$$c_k^{(n)} = n^{-1} \sum_{m=0}^{n-1} \psi_{n,m} e^{2\pi i mk/n} = n^{-1} \sum_{m=0}^{n-1} f(2\pi m/n)\, e^{2\pi i mk/n}; \qquad (4.29)$$

that is, either (4.26) or (4.28) can be used to define $C_n(f)$.
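Both characterizations are easy to verify numerically. The sketch below ($f$ is an arbitrary positive trigonometric polynomial) builds the top row of $C_n(f)$ from (4.26) — an inverse DFT of the samples of $f$ — and confirms property (4.28), that the resulting circulant has eigenvalues exactly $f(2\pi m/n)$:

```python
import numpy as np

def f(lam):
    return 3.0 + 2.0 * np.cos(lam) + np.cos(2 * lam)   # arbitrary real f

n = 16
samples = f(2 * np.pi * np.arange(n) / n)

# (4.26): c_k = n^{-1} sum_j f(2*pi*j/n) e^{2*pi*i*j*k/n}  -- the inverse DFT of the samples.
c = np.fft.ifft(samples)
Cn = np.array([np.roll(c, k) for k in range(n)])

# (4.28): the eigenvalues of C_n(f) are exactly the samples f(2*pi*m/n).
eig = np.linalg.eigvals(Cn)
assert np.allclose(np.sort(eig.real), np.sort(samples))
assert np.allclose(eig.imag, 0.0, atol=1e-8)
```

As (4.27) indicates, for fixed $k$ the top-row entry $c_k^{(n)}$ is a quadrature approximation to the Fourier coefficient $t_{-k}$ of $f$.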
Substituting the Fourier series (4.25) for $f$ and collecting terms with the same exponent modulo $n$, we have that

$$f(2\pi j/n) = \sum_{m=-\infty}^{\infty} t_m e^{i 2\pi jm/n} = \sum_{l=0}^{n-1} \Big( \sum_{m=-\infty}^{\infty} t_{l+mn} \Big) e^{2\pi i jl/n},$$

where the rearrangement is justified by the absolute summability of the $t_k$.
The fact that Lemma 4.4 holds for $C_n(f)$ yields several useful properties, as summarized by the following lemma.

Lemma 4.5

1. Given a function $f$ of (4.25) and the circulant matrix $C_n(f)$ defined by (4.26), then

$$c_k^{(n)} = \sum_{m=-\infty}^{\infty} t_{-k+mn}, \qquad k = 0, 1, \ldots, n-1. \qquad (4.30)$$

(Note that the sum exists since the $t_k$ are absolutely summable.)

2. Given two functions $f(\lambda)$ and $g(\lambda)$, then $C_n(f)C_n(g) = C_n(fg)$.

3. Given $T_n(f)$ where $f(\lambda)$ is real and $f(\lambda) \ge m_f > 0$, then $C_n(f)^{-1} = C_n(1/f)$.

Proof.

1. Since $e^{2\pi i jl/n}$ is periodic with period $n$ in $l$, we have from (4.26) and the preceding identity

$$c_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} f(2\pi j/n) e^{2\pi i jk/n} = \sum_{l=0}^{n-1} \sum_{m=-\infty}^{\infty} t_{l+mn}\, n^{-1} \sum_{j=0}^{n-1} e^{2\pi i j(k+l)/n} = \sum_{m=-\infty}^{\infty} t_{-k+mn},$$

since the inner sum over $j$ is $1$ when $l \equiv -k \bmod n$ and $0$ otherwise.

2. Follows immediately from Theorem 3.1, the fact that $C_n(f)C_n(g)$ is a circulant with eigenvalues $f(2\pi k/n)g(2\pi k/n)$, and the fact that, if $f(\lambda)$ and $g(\lambda)$ are Riemann integrable, so is $f(\lambda)g(\lambda)$.

3. Since $C_n(f)$ has eigenvalues $f(2\pi k/n) > 0$, by Theorem 3.1 $C_n(f)^{-1}$ has eigenvalues $1/f(2\pi k/n)$, and hence from (4.29) and the fact that $C_n(f)^{-1}$ is circulant we have $C_n(f)^{-1} = C_n(1/f)$. $\Box$

Equation (4.30) points out a shortcoming of $C_n(f)$ for applications as a circulant approximation to $T_n(f)$ — it depends on the entire sequence $\{t_k;\ k = 0, \pm 1, \pm 2, \ldots\}$ and not just on the finite collection of elements $\{t_k;\ k = 0, \pm 1, \ldots, \pm(n-1)\}$ of $T_n(f)$. This can cause problems in practical situations where we wish a circulant approximation to a Toeplitz matrix $T_n$ when we only know $T_n$ and not $f$. Pearl [13] discusses several coding and filtering applications where this restriction is necessary for practical reasons. A natural such approximation is to form the truncated Fourier series

$$\hat{f}_n(\lambda) = \sum_{k=-(n-1)}^{n-1} t_k e^{ik\lambda}, \qquad (4.31)$$

which depends only on $\{t_k;\ k = 0, \pm 1, \ldots, \pm(n-1)\}$, and then define the circulant matrix

$$\hat{C}_n = C_n(\hat{f}_n). \qquad (4.32)$$
The following lemma shows that these circulant matrices are asymptotically equivalent to each other and to $T_n$.

Lemma 4.6 Let $T_n = \{t_{k-j}\}$, where $\sum_{k=-\infty}^{\infty} |t_k| < \infty$, and define as usual $f(\lambda) = \sum_{k=-\infty}^{\infty} t_k e^{ik\lambda}$. Define the circulant matrices $C_n(f)$ and $\hat{C}_n = C_n(\hat{f}_n)$ as in (4.26) and (4.31)–(4.32). Then

$$C_n(f) \sim \hat{C}_n \sim T_n. \qquad (4.34)$$

Note that both $C_n(f)$ and $\hat{C}_n = C_n(\hat{f}_n)$ reduce to the $C_n(f)$ of (4.14)–(4.15) for an $r$th order Toeplitz matrix if $n > 2r + 1$.

Proof. The top row $(\hat{c}_0^{(n)}, \ldots, \hat{c}_{n-1}^{(n)})$ of $\hat{C}_n$ is

$$\hat{c}_k^{(n)} = n^{-1} \sum_{j=0}^{n-1} \hat{f}_n(2\pi j/n)\, e^{2\pi i jk/n} = \sum_{m=-(n-1)}^{n-1} t_m\, \Big( n^{-1} \sum_{j=0}^{n-1} e^{2\pi i j(k+m)/n} \Big) = t_{-k} + t_{n-k}, \qquad 1 \le k \le n-1, \qquad (4.33)$$

(and $\hat{c}_0^{(n)} = t_0$), since the term in parentheses is, from (3.10), $1$ if $m = -k$ or $m = n-k$ and $0$ otherwise. $\hat{C}_n$ does not have the property (4.28) of having eigenvalues $f(2\pi k/n)$ in the general case (its eigenvalues are $\hat{f}_n(2\pi k/n)$, $k = 0, 1, \ldots, n-1$), but it does have the desirable property of depending only on the entries of $T_n$.
Since both $C_n(f)$ and $\hat{C}_n$ are circulant matrices with the same eigenvectors (Theorem 3.1), we have from (2.14) and the comment following it that

$$|C_n(f) - \hat{C}_n|^2 = n^{-1} \sum_{k=0}^{n-1} |f(2\pi k/n) - \hat{f}_n(2\pi k/n)|^2.$$

Recall from (4.4) and the related discussion that $\hat{f}_n(\lambda)$ uniformly converges to $f(\lambda)$, and hence given $\epsilon > 0$ there is an $N$ such that for $n \ge N$ we have, for all $k$ and $n$,

$$|f(2\pi k/n) - \hat{f}_n(2\pi k/n)|^2 \le \epsilon,$$

and hence for $n \ge N$

$$|C_n(f) - \hat{C}_n|^2 \le n^{-1} \cdot n\epsilon = \epsilon.$$

Since $\epsilon$ is arbitrary,

$$\lim_{n\to\infty} |C_n(f) - \hat{C}_n| = 0, \qquad (4.35)$$

proving that $C_n(f) \sim \hat{C}_n$.

Next, note from (4.33) that $\hat{C}_n$ agrees with the Toeplitz-indexed array $\{t'_{k-j}\}$, where

$$t'_k = t_k + t_{k-n},\ k \ge 1; \qquad t'_0 = t_0; \qquad t'_k = t_k + t_{k+n},\ k \le -1, \qquad (4.36)$$

so that we can use (4.12) to obtain

$$|T_n - \hat{C}_n|^2 = \sum_{k=-(n-1)}^{n-1} (1 - |k|/n)\, |t_k - t'_k|^2 = \sum_{k=1}^{n-1} (1 - k/n)\big(|t_{n-k}|^2 + |t_{k-n}|^2\big) = \sum_{k=1}^{n-1} (k/n)\big(|t_k|^2 + |t_{-k}|^2\big).$$
Since the $\{t_k\}$ are absolutely summable, given $\epsilon > 0$ we can choose an $N$ large enough so that

$$\sum_{k=N}^{\infty} \big(|t_k|^2 + |t_{-k}|^2\big) \le \epsilon,$$

and hence

$$\lim_{n\to\infty} |T_n - \hat{C}_n|^2 = \lim_{n\to\infty} \Big[ \sum_{k=1}^{N-1} (k/n)\big(|t_k|^2 + |t_{-k}|^2\big) + \sum_{k=N}^{n-1} (k/n)\big(|t_k|^2 + |t_{-k}|^2\big) \Big] \le 0 + \epsilon,$$

since $k/n \to 0$ for each fixed $k < N$ and $k/n \le 1$ in the second sum. Since $\epsilon$ is arbitrary,

$$\lim_{n\to\infty} |T_n - \hat{C}_n| = 0, \quad \text{so that } T_n \sim \hat{C}_n, \qquad (4.37)$$

which with (4.35) and Theorem 2.1 proves (4.34). $\Box$

We note that Pearl [13] develops a circulant matrix similar to $\hat{C}_n$ (depending only on the entries of $T_n$) such that (4.37) holds in the more general case where (4.1) instead of (4.2) holds.

We now have a circulant matrix $C_n(f)$ asymptotically equivalent to $T_n$ and whose eigenvalues, inverses, and products are known exactly. We can now use Lemmas 4.5–4.6 and Theorems 2.2–2.3 to immediately generalize Theorem 4.1.

Theorem 4.2 Let $T_n(f)$ be a sequence of Toeplitz matrices such that $f(\lambda)$ is Riemann integrable, e.g., $f(\lambda)$ is bounded or the sequence $t_k$ is absolutely summable. Then if $\tau_{n,k}$ are the eigenvalues of $T_n$ and $s$ is any positive integer,

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \tau_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} f(\lambda)^s\, d\lambda. \qquad (4.38)$$
Furthermore, if $T_n(f)$ is Hermitian ($f(\lambda)$ is real), then for any function $F(x)$ continuous on $[m_f, M_f]$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\tau_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)]\, d\lambda. \qquad (4.39)$$

Theorem 4.2 is the fundamental eigenvalue distribution theorem of Szegő [1]. The approach used here is essentially a specialization of Grenander's [1, ch. 7]. Theorem 4.2 yields the following two corollaries.

Corollary 4.1 Let $T_n(f)$ be Hermitian and define the eigenvalue distribution function $D_n(x) = n^{-1}\,(\text{number of } \tau_{n,k} \le x)$. Assume that

$$\int_{\lambda:\, f(\lambda)=x} d\lambda = 0.$$

Then the limiting distribution $D(x) = \lim_{n\to\infty} D_n(x)$ exists and is given by

$$D(x) = (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda.$$

The technical condition of a zero integral over the set of $\lambda$ for which $f(\lambda) = x$ is needed to ensure that $x$ is a point of continuity of the limiting distribution.

Proof. Define the characteristic function

$$1_x(\alpha) = \begin{cases} 1, & m_f \le \alpha \le x\\ 0, & \text{otherwise.}\end{cases}$$

We have

$$D(x) = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x(\tau_{n,k}).$$

Unfortunately, $1_x(\alpha)$ is not a continuous function, and hence Theorem 4.2 cannot be immediately applied. To get around this problem we mimic Grenander
and Szegő, p. 115, and define two continuous functions that provide upper and lower bounds to $1_x$ and will converge to it in the limit. Define

$$1_x^+(\alpha) = \begin{cases} 1, & \alpha \le x\\ 1 - \dfrac{\alpha - x}{\epsilon}, & x < \alpha \le x + \epsilon\\ 0, & x + \epsilon < \alpha\end{cases} \qquad\qquad 1_x^-(\alpha) = \begin{cases} 1, & \alpha \le x - \epsilon\\ 1 - \dfrac{\alpha - x + \epsilon}{\epsilon}, & x - \epsilon < \alpha \le x\\ 0, & x < \alpha.\end{cases}$$

The idea here is that the upper bound has an output of $1$ everywhere $1_x$ does, but then it drops in a continuous linear fashion to zero at $x + \epsilon$ instead of immediately at $x$. The lower bound has a $0$ everywhere $1_x$ does, and it rises linearly from $x$ to $x - \epsilon$ to the value of $1$ instead of instantaneously as does $1_x$. Clearly $1_x^-(\alpha) < 1_x(\alpha) < 1_x^+(\alpha)$ for all $\alpha$.

Since both $1_x^+$ and $1_x^-$ are continuous, Theorem 4.2 can be used to conclude that

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x^+(\tau_{n,k}) = (2\pi)^{-1} \int 1_x^+(f(\lambda))\, d\lambda = (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda + (2\pi)^{-1} \int_{x<f(\lambda)\le x+\epsilon} \Big(1 - \frac{f(\lambda)-x}{\epsilon}\Big) d\lambda \le (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda + (2\pi)^{-1} \int_{x<f(\lambda)\le x+\epsilon} d\lambda$$

and

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 1_x^-(\tau_{n,k}) = (2\pi)^{-1} \int 1_x^-(f(\lambda))\, d\lambda = (2\pi)^{-1} \int_{f(\lambda)\le x-\epsilon} d\lambda + (2\pi)^{-1} \int_{x-\epsilon<f(\lambda)\le x} \Big(1 - \frac{f(\lambda)-(x-\epsilon)}{\epsilon}\Big) d\lambda$$
$$= (2\pi)^{-1} \int_{f(\lambda)\le x-\epsilon} d\lambda + (2\pi)^{-1} \int_{x-\epsilon<f(\lambda)\le x} \frac{x - f(\lambda)}{\epsilon}\, d\lambda \ge (2\pi)^{-1} \int_{f(\lambda)\le x-\epsilon} d\lambda = (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda - (2\pi)^{-1} \int_{x-\epsilon<f(\lambda)\le x} d\lambda.$$

These inequalities imply that for any $\epsilon > 0$, as $n$ grows the sample average $n^{-1} \sum_{k=0}^{n-1} 1_x(\tau_{n,k})$ will be sandwiched between

$$(2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda + (2\pi)^{-1} \int_{x<f(\lambda)\le x+\epsilon} d\lambda \qquad \text{and} \qquad (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda - (2\pi)^{-1} \int_{x-\epsilon<f(\lambda)\le x} d\lambda.$$

Since $\epsilon$ can be made arbitrarily small, if

$$\int_{f(\lambda)=x} d\lambda = 0,$$

then

$$D(x) = (2\pi)^{-1} \int_0^{2\pi} 1_x[f(\lambda)]\, d\lambda = (2\pi)^{-1} \int_{f(\lambda)\le x} d\lambda. \qquad \Box$$

Corollary 4.2 For $T_n(f)$ Hermitian we have

$$\lim_{n\to\infty} \max_k \tau_{n,k} = M_f, \qquad \lim_{n\to\infty} \min_k \tau_{n,k} = m_f.$$
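Before turning to the proof, both corollaries are easy to observe numerically. The sketch below uses the arbitrary choice $f(\lambda) = 2 + \cos\lambda$ (so $t_0 = 2$, $t_{\pm 1} = 1/2$, $m_f = 1$, $M_f = 3$, and $T_n$ is tridiagonal):

```python
import numpy as np

def T(n):
    # T_n(f) for f(lam) = 2 + cos(lam): t_0 = 2, t_{+-1} = 1/2, all other t_k = 0.
    return (np.diag(np.full(n, 2.0))
            + np.diag(np.full(n - 1, 0.5), 1)
            + np.diag(np.full(n - 1, 0.5), -1))

eig = np.linalg.eigvalsh(T(256))
# Lemma 4.1: the eigenvalues lie strictly inside [m_f, M_f] = [1, 3].
assert eig.min() > 1.0 and eig.max() < 3.0

# Corollary 4.1: D_n(x) -> (2*pi)^{-1} * measure{lam : f(lam) <= x}.
# At x = 2, f <= 2 on exactly half the circle, so D(2) = 1/2.
assert abs((eig <= 2.0).mean() - 0.5) < 0.02

# Corollary 4.2: the extreme eigenvalues approach m_f = 1 and M_f = 3 (without reaching them).
assert eig.min() < 1.001 and eig.max() > 2.999
```

The extremes converge only in the limit: for finite $n$ all eigenvalues of this example are $2 + \cos(k\pi/(n+1))$, strictly inside $(m_f, M_f)$.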
Proof. From Corollary 4.1 we have for any $\epsilon > 0$

$$D(m_f + \epsilon) = (2\pi)^{-1} \int_{f(\lambda)\le m_f+\epsilon} d\lambda > 0.$$

The strict inequality follows from the continuity of $f(\lambda)$. Since

$$\lim_{n\to\infty} n^{-1}\, \{\text{number of } \tau_{n,k} \text{ in } [m_f, m_f+\epsilon]\} > 0,$$

there must be eigenvalues in the interval $[m_f, m_f+\epsilon]$ for arbitrarily small $\epsilon$. Since $\tau_{n,k} \ge m_f$ by Lemma 4.1, the minimum result is proved. The maximum result is proved similarly. $\Box$

We next consider the inverse of an Hermitian Toeplitz matrix.

Theorem 4.3 Let $T_n(f)$ be a sequence of Hermitian Toeplitz matrices such that $f(\lambda)$ is Riemann integrable and $f(\lambda) \ge 0$, with equality holding at most at a countable number of points.

1. If $f(\lambda) \ge m_f > 0$, then $T_n(f)$ is nonsingular.

2. If $f(\lambda) \ge m_f > 0$, then

$$T_n(f)^{-1} \sim C_n(f)^{-1} = C_n(1/f), \qquad (4.40)$$

where $C_n(f)$ is defined in (4.26). Furthermore, if we define $T_n(f) - C_n(f) = D_n$, then $T_n(f)^{-1}$ has the expansion

$$T_n(f)^{-1} = [C_n(f) + D_n]^{-1} = C_n(f)^{-1}\big[I + D_n C_n(f)^{-1}\big]^{-1} = C_n(f)^{-1}\big[I - D_n C_n(f)^{-1} + (D_n C_n(f)^{-1})^2 - \cdots\big], \qquad (4.41)$$

and the expansion converges (in weak norm) for sufficiently large $n$.

3. If $f(\lambda) \ge m_f > 0$, then

$$T_n(f)^{-1} \sim T_n(1/f) = \Big\{ (2\pi)^{-1} \int_{-\pi}^{\pi} e^{i(k-j)\lambda}/f(\lambda)\, d\lambda \Big\}. \qquad (4.42)$$
Also, if $m_f > 0$, $\rho_{n,k}$ are the eigenvalues of $T_n(f)^{-1}$, and $F(x)$ is any continuous function on $[1/M_f, 1/m_f]$, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\rho_{n,k}) = (2\pi)^{-1} \int_{-\pi}^{\pi} F[1/f(\lambda)]\, d\lambda. \qquad (4.43)$$

4. Suppose that $m_f = 0$, i.e., $f(\lambda)$ has at least one zero, and that the derivative of $f(\lambda)$ exists and is bounded. Then $T_n(f)^{-1}$ is not bounded, $1/f(\lambda)$ is not integrable, and hence $T_n(1/f)$ is not defined and the integrals of (4.42) may not exist. For any finite $\theta$, however, the following similar fact is true: if $F(x)$ is a continuous function on $[1/M_f, \theta]$, then

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F[\min(\rho_{n,k}, \theta)] = (2\pi)^{-1} \int_0^{2\pi} F[\min(1/f(\lambda), \theta)]\, d\lambda, \qquad (4.44)$$

where $\rho_{n,k}$ are the eigenvalues of $T_n(f)^{-1}$.

Proof.

1. Since $f(\lambda) > 0$ except possibly at a countable number of points, we have from (4.9) that, for $x \ne 0$,

$$x^* T_n x = (2\pi)^{-1} \int_{-\pi}^{\pi} \Big| \sum_{k=0}^{n-1} x_k e^{ik\lambda} \Big|^2 f(\lambda)\, d\lambda > 0,$$

so that for all $n$

$$\min_k \tau_{n,k} > 0 \qquad \text{and hence} \qquad \det T_n = \prod_{k=0}^{n-1} \tau_{n,k} \ne 0,$$

so that $T_n(f)$ is nonsingular.

2. From Lemma 4.6, $T_n \sim C_n$, and hence (4.40) follows from Theorem 2.1 since $f(\lambda) \ge m_f > 0$ ensures that

$$\|T_n^{-1}\|,\ \|C_n^{-1}\| \le 1/m_f < \infty.$$
The series of (4.41) will converge in weak norm if

$$|D_n C_n^{-1}| < 1; \qquad (4.45)$$

since

$$|D_n C_n^{-1}| \le \|C_n^{-1}\| \cdot |D_n| \le (1/m_f)\, |D_n| \to 0 \text{ as } n \to \infty,$$

(4.45) must hold for large enough $n$. Thus, if $n$ is large enough, probably the first term of the series is sufficient.

3. From Lemma 4.5, $C_n(f)^{-1} = C_n(1/f)$, and from Lemma 4.6, $C_n(1/f) \sim T_n(1/f)$. We have

$$|T_n(f)^{-1} - T_n(1/f)| \le |T_n(f)^{-1} - C_n(f)^{-1}| + |C_n(f)^{-1} - T_n(1/f)|. \qquad (4.46)$$

From part 2, for any $\epsilon > 0$ we can choose $n$ large enough so that

$$|T_n(f)^{-1} - C_n(f)^{-1}| \le \epsilon/2; \qquad (4.47)$$

likewise we can choose $n$ large enough to ensure that $|C_n(f)^{-1} - T_n(1/f)| \le \epsilon/2$. Thus for any $\epsilon > 0$, from (4.46)–(4.47) we can choose $n$ such that

$$|T_n(f)^{-1} - T_n(1/f)| \le \epsilon,$$

which is (4.42). Equation (4.43) then follows from (4.42) and Theorem 2.2; alternatively, (4.43) follows from Lemma 4.6 and Theorem 2.4 applied to $F(1/x)$.

4. When $f(\lambda)$ has zeros ($m_f = 0$), then from Corollary 4.2

$$\lim_{n\to\infty} \min_k \tau_{n,k} = 0,$$

and hence

$$\|T_n^{-1}\| = \max_k \rho_{n,k} = 1/\min_k \tau_{n,k} \qquad (4.48)$$

is unbounded as $n \to \infty$. To prove that $1/f(\lambda)$ is not integrable, and hence that $T_n(1/f)$ does not exist, we define the sets

$$E_k = \{\lambda : 1/k \ge f(\lambda)/M_f > 1/(k+1)\} = \{\lambda : k \le M_f/f(\lambda) < k+1\}. \qquad (4.49)$$
Since $f(\lambda)$ is continuous on $[0, 2\pi]$ and has at least one zero, all of these sets are nonzero intervals of size, say, $|E_k|$. From (4.49),

$$\int_{-\pi}^{\pi} d\lambda/f(\lambda) \ge \sum_{k=1}^{\infty} |E_k|\, k/M_f. \qquad (4.50)$$

Since $f(\lambda)$ is differentiable, there is some finite value $\eta$ such that $|df/d\lambda| \le \eta$, so that

$$|E_k| \ge \big(1/k - 1/(k+1)\big)/\eta. \qquad (4.51)$$

Combining (4.50) and (4.51),

$$\int_{-\pi}^{\pi} d\lambda/f(\lambda) \ge \sum_{k=1}^{\infty} (k/M_f)\big(1/k - 1/(k+1)\big)/\eta = (M_f \eta)^{-1} \sum_{k=1}^{\infty} 1/(k+1), \qquad (4.52)$$

which diverges, so that $1/f(\lambda)$ is not integrable. To prove (4.44), let $F(x)$ be continuous on $[1/M_f, \theta]$; then $F[\min(1/x, \theta)]$ is continuous on $[0, M_f]$, and hence Theorem 2.4 yields (4.44). $\Box$

Note that (4.44) implies that the eigenvalues of $T_n^{-1}$ are asymptotically equally distributed, up to any finite $\theta$, as the eigenvalues of the sequence of matrices $T_n[\min(1/f, \theta)]$.

A special case of part 4 is when $T_n(f)$ is finite order and $f(\lambda)$ has at least one zero. Then the derivative exists and is bounded since

$$|df/d\lambda| = \Big| \sum_{k=-m}^{m} ik\, t_k e^{ik\lambda} \Big| \le \sum_{k=-m}^{m} |k|\,|t_k| < \infty.$$

The series expansion of part 2 is due to Rino [6]. The proof of part 4 is motivated by one of Widom [2]. Further results along the lines of part 4 regarding unbounded Toeplitz matrices may be found in [5]. Extending part 1 to the case of non-Hermitian matrices can be somewhat difficult, i.e., finding conditions on $f(\lambda)$ to ensure that $T_n(f)$ is invertible.
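Theorem 4.3, part 3, can be seen numerically. The sketch below (again with the arbitrary choice $f(\lambda) = 2 + \cos\lambda$, so $m_f = 1 > 0$; the Fourier coefficients are computed by DFT quadrature of (4.25)) shows the weak norm of $T_n(f)^{-1} - T_n(1/f)$ shrinking, and the central entry of the inverse approaching the zeroth Fourier coefficient of $1/f$, which for this $f$ is $(2\pi)^{-1}\int d\lambda/(2+\cos\lambda) = 1/\sqrt{3}$:

```python
import numpy as np

N = 4096
lam = 2 * np.pi * np.arange(N) / N
f = 2.0 + np.cos(lam)                        # m_f = 1 > 0, so Theorem 4.3 applies

def Tn_of(gvals, n):
    """T_n(g) of (4.5), with t_k computed by DFT quadrature of (4.25)."""
    t = (np.fft.fft(gvals) / len(gvals)).real   # t_k ~ (2*pi)^{-1} int g e^{-ik*lam} (g real, even)
    return np.array([[t[(k - j) % len(gvals)] for j in range(n)] for k in range(n)])

def weak_norm(A):
    return np.sqrt((np.abs(A) ** 2).sum() / len(A))

errs = [weak_norm(np.linalg.inv(Tn_of(f, n)) - Tn_of(1.0 / f, n)) for n in (8, 32, 128)]
assert errs[0] > errs[1] > errs[2]           # weak-norm difference shrinks: T_n(f)^{-1} ~ T_n(1/f)

# Deep inside the matrix the inverse's entries match the Fourier coefficients of 1/f.
center = np.linalg.inv(Tn_of(f, 128))[64, 64]
assert abs(center - 1 / np.sqrt(3)) < 1e-3
```

The residual difference is concentrated in the matrix corners, which is why it vanishes in the weak (averaged) norm but not in the strong norm.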
For a more general discussion of inverses the interested reader is referred to Widom [2] and the references listed in that paper; it should be pointed out that when discussing inverses Widom is concerned with the asymptotic behavior of finite matrices. Parts 1–4 can be straightforwardly extended if $f(\lambda)$ is continuous. The results of Baxter [7] can also be applied to consider the asymptotic behavior of finite inverses in quite general cases.

We next combine Theorem 2.1 and Lemma 4.6 to obtain the asymptotic behavior of products of Toeplitz matrices. As one might expect, the results are similar. The case of only two matrices is considered first since it is simpler.

Theorem 4.4 Let $T_n(f)$ and $T_n(g)$ be defined as in (4.5), where $f(\lambda)$ and $g(\lambda)$ are two bounded Riemann integrable functions. Define $C_n(f)$ and $C_n(g)$ as in (4.26) and let $\rho_{n,k}$ be the eigenvalues of $T_n(f)T_n(g)$.

1. $$T_n(f)T_n(g) \sim C_n(f)C_n(g) = C_n(fg). \qquad (4.53)$$

2. $$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \rho_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} [f(\lambda)g(\lambda)]^s\, d\lambda, \qquad s = 1, 2, \ldots. \qquad (4.54)$$

3. $$T_n(f)T_n(g) \sim T_n(fg) \sim T_n(g)T_n(f). \qquad (4.55)$$

4. If $T_n(f)$ and $T_n(g)$ are Hermitian, then for any $F(x)$ continuous on $[m_f m_g, M_f M_g]$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} F(\rho_{n,k}) = (2\pi)^{-1} \int_0^{2\pi} F[f(\lambda)g(\lambda)]\, d\lambda. \qquad (4.56)$$

5. Let $f_1(\lambda), \ldots, f_m(\lambda)$ be Riemann integrable. Then if the $C_n(f_i)$ are defined as in (4.26),

$$\prod_{i=1}^{m} T_n(f_i) \sim C_n\Big(\prod_{i=1}^{m} f_i\Big) \sim T_n\Big(\prod_{i=1}^{m} f_i\Big). \qquad (4.57)$$
6. If $\rho_{n,k}$ are the eigenvalues of $\prod_{i=1}^{m} T_n(f_i)$, then for any positive integer $s$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \rho_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} \Big[ \prod_{i=1}^{m} f_i(\lambda) \Big]^s d\lambda. \qquad (4.58)$$

If the $T_n(f_i)$ are Hermitian, then the $\rho_{n,k}$ are asymptotically real, i.e., the imaginary parts converge to a distribution at zero, so that

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \big(\mathrm{Im}[\rho_{n,k}]\big)^2 = 0, \qquad (4.59)$$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \big(\mathrm{Re}[\rho_{n,k}]\big)^s = (2\pi)^{-1} \int_0^{2\pi} \Big[ \prod_{i=1}^{m} f_i(\lambda) \Big]^s d\lambda. \qquad (4.60)$$

Proof.

1. Equation (4.53) follows from Lemmas 4.5 and 4.6 and Theorem 2.1. Applying Lemmas 4.5–4.6 and Theorem 2.1,

$$|T_n(f)T_n(g) - T_n(fg)| \le |T_n(f)T_n(g) - C_n(f)C_n(g)| + |C_n(f)C_n(g) - T_n(fg)| = |T_n(f)T_n(g) - C_n(f)C_n(g)| + |C_n(fg) - T_n(fg)| \to 0 \text{ as } n \to \infty.$$

2. Equation (4.54) follows from (4.53) and Theorem 2.2.

3. Equation (4.55) follows from the argument of part 1 and from part 1 applied with $f$ and $g$ interchanged. Note that while Toeplitz matrices do not in general commute, asymptotically they do.

4. Follows from (4.55) and Theorem 2.4; note that the eigenvalues of the product of two Hermitian matrices are real [3, p. 105].

5. Follows from repeated application of part 1.

6. Equation (4.58) follows from part 5 and Theorem 2.2.

For the Hermitian case of part 6, however, we cannot simply apply Theorem 2.2, since the eigenvalues $\rho_{n,k}$ of $\prod_i T_n(f_i)$ may not be real. We can show, however, that they
are asymptotically real. Equations (4.59) and (4.60) are proved as in Grenander and Szegő [1, pp. 105–106]. Write $\rho_{n,k} = \alpha_{n,k} + i\beta_{n,k}$, where $\alpha_{n,k}$ and $\beta_{n,k}$ are real. Then from Theorem 2.2 we have for any positive integer $s$

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} (\alpha_{n,k} + i\beta_{n,k})^s = \lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \psi_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} \Big[ \prod_{i=1}^{m} f_i(\lambda) \Big]^s d\lambda, \qquad (4.61)$$

where $\psi_{n,k}$ are the (real) eigenvalues of $C_n(\prod_i f_i)$. From (2.14),

$$n^{-1} \sum_{k=0}^{n-1} |\rho_{n,k}|^2 = n^{-1} \sum_{k=0}^{n-1} \big(\alpha_{n,k}^2 + \beta_{n,k}^2\big) \le \Big| \prod_{i=1}^{m} T_n(f_i) \Big|^2.$$

From Theorem 2.1 and Lemma 4.4,

$$\lim_{n\to\infty} \Big| \prod_{i=1}^{m} T_n(f_i) \Big|^2 = \lim_{n\to\infty} \Big| C_n\Big(\prod_{i=1}^{m} f_i\Big) \Big|^2 = (2\pi)^{-1} \int_0^{2\pi} \Big[ \prod_{i=1}^{m} f_i(\lambda) \Big]^2 d\lambda. \qquad (4.62)$$

Subtracting the real part of (4.61) for $s = 2$ from (4.62) yields

$$\limsup_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} 2\beta_{n,k}^2 \le 0,$$

and since each term is nonnegative, (4.59) follows. Thus the distribution of the imaginary parts tends to the origin, and hence

$$\lim_{n\to\infty} n^{-1} \sum_{k=0}^{n-1} \alpha_{n,k}^s = (2\pi)^{-1} \int_0^{2\pi} \Big[ \prod_{i=1}^{m} f_i(\lambda) \Big]^s d\lambda,$$

which is (4.60). $\Box$

We have developed theorems on the asymptotic behavior of eigenvalues, inverses, and products of Toeplitz matrices. The basic method has been to find an asymptotically equivalent circulant matrix whose special simple
We began with the finite order case since the appropriate circulant matrix is there obvious and yields certain desirable properties that suggest the corresponding circulant matrix in the infinite case. We have limited our consideration of the infinite order case to absolutely summable coefficients or to bounded Riemann integrable functions f(λ) for simplicity. The more general case of square summable t_k or bounded Lebesgue integrable f(λ) treated in Chapter 7 of [1] requires significantly more mathematical care, but can be interpreted as an extension of the approach taken here.

4.3 Toeplitz Determinants

The fundamental Toeplitz eigenvalue distribution theory has an interesting application for characterizing the limiting behavior of determinants. Suppose now that T_n(f) is a sequence of Hermitian Toeplitz matrices such that f(λ) ≥ m_f > 0. Let C_n = C_n(f) denote the sequence of circulant matrices constructed from f as in (4.26). Then from (4.28) the eigenvalues of C_n are f(2πm/n) for m = 0, 1, ..., n − 1, and hence

det C_n = Π_{m=0}^{n−1} f(2πm/n).

This in turn implies that

(1/n) ln det C_n = (1/n) Σ_{m=0}^{n−1} ln f(2πm/n).

These sums are the Riemann approximations to the limiting integral, whence

lim_{n→∞} (1/n) ln det C_n = ∫_0^1 ln f(2πλ) dλ.

Exponentiating, using the continuity of the logarithm for strictly positive arguments, and changing the variables of integration yields

lim_{n→∞} (det C_n)^{1/n} = exp[(2π)^{−1} ∫_0^{2π} ln f(λ) dλ].    (4.64)

This integral, the asymptotic equivalence of C_n and T_n(f) (Lemma 4.6), and Corollary 2.3 together yield the following result ([1], p. 65).

Theorem 4.5 Let T_n(f) be a sequence of Hermitian Toeplitz matrices such that ln f(λ) is Riemann integrable and f(λ) ≥ m_f > 0. Then

lim_{n→∞} (det T_n(f))^{1/n} = exp[(2π)^{−1} ∫_0^{2π} ln f(λ) dλ].    (4.65)
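The determinant limit lends itself to a quick numerical check. The sketch below is our own illustration (the choice f(λ) = 2 + cos λ is arbitrary, and the Szegő limit is approximated by a Riemann sum exactly as in the argument above); it compares (det T_n(f))^{1/n} with exp[(2π)^{−1} ∫ ln f dλ], which for this f equals (2 + √3)/2.

```python
import numpy as np

def T(n):
    # T_n(f) for f(lam) = 2 + cos(lam) >= 1 > 0: Fourier coefficients
    # t_0 = 2 and t_{+1} = t_{-1} = 1/2, all others zero.
    return 2 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))

# Right-hand side of the theorem via a Riemann sum over [0, 2 pi).
lam = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
szego = np.exp(np.mean(np.log(2 + np.cos(lam))))   # = (2 + sqrt(3)) / 2

errs = []
for n in (25, 100, 400):
    sign, logdet = np.linalg.slogdet(T(n))          # numerically stable log det
    errs.append(abs(np.exp(logdet / n) - szego))    # error shrinks like O(1/n)
```

Using slogdet rather than det avoids overflow: det T_n grows geometrically, but its logarithm divided by n converges.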
Chapter 5

Applications to Stochastic Time Series

Toeplitz matrices arise quite naturally in the study of discrete time random processes. Covariance matrices of weakly stationary processes are Toeplitz, and triangular Toeplitz matrices provide a matrix representation of causal linear time invariant filters. As is well known, and as we shall show, these two types of Toeplitz matrices are intimately related. We shall take two viewpoints in the first section of this chapter to show how they are related. In the first part we shall consider two common linear models of random time series and study the asymptotic behavior of the covariance matrix, its inverse, and its eigenvalues. The well known equivalence of moving average processes and weakly stationary processes will be pointed out. The lesser known fact that we can define something like a power spectral density for autoregressive processes even if they are nonstationary is discussed. In the second part of the first section we take the opposite tack: we start with a Toeplitz covariance matrix and consider the asymptotic behavior of its triangular factors. This simple result provides some insight into the asymptotic behavior of system identification algorithms and Wiener–Hopf factorization. The second section provides another application of the Toeplitz distribution theorem to stationary random processes by deriving the Shannon information rate of a stationary Gaussian random process.

Let {X_k; k ∈ I} be a discrete time random process. Generally we take I = Z, the space of all integers, in which case we say that the process is two-sided, or I = Z_+, the space of all nonnegative integers, in which case we say that the process is one-sided. We will be interested in vector representations of the process.
We define the column vector (n-tuple) X^n = (X_0, X_1, ..., X_{n−1})^t. The mean vector is defined by m^n = E(X^n); for simplicity we will assume the process means are zero. The n × n covariance matrix R_n = {r_{j,k}} is defined by

R_n = E[(X^n − m^n)(X^n − m^n)*].    (5.1)

This is the autocorrelation matrix when the mean vector is zero. Subscripts will be dropped when they are clear from context.

If the matrix R_n is Toeplitz, say R_n = T_n(f), then r_{k,j} = r_{k−j} and the process is said to be weakly stationary. In this case we can define

f(λ) = Σ_{k=−∞}^{∞} r_k e^{ikλ}

as the power spectral density of the process. If the matrix R_n is not Toeplitz but is asymptotically Toeplitz, i.e., R_n ∼ T_n(f), then we say that the process is asymptotically weakly stationary and once again define f(λ) as the power spectral density. The latter situation arises, for example, if an otherwise stationary process is initialized with X_k = 0, k ≤ 0.
This will cause a transient and hence the process is, strictly speaking, nonstationary. The transient dies out, however, and the statistics of the process approach those of a weakly stationary process as n grows.

The results derived herein are essentially trivial if one begins and deals only with doubly infinite matrices, i.e., if we looked at the entire process at each time instant. As might be hoped, the results for the asymptotic behavior of finite matrices are consistent with this case. The problem is of interest since one often has finite order equations and one wishes to know the asymptotic behavior of solutions, or one has some function defined as a limit of solutions of finite equations. These results are useful both for finding theoretical limiting solutions and for finding reasonable approximations for finite order solutions. So much for philosophy. We now proceed to investigate the behavior of two common linear models.

5.1 Moving Average Sources

By a linear model of a random process we mean a model wherein we pass a zero mean, independent identically distributed (iid) sequence of random variables W_k with variance σ² through a linear time invariant discrete time filter to obtain the desired process. The process W_k is discrete time “white” noise.
The most common such model is called a moving average process and is defined by the difference equation

U_n = Σ_{k=0}^{n} b_k W_{n−k} = Σ_{k=0}^{n} b_{n−k} W_k,
U_n = 0,  n < 0.    (5.2)

We assume that b_0 = 1 with no loss of generality, since otherwise we can incorporate b_0 into σ². Note that (5.2) is a discrete time convolution: U_n is the output of a filter with “impulse response” (actually a Kronecker δ response) b_k and input W_k. We could be more general by allowing the filter b_k to be noncausal and hence act on future W_k's. We could also allow the W_k's and U_k's to extend into the infinite past rather than being initialized. This would lead to replacing (5.2) by

U_n = Σ_{k=−∞}^{∞} b_k W_{n−k} = Σ_{k=−∞}^{∞} b_{n−k} W_k.    (5.3)

As previously, this would result in simpler mathematics. We will restrict ourselves to causal filters for simplicity, however, and keep the initial conditions since we are interested in limiting behavior. In addition, since stationary distributions may not exist for some models, it would be difficult to handle them unless we start at some fixed time. For these reasons we take (5.2) as the definition of a moving average.

Since we will be studying the statistical behavior of U_n as n gets arbitrarily large, some assumption must be placed on the sequence b_k to ensure that (5.2) converges in the mean-squared sense. The weakest possible assumption that will guarantee convergence of (5.2) is that

Σ_{k=0}^{∞} |b_k|² < ∞.    (5.4)

In keeping with the previous sections, however, we will make the stronger assumption

Σ_{k=0}^{∞} |b_k| < ∞.    (5.5)
Equation (5.2) can be rewritten as a matrix equation by defining the lower triangular Toeplitz matrix

B_n =
[ 1                           ]
[ b_1      1                  ]
[ b_2      b_1     1          ]
[ ...                         ]
[ b_{n−1}  ...  b_2  b_1   1  ]    (5.6)

so that

U^n = B_n W^n.    (5.7)

If the filter b_n were not causal, then B_n would not be triangular. If in addition (5.3) held, then (5.7) would require infinite vectors and matrices as in Grenander and Rosenblatt [12]. Since the covariance matrix of W_k is simply σ²I_n, where I_n is the n × n identity matrix, we have for the covariance of U_n:

R_U^(n) = E[U^n (U^n)*] = E[B_n W^n (W^n)* B_n*] = σ² B_n B_n*    (5.8)

or, equivalently,

r_{k,j} = σ² Σ_{ℓ=0}^{min(k,j)} b_{ℓ+(k−j)} b̄_ℓ,  k ≥ j,    (5.9)

with r_{j,k} = r̄_{k,j}. From (5.9) it is clear that r_{k,j} is not Toeplitz because of the min(k, j) in the sum. However, as n → ∞ the upper limit becomes large and R_U^(n) becomes asymptotically Toeplitz, as we next show. If we define

b(λ) = Σ_{k=0}^{∞} b_k e^{ikλ},    (5.10)

then

B_n = T_n(b)    (5.11)

so that

R_U^(n) = σ² T_n(b) T_n(b)*.    (5.12)
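The structure of (5.8) and (5.9), including the finite-n transient near the upper-left corner, can be seen directly in a small example. The following sketch is ours (the filter taps are an arbitrary absolutely summable choice, not from the report):

```python
import numpy as np

b = np.array([1.0, 0.5, 0.25])      # b_0 = 1; arbitrary illustrative taps
sigma2 = 1.0
n = 8

# B_n = T_n(b): lower triangular Toeplitz, (B_n)_{j,k} = b_{j-k}  (5.6)
B = np.zeros((n, n))
for m, bm in enumerate(b):
    B += bm * np.eye(n, k=-m)

RU = sigma2 * B @ B.T               # R_U^{(n)} = sigma^2 B_n B_n^*  (5.8)

# Stationary (large-index) limit of the autocorrelation:
#   r_m = sigma^2 sum_l b_l b_{l+m}
r = [sigma2 * float(np.dot(b[:len(b) - m], b[m:])) for m in range(len(b))]
```

Interior entries such as RU[5, 5] and RU[5, 4] already agree with the Toeplitz limit r_{k−j}, while RU[0, 0] = σ²b_0² still shows the transient caused by starting the filter at time 0, exactly the min(k, j) truncation in (5.9).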
We can now apply the results of the previous sections to obtain the following theorem.

Theorem 5.1 Let U_n be a moving average process with covariance matrix R_U^(n), and let ρ_{n,k} be the eigenvalues of R_U^(n). Then

R_U^(n) ∼ σ² T_n(|b|²) = T_n(σ²|b|²),    (5.13)

so that U_n is asymptotically stationary. If m ≤ |b(λ)|² ≤ M and F(x) is any continuous function on [m, M], then

lim_{n→∞} n^{−1} Σ_{k=0}^{n−1} F(ρ_{n,k}) = (2π)^{−1} ∫_0^{2π} F(σ²|b(λ)|²) dλ.    (5.14)

If |b(λ)|² ≥ m > 0, then

(R_U^(n))^{−1} ∼ σ^{−2} T_n(1/|b|²).    (5.15)

Proof. (Theorems 4.2–4.4.)

If the process U_n had been initiated with its stationary distribution, then we would have had exactly R_U^(n) = σ² T_n(|b|²). More knowledge of the inverse (R_U^(n))^{−1} can be gained from circulant approximations, e.g., Theorem 4.3. Note that the spectral density of the moving average process is σ²|b(λ)|², and that sums of functions of the eigenvalues tend to an integral of a function of the spectral density. In effect the spectral density determines the asymptotic density function for the eigenvalues of R_n and T_n.

5.2 Autoregressive Processes

Let W_k be as previously defined; then an autoregressive process X_n is defined by

X_n = − Σ_{k=1}^{n} a_k X_{n−k} + W_n,  n = 0, 1, ...
X_n = 0,  n < 0.    (5.16)
0 (5.j = m=0 am−k am−j = ¯ m=0 am am+(k−j) . (5.20) or (n) (RX )−1 = σ 2 A∗ An n (n) or equivalently. Unlike the moving average process.. . . 1 a1 An = 1 a1 1 .16) Autoregressive process include nonstationary processes such as the Wiener process.j } then n n−max(k. .22) and hence the following theorem. .j) tk. Equation (5.17) an−1 so that An X n = W n . APPLICATIONS TO STOCHASTIC TIME SERIES Xn = 0 n < 0. if (RX )−1 = {tk..16) can be rewritten as a vector equation by deﬁning the lower triangular matrix. we have that the inverse covariance matrix is the product of Toeplitz triangular matrices. An is nonsingular so that RX = σ 2 A−1 A−1∗ n n (n) (5. Deﬁning ∞ a(λ) = k=0 ak eikλ (5. We have RW = An RX A∗ n (n) (n) a1 1 (5.19) (5.18) since det An = 1 = 0.21) we have that (RX )−1 = σ −2 Tn (a)∗ Tn (a) (n) (5.52 CHAPTER 5.
Theorem 5.2 Let X_n be an autoregressive process with covariance matrix R_X^(n) with eigenvalues ρ_{n,k}. Then

(R_X^(n))^{−1} ∼ σ^{−2} T_n(|a|²).    (5.23)

If m ≤ |a(λ)|² ≤ M, then for any function F(x) continuous on [m, M] we have

lim_{n→∞} n^{−1} Σ_{k=0}^{n−1} F(1/ρ_{n,k}) = (2π)^{−1} ∫_0^{2π} F(σ^{−2}|a(λ)|²) dλ,    (5.24)

where the 1/ρ_{n,k} are the eigenvalues of (R_X^(n))^{−1}. If |a(λ)|² ≥ m > 0, then

R_X^(n) ∼ σ² T_n(1/|a|²),    (5.25)

so that the process is asymptotically stationary.

Proof. (Theorems 5.1 and 4.3.)

Note that if |a(λ)|² > 0, then σ²/|a(λ)|² is the spectral density of X_n. If |a(λ)|² has a zero, then R_X^(n) may not be even asymptotically Toeplitz and hence X_n may not be asymptotically stationary (since 1/|a(λ)|² may not be integrable), so that strictly speaking X_k will not have a spectral density. It is often convenient, however, to define σ²/|a(λ)|² as the spectral density, and it often is useful for studying the eigenvalue distribution of R_n. We can relate σ²/|a(λ)|² to the eigenvalues of R_X^(n) even in this case by using Theorem 4.3, part 4.

Corollary 5.1 If X_k is an autoregressive process and ρ_{n,k} are the eigenvalues of R_X^(n), then for any finite θ and any function F(x) continuous on [1/m, θ],

lim_{n→∞} n^{−1} Σ_{k=0}^{n−1} F[min(ρ_{n,k}, θ)] = (2π)^{−1} ∫_0^{2π} F[min(σ²/|a(λ)|², θ)] dλ.    (5.26)

Proof. (Theorem 4.3, part 4.)
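Both the exact identity (5.22) and the distribution statement (5.24) can be illustrated numerically. The sketch below is ours (a first-order filter a = (1, −0.9) and the test function F(x) = x² are arbitrary choices): the inverse covariance equals σ^{−2}A_n*A_n with no asymptotics, while the moments of the eigenvalues of the inverse approach the integral of (σ^{−2}|a(λ)|²)².

```python
import numpy as np

a = np.array([1.0, -0.9])            # X_n = 0.9 X_{n-1} + W_n: a_0 = 1, a_1 = -0.9
sigma2 = 1.0
n = 200

A = np.zeros((n, n))                 # A_n = T_n(a), lower triangular as in (5.17)
for m, am in enumerate(a):
    A += am * np.eye(n, k=-m)

Ainv = np.linalg.inv(A)
RX = sigma2 * Ainv @ Ainv.T          # covariance of the (initialized) AR process

# (5.22): the inverse covariance is exactly sigma^{-2} A_n^* A_n.
exact_gap = np.max(np.abs(np.linalg.inv(RX) - (A.T @ A) / sigma2))

# (5.24) with F(x) = x^2: eigenvalues of the inverse vs sigma^{-2}|a(lam)|^2.
inv_rho = 1.0 / np.linalg.eigvalsh(RX)
lam = np.linspace(0.0, 2 * np.pi, 4096, endpoint=False)
a2 = np.abs(1.0 - 0.9 * np.exp(1j * lam)) ** 2
lhs = np.mean(inv_rho ** 2)          # n^{-1} sum F(1/rho_{n,k})
rhs = np.mean((a2 / sigma2) ** 2)    # (2 pi)^{-1} integral of F
```

The residual in the distribution check is an O(1/n) boundary effect, the same finite-section transient discussed for the moving average model.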
If we consider two models of a random process to be asymptotically equivalent if their covariances are asymptotically equivalent, then from Theorems 5.1 and 5.2 and Theorem 4.3 we have the following corollary.

Corollary 5.2 Consider the moving average process defined by U^n = T_n(b) W^n and the autoregressive process defined by T_n(a) X^n = W^n. Then the processes U_n and X_n are asymptotically equivalent if a(λ) = 1/b(λ) and M ≥ |a(λ)| ≥ m > 0, so that 1/b(λ) is integrable.

Proof. (Theorems 5.2 and 4.3.) We have

R_X^(n) = σ² T_n(a)^{−1} T_n(a)^{−1*} ∼ σ² T_n(1/a) T_n(1/a)* ∼ σ² T_n(1/a)* T_n(1/a).    (5.27)

Comparison of (5.27) with (5.12) completes the proof. ∎

The methods above can also easily be applied to study the mixed autoregressive–moving average linear models [2].

5.3 Factorization

As a final example we consider the problem of the asymptotic behavior of triangular factors of a sequence of Hermitian covariance matrices T_n(f). The factorization of T_n allows the construction of a linear model of a random process and is useful in system identification and other recursive procedures.
.28) and Theorem 4.e. Then ln f (λ) is integrable and we can perform a WienerHopf factorization of f (λ). (5. tj. (5. The factorization of Tn allows the construction of a linear model of a random process and is useful in system identiﬁcation and other recursive procedures.j = {(det Tk ) det(Tk−1 )}−1/2 γ(j. 37]. p.31) b(λ) = k=0 b0 = 1 From (5. in particular ∗ Tn (f ) = {tk.k−1 )t . tj.1 .. (n) (5.29) where γ(j.0 . bk e ikλ (5.33) .32) We wish to show that (5.32) implies that Bn ∼ Tn (σb).j } = Bn Bn . (5.k = {(det Tk )/(det Tk−1 )}1/2 .5. .4 we have ∗ Bn Bn = Tn (f ) ∼ Tn (σb)Tn (σb)∗ . k). Note in particular that the diagonal elements are given by bk. i. .29) is the result of a Gaussian elimination of a GramSchmidt procedure.30) Equation (5.3. Our question is how Bn behaves for large n.28) where Bn is a lower triangular matrix with entries bk. FACTORIZATION 55 well known that any such matrix can be factored into the product of a lower triangular matrix and its conjugate transpose [12. . speciﬁcally is Bn asymptotically Toeplitz? Assume that f (λ) ≥ m > 0. f (λ) = σ 2 b(λ)2 ¯ b(λ) = b(−λ) ∞ . k) is the determinant of the matrix Tk with the right hand column replaced by (tj. (n) (5.
Proof. Since det T_n(σb) = σ^n ≠ 0, T_n(σb) is invertible. Likewise, since det B_n = [det T_n(f)]^{1/2}, we have from Theorem 4.3, part 1, that det T_n(f) ≠ 0, so that B_n is invertible. Thus from Theorem 2.1 (e) and (5.32) we have

T_n(σb)^{−1} B_n = [B_n^{−1} T_n(σb)]^{−1} ∼ [B_n^{−1} T_n(σb)]*.    (5.34)

Since B_n and T_n(σb) are both lower triangular matrices, so are B_n^{−1} and hence B_n^{−1} T_n(σb) and [B_n^{−1} T_n(σb)]^{−1}. Thus (5.34) states that a lower triangular matrix is asymptotically equivalent to an upper triangular matrix. This is only possible if both matrices are asymptotically equivalent to a diagonal matrix, say G_n = {g_{k,k}^(n) δ_{k,j}}. Furthermore, from (5.34) we have G_n ∼ G_n^{*−1}, so that

{|g_{k,k}^(n)|² δ_{k,j}} ∼ I_n.    (5.35)

Since T_n(σb) is lower triangular with main diagonal element σ, T_n(σb)^{−1} is lower triangular with all its main diagonal elements equal to 1/σ, even though the matrix T_n(σb)^{−1} is not Toeplitz. Thus g_{k,k}^(n) = b_{k,k}^(n)/σ. Since T_n(f) is Hermitian, b_{k,k}^(n) is real, so that taking the trace in (5.35) yields

lim_{n→∞} σ^{−2} n^{−1} Σ_{k=0}^{n−1} (b_{k,k}^(n))² = 1.    (5.36)

From (5.30), Theorem 4.5, and the fact that T_n(σb) is triangular, we have that

lim_{n→∞} σ^{−1} n^{−1} Σ_{k=0}^{n−1} b_{k,k}^(n) = σ^{−1} lim_{n→∞} {(det T_n(f))/(det T_{n−1}(f))}^{1/2}
= σ^{−1} lim_{n→∞} {det T_n(f)}^{1/(2n)} = σ^{−1} lim_{n→∞} {det T_n(σb)}^{1/n} = σ^{−1} · σ = 1.    (5.37)

Combining (5.36) and (5.37) yields

lim_{n→∞} |B_n^{−1} T_n(σb) − I_n| = 0.    (5.38)

Applying Theorem 2.1 yields (5.33). ∎
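The conclusion (5.33) is visible in a direct computation. The sketch below is our own illustration: for f(λ) = 2 + cos λ the Wiener–Hopf factor is b(λ) = 1 + αe^{iλ} with α = 2 − √3 and σ² = 1/(2α) (one can check σ²(1 + α² + 2α cos λ) = 2 + cos λ), and the Cholesky factor of T_n(f), which is exactly the B_n of (5.28), has rows that converge to rows of T_n(σb) away from the corner.

```python
import numpy as np

# Wiener-Hopf factorization of f(lam) = 2 + cos(lam):
#   f = sigma^2 |b(lam)|^2,  b(lam) = 1 + alpha e^{i lam},  b_0 = 1.
alpha = 2.0 - np.sqrt(3.0)            # root of the factor, inside the unit circle
sigma = np.sqrt(1.0 / (2.0 * alpha))  # sigma^2 (1 + alpha^2) = 2 and sigma^2 * 2 alpha = 1

n = 60
T = 2 * np.eye(n) + 0.5 * (np.eye(n, k=1) + np.eye(n, k=-1))   # T_n(f)
L = np.linalg.cholesky(T)             # T_n(f) = L L^T, L lower triangular: the B_n of (5.28)

diag_err = abs(L[-1, -1] - sigma)          # b_{k,k} -> sigma
sub_err = abs(L[-1, -2] - sigma * alpha)   # first subdiagonal -> sigma b_1
corner_gap = abs(L[0, 0] - sigma)          # b_{0,0} = sqrt(2) != sigma: the transient
```

For this tridiagonal example the diagonal recursion converges geometrically, so the bottom rows agree with the Toeplitz factor to machine precision while the top-left corner never does; asymptotic equivalence is a statement about the weak norm, not about every entry.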
Since the only real requirements for the proof were the existence of the Wiener–Hopf factorization and the limiting behavior of the determinant, this result could easily be extended to the more general case that ln f(λ) is integrable. The theorem can also be derived as a special case of more general results of Baxter [8] and is similar to a result of Rissanen and Barbosa [11].

5.4 Differential Entropy Rate of Gaussian Processes

As a final application of the Toeplitz eigenvalue distribution theorem, we consider a property of a random process that arises in Shannon information theory. Given a random process {X_n} for which a probability density function f_{X^n}(x^n) is defined for the random vector X^n = (X_0, X_1, ..., X_{n−1}) for all positive integers n, the Shannon differential entropy h(X^n) is defined by the integral

h(X^n) = − ∫ f_{X^n}(x^n) log f_{X^n}(x^n) dx^n,

and the differential entropy rate is defined by the limit

h(X) = lim_{n→∞} (1/n) h(X^n)

if the limit exists. (See, for example, Cover and Thomas [14].) The logarithm is usually taken as base 2, in which case the units are bits. We will use the Toeplitz theorem to evaluate the differential entropy rate of a stationary Gaussian random process.

A stationary zero mean Gaussian random process is completely described by its mean and its covariance function R_X(k, m) = R_X(k − m) = E[X_k X_m] (recall that the mean is zero), or, equivalently, by its power spectral density function, the Fourier transform of the covariance function:

S(f) = Σ_{n=−∞}^{∞} R_X(n) e^{−2πinf}.

For a fixed positive integer n, the probability density function is

f_{X^n}(x^n) = (2π)^{−n/2} (det R_n)^{−1/2} e^{−(1/2)(x^n − m^n)^t R_n^{−1} (x^n − m^n)},
. then from Theorem 4. (5. 2 If we now identify the the covariance matrix Rn as the Toeplitz matrix generated by the power spectral density.40) 2π 0 The Toeplitz distribution theorems have also found application in more complicated information theoretic evaluations. . APPLICATIONS TO STOCHASTIC TIME SERIES where Rn is the n × n covariance matrix with entries RX (k. A straightforward multidimensional integration using the properties of Gaussian random vectors yields the diﬀerential entropy h(X n ) = 1 log(2πe)n detRn . Tn (S).5 we have immediately that 1 2 (5. m). . m = 0. . including the channel capacity of Gaussian channels [17. n − 1.58 CHAPTER 5. 1. 18] and the ratedistortion functions of autoregressive sources [9]. k. .39) h(X) = log(2πe)σ∞ 2 where 1 2π 2 σ∞ = ln S(f ) df.
Bibliography

[1] U. Grenander and G. Szegő, Toeplitz Forms and Their Applications, University of California Press, Berkeley and Los Angeles, 1958.

[2] H. Widom, “Toeplitz Matrices,” in Studies in Real and Complex Analysis, edited by I.I. Hirschmann, Jr., MAA Studies in Mathematics, Prentice-Hall, Englewood Cliffs, NJ, 1965.

[3] P. Lancaster, Theory of Matrices, Academic Press, NY, 1969.

[4] W. Rudin, Principles of Mathematical Analysis, McGraw-Hill, NY, 1964.

[5] R.M. Gray, “On Unbounded Toeplitz Matrices and Nonstationary Time Series with an Application to Information Theory,” Information and Control, 24, pp. 181–196, 1974.

[6] C.R. Rino, “The Inversion of Covariance Matrices by Finite Fourier Transforms,” IEEE Trans. on Info. Theory, IT-16, No. 2, pp. 230–232, March 1970.

[7] G. Baxter, “A Norm Inequality for a ‘Finite-Section’ Wiener–Hopf Equation,” Illinois J. Math., pp. 97–103, 1962.

[8] G. Baxter, “An Asymptotic Result for the Finite Predictor,” Math. Scand., 10, pp. 137–144, 1962.

[9] R.M. Gray, “Information Rates of Autoregressive Processes,” IEEE Trans. on Info. Theory, IT-16, No. 4, pp. 412–421, July 1970.

[10] F.R. Gantmacher, The Theory of Matrices, Chelsea Publishing Co., NY, 1960.
[11] J. Rissanen and L. Barbosa, “Properties of Infinite Covariance Matrices and Stability of Optimum Predictors,” Information Sciences, Vol. 1, pp. 221–236, 1969.

[12] U. Grenander and M. Rosenblatt, Statistical Analysis of Stationary Time Series, Wiley and Sons, NY, 1966, Chapter 1.

[13] J. Pearl, “On Coding and Filtering Stationary Signals by Discrete Fourier Transform,” IEEE Trans. on Info. Theory, IT-19, pp. 229–232, 1973.

[14] T. Cover and J. Thomas, Elements of Information Theory, Wiley, New York, 1991.

[15] A. Böttcher and B. Silbermann, Introduction to Large Truncated Toeplitz Matrices, Springer, New York, 1999.

[16] R.M. Gray, “On the asymptotic eigenvalue distribution of Toeplitz matrices,” IEEE Transactions on Information Theory, IT-18, pp. 725–730, November 1972.

[17] B.S. Tsybakov, “Transmission capacity of memoryless Gaussian vector channels” (in Russian), Probl. Peredach. Inform., Vol. 1, pp. 26–40, 1965.

[18] B.S. Tsybakov, “On the transmission capacity of a discrete-time Gaussian channel with filter” (in Russian), Probl. Peredach. Inform., Vol. 6, pp. 78–82, 1970.

[19] P. Davis, Circulant Matrices, Wiley-Interscience, NY, 1979.