Conference Paper · October 2011
DOI: 10.13140/2.1.3165.0246

Multivariate Itakura-Saito distance for spectral
estimation: Relation between time and spectral
domain relative entropy rates∗
Augusto Ferrante, Chiara Masiero
Department of Information Engineering, via Gradenigo 6/B, University of Padova, 35131
Padova, Italy
E-mail: augusto@dei.unipd.it, masieroc@dei.unipd.it

Michele Pavon

Department of Pure and Applied Mathematics, via Trieste 63, University of Padova, 35131
Padova, Italy
E-mail: pavon@math.unipd.it

Abstract

The notion of spectral relative entropy rate is defined for jointly stationary Gaussian processes. Using classical information-theoretic results, we establish a remarkable connection between time and spectral domain relative entropy rates, and therefore with a multivariate version of the classical Itakura-Saito divergence. This information-theoretic result appears promising for applications where spectral entropy already plays an important role, such as EEG analysis. It also lends support to a new spectral estimation technique recently developed by the authors, in which a multivariate version of the Itakura-Saito distance is employed as a spectrum divergence. A minimum complexity spectrum is provided by this new approach. Simulations suggest the effectiveness of the new technique in tackling multivariate spectral estimation tasks, especially in the case of short data records.

1 Introduction

Information-theoretic indexes have played a key role in a wide variety of statistical, signal processing and identification problems, see e.g. [15, 18, 2, 3, 4, 12]. On the other hand, distances between multivariable power spectra have only recently received attention. In this direction, we mention the generalization of the Hellinger distance introduced in [6], a version of the quantum-mechanical Umegaki-von Neumann relative entropy [9], and Riemannian metrics [13].

In our recent paper [7], a multivariate version of the Itakura-Saito divergence was employed to solve the state-covariance matching problem. The choice of this Bregman divergence was motivated there by the connection between such a distance and the relative entropy rate for stationary, multidimensional, Gaussian processes [18], [11, p. 371]. Here, we show that it is possible to introduce a spectral relative entropy rate, namely a relative entropy rate between the corresponding orthogonal-increments processes of the spectral representation. It is then possible to prove that the relative entropy rates in the time and spectral domains are in fact equal! On the one hand, this information-theoretic result lends further support to the choice of the multivariate Itakura-Saito distance for spectrum approximation. On the other hand, it opens up a new perspective in applications involving spectral entropy, such as EEG analysis, see e.g. [20].

∗ Work partially supported by the Italian Ministry for Education and Research (MIUR) under PRIN grant n. 20085FFJ2Z "New Algorithms and Applications of System Identification and Adaptive Control".
2 Multivariate Itakura-Saito Distance

The relative entropy (or Kullback-Leibler pseudo-distance, or divergence) between two probability densities p and q, with the support of p contained in the support of q, is defined by

    D(p\|q) := \int_{\mathbb{R}^n} p(x) \log\frac{p(x)}{q(x)}\, dx.

In the case of two zero-mean Gaussian densities p and q, with positive definite covariance matrices P and Q, respectively, it is given by

    D(p\|q) := \frac{1}{2}\left[\log\det(P^{-1}Q) + \operatorname{tr}\left(Q^{-1}(P-Q)\right)\right].

Consider now two wide-sense stationary, purely nondeterministic, zero-mean, jointly Gaussian stochastic processes y = {y_k; k ∈ Z} and z = {z_k; k ∈ Z}, with values in R^m. Let Y_{[-n,n]} be the Gaussian random vector whose elements are given by the window y_{-n}, y_{-n+1}, ..., y_0, ..., y_{n-1}, y_n, and let Z_{[-n,n]} be defined in the same fashion. Let p_{Y_{[-n,n]}} and p_{Z_{[-n,n]}} denote the corresponding joint densities. Then the relative entropy rate between y and z is defined by

    D_r(y\|z) := \lim_{n\to\infty} \frac{1}{2n+1}\, D\left(p_{Y_{[-n,n]}} \,\middle\|\, p_{Z_{[-n,n]}}\right),

if the limit exists. Let Φ_y and Φ_z be the spectral density functions of y and z, respectively. Assume that at least one of the following conditions is satisfied:

1. Φ_y Φ_z^{-1} is bounded;

2. Φ_y ∈ L^2(-π, π) and Φ_z is coercive (i.e. ∃ α > 0 s.t. Φ_z(e^{jϑ}) - αI_m > 0 a.e. on T),

where T denotes the unit circle. Then the following classical result holds [18, 12, 22]:

    D_r(y\|z) = \frac{1}{4\pi}\int_{-\pi}^{\pi} \left\{ \log\det\left[\Phi_y^{-1}(e^{j\vartheta})\Phi_z(e^{j\vartheta})\right] + \operatorname{tr}\left[\Phi_z^{-1}(e^{j\vartheta})\left(\Phi_y(e^{j\vartheta}) - \Phi_z(e^{j\vartheta})\right)\right] \right\} d\vartheta.    (1)

The right-hand side of (1) represents a multivariable Itakura-Saito distance. Indeed, in the case of scalar spectra, D_r(y‖z) = (1/2) d_{IS}(Φ, Ψ), where

    d_{IS}(\Phi, \Psi) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \left[ \frac{\Phi(e^{j\vartheta})}{\Psi(e^{j\vartheta})} - \log\frac{\Phi(e^{j\vartheta})}{\Psi(e^{j\vartheta})} - 1 \right] d\vartheta

is the classical Itakura-Saito distance of maximum likelihood estimation for speech processing [4].

3 Circularly-symmetric complex Gaussian random vectors

We shall need to consider zero-mean, n-dimensional, complex-valued Gaussian random vectors z = α + jβ, where the real and imaginary parts are jointly Gaussian. The corresponding density function is the joint probability density of the 2n-dimensional compound vector γ = [α^⊤ β^⊤]^⊤. The relative entropy between two zero-mean, n-dimensional complex Gaussian densities p and q is given by

    D(p\|q) := \frac{1}{2}\left[\log\det(R_p^{-1}R_q) + \operatorname{tr}(R_q^{-1}R_p) - 2n\right],

where R_p and R_q are the covariance matrices of the 2n-dimensional vectors γ_p and γ_q corresponding to the densities p and q, respectively. If the zero-mean, C^n-valued Gaussian random vector z has the property that E[zz^⊤] = 0, then we say that z is a circularly symmetric, normally distributed random vector [17, 8]. This implies that E[αα^⊤] = E[ββ^⊤]. If p and q are two n-dimensional complex Gaussian distributions with circular symmetry, the expression of the relative entropy simplifies to

    D(p\|q) = \log\det(P^{-1}Q) + \operatorname{tr}(Q^{-1}P) - n,

where P and Q are the covariance matrices of z_p := α_p + jβ_p and z_q := α_q + jβ_q, respectively.

4 Spectral relative entropy rate

The stationary, purely nondeterministic Gaussian process y with spectrum Φ_y admits the spectral representation

    y_k = \int_{\mathbb{T}} e^{jk\vartheta}\, d\hat{y}(e^{j\vartheta}),    (2)

where ŷ is an m-dimensional stochastic process with orthogonal increments [14, 21, 16]. Moreover,

    E\left[d\hat{y}(e^{j\vartheta})\, d\hat{y}(e^{j\vartheta})^*\right] = \Phi_y(e^{j\vartheta})\, d\vartheta,    (3)

where * denotes transposition plus conjugation. Notice that, given ϑ_1, ϑ_2 ∈ (-π, π], ŷ(ϑ_1, ϑ_2) is a complex-valued Gaussian vector, i.e. a complex vector whose real and imaginary parts are jointly Gaussian. Recall that the corresponding density function is the joint probability density of the 2m-dimensional vector γ obtained by stacking the real part of the complex vector over its imaginary part [23, 17].
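The circular-symmetry simplification of Section 3 can be checked numerically. The following sketch (our own, not from the paper; all names are illustrative) builds, for a circularly symmetric z with E[zz*] = P, the covariance of the compound vector γ = [α^⊤ β^⊤]^⊤, which is (1/2)[[Re P, -Im P], [Im P, Re P]], and verifies that the 2n-dimensional real-Gaussian formula agrees with the simplified complex expression:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 3

def hermitian_pd(rng, n):
    # A A* + I is Hermitian and positive definite
    A = rng.standard_normal((n, n)) + 1j * rng.standard_normal((n, n))
    return A @ A.conj().T + np.eye(n)

def compound_cov(P):
    # Covariance of gamma = [alpha; beta] when z = alpha + j*beta is
    # circularly symmetric with E[z z*] = P
    return 0.5 * np.block([[P.real, -P.imag], [P.imag, P.real]])

def kl_real(Rp, Rq):
    # D(p||q) for zero-mean real Gaussians:
    # (1/2)[log det(Rp^{-1} Rq) + tr(Rq^{-1} Rp) - dim]
    d = Rp.shape[0]
    return 0.5 * (np.log(np.linalg.det(np.linalg.solve(Rp, Rq)))
                  + np.trace(np.linalg.solve(Rq, Rp)) - d)

def kl_circular(Pp, Pq):
    # Simplified expression under circular symmetry:
    # log det(Pp^{-1} Pq) + tr(Pq^{-1} Pp) - n
    return (np.log(np.linalg.det(np.linalg.solve(Pp, Pq)).real)
            + np.trace(np.linalg.solve(Pq, Pp)).real - Pp.shape[0])

Pp, Pq = hermitian_pd(rng, n), hermitian_pd(rng, n)
d_compound = kl_real(compound_cov(Pp), compound_cov(Pq))
d_simple = kl_circular(Pp, Pq)
```

The agreement follows from the fact that realification doubles both the log-determinant and the trace terms, which the factor 1/2 in the real formula exactly compensates.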
The first contribution of this paper is to introduce a natural concept of spectral relative entropy rate. Let ϑ_k = πk/n, k = 0, 1, ..., 2n, and consider the complex Gaussian random vectors

    \hat{Y}_{2n} := \left[\hat{y}(e^{j\vartheta_0}, e^{j\vartheta_1})^\top, \ldots, \hat{y}(e^{j\vartheta_{2n-1}}, e^{j\vartheta_{2n}})^\top\right]^\top,
    \hat{Z}_{2n} := \left[\hat{z}(e^{j\vartheta_0}, e^{j\vartheta_1})^\top, \ldots, \hat{z}(e^{j\vartheta_{2n-1}}, e^{j\vartheta_{2n}})^\top\right]^\top.

Let p(Ŷ_{2n}) and p(Ẑ_{2n}) be their joint probability densities, respectively. We define the spectral relative entropy rate between y and z as the following limit, provided it exists:

    D_r(d\hat{y}\|d\hat{z}) := \lim_{n\to\infty} \frac{1}{2n}\, D\left(p(\hat{Y}_{2n}) \,\middle\|\, p(\hat{Z}_{2n})\right).    (4)

Our main contribution is the following thought-provoking result.

Theorem 1. Let y and z be as above. Assume that both Φ_y and Φ_z are piecewise continuous and coercive spectral densities. Then the following equality holds:

    D_r(y\|z) = D_r(d\hat{y}\|d\hat{z}).    (5)

The proof involves a number of nontrivial steps, due to the fact that the spectral processes are complex-valued. The two following lemmas, whose proofs may be found in [7], are crucial to this end. The first result establishes a circular symmetry property of the increments of the process ŷ occurring in the spectral representation.

Lemma 1. Suppose ϑ_1, ϑ_2 ∈ (-π, π]; then ŷ(e^{-jϑ_2}, e^{-jϑ_1}) = \overline{\hat{y}(e^{j\vartheta_1}, e^{j\vartheta_2})}. If, moreover, ϑ_1, ϑ_2 have the same sign, then ŷ(e^{jϑ_1}, e^{jϑ_2}) is a circularly symmetric, normally distributed random vector.

The second lemma proves an independence-of-increments result.

Lemma 2. Let ϑ_1, ϑ_2, ϑ_3, ϑ_4 be such that [ϑ_1, ϑ_2] ∩ [ϑ_3, ϑ_4] = [ϑ_1, ϑ_2] ∩ [-ϑ_4, -ϑ_3] = ∅. Then ŷ(e^{jϑ_1}, e^{jϑ_2}) and ŷ(e^{jϑ_3}, e^{jϑ_4}) are independent random vectors.

The last preparatory result allows us to restrict our attention to increments on the interval [0, π].

Lemma 3.

    D\left(p(\hat{Y}_{2n}) \,\middle\|\, p(\hat{Z}_{2n})\right) = D\left(p(\hat{Y}_{n}) \,\middle\|\, p(\hat{Z}_{n})\right).

Proof. (Theorem 1) In view of Lemma 1, the last n components of Ŷ_{2n} are functions (the complex conjugates) of the first n, and the same holds for Ẑ_{2n}. Hence, in view of Lemma 3, we have D(p(Ŷ_{2n})‖p(Ẑ_{2n})) = D(p(Ŷ_n)‖p(Ẑ_n)). Using Lemma 2, we have that the elements of Ŷ_n are independent random vectors, and the same holds for the elements of Ẑ_n. Hence, we have the following additive decomposition:

    D\left(p(\hat{Y}_{2n})\|p(\hat{Z}_{2n})\right) = D\left(p(\hat{Y}_{n})\|p(\hat{Z}_{n})\right) = \sum_{k=0}^{n-1} D\left(p(\delta\hat{y}_k)\|p(\delta\hat{z}_k)\right),    (6)

with p(δŷ_k) and p(δẑ_k) being the probability densities of the random vectors δŷ_k = ŷ(e^{jϑ_k}, e^{jϑ_{k+1}}) and δẑ_k = ẑ(e^{jϑ_k}, e^{jϑ_{k+1}}), respectively. Since δŷ_k and δẑ_k are jointly Gaussian and circularly symmetric, by (2) and (3) we get

    D\left(p(\delta\hat{y}_k)\|p(\delta\hat{z}_k)\right) = \log\det\left[Q_y^{-1}(\vartheta_k, \vartheta_{k+1})\, Q_z(\vartheta_k, \vartheta_{k+1})\right] + \operatorname{tr}\left[Q_z^{-1}(\vartheta_k, \vartheta_{k+1})\, Q_y(\vartheta_k, \vartheta_{k+1})\right] - m,

where, by virtue of the orthogonality of the increments,

    Q_y(\vartheta_k, \vartheta_{k+1}) := \int_{\vartheta_k}^{\vartheta_{k+1}} \Phi_y(e^{j\xi})\, d\xi,

and similarly for z. By piecewise continuity and the mean value theorem, we have that, except for a finite number of k's,

    D\left(p(\delta\hat{y}_k)\|p(\delta\hat{z}_k)\right)
    = \log\det\left[\left(\frac{\pi}{n}\Phi_y(e^{j\bar\vartheta_k})\right)^{-1} \frac{\pi}{n}\Phi_z(e^{j\bar\vartheta_k})\right] + \operatorname{tr}\left[\left(\frac{\pi}{n}\Phi_z(e^{j\bar\vartheta_k})\right)^{-1} \frac{\pi}{n}\Phi_y(e^{j\bar\vartheta_k})\right] - m
    = \log\det\left[\Phi_y(e^{j\bar\vartheta_k})^{-1}\Phi_z(e^{j\bar\vartheta_k})\right] + \operatorname{tr}\left[\Phi_z(e^{j\bar\vartheta_k})^{-1}\Phi_y(e^{j\bar\vartheta_k})\right] - m,

where ϑ_k ≤ ϑ̄_k < ϑ_{k+1}. By employing the latter expression together with (6) and (4), we get

    D_r(d\hat{y}\|d\hat{z}) = \lim_{n\to\infty} \frac{1}{2n}\, D\left(p(\hat{Y}_n)\|p(\hat{Z}_n)\right)
    = \lim_{n\to\infty} \frac{1}{2n} \sum_{k=0}^{n-1} D\left(p(\delta\hat{y}_k)\|p(\delta\hat{z}_k)\right)
    = \lim_{n\to\infty} \frac{1}{2n} \sum_{k=0}^{n-1} \left\{ \log\det\left[\Phi_y(e^{j\bar\vartheta_k})^{-1}\Phi_z(e^{j\bar\vartheta_k})\right] + \operatorname{tr}\left[\Phi_z(e^{j\bar\vartheta_k})^{-1}\left(\Phi_y(e^{j\bar\vartheta_k}) - \Phi_z(e^{j\bar\vartheta_k})\right)\right] \right\}
    = \lim_{n\to\infty} \frac{1}{2\pi} \sum_{k=0}^{n-1} \frac{\pi}{n} \left\{ \log\det\left[\Phi_y(e^{j\bar\vartheta_k})^{-1}\Phi_z(e^{j\bar\vartheta_k})\right] + \operatorname{tr}\left[\Phi_z(e^{j\bar\vartheta_k})^{-1}\left(\Phi_y(e^{j\bar\vartheta_k}) - \Phi_z(e^{j\bar\vartheta_k})\right)\right] \right\}
    = \frac{1}{2\pi} \int_0^{\pi} \left\{ \log\det\left[\Phi_y^{-1}(e^{j\vartheta})\Phi_z(e^{j\vartheta})\right] + \operatorname{tr}\left[\Phi_z^{-1}(e^{j\vartheta})\left(\Phi_y(e^{j\vartheta}) - \Phi_z(e^{j\vartheta})\right)\right] \right\} d\vartheta
    = \frac{1}{4\pi} \int_{-\pi}^{\pi} \left\{ \log\det\left[\Phi_y^{-1}(e^{j\vartheta})\Phi_z(e^{j\vartheta})\right] + \operatorname{tr}\left[\Phi_z^{-1}(e^{j\vartheta})\left(\Phi_y(e^{j\vartheta}) - \Phi_z(e^{j\vartheta})\right)\right] \right\} d\vartheta

(the last step since the integrand is an even function of ϑ), which, by (1), is D_r(y‖z).
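Theorem 1 can be illustrated numerically in the simplest scalar setting (a sketch of our own; the spectra and parameter values are arbitrary choices, not from the paper). Take y AR(1) with pole a and innovation variance σ², and z white with variance τ². Then the spectral-domain integral (1), computed by quadrature, should match the time-domain closed form (1/2)[log(τ²/σ²) + σ²/((1-a²)τ²) - 1], which follows from Szegő's formula (the mean of log Φ_y is log σ²) and var(y) = σ²/(1-a²):

```python
import numpy as np

# AR(1) pole, innovation variance, and white-noise variance (arbitrary)
a, sigma2, tau2 = 0.6, 1.0, 2.0

theta = np.linspace(-np.pi, np.pi, 200001)
phi_y = sigma2 / np.abs(1.0 - a * np.exp(1j * theta)) ** 2   # AR(1) spectrum
phi_z = np.full_like(theta, tau2)                            # white spectrum

def integrate(f):
    # trapezoidal rule over [-pi, pi]
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(theta))

# Spectral-domain side: the integral (1), scalar case
dr_spectral = integrate(np.log(phi_z / phi_y) + phi_y / phi_z - 1.0) / (4.0 * np.pi)

# Time-domain side: closed form via Szego's formula and var(y) = sigma2/(1-a^2)
dr_time = 0.5 * (np.log(tau2 / sigma2) + sigma2 / ((1.0 - a ** 2) * tau2) - 1.0)
```

Since the integrand is smooth and periodic, the trapezoidal rule over a full period is extremely accurate, and the two quantities agree to many digits.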
5 Approximation in the Itakura-Saito distance

Let S_+^{m×m} denote the family of bounded and coercive spectral densities on T := {z ∈ C : |z| = 1} of R^m-valued processes. Suppose that the data {y_i}_{i=1}^N are generated by an unknown, zero-mean, R^m-valued, purely nondeterministic, stationary Gaussian process y = {y_k; k ∈ Z}. We wish to estimate the spectral density Φ ∈ S_+^{m×m} of y from {y_i}_{i=1}^N.

A THREE-like approach [5, 19] generalizes Burg-like methods in several ways. The second-order statistics that are estimated from the data {y_i}_{i=1}^N are not necessarily the covariance lags C_l := E{y_{k+l} y_k^⊤} of y. Moreover, a prior estimate of Φ may be included in the estimation procedure. More explicitly, these methods hinge on the following four elements:

1. a rational filter to process the data, with transfer function

    G(z) = (zI - A)^{-1}B,    (7)

where A ∈ R^{n×n} has all its eigenvalues inside the unit circle, B ∈ R^{n×m} is full rank, n ≥ m, and (A, B) is a reachable pair;

2. an estimate, based on the data {y_i}_{i=1}^N, of the steady-state covariance Σ of the state x(k) of the filter

    x(k+1) = Ax(k) + By(k);

3. a prior spectral density Ψ ∈ S_+^{m×m};

4. an index that measures the distance between two spectral densities.

The filter bank (7) provides Carathéodory or, more generally, Nevanlinna-Pick interpolation data for the positive real part Φ_+ of Φ, see [5, Section II]. This occurs through the constraint

    \int G\,\Phi\, G^* = \Sigma,    (8)

which must be satisfied by the spectrum of y (here and in the sequel, integration occurs on the unit circle with respect to normalized Lebesgue measure). Since, in general, the prior estimate Ψ is not consistent with the interpolation conditions, an approximation problem arises. It is then necessary to introduce an adequate distance index. This crucial choice is dictated by several requirements. On the one hand, the solution should be rational of low McMillan degree, at least when the prior Ψ is such. On the other hand, the variational analysis should lead to a computable solution, typically by solving the dual optimization problem. In [7], the following approximation problem was considered:

Problem 1. Let Ψ ∈ S_+^{m×m}, G(z) as in (7), and Σ = Σ^⊤ > 0. Find Φ° that solves

    \text{minimize } d_{IS}(\Phi, \Psi) \text{ over } \left\{\Phi \in S_+^{m\times m} \,\middle|\, \int G\Phi G^* = \Sigma\right\}.

The first issue to be addressed is feasibility of the optimization problem, namely the existence of a Φ ∈ S_+^{m×m} that satisfies (8). This problem was solved in [10] (see also [19]), where an algebraic condition that is necessary and sufficient for feasibility was derived. In [7], it was shown that, if the problem is feasible, then the optimal spectrum has the form

    \Phi^\circ(\Lambda) := \left(\Psi^{-1} + G^*\Lambda G\right)^{-1}.    (9)

The Hermitian matrix Λ, playing the role of a multiplier, must be such as to satisfy the constraint, namely

    \int G\left(\Psi^{-1} + G^*\Lambda G\right)^{-1} G^* = \Sigma.
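The machinery above can be exercised on a toy scalar instance of Problem 1 (m = n = 1). The sketch below is entirely our own construction, not the matricial Newton-like method of [7]: it picks G(z) = 1/(z - a), a flat prior Ψ ≡ 1, and a target Σ generated by a hypothetical "true" white spectrum, and then solves the scalar multiplier constraint by bisection, exploiting the fact that the scalar constraint map is monotonically decreasing in λ:

```python
import numpy as np

a = 0.5                                         # filter pole, x(k+1) = a x(k) + y(k)
theta = np.linspace(-np.pi, np.pi, 100001)
g = 1.0 / np.abs(np.exp(1j * theta) - a) ** 2   # |G(e^{j theta})|^2
psi = 1.0                                       # flat (white) prior Psi

def mean_on_circle(f):
    # (1/2pi) * integral over [-pi, pi], trapezoidal rule
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(theta)) / (2.0 * np.pi)

# Target state variance Sigma produced by a "true" white spectrum Phi = 0.5
sigma = mean_on_circle(0.5 * g)

def phi_opt(lam):
    # Optimal form (9), scalar case: Phi(lam) = (Psi^{-1} + lam |G|^2)^{-1}
    return 1.0 / (1.0 / psi + lam * g)

def residual(lam):
    # Constraint residual: int G Phi(lam) G* - Sigma (decreasing in lam)
    return mean_on_circle(g * phi_opt(lam)) - sigma

# Solve the scalar multiplier constraint by bisection on lam >= 0
lo, hi = 0.0, 1e3
for _ in range(200):
    mid = 0.5 * (lo + hi)
    lo, hi = (mid, hi) if residual(mid) > 0 else (lo, mid)
lam_star = 0.5 * (lo + hi)
```

Bisection works here only because the scalar constraint is monotone; in the matrix case Λ is Hermitian and the constraint must be solved by the globally convergent Newton-like method of [7].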
Existence of a Λ° satisfying this constraint can be established by proving that the dual problem has a solution (a highly nontrivial task, since the set of multipliers is open and unbounded). The multiplier can then be computed through a globally convergent, matricial, Newton-like method; see [7] for the details. The most interesting feature of the expression (9) is that it provides an upper bound for the McMillan degree of the optimal solution Φ°(Λ°) (which is the most natural index of complexity). This upper bound is given by 2n + deg(Ψ) and is equal to the best one so far established in the scalar case. Thus, this result represents a significant improvement in the frame of multivariable spectral estimation. Indeed, the best upper bound available so far was deg[Ψ] + 4n.

Following [5], it is shown in [7] that Problem 1 may be used as the basis for a spectral estimation technique that features high resolution in a desired range of frequencies. Moreover, simulations show that, in the case of a very short data record, this method is robust with respect to artifacts, as opposed to standard identification techniques such as MATLAB's PEM and N4SID. We refer the reader to [7] for further details and more comprehensive simulations.

6 Conclusions

In conclusion, the main contribution of the paper is to provide a novel, precise connection between spectral relative entropy and its time-domain counterpart via a multivariate Itakura-Saito divergence. This information-theoretic result suggests that the pseudo-distance induced by the spectral entropy rate is quite natural in dealing with many optimization problems in science and information engineering. It also appears promising for applications where spectral entropy already plays an important role, such as EEG analysis.

References

[2] H. Akaike. A new look at the statistical model identification. IEEE Trans. Aut. Contr., AC-19:716–723, 1974.

[3] H. Akaike. Canonical correlation analysis of time series and the use of an information criterion. In R. K. Mehra and D. G. Lainiotis, editors, System Identification: Advances and Case Studies, pages 27–96, 1976.

[4] M. Basseville. Distance measures for signal processing and pattern recognition. Signal Processing, 18:349–369, 1989.

[5] C. I. Byrnes, T. Georgiou, and A. Lindquist. A new approach to spectral estimation: A tunable high-resolution spectral estimator. IEEE Trans. Sig. Proc., 48:3189–3205, 2000.

[6] A. Ferrante, M. Pavon, and F. Ramponi. Hellinger vs. Kullback-Leibler multivariable spectrum approximation. IEEE Trans. Aut. Control, 53:954–967, 2008.

[7] A. Ferrante, C. Masiero, and M. Pavon. Time and spectral domain relative entropy: A new approach to multivariate spectral estimation. IEEE Trans. Aut. Contr., to appear; preprint arXiv:1103.5602v2, March 2011.

[8] R. Gallager. Circularly-symmetric Gaussian random vectors. M.I.T. notes, Jan. 2008.

[9] T. Georgiou. Relative entropy and the multivariable multidimensional moment problem. IEEE Trans. Inform. Theory, 52:1052–1066, 2006.

[10] T. Georgiou. The structure of state covariances and its relation to the power spectrum of the input. IEEE Trans. Aut. Control, 47:1056–1066, 2002.

[11] R. Gray, A. Buzo, A. Gray Jr., and Y. Matsuyama. Distortion measures for speech processing. IEEE Trans. Acoustics, Speech and Signal Proc., 28:367–376, 1980.

[12] S. Ihara. Information Theory for Continuous Systems. World Scientific, Singapore, 1993.

[13] X. Jiang, L. Ning, and T. Georgiou. Distances and Riemannian metrics for multivariate spectral densities. Preprint, June 2011.

[14] H. Cramér and M. R. Leadbetter. Stationary and Related Stochastic Processes. Wiley, New York, 1966.

[15] S. Kullback. Information Theory and Statistics, 2nd ed. Dover, Mineola, NY, 1968.

[16] A. Lindquist and G. Picci. Linear Stochastic Systems: A Geometric Approach to Modeling, Estimation and Identification. In preparation; preprint available at http://www.math.kth.se/~alq/LPbook.
[17] K. S. Miller. Complex Stochastic Processes. Addison-Wesley, Reading, MA, 1974.

[18] M. S. Pinsker. Information and Information Stability of Random Variables and Processes. Holden-Day, San Francisco, 1964. Translated by A. Feinstein.

[19] F. Ramponi, A. Ferrante, and M. Pavon. A globally convergent matricial algorithm for multivariate spectral estimation. IEEE Trans. Aut. Control, 54(10):2376–2388, Oct. 2009.

[20] I. Rezek and S. Roberts. Stochastic complexity measures for physiological signal analysis. IEEE Trans. Biomedical Engineering, 45(9):1186–1191, Sept. 1998.

[21] Yu. A. Rozanov. Stationary Random Processes. Holden-Day, San Francisco, 1967.

[22] A. A. Stoorvogel and J. H. Van Schuppen. System identification with information theoretic criteria. In S. Bittanti and G. Picci, editors, Identification, Adaptation, Learning: The Science of Learning Models from Data. Springer, Berlin-Heidelberg, 1996.

[23] A. Van Den Bos. The multivariate complex normal distribution - a generalization. IEEE Trans. Inform. Theory, 41:537–539, 1995.
