Time-Frequency Distributions - A Review
LEON COHEN
Invited Paper
A review and tutorial of the fundamental ideas and methods of joint time-frequency distributions is presented. The objective of the field is to describe how the spectral content of a signal is changing in time, and to develop the physical and mathematical ideas needed to understand what a time-varying spectrum is. The basic goal is to devise a distribution that represents the energy or intensity of a signal simultaneously in time and frequency. Although the basic notions have been developing steadily over the last 40 years, there have recently been significant advances. This review is presented to be understandable to the nonspecialist with emphasis on the diversity of concepts and motivations that have gone into the formation of the field.

Manuscript received March 21, 1988; revised March 30, 1989. This work was supported in part by the City University Research Award Program.
The author is with the Department of Physics and Astronomy, Hunter College and Graduate Center, City University of New York, New York, NY 10021, USA.
IEEE Log Number 8928443.
© 1989 IEEE, 0018-9219/89/0700-0941$01.00
PROCEEDINGS OF THE IEEE, VOL. 77, NO. 7, JULY 1989, p. 941

I. INTRODUCTION

The power of standard Fourier analysis is that it allows the decomposition of a signal into individual frequency components and establishes the relative intensity of each component. The energy spectrum does not, however, tell us when those frequencies occurred. During a dramatic sunset, for example, it is clear that the composition of the light reaching us is very different than what it is during most of the day. If we Fourier analyze the light from sunrise to sunset, the energy density spectrum would not tell us that the spectral composition was significantly different in the last 5 minutes. In this situation, where the changes are relatively slow, we may Fourier analyze 5-minute samples of the signal and get a pretty good idea of how the spectrum during sunset differed from a 5-minute strip during noon. This may be refined by sliding the 5-minute intervals along time, that is, by taking the spectrum with a 5-minute time window at each instant in time and getting an energy spectrum as a continuous function of time. As long as the 5-minute intervals themselves do not contain rapid changes, this will give an excellent idea of how the spectral composition of the light has changed during the course of the day. If significant changes occurred considerably faster than over 5 minutes, we may shorten the time window appropriately. This is the basic idea of the short-time Fourier transform, or spectrogram, which is currently the standard method for the study of time-varying signals. However, there exist natural and manmade signals whose spectral content is changing so rapidly that finding an appropriate short-time window is problematic since there may not be any time interval for which the signal is more or less stationary. Also, decreasing the time window so that one may locate events in time reduces the frequency resolution. Hence there is an inherent tradeoff between time and frequency resolution. Perhaps the prime example of signals whose frequency content is changing rapidly and in a complex manner is human speech. Indeed it was the motivation to analyze speech that led to the invention of the sound spectrogram [113], [160] during the 1940s and which, along with subsequent developments, became a standard and powerful tool for the analysis of nonstationary signals [5], [6], [MI, [75], [117], U121, F261, [150], [151], [158], [163], [164], [174]. Its possible shortcomings notwithstanding, the short-time Fourier transform and its variations remain the prime methods for the analysis of signals whose spectral content is varying.

Starting with the classical works of Gabor [80], Ville [194], and Page [152], there has been an alternative development for the study of time-varying spectra. Although it is now fashionable to say that the motivation for this approach is to improve upon the spectrogram, it is historically clear that the main motivation was for a fundamental analysis and a clarification of the physical and mathematical ideas needed to understand what a time-varying spectrum is. The basic idea is to devise a joint function of time and frequency, a distribution, that will describe the energy density or intensity of a signal simultaneously in time and frequency. In the ideal case such a joint distribution would be used and manipulated in the same manner as any density function of more than one variable. For example, if we had a joint density for the height and weight of humans, we could obtain the distribution of height by integrating out weight. We could obtain the fraction of people weighing more than 150 lb but less than 160 lb with heights between 5 and 6 ft. Similarly, we could obtain the distribution of weight at a particular height, the correlation between height and weight, and so on. The motivation for devising a joint time-frequency distribution is to be able to use it and manipulate it in the same way. If we had such a distribution, we could ask what fraction of the energy is in a certain frequency and time range, we could calculate the distribution of frequency at a particular time, we could calculate the global and local moments of the distribution such as the mean frequency and its local spread, and so on. In addition, if we did have a method of relating a joint time-frequency distribution to a signal, it would be a powerful tool for the construction of signals with desirable properties. This would be done by first constructing a joint time-frequency function with the desired attributes and then obtaining the signal that produces that distribution. That is, we could synthesize signals having desirable time-frequency characteristics. Of course, time-frequency analysis has unique features, such as the uncertainty principle, which add to the richness and challenge of the field.

From standard Fourier analysis, recall that the instantaneous energy of a signal $s(t)$ is the absolute value of the signal squared,

$$|s(t)|^2 = \text{intensity per unit time at time } t$$

or

$$|s(t)|^2\,\Delta t = \text{fractional energy in time interval } \Delta t \text{ at time } t. \tag{1.1}$$

The intensity per unit frequency,¹ the energy density spectrum, is the absolute value of the Fourier transform squared,

$$|S(\omega)|^2 = \text{intensity per unit frequency at } \omega$$

or

$$|S(\omega)|^2\,\Delta\omega = \text{fractional energy in frequency interval } \Delta\omega \text{ at frequency } \omega$$

where

$$S(\omega) = \frac{1}{\sqrt{2\pi}} \int s(t)\, e^{-j\omega t}\, dt.$$

We have chosen the normalization such that

$$\int |s(t)|^2\, dt = \int |S(\omega)|^2\, d\omega = \text{total energy} = 1$$

where, for convenience, we will always take the total energy to be equal to 1.² The fundamental goal is to devise a joint function of time and frequency which represents the energy or intensity per unit time per unit frequency. For a joint distribution $P(t, \omega)$ we have

$$P(t, \omega) = \text{intensity at time } t \text{ and frequency } \omega$$

or

$$P(t, \omega)\,\Delta t\,\Delta\omega = \text{fractional energy in time-frequency cell } \Delta t\,\Delta\omega \text{ at } t, \omega.$$

Ideally the summing up of the energy distribution for all frequencies at a particular time would give the instantaneous energy, and the summing up over all times at a particular frequency would give the energy density spectrum,

$$\int P(t, \omega)\, d\omega = |s(t)|^2 \tag{1.5}$$

$$\int P(t, \omega)\, dt = |S(\omega)|^2. \tag{1.6}$$

¹ We use angular frequency. All integrals go from $-\infty$ to $+\infty$ unless otherwise stated.
² Signals that cannot be normalized may be handled as limiting cases of normalized ones or by using generalized functions.

The total energy $E$, expressed in terms of the distribution, is given by

$$E = \iint P(t, \omega)\, d\omega\, dt \tag{1.7}$$

and will be equal to the total energy of the signal if the marginals are satisfied. However, we note that it is possible for a distribution to give the correct value for the total energy without satisfying the marginals.

Do there exist joint time-frequency distributions that would satisfy our intuitive ideas of a time-varying spectrum? Can their interpretation be as true densities or distributions? How can such functions be constructed? Do they really represent the correlations between time and frequency? What reasonable conditions can be imposed to obtain such functions? The hope is that they do exist, but if they do not in the full sense of true densities, what is the best we can do? Is there one distribution that is the best, or are different distributions to be used in different situations? Are there inherent limitations to a joint time-frequency distribution? This is the scope of time-frequency distribution theory.

Scope of Review, Notation, and Terminology: The basic ideas and methods that have been developed are readily understood by the uninitiated and do not require any specialized mathematics. We shall stress the fundamental ideas, motivations, and unresolved issues. Hopefully our emphasis on the fundamental thinking that has gone into the development of the field will also be of interest to the expert.

We confine our discussion to distributions in the spirit of those proposed by Wigner, Ville, Page, Rihaczek, and others and consider only deterministic signals. There are other qualitatively different approaches for joint time-frequency analysis which are very powerful but will not be discussed here. Of particular note is Priestley's theory of evolutionary spectra [162] and we point out that his discussions of the basic concepts relating to time-varying spectra are among the most profound. Also, we will not consider the Gabor logon approach, although it is related to the spectrogram discussed in Section VI.

As usual, when considering many special cases and situations, one may quickly become embroiled in a morass of subscripts and superscripts. We have chosen to keep the notation simple and, if no confusion arises, we differentiate between cases and situations by context rather than by notation.

Some of the terminology may be unfamiliar or puzzling to some readers. Many words, like distribution in the probability sense, are used because of historical reasons. These distributions first arose in quantum mechanics where the words "probability density" or "distribution" are applied properly. For deterministic signals where no probabilistic considerations enter, the reader should think of distributions as "intensities" or "densities" in the common usage of the words, or simply as how the energy is "distributed" in the time-frequency cells. Of course many probability concepts apply to intensities, such as averages and spreads. When we deal with stochastic signals, probability concepts do properly enter. As we will see, many of the known distributions may become negative or even complex. Hence they are sometimes called quasi or pseudo distributions. Also from probability theory, the word "marginal" is used
to indicate the individual distribution. The marginals are derived from the joint distribution by integrating out the other variables. Hence we will say that $|s(t)|^2$ and $|S(\omega)|^2$ are the marginals of $P(t, \omega)$, as per Eqs. (1.5) and (1.6).

II. BRIEF HISTORICAL PERSPECTIVE AND EXAMPLES

Although we will discuss the particular distributions in detail, it is of value to give a short historical perspective here. The two original papers that addressed the question of a joint distribution function in the sense considered here are those of Gabor [80] and Ville [194]. They were guided by a similar development in quantum mechanics, where there is a partial mathematical resemblance to time-frequency analysis. We discuss this resemblance later, but we emphasize here that the physical interpretations are drastically different and the analogy is only formal. Gabor developed a mathematical method closely connected to so-called coherent states in quantum mechanics. In the same paper Gabor introduced the important concept of the analytic signal. Ville derived a distribution that Wigner [199] gave in 1932 to study quantum statistical mechanics. At the same time as Ville, Moyal [143] used an identical derivation in the quantum mechanical context. Although we use the word "derivation," we emphasize that there is an ambiguity in the method of Ville and Moyal, and many later authors used the same derivation to obtain other distributions. The Wigner-Ville distribution is

$$W(t, \omega) = \frac{1}{2\pi} \int s^*\!\left(t - \tfrac{1}{2}\tau\right) e^{-j\tau\omega}\, s\!\left(t + \tfrac{1}{2}\tau\right) d\tau. \tag{2.1}$$

It satisfies the marginals, but we do not show that now. We shall see later that by simple inspection the properties of a distribution can readily be determined. A year after Wigner, Kirkwood [107] came up with another distribution and argued that it is simpler to use than the Wigner distribution for certain problems. The distribution is simply

$$e(t, \omega) = \frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t}. \tag{2.2}$$

This distribution and its variations have been derived and studied in many ways and independently introduced in signal analysis. A particularly innovative derivation based on physical considerations was given by Rihaczek [167]. Levin [125] derived it by modifying the considerations that led to the Page [152] distribution. Margenau and Hill [133] derived it by the Ville and Moyal methods. Equation (2.2) is complex and is sometimes called the complex energy spectrum. Its real part is also a distribution and satisfies the marginals.

In 1952 Page [152] developed the concept of the running spectrum. He obtained a new distribution from simple conceptual considerations and coined the phrase "instantaneous power spectra." The Page distribution is

$$P^-(t, \omega) = \frac{\partial}{\partial t} \left| \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} s(t')\, e^{-j\omega t'}\, dt' \right|^2. \tag{2.3}$$

It was pointed out by Turner [190] and Levin [125] that the Page procedure can be used to obtain other distributions.

A comprehensive and far-reaching study was done by Mark [138] in 1970, where many ideas commonly used today were developed. He pinpointed the difficulty of the spurious values in the Wigner distribution, introduced the "physical" spectrum, which is basically the spectrogram, and showed its relation to the Wigner distribution. Fundamental considerations regarding time-frequency distributions and nonstationary processes were given by Blanc-Lapierre and Picinbono [24], Loynes [128], and Lacoume and Kofman [121].

One of the main stumbling blocks in developing a consistent theory is the fact that the behavior of these few distributions is dramatically different and each has peculiar properties. However, each does satisfy the marginals, has other desirable properties, and presumably is a good candidate for the time-varying spectrum. Furthermore each has been derived from seemingly plausible ideas. It was unclear how many more existed and whether the peculiarities were general features or individual ones. It was subsequently realized [58] that an infinite number can be readily generated from

$$P(t, \omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta, \tau)\, s^*\!\left(u - \tfrac{1}{2}\tau\right) s\!\left(u + \tfrac{1}{2}\tau\right) du\, d\tau\, d\theta \tag{2.4}$$

where $\phi(\theta, \tau)$ is an arbitrary function called the kernel³ by Claasen and Mecklenbrauker [56]. By choosing different kernels, different distributions are obtained at will. For example the Wigner, Rihaczek, and Page distributions are obtained by taking $\phi(\theta, \tau) = 1$, $e^{j\theta\tau/2}$, and $e^{j\theta|\tau|/2}$, respectively.

³ In general the kernel may depend explicitly on time and frequency and in addition may also be a functional of the signal. To avoid notational complexity we will use $\phi(\theta, \tau)$, and the possible dependence on other variables will be clear from the context. As we will see in Section IV, the time- and frequency-shift invariant distributions are those for which the kernel is independent of time and frequency.

Having a simple method to generate all distributions has the advantage of allowing one to prove general results and to study what aspects of a particular distribution are unique or common to all. Equally important is the idea that by placing constraints on the kernel one obtains a subset of the distributions which have a particular property [58]. That is, the properties of the distribution are determined by the corresponding kernel.

There has been a great surge of activity in the past 10 years or so and the initial impetus came from the work of Claasen and Mecklenbrauker [54]-[56], Janse and Kaizer [97], Boashash (aka Bouachache) [35], and others. The importance of their initial contributions is that they developed ideas uniquely suited to the time-frequency situation and demonstrated useful methods for implementation. Moreover, they were innovative in using the similarities and differences with quantum mechanics. In an important set of papers, Claasen and Mecklenbrauker [54]-[56] developed a comprehensive approach and originated many new ideas and procedures for the study of joint distributions. Boashash [35] was perhaps the first to utilize them for real problems and developed a number of new methods. In particular he realized that even though a distribution may not behave properly in all respects or interpretations, it could still be used if a particular property such as instantaneous frequency is well described. He initially applied them to problems in geophysical exploration. Escudie [71], [77] and coworkers transcribed directly some of the early quantum
mechanical results, particularly the work on the general class of distributions [58], [132], into signal analysis language. The work by Janse and Kaizer [97] was remarkable in its scope and introduction of new methodologies. They developed innovative theoretical and practical techniques for the use of these distributions.

Many divergent attitudes toward the meaning, interpretation, and use of these distributions have arisen over the years, ranging from the attempt to describe a time-varying spectrum to merely using them as carrying the information of a signal in a convenient way. The divergent viewpoints and interests have led to a better understanding and implementation. We will discuss some of the common attitudes and unresolved issues in the conclusion. The subject is evolving rapidly and most of the issues are open.

Preliminary Illustrative Examples: Before proceeding we present a simple example of the above distributions so that the reader may get a better feeling for the variety and difficulties. We consider the signal illustrated in Fig. 1(a). Initially the sine wave has a frequency $\omega_1$ in the interval $(0, t_1)$, then it is shut off in the interval $(t_1, t_2)$ and turned on again in the interval $(t_2, t_3)$ with a frequency $\omega_2$. This simple signal is an idealization of common situations that we hope these distributions will handle effectively. The signal is highly nonstationary, has intermediate periods of silence common in acoustic signals, and has sudden onsets. Everyone has a sense of what the distribution should be. We expect the distribution to show a peak at $\omega_1$ in the interval $(0, t_1)$ and another peak at $\omega_2$ for the interval $(t_2, t_3)$, and of course to be zero in the interval $(t_1, t_2)$. Fig. 1 illustrates the distributions mentioned thus far and we see that they all imply intensities, that is, nonzero values, at places that are not expected. The Wigner distribution is not zero in the range $(t_1, t_2)$ although the signal is. This is a fundamental property which we discuss later. The Rihaczek distribution has nonzero values at $\omega_2$ in the interval $(0, t_1)$, although we would expect zero intensity at that frequency for those times. Similar statements hold for the interval $(t_2, t_3)$ at frequency $\omega_1$. The distribution is such that all values of the spectrum are reflected at each time. The Page distribution is similar to that of Rihaczek, but it reflects only those frequencies that have already occurred. We also note that while the Wigner distribution peaks in the middle of each portion of the signal, the Rihaczek distribution is flat and the Page distribution gradually increases as more of the signal at that frequency comes through in time.

We emphasize that all three distributions satisfy the instantaneous energy and spectral energy exactly. Although very different in appearance, they are equivalent in the sense that each one can be obtained from the other uniquely and contains the same amount of information. They are very different in their energy concentration, but nonetheless all three have been used with considerable profit. We note that these are just three possibilities out of an infinite number of choices, all with vastly different behavior.

III. THE DISTRIBUTIONS AND METHODS FOR OBTAINING THEM

One of the remarkable facts regarding time-frequency distributions is that so many plausible derivations and approaches have been suggested, yet the behavior of each distribution is dramatically different. It is therefore important to understand the ideas and arguments that have been given, as variations and insights of them will undoubtedly lead the way to further development. We will not present these approaches in historical order, but rather in a sequence that logically develops the ideas and techniques. However, the different sections may be read independently. With the benefit of hindsight we have streamlined some of the original arguments.

A. Page Distribution and Variations

Page [152] argues that as the signal is evolving, our knowledge of it consists of the signal up to the current time $t$ and we have no information about the future part. Conceptually we may consider a new signal $s_t(t')$,

$$s_t(t') = \begin{cases} s(t'), & t' \le t \\ 0, & t' > t \end{cases} \tag{3.1}$$
[Figure 1: panels (a), (b), (c); axis label: FREQUENCY]
Fig. 1. (a) Wigner, (b) Rihaczek, and (c) Page distributions for the signal illustrated at left. The signal is turned on at time zero with constant frequency $\omega_1$ and turned off at time $t_1$, turned on again at time $t_2$ with frequency $\omega_2$ and turned off at time $t_3$. All three distributions display energy density where one does not expect any. The positive parts of the distributions are plotted. For the Rihaczek distribution we have plotted the real part, which is also a distribution.
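
The behavior summarized in Fig. 1 can be checked on a small discrete example. The Python sketch below is only illustrative and is not taken from the paper: the sample count `N`, the frequency bins `K1` and `K2`, the burst boundaries `T1` and `T2`, and all function names are assumptions, and the discrete formulas are simple sampled analogues of the Rihaczek and Wigner distributions rather than definitive implementations. It builds a two-burst signal with a silent interval, evaluates the real part of the Rihaczek distribution on the full grid, and evaluates one Wigner value in the middle of the silent interval.

```python
import cmath
import math

N = 48            # number of samples (assumed)
K1, K2 = 6, 18    # DFT bins standing in for w1 and w2 (assumed)
T1, T2 = 16, 32   # burst 1 is [0, T1), silence is [T1, T2), burst 2 is [T2, N)


def two_burst_signal():
    """Complex sinusoid at bin K1, then silence, then a sinusoid at bin K2."""
    s = []
    for n in range(N):
        if n < T1:
            s.append(cmath.exp(2j * math.pi * K1 * n / N))
        elif n < T2:
            s.append(0j)
        else:
            s.append(cmath.exp(2j * math.pi * K2 * n / N))
    return s


def dft(s):
    """Unnormalized DFT, S[k] = sum_n s[n] exp(-j 2 pi k n / N)."""
    return [sum(s[n] * cmath.exp(-2j * math.pi * k * n / N) for n in range(N))
            for k in range(N)]


def rihaczek_real(s, S):
    """Real part of a sampled Rihaczek form: Re{ s[n] S*[k] e^{-j 2 pi k n / N} } / N."""
    return [[(s[n] * S[k].conjugate()
              * cmath.exp(-2j * math.pi * k * n / N)).real / N
             for k in range(N)] for n in range(N)]


def wigner_at(s, n, omega):
    """Sampled Wigner form: (1/pi) sum_m s*[n-m] s[n+m] e^{-2j omega m}, omega in rad/sample."""
    total = 0j
    for m in range(-N, N + 1):
        if 0 <= n + m < N and 0 <= n - m < N:
            total += s[n + m] * s[n - m].conjugate() * cmath.exp(-2j * omega * m)
    return total.real / math.pi


s = two_burst_signal()
S = dft(s)
P = rihaczek_real(s, S)

# Time marginal: summing over frequency should return |s[n]|^2.
time_marginal = [sum(P[n]) for n in range(N)]

# Largest Rihaczek magnitude inside the silent interval.
gap_max = max(abs(P[n][k]) for n in range(T1, T2) for k in range(N))

# Rihaczek value at the second frequency while only the first is present.
rihaczek_cross = P[0][K2]

# Wigner value in the middle of the silence, midway between the two frequencies.
mid_gap = (T1 + T2) // 2
omega_mid = math.pi * (K1 + K2) / N
wigner_gap = wigner_at(s, mid_gap, omega_mid)

print("max |Rihaczek| in silent interval:", gap_max)
print("Rihaczek at w2 during first burst:", rihaczek_cross)
print("Wigner in silent interval:", wigner_gap)
```

Under these assumed parameters the time marginal of the real Rihaczek values reproduces $|s[n]|^2$ sample by sample and the Rihaczek values vanish wherever the signal is zero, yet a nonzero value appears at the second frequency while only the first is present, and the Wigner value inside the silent interval is nonzero.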
where $t'$ is the running time and $t$ is the present instant. The Fourier transform of $s_t(t')$ is

$$S_t^-(\omega) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} s_t(t')\, e^{-j\omega t'}\, dt' = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{t} s(t')\, e^{-j\omega t'}\, dt' \tag{3.2}$$

which is called the running spectrum. The $(-)$ notation is to signify that we have observed the signal from $-\infty$. In analogy with Eq. (1.6) we expect the energy observed per unit frequency up to time $t$ to be

$$\int_{-\infty}^{t} P^-(t', \omega)\, dt' = |S_t^-(\omega)|^2. \tag{3.3}$$

This equation can be used to determine $P^-(t, \omega)$, since differentiation with respect to $t$ yields

$$P^-(t, \omega) = \frac{\partial}{\partial t} |S_t^-(\omega)|^2 \tag{3.4}$$

which is the Page distribution. It can be obtained from the general class of distributions, Eq. (2.4), by taking

$$\phi(\theta, \tau) = e^{j\theta|\tau|/2}. \tag{3.5}$$

Substituting Eq. (3.2) into (3.4) and carrying out the differentiation, we also have

$$P^-(t, \omega) = 2\,\mathrm{Re}\, \frac{1}{\sqrt{2\pi}}\, s^*(t)\, S_t^-(\omega)\, e^{j\omega t} \tag{3.6}$$

which is a convenient form for its calculation.

As for the general behavior of the Page distribution, we note that the longer a particular frequency is observed, the larger the intensity is at that frequency. This is illustrated in Fig. 2(a), where we plot the Page distribution for the finite-duration signal,

$$s(t) = e^{j\omega_0 t}, \qquad 0 \le t \le T. \tag{3.7}$$

The distribution is

$$P^-(t, \omega) = \begin{cases} \dfrac{1}{\pi}\, \dfrac{\sin[(\omega - \omega_0)t]}{\omega - \omega_0}, & 0 \le t \le T \\[1ex] 0, & \text{otherwise.} \end{cases} \tag{3.8}$$

As time increases, the distribution becomes more and more peaked at $\omega_0$. In Fig. 2(b) we have also plotted the Wigner distribution for the same signal for later discussion. We remark here that up to $t = T/2$ the distributions are identical, but after that, their behavior is quite different. The Wigner distribution always goes to zero at the beginning and end of a finite-duration signal. That is not the case with the Page distribution.

It was subsequently realized by Turner [190] that Page's definition and procedure have two arbitrary aspects. First we can add to the Page distribution the function $\rho(t, \omega)$, which Turner called the complementary function, and obtain a new distribution,

$$P_{\mathrm{new}}(t, \omega) = P^-(t, \omega) + \rho(t, \omega). \tag{3.9}$$

The marginals are still satisfied if the complementary function satisfies

$$\int \rho(t, \omega)\, dt = 0 \quad \text{and} \quad \int \rho(t, \omega)\, d\omega = 0. \tag{3.10}$$

Turner also pointed out that taking the interval from $-\infty$ to $t$ is not necessary; other intervals can be taken, each producing a different distribution function related to each other by a complementary function satisfying the above conditions.

Levin [125] defined the future running transform by

$$S_t^+(\omega) = \frac{1}{\sqrt{2\pi}} \int_{t}^{\infty} s(t')\, e^{-j\omega t'}\, dt' \tag{3.11}$$

and using the same argument, we have

$$\int_{t}^{\infty} P^+(t', \omega)\, dt' = |S_t^+(\omega)|^2. \tag{3.12}$$

Differentiation with respect to time leads to

$$P^+(t, \omega) = -\frac{\partial}{\partial t} |S_t^+(\omega)|^2 = 2\,\mathrm{Re}\, \frac{1}{\sqrt{2\pi}}\, s^*(t)\, S_t^+(\omega)\, e^{j\omega t}. \tag{3.13}$$

He also argued that we should treat past and future on the same footing, by defining the instantaneous energy spectrum as the average of the two,

$$P(t, \omega) = \tfrac{1}{2}\left[ P^+(t, \omega) + P^-(t, \omega) \right] \tag{3.14}$$

$$= \frac{1}{2}\, \frac{\partial}{\partial t} \left[ |S_t^-(\omega)|^2 - |S_t^+(\omega)|^2 \right] \tag{3.15}$$

$$= \frac{1}{\sqrt{2\pi}}\, \mathrm{Re}\, s^*(t)\, e^{j\omega t}\, S(\omega). \tag{3.16}$$

This distribution is the real part of the distribution given by Eq. (2.2), which corresponds to the kernel $\phi(\theta, \tau) = \cos \tfrac{1}{2}\theta\tau$.
Fig. 2. (a) Page and (b) Wigner distributions for the finite-duration signal $s(t) = e^{j\omega_0 t}$, $0 \le t \le T$. As time increases, the Page distribution continues to increase at $\omega_0$. The Wigner distribution increases until $T/2$ and then decreases because the Wigner distribution always goes to zero at the beginning and end of a finite-duration signal.
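
The growth of the Page distribution at $\omega_0$ described in Fig. 2(a) can be checked numerically. The Python sketch below is an illustration under stated assumptions, not the paper's own computation: the values of `OMEGA0` and `T`, the midpoint-rule integration, and all function names are hypothetical choices. It approximates the running spectrum of Eq. (3.2) for the signal of Eq. (3.7), forms the Page distribution from the product $2\,\mathrm{Re}\,[s^*(t)\, S_t^-(\omega)\, e^{j\omega t}]/\sqrt{2\pi}$, and compares it with the closed form $(1/\pi)\sin[(\omega - \omega_0)t]/(\omega - \omega_0)$ that results from carrying out the differentiation, whose value at $\omega = \omega_0$ is $t/\pi$.

```python
import cmath
import math

OMEGA0 = 5.0   # carrier frequency in rad/s (assumed value)
T = 4.0        # signal duration in seconds (assumed value)


def signal(t):
    """s(t) = exp(j*OMEGA0*t) on [0, T] and zero elsewhere, as in Eq. (3.7)."""
    return cmath.exp(1j * OMEGA0 * t) if 0.0 <= t <= T else 0j


def running_spectrum(t, omega, steps=4000):
    """Midpoint-rule estimate of (1/sqrt(2 pi)) * integral_0^t s(t') e^{-j omega t'} dt'."""
    upper = min(max(t, 0.0), T)   # the signal is zero outside [0, T]
    if upper == 0.0:
        return 0j
    dt = upper / steps
    acc = 0j
    for i in range(steps):
        tp = (i + 0.5) * dt
        acc += signal(tp) * cmath.exp(-1j * omega * tp)
    return acc * dt / math.sqrt(2.0 * math.pi)


def page_numeric(t, omega):
    """Page distribution from the product form 2 Re{ s*(t) S_t^-(omega) e^{j omega t} } / sqrt(2 pi)."""
    value = signal(t).conjugate() * running_spectrum(t, omega) * cmath.exp(1j * omega * t)
    return 2.0 * value.real / math.sqrt(2.0 * math.pi)


def page_closed_form(t, omega):
    """(1/pi) sin[(omega - OMEGA0) t] / (omega - OMEGA0) on [0, T]; t/pi at omega = OMEGA0."""
    if not 0.0 <= t <= T:
        return 0.0
    d = omega - OMEGA0
    return t / math.pi if d == 0.0 else math.sin(d * t) / (math.pi * d)


# Compare the two forms at a few (t, omega) points, including the peak omega = OMEGA0.
for t, omega in [(1.0, 6.0), (2.5, 4.0), (1.0, OMEGA0), (3.0, OMEGA0)]:
    print(t, omega, page_numeric(t, omega), page_closed_form(t, omega))
```

In this sketch the value at $\omega_0$ grows linearly with $t$ until the signal ends, which is the increase at $\omega_0$ noted in the caption; a finer integration grid would be needed for frequencies far from $\omega_0$.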
Because of the symmetry between time and frequency we can also define the running signal transform by

$$s_\omega^-(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\omega} S(\omega')\, e^{j\omega' t}\, d\omega' \tag{3.17}$$

which yields the distribution

$$P^-(t, \omega) = \frac{\partial}{\partial \omega} |s_\omega^-(t)|^2 = 2\,\mathrm{Re}\, \frac{1}{\sqrt{2\pi}}\, s_\omega^{-*}(t)\, S(\omega)\, e^{j\omega t}. \tag{3.18}$$

Similarly, with $s_\omega^+(t)$ the corresponding transform over the interval from $\omega$ to $\infty$,

$$P^+(t, \omega) = -\frac{\partial}{\partial \omega} |s_\omega^+(t)|^2 = 2\,\mathrm{Re}\, \frac{1}{\sqrt{2\pi}}\, s_\omega^{+*}(t)\, S(\omega)\, e^{j\omega t}. \tag{3.19}$$

If Eqs. (3.18) and (3.19) are averaged, we again obtain Eq. (3.16).

Filterbank Method: Grace [83] has given an interesting derivation of the Page distribution. The signal is passed through a bank of bandpass filters and the envelope of the output is calculated. The squared envelope is given by

$$\frac{1}{2\pi} \left| \int s(t')\, h(t - t')\, e^{-j\omega t'}\, dt' \right|^2 \tag{3.20}$$

where $h(t)e^{j\omega t}$ is the impulse response of one filter. By choosing the impulse response $h(t)$ to be 1 up to time $t$ and zero afterward, we obtain the frequency density, the right-hand side of Eq. (3.3), and the Page distribution follows as before.

B. Complex Energy Spectrum

As already noted, the Rihaczek distribution was used and derived in many ways, but Rihaczek [167] and Ackroyd [2], [3] derived it from physical considerations. Consider a time-dependent voltage $V(t)$ going through a pure reactance whose admittance function is zero for all frequencies except for a narrow band around $\omega$, where it is 1. If we decompose the voltage into its frequency components,

$$V(t) = \frac{1}{\sqrt{2\pi}} \int V_\omega\, e^{j\omega t}\, d\omega \tag{3.21}$$

the voltage at each frequency is $(1/\sqrt{2\pi})\, V_\omega e^{j\omega t}$. The current at that frequency is

$$i_\omega(t) = \frac{1}{\sqrt{2\pi}}\, V_\omega\, e^{j\omega t} \tag{3.22}$$

and the total current in the frequency range $\omega$ to $\omega + \Delta\omega$ is

$$i(t) = \int_{\omega}^{\omega + \Delta\omega} i_\omega(t)\, d\omega = \frac{1}{\sqrt{2\pi}} \int_{\omega}^{\omega + \Delta\omega} V_\omega\, e^{j\omega t}\, d\omega. \tag{3.23}$$

The complex power at time $t$ is $V(t)\, i^*(t)$, and hence the energy in the time interval $\Delta t$ is

$$E(t, \omega) = \frac{1}{\sqrt{2\pi}} \int_{t}^{t + \Delta t} \int_{\omega}^{\omega + \Delta\omega} V_\omega^*\, V(t)\, e^{-j\omega t}\, d\omega\, dt. \tag{3.24}$$

We now obtain the energy density in $\omega$ at $t$ by going to the limit,

$$e(t, \omega) = \lim_{\Delta t,\, \Delta\omega \to 0} \frac{E(t, \omega)}{\Delta\omega\, \Delta t} = \frac{1}{\sqrt{2\pi}}\, V_\omega^*\, V(t)\, e^{-j\omega t}. \tag{3.25}$$

Associating a signal $s(t)$ with the voltage $V(t)$ [in which case $V_\omega$ is $S(\omega)$], we have the distribution

$$e(t, \omega) = \frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t} \tag{3.26}$$

which is Eq. (2.2).

C. Ville-Moyal Method and Generalization

The Ville [194] approach is conceptually different and relies on traditional methods for constructing distributions from characteristic functions, although with a new twist. Ville used the method to derive the Wigner distribution but did not notice that there was an ambiguity in his presentation and that other distributions can be derived in the same way. As previously mentioned, Moyal [143] used the same approach.

Suppose we have a distribution $P(t, \omega)$ of two variables $t$ and $\omega$. Then the characteristic function is defined as the expectation value of $e^{j\theta t + j\tau\omega}$, that is,

$$M(\theta, \tau) = \langle e^{j\theta t + j\tau\omega} \rangle = \iint e^{j\theta t + j\tau\omega}\, P(t, \omega)\, dt\, d\omega. \tag{3.27}$$

It has certain manipulative advantages over the distribution. For example, the joint moments can be calculated by differentiation,

$$\langle t^n \omega^m \rangle = \frac{1}{j^{\,n+m}}\, \frac{\partial^{\,n+m}}{\partial\theta^n\, \partial\tau^m}\, M(\theta, \tau) \bigg|_{\theta,\, \tau = 0}. \tag{3.28}$$

By expanding the exponential in Eq. (3.27) it is straightforward to show

$$M(\theta, \tau) = \sum_{n=0}^{\infty} \sum_{m=0}^{\infty} \frac{(j\theta)^n (j\tau)^m}{n!\, m!}\, \langle t^n \omega^m \rangle \tag{3.29}$$

which shows how the characteristic function can be constructed from the joint moments. In general the characteristic function is complex. However, not every complex function is a characteristic function since it must be the Fourier transform of some density. We point out that there are cases where the joint moments do not determine a unique distribution.

The distribution function may be obtained from $M(\theta, \tau)$ by Fourier inversion,

$$P(t, \omega) = \frac{1}{4\pi^2} \iint M(\theta, \tau)\, e^{-j\theta t - j\tau\omega}\, d\theta\, d\tau. \tag{3.30}$$

Is it possible to find the characteristic function for the situation we are considering and hence obtain the distribution? Clearly, it seems, we must have the distribution to start with. Recall, however, that the characteristic function is an average. Ville devised the following method for the calculation of averages, a method that is rooted in the quantum mechanical method of associating operators with ordinary variables. If we have a function $g_1(t)$ of time only, then its average value can be calculated in one of two ways,
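directly using the signal or by way of the spectrum, that is,

$$\langle g_1(t) \rangle = \int |s(t)|^2\, g_1(t)\, dt = \int S^*(\omega)\, g_1\!\left(j \frac{d}{d\omega}\right) S(\omega)\, d\omega. \tag{3.31}$$

This is easy to show by assuming that the function $g_1(t)$ can be expanded in a power series. Therefore in the frequency domain time is "represented" by the operator $j\, d/d\omega$. Similarly for a function of frequency only, $g_2(\omega)$, its average value can be calculated by

$$\langle g_2(\omega) \rangle = \int |S(\omega)|^2\, g_2(\omega)\, d\omega = \int s^*(t)\, g_2\!\left(-j \frac{d}{dt}\right) s(t)\, dt \tag{3.32}$$

and hence frequency becomes the operator $-j\, d/dt$ in the time domain.

Therefore we can associate time and frequency with the operators $\mathcal{T}$ and $\mathcal{W}$ so that

$$\mathcal{T} \to t, \qquad \mathcal{W} \to -j \frac{d}{dt} \qquad \text{in the time domain}$$

$$\mathcal{T} \to j \frac{d}{d\omega}, \qquad \mathcal{W} \to \omega \qquad \text{in the frequency domain.}$$

But what if we have a function $g(t, \omega)$ of time and frequency? How do we then calculate its average value? Ville proposed that we do it the same way, namely, by using

$$\langle g(t, \omega) \rangle = \int s^*(t)\, G(\mathcal{T}, \mathcal{W})\, s(t)\, dt \tag{3.33}$$

in the time domain and

$$\langle g(t, \omega) \rangle = \int S^*(\omega)\, G(\mathcal{T}, \mathcal{W})\, S(\omega)\, d\omega \tag{3.34}$$

in the frequency domain, where $G(\mathcal{T}, \mathcal{W})$ is the operator "associated" with or "corresponding" to $g(t, \omega)$. Since the characteristic function is an expectation value, we can use Eq. (3.33) to obtain it, and in particular,

$$M(\theta, \tau) = \langle e^{j\theta t + j\tau\omega} \rangle \to \int s^*(t)\, e^{j\theta\mathcal{T} + j\tau\mathcal{W}}\, s(t)\, dt. \tag{3.35}$$

We now proceed to evaluate this expression. Because the quantities are noncommuting operators,

$$\mathcal{T}\mathcal{W} - \mathcal{W}\mathcal{T} = j \tag{3.36}$$

one has to be careful in manipulating them. To break up the exponent, one can use a special case of the Baker-Hausdorff [201] theorem

$$e^{j\theta\mathcal{T} + j\tau\mathcal{W}} = e^{-j\theta\tau/2}\, e^{j\tau\mathcal{W}}\, e^{j\theta\mathcal{T}} = e^{j\theta\tau/2}\, e^{j\theta\mathcal{T}}\, e^{j\tau\mathcal{W}}. \tag{3.37}$$

The operator $e^{j\tau\mathcal{W}}$ is the translation operator,

$$e^{j\tau\mathcal{W}} s(t) = e^{\tau\, d/dt}\, s(t) = s(t + \tau) \tag{3.38}$$

and substituting into Eq. (3.35) we have

$$M(\theta, \tau) = \int s^*(t)\, e^{j\theta\tau/2}\, e^{j\theta t}\, s(t + \tau)\, dt. \tag{3.39}$$

Making the change of variables $u = t + \tfrac{1}{2}\tau$, $du = dt$, we obtain

$$M(\theta, \tau) = \int s^*\!\left(u - \tfrac{1}{2}\tau\right) e^{j\theta u}\, s\!\left(u + \tfrac{1}{2}\tau\right) du. \tag{3.40}$$

Inverting as per Eq. (3.30), we obtain the distribution

$$P(t, \omega) = \frac{1}{4\pi^2} \iiint s^*\!\left(u - \tfrac{1}{2}\tau\right) s\!\left(u + \tfrac{1}{2}\tau\right) e^{j\theta(u - t) - j\tau\omega}\, d\theta\, d\tau\, du. \tag{3.41}$$

The $\theta$ integration gives a delta function, and hence

$$P(t, \omega) = \frac{1}{2\pi} \iint \delta(u - t)\, e^{-j\tau\omega}\, s^*\!\left(u - \tfrac{1}{2}\tau\right) s\!\left(u + \tfrac{1}{2}\tau\right) d\tau\, du \tag{3.42}$$

$$= \frac{1}{2\pi} \int s^*\!\left(t - \tfrac{1}{2}\tau\right) e^{-j\tau\omega}\, s\!\left(t + \tfrac{1}{2}\tau\right) d\tau \tag{3.43}$$

which is the Wigner distribution.

It was subsequently pointed out [58], [132] that there is an inherent ambiguity in the derivation because the characteristic functions written in terms of the classical variables allow many operator correspondences. The method was generalized by devising a simple method to generate correspondences and distributions [58]. Instead of

$$e^{j\theta t + j\tau\omega} \to e^{j\theta\mathcal{T} + j\tau\mathcal{W}} \tag{3.44}$$

which is called the Weyl correspondence, we could take

$$e^{j\theta t + j\tau\omega} \to e^{j\theta\mathcal{T}}\, e^{j\tau\mathcal{W}} \tag{3.45}$$

or

$$e^{j\theta t + j\tau\omega} \to e^{j\tau\mathcal{W}}\, e^{j\theta\mathcal{T}} \tag{3.46}$$

which are called normal ordered correspondences. The symmetrical correspondence is the average of the two,

$$e^{j\theta t + j\tau\omega} \to \tfrac{1}{2}\left( e^{j\theta\mathcal{T}}\, e^{j\tau\mathcal{W}} + e^{j\tau\mathcal{W}}\, e^{j\theta\mathcal{T}} \right). \tag{3.47}$$

There are many other expressions which reduce to the left-hand side when operators $\mathcal{T}$ and $\mathcal{W}$ are not considered operators but ordinary functions. The nonunique procedure of going from a classical function to an operator function is called a correspondence rule, and there are an infinite number of such rules [58]. Each different correspondence rule will give a different characteristic function, which in turn will give a different distribution. If we use the correspondence given by Eq. (3.46), we obtain

$$M(\theta, \tau) = \int s^*(t)\, e^{j\tau\mathcal{W}}\, e^{j\theta\mathcal{T}}\, s(t)\, dt = \int s^*(t)\, e^{j\theta(t + \tau)}\, s(t + \tau)\, dt = \int s^*(t - \tau)\, e^{j\theta t}\, s(t)\, dt. \tag{3.48}$$

Inverting to get the distribution we have

$$P(t, \omega) = \frac{1}{4\pi^2} \iiint s^*(u - \tau)\, s(u)\, e^{j\theta(u - t) - j\tau\omega}\, d\theta\, d\tau\, du \tag{3.49}$$

$$= \frac{1}{\sqrt{2\pi}}\, s(t)\, S^*(\omega)\, e^{-j\omega t} \tag{3.50}$$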
which is the Rihaczek distribution, Eq. (2.2). If we use Eq. (3.45) instead, we get the complex conjugate of Eq. (3.50), and if we use Eq. (3.47), we get the real part.

Another way to get the correspondence is by associating arbitrary mixed products of time and frequency [58]. That is, we associate

$$t^n \omega^m \rightarrow C(\mathcal{T},\mathcal{W}) \tag{3.51}$$

where $C(\mathcal{T},\mathcal{W})$ indicates a correspondence, and then we calculate the characteristic function from Eq. (3.29). Once the characteristic function is determined, the distribution is obtained as above. Many correspondences of the form given by Eq. (3.51) have been proposed. An early one was that of Born and Jordan [33]. When the above-mentioned procedure is carried out, one obtains a distribution with the kernel [58]

$$\phi(\theta,\tau) = \frac{\sin(\theta\tau/2)}{\theta\tau/2} \tag{3.52}$$

which has some interesting properties [58], [51], [76], [26], [123].^4

Hence one way to approach the problem of obtaining a joint distribution is to write the totality of possible correspondences for the characteristic function and repeat the above calculation. The reason for the wide choice is that the time and frequency operators do not commute, and hence a number of different rules are possible. A general procedure for associating functions with operators has been developed and has been used in a number of different fields. To an ordinary function $g(t,\omega)$ one associates the operator $G(\mathcal{T},\mathcal{W})$ in the following manner [58], [132], [181]:

$$G(\mathcal{T},\mathcal{W}) = \iint \gamma(\theta,\tau)\, \phi(\theta,\tau)\, e^{j\theta\mathcal{T} + j\tau\mathcal{W}}\, d\theta\, d\tau \tag{3.53}$$

where

$$\gamma(\theta,\tau) = \frac{1}{4\pi^2} \iint g(t,\omega)\, e^{-j\theta t - j\tau\omega}\, dt\, d\omega \tag{3.54}$$

or, equivalently,

$$G(\mathcal{T},\mathcal{W}) = \frac{1}{4\pi^2} \iiiint g(t,\omega)\, \phi(\theta,\tau)\, e^{j\theta(\mathcal{T}-t) + j\tau(\mathcal{W}-\omega)}\, d\theta\, d\tau\, dt\, d\omega \tag{3.55}$$

where $\phi(\theta,\tau)$ is an arbitrary function that satisfies $\phi(0,\tau) = \phi(\theta,0) = 1$. The reason for imposing this condition is that it assures that functions of $t$ or $\omega$ only transform according to

$$g_1(t) \rightarrow g_1(\mathcal{T}), \qquad g_2(\omega) \rightarrow g_2(\mathcal{W}). \tag{3.56}$$

Now if this procedure is applied to the characteristic function, we obtain the general correspondence

$$e^{j\theta t + j\tau\omega} \rightarrow \phi(\theta,\tau)\, e^{j\theta\mathcal{T} + j\tau\mathcal{W}}. \tag{3.57}$$

Substituting this into Eq. (3.33), we have, as before,

$$M(\theta,\tau) = \phi(\theta,\tau) \int s^*(u - \tfrac{1}{2}\tau)\, e^{j\theta u}\, s(u + \tfrac{1}{2}\tau)\, du \tag{3.58}$$

and inverting using Eq. (3.30), we obtain the general class of distributions, Eq. (2.4).

The formalisms possible with this approach and the relation to classical theory have been analyzed by Groblicki [84], Srinivas and Wolf [181], and Ruggeri [171].

^4 Although this kernel and the corresponding distribution are sometimes attributed to Born and Jordan, they never considered joint distributions or kernels. It was derived in [58] using the correspondence of Born and Jordan.

D. Local Autocorrelation Methods

A general approach to deriving time-dependent spectra is by generalizing the relationship between the power spectrum and the autocorrelation function. The concept of a local autocorrelation function was developed by Fano [72] and Schroeder and Atal [175], and the relationship of their work to time-varying spectra was considered by Ackroyd [2], [3]. A local autocorrelation method was used by Lampard [122] for deriving the Page distribution, and subsequently other investigators have pointed out the relation to other distributions. The basic idea is to write the energy density spectrum as

$$|S(\omega)|^2 = \frac{1}{2\pi} \iint s^*(t)\, s(t')\, e^{-j\omega(t'-t)}\, dt\, dt'. \tag{3.59}$$

By making the transformation $\tau = t' - t$, $d\tau = dt'$, we have

$$|S(\omega)|^2 = \frac{1}{2\pi} \int R(\tau)\, e^{-j\omega\tau}\, d\tau \tag{3.60}$$

where the autocorrelation function is defined as

$$R(\tau) = \int s^*(t)\, s(t+\tau)\, dt = \int s^*(t-\tau)\, s(t)\, dt. \tag{3.61}$$

One generalizes the relationship between the energy spectrum and $R(\tau)$ as given by Eq. (3.60) by assuming that we can write a time-dependent power spectrum, that is, a joint time-frequency distribution, as

$$P(t,\omega) = \frac{1}{2\pi} \int R_t(\tau)\, e^{-j\omega\tau}\, d\tau \tag{3.62}$$

where now $R_t(\tau)$ is a time-dependent or local autocorrelation function. Many expressions for $R_t(\tau)$ have been proposed, and we illustrate some of the possibilities before generalizing. One can simply take

$$R_t(\tau) = s(t)\, s^*(t+\tau) \tag{3.63}$$

which, when substituted into Eq. (3.62), yields the Rihaczek distribution. Mark [138] argued for symmetry,

$$R_t(\tau) = s^*(t - \tfrac{1}{2}\tau)\, s(t + \tfrac{1}{2}\tau) \tag{3.64}$$

which gives the Wigner distribution. Mark pointed out that one could consider a more general form,

$$R_t(\tau) = s^*(t - k\tau)\, s[t + (1-k)\tau]. \tag{3.65}$$

He preferred the value of $k = \tfrac{1}{2}$ because for the autocorrelation function we have $R(\tau) = R^*(-\tau)$, and if we want the
948 PROCEEDINGS OF THE IEEE, VOL. 77, NO. 7, JULY 1989
same to hold for the local autocorrelation function

$$R_t(\tau) = R_t^*(-\tau) \tag{3.66}$$

then the value of $k = \tfrac{1}{2}$ must be chosen. However, there are an infinite number of other forms that can be obtained. A generalized time-dependent autocorrelation function can be defined from the general distribution, Eq. (2.4), as was done by Choi and Williams [51]. Comparing Eq. (2.4) with Eq. (3.62), the generalized time-dependent autocorrelation function is

$$R_t(\tau) = \frac{1}{2\pi} \iint e^{j\theta(u-t)}\, \phi(\theta,\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, d\theta\, du. \tag{3.67}$$

The above special cases can be obtained by particular choices of the kernel function.

It is convenient to write this as

$$R_t(\tau) = \int r(t-u, \tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, du \tag{3.68}$$

where

$$r(u,\tau) = \frac{1}{2\pi} \int e^{-j\theta u}\, \phi(\theta,\tau)\, d\theta. \tag{3.69}$$

We now ask, for what types of kernels does the local autocorrelation function satisfy Eq. (3.66)? Taking the complex conjugate of Eq. (3.67) and substituting $-\tau$ for $\tau$, we have

$$R_t^*(-\tau) = \frac{1}{2\pi} \iint e^{j\theta(u-t)}\, \phi^*(-\theta,-\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, d\theta\, du. \tag{3.70}$$

Therefore if we want Eq. (3.66) to hold, we must take

$$\phi(\theta,\tau) = \phi^*(-\theta,-\tau). \tag{3.71}$$

What is interesting about this relation is that it is the requirement for the distribution to be real, as discussed in Section IV.

Choi and Williams [51] devised a very interesting way to understand the effects of the kernel by examining the local autocorrelation function. They point out that since the main interest in these distributions is to study phenomena that are occurring locally, we want to give a relatively large weight to $s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)$ when $u$ is close to $t$; otherwise we would not be emphasizing the events near time $t$. Choi and Williams used this concept very effectively in devising a new distribution. The importance of their work is that it gives a fresh perspective and a concrete prescription for obtaining distributions with desirable properties. They have unified these concepts by way of the time-dependent autocorrelation function and generalized ambiguity function [63], [58], [76]. We discuss their distribution in Section III-G.

E. Pseudo-Characteristic Function Method and General Bilinear Class

We have seen in Section III-C how Ville's method is generalized to obtain an infinite number of distributions. We now give an alternative derivation which avoids operator concepts and depends on the relationship between the characteristic function of two variables and the characteristic functions of the marginals. Suppose we have a joint distribution $P(t,\omega)$ whose marginals are given by

$$P_1(t) = \int P(t,\omega)\, d\omega, \qquad P_2(\omega) = \int P(t,\omega)\, dt. \tag{3.72}$$

The characteristic functions of the marginals are

$$M_1(\theta) = \int e^{j\theta t}\, P_1(t)\, dt, \qquad M_2(\tau) = \int e^{j\tau\omega}\, P_2(\omega)\, d\omega \tag{3.73}$$

and comparing with Eq. (3.27), we have

$$M(\theta, 0) = M_1(\theta), \qquad M(0, \tau) = M_2(\tau). \tag{3.74}$$

Now suppose a characteristic function satisfies the marginals, that is, Eqs. (3.74). Then

$$M_{\text{new}}(\theta,\tau) = \phi(\theta,\tau)\, M(\theta,\tau) \tag{3.75}$$

will also satisfy them if we take

$$\phi(0,\tau) = \phi(\theta,0) = 1. \tag{3.76}$$

Therefore any characteristic function, if multiplied by a function satisfying Eq. (3.76), will produce a new characteristic function which will also satisfy the marginals. If we take Eq. (3.40) as the "original" characteristic function, then a whole class is obtained from

$$M(\theta,\tau) = \phi(\theta,\tau) \int s^*(u - \tfrac{1}{2}\tau)\, e^{j\theta u}\, s(u + \tfrac{1}{2}\tau)\, du \tag{3.77}$$

which when substituted into Eq. (3.30) yields, as before, the general class of distributions, Eq. (2.4).

We emphasize that even though we have been using the terminology "characteristic function," they are not proper characteristic functions since Eq. (3.75) is not a sufficient condition. They should properly be called quasi- or pseudo-characteristic functions. We also point out that the choice of Eq. (3.40) for $M(\theta,\tau)$ in Eq. (3.75) is arbitrary.

F. Positive Distributions

The question of the existence of manifestly positive distributions which satisfy the marginals has been a central issue in the field. Many "proofs" have been given for their nonexistence, and for a long time it was generally believed that they did not exist. The uncertainty principle was often invoked to make it reasonable that positive distributions cannot exist. Mugur-Schächter [144] has shown where the hidden assumptions in these proofs have crept in. Also, Park and Margenau [154] have made a far-reaching study of the relation of joint measurement, joint distributions, and the existence of positive distributions. Positive distributions do exist, and it is easy to generate an infinite number of them [62]. Choose any positive function $Q(u,v)$ of the two variables $u$, $v$ such that

$$\int_0^1 Q(u,v)\, du = 1, \qquad \int_0^1 Q(u,v)\, dv = 1 \tag{3.78}$$

and construct

$$P(t,\omega) = |s(t)|^2\, |S(\omega)|^2\, Q(u,v). \tag{3.79}$$
For $u$ and $v$ we now substitute

$$u(t) = \int_{-\infty}^{t} |s(t')|^2\, dt', \qquad v(\omega) = \int_{-\infty}^{\omega} |S(\omega')|^2\, d\omega'. \tag{3.80}$$

To show that the marginals are satisfied, we integrate with respect to $\omega$,

$$\int P(t,\omega)\, d\omega = |s(t)|^2 \int |S(\omega)|^2\, Q(u,v)\, d\omega = |s(t)|^2 \int_0^1 Q(u,v)\, dv = |s(t)|^2. \tag{3.81}$$

The last step follows since $dv = |S(\omega)|^2\, d\omega$; similarly for integration with respect to $t$. Functions satisfying Eq. (3.78) can readily be constructed. It has been shown that this procedure generates all possible positive distributions [73], [176]. We note that the positive distributions are not bilinear in the signal and that in general $Q(u,v)$ may be a functional of the signal. The relation of bilinearity to the question of positivity is discussed in the conclusion. The kernel which generates the positive distributions from Eq. (2.4) can be obtained [62].

Whether any of these positive distributions can yield intensities that conform to our expectations has been questioned. Janssen and Claasen [101] have pointed out that no systematic procedure exists for choosing a unique $Q(u,v)$. That of course is also true with the bilinear distributions. Janssen [102a] has argued that the positive distributions cannot satisfactorily represent a chirplike signal, although it has been noted [102b] that they can if the kernel is taken to be signal dependent. The problem of constructing a joint distribution satisfying the marginals arises in every field of science and mathematics and is one of the major problems to be resolved. There are in general an infinite number of joint distributions for given marginals, although their construction is far from straightforward. Because the marginals do not contain any correlation information, other conditions are needed to specify a particular joint distribution. That information is entered by way of $Q$, although a systematic procedure for doing so has not been developed. In the case of signal analysis and quantum mechanics there is the further issue of dealing with a signal (or wave function) and constructing the marginal from it. The two marginals $|s(t)|^2$ and $|S(\omega)|^2$ do not determine the signal. This was pointed out by Reichenbach [166], who attributes to Bargmann a method for constructing different signals which have the same absolute instantaneous energy and energy density spectrum. Vogt [195] and Altes [7] give similar methods of constructing such functions. Since the signal contains information that the marginals do not, in general $Q(u,v)$ must be signal dependent. The question of the amount of information needed to construct a unique joint distribution, and of how much more information the signal contains than the combination of the energy density and spectral energy density, requires considerable further research.

G. Choi-Williams Method

A new approach has recently been presented by Choi and Williams [51] where they address one of the main difficulties with the Wigner distribution. As we have already seen from Fig. 1(a), the Wigner distribution sometimes indicates intensity in regions where one would expect zero values. These spurious values, which are due to the so-called cross terms, are particularly prevalent for multicomponent signals [26], [41], [51], [91], [141]. The cause of these effects, sometimes called "artifacts," is usually attributed to the bilinear nature of the distribution, and it was felt by many that it is something we have to live with. In the case of the Wigner distribution, extensive studies have been made and methods devised to remove them in some way. This usually involves violating some of the desired properties like the marginals. Choi and Williams argue that instead of devising procedures to eliminate them from the Wigner distribution, let us find distributions for which the spurious values are minimal. Choi and Williams succeeded in devising a distribution that behaves remarkably well in satisfying our intuitive notions of where the signal energy should be concentrated, and that reduces to a large extent the spurious cross terms for multicomponent signals. Also, the desirable properties of a distribution are satisfied.

Following Choi and Williams we consider a signal made up of components $s_k(t)$,

$$s(t) = \sum_{k=1}^{N} s_k(t). \tag{3.82}$$

Substituting this into the general equation, Eq. (2.4), we can write the distribution as the sum of self and cross terms,

$$P(t,\omega) = \sum_{k=1}^{N} P_{kk}(t,\omega) + \sum_{\substack{k,l=1 \\ l \neq k}}^{N} P_{kl}(t,\omega) \tag{3.83}$$

where

$$P_{kl}(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s_k^*(u - \tfrac{1}{2}\tau)\, s_l(u + \tfrac{1}{2}\tau)\, du\, d\tau\, d\theta. \tag{3.84}$$

Choi and Williams [51] realized that by a judicious choice of the kernel, one can minimize the cross terms and still retain the desirable properties of the self terms. This aspect is investigated using a generalized ambiguity concept [63] and autocorrelation function, as in Section III. They found a particularly good choice for the kernel,

$$\phi_{CW}(\theta,\tau) = e^{-\theta^2\tau^2/\sigma} \tag{3.85}$$

where $\sigma$ is a constant. Substituting into Eq. (2.4) and integrating over $\theta$, one obtains

$$P_{CW}(t,\omega) = \frac{1}{4\pi^{3/2}} \iint \frac{1}{\sqrt{\tau^2/\sigma}}\, \exp\!\left[-\frac{(u-t)^2}{4\tau^2/\sigma} - j\tau\omega\right] s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, du\, d\tau. \tag{3.86}$$

The ability to suppress the cross terms comes by way of controlling $\sigma$. The $r_{CW}(t-u, \tau)$, as defined by Eq. (3.69), is

$$r_{CW}(t-u, \tau) = \frac{1}{\sqrt{4\pi\tau^2/\sigma}}\, \exp\!\left[-\frac{(u-t)^2}{4\tau^2/\sigma}\right]. \tag{3.87}$$

From the discussion of Section III-D we note that indeed it is peaked when $u = t$ and $\sigma$ can be used to control the relative importance of $\tau$.

The kernel given by Eq. (3.85) satisfies Eq. (3.71), which shows that the local autocorrelation function satisfies Eq. (3.66) and that the distribution is real. In Section IV we use
[Figure 3: three panels (a), (b), (c); horizontal axis FREQUENCY, with $\omega_1$ and $\omega_2$ marked.]

Fig. 3. (a) Wigner and (b), (c) Choi-Williams distributions for the sum of two sine waves, $s(t) = e^{j\omega_1 t} + e^{j\omega_2 t}$. The Wigner distribution is peaked to infinity at the frequencies $\omega_1$, $\omega_2$ and at the spurious value of $\omega = \tfrac{1}{2}(\omega_1 + \omega_2)$. The middle term oscillates and is due to the cross terms. The Choi-Williams distributions are shown for (b) $\sigma = 10^6$ and (c) $\sigma = 10^5$. Note that all three distributions satisfy the marginals. The values for $\omega$ are $\omega_1 = 1$ and $\omega_2 = 9$. The delta functions at $\omega_1$ and $\omega_2$ are symbolically represented and are cut off at the value of 700.
this kernel as an example to demonstrate how the general properties of a distribution can be readily determined by inspection of the kernel.

The importance of the work of Choi and Williams is that they have formulated and implemented effectively a means of choosing distributions that minimize spurious values caused by the cross terms. Moreover they have connected in a very revealing way the properties of a distribution with those of the local autocorrelation function and characteristic function. The kernel given by Eq. (3.85) is a one-parameter family, but their method can be used to find many other kernels having the general desirable properties.

We give three examples to illustrate the considerable clarity in interpretation possible using the distribution of Choi and Williams. We first take the sum of two pure sine waves,

$$s(t) = A_1 e^{j\omega_1 t} + A_2 e^{j\omega_2 t}. \tag{3.88}$$

The Choi-Williams distribution is readily worked out [51],

$$P_{CW}(t,\omega) = A_1^2\, \delta(\omega - \omega_1) + A_2^2\, \delta(\omega - \omega_2) + 2 A_1 A_2 \cos[(\omega_2 - \omega_1)t]\; \eta(\omega, \omega_1, \omega_2, \sigma) \tag{3.89}$$

where

$$\eta(\omega, \omega_1, \omega_2, \sigma) = \sqrt{\frac{1}{4\pi(\omega_1 - \omega_2)^2/\sigma}}\; \exp\!\left[-\frac{\left[\omega - \tfrac{1}{2}(\omega_1 + \omega_2)\right]^2}{4(\omega_1 - \omega_2)^2/\sigma}\right]. \tag{3.90}$$

We first note that

$$\lim_{\sigma \to \infty} \eta(\omega, \omega_1, \omega_2, \sigma) = \delta[\omega - \tfrac{1}{2}(\omega_1 + \omega_2)] \tag{3.91}$$

and for that case the distribution becomes infinitely peaked at $\omega = \tfrac{1}{2}(\omega_1 + \omega_2)$. In fact, as $\sigma \to \infty$, it becomes the Wigner distribution, since for that limit the kernel becomes 1. As long as $\sigma$ is finite, the cross terms will be finite at that point and will increase as $\sqrt{\sigma}$. Note that if $\sigma$ is small, the cross terms are small and do not obscure the interpretation with spurious values. In Fig. 3 we illustrate the effect of $\sigma$. We have represented the delta function symbolically, but the calculation for the cross terms is exact. We see that the cross terms may easily be eliminated for all practical purposes by choosing an appropriate value of $\sigma$.

Another revealing example is the sum of two chirplike signals [51],

$$s(t) = A_1 \left(\frac{\alpha_1}{\pi}\right)^{1/4} e^{-\alpha_1 t^2/2 + j\beta_1 t^2/2 + j\omega_1 t} + A_2 \left(\frac{\alpha_2}{\pi}\right)^{1/4} e^{-\alpha_2 t^2/2 + j\beta_2 t^2/2 + j\omega_2 t} \tag{3.92}$$

which have instantaneous frequencies along $\omega = \beta_1 t + \omega_1$ and $\omega = \beta_2 t + \omega_2$. We expect the concentration of energies to be along the instantaneous frequencies. Fig. 4 is a plot of the Wigner and the Choi-Williams distributions. Note the middle hump in the Wigner distribution. On the other hand, the distribution of Choi and Williams is clear and easily interpretable. We emphasize that the distribution of Choi and Williams satisfies the marginals for any value of $\sigma$.
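The dependence of the two-sinusoid cross term on $\sigma$ can be checked numerically. The following is a minimal sketch, not taken from [51]: it assumes the cross-term factor is the unit-area Gaussian $\eta = [4\pi(\omega_1-\omega_2)^2/\sigma]^{-1/2} \exp\{-[\omega - \tfrac{1}{2}(\omega_1+\omega_2)]^2 / [4(\omega_1-\omega_2)^2/\sigma]\}$ that follows from the kernel $e^{-\theta^2\tau^2/\sigma}$, and the function name `eta`, the frequency grid, and the two values of $\sigma$ are illustrative choices.

```python
import numpy as np

def eta(omega, w1, w2, sigma):
    """Cross-term factor for two sinusoids under the kernel exp(-theta^2 tau^2 / sigma).

    Assumed form: a unit-area Gaussian in omega centred midway between w1 and w2.
    """
    width = 4.0 * (w1 - w2) ** 2 / sigma      # denominator of the Gaussian exponent
    centre = 0.5 * (w1 + w2)                  # location of the spurious midpoint term
    return np.exp(-(omega - centre) ** 2 / width) / np.sqrt(np.pi * width)

w1, w2 = 1.0, 9.0                              # frequencies used in Fig. 3
omega = np.linspace(-150.0, 160.0, 31001)      # illustrative grid, step 0.01
d_omega = omega[1] - omega[0]

areas, peaks = {}, {}
for sigma in (1.0, 100.0):                     # illustrative values of sigma
    areas[sigma] = np.sum(eta(omega, w1, w2, sigma)) * d_omega
    peaks[sigma] = eta(0.5 * (w1 + w2), w1, w2, sigma)
    print(f"sigma={sigma:6.1f}  area={areas[sigma]:.6f}  peak={peaks[sigma]:.6f}")
```

Under these assumptions the area stays at one for every $\sigma$, so the cross term integrates to the same amount, while its height at the midpoint grows as $\sqrt{\sigma}$: a hundredfold increase in $\sigma$ raises the peak tenfold.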
[Figure 4: two panels (a), (b); horizontal axis FREQUENCY.]

Fig. 4. (a) Wigner and (b) Choi-Williams distributions for the sum of two chirps. The Wigner distribution has artifacts and spurious values in between the two concentrations along the instantaneous frequencies of the chirps. In the Choi-Williams distribution the spurious terms are made negligible by an appropriate choice of $\sigma$.
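The expectation that a chirp's energy concentrates along its instantaneous frequency can be illustrated with a single component. The sketch below is an illustration under stated assumptions rather than the computation behind Fig. 4: it evaluates one time slice of the Wigner distribution, $(1/2\pi)\int s^*(t - \tfrac{1}{2}\tau)\, s(t + \tfrac{1}{2}\tau)\, e^{-j\tau\omega}\, d\tau$, by a direct Riemann sum with $\tau = 2m\,\Delta t$, for a Gaussian chirp of the form used in the two-chirp example. The helper name `wigner_slice`, the parameter values, and both grids are hypothetical choices.

```python
import numpy as np

def wigner_slice(s, n, dt, omega):
    """Riemann-sum estimate of the Wigner distribution at time index n.

    Uses lags tau = 2*m*dt so that s(t - tau/2) and s(t + tau/2) fall on samples.
    """
    max_lag = min(n, len(s) - 1 - n)               # keep n - m and n + m in range
    m = np.arange(-max_lag, max_lag + 1)
    r = np.conj(s[n - m]) * s[n + m]               # local autocorrelation at time n
    phase = np.exp(-1j * 2.0 * dt * np.outer(omega, m))
    return (dt / np.pi) * (phase @ r).real         # (1/2pi) * d_tau, with d_tau = 2*dt

# Illustrative single chirp: (alpha/pi)^(1/4) exp(-alpha t^2/2 + j beta t^2/2 + j w0 t)
alpha, beta, w0 = 1.0, 1.0, 2.0
t = np.linspace(-8.0, 8.0, 321)
dt = t[1] - t[0]
s = (alpha / np.pi) ** 0.25 * np.exp(-alpha * t**2 / 2 + 1j * (beta * t**2 / 2 + w0 * t))

n = int(np.argmin(np.abs(t - 0.5)))                # time slice near t = 0.5
omega = np.linspace(-10.0, 15.0, 2501)
d_omega = omega[1] - omega[0]

W = wigner_slice(s, n, dt, omega)
mean_freq = np.sum(omega * W) / np.sum(W)          # first conditional moment in omega
marginal = np.sum(W) * d_omega                     # should approximate |s(t_n)|^2

print("mean frequency:", mean_freq, " expected:", beta * t[n] + w0)
print("time marginal :", marginal, " expected:", abs(s[n]) ** 2)
```

For this signal the slice is a single Gaussian hump in $\omega$ centred on $\beta t + \omega_0$, so the first moment lands on the instantaneous frequency and the integral over $\omega$ returns $|s(t)|^2$.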
[Figure 5: two panels (a), (b); horizontal axis FREQUENCY.]

Fig. 5. (a) Wigner and (b) Choi-Williams distributions for signal $s(t) = A_1 e^{j\beta_1 t^2/2 + j\omega_1 t} + A_2 e^{j\omega_2 t + j\beta_2 \sin \omega_m t}$. Both distributions show concentration at the instantaneous frequencies $\omega = \omega_1 + \beta_1 t$ and $\omega = \omega_2 + \beta_2 \omega_m \cos \omega_m t$. For the Wigner distribution there are spurious terms in between the two frequencies. They are very small in magnitude in the Choi-Williams distribution.

In Fig. 5 we show the Wigner and Choi-Williams distributions for a signal that is the sum of a chirp and a sinusoidal modulated signal,

$$s(t) = A_1 e^{j\beta_1 t^2/2 + j\omega_1 t} + A_2 e^{j\omega_2 t + j\beta_2 \sin \omega_m t}. \tag{3.93}$$

For both distributions we see a concentration along the instantaneous frequencies; however, for the Wigner distribution there are "interference" terms which are very minor in the Choi-Williams distribution.

IV. UNIFIED APPROACH

As can be seen from the preceding section, there are many distributions with varied functional forms and properties. A unified approach can be formulated in a simple manner with the advantage that all distributions can be studied together in a consistent way. Moreover, a method which readily generates them allows the extraction of those with the desired properties. As we will see, the properties of a distribution are reflected as simple constraints on the kernel. By examination of the kernel one can determine the general properties of the distribution. Also, having a general method to represent all distributions can be used advantageously to develop practical methods for analysis and filtering, as was done by Eichmann and Dong [70]. Excellent reviews relating the properties of the kernel to the properties of the distribution have been given by Janse and Kaizer [97], Janssen [98], Claasen and Mecklenbräuker [56], and Boashash [26].

For convenience we repeat the general form here,

$$P(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, du\, d\tau\, d\theta. \tag{4.1}$$

The kernel can be a function of time and frequency and in principle can be a function of the signal. However, unless otherwise stated, we shall here assume that it is not a function of time or frequency and is independent of the signal. Independence of the kernel of time and frequency assures that the distribution is time- and frequency-shift invariant, as indicated below. If the kernel is independent of the signal, then the distributions are said to be bilinear because the signal enters only twice. An important subclass are those distributions for which the kernel is a function of $\theta\tau$, the product kernels,

$$\phi(\theta,\tau) = \phi_{PR}(\theta\tau). \tag{4.2}$$

For notational clarity, we will drop the subscript PR since one can tell whether we are talking about the general case or the product case by the number of variables attributed to $\phi$. In Table 1 we list some distributions and their corresponding kernels.

The general class of distributions can be expressed in terms of the spectrum by substituting Eq. (1.3) into Eq. (2.4) to obtain

$$P(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\tau u}\, \phi(\theta,\tau)\, S^*(u + \tfrac{1}{2}\theta)\, S(u - \tfrac{1}{2}\theta)\, du\, d\tau\, d\theta. \tag{4.3}$$
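Table 1 Some Distributions and Their Kernels

Reference                          Kernel $\phi(\theta,\tau)$              Distribution $P(t,\omega)$
Wigner [199], Ville [194]          $1$                                     $\frac{1}{2\pi}\int e^{-j\tau\omega}\, s^*(t - \tfrac{1}{2}\tau)\, s(t + \tfrac{1}{2}\tau)\, d\tau$
Margenau and Hill [133]            $\cos \tfrac{1}{2}\theta\tau$
Kirkwood [107], Rihaczek [167]     $e^{j\theta\tau/2}$
sinc [58]                          $\sin(a\theta\tau)/(a\theta\tau)$
Page [152]
Choi and Williams [51]             $e^{-\theta^2\tau^2/\sigma}$
Spectrogram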
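Several properties of a distribution can be read off from its kernel alone. As a hedged illustration, the sketch below evaluates four of the tabulated kernels on a grid and tests two conditions stated in Section III: $\phi(\theta, 0) = \phi(0, \tau) = 1$, Eq. (3.76), and $\phi(\theta,\tau) = \phi^*(-\theta,-\tau)$, Eq. (3.71), the latter being the requirement for a real distribution. The dictionary `KERNELS`, the helper names, the grid, and the value `SIGMA = 1.0` are assumptions made for this example only.

```python
import numpy as np

SIGMA = 1.0  # illustrative Choi-Williams parameter, not a value from the text

# Kernels phi(theta, tau) written as functions of two arrays of equal shape.
KERNELS = {
    "Wigner": lambda th, ta: np.ones_like(th * ta, dtype=complex),
    "Margenau-Hill": lambda th, ta: np.cos(0.5 * th * ta) + 0j,
    "Rihaczek": lambda th, ta: np.exp(0.5j * th * ta),
    "Choi-Williams": lambda th, ta: np.exp(-(th * ta) ** 2 / SIGMA) + 0j,
}

def satisfies_marginal_conditions(phi, grid):
    """Check phi(theta, 0) = 1 and phi(0, tau) = 1 on the sample grid."""
    zeros = np.zeros_like(grid)
    return bool(np.allclose(phi(grid, zeros), 1.0) and np.allclose(phi(zeros, grid), 1.0))

def gives_real_distribution(phi, grid):
    """Check phi(theta, tau) = conj(phi(-theta, -tau)) on the sample grid."""
    th, ta = np.meshgrid(grid, grid)
    return bool(np.allclose(phi(th, ta), np.conj(phi(-th, -ta))))

grid = np.linspace(-3.0, 3.0, 61)
report = {
    name: (satisfies_marginal_conditions(phi, grid), gives_real_distribution(phi, grid))
    for name, phi in KERNELS.items()
}
for name, (marginals_ok, real_ok) in report.items():
    print(f"{name:14s} marginal conditions: {marginals_ok}   real: {real_ok}")
```

All four kernels pass the marginal conditions; only the Rihaczek kernel fails the reality condition, which agrees with that distribution being complex while its real part has the kernel $\cos \tfrac{1}{2}\theta\tau$.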
The best practical method to determine the kernel for a given distribution is to put it into the form of Eq. (4.1). Otherwise one can calculate the kernel from

$$\phi(\theta,\tau) = \frac{\displaystyle\iint e^{j\theta t + j\tau\omega}\, P(t,\omega)\, dt\, d\omega}{\displaystyle\int e^{j\theta u}\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, du} \tag{4.4}$$

which is obtained from Eq. (4.1) by Fourier inversion. It is also convenient to define the cross distribution function for two signals, as was done by Eq. (3.84). The main reason for defining it is that the distribution of the sum of two signals

$$s(t) = s_1(t) + s_2(t) \tag{4.5}$$

can be conveniently written as

$$P(t,\omega) = P_{11}(t,\omega) + P_{22}(t,\omega) + P_{12}(t,\omega) + P_{21}(t,\omega). \tag{4.6}$$

If $s_1(t)$ and $s_2(t)$ are each normalized to 1, then an overall normalization may be inserted so that $s(t)$ and the distribution are normalized to 1.

A. Physical Properties Related to Kernel

We now show how the properties of the distribution are related to the properties of the kernel. We shall give only a few derivations to indicate the general approach since the procedures are fairly simple.

Instantaneous Energy and Spectrum: If $P(t,\omega)$ is to be a joint distribution for the intensity, we want it to satisfy the individual intensities of time and frequency. That is, when the frequency variable is integrated out, we expect to have the instantaneous power $|s(t)|^2$, and similarly when time is integrated out, we expect to have the energy density spectrum $|S(\omega)|^2$. Integrating Eq. (4.1) with respect to $\omega$ we have

$$\int P(t,\omega)\, d\omega = \frac{1}{2\pi} \iiint \delta(\tau)\, e^{j\theta(u-t)}\, \phi(\theta,\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, d\theta\, d\tau\, du \tag{4.7}$$

$$= \frac{1}{2\pi} \iint e^{j\theta(u-t)}\, \phi(\theta, 0)\, |s(u)|^2\, d\theta\, du. \tag{4.8}$$

The only way this can be made equal to $|s(t)|^2$ is if

$$\frac{1}{2\pi} \int e^{j\theta(u-t)}\, \phi(\theta, 0)\, d\theta = \delta(t - u) \tag{4.9}$$

which forces

$$\phi(\theta, 0) = 1. \tag{4.10}$$

Similarly, if we want

$$\int P(t,\omega)\, dt = |S(\omega)|^2 \tag{4.11}$$

we must take

$$\phi(0, \tau) = 1. \tag{4.12}$$

It also follows that if the total energy is to be preserved, that is,

$$\iint P(t,\omega)\, d\omega\, dt = 1 = \text{total energy} \tag{4.13}$$

we must have

$$\phi(0, 0) = 1 \tag{4.14}$$

which is called the normalization condition. We note that this condition is weaker than the conditions given by Eqs. (4.10) and (4.12), that is, it is possible to have a joint distribution whose total energy is the same as that of the signal, but whose marginals are not satisfied. An example of this is the spectrogram discussed in Section VI.

Reality: The bilinear distributions are not in general positive definite, which causes serious interpretive problems. It has been generally argued that at least they should be real. By taking the complex conjugate of Eq. (4.1) and comparing it to the original, it is straightforward to show that a necessary and sufficient condition for a distribution to be real is that the kernel satisfy

$$\phi(\theta,\tau) = \phi^*(-\theta,-\tau). \tag{4.15}$$

Time and Frequency Shifts: If we translate the signal by an amount $t_0$, we expect the whole distribution to be translated by the same amount. Letting $s(t) \rightarrow s_{sh}(t) = s(t + t_0)$ and substituting in Eq. (4.1), we have

$$P_{sh}(t,\omega) = \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s^*(u + t_0 - \tfrac{1}{2}\tau)\, s(u + t_0 + \tfrac{1}{2}\tau)\, d\theta\, d\tau\, du \tag{4.16}$$

$$= \frac{1}{4\pi^2} \iiint e^{-j\theta t - j\tau\omega + j\theta(u - t_0)}\, \phi(\theta,\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, d\theta\, d\tau\, du \tag{4.17}$$

$$= \frac{1}{4\pi^2} \iiint e^{-j\theta(t + t_0) - j\tau\omega + j\theta u}\, \phi(\theta,\tau)\, s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, d\theta\, d\tau\, du \tag{4.18}$$

$$= P(t + t_0, \omega). \tag{4.19}$$

Hence a shift in the signal produces a corresponding shift in the distribution. We note that the proof requires that the kernel be independent of time and frequency. A similar result holds in the frequency domain, that is, if we shift the spectrum by a fixed frequency, then the distribution is shifted by the same amount. If

$$S(\omega) \rightarrow S(\omega + \omega_0) \quad \text{or} \quad s(t) \rightarrow s(t)\, e^{-j\omega_0 t} \tag{4.20a}$$

then

$$P(t,\omega) \rightarrow P(t, \omega + \omega_0). \tag{4.20b}$$

Global and Local Quantities: If we have a function $g(t,\omega)$ of time and frequency, its global average is

$$\langle g(t,\omega)\rangle = \iint g(t,\omega)\, P(t,\omega)\, d\omega\, dt. \tag{4.21}$$

The local or mean conditional value, the average of $g(t,\omega)$ at a particular time, is

$$\langle g(t,\omega)\rangle_t = \frac{1}{P_1(t)} \int g(t,\omega)\, P(t,\omega)\, d\omega \tag{4.22}$$
where $P_1(t)$ is the density in time,

$$P_1(t) = \int P(t,\omega)\, d\omega \tag{4.23}$$

and is equal to $|s(t)|^2$ if Eq. (4.10) is satisfied. Similar equations apply for the expectation value of a function at a particular frequency.

Mean Conditional Frequency and Instantaneous Frequency: The local or mean conditional frequency is given by

$$\langle \omega \rangle_t = \frac{1}{P_1(t)} \int \omega\, P(t,\omega)\, d\omega. \tag{4.24}$$

We have avoided using the term "instantaneous" frequency for reasons to be discussed shortly. A straightforward calculation leads to

$$\langle \omega \rangle_t\, P_1(t) = \frac{1}{2\pi} \iint \left[ \phi(\theta, 0)\, \varphi'(u) - j \left. \frac{\partial \phi(\theta,\tau)}{\partial \tau} \right|_{\tau=0} \right] A^2(u)\, e^{j\theta(u-t)}\, d\theta\, du \tag{4.25}$$

where the signal has been expressed in terms of its amplitude and phase,

$$s(t) = A(t)\, e^{j\varphi(t)}. \tag{4.26}$$

For the product kernels this becomes

$$\langle \omega \rangle_t\, P_1(t) = \phi(0)\, A^2(t)\, \varphi'(t) + 2\phi'(0)\, A(t)\, A'(t) \tag{4.27}$$

where the primes denote differentiation.

We must now face the question as to what we want this to be equal to. First we note that if we take

$$\phi(\theta, 0) = 1, \qquad \left. \frac{\partial \phi(\theta,\tau)}{\partial \tau} \right|_{\tau=0} = 0 \tag{4.28}$$

in Eq. (4.25) or

$$\phi(0) = 1, \qquad \phi'(0) = 0 \tag{4.29}$$

in Eq. (4.27), we then have $P_1(t)$ equaling $A^2(t)$ and we obtain

$$\langle \omega \rangle_t = \varphi'(t) \tag{4.30}$$

a pleasing result reminiscent of the usual definition of instantaneous frequency. But it is not. Instantaneous frequency is the derivative of the phase if the analytic signal is used (see Section VIII). Equation (4.30) is true for any signal. Moreover, even though instantaneous frequency is meaningfully defined for certain types of signals [79], [153], [168], [191], this result is for all signals. It has been speculated [21], [56] that this indicates a method for a general definition of instantaneous frequency which will hold under all circumstances. However, considerably more work has to be done to fully develop the concept. Conversely, Boashash [26] has argued that since Eq. (4.30) corresponds to the instantaneous frequency only when the analytic signal is used, we should always use the analytic signal in these distributions. These issues are discussed at greater length in Section VIII.

Correlation Coefficient and Covariance: The covariance and the correlation coefficient very often afford much insight into the relationship between two variables. An application of this is given in Section V, where we apply these ideas to the Wigner distribution. For quasi-distributions the covariance was considered first by Cartwright [46]. The covariance is defined as

$$\operatorname{cov}(t\omega) = \langle t\omega \rangle - \langle t \rangle \langle \omega \rangle \tag{4.31}$$

and the correlation coefficient by

$$r = \frac{\operatorname{cov}(t\omega)}{\sigma_t\, \sigma_\omega} \tag{4.32}$$

where $\sigma_t$ and $\sigma_\omega$ are the duration and the bandwidth of a signal as usually defined [Eq. (8.10)]. The simplest way to calculate $\langle t\omega \rangle$ is to use Eq. (3.29) with Eq. (3.77),

(4.33)

This is an interesting relation because the derivative of the phase is acting as the frequency. We should emphasize that the covariance and the correlation coefficient, as used here, do not always have the same behavior as their standard counterparts because the distribution is not necessarily positive definite.

Spread and Second Conditional Moment: Having obtained a reasonable result for the mean frequency at a particular time, it is natural to ask for the spread or broadness of frequency for that time. This was done by Claasen and Mecklenbräuker [54] for the Wigner distribution case in the signal analysis context, and by others in the quantum mechanical context [82]. Unfortunately difficulties arise, as we shall see. First consider the second conditional moment

$$\langle \omega^2 \rangle_t = \frac{1}{P_1(t)} \int \omega^2\, P(t,\omega)\, d\omega. \tag{4.34}$$

The calculation of this quantity is important for many reasons. In quantum mechanics it is particularly relevant because it corresponds to the local kinetic energy. It has been considered by a number of people who have proposed different expressions for it. It was subsequently shown that these different expressions are particular realizations of different distributions. We give the results for the product kernels [123],

$$\langle \omega^2 \rangle_t = \frac{1}{2}\left[1 + 4\phi''(0)\right] \left[\frac{A'(t)}{A(t)}\right]^2 - \frac{1}{2}\left[1 - 4\phi''(0)\right] \frac{A''(t)}{A(t)} + \varphi'^2(t). \tag{4.35}$$

Even though in general the second conditional moment should be manifestly positive, that is not the case with most of the distributions, including the Wigner distribution. This makes the usual interpretation of these quantities impossible. However, as we will see below, there are distributions for which the second conditional moment and the variance are manifestly positive.

Now consider the spread of the mean frequency at a given time,

$$\sigma_{\omega|t}^2 = \frac{1}{P_1(t)} \int \left(\omega - \langle \omega \rangle_t\right)^2 P(t,\omega)\, d\omega = \langle \omega^2 \rangle_t - \langle \omega \rangle_t^2. \tag{4.36}$$

Using Eqs. (4.35) and (4.29) we have

$$\sigma_{\omega|t}^2 = \frac{1}{2}\left[1 + 4\phi''(0)\right] \left[\frac{A'(t)}{A(t)}\right]^2 - \frac{1}{2}\left[1 - 4\phi''(0)\right] \frac{A''(t)}{A(t)}. \tag{4.37}$$
As before, this expression will become negative for most bilinear distributions and therefore cannot be interpreted as a variance. However, consider the choice [123]

$$\phi''(0) = \tfrac{1}{4}. \tag{4.38}$$

Then the spread becomes

$$\sigma_{\omega|t}^2 = \left[\frac{A'(t)}{A(t)}\right]^2 \tag{4.39}$$

which is manifestly positive. Also, for this case,

$$\langle \omega^2 \rangle_t = \left[\frac{A'(t)}{A(t)}\right]^2 + \varphi'^2(t) \tag{4.40}$$

which is also manifestly positive. There are an infinite number of distributions having this property since there are an infinite number of kernels with the same second derivative at zero.

Group Delay: Suppose we focus on the frequency band around the frequency $\omega'$ and assume that the phase of the spectrum is a slowly varying function of frequency so that a good approximation to it, around point $\omega'$, is a linear one [153], [168],

$$\psi(\omega) = \psi(\omega') + (\omega - \omega')\, \psi'(\omega') \tag{4.41}$$

where $\psi(\omega)$ is the phase of the spectrum,

$$S(\omega) = |S(\omega)|\, e^{j\psi(\omega)}. \tag{4.42}$$

If we consider a signal that is made up of the original spectrum concentrated only around the frequencies $\omega'$, then we have the corresponding signal

$$s_{\omega'}(t) = \frac{1}{\sqrt{2\pi}} \int S(\omega - \omega')\, e^{j[\psi(\omega') + (\omega - \omega')\psi'(\omega')]}\, e^{j\omega t}\, d\omega. \tag{4.43}$$

We now write the spectrum in terms of the original signal as given by Eq. (1.3),

$$s_{\omega'}(t) = \frac{1}{2\pi} \iint s(t')\, e^{j[\psi(\omega') + (\omega - \omega')\psi'(\omega')]} \cdots \tag{4.44a}$$

$$\cdots\, s[t' + \psi'(\omega') + t]\, dt'. \tag{4.44b}$$

Therefore

(4.45)

Hence the envelope of the signal at frequency $\omega'$ is delayed by $-\psi'(\omega')$, and the phase is delayed by $-\psi(\omega')/\omega'$. The delay of the envelope is called the group delay, which we now write for an arbitrary frequency $\omega$,

$$t_g = -\psi'(\omega). \tag{4.46}$$

From the point of view of joint time-frequency distributions we may think of the group delay as the mean time at a given frequency. Now by virtue of the symmetry between Eqs. (4.1) and (4.3) everything we have done for the expectation value of frequency at a given time allows us to readily write down the corresponding results for the expectation value of time at a given frequency. In particular,

$$\langle t \rangle_\omega = -\psi'(\omega) \tag{4.47a}$$

if the kernel is chosen such that

$$\phi(0, \tau) = 1, \qquad \left. \frac{\partial \phi(\theta,\tau)}{\partial \theta} \right|_{\theta=0} = 0. \tag{4.47b}$$

For the case of product kernels the conditions are the same as those given by Eq. (4.29).

Transformation of Signal and Distribution: In Table 2 we list the transformation properties of the distribution and the characteristic function for simple transformations of the signal.

Range of Distribution: If a signal is zero in a certain range, we would expect the distribution to be zero, but that is not true for all distributions. It can be seen by inspection that the Rihaczek distribution is always zero when the signal is zero. This is not the case for the Wigner distribution. The general question of when a distribution is zero has not been fully investigated. Claasen and Mecklenbräuker [56] have derived the following condition for determining whether a distribution is zero before a signal starts and after it ends:

(4.48)

Even when this holds, it is not necessarily the fact that the distribution is zero in regions where the signal is zero.

Real and Imaginary Parts of Distributions: If a complex distribution satisfies the marginals, then so do the complex conjugate and the real part. The imaginary part indeed must
Table 2 Transformation Properties of Distributions and Characteristic Functions for Transformations of the Signal

Signal Transformation $s(t)$     Distribution $P(t,\omega)$     Characteristic Function $M(\theta,\tau)$     Kernel $\phi(\theta,\tau)$     Product Kernel $\phi(\theta\tau)$
integrate to zero for each variable, that is,

\[ \operatorname{Im} \int P(t,\omega)\,dt = 0, \qquad \operatorname{Im} \int P(t,\omega)\,d\omega = 0. \tag{4.49} \]

The complex conjugate distribution

\[ P^*(t,\omega) \ \text{has the kernel}\ \phi^*(-\theta,-\tau) \tag{4.50} \]

and the real part distribution

\[ \operatorname{Re} P(t,\omega) \ \text{has the kernel}\ \tfrac{1}{2}\left[\phi(\theta,\tau) + \phi^*(-\theta,-\tau)\right]. \tag{4.51} \]

For example, using Eq. (4.51) we see that the kernel of the real part of the Rihaczek distribution is $\cos \tfrac{1}{2}\theta\tau$.

Example: To illustrate how readily one can determine the general properties of a distribution by a simple inspection of the kernel, we use as an example the distribution of Choi and Williams [51]. The Choi-Williams kernel is

\[ \phi_{CW}(\theta,\tau) = e^{-\theta^2\tau^2/\sigma} \tag{4.52} \]

and we see that it is a product kernel,

\[ \phi_{CW}(x) = e^{-x^2/\sigma} \tag{4.53} \]

where $x = \theta\tau$. From Eqs. (4.10) and (4.12) it is readily seen that the marginals are satisfied, that the mean frequency is the derivative of the phase [verified using Eq. (4.29)], that the shift properties are automatically satisfied, and that all properties and transformations in Table 2 are satisfied since it is a product kernel.

B. Inversion and Representability

To obtain the signal from a distribution we take the inverse Fourier transform of Eq. (4.1) and obtain

\[ s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau) = \frac{1}{2\pi} \iiint \frac{P(t,\omega)}{\phi(\theta,\tau)}\, e^{j\theta t + j\tau\omega - j\theta u}\, dt\, d\omega\, d\theta \tag{4.54} \]

or

\[ s^*(t')\, s(t) = \frac{1}{2\pi} \iiint \frac{P(x,\omega)}{\phi(\theta, t - t')}\, e^{j(t - t')\omega + j\theta[x - (t + t')/2]}\, dx\, d\omega\, d\theta. \tag{4.55} \]

By taking a particular value of $t'$, for instance zero, we have

\[ s(t) = \frac{1}{2\pi s^*(0)} \iiint \frac{P(x,\omega)}{\phi(\theta, t)}\, e^{jt\omega + j\theta(x - t/2)}\, dx\, d\omega\, d\theta. \tag{4.56} \]

Hence the signal can be recovered from the distribution up to a constant phase.

These equations can be written in terms of the generalized characteristic function [58], [63] as defined by Eq. (3.58),

\[ s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau) = \frac{1}{2\pi} \int \frac{M(\theta,\tau)}{\phi(\theta,\tau)}\, e^{-j\theta u}\, d\theta \tag{4.57} \]

or

\[ s^*(t')\, s(t) = \frac{1}{2\pi} \int \frac{M(\theta, t - t')}{\phi(\theta, t - t')}\, e^{-j\theta(t + t')/2}\, d\theta. \tag{4.58} \]

The preceding relations can be used to determine whether a signal exists that will generate a given $P(t,\omega)$. We call such distributions representable or realizable. A necessary and sufficient [61] condition for representability is that the right-hand side of Eq. (4.55) or Eq. (4.58) result in a product form as indicated by the left-hand side.

Nuttall [148] has made an important contribution regarding the reversibility problem. He has been able to characterize the distributions from which the signal can be recovered uniquely. We note that from a given distribution the characteristic function $M(\theta,\tau)$ can always be determined uniquely since it is the Fourier transform of the distribution as defined by Eq. (3.30). However, to obtain the signal one has to divide the characteristic function by the kernel $\phi(\theta,\tau)$, which may be zero for some values of $\theta$ and $\tau$. Nuttall [148] has shown that the signal can be recovered uniquely if the kernel has only isolated zeros. The number of zeros can be infinite, or the kernel may be zero along a line in the $\theta, \tau$ plane. However, if the kernel is zero in a region of nonzero area, then the signal cannot be recovered. The basic reason is that for isolated zeros the ratio $M(\theta,\tau)/\phi(\theta,\tau)$ can be obtained by taking limits at the points where the kernel is zero. However, if the kernel is zero in a region, then the ratio is undefined.

C. Relations Between Distributions

Many special cases relating one particular distribution to another have been given in the literature. A general relationship between any two distributions can be derived readily [61]. Having such a connection allows the derivation of results for a new distribution in a simple way if the answers are already known for another distribution. In addition it clarifies the relation between distributions.

Suppose we have two distributions $P_1$ and $P_2$ with corresponding kernels $\phi_1$ and $\phi_2$. Their characteristic functions are

\[ M_1(\theta,\tau) = \phi_1(\theta,\tau) \int s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, e^{j\theta u}\, du \tag{4.60} \]

\[ M_2(\theta,\tau) = \phi_2(\theta,\tau) \int s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau)\, e^{j\theta u}\, du \tag{4.61} \]

and dividing one by the other we have

\[ M_1(\theta,\tau) = \frac{\phi_1(\theta,\tau)}{\phi_2(\theta,\tau)}\, M_2(\theta,\tau). \tag{4.62} \]

Taking the Fourier transform to obtain the distribution we have

\[ P_1(t,\omega) = \frac{1}{4\pi^2} \iint \frac{\phi_1(\theta,\tau)}{\phi_2(\theta,\tau)}\, M_2(\theta,\tau)\, e^{-j\theta t - j\tau\omega}\, d\theta\, d\tau \]

or

\[ P_1(t,\omega) = \frac{1}{4\pi^2} \iiiint \frac{\phi_1(\theta,\tau)}{\phi_2(\theta,\tau)}\, e^{j\theta(t' - t) + j\tau(\omega' - \omega)}\, P_2(t',\omega')\, d\theta\, d\tau\, dt'\, d\omega'. \tag{4.63} \]
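The "inspection of the kernel" used in the Choi-Williams example of Section IV-A can be mimicked numerically. The sketch below is illustrative only: it assumes the usual marginal conditions $\phi(\theta, 0) = 1$ and $\phi(0, \tau) = 1$ (the content of Eqs. (4.10) and (4.12), which are not reproduced in this section) and takes the Rihaczek kernel to be $e^{j\theta\tau/2}$. All function names are hypothetical.

```python
import numpy as np

def choi_williams(theta, tau, sigma=1.0):
    """Choi-Williams kernel, Eq. (4.52)."""
    return np.exp(-(theta * tau) ** 2 / sigma)

def rihaczek_kernel(theta, tau):
    """Rihaczek kernel, assumed here to be exp(j*theta*tau/2)."""
    return np.exp(0.5j * theta * tau)

def real_part_kernel(kernel, theta, tau):
    """Kernel of Re P according to Eq. (4.51)."""
    return 0.5 * (kernel(theta, tau) + np.conj(kernel(-theta, -tau)))

grid = np.linspace(-5.0, 5.0, 101)
theta, tau = np.meshgrid(grid, grid, indexing="ij")

# Marginal conditions (assumed form): kernel equals 1 on both axes.
time_marginal_ok = np.allclose(choi_williams(grid, 0.0), 1.0)
freq_marginal_ok = np.allclose(choi_williams(0.0, grid), 1.0)

# Eq. (4.51) applied to the Rihaczek kernel should give cos(theta*tau/2).
rihaczek_real = real_part_kernel(rihaczek_kernel, theta, tau)
matches_cosine = np.allclose(rihaczek_real, np.cos(0.5 * theta * tau))
```

Both checks depend on the kernel only through the product $\theta\tau$, which is the sense in which a product kernel makes these properties automatic.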
It is sometimes convenient to write this as

\[ P_1(t,\omega) = \iint g(t' - t, \omega' - \omega)\, P_2(t',\omega')\, dt'\, d\omega' \tag{4.64} \]

with

\[ g(t,\omega) = \frac{1}{4\pi^2} \iint e^{j\theta t + j\tau\omega}\, \frac{\phi_1(\theta,\tau)}{\phi_2(\theta,\tau)}\, d\theta\, d\tau. \tag{4.65} \]

A very useful way to express Eq. (4.64) is in operator form. We note the general theorem [60]

\[ \frac{1}{4\pi^2} \iiiint G(\theta,\tau)\, e^{j\theta(t' - t) + j\tau(\omega' - \omega)}\, H(t',\omega')\, d\theta\, d\tau\, dt'\, d\omega' = G\!\left(j\frac{\partial}{\partial t},\, j\frac{\partial}{\partial \omega}\right) H(t,\omega) \tag{4.66} \]

which, when applied to Eq. (4.63), gives

\[ P_1(t,\omega) = \frac{\phi_1\!\left(j\frac{\partial}{\partial t},\, j\frac{\partial}{\partial \omega}\right)}{\phi_2\!\left(j\frac{\partial}{\partial t},\, j\frac{\partial}{\partial \omega}\right)}\, P_2(t,\omega). \tag{4.67} \]

D. Other Topics

Mean Values of Time-Frequency Functions: We have already defined and used global and local expectation values. There exists a general relationship between averages and correspondence rules which has theoretical interest and is very often the best way to calculate global averages. One can show [58] that the expectation value calculated in the usual way

\[ \langle g(t,\omega) \rangle = \iint g(t,\omega)\, P(t,\omega)\, d\omega\, dt \tag{4.68} \]

can be short-circuited by calculating instead

\[ \langle g(t,\omega) \rangle = \int s^*(t)\, G(\mathcal{T}, \mathcal{W})\, s(t)\, dt \tag{4.69} \]

if the correspondence between $g$ and $G$ is given by Eq. (3.55).

Bilinear Transformations: A general bilinear transformation may be written in the form

\[ P(t,\omega) = \iint K(t,\omega; x, x')\, s^*(x)\, s(x')\, dx\, dx' \tag{4.70} \]

as has been done by Wigner [200], Kruger and Poffyn [114], and others [90], [130], [170]. By requiring the distribution to satisfy desirable properties, Wigner obtained the conditions to be imposed on $K$. If we require that the distribution be time-shift invariant, then it can be shown [114], [200] that $K$ must be a function of $t - x$ and $t - x'$, or equivalently a function of $2t - x - x'$ and $x - x'$. In addition if the distribution is to be frequency-shift invariant, then $K$ must be of the form

\[ K(t,\omega; x, x') = e^{j(x - x')\omega}\, K(t, 0; x, x'). \tag{4.71} \]

Hence kernels which yield time- and frequency-shift-invariant distributions are of the form [114], [200], [130]

\[ K(t,\omega; x, x') = e^{j(x - x')\omega}\, K_0(2t - x - x',\, x - x') \tag{4.72} \]

where the new kernel $K_0$ is only a function of two variables. By comparing Eq. (4.70) with Eq. (2.4) we have

\[ K(t,\omega; x, x') = \frac{1}{4\pi^2}\, e^{j(x - x')\omega} \int e^{j\theta[(x + x')/2 - t]}\, \phi(\theta, x' - x)\, d\theta \tag{4.73} \]

or

\[ K_0(2t - x - x',\, x - x') = \frac{1}{4\pi^2} \int e^{j\theta[(x + x')/2 - t]}\, \phi(\theta, x' - x)\, d\theta. \]

We note that $K_0(t,\tau)$ is essentially $r(t,\tau)$, as defined in Eq. (3.69).

Additional conditions imposed on the distribution are reflected as constraints [90], [114], [170], [200], [130] on $K$ in the same manner that we have imposed constraints on $\phi(\theta,\tau)$. However, as shown by Kruger and Poffyn [114], the constraints on $\phi(\theta,\tau)$ are much simpler to formulate and express as compared to those on $K$, and that is why Eq. (2.4) is easier to work with than Eq. (4.70). For example, the time- and frequency-shift-invariance requirement is imposed by simply requiring that $\phi(\theta,\tau)$ not be a function of time and frequency.

Characteristic Functions and Moments: We have seen in Section III that characteristic functions are a powerful way to derive distributions. The characteristic function is closely related to the ambiguity function (see Section VI). We would like to emphasize that, in addition, characteristic functions are often the most effective way of studying distributions. For example, consider the transformation property of the characteristic function as given by Eq. (4.62) and compare it to the transformation property of the distributions as given by Eq. (4.63).

We also point out the relationship of the generalized characteristic functions and the generalized autocorrelation function. Comparing Eq. (3.67) with Eq. (3.77) we see that

[equation illegible in source]

This relation can be used to derive the transformation properties of the autocorrelation function. If $R_t^{(1)}(\tau)$ and $R_t^{(2)}(\tau)$ are the autocorrelation functions corresponding to two different distributions, then we have

[equation illegible in source]

where we have used Eq. (4.62). Writing the characteristic function in terms of the autocorrelation function we have

[Eq. (4.78): illegible in source]

where

[Eq. (4.79): illegible in source]

Using the characteristic function, we can also obtain the transformation of mixed moments. Using Eqs. (3.29) and (4.61),
Some straightforward manipulation yields

[equation illegible in source]

where

[equation illegible in source]

These are very convenient relations for obtaining the moments of a distribution if one has already found them for another one.

V. WIGNER DISTRIBUTION

The Wigner distribution was the first to be proposed and is certainly the most widely studied and applied. The discovery of its strengths and weaknesses has been a major thrust in the development of the field. It can be obtained from the general class by taking

\[ \phi_W(\theta,\tau) = 1. \tag{5.1} \]

The Wigner distribution is

\[ W(t,\omega) = \frac{1}{2\pi} \int s^*(t - \tfrac{1}{2}\tau)\, e^{-j\tau\omega}\, s(t + \tfrac{1}{2}\tau)\, d\tau \tag{5.2} \]

and in terms of the spectrum, it is

\[ W(t,\omega) = \frac{1}{2\pi} \int S^*(\omega + \tfrac{1}{2}\theta)\, e^{-jt\theta}\, S(\omega - \tfrac{1}{2}\theta)\, d\theta. \tag{5.3} \]

A. General Properties

Because the kernel is equal to 1, the properties of the Wigner distribution are readily determined. Using the general equations of Section IV, we see that the Wigner distribution satisfies the marginals, that it is real, and that time and frequency shifts in the signal produce corresponding time and frequency shifts in the distribution. Since the kernel is a product kernel, all the transformation properties of Table 2 in Section IV are seen to be satisfied.

The inversion properties are easily obtained by specializing Eqs. (4.54)-(4.56) for the case of $\phi = 1$,

\[ s^*(u - \tfrac{1}{2}\tau)\, s(u + \tfrac{1}{2}\tau) = \int W(u,\omega)\, e^{j\tau\omega}\, d\omega \tag{5.4} \]

\[ s^*(t')\, s(t) = \int W\!\left(\tfrac{1}{2}(t + t'), \omega\right) e^{j(t - t')\omega}\, d\omega \tag{5.5} \]

\[ s(t) = \frac{1}{s^*(0)} \int W(\tfrac{1}{2}t, \omega)\, e^{jt\omega}\, d\omega. \tag{5.6} \]

Mean Local Frequency, Group Delay, and Spread: Since the kernel for the Wigner distribution is 1, we have from Eq. (4.28)

\[ \langle \omega \rangle_t = \varphi'(t) \quad \text{if } s(t) = A(t)\, e^{j\varphi(t)}. \tag{5.7} \]

From Eqs. (4.35) and (4.37), the local mean-squared frequency and the local standard deviation are given by

\[ \langle \omega^2 \rangle_t = \frac{1}{2}\left[\left(\frac{A'(t)}{A(t)}\right)^2 - \frac{A''(t)}{A(t)}\right] + \varphi'^2(t) \tag{5.8} \]

\[ \sigma^2_{\omega|t} = \langle \omega^2 \rangle_t - \langle \omega \rangle_t^2 = \frac{1}{2}\left[\left(\frac{A'(t)}{A(t)}\right)^2 - \frac{A''(t)}{A(t)}\right] \tag{5.10} \]

a result obtained by Claasen and Mecklenbrauker [54]. As they have pointed out, it generally goes negative and cannot be interpreted properly.

B. Range of Wigner Distribution

From the functional relation to the signal one can develop some simple rules of thumb to ascertain the behavior of the Wigner distribution [59]. From Eq. (5.2) we see that for a particular time we are adding up pieces made from the product of the signal at a past time multiplied by the signal at a future time, the time into the past being equal to the time into the future. Therefore to see whether the Wigner distribution is zero at a point, one may mentally fold the part of the signal to the left over to the right and see whether there is any overlap. If so, the Wigner distribution will not be zero, otherwise it will. Now consider a finite-duration signal in the interval $t_1$ to $t_2$ as illustrated:

(sketch: a waveform occupying the interval from $t_1$ to $t_2$)

If we are any place left of $t_1$ and fold over the signal to the right, there will be no overlap since there is no signal to the left of $t_1$ to fold over. This will remain true up to the start of the signal at time $t_1$. Hence for finite-duration signals, the Wigner distribution is zero up to the start. This is a desirable feature since we should not have a nonzero value for the distribution if the signal is zero. At any point to the right of $t_1$ but less than $t_2$, the folding will result in an overlap. Similar arguments hold for points to the right of $t_2$. Therefore for a time-limited signal, the Wigner distribution is zero before the signal starts and after the signal ends, that is,

\[ W(t,\omega) = 0 \ \text{for } t \le t_1 \text{ or } t \ge t_2 \ \text{if } s(t) \text{ is nonzero only in the range } (t_1, t_2). \tag{5.11} \]

Due to the similar structures of Eqs. (5.2) and (5.3), the same considerations apply to the frequency domain. If we have a band-limited signal, the Wigner distribution will be zero for all frequencies that are not included in that band,

\[ W(t,\omega) = 0 \ \text{for } \omega \le \omega_1 \text{ or } \omega \ge \omega_2 \ \text{if } S(\omega) \text{ is nonzero only in the range } (\omega_1, \omega_2). \tag{5.12} \]

These properties are sometimes called the support properties of the Wigner distribution.

Now consider a signal of the following form:

(sketch: a waveform that is nonzero from $t_1$ to $t_2$, silent from $t_2$ to $t_3$, and nonzero again after $t_3$, with a point $t_a$ marked inside the silent interval)

where the signal is zero from $t_2$ to $t_3$, and focus on point $t_a$. Mentally folding the right and left parts about $t_a$, it is clear that there will be an overlap, and hence the Wigner distribution is not zero even though the signal is. In general the Wigner distribution is not zero when the signal is zero, and this causes considerable difficulty in interpretation. In speech, for example, there are silences which are important, but the Wigner distribution masks them. These spurious values can be cleaned up by smoothing, but smoothing destroys some other desirable properties of the Wigner distribution.
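The mental folding rule of Section V-B translates directly into a test on sampled data. The sketch below is a minimal illustration under stated assumptions: the signal is a sample array, the lag is restricted to whole samples, and the helper name `wigner_can_be_nonzero` is hypothetical. It only reports whether any folded product $s^*(n-k)\,s(n+k)$ is nonzero, which is the overlap condition described above; it does not compute the distribution itself.

```python
import numpy as np

def wigner_can_be_nonzero(s, n):
    """Folding test at sample n: True if some lag k gives s[n-k] * s[n+k] != 0."""
    s = np.asarray(s)
    N = len(s)
    kmax = min(n, N - 1 - n)          # largest lag that stays inside the record
    k = np.arange(kmax + 1)
    return bool(np.any((s[n - k] != 0) & (s[n + k] != 0)))

# Two bursts separated by a silence, as in the second sketch of Section V-B.
s = np.zeros(100)
s[20:40] = 1.0    # first burst
s[60:80] = 1.0    # second burst

before_start = wigner_can_be_nonzero(s, 10)   # left of the first burst
inside_burst = wigner_can_be_nonzero(s, 30)   # inside the first burst
in_silence = wigner_can_be_nonzero(s, 50)     # inside the silent interval
after_end = wigner_can_be_nonzero(s, 90)      # right of the second burst
```

The test is negative before the signal starts and after it ends, consistent with Eq. (5.11), but positive in the silent interval, where the two bursts fold onto each other.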
Range of Cross Terms: Similar considerations apply to the cross Wigner distribution. In particular, for any two functions $s_1(t)$ and $s_2(t)$ which are zero outside the intervals $(t_1, t_2)$ and $(t_3, t_4)$, respectively, the cross Wigner distribution is zero for the ranges indicated [60], [92a], [92b]

\[ W_{12}(t,\omega) = 0 \quad \text{if } t \le \tfrac{1}{2}(t_1 + t_3), \ t \ge \tfrac{1}{2}(t_2 + t_4). \tag{5.13} \]

In the frequency domain we have that for two band-limited functions $S_1(\omega)$ and $S_2(\omega)$, which are zero outside the intervals $(\omega_1, \omega_2)$ and $(\omega_3, \omega_4)$, respectively,

\[ W_{12}(t,\omega) = 0 \quad \text{if } \omega \le \tfrac{1}{2}(\omega_1 + \omega_3), \ \omega \ge \tfrac{1}{2}(\omega_2 + \omega_4). \tag{5.14} \]

These relationships are useful in calculating the Wigner distribution for finite-duration signals [60], [92a], [92b].

C. Propagation of Characteristics (e.g., Noise)

Almost every worker who has applied the Wigner distribution has noticed that it is "noisy." Indeed we will show that in general if there is noise for a small finite time of the signal, that noise will "appear" at other times, and if the signal is infinite, then it will appear for all time. This effect is a general property of the Wigner distribution that must be fully understood if one is to develop a feeling for its behavior. The important point to realize is that the Wigner distribution at a particular time generally reflects properties that the signal has at other times because the Wigner distribution is highly nonlocal.

Consider a finite-duration signal, illustrated below, where we have indicated by the wavy lines any local characteristic, but which for convenience we shall call noise:

(sketch: a finite-duration signal with a short noisy segment drawn as wavy lines, and two marked time points $t_a$ and $t_b$)

Suppose we calculate the Wigner distribution at a time where this characteristic does not appear, say, at point $t_a$. Folding over the signal, we see that there is no overlap of the left part of the signal with the noise in the signal, and hence noise will not appear in the distribution at time $t_a$. Now consider point $t_b$. The overlap will include the noise in the signal and therefore noise will appear in the distribution, even though there is no noise in the signal at that time. Now considering a signal that goes from minus infinity to plus infinity, noise will appear everywhere since for any point we choose, folding over the signal about that point will always have an intersection with the noise in the signal. In Fig. 6 we give an example for a finite-duration signal. Noise appears for times in between the arrows, even though the noise in the signal was of shorter duration.

Fig. 6. Example illustrating the "propagation" of noise in the Wigner distribution. Noise (between arrows) appears in the Wigner distribution at times for which there is no noise in the signal. If the signal was of infinite duration, noise would appear for all time in the Wigner distribution, although it was of finite duration in the signal.

D. Examples

Example 1: For signals of the form

\[ s(t) = \left(\frac{\alpha}{\pi}\right)^{1/4} e^{-\alpha t^2/2 + j\beta t^2/2 + j\omega_0 t} \tag{5.15} \]

the Wigner distribution is

\[ W(t,\omega) = \frac{1}{\pi}\, e^{-\alpha t^2 - (\omega - \beta t - \omega_0)^2/\alpha}. \tag{5.16} \]

The first thing we should note is that the Wigner distribution is positive, and this is the only signal for which it is positive [93], [157], [180]. If $\alpha$ is small, then the distribution is concentrated along the line $\omega = \omega_0 + \beta t$, which is the derivative of the phase and corresponds to the local mean frequency. In the extreme case where we have a chirp (i.e., $\alpha = 0$) the distribution becomes⁵

\[ W(t,\omega) = \delta(\omega - \beta t - \omega_0) \quad \text{when } \alpha = 0 \tag{5.17} \]

which shows that the energy is totally concentrated along the instantaneous frequency. If we further take $\beta = 0$, then the distribution is peaked only at the carrier frequency,

\[ W(t,\omega) = \delta(\omega - \omega_0), \quad \alpha = 0, \ \beta = 0. \tag{5.18} \]

In Fig. 7 we plot the Wigner distribution to illustrate how it behaves as the Gaussian becomes more chirplike.

⁵ One must be careful in taking the limit because the signal can no longer be normalized. The normalizing factor is omitted when calculating Eq. (5.17).

The correlation coefficient for this case gives a revealing answer. We find that

\[ \langle t\omega \rangle = \iint t\,\omega\, W(t,\omega)\, dt\, d\omega = \frac{\beta}{2\alpha}. \tag{5.19} \]

The covariance is therefore

\[ \operatorname{cov}(t\omega) = \frac{\beta}{2\alpha}. \tag{5.20} \]

When $\beta \to 0$, the covariance goes to zero, which implies that we have no correlation between time and frequency. That is reasonable because we have a pure sine wave for all time. As $\alpha \to 0$, the covariance goes to infinity and we have total correlation, which is also reasonable since a chirp forces total dependence between time and frequency. Similar considerations apply to the correlation coefficient as
defined by Eq. (4.23), a straightforward calculation gives

\[ r = \frac{\beta}{\sqrt{\alpha^2 + \beta^2}}. \tag{5.21} \]

The correlation coefficient goes to 1 as $\alpha \to 0$, which indicates perfect correlation, and it goes to zero as $\alpha \to \infty$, which implies no correlation.

Fig. 7. Wigner distribution for Gaussian signal $s(t) = A e^{-\alpha t^2/2 + j\beta t^2/2 + j\omega_0 t}$. As $\alpha \to 0$, making the signal more chirplike, the Wigner distribution becomes concentrated around the instantaneous frequency $\omega = \omega_0 + \beta t$. $\beta = 0.3$, $\omega_0 = 3$. (a) $\alpha = 1$. (b) $\alpha = 0.5$. (c) $\alpha = 0.1$. Time and frequency variables are plotted from $-5$ to $5$ units.

Example 2: We take

\[ s(t) = A_1 e^{j\omega_1 t} + A_2 e^{j\omega_2 t}. \tag{5.22} \]

The Wigner distribution is

\[ W(t,\omega) = A_1^2\, \delta(\omega - \omega_1) + A_2^2\, \delta(\omega - \omega_2) + 2 A_1 A_2\, \delta\!\left[\omega - \tfrac{1}{2}(\omega_1 + \omega_2)\right] \cos(\omega_2 - \omega_1)t. \tag{5.23} \]

Besides the concentration at $\omega_1$ and $\omega_2$ as expected, we also have nonzero values at the frequency $\tfrac{1}{2}(\omega_1 + \omega_2)$. This is an illustration of the cross-term effect discussed previously. The point $\omega = \tfrac{1}{2}(\omega_1 + \omega_2)$ is the only point for which there is an overlap since the signal is sharp at both $\omega_1$ and $\omega_2$. This distribution is illustrated in Fig. 3(a). For the sum of sine waves, we will always have a spurious value of the distribution at the midway point between any two frequencies. Hence for $N$ sine waves we will have $\tfrac{1}{2}N(N - 1)$ spurious values. We note that the spurious terms oscillate and can be removed to some extent by smoothing.

Example 3: For the finite-duration signal

\[ s(t) = e^{j\omega_0 t}, \quad 0 \le t \le T \tag{5.24} \]

the Wigner distribution is

\[ W(t,\omega) = \begin{cases} \dfrac{\sin[2(\omega - \omega_0)t]}{\pi(\omega - \omega_0)}, & 0 \le t \le \tfrac{1}{2}T \\[2ex] \dfrac{\sin[2(\omega - \omega_0)(T - t)]}{\pi(\omega - \omega_0)}, & \tfrac{1}{2}T < t < T. \end{cases} \tag{5.25} \]

This is plotted in Fig. 2(a). As discussed, the Wigner distribution goes to zero at the end point of a finite-duration signal. Also, note the symmetry about the middle time, which is not the case with the Page distribution. The Wigner distribution treats past and present on an equal footing.

Example 4: For the sum of two Gaussians

\[ s(t) = A_1 \left(\frac{\alpha_1}{\pi}\right)^{1/4} e^{-\alpha_1 t^2/2 + j\beta_1 t^2/2 + j\omega_1 t} + A_2 \left(\frac{\alpha_2}{\pi}\right)^{1/4} e^{-\alpha_2 t^2/2 + j\beta_2 t^2/2 + j\omega_2 t} \tag{5.26} \]

\[ \begin{aligned} W(t,\omega) ={}& \frac{A_1^2}{\pi}\, e^{-\alpha_1 t^2 - (\omega - \beta_1 t - \omega_1)^2/\alpha_1} + \frac{A_2^2}{\pi}\, e^{-\alpha_2 t^2 - (\omega - \beta_2 t - \omega_2)^2/\alpha_2} \\ &+ \frac{2 A_1 A_2 (\alpha_1 \alpha_2)^{1/4}}{\pi} \operatorname{Re}\left\{ \frac{e^{-(\gamma_1 + \gamma_2)t^2/2 + j(\omega_2 - \omega_1)t}}{\sqrt{\tfrac{1}{2}(\gamma_1 + \gamma_2)}} \exp\left( \frac{\left\{\tfrac{1}{2}(\gamma_2 - \gamma_1)t + j\left[\omega - \tfrac{1}{2}(\omega_2 + \omega_1)\right]\right\}^2}{\tfrac{1}{2}(\gamma_2 + \gamma_1)} \right) \right\} \end{aligned} \tag{5.27} \]

where

\[ \gamma_1 = \alpha_1 + j\beta_1, \qquad \gamma_2 = \alpha_2 - j\beta_2. \tag{5.28} \]

Fig. 4(a) illustrates such a case. The middle hump is again due to the cross-term effect.

E. Discrete Wigner Distribution

A significant advance in the use of the Wigner distribution was its formulation for discrete signals. A number of difficulties arise, but much progress has been made recently. A fundamental result was obtained by Claasen and Mecklenbrauker [53], [54], where they applied the sampling theorem to the Wigner distribution. They showed that the Wigner distribution for a band-limited signal is determined from the samples by

\[ W(t,\omega) = \frac{T}{\pi} \sum_{k=-\infty}^{\infty} s^*(t - kT)\, e^{-2j\omega kT}\, s(t + kT) \tag{5.29} \]

where $1/T$ is the sampling frequency and must be chosen so that $T \le \pi/2\omega_{\max}$, where $\omega_{\max}$ is the highest frequency in the signal. The sampling frequency must be chosen so that

\[ \omega_s \ge 4\omega_{\max}. \tag{5.30} \]

For a discrete-time signal $s(n)$, with $T$ equal to 1, this becomes

\[ W(n,\theta) = \frac{1}{\pi} \sum_{k=-\infty}^{\infty} s^*(n - k)\, e^{-2j\theta k}\, s(n + k). \tag{5.31} \]

In the discrete case as given by Eq. (5.29) the distribution is periodic in $\omega$, with a period of $\pi$ rather than $2\pi$, as in the continuous case. Hence the highest sampling frequency that must be used is twice the Nyquist rate. Chan [48] devised an alternative definition, which is periodic in $\pi$. Boashash and Black [25] have also given a discrete version of the Wigner distribution and argued that the use of the analytic signal eliminates the aliasing problem. They devised
a real-time method of calculating the Wigner distribution and the analytic signal. Choi and Williams [51] devised a discrete version of the exponential distribution. An alternative approach has been used by Boudreaux-Bartels and Parks [38], [41], who approximated the Fourier integral using spline approximations.

A new and unified approach to the discrete Wigner distribution has recently been given by Peyrin and Prost [155], which naturally reduces to and preserves the properties of the continuous case. Starting with the relation for the sampled signal $\hat{s}(t)$ in terms of the continuous signal $s(t)$,

\[ \hat{s}(t) = \sum_n s(nT)\, \delta(t - nT) \tag{5.32} \]

they substitute into the continuous Wigner distribution and obtain the discrete version for the time variable

[Eq. (5.33): illegible in source]

Equation (5.33) retains the frequency as a continuous variable. The identical procedure can be used to derive the case when frequency is discrete and time is continuous. They also generalize to the case where both time and frequency are discrete. Their approach is general and can undoubtedly be applied to other distributions.

Amin [9] has given explicit recursion relations for the calculation of the discrete Wigner distribution and discrete smoothed Wigner distribution.

F. Smoothed Distributions

The major motivations for smoothing the Wigner distribution are that for certain types of smoothing, a positive distribution is obtained [34], [47], [66], and that many of the so-called artifacts illustrated before are suppressed. The fundamental idea is to smooth the Wigner distribution by the double convolution

\[ W_S(t,\omega) = \iint L(t - t', \omega - \omega')\, W(t',\omega')\, dt'\, d\omega' \tag{5.34} \]

where $L$ is a smoothing function. It is hoped that a judicious choice of $L$ will result in a new distribution with desirable properties. We stress that if $L$ is taken to be independent of the signal, then the only way to obtain a positive distribution is by sacrificing the marginals. The most common smoothing function used is a Gaussian,

[Eq. (5.35): illegible in source]

and it has been known for a long time in the quantum literature [34], [47], [66], [118]-[120], [142], [181] that for certain values of $\alpha$ and $\beta$, a positive distribution is obtained. The condition is that [47], [66]

\[ \alpha\beta \ge 1. \tag{5.36} \]

More general smoothing functions have been considered in Bertrand et al. [23], where they showed that a sufficient condition for positivity is that the smoothing function be a Wigner distribution of any normalized signal. Janssen [100] and Janssen and Claasen [101] considered smoothing of the general class of distributions, Eq. (4.1), with a Gaussian. In Section VI we discuss the spectrogram which can be viewed as a smoothing procedure.

Soto and Claverie [179] have given a concise summary of the effects of smoothing with a Gaussian for quantum mechanical distributions, and their results can readily be transcribed into signal analysis language. The global expectation values of frequency and time are the same for the Wigner and the smoothed Wigner distributions. That is not the case for the second global moments where we have $\langle \omega^2 \rangle_S = \langle \omega^2 \rangle_W + \tfrac{1}{2}\alpha$ and similarly for $\langle t^2 \rangle$. The higher moments are also changed, which is a reflection that the marginals are not preserved. Soto and Claverie show that smoothing very often gives erroneous answers in the quantum mechanical case.

Garudadri et al. [81] have made important contributions in understanding the effects and limitations of smoothing. They show that smoothing causes loss of phase information. However, they show that partial smoothing can be of advantage.

Amin [8] has obtained conditions for the selection of time and lag window for smoothing the Wigner distribution and showed the relation with the autocorrelation function.

The recent work of Andrieux et al. [10] has made significant strides toward utilizing smoothing effectively. They consider optimal smoothing of the Wigner distribution to be that which preserves as much as possible of the basic characteristics of the Wigner distribution. They argue that the smoothing should involve regions of the time-frequency plane which are as small as possible and yet still lead to a positive distribution. They obtain general conditions for this minimum smoothing in terms of the rate of change of the phase for special signals of the form $s(t) = e^{j\varphi(t)}$.

Nuttall [146], [147] has considered smoothing with a more general Gaussian form,

[Eq. (5.37a): illegible in source]

with

[Eq. (5.37b), the definition of $Q$: illegible in source]

and has shown that the resulting distribution will be positive if $Q \le 1$, that is, if

\[ \alpha\beta \ge \frac{1}{1 + c^2}. \tag{5.38} \]

He points out that this smoothing function need not be a Wigner distribution function of some signal, unless $Q = 1$. He has derived a number of interesting alternative expressions for the smoothed Wigner distribution and showed how smoothing can be optimized in advantageous ways.

Choi and Williams [51], Nuttall [146], [147], and Flandrin [76] have found the ambiguity function plane to be a very effective way of finding kernels.

Although we have discussed smoothing in the section on the Wigner distribution, this is really a distribution-independent process in the sense that smoothing a distribution with one smoothing function is equivalent to smoothing another distribution with a different smoothing function. The end result is the same and is a member of the bilinear class. In particular if $P_1$ is smoothed with the smoothing function $L_1(t', \omega'; t, \omega)$ to obtain $P_{L_1}$,

\[ P_{L_1}(t,\omega) = \iint P_1(t',\omega')\, L_1(t', \omega'; t, \omega)\, dt'\, d\omega' \tag{5.39a} \]
the same smoothed distribution can be obtained from distribution $P_2$ with smoothing function $L_2(t', \omega'; t, \omega)$,

\[ P_{L_1}(t,\omega) = P_{L_2}(t,\omega) = \iint P_2(t',\omega')\, L_2(t', \omega'; t, \omega)\, dt'\, d\omega' \tag{5.39b} \]

if the smoothing functions are related by

\[ L_2(t'', \omega''; t, \omega) = \iint g(t'' - t', \omega'' - \omega')\, L_1(t', \omega'; t, \omega)\, dt'\, d\omega' \tag{5.39c} \]

where $g(t,\omega)$ is defined by Eq. (4.65).

G. Other Properties and Results

Positivity: We have seen that the Wigner distribution is manifestly positive for the Gaussian signal, Eq. (5.15). In fact that is the only signal for which the Wigner distribution is positive, as was shown by Hudson [93] and Piquet [157]. Soto and Claverie [180] have proved it for the multidimensional case.

Modulation and Convolution: Consider the Wigner distribution of the product of two signals,

\[ s(t) = s_1(t)\, s_2(t). \tag{5.40} \]

To write it in terms of the Wigner distribution of the individual signals, substitute the signal into Eq. (5.2) and use the inverse relations of Eq. (5.5) to obtain [54]

\[ W(t,\omega) = \int W_1(t,\omega')\, W_2(t, \omega - \omega')\, d\omega'. \tag{5.41} \]

For the case of convolution where

\[ s(t) = \int s_1(t')\, s_2(t - t')\, dt' \quad \text{or} \quad S(\omega) = S_1(\omega)\, S_2(\omega) \tag{5.42} \]

we immediately get (by symmetry) that

\[ W(t,\omega) = \int W_1(t',\omega)\, W_2(t - t', \omega)\, dt'. \tag{5.43} \]

Moyal Formula: An interesting relation exists between the overlap of two signals and the overlap of their Wigner distributions,

\[ \left| \int s_1(t)\, s_2^*(t)\, dt \right|^2 = 2\pi \iint W_1(t,\omega)\, W_2(t,\omega)\, dt\, d\omega. \tag{5.44} \]

This was first shown by Moyal [143]. Janssen [98] showed that there are an infinite number of other distributions with this property. The requirement is that the kernel satisfy $|\phi(\theta,\tau)|^2 = 1$. In the case where this is not so we have

\[ \left| \int s_1(t)\, s_2^*(t)\, dt \right|^2 = 2\pi \iiiint P_1(t,\omega)\, P_2^*(t',\omega')\, K(t - t', \omega - \omega')\, dt\, d\omega\, dt'\, d\omega' \tag{5.45} \]

where

\[ K(t,\omega) = \frac{1}{4\pi^2} \iint \frac{e^{j\theta t + j\tau\omega}}{|\phi(\theta,\tau)|^2}\, d\theta\, d\tau. \tag{5.46} \]

Some have made Moyal's formula a requirement of distribution, but it is not clear why that should be so. As Janssen [98] pointed out, it has a certain appeal in quantum mechanics but is "perhaps not really necessary for signal analysis." In fact it is not really used in quantum mechanics either. Of course, the inner product is a fundamental quantity in signal analysis and quantum mechanics, and what one needs is a way to relate it to the respective distributions. Equation (5.45) does so and there is no particular reason why the relation has to be of the form given by Eq. (5.44). We note that Moyal's formula has been found to be useful in detection problems [78], [117], [31].

Performance in Noise: The behavior of the estimate of the Wigner distribution for a deterministic signal in zero-mean additive stationary noise has been analyzed by Nuttall [146]. He obtains explicit expressions for the mean Wigner distribution (ensemble average over all possible realizations of the noise) and its variance. Since it is assumed that the noise is additive, the signal and noise process is

\[ x(t) = s(t) + n(t). \tag{5.47} \]

As the noise term is stationary and does not decay to zero at infinity, $x(t)$ is weighted by a known deterministic function $v(t)$, which may be chosen advantageously depending on the circumstances. The weighted process is

\[ y(t) = v(t)\, x(t) = v(t)\left[s(t) + n(t)\right]. \tag{5.48} \]

The Wigner distribution can therefore be written as the convolution, with respect to frequency, of the distribution of $v(t)$ with the distribution of $s(t) + n(t)$, as per Eq. (5.41). The Wigner distribution of $s(t) + n(t)$ consists of four terms, namely, the distribution of the signal, the distribution of the noise, and the two cross terms which are linear in the noise. The linear noise terms ensemble average to zero because we are dealing with zero-mean noise. Therefore the mean Wigner distribution is

\[ \overline{W}_y(t,\omega) = \int W_v(t,\omega')\left[W_s(t, \omega - \omega') + \overline{W}_n(t, \omega - \omega')\right] d\omega' \tag{5.49} \]

where overbars denote an ensemble average over all possible realizations of the noise.

The mean Wigner distribution of the noise can be simplified for stationary noise. Namely, the noise covariance $C(t)$ is a function of the difference of the times,

\[ C(t' - t) = \overline{n(t)\, n^*(t')} \tag{5.50} \]

and hence

\[ \overline{W}_n(t, \omega - \omega') = \frac{1}{2\pi} \int \overline{n^*(t - \tfrac{1}{2}\tau)\, n(t + \tfrac{1}{2}\tau)}\; e^{-j\tau(\omega - \omega')}\, d\tau \tag{5.51} \]

\[ = G(\omega - \omega') \tag{5.52} \]

where $G(\omega)$ is the power density spectrum. The mean Wigner distribution of $y(t)$ is therefore

\[ \overline{W}_y(t,\omega) = \int W_v(t,\omega')\left[W_s(t, \omega - \omega') + G(\omega - \omega')\right] d\omega'. \tag{5.53} \]

With the further assumption that $n(t)$ is a Gaussian with $\overline{n(t)\, n(t')} = 0$, Nuttall [146] obtains an explicit expression for the variance of $W_y$. He shows that for any noise spectrum, the variance will be infinite if the signal is not weighted, that is, if $v(t) = 1$ for all time. Moreover, he shows that for the
case of white noise (power density constant for all fre which byanalogywith the precedingdiscussion can be used
quencies) the variance is infinite for any weighting func to study the behavior of the properties around the fre
tion. To have finite variance, the frequencies outside the quency point w. This is done by choosing a time window
band of the signal must be filtered out. function whose transform i s weighted relatively higher at
Other Derivations and Properties: The Wigner distribu the frequency w.
tion can be derived by different methods. A particularly The more compact or peaked we make the window in the
interesting one using the Radon transform was given by time domain, the more time resolution i s achieved. Simi
Bertrand and Bertrand [22], who also studied the behavior larly, if we choose a window peaked in the frequency
of these distributionsfor broadband signals. Kobayashi and domain, highfrequency resolution i s obtained. Because of
Suzuki [I101 have shown that for monocomponent signals the uncertainty principle, both h(t)and H(w) cannot be made
the Wigner distribution may give rise t o side lobes. arbitrarily narrow; hence there is an inherent tradeoff
General reviews of the Wigner distribution have been between time and frequency resolutions in the spectro
given by Hillery et a/. [89], Mecklenbrauker [141], Boashash gram for a particular window. However, different windows
[26], [29], and BoudreauxBartels [41]. can be used for estimating different properties.
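Two of the Section V results lend themselves to a quick numerical sanity check: the chirp covariance $\beta/2\alpha$ of Eq. (5.20) and Moyal's formula, Eq. (5.44). The sketch below is illustrative only. It assumes the Gaussian chirp of Eq. (5.15) and its closed-form Wigner distribution in the form $W = \pi^{-1} \exp[-\alpha t^2 - (\omega - \beta t - \omega_0)^2/\alpha]$, and it replaces the integrals by Riemann sums on a finite grid; the function names, parameter values, and grid limits are choices made here.

```python
import numpy as np

def gaussian_chirp(t, alpha, beta, w0):
    """Normalized Gaussian chirp, Eq. (5.15)."""
    return (alpha / np.pi) ** 0.25 * np.exp(
        -alpha * t**2 / 2 + 1j * beta * t**2 / 2 + 1j * w0 * t)

def wigner_gaussian_chirp(t, w, alpha, beta, w0):
    """Closed-form Wigner distribution of the chirp, Eq. (5.16) (assumed form)."""
    return np.exp(-alpha * t**2 - (w - beta * t - w0) ** 2 / alpha) / np.pi

t = np.linspace(-12.0, 12.0, 601)
w = np.linspace(-14.0, 16.0, 751)
dt, dw = t[1] - t[0], w[1] - w[0]
T, Wg = np.meshgrid(t, w, indexing="ij")

p1 = dict(alpha=1.0, beta=0.5, w0=1.0)
p2 = dict(alpha=0.7, beta=-0.3, w0=2.0)
W1 = wigner_gaussian_chirp(T, Wg, **p1)
W2 = wigner_gaussian_chirp(T, Wg, **p2)

# Covariance of time and frequency for the first chirp; Eq. (5.20) predicts beta/(2*alpha).
mean_t = np.sum(T * W1) * dt * dw
mean_w = np.sum(Wg * W1) * dt * dw
cov = np.sum(T * Wg * W1) * dt * dw - mean_t * mean_w

# Moyal's formula, Eq. (5.44): squared signal overlap vs. overlap of the distributions.
overlap = np.sum(gaussian_chirp(t, **p1) * np.conj(gaussian_chirp(t, **p2))) * dt
lhs = abs(overlap) ** 2
rhs = 2 * np.pi * np.sum(W1 * W2) * dt * dw
```

With these parameters `cov` should come out close to 0.25, and `lhs` and `rhs` should agree to within the discretization error of the grid.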
The basic properties [68], [126], [ I l l ] , [I121 and effective
VI. SPECTROGRAM AND AMBIGUITY FUNCTION ness of the spectrogram for a particular signal depend on
the functional form of the window, although we expect that
A. ShortTime Fourier Spectrum a n d Spectrogram
the estimated properties are not too sensitive to the details
The spectrogram [5], [61, [681, [75], [ I l l ] , [1121, [1261, [1501, of the window. Indeed one would hope that the results are
[151], [158], [163], [164], [I741 has been the most widely used in some sense window independent. As an illustration con
tool for the analysis of timevarying spectra. The concept sider calculating the first conditional moment of frequency
behind it is simple and powerful. If we want to analyze what by using Eq. (4.24) in order to estimate the instantaneous
i s happening at a particular time, then we just use a small frequency. If we write the signal in terms of its amplitude
portion of the signal centered around that time, calculate and phase as in Eq. (4.26), and similarly for the window
its energy spectrum, and do it for each instant of time. Spe
cifically, for a window function h(t) centered at t, we cal h(t) = Ah(t)e/qh(r) (6.4)
culate the spectrum of s(t')h(t'  t),
then the first conditional moment i s calculated to be
e/""s(t')h(t'  t) dt' (6.1)
which i s the shorttime Fourier transform. The energyden
sity spectrum or spectrogram i s where P,(t) i s the marginal distribution in time,
which can be considered as the energy density at points t
(6.2)
Pl(t) = 'j lS,(o)12dw 'j A*(t')A;(t'
=  t) dt'. (6.6)
and W . The window function controls the relative weight This may be derived directly or by using Eq. (4.251, with the
imposed on different parts of the signal. Bychoosingawin kernel of the spectrogram to be given by Eq. (6.17). If dif
dow that weighs the interval near the observation point a ferent windows are used, different results are obtained for
greater amount than other points, the spectrogram can be ( U ) , . lfthewindow isnarrowed in such awayastoapproach
used to estimate local quantities. Depending on the appli a delta function [123], A:(t) + 6(t), then P(t) , A2(t). Using
cation and field, different forms of display have been used. Eq. (6.5) for real windows, the estimated instantaneous fre
The most common display is a twodimensional projection quency approaches the derivative of the phase
where the intensity i s represented by different shades of
gray. This is possibleforthe spectrogram, because it is man + cp'(t). (6.7)
ifestly positive. The earliest application was used to dis
We note, however, that although the average approaches
cover the fundamental aspects of speech. The mathemat
the derivative of the phase, its standard deviation
ical description of the spectrogram i s closely connected to
approaches infinity [123]. This is due to the fact that as
theworkof Fano[72]and Schroederand Atal[175], although
Ai(t)+ 6(t), the modified signal s(t')h(t'  t) is very narrow
their approach was from the correlation point of view. There
as a function of t' and hence has a large spread in the fre
have been many modifications [ I l l ] , [I121 of the spectro
quency domain.
gram, and an excellent comparison of the different
The energyconcentration of the spectrogram in the time
approaches i s presented by Kodera, Gendrin, and De
frequency plane is illustrated effectively by the following
Villedary [112]. Altes [6] has given a comprehensive analysis
example [112], where we have been able to put the final
of the spectrogram and derived a number of interesting
result in a revealing analytic form. For the signal we take
relations pertinent to the issues we are addressing in this
an amplitudemodulated linear FM as given by Eq. (5.15) (we
review.
take wo = 0 for convenience) and choose the window to be
Another perspective i s gained if we express the shorttime
Fourier transform in terms of the Fourier transforms of the
signal S(o) and window H(w),
e/""S(w')H(w  U') dw' (6.3) The shorttime Fourier transform can be calculated ana
lytically and, after some algebra, the energy density spec
COHEN: TIMEFREQUENCY DISTRIBUTIONS 963
trum can be written in either of two forms, (4.12~
P&, w ) =  (6.9) (6.20)
(6.21)
 P2(4
 (f)"]2/24 (6.10)
To have the correct marginals, both of these quantities
where P,(t) and P2(w) are the marginal distributions of time should be equal to 1. The onlywaywe can makeds(8,0) equal
and frequency, respectively, to 1 i s if we choose a window whose square approaches a
r delta function. The closer it approaches a delta function,
the closer the time marginal of the spectrogram will
approach the instantaneous energy. However, for a narrow
window, the Fourier transform will be very broad and the
spectral energy density will be represented poorly.
We have already given the first conditional moment of
frequency for a given time, Eq. (6.51, and showed its rela
(6.12) tionship t o the instantaneous frequency. A similar result
holds for the time delay. If the Fourier transforms of the
and where
signal and the window are written as
(6.13)
ap  ba the conditional expected value of time for a given fre
(0, (6.14)
a(a2 + b2)+ a ( a 2 + P2) quency is
=
U: =
1
 (01 + a) + 21 _ +_
(0_ b)2
(6.15) ( t ) , =  B 2 ( ~ ' ) B i( ~w')[$'(w')  $;(a  U')] dw'
2 01+a I S
P2(w)
=
1 (01

+ a)2 + (fi + bI2 (6.16)
(6.23)
2 a(a2 + b2)+ a b 2 + P2)' where P2(w) i s the marginal distribution in time,
From Eqs. (6.9) and (6.10) we see that for a given time, the
maximum concentration i s along the estimated instanta
P2(w) = 1 B2(w')BL(w  U') dw'. (6.24)
neous frequency and that for a given frequency, the con If the window is narrowed in the frequency domain, then
centration i s alongtheestimatedtimedelay. If wewant high a similar argument as before shows that the estimated group
time resolution, which will give a good estimate of the delay goes as ( t ) ,  $ ' ( U ) for real windows that are nar
+
instantaneous frequency, we must take a narrow window row in the frequency domain.
which i s accomplished by making a large. From Eq. (6.5) we One can perform a further integration of the conditional
see that then ( U ) , fit, as expected from our previous dis
+
moments t o get the mean time and mean frequency of the
cussion. signal. They are given by
Properties of the Spectrogram and Kernel: As previously
mentioned, the spectrogram i s a member of the class given (t)lS = (t>(s  (f)(h (6.25)
by Eq. (2.4). Expanding Eq. (6.2)and comparingwith Eq. (2.4),
(w>ls = (w>ls +(w>(h = (a'(t)>ls+ (ai(t)>Ih (6.26)
we see that the kernel that produces the spectrogram i s [56]
where the subscript S signifies that we are using the spec
trogram for the calculation and the subscripts s and h indi
cate calculation with the signal or window only, that is, with
A2(t)or with Ai(t). The mean value of the window has a very
which i s related t o the symmetrical ambiguity function (see
determined effect on these global quantities. If the window
next section) of the window. It can also be expressed in
i s chosen so that i t s mean time and frequency are zero (e.g.,
terms of the Fourier transform of the window,
by choosing a window symmetrical in time and frequency),
then theaverageof the spectrogram will be identical to that
of the window, irrespective of its shape. However, the sec
ond global moment and the standard deviation will depend
Equation (6.17) i s a very convenient way t o study and on the window characteristics.
derive the properties of the spectrogram. For example, to Similar considerations apply t o the covariance. The first
see whether energy can be preserved, we examine the ker joint moment i s
nel at 8, 7 = 0, as per Eq. (4.14),
(tw>(s (ta'(t)>ls (tpi(t))(h  (t>(h((O'(t))ls
(6.19) (6.27)
This should be equal t o 1 if we want the total energy to be and using Eqs. (6.25) and (6.26) we see that the covariance
preserved, and that can be achieved if the window i s nor can be written i n the form
malized to 1. To examine whether the marginals are sat
isfied we examine the conditions given by Eqs. (4.10) and c o v (tu)(,= c o v ( t U ) l s  c o v (tw)lh. (6.28)
This shows that the covariance of the energy density spectrum is the difference between the covariance of the signal and that of the window.

Inversion and Representability: The inversion problem for the spectrogram poses no new difficulties and the discussion of Section IV regarding inversion applies. To determine whether a particular window function leads to a spectrogram from which a signal can be recovered, one calculates the kernel by way of Eq. (6.17) and applies the general criteria of Nuttall [148] as discussed in Section IV-B.

Comparison with Bilinear Distributions Satisfying the Marginals: The spectrogram has a simple intuitive interpretation and by choosing appropriate windows, the physical parameters of the signals can be measured or estimated. However, one must manipulate the window depending on the quantities being estimated. For example, if one wants to obtain accurate results for both the instantaneous frequency and the time delay, different windows must be used. Also the optimum window to use will in general depend on time [14], [183]. On the other hand, as we have seen, for some of the bilinear distributions, the instantaneous frequency and time delay are obtained exactly by calculating the conditional averages, and no decisions with respect to the windows have to be made. This is an important advantage afforded by distributions like the Wigner, which has been used with considerable profit to estimate the instantaneous frequency of a signal. However, the spectrogram has the advantage that it is always positive. The bilinear distributions which give the proper marginals are never positive for an arbitrary signal. Also, the results they give for other conditional moments can very often not be interpreted [56]. The relative merits and the usefulness of these distributions are developing subjects and as we gain experience with a variety of distributions, their advantages and drawbacks will be clarified. It is very likely that different distributions should be used for different signals and for obtaining different properties of a signal.

For comparison we list in Table 3 some of the important properties of the spectrogram and compare them to the bilinear distributions which satisfy the marginals and satisfy Eqs. (4.28) and (4.47b).

Relation to Other Distributions: Historically the development of the spectrogram and bilinear distributions of the type discussed in Section I evolved separately and were motivated by different approaches and physical arguments. Only recently has the connection between them been appreciated. In the quantum context Bopp [34] and Kuryshkin et al. [118]-[120] developed the theory of spectrograms as an alternative approach to the Wigner distribution. Perhaps the earliest connection between the spectrogram and other distributions, pointed out by Ackroyd [2], [3] and later used by Altes [6], is the relation with the Rihaczek distribution,

$$P_S(t, \omega) = \iint e_s(t', \omega')\, e_h(t'-t,\, \omega-\omega')\, dt'\, d\omega' \tag{6.29}$$

where $e_s(t, \omega)$ and $e_h(t, \omega)$ are the Rihaczek distributions of the signal and window functions, respectively. The spectrogram can be thought of as the time-frequency distribution of the signal smoothed with the time-frequency distribution of the window. Mark [138] and Claasen and Mecklenbrauker [56] pointed out a similar relation with the Wigner distribution,

$$P_S(t, \omega) = \iint W_s(t', \omega')\, W_h(t'-t,\, \omega-\omega')\, dt'\, d\omega'. \tag{6.30}$$

Many researchers working with the Wigner distribution, who have been unaware of Eq. (6.29), have implied that this shows some unique and important connection between the spectrogram and the Wigner distribution. We now know that these relations are just special cases of the general relation which connects any two different bilinear distributions.

In fact these two special cases can be generalized for other distributions,

$$P_S(t, \omega) = \iint P_s(t', \omega')\, P_h(t'-t,\, \omega-\omega')\, dt'\, d\omega' \tag{6.31}$$

for all kernels such that $\phi(-\theta, \tau)\, \phi(\theta, \tau) = 1$, where $P_s$ and $P_h$ are the distribution functions of the signal and the window, respectively. To show this, suppose $M_s$ and $M_h$ are the characteristic functions of the signal and the window. Using Eq. (3.77) we have that

(6.32)
Table 3 Comparison of Spectrogram with Distributions Satisfying Marginals and Eqs. (4.28) and (4.47b)*

Property                 Spectrogram                              "Distribution"
Total energy             1                                        1
Time marginal $P_1(t)$   $\int A^2(t')\, A_h^2(t'-t)\, dt'$       $A^2(t)$

*The signal and window are written as $A(t)\, e^{j\varphi(t)}$ and $A_h(t)\, e^{j\varphi_h(t)}$, respectively, and both are normalized to 1. Their Fourier transforms are expressed as $B(\omega)\, e^{j\psi(\omega)}$ and $B_H(\omega)\, e^{j\psi_H(\omega)}$, respectively. The symbols $|_s$ and $|_h$ indicate that the calculation is done with the signal or the window only.
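The two spectrogram entries of Table 3 can be checked numerically. The Python sketch below is an illustration only and is not taken from the paper: it discretizes Eqs. (6.1) and (6.2) on a circular grid of N samples (an assumption, chosen so that Parseval's relation holds exactly), and the function name `spectrogram`, the test chirp, and the Gaussian window width `sigma` are all hypothetical choices.

```python
import numpy as np

def spectrogram(s, h):
    """Discrete sketch of Eqs. (6.1)-(6.2): P[n, k] = |S_n(k)|^2.

    Assumptions: the window is shifted circularly (wrap-around) and the
    DFT is scaled by 1/sqrt(N) so that it is unitary.
    """
    N = len(s)
    P = np.empty((N, N))
    for n in range(N):
        frame = s * np.roll(h, n)               # s(t') h(t' - t), window centered at sample n
        S_n = np.fft.fft(frame) / np.sqrt(N)    # short-time Fourier transform at time n
        P[n] = np.abs(S_n) ** 2                 # energy density, Eq. (6.2)
    return P

N = 128
m = np.arange(N)

# Hypothetical test signal: Gaussian-envelope linear FM, normalized to unit energy.
s = np.exp(-0.5 * ((m - N / 2) / 12.0) ** 2) * np.exp(1j * 0.01 * (m - N / 2) ** 2)
s = s / np.sqrt(np.sum(np.abs(s) ** 2))

# Hypothetical real Gaussian window centered (circularly) at index 0, unit energy.
sigma = 6.0
d = np.minimum(m, N - m)
h = np.exp(-0.5 * (d / sigma) ** 2)
h = h / np.sqrt(np.sum(h ** 2))

P = spectrogram(s, h)

time_marginal = P.sum(axis=1)
# Table 3, spectrogram column: sum over t' of A^2(t') A_h^2(t' - t)
expected = np.array([np.sum(np.abs(s) ** 2 * np.roll(h ** 2, n)) for n in range(N)])
total_energy = P.sum()
```

Under these assumptions `time_marginal` agrees with `expected` to rounding error and `total_energy` equals 1, while `time_marginal` is visibly broader than `np.abs(s) ** 2`: the window smears the instantaneous energy. Reducing `sigma` brings the two closer, at the cost of frequency resolution.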
The characteristic function of the spectrogram is

$$M_S(\theta, \tau) = \iint |S_t(\omega)|^2\, e^{j\theta t + j\tau\omega}\, dt\, d\omega. \tag{6.33}$$

If we take the Fourier transform of both sides according to Eq. (3.30) to obtain the distribution, then Eq. (6.31) follows. If $\phi(-\theta, \tau)\, \phi(\theta, \tau)$ does not equal 1, then

$$P_S(t, \omega) = \iiiint G(t'', \omega'')\, P_s(t', \omega')\, P_h(t''+t'-t,\, \omega''+\omega-\omega')\, dt'\, dt''\, d\omega'\, d\omega'' \tag{6.34}$$

where

(6.35)

B. Ambiguity Function

The Woodward [202] ambiguity function has been an important tool in analyzing and constructing signals associated with radar. It relates range and velocity resolution, and the performance characteristics of a waveform can be formulated in terms of it. By constructing signals having a particular ambiguity function, desired performance characteristics are achieved, at least in theory. A comprehensive discussion of the ambiguity function can be found in [168], and shorter reviews of its properties and applications are found in [67] and [177].

The connection between the ambiguity function and time-frequency distribution functions as discussed here has been recognized for a long time [108], [109]. Indeed Woodward [202] noted the connection with the Ville characteristic function. The similarities between the ambiguity function and pseudo-characteristic functions as discussed in Section III are many. Having a connection between the two often helps to clarify relations.

There are a number of minor differences in terminology regarding the ambiguity function. We shall use the definition of [168],

$$\chi(\theta, \tau) = \int s^*(t-\tau)\, s(t)\, e^{j\theta t}\, dt. \tag{6.36}$$

The symmetrical ambiguity function is defined [168] by

$$A(\theta, \tau) = \int s^*(t - \tfrac{1}{2}\tau)\, s(t + \tfrac{1}{2}\tau)\, e^{j\theta t}\, dt. \tag{6.37}$$

We note that very often the complex conjugate of (6.36) or its absolute value, or the absolute value squared, are called the ambiguity function.

Comparing with Eq. (3.19) it is seen that the ambiguity function is the characteristic function of the Rihaczek distribution, and comparing with Eq. (3.40) we see that the symmetrical ambiguity function is the characteristic function of the Wigner distribution. The mathematical and possible physical analogy between the two enhances the interpretation of the properties of the ambiguity function. For example, the condition that $\chi(0, 0) = 1$ is easily understood from a characteristic function point of view since it is a reflection of the fact that the distribution is normalized to the total energy, which has been chosen to be 1. The relation that $\chi(\theta, \tau) = \chi^*(-\theta, -\tau)$ implies that the distribution is real. If we look at the column labeled characteristic function in Table 2, we recognize that those are properties usually associated with the ambiguity function, and that in many cases the interpretation in terms of distributions is more transparent. The analogy can be extended by defining a generalized ambiguity function through Eq. (3.77). Choi and Williams [51] have used them with profit to analyze the effects of the kernel on the behavior of the distribution. There are many more analogies than we have indicated here. A number of excellent articles exploring the relationship between the ambiguity function and time-frequency distributions can be found in [56], [69], [187].

Some have argued that a particular distribution, such as the Wigner, is "better" than the ambiguity function. A number of reasons are usually given, among them that the Wigner distribution is real while the ambiguity function is complex. This is a mistaken view for the following reasons. Characteristic functions are very often much more revealing than the distribution. Furthermore they are very useful in calculation as, for example, to calculate the mixed moments. The properties of a distribution are often easier to determine from the characteristic function than from the manipulation of the distribution. Also, the ambiguity function plane is a very effective means for choosing kernels [51], [146], [147], [76]. Finally we point out that the characteristic function has been a main tool for obtaining these distributions.

VII. TIME-FREQUENCY FILTERING AND SYNTHESIS

If the concepts and methods of filter theory could be generalized to the time-frequency plane, it would offer a powerful tool for the construction of signals with desirable time-frequency properties. However, time-frequency filtering presents unique difficulties which have not been fully overcome. Perhaps the first attempt to obtain input-output relations for a joint quasi-distribution was by Liu [127], who used the Page distribution. He calculated the output relations for a number of causal linear systems and obtained interesting bounds on the output distribution. Subsequently Bastiaans [17], [18] and Claasen and Mecklenbrauker [56] have obtained the transformation properties for the Wigner distribution. Eichmann and Dong [70] formulated a general optical method for time-frequency filtering and produced methods that may be applied to many distributions.

As Saleh and Subotic [173] have pointed out, unlike a standard transfer function, the output for these bilinear distributions is not a simple multiplicative function of the input distribution. As a matter of fact, the output distribution will almost always not be representable, that is, no signal will exist that will produce it. There are two qualitatively different reasons why distributions are not representable. They can be categorized into distribution-independent and distribution-dependent reasons. For the sake of simplicity we restrict ourselves to distributions that satisfy the marginals.

1) Distribution-Independent Conditions: From a potential candidate for a distribution $P(t, \omega)$, one can calculate the two marginals

$$P_1(t) = \int P(t, \omega)\, d\omega, \qquad P_2(\omega) = \int P(t, \omega)\, dt. \tag{7.1}$$
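As a small numerical illustration of Eq. (7.1), the Python sketch below builds a discrete Rihaczek-type distribution and sums it along each axis. It is only a sketch under assumptions that are not part of the text: the discrete form R[n, k] = s[n] S*[k] exp(-2πjnk/N)/N is an assumed analogue of the continuous Rihaczek distribution, NumPy's unnormalized forward FFT is used, and the name `rihaczek` and the random test signal are hypothetical.

```python
import numpy as np

def rihaczek(s):
    """Discrete Rihaczek-type distribution R[n, k] (assumed discretization)."""
    N = len(s)
    S = np.fft.fft(s)                        # NumPy convention: no 1/N on the forward transform
    n = np.arange(N)[:, None]
    k = np.arange(N)[None, :]
    return s[:, None] * np.conj(S)[None, :] * np.exp(-2j * np.pi * n * k / N) / N

rng = np.random.default_rng(0)
N = 64
s = rng.standard_normal(N) + 1j * rng.standard_normal(N)
s = s / np.sqrt(np.sum(np.abs(s) ** 2))      # unit total energy

R = rihaczek(s)
P1 = R.sum(axis=1)                           # marginal in time, first part of Eq. (7.1)
P2 = R.sum(axis=0)                           # marginal in frequency, second part of Eq. (7.1)
S = np.fft.fft(s)
```

Although `R` itself is complex, both marginals come out real and non-negative: `P1` equals `|s[n]|**2` and `P2` equals `|S[k]|**2 / N`, the absolute squares of a discrete Fourier pair.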
Now for the distribution to be representable, $P_1$ and $P_2$ must be the absolute squares of functions that are Fourier transform pairs of each other, that is, there must exist a signal $s(t)$ whose Fourier transform is $S(\omega)$, such that $P_1(t) = |s(t)|^2$ and $P_2(\omega) = |S(\omega)|^2$. An example of nonrepresentability, as pointed out by Saleh and Subotic [173], would be a distribution that produced marginals which are nonzero only in finite regions. Such marginals could not produce Fourier pairs since they are time and band limited.

2) Distribution-Dependent Conditions: The above requirements are clear and within the experience of working with Fourier transforms. The second set of reasons depends on the functional relationship between the distribution and the signal, and reflects the peculiarities of the distribution. Thus the design of time-variant transfer functions cannot be based solely on physical grounds but must take into account the peculiarities of the distribution, and unfortunately each distribution has its own peculiarities. To illustrate these difficulties, we give some examples. Suppose that for an input Wigner distribution function the transfer function cuts a strip parallel to the frequency axis for a finite time interval, indicating silence at all frequencies. The resulting distribution is never representable, although what we have done to it, namely, asked for some silence, is certainly reasonable. Is the nonrepresentability an indication of some physical impossibility? No, it is merely a peculiarity of the Wigner distribution. For a more dramatic example, suppose we have the Wigner distribution for a Gaussian signal. If we multiply the distribution by $\omega^2$, do we get a representable distribution? No. In fact if we multiply it by any positive function other than a Gaussian, we can be certain that the resulting distribution is not proper. The reason is that the Gaussian signal is the only one that gives a positive Wigner distribution, and multiplying the distribution with another positive function which is not Gaussian cannot result in any Wigner distribution. Now if we use the Rihaczek distribution for the silence example, the resulting distribution is a proper Rihaczek distribution. However, if we multiply a Rihaczek distribution by a function of time and frequency which is not a product of functions of time and frequency, the output will never be a Rihaczek distribution. Again, this is a peculiarity of the distribution and not a reflection of some inherent physical impossibility. Hence procedures that appear reasonable, as reflected by reasonable time-variant transfer functions, often do not work for a particular distribution, but may work for another. The failure is not due to any violation of physical law, but just a reflection of the peculiarities of the distribution. How to recognize and deal with these peculiarities is one of the major stumbling blocks. The above difficulties have been investigated for only a few distributions, and it is possible that there may be distributions for which the difficulties do not arise.

These problems notwithstanding, Saleh and Subotic [173] simplified matters considerably. By analogy with the standard transfer function method they multiply the input distribution by a time-variant transfer function to obtain the output. Conceptually this is an ideal method as it is simple and direct. Specifically, they write

$$P_o(t, \omega) = H(t, \omega)\, P_i(t, \omega) \tag{7.2}$$

where $P_o$ and $P_i$ are the output and input distributions, respectively, and $H$ is the time-varying transfer function. In general the output distribution will not be representable, and they present two methods to synthesize the signal from the output distribution. One technique is based on using Eq. (5.6), irrespective of whether or not the signal is representable, and the other finds a signal that reproduces a distribution as closely as possible, in the least-square sense, to the output distribution. The method of Saleh and Subotic is appealing because it conforms as much as possible with our current intuitive notions of what we would want time-frequency filtering to do. As they point out, their approach applies to other time-frequency distributions. It would be of interest to investigate for which distributions their procedure can be implemented in an optimal manner.

Other innovative methods have been devised for the synthesis problem. Boudreaux-Bartels and Parks [39]-[41] have devised a number of efficient methods for the synthesis of the Wigner distribution, and other methods have been given by Yu and Chang [203], [204] and Boashash et al. [26].

Input-Output Relations: We now summarize the input-output relations for a general time-variant linear transformation of the signal,

$$s_o(t) = \int h(t, t')\, s_i(t')\, dt' \tag{7.3}$$

where $h(t, t')$ is the impulse response [172], [205]. In such a case the relation between the input distribution and the output distribution can always be written as

$$P_o(t, \omega) = \iint K(t, \omega; t', \omega')\, P_i(t', \omega')\, dt'\, d\omega'. \tag{7.4}$$

A straightforward calculation yields

$$K(t, \omega; t', \omega') = \cdots\; du\, d\tau\, d\theta\, du'\, d\tau'\, d\theta'. \tag{7.5}$$

This simplifies considerably when particular kernels are considered. For the Rihaczek distribution we have

(7.6)

where

(7.7)

We note that $K$ is a two-dimensional Rihaczek distribution. For the Wigner distribution we have

(7.8)

which is a two-dimensional Wigner distribution [17], [18].
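Equation (7.3) has a direct discrete reading as a matrix-vector product, sketched below in Python. This is an illustration under assumptions that do not come from the paper: the impulse response `g`, the gating interval, and the helper name `apply_time_variant` are hypothetical, and circular indexing is assumed for convenience.

```python
import numpy as np

def apply_time_variant(h, s_in, dt=1.0):
    """Riemann-sum sketch of Eq. (7.3): s_o[n] = sum_m h[n, m] s_i[m] dt."""
    return (h @ s_in) * dt

N = 64
rng = np.random.default_rng(1)
s_in = rng.standard_normal(N)

# Time-invariant special case: h(t, t') = g(t - t'), with wrap-around indexing.
g = np.exp(-np.arange(N) / 4.0)              # hypothetical impulse response
lag = (np.arange(N)[:, None] - np.arange(N)[None, :]) % N
h_invariant = g[lag]
out_invariant = apply_time_variant(h_invariant, s_in)

# Genuinely time-variant case: the same filter followed by a gate that
# silences the output on a finite interval (samples 20..29).
gate = np.ones(N)
gate[20:30] = 0.0
h_variant = gate[:, None] * h_invariant
out_variant = apply_time_variant(h_variant, s_in)
```

In the time-invariant case the product reduces to a circular convolution of `g` with `s_in`. The gated system asks for silence on the signal itself, which is always realizable; the difficulty described in this section arises only when the corresponding strip is cut out of a distribution such as the Wigner distribution.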
VIII. OTHER TOPICS: INSTANTANEOUS FREQUENCY, QUANTUM MECHANICS, AND UNCERTAINTY PRINCIPLE

A. Instantaneous Frequency and Analytic Signal

The concept of "instantaneous" frequency has a long history in physics and astronomy. Historically the methodology and description of instantaneous frequency has not always been associated with time-frequency distributions or a time-varying spectrum. A comprehensive theory of joint time-frequency distributions would be able to encompass and clarify the concept of instantaneous frequency, so it is important to appreciate the work that has been done along these lines. It was Armstrong's [11] discovery that frequency modulation for radio transmission reduces noise significantly, which produced a concerted effort to understand and describe the mathematical and conceptual description of frequency modulation and instantaneous frequency. Early comprehensive works on the analysis of frequency modulation were those of Carson and Fry [45] and Van der Pol [192], who defined instantaneous frequency as the rate of change of the phase of the signal. This definition implies that we have some procedure for forming a complex signal from a real one. In general there are an infinite number of complex signals whose real part is a given real signal. A major step was made by Gabor [80], who from the observation that both $\sin \omega t$ and $\cos \omega t$ transform into an exponential $e^{j\omega t}$ if we use only their positive spectrum, generalized to the arbitrary case with the prescription to "suppress the amplitudes belonging to negative frequencies and multiply the amplitudes of positive frequency by 2." He noted that this procedure is equivalent to adding to the signal an imaginary part, which is the Hilbert transform of the signal. The positive frequencies are multiplied by 2 to preserve the total energy of the original signal. To see how the Hilbert transform arises from the above prescription, suppose the signal is $s(t)$ and its Fourier transform is $S(\omega)$. The signal $z(t)$ whose spectrum is composed of the positive frequencies of $S(\omega)$ is given by the inverse transform of $S(\omega)$ using only the positive frequencies,

$$z(t) = 2\, \frac{1}{\sqrt{2\pi}} \int_0^\infty S(\omega)\, e^{j\omega t}\, d\omega. \tag{8.1}$$

Expressing $S(\omega)$ in terms of the signal $s(t)$ as per Eq. (1.3),

$$z(t) = 2\, \frac{1}{2\pi} \int_0^\infty \!\! \int s(t')\, e^{-j\omega t'}\, e^{j\omega t}\, dt'\, d\omega \tag{8.2}$$

and using the fact that

$$\int_0^\infty e^{j\omega x}\, d\omega = \pi\delta(x) + \frac{j}{x} \tag{8.3}$$

we have

$$z(t) = s(t) + \frac{j}{\pi} \int \frac{s(t')}{t - t'}\, dt'. \tag{8.4}$$

The second part of Eq. (8.4) is the Hilbert transform of the signal, and $z(t)$ is called the analytic signal. The derivative of the phase of the analytic signal conforms to our expectations of instantaneous frequency for a wide variety of cases, particularly narrowband signals. There has been considerable controversy over whether this represents the proper mathematical expression of instantaneous frequency, and a number of other definitions have been given [85], [178]. For example, one can define it in terms of the average number of zeros that a function crosses per unit time.

For a real signal of the form $A(t) \cos[\omega_0 t + \varphi(t)]$ the complex signal is often taken to be $A(t)\, e^{j\omega_0 t + j\varphi(t)}$, which is called the quadrature, or exponential, model. The conditions under which this complex signal is a good approximation to the analytic signal have been investigated [169]. Nuttall [145] resolved the issue by defining the error between the exponential and the analytic signal to be the energy of the difference of the two signals. He showed that the error will be zero if the spectrum of $A(t)\, e^{j\varphi(t)}$ is single sided. It is not necessary for the signal to be narrow band. A convenient and useful theorem for the study of Hilbert transforms was given by Bedrosian [20]. It relates the Hilbert transform of a product of two signals to the Hilbert transform of each signal.

"Instantaneous" frequency implies that we are dealing with a local concept, but to calculate the Hilbert transform, the signal for all time must be used. This paradoxical situation was analyzed by Vakman [191], who set up mild and reasonable conditions for the formation of a complex signal and showed that these conditions lead to the analytic signal. He points out that in reality only a small band around the instantaneous frequency, the "active band," is needed to approximate the analytic signal [4], [106], [191].

He makes the interesting observation that, very often, quantities defined globally can, under certain circumstances, be described advantageously by local concepts as, for example, is the case with electromagnetic waves, where in principle we are dealing with waves spread out through space, but under certain conditions, the light ray method is appropriate and useful.

The identification of the derivative of the phase with the concept of instantaneous frequency must not be taken too literally. The relation of the derivative of the phase and the "frequency" that appears in the spectrum has been investigated by Ville [194], Fink [74], and Mandel [131]. Consider at each instant keeping track of the instantaneous frequency and asking for the average frequency. That would be given by the time average

$$\langle \varphi'(t) \rangle_T = \int \varphi'(t)\, |s(t)|^2\, dt \qquad \text{(time average)}. \tag{8.5}$$

If we compare this to the mean frequency as defined by the spectrum,

$$\langle \omega \rangle_S = \int \omega\, |S(\omega)|^2\, d\omega \qquad \text{(spectral average)} \tag{8.6}$$

it is easy to prove that they are identical,

$$\langle \omega \rangle_S = \langle \varphi'(t) \rangle_T \tag{8.7}$$

which argues for the identification of the derivative of the phase as the instantaneous frequency. However, if one calculates the second moments, the identification is no longer compatible because they are not equal. In fact [74], [131], [194],

$$\langle \omega^2 \rangle_S = \langle \varphi'^2(t) \rangle_T + \int A'^2(t)\, dt. \tag{8.8}$$

Also, the standard deviation of the spectral average is $\sigma_S^2 = \langle \omega^2 \rangle_S - \langle \omega \rangle_S^2$, and by a similar expression for the time
average, we have

$$\sigma_S^2 = \sigma_T^2 + \int A'^2(t)\, dt. \tag{8.9}$$

Mandel [131] has emphasized that the derivative of the phase does not always coincide with the frequencies that appear in the spectrum, although the averages are equal, as per Eq. (8.7). In fact it is very easy to construct examples where the derivative of the phase of an analytic signal (whose spectrum consists of only positive frequencies) may be negative at certain times. Of course one can define instantaneous frequency to be the derivative of the phase, but the conceptual notion embodied in the phrase will not always be reflected in the definition. We note that if there is no amplitude modulation, the two coincide, and we further note that the second term of Eq. (8.9) is identical to the expected value of the local spread, as defined by Eq. (4.39).

From the perspective of joint time-frequency distributions, instantaneous frequency is defined as the average frequency at a particular time, that is, the mean conditional moment of frequency. We have already seen that there are an infinite number of distributions which give the derivative of the phase for the mean conditional value of frequency. The condition for this to hold is given by Eq. (4.28) and is usually considered an important and desirable attribute of a distribution. However, we note that the result is true for any complex signal, not just the analytic signal. One may argue that the condition on these distributions should be that the conditional moment of frequency be the derivative of the phase for only certain types of signals, or it should be the derivative of the phase in some approximate sense. It can also be argued that the result should be the derivative of the phase of the analytic signal, even if the actual signal is used in the calculation of the distribution. More fundamentally, a theory of time-frequency distributions should predict the proper expression for instantaneous frequency and the "answer" should not have to be imposed. In addition we point out that some of the distributions which give the derivative of the phase for the first conditional moment give improper results for the second conditional moment, as discussed in Section IV. This indicates that we do not have a fully consistent theory. Considerable further research is needed to clarify the relations between the instantaneous frequency, joint time-frequency distributions, and the result implied by the works of Ville, Fink, and Mandel, Eq. (8.9).

From the practical point of view the question arises as to whether the actual signal or the analytic signal should be used when calculating a joint time-frequency distribution. Many have advocated using the analytic signal. There are

are not properly given. Second, the analytic signal does not have negative frequencies and therefore cannot cause interference terms with the positive frequencies. Although eliminating the negative frequencies does eliminate their overlap, it does not eliminate the interference of the positive frequencies with other positive frequencies. There will always be interference terms, no matter what part of the signal is eliminated, since that is an inherent property of the bilinear distributions. The third argument for using the analytic signal is that aliasing is eliminated and the sampling rate is reduced to the standard Nyquist rate [26].

B. Relation to Quantum Mechanics

The fundamental notion of classical mechanics is that from a knowledge of the initial positions and velocities of a particle, and the knowledge of the forces, one can predict exactly what the position and velocity of the particle will be at a later time. The equation of evolution in classical mechanics is Newton's second law of motion. The breakdown of classical mechanics and the realization that the deterministic viewpoint is incorrect because the laws of nature only predict the probability where a particle will be, is one of the greatest intellectual achievements of humankind. In addition it has had profound practical consequences as evidenced by the modern devices based on quantum effects. The fundamental idea of modern physics is that we can only predict probabilities for observables such as position and velocity, and that this is not a reflection of human ignorance but rather the way that nature operates. The probabilities are predicted by solving Schrödinger's equation of motion, which gives the wave function of position at time $t$. The probability of finding the particle at position $q$ at time $t$ is then the absolute square of the wave function. Another dramatic departure of quantum mechanics from classical mechanics is that physical observables are represented by operators and not functions. The noncommutation of operators has profound consequences regarding the simultaneous measurability of observables. We should point out that in quantum mechanics we may have an additional level of description. That is the case where we do not know, because of human ignorance, what the wave function is and assign a probability to the possible wave functions. This is done in quantum statistical mechanics and is similar to the treatment of stochastic signals in signal theory. We emphasize that in quantum mechanics we are starting with a probability description, but in signal analysis we are starting with a deterministic description.

There is a partial formal mathematical correspondence between quantum mechanics and signal analysis. Histor
three basic reasons for this advocacy. First, as we have dis ically work on joint timefrequency distributions has often
cussed, some distributions give the derivative of the phase been guided by corresponding developments in quantum
for the conditional first moment. Hence it i s argued that we mechanics. Indeed the original papers of Gabor and Ville
should use the analytic signal because the instantaneous continuously evoked the quantum analogy. However, the
frequency i s defined in terms of the analytic signal. If one analogy i s formal only and because the interpretation is dra
wants to use these distributions for the estimation of the matically different, one must be particularly cautious in
instantaneous frequencies, and that is certainly an impor transposing and interpreting results from one field to
tant application, then the analytic signal should be used. another. What may be reasonable in quantum mechanics
Whileit istruethatadistributionoftheanalyticsignaldram does not necessarily make it resonable in signal theory.
atizes the instantaneous frequency concentration for many Indeed it i s often preposterous in signal theory, as will be
signals, the question of what it hides of the original signal illustrated with a concrete example.
has not been fully addressed. We point out that when the The similaritycomes about because in quantum mechan
analytic signal is used, the marginals of the original signal ics the probability distribution for finding the particle at a
certain position is the absolute square of the wave function, and the probability for finding the momentum is the absolute square of the Fourier transform of the wave function. Thus one can associate the signal with the wave function, time with position, and frequency with momentum. The marginal conditions are formally the same, although the variables are different. The first fundamental difference is that quantum mechanics is an inherently probabilistic theory. Its probabilistic interpretation is not a question of ignorance but of the fundamental character of the physical world. In signal theory, on the other hand, the signal is inherently deterministic, and the absolute square of the signal is an intensity with no probability connotations. We now come to the most important distinction. In quantum mechanics, physical quantities are always associated with operators. It is the fundamental tenet of quantum mechanics that what can be measured for an observable are the eigenvalues of its operator. This produces some seemingly bizarre results, which are nonetheless true and have been verified experimentally. It is the basis for the quantization of physical quantities and has no counterpart in signal theory. For a dramatic example, consider the sum of two continuous quantities. In quantum mechanics the sum is not necessarily continuous. Specifically consider the position q and momentum p, which are continuous variables; in quantum mechanics $q^2 + p^2$ (appropriately dimensioned) is never continuous under any circumstances, for any particle. It is always quantized, that is, it can have only certain values. The corresponding statement in signal analysis would be that time and frequency are continuous but that $t^2 + \omega^2$ (appropriately dimensioned) is never so, and of course that would be a ludicrous statement to make in signal analysis. Hence, even though there is a mathematical analogy with quantum mechanics, we cannot take the results of quantum mechanics over to joint time-frequency distributions indiscriminately. In Table 4 we outline the formal mathematical correspondence between signal analysis and quantum mechanics.

C. Uncertainty Principle and Joint Distributions

The uncertainty principle expresses a fundamental relationship between the standard deviation of a function and the standard deviation of its Fourier transform. In particular, the standard deviations are defined by

$$(\Delta t)^2 = \int (t - \bar{t})^2 |s(t)|^2 \, dt, \qquad (\Delta\omega)^2 = \int (\omega - \bar{\omega})^2 |S(\omega)|^2 \, d\omega \tag{8.10}$$

where $\bar{t}$ and $\bar{\omega}$ are the mean time and frequency. The uncertainty principle is

$$\Delta t \, \Delta\omega \ge \tfrac{1}{2} \tag{8.11}$$

for any signal. In common usage $\Delta t$ and $\Delta\omega$ are called the duration and the bandwidth of a signal.

We would like to clarify the role of the uncertainty principle and its significance with regard to joint distributions. We will show that the uncertainty principle is a relationship concerning the marginals only and has no bearing on the existence of joint distributions. The phrase "uncertainty" was coined in quantum mechanics, where its connotation is appropriate since quantum mechanics is an inherently probabilistic theory. In quantum mechanics the standard deviations involve the measurement of physical observables. However, in nonprobabilistic contexts the uncertainty principle should be thought of as expressing the fact that a function and its Fourier transform cannot both be made arbitrarily narrow.

The proper interpretation of the uncertainty relation in signal analysis has been emphasized by many. In his paper on the representation of signals, for example, Lerner [124] states that the uncertainty principle ". . . has tempted some individuals to draw unwarranted parallels to the uncertainty principle in quantum mechanics. . . . The analogy is formal only." Equally to the point is Skolnik [177]: "The use of the word 'uncertainty' is a misnomer, for there is nothing uncertain about the 'uncertainty relation.' . . . It states the well-known mathematical fact that a narrow waveform yields a wide spectrum and a wide waveform yields a narrow spectrum and both the time waveform and the frequency spectrum cannot be made arbitrarily small simultaneously."

In both signal theory and quantum theory we have an uncertainty principle. In quantum mechanics it refers to the probabilistic aspects of measuring quantities, and the word
Table 4 Relationship Between Quantum Mechanics and Signal Analysis*

Quantum Mechanics (Inherently Probabilistic) | Signal Analysis (Deterministic)
Position $q$ (random) | Time $t$
Momentum $p$ (random) | Frequency $\omega$
Time $t$ | No correspondence
Wave function $\psi(q, t)$ | Signal $s(t)$
Momentum wave function $\phi(p, t) = \frac{1}{\sqrt{2\pi\hbar}} \int \psi(q, t) \, e^{-jqp/\hbar} \, dq$ | Spectrum $S(\omega) = \frac{1}{\sqrt{2\pi}} \int s(t) \, e^{-j\omega t} \, dt$
Probability of position at time $t$: $|\psi(q, t)|^2$ | Energy density $|s(t)|^2$
Probability of momentum at time $t$: $|\phi(p, t)|^2$ | Energy density spectrum $|S(\omega)|^2$
Expected value of position $\langle q \rangle = \int q \, |\psi(q, t)|^2 \, dq$ | Mean time $\langle t \rangle = \int t \, |s(t)|^2 \, dt$
Expected value of momentum $\langle p \rangle = \int p \, |\phi(p, t)|^2 \, dp$ | Mean frequency $\langle \omega \rangle = \int \omega \, |S(\omega)|^2 \, d\omega$
Standard deviation of position $\sigma_q = \sqrt{\langle q^2 \rangle - \langle q \rangle^2}$ | Duration $T = \sqrt{\langle t^2 \rangle - \langle t \rangle^2}$
Standard deviation of momentum $\sigma_p = \sqrt{\langle p^2 \rangle - \langle p \rangle^2}$ | Bandwidth $B = \sqrt{\langle \omega^2 \rangle - \langle \omega \rangle^2}$
Uncertainty principle $\sigma_p \sigma_q \ge \hbar/2$ | Time-bandwidth relation $BT \ge 1/2$

*The formal mathematical correspondence is (position, momentum) ↔ (time, frequency). The wave function in quantum mechanics depends on time, but this has no formal correspondence in signal analysis. Planck's constant $\hbar$ may be taken equal to 1. Quantum mechanics is an inherently probabilistic theory in contrast to signal analysis, which is deterministic. Hence while there is the formal mathematical correspondence, the interpretation of results is very different. Both quantum mechanics and signal theory have another level of indeterminism where the wave function or the signal is ensemble averaged.
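The time-bandwidth relation in the last row of Table 4 can be checked numerically from the two energy densities alone. The following is a minimal sketch, assuming a uniformly sampled signal that has decayed to zero at the edges of the record; the function names `duration_and_bandwidth` and `gaussian` and all parameter values are illustrative choices, not taken from the text. A Gaussian should give a product close to 1/2, and a chirped Gaussian of the same duration a larger one.

```python
import numpy as np

def duration_and_bandwidth(s, dt):
    """Return (T, B): the standard deviations of |s(t)|^2 and |S(w)|^2.

    Both densities are normalized to unit area first, so the overall
    scale of the signal and of the FFT does not matter.
    """
    n = len(s)
    t = (np.arange(n) - n // 2) * dt

    # Duration: standard deviation of the energy density in time.
    e_t = np.abs(s) ** 2
    e_t = e_t / (e_t.sum() * dt)
    mean_t = np.sum(t * e_t) * dt
    T = np.sqrt(np.sum((t - mean_t) ** 2 * e_t) * dt)

    # Bandwidth: standard deviation of the energy density spectrum,
    # on an angular-frequency grid in ascending order.
    w = 2 * np.pi * np.fft.fftshift(np.fft.fftfreq(n, d=dt))
    dw = w[1] - w[0]
    e_w = np.abs(np.fft.fftshift(np.fft.fft(s))) ** 2
    e_w = e_w / (e_w.sum() * dw)
    mean_w = np.sum(w * e_w) * dw
    B = np.sqrt(np.sum((w - mean_w) ** 2 * e_w) * dw)
    return T, B

def gaussian(n, dt, sigma, beta=0.0):
    """Gaussian envelope exp(-t^2 / (2 sigma^2)) with optional chirp rate beta."""
    t = (np.arange(n) - n // 2) * dt
    return np.exp(-t ** 2 / (2 * sigma ** 2) + 0.5j * beta * t ** 2)

if __name__ == "__main__":
    for beta in (0.0, 3.0):
        T, B = duration_and_bandwidth(gaussian(4096, 0.01, 1.0, beta), 0.01)
        print(f"chirp rate {beta}: T = {T:.4f}, B = {B:.4f}, BT = {T * B:.4f}")
```

Note that only the two marginal densities enter the computation; no joint distribution is needed to evaluate the product.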
"uncertainty" connotes the right meaning. It is one of the most profound discoveries and refers to the measurement of physical quantities represented by operators that do not commute, such as position and momentum. In signal analysis it applies only to the broadness of signals, which are related to each other by Fourier transforms, and it does not relate to measurement in the quantum mechanical sense. As Ackroyd [3] has emphasized, "There is a misconception that it is not possible to measure the t-f energy density of a given waveform and that this is a consequence of Gabor's uncertainty relation. However, the uncertainty principle of waveform analysis is not concerned with the measurement of t-f energy density distributions: instead it states that if the effective bandwidth of a signal is W then the effective duration cannot be less than about 1/W (and conversely). . ."

Additional confusion arises when the $\Delta t$ and $\Delta\omega$, which are used in the uncertainty principle to connote standard deviations or broadness, are confused with the differential elements of calculus. They are not the same, and the uncertainty principle does not say that we cannot make the differential elements as small as we like. The two uses of $\Delta$ should not be confused.

We now address the question of the relationship of the uncertainty principle and joint distributions. Our point is that it has no bearing on the question of joint distributions and relates to the product of the standard deviations of the marginals. To understand this, suppose we have a joint distribution and wish to calculate the product $(\Delta t)^2 (\Delta\omega)^2$. It would be

$$(\Delta t)^2 (\Delta\omega)^2 = \iint (t - \bar{t})^2 P(t, \omega) \, dt \, d\omega \cdot \iint (\omega - \bar{\omega})^2 P(t, \omega) \, dt \, d\omega \tag{8.12}$$

and, for a distribution that yields the marginals, integrating out the other variable in each factor gives

$$(\Delta t)^2 (\Delta\omega)^2 = \int (t - \bar{t})^2 |s(t)|^2 \, dt \cdot \int (\omega - \bar{\omega})^2 |S(\omega)|^2 \, d\omega. \tag{8.13}$$

This is the usual starting point in the derivation of the uncertainty principle, and hence the uncertainty principle follows. This demonstration is rather trivial; however, since there is a general sense that the uncertainty principle has to do with correlations between measurements of time and frequency, the preceding steps force the reader to see that this is not the case. The uncertainty principle is calculated only from the marginals. Hence any joint distribution that yields the marginals will give the uncertainty principle. It has nothing to do with correlations between time and frequency or the measurement for small times and frequencies. What it does say is that the marginals are functionally dependent. But the fact that marginals are related does not imply correlation between the variables and has nothing to do with the existence or nonexistence of a joint distribution.

It is often stated that one cannot have proper positive distributions because that would violate the uncertainty principle. But it is well known that the Wigner distribution is positive for some signals. If positivity and the uncertainty principle were incompatible, this would have to be so in all cases. Furthermore it is possible to generate an infinite number of positive distributions which satisfy the marginals. Also, it is sometimes stated that if one averages the Wigner distribution over an area greater than that given by the uncertainty principle, we will get a positive answer. That is not the case. Many counterexamples have been given.

IX. APPLICATIONS

There has been a considerable effort to apply these distributions to almost every field where nonstationary signals arise. The purpose of the applications has varied considerably, from the simple graphic presentation of the results, with the expectation that they will reveal more than other methods, to sophisticated manipulation of the distribution. We will emphasize how these distributions have been applied, but we will not go into detail about the particular numerical techniques. The applications can be broadly categorized according to three methodologies. First is calculating the distribution to see whether it does reveal more information than other tools such as the spectrogram. An example is the application to speech, where one hopes more of the fine points of speech such as transients and transitions will be revealed. Second is to use a particular property of the distribution which clearly and robustly represents the time-frequency content for that property, for example, correlating instantaneous frequency with physical quantities one is trying to obtain. Third is to use the distribution as a carrier of the information of a signal, without concern as to whether the distribution truly represents the time-frequency energy density. Many applications do not fall clearly into the above categories, but it is nevertheless useful to keep them in mind because the success or failure of a distribution in a particular application does not necessarily imply success or failure in a different application. For example, the Wigner distribution may be hard to interpret in the analysis of speech, but may be useful for recognition. For applications where the interpretation as true densities is not necessary, the violation of certain properties, such as the marginals, may be acceptable.

Perhaps the earliest application that took advantage of the Wigner distribution was the work of Boashash [35]. His method is based on correlating a physical quantity of the problem at hand with a feature in the Wigner distribution, usually the instantaneous frequency. The importance of Boashash's idea is that one does not have to rely on a full interpretation of the distribution as a joint density but only that some of its predictions need be correct. For example, as long as one is confident that the instantaneous frequency is well described by the distribution, the fact that other properties may not be is unimportant. His first application of this was to geophysical exploration. The basic idea is to send a signal through the ground, measure the resulting signal, and calculate the Wigner distribution. From the distribution one determines the instantaneous frequency, and from the instantaneous frequency one calculates the attenuation and dispersion. This method has been used to study many diverse problems. Boashash et al. [27], [28] studied the absorption and dispersion effects in the earth. Imberger and Boashash [94], [95] have applied the method to analyze the temperature gradient microstructure in the ocean by relating the instantaneous frequency to the dissipation of kinetic energy. Bazelaire and Viallix [19] have also used the Wigner distribution to obtain data to measure the absorption and dispersion coefficients of the ground and have formulated a new understanding of seismic noise.
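The basic procedure described above (calculate the Wigner distribution, then determine the instantaneous frequency from it) can be sketched numerically. The code below is only an illustration under stated assumptions and is not the method of [35]: it assumes a uniformly sampled complex signal, uses a finite lag window (a pseudo-Wigner distribution), and takes the first conditional moment in the periodic sense appropriate to a discrete frequency axis, which reduces to a central difference of the phase. The names `wigner_distribution` and `instantaneous_frequency` and the chirp parameters are hypothetical.

```python
import numpy as np

def wigner_distribution(s, n_lags):
    """Discrete pseudo-Wigner distribution of a complex signal s.

    Returns a real array W of shape (len(s), K) with K = 2 * n_lags.
    Row i is the Fourier transform over lag m of s[i + m] * conj(s[i - m]);
    column k corresponds to angular frequency pi * k / (K * dt).
    """
    n = len(s)
    K = 2 * n_lags
    W = np.zeros((n, K))
    for i in range(n):
        m_max = min(n_lags - 1, i, n - 1 - i)   # keep i + m and i - m inside the record
        m = np.arange(-m_max, m_max + 1)
        r = np.zeros(K, dtype=complex)
        r[m % K] = s[i + m] * np.conj(s[i - m])  # local autocorrelation at time i
        W[i] = np.fft.fft(r).real                # real because r[-m] = conj(r[m])
    return W

def instantaneous_frequency(W, dt):
    """Periodic first conditional moment of W in frequency, in rad/s.

    The sum over k of W[i, k] * exp(2j*pi*k/K) equals K * s[i+1] * conj(s[i-1]),
    so its angle is the phase advance over two samples.
    """
    K = W.shape[1]
    k = np.arange(K)
    moment = W @ np.exp(2j * np.pi * k / K)
    return np.angle(moment) / (2 * dt)

if __name__ == "__main__":
    dt = 1e-3
    t = np.arange(512) * dt
    w0, beta = 2 * np.pi * 50.0, 2 * np.pi * 200.0      # illustrative chirp parameters
    s = np.exp(1j * (w0 * t + 0.5 * beta * t ** 2))      # linear chirp, phase derivative w0 + beta*t
    W = wigner_distribution(s, n_lags=64)
    w_est = instantaneous_frequency(W, dt)
    print(np.max(np.abs(w_est[1:-1] - (w0 + beta * t[1:-1]))))
```

The two end samples are excluded from the comparison because no nonzero lag fits inside the record there. The estimate is only meaningful while the phase advance over two samples stays below $\pi$ in magnitude, which the chosen parameters respect.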
[Figure 8: contour plots; recovered panel labels (a), (b); horizontal axis: time in microseconds.]
Fig. 8. Contour plots of the Wigner distribution for the output of a simulated ultrasonic transducer for various design parameters. The advantage of using a time-frequency distribution is that one can readily see the main features of the output.
Of particular significance was the work of Janse and Kaizer [97]. They developed innovative techniques and laid the foundation for the use of these distributions as practical tools. They calculated the Wigner distribution for a number of standard filters, and found it to be a particularly powerful means of handling the inherently nonstationary signals encountered in loudspeaker design.

In the design of devices to produce waves, time-frequency distributions are an effective and comprehensive indicator of the characteristics of the output. We illustrate this with the work of Marinovic and Smith [137], who used the Wigner distribution as an aid in the design and analysis of ultrasonic transducers. An ultrasonic transducer is a device for producing sound waves and is typically used as the source of waves in medical imaging, sonar, and so on. The most common means of producing high-frequency sound is by exciting a piezoelectric crystal such as quartz. When mechanical stress is applied to these crystals, the polarization is changed, producing an electric field. Conversely, when an electric field is applied, the crystal is strained. By using an oscillating electric field the crystal is made to vibrate, producing acoustic waves in the medium. The output of a transducer depends on many factors, such as shape, thickness, the electronic fields driving it, and the coupling with the medium. In designing a transducer one is interested in the output having certain desirable characteristics. Typically what is required is for the output to have a uniform time spread over the frequencies. Uniformity is desired so that there are no aberrations with respect to different frequencies acting in different ways. A number of simulation models have been devised which predict the general characteristics of the output with the various parameters under the designer's control. The advantage of using a time-frequency distribution is that one can quickly and effectively see the effects of varying the parameters. In Fig. 8 we show various contour plots of the Wigner distribution of the output of a simulation program for designing transducers for three choices of design parameters. In Fig. 8(a) the output has a very poor uniformity of energy in the various frequencies. In Fig. 8(b) there is considerable improvement, but no uniformity yet, and in Fig. 8(c) we have an ideal case, the output being quite uniform for the frequencies produced. The advantage of using a joint time-frequency distribution is that within one picture the characteristics of the transducer are readily discerned and one does not have to do various independent time and frequency analyses.

An example that illustrates the use of these distributions for discovery and classification is the work of Barry and Cole [15] on muscle sounds. When a muscle contracts, it produces sounds that can be picked up readily by a microphone. It has been discovered that these sounds are not due to the muscle vibrating as a simple string. The work of Barry and Cole [15] and others is aimed at correlating the properties of the sound with the characteristics of the muscle. If one had a good understanding of the different mechanisms that are producing significant changes in the distribution, one would potentially have an excellent noninvasive diagnostic tool in these acoustic waves. Fig. 9 shows the distribution of Choi and Williams for two different sounds produced by a muscle. The first is produced during an isometric contraction and the second when the muscle is twitched. The general features are quite reproducible from muscle to muscle and hence reflect some general characteristics of muscle contraction. On a fundamental level these figures dramatically show that average frequency or instantaneous frequency changes significantly during a muscle contraction, and they give an indication of the spread around the average. The sounds have been correlated with characteristics of the muscle such as its stiffness. The important point from our perspective is that, as Barry and Cole [15] point out, these "time-dependent frequency changes in the acoustical signals would be hard to discern with standard frequency-domain analysis."

[Figure 9: two panels, (a) and (b); horizontal axis: time in ms.]
Fig. 9. Choi-Williams distribution for two different sounds produced by a muscle. (a) During isometric contraction. (b) When muscle is twitched.

The propagation of a signal through different media is a very common occurrence in nature, and since it is generally accompanied by time delays and changes in frequencies, a combined time-frequency description is natural. The simplest propagation of a disturbance is one governed by the equation $u_{xx}(x, t) = v^{-2} u_{tt}(x, t)$, where u is the physical quantity that is changing (e.g., pressure or electric field), x and t are position and time, respectively, and v is the velocity of propagation. If we start with a disturbance at t = 0 given by u(x, 0), then the disturbance at a later time will be given by the same function, displaced by vt, that is, the disturbance will propagate with the same shape. Mathematically this is explained by observing that any function of the form $u(x - vt)$ is a solution. Examples are electromagnetic waves in free space and sound waves in air (to a large extent). Indeed the reason that a person standing 10 ft away from an object sees and hears the same thing as a person 20 ft away is that the shape of the disturbance has not changed.

However, a typical wave equation governing the propagation of a wave in a medium contains extra terms and does not admit solutions of the form $u(x - vt)$ for an arbitrary u. The equation is a wave equation because it does admit propagating waves of the form $e^{j(\omega t - kx)}$. Only sinusoidal waves propagate without change. One of the most important effects in wave propagation is that the phase velocity may depend on the wavelength. Since an arbitrary disturbance can be decomposed into sinusoids by way of the Fourier transform and each will move at a different velocity, the recombination of them at a later time will not preserve the shape of the signal. This phenomenon, that sinusoidal waves of different frequencies propagate with different velocities, is called dispersion. The reason for this term is that the earliest discovered ramification was that a prism can "disperse" white light into different colors since they travel with different speeds in glass. If the velocity decreases with frequency, one says the dispersion is "normal"; otherwise it is termed "anomalous" dispersion. Another important effect in the propagation of a signal is the attenuation of a wave, the dying out or absorption. The energy is typically dissipated into heat. Again, the amount of attenuation depends on the medium and the frequency. In the case of sound in normal conditions there is almost no attenuation, and that is why we are able to hear from far away. In contrast, high-frequency electromagnetic waves are damped within a short distance of entering the surface of a conductor. Also, as a wave propagates from one medium to another, part of it gets reflected and part transmitted.

We are now in a position to understand why a time-frequency analysis offers an effective description of a signal that has propagated through different media. At each instant of time the signal we measure will be the superposition of a number of waves. We may still be measuring the initial wave more or less as it left the source. Superimposed on that will be a delayed signal from a reflection boundary, and that signal may have the same shape as the original one, but delayed in time. The superposition of these two signals may look quite complicated, but in the time-frequency plane we will simply see the distribution of the original signal and the same distribution translated upward with respect to time. This would be immediately recognizable. If in addition we have a wave that was delayed and dispersed, this will be seen in a time-frequency plot as an image similar to the original, displaced upward with a certain amount of relative bending in those frequencies where the dispersion occurred.

[Figure 10: schematic; recovered axis label: frequency.]
Fig. 10. Schematic diagram of a time-frequency distribution for a wave reflected from media with different characteristics. Curve a: initial signal; curve b: distribution of a signal that has gone through a layer with little dispersion; curve c: from layer with normal dispersion. The high frequencies travel at a slower speed and arrive at a relatively later time. Also the high frequencies are cut off for b and c, indicating attenuation. (Adapted from Boles and Boashash [32] and Boashash and Bazelaire [36].)

To illustrate we use the work of Boashash and Bazelaire [36] and Boles and Boashash [32] on geophysical exploration. Seismic signals are particularly rich and varied due to layers of different media. Not only are there layers of various solids with different properties (e.g., shale, sandstone), but one has layers of water and gas beneath the surface. In addition, if one is exploring offshore, the signals obtained from sources beneath the seabed are concealed by the reverberations of the water. What is typically done in exploration is to produce a wave at the surface and measure the resulting wave at one or more places down the field. The initial wave is generated by different means, such as by an explosion caused by dynamite or by vibrating a metal plate coupled to the earth. The resulting acoustic wave travels through the different layers and is reflected upward with possible multiple reflections. The velocity in different layers varies considerably. For air it is about 1000 ft/s and for solid rock it may be as high as 20 000 ft/s. The resulting signal will be a multicomponent signal, and if we did have a good time-frequency distribution, each component would stand out in the time-frequency plane. A schematic diagram of what such a distribution would look like is presented in Fig. 10, which is adapted from Boles and Boashash [32]. The different components are labeled a-c, where a is the original signal at the source. The delays are due to reflection from deeper layers which arrive at a later time. The bending upward at the high frequencies is a reflection
of the normal dispersion because the high frequencies travel more slowly and arrive at a relatively later time. For the components that are delayed longest and have traveled furthest, the high frequencies are cut off because of attenuation. The advantage of using a time-frequency description is that one can see all these effects in one picture. Potentially one may obtain the parameters by direct measurement of the delay, attenuation, and dispersion and thereby identify the media through which the signal propagated. Synthesis methods can be used to decompose the signal into its components. We have gone to some length in describing this type of situation because it is very typical of a variety of phenomena and will most likely be one of the common uses of time-frequency distributions. The implementation of the idealized picture described is currently complicated by various factors. In particular, if the Wigner distribution is used, in addition to the "real components" we have the cross terms, and for more than a few components the number of cross terms is very large. We also have noise. Also, because the layers are not uniform, the simple picture illustrated above becomes considerably more complicated. Boles and Boashash [32] have devised a number of methods to overcome these difficulties and have applied their analysis to real and simulated data. We have already seen that the new distribution of Choi and Williams [51] reduces the cross terms dramatically. It would certainly be interesting to apply that distribution to such a situation.

An innovative use of these distributions has been the work of Marinovic and Eichmann [134], [135], who developed a novel approach that does not depend on interpreting them as true distributions. The Wigner distribution is regarded as the kernel of an integral equation, and the corresponding eigenvalues and eigenfunctions are found. The expansion in terms of the eigenfunctions has been used for pattern recognition, as the eigenvalues have been found to be effective classifiers of shapes. This decomposition may also be used to suppress the effects of noise in the Wigner distribution [136]. The noise is spread through all the terms of the decomposition, and retaining only the first few terms of the expansion suppresses the noise considerably. The expansion method was applied by Marinovic and Smith [136] to show how the reconstructed distribution, which retains only the first few terms of the singular value expansion, allows one to extract the local frequency information in the case of echo signals which get corrupted by interference and noise.

Speech is one of the most complex nonstationary signals and a natural application for these time-frequency distributions. Chester and coworkers [49], [50] were the first to apply the Wigner distribution for the analysis and recognition of speech and constructed hardware for its calculation. They pointed out that the Wigner distribution has considerable noise sensitivity and that the interpretation is not as straightforward as with the spectrogram. However, they found it useful for analysis and recognition. Their use of the Wigner distribution is based on the possibility that the characteristic for speech signals is very robust and hence may be used for recognition. Pickover and Cohen [156] used a number of distributions to study speech and found them difficult to interpret compared to the standard spectrogram. They pinpointed the difficulty with each distribution they considered. Riley [170] has considered the question of what kinds of distributions would be desirable to use in speech analysis and has used smoothed distributions to study formant structure. He devised a means of detecting and extracting the relevant speech parameters. Velez and Absher [193] used the smoothed Wigner distribution to display formant structure in speech and found it to be an effective clarifier of speech sounds. We have already mentioned the work of Janse and Kaizer [97] in regard to loudspeaker design. Preis [161] used the Wigner distribution to study various audio signals and found that the combined representation presents a clearer view of the various time-frequency quantities. Szu [189] has given a comprehensive discussion of the applications of bilinear distributions for the study of various acoustical signals, and in particular to questions of the signal processing involved in hearing.

Pattern recognition schemes use the distribution as a two-dimensional representation, not necessarily of time and frequency [1], [30], [37], [65], [115]. A particular interpretation of a joint distribution as representing the energy is not required for this type of analysis.

Kumar and Carroll [116], [117] considered the use of the Wigner distribution function for the binary detection problem, where one has to make a yes-no decision as to the existence of a signal in the presence of additive noise. They used the integral along the instantaneous frequency of the Wigner distribution as a statistic. The Wigner distribution method performed comparably to cross-correlation methods. They point out possible advantages for nonstationary signals of more complexity. A detection scheme for determining the instantaneous frequency of a chirp in additive noise was considered by Kay and Boudreaux-Bartels [104]. They used the likelihood ratio test to show that it is optimal to integrate the distribution along all straight lines in the time-frequency plane and choose the maximum value to compare to a threshold. This reflects the fact that for chirp-like signals the concentration of the distribution is maximum along the instantaneous frequency. Szu [188] has devised a number of methods to use these distributions for passive surveillance. He has taken advantage of the unique symmetry properties of the Wigner distribution. The synthesis method developed by Boudreaux-Bartels and Parks [39], [41] has been used to separate two signals with different characteristics. A general approach to the detection problem in the time-frequency plane has been formulated by Flandrin [78]. Harris and Abu Salem [88] have compared the performance of the Wigner distribution with other methods for the case of a sinusoid in the presence of additive white noise. They found that for the estimation of amplitude and frequency the Wigner distribution behaves poorly in noise but has the advantage that a priori knowledge of some of the characteristics of the signal is not required, as is the case with some other methods. Cohen, Boudreaux-Bartels, and Kadambe [57] have devised a time-frequency approach to tracking mono- and multicomponent chirp signals in noise. Boashash and O'Shea [31] have extended the work of Kumar and Carroll and applied their method to the identification of underwater acoustic transients, in particular machine noise. They developed a general methodology for use of the Wigner distribution and cross Wigner distribution for detection problems.

The Wigner distribution has been used extensively as a tool to study radiance functions and coherence in optics, and methods for its production and optical filtering and
display have been proposed and studied [12], [13], [16], [42], [44], [64], [86], [96], [103], [105], [I@], [182], W41.
Linear predictive and autoregressive methods have been considered by Ramamoorthy et al. [165]. They showed that it results in good time and frequency resolutions, although they find that the interpretation is difficult. Boashash and coworkers [30], [129] showed that autoregressive methods can improve resolution if a careful choice is made of the parameters, otherwise spurious peaks occur which have no significance.
In a unique application Choi, Williams, and Zaveri [52] used the distribution discussed in Section III-G to evaluate and classify “event-related potentials” where certain words were used to induce brain wave responses in patients. The signal obtained is represented by the distribution and used to classify the signal in terms of the types of stimuli. They found the distribution given by Eq. (3.86) to be very effective as it reduces the masking effects of the cross terms.
Breed and Posch [43] have used the Wigner distribution to study an array of receivers and have formulated it in terms of the spatial parameters. They show that it provides a useful range and azimuth estimator. This approach works very well because for moderate ranges the signal has a quadratic phase spatial variation. These are precisely the cases that the Wigner distribution is well suited to handle, as we have seen in Section V. Swindlehurst and Kailath [185] have also devised a method for using the Wigner distribution for source localization for an array of receivers in the near-field approximation.
The relationship between the evolutionary spectrum of Priestley [162] and the Wigner distribution has been made by Hammond and Harrison [87].
The theory of these distributions has been applied to stochastic signals, and many of the original papers in the field addressed this aspect of the problem. This involves a further averaging to take into account the distribution of signals. Comprehensive work has been done by Janssen [99], Martin [139], Martin and Flandrin [140], and White [196]. Martin [139] has coined the phrase Wigner-Ville spectrum to indicate the Wigner distribution that has been ensemble averaged over the possible realizations of the signal. White [196] and White and Boashash [197], [198] have devised specific methods for obtaining the important parameters of a random process and have given expressions for the errors involved in estimating the parameters. Posch [159] has shown that if Eq. (4.10) is satisfied by the kernel, then the distribution will be the power spectrum when the input is a random stationary signal.
In concluding this summary of the applications, we emphasize that these distributions have not only been useful to study old ideas, but have also led to new concepts. An innovative concept has been introduced by Szu and Coulfield [186], where they address the question of how to compare the frequency contents of two signals. They devised a four-dimensional Rihaczek distribution, the variables being time and frequency for each signal. From this correlated distribution they compare the frequency contents of two signals.
X. CONCLUSION
In conclusion we discuss some general attitudes that have arisen in regard to these time-frequency distributions. The enigma of these distributions is that they sometimes give very reasonable results and sometimes absurd ones. For example, the Wigner distribution gives a very reasonable result for the first conditional moment of frequency, but an unreasonable one for the second conditional moment. A common attitude is that when we do get unacceptable results, we will know that the theory does not apply, and we will not use it for those situations. The problem with that point of view is how do you know when the results are absurd? Sometimes it is obvious, but not always. The fact that these distributions cannot be used in a consistent manner is one of the main areas that needs much further theoretical development.
One of the major issues in the field has always been which distribution, if any, is the absolute “best.” There has been a general attempt to set up a set of desirable conditions and to try to prove that only one distribution fits them. Typically, however, the list is not complete with the obvious requirements, because the author knows that the added desirable properties would not be satisfied by the distribution he is advocating. Also these lists very often contain requirements that are questionable and are obviously put in to force an issue. An example is the requirement that Moyal’s formula should hold, but it is unclear why. If we found a distribution for which the Moyal formula did not hold but nevertheless behaved well, would we reject it on that basis? Clearly not. Witness the recent discovery of the Choi-Williams distribution. Another requirement commonly imposed is the finite support property, that is, for a finite duration signal the distribution should be zero before the signal starts and after the signal ends. That certainly seems like a desirable condition for if there is no signal, we expect the distribution to be zero. (However, we know from Section V that there are distributions, for example, the Wigner, which have the finite-support property but are not necessarily zero in regions where the signal is zero.) But then the condition should simply be that the distribution should be zero if the signal is zero and not just the finite-support property. The positivity condition is also usually left out, although everyone concerned with choosing a best distribution mentions the advantage of having a positive distribution. We also point out that even plausible-sounding conditions have to be applied carefully. An often stated requirement is that the first conditional moment of frequency be the derivative of the phase of the signal because it corresponds to instantaneous frequency. This may sound reasonable, but we already discussed in Section VIII the difficulties of making a total identification of instantaneous frequency, first conditional moment, and derivative of the phase. As has been pointed out in Section VIII, there is the theoretical difficulty that the derivative of the phase does not always correspond to the frequencies in the Fourier spectrum [74], [131]. This indicates that there may be a possible inherent inconsistency with the marginal requirement. This problem requires considerable further investigation. Also the requirement should be in terms of the analytic signal because that is how instantaneous frequency is defined. Moreover it is well known that the usefulness of the definition is meaningful only for certain types of signals, and therefore it is questionable whether we should insist that this hold for all signals. Given all these issues, it is not straightforward to set up the conditions for the satisfaction of the concept of instantaneous frequency.
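The role of the analytic signal in this requirement can be made concrete with a small numerical sketch. The Python below is an illustration only and is not taken from the paper: the helper names `analytic_signal` and `instantaneous_frequency`, the sampling rate, and the linear chirp test signal are all assumptions chosen for the example. It forms the analytic signal by suppressing the negative frequencies with an FFT, takes the derivative of the unwrapped phase, and compares the result with the known frequency law of the chirp.

```python
import numpy as np

def analytic_signal(x):
    """Analytic signal of a real sequence: zero the negative-frequency
    FFT bins and double the positive-frequency ones."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep DC as is
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin as is
        weights[1:n // 2] = 2.0
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.fft.ifft(spectrum * weights)

def instantaneous_frequency(x, fs):
    """Derivative of the unwrapped phase of the analytic signal, in Hz."""
    phase = np.unwrap(np.angle(analytic_signal(x)))
    return np.gradient(phase) * fs / (2.0 * np.pi)

# Hypothetical test signal: a real linear chirp sweeping 50 Hz to 150 Hz in 1 s.
fs = 1000.0
t = np.arange(1000) / fs
f0, rate = 50.0, 100.0
x = np.cos(2.0 * np.pi * (f0 * t + 0.5 * rate * t**2))

estimate = instantaneous_frequency(x, fs)
expected = f0 + rate * t

# Compare away from the record edges, where the FFT-based construction
# is affected by the implicit periodic extension of the signal.
interior = slice(100, -100)
max_error = np.max(np.abs(estimate[interior] - expected[interior]))
print(f"largest interior deviation from the chirp law: {max_error:.3f} Hz")
```

For a monocomponent chirp of this kind the phase derivative follows the frequency law closely in the interior of the record. Applying the same computation to a sum of two sinusoids of unequal amplitude is a simple way to see the difficulty noted in the text, since the phase derivative then need not coincide with either spectral frequency.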
Indeed, a comprehensive theory of a time-varying spectrum should predict what instantaneous frequency is.
Another approach is to argue that the performance of a distribution is best for a particular property that is deemed desirable. In a penetrating work Janssen [98] considered the performance of distributions for signals of the form $s(t) = e^{j\varphi(t)}$ and attempted to determine which distribution is more concentrated along the line $\omega = \varphi'(t)$. Toward that end he needed a method to determine the spread along that line. As we have seen, the concept of spread using these distributions is far from clear, so Janssen squared the distribution to avoid the fact that the distributions may go negative. Some have assumed that Janssen showed that the Wigner distribution has the least amount of spread around the derivative of the phase. However, Janssen proved this only for the class of distributions that have kernels of the form $\phi(\theta, \tau) = e^{j\alpha\theta\tau}$. Also, we have seen that for multicomponent signals there are distributions that behave better than the Wigner distribution in the sense that the cross terms are smaller in magnitude. Hence it is far from clear whether “optimality” should be set up for mono- or multicomponent signals, or perhaps neither.
Another common argument for elevating a particular distribution is to argue, for example, that all time- and shift-invariant distributions can be expressed as a “smoothed” version of it and therefore degraded in some sense. In particular it is often stated that the time- and shift-invariant distributions may be written in the form
$$P(t, \omega) = \iint g(t' - t, \omega' - \omega)\, W(t', \omega')\, dt'\, d\omega' \qquad (10.1)$$
where $W(t, \omega)$ is the Wigner distribution, making it the “fundamental” one. However, we have seen in Section IV that we can equally well express the distributions in terms of, for example, the Rihaczek distribution.
Another view is that the choice of distribution should depend on the application and possibly the class of signals used, much in the same spirit as different window functions are chosen in various applications of the spectrogram or different sets of functions are used to expand the electrostatic potential depending on the geometry of the problem. As with expansions in terms of a complete set of functions, the choice is a matter of convenience, insight, and mathematical simplicity, which depends on the situation. Perhaps the proper attitude should be that the choice of distribution should be signal or application dependent. Indeed the recent work of Choi and Williams [51] and Nuttall [146], [147], which uses the ambiguity plane to choose the kernel, is an indication that different kernels may be appropriate for different signals. This makes kernels signal dependent, and hence the distributions are not necessarily bilinear any longer. Given these exciting developments it appears that at this stage of our knowledge, trying to prove which function is “best” is premature to say the least.
We now turn to what has been a fundamental issue with these joint distributions, and that is the positivity question. Everyone agrees that ideally a distribution should be positive since they are to be interpreted as densities. Many proofs have been given that positive distributions satisfying the marginals do not exist. The common plausibility argument relied on the uncertainty principle as discussed in Section VIII. A thorough and profound analysis of the positivity question has been given by Mugur-Schachter [144], who has identified many of the questionable and hidden assumptions that have gone into the proofs to show that they do not exist. Even before positive distributions were constructed, as explained in Section III, it was clear that there could not be any inherent reason for their nonexistence since the Wigner distribution is positive for certain signals. Park and Margenau [154], in their work on joint measurability, also analyzed the various arguments that have been given for the nonexistence of positive distributions and were able to construct a simple counterexample. Of course we now know that positive distributions are easily constructed, as in Section III-F, and that they do yield the correct marginals and the uncertainty principle.
It is a fact that distributions which are bilinear functionals of the signal cannot be positive for all signals [199], [200]. Joint densities and marginals appear in every field of science and engineering, and certainly bilinearity is never imposed on a distribution. Note that in time-frequency analysis the marginals themselves are bilinear in the signal, and hence bilinear distributions in the sense discussed here are joint distributions which are bilinear to the square root of the marginals. Now even the simplest proper joint distribution, the correlationless one [$P(x, y) = P_1(x) P_2(y)$], is a product of the marginals and hence quartilinear to the square root of the marginals. In general, proper joint distributions are highly nonlinear functionals of the marginals. The possibility of using distributions that are not bilinear in the signal needs considerably more research. That is not to say that the class of bilinear distributions is not useful or desirable. However, we should be clear about the conceptual assumptions and interpretations.
Current knowledge has barely scratched the surface of the possible distributions and methodologies that may be used to describe a time-varying spectrum. There are an infinite number of distributions, and only a few have been explored. Although the concepts and techniques that have been developed in the past 40 years are truly impressive, it is clear that much more work lies ahead. The attempt to understand what a time-varying spectrum is, and to represent the properties of a signal simultaneously in time and frequency, is one of the most fundamental and challenging aspects of analysis.
ADDITIONAL COMMENTS
I would like to mention some recent results and some additions and omissions in the text.
Elimination of Aliasing in the Discrete Wigner Distribution: For a bandlimited signal, the value of the signal at an arbitrary time can be obtained from discrete sampled values if the sampling is done at the Nyquist rate or higher, that is, at a sampling frequency $\omega_s \ge 2\omega_{\max}$, where $\omega_{\max}$ is the highest frequency in the signal. As mentioned in Section V-E, it has generally been believed that to reconstruct the Wigner distribution from discrete samples, one must sample the signal at twice this rate or higher; otherwise aliasing will occur. Nuttall [206] has recently shown that the higher sampling rate is unnecessary and has devised an efficient alias-free method for the computation of the Wigner distribution from a signal sampled at the Nyquist rate. The key to his approach is to take into account all the available information in the local autocorrelation function $R_t(\tau)$ as given by Eq. (3.64). By doubly Fourier transforming the local autocorrelation function, Nuttall has shown that non-
overlapping diamond-shaped regions exist in the transformed plane, each containing the fundamental information, provided that the signal was sampled at the Nyquist rate or higher. The end result of his analysis for obtaining the Wigner distribution at an arbitrary time-frequency pair is to construct first
$$S(\omega) = \begin{cases} \sum_k s(kT)\, e^{-j\omega kT}, & \text{if } |\omega| \le \pi/T \\ 0, & \text{otherwise} \end{cases}$$
where $s(kT)$ are the sampled signal values. The Wigner distribution is then constructed from $S(\omega)$ according to
$$W(t, \omega) = \frac{1}{2\pi} \int S^*(\omega - \tfrac{1}{2}\theta)\, e^{jt\theta}\, S(\omega + \tfrac{1}{2}\theta)\, d\theta.$$
Nuttall has shown that this is the Wigner distribution of the original continuous signal $s(t)$ and is hence free of aliasing. In practice, both $S(\omega)$ and $W(t, \omega)$ will be discretized in time and frequency and accomplished by fast Fourier transforms. We note that interpolation of the sampled values or reconstitution of the continuous signal from the sampled values is not necessary or utilized.
“Data-Adaptive” Distributions: As discussed in the Conclusion, the most common viewpoint regarding time-frequency distributions is to find a “best” one, which will be used for all signals, although, as we mentioned, there are recent indications that different distributions may be appropriate for different signals. Recently Jones and Parks [207], [208] have made an important contribution in developing and implementing a “data-adaptive” method for devising a time-frequency distribution of a certain form. They consider the short-time Fourier transform with a Gaussian window where the parameters of the Gaussian window are varied for each different point in the time-frequency plane. The parameters are chosen so that the time-frequency concentration of the locally dominant component is maximized. They have applied this approach to both idealized and real data with considerable effectiveness in constructing distributions with high resolution.
Resolution Comparison: Jones and Parks [209] have made an interesting comparative study of the resolution properties of the Wigner distribution, spectrogram, and smoothed Wigner distribution. They used a signal composed of two Gaussian components with different time and frequency centers. They showed that for this case the best resolution (defined by the ability of the distribution to separate the two centers) was obtained by the spectrogram with an optimally matched window.
Bilinear Distributions: The approach discussed in Section III-D was extended in O’Connell and Wigner [210], where they considered the question of uniqueness in a distribution.
Local Second Moment: It was mentioned in the text that the second local moment of frequency corresponds to the local kinetic energy in quantum mechanics and that different expressions have been considered. The unified approach where previous expressions are special cases was presented in [211], and the references to previously known expressions are given therein.
Applications: Forrester [212] has applied the Wigner distribution to the study of vibrations of helicopter components with the aim of detecting developing failure of machine parts, in particular gear failure. By examining the vibrations of gears with and without faults he has shown that the Wigner distribution is an excellent discriminator and can be used to detect both the type and the extent of faults.
White and Boashash [213] developed a method for estimating the Wigner distribution for a nonstationary random process. They use a recursive procedure for the specification of estimators having desired characteristics in the particular regions of the time-frequency plane.
ACKNOWLEDGMENT
The author would like to express his appreciation to Dr. Albert Nuttall for the many conversations and valuable insights he has given him on a very wide variety of subjects. He also appreciates the help rendered by Dr. Chongmoon Lee in the preparation of the manuscript and he wishes to thank Profs. D. T. Barry, N. M. Marinovic, and W. J. Williams for supplying some of the graphs relating to their and their coworkers’ respective research.
REFERENCES
[1] R. M. S. S. Abeysekera and B. Boashash, "Time frequency domain features of ECG signals: their application in P-wave detection using the cross Wigner-Ville distribution," in Proc. IEEE ICASSP 89, 1989 (to be published).
[2] M. H. Ackroyd, "Instantaneous and time-varying spectra - an introduction," Radio Electron. Eng., vol. 239, pp. 145-152, 1970.
[3] ---, "Short-time spectra and time-frequency energy distributions," J. Acoust. Soc. Am., vol. 50, pp. 1229-1231, 1970.
[4] D. V. Ageyev, "The active band of the frequency spectrum of a function of time," Trud. Gor'k. Politekh. Inta, vol. 11, p. 5, 1955.
[5] J. E. Allen and L. R. Rabiner, "A unified approach to short-time Fourier analysis and synthesis," Proc. IEEE, vol. 65, pp. 1558-1564, 1977.
[6] R. Altes, "Detection, estimation and classification with spectrograms," J. Acoust. Soc. Am., vol. 67, pp. 1232-1246, 1980.
[7] ---, Tech. Memo. 322, Orincon, personal communication, Apr. 1984.
[8] M. G. Amin, "Time and lag-window selection in Wigner-Ville distribution," in Proc. IEEE ICASSP 87, pp. 1529-1532, 1987.
[9] ---, "Recursion in Wigner distribution," in Advanced Algorithms and Architectures for Signal Processing III, F. T. Luk, Ed., Proc. SPIE, vol. 975, pp. 221-231, 1989.
[10] J. C. Andrieux, M. R. Feix, C. Mourgues, P. Bertrand, B. Izrar, and V. T. Nguyen, "Optimum smoothing of the Wigner-Ville distribution," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 764-769, 1987.
[11] E. H. Armstrong, "A method of reducing disturbances in radio signaling by a method of frequency modulation," Proc. IRE, vol. 24, pp. 669-740, 1936.
[12] R. A. Athale, J. N. Lee, E. L. Robinson, and H. H. Szu, "Acousto-optic processors for real-time generation of time-frequency representations," Opt. Lett., vol. 8, pp. 166-168, 1983.
[13] R. Bamler and H. Glunder, "The Wigner distribution function of two-dimensional signals - coherent-optical generation and display," Opt. Acta, vol. 30, pp. 1789-1803, 1983.
[14] N. F. Barber and F. Ursell, "The response of a resonant system to a gliding tone," Phil. Mag., vol. 39, pp. 345-361, 1948.
[15] D. T. Barry and N. M. Cole, "Muscle sounds are emitted at the resonant frequency of skeletal muscle," IEEE Trans. Biomed. Eng. (submitted for publication).
[16] H. O. Bartelt, K. H. Brenner, and W. Lohmann, "The Wigner distribution function and its optical production," Opt. Commun., vol. 32, pp. 32-38, 1980.
[17] M. J. Bastiaans, "Wigner distribution function and its application to first order optics," J. Opt. Soc. Am., vol. 69, pp. 1710-1716, 1979.
[18] ---, "Application of the Wigner distribution function to partially coherent light," J. Opt. Soc. Am., vol. 3, pp. 1227-1238, 1986.
[19] E. Bazelaire and J. R. Viallix, "Theory of seismic noise," in Proc. 49th Eur. Ass. Explor. Geophys. Mtg. (Belgrade, Yugoslavia, 1987), pp. 1-2.
[20] E. Bedrosian, "A product theorem for Hilbert transforms," Proc. IEEE, vol. 51, pp. 686-689, 1963.
[21] P. Bertrand and J. Bertrand, "Time-frequency representations of broad band signals," in The Physics of Phase Space, Y. S. Kim and W. W. Zachary, Eds. New York, NY: Springer, 1987, pp. 250-252.
[22] J. Bertrand and P. Bertrand, "Tomographic procedure for constructing phase space representations," in The Physics of Phase Space, Y. S. Kim and W. W. Zachary, Eds. New York, NY: Springer, 1987, pp. 208-210.
[23] P. Bertrand, B. Izrar, V. T. Nguyen, and M. R. Feix, "Obtaining nonnegative quantum distribution functions," Phys. Lett., vol. 94, pp. 415-417, 1983.
[24] A. Blanc-Lapierre and B. Picinbono, "Remarques sur la notion de spectre instantane de puissance," Pub. Sci. Univ. d'Alger, s. BI, pp. 2-32, 1955.
[25] B. Boashash and P. J. Black, "An efficient real-time implementation of the Wigner-Ville distribution," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1611-1618, 1987.
[26] B. Boashash, "Time-frequency signal analysis, the choice of a method and its applications," in Advances in Spectral Analysis, S. Haykin, Ed. Englewood Cliffs, NJ: Prentice-Hall (to be published).
[27] B. Boashash, B. Lovell, and H. J. Whitehouse, "High-resolution time-frequency signal analysis by parametric modeling of the Wigner-Ville distribution," in Proc. IASTED Int. Symp. Signal Processing and Its Applications (Brisbane, Australia, 1987), pp. 297-301.
[28] B. Boashash and H. J. Whitehouse, "Seismic applications of the Wigner-Ville distribution," in Proc. IEEE Int. Conf. Systems and Circuits, pp. 34-37, 1986.
[29] B. Boashash, "Nonstationary signals and the Wigner-Ville distribution," in Proc. IASTED Int. Symp. Signal Processing and Its Applications (Brisbane, Australia, 1987), pp. 261-268.
[30] B. Boashash, B. Lovell, and L. White, "Time frequency analysis and pattern recognition using singular value decomposition of the Wigner-Ville distribution," in Advanced Algorithms and Architecture for Signal Processing, Proc. SPIE, vol. 828, pp. 104-114, 1987.
[31] B. Boashash and P. O'Shea, "A methodology for detection and classification of underwater acoustic signals using time-frequency analysis techniques," preprint, 1989 (to be published).
[32] P. Boles and B. Boashash, "The cross Wigner-Ville distribution - a two-dimensional analysis method for the processing of vibrosis seismic signals," in Proc. IEEE ICASSP 87, pp. 904-907, 1988.
[33] M. Born and P. Jordan, "Zur Quantenmechanik," Z. Phys., vol. 34, pp. 858-888, 1925.
[34] F. Bopp, "La mechanique quantique est-elle une mechanique statistique classique particuliere?," Ann. Inst. H. Poincare, vol. 15, pp. 81-112, 1956.
[35] B. Bouachache, "Representation temps-frequence," Soc. Nat. ELF Aquitaine, Pau, France, Publ. Recherches, no. 373-378, 1978.
[36] B. Bouachache and E. de Bazelaire, "Pattern recognition in the time-frequency domain by means of the Wigner-Ville distribution," in Proc. GRETSI 9th Colloq., pp. 879-884, 1983.
[37] B. Bouachache and F. Rodriguez, "Recognition of time-varying signals in the time-frequency domain by means of the Wigner distribution," in Proc. IEEE ICASSP 84, pp. 22.5/14, 1984.
[38] G. F. Boudreaux-Bartels and T. W. Parks, "Reducing aliasing in the Wigner distribution using implicit spline interpolation," in Proc. IEEE ICASSP 83, pp. 1438-1441, 1983.
[39] G. F. Boudreaux-Bartels and T. W. Parks, "Signal estimation using modified Wigner distributions," in Proc. IEEE ICASSP 84, pp. 22.3114, 1984.
[40] G. F. Boudreaux-Bartels and T. W. Parks, "Time-varying filtering and signal estimation using Wigner distribution synthesis techniques," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 442-451, 1986.
[41] G. F. Boudreaux-Bartels, "Time varying signal processing using the Wigner distribution time-frequency signal representation," in Advances in Geophysical Data Processing, M. Simaan, Ed. Greenwich, CT: JAI Press, 1985, pp. 33-79.
[42] H. Brand, R. Graham, and A. Schenzle, "Wigner distribution of bistable steady states in optical subharmonic generation," Opt. Commun., vol. 32, pp. 359-363, 1980.
[43] B. R. Breed and T. E. Posch, "A range and azimuth estimator based on forming the spatial Wigner distribution," in Proc. IEEE ICASSP 84, pp. 416/9.19.2, 1984.
[44] K. H. Brenner and A. W. Lohmann, "Wigner distribution function display of complex 1-D signals," Opt. Commun., vol. 42, pp. 310-314, 1982.
[45] J. R. Carson and T. C. Fry, "Variable frequency electric circuit theory with application to the theory of frequency modulation," Bell Sys. Tech. J., vol. 16, pp. 513-540, 1937.
[46] N. Cartwright, "Correlations without joint distributions in quantum mechanics," Found. Phys., vol. 4, pp. 127-136, 1974.
[47] ---, "A nonnegative quantum mechanical distribution function," Physica, vol. 83A, pp. 210-212, 1976.
[48] D. S. K. Chan, "A nonaliased discrete-time Wigner distribution for time-frequency signal analysis," in Proc. IEEE ICASSP 82, pp. 1333-1336, 1982.
[49] D. Chester, F. J. Taylor, and M. Doyle, "On the Wigner distribution," in Proc. IEEE ICASSP 83, pp. 491-494, 1983.
[50] D. Chester, F. J. Taylor, and M. Doyle, "The Wigner distribution in speech processing applications," J. Franklin Inst., vol. 318, pp. 415-430, 1984.
[51] H. I. Choi and W. J. Williams, "Improved time-frequency representation of multicomponent signals using exponential kernels," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-37, 1989 (in press).
[52] H. I. Choi, W. J. Williams, and H. Zaveri, "Analysis of event-related potentials: time frequency energy distribution," in Proc. Rocky Mountain Bioengineering Symp., pp. 251-258, 1987.
[53] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, "The aliasing problem in discrete-time Wigner distributions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-31, pp. 1067-1072, 1983.
[54] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, "The Wigner distribution - a tool for time-frequency signal analysis; part I: continuous-time signals," Philips J. Res., vol. 35, pp. 217-250, 1980.
[55] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, "The Wigner distribution - a tool for time-frequency signal analysis; part II: discrete time signals," Philips J. Res., vol. 35, pp. 276-300, 1980.
[56] T. A. C. M. Claasen and W. F. G. Mecklenbrauker, "The Wigner distribution - a tool for time-frequency signal analysis; part III: relations with other time-frequency signal transformations," Philips J. Res., vol. 35, pp. 372-389, 1980.
[57] F. S. Cohen, G. F. Boudreaux-Bartels, and S. Kadambe, "Tracking of unknown nonstationary chirp signal using unsupervised clustering in the Wigner distribution space," in Proc. IEEE ICASSP 88, pp. 2180-2183, 1988.
[58] L. Cohen, "Generalized phase-space distribution functions," J. Math. Phys., vol. 7, pp. 781-786, 1966.
[59] ---, "On a fundamental property of the Wigner distribution," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 559-561, 1987.
[60] ---, "Wigner distribution for finite duration or bandlimited signals and limiting cases," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 796-806, 1987.
[61] ---, "Quantization problem and variational principle in the phase space formulation of quantum mechanics," J. Math. Phys., vol. 17, pp. 1863-1866, 1976.
[62] L. Cohen and T. Posch, "Positive time-frequency distribution functions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 31-38, 1985.
[63] L. Cohen and T. Posch, "Generalized ambiguity functions," in Proc. IEEE ICASSP 85, pp. 1025-1028, 1985.
[64] M. Conner and L. Yao, "Optical generation of the Wigner distribution of 2-D real signals," Appl. Opt., vol. 24, pp. 3825-3829, 1985.
[65] G. Cristobal, J. Bescos, and J. Santamaria, "Application of
the Wigner distribution for image representation and analysis,” in Proc. IEEE 8th Int. Conf. Pattern Recognition, pp. 998-1000, 1986.
[66] N. G. DeBruijn, “Uncertainty principles in Fourier analysis,” in Inequalities, O. Shisha, Ed. New York, NY: Academic Press, 1967, pp. 57-71.
[67] G. W. Deley, “Waveform design,” in Radar Handbook, M. I. Skolnik, Ed. New York, NY: McGraw-Hill, 1970.
[68] A. Dziewonski, S. Bloch, and M. Landisman, “A technique for the analysis of transient signals,” Bull. Seism. Soc. Am., vol. 59, pp. 427-444, 1969.
[69] G. Eichmann and N. M. Marinovic, “Scale-invariant Wigner distribution and ambiguity functions,” in Proc. Int. Soc. Opt. Eng., Proc. SPIE, vol. 519, pp. 18-24, 1985.
[70] G. Eichmann and B. Z. Dong, “Two-dimensional optical filtering of 1-D signals,” Appl. Opt., vol. 21, pp. 3152-3156, 1982.
[71] B. Escudie and J. Grea, “Sur une formulation generale de la representation en temps et frequence dans l’analyse de signaux d’énergie finie,” C. R. Acad. Sci. Paris, vol. A283, pp. 1049-1051, 1976.
[72] R. M. Fano, “Short-time autocorrelation functions and power spectra,” J. Acoust. Soc. Am., vol. 22, pp. 546-550, 1950.
[73] P. D. Finch and R. Groblicki, “Bivariate probability densities with given margins,” Found. Phys., vol. 14, pp. 549-552, 1984.
[74] L. M. Fink, “Relations between the spectrum and instantaneous frequency of a signal,” transl. in Probl. Inform. Transm., vol. 2, pp. 11-21, 1966.
[75] J. Flanagan, Speech Analysis Synthesis and Perception. New York, NY: Springer, 1972.
[76] P. Flandrin, “Some features of time-frequency representations of multicomponent signals,” in Proc. IEEE ICASSP 84, pp. 418.4.14.4, 1984.
[77] P. Flandrin and B. Escudie, “Time and frequency representation of finite energy signals: a physical property as a result of a Hilbertian condition,” Signal Processing, vol. 2, pp. 93-100, 1980.
[78] P. Flandrin, “A time-frequency formulation of optimum detection,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-36, pp. 1377-1384, 1988.
[79] L. E. Franks, Signal Theory. Englewood Cliffs, NJ: Prentice-Hall, 1969.
[80] D. Gabor, “Theory of communication,” J. IEE (London), vol. 93, pp. 429-457, 1946.
[81] H. Garudadri, M. P. Beddoes, A. P. Benguerel, and J. H. V. Gilbert, “On computing the smoothed Wigner distribution,” in Proc. IEEE ICASSP 87, pp. 1521-1524, 1987.
[82] S. K. Ghosh, M. Berkowitz, and R. G. Parr, “Transcription of ground-state density-functional theory into a local thermodynamics,” in Proc. Nat. Acad. Sci., vol. 81, pp. 8028-8031, 1984.
[83] O. D. Grace, “Instantaneous power spectra,” J. Acoust. Soc. Am., vol. 69, pp. 191-198, 1981.
[84] R. Groblicki, “A generalization of Cohen’s theorem,” J. Math. Phys., vol. 24, pp. 841-843, 1983.
[85] M. S. Gupta, “Definitions of instantaneous frequency and frequency measurability,” Am. J. Phys., vol. 43, pp. 1087-1088, 1975.
[86] A. K. Gupta and T. Asakura, “New optical system for the efficient display of Wigner distribution functions using a single object transparency,” Opt. Commun., vol. 60, pp. 265-268, 1986.
[87] J. K. Hammond and R. F. Harrison, “Wigner-Ville and evolutionary spectra for covariance equivalent nonstationary random process,” in Proc. IEEE ICASSP 85, pp. 1025-1027, 1985.
[88] F. J. Harris and A. Abu Salem, “Performance comparison of Wigner-Ville based techniques to standard FM-discriminators for estimating instantaneous frequency of a rapidly slewing FM sinusoid in the presence of noise,” in Advanced Algorithms and Architectures for Signal Processing III, F. T. Luk, Ed., Proc. SPIE, vol. 975, pp. 232-244, 1989.
[89] M. Hillery, R. F. O’Connell, M. O. Scully, and E. P. Wigner, “Distribution functions in physics: fundamentals,” Phys. Rep., vol. 106, pp. 121-167, 1984.
[90] F. Hlawatsch, “Transformation, inversion and conversion of bilinear signal representations,” in Proc. IEEE ICASSP 85, pp. 1029-1032, 1985.
[91] F. Hlawatsch, “Interference terms in the Wigner distribution,” in Digital Signal Processing 84, V. Cappellini and A. G. Constantinides, Eds. Amsterdam, The Netherlands: Elsevier (North Holland), 1984.
[92] (a) F. Hlawatsch, “A note on ‘Wigner distribution for finite duration or bandlimited signals and limiting cases’,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-36, pp. 927-929, 1988; (b) author’s reply, ibid.
[93] R. L. Hudson, “When is the Wigner quasi-probability density nonnegative?,” Rep. Math. Phys., vol. 6, pp. 249-252, 1974.
[94] J. Imberger and B. Boashash, “Signal processing in oceanography,” in Proc. IASTED Int. Symp. Signal Processing and Its Applications (Brisbane, Australia, 1987), pp. 8-9.
[95] J. Imberger and B. Boashash, “Application of the Wigner-Ville distribution to temperature gradient microstructure: a new technique to study small-scale variations,” J. Phys. Oceanog., vol. 16, pp. 1997-2012, 1986.
[96] T. Iwai, A. K. Gupta, and T. Asakura, “Simultaneous optical production of the sectional Wigner distribution function for a two-dimensional object,” Opt. Commun., vol. 58, pp. 15-19, 1986.
[97] C. P. Janse and J. M. Kaizer, “Time-frequency distributions of loudspeakers: the application of the Wigner distribution,” J. Audio Eng. Soc., vol. 31, pp. 198-223, 1983.
[98] A. J. E. M. Janssen, “On the locus and spread of pseudo-density functions in the time-frequency plane,” Philips J. Res., vol. 37, pp. 79-110, 1982.
[99] A. J. E. M. Janssen, “Application of Wigner distributions to harmonic analysis of generalized stochastic processes,” Math. Centrum, Amsterdam, The Netherlands, MC-tract 114, 1979.
[100] A. J. E. M. Janssen, “Positivity properties of phase-plane distribution functions,” J. Math. Phys., vol. 25, pp. 2240-2252, 1984.
[101] A. J. E. M. Janssen and T. C. M. Claasen, “On positivity of time-frequency distributions,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1029-1032, 1985.
[102] (a) A. J. E. M. Janssen, “A note on ‘Positive time-frequency distributions’,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 701-704, 1987. (b) Author’s reply, ibid.
[103] J. Z. Jiao, B. Wang, and L. Hong, “Wigner distribution function and optical geometrical transformation,” Appl. Opt., vol. 23, pp. 1249-1254, 1984.
[104] S. Kay and G. F. Boudreaux-Bartels, “On the optimality of the Wigner distribution for detection,” in Proc. IEEE ICASSP 85, pp. 1017-1020, 1985.
[105] O. Kenny and B. Boashash, “An optical signal processor for time-frequency signal analysis using the Wigner-Ville distribution,” J. Elec. Electron. Eng. (Australia), pp. 152-158, 1988.
[106] E. L. Key, E. M. Fowle, and R. D. Haggarty, “A method of designing signals of large time-bandwidth product,” in IRE Int. Conv. Rec., vol. 4, pp. 146-154, 1961.
[107] J. G. Kirkwood, “Quantum statistics of almost classical ensembles,” Phys. Rev., vol. 44, pp. 31-37, 1933.
[108] J. R. Klauder, “The design of radar signals having both high range resolution and high velocity resolution,” Bell Sys. Tech. J., vol. 39, pp. 809-820, 1960.
[109] J. R. Klauder, A. C. Price, S. Darlington, and W. J. Albersheim, “The theory and design of chirp radars,” Bell Sys. Tech. J., vol. 39, pp. 745-808, 1960.
[110] F. Kobayashi and H. Suzuki, “Time-varying signal analysis using squared analytic signals,” in Proc. IEEE ICASSP 87, pp. 1525-1528, 1987.
[111] K. Kodera, C. de Villedary, and R. Gendrin, “A new method for the numerical analysis of nonstationary signals,” Phys. Earth and Planetary Interiors, vol. 12, pp. 142-150, 1976.
[112] K. Kodera, R. Gendrin, and C. de Villedary, “Analysis of time-varying signals with small BT values,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-26, pp. 64-76, 1978.
[113] R. Koenig, H. K. Dunn, and L. Y. Lacy, “The sound spectrograph,” J. Acoust. Soc. Am., vol. 18, pp. 19-49, 1946.
[114] J. G. Kruger and A. Poffyn, “Quantum mechanics in phase space,” Physica, vol. 85, pp. 84-100, 1976.
[115] B. V. K. V. Kumar and W. C. Carroll, “Pattern recognition using Wigner distribution function,” in IEEE Proc. 10th Int. Optical Computing Conf., pp. 130-135, 1983.
[116] B. V. K. V. Kumar and W. C. Carroll, “Effects of sampling on signal detection using the cross-Wigner distribution function,” Appl. Opt., vol. 23, pp. 4090-4094, 1984.
[117] B. V. K. V. Kumar and W. C. Carroll, “Performance of Wigner distribution function based detection methods,” Opt. Eng., vol. 23, pp. 732-737, 1984.
[118] V. V. Kuryshkin, I. Lyabis, and Y. I. Zaparovanny, “Sur le probleme de la regle de correspondance en theorie quantique,” Ann. Fondation Louis de Broglie, vol. 3, pp. 45-61, 1978.
[119] V. V. Kuryshkin, “Some problems of quantum mechanics possessing a nonnegative phase-space distribution function,” Int. J. Theor. Phys., vol. 7, pp. 451-466, 1973.
[120] V. V. Kuryshkin, “Uncertainty principle and the problems of joint coordinate-momentum probability density in quantum mechanics,” in The Uncertainty Principle and the Foundations of Quantum Mechanics. New York, NY: Wiley, 1977, pp. 61-83.
[121] J. L. Lacoume and W. Kofman, “Etude des signaux non stationnaires par la representation en temps et en fréquence,” A. Telec., vol. 30, pp. 231-238, 1975.
[122] D. G. Lampard, “Generalization of the Wiener-Khintchine theorem to nonstationary processes,” J. Appl. Phys., vol. 25, p. 802, 1954.
[123] C. Lee and L. Cohen, “Instantaneous frequency, its standard deviation and multicomponent signals,” in Advanced Algorithms and Architectures for Signal Processing III, F. T. Luk, Ed., Proc. SPIE, vol. 975, pp. 186-208, 1989.
[124] R. M. Lerner, “Representation of signals,” in Lectures on Communications System Theory, E. J. Baghdady, Ed. New York, NY: McGraw-Hill, 1961, pp. 203-242.
[125] M. J. Levin, “Instantaneous spectra and ambiguity functions,” IEEE Trans. Informat. Theory, vol. IT-13, pp. 95-97, 1967.
[126] A. L. Levshin, V. F. Pisarenko, and G. A. Pogrebinsky, “On a frequency-time analysis of oscillations,” Ann. Geophys., vol. 28, pp. 211-218, 1972.
[127] S. C. Liu, “Time-varying spectra and linear transformation,” Bell Sys. Tech. J., vol. 50, pp. 2365-2374, 1971.
[128] R. M. Loynes, “On the concept of spectrum for nonstationary processes,” J. R. Stat. Soc., ser. B, vol. 30, pp. 1-30, 1968.
[129] B. Lovell and B. Boashash, “Evaluation of criteria for detection of change in nonstationary signals,” in Proc. IASTED Int. Symp. Signal Processing and Its Applications (Brisbane, Australia, 1987), pp. 291-298.
[130] D. Lowe, “Joint representations in quantum mechanics and signal processing theory: why a probability function of time and frequency is disallowed,” Royal Signals and Radar Establishment, Memo. 4017, 1986.
[131] L. Mandel, “Interpretation of instantaneous frequency,” Am. J. Phys., vol. 42, pp. 840-846, 1974.
[132] H. Margenau and L. Cohen, “Probabilities in quantum mechanics,” in Quantum Theory and Reality, M. Bunge, Ed. New York, NY: Springer, 1967.
[133] H. Margenau and R. N. Hill, “Correlation between measurements in quantum theory,” Prog. Theor. Phys., vol. 26, pp. 722-738, 1961.
[134] N. M. Marinovic and G. Eichmann, “Feature extraction and pattern classification in spatial frequency domain,” in Proc. Conf. Intelligent Robots and Computer Vision, Proc. SPIE, vol. 579, pp. 15-20, 1985.
[135] N. M. Marinovic and G. Eichmann, “An expansion of Wigner distribution and its applications,” in Proc. IEEE ICASSP 85, pp. 1021-1024, 1985.
[136] N. M. Marinovic and W. A. Smith, “Suppression of noise to extract the intrinsic frequency variation from an ultrasonic echo,” in Proc. IEEE 1985 Ultrasonics Symp., pp. 841-846, 1985.
[137] N. M. Marinovic and W. A. Smith, “Application of joint time-frequency distributions to ultrasonic transducers,” in Proc. 1986 IEEE Int. Symp. Circuits and Systems, pp. 50-54, 1986.
[138] W. D. Mark, “Spectral analysis of the convolution and filtering of nonstationary stochastic processes,” J. Sound Vib., vol. 11, pp. 19-63, 1970.
[139] W. Martin, “Time-frequency analysis of random signals,” in Proc. IEEE ICASSP 82, pp. 1325-1328, 1982.
[140] W. Martin and P. Flandrin, “Wigner-Ville spectral analysis of nonstationary processes,” IEEE Trans. Acoust., Speech, Signal Processing, pp. 1461-1470, 1985.
[141] W. F. G. Mecklenbrauker, “A tutorial on nonparametric bilinear time-frequency signal representations,” in Signal Processing, J. L. Lacoume, T. S. Durrani, and R. Stora, Eds. Amsterdam, The Netherlands: North Holland, 1985.
[142] G. Mourgues, M. R. Feix, J. C. Andrieux, and P. Bertrand, “Not necessary but sufficient conditions for the positivity of generalized Wigner functions,” J. Math. Phys., vol. 26, pp. 2554-2555, 1985.
[143] J. E. Moyal, “Quantum mechanics as a statistical theory,” Proc. Cambridge Phil. Soc., vol. 45, pp. 99-124, 1949.
[144] M. Mugur-Schachter, “A study of Wigner’s theorem on joint probabilities,” Found. Phys., vol. 9, pp. 389-404, 1979.
[145] A. H. Nuttall, “On the quadrature approximation to the Hilbert transform of modulated signals,” Proc. IEEE, vol. 54, p. 1458, 1966; E. Bedrosian, author’s reply, ibid.
[146] A. H. Nuttall, “Wigner distribution function: relation to short-term spectral estimation, smoothing, and performance in noise,” Naval Underwater Systems Center, NUSC Tech. Rep. 8225, Feb. 16, 1988.
[147] A. H. Nuttall, “The Wigner distribution function with minimum spread,” Naval Underwater Systems Center, NUSC Tech. Rep. 8317, June 1, 1988.
[148] A. H. Nuttall, private communication.
[149] J. Ojeda-Castaneda and E. E. Sicre, “Bilinear optical systems, Wigner distribution function and ambiguity function representations,” Opt. Acta, vol. 31, pp. 255-260, 1984.
[150] A. V. Oppenheim, “Speech spectrograms using the fast Fourier transform,” IEEE Spectrum, vol. 7, pp. 57-62, 1970.
[151] A. V. Oppenheim and R. W. Schafer, Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[152] C. H. Page, “Instantaneous power spectra,” J. Appl. Phys., vol. 23, pp. 103-106, 1952.
[153] A. Papoulis, Signal Analysis. New York, NY: McGraw-Hill, 1977.
[154] J. L. Park and H. Margenau, “Simultaneous measurability in quantum theory,” Int. J. Theor. Phys., vol. 1, pp. 211-283, 1968.
[155] F. Peyrin and R. Prost, “A unified definition for the discrete-time, discrete-frequency, and discrete-time/frequency Wigner distributions,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-34, pp. 858-867, 1986.
[156] C. Pickover and L. Cohen, “A comparison of joint time-frequency distributions for speech signals,” Proc. IEEE Int. Conf. Systems and Circuits, vol. 1, pp. 42-45, 1986.
[157] C. Piquet, C. R. Acad. Sci. Paris, vol. A279, pp. 107-109, 1974.
[158] M. R. Portnoff, “Time-frequency representation of digital signals and systems based on short-time Fourier analysis,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 55-69, 1980.
[159] T. E. Posch, “Time frequency distributions for a wide sense stationary random signal,” in Proc. IEEE Int. Conf. Systems and Circuits, 1989 (to be published).
[160] R. K. Potter, G. Kopp, and H. C. Green, Visible Speech. New York, NY: Van Nostrand, 1947.
[161] D. Preis, “Phase distortion and phase equalization in audio signal processing - a tutorial review,” J. Audio Eng. Soc., vol. 30, pp. 774-794, 1982.
[162] M. B. Priestley, Spectral Analysis and Time Series. New York, NY: Academic Press, 1981.
[163] L. R. Rabiner and B. Gold, Theory and Application of Digital Signal Processing. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[164] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals. Englewood Cliffs, NJ: Prentice-Hall, 1978.
[165] P. A. Ramamoorthy, V. K. Iyer, and Y. Ploysongsang, “Autoregressive modeling of the Wigner spectrum,” in Proc. IEEE ICASSP 87, pp. 1509-1512, 1987.
[166] H. Reichenbach, Philosophical Foundations of Quantum Mechanics. Berkeley, CA: Univ. California Press, 1944.
[167] A. W. Rihaczek, “Signal energy distribution in time and frequency,” IEEE Trans. Informat. Theory, vol. IT-14, pp. 369-374, 1968.
[168] A. W. Rihaczek, Principles of High-Resolution Radar. New York, NY: McGraw-Hill, 1969.
[169] A. W. Rihaczek, “Hilbert transforms and the complex representation of
real signals,” Proc. IEEE, vol. 54, pp. 434-435, 1966; E. Bedrosian, author’s reply, ibid.
[170] M. D. Riley, “Beyond quasi-stationarity: designing time-frequency representations for speech signals,” in Proc. IEEE ICASSP 87, pp. 657-660, 1987.
[171] G. J. Ruggeri, “On phase space description of quantum mechanics,” Prog. Theor. Phys., vol. 46, pp. 1703-1712, 1971.
[172] B. E. A. Saleh, “Optical bilinear transformations - general properties,” Opt. Acta, vol. 26, pp. 777-799, 1979.
[173] B. E. A. Saleh and N. S. Subotic, “Time-variant filtering of signals in the mixed time-frequency domain,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-33, pp. 1479-1485, 1985.
[174] R. W. Schafer and L. R. Rabiner, “Design and simulation of a speech analysis-synthesis system based on short-time Fourier analysis,” IEEE Trans. Audio Electroacoust., vol. AU-21, pp. 165-174, 1973.
[175] M. R. Schroeder and B. S. Atal, “Generalized short-time power spectra and autocorrelation functions,” J. Acoust. Soc. Am., vol. 34, pp. 1679-1683, 1962.
[176] B. Schweizer and A. Sklar, “Probability distributions with given margins: note on a paper by Finch and Groblicki,” Found. Phys., vol. 16, pp. 1061-1064, 1986.
[177] M. I. Skolnik, Introduction to Radar Systems. New York, NY: McGraw-Hill, 1980.
[178] J. Shekel, “‘Instantaneous’ frequency,” Proc. IRE, vol. 41, p. 548, 1953.
[179] F. Soto and P. Claverie, “Some properties of the smoothed Wigner function,” Physica, vol. 109A, pp. 193-207, 1981.
[180] F. Soto and P. Claverie, “When is the Wigner distribution function of multidimensional systems nonnegative?,” J. Math. Phys., vol. 24, pp. 97-100, 1983.
[181] M. D. Srinivas and E. Wolf, “Some nonclassical features of phase-space representations of quantum mechanics,” Phys. Rev., vol. D11, pp. 1477-1485, 1975.
[182] S. Stein and H. H. Szu, “Two dimensional optical processing of one-dimensional acoustic data,” Opt. Eng., vol. 21, pp. 804-813, 1982.
[183] L. R. O. Storey, “An investigation of whistling atmospherics,” Phil. Trans. R. Soc., vol. A246, pp. 113-141, 1953.
[184] N. S. Subotic and B. E. A. Saleh, “Optical time-variant processing of signals in the mixed time-frequency domain,” Opt. Commun., vol. 52, pp. 259-264, 1984.
[185] A. L. Swindlehurst and T. Kailath, “Near-field source parameter estimation using a spatial Wigner distribution approach,” in Advanced Algorithms and Architectures for Signal Processing III, F. T. Luk, Ed., Proc. SPIE, vol. 975, pp. 86-92, 1989.
[186] H. H. Szu and H. J. Caulfield, “The mutual time-frequency content of two signals,” Proc. IEEE, vol. 72, pp. 902-908, 1984.
[187] H. H. Szu and J. Blodgett, “Wigner distribution and ambiguity functions,” in Optics in Four Dimensions, L. M. Narducci, Ed. New York, NY: Am. Inst. of Physics, 1981, pp. 355-381.
[188] H. H. Szu, “Two dimensional optical processing of one-dimensional acoustic data,” Opt. Eng., vol. 21, pp. 804-813, 1982.
[189] H. H. Szu, “Signal processing using bilinear and nonlinear time-frequency joint distributions,” in The Physics of Phase Space, Y. S. Kim and W. W. Zachary, Eds. New York, NY: Springer, 1987, pp. 179-199.
[190] C. H. M. Turner, “On the concept of an instantaneous spectrum and its relationship to the autocorrelation function,” J. Appl. Phys., vol. 25, pp. 1347-1351, 1954.
[191] D. Y. Vakman, “Do we know what are the instantaneous frequency and instantaneous amplitude of a signal?,” transl. in Radio Eng. Electron. Phys., vol. 21, pp. 95-100, June 1976.
[192] B. Van der Pol, “The fundamental principles of frequency modulation,” J. IEE (London), vol. 93, pp. 153-158, 1946.
[193] E. F. Velez and R. G. Absher, “Transient analysis of speech signals using the Wigner time-frequency representation,” in Proc. IEEE ICASSP 89, 1989 (to be published).
[194] J. Ville, “Theorie et applications de la notion de signal analytique,” Cables et Transmission, vol. 2A, pp. 61-74, 1948.
[195] A. Vogt, “Position and momentum distributions do not determine the quantum mechanical state,” in Mathematical Foundations of Quantum Theory, A. R. Marlow, Ed. New York, NY: Academic Press, 1978.
[196] L. B. White, “Some aspects of the time-frequency analysis of random processes,” thesis, University of Queensland, Australia, 1989.
[197] L. B. White and B. Boashash, “On estimating the instantaneous frequency of a Gaussian random signal by the use of the Wigner-Ville distribution,” IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-36, pp. 417-420, 1988.
[198] L. B. White and B. Boashash, “Time-frequency coherence - a theoretical basis for cross spectral analysis of non-stationary signals,” in Proc. IASTED Int. Symp. Signal Processing and Its Applications (Brisbane, Australia, 1987), pp. 18-23.
[199] E. P. Wigner, “On the quantum correction for thermodynamic equilibrium,” Phys. Rev., vol. 40, pp. 749-759, 1932.
[200] E. P. Wigner, “Quantum-mechanical distribution functions revisited,” in Perspectives in Quantum Theory, W. Yourgrau and A. van der Merwe, Eds. Cambridge, MA: M.I.T. Press, 1971.
[201] R. M. Wilcox, “Exponential operators and parameter differentiation in quantum physics,” J. Math. Phys., vol. 8, pp. 962-982, 1967.
[202] P. M. Woodward, Probability and Information Theory with Application to Radar. London, England: Pergamon, 1953.
[203] K.-B. Yu and S. Cheng, “Signal synthesis from Wigner distribution,” in Proc. IEEE ICASSP 85, pp. 1037-1040, 1985.
[204] K.-B. Yu, “Signal representation and processing in the mixed time-frequency domain,” in Proc. IEEE ICASSP 87, pp. 1513-1516, 1987.
[205] L. A. Zadeh, “Frequency analysis of variable networks,” Proc. IRE, vol. 38, p. 291, 1950.
[206] A. H. Nuttall, “Alias-free discrete-time Wigner distribution function and complex ambiguity function,” Naval Underwater Systems Center, NUSC Tech. Rep. 8533, April 14, 1989.
[207] D. L. Jones, “A high-resolution data-adaptive time-frequency representation,” thesis, Rice University, Houston, TX, 1988.
[208] D. L. Jones and T. W. Parks, “A high-resolution data-adaptive time-frequency representation,” in Proc. IEEE ICASSP 87, pp. 681-683.
[209] D. L. Jones and T. W. Parks, “A resolution comparison of several time-frequency representations,” IEEE Trans. Acoust., Speech, Signal Processing, 1989 (submitted for publication).
[210] R. F. O’Connell and E. P. Wigner, “Quantum-mechanical distribution functions: conditions for uniqueness,” Phys. Lett., vol. 83A, pp. 145-148, 1981.
[211] L. Cohen, “Local kinetic energy in quantum mechanics,” J. Chem. Phys., vol. 70, pp. 788-789, 1979.
[212] B. D. Forrester, “Use of the Wigner-Ville distribution in helicopter fault detection,” in Proc. ASSPA 89, pp. 78-82, 1989.
[213] L. B. White and B. Boashash, “A design methodology for the optimal estimation of the spectral characteristics of nonstationary random signals,” in Proc. ASSP Workshop on Spectral Estimation and Modeling, pp. 65-70, 1988.

Leon Cohen received the B.S. degree from City College, New York, in 1962 and the Ph.D. degree from Yale University, New Haven, CT, in 1966, both in physics.
He is currently Professor of Physics at Hunter College and Graduate Center of the City University of New York. His research interests are in astronomy, quantum mechanics, chemical physics, numerical methods, and signal analysis. In the past few years he has held visiting research positions at the IBM T. J. Watson Research Center, the Center for Advanced Studies at the University of New Mexico, and the Naval Underwater Systems Center.