
Signal Processing 87 (2007) 1504–1522

www.elsevier.com/locate/sigpro

The fan-chirp transform for non-stationary harmonic signals


Luis Weruaga (a,*), Márian Képesi (b)
(a) Commission for Scientific Visualisation, Austrian Academy of Sciences, Donau-City Strasse 1, 1220 Vienna, Austria
(b) Signal Processing and Speech Communication Laboratory, Technical University of Graz, Inffeldgasse 12, 8010 Graz, Austria
Received 30 August 2006; received in revised form 4 December 2006; accepted 4 January 2007
Available online 16 January 2007

Abstract

This paper presents a novel transform related to the framework of warping operators when the continuous time-warping mapping is a second-order polynomial. This case is proven in the paper to be the only one from the aforementioned group that marginalizes the Wigner distribution along line paths, in particular, with a fan geometry. The properties and attributes of the fan-chirp transform (FChT), along with the analytical characterization of harmonically related Gaussian chirplets, receive special attention in the paper. This analysis shows that for chirp-periodic signals the FChT can reach the limit of the time–frequency (TF) uncertainty principle, while simultaneously keeping the cross-terms at a minimum level. The formulation of the fast digital computation of the FChT is also provided in the paper. Two practical scenarios, the analysis of speech with natural intonation and of bat ultrasound, validate the theoretical developments and clearly show the competitive performance of the new transform.
© 2007 Elsevier B.V. All rights reserved.

Keywords: Fan-chirp transform (FChT); Time–frequency (TF); Fan marginalization; Harmonically related chirplets

1. Introduction

Identifying frequency-modulated (FM) sinusoids or chirps in a signal is known to be a tough challenge for classical linear analysis. Most of the alternative solutions for this problem have come from the field of time–frequency (TF) analysis, mainly in the form of Cohen's class bilinear time–frequency distributions (TFD) [1]. However, the long debate on the relevance and meaning of the cross-terms [2] and the dilemma on the need for positive TFDs [3,4] have moved the attention towards different approaches, such as matching pursuits [5] over redundant chirplet dictionaries [6–8], or chirp-based transforms that marginalize the Wigner–Ville distribution according to certain geometries [10–14]. Although the redundant dictionary remains the most popular method for chirplet decomposition so far, chirp-based transforms are especially interesting because they can provide a broader picture of the TF content of the signal.

Several signal processing transforms are related to the term "chirp": the Chirp-Z [9] and the "Chirplet" [10] transforms contain the term explicitly, whilst the fractional Fourier transform [11,12] and the warped-time operators [13,14] are conceptually related to it. The Chirp-Z transform [9] is an efficient algorithm for calculating discrete Fourier transforms (DFTs) at frequencies not related to a

* Corresponding author. Tel.: +43 1515816707.
E-mail addresses: luis.weruaga@oeaw.ac.at, weruaga@ieee.org (L. Weruaga), kepesi@tugraz.at (M. Képesi).

0165-1684/$ - see front matter © 2007 Elsevier B.V. All rights reserved.
doi:10.1016/j.sigpro.2007.01.006
power-of-two fraction of the sampling bandwidth. Although a discrete-time chirp signal is used in the mechanism, this algorithm is not actually related to the context of this paper.

Fig. 1. Marginalization of the Wigner distribution along straight line paths: (a) Fourier, (b) chirplet, (c) fractional Fourier and (d) fan-chirp transforms. The dark area represents the TF energy of a Gaussian chirplet.

The first relevant chirp-based transform, the chirplet transform (CT) [10], is described by the inner product between the signal and a chirplet as

$$X_b(f) = \int_{-\infty}^{\infty} x(t)\,g^*_{R,\tau,f,b}(t)\,dt, \tag{1}$$

where $t$ is time, $x(t)$ is the analysis signal, $^*$ denotes complex conjugate, and $g_{R,\tau,f,b}(t)$ is a Gaussian chirplet of unit energy

$$g_{R,\tau,f,b}(t) = \frac{1}{\sqrt[4]{\pi R^2}}\,e^{-\frac{1}{2}\left(\frac{t-\tau}{R}\right)^2}\,e^{j2\pi\left(f(t-\tau)+\frac{1}{2}b(t-\tau)^2\right)}. \tag{2}$$

Here $f$ is the instantaneous frequency (IF) at $t=\tau$, $b$ the frequency variation rate and $R$ the time spread. By disregarding the Gaussian term and setting $\tau=0$ for simplicity, it is easy to deduce that the squared magnitude of the Chirp(let) transform is

$$|X_b(f)|^2 = \int_{-\infty}^{\infty} \mathrm{WD}_x(t,\,f+bt)\,dt, \tag{3}$$
where $\mathrm{WD}_x(t,f)$ is the Wigner–Ville distribution [2] of $x(t)$ (henceforth Wigner distribution, WD). Thus, the CT yields the "slanted" marginal of the WD.

Another well-established chirp-based transform is the fractional Fourier transform (FrFT) [11]

$$X_\theta(u) = \int_{-\infty}^{\infty} x(v)\,K_\theta(v,u)\,dv, \tag{4}$$

where $K_\theta(v,u)$ is the transformation kernel [12]. The FrFT involves products with linear-FM chirps in such a way that it yields the marginalization of the WD along the angular direction $\theta$, i.e.,

$$|X_\theta(u)|^2 = \int_{-\infty}^{\infty} \mathrm{WD}_x(cu-sv,\; su+cv)\,dv, \tag{5}$$

where $c=\cos\theta$ and $s=\sin\theta$.

The last chirp-based transform considered here is the warping operator [13–15], defined as

$$X_\psi(f) = \int_{-\infty}^{\infty} x(\psi(t))\,\sqrt{|\psi'(t)|}\,e^{-j2\pi ft}\,dt, \tag{6}$$

where $\psi(t)$ is a continuous differentiable time mapping and $\psi'(t)$ its derivative. An equivalent formulation to this warped-time Fourier transform is¹

$$X(f;\varphi) = \int_{-\infty}^{\infty} x(t)\,\sqrt{|\varphi'(t)|}\,e^{-j2\pi f\varphi(t)}\,dt, \tag{7}$$

where $\varphi(t)=\psi^{-1}(t)$. Eq. (7) represents the inner product of signal $x(t)$ with a non-linear chirp, which is the basic mechanism in TF redundant dictionaries for chirp-based signal decomposition [8]. The warped-time framework has also given rise to new TF distributions, such as the generalized warped Cohen's class (GWCC) [16], which introduces new ways of interpreting the TF content.

¹ By doing the variable change $t=\varphi(\tau)$ in (6), this integral becomes $X_\psi(f) = \int_{-\infty}^{\infty} x(\tau)\sqrt{|\varphi'(\tau)|}\,e^{-j2\pi f\varphi(\tau)}\,d\tau$, which is similar to (7).

This paper tackles the warping operator (7) constrained to the mapping $\varphi(t)$ being a second-order polynomial. This case is proven here to marginalize the WD along straight line paths. Thus, it can be seamlessly compared against other linear chirp-based transforms, such as the CT and FrFT (see Fig. 1 for an introductory illustration), and studied in more detail apart from the general framework. The paper is organized as follows: Section 2 introduces the analysis and synthesis equations of the proposed fan-chirp transform (FChT); in Section 3 its main basic attributes, the marginalization geometry and the representation of chirp-periodic signals are studied; Section 4 contains a discussion on previous works related to the FChT; Section 5 addresses the estimation of the only user-defined parameter of the FChT, the chirp rate, in
order to better match the TF geometry of the analysis signal; Section 6 elaborates on the practical aspects of the digital implementation; Section 7 presents the performance evaluation of the FChT on synthetic and real scenarios, namely, the analysis of natural speech and of sounds of mammals; the conclusions close the paper.

2. The FChT

The analysis formula of the FChT of signal $x(t)$ is defined as

$$X(f;a) \triangleq \int_{-\infty}^{\infty} x(t)\,\sqrt{|\varphi_a'(t)|}\,e^{-j2\pi f\varphi_a(t)}\,dt, \tag{8}$$

where $t$ is time, $f$ is frequency², and $\varphi_a(t)$ is the second-order polynomial controlled by the so-called chirp rate $a$

$$\varphi_a(t) \triangleq \left(1+\tfrac{1}{2}at\right)t. \tag{9}$$

The FChT involves the inner product between $x(t)$ and the complex signals³

$$\xi(t;f,a) = \sqrt{|1+at|}\;e^{j2\pi f\left(1+\frac{1}{2}at\right)t}, \tag{10}$$

which are chirps whose IF, defined as the time derivative of the phase $f\varphi_a(t)$, varies linearly over time

$$\nu(t) = f\,\frac{d\varphi_a(t)}{dt} = (1+at)f. \tag{11}$$

According to (11), the sign of the IF of all basis components switches at the instant

$$t = -\frac{1}{a}, \tag{12}$$

which is called the "focal point" instant, that is, all basis components meet at the point $\langle t,f\rangle = \langle -1/a,\,0\rangle$ of the Wigner plane. For the sake of simplicity and without loss of generality, henceforth the chirp rate $a$ is considered positive.

2.1. Synthesis formula

Lemma 2.1. If $x(t)$ fulfils the time support property

$$x(t) = 0 \quad \text{for } t < -\frac{1}{a}, \tag{13}$$

$x(t)$ can be recovered from its FChT as

$$x(t) = \int_{-\infty}^{\infty} X(f;a)\,\sqrt{|\varphi_a'(t)|}\,e^{j2\pi f\varphi_a(t)}\,df. \tag{14}$$

Proof. By replacing (8) in (14), the right-hand side of (14) becomes

$$z(t) = \int_{-\infty}^{\infty} x(\tau)\,\sqrt{|\varphi_a'(\tau)\,\varphi_a'(t)|}\;\delta\big(\varphi_a(t)-\varphi_a(\tau)\big)\,d\tau. \tag{15}$$

Eq. (15) can be simplified by using the time-scaling property of the Dirac delta

$$\delta\big(\mu(t)\big) = \sum_i \frac{\delta(t-t_i)}{|\mu'(t_i)|}, \tag{16}$$

where $\mu(t)$ is a continuous differentiable function and the $t_i$ are such that $\mu(t_i)=0$. Since here $\mu(\tau)=\varphi_a(t)-\varphi_a(\tau)$ is a second-order polynomial in $\tau$, two roots exist, namely $\tau=t$ and its mirror image $\tau=-t-2/a$ around the focal point, at which $|\varphi_a'(\tau)|=|\varphi_a'(t)|$. After simplifications (15) becomes

$$z(t) = x(t) + x(-t-2/a). \tag{17}$$

The synthesis formula (14) thus delivers the input signal overlaid with itself mirrored around the focal time instant. Hence, in order to achieve perfect synthesis, condition (13) has to be met. □

Fig. 2 contains a toy example⁴ illustrating the nature of the FChT synthesis: the signal on top is the analysis signal, and the one at the bottom the reconstructed signal; the focal time instant is marked with an asterisk. Since the signal is non-zero beyond the focal point, that half of the time axis is overlaid onto the half of interest (in the example, $t>-10$) in the reconstruction. Thus, in order to prevent this time aliasing, the focal point must lie outside the time support of the signal.

2.2. FChT as warped-time Fourier

By doing the variable change $\tau=\varphi_a(t)$ in the analysis equation, (8) becomes

$$X(f;a) = \int_{-1/(2a)}^{\infty} \big(x_+(\tau)+x_-(\tau)\big)\,e^{-j2\pi f\tau}\,d\tau, \tag{18}$$

where the signals $x_+(\tau)$ and $x_-(\tau)$ are given by⁵

$$x_\pm(\tau) = \frac{1}{\sqrt[4]{1+2a\tau}}\;x\big(\psi_a^\pm(\tau)\big). \tag{19}$$

² As will be shown, $f$ can actually be interpreted as the IF at $t=0$.
³ The complex signal (10) is strictly non-analytic because of the amplitude modulation with the normalization term $\sqrt{|1+at|}$. Furthermore, since its IF crosses zero at the focal point (12), the chirp sweeps the entire frequency axis.
⁴ In order to evaluate the integrals (8) and (14) numerically, the signals in the example are oversampled discrete-time signals.
⁵ Henceforth, the sub/superindex ± describes both cases + and −.
Fig. 2. Synthesis of a signal from its fan-chirp transform: (a) original signal, (b) reconstructed signal; horizontal axis $t$. The chirp rate is $a=0.1$. Focal point $-1/a$ marked with asterisk.
Here $\psi_a^+(\tau)$ and $\psi_a^-(\tau)$ correspond to the double solution of the inverse mapping of $\varphi_a(t)$, i.e.,

$$\psi_a^\pm(\tau) = -\frac{1}{a} \pm \frac{\sqrt{1+2a\tau}}{a}. \tag{20}$$

Eq. (18) describes the FChT as the Fourier transform of a warped-time version of the analysis signal. Since both components $x_+(\tau)$ and $x_-(\tau)$ overlap in time, the synthesis condition (13) in Lemma 2.1 can also be stated as: $x(t)$ can be obtained from its FChT if $x_-(\tau)=0$.

The most relevant aspect of (18) is the fact that the FChT can be evaluated with the Fourier transform mechanism. This clearly points to its fast discrete-time evaluation: the signal is prewarped accordingly and the FFT algorithm applied thereafter. These aspects are addressed in detail in Section 6. The last paragraphs of the seminal work on warped wavelets [13] anticipate the high potential of that approach: "In the future, we can imagine the choice of a warping operator being made automatically to best fit a certain style of analysis to a given set of data." In the FChT, that time-warping law is governed by the chirp rate $a$.

3. Properties

In this section we derive the most relevant property of the FChT regarding the marginalization of the WD; Parseval's theorem along with other basic properties are also derived; the TF resolution over harmonically related chirplets covers the final part of the section.

3.1. TF marginalization geometry

The analysis formula (8) can be arranged as

$$X(f;a) = \int_{-\infty}^{\infty} w(t)\,e^{-j2\pi f\varphi_a(t)}\,dt, \tag{21}$$

where $w(t)$ is the normalized signal,

$$w(t) = \sqrt{|1+at|}\;x(t). \tag{22}$$

Lemma 3.1. The squared magnitude of the FChT is equal to the marginalization of the WD of the normalized signal $w(t)$ according to a fan geometry, that is,

$$|X(f;a)|^2 = \int_{-\infty}^{\infty} \mathrm{WD}_w\big(t,(1+at)f\big)\,dt. \tag{23}$$
Proof. With the variable change $\tau=\varphi_a(t)$, (21) can be written as

$$X(f;a) = \int_{-\infty}^{\infty} \tilde w(\tau)\,e^{-j2\pi f\tau}\,d\tau, \tag{24}$$

where

$$\tilde w(\tau) = \tilde w_+(\tau) + \tilde w_-(\tau), \tag{25a}$$

$$\tilde w_\pm(\tau) = \pm\frac{w\big(\psi_a^\pm(\tau)\big)}{\varphi_a'\big(\psi_a^\pm(\tau)\big)}\;u\!\left(\tau+\frac{1}{2a}\right), \tag{25b}$$

with $u(\cdot)$ the unit step function. Supported on the Fourier-based formulation (24), the squared magnitude of the FChT of $x(t)$ is equal to the frequency marginal of the WD of the warped signal $\tilde w(\tau)$, i.e.,

$$|X(f;a)|^2 = \int_{-\infty}^{\infty} \mathrm{WD}_{\tilde w}(\tau,f)\,d\tau, \tag{26a}$$

$$\mathrm{WD}_{\tilde w}(\tau,f) = \int_{-\infty}^{\infty} \tilde w\!\left(\tau+\frac{t}{2}\right)\tilde w^*\!\left(\tau-\frac{t}{2}\right)e^{-j2\pi ft}\,dt. \tag{26b}$$

The following variable change in the double integral (26):

$$\tau = \frac{1}{2}\varphi_a\!\left(u+\frac{v}{2}\right)+\frac{1}{2}\varphi_a\!\left(u-\frac{v}{2}\right), \tag{27a}$$

$$t = \varphi_a\!\left(u+\frac{v}{2}\right)-\varphi_a\!\left(u-\frac{v}{2}\right) = (1+au)\,v \tag{27b}$$

gives rise to the following Jacobian:

$$\frac{\partial(\tau,t)}{\partial(u,v)} = \varphi_a'\!\left(u+\frac{v}{2}\right)\varphi_a'\!\left(u-\frac{v}{2}\right). \tag{28}$$

Given that $\psi_a^\pm(\varphi_a(t))=t$, the double integral (26) becomes, after simplifications, equal to the proposition (23). □

Fig. 3. The FChT power spectrum is equal to the fan marginalization of the Wigner distribution from the focal point $(t,f)=(-1/a,\,0)$.

Fig. 3 illustrates the result of Lemma 3.1. The image corresponds to the WD of a signal composed of harmonically related chirplets, where the dark stripes represent the non-stationary TF energy (for the sake of simplicity, the cross-terms are not depicted). When the structure is "illuminated" from the TF point $\langle -1/a,\,0\rangle$, the resulting projection gives rise to the FChT power spectrum for chirp rate $a$. The fan marginalization achieves its finest representation when the focal point is $\langle -1/\gamma,\,0\rangle$, with $\gamma$ the pitch rate of the harmonic signal (see Section 3.3). The Fourier analysis corresponds to the "light" source located at infinity. Negative chirp rates are graphically explainable by placing the focal point on the positive half of the time axis.

Corollary 3.2. The FChT is the only warped-time Fourier transform (7) that marginalizes the WD along line paths.

Proof. Note first that the proof of Lemma 3.1 has been conducted without expanding $\varphi_a(t)$, except for the final integral variable (27b). Thus, in order for the general warping operator to marginalize the WD along line paths, the general mapping $\varphi(t)$ has to fulfil in (27b)

$$\varphi\!\left(u+\frac{v}{2}\right)-\varphi\!\left(u-\frac{v}{2}\right) = \nu(u)\,v, \qquad u,v\in\mathbb{R}, \tag{29}$$

where $\nu(u)$ is a $u$-dependent function representing the mentioned line paths. Since (29) must hold for any value of $v$ (and $u$), dividing by $v$ and letting $v\to 0$ shows that the line paths are $\nu(u)=\varphi'(u)$. Given that (29) must hold for any $v$, it is simple to deduce that the only function $\varphi(t)$ fulfilling that condition is a second-order polynomial, which can be univocally written as

$$\varphi(t) = \left(1+\tfrac{1}{2}a(t-\tau)\right)(t-\tau)+\eta. \tag{30}$$

Here $\tau$ and $\eta$ are constants playing simply the role of time and phase shifts, respectively. Then, the TF line $\langle t,\varphi'(t)f\rangle$ represents the marginalization path. □
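Lemma 3.1 and Fig. 3 suggest that a chirp whose IF follows the fan line $(1+at)f_0$ collapses into a narrow FChT peak when the analysis chirp rate matches its geometry, whereas the Fourier transform ($a=0$) spreads it. The Python sketch below checks this numerically with a brute-force evaluation of (8) on a frequency grid. The helper name `fcht_grid`, the test signal and all numeric parameters are illustrative assumptions, not values from the paper.

```python
import cmath
import math


def phi(t, a):
    """Warping law phi_a(t) = (1 + a t / 2) t of Eq. (9)."""
    return (1.0 + 0.5 * a * t) * t


def fcht_grid(x, times, freqs, a):
    """Return |X(f; a)| for every f in `freqs`, using a Riemann sum of Eq. (8).

    Assumes uniformly spaced `times` and x(t) = 0 for t < -1/a (condition (13)).
    With a = 0 this reduces to the magnitude of a plain Fourier transform.
    """
    dt = times[1] - times[0]
    amp = [math.sqrt(abs(1.0 + a * t)) for t in times]   # basis amplitude, Eq. (10)
    warped = [phi(t, a) for t in times]                  # warped time phi_a(t)
    out = []
    for f in freqs:
        acc = 0j
        for xn, g, w in zip(x, amp, warped):
            acc += xn * g * cmath.exp(-2j * math.pi * f * w)
        out.append(abs(acc) * dt)
    return out


fs = 4000.0                                    # sampling rate in Hz (illustrative)
times = [n / fs for n in range(-400, 401)]     # t in [-0.1, 0.1] s; focal point -1/a = -0.5 s
a_true, f0, spread = 2.0, 500.0, 0.03          # chirp rate (1/s), IF at t = 0 (Hz), spread (s)
x = [math.exp(-0.5 * (t / spread) ** 2) * cmath.exp(2j * math.pi * f0 * phi(t, a_true))
     for t in times]                           # IF (1 + a_true t) f0, cf. Eq. (11)
freqs = [400.0 + 2.0 * k for k in range(101)]  # 400 ... 600 Hz in 2 Hz steps

fan = fcht_grid(x, times, freqs, a_true)       # matched chirp rate
fourier = fcht_grid(x, times, freqs, 0.0)      # Fourier transform (a = 0)
peak_freq = freqs[fan.index(max(fan))]
print(peak_freq, max(fan), max(fourier))
```

With these settings the matched transform peaks at the 500 Hz grid point with more than twice the height of the Fourier peak, which is the energy concentration that the fan marginalization (23) predicts for a chirp aligned with the focal point.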

3.2. Basic properties

A main characteristic of the FChT is its variant nature with respect to time shift and time scale, that is, the FChT of $x(t-t_0)$ and $x(ct)$ yield complex expressions that cannot be intuitively related to the FChT of $x(t)$. The time-variant characteristic can be easily understood from the fan geometry depicted in Fig. 3. Additional basic properties of the FChT, directly deduced from the analysis formula, are presented in Table 1, such as the linearity (No. 6), the chirp modulation (No. 7) and the windowing theorem (No. 8). For $a=0$ all FChT properties become properties of the Fourier transform (except No. 2, which becomes meaningless).

Table 1
Basic properties of the fan-chirp transform

No. | Signal | FChT
1 | $x^*(t)$ | $X^*(-f;a)$
2 | $x(-2/a-t)$ | $X(f;a)$
3 | $x(-t)$ | $X(-f;-a)$
4 | $x(t)$ is real | $X(f;a) = X^*(-f;a)$
5 | $x(t)$ is imaginary | $X(f;a) = -X^*(-f;a)$
6 | $\lambda_1 x(t)+\lambda_2 y(t)$ | $\lambda_1 X(f;a)+\lambda_2 Y(f;a)$
7 | $x(t)\,e^{j2\pi n\varphi_a(t)}$ | $X(f-n;a)$
8 | $x(t)\,y(t)/\sqrt{|1+at|}$ | $X(f;a) * Y(f;a)$ (convolution in $f$)

Given that perfect synthesis is constrained to finite-time support conditions, the FChT basis (10) is not strictly orthogonal within the whole time domain. Thus, Parseval's theorem does not strictly hold. This fact can be intuitively argued by the "sink" effect around the singular focal point. Furthermore, the ambiguity in the reconstruction, which implies that two different signals with different energy values can have the same FChT (see property No. 2 and Fig. 2), clearly stresses that limitation. However, if the signal has finite time support according to (13), Parseval's theorem

$$\int_{-1/a}^{\infty} x(t)\,y^*(t)\,dt = \int_{-\infty}^{\infty} X(f;a)\,Y^*(f;a)\,df \tag{31}$$

does hold regardless of the chirp rate $a$. If $y(t)=x(t)$, (31) states the equivalence between the energy in the time and frequency domains.

3.3. Analysis of harmonically related chirplets

Let $x(t)$ be a signal composed of harmonically related Gaussian chirplets, that is, chirplets whose phase is an integer multiple of a common phase $\phi(t)$:

$$x(t) = \sum_k \frac{a_k}{\sqrt[4]{\pi R_k^2}}\,e^{-\frac{1}{2}\left(\frac{t-t_k}{R_k}\right)^2}\,e^{jk\phi(t)}. \tag{32}$$

Here $k$ is the harmonic index, and $t_k$ and $R_k$ are, respectively, the time location and time spread of the $k$th harmonic, whose energy is $|a_k|^2$. Assuming that the evolution of the IF of any harmonic follows a linear trajectory, the fundamental phase results in

$$\phi(t) = 2\pi\int_0^t f_0(1+\gamma\tau)\,d\tau = 2\pi f_0\left(1+\tfrac{1}{2}\gamma t\right)t, \tag{33}$$

where $f_0$ is the instantaneous fundamental frequency (or pitch) at $t=0$, and $\gamma$ is the so-called pitch rate. The TF energy of signal (32) is illustrated schematically in Fig. 3. The signal (32), which is referred to here as "chirp-periodic", describes several natural sounds, such as short segments of human speech [17] and the song of some mammals [18]. Based on the Gaussian chirplet definition (2), $x(t)$ can be written (with the constant phase $e^{jk\phi(t_k)}$ absorbed into $a_k$) as

$$x(t) = \sum_k a_k\,g_{R_k,t_k,\nu_k,b_k}(t), \tag{34}$$

where

$$\nu_k = kf_0(1+\gamma t_k), \tag{35a}$$

$$b_k = kf_0\gamma. \tag{35b}$$

Different $t_k$ and $R_k$ parameters allow an independent energy distribution for each harmonic (required, for instance, in an accurate description of speech [17]).

We are now interested in obtaining the analytical description of the chirp-periodic signal (34) under the FChT. Since the squared magnitude of the FChT is equivalent to the fan marginalization of the WD of the normalized signal $w(t)=\sqrt{|1+at|}\,x(t)$, the WD of the multi-component signal $x(t)$ is required first⁶. This WD is composed of the positive contribution of the Gaussian chirplets and the cross-terms among them

$$\mathrm{WD}_x(t,f) = \sum_k C_{k,k}(t,f) + \sum_i\sum_{j\neq i} C_{i,j}(t,f), \tag{36}$$

where

$$C_{i,j}(t,f) = 2|a_i||a_j|\cos\!\big(2\pi(i-j)f_0\varphi_\gamma(t)\big)\,\exp\!\left(-\frac{1}{2}\left[\frac{(t-t_i)^2}{R_i^2}+\frac{(t-t_j)^2}{R_j^2}\right]\right)\exp\!\left(-4\pi^2\tilde R_{ij}^2\left[f-\frac{i+j}{2}f_0\varphi_\gamma'(t)\right]^2\right) \tag{37}$$

and $\tilde R_{ij}^2 = 2R_i^2R_j^2/(R_i^2+R_j^2)$⁷. Here $\varphi_\gamma(t)$ denotes the mapping (9) with $a=\gamma$, so that $\phi(t)=2\pi f_0\varphi_\gamma(t)$.

⁶ Note that the WD of the normalized signal $w(t)$ can be approximated by $\mathrm{WD}_w(t,f)\approx|1+at|\,\mathrm{WD}_x(t,f)$ because the input signal $x(t)$ is assumed to fluctuate much faster than the factor $\sqrt{|1+at|}$.
⁷ In case $i\neq j$, Eq. (37) is an accurate approximation.
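To make (32)–(35) concrete, the following Python sketch synthesizes a chirp-periodic signal and checks, by a central phase difference, that the IF of the $k$th harmonic at $t_k$ equals $\nu_k$ of (35a). It also confirms the fan geometry behind Fig. 3: every harmonic IF $kf_0(1+\gamma t)$ vanishes at $t=-1/\gamma$. The function names, the tuple layout $(k, a_k, t_k, R_k)$ for the harmonics and all numbers are hypothetical choices for illustration only.

```python
import cmath
import math


def fundamental_phase(t, f0, gamma):
    """phi(t) = 2 pi f0 (1 + gamma t / 2) t, Eq. (33)."""
    return 2.0 * math.pi * f0 * (1.0 + 0.5 * gamma * t) * t


def chirp_periodic(t, f0, gamma, harmonics):
    """Evaluate Eq. (32); `harmonics` is a list of (k, a_k, t_k, R_k) tuples (hypothetical layout)."""
    total = 0j
    for k, amp, tk, rk in harmonics:
        envelope = math.exp(-0.5 * ((t - tk) / rk) ** 2) / (math.pi * rk ** 2) ** 0.25
        total += amp * envelope * cmath.exp(1j * k * fundamental_phase(t, f0, gamma))
    return total


def chirplet_params(k, f0, gamma, tk):
    """(nu_k, b_k) of Eq. (35): IF at t_k and frequency variation rate of harmonic k."""
    return k * f0 * (1.0 + gamma * tk), k * f0 * gamma


def numerical_if(signal, t, h=1e-5):
    """IF in Hz from the phase increment over [t - h, t + h] (single-component signal)."""
    ratio = signal(t + h) * signal(t - h).conjugate()
    return cmath.phase(ratio) / (2.0 * math.pi * 2.0 * h)


f0, gamma = 100.0, 0.5          # pitch at t = 0 (Hz) and pitch rate (1/s), illustrative
k, tk, rk = 3, 0.02, 0.01       # a single third harmonic
single = [(k, 1.0, tk, rk)]
nu_k, b_k = chirplet_params(k, f0, gamma, tk)
measured = numerical_if(lambda t: chirp_periodic(t, f0, gamma, single), tk)
print(nu_k, b_k, measured)      # nu_k and the measured IF agree

# Fan geometry: all harmonic IFs k f0 (1 + gamma t) meet at zero for t = -1/gamma.
focal = -1.0 / gamma
print([kk * f0 * (1.0 + gamma * focal) for kk in range(1, 6)])
```

Because the phase (33) is quadratic, the central difference recovers the IF essentially exactly; a multi-harmonic signal would need band-limited separation before such a measurement.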
The squared magnitude of the FChT of the multi-component signal thus results in

$$|X(f;a)|^2 = \sum_k |G_k(f;a)|^2 + C(f;a), \tag{38}$$

where $G_k(f;a)$ is the FChT of the $k$th harmonic and $C(f;a)$ contains the fan marginal of all cross-terms. Appendix A contains the analysis of the positive terms in (38): the fan marginalization of the $k$th harmonic has a nearly Gaussian shape with mean and variance, respectively,

$$\mu_k = kf_0\,\frac{1+\gamma t_k}{1+at_k}, \tag{39a}$$

$$\sigma_k^2 = \frac{R_k^{-2}}{8\pi^2(1+at_k)^2}+\frac{(R_kkf_0)^2(\gamma-a)^2}{2(1+at_k)^4}. \tag{39b}$$

On the other hand, the fan marginalization of the cross-terms can be likewise addressed based on the previous result: the cross-terms have a Gaussian envelope and follow the same fan TF geometry as the positive terms; the major difference is, however, the oscillating nature of the TF energy described by the first factor in (37).

The uncertainty principle [1] provides a lower bound to the accuracy of the joint TF resolution

$$\sigma_t\,\sigma_f \geq \frac{1}{4\pi}, \tag{40}$$

where $\sigma_t$ and $\sigma_f$ are the spreads of the time and frequency marginals, respectively. In the case of the $k$th harmonic of the multi-component signal $x(t)$, the spectral dispersion $\sigma_f^2(a)$ is given by (39b), and its time dispersion (referred to the "warped" harmonic) is given by

$$\sigma_t^2(a) \approx \frac{1}{2}R_k^2\,|\varphi_a'(t_k)|^2. \tag{41}$$

Then, the duration–bandwidth product results in

$$\sigma_t(a)\,\sigma_f(a) \approx \frac{1}{4\pi}\sqrt{1+\kappa(\gamma-a)^2}, \tag{42}$$

where $\kappa = 4\pi^2k^2R_k^4f_0^2/(1+at_k)^2 > 0$. The minimum of (42) clearly takes place for $a=\gamma$, where the duration–bandwidth product is close to the lower bound of the uncertainty principle. Two conclusions arise from the last results:

• The CT (1) and the FrFT (4) can reach the lower bound of the uncertainty principle for a single Gaussian chirplet. On the contrary, neither is capable of reaching that resolution for all harmonics of a linear chirp-periodic signal in the same transform instance, as the FChT does. Although this statement may seem obvious at this point, previous works [19,20] surprisingly suggest the use of the FrFT for the analysis of chirp-periodic signals.
• Since the cross-terms (37) follow the same fan TF geometry as the positive terms, and the same TF geometry intrinsic in the FChT marginalization when $a=\gamma$, it is simple to deduce that the cross-term interference among the harmonics in the FChT is minimum and, given the oscillatory nature of the cross-terms' TF energy, irrelevant.

4. Prior related contributions

The FChT compares seamlessly against the Chirplet [10], the fractional Fourier [12], and the Fourier transforms, all yielding the marginalization of the TF plane along different straight line geometries (see Fig. 1 again for an illustrative comparison). Additionally, it is important to address here previous works [19–22] related to the FChT to a larger or lesser extent.

The work [19] proposes the so-called Harmonic fractional Fourier transform (HFT)

$$\mathrm{HFT}(\omega) \triangleq \int_{-\infty}^{\infty} x(t)\,e^{-j\omega(1+At)t}\,dt \tag{43}$$

as analysis method for speech segments with non-stationary pitch. Here $A$ is the so-called base tone, clearly related to the analysis chirp rate $a$. From the analysis formula (43) it is obvious that the HFT (43) and the FChT (8) are closely related to each other. One important difference between both is the normalization term $\sqrt{|1+at|}$, which makes only the FChT fulfil Parseval's theorem. However, the major difference comes from the interpretation of Eq. (43) given in [19] by its authors: the FrFT (4) is suggested as the "natural choice for fast evaluation" of the integral (43), arriving at the conclusion that the required rotation of the FrFT is $\tan(\theta)=\omega A$. This
correct geometrical correspondence tells us that an FrFT instance delivers only one single HFT frequency bin, namely, for $\omega=\tan\theta/A$. Thus, the evaluation of the integral (43) requires as many FrFT instances as frequency bins to be evaluated. This exhaustive processing mechanism, seconded by recent works [20], implicitly relates rotations on the TF plane with the fan TF geometry of chirp-periodic signals, a relation that is not conceptually appropriate.
Another work closely related to the FChT is indeed found in [21]. In spite of addressing analytically the practical boundaries of the chirp rate, the work [21] does not provide theoretical insights on the proposed time-warper.

The work [22] proposes the so-called local polynomial Fourier transform (LPFT)

$$\mathrm{LPFT}(\boldsymbol{\xi}) = \int_{-\infty}^{\infty} x(t)\,e^{-j\varphi_{\mathrm{LP}}(t;\boldsymbol{\xi})}\,dt, \tag{44a}$$

where the phase of the analysis basis is an $m$-order polynomial

$$\varphi_{\mathrm{LP}}(t;\boldsymbol{\xi}) \triangleq \omega_1t+\frac{1}{2}\omega_2t^2+\cdots+\frac{1}{m!}\omega_mt^m \tag{44b}$$

and $\boldsymbol{\xi}=(\omega_1,\ldots,\omega_m)$ are the polynomial coefficients. At first sight one might be tempted to include the FChT in the second-order LPFT. However, a more careful look reveals that the second-order LPFT actually corresponds to the Chirplet transform (1), where $\omega_1=2\pi f$ is the frequency and $\omega_2=2\pi b$ is the frequency variation rate⁸. Consequently, the second-order LPFT yields a marginalization geometry (Fig. 1b) different from that of the FChT (Fig. 1d). Even in the case of a general order $m$, the marginalization takes place (loosely speaking) along curved "parallel" paths, but never with a fan geometry. As a major consequence, neither the LPFT nor the Chirplet transform is indicated for the analysis of harmonically related chirplets (as the seminal [22] and applied works [23] confirm). It is important to remark again that harmonically related chirplets, the fan marginalization of the WD and the Fourier transform of the $\varphi_a(t)$-based time-warping are conceptually related.

The relation between the FChT and TFDs may be suggested as a required exercise. However, it is necessary to remark that the FChT itself is a transform, and not a TFD. On the other hand, the time-warping operation $\varphi_a(t)$ (9) can be linked to recent TFDs, such as the GWCC [16], whose simplest case, the generalized warped Wigner distribution (GWD) [8], is defined by

$$\mathrm{GWD}_x(t,f) \triangleq \mathrm{WD}_{x(\varphi_a^{-1}(t))}\!\left(\varphi_a(t),\,\frac{f}{\varphi_a'(t)}\right). \tag{45}$$

This GWD yields a different distribution of the cross-terms with respect to that delivered by the plain WD. This can be easily understood by parsing the GWD mechanism: the signal $x(t)$ is time-warped according to $\varphi_a^{-1}(t)$, thus resulting in "stationary" harmonics, then the WD of the result is taken, and finally the bidimensional representation is morphed accordingly. If the adequate kernel in the GWCC is selected, the result can be a nearly cross-term-free TF representation. That is clearly achieved only if the signal $x(t)$ has a fan TF geometry and if the analysis chirp rate $a$ matches that geometry. On the contrary, Cohen's class TFDs are not suited to the fan geometry but to the slanted geometry, such as in [24]. Further comments on warped TFDs are beyond the scope of this paper.

⁸ Likewise, one may argue that the LPFT is the extension of the Chirplet transform to polynomial laws of order higher than two.

5. Chirp rate estimation
The most important aspect of the FChT in practical scenarios concerns the adequacy of the law $\varphi_a(t)$ to the actual TF characteristics of the signal. Signals with fan geometry are found in practice only in short segments, such as in the case of speech [17] or the song of some mammals [18]. In that sense, two options are at hand: either to use a warping function $\varphi(t)$ with more degrees of freedom than $\varphi_a(t)$ for matching the possible non-linear geometry, or to parse the signal into short segments and analyze them independently with the FChT. The first option directs the attention back to the general warping operator (7) described in the Introduction. Due to its inherent complexity, this general option is beyond the scope of this paper. The second option refers to the use of the FChT as a generalization of the spectrogram [1] (which is indeed a TFD), that is,

$$C_x(\tau,f) = \big|\mathrm{FChT}\{w(t)\,x(t+\tau);\;a(\tau)\}\big|^2, \tag{46}$$

where $w(t)$ is the analysis window and $a(\tau)$ is the analysis chirp rate for the segment centered at time $\tau$. The analysis window must be a Gaussian-like window, such as Gaussian or Hamming, in order for the existing chirped components of the segment to become Gaussian chirplets, as suggested in Section 3.3. This segment-wise processing brings interesting advantages, namely, the spectrogram is known to be the TFD with the least cross-term interference⁹, and the chirp rate is independent for each segment and can be set accordingly to obtain the best resolution for the segment. This segment-wise processing is common in on-line applications and is the preferred methodology for the practitioner. The need to estimate the chirp rate $a(\tau)$ that best matches the TF characteristics of the segment is

⁹ Although the spectrogram is positive, it is not free of cross-term interference. In fact, these cross-terms correspond (in this case) to the fan marginalization of the WD cross-terms of each single segment $w(t)\,x(t+\tau)$.
doubtlessly the decisive factor on using the FChT- 1
spectrogram instead of the plain spectrogram. This
estimation can be approached with two different 0
methodologies: inter-frame and intra-frame.
1
15 10 5 0 5 10 15
5.1. Inter-frame
ms

Assuming that the signal presents a continuous 4


evolution of its fundamental frequency f 0 ðt Þ,
according to the IF of the fan geometry (11), the 3
best estimation of the pitch rate is

kHz
2
0
aðt f 0ðtÞ, (47)
Þ¼ 1
f 0 ðtÞ
0
where f 0ð tÞ is time derivative of f 0ðtÞ. Thus, 0
the intuitive approach is to quantify the evolution 0.5 0 0.5

of the pitch f 0 tð Þand then compute the chirp rate
as (47).
The estimation of the pitch f 0 is a popular
problem. Given the broad spectrum of pitch
estimation methodologies [25], it is not our aim to
focus here on a particular pitch estimation algo- 
0.5 0 0.5
rithm and elaborate it further. On the other hand,

since the short-based processing in (46) is
commonly
carried out only at certain instants $t = nS$, based on a shift interval $S$, the estimated pitch on the neighboring segments around the $n$th can be used to obtain the pitch rate as¹⁰

$$\alpha[n] = \frac{f_0[n+1] - f_0[n-1]}{2 S f_0[n]}. \tag{48}$$

Note that estimating the chirp rate of the $n$th segment with (48) requires the pitch of the next segment. This "non-causal" method implies stepping one segment back to recompute the FChT of the $n$th segment once the pitch of the $(n+1)$th segment is available. More details of a procedure inspired by this approach can be found in [17].

¹⁰ Note that this estimation method turns out to be much simpler than the FrFT-based approach in [19,20].

Fig. 4. Example of dense $(\alpha, f)$ plane. From top to bottom: the analysis signal, the $(\alpha, f)$ plane, and its vertical marginalization. The $(\alpha, f)$ plane is in logarithmic scale. The chirp rate $\alpha$ has been scaled by the signal length.

5.2. Intra-frame

While in the inter-frame approach pitch information from adjacent segments is available, the intra-frame methodology relies only on the current segment. Computing a dense $(\alpha, f)$ plane turns out to be the most intuitive methodology. Fig. 4 contains an example of such a plane: the analysis signal corresponds to a real speech segment; the $(\alpha, f)$ plane reveals a detailed harmonic representation for positive chirp rate values; the vertical projection of that plane has a maximum at $\alpha \approx 0.3$, which can be adopted as the estimate of the pitch rate $\gamma$. The $(\alpha, f)$ plane also reveals the "bowtie"-shaped spread typical of chirp-based transforms, an unequivocal sign of redundancy.

In order to avoid the large computational load required to obtain the redundant $(\alpha, f)$ plane, a different methodology, proposed in [18], consists of computing $L$ FChT instances (associated with different chirp rates $\alpha_i$) and estimating the chirplet parameters from the $L$ FChT "views". We will show that three views ($L = 3$) are sufficient for that purpose. According to the harmonic representation in (39), the $k$th harmonic in the $i$th FChT view is centered at frequency $\mu_{k,i}$ and has a width $\sigma_{k,i}$ according to

$$\mu_{k,i} = k f_0\, \frac{1 + \gamma\tau_k}{1 + \alpha_i\tau_k}, \tag{49a}$$

$$\sigma_{k,i}^2 = \frac{R_k^{-2}/(8\pi^2)}{(1 + \alpha_i\tau_k)^2} + \frac{(R_k k f_0)^2 (\alpha_i - \gamma)^2}{2(1 + \alpha_i\tau_k)^4}. \tag{49b}$$
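The view model (49a)–(49b) lends itself to a quick numerical check. The sketch below (Python with NumPy, an assumption of this illustration) evaluates the centre and width of one harmonic in three views and then applies the closed-form inversions (50) and (53) given in the following bullets to recover $\tau_k$ and $k f_0$. The function names and every parameter value (harmonic index, $f_0$, $\gamma$, $\tau_k$, $R_k$, view chirp rates $\alpha_i$) are hypothetical choices, not values from the paper, and $\gamma$ is taken as known when (53) is applied.

```python
import numpy as np

def view_model(k, f0, gamma, tau_k, R_k, alphas):
    """Centre mu_{k,i} (49a) and width sigma_{k,i} (49b) of the k-th
    harmonic in FChT views computed with chirp rates alphas[i]."""
    alphas = np.asarray(alphas, dtype=float)
    d = 1.0 + alphas * tau_k                          # (1 + alpha_i * tau_k)
    mu = k * f0 * (1.0 + gamma * tau_k) / d           # (49a)
    var = (R_k**-2 / (8 * np.pi**2)) / d**2 \
        + (R_k * k * f0)**2 * (alphas - gamma)**2 / (2 * d**4)   # (49b)
    return mu, np.sqrt(var)

def central_instant(mu_i, mu_j, a_i, a_j):
    """Central instant tau_k from two views, eq. (50)."""
    return (mu_i - mu_j) / (mu_j * a_j - mu_i * a_i)

def harmonic_frequency(mu_i, mu_j, a_i, a_j, gamma, tau_k):
    """Harmonic frequency k*f0 from two views, eq. (53)."""
    return mu_i * mu_j * (a_j - a_i) / ((1 + gamma * tau_k) * (mu_j * a_j - mu_i * a_i))

# Hypothetical harmonic: k = 3, f0 = 200 Hz, gamma = 2 1/s,
# centred at tau_k = 0.01 s with Gaussian spread R_k = 0.02 s.
alphas = [0.0, 1.0, 3.0]
mu, sigma = view_model(3, 200.0, 2.0, 0.01, 0.02, alphas)
tau_hat = central_instant(mu[0], mu[1], alphas[0], alphas[1])
kf0_hat = harmonic_frequency(mu[0], mu[2], alphas[0], alphas[2], 2.0, tau_hat)
# tau_hat and kf0_hat reproduce 0.01 s and 600 Hz up to floating-point rounding
print(tau_hat, kf0_hat)
```

Two views already fix $\tau_k$ and, once $\gamma$ is available, $k f_0$; the third view is what the parabola step (51)–(52) needs to pin down $\gamma$ itself.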
Assuming that the centers and spreads $C = \{\mu_{k,i}, \sigma_{k,i}\}_{i=1,\ldots,L}$ are available, it is possible to estimate the $k$th harmonic characteristics as follows:

• The central instant $\tau_k$ can be obtained as

$$\tau_k = \frac{\mu_{k,i} - \mu_{k,j}}{\mu_{k,j}\,\alpha_j - \mu_{k,i}\,\alpha_i}. \tag{50}$$

At least two views are required to estimate $\tau_k$.
• By substituting (49a) into (49b), we can write

$$\rho_i \triangleq 2\sigma_{k,i}^2 (1 + \alpha_i\tau_k)^2 = \frac{1}{4\pi^2 R_k^2} + \frac{R_k^2\,\mu_{k,i}^2}{(1 + \gamma\tau_k)^2}\,(\alpha_i - \gamma)^2. \tag{51}$$

The pairs $(\alpha_i, \rho_i)$ correspond to the samples of a second-order polynomial, $\rho = a\alpha^2 + b\alpha + c$. It is simple to see that the minimum of that parabola corresponds to the chirp rate $\gamma$, i.e.,

$$\gamma = -\frac{b}{2a}. \tag{52}$$

Likewise, the spread $R_k$ can be obtained from the parabola coefficients. Three points are required to obtain the coefficients $(a, b, c)$.
• The central frequency is estimated as

$$k f_0 = \frac{\mu_{k,i}\,\mu_{k,j}\,(\alpha_j - \alpha_i)}{(1 + \gamma\tau_k)(\mu_{k,j}\,\alpha_j - \mu_{k,i}\,\alpha_i)}. \tag{53}$$

The requirement of three different FChT instances is equivalent to that of the fast refinement for chirplet decomposition based on redundant dictionaries proposed in [7], in which three projective chirp views are used to estimate the chirplet parameters.

Despite the simplicity of the inverse problem in (50)–(53), the question that immediately arises is how to obtain the set $C$ of centers and spreads for all harmonics. This question has been addressed in [18]. That approach is based on decomposing each FChT view into Gaussians so that each triple of Gaussians (one from each view) fulfils (39). This is achieved with a Gaussian-fitting algorithm driven by the expectation maximization (EM) algorithm, the procedure being constrained to (50)–(53). This iterative mechanism converges to the harmonically related chirplets present in the signal. Although the computational load of the algorithm is not low, mainly due to EM, the mechanism admits a parallel implementation and somewhat resembles biological processing [26]. For the sake of clarity, the details are beyond the scope of this paper; the interested reader is referred to that work.

5.3. Inter versus Intra

In this section we have outlined some in-house methods for estimating the optimal chirp rate of a given segment based on two different methodologies, each with its own strengths and limitations.

The inter-frame methodology is mainly conditioned by the accuracy of the pitch tracking. Inter-frame methods other than those based on pitch estimation are not foreseen as promising or competitive, essentially because the pitch and its change over time are the main descriptive parameters of quasi-periodic or chirp-periodic signals. Although the pitch tracking task is not especially costly, it should not be underestimated, especially if the signal segment contains disturbing components, such as background noise of any type.

On the contrary, the intra-frame approach does not rely on temporal information, but only on the information within the segment itself. Although computing a dense $(\alpha, f)$ plane is the most intuitive methodology, the intrinsic redundancy and the resulting computational overload call for alternative methods. The alternative method outlined in this section uses very few FChT instances, thus avoiding redundant information. However, the large computational load required to estimate the chirplets present in the signal discourages its use in on-line applications.

At the risk of leaving this comparison without a clear conclusion, let us mention that our current research is mainly focused on inter-frame methodologies, and in particular on how to track the pitch in more general scenarios, such as a signal corrupted with background noise, or even with another interfering harmonic signal.

6. Discrete-time formulation

In analogy to the discrete-time Fourier transform, the discrete-time FChT could be formulated as the continuous-time transform of the signal

$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,\delta(t - nT_s), \tag{54}$$

where $x[n]$ is the discrete-time signal and $T_s$ is the sampling interval. This way of proceeding results in

$$X(\Omega, \hat{\alpha}) = \sum_{n=-\infty}^{\infty} x[n]\,\sqrt{|1 + \hat{\alpha} n|}\; e^{-j\Omega\left(1 + \frac{1}{2}\hat{\alpha} n\right) n}, \tag{55}$$
where $n$ is discrete time, $\Omega$ is frequency, and the analysis chirp rate $\hat{\alpha}$ is the discrete counterpart of the chirp rate $\alpha$, that is,

$$\hat{\alpha} = \alpha T_s. \tag{56}$$

Likewise, frequency $\Omega$ is related to its continuous-time counterpart $f$ as $\Omega = 2\pi f T_s$.

Although "good-looking", the suggested discrete-time formulation (55) has a major drawback: the undesired spectral overlapping of the chirp basis, a well-known problem in chirp-based transforms. Therefore, an aliasing-free formulation may result from defining $x(t)$ as

$$x(t) = \sum_{n=-\infty}^{\infty} x[n]\,p(t - nT_s), \tag{57}$$

where $p(t)$ is the interpolation filter, which is to be set to the ideal case

$$p(t) = \frac{\sin(\pi t/T_s)}{\pi t/T_s}. \tag{58}$$

This yields the aliasing-free expression

$$X(\Omega, \hat{\alpha}) = \sum_{n=-\infty}^{\infty} x[n]\,W_{\hat{\alpha}}^{*}(\Omega, n), \tag{59}$$

where the basis corresponds to the bulky expression

$$W_{\hat{\alpha}}(\Omega, n) = \int_{-\infty}^{\infty} p(t - n)\,\sqrt{|1 + \hat{\alpha} t|}\; e^{j\Omega\phi_{\hat{\alpha}}(t)}\,dt. \tag{60}$$

Unfortunately, the integral in (60) has no explicit solution and thus (59) is of little help.

In practice, the signal $x[n]$ has finite time support. Furthermore, the focal point is to be placed outside the time support interval (as concluded in previous sections).¹¹ These facts, along with the use of the Fourier-based evaluation presented in Section 2.2, give rise to compact processing mechanisms.

6.1. Digital evaluation of the FChT

Let $x[n]$, $n = \{0, \ldots, N-1\}$, have $N$ samples. We associate the discrete time index $n$ with continuous time instants as follows:

$$t_n = \left(n - \frac{N-1}{2}\right) T_s. \tag{61}$$

These time instants are equidistant, the equivalent segment being centered on $t = 0$ and having a length of $T = N T_s$. Note that the instants $t_n$ all lie within the interval $[-T/2, T/2]$. Let $x(t)$ be the equivalent continuous signal

$$x(t) = \sum_{n=0}^{N-1} x[n]\,h(t - t_n), \tag{62}$$

where $h(t)$ is an interpolation filter such that $h(0) = 1$ and $h(kT_s) = 0$ for integer $k \neq 0$. Since the focal point is to be outside that interval, the chirp rate $\alpha$ must fulfil

$$|\alpha| < \frac{2}{T} \;\Longleftrightarrow\; |\hat{\alpha}| < \frac{2}{N}. \tag{63}$$

The Fourier-based evaluation of the FChT (Section 2.2) involves resampling the signal $x(t)$ according to the warping law

$$\psi_\alpha(t) = \frac{-1 + \sqrt{1 + 2\alpha t}}{\alpha}. \tag{64}$$

The resampled signal $\tilde{x}(t)$ (19) is defined within the interval $[\phi_\alpha(-T/2), \phi_\alpha(T/2)]$, which also has length $T$. The time instants $\tilde{t}_n$ where the samples of the warped signal $\tilde{x}[n]$ are to be located are obtained as

$$\tilde{t}_n = \phi_\alpha(-T/2) + \left(n + \frac{1}{2}\right)\frac{T}{M}, \tag{65}$$

where $M$ is the number of samples of $\tilde{x}[n]$, and the index spans $n = \{0, \ldots, M-1\}$. Note that the resampled signal $\tilde{x}[n]$ has a different number of samples than $x[n]$. Thus, $\tilde{x}[n]$ results in

$$\tilde{x}[n] = \sum_{\ell=0}^{N-1} \frac{x[\ell]}{\sqrt{|1 + \alpha t_\ell|}}\; h\big(\psi_\alpha(\tilde{t}_n) - t_\ell\big). \tag{66}$$
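The time bookkeeping in (61)–(65) can be checked with a minimal sketch. It assumes NumPy and measures time in samples ($T_s = 1$, hence $T = N$ and $\alpha = \hat{\alpha}$); the helper names are hypothetical. It builds the original grid $t_n$, the uniform warped grid $\tilde{t}_n$, and maps $\tilde{t}_n$ back through $\psi_\alpha$ to the positions where (66) interpolates $x(t)$.

```python
import numpy as np

def phi(t, alpha):
    """Fan warping phi_alpha(t) = (1 + alpha*t/2) * t."""
    return (1.0 + 0.5 * alpha * t) * t

def psi(t, alpha):
    """Inverse warping law psi_alpha(t) of eq. (64); equals t for alpha = 0."""
    t = np.asarray(t, dtype=float)
    if alpha == 0.0:
        return t
    return (-1.0 + np.sqrt(1.0 + 2.0 * alpha * t)) / alpha

def fcht_grids(N, M, alpha):
    """Return t_n (61), the warped instants (65), and psi_alpha of the latter."""
    if abs(alpha) >= 2.0 / N:                       # focal point condition (63)
        raise ValueError("focal point falls inside the segment")
    T = float(N)
    t = np.arange(N) - (N - 1) / 2.0                # (61), in units of Ts
    t_warp = phi(-T / 2, alpha) + (np.arange(M) + 0.5) * T / M   # (65)
    return t, t_warp, psi(t_warp, alpha)

t, t_warp, pos = fcht_grids(N=64, M=128, alpha=1.0 / 64)
print(phi(32.0, 1 / 64) - phi(-32.0, 1 / 64))       # warped interval length: 64.0
```

Because $\psi_\alpha$ is monotonic and the warped instants lie strictly inside $[\phi_\alpha(-T/2), \phi_\alpha(T/2)]$, every interpolation position falls inside $[-T/2, T/2]$, which is why the check on (63) is made first.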
Finally, taking the Fourier transform of $\tilde{x}[n]$ yields the FChT of $x[n]$. Two issues in the previous procedure deserve special attention.

¹¹ Note that this type of restriction is common in other chirp-based transforms, such as the FrFT (4), in which the TF content of the signal is to be confined within a circle [27].

6.1.1. Interpolation filter $h(t)$

The requirement of $h(0) = 1$ and $h(kT_s) = 0$ for $k \neq 0$ makes the ideal filter (58) a possible candidate. However, that choice implies that evaluating each sample $\tilde{x}[n]$ in (66) requires $N$ operations, which is clearly an expensive option. Thus, at the cost of sacrificing ideal interpolation, we suggest a shorter interpolation filter, such as a first-order filter or the cubic Hermite spline, whose normalized definition is

$$h(t) = \begin{cases} \frac{3}{2}|t|^3 - \frac{5}{2}|t|^2 + 1, & 0 \le |t| < 1, \\ -\frac{1}{2}|t|^3 + \frac{5}{2}|t|^2 - 4|t| + 2, & 1 \le |t| < 2, \\ 0, & |t| \ge 2. \end{cases} \tag{67}$$

With cubic Hermite interpolation, only four samples of $x[n]$ are used to compute each $\tilde{x}[n]$; with linear interpolation, only two.
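The kernel (67) and the four-tap evaluation of (66) that it permits can be sketched as follows. NumPy is assumed, the function names are hypothetical, positions are expressed in sample units (unit sampling interval), and samples outside the segment are treated as zero, which is a choice of this sketch rather than something the text prescribes.

```python
import numpy as np

def hermite_kernel(t):
    """Cubic Hermite interpolation kernel of eq. (67), unit sample spacing."""
    a = np.abs(np.asarray(t, dtype=float))
    return np.where(a < 1, 1.5 * a**3 - 2.5 * a**2 + 1,
                    np.where(a < 2, -0.5 * a**3 + 2.5 * a**2 - 4 * a + 2, 0.0))

def interpolate(x, pos):
    """Evaluate sum_l x[l] h(pos - l) using only the four nearest samples.

    Samples with index outside 0..len(x)-1 are treated as zero.
    """
    x = np.asarray(x)
    pos = np.asarray(pos, dtype=float)
    base = np.floor(pos).astype(int)
    out = np.zeros(pos.shape, dtype=np.result_type(x, float))
    for off in (-1, 0, 1, 2):                 # the four taps where h != 0
        idx = base + off
        valid = (idx >= 0) & (idx < len(x))
        out[valid] += x[idx[valid]] * hermite_kernel(pos[valid] - idx[valid])
    return out

print(interpolate(np.arange(10.0), [4.3]))    # a linear ramp is reproduced: ~4.3
```

The kernel meets the conditions $h(0) = 1$ and $h(k) = 0$ for integer $k \neq 0$, and its integer shifts sum to one, so constant and linear signals are reproduced exactly away from the segment edges.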

6.1.2. Length $M$

The warping rule (64) has a slope greater than one in one half of the time interval, that is, $\psi_\alpha'(t) > 1$ for $\alpha t < 0$. This implies that the signal $x(t)$ is "undersampled" in that region, which leads to undesired aliasing effects (equivalent to those mentioned for (55)). This aliasing is reduced or even suppressed by setting the length $M$ to a proper value. It is clear, then, that $M > N$.

Fig. 5 illustrates the effects of the time warping on the TF content of the signal $x(t)$. The dark stripes of Fig. 5a represent harmonically related chirplets (cross-terms are not depicted); since the signal $x(t)$ results from the synthesis of a discrete signal $x[n]$, its spectral content is limited to frequency $B = 1/(2T_s)$. The TF content of the warped signal $\tilde{x}(t)$ spans to a higher frequency, as Fig. 5b shows. In order to have $N$ aliasing-free bins (out of $M$), the length $M$ is to be set as

$$M \ge \frac{1 - |\hat{\alpha}|N/4}{1 - |\hat{\alpha}|N/2}\,N. \tag{68}$$

The frequency that corresponds to the length (68) is marked with a solid line in Fig. 5b. This length gives rise to spectral overlapping (shaded in that figure) only in the remaining $M - N$ bins, which in many practical cases do not contain valuable information and can thus be disregarded in further processing.¹²

Fig. 5. Time–frequency plane of (a) $x(t)$ and (b) warped signal $\tilde{x}(t)$. The TF content of $\tilde{x}(t)$ spans to a higher frequency. The example corresponds to a chirp rate $\alpha = 0.5/T$. [Panel (a): time from $-T/2$ to $T/2$, frequency up to $B$; panel (b): time from $\phi_\alpha(-T/2)$ to $\phi_\alpha(T/2)$, frequency up to $B/(1 - |\alpha|T/2)$.]

The inverse FChT results from taking the inverse Fourier transform of the $M$ samples and dewarping the signal $\tilde{x}[n]$ to obtain the original $x[n]$. The inverse of the resampling (66) clearly exists, especially given that $M > N$. However, that inverse process involves a sparse matrix pseudo-inversion, which implies large computational costs. An approximate inverse warping is based on resampling $\tilde{x}[n]$ with the same interpolation filter $h(t)$, as follows:

$$y[n] = \sqrt{|1 + \alpha t_n|}\,\sum_{\ell=0}^{M-1} \tilde{x}[\ell]\; h\big(\phi_\alpha(t_n) - \tilde{t}_\ell\big). \tag{70}$$

Because of the approximate dewarping (70) and the use of the length $M$ in (68), the reconstructed signal $y[n]$ differs from the original $x[n]$. We can then speak of a coding/decoding residual error, whose magnitude depends on the interpolation filter $h(t)$ and the chirp rate $\hat{\alpha}$. Note that for $\hat{\alpha} = 0$, the discrete FChT reduces to the DFT.

¹² On the other hand, in order to code the whole TF information, the discrete time must be resampled according to the length

$$M \ge \frac{N}{1 - |\hat{\alpha}|N/2}. \tag{69}$$

6.2. Computational load

Table 2 summarizes the digital computation of the FChT and its inverse. Both directions are composed of four main operations, which in the case of the direct transform correspond to:

• Normalization: The discrete signal $x[n]$ is weighted by a chirp-rate-dependent window. This task implies the window computation ($N$ operations) and the product between window and signal ($N$ operations).
• Warped index: The resulting weighted discrete signal is to be resampled according to the law $\psi_\alpha(t)$; the new time instants need to be computed ($M$ operations).
• Resampling: The discrete warped signal is obtained by interpolation. Using cubic Hermite spline interpolation (67), the computing load of this task amounts to $4M$.
• DFT computation: The DFT of the resampled signal is computed, resulting in the FChT. Assuming that the length $M$ is a power of two, the number of operations is of the order of $M \log M$.
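The four steps listed above, which Table 2 summarizes, can be chained into a compact forward transform. The following is a minimal sketch, not the authors' implementation: it assumes NumPy, measures time in samples ($T_s = 1$, so $T = N$ and $\alpha = \hat{\alpha}$), rounds the bound (68) up to a power of two, uses the cubic Hermite kernel (67), and treats samples outside the segment as zero. The function names are hypothetical.

```python
import numpy as np

def hermite_kernel(t):
    """Cubic Hermite interpolation kernel of eq. (67), unit sample spacing."""
    a = np.abs(t)
    return np.where(a < 1, 1.5 * a**3 - 2.5 * a**2 + 1,
                    np.where(a < 2, -0.5 * a**3 + 2.5 * a**2 - 4 * a + 2, 0.0))

def warped_length(N, alpha_hat):
    """Smallest power of two M satisfying the aliasing-free bound (68)."""
    r = abs(alpha_hat) * N
    m_min = N * (1 - r / 4) / (1 - r / 2)
    return 1 << int(np.ceil(np.log2(m_min)))

def fcht(x, alpha_hat):
    """Discrete FChT of x[n] following the four steps of Table 2 (sketch)."""
    x = np.asarray(x, dtype=complex)
    N = len(x)
    if abs(alpha_hat) >= 2.0 / N:
        raise ValueError("|alpha_hat| must stay below 2/N, see (63)")
    M = warped_length(N, alpha_hat)
    t = np.arange(N) - (N - 1) / 2.0                    # t_n, eq. (61)

    # 1) Normalization: z[n] = x[n] / sqrt(|1 + alpha t_n|)
    z = x / np.sqrt(np.abs(1.0 + alpha_hat * t))

    # 2) Warped index: uniform grid (65) mapped back through psi_alpha (64)
    start = (1.0 - 0.25 * alpha_hat * N) * (-0.5 * N)   # phi_alpha(-T/2)
    t_warp = start + (np.arange(M) + 0.5) * N / M
    if alpha_hat == 0.0:
        t_hat = t_warp
    else:
        t_hat = (-1.0 + np.sqrt(1.0 + 2.0 * alpha_hat * t_warp)) / alpha_hat

    # 3) Resampling with the four nearest samples, eq. (66)
    pos = t_hat + (N - 1) / 2.0                         # fractional sample index
    base = np.floor(pos).astype(int)
    z_warp = np.zeros(M, dtype=complex)
    for off in (-1, 0, 1, 2):
        idx = base + off
        ok = (idx >= 0) & (idx < N)                     # outside samples are zero
        z_warp[ok] += z[idx[ok]] * hermite_kernel(pos[ok] - idx[ok])

    # 4) DFT of the warped, resampled signal
    return np.fft.fft(z_warp)

# A fan chirp with 8 cycles over the warped interval, analysed with matching alpha.
N, a = 256, 1.0 / 256
t = np.arange(N) - (N - 1) / 2.0
X = fcht(np.exp(2j * np.pi * (8 / N) * (1 + 0.5 * a * t) * t), a)
print(len(X), int(np.argmax(np.abs(X))))                # 512 bins, peak at bin 8
```

For $\hat{\alpha} = 0$ the grid (65) coincides with (61), every kernel weight is 0 or 1, and the function returns the plain DFT, consistent with the remark after (70). When the analysis chirp rate matches the fan chirp, the warped signal becomes a stationary tone and its energy collapses into a single bin.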
Table 2
Digital computation of the fan-chirp transform and its inverse

Task             Description                                                          Operations
Fan-chirp transform
  Normalization  $z[n] = x[n]/\sqrt{|1 + \alpha t_n|}$                                $2N$
  Warped index   $\hat{t}_n = \psi_\alpha(\tilde{t}_n)$                               $M$
  Resampling     $\tilde{z}[n] = \sum_\ell z[\ell]\,h(t_\ell - \hat{t}_n)$            $4M$
  DFT            $X[k, \alpha] = \mathrm{DFT}\{\tilde{z}[n]\}$                        $M \log M$
Inverse fan-chirp transform
  iDFT           $\tilde{z}[n] = \mathrm{iDFT}\{X[k, \alpha]\}$                       $M \log M$
  Warped index   $\hat{t}_n = \phi_\alpha(t_n)$                                       $N$
  Resampling     $z[n] = \sum_\ell \tilde{z}[\ell]\,h(\tilde{t}_\ell - \hat{t}_n)$    $4N$
  Normalization  $y[n] = \sqrt{|1 + \alpha t_n|}\,z[n]$                               $2N$

The length $M$ is assumed to be a power of two.

The previous analysis of the computing requirements is approximate and does not correspond to single processor operations. Some tasks, such as the computation of the normalization window, will likely unfold into more processor instructions. Nonetheless, an exact computing load analysis is left as a programming-related exercise. Since we can fairly assume that both lengths $N$ and $M$ are comparable, the number of operations required in the FChT computation is approximately

$$(\log N + 7)\,N. \tag{71}$$

Therefore, the additional tasks cost nearly as much effort as the DFT operation itself. This fact clearly opens the door to the use of the FChT in real-time applications.

Finally, it should be remarked that the estimation of the best-fitting $\alpha$ is considered a process external to the FChT digital evaluation. Thus, its computational load is not addressed by Table 2.

7. Results

The first experiment provides a comparison among the CT, FrFT and FChT on a toy synthetic example. The synthetic signal is a train of non-equidistantly spaced pulses, arranged so that the fundamental frequency changes linearly; for the sake of clarity, the spectral envelope delineated by the harmonics is not flat. The mentioned transforms were applied to that signal in such a way that the resolution achieved around the fourth harmonic ($k = 4$) was the highest. The results are shown in Fig. 6. As proven in Section 3.3, the FChT is able to reach the maximum resolution for any harmonic in the same transform instance, while the CT and FrFT can achieve that resolution only for the selected harmonic. Unlike these last two transforms, the FT and FChT always deliver a symmetric spectrum.

Fig. 6. Linear-path marginalization-based transforms on a train of pulses with linearly varying pitch. Panels, from top to bottom: Fourier transform, chirplet transform, fractional Fourier transform, and fan-chirp transform. The horizontal axis corresponds to normalized frequency and the vertical one to the magnitude value.
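The text does not state how the toy pulse train was generated. One plausible construction, used here purely as an illustrative assumption, places a unit pulse wherever the fan phase $f_0\,\phi_\gamma(t)$ crosses an integer, so that the local pitch follows $f_0(1 + \gamma t)$ with $t = 0$ at the segment centre. The sketch assumes NumPy; the function name, the sampling rate and the parameter values are hypothetical, and the non-flat spectral envelope of the paper's signal is not reproduced.

```python
import numpy as np

def fan_pulse_train(f0, gamma, fs, duration):
    """Unit pulses at the instants where f0 * phi_gamma(t) is an integer.

    phi_gamma(t) = (1 + gamma*t/2) * t, so the local pitch is f0*(1 + gamma*t).
    Requires |gamma| * duration / 2 < 1 (focal point outside the segment).
    Returns the sampled signal and the exact pulse instants.
    """
    half = duration / 2.0
    phase_lo = f0 * (1 - 0.5 * gamma * half) * (-half)   # f0 * phi_gamma(-half)
    phase_hi = f0 * (1 + 0.5 * gamma * half) * half      # f0 * phi_gamma(+half)
    u = np.arange(np.ceil(phase_lo), np.floor(phase_hi) + 1) / f0
    # Invert phi_gamma with the warping law psi_gamma of eq. (64)
    t_pulses = u if gamma == 0 else (-1 + np.sqrt(1 + 2 * gamma * u)) / gamma
    x = np.zeros(int(round(duration * fs)))
    n = np.round((t_pulses + half) * fs).astype(int)     # nearest sample index
    x[n[(n >= 0) & (n < len(x))]] = 1.0
    return x, t_pulses

x, t_p = fan_pulse_train(f0=100.0, gamma=1.0, fs=8000.0, duration=0.5)
periods = np.diff(t_p)                                   # shrink as pitch rises
```

Since $\phi_\gamma(t_2) - \phi_\gamma(t_1) = (t_2 - t_1)\,(1 + \gamma\,\tfrac{t_1 + t_2}{2})$, the reciprocal of each period equals $f_0(1 + \gamma t)$ evaluated at the period midpoint, i.e., the pitch varies linearly as required.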
A real case of chirp-periodic signals is found in human speech. A short segment of speech can be described by the so-called speech formation model [28], as the filtering of the vocal cord excitation $p(t)$ by the vocal tract impulse response $h(t)$. The excitation $p(t)$ is classically considered a train of pulses equidistant in time. However, with natural intonation, the fundamental frequency (or pitch) undergoes linear variation over time. Consequently, a speech signal is more accurately modeled as

$$s(t) = \sum_k H\big(k f_0\,\phi_\gamma'(t)\big)\, e^{j2\pi k f_0 \phi_\gamma(t)}, \tag{72}$$

where $H(f)$ is the Fourier transform of $h(t)$, $f_0$ is the pitch at the center of the interval, and $\gamma$ is the pitch rate (the proof of model (72) can be found in [17]). Thus, a windowed segment of natural speech can be written as

$$x(t) = s(t)\,w(t) = \sum_k a_k(t)\, e^{j2\pi k f_0 \phi_\gamma(t)}, \tag{73}$$

where $w(t)$ is a Gaussian-like window, and the envelopes $a_k(t)$ can also be approximated with a Gaussian shape.

On the basis of the previous facts, speech can be analyzed with the FChT-based generalization of the spectrogram (46). Two examples are considered in this paper: a British female speaker (TIMIT database) and an Austrian male speaker (recorded from public radio broadcasting). Both speakers talk with natural intonation, which results in a continuous and very often fast variation of the pitch. Figs. 7 and 8 depict the TF representations of the female and male recordings, respectively, obtained with the following analysis techniques, from top to bottom: spectrogram with different window lengths, FChT-based generalization of the spectrogram,¹³ and smoothed pseudo-Wigner distribution¹⁴ (SPWD). A Hamming window was used in all methods. The first conclusion from the results in Figs. 7 and 8 is that the spectrogram, regardless of the segment length, fails to properly represent the harmonics of the voiced utterances within the whole bandwidth. This limitation, which is commonly believed to be caused by "stochastic" components of voiced utterances [28], can be attributed here to the fast frequency variation of the medium- and high-frequency harmonics. This explanation is validated by the FChT-based spectrogram, whose medium- and high-frequency areas contain detailed harmonic trajectories. Finally, the performance obtained by the SPWD (and other TFDs of Cohen's class) does not especially encourage the use of kernel distributions on chirp-periodic signals. The reason for this poor performance lies in the large TF overlap between adjacent harmonics, especially in the medium and high frequencies. Furthermore, kernel TFDs can suit signals with slanted TF geometry [24], but perform poorly with fan TF geometries.

Fig. 7. Time–frequency analysis of female speech with natural intonation. Panels, from top to bottom: spectrogram (T = 24 ms), spectrogram (T = 48 ms), FChT-based spectrogram (T = 48 ms), and smoothed pseudo-Wigner–Ville distribution. The horizontal and vertical axes in all representations are time (in seconds) and frequency (from 0 to 4 kHz), respectively. The intensity is in logarithmic scale and preemphasis of high frequencies has been applied.

Fig. 8. Time–frequency analysis of male speech with natural intonation. Panels, from top to bottom: spectrogram (T = 48 ms), FChT-based spectrogram (T = 48 ms), and smoothed pseudo-Wigner–Ville distribution. The horizontal and vertical axes in all representations are time (in seconds) and frequency (from 0 to 4 kHz), respectively. The intensity is in logarithmic scale and preemphasis of high frequencies has been applied.

The limitation of the classical spectrogram observed in this paper, which the FChT-based generalization of the spectrogram overcomes, may have important consequences for speech coding: in harmonic or sinusoidal coding the global bandwidth is commonly split into two bands [29], the lower one being considered voiced and the higher one unvoiced; upon estimation of the split frequency, each region is coded accordingly. This scenario is illustrated in Fig. 9 with some spectral lines delivered by the classical and the FChT-based spectrogram over the male record. Example (a) corresponds to a speech segment of mild pitch rate: the second half of the Fourier transform is blurry, but the positions of the spread harmonics still correspond visually to those given by the more detailed FChT representation. In example (b), the pitch rate of the naturally intonated voiced segment makes the Fourier representation unclear in the middle frequencies, whereas in the high frequencies the Fourier spectrum seems to contain harmonics again. However, these are false harmonics, caused by the marginalization of the cross-terms (38), and they do not match those delivered by the FChT. Consequently, with an FChT-based harmonic coder the whole bandwidth can be coded entirely as a harmonic structure.

Fig. 9. Fourier transform (top) and FChT (bottom) of the speech record considered in Fig. 8, at instants (a) t = 2.25 s and (b) t = 0.25 s. The horizontal axes show frequency from 0 to 4 kHz.

Another example of chirp-periodic signals is found in the song of some mammals. The popular bat echolocation ultrasound¹⁵ was chosen as the final scenario. The results of the analysis are illustrated in Fig. 10: (a) the WD, (b) the spectrogram and (c) the FChT-spectrogram (both with a 0.9-ms Hamming window) of that signal are shown. The FChT-spectrogram was obtained according to the inter-frame methodology, as in the previous speech examples. The chosen segment length (0.9 ms) ensures the linear evolution of the pitch within the segment. Since very few harmonics are found in the bandwidth (≈71 kHz), their localization in the WD is not difficult. On the other hand, the TF resolution of the FChT-spectrogram is excellent, precisely following the harmonic trajectory obtained by the WD. The only drawback of the FChT-based spectrogram is the rigid TF resolution (42), due to the constant window length. The PSWD, on the contrary, does not suffer as much from this limitation, but it is severely affected by the cross-terms, whereas the FChT-spectrogram shows content only in areas close to the positive TF energy. The TF resolution of the classical spectrogram (Fig. 10) is very poor, which supports the biological evidence [26] that the auditory system of small mammals should be doing more than Fourier spectral analysis.

Fig. 10. TF analysis of ultrasound bat signal: (a) WD, (b) spectrogram (T = 0.9 ms), and (c) FChT spectrogram (T = 0.9 ms). Time axes span 0 to 4 ms and frequency axes 0 to 50 kHz. The intensity follows a square-root scale, that is, the amplitude level (instead of the energy level) is shown.

¹³ The chirp rate estimation was based on the inter-frame methodology, i.e., by tracking the pitch.
¹⁴ Other kernel distributions, such as the Choi–Williams ($\sigma = 1$) and the Zhao–Atlas–Marks distribution [1], were also considered in the analysis, both delivering results similar to the SPWD.
¹⁵ The authors wish to thank Curtis Condon, Ken White, and Al Feng of the Beckman Institute of the University of Illinois for the bat data and for permission to use it in this paper.

8. Conclusions

The fan-chirp transform (FChT) is an effective method for representing signals with fan time–frequency (TF) structure. This type of signals, denoted here as chirp-periodic signals, is common in nature, for example in segments of the song of mammals and in human speech with natural intonation. The FChT marginalizes the WD along straight line paths according to a fan geometry. This geometry is entirely described by the user-defined chirp rate $\alpha$, or by its inverse, the focal point instant, which plays the role of "source" and "sink" of frequency. In practical terms, the signal should be zero beyond (and near to) that instant. The efficacy of the FChT in real scenarios depends on choosing the chirp rate that best fits the TF characteristics of the signal. This strong signal dependency calls for the development of ad hoc automatic methods for optimal chirp rate estimation, and guidelines for such estimation mechanisms have been provided in the paper. The discrete-time formulation and its fast digital evaluation have been provided as well. The excellent TF resolution and the suppression of cross-terms delivered by the FChT have been illustrated with real examples of chirp-periodic signals.
Appendix A. Gaussian chirplet representation

The Gaussian chirplet

$$g(t) = \frac{1}{\sqrt[4]{\pi}\,\sqrt{R}}\; e^{-\frac{1}{2}\left(\frac{t-\tau}{R}\right)^2}\, e^{j2\pi\left(\nu(t-\tau) + \frac{1}{2}\beta(t-\tau)^2\right)}$$

is the only signal with a positive WD:

$$WD_g(t, f) = 2\, e^{-R^{-2}(t-\tau)^2}\, e^{-4\pi^2 R^2 \left(f - \nu - \beta(t-\tau)\right)^2}.$$

Based on the fan-geometry marginalization (23), the squared magnitude of the FChT of the Gaussian chirplet results in

$$|G(f;\alpha)|^2 = \frac{|1 + \alpha\tau(f)|}{\sqrt{2\pi}\,\sigma(f)}\; e^{-\frac{1}{2}\left(\frac{f(1+\alpha\tau) - \nu}{\sigma(f)}\right)^2}, \tag{A.1}$$

where the parameters $\sigma(f)$ and $\tau(f)$ are governed by the frequency $f$ as follows:

$$\sigma^2(f) = \frac{1}{8\pi^2}R^{-2} + \frac{1}{2}(\alpha f - \beta)^2 R^2, \tag{A.2a}$$

$$\tau(f) = \begin{cases} \tau, & \text{if } \beta \approx \dfrac{\alpha\nu}{1+\alpha\tau}, \\[2mm] \dfrac{f - \nu + \beta\tau}{\beta - \alpha f}, & \text{otherwise}. \end{cases} \tag{A.2b}$$

The power spectrum (A.1) has a nearly Gaussian shape: the "variance" (A.2a) depends on the variable $f$, and thus (A.1) is not strictly a Gaussian function. The resulting shape of $|G(f;\alpha)|^2$ is asymmetric, the more so the larger the chirp rate $\alpha$ and the frequency. In spite of this asymmetry, we are interested in characterizing $|G(f;\alpha)|^2$ in terms of its spread and central location, that is, in establishing an approximation with a true Gaussian model. The mean and variance of the proposed Gaussian approximation result, respectively, in

$$\mu = \frac{\nu}{1 + \alpha\tau}, \tag{A.3a}$$

$$\sigma^2 = \frac{1}{(1 + \alpha\tau)^2}\left(\frac{1}{8\pi^2}R^{-2} + \frac{1}{2}(\alpha\mu - \beta)^2 R^2\right). \tag{A.3b}$$

This approximation does not correspond to the maximum likelihood criterion, since such an analysis faces integrals with no analytical solution. Instead, the exponent in (A.1) was expanded in a Taylor series around $f = \mu$, and the parameters (A.3) were obtained from the second-order term.

The validity of the previous analysis has been assessed extensively with simulation experiments. These confirmed the analytical expression (A.1) as an accurate description of the FChT power spectrum of the Gaussian chirplet. On the other hand, the Gaussian approximation (A.3) accurately describes the central frequency and variance of the representation. In spite of the severe asymmetry for extreme values of the chirp rate $\alpha$, the Gaussian approximation is a very good reference for the spectral shape.

References

[1] L. Cohen, Time–frequency Analysis, Prentice-Hall, Englewood Cliffs, NJ, 1995.
[2] Mecklenbräuker, Hlawatsch (Eds.), The Wigner Distribution—Theory and Applications in Signal Processing, Elsevier, Amsterdam, NL, 1997.
[3] L. Cohen, T. Posch, Positive time–frequency distribution functions, IEEE Trans. Acoust. Speech Signal Process. ASSP-33 (1985) 31–38.
[4] P.J. Loughlin, J.W. Pitton, L.E. Atlas, Construction of positive time–frequency distributions, IEEE Trans. Signal Process. 42 (10) (October 1994) 2697–2705.
[5] S. Mallat, Z. Zhang, Matching pursuit with time–frequency dictionaries, IEEE Trans. Signal Process. 41 (12) (December 1993) 3397–3415.
[6] A. Bultan, A four-parameter atomic decomposition of chirplets, IEEE Trans. Signal Process. 47 (3) (March 1999) 731–745.
[7] Q. Yin, S. Qian, A. Feng, A fast refinement for adaptive Gaussian chirplet decomposition, IEEE Trans. Signal Process. 50 (6) (June 2002) 1298–1306.
[8] A. Papandreou-Suppappola, S.B. Suppappola, Analysis and classification of time-varying signals with multiple time–frequency structures, IEEE Signal Process. Lett. 9 (3) (March 2002) 92–95.
[9] L.R. Rabiner, R.W. Schafer, C.M. Rader, The Chirp-Z transform algorithm and its application, Bell System Technical J. 48 (5) (May–June 1969) 1249–1292.
[10] S. Mann, S. Haykin, The chirplet transform: physical considerations, IEEE Trans. Signal Process. 43 (11) (November 1995) 2745–2761.
[11] H.M. Ozaktas, Z. Zalevsky, M.A. Kutay, The Fractional Fourier Transform with Applications in Optics and Signal Processing, Wiley, New York, NY, 2001.
[12] L.B. Almeida, The fractional Fourier transform and time–frequency representations, IEEE Trans. Signal Process. 42 (11) (November 1994) 3084–3091.
[13] R.G. Baraniuk, D.L. Jones, Warped wavelet bases: unitary equivalence and signal processing, in: Proceedings of IEEE ICASSP, 1993, pp. 320–323.
[14] R. Baraniuk, Unitary equivalence: a new twist on signal processing, IEEE Trans. Signal Process. 43 (10) (October 1995) 2269–2282.
[15] T. Twaroch, F. Hlawatsch, Modulation and warping operators in joint signal analysis, in: Proceedings of IEEE Symposium on Time–frequency and Time-scale Analysis, 1998, pp. 9–12.
[16] A. Papandreou-Suppappola, F. Hlawatsch, G.F. Boudreaux-Bartels, Quadratic time–frequency representations with scale covariance and generalized time-shift covariance: a unified framework for the affine, hyperbolic, and power classes, Digital Signal Process.: Rev. J. 8 (1) (January 1998) 3–48.
[17] M. Képesi, L. Weruaga, Adaptive chirp-based time–frequency analysis of speech signals, Speech Commun. 48 (5) (May 2006) 474–492.
[18] L. Weruaga, M. Képesi, Self-organizing chirp-sensitive artificial auditory cortical model, in: Proceedings of Interspeech, 2005, pp. 705–708.
[19] F. Zhang, Y.Q. Chen, G. Bi, Adaptive harmonic fractional Fourier transform, IEEE Signal Process. Lett. 6 (11) (November 1999) 281–283.
[20] J.G. Vargas-Rubio, B. Santhanam, An improved spectrogram using the multiangle centered discrete fractional Fourier transform, in: Proceedings of IEEE ICASSP, 2005, pp. 505–508.
[21] R.J. Sluijter, A.E.J.M. Janssen, A time warper for speech signals, in: IEEE Workshop Speech Coding, 1999, pp. 150–152.
[22] V. Katkovnik, Discrete-time local polynomial approximation of the instantaneous frequency, IEEE Trans. Signal Process. 46 (10) (October 1998) 2626–2637.
[23] L. Stankovic, S. Djukanovic, Order adaptive local polynomial FT based interference rejection in spread spectrum communication systems, IEEE Trans. Instrum. Meas. 54 (6) (2005) 2156–2162.
[24] D.L. Jones, R.G. Baraniuk, An adaptive optimal-kernel time–frequency representation, IEEE Trans. Signal Process. 43 (10) (October 1995) 2361–2371.
[25] A.M. Kondoz, Digital Speech Coding for Low Bit Rate Communication Systems, Wiley, Chichester, UK, 2004.
[26] E. Mercado III, C. Myers, M.A. Gluck, Modelling auditory cortical processing as an adaptive chirplet transform, Neurocomputing 32–33 (2000) 913–919.
[27] H.M. Ozaktas, O. Arikan, M.A. Kutay, G. Bozdağı, Digital computation of the fractional Fourier transform, IEEE Trans. Signal Process. 44 (9) (September 1996) 2141–2150.
[28] T.F. Quatieri, Discrete-time Speech Signal Processing, Prentice-Hall, Upper Saddle River, NJ, 2002.
[29] R.J. McAulay, T.F. Quatieri, Sinusoidal coding, in: Kleijn, Paliwal (Eds.), Speech Coding and Synthesis, Elsevier Science, Amsterdam, 1995, pp. 121–174.
