www.elsevier.com/locate/sigpro
Abstract
This paper presents a novel transform related to the framework of warping operators when the continuous time warping
mapping is a second-order polynomial. This case is proven in the paper to be the only one from the aforementioned group
that marginalizes the Wigner distribution along line paths, in particular, with a fan geometry. The properties and attributes
of the fan-chirp transform (FChT), along with the analytical characterization of harmonically related Gaussian chirplets,
receive special attention in the paper. This analysis shows that for chirp-periodic signals the FChT can reach the limit of the
time–frequency (TF) uncertainty principle, while simultaneously keeping the cross-terms at a minimum level. The
formulation of the fast digital computation of the FChT is also provided in the paper. Two practical scenarios, the
analysis of speech with natural intonation and of bat ultrasound, validate the theoretical developments and clearly show
the competitive performance of the new transform.
© 2007 Elsevier B.V. All rights reserved.
Keywords: Fan-chirp transform (FChT); Time–frequency (TF); Fan marginalization; Harmonically related chirplets
$$ g_{R,\tau,f,\beta}(t) = \frac{1}{\sqrt[4]{\pi R^2}}\; e^{-(1/2)((t-\tau)/R)^2}\; e^{j2\pi\left( f(t-\tau) + (1/2)\beta (t-\tau)^2 \right)}. \tag{2} $$
Here $f$ is the instantaneous frequency (IF) at $t = \tau$, $\beta$ the frequency variation rate and $R$ the time spread. By disregarding the Gaussian term and setting $\tau = 0$ for simplicity, it is easy to deduce that the squared magnitude of the Chirp(let) transform is

$$ |X_\beta(f)|^2 = \int_{-\infty}^{\infty} WD_x(t, f - \beta t)\, dt, \tag{3} $$

where $WD_x(t, f)$ is the Wigner–Ville distribution [2] of $x(t)$ (henceforth Wigner distribution, WD). Thus, the CT yields the "slanted" marginal of the WD.

Fig. 1. Marginalization of the Wigner distribution along straight line paths: (a) Fourier, (b) chirplet, (c) fractional Fourier and (d) fan-chirp transforms. The dark area represents the TF energy of a Gaussian chirplet.

Another well-established chirp-based transform is the fractional Fourier transform (FrFT) [11]

$$ X_\theta(u) = \int_{-\infty}^{\infty} x(v)\, K_\theta(v, u)\, dv, \tag{4} $$

where $K_\theta(v, u)$ is the transformation kernel [12]. The FrFT involves products with linear-FM chirps in such a way that it yields the marginalization of the WD along the angular direction $\theta$, i.e.,

$$ |X_\theta(u)|^2 = \int_{-\infty}^{\infty} WD_x(cu - sv,\; su + cv)\, dv, \tag{5} $$

where $c = \cos\theta$ and $s = \sin\theta$.

The last chirp-based transform considered here is the warping operator [13–15], defined as

$$ X_{\psi(\cdot)}(f) = \int_{-\infty}^{\infty} x(\psi(t))\, \sqrt{|\psi'(t)|}\; e^{-j2\pi f t}\, dt, \tag{6} $$

where $\psi(t)$ is a continuous differentiable time mapping and $\psi'(t)$ its derivative. An equivalent formulation to this warped-time Fourier transform is$^{1}$

$$ X(f, \phi(\cdot)) = \int_{-\infty}^{\infty} x(t)\, \sqrt{|\phi'(t)|}\; e^{-j2\pi f \phi(t)}\, dt, \tag{7} $$

where $\phi(t) = \psi^{-1}(t)$. Eq. (7) represents the inner product of signal $x(t)$ with a non-linear chirp, which is the basic mechanism in TF redundant dictionaries for chirp-based signal decomposition [8]. The warped-time framework has also given rise to new TF distributions, such as the generalized warped Cohen's class (GWCC) [16], which introduces new ways of interpreting the TF content.

This paper tackles the warping operator (7) constrained to the mapping $\phi(t)$ being a second-order polynomial. This case is proven here to marginalize the WD along straight line paths. Thus, it can be seamlessly compared against other linear chirp-based transforms, such as the CT and FrFT (see Fig. 1 for an introductory illustration), and studied with more detail apart from the general framework. The organization of the paper follows: Section 2 introduces the analysis and synthesis equations of the proposed fan-chirp transform (FChT); in Section 3 its main basic attributes, the marginalization geometry and representation of chirp-periodic signals are studied; Section 4 contains a discussion on previous works related to the FChT; Section 5 addresses the estimation of the only user-defined parameter of the FChT, the chirp rate, in order to better match the TF geometry of the analysis signal; Section 6 elaborates on the practical aspects of the digital implementation; Section 7 presents the performance evaluation of the FChT on synthetic and real scenarios, namely, the analysis of natural speech and sound of mammals; the conclusions close the paper.

$^{1}$ By doing the variable change $t = \phi(\tau)$ in (6), this integral becomes $X_{\psi(\cdot)}(f) = \int_{-\infty}^{\infty} x(\tau)\, \sqrt{|\phi'(\tau)|}\; e^{-j2\pi f \phi(\tau)}\, d\tau$, that is similar to (7).

1506 L. Weruaga, M. Képesi / Signal Processing 87 (2007) 1504–1522

2. The FChT

The analysis formula of the FChT of signal $x(t)$ is defined as

$$ X(f, \alpha) \triangleq \int_{-\infty}^{\infty} x(t)\, \sqrt{|\phi'_\alpha(t)|}\; e^{-j2\pi f \phi_\alpha(t)}\, dt, \tag{8} $$

where $t$ is time, $f$ is frequency,$^{2}$ and $\phi_\alpha(t)$ is the second-order polynomial controlled by the so-called chirp rate $\alpha$

$$ \phi_\alpha(t) \triangleq \left(1 + \tfrac{1}{2}\alpha t\right) t. \tag{9} $$

The FChT involves the inner product between $x(t)$ and the complex signals

$$ \xi(t; f, \alpha) = \sqrt{|1 + \alpha t|}\; e^{j2\pi f (1 + (1/2)\alpha t)\, t} \tag{10} $$

which are chirps whose IF, defined as the time derivative of the exponent, varies linearly over time

$$ \nu(t) = f\, \frac{d\phi_\alpha(t)}{dt} = (1 + \alpha t)\, f. \tag{11} $$

According to (11), the sign of the IF of all basis components switches at the instant

$$ t = -\frac{1}{\alpha} \tag{12} $$

which is called the "focal point" instant, that is, all basis components meet at the point of the Wigner plane $\langle t, f \rangle = \langle -1/\alpha, 0 \rangle$.$^{3}$ For the sake of simplicity and without loss of generality, henceforth the chirp rate $\alpha$ is considered positive.

$^{2}$ As will be shown, $f$ has actually a connotation with IF at $t = 0$.

$^{3}$ The complex signal (10) is strictly non-analytic because of the amplitude modulation with the normalization term $\sqrt{|1 + \alpha t|}$. Furthermore, since its IF crosses at the focal point (12), the chirp sweeps the entire frequency axis.

2.1. Synthesis formula

$x(t)$ can be recovered from its FChT as

$$ x(t) = \int_{-\infty}^{\infty} X(f, \alpha)\, \sqrt{|\phi'_\alpha(t)|}\; e^{j2\pi f \phi_\alpha(t)}\, df. \tag{14} $$

Proof. By replacing (8) in (14), (14) becomes

$$ z(t) = \int_{-\infty}^{\infty} x(\tau)\, \sqrt{|\phi'_\alpha(t)\, \phi'_\alpha(\tau)|}\; \delta\big(\phi_\alpha(t) - \phi_\alpha(\tau)\big)\, d\tau. \tag{15} $$

Eq. (15) can be simplified by using the time-scaling property of the Dirac delta

$$ \delta(m(t)) = \sum_i \frac{\delta(t - t_i)}{|m'(t_i)|}, \tag{16} $$

where $m(t)$ is a continuous differentiable function and $t_i$ is such that $m(t_i) = 0$. Since here $m(\tau) = \phi_\alpha(t) - \phi_\alpha(\tau)$ is a second-order polynomial, two roots exist. After simplifications (15) becomes

$$ z(t) = x(t) + x(-t - 2/\alpha). \tag{17} $$

The synthesis formula (14) delivers the input signal overlayed with itself mirrored around the focal time instant. Thus, in order to achieve perfect synthesis, condition (13) has to be met. □

Fig. 2 contains a toy example illustrating the nature of the FChT synthesis:$^{4}$ the signal on top is the analysis signal, and the one at the bottom the reconstructed signal; the focal time instant is marked with an asterisk. Since the signal is non-zero beyond the focal point, that half of the time axis results overlayed over the half of interest (in the example $t > -10$) in the reconstruction. Thus, in order to prevent this time aliasing, the focal point must be found outside the time support of the signal.

$^{4}$ In order to evaluate numerically the integral (8), (14), the signals in the example are oversampled discrete-time signals.

2.2. FChT as warped-time Fourier

By doing the variable change $\tau = \phi_\alpha(t)$ in the analysis equation, (8) becomes

$$ x_\pm(\tau) = \frac{1}{\sqrt[4]{1 + 2\alpha\tau}}\; x\big(\psi_\alpha^\pm(\tau)\big). \tag{19} $$

$^{5}$ Hence, sub/superindex $\pm$ describes both cases $+$ and $-$.
Fig. 2. Synthesis of a signal from its fan-chirp transform: (a) original signal, (b) reconstructed signal. The chirp rate is $\alpha = 0.1$. Focal point $-1/\alpha$ marked with asterisk.
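As a numerical illustration of the analysis formula (8) with the mapping (9), the following minimal sketch approximates the integral by a Riemann sum at a single frequency. The function names (`phi`, `fcht_direct`) and the test signal (one Gaussian-windowed chirp with pitch `f0`, chirp rate `gamma` and spread `R`) are assumptions made for this example and are not taken from the paper. It only aims to show that the magnitude at $f = f_0$ is larger when the analysis chirp rate equals that of the signal than for $\alpha = 0$.

```python
import cmath
import math

def phi(t, alpha):
    """Second-order warping polynomial of Eq. (9): (1 + alpha*t/2) * t."""
    return (1.0 + 0.5 * alpha * t) * t

def fcht_direct(x, t, f, alpha):
    """Riemann-sum approximation of the analysis formula (8) at one frequency f.

    x and t are equal-length lists: signal samples and uniformly spaced instants.
    """
    dt = t[1] - t[0]
    acc = 0j
    for xi, ti in zip(x, t):
        weight = math.sqrt(abs(1.0 + alpha * ti))  # sqrt(|phi'_alpha(t)|)
        acc += xi * weight * cmath.exp(-2j * math.pi * f * phi(ti, alpha))
    return acc * dt

# Hypothetical test signal: Gaussian-windowed chirp with IF (1 + gamma*t) * f0.
# The focal point -1/gamma = -2 lies outside the analysed interval [-1, 1].
f0, gamma, R = 100.0, 0.5, 0.2
t = [-1.0 + n / 2000.0 for n in range(4001)]
x = [math.exp(-0.5 * (ti / R) ** 2) * cmath.exp(2j * math.pi * f0 * phi(ti, gamma))
     for ti in t]

matched = abs(fcht_direct(x, t, f0, gamma))  # analysis chirp rate equals gamma
fourier = abs(fcht_direct(x, t, f0, 0.0))    # alpha = 0: (8) is the Fourier transform
print(matched, fourier)
```

With the matched chirp rate the exponentials cancel and only the window and the normalization term remain under the integral, whereas for $\alpha = 0$ the residual quadratic phase spreads the energy over neighbouring frequencies.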
Here $\psi_\alpha^{+}(\tau)$ and $\psi_\alpha^{-}(\tau)$ correspond to the double solution of the inverse mapping of $\phi_\alpha(t)$, i.e.,

$$ \psi_\alpha^{\pm}(\tau) = -\frac{1}{\alpha} \pm \frac{\sqrt{1 + 2\alpha\tau}}{\alpha}. \tag{20} $$

Eq. (18) describes the FChT as the Fourier transform of a warped-time version of the analysis signal. Since both components $x_+(\tau)$ and $x_-(\tau)$ overlap in time, the synthesis conditions (13) in Lemma 2.1 can be also stated as: $x(t)$ can be obtained from its FChT if $x_-(\tau) = 0$.

The most relevant aspect of (18) is the fact that the FChT can be evaluated with the Fourier transform mechanism. This fact clearly points out to its fast discrete-time evaluation: the signal is prewarped accordingly and the FFT algorithm applied thereafter. These aspects are addressed in detail in Section 6. The last paragraphs of the seminal work on warped wavelets [13] anticipates the high potential of that approach: "In the future, we can imagine the choice of a warping operator being made automatically to best fit a certain style of analysis to a given set of data." In the FChT, that time-warping law is governed by chirp rate $\alpha$.

3. Properties

In this section we derive the most relevant property of the FChT regarding the marginalization of the WD; Parseval's theorem along with other basic properties are also derived; the TF resolution over harmonically related chirplets covers the final part of the section.

3.1. TF marginalization geometry

The analysis formula (8) can be arranged as

$$ X(f, \alpha) = \int_{-\infty}^{\infty} w(t)\; e^{-j2\pi f \phi_\alpha(t)}\, dt, \tag{21} $$

where $w(t)$ is the normalized signal,

$$ w(t) = \sqrt{|1 + \alpha t|}\; x(t). \tag{22} $$

Lemma 3.1. The squared magnitude of the FChT is equal to the marginalization of the WD of the normalized signal $w(t)$ according to a fan geometry, that is

$$ |X(f, \alpha)|^2 = \int_{-\infty}^{\infty} WD_w\big(t, (1 + \alpha t) f\big)\, dt. \tag{23} $$

Proof. With the variable change $\tau = \phi_\alpha(t)$, (21) can be written as
$$ X(f, \alpha) = \int_{-\infty}^{\infty} \tilde{w}(\tau)\; e^{-j2\pi f \tau}\, d\tau, \tag{24} $$

where

$$ \tilde{w}(\tau) = w_+(\tau) + w_-(\tau), \tag{25a} $$

$$ w_\pm(\tau) = \pm \frac{w\big(\psi_\alpha^\pm(\tau)\big)}{\phi'_\alpha\big(\psi_\alpha^\pm(\tau)\big)}\; u\big(\tau + 1/(2\alpha)\big). \tag{25b} $$

Supported on the Fourier-based formulation (24), the squared magnitude of the FChT of $x(t)$ is equal to the frequency marginal of the WD of the warped signal $\tilde{w}(\tau)$, i.e.,

$$ |X(f, \alpha)|^2 = \int_{-\infty}^{\infty} WD_{\tilde{w}}(\tau, f)\, d\tau, \tag{26a} $$

where

$$ WD_{\tilde{w}}(\tau, f) = \int_{-\infty}^{\infty} \tilde{w}\!\left(\tau + \frac{\tau_v}{2}\right) \tilde{w}^{*}\!\left(\tau - \frac{\tau_v}{2}\right) e^{-j2\pi f \tau_v}\, d\tau_v. \tag{26b} $$

The following variable change in the double integral (26):

$$ \tau = \frac{1}{2}\phi_\alpha\!\left(u + \frac{v}{2}\right) + \frac{1}{2}\phi_\alpha\!\left(u - \frac{v}{2}\right), \tag{27a} $$

$$ \tau_v = \phi_\alpha\!\left(u + \frac{v}{2}\right) - \phi_\alpha\!\left(u - \frac{v}{2}\right) = (1 + \alpha u)\, v \tag{27b} $$

gives rise to the following Jacobian:

$$ \frac{\partial(\tau, \tau_v)}{\partial(u, v)} = \phi'_\alpha\!\left(u + \frac{v}{2}\right) \phi'_\alpha\!\left(u - \frac{v}{2}\right). \tag{28} $$

Given that $\psi_\alpha^\pm(\phi_\alpha(t)) = t$, the double integral (26) after simplifications becomes equal to the proposition (23). □

Fig. 3. The FChT power spectrum is equal to the fan marginalization of the Wigner distribution from the focal point $(t, f) = (-1/\alpha, 0)$.

Fig. 3 illustrates the result of Lemma 3.1. The image corresponds to the WD of a signal composed of harmonically related chirplets, where the dark stripes represent the non-stationary TF energy (for the sake of simplicity, the cross-terms are not depicted). When the structure is "illuminated" from the TF point $\langle -1/\alpha, 0 \rangle$, the resulting projection gives rise to the FChT power spectrum for chirp rate $\alpha$. The fan marginalization achieves its finest representation when the focal point is $\langle -1/\gamma, 0 \rangle$. The Fourier analysis corresponds to the "light" source located at the infinite. Negative chirp rates are graphically explainable by placing the focal point on the positive half of the time axis.

Corollary 3.2. The FChT is the only warped-time Fourier transform (7) that marginalizes the WD along line paths.

Proof. Note first that the proof of Lemma 3.1 has been conducted without expanding $\phi_\alpha(t)$, except for the final integral variable (27b). Thus, in order for the general warping operator to marginalize the WD along line paths, the general mapping $\phi(t)$ is to fulfil in (27b)

$$ \phi\!\left(u + \frac{v}{2}\right) - \phi\!\left(u - \frac{v}{2}\right) = \nu(u)\, v, \quad u, v \in \mathbb{R}, \tag{29} $$

where $\nu(u)$ is a $u$-dependent function representing the mentioned line paths. Since (29) must hold for any value of $v$ (and $u$), by setting $v = 0$, the line paths turn out to be $\nu(u) = \phi'(u)$. Given that (29) must hold for any $v$, it is simple to deduce that the only function $\phi(t)$ fulfilling that condition is a second-order polynomial, which can be univocally written as

$$ \phi(t) = \left(1 + \tfrac{1}{2}\alpha (t - \tau)\right)(t - \tau) + \eta. \tag{30} $$
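The condition (29) behind Corollary 3.2 can be checked numerically. The sketch below is only an illustration under assumed values: the cubic mapping `phi_cubic` is a hypothetical counter-example that does not appear in the paper, and the helper name `slope` is chosen here for convenience. It evaluates the ratio $[\phi(u + v/2) - \phi(u - v/2)]/v$ for several $v$: for the second-order polynomial (9) the ratio stays at $1 + \alpha u$, as in (27b), while for the cubic law it depends on $v$.

```python
def phi_quadratic(t, alpha=0.3):
    """Fan-chirp mapping of Eq. (9)."""
    return (1.0 + 0.5 * alpha * t) * t

def phi_cubic(t, c=0.3):
    """Hypothetical counter-example: a third-order warping law."""
    return t + c * t ** 3

def slope(phi, u, v):
    """Ratio [phi(u + v/2) - phi(u - v/2)] / v, i.e. nu(u) in Eq. (29)."""
    return (phi(u + 0.5 * v) - phi(u - 0.5 * v)) / v

u = 0.7
lags = (0.1, 1.0, 4.0)
quad = [slope(phi_quadratic, u, v) for v in lags]
cubic = [slope(phi_cubic, u, v) for v in lags]
print(quad)   # the same value for every lag v: 1 + alpha*u
print(cubic)  # changes with v, so (29) cannot hold with a v-independent nu(u)
```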
$$ x(t) = \sum_k \frac{1}{\sqrt[4]{\pi R_k^2}}\; e^{-(1/2)((t - t_k)/R_k)^2}\; e^{jk\varphi(t)}. \tag{32} $$

Here $k$ is the harmonic index and $t_k$, $R_k$ and $a_k$ are, respectively, the time location, time spread and

$^{6}$ Note that the WD of the normalized signal $w(t)$ can be approximated by $WD_w(t, f) = |1 + \alpha t|\, WD_x(t, f)$ because the input signal $x(t)$ is assumed to fluctuate much faster than factor $\sqrt{|1 + \alpha t|}$.

$^{7}$ In case $i \neq j$, Eq. (37) is an accurate approximation.
The squared magnitude of the FChT of the multi-component signal results thus in

$$ |X(f, \alpha)|^2 = \sum_k |G_k(f, \alpha)|^2 + C(f, \alpha), \tag{38} $$

where $G_k(f, \alpha)$ is the FChT of the $k$th harmonic and $C(f, \alpha)$ contains the fan marginal of all cross-terms. Appendix A contains the analysis of the positive terms in (38): the fan marginalization of the $k$th harmonic has nearly Gaussian shape of mean and variance, respectively

$$ \mu_k = k f_0\, \frac{1 + \gamma t_k}{1 + \alpha t_k}, \tag{39a} $$

$$ \sigma_k^2 = \frac{1}{8\pi^2 R_k^2 (1 + \alpha t_k)^2} + \frac{(R_k k f_0)^2 (\gamma - \alpha)^2}{2 (1 + \alpha t_k)^4}. \tag{39b} $$

On the other hand, the fan marginalization of the cross-terms can be likewise addressed based on the previous result: the cross-terms have Gaussian envelope and follow the same fan TF geometry than the positive terms; the major difference is, however, the oscillating nature of the TF energy described by the first factor in (37).

The uncertainty principle [1] provides a lower bound to the accuracy of the joint TF resolution

$$ \sigma_t\, \sigma_f \geq \frac{1}{4\pi}, \tag{40} $$

• The CT (1) and the FrFT (4) can reach the lower

this statement may seem obvious at this point, previous works [19,20] suggest surprisingly the use of the FrFT for the analysis of chirp-periodic signals.

• Since the cross-terms (37) follow the same fan TF geometry as the positive terms, and the same TF geometry intrinsic in the FChT marginalization when $\alpha = \gamma$, it is simple to deduce that the cross-term interference among the harmonics in the FChT is minimum and, given the oscillatory nature of the cross-terms TF energy, irrelevant.

4. Prior related contributions

The FChT compares seamlessly against the Chirplet [10], the fractional Fourier [12], and the Fourier transforms, all yielding the marginalization of the TF plane along different straight line geometries (see Fig. 1 again for an illustrative comparison). Additionally, it is important to address here previous works [19–22] related to the FChT to a larger or lesser extent.

The work [19] proposes the so-called Harmonic fractional Fourier transform (HFT)

$$ \mathrm{HFT}(\omega) \triangleq \int_{-\infty}^{\infty} x(t)\; e^{-j\omega(1 + At)t}\, dt \tag{43} $$
correct geometrical correspondence tells us that an FrFT instance delivers only one single HFT frequency bin, namely, for $\omega = \tan\theta / A$. Thus, the evaluation of the integral (43) requires as many FrFT instances as frequency bins wished to evaluate. This exhaustive processing mechanism, seconded by recent works [20], relates implicitly rotations on the TF plane with the fan TF geometry of chirp-periodic signals, a relation that is not conceptually appropriate.

Another work closely related to the FChT indeed is found in [21]. In spite of addressing analytically the practical boundaries of the chirp rate, the work [21] does not provide theoretical insights on the proposed time-warper.

The work [22] proposes the so-called local polynomial Fourier transform (LPFT)

$$ \mathrm{LPFT}(\boldsymbol{\omega}) = \int_{-\infty}^{\infty} x(t)\; e^{-j\phi_{LP}(t; \boldsymbol{\omega})}\, dt, \tag{44a} $$

where the phase of the analysis basis is an $m$-order polynomial

$$ \phi_{LP}(t; \boldsymbol{\omega}) \triangleq \omega_1 t + \frac{1}{2}\omega_2 t^2 + \cdots + \frac{1}{m!}\omega_m t^m \tag{44b} $$

and $\boldsymbol{\omega} = (\omega_1, \ldots, \omega_m)$ are the polynomial coefficients. At first sight one might be tempted to enclose the FChT into the second-order LPFT. However, a more careful look reveals that the second-order LPFT corresponds actually to the Chirplet transform (1), where $\omega_1 = 2\pi f$ is frequency and $\omega_2 = 2\pi\beta$ is the frequency variation rate.$^{8}$ Consequently, the second-order LPFT yields a marginalization geometry (Fig. 1b) different from the FChTs (Fig. 1d). Even in case of a general order $m$, the marginalization takes place (loosely analytically speaking) in curved "parallel" paths, but never with a fan geometry. As major consequence, neither the LPFT (nor the Chirplet transform) is indicated in the analysis of harmonically related chirplets (as the seminal [22] and applied works [23] confirm). It is important to remark again that harmonically related chirplets, the fan marginalization of the WD and the Fourier transform of the $\phi_\alpha(t)$-based time-warping are conceptually related.

$^{8}$ Likewise, one may argue that the LPFT is the extension of the Chirplet transform to polynomial laws higher than two.

The relation between the FChT and TFD may be suggested as required exercise. However, it is necessary to remark that the FChT itself is a transform, and not a TFD. On the other hand, the time-warping operation $\phi_\alpha(t)$ (9) can be linked to recent TFDs, such as the GWCC [16], which in its simplest case, the generalized warped Wigner distribution (GWD) [8], is defined by

$$ \mathrm{GWD}_x(t, f) \triangleq WD_{x(\phi_\alpha^{-1}(t))}\big(\phi_\alpha(t),\; f / \phi'_\alpha(t)\big). \tag{45} $$

is taken, and finally the bidimensional representation is morphed accordingly. If the adequate kernel in the GWCC is selected, the result can be a nearly cross-term-free TF representation. That is clearly achieved only if the signal $x(t)$ has fan TF geometry and if the analysis chirp rate $\alpha$ matches that geometry. On the contrary, Cohen's class TFDs are not suited to the fan geometry but to the slanted geometry, such as in [24]. Further comments on warped TFD are beyond the scope of this paper.

5. Chirp rate estimation
$$ \alpha(t) = \frac{1}{f_0(t)}\, f_0'(t), \tag{47} $$

where $f_0'(t)$ is time derivative of $f_0(t)$. Thus, the intuitive approach is to quantify the evolution of the pitch $f_0(t)$ and then compute the chirp rate as (47).

The estimation of the pitch $f_0$ is a popular problem. Given the broad spectrum of pitch estimation methodologies [25], it is not our aim to focus here on a particular pitch estimation algorithm and elaborate it further. On the other hand, since the short-based processing in (46) is commonly carried out only at certain instants $t = nS$, based on a shift interval $S$, the estimated pitch on the neighboring segments around the $n$th can be used to obtain the pitch rate as

$$ \alpha[n] = \frac{f_0[n+1] - f_0[n-1]}{2 S f_0[n]}. \tag{48} $$

Note that the estimation of the chirp rate for the $n$th segment as given in (48) will require the pitch of the next segment. This "non-causal" method implies to step one segment back to recompute the FChT of the $n$th segment once that the pitch of the $(n+1)$th is available. More details of a procedure inspired on this approach can be found in [17].

5.2. Intra-frame

While in the inter-frame approach pitch information from adjacent segments is available, the intra-frame methodology relies only on the current segment. Computing a dense $(\alpha, f)$ plane turns out to be the most intuitive methodology. Fig. 4

Fig. 4. Example of dense $(\alpha, f)$ plane. From top to bottom: analysis signal, $(\alpha, f)$ plane, and its vertical marginalization. The $(\alpha, f)$ is in logarithmic scale. The chirp rate $\alpha$ has been scaled by the signal length.

rate $\gamma$. The $(\alpha, f)$ plane reveals the "bowtie"-shaped spread, typical in chirp-based transforms and unequivocal sign of redundancy.

In order to skip the large computational load required to obtain the redundant $(\alpha, f)$ plane, a different methodology, proposed in [18], is that of computing $L$ FChT instances (associated to different chirp rates $\alpha_i$) and estimating the chirplet parameters from the $L$ FChT "views". We will show that three views ($L = 3$) are sufficient for that purpose. According to the harmonic representation in (39), the $k$th harmonic in the $i$th FChT view is to be centered at frequency $\mu_{k,i}$ and have a width $\sigma_{k,i}$ according to

$$ \mu_{k,i} = k f_0\, \frac{1 + \gamma t_k}{1 + \alpha_i t_k}, \tag{49a} $$
$$ t_k = \frac{\mu_{k,i} - \mu_{k,j}}{\alpha_j \mu_{k,j} - \alpha_i \mu_{k,i}}. \tag{50} $$

At least two views are required to estimate $t_k$.

• By replacing (49a) into (49b), we can write

$$ r_i \triangleq 2\sigma_{k,i}^2 (1 + \alpha_i t_k)^2 = \frac{1}{4\pi^2 R_k^2} + \frac{R_k^2 \mu_{k,i}^2}{(1 + \gamma t_k)^2}\, (\alpha_i - \gamma)^2. \tag{51} $$

The pairs $(\alpha_i, r_i)$ correspond to the samples of a second-order polynomial, $r = a\alpha^2 + b\alpha + c$. It is simple to see that the minimum of that parabola corresponds to the chirp rate $\gamma$, i.e.,

$$ \gamma = -\frac{b}{2a}. \tag{52} $$

Likewise, the spread $R_k$ can be obtained from the parabola coefficients. In order to obtain the coefficients $(a, b, c)$, three points are required.

• The central frequency is estimated as

$$ k f_0 = \frac{\mu_{k,i}\, \mu_{k,j}\, (\alpha_j - \alpha_i)}{(1 + \gamma t_k)(\mu_{k,j} \alpha_j - \mu_{k,i} \alpha_i)}. \tag{53} $$

The requirement of three different FChT instances is equivalent to that in the fast refinement for chirplet decomposition based on redundant dictionaries proposed in [7], in which three projective chirp views are used to estimate the chirplet parameters.

Despite the simplicity of the inverse problem in (50)–(53), the question that immediately arises regards the method to obtain the set of centers and spreads for all harmonics. The answer has been addressed in [18]. That approach is based on decomposing each FChT view into Gaussians so that each triple of Gaussians (one from each view) fulfils (39). This is achieved with a Gaussian-fitting algorithm driven by the expectation maximization (EM) algorithm, the procedure being constrained to (50)–(53). This iterative mechanism converges to the harmonically related chirplets present in the signal. Although the computational load of the algorithm is not low, mainly due to EM, the mechanism admits parallel implementation and resembles somewhat biological processing [26]. For the sake of clarity, the details are out of the scope of the paper, the interested reader being referred to that work.

odologies, each one having its own pros and limitations.

The inter-frame methodology is mainly conditioned by the accuracy of the pitch tracking. Inter-frame methods other than those supported on pitch estimation are not foreseen as promising or competitive, essentially because pitch and its change over time is the main descriptive parameter of quasi-periodic or chirp-periodic signals. Although the pitch tracking task is not especially costly, it should not be underestimated, especially if the signal segment contains disturbing components, such as background noise of any type.

On the contrary, the intra-frame approach does not rely on temporal information, but only on the information within the very segment. Although computing a dense $(\alpha, f)$ plane is the most intuitive methodology, the intrinsic redundancy and resulting computational overload suggest the need of alternative methods. The alternative method outlined in this section makes use of very few FChT instances, thus skipping the use of redundant information. However, the large computational load required to estimate the chirplets present in the signal does not suggest its use in on-line applications.

At the risk of leaving this comparative without a clear conclusion, let us mention that our current research is mainly focused on inter-frame methodologies, and in particular on how to track the pitch in more general scenarios, such as the signal being corrupted with background noise, or even with another interfering harmonic signal.

6. Discrete-time formulation

In analogy to the discrete-time Fourier transform, the formulation of the discrete-time FChT could be thought as the continuous-time transform of signal

$$ x(t) = \sum_{n=-\infty}^{\infty} x[n]\, \delta(t - nT_s), \tag{54} $$

where $x[n]$ is the discrete-time signal and $T_s$ is the sampling interval. This way of proceeding results in

$$ X(\Omega, \hat{\alpha}) = \sum_{n=-\infty}^{\infty} x[n]\, \sqrt{|1 + \hat{\alpha} n|}\; e^{-j\Omega(1 + (1/2)\hat{\alpha} n)n}, \tag{55} $$
where $n$ is discrete time, $\Omega$ is frequency, and the analysis chirp rate $\hat{\alpha}$ is the discrete counterpart of the chirp rate $\alpha$, that is,

$$ \hat{\alpha} = \alpha T_s. \tag{56} $$

Likewise, frequency $\Omega$ is related to its continuous-

in

$$ W(\Omega, n) = \int_{-\infty}^{\infty} p(t - n)\, \sqrt{|1 + \hat{\alpha} t|}\; e^{j\Omega \phi_{\hat{\alpha}}(t)}\, dt. \tag{60} $$

In the practice, signal $x[n]$ has finite time support. Furthermore, the focal point is to be placed outside the time support interval (as concluded in previous sections). These facts along with the use of the Fourier-based evaluation presented in Section 2.2 gives rise to compact processing mechanisms.

6.1. Digital evaluation of the FChT

$$ t_n = \left(n - \frac{N - 1}{2}\right) T_s. \tag{61} $$

These time instants are placed equidistant, the equivalent segment being centered on $t = 0$ and having a length of $T = N T_s$. Note that the instants $t_n$ are all within the interval $[-T/2, T/2]$. Let $x(t)$ be the equivalent continuous signal

$$ x(t) = \sum_{n=0}^{N-1} x[n]\, h(t - t_n), \tag{62} $$

(65) the index spans $n = \{0, \ldots, M - 1\}$. Note that the resampled signal $\tilde{x}[n]$ has different number of samples than $x[n]$. Thus, the $\tilde{x}[n]$ results in

$$ \tilde{x}[n] = \sum_{\ell=0}^{N-1} \frac{x[\ell]}{\sqrt{|1 + \alpha t_\ell|}}\; h\big(\psi_\alpha(\tau_n) - t_\ell\big). \tag{66} $$
6.1.2. Length M

The warping rule (64) has a slope greater than one in one time half, that is, $\psi'_\alpha(\tau) > 1$ for $\alpha\tau < 0$. This implies that signal $x(t)$ is "undersampled" on that region, this leading to undesired aliasing effects (equivalent to those mentioned on (55)). This undesired aliasing is reduced or even suppressed by setting the length $M$ to a proper value. It is clear then that $M > N$.

Fig. 5 illustrates the effects of the time warping on the TF contents of the signal $x(t)$. The dark stripes of Fig. 5a represent harmonically related chirplets (cross-terms are not depicted); since the signal $x(t)$ results from the synthesis of a discrete signal $x[n]$, its

Fig. 5. Time–frequency plane of (a) $x(t)$ and (b) warped signal $\tilde{x}(\tau)$. The TF content of $\tilde{x}(\tau)$ spans to a higher frequency.

The frequency that corresponds to the length (68) is marked with a solid line in Fig. 5b. This length gives rise to spectral overlapping (shaded on that figure) only in the remaining $M - N$ bins, which in many practical cases do not contain valuable information and can be thus disregarded in further processing.$^{12}$

The inverse FChT process results from taking the inverse Fourier transform of the $M$ samples, and dewarping signal $\tilde{x}[n]$ to obtain the original $x[n]$. The inverse resampling (66) clearly exists, especially given that $M > N$. However, that inverse process involves a sparse matrix pseudo-inversion, which implies large computational costs. An approximate inverse warping is based on resampling $\tilde{x}[n]$ with the same interpolation filter $h(t)$, as follows:

$$ y[n] = \sqrt{|1 + \alpha t_n|}\; \sum_{\ell=0}^{M-1} \tilde{x}[\ell]\, h\big(\phi_\alpha(t_n) - \tau_\ell\big). \tag{70} $$

$^{12}$ On the other hand, in order to code the whole TF information, the discrete time must be resampled according to the length

$$ M \geq \frac{N}{1 - |\hat{\alpha}| N / 2}. \tag{69} $$

reduces to the DFT.

6.2. Computational load

Table 2 summarizes the digital computation of the FChT and its inverse. Both directions are composed of four main operations, which in case of the direct transform correspond to:

• Normalization: The discrete signal $x[n]$ is weighted by a chirp-rate-dependent window. This task implies the window computation ($N$ operations) and the product between window and signal ($N$ operations).
• Warped index: The resulting weighted discrete signal is to be resampled according to the law $\psi_\alpha(\tau)$; the new time instants need to be computed ($M$ operations).
• Resampling: The discrete warped signal is obtained by interpolation. By using cubic Hermite spline interpolation (67), the computing load of this task results in $4M$.
• DFT computation: The resampled signal is DFTed, resulting in the FChT. Assuming that the length $M$ is power of two, the number of operations is in the order of $M \log M$.
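The four operations listed above can be sketched in a few lines of plain Python. This is a minimal illustration under several assumptions that are not taken from the paper: linear interpolation stands in for the cubic Hermite spline, a naive DFT limited to the first `n_bins` bins stands in for the FFT, the warped instants are spread uniformly between the warped end points of the segment, and all names (`fcht_digital`, `psi`, `n_bins`) are hypothetical.

```python
import cmath
import math

def phi(t, alpha):
    """Warping polynomial of Eq. (9)."""
    return (1.0 + 0.5 * alpha * t) * t

def psi(tau, alpha):
    """Positive branch of the inverse mapping (20); identity when alpha == 0."""
    if alpha == 0.0:
        return tau
    return (-1.0 + math.sqrt(1.0 + 2.0 * alpha * tau)) / alpha

def fcht_digital(x, Ts, alpha, M, n_bins):
    """Normalize, warp, resample and DFT a segment x sampled every Ts seconds."""
    N = len(x)
    t = [(n - 0.5 * (N - 1)) * Ts for n in range(N)]       # instants of Eq. (61)
    # 1) Normalization by the chirp-rate-dependent window.
    z = [xn / math.sqrt(abs(1.0 + alpha * tn)) for xn, tn in zip(x, t)]
    # 2) Warped index: M uniform instants in warped time, mapped back to t.
    tau0 = phi(t[0], alpha)
    dtau = (phi(t[-1], alpha) - tau0) / M
    t_warp = [psi(tau0 + m * dtau, alpha) for m in range(M)]
    # 3) Resampling (linear interpolation, an assumption of this sketch).
    zw = []
    for tq in t_warp:
        pos = (tq - t[0]) / Ts
        i = min(max(int(pos), 0), N - 2)
        frac = pos - i
        zw.append((1.0 - frac) * z[i] + frac * z[i + 1])
    # 4) DFT of the warped signal (naive sum over the first n_bins bins).
    X = [sum(zw[m] * cmath.exp(-2j * math.pi * k * m / M) for m in range(M))
         for k in range(n_bins)]
    freqs = [k / (M * dtau) for k in range(n_bins)]        # bin centres in Hz
    return freqs, X

# Hypothetical test: unit-amplitude chirp with pitch f0 and chirp rate gamma.
Ts, N, M = 1.0 / 256.0, 256, 512
f0, gamma = 20.0, 0.4
t = [(n - 0.5 * (N - 1)) * Ts for n in range(N)]
x = [cmath.exp(2j * math.pi * f0 * phi(tn, gamma)) for tn in t]

freqs, X_fan = fcht_digital(x, Ts, gamma, M, 64)   # matched chirp rate
_, X_dft = fcht_digital(x, Ts, 0.0, M, 64)         # alpha = 0: plain DFT
peak_fan = max(range(64), key=lambda k: abs(X_fan[k]))
print(freqs[peak_fan], abs(X_fan[peak_fan]), max(abs(v) for v in X_dft))
```

In this example the matched transform concentrates the chirp into a single bin near `f0`, while the $\alpha = 0$ case spreads it over the band swept by the instantaneous frequency. A practical implementation would replace the naive DFT with an FFT and the linear interpolation with a higher-order filter.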
Table 2
Digital computation of the fan-chirp transform and its inverse

Step             Operation                                                        Load
Fan-chirp transform
Normalization    $z[n] = x[n] / \sqrt{|1 + \alpha t_n|}$                          $2N$
Warped index     $\tilde{t}_n = \psi_\alpha(\tau_n)$                              $M$
Resampling       $\tilde{z}[n] = \sum_\ell z[\ell]\, h(t_\ell - \tilde{t}_n)$     $4M$
DFT              $X[k, \alpha] = \mathrm{DFT}\{\tilde{z}[n]\}$                    $M \log M$
Inverse fan-chirp transform
iDFT             $\tilde{z}[n] = \mathrm{iDFT}\{X[k, \alpha]\}$                   $M \log M$
Warped index     $\tilde{\tau}_n = \phi_\alpha(t_n)$                              $N$
Resampling       $z[n] = \sum_\ell \tilde{z}[\ell]\, h(\tau_\ell - \tilde{\tau}_n)$   $4N$
Normalization    $y[n] = \sqrt{|1 + \alpha t_n|}\; z[n]$                          $2N$

the fourth harmonic $k = 4$ were the highest. The results are shown in Fig. 6. As proven in Section 3.3
7. Results
[Fig. 6 panels: Fourier transform; fractional Fourier transform; fan-Chirp transform.]

[Fig. 7 panels, from top to bottom: spectrogram, T = 24 ms; spectrogram, T = 48 ms; FChT-based spectrogram, T = 48 ms; pseudo smooth-Wigner–Ville distribution. Horizontal axis: time (sec.); vertical axis: frequency (kHz).]
Fig. 7. Time–frequency analysis of female speech with natural intonation. The horizontal and vertical axis in all representations are time
(in seconds) and frequency (from 0 to 4 kHz), respectively. The intensity is in logarithmic scale and preemphasis of high frequencies has
been applied.
by the vocal tract impulse response $h(t)$. The excitation $p(t)$ is classically considered a train of pulses equidistant in time. However, with natural intonation, the fundamental frequency (or pitch) undergoes linear variation over time. In consequence, a speech signal is more accurately modeled as

$$ s(t) = \sum_k H\big(k f_0\, \phi'_\gamma(t)\big)\; e^{j2\pi k f_0 \phi_\gamma(t)}, \tag{72} $$

where $H(f)$ is the Fourier transform of $h(t)$, $f_0$ is the pitch at the center of the interval, and $\gamma$ is the pitch rate (the proof of model (72) can be found in [17]). Thus, a windowed segment of the natural speech can be written as

$$ x(t) = s(t)\, w(t) = \sum_k a_k(t)\; e^{j2\pi k f_0 \phi_\gamma(t)}, \tag{73} $$

where $w(t)$ is a Gaussian-like window, and the envelopes $a_k(t)$ can be approximated also with a Gaussian shape.

Supported on the previous facts, the speech can be analyzed with the FChT-based generalization of
[Fig. 8 panels, from top to bottom: spectrogram, T = 48 ms; FChT-based spectrogram, T = 48 ms; pseudo smooth-Wigner–Ville distribution. Horizontal axis: time (sec.); vertical axis: frequency (kHz).]
Fig. 8. Time–frequency analysis of male speech with natural intonation. The horizontal and vertical axis in all representations are time (in
seconds) and frequency (from 0 to 4 kHz), respectively. The intensity is in logarithmic scale and preemphasis of high frequencies has been
applied.
the spectrogram (46). Two examples are considered in this paper: a British female speaker (TIMIT database) and an Austrian male speaker (recorded from public radio broadcasting). Both speakers talk with natural intonation, which results in a continuous and very often fast variation of the pitch. Figs. 7 and 8 depict the TF representation of the female and male recordings, respectively, as result of the following analysis techniques: from top to bottom, spectrogram with different window lengths, FChT-based generalization of the spectrogram,$^{13}$ and smoothed pseudo-Wigner distribution (SPWD).$^{14}$ Hamming window was used in all methods. The first conclusion from the results in Figs. 7 and 8 is that the spectrogram, regardless of the segment length, fails to represent properly the harmonics of the voiced utterances within the whole bandwidth. This limitation, which is commonly believed to be caused by "stochastic" components of voiced utterances [28], can be argued here as fast frequency variation of the medium- and high-frequency harmonics. This explanation is validated by the FChT-based spectrogram, whose medium- and high-frequency areas contain detailed harmonic trajectories. Finally, the performance obtained by the SPWD (and other TFDs of Cohen's class) does not especially encourage the use of kernel distributions on chirp-periodic signals. The reason of this poor performance lies on the large TF overlapping between adjacent harmonics, especially in those from the medium and high frequencies. Furthermore, kernel TFD can suit signals with slanted TF geometry [24], but perform poorly with fan TF geometries.

$^{13}$ The chirp rate estimation was based on the inter-frame methodology, i.e., by tracking the pitch.

$^{14}$ Other kernel distributions, such as the Choi–Williams ($\sigma = 1$) and the Zhao–Atlas–Marks distribution [1] were also considered in the analysis, both delivering similar results to the SPWD.

The limitation of the classical spectrogram observed in this paper, overcome by the FChT-based generalization of the spectrogram, may bring important consequences in speech coding: in the harmonic or sinusoidal coding the global bandwidth is commonly split up into two bands [29], the lowest one being considered voiced and the highest one unvoiced; upon estimation of the split frequency each region is coded accordingly. This scenario is illustrated in Fig. 9 with some spectral lines delivered by the classical and the FChT-based spectrogram over the male record. The example (a) corresponds to a speech segment of mild pitch rate: the second half of the Fourier transform is blurry, but the position of the spread harmonics still correspond visually to that given by the more detailed FChT representation. In example (b), the pitch rate of the naturally intonated voiced segment causes the Fourier representation become unclear on the middle frequencies, whereas in the high frequencies the Fourier spectrum seems to contain harmonics again. However, these are false harmonics, caused by the marginalization of the cross-terms (38), that do not match those delivered by the FChT. Consequently, with a FChT-based harmonic the whole bandwidth can be coded entirely as an harmonic structure.

Fig. 9. Fourier transform (top) and FChT (bottom) of the speech record considered in Fig. 8, at instants (a) $t = 2.25$ s and (b) $t = 0.25$ s.

Another example of chirp-periodic signals is found in the song of some mammals. The popular bat echolocation ultrasound$^{15}$ was chosen

$^{15}$ The authors wish to thank Curtis Condon, Ken White, and Al Feng of the Beckman Institute of the University of Illinois for the bat data and for permission to use it in this paper.
Fig. 10. TF analysis of ultrasound bat signal: (a) WD, (b) spectrogram ($T = 0.9$ ms), and (c) FChT spectrogram ($T = 0.9$ ms). The intensity scale follows a root square, that is, amplitude level (instead of energy level) is shown.