STFT Tamu PDF
Overview
Analysis: Fourier-transform view
Analysis: filtering view
Synthesis: filter bank summation (FBS) method
Synthesis: overlap-add (OLA) method
STFT magnitude
Overview
Recap from previous lectures
Discrete time Fourier transform (DTFT)
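Taking the expression of the Fourier transform $X(j\Omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\Omega t}\, dt$,
the DTFT can be derived by numerical integration
$$X(e^{j\omega}) = \sum_{n=-\infty}^{\infty} x[n]\, e^{-j\omega n}$$
where $x[n] = x(nT_s)$ and $\omega = \Omega T_s = 2\pi F / F_s$, with $T_s$ the sampling period and $F_s = 1/T_s$ the sampling rate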
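The DFT is obtained by sampling the DTFT at $N$ equally spaced frequencies $\omega_k = 2\pi k / N$
$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j\frac{2\pi}{N}kn}$$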
A single DFT taken over a long stretch of speech hides how the spectrum changes over time; to avoid this issue, we apply the DFT over short periods of time
For short enough windows, speech can be considered to be stationary
Remember, though, that there is a time-frequency tradeoff here
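The tradeoff can be made concrete with a little arithmetic. The sketch below is only illustrative: the function name `resolution` and the 16 kHz sampling rate used in the usage note are assumptions, and DFT bin spacing is used as a rough proxy for frequency resolution (the actual resolution also depends on the main-lobe width of the window).

```python
def resolution(fs_hz, window_samples):
    """Return (window duration in seconds, DFT bin spacing in Hz).

    Assumes the DFT length equals the window length, so the two
    quantities always multiply to 1: a shorter window localizes
    events better in time but spaces the frequency bins further apart.
    """
    duration_s = window_samples / fs_hz
    bin_spacing_hz = fs_hz / window_samples
    return duration_s, bin_spacing_hz
```

For example, at an assumed 16 kHz sampling rate, `resolution(16000, 80)` describes a 5 ms window with 200 Hz bin spacing, while `resolution(16000, 640)` describes a 40 ms window with 25 Hz bin spacing.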
[Figure: a signal x(t) against time (samples), its spectrum X(f) against frequency (Hz), and its STFT (Hz) against time (frames)]
[Sethares, 2007]
Introduction to Speech Processing | Ricardo Gutierrez-Osuna | CSE@TAMU
[Figure: rectangular, Hann, and Hamming windows; source: http://en.wikipedia.org/wiki/Window_function]
Discrete STFT
By analogy with the DTFT/DFT, the discrete STFT is defined as
$$X(n,k) = X(n,\omega)\big|_{\omega = \frac{2\pi}{N}k}$$
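A minimal sketch of the discrete STFT follows. It assumes the usual definition $X(n,\omega) = \sum_m x[m]\, w[n-m]\, e^{-j\omega m}$ sampled at $\omega = 2\pi k/N$; the helper names `hann` and `stft_frame`, and the choice of a periodic Hann window, are illustrative assumptions rather than part of the lecture material.

```python
import cmath
import math


def hann(length):
    # Periodic Hann window (an assumed choice of analysis window).
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / length) for i in range(length)]


def stft_frame(x, w, n, N):
    """Return X(n, k) for k = 0..N-1 at analysis time n.

    Implements X(n, k) = sum_m x[m] w[n - m] exp(-j 2 pi k m / N),
    treating x[m] as zero outside the available samples.
    """
    X = []
    for k in range(N):
        acc = 0j
        for i, wi in enumerate(w):  # i = n - m, so m = n - i
            m = n - i
            if 0 <= m < len(x):
                acc += x[m] * wi * cmath.exp(-2j * math.pi * k * m / N)
        X.append(acc)
    return X
```

As a usage example, a cosine with exactly 4 cycles per 32 samples, analyzed with `hann(32)` and `N = 32`, should produce its largest magnitude at bin `k = 4`.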
[Figure: a signal x(t) against time (samples), with its wideband and narrowband spectrograms, STFT (Hz) against time (frames)]
[Quatieri, 2002]
[Quatieri, 2002]
Note that each filter is acting as a bandpass filter centered around its
selected frequency
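The bandpass interpretation can be checked numerically. The sketch below assumes the standard definition $X(n,\omega) = \sum_m x[m]\, w[n-m]\, e^{-j\omega m}$ and compares it with the output of a bandpass filter $h[n] = w[n]\, e^{j\omega n}$ followed by demodulation with $e^{-j\omega n}$. The function names are hypothetical.

```python
import cmath


def stft_fourier_view(x, w, n, omega):
    # Direct sum: X(n, omega) = sum_m x[m] w[n - m] exp(-j omega m).
    total = 0j
    for i, wi in enumerate(w):  # i = n - m
        m = n - i
        if 0 <= m < len(x):
            total += x[m] * wi * cmath.exp(-1j * omega * m)
    return total


def stft_filter_view(x, w, n, omega):
    # Bandpass filter centered at omega: the window modulated up to omega.
    h = [wi * cmath.exp(1j * omega * i) for i, wi in enumerate(w)]
    # Convolution output at time n: y[n] = sum_i h[i] x[n - i].
    y = sum(h[i] * x[n - i] for i in range(len(h)) if 0 <= n - i < len(x))
    # Demodulate the bandpass output back down to baseband.
    return cmath.exp(-1j * omega * n) * y
```

Both functions should return the same complex value for any `n` and `omega`, which is what makes the two views of the STFT interchangeable.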
[Figure: analysis and synthesis stages]
[Quatieri, 2002]
Examples
ex6p1.m
Generate STFT using Matlab functions
ex6p2.m
Generate filterbank outputs using the filtering view
of the STFT
ex6p3.m
Time-frequency resolution tradeoff (Quatieri fig 7.8)
Short-time synthesis
Under what conditions is the STFT invertible?
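The discrete-time STFT $X(n,\omega)$ is generally invertible
Recall that
$$X(n,\omega) = \sum_{m=-\infty}^{\infty} f_n[m]\, e^{-j\omega m}$$
with $f_n[m] = x[m]\, w[n-m]$
Evaluating $f_n[m]$ at $m = n$ we obtain $f_n[n] = x[n]\, w[0]$
So assuming that $w[0] \neq 0$, we can estimate $x[n]$ as
$$x[n] = \frac{1}{2\pi\, w[0]} \int_{-\pi}^{\pi} X(n,\omega)\, e^{j\omega n}\, d\omega$$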
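The inversion above has a discrete counterpart that is easy to try out. The sketch below assumes the STFT is sampled at $N$ frequencies with $N$ at least as large as the window length, in which case $x[n] = \frac{1}{N\, w[0]} \sum_{k=0}^{N-1} X(n,k)\, e^{j\frac{2\pi}{N}kn}$. A Hamming window is used because its first sample is nonzero. All function names are illustrative.

```python
import cmath
import math


def hamming(length):
    # Hamming window; note that w[0] = 0.08, which is nonzero.
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / length) for i in range(length)]


def stft_at(x, w, n, N):
    # X(n, k) = sum_m x[m] w[n - m] exp(-j 2 pi k m / N), k = 0..N-1.
    X = []
    for k in range(N):
        acc = 0j
        for i, wi in enumerate(w):  # i = n - m
            m = n - i
            if 0 <= m < len(x):
                acc += x[m] * wi * cmath.exp(-2j * math.pi * k * m / N)
        X.append(acc)
    return X


def recover_sample(X, w0, n):
    # Inverse DFT of the frame evaluated at m = n, divided by w[0].
    N = len(X)
    s = sum(X[k] * cmath.exp(2j * math.pi * k * n / N) for k in range(N))
    return (s / (N * w0)).real
```

Calling `recover_sample(stft_at(x, w, n, N), w[0], n)` for each `n` should return the original samples of a real signal `x`, up to rounding error.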
[Figure: amplitude plots]
[Quatieri, 2002]
[Quatieri, 2002]
$$\sum_{k=0}^{N-1} W\!\left(\omega - \frac{2\pi}{N}k\right) = N\, w[0]$$
This condition is known as the FBS constraint, and states that the frequency
responses of the analysis filters should sum to a constant across the entire
bandwidth
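The constraint can be checked numerically for a given window. The sketch below assumes the constraint takes the form $\sum_{k=0}^{N-1} W(\omega - 2\pi k/N) = N\, w[0]$, where $W(\omega)$ is the DTFT of the analysis window, and that the window is no longer than $N$ samples. The function names are hypothetical.

```python
import cmath
import math


def window_dtft(w, omega):
    # W(omega) = sum_i w[i] exp(-j omega i)
    return sum(wi * cmath.exp(-1j * omega * i) for i, wi in enumerate(w))


def fbs_sum(w, N, omega):
    # Sum of the N analysis filter responses, each a copy of W(omega)
    # shifted to the center frequency 2 pi k / N.
    return sum(window_dtft(w, omega - 2 * math.pi * k / N) for k in range(N))
```

For a window of length at most `N`, `fbs_sum(w, N, omega)` should equal `N * w[0]` at every `omega`, so the combined filter bank has a flat response.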
[Quatieri, 2002]
Synthesis: Overlap-add
OLA is based on the Fourier transform view of the STFT
In the OLA method, we take the inverse DFT for each fixed time in the
discrete STFT
In principle, we could then divide by the analysis window
This method is not used, however, as small perturbations in the STFT can
become amplified in the estimated signal
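Instead, the inverse DFTs of successive frames are overlapped and added
$$y[n] = \frac{1}{W(0)} \sum_{p=-\infty}^{\infty} \left[ \frac{1}{N} \sum_{k=0}^{N-1} X(p,k)\, e^{j\frac{2\pi}{N}kn} \right]$$
where the bracketed term is the inverse DFT of the frame at time $p$ and $W(0) = \sum_n w[n]$
This gives $y[n] = x[n]$ provided that $\sum_{p} w[p-n] = W(0)$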
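The overlap-add procedure can be sketched as follows. This version uses the frame-based convention common in implementations (frame $p$ covers samples $pL, \dots, pL+N-1$, with hop size $L$), which differs from the $w[n-m]$ notation by a time reversal of the window. It assumes a periodic Hann window with a hop of half the window length, for which the shifted windows sum to the constant $W(0)/L$. The function names are illustrative.

```python
import cmath
import math


def dft(v):
    N = len(v)
    return [sum(v[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N))
            for k in range(N)]


def idft(V):
    N = len(V)
    return [sum(V[k] * cmath.exp(2j * math.pi * k * i / N) for k in range(N)) / N
            for i in range(N)]


def ola_roundtrip(x, w, hop):
    """Analyze x frame by frame, then resynthesize it by overlap-add."""
    N = len(w)
    W0 = sum(w)  # W(0), the window's DC gain
    y = [0.0] * len(x)
    for start in range(0, len(x) - N + 1, hop):
        X = dft([x[start + i] * w[i] for i in range(N)])  # analysis
        segment = idft(X)  # inverse DFT for this fixed frame time
        for i in range(N):
            y[start + i] += segment[i].real  # overlap and add
    # Scale by hop / W(0) to undo the summed window gain.
    return [hop * v / W0 for v in y]
```

Samples covered by the full set of overlapping frames are recovered up to rounding error; the first and last `hop` samples are covered by a single frame only, so they remain attenuated by the window.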
[Quatieri, 2002]
[Quatieri, 2002]
STFT magnitude
The spectrogram (STFT magnitude) is widely used in speech processing
For one, evidence suggests that the human ear extracts information
strictly from a spectrogram representation of the speech signal
Likewise, trained researchers can visually read spectrograms, which
further indicates that the spectrogram retains most of the information
in the speech signal (at least at the phonetic level)
Hence, one may question whether the original signal can be
recovered from $|X(n,\omega)|$, that is, by ignoring phase information
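$$x^{i}[n] = \frac{\sum_{m} w[m-n]\, g_m^{i-1}[n]}{\sum_{m} w^{2}[m-n]}$$
where $g_m^{i-1}[n]$ is the inverse DFT of $\hat{X}^{i-1}(m,\omega)$, the STFT that combines the given magnitude with the phase of the previous estimate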
And the process continues iteratively until convergence or a stopping
criterion is met
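$$x^{i+1}[n] = \frac{\sum_{m} w[m-n]\, \frac{1}{2\pi} \int_{-\pi}^{\pi} \hat{X}^{i}(m,\omega)\, e^{j\omega n}\, d\omega}{\sum_{m} w^{2}[m-n]}$$
where $\hat{X}^{i}(m,\omega) = |X(m,\omega)|\, \dfrac{X^{i}(m,\omega)}{|X^{i}(m,\omega)|}$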
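The iterative estimate can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the course's ex6p4.m: it uses a frame-based convention (frame $p$ starts at sample $pL$), a DFT length equal to the window length, and hypothetical function names. Each iteration keeps the phase of the current estimate, imposes the known magnitude, and applies the weighted overlap-add formula.

```python
import cmath
import math


def dft(v):
    N = len(v)
    return [sum(v[i] * cmath.exp(-2j * math.pi * k * i / N) for i in range(N))
            for k in range(N)]


def idft(V):
    N = len(V)
    return [sum(V[k] * cmath.exp(2j * math.pi * k * i / N) for k in range(N)) / N
            for i in range(N)]


def stft(x, w, hop):
    N = len(w)
    return [dft([x[s + i] * w[i] for i in range(N)])
            for s in range(0, len(x) - N + 1, hop)]


def ls_inverse(frames, w, hop, length):
    # Weighted overlap-add: sum_p w * g_p divided by sum_p w^2.
    N = len(w)
    num = [0.0] * length
    den = [0.0] * length
    for p, F in enumerate(frames):
        g = idft(F)
        for i in range(N):
            num[p * hop + i] += w[i] * g[i].real
            den[p * hop + i] += w[i] ** 2
    # Samples that no window weights are left at zero.
    return [a / b if b > 1e-12 else 0.0 for a, b in zip(num, den)]


def magnitude_error(x, target_mag, w, hop):
    # Squared distance between |STFT of x| and the target magnitude.
    return sum((abs(c) - t) ** 2
               for F, T in zip(stft(x, w, hop), target_mag)
               for c, t in zip(F, T))


def estimate_from_magnitude(target_mag, w, hop, length, x0, iterations):
    x = list(x0)
    errors = [magnitude_error(x, target_mag, w, hop)]
    for _ in range(iterations):
        # Keep the current phase, impose the known magnitude.
        constrained = [[cmath.rect(t, cmath.phase(c)) for c, t in zip(F, T)]
                       for F, T in zip(stft(x, w, hop), target_mag)]
        x = ls_inverse(constrained, w, hop, length)
        errors.append(magnitude_error(x, target_mag, w, hop))
    return x, errors
```

The returned `errors` list tracks the magnitude mismatch per iteration and can serve as the stopping criterion. The mismatch should not increase from one iteration to the next, although the procedure may settle on a signal that differs from the original (for example in sign).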
Example
ex6p4.m
Estimate a signal from its STFT magnitude