Music 421, Spring 2004-2005
Homework #8: Overlap-Add STFT Processing, Filter Banks
60 points
Due in 5 days (5/31/2005)

1. (10 pts) Draw a block diagram of the filter bank interpretation of the DFT, and briefly explain the function of each block.

Solution: (10 points) A diagram of the DFT filter bank is shown in Fig. 1.

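[Figure 1 diagram: the input $x(n)$ feeds $N$ parallel channels, $k = 0, \ldots, N-1$. In channel $k$, $x(n)$ is multiplied by $e^{-j\omega_k nT}$ (which equals 1 for $k = 0$) to give $x_k(n)$, which is then filtered by $h$ to give the channel output $y_k(n)$.]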
Figure 1: DFT filter bank with h being a running sum filter.

(a) $e^{-j\omega_k n}$ is a complex sinusoid that modulates the signal component at frequency $\omega_k$ down to DC.

(b) $h$ is the length-$N$ running sum filter. It sums $N$ samples of its input signal, from sample $n-N+1$ up to sample $n$. In other words, it acts as a lowpass ("DC-pass") filter on its input signal.

(c) $y_k(n)$, the output of channel $k$ at time $n$, is the $k$th DFT coefficient of the current frame of the signal at time $n$.

2. Define the signal $y_k(m) = X_m(\omega_k)e^{j\omega_k mR}$, with $k$ viewed as a fixed parameter and $m$ viewed as the independent variable.

(a) (10 pts) Show that
\[
\frac{1}{N}\sum_{k=0}^{N-1} y_k(m) = w(0)\,x(mR)
\]

if $N \ge M$, or if $N < M$ and $w(rN) = 0$, $r = \pm 1, \pm 2, \ldots$.

(b) (2 pts) What does the term $e^{j\omega_k mR}$ do in the reconstruction?

(c) (8 pts) What are the disadvantages of using the case $N < M$?

(d) (10 pts) How do we recover $x(n)$ for all $n$ when $R > 1$?

Solution:

(a) (10 points) This is another exercise in manipulating summations and correctly handling impulse trains when they arise. With $\omega_k = 2\pi k/N$,
\[
\begin{aligned}
\frac{1}{N}\sum_{k=0}^{N-1} y_k(m)
&= \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k)\,e^{j\omega_k mR} \\
&= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,e^{-j\omega_k n}\,e^{j\omega_k mR} \\
&= \frac{1}{N}\sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\sum_{k=0}^{N-1} e^{-j\omega_k (n-mR)} \\
&= \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\sum_{r=-\infty}^{\infty} \delta(n-mR-rN) \\
&= \sum_{n=-\infty}^{\infty} x(n+mR)\,w(n)\sum_{r=-\infty}^{\infty} \delta(n-rN) \\
&= \sum_{r=-\infty}^{\infty} x(rN+mR)\,w(rN) \\
&= x(mR)w(0) + x(N+mR)w(N) + x(-N+mR)w(-N) \\
&\quad + x(2N+mR)w(2N) + x(-2N+mR)w(-2N) + \cdots \\
&= w(0)\,x(mR)
\end{aligned}
\]
given $N \ge M$, or $N < M$ and $w(rN) = 0$, $r = \pm 1, \pm 2, \pm 3, \ldots$.

(b) (2 points) The term $e^{j\omega_k mR}$ modulates the decimated filter bank output back up to the proper frequency. We could say that it acts like a "remodulator".

(c) (8 points) For $N \ge M$, the FFT is at least as long as the window. The window is zero for $|n| > (M-1)/2$, so $w(rN) = 0$ for $r \ne 0$, and the relationship holds. The window is said to be Nyquist($N$). If $N < M$, the condition $w(rN) = 0$ for $r \ne 0$ is necessary for the relationship to hold. So the disadvantage of using the case $N < M$ is that we have an additional constraint on the choice of the window. Note also that if the spectrum is modified before resynthesis, there is no guarantee that the undersampled spectral components will still reconstruct the modified signal that the corresponding time-domain filtering operation would produce.

(d) (10 points) To recover $x(n)$ for all $n$ when $R > 1$:
i. stretch (upsample) the channel signals (the STFT) by a factor of $R$
ii. feed them into an interpolation filter
iii. remodulate
iv. sum up to obtain $x(n)$
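The identity in part (a) can be checked numerically. The following is a minimal Python sketch, not part of the original assignment: the test signal `x`, the zero-phase Hann-like `window`, and all function names are hypothetical choices made only for illustration. It evaluates $\frac{1}{N}\sum_k X_m(\omega_k)e^{j\omega_k mR}$ directly and compares it with $\sum_r x(rN+mR)\,w(rN)$, which reduces to $w(0)\,x(mR)$ when $N \ge M$.

```python
import cmath
import math

def x(n):
    """Arbitrary test signal defined for every integer n (an assumption)."""
    return math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n + 0.2)

def window(n, M):
    """Zero-phase Hann-like window of odd length M: zero for |n| > (M-1)/2."""
    if abs(n) > (M - 1) // 2:
        return 0.0
    return 0.5 + 0.5 * math.cos(2 * math.pi * n / (M + 1))

def stft_bin(m, k, R, N, M):
    """X_m(w_k) = sum_n x(n) w(n - mR) exp(-j w_k n), with w_k = 2 pi k / N."""
    half = (M - 1) // 2
    wk = 2 * math.pi * k / N
    # Only the samples under the shifted window contribute to the sum.
    return sum(x(n) * window(n - m * R, M) * cmath.exp(-1j * wk * n)
               for n in range(m * R - half, m * R + half + 1))

def channel_average(m, R, N, M):
    """(1/N) sum_k y_k(m), where y_k(m) = X_m(w_k) exp(j w_k m R)."""
    total = 0j
    for k in range(N):
        wk = 2 * math.pi * k / N
        total += stft_bin(m, k, R, N, M) * cmath.exp(1j * wk * m * R)
    return total / N

def aliased_sum(m, R, N, M):
    """sum_r x(rN + mR) w(rN), over the r for which w(rN) can be nonzero."""
    r_max = ((M - 1) // 2) // N
    return sum(x(r * N + m * R) * window(r * N, M)
               for r in range(-r_max, r_max + 1))

if __name__ == "__main__":
    M, R, m = 9, 3, 4
    for N in (16, 4):  # first N >= M, then N < M
        print(N, channel_average(m, R, N, M), aliased_sum(m, R, N, M),
              window(0, M) * x(m * R))
```

With these assumed parameters, the $N = 16$ case matches $w(0)\,x(mR)$, while the $N = 4$ case matches only the aliased sum, because this particular window is nonzero at $n = \pm 4$.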

3. (20 pts) Suppose the window transform $W(\omega)$ is a lowpass filter with cut-off frequency $\omega_c = 2\pi/R$. That is, $W(\omega) \approx 0$ for $|\omega| \ge \omega_c$. In this case, show that
\[
\sum_{m=-\infty}^{\infty} w(n-mR) \approx \frac{1}{R}\,W(0).
\]
If these approximations were exact equalities, specify the set of usable frame step sizes $R'$ such that
\[
\sum_{m=-\infty}^{\infty} w(n-mR') = \text{constant}.
\]

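Solution: (20 points) By Poisson's summation formula,
\[
\sum_{m=-\infty}^{\infty} w(n-mR) = \frac{1}{R}\sum_{k=0}^{R-1} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R}.
\]
If $W(\omega) \approx 0$ for $|\omega| \ge \omega_c = 2\pi/R$, then
\[
\sum_{k=1}^{R-1} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R} \approx 0, \quad \forall n,
\]
and thus
\[
\sum_{m=-\infty}^{\infty} w(n-mR) \approx \frac{1}{R}\,W(0),
\]
which is a constant.

If $W(\omega) = 0$ for $|\omega| \ge \omega_c$ (the approximation is an exact equality), then the Poisson summation for a step size $R'$ can be rewritten as
\[
\sum_{m=-\infty}^{\infty} w(n-mR') = \frac{1}{R'}\,W(0) + \frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega_k)\,e^{j\omega_k n},
\qquad \text{where } \omega_k = \frac{2\pi k}{R'}.
\]
With any $R' \le R$, $\omega_k \ge \omega_c$ for $k = 1, \ldots, R'-1$. Therefore,
\[
\frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega_k)\,e^{j\omega_k n} = 0,
\]
and $\sum_{m=-\infty}^{\infty} w(n-mR')$ is constant for any $R' \le R$.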
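The constant-overlap-add result can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original solution: it uses a periodic Hann window of length 16 (a hypothetical choice, picked because its transform is exactly zero at the frequencies $2\pi k/R'$, $k = 1, \ldots, R'-1$, whenever $R'$ divides 16 and $16/R' \ge 2$), and the helper names are made up for this example. Since the overlap-add sum is periodic in $n$ with period $R'$, it is enough to evaluate it for $n = 0, \ldots, R'-1$.

```python
import math

def periodic_hann(M):
    """Periodic Hann window w(n) = 0.5 - 0.5 cos(2 pi n / M), n = 0..M-1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]

def overlap_add_sum(w, R, n):
    """Sum over all m of w(n - mR), with w taken as zero outside 0..M-1."""
    # n - mR visits every window index congruent to n modulo R.
    return sum(w[i] for i in range(n % R, len(w), R))

def ola_spread(w, R):
    """Largest minus smallest overlap-add sum over one period of n."""
    values = [overlap_add_sum(w, R, n) for n in range(R)]
    return max(values) - min(values)

if __name__ == "__main__":
    w = periodic_hann(16)
    W0 = sum(w)  # W(0), the window transform at DC
    for R in (2, 4, 8, 5, 16):
        # Columns: step size, sum at n = 0, predicted W(0)/R, spread over n
        print(R, overlap_add_sum(w, R, 0), W0 / R, ola_spread(w, R))
```

Under these assumptions, step sizes 2, 4, and 8 give a flat sum equal to $W(0)/R'$, while step sizes 5 and 16 leave a visible ripple.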

This problem illustrates the basic point made in the 1977 paper by Allen. If your window transform is a good lowpass filter, any frame step size $R$ (filter bank decimation factor) less than or equal to $\pi/\omega_c$ will allow aliasing-free reconstruction via the STFT. This is because any such step size gives a sufficiently high sampling rate for each STFT bin over time. Step sizes longer than $\pi/\omega_c$ (e.g., $2\pi/\omega_c = M/2$ for the Hamming window) rely on aliasing cancellation (or zeroing) to give perfect reconstruction by the inverse STFT. Spectral modifications may therefore disturb this cancellation, rendering the STFT less robust.

4. (Optional) Cross-Synthesis

Download the skeleton program hw8xsynth.m [1] and the sound source files, SteveJobs.wav [2] and motorcycle.wav [3]. The program analyzes the spectral envelope of the speech, which is then imposed on the spectrum of a broadband signal (here, a motorcycle sound).

(a) (5 pts) Fill in the comments (5 of them) in the program to explain what the code in the next few lines does and why we might want to do that.

(b) (25 pts) Fill in the unfinished lines to make an alias-free cross synthesizer. Turn in your code with all the comments completed and a sample of your cross-synthesis result between SteveJobs.wav and motorcycle.wav. Name the cross-synthesis wave file xxxxhw8.wav, where xxxx are the first four letters of your last name.

(c) (5 pts) For an arbitrary time n, plot the following:
i. The short-time speech spectrum magnitude (dB).
ii. The amplitude response of the all-pole filter 1/A(z) obtained by linear prediction analysis at that time.
iii. The short-time spectral magnitude (dB) of the synthesized sample.

(d) (5 pts) Discuss the criteria for selecting the two signals to be fed into the cross-synthesis so that the synthesized speech is clearly intelligible.

Remark: It is a good idea to create bad examples as well as good examples, and to investigate why they are good or bad.

[1] http://www-ccrma.stanford.edu/~jos/hw421/hw8/hw8xsynth.m
[2] http://www-ccrma.stanford.edu/~jos/hw421/hw8/SteveJobs.wav
[3] http://www-ccrma.stanford.edu/~jos/hw421/hw8/motorcycle.wav
