
Music 421 Spring 2004-2005 Homework #8 Overlap-Add STFT Processing, Filter Banks 60 points Due in 5 days (5/31/2005)

1. (10 pts) Draw a block diagram of the filter bank interpretation of the DFT, and briefly explain the function of each block.

Solution:

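(10 points) A diagram of the DFT filter bank is shown in Fig. 1.

[Figure 1: DFT filter bank with h being a running sum filter. The input x(n) feeds N parallel channels. In channel k (k = 0, 1, ..., N−1), x(n) is multiplied by e^{−jω_k nT} (which equals 1 for k = 0) to give x_k(n), and x_k(n) is filtered by h to give y_k(n).]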

(a) $e^{-j\omega_k n}$ is a complex sinusoid that modulates the signal component at frequency $\omega_k$ down to DC.

(b) $h$ is the length-$N$ running-sum filter. It sums $N$ samples of its input signal, from sample $n - N + 1$ up to sample $n$. In other words, it "DC-passes" its input signal.

(c) $y_k(n)$, the output of channel $k$ at time $n$, is the $k$th DFT coefficient of the current frame of the signal at time $n$.
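The three blocks can be checked numerically. The following is a minimal sketch, assuming unit sampling interval and $\omega_k = 2\pi k/N$; the function names and the test signal are illustrative and not part of the assignment. It modulates, applies the running sum, and compares the result with a directly computed DFT of the current frame. For the first full frame the two agree exactly. For later frames they agree up to the phase factor $e^{-j\omega_k (n-N+1)}$, because the modulator's phase is referenced to time 0 rather than to the start of the frame.

```python
import cmath

def channel_output(x, N, k, n):
    """y_k(n): modulate x by e^{-j w_k m}, then sum the last N samples.

    Assumes n >= N - 1 so that the whole frame lies inside x.
    """
    w_k = 2 * cmath.pi * k / N
    return sum(x[m] * cmath.exp(-1j * w_k * m) for m in range(n - N + 1, n + 1))

def dft_bin(frame, k):
    """k-th DFT coefficient of a length-N frame."""
    N = len(frame)
    return sum(frame[l] * cmath.exp(-2j * cmath.pi * k * l / N) for l in range(N))

N = 8
x = [float((7 * i) % 5 - 2) for i in range(32)]  # arbitrary test signal

# First full frame (samples 0 .. N-1): exact match with the DFT.
first_frame_match = all(
    abs(channel_output(x, N, k, N - 1) - dft_bin(x[:N], k)) < 1e-9
    for k in range(N)
)

# Later frame (samples n-N+1 .. n): match after undoing the phase factor.
n = 20
later_frame_match = all(
    abs(channel_output(x, N, k, n)
        - cmath.exp(-2j * cmath.pi * k * (n - N + 1) / N)
        * dft_bin(x[n - N + 1:n + 1], k)) < 1e-9
    for k in range(N)
)

print(first_frame_match, later_frame_match)
```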

2. Define the signal $y_k(m) = X_m(\omega_k)\,e^{j\omega_k mR}$, with $k$ viewed as a fixed parameter and $m$ viewed as the independent variable.

(a) (10 pts) Show that

\[
\frac{1}{N}\sum_{k=0}^{N-1} y_k(m) = w(0)\,x(mR)
\]

if $N \ge M$, or if $N < M$ and $w(rN) = 0$ for $r = \pm 1, \pm 2, \ldots$

(b) (2 pts) What does the term $e^{j\omega_k mR}$ do in the reconstruction?

(c) (8 pts) What are the disadvantages of using the case $N < M$?

(d) (10 pts) How do we recover $x(n)$ for all $n$ when $R > 1$?

Solution:

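(10 points) This is another exercise in manipulating summations and correctly handling impulse trains when they arise. Using the STFT definition $X_m(\omega_k) = \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,e^{-j\omega_k n}$ with $\omega_k = 2\pi k/N$:

\begin{align*}
\frac{1}{N}\sum_{k=0}^{N-1} y_k(m)
&= \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k)\,e^{j\omega_k mR} \\
&= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,e^{-j\omega_k n}\,e^{j\omega_k mR} \\
&= \frac{1}{N}\sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\sum_{k=0}^{N-1} e^{-j\omega_k (n-mR)} \\
&= \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\sum_{r=-\infty}^{\infty}\delta(n-mR-rN) \\
&= \sum_{n=-\infty}^{\infty} x(n+mR)\,w(n)\sum_{r=-\infty}^{\infty}\delta(n-rN) \\
&= \sum_{r=-\infty}^{\infty} x(rN+mR)\,w(rN) \\
&= x(mR)w(0) + x(N+mR)w(N) + x(-N+mR)w(-N) \\
&\qquad + x(2N+mR)w(2N) + x(-2N+mR)w(-2N) + \cdots \\
&= w(0)\,x(mR)
\end{align*}

given $N \ge M$, or $N < M$ and $w(rN) = 0$ for $r = \pm 1, \pm 2, \pm 3, \ldots$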
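The identity can be spot-checked numerically. This is a minimal sketch, assuming a zero-phase Hamming-type window of odd length $M$ and an arbitrary test signal; the function names and parameter values are illustrative. With $N \ge M$ the channel average returns $w(0)\,x(mR)$. With $N < M$ and $w(\pm N) \ne 0$ it returns the time-aliased sum $\sum_r x(rN + mR)\,w(rN)$ instead.

```python
import cmath
import math

def window(n, M):
    """Zero-phase Hamming-type window of odd length M (zero for |n| > (M-1)/2)."""
    if abs(n) > (M - 1) // 2:
        return 0.0
    return 0.54 + 0.46 * math.cos(2 * math.pi * n / (M - 1))

def stft_bin(x, m, k, N, M, R):
    """X_m(w_k) = sum_n x(n) w(n - mR) e^{-j w_k n}, with w_k = 2 pi k / N."""
    w_k = 2 * math.pi * k / N
    return sum(x[n] * window(n - m * R, M) * cmath.exp(-1j * w_k * n)
               for n in range(len(x)))

def channel_average(x, m, N, M, R):
    """(1/N) sum_k y_k(m), where y_k(m) = X_m(w_k) e^{j w_k m R}."""
    total = 0j
    for k in range(N):
        w_k = 2 * math.pi * k / N
        total += stft_bin(x, m, k, N, M, R) * cmath.exp(1j * w_k * m * R)
    return total / N

x = [math.sin(0.3 * i) + 0.1 * i for i in range(40)]  # arbitrary test signal
M, R, m = 7, 3, 5                                     # frame centered at mR = 15

# N >= M: the average equals w(0) x(mR).
exact_case = channel_average(x, m, 8, M, R)

# N = 3 < M with w(+-3) != 0: time aliasing adds w(3) [x(mR+3) + x(mR-3)].
aliased_case = channel_average(x, m, 3, M, R)
aliased_expected = x[15] + window(3, M) * (x[18] + x[12])

print(abs(exact_case - window(0, M) * x[15]) < 1e-9)
print(abs(aliased_case - aliased_expected) < 1e-9)
```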

(a) (2 points) The term $e^{j\omega_k mR}$ modulates the decimated filter bank output back up to the proper frequency. We could say that it acts like a "remodulator".

(b) (8 points) For $N \ge M$, the FFT is longer than the window. The window is zero for $|n| > (M-1)/2$, so $w(rN) = 0$ for $r \ne 0$ (the window is said to be Nyquist(N)), and the relationship holds. If $N < M$, the condition $w(rN) = 0$ for $r \ne 0$ is necessary for the relationship to hold. So the disadvantage of using the case $N < M$ is that we have a supplementary constraint on the choice of the window. Note also that if the spectrum is modified before resynthesis, it cannot be guaranteed that the undersampled spectral components will still reconstruct to the modified signal obtained via the corresponding time-domain filtering operation.

(c) (10 points) To recover $x(n)$ for all $n$ when $R > 1$:

i. stretch the channel signals (STFT) by a factor $R$

ii. feed them into an interpolation filter

iii. remodulate

iv. sum up to obtain $x(n)$
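The four steps can be sketched as follows. This is an illustration under stated assumptions, not the course's reference implementation: the window is a zero-phase Hann window whose shifts by $R = L$ sum to 1, the interpolation filter is a crude rectangular lowpass $f(p) = 1$ for $|p| < L$, and $N \ge 2L - 1$ so that no time aliasing occurs. All names and parameter values are hypothetical. Under these choices the product $f \cdot w$ overlap-adds to a constant, so the output equals $x(n)$ wherever the frames fully cover the signal.

```python
import cmath
import math

def hann(n, L):
    """Zero-phase Hann window, nonzero for |n| < L; its shifts by L sum to 1."""
    return 0.5 + 0.5 * math.cos(math.pi * n / L) if abs(n) < L else 0.0

def analyze(x, N, L, R):
    """Decimated STFT: frames[m][k] = sum_l x(l) w(l - mR) e^{-j w_k l}."""
    num_frames = (len(x) - 1) // R + 1
    return [[sum(x[l] * hann(l - m * R, L) * cmath.exp(-2j * math.pi * k * l / N)
                 for l in range(len(x)))
             for k in range(N)]
            for m in range(num_frames)]

def resynthesize(frames, N, L, R, length):
    y = [0j] * length
    for k in range(N):
        # i. stretch: place channel samples R apart, zeros in between
        stretched = [0j] * length
        for m, row in enumerate(frames):
            stretched[m * R] = row[k]
        for n in range(length):
            # ii. interpolate with the rectangular filter f(p) = 1, |p| < L
            interpolated = sum(stretched[max(0, n - L + 1):min(length, n + L)])
            # iii. remodulate to w_k, and iv. accumulate over channels
            y[n] += interpolated * cmath.exp(2j * math.pi * k * n / N)
    return [value.real / N for value in y]

N, L, R = 8, 4, 4
x = [math.sin(0.3 * i) + 0.05 * i for i in range(40)]  # arbitrary test signal
x_hat = resynthesize(analyze(x, N, L, R), N, L, R, len(x))

# Frames are centered at 0, 4, ..., 36, so samples 0..36 are fully covered.
max_error = max(abs(x_hat[n] - x[n]) for n in range(37))
print(max_error < 1e-9)
```

A smoother interpolation filter would normally be preferred when the spectrum is modified between analysis and resynthesis; the rectangular choice here only keeps the sketch short.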

3. (20 pts) Suppose the window transform $W(\omega)$ is a lowpass filter with cut-off frequency $\omega_c = 2\pi/R$. That is, $W(\omega) \approx 0$ for $|\omega| \ge \omega_c$. In this case, show that

\[
\sum_{m=-\infty}^{\infty} w(n - mR) \approx \frac{1}{R}\,W(0).
\]

If these approximations were exact equalities, specify the set of usable frame step sizes $R'$ such that

\[
\sum_{m=-\infty}^{\infty} w(n - mR') = \text{constant}.
\]

Solution:

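(20 points) By Poisson's summation formula,

\[
\sum_{m=-\infty}^{\infty} w(n - mR) = \frac{1}{R}\sum_{k=0}^{R-1} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R}.
\]

If $W(\omega) \approx 0$ for $|\omega| \ge \omega_c = 2\pi/R$, then

\[
\sum_{k \ne 0} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R} \approx 0, \quad \forall n,
\]

and thus

\[
\sum_{m=-\infty}^{\infty} w(n - mR) \approx \frac{1}{R}\,W(0),
\]

which is a constant.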
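The Poisson summation step can be verified numerically. This is a minimal sketch, assuming a zero-phase Hann window that is nonzero for $|n| < L$; the names and the values of $L$ and $R$ are illustrative. For this window $W(2\pi k/R)$ vanishes exactly at the nonzero harmonics when $R = L$ or $R = L/2$, so the overlap sum equals $W(0)/R$. For $R = 3$ the harmonics do not land on zeros of $W$, so the Poisson identity still holds but the sum is no longer constant in $n$.

```python
import cmath
import math

def w(n, L):
    """Zero-phase Hann window, nonzero for |n| < L."""
    return 0.5 + 0.5 * math.cos(math.pi * n / L) if abs(n) < L else 0.0

def W(omega, L):
    """DTFT of the window at frequency omega (radians per sample)."""
    return sum(w(n, L) * cmath.exp(-1j * omega * n) for n in range(-L + 1, L))

def overlap_sum(n, R, L):
    """sum_m w(n - mR); only shifts with |n - mR| < L contribute."""
    m_lo = (n - L) // R
    m_hi = (n + L) // R + 1
    return sum(w(n - m * R, L) for m in range(m_lo, m_hi + 1))

def poisson_rhs(n, R, L):
    """(1/R) sum_{k=0}^{R-1} W(2 pi k / R) e^{j 2 pi k n / R}."""
    return sum(W(2 * math.pi * k / R, L) * cmath.exp(2j * math.pi * k * n / R)
               for k in range(R)) / R

L = 8
identity_holds = all(
    abs(overlap_sum(n, R, L) - poisson_rhs(n, R, L)) < 1e-9
    for R in (3, 4, 8) for n in range(16)
)
constant_for_good_R = all(
    abs(overlap_sum(n, R, L) - W(0.0, L).real / R) < 1e-9
    for R in (4, 8) for n in range(16)
)
ripple_for_R3 = (max(overlap_sum(n, 3, L) for n in range(3))
                 - min(overlap_sum(n, 3, L) for n in range(3)))

print(identity_holds, constant_for_good_R, ripple_for_R3 > 1e-6)
```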

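If $W(\omega) = 0$ for $|\omega| \ge \omega_c$ (the approximation is an exact equality), then the Poisson summation for a frame step size $R'$ can be rewritten as

\[
\sum_{m=-\infty}^{\infty} w(n - mR') = \frac{1}{R'}\,W(0) + \frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega'_k)\,e^{j\omega'_k n},
\quad \text{where } \omega'_k = 2\pi k/R'.
\]

With any $R' \le R$, $\omega'_k \ge \omega_c$ for $k = 1, \ldots, R'-1$, and

\[
\frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega'_k)\,e^{j\omega'_k n} = 0.
\]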

Therefore,

\[
\sum_{m=-\infty}^{\infty} w(n - mR')
\]

is constant for any $R' \le R$.

This problem illustrates the basic point made in the 1977 paper by Allen. If your window transform is a good lowpass filter, any frame step size $R$ (filter bank decimation factor) less than or equal to $\pi/\omega_c$ will allow aliasing-free reconstruction via the STFT. This is because any such step size gives a sufficiently high sampling rate for each STFT bin over time.

Step sizes longer than $\pi/\omega_c$ (e.g. $2\pi/\omega_c = M/2$ for the Hamming window) rely on aliasing cancellation (or zeroing) to give perfect reconstruction by the inverse STFT. Therefore, spectral modifications may disturb this cancellation, rendering the STFT less robust.

4. (Optional) Cross-Synthesis. Download the skeleton program hw8xsynth.m [1] and the sound source files SteveJobs.wav [2] and motorcycle.wav [3]. The program analyzes the spectral envelope of the speech, which is then imposed on the spectrum of a broadband signal (here, a motorcycle sound).

(a) (5 pts) Fill in the comments (5 of them) in the program to explain what the code in the next few lines does and why we might want to do that.

(b) (25 pts) Fill in the unfinished lines to make an alias-free cross synthesizer. Turn in your code with all the comments completed and a sample of your cross-synthesis result between SteveJobs.wav and motorcycle.wav. Name the cross-synthesis wave file xxxxhw8.wav, where xxxx are the first four letters of your last name.

(c) (5 pts) For an arbitrary time n, plot the following:

i. The short-time speech spectrum magnitude (dB).

ii. The amplitude response of the all-pole filter 1/A(z) obtained by linear prediction analysis at that time.

iii. The short-time spectral magnitude (dB) of the synthesized sample.

(d) (5 pts) Discuss the criteria for selecting the two signals to be fed into the cross-synthesis so that the synthesized speech is clearly intelligible. Remark: One good thing to do is to create bad examples as well as good examples, and investigate why they are good or bad.
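For item (c)ii, the all-pole envelope 1/A(z) can be obtained in several ways. The following is a minimal sketch, assuming the autocorrelation method with the Levinson-Durbin recursion; the skeleton hw8xsynth.m is not reproduced here and may use a different routine, and every function name below is hypothetical. As a sanity check, the test signal is the impulse response of a known all-pole filter, so the analysis should recover its coefficients.

```python
import cmath
import math

def autocorrelation(x, max_lag):
    """r(lag) = sum_n x(n) x(n - lag) for lag = 0 .. max_lag."""
    return [sum(x[n] * x[n - lag] for n in range(lag, len(x)))
            for lag in range(max_lag + 1)]

def levinson_durbin(r, order):
    """Solve the LPC normal equations; returns (a, err) with a[0] = 1."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k                  # updated prediction error
    return a, err

def all_pole_response_db(a, gain, omega):
    """20 log10 |gain / A(e^{j omega})| with A(z) = sum_j a[j] z^{-j}."""
    A = sum(a[j] * cmath.exp(-1j * omega * j) for j in range(len(a)))
    return 20.0 * math.log10(abs(gain / A))

# Test signal: impulse response of 1 / (1 - 1.2 z^-1 + 0.72 z^-2).
h = [0.0] * 400
for n in range(len(h)):
    h[n] = (1.0 if n == 0 else 0.0)
    if n >= 1:
        h[n] += 1.2 * h[n - 1]
    if n >= 2:
        h[n] -= 0.72 * h[n - 2]

a, err = levinson_durbin(autocorrelation(h, 2), 2)
dc_gain_db = all_pole_response_db(a, math.sqrt(err), 0.0)
print(a, err, dc_gain_db)
```

In a cross-synthesizer this analysis would be applied to each windowed speech frame, and the resulting envelope evaluated on the FFT frequency grid.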

[1] http://www-ccrma.stanford.edu/~jos/hw421/hw8/hw8xsynth.m

[2] http://www-ccrma.stanford.edu/~jos/hw421/hw8/SteveJobs.wav

[3] http://www-ccrma.stanford.edu/~jos/hw421/hw8/motorcycle.wav
