Music 421, Spring 2004-2005
Homework #8: Overlap-Add STFT Processing, Filter Banks
60 points. Due in 5 days (5/31/2005)
1. (10 pts) Draw a block diagram of the filter bank interpretation of the DFT, and briefly explain the function of each block.
Solution:
(10 points) A diagram of the DFT filter bank is shown in Fig. 1.

[Figure 1: DFT filter bank with $h$ being a running-sum filter. The input is $x(n)$.]

(a) $e^{-j\omega_k n}$: a complex sinusoid that modulates the signal component at frequency $\omega_k$ down to DC.
(b) $h$: the length-$N$ running-sum filter. It sums $N$ samples of its input, from sample $n-N+1$ up to $n$. In other words, it "DC-passes" its input signal.
(c) $y_k(n)$: the output of channel $k$ at time $n$ is the $k$th DFT coefficient of the current frame of the signal at time $n$.
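The filter bank interpretation can be checked numerically. The sketch below is an illustration only, not part of the assignment: the test signal, the frame length `N = 8`, and the helper names `dft` and `channel_output` are all assumptions. It modulates the input by $e^{-j\omega_k l}$, applies the length-$N$ running sum, and compares the result with a direct DFT of the most recent $N$ samples. Because time is measured from $n = 0$ rather than from the start of the frame, the two agree up to a linear-phase factor, which the code applies explicitly.

```python
import cmath
import math


def dft(frame):
    """Direct length-N DFT of a list of samples."""
    N = len(frame)
    return [
        sum(frame[l] * cmath.exp(-2j * math.pi * k * l / N) for l in range(N))
        for k in range(N)
    ]


def channel_output(x, k, N, n):
    """Channel k at time n: modulate by exp(-j*w_k*l), then running-sum N samples."""
    w_k = 2 * math.pi * k / N
    # Running sum over the N most recent samples, l = n-N+1 .. n.
    return sum(x[l] * cmath.exp(-1j * w_k * l) for l in range(n - N + 1, n + 1))


N = 8
x = [math.sin(0.3 * n) + 0.5 * math.cos(1.1 * n) for n in range(32)]
n = 20  # any time index with n >= N - 1
X = dft(x[n - N + 1 : n + 1])  # DFT of the current frame

max_err = 0.0
for k in range(N):
    # Phase term from measuring time from 0 instead of from the frame start.
    phase = cmath.exp(-2j * math.pi * k * (n - N + 1) / N)
    max_err = max(max_err, abs(channel_output(x, k, N, n) - phase * X[k]))

print(max_err)
```

The magnitudes of the channel outputs match the DFT magnitudes exactly; only the phase reference differs.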
2. Define the signal $y_k(m) = X_m(\omega_k)\,e^{j\omega_k mR}$, with $k$ viewed as a fixed parameter and $m$ viewed as the independent variable.

(a) (10 pts) Show that

$$\frac{1}{N}\sum_{k=0}^{N-1} y_k(m) = w(0)\,x(mR)$$

if $N \ge M$, or if $N < M$ and $w(mN) = 0$, $m = \pm 1, \pm 2, \ldots$

(b) (2 pts) What does the term $e^{j\omega_k mR}$ do in the reconstruction?
(c) (8 pts) What are the disadvantages of using the case $N < M$?
(d) (10 pts) How do we recover $x(n)$ for all $n$ when $R > 1$?
Solution:
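(10 points) This is another exercise in manipulating summations and correctly handling impulse trains when they arise. With $\omega_k = 2\pi k/N$ and the sum over $n$ running over all integers (the window limits it to $M$ nonzero terms):

$$
\begin{aligned}
\frac{1}{N}\sum_{k=0}^{N-1} y_k(m)
&= \frac{1}{N}\sum_{k=0}^{N-1} X_m(\omega_k)\,e^{j\omega_k mR} \\
&= \frac{1}{N}\sum_{k=0}^{N-1}\sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,e^{-j\omega_k n}\,e^{j\omega_k mR} \\
&= \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\,\frac{1}{N}\sum_{k=0}^{N-1} e^{-j\omega_k (n-mR)} \\
&= \sum_{n=-\infty}^{\infty} x(n)\,w(n-mR)\sum_{r=-\infty}^{\infty}\delta(n-mR-rN) \\
&= \sum_{n=-\infty}^{\infty} x(n+mR)\,w(n)\sum_{r=-\infty}^{\infty}\delta(n-rN) \\
&= \sum_{r=-\infty}^{\infty} x(rN+mR)\,w(rN) \\
&= x(mR)\,w(0) + x(N+mR)\,w(N) + x(-N+mR)\,w(-N) \\
&\qquad + x(2N+mR)\,w(2N) + x(-2N+mR)\,w(-2N) + \cdots \\
&= w(0)\,x(mR),
\end{aligned}
$$

given $N \ge M$, or $N < M$ and $w(rN) = 0$ for $r = \pm 1, \pm 2, \pm 3, \ldots$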
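The identity and its failure mode can be checked numerically. The sketch below is an illustration under stated assumptions: the test signal `x`, the zero-phase raised-cosine window `w` of odd length `M = 9`, and the values `m = 3`, `R = 4` are arbitrary choices, and `channel_sum` is a hypothetical helper name. It evaluates $\frac{1}{N}\sum_k X_m(\omega_k)e^{j\omega_k mR}$ by direct summation, once with $N \ge M$ and once with $N < M$, where the window is nonzero at $n = \pm N$.

```python
import cmath
import math


def x(n):
    """Arbitrary test signal defined for every integer n (an assumption)."""
    return math.sin(0.37 * n) + 0.5 * math.cos(1.3 * n + 0.2)


def w(n, M):
    """Zero-phase raised-cosine window of odd length M; zero for |n| > (M-1)/2."""
    if abs(n) > (M - 1) // 2:
        return 0.0
    return 0.5 + 0.5 * math.cos(2 * math.pi * n / M)


def channel_sum(m, R, N, M):
    """(1/N) * sum_k X_m(w_k) exp(j w_k m R), with X_m found by direct summation."""
    half = (M - 1) // 2
    total = 0j
    for k in range(N):
        w_k = 2 * math.pi * k / N
        X_m = sum(
            x(n) * w(n - m * R, M) * cmath.exp(-1j * w_k * n)
            for n in range(m * R - half, m * R + half + 1)
        )
        total += X_m * cmath.exp(1j * w_k * m * R)
    return total / N


m, R, M = 3, 4, 9

# Case N >= M: only the r = 0 term survives, giving w(0) x(mR).
full = channel_sum(m, R, N=16, M=M)

# Case N < M with w(+-N) != 0: the r = +-1 terms alias into the result.
N_small = 4
aliased = channel_sum(m, R, N=N_small, M=M)
predicted = sum(x(r * N_small + m * R) * w(r * N_small, M) for r in (-1, 0, 1))

print(abs(full - w(0, M) * x(m * R)))
print(abs(aliased - predicted), abs(aliased - w(0, M) * x(m * R)))
```

In the second case the result matches $\sum_r x(rN+mR)\,w(rN)$ rather than $w(0)\,x(mR)$, which is the time aliasing the derivation predicts.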
(a) (2 points) The term $e^{j\omega_k mR}$ modulates the decimated filter bank output back up to the proper frequency. We could say that it acts like a "remodulator".
(b) (8 points) For $N \ge M$, the FFT is at least as long as the window. The window is zero for $|n| > (M-1)/2$, so $w(rN) = 0$ for $r \ne 0$, and the relationship holds. The window is said to be Nyquist($N$).

If $N < M$, the condition $w(rN) = 0$ for $r \ne 0$ is necessary for the relationship to hold. So the disadvantage of using the case $N < M$ is that we have a supplementary constraint on the choice of the window. Note also that if the spectrum is modified before resynthesis, it can't be guaranteed that the undersampled spectral components will still reconstruct to the modified signal obtained via the corresponding time-domain filtering operation.

(c) (10 points) To recover $x(n)$ for all $n$ when $R > 1$:
i. stretch the channel signals (the STFT) by a factor $R$;
ii. feed them into an interpolation filter;
iii. remodulate;
iv. sum to obtain $x(n)$.
3. (20 pts) Suppose the window transform $W(\omega)$ is a lowpass filter with cutoff frequency $\omega_c = 2\pi/R$. That is, $W(\omega) \approx 0$ for $|\omega| \ge \omega_c$. In this case, show that

$$\sum_{m=-\infty}^{\infty} w(n-mR) \approx \frac{1}{R}\,W(0).$$

If these approximations were exact equalities, specify the set of usable frame step sizes $R'$ such that

$$\sum_{m=-\infty}^{\infty} w(n-mR') = \text{constant}.$$
Solution:
(20 points) By Poisson's summation formula:

$$\sum_{m=-\infty}^{\infty} w(n-mR) = \frac{1}{R}\sum_{k=0}^{R-1} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R}.$$

If $W(\omega) \approx 0$ for $|\omega| \ge \omega_c = 2\pi/R$, then

$$\sum_{k \ne 0} W\!\left(\frac{2\pi k}{R}\right) e^{j2\pi kn/R} \approx 0, \quad \forall n,$$

and thus

$$\sum_{m=-\infty}^{\infty} w(n-mR) \approx \frac{1}{R}\,W(0),$$

which is a constant.
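A small numerical check of this result is sketched below. The window choice is an assumption: a periodic Hann window of length `M = 24`, whose transform is not strictly bandlimited but is exactly zero at the frame-rate harmonics $2\pi k/R$ whenever $M/R$ is an integer of at least 2, which is all the Poisson sum requires. The helper names `overlap_add_sum` and `ripple` are hypothetical. The mean of the overlap-add sum over one hop always equals $W(0)/R$ (the $k = 0$ term); the ripple measures the $k \ne 0$ terms.

```python
import math


def hann_periodic(M):
    """Periodic Hann window of length M (an assumed example window)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * n / M) for n in range(M)]


def overlap_add_sum(w, R, n):
    """Sum over m of w(n - m*R), with w taken as zero outside 0..M-1."""
    M = len(w)
    m_min = -((M - 1 - n) // R)  # smallest m with n - m*R <= M-1
    m_max = n // R               # largest m with n - m*R >= 0
    return sum(w[n - m * R] for m in range(m_min, m_max + 1))


def ripple(w, R):
    """Peak-to-peak variation and mean of the overlap-add sum over one hop."""
    values = [overlap_add_sum(w, R, n) for n in range(R)]
    return max(values) - min(values), sum(values) / len(values)


M = 24
w = hann_periodic(M)
W0 = sum(w)  # W(0) is the sum of the window samples

for R in (6, 8, 12, 16):
    pk_pk, mean = ripple(w, R)
    print(R, pk_pk, mean, W0 / R)
```

For `R` equal to 6, 8, or 12 the ripple vanishes to rounding error and the constant equals $W(0)/R$. For `R = 16` the first frame-rate harmonic falls where the Hann transform is nonzero, so the sum is no longer constant.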
If $W(\omega) = 0$ for $|\omega| \ge \omega_c$ (the approximation is an exact equality), then the Poisson summation can be rewritten as

$$\sum_{m=-\infty}^{\infty} w(n-mR') = \frac{1}{R'}\,W(0) + \frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega'_k)\,e^{j\omega'_k n}, \quad \text{where } \omega'_k = 2\pi k/R'.$$

With any $R' \le R$, $\omega'_k \ge \omega_c$ and

$$\frac{1}{R'}\sum_{k=1}^{R'-1} W(\omega'_k)\,e^{j\omega'_k n} = 0.$$

Therefore, $\sum_{m=-\infty}^{\infty} w(n-mR')$ is constant for any $R' \le R$.

This problem illustrates the basic point made in the 1977 paper by Allen. If your window transform is a good lowpass filter, any frame step size $R$ (filter-bank decimation factor) less than or equal to $\pi/\omega_c$ will allow aliasing-free reconstruction via the STFT. This is because any such step size gives a sufficiently high sampling rate for each STFT bin over time.

Step sizes longer than $\pi/\omega_c$ (e.g., $2\pi/\omega_c = M/2$ for the Hamming window) rely on aliasing cancellation (or zeroing) to give perfect reconstruction by the inverse STFT. Therefore, spectral modifications may disturb this cancellation, rendering the STFT less robust.

4. (Optional) Cross-Synthesis. Download the skeleton program hw8xsynth.m ^{1} and the sound source files, SteveJobs.wav ^{2} and motorcycle.wav. ^{3} The program analyzes the spectral envelope of the speech, which is then imposed on the spectrum of a broadband signal, here a motorcycle sound.
(a) (5 pts) Fill in the comments (5 of them) in the program to explain what the code in the next few lines does and why we might want to do that.
(b) (25 pts) Fill in the unfinished lines to make an alias-free cross-synthesizer. Turn in your code with all the comments completed and a sample of your cross-synthesis result between SteveJobs.wav and motorcycle.wav. Name the cross-synthesis wave file xxxxhw8.wav, where xxxx are the first four letters of your last name.
(c) (5 pts) For an arbitrary time $n$, plot the following:
i. The short-time speech spectrum magnitude (dB).
ii. The amplitude response of the all-pole filter $1/A(z)$ obtained by linear prediction analysis at that time.
iii. The short-time spectral magnitude (dB) of the synthesized sample.
(d) (5 pts) Discuss the criteria for selecting the two signals fed into the cross-synthesis so that the synthesized speech is clearly intelligible. Remark: One good thing to do is to create bad examples as well as good examples, and investigate why they are good or bad.
^{1} http://www-ccrma.stanford.edu/~jos/hw421/hw8/hw8xsynth.m
^{2} http://www-ccrma.stanford.edu/~jos/hw421/hw8/SteveJobs.wav
^{3} http://www-ccrma.stanford.edu/~jos/hw421/hw8/motorcycle.wav