
Music 421, Spring 2004-2005. Homework #5: Sinusoidal Modeling, Additive Synthesis, and Noise Reduction (75 points). Due in one week (5/10/2005).

1. (10 pts) Getting Started with SDIF

(a) Download and run the Matlab program make_sdif_file.m [1], which shows how to make an SDIF [2] file in Matlab. It uses IRCAM's [3] SDIF Extensions for Matlab [4] (which should already be installed at CCRMA) to write a Matlab cell array to an SDIF file. (Later in this homework you will write SDIF files containing the results of a sinusoidal analysis of an input sound. For now, the parameters in the SDIF file are just some made-up numbers so that you can see how SDIF works.)

(b) Use the Unix command-line utility spew-sdif to print the contents of the SDIF file. It should be installed in /usr/ccrma/bin, which should be part of your Unix path by default.

(c) Download the Matlab program additive_synth.m [5] and use it to synthesize the SDIF file you just created. Listen to the resulting sound.

(d) Changing only the index numbers (i.e., the integers in the first columns of the matrices frame1, frame2, etc.), modify make_sdif_file.m so that it produces an SDIF file that sounds noticeably different when you synthesize it.
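For orientation, the sinusoidal-model data in this assignment amounts to a sequence of time-stamped matrices with one row per partial. A minimal illustration of that layout (Python/NumPy here for illustration; the assignment itself uses Matlab and IRCAM's SDIF toolbox, and all numbers below are made up):

```python
import numpy as np

# Each frame: one row per sinusoidal track -> [index, freq_hz, amp, phase]
frame1 = np.array([[1, 440.0, 0.50, 0.0],
                   [2, 880.0, 0.25, 0.0]])
frame2 = np.array([[1, 450.0, 0.50, 0.0],
                   [2, 900.0, 0.25, 0.0]])

# A "file" is then a list of (time_in_seconds, frame) pairs; matching
# index numbers across frames define a partial's trajectory through time.
frames = [(0.00, frame1), (0.01, frame2)]
```

Changing the index numbers in the first column changes which rows are linked into a single trajectory, which is why part (d) above can alter the sound so audibly.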

2. (5 pts) Look at the function ignore_phase_synth in additive_synth.m, especially the use of the variables dp and ddp, which stand for "difference in phase" and "difference in difference in phase," respectively.

One might think that this is a needlessly low-level style of programming. Based on the equations for a sum of sinusoids [6], it seems cleaner to treat the frequency interpolation just like the amplitude interpolation, like this:

    % Additive synthesis inner loop:
    f = oldf;  df = (newf-oldf) / R;
    a = olda;  da = (newa-olda) / R;
    t = 0;
    for i = fromSample+1:toSample
        output(i) = output(i) + (a * sin(f * t + phases(index)));
        f = f + df;
        a = a + da;
        t = t + T;
    end

[1] http://ccrma.stanford.edu/~jos/hw421/hw5/make_sdif_file.m
[2] http://cnmat.berkeley.edu/SDIF
[3] http://www.ircam.fr
[6] http://ccrma.stanford.edu/~jos/sasp/Spectral_Modeling_Synthesis.html

Explain the problem with this technique.
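To see the failure numerically, here is a small sketch (Python/NumPy rather than Matlab, with made-up numbers) of what happens when the instantaneous phase is taken as f(t)*t with f linearly interpolated across the frame:

```python
import numpy as np

# Illustrative numbers (not from the assignment):
fs = 1000.0                  # sample rate, Hz
T = 1.0 / fs
R = 100                      # samples per frame
oldf = 2*np.pi*100           # radian frequency at frame start (100 Hz)
newf = 2*np.pi*200           # radian frequency at frame end   (200 Hz)

# The flawed scheme: instantaneous phase taken as f(t)*t,
# with f linearly interpolated across the frame.
n = np.arange(R + 1)
f = oldf + (newf - oldf) * n / R
t = n * T
phase = f * t

# Effective instantaneous frequency = phase increment per sample / T.
eff_hz = np.diff(phase) / T / (2*np.pi)
# Instead of sweeping 100 -> 200 Hz, the phase f(t)*t sweeps the effective
# frequency from ~100 Hz up to ~2*200 - 100 = 300 Hz, so each frame
# overshoots its target and the phase jumps at every frame boundary.
end_hz = eff_hz[-1]
```

The phase increment per sample is f*T + t*df + T*df, not f*T, so the extra t*df term makes the effective frequency ramp twice as fast as intended.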

3. (20 pts) Write a Matlab program to perform sinusoidal analysis on an input sound and write the result to an SDIF file. You should use the findpeaks function from HW#3 or elsewhere. The parameters of interest are the amplitudes and frequencies of the sinusoidal components. (Don't worry about phase.) When matching up spectral peaks in adjacent frames, choose the solution that minimizes frequency deviation from one frame to the next.
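One possible sketch of the frame-to-frame matching step (Python rather than Matlab; this is a greedy pairing, whereas a globally optimal assignment over all peaks could differ):

```python
def match_peaks(prev_freqs, new_freqs):
    """Pair each new peak with the closest unused previous-frame peak
    in frequency (greedy sketch; not the only possible matcher)."""
    pairs = []
    used = set()
    for j, f in enumerate(new_freqs):
        # Candidate previous peaks not yet claimed by another new peak.
        cands = [(abs(f - pf), i) for i, pf in enumerate(prev_freqs)
                 if i not in used]
        if cands:
            dev, i = min(cands)
            used.add(i)
            pairs.append((i, j))      # continue previous track i
        else:
            pairs.append((None, j))   # no free track: a new track is born
    return pairs
```

For example, match_peaks([100.0, 500.0], [110.0, 480.0, 900.0]) continues both old tracks and starts a new one for the 900 Hz peak.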

Your program should take the following arguments:

(a) Filename for resulting SDIF file
(b) Input samples
(c) Sampling rate fs of input samples
(d) Analysis frame rate fR (FFT hop size R = floor(fs/fR))
(e) Percentage overlap of analysis frames
(f) Window function
(g) Any other analysis parameters appropriate to your analysis technique

Download the sound file wrenpn1.wav [7], which contains the sound of a bird chirping embedded in noise. Test your analysis program on this sound file using the following parameters:

(a) Analysis frame rate fR = 200 Hz
(b) 50% overlap of analysis frames (R ≈ M/2, where M is the window length)
(c) Window each frame with a Hann window before analysis (zero-phase windowing is not needed)
(d) Use the number of peaks that gives you the best results
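The relationship between these parameters can be sketched as follows (Python for illustration; the 44.1 kHz sample rate is an assumption, and the actual file's rate may differ):

```python
import math
import numpy as np

fs = 44100.0              # assumed sample rate of the input (may differ)
fR = 200.0                # analysis frame rate, Hz
R = math.floor(fs / fR)   # FFT hop size in samples
M = 2 * R                 # window length giving ~50% overlap (R ~ M/2)

# Periodic Hann window of length M:
w = 0.5 - 0.5*np.cos(2*np.pi*np.arange(M)/M)
```

At 44.1 kHz this gives a hop of 220 samples and a 440-sample window, i.e. about 10 ms between analysis frames.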

Use the additive_synth.m oscillator-bank additive-synthesis program to reconstruct the birdsong from your sinusoidal model, stripping out the noise in the process. (The signal-to-noise ratio is approximately 60 dB.) Use the 'ignore_phase' interpolation-type argument to get linear interpolation of both the amplitudes and frequencies throughout.
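The oscillator-bank idea with linear amplitude/frequency interpolation and accumulated phase can be sketched like this (Python for illustration only; additive_synth.m's actual interface and internals differ):

```python
import numpy as np

def synth_segment(out, start, R, oldf, newf, olda, newa, phase, fs):
    """Add R samples of one partial into `out`, linearly interpolating
    amplitude and frequency (Hz) across the segment and accumulating
    phase, so the next frame can continue smoothly from the returned
    phase instead of jumping."""
    n = np.arange(R)
    f = oldf + (newf - oldf) * n / R      # per-sample frequency, Hz
    a = olda + (newa - olda) * n / R      # per-sample amplitude
    phases = phase + 2*np.pi*np.cumsum(f) / fs
    out[start:start+R] += a * np.sin(phases)
    return phases[-1] % (2*np.pi)
```

Accumulating phase (rather than recomputing it as f*t, as in problem 2) is what keeps the oscillator continuous across frame boundaries.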

[7] http://ccrma.stanford.edu/~jos/hw421/hw5/wrenpn1.wav


[Two spectrogram panels: "Spectrogram of wrenpn1.wav" and "Spectrogram of resynthesized birdsong"; frequency 0-10000 Hz versus time 0-2 s.]

Figure 1: Spectrogram of the original and resynthesized birdsong using one peak (high SNR)

Turn in your analysis code and your ﬁnal denoised soundﬁle. You can email them to the TA, or create a temporary webpage for the TA to view.

Solution: (20 pts) The code is in additivesynth.m [8] and hw5chirp.m [9], which use the function findpeaks2.m [10]. The sinusoidally modeled birdsong output is in wrenout1.wav [11].

4. (5 pts) Plot the following spectrograms using Matlab’s specgram function:

(a) Original birdsong.
(b) Resynthesized birdsong.

Solution: (10 pts) The spectrograms are shown in Figure 1.

5. (5 pts) At the time halfway through the sound, plot the following spectral slices overlaid on a dB scale (i.e., just plot the spectral magnitude at that time):

(a) Original birdsong.
(b) Resynthesized birdsong.
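One way to compute such a slice (a Python sketch for illustration; the solution code uses Matlab):

```python
import numpy as np

def spectral_slice_db(x, start, M, fs):
    """Window M samples of x at `start` with a Hann window and return
    (frequencies_hz, magnitude_db) for the positive-frequency bins."""
    w = 0.5 - 0.5*np.cos(2*np.pi*np.arange(M)/M)
    X = np.fft.rfft(x[start:start+M] * w)
    freqs = np.fft.rfftfreq(M, d=1.0/fs)
    # Small eps avoids log10(0) at empty bins.
    mag_db = 20*np.log10(np.abs(X) + np.finfo(float).eps)
    return freqs, mag_db
```

Using the same `start` sample for both the original and the resynthesized signal keeps the two slices time-aligned for the overlay.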

Solution: (10 pts) At approximately halfway through (time t = 1.1428 s), the overlaid spectral slices (the magnitude of the DFT of the frame nearest this time instant) are shown in Figure 2. The frames used in both cases are time-aligned (see code). Note that there is only one prominent peak, so we can set the number of peaks in the bird-chirp sinusoidal model to one. The result of using just one peak is arguably the

[9] http://ccrma.stanford.edu/~jos/hw421/hw5sol/hw5chirp.m
[10] http://ccrma.stanford.edu/~jos/hw421/hw5sol/findpeaks2.m
[11] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout1.wav


[Plot: "Spectral slice (magnitude DFT) midway through (frame 230) (wrenpn1.wav)"; dB (-50 to 30) versus Hz (0-10000); legend: original, resynthesized.]

Figure 2: Spectral slices of the original and resynthesized birdsong overlaid at time midway through using one peak (high SNR)

best from listening. Generally, using more peaks represents the signal more faithfully, but in this case (for this specific kind of bird) one peak is enough. If more than two peaks are used, we actually start to hear artifacts.

6. (10 pts) Repeat the previous two problems for the sound file wrenpn2.wav [12], in which the signal-to-noise ratio is only 0 dB.

Solution: (10 pts) Now the SNR is too low for reliable sinusoidal peak tracking, and the result is far worse than in the low-noise case. The spectrograms and spectral slices are shown in Figures 3 and 4, respectively. In particular, the spectral slice of the original shows spurious noise peaks that are as high as the sinusoid itself. Although in this frame the peak detection still finds the correct peak, we might not be as lucky in other frames.

7. (5 pts) What is the limitation of this noise reduction technique? Explain in relation to your results obtained earlier.

Solution: (5 pts) The limitation of this noise-reduction technique is that it does not work when the signal-to-noise ratio (SNR) is too low. This is the thresholding effect associated with any nonlinear estimator such as the peak finding used here: at low SNR, findpeaks returns peaks corresponding to the noise rather than to the sinusoid. This is evident with wrenpn2.wav, where the SNR is only 0 dB, even when only one peak is used to model the signal. The technique also does not work well for sound sources with many more partials than the birdsong, where some of the sinusoidal peaks may lie below the noise floor.

8. (10 pts) For this problem you will write two general-purpose programs that transform

[12] http://ccrma.stanford.edu/~jos/hw421/hw5/wrenpn2.wav


[Two spectrogram panels: "Original (wrenpn2)" and "Resynthesized (wrenpn2)"; frequency 0-10000 Hz versus time 0-2 s.]

Figure 3: Spectrograms of the original and resynthesized birdsong using one peak (low SNR)

[Plot: "Spectral slice (magnitude DFT) midway through (frame 230) (wrenpn2.wav)"; dB (-25 to 20) versus Hz (0-10000); legend: original, resynthesized.]

Figure 4: Spectral slices of the original and resynthesized birdsong overlaid at time midway through using one peak (low SNR)


[Two spectrogram panels: "Pitch scaled by 1/4" (frequency 0-10000 Hz versus time 0-2 s) and "Time stretched by 2" (frequency 0-10000 Hz versus time 0-4.5 s).]
Figure 5: Spectrograms of the synthesized 1/4-pitch-scaled and time-stretched-by-two birdsong using one peak (high SNR)

sinusoidal models stored in SDIF ﬁles. Each program should read in an input SDIF ﬁle and write the result to an output SDIF ﬁle. You should then run the output SDIF ﬁle through the additive synthesizer to hear the result.

(a) (5 pts) Frequency-scale modification: decrease the pitch of the birdsong by a factor of 4 without changing its duration, creating a "bigger bird" sound. Plot the spectrogram of the result.

(b) (5 pts) Time-scale modification: stretch the birdsong in time by a factor of 2, keeping the frequency excursions unchanged, and plot the spectrogram of the result. (Try some more extreme slow-down factors, just for fun.)

Extra credit: Instead of having the frequency and time scale factors be constants, allow them to be functions of time. For example, you should be able to decrease the pitch of the birdsong by a factor of 2 initially, then have it keep getting lower over the course of the SDIF ﬁle, to an ending pitch factor of 6.

Solution:

(a) (5 pts) See the solution code hw5chirp.m: all frequency estimates are divided by 4 (see code). The output soundfile is wrenout2.wav [13].

(b) (5 pts) Also see hw5chirp.m: the length used to interpolate the instantaneous amplitudes and frequencies is increased by a factor of 2 (see code). The output soundfile is wrenout3.wav [14].
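Both transformations are simple edits of the model data. A hypothetical sketch (Python for illustration, with an assumed frame layout of [index, freq, amp, phase] columns; the real SDIF layout may differ):

```python
import numpy as np

def scale_pitch(frames, factor):
    """Divide every frequency estimate by `factor`, leaving times alone.
    `frames` is a list of (time, matrix) pairs with columns
    [index, freq_hz, amp, phase]."""
    out = []
    for t, m in frames:
        m2 = m.copy()
        m2[:, 1] /= factor
        out.append((t, m2))
    return out

def stretch_time(frames, factor):
    """Multiply every frame time by `factor`, leaving frame contents alone."""
    return [(t * factor, m) for t, m in frames]
```

For the extra-credit version, `factor` would become a function of the frame time rather than a constant.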

[13] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout2.wav
[14] http://ccrma.stanford.edu/~jos/hw421/hw5sol/wrenout3.wav


9. (5 pts) Download the sound file peaches.wav [15]. Repeat the analysis and synthesis processes of the first problem. Tracking three peaks through time is enough for starters. Describe your result and compare its quality to that of the birdsong case. How many partials (peaks) are needed to make the speech intelligible?

Solution: (5 pts) The number of partials needed to make the speech intelligible is only 2 or 3. The algorithm used with the birdsong earlier does not work well with human speech because:

(a) The spectrum of the speech is more complex, and many more sinusoids are needed to model it. Even when a large number is used, the simple tracking used with the birdsong will not give correct trajectories.

(b) Human speech has noise and transient components that are not well modeled by sinusoidal modeling alone.

[15] http://ccrma.stanford.edu/~jos/hw421/hw5/peaches.wav
