22 upvotes00 downvotes

766 views35 pagesJul 13, 2007

© Attribution Non-Commercial (BY-NC)

PDF, TXT or read online from Scribd

Attribution Non-Commercial (BY-NC)

766 views

22 upvotes00 downvotes

Attribution Non-Commercial (BY-NC)

You are on page 1of 35

Center for Computer Research in Music and Acoustics (CCRMA)

Department of Music, Stanford University

Stanford, California 94305

Course Overview

signal processing using the short-time Fourier

transform (STFT).

• Applications include musical sound synthesis and

audio signal processing.

1

Administrative Information

Units

You may sign up for either 3 or 4 units:

• 3 units = lectures + assignments + final

• 4 units adds a final project based on outside reading

and/or a software project

Important Pointers

• The course schedule and outline1 (reachable from the

class home page2) lists the following information:

– Assignments

– Weekly class schedule

– Pointers to all lecture overheads

• The 421 home page further contains pointers to

– Programming examples

– Sound examples

– Related items of interest online

• The MUS421/EE367B Overview3 contains this

administrative info and more.

1

http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html

2

http://ccrma.stanford.edu/CCRMA/Courses/421/

3

http://ccrma.stanford.edu/~jos/intro421/

2

Why The Fourier Transform

The ear performs a kind of Fourier analysis

• Spectral models can be very compact and flexible:

– MPEG audio coding

– Sinusoidal modeling (“additive synthesis”)

∗ AES talk4 on history of spectral modeling at

CCRMA and elsewhere.

• Any Linear Time Invariant (LTI) system can be

implemented in the frequency domain by means of

the Fourier Transform

• Efficient FFT implementations exist which make it

possible to implement very large LTI systems in real

time, e.g., room impulse-response convolutions of

length 10,000 to 100,000

4

http://ccrma.stanford.edu/~jos/pdf/AES-Heyser.pdf

3

Applications of the

Short-Time Fourier Transform (STFT)

• Fast convolution

• Time-varying linear filtering

• Time-varying nonlinear filtering

• Fourier analysis, modification, and resynthesis

• Musical sound synthesis via spectral modeling:

– Additive synthesis using sinusoids

– Sines + Noise modeling

– Sines + Noise + Transients modeling

• Speech synthesis

• Vocoders

• Time scaling

• Pitch shifting

• Pitch Detection

• Noise reduction

• Audio compression

4

Applications, Cont’d

– MPEG AAC (10X common)

– “MP3” (MPEG-II, Layer III — about half the

quality of AAC)

– Dolby AC-2 and AC-3 (6X, fixed)

– Philips DCC, Sony Minidisc (4 or 5X))

Music 422 (EE 367C) is an entire CCRMA course

devoted to this topic. It is offered only in Winter

quarters.

• Perfect Reconstruction Filter banks

• Computational Auditory Modeling

• Filter Design and Implementation

• System Identification

• Signal Separation (Auditory Scene Analysis)

• Non-Parametric Spectral Estimation

5

Emerging Applications in Audio Coding

compression algorithms:

• First transmit

objects = synthesizer patches = “decoder”

• Next transmit

messages = performance data (like MIDI) =

“encoded bit stream”

• Main challenge: to develop classifiers and coders for

general purpose audio

• Individual or group research project

• Must be related to lecture/lab topics (FFT+Audio)

• One-page project proposal due by 4th class meeting

• Final written report due by the end of finals week

• Oral presentation during the last class invited!

• Project Types

– Programming project and report

6

– Reading and report

• Suggested Outside Reading: See

http://ccrma.stanford.edu/~jos/refs421/

(also available in Appendix P of the text.5)

• Example Project Topics

– Windows

∗ New FFT window types

∗ Explore window types not covered in class

– Spectrum Analysis

∗ Short-time spectrum analysis of recorded data

∗ Study of statistical spectrum estimation

∗ Matching spectrum analysis parameters to

human hearing

∗ Alternative time-frequency representations

(Wavelets, Wigner, ...)

– Sinusoidal Modeling

∗ Readings in additive synthesis

∗ Implement your own sines+noise

analysis/synthesis system

∗ Noise reduction based on sinusoidal modeling

∗ Source separation based on sinusoidal modeling

5

http://ccrma.stanford.edu/~jos/sasp/

7

∗ Transient detection (for “sines + noise +

transients” modeling)

– Short-Time Fourier Transform (STFT) based

Analysis, Modification, and Resynthesis

∗ Software system development

∗ Noise reduction

∗ Pitch detection/tracking

∗ Time compression/expansion

∗ Transform coding

∗ Pitch-synchronous phase vocoder (adapt

window to pitch period)

∗ Signal reconstruction from the STFT magnitude

only (phase discarded)

∗ Spectral interpolation schemes to compensate

for lost frames

– Software Development

∗ Modify course Matlab examples to be

compatible with Octave

(See http://www.octave.org/.)

∗ Develop missing components of the Signal

Processing Toolbox for Octave (looking only at

Matlab help info).

∗ PD FFT-processing patches (see

http://www.pure-data.org/)

8

∗ LADSPA spectral-processing plug-ins (see

http://www.ladspa.org/

9

Main Pointer

class home page7) lists the following information:

• Assignments

• Weekly class schedule

• Pointers to all lecture overheads

6

http://ccrma.stanford.edu/~jos/intro421/Schedule_Assignments.html

7

http://ccrma.stanford.edu/CCRMA/Courses/421/

10

Notation

in this class. (We will try to stay consistent.)

Frequency and Time:

• f denotes continuous frequency in Hertz (Hz)

• ω = 2πf

• ωk denotes discrete frequency, ωk = 2π(k/N )fs

• ω, ωk ∈ R (frequencies are always real)

• T = sampling interval (sec)

1

• fs = sampling rate, fs = T

• tn = nT (discrete time)

• n, k ∈ Z (integers)

• t, tn ∈ R (times are always real)

11

Introduction to Audio Spectrum Analysis

short segments. We are therefore most interested in

short-time spectrum analysis:

• The human ear uses less than one second of past

sound to form a spectrum.

• There is a limit to the length of signal we can analyze

at once.

to apply a window function. An unmodified segment

extraction corresponds to a “rectangular window”.

Everything we ‘look at’ will be through a ‘window’, hence

it is important to realize what the window is doing to our

underlying signal.

Applications we’ll discuss first:

• FIR Filter Design

12

Example of Windowing

demonstrate what happens when we turn an infinite

duration signal into a finite duration signal through

windowing.

Complex Sinusoid:

x(n) = ejωnT , 0 ≤ ωT < π

Notes:

• The frequencies present in our signal are only positive.

A fancy name for this is an ‘analytic signal’

increases.) In order to end up with a signal which dies

out eventually (so we can use the DFT), we need to

multiply our signal by a window (which does die out).

13

The following is a diagram of a typical window function:

Zero−Phase Window

1

0.9

0.8

0.7

0.6

Amplitude

0.5

0.4

0.3

0.2

0.1

0

−1000 −800 −600 −400 −200 0 200 400 600 800 1000

Time (samples)

or “even”) window function, which means its phase in the

frequency domain is either zero or π, as we will see in

detail later. (Recall that a real and even function has a

real and even Fourier transform.) The window is also

nonnegative, as is typical.

14

We might also require that our window be zero for

negative time. Such a window is said to be ‘causal’.

Causal windows are necessary for real-time processing:

Linear Phase Window (Causal)

1

0.9

0.8

0.7

0.6

Amplitude

0.5

0.4

0.3

0.2

0.1

0

−1000 −800 −600 −400 −200 0 200 400 600 800 1000

Time (samples)

we have turned the original non-causal window into a

causal window. The Shift property of the Fourier

Transform tells us that we have introduced a linear phase

term.

15

Putting all this together, we get the following:

Our original signal (unwindowed, infinite duration), is

x(n) = ejω0nT , n ∈ Z

A portion of the real part, cos(ω0nT ), is plotted below:

1

0.8

0.6

0.4

0.2

Amplitude

−0.2

−0.4

−0.6

−0.8

−1

−2000 −1500 −1000 −500 0 500 1000 1500 2000

Time (samples)

for a 90-degree phase-shift to the right.

16

The Fourier Transform of this infinite duration signal is a

delta function at ω0: X(ω) = δ(ω − ω0)

δ(ω − ω0)

0 ω0 ω

(Note carefully the difference between w and ω.)

17

1

0.8

0.6

0.4

0.2

Amplitude

−0.2

−0.4

−0.6

−0.8

−1

−2500 −2000 −1500 −1000 −500 0 500 1000 1500 2000 2500

Time (samples)

in the time domain results in a convolution in the

frequency domain. Hence, in our case, we will obtain the

convolution of a delta function at frequency ω0, and the

transform of the window.

The result of convolution with a delta function is the

original function, shifted to the location of the delta

function. (The delta function is the identity element for

convolution.)

18

0

−5

main lobe

−10

−15

−20

Amplitude − dB

−25

−30

−35

sidelobes

−40

−45

−50

−3 −2 −1 0

ω0 1 2 3

ωT

19

Summary

‘smearing’ or ‘smoothing’ in the frequency domain.

We need to be aware of this if we are trying to resolve

sinusoids which are close together in frequency.

• Windowing also introduced side lobes.

This is important when we are trying to resolve low

amplitude sinusoids in the presence of higher

amplitude signals. When we look at specific windows,

we will be looking at this behavior.

• A sinusoid at amplitude A, frequency ω0, and phase φ

becomes a window transform shifted out to frequency

ω0, and scaled by Aejφ.

purposes and exhibit various properties.

20

The Rectangular Window

(

∆ 1, |n| ≤ M2−1

wR (n) =

0, otherwise

Zero−Phase Rectangular Window − M = 21

0.8

Amplitude

0.6

0.4

0.2

0

−20 −15 −10 −5 0 5 10 15 20

Time (samples)

• Need M odd in zero-centered case

• Scale window by 1/M to obtain unity dc gain

21

To see what happens in the frequency domain, we need

to look at the DTFT of the window:

∞

∆

X

WR (ω) = DTFTω (wR ) = wR(n)e−jωn

n=−∞

M −1

2 M −1 M +1

X ejω 2 − e−jω 2

= e−jωn =

1 − e−jω

n=− M2−1

U

X

n rL − rU +1

r =

1−r

n=L

and denominator of the above expression to get

1

" M M

#

e−jω 2 ejω 2 − e−jω 2

WR (ω) = 1 1 1

e−jω 2 ejω 2 − e−jω 2

ω

sin M 2 ∆

= ω

= M · asincM (ω)

sin 2

where asincM (ω) denotes the aliased sinc function.

sin(M ω/2)

∆

asincM (ω) =

M · sin(ω/2)

(also called the Dirichlet function)

22

Rectangular Window Transform (Cont’d)

the aliased sinc function:

ω

∆ sin M 2

WR (ω) = M · asincM (ω) = ω

sin 2

DFT of a Rectangular Window of length M = 11

12

Amp

10

6

Complex Amplitude

−2

−4

−6 −4 −2 0 2 4 6

freq

window. For the causal case, a linear phase term appears:

M −1 ω

WRc (ω) = e−j 2 M asincM (ω)

function approaches the regular sinc function

∆ sin(πx)

sinc(x) =

πx

23

More generally, we may plot both the magnitude and

phase of the window versus frequency:

Magnitude of Rectangular Window Transform (M = 11)

1

0.9

0.8

0.7

Magnitude (Linear)

0.6

0.5

0.4

0.3

0.2

0.1

0

−6 −4 −2 0 2 4 6

ω/ΩM

3.5343

3.1416

2.7489

2.3562

1.9635

Phase

1.5708

1.1781

0.7854

0.3927

Main Lobe

0

−0.3927

−6 −4 −2 0 2 4 6

ω/ΩM

24

In audio work, we more typically plot the window

transform magnitude on a decibel (dB) scale:

DFT of a Rectangular Window − M = 11

0

−5

main lobe

−10

13 dB down

−15

Magnitude (dB)

sidelobes sidelobes

−20

−25

−30

−35

nulls nulls

−40

−3 −2 −1 0 1 2 3

Normalized Frequency ω (rad/sample)

25

Since the DTFT of the rectangular window approximates

the sinc function, it should “roll off” at approximately 6

dB per octave, as verified in the log-log plot below:

DFT of a Rectangular Window − M = 20

0

−6.0206

Ideal −6 dB per octave line

−12.0412

−18.0618 Partial

Main

−24.0824 Lobe

Amplitude (dB)

−30.103

−36.1236

−42.1442

−48.1648

−54.1854

Normalized Frequency (rad/sample)

window transform (asinc) converges exactly to the sinc

function. Therefore, the departure of the roll-off from

that of the sinc function can be ascribed to aliasing in the

frequency domain, due to sampling in the time domain.

26

Sidelobe Roll-Off Rate

function w(t) exist (i.e., they are finite and uniquely

defined), then its Fourier Transform magnitude is

asymptotically proportional to

constant

|W (ω)| → n+1

(as ω → ∞)

ω

Proof: Look up “roll-off rate” in text index.

Thus, we have the following rule-of-thumb:

n derivatives ←→ −6(n + 1) dB per octave roll-off rate

(since −20 log10(2) = 6.0205999 . . .).

This is also −20(n + 1) dB per decade.

To apply this result, we normally only need to look at the

window’s endpoints. The interior of the window is usually

differentiable of all orders.

Examples:

• Amplitude discontinuity ←→ −6 dB/octave roll-off

• Slope discontinuity ←→ −12 dB/octave roll-off

• Curvature discontinuity ←→ −18 dB/octave roll-off

For discrete-time windows, the roll-off rate slows down at

high frequencies due to aliasing.

27

In summary, the DTFT of the M -sample rectangular

window is proportional to the ‘aliased sinc function’:

∆ sin(ωM T /2)

asincM (ωT ) =

sin(ωT /2)

sin(πf M T ) ∆

≈ = M sinc(f M T )

πf T

Some important points:

∆ 2π

• Zero crossings at integer multiples of ΩM = M

∆

where ΩM = 2πM = frequency sampling interval used

by a length M DFT

4π

• Main lobe width is 2ΩM = M

• As M gets bigger, the mainlobe narrows

(better frequency resolution)

• M has no effect on the height of the side lobes

(Same as the “Gibbs phenomenon” for Fourier series)

• First sidelobe only 13 dB down from main-lobe peak

• Side lobes roll off at approximately 6dB per octave

• A phase term arises when we shift the window to

make it causal, while the window transform is real in

the zero-centered case (i.e., when the window w(n) is

an even function of n)

28

Frequency Resolution

The next series of plots shows the effect that an increased

window length has on our ability to resolve 2 sinusoids.

2π

• 2 cosines separated by ∆ω = 40

• Rectangular Windows of lengths: 20, 30, 40, 80

(∆ω = 12 ΩM , 34 ΩM , ΩM , 2ΩM )

M = 20 M = 30

0.7 0.7

0.6 0.6

0.5 0.5

Magnitude

Magnitude

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0

0 1 2 3 0 1 2 3

Frequency ωT (rad/sample) Frequency ωT (rad/sample)

M = 40 M = 80

0.7 0.7

0.6 0.6

0.5 0.5

Magnitude

Magnitude

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0

0 1 2 3 0 1 2 3

Frequency ωT (rad/sample) Frequency ωT (rad/sample)

29

One Sine and One Cosine

(“Phase Quadrature” Case)

• Note: least-resolved case appears resolved!

• Note: M = 40 case suddenly looks much worse

• Only the M = 80 case looks good at all phases

M = 20 M = 30

0.7 0.7

0.6 0.6

0.5 0.5

Magnitude

Magnitude

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0

0 1 2 3 0 1 2 3

Frequency ωT (rad/sample) Frequency ωT (rad/sample)

M = 40 M = 80

0.7 0.7

0.6 0.6

0.5 0.5

Magnitude

Magnitude

0.4 0.4

0.3 0.3

0.2 0.2

0.1 0.1

0 0

0 1 2 3 0 1 2 3

Frequency ωT (rad/sample) Frequency ωT (rad/sample)

30

One Sine and One Cosine

(“Phase Quadrature” Case)

All Four Resolutions Overlaid

• Peak locations are biased in under-resolved cases,

both in amplitude and frequency

0.5

M=20

M=30

0.45 M=40

M=80

0.4

0.35

0.3

Magnitude

0.25

0.2

0.15

0.1

0.05

0

0 0.5 1 1.5 2 2.5 3

Frequency ωT (rad/sample)

window of length M , two sinusoids can be most easily

31

resolved when they are separated in frequency by

∆ 2π

∆ω ≥ 2ΩM ΩM =

M

This implies there must be at least two full cycles of the

difference-frequency under the window. (We’ll see later

that this is an overly conservative requirement—a more

careful study reveals that 1.44 cycles is sufficient for the

rectangular window.)

In principle, arbitrarily small frequency separations can be

resolved if

• we are sure we are looking at the sum of two ideal

sinusoids under the window.

and/or interference, so we prefer to require sinusoidal

frequency separation by at least one main-lobe width (of

the sinc-function in this case, or the window transform

more generally) whenever possible.

The rectangular window provides an abrupt transition at

its edge. We will later look at some other windows which

have a more gradual transition. This is usually done to

reduce the height of the side lobes.

32

Resolution Bandwidth (resolving sinusoids)

Our ability to resolve two closely spaced sinusoids is

determined primarily by the main lobe width of the

Fourier transform of the window we are using.

Let Bw denote the main lobe width in Hz, with the main

lobe width defined as the width between zero crossings:

6 WR (ω)

5 Μ=7

2

2πBw

1

0

−3Ω −2Ω −Ω Ω 2Ω 3Ω ω

-1

-2

-4 -3 -2 -1 0 1 2 3 4

∆ sin (M ωT /2) sin (M πf T )

WR(ω) = asincM (ω) = =

sin(ωT /2) sin(πf T )

Main lobe width is “two sidelobes wide”

ΩM fs

⇒ Bw = 2 =2 (Hz)

2π M

33

Choosing Window Length to Resolve Sinusoids

noisy conditions) with a spacing of ∆Hz is to choose a

window length M long enough so that their main lobes

are clearly discernible. For example, we may require that

their main lobes meet at the first zero crossings.

1

0.9

0.8

∆

0.7

0.6

Amplitude

0.5

0.4

0.3

0.2

0.1

0

0 0.5 1 1.5 2 2.5 3

Bw Bw ωT (radians per sample)

Bw ≤ ∆, where Bw is the main lobe width in Hz, and ∆

is the sinusoidal frequency separation in Hz.

34

For the rectangular window, Bw can be expressed as

fs

Bw = 2

M

Hence we need:

fs

Bw = 2 ≤∆

M

fs

⇒ M ≥2

∆

or

fs

M ≥2

|f2 − f1|

Thus, to resolve the frequencies f1 and f2 under a

rectangular window, it is sufficient for the window length

M to span at least 2 periods of the difference frequency

f2 − f1, measured in samples, where 2 is the width of the

main lobe, measured in sidelobe-widths.

A rectangular window of length or greater is said to

resolve the sinusoidal frequencies f1 and f2.

35

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.