Matlab

© All Rights Reserved

6 views

Matlab

© All Rights Reserved

- Pr100 User Manual
- Learning Musical Instruments from Mixtures of Audio with Weak Labels
- angle estimation
- Final.Report
- difraction diagram.pdf
- AP9211-ADSP
- Fault Diagnosis of Marine Main Engine Cylinder Cover Based
- Analysis of Duobinary
- advanced digital signal processing
- Www.maintenanceonline.org Maintenanceonline Content Images p36-45 Lacey Paper 1
- 15 an Improved Simulation Model for Nakagami-m Fading Channels
- DSP_CEN352_ch4_DFT
- Eas6490 Fan Hw4
- Code Audio Analyzer
- RCL-3
- (DEZA 2013) a CFD Study of Pressure Fluctuations to Determine Fluidization Regimes in Gas-solid Beds
- amplif de 10+10w tda2009
- Padovani
- Offshore wind turbine foundation design issues
- HANDS FREE COMPUTER CONTROL

You are on page 1of 6

Objectives

q know one way to estimate the fundamental frequency of a section of speech signal from its spectrum

q know one way to estimate the fundamental frequency of a section of speech signal from its waveform

q know one way to estimate the formant frequencies from a section of speech signal

q appreciate some of the problems of fundamental frequency and formant frequency estimation

Outline

The general problem of fundamental frequency estimation is to take a portion of signal and to find the dominant

frequency of repetition. Difficulties arise from (i) that not all signals are periodic, (ii) those that are periodic may

be changing in fundamental frequency over the time of interest, (iii) signals may be contaminated with noise, even

with periodic signals of other fundamental frequencies, (iv) signals that are periodic with interval T are also

periodic with interval 2T, 3T etc, so we need to find the smallest periodic interval or the highest fundamental

frequency; and (v) even signals of constant fundamental frequency may be changing in other ways over the

interval of interest.

A reliable way of obtaining an estimate of the dominant fundamental frequency for long, clean, stationary speech

signals is to use the cepstrum. The cepstrum is a Fourier analysis of the logarithmic amplitude spectrum of the

signal. If the log amplitude spectrum contains many regularly spaced harmonics, then the Fourier analysis of the

spectrum will show a peak corresponding to the spacing between the harmonics: i.e. the fundamental frequency.

Effectively we are treating the signal spectrum as another signal, then looking for periodicity in the spectrum itself.

The cepstrum is so-called because it turns the spectrum inside-out. The x-axis of the cepstrum has units of

quefrency, and peaks in the cepstrum (which relate to periodicities in the spectrum) are called rahmonics.

To obtain an estimate of the fundamental frequency from the cepstrum we look for a peak in the quefrency

region corresponding to typical speech fundamental frequencies. Here is an example:

[x,fs]=wavread('six.wav',[24120 25930]);

ms1=fs/1000; % maximum speech Fx at 1000Hz

ms20=fs/50; % minimum speech Fx at 50Hz

%

% plot waveform

t=(0:length(x)-1)/fs; % times of sampling instants

subplot(3,1,1);

plot(t,x);

legend('Waveform');

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 1/6

6/6/2014 Introduction to Computer Programming with MATLAB

xlabel('Time (s)');

ylabel('Amplitude');

%

% do fourier transform of windowed signal

Y=fft(x.*hamming(length(x)));

%

% plot spectrum of bottom 5000Hz

hz5000=5000*length(Y)/fs;

f=(0:hz5000)*fs/length(Y);

subplot(3,1,2);

plot(f,20*log10(abs(Y(1:length(f)))+eps));

legend('Spectrum');

xlabel('Frequency (Hz)');

ylabel('Magnitude (dB)');

%

% cepstrum is DFT of log spectrum

C=fft(log(abs(Y)+eps));

%

% plot between 1ms (=1000Hz) and 20ms (=50Hz)

q=(ms1:ms20)/fs;

subplot(3,1,3);

plot(q,abs(C(ms1:ms20)));

legend('Cepstrum');

xlabel('Quefrency (s)');

ylabel('Amplitude');

To search for the index of the peak in the cepstrum between 1 and 20ms, and then convert back to hertz, use:

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 2/6

6/6/2014 Introduction to Computer Programming with MATLAB

[c,fx]=max(abs(C(ms1:ms20)));

fprintf('Fx=%gHz\n',fs/(ms1+fx-1));

The cepstrum works best when the fundamental frequency is not changing too rapidly, when the fundamental

frequency is not too high and when the signal is relatively noise-free. A disadvantage of the cepstrum is that it

involves computationally-expensive frequency-domain processing.

The cepstrum looks for periodicity in the log spectrum of the signal, whereas our perception of pitch is more

strongly related to periodicity in the waveform itself. A means to estimate fundamental frequency from the

waveform directly is to use autocorrelation. The autocorrelation function for a section of signal shows how well

the waveform shape correlates with itself at a range of different delays. We expect a periodic signal to correlate

well with itself at very short delays and at delays corresponding to multiples of pitch periods.

This code plots the autocorrelation function for a section of speech signal:

[x,fs]=wavread('six.wav',[24120 25930]);

ms20=fs/50; % minimum speech Fx at 50Hz

%

% plot waveform

t=(0:length(x)-1)/fs; % times of sampling instants

subplot(2,1,1);

plot(t,x);

legend('Waveform');

xlabel('Time (s)');

ylabel('Amplitude');

%

% calculate autocorrelation

r=xcorr(x,ms20,'coeff');

%

% plot autocorrelation

d=(-ms20:ms20)/fs; % times of delays

subplot(2,1,2);

plot(d,r);

legend('Autocorrelation');

xlabel('Delay (s)');

ylabel('Correlation coeff.');

You can see that the autocorrelation function peaks at zero delay and at delays corresponding to 1 period,

2 periods, etc. We can estimate the fundamental frequency by looking for a peak in the delay interval

corresponding to the normal pitch range in speech, say 2ms(=500Hz) and 20ms (=50Hz). For example:

ms20=fs/50 % minimum speech Fx at 50Hz

% just look at region corresponding to positive delays

r=r(ms20+1:2*ms20+1)

[rmax,tx]=max(r(ms2:ms20))

fprintf('rmax=%g Fx=%gHz\n',rmax,fs/(ms2+tx-1));

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 3/6

6/6/2014 Introduction to Computer Programming with MATLAB

The autocorrelation approach works best when the signal is of low, regular pitch and when the spectral content

of the signal is not changing too rapidly. The autocorrelation method is prone to pitch halving errors where a

delay of two pitch periods is chosen by mistake. It can also be influenced by periodicity in the signal caused by

formant resonances, particularly for female voices where F1 can be lower in frequency than Fx.

Estimation of formant frequencies is generally more difficult than estimation of fundamental frequency. The

problem is that formant frequencies are properties of the vocal tract system and need to be inferred from the

speech signal rather than just measured. The spectral shape of the vocal tract excitation strongly influences the

observed spectral envelope, such that we cannot guarantee that all vocal tract resonances will cause peaks in the

observed spectral envelope, nor that all peaks in the spectral envelope are caused by vocal tract resonances.

The dominant method of formant frequency estimation is based on modelling the speech signal as if it were

generated by a particular kind of source and filter:

This type of analysis is called source-filter separation, and in the case of formant frequency estimation, we are

interested only in the modelled system and the frequencies of its resonances. To find the best matching system

we use a method of analysis called Linear Prediction. Linear prediction models the signal as if it were

generated by a signal of minimum energy being passed through a purely-recursive IIR filter.

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 4/6

6/6/2014 Introduction to Computer Programming with MATLAB

We will demonstrate the idea by using LPC to find the best IIR filter from a section of speech signal and then

plotting the filter's frequency response.

[x,fs]=wavread('six.wav',[24120 25930]);

% resample to 10,000Hz (optional)

x=resample(x,10000,fs);

fs=10000;

%

% plot waveform

t=(0:length(x)-1)/fs; % times of sampling instants

subplot(2,1,1);

plot(t,x);

legend('Waveform');

xlabel('Time (s)');

ylabel('Amplitude');

%

% get Linear prediction filter

ncoeff=2+fs/1000; % rule of thumb for formant estimation

a=lpc(x,ncoeff);

%

% plot frequency response

[h,f]=freqz(1,a,512,fs);

subplot(2,1,2);

plot(f,20*log10(abs(h)+eps));

legend('LP Filter');

xlabel('Frequency (Hz)');

ylabel('Gain (dB)');

To find the formant frequencies from the filter, we need to find the locations of the resonances that make up the

filter. This involves treating the filter coefficients as a polynomial and solving for the roots of the polynomial.

Details are outside the scope of this course (see readings). Here is the code to find estimated formant

frequencies from the LP filter:

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 5/6

6/6/2014 Introduction to Computer Programming with MATLAB

r=roots(a); % find roots of polynomial a

r=r(imag(r)>0.01); % only look for roots >0Hz up to fs/2

ffreq=sort(atan2(imag(r),real(r))*fs/(2*pi));

% convert to Hz and sort

for i=1:length(ffreq)

fprintf('Formant %d Frequency %.1f\n',i,ffreq(i));

end

Formant 2 Frequency 2310.6

Formant 3 Frequency 2770.9

Formant 4 Frequency 3489.9

Formant 5 Frequency 4294.8

Reading

provides more detail about linear prediction.

A mathematical treatment can be found in: J. Makhoul. Linear prediction: A tutorial review. Proceedings of the

IEEE, 63(4):561-580, April 1975.

A good set of lecture notes on the mathematics of speech signal processing can be found at:

http://mi.eng.cam.ac.uk/~ajr/SA95/SpeechAnalysis.html

Exercises

For these exercises, use the editor window to enter your code, and save your answers to files under your

account on the central server. When you save the files, give them the file extension of ".m". Run your programs

from the command window.

1. Write a program (ex101.m) that asks the user for a filename (WAV file), and plots a fundamental frequency

track against time for the whole signal using the cepstrum method. Use a window size of 30ms and an output

frame rate of 100 Fx samples/second.

2. Modify the previous exercise to use the autocorrelation method (ex102.m). In this program set the

fundamental frequency value to zero when the peak of the autocorrelation function is less than 0.5. This acts

as a crude voicing decision.

3. Write a program (ex103.m) that asks the user for a filename (WAV file), and produces a scatter plot of

formant peaks against time. Use a window size of 30ms and an output frame rate of 100 per second.

4. (Homework) Write a GUI program which asks you to say a vowel and plots the location of the vowel on a

Vowel Quadrilateral diagram (effectively F1 vs F2).

http://www.phon.ucl.ac.uk/courses/spsci/matlab/lect10.html 6/6

- Pr100 User ManualUploaded byjllaredo
- Learning Musical Instruments from Mixtures of Audio with Weak LabelsUploaded bymachinelearner
- angle estimationUploaded byDave Higgins
- Final.ReportUploaded byVinayaka Swamy
- difraction diagram.pdfUploaded byDudu Dudu
- AP9211-ADSPUploaded byselvijegan
- Fault Diagnosis of Marine Main Engine Cylinder Cover BasedUploaded byLuiz Dorocinski Gonzaga
- Analysis of DuobinaryUploaded bySarvesh Desai
- advanced digital signal processingUploaded byL Narayanan Deva
- Www.maintenanceonline.org Maintenanceonline Content Images p36-45 Lacey Paper 1Uploaded byFrankGriffith
- 15 an Improved Simulation Model for Nakagami-m Fading ChannelsUploaded bystarscream99
- DSP_CEN352_ch4_DFTUploaded byTewodros
- Eas6490 Fan Hw4Uploaded byfangatech
- Code Audio AnalyzerUploaded byThao Le Minh
- RCL-3Uploaded byDo van Thang
- (DEZA 2013) a CFD Study of Pressure Fluctuations to Determine Fluidization Regimes in Gas-solid BedsUploaded byCarlos Manuel Romero Luna
- amplif de 10+10w tda2009Uploaded byRene Alfredo Flores
- PadovaniUploaded byQiyan
- Offshore wind turbine foundation design issuesUploaded byLászló Arany
- HANDS FREE COMPUTER CONTROLUploaded byJames Moreno
- Analysis of ECG Signals Using Cepstrum.pdfUploaded byPAULSELVI
- Standby.pdfUploaded byUsharaniy
- Filter Bank Transceivers for OFDM and DMT Systems - Y. Lin, Et Al., (Cambridge, 2011) WWUploaded bycao-sh
- 1QuantitativeIdentificationofNear FaultPulse LikeGroundMotionsBasedonEnergy 2013Uploaded byYan Naung Ko
- CIB17202.pdfUploaded byAl Fauzan
- Application of Computed Order Tracking, Vold-Kalman FilteringUploaded byUma Tamil
- RWTSCUploaded byApril Dagli Diestro
- Project Cardiac CoherenceUploaded byAnonymous LXV7Pwp
- Correlation ExampleUploaded byV.Ramachandran
- fadingdopplereffect-160316140333Uploaded byImran khan

- ontolog-social-web-keynote.pdfUploaded byAngelRibeiro10
- You Only Look Once - Unified, Real-Time Object Detection ( J Redmon Et Al 2016)Uploaded byDavid Budaghyan
- CompositionUploaded byAngelRibeiro10
- The Polyglot ProjectUploaded byroutrhead
- Ontolog Social Web KeynoteUploaded byAngelRibeiro10
- ML InterperatabilityUploaded byrcoca_1
- A Survey of Heterogeneous Information Network AnalysisUploaded byAngelRibeiro10
- Biopython_Tutorial.pdfUploaded byAngelRibeiro10
- Language, Music and Computing - Mitrenina, Eds - 2019.pdfUploaded byAngelRibeiro10
- Guide to Unconventional Computing for MusicUploaded bySonnenschein
- Fundamentals of Algorithmics Brassard InglesUploaded byTusharVatsa
- egc2013_tutoriel_MissaouiUploaded byAngelRibeiro10
- curso_grafos_handout201009Uploaded byAngelRibeiro10
- Review Text BasedUploaded byAngelRibeiro10
- Overlap Coefficient - WikipediaUploaded byAngelRibeiro10
- inplementar.pdfUploaded byAngelRibeiro10
- Text MiningUploaded byAngelRibeiro10
- Jacard vs PMIUploaded byAngelRibeiro10
- Redes Complexas 2Uploaded byAngelRibeiro10
- book_270.pdfUploaded bygerman2210
- Sound LabUploaded byAngelRibeiro10
- How to Use the Hungarian Algorithm_ 10 Steps (With Pictures)Uploaded byAngelRibeiro10
- acustica.txtUploaded byAngelRibeiro10
- jumping-nlp-curves.pdfUploaded byAngelRibeiro10
- natural language processingUploaded byAngelRibeiro10
- Programa Escola RCUploaded byAngelRibeiro10
- Beethoven's Letters. (1790--1826.) Vol. iUploaded byAngelRibeiro10
- Gary OldmanUploaded byAngelRibeiro10

- Science of SingingUploaded byMichael Grant White
- Mouth ResonanceUploaded byMontserrat López Montalvo
- An 1 Sem 1 a Short Introduction to Phonetics and Phonology(2)Uploaded byUKASY8
- 10.1.1.33Uploaded bywrongsenderlynn
- Speculations on the control of speechUploaded bygitiny
- The Versatile Singer- A Guide to Vibrato & Straight ToneUploaded byPaulaRivero
- Trabalho Modulo 4 - Voice Physics and Neurology (1)Uploaded byJoão Cardita
- The Social Motivation of a Sound Change.pdfUploaded byDaniel Muñoz
- A Comparison of Vowel Productions in Children WithUploaded byTeodora Cn
- 10.1.1.104Uploaded byFarida M.Sabry
- A CASE STUDY OF SPEECH EMOTION RECOGNITION.Uploaded byIJAR Journal
- Babbling - A Crosslinguistic Investigation of Vowel Formants in BabblingUploaded byPaulina Klaudia Woźniak
- 2a-Kuhl et al, 1997Uploaded byRevisão Textual
- Nwav41 NasalUploaded byIgor Reina
- Strait Kraus 2013 Music Biology PlasticityUploaded byCassia Lmt
- Glossary of speech processingUploaded byuflilla
- Feature Extraction Methods LPC, PLP and MFCCUploaded byRasool Reddy
- Hearing+-+VozUploaded byNancy Olguín
- Synth Secrets, Part 23: Formant Synthesis.pdfUploaded byscribbler72
- JVoiceFormants&Partials 2013Uploaded byTimon Wapenaar
- Handout 5 - Source Filter Theory.pdfUploaded byTaha Ahsan
- DAFx-15 Submission 32 Articulation SpeechUploaded byHua Hidari Yang
- Acoustic Phonetics 3Uploaded byBéline Lily
- Joseph Shore - A Right to SingUploaded byFelix Bes Abraham
- Butter and Bread, An InterviewUploaded byTamara Stojanovic
- InterpretaciónacústicadelaresoanciadelavozIngoTitzeUploaded bySindy Vargas
- A new inverse-filtering technique for deriving the glottal air flow waveform during voicingUploaded byrobarista
- PRAAT Workshop Manual v421Uploaded byWanis Amer
- English Phonetics and PhonologyUploaded byAndrea Caretta
- Voice MorphingUploaded byapi-3827000

## Much more than documents.

Discover everything Scribd has to offer, including books and audiobooks from major publishers.

Cancel anytime.