
See discussions, stats, and author profiles for this publication at: https://www.researchgate.net/publication/352491950

Developing a MATLAB Code for Fundamental Frequency and Pitch Estimation from Audio Signal

Technical Report · October 2018


DOI: 10.13140/RG.2.2.18424.37124/3


1 author:

Arpita Das
Chittagong University of Engineering & Technology


All content following this page was uploaded by Arpita Das on 27 June 2021.



Chittagong University of Engineering and Technology
Department of Electrical and Electronic Engineering

Project Title – Developing a MATLAB Code for Fundamental Frequency and Pitch Estimation from Speech Signal
Course No. EEE 496

Course Title – Digital Signal Processing Sessional

Date – 29.10.2018

Submitted to

Naqib Sad Pathan


Assistant Professor, Department of Electrical and Electronic Engineering

Chittagong University of Engineering and Technology

Submitted by

Arpita Das, ID: 1402054


Objectives:

1) To learn about fundamental frequency and pitch

2) To be able to detect the pitch and fundamental frequency of a signal from an audio file

3) To learn about the periodogram and the pwelch function of MATLAB

4) To study different ways to detect pitch using MATLAB

5) To be able to analyze the results from the waveforms

THEORY:

FUNDAMENTAL FREQUENCY: In music, the fundamental is the musical pitch of a note that is perceived as the lowest partial present. The fundamental frequency is a physical quantity: the lowest frequency of a periodic waveform, measured in hertz.

PITCH: The quality of a sound governed by the rate of the vibrations producing it; the degree
of highness or lowness of a tone. It is the tone as perceived by the listener.

METHODS:

AUTOCORRELATION: Autocorrelation, also known as serial correlation, is
the correlation of a signal with a delayed copy of itself as a function of delay. Informally, it is
the similarity between observations as a function of the time lag between them.
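
As a minimal sketch of how this definition leads to a pitch estimate, the example below builds a synthetic harmonic tone and reads the period from the strongest autocorrelation peak. The sampling rate, the 200 Hz test tone and the 100 to 400 Hz search band are assumptions chosen for illustration; they are not taken from the recordings used in this report.

```matlab
% Sketch: F0 from the autocorrelation peak of a synthetic harmonic tone.
% fs, f0 and the search band are illustrative assumptions.
fs = 8000;                          % assumed sampling rate in Hz
f0 = 200;                           % assumed true fundamental in Hz
t  = (0:round(0.1*fs)-1).'/fs;      % 100 ms time axis, column vector
x  = sin(2*pi*f0*t) + 0.5*sin(2*pi*2*f0*t);   % fundamental + 2nd harmonic

[r, lags] = xcorr(x, 'coeff');      % normalised autocorrelation, lags in samples
r    = r(lags >= 0);                % keep non-negative lags only
lags = lags(lags >= 0);

% Search only the lags that correspond to the assumed pitch band
minLag = floor(fs/400);             % shortest period considered
maxLag = ceil(fs/100);              % longest period considered
idx = find(lags >= minLag & lags <= maxLag);
[~, k] = max(r(idx));               % strongest repetition inside the band
f0_est = fs / lags(idx(k));         % lag in samples -> frequency in Hz
fprintf('Autocorrelation F0 estimate: %.2f Hz\n', f0_est);
```

Restricting the lag range matters: lag 0 always carries the global maximum, and multiples of the true period also produce peaks, so an unrestricted search would not return the period.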

PERIODOGRAM: A periodogram is a graphical data analysis technique for examining
frequency-domain models of an equi-spaced time series, that is, a series in which the
distance between adjacent points is constant. It is an estimate of the spectral density of a signal
and equals the Fourier transform of the (biased) sample autocovariance function.
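
For reference, the relation between the periodogram and the sample autocovariance can be written out as below. This is the standard textbook form, stated here under the assumptions of a real, zero-mean sequence x[n] of N samples taken at rate f_s; the symbols are introduced for illustration and do not appear in the code of this report.

```latex
\hat{P}_{xx}(f)
  = \frac{1}{N f_s}\left|\sum_{n=0}^{N-1} x[n]\, e^{-j 2\pi f n / f_s}\right|^{2}
  = \frac{1}{f_s}\sum_{k=-(N-1)}^{N-1} \hat{r}_{xx}[k]\, e^{-j 2\pi f k / f_s},
\qquad
\hat{r}_{xx}[k] = \frac{1}{N}\sum_{n=0}^{N-1-|k|} x[n+|k|]\, x[n]
```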

CEPSTRUM: A cepstrum is the result of taking the inverse Fourier transform (IFT) of
the logarithm of the estimated spectrum of a signal. There is a complex cepstrum,
a real cepstrum, a power cepstrum, and a phase cepstrum. The power cepstrum in particular
finds applications in the analysis of human speech. The cepstrum starts by taking the Fourier
transform, then the magnitude, then the logarithm, and then the inverse Fourier transform.
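
The four steps above can be sketched with base MATLAB functions. The test signal is an assumption made for illustration: a decaying pulse plus one delayed, attenuated copy, a crude stand-in for two cycles of a periodic sound. The sampling rate and the 100 to 400 Hz search band are likewise assumed values.

```matlab
% Sketch: real cepstrum built step by step (FFT -> magnitude -> log -> IFFT).
% Signal, fs and search band are illustrative assumptions.
fs = 8000;                          % assumed sampling rate in Hz
N  = 1024;                          % frame length in samples
n  = (0:N-1).';
s  = 0.9.^n;                        % decaying pulse
T0 = 40;                            % repetition period in samples (fs/T0 = 200 Hz)
x  = s + 0.5*[zeros(T0,1); s(1:N-T0)];   % pulse plus delayed, attenuated copy

X      = fft(x);                    % step 1: Fourier transform
logMag = log(abs(X));               % steps 2 and 3: magnitude, then logarithm
c      = real(ifft(logMag));        % step 4: inverse transform -> real cepstrum

q    = n/fs;                        % quefrency axis in seconds
band = find(q >= 1/400 & q <= 1/100);    % assumed pitch band, 100 to 400 Hz
[~, k] = max(c(band));              % strongest cepstral peak inside the band
f0_est = 1/q(band(k));              % quefrency of the peak -> frequency
fprintf('Cepstral F0 estimate: %.2f Hz\n', f0_est);
```

The repetition shows up as a peak at the quefrency equal to the repetition period, which is why the inverse of the peak position can be read as a fundamental frequency.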

FFT: A Fast Fourier Transform (FFT) is an algorithm that computes the discrete Fourier
transform (DFT) of a sequence, or its inverse (IFFT). The DFT decomposes a signal sampled over
a period of time (or space) into its frequency components. These components are single
sinusoidal oscillations at distinct frequencies, each with its own amplitude and phase.
Fourier analysis converts a signal from its original domain to a representation in
the frequency domain and vice versa.
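
A short sketch of how FFT bins map to frequencies may help here. The two-tone test signal, the sampling rate and the FFT length are assumptions chosen so that both tones fall exactly on FFT bins.

```matlab
% Sketch: locating the strongest frequency component with fft().
% fs, NFFT and the two test tones are illustrative assumptions.
fs   = 8000;                        % assumed sampling rate in Hz
NFFT = 1024;                        % FFT length; bin spacing is fs/NFFT = 7.8125 Hz
t    = (0:NFFT-1).'/fs;
x    = 1.0*sin(2*pi*250*t) + 0.3*sin(2*pi*1000*t);

X   = fft(x, NFFT);
mag = abs(X(1:NFFT/2+1)) / NFFT;    % single-sided magnitude (0 Hz to fs/2)
f   = (0:NFFT/2).' * fs/NFFT;       % frequency in Hz of each retained bin

[~, k] = max(mag);                  % index of the largest component
f_peak = f(k);
fprintf('Strongest component at %.2f Hz\n', f_peak);
```

Only frequencies that are multiples of fs/NFFT can be reported this way, so the FFT length sets the resolution of any peak-based estimate.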

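WELCH’S PERIODOGRAM: The periodogram, as a non-parametric estimator of the power spectral
density (PSD) of a random signal, is not consistent: its variance does not decrease as more
samples are used. Welch’s method reduces the variance by dividing the signal into segments
(which may overlap), windowing each segment, and averaging the periodograms of the segments.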
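
The variance reduction can be illustrated on white noise, whose true PSD is flat. The sketch below is an illustration under assumed settings (segment length 512, 50% overlap, Hamming window, fixed random seed), not the configuration used in the code of this report.

```matlab
% Sketch: scatter of a single periodogram versus a Welch estimate.
% Seed, fs, record length and segment settings are illustrative assumptions.
rng(0);                             % fixed seed so the run is repeatable
fs = 8000;                          % assumed sampling rate in Hz
x  = randn(8192, 1);                % white noise: the true PSD is flat

[Pper, Fper] = periodogram(x, [], 8192, fs);       % one periodogram of the record
[Pw, Fw]     = pwelch(x, hamming(512), 256, 512, fs);  % averaged 512-sample segments

% Relative scatter around the mean level; smaller means a smoother estimate
spreadPer = std(Pper) / mean(Pper);
spreadW   = std(Pw)   / mean(Pw);
fprintf('Relative scatter: periodogram %.2f, Welch %.2f\n', spreadPer, spreadW);
```

The price of the smoother estimate is coarser frequency resolution, since each segment is shorter than the full record.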

Developed Code:
%% Clearing and closing previous files and/or variables
%%
clc;
clearvars;
close all;

%% Reading Audio file


%%
[xa, fs] = audioread('sound_aaa.wav');
[xe, fs] = audioread('sound_eee.wav');
[xu, fs] = audioread('sound_uuu.wav');

%% Plotting signals in time domain


%%
xa1 = xa(:, 1); % extracting channel one only
figure
subplot(311)
plot(xa1);
xlabel('Time');
ylabel('Amplitude');
title('Signal "aaa" in time domain')

xe1 = xe(:, 1);% extracting channel one only


subplot(312)
plot(xe1);
xlabel('Time');
ylabel('Amplitude');
title('Signal "eee" in time domain')

xu1 = xu(:, 1); % extracting channel one only


subplot(313)
plot(xu1);
xlabel('Time');
ylabel('Amplitude');
title('Signal "uuu" in time domain')

%% Fast fourier transform (FFT)

%%
NFFT=4096;
xaF = fftshift(abs(fft(xa1,NFFT)));
f=(-1/2:1/NFFT:1/2-1/NFFT)*fs;
figure, plot(f,xaF(1:end))
hold on;
% The threshold 10 should be varied if the fundamental freqs of signals a, e, u are not the same
[pka,lka]=findpeaks(xaF, 'MinPeakHeight', 10);
plot(f(lka), xaF(lka), 'o');
title('Signal "a" in frequency domain');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
ffa=min(abs(f(lka)));
fprintf('fundamental frequency of signal "a" is: %f Hz\n', ffa);

xeF = fftshift(abs(fft(xe1,NFFT)));
f=(-1/2:1/NFFT:1/2-1/NFFT)*fs;
figure, plot(f,xeF(1:end))
hold on;
% The threshold 10 should be varied if the fundamental freqs of signals a, e, u are not the same
[pke,lke]=findpeaks(xeF, 'MinPeakHeight', 10);
plot(f(lke), xeF(lke), 'o');
title('Signal "e" in frequency domain');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
ffe=min(abs(f(lke)));
fprintf('fundamental frequency of signal "e" is: %f Hz\n', ffe);

xuF = fftshift(abs(fft(xu1,NFFT)));
f=(-1/2:1/NFFT:1/2-1/NFFT)*fs;
figure, plot(f,xuF(1:end))
hold on;
% The threshold 10 should be varied if the fundamental freqs of signals a, e, u are not the same
[pku,lku]=findpeaks(xuF, 'MinPeakHeight', 10);
plot(f(lku), xuF(lku), 'o');
title('Signal "u" in frequency domain');
xlabel('Frequency(Hz)');
ylabel('Amplitude');
ffu=min(abs(f(lku)));
fprintf('fundamental frequency of signal "u" is: %f Hz\n\n', ffu);

%% PSD analysis
%%
h = spectrum.welch; % or, h = spectrum.periodogram
xapsd = psd(h, xa1, 'fs', fs );
figure; plot(xapsd);
xepsd = psd(h, xe1, 'fs', fs );
figure; plot(xepsd);
xupsd = psd(h, xu1, 'fs', fs );

figure; plot(xupsd);
% highest point of psd is the pitch of that signal. It should be marked.

%% Using pwelch() function (recommended by MATLAB)


%%
[Pxxa,Fxa] = pwelch(xa1,length(xa1),0,NFFT,fs);
figure, plot(Fxa, Pxxa);
hold on;
[~,Ia] = max(Pxxa);
ffreqa = abs(Fxa(Ia));
fprintf('Using pwelch()function (recommended by MATLAB):-\n');
fprintf('Pitch of signal "a" is: %f Hz\n', ffreqa);
plot(Fxa(Ia), Pxxa(Ia), 'o');
title('PSD of Signal "a"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

[Pxxe,Fxe] = pwelch(xe1,length(xe1),0,NFFT,fs);
figure, plot(Fxe, Pxxe);
hold on;
[~,Ie] = max(Pxxe);
ffreqe = abs(Fxe(Ie));
fprintf('Pitch of signal "e" is: %f Hz\n', ffreqe);
plot(Fxe(Ie), Pxxe(Ie), 'o');
title('PSD of Signal "e"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

[Pxxu,Fxu] = pwelch(xu1,length(xu1),0,NFFT,fs);
figure, plot(Fxu, Pxxu);
hold on;
[~,Iu] = max(Pxxu);
ffrequ = abs(Fxu(Iu));
fprintf('Pitch of signal "u" is: %f Hz\n\n', ffrequ);
plot(Fxu(Iu), Pxxu(Iu), 'o');
title('PSD of signal "u"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

%% Auto correlation in time domain for reducing noise


%%
NFFT=4096;
ra = xcorr(xa1, xa1);
figure, plot(ra);
title('Auto-correlated Sound "a" in Time Domain');
xlabel('Time');
ylabel('Amplitude');

re = xcorr(xe1, xe1);
figure, plot(re);

title('Auto-correlated Sound "e" in Time Domain');
xlabel('Time');
ylabel('Amplitude');

ru = xcorr(xu1, xu1);
figure, plot(ru)
title('Auto-correlated Sound "u" in Time Domain');
xlabel('Time');
ylabel('Amplitude');

%% Periodogram of auto-correlated signal


%%
[Pxxxa, Fxxa] = pwelch(ra, length(ra), 0, NFFT, fs);
figure, plot(Fxxa,Pxxxa);
hold on;
[~,Ia] = max(Pxxxa);
ffreqa = abs(Fxxa(Ia));
fprintf('From PERIODOGRAM of auto-correlated signal:-\n');
fprintf('Pitch of signal "a" (auto-correlated) is: %f Hz\n', ffreqa);
plot(Fxxa(Ia), Pxxxa(Ia), 'o');
title('PSD of Signal "aaa"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

[Pxxxe, Fxxe] = pwelch(re, length(re), 0, NFFT, fs);


figure, plot(Fxxe, Pxxxe);
hold on;
[~,Ie] = max(Pxxxe);
ffreqe = abs(Fxxe(Ie));
fprintf('Pitch of signal "e" (auto-correlated) is: %f Hz\n', ffreqe);
plot(Fxxe(Ie), Pxxxe(Ie), 'o');
title('PSD of Signal "eee"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

[Pxxxu, Fxxu] = pwelch(ru, length(ru), 0, NFFT, fs);


figure, plot(Fxxu, Pxxxu);
hold on;
[~,Iu] = max(Pxxxu);
ffrequ = abs(Fxxu(Iu));
fprintf('Pitch of Sound "u" (auto-correlated) is: %f Hz\n\n', ffrequ);
plot(Fxxu(Iu), Pxxxu(Iu), 'o');
title('PSD of signal "uuu"');
xlabel('Frequency(Hz)');
ylabel('Power/Frequency');

%% Cepstrum
%%
% Working with a section of signal
dt = 1/fs;

I0 = round(0.1/dt);
Iend = round(0.2/dt);

xac = xa1(I0:Iend);
figure, plot(xac)
title('Working with a section of sound signal "a"');
xlabel('Time');
ylabel('Amplitude');

xec = xe1(I0:Iend);
figure, plot(xec);
title('Working with a section of sound signal "e"');
xlabel('Time');
ylabel('Amplitude');

xuc = xu1(I0:Iend);
figure, plot(xuc);
title('Working with a section of sound signal "u"');
xlabel('Time');
ylabel('Amplitude');

ca = rceps(xac);
figure, plot(ca);
title('Cepstrum of signal "a"');
xlabel('quefrency(s)')
ylabel('Amplitude');

ce = rceps(xec);
figure, plot(ce)
title('Cepstrum of signal "e"');
xlabel('quefrency(s)')
ylabel('Amplitude');

cu = rceps(xuc);
figure, plot(cu)
title('Cepstrum of signal "u"');
xlabel('quefrency(s)')
ylabel('Amplitude');

%% For women, the fundamental frequency is between f2 = 170 Hz and f1 = 240 Hz (source: Google)
%% Calculating the fundamental frequency
f1 = 240;
f2 = 170;
t = 0:dt:length(xac)*dt-dt;
trng = t(t>=1/f1 & t<=1/f2);

crng_a = ca(t>=1/f1 & t<=1/f2);


crng_e = ce(t>=1/f1 & t<=1/f2);
crng_u = cu(t>=1/f1 & t<=1/f2);

[~,Ia] = max(crng_a);
figure, plot(trng, crng_a);
hold on;
plot(trng(Ia), crng_a(Ia), 'o');
title('Real Cepstrum F0 Estimation of signal "a"');
xlabel('Time');
ylabel('Amplitude');

[~,Ie] = max(crng_e);
figure, plot(trng, crng_e);
hold on;
plot(trng(Ie), crng_e(Ie), 'o');
title('Real Cepstrum F0 Estimation of signal "e"');
xlabel('Time');
ylabel('Amplitude');

[~,Iu] = max(crng_u);
figure, plot(trng, crng_u);
hold on;
plot(trng(Iu), crng_u(Iu), 'o');
title('Real Cepstrum F0 Estimation of signal "u"');
xlabel('Time');
ylabel('Amplitude');

fprintf('Real cepstrum F0 estimation of sound "aaa..." is %f Hz.\n',1/trng(Ia));


fprintf('Real cepstrum F0 estimation of sound "eee..." is %f Hz.\n',1/trng(Ie));
fprintf('Real cepstrum F0 estimation of sound "uuu..." is %f Hz.\n',1/trng(Iu));
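
One possible refinement of the PSD-based sections, offered only as a sketch: the global maximum of a PSD can sit on a strong harmonic rather than on the fundamental, so the peak search can be restricted to the same 170 to 240 Hz band that the cepstrum section assumes. The synthetic signal below (a weak 200 Hz fundamental under a stronger 1200 Hz component), the sampling rate and the pwelch settings are assumptions for illustration.

```matlab
% Sketch: global PSD peak versus a peak search limited to the pitch band.
% Signal, fs and Welch settings are illustrative assumptions.
fs = 8000;                          % assumed sampling rate in Hz
t  = (0:8191).'/fs;
x  = 0.5*sin(2*pi*200*t) + 1.0*sin(2*pi*1200*t);   % weak fundamental, strong harmonic

[Pxx, F] = pwelch(x, hamming(1024), 512, 1024, fs);

[~, iGlobal] = max(Pxx);            % global peak: lands near the strong harmonic
f_global = F(iGlobal);

inBand = find(F >= 170 & F <= 240); % same band as f2 and f1 in the cepstrum section
[~, k] = max(Pxx(inBand));          % strongest bin inside the band only
f_band = F(inBand(k));

fprintf('Global PSD peak: %.2f Hz, band-limited peak: %.2f Hz\n', f_global, f_band);
```

The band-limited estimate is still quantised to the bin spacing fs/NFFT, so it lands on the bin nearest to 200 Hz rather than on 200 Hz exactly.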

Outputs:

1) Primary Audio signal:

The inputs were three recordings of my own voice, each sustaining a different vowel sound
(aaaaaa……, eeeeee…… and uuuuuu……..). They were saved as .wav files and read into MATLAB.

Figure 01 : Plotting the audio signals in time domain

2) Frequency domain plot:

Figure 02 : Plotting the audio signals in frequency domain

3) Pitch Estimation (Using pwelch function of MATLAB):

Figure 03 : Power Spectrum Density Plot

Hence, it is seen from the curves that the pwelch function of MATLAB gives the following pitch values:

for sound ‘aaa…’ : 1171.88 Hz

for sound ‘eee…’ : 328.125 Hz

for sound ‘uuu….’ : 1066.41 Hz

All of these values are checked later against the analytic estimate.

4) Auto correlated signal:

Here autocorrelation is used to reduce noise. We have plotted the signals after autocorrelation
in time domain.

Figure 04 : Replotting in time domain after auto correlation

5) Pitch Estimation (via PERIODOGRAM):

Figure 05 : Replotting PSD after auto correlation

Hence, it is seen from the curves that pitch from PERIODOGRAM gives:

for sound ‘aaa…’ : 1171.88 Hz

for sound ‘eee…’ : 328.125 Hz

for sound ‘uuu….’ : 351.562 Hz

All of these values are checked later against the analytic estimate.
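
A note on resolution: a peak read from an NFFT-point spectrum can only fall on a multiple of the bin spacing fs/NFFT. The sampling rate of the recordings is not stated in this report; the sketch below assumes fs = 48000 Hz, an inference made only because the reported pitch values then land on whole bin numbers for NFFT = 4096.

```matlab
% Sketch: checking the reported pitch values against the FFT bin grid.
% fs_assumed is an inference, not a value given in the report.
NFFT       = 4096;                  % FFT length used in the developed code
fs_assumed = 48000;                 % ASSUMED sampling rate in Hz
df         = fs_assumed / NFFT;     % bin spacing: 11.71875 Hz

reported = [1171.88 328.125 1066.41 351.562];   % pitch values listed above, in Hz
bins     = reported / df;           % position of each value on the bin grid

fprintf('Bin spacing: %.5f Hz\n', df);
fprintf('Reported %.3f Hz -> bin %.3f\n', [reported; bins]);
```

Under that assumption the estimates cannot be finer than about 11.7 Hz, which is worth keeping in mind when comparing them with the cepstrum result.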

6) Cepstrum:

The cepstrum is computed using the rceps() function, which returns the real cepstrum: the real
part of the inverse Fourier transform of the logarithm of the magnitude of the Fourier transform.

Figure 06 : IFT plot

7) Real cepstrum (Estimation of Fundamental Frequency):

Figure 07 : Plot for fundamental frequency estimation

Assuming a fundamental frequency range (for women, usually between 170 and 240 Hz), we
plotted the cepstrum against quefrency (time) over the corresponding range, from 1/240 s to
1/170 s, and marked its maximum. The inverse of the quefrency at which that maximum occurs
gives the fundamental frequency.
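
Written out, the search range and the estimate take the form below. The symbols q (quefrency), c(q) (real cepstrum) and q* (location of the peak) are introduced here for illustration; f1 = 240 Hz and f2 = 170 Hz are the limits used in the code.

```latex
\frac{1}{f_1} \le q \le \frac{1}{f_2}
\;\Longrightarrow\;
\frac{1}{240\ \mathrm{Hz}} \approx 4.17\ \mathrm{ms} \;\le\; q \;\le\; \frac{1}{170\ \mathrm{Hz}} \approx 5.88\ \mathrm{ms},
\qquad
q^{*} = \arg\max_{q}\, c(q), \quad \hat{F}_0 = \frac{1}{q^{*}}
```

For example, a cepstral peak at q* = 5 ms would correspond to an estimate of 200 Hz.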

Results:

Figure 08: Results showing fundamental frequency before and after the tested operation

Conclusion:

Our main target in this project was to detect pitch and fundamental frequency, and we have
learned several ways to detect them. We detected the fundamental frequency from the fast
Fourier transform and from the cepstrum, and found that the values were not equal. Noise in
the recordings may be one reason for this inequality. We also tried to detect pitch from the
periodogram and the pwelch function of MATLAB; in this case, too, we did not get the same
values. We saved the audio in .wav format; since WAV usually stores uncompressed samples
while MP3 is lossy, the file format is unlikely to explain the differences. Finally, there may
be some errors in the code, as we are new to this kind of implementation, and we hope that our
audience will be considerate about this.
