
21AIE315 AI IN SPEECH PROCESSING

Tutorial 1 Questions
Team 5
Krishnan Km --------------- CB.EN.U4AIE20031

Spandana M --------------- CB.EN.U4AIE20038

Sathvika P --------------- CB.EN.U4AIE20050

Rishekesan SV --------------- CB.EN.U4AIE20058

a. Computationally show the original sampling rate and bit resolution of the speech waveform.

Resample the waveform to half of the original sampling frequency.


% Load the speech waveform
clc;
clear all;
[x, fs] = audioread('arctic_a0050.wav');

% original sampling rate


fs

fs = 32000

info = audioinfo('arctic_a0050.wav');
% bit resolution
bit_depth = info.BitsPerSample;
disp(['Bit resolution: ', num2str(bit_depth), ' bits'])

Bit resolution: 16 bits

% Determine the new sampling rate


new_sampling_rate = fs / 2

new_sampling_rate = 16000

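% Resample the waveform; resample(x,p,q) changes the rate by the factor p/q,
% so the target rate is the second argument and the original rate the third
y = resample(x, new_sampling_rate, fs);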

% Save the resampled waveform to a new file
audiowrite('team5.wav', y, new_sampling_rate);
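
As a quick check on the argument order used for resampling: resample(x,p,q) produces a signal at p/q times the original rate, so halving 32000 Hz needs p/q = 1/2. The following is a minimal sketch, not part of the submitted solution, that derives the smallest integer pair with rat. The names p, q and n_in are illustrative, and the input length is a hypothetical value.

```matlab
% Sketch: derive the integer resampling factors for halving the rate.
fs = 32000;                          % original sampling rate from the tutorial
new_sampling_rate = fs / 2;          % target rate (16000 Hz)

% rat returns the smallest integers p and q with p/q equal to the ratio
[p, q] = rat(new_sampling_rate / fs);

n_in = 64000;                        % hypothetical input length: 2 s at 32 kHz
n_out = n_in * p / q;                % approximate output length after resampling

fprintf('p = %d, q = %d, about %d output samples\n', p, q, n_out);
```

With these factors the call would be resample(x, p, q), which is equivalent to passing the target rate first and the original rate second.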

b. Plot the speech and EGG waveforms in a single figure as subplots (after resampling).
t = (0:length(y)-1) /new_sampling_rate;

%speech data channel-1


speech = y(:,1);
%egg data channel-2
egg=y(:,2);

figure;
subplot(2,1,1)
plot(t,speech);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform');

subplot(2,1,2)
plot(t, egg);
xlabel('Time (s)');
ylabel('Amplitude');
title('EGG Waveform');

c. Take 2 seconds of the resampled speech signal from its start and mark the different regions (based on excitation phenomenon).

Compare it with the corresponding EGG waveform (after plotting both) and write down your inferences.

duration = 2; % seconds
num_samples = duration * new_sampling_rate;
x_start = speech(1:num_samples); % first 2 s of the speech channel

t = (0:length(x_start)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t,x_start);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform');

duration = 2; % seconds
num_samples = duration * new_sampling_rate;
egg_start = egg(1:num_samples);
t_egg = (0:length(egg_start)-1) /new_sampling_rate;

subplot(2,1,2)
plot(t_egg,egg_start);
xlabel('Time (s)');
ylabel('Amplitude');
title('EGG Waveform');

%labelled diagram based on excitation phenomena & the corresponding EGG
%waveform
createfigure(t,x_start,egg_start)

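In the silent region the EGG waveform is flat, whereas in the voiced region it oscillates periodically. In the unvoiced region the speech waveform is noise-like while the EGG stays nearly flat, because the vocal folds do not vibrate.

The EGG signal represents the relative vocal fold contact area and thus provides physiological evidence of vocal fold vibration.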
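
The regions in this report were marked by hand. As a rough cross-check, silent, voiced and unvoiced frames can often be separated with short-time energy and zero-crossing rate. The sketch below is a swapped-in illustration, not the method used above: it runs on synthetic stand-in segments, and the thresholds energy_thr and zcr_thr are hypothetical values that would need tuning on real speech.

```matlab
% Sketch: label frames as silent (0), voiced (1) or unvoiced (2)
% using short-time energy and zero-crossing rate on synthetic data.
fs = 16000;
t = (0:round(0.1*fs)-1)' / fs;            % 100 ms per segment
silence  = zeros(size(t));
voiced   = 0.8 * sin(2*pi*120*t);         % stand-in: strong, low frequency
unvoiced = 0.1 * sin(2*pi*5000*t);        % stand-in: weak, high frequency
x = [silence; voiced; unvoiced];

frame_len = round(0.02 * fs);             % 20 ms non-overlapping frames
num_frames = floor(length(x) / frame_len);
energy = zeros(num_frames, 1);
zcr = zeros(num_frames, 1);
for k = 1:num_frames
    frame = x((k-1)*frame_len + (1:frame_len));
    energy(k) = mean(frame.^2);                               % short-time energy
    zcr(k) = sum(abs(diff(double(frame >= 0)))) / (frame_len - 1);  % sign changes per sample
end

energy_thr = 1e-4;                        % hypothetical threshold
zcr_thr = 0.25;                           % hypothetical threshold
labels = zeros(num_frames, 1);            % default: silent
labels(energy > energy_thr & zcr <  zcr_thr) = 1;   % voiced
labels(energy > energy_thr & zcr >= zcr_thr) = 2;   % unvoiced
disp(labels');
```

On real recordings the EGG channel gives a more direct cue, since it is nearly flat whenever the vocal folds are not vibrating.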

d. Select a voiced region of 100 ms (from the 2 seconds of resampled speech) and show the speech and corresponding glottal waveforms in a single figure as subplots. Mark the instants of glottal activity (GCIs and GOIs) in the dEGG waveform. Compute pitch for one glottal cycle using instants of glottal activity. Repeat the procedure for all glottal cycles in the selected 100 ms.

% Select the desired 100ms region of the audio and extract it


startIdx = 17040; % choose start index for 100ms region
endIdx = startIdx + round(new_sampling_rate*0.1); % end index: 100 ms after the start index
%extracting the voiced part from the 2s sample
x_voiced = x_start(startIdx:endIdx);

%time on the x axis
t_voiced = (0:length(x_voiced)-1) /new_sampling_rate;
%corresponding glottal waveform of the signal
egg_voiced=egg_start(startIdx:endIdx);
%time on the x axis
t_egg_voiced=(0:length(egg_voiced)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t_voiced,x_voiced);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform');

subplot(2,1,2)
plot(t_egg_voiced,egg_voiced);
xlabel('Time (s)');
ylabel('Amplitude');
title('Glottal Waveform');

%differentiating in order to get the goi's and the gci's


diff_voice=diff(egg_voiced);
%time on the x axis
t_g_voiced = (0:length(diff_voice)-1) /new_sampling_rate;
%plotting the speech waveform & degg waveform

figure;
subplot(2,1,2)
plot(t_voiced,x_voiced);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform');

subplot(2,1,1)
plot(t_g_voiced,diff_voice)
xlabel('Time (s)');
title('DEGG');

%marking the goi's and the gci's


createfigure1(t_voiced,x_voiced,t_g_voiced, diff_voice)
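
The instants in the figure were marked by hand. A minimal sketch of how GCIs could be located automatically is given here, under stated assumptions: the EGG is a synthetic fast-rise, slow-fall waveform, rising EGG means increasing vocal fold contact (so GCIs are the strong positive dEGG peaks), and the threshold of half the maximum is a hypothetical choice. All variable names are illustrative.

```matlab
% Sketch: locate GCIs as the strongest positive dEGG peak in each cycle.
fs = 16000;
f0_true = 125;                            % synthetic pitch, period = 128 samples
n = (0:1599)';                            % 100 ms of samples
period = fs / f0_true;
phase = mod(n, period) / period;          % position within each cycle, 0..1

% Synthetic EGG: contact rises over the first 10% of the cycle, then falls
egg = zeros(size(phase));
rising = phase < 0.1;
egg(rising)  = phase(rising) / 0.1;
egg(~rising) = (1 - phase(~rising)) / 0.9;

degg = diff(egg);                         % differentiated EGG
thr = 0.5 * max(degg);                    % hypothetical threshold
above = degg > thr;
run_start = find(diff([0; above]) == 1);  % first sample of each run above thr
run_stop  = find(diff([above; 0]) == -1); % last sample of each run above thr

gci = zeros(size(run_start));
for k = 1:numel(run_start)
    [~, rel] = max(degg(run_start(k):run_stop(k)));
    gci(k) = run_start(k) + rel - 1;      % index of the peak inside the run
end

gci_time = (gci - 1) / fs;                % GCI times in seconds
f0_est = 1 ./ diff(gci_time);             % pitch per glottal cycle
fprintf('Mean estimated pitch: %.2f Hz\n', mean(f0_est));
```

If the EGG had been recorded with the opposite polarity, the same procedure would be applied to -degg. GOIs are usually weaker peaks of the opposite sign and are harder to detect reliably with a simple threshold.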

The pitch, also known as the fundamental frequency, is the number of vocal fold vibration cycles per second and is measured in Hertz (Hz).

To compute the pitch for one glottal cycle using instants of glottal activity, use the following formula:

Pitch = 1 / (T1 - T0)

where T0 and T1 are the times of two consecutive instants of the same type (for example, two successive GCIs), so that T1 - T0 is the duration of one glottal cycle.

In other words, the pitch is the inverse of the time interval between corresponding instants of glottal activity in successive glottal cycles.

%pitch for the first glottal cycle

T1=0.02175;

T3=0.051;

T5=0.080125;

pitch=(1/(T3-T1))

pitch = 34.1880

%pitch for the second glottal cycle
pitch=(1/(T5-T3))

pitch = 34.3348
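
The two cycle-by-cycle computations above extend to any number of cycles with diff. This is a small sketch using the same three instants (T1, T3, T5) read from the dEGG plot, assuming they are consecutive instants of the same type. The names gci_times, periods and f0 are illustrative.

```matlab
% Sketch: pitch for every glottal cycle from a list of instants.
gci_times = [0.02175, 0.051, 0.080125];   % T1, T3, T5 from the tutorial (seconds)

periods = diff(gci_times);                % duration of each glottal cycle
f0 = 1 ./ periods;                        % pitch of each cycle in Hz
mean_f0 = mean(f0);                       % average pitch over the region

fprintf('Pitch per cycle (Hz): %s\n', mat2str(f0, 6));
fprintf('Mean pitch (Hz): %.4f\n', mean_f0);
```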

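%method 2
%taking every second sample to halve the rate, then plotting against sample index

[x_f,fs_f]=audioread('arctic_a0050.wav');
y=x_f(1:2:end,:);       %keep every second sample (no anti-aliasing filter is applied)
x=y(1:32000,:);         %first 2 s at 16 kHz
xx=x(14180:15779,:);    %100 ms voiced region (1600 samples)
xx_diff=diff(xx(:,2));  %dEGG from the EGG channel
figure;
subplot(2,1,1)
plot(xx_diff)
title("dEGG")
subplot(2,1,2)
plot(xx(:,1))
title("Speech")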

e. Save the 2 seconds of resampled speech. Load the saved file into the ‘wavesurfer’ tool and show the pitch contour plot. Compare it with the result obtained in ‘d’. Write down your inferences.

We can infer that the pitch calculated from the glottal instants is similar to the pitch value read from the wavesurfer pitch contour plot.

f. Select a voiced frame of 100 ms and 25 ms (from the 2 seconds of resampled speech) and plot the real magnitude spectrum for both without using any inbuilt commands. Write down the inferences.
% Select the desired 25ms region of the audio and extract it
startIdx_25 = 17040; % choose start index for 25ms region
endIdx_25 = startIdx_25 + round(new_sampling_rate*0.025); % end index: 25 ms after the start index
%extracting the voiced part from the 2s sample
x_voiced_25 = x_start(startIdx_25:endIdx_25);

%time on the x axis


t_voiced_25 = (0:length(x_voiced_25)-1) /new_sampling_rate;
%corresponding glottal waveform of the signal
egg_voiced_25=egg_start(startIdx_25:endIdx_25);

%time on the x axis
t_egg_voiced_25=(0:length(egg_voiced_25)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t_voiced_25,x_voiced_25);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform 25ms');

subplot(2,1,2)
plot(t_egg_voiced_25,egg_voiced_25);
xlabel('Time (s)');
ylabel('Amplitude');
title('Glottal Waveform 25ms');

Fs_2 = 16000;
%100ms signal
frame_100ms=x_start(startIdx:endIdx);
%25ms signal
frame_25ms=x_start(startIdx_25:endIdx_25);

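% FFT length: must be at least the longest frame (1601 samples for 100 ms),
% because fft(x,N) truncates x when it is longer than N
N = 2048;
% Compute DFT and magnitude spectrum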
X_100ms = abs(fft(frame_100ms, N));
X_25ms = abs(fft(frame_25ms, N));

% Plot magnitude spectra
freq_scale = (0:N-1)*Fs_2/N;
figure;
subplot(2,1,1);
plot(freq_scale, X_100ms);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
title('100ms Voiced Frame');
subplot(2,1,2);
plot(freq_scale, X_25ms);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
title('25ms Voiced Frame');
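
Since the question asks for the spectrum without inbuilt commands, the DFT sum can also be evaluated directly instead of calling fft. This is a minimal sketch on a hypothetical 200 Hz test tone of 25 ms, not the tutorial's speech frame. The names W, X, mag and half are illustrative.

```matlab
% Sketch: magnitude spectrum from the DFT definition
% X(k) = sum over n of x(n) * exp(-j*2*pi*n*k/N), without calling fft.
fs = 16000;
N = round(0.025 * fs);               % 25 ms frame = 400 samples
n = (0:N-1)';                        % time index (column)
x = sin(2*pi*200*n/fs);              % hypothetical test tone at 200 Hz

k = 0:N-1;                           % frequency index (row)
W = exp(-1j * 2*pi * (n * k) / N);   % N-by-N matrix of DFT basis values
X = W.' * x;                         % one DFT coefficient per frequency bin
mag = abs(X);                        % magnitude spectrum

freq = k * fs / N;                   % bin frequencies in Hz
half = 1:floor(N/2) + 1;             % bins from 0 to fs/2 (non-redundant half)
[peak_val, peak_idx] = max(mag(half));
fprintf('Peak at %.1f Hz\n', freq(peak_idx));
```

Plotting freq(half) against mag(half) shows only the range 0 to fs/2, which avoids the mirrored upper half of the spectrum. The matrix form needs N*N values, so it suits short frames such as these.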

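The largest peaks appear at the left-most and right-most ends of the spectrum. Because the frequency axis runs from 0 to Fs and the magnitude spectrum of a real signal is symmetric about Fs/2, the right-most peaks mirror the low-frequency ones, so the energy of the voiced frame is concentrated at low frequencies.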

2. Load the 5 files in the drive (shorturl.at/fqyFN) in MATLAB and investigate the channels of the files, separately.

Show the plots of speech, EGG and dEGG for 1 file out of the 5 files as subplots. Write down your inferences.

Save each speech file separately as a wave file (1 channel only) to visualize and plot the pitch contour in wavesurfer.

Do you find any difference in the pitch contour of the 5 speech files? Justify.

(Hint: The files belong to the emotions Happy, Boredom, Neutral, Sad and Anger.)
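
The step of saving only the speech channel is not shown in the code that follows. A minimal sketch of it is given here, assuming channel 1 holds the speech and channel 2 the EGG, as in the plots. To stay self-contained it first writes a synthetic two-channel file, and the file names are hypothetical.

```matlab
% Sketch: keep only channel 1 (speech) and save it as a mono wave file.
fs = 16000;
t = (0:fs-1)' / fs;
% Synthetic stand-in for a [speech, EGG] recording
stereo = [0.5*sin(2*pi*150*t), 0.3*sin(2*pi*150*t + pi/4)];

in_file  = fullfile(tempdir, 'demo_stereo.wav');        % hypothetical name
out_file = fullfile(tempdir, 'demo_speech_mono.wav');   % hypothetical name
audiowrite(in_file, stereo, fs);

[audio, fs_in] = audioread(in_file);      % all channels, one per column
speech_only = audio(:, 1);                % channel 1: speech
audiowrite(out_file, speech_only, fs_in); % mono file for wavesurfer

info = audioinfo(out_file);
fprintf('Saved %d channel(s) at %d Hz\n', info.NumChannels, info.SampleRate);
```

For the tutorial data, the read, select and write lines could run in a loop over audio1.wav to audio5.wav, writing one mono file per emotion.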


[audio1, fs] = audioread('audio1.wav');

%speech data channel-1


speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);

figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
ylabel('Amplitude');
title('Speech Waveform');

subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
ylabel('Amplitude');
title('EGG Waveform');

subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
ylabel('Amplitude');
title('dEGG Waveform');

A change in a person's mood is usually reflected in the voice. For example, when the person is angry the voice gets louder.

There is also a sudden increase in the pitch, which can be seen in the plot above.

[audio1, fs] = audioread('audio2.wav');

%speech data channel-1


speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);

figure;

subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
ylabel('Amplitude');
title('Speech Waveform');

subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
ylabel('Amplitude');
title('EGG Waveform');

subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
ylabel('Amplitude');
title('dEGG Waveform');

When the person is happy the pitch increases. The vocal folds are relaxed, which allows a smooth flow of air and produces uniform vibrations.

As a result, the voice shows consistent patterns.

[audio1, fs] = audioread('audio3.wav');

%speech data channel-1


speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);

figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
ylabel('Amplitude');
title('Speech Waveform');

subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
ylabel('Amplitude');
title('EGG Waveform');

subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
ylabel('Amplitude');
title('dEGG Waveform');

When the person is sad the voice is irregular. The vocal cords become relaxed or tensed depending on the situation, which causes variation in the sound wave.

[audio1, fs] = audioread('audio4.wav');

%speech data channel-1


speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);

figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
ylabel('Amplitude');
title('Speech Waveform');

subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
ylabel('Amplitude');
title('EGG Waveform');

subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
ylabel('Amplitude');
title('dEGG Waveform');

When the person is bored there are few fluctuations in the voice. The voice is mostly flat and regular.

The voice may be monotone in some instances.

[audio1, fs] = audioread('audio5.wav');

%speech data channel-1


speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);

figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
ylabel('Amplitude');
title('Speech Waveform');

subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
ylabel('Amplitude');
title('EGG Waveform');

subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
ylabel('Amplitude');
title('dEGG Waveform');

In the neutral state the voice is stable and consistent. There are few irregularities and little variation in the pitch and the waveform.

From the plotted signal, we can say that it was acquired in the positive polarity of the EGG machine.

The epochs appear consistent and regular, with little perturbation.

function createfigure(X1, Y1, Y2)


%CREATEFIGURE(X1, Y1, Y2)
% X1: vector of plot x data

% Y1: vector of plot y data
% Y2: vector of plot y data

% Auto-generated by MATLAB on 02-Mar-2023 16:13:19

% Create figure
figure1 = figure('NumberTitle','off','Name','Figure','Color',[1 1 1]);

% Create subplot
subplot1 = subplot(3,1,1,'Parent',figure1);
hold(subplot1,'on');

% Create plot
plot(X1,Y1);

% Create ylabel
ylabel('Amplitude');

% Create xlabel
xlabel('Time (s)');

% Create title
title('Speech Waveform');

box(subplot1,'on');
hold(subplot1,'off');
% Create subplot
subplot2 = subplot(3,1,2,'Parent',figure1);
hold(subplot2,'on');

% Create plot
plot(X1,Y2);

% Create ylabel
ylabel('Amplitude');

% Create xlabel
xlabel('Time (s)');

% Create title
title('EGG Waveform');

box(subplot2,'on');
hold(subplot2,'off');
% Create rectangle
annotation(figure1,'rectangle',...
[0.131541871921182 0.11353711790393 0.322891625615764 0.807860262008734],...
'Color',[0.0588235294117647 1 1],...
'LineWidth',3, 'LineStyle','-.');

% Create rectangle
annotation(figure1,'rectangle',...
[0.475137931034483 0.107714701601164 0.210822660098522 0.815138282387192],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',3, 'LineStyle','-.');

% Create rectangle
annotation(figure1,'rectangle',...
[0.696812807881773 0.110625909752547 0.134467980295567 0.812227074235809],...
'LineWidth',3, 'LineStyle','-.');
subplot(3,1,3)
plot(0,0,'c--');
hold on
plot(0,0,'r--');
plot(0,0,'k--');
legend(["Silent Region","Voiced Region","UnVoiced Region"])
axis off
hold off

% Create plot
end

function createfigure1(X1, Y1, X2, Y2)


%CREATEFIGURE1(X1, Y1, X2, Y2)
% X1: vector of plot x data
% Y1: vector of plot y data
% X2: vector of plot x data
% Y2: vector of plot y data

% Auto-generated by MATLAB on 02-Mar-2023 18:07:03

% Create figure
figure1 = figure('Name','Figure','Color',[1 1 1]);

% Create subplot
subplot1 = subplot(2,1,2,'Parent',figure1);
hold(subplot1,'on');

% Create plot
plot(X1,Y1);

% Create ylabel
ylabel('Amplitude');

% Create xlabel
xlabel('Time (s)');

% Create title
title('Speech Waveform');

box(subplot1,'on');
hold(subplot1,'off');
% Create subplot
subplot2 = subplot(2,1,1,'Parent',figure1);
hold(subplot2,'on');

% Create plot
plot(X2,Y2);

% Create xlabel
xlabel('Time (s)');

% Create title
title('DEGG');

box(subplot2,'on');
hold(subplot2,'off');
% Create line
annotation(figure1,'line',[0.194701986754967 0.193377483443709],...
[0.867600682593857 0.872013651877133],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',10);

% Create line
annotation(figure1,'line',[0.421192052980132 0.422516556291391],...
[0.871013651877133 0.878839590443686],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',10);

% Create line
annotation(figure1,'line',[0.525827814569536 0.528476821192053],...
[0.609921501706485 0.614334470989761],'Color',[1 0 1],'LineWidth',10);

% Create line
annotation(figure1,'line',[0.298013245033113 0.299337748344371],...
[0.606508532423208 0.612627986348123],'Color',[1 0 1],'LineWidth',10);

% Create line
annotation(figure1,'line',[0.64635761589404 0.649006622516556],...
[0.872720136518771 0.877133105802048],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',10);

% Create line
annotation(figure1,'line',[0.747019867549669 0.752317880794702],...
[0.618453924914676 0.621160409556314],'Color',[1 0 1],'LineWidth',10);

end
