Team5 Final

21AIE315 AI IN SPEECH PROCESSING
Tutorial 1 Questions
Team 5
Krishnan Km --------------- CB.EN.U4AIE20031
Spandana M --------------- CB.EN.U4AIE20038
Sathvika P --------------- CB.EN.U4AIE20050
Rishekesan SV --------------- CB.EN.U4AIE20058
a. Computationally show the original sampling rate and bit-resolution of the speech waveform.
Resample the waveform to half of the original sampling frequency.

% Load the speech waveform
clc;
clear all;
[x, fs] = audioread('arctic_a0050.wav');
% Original sampling rate
fs
fs = 32000
info = audioinfo('arctic_a0050.wav');
% Bit-resolution
bit_depth = info.BitsPerSample;
disp(['Bit-resolution:',num2str(bit_depth),'bits'])
Bit-resolution:16bits
% Determine the new sampling rate
new_sampling_rate = fs / 2
new_sampling_rate = 16000
% Resample the waveform. resample(x, p, q) changes the rate by the factor p/q,
% so halving the rate needs p = new_sampling_rate and q = fs.
y = resample(x, new_sampling_rate, fs);
% Save the resampled waveform to a new file
audiowrite('team5.wav', y, new_sampling_rate);
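As a rough check of the resampling step, the sketch below halves the rate of a synthetic tone rather than the tutorial recording. It assumes the Signal Processing Toolbox function resample, which changes the rate by the factor p/q; the names fs_demo, fs_half, x_demo and y_demo are illustrative only.

```matlab
% Sketch on a synthetic tone (assumed values, not the tutorial recording).
fs_demo = 32000;                      % assumed original rate (Hz)
fs_half = fs_demo / 2;                % target rate (Hz)
t_demo  = (0:fs_demo-1)' / fs_demo;   % 1 second of sample times
x_demo  = sin(2*pi*200*t_demo);       % 200 Hz test tone

% resample(x, p, q) changes the rate by the factor p/q,
% so going from fs_demo to fs_half needs p = fs_half and q = fs_demo.
y_demo = resample(x_demo, fs_half, fs_demo);

% Half the rate means half as many samples for the same duration.
fprintf('%d samples in, %d samples out\n', length(x_demo), length(y_demo));
```

Swapping the last two arguments would upsample by two instead, which stretches every time axis that is later computed with the halved rate.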
b. Plot the speech and EGG waveforms in a single figure as subplots (after resampling).
t = (0:length(y)-1) /new_sampling_rate;
%speech data channel-1

speech = y(:,1);
%egg data channel-2
egg=y(:,2);
figure;
subplot(2,1,1)
plot(t,speech);
xlabel('Time (s)');
ylabel('Amplitude');
title('Speech Waveform');
subplot(2,1,2)
plot(t, egg);
xlabel('Time (s)');
title('EGG Waveform');
c. Take 2 seconds duration of the resampled speech signal from its start and mark the different regions (based on excitation phenomenon).
Compare it with the corresponding EGG waveform (after plotting both) and write down your inferences.
duration = 2; % seconds
num_samples = duration * new_sampling_rate;
% First 2 s of the speech channel (channel 1)
x_start = y(1:num_samples, 1);
t = (0:length(x_start)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t,x_start);
xlabel('Time (s)');
title('Speech Waveform (first 2 s)');
% First 2 s of the EGG channel (channel 2)
egg_start = egg(1:num_samples);
t_egg = (0:length(egg_start)-1) /new_sampling_rate;
subplot(2,1,2)
plot(t_egg,egg_start);
xlabel('Time (s)');
title('EGG Waveform (first 2 s)');
%labelled diagram based on excitation phenomena & the corresponding EGG
%waveform
createfigure(t,x_start,egg_start)
In the silent region the EGG waveform is flat, whereas in the voiced region it shows regular oscillations.
The EGG signal represents the relative vocal fold contact area and thus provides physiological evidence of vocal fold vibration.
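The regions above were marked by hand. As a possible cross-check, the sketch below labels frames automatically using short-time energy and zero-crossing rate. This is a different, commonly used heuristic and not the method used in this tutorial; the synthetic signal, the frame length and both thresholds (energy_floor, zcr_limit) are assumptions chosen only for this illustration.

```matlab
% Synthetic 16 kHz signal: silence, then a voiced-like tone, then weak noise.
rng(0);                                   % reproducible noise
fs_demo = 16000;
n_seg   = 4000;                           % 250 ms per segment
sil     = zeros(n_seg, 1);
voiced  = 0.5 * sin(2*pi*150*(0:n_seg-1)'/fs_demo);
unvoi   = 0.05 * randn(n_seg, 1);
sig     = [sil; voiced; unvoi];

frame_len = 400;                          % 25 ms frames, no overlap
n_frames  = floor(length(sig) / frame_len);
labels    = strings(n_frames, 1);

% Hypothetical thresholds, tuned for this synthetic signal only.
energy_floor = 1e-6;                      % below this: silence
zcr_limit    = 0.2;                       % above this: noise-like (unvoiced)

for k = 1:n_frames
    frame  = sig((k-1)*frame_len + (1:frame_len));
    energy = mean(frame.^2);                        % short-time energy
    zcr    = mean(abs(diff(sign(frame))) > 0);      % zero-crossing rate
    if energy < energy_floor
        labels(k) = "silent";
    elseif zcr > zcr_limit
        labels(k) = "unvoiced";
    else
        labels(k) = "voiced";
    end
end
disp(labels');
```

On real recordings the thresholds would need tuning, and the EGG channel remains the more direct evidence of voicing.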
d. Select a voiced region of 100 ms (from the 2 seconds of resampled speech) and show the speech and corresponding glottal waveforms in a single figure as subplots.
Mark the instants of glottal activity (GCIs and GOIs) in the dEGG waveform.
Compute pitch for one glottal cycle using instants of glottal activity. Repeat the procedure for all glottal cycles in the selected 100 ms.
% Select the desired 100ms region of the audio and extract it
startIdx = 17040; % choose start index for 100ms region
endIdx = startIdx + round(new_sampling_rate*0.1); % add 100ms to start index for end index
%extracting the voiced part from the 2s sample
x_voiced = x_start(startIdx:endIdx);
%time on the x axis
t_voiced = (0:length(x_voiced)-1) /new_sampling_rate;
%corresponding glottal waveform of the signal
egg_voiced=egg_start(startIdx:endIdx);
%time on the x axis
t_egg_voiced=(0:length(egg_voiced)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t_voiced,x_voiced);
xlabel('Time (s)');
subplot(2,1,2)
plot(t_egg_voiced,egg_voiced);
xlabel('Time (s)');
title('Glottal Waveform');
%differentiating in order to get the goi's and the gci's

diff_voice=diff(egg_voiced);
%time on the x axis
t_g_voiced = (0:length(diff_voice)-1) /new_sampling_rate;
%plotting the speech waveform & degg waveform
figure;
subplot(2,1,2)
plot(t_voiced,x_voiced);
xlabel('Time (s)');
subplot(2,1,1)
plot(t_g_voiced,diff_voice)
xlabel('Time (s)');
title('DEGG');
%marking the goi's and the gci's

createfigure1(t_voiced,x_voiced,t_g_voiced, diff_voice)
The pitch, also known as the fundamental frequency, is the number of vocal fold vibration cycles per second and is measured in Hertz (Hz).
To compute the pitch for one glottal cycle using instants of glottal activity, you can use the following formula:
Pitch = 1 / (T1 - T0)
where T0 and T1 are the times of two consecutive instants of the same type (for example, two successive GCIs), so that T1 - T0 is the duration of one glottal cycle.
In other words, the pitch is the inverse of the time interval between two consecutive glottal closure instants.
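The formula extends to every cycle in the segment by differencing a vector of instants. The sketch below is a minimal illustration; the three times are the values this tutorial uses for its first two cycles, and the variable names gci_times, periods and pitch_hz are illustrative.

```matlab
% Instants (s) of consecutive glottal cycles, used here as sample input.
gci_times = [0.02175, 0.051, 0.080125];

periods  = diff(gci_times);    % duration of each glottal cycle (s)
pitch_hz = 1 ./ periods;       % one pitch estimate per cycle (Hz)

% Print one line per cycle: index, period and pitch.
fprintf('Cycle %d: T = %.6f s, pitch = %.4f Hz\n', ...
        [1:numel(periods); periods; pitch_hz]);
```

With more instants in gci_times, the same two lines give a per-cycle pitch track for the whole 100 ms segment.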
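%pitch for the first glottal cycle
%T1, T3 and T5 are the times (s) of consecutive glottal instants of the same type
T1=0.02175;
T3=0.051;
T5=0.080125;
pitch=(1/(T3-T1))
pitch = 34.1880
%pitch for the second glottal cycle
pitch=(1/(T5-T3))
pitch = 34.3348
%note: about 34 Hz is unusually low for speech, so these time stamps and the
%time axis they were read from are worth re-checking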
%method 2
%halve the sampling rate by keeping every second sample (no anti-aliasing filter)
%and plot against sample index on the x axis
[x_f,fs_f]=audioread('arctic_a0050.wav');
y=x_f(1:2:length(x_f),:);
x=y(1:32000,:); %first 2 s at 16 kHz
xx=x(14180:15779,:); %100 ms segment (1600 samples)
xx_diff=diff(xx(:,2));
figure;
subplot(2,1,1)
plot(xx_diff)
title("dEGG_1")
subplot(2,1,2)
plot(diff(xx(:,1)))
title("Speech")
e. Save the 2 seconds of resampled speech. Load the saved file into the ‘wavesurfer’ tool and show the pitch contour plot.
Compare it with the result obtained in ‘d’. Write down your inferences.
We can infer that the pitch calculated in ‘d’ and the pitch values from the pitch contour plot are similar.
f. Select a voiced frame of 100 ms and 25 ms (from the 2 seconds of resampled speech) and plot the real magnitude spectrum for both without using any inbuilt commands. Write down the inferences.
% Select the desired 25ms region of the audio and extract it
startIdx_25 = 17040; % choose start index for 25ms region
endIdx_25 = startIdx_25 + round(new_sampling_rate*0.025); % add 25ms to start index for end index
%extracting the voiced part from the 2s sample
x_voiced_25 = x_start(startIdx_25:endIdx_25);
%time on the x axis
t_voiced_25 = (0:length(x_voiced_25)-1) /new_sampling_rate;
%corresponding glottal waveform of the signal
egg_voiced_25=egg_start(startIdx_25:endIdx_25);
%time on the x axis
t_egg_voiced_25=(0:length(egg_voiced_25)-1) /new_sampling_rate;
figure;
subplot(2,1,1)
plot(t_voiced_25,x_voiced_25);
xlabel('Time (s)');
title('Speech Waveform 25ms');
subplot(2,1,2)
plot(t_egg_voiced_25,egg_voiced_25);
xlabel('Time (s)');
title('Glottal Waveform 25ms');
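Fs_2 = 16000;
%100ms signal
frame_100ms=x_start(startIdx:endIdx);
%25ms signal
frame_25ms=x_start(startIdx_25:endIdx_25);
% fft(x, N) truncates x when N is smaller than length(x), so N is chosen
% to be at least as long as the longer (100 ms) frame
N = 2^nextpow2(length(frame_100ms));
% Compute DFT and magnitude spectrum
X_100ms = abs(fft(frame_100ms, N));
X_25ms = abs(fft(frame_25ms, N));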
% Plot magnitude spectra
freq_scale = (0:N-1)*Fs_2/N;
figure;
subplot(2,1,1);
plot(freq_scale, X_100ms);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
title('100ms Voiced Frame');
subplot(2,1,2);
plot(freq_scale, X_25ms);
xlabel('Frequency (Hz)');
ylabel('Magnitude');
title('25ms Voiced Frame');
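The leftmost and the rightmost frequencies dominate the plotted spectrum.
Because the signal is real, the DFT magnitude is symmetric about Fs/2 (8000 Hz here), so the peaks at the right end mirror the low-frequency peaks; the energy of the voiced frame is concentrated at low frequencies.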
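The question asks for the spectrum without inbuilt commands, whereas the code above relies on fft. A direct evaluation of the DFT sum could look like the sketch below. It is an illustration on a synthetic 25 ms frame, not the tutorial data, and the names fs_demo, frame, W and X are assumptions.

```matlab
fs_demo = 16000;                         % assumed sampling rate (Hz)
n       = (0:399)';                      % sample indices of a 25 ms frame
frame   = sin(2*pi*200*n/fs_demo);       % synthetic 200 Hz frame
N       = numel(frame);

% Direct DFT: X(k) = sum over n of x(n) * exp(-j*2*pi*k*n/N), k = 0..N-1
k = 0:N-1;
W = exp(-1j * 2*pi * (n * k) / N);       % N-by-N matrix, rows n, columns k
X = (frame.' * W).';                     % column vector of DFT coefficients

% A real input has a symmetric magnitude spectrum, so keep 0..fs/2 only.
mag  = abs(X(1:N/2+1));
freq = (0:N/2)' * fs_demo / N;           % bin frequencies (Hz)
[~, peak_bin] = max(mag);
fprintf('Peak at %.1f Hz\n', freq(peak_bin));
```

Plotting mag against freq shows only the non-redundant half of the spectrum, which avoids the mirrored peaks near the sampling rate. The matrix form costs on the order of N^2 operations, which is acceptable for frames of a few thousand samples.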
2. Load the 5 files in the drive (shorturl.at/fqyFN) in MATLAB and investigate the channels of the files, separately.
Show the plots of speech, EGG and dEGG for 1 file out of the 5 files as subplots. Write down your inferences.
Save each speech file separately as wave files (1 channel only) to visualize and plot the pitch contour in wavesurfer.
Do you find any difference in the pitch contour of the 5 speech files? Justify.
(Hint: The files belong to emotions Happy, Boredom, Neutral, Sad and Anger)

[audio1, fs] = audioread('audio1.wav');
%speech data channel-1
speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);
figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
title('Speech Waveform');
subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
title('EGG Waveform');
subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
title('dEGG Waveform');
As a person's mood changes, the change is usually reflected in the voice. For example, when the person is angry the voice gets louder.
There is a sudden increase in the pitch, which can be seen in the above plot.

[audio1, fs] = audioread('audio2.wav');
%speech data channel-1
speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);
figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
When the person is happy the pitch increases. The vocal folds are relaxed, which allows a smooth flow of air and produces uniform vibrations.
As a result the voice has consistent patterns.

[audio1, fs] = audioread('audio3.wav');
%speech data channel-1
speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);
figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
When the person is sad the voice is irregular. The vocal cords become relaxed or tense depending on the situation, which causes variation in the sound wave.

[audio1, fs] = audioread('audio4.wav');
%speech data channel-1
speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);
figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
When the person is bored there are few fluctuations in the voice. The voice is mostly flat and regular.
The voice may be monotone in some instances.

[audio1, fs] = audioread('audio5.wav');
%speech data channel-1
speech_1 = audio1(:,1);
%egg data channel-2
egg_1=audio1(:,2);
figure;
subplot(3,1,1)
plot(speech_1);
xlabel('Samples');
subplot(3,1,2)
plot(egg_1);
xlabel('Samples');
subplot(3,1,3)
plot(diff(egg_1));
xlabel('Samples');
In the neutral state the voice is stable and consistent. There are not many irregularities or much variation in the pitch or the waveform.
From the plotted speech signal, we can say that it was acquired in the positive polarity of the EGG machine.
The epochs seem to be consistent and constant, without perturbation to a certain extent.
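The step of saving each speech channel as a single-channel wave file is not shown above. One way it could be done is sketched below on a synthetic two-channel recording; the file names, the temporary folder and the variables stereo, in_file and out_file are placeholders and not the tutorial's files.

```matlab
% Hypothetical stand-in for one two-channel recording (speech + EGG).
fs_demo = 16000;
t_demo  = (0:fs_demo-1)' / fs_demo;
stereo  = [0.5*sin(2*pi*150*t_demo), 0.5*sin(2*pi*150*t_demo + pi/4)];

in_file  = fullfile(tempdir, 'demo_two_channel.wav');   % placeholder name
out_file = fullfile(tempdir, 'demo_speech_only.wav');   % placeholder name
audiowrite(in_file, stereo, fs_demo);

% Read the file, keep channel 1 (speech) and write it as a mono file.
[audio, fs_in] = audioread(in_file);
audiowrite(out_file, audio(:,1), fs_in);

% Confirm that the saved file has a single channel at the same rate.
info = audioinfo(out_file);
fprintf('%d channel(s) at %d Hz\n', info.NumChannels, info.SampleRate);
```

Repeating the read and write inside a loop over the five file names would produce one mono file per emotion for loading into wavesurfer.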
function createfigure(X1, Y1, Y2)
%CREATEFIGURE(X1, Y1, Y2)
% X1: vector of plot x data
% Y1: vector of plot y data (speech)
% Y2: vector of plot y data (EGG)
% Auto-generated by MATLAB on 02-Mar-2023 16:13:19
% Create figure
figure1 = figure('NumberTitle','off','Name','Figure','Color',[1 1 1]);
% Create subplot
subplot1 = subplot(3,1,1,'Parent',figure1);
hold(subplot1,'on');
% Create plot
plot(X1,Y1);
% Create xlabel
xlabel('Time (s)');
box(subplot1,'on');
hold(subplot1,'off');
% Create subplot (this call was missing; the middle position is assumed)
subplot2 = subplot(3,1,2,'Parent',figure1);
% Create plot
plot(X1,Y2);
% Create xlabel
xlabel('Time (s)');
box(subplot2,'on');
% Create rectangle (silent region)
annotation(figure1,'rectangle',...
[0.131541871921182 0.11353711790393 0.322891625615764 0.807860262008734],...
'Color',[0.0588235294117647 1 1],...
'LineWidth',3, 'LineStyle','-.');
% Create rectangle (voiced region); line width and style assumed to match the first
annotation(figure1,'rectangle',...
[0.475137931034483 0.107714701601164 0.210822660098522 0.815138282387192],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',3, 'LineStyle','-.');
% Create rectangle (unvoiced region); colour assumed black to match the legend
annotation(figure1,'rectangle',...
[0.696812807881773 0.110625909752547 0.134467980295567 0.812227074235809],...
'Color',[0 0 0],...
'LineWidth',3, 'LineStyle','-.');
% Legend-only axes
subplot(3,1,3)
plot(0,0,'c--');
hold on
plot(0,0,'r--');
plot(0,0,'k--');
legend(["Silent Region","Voiced Region","UnVoiced Region"])
axis off
hold off
end
function createfigure1(X1, Y1, X2, Y2)
%CREATEFIGURE1(X1, Y1, X2, Y2)
% X1, Y1: time and samples of the speech segment
% X2, Y2: time and samples of the dEGG segment
% Auto-generated by MATLAB on 02-Mar-2023 18:07:03
% Create figure
figure1 = figure('Name','Figure','Color',[1 1 1]);
% Create subplot (this call was missing; positions are assumed to match the
% earlier figure, with speech below and DEGG above)
subplot1 = subplot(2,1,2,'Parent',figure1);
% Create plot
plot(X1,Y1);
% Create xlabel
xlabel('Time (s)');
box(subplot1,'on');
% Create subplot
subplot2 = subplot(2,1,1,'Parent',figure1);
% Create plot
plot(X2,Y2);
% Create xlabel
xlabel('Time (s)');
% Create title
title('DEGG');
box(subplot2,'on');
% Create line
annotation(figure1,'line',[0.194701986754967 0.193377483443709],...
[0.867600682593857 0.872013651877133],...
'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],...
'LineWidth',10);
% The five remaining marker lines are missing their x-coordinate vectors;
% the surviving arguments are kept as comments so that the function runs.
% annotation(figure1,'line',[...],[0.871013651877133 0.878839590443686],...
%   'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],'LineWidth',10);
% annotation(figure1,'line',[...],[0.609921501706485 0.614334470989761],'Color',[1 0 1],'LineWidth',10);
% annotation(figure1,'line',[...],[0.606508532423208 0.612627986348123],'Color',[1 0 1],'LineWidth',10);
% annotation(figure1,'line',[...],[0.872720136518771 0.877133105802048],...
%   'Color',[0.635294117647059 0.0784313725490196 0.184313725490196],'LineWidth',10);
% annotation(figure1,'line',[...],[0.618453924914676 0.621160409556314],'Color',[1 0 1],'LineWidth',10);
end