
SPEAKER IDENTIFICATION USING SPEECH PROCESSING








Submitted by
Eldhose Kurian -110942012 (ME)
Jatheesh Joy -110942009 (ME)
Mohammed Hashim -110915016 (DEAC)
Vibin Sunny Philip -110915003 (DEAC)

Aim
To identify an unknown speaker from the sound of that person's voice.
Theory
Speech recognition is an important and actively researched area of signal processing. It can
be divided into two main branches: speaker identification and spoken word recognition. Speaker
identification has applications such as biometric security and automated telephone services.
Spoken word recognition identifies the word, letter, or number spoken by the speaker, and has
applications such as word processing, computer interaction, and home automation.
In this project speaker identification techniques are used. The system identifies a
speaker whose voice is already in its database. A simplified statistical technique is used:
the system compares the unidentified person's voice with the data in the database and reports
whether he or she is the same person.
Procedure
The recognition algorithm consists of two parts. First, the system needs a database of
the person's voice; then the identification algorithm identifies the speaker.
1. Voice record part

In this section MATLAB code is written to record the same word spoken ten times by the speaker
and store it in the database. These recordings are the reference sounds for the speaker, and the
statistical calculations are carried out on these 10 sounds. Each sound is recorded as a .wav
file in the corresponding directory.



2. Identification
This is the core of the algorithm: the speaker is identified from the database based on the
sound input. The comparison is based on statistical calculations on the recorded data. Each of
the 10 sample .wav files is analyzed. The first step crops the array to discard background
noise, by removing the leading and trailing samples whose amplitude is below a minimum
threshold. The second step takes the Fourier transform of the data to convert it into the
frequency domain. Only the low-frequency part of the spectrum is kept (the first 600 samples of
the transform in the source code), since this range is taken to represent the speech of most
humans. From all 10 sound sources a normalized frequency response is created. The standard
deviation of the samples about their normalized average is calculated and stored as 'std'. The
same steps are repeated for the input from the unidentified speaker. After converting it to the
frequency domain, a decision is made by hypothesis testing: the user is accepted as the person
recorded earlier if the input lies within two standard deviations of the average.
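The cropping and spectrum steps described above can be sketched on a synthetic signal. This is a minimal illustration, not the project's listing: a 200 Hz burst stands in for a recorded name, the names thresh, pad, nbins, feat and fmax are hypothetical, and find is used in place of the sample-by-sample loops. The threshold (0.1), padding (7000 samples) and number of kept samples (600) are the values that appear in the source code.

```matlab
% Sketch only: a synthetic tone burst stands in for a recorded name.
fs     = 44100;                 % sampling rate used in the report
n      = 88200;                 % 2 seconds of samples
thresh = 0.1;                   % amplitude threshold (value from the listing)
pad    = 7000;                  % samples kept on each side of the speech
nbins  = 600;                   % number of FFT samples kept

% Silence, then a 200 Hz burst occupying samples 30001 to 50000.
x = zeros(n, 1);
idx = (30001:50000)';
x(idx) = 0.5 * sin(2*pi*200*(idx - 1)/fs);

% Step 1: crop to the region where |x| reaches the threshold, plus padding.
loud  = find(abs(x) >= thresh);
first = max(loud(1) - pad, 1);
last  = min(loud(end) + pad, n);
seg   = x(first:last);

% Step 2: power spectrum (modulus squared of the FFT) of the cropped segment.
p = abs(fft(seg)).^2;

% Step 3: keep the first nbins samples and scale them to unit Euclidean norm.
feat = p(1:nbins) / sqrt(sum(p(1:nbins).^2));

% Frequency in Hz of the last kept sample; it depends on the cropped length.
fmax = (nbins - 1) * fs / length(seg);
```

Note that keeping a fixed number of FFT samples does not fix the cutoff in hertz: sample k of an N-point transform corresponds to (k - 1)*fs/N Hz, so the frequency reached by the 600th sample changes with the length of the cropped segment.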
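DIAGRAM
Block labels of the flow diagram: Record voice, Database, Collect voice, Fourier transform,
Calc. standard deviation, Compare, Valid user or not.
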
Source code
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%% Voice Recognition Project %%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

% Once the user has 10 recordings of their voice using recorder.m,
% then the 10 recordings are cropped and placed in a 88200x20 matrix.
name = input ('Enter the name that must be recognized -- ','s');
ytemp = zeros (88200,20);
r = zeros (10,1);
for j = 1:10
file = sprintf ('%s%d.wav','file',j);
[t, fs] = wavread (file);
s = abs (t);
start = 1;
last = 88200;

% Scan forward for the first sample that reaches the threshold.
for i = 1:88200
if s (i) >= .1 && i <= 7000
start = 1;
break
end
if s (i) >= .1 && i > 7000
start = i-7000;
break
end
end
% Scan backward for the last sample that reaches the threshold.
for i = 1:88200
k = 88201-i;
if s (k) >= .1 && k >= 81200
last = 88200;
break
end
if s (k) >= .1 && k < 81200
last = k + 7000;
break
end
end
r (j) = last-start;
ytemp (1: last - start + 1,2*j) = t (start:last);
ytemp (1: last - start + 1,(2*j - 1)) = t (start:last);
end

% The rows of the matrix are truncated to the smallest length
% of the 10 recordings.
y = zeros (min (r),20);
for i = 1:20
y (:,i) = ytemp (1:min (r),i);
end

% Convert the individual columns into frequency
% domain by applying the Fast Fourier Transform.
% Then take the modulus squared of all the entries in the matrix.

fy = fft (y);
fy = fy.*conj (fy);

% Normalize the spectra of each recording and place into the matrix fn.
% Only the first 600 samples of each spectrum are kept; these low
% frequencies are taken to represent the speech of most humans.
fn = zeros (600,20);
for i = 1:20
fn (1:600,i) = fy (1:600,i)/sqrt(sum (abs (fy (1:600,i)).^2));
end

% Find the average vector pu
pu = zeros (600,1);
for i = 1:20
pu = pu + fn (1:600,i);
end
pu = pu/20;

% Normalize the average vector
tn = pu/sqrt(sum (abs (pu).^2));

% Find the Standard Deviation
std = 0;
for i = 1:20
std = std + sum (abs (fn (1:600,i)-tn).^2);
end
std = sqrt (std/19);


%%%%%%%% Verification Process %%%%%%%%
% Prepare the user to record voice
input ('You will have 2 seconds to say your name. Press enter when ready')

% Record Voice and confirm if the user is happy with their recording
usertemp = wavrecord (88200,44100);
sound (usertemp,44100);
'';
rec = input ('Are you happy with this recording. \nPress 1 to record again or just press enter to proceed-- ');
while rec == 1
rec = 0;
input ('You will have 2 seconds to say your name. Press enter when ready')
usertemp = wavrecord (88200,44100);
sound (usertemp,44100);
rec = input ('Are you happy with this recording. \nPress 1 to record again or just press enter to proceed-- ');
end


% Crop recording to a window that just contains the speech
s = abs (usertemp);
start = 1;
last = 88200;
for i = 1:88200
if s (i) >= .1 && i <= 5000
start = 1;
break
end
if s (i) >= .1 && i > 5000
start = i-5000;
break
end
end
for i = 1:88200
k = 88201-i;
if s (k) >= .1 && k >= 83200
last = 88200;
break
end
if s (k) >= .1 && k < 83200
last = k + 5000;
break
end
end


% Transform the recording with FFT and normalize
user = usertemp (start:last);
userftemp = fft (user);
userftemp = userftemp.*conj (userftemp);
userf = userftemp (1:600);
userfn = userf/sqrt(sum (abs (userf).^2));

% Plot the spectra of the recording along with the average normal vector
hold on;
subplot (2,1,1);
plot (userfn)
title ('Normalized Frequency Spectra Of Recording')
subplot (2,1,2);
plot (tn);
title ('Normalized Frequency Spectra of Average')

% Confirm whether user's voice is within two standard deviations of mean.
s = sqrt (sum (abs (userfn - tn).^2));
if s < 2*std
name = strcat ('HELLO----',name,' !!!!');
name
else
name = strcat ('YOU ARE NOT---- ',name,' !!!!');
name
end
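
The averaging, spread and two-standard-deviation test at the end of the listing can be tried in isolation. The sketch below is an illustration under stated assumptions, not the project's code: three hypothetical 3-element feature columns stand in for the 20 spectra in fn, the spread is named spread instead of std so that MATLAB's built-in std function is not shadowed, and is_match is a hypothetical helper.

```matlab
% Sketch only: three hypothetical feature columns stand in for the
% 20 enrollment spectra held in fn by the listing.
fn = [1.0 0.9 0.8;
      0.0 0.1 0.2;
      0.0 0.0 0.0];
m = size(fn, 2);
for i = 1:m
    fn(:, i) = fn(:, i) / norm(fn(:, i));   % unit length, as in the listing
end

% Average vector, renormalized to unit length (tn in the listing).
tn = mean(fn, 2);
tn = tn / norm(tn);

% Spread of the enrollment vectors around tn (std in the listing).
d2 = sum(abs(fn - repmat(tn, 1, m)).^2, 1);   % squared distance per column
spread = sqrt(sum(d2) / (m - 1));

% Decision rule: accept when the normalized test vector lies within
% two spreads of the average vector.
is_match = @(v) norm(v / norm(v) - tn) < 2 * spread;

same_speaker  = is_match([0.9; 0.1; 0.0]);   % close to the enrollment data
other_speaker = is_match([0.0; 0.0; 1.0]);   % orthogonal to the enrollment data
```

With these made-up vectors the first test vector is accepted and the second is rejected; real recordings would give much less clear-cut distances.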

Voice recording program (recorder.m)
% This program records your voice 10 times; the recordings are
% used by the voice recognition algorithm.
'Get ready to record your name ten times'
for i = 1:10
file = sprintf('%s%d.wav','file',i);
input('You have 2 seconds to say your name. Press enter when ready to record-- ');
y = wavrecord(88200,44100);
sound(y,44100);
wavwrite(y,44100,file);
end
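
The wavread, wavwrite and wavrecord functions used in these listings are not available in recent MATLAB releases. As a hedged sketch of the save and load step with the newer audiowrite and audioread functions, the example below writes a synthetic tone to a temporary file and reads it back. The tone and the temporary file name are assumptions made for illustration; recording from a microphone (for example with audiorecorder, recordblocking and getaudiodata) is left out because it needs audio hardware.

```matlab
% Sketch only: a synthetic 220 Hz tone stands in for the wavrecord output.
fs = 44100;
y  = 0.5 * sin(2*pi*220*(0:88199)'/fs);     % 2 seconds, one channel

% Hypothetical file name in the temporary directory.
file = [tempname() '.wav'];

% audiowrite takes the file name first: audiowrite(filename, y, Fs).
audiowrite(file, y, fs);

% audioread returns the samples and the sampling rate, like wavread.
[t, fs_read] = audioread(file);

delete(file);                                % remove the temporary file
```

The argument order differs from wavwrite(y, Fs, filename), which is easy to overlook when porting the recorder program.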




Result
When the program is run with low background noise, the detection rate is high and the
decisions are correct. As the noise increases, the decision may go wrong. More accurate decisions
should be possible with other methods, such as Markov models and MFCC (mel-frequency cepstral
coefficient) features.
