Final Year Project Progress Report
12 Jan. 09
Matthew Byrne
Contents
1. INTRODUCTION
1.1 Project Background
1.1.1 Applications of Speaker Recognition/Verification software
1.1.2 Theory behind Speaker Recognition/Verification software
1.1.3 Project Aim
2. Technology Background
2.1 Matlab
3. Progress to date
3.1 Front-end Processing
3.2 Mel-Frequency Cepstral Coefficients
3.3 Input
4. Remaining Project Goals
4.1 Testing
4.2 Investigation of Speaker Recognition and Verification over the Internet
4.3 Investigation and Development of real-time implementation
4.4 PC GUI
APPENDIX I - CODE
1. Program code
2. MFCC code
APPENDIX II - TABLE OF REFERENCES
Chapter 1 Introduction
1.1 Project Background
1.1.1 Applications of Speaker Recognition/Verification software
Speaker Verification software, or voice biometrics, is technology that relies on the fact
that we all have a unique voice that identifies us. Speaker verification software is most
commonly used in security applications, such as using your voice to gain access to
restricted areas of a building or to computer files. It is also used in call centres, where
clients speak the menu option they wish to select.
In many applications, speech may be the main or only means of transferring information,
and so it provides a simple method for authentication. The telephone system
provides a familiar network for obtaining and delivering the speech signal. For telephony-based
applications, there would be no need for any new equipment or networks to be
installed, as the telephone and mobile networks would suffice. For non-telephone
applications, soundcards and microphones are available as a cheap alternative.
dependent system. Text-dependent systems can greatly improve accuracy, though such
constraints can be hard or impossible to enforce in certain cases. Text-independent
systems analyse and use any spoken words to identify the user.
A speaker verification system comprises two stages, a training stage and a
test/verification stage:
The first step of training is to transform the speech signal into a set of vectors, to
obtain a new representation of the speech that is more suitable for statistical
modelling. First the speech signal is broken up into short frames, then windowed to
minimize distortion. The signal is then analysed and stored as that user's template.
The first step in the testing stage is to extract features from the input speech, as in
training. The system then compares the input features against all stored templates, selects
the most closely matching template, and identifies the speaker. The figure below charts this
process.
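The matching step described above can be sketched as a nearest-template search. The snippet below is a Python/NumPy illustration only; the Euclidean distance measure and the template format (one feature vector per speaker) are assumptions for the sketch, not the project's actual model.

```python
import numpy as np

def identify_speaker(test_features, templates):
    """Return the ID of the stored template closest to the test features.

    templates: dict mapping speaker ID -> template vector (e.g. averaged
    MFCCs from training). Euclidean distance is used here as the match
    score; a real system may use a different distance or a statistical model.
    """
    best_id, best_dist = None, np.inf
    for speaker_id, template in templates.items():
        dist = np.linalg.norm(test_features - template)
        if dist < best_dist:
            best_id, best_dist = speaker_id, dist
    return best_id

# Example: three enrolled speakers with 3-dimensional feature templates
templates = {
    "alice": np.array([1.0, 0.0, 0.0]),
    "bob":   np.array([0.0, 1.0, 0.0]),
    "carol": np.array([0.0, 0.0, 1.0]),
}
print(identify_speaker(np.array([0.9, 0.1, 0.0]), templates))  # alice
```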
The next step is to apply Hamming windowing to minimize the signal discontinuities at
the beginning and end of each frame. This helps minimize spectral distortion by
using the window to taper the signal towards zero at the beginning and end of each frame.
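The windowing step can be sketched as follows in Python/NumPy (used here purely for illustration; the project code itself is in Matlab):

```python
import numpy as np

frame_length = 256

# Hamming window: w[n] = 0.54 - 0.46*cos(2*pi*n/(N-1)).
# It tapers towards (near) zero at both ends of the frame.
window = np.hamming(frame_length)

# Apply the window to one frame of speech; a constant dummy frame is
# used here so the effect of the taper is easy to see.
frame = np.ones(frame_length)
windowed = window * frame

# Edge samples are strongly attenuated (0.08); the centre stays near 1.0
print(windowed[0], windowed[frame_length // 2])
```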
3.3 Input
The project will initially involve algorithm development and simulation using Matlab,
drawing on pre-existing, publicly available databases of speech. However, the project will also
involve the development of a system for real-time operation, using a commercially-available
embedded microprocessor development system or a system based on a
dedicated Digital Signal Processor (depending on the choice of hardware, only a subset
of the functionality of the full system may be implemented).
Initially, a text-dependent speaker identification system will be developed. Input
speech data is required for both training and testing; the speech used for initial training
and testing was pre-recorded.
The system initially requires a pre-recorded speech file. In later stages, the system
will use recorded audio from multiple users. This will require a microphone for input;
the signal will be stored temporarily while the system analyses it against stored users.
In the final stages of the project, the system will wait for a prompt to start recording,
then wait for audio silence before it stops analysing. This audio silence will then be
removed from the signal, which is then analysed as usual.
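The planned silence removal can be illustrated with a simple energy-based sketch in Python/NumPy; the frame size and threshold here are placeholder values for the illustration, not the project's final parameters:

```python
import numpy as np

def trim_silence(signal, frame_length=256, threshold=0.01):
    """Drop leading/trailing frames whose RMS energy is below threshold.

    A minimal energy-based sketch of silence removal; the threshold
    is illustrative and would need tuning per microphone and environment.
    """
    n_frames = len(signal) // frame_length
    frames = signal[:n_frames * frame_length].reshape(n_frames, frame_length)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    voiced = np.nonzero(rms >= threshold)[0]
    if voiced.size == 0:
        return signal[:0]          # all silence: return an empty signal
    start = voiced[0] * frame_length
    end = (voiced[-1] + 1) * frame_length
    return signal[start:end]

# Silence, then a burst of "speech", then silence again
sig = np.concatenate([np.zeros(512), 0.5 * np.ones(512), np.zeros(512)])
print(len(trim_silence(sig)))  # 512
```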
4.4 PC GUI
A program with a graphical user interface will be coded. Users will have the option to
train or run the program. Training will require a login from a previously trained user with
a sufficient security profile; the training protocol will then be run.
When testing is selected, the user will be prompted to speak a given passphrase and,
based on the analysis, will either pass through to the secured area or be blocked and
have to try again.
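The pass/block decision described above amounts to comparing a match score against a threshold. The following Python sketch illustrates the idea; the distance measure and the threshold value are placeholders for whatever the final analysis produces:

```python
import numpy as np

def verify(test_features, claimed_template, threshold=1.0):
    """Accept the claimed identity if the match distance is small enough.

    Sketch of a pass/block decision: distance below the threshold lets
    the user through, anything else blocks them for another attempt.
    """
    distance = np.linalg.norm(np.asarray(test_features) -
                              np.asarray(claimed_template))
    return distance <= threshold

template = [1.0, 0.0, 0.0]
print(verify([0.9, 0.1, 0.0], template))   # True  -> pass through
print(verify([0.0, 2.0, 0.0], template))   # False -> blocked, try again
```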
Appendix I - CODE
1 - Program Code
% Read the pre-recorded speech file as 16-bit samples
fid = fopen('c0r0.end', 'rb');
x = fread(fid, 'short');
fclose(fid);
[m, n] = size(x);
fsamp = 8000;                     % sampling rate (Hz)
timeLength = m / fsamp;           % about 1/2 a second
frameLength = 256;                % samples per frame
hop = 128;                        % overlapping frames: 50% overlap
nFrames = floor((m - hop) / (frameLength - hop));
figure(1);
plot(x)                           % full input signal
i = 1;
for frame = 1 : nFrames
    acquiredData = x(i : i + frameLength - 1);
    i = i + hop;
    figure(2);
    plot(acquiredData);
    % Hamming windowing to taper the frame at its edges
    windowed = hamming(frameLength) .* acquiredData;
    figure(3);
    plot(windowed)
    % FFT of the windowed frame
    FFTofData = fft(windowed);
    figure(4);
    plot(abs(FFTofData))
    FFTofData = FFTofData(1 : end/2);   % keep the first 128 bins (spectrum is symmetric)
    figure(5);
    plot(abs(FFTofData))
    % Mel filterbank (20 filters), then DCT to give the MFCCs
    W = melFilterMatrix2(8000, 256, 20);
    FilteredData = W * abs(FFTofData);
    figure(6);
    plot(FilteredData)
    MFCC = dct(FilteredData);
    figure(7);
    plot(MFCC)
end
2 - MFCC Code
%right ramp
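The melFilterMatrix2 function used in the program code builds the mel filterbank from triangular filters, each made of a rising left ramp and a falling right ramp. As an illustration only, and assuming the common mel formula mel = 2595*log10(1 + f/700), such a matrix can be sketched in Python/NumPy as follows; the project's actual Matlab implementation may differ in detail:

```python
import numpy as np

def mel_filter_matrix(fs=8000, n_fft=256, n_filters=20):
    """Triangular mel filterbank: rows are filters, columns are FFT bins.

    Illustrative reimplementation assuming the standard mel scale
    mel = 2595 * log10(1 + f / 700); not the project's melFilterMatrix2.
    """
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel_inv = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    # n_filters + 2 centre frequencies, equally spaced on the mel scale
    mel_points = np.linspace(mel(0.0), mel(fs / 2.0), n_filters + 2)
    hz_points = mel_inv(mel_points)
    bins = np.floor((n_fft / 2) * hz_points / (fs / 2.0)).astype(int)

    W = np.zeros((n_filters, n_fft // 2))
    for i in range(n_filters):
        left, centre, right = bins[i], bins[i + 1], bins[i + 2]
        for k in range(left, centre):           # left (rising) ramp
            W[i, k] = (k - left) / max(centre - left, 1)
        for k in range(centre, right):          # right (falling) ramp
            W[i, k] = (right - k) / max(right - centre, 1)
    return W

W = mel_filter_matrix()
print(W.shape)  # (20, 128)
```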