
Automatic Speaker Recognition System based on Machine Learning Algorithms
ABSTRACT
Speaker recognition technology has improved over recent years and has
become an inexpensive and reliable method for person identification
and verification. Research in the field of speaker recognition now spans
over five decades and has shown fruitful results; however, little work
has been done on South African indigenous languages.
This paper presents the development of an automatic speaker recognition
system that incorporates classification and recognition of Sepedi home
language speakers. Four classifier models, namely Support Vector
Machines (SVM), K-Nearest Neighbors (k-NN), Multilayer Perceptrons (MLP) and
Random Forest (RF), are trained using the WEKA data mining tool. Auto-
WEKA is applied to determine the best classifier model together with its
best hyper-parameters. The performance of each model is evaluated in
WEKA using 10-fold cross-validation.
MLP and RF yielded good accuracy, surpassing the state of the art with
accuracies of 97% and 99.9% respectively; the RF model is then
implemented on a graphical user interface for development testing.
INTRODUCTION
 This paper presents the development of an automatic speaker
recognition system that incorporates classification and
recognition of Sepedi home language speakers.
The system uses machine learning algorithms that learn from
features extracted from the Sepedi speech data to train the
classifier models.
The system can be used to automatically authenticate speaker
identities using their voices, allowing only identified
persons access to information systems or to facilities
that need to be protected from intrusion by unauthorized
persons.
PROPOSED METHOD
Fundamental tasks of Speaker Recognition :
In speaker verification, the speaker claims an identity and the system validates the claimed identity.
Applications of speaker verification include telephone banking, computer login, cellular
telephone fraud prevention and calling cards.
Classification of Speaker Recognition Systems:
In text-dependent systems, the spoken text or phrase used to train and test the system is
fixed for each speaker. Text-dependent speaker recognition systems are used mostly in
services such as access control and telephone-based services, where users are considered
cooperative. Text-dependent recognition achieves higher recognition performance than
text-independent recognition.
Phases of Speaker Recognition:
A speaker's voice is recorded and a number of audio feature vectors are extracted to form a
unique model (voice-print) that uniquely identifies the speaker.
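One way to sketch the voice-print idea is a simple mean-vector model with cosine-similarity matching. This is only an illustration of enrollment and identification on synthetic feature vectors, not the classifier-based approach the paper uses; the speaker labels `spk_a`/`spk_b` are hypothetical.

```python
import numpy as np

def enroll(feature_vectors):
    """Average a speaker's feature vectors into a single voice-print (illustrative)."""
    return np.mean(feature_vectors, axis=0)

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def identify(voice_prints, test_vector):
    """Return the enrolled speaker whose voice-print is most similar to the test vector."""
    return max(voice_prints, key=lambda s: cosine(voice_prints[s], test_vector))

rng = np.random.default_rng(1)
# Two synthetic speakers with distinct mean feature vectors
prints = {"spk_a": enroll(rng.normal([1, 0, 0, 1], 0.1, (20, 4))),
          "spk_b": enroll(rng.normal([0, 1, 1, 0], 0.1, (20, 4)))}
test = rng.normal([1, 0, 0, 1], 0.1)  # an utterance drawn from speaker A's distribution
```

A real system replaces the mean vector with a trained classifier, but the enrollment/matching flow is the same.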
Applications of Speaker Recognition Systems:
Speaker recognition for authentication allows users or automated systems to identify a person
by their voice. This type of authentication method is known as biometric person
authentication.
BLOCK DIAGRAM
METHODOLOGY
A systematic workflow of the proposed speaker recognition system.
Given an input speech signal, voice activity detection is performed
to identify the presence or absence of speech in the given
signal.
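The speech/non-speech decision can be sketched with a simple frame-energy threshold. This is a minimal illustrative stand-in, not the long-term spectral divergence algorithm the system actually uses, and the threshold ratio is an assumed parameter.

```python
import numpy as np

def energy_vad(signal, frame_len=400, threshold_ratio=0.1):
    """Mark each frame as speech (True) or silence (False) by short-term energy.
    Simplified sketch; the paper's system uses long-term spectral divergence."""
    n_frames = len(signal) // frame_len
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = np.mean(frames ** 2, axis=1)       # mean-square energy per frame
    threshold = threshold_ratio * energies.max()  # relative threshold
    return energies > threshold

# Synthetic signal: silence, a noisy "speech" burst, silence
rng = np.random.default_rng(0)
sig = np.concatenate([np.zeros(4000), rng.normal(0, 1, 4000), np.zeros(4000)])
mask = energy_vad(sig)  # frames 10-19 flagged as speech
```

Frames flagged False are dropped before feature extraction, so the classifier only ever sees speech frames.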
Then audio feature vectors are extracted and used to train four
standard machine learning algorithms: support vector machine
(SVM), k-nearest neighbors (k-NN), multilayer perceptrons (MLP),
and random forest (RF).
The models are trained and evaluated with 10-fold cross-validation
in the Waikato Environment for Knowledge Analysis (WEKA) to
determine the best classifier among them.
Auto-WEKA was also applied to automatically determine the best-performing
algorithm together with its best hyper-parameters. Auto-WEKA selected RF
as the best classifier model.
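The 10-fold cross-validation and model-selection loop can be sketched in plain Python. This is an illustrative stand-in on synthetic two-speaker data with two toy classifiers (3-NN and nearest centroid), not the WEKA/Auto-WEKA pipeline the paper used.

```python
import numpy as np

rng = np.random.default_rng(42)
# Two synthetic "speakers": 100 four-dimensional feature vectors per class
X = np.vstack([rng.normal(0, 1, (100, 4)), rng.normal(3, 1, (100, 4))])
y = np.repeat([0, 1], 100)

def knn_predict(Xtr, ytr, Xte, k=3):
    """Majority vote among the k nearest training vectors."""
    d = np.linalg.norm(Xte[:, None, :] - Xtr[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    return (ytr[nearest].mean(axis=1) > 0.5).astype(int)

def centroid_predict(Xtr, ytr, Xte):
    """Assign each test vector to the closer class centroid."""
    c0, c1 = Xtr[ytr == 0].mean(axis=0), Xtr[ytr == 1].mean(axis=0)
    return (np.linalg.norm(Xte - c1, axis=1) < np.linalg.norm(Xte - c0, axis=1)).astype(int)

def cv10_accuracy(predict):
    """Mean accuracy over 10 folds: each fold is held out once for testing."""
    idx = rng.permutation(len(X))
    folds = np.array_split(idx, 10)
    accs = []
    for f in folds:
        train = np.setdiff1d(idx, f)
        accs.append(np.mean(predict(X[train], y[train], X[f]) == y[f]))
    return float(np.mean(accs))

scores = {"3-NN": cv10_accuracy(knn_predict),
          "centroid": cv10_accuracy(centroid_predict)}
best = max(scores, key=scores.get)  # model selection by cross-validated accuracy
```

Auto-WEKA performs the same kind of selection, but additionally searches each algorithm's hyper-parameter space.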
Hardware
Raspberry Pi
Speaker
Display
Motor
Software
Raspbian OS
NOOBS
Linux (Ubuntu)
Qt Creator
Python
Feature Extraction
The human voice contains numerous discriminative features that can be
used to identify speakers. Feature extraction is one of the most important
aspects of speaker recognition and generates a vector that represents the
speech signal.
We extract features using pyAudioAnalysis, an open-source comprehensive
package developed in Python. pyAudioAnalysis implements a total of 34
short-term features. Table II presents a complete list of all 34 features.
Time-domain features (Zero Crossing Rate (ZCR), Energy and Entropy of
Energy) are extracted directly from the raw audio samples. Frequency-domain
features (Spectral Spread, Spectral Centroid, Spectral Flux, Spectral
Entropy, Spectral Rolloff, Chroma Deviation and Chroma Vector) are
based on the magnitude of the Discrete Fourier Transform (DFT).
Lastly, cepstral-domain features (Mel Frequency Cepstral Coefficients or
MFCCs) result from applying the Inverse DFT to the logarithmic spectrum.
Fig. 4 shows the time domain (ZCR), frequency domain (Spectral Centroid)
and cepstral domain (MFCCs) features extracted from a single audio file of
one speaker.
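A few of the short-term features can be sketched directly from their definitions. This NumPy snippet is illustrative only and is not the pyAudioAnalysis implementation; it computes three of the 34 features (ZCR, energy, spectral centroid) on a single frame.

```python
import numpy as np

def short_term_features(frame, fs):
    """Compute three short-term features on one frame (illustrative sketch,
    not the pyAudioAnalysis implementation)."""
    # Time domain: zero crossing rate and mean-square energy from raw samples
    zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2
    energy = np.mean(frame ** 2)
    # Frequency domain: spectral centroid from the DFT magnitude spectrum
    mag = np.abs(np.fft.rfft(frame))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / fs)
    centroid = np.sum(freqs * mag) / np.sum(mag)
    return zcr, energy, centroid

fs = 16000
t = np.arange(1024) / fs
low = np.sin(2 * np.pi * 200 * t)    # 200 Hz tone
high = np.sin(2 * np.pi * 3000 * t)  # 3 kHz tone

zl, el, cl = short_term_features(low, fs)   # low tone: few crossings, low centroid
zh, eh, ch = short_term_features(high, fs)  # high tone: many crossings, high centroid
```

In the full system, such frame-level features are stacked into the per-speaker feature vectors that the classifiers are trained on.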
CONCLUSION
This paper reported on the development of a text-independent
speaker recognition system based on machine learning classifier
models (SVM, k-NN, MLP and RF).
The paper briefly described all the stages (training and testing) which
covered voice activity detection, feature extraction, model training,
evaluation and the graphical user interface. The Sepedi speech
dataset was obtained from the NCHLT project. Voice activity detection
was performed using the Long-Term Spectral Divergence algorithm.
Features were extracted using the pyAudioAnalysis library. We used the
SVM, k-NN, RF and MLP implementations in WEKA to train the
models. We also applied Auto-WEKA to determine the best
algorithm. It was observed that MLPs performed well on the given
dataset, however, Auto-WEKA selected Random Forest as the best
algorithm, which was then implemented on the GUI.
REFERENCES
[1] T. Kinnunen and H. Li, "An overview of text-independent
speaker recognition: From features to supervectors," Speech
communication, vol. 52, no. 1, pp. 12-40, 2010.
[2] R. R Ramachandran, K. R. Farrell, R. Ramachandran, and R.
J. Mammone, "Speaker recognitiongeneral classifier
approaches and data fusion methods," Pattern Recognition, vol.
35, no. 12, pp. 2801-2821, 2002.
[3] N. Singh, R. Khan, and R. Shree, "Applications of speaker
recognition," Pmcedia engineering, vol. 38, pp. 3122-3126,
2012.
[4] A. Larcher, K. A. Lee, B. Ma, and H. Li, "Text-dependent
speaker verification: Classifiers, databases and rsr2015,"

Speech Communication, vol. 60, pp. 56-77, 2014. .
