Speaker Recognition With Normal and Telephonic Assamese Speech Using I Vector and Learning Based Classifier

See discussions, stats, and author profiles for this publication at: https://www.researchgate.
net/publication/319347902
Speaker Recognition With Normal and Telephonic Assamese Speech Using I-

Vector and Learning-Based Classiﬁer
Chapter · March 2017

DOI: 10.4018/978-1-5225-2128-0.ch008
CITATIONS READS
0 72
3 authors:
Mridusmita Sharma Rituraj Kaushik

Assam Engineering College INRIA - Nancy France
28 PUBLICATIONS 61 CITATIONS 12 PUBLICATIONS 208 CITATIONS
SEE PROFILE SEE PROFILE
Kandarpa Kumar Sarma

Gauhati University
491 PUBLICATIONS 1,943 CITATIONS
SEE PROFILE
Some of the authors of this publication are also working on these related projects:
ResiBots View project
Intelligent System Design View project
All content following this page was uploaded by Mridusmita Sharma on 28 August 2018.
The user has requested enhancement of the downloaded file.

256
Chapter 8
Speaker Recognition With
Normal and Telephonic
Assamese Speech Using
I-Vector and Learning-
Based Classifier
Mridusmita Sharma
Gauhati University, India
Rituraj Kaushik
Tezpur University, India
Kandarpa Kumar Sarma

Gauhati University, India
ABSTRACT
Speaker recognition is the task of identifying a person by his/her unique identification features or behav-
ioural characteristics that are included in the speech uttered by the person. Speaker recognition deals
with the identity of the speaker. It is a biometric modality which uses the features of the speaker that is
influenced by one’s individual behaviour as well as the characteristics of the vocal cord. The issue be-
comes more complex when regional languages are considered. Here, the authors report the design of a
speaker recognition system using normal and telephonic Assamese speech for their case study. In their
work, the authors have implemented i-vectors as features to generate an optimal feature set and have used
the Feed Forward Neural Network for the recognition purpose which gives a fairly high recognition rate.
INTRODUCTION
Speaker or voice recognition is the task of automatically recognizing the identity of the person speaking.
The task of speaker recognition considers the individual behavior as well as the characteristics of the vocal
cord as features to identify a speaker. Speaker recognition is however very much different from speech
DOI: 10.4018/978-1-5225-2128-0.ch008
Copyright © 2017, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

Speaker Recognition With Normal and Telephonic Assamese Speech
recognition. In speech recognition, the speech signals which convey much information to the listeners
are being detected with the help of the features that has the ability to represent the speech content. The
primary level of speech recognition is to recognize the speech content spoken by the speaker. But at a
higher level, the spoken utterances or speech samples contains the information regarding the gender,
emotions and also the individuality of the speaker speaking (Reynolds, 2002).
With the rapid growth of techniques in the signal processing and model building and empowerment
of computing devices, significant progress has been made in the speech recognition area. Research in
the field of speaker recognition has also evolved at par with speech recognition and speech synthesis
because of the similar characteristics and challenges associated with it. Research and development in
the field of speaker recognition dates back to over four decades and this area is still an active topic for
research. The application of speaker recognition technology has been continually growing in various
fields of application such as forensic applications, dynamic signatures, gait, keystroke recognition, data
encryption purpose, user validation in contact centers, etc (Beigi, 2011).
From the literature, it has been found that recent speaker recognition tasks have implemented i-vector
based features for their purpose. The characteristics of a voice sample or about the speaker can be obtained
by the features extracted from the speech sample. Recently the i-vector paradigm is widely used for the
speaker recognition systems. The i-vector based feature extraction is based on Joint Factor Analysis (JFA)
approach which provides an elegant way to convert the Mel Frequency Cepstral Coefficient (MFCC)
feature vector of a variable length utterance into a low dimensional vector representation (Dehak et. al.,
2011). I-vectors are one example of subspace modeling approaches that can be used to reduce the dimen-
sionality of data before training and applying to classifiers for recognition purpose. The dimensionality
reduction should make training of the classifier less computationally expensive which could enable us
to train the system with more data. In this work, we have considered Assamese speech for our experi-
mental work as the number of work done in this language is relatively less. Also, Assamese language
has a very rich and diverse phonological content which shows notable variations from region to region
as well as speaker to speaker. The diversity and variations present in this language increases the need to
develop a language specific tool for speech and speaker recognition system (Sharma & Sarma, 2015).
The organization of the chapter is as follows. The second section provides an overview of the basic
considerations and theoretical background that are associated with various speaker recognition tasks.
A detailed literature survey of the previous work done related to speaker recognition problems and the
features used for the purpose is presented in the third section. The fourth section gives the proposed
system model, the speech database and the basic algorithm used for our proposed work. Experimental
details and results derived from our work is shown in Section five. Section six concludes the chapter.
THEORETICAL CONSIDERATIONS
The study of the basic theoretical considerations of speaker recognition and other such related topics
gives a proper understanding of the speaker recognition problems. In this section a brief overview of
the basic theoretical considerations related to speaker recognition problem, features used and the soft
computing technique implemented for the recognition purpose is provided.
1. Speaker Recognition Basics: Speaker recognition is the process of recognition of the person
speaking on the basis of the uttered speech. The history of speaker recognition dates back to the
257
23 more pages are available in the full version of this document, which may
be purchased using the "Add to Cart" button on the product's webpage:
www.igi-global.com/chapter/speaker-recognition-with-normal-and-telephonic-
assamese-speech-using-i-vector-and-learning-based-
classifier/179395?camid=4v1
This title is available in Advances in Computational Intelligence and Robotics,

InfoSci-Books, InfoSci-Computer Science and Information Technology,
Science, Engineering, and Information Technology, InfoSci-Media and
Communications, Communications, Social Science, and Healthcare, InfoSci-
Select, InfoSci-Computer Science Collection, InfoSci-Select, InfoSci-Select,
InfoSci-Select. Recommend this product to your librarian:
www.igi-global.com/e-resources/library-recommendation/?id=77
Related Content
Discrete Artificial Bee Colony Optimization Algorithm for Financial Classification Problems
Yannis Marinakis, Magdalene Marinaki, Nikolaos Matsatsinis and Constantin Zopounidis (2011).
International Journal of Applied Metaheuristic Computing (pp. 1-17).
www.igi-global.com/article/discrete-artificial-bee-colony-optimization/52789?camid=4v1a
The Great Salmon Run Metaheuristic for Robust Shape and Size Design of Truss Structures
with Dynamic Constraints
Ahmad Mozaffari and Mehrzad Ebrahimnejad (2014). International Journal of Applied Metaheuristic
Computing (pp. 54-79).
www.igi-global.com/article/the-great-salmon-run-metaheuristic-for-robust-shape-and-size-
design-of-truss-structures-with-dynamic-constraints/114206?camid=4v1a
Vector Evaluated and Objective Switching Approaches of Artificial Bee Colony Algorithm (ABC)
for Multi-Objective Design Optimization of Composite Plate Structures
S. N. Omkar, G. Narayana Naik, Kiran Patil and Mrunmaya Mudigere (2013). Trends in Developing
Metaheuristics, Algorithms, and Optimization Approaches (pp. 59-84).
www.igi-global.com/chapter/vector-evaluated-objective-switching-
approaches/69718?camid=4v1a
A Reinforcement Learning: Great-Deluge Hyper-Heuristic for Examination Timetabling

Ender Özcan, Mustafa Misir, Gabriela Ochoa and Edmund K. Burke (2012). Modeling, Analysis, and
Applications in Metaheuristic Computing: Advancements and Trends (pp. 34-55).
www.igi-global.com/chapter/reinforcement-learning-great-deluge-hyper/63803?camid=4v1a
View publication stats

Speaker Recognition With Normal and Telephonic Assamese Speech Using I Vector and Learning Based Classifier

Uploaded by

Document Information

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Speaker Recognition With Normal and Telephonic Assamese Speech Using I Vector and Learning Based Classifier

Uploaded by

Copyright:

Available Formats

See discussions, stats, and author profiles for this publication at: https://www.researchgate.

Speaker Recognition With Normal and Telephonic Assamese Speech Using I-

Chapter · March 2017

Mridusmita Sharma Rituraj Kaushik

SEE PROFILE SEE PROFILE

Kandarpa Kumar Sarma

ResiBots View project

Intelligent System Design View project

The user has requested enhancement of the downloaded file.

Kandarpa Kumar Sarma

This title is available in Advances in Computational Intelligence and Robotics,

A Reinforcement Learning: Great-Deluge Hyper-Heuristic for Examination Timetabling

View publication stats

You might also like