
SPEAKER RECOGNITION
SYSTEM (SRS)
MD. ASAD
asadthomas@gmail.com
RIYA BHADRA
riyabhadra123456@gmail.com
IASNLP-2015, IIIT Hyderabad
Introduction

• Speaker Recognition: the process of automatically recognizing (identifying and verifying) who is speaking on the basis of individual information that exists in speech waves.
Objectives and aims
• To extract, characterize, and recognize information about a speaker's identity.

• To build a robust system that identifies and verifies a speaker accurately.
Automatically extract the information transmitted in the speech signal
Application of speaker recognition

• SR applications include voice dialling, telephone banking, telephone shopping, database access services, information services, voice mail, security control for confidential information areas, and remote access to computers.

• Some systems use "anti-speaker" techniques, such as cohort models.
Development of Speaker Recognition Systems

• The first type of speaker recognition machine, using spectrograms of voices, was invented in the 1960s. It was called voiceprint analysis or visible speech.

• Since the mid-1980s, the field has steadily matured: commercial applications of SR have been increasing, and many companies currently offer this technology.
Speech processing taxonomy
Principles of Speaker Recognition

Two applications:

• Speaker Identification
• Speaker Verification
There exist two types of speaker recognition:

• Text dependent (constrained)
• Text independent (unconstrained)

Text-dependent recognition has better performance for subjects that cooperate, but text-independent recognition is more flexible in that it can be used with non-cooperating individuals.

The task may also be:
• Closed set
• Open set
Speaker Recognition

• Identification or authentication using speaker recognition basically consists of four steps:

1. Voice Recording
2. Feature Extraction
3. Pattern Matching
4. Decision (accept / reject)
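The four steps above can be sketched as a minimal pipeline. This is an illustrative skeleton only: the function names, the energy-based "features", the distance-based "score", and the threshold value are all placeholder assumptions, not the system described in these slides.

```python
import numpy as np

def extract_features(signal):
    # Placeholder feature extractor: log frame energies as a stand-in for MFCCs.
    frames = signal.reshape(-1, 160)              # 10 ms frames at 16 kHz
    return np.log(np.sum(frames ** 2, axis=1) + 1e-10)

def matching_score(features, speaker_model):
    # Placeholder matcher: negative mean distance to a stored template.
    return -np.mean(np.abs(features - speaker_model))

def verify(signal, speaker_model, threshold=-1.0):
    # Step 1 (recording) is assumed done; steps 2-4 follow.
    features = extract_features(signal)           # step 2
    score = matching_score(features, speaker_model)  # step 3
    return score >= threshold                     # step 4: accept / reject

rng = np.random.default_rng(0)
recording = rng.standard_normal(16000)            # 1 s of synthetic "audio"
model = extract_features(rng.standard_normal(16000))
print(verify(recording, model))
```

In a real system the feature extractor and matcher would be MFCC and GMM scoring, as the later slides specify.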
Feature Extraction

• Feature extraction converts the speech waveform to some type of parametric representation. This sub-process is the key part of front-end processing: the extracted parameters effectively replace the raw waveform in all subsequent processing.
• Models used for feature extraction include LPCC, MFCC, etc.
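As an illustration of MFCC-style feature extraction, the sketch below computes coefficients for a single frame in plain NumPy. It is a simplified teaching sketch, not a production front end; the frame length, filter count, and coefficient count are arbitrary choices, and real systems process many overlapping frames.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters equally spaced on the mel scale (mel-frequency warping).
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):
            fbank[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):
            fbank[i - 1, k] = (r - k) / max(r - c, 1)
    return fbank

def mfcc(signal, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    # Short-term analysis: one Hamming-windowed frame -> power spectrum.
    frame = signal[:n_fft] * np.hamming(n_fft)
    power = np.abs(np.fft.rfft(frame)) ** 2
    # Mel filter-bank energies, then log compression.
    energies = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)
    # DCT-II decorrelates the log energies (the "cepstral" step).
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * n + 1) / (2 * n_filters))
    return dct @ energies

rng = np.random.default_rng(1)
coeffs = mfcc(rng.standard_normal(512))
print(coeffs.shape)   # 13 coefficients for this frame
```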
Pattern Matching

• Pattern matching is the actual comparison of the extracted frames with known speaker models (or templates). This results in a matching score which quantifies the similarity between the voice recording and a known speaker model. Pattern matching is often based on Hidden Markov Models (HMMs), a statistical model which takes into account the underlying variations and temporal changes of the acoustic pattern.
• Models used for pattern matching include VQ, NN, HMM, GMM, etc.
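GMM-based matching scores a sequence of feature frames by their average log-likelihood under each speaker's mixture model, and the highest-scoring speaker wins. A minimal NumPy sketch of that scoring, assuming a diagonal-covariance GMM with toy parameters (real models are trained, e.g. via EM):

```python
import numpy as np

def gmm_log_likelihood(frames, weights, means, variances):
    """Average log-likelihood of feature frames under a diagonal-covariance GMM.

    frames: (T, D); weights: (M,); means, variances: (M, D).
    """
    T, D = frames.shape
    # Per-component Gaussian log-densities for every frame: shape (T, M).
    diff = frames[:, None, :] - means[None, :, :]                # (T, M, D)
    log_norm = -0.5 * (D * np.log(2 * np.pi) + np.sum(np.log(variances), axis=1))
    log_comp = log_norm[None, :] - 0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    # log sum_m w_m N(x | mu_m, Sigma_m), computed stably per frame.
    weighted = log_comp + np.log(weights)[None, :]
    mx = weighted.max(axis=1, keepdims=True)
    log_px = mx[:, 0] + np.log(np.sum(np.exp(weighted - mx), axis=1))
    return log_px.mean()

# Toy check: frames drawn near one component's mean score far better
# than frames that match neither component.
rng = np.random.default_rng(2)
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
near = rng.standard_normal((50, 2))           # matches component 0
far = rng.standard_normal((50, 2)) + 20.0     # matches neither component
print(gmm_log_likelihood(near, weights, means, variances) >
      gmm_log_likelihood(far, weights, means, variances))
```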
Speaker Recognition

• Database used = TIMIT

• Feature extraction = MFCC
• Pattern matching = GMM
• Tool used = MATLAB
WHY MFCCs?

Mel-frequency Cepstral Coefficients:

• To date, Mel-frequency cepstral coefficients (MFCCs) are the best known and most commonly used features, not only for speech recognition but for speaker recognition as well. The computation of MFCCs is based on short-term analysis and is similar to the computation of cepstral coefficients. The significant difference lies in the use of critical-band filters to realize mel-frequency warping; the critical bandwidths as a function of frequency are based on the human ear's perception.
• A mel is a unit of measure based on the human ear's perceived frequency.
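The standard mapping from frequency f in Hz to mels is mel(f) = 2595 · log10(1 + f/700), under which 1000 Hz corresponds to roughly 1000 mels:

```python
import math

def hz_to_mel(f_hz):
    # Standard mel-scale formula: perceptually spaced frequency axis.
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

print(round(hz_to_mel(1000.0)))   # 1000 Hz maps to about 1000 mels by design
```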
Introduction to GMM

• Gaussian: a characteristic symmetric "bell curve" shape that quickly falls off towards 0 (practically).
• Mixture model: a probabilistic model which assumes the underlying data belong to a mixture distribution.
Why GMM?

• Classification paradigms used in SRS during the past 20 years include VQ, NN, HMM, and GMM, which represent Vector Quantization, Neural Network, Hidden Markov Model, and Gaussian Mixture Model respectively. A continuous ergodic HMM method is superior to a discrete ergodic HMM method, and a continuous ergodic HMM method is as robust as a VQ-based method when enough training data is available. However, when little data is available, the VQ-based method is more robust than a continuous HMM method.
EXPERIMENTAL METHODOLOGY

Dataset Description
• TIMIT Database.
• Total Number of speakers= 98
• Female speakers= 48
• Male Speakers= 50
• Total sentences= 10
• Training Data = 8 sentences per speaker
• Testing Data = 2 sentences per speaker
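The per-speaker 8/2 split above can be expressed directly. The filenames below are hypothetical placeholders (TIMIT file naming varies by release); the point is only the split arithmetic across 98 speakers:

```python
n_speakers = 98   # 48 female + 50 male, as in the dataset description

def split_utterances(utterances, n_train=8):
    # First n_train sentences train the speaker model; the rest are held out.
    return utterances[:n_train], utterances[n_train:]

train, test = split_utterances([f"sent_{i}.wav" for i in range(10)])
print(len(train) * n_speakers, len(test) * n_speakers)  # 784 196
```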
Analysis Tool
• MATLAB
Result
References:

1. Reynolds, D. A. and Rose, R. C. 1995. "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models", IEEE Trans. on Speech and Audio Processing, vol. 3, no. 1, pp. 72-83.
2. Panda, A. K. and Sahoo, A. K. 2011. Study of Speaker Recognition System. Thesis, NIT Rourkela.
3. Ling Feng, "Speaker Recognition", Kgs. Lyngby, 2004.
Questions?
