Speaker Verification For Remote Authentication

Tribhuvan University, Institute of Engineering, Pulchowk Campus, Department of Electronics and Computer Engineering
MAJOR PROJECT FINAL PRESENTATION:
TEXT PROMPTED REMOTE SPEAKER AUTHENTICATION

Project Supervisor: Dr. Subarna Shakya, Associate Professor
Project Members: Ganesh Tiwari (75010), Madhav Pandey (75014), Manoj Shrestha (75018)
Internal Examiner: Er. Manoj Ghimire
External Examiner: Er. Bimal Acharya
INTRODUCTION
Voice biometric system
User login
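Text-prompted system
The claimant is asked to speak a prompted (random) text
Uses both speech recognition and speaker recognition
Why text-prompted?
To counter playback attacks: a recording of a fixed phrase will not match a randomly prompted text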
OUR SYSTEM
Feature: MFCC
Modeling and classification: both statistical
GMM: speaker modeling
HMM/VQ: speech modeling
PROPERTIES OF SPEECH SIGNAL
Carries both speech content and speaker identity
What makes a speech signal unique?
Each phoneme resonates at its own fundamental frequency and its harmonics
Studied over a short period: short-time spectral analysis
What is speaker-dependent information?
Fundamental frequency: primarily a function of the dimensions and tension of the vocal cords
Size and shape of the mouth, throat, nose, and teeth
Studied over a long period: all the variations from that speaker
UNIQUENESS IN PHONEME
[Figure: waveforms of phoneme /ah/ and phoneme /i:/, amplitude vs. samples]
Pre-Processing and Feature Extraction
PREPROCESSING: STEPS
1) Silence Removal
[Figure: signal with silence and the same signal with silence removed, amplitude vs. samples]
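The slides do not say how silence was detected. As a minimal sketch, assuming a simple short-time energy threshold (the function name, frame length, and threshold value below are all hypothetical), silence removal could look like this:

```python
def remove_silence(samples, frame_len=256, threshold=0.01):
    """Drop frames whose mean energy is at or below `threshold`.

    Assumption: energy thresholding stands in for whatever detector
    the project actually used; the default values are illustrative.
    """
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)  # mean squared amplitude
        if energy > threshold:
            voiced.extend(frame)
    return voiced


# Usage: a silent block, a loud block, and a near-silent block.
signal = [0.0] * 256 + [0.5] * 256 + [0.001] * 256
cleaned = remove_silence(signal)
```

Only the loud block survives here. In practice the threshold would have to be tuned to the recording conditions.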
PREPROCESSING: STEPS (CONTD.)
2) Pre-Emphasis
[Figure: magnitude spectrum |Y(f)| vs. frequency (Hz), before pre-emphasis (suppressed high frequencies) and after pre-emphasis (boosted high frequencies)]
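Pre-emphasis is commonly implemented as a first-order high-pass filter, y[n] = x[n] - a * x[n-1]. The sketch below assumes that form; the coefficient a = 0.95 is a typical choice and is not taken from the slides.

```python
def pre_emphasis(samples, alpha=0.95):
    """Boost high frequencies with y[n] = x[n] - alpha * x[n-1].

    Assumption: alpha = 0.95 is illustrative, not the project's value.
    """
    if not samples:
        return []
    out = [samples[0]]  # the first sample has no predecessor
    for n in range(1, len(samples)):
        out.append(samples[n] - alpha * samples[n - 1])
    return out


# Usage: a constant (zero-frequency) signal is almost cancelled,
# which is the intended attenuation of low frequencies.
flat = pre_emphasis([1.0, 1.0, 1.0])
```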

3) Framing
23 ms frames with 50% overlap
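The framing step can be sketched as follows. The 23 ms length and 50% overlap come from the slide; the sample rate is not stated there, so it is passed in as a parameter, and the function name is hypothetical.

```python
def frame_signal(samples, sample_rate, frame_ms=23.0, overlap=0.5):
    """Split `samples` into overlapping frames.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop = max(1, int(frame_len * (1.0 - overlap)))  # step between frame starts
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append(samples[start:start + frame_len])
    return frames


# Usage with an assumed 16 kHz sample rate: 368-sample frames, hop of 184.
frames = frame_signal(list(range(1000)), sample_rate=16000)
```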

4) Windowing
[Figure: a signal frame, the Hamming window, and the windowed signal]
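The Hamming window named above has the standard form w[n] = 0.54 - 0.46 cos(2 pi n / (N - 1)). A minimal sketch of generating it and applying it to one frame (function names are hypothetical):

```python
import math


def hamming(n_points):
    """Return the n_points-sample Hamming window."""
    if n_points == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def apply_window(frame):
    """Multiply a frame sample by sample with a Hamming window of equal length."""
    window = hamming(len(frame))
    return [s * w for s, w in zip(frame, window)]


# Usage: the window tapers the frame edges toward 0.08 and keeps the centre.
w = hamming(5)
tapered = apply_window([1.0] * 5)
```

Tapering the frame edges reduces the spectral leakage that an abrupt cut would cause in the FFT that follows.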
FEATURE EXTRACTION
MFCC: Mel Frequency Cepstral Coefficients
Perceptual approach
The human ear processes audio signals on the Mel scale
Mel scale: approximately linear up to 1 kHz and logarithmic above 1 kHz
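One widely used approximation of the Mel scale maps a frequency f in Hz to m in mels as shown below. The slides do not say which variant the project used, so this particular formula is an assumption.

```latex
m = 2595 \log_{10}\!\left(1 + \frac{f}{700}\right),
\qquad
f = 700 \left(10^{m/2595} - 1\right)
```

With this formula, 1000 Hz maps to roughly 1000 mel. The mapping stays close to linear at low frequencies and grows logarithmically once f is large compared with 700 Hz.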
MFCC EXTRACTION: (CONTD..)
Steps: FFT -> Mel filter bank -> Log -> DCT -> CMS (cepstral mean subtraction)
Mel Filter Bank
Mel filters: 12
The absolute FFT coefficients are filtered by a triangular filter bank spaced on the Mel scale
MFCCs describe how the energy is distributed across the filters in the Mel frequency bands
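A triangular Mel filter bank of the kind described above can be sketched as below. The count of 12 filters is from the slide; the FFT size, the sample rate, the Hz-to-mel formula, and all function names are assumptions made for illustration.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filter_bank(n_filters, n_fft, sample_rate):
    """Return n_filters triangular filters, each with n_fft // 2 + 1 weights."""
    low, high = hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the Mel scale
    mel_points = [low + i * (high - low) / (n_filters + 1)
                  for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * mel_to_hz(m) / sample_rate))
            for m in mel_points]
    bank = []
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        filt = [0.0] * (n_fft // 2 + 1)
        for k in range(left, center):        # rising slope
            filt[k] = (k - left) / (center - left)
        for k in range(center, right):       # falling slope, peak at center
            filt[k] = (right - k) / (right - center)
        bank.append(filt)
    return bank


def filter_energies(magnitudes, bank):
    """Weight the absolute FFT coefficients by each filter and sum them."""
    return [sum(w * m for w, m in zip(filt, magnitudes)) for filt in bank]


# Usage with assumed values: 512-point FFT at 16 kHz, flat spectrum.
bank = mel_filter_bank(12, 512, 16000)
energies = filter_energies([1.0] * 257, bank)
```

The log, DCT, and CMS steps would then be applied to `energies` for each frame.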
EXTRA FEATURES :ENERGY AND DELTAS

Added to achieve a higher recognition rate:
Energy feature
Delta and delta-delta features
Delta: velocity feature
Delta-delta (double delta): acceleration feature
Motivation: co-articulation (neighboring sounds influence each other)
COMPOSITION OF FEATURE VECTOR

Static: 12 MFCC features + 1 energy feature
Delta: 12 delta MFCC + 1 delta energy
Delta-delta: 12 delta-delta MFCC + 1 delta-delta energy
Total: 39 features from each frame
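The slides do not give the delta formula. A common choice, assumed here, is the regression form d[t] = sum over n of n * (c[t+n] - c[t-n]), divided by 2 * sum of n^2, with edge frames repeated. The sketch below uses a hypothetical window of 2 frames and stacks static, delta, and delta-delta features into one 39-dimensional vector per frame.

```python
def delta(features, window=2):
    """Regression-based deltas over +/- `window` frames (edges are repeated)."""
    n_frames = len(features)
    denom = 2 * sum(n * n for n in range(1, window + 1))
    out = []
    for t in range(n_frames):
        row = []
        for d in range(len(features[t])):
            acc = 0.0
            for n in range(1, window + 1):
                nxt = features[min(t + n, n_frames - 1)][d]
                prv = features[max(t - n, 0)][d]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static):
    """Concatenate static, delta, and delta-delta features frame by frame."""
    d1 = delta(static)
    d2 = delta(d1)
    return [s + a + b for s, a, b in zip(static, d1, d2)]


# Usage: 6 frames of 13 static features (12 MFCC + 1 energy), rising linearly.
static = [[float(t)] * 13 for t in range(6)]
stacked = stack_features(static)
velocity = delta(static)
```

For a feature that rises by 1 per frame, the interior delta values come out as 1, matching the interpretation of delta as a velocity.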
Speech Recognition/Verification by HMM/VQ
HIDDEN MARKOV MODEL (HMM)
An HMM is an extension of a Markov process
A Markov process consists of observable states
An HMM has hidden states and observable symbols per state
An HMM is a stochastic model
HMM (CONTD)
Parameters
1) The initial state distribution ($\pi$)
2) State transition probability distribution ($A$)
3) Observation symbol probability distribution ($B$)
The HMM Model
$\lambda = (A, B, \pi)$
EXAMPLE: PRONUNCIATION MODEL OF WORD TOMATO
[Figure: pronunciation model of the word "tomato", $\lambda = (A, B, \pi)$]
HMM IMPLEMENTATION
Feature vectors -> observation symbols: 256
Phonemes -> hidden states: 6
Left-to-right HMM
Discrete Hidden Markov Model (DHMM) with Vector Quantization (VQ)
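To show how a discrete HMM with parameters (A, B, pi) scores a symbol sequence, here is a minimal sketch of the forward algorithm. The slides do not state which scoring algorithm was used, so this is an assumption, and the toy model below has 2 states and 2 symbols instead of the 6 states and 256 symbols of the project.

```python
def forward_probability(pi, A, B, observations):
    """Return P(observations | model) for a discrete HMM.

    pi[i]   : initial probability of state i
    A[i][j] : probability of moving from state i to state j
    B[i][k] : probability of emitting symbol k in state i
    """
    n_states = len(pi)
    # Initialisation with the first symbol
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction over the remaining symbols
    for symbol in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][symbol]
                 for j in range(n_states)]
    return sum(alpha)


# Usage: a toy left-to-right model (no transition back from state 1 to 0).
pi = [1.0, 0.0]
A = [[0.5, 0.5],
     [0.0, 1.0]]
B = [[0.9, 0.1],
     [0.2, 0.8]]
p = forward_probability(pi, A, B, [0, 1])
```

For long sequences the probabilities underflow, so practical implementations scale alpha at each step or work with logarithms.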
SPEECH RECOGNITION SYSTEM
VECTOR QUANTIZATION
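Vector quantization maps each feature vector to the index of its nearest codeword, and that index serves as the discrete observation symbol for the DHMM. The sketch below assumes squared Euclidean distance and uses a tiny hypothetical codebook; in the project the codebook would hold 256 codewords of 39 dimensions, trained beforehand.

```python
def quantize(vector, codebook):
    """Return the index of the codeword closest to `vector`."""
    best_index, best_dist = 0, float("inf")
    for index, codeword in enumerate(codebook):
        dist = sum((v - c) ** 2 for v, c in zip(vector, codeword))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def to_symbols(feature_vectors, codebook):
    """Turn a sequence of feature vectors into a sequence of symbol indices."""
    return [quantize(v, codebook) for v in feature_vectors]


# Usage with a 3-entry, 2-dimensional toy codebook.
codebook = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
symbols = to_symbols([[0.9, 1.2], [4.0, 6.0], [0.1, -0.2]], codebook)
```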
Speaker Recognition/Verification by GMM
SPEAKER VERIFICATION SYSTEM
SPEAKER MODELING (GMM)

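Gaussian Mixture Model
Parametric probability density function
Based on a soft clustering technique
Mixture of Gaussian components
$\lambda = (w_i, \mu_i, \Sigma_i)$: mixture weights, mean vectors, and covariance matrices of the components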
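The mixture density can be sketched as a weighted sum of Gaussian components. The code below assumes diagonal covariance matrices, a common simplification that the slides do not confirm, so each component is described by a list of per-dimension variances.

```python
import math


def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance (per-dimension variances)."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)


def gmm_density(x, weights, means, variances):
    """p(x | model) = sum over components of weight * Gaussian density."""
    return sum(w * gaussian_diag(x, m, v)
               for w, m, v in zip(weights, means, variances))


# Usage: a 1-dimensional mixture of two unit-variance components at -1 and +1.
p0 = gmm_density([0.0], [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]])
single = gmm_density([0.0], [1.0], [[0.0]], [[1.0]])
```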
SPEAKER MODEL TRAINING

Estimate the model parameters
Expectation Maximization (EM) algorithm
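One EM iteration alternates between computing each component's responsibility for every sample (E-step) and re-estimating weights, means, and variances from those responsibilities (M-step). The sketch below is a one-dimensional illustration under that textbook formulation; the variance floor and the initial values are hypothetical, and the project's features are 39-dimensional.

```python
import math


def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def em_step(data, weights, means, variances, var_floor=1e-6):
    """Run one EM iteration for a 1-D Gaussian mixture."""
    k = len(weights)
    # E-step: responsibility of each component for each sample
    resp = []
    for x in data:
        joint = [weights[j] * normal_pdf(x, means[j], variances[j]) for j in range(k)]
        total = sum(joint)
        resp.append([p / total for p in joint])
    # M-step: re-estimate the parameters from the responsibilities
    new_w, new_m, new_v = [], [], []
    for j in range(k):
        nj = sum(r[j] for r in resp)
        mean = sum(r[j] * x for r, x in zip(resp, data)) / nj
        var = sum(r[j] * (x - mean) ** 2 for r, x in zip(resp, data)) / nj
        new_w.append(nj / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # keep variances from collapsing to zero
    return new_w, new_m, new_v


# Usage: two well-separated clusters, rough initial guess, 20 iterations.
data = [-2.1, -1.9, -2.0, 2.0, 2.1, 1.9]
w, m, v = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
for _ in range(20):
    w, m, v = em_step(data, w, m, v)
```

After the iterations the two means settle near the cluster centres at -2 and +2.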
SPEAKER VERIFICATION
Based on the likelihood ratio
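The exact ratio on the slide did not survive extraction. A common formulation, assumed here, compares the log-likelihood of the utterance under the claimed speaker's model with its log-likelihood under a background (impostor) model, averages over frames, and accepts the claim when the score exceeds a threshold. The background model, the threshold, and the function names are all hypothetical.

```python
import math


def average_llr(frames, speaker_loglik, background_loglik):
    """Mean per-frame log-likelihood ratio between two models."""
    return sum(speaker_loglik(x) - background_loglik(x) for x in frames) / len(frames)


def verify(frames, speaker_loglik, background_loglik, threshold=0.0):
    """Accept the identity claim if the averaged ratio exceeds `threshold`."""
    return average_llr(frames, speaker_loglik, background_loglik) > threshold


def make_gaussian_loglik(mean, var):
    """Toy 1-D model; in the project each model would be a GMM."""
    def loglik(x):
        return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)
    return loglik


# Usage: the claimed speaker is centred at 1.0, the background at 0.0.
speaker = make_gaussian_loglik(1.0, 1.0)
background = make_gaussian_loglik(0.0, 4.0)
accepted = verify([0.9, 1.1, 1.0], speaker, background)
rejected = verify([-3.0, -2.5, -3.5], speaker, background)
```

The threshold trades false acceptances against false rejections and would be set from held-out data.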

TOOLS USED
Languages: Adobe Flex, Java, BlazeDS for RPC
Servers: Apache Tomcat, MySQL
Versioning: TortoiseSVN
OUTPUT : SNAPSHOT (GUI)
APPLICATION AREAS
Telephone transactions: telephone credit card purchases, telephone stock trading
Access control: physical facilities, computer networks, information retrieval, customer information
Forensics: voice sample matching
LIMITATION AND FUTURE ENHANCEMENT
Noise reduction
Training on more data
Combine with other features and other classification methods
Thanks
Any queries?
