
Tribhuvan University Institute of Engineering Pulchowk Campus Department of Electronics and Computer Engineering

MAJOR PROJECT FINAL PRESENTATION :

TEXT PROMPTED REMOTE SPEAKER AUTHENTICATION


Project Supervisor: Dr. Subarna Shakya, Associate Professor

Project Members: Ganesh Tiwari (75010), Madhav Pandey (75014), Manoj Shrestha (75018)

Internal Examiner: Er. Manoj Ghimire

External Examiner: Er. Bimal Acharya

INTRODUCTION

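Voice biometric system for user login

Text-prompted system: the claimant is asked to speak a prompted (random) text

Combines speech recognition and speaker recognition

Why text-prompted?

Resists playback attacks: a pre-recorded utterance of the genuine speaker will not match a freshly prompted random text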

OUR SYSTEM

Features: MFCC
Modeling and classification: both statistical

GMM: speaker modeling

HMM/VQ: speech modeling

PROPERTIES OF SPEECH SIGNAL

Carries both speech content and speaker identity

What makes a speech signal unique?

Each phoneme has its own characteristic vocal-tract resonances (formants), which shape the harmonics of the fundamental frequency
Studied over a short period: short-time spectral analysis

What is speaker-dependent information?

The fundamental frequency, primarily a function of the dimensions and tension of the vocal cords
The size and shape of the mouth, throat, nose, and teeth

Studied over a long period: captures all the variations of that speaker

UNIQUENESS IN PHONEME
[Figure: time-domain waveforms of phoneme /ah/ and phoneme /i:/ (amplitude vs. samples)]

Pre-Processing and Feature Extraction

PREPROCESSING : STEPS
1) Silence Removal
[Figure: speech waveform with silence (Silence Signal) and after silence removal (Silence Removed)]
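The slides do not say how silence is detected. A common approach, assumed here, is to drop frames whose short-time energy falls below a fraction of the loudest frame's energy. The function name `remove_silence`, the 256-sample frame length and the 1% threshold are illustrative choices, not the project's parameters.

```python
import math


def remove_silence(signal, frame_len=256, rel_threshold=0.01):
    """Energy-based silence removal (assumed rule, not the project's own).

    Splits the signal into non-overlapping frames and keeps only frames whose
    short-time energy is at least rel_threshold times the loudest frame's.
    """
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    energies = [sum(x * x for x in f) for f in frames]
    peak = max(energies, default=0.0)
    if peak == 0.0:  # all-zero (or empty) input: nothing to keep
        return []
    kept = []
    for frame, energy in zip(frames, energies):
        if energy >= rel_threshold * peak:
            kept.extend(frame)
    return kept


# Toy input: 512 zeros, 1024 samples of a 440 Hz tone at 8 kHz, 512 zeros.
fs = 8000
tone = [0.5 * math.sin(2 * math.pi * 440 * n / fs) for n in range(1024)]
speech_like = [0.0] * 512 + tone + [0.0] * 512
trimmed = remove_silence(speech_like)
print(len(speech_like), len(trimmed))  # 2048 1024
```

A relative threshold keeps the rule independent of recording level. Real recordings have background noise, so a fixed fraction may need tuning or replacing with an adaptive noise-floor estimate.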

2) Pre-Emphasis: boosts the high-frequency components of the spectrum
[Figure: magnitude spectrum |Y(f)| vs. frequency (Hz), before pre-emphasis (suppressed high frequencies) and after (boosted high frequencies)]
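Pre-emphasis is usually a first-order high-pass filter, y[n] = x[n] − α·x[n−1]. The slides do not give α. The sketch assumes α = 0.97, a typical value, and the function name `pre_emphasis` is hypothetical.

```python
def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a typical value assumed here; the slides do not state it.
    """
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]


# A constant (0 Hz) input is almost cancelled, while a sequence alternating
# at the Nyquist frequency is amplified by roughly 1 + alpha.
print(pre_emphasis([1.0, 1.0, 1.0, 1.0]))
print(pre_emphasis([1.0, -1.0, 1.0, -1.0]))
```

The filter's gain rises from 1 − α at 0 Hz to 1 + α at the Nyquist frequency. This is the "boosted high frequencies" effect shown in the spectra.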


3) Framing: 23 ms frames with 50% overlap
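The framing step can be sketched as follows: 23 ms frames, each advanced by half a frame. The sampling rate is not stated in the slides, so the example assumes 16 kHz. The helper `frame_signal` is hypothetical.

```python
def frame_signal(signal, fs, frame_ms=23.0, overlap=0.5):
    """Split a signal into fixed-length frames with fractional overlap.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = max(1, int(frame_len * (1.0 - overlap)))
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]


# One second of audio at an assumed 16 kHz sampling rate:
# 368-sample frames (23 ms) advanced by 184 samples (50% overlap).
frames = frame_signal([0.0] * 16000, fs=16000)
print(len(frames), len(frames[0]))  # 85 368
```

Short frames like these are what make the short-time spectral analysis valid: over roughly 20 to 30 ms, the vocal tract can be treated as stationary.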

4) Windowing: each frame is multiplied by a Hamming window
[Figure: a speech frame, the Hamming window, and the resulting windowed signal]
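The Hamming window is w[i] = 0.54 − 0.46·cos(2πi / (N − 1)). It tapers each frame towards its edges to reduce spectral leakage in the FFT. The sketch below is a minimal version, and the function names are hypothetical.

```python
import math


def hamming(n):
    """Hamming window w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def apply_window(frame):
    """Multiply a frame sample-by-sample by a Hamming window of equal length."""
    return [x * w for x, w in zip(frame, hamming(len(frame)))]


windowed = apply_window([1.0] * 5)
print([round(v, 2) for v in windowed])  # [0.08, 0.54, 1.0, 0.54, 0.08]
```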

FEATURE EXTRACTION

MFCC: Mel Frequency Cepstral Coefficients

Perceptual approach

The human ear perceives frequency approximately on the Mel scale

Mel scale: approximately linear below 1 kHz and logarithmic above 1 kHz
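The slides do not give a Mel formula. A widely used convention, assumed here, is mel = 2595·log10(1 + f/700). It maps 1000 Hz to about 1000 mel and compresses higher frequencies.

```python
import math


def hz_to_mel(f_hz):
    """Mel value of a frequency in Hz (2595 * log10(1 + f/700) convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


for f in (250, 500, 1000, 2000, 4000, 8000):
    print(f, round(hz_to_mel(f)))
```

Doubling the frequency above 1 kHz adds progressively fewer mels. This is why Mel-spaced filters get wider at high frequencies.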

MFCC EXTRACTION: (CONTD..)

Steps:
FFT → Mel filter bank → Log → DCT → CMS (cepstral mean subtraction)

Mel Filter Bank

Number of Mel filters: 12

The magnitudes of the FFT coefficients are filtered by a triangular filter bank spaced on the Mel scale

MFCCs describe how the energy is distributed across the Mel-frequency bands of the filters
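The FFT → Mel filter → log → DCT → CMS chain can be sketched in plain Python. This sketch rests on several assumptions:

* A direct DFT stands in for the FFT, which keeps the sketch dependency-free; use an FFT library in practice.
* The filters span 0 Hz to fs/2.
* The DCT is the DCT-II.
* The 12 filters and 12 coefficients follow the numbers on the slides.
* All helper names are hypothetical.

```python
import math


def hz_to_mel(f):
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def magnitude_spectrum(frame):
    """|X[k]| for k = 0..N/2 via a direct DFT (stand-in for an FFT)."""
    n = len(frame)
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(frame))
        spec.append(math.hypot(re, im))
    return spec


def mel_filterbank(n_filters, n_bins, fs):
    """Triangular filters with edges equally spaced on the Mel scale."""
    top = hz_to_mel(fs / 2.0)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    edges = [f / (fs / 2.0) * (n_bins - 1) for f in edges_hz]  # fractional bins
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        row = []
        for k in range(n_bins):
            if lo < k <= mid:
                row.append((k - lo) / (mid - lo))   # rising edge
            elif mid < k < hi:
                row.append((hi - k) / (hi - mid))   # falling edge
            else:
                row.append(0.0)
        bank.append(row)
    return bank


def mfcc_frame(frame, fs, n_filters=12, n_ceps=12):
    """MFCCs of one (already windowed) frame: |DFT| -> Mel -> log -> DCT-II."""
    spec = magnitude_spectrum(frame)
    bank = mel_filterbank(n_filters, len(spec), fs)
    # Floor the filter energies so an empty filter cannot produce log(0).
    log_e = [math.log(max(sum(w * s for w, s in zip(row, spec)), 1e-10))
             for row in bank]
    return [sum(le * math.cos(math.pi * n * (m + 0.5) / n_filters)
                for m, le in enumerate(log_e))
            for n in range(n_ceps)]


def cepstral_mean_subtraction(frames):
    """CMS: subtract each coefficient's mean over all frames of an utterance."""
    means = [sum(f[i] for f in frames) / len(frames) for i in range(len(frames[0]))]
    return [[c - mu for c, mu in zip(f, means)] for f in frames]


frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(256)]
print(mfcc_frame(frame, fs=8000))
```

CMS removes any constant offset in the log-spectral domain. Such an offset is what a fixed channel (microphone, telephone line) adds, so subtracting it makes the features less channel-dependent.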

EXTRA FEATURES: ENERGY AND DELTAS


To achieve a high recognition rate, add:
An energy feature
Delta and delta-delta features

Delta: velocity feature

Captures co-articulation (how the spectrum changes between neighbouring sounds)

Delta-delta (double delta): acceleration feature
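The slides do not give the delta formula. A common choice, assumed here, is the regression formula d_t = Σₙ n·(c_{t+n} − c_{t−n}) / (2·Σₙ n²) with N = 2 and the edge frames repeated. Applying it twice gives the delta-delta. The hypothetical `stack_features` builds the 39-value vector from 13 static values per frame (12 MFCC + 1 energy).

```python
def deltas(features, N=2):
    """Regression deltas over +/-N frames; edge frames are repeated."""
    T = len(features)
    denom = 2 * sum(n * n for n in range(1, N + 1))
    out = []
    for t in range(T):
        row = []
        for i in range(len(features[t])):
            acc = 0.0
            for n in range(1, N + 1):
                nxt = features[min(t + n, T - 1)][i]
                prv = features[max(t - n, 0)][i]
                acc += n * (nxt - prv)
            row.append(acc / denom)
        out.append(row)
    return out


def stack_features(static):
    """[static | delta | delta-delta] per frame: 13 + 13 + 13 = 39 values."""
    d = deltas(static)
    dd = deltas(d)
    return [s + a + b for s, a, b in zip(static, d, dd)]


# A feature that grows by 1 per frame has a delta of 1 away from the edges.
ramp = [[float(t)] * 13 for t in range(10)]
feats = stack_features(ramp)
print(len(feats), len(feats[0]))  # 10 39
```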

COMPOSITION OF FEATURE VECTOR


Static: 12 MFCC features + 1 energy feature
Delta: 12 MFCC + 1 energy
Delta-delta: 12 MFCC + 1 energy

39 features from each frame (3 × 13)

Speech Recognition/Verification by HMM/VQ

HIDDEN MARKOV MODEL (HMM)

An HMM is an extension of the Markov process.
A Markov process consists of observable states.
An HMM has hidden states, each of which emits observable symbols.
An HMM is a stochastic model.

HMM (CONTD)

Parameters
1) Initial state distribution (π)
2) State transition probability distribution (A)
3) Observation symbol probability distribution (B)

The HMM model:

$\lambda = (A, B, \pi)$
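Given λ = (A, B, π) and a sequence of discrete observation symbols, the probability P(O | λ) is usually computed with the forward algorithm. The sketch below is a minimal, unscaled version; the function name is hypothetical. Long utterances would need per-step scaling or log probabilities to avoid underflow.

```python
def forward_probability(A, B, pi, observations):
    """P(O | lambda) for a discrete HMM via the forward algorithm.

    A[i][j]: transition prob i -> j, B[i][k]: prob of symbol k in state i,
    pi[i]: initial state prob, observations: list of symbol indices.
    """
    n_states = len(pi)
    # Initialisation: alpha_1(i) = pi_i * b_i(o_1)
    alpha = [pi[i] * B[i][observations[0]] for i in range(n_states)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) * a_ij] * b_j(o_{t+1})
    for o in observations[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n_states)) * B[j][o]
                 for j in range(n_states)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)


# Two-state left-to-right toy model with two output symbols.
A = [[0.7, 0.3], [0.0, 1.0]]
B = [[0.9, 0.1], [0.2, 0.8]]
pi = [1.0, 0.0]
print(forward_probability(A, B, pi, [0, 1]))  # about 0.279
```

The toy result sums the only two possible state paths: 0.9·0.7·0.1 = 0.063 and 0.9·0.3·0.8 = 0.216.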

EXAMPLE: PRONUNCIATION MODEL OF THE WORD TOMATO

[Figure: HMM pronunciation model of the word "tomato", $\lambda = (A, B, \pi)$]

HMM IMPLEMENTATION

Feature vectors → observation symbols: 256 (VQ codebook size)
Phonemes → hidden states: 6
Left-to-right HMM
Discrete Hidden Markov Model (DHMM) with the Vector Quantization (VQ) technique
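With 6 hidden states and 256 observation symbols, a left-to-right DHMM can be initialised as follows. Each state either stays put or moves one step right, and the last state is absorbing. The self-loop probability of 0.5 and the uniform B are assumptions used only as a starting point before re-estimation (for example Baum-Welch, which the slides do not name). The function name is hypothetical.

```python
def init_left_to_right_dhmm(n_states=6, n_symbols=256, stay=0.5):
    """Initial (A, B, pi) for a left-to-right discrete HMM.

    A: each state stays with probability `stay` or moves to the next state;
       the last state is absorbing, and no transition goes backwards.
    B: uniform over the VQ codebook symbols.
    pi: every sequence starts in state 0.
    """
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = stay
        A[i][i + 1] = 1.0 - stay
    A[-1][-1] = 1.0
    B = [[1.0 / n_symbols] * n_symbols for _ in range(n_states)]
    pi = [1.0] + [0.0] * (n_states - 1)
    return A, B, pi


A, B, pi = init_left_to_right_dhmm()
for row in A:
    print(row)
```

The zero entries below the diagonal encode the left-to-right constraint. Re-estimation keeps them at zero, because a transition with zero probability is never used.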

SPEECH RECOGNITION SYSTEM

VECTOR QUANTIZATION
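Vector quantization maps each feature vector to the index of its nearest codeword; in this system the codebook has 256 codewords. The slides do not name the codebook training algorithm, so the sketch uses plain k-means (Lloyd iterations). LBG is another common choice. The helper names, the random initialisation and the toy 2-codeword example are assumptions.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(v, codebook):
    """Index of the nearest codeword: the discrete observation symbol."""
    return min(range(len(codebook)), key=lambda j: sq_dist(v, codebook[j]))


def train_codebook(vectors, k, iters=20, seed=0):
    """k-means codebook: assign vectors to nearest codeword, then re-average."""
    rng = random.Random(seed)
    codebook = [list(v) for v in rng.sample(vectors, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for v in vectors:
            clusters[quantize(v, codebook)].append(v)
        for j, members in enumerate(clusters):
            if members:  # keep the old codeword if its cluster is empty
                codebook[j] = [sum(m[d] for m in members) / len(members)
                               for d in range(len(members[0]))]
    return codebook


points = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0],
          [10.0, 10.0], [10.0, 11.0], [11.0, 10.0]]
cb = train_codebook(points, k=2)
print(cb, [quantize(p, cb) for p in points])
```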

Speaker Recognition/Verification by GMM

SPEAKER VERIFICATION SYSTEM

SPEAKER MODELING (GMM)


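Gaussian Mixture Model

Parametric probability density function
Based on a soft clustering technique
Mixture of Gaussian components
$\lambda = (w_i, \mu_i, \Sigma_i)$, $i = 1, \dots, M$ (component weights, mean vectors, covariance matrices)
$p(\mathbf{x} \mid \lambda) = \sum_{i=1}^{M} w_i \, g(\mathbf{x} \mid \mu_i, \Sigma_i)$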
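The mixture density above gives the log-likelihood of an utterance X = {x₁, …, x_T} as Σₜ log Σᵢ wᵢ·g(xₜ | μᵢ, Σᵢ). The sketch assumes diagonal covariances, a common simplification that the slides do not confirm. It uses log-sum-exp so that small densities do not underflow. Function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """log g(x | mean, diag(var)) for a diagonal-covariance Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def log_sum_exp(values):
    """Numerically stable log(sum(exp(v)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def gmm_log_likelihood(X, weights, means, variances):
    """sum_t log sum_i w_i * g(x_t | mu_i, Sigma_i)."""
    return sum(log_sum_exp([math.log(w) + log_gaussian_diag(x, m, v)
                            for w, m, v in zip(weights, means, variances)])
               for x in X)


# Two-component, one-dimensional toy model.
print(gmm_log_likelihood([[0.0], [2.5]], [0.5, 0.5], [[0.0], [3.0]], [[1.0], [1.0]]))
```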

SPEAKER MODEL TRAINING


The model parameters are estimated with the Expectation-Maximization (EM) algorithm
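The EM algorithm alternates two steps. The E-step computes each component's responsibility for each frame. The M-step re-estimates the weights, means and variances from those responsibilities. The sketch below shows one EM iteration, assuming diagonal covariances and a small variance floor to keep components from collapsing. Neither detail comes from the slides, and the function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def em_step(X, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a diagonal-covariance GMM."""
    M, D, T = len(weights), len(X[0]), len(X)
    # E-step: gamma[t][i] = P(component i | x_t), computed in the log domain.
    gamma = []
    for x in X:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        probs = [math.exp(l - top) for l in logs]
        total = sum(probs)
        gamma.append([p / total for p in probs])
    # M-step: responsibility-weighted re-estimation of w_i, mu_i, Sigma_i.
    new_w, new_mu, new_var = [], [], []
    for i in range(M):
        n_i = max(sum(g[i] for g in gamma), 1e-12)
        mu = [sum(g[i] * x[d] for g, x in zip(gamma, X)) / n_i for d in range(D)]
        var = [max(sum(g[i] * (x[d] - mu[d]) ** 2 for g, x in zip(gamma, X)) / n_i,
                   var_floor) for d in range(D)]
        new_w.append(n_i / T)
        new_mu.append(mu)
        new_var.append(var)
    return new_w, new_mu, new_var


X = [[-2.1], [-2.0], [-1.9], [1.9], [2.0], [2.1]]
w, mu, var = [0.5, 0.5], [[-1.0], [1.0]], [[1.0], [1.0]]
for _ in range(10):
    w, mu, var = em_step(X, w, mu, var)
print(w, mu, var)
```

Each EM iteration does not decrease the training-data likelihood. In practice, iterations stop when the gain becomes small.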

SPEAKER VERIFICATION
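Based on the likelihood ratio $\Lambda(X) = \dfrac{p(X \mid \lambda_{C})}{p(X \mid \lambda_{\bar{C}})}$, where $\lambda_{C}$ is the claimed speaker's model and $\lambda_{\bar{C}}$ models impostors; the claim is accepted if $\Lambda(X) \ge \theta$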
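In the log domain the likelihood-ratio test becomes a difference of GMM log-likelihoods. This difference is often averaged per frame so that the threshold does not depend on utterance length. The slides do not describe the impostor model or the threshold. The sketch assumes a single impostor GMM and θ = 0 in the log domain, and `verify` and the toy models are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(X, weights, means, variances):
    total = 0.0
    for x in X:
        logs = [math.log(w) + log_gaussian_diag(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)
        total += top + math.log(sum(math.exp(l - top) for l in logs))
    return total


def verify(X, claimed_model, impostor_model, threshold=0.0):
    """Accept if the per-frame log-likelihood ratio reaches the threshold.

    Each model is a (weights, means, variances) tuple of a diagonal GMM.
    """
    llr = (gmm_log_likelihood(X, *claimed_model)
           - gmm_log_likelihood(X, *impostor_model)) / len(X)
    return llr >= threshold, llr


claimed = ([1.0], [[0.0]], [[1.0]])   # toy "claimed speaker" model
impostor = ([1.0], [[5.0]], [[1.0]])  # toy impostor model
print(verify([[0.1], [-0.2], [0.3]], claimed, impostor))
print(verify([[4.9], [5.2]], claimed, impostor))
```

Raising θ trades more false rejections for fewer false acceptances. The operating point is normally chosen from held-out genuine and impostor trials.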

TOOLS USED
Languages: Adobe Flex, Java, BlazeDS (for RPC)

Servers: Apache Tomcat, MySQL

Versioning: TortoiseSVN

OUTPUT : SNAPSHOT (GUI)

APPLICATION AREAS
Telephone transactions: telephone credit card purchases, telephone stock trading

Access control: physical facilities, computer networks, information retrieval, customer information

Forensics: voice sample matching

LIMITATION AND FUTURE ENHANCEMENT

Noise reduction
Training on more data
Combining with other features and other classification methods

Thanks

Any queries?