Professional Documents
Culture Documents
January 6, 2016
Speaker Recognition
Identification of the person who is speaking by characteristics of their
voice biometrics.
Primary Goal:
Accurate and Efficient Short utterance speaker recognition.
Additional Goal:
Scalability over large number of speakers.
Low Latency Real-Time recognition
A working Real-Time recognition system.
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 2/1
Content
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 3/1
VAD
VAD
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 4/1
VAD
VAD
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 4/1
VAD
VAD
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 4/1
VAD
LTSD
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 5/1
Features MFCC
MFCC
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 6/1
Features MFCC
Windowing
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 7/1
Features MFCC
Mel-Scale
f
Mel(f) = 2595 log10 (1 + )
700
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 8/1
Features MFCC
MFCC
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 9/1
Features LPC
LPC
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 10 / 1
Features LPC
LPC
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 10 / 1
Features Experiments on Features
MFCC Params
1.00 1.00
0.98 0.98
0.96 0.96
Accuracy
Accuracy
0.94
0.94
0.92
0.92
0.90
12 14 16 18 20 22 24 26 0.9020 25 30 35 40 45 50 55
Number of Cepstrals Number of Filters
0.94
Frame length: 32ms
0.92
0.90
20 25 30 35 40
Frame Length
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 11 / 1
Features Experiments on Features
LPC Params
1.00 1.00
0.98 0.98
0.96 0.96
Accuracy
Accuracy
0.94 0.94
0.92 0.92
0.9020 25 30 35 40 0.9010 12 14 16 18 20 22 24 26
Frame Length Number of Coefficients
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 12 / 1
Model GMM-UBM
GMM
Gaussian Mixture Model is commonly used to model human’s
acoustic feature.
∑
K
p(θ) = wi N (µi , Σi )
i=1
. . . . . . . . . . . . . . . . . . . .
We use K = 32
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 13 / 1
Model GMM-UBM
GMM
Gaussian Mixture Model is commonly used to model human’s
acoustic feature.
∑
K
p(θ) = wi N (µi , Σi )
i=1
1.00
0.98
0.96
Accuracy
0.94
We use K = 32
0.92
Optimized GMM
Basic GMM training: random initialize, estimate parameters with EM.
Improvment: initialize with a parallel KMeansII.
Improvment: parallel training implementation in C++.
Compared to GMM from scikit-learn:
Optimized GMM
Basic GMM training: random initialize, estimate parameters with EM.
Improvment: initialize with a parallel KMeansII.
Improvment: parallel training implementation in C++.
Compared to GMM from scikit-learn:
1.00
45 Our GMM with concurrency of 1
Our GMM with concurrency of 2
0.98 40 Our GMM with concurrency of 4
Our GMM with concurrency of 8
0.92 20
15
0.90
10
0.88 5
GMM from scikit-learn 0
0.86 0 2000 4000 6000 8000 10000 12000 14000 16000
Our GMM Number of MFCC Features
0 10 20 30 40 50 60 70 80
UBM
GMM Results
1.00 1.00
0.98 0.98
0.96 0.96
Accuracy
Accuracy
0.94 0.94
0.92 2s 2s
3s 0.92 3s
4s 4s
0.90 5s 5s
0.90
0 10 20 30 40 50 60 70 80 0 10 20 30 40 50 60 70 80
Number of Speakers (Reading) Number of Speakers (Whisper)
CRBM
Restricted Boltzmann Machine is a generative stochastic two-layer
neural network.
Continuous RBM extends RBM to real-valued inputs.
RBM has the ability to reconstruct a layer similar to input layer. The
difference between the two layers can be a used to measure the fitness
of an input to the model.
Therefore, RBM can be a substituion to GMM.
Chen, Hsin and Murray, Alan F, 2003, Continuous restricted Boltzmann machine
with an implementable training algorithm .
. .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
.
.
. .
. .
.
.
.
CRBM
Restricted Boltzmann Machine is a generative stochastic two-layer
neural network.
Continuous RBM extends RBM to real-valued inputs.
RBM has the ability to reconstruct a layer similar to input layer. The
difference between the two layers can be a used to measure the fitness
of an input to the model.
Therefore, RBM can be a substituion to GMM.
Chen, Hsin and Murray, Alan F, 2003, Continuous restricted Boltzmann machine
with an implementable training algorithm .
. .
. . . . . . . . . . . . . .
. . . . . . . . . . . . . .
.
.
. .
. .
.
.
.
RBM Results
Results of CRBM, tested with 5 secs of utterance.
0.95
Accuracy
0.90
0.85
30 sec training
60 sec training
0.80 120 sec training
0 10 20 30 40 50
Number of Speakers
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 18 / 1
Model CRBM
GUI Demo
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 19 / 1
Model CRBM
Conclusion
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 20 / 1
Model CRBM
Thanks!
. . . . . . . . . . . . . . . . . . . .
. . . . . . . . . . . . . . . . . . . .
Xinyu Zhou, Yuxin Wu, Tiezheng Li Speaker Recognition January 6, 2016 21 / 1