[Figure 1: Data flow of the combined VQ/DHMM algorithm. Acoustic vectors enter the "Classify Vectors" box, which produces a sequence of VQ distortions (mean distortion D) and a sequence of codebook indices (DHMM likelihood Prob).]
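The "Classify Vectors" box can be sketched as a nearest-neighbour search that emits both output sequences at once. The sketch below is illustrative only: the function name and the squared-Euclidean distance measure are assumptions, not details taken from the paper.

```python
def classify_vectors(vectors, codebook):
    """Nearest-neighbour search against a VQ codebook.

    Returns the codeword-index sequence (the input to the DHMM) and the
    mean distortion D used by the conventional VQ-based method.
    """
    indices, distortions = [], []
    for v in vectors:
        # Squared Euclidean distance to every codeword (illustrative choice).
        dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in codebook]
        best = min(range(len(codebook)), key=lambda i: dists[i])
        indices.append(best)
        distortions.append(dists[best])
    mean_distortion = sum(distortions) / len(distortions)
    return indices, mean_distortion
```

The same pass over the input thus feeds both scoring methods, which is why the combination adds little computation.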
The probability of the codeword sequence of the test utterance against speaker-specific DHMMs would give a score for speaker recognition. By combining this score with the VQ distortion, one could expect to achieve better recognition results than with the conventional VQ method.

Since we have to train a speaker-specific DHMM for each word, the speaker data used to train those DHMMs might be inadequate. One way to overcome this problem is to use a speaker-adaptive system [6]. First, for each word in the vocabulary a speaker-independent word model is trained on a large number of occurrences of that word uttered by all registered speakers in the system. The parameters of this model are estimated by classic training (the Baum-Welch reestimation algorithm). Next, a speaker-specific DHMM for each word is reestimated from the speaker data, using the parameters of the speaker-independent word model as initial parameters.

This method extends beyond the conventional VQ-based method but requires little extra computation, because the VQ procedure provides input data to both scoring methods, VQ-distortion measurement and DHMM probability, simultaneously. Figure 1 shows the data flow in the combined algorithm. The box labelled "Classify Vectors" is the nearest-neighbour search procedure using a VQ codebook. The sequence of acoustic vectors extracted from the speech is split into a sequence of VQ distortions and a sequence of codebook indices. The mean distortion D is used in the conventional VQ-based method, while the likelihood score Prob of the index sequence is computed from the DHMM.

Another way to look at this combined method is to view it as a simplified version of the continuous-HMM method, in which both the feature-vector distributions and the temporal information are modelled.

[Figure 2: Plot of PrVQ against the mean distortion D, falling from 1 at D = 0 towards 0.]

COMBINING THE SCORES

As shown in Figure 1, we now have two different scores, the mean VQ distortion D and the DHMM likelihood Prob, with which to recognise the unknown speaker from a test utterance. Several possibilities are available for combining the two scores [7].

Assuming that we have two statistically independent scores for an input utterance against a reference speaker, the joint probability is chosen here because of its mathematical tractability. The combined probability is defined as follows:

    Prcombined = PrVQ · PrDHMM    (1)

where PrVQ is the probability of the speaker being the client according to the VQ distortion, and PrDHMM is the probability of the utterance being generated by the client DHMM. We convert the mean distortion D into a probability score using the following formula:

    PrVQ = exp(−wD)    (2)

Figure 2 shows a plot of the function PrVQ. At D = 0, the probability of the speaker being the client is assumed to be 1. As the mean distortion D increases, that probability decreases and approaches 0. The probability weight w is used to control the decline of the curve PrVQ. One property of the function PrVQ is that it smooths out the probability at large values of D. This is similar to taking the distortion-intersection measure (DIM) introduced by Matsui and Furui [1].

To avoid underflow in computation, we take the log of the probabilities in formula (1). Finally, the log-likelihood of the combined method is computed as:

    log(Prcombined) = −wD + log(PrDHMM)    (3)
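The likelihood score Prob of a codeword-index sequence against a speaker's DHMM can be computed with the standard forward algorithm for a discrete HMM. The sketch below is illustrative: the function name and the log-domain parameterisation are assumptions, not details from the paper.

```python
import math

def dhmm_log_likelihood(indices, log_pi, log_a, log_b):
    """Forward algorithm for a discrete HMM, in the log domain.

    indices : codeword-index sequence from the VQ front end
    log_pi  : log initial-state probabilities, length N
    log_a   : log transition probabilities, N x N
    log_b   : log emission probabilities, N x K for a K-word codebook
    Returns log Pr(indices | model), i.e. the log of the score Prob.
    """
    def log_sum_exp(xs):
        m = max(xs)
        if m == float("-inf"):
            return m
        return m + math.log(sum(math.exp(x - m) for x in xs))

    n = len(log_pi)
    # Initialisation with the first observation.
    alpha = [log_pi[i] + log_b[i][indices[0]] for i in range(n)]
    # Induction over the remaining observations.
    for o in indices[1:]:
        alpha = [log_sum_exp([alpha[i] + log_a[i][j] for i in range(n)])
                 + log_b[j][o] for j in range(n)]
    return log_sum_exp(alpha)
```

Working in the log domain avoids the numerical underflow that motivates formula (3).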
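Formula (3) reduces identification to comparing −wD + log(PrDHMM) across reference speakers. A minimal sketch follows; the function names, the default weight w = 1.0, and the score-dictionary layout are illustrative assumptions, not from the paper.

```python
def combined_log_likelihood(mean_distortion, log_prob_dhmm, w=1.0):
    """Log of formula (1): log(PrVQ) + log(PrDHMM).

    Since PrVQ = exp(-w * D), log(PrVQ) = -w * D, giving formula (3).
    The weight w controls how fast PrVQ decays with the distortion D.
    """
    return -w * mean_distortion + log_prob_dhmm

def identify_speaker(scores):
    """Pick the reference speaker with the highest combined score.

    `scores` maps speaker id -> (mean VQ distortion D, DHMM log-likelihood).
    """
    return max(scores, key=lambda s: combined_log_likelihood(*scores[s]))
```

Note that at D = 0 the VQ term contributes log(1) = 0, consistent with the assumption that PrVQ equals 1 there.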
[Figure: Speaker identification rate (%) against codebook size (32, 64, 128) for the DHMM, VQ, and combined methods.]
CONCLUSION
This paper has proposed a new method for speaker recognition which extends the mean-VQ-distortion method by combining it with a DHMM. The method was designed for systems with small amounts of training data and is ideally suited to text-dependent or text-prompted speaker recognition systems. Our speaker identification experiments showed that the method surpasses both the conventional VQ-based method and the DHMM-based method. The combined score chosen here is quite simple, and our research will go on to investigate other possible combinations of scores.
REFERENCES
[1] T. Matsui and S. Furui, "Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMM's", Proc. ICASSP 1992, Vol. II, pp. 157–160.