Nikolay Lubimov, Evgeny Mikheev, Alexey Lukin
Lomonosov Moscow State University, Moscow, Russia
ABSTRACT
In this paper, we consider the text-independent speaker identification problem. A Gaussian mixture is used to model each speaker's acoustic feature distribution. We use the well-known Mel-frequency cepstral coefficients (MFCC) to represent speaker variability. The parameters of the speaker models are re-estimated by the Expectation-Maximization (EM) procedure. The monotonic property of this method leads to the problem of initial parameter approximation, which affects the final convergence result. We have tested several major clustering algorithms as the initialization step of the EM algorithm and have analyzed their influence on speaker identification performance. Both fuzzy and hard clustering techniques have been used to construct speaker models. The comparison between these models has been performed on a telephone-quality speech database.
Index Terms— speaker identification, K-means, K-means++, Linde-Buzo-Gray, Fuzzy C-means, Gustafson-Kessel, Gaussian mixture
1. INTRODUCTION
In this paper we consider the text-independent speaker identification task, which belongs to acoustic recognition research. Many different techniques have been presented over the past several decades. A state-of-the-art technique uses Gaussian Mixture Models (GMM) [1] for modeling the speaker data distribution represented by MFCC [1] or LPCC [2] features. Classification is performed by choosing the speaker class with maximum likelihood on the observed data. A more complex approach exploits the discriminative capability of methods such as the Support Vector Machine (SVM) in order to separate different acoustic classes [3]. A hybrid system for speaker identification presented in [4] successfully combines the advantages of the GMM's generative capability and the SVM's discriminative power by introducing a Fisher kernel.
In this work we examine the simplest scheme for constructing a speaker identification system. Our system has three major stages: 1) preprocessing, 2) initial clustering in the feature space, 3) Gaussian mixture model parameter re-estimation. As mentioned above, many successful techniques have been proposed for the preprocessing step [1][2][3]. The Expectation-Maximization (EM) algorithm used for Gaussian mixture parameter re-estimation is also well documented [5]. On the other hand, it is not obvious how to initialize the recurrent formula of the EM algorithm for this task. In other words, an interesting question is: which type of initial clustering in the feature space should be used to obtain better results? It is known that the convergence properties of the EM algorithm strongly depend on the initial approximation [5]. In this paper we describe some existing methods for making an initial approximation in the EM procedure, and show how these methods affect the final speaker recognition rate. Using different algorithms for feature-space clustering, we construct several classifiers for the speaker identification task. We compare them using the identification error rate on a speaker database with telephone-quality signals. Our main goals are to compare the performance of fuzzy and hard clustering methods, and to examine the influence of deterministic and random initializations of the EM algorithm. Section 2 briefly describes clustering approaches that can be used to separate the acoustic feature space into non-intersecting classes. Section 3 presents an overview of the GMM and the basic re-estimation formulas. The overall baseline speaker identification system is described in Section 4. Then a speaker identification test is performed with speaker models constructed using different initial clustering procedures. The results and discussion are presented in Section 5.
2. CLUSTERING APPROACHES
2.1. K-means
K-means is one of the most popular unsupervised clustering algorithms [6]. Its main advantages are simplicity of implementation and low computational complexity. Given a discrete data set, K-means minimizes the distance between K centers and the data points in the corresponding space. The algorithm is widely known, so we describe only its initialization. In our work we have used the following K-means initialization procedure:
1. Find the minimum value m and the maximum value M among all points in one dimension;
2. Uniformly choose k points inside the segment [m, M];
3. Repeat steps 1 and 2 for all dimensions of the input data vector;
4. Select k cluster centers from these uniformly distributed points in each dimension.
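As an illustration, the initialization steps above can be sketched in NumPy. We read "uniformly choose k points" as evenly spaced values on each segment (the deterministic variant referred to later in Section 5); the function name is ours:

```python
import numpy as np

def uniform_grid_init(data, k):
    """Deterministic K-means initialization (steps 1-4 above):
    in each dimension, place k evenly spaced values on [m, M],
    and combine them coordinate-wise into k initial centers."""
    lo = data.min(axis=0)              # step 1: per-dimension minimum m
    hi = data.max(axis=0)              # step 1: per-dimension maximum M
    # steps 2-4: k evenly spaced points between m and M in every dimension
    return np.linspace(lo, hi, num=k)  # shape (k, d)
```

The resulting array is then passed as the initial centers to a standard K-means loop.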
2.2. K-means++
K-means++ is a modification of K-means differing only in the initialization step, which can be formalized in the following way:
1. Choose an initial center $c_1$ uniformly at random from the dataset $X$;
2. Choose the next center $c_i$, selecting $c_i = x'$, with probability $\frac{D(x')^2}{\sum_{x \in X} D(x)^2}$, where $D(x)$ denotes the shortest Euclidean distance from a data point $x$ to the closest center already chosen;
3. Repeat step 2 until $k$ centers have been chosen.
Then the standard K-means strategy is used to re-estimate the centers of the obtained clusters. This method was originally presented in [7].
Using this initialization, the algorithm is guaranteed to find a solution that is $O(\log k)$-competitive with the optimal k-means solution. If the center set $C$ is constructed with K-means++, then the corresponding potential function
$$V = \sum_{i=1}^{K} \sum_{x_j \in C_i} \| x_j - c_i \|^2$$
satisfies $E[V] \le 8(\ln k + 2)\, V_{opt}$, where $C_i$ is the subset of points belonging to the $i$-th cluster.
This strategy determines the K-means initialization, but the problem of selecting the number of clusters still remains. In the deterministic K-means++ algorithm we choose the first cluster center as the middle point of the data set.
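A minimal NumPy sketch of this seeding procedure follows. Reading "middle point of the data set" as the sample mean is our assumption, and the function name is ours:

```python
import numpy as np

def kmeanspp_init(X, k, first="middle", rng=None):
    """K-means++ seeding: the first center is either the data set's
    middle point (deterministic variant from the text, taken here as
    the sample mean) or a random sample; each next center x' is drawn
    with probability D(x')^2 / sum_x D(x)^2."""
    rng = np.random.default_rng(rng)
    if first == "middle":
        c0 = X.mean(axis=0)            # deterministic first center
    else:
        c0 = X[rng.integers(len(X))]   # standard random first center
    centers = [c0]
    for _ in range(1, k):
        # D(x): squared distance from each point to its closest chosen center
        C = np.array(centers)
        d2 = np.min(((X[:, None, :] - C[None, :, :]) ** 2).sum(-1), axis=1)
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)
```

The returned centers would then seed an ordinary K-means refinement loop.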
2.3. Linde-Buzo-Gray
The Linde-Buzo-Gray (LBG) algorithm was originally proposed in [8]. It is very similar to K-means clustering, except that it successfully avoids the non-deterministic initialization of the vector quantization procedure. The main idea of this algorithm is to set the initial cluster centers according to principal components in the feature space. First, we find the sample mean of the whole dataset. Then this mean is split into two points along the first principal component. The standard K-means procedure is applied with K = 2 to recalculate the cluster centers until the potential function converges. Then the cluster with the biggest radius is chosen, and its center is similarly split into two points along the first principal component of its own subset. Now the K-means algorithm is applied with K = 3. By analogy, this procedure continues until the desired number of clusters is reached.
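The splitting scheme above can be sketched as follows; this is a simplified NumPy illustration (split offset `eps`, iteration caps, and empty-cluster handling are our choices, not from [8]):

```python
import numpy as np

def lbg_init(X, k_target, eps=1e-3):
    """Linde-Buzo-Gray sketch: start from the sample mean, repeatedly
    split the biggest-radius cluster's center along the first principal
    component of its own subset, then refine with K-means."""
    def kmeans_refine(C):
        for _ in range(50):
            labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(-1), axis=1)
            newC = np.array([X[labels == j].mean(0) if np.any(labels == j)
                             else C[j] for j in range(len(C))])
            if np.allclose(newC, C):
                break
            C = newC
        return C, labels

    C = X.mean(0, keepdims=True)
    labels = np.zeros(len(X), dtype=int)
    while len(C) < k_target:
        # choose the cluster with the biggest radius
        radii = [(((X[labels == j] - C[j]) ** 2).sum(-1).max()
                  if np.any(labels == j) else 0.0) for j in range(len(C))]
        j = int(np.argmax(radii))
        sub = X[labels == j]
        # first principal component of the chosen cluster's own subset
        _, _, Vt = np.linalg.svd(sub - sub.mean(0), full_matrices=False)
        C = np.vstack([np.delete(C, j, axis=0),
                       C[j] + eps * Vt[0], C[j] - eps * Vt[0]])
        C, labels = kmeans_refine(C)
    return C
```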
2.4. Fuzzy C-means
Fuzzy C-means (FCM) is one of the most popular fuzzy clustering algorithms. It divides the data region into K spherical clusters. Before formalizing this algorithm, some definitions must be introduced:
1. Decomposition matrix $U \in R^{N \times C}$, where $N$ is the number of data points and $C$ is the number of clusters;
2. Fuzzy coefficient $w$ (usually $w = 2$ is used).
Now we briefly describe the FCM algorithm:
1. Initialize $U$ randomly;
2. Calculate the cluster centers using the formula:
$$c_k = \frac{\sum_{n=1}^{N} u_{kn}^w\, x_n}{\sum_{n=1}^{N} u_{kn}^w} \quad (1)$$
3. Update the decomposition matrix using the formula:
$$u_{kn} = \left[ \sum_{j=1}^{C} \frac{d(x_n, c_k)^{\frac{1}{w-1}}}{d(x_n, c_j)^{\frac{1}{w-1}}} \right]^{-1} \quad (2)$$
where $d(x_n, c_k)$ is the Euclidean distance between the corresponding data point and cluster center. Note that if $d(x_n, c_k)^{\frac{1}{w-1}} = 0$, then $u_{kn}$ becomes equal to 1, because $c_k$ and $x_n$ are the same point, so $x_n$ belongs to cluster $k$ with probability 1. Otherwise, if $d(x_n, c_j)^{\frac{1}{w-1}} = 0$ for some $j \neq k$, then $u_{kn}$ becomes equal to 0, because $c_j$ and $x_n$ are the same point and $x_n$ does not belong to cluster $k$.
4. Repeat steps 2 and 3 until $\| U^{l} - U^{l-1} \| < \varepsilon$, where $\varepsilon$ is a sufficiently small real value and $l$ denotes the iteration number.
The detailed description and discussion of this method can be found in [9].
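Steps 1-4 can be sketched compactly in NumPy. The zero-distance guard below replaces the exact special-case handling of step 3 with a small floor, which is our simplification:

```python
import numpy as np

def fcm(X, k, w=2.0, tol=1e-4, max_iter=100, rng=0):
    """Fuzzy C-means sketch: random membership matrix U, center
    update (1), membership update (2), stop when ||U_l - U_{l-1}|| < tol."""
    rng = np.random.default_rng(rng)
    U = rng.random((k, len(X)))
    U /= U.sum(axis=0)                     # memberships sum to 1 per point
    for _ in range(max_iter):
        Uw = U ** w
        C = (Uw @ X) / Uw.sum(axis=1, keepdims=True)         # formula (1)
        d = np.sqrt(((X[None] - C[:, None]) ** 2).sum(-1))   # d(x_n, c_k)
        d = np.maximum(d, 1e-12)           # guard the zero-distance case
        p = d ** (1.0 / (w - 1.0))
        U_new = 1.0 / (p[:, None, :] / p[None, :, :]).sum(axis=1)  # (2)
        done = np.linalg.norm(U_new - U) < tol
        U = U_new
        if done:
            break
    return C, U
```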
2.5. Gustafson-Kessel algorithm
The Gustafson-Kessel algorithm [10] is another fuzzy clustering algorithm. It improves C-means by dividing the data region into K ellipsoids rather than spherical clusters. The Gustafson-Kessel algorithm includes the following steps:
1. Initialize the decomposition matrix $U$ randomly;
2. Calculate the cluster centers using formula (1);
3. Calculate the covariance matrix $F_k$ for each cluster $k = 1, 2, \ldots, K$ using the formula:
$$F_k = \frac{\sum_{n=1}^{N} u_{kn}^w (x_n - c_k)(x_n - c_k)^T}{\sum_{n=1}^{N} u_{kn}^w} \quad (3)$$
4. Update the decomposition matrix using (2), considering the following normalized distance between data points and centers:
$$d(x_n, c_k) = (x_n - c_k)^T \left( \det(F_k)^{1/d}\, F_k^{-1} \right) (x_n - c_k)$$
where $d$ in the exponent is the dimension of the feature space;
5. Repeat steps 2-4 until $\| U^{l} - U^{l-1} \| < \varepsilon$.
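The cluster-specific part of the algorithm, steps 3-4, can be sketched as follows; this assumes each $F_k$ is non-singular, and the function name is ours:

```python
import numpy as np

def gk_distances(X, C, U, w=2.0):
    """Gustafson-Kessel steps 3-4: per-cluster fuzzy covariance F_k
    (formula (3)) and the volume-normalized distance
    det(F_k)^(1/d) (x - c_k)^T F_k^{-1} (x - c_k)."""
    k, N = U.shape
    d = X.shape[1]
    D = np.empty((k, N))
    for j in range(k):
        diff = X - C[j]                                  # (N, d)
        wgt = U[j] ** w
        F = (wgt[:, None] * diff).T @ diff / wgt.sum()   # formula (3)
        A = np.linalg.det(F) ** (1.0 / d) * np.linalg.inv(F)
        D[j] = np.einsum('nd,de,ne->n', diff, A, diff)   # quadratic form
    return D
```

The resulting distances would replace the Euclidean ones inside the membership update (2).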
3. GAUSSIAN MIXTURE MODEL
A Gaussian mixture model is a parametric probability density function given by
$$f(x \mid \lambda) = \sum_{k=1}^{K} \pi_k\, G(x \mid \mu_k, \Sigma_k), \quad (4)$$
where $G(x \mid \mu_k, \Sigma_k)$ is a multivariate Gaussian distribution. For a given data set $X = x_1, x_2, \ldots, x_N$, the parameter re-estimation formulas are as follows:
$$\pi_k = \frac{1}{N} \sum_{n=1}^{N} p(k \mid x_n), \quad (5)$$
$$\mu_k = \frac{\sum_{n=1}^{N} p(k \mid x_n)\, x_n}{\sum_{n=1}^{N} p(k \mid x_n)}, \quad (6)$$
$$\Sigma_k = \frac{\sum_{n=1}^{N} p(k \mid x_n)\, (x_n - \mu_k)(x_n - \mu_k)^T}{\sum_{n=1}^{N} p(k \mid x_n)}, \quad (7)$$
where $p(k \mid x_n)$ is the posterior probability that sample $x_n$ is generated by the $k$-th Gaussian component. These values are obtained via Bayes' rule:
$$p(k \mid x_n) = \frac{\pi_k\, G(x_n \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{K} \pi_m\, G(x_n \mid \mu_m, \Sigma_m)}. \quad (8)$$
These steps are repeated until convergence of the log-likelihood function for the given dataset, which reaches a local maximum. As mentioned above, the quality of the converged model depends on the initial approximation of the parameter set, which is the topic of this paper.
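One EM iteration, combining the posterior computation (8) with the re-estimation formulas (5)-(7), can be sketched in NumPy for a full-covariance mixture:

```python
import numpy as np

def em_step(X, pi, mu, Sigma):
    """One EM iteration for a full-covariance GMM:
    E-step computes the posteriors (8), M-step applies (5)-(7)."""
    N, d = X.shape
    K = len(pi)
    resp = np.empty((N, K))
    for k in range(K):
        diff = X - mu[k]
        inv = np.linalg.inv(Sigma[k])
        quad = np.einsum('nd,de,ne->n', diff, inv, diff)
        norm = np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma[k]))
        resp[:, k] = pi[k] * np.exp(-0.5 * quad) / norm
    resp /= resp.sum(axis=1, keepdims=True)       # Bayes rule (8)
    Nk = resp.sum(axis=0)
    pi_new = Nk / N                               # formula (5)
    mu_new = (resp.T @ X) / Nk[:, None]           # formula (6)
    Sigma_new = np.empty_like(Sigma)
    for k in range(K):
        diff = X - mu_new[k]
        Sigma_new[k] = (resp[:, k, None] * diff).T @ diff / Nk[k]  # (7)
    return pi_new, mu_new, Sigma_new
```

Iterating this step until the log-likelihood stops improving gives the training loop described above.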
4. SPEAKER IDENTIFICATION SYSTEM
We examine the conventional approach to speaker identification based on modeling MFCC features for a given speaker and then using this model to recognize input utterances. About 40 seconds of speech material for each speaker is used to construct a speaker-dependent model. As front-end features, we use a 12-dimensional MFCC vector calculated every 10 ms from 25-ms frames of the audio signal. Most speaker identification systems also use first and second derivatives and cepstral mean subtraction [1][4]. In this work we avoid these postprocessing steps because our goal is to compare the clustering performance on the baseline features. We use a simple voice activity detector (VAD) to discard non-relevant speaker features. Only about 50% of the features remain after applying the VAD.
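The paper does not specify its VAD; purely as an illustration, a toy energy-threshold rule consistent with the stated ~50% retention might look like this (the function name and median rule are our assumptions):

```python
import numpy as np

def energy_vad(frames, keep_ratio=0.5):
    """Toy energy-based VAD: keep frames whose energy exceeds the
    (1 - keep_ratio) quantile, retaining roughly keep_ratio of them."""
    energy = (frames ** 2).sum(axis=1)            # per-frame energy
    thresh = np.quantile(energy, 1.0 - keep_ratio)
    return frames[energy > thresh]
```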
The clustering method is used to divide the data set into K clusters. In our system we use K = 16. The resulting clusters serve as the initial point of the EM algorithm. Cluster means and radii are a good approximation for the Gaussian means and variances, while the weight of the j-th cluster is initialized by the value $|C_j| / N$. Any clustering method from those presented in Section 2 can be used as the initial approximation algorithm. For the fuzzy algorithms, we assign each data point to the cluster with the maximal value in the corresponding decomposition matrix. We construct a GMM by EM iterations independently for each speaker. For a given test sequence of features $X$, the system chooses the model with the maximum likelihood value:
$$s^{*} = \arg\max_s \sum_{x \in X} \log f(x \mid \lambda_s) \quad (9)$$
In this work the system solves the standard identification problem, meaning that we suppose that every incoming speaker is present in our speaker database. We refer to this as the classical speaker identification task, rather than the speaker recognition task, where impostor models are introduced in order to describe unknown speakers not present in the database [11].
5. EVALUATION AND RESULTS
For testing purposes, we have chosen a database of Russian speech recorded with telephone quality at 8 kHz, constraining the spectrum to the interval 300-3400 Hz. The training set includes 47 speakers, both male and female voices; each speaker has approximately 40 seconds of speech, including silence, background noise and other non-speech material. Our test database consists of 10-sec, 20-sec and 30-sec input utterances, and each speaker is represented by at least five speech utterances. We have evaluated our system independently for each utterance length and then combined the scores to produce our final results.
Figure 1. Comparison of different clustering techniques on speaker identification performance.
The performance of our system with different types of initialization methods is shown in Figure 1. The percentage is the number of correct speaker identifications ("hits") with respect to the total number of tests.
Another interesting effect that we explored during our experiments is that identification performance increases when a deterministic initialization of the clustering algorithm is used rather than a non-deterministic one. To confirm this, we performed a second speaker identification experiment comparing four types of EM initialization procedures: standard K-means with random initialization (non-deterministic K-means), K-means with the deterministic initialization described in Section 2.1, K-means++ with random initialization of the first center (non-deterministic K-means++), and K-means++ with the first center computed as the middle point of the dataset. The results are shown in Figure 2.
Figure 2. Comparison of fuzzy and hard clustering techniques on speaker identification performance.
For non-deterministic initialization, the speaker models have been constructed 15 times, producing 15 different realizations of the speaker database. We calculated the identification error rate for each speaker model database independently and then averaged the hit percentages. As can be seen, clustering algorithms with deterministic initialization proved to be more effective than those with random initialization, providing 1.5% better results on average.
6. CONCLUSION
We have performed a comparison of different clustering methods for speaker identification. The standard K-means clustering, K-means++, Linde-Buzo-Gray, Fuzzy C-means, and Gustafson-Kessel algorithms have been analyzed. We have found that Gaussian mixture model performance depends on the deterministic properties of the EM initialization method. The Linde-Buzo-Gray (LBG) method outperforms the other non-fuzzy clustering approaches, probably because of the natural arrangement of cluster centers along the principal components of the data rather than the random choice used in K-means or K-means++. The fuzzy clustering algorithms show better results because they are more deterministic and use the complete dataset during the clustering iterations. All of the tested clustering algorithms except Gustafson-Kessel divide the dataset into spherical clusters; Gustafson-Kessel finds ellipsoids, which is why it shows the best result.
7. REFERENCES
[1] D.A. Reynolds, R.C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Trans. on Speech and Audio Processing, Vol. 3, No. 1, pp. 72-83, January 1995.
[2] W.C. Chen, C.T. Hsieh, E. Lai, "Multiband Approach to Robust Text-Independent Speaker Identification," IJCLCLP, Vol. 9, No. 2, pp. 63-76, ACLCLP, August 2004.
[3] V. Wan, W.M. Campbell, "Support Vector Machines for Speaker Verification and Identification," in Proc. IEEE NNSP X, Vol. 2, pp. 775-784, 2000.
[4] S. Fine, J. Navratil, R.A. Gopinath, "A Hybrid GMM/SVM Approach to Speaker Identification," in Proc. IEEE ICASSP'01, Vol. 1, pp. 417-420, Salt Lake City, USA, 2001.
[5] A.P. Dempster, N.M. Laird, D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pp. 1-38, 1977.
[6] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," in Proc. 5th Berkeley Symp. on Math. Stat. and Prob., Vol. 1, pp. 281-297, 1967.
[7] D. Arthur, S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding," in SODA'07: Proc. of the 18th Annual ACM-SIAM Symp. on Discrete Algorithms, Philadelphia, PA, USA, 2007.
[8] Y. Linde, A. Buzo, R. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. on Communications, Vol. 28, pp. 84-94, 1980.
[9] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[10] D.E. Gustafson, W.C. Kessel, "Fuzzy Clustering with a Fuzzy Covariance Matrix," in Proc. of IEEE CDC, 1979.
[11] J.P. Campbell, "Speaker Recognition: A Tutorial," Proc. of the IEEE, Vol. 85, No. 9, pp. 1437-1462, 1997.