COMPARISON OF CLUSTERING ALGORITHMS FOR SPEAKER IDENTIFICATION

Nikolay Lubimov, Evgeny Mikheev, Alexey Lukin
Moscow Lomonosov State University, Moscow, Russia
ABSTRACT
In this paper, we consider the text-independent speaker identification problem. A Gaussian mixture is used to model the distribution of speaker acoustic features. We use the well-known Mel-frequency cepstral coefficients (MFCC) to represent speaker variability. The parameters of the speaker models are re-estimated with the Expectation-Maximization (EM) procedure. The monotonic property of this method makes the final convergence result depend on the initial parameter approximation. We have tested several major clustering algorithms as the initialization step of the EM algorithm and have analyzed their influence on speaker identification performance. Both fuzzy and hard clustering techniques have been used to construct the speaker models. The comparison between these models has been performed on a telephone-quality speech database.
Index Terms— speaker identification, K-means, K-
means++, Linde-Buzo-Gray, Fuzzy C-means, Gustafson-
Kessel, Gaussian mixture
1. INTRODUCTION
In this paper we consider the text-independent speaker identification task, which belongs to acoustic recognition research. Many different techniques have been presented over the past several decades. A state-of-the-art technique uses Gaussian Mixture Models (GMM) [1] to model the speaker data distribution represented by MFCC [1] or LPCC [2] features. The classification is obtained by choosing the speaker class with the maximum likelihood on the observed data. A more complex approach exploits the discriminative capability of methods like the Support Vector Machine (SVM) in order to separate different acoustic classes [3]. A hybrid system for speaker identification presented in [4] successfully combines the advantages of the GMM's generative capability and the SVM's discriminative power by introducing the Fisher kernel.
In this work we examine the simplest scheme for constructing a speaker identification system. We can separate three major stages in our system: 1) pre-processing, 2) initial clustering in feature space, 3) Gaussian mixture model parameter re-estimation. As mentioned above, many different successful techniques have been proposed for the pre-processing step [1][2][3]. The Expectation-Maximization (EM) algorithm used for Gaussian mixture parameter re-estimation is also well documented [5]. On the other hand, it is not obvious how to initialize the recurrent formulas of the EM algorithm in this task. In other words, an interesting problem is: which type of initial clustering in feature space should be used to obtain better results? It is known that the convergence properties of the EM algorithm strongly depend on the initial approximation [5]. In this paper we describe some existing methods for making an initial approximation in the EM procedure, and show how these methods affect the final speaker recognition rate. Using different algorithms for feature space clustering, we construct several classifiers for the speaker identification task. We compare them using the identification error rate on a speaker database with telephone-quality signals. Our main goals are to compare the performance of fuzzy and hard clustering methods, and also to examine the influence of deterministic and random initializations of the EM algorithm. Section 2 briefly describes clustering approaches that can be used to separate the acoustic feature space into non-intersecting classes. Section 3 presents an overview of the GMM model and its basic re-estimation formulas. The overall baseline speaker identification system is described in Section 4. Then the speaker identification test is performed with speaker models constructed using different initial clustering procedures; the results and discussion are presented in Section 5.
2. CLUSTERING APPROACHES
2.1. K-means
K-means is one of the most popular unsupervised clustering algorithms [6]. Its main advantages are simplicity of implementation and low computational complexity. Given a discrete data set, K-means minimizes the total (squared) distance between the K centers and the data points assigned to them. This algorithm is widely known, so we only describe its initialization. In our work we have used the following K-means initialization procedure (a sketch is given after the list):
1. Find the minimum value m and maximum value M among all points in one dimension;
2. Uniformly choose k points inside the segment [m, M];
3. Repeat steps 1 and 2 for all dimensions of the input data vector;
4. Form the k cluster centers from these uniformly distributed per-dimension values.
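The description above leaves the exact meaning of "uniformly" open; since Section 5 treats this initialization as deterministic, the following minimal sketch (Python with NumPy) reads it as k equally spaced values per dimension. It is an illustrative reading under that assumption, not the authors' exact code, and the Lloyd refinement loop and function names are our own.

import numpy as np

def uniform_grid_init(X, k):
    # Steps 1-4 above, reading "uniformly" as k equally spaced values per
    # dimension (deterministic reading; this is an assumption).
    # X is an (N, D) array of feature vectors.
    lo, hi = X.min(axis=0), X.max(axis=0)      # per-dimension min / max
    return np.linspace(lo, hi, num=k)          # (k, D) array of initial centers

def kmeans(X, centers, n_iter=100):
    # Standard Lloyd iterations started from the given initial centers.
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)           # nearest-center assignment
        for j in range(len(centers)):
            if np.any(labels == j):             # leave empty clusters in place
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels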
2.2. K-means++
K-means++ is a modification of K-means that differs only in the initialization step, which can be formalized in the following way:
1. Choose an initial center $c_1$ uniformly at random from the dataset $X$;
2. Choose the next center $c_i$, selecting $c_i = x'$ with probability
$$\frac{D(x')^2}{\sum_{x \in X} D(x)^2},$$
where $D(x)$ denotes the shortest Euclidean distance from a data point $x$ to the closest center we have already chosen;
3. Repeat step 2 until we have chosen $k$ centers.
Then the standard K-means strategy is used to re-estimate the centers of the obtained clusters. This method was originally presented in [7].
Using this initialization, the algorithm is guaranteed to find a solution that is $O(\log k)$-competitive with the optimal k-means solution. If the center set $C$ is constructed with K-means++, then the corresponding potential function
$$V = \sum_{i=1}^{K} \sum_{x_j \in C_i} \left\| x_j - \mu_i \right\|^2$$
satisfies $E[V] \le 8(\ln k + 2)\, V_{opt}$, where $C_i$ is the subset of points belonging to the $i$-th cluster and $\mu_i$ is its center.
This strategy determines the K-means initialization, but the problem of selecting the number of clusters still remains. In the deterministic K-means++ variant we have chosen the first cluster center as the middle point of the data set.
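A minimal sketch of this D-squared seeding, covering both the randomized variant of [7] and the deterministic variant used here, might look as follows; reading the "middle point of the data set" as the data centroid is our assumption.

import numpy as np

def kmeans_pp_init(X, k, first="middle", rng=None):
    # D^2 seeding of [7]. first="middle" mimics the deterministic variant used
    # in the paper (first center = "middle point", taken here as the centroid;
    # this is an assumption). first="random" is the original randomized scheme.
    rng = np.random.default_rng(rng)
    c0 = X.mean(axis=0) if first == "middle" else X[rng.integers(len(X))]
    centers = [c0]
    for _ in range(1, k):
        # squared distance from every point to its closest already-chosen center
        diff = X[:, None, :] - np.asarray(centers)[None, :, :]
        d2 = (diff ** 2).sum(axis=2).min(axis=1)
        # sample x' with probability D(x')^2 / sum_x D(x)^2
        centers.append(X[rng.choice(len(X), p=d2 / d2.sum())])
    return np.array(centers)

The returned centers are then refined with the standard K-means iterations, e.g. the kmeans() sketch from Section 2.1.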
2.3. Linde-Buzo-Gray
The Linde-Buzo-Gray (LBG) algorithm was originally proposed in [8]. It is very similar to K-means clustering, except that it avoids the non-deterministic initialization of the vector quantization procedure. The main idea of this algorithm is to set initial cluster centers according to principal components in feature space. At first we find the sample mean of the whole dataset. Then this mean is split into two points along the first principal component. The standard K-means procedure with K = 2 is applied to recalculate the cluster centers until convergence of the potential function. Then the cluster with the biggest radius is chosen, and its center is similarly split into two points along the first principal component of its own subset. Now the K-means algorithm is applied with K = 3 clusters. By analogy we continue this procedure until the desired number of clusters is reached.
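A sketch of this splitting scheme is given below. The size of the split offset and the definition of the cluster "radius" (taken here as the largest distance from a point to its center) are not specified in the text, so the values used are illustrative assumptions.

import numpy as np

def lbg(X, k_target, n_iter=50):
    # Start from the global mean, repeatedly split the widest cluster along its
    # first principal component, and refine with K-means after every split.
    centers = X.mean(axis=0)[None, :]
    labels = np.zeros(len(X), dtype=int)
    while len(centers) < k_target:
        # choose the cluster with the biggest radius (empty clusters get radius 0)
        radii = [np.linalg.norm(X[labels == j] - centers[j], axis=1).max()
                 if np.any(labels == j) else 0.0
                 for j in range(len(centers))]
        j = int(np.argmax(radii))
        pts = X[labels == j]
        # first principal component of that cluster's points
        _, _, vt = np.linalg.svd(pts - pts.mean(axis=0), full_matrices=False)
        offset = 0.5 * pts.std(axis=0).max() * vt[0]   # split offset (heuristic scale)
        centers = np.vstack([centers, centers[j] + offset])
        centers[j] = centers[j] - offset
        # refine all centers with standard K-means (Lloyd) iterations
        for _ in range(n_iter):
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            labels = d.argmin(axis=1)
            for m in range(len(centers)):
                if np.any(labels == m):
                    centers[m] = X[labels == m].mean(axis=0)
    return centers, labels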
2.4. Fuzzy C-means
Fuzzy C-means (FCM) is one of the most popular fuzzy clustering algorithms. It divides the data region into K spherical clusters. Before formalizing this algorithm, some definitions must be introduced:
1. The decomposition (membership) matrix $U \in \mathbb{R}^{N \times C}$, where N is the number of data points and C is the number of clusters;
2. The fuzzy coefficient $w$ (usually $w = 2$ is used).
Now we briefly describe the FCM algorithm:
1. Initialize U randomly;
2. Calculate the cluster centers using the formula:
$$c_k = \frac{\sum_{n=1}^{N} u_{kn}^{w}\, x_n}{\sum_{n=1}^{N} u_{kn}^{w}} \qquad (1)$$
3. Update the decomposition matrix using the formula:
$$u_{kn} = \left[ \sum_{j=1}^{C} \frac{d(x_n, c_k)^{\frac{1}{w-1}}}{d(x_n, c_j)^{\frac{1}{w-1}}} \right]^{-1} \qquad (2)$$
where $d(x_n, c_k)$ is the Euclidean distance between the corresponding data point and cluster center.
Note that if $d(x_n, c_k)^{\frac{1}{w-1}} = 0$, then $u_{kn}$ becomes equal to 1, because $c_k$ and $x_n$ are the same point and $x_n$ belongs to $c_k$ with probability 1. Otherwise, if $d(x_n, c_j)^{\frac{1}{w-1}} = 0$ for some $j \neq k$, then $u_{kn}$ becomes equal to 0, because $c_j$ and $x_n$ are the same point and $x_n$ does not belong to $c_k$, so the probability equals 0.
4. Repeat steps 2 and 3 until
$$\left\| U^{l} - U^{l-1} \right\| < \varepsilon,$$
where $\varepsilon$ is a sufficiently small real value and $l$ denotes the iteration number.
The detailed description and discussion of this method can
be found in [9].
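The update loop of formulas (1) and (2), together with the degenerate zero-distance handling described above, can be sketched as follows; the membership matrix is stored transposed (C x N) purely for convenience, and the numerical safeguards are our own.

import numpy as np

def fuzzy_c_means(X, C, w=2.0, eps=1e-5, max_iter=300, rng=None):
    # Alternate the center update (1) and the membership update (2) until the
    # decomposition matrix stops changing. X is (N, D); U is stored as (C, N).
    rng = np.random.default_rng(rng)
    U = rng.random((C, len(X)))
    U /= U.sum(axis=0, keepdims=True)            # memberships of each point sum to 1
    for _ in range(max_iter):
        Uw = U ** w
        centers = (Uw @ X) / Uw.sum(axis=1, keepdims=True)                    # formula (1)
        d = np.linalg.norm(X[None, :, :] - centers[:, None, :], axis=2)       # d(x_n, c_k)
        dp = np.maximum(d, 1e-12) ** (1.0 / (w - 1.0))
        U_new = 1.0 / (dp * (1.0 / dp).sum(axis=0, keepdims=True))            # formula (2)
        # degenerate case: a point coincides with a center -> crisp membership
        zero = d < 1e-12
        cols = zero.any(axis=0)
        U_new[:, cols] = zero[:, cols].astype(float)
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return centers, U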
2.5. Gustafson-Kessel algorithm
The Gustafson-Kessel algorithm [10] is another fuzzy clustering algorithm. It improves on C-means by dividing the data region into K ellipsoids rather than spherical clusters. The Gustafson-Kessel algorithm includes the following steps:
1. Initialize the decomposition matrix U randomly;
2. Calculate the cluster centers using formula (1);
3. Calculate the fuzzy covariance matrix $F_k$ for each cluster $k = 1, 2, \ldots, K$ using the formula:
$$F_k = \frac{\sum_{n=1}^{N} u_{kn}^{w}\, (x_n - c_k)(x_n - c_k)^T}{\sum_{n=1}^{N} u_{kn}^{w}} \qquad (3)$$
4. Update the decomposition matrix using (2), with the following volume-normalized distance between data points and cluster centers:
$$d(x_n, c_k) = (x_n - c_k)^T \left( \det F_k \right)^{\frac{1}{d}} F_k^{-1}\, (x_n - c_k),$$
where $d$ in the exponent denotes the dimensionality of the feature space.
5. Repeat steps 2-4 until
$$\left\| U^{l} - U^{l-1} \right\| < \varepsilon.$$
A sketch of the full procedure is given below.
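The following compact sketch implements the steps above; it omits the covariance regularization that practical Gustafson-Kessel implementations usually add, and the numerical safeguards are our own.

import numpy as np

def gustafson_kessel(X, C, w=2.0, eps=1e-5, max_iter=200, rng=None):
    # Like FCM, but each cluster carries a fuzzy covariance matrix F_k
    # (formula (3)) and uses the volume-normalized distance from step 4.
    rng = np.random.default_rng(rng)
    N, D = X.shape
    U = rng.random((C, N))
    U /= U.sum(axis=0, keepdims=True)
    for _ in range(max_iter):
        Uw = U ** w
        centers = (Uw @ X) / Uw.sum(axis=1, keepdims=True)           # formula (1)
        d = np.empty((C, N))
        for k in range(C):
            diff = X - centers[k]                                     # (N, D)
            Fk = (Uw[k, :, None, None] *
                  np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / Uw[k].sum()   # formula (3)
            A = np.linalg.det(Fk) ** (1.0 / D) * np.linalg.inv(Fk)    # det(F_k)^(1/d) F_k^{-1}
            d[k] = np.einsum('ni,ij,nj->n', diff, A, diff)            # distance of step 4
        dp = np.maximum(d, 1e-12) ** (1.0 / (w - 1.0))
        U_new = 1.0 / (dp * (1.0 / dp).sum(axis=0, keepdims=True))    # formula (2)
        converged = np.linalg.norm(U_new - U) < eps
        U = U_new
        if converged:
            break
    return centers, U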
3. GAUSSIAN MIXTURE MODEL
The Gaussian mixture model is a parametric probability density function given by
$$f(x \mid \theta) = \sum_{k=1}^{K} \omega_k\, G(x \mid \mu_k, \Sigma_k), \qquad (4)$$
where $G(x \mid \mu_k, \Sigma_k)$ is a multivariate Gaussian distribution and $\omega_k$ are the mixture weights.
For the given data set $X = x_1, x_2, \ldots, x_N$, the parameter re-estimation formulas are as follows:
$$\omega_k = \frac{1}{N} \sum_{n=1}^{N} p(k \mid x_n), \qquad (5)$$
$$\mu_k = \left\langle x_n \right\rangle, \qquad (6)$$
$$\Sigma_k = \left\langle (x_n - \mu_k)(x_n - \mu_k)^T \right\rangle, \qquad (7)$$
where the expectations $\langle \cdot \rangle$ are weighted averages taken with respect to $p(k \mid x_n)$, the posterior probability that sample $x_n$ is generated by the $k$-th Gaussian component. These posteriors are obtained via Bayes' rule:
$$p(k \mid x_n) = \frac{\omega_k\, G(x_n \mid \mu_k, \Sigma_k)}{\sum_{m=1}^{K} \omega_m\, G(x_n \mid \mu_m, \Sigma_m)}. \qquad (8)$$
These steps are repeated until convergence of the log-likelihood function on the given dataset, reaching a local maximum of this function. As mentioned above, the quality of the converged model depends on the initial approximation of the parameter set, which is the topic of this paper.
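Formulas (5)-(8) translate into a short EM loop such as the following sketch (Python with NumPy/SciPy). The initial means, covariances and weights are assumed to come from one of the clustering methods of Section 2; covariance flooring and a log-likelihood stopping test are omitted for brevity.

import numpy as np
from scipy.stats import multivariate_normal

def em_gmm(X, means, covs, weights, n_iter=50):
    # X: (N, D) features; means/covs/weights: initial GMM parameters.
    N, K = len(X), len(weights)
    for _ in range(n_iter):
        # E-step: posteriors p(k | x_n), formula (8)
        dens = np.stack([w * multivariate_normal.pdf(X, m, c)
                         for w, m, c in zip(weights, means, covs)], axis=1)   # (N, K)
        post = dens / dens.sum(axis=1, keepdims=True)
        # M-step: formulas (5)-(7)
        Nk = post.sum(axis=0)                         # effective component counts
        weights = Nk / N                              # (5)
        means = (post.T @ X) / Nk[:, None]            # (6)
        covs = []
        for k in range(K):
            diff = X - means[k]
            covs.append((post[:, k, None, None] *
                         np.einsum('ni,nj->nij', diff, diff)).sum(axis=0) / Nk[k])   # (7)
    return means, covs, weights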
4. SPEAKER IDENTIFICATION SYSTEM
We examine the conventional approach for speaker identification based on modeling MFCC features for a given speaker and then using this model to recognize input utterances. About 40 seconds of speech material for each speaker is used to construct the speaker-dependent model. As front-end features, we use a 12-dimensional MFCC vector calculated every 10 ms from 25-ms frames of the audio signal. Most speaker identification systems also use first and second derivatives and cepstral mean subtraction [1][4]. In this work we avoid these post-processing steps because our goal was to compare the clustering performance with the baseline features. We use a simple voice activity detector (VAD) to discard non-speech frames; in practice, only about 50% of the feature frames remain after applying the VAD.
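As an illustration of this front end, the sketch below extracts 12 MFCCs from 25-ms frames every 10 ms of 8-kHz speech and keeps roughly the more energetic half of the frames with a crude energy threshold. It assumes the librosa library is available; the FFT size and the median energy threshold are our choices, not the authors', and the paper's actual VAD is not described in detail.

import numpy as np
import librosa  # assumed available for feature extraction

def extract_features(wav_path, energy_quantile=0.5):
    # 12 MFCCs from 25-ms windows with a 10-ms hop at 8 kHz, followed by a
    # simple energy-based VAD that keeps the frames above the median energy.
    y, sr = librosa.load(wav_path, sr=8000)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=12,
                                n_fft=256, win_length=200, hop_length=80)
    energy = librosa.feature.rms(y=y, frame_length=200, hop_length=80)[0]
    n = min(mfcc.shape[1], len(energy))
    keep = energy[:n] > np.quantile(energy[:n], energy_quantile)
    return mfcc[:, :n][:, keep].T        # (frames, 12) feature matrix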
The clustering method is used to divide the data set into K clusters; in our system we use K = 16. The resulting clusters serve as the initial point of the EM algorithm. Cluster means and radii are a good approximation for the Gaussian means and variances, while the weight of the j-th cluster is initialized by the value $|C_j| / N$, the fraction of data points assigned to that cluster. Any clustering method from those presented in Section 2 could be used as the initial approximation algorithm. For the fuzzy algorithms, each data point is assigned to the cluster with the maximal value in the corresponding decomposition matrix. We construct a GMM by EM iterations independently for each speaker. For a given test sequence of features $X$, the system chooses the model with the maximum likelihood value:
$$s^{*} = \arg\max_{s} \sum_{x \in X} \log f(x \mid \theta_s) \qquad (9)$$
In this work we consider the system to solve the standard identification problem, meaning that we assume every incoming speaker is present in our speaker database. We refer to this as the classical speaker identification task, rather than the speaker recognition task where impostor models are introduced in order to describe unknown speakers not present in the database [11].
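Decision rule (9) then reduces to scoring the test features against every speaker model and taking the argmax; a minimal sketch reusing the GMM density from the previous section:

import numpy as np
from scipy.stats import multivariate_normal

def log_likelihood(X, means, covs, weights):
    # Sum over frames of log f(x | theta) for one speaker's GMM.
    dens = np.stack([w * multivariate_normal.pdf(X, m, c)
                     for w, m, c in zip(weights, means, covs)], axis=1)
    return np.log(dens.sum(axis=1)).sum()

def identify(X, speaker_models):
    # Closed-set decision rule (9): return the speaker whose model gives the
    # highest total log-likelihood on the test feature sequence X.
    # speaker_models maps a speaker id to its (means, covs, weights) tuple.
    scores = {s: log_likelihood(X, *params) for s, params in speaker_models.items()}
    return max(scores, key=scores.get)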
5. EVALUATION AND RESULTS
For testing purposes, we have chosen a database of Russian speech recorded with telephone quality at 8 kHz, with the spectrum constrained to the 300-3400 Hz band. The training set includes 47 speakers, both male and female voices; each speaker has approximately 40 seconds of speech, including silence, background noise and other non-speech material. Our test database consists of 10-sec, 20-sec and 30-sec input utterances, and each speaker is represented by at least five speech utterances. We have evaluated our system independently for each utterance length and then combined the scores to produce our final results.
Figure 1. Comparison of different clustering techniques
on speaker identification performance
The performance of our system with different types of initialization methods is shown in Figure 1. The percentage is the number of correct speaker identifications ("hits") relative to the total number of tests.
Another interesting observation from our experiments is that identification performance increases when a deterministic initialization of the clustering algorithm is used rather than a non-deterministic one. To verify this, we have run a second speaker identification experiment comparing four types of EM initialization procedures: standard K-means with random initialization (non-deterministic K-means), K-means with the deterministic initialization described in Section 2.1, K-means++ with random initialization of the first center (non-deterministic K-means++), and K-means++ with the first center computed as the middle point of the dataset. The results are shown in Figure 2.
Figure 2. Comparison of fuzzy and hard clustering tech-
niques on speaker identification performance
For the non-deterministic initializations, speaker models have been constructed 15 times, producing 15 different speaker database implementations. We have calculated the identification error rate for each speaker model database independently and then averaged the hit percentages. As can be seen, clustering algorithms with deterministic initialization proved to be more effective than those with random initialization, providing 1.5% better results on average.
6. CONCLUSION
We have performed a comparison of different clustering methods for speaker identification. The standard K-means clustering, K-means++, Linde-Buzo-Gray, Fuzzy C-means, and Gustafson-Kessel algorithms have been analyzed. We have found that Gaussian mixture model performance depends on the deterministic properties of the EM initialization method. The Linde-Buzo-Gray (LBG) method outperforms the other non-fuzzy clustering approaches, probably because of the natural arrangement of cluster centers along the principal components of the data, rather than the random choice used in K-means or K-means++. The fuzzy clustering algorithms show better results because they are more deterministic and use the complete dataset during clustering iterations. All of the tested clustering algorithms except Gustafson-Kessel divide the dataset into spherical clusters; Gustafson-Kessel finds ellipsoids, so it shows the best result.
7. REFERENCES
[1] D.A. Reynolds, R.C. Rose, "Robust Text-Independent Speaker Identification Using Gaussian Mixture Speaker Models," IEEE Transactions on Speech and Audio Processing, Vol. 3, No. 1, pp. 72-83, January 1995.
[2] W.-C. Chen, C.-T. Hsieh, E. Lai, "Multiband Approach to Robust Text-Independent Speaker Identification," IJCLCLP, Vol. 9, No. 2, pp. 63-76, ACLCLP, August 2004.
[3] V. Wan, W.M. Campbell, "Support Vector Machines for Speaker Verification and Identification," in Proc. IEEE NNSP X, Vol. 2, pp. 775-784, 2000.
[4] S. Fine, J. Navratil, R.A. Gopinath, "A Hybrid GMM/SVM Approach to Speaker Identification," in Proc. IEEE ICASSP'01, Vol. 1, pp. 417-420, Salt Lake City, USA, 2001.
[5] A.P. Dempster, N.M. Laird, D.B. Rubin, "Maximum Likelihood from Incomplete Data via the EM Algorithm," Journal of the Royal Statistical Society, Series B, Vol. 39, No. 1, pp. 1-38, 1977.
[6] J. MacQueen, "Some Methods for Classification and Analysis of Multivariate Observations," in Proc. 5th Berkeley Symp. on Math. Stat. and Prob., Vol. 1, pp. 281-297, 1967.
[7] D. Arthur, S. Vassilvitskii, "k-means++: The Advantages of Careful Seeding," in SODA'07, Proc. of the 18th Annual ACM-SIAM Symposium on Discrete Algorithms, Philadelphia, PA, USA, 2007.
[8] Y. Linde, A. Buzo, R. Gray, "An Algorithm for Vector Quantizer Design," IEEE Trans. on Communications, Vol. 28, pp. 84-94, 1980.
[9] J. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms, Plenum Press, New York, 1981.
[10] D.E. Gustafson, W.C. Kessel, "Fuzzy Clustering with a Fuzzy Covariance Matrix," in Proc. of IEEE CDC, 1979.
[11] J.P. Campbell, "Speaker Recognition: A Tutorial," Proc. of the IEEE, Vol. 85, No. 9, pp. 1437-1462, 1997.