
IEEE TRANSACTIONS ON COMPUTERS, VOL. C-19, NO. 7, JULY 1970

An Algorithm for Detecting Unimodal Fuzzy Sets and Its Application as a Clustering Technique

ISRAEL GITMAN AND MARTIN D. LEVINE, MEMBER, IEEE

Abstract-An algorithm is presented which partitions a given
sample from a multimodal fuzzy set into unimodal fuzzy sets. It is
proven that if certain assumptions are satisfied, then the algorithm
will derive the optimal partition in the sense of maximum separation.
The algorithm is applied to the problem of clustering data, defined
in multidimensional space, into homogeneous groups. An artificially
generated data set is used for experimental purposes and the results
and errors are discussed in detail. Methods for extending the algorithm to the clustering of very large sets of points are also described.
The advantages of the method (as a clustering technique) are that
it does not require large blocks of high speed memory, the amount of
computing time is relatively small, and the shape of the distribution of
points in a group can be quite general.
Index Terms-Clustering algorithms, multimodal data sets,
pattern recognition, symmetric fuzzy sets, unimodal fuzzy sets.

I. INTRODUCTION

THE PRIMARY objective of clustering techniques is
to partition a given data set into so-called homogeneous clusters (groups, categories). The term homogeneous is used in the sense that all points in the same group
are similar (according to some measure) to each other and
are not similar to points in other groups. The clusters generated by the partition are used to exhibit the data set and
to investigate the existence of families, as is done in numerical taxonomy, or alternatively, as categories for classifying
future data points, as in pattern recognition. The role of
cluster analysis in pattern recognition is discussed in detail
in two excellent survey papers [1], [16].
The basic practical problems that clustering techniques
must address themselves to involve the following:
1) the availability of fast computer memory,
2) computational time,
3) the generality of the distributions of the detected
categories.
Clustering algorithms that satisfactorily overcome all of
these problems are not yet available. In general, techniques
that can handle a relatively large data set (say 1000 points)
are only capable of detecting very simple distributions of
points [Fig. 1(a)]; on the other hand, techniques that perform an extensive search in the feature space (the vector
space in which the points are represented) are only able to
handle a small data set.

Manuscript received September 22, 1969; revised December 15, 1969.
The research reported here was sponsored by the National Research
Council of Canada under Grant A4156.
I. Gitman was with the Department of Electrical Engineering, McGill
University, Montreal, Canada. He is now with the Research and Development Laboratories, Northern Electric Co. Ltd., Ottawa, Ontario, Canada.
M. D. Levine is with the Department of Electrical Engineering, McGill
University, Montreal, Canada.

Fig. 1. Distribution of points in a two-dimensional feature space.
The curves represent the closure of the sets which exhibit a high
concentration of sample points.

Some authors [6], [9], [19], [20] have formulated the
clustering problem in terms of a minimization of a functional based on a distance measure applied to an underlying
model for the data. The clustering methods used to derive
this optimal partition perform an extensive search and are
therefore only applicable to small data sets (less than 200
points). In addition, there is no guarantee that the convergence is to the true minimum. Other methods [3], [12], [15]
use the so-called pairwise similarity matrix or sample covariance matrix [15]. These are memory-limited since, for
example, one-half million memory locations are required
just to store the matrix elements when clustering a data set
of 1000 points. Also, the methods in [12], [15] will generally
not give satisfactory results in detecting categories for an
input space of the type shown in Fig. 1(b).
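
The one-half million figure quoted above can be checked by counting the distinct pairs among 1000 points. The following is a minimal sketch, assuming only the upper triangle of a symmetric similarity matrix is stored; the function name `pairwise_entries` is illustrative and does not come from the paper.

```python
def pairwise_entries(n_points: int) -> int:
    """Number of distinct unordered pairs among n_points points.

    Assumption: the similarity matrix is symmetric, so only the entries
    above the diagonal need to be stored.
    """
    return n_points * (n_points - 1) // 2


print(pairwise_entries(1000))  # 499500, roughly one-half million
```

Storing the full square matrix instead would double this requirement, which is why the count grows too quickly for large samples.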
It is rather difficult to make a fruitful comparison among
the many clustering techniques that have been reported in
the literature and this is not the aim of the paper. The difficulty may be attributed to the fact that many of the algorithms are heuristic in nature, and furthermore, have not
been tested on the same standard data sets. In general it
seems that most of the algorithms are not capable of detecting categories which exhibit complicated distributions in
the feature space [Fig. 1(c)] and that a great many are not
applicable to large data sets (greater than 2000 points).
This paper discusses an algorithm which partitions the
given data set into "unimodal fuzzy sets." The notion of a
unimodal fuzzy set has been chosen to represent the partition of a data set for two reasons. First, it is capable of detecting all the locations in the vector space where there exist
highly concentrated clusters of points, since these will appear as modes according to some measure of "cohesiveness." Second, the notion is general enough to represent
clusters which exhibit quite general distributions of points.

An important distinction between this procedure and the methods reported in the literature¹ is that the latter use a distance measure (or certain average distances) as the only means of clustering. We have introduced another "dimension," the dimension of the order of "importance" of every point, as an aid in the clustering process. This is accomplished by associating with every point in the set a grade of membership or characteristic value [21]. Thus the order of the points according to their grade of membership, as well as their order according to distance, are used in the algorithm.

The algorithm attempts to solve problems 1), 2), and 3) mentioned above, that is, it is economical in memory space and computational time requirements and also detects groups which are fairly generally distributed in the feature space [Fig. 1(c)]. The algorithm is a systematic procedure (as opposed to an iterative technique) which always terminates and the computation time is reasonable. The generated partition is optimal in the sense that the program detects all of the existing unimodal fuzzy sets and realizes the maximum separation [21] among them.

In Section II the concept of a fuzzy set is extended in order to define both symmetric and unimodal fuzzy sets. The basic algorithm consists of the two procedures, F and S, which are described in detail in Sections III and IV, respectively. The latter partitions a sample from a multimodal fuzzy set into unimodal fuzzy sets. Section V deals with the application of the algorithm to the clustering of data and the various practical implications. Section VI discusses the experimental results. Possible extensions of the algorithm to handle very large data sets (say greater than 30 000 points) are presented in Section VII. The conclusions are given in Section VIII.

II. DEFINITIONS

The notion of a fuzzy set was introduced by Zadeh [21]. Let $X$ be a space of points with elements $x \in X$. Then: "a fuzzy set $A$ in $X$ is characterized by a membership (characteristic) function $f_A(x)$ which associates with each point in $X$ a real number in the interval [0, 1], with the value $f_A(x)$ at $x$ representing the 'grade of membership' of $x$ in $A$."

In the rest of this section we shall introduce some notation and certain definitions required for the description of the algorithm.

Let $B$ be a fuzzy set in $X$ with the membership (characteristic) function $f$, and let $\mu$ be the point at which the maximal grade of membership is attained, that is,

$f(\mu) = \sup_{x \in B} [f(x)]$.

We may define two sets in $B$ as follows:

$\Gamma_{x_i} = \{x \mid f(x) \ge f(x_i)\}$²

$\Gamma_{x_i, d} = \{x \mid d(\mu, x) \le d(\mu, x_i)\}$

where $x_i$ is some point in $B$ and $d$ is a metric.

Definition: A fuzzy set $B$ is symmetric if and only if, for every point $x_i$ in $B$, $\Gamma_{x_i} = \Gamma_{x_i, d}$.

Clearly, if $B$ is symmetric, then for every two points $x_i$ and $x_k$ in $B$,

$d(\mu, x_i) \le d(\mu, x_k) \Leftrightarrow f(x_i) \ge f(x_k)$.

Any symmetric (in the ordinary sense) function, or a truncated symmetric function, can represent a characteristic function of a symmetric fuzzy set. As an example of a symmetric fuzzy set, consider the set $B$ defined as "all the very tall buildings." $B$ is a symmetric fuzzy set, since the taller the building, the higher the grade of membership it will have in $B$.

Definition: A fuzzy set $B$ is unimodal if and only if the set $\Gamma_{x_i}$ is connected for all $x_i$ in $B$ (see Fig. 2).

In order to consider the problem of clustering data points it will be necessary to define discrete fuzzy sets. A sample point from $B$ will be a point $x \in X$ with its associated characteristic value. Further, we will denote a sample of $N$ points from $B$ by $S = \{(x_i, f_i)^N\}$, where $x_i$ is a point in $X$ and $f_i$ its corresponding grade of membership. $S$ can be considered as a discrete fuzzy set which includes only those points $x_i$ given by the sample. We shall require a large sample $S$; in particular, $S$ is large in comparison to the dimension of the space $X$, and to the number of local maxima in $f$. It will be assumed that every local maximal grade of membership is unique [21].

$\{(S_i, \mu_i)^m\}$ denotes a partition of $S$ into $m$ subsets, where $S_i$ is a discrete fuzzy subset and $\mu_i$ the point in $S_i$ at which the maximal grade of membership is attained. We refer to $\mu_i$ as the mode of $S_i$. A mode will be called a local maximum if it is a local maximum of $f$ and will then be denoted by $\nu_i$.

The notion of an interior point in a discrete fuzzy set is defined as follows.

Definition: Let $S$ be a sample from a fuzzy set $B$ and $S_i$ a proper subset of $S$. For some point $x_k$ in $S_i$ we associate a point $x_p$ in $(S - S_i)$ such that

$d(x_p, x_k) = \min_{x_j \in (S - S_i)} [d(x_k, x_j)]$.

The point $x_k$ is defined to be an interior point in $S_i$ if and only if the set $\Gamma = \{x \mid d(x_p, x) < d(x_p, x_k)\}$ includes at least one sample point in $S_i$ (see Fig. 3).

Note that when the sample is of infinite size, in the sense that every point in $X$ is also in $S$, this definition reduces to that of an interior point in ordinary sets.

Given a sample, the algorithm to be described in the next two sections is composed of two parts: procedure F which detects all the local maxima of $f$, and procedure S which uses these local maxima and partitions the given sample into unimodal fuzzy sets.

¹ Rogers and Tanimoto [17] introduced a certain order among the points by associating with the point $i$, a value which is the number of points at a constant finite distance from $i$. When the number of attributes is large, this so-called "prime mode" will be the centroid of the data set, rather than a "mode" of a cluster. A measure of inhomogeneity is used to detect clusters one at a time.
² This is equivalent to the set $\Gamma_\alpha$ in [21] where $\alpha = f(x_i)$.
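
The definition of a symmetric fuzzy set can be tested directly on a finite sample. The sketch below is an illustration rather than part of the paper's algorithm: it assumes Euclidean distance, takes the mode to be the sample point with the highest grade, and checks for every pair of points that the ordering by distance to the mode agrees with the ordering by grade. The name `is_symmetric` and the sample values are hypothetical.

```python
import math


def is_symmetric(points, grades):
    """Pairwise check of d(mu, x_i) <= d(mu, x_k)  <=>  f(x_i) >= f(x_k).

    points: list of coordinate tuples; grades: grades of membership.
    Assumption: the mode mu is the sample point with the highest grade.
    """
    mode_idx = max(range(len(points)), key=grades.__getitem__)
    dist = [math.dist(points[mode_idx], p) for p in points]
    n = len(points)
    return all(
        (dist[i] <= dist[k]) == (grades[i] >= grades[k])
        for i in range(n)
        for k in range(n)
    )


line = [(-2.0,), (-1.0,), (0.0,), (1.0,), (2.0,)]
print(is_symmetric(line, [0.2, 0.6, 1.0, 0.6, 0.2]))  # True
print(is_symmetric(line, [0.2, 0.6, 1.0, 0.6, 0.9]))  # False
```

The second sample fails because the point farthest from the mode has a higher grade than a nearer one, so the two orderings disagree.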

Fig. 2. An example of a unimodal fuzzy set in a two-dimensional space. (a) A unimodal fuzzy set where for every point $x_i$ in the set, the set $\Gamma_{x_i}$ is not disjoint. (b) A multimodal fuzzy set, since there exists a point $x_r$ for which $\Gamma_{x_r}$ is disjoint. The curves indicate lines of equigrade of membership.

Fig. 3. The point $x_{k1}$ is on the boundary of $S_i$ since $\Gamma_1$ includes no sample points in $S_i$. The point $x_{k2}$ is an interior point in $S_i$, since $\Gamma_2$ includes points in $S_i$.

III. PROCEDURE F

Given a sample $S = \{(x_i, f_i)^N\}$ from a multimodal fuzzy set, procedure F detects all the local maxima of $f$, subject to certain conditions on $f$ and $S$ (see Theorem 1). It is divided into two parts: in the first part, the sample is partitioned into symmetric subsets and in the second, a search for the local maxima in the generated subsets is performed. The number of groups (subsets) into which the sample is partitioned is not known beforehand.

The procedure is initialized by the construction of two sequences: a sequence $A$ in which the points are ordered according to their grade of membership, and a sequence $A_1$ in which they are ordered according to their distance to the mode of $A$ (the first point in $A$). The order of the points in the sequence $A$ is the order in which the points are considered for assignment into groups. This process will initiate new groups when certain conditions are satisfied. Whenever a group, say $n$, is initiated, a sequence of points $A_n$ is formed of all the points in $S$ which might be considered for assignment into group $n$. The first point in $A_n$ is its mode and the points are ordered according to their distance to this mode. Not all the points in $A_n$ will necessarily be assigned into group $n$ at the termination of the procedure.

At every stage of assignment, every group $i$ displays a point from its sequence $A_i$, which is its candidate point. The point of $A$ to be assigned is compared (for identity) with each of the candidate points in turn and is either assigned into one of the existing groups (if it is identical to the corresponding candidate point) or initiates a new group. If the point is assigned into group $j$, then the candidate of this group is replaced by its next point in the sequence $A_j$. Thus a point is assigned to the group in which its order according to the grade of membership corresponds to its order according to the distance to its mode. An example which demonstrates the procedure is presented later.

Part 1 of Procedure F

Let $S = \{(x_i, f_i)^N\}$ be a sample from a fuzzy set (assume, for simplicity, that $f_i \ne f_j$ for $i \ne j$).³ In order to make the steps in the procedure clear, some preliminary explanations are given below.

1) Initially it is required to generate the following two sequences.
a) $A = (y_1, y_2, y_3, \ldots, y_N)$ is a descending sequence of the points in the sample ordered according to their grade of membership, that is, $f_j \ge f_t$ for $j < t$, where $f_j$ and $f_t$ are the grades of membership of $y_j$ and $y_t$, respectively.
b) $A_1 = (y_1^1, y_2^1, y_3^1, \ldots, y_N^1)$, where $y_1^1 \equiv y_1$,⁴ is the sequence⁵ of the points ordered according to their distance to $y_1$, that is, $d(y_1^1, y_j^1) \le d(y_1^1, y_t^1)$ for $j < t$.
We will also refer to $A_1$ as the sequence of ordered "candidate" points to be assigned into group 1. Thus $y_2^1$ is the first candidate, and if it is assigned into group 1, then $y_3^1$ becomes the next candidate, and so on. We can therefore state that the current candidate point for group 1 is the nearest point to its mode $y_1^1$ ($\equiv \mu_1$) except for points that have already been assigned to group 1. This will hold true for any sequence $A_i$, that is, $y_1^i \equiv \mu_i$ is the mode for group $i$, and $y_k^i$ is its candidate point. Thus the point $y_k^i$ is some point in $A$ and its label indicates that it is also in location $k$ in the sequence $A_i$.

2) If $y_j \equiv y_j^1$ for $j = 1, 2, \ldots, r-1$, and $y_r \not\equiv y_r^1$, then $y_j$, $j = 1, 2, \ldots, r-1$, are assigned into group 1 and a new group is initiated with $y_r$ as its mode, that is, $y_1^2 \equiv y_r \equiv \mu_2$, and the sequence $A_2 = (y_1^2, y_2^2, y_3^2, \ldots)$ is generated. The latter includes from among the points that have not yet been assigned, those points which are closer to $y_r$ than the shortest distance from $y_r$ to the points that have already been assigned; this is shown for one dimension in Fig. 4. The points in $A_2$ are now ordered according to their distance to $y_r$, that is, $d(y_1^2, y_j^2) \le d(y_1^2, y_t^2)$ for $j < t$.

3) Suppose that $G$ groups have been initiated. Thus there exist $G$ sequences, $A_i$, $i = 1, \ldots, G$, each of which displays a candidate point. Suppose that $y_q$ in the sequence $A$ is the point currently being considered for assignment (all the points in $A$ for $i < q$ have already been assigned).

³ The case in which there are equal grades of membership will be discussed in Section V.
⁴ We shall use the symbol "$\equiv$" to mean "is identical to."
⁵ The sets $A$ and $A_1$ are sequences of the same $N$ points; however, the ordering principle is different.

Then the following holds.
a) If $y_q \equiv y_k^i$ (the candidate point of group $i$) and $y_q \not\equiv y_k^j$, $j = 1, \ldots, G$, $j \ne i$, then $y_q$ is assigned into group $i$.
b) If $y_q \equiv y_k^i$ for $i \in I$, where $I$ is a set of integers representing those groups whose candidate points are identical to $y_q$, then $y_q$ is assigned into that group to which its nearest neighbor with a higher grade of membership has been assigned.
c) If $y_q \not\equiv y_k^i$ for $i = 1, \ldots, G$, then a new group is initiated with $y_q$ as its mode.

Fig. 4. At the stage where the point $x_i$ ($\equiv \mu_i$) initiates a new group, all the sample points that have already been assigned are in the domain $\Gamma_p$. Thus the nearest point in $\Gamma_p$ to $x_i$ is at a distance $R_i$, which defines the domain $\Gamma_i$ of all the points which are at a shorter distance to $x_i$ than $R_i$. The sample points in $\Gamma_i$ will be ordered as candidate points to be assigned into the group in which $x_i$ is the mode.

Part 1 of procedure F is terminated when the sequence $A$ is exhausted. Thereafter, part 2 of the procedure is employed to check all the modes $\mu_i$ in order to detect which are interior points. This is done according to the definition given in the previous section.

Part 2

Let $\{(S_i, \mu_i)^m\}$ denote the partition generated by part 1 of procedure F, where $S_i$ denotes the discrete fuzzy set, $\mu_i$ its maximal grade of membership (mode), and $m$ the number of groups. For every mode $\mu_i$ and set $S_i$, a point $x_{p_i}$ and a distance $R_i$ can be found as follows:

$R_i = d(\mu_i, x_{p_i}) = \min_{x_k \in (S - S_i)} [d(\mu_i, x_k)]$.

$R_i$ is the minimum distance from the mode to a point in $S$ outside the set $S_i$. We say that $\mu_i$ is a local maximum if the set $\Gamma_{R_i} = \{x \mid d(x_{p_i}, x) < R_i\}$ includes points in $S_i$. Otherwise we decide that $\mu_i$ is not a local maximum because it is a boundary point of $S_i$.

Theorem 1 states the sufficient condition under which the procedure will detect all the local maxima of $f$.

Theorem 1⁶: Let $f$ be a characteristic function of a fuzzy set with $K$ local maxima so that:
1) if $\nu_k$ is a local maximum of $f$, then there exists a finite $\epsilon > 0$ such that the set $\{x \mid d(\nu_k, x) < \epsilon\}$ is a symmetric fuzzy set.
Let $S = \{(x_i, f_i)^N\}$ be a large sample from $f$, such that:
2) for every $x_i$ in the domain of $f$, the set $\{x \mid d(x_i, x) < \epsilon/2\}$ includes at least one point in $S$, and
3) $\{(\nu_k, f_k)^K\} \subset S$.
Let $\{(S_i, \mu_i)^m\}$ be the partition generated by part 1 of procedure F. Then $\mu_i$ is an interior point in $S_i$ if and only if it is a local maximum of $f$.

The main restriction is the requirement that every local maximum of $f$ shall have a small symmetric subset in its neighborhood (condition 1). Condition 2 indirectly relates the dimension of the space to the size of the sample set. It is not necessary for the sample to be of infinite size; it will be sufficient if it is large in the neighborhood of a local maximum.

To summarize this section, if $f$ and $S$ satisfy the conditions stated in Theorem 1, then the procedure presented detects all the local maxima of $f$.

Example: The following example demonstrates the various procedures associated with the algorithm. A sample of 30 points was taken from a one-dimensional characteristic function. The latter, as well as the sample points with their associated sample numbers, are shown in Fig. 5. The sequences $A$ and $A_1$ are given by

$A = (y_1, y_2, \ldots, y_{30}) = (x_{15}, x_{14}, x_{16}, x_{13}, x_{12}, \ldots)$

$A_1 = (y_1^1, y_2^1, \ldots, y_{30}^1) = (x_{15}, x_{14}, x_{16}, x_{13}, x_{17}, \ldots)$.

Observing these sequences, we can see that the first four points in $A$ and $A_1$ are pairwise identical and thus they are assigned to group 1. The candidate point for group 1 is then $y_5^1 = x_{17}$, whereas the point to be assigned is $x_{12}$. Thus the latter will initiate a new group and a new sequence $A_2$ will be generated.

Fig. 5. The characteristic function $f$ and the 30 point sample for the example are shown. The dotted lines indicate the partition (the sets $S_i$) resulting from the application of part 1 of procedure F. We can observe that $x_{15}$ and $x_{25}$ are the only interior modes in the partition and thus will be recognized as the local maxima points ($\nu_i$) of $f$.

⁶ The proofs of the theorems are given in Appendix I.
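
The interior-point test used in part 2 of procedure F can be sketched as follows. This is an illustration under stated assumptions, not the authors' program: distance is Euclidean, the group is a proper subset of the sample (so there is at least one outside point), and the name `is_interior_mode` and the sample coordinates are hypothetical.

```python
import math


def is_interior_mode(mode, group, outside):
    """Part 2 test: is the mode of a group an interior point?

    mode:    coordinates of the mode of the group
    group:   points of the group S_i (the mode included)
    outside: points of the sample that are not in S_i (must be non-empty)
    """
    # x_p is the nearest point outside the group; R is its distance to the mode.
    x_p = min(outside, key=lambda x: math.dist(mode, x))
    radius = math.dist(mode, x_p)
    # The mode is interior if some group point lies strictly closer to x_p
    # than the mode does (the mode itself is at distance exactly R).
    return any(math.dist(x_p, p) < radius for p in group)


inside = is_interior_mode((5.0,), [(4.0,), (5.0,), (6.0,)], [(7.0,), (8.0,)])
edge = is_interior_mode((5.0,), [(3.0,), (4.0,), (5.0,)], [(6.0,), (7.0,)])
print(inside, edge)  # True False
```

In the second call the mode sits at the edge of its group, so no group point falls between it and the nearest outside point, and the mode is treated as a boundary point rather than a local maximum.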

the set candidate x12 has already been assigned to another F = {xId(xi. ox > 0.ferent from the "nearest neighbor classification rule" [5 ] besigned. i A . X19. X27. x10. K is the local maximum (A3 (yS y. X14. If L-*O. S. X28. X12. then every final set is a union of the sets Si gentheir corresponding sets.* . X20. X23. X16. then only points with a lower grade of membership can be assigned into group r. IX21) for i . none of them can be assigned into group r.j.Y4 4. Part shown in F are 1 Fig. Xll.. demonstrate to order procedure maxima. and so on. M}. Theorem 4: Let f be a piecewise continuous characteristic IV. and The rule for assigning the points differs from the known classification rules appearing in the pattern recognition litera2) the local maxima. providing the local maxima of f are Let S be a sample from f. y3) x9) (Xi of group r. i= 1. r= M+ 1. . Therefore there are no sample points in S to applies to all the points with the exception of the local maxima that initiate new groups. 3) No more points will be assigned into group 1. Let A be the sequence of the points ordered accordI (A (-Y'l.GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS 587 After the first four groups are initiated. . (v1. that is. such that point. * * * . in A have higher grades of membership. detect which of these 2. S2= (X12..K. =: (X15. Thus x1o will be assigned to group 3 since it is identical with the latter's candidate istic function of a fuzzy set.143) = (X25. X25. x) < o} includes at least location. . Based on = X If E'. The candidate points for the four groups that cause of the particular order in which the points are introhave already been initiated are x12. X22. in location j in the sequence A into the group in which its 1) The sequence A2 includes only one point (its mode) nearest neighbor with a higher grade of membership (all since the nearest point to x12 in S has already been the points preceding xj in A) has been assigned. SI = (XI 5. X29. 
and characteristic values of the local maxima off. and thus cannot be replaced as a candidate for S. X 13. X17X127? ing to their grade of membership. PROCEDURE S function of a fuzzy set and d the distance between its two Procedure S partitions a sample from a fuzzy set into local maxima. f (v2)) are in S. a more powerful result than Theorem 2 can be in next section the end the of at be continued will example for simplicity we will state it for the case of two local stated. X24. fj)N} be an infinite sample from f. respectively. X24. the number. fJ(v))K} c S be the sample of the K local maxima X17.X26. Thus this procedure uses the information obtained 1) for every point xi in the domain of f and for a finite from the application of procedure F. This proposition implies that all the points in A which are X18) found in the locations Pj . Note that the rule is difgenerate a symmetric fuzzy set whose mode is x12. point group 1. Theorem in interior points and are modes x25 see that only the x15 If x-+0. X26. 2) At the stage shown.(Y4. Theorem 2: Let f be a piecewise continuous characterand x26. . let S = {(xi. 2 of 5. Y2. We local maxima of f are in locations pi. = - 11 X I. tion of part of procedure 3: Let S be a sample from a fuzzy set with a Theorem 13 modes to each of the test the procedure is now applied to characteristic function f. X21. r=M+ 1. can infer the following proposition. such that.(Y1. and suppose that the K i K in A. Since all the points that S3 = (X11. (v2. the set F = {xjd(x. duced. the points are finally assigned in the order in which they and the resulting partition to this point are as follows: appear in the sequence A. M + 2. the this partial result. X1 4.1 Y2L * . and {(vi. f(v1)).J<p will automatically be asS4 =(X25. and f(vj)>f(vj) of vi . Proposition: The point xj in location j in the sequence (A (y2) A. X1l6. 2. maxima are discovered. * **. Rather than an arbitrary order which is the usual case. X14. 
then procedure S partitions the given sample into The resulting symmetric fuzzy sets generated by the applicaunimodal fuzzy sets. precede location p. fi)N} be a sample from a fuzzy set. known. unimodal fuzzy sets. M+2. since its 1) for every xi in the domain of f and for an oa 0. M<K. Y30) Specifically.. a d.) If f(Xpr). we note the Procedure S uses the following rule: assign the point xj following. PM<j<P(M+1). X13. one point in S. This rule assigned. Let S = {(xi. the sequences Ai ture. X30. x1o in the sequence A is to be as. and therefore only two local erated in part 1 of procedure F. can only be assigned into one of (X12) the groups ieIM= {1. In relation to the procedure described above. f Assume that f1(xi)Af+j(xj) for i #j.. Y30J -:: X1 5. no candidate.) A4 . Let f and S be constrained as in can In 5 we Fig. are interior points. X 1 6 1l 13 17. x) < a/2} includes at least one sample group. the points in locations P2<] <P3 will be divided between group 1 and group 2.X24) signed into group 1.
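
The assignment rule of procedure S (each point joins the group of its nearest neighbor with a higher grade of membership, and each local maximum initiates a group) can be sketched in a few lines. This is a minimal illustration, assuming Euclidean distance, distinct grades, and that the point with the highest grade is one of the known local maxima; the name `procedure_s` and the sample data are hypothetical.

```python
import math


def procedure_s(points, grades, maxima):
    """Assign every point to the group of a known local maximum.

    points: list of coordinate tuples
    grades: grades of membership (assumed distinct)
    maxima: set of indices of the known local maxima; assumed to contain
            the index of the point with the highest grade
    Returns a list giving, for each point, the index of its group's maximum.
    """
    # Sequence A: indices in descending order of grade of membership.
    order = sorted(range(len(points)), key=lambda i: grades[i], reverse=True)
    label = {}
    for i in order:
        if i in maxima:
            label[i] = i  # a local maximum initiates its own group
        else:
            # Every point labelled so far has a higher grade than point i.
            nearest = min(label, key=lambda j: math.dist(points[i], points[j]))
            label[i] = label[nearest]
    return [label[i] for i in range(len(points))]


pts = [(0.0,), (1.0,), (2.0,), (3.2,), (4.0,), (5.0,), (6.0,)]
f = [0.2, 0.9, 0.5, 0.1, 0.4, 0.8, 0.3]
print(procedure_s(pts, f, {1, 5}))  # [1, 1, 1, 5, 5, 5, 5]
```

Because points are processed in the order of sequence A, the set of already assigned points is exactly the set of points with a higher grade, so no separate grade comparison is needed inside the loop.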

although for any T. if pi is a mode. In fact no further computation is necessary since. Example: To demonstrate procedure S. In other words. X26. This stage of the process and the final partition are shown in Fig. we again consider the sequence A. which can be stated as "the partition of a fuzzy set into unimodal fuzzy sets. X8. The threshold essentially controls the resolution of the characteristic function f. if x23 is the point to be currently assigned. x26.x14. the sample S is identically equal to the domain of f. In particular. One possibility is to use a clustering model to associate with every point a membership value according to its "contribution" to the desired partition. X10. By a clustering model. JULY 1970 588 Let H = xo7 be the optimal hyperplane (point) separating f into unimodal fuzzy sets and F. particularly in the neighborhood of the separating hypersurface (see Theorem 4). we can first assign pi (the mode of Si). on the other hand. and S2) are given by Sl = (X15. X24. x1 X2. x4. x16. x3. however. x29.3x12. and F2 is evaluated and this point will eventually be assigned into the group in which x24 is a member. 5 . X27. it will initiate a new set (group) in part 1 of procedure F. generated in part 1 of procedure F. -. all the points in the sequence A with higher grade of membership than f1 have been assigned. f. x6. a certain order of importance is introduced to facilitate the discrimination among the points not only on the basis of their location (in the vector space) but also according to their "importance. This is not. for i= 1. we can always find a finite ac for which the result holds. x5. Note that when cx = 0." There are many possible ways to discriminate among the points. The distance of x23 to all the points F. then only poinis in S within a distance cx to H can be misclassified. a unique partition into unimodal fuzzy sets is derived. but in this case ni= 1. x12. 
THE APPLICATION OF THE ALGORITHM TO CLUSTERING DATA The problem we have treated so far. A = ( . when evaluating the distances to the points that have already been assigned. X13. (i). = {xld(xo. On the other hand. Observing procedure S. then every point will have the same number ni(ni= N) and no discrimination is achieved. all the points in the domains F1 and F2 have already been assigned. we must replace the distance d by the minimum distance between any two local maxima off. Theorem 4 implies that if a is finite. we may see that the sample must be large. the case in the clustering ' If X= El. X20. this is also true for a very small T. and then automatically all the points in Si to this same group. In order to directly employ the algorithm it is necessary to associate with every point xi a grade of membership fi. 8 Among other changes in the statement of the theorem in the case when the number of local maxima is greater than two. In our experiments. oc <<«V Theorem 2 states the sufficient conditions (but not necessary) under which procedure S derives the optimal partition into unimodal fuzzy sets. X17. then the partial sets (S. X9. X22.)Since the nearest point to x23 (among the ones that have already been assigned) is x24. X10. In the latter. X27." is well defined. problem [16] where a set of points {(xi)'} is given that must be clustered into categories. and the hypersurface becomes a hyperplane. 6. In this case.x16. That is. X%. At a certain stage in the procedure. This figure demonstrates procedure S. we can modify procedure S to assign subsets Si. then a unimodal fuzzy set is also a convex fuzzy set [21]. All the points up to x25 are automatically assigned to the group in which x15 is the local maximum (see proposition). X30. x13. It is obvious that the resulting partition is dependent on T. More specifically. For example.) S2 = (X25. if a very large threshold is chosen. 
It is also necessary to consider the practical situation where many points have the same grade of membership since this was explicitly excluded in the previous theoretical developments. This problem was solved by allowing for a permutation of the points in the sequence A when they have the same grade of membership. given a characteristic function f. consider part I of the procedure F in which the symmetric fuzzy sets are derived. x28. X14. xll. we can record its nearest point (with a higher grade of membership). 6. Utilizing the result of Theorem 3.IEEE TRANSACTIONS ON COMPUTERS. where now it is assumed that the local maxima are known. x7. X21). X1. in the extreme. If YqYic. Fig. x18. an integer number ni which is the number of sample points in the set Fi = {xid(xi. rather than individual points of S. The dotted line indicates the final partition for this example. A previous knowledge about the data to be partitioned is not essential in order to choose T. If S does not include any points in F7.x Xll 7. It is quite within the realm of possibility to automate this procedure but this was not done for the experiments reported in Section VI. The latter must be determined in such a way that there is "sufficient" discrimination among the points. and X23 is the next point to be assigned. x) < T}. II 14 -23)4 --I X15 X17 X23 X2025 X r. then procedure S derives the optimal partition of S for any finite cx. G and if f(Yq+I)=(Yq). Suppose that G groups have already been initiated and Yq is the point in the sequence A to be assigned next. then . we have used a threshold value T and associated with every point xi. but ox << d. X24. Hence procedure S reduces to an automatic classification of the points. The other points are assigned either to the first group or to the second (where x25 is the local maximum) according to the classification of the nearest point to the point to be assigned. V. the former will be assigned into S2. x19. 
we mean functionals which describe the properties of the desired categories and the structure of the final partition [19]. x) <oc/2}.
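
The threshold rule described in Section V, which associates with every point the number of sample points within a distance T, can be sketched as below. This is an illustration assuming Euclidean distance; the function name `grades_by_threshold` and the sample values are hypothetical.

```python
import math


def grades_by_threshold(points, threshold):
    """n_i = number of sample points x with d(x_i, x) < threshold.

    The point itself is counted (its distance is zero), so a very small
    threshold gives n_i = 1 and a very large one gives n_i = N.
    """
    return [
        sum(1 for q in points if math.dist(p, q) < threshold)
        for p in points
    ]


sample = [(0.0,), (1.0,), (1.5,), (5.0,)]
print(grades_by_threshold(sample, 1.2))    # [2, 3, 2, 1]
print(grades_by_threshold(sample, 0.1))    # [1, 1, 1, 1]
print(grades_by_threshold(sample, 100.0))  # [4, 4, 4, 4]
```

The last two calls reproduce the two extremes mentioned in the text: with a very small or a very large threshold every point receives the same count and no discrimination among the points is achieved.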

the algorithm was applied to six data sets. CPU is the number of minutes required to cluster the data on an IBM 360/75 and includes the time needed to generate the data set. The latter consists of points in a ten-dimensional vector space and belonging to sets described by multimodal spherical and ellipsoidal distributions. handprinted characters [4]. This classification is achieved by assuming a knowledge of the functional form and the parameters of the parent populations and using an optimal classifier (Bayes sense).589 GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS the identity between Yq± 1 and y'. and 25. is checked. . This data set is a part of the version that was used in [7 ].8 3. Let Sii be the subset of points in Si which have the same (maximal) grade of membership as pi. respectively. Although these solutions are known theoretically. and rotation. 20. in particular for the case of a= 25. the spherical sets with = 15.33 5. stretching. To solve this problem. If at least one of these points is on the boundary of Si. a 9 The prototype vectors which have been used for the data sets listed in Appendix II. 2. 2) Et. ''. 2. the computation for the ellipsoidal data sets is difficult be- TABLE I* Group Number Ellipsoidal Spherical u=25 a=15 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 6=25 a=15 101(1) 100 100 100 100 97 95 71 71 69 29 29 22 7 4 3 2 201(100. 1) Em. [18 ] for pattern recognition experiments and is described in [8]. were performed. Two types of errors have been used to grade the partitions. Jv) DeJ(v rpercent) Data CPU *f(vi) indicates the maximal grade of membership in the corresponding test.04 E.1 11. 2) 101(1) 100 100 99(1) 93(1) 97(1) 88 93 87 80 62(2) 78 62(8) 65 45 62 38 60 13 38 13(1) 26 10 24 9 18 12 5 3 10 10 5 4 4t * n(nl. with the same grade of membership as Yq. The ellipsoidal data sets were determined by subjecting the vectors of the spherical data sets to certain linear transformations. G. 
Another consideration is the case in which the maximal grade of membership in a set Si is attained by a number of points. EXPERIMENTS Clustering techniques can be compared only if they are applied to the same data set or to data sets in which the categories are well-known beforehand. consists of Em plus the error produced by the generation of small clusters not in the original set of ten.5 9. We have taken the first ten prototype vectors and generated a data set of 1000 points, 100 points for each prototype vector. In the second series. then pi is not considered as a local maximum. It is appreciated that this partition cannot be achieved by any clustering technique because of overlapping among the categories. i #1. and n3 are from different categories. .10 0. the total error. The same threshold T as in the first series was used. 2. we have modified part II of procedure F (in which a search for the local maxima is performed) in the following way. or on data sets such as. The samples from each category of the former were generated by adding zero-mean Gaussian noise to a prototype vector.40 5. In the first series. TABLE II* 6 2 Spherical 15 20 25 4000 2500 4000 92 17 18 15 20 25 3500 4000 4500 65 31 15 Ellipsoidal JEt (percent) (minute) 0. thus facilitating a comparison of the results of three runs for different initial conditions of the random number generator. defines the error caused by some of the points of category i being assigned to category j. two additional runs with the ellipsoidal data set (derived from the spherical set with σ = 15) obtained with different initial conditions for the random number generator.7 9. are cause the hyperellipsoids which indicate the hypersurfaces of equal probability density have different shapes and orientations (see [7]). 3) 100 100 100 100 99 100 99 99(2) 97 97 94 96 88 94(2) 81 78(1) 79(1) 20 19 66 12 8 21 3 8 21 3 2 19 15 13 1 100 100 100 100 100 99 85 81 a=20 u=25 101(1) 271(89.
In order to be able to reach some significant conclusions concerning the performance of the algorithm we have applied it to artificially generated data sets.29 5. n2. 1) 202(99. The results are shown in Tables III and IV. Thus yq will initiate a new group only if none of the points Yq+ 1' Yq+2. and the ellipsoidal data sets derived from these. Such experiments can therefore be performed either on artificially generated data.3 10.9 3. were generated. is identical to y i= 1. 2 points.8 23. n3) indicates that there is a total of n points in the corresponding group of which n1.3 18.7 16. for example. n2. G. then every point in Sii is examined as the mode of Si. VI.1 0. the mixing error.59 5. i= 1. 83. t In this case 5 additional groups of 4. A summary of the results is given in Tables I and II.6 12. The reference partition that we have used is the partition into the original ten categories of 100 points each. it is therefore a result of the possible overlapping among the categories or the linking of several categories. The optimal partitions for these data sets are unknown but will be characterized by the rates of error associated with the optimal solution of the supervised pattern recognition problem. These small clusters are the result of the fact that a finite sample from a Gaussian distribution can be made up of several modes.1 10.9 Two series of experiments were performed.
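The generation of the spherical data sets described above (a fixed number of points per prototype vector, each obtained by adding zero-mean Gaussian noise to the prototype) can be sketched as follows. This is a minimal illustration and not the program used in the experiments: the function name `generate_spherical_set`, the two placeholder prototypes, and the assumption that the noise is independent in each component with standard deviation `sigma` are all hypothetical.

```python
import random


def generate_spherical_set(prototypes, points_per_prototype, sigma, seed=0):
    """Return (points, labels) for a synthetic 'spherical' data set.

    Each point is a prototype vector plus zero-mean Gaussian noise.
    Assumption: the noise is independent per component with standard
    deviation `sigma`; the source does not spell this out here.
    """
    rng = random.Random(seed)  # a fixed seed gives a repeatable run
    points, labels = [], []
    for label, prototype in enumerate(prototypes):
        for _ in range(points_per_prototype):
            points.append([c + rng.gauss(0.0, sigma) for c in prototype])
            labels.append(label)
    return points, labels


# Placeholder prototypes in ten dimensions (NOT the vectors of Appendix II).
prototypes = [[0.0] * 10, [100.0] * 10]
points, labels = generate_spherical_set(prototypes, 100, sigma=15.0)
```

Changing `seed` corresponds to the different initial conditions of the random number generator mentioned for the second series of experiments.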

5 out of 80 local maxima in the neighborhood of the prototype vectors were not detected. . 3. T2=3500 f(v1) Em (percent) E. although it is doubtful that this could be achieved for the data sets with σ = 25. The difference in the error rates is within 1. We propose two ways in which such sets could be treated to derive partitions which are very similar (if not identical) to the ones discussed in the previous sections. However. The major mixing error can be attributed to the fact that the algorithm did not detect a local maximum in the neighborhood of the prototype vector for some of the categories. If it is such that many points have the same grade of membership. are similar.1 0 5. indicates that the shape of distribution of the points was not a major factor in causing the error. TABLE III Group Number Ellipsoidal =15 1 2 3 4 5 6 7 8 9 10 101(1) 100 100 100 100 97 95 71 71 69 100 100 100 100 100 96 95 92 57 48 11 12 13 14 15 16 17 18 29 29 22 7 4 3 2 38 20 11 8 7 5 5 5 3500 158(58) 100 100 100 100 100 100 100 63 42 16 11 10 5 19 20 21 4 4 J TABLE IV Ellipsoidal a=15. This can be seen in Table I by the entries in the first row. In one of these experiments. The reason for this seems to be that the sample was not large enough. comparing the entries in this column with the corresponding ones in the CPU column gives some support to the above statement. This supports our claim that the algorithm is capable of detecting categories with general distributions of points. greater than 30 000 points in a many-dimensional space) because of both memory space and computing time limitations. the point will usually not become a local maximum. linkage of several categories. then a point which is not in a cluster at all. In the two series of experiments. the fact that the error rates for the ellipsoidal data are comparable with those for the spherical sets. VII. The minimum value of T is constrained by the resolution.
From the results of the second series of experiments (see Tables III and IV) we can see that the partitions generated with data sets obtained for different initial conditions of the random number generator. or possibly by using an underlying model to evaluate the grade of membership and so yield a continuous variation in f. The computer program used the process of assigning a point at a time in procedure S.7 percent.29 3. On the other hand. The value f(v1) in Table II gives some indication as to the discrimination achieved.1) computations of distance and the search for the minimum distance for every point. This problem could be eliminated if the size of the sample were increased. The required computing time lay between three and six minutes on an IBM 360/75 computer and this depended on the discrimination in the values of f. 2. the results are quite encouraging. The experiments show that there is a small amount of overlapping among some of the categories. and 3 categories.33 3. We believe that a better choice for T could have eliminated this mixing for the spherical data set with σ = 20. Generally speaking.8 9. using an additional measure for discriminating among the points which have the same grade of membership. (percent) CPU 65 70 70 0. have been linked together. since the condition for having a symmetric fuzzy set in its neighborhood will not be satisfied. could be saved by applying Theorem 3. These clearly indicate that the samples of some of the categories are in fact multimodal.33 From Table I. These have not yet been tested experimentally. then procedure F requires more computer time (see Section V). a small T is preferred when no previous knowledge of the data set is available. thus linking 58 points of this category with another. where 2. It is estimated that this would result in an approximate 25 percent reduction in computational time. In particular. and 15. it is reasonable to assume that the problem could be eliminated using larger data sets.
respectively. while the maximum is constrained by the possible THE EXTENSION OF THE ALGORITHM TO VERY LARGE DATA SETS The computers available now are generally not capable of clustering very large data sets (say.2 9. This factor could be eliminated by.5 3. given the sample size. we can see that nine to ten major categories as well as a number of small clusters were generated in each test. a local maximum in the neighborhood of one of the categories was not detected. even in this case. respectively. and 6. As a guide. All of the computing time in the latter which includes N(N. columns 2. This is supported by the low values of f(v1) in Table II where for the above tests the entries are 17. A total of 25 experiments (3 to 5 per data set) have been performed and the best results are included in the tables. The threshold T was varied coarsely over a wide range and no fine adjustments were made in order to improve the results. but in a space among several categories. that is.7 11. for example. might have the largest grade of membership. if T is very large. 18.

~N (d) K x Fig. in particular when the categories are multimodal and this information is not known. The resultant characteristic functions fb. Xm}. We can use this algorithm to first partition every category independently into unimodal fuzzy sets. This operation results in the partition (division) of the domain off into three dismin [d(xi. By truncating the sequence A. and the truncated parts can then be introduced sequentially in order to generate the symmetric fuzzy subsets. then part 2 of procedure F and procedure S can be applied to the entire set. to evaluate f (xi). CONCLUSIONS AND REMARKS An algorithm is presented which partitions a given sample from a multimodal fuzzy set into unimodal fuzzy sets. Then it is truncated at several points according to the desired sample size. Once the sample has been partitioned into symmetric fuzzy subsets. Xi. then the algorithm can be employed. where f(x11) =Jji. if d(x1. The use of this algorithm as a clustering technique was also demonstrated. (d). An example of the truncation process when X= E' is given in Fig. xi)> T1. where N Z ni = the number of points in the original data set. f. xj)] fi= Xje(S-Ci) joint domains where each of the latter may be a union of several disjoint subdomains. Then all the other points are introduced sequentially and the distance from x1 to every point is measured. mains and the grade of membership of the points is as in the original functionf The entire domain and the three domains resulting from the truncation are shown in Fig. then the algorithm derives the optimal partition in the sense treated.where Ci is the set of points in the category in which xi is a duced includes sample points in only one of the above do. and (d). if S ' {xld(xi.. (c). then xi will again be introduced until every point has been assigned. (a) The characteristic function f. the corresponding point xi is assignedfinally into the group into which x1 will later be assigned. 
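The distance membership function mentioned above for supervised pattern recognition assigns to every point xi the smallest distance from xi to any point outside its own category, that is, the minimum of d(xi, xj) over xj in S - Ci. A minimal sketch follows. It assumes d is the Euclidean distance and that at least two categories are present; the function name `distance_membership` is hypothetical and is not taken from the paper.

```python
import math


def distance_membership(points, labels):
    """Grade f_i = min over x_j in (S - C_i) of d(x_i, x_j).

    `labels[i]` identifies the category C_i of `points[i]`.
    Assumptions: d is Euclidean distance, and every point has at least
    one point belonging to a different category.
    """
    grades = []
    for p, category in zip(points, labels):
        grades.append(
            min(
                math.dist(p, q)
                for q, other in zip(points, labels)
                if other != category  # only points outside C_i
            )
        )
    return grades


# Three points on a line: two in category 0, one in category 1.
example_points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0)]
example_labels = [0, 0, 1]
example_grades = distance_membership(example_points, example_labels)
```

Under this choice, points deep inside a category receive large grades and points near another category receive small ones, so each category can then be partitioned independently into unimodal fuzzy sets.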
then the grade of membership of x1 is increased by 1.member. The truncation process.GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS Threshold Filtering In this process we reduce the sample size before applying procedures F and S. Here the sequence A is truncated at a point xl where F(xi) =fJ and at xi. Although threshold filtering has been used before. . Such experiments have been reported in [7].. . When this process of filtering is terminated. VIII.* *. (c). 7. nm. Every subset which is pro. The algorithm can also be applied effectively in supervised pattern recognition. It can be seen that a local maximum may sometimes not be detected if the truncation is done immediately after a local maximum point. X2. it has a particular significance here. 591 f x (a) xi x (b) I __El_ (c) fd x i= 1 Now the usual discrimination procedure is employed. A small threshold T1 is employed for filtering purposes while a large value T2 (equivalent to T in the previous section) is used to evaluate the final grade of membership. is introduced. x1. 7. part 1 of procedure F can be applied sequentially to the truncated parts. This is because the points which are filtered out contribute to the partition of the entire set since they are represented in the grade of membership of the points which are included for clustering. for example. truncated at f1 and f11. In this case we associate with every point xi the distance membership function Truncating the Sequence A It can be observed that the major memory space limitations are governed by the requirements of part 1 in procedure F. If N is of such a size that can be handled by the available computer. It is proven that if certain conditions are satisfied. n2. Thus xi is not considered further in the application of procedures F and S. then set f(xi) = ni + ni +. xi) < T1. say x1. respectively. XN with the temporary grades of membership nl. [101. [18]. 7(a). If d(x1. there remains a smaller set of points. 
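The threshold filtering process described in this section can be sketched as follows. The sketch follows the stated steps: a point is introduced, every remaining point closer than T1 is absorbed into it and raises its temporary grade by 1, and the final grade of each retained point is the sum of the temporary grades of the retained points within T2. The function names, the use of Euclidean distance, and the treatment of a distance exactly equal to T1 (the point is retained) are assumptions made for illustration.

```python
import math


def threshold_filter(points, t1):
    """Reduce the sample: return (reps, counts, owner).

    reps[k]   index of the k-th retained point
    counts[k] temporary grade n_k (the point itself plus absorbed points)
    owner[i]  position in `reps` of the retained point that represents i
    """
    remaining = list(range(len(points)))
    reps, counts, owner = [], [], [None] * len(points)
    while remaining:
        head, rest = remaining[0], remaining[1:]
        owner[head] = len(reps)
        count = 1  # the retained point counts itself
        survivors = []
        for i in rest:
            if math.dist(points[head], points[i]) < t1:
                owner[i] = len(reps)  # absorbed; not considered further
                count += 1
            else:
                survivors.append(i)  # will be introduced again later
        reps.append(head)
        counts.append(count)
        remaining = survivors
    return reps, counts, owner


def final_grades(points, reps, counts, t2):
    """f(x_i) = sum of n_j over retained x_j with d(x_i, x_j) < T2."""
    return [
        sum(n for j, n in zip(reps, counts)
            if math.dist(points[i], points[j]) < t2)
        for i in reps
    ]


sample = [(0.0, 0.0), (0.5, 0.0), (10.0, 0.0),
          (10.4, 0.0), (10.2, 0.1), (50.0, 50.0)]
reps, counts, owner = threshold_filter(sample, t1=1.0)
grades = final_grades(sample, reps, counts, t2=15.0)
```

Because the temporary grades sum to the size of the original sample, the filtered-out points still contribute to the final grades of the points that are kept for procedures F and S.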
Experiments on artificially generated data sets were performed and both the results and errors were discussed in detail. if not. On the other hand. and fd. This partition is also optimal in the sense of maximum separation [21]. First the sequence A is generated in the usual way. (b). a further filtering stage can be imposed in the same manner.nN. (b). The first point. x) < T2} = {xi.

since xn is for every n< Ni: not a local maximum. since " C depends on the number of categories that the given data set represents. This can be Rt = d(ui. we may consider the line segment APPENDIX I joining x. xr)>> (pi. and (1) shows that this point is in Si. and the point xi. The storage requirement is (20N+ CN+ S)10 bytes. Clearly.. thus generating the sets S(n -1) c S1 and S(n 1) S and that xneS1 is the point to be assigned next. thus xq will prevent xp from being assigned into Si since it is not replaced as a r1 = {xld(x. then it is an interior point. Thus if Ni is the xer number of points that have been assigned into group i.f(xn). especially in practical problems in which the categories are not distributed in "round" Assumption 1 implies that R.xt) = min [d(Ci. /2}.0.") and Assumption 1 implies that (IFnS) is f(xn) > f(Xr) d(Iti. x) = oc/2}. Defining the set F= x)<e/2}. then and IF = {xjd(u. f (xn) <(x) for every xe.592 IEEE TRANSACTIONS ON COMPUTERS. Si = {xjfd(ui. pi)=e/2. xreSi d (x. f(xv) > f(xn) and Disjointness is demonstrated by the same argument. all the points in Sii in the sequence Ai.if it is a local maximum of f.. < d(xn. xi) < q}.it must be on the boundary of Si. Since xq is assigned to group j.) < d(Iti. But xq precedes xp as a Let candidate to be assigned into group i. Xk(SthSi) an advantage. xv) = min [d(Xn. arranged in the order Proof of Theorem 2 that they stand in A. unless it is assigned to Si.j =# i. Suppose that (n-1) points have already been assigned that is. correctly. .Ni. Obviously. assumption 2 assures that F includes at least one sample Lemma: The sets Si are disjoint symmetric fuzzy sets. and thus will block all the points in Sii from being and therefore can be applied to large data sets. N is the number of points to be partitioned. 
(1) 3) The shape of the distribution of the points in a group Now let be the xt sample point such that (category) can be quite general because of the distributions that the unimodal fuzzy sets include. xv) < a.Xr)J forr=n + 1.1) [d(x. It remains to be shown that tively. xp)] if and only if d(xn. then assumption 1 of Theorem 1 depends on the particular resolution of the magnitude implies that the subset Si is of the components of the data vectors. JULY 1970 It is suggested that the clustering algorithm reported in groupj. d(x0. in our experiments. f(u) = sup [f(x)] candidate point. and same set Si if and only if their order in Aii corresponds to S2 the optimal partition of S. Proof: Let Aii define a subsequence of A. . xjce(F. their order in Ai. xq) for xreSii.]j i. n-1. and S. e. and S is the number of storage locations re. x)] XpOS2 -n1) Then xp in Aii must be assigned first. Xq cient to show that there exists a sample point x eS(n1 and such that Ai = (. d(i. Suppose Xq is an interior point in Si and has been assigned into ocx-+O establishes the proof. let us assume that f has only points to be assigned into group i. where assigned into group i... xj)} for j = 1. of the points point that have been assigned into group i. not empty. Bearing in mind pro. 2) The amount of computing time is relatively small.) < min [d(xn. x) < R1} includes at least one sample point in Si. S If pi is a local maximum. respec. Then there exists a subset Sii c Si which satisfies this paper possesses three advantages over the ones dis.F. x. quired for the given set of data points. Xj)]. min X E >#(n. xv) < 2. it will not be replaced as the candidate point 1) It does not require a great amount of fast core memory in group i. Let Ai be the sequence of candidate Without loss of generality.. and ji. d(pi.. . Note that this estimate of the total memory space is correct for N up to 32 767. 
any two points xp and xq can be assigned to the separating f into the two unimodal fuzzy sets.two local maxima. x)). It is suffiAiiad * xp.xj. Suppose their order does not correspond. * * * Xp * * . thus min XpES2 [d(x.. To show that the set F ={xld(x. S {xld(xin. . Let H be the optimal hypersurface cedure F.. f(u) . Let xv be the point such that d(x. C= 5 was found to be sufficient.. x) <. 20N and Proof of Theorem 1 CN are required for the fixed portion of the program The lemma implies that if ji is not a local maximum then and the variable length data sequences (A.Xk)]. x.. clusters. 2. Xq. We have shown that there is a sample point that which proves that S is symmetric. Ai). on this line such that Proof of Theorem I d(xin. xi)] xveS such .the condition d(4i. Thus xq precedes cussed in the literature. where n >E/2.. In the limit when a. H) < a.).

1967.. vol. APPENDIX II The following are the ten prototype vectors. d(pi. [12] J. Calif. February 1967. 0. "State of the art in pattern recognition. December 1966. Rogers and T. xp)2co for every xp in S2. 1965. and S2 is the optimal partition. A. pp. Condition 1 implies that d(xn. Research Rept. Menlo Park. Define the following sets: Fk= {xld(Xk. Bonner. This contradicts the above assumption. pp.1) c S.. Now xU and xn must belong to S. 195-205. An application of Theorem 2 completes the proof since if S. 1963.x2) = min [d(pi. Gengerelli. A. and S. vol. J. T. E. "Pattern classification by iteratively determined linear and piecewise linear discriminant functions. alnd Develop. which implies that Si is not symmetric. Kaminuma. 1965. let Xk be the point of intersection with the lowest value f). vol. 137-141. Computers. 220-232. A pattern recognition experiment with near-optimum results. vol. Ball. and Develop. May 1968. A. H. C-17. vol." Amer. vol. Cover and P. December 1966. X) (} and Sk = S n Fk Condition 1 (see Theorem 2) implies that Sk is not empty. Stat. A. 21-27. Res. "Nearest neighbor pattern classification. E. C." IEEE Trans. 294-302. T. It is sufficient to show that if Si is a set generated by the above procedure. vol. September 1964. [13] T. pp. Fischler. xj)]. A. Calif.1) c S2. Assoc. Let xu be the point such that d(xn." presented at the 1964 Western Psych. [11] E. 58. 22-32. EC-12. [19] J. 1963.11 65 33 11 V4 <5 -27 59 -55 25 -27 -33 -47 67j 33 31 53 .71 . vol. Y.. say the sets Si.: Spartan. [2] G. April 1965. [5] T. April 1966. J. AFIPS Proc. "An autonomous reading machine. M. "Some methods for classification and analysis of multivariate observations. Psych.37 -65 V8 V9 Vo 3 99 . vol." IEEE Trans. Takekawa. J. [4] R. July 1965." IBM J. Dorofeyuk. Stanford Research Institute. is on one side of H and S2 is on its other side. EC-15. H. thus the sets Si generated by part 1 of procedure F are always symmetric and disjoint fuzzy sets. pp. vol. [20] J. 
"Optimal classification into groups: An approach for solving the taxonomy problem. and S2 denote the optimal partition of S. "A computer program for classifying plants.. pp. 27. Forgy. pp. xu) <c. 457-468. if not. 115-118. [21 ] L. [15] R. Watanabe. Tanimoto. Calif. on Math. ". 9. J. Nagy. Suppose that (n-1) points have already been assigned correctly. W. Hall.. Calif. 789-798. J." IBM J.. pp." Proc. Res. Hart. 836-862. vol. May 1968." Pattern Recognition. H. Stat. pp. "On some clustering techniques. 8. 1. 1. which were used to generate the data sets for the experi- ments discussed in Section VI. Washington. IEEE. Xr) < Proof of Theorem 4 Let S. Assoc. Rosen and D. Electronic Computers (Correspondence). pp. Yorktown Heights. Firschen and M. N. Now if c--+O and xr is any point in Sk." IEEE Trans. Then the application of Theorem 2 will complete the proof. EC-15. [7] R. Casey and G. Xu) min = Xj E (S'(n 1) ')tasun - - 1 )) [d(xn.Hierarchical grouping to optimize an objective function. Berkeley. Menlo Park. pp. pp. 56. 320-2915. 593 The vectors are the first ten of the eighty prototype vectors given in [8]. "A technique for determining and coding subclasses in pattern recognition problems. Electronic Computers (Correspondence). "Fuzzy sets. 1958. then it is on one side (either inside or outside) of H. 132. [17] D. Ball and D. 1966." available from C. Let x2 be the point such that d(pi.GITMAN AND LEVINE: ALGORITHM FOR DETECTING UNIMODAL FUZZY SETS Proof of Theorem 3 In this proof we make use of the lemma to Theorem 1. vol." IEEE Trans.. Fisher. "Detecting natural clusters of individuals. Note that in the proof of this lemma. Santa Monica. and Xk be the point at which L intersects H (suppose that there is one point of intersection. Dammann." IBM Rept. January 1964." J. Information Theory. [3] R. RC-1768. "ISODATA." Proc. given by their integer components in the ten-dimensional space. 236-244. Rubin. Ward. since if there are no sample points in 17. 
IT-13.I 67 47 89 V_6 V7 I 13 -19 -37 . Meeting." Information and Control. [18] C. pp.297. [8] . and S(n . Mattson and J. "Reduction of clustering problem to pattern recognition. 53. 666-667. Duda and H. 8. vol... then S." Amer. 1728-1737. also IBM Corp. 5th Berkeley Symp. "On grouping for maximum homogenity." Stanford Research Institute. 338353. VI v2 -77 -57 -67 -57 -131 19 27 -65 -63 83 53 51 67 -87 -73 -69 73 53 49 57 V3 -791 -13 69 . [9] W. 55." Science.5 -43 45 47 -75 -71 -93 -87 -49 -21 . then f(Xr) < f(x2) and d(pi. Hall. Statist. The order of the vectors bears no relation to the group numbers in Tables I and III. E. XjG5i2 Let L be the line segment joining x2 and pi. January 1967. [16] G. thus generating S(¶. D. "A method for detecting subgroups in a population and specifying their membership. A novel method of data analysis and pattern classification. 281. [6] A. pp. then d(xn. [10] 0. pp. October 1960. August 1966. pt. G. "Teaching algorithms for a pattern recognition machine without a teacher based on the method of potential functions. none of the constraints of Theorem 1 were applied.17 41 -65 . MacQueen.." IEEE Trans. L. Suppose that Si includes points on both sides of H. X2). "Computer-generated data for pattern recognition experiments. pp. "Automatic subclass determination for pattern-recognition applications." Automation and Remote Control. April 1963. "Data analysis in the social sciences: What about the details?" 1965 Fall Joint Computer Conf. and suppose xneS1 is the next point to be assigned. 492-503. 27.. Nagy. and Prob. pp. Rosen.xj)]. Let us assume that f has only two maxima and let H be the optimal hypersurface separating f into the two unimodal fuzzy sets. 533-559. Assoc. D. [14] J. vol. 1969.. Fossum. and Si2(Si1uSi2=Si) and suppose that pi (the mode of Si) is in Si. Zadeh. . vol.: University of California Press.3 51 43 27 21 29 3 23 59 43 5 -65 -73 -97 -35 11 35 25 -43 51 -23 21 5 87 -17 15 -41 -85 REFERENCES [1] G.. vol. pp. 
Electronic Computers.37 -71 -91 29 -59 69 65 -25 .