Rough clustering
Pawan Lingras1∗ and Georg Peters2
c 2011 John Wiley & Sons, Inc. Volume 1, January/February 2011
WIREs Data Mining and Knowledge Discovery Rough clustering
[Figure: Rough approximations. A Boolean separator splits clients into the sets poor and rich, contrasted with fuzzy memberships λpoor and λrich; each set's lower approximation forms its positive region, and the upper approximations overlap in a boundary region between the two sets.]
USD 500,000 are treated as poor (lower approximation of the set poor) and clients owning more than USD 1 million are served as rich (lower approximation of the set rich). The group in between needs to provide more evidence before it can eventually be assigned to one category of clients. This unclear membership is shown by assigning these clients to the boundary region between the two sets.

The α-cut or defuzzification of fuzzy set theory may seem like a similar instrument at first sight. However, these concepts differ significantly: fuzzy sets are based on neighborhood relations, whereas rough sets relate to missing or contradicting information.

Note that this is a simplistic comparison of Boolean concepts and rough as well as fuzzy sets. For a more sophisticated interaction between the two set theories, the reader is encouraged to study the references cited at the beginning of this section.

ADOPTION OF ROUGH SET THEORY TO CLUSTERING

Rough sets were proposed using equivalence relations. However, it is possible to define a pair of upper and lower approximations A(X), Ā(X), or a rough set, for every set X ⊆ U as long as the properties specified by Pawlak4 are satisfied. Yao and Lin,23 and Yao24 described various generalizations of rough sets by relaxing the assumption of an underlying equivalence relation. Polkowski and Skowron,25 and Skowron and Stepaniuk26 discussed a similar generalization of rough set theory.

Lingras and West11 provided an efficient alternative based on an extension of the k-means algorithm. k-means clustering is one of the most popular clustering techniques.27 Incorporating rough sets into k-means clustering requires the addition of the concept of lower and upper approximations. The incorporation required a redefinition of the calculation of the centroids to include the effects of lower and upper approximations. The next step was to design criteria to determine whether an object belongs to the lower or upper approximation of a cluster.

The rough k-means approach has been a subject of further research. Peters16 discussed various refinements of Lingras and West's original proposal. These included the calculation of rough centroids and the use of ratios of distances, as opposed to differences between distances, similar to those used in the rough set-based Kohonen algorithm described in Ref 10. The rough k-means and its various extensions have been found to be effective in distance-based clustering.

Let us consider a hypothetical classification scheme U/P = {X1, X2, ..., Xk}, which partitions the set U based on certain criteria. The actual values of Xi are not known. The classification of supermarket customers is an example of such a hypothetical classification scheme. Depending on the predominant usage, a set of supermarket customers can be classified as loyal high spenders, loyal moderate spenders, semi-loyal high spenders, semi-loyal moderate spenders, or low spenders. However, the actual sets corresponding to each one of these classes are not known. Let us assume that due to insufficient knowledge it is not possible to precisely describe the sets Xi, 1 ≤ i ≤ k, in the partition. However, it is possible to define each set Xi ∈ U/P using its lower and upper approximations A(Xi), Ā(Xi) based on the available information. The available information for supermarket customers may consist of their transaction records. We will use vector representations, v for an object and xi for cluster Xi. We are considering the upper and lower approximations of only a few subsets of U. Therefore, it is not possible to verify all the properties of rough sets.8 However, the family of upper and lower approximations of xi ∈ U/R is required to follow some of the basic rough set properties such as:

P1. An object v can be part of at most one lower approximation.

P2. v ∈ A(xi) ⇒ v ∈ Ā(xi).

P3. v is not part of any lower approximation ⇔ v belongs to two or more upper approximations.

The next step in rough clustering is to determine whether an object belongs to the upper or lower approximation of a cluster. For each object vector v, let d(v, xi) be the distance between itself and the weight vector xi of cluster Xi. The form of the distance function depends on the application; the Euclidean distance function or the inverse of a similarity function are two of the possible choices. The ratios d(v, xi)/d(v, xj) are used to determine the assignment of v as follows:

A1. If d(v, xi) is the minimum for 1 ≤ i ≤ k and d(v, xi)/d(v, xj) ≥ threshold for any pair (i, j), then v ∈ Ā(xi) and v ∈ Ā(xj). Furthermore, v is not part of any lower approximation. This criterion guarantees that property (P3) is satisfied.

A2. Otherwise, v ∈ A(xi) such that d(v, xi) is the minimum for 1 ≤ i ≤ k. In addition, by property (P2), v ∈ Ā(xi). This criterion also satisfies property (P1).
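As a concrete reading of criteria (A1) and (A2), the assignment step might be sketched as follows in Python. This is an illustrative sketch, not the authors' implementation: the Euclidean distance, the threshold value of 0.7, and the helper name `rough_assign` are assumptions.

```python
import numpy as np

def rough_assign(v, centroids, threshold=0.7):
    """Assign object v per (A1)/(A2); returns (lower, upper) index sets.

    threshold is a value in (0, 1]: the closer d(v, xi)/d(v, xj) is to 1,
    the more ambiguous the assignment.  0.7 is an assumed setting.
    """
    d = np.array([np.linalg.norm(v - x) for x in centroids])
    i = int(np.argmin(d))  # index of the closest cluster center
    # (A1): any other cluster almost as close makes v a boundary object
    close = [j for j in range(len(centroids))
             if j != i and d[i] / d[j] >= threshold]
    if close:
        return set(), {i, *close}   # no lower approximation, several upper
    # (A2): unambiguous -- v joins A(xi) and, by (P2), also Ā(xi)
    return {i}, {i}
```

With centroids far apart, most objects fall into exactly one lower approximation; an object near the midpoint between two centroids ends up only in the two upper approximations, so properties (P1)–(P3) hold by construction.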
Possibly the most popular clustering technique is the k-means algorithm.27,28 The name k-means comes from the means of the k clusters that are created from the set of objects using the method. The process begins by randomly choosing k objects as the centroids of the k clusters. The objects are assigned to one of the k clusters based on the minimum value of the distance between the object vector v and the cluster vector xi. After the assignment of all the objects to the various clusters, the new centroid vectors of the clusters are calculated as:

$$x_i = \frac{\sum_{v \in x_i} v}{|x_i|} \qquad (1)$$

where |xi| is the cardinality or size of cluster xi. If the size of cluster xi is 0, then xi is a null vector.

The objects in rough k-means clustering are assigned to either lower or upper approximations using the criteria (A1) or (A2). Therefore, we need to modify Eq. (1) for rough k-means as:

If $|A(x_i)| \neq 0$ and $|\bar{A}(x_i) - A(x_i)| \neq 0$, then
$$x_i = w_l \times \frac{\sum_{v \in A(x_i)} v}{|A(x_i)|} + w_u \times \frac{\sum_{v \in \bar{A}(x_i) - A(x_i)} v}{|\bar{A}(x_i) - A(x_i)|};$$
else if $|A(x_i)| \neq 0$ and $|\bar{A}(x_i) - A(x_i)| = 0$, then
$$x_i = \frac{\sum_{v \in A(x_i)} v}{|A(x_i)|};$$
else if $|A(x_i)| = 0$ and $|\bar{A}(x_i) - A(x_i)| \neq 0$, then
$$x_i = \frac{\sum_{v \in \bar{A}(x_i) - A(x_i)} v}{|\bar{A}(x_i) - A(x_i)|}. \qquad (2)$$

The rough Davies–Bouldin index proposed by Mitra et al.15 takes into account the compactness within a cluster as well as separation among clusters. A modified version is defined as follows19:

$$S_i = \left( \frac{1}{|A(x_i)|} \sum_{v \in A(x_i)} \|v - x_i\|_2^q + \frac{1}{|\bar{A}(x_i) - A(x_i)|} \sum_{v \in \bar{A}(x_i) - A(x_i)} \frac{\|v - x_i\|_2^q}{b_v} \right)^{1/q}. \qquad (3)$$

Here b_v is the number of boundary regions object v belongs to.

RELATED APPROACHES TO ROUGH CLUSTERING

A Class of Rough Partitive Algorithms
In k-means clustering algorithms, cluster centers are represented by artificial objects that correspond to the means of the clusters. In contrast to that, Kaufman and Rousseeuw29 proposed k-medoid clustering where the cluster centers are represented by real objects. In switching regression models, the clusters are represented by functions instead of objects.30

Peters and Lampart17 suggested rough k-medoids and Peters18 proposed a rough switching regression model, which, together with the rough k-means, form a class of rough partitive algorithms.
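Returning to rough k-means, the three-case centroid update of Eq. (2) and the compactness term S_i of Eq. (3) might be sketched as below. The weight values w_l = 0.7 and w_u = 0.3 and the function names are illustrative assumptions, not part of the original formulation.

```python
import numpy as np

def rough_centroid(objects, lower, upper, w_l=0.7, w_u=0.3):
    """Eq. (2): centroid from the lower approximation and boundary region.

    objects: (n, d) array; lower/upper: index sets with lower <= upper.
    w_l and w_u are assumed relative weights of the two regions.
    """
    boundary = upper - lower
    if lower and boundary:
        return (w_l * objects[sorted(lower)].mean(axis=0)
                + w_u * objects[sorted(boundary)].mean(axis=0))
    if lower:                                    # empty boundary region
        return objects[sorted(lower)].mean(axis=0)
    return objects[sorted(boundary)].mean(axis=0)  # empty lower approximation

def compactness(objects, lower, upper, centroid, b, q=2):
    """Eq. (3): rough Davies-Bouldin compactness S_i of one cluster.

    b maps an object index to b_v, the number of boundary regions the
    object belongs to.  Assumes both regions are non-empty.
    """
    boundary = upper - lower
    in_lower = sum(np.linalg.norm(objects[i] - centroid) ** q
                   for i in lower) / len(lower)
    in_boundary = sum(np.linalg.norm(objects[i] - centroid) ** q / b[i]
                      for i in boundary) / len(boundary)
    return (in_lower + in_boundary) ** (1.0 / q)
```

Iterating assignment and `rough_centroid` until the approximations stabilize gives the rough k-means loop; `compactness` can then score the result for validity-driven model selection.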
another one in 2004 by Mitra,14 and an evolutionary k-medoid in 2008 by Peters et al.32

We will discuss the evolutionary rough k-medoid by Peters et al. in greater detail as it is probably the most efficient GA-based rough clustering. A GA can be used to search for the most appropriate k-medoid. The genome will contain k genes, each corresponding to a medoid. Such a genome will be smaller than the one used by Lingras8 and Mitra.14 The smaller genomes will reduce the space requirements and also facilitate faster convergence. The values of genes for the medoids are discrete and limited to the number of objects in the dataset. If we number the objects from 1 to n, then each gene can take an integer value in the range 1–n. This restriction on the values of genes also reduces the search space, allowing for faster convergence. The rough k-medoid algorithm can use assignments (A1) and (A2) to determine the membership degrees of objects to lower and upper approximations of different clusters. A cluster validity measure such as the rough Davies–Bouldin index is optimized to arrive at an appropriate clustering scheme. The main advantage of the evolutionary rough clustering is the flexibility of the optimization criteria.

Kohonen Network Based Rough Clustering
The unsupervised learning based on the Kohonen rule33 uses a competitive learning approach. In competitive learning, the output neurons compete with each other. The winner output neuron has an output of 1; the rest of the output neurons have outputs of 0. Competitive learning is suitable for classifying a given pattern into exactly one of the mutually exclusive clusters. A Kohonen network consists of two layers. The first layer is the input layer and the second layer is called the Kohonen layer. The network receives the input vector for a given pattern. If the pattern belongs to the ith cluster, then the ith neuron in the Kohonen layer has an output value of 1 and the other Kohonen layer neurons have output values of 0. The training set of input vectors is presented to the network several times. For each iteration, an object v contributes to the weight vector xi of a cluster.

Incorporating rough sets into the Kohonen algorithm requires an addition of the concept of lower and upper approximations in the equations which are used for updating the weights of the winners.10 A neuron in the Kohonen layer consists of two parts, a lower neuron and an upper neuron. The lower neuron has an output of 1 if an object belongs to the lower approximation of the cluster. Similarly, a membership in the upper approximation of the cluster will result in an output of 1 from the upper neuron. Because an object belonging to the lower approximation of a cluster also belongs to its upper approximation, when the lower neuron has an output of 1, the upper neuron also has an output of 1. However, membership in the upper approximation of a cluster does not necessarily imply membership in its lower approximation. Therefore, the upper neuron contains the lower neuron. Assignments (A1) and (A2) are used to determine the output from the lower and upper neurons. For assignment (A1), the weight vectors xi and xj are modified as $x_i^{new} = x_i^{old} + a_u(v - x_i^{old})$ and $x_j^{new} = x_j^{old} + a_u(v - x_j^{old})$. For assignment (A2), the weight vector xi is modified as $x_i^{new} = x_i^{old} + a_l(v - x_i^{old})$. Usually, $a_l > a_u$.

Support Vector Based Rough Clustering
The rough clustering methods as described above are based on Euclidean distances in the original input data space. Support vector clustering (SVC)34 is a kernel-based clustering method that is capable of identifying clusters having arbitrary shapes. Here, the clustering problem is formulated as a quadratic programming (QP) problem to learn a minimum radius sphere enclosing the image of the dataset to be clustered in a high-dimensional feature space. In SVC, this problem is solved by employing a method called the kernel trick34 that helps solve the QP problem without explicit mapping of data points from the input data space to the higher dimensional feature space. Once the QP problem is solved, SVC uses a graph-based cluster labeling method to identify the arbitrarily shaped clusters existing in the input data space. Rough SVC (RSVC)35 is a soft clustering method derived from the SVC paradigm. It achieves soft data clustering by a natural fusion of rough set theory and SVC. In RSVC, the QP problem involved in SVC is modified to impart a rough set theoretic flavor. The modified QP problem obtained for RSVC turns out to be the same as the one involved in SVC. Therefore, the existing solution strategies used for solving the SVC–QP problem can be used for solving the RSVC–QP problem as well. The cluster labeling method of RSVC is a modified version of the one used in SVC.

Dynamic Rough Clustering
In many real-life situations where cluster algorithms are applied, the underlying data structure is not stable but changes over time. For example, the shopping patterns of customers change dramatically within a calendar year. In autumn, they buy warm clothes to prepare for the winter and in spring they look for t-shirts instead of gloves. To address changes in the
data structures, Peters and Weber19 proposed a dynamic approach to rough clustering where the initial parameters of the algorithm are updated in cycles to better adapt to changing environments, such as the seasonal changes in customer behavior.

Further Approaches
Besides the approaches discussed above, several further approaches to rough clustering have been proposed. They include early approaches to clustering based on the set interpretation of rough sets by do Prado et al.36 and Voges et al.37,38 Recently, Yao et al.39 suggested relaxing some of the properties of rough clustering, in particular the requirement that objects in boundary areas belong to at least two clusters, and introduced an interval-based clustering approach.

The average pattern for the lower approximation of the commuter/business cluster had the least variation over the year. The recreational cluster, conversely, had the most variation. The variation for the long-distance cluster was less than that of the recreational but more than that of the commuter/business cluster. Lingras8 illustrated how one of the highway sections near counter number C013201 may have been commuter/business or long distance in 1985. The monthly pattern for the highway section fell in between the two clusters. The counter C013201 is located on Highway 13, 20 km west of the Alberta–Saskatchewan border. It is an alternate route for travel from the city of Saskatoon and surrounding townships to townships surrounding the city of Edmonton. Rough set representation of clusters made it possible to identify such intermediate patterns.
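Returning to the Kohonen network based approach, the interval-set weight updates driven by assignments (A1) and (A2) might be sketched as follows. The learning rates a_l = 0.1 and a_u = 0.05 are assumed values chosen so that a_l > a_u, as the text indicates; the function name is hypothetical.

```python
import numpy as np

def rough_kohonen_update(v, weights, lower, upper, a_l=0.1, a_u=0.05):
    """Move winner weight vectors toward object v (sketch of Ref 10).

    lower/upper: index sets from assignments (A1)/(A2).  A lower-
    approximation object pulls its single winner with the larger rate
    a_l; a boundary object pulls all of its upper-approximation winners
    with the smaller rate a_u.  Modifies weights in place.
    """
    if lower:                                  # (A2): unambiguous object
        for i in lower:
            weights[i] += a_l * (v - weights[i])
    else:                                      # (A1): boundary object
        for j in upper:
            weights[j] += a_u * (v - weights[j])
    return weights
```

Repeatedly presenting the training vectors and applying this update lets unambiguous objects shape a cluster strongly, while boundary objects nudge all of their candidate clusters more gently.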
The web users were clustered using conventional Boolean clustering, FCM, and rough k-means to compare these three clustering philosophies. Table 1 shows the cardinalities of conventional clusters, the rough clusters, and fuzzy clusters with memberships greater than 0.6. The actual numbers in each cluster vary based on the characteristics of each course. For example, in the fuzzy clusters, the first-term course had significantly more workers than studious visitors, whereas the second-term course had more studious visitors than workers. The increase in the percentage of studious visitors in the second term seems to be a natural progression. It should be noted that the progression from workers to studious visitors was more obvious with fuzzy clusters than the conventional clusters and the rough clusters. Interestingly, the second year course had a significantly larger number of workers than studious visitors. This seems to be counterintuitive. However, it can be explained based on the structure of the websites. Unlike the two first year courses, the second year course did not post the class notes on the web. The notes downloaded by these students were usually sample programs that were essential during their laboratory work.

Clustering Supermarket Customers
The data consisted of transactional records over a 26-week period from one supermarket store.40 Sorted patterns of weekly visits and spending were used to represent the customers. The average spending and visit patterns enabled Lingras et al.40 to distinguish between five types of customers: loyal big spenders (G1), loyal moderate spenders (G2), semi-loyal potentially big spenders (G3), potentially moderate to big spenders with limited loyalty (G4), and infrequent customers (G5). Even though for most weeks G2 had higher spending than G3, the highest spending of G3 was higher than that of G2. The region had only one store and hence it is likely that G3 did not find it convenient to shop at the supermarket on a regular basis.

Although the lower approximations tended to provide distinguishing characteristics of various clusters, the boundary regions of the clusters tended to fall between the lower approximations of two regions. There was a large difference between the lower approximations of G1 and G4. However, their boundary regions seemed to be less distinct. The boundary regions of G1 and G4 fell between the lower approximations of those groups.

CONCLUSION
In practical applications, an object may exhibit characteristics of multiple clusters. The conventional Boolean clustering forces these objects into one of the clusters. This can lead to potentially erroneous assignments, leading to incorrect decisions. The fuzzy and rough clustering algorithms allow an object to belong to more than one cluster. The resulting clustering scheme may consist of overlapping clusters. Although the rough clustering removes the restriction of the unique cluster assignment, it is less descriptive than the fuzzy clustering. In some cases, a decision maker may find a concise description from rough clustering less overwhelming. This paper described the adoption of rough set theory to clustering, including a rough extension of the popular clustering algorithm called k-means. Rough clustering based on other techniques such as genetic algorithms, Kohonen networks, and SVC was also described. Rough clustering has been used in a variety of applications including forestry, medicine, image processing, web mining, supermarkets, and traffic analysis. The paper described three such applications to highlight various aspects of rough clustering, including a comparison with Boolean and fuzzy clustering.
REFERENCES
1. Joshi A, Krishnapuram R. Robust fuzzy clustering methods to support web mining. Proceedings of the Workshop on Data Mining and Knowledge Discovery, SIGMOD '98, June 2–4 1998. Seattle, Washington, 1998, 15:1–8.
2. Bezdek JC. Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum; 1981.
3. Pedrycz W, Waletzky J. Fuzzy clustering with partial supervision. IEEE Trans Syst Man Cybern B 1997, 27:787–795.
4. Pawlak Z. Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht, Boston: Kluwer Academic Publishers; 1992.
5. Hirano S, Tsumoto S. Rough clustering and its application to medicine. Inform Sci 2000, 124:125–137.
6. Hirano S, Tsumoto S. On constructing clusters from non-Euclidean dissimilarity matrix by using rough clustering. JSAI Workshops. Kitakyushu City, Japan, 2005, 5–16.
7. Ho TB, Nguyen NB. Nonhierarchical document clustering by a tolerance rough set model. Int J Intell Syst 2002, 17:199–212.
8. Lingras P. Unsupervised rough set classification using GAs. J Intell Inform Syst 2001, 16:215–228.
9. Lingras P, Chen M, Miao D. Rough cluster quality index based on decision theory. IEEE Trans Knowledge Data Eng 2009, 21:1014–1026.
10. Lingras P, Hogo M, Snorek M. Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets. Web Intell Agent Syst Int J 2004, 2:217–230.
11. Lingras P, West C. Interval set clustering of web users with rough k-means. J Intell Inform Syst 2004, 23:5–16.
12. Maji P, Pal SK. Rough set based generalized fuzzy c-means algorithm and quantitative indices. IEEE Trans Syst Man Cybern B 2007, 37:1529–1540.
13. Maji P, Pal SK. RFCM: a hybrid clustering algorithm using rough and fuzzy sets. Fundam Informaticae 2007, 80:477–498.
14. Mitra S. An evolutionary rough partitive clustering. Pattern Recognit Lett 2004, 25:1439–1449.
15. Mitra S, Banka H, Pedrycz W. Rough-fuzzy collaborative clustering. IEEE Trans Syst Man Cybern B 2006, 36:795–805.
16. Peters G. Some refinements of rough k-means. Pattern Recognit 2006, 39:1481–1491.
17. Peters G, Lampart M. A partitive rough clustering algorithm. Proceedings RSCTC'06, LNAI 2006, 4259:657–666.
18. Peters G. Rough clustering and regression analysis. Proceedings RSKT'07, LNAI 2007, 4481:292–299.
19. Peters G, Weber R. A dynamic approach to rough clustering. Proceedings RSCTC'08, LNAI 2008, 5306:379–388.
20. Peters JF, Skowron A, Suraj Z, Rzasa W, Borkowski M. Clustering: a rough set approach to constructing information granules. Proceedings of 6th International Conference on Soft Computing and Distributed Processing. Rzeszow, Poland, 2002, 57–61.
21. Dubois D, Prade H. Rough fuzzy sets and fuzzy rough sets. Int J Gen Syst 1990, 17:191–209.
22. Zadeh LA. The concept of linguistic variable and its application to approximate reasoning. Information Sciences, I: 1975, 8:199–249; II: 1975, 8:310–357; III: 1975, 9:43–80.
23. Yao YY, Lin TY. Generalization of rough sets using modal logic. Intell Autom Soft Comput 1996, 2:103–120.
24. Yao YY. Constructive and algebraic methods of the theory of rough sets. Inform Sci 1998, 109:21–47.
25. Polkowski L, Skowron A. Rough mereology: a new paradigm for approximate reasoning. Int J Approximate Reason 1996, 15:333–365.
26. Skowron A, Stepaniuk J. Information granules in distributed environment. Proceedings RSFDGrC'99, LNCS 1999, 1711:357–365.
27. Hartigan JA, Wong MA. Algorithm AS136: a k-means clustering algorithm. J Royal Statist Soc C (Appl Statist) 1979, 28:100–108.
28. MacQueen J. Some methods for classification and analysis of multivariate observations. Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability, June 21–July 18 1965 and December 27 1965–January 7 1966. Berkeley, California, 1 1967, 281–297.
29. Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. 2nd ed. New York: John Wiley & Sons; 2005.
30. Quandt R. The estimation of the parameters of a linear regression system obeying two separate regimes. J Am Stat Assoc 1958, 53:873–880.
31. Holland JH. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press; 1975.
32. Peters G, Lampart M, Weber R. Evolutionary rough k-medoid clustering. Trans Rough Sets 2008, VIII:289–306.
33. Kohonen T. Self-Organization and Associative Memory. Berlin: Springer; 1988.
34. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V. Support vector clustering. J Machine Learn Res 2001, 2:125–137.
35. Asharaf S, Shevade SK, Murty NM. Rough support vector clustering. Pattern Recognit 2005, 38:1779–1783.
36. do Prado HA, Engel PM, Filho HC. Rough clustering: an alternative to find meaningful clusters by using the reducts from a dataset. Proceedings RSCTC'02, LNAI 2002, 2475:234–238.
37. Voges KE, Pope NK, Brown MR. Cluster analysis of marketing data examining on-line shopping orientation: a comparison of k-means and rough clustering approaches. In: Abbass HA, Sarker RA, Newton CS, eds. Heuristics and Optimization for Knowledge Discovery. Hershey, PA: Idea Group Publishing; 2002, 208–225.
38. Voges KE, Pope NK, Brown MR. A rough cluster analysis of shopping orientation data. Proceedings of Australian and New Zealand Marketing Academy Conference 2003, 1625–1631.
39. Yao YY, Lingras P, Wang R, Miao D. Interval set cluster analysis: a re-formulation. Proceedings of RSFDGrC'09, LNCS 2009, 5908:398–405.
40. Lingras P, Hogo M, Snorek M, West C. Temporal analysis of clusters of supermarket customers: conventional versus interval set approach. Inform Sci 2005, 172:215–240.
FURTHER READING
De SK. A rough set theoretic approach to clustering. Fundam Informaticae 2004, 62:409–417.
Falcón R, Jeon G, Bello R, Jeong J. Rough clustering with partial supervision. Stud Comput Intell 2009, 174:137–161.
Hung CC, Purnawan H. A hybrid rough k-means algorithm and particle swarm optimization for image classification.
Proceedings MICAI’08, LNCS 5317 2008, 585–593.
Kumar P, Krishna PR, Bapi RS, De SK. Rough clustering of sequential data. Data Knowledge Eng 2007, 63:183–199.
Maji P, Pal SK. Rough set based generalized fuzzy c-means algorithm and quantitative indices. IEEE Trans Syst Man Cybern
B 2007, 37:1529–1540.
Mitra S, Barman B. Rough-fuzzy clustering: an application to medical imagery. Proceedings RSTK’08, LNCS 5009 2008,
300–307.
Parmar D, Wu T, Blackhurst J, MMR. An algorithm for clustering categorical data using rough set theory. Data Knowledge
Eng 2007, 63:877–891.
Varma CMBS, Asharaf S, Murty MN. Rough core vector clustering. Proceedings PReMI’07, LNCS 4815 2007, 304–310.
Wang R, Miao D, Li G, Zhang H. Rough overlapping biclustering of gene expression data. Proceedings of the 7th IEEE
International Conference on Bioinformatics and Bioengineering (BIBE) 2007, 828–834.
Wang SC, Miao DQ, Chen M, Wang RZ. Overlapping-based rough clustering algorithm. J Electron Inform Technol 2008,
30:1713–1716.
Xiao D, Hu S. Ellipsoidal basis functional neural network based on rough k-means. J Nanjing Univ Aeronaut Astronaut
2006, 38:321–325.
Zhou T, Zhang Y, Lu H, Deng F, Wang F. Rough cluster algorithm based on kernel function. LNAI 5009 2008, 172–179.
Zhou T, Zhang YM, Yuan HEJ, Lu HL. Rough k-means cluster with adaptive parameters. Proceedings of 6th International Conference on Machine Learning and Cybernetics (ICMLC'07) 2007, 3063–3068.