
Advanced Review

Rough clustering
Pawan Lingras1∗ and Georg Peters2

Traditional clustering partitions a group of objects into a number of nonoverlapping sets based on a similarity measure. In the real world, the boundaries of these sets or clusters may not be clearly defined. Some of the objects may be almost equidistant from the centers of multiple clusters. Traditional set theory mandates that these objects be assigned to a single cluster. Rough set theory can be used to represent the overlapping clusters. Rough sets provide a more flexible representation than conventional sets; at the same time, they are less descriptive than fuzzy sets. This paper describes the basic concepts of rough clustering based on k-means, genetic algorithms, Kohonen self-organizing maps, and support vector clustering. The discussion also includes a review of rough cluster validity measures, and applications of rough clustering to such diverse areas as forestry, medicine, medical imaging, web mining, supermarkets, and traffic engineering.

© 2011 John Wiley & Sons, Inc. WIREs Data Mining Knowl Discov 2011, 1:64–72. DOI: 10.1002/widm.16

Correspondence to: pawan.lingras@gmail.com
1 Saint Mary's University, Nova Scotia, Canada
2 Munich University of Applied Sciences, Munich, Germany

INTRODUCTION

A large number of objects can be grouped into a smaller number of clusters to simplify the modeling and decision-making process. A decision maker can then develop guidelines and models for a group instead of individual objects. Conventional clustering techniques mandate that an object must belong to precisely one cluster. Such a requirement is found to be too restrictive in many data mining applications.1 In practice, an object may display characteristics of different clusters. In such cases, an object should belong to more than one cluster, and as a result, cluster boundaries necessarily overlap. Fuzzy set representation of clusters, using algorithms such as fuzzy c-means (FCM),2,3 makes it possible for an object to belong to multiple clusters with a degree of membership between 0 and 1. In some cases, the fuzzy degree of membership may be too descriptive for interpreting clustering results. Rough set-based clustering provides a solution that is less restrictive than conventional clustering and less descriptive (specific) than fuzzy clustering. Rough set theory has made substantial progress as a classification tool in data mining.4 The basic concept of representing a set as lower and upper approximations can be used in a broader context such as clustering. Clustering in relation to rough set theory is attracting increasing interest among researchers.5–20 This paper describes different rough clustering techniques, rough clustering validation measures, and their applications.

ROUGH SETS

The notion of rough set was proposed by Pawlak.4 This section provides a brief summary of the concepts from rough set theory.

Let U denote the universe (a finite ordinary set), and let R ⊆ U × U be an equivalence relation on U. The pair A = (U, R) is called an approximation space. The equivalence relation R partitions the set U into disjoint subsets. Such a partition of the universe is denoted by U/R = (E1, E2, ..., En), where Ei is an equivalence class of R. If two elements u, v ∈ U belong to the same equivalence class Ei ⊆ U/R, we say that u and v are indistinguishable. The equivalence classes of R are called the elementary or atomic sets in the approximation space A = (U, R). The union of one or more elementary sets is called a composed set in A. The empty set is also considered a special composed set. Com(A) denotes the family of all composed sets.

As it is not possible to differentiate the elements within the same equivalence class, one may not be able to obtain a precise representation for an arbitrary set X ⊆ U in terms of elementary sets in A. Instead, its lower and upper approximations may represent the set X. The lower approximation A(X) is the union of all the elementary sets that are subsets of X. The upper approximation Ā(X) is the union of all the elementary sets that have a nonempty intersection with X. The pair [A(X), Ā(X)] is the representation of the ordinary set X in the approximation space A = (U, R), or simply the rough set of X. The elements in the lower approximation of X definitely belong to X, whereas elements in the upper approximation of X may or may not belong to X.
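As a toy illustration, the two approximations can be computed directly from these definitions. The following Python sketch (an illustration with a made-up universe, not taken from the cited papers) derives the lower and upper approximations of a set X from a given partition of the universe into elementary sets:

    def rough_approximations(partition, X):
        """Compute the rough set of X from a partition of the
        universe into elementary (equivalence) classes."""
        lower, upper = set(), set()
        for E in partition:
            if E <= X:        # elementary set contained in X
                lower |= E    # -> part of the lower approximation
            if E & X:         # nonempty intersection with X
                upper |= E    # -> part of the upper approximation
        return lower, upper

    # Toy universe partitioned into three elementary sets
    partition = [{1, 2}, {3, 4}, {5, 6}]
    X = {1, 2, 3}
    lower, upper = rough_approximations(partition, X)
    # lower == {1, 2}; upper == {1, 2, 3, 4}

Here the element 3 drags its entire equivalence class {3, 4} into the upper approximation, while {1, 2} belongs to the lower approximation because it lies wholly inside X.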


FIGURE 1 | Rough set approximations (lower approximation/positive region, upper approximation, boundary region, and negative region).

FIGURE 2 | Boolean, fuzzy, and rough sets (Boolean separator; fuzzy membership functions λpoor and λrich; rough approximations distinguishing surely poor clients, clients of unclear status, and surely rich clients).

Figure 1 illustrates the lower and upper approximations. The lower approximation A(X) is a subset of its corresponding upper approximation Ā(X). The region of the upper approximation that is not covered by the lower approximation, [Ā(X) − A(X)], is called the boundary region. The lower approximation is also called the positive region because its elements definitely belong to X. In contrast, the region not covered by the upper approximation is denoted as the negative region to indicate that its elements surely do not belong to X.

RELATIONSHIP BETWEEN BOOLEAN, FUZZY, AND ROUGH SETS

Rough sets are often compared with fuzzy sets. There is a significant amount of literature that discusses the relationship between rough and fuzzy sets, which can often complement each other.21 Many hybridization models have been proposed that combine the complementary features of rough and fuzzy sets. In particular, Mitra et al.15 and Maji and Pal12,13 have shown the usefulness of hybrid rough and fuzzy set clustering.

In this section, we will use a simple example to demonstrate a correspondence between Boolean, fuzzy, and rough sets (note that we refer to sets as Boolean sets to indicate their dichotomist character, limiting the membership grades to true or false).

A bank plans to categorize its customers into two groups, rich and poor customers. When applying Boolean clustering, the bank needs to define an amount of money that separates rich from poor customers; e.g., rich customers are defined by a fortune of USD 1 million or more. Therefore, a customer owning USD 999,999 would be treated as a poor customer, whereas a customer who owns only USD 1 more enjoys the exclusive service for rich clients (see Figure 2). So, in Boolean clustering, an object belongs to only one cluster.

Obviously, a refined categorization that incorporates neighborhood relations would make more sense. Fuzzy sets provide such methods through the concept of linguistic variables.22 The variables rich and poor are described by membership functions (Figure 2) which define the closeness to the variables. A customer possessing no money at all would be considered absolutely poor (λpoor = 1) and by no means rich (λrich = 0). However, our customer owning USD 999,999 would be just in between poor and rich, indicated by membership degrees, e.g., λpoor = 0.50001 and λrich = 0.49999. So, in fuzzy sets, an object generally belongs to more than one set.

A possible interpretation of rough clustering is as follows. In contrast to fuzzy clustering, in rough sets an object belongs, as in Boolean clustering, to only one cluster. However, in contrast to Boolean clustering, the definite membership of some objects cannot be determined. To indicate their unclear status, such objects are grouped only in the upper approximations (or the boundary regions) of the sets they may possibly be a member of.

Consider a variation of the example discussed above. Although customers with assets of less than USD 500,000 are treated as poor (lower approximation of the set poor) and clients owning more than USD 1 million are served as rich (lower approximation of the set rich), the group in between needs to provide more evidence before being assigned to one category of clients. Their unclear membership is shown by their assignment to the boundary region between the two sets.
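The three philosophies can be contrasted in a few lines of code. The sketch below encodes the bank example above; the piecewise-linear shape of the fuzzy membership function is an assumption for illustration (the paper only shows the functions graphically), and the cut-offs follow the USD 500,000/1,000,000 figures in the text:

    def boolean_label(assets):
        # Boolean clustering: one hard separator, exactly one set
        return "rich" if assets >= 1_000_000 else "poor"

    def fuzzy_memberships(assets):
        # Assumed piecewise-linear memberships between two anchors
        if assets <= 500_000:
            mu_rich = 0.0
        elif assets >= 1_500_000:
            mu_rich = 1.0
        else:
            mu_rich = (assets - 500_000) / 1_000_000
        return {"poor": 1.0 - mu_rich, "rich": mu_rich}

    def rough_regions(assets):
        # Lower approximations at the extremes, boundary in between
        if assets < 500_000:
            return {"lower": "poor"}            # surely poor
        if assets > 1_000_000:
            return {"lower": "rich"}            # surely rich
        return {"boundary": ("poor", "rich")}   # unclear status

    # boolean_label(999_999)     -> 'poor'
    # fuzzy_memberships(999_999) -> {'poor': 0.500001, 'rich': 0.499999}
    # rough_regions(999_999)     -> {'boundary': ('poor', 'rich')}

The borderline customer is forced into one set by the Boolean rule, receives graded memberships in both sets under the fuzzy rule, and is deferred to the boundary region under the rough rule.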


The α-cut or defuzzification concepts of fuzzy set theory may seem like similar instruments at first sight. However, these concepts differ significantly. Fuzzy sets are based on neighborhood relations, whereas rough sets relate to missing or contradicting information.

Note that this is a simplistic comparison of Boolean concepts and rough as well as fuzzy sets. For a more sophisticated treatment of the interaction between the two set theories, the reader is encouraged to study the references cited at the beginning of this section.

ADOPTION OF ROUGH SET THEORY TO CLUSTERING

Rough sets were proposed using equivalence relations. However, it is possible to define a pair of upper and lower approximations [A(X), Ā(X)], or a rough set, for every set X ⊆ U as long as the properties specified by Pawlak4 are satisfied. Yao and Lin,23 and Yao24 described various generalizations of rough sets by relaxing the assumption of an underlying equivalence relation. Polkowski and Skowron,25 and Skowron and Stepaniuk26 discussed a similar generalization of rough set theory.

Lingras and West11 provided an efficient alternative based on an extension of the k-means algorithm. k-means clustering is one of the most popular clustering techniques.27 Incorporating rough sets into k-means clustering requires the addition of the concept of lower and upper approximations. The incorporation required a redefinition of the calculation of the centroids to include the effects of lower and upper approximations. The next step was to design criteria to determine whether an object belongs to the lower or upper approximation of a cluster.

The rough k-means approach has been a subject of further research. Peters16 discussed various refinements of Lingras and West's original proposal. These included the calculation of rough centroids and the use of ratios of distances, as opposed to differences between distances, similar to those used in the rough set-based Kohonen algorithm described in Ref 10. The rough k-means and its various extensions have been found to be effective in distance-based clustering.

Let us consider a hypothetical classification scheme U/P = {X1, X2, ..., Xk}, which partitions the set U based on certain criteria. The actual values of Xi are not known. The classification of supermarket customers is an example of such a hypothetical classification scheme. Depending on the predominant usage, a set of supermarket customers can be classified as loyal high spenders, loyal moderate spenders, semi-loyal high spenders, semi-loyal moderate spenders, or low spenders. However, the actual sets corresponding to each one of these classes are not known. Let us assume that due to insufficient knowledge it is not possible to precisely describe the sets Xi, 1 ≤ i ≤ k, in the partition. However, it is possible to define each set Xi ∈ U/P using its lower and upper approximations A(Xi), Ā(Xi) based on the available information. The available information for supermarket customers may consist of their transaction records. We will use vector representations, v for an object and xi for cluster Xi. We are considering the upper and lower approximations of only a few subsets of U. Therefore, it is not possible to verify all the properties of rough sets.8 However, the family of upper and lower approximations of xi ∈ U/P is required to follow some of the basic rough set properties, such as:

P1. An object v can be part of at most one lower approximation.
P2. v ∈ A(xi) ⇒ v ∈ Ā(xi).
P3. v is not part of any lower approximation ⇔ v belongs to two or more upper approximations.

The next step in rough clustering is to determine whether an object belongs to the upper or lower approximation of a cluster. For each object vector v, let d(v, xi) be the distance between itself and the weight vector xi of cluster Xi. The form of the distance function depends on the application; the Euclidean distance function or the inverse of a similarity function are two of the possible choices. The ratios d(v, xi)/d(v, xj) are used to determine the assignment of v as follows (a sketch of this assignment rule is given after the two criteria):

A1. If d(v, xi) is the minimum for 1 ≤ i ≤ k, and d(v, xi)/d(v, xj) ≥ threshold for any pair (i, j), then v ∈ Ā(xi) and v ∈ Ā(xj). Furthermore, v is not part of any lower approximation. This criterion guarantees that property (P3) is satisfied.

A2. Otherwise, v ∈ A(xi) such that d(v, xi) is the minimum for 1 ≤ i ≤ k. In addition, by property (P2), v ∈ Ā(xi). This criterion also satisfies property (P1).


Possibly the most popular clustering technique is the k-means algorithm.27,28 The name k-means comes from the means of the k clusters that are created from the set of objects using the method. The process begins by randomly choosing k objects as the centroids of the k clusters. Each object is assigned to one of the k clusters based on the minimum value of the distance between the object vector v and the cluster vector xi. After the assignment of all the objects to various clusters, the new centroid vectors of the clusters are calculated as:

    x_i = \frac{\sum_{v \in X_i} v}{|X_i|},    (1)

where |Xi| is the cardinality or size of cluster Xi. If the size of cluster Xi is 0, then xi is a null vector.

The objects in rough k-means clustering are assigned to either lower or upper approximations using the criteria (A1) or (A2). Therefore, we need to modify Eq. (1) for rough k-means as follows. If |A(xi)| ≠ 0 and |Ā(xi) − A(xi)| ≠ 0, then

    x_i = w_l \times \frac{\sum_{v \in A(x_i)} v}{|A(x_i)|} + w_u \times \frac{\sum_{v \in \bar{A}(x_i) - A(x_i)} v}{|\bar{A}(x_i) - A(x_i)|};

else if |A(xi)| ≠ 0 and |Ā(xi) − A(xi)| = 0, then

    x_i = \frac{\sum_{v \in A(x_i)} v}{|A(x_i)|};

else if |A(xi)| = 0 and |Ā(xi) − A(xi)| ≠ 0, then

    x_i = \frac{\sum_{v \in \bar{A}(x_i) - A(x_i)} v}{|\bar{A}(x_i) - A(x_i)|}.    (2)

Here, wl + wu = 1 and, usually, wl > wu. The rough k-means is a popular and efficient algorithm for rough clustering; a sketch of the complete procedure follows.


Before looking at other rough clustering algorithms, we will look at the evaluation of the quality of rough clustering.

ROUGH CLUSTER VALIDITY MEASURES

Quality of clustering is an important issue in the application of clustering techniques to real world data. A good measure of cluster quality will help in deciding various parameters used in clustering algorithms. One of the simplest measures of the quality of a conventional clustering scheme is to compute the sum of the compactness of the generated clusters. We can define the compactness of a cluster as the sum of the distances of all the objects in that cluster from the centroid of the cluster. Lingras et al.9 have defined a more elaborate cluster validity index based on decision theory. The rough Davies–Bouldin index proposed by Mitra et al.15 takes into account the compactness within a cluster as well as the separation among clusters. A modified version is defined as follows19:

    S_i = \left( \frac{1}{|A(x_i)|} \sum_{v \in A(x_i)} \|v - x_i\|^2 + \frac{1}{|\bar{A}(x_i) - A(x_i)|} \sum_{v \in \bar{A}(x_i) - A(x_i)} \frac{\left(\|v - x_i\|^2\right)^q}{b_v} \right)^{1/q}.    (3)

Here, bv is the number of boundary regions that object v belongs to.


RELATED APPROACHES TO ROUGH CLUSTERING

A Class of Rough Partitive Algorithms
In k-means clustering algorithms, cluster centers are represented by artificial objects that correspond to the means of the clusters. In contrast, Kaufman and Rousseeuw29 proposed k-medoid clustering, where the cluster centers are represented by real objects. In switching regression models, the clusters are represented by functions instead of objects.30 Peters and Lampart17 suggested rough k-medoids and Peters18 proposed a rough switching regression model, which, together with the rough k-means, form a class of rough partitive algorithms.

Genetic Algorithm Based Rough Clustering
The origin of genetic algorithms (GAs) is attributed to Holland's31 work on cellular automata. There has been significant interest in GAs over the last two decades. A GA is a search process that follows the principles of evolution through natural selection. The domain knowledge is represented using a candidate solution called an organism. Typically, an organism is a single genome represented as a vector.

An abstract view of a generational GA is as follows. A group of organisms is called a population. Successive populations are called generations. A generational GA starts from an initial generation G(0), and for each generation G(t) generates a new generation G(t+1) using genetic operators such as mutation and crossover. The mutation operator creates new genomes by changing the values of one or more genes at random. The crossover operator joins segments of two or more genomes to generate a new genome.

There are three versions of GA based rough clustering: the first proposed in 2001 by Lingras,8 another in 2004 by Mitra,14 and an evolutionary k-medoid in 2008 by Peters et al.32

We will discuss the evolutionary rough k-medoid by Peters et al. in greater detail as it is probably the most efficient GA based rough clustering. A GA can be used to search for the most appropriate k-medoids. The genome will contain k genes, each corresponding to a medoid. Such a genome will be smaller than the one used by Lingras8 and Mitra.14 The smaller genomes will reduce the space requirements and also facilitate faster convergence. The values of genes for the medoids are discrete and limited to the number of objects in the dataset. If we number the objects from 1 to n, then each gene can take an integer value in the range 1–n. This restriction on the values of genes also reduces the search space, allowing for faster convergence. The rough k-medoid algorithm can use assignments (A1) and (A2) to determine the memberships of objects in the lower and upper approximations of different clusters. A cluster validity measure such as the rough Davies–Bouldin index is optimized to arrive at an appropriate clustering scheme. The main advantage of the evolutionary rough clustering is the flexibility of the optimization criteria.

Kohonen Network Based Rough Clustering
The unsupervised learning based on the Kohonen rule33 uses a competitive learning approach. In competitive learning, the output neurons compete with each other. The winning output neuron has an output of 1; the rest of the output neurons have outputs of 0. Competitive learning is suitable for classifying a given pattern into exactly one of a set of mutually exclusive clusters. A Kohonen network consists of two layers. The first layer is the input layer, and the second layer is called the Kohonen layer. The network receives the input vector for a given pattern. If the pattern belongs to the ith cluster, then the ith neuron in the Kohonen layer has an output value of 1 and the other Kohonen layer neurons have output values of 0. The training set of input vectors is presented to the network several times. For each iteration, an object v contributes to the weight vector xi of a cluster.

Incorporating rough sets into the Kohonen algorithm requires the addition of the concept of lower and upper approximations in the equations used for updating the weights of the winners.10 A neuron in the Kohonen layer consists of two parts, a lower neuron and an upper neuron. The lower neuron has an output of 1 if an object belongs to the lower approximation of the cluster. Similarly, membership in the upper approximation of the cluster will result in an output of 1 from the upper neuron. Because an object belonging to the lower approximation of a cluster also belongs to its upper approximation, when the lower neuron has an output of 1, the upper neuron also has an output of 1. However, membership in the upper approximation of a cluster does not necessarily imply membership in its lower approximation. Therefore, the upper neuron contains the lower neuron. Assignments (A1) and (A2) are used to determine the output from the lower and upper neurons. For assignment (A1), the weight vectors xi and xj are modified as

    x_i^{new} = x_i^{old} + \alpha_u (v - x_i^{old})  and  x_j^{new} = x_j^{old} + \alpha_u (v - x_j^{old}).

For assignment (A2), the weight vector xi is modified as

    x_i^{new} = x_i^{old} + \alpha_l (v - x_i^{old}).

Usually, αl > αu.

Support Vector Based Rough Clustering
The rough clustering methods described above are based on Euclidean distances in the original input data space. Support vector clustering (SVC)34 is a kernel based clustering method that is capable of identifying clusters having arbitrary shapes. Here, the clustering problem is formulated as a quadratic programming (QP) problem to learn a minimum radius sphere enclosing the image of the dataset to be clustered in a high-dimensional feature space. In SVC, this problem is solved by employing a method called the kernel trick,34 which helps solve the QP problem without explicit mapping of data points from the input data space to the higher dimensional feature space. Once the QP problem is solved, SVC uses a graph-based cluster labeling method to identify the arbitrarily shaped clusters existing in the input data space. Rough SVC (RSVC)35 is a soft clustering method derived from the SVC paradigm. It achieves soft data clustering by a natural fusion of rough set theory and SVC. In RSVC, the QP problem involved in SVC is modified to impart a rough set theoretic flavor. The modified QP problem obtained for RSVC turns out to be of the same form as the one involved in SVC. Therefore, the existing solution strategies used for solving the SVC QP problem can be used for solving the RSVC QP problem as well. The cluster labeling method of RSVC is a modified version of the one used in SVC.

Dynamic Rough Clustering
In many real-life situations where cluster algorithms are applied, the underlying data structure is not stable but changes over time. For example, the shopping patterns of customers change dramatically within a calendar year: in autumn they buy warm clothes to prepare for the winter, and in spring they look for t-shirts instead of gloves. To address changes in the data structures, Peters and Weber19 proposed a dynamic approach to rough clustering in which the initial parameters of the algorithm are updated in cycles to better adapt to changing environments, such as the seasonal changes in customer behavior.

Further Approaches
Besides the approaches discussed above, several further approaches to rough clustering have been proposed. They include early approaches to clustering based on the set interpretation of rough sets by do Prado et al.36 and Voges et al.37,38 Recently, Yao et al.39 suggested relaxing some of the properties of rough clustering, in particular the requirement that objects in boundary areas belong to at least two clusters, and introduced an interval-based clustering approach.

APPLICATIONS OF ROUGH CLUSTERING

Rough clustering has been used successfully in forestry,32 medicine,5,32 imaging,14 web mining,10 supermarkets,40 and traffic engineering applications.8 This section briefly describes three such experiments.

Clustering Highways
Seasonal and permanent traffic counters (PTCs) scattered across a highway network are the major sources of traffic data. These traffic counters measure the traffic volume, i.e., the number of vehicles that have passed through a particular section of a lane or highway in a given time period. Traffic volumes can be expressed in terms of hourly or daily traffic.

The PTC sites are grouped to form various road classes. These classes are used to develop guidelines for the construction, maintenance, and upgrading of highway sections. In one commonly used system, roads are classified on the basis of trip purpose and trip length characteristics.8 Examples of resulting classes were commuter, business, long distance, and recreational highways.

The study described in Ref 8 was based on a sample of 264 monthly traffic patterns (the variation of monthly average daily traffic volume in a given year) recorded between 1987 and 1991 on Alberta highways. Rough clustering was used to create the lower and upper approximations of three clusters:

1. commuter/business,
2. long-distance, and
3. recreational.

The average pattern for the lower approximation of the commuter/business cluster had the least variation over the year. The recreational cluster, conversely, had the most variation. The variation for the long-distance cluster was less than the recreational but more than the commuter/business cluster. Lingras8 illustrated how one of the highway sections, near counter number C013201, may have been commuter/business or long distance in 1985. The monthly pattern for the highway section fell in between the two clusters. The counter C013201 is located on Highway 13, 20 km west of the Alberta–Saskatchewan border. It is an alternate route for travel from the city of Saskatoon and surrounding townships to townships surrounding the city of Edmonton. Rough set representation of clusters made it possible to identify such intermediate patterns.

Clustering Web Users
The study data was obtained from the web access logs of the first three courses in Computing Science at Saint Mary's University over a 16-week period.11 Students' attitudes towards the course vary a great deal. It was hoped that the profile of visits would reflect some of the distinctions between the students. The web logs were preprocessed to create an appropriate representation of each user corresponding to a visit. The following attributes were used for representing each visitor (an illustrative encoding is sketched after the two lists below):

1. On campus/off campus access.
2. Daytime/nighttime access.
3. Access during lab/class days or non-lab/class days.
4. Number of hits.
5. Number of class notes downloaded.

It was assumed that the visitors could fall into three categories:

1. Studious: These visitors download the current set of notes. Because they download a limited/current set of notes, they probably study class notes on a regular basis.
2. Crammers: These visitors download a large set of notes. This indicates that they have stayed away from the class notes for a long period of time. They are planning for pretest cramming.
3. Workers: These visitors are mostly working on class or lab assignments or accessing the discussion board.
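A hypothetical encoding of the five attributes into an object vector might look as follows; the exact preprocessing used in Ref 11 is not given in this review, so the binary coding and types here are assumptions for illustration:

    def visitor_vector(on_campus, daytime, lab_day, hits, notes):
        """Hypothetical five-attribute representation of one visitor
        (the actual encoding in Ref 11 may differ)."""
        return [
            1.0 if on_campus else 0.0,  # on/off campus access
            1.0 if daytime else 0.0,    # daytime/nighttime access
            1.0 if lab_day else 0.0,    # lab/class day or not
            float(hits),                # number of hits
            float(notes),               # number of class notes downloaded
        ]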


TABLE 1 | Cardinalities of Clusters Using Crisp, Fuzzy, and Rough Clustering

Course   Cluster    FCM > 0.6   Lower Bound   Conventional Cluster
First    Studious   1382        1412          1814
         Crammers    414         288           406
         Workers    4354        5350          5399
Second   Studious   1750        1197          1699
         Crammers    397         443           634
         Workers    1322        1677          3697
Third    Studious    265         223           318
         Crammers     84          69            89
         Workers     717         906           867

The web users were clustered using conventional Boolean clustering, FCM, and rough k-means to compare these three clustering philosophies. Table 1 shows the cardinalities of the conventional clusters, the rough clusters, and the fuzzy clusters with memberships greater than 0.6. The actual numbers in each cluster vary based on the characteristics of each course. For example, in the fuzzy clusters, the first-term course had significantly more workers than studious visitors, whereas the second-term course had more studious visitors than workers. The increase in the percentage of studious visitors in the second term seems to be a natural progression. It should be noted that the progression from workers to studious visitors was more obvious with the fuzzy clusters than with the conventional and rough clusters. Interestingly, the second year course had a significantly larger number of workers than studious visitors. This seems to be counterintuitive. However, it can be explained based on the structure of the websites. Unlike the two first year courses, the second year course did not post the class notes on the web. The notes downloaded by these students were usually sample programs that were essential during their laboratory work.

Clustering Supermarket Customers
The data consisted of transactional records over a 26-week period from one supermarket store.40 Sorted patterns of weekly visits and spending were used to represent the customers. The average spending and visit patterns enabled Lingras et al.40 to distinguish between five types of customers: loyal big spenders (G1), loyal moderate spenders (G2), semi-loyal potentially big spenders (G3), potentially moderate to big spenders with limited loyalty (G4), and infrequent customers (G5). Even though for most weeks G2 had higher spending than G3, the highest spending of G3 was higher than that of G2. The region had only one store, and hence it is likely that G3 did not find it convenient to shop at the supermarket on a regular basis.

Although the lower approximations tended to provide distinguishing characteristics of the various clusters, the boundary regions of the clusters tended to fall between the lower approximations of two regions. There was a large difference between the lower approximations of G1 and G4. However, their boundary regions seemed to be less distinct. The boundary regions of G1 and G4 fell between the lower approximations of those groups.

CONCLUSION

In practical applications, an object may exhibit characteristics of multiple clusters. Conventional Boolean clustering forces these objects into one of the clusters. This can lead to potentially erroneous assignments, leading to incorrect decisions. The fuzzy and rough clustering algorithms allow an object to belong to more than one cluster. The resulting clustering scheme may consist of overlapping clusters. Although rough clustering removes the restriction of unique cluster assignment, it is less descriptive than fuzzy clustering. In some cases, a decision maker may find the concise description from rough clustering less overwhelming. This paper described the adoption of rough set theory to clustering, including a rough extension of the popular clustering algorithm called k-means. Rough clustering approaches based on other techniques, such as genetic algorithms, Kohonen networks, and SVC, were also described. Rough clustering has been used in a variety of applications, including forestry, medicine, image processing, web mining, supermarkets, and traffic analysis. The paper described three such applications to highlight various aspects of rough clustering, including a comparison with Boolean and fuzzy clustering.


REFERENCES
1. Joshi A, Krishnapuram R. Robust fuzzy clustering methods to support web mining. Proceedings of the Workshop on Data Mining and Knowledge Discovery, SIGMOD '98, June 2–4, 1998, Seattle, Washington. 1998, 15:1–8.
2. Bezdek JC. Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum; 1981.
3. Pedrycz W, Waletzky J. Fuzzy clustering with partial supervision. IEEE Trans Syst Man Cybern B 1997, 27:787–795.
4. Pawlak Z. Rough Sets: Theoretical Aspects of Reasoning About Data. Dordrecht, Boston: Kluwer Academic Publishers; 1992.
5. Hirano S, Tsumoto S. Rough clustering and its application to medicine. Inform Sci 2000, 124:125–137.
6. Hirano S, Tsumoto S. On constructing clusters from non-Euclidean dissimilarity matrix by using rough clustering. JSAI Workshops. Kitakyushu City, Japan, 2005, 5–16.
7. Ho TB, Nguyen NB. Nonhierarchical document clustering by a tolerance rough set model. Int J Intell Syst 2002, 17:199–212.
8. Lingras P. Unsupervised rough set classification using GAs. J Intell Inform Syst 2001, 16:215–228.
9. Lingras P, Chen M, Miao D. Rough cluster quality index based on decision theory. IEEE Trans Knowledge Data Eng 2009, 21:1014–1026.
10. Lingras P, Hogo M, Snorek M. Interval set clustering of web users using modified Kohonen self-organizing maps based on the properties of rough sets. Web Intell Agent Syst Int J 2004, 2:217–230.
11. Lingras P, West C. Interval set clustering of web users with rough k-means. J Intell Inform Syst 2004, 23:5–16.
12. Maji P, Pal SK. Rough set based generalized fuzzy c-means algorithm and quantitative indices. IEEE Trans Syst Man Cybern B 2007, 37:1529–1540.
13. Maji P, Pal SK. RFCM: a hybrid clustering algorithm using rough and fuzzy sets. Fundam Informaticae 2007, 80:477–498.
14. Mitra S. An evolutionary rough partitive clustering. Pattern Recognit Lett 2004, 25:1439–1449.
15. Mitra S, Banka H, Pedrycz W. Rough-fuzzy collaborative clustering. IEEE Trans Syst Man Cybern B 2006, 36:795–805.
16. Peters G. Some refinements of rough k-means. Pattern Recognit 2006, 39:1481–1491.
17. Peters G, Lampart M. A partitive rough clustering algorithm. Proceedings RSCTC'06, LNAI 2006, 4259:657–666.
18. Peters G. Rough clustering and regression analysis. Proceedings RSKT'07, LNAI 2007, 4481:292–299.
19. Peters G, Weber R. A dynamic approach to rough clustering. Proceedings RSCTC'08, LNAI 2008, 5306:379–388.
20. Peters JF, Skowron A, Suraj Z, Rzasa W, Borkowski M. Clustering: a rough set approach to constructing information granules. Proceedings of 6th International Conference on Soft Computing and Distributed Processing. Rzeszow, Poland, 2002, 57–61.
21. Dubois D, Prade H. Rough fuzzy sets and fuzzy rough sets. Int J Gen Syst 1990, 17:191–209.
22. Zadeh LA. The concept of linguistic variable and its application to approximate reasoning. Inform Sci, I: 1975, 8:199–249; II: 1975, 8:310–357; III: 1975, 9:43–80.
23. Yao YY, Lin TY. Generalization of rough sets using modal logic. Intell Autom Soft Comput 1996, 2:103–120.
24. Yao YY. Constructive and algebraic methods of the theory of rough sets. Inform Sci 1998, 109:21–47.
25. Polkowski L, Skowron A. Rough mereology: a new paradigm for approximate reasoning. Int J Approximate Reason 1996, 15:333–365.
26. Skowron A, Stepaniuk J. Information granules in distributed environment. Proceedings RSFDGrC'99, LNCS 1999, 1711:357–365.
27. Hartigan JA, Wong MA. Algorithm AS136: a k-means clustering algorithm. J Royal Statist Soc C (Appl Statist) 1979, 28:100–108.
28. MacQueen J. Some methods for classification and analysis of multivariate observations. Proceedings of Fifth Berkeley Symposium on Mathematical Statistics and Probability, June 21–July 18, 1965 and December 27, 1965–January 7, 1966. Berkeley, California, Vol. 1, 1967, 281–297.
29. Kaufman L, Rousseeuw PJ. Finding Groups in Data: An Introduction to Cluster Analysis. 2nd ed. New York: John Wiley & Sons; 2005.
30. Quandt R. The estimation of the parameters of a linear regression system obeying two separate regimes. J Am Stat Assoc 1958, 53:873–880.
31. Holland JH. Adaptation in Natural and Artificial Systems. Ann Arbor, MI: University of Michigan Press; 1975.
32. Peters G, Lampart M, Weber R. Evolutionary rough k-medoid clustering. Trans Rough Sets 2008, VIII:289–306.
33. Kohonen T. Self-Organization and Associative Memory. Berlin: Springer; 1988.

34. Ben-Hur A, Horn D, Siegelmann HT, Vapnik V. Support vector clustering. J Machine Learn Res 2001, 2:125–137.
35. Asharaf S, Shevade SK, Murty NM. Rough support vector clustering. Pattern Recognit 2005, 38:1779–1783.
36. do Prado HA, Engel PM, Filho HC. Rough clustering: an alternative to find meaningful clusters by using the reducts from a dataset. Proceedings RSCTC'02, LNAI 2002, 2475:234–238.
37. Voges KE, Pope NK, Brown MR. Cluster analysis of marketing data examining on-line shopping orientation: a comparison of k-means and rough clustering approaches. In: Abbass HA, Sarker RA, Newton CS, eds. Heuristics and Optimization for Knowledge Discovery. Hershey, PA: Idea Group Publishing; 2002, 208–225.
38. Voges KE, Pope NK, Brown MR. A rough cluster analysis of shopping orientation data. Proceedings of Australian and New Zealand Marketing Academy Conference 2003, 1625–1631.
39. Yao YY, Lingras P, Wang R, Miao D. Interval set cluster analysis: a re-formulation. Proceedings of RSFDGrC'09, LNCS 2009, 5908:398–405.
40. Lingras P, Hogo M, Snorek M, West C. Temporal analysis of clusters of supermarket customers: conventional versus interval set approach. Inform Sci 2005, 172:215–240.

FURTHER READING
De SK. A rough set theoretic approach to clustering. Fundam Informaticae 2004, 62:409–417.
Falcón R, Jeon G, Bello R, Jeong J. Rough clustering with partial supervision. Stud Comput Intell 2009, 174:137–161.
Hung CC, Purnawan H. A hybrid rough k-means algorithm and particle swarm optimization for image classification.
Proceedings MICAI’08, LNCS 5317 2008, 585–593.
Kumar P, Krishna PR, Bapi RS, De SK. Rough clustering of sequential data. Data Knowledge Eng 2007, 63:183–199.
Maji P, Pal SK. Rough set based generalized fuzzy c-means algorithm and quantitative indices. IEEE Trans Syst Man Cybern
B 2007, 37:1529–1540.
Mitra S, Barman B. Rough-fuzzy clustering: an application to medical imagery. Proceedings RSTK’08, LNCS 5009 2008,
300–307.
Parmar D, Wu T, Blackhurst J. MMR: an algorithm for clustering categorical data using rough set theory. Data Knowledge Eng 2007, 63:877–891.
Varma CMBS, Asharaf S, Murty MN. Rough core vector clustering. Proceedings PReMI’07, LNCS 4815 2007, 304–310.
Wang R, Miao D, Li G, Zhang H. Rough overlapping biclustering of gene expression data. Proceedings of the 7th IEEE
International Conference on Bioinformatics and Bioengineering (BIBE) 2007, 828–834.
Wang SC, Miao DQ, Chen M, Wang RZ. Overlapping-based rough clustering algorithm. J Electron Inform Technol 2008,
30:1713–1716.
Xiao D, Hu S. Ellipsoidal basis functional neural network based on rough k-means. J Nanjing Univ Aeronaut Astronaut
2006, 38:321–325.
Zhou T, Zhang Y, Lu H, Deng F, Wang F. Rough cluster algorithm based on kernel function. LNAI 5009 2008, 172–179.
Zhou T, Zhang YM, Yuan HEJ, Lu HL. Rough k-means cluster with adaptive parameters. Proceedings of 6th International Conference on Machine Learning and Cybernetics (ICMLC'07) 2007, 3063–3068.
