Rough-Fuzzy Clustering Algorithm
Pradipta Maji
Machine Intelligence Unit
Indian Statistical Institute, Kolkata, INDIA
E-mail:pmaji@isical.ac.in
Web:http://www.isical.ac.in/~pmaji
Introduction
Main goal of rough sets: induction of approximations of concepts.
Indiscernibility
Set Approximation
Rough Membership
Dependency of Attributes
An information system (IS) is a pair (U, A).
Decision system (DS): T = (U, A ∪ {d})

U    A1     A2     Walk
x1   16-30  50     yes
x2   16-30  0      no
x3   31-45  1-25   no
x4   31-45  1-25   yes
x5   46-60  26-49  no
x6   16-30  26-49  yes
x7   46-60  26-49  no

Elements of A = {A1, A2} are called the condition attributes.
d ∉ A is the decision attribute (instead of one we can consider more decision attributes).
The same or indiscernible objects may be represented several times.
Some of the attributes may be superfluous.
An Example of Indiscernibility
For the Walk table, the non-empty subsets of condition attributes are {A1}, {A2}, and {A1, A2}.
IND({A1}) = {{x1,x2,x6}, {x3,x4}, {x5,x7}}
IND({A2}) = {{x1}, {x2}, {x3,x4}, {x5,x6,x7}}
IND({A1, A2}) = {{x1}, {x2}, {x3,x4}, {x5,x7}, {x6}}.
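The partitions above can be reproduced with a short sketch. The dictionary `TABLE` and the helper `ind` are illustrative names, not part of the slides; the data are the seven rows of the Walk table.

```python
from collections import defaultdict

# Walk table: object -> (A1, A2, Walk)
TABLE = {
    "x1": ("16-30", "50", "yes"),
    "x2": ("16-30", "0", "no"),
    "x3": ("31-45", "1-25", "no"),
    "x4": ("31-45", "1-25", "yes"),
    "x5": ("46-60", "26-49", "no"),
    "x6": ("16-30", "26-49", "yes"),
    "x7": ("46-60", "26-49", "no"),
}
ATTR_INDEX = {"A1": 0, "A2": 1}  # column position of each condition attribute


def ind(attrs):
    """Return IND(attrs) as a sorted list of sorted equivalence classes."""
    classes = defaultdict(list)
    for obj, row in TABLE.items():
        # Objects with identical values on all chosen attributes are indiscernible.
        key = tuple(row[ATTR_INDEX[a]] for a in attrs)
        classes[key].append(obj)
    return sorted(sorted(c) for c in classes.values())


for attrs in (["A1"], ["A2"], ["A1", "A2"]):
    print(attrs, ind(attrs))
```

Grouping objects by their tuple of attribute values is enough here because indiscernibility is exactly equality on the selected attributes.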
Indiscernibility
Equivalence relation: a binary relation R ⊆ X × X which is
reflexive (xRx for any object x),
symmetric (if xRy then yRx), and
transitive (if xRy and yRz then xRz).
The equivalence class [x]_R of an element x ∈ X consists of all
objects y ∈ X such that xRy.
Subsets that are most often of interest have the same value of
the decision attribute.
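[Figure: approximation of the set W (Walk = yes) by the attribute set A. Lower approximation: {{x1},{x6}} (yes); boundary between the lower and upper approximations: {{x3,x4}} (yes/no); outside region: {{x2}, {x5,x7}} (no).]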
$\overline{B}X = \{x \mid [x]_B \cap X \neq \emptyset\}$.
B-outside region of X, $U - \overline{B}X$,
consists of those objects that can be with certainty classified as
not belonging to X.
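As a minimal sketch, the approximations of W = {x | Walk(x) = yes} in the Walk table can be computed as follows. The lower approximation is taken in its usual sense (classes fully contained in the target set); the function and variable names are assumptions made for illustration.

```python
from collections import defaultdict

# Walk table: object -> (A1, A2, Walk)
TABLE = {
    "x1": ("16-30", "50", "yes"),
    "x2": ("16-30", "0", "no"),
    "x3": ("31-45", "1-25", "no"),
    "x4": ("31-45", "1-25", "yes"),
    "x5": ("46-60", "26-49", "no"),
    "x6": ("16-30", "26-49", "yes"),
    "x7": ("46-60", "26-49", "no"),
}


def equivalence_classes(attr_idx):
    """Partition the universe by the values in the given attribute columns."""
    classes = defaultdict(set)
    for obj, row in TABLE.items():
        classes[tuple(row[i] for i in attr_idx)].add(obj)
    return list(classes.values())


def approximations(attr_idx, target):
    """Return (lower, upper) approximations of target."""
    lower, upper = set(), set()
    for cls in equivalence_classes(attr_idx):
        if cls <= target:   # class lies entirely inside the target set
            lower |= cls
        if cls & target:    # class intersects the target set
            upper |= cls
    return lower, upper


W = {obj for obj, row in TABLE.items() if row[2] == "yes"}
lower, upper = approximations((0, 1), W)
boundary = upper - lower
outside = set(TABLE) - upper
print(sorted(lower), sorted(upper), sorted(boundary), sorted(outside))
```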
[Figure: a set X with its lower and upper approximations drawn over the partition U/R, where R is a subset of attributes.]
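$\underline{B}(X) \subseteq X \subseteq \overline{B}(X)$
$\underline{B}(\emptyset) = \overline{B}(\emptyset) = \emptyset$, $\underline{B}(U) = \overline{B}(U) = U$
$\overline{B}(X \cup Y) = \overline{B}(X) \cup \overline{B}(Y)$
$\underline{B}(X \cap Y) = \underline{B}(X) \cap \underline{B}(Y)$
$X \subseteq Y$ implies $\underline{B}(X) \subseteq \underline{B}(Y)$ and $\overline{B}(X) \subseteq \overline{B}(Y)$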
where -X denotes U - X.
Four Basic Classes of Rough Sets
X is roughly B-definable iff $\underline{B}(X) \neq \emptyset$ and $\overline{B}(X) \neq U$,
T = (U, C, D) is independent
if all c ∈ C are indispensable in T.
CORE = {Headache, Temp} ∩ {MusclePain, Temp} = {Temp}

U1   Yes  Normal     No
U2   Yes  High       Yes
U3   Yes  Very-high  Yes
U4   No   Normal     No
U5   No   High       No
U6   No   Very-high  Yes
Dependency of Attributes
$\overline{B}_\pi X = \{x \mid \mu_X^B(x) > 1 - \pi\}$.
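The threshold form above can be illustrated on the Walk table. This is a sketch under one assumption: the rough membership is taken as the standard ratio |[x]_B ∩ X| / |[x]_B|, which the slide does not spell out. `CLASSES`, `W`, `rough_membership`, and `upper_pi` are illustrative names.

```python
# Equivalence classes of IND({A1, A2}) and the target set W (Walk = yes),
# both read off the Walk table.
CLASSES = [{"x1"}, {"x2"}, {"x3", "x4"}, {"x5", "x7"}, {"x6"}]
W = {"x1", "x4", "x6"}


def rough_membership(x, classes, target):
    """Assumed standard definition: |[x]_B ∩ X| / |[x]_B|."""
    cls = next(c for c in classes if x in c)
    return len(cls & target) / len(cls)


def upper_pi(classes, target, pi):
    """Objects whose rough membership in target exceeds 1 - pi."""
    universe = set().union(*classes)
    return {x for x in universe
            if rough_membership(x, classes, target) > 1 - pi}


print(rough_membership("x3", CLASSES, W))
print(sorted(upper_pi(CLASSES, W, 1.0)))
print(sorted(upper_pi(CLASSES, W, 0.5)))
```

With π = 1 the condition becomes μ > 0, so the result coincides with the ordinary upper approximation; smaller π shrinks the set.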
Rough-Fuzzy Clustering
Design of Algorithm
Objective Function; Cluster Prototypes; Membership Functions
Convergence of Algorithm; Generalization of Existing Algorithms
Performance Analysis
Numerical Data Sets (Clustering Benchmark Data; Segmentation of
Brain MR Images; Text-Graphics Segmentation; Clustering
Functionally Similar Genes)
Non-numerical Data Sets (Amino Acid Sequence Analysis)
Conclusion
Rough Sets
It approximates an arbitrary set X by a pair of lower and upper
approximations.
Lower approximation $\underline{R}(X)$: collection of those elements of U (the
universe of discourse) that definitely belong to X;
Upper approximation $\overline{R}(X)$: collection of those elements of U that
possibly belong to X.
The interval $[\underline{R}(X), \overline{R}(X)]$ is the representation of an ordinary set X in
the approximation space <U, R>, or simply called the rough set of X.
P. Maji and S. K. Pal, ``Rough Set Based Generalized Fuzzy C-Means Algorithm and Quantitative Indices’’, IEEE Trans. System,
Man and Cybernetics, Part B, Cybernetics, 37(6), pp. 1529--1540, 2007.
Rough-Fuzzy-Possibilistic Clustering
According to rough sets, if xj belongs to the lower approximation of the ith
cluster, it does not belong to the lower approximation of any other cluster;
that is, xj is definitely contained in the ith cluster.
It partitions each cluster into two classes: lower approximation (core) and
boundary (overlapping). Only objects in the boundary are fuzzified.
Objective Function
Let uij and ukj be the highest and second highest memberships of object xj,
where uij = a μij + b νij.
If (uij − ukj) < δ, then xj belongs to the boundary regions of the ith and kth clusters,
and xj does not belong to the lower approximation of any cluster.
Otherwise, xj belongs to the lower approximation (core) of the ith cluster.
After defining the lower approximations and boundary regions of the different clusters
based on δ, the memberships uij of objects in the lower approximation are set to 1,
while those in the boundary regions remain unchanged.
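The δ-based rule above can be sketched as follows. This is an illustration, not the authors' implementation: the function name `assign_regions` and the list-of-lists layout `u[i][j]` (membership of object j in cluster i, already combined as a μij + b νij) are assumptions, and at least two clusters are assumed. Only the membership in the core cluster is set to 1, as stated above.

```python
def assign_regions(u, delta):
    """Split objects into lower approximations and boundary regions.

    u[i][j] is the membership of object j in cluster i (hypothetical layout).
    Returns (lower, boundary, updated) where lower[i] and boundary[i] are sets
    of object indices and updated is a copy of u with core memberships set to 1.
    """
    c, n = len(u), len(u[0])
    lower = [set() for _ in range(c)]
    boundary = [set() for _ in range(c)]
    updated = [row[:] for row in u]
    for j in range(n):
        # Clusters ranked by membership of object j, highest first.
        ranked = sorted(range(c), key=lambda i: u[i][j], reverse=True)
        i, k = ranked[0], ranked[1]
        if u[i][j] - u[k][j] < delta:
            # Ambiguous object: boundary of both the ith and kth clusters.
            boundary[i].add(j)
            boundary[k].add(j)
        else:
            # Clear-cut object: core of the ith cluster, membership set to 1.
            lower[i].add(j)
            updated[i][j] = 1.0
    return lower, boundary, updated


u = [[0.9, 0.55, 0.2],
     [0.1, 0.45, 0.8]]
lower, boundary, updated = assign_regions(u, delta=0.3)
print(lower, boundary, updated)
```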
Generalization of Existing Algorithms
If δ = 0, RFPCM reduces to HCM (hard c-means).
The value of the threshold:
$$\delta = \frac{1}{n} \sum_{j=1}^{n} (u_{ij} - u_{kj})$$
It represents the average difference of the two highest memberships of all objects.
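A minimal sketch of this averaged threshold, assuming memberships are stored as `u[i][j]` (object j in cluster i) with at least two clusters; the function name `average_threshold` is hypothetical.

```python
def average_threshold(u):
    """Average gap between the two highest memberships of each object."""
    c, n = len(u), len(u[0])
    total = 0.0
    for j in range(n):
        # Two largest memberships of object j across all clusters.
        top, second = sorted((u[i][j] for i in range(c)), reverse=True)[:2]
        total += top - second
    return total / n


u = [[0.9, 0.55, 0.2],
     [0.1, 0.45, 0.8]]
print(average_threshold(u))
```

For the small matrix above, the per-object gaps are about 0.8, 0.1, and 0.6, so the threshold is about 0.5.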
For objects near to centers and far from separating line, different shape of memberships is apparent.
Dunn’s (D) index is designed to identify sets of clusters that are compact and
well separated. It maximizes
For a given image and value of c, the higher the homogeneity within the segmented
regions, the higher the β value. The value also increases with c.
Clustering on Benchmark Data
Data     Object  Feature  Cluster
Iris     150     4        3
Glass    214     10       6
Ionosp   351     34       2
Wine     178     13       3
Wiscon   569     30       2

[Figure: bar charts comparing HCM, FCM, FPCM-MR, KHCM, KFCM, RCM, RFCM-MBP, RFCM, RPCM, and RFPCM on the IRIS, GLASS, IONOSP, WINE, and WISCON data sets; one panel shows the DB Index.]
Each image is of size 256 X 180 with 16 bit gray levels: segmented into four
categories, namely, CSF, gray matter, white matter, and background.
Brain MRI Segmentation (AMRI, Kolkata)
[Figure: Dunn Index, DB Index, β Index, and execution time of HCM, FCM, FPCM-MR, KHCM, KFCM, RCM, RFCM-MBP, RFCM, RPCM, and RFPCM on Image-1 to Image-4.]
RFPCM provides lower DB, higher Dunn and β index, and lower execution time.
[Figure: DB Index and Dunn Index of HCM, FCM, FPCM-MR, KHCM, KFCM, RCM, RFCM-MBP, RFCM, RPCM, and RFPCM at noise levels of 0%, 1%, 3%, 5%, 7%, and 9%.]
The performance of RFPCM is better than that of any other c-means algorithm.
Text-Graphics Segmentation Using HCM
Cho et al   6457  17  33
Auble       6225  4   10
[Figure: Z-Score of FPCM, RFCM, RPCM, and RFPCM on the Yeast, Cho et al, and Auble data sets.]
P. Maji and S. K. Pal, ``Rough-Fuzzy C-Medoids Algorithm and Selection of Bio-Basis for Amino Acid Sequence Analysis’’, IEEE
Trans. Knowledge and Data Engineering, 19(6), pp. 859--872, 2007.
Amino Acid Sequence Analysis (NCBI)
Data sets: five HIV protein, Caspase Cleavage, and Cai-Chou HIV data.
Dayhoff amino acid mutation matrix is used to compute homology score.
[Figure: β Index and γ Index of RFCMdd, FCMdd, RCMdd, HCMdd, MI, and GAFR on AAC82593, AAG42635, AAO40777, NP057849, NP057850, Cai-Chou, and CaspaseCl.]
RFCMdd provides higher β index and lower γ index values.
The concept of crisp lower bound and fuzzy boundary deals with
uncertainty, vagueness, and incompleteness in class definition. The
membership function efficiently handles overlapping partitions.
Use of rough sets and fuzzy memberships adds a small computational load
to the hard clustering algorithm; however, the corresponding integrated
method shows a definite increase in performance.
Rough-fuzzy c-medoids (relational or pair-wise clustering) can be applied
for clustering relational datasets of bioinformatics, data and web mining.
Pradipta Maji and Sankar K. Pal, "Rough Set Based Generalized Fuzzy C-Means
Algorithm and Quantitative Indices", IEEE Transactions on Systems, Man, and
Cybernetics, Part B: Cybernetics, 37(6), pp. 1529-1540, 2007.