A Fuzzy Approach for Multi-Type Relational Data Clustering

Jian-Ping Mei, Student Member, IEEE, and Lihui Chen, Senior Member, IEEE

Abstract—Mining interrelated data among multiple types of objects or entities is important in many real-world applications. Despite extensive study on fuzzy clustering of vector space data, very limited exploration has been made on fuzzy clustering of relational data involving several object types. In this paper, we propose a new fuzzy approach for clustering multi-type relational data (FC-MR), which simultaneously clusters different types of objects. In FC-MR, an object is assigned a large membership in a cluster if its related objects in this cluster have high rankings; in each cluster, an object tends to have a high ranking if its related objects have large memberships in this cluster. The FC-MR approach is formulated to deal with multi-type relational data of various structures. The objective function of FC-MR is locally optimized by an efficient iterative algorithm which updates the fuzzy membership matrix and the ranking matrix of one type at a time while keeping those of the other types constant. We also discuss simplified versions of FC-MR for multi-type relational data with two special structures, namely the star-structure and the extended star-structure. Experimental studies are conducted on benchmark document datasets to illustrate how the proposed approach can be applied flexibly under different scenarios in real-world applications. The experimental results demonstrate the feasibility and effectiveness of the new approach compared with existing ones.

Index Terms—Fuzzy clustering, relational data, multi-type, document clustering, multi-way clustering.

J.-P. Mei and L. Chen are with the Division of Information Engineering, School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore 639798, Republic of Singapore (e-mail: meij0002@e.ntu.edu.sg; elhchen@ntu.edu.sg).

I. INTRODUCTION

Clustering has been a fundamental and efficient tool for data analysis by grouping similar objects into clusters. Compared with hard clustering, fuzzy clustering, which allows overlaps among clusters, is able to provide a more accurate and natural description of the underlying structure of real-world data. As with k-means, most existing studies on fuzzy clustering, including the well-known fuzzy c-means (FCM) [1] and some recently proposed approaches such as [2], [3], deal with vector-based data, in which each object is represented as a vector in some feature space. For example, in document clustering, a document may be represented as a vector where each feature or dimension is a distinctive word. Besides vector-represented data, clustering based on pairwise relations has also been studied for a long time, for example hierarchical clustering [4], k-medoid clustering [5], and fuzzy relational clustering [6]. Generally, a pairwise relation can be described by similarities or dissimilarities between each pair of objects in a given dataset. In this paper, we only consider similarity-type relations, which means that the larger the value of the relationship between two objects is, the more similar the two objects are, or the more strongly the two objects are associated. In computer science, relational data can also be modeled as a graph, in which a node or vertex corresponds to an object and the weight of the edge connecting two nodes is the similarity between the two objects. Thus, a graph can be constructed from a relation matrix, and clustering of relational data is then equivalent to partitioning the corresponding graph. Since this type of relation, e.g., a document-document relation, exists among objects of the same type, it is referred to as homogeneous relational data.

Different from traditional clustering approaches, which generate clusters of objects of a single type based on the vector representation or the pairwise relation, co-clustering approaches simultaneously cluster the rows and columns of a data matrix [7]–[12]. For example, in document co-clustering based on the document-word co-occurrence matrix, both document clusters and word clusters are produced. Co-clustering approaches were initially proposed for handling high-dimensional data such as text documents and microarray data, where the effectiveness of traditional distance-based clustering approaches degrades due to the "curse of dimensionality". In [7], co-clustering is treated as bi-partite graph partitioning, solved by calculating the Singular Value Decomposition (SVD) of the data matrix. A bi-partite graph consists of two types of nodes, and edges only exist between nodes of different types. Since homogeneous relational data corresponds to a graph consisting of nodes of the same type, a data matrix such as the document-term matrix, which corresponds to a bi-partite graph, can be treated as bi-type heterogeneous relational data, i.e., a relation between two different object types, as illustrated in Fig. 1. From this point of view, co-clustering is actually bi-type heterogeneous relational data clustering [13], or two-way clustering, as it produces clusters for two object types simultaneously [14].

However, the data in real-world data mining applications may contain relations involving more than two types. For example, for a dataset recording information about published research papers, other than knowing the set of words each paper contains, we may also know the authors of each paper, the name of the conference or journal where each paper is published, and the references of each paper. Such a dataset may be referred to as four-type relational data: it consists of four types of objects or entities, namely paper, term, author and venue (e.g., conference, journal), together with four relations, namely the paper-term, paper-author, paper-venue and paper-paper relations.


The first three relations characterize each individual paper with respect to different aspects, namely the content, the person(s) who wrote the paper, and the place where the paper was published, respectively, while the last relation records the citation of one paper by another. A small research paper dataset is given in Fig. 2 to illustrate the relations among multiple object types. Multi-type relational data may form different structures depending on which relations are available. A star-structure is a special case where relations only exist between the central type and several attribute types. For example, the research paper data forms a four-type star-structure with the paper-term, paper-author, and paper-venue relations, where paper is the central type and term, author and venue are three attribute types. It is no longer a star-structure when the paper-paper relation is also considered. It is possible to transform multi-type relational data into one of the basic data representation forms and then use an existing approach to get the clusters of objects of the type of interest. However, useful information may be lost during such data transformation. Moreover, clustering each type of objects individually loses the chance of mutual improvement among clusters of different object types, and is unable to capture the interrelated patterns among different types, which may be interesting in some data mining applications [15].

Fig. 1. Graph representation of homogeneous relational data and bi-type heterogeneous relational data. Different shapes of nodes indicate different object types.

Fig. 2. Illustration of the different relations (paper-paper, paper-author, paper-term, paper-conference) in a small research paper dataset. The dataset involves three papers, three authors, four terms and two conferences.

To fully make use of these relations among multiple object types, researchers began to explore multi-type relational data clustering approaches, which produce clusters for different types simultaneously [13]–[19]. These approaches can be viewed as different generalized co-clustering approaches. The approach in [13] can be seen as an extension of bi-partite graph partitioning to m-partite graph partitioning, which is solved with semi-definite programming. The approach in [14] extends information-theoretic co-clustering to multi-way clustering. Although in [15], [16], [18] the multi-type relational clustering is formulated as high-level co-clustering with collective matrix factorizations, different algorithms are derived in these three studies to find or estimate the solutions. Compared with the exhaustive search used in [15] and the iterative eigendecomposition used in [16], the multiplicative algorithm used in [18] is more computationally efficient. In the probability-model-based approach [19], the posterior in the E-step cannot be calculated in a straightforward way due to the dependency of latent variables, which violates the independence assumption. Another approach, NetClus, is proposed in [20] for star-structured information network analysis. This is also a generative-model-based approach, where in each iteration a generative model for the objects of the central type is established based on the current rankings of attribute-type objects in each cluster, which are calculated with heuristically designed ranking rules. Most of these existing approaches only consider relations between two different types, and some of them only handle the star-structured case.

In this paper, we develop a new approach of fuzzy clustering for multi-type relational data (FC-MR). The main contributions of this paper are:
• Formulate the fuzzy clustering of multi-type relational data of various structures as a constrained maximization problem.
• Establish connections between the proposed approach and two existing fuzzy approaches to show that the fuzzy relational clustering approach PFC [21] and the fuzzy co-clustering approach FCoDok [11] are two special cases of the proposed one.
• Provide an experimental study on real-world datasets extracted from two benchmark document collections, namely 20newsgroup [22] and Cora [23], to illustrate how the proposed approach can be applied according to the availability of relations. It demonstrates the feasibility and effectiveness of the proposed approach compared with existing ones.

The rest of the paper is organized as follows: in the next section, we present the detailed formulation and algorithm of FC-MR. In Section 3, we discuss special cases of FC-MR and show the connections between FC-MR and existing fuzzy approaches. Experimental results on benchmark datasets are reported and discussed in Section 4. Finally, we give the conclusion of this paper in Section 5.

Throughout this paper, we use the following notation unless otherwise stated:
• Bold uppercase letters denote matrices, e.g., X. The transpose of a matrix X is denoted as X^T, Tr(X) is the trace of X, and '∘' is the element-wise multiplication of two matrices.
• Bold lowercase letters denote vectors, e.g., x.
• Letters in calligraphic font denote sets, e.g., 𝒳, and |𝒳| represents the size of the set 𝒳.
• Bold 1 denotes a vector of all 1s of proper length; 1 is also used for a matrix of all 1s of proper size.


II. FUZZY CLUSTERING FOR MULTI-TYPE RELATIONAL DATA

Now we present the details of the proposed fuzzy approach for clustering relational data involving several types of objects.

A. Problem Formulation

We denote a dataset with m object types as 𝒳 = {𝒳_μ}, μ = 1, ..., m, where 𝒳_μ = {x_i^μ}, i = 1, ..., n_μ, is the set of objects of the μth type and n_μ = |𝒳_μ|. The relation between 𝒳_μ and 𝒳_ν, denoted as 𝒳_μ ∼ 𝒳_ν, is stored in a matrix R_μν; it may be between two different types (μ ≠ ν) or within the same type (μ = ν). We assume R_μμ = R_μμ^T. Each entry r_ij^μν of R_μν records the value of the relationship between objects x_i^μ and x_j^ν. Each relation matrix R_μν is associated with a weight β_μν. The clustering problem now is to partition the dataset 𝒳 into k clusters, i.e., 𝒳 = ∪{C_f}, f = 1, ..., k, by making use of all relations effectively and collaboratively. Each cluster C_f is a union of subsets of objects of each of the types, i.e., C_f = ∪{C_f^μ}, μ = 1, ..., m, where C_f^μ represents the subset of objects of the μth type in the fth cluster. Equivalently, the problem is to simultaneously produce a partitioning of each type of objects, 𝒳_μ = ∪{C_f^μ}, f = 1, ..., k, and at the same time to establish one-to-one associations among the clusters of different types.

To formulate this problem, for each type μ ∈ {1, 2, ..., m} we define a cluster membership u_if^μ and a ranking v_if^μ for each object x_i^μ with respect to each cluster f. The membership is a soft or fuzzy cluster indicator that measures how likely an object is to be labeled with that cluster, and the ranking measures how representative or typical x_i^μ is, compared with the other objects in 𝒳_μ, in cluster f. Assuming the number of clusters to be produced is k, each type 𝒳_μ is associated with two n_μ × k matrices U_μ and V_μ. To be more clear, we write out the membership matrix and its transpose in terms of their column vectors as

  U_\mu = (u_1^\mu, u_2^\mu, \ldots, u_k^\mu), \quad U_\mu^T = (\tilde{u}_1^\mu, \tilde{u}_2^\mu, \ldots, \tilde{u}_{n_\mu}^\mu).   (1)

Each column of U_μ, i.e., (u_f^μ) of size n_μ × 1 for f = 1, 2, ..., k, records the memberships of the n_μ objects of 𝒳_μ in the fth cluster, and each row of U_μ, i.e., (ũ_i^μ)^T of size 1 × k for i = 1, 2, ..., n_μ, records the memberships of object x_i^μ in all the k clusters, where ũ_i^μ is the ith column vector of U_μ^T. The notation for V_μ is defined similarly.

It is intuitive that an object is more likely to be assigned to a cluster where many of its related objects are ranked high in that cluster, and that within a cluster, an object is more likely to be a representative object if it is related to many of the objects assigned to that cluster. For an object x_i^μ, the term

  R_{if}^{\mu} = \sum_{\nu=1}^{m} \sum_{j=1}^{n_\nu} r_{ji}^{\nu\mu} v_{jf}^{\nu}   (2)

gives the sum of the rankings in cluster f of each of its related objects of all the m types, weighted by the strength of the relationships, and

  M_{if}^{\mu} = \sum_{\nu=1}^{m} \sum_{j=1}^{n_\nu} r_{ji}^{\nu\mu} u_{jf}^{\nu}   (3)

calculates the sum of the memberships in cluster f of each of its related objects, weighted by the strength of the relationships. We may expect that the membership of x_i^μ is larger in cluster f than in cluster c if R_if^μ > R_ic^μ, and that within a cluster f, the ranking of x_i^μ is larger than that of x_j^μ if M_if^μ > M_jf^μ. Based on this idea, we now formulate the proposed FC-MR as follows:

  \max J = \mathrm{Tr}\Big(\sum_{\mu=1}^{m}\sum_{\nu=1}^{m} \beta_{\mu\nu}\, U_\mu^T R_{\mu\nu} V_\nu\Big) - \sum_{\mu=1}^{m} \frac{\phi_\mu}{2}\|U_\mu\|_F^2 - \sum_{\nu=1}^{m} \frac{\theta_\nu}{2}\|V_\nu\|_F^2   (4)

subject to

  U_\mu \mathbf{1} = \mathbf{1}, \quad V_\nu^T \mathbf{1} = \mathbf{1}, \quad U_\mu, V_\nu \ge 0 \quad \text{for } \mu, \nu = 1, 2, \ldots, m,   (5)

where ‖A‖_F^2 = Tr(A^T A) denotes the squared Frobenius norm of matrix A.

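To make the objective (4) concrete, the following minimal NumPy sketch evaluates J for given memberships and rankings. It is only our illustration of the formula, not code from the paper; the function name and the dict-of-relations layout are ours.

    import numpy as np

    def fc_mr_objective(R, beta, U, V, phi, theta):
        # R, beta: dicts keyed by type pairs (mu, nu), one entry per
        # available relation; U[mu], V[mu]: n_mu x k matrices.
        m = len(U)
        coupling = sum(beta[key] * np.trace(U[key[0]].T @ R[key] @ V[key[1]])
                       for key in R)                    # trace term of (4)
        reg_u = sum(0.5 * phi[mu] * np.linalg.norm(U[mu], 'fro') ** 2
                    for mu in range(m))                 # membership regularizer
        reg_v = sum(0.5 * theta[mu] * np.linalg.norm(V[mu], 'fro') ** 2
                    for mu in range(m))                 # ranking regularizer
        return coupling - reg_u - reg_v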

The formulation above consists of three components: a quality measure of a partitioning, constraints, and regularization. We now give an analytical study to see how this formulation matches the intuitive idea. Without loss of generality, we assume β_μν = 1 for all μ, ν. We use r̃_i^νμ = [r_{1i}^νμ, r_{2i}^νμ, ..., r_{n_ν i}^νμ]^T to denote the ith column of R_νμ. We rewrite the first term of (4) by writing out the trace expression:

  \mathrm{Tr}\Big(\sum_{\mu=1}^{m}\sum_{\nu=1}^{m} U_\mu^T R_{\mu\nu} V_\nu\Big) = \sum_{\mu=1}^{m}\sum_{i=1}^{n_\mu} \tilde{u}_i^{\mu T} \sum_{\nu=1}^{m} V_\nu^T \tilde{r}_i^{\nu\mu}   (6)

where ũ_i^μ = [u_{i1}^μ, u_{i2}^μ, ..., u_{ik}^μ]^T. In order to maximize the value of the above equation, for each μ and i the following value needs to be maximized:

  \sum_{\nu=1}^{m} \tilde{u}_i^{\mu T} V_\nu^T \tilde{r}_i^{\nu\mu} = \sum_{\nu=1}^{m}\sum_{c=1}^{k} u_{ic}^{\mu}\, v_c^{\nu T} \tilde{r}_i^{\nu\mu} = \sum_{c=1}^{k} u_{ic}^{\mu} R_{ic}^{\mu}   (7)

where R_ic^μ is defined in (2). With the constraint that 1^T ũ_i^μ = 1 for all i and μ, maximizing (7) tends to assign x_i^μ a large membership in cluster f if R_if^μ is large. This rule is consistent with our earlier idea about how the memberships of the different clusters should be assigned to an object. To discuss how the rankings are distributed among the objects of each type in a cluster when maximizing the objective function (4), we change (6) into another form:

  \mathrm{Tr}\Big(\sum_{\mu=1}^{m}\sum_{\nu=1}^{m} U_\mu^T R_{\mu\nu} V_\nu\Big) = \sum_{c=1}^{k}\sum_{\mu=1}^{m}\sum_{\nu=1}^{m} u_c^{\mu T} R_{\mu\nu} v_c^{\nu}   (8)

We have

  \sum_{\mu=1}^{m}\sum_{\nu=1}^{m} u_c^{\mu T} R_{\mu\nu} v_c^{\nu} = \sum_{\nu=1}^{m}\sum_{j=1}^{n_\nu} v_{jc}^{\nu} \sum_{\mu=1}^{m} u_c^{\mu T} \tilde{r}_j^{\mu\nu} = \sum_{\nu=1}^{m}\sum_{j=1}^{n_\nu} v_{jc}^{\nu} M_{jc}^{\nu}   (9)

where M_jc^ν is defined in (3). Maximizing (8) is to maximize the value of (9) for each c ∈ {1, 2, ..., k} under the constraints that 1^T v_c^ν = 1 for all ν and c. This requires the values of v_c^ν = (v_{1c}^ν, v_{2c}^ν, ..., v_{n_ν c}^ν)^T to be distributed such that an object x_p^ν has a large ranking value v_pc^ν if M_pc^ν is large. This solution follows our general idea on how objects of the same type should be ranked in a cluster.

We have just shown that the first term of the objective function in (4) measures the quality of the clusters, and that maximizing it requires memberships and rankings to be distributed in a way that matches our basic idea. The other terms in (4) are the squared Frobenius norms of the membership matrices and ranking matrices. These terms act as regularization, preventing each object from being assigned to only the single cluster f with f = arg max_c R_ic^μ, or only one object x_p^μ of each type, with p = arg max_j M_jc^μ, having a non-zero ranking value in a cluster c. The parameters φ_μ and θ_ν trade off the contribution of the first term against the regularization terms. We can write the squared Frobenius norms of a membership matrix and a ranking matrix as sums of squared l2 norms of rows and columns, respectively, i.e., ‖U_μ‖_F^2 = Σ_{i=1}^{n_μ} ‖ũ_i^μ‖_2^2 and ‖V_μ‖_F^2 = Σ_{c=1}^{k} ‖v_c^μ‖_2^2. Since each ‖ũ_i^μ‖_2 and ‖v_c^μ‖_2 is minimized when all its elements are equal, i.e., u_ic^μ = 1/k for all c and v_jc^μ = 1/n_μ for all j, the larger φ_μ is, the smoother in general the memberships of each object of type μ are distributed over the k clusters; and the larger θ_μ is, the smoother the rankings are distributed over all objects of type μ in each cluster. In fuzzy clustering, the smoothness of the membership distribution over the clusters is also referred to as fuzziness, and this kind of regularization, referred to as quadratic regularization, has been used in several fuzzy approaches [11], [21], [24].

B. Solution

We use the method of Lagrange multipliers to derive local solutions of the constrained optimization problem formulated above. With vectors γ_μ, λ_ν and matrices α_μ (with entries α_{i,c}^μ) and β_ν (with entries β_{i,c}^ν) being the Lagrange multipliers, the Lagrangian is formed as

  L = J + \sum_{\mu=1}^{m} \gamma_\mu^T (U_\mu \mathbf{1} - \mathbf{1}) + \sum_{\nu=1}^{m} \lambda_\nu^T (V_\nu^T \mathbf{1} - \mathbf{1}) + \sum_{\mu=1}^{m} \mathbf{1}^T (\alpha_\mu \circ U_\mu)\mathbf{1} + \sum_{\nu=1}^{m} \mathbf{1}^T (\beta_\nu \circ V_\nu)\mathbf{1}   (10)

According to the KKT (Karush-Kuhn-Tucker) conditions

  \partial L / \partial u_{ic}^{\mu} = 0,   (11)
  \alpha_{i,c}^{\mu} u_{ic}^{\mu} = 0,   (12)
  \alpha_{i,c}^{\mu} \ge 0,   (13)

after some algebraic manipulations, the memberships of object x_i^μ in the k clusters can be derived as

  \tilde{u}_i^{\mu} = \frac{1}{\phi_\mu} g_i^{\mu} + \frac{1}{|K_i^{\mu+}|}\Big(1 - \frac{1}{\phi_\mu} e_i^T g_i^{\mu}\Big)\mathbf{1}   (14)

with

  g_i^{\mu} = \sum_{\nu=1}^{m} \beta_{\mu\nu} V_\nu^T \tilde{r}_i^{\nu\mu} = [g_{i1}^{\mu}, g_{i2}^{\mu}, \ldots, g_{ik}^{\mu}]^T,   (15)

  e_i^T = [e_{i1}, e_{i2}, \ldots, e_{ik}],   (16)

  e_{if} = 1 \text{ for } f \in K_i^{\mu+}, \text{ and } 0 \text{ otherwise},   (17)

where K_i^{μ+} = {f : u_if^μ > 0}. Similarly, the rankings of all objects of the μth type in cluster c are

  v_c^{\mu} = \frac{1}{\theta_\mu} h_c^{\mu} + \frac{1}{|N_c^{\mu+}|}\Big(1 - \frac{1}{\theta_\mu} e_c^T h_c^{\mu}\Big)\mathbf{1}   (18)

with

  h_c^{\mu} = \sum_{\nu=1}^{m} \beta_{\mu\nu} R_{\mu\nu} u_c^{\nu} = [h_{1c}^{\mu}, h_{2c}^{\mu}, \ldots, h_{n_\mu c}^{\mu}]^T,   (19)

  e_c^T = [e_{1c}, e_{2c}, \ldots, e_{n_\mu c}],   (20)

  e_{jc} = 1 \text{ for } j \in N_c^{\mu+}, \text{ and } 0 \text{ otherwise},   (21)

where N_c^{μ+} = {l : v_lc^μ > 0}. With U_μ = (ũ_1^μ, ũ_2^μ, ..., ũ_{n_μ}^μ)^T and V_μ = (v_1^μ, v_2^μ, ..., v_k^μ), we get the membership matrix and the ranking matrix of the objects in 𝒳_μ.

The first term of (14) decides the membership distribution of each object x_i^μ over the k clusters, while the second term is a normalization term ensuring that the summation constraint is satisfied. Similarly, the first term in (18) decides the distribution of ranking values among the objects of 𝒳_μ in each cluster c, which is normalized by the second term so that the ranking values of objects of the same type in each cluster sum to 1.

The remaining problem is to determine K_i^{μ+} and N_c^{μ+}. Following the discussions in [24] and [21], it can be proved that if c ∈ K_i^{μ+}, then every f with g_if^μ > g_ic^μ also belongs to K_i^{μ+}, and if j ∈ N_c^{μ+}, then every p with h_pc^μ > h_jc^μ also belongs to N_c^{μ+}. Based on this, K_i^{μ+} and N_c^{μ+} can be calculated in an incremental way similar to Procedure-K and Procedure-N given in [21].
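The determination of K_i^{μ+} together with the closed form (14) has the same effect as projecting g_i^μ/φ_μ onto the probability simplex. The sketch below is our own illustration of this row update (Procedure-K itself is specified in [21]); the analogous column update (18)-(21) is obtained by replacing g and φ_μ with h_c^μ and θ_μ.

    import numpy as np

    def update_membership_row(g, phi):
        # Closed-form update (14)-(17) for one object: the loop plays the
        # role of the incremental determination of K_i^+.
        k = g.size
        order = np.argsort(-g)              # clusters sorted by decreasing g_if
        g_sorted = g[order]
        csum = np.cumsum(g_sorted)
        t_star = 1
        for t in range(1, k + 1):
            # value of the t-th largest coordinate if |K_i^+| = t, cf. (14)
            val = g_sorted[t - 1] / phi + (1.0 - csum[t - 1] / phi) / t
            if val > 0:
                t_star = t                  # largest feasible active set
        u = np.zeros(k)
        active = order[:t_star]
        u[active] = g[active] / phi + (1.0 - csum[t_star - 1] / phi) / t_star
        return u                            # nonnegative, sums to 1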

C. Algorithm

Given the relation matrices among the m types and the membership matrices {U_μ^(l)} of all types at iteration l, the ranking values {V_μ^(l)} of each type are updated with the current memberships. Based on {V_μ^(l)}, the memberships are re-estimated to obtain the updated set {U_μ^(l+1)}. This alternating iteration continues until convergence or until reaching the maximum number of iterations specified by the user. For a more efficient implementation of the algorithm, we use the following simplified procedure to update U_μ and V_μ in matrix form, which works well in practice.

• Updating U_μ: calculate

  U_\mu = \frac{1}{\phi_\mu} G_\mu + \frac{1}{k}\Big(\mathbf{1} - \frac{1}{\phi_\mu} G_\mu \mathbf{1}\Big)   (22)

with

  G_\mu = \sum_{\nu=1}^{m} \beta_{\mu\nu} R_{\mu\nu} V_\nu.   (23)

For each row i of U_μ, find

  K_i^{\mu+} = \{f : u_{if}^{\mu} > 0\}   (24)

and modify

  u_{ic}^{\mu} \leftarrow 0 \text{ for } c \notin K_i^{\mu+}; \quad u_{ic}^{\mu} \leftarrow u_{ic}^{\mu} \Big/ \sum_{f \in K_i^{\mu+}} u_{if}^{\mu} \text{ for } c \in K_i^{\mu+}.   (25)

• Updating V_μ: calculate

  V_\mu = \frac{1}{\theta_\mu} H_\mu + \frac{1}{n_\mu}\Big(\mathbf{1} - \frac{1}{\theta_\mu} \mathbf{1} H_\mu\Big)   (26)

with

  H_\mu = \sum_{\nu=1}^{m} \beta_{\mu\nu} R_{\mu\nu} U_\nu.   (27)

For each column c of V_μ, find

  N_c^{\mu+} = \{p : v_{pc}^{\mu} > 0\}   (28)

and modify

  v_{jc}^{\mu} \leftarrow 0 \text{ for } j \notin N_c^{\mu+}; \quad v_{jc}^{\mu} \leftarrow v_{jc}^{\mu} \Big/ \sum_{p \in N_c^{\mu+}} v_{pc}^{\mu} \text{ for } j \in N_c^{\mu+}.   (29)

Algorithm 1: FC-MR
  input : n_μ × n_ν relation matrices {R_μν}, μ, ν ∈ {1, 2, ..., m}; the number of clusters k; parameters {φ_μ} and {θ_μ}; positive weights {β_μν}.
  output: fuzzy membership U_μ and ranking matrix V_μ for each μ ∈ {1, 2, ..., m}.
  Initialize {U_μ^(0)}, μ ∈ {1, 2, ..., m}, with nonnegative values;
  repeat
    for each μ ∈ {1, 2, ..., m} do
      Update V_μ by (26) with H_μ calculated as in (27);
      Modify V_μ according to (29);
    end
    for each μ ∈ {1, 2, ..., m} do
      Update U_μ by (22) with G_μ calculated as in (23);
      Modify U_μ according to (25);
    end
  until convergence;

The complete procedure of the FC-MR algorithm is given in Algorithm 1. The most costly step of updating U_μ (or V_μ) is the calculation of G_μ in (23) (or H_μ in (27)). If, for each μ, the relation matrix R_μν exists for all ν, each of these two steps has a time complexity of O(n_μ n_max k), where k is the number of clusters and n_max = max_ν n_ν is the largest of the m object set sizes. When the relation matrices are sparse, this complexity reduces to O(e_max k), where e_max = max_ν e_μν and e_μν is the number of nonzero entries of R_μν. Assuming the algorithm converges after l iterations, its total time complexity is O(l m n_max n_max k) for dense relation matrices, or O(l m e_max k) in the sparse case with e_max = max_{μ,ν} e_μν, where m is the number of object types.

Other than k, the sets of parameters {φ_μ} and {θ_μ} control how smoothly the memberships are distributed over the k clusters and how smoothly the rankings are distributed over the n_μ objects, respectively. In practice, such parameters always need to be tuned empirically due to the difficulty of developing generic parameter estimation approaches theoretically. Since typically k ≪ n_μ, the same degree of change in φ_μ always has a larger impact on the results than in θ_μ. This means that φ_μ needs to be tuned on a finer grid than θ_μ. Based on our experimental study, reasonable results are always obtained when φ_μ and θ_μ are set such that φ_μ/k and θ_μ/n_μ are of about the same order. This indicates that the value of θ_μ should be larger than φ_μ when n_μ ≫ k. We also observed that, to obtain reasonable results, the values of φ_μ and θ_μ are usually larger for well-separated datasets than for highly overlapping datasets.

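As a summary of this section, the following is a minimal dense NumPy sketch of Algorithm 1 using the simplified matrix updates (22)-(29). The function and variable names are ours; the 'modify' steps are implemented as a clip-and-renormalize, and every type is assumed to take part in at least one relation.

    import numpy as np

    def _clip_renormalize_rows(A):
        # 'Modify' steps (24)-(25): zero out negative entries and
        # renormalize each row over its positive support.
        A = np.maximum(A, 0.0)
        s = A.sum(axis=1, keepdims=True)
        s[s == 0] = 1.0                      # guard against empty rows
        return A / s

    def fc_mr(R, beta, n, k, phi, theta, n_iter=100, seed=0):
        # R, beta: dicts keyed by type pairs (mu, nu); n: list of type sizes.
        rng = np.random.default_rng(seed)
        m = len(n)
        U = [_clip_renormalize_rows(rng.random((n[mu], k))) for mu in range(m)]
        V = [np.full((n[mu], k), 1.0 / n[mu]) for mu in range(m)]
        for _ in range(n_iter):
            for mu in range(m):              # ranking updates, eqs. (26)-(29)
                H = sum(beta[mu, nu] * R[mu, nu] @ U[nu]
                        for nu in range(m) if (mu, nu) in R)
                Vmu = H / theta[mu] + (1.0 - H.sum(axis=0) / theta[mu]) / n[mu]
                V[mu] = _clip_renormalize_rows(Vmu.T).T   # columns sum to 1
            for mu in range(m):              # membership updates, eqs. (22)-(25)
                G = sum(beta[mu, nu] * R[mu, nu] @ V[nu]
                        for nu in range(m) if (mu, nu) in R)
                Umu = G / phi[mu] + ((1.0 - G.sum(axis=1) / phi[mu]) / k)[:, None]
                U[mu] = _clip_renormalize_rows(Umu)
        return U, V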

III. STAR-STRUCTURE AND EXTENDED STAR-STRUCTURE

A. FC-MR for Star-Structured Relational Data

In the previous section, we presented the FC-MR algorithm, which is applicable to relational data of various structures. Now we discuss a special case where the relational data forms a star-structure. Relational data of this structure consists of one central type and several attribute types, and only central-attribute relations are considered. For m-type star-structured relational data, we assume μ = 1 is the central type and μ ∈ {2, 3, ..., m} are the m − 1 attribute types. The relation between the central type and the (μ − 1)th attribute type is recorded by the matrix R_1μ. The objective function of FC-MR for star-structured relational data reduces to

  J_{star} = J_{star1}(U_1, \{V_\mu\}_{\mu=2}^{m}) + J_{star2}(V_1, \{U_\mu\}_{\mu=2}^{m})   (30)

where

  J_{star1} = \mathrm{Tr}\Big(U_1^T \sum_{\mu=2}^{m} \beta_{1\mu} R_{1\mu} V_\mu\Big) - \frac{\phi_1}{2}\|U_1\|_F^2 - \sum_{\mu=2}^{m} \frac{\theta_\mu}{2}\|V_\mu\|_F^2   (31)

  J_{star2} = \mathrm{Tr}\Big(\sum_{\mu=2}^{m} \beta_{\mu 1} U_\mu^T R_{\mu 1} V_1\Big) - \frac{\theta_1}{2}\|V_1\|_F^2 - \sum_{\mu=2}^{m} \frac{\phi_\mu}{2}\|U_\mu\|_F^2   (32)

It can be seen that J_star1 and J_star2 are decoupled: U_1 and {V_μ} (μ = 2, ..., m) are mutually dependent, and V_1 and {U_μ} (μ = 2, ..., m) depend on each other. Therefore, we can choose to maximize only J_star1 to get U_1 and {V_μ} as in (22) and (26), with the reduced G_1 and {H_μ} given by

  G_1 = \sum_{\mu=2}^{m} \beta_{1\mu} R_{1\mu} V_\mu   (33)

and

  H_\mu = \beta_{\mu 1} R_{\mu 1} U_1 \quad \text{for } \mu = 2, 3, \ldots, m,   (34)

or to maximize J_star2 to obtain V_1 and {U_μ} as in (26) and (22), with H_1 and {G_μ} reduced to

  H_1 = \sum_{\mu=2}^{m} \beta_{1\mu} R_{1\mu} U_\mu   (35)

and

  G_\mu = \beta_{\mu 1} R_{\mu 1} V_1 \quad \text{for } \mu = 2, 3, \ldots, m.   (36)

When m = 2, i.e., a bi-type star-structure with one central type and one attribute type,

  J_{star1} = \mathrm{Tr}(\beta_{12} U_1^T R_{12} V_2) - \frac{\phi_1}{2}\|U_1\|_F^2 - \frac{\theta_2}{2}\|V_2\|_F^2   (37)

  J_{star2} = \mathrm{Tr}(\beta_{21} U_2^T R_{21} V_1) - \frac{\phi_2}{2}\|U_2\|_F^2 - \frac{\theta_1}{2}\|V_1\|_F^2   (38)

It can be observed that J_star1 (m = 2) in (37) is identical to the following equation if we set β_12 = 1, φ_1 = 2T_u and θ_2 = 2T_v:

  J_{star1} = \sum_{c=1}^{k}\sum_{j=1}^{n_2}\sum_{i=1}^{n_1} u_{ci} r_{ij} v_{cj} - T_u \sum_{c=1}^{k}\sum_{i=1}^{n_1} u_{ci}^2 - T_v \sum_{c=1}^{k}\sum_{j=1}^{n_2} v_{cj}^2   (39)

which is the objective function of the fuzzy co-clustering approach FCoDok [11]. This means that FCoDok is a special case of the proposed FC-MR with m = 2: it handles bi-type star-structured relational data and produces fuzzy memberships of the central-type objects and rankings of the attribute-type objects. The other group of information, the partitioning of the attribute type and the rankings of the central type, can be obtained by solving J_star2 in (38).

B. FC-MR for Extended Star-Structured Relational Data

The standard star-structure only consists of relations between the central type and the attribute types. In some situations, however, the pairwise relation within the central type is also available. We refer to this structure as an extended star-structure: it extends the star-structure by further considering the pairwise relation among objects of the central type. For example, in Webpage categorization, a Webpage contains both content information, i.e., the words on the page, and linkage information, i.e., the outgoing and incoming links to other Webpages. Thus a Webpage dataset has both a Webpage-word relation and a Webpage-Webpage relation. Another example is technical paper data: together with the words occurring in a paper, it also contains citations or references to other papers. We believe that both the content information of each individual document and the linkage relation among documents are helpful for clustering these documents. In fact, several recent studies on topic modeling began to incorporate the network structure when establishing topic models, such as [25], [26] and [27]. The objective function of FC-MR for extended star-structured relational data can be written as

  J_{estar} = J_{star} + \mathrm{Tr}(\beta_{11} U_1^T R_{11} V_1)   (40)

with J_star given in (30), where R_11 is the homogeneous relation matrix of the central type. Compared with the star-structure case, the objective function in (40) has one additional term, which makes U_1 dependent on V_1. Given initial U_1^(0) and V_1^(0), we update each V_μ and U_μ for μ ∈ {2, 3, ..., m}, and then re-estimate V_1 and U_1. The updates of any U_μ and V_μ still follow (22) and (26), respectively, where the {G_μ} (μ = 2, ..., m) used for updating the attribute-type {U_μ} are the same as (36) and the {H_μ} for updating {V_μ} are the same as (34); for the central-type U_1 and V_1, (41) and (42) below are used instead of (33) and (35), respectively:

  G_1 = \sum_{\mu=1}^{m} \beta_{1\mu} R_{1\mu} V_\mu   (41)

  H_1 = \sum_{\mu=1}^{m} \beta_{1\mu} R_{1\mu} U_\mu   (42)

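The only change relative to the star case is that the homogeneous term now enters the central-type aggregations. A small sketch of (41)-(42), with hypothetical names (R1 maps μ to R_1μ, including μ = 1 for R_11; NumPy arrays assumed):

    def central_aggregations(R1, beta1, U, V):
        # R1[mu] = R_1mu, with mu = 1 giving the homogeneous R_11.
        G1 = sum(beta1[mu] * R1[mu] @ V[mu] for mu in R1)   # eq. (41)
        H1 = sum(beta1[mu] * R1[mu] @ U[mu] for mu in R1)   # eq. (42)
        return G1, H1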

C. FC-MR for Homogeneous Relational Data

When only the matrix R_11, which records the relationships between each pair of objects of the central type, is known, FC-MR reduces to homogeneous relational data clustering. The objective function becomes

  J_{hom} = \mathrm{Tr}(U_1^T R_{11} V_1) - \frac{\phi_1}{2}\|U_1\|_F^2 - \frac{\theta_1}{2}\|V_1\|_F^2.   (43)

If a dissimilarity matrix D is defined as

  D = r_{max}\mathbf{1} - R_{11},   (44)

where r_max is the largest entry of R_11, we obtain

  \mathrm{Tr}(U_1^T R_{11} V_1) = r_{max}\,\mathrm{Tr}(U_1^T \mathbf{1} V_1) - \mathrm{Tr}(U_1^T D V_1) = r_{max} n_1 - \mathrm{Tr}(U_1^T D V_1),   (45)

where n_1 is the number of objects of the central type. Here, under the constraints 1^T V_1 = 1^T and U_1 1 = 1, we obtain Tr(U_1^T 1 V_1) = Σ_{c=1}^k u_c^T 1 1^T v_c = Σ_{c=1}^k u_c^T 1 = 1^T U_1^T 1 = n_1. Thus maximizing J_hom in (43) is equivalent to minimizing the objective

  J'_{hom} = \mathrm{Tr}(U_1^T D V_1) + \frac{\phi_1}{2}\|U_1\|_F^2 + \frac{\theta_1}{2}\|V_1\|_F^2,   (46)

which is identical to the objective function of fuzzy clustering with weighted medoids (PFC) reported in [21]. This means that PFC can be regarded as a special case of FC-MR for homogeneous relational data clustering.

IV. EXPERIMENTAL RESULTS

This section presents experimental studies of the proposed approach for clustering different document datasets, which are represented as multi-type relational data with different structures. The benchmark datasets 20newsgroups [22] and Cora [23] are used.

A. A Running Example

First, we give a running example of FC-MR on a toy problem with tri-type star-structured relational data. The simulated data contains 30 objects of three types, i.e., 𝒳 = 𝒳_1 ∪ 𝒳_2 ∪ 𝒳_3, where |𝒳_1| = 10, |𝒳_2| = 12, and |𝒳_3| = 8. Two matrices R_12 and R_13 record the 𝒳_1 ∼ 𝒳_2 and 𝒳_1 ∼ 𝒳_3 relations, respectively. For this dataset, 𝒳_1 is the central type while 𝒳_2 and 𝒳_3 are two attribute types. Fig. 3 shows the value distribution of the two matrices, where each entry is plotted as a dot whose size is proportional to the value of the relationship, i.e., a large dot indicates a large relationship value and a small value appears as a small dot. According to the 𝒳_1 ∼ 𝒳_2 relation shown in Fig. 3a, the objects in 𝒳_1 and 𝒳_2 each form two clusters, i.e., 𝒳_1 = C_1^1 ∪ C_2^1 with C_1^1 = {x_1^1, ..., x_5^1} and C_2^1 = {x_6^1, ..., x_10^1}, and 𝒳_2 = C_1^2 ∪ C_2^2 with C_1^2 = {x_1^2, ..., x_7^2} and C_2^2 = {x_8^2, ..., x_12^2}. From the other relation 𝒳_1 ∼ 𝒳_3 in Fig. 3b, it is seen that the objects in 𝒳_1 form the same clusters C_1^1 and C_2^1, and 𝒳_3 = C_1^3 ∪ C_2^3 with C_1^3 = {x_1^3, ..., x_4^3} and C_2^3 = {x_5^3, ..., x_8^3}. It is also shown that C_1^1 is associated with both C_1^2 and C_1^3, and C_2^1 with both C_2^2 and C_2^3.

Starting from a random partitioning of the central type, the successive values of the memberships U_1 of the central type and the rankings V_2 and V_3 of the two attribute types are given in Table I, where equal weights of the two relation matrices are used. From this table, it is seen that the algorithm converges quickly, after 5 iterations. After two iterations, V_2, V_3 and U_1 are already close to the final results. The cluster structures of the three types indicated by the converged values of V_2, V_3 and U_1 are consistent with the expected ones.

TABLE I
UPDATING OF RANKINGS AND MEMBERSHIPS DURING ITERATIONS
(each entry lists Id: value in cluster 1, value in cluster 2)

Iteration 0: initialization
  U1: 1: 0.6311, 0.7463 | 2: 0.0899, 0.0103 | 3: 0.0809, 0.0484 | 4: 0.7772, 0.6679 | 5: 0.9051, 0.6035 | 6: 0.5338, 0.5261 | 7: 0.1092, 0.7297 | 8: 0.8258, 0.7073 | 9: 0.3381, 0.7814 | 10: 0.2940, 0.2880

Iteration 1
  V2: 1: 0.1066, 0.0160 | 2: 0.1010, 0 | 3: 0.0929, 0 | 4: 0, 0 | 5: 0.2368, 0.0481 | 6: 0, 0 | 7: 0.1173, 0 | 8: 0, 0 | 9: 0, 0.0448 | 10: 0.1197, 0.4275 | 11: 0.0975, 0.1823 | 12: 0.1282, 0.2813
  V3: 1: 0, 0 | 2: 0.1608, 0 | 3: 0.3013, 0.1480 | 4: 0.1545, 0 | 5: 0, 0 | 6: 0.1203, 0.3397 | 7: 0.2632, 0.5123 | 8: 0, 0
  U1: 1: 0.7866, 0.2134 | 2: 0.7368, 0.2632 | 3: 0.7702, 0.2298 | 4: 0.8055, 0.1945 | 5: 0.7670, 0.2330 | 6: 0.1910, 0.8090 | 7: 0.2434, 0.7566 | 8: 0.1688, 0.8312 | 9: 0.1177, 0.8823 | 10: 0.2327, 0.7673

Iteration 2
  V2: 1: 0.1656, 0 | 2: 0.1389, 0 | 3: 0.1738, 0 | 4: 0.0593, 0 | 5: 0.2064, 0 | 6: 0.0763, 0 | 7: 0.1796, 0 | 8: 0, 0.0800 | 9: 0, 0.1409 | 10: 0, 0.3157 | 11: 0, 0.2019 | 12: 0, 0.2615
  V3: 1: 0.1640, 0 | 2: 0.2141, 0 | 3: 0.3689, 0 | 4: 0.2530, 0 | 5: 0, 0.1133 | 6: 0, 0.3236 | 7: 0, 0.4123 | 8: 0, 0.1509
  U1: 1: 1, 0 | 2: 1, 0 | 3: 0.9953, 0.0047 | 4: 1, 0 | 5: 0.9475, 0.0525 | 6: 0, 1 | 7: 0.0744, 0.9256 | 8: 0, 1 | 9: 0, 1 | 10: 0.0037, 0.9963

Iteration 5: converged
  V2: 1: 0.1613, 0 | 2: 0.1405, 0 | 3: 0.1642, 0 | 4: 0.0847, 0 | 5: 0.1839, 0 | 6: 0.0964, 0 | 7: 0.1690, 0 | 8: 0, 0.1112 | 9: 0, 0.1593 | 10: 0, 0.2813 | 11: 0, 0.2031 | 12: 0, 0.2450
  V3: 1: 0.1743, 0 | 2: 0.2201, 0 | 3: 0.3561, 0 | 4: 0.2494, 0 | 5: 0, 0.1435 | 6: 0, 0.3102 | 7: 0, 0.3635 | 8: 0, 0.1828
  U1: 1: 1, 0 | 2: 1, 0 | 3: 0.9901, 0.0099 | 4: 1, 0 | 5: 0.9414, 0.0586 | 6: 0, 1 | 7: 0.0819, 0.9181 | 8: 0, 1 | 9: 0, 1 | 10: 0.0082, 0.9918

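The exact random values behind Fig. 3 and Table I are not published, but data with the same two-block structure can be planted and fed to the fc_mr sketch from Section II-C; only the converged labels, not the individual numbers, are expected to match.

    import numpy as np

    rng = np.random.default_rng(1)
    n = [10, 12, 8]                                  # |X1|, |X2|, |X3|
    R12 = rng.random((10, 12)) * 0.1
    R12[:5, :7] += 1.0
    R12[5:, 7:] += 1.0                               # C1~C1 and C2~C2 blocks
    R13 = rng.random((10, 8)) * 0.1
    R13[:5, :4] += 1.0
    R13[5:, 4:] += 1.0
    R = {(0, 1): R12, (1, 0): R12.T, (0, 2): R13, (2, 0): R13.T}
    beta = {key: 1.0 for key in R}                   # equal weights, as in Table I
    U, V = fc_mr(R, beta, n, k=2, phi=[0.01, 0.01, 0.01],
                 theta=[1.0, 1.0, 1.0], n_iter=20)
    print(U[0].argmax(axis=1))    # five 0s then five 1s, up to label swap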

[Fig. 3: dot plots of the two relation matrices, (a) R12 and (b) R13.]
Fig. 3. Toy problem: a tri-type relational dataset consisting of 30 objects, where |X1| = 10, |X2| = 12, and |X3| = 8. Two relations are formed: X1 ∼ X2 and X1 ∼ X3. The corresponding relation matrices R12 and R13 are plotted; each entry is shown as a dot, and a larger dot indicates a stronger relationship.

TABLE II
STRUCTURE OF NEWSGROUP DATASETS

TM1: C1: {rec.sport.baseball, rec.sport.hockey}; C2: {talk.politics.guns, talk.politics.mideast, talk.politics.misc}
TM2: C1: {comp.graphics, comp.os.ms-windows.misc}; C2: {rec.autos, rec.motorcycles}; C3: {sci.crypt, sci.electronics}
TM3: C1: {comp.sys.ibm.pc.hardware, comp.sys.mac.hardware}; C2: {rec.autos, rec.motorcycles}; C3: {sci.med, sci.space}; C4: {talk.politics.guns, talk.politics.mideast}

B. Experiments on 20Newsgroups Data

In this experiment, FC-MR handles tri-type star-structured relational data.

1) Data: The original 20newsgroups data¹ contains 18828 non-duplicated documents, which are categorized into 20 topics. As in [13] and [16], we generate three subsets consisting of different subtopics, as listed in Table II. For example, dataset TM1 consists of clusters C1 and C2, where C1 is the rec.sport cluster, which contains two subtopics or categories on baseball and hockey, and C2 is the talk.politics cluster, which has three subtopics on guns, mideast and misc. Each of the three datasets is formed by randomly selecting 100 documents from each of the chosen subtopics.

2) Relations Derived:
• Document-Word Relation: For each dataset, we use the rainbow toolkit [28] to obtain the document-word co-occurrence relation. Stop-words have been removed and the 2000 words with the largest information gain are kept. Non-text documents are skipped by rainbow. After preprocessing, documents consisting of fewer than two words are removed. Finally we have 497, 598 and 794 documents for TM1, TM2 and TM3, respectively. The tf-idf weighting [29] is used for weighting the words in each document.
• Document-Category Relation: Other than the document-word relation, which captures the document content through statistical information in terms of word frequency, we also generate another relation matrix, document-category, to indicate which subtopic each document belongs to. Each subtopic is a category, and the relation matrix is a binary matrix whose entry is 1 if the document is from the corresponding subtopic and 0 otherwise.

These two relations, i.e., the document-word relation represented as R12 and the document-category relation represented as R13, form a tri-type star-structure, where the central type is document and the two attribute types are word and category. In this experiment, we use the normalized relation matrices, i.e., R̃12 = Dr^{-0.5} R12 Dc^{-0.5}, where Dr and Dc are two diagonal matrices: each main diagonal entry of Dr is the sum of the corresponding row of R12, and each main diagonal entry of Dc is the sum of the corresponding column of R12. A similar normalization is performed on R13. For these three 20newsgroups datasets, the document-word relation alone may be used for document clustering, as it provides the information that documents containing similar words should be labeled with the same cluster. The document-category relation, although it clearly shows which documents are in the same subtopic, provides no indication of which subtopics are related to the same topic or cluster. Therefore, the document-category relation alone should not be used for clustering. A better clustering result may be produced by joint analysis of these two relations.

3) FC-MR vs. HFCM and FCoDok: To see whether FC-MR can make use of the additional document-category relation to achieve any improvement in the clustering results, we first compare the document clusters generated by HFCM [30] and FCoDok [11] based on R12 with those generated by FC-MR based on both R12 and R13. HFCM is a modified fuzzy c-means clustering based on cosine distance, and FCoDok is a fuzzy co-clustering approach. In our experiment, the number of clusters k is set equal to the real number of classes for each dataset. For FC-MR, we follow the guideline given earlier to set the parameters. For the other approaches, we tried different parameter values on a grid and chose those giving the best results.

¹ http://people.csail.mit.edu/jrennie/20Newsgroups/

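As a concrete illustration of the relation construction and normalization described in Section IV-B2, a minimal NumPy sketch (function names are ours; subtopic_of is a hypothetical array of per-document subtopic ids):

    import numpy as np

    def degree_normalize(R):
        # R_tilde = Dr^{-1/2} R Dc^{-1/2}, where Dr (Dc) holds the row
        # (column) sums of R on its diagonal.
        dr = R.sum(axis=1)
        dc = R.sum(axis=0)
        dr[dr == 0] = 1.0
        dc[dc == 0] = 1.0                   # guard empty rows/columns
        return R / np.sqrt(dr)[:, None] / np.sqrt(dc)[None, :]

    def category_relation(subtopic_of, n_cats):
        # Binary document-category matrix: entry (i, j) = 1 iff document i
        # comes from subtopic j.
        R = np.zeros((len(subtopic_of), n_cats))
        R[np.arange(len(subtopic_of)), subtopic_of] = 1.0
        return R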

In this experiment, we find that the following settings allow each approach to perform well on the three datasets: Tu = 0.001, Tv = 1 for FCoDok; m = 1.02 for HFCM; and φ1 = 0.01, θ2 = 1, θ3 = 0.03 for FC-MR. The weights β12 and β13 of the two relation matrices are set to be equal in FC-MR. For all fuzzy approaches, a truncated partitioning is obtained by assigning each document to the cluster with the largest membership.

Tables III to V give the best clustering results that each approach achieves among 30 trials with random initializations on the three datasets, respectively. From these contingency tables, it can be seen that with a proper initialization, FC-MR is able to label all the documents correctly on all three datasets, while FCoDok and HFCM mis-cluster different numbers of documents on different datasets. The numbers of documents mislabeled by FCoDok and HFCM on TM1 and TM2 are close, but more documents are labeled correctly by FCoDok than by HFCM on TM3. Although FCoDok and HFCM make use of the same document-word matrix, HFCM treats each document as a vector and uses cosine similarity to measure the closeness of two documents, while FCoDok treats document and word as two different object types, and documents are clustered based on the ranking of words. There is no explicit similarity measure defined in FCoDok or FC-MR.

For a more detailed comparison, we plot u1 = {u_{1,1}, u_{2,1}, ..., u_{500,1}}, the membership values of documents in C1 of TM1 produced by HFCM and FC-MR, in Fig. 4. By the summation constraints, the memberships of these objects in the other cluster follow u2 = 1 − u1. For TM1, the ground-truth partitioning is that the first 198 documents are in C1: rec.sport and the rest are in C2: talk.politics. It can be seen from Fig. 4a that in HFCM, for the 21st document, u_{21,1} < u_{21,2}, although the ground-truth label of this document is C1. At the same time, six documents (labeled by circles), which should be in C2, are assigned larger memberships in C1. These results may indicate that the representation of these documents in terms of word frequency is not sufficient for assigning them to the right clusters. The document-category relation helps to clarify the uncertainty or to correct the misleading information in the document-word relation. For example, according to the document-category relation, it is known that the 21st document should probably be grouped together with the other 99 documents of its category, as they belong to the same subtopic. This is confirmed by Fig. 4b: when the document-category relation is further incorporated, the previously mis-clustered documents are assigned reasonable memberships in the two clusters, consistent with the ground truth. Although the document-category matrix tells which documents are in the same subtopic, this relation alone delivers no information on the hierarchical structure of the subtopics, i.e., which subtopics should be combined into higher-level clusters. Therefore, clustering based only on the document-category relation does not produce any meaningful result; it turns out to be random combinations of the subgroups into the specified number of clusters.

4) Comparison of Accuracy and NMI: Other than the two fuzzy approaches HFCM and FCoDok, we also compare FC-MR with two existing multi-type relational data clustering approaches:
• SRC [16]: multi-type relational clustering based on eigendecomposition.
• NMF-H [18]: a non-negative matrix factorization approach for star-structured heterogeneous relational data.
In this experiment, the results of HFCM and FCoDok are obtained based on X, the data matrix combining R12 and R13, i.e., X = [R12 R13].

Two external metrics, Accuracy and Normalized Mutual Information (NMI) [31], are used to evaluate the clustering results; they measure the degree of agreement between the results produced by a clustering algorithm and the ground truth. If we refer to the ground truth as classes and to the results generated by a clustering algorithm as clusters, the NMI score is calculated with the following formula:

  \mathrm{NMI} = \frac{\sum_{c=1}^{k}\sum_{j=1}^{f} n_c^j \log\frac{n \cdot n_c^j}{n_c \cdot n_j}}{\sqrt{\big(\sum_{c=1}^{k} n_c \log\frac{n_c}{n}\big)\big(\sum_{j=1}^{f} n_j \log\frac{n_j}{n}\big)}}   (47)

where n is the total number of documents, n_c and n_j are the numbers of documents in the cth cluster and the jth class, respectively, and n_c^j is the number of common documents in class j and cluster c. In our experiments, the number of clusters k is set equal to the number of classes f.

The other metric, Accuracy, is calculated as below after obtaining a one-to-one matching between the k clusters and the k classes:

  \mathrm{Accuracy} = \frac{1}{n}\sum_{c=1}^{k} n_c^p   (48)

where n_c^p is the number of common objects in the cth cluster and its matched class p. Higher Accuracy and NMI indicate that the algorithm-generated clusters are more consistent with the ground-truth classes, and thus a better clustering result. Both Accuracy and NMI equal 1 only when the partitioning produced by an algorithm is identical to the ground-truth classes.

For each dataset, each approach is run for 30 trials. To avoid bad initializations, with which the word-category relation may dominate the clustering process and produce random combinations of subtopics, we first run HFCM with random initializations based only on the document-word relation, and then use the produced document fuzzy memberships as the initial document partitioning in all five approaches. Table VI shows the mean and standard deviation of the Accuracy (%) and NMI (%) values over the 30 trials.

TABLE VI
COMPARISON OF ACCURACY AND NMI ON NEWSGROUP DATA

Accuracy (%):
          TM1             TM2             TM3
HFCM      97.32 ± 10.21   89.49 ± 14.12   86.30 ± 14.76
FCoDok    97.32 ± 10.21   89.49 ± 14.12   86.71 ± 14.23
NMF-H     97.26 ± 10.20   86.30 ± 9.38    87.53 ± 13.88
SRC       100.00 ± 0.00   75.88 ± 13.96   77.57 ± 22.86
FC-MR     100.00 ± 0.00   99.16 ± 4.61    95.52 ± 11.78

NMI (%):
          TM1             TM2             TM3
HFCM      97.32 ± 24.89   89.49 ± 18.57   86.30 ± 14.34
FCoDok    97.32 ± 24.89   89.49 ± 18.57   86.71 ± 14.54
NMF-H     97.26 ± 24.80   86.30 ± 13.83   87.53 ± 12.40
SRC       100.00 ± 0.00   72.06 ± 3.82    85.85 ± 15.87
FC-MR     100.00 ± 0.00   99.16 ± 7.23    95.52 ± 12.03
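Equations (47) and (48) can be computed directly from the cluster-class contingency table; a sketch of our own, using SciPy's Hungarian solver for the one-to-one matching:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def _contingency(classes, clusters, k):
        # classes, clusters: integer labels in 0..k-1
        C = np.zeros((k, k))
        for j, c in zip(classes, clusters):
            C[c, j] += 1                    # rows: clusters, cols: classes
        return C

    def nmi(classes, clusters, k):
        # Eq. (47); assumes the number of clusters equals the number of classes.
        n = len(classes)
        C = _contingency(classes, clusters, k)
        nc, nj = C.sum(axis=1), C.sum(axis=0)
        num = sum(C[c, j] * np.log(n * C[c, j] / (nc[c] * nj[j]))
                  for c in range(k) for j in range(k) if C[c, j] > 0)
        den = np.sqrt((nc[nc > 0] * np.log(nc[nc > 0] / n)).sum()
                      * (nj[nj > 0] * np.log(nj[nj > 0] / n)).sum())
        return num / den

    def accuracy(classes, clusters, k):
        # Eq. (48): one-to-one cluster-class matching, then count matches.
        C = _contingency(classes, clusters, k)
        rows, cols = linear_sum_assignment(-C)       # maximize matched counts
        return C[rows, cols].sum() / len(classes)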


TABLE III
CONTINGENCY TABLE OF TM1

           HFCM                FCoDok              FC-MR
           Clust 1  Clust 2    Clust 1  Clust 2    Clust 1  Clust 2
Class 1    197      1          196      2          198      0
Class 2    6        293        4        295        0        299

TABLE IV
CONTINGENCY TABLE OF TM2

           HFCM                 FCoDok               FC-MR
           C1    C2    C3       C1    C2    C3       C1    C2    C3
Class 1    193   3     3        194   4     1        199   0     0
Class 2    2     188   10       2     190   8        0     200   0
Class 3    20    9     170      21    11    167      0     0     199

TABLE V
CONTINGENCY TABLE OF TM3

           HFCM                   FCoDok                 FC-MR
           C1   C2   C3   C4      C1   C2   C3   C4      C1   C2   C3   C4
Class 1    189  4    5    1       194  1    4    0       199  0    0    0
Class 2    3    188  5    3       4    190  4    1       0    199  0    0
Class 3    4    9    180  6       6    6    182  5       0    0    199  0
Class 4    0    6    2    189     1    6    4    186     0    0    0    197

[Fig. 4: two membership plots, (a) HFCM and (b) FC-MR; in (a), an annotation marks document i = 21 with u_{21,1} = 0.3886.]
Fig. 4. Fuzzy memberships of documents with respect to C1 of TM1 by HFCM and FC-MR. The horizontal x-axis denotes the document id (i = 1, ..., 500) and the vertical y-axis shows the value of the membership u_{i1}. In (a), small circles label objects that are clustered incorrectly.

It can be seen that FC-MR gives the highest Accuracy and NMI values among the five approaches on all three datasets. The performances of HFCM, FCoDok and NMF-H are very close. Although SRC gives results as good as FC-MR on TM1, its results on the other two datasets are even worse than those of the other three approaches. These results show that FC-MR achieves a significant improvement in document clustering compared with existing vector-based fuzzy clustering and fuzzy co-clustering on combined relations, and also performs much better than the nonnegative matrix factorization based and spectral clustering based multi-type relational data clustering approaches.

5) Ranking of Words and Categories: Together with the document clusters, we also obtain the word rankings and the category rankings. The ranking values of each category in each cluster are shown in the three matrices given in (49) to (51) for TM1, TM2 and TM3, respectively, where each row indicates a category and each column indicates a cluster or topic. The top 10 words of each cluster of the three datasets are also listed in Table VII. As attribute types, category and word can be used for description or interpretation of the document clusters. The key categories and key words with large ranking values provide meaningful information about what topic each document cluster is possibly related to.
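A listing like the following Table VII is read directly off the converged ranking matrix of the word type; a minimal sketch (names are ours):

    import numpy as np

    def top_ranked(V, names, topn=10):
        # V: converged n x k ranking matrix of an attribute type;
        # names: list of the n object names (e.g., the vocabulary).
        return [[names[i] for i in np.argsort(-V[:, c])[:topn]]
                for c in range(V.shape[1])]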


TABLE VII
TOP 10 WORDS IN EACH CLUSTER OF THREE NEWSGROUP DATASETS GENERATED BY FC-MR

TM1, C1 (rec.sport): game, hockey, team, games, players, ca, baseball, season, win, espn
TM1, C2 (talk.politics): jews, israel, people, government, gun, state, children, cramer, fbi, cosmo
TM2, C1 (comp): windows, file, graphics, files, tiff, version, ftp, site, program, dos
TM2, C2 (rec): car, bike, dod, bmw, dog, cars, insurance, bikes, audi, motorcycle
TM2, C3 (sci): key, clipper, chip, keys, nsa, encryption, netcom, phone, des, government
TM3, C1 (comp.sys): mac, drive, card, scsi, apple, disk, mb, software, drives, driver
TM3, C2 (rec): bike, car, dod, bikes, list, ride, honda, bmw, oil, miles
TM3, C3 (sci): space, nasa, moon, launch, pat, shuttle, henry, orbit, alaska, mission
TM3, C4 (talk.politics): israel, israeli, jews, arab, arabs, turkish, jewish, mr, armenians, muslim

Taking C1 of TM1 as an example, from its associated key categories, named baseball and hockey, and key words such as game, hockey, team, players and baseball, we may guess that the documents of this cluster are about sports, possibly hockey and baseball. Such information can be used as a short summary of the cluster, giving a quick sense of the whole document cluster without reading through each of the complete, long documents.

Category ranking matrix for TM1 (columns: C1: rec.sport, C2: talk.politics):

  baseball   0.4327   0
  hockey     0.5673   0
  guns       0        0.3067
  mideast    0        0.3506
  misc       0        0.3428      (49)

Category ranking matrix for TM2 (columns: C1: comp, C2: rec, C3: sci):

  graphics             0.4716   0        0
  os.ms-windows.misc   0.5284   0        0
  autos                0        0.4579   0
  motorcycles          0        0.5421   0
  crypt                0        0        0.7018
  electronics          0        0        0.2982      (50)

Category ranking matrix for TM3 (columns: C1: comp.sys, C2: rec, C3: sci, C4: talk.politics):

  ibm.pc.hardware   0.5265   0        0        0
  mac.hardware      0.4735   0        0        0
  autos             0        0.3749   0        0
  motorcycles       0        0.6251   0        0
  med               0        0        0.4056   0
  space             0        0        0.5944   0
  guns              0        0        0        0.3530
  mideast           0        0        0        0.6470      (51)

C. Experiments on Cora Datasets

In this experiment, we compare the clustering accuracy of FC-MR with both fuzzy and non-fuzzy approaches on five Cora datasets. FC-MR treats each dataset as bi-type relational data with an extended star structure.

1) Data: The Cora data [23] contains the abstracts and references of computer science papers published in the conferences and journals of different research areas, such as artificial intelligence, information retrieval and hardware. A typical sample record is shown in Fig. 5. Five datasets, each corresponding to a research area in computer science, are used in our experiment. We use the processed data previously used in [32] and [33]. A summary of each of them is given in Table VIII.

2) Relations Derived:
• Paper-paper relation: The value of the relationship between two papers is 1 if either of them is in the reference list of the other, and 2 if the two papers cite each other.
• Paper-term relation: Each entry of the paper-term matrix is the number of occurrences of the term in the abstract of the corresponding paper.
We use R11 to denote the paper-paper citation relation and R12 the paper-term relation. For these Cora datasets, both the paper-paper relation and the paper-term relation could be used alone for document clustering. However, the content contained in the abstracts of the Cora data is much less complete than that of the 20newsgroups data, and some papers in the Cora collection are recorded without any abstract. According to [32], the five datasets we use are preprocessed by removing only papers without references. In other words, a paper without an abstract but with at least one reference is kept in preprocessing. Based on this knowledge, we let the citation relation have a higher weight than the paper-term relation.

3) Algorithms and Settings: Other than the two fuzzy approaches HFCM and FCoDok and the two multi-type relational data clustering approaches, we also compare FC-MR with another state-of-the-art approach, iTopicModel [27], a topic modeling approach which considers both the text and the structure information of documents. The results of HFCM and FCoDok are obtained based on X, the data matrix combining R11 and R12, i.e., X = [β11 R11  β12 R12]. In SRC and NMF-H, R11 is treated as a relation between two different types, as these two approaches only consider that kind of relation. The iTopicModel approach establishes generative models making use of both the content information R12 and the structure information R11. For FC-MR, each dataset, with its paper-paper relation and paper-term relation, forms a bi-type extended star structure. In this experiment, we find that each of the approaches performs well with the following settings: φ1 = 0.01, θ1 = θ2 = 1 for FC-MR; Tu = 0.01, Tv = 0.1 for FCoDok; and m = 1.02 for HFCM. The weights β11 = 0.8 and β12 = 0.2 of the two relation matrices are used for all five datasets.

4) Comparison of Accuracy and NMI: Each approach is run for 30 trials on each dataset with random initializations. Table IX shows the means and standard deviations of the Accuracy (%) and NMI (%) values over the 30 trials.
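Returning to the relations derived in Section IV-C2, the paper-paper matrix can be built from the citation list as follows (a sketch under the assumption that edges is a set of ordered citing-cited index pairs):

    import numpy as np

    def citation_relation(edges, n_papers):
        # One-way links yield a relation value of 1, mutual citations 2.
        A = np.zeros((n_papers, n_papers))
        for i, j in edges:
            A[i, j] = 1.0
        return A + A.T

    # Hypothetical weighting, as used in Section IV-C3:
    # beta = {(0, 0): 0.8, (0, 1): 0.2, (1, 0): 0.2}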


URL: http://pertsserver.cs.uiuc.edu/papers/HaLi94a.ps
Title: Validating Timing Constraints in Multiprocessor and Distributed Real-Time Systems
Author: Rhan Ha and Jane W. S. Liu
...
Abstract: In multiprocessor and distributed real-time systems, scheduling jobs dynamically on processors is likely to achieve better performance. However, analytical and efficient validation methods to determine whether all the timing constraints are met do not exist for systems using modern dynamic scheduling strategies, and exhaustive simulation and testing are unreliable and expensive. This paper describes several worst-case bounds and efficient algorithms for validating systems in which jobs have arbitrary timing constraints and variable execution times and are scheduled on processors dynamically in a priority-driven manner. ...
Reference: [1] <author> J. A. Stankovic, K. Ramamritham, and S. Cheng. </author> <title> Evaluation of a flexible task scheduling algorithm for distributed hard real-time systems. </title> <journal> IEEE Transactions on Computers, </journal> <volume> 34(12) </volume> <pages> 1130-1143, </pages> <month> December </month> <year> 1985. </year>
Reference: [2] <author> K.G. Shin and Y.C. Chang. </author> <title> Load sharing in distributed real-time systems with state-change broadcasts. </title> <journal> IEEE Transactions on Software Engineering, </journal> <volume> 38(8) </volume> <pages> 1124-1142, </pages> <month> August </month> <year> 1989. </year>
References-found: 13

Fig. 5. A sample record of extracted information for a paper in the Cora data. Each record usually consists of three parts: the header includes the URL, Title, Author and other related information such as Address, Affiliation and Email; the second part is the content of the abstract; the third part is the list of formatted references.

TABLE VIII
CORA DATASETS

                                  # of documents   # of words   # of subfields   average # of links
Data Structure (DS)               751              6234         9                3
Hardware and Architecture (HA)    400              3989         7                4
Machine Learning (ML)             1617             8329         7                5
Operating Systems (OS)            1246             6737         4                7
Programming Language (PL)         1575             7949         9                6

From this table, it is clearly seen that FC-MR achieves significant improvements over the other five approaches on all five datasets with respect to both Accuracy and NMI. The overall performance of the other five approaches is close. For some datasets, the performance of HFCM and FCoDok based on the concatenated data is even slightly better than that of the other three approaches. Since SRC and NMF-H only handle heterogeneous relations, the homogeneous paper-paper relation cannot be used effectively by them. Among these five approaches, SRC performs better than the other four on ML, but its results on OS are the worst of all. The performance of iTopicModel on these five datasets is not as good as expected. It is also observed that, compared with the 20newsgroups datasets, the Accuracy and NMI values for the Cora datasets are much lower. This is mainly because the clusters of the 20newsgroups datasets are well separated, since each cluster corresponds to a topic and all the topics are very different, e.g., sport and politics, while the clusters of the Cora datasets overlap more, as each cluster is a subfield and all subfields are related to the same research field. In addition, for each document in a 20newsgroups dataset, we make sure that it contains at least two words excluding stop words, while quite a few documents in the Cora data have no abstract recorded, which means these documents contain no words. Both the overlaps among clusters and the reduced content information make the Cora datasets more challenging for clustering.


TABLE IX
COMPARISON OF ACCURACY AND NMI ON CORA DATA

Accuracy
DS HA ML OS PL
HFCM 39.17 ± 2.84 37.33 ± 2.88 49.25 ± 3.77 44.99 ± 2.91 35.65 ± 1.91
FCoDok 37.15 ± 2.54 40.94 ± 3.34 47.10 ± 4.88 49.64 ± 1.87 32.12 ± 2.23
NMF-H 34.07 ± 2.66 40.08 ± 5.57 40.85 ± 5.03 46.83 ± 5.02 30.56 ± 2.86
SRC 34.92 ± 2.33 41.41 ± 2.19 52.21 ± 3.29 44.79 ± 6.56 30.84 ± 1.58
iTopicModel 29.85 ± 3.00 34.38 ± 4.17 39.97 ± 4.66 50.81 ± 6.65 33.30 ± 2.66
FC-MR 44.39 ± 3.53 46.64 ± 3.49 66.68 ± 2.65 63.32 ± 4.33 42.18 ± 1.98

NMI
DS HA ML OS PL
HFCM 28.11 ± 2.68 32.55 ± 3.30 34.26 ± 2.50 23.97 ± 2.12 28.30 ± 1.70
FCoDok 25.06 ± 1.80 35.72 ± 2.88 27.95 ± 4.11 26.43 ± 2.93 21.92 ± 1.01
NMF-H 22.76 ± 1.99 26.84 ± 4.01 19.90 ± 3.54 19.01 ± 2.73 16.44 ± 1.64
SRC 22.25 ± 1.51 32.29 ± 1.94 36.01 ± 2.08 15.30 ± 2.28 21.46 ± 0.91
iTopicModel 17.08 ± 2.36 16.11 ± 3.19 21.71 ± 2.95 16.59 ± 4.16 17.92 ± 1.75
FC-MR 38.24 ± 1.84 42.38 ± 3.27 49.75 ± 1.83 35.39 ± 1.97 31.92 ± 1.30

Fig. 6. Comparison of the time efficiency of four approaches (NMF-H, SRC, iTopicModel and FC-MR) on the five Cora datasets (DS, HA, ML, OS, PL). The time required by each approach is displayed in log(seconds).

5) Comparison of Time Efficiency: We now compare the time efficiency of the three multi-type relational data clustering approaches as well as iTopicModel. Given a relation $R_{11}$ between each pair of $n_1$ documents and $R_{12}$ between $n_1$ documents and $n_2$ words, the time complexities of FC-MR, NMF-H and iTopicModel are the same, $O(kln_{\max}n_1)$, where $k$ is the number of clusters, $l$ is the number of iterations, and $n_{\max} = \max\{n_1, n_2\}$. But FC-MR is faster and requires less storage than the other two. The time complexity of SRC is $O(l(n_{\max})^3 + kln_{\max}n_1)$.

Fig. 6 shows the running time of one trial required by the four approaches to produce the results on the five Cora datasets. All algorithms are implemented in MATLAB 7.4 on a machine with dual 2.66GHz Intel Core processors and 2GB RAM. It is obvious that the proposed FC-MR is the fastest among the four. NMF-H is the second fastest, and iTopicModel, although it has the same time complexity, needs much more time in practice. The spectral relational clustering approach SRC requires a similar amount of time to iTopicModel, but the time complexity of SRC is higher than that of the other three approaches, so it does not scale well to large datasets.
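Returning to the complexity expressions above, the following sketch plugs the dataset sizes from Table VIII into the two bounds to show roughly what they imply at this scale. The iteration count and the notion of an "operation" are hypothetical placeholders, so the numbers indicate only relative growth, not measured running times.

```python
# Back-of-the-envelope comparison of the stated complexity bounds using
# the Cora dataset sizes from Table VIII. The iteration count l is a
# hypothetical placeholder; only the ratio between the two bounds matters.
datasets = {  # name: (n1 documents, n2 words)
    "DS": (751, 6234), "HA": (400, 3989), "ML": (1617, 8329),
    "OS": (1246, 6737), "PL": (1575, 7949),
}
k, l = 7, 50  # assumed cluster count and iteration count (illustrative)

for name, (n1, n2) in datasets.items():
    nmax = max(n1, n2)
    fcmr = k * l * nmax * n1              # O(k l n_max n_1): FC-MR, NMF-H, iTopicModel
    src = l * nmax**3 + k * l * nmax * n1 # O(l n_max^3 + k l n_max n_1): SRC
    print(f"{name}: FC-MR ~ {fcmr:.2e} ops, SRC ~ {src:.2e} ops "
          f"(ratio {src / fcmr:.0f}x)")
```

The cubic term, typical of the eigendecomposition step in spectral methods, dominates quickly as the vocabulary size grows, which is consistent with the scalability remark above.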
V. CONCLUSION

In this paper, we work on fuzzy clustering for multi-type relational data mining. The main idea of the proposed approach FC-MR is to estimate the fuzzy memberships of objects in each cluster based on the ranking values of the related objects in that cluster; the memberships are in turn used for updating the rankings of objects in each cluster. The proposed FC-MR is able to handle relational data with various structures. We use document clustering as a real-world example to illustrate how FC-MR can be applied in various situations according to the requirements of the application and the availability of relations in the data. Connections between FC-MR and existing fuzzy approaches show that FC-MR can be seen as a generalization of the fuzzy approach for homogeneous relational data and of fuzzy co-clustering. Our experimental study on benchmark document datasets shows that FC-MR outperforms existing fuzzy approaches as well as state-of-the-art multi-type relational clustering approaches. This demonstrates the great potential of our approach for relational data clustering and analysis.
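To summarize the membership-ranking interplay in code form, here is a deliberately simplified sketch for the two-type (document-word) case. The function, its normalizations and the update order are our own simplification for illustration; they do not reproduce the exact FC-MR update equations derived in the paper.

```python
# Simplified illustration of the mutual reinforcement between memberships
# and rankings described above, for one document-word relation R (n1 x n2).
# This is NOT the paper's exact FC-MR update; normalizations are simplified.
import numpy as np

def membership_ranking_iteration(R, k, iters=30, seed=0):
    rng = np.random.default_rng(seed)
    n1, _ = R.shape
    U = rng.random((n1, k))
    U /= U.sum(axis=1, keepdims=True)            # fuzzy memberships of documents
    for _ in range(iters):
        # A word ranks high in a cluster if its related documents have
        # large memberships in that cluster.
        P = R.T @ U
        P /= P.sum(axis=0, keepdims=True) + 1e-12  # per-cluster word ranking
        # A document gets a large membership in a cluster if its related
        # words rank high in that cluster.
        U = R @ P
        U /= U.sum(axis=1, keepdims=True) + 1e-12
    return U, P
```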
ACKNOWLEDGMENT

We are grateful to Yizhou Sun and Jiawei Han for sharing their Java code of iTopicModel with us. We also thank the reviewers for their insightful and constructive comments.

REFERENCES

[1] J. C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. New York: Plenum, 1981.
[2] W. Pedrycz, V. Loia, and S. Senatore, “Fuzzy clustering with viewpoints,” IEEE Trans. Fuzzy Syst., vol. 18, no. 2, pp. 274–284, 2010.
[3] C.-H. Li, B.-C. Kuo, and C.-T. Lin, “LDA-based clustering algorithm and its application to an unsupervised feature extraction,” IEEE Trans. Fuzzy Syst., vol. 19, no. 1, pp. 152–163, 2011.
[4] P. H. A. Sneath and R. R. Sokal, Numerical Taxonomy — The Principles and Practice of Numerical Classification. San Francisco: W. H. Freeman, 1973.
[5] L. Kaufman and P. J. Rousseeuw, Finding Groups in Data: An Introduction to Cluster Analysis. New York: Wiley, 1990.
[6] R. J. Hathaway, J. W. Davenport, and J. C. Bezdek, “Relational duals of the c-means clustering algorithms,” Pattern Recogn., vol. 22, pp. 205–212, 1989.
[7] I. S. Dhillon, “Co-clustering documents and words using bipartite spectral graph partitioning,” in Proc. KDD’01, 2001, pp. 269–274.
[8] I. S. Dhillon, S. Mallela, and D. S. Modha, “Information-theoretic co-clustering,” in Proc. KDD’03, 2003, pp. 89–98.


[9] B. Long, Z. Zhang, and P. S. Yu, “Co-clustering by block value decomposition,” in Proc. KDD’05, 2005.
[10] A. Banerjee, I. Dhillon, J. Ghosh, S. Merugu, and D. S. Modha, “A generalized maximum entropy approach to Bregman co-clustering and matrix approximation,” Journal of Machine Learning Research, vol. 8, pp. 1919–1986, 2007.
[11] K. Kummamuru, A. Dhawale, and R. Krishnapuram, “Fuzzy co-clustering of documents and keywords,” in Proc. 12th IEEE Int. Conf. Fuzzy Systems, 2003.
[12] W.-C. Tjhi and L. Chen, “Dual fuzzy-possibilistic coclustering for categorization of documents,” IEEE Trans. Fuzzy Syst., vol. 17, pp. 532–543, 2009.
[13] B. Gao, T.-Y. Liu, X. Zheng, Q.-S. Cheng, and W.-Y. Ma, “Consistent bipartite graph co-partitioning for star-structured high-order heterogeneous data co-clustering,” in Proc. KDD’05, 2005.
[14] R. Bekkerman, R. El-Yaniv, and A. McCallum, “Multi-way distributional clustering via pairwise interactions,” in Proc. ICML’05, 2005.
[15] B. Long, X. Wu, Z. Zhang, and P. S. Yu, “Unsupervised learning on k-partite graphs,” in Proc. KDD’06, 2006.
[16] B. Long, Z. Zhang, X. Wu, and P. S. Yu, “Spectral clustering for multi-type relational data,” in Proc. 23rd Int. Conf. Machine Learning, 2006, pp. 585–592.
[17] A. Banerjee, S. Basu, and S. Merugu, “Multi-way clustering on relation graphs,” in Proc. SIAM’07, 2007.
[18] Y. Chen, L. Wang, and M. Dong, “Non-negative matrix factorization for semisupervised heterogeneous data coclustering,” IEEE Trans. Knowl. Data Eng., vol. 22, no. 10, pp. 1459–1474, 2010.
[19] B. Long, Z. Zhang, and P. S. Yu, “A probabilistic framework for relational clustering,” in Proc. KDD’07, 2007.
[20] Y. Sun, Y. Yu, and J. Han, “Ranking-based clustering of heterogeneous information networks with star network schema,” in Proc. KDD’09, 2009.
[21] J.-P. Mei and L. Chen, “Fuzzy clustering with weighted medoids for relational data,” Pattern Recognition, vol. 43, pp. 1964–1974, 2010.
[22] K. Lang, “NewsWeeder: learning to filter netnews,” in Proc. 12th Int. Conf. Machine Learning, 1995, pp. 331–339.
[23] A. McCallum, K. Nigam, J. Rennie, and K. Seymore, “Automating the construction of internet portals with machine learning,” Information Retrieval, vol. 3, no. 2, pp. 127–163, 2000.
[24] S. Miyamoto and K. Umayahara, “Fuzzy clustering by quadratic regularization,” in Proc. IEEE Int. Conf. Fuzzy Systems, 1998.
[25] Z. Guo, S. Zhu, Y. Chi, Z. M. Zhang, and Y. Gong, “A latent topic model for linked documents,” in Proc. SIGIR’09, 2009.
[26] Q. Mei, D. Cai, D. Zhang, and C. Zhai, “Topic modeling with network regularization,” in Proc. WWW’08, 2008.
[27] Y. Sun, J. Han, J. Gao, and Y. Yu, “iTopicModel: information network-integrated topic modeling,” in Proc. ICDM’09, 2009.
[28] A. K. McCallum, “Bow: a toolkit for statistical language modeling, text retrieval, classification and clustering,” 1996. [Online]. Available: http://www.cs.cmu.edu/~mccallum/bow/
[29] G. Salton and C. Buckley, “Term-weighting approaches in automatic text retrieval,” Information Processing and Management, vol. 24, no. 5, pp. 513–523, 1988.
[30] M. E. S. Mendes and L. Sacks, “Evaluating fuzzy clustering for relevance-based information access,” in Proc. IEEE Int. Conf. Fuzzy Systems, 2003.
[31] A. Strehl and J. Ghosh, “Cluster ensembles—a knowledge reuse framework for combining multiple partitions,” Journal of Machine Learning Research, vol. 3, pp. 583–617, 2002.
[32] S. Zhu, K. Yu, Y. Chi, and Y. Gong, “Combining content and link for classification using matrix factorization,” in Proc. SIGIR’07, 2007.
[33] D. Zhang, F. Wang, C. Zhang, and T. Li, “Multi-view local learning,” in Proc. AAAI’08, 2008.

Jian-Ping Mei received her B.Eng. degree from the School of Electronic and Information Engineering at Ningbo University, China, in 2005, and her M.Eng. degree from the School of Information Science and Electronic Engineering at Zhejiang University, China, in 2007. She is currently working towards the Ph.D. degree at Nanyang Technological University in Singapore. Her research interests include machine learning algorithms and applications to Web mining and bioinformatics.

Lihui Chen received the B.Eng. in Computer Science & Engineering at Zhejiang University, China, and the Ph.D. in Computational Science at the University of St. Andrews, UK. Currently she is an Associate Professor in the Division of Information Engineering at Nanyang Technological University in Singapore. Her research interests include machine learning algorithms and applications, data mining and web intelligence. She has published more than seventy refereed papers in international journals and conferences in these areas. She is a senior member of the IEEE, and a member of the IEEE Computational Intelligence Society.
