Published by: ijcsis on Aug 13, 2010
Copyright: Attribution Non-commercial

(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 4, 2010
Survey on Fuzzy Clustering and Rule Mining 
D. Vanisri
Department of Computer Technology, Kongu Engineering College, Perundurai, Tamilnadu, India
vanisri_raja@rediffmail.com
Dr. C. Loganathan
Principal, Maharaja Arts and Science College, Coimbatore, Tamilnadu, India
clogu@rediffmail.com
Abstract- Document clustering improves the retrieval effectiveness of an information retrieval system. Association rules capture interesting relations between variables in transaction databases. Transaction data in real-world applications use fuzzy and quantitative values, which calls for sophisticated data mining algorithms for optimization. If documents can be clustered together in a sensible order, then indexing and retrieval operations can be optimized. This study presents a review of fuzzy document clustering. This survey paper also aims to give an overview of some of the previous research done in fuzzy rule mining, evaluating the current status of the field and envisioning possible future trends in this area.
 Keywords- Fuzzy set, Fuzzy clustering, Fuzzy rule mining, Information Retrieval, Web analysis.
I. INTRODUCTION
Fuzzy sets are used to optimize results by allowing partial membership in different sets. Fuzzy set theory provides the tools needed to carry out computations on different data structures. Data mining is an analytic process designed to explore data in search of consistent patterns and systematic relationships between variables, and then to validate the findings by applying the detected patterns to new subsets of data. The ultimate goal of data mining is extracting rules and clustering similar objects.

The goal of this survey is to provide a comprehensive review of different fuzzy rule mining and clustering techniques in data mining. Clustering is a division of data into groups of similar objects. Each group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. Representing data by fewer clusters necessarily loses certain fine details, but achieves simplification.

Association analysis is the discovery of what are commonly called association rules. It studies the frequency of items occurring together in transactional databases and, based on a threshold called support, identifies the frequent item sets. Another threshold, confidence, which is the conditional probability that an item appears in a transaction when another item appears, is used to pinpoint association rules. Association analysis is commonly used for market basket analysis. Clustering is the organization of data in classes. However, unlike classification, in clustering the class labels are unknown and it is up to the clustering algorithm to discover acceptable classes. Clustering is also called unsupervised classification, because the classification is not dictated by given class labels.

The remainder of this paper is organized as follows. Section II describes the problem formulation. Section III discusses some of the earlier proposed research work on fuzzy document clustering and fuzzy association rule mining. Section IV provides a fundamental idea on which future research work can focus. Section V concludes the paper with a brief discussion.
II. PROBLEM FORMULATION
Association Rule Mining (ARM) is the process of finding rules of the form $X \Rightarrow Y$ from a given set of transactions. Each transaction contains a set of items that is a subset of the unique items in the entire database. A generated association rule implies that if $X$, an item set specific to the domain, is present, then the probability of finding the item set $Y$ is given by the confidence. The process of finding association rules involves two steps, namely frequent item set mining and association rule generation. Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in databases, such as association rules, correlations, sequences, episodes, classifiers and clusters, of which the mining of association rules is one of the most popular problems [1].

http://sites.google.com/site/ijcsis/ ISSN 1947-5500

An association rule is an expression of the form $X \Rightarrow Y$, where $X$ and $Y$ are item sets and $X \cap Y = \{\}$. Such a rule expresses the association that if a transaction contains all items in $X$, then that transaction also contains all items in $Y$. $X$ is called the body or antecedent, and $Y$ is called the head or consequent of the rule. These concepts can be illustrated with examples from the supermarket domain.

The support of an association rule $X \Rightarrow Y$ in $D$ is the support of $X \cup Y$ in $D$, and similarly, the frequency of the rule is the frequency of $X \cup Y$. An association rule is called frequent if its support (frequency) exceeds a given minimal support (frequency) threshold $\sigma$. The confidence or accuracy of an association rule $X \Rightarrow Y$ in $D$ is the conditional probability of having $Y$ contained in a transaction, given that $X$ is contained in that transaction:

$$\mathrm{confidence}(X \Rightarrow Y, D) = P(Y \mid X) = \frac{\mathrm{support}(X \cup Y, D)}{\mathrm{support}(X, D)}$$
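The support and confidence definitions above can be made concrete with a short sketch. The baskets below are hypothetical supermarket transactions invented for illustration, and the helper names `support` and `confidence` are not taken from the paper. Support is computed here as a relative frequency, which is an assumption; the confidence ratio is the same whether support is counted absolutely or relatively.

```python
from typing import FrozenSet, Iterable, List

Transaction = FrozenSet[str]


def support(itemset: Iterable[str], transactions: List[Transaction]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    if not transactions:
        return 0.0
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(x: Iterable[str], y: Iterable[str],
               transactions: List[Transaction]) -> float:
    """Estimate P(Y|X) as support(X ∪ Y) / support(X)."""
    x_set, y_set = frozenset(x), frozenset(y)
    if x_set & y_set:
        raise ValueError("X and Y must be disjoint item sets")
    support_x = support(x_set, transactions)
    if support_x == 0.0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(x_set | y_set, transactions) / support_x


# Hypothetical baskets, invented for illustration only.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

# Rule {bread} => {milk}
rule_support = support({"bread", "milk"}, baskets)
rule_confidence = confidence({"bread"}, {"milk"}, baskets)
```

In this toy data, bread and milk occur together in two of the four baskets and bread occurs in three, so the rule's confidence is the ratio of those two counts.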
The rule is called confident if $P(Y \mid X)$ exceeds a given minimal confidence threshold $\gamma$, with $0 < \gamma < 1$. Based on classical association rule mining, a new approach has been developed that extends it by using fuzzy sets.

The clustering problem is expressed as follows. The set of $N$ documents $D = \{D_1, D_2, \ldots, D_N\}$ is to be clustered. Each $D_i \in \mathbb{R}^{N_d}$ is an attribute vector consisting of $N_d$ real measurements describing the object. The documents are to be grouped into non-overlapping clusters $C = \{C_1, C_2, \ldots, C_K\}$ ($C$ is known as a clustering), where $K$ is the number of clusters, $C_1 \cup C_2 \cup \ldots \cup C_K = D$, $C_i \neq \varphi$, and $C_i \cap C_j = \varphi$ for $i \neq j$.

Assuming $f: D \times D \rightarrow \mathbb{R}^{+}$ is a measure of similarity between document feature vectors, clustering is the task of finding a partition $\{C_1, C_2, \ldots, C_K\}$ of $D$ such that

$$\forall i, j \in \{1, \ldots, K\},\ j \neq i,\ \forall x \in C_i:\ f(x, O_i) \geq f(x, O_j)$$

where $O_i$ is one cluster representative of cluster $C_i$.

The goal of clustering is stated as follows. Given:
A set of documents $D = \{D_1, D_2, \ldots, D_N\}$
A desired number of clusters $K$
An objective function or fitness function that evaluates the quality of a clustering
the system has to compute an assignment $g: D \rightarrow \{1, 2, \ldots, K\}$ that maximizes the objective function.
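The partition condition above says that every document should be at least as similar to its own cluster representative as to any other. The following minimal sketch shows one way to compute such an assignment $g$ when the representatives $O_1, \ldots, O_K$ are already given. The choice of cosine similarity for $f$, the sum-of-similarities objective, and all names and sample vectors are assumptions made for illustration; the paper does not fix any of them.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Assumed similarity measure f; any non-negative similarity could be used."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def assign(documents: List[Vector], representatives: List[Vector],
           f: Callable[[Vector, Vector], float] = cosine_similarity) -> List[int]:
    """Return g(D_n) in {1, ..., K}: the index of the most similar representative."""
    labels = []
    for doc in documents:
        scores = [f(doc, rep) for rep in representatives]
        labels.append(scores.index(max(scores)) + 1)  # clusters numbered from 1
    return labels


def objective(documents: List[Vector], representatives: List[Vector],
              labels: List[int],
              f: Callable[[Vector, Vector], float] = cosine_similarity) -> float:
    """Assumed fitness: total similarity of each document to its representative."""
    return sum(f(doc, representatives[k - 1]) for doc, k in zip(documents, labels))


# Hypothetical 3-dimensional document vectors and K = 2 representatives.
docs = [(1.0, 0.0, 0.0), (0.9, 0.1, 0.0), (0.0, 1.0, 1.0), (0.0, 0.8, 1.0)]
reps = [(1.0, 0.0, 0.0), (0.0, 1.0, 1.0)]

labels = assign(docs, reps)
score = objective(docs, reps, labels)
```

Because each document is assigned to its most similar representative, the resulting labels satisfy $f(x, O_i) \geq f(x, O_j)$ by construction. A full clustering algorithm would also have to choose or update the representatives, which this sketch leaves out.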
III. RELATED WORK
One of the key operations in fuzzy logic and approximate reasoning is the fuzzy implication, which is usually performed by a binary operator called an implication function or, simply, an implication. M. Mas et al. [2] compile the main theoretical properties of the four most usual kinds of implications: S-, R-, QL-, and D-implications. This is done for the properly fuzzy environment (implications defined on [0,1]) as well as for the discrete case, which is increasingly studied because it avoids numerical interpretations of the linguistic variables used in fuzzy techniques.

C.Y. Suen et al. [3] note that handwriting recognition is a complex and important problem. Recognition of handwriting is important for automatic document processing functions such as mail sorting and check reading. Recognition of isolated handwritten digits is no longer a significant research problem. Paul D. Gader and James M. Keller [4] introduced fuzzy set theory to handwriting recognition and suggested a new application to handwritten word recognition.

Fraud prevention and detection is now a major research area that calls for specific solutions and methodologies. Mirjana [5] surveyed research articles from scientific databases and found that work on fraud detection and prevention is diversified across problem domains. In that work, the following application areas were identified and described: telecommunications, insurance, auditing, medical care, credit card transactions, e-business, bid pricing and identity verification.

Fuzzy clustering is a widely applied method for obtaining fuzzy models from data. It has been applied successfully in various fields including finance and marketing. Fuzzy set theory was initially applied to clustering in [6]. The book by Bezdek [7] is a good source for material on fuzzy clustering. The most popular fuzzy clustering algorithm is the fuzzy c-means (FCM) algorithm. The design of membership functions is the most important problem in fuzzy clustering. Different choices include those based on similarity decomposition and centroids of clusters.

Eduardo Raul Hruschka et al. [8] give a survey of evolutionary algorithms for clustering. They concentrate on hard partition algorithms, though overlapping (soft/fuzzy) approaches are also covered, and discuss key issues in the design of evolutionary algorithms for data partitioning problems, such as usually adopted representations, evolutionary operators, and fitness functions.
In particular, mutation and crossover operators commonly described in the literature are conceptually analyzed, giving special emphasis to those genetic operators specifically designed for clustering problems.

Chin-Teng Lin and Ya-Ching Lu [9] introduced a system that has fuzzy supervised learning capability. With fuzzy supervised learning, it can be used for a fuzzy expert system, fuzzy system modeling or rule base concentration. It can also be used as an adaptive fuzzy controller when learning with numerical values.

Raghu Krishnapuram et al. [10] presented new relational fuzzy clustering algorithms based on the idea of medoids. The worst-case complexity of the algorithms arises while updating the medoids in each iteration. This complexity compares very favorably with other fuzzy algorithms for relational clustering. These approaches are useful in Web mining applications such as categorization of Web documents, snippets, and user sessions.

Chun-Hao Chen et al. [11] put forward a cluster-based fuzzy-genetic mining algorithm for extracting both fuzzy association rules and membership functions from quantitative transactions. It can dynamically adjust membership functions by genetic algorithms and uses them to fuzzify quantitative transactions. It can also speed up the evaluation process and keep nearly the same quality of solutions by clustering chromosomes. Each chromosome represents a set of membership functions used in fuzzy mining. The algorithm first divides the chromosomes in a population into k clusters by using the k-means clustering approach. All the chromosomes in a cluster then use the number of large 1-itemsets derived from the representative chromosome in the cluster, together with their own suitability of membership functions, to calculate the fitness values. The evaluation cost can thus be reduced because of the time saved in finding 1-itemsets.

Hongwel Chen et al. [12] presented a general fuzzy trust problem domain for P2P-based systems and compared the Fuzzy Comprehensive Evaluation method, the Fuzzy Rank-ordering method, and the Fuzzy Inference method through a concrete paradigm. In this paradigm, they applied an algorithm to the Fuzzy Comprehensive Evaluation method for a P2P-based trust system, the Blin algorithm to the Fuzzy Rank-ordering method, and the Mamdani algorithm to the Fuzzy Inference method.
The results demonstrate that different fuzzy trust methods for P2P-based systems may deduce different fuzzy results.

Zhongze Fan and Minchao Huang [13] extend the concept of the fuzzy rule so that the reasoning result may be any of the classes, with different degrees, even though the premises are similar. The contradictions among the fuzzy rules can thus be completely resolved even though there are overlaps among the hyperspheres. This idea can be applied in fault diagnosis and can also be used for automata, signal processing and image processing.

Fuzzy clustering techniques have been applied effectively in image processing, pattern recognition and fuzzy modeling. The best known approach to fuzzy clustering is the method of fuzzy c-means (FCM), proposed by Bezdek [14] and Dunn [15], and generalized by other authors. A good survey of relevant works on the subject can be found in [16]. In FCM, membership functions are defined based on a distance function, and membership degrees express proximities of entities to cluster centers. By choosing a suitable distance function, different cluster shapes can be identified [17]–[22]. Another approach to fuzzy clustering, due to Krishnapuram and Keller [23], is the possibilistic c-means (PCM) algorithm, which eliminates one of the constraints imposed on the search for partitions, leading to possibilistic (absolute) fuzzy membership values instead of the probabilistic (relative) fuzzy memberships of FCM.

Susana Nascimento et al. [24] introduced the FCPM framework, fuzzy clustering with proportional membership, a model that describes how data are generated from a cluster structure to be identified. This implies direct interpretability of the fuzzy membership values, which should be considered a motivation for introducing data-driven model-based methods. Hamid Mohamadlou et al. [25] described an algorithm based on fuzzy clustering for mining fuzzy association rules using a combination of crisp and quantitative data. L. Bobrowski and J. Bezdek [26] observed that reducing the amount of clustering data allows a partition of the data to be produced faster.

Yücel Saygin and Özgür Ulusoy [27] put forward methods for the automated construction of fuzzy event sets, which are sets of events where each event has a degree of membership in a set. Fuzzy event sets are constructed by analyzing event histories. They proposed a sliding window algorithm for mining event histories and an automated rule modularization method that does not rely on semantic knowledge. Rafael Alcala et al. [28], building on the 2-tuples linguistic representation model, presented a new fuzzy data-mining algorithm for extracting both association rules and membership functions by means of an evolutionary learning of the membership functions, using a basic method for mining fuzzy association rules. Mila Kwiatkowska et al. [29] note that the reuse and integration of data from heterogeneous data sources requires explicit representation of the predictors, their measures, and their interpretations. They described a new framework based on semantics and fuzzy logic for knowledge representation and secondary data analysis.

Yeong-Chyi Lee et al. [30] proposed using a multiple-level taxonomy and multiple minimum supports to find fuzzy association rules in a given quantitative transaction data set. Using different criteria to judge the importance of different items, managing taxonomic relationships among items, and dealing with quantitative data sets are three issues that usually occur in real mining applications. This fuzzy mining algorithm can generate