An Effective Comparison of Graph Mining Algorithms and Techniques

JOURNAL OF COMPUTING, VOLUME 4, ISSUE 3, MARCH 2012, ISSN 2151-9617 https://sites.google.com/site/journalofcomputing WWW.JOURNALOFCOMPUTING.ORG

B. Venkateshwar Reddy, S. Kalyani, B. Jyothi
Abstract: Many graph mining algorithms have been proposed in recent research. These algorithms rely on very different approaches, so it is hard to say which one is the most efficient and optimal in terms of performance. Graph mining has become an increasingly important research topic for modeling complicated structures, with applications such as bioinformatics, protein-protein interaction (PPI) networks, circuits, images, social networks, the web, XML documents, web crawls, and workflows. This paper compares graph mining algorithms and techniques for finding information.
Index Terms: Subgraphs, graph mining.
1 INTRODUCTION
Graph mining is the mathematical study of methods for recognizing groups within a class of entities. Graph mining algorithms can be divided into three main categories: graph pattern mining algorithms, graph classification algorithms, and graph compression algorithms. This paper investigates the problem of finding patterns. Graph-based approaches are of two types:
A. Apriori-Based Approach
This approach uses a generate-and-test strategy: it generates candidate itemsets and tests whether they are frequent. It has two main drawbacks. First, generating candidate itemsets is expensive in both space and time. Second, support counting is expensive, because it requires subset checking and multiple database scans (I/O).
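The generate-and-test loop described above can be sketched in a few lines of Python. This is a minimal illustration on itemsets rather than graphs, not the implementation of any cited system; the function name `apriori`, the toy `baskets` data, and the use of an absolute `min_support` count are assumptions made for the example.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support_count} for all frequent itemsets.

    transactions: list of sets of hashable items (hypothetical toy input).
    min_support: minimum number of transactions that must contain an itemset.
    """
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One database scan: subset-check each candidate against each transaction.
        return {c: sum(1 for t in transactions if c <= t) for c in candidates}

    # Level 1: frequent single items.
    items = {frozenset([i]) for t in transactions for i in t}
    frequent = {c: n for c, n in count(items).items() if n >= min_support}
    result = dict(frequent)

    k = 1
    while frequent:
        # Generate: join frequent k-itemsets into (k+1)-candidates.
        keys = list(frequent)
        candidates = {a | b for a, b in combinations(keys, 2) if len(a | b) == k + 1}
        # Prune: every k-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k))
        }
        # Test: count support with another scan and keep the frequent candidates.
        frequent = {c: n for c, n in count(candidates).items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(baskets, min_support=3)
```

Each level of the loop triggers one more pass over the transaction list, which is the support-counting cost noted above. In this toy run the candidate {a, b, c} is generated and counted but then discarded, because only two baskets contain it.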
B. Pattern-Growth Approach
This approach allows frequent itemset discovery without candidate generation. It has two steps: (1) build a compact data structure called the FP-tree; (2) extract frequent itemsets directly from the FP-tree.

B. Venkateshwar Reddy, S. Kalyani, and B. Jyothi are working in the Department of Computer Science and Engineering, Anurag Group of Institutions, Hyderabad, India.

2 ALGORITHMS AND TECHNIQUES
Many of the traditional mining applications also apply to the case of graphs. As in the case of management applications, the mining applications are far more challenging to implement because of the additional constraints which arise from the structural nature of the underlying graph. In spite of these challenges, a number of techniques have been developed for traditional mining problems such as frequent pattern mining, clustering, and classification. In this section, we provide a survey of many of the structural algorithms for graph mining.

2.1 Pattern Mining in Graphs

The problem of frequent pattern mining has been widely studied in the context of mining transactional data [1, 2]. Recently, the techniques for frequent pattern mining have also been extended to the case of graph data. The main difference in the case of graphs is that the process of determining support is quite different. The problem can be defined in different ways depending upon the application domain. In the first case, we have a group of graphs, and we wish to determine all patterns which are supported by a fraction of the corresponding graphs [3, 4, 5]. In the second case, we have a single large graph, and we wish to determine all patterns which are supported at least a certain number of times in this large graph [6, 7, 4]. In both cases, we need to account for the isomorphism issue in determining whether one graph is supported by another. However, the problem of defining the support is much more challenging if overlaps are allowed between different embeddings. This is because if we allow such overlaps, then the anti-monotonicity property of most frequent pattern mining algorithms is violated. For the first case, where we have a data set containing multiple graphs, most of the well-known techniques for frequent
pattern mining with transactional data can be easily extended. For example, Apriori-style algorithms can be extended to the case of graph data by using a similar level-wise strategy of generating (k + 1)-candidates from k-patterns. The main difference is that we need to define the join process a little differently. Two graphs of size k can be joined if they have a structure of size (k - 1) in common. The size of this structure could be defined in terms of either nodes or edges. In the case of the AGM algorithm [3], this common structure is defined in terms of the number of common vertices. Thus, two graphs with k vertices are joined only if they have a common subgraph with at least (k - 1) vertices. A second way of performing the mining is to join two graphs which have a subgraph containing at least (k - 1) edges in common. The FSG algorithm proposed in [4] can be used to perform edge-based joins. It is also possible to define the joins in terms of arbitrary structures. For example, it is possible to express the graphs in terms of edge-disjoint paths. In such cases, subgraphs with (k + 1) edge-disjoint paths can be generated from two graphs which have k edge-disjoint paths, of which (k - 1) must be common. An algorithm along these lines is proposed in [5]. Another strategy which is often used is that of pattern-growth techniques, in which frequent graph patterns are extended with the use of additional edges [8, 9, 10]. As in the case of the frequent pattern mining problem, we use a lexicographic ordering among edges to structure the search process, so that a given pattern is encountered only once.
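As a rough sketch of the edge-based join, the following Python fragment joins two k-edge patterns that share (k - 1) edges into a (k + 1)-edge candidate. It assumes that vertex identifiers are consistent across all graphs, so that a pattern is simply a set of edges and containment is a subset test. Real algorithms such as FSG [4] must instead handle isomorphism, which this sketch deliberately skips. The names `edge`, `join_candidates`, and `support`, and the toy database, are hypothetical.

```python
from itertools import combinations


def edge(u, v):
    """Canonical form of an undirected edge (hypothetical helper)."""
    return (min(u, v), max(u, v))


def join_candidates(frequent_k):
    """Join k-edge patterns sharing k-1 edges into (k+1)-edge candidates."""
    candidates = set()
    for g1, g2 in combinations(frequent_k, 2):
        if len(g1) != len(g2):
            continue
        # The common core must contain exactly k-1 edges.
        if len(g1 & g2) == len(g1) - 1:
            candidates.add(g1 | g2)
    return candidates


def support(pattern, database):
    """Number of database graphs containing the pattern.

    Assumes shared vertex identifiers, so no isomorphism test is needed.
    """
    return sum(1 for g in database if pattern <= g)


e12, e23, e34, e45 = edge(1, 2), edge(2, 3), edge(3, 4), edge(4, 5)
database = [
    frozenset({e12, e23, e34}),
    frozenset({e12, e23, e34, e45}),
    frozenset({e12, e23}),
]
frequent_2 = [frozenset({e12, e23}), frozenset({e23, e34})]

candidates_3 = join_candidates(frequent_2)
supports_3 = {c: support(c, database) for c in candidates_3}
```

Here the two 2-edge patterns share the single edge (2, 3), so they join into one 3-edge candidate, which is contained in two of the three database graphs.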
For the second case, in which we have a single large graph, a number of different techniques may be used to define the support in the presence of overlaps. A common strategy is to use the size of the maximum independent set of the overlap graph to define the support. This is also referred to as the maximum independent set support. In [11], two algorithms, HSIGRAM and VSIGRAM, are proposed for determining the frequent subgraphs within a single large graph. In the former case, a breadth-first search approach is used to determine the frequent subgraphs, whereas a depth-first approach is used in the latter case. In [7], it has been shown that the maximum independent set measure continues to satisfy the anti-monotonicity property. The main problem with this measure is that it is extremely expensive to compute. Therefore, the technique in [6] defines a different measure to compute the support of a pattern. The idea is to compute a minimum image-based support of a given pattern. For this case, we compute the number of unique nodes of the graph to which a node of the given pattern is mapped, and take the minimum of this number over all nodes of the pattern. This measure continues to satisfy the anti-monotonicity property, and can therefore be used to determine the underlying frequent patterns. An efficient algorithm with the use of this measure has been proposed in [6].

As in the case of standard frequent pattern mining, a number of variations are possible for the case of finding graph patterns, such as determining maximal patterns [10], closed patterns [12], or significant patterns [13, 14, 15]. We note that significant graph patterns can be defined in different ways depending upon the application. In [14], significant graphs are defined by transforming regions of the graphs into features and measuring the corresponding importance in terms of p-values. In [15], significant patterns are defined in terms of arbitrary objective functions. A meta-framework has been proposed in [15] to determine the significant patterns based on arbitrary objective functions. One interesting approach to discover significant patterns is to build a model-based search tree or MBT [16]. The idea is to use divide-and-conquer to mine the most significant patterns in a subspace of examples. It builds a decision tree that partitions the data onto different nodes. Then, at each node, it directly discovers a discriminative pattern to further divide its examples into purer subsets. Since the number of examples towards leaf level is relatively small, this approach is able to examine patterns with extremely low global support that could not be enumerated on the whole data set. For some graph data sets which occur in drug discovery applications [17], it could mine significant graph patterns, which is very difficult for most other solutions. Since it uses the divide-and-conquer paradigm, the algorithm is almost linearly scalable with 1 - MinSupport and the number of examples [17]. The MBT technique is not limited to graphs, but is also applicable to itemsets and sequences, and the mined pattern set is both small and significant.

One of the key challenges which arise in the context of all frequent pattern mining algorithms is the massive number of patterns which can be mined from the underlying database.
This problem is particularly acute in the case of graphs, since the size of the output can be extremely large. One solution for reducing the number of representative patterns is to report frequent patterns in terms of orthogonality. A model called ORIGAMI has been proposed in [18], which reports frequent graph patterns only if the similarity is below a threshold α; such patterns are also referred to as α-orthogonal patterns. A pattern set P is said to be β-representative if, for every non-reported pattern g, at least one pattern can be found in P for which the underlying similarity to g is at least a threshold β. These two constraints address different aspects of the structural patterns. The method in [18] determines the set of all α-orthogonal and β-representative patterns. An efficient algorithm has been proposed in [18] to mine such patterns. The idea here is to reduce the redundancy in the underlying pattern set so as to provide a better understanding of the reported patterns.

Some particularly challenging variations of the problem arise in the context of either very large data sets or very large data graphs. Recently, a technique was proposed in [19] which uses randomized summarization to reduce the data set to a much smaller size. This summarization is then leveraged to determine the frequent subgraph patterns from the data. Bounds are derived in [19] on the false positives and false negatives with the use of such an approach. Another challenging variation is when the frequent patterns are overlaid on a very large graph, as a result of which patterns may themselves be very large subgraphs. An algorithm called TSMiner was proposed in [20] to determine frequent structures in very large scale graphs.

Graph pattern mining has numerous applications in a variety of domains. For example, in the case of labeled data, such pattern mining techniques can be used to determine structural classification rules. The technique in [21] uses this approach for the purpose of XML data classification. In this case, we have a data set consisting of multiple (XML) graphs, each of which is associated with a class label. The method in [21] determines the rules in which the left-hand side is a structure and the right-hand side is a class label. This is used for the purposes of classification. Another application of frequent pattern mining is studied in [22], in which these patterns are used to create gBoost, which is a classifier designed as an application of boosting. Frequent pattern mining has been found to be particularly useful in the chemical and biological domain [6, 23, 24, 25]. Frequent pattern mining techniques have been used to perform important functions in this domain, such as classification or determination of metabolic pathways. Frequent graph pattern mining is also useful for the purpose of creating graph indexes. In [26], the frequent structures in a graph collection are mined, so that they can be used as features for an indexing process.
The similarity of frequent pattern membership behavior across graphs is used to define a rough similarity function for the purpose of filtering. An inverted representation is constructed on this feature-based representation to filter out irrelevant graphs for the similarity search process. The technique of [26] is much more efficient than other competitive techniques because of its feature-based approach. In general, frequent pattern mining algorithms are useful for any application which can be defined effectively on the basis of aggregate characteristics. Graph pattern mining techniques have the same range of applicability as they do for the case of vanilla frequent pattern mining.

Inokuchi, Washio and Motoda [27] in 1998 proposed a novel approach named AGM to efficiently mine the association rules among the frequently appearing substructures in a given graph dataset. A graph is represented by adjacency matrices, and the frequent patterns appearing in the matrices are mined through an extended algorithm of basket analysis. Agrawal and Srikant [28] in 1994 considered the problem of discovering association rules between items in a large database of sales transactions. They presented two new algorithms for solving this problem that are fundamentally different from the known algorithms. Blockeel and De Raedt [29] in 1998 introduced a first-order framework for top-down induction of logical decision trees. Top-down induction of decision trees is one of the best known and most successful machine learning techniques. It has been used to solve numerous practical problems. It employs a divide-and-conquer strategy, and in this it differs from its rule-based competitors, which are based on covering strategies. Chakrabarti, Dom and Indyk [30] in 1998 developed a new method for automatically classifying hypertext into a given topic hierarchy, using an iterative relaxation algorithm. After bootstrapping off a text-based classifier, they used both the local text in a document and the distribution of the estimated classes of other documents in its neighborhood to refine the class distribution of the document being classified. They discussed three areas of research: text and hypertext information retrieval, machine learning in contexts other than text or hypertext, and computer vision and pattern recognition. Kramer, De Raedt, and Helma [31] in 2001 presented the application of feature mining techniques to the Developmental Therapeutics Program's AIDS antiviral screen database. Kuramochi and Karypis [32] in 2001 presented a computationally efficient algorithm for finding all frequent subgraphs in large graph databases. They evaluated the performance of the algorithm by experiments with synthetic datasets as well as a chemical compound dataset. Pei, Han, Mortazavi-Asl and Pinto [33] in 2001 proposed a novel sequential pattern mining method called PrefixSpan, that is, prefix-projected sequential pattern mining. Borgelt and Berthold [34] in 2002 presented an algorithm to find fragments in a set of molecules that help to discriminate between different classes, for instance, activity in a drug discovery context.
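To illustrate the prefix-projection idea behind PrefixSpan [33] mentioned above, the following Python sketch mines frequent sequential patterns from sequences of single items. It is a simplified illustration under stated assumptions, not the published algorithm: each sequence element is one item rather than an itemset, support is an absolute count, and the function name `prefixspan` and the toy `sequences` are hypothetical.

```python
def prefixspan(sequences, min_support):
    """Return {pattern_tuple: support} for sequences of single items."""
    results = {}

    def mine(prefix, projected):
        # projected: the suffixes that follow the current prefix, one per
        # sequence that contains the prefix.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1
        for item, n in sorted(counts.items()):
            if n < min_support:
                continue
            pattern = prefix + (item,)
            results[pattern] = n
            # Project: keep what follows the first occurrence of the item.
            new_projected = [
                s[s.index(item) + 1:] for s in projected if item in s
            ]
            mine(pattern, new_projected)

    mine((), [tuple(s) for s in sequences])
    return results


sequences = [("a", "b", "c"), ("a", "c"), ("b", "c"), ("a", "b", "c")]
patterns = prefixspan(sequences, min_support=3)
```

The recursion only ever scans the projected suffixes of sequences that already contain the current prefix, which is what lets this style of mining grow patterns without generating and testing candidates level by level.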
Yan and Han [35] in 2002 investigated new approaches for frequent graph-based pattern mining in graph datasets and proposed a novel algorithm called gSpan (graph-based substructure pattern mining), which discovers frequent substructures without candidate generation. Zaki [36] in 2002 presented the TREEMINER algorithm to discover all frequent subtrees in a forest, using a new data structure called a scope-list. Deshpande, Kuramochi and Karypis [37] in 2002 proposed techniques for classifying chemical compounds. These techniques can be broadly categorized into two groups. The first group consists of techniques that rely mainly on various global properties of the chemical compounds, such as molecular weight, ionization potential, and interatomic distance, for capturing the structural properties of the compounds. Since this information is not relational, existing classification techniques can be easily used on these datasets. However, the absence of actual structural information limits the accuracy of such classifiers. The second group of techniques directly analyzes the structure of the chemical compounds to identify patterns that can be used for classification. Inokuchi et al. [27] developed an algorithm to find all frequently occurring induced subgraphs and presented some evidence that such subgraphs can be used as features for classification. Getoor [38] in 2003 studied link mining. Links among objects may demonstrate certain patterns which can be helpful for many data mining tasks and are usually hard to capture with traditional statistical models. Link mining is a promising new area where relational learning meets statistical modeling. Huan, Wang and Prins [39] in 2003 proposed a novel subgraph mining algorithm, FFSM, which employs a vertical search scheme within an algebraic graph framework that they developed to reduce the number of redundant candidates proposed. Their study on synthetic and real datasets demonstrates that FFSM achieves a substantial performance gain over the current state-of-the-art subgraph mining algorithm gSpan. Yan and Han [40] in 2003 proposed to mine closed frequent graph patterns. A graph G is closed in a database if there exists no proper supergraph of G that has the same support as G. A closed graph pattern mining algorithm, CloseGraph, is developed by exploring several interesting pruning methods. Their performance study showed that CloseGraph not only dramatically reduces the number of unnecessary subgraphs generated but also substantially increases the efficiency of mining, especially in the presence of large graph patterns. Yin and Han [41] in 2003 developed a new classification approach called CPAR (Classification based on Predictive Association Rules). Based on their performance study, CPAR achieved high accuracy and efficiency, and it has many useful features.
CPAR represents a new approach towards efficient and high-quality classification. It is interesting to further enhance the efficiency and scalability of this approach and compare it with other well-established classification schemes. Moreover, the strength of the derived predictive rules also motivates an in-depth study of alternative approaches towards effective association rule mining. Huan, Wang, Prins and Yang [42] in 2004 developed a new algorithm that mines only maximal frequent subgraphs, that is, subgraphs that are not a part of any other frequent subgraph. Their algorithm can achieve a five-fold speedup over the current state-of-the-art subgraph mining algorithms.

Koyuturk, Grama and Szpankowski [29] in 2004 presented an innovative new algorithm for detecting frequently occurring patterns and modules in biological networks. They show experimentally that their algorithm can extract such patterns from the KEGG database within seconds. The proposed model and algorithm are applicable to a variety of biological networks either directly or with minor modification. Meinl, Borgelt and Berthold [30] in 2004 showed that it is possible to mine meaningful, discriminative molecular fragments from large databases. Using an existing algorithm that employs a depth-first strategy and a sophisticated ordering scheme allows avoiding costly re-embeddings throughout the candidate growth process, which in turn makes it possible to find larger fragments as well. Yan, Yu and Han [32] in 2004 investigated the issues of indexing graphs and proposed a novel solution by applying a graph mining technique. Different from the existing path-based methods, their approach, called gIndex, makes use of frequent substructures as the basic indexing feature. Frequent substructures are ideal candidates since they explore the intrinsic characteristics of the data and are relatively stable to database updates. gIndex has a 10 times smaller index size, but achieves 3-10 times better performance in comparison with a typical path-based method. Hu, Yan, Huang, Han and Zhou [33] in 2005 developed a novel algorithm, CODENSE, to efficiently mine frequent coherent dense subgraphs across a large number of massive graphs on biological networks for function discovery. Li and Tan [34] in 2005 proposed a novel graph mining algorithm to detect the dense neighborhoods in an interaction graph which may correspond to protein complexes. Their algorithm first locates local cliques for each graph vertex and then merges the detected local cliques according to their affinity to form maximal dense regions. Yin, Han and Yu [36] in 2005 proposed a new approach, called CROSSCLUS, which performs cross-relational clustering with user's guidance. Yan, Zhou and Han [37] in 2005 developed two approaches to handle different mining requests: CLOSECUT, a pattern-growth approach, and SPLAT, a pattern-reduction approach. They applied these methods to biological datasets and found the discovered patterns interesting.
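The feature-based indexing idea behind gIndex, described above, can be sketched as an inverted index from features to graph identifiers. The sketch below is only an illustration of the filtering step and makes a strong simplifying assumption: it uses single labeled edges as features, whereas gIndex uses frequent substructures. The names `edge_features`, `build_index`, and `filter_candidates`, and the toy graphs, are hypothetical. Graphs that survive the filter would still need a full subgraph containment check, which is omitted here.

```python
from collections import defaultdict


def edge_features(graph):
    """Features of a graph: unordered vertex-label pairs on its edges.

    graph: (labels, edges), where labels maps vertex id -> label and
    edges is an iterable of (u, v) pairs. Hypothetical stand-in for the
    frequent substructures that a real index would use.
    """
    labels, edges = graph
    return {tuple(sorted((labels[u], labels[v]))) for u, v in edges}


def build_index(database):
    """Inverted index: feature -> set of ids of graphs containing it."""
    index = defaultdict(set)
    for gid, graph in database.items():
        for feature in edge_features(graph):
            index[feature].add(gid)
    return index


def filter_candidates(index, query, all_ids):
    """Keep only graphs that contain every feature of the query."""
    result = set(all_ids)
    for feature in edge_features(query):
        result &= index.get(feature, set())
    return result


database = {
    "g1": ({0: "C", 1: "O", 2: "N"}, [(0, 1), (0, 2)]),
    "g2": ({0: "C", 1: "O"}, [(0, 1)]),
    "g3": ({0: "C", 1: "C", 2: "N"}, [(0, 1), (1, 2)]),
}
index = build_index(database)

query = ({0: "O", 1: "C", 2: "N"}, [(0, 1), (1, 2)])
candidates = filter_candidates(index, query, database.keys())
```

In this toy run only g1 contains both a C-O edge and a C-N edge, so the expensive containment test would be run on one graph instead of three.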
Chakrabarti and Faloutsos [38] in 2006 discussed how the problems of detecting abnormalities in a given graph and of generating synthetic but realistic graphs have received considerable attention recently. Both are tightly coupled to the problem of finding the distinguishing characteristics of real-world graphs, i.e., the patterns that show up frequently in such graphs and can be considered as marks of realism. Karunaratne and Bostrom [39] in 2006 observed that existing methods for learning from structured data are limited with respect to handling large isolated substructures and also impose constraints on search depth and induced structure length. They presented an approach to learning from structured data using a graph-based canonical representation method called fingerprinting. Krasky, Rohwer, Schroeder and Selzen [40] in 2006 discussed a combined bioinformatics and chemoinformatics approach for the development of new antiparasitic drugs. The ParMol package implements four of the most popular frequent subgraph miners using a common infrastructure: MoFa, gSpan, FFSM and Gaston. Additional functionality was also added to some of the algorithms, such as parallel search, mining directed graphs, and mining in one big graph instead of a graph database. Meinl, Worlein, Fischer, and Philippsen [42] in 2006 presented thread-based parallel versions of MoFa and gSpan that achieve speedups of up to 11 on a shared-memory SMP system using 12 processors. They discussed the design space of the parallelization, the results, and the obstacles that are caused by the irregular search space and by the current state of Java technology. Merkwirth and Ogorzalek [45] in 2007 described a method for the construction of specific types of neural networks composed of structures directly linked to the structure of the molecule under consideration. Each molecule can be represented by a unique neural connectivity problem (graph) which can be programmed onto a cellular neural network. A composite network can further successfully perform classification and regression on real-world chemical datasets. Bogdanov [48] in 2008 studied graph searching, indexing, mining and modeling for bioinformatics, chemoinformatics and social networks. Guha and Schurer [51] in 2008 investigated various aspects of developing computational models to predict cell toxicity based on cell proliferation screening data generated in the MLSCN. By capturing feature-based information in that data set, such predictive models would be useful in evaluating cell-based screening results in general and could be used as an aid to identify and eliminate potentially undesired compounds. Karunaratne and Bostrom [53] in 2008 presented a case study in chemoinformatics in which various types of background knowledge are encoded in graphs that are given as input to a graph learner. They showed that the type of background knowledge encoded indeed has an effect on the predictive performance.
Lam and Chan [54] in 2008 observed that graph data mining algorithms are increasingly applied to biological graph datasets. They proposed a graph mining algorithm, MIGDAC (Mining Graph DAta for Classification), that applies graph theory and an interestingness measure to discover interesting subgraphs which can be both characterized and easily distinguished from other classes. Tsuda and Kurihara [56] in 2008 proposed a nonparametric Bayesian method for clustering graphs and selecting salient patterns at the same time. Variational inference is adopted because sampling is not applicable due to extremely high dimensionality. Schietgat, Costa, Ramon, and De Raedt [57] in 2009 proposed a direct, efficient and simple approach for the generation of interesting graph patterns. They computed maximum common subgraphs from randomly selected pairs of examples and directly used them as features. Jiang, Spjuth, Willighagen, Guha, Eklund and Wikberg in 2010 studied interoperable and reproducible QSAR analysis through the exchange of datasets. QSAR is a widely used method to relate chemical structures to responses or properties based on experimental observations. Much effort has been made to evaluate and validate the statistical modeling in QSAR, but these analyses treat the dataset as fixed. An overlooked but highly important issue is the validation of the setup of the dataset, which comprises the addition of chemical structures as well as the selection of descriptors and software implementations prior to calculations. This process is hampered by the lack of standards and exchange formats in the field, making it virtually impossible to reproduce and validate analyses, and drastically constraining collaboration and reuse of data. Yang, Parthasarathy and Sadayappan in 2010 presented a novel approach to data representation for computing the sparse matrix-vector multiplication kernel, particularly targeting sparse matrices representing power-law graphs. They showed that their representation scheme, coupled with a novel tiling algorithm, can yield significant benefits over the current state-of-the-art GPU and CPU efforts on a number of core data mining algorithms such as PageRank, HITS and Random Walk with Restart. A graph transaction is represented by adjacency matrices, and the frequent patterns appearing in the matrices are mined through the extended algorithm. Chemical compounds are modeled as attributed graphs in which each vertex represents an atom and each edge a bond between atoms. Each vertex carries an attribute that indicates the atom type.
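As a small illustration of this representation, the Python sketch below stores a molecule as an attributed graph: a list of atom types for the vertices and an adjacency matrix for the bonds. The toy molecule (a C-C-O skeleton) and the names `adjacency_matrix`, `atom_types`, and `bonds` are assumptions made for the example and do not come from any of the cited systems.

```python
def adjacency_matrix(num_vertices, edges):
    """Symmetric 0/1 adjacency matrix of an undirected graph."""
    matrix = [[0] * num_vertices for _ in range(num_vertices)]
    for u, v in edges:
        matrix[u][v] = 1
        matrix[v][u] = 1
    return matrix


# Hypothetical toy molecule: the heavy-atom skeleton C-C-O.
atom_types = ["C", "C", "O"]   # vertex attribute: atom type
bonds = [(0, 1), (1, 2)]       # edges: bonds between atoms

matrix = adjacency_matrix(len(atom_types), bonds)

# Bonds read back from the upper triangle, as pairs of atom types.
labeled_bonds = [
    (atom_types[i], atom_types[j])
    for i in range(len(atom_types))
    for j in range(i + 1, len(atom_types))
    if matrix[i][j]
]
```

Reading the upper triangle back as pairs of atom types gives the labeled edges that a pattern miner would count when looking for frequent substructures.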
3 CONCLUSIONS AND FUTURE WORK

The main challenge in the development of a graph mining algorithm is how to split up the discovery process into several phases efficiently. The algorithm should behave like a specialized free tree miner when faced with free tree databases, but should also be able to deal with graph databases efficiently. Existing algorithms for frequent pattern mining become very costly in time and space as the pattern sizes and the number of networks increase. There are various domains, such as chemoinformatics and bioinformatics, where no efficient algorithms are currently available, for example, for mining recurrent patterns across large collections of genome-wide networks. Due to the increasing size and complexity of patterns in computer science, the need for efficient graph mining algorithms is increasing. There is still scope for improvement in graph mining algorithms; the improvement can be in speed or sensitivity.
ACKNOWLEDGMENT
The authors are very thankful to Prof. G. Vishnu Murthy, Head of the Department of Computer Science & Engineering, CVSR School of Engineering, Anurag Group of Institutions, Hyderabad, India, for the moral support and help given to carry out this research work.
REFERENCES
[1] R. Agrawal, R. Srikant: Fast algorithms for mining association rules in large databases. VLDB Conference, 1994.
[2] J. Han, J. Pei, Y. Yin: Mining Frequent Patterns without Candidate Generation. SIGMOD Conference, 2000.
[3] A. Inokuchi, T. Washio, H. Motoda: An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. PKDD Conference, pp. 13-23, 2000.
[4] M. Kuramochi, G. Karypis: Frequent subgraph discovery. ICDM Conference, pp. 313-320, Nov. 2001.
[5] N. Vanetik, E. Gudes, S. E. Shimony: Computing Frequent Graph Patterns from Semi-structured Data. IEEE ICDM Conference, 2002.
[6] B. Bringmann, S. Nijssen: What is frequent in a single graph? PAKDD Conference, 2008.
[7] M. Fiedler, C. Borgelt: Support computation for mining frequent subgraphs in a single graph. Workshop on Mining and Learning with Graphs (MLG'07), 2007.
[8] C. Borgelt, M. R. Berthold: Mining molecular fragments: Finding Relevant Substructures of Molecules. ICDM Conference, 2002.
[9] X. Yan, J. Han: gSpan: Graph-based Substructure Pattern Mining. ICDM Conference, 2002.
[10] J. Huan, W. Wang, J. Prins, J. Yang: Spin: Mining Maximal Frequent Subgraphs from Graph Databases. KDD Conference, 2004.
[11] M. Kuramochi, G. Karypis: Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3), pp. 243-271, 2005.
[12] X. Yan, J. Han: CloseGraph: Mining Closed Frequent Graph Patterns. ACM KDD Conference, 2003.
[13] H. He, A. K. Singh: Efficient Algorithms for Mining Significant Substructures from Graphs with Quality Guarantees. ICDM Conference, 2007.
[14] S. Ranu, A. K. Singh: GraphSig: A scalable approach to mining significant subgraphs in large graph databases. ICDE Conference, 2009.
[15] X. Yan, J. Han: CloseGraph: Mining Closed Frequent Graph Patterns. ACM KDD Conference, 2003.
[16] S. Asur, S. Parthasarathy, D. Ucar: An event-based framework for characterizing the evolutionary behavior of interaction graphs. ACM KDD Conference, 2007.
[17] W. Fan, K. Zhang, H. Cheng, J. Gao, X. Yan, J. Han, P. S. Yu, O. Verscheure: Direct Mining of Discriminative and Essential Frequent Patterns via Model-based Search Tree. ACM KDD Conference, 2008.
[18] M. Al Hasan, V. Chaoji, S. Salem, J. Besson, M. J. Zaki: ORIGAMI: Mining Representative Orthogonal Graph Patterns. ICDM Conference, 2007.
[19] C. Chen, C. Lin, M. Fredrikson, M. Christodorescu, X. Yan, J. Han: Mining Graph Patterns Efficiently via Randomized Summaries. VLDB Conference, 2009.
[20] R. Jin, C. Wang, D. Polshakov, S. Parthasarathy, G. Agrawal: Discovering Frequent Topological Structures from Graph Datasets. ACM KDD Conference, 2005.
[21] M. J. Zaki, C. C. Aggarwal: XRules: An Effective Structural Classifier for XML Data. KDD Conference, 2003.
[22] T. Kudo, E. Maeda, Y. Matsumoto: An Application of Boosting to Graph Classification. NIPS Conf., 2004.
[23] M. Deshpande, M. Kuramochi, N. Wale, G. Karypis: Frequent Substructure-based Approaches for Classifying Chemical Compounds. IEEE Transactions on Knowledge and Data Engineering, 17, pp. 1036-1050, 2005.
[24] J. Huan, W. Wang, D. Bandyopadhyay, J. Snoeyink, J. Prins, A. Tropsha: Mining Spatial Motifs from Protein Structure Graphs. Research in Computational Molecular Biology (RECOMB), pp. 308-315, 2004.
[25] M. Koyuturk, A. Grama, W. Szpankowski: An Efficient Algorithm for Detecting Frequent Subgraphs in Biological Networks. Bioinformatics, 20, pp. i200-207, 2004.
[26] X. Yan, P. S. Yu, J. Han: Graph indexing: A frequent structure-based approach. SIGMOD Conference, 2004.
[27] A. Inokuchi, T. Washio, H. Motoda: An Apriori-based Algorithm for Mining Frequent Substructures from Graph Data. In Proc. 2000 European Symp. Principle of Data Mining and Knowledge Discovery (PKDD'00), 1998, pp. 13-23.
[28] R. Agrawal, R. Srikant: Fast Algorithms for mining association rules. In Proc. of the 20th Int. Conf. on Very Large Databases (VLDB), 1994.
[29] H. Blockeel, L. De Raedt: Top-down induction of first-order logical decision trees. Artificial Intelligence, 101, 1998, pp. 285-297.
[30] S. Chakrabarti, B. Dom, P. Indyk: Enhanced hypertext categorization using hyperlinks. ACM SIGMOD'98, 1998, pp. 307-318.
[31] S. Kramer, L. De Raedt, C. Helma: Molecular feature mining in HIV data. In Proc. of the 7th ACM SIGKDD International Conf. on Knowledge Discovery and Data Mining, 2001, pp. 136-143.
[32] M. Kuramochi, G. Karypis: Frequent Subgraph Discovery. In Proc. 2001 Int. Conf. Data Mining (ICDM'01).
[33] J. Pei, J. Han, B. Mortazavi-Asl, H. Pinto: PrefixSpan: Mining Sequential Pattern Growth. In Proc. 2001 Int. Conf. Data Engineering (ICDE'01), 2001, pp. 215-224.
[34] C. Borgelt, M. R. Berthold: Mining molecular fragments: Finding relevant substructures of molecules. In Proc. 2002 Int. Conf. Data Mining (ICDM'02), pp. 211-218.
[35] X. Yan, J. Han: gSpan: Graph-Based Substructure Pattern Mining. In Proc. 2002 Int. Conf. Data Mining, 2002, pp. 721-724.
[36] M. J. Zaki: Efficiently Mining Frequent Trees in a Forest. In Proc. 2002 ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'02), 2002, pp. 71-80.
[37] M. Deshpande, M. Kuramochi, G. Karypis: Automated approaches for classifying structures. In Proc. 2002 Workshop on Data Mining in Bioinformatics (BIOKDD'02), 2002, pp. 11-18.
[38] L. Getoor: Link Mining: A new data mining challenge. SIGKDD Explorations, (5), 2003, pp. 84-89.
[39] J. Huan, W. Wang, J. Prins: Efficient Mining of Frequent Subgraphs in the Presence of Isomorphism. In Proc. 2003 Int. Conf. Data Mining (ICDM'03), 2003, pp. 549-552.
[40] X. Yan, J. Han: CloseGraph: Mining Closed Frequent Graph Patterns. In Proc. 2003 ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'03), 2003, pp. 286-295.
[41] X. Yin, J. Han: CPAR: Classification based on Predictive Association Rules. In Proc. 2003 SIAM Int. Conf. Data Mining (SDM'03), 2003, pp. 331-335.
[42] J. Huan, W. Wang, J. Prins, J. Yang: Spin: Mining maximal frequent subgraphs from graph databases. KDD'04, Seattle, Washington, USA, 2004.
[43] M. Koyuturk, A. Grama, W. Szpankowski: An efficient algorithm for detecting frequent subgraphs in biological networks. Bioinformatics, (20), 2004, pp. i200-i207.
[44] X. Yan, P. S. Yu, J. Han: Graph Indexing: A Frequent Structure-based Approach. In Proc. 2004 ACM SIGMOD Int. Conf. Management of Data, 2004, pp. 335-346.
[45] X. Yin, J. Han, P. S. Yu: Cross-Relational Clustering with User's Guidance. In Proc. 2005 ACM SIGKDD Int. Conf. Knowledge Discovery and Data Mining (KDD'05), 2005, pp. 344-353.
[46] D. Chakrabarti, C. Faloutsos: Graph mining: Laws, Generators, and Algorithms. ACM Computing Surveys, 38(2), 2006, pp. 1-69.
[47] S. Maji, S. Mehta: A Netflow distance between labeled graphs: applications in chemoinformatics. www.cs.berkeley.edu, 2008.
[48] A. Krasky, A. Rohwer, J. Schroeder, P. M. Selzen: A combined bioinformatics and chemoinformatics approach for the development of new antiparasitic drugs. Elsevier, Genomics, 2006, pp. 1-8.
[49] T. Meinl, M. Worlein, O. Urzova, I. Fischer, M. Philippsen: The ParMol package for frequent subgraph mining. Electronic Communications of the EASST, 1, 2006.
[50] K. Tsuda, T. Kudo: Clustering Graphs by weighted substructure mining. Proc. of 23rd Int. Conf. on Machine Learning, ACM, 148, pp. 953-960, 2006.
[51] X. Dong, K. E. Gilbert, R. Guha, R. Heiland, J. Kim, M. E. Pierce, G. C. Fox, D. J. Wild: Web Service Infrastructure for chemoinformatics. J. Chem. Inf. Model., 47, 2007, pp. 1303-1307.
[52] A. M. Fahim, G. Saake, A. M. Salem, F. A. Torkey, M. A. Ramadan: K-means for spherical clusters with large variance in sizes. World Academy of Science, Engineering & Tech., 45, 2008, pp. 177-182.
[53] C. Hubler, H. P. Kriegel, K. Borgwardt, Z. Ghahramani: Metropolis Algorithms for Representative Subgraph Sampling. IEEE Xplore, 2008, pp. 283-292.
[54] W. W. M. Lam, K. C. C. Chan: A Graph mining algorithm for classifying chemical compounds. IEEE Int. Conf. on Bioinformatics and Biomedicine, 2008.
[55] S. Maji, S. Mehta: A Netflow distance between labeled graphs: applications in chemoinformatics. www.cs.berkeley.edu, 2008.
[56] L. Schietgat, F. Costa, J. Ramon, L. De Raedt: Maximum common subgraph mining: A fast and effective approach towards feature generation. In Proc. at SRL-MLG-ILP, Leuven, Belgium, 2009.
[57] C. Jiang, F. Coenen, M. Zito: Frequent Subgraph mining on Edge Weighted Graphs. 2010.
Mr. B. Venkateshwar Reddy received an M.Sc. in Mathematics from Osmania University and an M.E. in Computer Science and Engineering from Sathyabama University, Chennai. He is presently working as an Assistant Professor in CVSR School of Engineering (Anurag Group of Institutions), Hyderabad, India. He has published two papers in various international journals.
Mrs. S. Kalyani received a B.E. in Computer Science from Osmania University and an M.Tech. in Computer Science and Engineering from JNTUH. She is presently working as an Associate Professor in CVSR School of Engineering (Anurag Group of Institutions), Hyderabad, India.
Ms. B. Jyothi received a B.Tech. in Computer Science and Engineering from JNTUH and an M.Tech. in Computer Science and Engineering from JNTUH. She is presently working as an Assistant Professor in CVSR School of Engineering (Anurag Group of Institutions), Hyderabad, India.
