A Survey On Large Scale Schema and Ontology Matching Techniques

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 10, OCTOBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.
ORG
55
A Survey on Large Scale Schema and Ontology Matching Techniques

K. Saruladha, Dr. G. Aghila, and B. Sathiya
Abstract With the fast growth and the increasing usage of the large variety of data like XML schemas, ontologies. In many domains, the heterogeneity among the data increases enormously. Hence solving such heterogeneity needs matching techniques. In areas like E-business, web and life sciences the size of XML schema and ontology is large. But most of the existing matching techniques address only small match tasks. So, we present an overview of recently proposed matching techniques like early pruning of the search space, divide and conquer strategies, parallel matching and some renowned ontology matching tools which achieve high match efficiency or/and high quality for large-scale matching. In addition to this the pros and cons of various matching techniques is summarized. Index Terms Similarity Measure, Schema Matching, Ontology Matching, Ontology Alignment.
1 INTRODUCTION
N general relational and XML database is called as Schemaandthedatabaserepresentedbysemanticlan guage like Web Ontology Language (OWL) or Resource DescriptionLanguage(RDF)iscalledasOntology.Match ing technique finds semantic correspondences between cartesian product of entity pair of the given two ontolo gies. These correspondences may stand for equivalent, subsumption, or disjoint relation between the ontology entities.Theresultofmatchingtechniqueisthesetofse mantic correspondences called alignments. Fig.1 depicts the general framework for the schema matching tech niques which is also applicable for ontology matching techniques.Asshowninfiguretheinputofthematching techniqueistwoschemaswhicharefirstimportedintoa format suitable for processing. For a faster computation the schema may be preprocessed, e.g., preparing neig bourlistforeachentitytofastenthestructuralsimilarity calculation, removing redundant information from sche ma,etc.Thesimilarityvalueiscalculatedforallcartesian product entity pairs one from each of two preprocessed schemas using a match workflow consisting of set of matcher (e.g., lexical, structural and semantic matcher). The semantic value ranges from 0 to 1, i.e., value 1 indi cates equivalent and value0 means disjoint relation. The matcherworkflowcanbesequential,parallel,iterativeor in some mixed fashion. The similarity value obtained fromthematchworkflowcanbeaggregatedtoobtainthe final alignment. For large scale schema matching the
Mrs. K. Saruladha is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India. Dr. G. Aghila is with the Computer Science Department, Pondicherry University, Puducherry, Puducherry, Pin 605014, India. Ms. B. Sathiya is with the Computer Science Department, Pondicherry Engineering College, Puducherry, Pin 605014, India.
workflowwillslightlyvarytoincorporatetechniqueslike early pruning and partitioning of schema to reduce the searchspace,parallelization,etc. According to Shvaiko P and Euzenat J [11] one of the toughestchallengeformatchingsystemishandlinglarge scaleschemasorontologyandonthebasisofRahmE[12] work domain where large scale matching is necessary includeebusiness[13],webdata[14],lifescienceontolo gies ,and medicine [15] and web directories [16]. Even thoughtheadvancementsaremadeinthecurrentmatch ing systems, they are still unable to achieve good effec tiveness (correctness and completeness of the alignment) andgoodefficiency(timeandspaceefficiency).Theeffec tiveness is measured by precision and recall while the efficiency is measured by execution time and memory used. The remainder of this paper is organized as follows: Section 2 discusses various large scale matching tech niques. Section 3 gives a brief comparison of the large scale matching techniques. Finally, Section 4 concludes thesurveyofontologymatchingtechniques. InputSchemas Preprocessor Similarity Similarity Alignments Aggregation Computation Preprocessor
Fig. 1 General Framework for Matching Process
2011 Journal of Computing Press, NY, USA, ISSN 2151-9617 http://sites.google.com/site/journalofcomputing/
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 10, OCTOBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.ORG
56
2 APPROACHES USED FOR LARGE SCALE MATCHING TECHNIQUE

In this section, we discuss about various large scale matching techniques. The matching techniques are cate gorizedintofourmajorcategoriesasshownbelow 1) Earlypruningstrategybasedmatchingtechnique a. QuickOntologyMatchingalgorithm (QOM) b. Eric peukert et al. schema and ontology matchingalgorithm 2) Partitionbasedmatchingtechnique a. Coma++ schema and ontology matching al gorithm b. FalconAOontologymatchingalgorithm c. Taxomapontologymatchingalgorithm d. Anchorfloodontologymatchingalgorithm 3) Parallelmatchingtechnique a. Grossetal.ontologymatchingalgorithm 4) Othermatchingtool a. RiMOMontologymatchingtool b. ASMOV (Automated Semantic Matching of Ontologies with Verification) ontology matchingtool c. Agreementmaker schema and ontology matchingtool
2.1 Early Pruning Strategy Matching Technique

Theaimofearlypruningtechniqueistoreducethesearch space for matching, e.g., one matcher can prune entity pairs whose semantic correspondence value is very low, thus reducing search space for the subsequent matcher. This idea is suitable for both sequential and iterative matcher workflow. QOM [1] was the first system to im plement this idea using iterative matcher workflow and Peukert et al. [2] System implemented both matcher workflows. 2.1.1 QOM QOM is the ontology matching algorithm adapted from the Naive Ontology Matching (NOM). The basic differ encebetweenNOMandQOMisQOMusesheuristicand dynamic programming approach to reduce the search space with marginally compromising on effectiveness. QOM is an iterative matcher consisting of the following six steps to match the ontologies: Feature engineering, search step selection, similarity computation, similarity aggregation,interpretationanditeration. FeatureEngineering.Theprocessofdeterminingwheth ertwoentitiesaresameornotisbasedontheentityfea tures.RDFfeatureslikeentitiesunifiedresourceidentifi ers (URIs), its label, its parent entities, similar entities, childrenentitiesanddomainspecificfeaturesareusedfor matching.Thisstepconstructsthesearchspaceofontolo gy entities by computing the cartesian product of entity pairsusingentityfeaturesofbothontologies. Search Step Selection. The entity pairs obtained will be
pruned using heuristics strategies to reduce the search space contructed in the previous step. The selected matching pairs will be processed for similarity computa tionandtheirinterpretation.Onceagainthesystemhasto determinewhichmatchingpairstoaddtotheagendafor the next iteration. The heuristics strategies are based on labels of entities, entity neighbour to the alignments ob tained from the previous iteration, hierarchy of entities, randompickorcombinationoftheabovestrategies. Similarity Computation. The similarity between the above selected entity pairs can be measured by wide range of similarity functions like Levenshteins edit dis tance,Dicecoefficientandcosinesimilarityvalue.Based on the feature of the entity any of the above similarity functioncanbechosen. Similarity Aggregation. QOM uses weighted average of similarityvaluesforagivenentitypair.Theweightofthe similarityvaluesiscalculatedbysigmoidfunctionwhich emphasizeshighindividualsimilaritiesanddeemphasiz eslowindividualsimilarities. Interpretation.First,matchingpairswithsimilarityvalue less than a threshold is discarded. Next similarity pairs whichviolatesbijectivity(onetooneandontoconstraint) ofthematchingarediscarded. Iteration. Initial iteration uses lexical knowledge while later iterations use ontology structure knowledge for matching. The number of iteration required irrespective ofontologysizebasedontheirexperimentsisten. BasedontheevaluationQOMisnearlytwentytimesfast erthanNOM.ThedrawbackofQOMisthat,effectiveness is marginally lower than NOM since not all entity pairs areevaluated. 2.1.2 Peukert et al. Method Peukertetal.proposedaschemaandontologymatching algorithm.Itisarulebasedoptimizationtechniquewhich rewrites match workflow to improve performance. The filteroperatorswithinmatchworkfloweliminatedissimi lar entity pairs (whose similarity value is less than some threshold) from intermediate match results and thus re ducingsearchspaceforfurthermatchers.Thethresholdis either statically predetermined or dynamically derived fromthesimilaritythresholdusedinthepreviousmatch workflowtoselectalignment. TheinputofthisprocessisXMLschemas,metamod elsorontologies.Theseinputformatsareconvertedintoa standard internal format. The user must design the matchingprocesswhichisrepresentedasagraphwhere theverticesrepresentmatcherfromthelibraryandedges representexecutionorderanddataflow.Sincethegraph isdesignedbytheuserthematchworkflowcanbeparal lel or serial or iterative. The user can utilize the rewrite recommendation system, to increase efficiency of the matching workflow. These rewrite recommendation sys tem is similar to costbased rewrites in database query optimizatione.g,theuseofrewriterulesaresimilartothe useofpredicatepushdownrulesfordatabasequeryop
57
timization which reduce the number of input tuples for joinsandotherexpensiveoperators.Asimplecostmodel isusedtochoosewhichpartsofamatchingworkflowto rewrite.Theserewritescanbeacceptedorrejectedbythe user. Once the matching process graph is finalised the graphcanbeexecutedtoobtainthealignment. Thedrawbacksofthismethodareasfollows.Rewrite rules, only focus on efficiency improvements but not on effectiveness improvement. The matcher library consists ofonlyfewnumbersofmatchers.
2.2 Partition Based Matching Technique

Theideaofthisapproachistopartitiontheontologyand executeapartitionwisematching.Thepartitioningisper formedinsuchawaythateachpartitionoffirstontology ismatchedwithonlysmallsubsetofthepartitions(ideal ly, only with one partition) of the second ontology. This methodreducesthesearchspaceandthusbetterefficien cy. Space complexity of the matching process is also re duced. Four partition based methods COMA++ [5], Fal conAO [3],Taxomap [4], anchor flood [6] will be dis cussedbelow. 2.2.1 COMA++ COMA++ is a generic schema and ontology matching tool with a library of matchers and a flexible option to combinethematcherstorefinethematchingresults.This methodcanprocessXMLschema,relationaldatabaseand OWL ontology. First the schema is processed to identify the relevant components (node, path or fragment) for matching. Then depending on the component chosen a matcherworkflowisusedtocomputecomponentsimilar ities. Finally similarity combination methods are used to findthecorrespondencesbetweencomponents. Fragmentcomponentbasedmatchingprocessiscapa ble of handling large scale schema. A rooted subgraph downtotheleaflevelintheschemagraphisreferredas fragment. In general, fragments should have little or no overlap to avoid repeated similarity computations and overlapping match results. The three types of automatic fragmenttypesareschema,subschema,andsharedfrag ments.Usercanalsoselectasubstructureasafragment. Based on the fragment type, the schema is fragmented andfragmentpairsfromeachontologyismatchedtofind the most similar fragment pair.The search for similar fragments is some kind of lightweight matching, e.g., basedonthesimilarityofthefragmentroots.Nextsimilar fragment pair is matched element wise to find the align ments. Thedrawbacksofthismethodareasfollows.COMA++ isdevelopedforXMLschemasandhenceitisnotsuitable forcomplexontologygraph.Ituserelativelysimpleheu ristic rules to partition the input schemas resulting often in too few or too many partitions and to find similar fragmentpaironlytherootnodefeaturesareusedwhich willleadtolessmatchingquality.
2.2.2 Falcon-AO FalconAO is an ontology matching tool which aims at finding alignments of the given two OWL/RDF ontolo gies. The matcher library of FalconAO consist of VDoc [17] and ISub [19] which are lightweight linguistic matchers, GMO [18] an iterative structural matcher and PBM(PartitionBasedMatcher)whichadoptsthedivide andconquer strategy to find block mappings between largescale ontologies. The three phase of PBM are ex plainedbelow Partitioning ontologies: Structural proximities between entities are calculated based on how closely they are re latedinthehierarchies.Theontologyclustersareformed based on structural proximities using modified ROCK [20] clustering algorithm. RDF Blocks are constructed fromtheclustersbyassigningeachRDFtripletoacluster inwhichatleasttwoentitiesarecontained. Matchingblocks:Thealignmentwithhighsimilaritiesis referred as anchors. A lightweight string comparison technique, ISUB, is firstly employed to exploit anchors betweentwofullontologiesandthentheblocksfromthe two ontologies are matched based on the distribution of theanchors. Discovering alignments: VDOC adopts a linguistic ap proach to ontology matching. GMO is an iterative struc turalmatcher.Similaritycombinationisaheuristicstrate gytotunethethresholdsoftheabovetwomatchers. The drawbacks of this method are as follows. The entire ontologyneedstobeprocessedtofindanchorsandthus efficiencyofFalconAOisreduced.Maximumnumberof entityinaclusterisdeterminedbytheGMOmatcherand the clustering algorithm terminates abruptly if it reaches maximumnumberwhichwillleadtopoorclustering. 2.2.3 Taxomap Taxomapisanontologymatchingalgorithmconsistingof twopartitioningalgorithmnamelyPartitionAnchorParti tion(PAP)andAnchorPartitionPartition(APP)whichhave been designed to take the alignment objective into ac count in the partitioning process. The most structured ontologyisreferredastargetontologyandthelessstruc tured is referred as source ontology. PAP is suitable for structuredontologyvsunstructuredontologyandAPPis suitable for structured ontology vs structured ontology. Theentitypaironefromeachoftwoontologywhichhas identicallabelsiscalledasanchorswhichwillbeusedin bothPAPandAPP.Thealignmentisbasedonlexicaland structure(subclass)similaritymeasure. PartitionAnchorPartition(PAP): 1. UsePBMmatchertopartitionthetargetontologyinto setofblocks. 2. Identify the set of anchors between two ontologies. This set will be the center of the future block which willbegeneratedfromthesourceontology. 3. Use PBM matcher to partition the source ontology aroundtheidentifiedcenters. 4. Aligneachblockwiththecorrespondingblock.
58
AnchorPartitionPartition(PAP): algorithm has been proposed by this system leading to 1. Identifythesetofanchorsacrossontologies. better load balancing, limited memory consumption and 2. Partitionboththetargetontologyandsourceontology scalability without reducing the effectiveness of match by modifying PBM matcher in order to take into results. accountsharedanchors. The system first generates the context attributes for 3. Alignblocksthatsharemaximalnumberofanchors. each entity. Then the ontology is partitioned into set of The drawbacks of this method are as follows. The ef fectiveness of this method depends on the availability of identical labels across ontologies. Only labels and hie rarchystructureisusedformatchingandhencecompara tivelylessrecall. 2.2.4 AnchorFlood AnchorFlood is an ontology matching algorithm which implements a dynamic partition based matching that avoidstheprioripartitioningoftheontologies.Thesimi larity measure for alignment is based on internal struc ture, external structure and JaroWingler string distance. The process starts of with the set of anchors where an anchor is defined as a exact string match of concepts, propertiesorinstancespair. The algorithm preprocesses ontologies to normalize the textual contents of entities. It starts with an anchor and collects neighboring entity of the chosen anchor acrossontologies.Thesegmentpairconstructedfromthe above collected neighbors is processed to produce align ment pairs. The process repeats until either all the col lected entity are explored, or no new aligned pair is found. The anchor will be discarded as misalignment if the constructed segment of the anchor does not produce sufficientalignments. Thedrawbacksofthismethodareasfollows.Semantic similaritybetweenentitiesisnotexplored.Itignorescer tain distantly placed aligned pairs since segments are constructed from neighbor of anchors. Only exact string match of concepts, properties and instances are consi deredasanchorswhichleadtoinefficientalignments. partition based on the sizebased partitioning algorithm achieving intramatcher parallelism. Now the partition pairsareconstructedonefromeachofthetwoontologies. Each partition pair is assigned a processor. Within each processorthepartitionpairisprocessedinparallelbythe elementlevel,structurelevel,instancebasedmatchersso as to achieve inter matcher parallelism. The alignments fromallthematchercanbeaggregatedtooutputthefinal alignment. The drawbacks of this method are as follows.The matcherlibraryconsistsofonlyfewnumberofmatchers. The partition algorithm uses only simple strategies for partitioningwhichcanbeimproved.
2.4 Other Matching Tools

RiMOM [8] is the first ontology matching algorithm de velopedwithautomaticanddynamicselectionofmatch ers for ontology alignment tasks. The input ontology should be in OWL format. It considers lexical, structural andinstancesimilarities.Basedonthefeaturesofthein put ontology and the predefined rules, appropriate matchers are chosen to apply for the matching task. Ri MOMconsistofsixstepsandareexplainedbelow. OntologyPreprocessingandFeatureFactorsEstimation. Foreachentityofbothontologies,generatethefeaturesof theentitylikeitsname,label,children,etc.Thenthelabel and structure similarity of the entity pairs are calculated whichwillbeusedinthefollowingstep. Strategy (Matcher) Selection. The basic idea of strategy selection is that, if two ontologies have some feature in common, then matcher based on these feature informa tionareemployedwithhighweight;andifsomefeature factors are two low, then these matcher may not be em ployed.Theentitiesarefirstlinguisticallymatched;struc turalmatchingisonlyappliediftheschemasexhibitsuf ficientstructuralsimilarity. Single strategy execution. The selected strategies from theabovestepisusedtofindthealignmentindependent ly.Eachstrategyoutputsanalignmentresult. Alignmentcombination.Inthisphase,RiMOMcombines the alignment results obtained by the selected strategies. The combination is conducted by a linear interpolation method. Similarity propagation (optional). If the two ontologies havehighstructuresimilarityfactor,RiMOMemploysto find new alignment according to the structural informa tion. Alignment refinement. It refines the alignment results fromtheprevioussteps.Itdefinedseveralheuristicrules toremovetheunreliablealignments.
2.3 Parallel Matching Technique

A straightforward method to increase the efficiency of largescale matching is to execute matcher in parallel on severalprocessors.Thetwokindsofparallelmatchingare intermatcher and intramatcher parallelization. Inter matcher parallelization deals with parallel execution of independently executable matchers while intramatcher parallelizationdealswithinternaldecompositionofindi vidualmatchersormatcherpartsintoseveralmatchtasks that can be executed in parallel. Gross et al.[7] matching systemimplementsbothintermatcherandintramatcher parallelizationwhichwillbediscussedbelow Gross et al. proposed a parallel ontology matching system with a distributed infrastructure to incorporate intramatcherandintermatcherparallelism.Theelement level and structurelevel matching are also parallelized. For intramatcher parallelism, a sizebased partitioning
59
TABLE 1 Comparison of Matching Technique

Inputas XMLSchemas Inputas Ontologies GUI Linguistic Matcher Structural Matcher Instance Matcher Earlypruning Schemaparti tion Parallelmatch ing Dyn.matcher selection Mapping reuse OAEIpartici pation Useofexternal dictionary QOM No Yes No Yes Yes Yes Yes No No No No No Yes Peukert COMA++ etal. Yes Yes Yes Na* Na Na Yes No (Yes) No Yes No No Yes Yes Yes Yes Yes Yes No Yes No No Yes Yes Yes Falcon AO No Yes (Yes)** Yes Yes No No Yes No No No Yes No Taxo map No Yes No Yes Yes No No Yes No No No Yes No Anchor Flood No Yes No Yes Yes No No Yes No No No Yes No Gross etal. No Yes No Yes Yes Yes No Yes Yes No No No No RiMOM No Yes No (Yes) (Yes) (Yes) No No No Yes No Yes Yes ASMOV No Yes No Yes Yes Yes No No No No No Yes Yes Agreement Maker (Yes) Yes Yes Yes Yes Yes No No No No No Yes Yes
*Notavailable**optional The drawback of RiMOM is its inefficiency for dealing with large scale ontologies. Eventhough it shows a very good effectiveness for large scale ontologies it consume longtimeandlargeamountofmemorysinceitdoesnot incorporate search space reduction techniques like early pruning,partitionofontologyorparalleltechnique. ASMOV(Automated Semantic Matching of Ontologies with Verification) [9] is an iterative ontology matching algorithm. The strength of ASMOV lies in the post processing of the alignment to remove the alignments whicharesemanticallyinconsistent.ItusesWordNetand theUnifiedMedicalLanguageSystem(UMLS)toincrease theeffectiveness.TheinputontologyshouldbeinOWL DLformat.TheinputofASMOVistwoontologiesandan optionalpredeterminedalignmentset.Thelexical,struc tural and instance based similarity value between all possiblepairsofentities,onefromeachofthetwoontol ogies is calculated and is used as the optional input alignmenttoadjustanycalculatedmeasures.Asimilarity matrix containing the calculated similarity values for everypairofentitiesisobtainedfromtheabovestep.For each entity choose the maximum similarity value as pre alignmentfromthesimilaritymatrix. Thisprealignmentmustundergoaprocessofseman ticverification,whichisanextensivepostprocessingto
eliminate potential inconsistencies among the set pre alignment. Five different kinds of inconsistencies are checked. One such inconsistency rule is Multipleentity correspondences, e.g., if two alignments (a, b) and (b, c) exist then there must also be alignment like (a, c), if not the above two alignment cannot be verified and hence removed.Theoutputoftheabovestepisthesemantically verified similarity matrix, which is then tested against a termination condition. If this condition is true, no more iteration is needed and the process stops. The resulting alignmentisfinalalignmentset. Thedrawbacksofthissystemareasfollows.Whenthe given two ontologies are dissimilar the effectiveness is decreased. Sometime semantic verification system elimi natestoomanyalignmentsleadingtolessrecall.Foreach iteration, theASMOV needs polynomial time, thus lead ingtoinefficiency. AgreeementMaker[10]isaschemaandontologymatch ing algorithm consisting of wide range of matcher for lexicalandstructuralfeaturesoftheontology.Itprovides bothserialandparallelmatcherworkflows.Thestrength of the system lies in GUI which enable user to choose, control and execute the iterative matchers and their re sults.ThroughGUItheusercanchoosematcherfromthe matcher library based on the matching granularity (ele
60
men wise, structural wise and instance wise), dominant featuresofinputschemas,etc. The input ontology can be in XML, RDFS, OWL, or N3. The system consists of three layers. First matcher layerusetheentityfeatures(e.g.,label,comments,anno tations, and instances) and compute the similarity value using the syntactic and lexical matchers. The resulting similarity values are combined based on weighted aver agemethod.Secondmatcherlayerusestructuralproper tiesoftheontologiestocomputethesimilarityvalue.Fi nally,thirdmatcherlayercombinethesimilarityvalueof firstandsecondmatcherlayer. The drawback of this system is that, the end user should be a sophisticated domain expert because to choose, control and execute the matcher the user must havedomainknowledge.
[3]
[4]
[5]
[6]
[7]
[8]
COMPARISON OF DIFFERENT LARGE SCALE MATCHING TECHNIQUE
[9]
Toshowthestateoftheartincurrentlargescalematch ing techniques, we briefly compare the above discussed matching prototypes that have successfully been applied to largescale matching problem. The OAEI (Ontology AlignmentEvaluationInitiative)isabenchmarkcompeti tion for evaluating the new proposed matching systems andthecomparisonalsotellswhichprototypeshavepar ticipated in OAEI. Table 1 provides a brief comparison betweentenmatchprototypes.
[10]
[11]
[12]
[13]
4 CONCLUSION
This survey paper has discussed the various matching techniques that could be used for large scale matching. Theaimofthissurveypaperistogivebroadoverviewof techniques which are used to increase the efficiency like early pruning strategy, partitioning of ontology/schema and parallelization of matching process. And also the techniques to increase the effectiveness like postprocess ing of alignments, dynamic matcher selection, and full usercontrolofmatchworkflowthroughGUI.Thesurvey depictsthatthereisalwaysatradeoffbetweengoodeffi ciency and good effectiveness in the existing matching systems.Hence,weareworkingforlargescalematching technique which will combine the advantages ofthe me thods discussed in this paper to achieve both good effi ciencyandgoodeffectivenessandwewilltestitwithon tologiesofparticulardomain.
[14]
[15]
[16]
[17]
[18]
[19]
REFERENCES
[1] Ehrig M, Staab S, Quick ontology matching, Proceedings of international conference semantic web (ICSW). LNCS, vol 3298. Springer,Heidelberg,pp683697,2004. Peukert E, Berthold H, Rahm E, Rewrite techniques for per formanceoptimizationofschemamatchingprocesses,Proceed ingsof13thinternationalconferenceonextendingdatabasetechnolo
[20]
gy(EDBT).ACM,NY,pp453464,2010. Hu W, Qu Y, Cheng G, Matching large ontologies: A divide andconquerapproach, Journal of Data Knowledge Engineering, vol.67,no.1,pp.140160,2008. Hamdi F, Safar B, Reynaud C, Zargayouna H, Alignment basedpartitioningoflargescaleontologies,Advancesinknow ledgediscoveryandmanagement,1sted.NewYork:SpringerHei delberg,pp.251269,2009. Do HH, Rahm E, Matching large schemas: Approaches and evaluation,JournalofInformationSystem,vol.3,no.6,pp.857 885,2007. HanifMS,AonoM,Anefficientandscalablealgorithmfor segmentedalignmentofontologiesofarbitrarysize,Journalof WebSemantics,vol.7,no.4,pp.344356,2009. Gross A, Hartung M, Kirsten T, Rahm E, On matching large lifescienceontologiesinparallel,Proceedingsof7thinternational conferenceondataintegrationinthelifesciences(DILS).LNCS,vol 6254.Springer,Heidelberg,2010. Li J, Tang J, Li Y, Luo Q, RiMOM:A dynamic multistrategy ontology alignment framework, IEEE Trans Knowl Data Eng vol.21, no. 8,pp.12181232,2009. JeanMaryYR,ShironoshitaEP,KabukaMR,Ontologymatch ingwithsemanticverification,JWebSemvol.7, no. 3,pp.235 251,2009. Cruz IF, Antonelli FP, Stroe C, AgreementMaker: Efficient matching for large realworld schemas and ontologies, PVLDB,VLDBEndowment,vol.2,no.2,pp15861589,2009. Shvaiko P, Euzenat J, Ten challenges for ontology matching, Confederated International Conference on the Move to Meaningful InternetSystems,pp.11641182,2008. Rahm E, Towards LargeScale Schema and Ontology Match ing, Schema matching and mapping, Bellahsene Z, Bonifati A RahmE, eds. NewYork:SpringerHeidelberg,pp.327,2011. SmithK,MorseM,MorkPetal.,Theroleofschemamatching inlargeenterprises,ProceedingsofCIDR,2009. Su W, Wang J, Lochovsky FH, Holistic schema matching for web query interfaces, Proceedings of international conference on extendingdatabasetechnology(EDBT).Springer,Heidelberg,pp 7794,2006. Kirsten T, ThorA, Rahm E, Instancebased matching of large life science ontologies, Proceedings of data integration in the life sciences(DILS).LNCS,Springer,Heidelberg,vol.4544,pp172 187,2007. AvesaniP,GiunchigliaF,YatskevichM,Alargescaletaxonomy mapping evaluation, Proceedings of international conference on semanticweb(ICSW).LNCS,vol.3729.Springer,Heidelberg,pp 6781,2005. Y. Qu, W. Hu, G. Cheng, Constructing virtual documents for ontology matching, Proceedings of the 15th International World WideWebConference,ACMPress,pp.2331,2006. W. Hu, N. Jian, Y. Qu, Y. Wang, GMO: a graph matching for ontologies, Proceedings of the KCAP Workshop on Integrating Ontologies,pp.4148,2005. Stoilos G, Stamou G, Kollias S, A string metric for ontology alignment,Proceedingsofthe4thInternationalSemanticWebCon ference,LNCS,Springer,vol.3729,pp.624637,2005. Guha S, Rastogi R, Shim K, ROCK: a robust clustering algo rithm for categorical attributes, Proceedings of the 15th Interna tionalConferenceonDataEngineering,pp.512521,1999.
[2]
K. Saruladha working as Assistant professor in Pondicherry Engineering College, India. She has got a total of 21 years of teaching expereince. She has graduated from Pondicherry university.Puchucherry, India. She is a member of Indian Society of Technical Education, India. She has published nearly 20 research papers
61
in Distributed computing, information security and ontology based information retrieval. She is currently pursuing her Ph.D. in ontology based information retreival systems. Dr. G. Aghila working as Professor in Pondicherry University, India has got a total of 21 years of teaching expereince. She has graduated from Anna University Chennai, India. She has published nearly 40 research papers in web crawlers, ontology based information retrieval. She is currently guiding 8 Ph.D. scholars. She was in receipt of schrneiger award. She is an expert in onology development. Her area of interest includes artificial intelligence, text mining and semantic web technologies. B. Sathiya is a post graduate student pursuing her M.Tech in Computer Science and Engineering (Distributed Computing Systems specialization) in Pondicherry Engineering College, Puducherry , India.

A Survey On Large Scale Schema and Ontology Matching Techniques

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

A Survey On Large Scale Schema and Ontology Matching Techniques

Uploaded by

Copyright:

Available Formats

JOURNAL OF COMPUTING, VOLUME 3, ISSUE 10, OCTOBER 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING WWW.JOURNALOFCOMPUTING.

A Survey on Large Scale Schema and Ontology Matching Techniques

2011 Journal of Computing Press, NY, USA, ISSN 2151-9617 http://sites.google.com/site/journalofcomputing/

2 APPROACHES USED FOR LARGE SCALE MATCHING TECHNIQUE

2.1 Early Pruning Strategy Matching Technique

2.2 Partition Based Matching Technique

2.4 Other Matching Tools

2.3 Parallel Matching Technique

TABLE 1 Comparison of Matching Technique

COMPARISON OF DIFFERENT LARGE SCALE MATCHING TECHNIQUE

You might also like