A Review on Ontology-Driven Query-Centric Approach for INDUS Framework


Published by: ijcsis on Sep 05, 2010
Copyright:Attribution Non-commercial

Abstract- This paper motivates and describes the data integration component of INDUS (Intelligent Data Understanding System), an environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. INDUS employs ontologies and inter-ontology mappings to enable a user or an application to view a collection of physically distributed, autonomous, semantically heterogeneous data sources, regardless of location, internal structure, and query interfaces, as though they were a collection of tables structured according to an ontology supplied by the user. This allows INDUS to answer user queries against distributed, semantically heterogeneous data sources without the need for a centralized data warehouse or a common global ontology. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which it is desirable for users to be able to access, flexibly interpret, and analyze data from diverse sources from different perspectives in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies. More than 13 systems are studied, and INDUS is found to be the preferred system for information extraction, integration, and knowledge acquisition from heterogeneous, distributed, and autonomous information sources. PROSITE, MEROPS, SWISSPROT, and MEME are examples of data sources used by computational biologists.
Keywords- INDUS (Intelligent Data Understanding System), Query-centric approach, PROSITE, MEROPS, SWISSPROT, MEME, MIPS2GO, EC2GO.
I. INTRODUCTION
INDUS is a modular, extensible, platform-independent environment for information integration and data-driven knowledge acquisition from heterogeneous, distributed, autonomous information sources. INDUS, when equipped with machine learning algorithms for ontology-guided knowledge acquisition, can accelerate the pace of discovery in emerging data-rich domains such as biological sciences, atmospheric sciences, economics, defense, and social sciences by enabling scientists and decision makers to rapidly and flexibly explore and analyze vast amounts of data from disparate sources. IBM provides a family of data management products that enable a systematic approach to solving the information integration challenges that businesses face today.
 
Data integration systems [2] attempt to provide users with seamless and flexible access to information from multiple autonomous, distributed, and heterogeneous data sources through a unified query interface. Ideally, a data integration system should allow users to specify what information is needed without having to provide detailed instructions on how or from where to obtain it. A data integration system must provide mechanisms for the following: communication and interaction with each data source as needed; specification of a query, expressed in terms of a user-specified vocabulary, across multiple heterogeneous and autonomous data sources; specification of mappings between the user ontology and the data-source-specific ontologies; transformation of a query into a plan for extracting the needed information by interacting with the relevant data sources; and integration and presentation of the results in terms of a vocabulary known to the user. There are two broad classes of approaches to data integration: data warehousing and database federation [4].

Figure 1 Data Integration Layer

L. Senthilvadivu, Dept. of Software Technology, SSM College of Engineering, Komarapalayam, Tamilnadu, India, lsvadivu.ssm@gmail.com
Dr. K. Duraiswamy, Dean (Academic), K.S.R College of Technology, Tiruchengode, Tamilnadu, India, drkduraiswamy@yahoo.co.in
(IJCSIS) International Journal of Computer Science and Information Security, Vol. 8, No. 5, August 2010, http://sites.google.com/site/ijcsis/, ISSN 1947-5500

INDUS allows users to:
 
 
 
View the set of data sources as if they were located locally and used a homogeneous interface.

Interact with data sources (i.e., post queries) through a provided interface that uses the query capabilities offered by each data source to answer queries.

Define their own language for defining queries and receiving answers.

Define new concepts based on other concepts by applying a set of well-defined compositional operations.
 
Use different definitions for the same concept, facilitating the exploration of new paradigms that explain the world.

For information integration and extraction from heterogeneous, distributed, multi-relational information sources, this has implications for how new basic concepts are incorporated into the system. Consider a system in which the query language is restricted to set union operations applied over EDI predicates without built-in predicates. Assuming a query-centric case, Figure 2 shows a family of queries based on a set of basic concepts $q_{ij}$. Let $I(Q)$ and $I'(Q)$ be the sets of instances satisfying $Q$ before and after adding $c$ to the system, respectively. Assume that $I(c) \not\subseteq I(Q)$. Then, for every $Q$ such that $c$ is added to $G(Q)$, $I(Q) \subset I'(Q)$. In other words, only those queries to which $c$ is explicitly added may return a different answer.

Figure 2 Query-Centric Approach Examples

Data sources are autonomous, distributed, and heterogeneous in structure and content; the complexity associated with accessing the data and answering queries must be hidden from the users; and users need to be able to view disparate data sources from their own point of view.
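The union-only argument above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the concept names, the instance sets, the dictionary `G` (standing in for the set of basic concepts each query unions), and the helper `evaluate` are hypothetical and are not taken from INDUS.

```python
# Hypothetical basic concepts and their instance sets (illustrative data only).
instances = {
    "q11": {"a", "b"},
    "q12": {"c"},
    "q21": {"d"},
}

# G maps each query to the basic concepts it unions (assumed representation).
G = {
    "Q1": {"q11", "q12"},
    "Q2": {"q21"},
}


def evaluate(query, groups, concept_instances):
    """Return the instances of a query: the union over its basic concepts."""
    result = set()
    for concept in groups[query]:
        result |= concept_instances[concept]
    return result


# Answers before the new concept exists.
before = {q: evaluate(q, G, instances) for q in G}

# Add a new basic concept "c" and include it explicitly in Q1 only.
instances["c"] = {"b", "e"}
G_after = {"Q1": G["Q1"] | {"c"}, "Q2": set(G["Q2"])}
after = {q: evaluate(q, G_after, instances) for q in G_after}

# Only queries to which "c" was explicitly added can change.
changed = {q for q in G if before[q] != after[q]}
print(changed)
```

Because the only operation is set union, adding a concept can only grow the answer of a query that names it; queries that do not name it are evaluated over exactly the same sets as before.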
 
INDUS consists of three principal layers. In the lower part, the set of data sources accessible by INDUS is shown. In the physical layer, a set of instantiators enables INDUS to communicate with the data sources. The ontological layer offers a repository where ontologies are stored. Using this repository, syntactic and semantic heterogeneities may be resolved. In addition, another relational database system is used to implement the user workspace, a private area where users materialize their queries. The user interface layer enables users to interact with the system.

Figure 3 INDUS Schematic Diagram

INDUS is based on five modules. The graphical user interface enables users to interact with INDUS. This module is developed under Oracle Developer 6i. The common global ontology area, implemented through a relational database system, stores all information about ontologies, concepts, and queries. Any information stored in this repository is shared among all users. The private user workspace area is also implemented through a relational database system. Each INDUS user has a private area where queries are materialized.

Figure 4 INDUS Module Diagram

Figure 5 INDUS
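As a rough illustration of how the physical layer, the ontological layer, and the user workspace might fit together, the following Python sketch models instantiators as plain functions and ontology mappings as dictionaries. The source names, field names, and the `materialize` helper are all hypothetical; the paper does not describe the INDUS interfaces at this level of detail.

```python
from typing import Callable, Dict, List

Row = Dict[str, str]


# Physical layer (assumed shape): one instantiator per data source, each
# returning rows in that source's own vocabulary. The data is made up.
def source_a() -> List[Row]:
    return [{"prot_id": "P1", "fam": "kinase"}]


def source_b() -> List[Row]:
    return [{"accession": "P2", "family_name": "protease"}]


INSTANTIATORS: Dict[str, Callable[[], List[Row]]] = {
    "A": source_a,
    "B": source_b,
}

# Ontological layer (assumed shape): per-source mappings from source terms
# to the terms of a user-supplied ontology.
MAPPINGS: Dict[str, Dict[str, str]] = {
    "A": {"prot_id": "protein", "fam": "family"},
    "B": {"accession": "protein", "family_name": "family"},
}


def materialize(user_columns: List[str]) -> List[Row]:
    """Build a workspace table expressed in the user's vocabulary."""
    table: List[Row] = []
    for name, fetch in INSTANTIATORS.items():
        mapping = MAPPINGS[name]
        for row in fetch():
            # Translate each source field into the user's term, if mapped.
            translated = {mapping[k]: v for k, v in row.items() if k in mapping}
            table.append({col: translated.get(col, "") for col in user_columns})
    return table


# User workspace: the materialized result of one query.
workspace = materialize(["protein", "family"])
print(workspace)
```

Keeping the mappings as data that is separate from `materialize` mirrors the idea that a user can swap in a different ontology without changing the integration procedure.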
 
 
The rest of the paper is organized as follows: Section II briefly introduces related work by various authors, and Section III concludes and discusses future work on INDUS.

II. RELATED WORK
Sudarshan Chawathe et al. state that the main goal of the Tsimmis project is to develop tools that facilitate the rapid integration of heterogeneous information sources that may include both structured and unstructured data. Their paper gives an overview of the project, describing components that extract properties from unstructured objects, that translate information into a common object model, that combine information from several sources, that allow browsing of information, and that manage constraints across heterogeneous sites. Tsimmis is a joint project between Stanford and the IBM Almaden Research Center. In summary, the Tsimmis project is exploring technology for integrating heterogeneous information sources. Current efforts are focusing on translator and mediator generators, which should significantly reduce the effort required to access new sources and integrate information in different ways. The TSIMMIS architecture is based on the concept of wrappers and mediators. Each wrapper knows how to deal with a particular data source and is able to receive a query in a common language, the Object Exchange Model (OEM), and to transform it into the particular language understood by the data source. Both INDUS and TSIMMIS use a query-centric approach to data integration. However, unlike TSIMMIS, INDUS maintains a clear separation between the ontologies used for data integration (which are supplied by users) and the procedures that use ontologies to perform data integration. This allows INDUS users to replace the ontologies used for data integration ‘on the fly’. This makes INDUS attractive for data integration tasks that arise in exploratory data analysis, wherein scientists might want to experiment with alternative ontologies.

Pegasus [17] is a heterogeneous multi-database management system that responds to the need for effective access and management of shared data across a wide range of applications.
Pegasus provides facilities for multi-database applications to access and manipulate multiple autonomous, heterogeneous, distributed object-oriented, relational, and other information systems through a uniform interface. It is a complete data management system that integrates various native and local databases. Pegasus takes advantage of object-oriented data modeling and programming capabilities. It uses both type and function abstractions to deal with mapping and integration problems. Function implementations can be defined in an underlying database language or a programming language. Data abstraction and encapsulation facilities in the Pegasus object model provide an extensible framework for dealing with various kinds of heterogeneities in traditional database systems and nontraditional data sources.

UniSQL/M [18], [19], SIMS [20], IRO-DB [3], and other projects support mediator capabilities through a unified global schema [21], which integrates each remote database and resolves conflicts among these remote databases. Although these projects made substantial contributions in resolving conflicts among different schemas and data models, the global schema approach suffers from the fragile mediator problem: the unified global schema must be substantially modified as new sources are integrated. For example, UniSQL/M [18], [19] is a commercial multi-database product; virtual classes are created in the unified schema to resolve and “homogenize” heterogeneous entities from relational and object-oriented schemas. Instances of the local schema are imported to populate the virtual classes of the integrated schema, and this involves creating new instances. The first step in integration is defining the attributes of a virtual class, and the second step is a set of queries to populate this class. They provide a vertical join operator, similar to a tuple constructor, and a horizontal join, which is equivalent to performing a union of tuples.
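The two operators can be pictured with a short Python sketch. This is not UniSQL/M syntax or its actual semantics; the functions `vertical_join` and `horizontal_join`, the record layout, and the sample data are hypothetical and only illustrate the description above: a vertical join builds a wider tuple from matching records, and a horizontal join takes a union of tuples.

```python
def vertical_join(left, right, key):
    """Combine attributes of records that share `key` into one wider record."""
    right_by_key = {r[key]: r for r in right}
    return [
        {**rec, **right_by_key[rec[key]]}
        for rec in left
        if rec[key] in right_by_key
    ]


def horizontal_join(*record_sets):
    """Union records from several sources, dropping exact duplicates."""
    seen, out = set(), []
    for records in record_sets:
        for rec in records:
            frozen = tuple(sorted(rec.items()))
            if frozen not in seen:
                seen.add(frozen)
                out.append(rec)
    return out


# Hypothetical local schemas: two relational tables and one object store.
rel_employees = [{"id": 1, "name": "Ada"}]
rel_salaries = [{"id": 1, "salary": 100}]
oo_employees = [{"id": 2, "name": "Bob", "salary": 90}]

# Step 1 fixes the virtual class attributes (id, name, salary);
# step 2 populates the class with queries over the local schemas.
virtual_employee = horizontal_join(
    vertical_join(rel_employees, rel_salaries, "id"),
    oo_employees,
)
print(virtual_employee)
```

Note that the populated virtual class holds new records copied from the sources, which matches the remark that importing local instances involves creating new instances.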
The major focus of their research is conflicts due to generalization; e.g., an entity in one schema can be included in, i.e., become a subclass of, an entity in the global schema, or a class and its subclasses may be included by an entity in the global schema. Attribute inclusion conflicts between two entities can be solved by creating a subclass relationship among the entities. Other problems that are studied are aggregation and composition conflicts. Alternatively, the capability of a mediator to resolve conflicts is supported by the use of higher-order query languages or meta-models [22], [23], [24]. Mediators are also implemented through the use of mapping knowledge bases that capture the knowledge required to resolve conflicts among the local schemas, and mapping or transformation algorithms that support query mediation and interoperation among relational and object databases.

Jaime A. Reinoso Castillo motivates and describes the data integration component of the INDUS (Intelligent Data Understanding System) environment for data-driven information extraction and integration from heterogeneous, distributed, autonomous information sources. The design of INDUS is motivated by the requirements of applications such as scientific discovery, in which it is desirable for users to be able to access, flexibly interpret, and analyze data from diverse sources from different perspectives in different contexts. INDUS implements a federated, query-centric approach to data integration using user-specified ontologies. The development of high-throughput data acquisition in a number of domains (e.g., biological sciences, space sciences, commerce), along with advances in digital storage, computing, and communication technologies, has resulted in unprecedented opportunities in data-driven knowledge acquisition and decision making. The effective use of increasing amounts of data from disparate information sources presents several challenges in practice.
This paper describes the data integration component of INDUS (Intelligent Data Understanding System), a modular, extensible, platform-independent environment for information integration and data-driven knowledge acquisition from heterogeneous, distributed, autonomous information sources. INDUS when equipped with