Published by: Deep on Sep 19, 2009
Copyright:Attribution Non-commercial

Integrating and Exchanging XML Data using Ontologies
Huiyong Xiao and Isabel F. Cruz
Department of Computer Science
University of Illinois at Chicago
{hxiao | ifc}@cs.uic.edu
Abstract. While providing a uniform syntax and a semistructured data model, XML does not express semantics but only structure such as nesting information. In this paper, we consider the problem of data integration and interoperation of heterogeneous XML sources and use an ontology-based framework to address this problem at a semantic level. Ontologies are extensively used for domain knowledge representation, by virtue of their conceptualization of the domain, which carries explicit semantics. In our approach, the global ontology is expressed in RDF Schema (RDFS) and constructed using the global-as-view approach by merging individual local ontologies, which represent XML source schemas. We provide a formal model for the mappings between XML schemas and local RDFS ontologies and those between local ontologies and the global RDFS ontology. We consider two cases of query processing, specifically for data integration and for data interoperation. In the first case, the user poses an RDF query on the global ontology, which is answered using all the mapped XML sources. In the second case, a query is posed on a single source and then is mapped to the XML sources that are connected to that source. For each case, we discuss the problem of query containment and present an equivalent query rewriting algorithm for queries expressed in two languages: conjunctive RDQL and conjunctive XQuery.
1 Introduction
1.1 Problem description
Data integration is the problem of combining data residing at different sources, and providing the user with a unified view of these data [25]. It is relevant to a number of applications including data warehousing, enterprise information integration, geographic information systems, and e-commerce applications. Data integration systems are usually characterized by an architecture based on a global schema, which provides a reconciled and integrated view of the underlying sources. These systems are called central data integration systems, and a large number of such systems have been proposed [3,5,11,14,24,27,30,34,36].

A preliminary version of this paper was presented at the 8th International Database Engineering & Applications Symposium (Isabel F. Cruz, Huiyong Xiao, Feihong Hsu: An Ontology-Based Framework for XML Semantic Integration. IDEAS 2004: 217-226). This research was partially supported by the National Science Foundation under Awards ITR IIS-0326284 and IIS-0513553.

There are two key issues in central data integration, namely system modeling and query processing. For modeling the relation between the sources and the global schema, two basic approaches have been proposed [10,25,36]. The first approach, called Global-as-View (GaV), expresses the global schema in terms of the data sources. The second approach, called Local-as-View (LaV), requires the global schema to be specified independently from the sources, and the relationships between the global schema and the sources are established by defining every source as a view over the global schema.

Query processing in central data integration may require a query reformulation step: the query over the global schema has to be reformulated in terms of a set of queries over the sources. In the GaV approach, every entity in the global schema is associated with a view over the source local schema; therefore, query processing in this case uses a simple "unfolding" strategy [25]. In contrast, query processing in LaV can be complex, since the local sources may contain incomplete information. In this sense, query processing in LaV, called view-based query processing [1,12,18], is similar to query answering with incomplete information [37]. It can also be the case that two data sources communicate in a peer-to-peer (P2P) way either through the global schema or directly. Data exchange or query processing may occur in this case, which requires data translation or query rewriting when heterogeneities are present between the communicating sources [16,23,27,30,32].

The heterogeneities between distributed data sources can be classified as
syntactic, schematic, and semantic heterogeneities [6]. Syntactic heterogeneity is caused by the use of different models or languages (e.g., relational and XML). Schematic heterogeneity results from the different data organizations (e.g., aggregation or generalization hierarchies). Semantic heterogeneity is caused by different meanings or interpretations of data. All these heterogeneities have to be resolved to achieve the goal of integration or interoperation. In this paper, we consider the semantic integration of XML data and data exchange between heterogeneous XML sources, using ontologies.

XML documents that represent data with similar semantics may conform to different schemas. Therefore, a user must construct queries in accordance with the structures of the different XML documents, even to retrieve fragments of information that have the same meaning. This fact makes the formulation of queries over heterogeneous XML sources a nontrivial burden on the user. Furthermore, this shortcoming of XML impedes the interoperation between XML sources, since the reformulation of XML queries from one source to another has to eliminate the structural differences of the queries while preserving the same semantics. Let us illustrate this problem using a running example.
Example 1. Figure 1 shows two XML schemas (S1 and S2) with their instances (i.e., XML documents D1 and D2), which are represented as trees. It is obvious that S1 and S2 both represent a many-to-many relationship between two concepts: book and author (equivalently denoted article and writer in S2). However, structurally speaking, they are different: S1, which is a book-centric schema, has the author element nested under the book element, whereas S2, which is an author-centric schema, has the article element nested under the writer element. Suppose our query target is "Find all the authors of the publication b2." The XML path expressions that are used to define the search patterns in the two schema trees can be respectively written as /books/book[booktitle.text()="b2"]/author/name and /writers/writer[article/title.text()="b2"]/fullname, where the contents in the square brackets specify the constraints for the search patterns. We notice that although the above two search patterns refer to semantically equivalent concepts, they follow two distinct XML paths.

Fig. 1. Two XML sources with structural heterogeneities. [Figure not reproduced: XML schema S1 (books > book > booktitle, author > name) with XML document D1 "books.xml", and XML schema S2 (writers > writer > fullname, article > title) with XML document D2 "writers.xml", all drawn as trees.]
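The structural mismatch in Example 1 can be made concrete with a small Python sketch. It assumes two tiny, hypothetical documents that follow the book-centric and author-centric layouts (they do not reproduce the instances D1 and D2 of Figure 1), and the function names are illustrative only. The same question, "who wrote b2?", needs a different traversal for each layout.

```python
import xml.etree.ElementTree as ET

# Hypothetical book-centric instance (layout of S1): authors nested under books.
BOOKS_XML = """<books>
  <book><booktitle>b1</booktitle><author><name>a1</name></author></book>
  <book><booktitle>b2</booktitle>
    <author><name>a1</name></author>
    <author><name>a2</name></author>
  </book>
</books>"""

# Hypothetical author-centric instance (layout of S2): articles nested under writers.
WRITERS_XML = """<writers>
  <writer><fullname>a1</fullname>
    <article><title>b1</title></article>
    <article><title>b2</title></article>
  </writer>
  <writer><fullname>a2</fullname><article><title>b2</title></article></writer>
</writers>"""


def authors_book_centric(xml_text, title):
    """Walk books -> book -> author -> name, filtering on booktitle."""
    root = ET.fromstring(xml_text)
    names = []
    for book in root.findall("book"):
        if book.findtext("booktitle") == title:
            names.extend(n.text for n in book.findall("author/name"))
    return sorted(names)


def authors_writer_centric(xml_text, title):
    """Walk writers -> writer -> fullname, filtering on article/title."""
    root = ET.fromstring(xml_text)
    names = []
    for writer in root.findall("writer"):
        titles = [t.text for t in writer.findall("article/title")]
        if title in titles:
            names.append(writer.findtext("fullname"))
    return sorted(names)
```

Under these assumed documents both functions return the same set of authors for "b2", although neither traversal can be reused on the other source.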
1.2 Semantic integration of XML documents
The structural diversity of conceptually equivalent XML schemas leads to the fact that XML queries over different schemas may represent the same semantics even though they are formulated using two different alphabets and structures. In comparison, the schema languages used for conceptual modeling are structurally flat so that the user can formulate a determined conceptual query without worrying about the structure of the source. RDF Schema (RDFS) [26], DAML+OIL, and OWL are examples of languages used to create ontologies, which represent a shared, formal conceptualization of the domain of knowledge [17]. There are currently many attempts to use conceptual schemas (or ontologies) [3,4,16] or conceptual queries [14,15] to overcome the problem of structural heterogeneities among XML sources.

In this paper, we propose an ontology-based approach for the integration of XML sources. We use the GaV approach to model the mappings between the source schemas and the global ontology, which is, therefore, an integrated view of the source schemas. The global ontology is expressed in terms of RDFS, which is at the core of several ontology languages (e.g., OWL and DAML+OIL). In order to facilitate the mappings between the XML source schemas and the global RDFS ontology, their syntactic disparity needs to be reconciled. To this end, we first transform the heterogeneous XML sources into local RDFS ontologies (defined using the RDFS space [9]), which are then merged into the global ontology. This transformation process encodes the mapping information between each concept in the local ontology and the corresponding element in the XML source. The ontology merging process can be semi-automatically performed (e.g., by using the PROMPT algorithm [29]). In addition to the global ontology, the merging process also produces a mapping table, which contains the mapping information between concepts in the global ontology and concepts in the local ontologies.
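As a rough illustration of how the two levels of mapping could be chained, the Python sketch below unfolds a global concept into one XML path per source. It is not the formal model of this paper: the concept names (Author, Publication), the "S1:"/"S2:" prefixes, the dictionary layout, and the function unfold are all assumptions made for the sake of the example.

```python
# Hypothetical mapping table produced by ontology merging:
# global concept -> corresponding concepts in the local ontologies.
MAPPING_TABLE = {
    "Author": ["S1:author", "S2:writer"],
    "Publication": ["S1:book", "S2:article"],
}

# Hypothetical mapping recorded when each XML schema is transformed
# into its local ontology: local concept -> XML path in the source.
LOCAL_TO_XML = {
    "S1:author": "/books/book/author",
    "S1:book": "/books/book",
    "S2:writer": "/writers/writer",
    "S2:article": "/writers/writer/article",
}


def unfold(global_concept):
    """Return the source XML paths reachable from a global concept.

    An unknown global concept unfolds to an empty list.
    """
    local_concepts = MAPPING_TABLE.get(global_concept, [])
    return [LOCAL_TO_XML[concept] for concept in local_concepts]
```

For instance, under these assumptions unfold("Author") yields the path of the author element in the book-centric source and the path of the writer element in the author-centric source, which is the kind of lookup a GaV-style unfolding step relies on.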
