Inverting Schema Mappings

RONALD FAGIN
IBM Almaden Research Center
A schema mapping is a specification that describes how data structured under one schema (the source schema) is to be transformed into data structured under a different schema (the target schema). Although the notion of an inverse of a schema mapping is important, the exact definition of an inverse mapping is somewhat elusive. This is because a schema mapping may associate many target instances with each source instance, and many source instances with each target instance. Based on the notion that the composition of a mapping and its inverse is the identity, we give a formal definition for what it means for a schema mapping $\mathcal{M}'$ to be an inverse of a schema mapping $\mathcal{M}$ for a class $\mathcal{S}$ of source instances. We call such an inverse an $\mathcal{S}$-inverse. A particular case of interest arises when $\mathcal{S}$ is the class of all source instances, in which case an $\mathcal{S}$-inverse is a global inverse. We focus on the important and practical case of schema mappings specified by source-to-target tuple-generating dependencies, and uncover a rich theory. When $\mathcal{S}$ is specified by a set of dependencies with a finite chase, we show how to construct an $\mathcal{S}$-inverse when one exists. In particular, we show how to construct a global inverse when one exists. Given $\mathcal{M}$ and $\mathcal{M}'$, we show how to define the largest class $\mathcal{S}$ such that $\mathcal{M}'$ is an $\mathcal{S}$-inverse of $\mathcal{M}$.

Categories and Subject Descriptors: H.2.5 [Database Management]: Heterogeneous Databases—Data translation; H.2.4 [Database Management]: Systems—Relational databases

General Terms: Algorithms, Theory

Additional Key Words and Phrases: Data exchange, inverse, schema mapping, data integration, chase, computational complexity, dependencies, metadata model management, second-order logic
 ACM Reference Format:
Fagin, R. 2007. Inverting schema mappings. ACM Trans. Datab. Syst. 32, 4, Article 25 (November 2007), 53 pages. DOI = 10.1145/1292609.1292615 http://doi.acm.org/10.1145/1292609.1292615
This is an expanded version of Fagin [2006].
Author’s address: IBM Almaden Research Center, 650 Harry Road, San Jose, CA 95120; email: fagin@almaden.ibm.com.
Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or direct commercial advantage and that copies show this notice on the first page or initial screen of a display along with the full citation. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, to redistribute to lists, or to use any component of this work in other works requires prior specific permission and/or a fee. Permissions may be requested from Publications Dept., ACM, Inc., 2 Penn Plaza, Suite 701, New York, NY 10121-0701 USA, fax +1 (212) 869-0481, or permissions@acm.org.
© 2007 ACM 0362-5915/2007/11-ART25 $5.00 DOI 10.1145/1292609.1292615 http://doi.acm.org/10.1145/1292609.1292615
ACM Transactions on Database Systems, Vol. 32, No. 4, Article 25, Publication date: November 2007.

1. INTRODUCTION

Data exchange is the problem of materializing an instance that adheres to a target schema, given an instance of a source schema and a schema mapping that specifies the relationship between the source and the target. This is a very old problem [Shu et al. 1977] that arises in many tasks where data must be transferred between independent applications that do not have the same data format.

Because of the extensive use of schema mappings, it has become important to develop a framework for managing schema mappings and other metadata, and operators for manipulating them. Bernstein [2003] has introduced such a framework, called model management. Melnik et al. [2005] have developed a semantics for model-management operators that allows applying the operators to executable mappings. One important schema mapping operator, at least in principle, is the inverse operator. What do we mean by an inverse of a schema mapping? This is a delicate question, since in spite of the traditional use of the name mapping, a schema mapping is not simply a function that maps an instance of the source schema to an instance of the target schema. Instead, for each source instance, the schema mapping may associate many target instances. Furthermore, for each target instance, there may be many corresponding source instances.

As in Fagin et al. [2005a, 2005b, 2005c], we study the relational case where a schema is a sequence of distinct relation symbols. A schema mapping is a triple $\mathcal{M} = (\mathbf{S}, \mathbf{T}, \Sigma)$, where $\mathbf{S}$ (the source schema) and $\mathbf{T}$ (the target schema) are sequences of distinct relation symbols with no relation symbols in common and $\Sigma$ is a set of formulas of some logical formalism over $\langle \mathbf{S}, \mathbf{T} \rangle$. We say that $\Sigma$ specifies the schema mapping $\mathcal{M}$. As in Fagin et al. [2005a, 2005b, 2005c], our main focus is on the important and practical case of schema mappings where $\Sigma$ is a finite set of source-to-target tuple-generating dependencies (which we shall call s-t tgds or simply tgds). These are formulas of the form $\forall \mathbf{x}\,(\varphi(\mathbf{x}) \rightarrow \exists \mathbf{y}\, \psi(\mathbf{x}, \mathbf{y}))$, where $\varphi(\mathbf{x})$ is a conjunction of atoms¹ over $\mathbf{S}$, and where $\psi(\mathbf{x}, \mathbf{y})$ is a conjunction of atoms over $\mathbf{T}$.²
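To make the shape of an s-t tgd concrete, here is a minimal Python sketch of one possible encoding, together with a check of whether a pair of instances satisfies a tgd. The encoding (dictionaries of tuples, atoms as name/variable pairs), the function names `assignments` and `satisfies`, and the sample relations `P` and `R` are all assumptions made for illustration; none of them comes from the article.

```python
# Illustrative encoding (an assumption, not the article's notation):
#   instance: dict mapping a relation name to a set of tuples
#   atom:     (relation_name, tuple_of_variable_names)
#   tgd:      (premise_atoms, conclusion_atoms)
# Variables that occur only in the conclusion play the role of the
# existentially quantified variables.


def assignments(atoms, instance, binding=None):
    """Yield each assignment extending `binding` that makes every atom a fact."""
    binding = dict(binding or {})
    if not atoms:
        yield binding
        return
    (relation, variables), rest = atoms[0], atoms[1:]
    for fact in instance.get(relation, set()):
        if len(fact) != len(variables):
            continue
        extended = dict(binding)
        # setdefault binds an unbound variable; a bound one must agree with the fact.
        if all(extended.setdefault(v, value) == value
               for v, value in zip(variables, fact)):
            yield from assignments(rest, instance, extended)


def satisfies(source, target, tgd):
    """True if every premise match in `source` has a conclusion match in `target`."""
    premise, conclusion = tgd
    return all(
        any(True for _ in assignments(conclusion, target, binding))
        for binding in assignments(premise, source)
    )


# Sample tgd: P(x, y) -> exists z R(x, y, z)
tgd = ([("P", ("x", "y"))], [("R", ("x", "y", "z"))])
source = {"P": {(1, 2)}}
print(satisfies(source, {"R": {(1, 2, "n1")}}, tgd))  # True
print(satisfies(source, {"R": {(1, 5, "n1")}}, tgd))  # False
```

Many different targets satisfy the same tgd for a given source (any superset of the first target above would also do), which is one way to see that a schema mapping is a relation between instances and not a function.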
They have been used to formalize data exchange [Fagin et al. 2005a]. They have also been used in data integration scenarios under the name of GLAV (global-and-local-as-view) assertions [Lenzerini 2002]. Note that tgds do not contain equality, or any other “built-in” relation symbols. When we consider egds (equality-generating dependencies), we shall of course treat equality as a built-in relation symbol that appears in the conclusion. Later (in Section 15), we shall extend the language of tgds so that the premise may include inequalities, and also a relation symbol Constant that represents constants.

Intuitively, we would expect invertibility of a schema mapping to correspond to “no loss of information.” As an example, assume that the source schema has only the binary relation symbol $P$, and the target schema has only the unary relation symbol $Q$. Consider the projection schema mapping that is specified by the s-t tgd $P(x, y) \rightarrow Q(x)$.³
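A minimal Python sketch of this projection mapping follows. It assumes instances are encoded as dictionaries from relation names to sets of tuples, and it computes only the smallest target instance that satisfies the tgd; the function name `project` and the two sample source instances are illustrative assumptions, not part of the article.

```python
def project(source):
    """Smallest target satisfying P(x, y) -> Q(x): keep the first column of P."""
    return {"Q": {(x,) for (x, _y) in source.get("P", set())}}


# Two different source instances (hypothetical sample data).
first = {"P": {(1, 2)}}
second = {"P": {(1, 3)}}

# Both sources produce the same target, so the second column of P
# cannot be recovered from Q.
print(project(first) == project(second))  # True
```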
It is clear that information is lost by this mapping, and, indeed, the projection schema mapping turns out not to have an inverse. Now assume that the source schema has only the binary relation symbol $P$, and the target schema has only the ternary relation symbol $R$. Consider the schema mapping that is specified by the s-t tgd $P(x, y) \rightarrow \exists z\, R(x, y, z)$. It is clear that no information is lost by this mapping, and indeed, this schema mapping turns out to have an inverse. One such inverse is specified by the tgd that results by “reversing the arrows,” namely, $R(x, y, z) \rightarrow P(x, y)$. However, it turns out that “reversing the arrows” does not always produce an inverse, even when one exists.

¹ An atom over $\mathbf{S}$ is a formula of the form $P(v_1, \ldots, v_m)$, where $P$ is a relation symbol of $\mathbf{S}$, and $v_1, \ldots, v_m$ are variables; similarly, we define an atom over $\mathbf{T}$.
² There is also a safety condition, which says that every variable in $\mathbf{x}$ appears in $\varphi$. However, not all of the variables in $\mathbf{x}$ need to appear in $\psi$.
³ We will often drop the universal quantifiers in front of a tgd, and implicitly assume such quantification. However, we will write down all existential quantifiers.

There are other flavors of “schema mappings” that have been studied in the literature, such as view definitions, where there is a unique target instance associated with each source instance. In such cases, a schema mapping is a function in the classical sense, and so it is quite clear and unambiguous as to what an inverse mapping is. An example of such work is Hull’s [1986] seminal research on information capacity of relational database schemas. Although our schema mappings are not actually functions, they have the advantage of being simpler and more flexible. LAV (local-as-view) mappings, which have been widely used in data integration, are special cases of schema mappings specified by s-t tgds, where we simply add the restriction that the premise of each tgd must be a single atom rather than a conjunction of atoms.

Let us now consider how to define the inverse in our context, where schema mappings are not actually functions. Let us associate with the schema mapping
$\mathcal{M}_{12} = (\mathbf{S}_1, \mathbf{S}_2, \Sigma_{12})$ the set $S_{12}$ of ordered pairs $\langle I, J \rangle$ such that $I$ is a source instance, $J$ is a target instance, and the pair $\langle I, J \rangle$ satisfies $\Sigma_{12}$ (written $\langle I, J \rangle \models \Sigma_{12}$). Perhaps the most natural definition of the inverse of the schema mapping $\mathcal{M}_{12}$ would be a schema mapping $\mathcal{M}_{21}$ that is associated with the set $S_{21} = \{\langle J, I \rangle : \langle I, J \rangle \in S_{12}\}$. This reflects the standard algebraic definition of an inverse, and is the definition that Melnik [2004] and Melnik et al. [2005] gave for the inverse. In those articles, this definition was intended for a generic model management context, where mappings can be defined in a variety of ways, including as view definitions, relational algebra expressions, etc. However, this definition does not make sense in our context. This is because $S_{12}$, by being associated with a schema mapping specified by s-t tgds, is automatically “closed down on the left and closed up on the right.” This means that if $\langle I, J \rangle \in S_{12}$ and if $I' \subseteq I$ (that is, $I'$ is a subinstance of $I$) and $J \subseteq J'$, then $\langle I', J' \rangle \in S_{12}$. However, instead of being closed down on the left and closed up on the right, $S_{21}$ is closed up on the left and closed down on the right. This is inconsistent with a schema mapping that is specified by a set of s-t tgds, which is the case we focus on in this article. In fact, the “language of inverse” (that is, the language needed to specify inverses for schema mappings specified by s-t tgds) turns out, as we shall discuss in Section 15, to be given by a generalization of s-t tgds that are also closed down on the left and closed up on the right.

Our notion of an inverse of a schema mapping is based on another algebraic property of inverses, that the composition of a function with its inverse is the identity mapping. In our context, the identity mapping is specified by tgds that “copy” the source instance to the target instance. Our definition of inverse says that the schema mapping $\mathcal{M}_{21}$ is an inverse of the schema mapping $\mathcal{M}_{12}$ for the class $\mathcal{S}$ of source instances if the schema mapping specified by their composition is equivalent on $\mathcal{S}$ to the identity mapping. We refer then to $\mathcal{M}_{21}$ as an $\mathcal{S}$-inverse of $\mathcal{M}_{12}$. When $\mathcal{S}$ is the class of all source instances, then $\mathcal{M}_{21}$ is said to be a
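
The intuition that a mapping followed by an inverse should behave like a copy of the source can be sketched in Python for the earlier example $P(x, y) \rightarrow \exists z\, R(x, y, z)$ and its arrow-reversed tgd $R(x, y, z) \rightarrow P(x, y)$. This is only an informal illustration under stated assumptions: it follows one particular target instance (built by inventing a fresh placeholder value for each existential variable) and does not implement the article's formal definition, which concerns the composition of the two schema mappings. The function names `forward` and `backward`, the placeholder naming scheme, and the sample data are all hypothetical.

```python
from itertools import count


def forward(source):
    """Apply P(x, y) -> exists z R(x, y, z), using one fresh placeholder per P-fact."""
    fresh = count(1)
    facts = sorted(source.get("P", set()))
    return {"R": {(x, y, f"N{next(fresh)}") for (x, y) in facts}}


def backward(target):
    """Apply the reversed tgd R(x, y, z) -> P(x, y): drop the third column."""
    return {"P": {(x, y) for (x, y, _z) in target.get("R", set())}}


source = {"P": {(1, 2), (3, 4)}}
round_trip = backward(forward(source))
print(round_trip == source)  # True
```

For this sample instance the round trip reproduces the source exactly, which matches the claim that no information is lost by this mapping.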