
Mapping between Relational Database Schema and OWL Ontology for Deep Annotation

Zhuoming Xu
College of Computer & Information Engineering, Hohai University, Nanjing 210098, China
zmxu@hhu.edu.cn

Shichao Zhang
Faculty of Information Technology, University of Technology, Sydney, Australia
zhangsc@it.uts.edu.au

Yisheng Dong
School of Computer Science & Engineering, Southeast University, Nanjing 210096, China
ysdong@seu.edu.cn

Abstract

Creating mappings between a database schema and a Web ontology is a prerequisite for generating ontological annotations for dynamic Web page contents extracted from the database. In this paper, a practical approach to creating mappings between a relational database schema and an OWL ontology is presented. The approach automatically constructs the mappings by following a set of predefined heuristic rules based on the conceptual correspondences between the schema and the ontology. This automatic mapping is implemented as the core functionality of a prototype tool, D2OMapper, which also provides auxiliary functions that help the user manually create and maintain the mappings. Case studies show that the proposed approach is effective and that the produced mappings can be applied to the semantic annotation of database-based, dynamic Web pages.

1. Introduction

The Semantic Web [1] aims to create machine-processable Web content with well-defined formal semantics in order to provide better machine assistance for human users in their tasks. Semantically annotating Web pages with Web ontologies is a key enabling technology for achieving this goal. However, existing tools can only produce semantic annotations for static Web pages; how to annotate dynamic Web pages that are generated from underlying databases (the great majority of current Web content [2]) when clients request the pages is still an open problem [3, 4]. This problem has been referred to as deep annotation [5, 6], meaning the process of creating ontological instances for the database-based, dynamic contents by reaching out to the 'deep Web' and directly annotating the underlying database of the dynamic Web site.

We agree with the idea that the database (i.e., Web site) owner should cooperatively participate in deep annotation [5, 6]. Furthermore, we argue that a conventional dynamic Web site system can be extended to automatically generate semantic annotations for the dynamic contents by integrating the following components into the system:
• A Web ontology that captures the knowledge from the domain of discourse (DOD) of the relational database underlying the Web site;
• A set of generic mappings between the database schema and the ontology that map the implicit semantics of the schema to the explicit and formal knowledge structure of the ontology;
• A universal algorithmic procedure, invoked by the dynamic page scripts and executed by the Web server when responding to page requests, which uses the schema-to-ontology mappings to produce ontological instances that semantically describe the database query results contained in the pages.

Based on the above ideas, we have implemented a prototype framework called DPAnnotator by extending a conventional dynamic Web site. In the framework, the Web ontology is obtained by translating the ER schema of the relational database into an OWL ontology with our self-developed tool ER2WO [7]. Limited by space, this paper addresses only the issues regarding the second component: given a relational database schema and an OWL ontology (both have the same DOD and, optionally, are directly derived from the same ER schema), how can generic mappings between them be created and represented to effectively support the semantic annotation of database-based, dynamic Web pages?

The remainder of the paper is organized as follows.

Proceedings of the 2006 IEEE/WIC/ACM International Conference


on Web Intelligence (WI 2006 Main Conference Proceedings)(WI'06)
0-7695-2747-7/06 $20.00 © 2006
Sect. 2 briefly discusses related work. In Sect. 3 we present our mapping approach. Our prototype mapping tool D2OMapper and a typical mapping example are given in Sect. 4. The last section concludes the work.

2. Related Work

Focusing on exposing an RDF description of SQL-query results, the D2R MAP tool (www.wiwiss.fu-berlin.de/suhl/bizer/d2rmap/D2Rmap.htm) employs a declarative language to describe mappings from a specific SQL query to the ontological entities to be created. The mappings can later be executed by the D2R processor to create lightweight ontological instances in an RDF file. However, the definition process is task-centric and fully manual, and the produced mappings are simple and query-specific; hence they do not fully model the natural concepts and relations that the relational description is attempting to capture. The work by Handschuh et al. [5, 6] is the first to introduce the idea of deep annotation. They developed a prototype framework for “mapping and migrating legacy (relational) data to the Semantic Web” by exploiting server-side Web page markup, client-side annotation and schema-to-ontology mapping, where a prototype tool called OntoMat-Reverse was used to map relational schema elements to ontology entities. Although the focus and goal of their work differ from ours, it gives us good hints for developing a more formal and automatic approach to creating generic mappings between a relational database schema and an OWL [8] ontology.

3. Proposed mapping approach

3.1. Mapping source and target

As the mapping source, a relational database schema can be derived from an ER schema of the database with an existing modeling tool such as PowerDesigner by following the classical “ER-to-relational conversion” rules. In Definition 1 we assume without loss of generality that all relations in the schema are in 3NF.

Definition 1. A relational database schema is a tuple D = (N, col, datatype, pk, fk, assoc, subof), where
• N is a finite name set partitioned into: (1) a subset ET of entity table names; each entity table contains rows of instance data describing entities in the real world, (2) a subset RT of relationship table names; each relationship table contains rows of instance data describing the relationships between entities, and (3) a subset DT of datatype names; each datatype is a predefined RDBMS datatype, specifying a value range of the relevant instance data. Furthermore, for each t ∈ ET ∪ RT, there is a finite nonempty set col(t) of column names; each c ∈ col(t) has an associated datatype, denoted datatype(c) ∈ DT.
• For each t ∈ ET ∪ RT, there is exactly one primary key pk(t) whose values uniquely determine each row of the instance data in t, where either pk(t) ∈ col(t) (in this case pk(t) is a single-attribute key and t is an entity table) or pk(t) ⊆ col(t) (in this case pk(t) is a composite key with more than one attribute, and t is a relationship table).
• For each t ∈ ET ∪ RT, there are n (n ≥ 0) foreign keys fk(t, r) where r ∈ ET; each fk(t, r) ∈ col(t) references the values of the single-attribute primary key of entity table r, and it holds that value(fk(t, r)) ⊆ value(pk(r)) ∪ {null}, where value(*) denotes the value range of ‘*’ and pk(r) ∈ col(r).
• assoc ⊆ RT × ET × ET is a ternary relation over RT and ET that models an association relation between one relationship table and two entity tables (we assume without loss of generality that n-ary (n ≥ 3) relationships do not exist in the ER schema). For some t ∈ RT and r, s ∈ ET, assoc(t, r, s) is satisfied iff ∃fk(t, r), fk(t, s) ∈ col(t) such that pk(t) = {fk(t, r), fk(t, s)} ⊆ col(t).
• subof ⊆ ET × ET is a binary relation over ET that models an inheritance relation between two entity tables. For some t, r ∈ ET, subof(t, r) is satisfied iff ∃fk(t, r) ∈ col(t) such that either fk(t, r) = pk(t) (single inheritance) or fk(t, r) ∈ pk(t) (multiple inheritance). Here t is a subentity table, r is a superentity table, and all the related tables form a generalization hierarchy of entity tables.

As the mapping target, an OWL ontology, which can also be derived from the database’s ER schema with our ER2WO tool [7], consists of a set of axioms built using OWL identifiers (each identifier is a URI reference consisting of a namespace and a fragment identifier) and constructs. Definition 2 is a concise definition of the OWL ontology.

Definition 2. An OWL ontology is a tuple O = (ID, Axiom), where
• ID is a finite OWL identifier set partitioned into: (1) a subset CID of class identifiers including user-defined identifiers plus two predefined classes owl:Thing and owl:Nothing; classes are either entity classes describing entities or relationship classes describing the relationships between entities, (2) a subset DRID of data range identifiers; data range identifiers are predefined XML Schema datatypes such as xsd:integer, (3) a subset OPID of object property identifiers; object properties link individuals (i.e., entities) to individuals, and (4) a subset DPID of datatype property identifiers; datatype properties link individuals to data values.
• Axiom is a finite OWL axiom set partitioned into a subset of class axioms and a subset of property axioms; each axiom is formed by applying OWL constructs to identifiers or to descriptions. Descriptions are the basic building blocks of a class axiom and describe a class either by a class identifier or by specifying the extension of an unnamed (anonymous) class via the restriction construct.
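To make the assoc and subof relations of Definition 1 concrete, the following is a minimal Python sketch, not part of D2OMapper. The Table structure, the is_relationship flag (standing in for the ET/RT partition) and the example tables student, course, graduate and enrol with their columns are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Table:
    """One table t of a schema D; a hypothetical structure for illustration."""
    name: str
    columns: dict                 # col(t): column name -> RDBMS datatype
    pk: frozenset                 # pk(t): set of primary-key column names
    fks: dict = field(default_factory=dict)  # fk(t, r): column -> referenced table r
    is_relationship: bool = False            # True if t is in RT, False if t is in ET


def assoc(t, r, s):
    """assoc(t, r, s): pk(t) is exactly {fk(t, r), fk(t, s)} and t is in RT."""
    if not t.is_relationship:
        return False
    to_r = {c for c, ref in t.fks.items() if ref == r.name}
    to_s = {c for c, ref in t.fks.items() if ref == s.name}
    return any(t.pk == frozenset({a, b}) for a in to_r for b in to_s if a != b)


def subof(t, r):
    """subof(t, r): some fk(t, r) equals pk(t) or is a member of pk(t); t, r in ET."""
    if t.is_relationship or r.is_relationship:
        return False
    return any(ref == r.name and c in t.pk for c, ref in t.fks.items())


# Hypothetical example tables (names and columns are assumptions).
student = Table("student", {"sid": "INT", "name": "VARCHAR"}, frozenset({"sid"}))
course = Table("course", {"cid": "INT", "title": "VARCHAR"}, frozenset({"cid"}))
graduate = Table("graduate", {"sid": "INT", "thesis": "VARCHAR"},
                 frozenset({"sid"}), {"sid": "student"})
enrol = Table("enrol", {"sid": "INT", "cid": "INT", "grade": "INT"},
              frozenset({"sid", "cid"}), {"sid": "student", "cid": "course"},
              is_relationship=True)

print(assoc(enrol, student, course))   # True
print(subof(graduate, student))        # True
print(subof(student, course))          # False
```

Both predicates are computed purely from the primary-key and foreign-key structure, which is why the rules of Figure 1 can test them without inspecting instance data.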

Rule 1. Generation rule for entity table to entity class mapping.
For ∀t ∈ ET ∪ RT such that (¬assoc(t, r, s)) ∧ (r, s ∈ ET) (actually, t ∈ ET), if ∃ClassT = getName(t) ∈ CID then
Mapping := 'EntityTable2EntityClassMapping'; Source := t; all Source.fk := fk(t, r) = pk(r); Source.pk := pk(t); Target := ClassT.

Rule 2. Generation rule for non-foreign-key column to datatype property mapping.
For ∀c ∈ col(t) such that (c ≠ fk(t, r)) ∧ (t, r ∈ ET), if (∃dp = getName(c) ∈ DPID) ∧ (domain(dp) = getName(t) ∈ CID) then
Mapping := 'Column2DatatypePropertyMapping'; Source := c; Source.datatype := datatype(c); Target := dp; Target.range := range(dp).

Rule 3. Generation rule for relationship table to relationship class mapping.
For ∀t ∈ ET ∪ RT such that assoc(t, r, s) ∧ (r, s ∈ ET) (actually, t ∈ RT),
if (∃ClassT = getName(t) ∈ CID) ∧ (∃ClassR = getName(r) ∈ CID) ∧ (∃ClassS = getName(s) ∈ CID) ∧
(∃opTR ∈ OPID) ∧ (domain(opTR) = ClassT) ∧ (range(opTR) = ClassR) ∧
(∃opTS ∈ OPID) ∧ (domain(opTS) = ClassT) ∧ (range(opTS) = ClassS) ∧
(∃opRT ∈ OPID) ∧ (domain(opRT) = ClassR) ∧ (range(opRT) = ClassT) ∧
(∃opST ∈ OPID) ∧ (domain(opST) = ClassS) ∧ (range(opST) = ClassT)
then Mapping := 'RelationshipTable2RelationshipClassMapping'; Source := t; Source.fk1 := fk(t, r); Source.fk2 := fk(t, s);
Source.pk := pk(t); Target.RelationshipClass.ClassName := ClassT; Target.RelationshipClass.ObjectPropertyToEntityClass1 := opTR;
Target.RelationshipClass.ObjectPropertyToEntityClass2 := opTS; Target.EntityClass1.ClassName := ClassR;
Target.EntityClass1.ObjectPropertyToRelationshipClass := opRT; Target.EntityClass2.ClassName := ClassS;
Target.EntityClass2.ObjectPropertyToRelationshipClass := opST.

Rule 4. Generation rule regarding the mapping of foreign-key to primary-key reference between two entity tables, where the entity tables are not in a generalization hierarchy.
For ∀c ∈ col(r) such that (c = fk(r, s)) ∧ (r, s ∈ ET) ∧ (¬subof(r, s)),
if (∃ClassR = getName(r) ∈ CID) ∧ (∃ClassS = getName(s) ∈ CID) ∧ (∃ClassT ∈ CID) ∧
(∃opRT ∈ OPID) ∧ (domain(opRT) = ClassR) ∧ (range(opRT) = ClassT) ∧
(∃opTR ∈ OPID) ∧ (domain(opTR) = ClassT) ∧ (range(opTR) = ClassR) ∧
(∃opTS ∈ OPID) ∧ (domain(opTS) = ClassT) ∧ (range(opTS) = ClassS) ∧
(∃opST ∈ OPID) ∧ (domain(opST) = ClassS) ∧ (range(opST) = ClassT)
then Mapping := 'FK2PKReferencesMapping'; Mapping.type := 'Joining'; Source.fk := fk(r, s); Source.pk := pk(s);
Target.EntityClass.ClassName := ClassR; Target.EntityClass.ObjectPropertyToRelationshipClass := opRT;
Target.RelationshipClass.ClassName := ClassT; Target.RelationshipClass.ObjectPropertyToEntityClass := opTR;
Target.RelationshipClass.ObjectPropertyToRelatedEntityClass := opTS; Target.RelatedEntityClass.ClassName := ClassS;
Target.RelatedEntityClass.ObjectPropertyToRelationshipClass := opST.

Rule 5. Generation rule regarding the mapping of foreign-key to primary-key reference between two entity tables, where the entity tables are in a generalization hierarchy.
For ∀c ∈ col(r) such that (c = fk(r, s)) ∧ (r, s ∈ ET) ∧ subof(r, s), if (∃ClassR = getName(r) ∈ CID) ∧
(∃ClassS = getName(s) ∈ CID) ∧ (∃ subClassOf(ClassR ClassS) or Class(ClassR partial ClassS) ∈ Axiom)
then Mapping := 'FK2PKReferencesMapping'; Mapping.type := 'Inheritance';
Source.fk := fk(r, s); Source.pk := pk(s); Target.Subclass := ClassR; Target.Superclass := ClassS.

Figure 1. Heuristic rules for automatically creating schema-to-ontology mappings.
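As an illustration of how the rules in Figure 1 might be applied, the sketch below implements simplified versions of Rules 1, 2 and 5 in Python; Rules 3 and 4 are omitted for brevity. It is not the D2OMapper implementation. The dictionary layouts for the schema and the ontology, the key names of the produced mapping records, and the example tables and properties are all assumptions made for this example; getName is reduced to a trails lookup followed by name equality.

```python
def get_name(element, trails, identifiers):
    """getName(e): look up the renaming trails, then fall back to name equality."""
    candidate = trails.get(element, element)
    return candidate if candidate in identifiers else None


def generate_mappings(schema, ontology, trails):
    """Apply simplified versions of Rules 1, 2 and 5 to every entity table."""
    mappings = []
    classes = ontology["classes"]
    dps = ontology["datatype_properties"]      # dp -> (domain, range)
    for name, table in schema.items():
        if table["kind"] != "entity":
            continue
        class_t = get_name(name, trails, classes)
        if class_t is None:
            continue
        # Rule 1: entity table -> entity class.
        mappings.append({"mapping": "EntityTable2EntityClassMapping",
                         "source": name, "source.pk": table["pk"],
                         "source.fk": sorted(table["fks"]), "target": class_t})
        # Rule 2: non-foreign-key column -> datatype property with matching domain.
        for column, datatype in table["columns"].items():
            dp = get_name(column, trails, dps)
            if column in table["fks"] or dp is None or dps[dp][0] != class_t:
                continue
            mappings.append({"mapping": "Column2DatatypePropertyMapping",
                             "source": column, "source.datatype": datatype,
                             "target": dp, "target.range": dps[dp][1]})
        # Rule 5: foreign key contained in pk of a subentity table -> inheritance.
        for column, referenced in table["fks"].items():
            class_s = get_name(referenced, trails, classes)
            is_subof = (column in table["pk"]
                        and schema[referenced]["kind"] == "entity")
            if is_subof and (class_t, class_s) in ontology["subclass_of"]:
                mappings.append({"mapping": "FK2PKReferencesMapping",
                                 "type": "Inheritance",
                                 "source.fk": column,
                                 "source.pk": schema[referenced]["pk"],
                                 "target.subclass": class_t,
                                 "target.superclass": class_s})
    return mappings


# Hypothetical inputs (table, column and property names are assumptions).
schema = {
    "student": {"kind": "entity", "pk": ["sid"], "fks": {},
                "columns": {"sid": "INT", "name": "VARCHAR(40)"}},
    "graduate": {"kind": "entity", "pk": ["sid"], "fks": {"sid": "student"},
                 "columns": {"sid": "INT", "thesis": "VARCHAR(80)"}},
}
ontology = {
    "classes": {"Student", "GraduateStudent"},
    "datatype_properties": {"name": ("Student", "xsd:string"),
                            "thesis": ("GraduateStudent", "xsd:string")},
    "subclass_of": {("GraduateStudent", "Student")},
}
trails = {"student": "Student", "graduate": "GraduateStudent"}

result = generate_mappings(schema, ontology, trails)
for m in result:
    print(m)
```

Each rule fires only when the corresponding ontology element exists, mirroring the "if ∃ ... then" shape of the rules: a table or column with no matching class or property simply produces no mapping.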

3.2. Automatic mapping rules

The rationale behind the automatic mapping is the existence of conceptual correspondences between the relational database schema and the OWL ontology, both of which model the same domain of interest. To identify these correspondences, we exploit schema matching techniques [9] and combine syntactic matching with semantic matching to determine both element-level and structure-level correspondences between the schema and the ontology.

Element correspondences (e.g., table-to-class, column-to-property) can be determined by finding name matches between pairs of elements at the same concept level using string-based techniques such as the same-prefix and smallest-edit-distance approaches [9]. If renaming trails (which record all the renaming operations performed during the derivation of the schema and/or the ontology from the ER schema) exist in the system, name matches can be obtained directly by accessing the trails file and testing simple name equality. Optionally, all identified element correspondences can be further confirmed by the domain expert via man-machine conversation.

The determination of structure correspondences relies on a semantics-based approach. By examining the relations amongst schema/ontology elements, structure matches between the schema and the ontology are identified. The element and structure correspondences are mutually verified and reinforced before the final determination. Based on these correspondences, a set of heuristic rules for automatically creating the schema-to-ontology mappings can be specified. In Figure 1, we use a generic 3-tuple (Mapping, Source, Target) to denote each mapping definition item, comprising the category, source and target of a mapping, and use Source.Child to denote a child element of the source and Target.Child[.GrandChild] to denote a child (grandchild) element of the target. Additionally, three auxiliary functions are used: getName(e) returns the fragment identifier of the ontology class/property corresponding to database table/column e, obtained via naming matching (aided by man-machine conversation) or by accessing the renaming trails; domain(p) and range(p) return the domain and range of property p in the ontology, respectively.

3.3. Mapping data representation

Due to the hierarchical and semi-structured nature of the mapping definition data introduced earlier, we choose an XML format to represent the mapping data and metadata (e.g., the DBMS, database name and location, the ontology filename, namespace and location) in a valid XML document. The data and metadata are stored in separate XML elements. Each mapping is represented by an XML element; the source and target are subelements of that element; the components of the source/target are subelements of those subelements. Details of a mapping/source/target are attached to the elements via XML attributes.

4. Implementation and example

Following the proposed approach, we developed the D2OMapper tool based on J2SE v1.4.2 and the Jena2 API. The tool takes a relational database schema and an OWL ontology (and a term renaming trails file if necessary) as inputs and produces the mappings both automatically (via automatic mapping) and semi-automatically (via man-machine conversation). Case studies with D2OMapper indicated the effectiveness of our approach and the applicability of the produced mappings to the semantic annotation of database-based Web pages supported by our DPAnnotator framework.

Figure 2. An example ER schema univER.

Here we give an example to show the mapping process in D2OMapper. Figure 2 is an ER diagram univER created with PowerDesigner 9.5. From the ER schema, a MySQL 4.0 database univ was generated by PowerDesigner and an OWL ontology univOnt was produced by the ER2WO tool [7]. As a proof of concept, renamings (e.g., Course -> course during ER-to-relational conversion, and Graduate -> GraduateStudent during ER-to-ontology translation) were recorded in a trails file. After inputting the schema, ontology and renaming trails file to D2OMapper and performing the ‘automatic mapping’ process, the automatically generated mappings were saved to the XML document univ_Db2OntMappings.xml and displayed on the tool screen, as depicted in Figure 3.
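The XML layout described in Sect. 3.3 can be illustrated with a short Python sketch using the standard xml.etree.ElementTree module. The paper does not give the actual element and attribute names of univ_Db2OntMappings.xml, so every tag and attribute below (Db2OntMappings, Metadata, Mappings, Mapping, Source, PrimaryKey, Target, and so on) and the column name cid are assumptions; only the nesting pattern follows the description. The sketch checks well-formedness only, not validity against a DTD or schema.

```python
import xml.etree.ElementTree as ET

# Root element; the tag name is hypothetical.
root = ET.Element("Db2OntMappings")

# Metadata and mapping data are stored in separate elements.
metadata = ET.SubElement(root, "Metadata")
ET.SubElement(metadata, "Database", dbms="MySQL 4.0", name="univ")
ET.SubElement(metadata, "Ontology", name="univOnt")

# Each mapping is one element; source and target are its subelements,
# and their components are subelements of those subelements.
mappings = ET.SubElement(root, "Mappings")
mapping = ET.SubElement(mappings, "Mapping",
                        category="EntityTable2EntityClassMapping")
source = ET.SubElement(mapping, "Source", table="course")
ET.SubElement(source, "PrimaryKey", column="cid")   # assumed column name
ET.SubElement(mapping, "Target", entityClass="Course")

# Serialize and parse back to confirm the document is well-formed.
xml_text = ET.tostring(root, encoding="unicode")
parsed = ET.fromstring(xml_text)
print(xml_text)
```

Carrying details as attributes keeps each mapping compact, while the element nesting preserves the Mapping / Source / Target hierarchy used in Figure 1.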

Figure 3. Screenshot of D2OMapper, where the database schema, the OWL ontology and the automatically-
generated mappings are displayed in the left, right and upper middle areas, respectively.
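The element-level naming matching of Sect. 3.2 can be sketched as follows, assuming a case-insensitive Levenshtein edit distance as the string-based technique and a plain dictionary as the renaming trails. The function names, the max_distance threshold of 2 and the identifier set are assumptions for illustration; the paper does not specify how D2OMapper computes or thresholds the distance.

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b (dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def get_name(element, identifiers, trails=None, max_distance=2):
    """Return the ontology identifier matching a table/column name, or None.

    Renaming trails take precedence; otherwise the identifier with the
    smallest edit distance is chosen if it is within max_distance.
    """
    if trails and trails.get(element) in identifiers:
        return trails[element]
    scored = [(edit_distance(element.lower(), ident.lower()), ident)
              for ident in sorted(identifiers)]
    distance, best = min(scored)
    return best if distance <= max_distance else None


identifiers = {"Course", "Student", "GraduateStudent"}
trails = {"graduate": "GraduateStudent"}   # renaming recorded in the trails file

print(get_name("course", identifiers))             # Course
print(get_name("graduate", identifiers, trails))   # GraduateStudent
print(get_name("graduate", identifiers))           # None
```

The last call shows why the renaming trails matter: without them, graduate is too far from GraduateStudent for a small edit-distance threshold, and the match would have to be confirmed by the user instead.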

5. Conclusion

We have presented a practical approach for creating generic mappings between a relational database schema and an OWL ontology. Since full implementation of the Semantic Web requires widespread availability of ontological annotations for Web pages, our approach and tool can act as a bridge between existing database applications and the Semantic Web.

Acknowledgements. This work was funded by the Natural Science Foundation of Jiangsu Province of China (BK2003001) and the Foreign Experts Invitation Key-Project Foundation of the State Administration of Foreign Experts Affairs of China (20050360543).

References

[1] T. Berners-Lee, J. Hendler, O. Lassila, “The Semantic Web”, Scientific American, Vol. 284, No. 5, 2001, pp. 34-43.
[2] K. C.-C. Chang, B. He, C. Li, et al., “Structured databases on the Web: observations and implications”, SIGMOD Record, Vol. 33, No. 3, 2004, pp. 61-70.
[3] L. Reeve, H. Han, “Survey of semantic annotation platforms”, Proc. of the 2005 ACM Symposium on Applied Computing, ACM Press, 2005, pp. 1634-1638.
[4] V. R. Benjamins, J. Contreras, O. Corcho, et al., “Six Challenges for the Semantic Web”, AIS SIGSEMIS Bulletin, Vol. 1, No. 1, 2004, pp. 24-25.
[5] R. Volz, S. Handschuh, S. Staab, et al., “Unveiling the hidden bride: deep annotation for mapping and migrating legacy data to the Semantic Web”, Journal of Web Semantics, Vol. 1, No. 2, 2004, pp. 187-206.
[6] S. Handschuh, S. Staab, R. Volz, “On deep annotation”, Proc. of the 12th Int’l World Wide Web Conf., ACM Press, 2003, pp. 431-438.
[7] Z. Xu, X. Cao, Y. Dong, et al., “Formal approach and automated tool for translating ER schemata into OWL ontologies”, Advances in Knowledge Discovery and Data Mining, LNAI 3056, Springer, 2004, pp. 464-475.
[8] M. Dean, G. Schreiber (eds.), OWL Web Ontology Language Reference, W3C Recommendation, 10 Feb 2004. http://www.w3.org/TR/owl-ref/.
[9] P. Shvaiko, J. Euzenat, “A survey of schema-based matching approaches”, Journal on Data Semantics, Vol. 4, 2005, pp. 146-171.
