
RDB2RDF: Completed Transformation from Relational Database into RDF Ontology


Pham Thi Thu Thuy, Faculty of Information Technology, Nha Trang University, Vietnam, thuthuypht@gmail.com
Nguyen Duc Thuan, Faculty of Information Technology, Nha Trang University, Vietnam, ngducthuan@gmail.com
Yongkoo Han, Kyung Hee University, South Korea, ykhan@khu.ac.kr
Kisung Park, Kyung Hee University, South Korea, kspark@khu.ac.kr
Young-Koo Lee, Kyung Hee University, South Korea, yklee@khu.ac.kr

ABSTRACT
One of the main advantages of the Semantic Web is that it augments data with a well-defined meaning and with links between data by using the RDF ontology language. Today, most data are stored in relational databases. To reuse these data and reason over them on the Semantic Web, the data stored in relational databases must be converted to RDF. Several approaches have been proposed; however, most of them transform a single table into RDF triples. This paper presents RDB2RDF, a complete method for transforming all tables in a relational database into an RDF ontology. The transformation makes it possible to reverse the RDF ontology back to relational tables. Above all, every step in RDB2RDF is performed automatically, without any user intervention.

Categories and Subject Descriptors
I.2.4 [Knowledge Representation Formalisms and Methods]: Representation languages. I.5.3 [Clustering]: Similarity measures.

General Terms
Standardization, Languages.

Keywords
Semantic Web, relational databases, RDF, transformation.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
IMCOM (ICUIMC)’14, January 9-11, 2014, Siem Reap, Cambodia.
Copyright 2014 ACM 978-1-4503-2644-5 …$15.00.

1. INTRODUCTION
The Semantic Web is an extension of the current Web in which data are augmented with a well-defined meaning and with relationships between data by using RDF (Resource Description Framework) and the vocabulary supported by RDF Schema (RDFS). RDF data can be understood by the computer and can then be shared, exchanged, or integrated into a data repository, enabling applications to use the data in different contexts [7].

RDF data can be presented in the form of triples (Subject - Predicate - Object) or as RDF/XML, which stores RDF in the form of an XML file [5]. The main advantage of RDF/XML is that it can reuse existing XML tools. Moreover, each RDF format has an internet content type [5], passed by the server, so the client knows how to parse the data. Therefore, in this paper we use the RDF/XML format to store the results.

Most formatted data today is stored in relational databases, which are excellent tools for storing and querying data but lack the ability to describe the semantics of data. To use relational data in a semantic context, we should transform those data into RDF, the data format of the Semantic Web. There are some proposals that move relational data to an RDF dataset. Typical approaches are proposed by Edgard Marx et al. [4], Huajun Chen et al. [6], and Kate Byrne [9]. However, most of the proposed approaches perform only simple, equivalent matching. They map some tuples of relational data to some triples of an RDF dataset without considering the RDFS semantic constraints or the complex queries needed to extract important information.

The main goal of RDB2RDF is to allow flexible mappings of complex relational structures into an RDF ontology without changing the existing database. The flexibility is achieved by employing SQL statements directly in the transformation steps. The resulting record sets are grouped afterwards and the data is mapped to RDF triples.
Our contributions are threefold:
• RDB2RDF can transform all tables in the relational database into an RDF ontology.
• The transformation keeps track of the key attributes in the tables so that the algorithm can be extended to reverse the RDF ontology to relational tables.
• All the steps in RDB2RDF are done automatically without any user intervention.
The rest of the paper is organized as follows. Section 2 presents related work. Section 3 describes the RDB2RDF architecture and the details of each step. Section 4 presents an illustrating example for RDB2RDF. The evaluation is presented in Section 5. Finally, Section 6 summarizes the paper and outlines future research.

2. RELATED WORK
Many approaches investigate the transformation of a relational database into an RDF dataset. The approach most similar to ours is D2R [3]. However, D2R extracts only the attributes of interest in tables and then transforms them into RDF triples. Our proposed method transforms all attributes in a database.

More work has addressed the issue of explicitly defining semantics in database schemas [2], [13], extracting semantics out of a database schema, and transforming a relational model into an object-oriented model [1], which is close to an ontological theory.

Other well-known approaches are RDB2RDF [4], Juan Sequeda [8], Huajun Chen et al. [6], and Kate Byrne [9]. The RDB2RDF [4] method uses the mapping language R2RML [15] to convert tuples of the relational data to RDF triples. However, this is a direct mapping that does not consider RDFS semantic constraints such as rdfs:subClassOf, rdfs:subPropertyOf, rdfs:domain, and rdfs:range. Juan Sequeda [8] and Huajun Chen et al. [6] also propose direct mappings, using RDF queries and SPARQL queries, respectively, to extract specific information from the relational data. Kate Byrne [9] defines new RDF relationships and maps cultural heritage data to those relationships.

In a broader sense, our approach could be treated as the reverse of RDF storage in a relational database. Agrawal et al. [12] use only one “universal” table in the database. Every individual (instance) falls into one record in the table. While the data model is simple, this approach has drawbacks such as a large number of columns and limits on property values. The “Generic Representation” [14] has a single table where each record corresponds to an RDF triple. However, this design means that any query has to search the whole database, and queries that involve joins are especially expensive.

3. THE RDB2RDF DESCRIPTION AND PROCESS

3.1 RDB2RDF description
The source of the transformation should be a primary table in a relational database. This table has to be generated from an entity in the conceptual data model. We can check this requirement by looking at the table: does it have attributes that are primary keys and are not foreign keys? If yes, the table is generated from an entity. Otherwise, it is generated from a relationship. The target of the transformation is an RDF file that has the name of the source table (with a different extension).
The extracted table must have at least one attribute that is a primary key. This primary key is used to create the URI for the resource. Each extracted attribute has a domain and a range as described in Table 1.

Table 1. Domain and range for each extracted attribute
Attribute type                  Domain                  Range                               Description
Primary key in primary table    Name of primary table   Name of primary table               primaryKey
Foreign key in primary table    Name of primary table   Name of referenced table            foreignKey
Primary key in other table      Name of primary table   Name of table containing this key   primaryKey
Foreign key in other table      Name of primary table   Name of referenced table            foreignKey
Normal attribute (not key)      Name of primary table   Name of datatype (in XML Schema)    attribute

3.2 RDB2RDF architecture
The details of our approach are presented in Figure 1.

[Figure: flow from Database through Description of attributes in DB, Select query, Attributes, RDFS, and XML to RDF; steps labeled 1, 2, 3.1, 3.2, 4, 5.1, 5.2]
Figure 1. RDB2RDF architecture

As shown in Figure 1, our approach has five steps:
• Step 1: Describe all attributes in the database. The description result is a text file stored in secondary memory.
• Step 2: Use the SELECT command to extract the data describing the resource. Each resource should belong to a primary table. The attributes of the primary table must be extracted first.
• Step 3: Generate the RDF Schema (RDFS) file based on the description file (Step 1) and the attributes extracted in Step 2.
• Step 4: Execute the SELECT query to extract instances from the relational database. The results are stored in XML format.
• Step 5: Generate the RDF dataset from the RDFS and XML files.
Details of each step are presented in the next sections.

3.3 SELECT syntax for extracting data
The SELECT command must contain the primary key of the primary table. This primary key is used as the URI for instances of the resource. The SELECT syntax is as follows:

SELECT tableName.ID, attribute1 AS AliasName1, attribute2 AS AliasName2, ...
FROM tableName, table2Name, ...
WHERE ...
ORDER BY tableName.ID
FOR XML AUTO, ELEMENTS

We note that the key attribute of the primary table must be extracted in the SELECT command.

3.4 Generating RDFS
We assume that the extraction of data from the relational database is not redundant. That is, if a foreign key is extracted, the primary key that it references is not extracted, and vice versa. The RDFS file contains classes and properties, which are described as follows:
• Description of classes: Each table in a database is transformed into a class. The description of a class is based on the key attribute (primary key or foreign key). The class name is a value in the range column. If the parent class contains values, the class in the range column is a child class of the parent class.
• Description of properties: The domain of all the attributes is the name of the primary table. The range of an attribute is its value in the range column.

3.5 Generating RDF
This step produces an RDF file from the XML and RDFS files generated in Sections 3.3 and 3.4. The SELECT command to generate RDF format in the form of an XML file is as follows:

SELECT property1, property2, ...
FROM table1, table2, ...
WHERE [Where conditions]
ORDER BY property1
FOR XML AUTO, ELEMENTS

where property1 is the key attribute of the primary table and table1 is the name of the primary table.

The algorithm to generate the RDF/XML file can be described by the following pseudocode:

Algorithm 1. GenerateRDF
Input: an XML file Fxml, a primary table Tp, an RDFS file Fs
Output: an RDF file F
1: Collect all the child elements of the root element in the XML file Fxml.
2: FOR each child element ec in Fxml
3:   read the value ID of the attribute key in Tp
4:   create a resource having URI = baseURI + ResourceName + # + ID
5:   FOR each property p in Fs
6:     take a list of elements (listEle) in an instance
7:     IF |listEle| > 0 THEN
8:       FOR each child element ec in listEle
9:         IF p is the attribute key THEN
10:          Create a corresponding property
11:        Generate a property having ec's value
12:    Append predicate to the resource
13: RETURN F

Algorithm 1 generates an RDF file from the primary table, the RDFS file, and the XML file. First, we collect all the child elements of the root element in the XML file (line 1). Second, for each child element in the XML file, we read the value ID of the attribute key in the primary table and create a resource whose URI consists of the base URI, the resource name, and the ID (lines 2-4). Third, for each property we take a list of elements (listEle) in an instance (lines 5-6). If the number of elements in listEle is greater than 0 and the property is the attribute key, we create a corresponding property that references the resource having that URI (lines 7-10). We generate a property whose value is the value of the child element in listEle, and then we put the predicate between containers (line 11). Finally, we append the predicate to the resource; if not all properties in the RDFS file have been traversed, we return to the attribute key checking step (line 12). Through these steps, we obtain the resulting RDF file.
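The control flow of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' C# implementation: the function name generate_rdf, its parameters (key_tag, object_props, base_uri, vocab_ns), the example.org URIs, and the choice to emit rdf:about and rdf:resource attributes are all hypothetical. The property list that Algorithm 1 reads from the RDFS file is reduced here to a dictionary of the properties whose range is a class.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def generate_rdf(xml_text, key_tag, object_props, base_uri, vocab_ns):
    """Build an rdf:RDF element from row elements (hypothetical helper).

    key_tag      -- tag that holds the primary-key value of each row
    object_props -- dict mapping a tag to the class name of its range
    """
    ET.register_namespace("rdf", RDF_NS)
    rdf_root = ET.Element(f"{{{RDF_NS}}}RDF")
    for row in ET.fromstring(xml_text):              # lines 1-2: children of the root
        row_id = row.findtext(key_tag, default="").strip()   # line 3: key value
        resource = ET.SubElement(rdf_root, f"{{{vocab_ns}}}{row.tag}")
        # line 4: URI = baseURI + ResourceName + '#' + ID
        resource.set(f"{{{RDF_NS}}}about", f"{base_uri}{row.tag}#{row_id}")
        for elem in row.iter():                      # lines 5-8: nested elements
            if elem is row or elem.tag == key_tag or len(elem) > 0:
                continue                             # skip row, key, and wrappers
            prop = ET.SubElement(resource, f"{{{vocab_ns}}}{elem.tag}")
            value = (elem.text or "").strip()
            if elem.tag in object_props:             # lines 9-10: link to a resource
                target = f"{base_uri}{object_props[elem.tag]}#{value}"
                prop.set(f"{{{RDF_NS}}}resource", target)
            else:                                    # line 11: literal value
                prop.text = value
    return rdf_root


# Assumed input: one row shaped like the output of FOR XML AUTO, ELEMENTS,
# wrapped in a single root element.
SAMPLE_XML = """<result>
  <Document>
    <Document_RESOURCEURI_>Doc01</Document_RESOURCEURI_>
    <DocName>C++ Programming</DocName>
    <Doc_Author><Created_By>Author01</Created_By></Doc_Author>
    <Doc_Author><Created_By>Author02</Created_By></Doc_Author>
  </Document>
</result>"""

rdf_tree = generate_rdf(SAMPLE_XML, "Document_RESOURCEURI_",
                        {"Created_By": "Author"},
                        "http://example.org/", "http://example.org/vocab#")
print(ET.tostring(rdf_tree, encoding="unicode"))
```

In this sketch, wrapper elements produced by the join (such as Doc_Author) are flattened so that each key reference becomes one property of the resource. That is a design choice of the sketch, made to keep every resource a flat list of predicates.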
4. ILLUSTRATING EXAMPLE
The following example illustrates the use of RDB2RDF to transform data about authors and their documents from a database into RDF. Because authors usually have more than one document and documents can be created by multiple authors, the information is stored in three database tables: one for the authors, one for their documents, and a third for the n:m relationship between authors and documents:

Author (AuthorID, AuthorName, AuthorEmail, AuthorORG)
Document (DocumentID, DocName, DocFormat, DocLink)
Doc_Author (AuthorID, DocumentID)

The instances of the three tables above are as follows:

Author:
AuthorID   AuthorName   AuthorEmail     AuthorORG
Author01   Anderson     anv@yahoo.com   MinhKhaiPub
Author02   Thomas       btv@yahoo.com   MinhKhaiPub

Document:
DocumentID   DocName           DocFormat   DocLink
Doc01        C++ programming   pdf         http://www.somewhere/Doc
Doc02        Semantic Web      chm         http://www.somewhere/Doc
Doc03        MSSQL 2000        pdf         http://www.somewhere/Doc
Doc04        ASP & ASP.NET     pdf         http://www.somewhere/Doc

Doc_Author:
AuthorID   DocumentID
Author01   Doc01
Author02   Doc01
Author01   Doc02
Author01   Doc03
Author02   Doc04

For example, we would like to know the detailed information about the author. The extracted information is as follows:

Attribute name        domain      range       Attribute type
Author.AuthorID       #Author     #Author     primaryKey
Author.AuthorName     #Author     Literal     Attribute
Author.AuthorEmail    #Author     Literal     Attribute
Author.AuthorORG      #Author     Literal     Attribute
Document.DocumentID   #Document   #Document   primaryKey

For easy understanding, we can replace the attribute Doc_Author.AuthorID by the alias name “Created_By”.

The RDFS file that stores the required information is generated as follows:

<?xml version="1.0" ?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
  <rdfs:Class rdf:ID="Document" />
  <rdfs:Class rdf:ID="Author" />
  <rdf:Property rdf:ID="DocName">
    <rdfs:domain rdf:resource="#Document" />
    <rdfs:range rdf:resource="&rdf;Literal" />
  </rdf:Property>
  <rdf:Property rdf:ID="DocFormat">
    <rdfs:domain rdf:resource="#Document" />
    <rdfs:range rdf:resource="&rdf;Literal" />
  </rdf:Property>
  <rdf:Property rdf:ID="DocLink">
    <rdfs:domain rdf:resource="#Document" />
    <rdfs:range rdf:resource="&rdf;Literal" />
  </rdf:Property>
  <rdf:Property rdf:ID="Created_By">
    <rdfs:domain rdf:resource="#Document" />
    <rdfs:range rdf:resource="#Author" />
  </rdf:Property>
</rdf:RDF>

To create the RDF file, we use the following SELECT command:

SELECT Document.DocumentID AS [Document_RESOURCEURI_], DocName, DocFormat, DocLink, AuthorID AS [Created_By]
FROM Document, Doc_Author
WHERE Document.DocumentID = Doc_Author.DocumentID
ORDER BY Document.DocumentID
FOR XML AUTO, ELEMENTS

The resulting RDF/XML file is below:

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
         xmlns:res="http://www.oracle.com/technology/pub/articles/res#"
         xmlns:db="http://http://www.w3.org/2001/XMLSchema-instance/mine#">
  <db:result>
    <db:Document>
      <db:Document_RESOURCEURI_>Doc01</db:Document_RESOURCEURI_>
      <db:DocName>C++ Programming</db:DocName>
      <db:DocFormat>pdf</db:DocFormat>
      <db:DocLink>http://www.somewhere/Doc</db:DocLink>
      <db:Doc_Author>
        <db:Created_By>Author01</db:Created_By>
      </db:Doc_Author>
      <db:Doc_Author>
        <db:Created_By>Author02</db:Created_By>
      </db:Doc_Author>
    </db:Document>
    <db:Document>
      <db:Document_RESOURCEURI_>Doc02</db:Document_RESOURCEURI_>
      <db:DocName>Semantic Web</db:DocName>
      <db:DocFormat>chm</db:DocFormat>
      <db:DocLink>http://www.somewhere/Doc</db:DocLink>
      <db:Doc_Author>
        <db:Created_By>Author01</db:Created_By>
      </db:Doc_Author>
    </db:Document>
    <db:Document>
      <db:Document_RESOURCEURI_>Doc03</db:Document_RESOURCEURI_>
      <db:DocName>MSSQL 2000</db:DocName>
      <db:DocFormat>pdf</db:DocFormat>
      <db:DocLink>http://www.somewhere/Doc</db:DocLink>
      <db:Doc_Author>
        <db:Created_By>Author01</db:Created_By>
      </db:Doc_Author>
    </db:Document>
    <db:Document>
      <db:Document_RESOURCEURI_>Doc04</db:Document_RESOURCEURI_>
      <db:DocName>ASP and ASP.NET</db:DocName>
      <db:DocFormat>chm</db:DocFormat>
      <db:DocLink>http://www.somewhere/Doc</db:DocLink>
      <db:Doc_Author>
        <db:Created_By>Author02</db:Created_By>
      </db:Doc_Author>
    </db:Document>
  </db:result>
</rdf:RDF>

RDB2RDF can transform an arbitrary relational database into RDF formats, which include an RDF Schema file (*.rdfs) and an RDF file (*.rdf). The RDF Schema file describes classes and the relationships between properties and classes. The RDF file stores all instances of the relational database. The transformation does not depend on the database and does not make any change to the database.
RDB2RDF is implemented in the C# language with the support of the .Net 2.0 library. The relational databases are stored in SQL Server 2012. Therefore, before executing the transformation, we must connect to the SQL database. Our program supports two authentication modes (Windows and server) for connecting to the database.
Our RDB2RDF program provides two kinds of transformations. The first allows users to press a button to convert all relational tables into an RDF ontology. The second lets users enter SQL commands to specify the needed information from some tables. All steps in the RDB2RDF program are automated and thus inexpensive and fast.

5. EVALUATION
We evaluate the proposed transformation strategies by matching a relational database with an RDF file to determine the true matches, and we compare our results with related methods. To assess the quality of the matching system, we use precision and recall [16]. Given the set of expected matching pairs R (produced by a human) and the set of alignment pairs T (produced by the matching system for the proposed methods), precision is computed by the following equation:

$$\mathrm{precision}(R, T) = \frac{|R \cap T|}{|T|} \qquad (1)$$

Recall specifies the share of real correspondences:

$$\mathrm{recall}(R, T) = \frac{|R \cap T|}{|R|} \qquad (2)$$

Although precision and recall are the most widely used measures, when comparing matching systems one may prefer to have only a single measure. For this reason, the F-measure [16] is introduced to aggregate precision and recall:

$$F\text{-measure} = 2 \cdot \frac{\mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \qquad (3)$$

To obtain practical evidence, we applied our transformation to two sample databases produced by Microsoft, namely Northwind [10] and Pubs [11].
We compare the precision, recall, and F-measure values between our proposed method and the most related work,
such as D2R [3], RDB2RDF [4], Juan Sequeda [8], and Huajun Chen et al. [6]. The matching system is also implemented in Visual C#. The comparison results are shown in the following figures.

Figure 2. Matching comparison between our method and related work on Northwind database

Figure 3. Matching comparison between our method and related work on Pubs sample database

Figure 2 and Figure 3 show that our matching quality is the highest compared with the related work. RDB2RDF [4] is ranked second, followed by J. Sequeda [8], H. Chen et al. [6], and D2R [3]. The main reason is that our method and RDB2RDF [4] transform all relational data into RDF, whereas the other three methods extract only some relational tuples. Moreover, our method maintains the relationships between foreign keys and primary keys among relations, whereas RDB2RDF [4] does not. Among the D2R [3], J. Sequeda [8], and H. Chen et al. [6] methods, J. Sequeda [8] gives the highest matching values since this method retains the connections between foreign keys and primary keys. Moreover, when extracting portions of the relational data, those three methods change some of the data structure, which lowers their matching scores.

There are some small differences between Figure 2 and Figure 3 because of the differences between the Northwind and Pubs databases. The Northwind database has 13 relations compared with 11 relations in the Pubs database. Among those relations, there are relationships between foreign keys and primary keys. In this experiment, the total number of such relationships in the Northwind database is higher than that in the Pubs database. Therefore, for those methods which do not maintain the foreign key and primary key relationships, the matching results on the Northwind database are lower than those on the Pubs database. For instance, D2R's F-measure score is only 58% in Figure 2 compared with 66% in Figure 3.

6. CONCLUSIONS
Transformation from a relational database into an RDF ontology plays a critical role in realizing the Semantic Web as well as in many data sharing problems. Many approaches address this transformation. However, most of those approaches directly transform relational tuples into RDF triples without keeping the foreign key and primary key relationships. Other methods transform some relational tuples into RDF triples and do not consider the RDFS semantic constraints or the structure of the relational data. Our proposed RDB2RDF method can transform all data from the relations or can extract any required information while keeping the relationships between primary keys and foreign keys, and it improves the semantics of the relational data by using RDFS vocabularies. The experimental results show that our proposed method outperforms other related work for these reasons.
Moreover, all the steps in our proposed method can be executed automatically without any human intervention. The algorithm can also be implemented as an intermediate module between any relational database and a Semantic Web page. The extracted information can be selected by the users. Our future direction is to transform relational databases into an OWL ontology, which supports more semantics for the data than RDF.

7. ACKNOWLEDGMENTS
This research was supported by the MSIP (Ministry of Science, ICT & Future Planning), Korea, under the ITRC (Information Technology Research Center) support program supervised by the NIPA (National IT Industry Promotion Agency) (NIPA-2013-(H0301-13-2001)).

8. REFERENCES
[1] Behm A., Geppert A., Dittrich K. 1997. On the Migration of Relational Schemas and Data to Object-Oriented Database Systems. In Proceedings of the 5th Int. Conference on Re-Technologies for Information Systems (Klagenfurt, December 1997), pp. 13-33.
[2] Chiang R., Barron T., Storey V. 1994. Reverse engineering of relational databases: Extraction of an EER model from a relational database. Journal of Data and Knowledge Engineering, Vol. 12, No. 2, pp. 107-142.
[3] Christian Bizer. 2003. D2R Map - A Database to RDF Mapping Language. WWW 2003, Hungary.
[4] Edgard Marx, Percy Salas, Karin Breitman, José Viterbo, Marco A. Casanova. 2013. RDB2RDF: A relational to RDF plug-in for Eclipse. Software: Practice and Experience, Vol. 43, No. 4, pp. 435-447, doi:10.1002/spe.2145
[5] Graham Klyne, Jeremy Carroll. 2002. Resource Description Framework (RDF): Concepts and Abstract Syntax. W3C Working Draft (work in progress). http://www.w3.org/TR/2002/WD-rdf-concepts-20021108/.
[6] Huajun Chen, Zhaohui Wu, Heng Wang and Yuxin Mao. 2006. RDF/RDFS-based relational database integration. In Proceedings of the 22nd International Conference on Data Engineering, pp. 94-104.
[7] James Hendler, Tim Berners-Lee, Eric Miller. 2002. Integrating Applications on the Semantic Web. Journal of the Institute of Electrical Engineers of Japan, Vol. 122(10), pp. 676-680.
[8] Juan Sequeda, Marcelo Arenas, Daniel P. Miranker. 2012. On directly mapping relational databases to RDF and OWL. WWW 2012, pp. 649-658.
[9] Kate Byrne. 2006. Tethering cultural data with RDF. In Proceedings of the Jena user conference 2006 (JUC2006), UK.
[10] Microsoft. 2011. Northwind database. http://northwinddatabase.codeplex.com/
[11] Microsoft. 2013. Pubs sample database. http://technet.microsoft.com/en-us/library/aa238305%28v=sql.80%29.aspx.
[12] R. Agrawal, A. Somani, and Y. Xu. 2001. Storage and Querying of E-Commerce Data. In Proceedings of VLDB.
[13] Rishe N. 1992. Database Design: The Semantic Modeling Approach. McGraw-Hill.
[14] S. Alexaki, V. Christophides, G. Karvounarakis, D. Plexousakis and K. Tolle. 2001. On Storing Voluminous RDF Description: The case of Web Portal Catalogs. In Proc. of WebDB2001 in conjunction with ACM SIGMOD'01 Conference.
[15] W3C. 2012. R2RML: RDB to RDF mapping language. http://www.w3.org/TR/r2rml/
[16] Wikipedia. Precision and recall. http://en.wikipedia.org/wiki/Precision_and_recall
