Deductive Graph Database - Datalog in Action

2015 International Conference on Computational Science and Computational Intelligence
Kornelije Rabuzin
Faculty of Organization and Informatics
University of Zagreb
Varazdin, Croatia
kornelije.rabuzin@foi.hr
978-1-4673-9795-7/15 $31.00 © 2015 IEEE
DOI 10.1109/CSCI.2015.60

Abstract: In recent years, many NoSQL systems, including graph databases, have become available. For large amounts of interconnected data, these systems represent a good choice. Deductive databases have been used to deduce new pieces of information from databases that contain large amounts of data. However, such databases were mostly relational: relations were used to store the data on which deductive mechanisms operated. In this paper, deductive graph databases are proposed. In a deductive graph database, data are stored in a graph database, and Datalog is used for reasoning over a relational representation of that graph database.

Keywords: databases, SQL, NoSQL, graph databases, Datalog

I. INTRODUCTION

The relational data model has been widely used for the past 40 years, and many databases have been implemented to store large amounts of important data. Dr. Codd's vision of storing data in relations turned out to be crucial, and because of its favorable properties the relational data model has survived. Although the term “relation” is used in theory, users who work with databases daily usually think of a database as a set of tables. A database management system (DBMS) is required to implement a database. Its rich query interface, the Structured Query Language (SQL), can be used to work with the database. The ability to store and efficiently manage large amounts of data has made database management systems a significant part of many applications and information systems developed over time.

SQL is a standardized language used to work with databases. All major relational database management system vendors support SQL, which is declarative and typically not complex. Queries can, however, become quite complex. Furthermore, different database management systems do not support all statements in the same form, so some differences exist. For more information on SQL, see [8] and [9].

In recent years, however, the NoSQL movement has become popular. The relational data model is starting to reveal its weaknesses, and for some problems new solutions have to be found. The amounts of data that some relational databases have to store today exceed their capabilities. In addition, a fixed database schema is no longer always an option. Because of this, many NoSQL systems have been developed. Generally speaking, we distinguish:

• Document-oriented databases
• Column-oriented databases
• Key-value databases
• Graph databases

Graph databases store information in nodes and relationships. Nodes do not all have to contain the same properties (attributes), and the same applies to relationships between nodes. For large amounts of interconnected data, graph databases represent a good choice, and they are especially suitable for social network analysis. In the next section, a small graph database is implemented using the Neo4j system, and graph databases are discussed in more detail. For additional information on graph databases, see [6]. This paper primarily discusses graph databases; other NoSQL types are not covered.

A deductive database uses rules to produce new pieces of knowledge based on facts stored in the database. The following definition is given in [11]: “A deductive DB D is a triple D = (F, DR, IC), where F is a finite set of ground facts, DR a finite set of deductive rules, and IC a finite set of integrity constraints. The set F of facts is called the extensional part of the DB (EDB), and the sets DR and IC together form the so-called intensional part (IDB)”. A small example is borrowed from [11]:

Facts
Father(John, Tony)
Mother(Mary, Bob)
Father(Peter, Mary)

Deductive Rules
Parent(x,y) ← Father(x,y)
Parent(x,y) ← Mother(x,y)
GrandMother(x,y) ← Mother(x,z) ∧ Parent(z,y)
Ancestor(x,y) ← Parent(x,y)
Ancestor(x,y) ← Parent(x,z) ∧ Ancestor(z,y)
Nondirect-anc(x,y) ← Ancestor(x,y) ∧ ¬Parent(x,y)

Integrity Constraints
IC1(x) ← Parent(x,x)
IC2(x) ← Father(x,y) ∧ Mother(x,z)

Thus, there are three facts and several rules that define different relationships (parent, grandmother, etc.). The two integrity constraints prevent a person from being his or her own parent and from being both a mother and a father at the same time. For more information on deductive databases, see [11] or [14].
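To make the roles of F, DR and IC concrete, the example above can be evaluated bottom-up by hand. The following is a minimal sketch, assuming plain Python sets of tuples as relations and hand-translated rules. It does not use a Datalog engine, and the function names evaluate and violations are hypothetical. Negation in Nondirect-anc is handled by computing it only after Parent and Ancestor are complete (stratified evaluation).

```python
# Naive bottom-up evaluation of the family example D = (F, DR, IC).
# Sketch only: each relation is a Python set of tuples and each rule is
# hand-translated into a set comprehension (no Datalog engine is used).

# Extensional part (EDB): the ground facts F.
father = {("John", "Tony"), ("Peter", "Mary")}
mother = {("Mary", "Bob")}


def evaluate(father, mother):
    """Derive the intensional relations (IDB) from the given facts."""
    # Parent(x,y) <- Father(x,y);  Parent(x,y) <- Mother(x,y)
    parent = father | mother
    # GrandMother(x,y) <- Mother(x,z) AND Parent(z,y)
    grandmother = {(x, y) for (x, z) in mother for (z2, y) in parent if z == z2}

    # Ancestor: start from the base rule Ancestor <- Parent, then apply the
    # recursive rule Ancestor(x,y) <- Parent(x,z) AND Ancestor(z,y)
    # until no new tuple appears.
    ancestor = set(parent)
    while True:
        step = {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if step <= ancestor:  # fixpoint reached
            break
        ancestor |= step

    # Nondirect-anc uses negation, so it is computed only after Ancestor
    # and Parent are complete (stratified evaluation).
    nondirect_anc = ancestor - parent
    return {"Parent": parent, "GrandMother": grandmother,
            "Ancestor": ancestor, "Nondirect-anc": nondirect_anc}


def violations(father, mother, idb):
    """Return the individuals x for which IC1(x) or IC2(x) can be derived."""
    ic1 = {x for (x, y) in idb["Parent"] if x == y}             # IC1(x) <- Parent(x,x)
    ic2 = {x for (x, _) in father} & {x for (x, _) in mother}   # IC2(x) <- Father(x,y) AND Mother(x,z)
    return {"IC1": ic1, "IC2": ic2}


idb = evaluate(father, mother)
print(sorted(idb["Ancestor"]))
# [('John', 'Tony'), ('Mary', 'Bob'), ('Peter', 'Bob'), ('Peter', 'Mary')]
print(sorted(idb["Nondirect-anc"]))     # [('Peter', 'Bob')]
print(violations(father, mother, idb))  # {'IC1': set(), 'IC2': set()}
```

Ancestor(Peter, Bob) is the only tuple that needs the recursive rule. It follows from Father(Peter, Mary) and Mother(Mary, Bob). Because Peter is not Bob's parent, it also appears in Nondirect-anc. No integrity constraint is violated by the three facts.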
Some features that first appeared in deductive databases are used in SQL today, for example, recursive queries. Books that cover databases in general include [1], [2], [3], [5] and [12]; other books are available as well.
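The remark about recursive queries can be illustrated by writing the Ancestor rules from the example above as a recursive SQL query. The following is a minimal sketch that assumes SQLite, accessed through Python's standard sqlite3 module, as the SQL engine. The table name parent and its columns x and y are hypothetical choices made for this illustration.

```python
import sqlite3

# In-memory database holding the Parent relation from the family example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent (x TEXT, y TEXT)")
conn.executemany(
    "INSERT INTO parent VALUES (?, ?)",
    [("John", "Tony"), ("Mary", "Bob"), ("Peter", "Mary")],
)

# WITH RECURSIVE mirrors the two Ancestor rules:
#   Ancestor(x,y) <- Parent(x,y)                    (initial SELECT)
#   Ancestor(x,y) <- Parent(x,z) AND Ancestor(z,y)  (recursive SELECT)
# UNION (rather than UNION ALL) discards rows already produced,
# so the evaluation stops once no new (x, y) pair appears.
ANCESTOR_SQL = """
WITH RECURSIVE ancestor(x, y) AS (
    SELECT x, y FROM parent
    UNION
    SELECT p.x, a.y FROM parent AS p JOIN ancestor AS a ON p.y = a.x
)
SELECT x, y FROM ancestor ORDER BY x, y
"""

rows = conn.execute(ANCESTOR_SQL).fetchall()
print(rows)
# [('John', 'Tony'), ('Mary', 'Bob'), ('Peter', 'Bob'), ('Peter', 'Mary')]
conn.close()
```

The join condition p.y = a.x plays the role of the shared variable z in the Datalog rule.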
In the next section of this paper, graph databases are defined. Then the deductive graph database is presented and a few Datalog queries are written. Afterward, a discussion is given and the conclusion is presented.

II. GRAPH DATABASES

Unlike relational databases, which store data in tables, graph databases use nodes and relationships between nodes. Storing data in this way has certain benefits, and it is more natural for large amounts of interconnected data (for example, in social network analysis). Still, neither graph databases nor relational databases are always the better choice; it depends on one’s needs.

Nodes and relationships have properties. Below, we define a small graph database (the Neo4j system is used for implementation purposes):

CREATE (p1:Course {name: "Mathematics", ects: 7}),
       (p2:Course {name: "Informatics"}),
       (p3:Course {name: "Programming I", ects: 7}),
       (p4:Course {name: "Databases I"}),
       (p5:Course {name: "Databases II", ects: 6}),
       (p6:Course {name: "Data warehouses I", ects: 5}),
       (p1)-[:PREREQ]->(p3),
       (p1)-[:PREREQ]->(p4),
       (p2)-[:PREREQ]->(p4),
       (p4)-[:PREREQ]->(p5),
       (p5)-[:PREREQ]->(p6)

The visual interpretation of the database defined above is shown below:

Figure 1. Graph database

This example stores data about courses and their prerequisites. To list courses and their prerequisites (first level), it is enough to read the graph database. Cypher is used as the language for reading data from the database. The MATCH clause starts the query. It finds two courses connected by a relationship called PREREQ, and their names are returned in the result:

MATCH (n:Course)-[:PREREQ]->(m:Course)
RETURN n.name, m.name

n.name         m.name
Mathematics    Programming I
Mathematics    Databases I
Informatics    Databases I
Databases I    Databases II
Databases II   Data warehouses I

On the second level (as well as on any other level), it is enough to reread the graph database. Now we are looking for three nodes and two relationships between them:

MATCH (n:Course)-[:PREREQ]->(m:Course)-[:PREREQ]->(o:Course)
RETURN n.name, o.name

n.name         o.name
Mathematics    Databases II
Informatics    Databases II
Databases I    Data warehouses I
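Each MATCH pattern above fixes the number of PREREQ hops, so every additional level requires a new query. The following minimal Python sketch makes that level-by-level reading explicit. It assumes the graph created above has been copied into an in-memory dictionary. PREREQ and level are hypothetical names, and no Neo4j API is used.

```python
# Hypothetical in-memory copy of the PREREQ edges created in Neo4j above:
# each key is a course, each value lists the courses it is a prerequisite for.
PREREQ = {
    "Mathematics": ["Programming I", "Databases I"],
    "Informatics": ["Databases I"],
    "Databases I": ["Databases II"],
    "Databases II": ["Data warehouses I"],
}


def level(k):
    """Return sorted (start, end) pairs joined by a path of exactly k PREREQ edges (k >= 1).

    k = 1 corresponds to (n)-[:PREREQ]->(m), k = 2 to the three-node pattern, and so on.
    Using a set means a pair reachable by several paths is reported once, whereas a
    Cypher RETURN without DISTINCT yields one row per matched path.
    """
    frontier = {(start, start) for start in PREREQ}
    for _ in range(k):
        # Extend every partial path by one outgoing PREREQ edge.
        frontier = {(start, nxt)
                    for (start, node) in frontier
                    for nxt in PREREQ.get(node, [])}
    return sorted(frontier)


print(level(2))
# [('Databases I', 'Data warehouses I'), ('Informatics', 'Databases II'), ('Mathematics', 'Databases II')]
```

Here the depth k is a parameter. For example, level(3) returns the Informatics and Mathematics pairs ending at Data warehouses I, which matches the four-node query that follows. In Cypher as used here, each depth is written out as a separate pattern.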
Now we are looking for four nodes and three relationships between them:
MATCH (n:Course)-[:PREREQ]->(m:Course)-[:PREREQ]->(o:Course)-[:PREREQ]->(p:Course)
RETURN n.name, p.name

n.name         p.name
Mathematics    Data warehouses I
Informatics    Data warehouses I

The first problem is that no recursion is supported in Neo4j, so we face the same problem SQL had before recursion was added to the SQL standard. To merge all the queries listed above and find all courses and their prerequisites, we have to use the UNION clause (the result is obvious):

MATCH (n:Course)-[:PREREQ]->(m:Course)
RETURN n.name AS c, m.name AS p
UNION
MATCH (n:Course)-[:PREREQ]->(m:Course)-[:PREREQ]->(o:Course)
RETURN n.name AS c, o.name AS p
UNION
MATCH (n:Course)-[:PREREQ]->(m:Course)-[:PREREQ]->(o:Course)-[:PREREQ]->(p:Course)
RETURN n.name AS c, p.name AS p

Datalog (deductive databases), however, supports recursive queries. These could be used on a graph database to find courses and their prerequisites more easily. Deductive databases also provide views: once a view is defined, queries can be posed much more easily. Furthermore, hypothetical queries are supported in Datalog as well. Because Cypher has these limitations when querying data, deductive graph databases are defined in the next section. That section also shows how Datalog can be used to work with the data.

III. DEDUCTIVE GRAPH DATABASE

As previously noted, deductive databases use rules to produce new pieces of information based on the facts stored in the database. The definition was given above, so we know which components such a database should contain. In graph databases, facts are stored in nodes and relationships.

In conventional deductive databases, facts are stored in tables, and for now this is the crucial distinction. To use Datalog on graph databases, the facts stored as nodes and relationships have to be represented in a form from which new pieces of information can be derived. So, let us transform the graph database into a set of facts so that Datalog rules can be applied. Nodes can be translated easily (one fact is added for each node):

course(mathematics, 7).
course(informatics, null).
course('programming I', 7).
course('databases I', null).
course('databases II', 6).
course('data warehouses I', 5).

Keep in mind, however, that nodes do not have to have the same properties. Here, not every course has a number of ECTS points, so null is used where the value is missing. Relationships are stored as facts as well:

prereq(mathematics, 'programming I').
prereq(mathematics, 'databases I').
prereq(informatics, 'databases I').
prereq('databases I', 'databases II').
prereq('databases II', 'data warehouses I').

Regarding the translation, one may ask why it should be performed at all. The answer may be surprising. Graph databases use different methods to store data [6]: “Some graph databases use native graph storage that is optimized and designed for storing and managing graphs. Not all graph database technologies use native graph storage however. Some serialize the graph data into a relational database, an object-oriented database, or some other general-purpose data store.”

Thus, the underlying storage model can rely on relational databases as well as on other types of data stores. This supports the idea of translating the graph database into a set of facts that can be stored in tables so that Datalog can be used on such a database. For some graph databases, this may even be a native solution. Now that we have the set of facts, we can add two rules:

comes_before(X,Y) :- prereq(X,Y).
comes_before(X,Y) :- prereq(X,Z), comes_before(Z,Y).

Course X comes before course Y if X is a prerequisite for Y, or if X is a prerequisite for some course Z and Z comes before Y. For testing purposes, the Datalog Educational System (DES) was used:

DES> comes_before(X,Y).
{
  comes_before('databases I','data warehouses I'),
  comes_before('databases I','databases II'),
  comes_before('databases II','data warehouses I'),
  comes_before(informatics,'data warehouses I'),
  comes_before(informatics,'databases I'),
  comes_before(informatics,'databases II'),
  comes_before(mathematics,'data warehouses I'),
  comes_before(mathematics,'databases I'),
  comes_before(mathematics,'databases II'),
  comes_before(mathematics,'programming I')
}
Info: 10 tuples computed.

We see that Datalog rules (and recursion) can be used to determine courses and their prerequisites. In Cypher, one has to use the UNION clause (as shown above), which is what people had to do in SQL a decade ago. So, for now, Datalog can be used to solve this problem, just as it was used to solve the problems SQL had before recursion was supported.

Using Datalog has other benefits as well. Let us look at the following rule:

tough(X) :- course(X,Y), Y>6.

This rule is added to the database and executed as a query; such rules are called views. As a result, we obtain all courses with more than six ECTS points, which are considered tough:

DES> tough(A).
{
  tough(mathematics),
  tough('programming I')
}
Info: 2 tuples computed.

Thus, it is easy to use views in Datalog. More importantly, Neo4j does not support views as such.

Now let us assume that our graph database contains information about people and a KNOWS relationship (A KNOWS B means that person A knows person B). We could translate that database into the following set of facts:

person(tom).
person(john).
person(mary).
person(peter).
knows(tom,john).
knows(john,mary).

It is commonly observed that if A is a friend of B, and B is a friend of C, A is likely to become a friend of C as well in the future. In Datalog, this is easy to express:

fknow(A,C) :- knows(A,B), knows(B,C).

DES> fknow(A,B).
{
  fknow(tom,mary)
}
Info: 1 tuple computed.

We see that Tom and Mary are likely to become friends in the future.

One could also use hypothetical queries for such purposes: if A knew B, would A be likely to become a friend of C?

Papers that deal with queries in graph databases include [4], [10] and [13]. This research was inspired by [7] and [15].

CONCLUSION

Deductive databases were used in the past, and some of their methods were so interesting that they found their way into other fields as well. For example, recursive queries in SQL have their roots in deductive databases. This paper showed how recursion and views can be used on a graph database (perhaps one day graph databases will support recursion), and it mentioned hypothetical queries as a further option.

In this paper, the idea of a deductive graph database has been presented. A few examples showed how Datalog can be used to work with the data. They also showed that views and recursion both simplify the process of posing queries.

In the future, one should look at how to automatically transform nodes and relationships into a form on which Datalog can be used directly. Furthermore, materialized views could be considered, as well as hypothetical queries, which were only mentioned in this paper.

REFERENCES

[1] C. Date, An Introduction to Database Systems. Boston, USA: Addison-Wesley, 2004.
[2] C. Date, Database Design and Relational Theory: Normal Forms and All That Jazz. Sebastopol, USA: O’Reilly Media, 2012.
[3] E. Redmond and J. R. Wilson, Seven Databases in Seven Weeks. Dallas, USA: Pragmatic Programmers, 2012.
[4] G. Butler, L. Chen, X. Chen and L. Xu, Diagrammatic Queries and Graph Databases, Workshop on Managing and Integrating Biochemical Data. Retrieved March 15, 2015, from http://users.encs.concordia.ca/~gregb/home/PDF/eml-digrammatic.pdf
[5] H. Garcia-Molina, J. Ullman and J. Widom, Database Systems: The Complete Book. London, UK: Pearson Education, 2009.
[6] I. Robinson, J. Webber and E. Eifrem, Graph Databases. Sebastopol, USA: O’Reilly Media, 2013.
[7] K. Rabuzin, Deductive Data Warehouses, International Journal of Data Warehousing and Mining, vol. 10(1), 2014.
[8] K. Rabuzin, SQL – napredne teme [SQL: Advanced Topics]. Čakovec, Croatia: Zrinski, 2014.
[9] K. Rabuzin, Uvod u SQL [Introduction to SQL]. Čakovec, Croatia: Zrinski, 2011.
[10] K. T. Yar and K. M. L. Tun, Predictive Analysis of Personnel Relationship in Graph Database, International Journal of Engineering Research & Technology, vol. 3(9), 2014. Retrieved March 12, 2015, from http://www.ijert.org/view-pdf/11117/predictive-analysis-of-personnel-relationship-in-graph-database
[11] M. Piattini and O. Diaz, Advanced Database Technology and Design. Boston, USA: Artech House, 2000.
[12] N. W. Paton, Active Rules in Database Systems. New York, USA: Springer, 1998.
[13] P. T. Wood, Graph Views and Recursive Query Languages, BNCOD, 1990, pp. 124-141.
[14] R. M. Colomb, Deductive Databases and Their Applications. London, UK: Taylor & Francis, 2005.
[15] V. Nigam, L. Jia, A. Wang, B. T. Loo and A. Scedrov, An Operational Semantics for Network Datalog. Retrieved March 10, 2015, from https://www.andrew.cmu.edu/user/liminjia/research/papers/ndlogsemans-tr.pdf
