Kornelije Rabuzin
Faculty of organization and informatics
University of Zagreb
Varazdin, Croatia
kornelije.rabuzin@foi.hr
Abstract—In recent years many NoSQL systems, including graph databases, have become available. For large amounts of interconnected data, these systems represent a good choice. Deductive databases have been used to deduce new pieces of information from a database that contains large amounts of data. It is important to keep in mind, however, that such databases were mostly relational, i.e., relations were used to store the data upon which deductive mechanisms operated. In this paper, deductive graph databases are proposed. In deductive graph databases, data are stored in a graph database, and Datalog is used for reasoning purposes on a relational representation of the graph database.

Keywords—databases, SQL, NoSQL, graph databases, Datalog

I. INTRODUCTION

The relational data model has been widely used for the past 40 years. Many databases have been implemented in order to store large amounts of important data. Dr. Codd’s vision of storing data in relations turned out to be crucial and, because of its ideal properties, the relational data model has survived. Although the term “relation” is used in the theory, users who work with databases on a daily basis usually say that a database is, in fact, a set of tables. In order to implement a database, a database management system (DBMS) is required. The rich query interface that a DBMS supports, Structured Query Language (SQL), can be used to work with databases. The ability to store and efficiently manage large amounts of data has turned database management systems into significant parts of many applications and information systems developed over time.

SQL is a standardized language that is used to work with databases. All database management system vendors support SQL, which is declarative and typically not complex. However, queries sometimes do get quite complex. Furthermore, different database management systems do not support all statements in the same form, and some differences may exist. For more information on SQL, see [8] and [9].

In recent years, however, the NoSQL movement has become popular. The relational data model is starting to reveal its weaknesses, and, for some problems, new solutions have to be found. The amounts of data that have to be stored today can exceed the capabilities of relational databases. Furthermore, a fixed database schema is not always an option. Because of this, many NoSQL systems have been developed, and, generally speaking, we distinguish:

• Document oriented databases
• Column oriented databases
• Key value databases
• Graph databases

Graph databases store information in nodes and relationships. Nodes do not all have to contain the same number of properties (attributes), and the same applies to the relationships between nodes. For large amounts of interconnected data, graph databases represent a good choice. They are especially suitable for social network analysis. In the next section a small graph database is implemented (the Neo4j system is used) and graph databases are discussed in more detail. For additional information on graph databases see [6]. In this paper, we primarily discuss graph databases (the other types are not discussed).

A deductive database uses rules to produce new pieces of knowledge based on facts, which are stored in the database. The following definition can be found in [11]: “A deductive DB D is a triple D = (F, DR, IC), where F is a finite set of ground facts, DR a finite set of deductive rules, and IC a finite set of integrity constraints. The set F of facts is called the extensional part of the DB (EDB), and the sets DR and IC together form the so-called intensional part (IDB)”. A small example is borrowed from [11]:

Facts
Father(John, Tony)
Mother(Mary, Bob)
Father(Peter, Mary)

Deductive Rules
Parent(x,y) ← Father(x,y)
Parent(x,y) ← Mother(x,y)
GrandMother(x,y) ← Mother(x,z) ∧ Parent(z,y)
Ancestor(x,y) ← Parent(x,y)
Ancestor(x,y) ← Parent(x,z) ∧ Ancestor(z,y)
Nondirect-anc(x,y) ← Ancestor(x,y) ∧ ¬Parent(x,y)
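To make the example concrete, the following is a minimal Python sketch of naive bottom-up evaluation of the facts and rules above. It is only an illustration of how the IDB relations follow from the EDB, not a description of how any particular Datalog system works internally; the function name `evaluate` and the lowercase spelling of the constants are assumptions introduced here.

```python
# Facts (EDB) from the example, stored as sets of (x, y) tuples.
father = {("john", "tony"), ("peter", "mary")}
mother = {("mary", "bob")}


def evaluate(father, mother):
    """Naive bottom-up evaluation of the example's deductive rules."""
    # Parent(x,y) <- Father(x,y);  Parent(x,y) <- Mother(x,y)
    parent = father | mother

    # GrandMother(x,y) <- Mother(x,z) AND Parent(z,y)
    grandmother = {(x, y) for (x, z) in mother for (z2, y) in parent if z == z2}

    # Ancestor(x,y) <- Parent(x,y)
    # Ancestor(x,y) <- Parent(x,z) AND Ancestor(z,y)
    # The recursive rule is applied until no new tuple is derived.
    ancestor = set(parent)
    while True:
        derived = {(x, y) for (x, z) in parent for (z2, y) in ancestor if z == z2}
        if derived <= ancestor:
            break
        ancestor |= derived

    # Nondirect-anc(x,y) <- Ancestor(x,y) AND NOT Parent(x,y)
    nondirect_anc = ancestor - parent
    return parent, grandmother, ancestor, nondirect_anc


parent, grandmother, ancestor, nondirect_anc = evaluate(father, mother)
print(sorted(ancestor))
print(sorted(nondirect_anc))
```

With the three facts given, Peter is derived as a non-direct ancestor of Bob (through Mary), and no GrandMother tuple is derived, because Bob, the only child of a mother in F, has no children.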
MATCH (n:Course)-[:PREREQ]->(m:Course)-[:PREREQ]->(o:Course)-[:PREREQ]->(p:Course)
RETURN n.name, p.name

n.name         p.name
Mathematics    Data warehouses I
Informatics    Data warehouses I

course(mathematics, 7).
course(informatics, null).
course('programming I', 7).
course('databases I', null).
course('databases II', 6).
course('data warehouses I', 5).
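The MATCH pattern shown earlier follows exactly three PREREQ relationships, so it returns only pairs of courses at that distance. The Python sketch below contrasts such a fixed-length pattern with a recursive definition evaluated to a fixpoint. The `prereq` edge set is an assumption made purely for illustration (the edges themselves are not listed in this text); it was chosen so that the three-hop pairs agree with the result table above. The helper names `join`, `three_hops` and `closure` are likewise hypothetical.

```python
# ASSUMED prerequisite edges (course -> course that builds on it).
# Hypothetical data for illustration only; not taken from the paper.
prereq = {
    ("mathematics", "programming I"),
    ("mathematics", "databases I"),
    ("informatics", "databases I"),
    ("databases I", "databases II"),
    ("databases II", "data warehouses I"),
}


def join(left, right):
    """Compose two edge sets: (a, c) whenever (a, b) in left and (b, c) in right."""
    return {(a, c) for (a, b) in left for (b2, c) in right if b == b2}


def three_hops(edges):
    """Pairs connected by exactly three edges, like the fixed MATCH pattern."""
    return join(join(edges, edges), edges)


def closure(edges):
    """Pairs connected by one or more edges (a recursive rule run to a fixpoint)."""
    result = set(edges)
    while True:
        derived = join(edges, result)
        if derived <= result:
            return result
        result |= derived


print(sorted(three_hops(prereq)))
print(len(closure(prereq)))
```

The fixed-length function finds only the pairs that are exactly three steps apart, whereas the fixpoint loop finds pairs at any distance, which is what a recursive rule expresses in a single definition.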
comes_before(informatics,'databases I'),
comes_before(informatics,'databases II'),
comes_before(mathematics,'data warehouses I'),
comes_before(mathematics,'databases I'),
comes_before(mathematics,'databases II'),
comes_before(mathematics,'programming I')
}
Info: 10 tuples computed.

We see that Datalog rules (and recursion) can be used to determine courses and their prerequisites. In Cypher, one has to use the UNION clause (as was shown above), and this is something that people were doing in SQL a decade ago. So, for now, Datalog can be used to solve such problems, just as it was used to solve the problems that SQL had before recursion was supported.

Using Datalog has other benefits as well. Let us look at the following rule:

tough(X):-course(X,Y), Y>6.

This is, in fact, a rule that is added to the database and executed as a query. Such rules are called views. As a result, we obtain all the courses that have more than six ECTS points, which are considered to be tough:

DES> tough(A).
{
tough(mathematics),
tough('programming I')
}
Info: 2 tuples computed.

Thus, it is easy to use views in Datalog. More importantly, views are not supported in Neo4j as such.

Now let us assume that our graph database contains information about people and the relationship KNOWS (“knows” means that person A knows person B). We could translate that database into the following set of facts:

person(tom).
person(john).
person(mary).
person(peter).
knows(tom,john).
knows(john,mary).

It is a well-known observation that if A is a friend of B, and B is a friend of C, then it is likely that A will become a friend of C in the future as well. In Datalog, this is easy to express:

fknow(A,C):- knows(A,B),knows(B,C).

DES> fknow(A,B).
{
fknow(tom,mary)
}
Info: 1 tuple computed.

We see that it is likely that Tom and Mary will become friends in the future.

One could also use hypothetical queries for such purposes: if A knew B, would A be likely to become a friend of C?

Interesting papers that deal with queries in graph databases are [4], [10] and [13]. This research was inspired by [7] and [15].

CONCLUSION

Deductive databases were used in the past; some methods used in deductive databases were so interesting that they found their way into other fields as well. For example, recursive queries in SQL have roots in deductive databases. In this paper, it is shown how recursion, as well as views and hypothetical queries, can be used on a graph database (perhaps one day graph databases will support recursion themselves).

In this paper, the idea of a deductive graph database has been presented. A few examples showed how Datalog could be used to work with the data. Views proved useful, as did recursion, which simplifies the process of posing queries.

In the future, one should look at how to automatically transform nodes and relationships into a form upon which Datalog could be used directly. Furthermore, materialized views could be considered, as well as hypothetical queries, which were only mentioned in this paper.

REFERENCES

[1] C. Date, An Introduction to Database Systems. Boston, USA: Addison-Wesley, 2004.
[2] C. Date, Database Design and Relational Theory: Normal Forms and All That Jazz. Sebastopol, USA: O’Reilly Media, 2012.
[3] E. Redmond and J. R. Wilson, Seven Databases In Seven Weeks. Dallas, USA: Pragmatic Programmers, 2012.
[4] G. Butler, L. Chen, X. Chen, and L. Xu, Diagrammatic Queries and Graph Databases, Workshop on Managing and Integrating Biochemical Data, Retrieved March 15, 2015, from http://users.encs.concordia.ca/~gregb/home/PDF/eml-digrammatic.pdf
[5] H. Garcia-Molina, J. Ullman and J. Widom, Database Systems: The Complete Book. London, UK: Pearson Education, 2009.
[6] I. Robinson, J. Webber, E. Eifrem, Graph Databases. Sebastopol, USA: O'Reilly Media, 2013.
[7] K. Rabuzin, Deductive Data Warehouses, International Journal of Data Warehousing and Mining, vol. 10(1), 2014.
[8] K. Rabuzin, SQL – napredne teme, Čakovec, Croatia: Zrinski, 2014.
[9] K. Rabuzin, Uvod u SQL, Čakovec, Croatia: Zrinski, 2011.
[10] K. T. Yar and K. M. L. Tun, Predictive Analysis of Personnel Relationship in Graph Database, International Journal of Engineering Research & Technology, vol. 3(9), 2014. Retrieved March 12, 2015, from http://www.ijert.org/view-pdf/11117/predictive-analysis-of-personnel-relationship-in-graph-database
[11] M. Piattini and O. Diaz, Advanced Database Technology and Design. Boston, USA: Artech House, 2000.
[12] N. W. Paton, Active Rules in Database Systems. New York, USA: Springer, 1998.
[13] P. T. Wood, Graph Views and Recursive Query Languages, BNCOD, 1990, pp. 124-141.
[14] R. M. Colomb, Deductive Databases and Their Applications. London, UK: Taylor & Francis, 2005.
[15] V. Nigam, L. Jia, A. Wang, B. T. Loo and A. Scedrov, An Operational Semantics for Network Datalog, Retrieved March 10, 2015, from https://www.andrew.cmu.edu/user/liminjia/research/papers/ndlogsemans-tr.pdf