You are on page 1of 9

(JPMNT) Journal of Process Management – New Technologies, International

Vol. 5, No 2, 2017.

RELATIONAL DATABASE AND GRAPH DATABASE: A


COMPARATIVE ANALYSIS

Surajit Medhi 1 , Hemanta K. Baruah2


1
Department of Computer Science, Gauhati University, Assam, India
2
Bodoland University, Assam, India
surajitmdh@gmail.com; hemanta_bh@yahoo.com

Original Scientific Paper


doi:10.5937/jouproman5-13553

Abstract: Since 1970, relational be represented and visualized using the


database models have been in use for storing, graph database Neo4j [2]
manipulating and retrieving data. The importance
of relational databases is going to decrease due to 1.1 Relational Database
the exponential growth of data as it is difficult to
work with large number of joining tables. For such Relational database is a collection of
kinds of problems, one of the best solutions is to
use graph database for storing data. The graph data which are stored in a tabular form the
database can be used to store highly connected organization of which is based on the
data. In this article, we are going to put forward a relational model proposed by E. F. Codd in
comparison between relational database and graph 1970. In Relational database, each table
database with reference to an experiment contain rows and columns, rows
performed.
representing records and columns
Keywords and Phrases: Graph, Neo4j, MySQL, representing attributes.
cypher query language
Let us suppose that we want to
maintain a database that contains
information of trains of a railway system
1. INTRODUCTION
in a particular year. The information about
In recent years, the importance of trains, pilots, and assignments of pilots to
storing and analysing data in the form of trains are maintained by this database.
graph has been increasing. Graph database Every train has a unique number, a source
is now used in social networks, city, a destination city, an arrival time and
recommendation systems, biological a departure time. Each pilot has a unique
network, web graph etc. and these graphs employee identity, a name, an address,
are highly interconnected. In relational experience in years, etc. Pilots are assigned
database, data are stored in tabular form. to drive certain trains on particular days of
For less connected or static data, relational the year. First, from the description of the
database is perfect, but for highly database, we would identify the data
connected data such as the network of objects and the relationships. Data objects
hyperlinks in the World Wide Web, it is are train and pilot. Relationship is assigned
complex for representing in relational in the sense that different pilots can be
database [1]. A similar kind of problem assigned to different trains. After that we
arises when we want to model the social proceed to identify the attributes and the
networks like Facebook, Twitter etc. It has primary key for a particular data object.
been accepted that the web data can also

1
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

So, the attributes for trains are train which allows storing the particular objects
number, source city, destination city, with relation between them. A graph can
arrival time and departure time. Train be expressed as G = (V, E) where V is the
number is the primary key. The attributes set of vertices (nodes) and E is the set of
for pilots are employee identity, name, edges (relations). The data structure such
address, experience year. Employee as adjacency matrix, incidence matrix,
identity is the primary key. For the adjacency list and linked list are used to
relationship “assigned to”, the attributes store the graph. The Resource Description
are the primary key of train, pilot and date. Framework (RDF) is a popular
representation for graph data, and it is used
Now we can use relational tables to as a model by some graph databases even
represent the data object and the though the primary objective of RDF was
relationship with their attributes. The train not really representation of graphs [4].
table contains the above mentioned
attributes for trains, the pilot table contains In recent years, some graph databases
the above mentioned attributes for pilots such as DEX, Infinite Graph, Neo4j,
and there is one more table for the Orient DB, Titan etc. have been
relationship assigned to attributes like train developed, which are currently used in
number, employee identity and date. operations for storing data over a
traditional relational model. A database
system Big Table has been created and
1.2 Graph Database used by Google [5].

Before going to discuss different types


of graph databases, we have to know the 1.3 Neo4j
NOSQL. NOSQL is a family of database
and graph database is a part of NOSQL Neo4j is an open source graph
database. NOSQL databases are optimized database which is implemented in Java and
for handling very large data in which developed by Neo technology [11]. It uses
elements are not closely related, and graph structure with nodes, edges and
therefore expensive joins are not needed. properties to store data. It provides index
In NOSQL large volumes of data are free adjacency, which means that each
stored efficiently. There are different types node is a pointer. It also supports the
of NOSQL databases [3]. These are: key labelled property graph model. A labelled
value databases, column family stores property graph model has a number of
databases, document oriented databases characteristics. It contains nodes and
and graph databases. In this article, we are relationships where properties and nodes
going to focus on graph database. Every can be labelled with more than one label.
graph database is a database which uses In this model, if we want to model a
graph structures with nodes, edges and relationship among friends then we have to
properties to represent and to store data. A create different nodes for friends and
graph model is used in graph database connect the friends with edges.

2
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

Fig. 1: Labelled Property Graph Model

In the graph model [11] shown above, the huge static graphs and has used
there are three nodes and five relationships geometry based technique to overcome
and a label. User is the label. Ruth, Billy this problem. Hong et. al. [9] used the
and Harry are the nodes and the property is graph database for subgraph matching with
name for the nodes and ‘follows’ is the set similarity. Sakr and Al-Naymat
Al [10]
relationship.
p. Here, Ruth follows Billy, have proposed efficient relational
which means that there is a connection techniques for
or processing graph queries.
between these two nodes using a directed They have used two kinds of relational
edge, and the edge has a property called mapping schemes for storing graph
‘follows”. In Neo4j, the edges are database such as Vertex-Edge
Vertex mapping
bidirectional as we can see in the model in and Edge-Edge
Edge mapping. In Vertex-Edge
Vertex
Fig. 1. Neo4j uses the cyphercyph query mapping, they have used two tables:
language for creating nodes, relationships Vertices (graph ID, Vertex ID, Vertex
with properties and also for retrieving, Label)) and Edges (graph ID, s-vertex,
s d-
searching and manipulating data. vertex and edge label) In Edge-Edge
Edge
mapping they have used only one table for
2. Related Work representing the graph. The table contains
Many researchers are working on Edge (GID, e-ID, e-label,
label, s-VID,
s sv-Label,
graph database in different fields such as Dvid, dv-Label).
computer science, chemistry, biology,
bio 3. Relational Databases vs. Graph
electronics etc. Balaur et. al. [6] have Databases
applied graph database technologies for
managing comprehensive genome scale Relational databases
database were originally
networks. In their work, they have designed in tabular structure. Modelling
visualized an acid metabolic network relationships are not quite suitable in
based on the cypher query language.
language Batra relational databases. It is so because they
[7] also compared the relational database use foreign keys to relate one information
and graph database based on some with another. In relational databases, tables
evaluation parameters such as level of are needed to be joined using foreign keys
support or maturity, security and to connect information from different
flexibility. Ching-Yung
Yung Lin [8] has tables. If we want to join many tables for
worked on graph computing and linked big querying, it may not be optimal to deal
data, and has faced challenges to visualize with.

3
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

Table 1 Table 2

Employees Relationships

Emp_id Emp_name Emp_id Relation_id

E1 Raj E1 R3

E2 Hirak E1 R4

E3 Bimal E2 R3

E4 John E3 R2

E3 R4

E4 R3

Here, employees and relationships are relational databases. But in our example,
shown in two tables and the relationship however the relations are undirected, so
table represents the relation between that emp_id ‘E3’ and relation_id’ R4’ are
employees. An id has been assigned to the same as emp_id ‘E4’ and relation_id
every employee in Table 1. An emp_id in “R3”.
the employees’ table is referenced as either
emp_id or a relation_id; both the emp_id Now, suppose we want to make
and relation_id columns in relationship simple queries such as to find out what
table refer ids in the Employees’ table. For Bimal’s relationships are in the relational
example, Bimal has emp_id ‘E3’ is a model. First, the query looks at the
relation to Raj and Hirak who has emp_id employees’ table, finds which emp_id is
‘E1’ and ‘E2’ respectively. assigned to Bimal, takes that emp_id to the
relationships table, finds all relation_ids
In the employees’ table, emp_id is the that are assigned to Bimal’s emp_id, and
primary key and both emp_id and then those relation_ids are returned back to
relation_id are used as composite keys in the employees’ table to find the emp_name
the relationship table. A composite key associated to those relation_ids. This type
needs both of the id’s to be a unique of a query is still possible with relational
combination of values. For example, model, but it is computationally expensive.
emp_id ‘E1’, relation_id ‘R3’ and emp_id
‘E1’, relation_id ‘R4’ are not same In contrast, for the same type of a
composite value. If directed relationships model, if we use a graph database such as
are needed to be modelled, the composite Neo4j then Neo4j gives us better result to
value such as emp_id ‘E3’ and realtion_id model the data for above relational model.
‘R4’ are completely different from the In Neo4j, the Model is the following:
composite value than emp_id ‘E4’ in the

4
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

Fig. 2: Graph Model for Employee and Relationship

In Neo4j, we can easily design the relational database Mysql version-5.1.0


model and also retrieve the information and Neo4j version-2.0.3. For Mysql PHP
from the graph model which is scripting languages are used to query the
computationally less expensive. Suppose, database and for Neo4j cypher query
we want to make a query: “What are language is used to query the graph
Bimal’s relationships?” We can easily find database.
the result out because Bimal is connected
to Raj and Hirak. So, if we compare the The scheme for the Relational
same query in relational model, then there Database includes the following tables:
would be a lot of joins and it would also be a) Cricketers
computationally expensive.
b) Teams and
4. Implementation and Experimental c) Games
Results
The Cricketers’ Table contains the
We have performed an experiment attributes cric_id and cric_name. Teams’
with a simple dataset of the game of Table contains the attributes team_id,
cricket. The dataset contains the players’ team_name and cric_id. Games’ Table
information, team and which player has contains the attributes cric_id and
played Test Cricket or One Day game_name.
International (ODI) matches or Twenty-
Twenty (T20) matches. We would The scheme for the Graph Database
implement the cricket players’ dataset in includes the following nodes and
relationship:

5
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

Fig. 3: Representation of Cricket Game Dataset for Relationship, Player and


Team

In Fig. 3, for representing the Cricket Dravid, Virat etc. and also set properties
game dataset, we have created different such as cric_id and cric_name. For team
nodes for players, teams, game type and label, we have created nodes such as India,
relationships like “belongs_to” and Australia, Pakistan etc. In this model, we
“plays”. For example, we have created first have created the relationship between the
a label “Cricketer” and then different Cricketer and the Team.
nodes for that label such as Sachin,

Fig. 4: Representation of the Cricket Game Dataset for Relationship, Cricketer and
Game Type.

6
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

In Fig. 4, for representing the cricket Query 3. Find the Cricket players who
game dataset with the relation between belong to Team India.
cricketerr and game, we already have
created different nodes for players, and we In Table - 3 we have shown the query
have added different nodes for game label results in milliseconds (ms) for Mysql and
such as Test, ODI and T20. We have set Neo4j. We have mentioned the number of
properties such as cric_id and game_name objects for the queries in Mysql and Neo4j
also. In this model, we have created the and also have given the time taken for
relationship between the cricketer and different queries in Mysql and Neo4j. We
game. The objective test of evaluation have calculated the time for executing the
including processing speed between Mysql above mentioned queries for 100, 300 and
and Neo4j is based upon a set of queries 400 objects. We have drawn the column
for the above schema. The Queries are as graph for the execution timeti for three
follows: queries in Mysql and Neo4j. It was found
that graph databases are faster than
Query 1. Find the names of teams. relational database with respect to time
taken for executing the complex queries.
Query 2. Find the names of the cricketers Further, we can design the complex
who have played both ODI and Test relationship model easily.
Cricket.

Table - 3: Execution Time for Different Queries for Different Objects

No. of Query1 Query1 Query2 Query2 Query3 Query3


objects (Mysql) (Neo4j) (Mysql) (Neo4j) (Mysql) (Neo4j)

100 12.56 ms 5ms 18.52ms 7.32ms 15.75ms 6 ms

300 153ms 7ms 212.53ms 13ms 180.24ms 9.32ms

400 165.43ms 8.32ms 387.34ms 15.67ms 302.44ms 14.32ms

20
18
16
14
12
10 mysql
8 neo4j
6
4
2
0
query1 query2 query3

Fig. 5: Execution Time


ime for 100 Objects in Column Graph (in Mysql and Neo4j)

7
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

250

200

150
mysql
100 neo4j

50

0
query1 query2 query3

Fig. 6: Execution Time


ime for 300 Objects in Column Graph (in Mysql and Neo4j)

400
350
300
250
200 mysql
150 neo4j
100
50
0
query1 query2 query3

Fig. 7: Execution Time


ime for 400 Objects in Column Graph (in Mysql and Neo4j)

5. Conclusions
REFERENCES
Both relational database and graph
database are useful to represent data. [1] Database Trends and Applications. Available
http://www.dbta.com/Articles/Columns/Notes
http://www.dbta.com/Articles/Columns/Notes-on-
However, if we compare with reference to NoSQL/Graph-Dat
Dat abases
abases-and-the-Value-They-
storing and retrieving highly connected Provide-74544.aspx,2012
data, graph databases appear to give better
results. This means that we can more [2] Surajit Medhi, Visualization of Graph Models
for Web Document in Neo4j, International Journal
easily design and retrieve the results of of Energy, Information and Communications,
queries if we use graph databases instead Vol.7, Issue 6, 2016
of using relational databases. When we
want to add a new relationship to graph [3] Gabriele Pozzani, Introduction to NoSQL,
profs.sci.univr.it/~pozzani/attachments/nosql%20
profs.sci.univr.it/~pozzani/attachments/nosql%20-
database, we do not need to restructure the
%2001%20-%20introduction.pdf,
%20introduction.pdf, April 24, 2013
database again. As we can see from our
experiment, we get better er results for [4] Jonathan Hayes, A Graph Model for RDF R
executing the predefined queries in graph (Diploma thesis), Technische Universit¨at
Darmstadt Universidad de Chile, 2004.
databases than in relational database with
respect to time. The retrieval times for [5] R. Angles and C. Gutierrez, Survey of Graph
quires give us a conclusion that graph Database Models, ACM Comput. Surv.
Surv 40(1):1–39,
database is suitable to use in commercial 2008.
purposes such as developing social [6] Irina Balaur, Alexander Mazein, Mansoor Saqi,
network, stock market, recommendation Artem Lysenko, Christopher J. Rawlings and
engine and network management. Charles Auffray, Recon2neo4j:
Recon2neo4j Applying Graph
Data Based Technology for Managing

8
www.japmnt.com
(JPMNT) Journal of Process Management – New Technologies, International
Vol. 5, No 2, 2017.

Comprehensive Genome – Scale Networks, [9] Liang Hong, Lei Zou, Xiang Lian, Philip S. Yu,
Bioinformatics, 2017, 1–3. Subgraph Matching with Set Similarity in a Large
. Graph Database, IEEE Transactions on Knowledge
[7] Shalini Batra and Charu Tyagi, Comparative and Data Engineering, 1041-4347, 2015.
Analysis of Relational and Graph Databases,
International Journal of Soft Computing and [10] Sherif Sakr and Ghazi Al-Naymat, Efficient
Engineering, Volume-2, Issue-2, 2012 Relational Techniques for Processing Graph
Queries, Journal of Computer Science and
Technology, 25(6): 1237 – 1255, 2010.

[8] Ching-Yung Lin, Graph Computing and Linked [11] www.neo4j.com


Big Data, Keynote Speech @ International
Conference on Semantic Computing, 2014.

9
www.japmnt.com

You might also like