You are on page 1of 95

Property Graph Databases

2016-11-08 Dr. Asja Fischer, Prof. Jens Lehmann


Motivation

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 2


Trends in Data

Trend 1: Big Data

The amount of information that we generate and transfer has increased


dramatically in the past few years.

1Source: IDCs Digital Universe Study, sponsored by EMC, December 2012


http://www.emc.com/collateral/analyst- reports/idc- the- digital- universe- in- 2020.pdf

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 3


Trends in Data

Trend 2: Connectedness

Data is evolving to be more interlinked and connected. Hypertext has links, blogs
have pingback, tagging groups all related data.

2 Figure taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 4
Why Relational Databases are not Enough

(Provocative) claim: Relational databases are not good at handling relationships!

3 Table taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough

(Provocative) claim: Relational databases are not good at handling relationships!

In this social network database, it is easy to find people Bob considers his friends:

SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . FriendID = p1 . ID
JOIN Person p2
ON PersonFriend . PersonID = p2 . ID
WHERE p2 . Person = Bob

Result:
3 Table taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough

(Provocative) claim: Relational databases are not good at handling relationships!

In this social network database, it is easy to find people Bob considers his friends:

SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . FriendID = p1 . ID
JOIN Person p2
ON PersonFriend . PersonID = p2 . ID
WHERE p2 . Person = Bob

Result: Alice and Zach


3 Table taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 5
Why Relational Databases are not Enough

B Sad but true: friendship is not always symmetric!


B What if we want to know who is friends with Bob?

SELECT p1 . Person
FROM Person p1 JOIN PersonFriend
ON PersonFriend . PersonID = p1 . ID
JOIN Person p2
ON PersonFriend . FriendID = p2 . ID
WHERE p2 . Person = Bob

B User side: still easy to implement


B Database side: Database has to consider all the rows in the PersonFriend
table more expensive!

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 6


Why Relational Databases are not Enough?

What if we query deeper into the social network?

SELECT p1 . Person AS PERSON , p2 . Person AS F RI E N D _ O F _ F R I E N D


FROM PersonFriend pf1 JOIN Person p1
ON pf1 . PersonID = p1 . ID
JOIN PersonFriend pf2
ON pf2 . PersonID = pf1 . FriendID
JOIN Person p2
ON pf2 . FriendID = p2 . ID
WHERE p1 . Person = Alice AND pf2 . FriendID <> p1 . ID

B Uses recursive joins increase in syntactic, computational, and space


complexity of the query

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 7


Why Relational Databases are not Enough

Experimental results: Given any two persons chosen at random, is there a path
that connects them that is at most 5 relationships long?

Depth RDBMS execution time (s) Neo4j execution time (s) Records returned
2 0.016 0.01 2500
3 30.267 0.168 110,000
4 1543.505 1.359 600,000
5 Unfinished 2.132 800,000
B Comparing relational store and a popular graph database Neo4j
B Social network consisting of 1,000,000 people
B Each person with approximately 50 friends

Conclusion: Graph databases outperform relational databases when we are


dealing with connected data!

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 8


Relational Databases: What exactly is the Problem?

B Relationships exist only between tables problematic for highly connected


domains
B Relationship traversal can become very expensive
B Flat, disconnected data structures - data processing and relationship
construction happens in the application layer
B Bound by a previously defined schema

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 9


Graph Databases using the Labeled Property
Graph Model

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 10


Graph Databases: Advantages

B Explicit graph structure: semantic dependencies between entities are made


explicit
B New nodes and new relationships can be easily added into the database
without having to migrate data or restructure the existing network
B Relationships correspond to paths; querying the database = traversing the
graph this makes certain types of queries simpler
B Schema-free
B Index-free adjacency

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 11


Index-Free Adjacency

B Each node knows (i.e. directly points to) its adjacent nodes - index-free
adjacency
B No explicit index each node acts as micro-index of nearby nodes much
cheaper than global indices
B Because of this, query times are less/not dependent of the size of the graph
they depend only on the part of the graph that has been searched
B Precomputes and stores bidirectional joins as relationships (i.e. no need to
compute them)
B Enables bidirectional traversal of the database: we can traverse both
incoming and outgoing relationships

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 12


Graph Databases: When to Use

When to use:
B Complex, highly-connected data
B Dynamic systems with topology that is difficult to predict
B Dynamic requirements that change over time
B Cases where relationships in data are themselves meaningful

When not to use:


B Large offline batch analysis tasks of static data
B Simple, unconnected data structures
B For certain types of queries:
Graph fishing (no small set of start nodes)
Bulk Scans (generic queries not applying to any indexed subset)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 13


Labeled Property Graph Model

Building blocks of a Labeled Property Graph:


B Nodes
Can be tagged with one or more labels
Can contain properties
B Relationships:
Connect nodes and structure the graph
Directed
Always have a single name, a start node and an end node no
dangling relationships
Can have properties
B Properties (= key-value pairs)
B Labels:
Group nodes together

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 14


Properties

Properties are named values:


B Key is the name of the property
Always a string
B Values can be:
Numeric values
String values
Boolean values
Lists of any other type of value

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 15


Labels

B Used to assign types to nodes


B Groups nodes into sets: all nodes with the
same label belong to the same set
B Database queries can be constrained to these
sets instead of the whole graph more effi-
cient queries that are easier to write
B Each node has any number of labels (includ-
ing none)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 16


Nodes

B Together with relationships, fundamen-


tal unit of property graph model
the simplest possible graph is a single
node
B Are often used to represent entities
B Can have zero or more properties

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 17


Relationships

B Organize the nodes by connect-


ing them
B Always connects a start node to
the end node
B A key part of a graph database:
allow us to find related data
B Always has a direction

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 18


Paths

B One or more nodes with connecting


relationships
B Typically is a result of a query or a
traversal
B Length of a path = number of rela-
tionships on that path

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 19


Labeled Property Graph Model: Example

3 Illustration taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 20
Labeled Property Graph Model: Example

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 21


Labeled Property Graph Model: Example

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 22


Labeled Property Graph Model: Example

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 23


Labeled Property Graph Model: Example

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 24


Creating Graph Databases vs. Creating
Relational Databases

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 25


Creating a Relational Database

Step 1: Acquire and improve understanding of data: a whiteboard sketch step

3 Diagram taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 26
Creating a Relational Database

Step 2: Construct an entity-relations (E-R) diagram

Note the complexity of the model before any data has even been added
3 E-R diagram taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 27
Creating a Relational Database

Step 3: Map the E-R diagram into tables and relations and normalize the data

3 Diagram taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 28
Creating a Graph Database

B What you sketch on the whiteboard = what you store in the database
B No normalization, denormalization or conversion to tables-relations structure
necessary
B No conceptual-relational dissonance: the physical layout of the data is same
as it is conceptualized
B Domain modeling = further graph modeling:
Makes sure that every node has the appropriate properties
Ensures that every node is in correct semantic context

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 29


Creating a Graph Database

3 Image taken from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 30
Graph DBs: Good Design Principles

B Important to review and evaluate the domain model to make sure it is


suitable for answering most likely queries.
B Ensures that any later changes are driven by changes in application
requirements and not by the need to mitigate bad design decisions
B There are two techniques to do this:
Check that the graph reads well
B Pick a node
B Follow relationships to other nodes, reading each nodes role and
each relationships name as you go
B This should form sensible sentences
Design for queryability
B Understand end-users goals: understand the use cases
B Try to craft sample queries for the use cases

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 31


Design Principles: Reading the Graph

App 1 runs on VM 10, which is hosted by Server 1 in Rack 1.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 32


NEO4J & Other Graph Databases

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 33


Existing Graph Databases

Source: OReilly Graph Databases [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 34
Neo4j

B Probably the most popular graph database


today
B Based on a schema-free labeled property
graph model
https://neo4j.com/ B Scales to billions of nodes, relationships and
properties

Consists of:
B Nodes, Relationships, Properties, Labels, Paths, Traversal, and Schema
(index and constraints)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 35


Neo4j: Learn more!

B Open-source
B Extensive support and learning material
B Check out https://neo4j.com/developer/ for further resources (including a
sandbox for testing!)
B International events and meetup groups

(Neo4j)-[:LOVES]-(Developers)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 36


Cypher

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 37


Cypher: Pattern Matching

A pattern matching query language for graphs

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 38


Cypher: Pattern Matching

A pattern matching query language for graphs

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 39


Cypher: Pattern Matching

A pattern matching query language for graphs

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 40


Cypher: ASCII Art

Uses ASCII art representation

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 41


Cypher: ASCII Art

Directed relationship

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 42


Cypher: ASCII Art

Undirected relationship

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 43


Cypher: ASCII Art

Specific relationships

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 44


Cypher: ASCII Art

Joined paths

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 45


Cypher: ASCII Art

Multiple paths

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 46


Cypher: Start & Return clauses

B START specifies starting point(s) in the graph


Note : these can be both nodes or relationships
B START is done via index lookup or directly using relationship IDs
B RETURN specifies nodes, relationships, and properties that should be
returned

START - RETURN example


START n = node(0) RETURN n
reads lookup node with id 0; return that node
START n=node:Person(name=Andreas) RETURN n
reads lookup node in index Person with a name property whose value is
Andreas; return that node

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 47


Cypher: MATCH Clause

B Specification by example: draw data we are looking for


B Used to define a pattern of nodes and relationships that we want to find

MATCH - example
START a=node:user(name = Michael)
MATCH (a)-[:KNOWS] >(b)-[:KNOWS] >(c), (a)-[:KNOWS] >(c)
RETURN b, c
Reads: Find user with name Michael and starting from that node in the
graph, find nodes b and c that fit the pattern specified by MATCH example.
Return those two nodes.

3 Example from [?]


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 48
Cypher: Return Options

B Cypher also provides various options to enrich the returned results


B They include options to aggregate, order, filter, and limit the returned data
B Example: count allows us to return only the number of matched instances

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 49


RETURN Options - Example

RETURN Options - example


MATCH (theater:Venue {name:Theatre Royal}),
(bard:Author lastname:Shakespeare),
(theater)< [:VENUE]-()-[p:PERFORMANCE OF] >()
-[:PRODUCTION OF] >(play)< [:WROTE PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance count
ORDER BY performance count DESC

This query counts the number of PERFORMANCE OF relationships using the


identifier p (which is bound to the PERFORMANCE OF relationships in the
MATCH clause). It saves the result as performance count. The results are then
ordered (using the clause ORDER BY) by the count in decreasing order. The
query then returns this result.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 50


RETURN Options - Example

RETURN Options - example


MATCH (theater:Venue {name:Theatre Royal}),
(bard:Author lastname:Shakespeare),
(theater)< [:VENUE]-()-[p:PERFORMANCE OF] >()
-[:PRODUCTION OF] >(play)< [:WROTE PLAY]-(bard)
RETURN play.title AS play, count(p) AS performance count
ORDER BY performance count DESC
LIMIT 1

Analogously, adding LIMIT clause tells this query to limit the results to the first
instance

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 51


Cypher: WHERE Clause

B Constrains graph matches by eliminating matched subgraphs by requiring


one or more of the following:
That certain paths must be present (or absent) in the matched
subgraphs
That nodes must have certain labels or relationships with certain names
That specific properties on matched nodes and relationships must be
present (or absent)
That certain properties on matched nodes and relationships must have
specific values
That other predicates must be satisfied (e.g., that performances must
have occurred on or before a certain date)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 52


WHERE - Example

For example, we can query specifically for Shakespeare plays that were written
after 1608

WHERE - example
MATCH (bard:Author {lastname:Shakespeare}),
(play)< [w:WROTE PLAY]-(bard)
WHERE w.year > 1608
RETURN DISTINCT play.title AS play

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 53


Other Clauses

Cypher supports a variety of clauses:


B CREATE and CREATE UNIQUE
B DELETE
B SET
B FOREACH
B UNION
B WITH

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 54


Cypher - Design for Queryability Example

Remember the design for queryability design goal?


Goal: Design a query to find the cause behind an unresponsive application or
service in our example graph.

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 55


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 56


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

B Describes a variable length path between one and five relationships long
B There is no colon or relationship name between the square brackets the
relationships are unnamed
B There are no arrow-tips relationships are undirected

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 56


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 57


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

B We start with the user who reported a problem


B We add asset nodes that have a status property with a value of down
B Nodes which do not have a status property will not be added to the results

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 57


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 58


Cypher - Design for Queryability Example

The sample query we would need to define is:

Example Query
MATCH (user:User)-[*1..5]-(asset:Asset)
WHERE user.name = User3 AND asset.status = down
RETURN DISTINCT asset

B Ensures that unique assets are returned in the results, no matter how many
times they are matched
The sample query we would need to define is:

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 58


Sample Queries

Example 1
MATCH (p:Product)
RETURN p.productName, p.unitPrice
ORDER BY p.unitPrice DESC
LIMIT 10;

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 59


Sample Queries

Example 1
MATCH (p:Product)
RETURN p.productName, p.unitPrice
ORDER BY p.unitPrice DESC
LIMIT 10;

B Returns only a subset of attributes, in this case: ProductName and UnitPrice


B Orders by price
B Returns 10 most expensive items

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 59


Sample Queries

Example 2
MATCH (p:Product {productName:Chocolade})
RETURN p.productName, p.unitPrice;

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 60


Sample Queries

Example 2
MATCH (p:Product {productName:Chocolade})
RETURN p.productName, p.unitPrice;

B A shortcut for MATCH (p:Product) WHERE p.productName =


Chocolade

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 60


Sample Queries

Example 3
MATCH (p:Product {productName:Chocolade})< [:PRODUCT]-
(:Order)< [:PURCHASED]-(c:Customer)
RETURN distinct c.companyName;

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 61


Sample Queries

Example 3
MATCH (p:Product {productName:Chocolade})< [:PRODUCT]-
(:Order)< [:PURCHASED]-(c:Customer)
RETURN distinct c.companyName;

B Names of everyone who bought Chocolade

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 61


Sample Queries

Example 4
MATCH (bob:User {username:Bob})-[:SENT] >(email)-[:CC] >(alias),
(alias)-[:ALIAS OF] >(bob)
RETURN email.id
3 Example and figure taken from [?]

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 62


Sample Queries

Example 4
MATCH (bob:User {username:Bob})-[:SENT] >(email)-[:CC] >(alias),
(alias)-[:ALIAS OF] >(bob)
RETURN email.id
B Returns all emails that Bob has sent where hes CCd one of his own aliases
B Returns 1 result: id:1, content:email content
3 Example and figure taken from [?]

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 63


Sample Queries

Example 5
MATCH (email:Email {id:11})< [f:FORWARD OF*]-(:Forward)
RETURN count(f)

3 Example and figure taken from [?]

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 64


Sample Queries

Example 5
MATCH (email:Email {id:11})< [f:FORWARD OF*]-(:Forward)
RETURN count(f)

B This returns the number of times a particular email is forwarded


B The answer is 2 (count the number of FORWARD OF relationships bound
to f)
3 Example and figure taken from [?]

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 64


Serialisation

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 65


Cypher as Serialisation Format

Cypher - Example
CREATE (:Person {name:Ian})-[:EMPLOYMENT] >
(employment:Job {start date:2011-01-05})
-[:EMPLOYER] >(:Company {name:Neo}),
(employment)-[:ROLE] >(:Role {name:engineer})
3 Image taken from [?]
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 66
Geoff

B Based on Cypher
B A Geoff document consists of a one or more subgraphs, each of which
contains one or more paths
B Properties in JSON syntax

( alice {" name ":" Alice "})


( bob {" name ":" Bob "})
( carol {" name ":" Carol "})
( alice ) < -[: KNOWS ] - >( bob ) < -[: KNOWS ] - >( carol ) < -[: KNOWS ] - >( alice )

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 67


GraphML

B XML-based file format for graphs


B Common format for exchanging graph structure data
B Generic (not limited to graph databases)

<? xml version = " 1.0 " encoding = " UTF -8 " ? >
< graphml xmlns = " http: // graphml . graphdrawing . org / xmlns "
[...]
< graph id = " G " edgedefault = " undirected " >
< node id = " n0 " >
< data key = " d0 " > green </ data >
</ node >
< node id = " n1 " / >
< edge id = " e0 " source = " n0 " target = " n1 " >
< data key = " d1 " > 1.0 </ data >
</ edge >
</ graph >
</ graphml >

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 68


Serialisation

B Essentially no standardisation in property graph community


B Serialisation less important than in semantic technologies:
Not much emphasis on data exchange
Data publishing not commonly considered (in contrast to Linked Data)
Embedding in other formats usually not considered (in contrast to
RDFa)

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 69


Hypergraphs

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 70


Graph Shortcomings

Graphs pairwise relationships.


Pairwise relationships are often insufficient - the world is more complex than that.

Example Problem
Given: A collection of articles together with their authors.
Goal: Group articles into different topics.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 71


Graph Shortcomings

Example Problem
Given: a collection of articles and their authors.
Want: Group articles into different topics.

Possible solution: undirected graph

B Nodes = articles
B If two articles have at least one author in common
add an edge between them
B Add weights to edges equal to the number of
authors the two corresponding articles have in
common

Group articles by topic using clustering approaches


Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 72
Example Problem

What is wrong with this solution?

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 73


Example Problem

What is wrong with this solution?

B Loss of information: No way to tell whether the same person joined in


writing three or more articles.
B This is a problem because the articles from the same person are more likely
to belong to the same topic.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 73


Hypergraphs

Natural solution: a hypergraph!

Definition: Hypergraph
Let V = (v1 , v2 , . . . , vn ) be a set of vertices. A hypergraph on V is a family of
H = (E1 , E2 , . . . , Em ) of subsets of V s.t.:
B For i = 1, 2, . . . , m, Ei 6=
Sm
B Ei = V
i=1

Intuitively: A hypergraph is a graph in which an edge (called hyperedge) can


connect more than two vertices.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 74


Hypergraphs

Models the example problem:


B Each article matches to a vertex
B Each hyperedge contains set of articles written
by the its corresponding author
Note that hyperedges are represented by
dotted lines around the set of vertices they
include

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 75


Hypergraphs: Summary

B Each hyperedge can connect to an arbitrary number of nodes


B Hypergraphs can be useful for data with a large number of many-to-many
relationships
B Hypergraphs are particularly good at capturing meta-intent

Disadvantage:
B Can be more confusing than the more explicit data modeling techniques

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 76


Converting and Comparing the Graph Models

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 77


RDF to PGM Transformation

Distinguish between two types of RDF triples:


B attribute triples
= triples whose object is a literal
B relationship triples
= triples whose object is an IRI or a blank node
The transformation:
B Every relationship triple an edge
B Every attribute triple a property of the vertex for the subject of that triple

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 78


RDF to PGM Transformation

Simple transformation example:

ex : alice foaf : name " Alice " .


ex : bob foaf : name " Bob " .
<< ex : alice foaf : knows ex : bob > > ex : certainty 0.5 .
<< ex : bob foaf : age 23 > > ex : certainty 0.9 .

Note: << and >> enclose metadata tripes in Turtle syntax. The last line cannot
be converted using this transformation approach. See
https://arxiv.org/pdf/1409.3288v2.pdf for details.

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 79


PGM to RDF Transformation

B Edges (incl. their labels) an ordinary RDF triple


B Vertices (incl. their labels) an ordinary RDF triple
B Edge properties metadata triple whose subject is the triple for the
corresponding edge
B Patterns for generating IRIs that denote edge labels and properties can be
chosen freely

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 80


Hypergraphs to Property Graphs

Labeled hypergraphs can be powerful when modelling many-to-many


relationships:

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 81


Hypergraphs to Property Graphs

All relations need to be made explicit when converting to property graphs:

Why can it still make sense to use property graphs?


B More familiar and explicit modelling technique
B Model can be fine-tuned with properties such as primary driver (for
insurance purposes)
Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 82
Knowledge Graph Formats

Characteristics Relational RDF PGM


Standardised yes yes no
Traversal performance - o +
Large analytical queries + o -
Query language SQL SPARQL Cypher & more
Data Publication & Differentiability - + -
Global Identifiers & Cross dataset fusion - + -

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 83


Summary

Covered in the lecture:

B Motivation for Model Graph Databases (Big Data, Connectivity)


B Comparing Relational and Property Graph Databases
B Labeled Property Graph Model
B Cypher Query Language
B Unifying RDF, Property Graphs and Hypergraphs

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 84


References

Dr. Asja Fischer, Prof. Jens Lehmann Property Graph Databases 85

You might also like