
Czech Technical University in Prague, Faculty of Information Technology

MIE-PDB: Advanced Database Systems


http://www.ksi.mff.cuni.cz/~svoboda/courses/2016-2-MIE-PDB/

Lecture 11

Graph Databases: Neo4j


Martin Svoboda
svoboda@ksi.mff.cuni.cz

12. 5. 2017

Charles University, Faculty of Mathematics and Physics


NDBI040: Big Data Management and NoSQL Databases
Lecture Outline
Neo4j
• Data model: property graphs
• Traversal framework
• Cypher query language
Read, write and general clauses

MIE-PDB: Advanced Database Systems | Lecture 11: Graph Databases: Neo4j | 12. 5. 2017 2
Neo4j Graph Database
Neo4j
Graph database
• https://neo4j.com/
• Features
Open source, massively scalable (billions of nodes), high
availability, fault-tolerant, master-slave replication, ACID
transactions, embeddable, …
Expressive graph query language (Cypher),
traversal framework
• Developed by Neo Technology
• Implemented in Java
• Operating systems: cross-platform
• Initial release in 2007

Data Model
Database system structure
Instance → single graph
Property graph = directed labeled multigraph
• Collection of vertices (nodes) and edges (relationships)
Graph node
• Has a unique (internal) identifier
• Can be associated with a set of labels
Allow us to categorize nodes
• Can also be associated with a set of properties
Allow us to store additional data together with nodes

Data Model
Graph relationship
• Has a unique (internal) identifier
• Has a direction
Relationships are equally well traversed in either direction!
Directions can be ignored when querying
• Always has a start and end node
Can be recursive (i.e. loops are allowed)
• Is associated with exactly one type
• Can also be associated with a set of properties

Data Model
Node and relationship property
• Key-value pair
Key is a string
Value is an atomic value of any primitive data type,
or an array of atomic values of one primitive data type
Primitive data types
• boolean – boolean values true and false
• byte, short, int, long – integers (1B, 2B, 4B, 8B)
• float, double – floating-point numbers (4B, 8B)
• char – one Unicode character
• String – sequence of Unicode characters

Sample Data
Sample graph with movies and actors
(m1:movie { id: "vratnelahve", title: "Vratné lahve", year: 2006 })
(m2:movie { id: "samotari", title: "Samotáři", year: 2000 })
(m3:movie { id: "medvidek", title: "Medvídek", year: 2007 })
(m4:movie { id: "stesti", title: "Štěstí", year: 2005 })

(a1:actor { id: "trojan", name: "Ivan Trojan", year: 1964 })
(a2:actor { id: "machacek", name: "Jiří Macháček", year: 1966 })
(a3:actor { id: "schneiderova", name: "Jitka Schneiderová", year: 1973 })
(a4:actor { id: "sverak", name: "Zdeněk Svěrák", year: 1936 })

(m1)-[c1:HAS_ACTOR { role: "Robert Landa" }]->(a2)
(m1)-[c2:HAS_ACTOR { role: "Josef Tkaloun" }]->(a4)
(m2)-[c3:HAS_ACTOR { role: "Ondřej" }]->(a1)
(m2)-[c4:HAS_ACTOR { role: "Jakub" }]->(a2)
(m2)-[c5:HAS_ACTOR { role: "Hanka" }]->(a3)
(m3)-[c6:HAS_ACTOR { role: "Ivan" }]->(a1)
(m3)-[c7:HAS_ACTOR { role: "Jirka", award: "Czech Lion" }]->(a2)
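
As a rough illustration of the data model, part of the sample graph can be mirrored in plain Java. This is only a toy sketch and not the Neo4j API: the class PropertyGraphSketch and all of its members are invented for this example, and each node carries a single label to keep it short. It shows that a relationship is stored with one direction and one type, yet can be followed in either direction.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Toy in-memory property graph. This is NOT the Neo4j API;
// every class and method name here is made up for illustration.
class PropertyGraphSketch {
    enum Direction { INCOMING, OUTGOING, BOTH }

    static final class Node {
        final String label;                    // one label only, to keep the sketch short
        final Map<String, Object> properties;
        Node(String label, Map<String, Object> properties) {
            this.label = label;
            this.properties = properties;
        }
    }

    static final class Relationship {
        final Node start;                      // relationships are directed: start -> end
        final Node end;
        final String type;                     // exactly one type per relationship
        final Map<String, Object> properties;
        Relationship(Node start, String type, Node end, Map<String, Object> properties) {
            this.start = start;
            this.type = type;
            this.end = end;
            this.properties = properties;
        }
    }

    final List<Node> nodes = new ArrayList<>();
    final List<Relationship> relationships = new ArrayList<>();

    Node addNode(String label, Map<String, Object> properties) {
        Node n = new Node(label, properties);
        nodes.add(n);
        return n;
    }

    void connect(Node start, String type, Node end, Map<String, Object> properties) {
        relationships.add(new Relationship(start, type, end, properties));
    }

    // Looks a node up by its "id" property; returns null when there is none.
    Node find(String id) {
        for (Node n : nodes) {
            if (id.equals(n.properties.get("id"))) return n;
        }
        return null;
    }

    // Nodes adjacent to n over relationships of the given type and direction.
    List<Node> neighbours(Node n, String type, Direction dir) {
        List<Node> result = new ArrayList<>();
        for (Relationship r : relationships) {
            if (!r.type.equals(type)) continue;
            if (dir != Direction.INCOMING && r.start == n) result.add(r.end);
            if (dir != Direction.OUTGOING && r.end == n) result.add(r.start);
        }
        return result;
    }

    // Movie m3 with its two actors and relationships c6, c7 from the sample data.
    static PropertyGraphSketch sample() {
        PropertyGraphSketch g = new PropertyGraphSketch();
        Node m3 = g.addNode("movie", Map.of("id", "medvidek", "year", 2007));
        Node a1 = g.addNode("actor", Map.of("id", "trojan", "year", 1964));
        Node a2 = g.addNode("actor", Map.of("id", "machacek", "year", 1966));
        g.connect(m3, "HAS_ACTOR", a1, Map.of("role", "Ivan"));
        g.connect(m3, "HAS_ACTOR", a2, Map.of("role", "Jirka", "award", "Czech Lion"));
        return g;
    }

    public static void main(String[] args) {
        PropertyGraphSketch g = sample();
        // Prints trojan, then machacek
        for (Node a : g.neighbours(g.find("medvidek"), "HAS_ACTOR", Direction.OUTGOING)) {
            System.out.println(a.properties.get("id"));
        }
    }
}
```

Asking for Direction.BOTH from the node with id "machacek" leads back to the movie, which is the sense in which directions can be ignored when querying.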

Traversal Framework
Traversal framework
• Allows us to express and execute graph traversal queries
• Based on callbacks, executed lazily
Traversal description
• Defines rules and other characteristics of a traversal
Traverser
• Initiates and manages a particular graph traversal
according to…
the provided traversal description, and
graph node / set of nodes where the traversal starts
• Allows for the iteration over the matching paths,
one by one

Traversal Framework
Components of a traversal description
• Expanders
What relationships should be considered
• Order
Which graph traversal algorithm should be used
• Uniqueness
Whether nodes / relationships can be visited repeatedly
• Evaluators
When the traversal should be terminated
What paths should be included in the query result

Traversal Framework
Traversal Description

Path expanders
Being at a given node…
what relationships should next be followed?
• Expander specifies one allowed…
relationship type and direction
– Direction.INCOMING
– Direction.OUTGOING
– Direction.BOTH

• Multiple expanders can be specified at once
When none is provided,
then all the relationships are permitted
• Usage: td.relationships(type, direction)
Traversal Framework
Traversal Description

Order
Which graph traversal algorithm should be used?
• Standard depth-first or breadth-first methods can be selected
or
specific branch ordering policies can also be implemented
• Usage:
td.breadthFirst()
td.depthFirst()

Traversal Framework
Traversal Description

Uniqueness
Can particular nodes / relationships be revisited?
• Various uniqueness levels are provided
Uniqueness.NONE – no filter is applied
Uniqueness.NODE_PATH
Uniqueness.RELATIONSHIP_PATH
– Nodes / relationships within a current
path must be distinct
Uniqueness.NODE_GLOBAL (default)
Uniqueness.RELATIONSHIP_GLOBAL
– No node / relationship may be visited
more than once

• Usage: td.uniqueness(level)
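
To make the difference between the path-level and global uniqueness levels concrete, the following plain-Java sketch enumerates paths in a small directed graph. It does not use the Neo4j API: UniquenessSketch, its two-valued Uniqueness enum and the adjacency-map representation are assumptions made only for this illustration, and only the node-based levels are modelled.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy depth-first path enumeration; not the Neo4j traversal framework.
class UniquenessSketch {
    enum Uniqueness { NODE_PATH, NODE_GLOBAL }

    // Returns every path (as a list of node names) reachable from start.
    static List<List<String>> traverse(Map<String, List<String>> graph,
                                       String start, Uniqueness level) {
        List<List<String>> paths = new ArrayList<>();
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> path = new ArrayList<>();
        path.add(start);
        expand(graph, path, level, visited, paths);
        return paths;
    }

    private static void expand(Map<String, List<String>> graph, List<String> path,
                               Uniqueness level, Set<String> visited,
                               List<List<String>> paths) {
        paths.add(new ArrayList<>(path));
        String last = path.get(path.size() - 1);
        for (String next : graph.getOrDefault(last, List.of())) {
            boolean allowed = (level == Uniqueness.NODE_GLOBAL)
                    ? visited.add(next)        // false if the node was seen anywhere before
                    : !path.contains(next);    // only the current path must be free of repeats
            if (!allowed) continue;
            path.add(next);
            expand(graph, path, level, visited, paths);
            path.remove(path.size() - 1);
        }
    }

    public static void main(String[] args) {
        // Diamond: a -> b -> d and a -> c -> d
        Map<String, List<String>> diamond = Map.of(
                "a", List.of("b", "c"),
                "b", List.of("d"),
                "c", List.of("d"));
        System.out.println(traverse(diamond, "a", Uniqueness.NODE_GLOBAL).size()); // 4
        System.out.println(traverse(diamond, "a", Uniqueness.NODE_PATH).size());   // 5
    }
}
```

On the diamond graph the global level yields four paths, because node d is reached only once, while the path level also reports the second route to d and yields five.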
Traversal Framework
Traversal Description

Evaluators
Considering a particular path…
should this path be included in the result?
should the traversal further continue?
• Available evaluation actions
Evaluation.INCLUDE_AND_CONTINUE
Evaluation.INCLUDE_AND_PRUNE
Evaluation.EXCLUDE_AND_CONTINUE
Evaluation.EXCLUDE_AND_PRUNE
• Meaning of these actions
INCLUDE / EXCLUDE = whether to include the path in the result
CONTINUE / PRUNE = whether to continue the traversal

Traversal Framework
Traversal Description

Evaluators
• Predefined evaluators
Evaluators.all()
– Never prunes, includes everything
Evaluators.excludeStartPosition()
– Never prunes, includes everything except the starting positions
Evaluators.atDepth(depth)
Evaluators.toDepth(maxDepth)
Evaluators.fromDepth(minDepth)
Evaluators.includingDepths(minDepth, maxDepth)
– Includes only positions within the specified interval of
depths

Traversal Framework
Traversal Description

Evaluators
• Usage: td.evaluator(evaluator)
• Note that evaluators are applied even for the starting nodes!
• When multiple evaluators are provided…
then they must all agree on each of the two questions
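
One way to read the rule for multiple evaluators is sketched below in plain Java. This is an illustrative model rather than Neo4j code: the enum mirrors the four evaluation actions named on the slides, while the class EvaluationSketch and its combine method are hypothetical helpers. A path is included only if every evaluator includes it, and the traversal continues only if every evaluator wants to continue.

```java
// Toy model of combining evaluator decisions; not the Neo4j API.
class EvaluationSketch {
    enum Evaluation {
        INCLUDE_AND_CONTINUE(true, true),
        INCLUDE_AND_PRUNE(true, false),
        EXCLUDE_AND_CONTINUE(false, true),
        EXCLUDE_AND_PRUNE(false, false);

        final boolean includes;   // should the path be part of the result?
        final boolean continues;  // should the traversal go on from this path?

        Evaluation(boolean includes, boolean continues) {
            this.includes = includes;
            this.continues = continues;
        }

        static Evaluation of(boolean includes, boolean continues) {
            if (includes) return continues ? INCLUDE_AND_CONTINUE : INCLUDE_AND_PRUNE;
            return continues ? EXCLUDE_AND_CONTINUE : EXCLUDE_AND_PRUNE;
        }
    }

    // All evaluators must agree on each of the two questions.
    static Evaluation combine(Evaluation... results) {
        boolean includes = true;
        boolean continues = true;
        for (Evaluation e : results) {
            includes &= e.includes;
            continues &= e.continues;
        }
        return Evaluation.of(includes, continues);
    }

    public static void main(String[] args) {
        // One evaluator prunes, the other excludes: the path is excluded and pruned.
        System.out.println(combine(Evaluation.INCLUDE_AND_PRUNE,
                                   Evaluation.EXCLUDE_AND_CONTINUE)); // EXCLUDE_AND_PRUNE
    }
}
```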

Traversal Framework
Path
• Well-formed sequence of interleaved nodes and relationships
Traverser
• Allows us to perform a particular graph traversal
with respect to a given traversal description starting
at a given node / nodes
• Usage: t = td.traverse(node, ...)
for (Path p : t) { ... }
– Iterates over all the paths
for (Node n : t.nodes()) { ... }
– Iterates over all the paths, returns their end nodes
for (Relationship r : t.relationships()) { ... }
– Iterates over all the paths, returns their last relationships

Examples
Find all the actors that played in Medvídek movie
// Follow outgoing HAS_ACTOR relationships, keep only paths of length 1
TraversalDescription td = db.traversalDescription()
    .breadthFirst()
    .relationships(Types.HAS_ACTOR, Direction.OUTGOING)
    .evaluator(Evaluators.atDepth(1));

// Start at the movie node with id "medvidek"
Node s = db.findNode(Label.label("movie"), "id", "medvidek");

Traverser t = td.traverse(s);

for (Path p : t) {
    Node n = p.endNode();
    System.out.println(n.getProperty("name"));
}

Ivan Trojan
Jiří Macháček

Examples
Find all the actors that played with Zdeněk Svěrák
// Follow HAS_ACTOR in both directions: actor -> movie -> actor (depth 2)
TraversalDescription td = db.traversalDescription()
    .depthFirst()
    .uniqueness(Uniqueness.NODE_GLOBAL)
    .relationships(Types.HAS_ACTOR)
    .evaluator(Evaluators.atDepth(2))
    .evaluator(Evaluators.excludeStartPosition());

// Start at the actor node with id "sverak"
Node s = db.findNode(Label.label("actor"), "id", "sverak");
Traverser t = td.traverse(s);

for (Node n : t.nodes()) {
    System.out.println(n.getProperty("name"));
}

Jiří Macháček

Cypher
Cypher
• Declarative graph query language
Allows for expressive and efficient querying and updates
Inspired by SQL (query clauses) and SPARQL (pattern matching)
• OpenCypher
Ongoing project aiming at Cypher standardization
http://www.opencypher.org/
Clauses
• E.g. MATCH, RETURN, CREATE, …
• Clauses are (almost arbitrarily) chained together
Intermediate result of one clause is passed to a
subsequent one

Cypher Clauses
Read clauses and sub-clauses
• MATCH – specifies graph patterns to be searched for
WHERE – adds additional filtering constraints
• …
Write clauses and sub-clauses
• CREATE – creates new nodes or relationships
• DELETE – deletes nodes or relationships
• SET – updates labels or properties
• REMOVE – removes labels or properties
• …

Cypher Clauses
General clauses and sub-clauses
• RETURN – defines what the query result should contain
ORDER BY – describes how the query result should be ordered
SKIP – excludes a certain number of solutions from the result
LIMIT – limits the number of solutions to be included
• WITH – allows query parts to be chained together
• …

Sample Query
Find names of all the actors that played in Medvídek movie
MATCH (m:movie)-[r:HAS_ACTOR]->(a:actor)
WHERE m.title = "Medvídek"
RETURN a.name, a.year
ORDER BY a.year

a.name a.year
Ivan Trojan 1964
Jiří Macháček 1966

Path Patterns
Path pattern expression
• ASCII-Art inspired syntax
Circles () for nodes
Arrows <--, --, --> for relationships
• Describes a single path pattern (not a general subgraph)

Syntax (diagram): node pattern, followed by any number of pairs of relationship pattern and node pattern

Path Patterns
Node pattern
• Matches one data node

Syntax (diagram): ( variable :label property map )
Every part inside the parentheses is optional, several labels can be listed

• Variable
Used to access a given query node later on
• Set of labels
Data node must have all the specified labels to be matched
• Property map
Data node must have all the requested properties (including
their values) to be matched (the order is unimportant)

Path Patterns
Property map
Syntax (diagram): { key: expression, key: expression, … }

Relationship pattern
• Matches one data relationship

Syntax (diagram): --, <-- or -->,
optionally with a relationship detail in square brackets between the two dashes

Path Patterns
Relationship pattern

Syntax (diagram): [ variable :type variable length property map ]
Every part inside the square brackets is optional, alternative types are separated by |

• Variable
Used to access a given query relationship later on
• Set of types
Data relationship must be of one of the allowed types
to be matched

Path Patterns
Relationship pattern
• …
• Property map
Data relationship must have all the requested properties
• Variable path length
Allows us to describe paths of arbitrary lengths
(not just one relationship)
Syntax (diagram): * integer .. integer
Both integers are optional; a single integer without .. gives a fixed length

I.e. matches a general path, not just a single relationship

Path Patterns
Examples
()

(x)--(y)

(m:movie)-->(a:actor)

(:movie)-->(a { name: "Ivan Trojan" })

()<-[r:HAS_ACTOR]-()

(m)-[:HAS_ACTOR { role: "Ivan" }]->()

(:actor { name: "Ivan Trojan" })-[:KNOWS *2]->(:actor)

()-[:KNOWS *5..]->(f)

Match Clause
MATCH clause
• Allows us to search for sub-graphs of the data graph that match
the provided path pattern / patterns (all of them)
Query result (table) = unordered set of solutions
One solution (row) = set of variable bindings
• Each variable has to be bound

Syntax: [OPTIONAL] MATCH [variable =] path pattern [, ...] [WHERE expression]

Match Clause
WHERE sub-clause may provide additional constraints
• These constraints are evaluated directly during the matching
phase (i.e. not after it)
• Typical usage
Boolean expressions
Comparisons
Path patterns – true if at least one solution is found

Uniqueness requirement
• One data node may match several query nodes, but one data
relationship may not match several query relationships

Match Clause
Example

Find names of actors who played with Ivan Trojan in any movie
MATCH (:actor { name: "Ivan Trojan" })
      <-[:HAS_ACTOR]-(:movie)-[:HAS_ACTOR]->
      (a:actor)
RETURN a.name

MATCH (i:actor)<-[:HAS_ACTOR]-(:movie)-[:HAS_ACTOR]->(a:actor)
WHERE (i.name = "Ivan Trojan")
RETURN a.name

a.name
Jiří Macháček
Jitka Schneiderová
Jiří Macháček
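
The result above follows from the uniqueness requirement: the two HAS_ACTOR relationships in the pattern must be bound to different data relationships, so Ivan Trojan is never reported as his own co-actor, while Jiří Macháček appears once per shared movie. The plain-Java sketch below imitates this with nested loops over the sample relationships. It is not how Neo4j evaluates queries; MatchUniquenessSketch, HasActor and coActors are invented names, and only node ids are stored.

```java
import java.util.ArrayList;
import java.util.List;

// Toy evaluation of (i)<-[r1:HAS_ACTOR]-(:movie)-[r2:HAS_ACTOR]->(a); not Neo4j code.
class MatchUniquenessSketch {
    static final class HasActor {
        final String movie;
        final String actor;
        HasActor(String movie, String actor) {
            this.movie = movie;
            this.actor = actor;
        }
    }

    // Relationships c1..c7 of the sample graph, reduced to node ids.
    static final List<HasActor> CAST = List.of(
            new HasActor("vratnelahve", "machacek"),
            new HasActor("vratnelahve", "sverak"),
            new HasActor("samotari", "trojan"),
            new HasActor("samotari", "machacek"),
            new HasActor("samotari", "schneiderova"),
            new HasActor("medvidek", "trojan"),
            new HasActor("medvidek", "machacek"));

    // One row per binding of (r1, r2); the returned list holds the ids bound to a.
    static List<String> coActors(String actorId, boolean enforceUniqueness) {
        List<String> rows = new ArrayList<>();
        for (HasActor r1 : CAST) {
            if (!r1.actor.equals(actorId)) continue;        // i is the given actor
            for (HasActor r2 : CAST) {
                if (!r2.movie.equals(r1.movie)) continue;   // both share the movie node
                if (enforceUniqueness && r1 == r2) continue; // r1 and r2 must differ
                rows.add(r2.actor);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        System.out.println(coActors("trojan", true));  // [machacek, schneiderova, machacek]
        System.out.println(coActors("trojan", false)); // also lists trojan himself, twice
    }
}
```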

Match Clause
OPTIONAL MATCH
• Attempts to find matching data sub-graphs as usual…
• but when no solution is found,
one specific solution with all the variables bound to NULL
is generated
• Note that
either the whole pattern is matched, or nothing is matched

Match Clause
Example

Find all the movies filmed in 2005 or earlier,
return names of all their actors as well (if any)
MATCH (m:movie)
WHERE (m.year <= 2005)
OPTIONAL MATCH (m)-[:HAS_ACTOR]->(a:actor)
RETURN m.title, a.name

m.title a.name
Samotáři Ivan Trojan
Samotáři Jiří Macháček
Samotáři Jitka Schneiderová
Štěstí NULL
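
The behaviour of OPTIONAL MATCH in this example resembles a left outer join. A minimal plain-Java sketch of that reading follows; it is an analogy only, not Neo4j code. OptionalMatchSketch and its members are hypothetical names, node ids stand in for titles and names, and a Java null plays the role of NULL.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy left-outer-join reading of MATCH ... OPTIONAL MATCH; not Neo4j code.
class OptionalMatchSketch {
    // Movie id -> year, in the order of the sample data.
    static final Map<String, Integer> MOVIES = new LinkedHashMap<>();
    static {
        MOVIES.put("vratnelahve", 2006);
        MOVIES.put("samotari", 2000);
        MOVIES.put("medvidek", 2007);
        MOVIES.put("stesti", 2005);
    }

    // HAS_ACTOR relationships as {movie id, actor id} pairs.
    static final String[][] CAST = {
            {"vratnelahve", "machacek"}, {"vratnelahve", "sverak"},
            {"samotari", "trojan"}, {"samotari", "machacek"}, {"samotari", "schneiderova"},
            {"medvidek", "trojan"}, {"medvidek", "machacek"}};

    // Rows of {movie id, actor id}; the actor id is null when the movie has no actor.
    static List<String[]> moviesWithOptionalActors(int maxYear) {
        List<String[]> rows = new ArrayList<>();
        for (Map.Entry<String, Integer> m : MOVIES.entrySet()) {
            if (m.getValue() > maxYear) continue;           // WHERE m.year <= maxYear
            boolean matched = false;
            for (String[] c : CAST) {                       // OPTIONAL MATCH part
                if (c[0].equals(m.getKey())) {
                    rows.add(new String[] {m.getKey(), c[1]});
                    matched = true;
                }
            }
            if (!matched) rows.add(new String[] {m.getKey(), null});
        }
        return rows;
    }

    public static void main(String[] args) {
        for (String[] row : moviesWithOptionalActors(2005)) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

With maxYear set to 2005 the sketch produces three rows for "samotari" and one row for "stesti" with a null actor, matching the shape of the table above.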

Return Clause
RETURN clause
• Defines what to include in the query result
Projection of variables, accessing properties via dot notation,
aggregation functions, …
• Optional ORDER BY, SKIP and LIMIT sub-clauses
Syntax: RETURN [DISTINCT] projection [ORDER BY clause] [SKIP clause] [LIMIT clause]

RETURN DISTINCT
• Duplicate solutions (rows) are removed

Return Clause
Projection
• * = all the variables
Can only be specified as the very first item
• AS allows us to explicitly (re)name output records
Syntax: * [, expression [AS variable] ...]
or: expression [AS variable] [, expression [AS variable] ...]

Return Clause
ORDER BY sub-clause
• Defines the order of solutions within the query result
Multiple criteria can be specified
Default direction is ASC
• The order is undefined unless explicitly defined
• Nodes and relationships as such cannot be used as criteria
Syntax: ORDER BY expression [ASC | ASCENDING | DESC | DESCENDING] [, ...]

Return Clause
SKIP sub-clause
• Defines the number of solutions to be skipped
in the query result
SKIP expression

LIMIT sub-clause
• Defines the number of solutions to be included
in the query result
LIMIT expression
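
The combined effect of ORDER BY, SKIP and LIMIT can be pictured with a Java stream pipeline over the sample actors. This is an analogy under stated assumptions, not Neo4j code: OrderSkipLimitSketch, Actor and page are hypothetical names, and the pipeline roughly corresponds to MATCH (a:actor) RETURN a.id ORDER BY a.year DESC SKIP skip LIMIT limit.

```java
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// Toy analogy for ORDER BY ... DESC, SKIP and LIMIT; not Neo4j code.
class OrderSkipLimitSketch {
    static final class Actor {
        final String id;
        final int year;
        Actor(String id, int year) {
            this.id = id;
            this.year = year;
        }
    }

    // Actors a1..a4 of the sample graph (id and year of birth only).
    static final List<Actor> ACTORS = List.of(
            new Actor("trojan", 1964),
            new Actor("machacek", 1966),
            new Actor("schneiderova", 1973),
            new Actor("sverak", 1936));

    static List<String> page(int skip, int limit) {
        return ACTORS.stream()
                .sorted(Comparator.comparingInt((Actor a) -> a.year).reversed()) // ORDER BY year DESC
                .skip(skip)                                                      // SKIP
                .limit(limit)                                                    // LIMIT
                .map(a -> a.id)                                                  // RETURN a.id
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(page(1, 2)); // [machacek, trojan]
    }
}
```

The sort has to come first in the pipeline: skipping or limiting an unordered result would pick arbitrary rows, which is why SKIP and LIMIT are normally paired with ORDER BY.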

With Clause
WITH clause
• Constructs intermediate result
Behaves analogously to the RETURN clause
Does not output anything to the user,
just forwards the current result to the subsequent clause
• Optional WHERE sub-clause can also be provided
Syntax: WITH [DISTINCT] projection [ORDER BY clause] [SKIP clause] [LIMIT clause] [WHERE expression]

Write Clauses
Query clauses evaluation in general
• WITH and RETURN clauses terminate individual query
parts
• Within each query part…
All the read clauses are evaluated first (if any),
only then the write clauses are evaluated (if any)
• Read-only queries must return data
(i.e. must contain RETURN clause)
Write clauses and sub-clauses
• CREATE – creates new nodes or relationships
• DELETE – deletes nodes or relationships
• SET – updates labels or properties
• REMOVE – removes labels or properties
Write Clauses
CREATE clause
• Inserts new nodes or relationships into the data graph
Syntax: CREATE [variable =] path pattern

Write Clauses
DELETE clause
• Removes nodes, relationships or paths from the data graph
• Relationships must be removed before the nodes
they are associated with
Unless the DETACH modifier is specified

Syntax: [DETACH] DELETE expression [, ...]

Write Clauses
SET clause
• Allows us to…
set a value for a particular property
– or remove a property when NULL is assigned
replace all the current properties with new ones
add new properties to the existing ones
add labels to nodes
• Cannot be used to set relationship types
Syntax: SET followed by one of
variable.property key = expression
variable = expression
variable += expression
variable :label [:label ...]
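
The three property-related forms of SET differ in how they treat the existing properties. The plain-Java sketch below models a node's properties as a mutable map to show the difference. It is an illustration only: SetClauseSketch and its method names are invented, and labels are left out.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of SET on a property map; not Neo4j code.
class SetClauseSketch {
    // SET n.key = value: assigns one property, assigning null removes it.
    static void setProperty(Map<String, Object> props, String key, Object value) {
        if (value == null) {
            props.remove(key);
        } else {
            props.put(key, value);
        }
    }

    // SET n = { ... }: replaces all the current properties with new ones.
    static void replaceProperties(Map<String, Object> props, Map<String, Object> newProps) {
        props.clear();
        props.putAll(newProps);
    }

    // SET n += { ... }: adds or overwrites the listed properties, keeps the rest.
    static void addProperties(Map<String, Object> props, Map<String, Object> extra) {
        for (Map.Entry<String, Object> e : extra.entrySet()) {
            setProperty(props, e.getKey(), e.getValue());
        }
    }

    public static void main(String[] args) {
        Map<String, Object> actor = new HashMap<>();
        actor.put("id", "trojan");
        actor.put("year", 1964);

        setProperty(actor, "year", null);                    // year is removed
        addProperties(actor, Map.of("award", "Czech Lion")); // id kept, award added
        System.out.println(actor.keySet().size());           // 2

        replaceProperties(actor, Map.of("id", "trojan"));    // only id remains
        System.out.println(actor);                           // {id=trojan}
    }
}
```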

Write Clauses
REMOVE clause
• Allows us to…
remove a particular property
remove labels from nodes
• Cannot be used to remove
relationship types
Syntax: REMOVE followed by one of
variable.property key
variable :label [:label ...]

Expressions
Literal expressions
• Integers: decimal, octal, hexadecimal
• Floating-point numbers
• Strings
Enclosed in double or single quotes
Standard escape sequences
• Boolean values: true, false
• NULL value (cannot be stored in data graphs)
Other expressions
• Collections, variables, property accessors, function calls,
path patterns, boolean expressions, arithmetic expressions,
comparisons, regular expressions, predicates, …

Lecture Conclusion
Neo4j = graph database
• Property graphs
• Traversal framework
Path expanders, uniqueness, evaluators, traverser
Cypher = graph query language
• Read (sub-)clauses: MATCH, WHERE, …
• Write (sub-)clauses: CREATE, DELETE, SET, REMOVE, …
• General (sub-)clauses: RETURN, WITH, ORDER BY, LIMIT, …

