WITH NEO4J
1. Introduction
The objective of this practice is to give the trainees a broad overview of network
analysis. Concretely, data from COMTRADE will be loaded into Neo4J and analysed.
2. Neo4J installation
Go to the following url:
https://neo4j.com/download/community-edition/
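Click "Download now", making sure that the correct operating system is selected.
If needed, increase the memory available to the database by setting the heap size in the
neo4j.conf configuration file:
dbms.memory.heap.initial_size=4096m
dbms.memory.heap.max_size=6144m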
Run the Neo4J program and select the database location (you can leave the default
value). Then, press the Start button and wait a few seconds.
Once the server has started, it is accessible from a web browser at the following url:
http://localhost:7474
Or
http://127.0.0.1:7474
● Leave the host and username as they are (see the screenshot below).
● The default password is neo4j.
You will be asked to change the password. Please enter the following password:
estp2018
You will be presented with a screen such as the following:
The dataset for this practice can be downloaded from the following url:
https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/comtrade/comtrade.graph.csv
This dataset was prepared from the 2016 Comtrade import/export data among countries. If
you open the file with Excel, you will see three columns:
● Number of countries:
MATCH (c:Country)
RETURN count(c) AS n
● Number of transactions (relationships) among countries:
MATCH (s:Country)-[r:e]-(t:Country)
WITH count(distinct r) as n
RETURN n
Network statistics
● Degree: basic statistics about the countries' transactions (minimum, maximum,
average and standard deviation of the number of outgoing transactions per
country):
MATCH (c:Country)-[:e]->()
WITH c, COUNT(*) as num
RETURN MIN(num) as min, MAX(num) as max, AVG(num) as average, STDEV(num) as stdev
● Weighted degree: basic statistics about the volume ($) of the transactions
among countries (minimum, maximum, average and standard deviation of the
outgoing transaction volume per country):
MATCH (c:Country)-[r:e]->()
WITH c, sum(r.weight) AS w
RETURN MIN(w) as min, MAX(w) as max, AVG(w) as average, STDEV(w) as stdev
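The two degree queries can be cross-checked offline. The following is a minimal Python sketch, not part of the original practice, that reproduces them on a small hypothetical edge list (EDGES and the function names are illustrative assumptions, not real Comtrade data). As with the Cypher pattern (c)-[:e]->(), only countries with at least one outgoing relationship take part in the statistics, and statistics.stdev, like Cypher's stDev(), is the sample standard deviation (denominator N - 1).

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy edge list (source, target, weight in $); NOT real Comtrade data.
EDGES = [
    ("A", "B", 100.0),
    ("A", "C", 50.0),
    ("B", "C", 30.0),
    ("C", "A", 20.0),
    ("D", "A", 10.0),
]


def out_degree_stats(edges):
    """Mirror MATCH (c)-[:e]->() WITH c, COUNT(*) AS num: min, max, avg, stdev.

    Countries without outgoing relationships never appear, as in the query.
    """
    degree = defaultdict(int)
    for source, _target, _weight in edges:
        degree[source] += 1
    values = list(degree.values())
    # statistics.stdev is the sample standard deviation, like Cypher's stDev().
    return min(values), max(values), mean(values), stdev(values)


def weighted_out_degree_stats(edges):
    """Mirror MATCH (c)-[r:e]->() WITH c, sum(r.weight) AS w: min, max, avg, stdev."""
    volume = defaultdict(float)
    for source, _target, weight in edges:
        volume[source] += weight
    values = list(volume.values())
    return min(values), max(values), mean(values), stdev(values)


if __name__ == "__main__":
    print("degree:", out_degree_stats(EDGES))
    print("weighted degree:", weighted_out_degree_stats(EDGES))
```

On this toy list the out-degrees are A=2, B=C=D=1, so the average is 1.25 and the sample standard deviation is 0.5.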
● Graph density: measures how many edges exist compared with the maximum possible
number of edges between nodes. Here, it is the number of trade relationships among
countries divided by the maximum possible number, n(n-1) for n countries in a directed graph:
MATCH (s:Country)-[r:e]-(t:Country)
RETURN count(distinct s) as nNodes,
count(distinct r) as nEdges,
count(DISTINCT r)/(count(DISTINCT s) * (count(DISTINCT s) - 1.0)) AS graphDensity
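As a worked check of the density formula, here is a minimal Python sketch on a hypothetical toy edge list (EDGES and directed_density are illustrative names). It assumes, as count(DISTINCT r) implicitly does for a density between 0 and 1, at most one relationship per ordered pair of countries and no self-loops.

```python
# Hypothetical toy edge list (source, target, weight); NOT real Comtrade data.
EDGES = [
    ("A", "B", 100.0),
    ("A", "C", 50.0),
    ("B", "C", 30.0),
    ("C", "A", 20.0),
    ("D", "A", 10.0),
]


def directed_density(edges):
    """Relationships divided by n(n-1), the number of ordered pairs of
    distinct countries. Assumes one relationship per ordered pair."""
    nodes = {s for s, _t, _w in edges} | {t for _s, t, _w in edges}
    n = len(nodes)
    if n < 2:
        return 0.0  # density is undefined for fewer than two nodes
    return len(edges) / (n * (n - 1))


print(directed_density(EDGES))  # 5 relationships / (4 * 3) pairs
```

With 4 countries there are 4 * 3 = 12 possible directed relationships, so 5 relationships give a density of 5/12, roughly 0.42.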
● Top 10 shortest paths, ranked by length (the longest shortest path is the network diameter)
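In Neo4J this is usually done with Cypher's shortestPath() function over every pair of countries. As an offline illustration of the idea, the sketch below computes all-pairs hop distances with breadth-first search and keeps the longest ones. It is an assumption-laden sketch: relationships are treated as undirected, weights are ignored, and EDGES and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical toy relationships (source, target); weights are ignored here.
EDGES = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "E")]


def bfs_distances(adjacency, start):
    """Hop counts from start to every reachable node (breadth-first search)."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def longest_shortest_paths(edges, k=10):
    """Return the k node pairs whose shortest path is longest, as
    (length, a, b) tuples, treating relationships as undirected."""
    adjacency = defaultdict(set)
    for s, t in edges:
        adjacency[s].add(t)
        adjacency[t].add(s)
    nodes = sorted(adjacency)
    pairs = []
    for i, a in enumerate(nodes):
        dist = bfs_distances(adjacency, a)
        for b in nodes[i + 1:]:
            if b in dist:  # pairs in different components have no path
                pairs.append((dist[b], a, b))
    # Longest first; ties broken alphabetically for a stable result.
    pairs.sort(key=lambda p: (-p[0], p[1], p[2]))
    return pairs[:k]


top = longest_shortest_paths(EDGES)
diameter = top[0][0] if top else 0
print(top[:3], "diameter:", diameter)
```

Running one BFS per node costs O(n·(n+m)), which stays cheap for a network of a few hundred countries.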
Imports/Exports analysis
● List of countries, number and volume of imports, ordered by volume
MATCH (c:Country)<-[r:e]-()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC
● List of countries, number and volume of exports, ordered by volume
MATCH (c:Country)-[r:e]->()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC
● List of countries, number and volume of imports plus exports, ordered by volume
MATCH (c:Country)-[r:e]-()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC
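The three per-country rankings above differ only in the direction of the relationship pattern. The minimal Python sketch below makes that explicit on a hypothetical toy edge list: "out" mirrors (c)-[r:e]->(), "in" mirrors (c)<-[r:e]-(), and "both" mirrors the undirected (c)-[r:e]-(). EDGES and per_country_totals are illustrative names; which direction counts as imports follows the queries themselves.

```python
from collections import defaultdict

# Hypothetical toy edge list (source, target, weight in $); NOT real Comtrade data.
EDGES = [
    ("A", "B", 100.0),
    ("A", "C", 50.0),
    ("B", "C", 30.0),
    ("C", "A", 20.0),
]


def per_country_totals(edges, direction):
    """Return (country, n, w) rows ordered by volume, like ORDER BY w DESC.

    direction: "out" counts c as source, "in" counts c as target,
    "both" counts each relationship at both of its endpoints.
    """
    if direction not in ("out", "in", "both"):
        raise ValueError("direction must be 'out', 'in' or 'both'")
    count = defaultdict(int)
    volume = defaultdict(float)
    for source, target, weight in edges:
        endpoints = []
        if direction in ("out", "both"):
            endpoints.append(source)
        if direction in ("in", "both"):
            endpoints.append(target)
        for country in endpoints:
            count[country] += 1
            volume[country] += weight
    rows = [(c, count[c], volume[c]) for c in volume]
    rows.sort(key=lambda row: row[2], reverse=True)
    return rows


for d in ("out", "in", "both"):
    print(d, per_country_totals(EDGES, d))
```

Note that for "both" each country's total is simply the sum of its "out" and "in" rows, which is what the undirected Cypher pattern computes.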
● Visualising the subgraph formed by the 10 largest bilateral flows by volume ($):
MATCH g=(s:Country)-[r:e]->(t:Country)
WITH g, sum(r.weight) as w
RETURN g
ORDER BY w DESC
LIMIT 10
● List of countries that imported from more than a given number of countries
MATCH (s:Country)-[r:e]->(t:Country)
WITH t, count(r) as n
WHERE n>200
RETURN t.name, n
● List of countries whose imports exceed a given volume ($)
MATCH (s:Country)-[r:e]->(t:Country)
WITH t, sum(r.weight) as w
WHERE w>1000000000000
RETURN t.name, w
● List of countries that imported from fewer than a given number of countries and whose
imports exceed a given volume ($)
MATCH (s:Country)-[r:e]->(t:Country)
WITH t, count(r) as n, sum(r.weight) as w
WHERE n<200 AND w>1000000000000
RETURN t.name, n, w
● Countries from which the USA imports, ordered by volume ($)
MATCH (s:Country)-[r:e]->(t:Country {name:"USA"})
WITH s, t, sum(r.weight) AS w
RETURN s.name, t.name, w
ORDER BY w DESC
● Total volume ($) of the USA's imports
MATCH (s:Country)-[r:e]->(t:Country {name:"USA"})
WITH t, sum(r.weight) AS w
RETURN t.name, w
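The threshold and per-partner queries in this section all aggregate incoming relationships of a target country. The minimal Python sketch below mirrors them on hypothetical toy data, following these queries' reading of (s)-[:e]->(t) as t importing from s. EDGES, the country codes and the function names are illustrative assumptions; as with count(r), n counts relationships, not distinct partner countries.

```python
from collections import defaultdict

# Hypothetical toy flows (exporter, importer, volume in $); NOT real data.
EDGES = [
    ("MEX", "USA", 50.0),
    ("CHN", "USA", 80.0),
    ("CAN", "USA", 60.0),
    ("USA", "CAN", 40.0),
    ("MEX", "USA", 5.0),
]


def incoming_by_country(edges):
    """Number of incoming relationships and incoming volume per target."""
    count = defaultdict(int)
    volume = defaultdict(float)
    for _source, target, weight in edges:
        count[target] += 1
        volume[target] += weight
    return count, volume


def importers_matching(edges, max_count, min_volume):
    """Mirror WITH t, count(r) AS n, sum(r.weight) AS w WHERE n < ... AND w > ..."""
    count, volume = incoming_by_country(edges)
    return sorted(
        (t, count[t], volume[t])
        for t in count
        if count[t] < max_count and volume[t] > min_volume
    )


def partners_of(edges, target):
    """Volume received by target from each source (largest first) and the total."""
    volume = defaultdict(float)
    for source, t, weight in edges:
        if t == target:
            volume[source] += weight
    ranking = sorted(volume.items(), key=lambda item: item[1], reverse=True)
    return ranking, sum(volume.values())


print(importers_matching(EDGES, max_count=5, min_volume=100.0))
print(partners_of(EDGES, "USA"))
```

Because partners_of groups by source, the two MEX flows are merged into one row, just as sum(r.weight) grouped by s does in Cypher.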
6. Conclusions
Networks allow us to represent, visualise and analyse data in a different way, revealing
patterns that are hidden at first sight. Neo4J is a powerful yet easy-to-use tool to store and
query a network.