
COMTRADE ANALYSIS WITH NEO4J
1. Introduction
The objective of this practice is to give trainees a broad overview of network
analysis. Concretely, data from COMTRADE will be loaded and analysed with Neo4J.

2. Neo4J installation
Go to the following url:

https://neo4j.com/download/community-edition/

Select download now, ensuring that the operating system is properly selected:

Follow the installer steps:


3. Starting up the Neo4J server
Before running Neo4J it is recommended to adjust the heap memory in the conf/neo4j.conf
file, depending on your server resources:

dbms.memory.heap.initial_size=4096m
dbms.memory.heap.max_size=6144m

Run the Neo4J program and select the database location (you can leave the default
value). Then press the start button and wait a few seconds.
When the server starts, it will be accessible through the web browser at the following URL:

http://localhost:7474
Or

http://127.0.0.1:7474

4. Setting up the application


The first time you connect to the Neo4J web interface, you will be asked for some parameters.

● Leave the host and username as they are (see the capture below).
● The default password is neo4j

You will be asked to change the password. Please enter the following password:

estp2018
You will be presented with a screen such as the following:

5. Training with Comtrade


Download the Comtrade data from the following url:

https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/comtrade/comtrade.graph.csv

This dataset has been prepared from the Comtrade 2016 imports/exports among countries. If
you open the file with Excel, you will see three columns:

● Source, with the source country.
● Target, with the target country.
● Weight, with the amount of the import/export.
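
As a quick check of the three-column layout described above, the file can also be read outside Excel. The following is only a minimal Python sketch: the inline sample rows, the country codes A, B and C, and the lowercase header names are assumptions made for illustration, not real Comtrade figures.

```python
import csv
import io

# Toy stand-in for comtrade.graph.csv. The rows and the lowercase header
# names are assumptions for illustration, not real Comtrade figures.
SAMPLE = """source,target,weight
A,B,100
B,A,40
A,C,25
"""


def read_edges(text):
    """Parse CSV text into (source, target, weight) tuples with integer weights."""
    reader = csv.DictReader(io.StringIO(text))
    return [(row["source"], row["target"], int(row["weight"])) for row in reader]


edges = read_edges(SAMPLE)
# Every country appears as a source, a target, or both.
countries = sorted({name for s, t, _ in edges for name in (s, t)})
print(edges)
print(countries)
```

To try it on the downloaded file, the text of that file could be passed to `read_edges` instead of `SAMPLE`, provided the header names match.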

Code snippets available at:

https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/neo4j/neo4j.txt

Loading the dataset with Cypher

From the aforementioned dataset, we create:


● Nodes representing countries (should be unique)
● Edges representing import/export relationships among countries (should be unique)
○ Weights of these relationships

CREATE CONSTRAINT ON (c:Country) ASSERT c.name IS UNIQUE;

LOAD CSV WITH HEADERS FROM "https://s3-eu-west-1.amazonaws.com/autoritas.academy/EuroStat/comtrade/comtrade.graph.csv" AS row
MERGE (s:Country {name:row.source})
MERGE (t:Country {name:row.target})
MERGE (s)-[r:e]->(t)
ON CREATE SET r.weight=toInteger(row.weight);
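
To make the effect of the MERGE clauses easier to see, here is a rough plain-Python analogue of the loading step. It is only a sketch under stated assumptions: the function name `load_graph` and the toy rows are hypothetical, and a set and a dictionary stand in for the database.

```python
def load_graph(rows):
    """Mimic the MERGE logic: unique nodes, unique directed edges, and a
    weight that is stored only when the edge is first created."""
    nodes = set()
    edges = {}  # (source, target) -> weight
    for source, target, weight in rows:
        nodes.add(source)  # like MERGE (s:Country {name: row.source})
        nodes.add(target)  # like MERGE (t:Country {name: row.target})
        # like MERGE (s)-[r:e]->(t) ON CREATE SET r.weight = ...
        edges.setdefault((source, target), int(weight))
    return nodes, edges


# Toy rows; the third row repeats the A -> B edge and is therefore ignored.
rows = [("A", "B", "100"), ("B", "A", "40"), ("A", "B", "999")]
nodes, edges = load_graph(rows)
print(nodes, edges)
```

Note that a repeated source/target pair keeps the weight of its first occurrence, which is what ON CREATE SET implies.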

Basic exploratory analysis

● Number of countries

MATCH (c:Country) RETURN count(c)

● Number of transactions (imports/exports)

MATCH (s:Country)-[r:e]-(t:Country)
WITH count(distinct r) as n
RETURN n

● Pre-visualising the graph (limiting to 10 nodes)

MATCH (c:Country) RETURN c LIMIT 10

Network statistics

● Graph Degree - Some basic statistics about countries' transactions (e.g. minimum,
maximum, average and standard deviation of the number of transactions per
country):

MATCH (c:Country)-[:e]->()
WITH c, COUNT(*) as num
RETURN MIN(num) as min, MAX(num) as max, AVG(num) as average, STDEV(num) as stdev

● Weighted Degree - Some basic statistics about the volume ($) of the transactions
among countries (minimum, maximum, average and standard deviation of the
volume of the transactions per country):

MATCH (c:Country)-[r:e]->()
WITH c, sum(r.weight) AS w
RETURN MIN(w) as min, MAX(w) as max, AVG(w) as average, STDEV(w) as stdev
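
The two degree queries above follow the same pattern: group the outgoing relationships by country, then summarise the per-country values. A minimal Python sketch of that pattern follows, using made-up edges and a hypothetical helper called `summarise`. It uses the sample standard deviation, on the understanding that this is what Cypher's STDEV computes.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy directed edges (source, target) -> weight; the values are made up.
edges = {
    ("A", "B"): 100,
    ("A", "C"): 25,
    ("B", "A"): 40,
    ("C", "A"): 10,
    ("C", "B"): 5,
}

out_degree = defaultdict(int)  # number of outgoing relationships per country
out_weight = defaultdict(int)  # summed weight of outgoing relationships
for (source, _target), weight in edges.items():
    out_degree[source] += 1
    out_weight[source] += weight


def summarise(values):
    """Return min, max, average and sample standard deviation of the values."""
    values = list(values)
    return {
        "min": min(values),
        "max": max(values),
        "average": mean(values),
        "stdev": stdev(values),
    }


degree_stats = summarise(out_degree.values())
weight_stats = summarise(out_weight.values())
print(degree_stats)
print(weight_stats)
```

As in the queries, a country with no outgoing relationship does not appear in the groups at all, so it does not pull the minimum down to zero.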

● Graph density: Measures how many edges exist in comparison with the maximum
possible number of edges between nodes. In this case, how many transactions exist
among countries with respect to the maximum possible number of transactions.

MATCH (s:Country)-[r:e]-(t:Country)
RETURN count(DISTINCT s) AS nNodes,
count(DISTINCT r) AS nEdges,
count(DISTINCT r) / (count(DISTINCT s) * (count(DISTINCT s) - 1.0)) AS graphDensity
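
For a directed graph without self-loops, the usual definition of density is E / (N * (N - 1)), where E is the number of edges and N the number of nodes, since each of the N nodes can point to at most N - 1 others. A small worked example in Python, with a hypothetical function name and toy counts:

```python
def directed_density(n_nodes, n_edges):
    """Density of a directed graph without self-loops: E / (N * (N - 1))."""
    if n_nodes < 2:
        return 0.0  # fewer than two nodes: no edge is possible
    return n_edges / (n_nodes * (n_nodes - 1))


# Toy case: 3 countries allow at most 3 * 2 = 6 directed edges.
density = directed_density(3, 5)
full = directed_density(3, 6)
print(density, full)
```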

Paths among countries

● Top 10 longest shortest paths (the longest shortest path between any two countries is the network diameter)

MATCH (a:Country), (b:Country)
WHERE id(a) > id(b)
MATCH p = shortestPath((a)-[:e*]-(b))
RETURN length(p) AS len, [x IN nodes(p) | x.name] AS path
ORDER BY len DESC LIMIT 10
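
The idea behind the diameter can be illustrated without a database: compute the shortest distance between every pair of nodes with a breadth-first search, ignoring edge direction as the query does, and keep the largest value. The sketch below uses a made-up five-node graph and hypothetical helper names.

```python
from collections import deque


def undirected_adjacency(edges):
    """Build a neighbour map that ignores edge direction."""
    adjacency = {}
    for source, target in edges:
        adjacency.setdefault(source, set()).add(target)
        adjacency.setdefault(target, set()).add(source)
    return adjacency


def bfs_distances(adjacency, start):
    """Shortest hop count from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


# Toy chain E - A - B - C - D; the names are placeholders, not countries.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "E")]
adjacency = undirected_adjacency(edges)

pair_lengths = []
for a in sorted(adjacency):
    for b, d in bfs_distances(adjacency, a).items():
        if a < b:  # count each unordered pair once
            pair_lengths.append((d, a, b))

pair_lengths.sort(reverse=True)
diameter = pair_lengths[0][0]
print(pair_lengths[:10], diameter)
```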

● Shortest path between two countries

MATCH (s:Country {name:"China"}), (t:Country {name:"Chile"})
MATCH p=shortestPath((s)-[:e*]-(t))
RETURN p

● All shortest paths between two countries

MATCH (s:Country {name:"China"}), (t:Country {name:"Chile"})
MATCH p=allShortestPaths((s)-[:e*]-(t))
RETURN p
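
To show what "all shortest paths" means, the following Python sketch enumerates every minimum-length path between two nodes of a small made-up graph. It uses a breadth-first search that remembers all predecessors at the shortest distance. This is one common way to do it, not necessarily how Neo4J implements allShortestPaths, and the function name and node names are hypothetical.

```python
from collections import deque


def all_shortest_paths(adjacency, start, goal):
    """Return every minimum-length path from start to goal, sorted."""
    dist = {start: 0}
    preds = {start: []}  # node -> predecessors lying on some shortest path
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                preds[neighbour] = [node]
                queue.append(neighbour)
            elif dist[neighbour] == dist[node] + 1:
                preds[neighbour].append(node)  # another equally short route
    if goal not in dist:
        return []

    def build(node):
        if node == start:
            return [[start]]
        return [path + [node] for p in preds[node] for path in build(p)]

    return sorted(build(goal))


# Toy undirected graph: A reaches D through B or through C.
adjacency = {
    "A": {"B", "C"},
    "B": {"A", "D"},
    "C": {"A", "D"},
    "D": {"B", "C"},
}
paths = all_shortest_paths(adjacency, "A", "D")
print(paths)
```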

Imports/Exports analysis

● List of countries, total imports and volume of the imports, ordered by volume

MATCH (c:Country)-[r:e]->()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC

● List of countries, total exports and volume of the exports, ordered by volume

MATCH (c:Country)<-[r:e]-()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC

● List of countries, total number and volume of the imports plus exports, ordered by volume

MATCH (c:Country)-[r:e]-()
WITH c, count(r) AS n, sum(r.weight) AS w
RETURN c.name as country, n, w
ORDER BY w DESC
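
The three listing queries above differ only in the direction of the relationship pattern: outgoing, incoming, or both. A minimal Python sketch of the same aggregation, with toy weights and a hypothetical `aggregate` helper, makes that difference explicit.

```python
from collections import defaultdict

# Toy directed edges (source, target) -> weight; the values are made up.
edges = {("A", "B"): 100, ("A", "C"): 25, ("B", "A"): 40, ("C", "A"): 10}


def aggregate(edges, direction):
    """Count and sum relationships per country.

    direction: 'out' for (c)-[r]->(), 'in' for (c)<-[r]-(), 'both' for (c)-[r]-().
    Returns (country, n, w) rows ordered by w, largest first.
    """
    totals = defaultdict(lambda: [0, 0])  # country -> [n, w]
    for (source, target), weight in edges.items():
        ends = {"out": [source], "in": [target], "both": [source, target]}[direction]
        for country in ends:
            totals[country][0] += 1
            totals[country][1] += weight
    rows = [(country, n, w) for country, (n, w) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


outgoing = aggregate(edges, "out")
incoming = aggregate(edges, "in")
both = aggregate(edges, "both")
print(outgoing, incoming, both)
```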

● Visualising the graph with the top countries per volume ($) of imports:

MATCH g=(s:Country)-[r:e]->(t:Country)
WITH g, sum(r.weight) as w
RETURN g
ORDER BY w DESC
LIMIT 10

Filtering the graph

● List of countries that imported from more than a given number of countries

MATCH (s:Country)-[r:e]->(t:Country)
WITH t, count(r) as n
WHERE n>200
RETURN t.name, n

● List of countries that imported more than a given volume

MATCH (s:Country)-[r:e]->(t:Country)
WITH t, sum(r.weight) as w
WHERE w>1000000000000
RETURN t.name, w

● List of countries that imported from fewer than a given number of countries and more
than a given volume

MATCH (s:Country)-[r:e]->(t:Country)
WITH t, count(r) as n, sum(r.weight) as w
WHERE n<200 AND w>1000000000000
RETURN t.name, n
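
The filtering queries above all group the relationships by target country first and then apply thresholds to the grouped values, which is the role of WITH followed by WHERE. A short Python sketch of that two-step logic follows. The edges, the thresholds and the names `MAX_PARTNERS` and `MIN_VOLUME` are made up for illustration.

```python
from collections import defaultdict

# Toy directed edges (source, target) -> weight; the values are made up.
edges = {("A", "C"): 500, ("B", "C"): 700, ("A", "B"): 50, ("C", "A"): 2000}


def incoming_summary(edges):
    """Group by target: n = number of incoming relationships, w = their summed weight."""
    summary = defaultdict(lambda: {"n": 0, "w": 0})
    for (_source, target), weight in edges.items():
        summary[target]["n"] += 1
        summary[target]["w"] += weight
    return dict(summary)


summary = incoming_summary(edges)

# Hypothetical thresholds playing the role of n<200 AND w>1000000000000.
MAX_PARTNERS = 2
MIN_VOLUME = 1000
selected = sorted(
    name
    for name, stats in summary.items()
    if stats["n"] < MAX_PARTNERS and stats["w"] > MIN_VOLUME
)
print(summary, selected)
```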

● List of countries that imported from a given country, ordered by volume

MATCH g=(s:Country)-[r:e]->(t:Country{name:"USA"})
WITH s, t, g, sum(r.weight) as w
RETURN s.name, t.name, w
ORDER BY w DESC

● Total volume of imports from a given country

MATCH g=(s:Country)-[r:e]->(t:Country{name:"USA"})
WITH t, sum(r.weight) as w
RETURN t.name, w

Deleting the graph

MATCH (n) DETACH DELETE n

6. Conclusions
Networks allow us to represent, visualise and analyse data in a different manner, revealing
patterns that are not visible at first glance. Neo4J is a powerful yet easy-to-use tool for
storing and querying a network.
