
Complex Systems

CS-E5740

Complex Networks
Lectures 1-2:
Jari Saramäki
jari.saramaki@aalto.fi
@JariSaramaki

Lectures 3-6:
Mikko Kivelä
mikko.kivela@aalto.fi
www.mkivela.com

Assistants:
Onerva Korhonen
Tuomas Alakörkkö
Sara Heydari
Elisa Ryyppö

fall 2016
Goals
After the course, you should

• know how to analyze and characterize networks,

• know the fundamental network models,

• understand how networks evolve, and

• know how network structure affects dynamical processes.
A Brief How-To
• For all details, see MyCourses!

• No exam

• Weekly problem sets (return online)


• 60% of points needed for passing

• see grading table in MyCourses

• Project work (due end 2016)


Pipeline for one problem set
Timeline: Wed, week i → Wed, week i+1 → Mon, week i+2

Stage                 What to do                 Where
Lecture X             intro to set X             TU1
Exercise session      get started, get help!     Maari M
Exercise session      do most of the work        Y342a
Pre-DL ex. session    ask last questions!        Y342a
DL at 23:55           finalize, submit!          MyCourses
Continuous feedback!

• We will collect feedback for each exercise set and the project.
• We reward you with 1 bonus point for each time you give feedback.
• We will publish summaries after each round!
Course outline
• Nov 2: Introduction; random (Erdős-Rényi) networks
• TUTORIAL: Python + NetworkX
• Nov 9: Small-world networks; scale-free networks
• TUTORIAL: Statistics with Python
• Nov 16: Network analysis and measures
• Nov 23: Weighted & social networks
• Nov 30: Communities & graph clustering
• Dec 7: Temporal networks & multilayer networks
Part I:
Why Study Networks?
Everything is a network

• gene regulatory networks
• metabolic networks
• protein interaction networks
• nerve cells
• power grids
• social systems
• world trade
• the human brain
• the Internet
• transport networks

(Brain image: Hagmann P, Kurant M, Gigandet X, Thiran P, Wedeen VJ, et al. (2007) Mapping Human Whole-Brain Structural Networks with Diffusion MRI. PLoS ONE 2(7): e597.)
What can network science tell us?
Internet
- WHY IS THE INTERNET ALWAYS ON, WHY DOESN’T IT FAIL?
- WHY IS IT SO HARD TO ERADICATE ELECTRONIC VIRUSES?

human brain
- HOW AND WHY DO WE THINK?
- WHAT ARE THE DIFFERENCES BETWEEN NORMAL AND SCHIZOPHRENIC BRAINS?

world trade
- WHAT DOES GLOBALIZATION MEAN IN PRACTICE?
- HOW CAN DEVELOPING COUNTRIES DEVELOP FURTHER?

(Brain image: Hagmann P, Kurant M, Gigandet X, Thiran P, Wedeen VJ, et al. (2007) Mapping Human Whole-Brain Structural Networks with Diffusion MRI. PLoS ONE 2(7): e597.)
What can network science tell us?

disease spreading
- HOW DO DISEASES SPREAD?
- WHAT IS THE ROLE OF THE UNDERLYING CONTACT AND TRANSPORT NETWORKS?
- HOW TO PREVENT GLOBAL EPIDEMICS?

Complex systems as networks


• Links denote interactions between nodes
  ‣ Interactions of different strength: weighted networks
  ‣ Interactions of different direction: directed networks
  ‣ Time-dependent interactions: temporal networks
  ‣ Interactions of different type: multiplex networks
• Vertex = node (synonyms), edge = link

Vertex     Edge
person     friendship
neuron     synapse
WWW        hyperlink
company    ownership
gene       regulation
What is a link? 

...that is not always straightforward.

Multiple types of links can be included using multilayer networks or multiplex networks. (More on these in the last lecture.)
The real question:
How do we deal with such things?
[Figure labels: Protein 1, Protein 2, Protein 3. Source: H. Jeong, S.P. Mason, A.-L. Barabási, Z.N. Oltvai, Nature 411, 41-42 (2001)]
The network approach
1) Make empirical observations
2) Try to explain observations
   2.1) Choose the right level of coarse-graining (interacting elements = nodes, interactions = links)
   2.2) Strip the problem, disregard some detail (assume the node-link structure contains the answers)
   2.3) Cast the problem as maths (analyze networks, simulate processes, write theories)
3) See if theories, calculations, or simulations can reproduce findings, or predict something
4) Start over from 1), refine
Part II:
Basic concepts
What is a network/graph?
A graph $G = (V, E)$ consists of
• $N$ vertices $V = \{v_1, \dots, v_N\}$,
• $m$ edges $E = \{e_1, \dots, e_m\}$, where each edge is a pair of vertices, $e_i = (v_j, v_k)$.

If the vertex pairs $(v_j, v_k)$ are ordered, the network is directed. Then $e_i = (v_j, v_k)$ means that there is an edge from $v_j$ to $v_k$.

Otherwise the network is undirected, and $e_i = (v_j, v_k) = (v_k, v_j)$ means that $v_j$ and $v_k$ are connected.

Note: link = edge, node = vertex, network = graph. Choose whichever you like.
Simple graphs, multigraphs
A simple graph has no self-loops (links from a node to itself) and no multiple edges between the same pair of nodes.

(Please note that in directed networks $(v_i, v_j) \neq (v_j, v_i)$, so bidirectional links do not count as multiple edges.)

Otherwise the graph is a multigraph.

In this course we only deal with simple graphs.

[Figure: examples of a multigraph and of simple graphs]
Walks and paths
Naming conventions for walks and paths vary (alternative names in parentheses).
• A walk (path) is a sequence of vertices where each consecutive pair is connected by an edge.
• A path (self-avoiding walk/path, simple path) is a walk where vertices are never repeated. (Exception: the first and the last vertex can be the same.)
• Path length is the number of edges traversed along a path.
• A shortest path (geodesic path) is a path between a pair of nodes $v_i$ and $v_j$ with minimum path length.
• The distance (geodesic distance) $d_{ij}$ is the shortest path length between nodes $v_i$ and $v_j$.
• The diameter $d$ is the largest distance in the network: $d = \max_{i,j \in V} d_{ij}$.

[Figure: three nodes i, j, k in a chain] Path {i,j,k} has length 2. This is the distance between i and k, and also happens to be the diameter of this network.
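The distance and diameter definitions above can be checked with a breadth-first search. This is a minimal sketch in plain Python; the helper name `bfs_distances` and the adjacency-list layout are illustrative choices, not part of the course material.

```python
from collections import deque

def bfs_distances(adj, source):
    """Return shortest-path lengths (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:          # first visit gives the shortest distance
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

# The three-node chain i - j - k from the slide.
adj = {"i": ["j"], "j": ["i", "k"], "k": ["j"]}

d_ik = bfs_distances(adj, "i")["k"]
# Diameter: the largest distance over all pairs (assumes a connected graph).
diameter = max(max(bfs_distances(adj, s).values()) for s in adj)
print(d_ik, diameter)
```

NetworkX provides the same quantities through `nx.shortest_path_length` and `nx.diameter`.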
Subgraphs, cliques
A subgraph $G^* = (V^*, E^*)$ consists of some subset of nodes, $V^* \subseteq V$, together with some subset of the edges between those nodes. An induced subgraph contains all edges between the nodes in $V^*$.

Subgraph: $E^* \subseteq \{(v_i, v_j) \in E \mid v_i, v_j \in V^*\}$
Induced subgraph: $E^* = \{(v_i, v_j) \in E \mid v_i, v_j \in V^*\}$

A clique is a subgraph where all nodes are linked to all other nodes.

[Figure: graph G with nodes i, j, k, l, m and some (induced) subgraphs of G; one of them is a 3-clique]
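A short sketch of the induced-subgraph and clique definitions. The edge list is an assumption read off the five-node example figure (j linked to i, k, l, m, plus the link l-k), and the function names are hypothetical.

```python
from itertools import combinations

def induced_subgraph(edges, nodes):
    """Keep every edge of the graph whose two endpoints are both in `nodes`."""
    nodes = set(nodes)
    return {(u, v) for (u, v) in edges if u in nodes and v in nodes}

def is_clique(edges, nodes):
    """True if every pair of the given nodes is joined by an (undirected) edge."""
    undirected = {frozenset(e) for e in edges}
    return all(frozenset(pair) in undirected for pair in combinations(nodes, 2))

# Assumed edge list of the example graph G.
edges = {("i", "j"), ("m", "j"), ("l", "j"), ("k", "j"), ("l", "k")}

sub = induced_subgraph(edges, ["j", "k", "l"])
print(sorted(sub))
print(is_clique(edges, ["j", "k", "l"]), is_clique(edges, ["i", "j", "m"]))
```

Under that assumed edge list, the nodes j, k, l form a 3-clique, while i, j, m do not because i and m are not linked.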


Connectedness, components
A graph is connected if a path can be found between every pair of vertices.

If a graph is not connected, it consists of separate components (maximal connected subgraphs).

Note that because there is no path between nodes that belong to different components, their distance is undefined (or infinite).

[Figure: two graphs on nodes i, j, k, l, m. The first is connected; the second has two components, (i,j,m) and (l,k).]
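Components can be found by repeatedly exploring from an unvisited node. The sketch below uses plain Python; both edge lists are assumptions reconstructed from the figure labels.

```python
def connected_components(nodes, edges):
    """Return the components of an undirected graph as a list of node sets."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, components = set(), []
    for start in nodes:
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:                      # depth-first exploration from `start`
            u = stack.pop()
            if u in component:
                continue
            component.add(u)
            stack.extend(adj[u] - component)
        seen |= component
        components.append(component)
    return components

nodes = ["i", "j", "k", "l", "m"]
connected = [("i", "j"), ("m", "j"), ("l", "j"), ("k", "j"), ("l", "k")]  # assumed
split = [("i", "j"), ("m", "j"), ("l", "k")]                              # assumed

print(len(connected_components(nodes, connected)))
print(connected_components(nodes, split))
```

In NetworkX the corresponding call is `nx.connected_components(G)`.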
Directed networks: paths, components
Degree
If there is an edge $(v_i, v_j) \in E$,
• $v_i$ and $v_j$ are adjacent,
• $v_i$ is a neighbour of $v_j$,
• the edge is incident to $v_i$ and $v_j$.

[Figure: in the example graph the degree of j is 4; j's neighbours are (i,k,l,m).]

The degree $k_i$ of vertex $v_i$ is the number of edges it is incident to. (This is the number of neighbours in simple graphs. Loops are counted twice in multigraphs.)

For directed networks, one can consider separate in- and out-degrees. The in-degree $k_{i,\mathrm{in}}$ is defined as the number of edges leading to $v_i$ and the out-degree $k_{i,\mathrm{out}}$ as the number of edges leading out from $v_i$.

The average degree $\langle k \rangle$ of a network is $\langle k \rangle = \sum_i k_i / N = 2m/N$.
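The identity between the summed degrees and 2m can be verified on a small edge list. This sketch assumes the five-node example graph has the edges listed below (read off the figure, so treat them as an assumption).

```python
from collections import Counter

# Assumed edges of the example graph: j is linked to i, k, l, m, and l to k.
edges = [("i", "j"), ("m", "j"), ("l", "j"), ("k", "j"), ("l", "k")]

degree = Counter()
for u, v in edges:
    degree[u] += 1      # each edge adds one to the degree of both endpoints
    degree[v] += 1

N = len(degree)         # valid here because the graph has no isolated nodes
m = len(edges)
avg_degree = sum(degree.values()) / N
print(dict(degree), avg_degree, 2 * m / N)
```

Each edge is counted once at each of its two endpoints, which is why the degrees sum to 2m.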
Degree distribution

The degree distribution $P(k)$ is one of the central concepts in network analysis. It answers the question "if a random node is picked, what is the probability that its degree is $k$?" That is,

$P(k) = N_k / N$,

where $N_k$ is the number of nodes of degree $k$.

However, we often assume that the observed degree distribution is a sample from some "real" degree distribution that is smooth. In this case a better way to obtain an estimate for the real distribution is to bin the data and answer the question "what is the probability that a randomly picked node has a degree in some interval $[k_i, k_{i+1}]$?"
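Both estimates can be written in a few lines. In this sketch the degree sequence is a made-up example, the function name is hypothetical, and the bins are taken as half-open intervals so that no node is counted twice (a choice made here, not something the slide prescribes).

```python
from collections import Counter

degrees = [1, 4, 2, 2, 1]   # illustrative degree sequence
N = len(degrees)

# Empirical distribution P(k) = N_k / N.
P = {k: n_k / N for k, n_k in sorted(Counter(degrees).items())}

def binned_distribution(degrees, bin_edges):
    """Fraction of nodes whose degree falls in [bin_edges[b], bin_edges[b+1])."""
    counts = [0] * (len(bin_edges) - 1)
    for k in degrees:
        for b in range(len(counts)):
            if bin_edges[b] <= k < bin_edges[b + 1]:
                counts[b] += 1
                break
    return [c / len(degrees) for c in counts]

binned = binned_distribution(degrees, [1, 2, 4, 8])
print(P, binned)
```

For broad distributions, bin edges that grow geometrically (as in `[1, 2, 4, 8]`) are a common choice.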
Edge density
The edge density of a network is the fraction of edges out of possible edges:

$\rho = \frac{m}{\binom{N}{2}} = \frac{2m}{N(N-1)}$.

In real-world networks, the edge density is usually low, i.e. the networks are sparse.

[Figure: a metabolic network. Even though it has dense subgraphs, in general it has a low link density, i.e. it is sparse.]
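As a quick numerical check of the density formula for undirected simple graphs, here is a one-function sketch (the function name is illustrative):

```python
def edge_density(N, m):
    """Fraction of the N(N-1)/2 possible undirected edges that are present."""
    return 2 * m / (N * (N - 1))

print(edge_density(5, 5))    # five nodes, five edges
print(edge_density(4, 6))    # complete graph on four nodes
```

NetworkX exposes the same quantity as `nx.density(G)`.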
Clustering coefficient

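[Figure: example graph with nodes i, j, k, l, m]
$C_j = \frac{2 \times 1}{4 \times 3} \approx 0.167$
$C_k = \frac{2 \times 1}{2 \times 1} = 1$

[Figure: a metabolic network] This metabolic network has a high average clustering coefficient because of the dense subgraphs.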
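The two values above are consistent with the usual local clustering coefficient, $C_i = 2E_i / (k_i(k_i - 1))$, where $E_i$ is the number of links among the neighbours of node i. The sketch below assumes that definition, assumes the adjacency read off the example figure, and sets C = 0 for nodes with fewer than two neighbours (a convention, also used by NetworkX).

```python
from itertools import combinations

def clustering(adj, node):
    """Local clustering coefficient: realised links among neighbours / possible links."""
    neighbours = adj[node]
    k = len(neighbours)
    if k < 2:
        return 0.0                       # convention for degree 0 or 1
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adj[u])
    return 2 * links / (k * (k - 1))

# Assumed adjacency of the example graph.
adj = {
    "i": {"j"},
    "m": {"j"},
    "j": {"i", "k", "l", "m"},
    "k": {"j", "l"},
    "l": {"j", "k"},
}

print(clustering(adj, "j"), clustering(adj, "k"))
```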
Special graphs

• A tree is a connected graph with no loops.
• So $C_i = 0$ for all $i$, and $\langle C \rangle = 0$.
• A tree with $N$ vertices always has $m = N-1$ edges.
• Also: a connected graph with $m = N-1$ edges is always a tree.
• A set of trees is called a forest.

[Figure: a tree and a forest]
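The last two bullets give a simple tree test: check the edge count, then check connectedness. A minimal sketch, assuming at least one node and a simple undirected graph (the function name is illustrative):

```python
def is_tree(nodes, edges):
    """A graph is a tree iff it is connected and has exactly N - 1 edges."""
    if len(edges) != len(nodes) - 1:
        return False
    adj = {n: [] for n in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, stack = set(), [nodes[0]]
    while stack:                          # explore everything reachable from the first node
        u = stack.pop()
        if u not in seen:
            seen.add(u)
            stack.extend(adj[u])
    return len(seen) == len(nodes)

star = [(0, 1), (0, 2), (0, 3)]
triangle_plus_isolated = [(0, 1), (1, 2), (2, 0)]
print(is_tree([0, 1, 2, 3], star), is_tree([0, 1, 2, 3], triangle_plus_isolated))
```

The second example has N - 1 = 3 edges but is not connected, which shows why the connectedness condition cannot be dropped.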
Special graphs

• In a k-regular graph all nodes have degree k.
• k-regular graphs can e.g. take the shape of regular lattices, or be otherwise random.
• Cayley tree: a k-regular (infinite) tree.

[Figure: subgraphs of a 1-d lattice, a 2-d lattice, and a Cayley tree]
Bipartite graphs

• If the vertices of a graph can be divided into two subsets $V_1$ and $V_2$ such that edges exist only between the subsets, the graph is bipartite.
• A bipartite graph can be projected (or collapsed) onto $V_1$ or $V_2$.
• Bipartite graphs arise naturally in many contexts (collaboration networks, metabolic reactions, etc.).
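Projection links two nodes of one subset whenever they share a neighbour in the other subset. The sketch below uses hypothetical collaboration data (authors as $V_1$, papers as $V_2$); the names and the function are made up for illustration.

```python
from itertools import combinations

def project(bipartite_edges):
    """Project onto V1: link two V1 nodes if they share at least one V2 neighbour.

    bipartite_edges: iterable of (a, b) pairs with a in V1 and b in V2.
    """
    members = {}
    for a, b in bipartite_edges:
        members.setdefault(b, set()).add(a)
    projected = set()
    for group in members.values():
        for u, v in combinations(sorted(group), 2):
            projected.add((u, v))
    return projected

# Hypothetical author-paper edges.
edges = [("Ann", "paper1"), ("Bob", "paper1"), ("Bob", "paper2"), ("Cid", "paper2")]
print(sorted(project(edges)))
```

NetworkX offers `bipartite.projected_graph` in `networkx.algorithms.bipartite` for the same purpose.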
Network representation
Network data structures

NetworkX does everything automatically, so you do not have to worry about all this.

Just do not try to manually generate an adjacency matrix for any larger network...
Calculations with the adjacency matrix

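Undirected network:

$A = A^T$
$k_i = \sum_{j=1}^{N} a_{ij} = \sum_{j=1}^{N} a_{ji}$
$2m = \sum_{i=1}^{N} k_i = \sum_{i=1}^{N} \sum_{j=1}^{N} a_{ij}$

Directed network (with $a_{ij} = 1$ if there is an edge from $v_i$ to $v_j$):

$k_{j,\mathrm{in}} = \sum_{i=1}^{N} a_{ij}$
$k_{i,\mathrm{out}} = \sum_{j=1}^{N} a_{ij}$
$m = \sum_{i=1}^{N} k_{i,\mathrm{in}} = \sum_{i=1}^{N} k_{i,\mathrm{out}}$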
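The degree sums can be tried on a tiny matrix stored as nested lists. This sketch assumes the convention that a[i][j] = 1 means an edge from node i to node j, so out-degrees are row sums and in-degrees are column sums; the three-node example is made up.

```python
# Directed example (made up): edges 0->1, 0->2, 1->2.
a = [
    [0, 1, 1],
    [0, 0, 1],
    [0, 0, 0],
]
N = len(a)

k_out = [sum(a[i][j] for j in range(N)) for i in range(N)]   # row sums
k_in = [sum(a[i][j] for i in range(N)) for j in range(N)]    # column sums
m_directed = sum(k_out)

# Undirected version: symmetrise the matrix, then row sums give the degrees.
u = [[max(a[i][j], a[j][i]) for j in range(N)] for i in range(N)]
k = [sum(row) for row in u]
m_undirected = sum(k) // 2      # every edge appears twice in a symmetric matrix

print(k_out, k_in, m_directed, k, m_undirected)
```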
Part III:
Random networks
Erdős-Rényi networks
Erdős-Rényi networks:
• A maximally random ensemble of networks of given size

Construction:
• Connect N vertices randomly

Two versions:
• G(N,p): connect each pair of vertices with probability p
• G(N,m): place m edges randomly on the network
• these define ensembles of networks

[Figure: an example network with N = 10, p = 1/5, <k> = 1.8. Portrait: Pál Erdős (1913-1996)]
Erdős, P.; Rényi, A. (1959). "On Random Graphs. I.". Publicationes Mathematicae 6: 290–297.
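Both versions are easy to generate directly. The following is a minimal sketch using only the standard library; the function names `gnp` and `gnm` and the seed are arbitrary choices. It loops over all pairs, so it is meant for small N only.

```python
import random
from itertools import combinations

def gnp(N, p, rng):
    """G(N,p): include each of the N(N-1)/2 node pairs independently with probability p."""
    return [pair for pair in combinations(range(N), 2) if rng.random() < p]

def gnm(N, m, rng):
    """G(N,m): choose m distinct node pairs uniformly at random."""
    return rng.sample(list(combinations(range(N), 2)), m)

rng = random.Random(42)           # fixed seed so the run is repeatable
edges_p = gnp(10, 1 / 5, rng)     # N = 10, p = 1/5: expected <k> = (N-1)p = 1.8
edges_m = gnm(10, 9, rng)         # exactly 9 edges, so <k> = 2m/N = 1.8
print(len(edges_p), len(edges_m))
```

NetworkX has ready-made generators for both, `nx.gnp_random_graph` and `nx.gnm_random_graph`.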
Ensemble G(N,p) with N=3, p=1/3
π_j = probability of realization of network j
<k_j> = average degree in network j

Network j    π_j       <k_j>
1            ~ 0.3     0
2            ~ 0.15    2/3
3            ~ 0.15    2/3
4            ~ 0.15    2/3
5            ~ 0.07    4/3
6            ~ 0.07    4/3
7            ~ 0.07    4/3
8            ~ 0.04    2

For any random network model, any quantity such as average degree can be viewed as either its ensemble average, or its expected value given the generation rules.
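For N = 3 the whole ensemble has only eight members, so the ensemble average can be computed exactly and compared with the expected value (N-1)p. A small sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction
from itertools import combinations, product

N, p = 3, Fraction(1, 3)
pairs = list(combinations(range(N), 2))      # the three possible edges

ensemble = []                                # (probability, average degree) per network
for present in product([0, 1], repeat=len(pairs)):
    m = sum(present)                         # number of edges in this realization
    prob = p**m * (1 - p) ** (len(pairs) - m)
    ensemble.append((prob, Fraction(2 * m, N)))

total = sum(prob for prob, _ in ensemble)
mean_k = sum(prob * k for prob, k in ensemble)
print(total, mean_k, (N - 1) * p)
```

The empty network has probability (2/3)^3 = 8/27, which is the value rounded to 0.3 in the table, and the full triangle has (1/3)^3 = 1/27, rounded to 0.04.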
Properties of G(N,p)

Edges & degrees
• On average, the number of edges is $\langle m \rangle = \binom{N}{2} p = p N(N-1)/2$.
• Hence the average degree is $\langle k \rangle = 2\langle m \rangle / N = (N-1)p \approx Np$.

Degree distribution
• Each node's number of links comes from $N-1$ independent trials with probability $p$.
• Hence $P(k) = \mathrm{Bin}(N-1, p) = \binom{N-1}{k} p^k (1-p)^{N-1-k}$.
• For $N \to \infty$ with $\langle k \rangle$ constant, $P(k) \to \frac{\langle k \rangle^k}{k!} e^{-\langle k \rangle}$, that is, $P(k) = \mathrm{Poisson}(\langle k \rangle)$.

[Figure: $P(k)$ versus $k$ for $\langle k \rangle = 30$]
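The convergence of the binomial degree distribution to the Poisson form can be checked numerically. This sketch evaluates both distributions in log space to avoid overflow; the function names and the choice of N values are illustrative, and the mean degree of 30 matches the plotted example.

```python
from math import exp, lgamma, log

def binomial_pmf(k, n, p):
    """Binomial probability of k successes in n trials, computed via log-gamma."""
    log_pmf = (lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
               + k * log(p) + (n - k) * log(1 - p))
    return exp(log_pmf)

def poisson_pmf(k, mean):
    """Poisson probability of k events with the given mean."""
    return exp(k * log(mean) - mean - lgamma(k + 1))

def max_gap(N, mean_k, k_max=70):
    """Largest pointwise difference between Bin(N-1, p) and Poisson(<k>) for k <= k_max."""
    p = mean_k / (N - 1)
    return max(abs(binomial_pmf(k, N - 1, p) - poisson_pmf(k, mean_k))
               for k in range(k_max + 1))

for N in (100, 1000, 10000):
    print(N, max_gap(N, 30))
```

The gap shrinks as N grows with the mean degree held fixed, which is the limit stated above.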
Average shortest paths & dimensionality

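Average shortest path length: $\ell = \frac{1}{N(N-1)} \sum_{i,j} d_{ij}$

[Figure panels with the scaling of $\ell$:]
• 1-d lattice: $\ell \propto N$
• 2-d lattice: $\ell \propto N^{1/2}$
• d-dimensional lattice: $\ell \propto N^{1/d}$
• random network: $\ell \propto \ln N$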
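The linear scaling for a one-dimensional lattice can be checked by brute force. The sketch below computes the average shortest path length with breadth-first search on a ring (a 1-d lattice with periodic boundaries, chosen here for convenience); it assumes a connected graph, and the function names are illustrative.

```python
from collections import deque

def average_path_length(adj):
    """Mean of d_ij over all ordered pairs i != j (assumes a connected graph)."""
    N = len(adj)
    total = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
    return total / (N * (N - 1))

def ring(N):
    """1-d lattice with periodic boundaries: node i is linked to i-1 and i+1."""
    return {i: [(i - 1) % N, (i + 1) % N] for i in range(N)}

for N in (50, 100, 200):
    print(N, average_path_length(ring(N)))
```

Doubling N roughly doubles the average path length on the ring, in contrast to the logarithmic growth in random networks.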
Components in E-R networks

[Figure: relative giant component size S and average component size <s> as functions of the average degree <k>]

• <s> = average number of vertices in components other than the giant
• S = (number of vertices in giant) / N
• small components grow in size until the giant appears, then join the giant, leaving only very small disconnected parts
• when <k>=1, the giant component appears and starts growing in size ("the percolation transition")
• this curve is, strictly speaking, valid only for ER networks where N→∞ so that pN=const
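The percolation transition can be seen in a small simulation. This sketch generates G(N,p) with p = <k>/(N-1) and tracks components with a union-find structure; N = 1000, the seed, and the function name are arbitrary choices, and at finite N the transition is smoothed compared with the N→∞ curve.

```python
import random
from itertools import combinations

def giant_fraction(N, mean_k, rng):
    """Relative size S of the largest component in one G(N,p) realization."""
    p = mean_k / (N - 1)
    parent = list(range(N))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    for u, v in combinations(range(N), 2):
        if rng.random() < p:                # edge present: merge the two components
            parent[find(u)] = find(v)

    sizes = {}
    for node in range(N):
        root = find(node)
        sizes[root] = sizes.get(root, 0) + 1
    return max(sizes.values()) / N

rng = random.Random(1)
for mean_k in (0.5, 1.0, 3.0):
    print(mean_k, giant_fraction(1000, mean_k, rng))
```

Below <k> = 1 the largest component holds only a small fraction of the nodes, while well above it most nodes belong to the giant component.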
Randomizing networks:
configuration model
Random networks: summary
Extra reading material
Books:
• M.E.J. Newman: Networks: An Introduction (Oxford UP, 2010) recommended!
• S.N. Dorogovtsev: Lectures on Complex Networks (Oxford UP, 2010) (online at http://sweet.ua.pt/~f2358/)
• D. Easley & J. Kleinberg: Networks, Crowds, and Markets: Reasoning about a Highly Connected World (Cambridge UP, 2010) (online at http://www.cs.cornell.edu/home/kleinber/networks-book/)
Review papers:
• M.E.J. Newman: Structure and function of complex networks, SIAM Review 45, 167-256 (2003) (online at http://arxiv.org/abs/cond-mat/0303516/) recommended!
• Boccaletti et al.: Complex networks: Structure and dynamics, Physics Reports 424(4-5) (February 2006) (online at http://www.sciencedirect.com/science/article/pii/S037015730500462X)
• P. Holme & J. Saramäki: Temporal Networks, Physics Reports 519, 97-125 (2012) (online at http://arxiv.org/abs/1108.1780)
• M. Kivelä et al.: Multilayer Networks, Journal of Complex Networks 2(3), 203-271 (2014) (online at http://comnet.oxfordjournals.org/content/early/2014/07/14/comnet.cnu016)
Extra reading material
Python and Networkx, online documentation and tutorials:


• http://docs.python.org/
• http://en.wikibooks.org/wiki/A_Beginner’s_Python_Tutorial
• http://networkx.github.io/documentation/latest/tutorial/
• http://docs.scipy.org/doc/

Other network software (not used in the course, but good to know):


• http://snap.stanford.edu/
• http://graph-tool.skewed.de/
• http://github.com/CxAalto/
• http://www.boost.org/doc/libs/1_61_0/libs/graph/
