
Complejidad y Redes

Community Structure

Complejidad y Redes.
Universidad Politécnica de Madrid

Designed by starline / Freepik


Slides based on: CNSC 6013: Statistical Methods in Network Science and Data Analysis, by Prof. Michael Szell

INTRODUCTION

Communities
Belgium appears to be the model bicultural society:
• 59% of its citizens are Flemish, speaking Dutch and
• 40% are Walloons who speak French.
While multiethnic countries break up all over the world, how did this country foster the peaceful coexistence of these two ethnic groups?

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
Blondel, Krings, Thomas. Brussels Studies 42 (2010)

® Statistical Methods in Network Science and Data Analysis by Prof. Michael Szell
The call pattern of the consumers of the largest Belgian mobile phone company

• Communities extracted from the call pattern of the consumers of the largest Belgian mobile phone company.
• The network has about two million mobile phone users.
• The nodes correspond to communities, the size of each
node being proportional to the number of individuals in
the corresponding community.
• The color of each community on a red–green scale
represents the language spoken in the particular
community, red for French and green for Dutch.
• Only communities of more than 100 individuals are
shown. The community that connects the two main
clusters consists of several smaller communities with less
obvious language separation, capturing the culturally
mixed Brussels, the country’s capital.

Communities: Zachary karate club

This network has become a benchmark for community detection.

Citation history

W.W. Zachary, J. Anthropol. Res. 33:452-473 (1977)

Zachary karate club club (ZKCC)

The first scientist at any conference on networks who uses Zachary's karate club as an
example is inducted into the Zachary Karate Club Club and awarded a prize.

ZKCC Trophy recipients (http://networkkarate.tumblr.com/)

• 22nd Luca Gaillo (July 2023)
• 21st Santo Fortunato (July 2022)
• 20th Jesús Arroyo (July 2021)
• 19th Jean-Gabriel Young (September 2020)
• 18th Emma Towlson (May 2019)
• 17th Philipp Hövel (March 2019)
• 16th Clara Granell (September 2018)
• 15th Leto Peel (June 2018)
• 14th Aric Hagberg (March 2018)
• 13th Megha Padi (March 2018)
• 12th Amir Rubin (January 2017)
• 11th Federico Battiston (September 2016)
• 10th Giona Casiraghi (July 2016)
• 9th Filippo Radicchi (May 2016)
• 8th Qing Ke (September 2015)
• 7th Manlio De Domenico (July 2015)
• 6th Tiago Peixoto (June 2015)
• 5th Mark Newman (June 2014)
• 4th Marián Boguñá (September 2013)
• 3rd YY Ahn (July 2013)
• 2nd Mason Porter (June 2013)
• 1st Cristopher Moore (May 2013)

Communities: Biological modules

Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L. (2002). Hierarchical organization of modularity in metabolic networks. Science, 297(5586), 1551-1555.

Communities are groups of molecules that have more distinct patterns of reactions
with each other than with others.
Communities: Mesoscopic scale

The study of communities focuses on the mesoscopic scale of networks

Microscopic Mesoscopic Macroscopic

History of community detection

COMMUNITIES

Definition

In network science we call a community a group of nodes that have a higher likelihood of connecting to each other than to nodes from other communities.

® Slide by Network Science by A.L. Barabási
The fundamental hypothesis of communities

H1 (Fundamental Hypothesis)
A network’s community structure is uniquely encoded
in its wiring diagram

Hypotheses of communities

A community is a locally dense connected subgraph in a network.


H2 (Connectedness Hypothesis)
A community corresponds to a connected subgraph: all members of a community must be reachable through other members of the same community (connectedness).

Each community corresponds to a connected subgraph,
like the subgraphs formed by the orange, green or the
purple nodes. Consequently, if a network consists of two
isolated components, each community is limited to only
one component. The hypothesis also implies that on the
same component a community cannot consist of two
subgraphs that do not have a link to each other.
Consequently, the orange and the green nodes form
separate communities.

Hypotheses of communities

A community is a locally dense connected subgraph in a network.

H3 (Density Hypothesis)
Communities are locally dense subgraphs

Nodes in a community are more likely to


connect to other members of the same
community than to nodes in other
communities.

The orange, the green and the purple nodes satisfy


this expectation.

Define communities by cliques?

One of the first papers on community structure, published in 1949, defined a community as a group of individuals whose members all know each other [5]. In graph-theoretic terms this means that a community is a complete subgraph, or a clique.

A clique is a connected subgraph with maximal link density.

4-clique

3-clique

Define communities by cliques?

Drawbacks:
• Triangles are frequent, larger cliques are rare.
• Requiring a community to be a complete subgraph may be too restrictive, missing many
other legitimate communities.
• Finding cliques is computationally demanding (deciding whether a network contains a clique of a given size is NP-complete).
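
As a small illustration of the clique-based definition, the sketch below lists the maximal cliques of a toy graph with NetworkX's `find_cliques`. The toy graph (a 4-clique with one extra node attached) is an assumption made for this example and does not come from the slides.

```python
import networkx as nx

# Toy graph (illustrative assumption): a 4-clique on nodes 0..3 plus a pendant node 4.
G = nx.complete_graph(4)
G.add_edge(3, 4)

# find_cliques yields every maximal clique, i.e. a clique that is not
# contained in any larger clique.
maximal_cliques = sorted(
    (sorted(clique) for clique in nx.find_cliques(G)),
    key=len,
    reverse=True,
)

for clique in maximal_cliques:
    print(len(clique), clique)
```

Under a strict clique definition, node 4 would only ever belong to a 2-clique, which hints at why the definition is often too restrictive.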

4-clique

3-clique

Define communities by internal/external degrees?

Relax the rigidity of cliques: consider a connected subgraph $C$ with $N_C$ nodes.

A node’s internal degree $k_i^{int}$ is the number of links that connect it to nodes in the same community.

A node’s external degree $k_i^{ext}$ is the number of links that connect it to other nodes in the network (nodes outside $C$).

Define communities by internal/external degrees?

If $k_i^{ext} = 0$: all neighbors of $i$ belong to $C$, and $C$ is a good community for $i$.

If $k_i^{int} = 0$: all neighbors of $i$ belong to other communities, so $i$ should be assigned to a different community.

Define communities by internal/external degrees?

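$C$ is a strong community if each node within $C$ has more links within the community than to the rest. For all nodes $i \in C$:

$k_i^{int}(C) > k_i^{ext}(C)$

$C$ is a weak community if the total internal degree exceeds its total external degree:

$\sum_{i \in C} k_i^{int}(C) > \sum_{i \in C} k_i^{ext}(C)$

[Figure: a clique, a strong community and a weak community.]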
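
The strong and weak community conditions can be checked directly from internal and external degrees. The following is a minimal sketch for simple undirected graphs; the helper names and the toy graph are assumptions made for illustration.

```python
import networkx as nx

def internal_external_degrees(G, community):
    """Return {node: (k_int, k_ext)} for every node of `community`."""
    members = set(community)
    degrees = {}
    for node in members:
        k_int = sum(1 for neighbor in G[node] if neighbor in members)
        degrees[node] = (k_int, G.degree(node) - k_int)
    return degrees

def is_strong(G, community):
    """Every node has more links inside the community than outside."""
    degrees = internal_external_degrees(G, community).values()
    return all(k_int > k_ext for k_int, k_ext in degrees)

def is_weak(G, community):
    """Total internal degree exceeds total external degree."""
    degrees = internal_external_degrees(G, community).values()
    return sum(k_int for k_int, _ in degrees) > sum(k_ext for _, k_ext in degrees)

# Toy graph (illustrative assumption): a triangle {0, 1, 2} whose node 2
# also links to three outside nodes.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3), (2, 4), (2, 5)])
C = {0, 1, 2}
print(is_strong(G, C), is_weak(G, C))
```

Node 2 has two internal and three external links, so the triangle fails the strong condition while still satisfying the weak one.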

NUMBER OF COMMUNITIES

Graph partitioning and community detection

Graph partitioning is a classic problem in computer science.


It is like clustering a graph: dividing the nodes of a network into a pre-defined number of
non-overlapping partitions of given sizes such that the number of links between different
partitions, called the cut size, is minimized.

Community detection is the problem of finding natural, not pre-defined partitions of a


network such that there are many edges within groups and few between groups. It aims to
uncover the inherent community structure of a network.

COMMUNITY DETECTION

Community Detection

We call a partition a division of a network into an arbitrary number of groups, such that each node belongs to one and only one group. The number of possible partitions is given by the Bell number

$B_N = \frac{1}{e} \sum_{j=0}^{\infty} \frac{j^N}{j!}$

[Figure: $B_N$ and $e^N$ as a function of N, for N from 0 to 50, on a logarithmic vertical axis from $10^0$ to $10^{40}$.]

The Bell number BN grows faster than exponentially with the network size for large N.

The number of possible ways we can partition a network into communities grows
exponentially or faster with the network size N. Therefore, it is impossible to inspect all
partitions of a large network. We therefore need algorithms that can identify communities
without inspecting all partitions.
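
To get a feeling for this growth, the sketch below computes the first Bell numbers exactly with the Bell triangle recurrence (a standard alternative to the infinite sum) and prints them next to $e^N$. The function name is an assumption made for this example.

```python
import math

def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] using the Bell triangle."""
    bells = [1]          # B_0 = 1
    row = [1]            # first row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left in the previous row.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

for n, b in enumerate(bell_numbers(20)):
    print(f"N={n:2d}  B_N={b:>16d}  e^N={math.exp(n):.3e}")
```

The printed table shows the Bell number overtaking $e^N$ within the first ten values of N and pulling away rapidly afterwards.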

HIERARCHICAL CLUSTERING

Hierarchical clustering

A set of clustering algorithms.

Based on the definition of a similarity matrix (X):

elements xij indicate the distance of node i from node j.


In community detection, the similarity is extracted from the relative position of nodes i and j
within the network.

Once we have the similarity matrix, hierarchical clustering iteratively identifies groups of
nodes with high similarity.

Hierarchical clustering

We can use two different procedures to achieve this:

• agglomerative algorithms merge nodes with high similarity into the same community,

• divisive algorithms isolate communities by removing low similarity links that tend to
connect communities.

Both procedures generate a hierarchical tree, called a dendrogram, that predicts the
possible community partitions.

Agglomerative hierarchical clustering. The Ravasz algorithm

Step 1: Define the Similarity Matrix

In an agglomerative algorithm, similarity should be high for node pairs that belong to the
same community and low for node pairs that belong to different communities. In a network
context, nodes that connect to each other and share neighbors likely belong to the same
community, hence their xij should be large.
Note: the formula in the book at NetworkScience.com does not show that 1 is added to the numerator if i and j are neighbors.

The topological overlap matrix defines similarities among nodes:

$x_{ij}^{o} = \frac{J(i,j) + \Theta(A_{ij})}{\min(k_i, k_j) + 1 - \Theta(A_{ij})}$

where $J(i, j)$ is the number of common neighbors of nodes i and j, and $\Theta(x)$ is the Heaviside step function: $\Theta(x) = 1$ if $x > 0$, and $0$ otherwise.

• $x_{ij}^{o} = 1$ if nodes i and j have a link to each other and have the same neighbours.
• $x_{ij}^{o} = 0$ if i and j do not have common neighbours and they are not connected.

Agglomerative hierarchical clustering. The Ravasz algorithm
EXAMPLE

$x_{ij}^{o} = \frac{J(i,j) + \Theta(A_{ij})}{\min(k_i, k_j) + 1 - \Theta(A_{ij})}$, where $\Theta(A_{ij}) = 0$ if there is no link between i and j.

The obtained $x_{ij}^{o}$ for each connected node pair is shown on each link.

Note that $x_{ij}^{o}$ can be nonzero for nodes that do not link to each other but have a common neighbour, for example C and E.
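
The topological overlap can be evaluated pair by pair. The sketch below assumes an unweighted, undirected graph, uses the variant of the formula with the step function in the numerator, and works on a toy graph that is not the one drawn on the slide.

```python
import networkx as nx

def topological_overlap(G, i, j):
    """Topological overlap of nodes i and j in an unweighted, undirected graph."""
    a_ij = 1 if G.has_edge(i, j) else 0              # Theta(A_ij)
    common = len(set(G[i]) & set(G[j]))              # J(i, j): common neighbors
    return (common + a_ij) / (min(G.degree(i), G.degree(j)) + 1 - a_ij)

# Toy graph (illustrative assumption): triangle 0-1-2 with node 3 hanging off node 2.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (2, 3)])

x_01 = topological_overlap(G, 0, 1)   # linked, and share neighbor 2
x_03 = topological_overlap(G, 0, 3)   # not linked, but share neighbor 2
print(x_01, x_03)
```

Nodes 0 and 3 are not connected, yet their overlap is nonzero because they share node 2 as a neighbor.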

Agglomerative hierarchical clustering. The Ravasz algorithm

Step 2: Decide Group Similarity

As nodes are merged into small communities, we must measure how similar two communities are. Three approaches, called single, complete and average cluster similarity, are frequently used to calculate the community similarity from the node-similarity matrix.

The Ravasz algorithm uses the average cluster similarity method.

[Figure: three panels comparing two clusters, labeled smallest similarity xij, largest similarity xij and average xij.]

Agglomerative hierarchical clustering. The Ravasz algorithm

Step 3: Apply Hierarchical Clustering

1. Assign each node to a community of its own and evaluate xij for all node pairs.

2. Find the community pair or the node pair with the highest similarity and merge them into a single community.

3. Calculate the similarity between the new community and all other communities.

4. Repeat Steps 2 and 3 until all nodes form a single community.
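
The four steps above can be sketched in a few lines of pure Python. This is an illustrative, unoptimized version that assumes the topological overlap as node similarity and average cluster similarity between groups; the function names and the toy graph are assumptions, and cluster similarities are recomputed from scratch in every round rather than updated incrementally.

```python
import itertools
import networkx as nx

def overlap(G, i, j):
    """Topological overlap, used here as the node similarity x_ij."""
    a = 1 if G.has_edge(i, j) else 0
    return (len(set(G[i]) & set(G[j])) + a) / (min(G.degree(i), G.degree(j)) + 1 - a)

def average_linkage_merges(G):
    """Return the list of merges (cluster_a, cluster_b, similarity), in order."""
    x = {frozenset(pair): overlap(G, *pair) for pair in itertools.combinations(G, 2)}
    clusters = [frozenset([v]) for v in G]            # step 1: one node per community
    merges = []

    def similarity(a, b):
        # Average similarity over all node pairs with one node in each cluster.
        return sum(x[frozenset((u, v))] for u in a for v in b) / (len(a) * len(b))

    while len(clusters) > 1:                          # step 4: repeat until one community
        a, b = max(itertools.combinations(clusters, 2),
                   key=lambda pair: similarity(*pair))  # step 2: most similar pair
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        merges.append((set(a), set(b), similarity(a, b)))
    return merges

# Toy graph (illustrative assumption): two triangles joined by the link 2-3.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)])
merges = average_linkage_merges(G)
for a, b, s in merges:
    print(sorted(a), sorted(b), round(s, 3))
```

The recorded merge order is exactly the information a dendrogram displays; on this toy graph the two triangles are the last pair to be merged.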

Agglomerative hierarchical clustering. The Ravasz algorithm

Step 4: Dendrogram

The dendrogram visualizes the order in which the nodes are assigned to specific communities. To identify the communities, we must cut the dendrogram. Hierarchical clustering does not tell us where that cut should be.

Divisive hierarchical clustering. The Girvan-Newman algorithm

Divisive procedures systematically remove the links connecting nodes that belong to
different communities, eventually breaking a network into isolated communities.

Step 1: Define Centrality

Divisive algorithms require a link centrality measure xij that is high for pairs of nodes that belong to different communities and low for node pairs in the same community. Two frequently used measures can achieve this:

Divisive hierarchical clustering. The Girvan-Newman algorithm

(a) Link Betweenness

Link betweenness captures the role of each link in information transfer. Hence xij is proportional to the number of shortest paths between all node pairs that run along the link (i,j). Consequently, inter-community links have large betweenness. The calculation of link betweenness scales as O(LN), or O(N²) for a sparse network, which makes it the faster of the two measures.

(b) Random-Walk Link Betweenness

A pair of nodes m and n is chosen at random. A walker starts at m, following each adjacent link with equal probability until it reaches n. Random-walk link betweenness xij is the probability that the link i→j was crossed by the walker after averaging over all possible choices for the starting nodes m and n. The calculation requires the inversion of an N×N matrix, with O(N³) computational complexity, and averaging the flows over all node pairs, with O(LN²). Hence the total computational complexity of random-walk betweenness is O[(L + N)N²], or O(N³) for a sparse network.

Divisive hierarchical clustering. The Girvan-Newman algorithm

Step 2: Hierarchical Clustering

1. Compute the centrality xij of each link.

2. Remove the link with the largest centrality. In case of a tie, choose one link randomly.

3. Recalculate the centrality of each link for the altered network.

4. Repeat steps 2 and 3 until all links are removed.
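
A minimal sketch of this loop, assuming shortest-path link betweenness as the centrality and using NetworkX's `edge_betweenness_centrality`. The generator name and the toy graph are assumptions; ties are broken by taking the first maximal link found rather than a random one.

```python
import networkx as nx

def girvan_newman_levels(G):
    """Yield the partition each time removing a link splits off a new component."""
    H = G.copy()
    n_components = nx.number_connected_components(H)
    while H.number_of_edges() > 0:                        # step 4: until no links remain
        centrality = nx.edge_betweenness_centrality(H)    # steps 1 and 3: (re)compute x_ij
        H.remove_edge(*max(centrality, key=centrality.get))  # step 2: remove the top link
        if nx.number_connected_components(H) > n_components:
            n_components = nx.number_connected_components(H)
            yield [set(c) for c in nx.connected_components(H)]

# Toy graph (illustrative assumption): two 4-cliques joined by the single link 3-4.
G = nx.barbell_graph(4, 0)
first_split = next(girvan_newman_levels(G))
print(sorted(sorted(c) for c in first_split))
```

Each yielded partition corresponds to one level of the dendrogram. NetworkX also ships a ready-made `girvan_newman` generator in `networkx.algorithms.community`.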

In Gephi: where to “cut”?

Summary

In summary, in principle hierarchical clustering does not require preliminary knowledge about the number and the size of communities.

In practice it generates a dendrogram that offers a family of community partitions characterizing the studied network. This dendrogram does not tell us which partition best captures the underlying community structure. Indeed, any cut of the hierarchical tree offers a potentially valid partition.

This would be at odds with our expectation that in each network there is a ground truth,
corresponding to a unique community structure.

MODULARITY

Random hypothesis and modularity

H4 (Random Hypothesis)
Randomly wired networks are not expected to have a
community structure

This allows us to formulate a measure that compares the


connections of a network with a random null model.

By comparing the link density of a community structure with the link


density obtained for the same group of nodes for a randomly rewired
network, we could decide if the original community corresponds to a
dense subgraph, or its connectivity pattern emerged by chance.

Imagine we have a partition into $n$ communities. Each community $c$ has $N_c$ nodes and $L_c$ links.

The modularity of a subgraph is

$M_c = \frac{1}{2L} \sum_{(i,j) \in C_c} (A_{ij} - p_{ij})$

where $L$ is the total number of links, $A_{ij}$ is the adjacency matrix and $p_{ij}$ is the expected number of links between nodes i and j in the random null model.

Random hypothesis and modularity

In the degree-preserving random null model:

$p_{ij} = \frac{k_i k_j}{2L}$

We can write the modularity of the full partition of the whole network as:

$M = \frac{1}{2L} \sum_{i,j} \left( A_{ij} - \frac{k_i k_j}{2L} \right) \delta(C_i, C_j)$

Kronecker delta: $\delta(C_i, C_j) = 1$ if node i and node j belong to the same community, otherwise 0.
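We can write the modularity of the full partition as:

$M = \sum_{c=1}^{n} \left[ \frac{L_c}{L} - \left( \frac{k_c}{2L} \right)^2 \right]$

where $k_c$ is the sum of all degrees in community c, $L_c$ is the number of links within community c and, remember, $n$ is the number of communities.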
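
As a sanity check of the per-community form of modularity (the sum over communities of $L_c/L - (k_c/2L)^2$), the sketch below evaluates it on a toy graph and compares the result with NetworkX's `modularity` function. The helper name and the toy graph are assumptions made for this example.

```python
import networkx as nx
from networkx.algorithms.community import modularity

def modularity_from_formula(G, partition):
    """M = sum over communities of L_c / L - (k_c / 2L)^2, for an unweighted graph."""
    L = G.number_of_edges()
    total = 0.0
    for community in partition:
        L_c = G.subgraph(community).number_of_edges()           # links inside community c
        k_c = sum(degree for _, degree in G.degree(community))  # sum of degrees in c
        total += L_c / L - (k_c / (2 * L)) ** 2
    return total

# Toy graph (illustrative assumption): two 5-cliques joined by one link.
G = nx.barbell_graph(5, 0)
partition = [set(range(5)), set(range(5, 10))]

M = modularity_from_formula(G, partition)
print(round(M, 4), round(modularity(G, partition), 4))
```

Here L = 21, and each clique has L_c = 10 and k_c = 21, so M = 2 (10/21 - 1/4), roughly 0.452.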

Modularity maximization

H5 (Maximal Modularity Hypothesis)


The partition with maximum modularity M for a given
network offers the optimal community structure

Goal: Find the partition that maximizes M

Algorithms based on maximizing
modularity

Greedy modularity maximization

The first modularity maximization algorithm was proposed by Mark Newman: MEJ Newman, PRE 69 (2004). NetworkX offers a greedy modularity maximization as networkx.algorithms.community.greedy_modularity_communities.

1) Assign each node to a community of its own.

2) Inspect each pair of communities connected by at least one link and compute the modularity variation ΔM obtained if we merge these two communities.

3) Identify the community pair for which ΔM is the largest and merge them. Note that the modularity of a particular partition is always calculated from the full topology of the network.

4) Repeat steps 2 and 3 until all nodes are merged into a single community.

5) Record M for each step and select the partition for which the modularity is maximal.
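
A short usage sketch with NetworkX's `greedy_modularity_communities`, run on a toy graph chosen for this example (two 5-cliques joined by one link). The NetworkX routine is a greedy agglomerative implementation; it is shown as an illustration of the procedure, not as a line-by-line transcription of the steps above.

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities, modularity

# Toy graph (illustrative assumption): two 5-cliques joined by the link 4-5.
G = nx.barbell_graph(5, 0)

# Returns the communities of the partition with the highest modularity found.
communities = greedy_modularity_communities(G)
found = sorted(sorted(c) for c in communities)

print(found)
print(round(modularity(G, communities), 4))
```

On this graph the greedy merges recover the two cliques, since merging them would lower M.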
Using modularity to decide the dendrogram cut in hierarchical clustering

Other advantages/uses of modularity maximization

- Intuitive
- Corresponds to our social network ideas of communities
- Automatically chooses the number of communities

Can be extended with other null models:

- can take into account weights (Fortunato, S. Physics reports, 486 (2010))
- can take into account directions (Fortunato, S. Physics reports, 486 (2010))
- can take into account attributes or space (Expert, P. et al, PNAS 108 (2011))

Limitations of modularity: Resolution limit

In each merging step we calculate the change in modularity obtained by merging communities A and B:

$\Delta M_{AB} = \frac{l_{AB}}{L} - \frac{k_A k_B}{2L^2}$

because:

$\Delta M_{AB} = \left[ \frac{L_A + L_B + l_{AB}}{L} - \left( \frac{k_A + k_B}{2L} \right)^2 \right] - \left[ \frac{L_A}{L} - \left( \frac{k_A}{2L} \right)^2 + \frac{L_B}{L} - \left( \frac{k_B}{2L} \right)^2 \right]$

where $l_{AB}$ is the number of links between A and B, and $k_A$ and $k_B$ are the total degrees of A and B.

If A and B are distinct communities, they should remain distinct when M is maximized.

This is not the case when $\frac{k_A k_B}{2L} < 1$ and $l_{AB} \geq 1$, because then $\Delta M_{AB} > 0$: the algorithm merges A and B to maximize modularity.

If we assume $k_A \sim k_B = k$, then modularity increases if $k \leq \sqrt{2L}$.


This is the resolution limit of modularity: if the total degrees of A and B lie below the threshold $\sqrt{2L}$, the expected number of links between A and B is smaller than 1, and a single link will force the communities together.

Modularity cannot detect communities smaller than this size.
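
The resolution limit can be reproduced numerically on a ring of cliques, a standard toy construction that is assumed here and not taken from the slides. With 30 cliques of 5 nodes, each joined to the next by a single link, the total degree of a clique is below $\sqrt{2L}$, so merging neighboring cliques in pairs raises M even though every clique is an obvious community.

```python
import itertools
import math
import networkx as nx
from networkx.algorithms.community import modularity

n_cliques, size = 30, 5   # illustrative assumption: 30 cliques of 5 nodes each

G = nx.Graph()
cliques = [list(range(c * size, (c + 1) * size)) for c in range(n_cliques)]
for c, nodes in enumerate(cliques):
    G.add_edges_from(itertools.combinations(nodes, 2))        # links inside the clique
    G.add_edge(nodes[-1], cliques[(c + 1) % n_cliques][0])    # one link to the next clique

L = G.number_of_edges()
k_A = sum(degree for _, degree in G.degree(cliques[0]))       # total degree of one clique
delta_M = 1 / L - k_A * k_A / (2 * L ** 2)                     # l_AB = 1 and k_A = k_B

singles = [set(c) for c in cliques]
pairs = [set(cliques[i]) | set(cliques[i + 1]) for i in range(0, n_cliques, 2)]
M_singles, M_pairs = modularity(G, singles), modularity(G, pairs)

print(f"L={L}, k_A={k_A}, sqrt(2L)={math.sqrt(2 * L):.2f}, delta_M={delta_M:.6f}")
print(f"M(single cliques)={M_singles:.4f}  M(merged pairs)={M_pairs:.4f}")
```

Since 15 pairwise merges are performed, the gap between the two modularity values equals 15 times the single-merge change.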

Good, B. H., de Montjoye, Y. A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81(4), 046106.

Limitations of modularity: degenerate optimization landscape

In real networks, there are many different but nearly-as-good partitions.

The optimization landscape is degenerate.

We lack a clear modularity maximum; instead, the modularity function is highly degenerate.
Limitations of modularity: unintuitive behavior

Unintuitive merging of nodes or modules due to the resolution limit.

Good, B. H., de Montjoye, Y. A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81(4), 046106.

Limitations of modularity: structure in random networks

ER-networks Scale-free networks

Because of random fluctuations in the establishment of links, you can find high-modularity partitions in random networks!
Guimera, R., Sales-Pardo, M., & Amaral, L. A. N. (2004). Modularity from fluctuations in random graphs and complex networks. Physical Review E, 70(2), 025101.

The Louvain method

The O(N²) computational complexity of the greedy algorithm can be prohibitive for very large networks.

A modularity optimization algorithm with better scalability was proposed by Blondel and
collaborators:

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large
networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.

The Louvain method
The Louvain algorithm consists of two steps that are repeated iteratively.

Step I:
Start with a network of N nodes, initially assigning each node to a different community. For each node i we evaluate the gain in modularity if we place node i in the community of one of its neighbors j. We then move node i into the community for which the modularity gain is the largest, but only if this gain is positive. If no positive gain is found, i stays in its original community. This process is applied to all nodes until no further improvement can be achieved, completing Step I.
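
Step I can be sketched naively by recomputing the modularity of every trial move with NetworkX's `modularity` function. This is far slower than the incremental gain formula used by the actual method and is meant only to make the local-moving logic concrete; the function names, the fixed node order and the toy graph are assumptions.

```python
import networkx as nx
from networkx.algorithms.community import modularity

def groups_of(assignment):
    """Turn a {node: label} mapping into a list of node sets."""
    groups = {}
    for node, label in assignment.items():
        groups.setdefault(label, set()).add(node)
    return list(groups.values())

def louvain_step_one(G):
    """Local moving phase: move nodes to neighboring communities while M increases."""
    community_of = {v: v for v in G}          # every node starts in its own community
    improved = True
    while improved:                           # sweep until no node moves any more
        improved = False
        for v in sorted(G):                   # fixed order, for reproducibility
            best_label = community_of[v]
            best_m = modularity(G, groups_of(community_of))
            for u in G[v]:                    # try the community of each neighbor
                trial = dict(community_of)
                trial[v] = community_of[u]
                m = modularity(G, groups_of(trial))
                if m > best_m + 1e-12:        # accept only strictly positive gains
                    best_label, best_m = community_of[u], m
            if best_label != community_of[v]:
                community_of[v] = best_label
                improved = True
    return sorted(groups_of(community_of), key=min)

# Toy graph (illustrative assumption): two 5-cliques joined by the link 4-5.
result = louvain_step_one(nx.barbell_graph(5, 0))
print(result)
```

On this small graph the local moves already separate the two cliques, so Step II would have nothing left to merge.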

The Louvain method
Step II:
We construct a new network whose nodes are the communities identified during Step I. The
weight of the link between two nodes is the sum of the weight of the links between the
nodes in the corresponding communities. Links between nodes of the same community
lead to weighted self-loops.

The Louvain method

Once Step II is completed, we repeat Steps I and II, calling their combination a pass. The number of communities decreases with each pass. The passes are repeated until there are no more changes and a maximum of modularity is attained.

The Louvain method

Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.

This method partly accounts for the resolution limit: a hierarchy of different resolutions is constructed, allowing us to zoom into the structure with the desired resolution.
Community detection in Gephi

Gephi
* Louvain

* Girvan-Newman for
Hierarchical clustering
(see plugins)

Community detection in NetworkX

https://networkx.github.io/documentation/stable/reference/algorithms/community.html

Overlapping communities

Overlapping communities

A node is sometimes not confined to a single community.


Consider a scientist who belongs to the community of scientists that share his professional interests. Yet, he also belongs to a community consisting of family members and relatives and perhaps another community of individuals sharing his hobby. Each of these communities consists of individuals who are members of several other communities, resulting in a complicated web of nested and overlapping communities.

The clique percolation algorithm

The clique percolation algorithm, often called CFinder, views a community as the union of
overlapping cliques.

Clique percolation (CFinder)

Start with a k-clique.

Start “rolling” the clique over adjacent cliques. Two k-cliques are considered adjacent if they share k-1 nodes.

A k-clique community is the largest connected subgraph obtained by the union of all adjacent k-cliques.

Other k-cliques that cannot be reached from a particular clique correspond to other clique-communities.
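
NetworkX provides a clique percolation routine, `k_clique_communities`, that can be used to try this out. The toy graph below is an assumption made for this example: two triangles share a link, and a third triangle touches them at a single node.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Toy graph (illustrative assumption):
#   triangles {0, 1, 2} and {1, 2, 3} share the link 1-2 (k - 1 = 2 nodes),
#   triangle {3, 4, 5} shares only node 3 with them.
G = nx.Graph([(0, 1), (0, 2), (1, 2), (1, 3), (2, 3), (3, 4), (3, 5), (4, 5)])

# k = 3: roll triangles over adjacent triangles.
communities = sorted(sorted(c) for c in k_clique_communities(G, 3))
print(communities)
```

Node 3 appears in both communities, which is exactly the overlap that partition-based methods cannot express.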

Clique percolation (CFinder): Example

Network of association norms around the concept “bright”

www.cfinder.org

Comparison

Comparison: Computational complexities

Comparison: Running time

Thank you!
