Community Structure
Complejidad y Redes.
Universidad Politécnica de Madrid
INTRODUCTION
Communities
Belgium appears to be the model bicultural society:
• 59% of its citizens are Flemish, speaking Dutch and
• 40% are Walloons who speak French.
As multiethnic countries break up all over the world, how did this country foster the peaceful
coexistence of these two ethnic groups?
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities
in large networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
Blondel, Krings, Thomas. Brussels Studies 42 (2010)
® Statistical Methods in Network Science and Data Analysis by Prof. Michael Szell
The call pattern of the consumers of the largest Belgian mobile phone company
Communities: Zachary karate club
Citation history
Zachary karate club club (ZKCC)
The first scientist at any conference on networks who uses Zachary's karate club as an
example is inducted into the Zachary Karate Club Club and awarded a prize.
Communities: Biological modules
Ravasz, E., Somera, A. L., Mongru, D. A., Oltvai, Z. N., & Barabási, A. L. (2002). Hierarchical
organization of modularity in metabolic networks. Science, 297(5586), 1551-1555.
Communities are groups of molecules that have more distinct patterns of reactions
with each other than with others.
Communities: Mesoscopic scale
History of community detection
COMMUNITIES
Definition
® Slide by Network Science by A.L. Barabási
The fundamental hypothesis of communities
H1 (Fundamental Hypothesis)
A network’s community structure is uniquely encoded
in its wiring diagram
Hypotheses of communities
H3 (Density Hypothesis)
Communities are locally dense subgraphs
Define communities by cliques?
One of the first papers on community structure, published in 1949, defined a community as a
group of individuals whose members all know each other [5]. In graph-theoretic terms this
means that a community is a complete subgraph, or a clique.
4-clique
3-clique
Drawbacks:
• Triangles are frequent, larger cliques are rare.
• Requiring a community to be a complete subgraph may be too restrictive, missing many
other legitimate communities.
• Finding cliques is computationally demanding (NP-complete)
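As a rough illustration of the clique-based definition, the sketch below lists the maximal cliques of a small toy graph with NetworkX's `find_cliques`. The graph is an assumption made for illustration: a 4-clique on nodes 0 to 3 that shares node 3 with a 3-clique on nodes 3 to 5.

```python
import networkx as nx

# Toy graph (hypothetical): a 4-clique {0, 1, 2, 3} and a 3-clique {3, 4, 5}
# that share node 3.
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (3, 5), (4, 5)])

# find_cliques yields every maximal clique, i.e. a complete subgraph that
# cannot be extended by adding another node.
maximal_cliques = sorted(sorted(c) for c in nx.find_cliques(G))
print(maximal_cliques)
```

Enumeration is instant on a graph this small, but the number of maximal cliques can grow exponentially with network size, which is the computational drawback listed above.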
Define communities by internal/external degrees?
Figure panels: clique, strong community, weak community.
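A minimal sketch of the degree-based definitions, assuming the commonly used ones: a community is strong if every member has more links inside the community than outside it, and weak if the summed internal degree of its members exceeds their summed external degree. The toy network and the helper names (`degrees`, `is_strong`, `is_weak`) are hypothetical.

```python
# Toy network (hypothetical): two triangles joined by the link C-D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}

def degrees(adj, community):
    """Map each member to (internal degree, external degree)."""
    return {n: (len(adj[n] & community), len(adj[n] - community))
            for n in community}

def is_strong(adj, community):
    # Every node has more links inside the community than outside.
    return all(k_int > k_ext
               for k_int, k_ext in degrees(adj, community).values())

def is_weak(adj, community):
    # The summed internal degree exceeds the summed external degree.
    d = list(degrees(adj, community).values())
    return sum(k_int for k_int, _ in d) > sum(k_ext for _, k_ext in d)

print(is_strong(adj, {"A", "B", "C"}))        # one triangle
print(is_strong(adj, {"A", "B", "C", "D"}))   # D has 1 internal, 2 external links
print(is_weak(adj, {"A", "B", "C", "D"}))
```

Under these definitions every strong community is also weak, while the second group shows that the converse does not hold.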
NUMBER OF COMMUNITIES
Graph partitioning and community detection
COMMUNITY DETECTION
Community Detection
We call a partition a division of a network into an arbitrary number of groups, such that
each node belongs to one and only one group. The number of possible partitions of N nodes is
given by the Bell number:
$$B_N = \frac{1}{e} \sum_{j=0}^{\infty} \frac{j^N}{j!}$$
Figure: B_N compared with e^N for N from 0 to 50, on a logarithmic vertical axis (10^0 to 10^40).
The Bell number BN grows faster than exponentially with the network size for large N.
The number of possible ways we can partition a network into communities grows
exponentially or faster with the network size N. Therefore, it is impossible to inspect all
partitions of a large network. We therefore need algorithms that can identify communities
without inspecting all partitions.
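To get a feel for this growth, the sketch below computes Bell numbers exactly with the Bell triangle recurrence (a standard method, swapped in here for the infinite sum) and compares them with e^N. The function name `bell_numbers` is illustrative.

```python
import math

def bell_numbers(n_max):
    """Return [B_0, ..., B_n_max] using the Bell triangle recurrence."""
    bells = [1]   # B_0
    row = [1]     # current row of the Bell triangle
    for _ in range(n_max):
        # Each new row starts with the last entry of the previous row;
        # every further entry adds the entry above-left to its left neighbor.
        new_row = [row[-1]]
        for value in row:
            new_row.append(new_row[-1] + value)
        row = new_row
        bells.append(row[0])
    return bells

bells = bell_numbers(50)
for n in (5, 10, 20, 50):
    print(n, bells[n], f"{math.exp(n):.3e}")
```

For small N the exponential is still larger (B_5 = 52 while e^5 is about 148), but by N = 10 the Bell number has already overtaken it.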
HIERARCHICAL CLUSTERING
Hierarchical clustering
Once we have the similarity matrix, hierarchical clustering iteratively identifies groups of
nodes with high similarity.
• agglomerative algorithms merge nodes with high similarity into the same community,
• divisive algorithms isolate communities by removing low similarity links that tend to
connect communities.
Both procedures generate a hierarchical tree, called a dendrogram, that predicts the
possible community partitions.
Agglomerative hierarchical clustering. The Ravasz algorithm
In an agglomerative algorithm, similarity should be high for node pairs that belong to the
same community and low for node pairs that belong to different communities. In a network
context, nodes that connect to each other and share neighbors likely belong to the same
community, hence their xij should be large.
Note: the book (NetworkScience.com) does not state in the
formula that 1 is added to the numerator if i and j are neighbors.
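EXAMPLE
$$x_{ij} = \frac{J(i,j) + \Theta(A_{ij})}{\min(k_i, k_j) + 1 - \Theta(A_{ij})}$$
Here J(i, j) is the number of common neighbors of nodes i and j, k_i is the degree of node i,
and Θ(A_ij) is 1 if i and j are linked and 0 if there is no link.
The obtained x_ij for each connected node pair is shown on each link.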
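A minimal sketch of this similarity, assuming the form x_ij = (J(i,j) + Θ(A_ij)) / (min(k_i, k_j) + 1 - Θ(A_ij)), with J(i,j) the number of common neighbors, k the degree, and Θ(A_ij) equal to 1 for linked pairs and 0 otherwise. The toy network and the function name `topological_overlap` are hypothetical.

```python
# Toy network (hypothetical): two triangles joined by the link C-D.
adj = {
    "A": {"B", "C"}, "B": {"A", "C"}, "C": {"A", "B", "D"},
    "D": {"C", "E", "F"}, "E": {"D", "F"}, "F": {"D", "E"},
}

def topological_overlap(adj, i, j):
    """x_ij = (J(i,j) + Theta(A_ij)) / (min(k_i, k_j) + 1 - Theta(A_ij))."""
    linked = 1 if j in adj[i] else 0      # Theta(A_ij)
    common = len(adj[i] & adj[j])         # J(i, j): number of shared neighbors
    k_min = min(len(adj[i]), len(adj[j]))
    return (common + linked) / (k_min + 1 - linked)

x_AB = topological_overlap(adj, "A", "B")  # same triangle, linked
x_CD = topological_overlap(adj, "C", "D")  # the bridge between the triangles
x_AE = topological_overlap(adj, "A", "E")  # different triangles, no link
print(x_AB, x_CD, x_AE)
```

The pair inside a triangle scores highest, the bridge scores lower, and the unrelated pair scores zero, which is the ordering an agglomerative method needs.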
1. Assign each node to a community of its own and evaluate xij for all node pairs.
2. Find the community pair or the node pair with the highest similarity and merge them.
3. Calculate the similarity between the new community and all other communities.
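The steps above can be sketched in a few lines. This version assumes average linkage (the similarity of two groups is the mean x_ij over all cross-group node pairs) and repeats steps 2 and 3 until one group remains; the similarity values and the name `average_linkage` are made up for illustration.

```python
from itertools import combinations

def average_linkage(similarity, nodes):
    """Return the merges, in order, as (group_a, group_b, similarity).

    `similarity` maps frozenset({i, j}) to x_ij; missing pairs count as 0.
    """
    groups = [frozenset([n]) for n in nodes]   # step 1: one group per node
    merges = []

    def group_similarity(a, b):
        # Average x_ij over all node pairs with one node in each group.
        total = sum(similarity.get(frozenset((i, j)), 0.0)
                    for i in a for j in b)
        return total / (len(a) * len(b))

    while len(groups) > 1:
        # Step 2: find the most similar pair of groups and merge it.
        a, b = max(combinations(groups, 2),
                   key=lambda pair: group_similarity(*pair))
        merges.append((a, b, group_similarity(a, b)))
        # Step 3: similarities to the merged group are recomputed on demand.
        groups = [g for g in groups if g not in (a, b)] + [a | b]
    return merges

# Hypothetical pairwise similarities for four nodes.
pairs = {("A", "B"): 0.9, ("C", "D"): 0.8, ("A", "C"): 0.2,
         ("B", "C"): 0.1, ("A", "D"): 0.1}
similarity = {frozenset(p): x for p, x in pairs.items()}
merges = average_linkage(similarity, ["A", "B", "C", "D"])
for a, b, x in merges:
    print(sorted(a), sorted(b), round(x, 3))
```

The recorded merge order is exactly the information a dendrogram displays.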
Step 4: Dendrogram
Divisive hierarchical clustering. The Girvan-Newman algorithm
Divisive procedures systematically remove the links connecting nodes that belong to
different communities, eventually breaking a network into isolated communities.
Divisive algorithms require a link centrality measure xij that is high for pairs of nodes that
belong to different communities and is low for node pairs in the same community. Two
frequently used measures can achieve this: link betweenness and random-walk betweenness.
2. Remove the link with the largest centrality. In case of a tie, choose one link randomly.
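A short sketch of this removal step using NetworkX, assuming link (edge) betweenness as the centrality. The toy graph is an assumption: `barbell_graph(5, 0)` builds two 5-node cliques (nodes 0 to 4 and 5 to 9) joined by the single link 4-5.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman

# Toy graph (hypothetical): two 5-cliques joined by the link 4-5.
G = nx.barbell_graph(5, 0)

# The link with the largest betweenness is the one the algorithm removes first.
betweenness = nx.edge_betweenness_centrality(G)
top_link = max(betweenness, key=betweenness.get)

# girvan_newman yields one partition per level of the dendrogram; the first
# one appears as soon as the removals disconnect the network.
first_split = next(girvan_newman(G))
communities = sorted(sorted(c) for c in first_split)
print(top_link, communities)
```

Here the bridge carries every shortest path between the two cliques, so removing it alone already separates them.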
In Gephi:
Where to “cut”?
Summary
This would be at odds with our expectation that in each network there is a ground truth,
corresponding to a unique community structure.
MODULARITY
Random hypothesis and modularity
H4 (Random Hypothesis)
Randomly wired networks are not expected to have a
community structure
Kronecker delta: δ(C_i, C_j) is 1 if nodes i and j belong to the same community (C_i = C_j), otherwise 0
We can write the modularity of the full partition as:
Remember: n is the number of communities.
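A minimal sketch, assuming the standard per-community form of modularity, M = Σ_c [L_c / L - (k_c / 2L)^2], where L is the total number of links, L_c the number of links inside community c, and k_c the total degree of the nodes in c. The function name and the toy edge list are illustrative.

```python
def modularity(edges, communities):
    """M = sum over communities of [L_c / L - (k_c / (2L))^2]."""
    L = len(edges)
    total = 0.0
    for community in communities:
        # L_c: links with both end nodes inside the community.
        L_c = sum(1 for u, v in edges if u in community and v in community)
        # k_c: total degree of the community (each link end counts once).
        k_c = sum((u in community) + (v in community) for u, v in edges)
        total += L_c / L - (k_c / (2 * L)) ** 2
    return total

# Toy network (hypothetical): two triangles joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M_split = modularity(edges, [{0, 1, 2}, {3, 4, 5}])
M_single = modularity(edges, [{0, 1, 2, 3, 4, 5}])
print(M_split, M_single)
```

Placing all nodes in one community gives M = 0, while the partition into the two triangles gives a clearly positive value.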
Modularity maximization
Algorithms based on maximizing modularity
Greedy modularity maximization
The first modularity maximization algorithm was proposed by Mark Newman.
MEJ Newman, PRE 69 (2004)
5) Record the modularity for each step and select the partition for which the
modularity is maximal.
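For experimentation, NetworkX provides `greedy_modularity_communities`, which implements the Clauset-Newman-Moore variant of greedy modularity maximization rather than Newman's original formulation. The sketch below applies it to an assumed toy graph of two 5-node cliques joined by one link.

```python
import networkx as nx
from networkx.algorithms.community import (
    greedy_modularity_communities,
    modularity,
)

# Toy graph (hypothetical): cliques on nodes 0-4 and 5-9, joined by link 4-5.
G = nx.barbell_graph(5, 0)

# Starts from single-node communities and keeps merging the pair of
# communities whose merger increases modularity the most.
communities = greedy_modularity_communities(G)
partition = sorted(sorted(c) for c in communities)
M = modularity(G, communities)
print(partition, round(M, 4))
```

On this graph the greedy merges recover the two cliques, the partition with the largest modularity.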
Using modularity to decide the dendrogram cut in hierarchical clustering
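One way to sketch this idea: generate every level of a dendrogram, score each level's partition with modularity, and cut where the score peaks. The example assumes NetworkX's `girvan_newman` as the hierarchical method and a toy graph of two 5-node cliques joined by one link.

```python
import networkx as nx
from networkx.algorithms.community import girvan_newman, modularity

# Toy graph (hypothetical): cliques on nodes 0-4 and 5-9, joined by link 4-5.
G = nx.barbell_graph(5, 0)

# One partition per dendrogram level, from 2 communities up to N singletons.
levels = list(girvan_newman(G))
scores = [modularity(G, level) for level in levels]

# Cut the dendrogram at the level with the largest modularity.
best_level = max(range(len(levels)), key=scores.__getitem__)
best_partition = sorted(sorted(c) for c in levels[best_level])
print(best_level, best_partition, round(scores[best_level], 4))
```

The same scoring loop applies to any method that returns a sequence of nested partitions.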
Other advantages/uses of modularity maximization
- Intuitive
- Corresponds to our social network ideas of communities
- Automatically chooses the number of communities
- Can take into account weights (Fortunato, S. Physics Reports, 486 (2010))
- Can take into account directions (Fortunato, S. Physics Reports, 486 (2010))
- Can take into account attributes or space (Expert, P. et al., PNAS 108 (2011))
Limitations of modularity: Resolution limit
This is the resolution limit of modularity: If total degrees lie below this threshold,
the expected number of links between A and B is smaller than 1, and a single
link will force the communities together.
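A worked example of this effect, assuming the standard form M = Σ_c [L_c / L - (k_c / 2L)^2] and the classic ring-of-cliques benchmark: 30 cliques of 5 nodes each, every clique joined to the next by a single link. All names in the sketch are illustrative.

```python
def modularity_from_counts(communities, L):
    """communities: list of (L_c, k_c) pairs; L: total number of links."""
    return sum(L_c / L - (k_c / (2 * L)) ** 2 for L_c, k_c in communities)

n_cliques, size = 30, 5
links_per_clique = size * (size - 1) // 2    # 10 links inside each clique
L = n_cliques * (links_per_clique + 1)       # 330: plus one ring link per clique
k_clique = 2 * links_per_clique + 2          # 22: total degree of one clique

# Partition 1: every clique is its own community.
M_separate = modularity_from_counts(
    [(links_per_clique, k_clique)] * n_cliques, L)

# Partition 2: neighboring cliques are merged in pairs.
M_pairs = modularity_from_counts(
    [(2 * links_per_clique + 1, 2 * k_clique)] * (n_cliques // 2), L)

# Expected number of links between two cliques under random wiring.
expected_links = k_clique * k_clique / (2 * L)
print(round(M_separate, 4), round(M_pairs, 4), round(expected_links, 3))
```

The expected number of links between two neighboring cliques is below 1, so the single ring link is enough for the merged partition to score a higher modularity than the intuitive one.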
Good, B. H., de Montjoye, Y. A., & Clauset, A. (2010). Performance of modularity maximization in practical contexts. Physical Review E, 81(4), 046106.
Limitations of modularity: structure in random networks
The Louvain method
The O(N^2) computational complexity of the greedy algorithm can be prohibitive for very
large networks.
A modularity optimization algorithm with better scalability was proposed by Blondel and
collaborators:
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008). Fast unfolding of communities in large
networks. Journal of statistical mechanics: theory and experiment, 2008(10), P10008.
The Louvain algorithm consists of two steps that are repeated iteratively.
Step I:
Start with a network of N nodes, initially assigning each node to a different community. For
each node i we evaluate the gain in modularity if we place node i in the community of one
of its neighbors j. We then move node i into the community for which the modularity gain is
the largest, but only if this gain is positive. If no positive gain is found, i stays in its original
community. This process is applied to all nodes until no further improvement can be
achieved, completing Step I.
Step II:
We construct a new network whose nodes are the communities identified during Step I. The
weight of the link between two nodes is the sum of the weight of the links between the
nodes in the corresponding communities. Links between nodes of the same community
lead to weighted self-loops.
Once Step II is completed, we repeat Steps I - II, calling their combination a pass. The
number of communities decreases with each pass. The passes are repeated until there are
no more changes and maximum modularity is attained.
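A brief usage sketch, assuming a NetworkX version that ships `louvain_communities` (it was added in the 2.x series). The toy graph is an assumption, and the `seed` argument only fixes the random order in which nodes are visited.

```python
import networkx as nx
from networkx.algorithms.community import louvain_communities, modularity

# Toy graph (hypothetical): cliques on nodes 0-4 and 5-9, joined by link 4-5.
G = nx.barbell_graph(5, 0)

# Runs the passes described above and returns the final partition
# as a list of node sets.
communities = louvain_communities(G, seed=42)
M = modularity(G, communities)
print(sorted(sorted(c) for c in communities), round(M, 4))
```

Different seeds can produce different partitions on larger networks, so it is common to run the method several times and keep the result with the highest modularity.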
Gephi
* Louvain
* Girvan-Newman for hierarchical clustering (see plugins)
Community detection in NetworkX
https://networkx.github.io/documentation/stable/reference/algorithms/community.html
Overlapping communities
The clique percolation algorithm
The clique percolation algorithm, often called CFinder, views a community as the union of
overlapping cliques.
Clique percolation (CFinder)
A k-clique community is the largest connected subgraph obtained by the union of all
adjacent k-cliques.
Other k-cliques that cannot be reached from a particular clique correspond to other
clique-communities.
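A small sketch using NetworkX's `k_clique_communities`, assuming the usual adjacency rule that two k-cliques are adjacent when they share k-1 nodes. The toy graph is hypothetical: two 4-cliques that share only node 3.

```python
import networkx as nx
from networkx.algorithms.community import k_clique_communities

# Toy graph (hypothetical): 4-cliques on nodes 0-3 and 3-6, sharing node 3.
G = nx.complete_graph(4)
G.add_edges_from([(3, 4), (3, 5), (3, 6), (4, 5), (4, 6), (5, 6)])

# With k = 3, triangles inside each 4-clique share two nodes and chain
# together, but no triangle of one clique shares two nodes with a triangle
# of the other.
communities = sorted(sorted(c) for c in k_clique_communities(G, 3))
overlap = set(communities[0]) & set(communities[1])
print(communities, overlap)
```

Node 3 ends up in both communities, which is the kind of overlap a partition-based method cannot express.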
Clique percolation (CFinder): Example
www.cfinder.org
Comparison
Comparison: Computational complexities
Comparison: Running time
Thank you!