Professional Documents
Culture Documents
UNIT – III
CONTENTS
• Introduction
• Communities in Context
• Core Methods
• Quality Functions
• The Kernighan-Lin(KL) algorithm
• Agglomerative/Divisive
Algorithms
• Spectral Algorithms
• Multi-level Graph Partitioning
• Markov Clustering
©
protein-protein interaction network based on confidence score (0.7)
for each chronic stress related lifestyle diseases
Clinical Trials Dataset – Efficacy and Toxicity
network of genes having z score >1 involved in top five diseases
Food Web of the Great Barrier Reef
Multiple language editions of Wikipedia – Linguistic Network
The social network of Zachary's karate club. Red dots denote the supporters
of instructor and blue squares denote the supporters of the president
Introduction
• The Objective
• Uncover and understand important
network/sub-network structures at
multiple topological and temporal scales.
• Extracting community structure and
leverage them to predict emergent,
critical and casual nature of dynamic
networks.
• The complexity
• Topological properties are uncertain and
existing techniques are not suitable.
• Requirements of directed and dynamic
networks are different.
• Scalability
©
Definition
• Social networks exhibit strong modular nature or
community structure.
• A community is a collection of people sharing similar
interests or having same characteristics.
• A network community is a set of nodes which have dense
connections within the group, and sparse connections
outside the group.
• A clique is in some sense a stronger version of a
community. A set of nodes forms a clique (a complete
subgraph) if all possible connections between nodes exist.
• Community detection aims at grouping nodes in accordance
with the relationships among them to form strongly linked
subgraphs from the entire graph
©
Communities in Context
• Social groups could be revealed by suitably
rearranging the rows and the columns of
matrices describing social ties, until they take an
approximate block-diagonal form.
• Identify bridging nodes and use them to separate
out community structure.
• Community discovery
• can facilitate understanding of a social system.
• allows to summarize the interactions within a
network concisely.
• lend itself to actionable pattern discovery.
©
Real World Communities
• Zachary’s Karate club
• Loans among financial institutions
• Social Behaviour of animals
• Proxy caches
• Link farms
• MANETS
• Personalized recommendation engines
• Telecommunication Networks
©
©
Quality Metrics
• Quality function quantifies the goodness of a given
division of the network into communities.
• Normalized cutThe sum of weights of the edges
that connect S to the rest of the graph, normalized
by the total edge weight of S and that of the rest of
the graphS.
• Conductance
©
Quality Metrics
• KL Objective minimize the edge cut under the
constraint that all clusters be of the same size.
©
Core Methods
• Spectral methodsmeant for bi-partitioning the
network, can also be used to recursively subdivide
the network into as many communities as desired.
• Kernighan-Lin Algorithm
• Flow based post processing
©
Kernighan-Lin(KL) Algorithm
• Graph partitioning algorithm which optimizes the KL
objective function i.e. minimize the edge cut while keeping
the cluster sizes balanced.
• At each iteration, the algorithm searches for a subset of
vertices from each part of the graph such that swapping
them will lead to a reduction in the edge cut.
• The gain gv of a vertex is the reduction in edge-cut if vertex v
is moved from its current partition to the other partition.
• Repeatedly select from the larger partition the vertex with
the largest gain and move it to the other partition.
• A vertex is not considered for moving again if it has already
been moved in the current iteration.
• After a vertex has been moved, the gains for its neighbouring
vertices will be updated in order to reflect the new
assignment of vertices to partitions.
©
Agglomerative/Divisive Algorithms
• Agglomerative:
• Begin with each node in the social network in
its own community.
• At each step merge communities that are
deemed to be sufficiently similar
• Continue until either the desired number of
communities is obtained or the remaining
communities are found to be too dissimilar to
merge any further.
• Divisive
• Begin with the entire network as one
community
• At each step, choose a certain community and
split it into two parts. ©
©
Agglomerative/Divisive Algorithms
• Both kinds of hierarchical clustering algorithms often
output a dendrogram which is a binary tree , where the
leaves are nodes of the network, and each internal node
is a community.
• In divisive algorithms, a parent-child relationship
indicates that the community represented by the parent
node was divided to obtain the communities
represented by the child nodes.
• In agglomerative algorithms, a parent-child relationship
in the dendrogram indicates that the communities
represented by the child nodes were agglomerated (or
merged) to obtain the community represented by the
parent node
©
Girvan and Newman’s divisive algorithm
©
Spectral Algorithms
• Assign nodes to communities based on the
eigenvectors of matrices.
• The top k eigen vectors define an
embedding of the nodes of the network as
points in a k-dimensional space.
• Classical data clustering techniques such as
K-means clustering are applied to derive the
final assignment of nodes to clusters.
• Spectral clustering can be shown to solve
real relaxations of different weighted graph
cut problems.
©
Spectral Algorithms
• A is the adjacency matrix of the network
• D is the diagonal matrix with the degrees of the
nodes along the diagonal
• Un-normalized Laplacian L = D −A
• Normalized Laplacian
©
References
• https://pdfs.semanticscholar.org/895a/c6105492c16b6dbd9
7bae2d7965b8e605073.pdf
• http://www.genome-integrity.org/article.asp?issn=2041-
9414;year=2015;volume=6;issue=1;spage=1;epage=1;aulast
=Roy
• http://www.sthda.com/english/articles/28-hierarchical-
clustering-essentials/92-visualizing-dendrograms-
ultimate-guide/
• https://www.cs.cmu.edu/~ckingsf/bioinfo-
lectures/kernlin.pdf
• https://people.csail.mit.edu/jshun/6886-
s18/lectures/lecture13-1.pdf
• https://www.cs.ucsb.edu/~xyan/classes/CS595D-
2009winter/MCL_Presentation2.pdf
©
• Course MOODLE
• http://sna-cse.moodlecloud.com/
• http://sna-it.moodlecloud.com/
– Username:regdno, password: student