You are on page 1of 28

SOCIAL NETWORK ANALYSIS

UNIT – III
CONTENTS
• Introduction
• Communities in Context
• Core Methods
• Quality Functions
• The Kernighan-Lin(KL) algorithm
• Agglomerative/Divisive
Algorithms
• Spectral Algorithms
• Multi-level Graph Partitioning
• Markov Clustering
©
protein-protein interaction network based on confidence score (0.7)
for each chronic stress related lifestyle diseases
Clinical Trials Dataset – Efficacy and Toxicity
network of genes having z score >1 involved in top five diseases
Food Web of the Great Barrier Reef
Multiple language editions of Wikipedia – Linguistic Network
The social network of Zachary's karate club. Red dots denote the supporters
of instructor and blue squares denote the supporters of the president
Introduction
• The Objective
• Uncover and understand important
network/sub-network structures at
multiple topological and temporal scales.
• Extracting community structure and
leverage them to predict emergent,
critical and casual nature of dynamic
networks.
• The complexity
• Topological properties are uncertain and
existing techniques are not suitable.
• Requirements of directed and dynamic
networks are different.
• Scalability
©
Definition
• Social networks exhibit strong modular nature or
community structure.
• A community is a collection of people sharing similar
interests or having same characteristics.
• A network community is a set of nodes which have dense
connections within the group, and sparse connections
outside the group.
• A clique is in some sense a stronger version of a
community. A set of nodes forms a clique (a complete
subgraph) if all possible connections between nodes exist.
• Community detection aims at grouping nodes in accordance
with the relationships among them to form strongly linked
subgraphs from the entire graph

©
Communities in Context
• Social groups could be revealed by suitably
rearranging the rows and the columns of
matrices describing social ties, until they take an
approximate block-diagonal form.
• Identify bridging nodes and use them to separate
out community structure.
• Community discovery
• can facilitate understanding of a social system.
• allows to summarize the interactions within a
network concisely.
• lend itself to actionable pattern discovery.

©
Real World Communities
• Zachary’s Karate club
• Loans among financial institutions
• Social Behaviour of animals
• Proxy caches
• Link farms
• MANETS
• Personalized recommendation engines
• Telecommunication Networks

©
©
Quality Metrics
• Quality function quantifies the goodness of a given
division of the network into communities.
• Normalized cutThe sum of weights of the edges
that connect S to the rest of the graph, normalized
by the total edge weight of S and that of the rest of
the graphS.

• Conductance

©
Quality Metrics
• KL Objective minimize the edge cut under the
constraint that all clusters be of the same size.

• Modularity it is independent of the number of clusters


that the graph is divided into.

©
Core Methods
• Spectral methodsmeant for bi-partitioning the
network, can also be used to recursively subdivide
the network into as many communities as desired.
• Kernighan-Lin Algorithm
• Flow based post processing

• Clustering methodsallow the user to indirectly


control the granularity of the output communities
• Markov clustering
• Clustering via shingling

©
Kernighan-Lin(KL) Algorithm
• Graph partitioning algorithm which optimizes the KL
objective function i.e. minimize the edge cut while keeping
the cluster sizes balanced.
• At each iteration, the algorithm searches for a subset of
vertices from each part of the graph such that swapping
them will lead to a reduction in the edge cut.
• The gain gv of a vertex is the reduction in edge-cut if vertex v
is moved from its current partition to the other partition.
• Repeatedly select from the larger partition the vertex with
the largest gain and move it to the other partition.
• A vertex is not considered for moving again if it has already
been moved in the current iteration.
• After a vertex has been moved, the gains for its neighbouring
vertices will be updated in order to reflect the new
assignment of vertices to partitions.
©
Agglomerative/Divisive Algorithms
• Agglomerative:
• Begin with each node in the social network in
its own community.
• At each step merge communities that are
deemed to be sufficiently similar
• Continue until either the desired number of
communities is obtained or the remaining
communities are found to be too dissimilar to
merge any further.
• Divisive
• Begin with the entire network as one
community
• At each step, choose a certain community and
split it into two parts. ©
©
Agglomerative/Divisive Algorithms
• Both kinds of hierarchical clustering algorithms often
output a dendrogram which is a binary tree , where the
leaves are nodes of the network, and each internal node
is a community.
• In divisive algorithms, a parent-child relationship
indicates that the community represented by the parent
node was divided to obtain the communities
represented by the child nodes.
• In agglomerative algorithms, a parent-child relationship
in the dendrogram indicates that the communities
represented by the child nodes were agglomerated (or
merged) to obtain the community represented by the
parent node

©
Girvan and Newman’s divisive algorithm

• Edge betweenness measures are defined in a way that


edges with high betweenness scores are more likely to be
the edges that connect different communities.
• Inter-community edges are designed to have higher edge
betweenness scores than intra-community edges do
• By identifying and discarding such edges with high
betweenness scores, one can disconnect the social network
into its constituent communities.
• Shortest path betweenness
• random-walk betweenness

• Disadvantage: High computational cost computing the
betweenness for all edges takes O(|V ||E|) time, and the
entire algorithm requires O(|V|3) time.

©
Spectral Algorithms
• Assign nodes to communities based on the
eigenvectors of matrices.
• The top k eigen vectors define an
embedding of the nodes of the network as
points in a k-dimensional space.
• Classical data clustering techniques such as
K-means clustering are applied to derive the
final assignment of nodes to clusters.
• Spectral clustering can be shown to solve
real relaxations of different weighted graph
cut problems.
©
Spectral Algorithms
• A is the adjacency matrix of the network
• D is the diagonal matrix with the degrees of the
nodes along the diagonal
• Un-normalized Laplacian L = D −A
• Normalized Laplacian

• Both L and are symmetric and positive definite,


and therefore have real and positive eigenvalues.
• The eigenvector corresponding to the smallest
non-zero eigen value of L is known as the Fiedler
vector and forms the basis for bi-partitioning the
graph.
©
MULTI-LEVEL GRAPH PARTITIONING

• Provide a powerful framework for fast and high-


quality graph partitioning.
Coarsening: Produce a smaller graph that is similar
to the original graph. Construct a matching on the
graph, where a matching is defined as a set of edges
no two of which are incident on the same vertex.
Initial partitioning: Partition the coarsest graph
using spectral partitioning.
Uncoarsening: Partition on the current graph is used
to initialize a partition on the finer (original) graph.
©
Markov Clustering
• Clusters graphs via manipulation of the
stochastic matrix or transition probability matrix
corresponding to the graph.
• The MCL process consists of two operations on
stochastic matrices, Expand and Inflate.
• Expand(M) is simply M ∗ M, Inflate(M,r) raises each entry in the matrix M
to the inflation parameter r ( > 1, and typically set to 2)
• Re-normalize the columns to sum to 1.
• These two operators are applied in alternation
iteratively until convergence, starting with the
initial transition probability matrix.

©
References
• https://pdfs.semanticscholar.org/895a/c6105492c16b6dbd9
7bae2d7965b8e605073.pdf
• http://www.genome-integrity.org/article.asp?issn=2041-
9414;year=2015;volume=6;issue=1;spage=1;epage=1;aulast
=Roy
• http://www.sthda.com/english/articles/28-hierarchical-
clustering-essentials/92-visualizing-dendrograms-
ultimate-guide/
• https://www.cs.cmu.edu/~ckingsf/bioinfo-
lectures/kernlin.pdf
• https://people.csail.mit.edu/jshun/6886-
s18/lectures/lecture13-1.pdf
• https://www.cs.ucsb.edu/~xyan/classes/CS595D-
2009winter/MCL_Presentation2.pdf

©
• Course MOODLE
• http://sna-cse.moodlecloud.com/
• http://sna-it.moodlecloud.com/
– Username:regdno, password: student

• NPTEL Online Courses – 12 week course


– https://onlinecourses.nptel.ac.in/noc18_cs56/pr
eview

You might also like