You are on page 1of 102

Module3_CommunityNetworks

Reference: R. Zafarani, M. A. Abbasi, and H.


Liu, Social Media Mining: An Introduction,
Cambridge University Press, 2014.
Book at http://socialmediamining.info/
Social Community

[real-world] community

“A group of individuals with common economic, social, or political interests or


characteristics, often living in relative proximity”
Community is “a group of densely interconnected nodes”
Community is “a group of TOPOLOGICALLY SIMILAR LINKS”
Why analyze communities?
Analyzing communities helps better understand
users
– Users form groups based on their interests

Groups provide a clear global view of user


interactions
• E.g., find polarization

Some behaviors are only observable in a group


setting and not on an individual level
– Some republican can agree with some democrats,
but their parties can disagree
SocCommunities

• Formation:
– When like-minded users on social media form a link and start interacting with each other,
a community is formed.
• More Formal Formation:
1. A set of at least two nodes sharing some interest and
2. Interactions with respect to that interest.
• Social Media Communities
– Explicit (emic): formed by user subscriptions (within the social group)
• Members understand that they are its members
• Non-members understand who the community members are
• Community members often have more interactions with each other than non-members
– Implicit (etic): implicitly formed by social interactions (out of the group)
• Individuals interact with others in the form of an unacknowledged community
• Example: Individuals calling Canada from the United States
– Phone operator considers them one community for promotional offers

• Other community names: group, cluster, cohesive subgroup or module


Examples of Explicit Social Media Communities

Facebook has groups and communities. Users can


– post messages and images,
– can comment on other messages,
– can like posts, and
– can view activities of other users

In Google+, Circles represent communities

In Twitter, communities form as lists.


– Users join lists to receive information in the
form of tweets

LinkedIn provides Groups and Associations.


– Users can join professional groups where they can post
and share information related to the group
Finding Implicit Communities: An Example

A simple graph in which three


implicit communities are found,
enclosed by the dashed circles
Overlapping vs. Disjoint Communities

Overlapping Communities Disjoint Communities


Communities in other domains

Protein-protein interaction networks


– Communities are likely to group proteins having
the same specific function within the cell

World Wide Web


– Communities may correspond to groups of pages dealing with the same or related
topics

Food webs
– Communities may identify compartments
Real-world Implicit Communities

Collaboration network
between scientists working
at the Santa Fe
Institute.

The colors indicate high


level communities and
correspond to research
divisions of the institute
Community Analysis

• Community detection
– Discovering implicit communities

• Community evolution
– Studying temporal evolution of communities

• Community evaluation
– Evaluating Detected Communities
Community Detection
Community Detection

• The process of finding clusters of nodes (‘‘communities’’)


– With Strong internal connections and
– Weak connections between different communities
• Ideal decomposition of a large graph
– Completely disjoint communities
– There are no interactions between different communities.
• In practice,
– find community partitions that are maximally decoupled.
Why Detecting Communities is Important?

Zachary's karate
club
Interactions between 34
members of a karate club
for over two years

• The club members split into two groups (gray and white)
• Disagreement between the administrator of the club (node 34) and the club’s
instructor (node 1),
• The members of one group left to start their own club

The same communities can be found using


community detection
Why Community Detection?
Network Summarization
– A community can be considered
as a summary of the whole
network
– Easier to visualize and
understand

Preserve Privacy
– [Sometimes] a community can
reveal some properties without
releasing the individuals’
privacy information.
Community Detection vs. Clustering
Clustering
• Data is often non-linked (matrix rows)
• Clustering works on the distance or similarity matrix, e.g., -
means.
• If we use -means with adjacency matrix rows, we are only
considering the ego-centric network.

Community detection
• Data is linked (a graph)
• Network data tends to be “discrete”, leading to algorithms using
the graph property directly
– -clique, quasi-clique, or edge-betweenness
Community Detection Algorithms

Group Users based


on Group attributes

Group Users based


on Member
attributes
Community Detection Algorithms
Member-Based
Community Detection
Member-Based Community Detection

• Considers node’s characteristics


• Identify nodes with similar characteristics and consider them a
community
Node Characteristics
A. Degree
– Nodes with same (or similar) degrees are in one community
– Example: cliques
B. Reachability
– Nodes that are close (small shortest paths) are in one community
– Example: -cliques, -clubs, and -clans
C. Similarity
– Similar nodes are in the same community
A. Node Degree
Most common subgraph searched for:
• Clique: a maximal complete subgraph in which
all nodes inside the subgraph are adjacent to
each other

Find communities by searching for


1. The maximum clique: the clique
with the largest number of vertices, To overcome this, we can use
or I. Brute Force method
2. All maximal cliques: cliques that II. Relax cliques
are not subgraphs of a larger III. Use cliques as the core
clique; i.e., cannot be further for larger communities
expanded

Both problems are NP-hard


I. Brute-Force Method
Can find all the
maximal cliques in
the graph

For each vertex ,


we find the maximal
clique that contains
node
Impractical for large networks:
• For a complete graph of only 100 nodes, the algorithm will
generate at least different cliques starting from any
node in the graph
Example
Example (cont’d)
Example (cont’d)
Enhancing the Brute-Force Performance

Systematic Pruning :
• When searching for cliques of size or larger
• If the clique is found, each node should have a degree equal to or
more than
• We can first prune all nodes (and edges connected to them) with
degrees less than
– More nodes will have degrees less than
– Prune them recursively
• For large , many nodes are pruned as social media networks
follow a power-law degree distribution [The fraction P(k) of nodes in
the network having k connections to other nodes goes for large
values of k as P ( k ) k− γ where γ is a parameter whose value is
typically in the range 2 < γ < 3 ]
Maximum Clique: Pruning

Example. To find a clique of size ,


remove all nodes with degree

– Remove nodes 2 and 9


– Remove nodes 1 and 3
– Remove node 4

Even with pruning, cliques are less desirable


– Cliques are rare
– A clique of 1000 nodes, has 999x1000/2 edges
– A single edge removal destroys the clique
– That is less than 0.0002% of the edges
II. Relaxing Cliques
• -plex: All nodes have a minimum degree that is not necessarily k - 1.
• For a set of vertices V, the structure is called a k-plex , if we have

• is the degree of in the induced subgraph


– Number of nodes from 𝑉 that are connected to 𝑣
• Clique of size is a -plex
• As k gets larger in a k-plex, the structure gets increasingly relaxed
• Finding the maximum -plex: NP-hard
– In practice, relatively easier due to smaller search space

Maximal -plexes
k-plex Example
Example:
• Identify 1-plex, 2-plex
and 3-plex
More Cliques Relaxing…

• -core: a maximal
connected subgraph in
which all vertices have
degree at least
• -shell: nodes that are part
of the -core, but are not
part of the -core
Questions A Clique that wants to relax
• 0-core?
• 0-shell?
• 1-core?
• -cores of the complete A 2-plex or a 2-core?
graph
k-core, k-shell - Example

• k-core decomposition for a small graph:


– Each closed line contains the set of vertices belonging to a given k-
core
– Different types of vertices correspond to different k-shells

• Reference: K-CORE DECOMPOSITION OF INTERNET GRAPHS: HIERARCHIES, SELF-SIMILARITY


AND MEASUREMENT BIASES, J. I. ALVAREZ-HAMELIN, L. DALL’ASTA, A. BARRAT AND A.
VESPIGNANI
III. Using Cliques as a seed of a Community

Clique Percolation Method (CPM)


– Uses cliques as seeds to find larger communities
– CPM finds overlapping communities

• Input
– A parameter , and a network
• Procedure
– Find out all cliques of size in the given network
– Construct a clique graph.
• Two cliques are adjacent if they share nodes
– Each connected components in the clique graph form a
community
Clique Percolation Method: Example with k=3
Clique Percolation Method: Example with k=3

Cliques of size 3:
, ,
, ,
, ,
,

Communities
(components in clique
graph):
,
,

Nodes v3 and v8 belong to two


communities and these communities
are overlapping
Clique Percolation Method: Example
• Detect communities from the graph given
below with seed size k=3.
Clique Percolation Method: Example
(cont’d)
B. Node Reachability
The two extremes of reachability
Nodes are assumed to be in the same
community
1. If there is a path between them (regardless
of the distance) How? Find using BFS/DFS

Challenge: most nodes are in one


community (giant component)

2. They are so close as to be immediate How? Finding Cliques


neighbors
Challenge: Cliques are challenging to find and are
rarely observed

Solution: Find communities that are in between cliques and connected components
in terms of connectivity and have small shortest paths between their nodes
Special Subgraphs
-Clique: a maximal subgraph in which the largest shortest path distance
between any nodes is less than or equal to
– Shortest path between any two nodes is always less than or equal to k.
– Nodes on the shortest path need not be part of the subgraph
-Club: follows the same definition as a -clique
– Additional Constraint : Nodes on the shortest paths should be part of the subgraph
-Clan: a -clique where for all shortest paths within the subgraph the distance
is equal or less than .
– All -clans are -cliques but not vice versa
k-Clans = k-Cliques ∩ k-Clubs
Special Subgraphs

• -truss: the largest subgraph where all edges belong to triangles

• What is the relationship between a -core and -truss?


C. Node Similarity
• Similar (or most similar) nodes are assumed to be in the same
community

• Node similarity can be defined


– Using the similarity of node neighborhoods (Structural Equivalence)
– Similarity of social circles (Regular Equivalence)

Structural equivalence: two nodes are structurally equivalent iff.


they are connected to the same set of actors

Nodes 1 and 3 are structurally


equivalent,
So are nodes 5 and 6
C. Node Similarity - Regular Equivalence
• A generalization of structural equivalence is known as regular
equivalence

• Consider the situation of two basketball players in two different


countries.
– Though sharing no neighbourhood overlap,
– Social circles of these players (coach, players, fans, etc.) might look quite similar due
to their social status.

• Nodes are regularly equivalent when they are connected to nodes


that are themselves similar (a self-referential definition)
Node Similarity (Structural Equivalence)

Jaccard Similarity

Cosine similarity
Node Similarity (Structural Equivalence)

Construct the Jaccard Similarity Matrix and Cosine Similarity


Matrix for the graph given below
Example

Identify 2-Cliques, 2-clubs and 2-clans from the given graph


Example

Identify k-Clique, k-club and k-clan from the given graph


Group-Based
Community Detection
Community Detection Algorithms
Group-Based Community Detection
Group-based community detection: Find communities that
have certain group properties

Group Properties:
I. Robust: -connected graphs
II. Dense: Quasi-cliques
III. Hierarchical: Hierarchical clustering
IV. Balanced: Spectral clustering
V. Modular: Modularity Maximization
I. Robust Communities
• The goal is to find subgraphs robust enough such that removing
some edges or vertices does not disconnect the subgraph

• -vertex connected ( -connected) graph:


– is the minimum number of nodes that must be removed to disconnect
the graph
• -edge connected: at least edges must be removed to
disconnect the graph
• Any graph G is said to be k-connected if it contains at least k+1
vertices, but does not contain a set of k − 1 vertices whose
removal disconnects the graph
• Examples:
– A cycle: -connected graph
II. Dense Communities

• Find dense communities, which have sufficiently frequent


interactions
• Dense Communities in Social Network would provide us
enough interactions for analysis to make a statistical sense
out of them
• Cliques, clubs, and clans are examples of connected dense
communities
Dense Communities: -dense
• The density of a graph defines how close a graph is to a clique

• A quasi-clique (or γ-clique) is a connected γ-dense graph


• A subgraph is of -dense (or quasi-clique) if

• A -dense graph is a clique


• We can find quasi-cliques using the brute force algorithm discussed previously
Finding Maximal Quasi-Cliques
We can use a two-step procedure consisting of “local search” and
“heuristic pruning”

Local search
• Sample a subgraph, and find a maximal -dense quasi-clique
• A greedy approach is to expand a quasi-clique by all of its high-degree
neighbors until the density drops below 
Heuristic pruning
• For a -dense quasi-clique of size k, we recursively remove nodes with
degree less than  k and incident edges
• We can start from low-degree nodes and recursively remove all nodes with
degree less that  k
Quasi-Cliques - Example
Heuristic pruning
• For a -dense quasi-clique of size k, we recursively remove nodes with
degree less than  k and incident edges
• We can start from low-degree nodes and recursively remove all nodes with
degree less that  k
• Example: Identify a quasi clique of size 4,  is (7/10) .7
• {A,B,C,D} forms a quasi clique of size 4
Example

• Example: Identify a quasi clique s of size 4


Example (cont’d)

• Example: Identify a quasi clique s of size 4

• A few Quasi Cliques are {v4, v5, v6, v7} ,


{v3, v4, v5, v7} as |E| >= ( V* (V-1) /2 ) (i.e : 5 >= ( 10/15 * 4 * 3 /2)
{v3, v4, v5, v6} , {v5, v6, v7, v8} , {v4, v6, v7, v8}
III. Hierarchical Communities
• Communities may have hierarchies
• Each community can have sub/super communities.
– Hierarchical clustering deals with this scenario and generates
community hierarchies
• Initially members are considered as either 1 or
communities in hierarchical clustering. These communities
are gradually
– merged (agglomerative hierarchical clustering) or
– split (divisive hierarchical clustering)
Hierarchical Community Detection
• Goal: build a hierarchical structure of communities based
on network topology
• Approaches:
– Divisive Hierarchical Clustering
– Agglomerative Hierarchical clustering
• Divisive clustering
– Partition nodes into several sets
– Each set is further divided into smaller ones
Divisive Hierarchical Clustering
Girvan-Newman Algorithm: recursively remove the “weakest” links within a
“community”
– Find the edge with the weakest link
– Remove the edge and update the corresponding strength of each edge
• Recursively apply the above two steps until a network is discomposed into a
desired number of connected components
• Each component forms a community
• Assumptions:
If a network has a set of communities and these communities are connected to one
another with a few edges, then all shortest paths between members of different
communities should pass through these edges.
By removing these edges, we can recover (i.e., disconnect) communities in a
network.
Edge Betweenness
• To determine weakest links, the algorithm uses edge
betweenness

Edge betweenness is the number of shortest paths that pass


along with the edge

• Edge betweenness measures the “bridgeness” of an edge


between two communities

• The edge with high betweenness tends to be the bridge


between two communities
Edge Betweenness
• Edge betweenness of an edge is the number of shortest paths
between pairs of nodes that run along it
• If there is more than one shortest path between a pair of
nodes, each path is assigned equal weight such that the total
weight of all of the paths is equal to unity
• Edges connecting communities will have high edge
betweenness
– By removing these edges, the groups are separated from one another
and the underlying community structure of the network is revealed
Girvan-Newman Algorithm

1. Calculate edge betweenness for all edges in


the graph
2. Remove the edge with the highest
betweenness
3. Recalculate betweenness for all edges
affected by the edge removal.
4. Repeat until all edges are removed
Edge Betweenness: Example
Edge Betweenness: Example (cont’d)

•Remove edge with highest value to cluster the graph


•In the example graph, edge BD is removed to get two communities.
Edge Betweenness: Example (cont’d)
• Two Communities:
Edge Betweenness Divisive Clustering : Example

Graph Communities
Edge Betweenness: Example

The edge betweenness of is


, as all the shortest paths
from to have to either
pass or , and is the
shortest path between and
(i.e) = (1/2 + 1/2 +1/2+1/2+ 1/2 +1/2
)+(1) = 3+1
Edge Betweenness Divisive Clustering : Example
Initial betweenness value

the first edge that needs to be removed is e(4,


5) ( or e(4, 6) )
By removing e(4, 5), we compute the edge
betweenness once again; this time, e(4, 6) has
the highest betweenness value: 20.
This is because all shortest paths between nodes
{1,2,3,4} to nodes {5,6,7,8,9} must pass thru e(4, 6);
therefore, it has betweenness 4 5 = 20.
Edge Betweenness Divisive Clustering : Example (cont’d)
Edge betweenness value

The Edge Betweenness (EB) of is , as all the shortest paths from to


have to either pass or , and is the shortest path
between and [SP is Shortest Path]
(i.e) = (1/2 + 1/2 +1/2+1/2+ 1/2 +1/2 )+(1) = 3+1 = 4
EB of = 9 [ SP from 1-{4,5,6,7,8,9}=>6*1, SP from 2-{4,5,6,7,8,9}=>6*0.5 ]
EB of = 4 [ SP from 2-3=> 1, SP from 2-{4,5,6,7,8,9}=>6*0.5 ]
EB of = 9 [ SP from 3-{4,5,6,7,8,9}=>6*1, SP from 2-{4,5,6,7,8,9}=>6*0.5 ]
EB of = 10 [ SP from {1,2,3,4}-5=>4*1, SP from {1,2,3,4}-{7,8,9}=>4*3*0.5]
Edge Betweenness Divisive Clustering : Example (cont’d)
Edge betweenness value

EB of = 10 [ SP from 4-6=>1, SP from {1,2,3}-6 => 3*1,


SP from {1,2,3,4}-{7,8,9}=4*3*0.5 ]
EB of = 6 [ SP from 5-7=> 1, SP from 5-9 => 1,
SP from {1,2,3,4}-7 => 4 *0.5, SP from {1,2,3,4} -9 =>4*0.5 ]
EB of = 3 [ SP from 5-8 => 1, SP from {1,2,3,4}-8 => 4*0.5 ]
EB of = 6 [ SP from 6-7=>1, SP from 6-9 => 1 ,
SP from {1,2,3,4} - 7 =>4*0.5, SP from {1,2,3,4} - 9 => 4*0.5 ]
EB of = 3 [ SP from 6-8=>1, SP from {1,2,3,4} - 8 =>4*0.5 ]
EB of = 2 [ SP from 7-8=>1, SP from 8-9 =>1 ]
EB of = 8 [ SP from {1,2,3,4,5,6,7,8} - 9 => 8*1 ]
Edge Betweenness Divisive Clustering : Example (cont’d)
Edge betweenness value after
removing e(4,5)

EB of =0
EB of = 20 [ SP from {1,2,3,4}-{5,6,7,8,9} => 4 * 5 ]
EB of (5, 7) = 2 [ SP from 5-7=> 1, SP from 5-9 => 1 ]
EB of (5,8) = 1 [ SP from 5-8 => 1 ]
EB of = 10 [ SP from 6-7=>1, SP from 6-9 => 1 , SP from {1,2,3,4} - 7 =>4,
SP from {1,2,3,4} - 9 => 4 ]
EB of = 5 [ SP from 6-8=>1, SP from {1,2,3,4} - 8 =>4 ]
EB of = 2 [ SP from 7-8=>1, SP from 8-9 =>1 ]
EB of = 8 [ SP from {1,2,3,4,5,6,7,8} - 9 => 8*1 ]
Edge Betweenness Divisive Clustering : Example (cont’d)
Graph Components after removing e(7,9)

EB of = 4 [ SP from {5,6,7,8} - 9 => 4 ]

It is evident that e(7,9) with highest EB value has to be removed next.

Removing e(7,9) leads to a disconnected graph comprising three components.

Hence the three communities are {1,2,3,4}, {5,6,7,8}, {9}


Girvan Newman method: Example 1
Girvan Newman method: Example 1

Betweenness(7-8)= 7x7 = 49 Betweenness(1-3) = 1X12=12


Betweenness(3-7)=betweenness(6-7)=betweenness(8-9) = betweenness(8-12)= 3X11=33
Girvan Newman method: Example 1

Betweenness(1-3) = 1X5=5
Betweenness(3-7)=betweenness(6-7)=betweenness(8-9) = betweenness(8-12)= 3X4=12
Girvan Newman method: Example 1

Betweenness of every edge = 1


Girvan Newman method: Example 1
Example 2
Example 2

5X5=25
Example 2

5X6=30 5X6=30
Example 2
IV. Balanced Communities
• Community detection can be thought of graph clustering
• Graph clustering : we cut the graph into several partitions and
assume these partitions represent communities
• Cut: partitioning (cut) of the graph into two (or more) sets
(cutsets)
– The size of the cut is the number of edges that are being cut
• Minimum cut (min-cut) problem : Find a graph partition such
that the number of edges between the two sets is minimized
Min-cut
Min-cuts can be computed
efficiently using the max-flow
min-cut theorem

Min-cut often returns an imbalanced partition, with one set being a singleton
Ratio Cut and Normalized Cut

• To mitigate the min-cut problem we can change the


objective function to consider community size

• is the complement cut set


• is the size of the cut

Ratio Cut & Normalized Cut : Example

For the graph given above, find the Ratio cut and Normalized cut for the cut A and cut B
Ratio Cut & Normalized Cut : Example

A
For Cut A

For Cut B
Ratio Cut & Normalized Cut
Ratio Cut & Normalized Cut
Spectral Clustering

Ratio-cut and normalized-cut are NP-hard problems


Spectral clustering is a way to solve relaxed versions (approximation
algorithms) of these problems:

1. The smallest non-null eigenvectors of the unnormalized Laplacian


approximates the Ratio Cut
and
2. The smallest non-null eigenvectors of the random-walk Laplacian
approximates the Normalized-cut (Ncut)

Reference: A Tutorial on Spectral Clustering by Eric Xing, M.


Hein & U.V. Luxburg
Spectral Clustering

• Laplacian matrix, sometimes called


admittance matrix, Kirchhoff matrix or
discrete Laplacian,
– is a matrix representation of a graph
– can be used to find many useful properties of a
graph
• can be used to calculate the number of spanning
trees for a given graph
• Sparsest cut of a graph can be approximated
Spectral Clustering

Degree Matrix:
Given a graph G = ( V , E ) with | V | = n , the degree matrix
D for G is a n × n diagonal matrix defined as given below
The elements of D are
Spectral Clustering
Laplacian matrix :
Given a simple graph G with n vertices, its Laplacian matrix L n
× n is defined as:
L=D−A
where D is the degree matrix and A is the adjacency matrix of
the graph. Since G is a simple graph, A only contains 1s or 0s
and its diagonal elements are all 0s.
The elements of L are given by

• where deg(vi) is the degree of the vertex i


Spectral Clustering
Labeled Graph Degree Matrix(D) Adjacency matrix (A)

Laplacian Matrix , L , (D-A) =


Spectral Clustering

Input: Laplacian L and the number k of clusters to


compute.
Output: Cluster C1,…,Ck.
Steps:
1 Compute the first k eigenvectors of the Laplacian
2 Cluster the column yj ; j = 1; : : : ; n into k clusters using
the K-means algorithm

Note: Y is a matrix of Eigen Vectors of size n x k


Spectral Clustering: Example

Two
communities:

𝒌-means, 𝒌=𝟐
2 Eigenvectors
i.e., we want

𝒌
2 communities
Spectral Clustering – Example (cont’d)

Applying k-Means to the second Eigen Vector {-0.46, -0.46, -0.26, 1.16 * 10-16,
1.16 * 10-16 ,0.13,0.13,0.33, 0.59 }
• Given: {-0.46, -0.46, -0.26, 1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
k=2
• Randomly assign means: m1= -0.46, m2=0.13
Iteration 1
• K1={-0.46, -0.46, -0.26,}, K2={1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
Calculating mean of K1 and K2 results in
m1= -0.39, m2=0.197

Iteration 2
• K1={-0.46, -0.46, -0.26,}, K2={1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
Calculating mean of K1 and K2 results in
m1= -0.39, m2=0.197

• Stop iterating as the clusters with these m

eans stay the same


V. Modular Communities
• Modularity is one measure of the structure of networks or graphs.
• Consider an undirected graph , where the degrees are known
beforehand however edges are not
– Consider two vertices 𝑖 and 𝑗 with degrees 𝑖 and 𝑗
What is an expected number of edges between 𝑖 and 𝑗?
• For any edge going out of 𝑖 randomly, the probability of this edge getting
connected to vertex 𝑗 is where m = |E|

• Because the degree for 𝑖 is 𝑖 , we have 𝑖 number of such edges;


• Hence, the expected number of edges between 𝑖 and 𝑗 is

• So given a degree distribution, the expected number of edges between any pair
of vertices can be computed.
Modularity and Modularity Maximization

• Given a degree distribution, we know the expected number of


edges between any pairs of vertices
• Modularity is designed to measure the strength of division of a
network into modules (also called groups, clusters or
communities).
• Networks with high modularity have dense connections
between the nodes within modules but sparse connections
between nodes in different modules.
• We assume that real-world networks should be far from
random. Therefore, the more distant they are from the
randomly generated network, the more structural they are.
• Modularity defines this distance and modularity maximization
tries to maximize this distance
Normalized Modularity
Difference between the actual number of edges between node i and j and the
expected number of edges between them is

Consider a partitioning of the graph G into k partitions 𝑃 = (𝑃1, 𝑃2, 𝑃3, … , 𝑃𝑘)

For partition , this distance can be


defined as, [Summing over all node pairs ]

This distance can be generalized for


a partitioning

The normalized version of this


distance is defined as Modularity
Modularity and Modularity Maximization
Modular Communities

Modularity matrix , B, is defined as

× is the degree vector for all nodes

To detect ‘k’ clusters/communities:


1) Compute the Eigen values of B
2) Compute top k Eigenvectors of B corresponding
to the largest positive Eigen values
3) Run k-means on the k Eigenvectors to detect ‘k’ communities
Modularity Example
To detect ‘k’ clusters/communities:
1) Compute the Eigen values of B
2) Compute top k Eigenvectors of B corresponding
to the largest positive eigenvalues.
3) Run k-means on the k Eigenvectors to detect ‘k’ communities

d1=d2=1, d3=d4=d5=2, Bij = Aij - didj/2m


All the rows and columns of B sum to 0
Modularity Example
• Eigen Values are

• Eigen Vector wrt Eigen Value 10 is

• Applying k-means (k=2) on the above vector results in


nodes {1,2} and nodes {3,4,5} as two communities.
Modularity Maximization: Example

Two
Communities:
{1, 2, 3, 4}
and
{5, 6, 7, 8, 9}

-means

2 eigenvectors

Modularity Matrix
Community Detection Algorithms - Summary

You might also like