Professional Documents
Culture Documents
[real-world] community
• Formation:
– When like-minded users on social media form a link and start interacting with each other,
a community is formed.
• More Formal Formation:
1. A set of at least two nodes sharing some interest and
2. Interactions with respect to that interest.
• Social Media Communities
– Explicit (emic): formed by user subscriptions (within the social group)
• Members understand that they are its members
• Non-members understand who the community members are
• Community members often have more interactions with each other than non-members
– Implicit (etic): implicitly formed by social interactions (out of the group)
• Individuals interact with others in the form of an unacknowledged community
• Example: Individuals calling Canada from the United States
– Phone operator considers them one community for promotional offers
Food webs
– Communities may identify compartments
Real-world Implicit Communities
Collaboration network
between scientists working
at the Santa Fe
Institute.
• Community detection
– Discovering implicit communities
• Community evolution
– Studying temporal evolution of communities
• Community evaluation
– Evaluating Detected Communities
Community Detection
Community Detection
Zachary's karate
club
Interactions between 34
members of a karate club
for over two years
• The club members split into two groups (gray and white)
• Disagreement between the administrator of the club (node 34) and the club’s
instructor (node 1),
• The members of one group left to start their own club
Preserve Privacy
– [Sometimes] a community can
reveal some properties without
releasing the individuals’
privacy information.
Community Detection vs. Clustering
Clustering
• Data is often non-linked (matrix rows)
• Clustering works on the distance or similarity matrix, e.g., -
means.
• If we use -means with adjacency matrix rows, we are only
considering the ego-centric network.
Community detection
• Data is linked (a graph)
• Network data tends to be “discrete”, leading to algorithms using
the graph property directly
– -clique, quasi-clique, or edge-betweenness
Community Detection Algorithms
Systematic Pruning :
• When searching for cliques of size or larger
• If the clique is found, each node should have a degree equal to or
more than
• We can first prune all nodes (and edges connected to them) with
degrees less than
– More nodes will have degrees less than
– Prune them recursively
• For large , many nodes are pruned as social media networks
follow a power-law degree distribution [The fraction P(k) of nodes in
the network having k connections to other nodes goes for large
values of k as P ( k ) k− γ where γ is a parameter whose value is
typically in the range 2 < γ < 3 ]
Maximum Clique: Pruning
Maximal -plexes
k-plex Example
Example:
• Identify 1-plex, 2-plex
and 3-plex
More Cliques Relaxing…
• -core: a maximal
connected subgraph in
which all vertices have
degree at least
• -shell: nodes that are part
of the -core, but are not
part of the -core
Questions A Clique that wants to relax
• 0-core?
• 0-shell?
• 1-core?
• -cores of the complete A 2-plex or a 2-core?
graph
k-core, k-shell - Example
• Input
– A parameter , and a network
• Procedure
– Find out all cliques of size in the given network
– Construct a clique graph.
• Two cliques are adjacent if they share nodes
– Each connected components in the clique graph form a
community
Clique Percolation Method: Example with k=3
Clique Percolation Method: Example with k=3
Cliques of size 3:
, ,
, ,
, ,
,
Communities
(components in clique
graph):
,
,
Solution: Find communities that are in between cliques and connected components
in terms of connectivity and have small shortest paths between their nodes
Special Subgraphs
-Clique: a maximal subgraph in which the largest shortest path distance
between any nodes is less than or equal to
– Shortest path between any two nodes is always less than or equal to k.
– Nodes on the shortest path need not be part of the subgraph
-Club: follows the same definition as a -clique
– Additional Constraint : Nodes on the shortest paths should be part of the subgraph
-Clan: a -clique where for all shortest paths within the subgraph the distance
is equal or less than .
– All -clans are -cliques but not vice versa
k-Clans = k-Cliques ∩ k-Clubs
Special Subgraphs
Jaccard Similarity
Cosine similarity
Node Similarity (Structural Equivalence)
Group Properties:
I. Robust: -connected graphs
II. Dense: Quasi-cliques
III. Hierarchical: Hierarchical clustering
IV. Balanced: Spectral clustering
V. Modular: Modularity Maximization
I. Robust Communities
• The goal is to find subgraphs robust enough such that removing
some edges or vertices does not disconnect the subgraph
Local search
• Sample a subgraph, and find a maximal -dense quasi-clique
• A greedy approach is to expand a quasi-clique by all of its high-degree
neighbors until the density drops below
Heuristic pruning
• For a -dense quasi-clique of size k, we recursively remove nodes with
degree less than k and incident edges
• We can start from low-degree nodes and recursively remove all nodes with
degree less that k
Quasi-Cliques - Example
Heuristic pruning
• For a -dense quasi-clique of size k, we recursively remove nodes with
degree less than k and incident edges
• We can start from low-degree nodes and recursively remove all nodes with
degree less that k
• Example: Identify a quasi clique of size 4, is (7/10) .7
• {A,B,C,D} forms a quasi clique of size 4
Example
Graph Communities
Edge Betweenness: Example
EB of =0
EB of = 20 [ SP from {1,2,3,4}-{5,6,7,8,9} => 4 * 5 ]
EB of (5, 7) = 2 [ SP from 5-7=> 1, SP from 5-9 => 1 ]
EB of (5,8) = 1 [ SP from 5-8 => 1 ]
EB of = 10 [ SP from 6-7=>1, SP from 6-9 => 1 , SP from {1,2,3,4} - 7 =>4,
SP from {1,2,3,4} - 9 => 4 ]
EB of = 5 [ SP from 6-8=>1, SP from {1,2,3,4} - 8 =>4 ]
EB of = 2 [ SP from 7-8=>1, SP from 8-9 =>1 ]
EB of = 8 [ SP from {1,2,3,4,5,6,7,8} - 9 => 8*1 ]
Edge Betweenness Divisive Clustering : Example (cont’d)
Graph Components after removing e(7,9)
Betweenness(1-3) = 1X5=5
Betweenness(3-7)=betweenness(6-7)=betweenness(8-9) = betweenness(8-12)= 3X4=12
Girvan Newman method: Example 1
5X5=25
Example 2
5X6=30 5X6=30
Example 2
IV. Balanced Communities
• Community detection can be thought of graph clustering
• Graph clustering : we cut the graph into several partitions and
assume these partitions represent communities
• Cut: partitioning (cut) of the graph into two (or more) sets
(cutsets)
– The size of the cut is the number of edges that are being cut
• Minimum cut (min-cut) problem : Find a graph partition such
that the number of edges between the two sets is minimized
Min-cut
Min-cuts can be computed
efficiently using the max-flow
min-cut theorem
Min-cut often returns an imbalanced partition, with one set being a singleton
Ratio Cut and Normalized Cut
For the graph given above, find the Ratio cut and Normalized cut for the cut A and cut B
Ratio Cut & Normalized Cut : Example
A
For Cut A
For Cut B
Ratio Cut & Normalized Cut
Ratio Cut & Normalized Cut
Spectral Clustering
Degree Matrix:
Given a graph G = ( V , E ) with | V | = n , the degree matrix
D for G is a n × n diagonal matrix defined as given below
The elements of D are
Spectral Clustering
Laplacian matrix :
Given a simple graph G with n vertices, its Laplacian matrix L n
× n is defined as:
L=D−A
where D is the degree matrix and A is the adjacency matrix of
the graph. Since G is a simple graph, A only contains 1s or 0s
and its diagonal elements are all 0s.
The elements of L are given by
Two
communities:
𝒌-means, 𝒌=𝟐
2 Eigenvectors
i.e., we want
𝒌
2 communities
Spectral Clustering – Example (cont’d)
Applying k-Means to the second Eigen Vector {-0.46, -0.46, -0.26, 1.16 * 10-16,
1.16 * 10-16 ,0.13,0.13,0.33, 0.59 }
• Given: {-0.46, -0.46, -0.26, 1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
k=2
• Randomly assign means: m1= -0.46, m2=0.13
Iteration 1
• K1={-0.46, -0.46, -0.26,}, K2={1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
Calculating mean of K1 and K2 results in
m1= -0.39, m2=0.197
Iteration 2
• K1={-0.46, -0.46, -0.26,}, K2={1.16 * 10-16, 1.16 * 10-16 ,0.13,0.13,0.33, 0.59 },
Calculating mean of K1 and K2 results in
m1= -0.39, m2=0.197
• So given a degree distribution, the expected number of edges between any pair
of vertices can be computed.
Modularity and Modularity Maximization
Consider a partitioning of the graph G into k partitions 𝑃 = (𝑃1, 𝑃2, 𝑃3, … , 𝑃𝑘)
Two
Communities:
{1, 2, 3, 4}
and
{5, 6, 7, 8, 9}
-means
2 eigenvectors
Modularity Matrix
Community Detection Algorithms - Summary