Elaf Alhashemy
University of Babylon
All content following this page was uploaded by Elaf Alhashemy on 28 August 2020.
Abstract—Community detection is one of the most important fields that help us understand and analyze the structure of social networks. It is a tool for identifying closely related groups in terms of social relations or common interests, and it can be applied in social media, web clients, or e-commerce. For this purpose, the traditional Louvain algorithm is a suitable choice, since it provides fast, efficient, and robust community detection on large static networks. However, the high computational complexity of this algorithm is the motivation of this work. Initially, the existing cliques, and the other nodes that are not included in any clique, are considered as separate communities, instead of considering each node in the network as a community as in the traditional method; then the gain of merging neighboring communities is calculated. A specific research methodology is followed to ensure that the work is rigorous in achieving its aim. The traditional and improved algorithms were applied to synthetic and real-world data to record the results, which were then analyzed. Experimentally, the results show that the execution time is reduced compared with the traditional algorithm, while the quality of the partitions is largely preserved.

Keywords—social network, community detection algorithms, Louvain algorithm.

I. INTRODUCTION

Complex networks are sets of many connected nodes that communicate in various ways. They can also be described as the interactions and connections among the members of a real networked system, such as social, biological, and technological networks [1]. In a graph model of a social network, each node represents an entity and each edge represents the communication or interaction between two entities. In general, most of these networks are globally sparse and locally dense. Consequently, a network such as a social network is characterized by a community structure, in which nodes within a community have a higher density of edges, while nodes in different communities have a lower density of edges [2].

Defining communities is an important step toward discovering what makes entities come together and toward a comprehensive understanding of the structural and functional characteristics of a large network [3]. There are several types of community detection algorithms: divisive algorithms detect a community and remove the edges that connect it to the rest of the network, and agglomerative methods iteratively merge similar nodes/communities while maximizing an objective function with optimization algorithms. Modularity is a popular quantitative measure proposed by Girvan and Newman to evaluate the quality of the partitions produced by these algorithms [4]. The modularity measure can be stated as follows.

Given a graph G = (V, E), where V is a set of nodes and E is a set of edges between nodes, an edge (u, v) ∈ E with u, v ∈ V represents the relation between u and v. The modularity of a clustering with k communities, C = {C_1, C_2, ..., C_k} with C_i ≠ ∅ and C_i ⊆ V, is given by [3]:

$$Q(C) = \frac{1}{2m}\sum_{i \in V}\sum_{j \in V}\left(A_{ij} - \frac{d_i d_j}{2m}\right)\sigma(C_i, C_j) \qquad (1)$$

where A_ij is the weight of the edge between i and j (1 for an unweighted edge and 0 if there is no edge), d_i is the sum of the weights of the edges attached to vertex i, C_i is the community to which vertex i is assigned, and m is calculated by:

$$m = \frac{1}{2}\sum_{i \in V}\sum_{j \in V} A_{ij} \qquad (2)$$

while σ(C_i, C_j) is a function that returns 1 if C_i and C_j are the same community and 0 otherwise. Finding the partition with the maximum modularity is an NP-hard problem [5]; therefore, many algorithms rely on heuristic strategies to optimize this metric.

The Girvan and Newman (GN) algorithm is a hierarchical divisive algorithm: the betweenness value of every edge in the graph is calculated, and then the edge(s) with the highest betweenness are removed. These steps are repeated until no edges remain; then the level of the hierarchy whose communities have the highest modularity is selected, with O(m^3) time complexity [4]. Several methods in the literature have been suggested to optimize and speed up the GN algorithm.

The original GN algorithm was improved in [6]. The main idea of that work is to classify communities into two types: strong communities, in which the inner degree of each node is higher than its outer degree, and weak communities, in which the total internal degree of the nodes exceeds their total external degree. It also takes into account a local
Authorized licensed use limited to: Carleton University. Downloaded on July 26,2020 at 11:01:58 UTC from IEEE Xplore. Restrictions apply.
2020 International Conference on Computer Science and Software Engineering (CSASE), Duhok, Kurdistan Region – Iraq
measure that represents the edge-clustering coefficient. At each iteration, the edge with the smallest clustering coefficient is removed, the measure is recalculated, and so on. In addition, the stopping criterion of the GN algorithm was modified to rely on the properties of communities rather than on modularity. On the other hand, fast greedy modularity optimization has been suggested to improve the time complexity to O(n log n) [7]. This approach, which optimizes the GN algorithm, starts with isolated nodes and iteratively merges pairs of nodes so as to increase the modularity at each step.

The Louvain algorithm is a method for extracting communities from large networks, created by Blondel et al. at the University of Louvain, with time complexity O(m) [8]. It has become widely used for modularity optimization and is regarded as the most effective algorithm in terms of quality and speed. It is a hierarchical greedy method with two primary stages: (i) the local movement of vertices; and (ii) grouping several vertices of the original graph into supervertices to reduce the size of the network at each level.

In the literature, several algorithms have been presented to improve the Louvain method. The Louvain+ algorithm [9] is a comprehensive multi-level method that attempts to improve modularity quality by adding a coarsening phase to the Louvain algorithm. However, this algorithm is not superior to the traditional Louvain algorithm in terms of time complexity; rather, it increases the complexity.

A speed-up of the Louvain algorithm was proposed in [10] to detect communities in large graphs in a short time by moving a node to a random neighbor rather than to the best neighbor. However, random neighbor selection makes the Louvain algorithm less greedy and more exploratory.

The authors in [11] proposed a hybrid algorithm based on the Louvain algorithm and the Label Propagation Algorithm (LPA). This algorithm divides the network into two sub-graphs: the first contains predetermined key nodes with a high degree, and the second contains the edge nodes, i.e., the remaining nodes. The Louvain algorithm and LPA are then used to detect the communities of the first and second sub-graphs, respectively. That is, the authors aim to improve the running time on large networks by combining both algorithms.

The work presented in [12] overcomes the problem of poorly connected communities, which, in the authors' view, is a weakness of the Louvain algorithm when it is applied iteratively. Consequently, the authors proposed a new algorithm called the Leiden algorithm, which validates the actual connections inside each community before resuming the procedure. In practice, their experimental results showed improvements in both running time and community structure.

Although many studies have proposed improvements to the Louvain algorithm in different respects, as mentioned above, the proposed work is based on the clique principle to improve the efficiency of the algorithm.

The paper is organized as follows: Section II describes the Louvain algorithm for community detection. Section III presents the development and a detailed description of the proposed algorithm. The experimental and evaluation results are presented and discussed in Section IV. Finally, the conclusion and future work are highlighted in Section V.

II. LOUVAIN ALGORITHM

The Louvain algorithm is one of the most effective algorithms developed to detect communities through modularity optimization. Initially, the Louvain algorithm considers each vertex as a community, see Figs. 1a and 1b. The algorithm involves two main steps:

The local movement of the vertices. A vertex is chosen at random, and the modularity gain ΔQ of moving this vertex into each of its neighboring communities is examined. The vertex is then placed into the neighboring community that achieves the largest modularity gain; if no gain is positive, the vertex stays in its original community. This step is repeated until all vertices have been examined and no vertex yields a further modularity gain, so that small communities are formed. The modularity gain ΔQ is defined as follows [8]:

$$\Delta Q = e_{i,C} - \frac{K_i \, \Sigma_{tot}}{2m} \qquad (3)$$

where e_{i,C} is the sum of the weights of the edges between vertex i and community C, K_i is the sum of the weights of the edges of vertex i, and Σ_tot is the sum of the weights of the edges incident to vertices in community C. Expression (3) equals m times the exact change in Q from Eq. (1) when an isolated vertex i joins C; this constant factor does not affect which community is selected. Fig. 1c illustrates the local movement step.

Grouping several vertices into supervertices. The vertices belonging to the same community are aggregated into a single vertex, called a supervertex, as illustrated in Fig. 1d. The meta-graph is then reconstructed by computing the weights of the edges among all supervertices. When vertices in different partitions are joined by at least one edge, the corresponding supervertices must be connected, and the weight of that connection is the sum of the weights of all edges between the vertices of the two partitions at the lower level. The self-loop of each supervertex carries the sum of the weights of the edges linking vertices of the same partition at the lower level.

These two steps of the Louvain algorithm are repeated until the modularity value no longer improves, and hierarchical communities are generated as shown in Figs. 1e and 1f. The example in the figure illustrates the Louvain algorithm [3] for only two iterations.
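The two Louvain phases can be illustrated with a short pure-Python sketch. It is not the authors' code (the paper only states that its implementations were written in Python). Assumptions made here: the input is an undirected weighted edge list; the graph is a dict-of-dicts adjacency in which a supervertex's self-loop is stored as twice the internal edge weight, so that each row sum equals the degree d_i of Eq. (1); the gain used is e_{i,C} − K_i Σ_tot/(2m), i.e. m times the exact modularity change; a tolerance of 1e-12 ignores floating-point ties. All function names (build_adjacency, local_moving, aggregate, louvain) are hypothetical.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Symmetric weights A[u][v] from (u, v, w) triples; row sums are degrees, total is 2m."""
    A = defaultdict(lambda: defaultdict(float))
    for u, v, w in edges:
        A[u][v] += w
        A[v][u] += w
    return A


def modularity(A, comm):
    """Eq. (1) rewritten as sum over communities of in_C / 2m - (Sigma_tot_C / 2m) ** 2."""
    two_m = sum(w for row in A.values() for w in row.values())
    inside, tot = 0.0, defaultdict(float)
    for u, row in A.items():
        tot[comm[u]] += sum(row.values())
        inside += sum(w for v, w in row.items() if comm[v] == comm[u])
    return inside / two_m - sum(t * t for t in tot.values()) / two_m ** 2


def local_moving(A, rng):
    """Phase (i): move single vertices while the gain of Eq. (3) is positive."""
    two_m = sum(w for row in A.values() for w in row.values())
    deg = {u: sum(row.values()) for u, row in A.items()}
    comm = {u: u for u in A}          # every vertex starts in its own community
    tot = dict(deg)                   # Sigma_tot per community label
    moved = True
    while moved:
        moved = False
        order = list(A)
        rng.shuffle(order)            # visit vertices in random order
        for i in order:
            old, k_i = comm[i], deg[i]
            tot[old] -= k_i           # temporarily isolate vertex i
            e = defaultdict(float)    # e_{i,C} for each neighbouring community C
            for j, w in A[i].items():
                if j != i:            # a self-loop is not an edge to another vertex
                    e[comm[j]] += w
            # m * Delta Q of re-joining the old community (the "stay" option)
            best, best_gain = old, e.get(old, 0.0) - k_i * tot[old] / two_m
            for c, e_ic in e.items():
                gain = e_ic - k_i * tot[c] / two_m
                if gain > best_gain + 1e-12:   # move only on a strict improvement
                    best, best_gain = c, gain
            comm[i] = best
            tot[best] += k_i
            moved = moved or best != old
    return comm


def aggregate(A, comm):
    """Phase (ii): one supervertex per community; internal weight becomes a self-loop."""
    B = defaultdict(lambda: defaultdict(float))
    for u, row in A.items():
        for v, w in row.items():
            B[comm[u]][comm[v]] += w
    return B


def louvain(edges, seed=0):
    """Repeat phases (i) and (ii) until a level produces no merge."""
    rng = random.Random(seed)
    A = original = build_adjacency(edges)
    membership = {u: u for u in A}    # original vertex -> current supervertex
    while True:
        comm = local_moving(A, rng)
        if len(set(comm.values())) == len(A):   # nothing merged: stop
            break
        membership = {u: comm[s] for u, s in membership.items()}
        A = aggregate(A, comm)
    return membership, modularity(original, membership)


edges = [(0, 1, 1.0), (1, 2, 1.0), (0, 2, 1.0),   # triangle 0-1-2
         (3, 4, 1.0), (4, 5, 1.0), (3, 5, 1.0),   # triangle 3-4-5
         (2, 3, 1.0)]                              # bridge between them
membership, q = louvain(edges)
print(membership, round(q, 4))
```

On two triangles joined by one edge, the sketch returns the two triangles as communities with Q = 5/14 ≈ 0.3571; at the second level, merging the two supervertices has a negative gain, so the loop stops. Community labels depend on the random visiting order, which is why they are not fixed above.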
III. THE PROPOSED ALGORITHM (IMPROVED LOUVAIN ALGORITHM)

Essentially, the traditional Louvain algorithm forms the initial communities by making each individual node in the network a community; the modularity gain is then the criterion for moving communities into one another. This operation suffers from high time complexity, and alleviating this problem is the motivation of this work. The clique-Louvain algorithm is proposed to improve the Louvain algorithm in terms of time complexity: the traditional algorithm is linear in the number of edges, whereas the improved algorithm is linear in the number of cliques.

The key idea of the proposed algorithm is to look for the cliques that are inherently present in the network and to use them to form the communities. Given the undirected graph G = (V, E), where V denotes a set of vertices and E denotes a set of edges between vertices, a graph Gs = (Vs, Es) is a sub-graph of G if and only if Vs ⊆ V and Es ⊆ E, with both endpoints Vi, Vj of every edge (Vi, Vj) ∈ Es belonging to Vs. A clique is a sub-graph in which every pair of distinct vertices is joined by an edge.

Figure 2a shows, as an example, a simple network with a coherent structure and groups of different sizes. As the first step, the clique-Louvain algorithm finds the cliques that contain at least three members, see Fig. 2(b). Each clique, and each node that is not a member of any clique, is then treated as a community, because in practice the members of a single clique are likely to belong to the same community. Under these conditions, the algorithm saves computation time when computing the modularity gain between communities in the first iteration. The second step of the algorithm is computing the modularity gain and then moving the vertices or cliques.

Given the initial communities (cliques or nodes), one community is chosen at random, and the modularity gain of merging it with each neighbor (clique or node) is computed. The chosen community is then moved into the neighbor that achieves the maximum modularity gain, as explained in Fig. 2c.

Finally, the original process of the traditional algorithm is resumed, where each resultant community is represented as one node with a self-loop. The self-loop represents the internal connections of nodes in the same community, and edges represent the external connections with nodes from different communities, as illustrated in Fig. 2d. In practice, the algorithm meets the stopping criterion after one iteration, which reduces the execution time compared with the traditional algorithm. ALGORITHM I translates these steps into pseudocode.

Algorithm I: clique-Louvain Algorithm
1: Function clique_Louvain(Graph G)
2: Index_Clique ← findclique(G)
3: currentGraph ← Aggregation(G, Index_Clique)
4: Index_Com: the index of the community (clique or node) of each vertex of the original graph G; each clique contains at least 3 nodes
5: Initialize each clique as its own community, and each vertex that is not in a clique as its own community
6: Modularity ← -∞
7: while Modularity < Q(currentGraph, Index_Com) do
8: Modularity ← Q(currentGraph, Index_Com)
9: Index_Com ← MoveNodes(currentGraph)
10: currentGraph ← Aggregation(currentGraph, Index_Com)
11: Index_Com ← put each vertex of currentGraph in its own community
12: End while
13: return currentGraph
14: End Function
15: Function findclique(Graph G)
16: Index_Clique: the index of the community for each clique of G
17: Index_Clique ← find the cliques in G that contain at least 3 nodes
18: return Index_Clique
19: End Function
20: Function MoveNodes(Graph G)
21: Index_Com: the index of the community for each vertex of G
22: while one or more nodes are moved do
23: for each v ∈ V(G) in random order do
24: bestGain ← 0
25: bestCommunity ← community_of_v
26: for each neighboring node n of v do
27: gainModularity ← ∆Q of moving v into community_of_n
28: if bestGain < gainModularity then
29: bestGain ← gainModularity
30: bestCommunity ← community_of_n
31: End if
32: End for
33: place v in bestCommunity (update Index_Com)
34: End for
35: End while
36: return Index_Com
37: End Function
38: Function Aggregation(Graph G, partition Index_Com)
39: currentGraph ← aggregate the nodes of G that are in the same community according to Index_Com
40: return currentGraph
41: End Function

IV. EXPERIMENTAL RESULTS

The experimental outcomes depend strongly on the choice of experimental tools and on the setting of the evaluation parameters. Modularity is used to measure the quality of the detected communities, and computation time is used to evaluate the performance of the improved Louvain algorithm.

In this work, two types of datasets have been chosen: synthetic networks (LFR1, LFR2, and LFR3) and real networks (Dolphin, DIGG, and Facebook). Notably, the algorithm works on any dataset in which relations exist among nodes. For evaluation purposes, the performance of the proposed algorithm is compared with that of the original Louvain algorithm. Both algorithms have been programmed in Python and executed on a PC equipped with an Intel Core i7-2630 CPU at 2.00 GHz and 4.00 GB of RAM.

A. Synthetic Networks
LFR is a synthetic benchmark network [13] with many adjustable parameters that allow the quick generation of graphs with varying degrees of community structure. In this work, three scenarios with different configurations were applied. In the first scenario, the LFR1 network consists of 250 nodes, with average node degree 5, node-degree power-law exponent 3, community-size power-law exponent 1.5, and community-structure (mixing) parameter 0.1. In the second and third scenarios, the LFR2 and LFR3 networks consist of 1000 and 10000 nodes, respectively, with the same parameters: average node degree 8, node-degree power-law exponent 2, community-size power-law exponent 2, and community-structure (mixing) parameter 0.56.

TABLE I. REAL DATASET INFORMATION
Dataset          Dolphin    Facebook    Digg
Vertices         62         21099       193808
Edges            159        55883       886335
Average degree   5.1290     5.2972      9.1465

B. Real Networks
To validate the proposed algorithm in a more realistic network environment, real networks have been used: the Dolphin Social Network [14], DIGG, and Facebook [15]. The Dolphin Social Network is a social network of bottlenose dolphins, where the nodes are dolphins and an edge indicates a frequent association; the dolphins were observed between 1994 and 2001. DIGG is the friendship network among bloggers, where each node is a user and each directed edge denotes that a user replied to another user; the dataset was observed from 2005 to 2009. Finally, the Facebook dataset is an undirected network containing friendship data among users, where a node is a user and an edge is a friendship between two users; the information was recorded for the period from 2007 to 2008. TABLE I summarizes the three datasets.

C. Time Computation Measurement
The Louvain and clique-Louvain algorithms have been run on the synthetic and real networks to compare them in terms of execution time and modularity.

Fig. 3 shows the execution time of the algorithms on the six datasets, where the vertical axis is the execution time and the horizontal axis is the dataset. As illustrated, the clique-Louvain algorithm performs slightly better than the traditional algorithm on the synthetic networks, whereas it performs much better on the real networks. Generally, the execution time increases as the size of the dataset increases. However, this may differ for the synthetic networks, which need more running time than the real networks despite their smaller size. The difference in execution time between the original and improved algorithms becomes clearer as the size of the network increases.

D. Modularity Measurement
Modularity is an objective function for evaluating the quality of communities, so it is the main standard for measuring the quality of the Louvain algorithm, as explained in Eq. (1). Fig. 4 shows the modularity obtained by the Louvain and clique-Louvain algorithms on the various datasets. The results are relatively close to each other, and the small difference can be accepted in exchange for the improvement in time complexity provided by the proposed algorithm.
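The clique-based initialization of Section III (the findclique function and the initialization steps of Algorithm I) can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the paper does not say how cliques are enumerated or how a vertex shared by several cliques is assigned. Here we assume maximal cliques found with the Bron-Kerbosch algorithm (with pivoting), cliques processed largest first, and a shared vertex kept by the first clique that claims it. The function names (maximal_cliques, clique_seed_partition, aggregate) are hypothetical.

```python
from collections import defaultdict


def build_adjacency(edges):
    """Symmetric unit weights A[u][v] for an unweighted graph; row sums are degrees."""
    A = defaultdict(lambda: defaultdict(float))
    for u, v in edges:
        A[u][v] += 1.0
        A[v][u] += 1.0
    return A


def maximal_cliques(nbrs):
    """Bron-Kerbosch with pivoting; nbrs maps each vertex to its set of neighbours."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(r)           # r cannot be extended, so it is maximal
            return
        pivot = max(p | x, key=lambda u: len(nbrs[u] & p))
        for v in list(p - nbrs[pivot]):
            expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(nbrs), set())
    return found


def clique_seed_partition(A, min_size=3):
    """Initial communities: cliques with >= min_size vertices; other vertices alone.

    Assumption: cliques are processed largest first, and a vertex lying in
    several cliques stays in the first clique that claimed it.
    """
    nbrs = {u: set(row) - {u} for u, row in A.items()}
    cliques = [c for c in maximal_cliques(nbrs) if len(c) >= min_size]
    cliques.sort(key=lambda c: (-len(c), sorted(c)))
    comm, next_id = {}, 0
    for c in cliques:
        free = [u for u in sorted(c) if u not in comm]
        if free:
            for u in free:
                comm[u] = next_id
            next_id += 1
    for u in sorted(A):               # vertices outside every clique
        if u not in comm:
            comm[u] = next_id
            next_id += 1
    return comm


def aggregate(A, comm):
    """Collapse each community into one vertex; internal weight becomes a self-loop."""
    B = defaultdict(lambda: defaultdict(float))
    for u, row in A.items():
        for v, w in row.items():
            B[comm[u]][comm[v]] += w
    return B


def modularity(A, comm):
    """Eq. (1), computed from the community totals."""
    two_m = sum(w for row in A.values() for w in row.values())
    inside, tot = 0.0, defaultdict(float)
    for u, row in A.items():
        tot[comm[u]] += sum(row.values())
        inside += sum(w for v, w in row.items() if comm[v] == comm[u])
    return inside / two_m - sum(t * t for t in tot.values()) / two_m ** 2


edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3), (5, 6)]
A = build_adjacency(edges)
seed = clique_seed_partition(A)
G1 = aggregate(A, seed)       # graph on which the Louvain phases would resume
print(seed)                   # {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1, 6: 2}
print(modularity(A, seed))    # 0.3046875
```

In this small example (two triangles joined by an edge, plus a pendant vertex 6), the two triangles become seed communities and vertex 6 stays alone. The aggregated graph G1 has the same modularity as the seed partition, so the local-movement and aggregation phases can continue from it. Bron-Kerbosch is used only because it is a standard, compact way to list maximal cliques; its worst-case running time is exponential, so a practical implementation may need to bound the clique search.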