Conference Paper · August 2020


2020 International Conference on Computer Science and Software Engineering (CSASE), Duhok, Kurdistan Region – Iraq

Improving Louvain Algorithm by Leveraging Cliques for Community Detection

Elaf Adel Abbas
College of Information Technology, University of Babylon, Babylon, Iraq
Ministry of Education
elaf1982adil@gmail.com

Huda Naji Nawaf
College of Information Technology, University of Babylon, Babylon, Iraq
halmamory@itnet.uobabylon.edu.iq

Abstract—Community detection is one of the most important tools for understanding and analyzing the structure of social networks. It identifies closely related groups in terms of social relations or common interests, and it can be applied in social media, web clients, or e-commerce. The traditional Louvain algorithm is a suitable choice for this purpose, since it provides fast, efficient, and robust community detection on large static networks. However, the computational cost of this algorithm is the motivation of this work. In the proposed method, the existing cliques, together with the nodes that are not included in any clique, are taken as the initial communities, instead of treating each node in the network as a community as in the traditional method; then the gain of merging neighboring communities is calculated. A specific research methodology is followed to ensure that the work is rigorous in achieving its aim. The traditional and improved algorithms were applied to synthetic and real-world data, and the results were recorded and analyzed. Experimentally, the results show that the execution time is reduced compared with the traditional algorithm, while the quality of the partitions is largely preserved.

Keywords—social network, community detection algorithms, Louvain algorithm.

978-1-7281-5249-3/20$31.00 ©2020 IEEE

I. INTRODUCTION

Complex networks are sets of many connected nodes that communicate in various ways. They also describe the interactions and connections among members of real networked systems such as social, biological, and technological networks [1]. In a graph model of a social network, each node represents an entity and each edge represents the communication or interaction between two entities. In general, most of these networks are sparse globally and dense locally. As such, a network such as a social network is characterized by a community structure, in which nodes within a community have a higher density of edges while nodes in different communities have a lower density of edges [2].

Defining communities is an important step toward discovering what makes entities come together and toward a comprehensive understanding of the structural and functional characteristics of a large network [3]. There are several types of community detection algorithms: splitting algorithms, which detect a community and remove the edges that connect it to the rest of the network; agglomerative methods, which iteratively merge similar nodes or communities; and methods that maximize an objective function using optimization algorithms. Modularity is a popular quantitative measure suggested by Girvan and Newman to assess the quality of the partitions produced by these algorithms [4]. Essentially, the modularity measure can be stated as follows.

Given an unweighted graph $G = (V, E)$, where $V$ is a set of nodes and $E$ is a set of edges between nodes, let $u, v \in V$ and let $(u, v) \in E$ represent the relation between $u$ and $v$. The modularity of a clustering with $k$ communities ($C = \{C_1, C_2, \ldots, C_k\}$, with $C_i \neq \emptyset$ and $C_i \subset V$) is given by [3]:

$$Q(C) = \frac{1}{2m} \sum_{i \in V} \sum_{j \in V} \left( A_{ij} - \frac{d_i d_j}{2m} \right) \sigma(C_i, C_j) \qquad (1)$$

where $A_{ij}$ is the weight of the edge between $i$ and $j$, $d_i$ is the sum of the weights of the edges attached to vertex $i$, $C_i$ is the community to which vertex $i$ is assigned, and $m$ is calculated by:

$$m = \frac{1}{2} \sum_{i,j} A_{ij} \qquad (2)$$

The function $\sigma(C_i, C_j)$ returns 1 if $C_i$ and $C_j$ are the same community and 0 otherwise. Finding the maximum modularity value is an NP-hard problem [5]; however, many algorithms rely on heuristic strategies to optimize this metric.

Girvan and Newman (GN) proposed a hierarchical divisive algorithm in which the betweenness value of every edge of the graph is calculated, and then the edge(s) with the highest betweenness are removed. These steps are repeated until no edges remain; then the level of the hierarchy with the highest modularity is selected, with $O(m^3)$ time complexity [4]. Several methods have been suggested in the literature to optimize and speed up the GN algorithm.

The original GN algorithm was refined in [6]. The main idea of that work is to classify communities into two types: strong communities, in which the internal degree of each node is higher than its external degree, and weak communities, in which the total internal degree of the nodes exceeds their total external degree. It takes into account a local measure, the edge-clustering coefficient. At each iteration, the edge with the smallest clustering coefficient is removed, the measure is recalculated, and so on. In addition, the stopping criterion of the GN algorithm was changed to rely on the properties of the communities rather than on modularity. On the other hand, fast greedy modularity optimization has been suggested to improve the time complexity to $O(n \log n)$ [7]. This method optimizes the modularity measure of GN by starting with isolated nodes and iteratively merging pairs of nodes so as to increase the modularity at each step.

The Louvain algorithm is a method for extracting communities from large networks, created by Blondel et al. from the University of Louvain, with time complexity $O(m)$ [8]. It has gained wide use for modularity optimization as one of the most effective algorithms in terms of quality and speed. It uses a hierarchical, greedy strategy in two primary stages: (i) the local movement of the vertices; and (ii) grouping several vertices of the original graph into supervertices to reduce the size of the network at each level.

In the literature, some algorithms have been presented to improve the Louvain method. The Louvain+ algorithm [9] is a comprehensive multi-level method that attempts to improve the modularity by extending the Louvain algorithm with a coarsening phase. However, this algorithm is not superior to the traditional Louvain algorithm in terms of time complexity; rather, it increases the complexity.

A faster variant of the Louvain algorithm was proposed in [10] to detect communities in large graphs in a short time by moving a node to a random neighbor rather than to the best neighbor. The random neighbor selection makes the Louvain algorithm less greedy and more exploratory.

The authors in [11] proposed a hybrid algorithm based on the Louvain algorithm and the Label Propagation Algorithm (LPA). This algorithm divides the network into two sub-graphs: the first includes the key nodes, which have a high degree and are predetermined, and the second includes the edge nodes, that is, all other nodes. Afterward, the Louvain algorithm and LPA are used to detect the communities of the first and second sub-graphs, respectively. In this way, the authors aim to improve the running time on large networks using both algorithms.

The work presented in [12] addresses the problem of poorly connected communities, which its authors identify as a weakness of the Louvain algorithm when it is run iteratively. Consequently, the authors proposed a new algorithm called the Leiden algorithm, which validates the actual connections inside a community before proceeding. Their experimental results show that both the running time and the community structure are improved.

Although many studies have proposed improvements to the Louvain algorithm from different angles, as mentioned above, the proposed work is based on the clique principle to improve the effectiveness of the algorithm.

The paper is organized as follows. Section II presents the Louvain algorithm for community detection. Section III presents the development and a detailed description of the proposed algorithm. The experimental and evaluation results are presented and discussed in Section IV. Finally, the conclusion and future work are given in Section V.

II. LOUVAIN ALGORITHM

The Louvain algorithm is one of the most effective algorithms developed to divide a network into communities based on modularity optimization. Initially, the Louvain algorithm considers each vertex as a community; see Figs. 1a and 1b. The algorithm involves two main steps:

The local movement of the vertices. A vertex is chosen at random, and the modularity gain $\Delta Q$ of moving this vertex to each neighboring community is examined. The vertex is then placed into the neighboring community that achieves the largest modularity gain; if no modularity gain is positive, the vertex stays in its original community. This step is repeated until all vertices have been examined and no further modularity gain is possible for any vertex, so that small communities are formed. The modularity gain $\Delta Q$ is defined as follows [8]:

$$\Delta Q = e_{i,C} - \frac{K_i \Sigma_{tot}}{2m} \qquad (3)$$

where $e_{i,C}$ is the sum of the weights of the edges between vertex $i$ and community $C$, $K_i$ is the sum of the weights of the edges of vertex $i$, and $\Sigma_{tot}$ is the sum of the weights of the edges incident to the vertices in community $C$. Fig. 1c shows the local movement step.

Grouping several vertices into supervertices. The vertices belonging to the same community are aggregated into a single vertex, called a supervertex, as illustrated in Fig. 1d. The meta-graph is reconstructed by computing the weights of the edges among all supervertices. When vertices in different partitions of the graph are joined by at least one edge, the corresponding supervertices must be connected to each other. The weight of such an edge is the sum of the weights of all edges between the vertices of the two partitions at the lower level of the graph. Each supervertex also carries a self-loop whose weight is the sum of the weights of the edges connecting vertices within the same partition at the lower level.

These two steps of the Louvain algorithm are repeated until the modularity value no longer improves, and hierarchical communities are generated as shown in Figs. 1e and 1f. The example in the figure illustrates the Louvain algorithm [3] for only two iterations.
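The modularity of (1) and (2) can be evaluated directly from an adjacency structure. The following is a minimal Python sketch, not the implementation used in the experiments. The function names `modularity` and `from_edges`, the dict-of-dicts graph layout, and the toy graph (two triangles joined by one edge) are assumptions made only for illustration, and the graph is assumed to have no self-loops.

```python
def modularity(adj, community):
    """Modularity Q of Eq. (1) for an undirected graph without self-loops.

    adj maps each node to a dict {neighbor: weight} and must be symmetric;
    community maps each node to a community label.
    """
    degree = {i: sum(nbrs.values()) for i, nbrs in adj.items()}  # d_i
    two_m = sum(degree.values())                                 # 2m, from Eq. (2)
    q = 0.0
    for i in adj:
        for j in adj:
            if community[i] == community[j]:                     # sigma(C_i, C_j) = 1
                a_ij = adj[i].get(j, 0.0)
                q += a_ij - degree[i] * degree[j] / two_m
    return q / two_m


def from_edges(edges):
    """Build a symmetric, unit-weight adjacency dict from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, {})[v] = 1.0
        adj.setdefault(v, {})[u] = 1.0
    return adj


# Toy graph (illustrative only): two triangles joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = from_edges(edges)

two_triangles = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
singletons = {v: v for v in adj}

q_triangles = modularity(adj, two_triangles)
q_single = modularity(adj, singletons)
print(q_triangles, q_single)
```

For this seven-edge graph, the two-triangle partition gives Q = 5/14, whereas the all-singletons partition used as the Louvain starting point gives a negative Q. The double loop mirrors (1) term by term and is quadratic in the number of nodes, so it is only suitable for small examples.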

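As a consistency check that is not part of the original derivation, the gain in (3) can be related to the modularity in (1). Assume that vertex $i$ currently forms its own community, has no self-loop, and is moved into a community $C$ that does not contain it, and note that $K_i$ in (3) is the degree $d_i$ of (1). The only pairs whose $\sigma$ changes from 0 to 1 are $(i, j)$ and $(j, i)$ with $j \in C$, so:

$$Q_{\text{after}} - Q_{\text{before}} = \frac{1}{2m} \cdot 2 \sum_{j \in C} \left( A_{ij} - \frac{K_i d_j}{2m} \right) = \frac{1}{m} \left( e_{i,C} - \frac{K_i \Sigma_{tot}}{2m} \right)$$

Under these assumptions, the quantity in (3) is $m$ times the actual change in $Q$. Since $m$ is the same for every candidate community, ranking the neighboring communities by (3) selects the same community as ranking them by the change in (1).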

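The local movement step of Section II can be sketched in a few lines of Python using the gain of (3). This is an illustrative sketch under stated assumptions, not the code used in the experiments: the name `local_move` and the dict-of-dicts graph layout are hypothetical, the graph is assumed to have at least one edge and no self-loops, and vertices are visited in sorted order instead of random order so that the run is reproducible.

```python
def local_move(adj, community=None):
    """One local movement phase: move vertices until no gain remains.

    adj maps each node to {neighbor: weight} (symmetric, no self-loops).
    Returns a dict node -> community label.
    """
    degree = {v: sum(nbrs.values()) for v, nbrs in adj.items()}     # K_i
    two_m = float(sum(degree.values()))                              # 2m
    community = {v: v for v in adj} if community is None else dict(community)

    sigma_tot = {}                                                   # Sigma_tot per community
    for v, c in community.items():
        sigma_tot[c] = sigma_tot.get(c, 0.0) + degree[v]

    moved = True
    while moved:
        moved = False
        for v in sorted(adj):              # fixed order (assumption); the paper picks at random
            old = community[v]
            sigma_tot[old] -= degree[v]    # take v out of its community first

            e = {}                         # e_{v,C}: weight from v to each neighboring community
            for u, w in adj[v].items():
                e[community[u]] = e.get(community[u], 0.0) + w

            def gain(c):
                # Eq. (3): e_{v,C} - K_v * Sigma_tot / (2m)
                return e.get(c, 0.0) - degree[v] * sigma_tot[c] / two_m

            best, best_gain = old, gain(old)   # staying put is the baseline
            for c in e:
                g = gain(c)
                if g > best_gain:
                    best, best_gain = c, g

            sigma_tot[best] += degree[v]
            if best != old:
                community[v] = best
                moved = True
    return community


# Toy graph (illustrative only): two triangles joined by the edge 2-3.
adj = {
    0: {1: 1.0, 2: 1.0},
    1: {0: 1.0, 2: 1.0},
    2: {0: 1.0, 1: 1.0, 3: 1.0},
    3: {2: 1.0, 4: 1.0, 5: 1.0},
    4: {3: 1.0, 5: 1.0},
    5: {3: 1.0, 4: 1.0},
}
result = local_move(adj)
print(result)
```

On this toy graph the phase ends with the two triangles as two communities. A vertex is compared against the gain of rejoining its own community, so it only moves when another community is strictly better.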

Fig. 1. Louvain Algorithm Steps

III. PROPOSED ALGORITHM (IMPROVED LOUVAIN ALGORITHM)

The traditional Louvain algorithm forms the initial communities by making each individual node in the network a community; the modularity gain is then the criterion for moving nodes between communities. This operation is time-consuming, which motivates this work. The clique-Louvain algorithm is proposed to improve the Louvain algorithm in terms of time complexity. Essentially, the traditional algorithm is linear in the number of edges, whereas the improved algorithm is linear in the number of cliques.

The key point of the proposed algorithm is to look for the cliques that are inherently present in the network and use them to form the communities. Consider the undirected graph $G = (V, E)$, where $V$ denotes a set of vertices and $E$ denotes a set of edges between vertices. The graph $G_s = (V_s, E_s)$ is a sub-graph of $G$ if and only if $V_s \subseteq V$ and $E_s \subseteq E$, where every edge $(V_i, V_j) \in E_s$ has both end vertices $V_i, V_j \in V_s$. A clique is a sub-graph in which every pair of vertices is connected by an edge.

Figure 2a illustrates a simple network containing cohesive structures of different sizes, as an example. The first step of the clique-Louvain algorithm is to find the cliques that have at least three members; see Fig. 2b. Each clique, and each node that is not a member of a clique, is then a community. In practice, the members of a single clique are likely to belong to one community. Under these conditions, the algorithm saves some of the time needed to compute the modularity gain between communities in the first iteration. Computing the modularity gain is the second step of the algorithm, followed by moving the vertices or cliques.

Given the initial communities (cliques or nodes), one community is chosen at random and the modularity gain of merging it with each neighbor (clique or node) is computed. The chosen community is then moved into the neighbor that achieves the maximum modularity gain, as shown in Fig. 2c.

Finally, the original process of the traditional algorithm is resumed: each resulting community is represented as one node with a self-loop. The self-loop represents the internal connections of the nodes in the same community, and the edges represent the external connections with nodes from other communities, as illustrated in Fig. 2d. In practice, the algorithm meets the stopping criterion after one iteration, which reduces the execution time compared with the traditional algorithm. Algorithm I gives the pseudocode of the proposed algorithm.

Fig. 2. Proposed Algorithm Steps

Algorithm I: clique-Louvain Algorithm

Function clique_Louvain(Graph G)
    Index_Clique ← findclique(G)
    currentGraph ← Aggregation(G, Index_Clique)
    // Index_Com: the community index of each clique or node of the original graph G;
    // each clique includes at least 3 nodes
    Initialize Index_Com so that each clique is its own community and each vertex not in a clique is its own community
    Modularity ← -∞
    while Modularity < Q(currentGraph, Index_Com)
        Modularity ← Q(currentGraph, Index_Com)
        Index_Com ← MoveNodes(currentGraph)
        currentGraph ← Aggregation(currentGraph, Index_Com)
        Index_Com ← put each vertex of currentGraph in its own community
    End while
    return currentGraph
End Function

Function findclique(Graph G)
    // Index_Clique: the community index of each clique of G
    Index_Clique ← find the cliques in G that contain at least 3 nodes
    return Index_Clique
End Function

Function MoveNodes(Graph G)
    // Index_Com: the community index of each vertex of G
    while one or more nodes are moved do
        for each v ∈ V(G), in random order, do
            bestGain ← 0
            bestCommunity ← community_of_v
            for all neighboring nodes n of v do
                gainModularity ← ΔQ of moving v into community_of_n
                if bestGain < gainModularity then
                    bestGain ← gainModularity
                    bestCommunity ← community_of_n
                End if
            End for
            Index_Com ← place v in bestCommunity
        End for
    End while
    return Index_Com
End Function

Function Aggregation(Graph G, partition Index_Com)
    currentGraph ← aggregate the nodes of G that are in the same community according to Index_Com
    return currentGraph
End Function

IV. EXPERIMENTAL RESULTS

The experimental outcomes depend strongly on the choice of experimental tools and on the setting of the evaluation parameters. Modularity is used to measure the quality of the detected communities, and computation time is used to validate the performance of the improved Louvain algorithm.

In this work, two types of datasets were chosen: synthetic networks such as LFR1, LFR2, and LFR3, and real networks such as DIGG and Facebook. The algorithm works on any dataset in which relations exist among the nodes. For evaluation purposes, the performance of the proposed algorithm is compared with that of the original Louvain algorithm. Both algorithms were implemented in Python and executed on a PC equipped with an Intel Core i7-2630 at 2.00 GHz and 4.00 GB of RAM.

A. Synthetic Networks

LFR is a synthetic benchmark network [13] with many adjustable parameters that allow graphs with varying degrees of community structure to be generated quickly. In this work, three scenarios with different configurations were applied. In the first scenario, the LFR1 network consists of 250 nodes, with average node degree 5, node power-law distribution exponent 3, community-scale power-law index 1.5, and community structure definition equal to 0.1. In the second scenario, the LFR2 and LFR3 networks consist of 1000 and 10000 nodes, respectively, with the same parameters: average node degree 8, node power-law distribution exponent 2, community-scale power-law index 2, and community structure definition equal to 0.56.

TABLE I. REAL DATASET INFORMATION

Dataset          Dolphin   Facebook   Digg
Vertices         62        21099      193808
Edges            159       55883      886335
Average degree   5.1290    5.2972     9.1465

B. Real Networks

To validate the proposed algorithm, more realistic network environments were used: the Dolphin Social Network [14], DIGG, and Facebook [15]. The Dolphin Social Network is a social network of bottlenose dolphins, where the nodes are the dolphins and an edge indicates a frequent association. The dolphins were observed between 1994 and 2001. DIGG is a friendship network among bloggers; each node in the network is a user and each directed edge denotes that a user replied to another user. The dataset was collected from 2005 to 2009. Finally, the Facebook dataset is an undirected network containing friendship data among users: a node is a user and an edge is a friendship between two users. The data were recorded for the period from 2007 to 2008. Table I summarizes the three datasets.

C. Time Computation Measurement

The Louvain and clique-Louvain algorithms were run on the synthetic and real networks to compare them in terms of execution time and modularity.

Fig. 3 shows the execution time of the algorithms over the six datasets, where the vertical axis is the execution time and the horizontal axis is the dataset. As illustrated, the clique-Louvain algorithm performs slightly better than the traditional algorithm on the synthetic networks, whereas it performs considerably better on the real networks. Generally, the execution time increases as the size of the dataset increases. However, this may not hold for the synthetic networks, which need more running time than the real networks despite their small size. The difference in execution time between the original and improved algorithms becomes clearer as the size of the network increases.

D. Modularity Measurement

The modularity measure is an objective function for evaluating the quality of communities, so modularity, as defined in (1), is the key criterion for measuring the quality of the Louvain algorithm. Fig. 4 shows the modularity of the Louvain and clique-Louvain algorithms on the various datasets. The results are relatively close to each other, and the small difference can be overlooked in view of the improvement in time complexity provided by the proposed algorithm.
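The first two steps of Algorithm I (findclique and Aggregation) can be sketched in Python as follows. This is a hedged illustration, not the authors' implementation. The paper does not say how cliques are enumerated or how overlapping cliques are resolved, so the basic Bron-Kerbosch enumeration, the largest-first greedy selection of disjoint cliques, and the names `maximal_cliques`, `clique_seeds`, and `aggregate` are all assumptions. Edges are taken as unweighted and node labels as comparable values such as integers.

```python
def maximal_cliques(nbrs):
    """Yield every maximal clique (basic Bron-Kerbosch, no pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield r
            return
        for v in list(p):
            yield from expand(r | {v}, p & nbrs[v], x & nbrs[v])
            p = p - {v}
            x = x | {v}
    yield from expand(set(), set(nbrs), set())


def clique_seeds(nbrs, min_size=3):
    """Initial communities: disjoint cliques of >= min_size nodes, then singletons."""
    # Largest cliques first; ties broken by node labels for reproducibility (assumption).
    cliques = sorted(maximal_cliques(nbrs), key=lambda c: (-len(c), sorted(c)))
    community, next_label = {}, 0
    for clique in cliques:
        # Nodes not yet assigned; any subset of a clique is still a clique.
        rest = {v for v in clique if v not in community}
        if len(rest) >= min_size:
            for v in rest:
                community[v] = next_label
            next_label += 1
    for v in sorted(nbrs):                 # every remaining node is its own community
        if v not in community:
            community[v] = next_label
            next_label += 1
    return community


def aggregate(nbrs, community):
    """Supervertex graph as {(c1, c2): weight}; key (c, c) is the self-loop of c."""
    weights = {}
    for u in nbrs:
        for v in nbrs[u]:
            if u < v:                      # visit each undirected edge once
                cu, cv = sorted((community[u], community[v]))
                weights[(cu, cv)] = weights.get((cu, cv), 0) + 1
    return weights


# Toy graph (illustrative only): two triangles joined by 2-3, plus a pendant node 6.
nbrs = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4, 6}, 6: {5},
}
seeds = clique_seeds(nbrs)
supergraph = aggregate(nbrs, seeds)
print(seeds)
print(supergraph)
```

In this example the seven nodes collapse to three supervertices (two triangles and one leftover node) before any modularity gain is computed, which is the source of the saving described in Section III. Enumerating all maximal cliques can be expensive on dense graphs, so a practical implementation would likely bound or approximate this step.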


Fig. 3. Execution Time of clique-Louvain and Louvain Algorithms

Fig. 4. Modularity Measure of clique-Louvain and Louvain

More precisely, the modularity of the clique-Louvain algorithm is identical to, if not better than, that of the traditional algorithm on LFR1, LFR2, LFR3, and DIGG. For the Facebook and Dolphin datasets, the modularity is slightly lower than that of the traditional algorithm, which may be due to the structure of the datasets. Indeed, the Facebook dataset is characterized by a large number of overlapping cliques, which may affect the modularity.

Modularity and time complexity are both very important measures for assessing the performance of an algorithm; however, the trade-off between complexity and quality may depend on the application. In fact, the two measures may be somewhat inversely related, since the traditional Louvain algorithm is a greedy algorithm that approximates the optimal modularity of the graph.

V. CONCLUSION

One of the most important tasks in analyzing complex networks is community detection. Detecting communities in social networks is far from easy: algorithms have to be fast and, at the same time, provide high-quality results. The proposed clique-Louvain algorithm highlights some findings. In this work, we improve the Louvain algorithm in terms of time complexity. The clique-Louvain algorithm first finds the cliques that have at least three members. Each clique, and each node that is not a member of a clique, is a community, since in practice the members of a single clique are likely to belong to one community. Therefore, it can be concluded that treating cliques as the cores of communities improves the performance of the algorithm while largely preserving the quality.

Generally speaking, the modularity of the proposed algorithm is almost identical to that of the traditional algorithm on synthetic networks, and it differs slightly when both algorithms are applied to real networks. The reason can be attributed to the structures of the networks. Overall, the clique-Louvain algorithm reduces the execution time relative to the original Louvain algorithm by 18.47% on the Facebook dataset and by 14.94% on the DIGG dataset. Applying the proposed algorithm to dynamic networks is a possible direction for future work.

REFERENCES

[1] P. Kumar, S. Gupta, and B. Bhasker, “An upper approximation based community detection algorithm for complex networks,” Decis. Support Syst., vol. 96, pp. 103–118, 2017.
[2] P. Bedi and C. Sharma, “Community detection in social networks,” Wiley Interdiscip. Rev. Data Min. Knowl. Discov., vol. 6, no. 3, pp. 115–135, 2016.
[3] N. Ozaki, H. Tezuka, and M. Inaba, “A simple acceleration method for the Louvain algorithm,” Int. J. Comput. Electr. Eng., vol. 8, no. 3, p. 207, 2016.
[4] M. E. J. Newman and M. Girvan, “Finding and evaluating community structure in networks,” Phys. Rev. E, vol. 69, no. 2, p. 26113, 2004.
[5] J. Zhang, H. Liu, Z. Wen, and S. Zhang, “A sparse completely positive relaxation of the modularity maximization for community detection,” SIAM J. Sci. Comput., vol. 40, no. 5, pp. A3091–A3120, 2018.
[6] F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, and D. Parisi, “Defining and identifying communities in networks,” Proc. Natl. Acad. Sci., vol. 101, no. 9, pp. 2658–2663, 2004.
[7] A. Clauset, M. E. J. Newman, and C. Moore, “Finding community structure in very large networks,” Phys. Rev. E, vol. 70, no. 6, p. 66111, 2004.
[8] V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre, “Fast unfolding of communities in large networks,” J. Stat. Mech. Theory Exp., vol. 2008, no. 10, p. P10008, 2008.
[9] O. Gach and J.-K. Hao, “Improving the Louvain algorithm for community detection with modularity maximization,” in International Conference on Artificial Evolution (Evolution Artificielle), 2013, pp. 145–156.
[10] V. A. Traag, “Faster unfolding of communities: Speeding up the Louvain algorithm,” Phys. Rev. E, vol. 92, no. 3, p. 32801, 2015.
[11] B. Hu, W. Li, X. Huo, Y. Liang, M. Gao, and P. Pei, “Improving Louvain algorithm for community detection,” in 2016 International Conference on Artificial Intelligence and Engineering Applications, 2016.
[12] V. A. Traag, L. Waltman, and N. J. van Eck, “From Louvain to Leiden: guaranteeing well-connected communities,” Sci. Rep., vol. 9, no. 1, pp. 1–12, 2019.
[13] M. Coscia, “Discovering communities of community discovery,” in Proceedings of the 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, 2019, pp. 1–8.
[14] S. Bilal and M. Abdelouahab, “Evolutionary algorithm and modularity for detecting communities in networks,” Phys. A Stat. Mech. its Appl., vol. 473, pp. 89–96, 2017.
[15] M. Weskida and R. Michalski, “Finding influentials in social networks using evolutionary algorithm,” J. Comput. Sci., vol. 31, pp. 77–85, 2019.
