Professional Documents
Culture Documents
Recognition
Htwe Nu Win Khin Thidar Lynn
University of Computer Studies, Mandalay Faculty of Information Science
Myanmar University of Computer Studies, Mandalay
htwenuwin99@gmail.com Myanmar
lynnthidar@gmail.com
Abstract² Communities among users play the popular role for This paper explores the use of neighborhood overlapping
days of Social Network and the presence of groups of nodes that by using vertex similarity method for detecting outlier and
are high tightly connected with each other than with less links significant community. The heart of this approach is to
connected to nodes of different groups. So community detection represent the underlying dataset as an undirected graph, where
algorithms are come to be the key to detect the user who are a user refers to each node and friendship between two users
interact with each other in social media. However, there are still
represents each edge. Before we measure the similarity among
challenges in considering of some nodes have no any common
node within the same group as well as some nodes have no any neighborhood overlap, finding seed node by using the degree
link to the other node. It can be used similarity measure based on centrality is necessary which is designed to find nodes that are
neighborhood overlapping of nodes to organize communities and PRVW ³FHQWUDO´ WR WKH JUDSK :H operate similarity from the
to identify outliers which cannot be grouped into any of the most centrality node and its neighborhood nodes. The values of
communities. In this paper, we detect communities and outliers zero similarity are then used to identify as outliers. To
from Edge Structure with neighborhood overlap by using nodes illustrate, consider graph of Figure 1, which consists of 23
similarity. The result implies the best quality with modularity nodes. Upon applying the detecting outliers method, the
measurement which leads to more accurate communities as well communities which rounded with red circles are group
as improved their density after removing outliers in the network
together, some nodes are saturated with outside them and node
structure.
Keywords-Social Network; community; outlier; modularity; x is saturated lonely. It can be seen clearly the significant
density communities, outliers and isolated node in this toy example.
I. INTRODUCTION m Communities
Outlier
Social networks are naturally modeled as graphs, which we
l
sometimes represent it as a social graph. The entities are
nodes, and an edge that connects two nodes if the nodes are e
g h
interacted by the relationship that forms the network. For a
Facebook friendship graph which we will use in our work,
social graphs are undirected and unweighted. The
communities of complex networks are groups of nodes which c
b j
are high tightly connected with nodes of the same group other f i
than with less links connected with nodes of different groups
[22]. Communities may be groups of friendship in social d
networks [1], sets of web pages concerning with the same o
topic and groups of cells with similar functions. While k
Community
156
ܱ ݏݐݑൌ ሼݒȁܸ א ݒǡ ܸ݅ᇱ ܸ݅ א ݒ ר ݏ݉ܥ אᇱ ሽ ൌ ܸ െ ڂ
ୀଵ ܸ݅Ԣ
(3)
Outliers directly identified by getting nodes from small Assuming like that, our work identify outliers are appeared
communities [18]: with among separated communities.
ݏ݉ܥራ ܱ ݏݐݑൌ ݒ IV. PROPOSED APPROACH
ݏ݉ܥሩ ܱ ݏݐݑൌ Ͳ Network can be represented as graph for detecting outliers
and significant communities. In this paper, we use concept of
Where, ݏ݉ܥis the community and ܱ ݏݐݑis indicated as undirected graph in which two vertices have common node, if
outlier.
they share one or more of the same vertices. Let Ƚ be the
F. Isolated Node neighborhood of vertex ݅ in a network, i.e, the set of vertices
Some users who used social network, any friendship are that are directly connected to ݅ݒvia an edge. Then the number
not belonging to them can be included in social network. Such of common friends of ݅ݒand ݆ݒis ȁȽ תȽ ȁ . Contrary, nodes
loneliness users are represented as isolated nodes in this paper. which have no any other node in common or sometime they
Since, there are no neighbor nodes around them as well as can be isolated, we can identify them as outliers, can be given
their edge structure could not be considered. as ȁȽ תȽ ȁ ൌ Ͳ . In our work, we propose the system to
Before calculating the seed node, we find them and define identify significant community with detecting outlier by using
as outlier firstly. As shown in Fig1, node x is isolated and it can node similarity depend on this concept.
be made as outlier. In this work we use two main methods, firstly the vertex
centrality which is used to determine seed nodes with the
G. Vertex Similarity highest degree or the largest number of neighborhood and then
It can be assumed that communities are groups of vertices use the method of neighborhood overlap based on vertex
similar to each other. We can compute the similarity between Similarity. In this work, we used the following properties:
each pair of vertices after searching seed nodes. Most existing x Node which has the largest degree is determined as
similarity method are based on the measurement of distance seed node.
called Euclidean, Manhattan and etc., Although, to considered x Determine seed node by using Vertex Centrality
the similarity between selected node and is neighborhood, Method.
Jaccard Similarity is more convenient in this work which we x Calculate Vertex Similarity among nodes that are
will measure the similarity based on the neighborhood overlap linked to seed node.
of seed nodes. Let ܰ denote the neighbors of node ݒ . Given x Nodes that belong to the same communities are likely
a link e(ݒ ǡ ݒ ), the neighborhood overlap is defined as to be more links connected to each other.
௨௦ௗௗ௦௧௩ ௗ௩ೕ
ܱ݈ܽݎ݁ݒሺ݅ݒǡ ݆ݒሻ ൌ
௨ௗ௦௪ௗ௧௧௧௦௧௩ ௗ௩ೕ
x Each community is likely to be fewer links outside
the rest of the graph.
ȁே ځேೕ ȁ
ൌ หே (4) x The isolated individual nodes are stand as outliers.
ேೕ หିଶ
x Nodes in the same community are likely to share
We have െʹin the denominator just to exclude ݒ ܽ݊݀ݒ common neighborhood node.
from the set ܰ ܰ . If there are no overlap vertices in any x If nodes do not have common neighborhood, they are
acted as outliers.
two ܰ ܽ݊݀ܰ means หܰ ܰ ځ ห ൌ , we can identify ܰ
disconnect withܰ . The neighborhood overlap is positively V. EXPERIMENTS
correlated with the total number of minute spent by two
persons in a telecommunication network [14]. For instance, in A. Description of Datasets
Figure 2, two terminal nodes, d and e have only one edge In this paper, real undirected networks, and the fraction of
between them. They have no any other neighborhood overlap. SNAP Social circles: Facebook1 Dataset is used. This dataset
Therefore, their overlap value is 0 as shown in (5). Again, the FRQVLVWVRILQIRUPDWLRQRIIULHQGV¶FLUFOHVIURP)DFHERRNIRU
neighborhood overlap of other two terminal nodes, e and f is instance, the profiles for each user (features), circles and
node h. We can see their overlap value in (6). network. ,Q WKLV 'DWDVHW VWDWLVWLFV ³QRGHV´ UHSUHVHQWV WKH
QXPEHU RI IULHQGV ³(GJHV´ UHSUHVHQWV WKH QXPEHU RI
e f friendship in the network. Thus, 4,039 nodes and 88,234 edges
a
are totally included in this dataset. Since this Facebook graph
b d separated 10 files which have same features are cluster
g h
together in each. :H XVH HGJH VWUXFWXUH RI ³(GJHV´ ILOH LQ
c this work. There are 347 nodes, represent as users and 2519
edges for their friendship information as shown in Table I.
Figure 2. Example of Neighborhood Overlap
ȁሼ ሽȁ
ܱ݈ܽݎ݁ݒሺ݀ǡ ݁ሻ ൌ ȁሼǡǡǡௗǡǡǡǡሽȁିଶ ൌ ൌͲ (5)
଼ିଶ
ȁሼሽȁ ଵ ଵ
ܱ݈ܽݎ݁ݒሺ݁ǡ ݂ሻ ൌ ȁሼǡǡǡሽȁିଶ ൌ ൌ (6)
ସିଶ ଶ
1
https://snap.stanford.edu/data/egonets-Facebook.html
157
TABLE I. NUMBER OF NODES AND EDGES IN REAL FACEBOOK SOCIAL ϭ͘Ϯ
NETWORK DATASET
158
good in measurement. The drawback of our method is only Conference on Knowledge Discovery and Data Mining (SIGKDD),
Edmonton, Alberta, 2002, pages 538±543.
based on edge structure without considered any information
[7] H. D. K. Moonesinghe and Pang-Ning Tan. ³Outrank: a graph-based
such as profile of user. So, our future will be used feature of outlier detection framework using random walk´ International Journal
users to detect effective community. on Artificial Intelligence Tools, 17(1), 2008.
[8] Hung-Hsuan Chen and C. Lee Giles, ³ASCOS: an asymmetric network
TABLE IV. THE EVIDENCE OF AVERAGE CORRELATION AND NON- structure context similarity measure´ In IEEE/ACM International
CORRELATION AMONG COMMUNITIES AFTER REMOVEING OUTLIER Conference on Advances in Social Networks Analysis and Mining
(ASONAM), Niagara Falls, Canada, 2013.
Community Non-Correlation Correlation
[9] Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang,
³Simrank++: Query rewriting through link analysis of the click graph´
0 15.73% 1.81%
In Proceedings of the 34nd International Conference on Very Large Data
1 39.64% 2.33% Bases (VLDB), Auckland, New Zealand, 2008, pages 408±421.
[10] J.Y. Chen, O. R. Zaiane, R. Goebel, ³Detecting Communities in Large
2 29.76% 1.36% Networks by Iterative Local Expansion´ International Conference on
3 42.77% 1.06% Computational Aspects of Social Networks, 2009.
[11] Jimeng Sun, Huiming Qu, Deepayan Chakrabarti, and Christos
4 39.76% 0.63% Faloutsos, ³Neighborhood formation and anomaly detection in bipartite
5 29.76% 0.88% graph´ICDM, November 27-30 2005.
[12] Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu,
6 42.77% 3.36% Lei Li, Koji Maruhashi, B. Aditya Prakash, and Hanghang Tong,
7 39.64% 0.71% ³Metricforensics: A multi-level approach for mining volatile graphs´ In
Proceedings of the 16th ACM International Conference on Knowledge
8 43.07% 5.46% Discovery and Data Mining (SIGKDD), Washington, DC, 2010, pages
163±172.
9 38.75% 0.19%
[13] Leman Akoglu, Mary McGlohon and Christos Faloutsos , ³Anomaly
10 41.51% 0.07% Detection in Large Graphs´, November 2009, CMU-CS-09-173.
[14] /HL 7DQJ DQG +XDQ /LX ³&RPPXQLW\ 'HWHFWLRQ DQG 0LQLQJ LQ 6RFLDO
11 41.51% 0.07% 0HGLD´ ,6%1 $ 3XEOLFDWLRQ LQ WKH 0RUJDQ
12 42.77% 1.59% Claypool Publishers series, 2010.
[15] 0 ( - 1HZPDQ DQG 0 *LUYDQ ³)LQGLQJ DQG HYDOXDWLQJ FRPPXQLW\
13 42.64% 1.65% structure in networks´Phys. Rev. E, vol. 69, p. 026113, Feb 2004.
14 43.07% 0.38% [16] 0( - 1HZPDQ ³0RGXODULW\ DQG community structure in networks´,
Proceedings of the National Academy of Sciences, vol. 103, no. 23,
15 41.28% 0.04% pp.8577±8582, 2006.
16 38.75% 0.29% [17] 0*LUYDQ 0(-1HZPDQ ³Community structure in social and
biological networks´, Proc.Natl. Acad. Sci. USA 99, 7821-7826, 2002.
17 42.64% 0.07% [18] Meng Wang, Chaokun Wang, Jeffrey Xu Yu and Jun Zhang,
18 43.07% 0.16% ´Community Detection in Social Networks:An Indepth Benchmarking
Study with a ProcedureOriented Framework´ 41st International
19 38.75% 1.12% Conference on Very Large Data Bases, August 31 st September 4th 2015.
20 38.75% 0.33% [19] 0LFKHO 3ODQWLH¶ DQG 0LFKHO &UDPSHV ³Survey on Social Community
Detection´, Social Media Retrieval, Springer Publishers, Computer
21 38.75% 0.31% Communications and Networks, 978-1-4471-4554-7, pp.65-85, 2013.
[20] 0LQJPLQJ &KHQ 7RPP\ 1JX\HQ DQG %ROHVODZ . 6]\PDQVNL ³2Q
0HDVXULQJ WKH 4XDOLW\ RI D 1HWZRUN &RPPXQLW\ 6WUXFWXUH´
Proceedings of the IEEE Social Computing Conference, Washington
REFERENCES DC, September 8-14, pp. 122-127,2013.
[1] Amol Ghoting, Matthew Eric Otey, and Srinivasan Parthasarathy, [21] Peixiang Zhao, Jiawei Han, and Yizhou Sun. ³P-rank: a comprehensive
³/RDGHGLink-based outlier and anomaly detection in evolving data structural similarity measure over information networks´ In Proceedings
sets´ In ICDM, 2004. of the 18th ACM Conference on Information and Knowledge
[2] CDOHE & 1REOH DQG 'LDQH - &RRN ³Graph-based anomaly detection´ Management (CIKM), Hong Kong, China ACM, 2009, pages 553±562.
In KDD, 2003, pages 631±636. [22] S. Fortunato, ³&RPPXQLW\ 'HWHFFWLRQ LQ *UDSK´ Phys. Rep. 486, 75
[3] Deepayan Chakrabarti, ³$XWRSDUWParameter-free graph partitioning and (2010).
outlier detection´ In PKDD, 2004, pages112±124. [23] William Eberle and Lawrence B. Holder. ³Discovering structural
[4] D. Hawkins, ³Identification of outliers´ Chapman and Hall, 1980. anomalies in graph-based data´ In ICDM Workshops, 2007, pages 393±
398.
[5] Fei Teng, Rongjie Dai, Hongjie Wang and Xiaoliang Fan, ³$FFHOHUDWLQJ
/LQN &RPPXQLW\ 'HWHFWLRQ LQ 6RFLDO 1HWZRUNV´ 2015 International [24] Xiang Li, Yong Tan, Ziyang Zhang, and Qiuli Tong ³Community
Conference on Cloud Computing and Big Data (CCBD) (2015), Detection in Large Social Networks Based on Relationship Density´
Shanghai, China, Nov. 4, 2015 to Nov. 6, 2015, ISBN: 978-1-4673- 2016 IEEE 40th Annual Computer Software and Applications
8350-9, pages 119-126. Conference (COMPSAC) (2016), Atlanta, GA, USA, June 10, 2016 to
June 14, 2016, ISSN: 0730-3157, ISBN: 978-1-4673-8846-7, page 524-
[6] Glen Jeh and Jennifer Widom. ³SimRank: a measure of structural- 533.
context similarity´ In Proceedings of the 8th ACM International
159