Abstract—Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower-dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multilabel classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk based sampling can behave as Breadth-First-Search (BFS) and Depth-First-Search (DFS) sampling, with the goal of capturing both homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experimental results show that the proposed method outperforms the baseline ones in most of the tasks and datasets.

Keywords—Network representation learning; unsupervised feature learning; node sampling; random walks
I. INTRODUCTION
Networks (or graphs) are used to model data arising from a wide range of applications – ranging from social network mining, to biology and neuroscience. Of particular interest here are applications that concern learning tasks over networks. For example, in social networks, to answer whether two given users belong to the same social community or not, examining their direct relationship does not suffice – the users can have many further characteristics in common, such as friendships and interests. Similarly, in friendship recommendation, in order to determine whether two unlinked users are similar, we need to obtain an informative representation of the users and of their proximity – one that is potentially not fully captured by handcrafted features extracted from the graph. Towards this direction, representation (or feature) learning algorithms are useful tools that can help us address the above tasks. Given a network, those algorithms embed it into a new space (usually a compact vector space), in such a way that both the original network structure and other "implicit" features of the network are captured.

In this paper, we are interested in unsupervised (non-task-specific) feature learning methods; once the vector representations are obtained, they can be used to deal with various data mining tasks on the network, including node classification, community detection and link prediction, using existing machine learning tools. Nevertheless, it is quite challenging to design an "ideal" algorithm for network embeddings. The main reason is that, in order to learn task-independent embeddings, we need to preserve as many important properties of the network as possible in the embedding feature space. But such properties might not be apparent and, even after they are discovered, preserving some of them can be intractable.

This can explain the fact that most of the traditional methods only aim at retaining the first- and second-order proximity between nodes [22], [19], [2]. In addition, network representation learning methods are expected to be efficient, so that they can scale well on large networks, and to be able to handle different types of rich network structures, including (un)weighted, (un)directed and labeled networks.

In this paper, we propose BiasedWalk, an unsupervised, Skip-gram-based [15] network representation learning algorithm which can preserve higher-order proximity information and is able to capture both the homophily and role equivalence relationships between nodes. BiasedWalk relies on a novel node sampling procedure based on biased random walks, which can behave as actual depth-first-search (DFS) and breadth-first-search (BFS) explorations – thus forcing the sampling scheme to capture both role equivalence and homophily relations between nodes. Furthermore, BiasedWalk is scalable to large graphs, and is able to handle different types of network structures, including (un)weighted and (un)directed ones. Our extensive experimental evaluation indicates that BiasedWalk outperforms the state-of-the-art methods on various network datasets and learning tasks.

Code and datasets for the paper are available at: https://goo.gl/Easwk4.

II. RELATED WORK

Network embedding methods have been well studied since the 2000s. The early works on the topic consider the embedding task as a dimensionality reduction one, and are often based on factorizing some matrix associated with pairwise
distances between all nodes (or data points) [22], [19], [2]. Those methods, relying on the eigendecomposition of such a matrix, are often sensitive to noise (i.e., missing or incorrect edges). Some recently proposed matrix-factorization-based methods focus on the formal problem of representation learning on networks, and are able to overcome limitations of the traditional manifold learning approaches. For example, while the traditional methods only focus on preserving the first-order proximity, GraRep [5] aims to preserve higher-order proximity by using several matrices, each of which captures the k-step transition probabilities between nodes. The HOPE algorithm [16] defines similarity measures between nodes which are helpful for preserving higher-order proximities, and formulates those measures as a product of sparse matrices in order to efficiently find the latent representations.

Table I: Table of notation.

G = (V, E)   An input network with node set V and edge set E
Φ(u)         Vector representation of node u
C(u)         Set of neighborhood (context) nodes of u
Φ′(w)        Vector representation of w in its role as a context node
p_v          Probability of v being the next node of the (uncompleted) random walk
D̃            Average degree of the input network
α            Parameter for controlling the distances from the source node to sampled nodes
L            Maximum length of random walks
τ_v          Proximity score of node v
γ            Number of sampled random walks per node
N(u)         Set of neighbors of node u
c            Window size of the training context of the Skip-gram
Ψ(u, v)      Hadamard operator on the vector representations of u and v
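To make the k-step transition probabilities used by GraRep concrete, the following minimal sketch (ours, with illustrative names; plain NumPy, dense matrices only) row-normalizes an adjacency matrix and raises it to successive powers, so that entry (i, j) of A^k is the probability that a k-step random walk starting at node i ends at node j:

```python
import numpy as np

def k_step_transition_matrices(adj: np.ndarray, max_k: int):
    """Return [A^1, ..., A^max_k], where A is the row-normalized
    adjacency matrix; A^k holds the k-step transition probabilities."""
    # Row-normalize the adjacency matrix (guard against isolated nodes).
    degrees = adj.sum(axis=1, keepdims=True)
    A = adj / np.maximum(degrees, 1)
    out, power = [], np.eye(adj.shape[0])
    for _ in range(max_k):
        power = power @ A   # k-th power = k-step transition probabilities
        out.append(power.copy())
    return out

# Toy usage: a 4-node path graph.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
A1, A2 = k_step_transition_matrices(adj, max_k=2)
print(A2[0])  # 2-step transition probabilities starting from node 0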
Recently, a category of network embedding methods has emerged that relies on representation learning techniques from natural language processing (NLP) [17], [26], [12], [27],
[23], [8]. Two of the most well-known approaches here are DeepWalk [17] and node2vec [12], which are both based on the Skip-gram model [15]. The model was proposed to learn vector representations for words in a corpus by maximizing the conditional probabilities of predicting contexts given the vector representations of those words. DeepWalk [17] is the first method to leverage the Skip-gram model for learning representations on networks, by extracting truncated random walks in the network and considering them as sentences. The sentences are then fed into the Skip-gram model to learn node representations. The node2vec algorithm [12] can essentially be considered as an extension of DeepWalk. In particular, the difference between the two methods concerns the node sampling approach: instead of sampling nodes by uniform random walks, node2vec uses biased (and second-order) random walks to better capture the network structure. Because the Skip-gram based methods use random walks to approximate high-order proximities, their main advantages include both scalability and the ability to learn latent features that only appear when we observe a high-order proximity (e.g., the community relation between nodes).
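As an illustration of this pipeline, a minimal DeepWalk-style sketch is shown below (assuming networkx and gensim 4.x; the graph and hyperparameters are placeholders, not the settings used in the experiments):

```python
import random
import networkx as nx
from gensim.models import Word2Vec

def uniform_random_walk(G, start, walk_length):
    """Truncated first-order random walk, returned as string node ids."""
    walk = [start]
    while len(walk) < walk_length:
        neighbors = list(G.neighbors(walk[-1]))
        if not neighbors:
            break
        walk.append(random.choice(neighbors))
    return [str(v) for v in walk]

G = nx.karate_club_graph()                  # stand-in for the input network
num_walks, walk_length = 10, 80             # illustrative hyperparameters
walks = [uniform_random_walk(G, v, walk_length)
         for _ in range(num_walks) for v in G.nodes()]

# Each walk plays the role of a sentence; sg=1 selects the Skip-gram model.
model = Word2Vec(walks, vector_size=128, window=10,
                 min_count=0, sg=1, workers=4, epochs=5)
vector_of_node_0 = model.wv["0"]            # learned embedding of node 0
```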
As we will present later on, the proposed method, called BiasedWalk, is also based on the Skip-gram model. The core of BiasedWalk is a node sampling procedure which is able to generate biased (and first-order) random walks that behave as actual depth-first-search (DFS) and breadth-first-search (BFS) explorations – thus helping to capture important structural properties, such as community information and the structural roles of nodes. It should be noted that node2vec [12] also aims to sample nodes based on DFS and BFS random walks, but it cannot control the distances between the sampled nodes and the source node. Controlling the distances is crucial in many specific tasks: in link prediction, for instance, nodes close to the source should have a higher probability of forming a link with it; similarly, in the community detection task, given a fixed sampling budget, we prefer to visit as many community-wide nodes as possible.

Another shortcoming of node2vec is that its second-order random walks require storing the interconnections between the neighbors of every node. More specifically, node2vec's space complexity is O(D̃ · |E|), where D̃ is the average degree of the input network G = (V, E) [12]. This can be an obstacle towards embedding dense networks, where the node degrees can be very high. Moreover, we prefer random-walk based sampling rather than pursuing pure DFS and BFS sampling because of the Markovian property of random walks: given a random walk of length L, we can immediately obtain L − k context sets for the nodes in the walk by sliding a window of size k along it, as the sketch below illustrates.
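The windowing step is straightforward to express in code; the following minimal sketch (function and variable names are ours) slides a window of size k along a sampled walk and emits one context set per position, which is why a single walk of length L yields L − k context sets without any additional sampling:

```python
def context_sets(walk, k):
    """Slide a window of size k along a walk; yield (node, context) pairs.

    A walk of length L contains L - k full windows, i.e. L - k context
    sets come for free from a single sampled walk.
    """
    L = len(walk)
    for i in range(L - k):
        window = walk[i:i + k + 1]        # k + 1 consecutive nodes
        center, context = window[0], window[1:]
        yield center, context

walk = [4, 7, 2, 9, 1, 5]                  # a sampled walk (node ids)
for node, ctx in context_sets(walk, k=2):
    print(node, ctx)                       # e.g. 4 [7, 2]
```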
The LINE algorithm [20], though not belonging to any of the previous categories, is also one of the well-known network embedding methods. LINE derives two optimization functions, for preserving the first- and the second-order proximity respectively, and performs the optimization by stochastic gradient descent with edge sampling. Nevertheless, its performance in general tends to be inferior to that of the Skip-gram based methods. The interested reader may refer to the following review articles on representation learning on networks: [13], [11]. Lastly, we should mention here the recent progress on deep learning techniques that have also been adopted to learn network embeddings [24], [6], [3], [28], [7], [10].

III. PROBLEM FORMULATION

A network can be represented by a graph G = (V, E), where V is the set of nodes and E is the set of links in the network. Depending on the nature of the relationships, G can be (un)weighted and (un)directed. Table I includes the notation used throughout the paper.
An embedding of G = (V, E) is a mapping from the node set to a low-dimensional space, Φ : V → R^d, where d ≪ |V|. Network embedding methods are expected to be able to preserve the structure of the original network, as well as its "implicit" features. Following the Skip-gram model, we aim to find embeddings towards predicting the neighborhoods (or "contexts") of nodes:

    max_Φ Σ_{u ∈ V} log Pr(C(u) | Φ(u))        (1)

[Figure 1: an example network with eight nodes, with edges annotated by bias terms α⁰, α¹, α² and their sums (e.g., α⁰ + α¹), illustrating the proposed biased random-walk sampling.]
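Since the section defining BiasedWalk's exact transition rule is not part of this excerpt, the sketch below is only a toy stand-in that conveys how a distance parameter like α can steer a first-order walk between BFS-like and DFS-like behavior: candidate next nodes are weighted by α raised to their BFS distance from the source, so a small α (e.g., 0.125, as in Table IV) keeps the walk near the source, while α close to 1 lets it wander deep. The weighting rule here is ours, not the paper's proximity scores τ_v:

```python
import random
import networkx as nx

def alpha_biased_walk(G, source, L, alpha):
    """Toy distance-biased first-order walk (NOT the paper's exact rule).

    Candidate next nodes are weighted by alpha ** dist(source, v):
    alpha < 1 favors staying near the source (BFS-like sampling),
    alpha close to 1 leaves the walk free to go deep (DFS-like sampling).
    """
    dist = nx.single_source_shortest_path_length(G, source)
    walk = [source]
    while len(walk) < L:
        nbrs = list(G.neighbors(walk[-1]))
        if not nbrs:
            break
        weights = [alpha ** dist[v] for v in nbrs]
        walk.append(random.choices(nbrs, weights=weights, k=1)[0])
    return walk

G = nx.karate_club_graph()
print(alpha_biased_walk(G, source=0, L=10, alpha=0.125))  # hugs node 0
```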
[...] is not enough for node classification, as nodes which are not in the same neighborhood can still be assigned the same labels. More precisely, BiasedWalk gives the best results for the BLOGCATALOG and PPI networks; on BLOGCATALOG, it improves the Macro-F1 score by 15% and the Micro-F1 score by 6%. On the IMDB network, node2vec and BiasedWalk are comparable, with the former being slightly better. Figure 2 depicts the performance of all the methods for different percentages of training data. Given a percentage of training data, in most cases BiasedWalk performs better than the rest of the baselines. An interesting observation from the experiment is that preserving homophily between nodes is important for the node classification task. This explains why the best results obtained by BiasedWalk come from its DFS random-walk based sampling scheme.

[...] the ELECTION-BLOGS network. Since HOPE is able to preserve the asymmetric transitivity property of networks, it can work well on directed networks, such as ELECTION-BLOGS. LINE's performance is comparable to that of the Skip-gram based methods, and it is even the best one on the PPI network. The reason is that LINE is designed to preserve the first- and the second-order proximity, which is really helpful for the link prediction task: it is entirely possible to infer the existence of an edge if we know the roles of its endpoint nodes and, meanwhile, the roles of nodes can be discovered by examining just their local neighbors. BiasedWalk, again, obtains the best results for this task on most of the networks (Table IV). Finally, the best results of BiasedWalk in this task are almost always obtained with its BFS-based sampling. This supports the fact that discovering role equivalence between nodes is crucial for the link prediction task.
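For concreteness, link prediction experiments of this kind are typically wired together as follows; the sketch below follows the standard protocol (not necessarily the paper's exact code), using the Hadamard operator Ψ(u, v) from Table I to turn two node embeddings into one edge feature vector, with scikit-learn standing in for the LIBLINEAR classifier [9]:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def edge_features(emb, pairs):
    """Hadamard operator Ψ(u, v): element-wise product of the endpoint embeddings."""
    return np.array([emb[u] * emb[v] for u, v in pairs])

def evaluate_link_prediction(emb, train_pairs, y_train, test_pairs, y_test):
    """Positives are held-out edges, negatives are sampled non-edges,
    split 50%-50% into train and test (as in Table IV)."""
    clf = LogisticRegression(max_iter=1000)
    clf.fit(edge_features(emb, train_pairs), y_train)
    pred = clf.predict(edge_features(emb, test_pairs))
    return (f1_score(y_test, pred, average="macro"),
            f1_score(y_test, pred, average="micro"))
```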
[Figure 2: Micro-F1 (top row) and Macro-F1 (bottom row) scores of all methods on BLOGCATALOG, PPI and IMDB, for training fractions ranging from 0.0 to 1.0.]
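The multilabel classification protocol behind these curves can be sketched as follows, with scikit-learn's one-vs-rest logistic regression standing in for LIBLINEAR [9] (a sketch of the standard setup, not the paper's exact code); X holds the node embeddings and Y the binary label-indicator matrix:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import f1_score

def node_classification_scores(X, Y, train_fraction, seed=0):
    """X: node embeddings (nodes x d), Y: label-indicator matrix (nodes x labels)."""
    X_tr, X_te, Y_tr, Y_te = train_test_split(
        X, Y, train_size=train_fraction, random_state=seed)
    # One binary classifier per label, trained on the labeled fraction.
    clf = OneVsRestClassifier(LogisticRegression(max_iter=1000))
    clf.fit(X_tr, Y_tr)
    Y_pred = clf.predict(X_te)
    return (f1_score(Y_te, Y_pred, average="macro"),
            f1_score(Y_te, Y_pred, average="micro"))
```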
[...] average degree of 10. As we can observe, BiasedWalk is able to learn embeddings for graphs with millions of nodes within dozens of hours, and it scales linearly with respect to the graph size. Nearly all of the learning time is spent in the node sampling step, which means that the Skip-gram model is very efficient at solving Eq. (2).

[Figure 4: Scalability of BiasedWalk on Erdős–Rényi graphs; log10 time (seconds) for the sampling step alone and for sampling plus optimization.]
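The Erdős–Rényi graphs used in this scaling experiment are easy to regenerate. A minimal sketch (ours; sizes and seed are illustrative) that fixes the expected average degree at 10 while growing the number of nodes:

```python
import time
import networkx as nx

for n in [10_000, 100_000, 1_000_000]:    # growing graph sizes
    # In G(n, p), the expected average degree is p * (n - 1), i.e. 10 here.
    G = nx.fast_gnp_random_graph(n, p=10 / (n - 1), seed=42)
    start = time.time()
    # ... run the sampling and Skip-gram optimization steps on G here ...
    print(n, G.number_of_edges(), f"{time.time() - start:.1f}s")
```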
VI. CONCLUSIONS AND FUTURE WORK

In this work, we have proposed BiasedWalk, a Skip-gram based method for learning node representations on graphs. The core of BiasedWalk is a node sampling procedure based on biased random walks that can behave as actual depth-first-search and breadth-first-search explorations – thus forcing BiasedWalk to efficiently capture role equivalence and homophily between nodes. We have compared BiasedWalk to several state-of-the-art baseline methods, demonstrating its good performance on the link prediction and multilabel node classification tasks. As future work, we plan to theoretically analyze the properties of the proposed biased random-walk scheme, and to investigate how to adapt the scheme to networks with specific properties, such as signed networks and ego networks, in order to obtain better embedding results. Moreover, we plan to examine the performance of BiasedWalk on the task of community discovery.
Table IV: Macro-F1 and Micro-F1 scores (%) for the 50%-50% train-test split in the link prediction task. HOPE could not handle the large EPINIONS graph because of its high time and space complexity.

Algorithm             ASTROPHY          ELECTION-BLOGS    PPI               EPINIONS
                      Macro    Micro    Macro    Micro    Macro    Micro    Macro    Micro
HOPE                  59.97    65.61    76.34    76.99    53.15    61.45    *        *
LINE                  93.56    93.59    74.79    74.80    80.29    80.34    81.94    81.95
DeepWalk              92.29    92.32    79.06    79.12    67.84    67.99    89.14    89.18
node2vec              93.08    93.10    81.04    81.15    71.24    71.29    89.29    89.33
BiasedWalk            95.08    95.10    81.54    81.65    75.23    75.34    90.08    90.11
(Best walk type, α)   BFS, 0.125        BFS, 0.125        BFS, 1.0          BFS, 0.25
[Figure: Parameter sensitivity of BiasedWalk – Macro-F1 score as a function of log2 α, log2 d, the walk length, and the number of walks per node.]
REFERENCES

[1] Adamic, L.A., Glance, N.: The political blogosphere and the 2004 U.S. election: Divided they blog. In: LinkKDD. (2005) 36–43

[4] Breitkreutz, B.J., Stark, C., Reguly, T., Boucher, L., et al.: The BioGRID interaction database. Nucleic Acids Research (2008)

[5] Cao, S., Lu, W., Xu, Q.: GraRep: Learning graph representations with global structural information. In: CIKM. (2015) 891–900

[6] Cao, S., Lu, W., Xu, Q.: Deep neural networks for learning graph representations. In: AAAI. (2016) 1145–1152

[7] Chang, S., Han, W., Tang, J., Qi, G.J., Aggarwal, C.C., Huang, T.S.: Heterogeneous network embedding via deep architectures. In: KDD. (2015) 119–128

[8] Du, L., Wang, Y., Song, G., Lu, Z., Wang, J.: Dynamic network embedding: An extended approach for Skip-gram based network embedding. In: IJCAI. (2018) 2086–2092

[9] Fan, R.E., Chang, K.W., Hsieh, C.J., Wang, X.R., Lin, C.J.: LIBLINEAR: A library for large linear classification. J. Mach. Learn. Res. 9 (2008) 1871–1874

[10] Gao, H., Huang, H.: Deep attributed network embedding. In: IJCAI. (2018) 3364–3370

[11] Goyal, P., Ferrara, E.: Graph embedding techniques, applications, and performance: A survey. Knowledge-Based Systems (2018)

[12] Grover, A., Leskovec, J.: node2vec: Scalable feature learning for networks. In: KDD. (2016) 855–864

[13] Hamilton, W.L., Ying, R., Leskovec, J.: Representation learning on graphs: Methods and applications. IEEE Data Engineering Bulletin 40(3) (2017)

[14] Leskovec, J., Krevl, A.: SNAP Datasets: Stanford large network dataset collection (2014)

[15] Mikolov, T., Sutskever, I., Chen, K., Corrado, G., Dean, J.: Distributed representations of words and phrases and their compositionality. In: NIPS. (2013) 3111–3119

[16] Ou, M., Cui, P., Pei, J., Zhang, Z., Zhu, W.: Asymmetric transitivity preserving graph embedding. In: KDD. (2016) 1105–1114

[17] Perozzi, B., Al-Rfou, R., Skiena, S.: DeepWalk: Online learning of social representations. In: KDD. (2014) 701–710

[18] Richardson, M., Agrawal, R., Domingos, P.: Trust management for the semantic web. In: ISWC. (2003) 351–368

[19] Roweis, S.T., Saul, L.K.: Nonlinear dimensionality reduction by locally linear embedding. Science 290(5500) (2000) 2323–2326

[20] Tang, J., Qu, M., Wang, M., Zhang, M., Yan, J., Mei, Q.: LINE: Large-scale information network embedding. In: WWW. (2015) 1067–1077

[21] Tang, L., Liu, H.: Relational learning via latent social dimensions. In: KDD. (2009) 817–826

[22] Tenenbaum, J.B., Silva, V.d., Langford, J.C.: A global geometric framework for nonlinear dimensionality reduction. Science 290(5500) (2000) 2319–2323

[23] Tu, C., Zhang, W., Liu, Z., Sun, M.: Max-margin DeepWalk: Discriminative learning of network representation. In: IJCAI. (2016) 3889–3895

[24] Wang, D., Cui, P., Zhu, W.: Structural deep network embedding. In: KDD. (2016) 1225–1234

[25] Wang, X., Sukthankar, G.: Multi-label relational neighbor classification using social context features. In: KDD. (2013) 464–472

[26] Yang, C., Liu, Z., Zhao, D., Sun, M., Chang, E.Y.: Network representation learning with rich text information. In: IJCAI. (2015) 2111–2117

[27] Yang, Z., Cohen, W.W., Salakhutdinov, R.: Revisiting semi-supervised learning with graph embeddings. In: ICML. (2016) 40–48

[28] Zhang, Z., Yang, H., Bu, J., Zhou, S., Yu, P., Zhang, J., Ester, M., Wang, C.: ANRL: Attributed network representation learning via deep neural networks. In: IJCAI. (2018) 3155–3161