and V2 are exactly equivalent, V0 = V1 = V2.

Definition 2.1. (Network Alignment, π) A subset of V1 × V2 such that any node in V1 = {1, ..., n1} and V2 = {1, ..., n2} is matched to at most one node in the other graph [6]. We define a one-to-one matching for the subsets where every node from V1 is matched with exactly one node from V2.

Another notion that we use is graph embedding, which is applied to transform graph data into a latent vector space.

Some of the other graph embedding algorithms preserve node structure. The idea of node structure, first introduced by Leonardo F. R. Ribeiro et al. [2], is to represent symmetry: network nodes are identified according to their relationship to other nodes. In embeddings which preserve only node structure rather than proximity, two neighbouring nodes may have very dissimilar features. From Figure 3 it can be understood that two far-away nodes, or even nodes from different graphs, can have the same representation in the embedding space.
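As a concrete illustration of Definition 2.1, the one-to-one property can be checked mechanically. The sketch below is our own; the function name and the pair-set representation of π are assumptions, not part of the paper:

```python
def is_one_to_one(matching, V1, V2):
    """Check Definition 2.1 for a candidate matching, given as a set of
    (u, v) pairs from V1 x V2: every node of V1 must be matched exactly
    once, and no node of V2 may be matched more than once."""
    left = [u for u, _ in matching]
    right = [v for _, v in matching]
    return (sorted(left) == sorted(V1)         # each V1 node matched exactly once
            and len(set(right)) == len(right)  # no V2 node matched twice
            and set(right) <= set(V2))         # targets actually lie in V2

# Example: V1 = {1, 2, 3}, V2 = {1, 2, 3, 4}
print(is_one_to_one({(1, 2), (2, 1), (3, 4)}, [1, 2, 3], [1, 2, 3, 4]))  # True
```

Note that with |V1| ≤ |V2|, some nodes of V2 may legitimately stay unmatched, which the check allows.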
We also use the notion of noise to define the differences between the topologies of two graphs. Since we use the same node set during our alignment, we remove a subset of edges to create noise. As can be seen from Definition 3.1, noise signifies the edge removal probability for the graph.

In other words, two different nodes from different graphs which have very similar embeddings should also have a similar structure. Using this information, we can obtain structural similarity for nodes across different graphs while keeping the embeddings independent from the position of the nodes inside the graphs.

Given these properties, we come up with the following node embedding algorithm, which preserves node structure properties. Let G = (V, E), and let Rik denote the set of nodes linked to ni ∈ V by a path of length exactly k. To obtain the node embedding for ni, we use histograms of the degree distributions of Rik for each k ∈ 1..kmax. We concatenate these histogram vectors (the concatenation _ of 123 and 456 is 123456) and add d(ni) as a feature to get the embedding of ni, as illustrated in Equation (1).

Table 1: Used symbols and definitions

Symbols   Definitions
G(V, E)   Graph with vertex set V and edge set E
X         Embeddings for the graph G
N         Number of nodes
nj        Node j
emj       Embedding of the node nj
wjk       Embedding of the node nj for hop k
dj        Node degree of nj
dmax      dj of the maximum-degree node
dmin      dj of the minimum-degree node
B         Number of buckets for one hop distance
Rik       The set {nj ∈ G | dist(ni, nj) = k}
kmax      Maximum hop distance used
θ         Threshold in the alignment method
θ̄         Threshold ratio in the alignment method
w         Weight to control the importance of closer nodes
_         Vector concatenation
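The hop-histogram construction can be sketched in a few lines. This is a minimal illustration, not the paper's exact Equation (1): the evenly spaced degree buckets, the BFS helper, and all names below are our own assumptions.

```python
from collections import deque

def structural_embedding(adj, node, k_max=2, B=4, d_max=None):
    """Sketch of the embedding described above: for each hop distance
    k = 1..k_max, build a B-bucket histogram of the degrees of the nodes
    at exactly distance k from `node` (the set R_ik), concatenate the
    histograms, and append the node's own degree d(n_i) as a feature.
    `adj` is an adjacency dict; the bucketing scheme is our own choice."""
    deg = {n: len(nbrs) for n, nbrs in adj.items()}
    d_max = d_max or max(deg.values())
    # BFS to get exact hop distances up to k_max
    dist = {node: 0}
    q = deque([node])
    while q:
        u = q.popleft()
        if dist[u] == k_max:
            continue
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                q.append(v)
    emb = []
    for k in range(1, k_max + 1):
        hist = [0] * B
        for n, d in dist.items():
            if d == k:                                   # n is in R_ik
                bucket = min((deg[n] - 1) * B // d_max, B - 1)
                hist[bucket] += 1
        emb.extend(hist)                                 # concatenation step
    emb.append(deg[node])                                # add d(n_i)
    return emb

# Path graph 0-1-2-3-4: embedding of the middle node
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(structural_embedding(adj, 2))  # [0, 0, 2, 0, 2, 0, 0, 0, 2]
```

The resulting vector has length kmax · B + 1, and depends only on degree statistics of each hop ring, never on node identities, which is what keeps it position-independent.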
ni with its closest node from the other graph.

na ∈ G1,    n1, n2, n3, · · · ∈ G2

Using the one-to-many method, we are able to match one node to multiple highly probable nodes. However, we now have to introduce a new threshold hyperparameter. This parameter represents the maximum distance at which a node pair may be matched. Setting θ to a particular value is not an easy task: distance in the latent vector space depends directly on the hop level k, the bin count B of the histogram, the total number of nodes N, etc. A constant threshold may not work across different values of k, B, etc. To overcome this problem, we introduce the θ̄ concept, which is more consistent across different settings.

Q1: Are we able to catch all correct mappings, i.e. how many of the node mappings are true?

Q2: How confident are we in the matchings that we made during the alignment phase for different p?

Q3: Can we control the answers to these questions using θ̄ and w?
Table 2: Used definitions during experiment evaluation
4.4 Experiment on varying threshold

As mentioned before, one-to-set matching tends to match all highly probable pairs of nodes across the graphs. While this enables us to achieve higher recall by catching nearly all possible true positives at high threshold values, it may decrease the precision of our algorithm. We use the threshold ratio to control these outcomes. During the evaluation of the one-to-set matching, we use the same simulation environment with the same graph creation and embedding settings.

The one-to-many matching algorithm is more descriptive about how our alignment method works under the hood. Since we are able to see all potential pairs by raising our threshold value, we can also understand the distribution of the distances in the latent vector space. Therefore, rather than observing only a single piece of information (one node matched to another), we can now determine how other nodes are similarly embedded into the vector space.

using the neighbours of the nodes. This means that if a node has many more neighbours, our embedded features, which lie in the latent vector space, will be denser, and a denser vector may be more accurate about the structure. In addition, in the power-law graphs that we use while generating G, degrees are not uniformly distributed: there are many more low-degree nodes than high-degree nodes. If we assume that the embeddings of high-degree nodes are more similar to other embeddings with high degrees, then high-degree nodes will have fewer possible nodes to be matched.
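The one-to-many evaluation discussed above can be sketched as follows. How θ̄ is converted into an absolute distance θ is the paper's own definition; scaling by the mean pairwise distance below is purely our illustrative assumption, as are all names:

```python
import math

def one_to_many_align(emb1, emb2, theta_bar=0.5):
    """Sketch of one-to-many alignment: match each node of G1 to every
    node of G2 whose embedding lies within distance theta. Deriving
    theta from the ratio theta_bar by scaling with the mean pairwise
    distance is our own illustrative normalisation, not the paper's."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    pairs = [(u, v, dist(eu, ev)) for u, eu in emb1.items()
                                  for v, ev in emb2.items()]
    theta = theta_bar * sum(d for _, _, d in pairs) / len(pairs)
    matches = {}
    for u, v, d in pairs:
        if d <= theta:                       # v joins u's candidate set
            matches.setdefault(u, []).append(v)
    return matches

# Toy embeddings: node "a" is structurally close to both 1 and 2
emb1 = {"a": [0.0, 0.0], "b": [5.0, 5.0]}
emb2 = {1: [0.1, 0.0], 2: [0.0, 0.2], 3: [5.0, 5.1]}
print(one_to_many_align(emb1, emb2, theta_bar=0.3))  # {'a': [1, 2], 'b': [3]}
```

Raising theta_bar grows every candidate set, which is exactly the recall-versus-precision trade-off the experiment varies.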
algorithm by generating synthetic data in a simulation environment.

We report the accuracy of our algorithm as higher than 92% with the one-to-one alignment at a 3% noise level, and we see that even at high noise such as 7%, our algorithm is capable of finding more than 70% of all node pairs correctly.

We test our embedding method with one-to-many alignment to understand how our algorithm works under the hood. We see that with a low θ̄ our algorithm can preserve its high precision even under very high noise conditions, as can be seen from Figure 9. We also see that we can get high precision with low recall, or high recall with low precision. Therefore we can control type I and type II errors in our alignment process using different θ̄. As future work, we will try to increase our overall matching accuracy using the pairs matched in high-precision settings (at low θ̄). If we find some of the correctly matched pairs with a low θ̄, we may use this knowledge in the alignment process for the other pairs. We may also provide some "seeds" (starting points), which can then be used as input to state-of-the-art algorithms like [7].

References

[1] Francesco Corman, Rob M. P. Goverde, and Andrea D'Ariano. "Rescheduling Dense Train Traffic over Complex Station Interlocking Areas". In: Robust and Online Large-Scale Optimization: Models and Techniques for Transportation Systems. Ed. by Ravindra K. Ahuja, Rolf H. Möhring, and Christos D. Zaroliagis. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 369–386. isbn: 978-3-642-05465-5. doi: 10.1007/978-3-642-05465-5_16. url: https://doi.org/10.1007/978-3-642-05465-5_16.

[2] Daniel R. Figueiredo, Leonardo Filipe Rodrigues Ribeiro, and Pedro H. P. Saverese. "struc2vec: Learning Node Representations from Structural Identity". In: CoRR abs/1704.03165 (2017). arXiv: 1704.03165. url: http://arxiv.org/abs/1704.03165.

[3] Aditya Grover and Jure Leskovec. "node2vec: Scalable Feature Learning for Networks". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

[4] Mark Heimann, Haoming Shen, and Danai Koutra. "Node Representation Learning for Multiple Networks: The Case of Graph Alignment". In: CoRR abs/1802.06257 (2018). arXiv: 1802.06257. url: http://arxiv.org/abs/1802.06257.

[5] Lawrence K. Saul and Sam T. Roweis. "An introduction to locally linear embedding". Jan. 2001.

[6] E. Kazemi. Network Alignment: Theory, Algorithms, and Applications. École Polytechnique Fédérale de Lausanne, 2016. url: https://books.google.ch/books?id=HkETnQAACAAJ.

[7] Ehsan Kazemi, S. Hamed Hassani, and Matthias Grossglauser. "Growing a Graph Matching from a Handful of Seeds". In: Proc. VLDB Endow. 8.10 (June 2015), pp. 1010–1021. issn: 2150-8097. doi: 10.14778/2794367.2794371. url: http://dx.doi.org/10.14778/2794367.2794371.

[8] Mingdong Ou et al. "Asymmetric Transitivity Preserving Graph Embedding". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: ACM, 2016, pp. 1105–1114. isbn: 978-1-4503-4232-2. doi: 10.1145/2939672.2939751. url: http://doi.acm.org/10.1145/2939672.2939751.

[9] Lawrence Page et al. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66. Stanford InfoLab, 1999. url: http://ilpubs.stanford.edu:8090/422/.

[10] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online Learning of Social Representations". In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '14. New York, New York, USA: ACM, 2014, pp. 701–710. isbn: 978-1-4503-2956-9. doi: 10.1145/2623330.2623732. url: http://doi.acm.org/10.1145/2623330.2623732.

[11] Johan Ugander et al. "The Anatomy of the Facebook Social Graph". In: arXiv abs/1111.4503 (Nov. 2011).

[12] Daixin Wang, Peng Cui, and Wenwu Zhu. "Structural Deep Network Embedding". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: ACM, 2016, pp. 1225–1234. isbn: 978-1-4503-4232-2. doi: 10.1145/2939672.2939753. url: http://doi.acm.org/10.1145/2939672.2939753.