
Graph Alignment using Graph Embedding Techniques

Orçun Gümüş, William Trouleau, Farnood Salehi, Patrick Thiran, and Matthias Grossglauser

Information and Network Dynamics Laboratory (INDY), EPFL

Master semester project, Spring 2018

Abstract

The availability of networks has been increasing across scientific fields and the Internet since the 90s. Facebook uses graph-structured databases and network algorithms to query its data, and Google uses its PageRank algorithm to increase query accuracy [11]. However, even though state-of-the-art network algorithms perform well on some problems, they still fail on other tasks [4]. In particular, with today's increasing number of data sources, there is a need to handle multiple networks coming from different sources. In order to transfer information from one graph to another, being able to match nodes between networks is a necessity. However, different data sources generally do not share enough node attributes to make this process straightforward. This is a well-known problem called network alignment. In this report, we propose a new solution based on embeddings of networks in a low-dimensional vector space.

1 Introduction

Networks are powerful enough to capture plenty of information from relations. Using these relations, we can run various methods to extract different properties of the information encoded in a graph. From community detection to PageRank, numerous algorithms make use of networks [9]. Many of these algorithms run on a single network in order to obtain useful information from that graph.

In some scenarios, such as combining, merging or aligning graphs, handling multiple networks may be needed. In network alignment especially, the main objective is to find the correct matching between the nodes of different networks. In our case, we use unlabeled networks, which means that the edges and nodes carry no names. Therefore we need a matching algorithm that uses only the topology of the graph.

Suppose we work on friendship data from Facebook and from Instagram separately. As can be seen in Figure 1, nodes of these graphs that represent the same entity in the real world may not share the same attributes across data sources. For example, in the illustrated case we cannot match "Alan Turing" from Facebook to the Instagram account "@1010" using node labels. Applying the solution introduced here, we are able to match these nodes from different data sources even when we have no access to attributes such as names. We therefore want a method that matches the nodes of different networks using only the topology of the graphs.

Figure 1: An example of two aligned graphs. The blue and pink nodes and the relations between them come from different data sources; the mapping is shown by the blue arrows from left to right. We suppose that the blue nodes are collected from the Facebook platform with formal names, while the pink nodes are collected from the Instagram platform with more informal usernames. As a result, we are unable to match the proper names from Facebook with the usernames from Instagram.

In order to develop a solution that does not use node attributes, we transform the network from the graph space into a latent vector space by applying graph embedding techniques. Using this approach, we can measure distances between the nodes in a Euclidean space and map nodes across different graphs using these distances.

2 Problem Formulation

Let us now formally define the network alignment problem. Suppose we have two graphs G1 and G2 with vertex sets V1 and V2, respectively. Our aim is to come up with a matching between the nodes in V0 = V1 ∩ V2, where V0 is the set of vertices common to both graphs [6]. In our study, we assume that the vertex sets are exactly equivalent: V0 = V1 = V2.

Definition 2.1. (Network Alignment, π) A subset of V1 × V2 such that any node in V1 = {1, ..., n1} and V2 = {1, ..., n2} is matched to at most one node in the other graph [6]. We call the matching one-to-one when every node from V1 is matched with exactly one node from V2.

Another notion that we use is graph embedding, which transforms graph data into a latent vector space.

Definition 2.2. (Graph Embedding, w) Given a graph denoted as G = (V, E), network embedding aims to learn a mapping function f : n_i ↦ w_i ∈ R^d, where d ≪ |V| [12].

Through this transformation from the graph space to the latent vector space, the properties of the graph that are preserved vary between embedding algorithms. To explain the design choices in our embedding algorithm, we review some of the graph properties preserved by different algorithms.

For example, the well-known DeepWalk embedding algorithm preserves the proximity relations between nodes [4, 3]. This means that a node and its neighbours are likely to have similar latent-space representations: "close" nodes inside a graph tend to share similar features in the latent vector space, and nodes that are far apart in the graph are correspondingly placed far apart in the latent space, as can be seen in Figure 2. Since DeepWalk is designed to preserve these relations between the nodes, it is a successful tool for tasks like community detection.

Figure 2: On the right, the Zachary's karate club network; on the left, embeddings of the graph in 2 latent dimensions generated by DeepWalk, using the implementation described in [10].

Some other graph embedding algorithms preserve the node structure instead. The idea of node structure, first introduced by Leonardo F. R. Ribeiro et al. [2], is to capture symmetry: nodes are identified according to their relationship to other nodes. In embeddings that preserve only node structure rather than proximity, two neighbouring nodes may have very dissimilar features. From Figure 3 it can be understood that two far-away nodes, or even nodes from different graphs, can have the same representation in the embedding space.

Figure 3: (Figure 1 of [2]) An example of two structurally similar nodes (µ and ν) inside a single network. Even though the distance between µ and ν is large, their neighbours have similar degree distributions: node µ has 5 neighbours with degrees 4, 3, 3, 2, 2, and node ν has 4 neighbours with degrees 3, 3, 2, 2.

3 Contribution

Graph alignment becomes a harder problem when the topologies of the graphs differ: some edges in one graph may not exist in the other. We introduce a new alignment method that is able to overcome this kind of problem. To do so, we design an embedding algorithm that is more robust to noise than other embedding algorithms such as LLE, which describes itself as a neighbour-preserving embedding, or HOPE, which preserves high-order proximities of large-scale graphs [5, 8].

Figure 4: Effect of noise on different embedding methods. The distances are calculated in Euclidean space between the original embeddings and the embeddings of the noisy version. The y-axis shows 1 minus the ratio between the average distance from original to noisy embeddings and the overall average distance in that space. If this value is one, the distances between the original and the noisy versions are very small relative to the other distances in that space.

3.1 Network Embedding Algorithm

The main idea for extracting the structure of a node is to look at its neighbours' degree distribution, so that the distance in the latent vector space between these degree distributions is correlated with the structural difference.

Definition 3.1. (noise, p) The probability of removing an edge e ∈ E from the graph G(V, E).

We also use the notion of noise to quantify the difference between two graph topologies. Since we keep the same node set during alignment, we remove a subset of edges to create noise. As Definition 3.1 states, the noise is the edge-removal probability for the graph.

In other words, two nodes from different graphs that have very similar embeddings should also have similar structure. Using this property, we can measure structural similarity between nodes of different graphs while keeping the embeddings independent of the positions of the nodes inside the graphs.

Table 1: Symbols and definitions

Symbols    Definitions
G(V, E)    Graph with vertex set V and edge set E
X          Embeddings of the graph G
N          Number of nodes
n_j        Node j
em_j       Embedding of the node n_j
X_j^k      Partial embedding of the node n_j for hop k
d_j        Node degree of n_j
d_max      d_j of the maximum-degree node n_j
d_min      d_j of the minimum-degree node n_j
B          Number of buckets (bins) per hop-distance histogram
R_i^k      The set {n_j ∈ G | dist(n_i, n_j) = k} of nodes at hop distance k from n_i
k_max      Maximum hop distance taken into account
θ          Threshold in the alignment method
θ̄          Threshold ratio in the alignment method
w          Weight controlling the importance of closer neighbours
⌢          Vector concatenation

Given these properties, we come up with the following node embedding algorithm, which preserves node structure. Let G = (V, E), and let the set R_i^k denote the set of nodes linked to n_i ∈ V by a path of length exactly k. To obtain the node embedding of n_i, we use histograms of the degree distributions of R_i^k for k ∈ 1..k_max. We concatenate (⌢; the concatenation of 123 and 456 is 123456) these histogram vectors and append d(n_i) as a feature to obtain the embedding of n_i, as illustrated in Equation (1).

X_i = X_i^1 ⌢ X_i^2 ⌢ ... ⌢ X_i^{k_max} ⌢ d(n_i)    (1)

Algorithm 1: Embedding Algorithm
Input: G(V, E), the graph to be embedded; B, the histogram length; k_max, the maximum distance taken into account
Output: X, the embedding of the graph G
1 for n_i in G do
2   for k ← 1 to k_max do
3     X_i^k ← B-bin histogram of the degrees in R_i^k
      X_i ← X_i ⌢ w^k × X_i^k
4   end
5   X_i ← X_i ⌢ d(n_i)
6 end
7 return X

With Algorithm 1, the embedding space has 1 + B × k_max dimensions. The 1 in the formula accounts for the degree of the respective node, and B is the bin count of the histograms. To increase the importance of closer neighbours, we introduce a weight hyperparameter w that simply multiplies the partial embeddings X_i^k at each distance level k, as illustrated in Equation (2).

X_i = w^1 × X_i^1 ⌢ ... ⌢ w^{k_max} × X_i^{k_max} ⌢ d(n_i)    (2)
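For concreteness, the following minimal Python sketch implements Algorithm 1 with the weighting of Equation (2). It assumes networkx and numpy are available; the function name embed_graph and the choice of shared histogram bins are our illustration, not the report's reference implementation:

    import networkx as nx
    import numpy as np

    def embed_graph(G, B=8, k_max=2, w=1.0):
        # Shared histogram bins over the degree range, so that embeddings
        # of different graphs stay comparable (our assumption).
        d_max = max(dict(G.degree()).values())
        bins = np.linspace(0, d_max, B + 1)
        X = {}
        for n_i in G.nodes():
            # Shortest-path distances from n_i, truncated at k_max hops.
            dist = nx.single_source_shortest_path_length(G, n_i, cutoff=k_max)
            parts = []
            for k in range(1, k_max + 1):
                # R_i^k: nodes at distance exactly k; take their degrees.
                ring = [G.degree(n_j) for n_j, d in dist.items() if d == k]
                hist, _ = np.histogram(ring, bins=bins)
                parts.append((w ** k) * hist)         # weighted partial embedding X_i^k
            parts.append(np.array([G.degree(n_i)]))   # append d(n_i), Equation (1)
            X[n_i] = np.concatenate(parts)            # 1 + B * k_max dimensions
        return X

With B = 2, k_max = 1 and w = 1, this corresponds to the setting used for our algorithm in Figure 5.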
Since the designed embedding algorithm captures the structure of the nodes, it is more robust to noise than proximity-based algorithms. As can be seen in Figure 5, most of the nodes keep exactly the same embedding when we remove some of the edges. On the other hand, the embeddings generated by DeepWalk vary much more with the noise: nearly all the nodes change their position in the latent space when we remove only 7 edges from the graph.

Figure 5: On the left, the Zachary's karate club network, with the removed edges labelled with red X. On the top right, embeddings generated by DeepWalk in a 2-dimensional latent space [10]. The original embeddings (without edge removal) are painted in transparent colours; the nodes from the noisy graph (after edge removal) are painted in solid colours. Arrows show the distance between the original and noisy versions. The same process is applied to our algorithm, and the result is shown on the bottom right, where B is 2, k_max is 1 and w is 1.

3.2 Network Alignment Algorithm

In the previous section, we introduced a graph embedding algorithm that preserves node structures. In this section, we introduce our graph alignment method.

Measuring node distances across graphs. We use distances between nodes in the latent vector space generated by the embedding. Distances are calculated using the squared L2 distance as follows:

dist(n1, n2) = ‖em_1 − em_2‖₂²    (3)

Using this distance measure, it is possible to come up with a straightforward method that maps each node to its closest node in the other graph.

3
Let n_a ∈ G1 and n_1, n_2, n_3, ... ∈ G2. Then

mapping(n_a) = argmin_{n_i ∈ G2} dist(n_i, n_a)    (4)

Using Equation (4), every node is mapped to the most similar node of the other graph. However, this mapping is not injective: it can leave some nodes of G2 without any matched node from G1, or match some nodes of G2 more than once with different nodes from G1.
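As a sketch, Equation (4) is a row-wise argmin over the pairwise distance matrix of Equation (3); here X1 and X2 are assumed to be numpy arrays whose rows are the node embeddings of G1 and G2:

    import numpy as np
    from scipy.spatial.distance import cdist

    def nearest_neighbour_mapping(X1, X2):
        # Pairwise squared L2 distances, Equation (3).
        D = cdist(X1, X2, metric="sqeuclidean")
        # Equation (4): for every node of G1, the index of the closest
        # node of G2. Nothing forces this map to be injective.
        return D.argmin(axis=1)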
To obtain a mapping that is one-to-one, we implemented a greedy algorithm that iterates over the pairwise distances. It sorts the distances from minimum to maximum and matches nodes greedily from G1 to G2 starting from the minimum distance; Algorithm 3 can be found in the appendix.
Algorithm 2: Alignment using embeddings
Input: G1, G2, the two graphs to be aligned; the matching algorithm
Output: a matching dictionary, one-to-one or one-to-set depending on the matching algorithm used
1 X1 ← embeddings of G1 using Algorithm 1;
2 X2 ← embeddings of G2 using Algorithm 1;
3 X1, X2 ← normalize the embeddings X1, X2 column-wise;
4 D ← compute distance matrix d(X1, X2);
5 matching ← apply Algorithm 3 to D;
6 return matching;

Overall, Algorithm 2 first computes the embeddings of the two graphs, then takes the pairwise node distances between these embeddings, and finally applies the matching function of Algorithm 3 to those distances. A sketch of this pipeline is given below.
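The following is a minimal sketch of the whole pipeline, reusing the embed_graph sketch from Section 3.1 and inlining the greedy matching of Algorithm 3. The max-based column normalization is our reading of line 3 of Algorithm 2 (the report does not specify the normalization), and all names are our own:

    import numpy as np
    from scipy.spatial.distance import cdist

    def align(G1, G2, B=8, k_max=2, w=1.0):
        nodes1, nodes2 = list(G1.nodes()), list(G2.nodes())
        E1, E2 = embed_graph(G1, B, k_max, w), embed_graph(G2, B, k_max, w)
        X1 = np.array([E1[n] for n in nodes1], dtype=float)
        X2 = np.array([E2[n] for n in nodes2], dtype=float)
        # Line 3 of Algorithm 2: column-wise normalization of both embeddings.
        scale = np.maximum(np.vstack([X1, X2]).max(axis=0), 1e-12)
        X1, X2 = X1 / scale, X2 / scale
        D = cdist(X1, X2, metric="sqeuclidean")   # Equation (3)
        # Algorithm 3: scan every (i, j) pair from smallest to largest
        # distance; keep a pair only if neither node is matched yet.
        order = np.dstack(np.unravel_index(np.argsort(D, axis=None), D.shape))[0]
        matched1, matched2, mapping = set(), set(), {}
        for i, j in order:
            if i not in matched1 and j not in matched2:
                mapping[nodes1[i]] = nodes2[j]
                matched1.add(i)
                matched2.add(j)
        return mapping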
Selecting only the most similar node is the most intuitive, straightforward approach to the graph alignment problem. Since the embedding algorithm preserves structural similarity, the alignment method maps the nodes with the most similar structures across the graphs. However, it may fail to match two nodes even when their latent vectors are very similar, because a one-to-one matching can match each node to exactly one other node and therefore discards the information from the other matching possibilities. To overcome this problem, we introduce a one-to-many matching function, which decreases the miss ratio during the alignment process and avoids losing any highly likely matches. In this method, rather than defining the matching function node-to-node, we define it as a node-to-set function:

mapping(n_a) = {n_i ∈ G2 | dist(n_i, n_a) < θ}    (5)

Using the one-to-many method, we are able to match one node to multiple likely candidates. However, we now have to introduce a new threshold hyperparameter θ, which represents the maximum distance at which a node pair is still matched. Setting θ to a particular value is not an easy task: distances in the latent vector space depend directly on the level k, the bin count B of the histograms, the total number of nodes N, etc., so a constant threshold may not work across different values of k, B, and so on. To overcome this problem, we introduce the threshold ratio θ̄, which is more consistent across different settings:

θ = θ̄ × max_{n1 ∈ G1, n2 ∈ G2} dist(n1, n2)    (6)

With θ̄ as the hyperparameter, the choice of threshold is no longer tied to the absolute scale of distances in the latent vector space. This approach lets us control θ without taking the distribution of distances into account. θ̄ will be used to assess the performance of our algorithm in the following sections.
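As a sketch, Equations (5) and (6) translate into a few lines on top of the distance matrix D computed in the pipeline above (again with names of our own choosing):

    import numpy as np

    def one_to_many_mapping(D, theta_bar):
        # Equation (6): absolute threshold derived from the ratio and
        # the maximum pairwise distance in the latent space.
        theta = theta_bar * D.max()
        # Equation (5): map every node i of G1 to the set of nodes of G2
        # that lie closer than theta.
        return {i: set(np.nonzero(row < theta)[0]) for i, row in enumerate(D)}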

4 Results

4.1 Simulation Setup

In order to test our embedding and alignment algorithms, we designed a simulation environment. We formed the simulation in such a way that it imitates real-world scenarios. For example, suppose a hypothetical group of people uses several different social networks. In each social network, some of the relations may be missing even though the underlying real friendship relations are the same, so each social network can be seen as a noisy version of the friendship relations. To illustrate this type of scenario, we conduct our simulation with the following steps (a sketch of the setup is given after the list):

1. Create a power-law graph G with N nodes and e edges.

2. Create multiple noisy graphs Gi by removing edges with probability p from the graph generated in the first step.

3. Align the noisy graphs using the introduced algorithm.

4. Count the correct and false matchings between the Gi using the ground truth extracted from G.
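A minimal sketch of this loop could look as follows. We assume a Barabási–Albert generator as the power-law graph model (the report only specifies "a power-law graph"; with N = 1000 and m = 10 this yields roughly the 10000 edges used in our experiments) and reuse the align sketch from Section 3.2:

    import random
    import networkx as nx

    def noisy_copy(G, p):
        # Step 2: remove every edge independently with probability p.
        H = G.copy()
        H.remove_edges_from([e for e in G.edges() if random.random() < p])
        return H

    def run_simulation(N=1000, m=10, p=0.05):
        G = nx.barabasi_albert_graph(N, m)           # step 1: power-law graph
        G1, G2 = noisy_copy(G, p), noisy_copy(G, p)  # step 2: two noisy views
        mapping = align(G1, G2)                      # step 3: align the noisy graphs
        # Step 4: node identities are shared across G1 and G2, so a
        # matching is correct exactly when a node is mapped to itself.
        correct = sum(1 for u, v in mapping.items() if u == v)
        return correct / N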
We evaluate our algorithm using this simulation setup with different noise levels p, threshold ratios θ̄, and weights w. In all tests, we use the same B = 8 and k_max = 2. We run multiple simulations with the same settings to decrease the effect of randomness: each simulation contains 5000 different graphs G, and the results are produced by averaging the outcomes of these runs. Unless specified otherwise, all G have 1000 nodes and 10000 edges. By reviewing the results of the simulations, we try to answer the questions listed below:

Q1: Are we able to catch all correct mappings? How many of the node mappings are true?

Q2: How confident are we in the matchings that we make during the alignment phase for different p?

Q3: Can we control the answers to these questions using θ̄ and w?

Table 2: Definitions used during experiment evaluation

tp: True positive, an outcome where the model correctly predicts the positive class.
fp: False positive, an outcome where the model incorrectly predicts the positive class.
fn: False negative, an outcome where the model incorrectly predicts the negative class.
P: Precision, tp/(tp + fp)
R: Recall, tp/(tp + fn)

Figure 6: Examples of tp, fp and fn in a latent embedding space. Blue and pink nodes come from different graphs, and the one-to-many algorithm is in use with a given threshold. Inside a circle means the model predicts positive; outside the circles means the model predicts negative.

To evaluate the performance of the experiments, we use the true positive, false positive and false negative counts, defined as follows. Suppose µ and ν are nodes from different graphs. If µ and ν are matched by our algorithm and they are the same node in the main graph, we count the pair as a true positive. If µ and ν are matched by the algorithm but are not the same node in the main graph, we count the pair as a false positive. If µ and ν are not matched by the algorithm but are actually the same node in the main graph, we count the pair as a false negative. A sketch of this bookkeeping follows.
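In code, these counts can be accumulated directly from a one-to-many mapping, since the ground truth in our simulation is that node i of one graph corresponds to node i of the other (a sketch under our naming):

    def precision_recall(mapping, nodes):
        # tp: node i appears in its own candidate set; fn: it does not.
        tp = sum(1 for i in nodes if i in mapping.get(i, set()))
        fn = len(nodes) - tp
        # fp: every claimed candidate other than the true node itself.
        fp = sum(len(mapping.get(i, set()) - {i}) for i in nodes)
        P = tp / (tp + fp) if tp + fp > 0 else 0.0
        R = tp / (tp + fn) if tp + fn > 0 else 0.0
        return P, R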

4.2 Experiment on varying noise

The one-to-one matching approach forces our alignment algorithm to match every single node from G1 to precisely one node from G2. This means that the alignment algorithm produces exactly as many matchings as there are nodes.

High noise on the networks makes it harder to match the nodes correctly: when the noise increases, the total number of removed edges also increases, the removed edges change the topology of the graph, and the success of our algorithm decreases, as expected. We see this phenomenon in Figure 7: the accuracy is lower at higher noise values, decreasing steadily along the noise axis. However, even at a high noise level like 10%, which means that one in ten edges is removed from each graph, the algorithm is capable of correctly matching more than 50% of all nodes.

Figure 7: The average accuracy (all node pairs correctly matched by the algorithm divided by all possible correct node pairs) using the one-to-one matching Algorithm 3 over varying p. Embeddings generated using B = 8, k_max = 2.

4.3 Experiment on varying weight

For the embedding creation, we use the histogram of the neighbour degree distribution as explained in Section 3.1, and we associate different weights with different distances in the graph. The weight concept was introduced to control how much the closeness of a neighbour contributes to the node embeddings. Therefore, if we want to increase the importance of the "far" neighbours during embedding, we use higher weights; conversely, if we want to increase the effect of the closest neighbours, we use lower weights. As shown in Figure 8, the precision of our algorithm has its maximum near w = 1 and decreases when w is lower. Not using the weight parameter at all is exactly the same as setting the weight to 1, and we see that we obtain better precision when we do not use weights. However, we keep the weight in order to use it with higher k_max values.

Figure 8: The average precision of our graph alignment algorithm over varying w, with different p visualized in different colors. A lower w decreases the effect of R^k for larger k values. Since k_max is 2, we use only R^1 and R^2, and if w equals 1 the effects of R^1 and R^2 are the same.

4.4 Experiment on varying threshold

As mentioned before, one-to-many matching tends to match all likely pairs of nodes across the graphs. While this enables us to obtain a higher recall by catching nearly all possible true positives at high threshold values, it may decrease the precision of our algorithm. We use the threshold ratio to control this trade-off. During the evaluation of one-to-many matching, we use the same simulation environment with the same graph creation and embedding settings.

The one-to-many matching algorithm is more descriptive about how our alignment method works under the hood. Since we can see all potential pairs by raising the threshold value, we can also understand the distribution of the distances in the latent vector space. Therefore, rather than observing a single piece of information (one node mapped to another), we can now determine which other nodes are embedded similarly in the vector space.

We can expect that when θ̄ increases, the recall also increases, since the algorithm starts to capture more correct pairs; in other words, we start to find pairs that we could not find before. However, there is no progress without sacrifice: when we increase the threshold ratio, our alignment method claims more false positives, i.e., pairs matched by our alignment method that are in fact not the same node in G.

As can be seen in Figure 9, when θ̄ increases, the recall increases and the precision decreases. This means that we can use θ̄ to adapt the behaviour of our algorithm to different needs. If we need to find a few correct matchings with a low error rate, we can use a low θ̄; this may be the case when we want to be sure about what is claimed as matched (low type I error). If we need to find all possible correct matchings, we can use a high θ̄ (low type II error).

Figure 9: The precision and recall of one-to-many matching over varying θ̄. The total number of nodes in the networks is 1000, p = 0.03, B = 8, k_max = 2 and w = 1. Precision is tp/(tp + fp); recall is tp/(tp + fn). Results are averaged over 100 simulations run with the same settings.

4.5 Experiment on different degrees

We also investigate the success of our algorithm for different node degrees. We construct our embeddings using the neighbours of the nodes. This means that if a node has many more neighbours, the embedded features in the latent vector space are denser, and a denser vector may describe the structure more accurately. In addition, in the power-law graphs that we use while generating G, the degrees are not uniformly distributed: there are many more low-degree nodes than high-degree nodes. If we assume that the embeddings of high-degree nodes are similar mostly to other high-degree embeddings, then high-degree nodes have fewer candidate nodes to be matched with.

As can be seen in Figure 10, the precision of our algorithm on high degrees is greater than on nodes of moderate degree rank. However, the precision of our algorithm also increases for very-low-degree nodes, since the embeddings of these nodes become very sparse and sharing the same features becomes very informative in that case.

Figure 10: The cumulative tp/(tp + fp) of one-to-many matching over varying node degree. The cumulation starts from the highest degree and goes to the lowest. The total number of nodes in the networks is 1000, and B = 8, k_max = 2, w = 1 and p = 0.3. Different θ̄ are visualized in different colors. Results are averaged over 100 simulations run with the same settings.

5 Conclusion

We introduce a graph embedding algorithm that preserves several node properties which are useful for identifying possibly identical nodes across different graphs. While designing our embedding algorithm, we focus on transporting node structures from the graph space to the embedding space. To do so, we extract the histogram of the degree distribution of the neighbours at various distances for each node. Using this approach, we preserve the structural information of the nodes inside the latent space.

After determining our embedding method, we formulate our alignment algorithm based on this embedding. We calculate the L2 distances of the cross-network latent vectors and match the nodes based on the similarities in the embedding space. We introduce one-to-one and one-to-many matching ideas and compare our embedding method under these different alignment methods. We test the success of the introduced alignment algorithm by generating synthetic data in a simulation environment.

We report an accuracy of our algorithm higher than 92% with the one-to-one alignment at a 3% noise level, and we see that even at a high noise level like 7%, our algorithm is capable of finding more than 70% of all node pairs correctly.

We also test our embedding method with the one-to-many alignment to understand how our algorithm works under the hood. We see that with a low θ̄ our algorithm may preserve its high precision even under very high noise conditions, as can be seen in Figure 9. We also see that we can get high precision with low recall, or high recall with low precision; therefore we can control type I and type II errors in our alignment process using different θ̄. As future work, we will try to increase our overall matching accuracy by using the pairs matched in high-precision settings (i.e., at low θ̄). If we find some of the correctly matched pairs with a low θ̄, we may use this knowledge in the alignment process for the other pairs. We may also provide some "seeds" (starting points), which can then be used as input to state-of-the-art algorithms like [7].

References

[1] Francesco Corman, Rob M. P. Goverde, and Andrea D'Ariano. "Rescheduling Dense Train Traffic over Complex Station Interlocking Areas". In: Robust and Online Large-Scale Optimization: Models and Techniques for Transportation Systems. Ed. by Ravindra K. Ahuja, Rolf H. Möhring, and Christos D. Zaroliagis. Berlin, Heidelberg: Springer Berlin Heidelberg, 2009, pp. 369–386. isbn: 978-3-642-05465-5. doi: 10.1007/978-3-642-05465-5_16. url: https://doi.org/10.1007/978-3-642-05465-5_16.

[2] Daniel R. Figueiredo, Leonardo Filipe Rodrigues Ribeiro, and Pedro H. P. Saverese. "struc2vec: Learning Node Representations from Structural Identity". In: CoRR abs/1704.03165 (2017). arXiv: 1704.03165. url: http://arxiv.org/abs/1704.03165.

[3] Aditya Grover and Jure Leskovec. "node2vec: Scalable Feature Learning for Networks". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016.

[4] Mark Heimann, Haoming Shen, and Danai Koutra. "Node Representation Learning for Multiple Networks: The Case of Graph Alignment". In: CoRR abs/1802.06257 (2018). arXiv: 1802.06257. url: http://arxiv.org/abs/1802.06257.

[5] Lawrence K. Saul and Sam T. Roweis. "An Introduction to Locally Linear Embedding". Jan. 2001.

[6] E. Kazemi. Network Alignment: Theory, Algorithms, and Applications. École Polytechnique Fédérale de Lausanne, 2016. url: https://books.google.ch/books?id=HkETnQAACAAJ.

[7] Ehsan Kazemi, S. Hamed Hassani, and Matthias Grossglauser. "Growing a Graph Matching from a Handful of Seeds". In: Proc. VLDB Endow. 8.10 (June 2015), pp. 1010–1021. issn: 2150-8097. doi: 10.14778/2794367.2794371. url: http://dx.doi.org/10.14778/2794367.2794371.

[8] Mingdong Ou et al. "Asymmetric Transitivity Preserving Graph Embedding". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: ACM, 2016, pp. 1105–1114. isbn: 978-1-4503-4232-2. doi: 10.1145/2939672.2939751. url: http://doi.acm.org/10.1145/2939672.2939751.

[9] Lawrence Page et al. The PageRank Citation Ranking: Bringing Order to the Web. Technical Report 1999-66. Stanford InfoLab, 1999. url: http://ilpubs.stanford.edu:8090/422/.

[10] Bryan Perozzi, Rami Al-Rfou, and Steven Skiena. "DeepWalk: Online Learning of Social Representations". In: Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '14. New York, New York, USA: ACM, 2014, pp. 701–710. isbn: 978-1-4503-2956-9. doi: 10.1145/2623330.2623732. url: http://doi.acm.org/10.1145/2623330.2623732.

[11] Johan Ugander et al. "The Anatomy of the Facebook Social Graph". In: arXiv:1111.4503 (Nov. 2011).

[12] Daixin Wang, Peng Cui, and Wenwu Zhu. "Structural Deep Network Embedding". In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD '16. San Francisco, California, USA: ACM, 2016, pp. 1225–1234. isbn: 978-1-4503-4232-2. doi: 10.1145/2939672.2939753. url: http://doi.acm.org/10.1145/2939672.2939753.


A One-to-one Matching Algorithm

Algorithm 3: Bijection Matching. An algorithm that matches every node to exactly one node according to the distances between them
Input: A set of distances with node indexes D = {(d1, ν1, µ1), (d2, ν1, µ2), . . . , (d_{n²}, ν_n, µ_n)}
Output: A bijection dictionary that maps every node of one graph to a node of the other, f : {µ →(1:1) ν}
1 mapping ← ∅;
2 for d, over all couples µ ∈ G1, ν ∈ G2, from the lowest distance to the highest do
3   if µ not matched before and ν not matched before then
4     mapping ← mapping ∪ (µ, ν)
5 end
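A minimal Python sketch of Algorithm 3, assuming D is given as an iterable of (distance, ν, µ) triples as in the input specification:

    def bijection_matching(D):
        # Scan all candidate pairs from the lowest distance to the highest
        # and keep a pair only if neither endpoint was matched before.
        mapping, matched_nu = {}, set()
        for _, nu, mu in sorted(D, key=lambda t: t[0]):
            if mu not in mapping and nu not in matched_nu:
                mapping[mu] = nu
                matched_nu.add(nu)
        return mapping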

B One-to-Many Matching Algorithm

Algorithm 4: An algorithm that matches every node to the set of nodes at a distance smaller than the threshold
Input: Threshold ratio; a set of distances with node indexes D = {(d1, ν1, µ1), (d2, ν1, µ2), . . . , (d_{n²}, ν_n, µ_n)}
Output: A dictionary that maps every node of one graph to a set of nodes of the other, f : {µ →(1:n) {ν1, ν2, . . . , νi}}
1 threshold ← threshold ratio × maximum distance;
2 mapping ← ∅;
3 for d, over all couples µ ∈ G1, ν ∈ G2 with d lower than the threshold do
4   mapping ← mapping ∪ (µ, ν)
5 end
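And a corresponding sketch of Algorithm 4 under the same input convention:

    def one_to_many_matching(D, threshold_ratio):
        # Line 1: absolute threshold from the ratio and the maximum distance.
        threshold = threshold_ratio * max(d for d, _, _ in D)
        mapping = {}
        for d, nu, mu in D:
            if d < threshold:
                # Collect every ν closer than the threshold for each µ.
                mapping.setdefault(mu, set()).add(nu)
        return mapping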
