Professional Documents
Culture Documents
representation of graphs
Fürst, Luka
1 1 1
A A A A A
2 F B 2 h1, 4i F B h1, 4i 2 F B 2 h1, 4i F B h1, 4i 2 F B 2
1 1
A A A
3 F B 2 F B 3 F B 2
4 E C 4 h3, 6i E C h2, 6i 5 E C 4
D D D
6 6
turns false, the function BreakATie assigns a The function Proceed thus performs in time
unique (and hence final) number to some vertex O(n2 log n).
in a set of vertices with equal non-unique num-
bers. Alternatively, if Proceed terminates By combining these observations, we readily
with result true, it has either assigned a unique arrive at the following result:
number to at least one vertex or increased the Theorem 4. The time complexity of Algorithm
numbers of at least one non-singleton set of 1 is O(n3 log n).
vertices. Although the numbers in this set are
not unique, at least one vertex from the set will Proof. Since each iteration of the loop in the
retain its number until the end. function Main assigns a final number to at
least one vertex, the function performs at
Lemma 3. Each iteration of the function most n iterations. Since each iteration takes
Main runs in time O(n2 log n). O(n2 log n) time, we can conclude that the ro-
Proof. For each vertex w with a non-unique bust vertex numbering algorithm runs in time
number, the function Proceed lists its neigh- O(n3 log n).
bors and sorts their numbers to obtain T (w).
Since there are at most n neighbors, this can Experimental results
be achieved in time O(n log n) per vertex and
O(n2 log n) for the entire graph. The next Algorithm 1 renumbers the vertices of a given
step, i.e., finding the position of the sequence graph G in such a way that the pseudo-
T (w) within the lexicographically ordered se- canonical string z(G) is likely to be the same re-
quence of sequences T (1), T (2), . . . , T (n), gardless of the initial vertex numbering. Such a
can be performed by sorting. Since each string z(G) can thus be regarded as a canonical
of the sequences T (1), . . . , T (n) has O(n) representation for the graph G. To determine
elements, sorting the sequence of these se- whether Algorithm 1 gives rise to a numbering-
quence takes O(n2 log n2 ) = O(n2 log n) time. invariant string z(G) for a given graph G, we
run Algorithm 1 and build a string z(G) for
each of the n! possible initial numberings of the
graph vertices. If all resulting strings are equal,
the renumbering algorithm can be said to ‘work’
Algorithm 1 Robust renumbering of graph ver- for the graph G, and the resulting string can be
tices. considered to be a canonical representation for
1: function Main(G = (V, E))
G.
2: for all v ∈ V do As we already know, the function Main
3: id(v) := 1 performs at most n iterations, but the n-th iter-
4: end for ation never changes any number. (For example,
5: repeat if the graph has two vertices, the first iteration
6: changed := Proceed(G, id) changes the number 1 to 2 for one of the ver-
7: finished := (∀u, v ∈ V : id(u) 6= id(v)) tices, whereas the second only finds out that
8: if ¬finished ∧ ¬changed then both numbers are now unique.) If the algorithm
9: BreakATie(G, id) is terminated before the last iteration that ac-
10: end if tually changes something, the numbers of the
11: until finished vertices will not be unique, so we will have to
12: return id distinguish between vertices with equal num-
13: end function
bers. It makes sense to update the numbers
14: function Proceed(G = (V, E), id) of those vertices so that their order matches
15: id ′ := id the initial vertex numbering. For example, if
16: for all k ∈ {1, 2, . . . , |V |} do the algorithm running on the 6-cycle (Figure
17: W := Peers(V, id ′ , k) 3) were stopped upon obtaining the numbering
18: if |W | > 1 then {A 7→ 1, B 7→ 2, C 7→ 4, D 7→ 4, E 7→ 4,
19: for all w ∈ W do F 7→ 2}, that numbering would be updated to
20: T0 := hid ′ (w′ ) | w′ ∈ N (w)i {A 7→ 1, B 7→ 2, C 7→ 4, D 7→ 5, E 7→ 6,
21: T (w) := Sort(T0 ) F 7→ 3}.
22: end for We will call a graph i-canonicalizable if
23: for all w ∈ W do for all n! initial vertex numberings, all of the
24: P := {w′ ∈ W | T (w′ ) ≺ T (w)} numberings obtained after running the func-
25: id(w) := |P | + 1 tion Main for i iterations give rise to the
26: end for same pseudo-canonical string z(G). Con-
27: end if versely, if there are at least two initial num-
28: end for berings leading to at least two distinct strings
29: return (id 6= id ′ ) z(G) after performing i iterations of the func-
30: end function tion Main, the graph will be called non-i-
31: function BreakATie(G = (V, E), id) canonicalizable. A graph is canonicalizable if
32: ∗
k := min{k | |Peers(V, id, k)| > 1} it is n-canonicalizable and non-canonicalizable
33: W := Peers(V, id, k ∗ ) if it is non-n-canonicalizable. The fact that the
34: for all w ∈ W \ {min(W )} do numbering algorithm is polynomial implies the
35: id(w) := id(w) + 1 existence of non-canonicalizable graphs; if such
36: end for graphs did not exist, Algorithm 1 could be used
37: end function to solve the graph isomorphism problem in poly-
38: function Peers(V , id, k) nomial time. However, as we shall see, such
39: return {v ∈ V | id(v) = k} graphs are rare.
40: end function We tested our algorithm using the entire
set of simple undirected graphs with up to
9 vertices. Let Gn denote the set of simple
undirected graphs with n vertices. For each
n ∈ {2, . . . , 9} and i ∈ {1, . . . , n − 1}, Table
D The results of the experimental validation
A of our approach are gathered in Table 2. In ad-
E dition to time consumption in milliseconds (tPC
B for Algorithm 1 and tC for the canonicalization
F algorithm in the Sage system), we also show
C the number of iterations (N ) of the loop in the
G function Main. As expected, the polynomial-
time renumbering algorithm was almost in all
cases faster than the worst-case exponential al-
Figure 4: The graph G∗ : the smallest non-
gorithm in Sage. The only exception was the
canonicalizable graph.
(fairly dense) graph Advogato; however, to an-
swer the question why, in this particular case,
our algorithm takes more time than the one in
1 shows the number of non-i-canonicalizable Sage would require a more thorough investiga-
graphs within the set Gn . The graph G∗ in Fig- tion. Because of a large number of vertices
ure 4 is the smallest non-canonicalizable graph and enormous number of possible initial ver-
and the only such member of the set G7 . For ex- tex numberings, it is untractable to determine
ample, the initial numbering {A 7→ 1, B 7→ 2, whether the computed pseudo-canonical strings
C 7→ 3, D 7→ 4, E 7→ 5, F 7→ 6, G 7→ 7} can be regarded as canonical representations,
results in the final numbering {A 7→ 1, B 7→ 6, although, after trying out numerous random
C 7→ 7, D 7→ 2, E 7→ 4, F 7→ 3, G 7→ 5} permutations, it appears likely to be so.
and, in turn, in the pseudo-canonical string
111100/10011/0011/111/11/0, while the ini-
Conclusion
tial numbering {A 7→ 2, B 7→ 3, C 7→ 4,
D 7→ 1, E 7→ 5, F 7→ 6, G 7→ 7} leads to the We introduced a method for constructing a
numbering {A 7→ 1, B 7→ 3, C 7→ 4, D 7→ 5, pseudo-canonical representation of a graph.
E 7→ 6, F 7→ 2, G 7→ 7} and the represen- The method is based on an algorithm for
tation 111100/11100/0011/011/11/1. As we renumbering the graph vertices in a manner
can see, the graph G∗ is regular; all vertices that is robust to the initial numbering of the
have the same degree. Unsurprisingly, regu- vertices. Initially, the vertices are all assigned
lar graphs pose the greatest challenge to the number 1. Subsequently, in each iteration, the
renumbering algorithm. ties are broken by sorting the vertices in the or-
In another set of experiments, we built der of increasingly ordered sequences of their
pseudo-canonical strings for selected real-world neighbors’ numbers. If no tie is broken in this
graphs from the datasets KONECT [11] and way, one vertex in a group of equally numbered
SNAP [13]. In the pre-processing stage, we re- vertices retains its number, but all others in the
moved all vertex labels, changed directed edges same group are assigned a number higher by 1.
to undirected, removed loops, and turned every The initially strictly local neighborhood data
occurrence of multiple edges between the same thus iteratively propagate through the graph.
pair of vertices into a single edge. This time, Since each iteration of the algorithm assigns a
we were interested in the performance of our final number to at least one vertex, the algo-
renumbering algorithm rather than the canon- rithm runs for at most n iterations, with each
icalizability of individual graphs. In particular, iteration taking time O(n2 log n) owing to sort-
we compared the performance of Algorithm 1 ing O(n) sequences of size O(n).
with the algorithm for computing true canonical The numbering produced by the renum-
strings that is part of the Sage system. Algo- bering algorithm immediately gives rise to
rithm 1 was implemented in the C programming a pseudo-canonical string (if two strings are
language. All experiments were carried out on a equal, the corresponding graphs are isomor-
computer with a 8-core Intel Core i7-3770 pro- phic but not necessarily the other way around).
cessor operating at a clock rate of 3,40 GHz. However, in many cases, as shown by the exper-
Table 1: The number of non-i-canonicalizable simple undirected graphs with n vertices.
i
n |Gn | 1 2 3 4 5 6 7 8
2 1 0
3 2 0 0
4 6 2 1 0
5 21 12 7 3 0
6 112 88 52 27 7 0
7 853 781 427 235 89 20 1
8 11 117 10 873 6032 2780 1164 319 52 19
9 262 180 260 105 145 392 44 952 16 295 4691 899 201 113
iments, the pseudo-canonical string is invariant concept is closely related to that of exploratory
to the initial numbering of the vertices and can equivalence [15, 7]. It is thus worthy to de-
be regarded as a canonical representation (two termine the exact relationship between the two
strings are equal if and only if the correspond- concepts and to investigate if and how one
ing graphs are isomorphic). In particular, out could benefit another.
of 274 292 simple undirected graphs with up to
9 vertices, only for 133 it holds that there are References
at least two initial numberings that give rise to
distinct pseudo-canonical strings. [1] László Babai. Graph isomorphism in quasipoly-
nomial time [extended abstract]. In Symposium
Our approach to building pseudo-canonical
on Theory of Computing (STOC 2016), Cam-
strings is related to graph symmetries. For in- bridge, MA, USA, pages 684–697, Cambridge,
stance, in the fully symmetric graph K6 (the Massachusetts, USA, 2016.
complete graph on 6 vertices), it is immate- [2] László Babai and Eugene M. Luks. Canonical label-
rial how we number the vertices; the pseudo- ing of graphs. In Symposium on Theory of Com-
puting (STOC 1983), Boston, MA, USA, pages
canonical string will be the same for all 6! ver- 171–183, 1983.
tex numberings. Regarding the graph C6 (Fig- [3] V. Blagojević, D. Bojić, M. Bojović, M. Cve-
ure 3), there are fewer degrees of freedom; to tanović, J. Dordević, D. Durdević, B. Furlan,
keep the pseudo-canonical string fixed, only the S. Gajin, Z. Jovanović, D. Milićev, V. Miluti-
nović, B. Nikolić, J. Protić, M. Punt, Z. Radivo-
numbers of the vertices A, C, and E can be jević, Ž. Stanisavljević, S. Stojanović, I. Tartalja,
freely permuted, whereas the numbering of the M. Tomašević, and P. Vuletić. A systematic ap-
vertices B, D, and F has to ‘follow’ the num- proach to generation of new ideas for PhD research
bering of the other three (or vice versa). This in computing. In A. R. Hurson and V. Milutinović,
editors, Creativity in Computing and DataFlow Su-
perComputing, volume 104 of Advances in Com- and Information Science. His research areas
puters, chapter 1, pages 1–31. Elsevier, 2017. include graph grammars, general graph theory
[4] Diane J. Cook and Lawrence B. Holder. Mining
(with a particular emphasis on graph symme-
Graph Data. John Wiley & Sons, 2006.
[5] Dragoš Cvetković and Slobodan K. Simić. Spec- tries), software engineering, machine learning,
tral graph theory in computer science. IPSI BgD and computer programming education.
Transactions on Advanced Research, 8(2):35–42,
2012.
[6] Michael Elberfeld and Pascal Schweitzer. Canoniz-
ing graphs of bounded tree width in logspace. ACM
Transactions on Computation Theory, 9(3):12:1–
12:29, 2017.
[7] Luka Fürst, Uroš Čibej, and Jurij Mihelič. Max-
imum exploratory equivalence in trees. In Fed-
erated Conference on Computer Science and In-
formation Systems (FedCSIS 2015), Lódź, Poland,
pages 507–518, 2015.
[8] Michael R. Garey and David S. Johnson. Comput-
ers and Intractability: A Guide to the Theory of
NP-Completeness. W. H. Freeman and Company,
1979.
[9] Aimin Hou, Qingqi Zhong, Yurou Chen, and
Zhifeng Hao. A practical graph isomorphism al-
gorithm with vertex canonical labeling. Journal of
Computers, 9(10):2467–2474, 2014.
[10] Chuntao Jiang, Frans Coenen, and Michele Zito.
A survey of frequent subgraph mining algorithms.
Knowledge Eng. Review, 28(1):75–105, 2013.
[11] Jérôme Kunegis. KONECT: the Koblenz network
collection. In International Web Observatory Work-
shop (WWW 2013), Rio de Janeiro, Brazil, pages
1343–1350, 2013.
[12] Michihiro Kuramochi and George Karypis. Finding
frequent patterns in a large sparse graph. Data
Mining and Knowledge Discovery, 11(3):243–271,
2005.
[13] Jure Leskovec and Andrej Krevl. SNAP
Datasets: Stanford large network dataset collec-
tion. http://snap.stanford.edu /data, 2014.
[14] Brendan D. McKay and Adolfo Piperno. Practical
graph isomorphism, II. Journal of Symbolic Com-
putation, 60:94–112, 2014.
[15] Jurij Mihelič, Luka Fürst, and Uroš Čibej. Ex-
ploratory equivalence in graphs: Definition and al-
gorithms. In Federated Conference on Computer
Science and Information Systems (FedCSIS 2014),
Warsaw, Poland, pages 447–456, 2014.
[16] Jacobo Torán. On the hardness of graph isomor-
phism. SIAM Journal on Computing, 33(5):1093–
1108, 2004.
[17] David Weininger. SMILES, a chemical lan-
guage and information system. 1. introduction to
methodology and encoding rules. Journal of Chem-
ical Information and Computer Sciences, 28(1):31–
36, 1988.