
Are Targeted Messages More Effective?

Martin Grohe∗
Eran Rosenbluth†
{grohe,rosenbluth}@informatik.rwth-aachen.de
RWTH Aachen University
Germany
arXiv:2403.06817v1 [cs.LO] 11 Mar 2024

ABSTRACT
Graph neural networks (GNN) are deep learning architectures for graphs. Essentially, a GNN is a distributed message passing algorithm, which is controlled by parameters learned from data. It operates on the vertices of a graph: in each iteration, vertices receive a message on each incoming edge, aggregate these messages, and then update their state based on their current state and the aggregated messages. The expressivity of GNNs can be characterised in terms of certain fragments of first-order logic with counting and the Weisfeiler-Lehman algorithm.

The core GNN architecture comes in two different versions. In the first version, a message only depends on the state of the source vertex, whereas in the second version it depends on the states of the source and target vertices. In practice, both of these versions are used, but the theory of GNNs so far mostly focused on the first one. On the logical side, the two versions correspond to two fragments of first-order logic with counting that we call modal and guarded.

The question whether the two versions differ in their expressivity has been mostly overlooked in the GNN literature and has only been asked recently (Grohe, LICS'23). We answer this question here. It turns out that the answer is not as straightforward as one might expect. By proving that the modal and guarded fragment of first-order logic with counting have the same expressivity over labelled undirected graphs, we show that in a non-uniform setting the two GNN versions have the same expressivity. However, we also prove that in a uniform setting the second version is strictly more expressive.

1 INTRODUCTION
The question we address in this paper is motivated by comparing different versions of graph neural networks, but turns out to be related to a very natural question in logic concerning the difference between modal and guarded fragments of first-order logic with counting.

Graph Neural Networks
Among the various deep learning architectures for graphs, message passing graph neural networks (GNNs) [8, 9, 20, 27] are the most common choice for many different applications ranging from combinatorial optimisation to particle physics (see, for example, [4, 5, 6]).

We think of a GNN as a distributed message passing algorithm operating on the vertices of the input graph 𝐺. This perspective goes back to Gilmer et al. [9]. Each vertex 𝑣 of 𝐺 has a state f(𝑣), which is a vector over the reals.¹ In this paper, the input graphs 𝐺 are undirected vertex-labelled graphs, and the initial state f(𝑣) of a vertex 𝑣 is an encoding of its labels. Then in each iteration all vertices send a message to all their neighbours. Upon receiving the messages from its neighbours, a vertex 𝑣 aggregates all the incoming messages, typically using coordinatewise summation, arithmetic mean, or maximum, and then updates its state by applying a combination function to its current state and the aggregated messages. The states of the vertices after the last iteration are used to compute the output of the GNN. The combination function as well as the message function used to compute the messages are computed by neural networks whose parameters are learned from data. Importantly, all vertices use the same message, aggregation, and combination functions. Thus they share the learned parameters, which means that the number of parameters is independent of the size of the input graph. It also means that a GNN can be applied to graphs of arbitrary size. Our GNNs carry out a fixed number of iterations, and each iteration (or layer of the GNN) has its own combination and message functions. This is the most basic model and also the "standard" one, but there is a model of recurrent GNNs, which we do not study here.

Let us be clear that the view of a GNN as a distributed message passing algorithm is an abstract conceptualisation that we use to explain how GNNs work. In an actual implementation of GNNs, nobody sends any messages. A central control instance collects the states of all vertices in a matrix, and the computation is reduced to matrix and tensor computations, typically carried out on a GPU using highly optimised code. (The computation is parallelised, but the parallelisation is not tied to the initial distributed algorithm.)

By now, there are numerous variants of the basic message passing model just described. But even the basic model comes in two different versions. In a 1-sided GNN (for short: 1-GNN), the message sent along the edge (𝑣, 𝑢) is a function of the state f(𝑣) of the source vertex. In a 2-sided GNN (2-GNN), the message is a function of the states f(𝑢) and f(𝑣) of both source and target vertex (see Figure 1.1). Thus, potentially 2-GNNs are more powerful because they allow for "targeted messages" depending on properties of the target vertex. In this paper, we study the question whether 2-GNNs are indeed more powerful.

Both the 1-sided and the 2-sided versions have been studied before, and both have been used in practice. Instances of 1-GNNs are the graph convolutional networks of [20]. The message passing neural networks of [9] are 2-GNNs. Most of the theoretical work on GNNs considers 1-GNNs (e.g. [2, 22, 30]); this also has to do with the fact that this work is concerned with non-uniform expressivity, for which it is easy to see that the two versions coincide. In practical work, the prevalent perception seems to be that 2-GNNs are superior—this is

∗Funded by the European Union (ERC, SymSim, 101054974). Views and opinions expressed are however those of the author(s) only and do not necessarily reflect those of the European Union or the European Research Council. Neither the European Union nor the granting authority can be held responsible for them.
†Funded by the Deutsche Forschungsgemeinschaft (DFG) - 2236/2.
¹We might also call the state of a vertex a feature vector; this explains the use of the letter f.
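As an illustration (not part of the paper's formal development), the two message-passing variants can be sketched as follows. The names `step`, `msg`, and the toy combination function are ours; SUM aggregation stands in for the learned FNNs:

```python
# One message-passing iteration over an undirected graph, contrasting
# 1-sided messages msg(f(source)) with 2-sided messages
# msg(f(source), f(target)). Toy functions, SUM aggregation.

def step(adj, state, msg, two_sided):
    new_state = {}
    for u in adj:
        # u receives one message per incident edge (v, u)
        if two_sided:
            msgs = [msg(state[v], state[u]) for v in adj[u]]
        else:
            msgs = [msg(state[v]) for v in adj[u]]
        agg = sum(msgs)                # SUM aggregation
        new_state[u] = state[u] + agg  # toy combination function
    return new_state

# star graph G = ({u, v1, v2, v3}, {uv1, uv2, uv3}) from Figure 1.1
adj = {"u": ["v1", "v2", "v3"], "v1": ["u"], "v2": ["u"], "v3": ["u"]}
state = {"u": 1.0, "v1": 2.0, "v2": 3.0, "v3": 4.0}

after_1sided = step(adj, state, lambda s: s, two_sided=False)
after_2sided = step(adj, state, lambda s, t: s - t, two_sided=True)
```

In the 2-sided call the message `s - t` is "targeted": it differs per receiver even when the sender's state is fixed, which is exactly what a 1-sided GNN cannot do.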
[Figure 1.1 shows the star graph with centre 𝑢 and leaves 𝑣1, 𝑣2, 𝑣3; in panel (a) each oriented edge carries a message msg(f(source)), in panel (b) a message msg(f(source), f(target)).]

Figure 1.1. 1-sided and 2-sided message passing in the graph 𝐺 = ({𝑢, 𝑣1, 𝑣2, 𝑣3}, {𝑢𝑣1, 𝑢𝑣2, 𝑢𝑣3}). Every vertex 𝑥 has a state f(𝑥). Messages are associated with oriented edges. In the 1-sided version (a), messages are a function of the state of the source vertex; in the 2-sided version (b), they are a function of the states of both source and target.

also our own experience—though an explicit empirical comparison between the two versions has only been carried out recently [28] (there 1-GNNs are called isotropic and 2-GNNs are called anisotropic). Interestingly, the conclusion of this empirical study is that a novel version of 1-GNNs is on par with 2-GNNs.

We ask if 2-GNNs are more expressive than 1-GNNs, that is, whether they can compute more functions. We are not concerned with learning, specifically the algorithmic optimisation and statistical generalisation questions involved. Obviously, these are highly relevant questions as well, but technically quite different. Expressivity results such as the characterisation of the power of GNNs to distinguish two graphs in terms of the Weisfeiler-Leman algorithm [22, 30] have been quite influential in the development of GNNs as they steered research in the direction of more expressive architectures and helped to understand their limitations.

Our focus here is on the basic task of node classification, or unary queries in the language of finite model theory and database theory. We can build on powerful characterisations of the queries computable by GNNs in terms of fragments of first-order logic with counting [2, 11]. The precise question we ask is:

  Are there unary queries computable by 2-GNNs that are not computable by 1-GNNs?

(The question was left as an open problem in [11].) This straightforward question has a surprisingly complicated answer, which has to do with the fact that there are different forms of GNN expressivity. The most natural is uniform expressivity. A query Q is uniformly expressible by 𝑖-GNNs if there is a single 𝑖-GNN 𝔑 computing the query. However, much of the literature on the theory of GNNs considers non-uniform expressivity. A query Q is non-uniformly expressible by 𝑖-GNNs if there is a family N = (𝔑𝑛)𝑛≥1 of 𝑖-GNNs such that 𝔑𝑛 computes Q on input graphs of size 𝑛. There is an argument to be made for non-uniform expressivity, because in many practical applications the input graphs can be expected to be of similar size, and this size is also well-represented in the training data. However, in this case we still want to control the complexity of the GNNs involved; certainly GNNs of size exponential in the input graphs would be infeasible. From this perspective, it is natural to consider families of GNNs of polynomial size and bounded depth.

Theorem 1.1. (1) There is a query uniformly expressible by a 2-GNN with SUM aggregation, but not by a 1-GNN with SUM aggregation (Theorem 5.6).
(2) All queries non-uniformly expressible by families of 2-GNNs are also non-uniformly expressible by families of 1-GNNs (Corollary 4.4).
(3) All queries non-uniformly expressible by families of 2-GNNs of bounded depth and polynomial size are also non-uniformly expressible by families of 1-GNNs of bounded depth and polynomial size (Corollary 4.6).

Assertion (2) is an easy consequence of the characterisation of the distinguishing power of GNNs in terms of the Weisfeiler-Leman algorithm [22, 30]. This has probably been observed by others. Assertions (1) and (3) are new and nontrivial. For (1) we give a direct proof. It is based on an invariant that the states of the vertices preserve during a 1-GNN computation. This invariant asserts that the states can be approximated in terms of certain "nice" polynomials. To prove (3), we rely on a logical characterisation of GNN-expressivity and then prove that the logics characterising 1-GNNs and 2-GNNs coincide. We have no idea how to prove (3) directly without going through logic.

Interestingly, it is necessary to use SUM-aggregation in assertion (1) of the theorem: we prove that for GNNs that only use MAX-aggregation or only use MEAN-aggregation, 1-GNNs and 2-GNNs have the same expressivity (Theorem 5.8).

Logic
1-sided and 2-sided message passing in GNNs correspond very naturally to two forms of local quantification in logic that appear in modal and guarded logics.

We continue to consider undirected labelled graphs, and we interpret all logics over such graphs. The question can best be explained in the context of the extension C of first-order logic by counting quantifiers ∃≥𝑛 𝑥.𝜓 ("there exist at least 𝑛 vertices 𝑥 such that 𝜓 holds"). Consider the following two forms of restricted quantification over the neighbours of a vertex 𝑥, both saying that there exist at least 𝑛 neighbours 𝑥′ of 𝑥 such that 𝜓 holds:

  ∃≥𝑛 𝑥′(𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥′)),   (1.A)
  ∃≥𝑛 𝑥′(𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥, 𝑥′)).   (1.B)

Importantly, 𝑥 is not a free variable of 𝜓 in (1.A), but it is in (1.B). This is the only difference between (1.A) and (1.B). We call the first form of quantification modal and the second guarded. We define the modal fragment MC of C to consist of all C-formulas that only use two variables and only modal quantification of the form (1.A). Similarly, we define the guarded fragment GC using guarded quantification of the form (1.B).

Intuitively, both the modal and the guarded fragments are local, in the sense that the definable properties of a vertex 𝑢 depend on the properties of its neighbours 𝑣. Vertices have no identities, so the neighbours can only be distinguished in terms of the properties they have. This means that the properties of 𝑢 can only depend on how many neighbours of each property it has. From a more operational perspective, we can think of modal and guarded quantification as follows: each neighbour 𝑣 sends a message 𝑀𝜓 to 𝑢 indicating if it has some property 𝜓(𝑣). Node 𝑢 aggregates these messages
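The two restricted forms of counting quantification (1.A) and (1.B) can be made concrete with a small evaluator. This is an illustrative sketch under our own naming (`modal_count`, `guarded_count`), not the paper's formal semantics; the point is only that the guarded predicate may mention both endpoints:

```python
# Counting neighbours x' of x satisfying psi, in the modal style (1.A),
# where psi sees only x', and in the guarded style (1.B), where psi
# sees both x and x'.

def modal_count(adj, x, psi):
    # number of neighbours x' of x with psi(x')
    return sum(1 for xp in adj[x] if psi(xp))

def guarded_count(adj, x, psi):
    # number of neighbours x' of x with psi(x, x')
    return sum(1 for xp in adj[x] if psi(x, xp))

# path a - b - c, with degrees 1, 2, 1
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
deg = {v: len(adj[v]) for v in adj}

# modal: "at least one neighbour of even degree"
m = modal_count(adj, "a", lambda xp: deg[xp] % 2 == 0) >= 1
# guarded: "at least one neighbour of larger degree" compares both endpoints
g = guarded_count(adj, "a", lambda x, xp: deg[xp] > deg[x]) >= 1
```

The guarded predicate `deg[xp] > deg[x]` is exactly the kind of property that mentions the "target" vertex and therefore has no direct modal counterpart.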
by counting how many neighbours satisfy 𝜓 and makes a decision based on this count. The difference between modal and guarded quantification is that in guarded quantification the message 𝑀𝜓 not only depends on properties of the source 𝑣, but on combined properties 𝜓(𝑢, 𝑣) of the source 𝑣 and the target 𝑢. That is, 𝑣 does not indiscriminately send the same message to all of its neighbours, but instead sends a tailored message that depends on the properties of the target 𝑢 as well. In this sense, modal and guarded quantification resemble message passing in 1-sided and 2-sided GNNs.

It is fairly easy to see that the modal fragment MC and the guarded fragment GC have the same expressive power (recall that we are interpreting the logics over undirected graphs). However, this changes if we consider the modal fragment MFO+C and the guarded fragment GFO+C of the stronger logic FO+C. In the logic C, the number 𝑛 in a quantifier ∃≥𝑛 is a constant that is hard-wired into a formula. In MFO+C, the constant 𝑛 is replaced by a variable that ranges over the natural numbers. This allows us to compare counts of arbitrary size and to express arithmetic properties of counts. For example, we can express that a vertex has even degree, which is impossible in C. Formally, instead of quantifiers ∃≥𝑛 𝑥.𝜓 we use terms #𝑥.𝜓, interpreted as the number of 𝑥 such that 𝜓 is satisfied. Then the following formula expresses that 𝑥 has even degree:

  ∃𝑦 < ord.(2 ⋅ 𝑦 = #𝑥′.𝐸(𝑥, 𝑥′)).   (1.C)

Here 𝑦 is a number variable ranging over the natural numbers and 𝑥, 𝑥′ are vertex variables ranging over the vertices of a graph. We allow quantification over the natural numbers, but only bounded quantification. In the formula above, we use the constant ord, which is always interpreted by the order of the input graph, to bound the existential quantification over 𝑦. In a nutshell, FO+C combines first-order logic over finite graphs with bounded arithmetic, connecting them via counting terms. The syntax and semantics of FO+C and its fragments will be discussed in detail in Section 3.

In the modal fragment MFO+C and the guarded fragment GFO+C we restrict counting terms in the same way as we restrict quantification in (1.A) and (1.B), respectively.

The logics MFO+C and GFO+C are interesting because they precisely characterise the non-uniform expressivity of families of 1-GNNs and 2-GNNs, respectively, of polynomial size and bounded depth [11].

It is not obvious at all whether MFO+C and GFO+C have the same expressive power. The following example illustrates a sequence of increasingly harder technical issues related to this question, and how we resolve them.

Example 1.2. (1) Consider the query Q1 selecting all vertices of a graph that have a neighbour of larger degree. The query can easily be expressed by the GFO+C-formula

  𝜑1(𝑥) ≔ ∃𝑥′(𝐸(𝑥, 𝑥′) ∧ deg(𝑥) < deg(𝑥′)),   (1.D)

where deg(𝑥) is the term #𝑥′.(𝐸(𝑥, 𝑥′) ∧ 𝑥′ = 𝑥′), which is even modal. Incidentally, it is also easy to design a 2-GNN that computes the query Q1.
In Section 5, we shall prove that the query is not computable by a 1-GNN; not even on complete bipartite graphs 𝐾𝑚,𝑛. However, the query is expressible in MFO+C by the following formula:

  ∃𝑦 < ord.(𝑦 = deg(𝑥) ∧ ∃𝑥′.(𝐸(𝑥, 𝑥′) ∧ 𝑦 < deg(𝑥′))).   (1.E)

We exploit the fact that in the logics MFO+C and GFO+C, quantification over vertex variables 𝑥′ is restricted to the neighbours of the current vertex 𝑥, whereas quantification over number variables 𝑦 is not subject to any guardedness restrictions. The philosophy behind this is that we want the logics to be local in the input graph, but want to include full bounded arithmetic.

(2) The trick in (1) is to pass information across modal quantifiers using number variables. But since every formula only contains finitely many number variables, we can only pass finitely many numbers. Therefore, the natural idea to separate the logics would be to consider properties of nodes that depend on unbounded sets of numbers.
Let Q2 be the query that selects all nodes 𝑥 that have a neighbour 𝑥′ such that the sets of degrees of the neighbours of 𝑥 and 𝑥′, respectively, are equal. That is, Q2 evaluates to true at a node 𝑣 of a graph 𝐺 if and only if 𝑣 has a neighbour 𝑣′ such that

  {deg𝐺(𝑤) | 𝑤 ∈ 𝑁𝐺(𝑣)} = {deg𝐺(𝑤) | 𝑤 ∈ 𝑁𝐺(𝑣′)}.   (1.F)

It is easy to express Q2 by a GFO+C-formula. It turns out that, again, it is also possible to express Q2 by an MFO+C-formula. Realising this was a key step for us towards proving that the two logics have the same expressive power.
The idea is that we hash the sets in (1.F) to numbers of size polynomial in the size of the input graph, then pass the hash value across the modal quantification, and compare the hash values. If the hashing produces no collisions, we will get the correct answer. Of course we need to carry out the collision-free hashing within the logic and without randomness, but this turns out to be not too difficult (we refer the reader to Section 3.3).

(3) Now consider the query Q3, which is the same as Q2, except that we replace the equality in (1.F) by ⊆. That is, Q3 selects all nodes 𝑥 that have a neighbour 𝑥′ such that the set of degrees of the neighbours of 𝑥 is a subset of the set of degrees of the neighbours of 𝑥′.
Unfortunately, the same hashing construction no longer works, because if the two sets in (1.F) are distinct they get different hash values, and since it is in the nature of hash functions that they are hard to invert, we cannot easily check if one set is a subset of the other. Yet we find a way around this (again, we refer the reader to Section 3.3).

Fully developing these ideas, we will be able to prove our main logical result.

Theorem 1.3. MFO+C and GFO+C have the same expressivity.

Since a unary query is expressible in MFO+C if and only if it is computable by a family of 1-GNNs of polynomial size and bounded depth, and similarly for GFO+C and 2-GNNs, Theorem 1.1(3) follows.

Let us close this introduction with a brief discussion of related work on modal and guarded counting logics. Modal and guarded
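The query Q2 and the hashing idea from Example 1.2(2) can be sketched as follows. This is purely illustrative: the hash used here is Python's ordinary `frozenset` hash, whereas the paper's contribution is carrying out collision-free hashing inside the logic itself:

```python
# Query Q2: does v have a neighbour v' whose set of neighbour-degrees
# equals v's, as in (1.F)? Shown once by direct set comparison and once
# by comparing hash values (correct whenever the hashing is collision-free
# on the sets involved).

def degset(adj, v):
    # set of degrees of the neighbours of v
    return frozenset(len(adj[w]) for w in adj[v])

def q2(adj, v):
    return any(degset(adj, v) == degset(adj, vp) for vp in adj[v])

def q2_hashed(adj, v):
    h = hash(degset(adj, v))
    return any(hash(degset(adj, vp)) == h for vp in adj[v])

# 4-cycle: every vertex's neighbours all have degree 2, so Q2 holds everywhere
cycle = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}
```

For Q3 (subset instead of equality) the hash comparison breaks down, exactly as the text explains: `hash(A) != hash(B)` tells us nothing about whether A ⊆ B.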
counting logics have mostly been studied as fragments of C2, the two-variable fragment of C (e.g. [10, 25, 19]). Our logic MC is equivalent to a logic known as graded modal logic [7, 29, 24]. There is also related work on description logics with counting (e.g. [1, 16]). However, all this work differs from ours in at least two important ways: (1) the main question studied in these papers is the decidability and complexity of the satisfiability problem, and (2) the logics are interpreted over directed graphs. For the logics as we defined them here, the equivalence results between modal and guarded fragments only hold over undirected graphs. The modal and guarded fragments of FO+C were introduced in [11] in the context of GNNs.

2 PRELIMINARIES
By N, N>0, R we denote the sets of nonnegative integers, positive integers, and reals, respectively. For a set 𝑆, we denote the power set of 𝑆 by 2^𝑆 and the set of all finite multisets with elements from 𝑆 by ((𝑆)). For 𝑛 ∈ N, we use the standard notation [𝑛] ≔ {𝑖 ∈ N | 1 ≤ 𝑖 ≤ 𝑛} as well as the variant [𝑛) ≔ {𝑖 ∈ N | 0 ≤ 𝑖 < 𝑛}. We denote the 𝑖th bit of the binary representation of 𝑛 by Bit(𝑖, 𝑛); so 𝑛 = ∑_{𝑖∈N} Bit(𝑖, 𝑛) ⋅ 2^𝑖. For 𝑟 ∈ R we let sign(𝑟) ≔ −1 if 𝑟 < 0, sign(𝑟) ≔ 0 if 𝑟 = 0, and sign(𝑟) ≔ 1 if 𝑟 > 0. We denote vectors of reals, but also tuples of variables or tuples of vertices of a graph, by lower-case boldface letters. In linear-algebra contexts, we think of vectors as column vectors, but for convenience we usually write all vectors as row vectors.

2.1 Feedforward Neural Networks
We briefly review standard fully-connected feed-forward neural networks (FNNs), a.k.a. multilayer perceptrons (MLPs). An FNN layer of input dimension 𝑚 and output dimension 𝑛 computes a function from R^𝑚 to R^𝑛 of the form

  𝜎(𝐴𝒙 + 𝒃)

for a weight matrix 𝐴 ∈ R^{𝑛×𝑚}, a bias vector 𝒃 ∈ R^𝑛, and an activation function 𝜎: R → R that is applied coordinatewise to the vector 𝐴𝒙 + 𝒃.

An FNN is a tuple (𝐿^(1), …, 𝐿^(𝑑)) of FNN layers, where the output dimension 𝑛^(𝑖) of 𝐿^(𝑖) matches the input dimension 𝑚^(𝑖+1) of 𝐿^(𝑖+1). The FNN then computes a function from R^{𝑚^(1)} to R^{𝑛^(𝑑)}, the composition 𝑓^(𝑑) ∘ ⋯ ∘ 𝑓^(1) of the functions 𝑓^(𝑖) computed by the 𝑑 layers.

Typical activation functions used in practice are the logistic function sig(𝑥) ≔ (1 + 𝑒^{−𝑥})^{−1} (a.k.a. sigmoid function), the hyperbolic tangent tanh(𝑥), and the rectified linear unit relu(𝑥) ≔ max{𝑥, 0}. In this paper (as in most other theoretical papers on graph neural networks), unless explicitly stated otherwise we assume that FNNs only use relu and the identity function as activation functions, and we assume that all weights are rational numbers. These assumptions can be relaxed, but they will be convenient. One needs to be careful with such assumptions (see Example 5.7), but they are not the core issue studied in this paper.

In a machine learning setting, we only fix the shape or architecture of the neural network and learn the parameters (that is, the weights and biases). However, our focus in this paper is not on learning but on expressivity, that is, on the question which functions can be computed by FNNs. A fundamental expressivity result for FNNs is the universal approximation theorem: every continuous function 𝑓: 𝐾 → R^𝑛 defined on a compact domain 𝐾 ⊆ R^𝑚 can be approximated to any given additive error by an FNN with just two layers.

2.2 Graphs and Signals
We use standard notation for graphs. Graphs are always finite, simple (no loops or parallel edges), and undirected. The order of a graph 𝐺, denoted by |𝐺|, is the number of its vertices. The set of neighbours of a vertex 𝑣 in a graph 𝐺 is denoted by 𝑁𝐺(𝑣), and the degree of 𝑣 is deg𝐺(𝑣) ≔ |𝑁𝐺(𝑣)|. We often consider vertex-labelled graphs, which formally we view as graphs expanded by unary relations. For a set Λ of unary relation symbols, a Λ-labelled graph is a {𝐸} ∪ Λ-structure 𝐺 where the binary relation 𝐸(𝐺) is symmetric and anti-reflexive. If Λ = {𝑃1, …, 𝑃ℓ}, we also call 𝐺 an ℓ-labelled graph, and we denote the class of all ℓ-labelled graphs by Gℓ.

When serving as data for graph neural networks, the vertices of graphs usually have real-valued features, which we call graph signals. An ℓ-dimensional signal on a graph 𝐺 is a function f: 𝑉(𝐺) → R^ℓ. We denote the class of all ℓ-dimensional signals on 𝐺 by Sℓ(𝐺) and the class of all pairs (𝐺, f), where f is an ℓ-dimensional signal on 𝐺, by GSℓ. An ℓ-dimensional signal is Boolean if its range is contained in {0, 1}^ℓ. By Sℓbool(𝐺) and GSℓbool we denote the restrictions of the two classes to Boolean signals. Observe that there is a one-to-one correspondence between Gℓ and GSℓbool: the ℓ-labelled graph 𝐺 ∈ Gℓ corresponds to ((𝑉(𝐺), 𝐸(𝐺)), b) ∈ GSℓbool for b: 𝑉(𝐺) → {0, 1}^ℓ defined by (b(𝑣))𝑖 = 1 ⟺ 𝑣 ∈ 𝑃𝑖(𝐺). In the following, we will think of an ℓ-labelled graph 𝐺 ∈ Gℓ and the corresponding ((𝑉(𝐺), 𝐸(𝐺)), b) ∈ GSℓbool as the same object, and hence of Gℓ and GSℓbool as the same class.

Isomorphisms between pairs (𝐺, f), (𝐻, g) ∈ GSℓ are required to preserve the signals. We call a mapping 𝑇: GSℓ → GS𝑚 a signal transformation if for all (𝐺, f) ∈ GSℓ we have 𝑇(𝐺, f) = (𝐺, f′) for some f′ ∈ S𝑚(𝐺). Such a signal transformation 𝑇 is equivariant if for all isomorphic (𝐺, f), (𝐻, g) ∈ GSℓ, every isomorphism ℎ from (𝐺, f) to (𝐻, g) is also an isomorphism from 𝑇(𝐺, f) to 𝑇(𝐻, g).

2.3 Logics and Queries
We study the expressivity of logics and graph neural networks. Graph neural networks operate on graphs and signals, and they compute signal transformations. The logics we study operate on labelled graphs, and they compute queries. A 𝑘-ary query on the class of ℓ-labelled graphs is a mapping Q that associates with every graph 𝐺 ∈ Gℓ a set Q(𝐺) ⊆ 𝑉(𝐺)^𝑘 subject to the following invariance condition: for all isomorphic graphs 𝐺, 𝐻 ∈ Gℓ and isomorphisms ℎ from 𝐺 to 𝐻 it holds that ℎ(Q(𝐺)) = Q(𝐻). Observe that the correspondence between Gℓ and GSℓbool extends to a correspondence between unary queries and equivariant signal transformations GSℓbool → GS1bool. More generally, equivariant signal transformations can be viewed as generalisations of unary queries to the world of graphs with real-valued instead of Boolean features. Similarly, 0-ary (a.k.a. Boolean) queries correspond to invariant mappings GSℓbool → {0, 1}. If we
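The FNN definitions of Section 2.1 can be sketched in a few lines. This is a minimal illustration in our own notation, not an implementation used in the paper; the concrete weights are arbitrary rationals, matching the assumption that all weights are rational and the only activations are relu and the identity:

```python
# One FNN layer computes relu applied coordinatewise to Ax + b; an FNN
# is the composition of such layers with matching dimensions.

def relu(r):
    return max(r, 0.0)

def layer(A, b, x):
    # sigma(Ax + b) with sigma = relu, applied coordinatewise
    return [relu(sum(a_ij * x_j for a_ij, x_j in zip(row, x)) + b_i)
            for row, b_i in zip(A, b)]

def fnn(layers, x):
    # composition f^(d) o ... o f^(1); output dim of each layer must
    # match the input dim of the next
    for A, b in layers:
        x = layer(A, b, x)
    return x

# a 2 -> 3 -> 1 network with rational weights
L1 = ([[1.0, -1.0], [0.0, 1.0], [2.0, 0.0]], [0.0, 0.5, -1.0])
L2 = ([[1.0, 1.0, 1.0]], [0.0])
y = fnn([L1, L2], [1.0, 2.0])
```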
wanted to lift the correspondence to 𝑘-ary queries for 𝑘 ≥ 2, we would have to consider signals on 𝑘-tuples of vertices.

When comparing the expressivity of logics, we compare the classes of queries definable in these logics. The logics we study here are "modal" and most naturally define unary queries. Therefore, we restrict our attention to unary queries. For logics L, L′, we say that L is at most as expressive as L′ (we write L ⪯ L′) if every unary query expressible in L is also expressible in L′. We say that L is less expressive than L′ (we write L ≺ L′) if L ⪯ L′ but not L′ ⪯ L. We say that L and L′ are equally expressive (we write L ≡ L′) if L ⪯ L′ and L′ ⪯ L.

3 FIRST-ORDER LOGIC WITH COUNTING
C is the extension of first-order logic FO by counting quantifiers ∃≥𝑝. That is, C-formulas are formed from atomic formulas of the form 𝑥 = 𝑦, 𝐸(𝑥, 𝑦), 𝑃𝑖(𝑥) with the usual Boolean connectives and the new counting quantifiers. Standard existential and universal quantifiers can easily be expressed using the counting quantifiers (∃𝑥 as ∃≥1𝑥 and ∀𝑥 as ¬∃≥1𝑥¬). For every 𝑘 ≥ 1, C𝑘 is the 𝑘-variable fragment of C. While it is easy to see that every formula of C is equivalent to an FO-formula, this is no longer the case for the C𝑘, because simulating a quantifier ∃≥𝑛 in FO requires 𝑛 distinct variables.

The logic C treats numbers (cardinalities) as constants: the 𝑛 in a quantifier ∃≥𝑛 is "hardwired" into the formula regardless of the input graph. FO+C is a more expressive counting extension of FO in which numbers are treated as first-class citizens that can be compared, quantified, and manipulated algebraically. The logic FO+C has formulas and terms. Terms, taking values in N, are either dedicated number variables ranging over N, or the constants 0, 1, ord (ord is always interpreted by the order of the input graph), or counting terms of the form

  #(𝑥1, …, 𝑥𝑘, 𝑦1 < 𝜃1, …, 𝑦ℓ < 𝜃ℓ).𝜓,   (3.A)

where the 𝑥𝑖 are vertex variables ranging over the vertices of a graph, the 𝑦𝑖 are number variables, 𝜓 is a formula, and the 𝜃𝑖 are terms. Furthermore, terms can be combined using addition and multiplication. Formulas are formed from atomic formulas 𝑅(𝑥1, …, 𝑥𝑘), 𝑥 = 𝑥′ and term comparisons 𝜃 ≤ 𝜃′ using Boolean combinations. We do not need quantifiers, because we can simulate them by counting terms.

To define the semantics, we think of formulas and terms as being interpreted in the 2-sorted expansion 𝐺 ∪ (N, +, ⋅, 0, 1) of a graph 𝐺. Vertex variables range over 𝑉(𝐺) and number variables range over N. We inductively define a Boolean value for each formula and a numerical value in N for each term. This is straightforward, except for the case of counting terms. The value of a counting term (3.A) is the number of tuples (𝑣1, …, 𝑣𝑘, 𝑖1, …, 𝑖ℓ) ∈ 𝑉(𝐺)^𝑘 × N^ℓ such that for all 𝑗, 𝑖𝑗 is smaller than the value of the term 𝜃𝑗 and 𝜓 holds under the assignment 𝑥𝑖 ↦ 𝑣𝑖, 𝑦𝑗 ↦ 𝑖𝑗.

An FO+C-expression is either a term or a formula. We denote terms by 𝜃, 𝜂 and variants such as 𝜃′, 𝜂1, formulas by 𝜑, 𝜓, 𝜒 and variants, and expressions by 𝜉 and variants. We denote vertex variables by 𝑥 and number variables by 𝑦. We use 𝑧 for both types of variables. For a tuple 𝒛 = (𝑧1, …, 𝑧𝑘) of variables and a graph 𝐺, we let 𝐺^𝒛 be the set of all tuples (𝑠1, …, 𝑠𝑘) ∈ (𝑉(𝐺) ∪ N)^𝑘 such that 𝑠𝑖 ∈ 𝑉(𝐺) if 𝑧𝑖 is a vertex variable and 𝑠𝑖 ∈ N if 𝑧𝑖 is a number variable. We define the free variables of an expression in the natural way, where a counting term (3.A) binds the variables 𝑥1, …, 𝑥𝑘, 𝑦1, …, 𝑦ℓ. For an FO+C-expression 𝜉, we write 𝜉(𝑧1, …, 𝑧𝑘) or 𝜉(𝒛) to indicate that the free variables of 𝜉 are among 𝑧1, …, 𝑧𝑘. By 𝜉^𝐺(𝑠1, …, 𝑠𝑘) we denote the value of 𝜉 if the variables 𝑧𝑖 are interpreted by the respective 𝑠𝑖. If 𝜉 is a formula, we also write 𝐺 ⊧ 𝜉(𝑠1, …, 𝑠𝑘) instead of 𝜉^𝐺(𝑠1, …, 𝑠𝑘) = 1. An expression is closed if it has no free variables. For a closed expression 𝜉, we denote the value by 𝜉^𝐺, and if it is a formula we write 𝐺 ⊧ 𝜉.

The reader may have noted that FO+C does not have any quantifiers. This is because existential and universal quantification can easily be expressed using counting terms. For example, ∃𝑥.𝜑 is equivalent to 1 ≤ #𝑥.𝜑. We also have bounded quantification over number variables. More generally, we can simulate existential quantification ∃(𝑥1, …, 𝑥𝑘, 𝑦1 < 𝜃1, …, 𝑦ℓ < 𝜃ℓ) over tuples of vertex and number variables. Once we have existential quantification, we can simulate universal quantification using existential quantification and negation. We use all the standard Boolean connectives, and in addition we also use (in)equalities <, =, ≥, > over terms, as they can all be expressed using ≤. Counting operators bind stronger than (in)equalities, which bind stronger than negation, which binds stronger than the other Boolean connectives. We omit parentheses accordingly.

For 𝑘 ≥ 1, the 𝑘-variable fragment FO𝑘+C of FO+C consists of all formulas with at most 𝑘 vertex variables. The number of number variables in an FO𝑘+C-formula is not restricted.

It is easy to see that

  C𝑘 ≺ FO𝑘+C.

To prove this, we note that C𝑘 ⪯ FO𝑘+C, because ∃≥𝑛 𝑥.𝜑 can be expressed as

  1 + ⋯ + 1 ≤ #𝑥.𝜑

with 𝑛 ones on the left-hand side. The inclusion is strict; in fact, we even have FO1+C ⪯̸ C, because the FO1+C-formula ∃𝑦.2 ⋅ 𝑦 = #𝑥.𝑥 = 𝑥 expresses that a graph has an even number of vertices, and it is not hard to see (and well-known) that there is no FO-formula and hence no C-formula expressing even cardinality.

3.1 The Modal and Guarded Fragments
We will introduce two closely related fragments, a modal and a guarded fragment, of the 2-variable logics C2 and FO2+C. We assume that C2 and FO2+C only use the vertex variables 𝑥1, 𝑥2 and, in the case of FO2+C, possibly additional number variables. We use 𝑥, 𝑥′ to refer to either of these variables, always stipulating that 𝑥′ ≠ 𝑥 (that is, either 𝑥 = 𝑥1 and 𝑥′ = 𝑥2, or 𝑥 = 𝑥2 and 𝑥′ = 𝑥1). For a formula 𝜑(𝑥, 𝒚) in C2 or FO2+C, we let 𝜑(𝑥′, 𝒚) be the formula obtained by simultaneously replacing all occurrences of 𝑥 in 𝜑(𝑥, 𝒚) by 𝑥′ and all occurrences of 𝑥′ by 𝑥. The resulting formula is also in C2 or FO2+C, respectively.

A C2-formula is modal if for all subformulas ∃≥𝑛 𝑥′.𝜑, the formula 𝜑 is of the form (𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥′)). Remember that 𝜓(𝑥′) stipulates that only the variable 𝑥′, but not 𝑥, may occur freely in 𝜓. The modal fragment MC of C consists of all modal C2-formulas.
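The bounded number quantification of Section 3 can be illustrated by directly evaluating the two example formulas from the text. This evaluator is a sketch under our own naming, not the paper's formal semantics; bounded quantifiers over number variables simply range over an initial segment of N determined by ord:

```python
# Evaluating the even-degree formula (1.C) and the even-order
# FO1+C-formula from the text on a concrete graph. ord is the order
# (number of vertices) of the input graph.

def even_degree(adj, x):
    # (1.C):  exists y < ord . 2*y = #x'.E(x, x')
    ord_g = len(adj)
    deg_x = len(adj[x])            # the counting term #x'.E(x, x')
    return any(2 * y == deg_x for y in range(ord_g))

def even_order(adj):
    # exists y . 2*y = #x.(x = x), with y bounded by ord + 1
    ord_g = len(adj)
    return any(2 * y == ord_g for y in range(ord_g + 1))

# triangle: 3 vertices, each of degree 2
tri = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
```

Note that even-degree is expressible here only because the bound ord makes the quantification over y finite; in C the threshold n of each quantifier is fixed in the formula, so no single C-formula can test evenness of arbitrary degrees.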
A C2-formula is guarded if for all subformulas ∃≥𝑛 𝑥′.𝜑, the formula 𝜑 is of the form (𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥, 𝑥′)). The guarded fragment GC consists of all guarded C2-formulas.

Proposition 3.1. MC ≡ GC.

The proof (given in the appendix) is easy. It is based on the observation that all formulas 𝜓(𝑥, 𝑥′) in MC or GC with two free variables are just Boolean combinations of formulas with one free variable and the atoms 𝐸(𝑥, 𝑥′), 𝑥 = 𝑥′.

The definitions for the modal and guarded fragments of FO+C are similar, but let us be a bit more formal here. The sets of MFO+C-formulas and MFO+C-terms (in the language of Λ-labelled graphs) are defined inductively as follows:
(i) all number variables and 0, 1, and ord are MFO+C-terms;
(ii) 𝜃 + 𝜃′ and 𝜃 ⋅ 𝜃′ are MFO+C-terms, for all MFO+C-terms 𝜃, 𝜃′;
(iii) 𝜃 ≤ 𝜃′ is an MFO+C-formula, for all MFO+C-terms 𝜃, 𝜃′;
(iv) 𝑥𝑖 = 𝑥𝑗, 𝐸(𝑥𝑖, 𝑥𝑗), 𝑃(𝑥𝑖) are MFO+C-formulas, for 𝑖, 𝑗 ∈ [2] and 𝑃 ∈ Λ;
(v) #(𝑦1 < 𝜃1, …, 𝑦𝑘 < 𝜃𝑘).𝜓 is an MFO+C-term, for all number variables 𝑦1, …, 𝑦𝑘, all MFO+C-terms 𝜃1, …, 𝜃𝑘, and all MFO+C-formulas 𝜓;
(vi) #(𝑥3−𝑖, 𝑦1 < 𝜃1, …, 𝑦𝑘 < 𝜃𝑘).(𝐸(𝑥𝑖, 𝑥3−𝑖) ∧ 𝜓) is an MFO+C-term, for all 𝑖 ∈ [2], all number variables 𝑦1, …, 𝑦𝑘, all MFO+C-terms 𝜃1, …, 𝜃𝑘, and all MFO+C-formulas 𝜓 such that 𝑥𝑖 does not occur freely in 𝜓.
The sets of GFO+C-formulas and GFO+C-terms are defined induc-

that is based on so-called built-in relations. For FO+C, a convenient way of introducing non-uniformity via built-in numerical relations was proposed in [11, 21]. A numerical relation is a relation 𝑅 ⊆ N^𝑘 for some 𝑘 ∈ N. We extend the logic FO+C by new atomic formulas 𝑅(𝑦1, …, 𝑦𝑘) for all 𝑘-ary numerical relations 𝑅 and number variables 𝑦1, …, 𝑦𝑘, with the obvious semantics. (Of course only finitely many numerical relations can appear in one formula.) By FO+Cnu we denote the extension of FO+C to formulas using arbitrary numerical relations. We extend the notation to fragments of FO+C, so we also have MFO+Cnu and GFO+Cnu.

It is a well-known fact that on ordered graphs, FO+Cnu captures the complexity class non-uniform TC0 [3].

The proof of Theorem 3.3 goes through in the nonuniform setting without any changes. So as a corollary (of the proof) we obtain the following.

Corollary 3.4. MFO+Cnu ≡ GFO+Cnu.

Numerical built-in relations are tied to the two-sorted framework of FO+C with its number variables and arithmetic. There is a simpler and more powerful, but in most cases too powerful, notion of nonuniform definability that applies to all logics. We say that a query Q is non-uniformly expressible in L if there is a family (𝜑𝑛)𝑛∈N of L-formulas such that for each 𝑛 the formula 𝜑𝑛 expresses Q on graphs of order 𝑛. This notion is too powerful in the sense that if L is at least as expressive as first-order logic, then every query is non-uniformly expressible in L. This is an easy consequence of the fact that every
tively using rules (i)–(v) and the rule (vi’) obtained from (vi) by finite graph can be described up to isomorphism by a formula of
dropping the requirement that 𝑥𝑖 does not occur freely in 𝜓 . That is, first-order logic. For our counting logics, we obtain the following
in (vi’), 𝜓 may be an arbitrary GFO+C-formula. simple Lemma 3.5, which may be regarded as folklore. Assertion (1)
Example 3.2. Recall Example 1.2. The formula 𝜑 1 (𝑥) in (1.D) essentially goes back to [17] and assertion (2) to [23]. For background
is guarded (that is, a GFO+C-formula). The term deg(𝑥) and the on the Color Refinement algorithm and Weisfeiler-Leman algorithm
formula in (1.E) are modal. and their relation to counting logics as well as graph neural networks,
we refer the reader to [13]. We say that a unary query Q is invariant
Note that every formula of MFO+C and GFO+C either is purely under Colour Refinement if for all graphs 𝐺, 𝐻 and vertices 𝑣 ∈ 𝑉 (𝐺),
arithmetical, using no vertex variables at all and hence only accessing 𝑤 ∈ 𝑉 (𝐻 ) such that the algorithm assigns the same colour to 𝑣, 𝑤
the input graph by its order, or it has at least one free vertex variable. when run on 𝐺, 𝐻 , respectively, we have 𝑣 ∈ Q(𝐺) ⇔ 𝑤 ∈ Q(𝐻 ).
We are mainly interested in unary queries expressible in the logics
and thus in formulas with exactly one free vertex variable and no Lemma 3.5. (1) A unary query Q is non-uniformly expressible
free number variables. Note that an MFO+C-formula with only one in MC (and hence in GC) if and only if it is invariant under
free vertex variable cannot contain a term formed by using rule (v) Colour Refinement.
for a formula 𝜓 with two free vertex variables, because in MFO+C, (2) Every unary query expressible in GFO+Cnu is non-uniformly
a formula with two free vertex variables can never appear within a expressible in GC (and hence in MC).
counting term (necessarily of type (vi)) binding a free vertex variable. Note that assertion (2) follows from (1) by observing that every
Theorem 3.3. MFO+C ≡ GFO+C. unary query expressible in GFO+Cnu in invariant under Colour
This is Theorem 1.3 from the introduction. Before we prove Refinement.
the theorem in Section 3.3, we continue with a few remarks on
nonuniform expressivity. 3.3 Proof of Theorem 3.3
To prove the theorem, we only consider graphs 𝐺 of order at least
3.2 Nonuniform Expressivity 𝑛 0 , where 𝑛 0 ≥ 2 is a constant determined later (in Lemma 3.11). In
In complexity theory, besides the “uniform computability” by models particular, when we say that two formulas are equivalent, this means
like Turing machines, it is common to also study “nonuniform com- that they are equivalent in graphs of order at least 𝑛 0 .
putability”, most often by nonuniform families of circuits. Similarly, This assumption is justified by the fact that on graphs of order
in the literature on graph neural networks it is common to consider a less than 𝑛 0 , the logics MFO+C, GFO+C have the same expressivity
nonuniform notion of expressivity. as MC. This follows from Lemma 3.5(2).
To capture non-uniformity in descriptive complexity theory, Im- In the proof, we informally distinguish between small numbers
𝑂(1)
merman (see [18]) introduced a notion of nonuniform definability in 𝑛𝑂(1) and large numbers in 2𝑛 , where 𝑛 is the order of the
𝐺
input graph. We denote small numbers by lower case letters, typically That is, ⎯𝜑(𝑦, 𝒛), 𝜃 (𝒛) (𝒔) is the 𝑁 ∈ N with Bit(𝑖, 𝑁 ) = 1 ⇐⇒
𝑘, ℓ,𝑚, 𝑛, 𝑝, and large numbers by uppercase letters, typically 𝑀, 𝑁 . 𝐺
(𝑖 < 𝜃 (𝒔) and 𝐺 ⊧ 𝜑(𝑖, 𝒔)). Note that we underline the variable 𝑦
Thus in space polynomial in 𝑛 we can represent small and large
numbers as well as sets of small numbers, but not sets of large in ⎯𝜑(𝑦, 𝒛), 𝜃 (𝒛) to indicate which variable we use to determine
numbers. the bits of the number (otherwise it could also be some variable in
We first simplify our formulas. We use the following lemma from 𝒛). The underlined variable does not have to be the first in the list.
[12] (Lemma 3.2). An FO+C-expression is arithmetical if it contains no vertex
variables. Note that all arithmetical FO+C-formulas are also MFO+C-
Lemma 3.6 ([12]). For every FO+C-term 𝜃 (𝑥 1, . . . , 𝑥𝑘 , 𝑦1, . . . , 𝑦𝑘 ) formulas and hence GFO+C-formulas. The following lemma collects
there is a polynomial 𝔭𝜃 (𝑋 ) such that for all graphs 𝐺, all 𝑣 1, . . . , 𝑣𝑘 ∈ the main facts that we need about the expressivity of arithmetical
𝑉 (𝐺), and all 𝑛 1, . . . , 𝑛 ℓ ∈ N it holds that operations in FO+C. Via the connection between FO+C and the
circuit complexity class uniform TC0 [3], the results were mostly
𝜃 𝐺 (𝑣 1, . . . , 𝑣𝑘 , 𝑛 1, . . . , 𝑛 ℓ ) ≤ 𝔭𝜃 ( max ({⋃︀𝐺 ⋃︀} ∪ {𝑛𝑖 ⋂︀ 𝑖 ∈ (︀ℓ⌋︀})).
known in the 1990s, with the exception of division, which was only
Let 𝜑 be an FO+C-formula. Then each number variable 𝑦 except established by Hesse in 2000 [14, 15]. It is difficult, though, to find
for the free number variables of 𝜑 is introduced in a counting term proofs for these results in the circuit-complexity literature. We refer
#(𝑥 1, . . . , 𝑥𝑘 , 𝑦1 < 𝜃 1, . . . , 𝑦ℓ < 𝜃 ℓ ).𝜓 . If 𝑦 = 𝑦𝑖 , then it is bound by the the reader to [12] for proof sketches.
term 𝜃𝑖 , which we call the bounding term for 𝑦. We assume that all Lemma 3.8 (Folklore, [3, 14, 15]). (1) There is an arithmetical
free number variables 𝑦 of 𝜑 have a degree deg(𝑦). We inductively FO+C-formula bit(𝑦, 𝑦 ′ ) such that for all graphs 𝐺 and all
define the degree deg𝜑 (𝑦) of a number variable 𝑦 in 𝜑 as follows. 𝑖, 𝑛 ∈ N it holds that 𝐺 ⊧ bit(𝑖, 𝑛) ⇐⇒ Bit(𝑖, 𝑛) = 1.
If 𝑦 is a free variable of 𝜑, then deg𝜑 (𝑦) B deg(𝑦). Otherwise, (2) Let 𝜑 1 (𝑦, 𝒛), 𝜑 2 (𝑦, 𝒛) be FO+C-formulas, and let 𝜃 1 (𝒛), 𝜃 2 (𝒛)
let 𝜃 = 𝜃 (𝑥 1, . . . , 𝑥𝑘 , 𝑦1, . . . , 𝑦ℓ ) be the bounding term for 𝑦, and let be FO+C-terms. Then there are FO+C-formulas add(𝑦, 𝒛),
𝔭𝜃 (𝑋 ) be the polynomial of Lemma 3.6. Let 𝑐 ∈ N be minimum such sub(𝑦, 𝒛) mul(𝑦, 𝒛), div(𝑦, 𝒛), and leq(𝒛) such that for all
that 𝔭𝜃 (𝑛) < 𝑛𝑐 for all 𝑛 ≥ 2. Then if ℓ = 0, that is, the bounding graphs 𝐺 and all 𝒔 ∈ 𝐺 𝒛 the following holds. For 𝑖 = 1, 2, let
term 𝜃 has no free number variables, we let deg𝜑 (𝑦) B 𝑐. Otherwise, 𝐺
𝑁𝑖 B ⎯𝜑𝑖 (𝑦, 𝒛), 𝜃𝑖 (𝒛) (𝒔). Then
we let deg𝜑 (𝑦) = 𝑐𝑑, where 𝑑 B max { deg𝜑 (𝑦𝑖 ) ⋂︀ 𝑖 ∈ (︀ℓ⌋︀}. Strictly
𝐺
speaking, we should have defined the degree of an occurrence of a ⎯add(𝑦, 𝒛), 𝜃 1 + 𝜃 2 (𝒔) (𝒔) = 𝑁 1 + 𝑁 2,
number variable 𝑦, because the same variable may be introduced 𝐺
several times. But without loss of generality, we may assume that ⎯sub(𝑦, 𝒛), 𝜃 1 (𝒔) (𝒔) = max{0, 𝑁 1 − 𝑁 2 },
every number variable is either free and never (re)introduced in a 𝐺
⎯mul(𝑦, 𝒛), 𝜃 1 + 𝜃 2 (𝒔) (𝒔) = 𝑁 1 ⋅ 𝑁 2,
counting term or introduced exactly once. 𝐺 𝑁1
Observe that during the evaluation of 𝜑 in a graph of order 𝑛 ≥ 2 ⎯div(𝑦, 𝒛), 𝜃 1 (𝒔) (𝒔) = ⃒𝑁 2
)︁ if 𝑁 2 ≠ 0,
where each free variable 𝑦 of 𝜑 is assigned a value smaller than 𝐺 ⊧ leq(𝒔) ⇐⇒ 𝑁1 ≤ 𝑁2 .
𝑛 deg(𝑦) , each number variable 𝑦 of 𝜑 can only take values less than Furthermore, if the 𝜑𝑖 , 𝜃𝑖 are arithmetical (modal, guarded)
𝑛deg𝜑 (𝑦) . then add, sub, mul, div, leq are arithmetical (modal, guarded,
Let us call a formula 𝜑 simple if it satisfies the following two respectively) as well.
conditions:
Corollary 3.9. Let 𝜑(𝑦, 𝒛) be an FO+C-formula, and let 𝜃 (𝒛) be an
(i) For each bound number variable 𝑦 in 𝜑, the bounding term of
𝑦 is orddeg𝜑 (𝑦) ; FO+C-term. Then there is an FO+C-formula mod(𝑦, 𝑦̂, 𝒛) such that
for all graphs 𝐺 and all 𝒔 ∈ 𝐺 𝒛 , 𝑛, 𝑛̂ ∈ N we have
(ii) counting terms 𝜃 B #(𝑥 1, . . . , 𝑥𝑘 , 𝑦1 < 𝜃 1, . . . , 𝑦ℓ < 𝜃 ℓ ).𝜓 only
appear in subformulas 𝑦 = 𝜃 for some number variable 𝑦 that
does not occur in 𝜃 . 𝐺 ⊧ mod(𝑛, 𝑛̂, 𝒔)
⇐⇒ 0 ≤ 𝑛 < 𝑛̂ and 𝑛 ≡ ⎯𝜑(𝑦, 𝒛), 𝜃 (𝒛) 𝐺 (𝒔) mod 𝑛̂.
Note that (i) implies that all counting terms in 𝜑 are of the form
#(𝑥 1, . . . , 𝑥𝑘 , 𝑦1 < ord𝑑1 , . . . , 𝑦ℓ < ord𝑑𝑖 ).𝜓 , where 𝑑𝑖 = deg𝜑 (𝑦𝑖 ). Furthermore, if the 𝜑, 𝜃 are arithmetical (modal, guarded) then mod
is arithmetical (modal, guarded, respectively) as well.
Lemma 3.7. Every FO+C-formula 𝜑 with no free number variables
The last ingredient for our proof of Theorem 3.3 is a little bit
is equivalent to a simple FO+C-formula 𝜑 ′ . Furthermore, if 𝜑 is in
of number theory. A crucial idea of the proof is to hash sets of
MFO+C or GFO+C, then 𝜑 ′ is in MFO+C or GFO+C, respectively,
small numbers, or equivalently, large numbers, to small “signatures”,
as well.
which will be obtained by taking the large numbers modulo some
Next, we need to express some arithmetic on bit representations small prime. This is a variant of well-known perfect hash families
of numbers. We think of a pair 𝜑(𝑦, 𝒛), 𝜃 (𝒛) consisting of an FO+C- based on primes.
formula 𝜑 and an FO+C-term 𝜃 as representing a number: for a graph
Lemma 3.10. Let 𝑘,𝑚 ∈ N and M ⊆ (︀2𝑚 (︀. Furthermore, let 𝑃 ⊆ N
𝐺 and a tuple 𝒔 ∈ 𝐺 𝒛 , we let
be a set of primes of cardinality ⋃︀𝑃 ⋃︀ ≥ 𝑘𝑚⋃︀M⋃︀2 . Then
𝐺
⎯𝜑(𝑦, 𝒛), 𝜃 (𝒛) (𝒔) B ∑ 2𝑖 . 1
Pr (∃𝑀, 𝑁 ∈ M, 𝑀 ≠ 𝑁 ∶ 𝑀 ≡ 𝑁 mod 𝑝) < .
𝑖∈N,𝑖<𝜃 𝐺 (𝒔),𝐺⊧𝜑(𝑖,𝒔) 𝑝∈𝑃 𝑘
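The bit-representation semantics of ⎯𝜑(𝑦, 𝒛), 𝜃 (𝒛) can be illustrated in code. The following is a minimal sketch under our own naming (`decode`, `bit` are not from the paper); the predicate stands in for the formula 𝜑 and `bound` for the value of the term 𝜃:

```python
# A number is represented by the set of bit positions i < bound at which a
# predicate (standing in for the formula phi) holds: Bit(i, N) = 1 iff phi(i).

def bit(i, n):
    # The i-th bit of n.
    return (n >> i) & 1

def decode(phi, bound):
    # The number N with Bit(i, N) = 1 iff i < bound and phi(i) holds.
    return sum(2 ** i for i in range(bound) if phi(i))
```

Decoding the bit predicate of a number recovers the number itself, provided the bound exceeds its bit length.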
where the probability ranges over 𝑝 ∈ 𝑃 chosen uniformly at random. number variables only take values in (︀𝑚(︀. Let 𝑖 ∈ (︀𝑞⌋︀. Then for every
𝑣 ∈ 𝑉 (𝐺), the formula 𝜒𝑖 defines a relation
A second number theoretic fact we need is that we can find
sufficiently many small primes; this is a direct consequence of the 𝑅𝑖 (𝑣) B {(𝑎 1, . . . , 𝑎𝑘𝑖 ) ∈ (︀𝑚(︀𝑘𝑖 ⋂︀ 𝐺 ⊧ 𝜒𝑖 (𝑣, 𝑎 1, . . . , 𝑎𝑘𝑖 )}.
prime number theorem.
With each tuple (𝑎 1, . . . , 𝑎𝑘𝑖 ) ∈ (︀𝑚(︀𝑘𝑖 we associate the (small) number
Lemma 3.11. There is an 𝑛 0 ≥ 2 such that for all 𝑛 ≥ 𝑛 0 there are 𝑘𝑖
at least 𝑛 primes 𝑝 ≤ 2𝑛 ln 𝑛. ∐︁𝑎 1, . . . , 𝑎𝑘𝑖 ̃︁ B ∑ 𝑎 𝑗 𝑚 𝑗−1 ∈ [︀𝑚𝑘𝑖 [︀ .
𝑗=1
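The hashing idea behind Lemmas 3.10 and 3.11 can be checked numerically. A small sketch with our own helper names (`primes_up_to`, `is_good`; not the paper's notation): distinct large numbers usually stay pairwise distinct modulo a small prime, since a "bad" prime must divide one of the pairwise differences.

```python
def primes_up_to(n):
    # Sieve of Eratosthenes.
    sieve = [True] * (n + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(n ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [p for p, is_prime in enumerate(sieve) if is_prime]

def is_good(p, numbers):
    # p is "good" for the set if all numbers stay pairwise distinct mod p.
    return len({n % p for n in numbers}) == len(numbers)

pool = primes_up_to(10_000)
big = [2 ** k + 3 ** k for k in range(20, 25)]
good = [p for p in pool if is_good(p, big)]
# Far more than half of the primes in the pool are good for this set,
# matching the counting argument in the proof.
```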
Finally, we are ready to prove Theorem 3.3. The full proof can be
found in the appendix. What we give here is still most of the proof, Then we can encode 𝑅𝑖 (𝑣) by the (large) number
we just defer some simple claims to the appendix. 𝑘𝑖
∐︁𝑎 1 ,...,𝑎𝑘𝑖 ̃︁
𝑁𝑖 (𝑣) B ∑ 2 ∈ ]︀2𝑚 ]︀ .
Proof of Theorem 3.3. By induction, we prove that for every (𝑎 1 ,...,𝑎𝑘𝑖 )∈𝑅𝑖 (𝑣)
GFO+C-formula 𝜑(𝑥, 𝑦1, . . . , 𝑦𝑘 ) and each assignment of a degree Let
deg(𝑦𝑖 ) to the free number variables there is an MFO+C-formula N(𝐺) B {𝑁𝑖 (𝑣) ⋂︀ 𝑖 ∈ (︀𝑞⌋︀, 𝑣 ∈ 𝑉 (𝐺)}.
𝜑̂(𝑥, 𝑦1, . . . , 𝑦𝑘 ) such that the following holds: for all graphs 𝐺 of order
𝑛 B ⋃︀𝐺 ⋃︀, all 𝑣 ∈ 𝑉 (𝐺), and all 𝑎 1 ∈ [︀𝑛 deg(𝑦𝑖 ) [︀ , . . . , 𝑎𝑘 ∈ [︀𝑛 deg(𝑦𝑘 ) [︀ Then ⋃︀N⋃︀ ≤ 𝑞𝑛, and for 𝑘 0 B max{𝑘𝑖 ⋃︀ 𝑖 ∈ (︀𝑞⌋︀} we have N ⊆ ]︀2𝑚 ]︀.
it holds that We want to use small primes 𝑝 ∈ 𝑂(𝑛 2 ) to hash the set N to a
set of small numbers. A prime 𝑝 is good for 𝐺 if for all distinct
𝐺 ⊧ 𝜑(𝑣, 𝑎 1, . . . , 𝑎𝑘 ) ⇐⇒ 𝐺 ⊧ 𝜑̂(𝑣, 𝑎 1, . . . , 𝑎𝑘 ).
𝑁 , 𝑁 ′ ∈ N(𝐺) it holds that 𝑁 ≡⇑ 𝑁 ′ mod 𝑝.
We may assume that 𝜑 is a simple formula. We let There is some 𝑐 ∈ N that does not depend on 𝑛, but only on the
formula 𝜑 and the parameters 𝑑, 𝑘 0, 𝑞 derived from 𝜑 such that
𝑑 B max{deg𝜑 (𝑦) ⋃︀ 𝑦 number variable of 𝜑}.
𝑛𝑐 ≥ 4𝑞 2𝑛𝑑𝑘0 +2 ln(2𝑞 2𝑛𝑑𝑘0 +2 ) = 4𝑞 2𝑚𝑘0 𝑛 2 ln 2𝑞 2𝑚𝑘0 𝑛 2 .
The only interesting step involves counting terms. Since 𝜑 is simple,
this means that we have to consider a formula Let 𝑃𝑛 be the set of all primes less than or equal to 𝑛𝑐 . Then by
Lemma 3.11, ⋃︀𝑃𝑛 ⋃︀ ≥ 2𝑞 2𝑚𝑘0 𝑛 2 ≥ 2𝑚𝑘 ⋃︀N⋃︀2 . By Lemma 3.10 with
𝜑(𝑥, 𝑦0, . . . , 𝑦𝑘 ) = (𝑦0 = #(𝑥 ′, 𝑦𝑘+1 < ord𝑑𝑘+1 , . . . , 𝑦𝑘+ℓ < ord𝑑𝑘+ℓ ). 𝑘 B 2,𝑚 B 𝑚𝑘 , M B N, more than half of the primes in 𝑃𝑛 are
good.
(𝐸(𝑥, 𝑥 ′ ) ∧ 𝜓 (𝑥, 𝑥 ′, 𝑦1, . . . , 𝑦𝑘+ℓ ))).
Now suppose 𝑝 ∈ 𝑃𝑛 is a prime that is good for 𝐺. For every
where for all 𝑖 ∈ (︀𝑘 + ℓ⌋︀ we let 𝑑𝑖 B deg𝜑 (𝑦𝑖 ). 𝑖 ∈ (︀𝑞⌋︀, 𝑣 ∈ 𝑉 , we let 𝑛𝑖 (𝑣, 𝑝) ∈ (︀𝑝(︀ such that 𝑛𝑖 (𝑣, 𝑝) ≡ 𝑁𝑖 (𝑣) mod 𝑝.
We need to understand the structure of 𝜓 . Let us call maximal Observe that for all vertices 𝑣, 𝑣 ′ ∈ 𝑉 (𝐺), if 𝑛𝑖 (𝑣, 𝑝) = 𝑛𝑖 (𝑣 ′, 𝑝) then
subformulas of 𝜓 with only one free vertex variable vertex formulas. 𝑁𝑖 (𝑣) = 𝑁𝑖 (𝑣 ′ ) and thus 𝑅𝑖 (𝑣) = 𝑅𝑖 (𝑣 ′ ). This means that for all
We distinguish between 𝑥-formulas, where only 𝑥 occurs freely, 𝑎 1, . . . , 𝑎𝑘𝑖 ∈ (︀𝑚(︀ it holds that
and 𝑥 ′ -formulas, where only 𝑥 ′ occurs freely. The formula 𝜓 is 𝐺 ⊧ 𝜒𝑖 (𝑣, 𝑎 1, . . . , 𝑎𝑘𝑖 ) ⇐⇒ 𝐺 ⊧ 𝜒𝑖 (𝑣 ′, 𝑎 1, . . . , 𝑎𝑘𝑖 ).
formed from relational atoms 𝑥 = 𝑥 ′, 𝐸(𝑥, 𝑥 ′ ), arithmetical formulas
(that neither contain 𝑥 nor 𝑥 ′ ), and vertex formulas, using Boolean Claim 1. For every 𝑖 ∈ (︀𝑞⌋︀ there is a formula 𝜁𝑖 (𝑥, 𝑧, 𝑧𝑖 ) such that
connectives, inequalities between terms, and counting terms of the for all 𝑣 ∈ 𝑉 (𝐺) and 𝑏 ∈ N,
form
′ ′ 𝐺 ⊧ 𝜁𝑖 (𝑣, 𝑝, 𝑏) ⇐⇒ 𝑏 = 𝑛𝑖 (𝑣, 𝑝).
#(𝑦1′ < ord𝑑1 , . . . , 𝑦𝑘′ < ord𝑑ℓ ′ ).𝜒 . (3.B)
Here 𝑧 and 𝑧𝑖 are fresh number variables that do not occur in 𝜑.
Similarly to the proof of Proposition 3.1, we can argue that we do not
need any relational atoms in 𝜓 , because in 𝜑 we take the conjunction Crucially, the formulas 𝜁𝑖 in Claim 1 do not depend on the graph
of 𝜓 with 𝐸(𝑥, 𝑥 ′ ), which means that in 𝜓 the atom 𝐸(𝑥, 𝑥 ′ ) can be 𝐺, but only on the formula 𝜑. The same will hold for all formulas
set to “true” and the atom 𝑥 = 𝑥 ′ can be set to “false”. Moreover, since
the graph is undirected, we can also set 𝐸(𝑥 ′, 𝑥) to “true”. (Of course For every 𝑖 ∈ (︀𝑞⌋︀, let
this only holds for atoms that are not in the scope of some counting 𝜒𝑖′ (𝑥 ′, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 , 𝑧, 𝑧𝑖 ) B
term that binds 𝑥 or 𝑥 ′ , that is, not contained in a vertex formula.)
So we assume that 𝜓 is actually formed from arithmetical formulas ∃𝑥(𝐸(𝑥 ′, 𝑥) ∧ 𝜁𝑖 (𝑥, 𝑧, 𝑧𝑖 ) ∧ 𝜒𝑖 (𝑥, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 )).
and vertex formulas using Boolean connectives, inequalities between Note that 𝜒𝑖′ is an MFO+C-formula.
terms, and counting terms of the form (3.B).
To turn 𝜑 into a modal formula, we need to eliminate the 𝑥- Claim 2. For all (𝑣, 𝑣 ′ ) ∈ 𝐸(𝐺) and 𝑎 1, . . . , 𝑎𝑘𝑖 ∈ (︀𝑚(︀ it holds that
formulas in 𝜓 . Let 𝜒1, . . . , 𝜒𝑞 be an enumeration of all 𝑥-formulas 𝐺 ⊧ 𝜒𝑖 (𝑣, 𝑎 1, . . . , 𝑎𝑘𝑖 ) ⇐⇒ 𝐺 ⊧ 𝜒𝑖′ (𝑣 ′, 𝑎 1, . . . , 𝑎𝑘𝑖 , 𝑝, 𝑛𝑖 (𝑣, 𝑝)).
in 𝜓 , where 𝜒𝑖 = 𝜒𝑖 (𝑥, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 ). Here the 𝑦𝑖,𝑗 may be variables
in {𝑦1, . . . , 𝑦𝑘+ℓ }, or they may be number variables 𝑦𝑖′ introduced by Now let 𝜓 ′ (𝑥 ′, 𝑦1, . . . , 𝑦𝑘+ℓ , 𝑧, 𝑧 1, . . . , 𝑧𝑞 ) be the formula obtained
counting terms (3.B). formula 𝜒𝑖 (𝑥, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 ) by the 𝑥 ′ -formula 𝜒𝑖′ (𝑥 ′, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 , 𝑧, 𝑧𝑖 ).
Let 𝐺 be a graph of order 𝑛. Let 𝑚 B 𝑛𝑑 . Since 𝑑 ≥ deg𝜑 (𝑦) formula 𝜒𝑖 (𝑥, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 ) by the 𝑥 ′ -formula 𝜒𝑖′ (𝑥 ′, 𝑦𝑖,1, . . . , 𝑦𝑖,𝑘𝑖 , 𝑧, 𝑧𝑖 ).
for all number variables 𝑦 appearing in 𝜑, when evaluating 𝜑 in 𝐺, Then 𝜓 ′ is an MFO+C-formula.
Claim 3. For all (𝑣, 𝑣 ′ ) ∈ 𝐸(𝐺) and 𝑎 1 ∈ [︀𝑛𝑑1 [︀ , . . . , 𝑎𝑘+ℓ ∈ [︀𝑛𝑑𝑘+ℓ [︀ if 𝔏 is a 2-GNN-layer. Note that the only difference between the two
it holds that is that in a 1-GNN the messages only depend on the state of the
sender, whereas in a 2-GNN they depend on the states of both sender
𝐺 ⊧ 𝜓 (𝑣, 𝑣 ′, 𝑎 1, . . . , 𝑎𝑘+ℓ )
and recipient.
⇐⇒ 𝐺 ⊧ 𝜓 ′ (𝑣 ′, 𝑎 1, . . . , 𝑎𝑘+ℓ , 𝑝, 𝑛 1 (𝑣, 𝑝), . . . , 𝑛𝑞 (𝑣, 𝑝)). The message function msg and combination function comb of a
GNN layer are computed by FNNs, and the parameters are typically
We let learned from data. To be precise, we need to add these neural
networks rather than the function they compute to the specification
⎛ 𝑞
𝜑 ′ (𝑥, 𝑦0, . . . , 𝑦𝑘 , 𝑧) B ∃(𝑧 1 < 𝑧, . . . , 𝑧𝑞 < 𝑧). ⋀ 𝜁𝑖 (𝑥, 𝑧, 𝑧𝑖 )∧ of a GNN layer (that is, replace comb and msg by FNNs computing
⎝ 𝑖=1 these functions), but we find it more convenient here to just give
𝑦0 = #(𝑥 ′, 𝑦𝑘+1 < ord𝑑𝑘+1 , . . . , 𝑦𝑘+ℓ < ord𝑑𝑘+ℓ ). the functions. In principle, the aggregation function can be any
function from finite multisets of vectors to vectors, but the most
(𝐸(𝑥, 𝑥 ′ ) ∧ 𝜓 ′ (𝑥 ′, 𝑦1, . . . , 𝑦𝑘+ℓ , 𝑧, 𝑧 1, . . . , 𝑧𝑞 ))), common aggregation functions used in practice are summation
(SUM), arithmetic mean (MEAN), and maximum (MAX), applied
Note that 𝜑 ′ is an MFO+C-formula. coordinatewise. We restrict our attention to these. We sometimes have
Claim 4. For all 𝑣 ∈ 𝑉 (𝐺) and 𝑎 1 ∈ [︀𝑛𝑑1 [︀ , . . . , 𝑎𝑘 ∈ [︀𝑛𝑑𝑘 [︀, to aggregate over the empty multiset ∅, and we define SUM(∅) B
MEAN(∅) B MAX(∅) B 0.
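The three coordinatewise aggregation functions and the empty-multiset convention can be sketched directly; this is a minimal illustration with our own function names, not an artifact of the paper:

```python
# Coordinatewise SUM/MEAN/MAX over a multiset of equal-length vectors,
# with the convention that aggregating the empty multiset yields 0.
# The dimension `dim` must be supplied so the empty case has a shape.

def agg_sum(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) for j in range(dim)]

def agg_mean(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [sum(v[j] for v in vectors) / len(vectors) for j in range(dim)]

def agg_max(vectors, dim):
    if not vectors:
        return [0.0] * dim
    return [max(v[j] for v in vectors) for j in range(dim)]
```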
𝐺 ⊧ 𝜑(𝑣, 𝑎 1, . . . , 𝑎𝑘 ) ⇐⇒ 𝐺 ⊧ 𝜑 ′ (𝑣, 𝑎 1, . . . , 𝑎𝑘 , 𝑝). (3.C) For 𝑖 = 1, 2, an 𝑖-GNN is a tuple 𝔑 = (𝔏(1), . . . , 𝔏(𝑑) ) of 𝑖-GNN
layers such that the output dimension 𝑞 (𝑖) of 𝔏(𝑖) matches the input
In fact, the assertion of Claim 4 holds for all primes 𝑝 that are dimension 𝑝 (𝑖+1) of 𝔏(𝑖+1) . We call 𝑝 (1) the input dimension of 𝔑
good for 𝐺. Recall that more than half of the primes 𝑝 < 𝑛𝑐 are good.
and 𝑞 (𝑑) the output dimension.
Thus (3.C) holds for more than half of the primes 𝑝 < 𝑛𝑐 . Let
An 𝑖-GNN 𝔑 = (𝔏(1), . . . , 𝔏(𝑑) ) computes the signal transforma-
prime(𝑧) B ∀(𝑦 < 𝑧, 𝑦 ′ < 𝑧).¬𝑦 ⋅ 𝑦 ′ = 𝑧, tion
expressing that 𝑧 is a prime. Then we let 𝑇𝔑 B 𝑇𝔏(𝑑) ○ 𝑇𝔏(𝑑−1) ○ . . . ○ 𝑇𝔏(1) ∶ GS𝑝 (1) → GS𝑞 (𝑑)
𝜑̂(𝑥, 𝑦1, . . . , 𝑦𝑘 ) B #𝑧 < ord𝑐 .prime(𝑧) < 2 ⋅ #𝑧 < ord𝑐 .(prime(𝑧) ∧ 𝜑 ′ (𝑥, 𝑦1, . . . , 𝑦𝑘 , 𝑧)).
that is, the composition of the transformations of its layers. We also
define 𝑆 𝔑 (𝐺, f) ∈ S𝑞 (𝑑) (𝐺) to be the signal such that 𝑇𝔑 (𝐺, f) B
(𝐺, 𝑆 𝔑 (𝐺, f)).
Claim 5. For all 𝑣 ∈ 𝑉 (𝐺) and 𝑎 1 ∈ [︀𝑛𝑑1 [︀ , . . . , 𝑎𝑘 ∈ [︀𝑛𝑑𝑘 [︀, We observe that we can actually simplify 1-GNNs by assuming
that the message function is always just the identity.
𝐺 ⊧ 𝜑(𝑣, 𝑎 1, . . . , 𝑎𝑘 ) ⇐⇒ 𝐺 ⊧ 𝜑̂(𝑣, 𝑎 1, . . . , 𝑎𝑘 ). (3.D)
□
4 GRAPH NEURAL NETWORKS
Lemma 4.1. For every 1-GNN 𝔑 = (𝔏(1), . . . , 𝔏(𝑑) ) there is a 1-GNN 𝔑̃ = (𝔏̃(0), 𝔏̃(1), . . . , 𝔏̃(𝑑) ) such that 𝑇𝔑 = 𝑇𝔑̃ and the message function of each layer 𝔏̃(𝑖) is the identity function on R𝑝̃(𝑖) .
Recall that we want to study the expressivity of GNNs with 1-sided A similar claim for 2-GNNs does not seem to hold. If the message
messages (1-GNNs) and GNNs with 2-sided messages (2-GNNs). function is linear, which in practice it often is, and the aggregation
Both 1-GNNs and 2-GNNs consist of a sequence of layers. A 1- function is MEAN, then the message function can be pulled into the
GNN layer is a triple 𝔏 = (msg, agg, comb) of functions: a message combination function of the same layer and be replaced by identity.
function msg ∶ R𝑝 → R𝑟 , an aggregation function agg mapping finite multisets over R𝑟 to R𝑟 ,
Proposition 4.2. Let 𝔑 be a 2-GNN such that on all layers of 𝔑, the
and a combination function comb ∶ R𝑝+𝑟 → R𝑞 . We call 𝑝 the input
message function is an affine linear function and the aggregation
dimension and 𝑞 the output dimension of the layer. A 2-GNN layer is
function is MEAN. Then there is a 1-GNN 𝔑 ̃ such that 𝑇𝔑 = 𝑇 ̃ .
defined in exactly the same way, except that msg ∶ R2𝑝 → R𝑟 . 𝔑
An 𝑖-GNN layer 𝔏 = (msg, agg, comb) computes a signal transfor-
4.1 Expressivity and Logical Characterisations
mation 𝑇𝔏 ∶ GS𝑝 → GS𝑞 . We let 𝑇𝔏 (𝐺, f) B (𝐺, 𝑆𝔏 (𝐺, f)), where
𝑆𝔏 (𝐺, f) ∈ S𝑞 (𝐺) is the signal defined by GNNs compute signal transformations, that is, equivariant functions
on the vertices of a graph. To be able to compare them to logics,
𝑆𝔏 (𝐺, f)(𝑣) B comb(f(𝑣), agg({{msg(f(𝑤)) ⨄︀ 𝑤 ∈ 𝑁𝐺 (𝑣)}})) (4.A)
if 𝔏 is a 1-GNN-layer, and by
𝑆𝔏 (𝐺, f)(𝑣) B comb(f(𝑣), agg({{msg(f(𝑣), f(𝑤)) ⨄︀ 𝑤 ∈ 𝑁𝐺 (𝑣)}})) (4.B)
we are interested in the queries computable by GNNs. Let 𝔑 be a GNN of input dimension ℓ and output dimension 1, and Q be a unary query on labelled graphs. We say that 𝔑 (uniformly) computes Q if for all (𝐺, b) ∈ GSℓbool (recall that we do not distinguish between an ℓ-labelled graph and the corresponding graph with an ℓ-dimensional Boolean signal) and all 𝑣 ∈ 𝑉 (𝐺) we have
𝑆 𝔑 (𝐺, b)(𝑣) ≥ 3⇑4 if 𝑣 ∈ Q(𝐺, b), and 𝑆 𝔑 (𝐺, b)(𝑣) ≤ 1⇑4 if 𝑣 ∈⇑ Q(𝐺, b).
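The layer equations (4.A) and (4.B) can be sketched as a small message-passing loop. This is an illustrative sketch under our own assumptions (the graph is an adjacency dict; `msg`, `agg`, `comb` are arbitrary callables standing in for the learned functions; all names here are ours):

```python
# Applies one GNN layer to every vertex. two_sided=False corresponds to a
# 1-GNN layer, equation (4.A): messages depend only on the sender's state.
# two_sided=True corresponds to a 2-GNN layer, equation (4.B): messages
# depend on the states of both sender and recipient.

def apply_layer(adj, signal, msg, agg, comb, two_sided=False):
    new_signal = {}
    for v, state in signal.items():
        if two_sided:
            messages = [msg(state, signal[w]) for w in adj[v]]
        else:
            messages = [msg(signal[w]) for w in adj[v]]
        new_signal[v] = comb(state, agg(messages))
    return new_signal

# Path 0-1-2: SUM-aggregating the constant message 1 computes degrees.
adj = {0: [1], 1: [0, 2], 2: [1]}
deg = apply_layer(adj, {v: 0 for v in adj},
                  msg=lambda s: 1, agg=sum, comb=lambda s, a: a)
# deg == {0: 1, 1: 2, 2: 1}
```

Note that all vertices share the same `msg`, `agg`, and `comb`, mirroring the parameter sharing of GNNs.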
Figure 4.1. Uniform expressivity: MC ≡ GC (Proposition 3.1) and MFO+C ≡ GFO+C (Theorem 3.3), with MC ≺ 1-GNN, 2-GNN ≺ GFO+C, and 1-GNN separated from 2-GNN (Theorem 5.6).
Figure 4.2. Nonuniform expressivity: (a) fully nonuniform: MC = 1-GNN and GC = 2-GNN (Proposition 3.1); (b) polysize bounded-depth: MFO+Cnu = 1-GNN and GFO+Cnu = 2-GNN (Corollary 3.4).
Barcelo et al. [2] proved that every unary query expressible by an MC-
formula is expressible by a 1-GNN with SUM-aggregation and relu (1) Q is expressible by a polynomial size bounded-depth family
activation (in the FNNs computing the message and combination of 2-GNNs if and only if Q is expressible in GFO+Cnu .
functions). Grohe [11] proved that every query expressible by a (2) Q is expressible by a polynomial size bounded-depth family
2-GNN is also expressible in GFO+C. Essentially the same proof of 1-GNNs if and only if Q is expressible in MFO+Cnu .
also yields that every query expressible by a 1-GNN is also
Actually, only assertion (1) of the theorem is proved in [11], but
expressible in MFO+C. The results are illustrated in Figure 4.1. All
the proof can easily be adapted to (2).
inequalities are strict; for the inequality between 1-GNN and 2-GNN
we prove this in the next section. Corollary 4.6. A unary query Q is expressible by a polynomial-size
As explained in the introduction, while uniform expressivity bounded-depth family of 1-GNNs if and only if it is expressible by a
seems to be the most natural notion of GNN-expressivity, much polynomial-size bounded-depth family of 2-GNNs.
of the theoretical literature on GNNs is concerned with a non-
uniform notion where the GNN may depend on the size of the input Figure 4.2 summarises the results in the non-uniform setting.
graph. We formalise non-uniform GNN-expressivity as follows. Let
N B (𝔑𝑛 )𝑛∈N be a family of GNNs. Following [11], we say that N 5 UNIFORM SEPARATION
(non-uniformly) computes a unary query Q on ℓ-labelled graphs if for In this section, we separate the expressivity of 1-GNNs and 2-GNNs.
all 𝑛 ∈ N>0 , all (𝐺, b) ∈ GSℓbool of order ⋃︀𝐺 ⋃︀ = 𝑛, and all 𝑣 ∈ 𝑉 (𝐺) it Interestingly, the separation only works for GNNs that only use
holds that SUM-aggregation. We will prove that 2-GNNs using only MEAN-
𝑆 𝔑𝑛 (𝐺, b)(𝑣) ≥ 3⇑4 if 𝑣 ∈ Q(𝐺), and 𝑆 𝔑𝑛 (𝐺, b)(𝑣) ≤ 1⇑4 if 𝑣 ∈⇑ Q(𝐺).
SUM-aggregation. We will prove that 2-GNNs using only MEAN-aggregation or only MAX-aggregation can be simulated by 1-GNNs (using the same aggregation function).
Since we consider GNNs with rational weights here, we can take 5.1 SUM Aggregation
the size of a GNN to be the bitsize of a representation of the GNN. The
Let Q1 be the query on unlabelled graphs that selects all vertices that
depth of a GNN is the sum of the depths of the FNNs computing the
have a neighbour of larger degree. That is, for every 𝐺 ∈ G0 we let
message and combination functions of the layers of the GNN. We
say that a family N B (𝔑𝑛 )𝑛∈N>0 has polynomial size if there is a Q1 (𝐺) = {𝑢 ∈ 𝑉 (𝐺) ⋂︀ ∃𝑣 ∈ 𝑁𝐺 (𝑢) ∶ deg𝐺 (𝑢) < deg𝐺 (𝑣)}.
polynomial 𝔭 such that the size of 𝔑𝑛 is at most 𝔭(𝑛). We say that This is precisely the query considered in Example 1.2(1), and we
the family has bounded depth if there is a constant 𝑑 such that for all have already observed there that the query is expressible in MFO+C.
𝑛 the depth of 𝔑𝑛 is at most 𝑑.
Let us first look at non-uniform expressivity without any restric- Theorem 5.1. The query Q1 is expressible by a 2-GNN with SUM-
tions on the family N of GNNs. We have the following logical aggregation, but not by a 1-GNN with SUM-aggregation.
characterisation of non-uniform GNN expressivity; it is a direct
It is fairly obvious that Q1 is expressible by a 2-layer 2-GNN.
consequence of the characterisation of GNN-expressivity in terms
The first layer computes the degree of all vertices by summing the
of the Weisfeiler-Leman algorithm [22, 30].
constant message 1 for all neighbours. On the second layer, the
Theorem 4.3 ([22, 30]). Let Q be a unary query. message sent along an edge (𝑣, 𝑢) is 1 if deg(𝑢) < deg(𝑣) and 0
(1) If Q is (non-uniformly) computable by a family N of 2-GNNs, otherwise. The combination function then just needs to check if the
then Q is non-uniformly expressible in MC. sum of all messages is at least 1.
(2) If Q is non-uniformly expressible in MC, then it is computable The proof that Q1 is not expressible by a 1-GNN requires some
by a family of 1-GNNs. preparation. We call a polynomial 𝔭(𝑋, 𝑌 ) in two variables nice if it
is of the form
Corollary 4.4. A unary query Q is computable by a family of 1-GNNs
if and only if it is computable by a family of 2-GNNs. 𝔭(𝑋, 𝑌 ) = ∑𝑘𝑖=0 𝑎𝑖 𝑋 ⟨︀𝑖⇑2⧹︀𝑌 [︂𝑖⇑2⌉︂ (5.A)
It has been proved in [11] that non-uniform computability by GNN with arbitrary real coefficients 𝑎𝑖 . The leading coefficient of 𝔭 is 𝑎𝑖
families of polynomial size and bounded depth can be characterised for the maximum 𝑖 such that 𝑎𝑖 ≠ 0, or 0 if all 𝑎𝑖 are 0. We call a
in terms of the modal and guarded fragments of FO+Cnu , that is, polynomial 𝔭(𝑋, 𝑌 ) co-nice if 𝔭(𝑌, 𝑋 ) is nice.
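Spelled out in code (with ⌊i/2⌋ = i // 2 and ⌈i/2⌉ = (i + 1) // 2; the function name is ours), a nice polynomial of the form (5.A) is:

```python
def nice_poly(coeffs, x, y):
    # p(X, Y) = sum_i a_i * X**floor(i/2) * Y**ceil(i/2), as in (5.A).
    return sum(a * x ** (i // 2) * y ** ((i + 1) // 2)
               for i, a in enumerate(coeffs))
```

So the monomials come in the order 1, 𝑌, 𝑋𝑌, 𝑋𝑌², 𝑋²𝑌², . . .; a co-nice polynomial swaps the roles of 𝑋 and 𝑌.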
FO+C with built-in relations.
Lemma 5.2. Let 𝔭(𝑋, 𝑌 ) be a nice or co-nice polynomial with
Theorem 4.5 ([11]). Let Q be a unary query. leading coefficient 𝑎. Then there is an 𝑛 0 ∈ N such that for all 𝑛 ≥ 𝑛 0
we have 𝑝 (𝑖), 𝑞 (𝑖) ∈ N are the input and output dimensions of the layer. We
sign (𝔭(𝑛 − 1, 𝑛)) = sign (𝔭(𝑛 + 1, 𝑛)) = sign(𝑎), have 𝑝 (0) = 0, because the input graph is unlabelled, 𝑞 (𝑖) = 𝑝 (𝑖+1)
for 0 ≤ 𝑖 < 𝑑, and 𝑞 (𝑑) = 1.
and ⋃︀𝔭(𝑛 − 1, 𝑛)⋃︀, ⋃︀𝔭(𝑛 + 1, 𝑛)⋃︀ ≥ ⋃︀𝑎⋃︀.
As input graphs, we consider complete bipartite graphs 𝐾𝑚,𝑛 .
We say that a function 𝑓 ∶ N2 → R is fast-converging to a We assume that 𝑉 (𝐾𝑚,𝑛 ) = 𝑈 ∪ 𝑉 for disjoint sets 𝑈 ,𝑉 of sizes
polynomial 𝔭(𝑋, 𝑌 ) if for all 𝑟 ∈ R there is an 𝑛 0 such that for all ⋃︀𝑈 ⋃︀ = 𝑚, ⋃︀𝑉 ⋃︀ = 𝑛 and 𝐸(𝐾𝑚,𝑛 ) = {𝑢𝑣 ⋃︀ 𝑢 ∈ 𝑈 , 𝑣 ∈ 𝑉 }. Note that
𝑛 ≥ 𝑛 0 and 𝑚 ∈ {𝑛 − 1, 𝑛 + 1} it holds that
⋂︀𝑓 (𝑚, 𝑛) − 𝔭(𝑚, 𝑛)⋂︀ ≤ 1⇑𝑛𝑟 .
Q1 (𝐾𝑚,𝑛 ) = 𝑈 if 𝑚 < 𝑛, ∅ if 𝑚 = 𝑛, 𝑉 if 𝑚 > 𝑛. (5.C)
Note that fast convergence only considers argument pairs (𝑛 − 1, 𝑛) or
(𝑛 + 1, 𝑛) for 𝑛 ∈ N>0 . We let FC(𝔭) be the class of all 𝑓 ∶ N2 → R (0)
For 𝑚, 𝑛 ∈ N, let f𝑚,𝑛 ∈ S0 (𝐾𝑚,𝑛 ) be the 0-dimensional signal on
that are fast-converging to 𝔭. We let
𝐾𝑚,𝑛 that maps each vertex to the empty tuple (), and for 1 ≤ 𝑖 ≤ 𝑑,
FC B ⋃ FC(𝔭) (𝑖) (𝑖−1) (𝑖)
let f𝑚,𝑛 B 𝑆𝔏(𝑖) (𝐾𝑚,𝑛 , f𝑚,𝑛 ). Then f𝑚,𝑛 ∈ S𝑞 (𝑖) (𝐾𝑚,𝑛 ) is the
𝔭(𝑋 ,𝑌 ) nice
signal we obtain by applying the first 𝑖 layers of 𝔑 to 𝐾𝑚,𝑛 . Since the
be the class of all functions that are fast-converging to some nice transformations computed by GNNs are equivariant, for all 𝑢, 𝑢 ′ ∈ 𝑈
polynomial. Similarly, we let FC co be the class of all 𝑓 ∶ N2 → R (𝑖) (𝑖)
it holds that f𝑚,𝑛 (𝑢) = f𝑚,𝑛 (𝑢 ′ ), and for all 𝑣, 𝑣 ′ ∈ 𝑉 it holds that
fast-converging to some co-nice polynomial. (𝑖) (𝑖)
A function 𝑔 ∶ R𝑘 → R preserves fast convergence if for all f𝑚,𝑛 (𝑣) = f𝑚,𝑛 (𝑣 ′ ).
∶ N2 → R be the function defined
(𝑖)
𝑓1, . . . , 𝑓𝑘 ∈ FC it holds that 𝑔(𝑓1, . . . , 𝑓𝑘 ) ∈ FC and for all For 𝑖 ∈ (︀𝑑⌋︀, 𝑗 ∈ (︀𝑞 (𝑖) ⌋︀, let 𝑓 𝑗
𝑓1, . . . , 𝑓𝑘 ∈ FC co it holds that 𝑔(𝑓1, . . . , 𝑓𝑘 ) ∈ FC co . by
Here 𝑔(𝑓1, . . . , 𝑓𝑘 ) ∶ N2 → R is the function defined by (𝑖) (𝑖)
f𝑚,𝑛 (𝑢) = (𝑓1 (𝑚, 𝑛), . . . , 𝑓
(𝑖)
(𝑚, 𝑛))
𝑞 (𝑖)
𝑔(𝑓1, . . . , 𝑓𝑘 )(𝑥, 𝑦) B 𝑔(𝑓1 (𝑥, 𝑦), . . . , 𝑓𝑘 (𝑥, 𝑦)).
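Numerically, fast convergence asks that on the argument pairs (𝑛 − 1, 𝑛) and (𝑛 + 1, 𝑛) the error eventually beats every inverse polynomial. A small sketch under the definition above (all names are ours):

```python
def fc_error(f, p, n):
    # Largest deviation of f from the polynomial p on the two argument
    # pairs (n - 1, n) and (n + 1, n) that fast convergence inspects.
    return max(abs(f(m, n) - p(m, n)) for m in (n - 1, n + 1))

# f converges to p(m, n) = m exponentially fast, hence faster than 1/n**r
# for every fixed r; g only converges at rate 1/n, which is not enough.
f = lambda m, n: m + 2.0 ** -n
g = lambda m, n: m + 1.0 / n
p = lambda m, n: m
```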
(𝑖)
for all 𝑢 ∈ 𝑈 . That is, 𝑓 𝑗 (𝑚, 𝑛) is the 𝑗 𝑡ℎ entry of the state of a
vertex 𝑢 ∈ 𝑈 after applying the first 𝑖 layers of 𝔑 to a graph 𝐾𝑚,𝑛 .
Lemma 5.3. (1) All constant functions 𝑐 ∶ N2 → R are in FC ∩
FC co . Similarly, we let 𝑔 𝑗 ∶ N2 → R be the function defined by
(𝑖)
(2) For 𝑓 ∶ N2 → R, define 𝑓1, 𝑓2 ∶ N2 → R by 𝑓1 (𝑚, 𝑛) B
𝑚 ⋅ 𝑓 (𝑛,𝑚) and 𝑓2 (𝑚, 𝑛) B 𝑛 ⋅ 𝑓 (𝑚, 𝑛). Then (𝑖) (𝑖) (𝑖)
f𝑚,𝑛 (𝑣) = (𝑔1 (𝑚, 𝑛), . . . , 𝑔 (𝑚, 𝑛))
𝑞 (𝑖)
co
𝑓 ∈ FC Ô⇒ 𝑓1 ∈ FC;
for all 𝑣 ∈ 𝑉 .
𝑓 ∈ FC Ô⇒ 𝑓2 ∈ FC co .
Claim 1. For all 𝑖 ∈ (︀𝑑⌋︀, 𝑗 ∈ (︀𝑞 (𝑖) ⌋︀ we have
(3) All linear functions preserve fast convergence.
(𝑖) (𝑖)
(4) All functions computed by FNNs that only use activation func- 𝑓𝑗 ∈ FC, 𝑔𝑗 ∈ FC co .
tions preserving fast convergence preserve fast convergence.
Proof. The proof is by induction on 𝑖.
Remark 5.4. The proof of Lemma 5.3(2) is the only place where we
For the base step, we observe that 𝑓 (1) must be constant, because
need “fast” convergence, with a convergence rate bounded by an inverse (0)
polynomial function in 𝑛, instead of just ordinary convergence. f maps all vertices to the empty tuple, the message function
is the identity, the sum of empty tuples is the empty tuple, so
Lemma 5.5. The relu function preserves fast convergence. the combination function receives the empty tuple as input (at all
vertices).
Proof of Theorem 5.1. Suppose for contradiction that
For the inductive step, suppose that 𝑖 > 1 and that 𝑓 𝑗 ∈
𝔑 = (𝔏(1), . . . , 𝔏(𝑑) ) FC, 𝑔 𝑗
(𝑖−1)
∈ FC co . Let 𝑝 B 𝑝 (𝑖) = 𝑞 (𝑖−1) , and let 𝑞 B 𝑞 (𝑖) .
is a 1-GNN that computes Q1 . By adding a bias of −1⇑2 to the output It will be convenient to think of comb(𝑖) ∶ R2𝑝 → R𝑞 as a tuple
of the last layer, we may actually assume that for all graphs 𝐺 and (𝑐 1, . . . , 𝑐𝑞 ), where 𝑐 𝑗 ∶ R2𝑝 → R𝑞 . Since comb(𝑖) is computable by
vertices 𝑣 ∈ 𝑉 (𝐺) it holds that an FNN with relu-activations, each 𝑐 𝑗 preserves fast convergence.
𝑆 𝔑 (𝐺)(𝑣) ≥ 1⇑4 if 𝑣 ∈ Q1 (𝐺), and 𝑆 𝔑 (𝐺)(𝑣) ≤ −1⇑4 if 𝑣 ∈⇑ Q1 (𝐺). (5.B)
By Lemma 4.1, we may assume that the message function on each 𝑣∈𝑉
layer is the identity function. Suppose that and thus, for 𝑗 ∈ (︀𝑞⌋︀,
𝔏(𝑖) = (id𝑝 (𝑖) , SUM, comb(𝑖) ), (𝑖) (𝑖−1) (𝑖−1)
𝑓𝑗 (𝑚, 𝑛) = 𝑐 𝑗 (𝑓1 (𝑚, 𝑛), . . . , 𝑓𝑝 (𝑚, 𝑛),
𝑝 (𝑖) (𝑖) 2𝑝 (𝑖) 𝑞 (𝑖)
where id𝑝 (𝑖) is the identity on R and comb ∶ R →R (𝑖−1) (𝑖−1)
𝑛 ⋅ 𝑔1 (𝑚, 𝑛), . . . , 𝑛 ⋅ 𝑔𝑝 (𝑚, 𝑛)).
is a function computed by a feed-forward neural network. Here
By the induction hypothesis, we have 𝑓_𝑘^(𝑖−1) ∈ FC and 𝑔_𝑘^(𝑖−1) ∈ FC^co. The latter implies, by Lemma 5.3(2), that the function (𝑚, 𝑛) ↦ 𝑛 · 𝑔_𝑘^(𝑖−1)(𝑚, 𝑛) is in FC. Thus 𝑓_𝑗^(𝑖) ∈ FC. The argument for 𝑔_𝑗^(𝑖) is similar. This completes the proof of the claim. ⌟

The function 𝑓_1^(𝑑) computes the output of 𝔑 at the vertices in 𝑈: for all 𝑢 ∈ 𝑈 we have 𝑓_1^(𝑑)(𝑚, 𝑛) = f^(𝑑)_{𝑚,𝑛}(𝑢) = 𝑆_𝔑(𝐾_{𝑚,𝑛})(𝑢). Thus by (5.B) and (5.C),

    𝑓_1^(𝑑)(𝑚, 𝑛) ≥ 1/4 if 𝑚 < 𝑛,   𝑓_1^(𝑑)(𝑚, 𝑛) ≤ −1/4 if 𝑚 ≥ 𝑛.

Thus for all 𝑛 ∈ N_{>0}, 𝑓_1^(𝑑)(𝑛 − 1, 𝑛) ≥ 1/4 and 𝑓_1^(𝑑)(𝑛 + 1, 𝑛) ≤ −1/4. Since 𝑓_1^(𝑑) ∈ FC, it is easy to see that this contradicts Lemma 5.2. □

We assumed that FNNs only use relu and the identity function as activations. The only property of relu that we use in the proof of Theorem 5.6 is that it preserves fast convergence. So, actually, our proof yields the following more general result.

Theorem 5.6. The query Q1 is expressible by a 2-GNN with SUM-aggregation, but not by a 1-GNN with SUM-aggregation and with arbitrary activation functions that preserve fast convergence.

Not only relu, but many other standard activation functions, among them the logistic function sig: 𝑥 ↦ 1/(1 + 𝑒^{−𝑥}), preserve fast convergence. Interestingly, there are also somewhat natural functions that, when used as activations, allow 1-GNNs to express the query Q1.

Example 5.7. (1) The query Q1 can be expressed by a 1-GNN with SUM-aggregation and the square root function as activation function.
(2) The main results of [11] hold for a class of piecewise linear approximable activations. The functions 𝑓, 𝑔: R → R defined by 𝑓(𝑥) = min{1, 1/|𝑥|} and 𝑔(𝑥) = min{1, 1/√|𝑥|} (with 𝑓(0) = 𝑔(0) = 1) are piecewise linear approximable. Yet it can be shown that Q1 can be expressed by a 1-GNN with SUM-aggregation that uses 𝑓 and 𝑔 as activations.
We leave the proofs of these assertions to the reader.

5.2 The MEAN-MAX theorem

Maybe surprisingly, the use of SUM-aggregation is crucial in Theorem 5.6. The corresponding result for MEAN or MAX aggregation does not hold. To the contrary, we have the following.

Theorem 5.8. (1) Every query computable by a 2-GNN with MAX-aggregation is computable by a 1-GNN with MAX-aggregation.
(2) Every query computable by a 2-GNN with MEAN-aggregation is computable by a 1-GNN with MEAN-aggregation.

To explain the proof, let us first consider MAX-aggregation. Consider a 2-GNN 𝔑 with MAX-aggregation that computes some query Q on ℓ-labelled graphs (𝐺, f) ∈ GS^ℓ_bool, for some fixed ℓ. The initial signal f only takes values in the finite set {0, 1}^ℓ. It is not hard to prove, by induction on the number of layers, that in a 2-GNN with MAX-aggregation, the range of the signal remains finite through all layers of the computation. Now the trick is to use a one-hot encoding of the possible values. Suppose that after 𝑖 steps of the computation, the signal f^(𝑖) maps all vertices to vectors in some finite set 𝑆^(𝑖) = {𝒔_1, . . . , 𝒔_𝑚} ⊆ R^𝑞. Importantly, this set 𝑆^(𝑖) is independent of the input graph (𝐺, f) ∈ GS^ℓ_bool; it only depends on ℓ and the GNN. In a one-hot encoding of the set 𝑆^(𝑖) we represent the element 𝒔_𝑗 by the 𝑗-th unit vector 𝒆_𝑗 ∈ R^𝑚. We can construct a 1-GNN 𝔑′ that simulates the computation of 𝔑, but represents all states f^(𝑖)(𝑢) by their one-hot encoding. We only need a 1-GNN with identity messages for this, because aggregating, that is, taking the coordinatewise maximum of, the one-hot encoded states of the neighbours gives a node the full set of states appearing at the neighbours. Since MAX-aggregation is not sensitive to multiplicities, this is sufficient to reconstruct the messages of the original 2-GNN 𝔑 and simulate the necessary computations in the combination function.

If we try to prove the assertion for MEAN-aggregation similarly, we already fail at the first step. The range of the signals computed by a MEAN-GNN on inputs (𝐺, f) ∈ GS^ℓ_bool is infinite in general. The trick to resolve this is to use finite-valued approximations and then use one-hot encodings of these. The reason that this scheme works is that the functions computed by FNNs are Lipschitz continuous, and that MEAN aggregation (as opposed to SUM aggregation) keeps the approximation error bounded.

6 CONCLUSIONS

We study the expressivity of graph neural networks with 1-sided and 2-sided message passing, and closely related to this, the expressivity of modal and guarded first-order logic with counting. The picture we see is surprisingly complicated: on the logical side, the modal and guarded fragments have the same expressivity. This implies that 1-GNNs and 2-GNNs have the same expressivity in non-uniform settings. This contrasts with the uniform setting, where we can separate 1-GNNs from 2-GNNs, but only if they use SUM aggregation.

The proofs of these results introduce novel techniques. Our proof that MFO+C ≡ GFO+C is based on a hashing trick; we are not aware of similar arguments used elsewhere in a logical context (similar techniques are being used in complexity theory). Furthermore, there are not many inexpressibility results for graph neural networks that are not based on the Weisfeiler-Leman algorithm. A notable exception is [26], and we built on techniques developed there. A specific idea that is new here is our notion of fast convergence, which allows us to extend our result to activation functions other than relu.

Some questions remain open. In the uniform setting, the query that we use to separate 2-GNNs from 1-GNNs with SUM-aggregation turns out to be expressible by 1-GNNs that use both SUM and MEAN. Is there also a query that is expressible by 2-GNNs, but not by 1-GNNs with SUM and MEAN, or even SUM, MEAN, and MAX aggregation?

In the non-uniform setting, we have not considered families of GNNs of polynomial size and unbounded depth. Similarly to polynomial-size families of Boolean circuits, such families can be very powerful, and we currently have no techniques for establishing lower bounds (that is, inexpressivity) except for arguments based on Weisfeiler-Leman, which cannot separate 1-GNNs from 2-GNNs.
Related to this is also the question if recurrent 2-GNNs are more expressive than recurrent 1-GNNs.

REFERENCES
[1] Franz Baader and Filippo De Bortoli. 2019. On the expressive power of description logics with cardinality constraints on finite and infinite sets. In Frontiers of Combining Systems - 12th International Symposium, FroCoS 2019, London, UK, September 4-6, 2019, Proceedings (Lecture Notes in Computer Science). Andreas Herzig and Andrei Popescu, (Eds.) Vol. 11715. Springer, 203–219. doi: 10.1007/978-3-030-29007-8_12.
[2] Pablo Barceló, Egor V. Kostylev, Mikaël Monet, Jorge Pérez, Juan L. Reutter, and Juan Pablo Silva. 2020. The logical expressiveness of graph neural networks. In 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, April 26-30, 2020. OpenReview.net. https://openreview.net/forum?id=r1lZ7AEKvB.
[3] David A. Mix Barrington, Neil Immerman, and Howard Straubing. 1990. On uniformity within NC¹. J. Comput. Syst. Sci., 41, 3, 274–306. doi: 10.1016/0022-0000(90)90022-D.
[4] Elias Bernreuther, Thorben Finke, Felix Kahlhoefer, Michael Krämer, and Alexander Mück. 2021. Casting a graph net to catch dark showers. SciPost Physics, 10. doi: 10.21468/SciPostPhys.10.2.046.
[5] Quentin Cappart, Didier Chételat, Elias B. Khalil, Andrea Lodi, Christopher Morris, and Petar Velickovic. 2023. Combinatorial optimization and reasoning with graph neural networks. J. Mach. Learn. Res., 24, 130:1–130:61. http://jmlr.org/papers/v24/21-0449.html.
[6] Ines Chami, Sami Abu-El-Haija, Bryan Perozzi, Christopher Ré, and Kevin Murphy. 2022. Machine learning on graphs: A model and comprehensive taxonomy. Journal of Machine Learning Research, 23, 89, 1–64.
[7] Kit Fine. 1972. In so many possible worlds. Notre Dame J. Formal Log., 13, 4, 516–520. doi: 10.1305/NDJFL/1093890715.
[8] C. Gallicchio and A. Micheli. 2010. Graph echo state networks. In Proceedings of the IEEE International Joint Conference on Neural Networks.
[9] Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. 2017. Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning, ICML 2017, Sydney, NSW, Australia, 6-11 August 2017 (Proceedings of Machine Learning Research). Doina Precup and Yee Whye Teh, (Eds.) Vol. 70. PMLR, 1263–1272. http://proceedings.mlr.press/v70/gilmer17a.html.
[10] Erich Grädel, Martin Otto, and Eric Rosen. 1997. Two-variable logic with counting is decidable. In Proceedings, 12th Annual IEEE Symposium on Logic in Computer Science, Warsaw, Poland, June 29 - July 2, 1997. IEEE Computer Society, 306–317. doi: 10.1109/LICS.1997.614957.
[11] Martin Grohe. 2023. The descriptive complexity of graph neural networks. In Proceedings of the 38th Annual ACM/IEEE Symposium on Logic in Computer Science. doi: 10.1109/LICS56636.2023.10175735.
[12] Martin Grohe. 2023. The descriptive complexity of graph neural networks. ArXiv, 2303.04613. doi: 10.48550/arXiv.2303.04613.
[13] Martin Grohe. 2021. The logic of graph neural networks. In 36th Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2021, Rome, Italy, June 29 - July 2, 2021. IEEE, 1–17. doi: 10.1109/LICS52264.2021.9470677.
[14] William Hesse. 2001. Division is in uniform TC⁰. In Automata, Languages and Programming, 28th International Colloquium, ICALP 2001, Crete, Greece, July 8-12, 2001, Proceedings (Lecture Notes in Computer Science). Fernando Orejas, Paul G. Spirakis, and Jan van Leeuwen, (Eds.) Vol. 2076. Springer, 104–114. doi: 10.1007/3-540-48224-5_9.
[15] William Hesse, Eric Allender, and David A. Mix Barrington. 2002. Uniform constant-depth threshold circuits for division and iterated multiplication. J. Comput. Syst. Sci., 65, 4, 695–716. doi: 10.1016/S0022-0000(02)00025-9.
[16] Bernhard Hollunder and Franz Baader. 1991. Qualifying number restrictions in concept languages. In Proceedings of the 2nd International Conference on Principles of Knowledge Representation and Reasoning (KR'91), Cambridge, MA, USA, April 22-25, 1991. James F. Allen, Richard Fikes, and Erik Sandewall, (Eds.) Morgan Kaufmann, 335–346. https://dblp.org/rec/conf/kr/HollunderB91.
[17] N. Immerman and E. Lander. 1990. Describing graphs: a first-order approach to graph canonization. In Complexity theory retrospective. A. Selman, (Ed.) Springer-Verlag, 59–81.
[18] Neil Immerman. 1999. Descriptive complexity. Graduate texts in computer science. Springer. isbn: 978-1-4612-6809-3. doi: 10.1007/978-1-4612-0539-5.
[19] Emanuel Kieronski, Ian Pratt-Hartmann, and Lidia Tendera. 2018. Two-variable logics with counting and semantic constraints. ACM SIGLOG News, 5, 3, 22–43. doi: 10.1145/3242953.3242958.
[20] T. N. Kipf and M. Welling. 2017. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations.
[21] Dietrich Kuske and Nicole Schweikardt. 2017. First-order logic with counting. In 32nd Annual ACM/IEEE Symposium on Logic in Computer Science, LICS 2017, Reykjavik, Iceland, June 20-23, 2017. IEEE Computer Society, 1–12. doi: 10.1109/LICS.2017.8005133.
[22] Christopher Morris, Martin Ritzert, Matthias Fey, William L. Hamilton, Jan Eric Lenssen, Gaurav Rattan, and Martin Grohe. 2019. Weisfeiler and Leman go neural: higher-order graph neural networks. In The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, The Thirty-First Innovative Applications of Artificial Intelligence Conference, IAAI 2019, The Ninth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019. AAAI Press, 4602–4609. doi: 10.1609/AAAI.V33I01.33014602.
[23] Martin Otto. 1997. Bounded variable logics and counting – A study in finite models. Lecture Notes in Logic. Vol. 9. Springer Verlag.
[24] Martin Otto. 2019. Graded modal logic and counting bisimulation. ArXiv, 1910.00039. http://arxiv.org/abs/1910.00039.
[25] Ian Pratt-Hartmann. 2007. Complexity of the guarded two-variable fragment with counting quantifiers. J. Log. Comput., 17, 1, 133–155. doi: 10.1093/LOGCOM/EXL034.
[26] Eran Rosenbluth, Jan Toenshoff, and Martin Grohe. 2023. Some might say all you need is sum. In Proceedings of the 32nd International Joint Conference on Artificial Intelligence, 4172–4179. doi: 10.24963/ijcai.2023/464.
[27] F. Scarselli, M. Gori, A.C. Tsoi, M. Hagenbuchner, and G. Monfardini. 2009. The graph neural network model. IEEE Transactions on Neural Networks, 20, 1, 61–80.
[28] Shyam A. Tailor, Felix L. Opolka, Pietro Liò, and Nicholas Donald Lane. 2022. Do we need anisotropic graph neural networks? In The Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, April 25-29, 2022. OpenReview.net. https://openreview.net/forum?id=hl9ePdHO4_s.
[29] Stephan Tobies. 2001. PSPACE reasoning for graded modal logics. J. Log. Comput., 11, 1, 85–106. doi: 10.1093/LOGCOM/11.1.85.
[30] Keyulu Xu, Weihua Hu, Jure Leskovec, and Stefanie Jegelka. 2019. How powerful are graph neural networks? In 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, May 6-9, 2019. OpenReview.net. https://openreview.net/forum?id=ryGs6iA5Km.
A DETAILS OMITTED FROM SECTION 3

We start by proving Proposition 3.1. For the reader's convenience, we restate the proposition, and we will do the same with all results we prove in the appendix.

Proposition 3.1. MC ≡ GC.

Proof. We only need to translate each GC-formula

    𝜑(𝑥) = ∃^{≥𝑛} 𝑥′.(𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥, 𝑥′)),

where (by induction) we assume that 𝜓(𝑥, 𝑥′) is modal, to an MC-formula.

We note that 𝜓(𝑥, 𝑥′) is a Boolean combination of the formulas 𝑥 = 𝑥′ and 𝐸(𝑥, 𝑥′) as well as MC-formulas with only one free variable. We may further assume that this Boolean combination is in disjunctive normal form, that is, 𝜓(𝑥, 𝑥′) = ⋁^𝑘_{𝑖=1} 𝛾_𝑖(𝑥, 𝑥′), where each 𝛾_𝑖 is a conjunction of atomic formulas 𝑥 = 𝑥′, 𝐸(𝑥, 𝑥′), their negations ¬𝑥 = 𝑥′, ¬𝐸(𝑥, 𝑥′), and MC-formulas with only one free variable. In fact, we may even assume that the 𝛾_𝑖 are mutually exclusive, that is, for 𝑖 ≠ 𝑖′ the formula 𝛾_𝑖 ∧ 𝛾_{𝑖′} is unsatisfiable. To achieve this, let us assume that 𝛾_𝑖 = (𝜆_1 ∧ . . . ∧ 𝜆_𝑚). Then we replace 𝛾_𝑖 ∨ 𝛾_{𝑖′} by 𝛾_𝑖 ∨ ⋁^𝑚_{𝑗=1}(𝛾_{𝑖′} ∧ ¬𝜆_𝑗). We can easily lift this construction to the whole disjunction ⋁^𝑘_{𝑖=1} 𝛾_𝑖.

Since we are only considering simple undirected graphs, we do not need to consider atoms 𝐸(𝑥′, 𝑥) or 𝐸(𝑥, 𝑥). Of course we may assume that we never have an atom and its negation appearing together in the same conjunction 𝛾_𝑖. Again since we are in simple graphs, we may further assume that 𝑥 = 𝑥′ and 𝐸(𝑥, 𝑥′) never appear together in a 𝛾_𝑖. Finally, since we are taking the conjunction of 𝜓(𝑥, 𝑥′) with 𝐸(𝑥, 𝑥′), we may assume that ¬𝐸(𝑥, 𝑥′) and 𝑥 = 𝑥′ never appear in a 𝛾_𝑖. Moreover, we can omit 𝐸(𝑥, 𝑥′) and ¬𝑥 = 𝑥′ from all 𝛾_𝑖. All this implies that we may actually assume that 𝛾_𝑖 = 𝜒_𝑖(𝑥) ∧ 𝜒′_𝑖(𝑥′) for MC-formulas 𝜒_𝑖, 𝜒′_𝑖. Hence

    𝜑(𝑥) = ∃^{≥𝑛} 𝑥′.(𝐸(𝑥, 𝑥′) ∧ ⋁^𝑘_{𝑖=1}(𝜒_𝑖(𝑥) ∧ 𝜒′_𝑖(𝑥′))).

Then

    𝜑(𝑥) ≡ ∃^{≥𝑛} 𝑥′. ⋁^𝑘_{𝑖=1}(𝐸(𝑥, 𝑥′) ∧ 𝜒_𝑖(𝑥) ∧ 𝜒′_𝑖(𝑥′))
         ≡ ⋁_{𝑛_1,...,𝑛_𝑘 ∈ [𝑛+1[, 𝑛_1+...+𝑛_𝑘=𝑛} ⋀^𝑘_{𝑖=1} ∃^{≥𝑛_𝑖} 𝑥′.(𝐸(𝑥, 𝑥′) ∧ 𝜒_𝑖(𝑥) ∧ 𝜒′_𝑖(𝑥′))
         ≡ ⋁_{𝑛_1,...,𝑛_𝑘 ∈ [𝑛+1[, 𝑛_1+...+𝑛_𝑘=𝑛} ⋀^𝑘_{𝑖=1}(𝜒_𝑖(𝑥) ∧ ∃^{≥𝑛_𝑖} 𝑥′.(𝐸(𝑥, 𝑥′) ∧ 𝜒′_𝑖(𝑥′))).

The last formula is modal. □

Lemma 3.7. Every FO+C-formula 𝜑 with no free number variables is equivalent to a simple FO+C-formula 𝜑′. Furthermore, if 𝜑 is in MFO+C or GFO+C, then 𝜑′ is in MFO+C or GFO+C, respectively, as well.

Proof. To establish property (i) of simple formulas, we observe that

    #(𝑥_1, . . . , 𝑥_𝑘, 𝑦_1 < 𝜃_1, . . . , 𝑦_ℓ < 𝜃_ℓ).𝜓

is equivalent to

    #(𝑥_1, . . . , 𝑥_𝑘, 𝑦_1 < ord^{𝑑_1}, . . . , 𝑦_ℓ < ord^{𝑑_ℓ}).(⋀^ℓ_{𝑖=1} 𝑦_𝑖 < 𝜃_𝑖 ∧ 𝜓)

if we can guarantee that 𝜃_𝑖 takes values < 𝑛^{𝑑_𝑖} in graphs of order 𝑛.

Property (ii) is proved by an induction based on the following observation. Suppose that 𝜑 is a formula that contains a term 𝜃 such that no free variable of 𝜃 appears in the scope of any quantifier (i.e., counting operator) in 𝜑. To achieve this we can rename bound variables. Let 𝜑′ be the formula obtained from 𝜑 by replacing each occurrence of 𝜃 by a number variable 𝑦. Then 𝜑 is equivalent to ∃𝑦 < ord^𝑑 (𝑦 = 𝜃 ∧ 𝜑′) if we can guarantee that 𝜃 takes values < 𝑛^𝑑 in graphs of order 𝑛. □

Lemma 3.10. Let 𝑘, 𝑚 ∈ N and M ⊆ [2^𝑚[. Furthermore, let 𝑃 ⊆ N be a set of primes of cardinality |𝑃| ≥ 𝑘𝑚|M|². Then

    Pr_{𝑝∈𝑃}(∃𝑀, 𝑁 ∈ M, 𝑀 ≠ 𝑁 : 𝑀 ≡ 𝑁 mod 𝑝) < 1/𝑘,

where the probability ranges over 𝑝 ∈ 𝑃 chosen uniformly at random.

Proof. Suppose that M = {𝑀_1, . . . , 𝑀_𝑛} with 𝑀_1 < 𝑀_2 < . . . < 𝑀_𝑛, and for 1 ≤ 𝑖 < 𝑗 ≤ 𝑛, let 𝐷_{𝑖𝑗} := 𝑀_𝑗 − 𝑀_𝑖. Note that for every 𝑝,

    𝑀_𝑖 ≡ 𝑀_𝑗 mod 𝑝 ⟺ 𝑝 | 𝐷_{𝑖𝑗}

(𝑝 | 𝐷_{𝑖𝑗} means "𝑝 divides 𝐷_{𝑖𝑗}"). Since 𝐷_{𝑖𝑗} < 2^𝑚, it has less than 𝑚 prime factors. Thus

    Pr_{𝑝∈𝑃}(𝑀_𝑖 ≡ 𝑀_𝑗 mod 𝑝) = Pr_{𝑝∈𝑃}(𝑝 | 𝐷_{𝑖𝑗}) < 𝑚/|𝑃|.

By the union bound,

    Pr_{𝑝∈𝑃}(∃𝑀, 𝑁 ∈ M, 𝑀 ≠ 𝑁 : 𝑀 ≡ 𝑁 mod 𝑝) = Pr_{𝑝∈𝑃}(∃𝑖, 𝑗 ∈ [𝑛], 𝑖 < 𝑗 : 𝑀_𝑖 ≡ 𝑀_𝑗 mod 𝑝) < 𝑚𝑛(𝑛 − 1)/(2|𝑃|) < 1/𝑘. □

Lemma 3.11. There is an 𝑛_0 ≥ 2 such that for all 𝑛 ≥ 𝑛_0 there are at least 𝑛 primes 𝑝 ≤ 2𝑛 ln 𝑛.

Proof. Let 𝜋(𝑛) be the number of primes in [𝑛]. By the prime number theorem, we have

    lim_{𝑛→∞} 𝜋(𝑛)/(𝑛/ln 𝑛) = 1.

Thus there is an 𝑛_1 such that for all 𝑛 ≥ 𝑛_1 we have 𝜋(𝑛) ≥ 3𝑛/(4 ln 𝑛). Moreover, there is an 𝑛_2 ∈ N such that for all 𝑛 ≥ 𝑛_2 we have 2 ln(2 ln 𝑛) ≤ ln 𝑛. Then for 𝑛 ≥ max{𝑛_1, 𝑛_2} we have

    𝜋(2𝑛 ln 𝑛) ≥ 6𝑛 ln 𝑛/(4 ln 𝑛 + 4 ln(2 ln 𝑛)) ≥ 6𝑛 ln 𝑛/(6 ln 𝑛) = 𝑛. □

Theorem 3.3. MFO+C ≡ GFO+C.
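Lemmas 3.10 and 3.11 can be sanity-checked numerically. The parameters below are illustrative toy choices, much smaller than the ones used in the proof of Theorem 3.3.

```python
# Sanity check of the hashing idea behind Lemmas 3.10 and 3.11
# (illustrative toy parameters, not the ones from the proof).
import math

def primes_up_to(limit):
    # simple sieve of Eratosthenes
    sieve = [True] * (limit + 1)
    sieve[0:2] = [False, False]
    for i in range(2, int(limit ** 0.5) + 1):
        if sieve[i]:
            sieve[i * i :: i] = [False] * len(sieve[i * i :: i])
    return [p for p, is_p in enumerate(sieve) if is_p]

# Lemma 3.11-style bound: at least n primes below 2 n ln n (checked for n = 100)
n = 100
assert len(primes_up_to(int(2 * n * math.log(n)))) >= n

# Lemma 3.10-style bound: each pair of distinct m-bit numbers collides
# modulo fewer than m primes, so a large pool contains few "bad" primes.
m = 16
M = {12345, 54321, 40000, 65535}             # some numbers below 2^m
P = primes_up_to(2 * m * len(M) ** 2)        # pool size for k = 2
bad = [p for p in P
       if any(a != b and a % p == b % p for a in M for b in M)]
assert len(bad) / len(P) < 1 / 2             # more than half the primes are good
```

Picking a prime uniformly from `P` therefore separates all elements of `M` with probability above 1/2, which is exactly how good primes are obtained in the proof that follows.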
Proof. By induction, we prove that for every GFO+C-formula 𝜑(𝑥, 𝑦_1, . . . , 𝑦_𝑘) and each assignment of a degree deg(𝑦_𝑖) to the free number variables there is an MFO+C-formula 𝜑̂(𝑥, 𝑦_1, . . . , 𝑦_𝑘) such that the following holds: for all graphs 𝐺 of order 𝑛 := |𝐺|, all 𝑣 ∈ 𝑉(𝐺), and all 𝑎_1 ∈ [𝑛^{deg(𝑦_1)}[, . . . , 𝑎_𝑘 ∈ [𝑛^{deg(𝑦_𝑘)}[ it holds that

    𝐺 ⊧ 𝜑(𝑣, 𝑎_1, . . . , 𝑎_𝑘) ⟺ 𝐺 ⊧ 𝜑̂(𝑣, 𝑎_1, . . . , 𝑎_𝑘).

We may assume that 𝜑 is a simple formula. We let

    𝑑 := max{deg_𝜑(𝑦) | 𝑦 number variable of 𝜑}.

The only interesting step involves counting terms. Since 𝜑 is simple, this means that we have to consider a formula

    𝜑(𝑥, 𝑦_0, . . . , 𝑦_𝑘) = (𝑦_0 = #(𝑥′, 𝑦_{𝑘+1} < ord^{𝑑_{𝑘+1}}, . . . , 𝑦_{𝑘+ℓ} < ord^{𝑑_{𝑘+ℓ}}).(𝐸(𝑥, 𝑥′) ∧ 𝜓(𝑥, 𝑥′, 𝑦_1, . . . , 𝑦_{𝑘+ℓ}))),

where for all 𝑖 ∈ [𝑘 + ℓ] we let 𝑑_𝑖 := deg_𝜑(𝑦_𝑖).

We need to understand the structure of 𝜓. Let us call maximal subformulas of 𝜓 with only one free vertex variable vertex formulas. We distinguish between 𝑥-formulas, where only 𝑥 occurs freely, and 𝑥′-formulas, where only 𝑥′ occurs freely. The formula 𝜓 is formed from relational atoms 𝑥 = 𝑥′, 𝐸(𝑥, 𝑥′), arithmetical formulas (that contain neither 𝑥 nor 𝑥′), and vertex formulas, using Boolean connectives, inequalities between terms, and counting terms of the form

    #(𝑦′_1 < ord^{𝑑′_1}, . . . , 𝑦′_{ℓ′} < ord^{𝑑′_{ℓ′}}).𝜒.    (A.A)

Similarly to the proof of Proposition 3.1, we can argue that we do not need any relational atoms in 𝜓, because in 𝜑 we take the conjunction of 𝜓 with 𝐸(𝑥, 𝑥′), which means that in 𝜓 the atom 𝐸(𝑥, 𝑥′) can be set to "true" and the atom 𝑥 = 𝑥′ can be set to "false". Moreover, since the graph is undirected, we can also set 𝐸(𝑥′, 𝑥) to "true". (Of course this only holds for atoms that are not in the scope of some counting term that binds 𝑥 or 𝑥′, that is, not contained in a vertex formula.) So we assume that 𝜓 is actually formed from arithmetical formulas and vertex formulas using Boolean connectives, inequalities between terms, and counting terms of the form (A.A).

To turn 𝜑 into a modal formula, we need to eliminate the 𝑥-formulas in 𝜓. Let 𝜒_1, . . . , 𝜒_𝑞 be an enumeration of all 𝑥-formulas in 𝜓, where 𝜒_𝑖 = 𝜒_𝑖(𝑥, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}). Here the 𝑦_{𝑖,𝑗} may be variables in {𝑦_1, . . . , 𝑦_{𝑘+ℓ}}, or they may be number variables 𝑦′_𝑖 introduced by counting terms (A.A).

Let 𝐺 be a graph of order 𝑛. Let 𝑚 := 𝑛^𝑑. Since 𝑑 ≥ deg_𝜑(𝑦) for all number variables 𝑦 appearing in 𝜑, when evaluating 𝜑 in 𝐺, number variables only take values in [𝑚[. Let 𝑖 ∈ [𝑞]. Then for every 𝑣 ∈ 𝑉(𝐺), the formula 𝜒_𝑖 defines a relation

    𝑅_𝑖(𝑣) := {(𝑎_1, . . . , 𝑎_{𝑘_𝑖}) ∈ [𝑚[^{𝑘_𝑖} | 𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖})}.

With each tuple (𝑎_1, . . . , 𝑎_{𝑘_𝑖}) ∈ [𝑚[^{𝑘_𝑖} we associate the (small) number

    ⟨𝑎_1, . . . , 𝑎_{𝑘_𝑖}⟩ := Σ^{𝑘_𝑖}_{𝑗=1} 𝑎_𝑗 𝑚^{𝑗−1} ∈ [𝑚^{𝑘_𝑖}[.

Then we can encode 𝑅_𝑖(𝑣) by the (large) number

    𝑁_𝑖(𝑣) := Σ_{(𝑎_1,...,𝑎_{𝑘_𝑖})∈𝑅_𝑖(𝑣)} 2^{⟨𝑎_1,...,𝑎_{𝑘_𝑖}⟩} ∈ ]2^{𝑚^{𝑘_𝑖}}].

Let

    N(𝐺) := {𝑁_𝑖(𝑣) | 𝑖 ∈ [𝑞], 𝑣 ∈ 𝑉(𝐺)}.

Then |N(𝐺)| ≤ 𝑞𝑛, and for 𝑘_0 := max{𝑘_𝑖 | 𝑖 ∈ [𝑞]} we have N(𝐺) ⊆ ]2^{𝑚^{𝑘_0}}]. We want to use small primes 𝑝 (of size polynomial in 𝑛) to hash the set N(𝐺) to a set of small numbers. A prime 𝑝 is good for 𝐺 if for all distinct 𝑁, 𝑁′ ∈ N(𝐺) it holds that 𝑁 ≢ 𝑁′ mod 𝑝.

There is some 𝑐 ∈ N that does not depend on 𝑛, but only on the formula 𝜑 and the parameters 𝑑, 𝑘_0, 𝑞 derived from 𝜑, such that

    𝑛^𝑐 ≥ 4𝑞²𝑛^{𝑑𝑘_0+2} ln(2𝑞²𝑛^{𝑑𝑘_0+2}) = 4𝑞²𝑚^{𝑘_0}𝑛² ln(2𝑞²𝑚^{𝑘_0}𝑛²).

Let 𝑃_𝑛 be the set of all primes less than or equal to 𝑛^𝑐. Then by Lemma 3.11, |𝑃_𝑛| ≥ 2𝑞²𝑚^{𝑘_0}𝑛² ≥ 2𝑚^{𝑘_0}|N(𝐺)|². By Lemma 3.10 with 𝑘 := 2, 𝑚 := 𝑚^{𝑘_0}, M := N(𝐺), more than half of the primes in 𝑃_𝑛 are good.

Now suppose 𝑝 ∈ 𝑃_𝑛 is a prime that is good for 𝐺. For every 𝑖 ∈ [𝑞], 𝑣 ∈ 𝑉, we let 𝑛_𝑖(𝑣, 𝑝) ∈ [𝑝[ such that 𝑛_𝑖(𝑣, 𝑝) ≡ 𝑁_𝑖(𝑣) mod 𝑝. Observe that for all vertices 𝑣, 𝑣′ ∈ 𝑉(𝐺), if 𝑛_𝑖(𝑣, 𝑝) = 𝑛_𝑖(𝑣′, 𝑝) then 𝑁_𝑖(𝑣) = 𝑁_𝑖(𝑣′) and thus 𝑅_𝑖(𝑣) = 𝑅_𝑖(𝑣′). This means that for all 𝑎_1, . . . , 𝑎_{𝑘_𝑖} ∈ [𝑚[ it holds that

    𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}) ⟺ 𝐺 ⊧ 𝜒_𝑖(𝑣′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}).

Claim 1. For every 𝑖 ∈ [𝑞] there is a formula 𝜁_𝑖(𝑥, 𝑧, 𝑧_𝑖) such that for all 𝑣 ∈ 𝑉(𝐺) and 𝑏 ∈ N,

    𝐺 ⊧ 𝜁_𝑖(𝑣, 𝑝, 𝑏) ⟺ 𝑏 = 𝑛_𝑖(𝑣, 𝑝).

Here 𝑧 and 𝑧_𝑖 are fresh number variables that do not occur in 𝜑.

Proof. Let

    𝛼(𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}) := 𝑦_{𝑖,1} + 𝑦_{𝑖,2} · ord^𝑑 + . . . + 𝑦_{𝑖,𝑘_𝑖} · ord^{𝑑(𝑘_𝑖−1)}.

Then 𝛼 is an arithmetical FO+C-term satisfying

    𝛼^𝐺(𝑎_1, . . . , 𝑎_{𝑘_𝑖}) = ⟨𝑎_1, . . . , 𝑎_{𝑘_𝑖}⟩.

Let

    𝛽(𝑦, 𝑥) := ∃(𝑦_{𝑖,1} < ord^𝑑, . . . , 𝑦_{𝑖,𝑘_𝑖} < ord^𝑑).(𝜒_𝑖(𝑥, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}) ∧ 𝑦 = 𝛼(𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖})).

Then

    ⟨𝛽(𝑦, 𝑥), ord^{𝑑𝑘_0}⟩^𝐺(𝑣) = 𝑁_𝑖(𝑣).

Let mod(𝑦, 𝑦̂, 𝑥) be the formula obtained by applying Corollary 3.9 with 𝜑(𝑦, 𝒛) := 𝛽(𝑦, 𝑥), 𝜃(𝒛) := ord^{𝑑𝑘_0}. Then

    𝐺 ⊧ mod(𝑏, 𝑝, 𝑣) ⟺ 𝑏 = 𝑛_𝑖(𝑣, 𝑝).

We let 𝜁_𝑖(𝑥, 𝑧, 𝑧_𝑖) := mod(𝑧_𝑖, 𝑧, 𝑥). ⌟

Crucially, the formulas 𝜁_𝑖 in Claim 1 do not depend on the graph 𝐺, but only on the formula 𝜑. The same will hold for all formulas defined in the following.

For every 𝑖 ∈ [𝑞], let

    𝜒′_𝑖(𝑥′, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}, 𝑧, 𝑧_𝑖) := ∃𝑥(𝐸(𝑥′, 𝑥) ∧ 𝜁_𝑖(𝑥, 𝑧, 𝑧_𝑖) ∧ 𝜒_𝑖(𝑥, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖})).

Note that 𝜒′_𝑖 is an MFO+C-formula.

Claim 2. For all (𝑣, 𝑣′) ∈ 𝐸(𝐺) and 𝑎_1, . . . , 𝑎_{𝑘_𝑖} ∈ [𝑚[ it holds that

    𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}) ⟺ 𝐺 ⊧ 𝜒′_𝑖(𝑣′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}, 𝑝, 𝑛_𝑖(𝑣, 𝑝)).
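The two encodings above are injective, which a small sketch makes concrete (the values of 𝑚 and 𝑘 below are illustrative toy choices): a tuple over [𝑚[ is packed into the base-𝑚 number ⟨𝑎_1, . . . , 𝑎_𝑘⟩, and a relation 𝑅 ⊆ [𝑚[^𝑘 into the integer 𝑁 whose bit at position ⟨𝒂⟩ is set for each tuple 𝒂 ∈ 𝑅.

```python
# Toy check of the tuple and relation encodings (illustrative m and k).
import itertools

m, k = 5, 3

def code(tup):                 # <a_1, ..., a_k> = sum_j a_j * m^(j-1)
    return sum(a * m ** j for j, a in enumerate(tup))

def encode_relation(R):        # N = sum over tuples a in R of 2^<a>
    return sum(1 << code(t) for t in R)

R = {(0, 1, 2), (4, 4, 4), (1, 0, 0)}
N = encode_relation(R)
# the relation can be read back off the bits of N, so the encoding is injective
decoded = {t for t in itertools.product(range(m), repeat=k)
           if (N >> code(t)) & 1}
assert decoded == R
```

Hashing the (large) integer `N` modulo a good prime, as in the surrounding proof, then yields a small fingerprint that still distinguishes the finitely many relations occurring in a graph.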
Proof. For the forward direction, suppose that 𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}). Since (𝑣, 𝑣′) ∈ 𝐸, we have 𝐺 ⊧ 𝜒′_𝑖(𝑣′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}, 𝑝, 𝑛_𝑖(𝑣, 𝑝)), because we can take 𝑣 as a witness for the existential quantifier.

For the backward direction, suppose that 𝐺 ⊧ 𝜒′_𝑖(𝑣′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}, 𝑝, 𝑛_𝑖(𝑣, 𝑝)). Let 𝑣′′ be a witness for the existential quantifier in 𝜒′_𝑖. Then

    𝐺 ⊧ 𝜁_𝑖(𝑣′′, 𝑝, 𝑛_𝑖(𝑣, 𝑝)),    (A.B)
    𝐺 ⊧ 𝜒_𝑖(𝑣′′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}).    (A.C)

By Claim 1, (A.B) implies 𝑛_𝑖(𝑣′′, 𝑝) = 𝑛_𝑖(𝑣, 𝑝). Since 𝑝 is good, this implies 𝑁_𝑖(𝑣) = 𝑁_𝑖(𝑣′′) and thus

    𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}) ⟺ 𝐺 ⊧ 𝜒_𝑖(𝑣′′, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}).

Thus by (A.C), 𝐺 ⊧ 𝜒_𝑖(𝑣, 𝑎_1, . . . , 𝑎_{𝑘_𝑖}). ⌟

Now let 𝜓′(𝑥′, 𝑦_1, . . . , 𝑦_{𝑘+ℓ}, 𝑧, 𝑧_1, . . . , 𝑧_𝑞) be the formula obtained from 𝜓(𝑥, 𝑥′, 𝑦_1, . . . , 𝑦_{𝑘+ℓ}) by replacing, for each 𝑖 ∈ [𝑞], the 𝑥-formula 𝜒_𝑖(𝑥, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}) by the 𝑥′-formula 𝜒′_𝑖(𝑥′, 𝑦_{𝑖,1}, . . . , 𝑦_{𝑖,𝑘_𝑖}, 𝑧, 𝑧_𝑖). Then 𝜓′ is an MFO+C-formula.

Claim 3. For all (𝑣, 𝑣′) ∈ 𝐸(𝐺) and 𝑎_1 ∈ [𝑛^{𝑑_1}[, . . . , 𝑎_{𝑘+ℓ} ∈ [𝑛^{𝑑_{𝑘+ℓ}}[ it holds that

    𝐺 ⊧ 𝜓(𝑣, 𝑣′, 𝑎_1, . . . , 𝑎_{𝑘+ℓ}) ⟺ 𝐺 ⊧ 𝜓′(𝑣′, 𝑎_1, . . . , 𝑎_{𝑘+ℓ}, 𝑝, 𝑛_1(𝑣, 𝑝), . . . , 𝑛_𝑞(𝑣, 𝑝)).

Proof. This follows from Claim 2 by a straightforward induction. ⌟

We let

    𝜑′(𝑥, 𝑦_0, . . . , 𝑦_𝑘, 𝑧) := ∃(𝑧_1 < 𝑧, . . . , 𝑧_𝑞 < 𝑧).(⋀^𝑞_{𝑖=1} 𝜁_𝑖(𝑥, 𝑧, 𝑧_𝑖) ∧
        𝑦_0 = #(𝑥′, 𝑦_{𝑘+1} < ord^{𝑑_{𝑘+1}}, . . . , 𝑦_{𝑘+ℓ} < ord^{𝑑_{𝑘+ℓ}}).(𝐸(𝑥, 𝑥′) ∧ 𝜓′(𝑥′, 𝑦_1, . . . , 𝑦_{𝑘+ℓ}, 𝑧, 𝑧_1, . . . , 𝑧_𝑞))).

Note that 𝜑′ is an MFO+C-formula.

Claim 4. For all 𝑣 ∈ 𝑉(𝐺) and 𝑎_1 ∈ [𝑛^{𝑑_1}[, . . . , 𝑎_𝑘 ∈ [𝑛^{𝑑_𝑘}[,

    𝐺 ⊧ 𝜑(𝑣, 𝑎_1, . . . , 𝑎_𝑘) ⟺ 𝐺 ⊧ 𝜑′(𝑣, 𝑎_1, . . . , 𝑎_𝑘, 𝑝).    (A.D)

Proof. This follows directly from Claims 1 and 3. ⌟

In fact, the assertion of Claim 4 holds for all primes 𝑝 that are good for 𝐺. Recall that more than half of the primes 𝑝 < 𝑛^𝑐 are good. Thus (A.D) holds for more than half of the primes 𝑝 < 𝑛^𝑐. Let

    prime(𝑧) := ∀(𝑦 < 𝑧, 𝑦′ < 𝑧).¬ 𝑦 · 𝑦′ = 𝑧,

expressing that 𝑧 is a prime. Then we let

    𝜑̂(𝑥, 𝑦_1, . . . , 𝑦_𝑘) := #𝑧 < ord^𝑐.prime(𝑧) < 2 · #𝑧 < ord^𝑐.(prime(𝑧) ∧ 𝜑′(𝑥, 𝑦_1, . . . , 𝑦_𝑘, 𝑧)).

Claim 5. For all 𝑣 ∈ 𝑉(𝐺) and 𝑎_1 ∈ [𝑛^{𝑑_1}[, . . . , 𝑎_𝑘 ∈ [𝑛^{𝑑_𝑘}[,

    𝐺 ⊧ 𝜑(𝑣, 𝑎_1, . . . , 𝑎_𝑘) ⟺ 𝐺 ⊧ 𝜑̂(𝑣, 𝑎_1, . . . , 𝑎_𝑘).    (A.E)

Proof. Follows immediately from Claim 4 and the fact that more than half of the primes 𝑝 < 𝑛^𝑐 are good. □

B DETAILS OMITTED FROM SECTION 4

Lemma 4.1. For every 1-GNN 𝔑 = (𝔏(1), . . . , 𝔏(𝑑)) there is a 1-GNN 𝔑̃ = (𝔏̃(0), 𝔏̃(1), . . . , 𝔏̃(𝑑)) such that 𝑇_𝔑 = 𝑇_𝔑̃ and the message function of each layer 𝔏̃(𝑖) is the identity function on R^{𝑝̃(𝑖)}.

Proof. The trick is to integrate the message function of layer 𝑖 into the combination function of layer 𝑖 − 1. This is why we need an additional 0th layer that does nothing but compute the message function of the first layer.

Suppose that for 𝑖 ∈ [𝑑] we have 𝔏(𝑖) = (msg(𝑖), agg(𝑖), comb(𝑖)), where msg(𝑖): R^{𝑝(𝑖)} → R^{𝑟(𝑖)} and comb(𝑖): R^{𝑝(𝑖)+𝑟(𝑖)} → R^{𝑞(𝑖)}.

We let 𝑝̃(0) := 𝑝(1) and 𝑝̃(𝑖) := 𝑝(𝑖) + 𝑟(𝑖) for 𝑖 ∈ [𝑑]. Moreover, for 𝑖 ∈ [𝑑[ we let 𝑞̃(𝑖) := 𝑝̃(𝑖+1), and we let 𝑞̃(𝑑) := 𝑞(𝑑). For each 𝑝 ∈ N we let id_𝑝 be the identity function on R^𝑝. For 𝑖 ∈ [𝑑+1[, we let 𝔏̃(𝑖) = (id_{𝑝̃(𝑖)}, agg̃(𝑖), comb̃(𝑖)), where agg̃(0) is arbitrary (say, SUM), agg̃(𝑖) := agg(𝑖) for 𝑖 ∈ [𝑑], and comb̃(𝑖): R^{2𝑝̃(𝑖)} → R^{𝑞̃(𝑖)} is defined as follows:

● for 𝑖 = 0 and 𝒙, 𝒛 ∈ R^{𝑝(1)} we let

    comb̃(0)(𝒙, 𝒛) := (𝒙, msg(1)(𝒛));

● for 1 ≤ 𝑖 ≤ 𝑑 − 1 and 𝒙, 𝒙′ ∈ R^{𝑝(𝑖)}, 𝒛, 𝒛′ ∈ R^{𝑟(𝑖)} we let

    comb̃(𝑖)(𝒙𝒛, 𝒙′𝒛′) := (comb(𝑖)(𝒙, 𝒛′), msg(𝑖+1)(comb(𝑖)(𝒙, 𝒛′)));

● for 𝑖 = 𝑑 and 𝒙, 𝒙′ ∈ R^{𝑝(𝑑)}, 𝒛, 𝒛′ ∈ R^{𝑟(𝑑)} we let

    comb̃(𝑑)(𝒙𝒛, 𝒙′𝒛′) := comb(𝑑)(𝒙, 𝒛′).

It is straightforward to verify that 𝑇_𝔑 = 𝑇_𝔑̃. □

Proposition 4.2. Let 𝔑 be a 2-GNN such that on all layers of 𝔑, the message function is an affine linear function and the aggregation function is MEAN. Then there is a 1-GNN 𝔑̃ such that 𝑇_𝔑 = 𝑇_𝔑̃.

Proof. It suffices to prove this for a single layer. So let 𝔏 = (msg, MEAN, comb) be a 2-GNN layer of input dimension 𝑝 and output dimension 𝑞, where msg: 𝒙 ↦ 𝐴𝒙 + 𝒃 with 𝐴 ∈ R^{𝑟×2𝑝}, 𝒃 ∈ R^𝑟. For all vectors 𝒙, 𝒙_1, . . . , 𝒙_𝑛 ∈ R^𝑝 we have

    MEAN({{𝐴(𝒙, 𝒙_𝑖) + 𝒃 | 𝑖 ∈ [𝑛]}}) = 𝐴(𝒙, MEAN({{𝒙_𝑖 | 𝑖 ∈ [𝑛]}})) + 𝒃.

We define comb̃: R^{2𝑝} → R^𝑞 by

    comb̃(𝒙, 𝒙′) := comb(𝒙, 𝐴(𝒙, 𝒙′) + 𝒃).    (B.A)

Note that this function is computable by an FNN (assuming comb is). We let 𝔏̃ := (id_𝑝, MEAN, comb̃). Then 𝑇_𝔏 = 𝑇_𝔏̃. □

Observe that if the GNN layer 𝔏 in the previous proof had SUM instead of MEAN as aggregation, we could still write down the function corresponding to (B.A), but in general it would not be computable by an FNN.
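The identity at the heart of Proposition 4.2, that MEAN commutes with an affine message function, can be verified directly. The dimensions and data below are illustrative toy choices, not the paper's construction.

```python
# Check: MEAN({{A(x, x_i) + b}}) = A(x, MEAN({{x_i}})) + b (toy dimensions).
import random

p, r = 3, 2
random.seed(0)
A = [[random.uniform(-1, 1) for _ in range(2 * p)] for _ in range(r)]
b = [random.uniform(-1, 1) for _ in range(r)]

def affine(vec):
    # A @ vec + b for a concatenated vector vec of length 2p
    return [sum(row[j] * vec[j] for j in range(2 * p)) + bi
            for row, bi in zip(A, b)]

def mean(vecs):
    return [sum(col) / len(vecs) for col in zip(*vecs)]

x = [1.0, -2.0, 0.5]
neigh = [[random.uniform(-1, 1) for _ in range(p)] for _ in range(4)]

lhs = mean([affine(x + xi) for xi in neigh])   # aggregate the messages
rhs = affine(x + mean(neigh))                  # message of the aggregate
assert all(abs(u - v) < 1e-9 for u, v in zip(lhs, rhs))
```

This is exactly why the 1-GNN in the proof can aggregate the raw neighbour states and apply 𝐴 and 𝒃 inside the combination function; with SUM in place of MEAN the constant part 𝐴(𝒙, ·) of each message would be summed 𝑛 times and the identity fails.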
C DETAILS OMITTED FROM SECTION 5

C.1 SUM

Lemma 5.2. Let 𝔭(𝑋, 𝑌) be a nice or co-nice polynomial with leading coefficient 𝑎. Then there is an 𝑛_0 ∈ N such that for all 𝑛 ≥ 𝑛_0 we have

    sign(𝔭(𝑛 − 1, 𝑛)) = sign(𝔭(𝑛 + 1, 𝑛)) = sign(𝑎)

and |𝔭(𝑛 − 1, 𝑛)|, |𝔭(𝑛 + 1, 𝑛)| ≥ |𝑎|.

Proof. The assertion trivially holds if 𝑎 = 0, in which case 𝔭 is the zero polynomial. So suppose 𝑎 ≠ 0. Assume first that 𝔭(𝑋, 𝑌) is nice. Say,

    𝔭(𝑋, 𝑌) = Σ^𝑘_{𝑖=0} 𝑎_𝑖 𝑋^{⌊𝑖/2⌋} 𝑌^{⌈𝑖/2⌉}

with 𝑎 = 𝑎_𝑘 ≠ 0.

If 𝑘 = 0 then 𝔭(𝑚, 𝑛) = 𝑎_0 = 𝑎 for all 𝑚, 𝑛, and the assertions hold trivially. Assume that 𝑘 > 0. We write 𝔭 as 𝔭 = 𝔭_1 + 𝔭_2, where

    𝔭_1(𝑋, 𝑌) = 𝑎_𝑘 𝑋^{⌊𝑘/2⌋} 𝑌^{⌈𝑘/2⌉},    𝔭_2(𝑋, 𝑌) = Σ^{𝑘−1}_{𝑖=0} 𝑎_𝑖 𝑋^{⌊𝑖/2⌋} 𝑌^{⌈𝑖/2⌉}.

It is not difficult to show that lim_{𝑛→∞} |𝔭_1(𝑛 − 1, 𝑛)| − |𝔭_2(𝑛 − 1, 𝑛)| = ∞. This implies lim_{𝑛→∞} |𝔭(𝑛 − 1, 𝑛)| = ∞. Furthermore, for all sufficiently large 𝑛 we have sign(𝔭(𝑛 − 1, 𝑛)) = sign(𝔭_1(𝑛 − 1, 𝑛)) = sign(𝑎).

The assertions for 𝔭(𝑛 + 1, 𝑛) and for co-nice polynomials can be proved similarly. □

Lemma 5.3. (1) All constant functions 𝑐: N² → R are in FC ∩ FC^co.
(2) For 𝑓: N² → R, define 𝑓_1, 𝑓_2: N² → R by 𝑓_1(𝑚, 𝑛) := 𝑚 · 𝑓(𝑛, 𝑚) and 𝑓_2(𝑚, 𝑛) := 𝑛 · 𝑓(𝑚, 𝑛). Then

    𝑓 ∈ FC^co ⟹ 𝑓_1 ∈ FC;
    𝑓 ∈ FC ⟹ 𝑓_2 ∈ FC^co.

(3) All linear functions preserve fast convergence.
(4) All functions computed by FNNs that only use activation functions preserving fast convergence preserve fast convergence.

Proof. All polynomials of degree 0 (constants) are nice, which proves (1).

To prove (2), suppose that 𝑓 ∈ FC(𝔭) for the polynomial 𝔭 in (5.A). Note that

    𝑋 · 𝔭(𝑋, 𝑌) = Σ^𝑘_{𝑖=0} 𝑎_𝑖 𝑋^{⌊𝑖/2⌋+1} 𝑌^{⌈𝑖/2⌉} = Σ^𝑘_{𝑖=0} 𝑎_𝑖 𝑋^{⌈(𝑖+1)/2⌉} 𝑌^{⌊(𝑖+1)/2⌋},

which is co-nice. To establish the fast convergence, let 𝑟 ∈ R. Choose 𝑛_0 ≥ 2 such that for 𝑛 ≥ 𝑛_0 and 𝑚 ∈ {𝑛 − 1, 𝑛 + 1} it holds that |𝑓(𝑚, 𝑛) − 𝔭(𝑚, 𝑛)| ≤ 𝑛^{−(𝑟+2)}. Then

    |𝑚 · 𝑓(𝑚, 𝑛) − 𝑚 · 𝔭(𝑚, 𝑛)| ≤ (𝑛 + 1) |𝑓(𝑚, 𝑛) − 𝔭(𝑚, 𝑛)| ≤ (𝑛 + 1)/𝑛^{𝑟+2} ≤ 1/𝑛^𝑟.

The second assertion in (2) follows from the first.

Assertion (3) follows from the observation that linear combinations of nice polynomials are nice.

Assertion (4) follows directly from (1) and (3). □

Lemma 5.5. The relu function preserves fast convergence.

Proof. Suppose that 𝑓 ∈ FC(𝔭) for the polynomial 𝔭 in (5.A). We need to show that relu(𝑓) ∈ FC. Then the assertion for co-nice polynomials follows by flipping the arguments. Let 𝑎 be the leading coefficient of 𝔭. If 𝑎 = 0, then 𝔭 = 0 is the zero polynomial, and relu(𝑓) ∈ FC(𝔭) follows from the observation that |relu(𝑓(𝑚, 𝑛))| ≤ |𝑓(𝑚, 𝑛)| for all 𝑚, 𝑛. Suppose that 𝑎 ≠ 0. By Lemma 5.2, there is an 𝑛_0 such that for all 𝑛 ≥ 𝑛_0 and 𝑚 ∈ {𝑛 − 1, 𝑛 + 1} we have

    𝔭(𝑚, 𝑛) ≥ 𝑎 if 𝑎 > 0,    𝔭(𝑚, 𝑛) ≤ 𝑎 if 𝑎 < 0.

By fast convergence, we may further assume that |𝑓(𝑚, 𝑛) − 𝔭(𝑚, 𝑛)| ≤ |𝑎|/2. Thus if 𝑎 > 0, we have relu(𝑓(𝑚, 𝑛)) = 𝑓(𝑚, 𝑛) and thus relu(𝑓) ∈ FC(𝔭). If 𝑎 < 0 we have relu(𝑓(𝑚, 𝑛)) = 0 and thus relu(𝑓) ∈ FC(0). □

C.2 MEAN and MAX

The rest of the appendix is devoted to a proof of the following theorem.

Theorem 5.8. (1) Every query computable by a 2-GNN with MAX-aggregation is computable by a 1-GNN with MAX-aggregation.
(2) Every query computable by a 2-GNN with MEAN-aggregation is computable by a 1-GNN with MEAN-aggregation.

We introduce additional notation that we use in the proof. For 𝑛 ∈ N_{>0}, denote by 1_{𝑛,𝑖} the vector of length 𝑛 with 1 in the 𝑖th position and 0 elsewhere. Let X ⊆ R^𝑝 be a finite set and assume an enumeration of it, X = 𝒙_1, . . . , 𝒙_𝑛. For all 𝑖 ∈ [𝑛] denote by 1_X(𝒙_𝑖) := 1_{𝑛,𝑖} the one-hot representation of 𝒙_𝑖. Denote by 1_X^{−1} the inverse function of 1_X, that is, 1_X^{−1}(1_{𝑛,𝑖}) := 𝒙_𝑖. For a multiset 𝑀 over X we define 1_X(𝑀) := {{1_X(𝒙) | 𝒙 ∈ 𝑀}}, the corresponding multiset of one-hot representations. For every two vectors 𝒖 = (𝑢_1, . . . , 𝑢_𝑘), 𝒗 = (𝑣_1, . . . , 𝑣_𝑘) ∈ R^𝑘 we define ‖𝒖 − 𝒗‖ := ‖𝒖 − 𝒗‖_∞ = max_{𝑖∈[𝑘]} |𝑢_𝑖 − 𝑣_𝑖|. Moreover, we let mean(𝒖) := (1/𝑘) Σ 𝑢_𝑖 and max(𝒖) := max_{𝑖∈[𝑘]} 𝑢_𝑖.

We define an operator 𝑈 that removes the duplicates in a multiset, that is, for a multiset 𝑀 we have 𝑈(𝑀) := {𝑥 : 𝑥 ∈ 𝑀}.

Let 𝔑 = (𝔏(1), . . . , 𝔏(𝑑)) be a 𝑑-layer i-GNN and let 𝑝(𝑖), 𝑞(𝑖) be the input and output dimensions of 𝔏(𝑖). For every graph and signal (𝐺, f) ∈ GS_{𝑝(1)} and 𝑖 ∈ [𝑑], we define

    𝑇^(𝑖)_𝔑(𝐺, f) := 𝑇_{𝔏(𝑖)} ∘ 𝑇_{𝔏(𝑖−1)} ∘ . . . ∘ 𝑇_{𝔏(1)}(𝐺, f),

the result of operating the first 𝑖 layers of 𝔑 on (𝐺, f). We define the 0-stage signal 𝑆^(0)_𝔑(𝐺, f) := f, and for 𝑖 ∈ [𝑑] we define the 𝑖th-stage signal 𝑆^(𝑖)_𝔑(𝐺, f) such that

    𝑇^(𝑖)_𝔑(𝐺, f) = (𝐺, 𝑆^(𝑖)_𝔑(𝐺, f)).

When (𝐺, f) is clear from the context, for a vertex 𝑣 ∈ 𝑉(𝐺) we define 𝑣^(𝑖)_𝔑 := 𝑆^(𝑖)_𝔑(𝐺, f)(𝑣), the signal of 𝑣 after operating the first 𝑖 layers of 𝔑. Also, when referring to a final signal value 𝑆_𝔑(𝐺, f)(𝑣) we may omit (𝐺, f) and write 𝑆_𝔑(𝑣). For every (𝐺, f) ∈ GS_{𝑝(1)}, every 𝑣 ∈ 𝑉(𝐺), and every 𝑖 ∈ [𝑑], we define

    𝑁^(𝑖)_𝔑(𝑣) := {{𝑤^(𝑖)_𝔑 : 𝑤 ∈ 𝑁_𝐺(𝑣)}},
the multiset of signal values of the neighbours of v after the operation of the first i layers of 𝔑.

Proof of Theorem 5.8(1) (MAX-aggregation). Let

  𝔑 = (𝔏^(1), …, 𝔏^(d))

be a d-layer 2-GNN where 𝔏^(i) = (msg^(i), max, comb^(i)), 𝔏^(i) has input and output dimensions p^(i), q^(i), and msg^(i) has output dimension r^(i). We define D^(0)_𝔑 ≔ {0, 1}^{p^(1)}, the set of possible input boolean-signal values, and for i ∈ [d] we define

  D^(i)_𝔑 ≔ {S^(i)_𝔑(G, b)(v) : (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)}

the set of possible signal values after operating the first i layers of 𝔑 when the initial input signal is boolean. For i ∈ [d + 1) define d^(i)_𝔑 ≔ |D^(i)_𝔑| ∈ (ℕ ∪ {∞}), the size of D^(i)_𝔑. Assume i ∈ [d), assume D^(i)_𝔑 is finite, and assume an enumeration D^(i)_𝔑 = x₁, …, x_{d^(i)_𝔑}.

We define 𝔴₀ : 1_{D^(i)_𝔑}(D^(i)_𝔑) × {0, 1}^{d^(i)_𝔑} → D^(i)_𝔑 × 2^{D^(i)_𝔑} such that ∀(s, t) ∈ 1_{D^(i)_𝔑}(D^(i)_𝔑) × {0, 1}^{d^(i)_𝔑}

  𝔴₀(s, t) ≔ (1^{−1}_{D^(i)_𝔑}(s), {x_i : t(i) = 1})

and note that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  𝔴₀(1_{D^(i)_𝔑}(v^(i)_𝔑), max(1_{D^(i)_𝔑}(N^(i)_𝔑(v)))) = (v^(i)_𝔑, U(N^(i)_𝔑(v)))

We define 𝔴₁ : D^(i)_𝔑 × 2^{D^(i)_𝔑} → ℝ^{r^(i+1)} such that ∀(G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  𝔴₁(v^(i)_𝔑, U(N^(i)_𝔑(v))) ≔ max(msg^(i+1)(v^(i)_𝔑, x) : x ∈ U(N^(i)_𝔑(v)))

Finally, we define 𝔴₂ : 1_{D^(i)_𝔑}(D^(i)_𝔑) × {0, 1}^{d^(i)_𝔑} → ℝ^{r^(i+1)} such that

  𝔴₂ ≔ 𝔴₁ ∘ 𝔴₀

and have that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  𝔴₂(1_{D^(i)_𝔑}(v^(i)_𝔑), max(1_{D^(i)_𝔑}(N^(i)_𝔑(v)))) = max(msg^(i+1)(v^(i)_𝔑, x) : x ∈ U(N^(i)_𝔑(v)))
  = max(msg^(i+1)(v^(i)_𝔑, x) : x ∈ N^(i)_𝔑(v))

The last equality is key in determining that the domain of the comb function is finite although there are infinitely many possible multiset inputs to the max aggregation. By the assumption that D^(i)_𝔑 is finite, the domain of 𝔴₂ is finite and there exists an FNN that implements 𝔴₂. Denote that FNN by F, denote the function it implements by f_F, and note that comb^(i+1) is implemented by an FNN; then there exists an FNN that implements

  comb′^(i+1) : {0, 1}^{d^(i)_𝔑} × {0, 1}^{d^(i)_𝔑} → D^(i+1)_𝔑

such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  comb′^(i+1)(1_{D^(i)_𝔑}(v^(i)_𝔑), max(1_{D^(i)_𝔑}(N^(i)_𝔑(v)))) ≔ comb^(i+1)(v^(i)_𝔑, f_F(1_{D^(i)_𝔑}(v^(i)_𝔑), max(1_{D^(i)_𝔑}(N^(i)_𝔑(v)))))
  = comb^(i+1)(v^(i)_𝔑, max(msg^(i+1)(v^(i)_𝔑, x) : x ∈ N^(i)_𝔑(v))) = v^(i+1)_𝔑    (C.A)

Note that (assuming D^(i)_𝔑 is finite) the domain of comb′^(i+1) is finite, hence D^(i+1)_𝔑 is finite. Since D^(0)_𝔑 is finite by definition, we have by induction that D^(i)_𝔑 is finite ∀i ∈ [d + 1) and there exists comb′^(i+1) as described. For i ∈ [d − 1) we can modify the comb′^(i+1)-implementing FNN into a comb′′^(i+1)-implementing FNN that outputs 1_{D^(i+1)_𝔑}(v^(i+1)_𝔑) instead of v^(i+1)_𝔑. Let 𝔏′^(0) be a 1-GNN layer (which clearly exists) such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G) it holds that S_{𝔏′^(0)}(G, b)(v) = 1_{D^(0)_𝔑}(b(v)), and define ∀i ∈ [d − 1]

  𝔏′^(i) ≔ (id_{d^(i−1)_𝔑}, max, comb′′^(i))  and  𝔏′^(d) ≔ (id_{d^(d−1)_𝔑}, max, comb′^(d))

Finally, define

  𝔑′ ≔ (𝔏′^(0), …, 𝔏′^(d))

then T_{𝔑′} = T_𝔑. □

Proof of Theorem 5.8(2) (MEAN-aggregation). We define ℕ¹_{>0} ≔ {1/a : a ∈ ℕ_{>0}}. For every δ ∈ ℕ¹_{>0} and n ∈ ℕ, we define

  S_{n,δ} ≔ {(p₁, …, p_n) : Σ p_i = 1, ∀i ∈ [n] ∃b ∈ ℕ : p_i = bδ}

the set of (binom(1/δ + n − 1, n − 1)) possible elements' proportions in a multiset over a set of size n, such that the proportion of each element is a multiple of δ.

Let 𝔑 = (𝔏^(1), …, 𝔏^(d)) be a d-layer 2-GNN where

  𝔏^(i) = (msg^(i), mean, comb^(i))

and 𝔏^(i) has input and output dimensions p^(i), q^(i). Let X ∈ ((ℝ^{p^(i)}))_∗ be a finite multiset, and assume an enumeration X = x₁, …, x_n; then for y ∈ ℝ^{p^(i)} we denote by

  Msg^(i)_X(y) ≔ (msg^(i)(y, x₁), …, msg^(i)(y, x_n))

the vector of messages between y and every element in X. We prove by induction on d that for every ε > 0 there exists a 1-GNN 𝔑′ = 𝔏′^(0), 𝔏′^(1), …, 𝔏′^(d) and a finite set X^(d) such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G) ∃x_v ∈ X^(d)

  |S_𝔑(v) − x_v| < ε/2,  |S_{𝔑′}(v) − x_v| < ε/2

which implies

  |S_𝔑(v) − S_{𝔑′}(v)| < ε

The layers 𝔏′^(i), i ∈ [d] of the 1-GNN are designed to operate on one-hot representations, and the layer 𝔏′^(0) translates the initial-signal
range to its one-hot representation. The X^(i)s are the "make-believe" finite domains by which we will design the layers of the 1-GNN.

Induction Basis. We prove for d = 1. Let ε > 0. Define X ≔ {0, 1}^{p^(1)}, n ≔ |X|, and assume an enumeration X = x₁, …, x_n. Note that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  mean(msg^(1)(v^(0)_𝔑, x) : x ∈ N^(0)_𝔑(v)) = Msg^(1)_X(v^(0)_𝔑) · mean(1_X(N^(0)_𝔑(v)))    (C.B)

That is, the mean of the messages is equal to the product of the vector of possible messages and the vector of corresponding proportions of the elements (of X) among the neighbours of v. By X being finite, there exists g₁ : ℝ → ℝ, lim_{x→0} g₁(x) = 0, such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G) ∀δ > 0 ∀u ∈ ℝ^n

  |u − mean(1_X(N^(0)_𝔑(v)))| ≤ δ ⇒ |Msg^(1)_X(v^(0)_𝔑) · mean(1_X(N^(0)_𝔑(v))) − Msg^(1)_X(v^(0)_𝔑) · u| < g₁(δ)

Hence, by comb^(1) being Lipschitz-continuous, and remembering Equation (C.B), there exists g₂ : ℝ → ℝ, lim_{x→0} g₂(x) = 0, such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G) ∀δ > 0 ∀u ∈ ℝ^n

  |u − mean(1_X(N^(0)_𝔑(v)))| ≤ δ ⇒
  |comb^(1)(v^(0)_𝔑, mean(msg^(1)(v^(0)_𝔑, x) : x ∈ N^(0)_𝔑(v))) − comb^(1)(v^(0)_𝔑, Msg^(1)_X(v^(0)_𝔑) · u)| < g₂(δ)    (C.C)

Let δ₁ ∈ ℕ¹_{>0} such that ∀δ ≤ δ₁, g₂(δ) < ε/2. Let (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G) and let 𝒑_v ∈ S_{n,δ₁} be proportions (which must exist) such that

  |𝒑_v − mean(1_X(N^(0)_𝔑(v)))| ≤ δ₁

Then

  |comb^(1)(v^(0)_𝔑, mean(msg^(1)(v^(0)_𝔑, x) : x ∈ N^(0)_𝔑(v))) − comb^(1)(v^(0)_𝔑, Msg^(1)_X(v^(0)_𝔑) · 𝒑_v)| < ε/2    (C.D)

Note that the first expression in the LHS of Equation (C.D) is S_𝔑(v). We proceed to show a similar result for S_{𝔑′}. For δ₂ ∈ ℕ¹_{>0} define comb′^(1) : {0, 1}^n × ℝ^n → ℝ^{q^(1)} such that

  ∀(x, 𝒑) ∈ 1_X(X) × S_{n,δ₂}  comb′^(1)(x, 𝒑) ≔ comb^(1)(1^{−1}_X(x), Msg^(1)_X(1^{−1}_X(x)) · 𝒑)

While the domain on which comb′^(1) is applicable is infinite, the domain for which it is explicitly specified is finite, hence comb′^(1) can be implemented by an FNN. By argumentation similar to the development of Equation (C.C) we have that there exists

  g₃ : ℝ → ℝ,  lim_{x→0} g₃(x) = 0

such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G) ∀δ > 0 ∀u ∈ ℝ^n

  |u − mean(1_X(N^(0)_𝔑(v)))| ≤ δ ⇒
  |comb′^(1)(1_X(v^(0)_𝔑), mean(1_X(N^(0)_𝔑(v)))) − comb′^(1)(1_X(v^(0)_𝔑), u)| < g₃(δ)    (C.E)

Let δ₂ ∈ ℕ¹_{>0} such that ∀δ ≤ δ₂, g₃(δ) < ε/2. Let (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G) and let 𝒑_v ∈ S_{n,δ₂} be proportions (which must exist) such that

  |𝒑_v − mean(1_X(N^(0)_𝔑(v)))| ≤ δ₂

Then

  |comb′^(1)(1_X(v^(0)_𝔑), mean(1_X(N^(0)_𝔑(v)))) − comb′^(1)(1_X(v^(0)_𝔑), 𝒑_v)| < ε/2    (C.F)

Define 𝔏′^(0) to be a 1-GNN layer (which clearly exists) such that S_{𝔏′^(0)}(G, b)(v) ≔ 1_X(b(v)). Define 𝔏′^(1) ≔ (id_n, mean, comb′^(1)) and define

  𝔑′ = (𝔏′^(0), 𝔏′^(1))

Then, note that the first expression in the LHS of Equation (C.F) is S_{𝔑′}(v). Define δ₃ = min(δ₁, δ₂); then by Equation (C.D) and Equation (C.F), for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G) there exists 𝒑_v ∈ S_{n,δ₃} such that for

  x_v ≔ comb^(1)(v^(0)_𝔑, Msg^(1)_X(v^(0)_𝔑) · 𝒑_v)

it holds that

  |S_𝔑(v) − x_v| < ε/2,  |S_{𝔑′}(v) − x_v| < ε/2

Hence, define

  X^(1) ≔ {comb^(1)(y, Msg^(1)_X(y) · 𝒑) : (y, 𝒑) ∈ X × S_{n,δ₃}}

then X^(1) satisfies the requirement.

Induction Step. We assume correctness for d = t and every ε′ > 0 and prove for d = t + 1. Define X ≔ X^(t), n ≔ |X|, and assume an enumeration X = x₁, …, x_n. Let 𝔑′′ = (𝔏′′^(0), …, 𝔏′′^(t)) be a 1-GNN (which exists by the induction assumption) such that

  ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G)  |v^(t)_𝔑 − S_{𝔑′′}(v)| < ε′

and denote by v^X a value in X (which exists by the induction assumption) such that |v^(t)_𝔑 − v^X| < ε′/2, |S_{𝔑′′}(v) − v^X| < ε′/2. Compared to the case of d = 1, in the case of d > 1 we need to account not only for differences (which can be made as small as we want) between the actual proportions and the proportions that 𝔏′^(t+1) will be designed for, but also for |v^(t)_𝔑 − v^X|, i.e. the difference between the actual vertices' values and the values in X, the domain that 𝔏′^(t+1) will be designed for. We handle the proportions difference first, using the already walked-through path used in the proof for d = 1. For δ₁ ∈ ℕ¹_{>0} define comb′^(t+1) : ℝ^n × ℝ^n → ℝ^{q^(t+1)} such that

  ∀(x, 𝒑) ∈ 1_X(X) × S_{n,δ₁}  comb′^(t+1)(x, 𝒑) ≔ comb^(t+1)(1^{−1}_X(x), Msg^(t+1)_X(1^{−1}_X(x)) · 𝒑)
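The two ingredients of the basis step above can be checked numerically: Equation (C.B) says the mean of the messages equals the vector of possible messages times the proportion vector mean(1_X(N(v))), and S_{n,δ} is a finite δ-grid of proportion vectors of size binom(1/δ + n − 1, n − 1). A minimal sketch, with a hypothetical one-dimensional domain X and a stand-in Lipschitz message function (neither is taken from the paper):

```python
from itertools import product
from math import comb, isclose

X = [0.0, 1.0, 2.0]                    # hypothetical finite domain, x_1..x_n
n = len(X)
msg = lambda y, x: 2 * y + x           # stand-in Lipschitz message function

def mean_messages(y, nbrs):            # LHS of Equation (C.B)
    return sum(msg(y, x) for x in nbrs) / len(nbrs)

def msg_dot_proportions(y, nbrs):      # RHS: Msg_X(y) . mean(1_X(nbrs))
    props = [nbrs.count(x) / len(nbrs) for x in X]
    return sum(msg(y, x) * p for x, p in zip(X, props))

nbrs = [0.0, 2.0, 2.0, 1.0]            # a multiset of neighbour values
assert isclose(mean_messages(0.5, nbrs), msg_dot_proportions(0.5, nbrs))

def S(n, inv_delta):                   # S_{n,delta} with delta = 1/inv_delta
    # b_i encodes the proportion p_i = b_i * delta; the b_i sum to 1/delta
    return [bs for bs in product(range(inv_delta + 1), repeat=n)
            if sum(bs) == inv_delta]

# |S_{n,delta}| = binom(1/delta + n - 1, n - 1)
assert len(S(n, 4)) == comb(4 + n - 1, n - 1)
```

The second assertion confirms the counting claim in the proof: for n = 3 and δ = 1/4 there are binom(6, 2) = 15 grid proportion vectors.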
As explained in the case of d = 1, such comb′^(t+1) can be implemented by an FNN. Also, by the same argumentation used in the case of d = 1, there exists δ₁ ∈ ℕ¹_{>0} such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G) there exists 𝒑_v ∈ S_{n,δ₁} such that for

  x_v ≔ comb^(t+1)(v^X, Msg^(t+1)_X(v^X) · 𝒑_v)

it holds that

  |comb^(t+1)(v^X, mean(msg^(t+1)(v^X, w^X) : w ∈ N_v(G))) − x_v| < ε/4    (C.G)

and

  |comb′^(t+1)(1_X(v^X), mean(1_X(w^X) : w ∈ N_v(G))) − x_v| < ε/4    (C.H)

Next, we turn to the differences between the actual vertices' values (after layer t) and the values in X, the finite domain according to which 𝔏′^(t+1) is defined. We start with bounding the gap for the 2-GNN. By msg^(t+1) being Lipschitz-continuous, there exists g₁ : ℝ → ℝ, lim_{x→0} g₁(x) = 0, such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v, w ∈ V(G)

  |msg^(t+1)(v^(t)_𝔑, w^(t)_𝔑) − msg^(t+1)(v^X, w^X)| < g₁(ε′)

and thus, such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G)

  |mean(msg^(t+1)(v^(t)_𝔑, w^(t)_𝔑) : w ∈ N_v(G)) − mean(msg^(t+1)(v^X, w^X) : w ∈ N_v(G))| < g₁(ε′)

By comb^(t+1) being Lipschitz-continuous, we have that there exists g₂ : ℝ → ℝ, lim_{x→0} g₂(x) = 0, such that ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G)

  |comb^(t+1)(v^(t)_𝔑, mean(msg^(t+1)(v^(t)_𝔑, w^(t)_𝔑) : w ∈ N_v(G))) − comb^(t+1)(v^X, mean(msg^(t+1)(v^X, w^X) : w ∈ N_v(G)))| < g₂(ε′)

Assuming ε′ small enough such that ∀ε′′ < ε′, g₂(ε′′) < ε/4, we have ∀(G, b) ∈ GS^bool_{p^(1)} ∀v ∈ V(G)

  |comb^(t+1)(v^(t)_𝔑, mean(msg^(t+1)(v^(t)_𝔑, w^(t)_𝔑) : w ∈ N_v(G))) − comb^(t+1)(v^X, mean(msg^(t+1)(v^X, w^X) : w ∈ N_v(G)))| < ε/4    (C.I)

Next, we bound the gap for the 1-GNN. Define h : X → 1_X(X) such that ∀x ∈ X, h(x) = 1_X(x), a mapping from the values domain to its one-hot representation, which can be implemented by an FNN. Define 𝔑′ ≔ (𝔏′^(0), …, 𝔏′^(t)) such that ∀0 ≤ i ≤ (t − 1), 𝔏′^(i) ≔ 𝔏′′^(i), and 𝔏′^(t) is such that

  S_{𝔏′^(t)} = h ∘ S_{𝔏′′^(t)}

By the induction assumption, for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G), |S_{𝔑′′}(v) − v^X| < ε′/2. Hence, by (every FNN-implementation of) h being Lipschitz-continuous, there exists g₃ : ℝ → ℝ, lim_{x→0} g₃(x) = 0, such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  |S_{𝔑′}(v) − 1_X(v^X)| < g₃(ε′)

and thus, such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  |mean(S_{𝔑′}(w) : w ∈ N_v(G)) − mean(1_X(w^X) : w ∈ N_v(G))| < g₃(ε′)

Hence, by comb′^(t+1) being Lipschitz-continuous, there exists

  g₄ : ℝ → ℝ,  lim_{x→0} g₄(x) = 0

such that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  |comb′^(t+1)(S_{𝔑′}(v), mean(S_{𝔑′}(w) : w ∈ N_v(G))) − comb′^(t+1)(1_X(v^X), mean(1_X(w^X) : w ∈ N_v(G)))| < g₄(ε′)

Assuming ε′ small enough such that ∀ε′′ < ε′, g₄(ε′′) < ε/4, we have that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  |comb′^(t+1)(S_{𝔑′}(v), mean(S_{𝔑′}(w) : w ∈ N_v(G))) − comb′^(t+1)(1_X(v^X), mean(1_X(w^X) : w ∈ N_v(G)))| < ε/4    (C.J)

Define 𝔏′^(t+1) ≔ (id_n, mean, comb′^(t+1)) and redefine 𝔑′ to include 𝔏′^(t+1), that is,

  𝔑′ ≔ (𝔏′^(0), …, 𝔏′^(t), 𝔏′^(t+1))

Combining Equation (C.G) with Equation (C.I), and Equation (C.H) with Equation (C.J), we have that for every (G, b) ∈ GS^bool_{p^(1)}, v ∈ V(G)

  |S_𝔑(v) − x_v| < ε/2,  |S_{𝔑′}(v) − x_v| < ε/2

Hence, define

  X^(t+1) ≔ {comb^(t+1)(y, Msg^(t+1)_X(y) · 𝒑) : (y, 𝒑) ∈ X × S_{n,δ₁}}

then X^(t+1) satisfies the requirement. □
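Looking back at the MAX-aggregation proof, the step that makes the domain of comb′ finite is that the coordinatewise max of one-hot encodings of a multiset is exactly the indicator of its underlying set U(M), which is the only information max-aggregation can depend on. A small sketch over a hypothetical finite domain D (illustrative only, not the paper's construction):

```python
D = ["a", "b", "c", "d"]              # hypothetical finite signal domain D^(i)

def one_hot(x):
    return [1 if d == x else 0 for d in D]

def max_agg(multiset):                # coordinatewise max over one-hot encodings
    return [max(col) for col in zip(*map(one_hot, multiset))]

def underlying_set(indicator):        # the role of w_0's second component: {x_i : t(i) = 1}
    return {d for d, bit in zip(D, indicator) if bit}

M = ["a", "a", "c"]                   # a multiset of neighbour values
assert max_agg(M) == [1, 0, 1, 0]
assert underlying_set(max_agg(M)) == {"a", "c"}      # == U(M)
assert max_agg(["a", "c", "c", "c"]) == max_agg(M)   # multiplicities are discarded
```

The last assertion is the point: over a finite domain there are only finitely many possible aggregated values, even though infinitely many multisets map to them.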
