
P = NP

Vinay Deolalikar
HP Research Labs, Palo Alto
vinay.deolalikar@hp.com
August 11, 2010
This work is dedicated to my late parents:
my father Shri. Shrinivas Deolalikar, my mother Smt. Usha Deolalikar,
and my maushi Kum. Manik Deogire,
for all their hard work in raising me;
and to my late grandparents:
Shri. Rajaram Deolalikar and Smt. Vimal Deolalikar,
for their struggle to educate my father in spite of extreme poverty.
This work is part of my Matru-Pitru Rin [1].
I am forever indebted to my wife for her faith during these years.
[1] The debt to mother and father that a pious Hindu regards as his obligation to repay in this life.
Abstract
We demonstrate the separation of the complexity class NP from its sub-
class P. Throughout our proof, we observe that the ability to compute a prop-
erty on structures in polynomial time is intimately related to an atypical prop-
erty of the space of solutions — namely, the space is parametrizable with only
c^{poly(log n)}, c > 1, parameters instead of the typical c^n parameters required for a
joint distribution of n covariates.
This type of exponentially smaller parametrization arises as a result of severe
limitations placed on the interaction between the variates. In particular, it may
arise from range limited interactions, where variates interact at short ranges
and chain together such interactions to create long range interactions. Such
long range interactions would then be characterized by the statistical notions
of conditional independence and sufficient statistics. The presence of condi-
tional independencies manifests in the form of economical parametrizations of
the joint distribution of covariates. Likewise, such economical parametrizations
can arise from interactions which take only c^{poly(log n)} many values. In both cases,
the result on the joint distribution is the same — it is parametrizable with only
c^{poly(log n)} independent parameters. In order to apply this analysis to the space
of solutions of random constraint satisfaction problems, we utilize and expand
upon ideas from several fields spanning logic, statistics, graphical models, ran-
dom ensembles, and statistical physics.
We begin by introducing the requisite framework of graphical models for a
set of interacting variables. We focus on the correspondence between Markov
and Gibbs properties for directed and undirected models as reflected in the fac-
torization of their joint distribution, and the number of independent parameters
required to specify the distribution.
Next, we build the central contribution of this work. We show that there are
fundamental conceptual relationships between polynomial time computation,
which is completely captured by the logic FO(LFP) on classes of successor struc-
tures, and poly(log n)-parametrization. In particular, monadic LFP is a range
limited interaction model that possesses certain directed Markov properties
that may be stated in terms of conditional independence and sufficient statis-
tics. In order to demonstrate these relationships, we view the LFP computation
as “factoring through” several stages of first order computations, and then uti-
lize the limitations of first order logic. Specifically, we exploit the limitation
that first order logic can only express properties in terms of a bounded num-
ber of local neighborhoods of the underlying structure. Then we relate com-
plex fixed points to value limited interactions, which again result in poly(log n)-
parametrization.
Next we introduce ideas from the 1RSB replica symmetry breaking ansatz of
statistical physics. We recollect the description of the clustered phase for ran-
dom k-SAT that arises when the clause density is sufficiently high and k ≥ 9.
In this phase, known as the d1RSB phase, an arbitrarily large fraction of all vari-
ables in cores freeze within exponentially many clusters in the thermodynamic
limit, as the clause density is increased towards the SAT-unSAT threshold. The
Hamming distance between a solution that lies in one cluster and that in an-
other is O(n). Note that the onset of this phase is rigorously proven only for
k ≥ 9, and it is here that we will demonstrate our separation.
Next, we encode k-SAT formulae as structures on which FO(LFP) captures
polynomial time. By asking FO(LFP) to extend partial assignments on ensem-
bles of random k-SAT, we build distributions of solutions. We then construct a
dynamic graphical model on a product space that captures all the information
flows through the various stages of an LFP computation on ensembles of k-SAT
structures. Distributions computed by LFP must satisfy this model. This model
is directed, which allows us to compute factorizations locally and parameterize
using Gibbs potentials on cliques. We then use results from ensembles of factor
graphs of random k-SAT to bound the various information flows in this di-
rected graphical model. We parametrize the resulting distributions in a manner
that demonstrates that irreducible interactions between covariates — namely,
those that may not be factored any further through conditional independencies
— cannot grow faster than poly(log n) in the range limited monadic LFP com-
puted distributions. For value limited complex LFP, we show how to obtain a
parametrization of the solution space by merging potentials with scope O(n).
This allows us to analyze the behavior of the entire class of polynomial time
algorithms on ensembles simultaneously.
Using the aforementioned limitations of LFP, we demonstrate that a pur-
ported polynomial time solution to k-SAT would result in a solution space that
is a mixture of distributions, each having an exponentially smaller parametriza-
tion than is consistent with the highly constrained d1RSB phases of k-SAT. We
show that this would contradict the behavior exhibited by the solution space in
the d1RSB phase. This corresponds to the intuitive picture provided by physics
about the emergence of extensive (meaning O(n)) long-range correlations be-
tween variables in this phase and also explains the empirical observation that
all known polynomial time algorithms break down in this phase.
Our work shows that every polynomial time algorithm must fail to produce
solutions to large enough problem instances of k-SAT in the d1RSB phase. This
shows that polynomial time algorithms are not capable of solving NP-complete
problems in their hard phases, and demonstrates the separation of P from NP.
Contents
1 Introduction 3
1.1 Synopsis of Proof . . . . . . . . . . . . . . . . . . . . . . . . . . . . 5
2 Interaction Models and Conditional Independence 15
2.1 Conditional Independence . . . . . . . . . . . . . . . . . . . . . . . 15
2.2 Conditional Independence in Undirected Graphical Models . . . 17
2.2.1 Gibbs Random Fields and the Hammersley-Clifford Theorem . . . . . . . . 21
2.3 Factor Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.4 The Markov-Gibbs Correspondence for Directed Models . . . . . 26
2.5 I-maps and D-maps . . . . . . . . . . . . . . . . . . . . . . . . 29
3 Distributions with poly(log n)-Parametrization 30
3.1 Two Kinds of poly(log n)-parameterizations . . . . . . . . . . . . . 32
3.1.1 Range Limited Interactions . . . . . . . . . . . . . . . . . . 32
3.1.2 Value Limited Interactions . . . . . . . . . . . . . . . . . . . 35
3.1.3 On the Atypical Nature of poly(log n)-parameterization . . 38
3.1.4 Our Treatment of Range and Value Limited Distributions . 38
4 Logical Descriptions of Computations 40
4.1 Inductive Definitions and Fixed Points . . . . . . . . . . . . . . . . 41
4.2 Fixed Point Logics for P and PSPACE . . . . . . . . . . . . . . . 44
5 The Link Between Polynomial Time Computation and Conditional Independence 48
5.1 The Limitations of LFP . . . . . . . . . . . . . . . . . . . . . . . . . 50
5.1.1 Locality of First Order Logic . . . . . . . . . . . . . . . . . 51
5.2 Simple Monadic LFP and Conditional Independence . . . . . . . . 55
5.3 Conditional Independence in Complex Fixed Points . . . . . . . . 60
5.4 Aggregate Properties of LFP over Ensembles . . . . . . . . . . . . 62
6 The 1RSB Ansatz of Statistical Physics 64
6.1 Ensembles and Phase Transitions . . . . . . . . . . . . . . . . . . . 64
6.2 The d1RSB Phase . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.2.1 Cores and Frozen Variables . . . . . . . . . . . . . . . . . . 68
6.2.2 Performance of Known Algorithms in the d1RSB Phase . . 71
7 Random Graph Ensembles 74
7.1 Properties of Factor Graph Ensembles . . . . . . . . . . . . . . . . 75
7.1.1 Locally Tree-Like Property . . . . . . . . . . . . . . . . . . 75
7.1.2 Degree Profiles in Random Graphs . . . . . . . . . . . . . . 76
8 Separation of Complexity Classes 78
8.1 Measuring Conditional Independence in Range Limited Models . 78
8.2 Generating Distributions from LFP . . . . . . . . . . . . . . . . . . 80
8.2.1 Encoding k-SAT into Structures . . . . . . . . . . . . . . . 80
8.2.2 The LFP Neighborhood System . . . . . . . . . . . . . . . . 83
8.2.3 Generating Distributions . . . . . . . . . . . . . . . . . . . 85
8.3 Disentangling the Interactions: The ENSP Model . . . . . . . . . . 87
8.4 Parametrization of the ENSP . . . . . . . . . . . . . . . . . . . . . . 93
8.5 Separation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.6 Some Perspectives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
A Reduction to a Single LFP Operation 105
A.1 The Transitivity Theorem for LFP . . . . . . . . . . . . . . . . . . . 105
A.2 Sections and the Simultaneous Induction Lemma for LFP . . . . . 106
1. Introduction
The P ?= NP question is generally considered one of the most important and
far reaching questions in contemporary mathematics and computer science.
The origin of the question seems to date back to a letter from Gödel to von
Neumann in 1956 [Sip92]. Formal definitions of the class NP awaited work by
Edmonds [Edm65], Cook [Coo71], and Levin [Lev73]. The Cook-Levin theorem
showed the existence of complete problems for this class, and demonstrated
that SAT – the problem of determining whether a set of clauses of Boolean literals
has a satisfying assignment – was one such problem. Later, Karp [Kar72]
showed that twenty-one well known combinatorial problems, which include
TRAVELLING SALESMAN, CLIQUE, and HAMILTONIAN CIRCUIT, were also
NP-complete. In subsequent years, many problems central to diverse areas of
application were shown to be NP-complete (see [GJ79] for a list). If P ≠ NP,
we could never solve these problems efficiently. If, on the other hand, P = NP,
the consequences would be even more stunning, since every one of these prob-
lems would have a polynomial time solution. The implications of this on ap-
plications such as cryptography, and on the general philosophical question of
whether human creativity can be automated, would be profound.
The P ?= NP question is also singular in the number of approaches that re-
searchers have brought to bear upon it over the years. From the initial question
in logic, the focus moved to complexity theory where early work used diago-
nalization and relativization techniques. However, [BGS75] showed that these
methods were perhaps inadequate to resolve P ?= NP by demonstrating rela-
tivized worlds in which P = NP and others in which P ≠ NP (both relations
for the appropriately relativized classes). This shifted the focus to methods us-
ing circuit complexity and for a while this approach was deemed the one most
likely to resolve the question. Once again, a negative result in [RR97] showed
that a class of techniques known as “Natural Proofs” that subsumed the above
could not separate the classes NP and P, provided one-way functions exist.
Owing to the difficulty of resolving the question, and also to the negative
results mentioned above, there has been speculation that resolving the P ?=
NP question might be outside the domain of mathematical techniques. More
precisely, the question might be independent of standard axioms of set theory.
The first such results in [HH76] show that some relativized versions of the P ?=
NP question are independent of reasonable formalizations of set theory.
The influence of the P ?= NP question is felt in other areas of mathematics.
We mention one of these, since it is central to our work. This is the area of de-
scriptive complexity theory — the branch of finite model theory that studies the
expressive power of various logics viewed through the lens of complexity the-
ory. This field began with the result [Fag74] that showed that NP corresponds
to queries that are expressible in second order existential logic over finite struc-
tures. Later, characterizations of the classes P [Imm86], [Var82] and PSPACE
over ordered structures were also obtained.
There are several introductions to the P ?= NP question and the enormous
amount of research that it has produced. The reader is referred to [Coo06] for an
introduction which also serves as the official problem description for the Clay
Millennium Prize. An older excellent review is [Sip92]. See [Wig07] for a more
recent introduction. Most books on theoretical computer science in general,
and complexity theory in particular, also contain accounts of the problem and
attempts made to resolve it. See the books [Sip97] and [BDG95] for standard
references.
Preliminaries and Notation
Treatments of standard notions from complexity theory, such as definitions of
the complexity classes P, NP, PSPACE, and notions of reductions and com-
pleteness for complexity classes, etc. may be found in [Sip97, BDG95].
Our work will span various developments in three broad areas. While we
have endeavored to be relatively complete in our treatment, we feel it would
be helpful to provide standard textual references for these areas, in the order
in which they appear in the work. Additional references to results will be pro-
vided within the chapters.
Standard references for graphical models include [Lau96] and the more re-
cent [KF09]. For an engaging introduction, please see [Bis06, Ch. 8]. For an
early treatment in statistical mechanics of Markov random fields and Gibbs dis-
tributions, see [KS80].
Preliminaries from logic, such as notions of structure, vocabulary, first order
language, models, etc., may be obtained from any standard text on logic such
as [Hod93]. In particular, we refer to [EF06, Lib04] for excellent treatments of
finite model theory and [Imm99] for descriptive complexity.
For a treatment of the statistical physics approach to random CSPs, we rec-
ommend [MM09]. An earlier text is [MPV87].
1.1 Synopsis of Proof
This proof requires a convergence of ideas and an interplay of principles that
span several areas within mathematics and physics. This represents the major-
ity of the effort that went into constructing the proof. Given this, we felt that
it would be beneficial to explain the various stages of the proof, and highlight
their interplay. The technical details of each stage are described in subsequent
chapters.
Consider a system of n interacting variables such as is ubiquitous in mathe-
matical sciences. For example, these may be the variables in a k-SAT instance
that interact with each other through the clauses present in the k-SAT formula,
or n Ising spins that interact with each other in a ferromagnet. For ease of pre-
sentation, we will assume our variables are binary. Through their interaction,
variables exert an influence on each other, and affect the values each other may
take. The proof centers on the study of logical and algorithmic constructs where
such complex interactions have “simple” descriptions.
What constitutes a simple description of the interaction of n variables? The
number of independent parameters required to specify the joint distribution is a
measure of the complexity of interactions between the covariates. There are
two components to this. The first measures correlations, and the second mea-
sures “ampleness” under those correlations. This is best explained with two
examples. Consider first the uniform distribution over all binary pairs
{(0, 0), (0, 1), (1, 0), (1, 1)}
There is no correlation between the two variables in this distribution. They are
independent. Consider next the distribution over 5 covariates which is uni-
formly supported only on
(0, 0, 0, 0, 0) and (1, 1, 1, 1, 1).
In this distribution, the covariates are tightly correlated, but the distribution is
not “ample”. A distribution over n covariates is defined to be ample when it is
supported on c^n, c > 1, points.
Though initially these two distributions appear quite different, there is a
commonality. Both can be specified with just two parameters. In the first exam-
ple, the two parameters are the probability of the first variate and the probability
of the second variate taking the value 1. With this much information, we can
specify the joint distribution since the variates are independent.
In the second example, we again need two parameters to specify the distri-
bution — namely, the two points on which it is supported.
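The parameter counts in these two examples can be checked mechanically. The following sketch is purely illustrative (all names are ours): it verifies that the first distribution is fully specified by its two marginals, while the second is fully specified by listing its two support points.

```python
from itertools import product

# Example 1: uniform distribution on all binary pairs. The variates are
# independent, so two marginal probabilities specify the whole joint.
p1 = {xy: 0.25 for xy in product([0, 1], repeat=2)}

def marginal(p, i):
    """P(X_i = 1) under the joint distribution p."""
    return sum(prob for x, prob in p.items() if x[i] == 1)

# Check the factorization P(x, y) = P(x) * P(y): independence holds,
# so the two marginals are a complete parametrization.
m0, m1 = marginal(p1, 0), marginal(p1, 1)
for (x, y), prob in p1.items():
    px = m0 if x == 1 else 1 - m0
    py = m1 if y == 1 else 1 - m1
    assert abs(prob - px * py) < 1e-12

# Example 2: distribution on 5 covariates supported on just two points.
# It is highly correlated but not "ample": listing the support is a
# complete parametrization (2 points), versus 2**5 - 1 = 31 parameters
# for a generic joint distribution on 5 binary covariates.
p2 = {(0, 0, 0, 0, 0): 0.5, (1, 1, 1, 1, 1): 0.5}
support = [x for x, prob in p2.items() if prob > 0]
print(len(support), 2 ** 5 - 1)   # 2 parameters suffice vs 31 generically
```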
Though both distributions have simple descriptions, the reasons are very
different. We will study distributions on n covariates that require only 2^{poly(log n)}
parameters to specify. We will call such distributions poly(log n)-parametrizable.
We will see that such distributions are at the heart of polynomial time com-
putability. Conversely, in hard phases of constraint satisfaction problems such
as k-SAT, the space of solutions is both correlated and ample. This causes all
polynomial time algorithms to fail on them.
A distribution is simple to describe if there is either independence between
the variates (as was the case in our first example) or limited support of the dis-
tribution (as was the case in our second example). We call the first case a range
limited interaction because variates interact with a limited range of other vari-
ates. The second case is called value limited since the number of joint values the
variates can take is limited. The common feature underlying both cases is that
the distribution has a very economical parametrization as compared to “true”
joint distributions (more precisely, statistically typical joint distributions) on n
covariates, which require O(2^n) parameters to specify. Thus, we wish to study
such distributions, and will consider both the cases of range and value limited
interactions.
At this point, we visit the topic of graphical interaction models and condi-
tional independence which is a manifestation of range limited interactions. While
complete independence between variates in a complex system is rare, condi-
tional independence between blocks of variables is fairly frequent. We see that
factorization into conditionally independent pieces manifests in terms of eco-
nomical parametrizations of the joint distribution. Graphical models offer us a
way to measure the size of these interactions.
The factorization of interactions can be represented by a corresponding fac-
torization of the joint distribution of the variables over the space of configura-
tions of the n variables subject to the constraints of the problem. It has long been
realized in the statistics and physics communities that certain multivariate
distributions decompose into the product of a few types of factors, with each
factor itself having only a few variables. Such a factorization of joint distribu-
tions into simpler factors can often be represented by graphical models whose
vertices index the variables. A factorization of the joint distribution according to
the graph implies that the interactions between variables can be factored into a
sequence of “local interactions” between vertices that lie within neighborhoods
of each other.
Consider the case of an undirected graphical model. The factoring of inter-
actions may be stated in terms of either a Markov property, or a Gibbs property
with respect to the graph. Specifically, the local Markov property of such mod-
els states that the distribution of a variable is only dependent directly on that
of its neighbors in an appropriate neighborhood system. Of course, two vari-
ables arbitrarily far apart can influence each other, but only through a sequence
of successive local interactions. The global Markov property for such models states
that when two sets of vertices are separated by a third, this induces a condi-
tional independence on variables corresponding to these sets of vertices, given
those corresponding to the third set. On the other hand, the Gibbs property of a
distribution with respect to a graph asserts that the distribution factors into a
product of potential functions over the maximal cliques of the graph. Each po-
tential captures the interaction between the set of variables that form the clique.
The Hammersley-Clifford theorem states that a positive distribution having the
Markov property with respect to a graph must have the Gibbs property with
respect to the same graph.
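A toy instance may clarify the Markov-Gibbs correspondence. The sketch below (illustrative only; the potential values are arbitrary positive numbers of our choosing) builds a Gibbs distribution on the path graph 1 - 2 - 3 from potentials on its maximal cliques, namely the edges, and verifies the global Markov property: the middle vertex separates the other two, so X1 and X3 are conditionally independent given X2.

```python
from itertools import product

# Path graph 1 - 2 - 3: the maximal cliques are the edges {1,2} and {2,3}.
# Gibbs property: p(x) is proportional to psi12(x1, x2) * psi23(x2, x3),
# with strictly positive potentials (the positivity that the
# Hammersley-Clifford theorem requires).
psi12 = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 1.0, (1, 1): 3.0}
psi23 = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 4.0, (1, 1): 1.0}

weights = {x: psi12[x[0], x[1]] * psi23[x[1], x[2]]
           for x in product([0, 1], repeat=3)}
Z = sum(weights.values())                       # partition function
p = {x: w / Z for x, w in weights.items()}

def cond(p, x1, x3, x2):
    """P(X1 = x1, X3 = x3 | X2 = x2)."""
    num = p[(x1, x2, x3)]
    den = sum(p[(a, x2, b)] for a in (0, 1) for b in (0, 1))
    return num / den

# Global Markov property: vertex 2 separates 1 from 3, so conditioning
# on X2 must make X1 and X3 independent.
for x2 in (0, 1):
    for x1 in (0, 1):
        for x3 in (0, 1):
            q1 = sum(cond(p, x1, b, x2) for b in (0, 1))   # P(X1 | X2)
            q3 = sum(cond(p, a, x3, x2) for a in (0, 1))   # P(X3 | X2)
            assert abs(cond(p, x1, x3, x2) - q1 * q3) < 1e-12
print("X1 _||_ X3 | X2 holds")
```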
The condition of positivity is essential in the Hammersley-Clifford theorem
for undirected graphs. However, it is not required when the distribution satis-
fies certain directed models. In that case, the Markov property with respect to
the directed graph implies that the distribution factorizes into local conditional
probability distributions (CPDs). Furthermore, if the model is a directed acyclic
graph (DAG), we can obtain the Gibbs property with respect to an undirected
graph constructed from the DAG by a process known as moralization. We will
return to the directed case shortly.
Chapter 2 develops the principles underlying the framework of graphical
models. We will not use any of these models in particular, but construct another
directed model on a larger product space that utilizes these principles and tailors
them to the case of least fixed point logic, which we turn to next.
At this point, we change to the setting of finite model theory. Finite model
theory is a branch of mathematical logic that has provided machine indepen-
dent characterizations of various important complexity classes including P,
NP, and PSPACE. In particular, the class of polynomial time computable
queries on successor structures has a precise description — it is the class of queries
expressible in the logic FO(LFP) which extends first order logic with the abil-
ity to compute least fixed points of positive first order formulae. Least fixed
point constructions iterate an underlying positive first order formula, thereby
building up a relation in stages. We take a geometric picture of a monadic LFP
computation. Initially the relation to be built is empty. At the first stage, certain
elements, whose types satisfy the first order formula, enter the relation. This
changes the neighborhoods of these elements, and therefore in the next stage,
other elements (whose neighborhoods have been thus changed in the previous
stages) become eligible for entering the relation. The positivity of the formula
implies that once an element is in the relation, it cannot be removed, and so
the iterations reach a fixed point in a polynomial number of steps. Importantly
from our point of view, the positivity and the stage-wise nature of LFP means
that the computation has a directed representation on a graphical model that we
will construct. Recall at this stage that distributions over directed models enjoy
factorization even when they are not defined over the entire space of configura-
tions.
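The stage-wise picture can be made concrete with the textbook example of transitive closure, the least fixed point of the positive formula φ(R, x, y) ≡ E(x, y) ∨ ∃z (E(x, z) ∧ R(z, y)). The sketch below is a standard illustration, not the construction used later in this work: it iterates the corresponding monotone operator, with each pass playing the role of one stage, until the relation stops growing.

```python
def lfp_transitive_closure(vertices, edges):
    """Least fixed point of phi(R, x, y) = E(x, y) or exists z. E(x, z) and R(z, y).

    Each pass is one "stage": pairs that newly satisfy the positive
    formula enter R. Positivity means R only grows, so the iteration
    reaches a fixed point within at most len(vertices)**2 stages.
    """
    R = set()
    while True:
        stage = {(x, y) for x in vertices for y in vertices
                 if (x, y) in edges
                 or any((x, z) in edges and (z, y) in R for z in vertices)}
        if stage == R:          # fixed point reached
            return R
        R = stage

V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4)}
print(sorted(lfp_transitive_closure(V, E)))
# -> [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```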
We may interpret this as follows: monadic LFP relies on the assumption that
variables that are highly entangled with each other due to constraints can be
disentangled in a way that they now interact with each other through condi-
tional independencies induced by a certain directed graphical model construc-
tion. Of course, an element does influence others arbitrarily far away, but only
through a sequence of such successive local and bounded interactions. The reason LFP
computations terminate in polynomial time is analogous to the notions of con-
ditional independence that underlie efficient algorithms on graphical models
having sufficient factorization into local interactions.
In order to apply this picture in full generality to all LFP computations, we
use the simultaneous induction lemma to push all simultaneous inductions into
nested ones, and then employ the transitivity theorem to encode nested fixed
points as sections of a single relation of higher arity. We then see that this is
the case of a value limited interaction between O(n) variates. Namely, although
n variates interact with each other, they do not take c^n joint values. Building
the machinery that can precisely map all these cases to the picture of either
factorization into range limited or value limited interactions is the subject of
Chapter 5.
The preceding insights now direct us to the setting necessary in order to
separate P from NP. We need a regime of NP-complete problems where inter-
actions between variables have the following two properties.
1. They are so “dense” that they cannot be factored through the bottleneck of
the local and bounded properties of first order logic that limit each stage
of LFP computation.
2. The distribution is ample. Namely, it takes c^n joint values.
Intuitively, this should happen when each variable has to simultaneously sat-
isfy constraints involving an extensive (O(n)) fraction of the variables in the
problem, and blocks of n variables are instantiated in c^n distinct ways under these
strong correlations. Namely, we have ample, and highly correlated distribu-
tions having no factorization into conditionally independent pieces (remember
the value limited case is already ruled out since the distribution is ample).
In search of regimes where such situations arise, we turn to the study of
ensemble random k-SAT where the properties of the ensemble are studied as a
function of the clause density parameter. We will now add ideas from this field
which lies on the intersection of statistical mechanics and computer science to
the set of ideas in the proof.
In the past two decades, the phase changes in the solution geometry of ran-
dom k-SAT ensembles as the clause density increases, have gathered much re-
search attention. The 1RSB ansatz of statistical mechanics says that the space of
solutions of random k-SAT shatters into exponentially many clusters of solu-
tions when the clause density is sufficiently high. This phase is called d1RSB (1-
Step Dynamic Replica Symmetry Breaking) and was conjectured by physicists
as part of the 1RSB ansatz. It has since been rigorously proved for high values
of k. It demonstrates the properties of high correlation between large sets of
variables that we will need. Specifically, the emergence of cores that are sets of
C clauses all of whose variables lie in a set of size C (this actually forces C to be
O(n)). As the clause density is increased, the variables in these cores “freeze.”
Namely, they take the same value throughout the cluster. Changing the value of
a variable within a cluster necessitates changing O(n) other variables in order
to arrive at another satisfying solution, which would be in a different cluster.
Furthermore, as the clause density is increased towards the SAT-unSAT thresh-
old, each cluster collapses steadily towards a single solution, that is maximally
far apart from every other cluster. Physicists think of this as an “energy gap”
between the clusters. Such stages are precisely the ones that we need since they
possess the following two properties.
1. Due to strong O(n) correlations that cannot be factored through condi-
tional independencies, they resist attack by local and bounded first order
stages of a monadic LFP computation.
2. Due to their ampleness, which arises from their instantiations in expo-
nentially many clusters, they resist attack by complex fixed points that
produce value limited distributions.
Finally, as the clause density increases above the SAT-unSAT threshold, the so-
lution space vanishes, and the underlying instance of SAT is no longer satisfi-
able.
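Concretely, the random k-SAT ensemble at clause density α on n variables is sampled by drawing m = αn clauses, each over k distinct variables with independently random polarities. The toy sampler below is illustrative only (the regimes of interest require k ≥ 9 and values of n far beyond brute force); it draws such an instance and checks satisfiability exhaustively.

```python
import random
from itertools import product

def random_ksat(n, k, alpha, rng):
    """Sample m = round(alpha * n) clauses, each on k distinct variables
    with independently random polarities (the standard ensemble)."""
    m = round(alpha * n)
    formula = []
    for _ in range(m):
        vars_ = rng.sample(range(n), k)
        formula.append([(v, rng.choice([True, False])) for v in vars_])
    return formula

def satisfiable(n, formula):
    """Brute force over all 2**n assignments: feasible only at toy scale."""
    for bits in product([False, True], repeat=n):
        if all(any(bits[v] == sign for v, sign in clause)
               for clause in formula):
            return True
    return False

rng = random.Random(0)
# Low clause density, far below the phases discussed above, where random
# instances are almost surely satisfiable.
f = random_ksat(n=12, k=3, alpha=1.0, rng=rng)
print(satisfiable(12, f))
```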
We should stress that the picture described above is known to hold in the
case of random k-SAT only for k ≥ 9. For lower values of k, such as k = 3,
there is empirical evidence that this picture does not hold. In other words, the
“true” d1RSB phase arises in random k-SAT for k ≥ 9 as the clause density rises
above (2^k/k) ln k. Since we need all the known properties of the d1RSB phase,
we will work in this regime. Therefore, our proof does not say anything about
the efficacy of various algorithms for 3-SAT, for instance. We specifically prove
that the d1RSB phase is out of reach for polynomial time algorithms, and this
phase is only reached at k ≥ 9. We reproduce the rigorously proved picture of
the 1RSB ansatz that we will need in Chapter 6.
In Chapter 7, we make a brief excursion into the random graph theory of
the factor graph ensembles underlying random k-SAT. From here, we obtain
results that asymptotically almost surely upper bound the size of the largest
cliques in the neighborhood systems on the Gaifman graphs that we study
later when we build models for the range limited interactions that occur during
monadic LFP. These provide us with bounds on the largest irreducible interac-
tions between variables during the various stages of an LFP computation.
Finally in Chapter 8, we pull all the threads and machinery together. First,
we encode k-SAT instances as queries on structures over a certain vocabulary
in a way that LFP captures all polynomial time computable queries on them.
We then set up the framework whereby we can generate distributions of solu-
tions to each instance by asking a purported LFP algorithm for k-SAT to extend
partial assignments on variables to full satisfying assignments.
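This distribution-generating step can be mimicked at toy scale by substituting exhaustive search for the purported polynomial time LFP algorithm: fix a partial assignment and return a satisfying extension. All names below are ours; only the interface (partial assignments in, full satisfying assignments out) reflects the setup described above. Feeding in random partial assignments then induces a distribution over solutions.

```python
from itertools import product

def extend(n, formula, partial):
    """Return a satisfying total assignment extending `partial`
    (a dict var -> bool), or None if no extension exists. A purported
    polynomial time k-SAT solver would play this role; here we brute-force."""
    free = [v for v in range(n) if v not in partial]
    for bits in product([False, True], repeat=len(free)):
        a = {**partial, **dict(zip(free, bits))}
        if all(any(a[v] == sign for v, sign in clause) for clause in formula):
            return tuple(a[v] for v in range(n))
    return None

# (x0 or x1) and (not x0 or x2), written as lists of (variable, sign) literals
formula = [[(0, True), (1, True)], [(0, False), (2, True)]]
print(extend(3, formula, {0: True}))   # -> (True, False, True): x2 is forced
```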
Next, we embed the space of covariates into a larger product space which al-
lows us to “disentangle” the flow of information during an LFP computation.
This allows us to study the computations performed by the LFP with various
initial values under a directed graphical model. This model is only polynomi-
ally larger than the structure itself. We call this the Element-Neighborhood-Stage
Product, or ENSP model. The distribution of solutions generated by LFP then is
a mixture of distributions, each of which factors according to an ENSP.
At this point, we wish to measure the growth of independent parameters of
distributions of solutions whose embeddings into the larger product space fac-
tor over the ENSP. In order to do so, we utilize the following properties for
range limited models.
1. The directed nature of the model that comes from properties of LFP.
2. The properties of neighborhoods that are obtained by studies on random
graph ensembles, specifically that neighborhoods that occur during the
LFP computation are of size poly(log n) asymptotically almost surely in
the n → ∞ limit.
3. The locality and boundedness properties of FO that put constraints upon
each individual stage of the LFP computation.
4. Simple properties of LFP, such as the closure ordinal being a polynomial
in the structure size.
The crucial property that allows us to analyze mixtures of range limited dis-
tributions that factor according to some ENSP is that we can parametrize the
distribution using potentials on cliques of its moralized graph that are of size at
most poly(log n). This means that when the mixture is exponentially numerous,
we will see features that reflect the poly(log n) factor size of the conditionally
independent parametrization.
Next, we come to value limited models. Here, interactions are of size O(n),
but they are limited to only c^{poly(log n)} values, thereby giving us
poly(log n)-parametrization. We show how to deal with mixtures of value
limited models. We build a technique that merges various O(n) potentials that
are poly(log n)-parametrizable into a single potential that is also
poly(log n)-parametrizable, and covers the entire graphical model (which has
poly(n) variables).
Now we close the loop and show that a distribution of solutions for k-SAT
constructed by any purported LFP algorithm (monadic or complex) would not
have enough parameters to describe the known picture of k-SAT in the d1RSB
phase for k ≥ 9 — namely, the presence of extensive frozen variables in ex-
ponentially many clusters with Hamming distance between the clusters be-
ing O(n). In particular, in exponentially numerous mixtures of range limited
models, we would have conditionally independent variation between blocks of
poly(log n) variables, causing the Hamming distance between solutions to be of
this order as well. In other words, solutions for k-SAT that are constructed us-
ing range limited LFP will display aggregate behavior that reflects that they are
constructed out of “building blocks” of size poly(log n). This behavior will man-
ifest when exponentially many solutions are generated by the LFP construction.
The case of value limited LFP also leads to a contradiction since it would be
unable to explain the exponentially many cluster instantiations of cores that are
present in the d1RSB phase.
This shows that LFP cannot express the satisfiability query in the d1RSB
phase for high enough k, and separates P from NP. This also explains the
empirical observation that all known polynomial time algorithms fail in the
d1RSB phase for high values of k, and also establishes on rigorous principles
the physics intuition about the onset of extensive long range correlations in the
d1RSB phase that causes all known polynomial time algorithms to fail. It also
completes this picture, since it says that extensive O(n) correlations that (a) can-
not factor through conditional independencies and (b) are amply instantiated,
are the source of failure of polynomial time algorithms.
2. Interaction Models and
Conditional Independence
Systems involving a large number of variables interacting in complex ways are
ubiquitous in the mathematical sciences. These interactions induce dependen-
cies between the variables. Because of the presence of such dependencies in a
complex system with interacting variables, it is not often that one encounters
independence between variables. However, one frequently encounters conditional
independence between sets of variables. Both independence and conditional in-
dependence among sets of variables have been standard objects of study in
probability and statistics. Speaking in terms of algorithmic complexity, one of-
ten hopes that by exploiting the conditional independence between certain sets
of variables, one may avoid the cost of enumerating an exponential number
of hypotheses in evaluating functions of the distribution that are of interest.
2.1 Conditional Independence
We first fix some notation. Random variables will be denoted by upper case
letters such as X, Y, Z, etc. The values a random variable takes will be denoted
by the corresponding lower case letters, such as x, y, z. Throughout this work,
we assume our random variables to be discrete unless stated otherwise. We
may also assume that they take values in a common finite state space, which
we usually denote by Λ following physics convention. We denote the probability
mass functions of discrete random variables X, Y, Z by P_X(x), P_Y(y), P_Z(z)
respectively. Similarly, P_{X,Y}(x, y) will denote the joint mass of (X, Y),
and so on. We drop subscripts on P when it causes no confusion. We freely use the
term “distribution” for the probability mass function.
The notion of conditional independence is central to our proof. The intuitive
definition of the conditional independence of X from Y given Z is that the con-
ditional distribution of X given (Y, Z) is equal to the conditional distribution
of X given Z alone. This means that once the value of Z is given, no further
information about the value of X can be extracted from the value of Y . This
is an asymmetric definition, and can be replaced by the following symmetric
definition. Recall that X is independent of Y if
P(x, y) = P(x)P(y).
Definition 2.1. Let notation be as above. X is conditionally independent of Y
given Z, written X ⊥⊥ Y | Z, if
P(x, y | z) = P(x | z)P(y | z).
The asymmetric version, which says that the information contained in Y is
superfluous to determining the value of X once the value of Z is known, may
be represented as
P(x | y, z) = P(x | z).
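The symmetric definition can be checked numerically on a toy joint distribution. The sketch below builds binary (X, Y, Z) so that X and Y are drawn independently given Z (the bias values are illustrative assumptions) and verifies P(x, y | z) = P(x | z)P(y | z) at every configuration:

```python
import itertools

# Toy joint distribution over binary (X, Y, Z): draw Z first, then draw X and Y
# independently with Z-dependent biases (values are illustrative).
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: 0.2, 1: 0.9}   # P(X = 1 | Z = z)
p_y_given_z = {0: 0.7, 1: 0.3}   # P(Y = 1 | Z = z)

def bern(p, v):
    return p if v == 1 else 1 - p

joint = {}
for x, y, z in itertools.product([0, 1], repeat=3):
    joint[(x, y, z)] = p_z[z] * bern(p_x_given_z[z], x) * bern(p_y_given_z[z], y)

def cond(joint, x, y, z):
    """P(X = x, Y = y | Z = z) computed from the joint mass function."""
    pz = sum(v for (a, b, c), v in joint.items() if c == z)
    return joint[(x, y, z)] / pz

def marg_cond(joint, var, val, z):
    """P(var = val | Z = z) for var in {'x', 'y'}."""
    pz = sum(v for (a, b, c), v in joint.items() if c == z)
    idx = 0 if var == 'x' else 1
    num = sum(v for k, v in joint.items() if k[idx] == val and k[2] == z)
    return num / pz

# Verify P(x, y | z) = P(x | z) P(y | z) for all configurations.
for x, y, z in itertools.product([0, 1], repeat=3):
    assert abs(cond(joint, x, y, z)
               - marg_cond(joint, 'x', x, z) * marg_cond(joint, 'y', y, z)) < 1e-12
print("X is conditionally independent of Y given Z")
```

Making either conditional table depend on the other variable as well breaks the factorization, which is a quick way to see the definition fail.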
The notion of conditional independence pervades statistical theory [Daw79,
Daw80]. Several notions from statistics may be recast in this language.
EXAMPLE 2.2. The notion of sufficiency may be seen as the presence of a cer-
tain conditional independence [Daw79]. A sufficient statistic T in the problem
of parameter estimation is that which renders the estimate of the parameter in-
dependent of any further information from the sample X. Thus, if Θ is the
parameter to be estimated, then T is a sufficient statistic if
P(θ | x) = P(θ | t).
Thus, all there is to be gained from the sample in terms of information about
Θ is already present in T alone. In particular, if Θ is a posterior that is being
computed by Bayesian inference, then the above relation says that the posterior
depends on the data X through the value of T alone. Clearly, such a statement
would lead to a reduction in the complexity of inference.
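A minimal numerical sketch of sufficiency as conditional independence: for i.i.d. Bernoulli(θ) data, t = Σx is sufficient, so the posterior over θ depends on the sample only through t. The two-point prior and θ values below are illustrative assumptions:

```python
# Sufficiency check: two samples with the same sufficient statistic t = sum(x)
# must yield identical posteriors P(theta | x).
prior = {0.3: 0.5, 0.7: 0.5}     # illustrative two-point prior over theta

def posterior(x):
    """P(theta | x) for a bit-tuple x of i.i.d. Bernoulli(theta) draws, by Bayes' rule."""
    unnorm = {th: pr * th ** sum(x) * (1 - th) ** (len(x) - sum(x))
              for th, pr in prior.items()}
    z = sum(unnorm.values())
    return {th: u / z for th, u in unnorm.items()}

# Two different samples of length 5 with the same t = 2:
pa = posterior((1, 1, 0, 0, 0))
pb = posterior((0, 0, 1, 0, 1))
assert all(abs(pa[th] - pb[th]) < 1e-12 for th in prior)
print("posterior depends on the data only through t")
```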
2.2 Conditional Independence in Undirected Graphical Models
Graphical models offer a convenient framework and methodology to describe
and exploit conditional independence between sets of variables in a system.
One may think of the graphical model as representing the family of distribu-
tions whose law fulfills the conditional independence statements made by the
graph. A member of this family may satisfy any number of additional conditional
independence statements, but no fewer than those prescribed by the graph.
In general, we will consider graphs G = (V, E) whose n vertices index a set of
n random variables (X_1, . . . , X_n). The random variables all take their values
in a common state space Λ. The random vector (X_1, . . . , X_n) then takes values
in a configuration space Ω^n = Λ^n. We will denote values of the random vector
(X_1, . . . , X_n) simply by x = (x_1, . . . , x_n). The notation X_{V\I} will denote the set
of variables excluding those whose indices lie in the set I. Let P be a proba-
bility measure on the configuration space. We will study the interplay between
conditional independence properties of P and its factorization properties.
There are, broadly, two kinds of graphical models: directed and undirected.
We first consider the case of undirected models. Fig. 2.1 illustrates an undirected
graphical model with ten variables.
Random Fields and Markov Properties
Graphical models are very useful because they allow us to read off conditional
independencies of the distributions that satisfy these models from the graph
itself. Recall that we wish to study the relation between conditional indepen-
dence of a distribution with respect to a graphical model, and its factorization.
Figure 2.1: An undirected graphical model. Each vertex represents a random
variable. The vertices in set A are separated from those in set B by set C. For
random variables to satisfy the global Markov property relative to this graph-
ical model, the corresponding sets of random variables must be conditionally
independent. Namely, A ⊥⊥ B | C.
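Graph separation of the kind invoked in the caption can be checked mechanically: C separates A from B exactly when no vertex of B is reachable from A once C is deleted. A sketch on a small stand-in graph (the ten-variable graph of Fig. 2.1 is not reproduced in the text, so the graph below is a hypothetical example):

```python
from collections import deque

# Undirected graph as adjacency sets (hypothetical stand-in for Fig. 2.1).
graph = {
    'a1': {'a2', 'c1'}, 'a2': {'a1', 'c2'},
    'c1': {'a1', 'b1'}, 'c2': {'a2', 'b2'},
    'b1': {'c1', 'b2'}, 'b2': {'b1', 'c2'},
}

def separates(graph, A, B, C):
    """True iff every path from A to B passes through C, i.e. no vertex of B
    is reachable from A by BFS once the vertices of C are deleted."""
    blocked = set(C)
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:
        u = queue.popleft()
        if u in B:
            return False
        for v in graph[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

print(separates(graph, {'a1', 'a2'}, {'b1', 'b2'}, {'c1', 'c2'}))  # True
print(separates(graph, {'a1', 'a2'}, {'b1', 'b2'}, {'c1'}))        # False
```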
Towards that end, one may write increasingly stringent conditional indepen-
dence properties that a set of random variables satisfying a graphical model
may possess, with respect to the graph. In order to state these, we first define
two graph theoretic notions — those of a general neighborhood system, and of
separation.
Definition 2.3. Given a set of variables S known as sites, a neighborhood system
N_S on S is a collection of subsets {N_i : 1 ≤ i ≤ n} indexed by the sites in S that
satisfy
1. a site is not a neighbor to itself (this also means there are no self-loops in
the induced graph): s_i ∉ N_i, and
2. the relationship of being a neighbor is mutual: s_i ∈ N_j ⇔ s_j ∈ N_i.
In many applications, the sites are vertices on a graph, and the neighborhood
system N_i is the set of neighbors of vertex s_i on the graph. We will often be
interested in homogeneous neighborhood systems of S on a graph in which, for
each s_i ∈ S, the neighborhood N_i is defined as
N_i := {s_j ∈ S : d(s_i, s_j) ≤ r}.
Namely, in such neighborhood systems, the neighborhood of a site is simply
the set of sites that lie in the radius-r ball around that site. Note that the
nearest neighbor system that is often used in physics is just the case of r = 1.
We will need to use the general case, where r will be determined by
considerations from logic that will be introduced in the next two chapters. We
will use the term “variable” freely in place of “site” when we move to logic.
Definition 2.4. Let A, B, C be three disjoint subsets of the vertices V of a graph
G. The set C is said to separate A and B if every path from a vertex in A to a
vertex in B must pass through C.
Now we return to the case of the vertices indexing random variables
(X_1, . . . , X_n) and the vector (X_1, . . . , X_n) taking values in a configuration
space Ω^n. A probability measure P on Ω^n is said to satisfy certain Markov
properties with respect to the graph when it satisfies the appropriate
conditional independencies with respect to that graph. We will study the
following two Markov properties, and their relation to factorization of the
distribution.
Definition 2.5. 1. The local Markov property. The variable X_i (for every i)
is conditionally independent of the rest of the graph given just the vari-
ables that lie in the neighborhood of the vertex. In other words, the influ-
ence that variables exert on any given variable is completely described by
the influence that is exerted through the neighborhood variables alone.
2. The global Markov property. For any disjoint subsets A, B, C of V such that
C separates A from B in the graph, it holds that
A ⊥⊥ B | C.
We are interested in distributions that do satisfy such properties, and will
examine what effect these Markov properties have on the factorization of the
distributions. For most applications, this is done in the context of Markov random
fields.
We motivate a Markov random field with the simple example of a Markov
chain {X_n : n ≥ 0}. The Markov property of this chain is that any variable in
the chain is conditionally independent of all other variables in the chain given
just its immediate neighbors:
X_n ⊥⊥ {X_k : k ∉ {n − 1, n, n + 1}} | X_{n−1}, X_{n+1}.
A Markov random field is the natural generalization of this picture to higher
dimensions and more general neighborhood systems.
Definition 2.6. The collection of random variables X_1, . . . , X_n is a Markov ran-
dom field with respect to a neighborhood system on G if and only if the following
two conditions are satisfied.
1. The distribution is positive on the space of configurations: P(x) > 0 for x ∈ Ω^n.
2. The distribution at each vertex is conditionally independent of all other
vertices given just those in its neighborhood:
P(X_i | X_{V\i}) = P(X_i | X_{N_i}).
These local conditional distributions are known as local characteristics of
the field.
The second condition says that Markov random fields satisfy the local Markov
property with respect to the neighborhood system. Thus, we can think of inter-
actions between variables in Markov random fields as being characterized by
“piecewise local” interactions. Namely, the influence of far away vertices must
“factor through” local interactions. This may be interpreted as:
The influence of far away variables is limited to that which is transmit-
ted through the interspersed intermediate variables — there is no “direct”
influence of far away vertices beyond that which is factored through such
intermediate interactions.
However, through such local interactions, a vertex may influence any other ar-
bitrarily far away. Notice though, that this is a considerably simpler picture
than having to consult the joint distribution over all variables for all interac-
tions, for here, we need only know the local joint distributions and use these to
infer the correlations of far away variables. We shall see in later chapters that
this picture, with some additional caveats, is at the heart of polynomial time
computations.
Note the positivity condition on Markov random fields. With this positivity
condition, the complete set of conditionals given by the local characteristics of
a field determine the joint distribution [Bes74].
Markov random fields satisfy the global Markov property as well.
Theorem 2.7. Markov random fields with respect to a neighborhood system satisfy the
global Markov property with respect to the graph constructed from the neighborhood
system.
Markov random fields originated in statistical mechanics [Dob68], where
they model probability measures on configurations of interacting particles, such
as Ising spins. See [KS80] for a treatment that focuses on this setting. Their
local properties were later found to have applications to analysis of images and
other systems that can be modelled through some form of spatial interaction.
This field started with [Bes74] and came into its own with [GG84] which ex-
ploited the Markov-Gibbs correspondence that we will deal with shortly. See
also [Li09].
2.2.1 Gibbs Random Fields and the Hammersley-Clifford Theorem
We are interested in how the Markov properties of the previous section trans-
late into factorization of the distribution. Note that Markov random fields are
characterized by a local condition — namely, their local conditional indepen-
dence characteristics. We now describe another random field that has a global
characterization — the Gibbs random field.
Definition 2.8. A Gibbs random field (or Gibbs distribution) with respect to a neigh-
borhood system N_G on the graph G is a probability measure on the set of con-
figurations Ω^n having a representation of the form
P(x_1, . . . , x_n) = (1/Z) exp(−U(x)/T),
where
1. Z is the partition function and is a normalizing factor that ensures that the
measure sums to unity,
Z = Σ_{x ∈ Ω^n} exp(−U(x)/T).
Evaluating Z explicitly is hard in general since it is a summation over each
of the Λ^n configurations in the space.
2. T is a constant known as the “Temperature” that has origins in statistical
mechanics. It controls the sharpness of the distribution. At high tempera-
tures, the distribution tends to be uniform over the configurations. At low
temperatures, it tends towards a distribution that is supported only on the
lowest energy states.
3. U(x) is the “energy” of configuration x and takes the following form as a
sum
U(x) = Σ_{c ∈ C} V_c(x)
over the set of cliques C of G. The functions V_c : c ∈ C are the clique poten-
tials such that the value of V_c(x) depends only on the coordinates of x that
lie in the clique c. These capture the interactions between vertices in the
clique.
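A minimal numerical sketch of such a Gibbs measure, on four binary spins with pairwise cliques (the edges of a 4-cycle); the Ising-like pair potentials are illustrative choices, not anything prescribed by the text:

```python
import itertools, math

# Cliques are the edges of a 4-cycle on vertices 0..3.
cliques = [(0, 1), (1, 2), (2, 3), (3, 0)]

def V(c, x):
    # Illustrative Ising-like pair potential: -1 if endpoints agree, +1 otherwise.
    i, j = c
    return -1.0 if x[i] == x[j] else 1.0

def energy(x):
    return sum(V(c, x) for c in cliques)

T = 1.0
configs = list(itertools.product([0, 1], repeat=4))
# Brute-force partition function over all |Lambda|^n = 2^4 configurations.
Z = sum(math.exp(-energy(x) / T) for x in configs)

def P(x):
    return math.exp(-energy(x) / T) / Z

# The measure sums to unity, and the all-equal (lowest-energy) configurations
# are the most likely.
assert abs(sum(P(x) for x in configs) - 1.0) < 1e-12
assert P((0, 0, 0, 0)) == max(P(x) for x in configs)
```

Note that even in this tiny case Z is a sum over every configuration, which is the source of the hardness mentioned in item 1.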
Thus, a Gibbs random field has a probability distribution that factorizes into
its constituent “interaction potentials.” This says that the probability of a con-
figuration depends only on the interactions that occur between the variables,
broken up into cliques. For example, if in a system each particle interacts with
only 2 other particles at a time (if one prefers to think in terms of statistical
mechanics), then the energy of each state would be expressible as a sum of
potentials, each of which has just three variables in its support. Thus, the
Gibbs factorization carries in it a faithful representation of the underlying
interactions between the particles. This type of factorization obviously yields
a “simpler description” of the distribution. The precise notion is the number
of independent parameters it takes to specify the distribution. Factorization
into conditionally independent interactions of scope k means that we can specify
the distribution in O(γ^k) parameters rather than O(γ^n). We will return to this
at the end of this chapter.
Definition 2.9. Let P be a Gibbs distribution whose energy function is U(x) =
Σ_{c ∈ C} V_c(x). The support of the potential V_c is the cardinality of the clique c. The
degree of the distribution P, denoted by deg(P), is the maximum of the supports
of the potentials. In other words, the degree of the distribution is the size of the
largest clique that occurs in its factorization.
One may immediately see that the degree of a distribution is a measure of
the complexity of interactions in the system since it is the size of the largest set
of variables whose interaction cannot be split up in terms of smaller interactions
between subsets. One would expect this to be the hurdle in efficient algorithmic
applications.
The Hammersley-Clifford theorem relates the two types of random fields.
Theorem 2.10 (Hammersley-Clifford). X is a Markov random field with respect to a
neighborhood system N_G on the graph G if and only if it is a Gibbs random field with
respect to the same neighborhood system.
The theorem appears in the unpublished manuscript [HC71] and uses a cer-
tain “blackening algebra” in the proof. The first published proofs appear in
[Bes74] and [Mou74].
Note that the condition of positivity on the distribution (which is part of
the definition of a Markov random field) is essential to state the theorem in
full generality. The following example from [Mou74] shows that relaxing this
condition allows us to build distributions having the Markov property, but not
the Gibbs property.
EXAMPLE 2.11. Consider a system of four binary variables {X_1, X_2, X_3, X_4}.
Each of the following combinations has probability 1/8, while the remaining
combinations are disallowed.
(0, 0, 0, 0) (1, 0, 0, 0) (1, 1, 0, 0) (1, 1, 1, 0)
(0, 0, 0, 1) (0, 0, 1, 1) (0, 1, 1, 1) (1, 1, 1, 1).
We may check that this distribution has the global Markov property with re-
spect to the 4-vertex cycle graph. Namely, we have
X_1 ⊥⊥ X_3 | X_2, X_4 and X_2 ⊥⊥ X_4 | X_1, X_3.
However, the distribution does not factorize into Gibbs potentials.
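The two conditional independencies claimed in this example can be verified by brute force over the sixteen configurations; a sketch (variables are 0-indexed, so X_1 is index 0):

```python
import itertools

# The eight allowed configurations of Example 2.11, each with mass 1/8.
support = {
    (0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
    (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1),
}
P = {x: (1 / 8 if x in support else 0.0)
     for x in itertools.product([0, 1], repeat=4)}

def marginal(P, fixed):
    """Sum P over all configurations agreeing with `fixed` (index -> value)."""
    return sum(p for x, p in P.items()
               if all(x[i] == v for i, v in fixed.items()))

def ci(P, i, j, cond):
    """Check X_i independent of X_j given the variables whose indices are in cond."""
    for x in itertools.product([0, 1], repeat=4):
        given = {k: x[k] for k in cond}
        pz = marginal(P, given)
        if pz == 0:
            continue
        lhs = marginal(P, {**given, i: x[i], j: x[j]}) / pz
        rhs = (marginal(P, {**given, i: x[i]}) / pz) * (marginal(P, {**given, j: x[j]}) / pz)
        if abs(lhs - rhs) > 1e-12:
            return False
    return True

print(ci(P, 0, 2, [1, 3]))  # X_1 independent of X_3 given X_2, X_4 -> True
print(ci(P, 1, 3, [0, 2]))  # X_2 independent of X_4 given X_1, X_3 -> True
```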
2.3 Factor Graphs
Factor graphs are bipartite graphs that express the decomposition of a “global”
multivariate function into “local” functions of subsets of the set of variables.
They are a class of undirected models. The two types of nodes in a factor graph
correspond to variable nodes, and factor nodes. See Fig. 2.2.
Figure 2.2: A factor graph showing the three-clause 3-SAT formula (X_1 ∨ X_4 ∨
X_6) ∧ (X_1 ∨ X_2 ∨ X_3) ∧ (X_4 ∨ X_5 ∨ X_6). A dashed line indicates that the
variable appears negated in the clause.
The distribution modelled by this factor graph will show a factorization as
follows:
p(x_1, . . . , x_6) = (1/Z) ϕ_1(x_1, x_4, x_6) ϕ_2(x_1, x_2, x_3) ϕ_3(x_4, x_5, x_6),  (2.1)
where
Z = Σ_{x_1,...,x_6} ϕ_1(x_1, x_4, x_6) ϕ_2(x_1, x_2, x_3) ϕ_3(x_4, x_5, x_6).  (2.2)
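The factorization and its partition function can be sketched concretely for the formula of Fig. 2.2 with hard 0/1 clause potentials. Which literals are negated is not recoverable from the text (the figure's dashed lines carry that information), so the all-positive signs below are an assumption:

```python
import itertools

# Hard clause potentials for (x1 v x4 v x6) ^ (x1 v x2 v x3) ^ (x4 v x5 v x6);
# the literal signs are illustrative assumptions.
def phi1(x1, x4, x6):
    return 1.0 if (x1 or x4 or x6) else 0.0

def phi2(x1, x2, x3):
    return 1.0 if (x1 or x2 or x3) else 0.0

def phi3(x4, x5, x6):
    return 1.0 if (x4 or x5 or x6) else 0.0

# Brute-force partition function over all 2^6 configurations, as in Eq. (2.2).
Z = sum(phi1(x[0], x[3], x[5]) * phi2(x[0], x[1], x[2]) * phi3(x[3], x[4], x[5])
        for x in itertools.product([0, 1], repeat=6))

def p(x):
    return phi1(x[0], x[3], x[5]) * phi2(x[0], x[1], x[2]) * phi3(x[3], x[4], x[5]) / Z

# With hard clause potentials, Z counts the satisfying assignments and p is
# uniform over them.
print(Z)                       # number of satisfying assignments
print(p((1, 0, 0, 1, 0, 0)))   # 1/Z: this assignment satisfies all three clauses
```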
Factor graphs offer a finer grained view of factorization of a distribution
than Bayesian networks or Markov networks. One should keep in mind that
this factorization is (in general) far from being a factorization into conditionals
and does not express conditional independence. The system must embed each
of these factors in ways that are global and not obvious from the factors. This
global information is contained in the partition function. Thus, in general, these
factors do not represent conditionally independent pieces of the joint distribution.
In summary, the factorization above is not the one we are seeking —
it does not imply a series of conditional independencies in the joint distribution.
Factor graphs have been very useful in various applications, most notably
perhaps in coding theory, where they are used as graphical models that underlie
various decoding algorithms based on forms of belief propagation (also
known as the sum-product algorithm), an exact algorithm for computing
marginals on tree graphs that performs remarkably well even in the presence of
loops. See [KFaL98] and [AM00] for surveys of this field. As might be expected
from the preceding comments, these do not focus on conditional independence,
but rather on algorithmic applications of local features (such as being locally tree-like)
of factor graphs.
A Hammersley-Clifford type theorem holds over the completion of a factor
graph. A clique in a factor graph is a set of variable nodes such that every pair
in the set is connected by a function node. The completion of a factor graph is
obtained by introducing a new function node for each clique, and connecting
it to all the variable nodes in the clique, and no others. Then, a positive distri-
bution that satisfies the global Markov property with respect to a factor graph
satisfies the Gibbs property with respect to its completion.
2.4 The Markov-Gibbs Correspondence for Directed
Models
Consider first a directed acyclic graph (DAG), which is simply a directed graph
without any directed cycles in it. Some specific points of additional terminology
for directed graphs are as follows. If there is a directed edge from x to y, we say
that x is a parent of y, and y is the child of x. The set of parents of x is denoted
by pa(x), while the set of children of x is denoted by ch(x). The set of vertices
from whom directed paths lead to x is called the ancestor set of x and is denoted
an(x). Similarly, the set of vertices to whom directed paths from x lead is called
the descendant set of x and is denoted de(x). Note that a DAG is allowed to have
loops in its underlying undirected graph (and such loopy DAGs are central to the
study of iterative decoding algorithms on graphical models). Finally, we often
assume that the graph is equipped with a distance function d(·, ·) between
vertices, which is just the length of the shortest
path between them. A set of random variables whose interdependencies may
be represented using a DAG is known as a Bayesian network or a directed Markov
field. The idea is best illustrated with a simple example.
Consider the DAG of Fig. 2.3 (left). The corresponding factorization of the
joint density that is induced by the DAG model is
p(x_1, . . . , x_5) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1) p(x_5 | x_2, x_3, x_4).
Thus, every joint distribution that satisfies this DAG factorizes as above.
Given a directed graphical model, one may construct an undirected one by
a process known as moralization. In moralization, we (a) replace a directed edge
from one vertex to another by an undirected one between the same two vertices
and (b) “marry” the parents of each vertex by introducing edges between each
pair of parents of the vertex at the head of the former directed edge. The process
is illustrated in the figure below.
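The two-step process just described can be sketched directly, using the DAG of Fig. 2.3 (whose parent sets are read off the factorization given above):

```python
# Moralization of a DAG: (a) drop edge directions, (b) "marry" the parents of
# each vertex. Parent sets are those of the DAG in Fig. 2.3.
parents = {
    'x1': [], 'x2': [], 'x3': [],
    'x4': ['x1'],
    'x5': ['x2', 'x3', 'x4'],
}

def moralize(parents):
    """Return the undirected edge set of the moral graph."""
    edges = set()
    for child, pa in parents.items():
        for p in pa:                      # (a) replace directed edges by undirected ones
            edges.add(frozenset((p, child)))
        for i in range(len(pa)):          # (b) connect every pair of parents
            for j in range(i + 1, len(pa)):
                edges.add(frozenset((pa[i], pa[j])))
    return edges

moral = moralize(parents)
# The three parents of x5 become pairwise connected, as in Fig. 2.3 (right).
assert frozenset(('x2', 'x3')) in moral
assert frozenset(('x3', 'x4')) in moral
assert frozenset(('x2', 'x4')) in moral
```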
In general, if we denote the set of parents of the variable x_i by pa(x_i), then
Figure 2.3: The moralization of the DAG on the left to obtain the moralized
undirected graph on the right.
the joint distribution of (x_1, . . . , x_n) factorizes as
p(x_1, . . . , x_n) = Π_{i=1}^{n} p(x_i | pa(x_i)).
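This product of per-vertex conditionals can be evaluated directly. A sketch using the DAG of Fig. 2.3; the conditional probability table entries are illustrative assumptions:

```python
import itertools

# Conditional probability tables for the DAG x1 -> x4, {x2, x3, x4} -> x5
# (numerical values are made up for illustration).
p1 = {0: 0.6, 1: 0.4}                      # p(x1)
p2 = {0: 0.5, 1: 0.5}                      # p(x2)
p3 = {0: 0.7, 1: 0.3}                      # p(x3)
p4 = {(0,): {0: 0.9, 1: 0.1},              # p(x4 | x1)
      (1,): {0: 0.2, 1: 0.8}}
p5 = {pa: {0: 0.5 + 0.1 * sum(pa), 1: 0.5 - 0.1 * sum(pa)}
      for pa in itertools.product([0, 1], repeat=3)}   # p(x5 | x2, x3, x4)

def joint(x1, x2, x3, x4, x5):
    """The joint mass as the product of p(x_i | pa(x_i))."""
    return (p1[x1] * p2[x2] * p3[x3]
            * p4[(x1,)][x4] * p5[(x2, x3, x4)][x5])

# Unlike the Gibbs representation, this factorization is automatically
# normalized: no partition function is needed.
total = sum(joint(*x) for x in itertools.product([0, 1], repeat=5))
assert abs(total - 1.0) < 1e-12
```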
What we want, however, is to obtain a Markov-Gibbs equivalence for such graphical
models in the same manner that the Hammersley-Clifford theorem provided
for positive Markov random fields. We have seen that relaxing the positivity
condition on the distribution in the Hammersley-Clifford theorem (Thm. 2.10)
cannot be done in general. In some cases however, one may remove the positiv-
ity condition safely. In particular, [LDLL90] extends the Hammersley-Clifford
correspondence to the case of arbitrary distributions (namely, dropping the pos-
itivity requirement) for the case of directed Markov fields. In doing so, they sim-
plify and strengthen an earlier criterion for directed graphs given by [KSC84].
We will use the result from [LDLL90], which we reproduce next.
Definition 2.12. A measure p admits a recursive factorization according to graph
G if there exist non-negative functions, known as kernels, k_v(·, ·) for v ∈ V,
defined on Λ × Λ^{|pa(v)|}, where the first factor is the state space for X_v and
the second for X_{pa(v)}, such that
∫ k_v(y_v, x_{pa(v)}) µ_v(dy_v) = 1
and
p = f·µ, where f(x) = Π_{v ∈ V} k_v(x_v, x_{pa(v)}).
In this case, the kernels k_v(·, x_{pa(v)}) are the conditional densities for the dis-
tribution of X_v conditioned on the value of its parents X_{pa(v)} = x_{pa(v)}. Now let
G^m be the moral graph corresponding to G.
Theorem 2.13. If p admits a recursive factorization according to G, then it admits a
factorization (into potentials) according to the moral graph G^m.
D-separation
We have considered the notion of separation on undirected models and its ef-
fect on the set of conditional independencies satisfied by the distributions that
factor according to the model. For directed models, there is an analogous no-
tion of separation known as D-separation. The notion is what one would expect
intuitively if one views directed models as representing “flows” of probabilistic
influence.
We simply state the property and refer the reader to [KF09, §3.3.1] and [Bis06,
§8.2.2] for discussion and examples. Let A, B, and C be sets of vertices on a
directed model. Consider the set of all paths (ignoring edge directions) from a
node in A to a node in B. Such a path is said to be blocked if one of the following
two scenarios occurs.
1. Arrows on the path meet head-to-tail or tail-to-tail at a node in C.
2. Arrows meet head-to-head at a node, and neither the node nor any of its
descendants is in C.
If all paths from A to B are blocked as above, then C is said to D-separate A
from B, and the joint distribution must satisfy A ⊥⊥ B | C.
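The head-to-head rule is the counterintuitive one, and a numerical sketch makes it concrete. In the v-structure X → Z ← Y with Z = X XOR Y (an illustrative choice), the path X–Z–Y is blocked by the empty set, so X ⊥⊥ Y marginally; conditioning on the collider Z unblocks it:

```python
import itertools

# Joint distribution for the v-structure X -> Z <- Y with Z = X XOR Y.
pX = {0: 0.5, 1: 0.5}
pY = {0: 0.5, 1: 0.5}

joint = {}
for x, y in itertools.product([0, 1], repeat=2):
    z = x ^ y
    for zz in [0, 1]:
        joint[(x, y, zz)] = pX[x] * pY[y] * (1.0 if zz == z else 0.0)

def P(**fixed):
    """Marginal probability of the event {name = value for each keyword}."""
    idx = {'x': 0, 'y': 1, 'z': 2}
    return sum(p for k, p in joint.items()
               if all(k[idx[n]] == v for n, v in fixed.items()))

# Marginally, X and Y are independent:
assert abs(P(x=1, y=1) - P(x=1) * P(y=1)) < 1e-12

# Given Z = 0 they are perfectly correlated, so conditional independence fails:
pz = P(z=0)
lhs = P(x=1, y=1, z=0) / pz
rhs = (P(x=1, z=0) / pz) * (P(y=1, z=0) / pz)
assert abs(lhs - rhs) > 0.1
print("conditioning on the collider couples X and Y")
```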
2.5 I-maps and D-maps
We have seen that there are two broad classes of graphical models — undirected
and directed — which may be used to represent the interaction of variables
in a system. The conditional independence properties of these two classes are
obtained differently.
Definition 2.14. A graph (directed or undirected) is said to be a D-map (“depen-
dencies map”) for a distribution if every conditional independence statement of
the form A ⊥⊥ B | C, for sets of variables A, B, and C, that is satisfied by the distri-
bution is reflected in the graph. Thus, a completely disconnected graph having
no edges is trivially a D-map for any distribution.
A D-map may express more conditional independencies than the distribu-
tion possesses.
Definition 2.15. A graph (directed or undirected) is said to be an I-map (“inde-
pendencies map”) for a distribution if every conditional independence state-
ment of the form A ⊥⊥ B | C, for sets of variables A, B, and C, that is expressed
by the graph is also satisfied by the distribution. Thus, a completely connected
graph is trivially an I-map for any distribution.
An I-map may express fewer conditional independencies than the distribution
possesses.
Definition 2.16. A graph that is both a D-map and an I-map for a distribution
is called its P-map (“perfect map”).
In other words, a P-map expresses precisely the set of conditional indepen-
dencies that are present in the distribution.
Not all distributions have P-maps. Indeed, the class of distributions having
directed P-maps is itself distinct from the class having undirected P-maps, and
neither equals the class of all distributions (see [Bis06, §3.8.4] for examples).
3. Distributions with
poly(log n)-Parametrization
We now come to a central theme in our work. Consider a system of n binary
covariates (X_1, . . . , X_n). To specify their joint distribution p(x_1, . . . , x_n) in the
absence of any additional information, we would have to give the probability
mass function at each of the 2^n configurations that these n variables can take
jointly. The only constraint given on these probability masses is that they must
sum up to 1. Thus, given the function value at 2^n − 1 configurations, we could
find the value at the remaining configuration. This means that in the absence of
any additional information, n covariates require 2^n − 1 parameters to specify
their joint distribution. Thus, it takes exponentially many (in n) parameters to
specify a “true” joint distribution of n covariates. This statement can be made
more precise — the typical joint distribution on n variates requires O(2^n)
parameters for its specification.
In light of the above, a joint distribution that requires only 2^{poly(log n)}
parameters to specify would seem quite unusual. We would intuitively expect it
to be “much simpler” in some way than the typical joint distribution on n
variates. Indeed, because of the exponent of poly(log n), we would expect that it
would be “somewhat like” a joint distribution on only poly(log n) covariates. In
other words — distributions on n variates but requiring only 2^{poly(log n)}
parameters for their specification are like the typical distribution on
poly(log n) variates. We shall refer to such distributions as having
poly(log n)-parametrization.
Let us take an extreme case of such a “simple” joint distribution. Take the
case of n covariates, except that we are provided with one critical piece of extra
information — that the n variates are independent of each other. In that case,
we would need one parameter to specify each of their individual distributions —
namely, the probability that it takes the value 1. These n parameters then specify
the joint distribution simply because the distribution factorizes completely
into factors whose scopes are single variables (namely, just the p(x_i)), as a
result of the independence. Thus, we go from exponentially many independent
parameters to linearly many if we know that the variates are independent.
Let us consider another extreme case of such a distribution. Consider the
distribution on n variates that is non-zero only at (0, 0, . . . , 0) and (1, 1, . . . , 1).
Here, the variates are highly correlated. But once again, we require only two
parameters to specify the distribution. In this case, it is because the distribution
is supported on only two out of a possible 2^n values. In other words, it is
severely limited by the small number of joint values the covariates take.
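The two extreme cases above can be sketched in a few lines of illustrative code (ours, not the text's): n marginal parameters expand to a full table over 2^n configurations under independence, and two parameters suffice for the two-point distribution.

```python
from itertools import product

def joint_independent(p):
    """Full joint table from n marginal parameters p[i] = Pr[X_i = 1]."""
    table = {}
    for x in product((0, 1), repeat=len(p)):
        prob = 1.0
        for pi, xi in zip(p, x):
            prob *= pi if xi == 1 else 1 - pi
        table[x] = prob
    return table

def joint_two_point(n, q):
    """Distribution supported only on the all-ones and all-zeros tuples; q = Pr[all ones]."""
    table = {x: 0.0 for x in product((0, 1), repeat=n)}
    table[(1,) * n] = q
    table[(0,) * n] = 1 - q
    return table

t1 = joint_independent([0.5, 0.25, 0.75])   # 3 parameters -> 8 table entries
t2 = joint_two_point(3, 0.3)                # 2 parameters -> 8 table entries
print(abs(sum(t1.values()) - 1.0) < 1e-12)  # → True
```

In both cases the full table over 2^n entries is determined by far fewer numbers than the 2^n − 1 a generic distribution would require.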
In both cases above, the distribution on n covariates required far fewer pa-
rameters to specify than the typical n variate distribution does.
In order to state the typical case of an n-variate distribution, we make the
following definition.
Definition 3.1. A distribution on n variates will be called ample if it is supported
on c^n joint values for some c > 1.
In other words, ample distributions take the typical number of joint values.
Definition 3.2. A distribution on n variates will be said to have irreducible O(n)
correlations if there exist correlations between O(n) variates that do not permit
factorization into smaller scopes through conditional independencies.
It is distributions that possess both these properties that are problematic for
polynomial time algorithms. We will see that distributions constructed by polynomial
time algorithms can have one or the other property, but not both. Note
that distributions having both these properties require O(2^n) independent parameters
to specify. There is neither factorization, nor limited support, that will
permit more economical parametrization.
This brings us to a key motivating question: What if n covariates had a joint
distribution that required only exponential in poly(log n) many parameters to
specify? When would such a distribution arise, and what would be its limitations?
This question is really the heart of P ?= NP. Indeed, all the machinery
we build and use in this work really takes us to the following insight: Polynomial
time computations build distributions of solutions that can be parameterized using
only exponential in poly(log n) many parameters. Namely, they have poly(log n)-
parametrization. In contrast, in the hard phases of NP-complete problems like k-SAT,
the distribution of solutions requires exponentially many parameters to specify. In particular,
the distribution of solutions in the hard phases of NP-complete problems
displays two properties:
1. The variates are as far from being independent as possible — they inter-
act with each other O(n) at a time, with no possibility for factorization
into conditional independencies. In other words, the distribution has irre-
ducible O(n) correlations.
2. The distribution is ample.
Note that both conditions are required. It is not only long range correlations,
but (a) the non-factorizability of such correlations and (b) ampleness under
such non-factorizable correlations, that together characterize the solution spaces
in hard phases of NP-complete problems.
3.1 Two Kinds of poly(log n)-parameterizations
We have seen that distributions on n variates that are poly(log n)-parametrizable
are very atypical. When do they arise? They can be studied in two categories,
both of which will correspond to polynomial time algorithms.
3.1.1 Range Limited Interactions
As noted earlier, it is not often that complex systems of n interacting variables
have complete independence between some subsets. What is far more frequent
is that there are conditional independencies between certain subsets given some
intermediate subset. In this case, the joint will factorize into factors each of
whose scope is a subset of (X_1, . . . , X_n). If the factorization is into conditionally
independent factors, each of whose scope is of size at most k, then we can
parametrize the joint distribution with at most n2^k independent parameters.
We should emphasize that the factors must give us conditional independence
for this to be true. For example, factor graphs give us a factorization, but it is,
in general, not a factorization into conditionally independent pieces, and so we cannot
conclude anything about the number of independent parameters by just examining
the factor graph. From our perspective, a major feature of directed graphical
models is that their factorizations are already globally normalized once they
are locally normalized, meaning that there is a recursive factorization of the
joint into conditionally independent pieces. The conditional independence in
this case is from all non-descendants, given the parents. Therefore, if each node
has at most k parents, we can parametrize the distribution using at most n2^k
independent parameters. We may also moralize the graph and view this as a factorization
over cliques in the moralized graph. Note that such a factorization
(namely, starting from a directed model and moralizing) holds even if the distribution
is not positive, in contrast with distributions that do not factor
over directed models, where we have to invoke the Hammersley-Clifford
theorem to get a similar factorization. See [KF09] for further discussion of parameterizations
for directed and undirected graphical models.
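As an illustrative sketch of our own (not from [KF09]), the count of at most n·2^k conditional probability parameters for a directed model with at most k parents per node, together with moralization, can be computed directly; the DAG below is a made-up example:

```python
# Each binary node with parent set ps needs one parameter per configuration
# of its parents, i.e., 2^|ps| entries in its conditional probability table.

def num_cpt_params(parents):
    """parents: dict node -> list of parent nodes (all variables binary)."""
    return sum(2 ** len(ps) for ps in parents.values())

def moralize(parents):
    """Undirected edges of the moral graph: parent-child pairs plus 'married' co-parents."""
    edges = set()
    for child, ps in parents.items():
        for p in ps:
            edges.add(frozenset((p, child)))
        for i, p in enumerate(ps):          # connect co-parents to each other
            for q in ps[i + 1:]:
                edges.add(frozenset((p, q)))
    return edges

# A small DAG with one collider: A -> C <- B, then C -> D.
parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"]}
print(num_cpt_params(parents))   # → 8, i.e. 1 + 1 + 4 + 2, far below 2^4 - 1 = 15
```

The co-parents A and B become adjacent in the moral graph, so the moralized clique {A, B, C} reflects exactly the factor p(C | A, B) of the directed model.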
Our proof scheme requires us to distinguish distributions based on the size
of the irreducible direct interactions between subsets of the covariates. Namely,
we would like to identify distributions in which there are O(n) covariates
whose joint interaction cannot be factored through smaller interactions (having
fewer than O(n) covariates) chained together by conditional independencies. We
would like to contrast such distributions with others that can be so factored
through factors having only poly(log n) variates in their scope. The measure that
allows us to make this distinction is the number of independent parameters it
takes to specify the distribution. When the size of the smallest irreducible interactions
is O(n), we need O(c^n) parameters for some c > 1. On the other hand,
if we can demonstrate that the distribution factors through interactions
that always have scope poly(log n), then we would need only O(c^poly(log n))
parameters. See Fig. 3.1.

Figure 3.1: A range limited joint distribution on n covariates that has poly(log n)-
parametrization. Although interactions between variables may be ample within
their range, that range is limited to poly(log n).
Let us consider the example of a Markov random field. By Hammersley-
Clifford, it is also a Gibbs random field over the set of maximal cliques in the
graph encoding the neighborhood system of the Markov random field. This
Gibbs field comes with conditional independence assurance, and therefore we
have an upper bound on the number of parameters it takes to specify the distribution.
Namely, it is just ∑_{c ∈ C} 2^|c|, where C is the set of maximal cliques. Thus, if at
most k < n variables interact directly at a time, then the largest clique size would be k,
and this would give us a more economical parameterization than the one that requires
2^n − 1 parameters.
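A minimal sketch of our own of the clique-based count ∑_{c∈C} 2^|c| against the generic 2^n − 1, for a made-up path-structured field:

```python
def gibbs_param_count(cliques):
    """Sum over maximal cliques c of 2^|c| potential-table entries (binary variables)."""
    return sum(2 ** len(c) for c in cliques)

# 12 binary variables arranged in a path: the maximal cliques are the 11 edges.
n = 12
path_cliques = [(i, i + 1) for i in range(n - 1)]
print(gibbs_param_count(path_cliques))   # → 44, i.e. 11 cliques of size 2
print(2 ** n - 1)                        # → 4095 for the unrestricted joint
```

With clique size bounded by k = 2, the parameter count grows linearly in n rather than exponentially, which is exactly the economy the factorization buys.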
In Chapter 5, we will build machinery that shows that if a problem lies in P
as a result of a range limited algorithm (like monadic LFP), then the factoriza-
tion of the distribution of solutions to that problem causes it to have economi-
cal parametrization, precisely because variables do not interact all at once, but
rather in smaller subsets in a directed manner that gives us conditional inde-
pendencies between sets that are of size poly(log n).
Note that the case where all n variates are independent falls into the range
limited category with range being one. The resulting distribution is ample.
3.1.2 Value Limited Interactions
In the previous section we saw the first type of interaction between n covariates
that can be parametrized by just 2^poly(log n) independent parameters. This
was the case where the n variates interact directly only poly(log n) at a time, and
such interactions are chained together through conditional independencies. In
this section, we will see another such limited interaction, in which the n variates
do interact directly O(n) at a time, but are restricted to taking only c^poly(log n)
distinct values (see Fig. 3.2). One sees immediately that the underlying limitation
is common to this case and the previous one: the set of n covariates
does not take 2^n different values with extensive O(n) correlations that do not factor
through conditional independencies, as a “true” (or, more precisely, typical)
joint distribution of n variates would. Instead, in both cases, the n covariates behave in
ways similar to a system of poly(log n) covariates. Namely, in both cases,
their “jointness” resembles a system of poly(log n) covariates.
How do we precisely state this property? Through the notion of independent
parameters. We will measure the jointness of a distribution by the number
of independent parameters required to specify it. A “true” joint distribution
takes O(c^n), c > 1, independent parameters to specify. On the other hand, both
range and value limited interactions require only O(c^poly(log n)) independent parameters
to specify. This is the crux of the P = NP question, as we shall see. In
particular, we shall see that in the hard phases of problems such as k-SAT for
k > 8, O(c^poly(log n)) independent parameters simply will not suffice to explain
the behavior of the solution space of the problem. We will recall this behavior
in some detail in Chapter 6. We should stress that this behavior has been
rigorously shown to hold for some phases of k-SAT for high values of k. It is in
these phases that our separation of complexity classes can be demonstrated, not
elsewhere. We should also point out that once we have isolated the precise notion
that is at the heart of polynomial time computation, namely poly(log n)-
parametrizability of the space of solutions, several apparent issues resolve
themselves. Take the case of clustering in XORSAT, for instance. We need only
note that the linear nature of XORSAT solution spaces means there is a
poly(log n)-parametrization (a basis provides this for linear spaces). The core
issue is the number of independent parameters it takes to specify the
distribution of the entire space of solutions.
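To illustrate the XORSAT remark with a toy instance of our own: the solution set of a linear system over GF(2) is an affine space, so a particular solution together with a basis of the kernel parametrizes every solution, however many there are.

```python
from itertools import product

# A made-up XORSAT instance on 4 variables: x0 ^ x1 ^ x2 = 1 and x1 ^ x2 ^ x3 = 0.
clauses = [((0, 1, 2), 1), ((1, 2, 3), 0)]

def satisfies(x, clauses):
    return all(sum(x[i] for i in idx) % 2 == b for idx, b in clauses)

solutions = [x for x in product((0, 1), repeat=4) if satisfies(x, clauses)]
print(len(solutions))   # → 4 = 2^(4 - 2): a rank-2 system leaves a 2-dim solution space

# Every solution is a particular solution XOR a kernel element; the kernel is
# closed under coordinate-wise XOR, i.e., it is a linear space with a small basis.
x0 = solutions[0]
kernel = {tuple(a ^ b for a, b in zip(x, x0)) for x in solutions}
```

The 4 solutions here, and in general the exponentially many solutions of a sparse XOR system, are specified by a handful of basis vectors rather than by listing the support.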
In both cases, range limited and value limited interactions, the system
of n covariates behaves as though it were a system of only poly(log n)
covariates. In the range limited case, this is because the covariates only
jointly vary with poly(log n) other variates at a time. In the value
limited case, this is because though O(n) variates vary jointly, they
take only 2^poly(log n) joint values. Thus, in both cases, the joint distribution
has a very economical parametrization using only 2^poly(log n) independent
parameters.
In later chapters, we will build machinery to see that polynomial time LFP
algorithms can capture either range or value limited behaviors, but not the joint
behavior of a “true” joint distribution of n covariates.
It is also useful to notice that neither type of limitation implies the other.
For instance, n independent variates are range limited but not value limited,
whereas the distribution supported on the all-1 and all-0 tuples is value limited
but not range limited. Regimes of problems where the distributions of solutions
are neither value limited nor range limited cause the failure of polynomial time
algorithms on average.
Figure 3.2: A value limited joint distribution on n covariates that has poly(log n)-
parametrization. Although interactions between variables are O(n) at a time
(range O(n)), the covariates are limited to 2^poly(log n) joint values, and so do not
display ampleness in their joint distribution.
3.1.3 On the Atypical Nature of poly(log n)-parameterization
We briefly mentioned earlier that the typical member of the space of distributions
on n covariates requires O(2^n) parameters. Note that this is a statistical
statement. Namely, if we picked an n-variable distribution at random, with
high likelihood we would get a distribution that requires O(2^n) parameters to
specify. In other words, with high likelihood, we would not get a poly(log n)-
parametrizable distribution. This observation may be used to state results about
average case complexity in hard phases of random k-SAT. In many ways, these
hard phases are simply typical, nothing more. The solution space shows the
behavior of a typical joint distribution on n covariates in that it is ample and
correlated. It is polynomial time solution spaces that are atypical for n-variate
distributions, in that they are either not ample (the value limited case) or not
correlated solidly enough (the range limited case, where they admit Gibbs
factorizations into smaller potentials).
This short section owes its existence to Leonid Levin and Avi Wigderson,
both of whom asked us whether our methods could be used to make statements
about average case complexity. We will return to this issue in future versions of
this paper or in the manuscript [Deo10] which is under preparation.
3.1.4 Our Treatment of Range and Value Limited Distributions
The two types of distributions that we have mentioned above are only superfi-
cially dissimilar. In both cases, the range of behaviors of the n covariates can be
parametrized with the number of independent parameters it takes to specify a
joint distribution of only poly(log n) covariates. For purposes of pedagogy, we
will disregard this superficial dissimilarity and provide a full treatment of the
range limited case. We can even think of the value limited behavior as a type of
range limited behavior in which, even though a covariate sees O(n) other covariates,
it utilizes only a poly(log n) amount of the information in them in order to
make its decision.
We end this chapter by tying poly(log n)-parameterizations to Markov or,
equivalently, Gibbs models. Once again, consider the two kinds of
poly(log n)-parameterizations: range limited and value limited. A range limited
parametrization would correspond to a Gibbs field whose potentials are
specified over maximal cliques of size poly(log n). A value limited parameterization
could have maximal cliques of size O(n), but the number of parameters
for such a clique would be only 2^poly(log n) instead of the possible 2^O(n). In either
case, the random field would have poly(log n)-parametrization. See Figs. 3.1
and 3.2.
4. Logical Descriptions of
Computations
Work in finite model theory and descriptive complexity theory — a branch of
finite model theory that studies the expressive power of various logics in terms
of complexity classes — has resulted in machine independent characterizations
of various complexity classes. In particular, over ordered structures, there is
a precise and highly insightful characterization of the class of queries that are
computable in polynomial time, and those that are computable in polynomial
space. In order to keep the treatment relatively complete, we begin with a brief
précis of this theory. Readers from a finite model theory background may skip
this chapter.
We quickly set notation. A vocabulary, denoted by σ, is a set consisting of
finitely many relation and constant symbols,

σ = ⟨R_1, . . . , R_m, c_1, . . . , c_s⟩.

Each relation has a fixed arity. We consider only relational vocabularies in that
there are no function symbols. This poses no shortcomings since functions may
be encoded as relations. A σ-structure A consists of a set A which is the universe
of A, interpretations R^A for each of the relation symbols in the vocabulary, and
interpretations c^A for each of the constant symbols in the vocabulary. Namely,

A = ⟨A, R_1^A, . . . , R_m^A, c_1^A, . . . , c_s^A⟩.
An example is the vocabulary of graphs which consists of a single relation
symbol having arity two. Then, a graph may be seen as a structure over this
vocabulary, where the universe is the set of nodes, and the relation symbol is
interpreted as an edge. In addition, some applications may require us to work
with a graph vocabulary having two constants interpreted in the structure as
source and sink nodes respectively.
We also denote by σ_n the extension of σ by n additional constants, and denote
by (A, a) the structure where the tuple a has been identified with these
additional constants.
4.1 Inductive Definitions and Fixed Points
The material in this section is standard, and we refer the reader to [Mos74] for
the first monograph on the subject, and to [EF06, Lib04] for detailed treatments
in the context of finite model theory. See [Imm99] for a text on descriptive com-
plexity theory. Our treatment is taken mostly from these sources, and stresses
the facts we need.
Inductive definitions are a fundamental primitive of mathematics. The idea
is to build up a set in stages, where the defining relation for each stage can be
written in the first order language of the underlying structure and uses elements
added to the set in previous stages. In the most general case, there is an underlying
structure A = ⟨A, R_1, . . . , R_m⟩ and a formula

φ(S, x) ≡ φ(S, x_1, . . . , x_n)

in the first-order language of A. The variable S is a second-order relation variable
that will eventually hold the set we are trying to build up in stages. At the
ξ-th stage of the induction, denoted by I_φ^ξ, we insert into the relation S the tuples
according to

x ∈ I_φ^ξ ⇔ φ(∪_{η<ξ} I_φ^η, x).

We will denote the stage at which a tuple x enters the relation in the induction defined
by φ by |x|_φ^A. The decomposition into its various stages is a central characteristic
of inductively defined relations. We will also require that φ have only positive
occurrences of the n-ary relation variable S, namely that all occurrences of S be
within the scope of an even number of negations. Such inductions are called
positive elementary. In the most general case, a transfinite induction may result.
The least ordinal κ at which I_φ^κ = I_φ^{κ+1} is called the closure ordinal of the induction,
and is denoted by |φ^A|. When the underlying structures are finite, this is
also known as the inductive depth. Note that the cardinality of the ordinal κ is at
most |A|^n.
Finally, we define the relation

I_φ = ∪_ξ I_φ^ξ.

Sets of the form I_φ are known as fixed points of the structure. Relations that may
be defined by

R(x) ⇔ I_φ(a, x)

for some choice of tuple a over A are known as inductive relations. Thus, inductive
relations are sections of fixed points.
Note that there are definitions of the set I_φ that are equivalent, but can be
stated only in the second order language of A. Note that the definition above is
1. elementary at each stage, and
2. constructive.
We will use both these properties throughout our work.
We now proceed more formally by introducing operators and their fixed
points, and then consider the operators on structures that are induced by first
order formulae. We begin by defining two classes of operators on sets.
Definition 4.1. Let A be a finite set, and P(A) be its power set. An operator F on
A is a function F : P(A) → P(A). The operator F is monotone if it respects subset
inclusion, namely, for all subsets X, Y of A, if X ⊆ Y, then F(X) ⊆ F(Y). The
operator F is inflationary if it maps sets to their supersets, namely, X ⊆ F(X).
Next, we define sequences induced by operators, and characterize the se-
quences induced by monotone and inflationary operators.
Definition 4.2. Let F be an operator on A. Consider the sequence of sets F^0, F^1, . . .
defined by

F^0 = ∅,    F^{i+1} = F(F^i).    (4.1)

This sequence (F^i) is called inductive if it is increasing, namely, if F^i ⊆ F^{i+1} for
all i. In this case, we define

F^∞ := ∪_{i=0}^∞ F^i.    (4.2)
Lemma 4.3. If F is either monotone or inflationary, the sequence (F^i) is inductive.
Now we are ready to define fixed points of operators on sets.
Definition 4.4. Let F be an operator on A. The set X ⊆ A is called a fixed point
of F if F(X) = X. A fixed point X of F is called its least fixed point, denoted
LFP(F), if it is contained in every other fixed point Y of F, namely, X ⊆ Y
whenever Y is a fixed point of F.
Not all operators have fixed points, let alone least fixed points. The Tarski-
Knaster theorem guarantees that monotone operators do, and also provides two constructions
of the least fixed point for such operators: one “from above” and the
other “from below.” The latter construction uses the sequence (4.1).
Theorem 4.5 (Tarski-Knaster). Let F be a monotone operator on a set A.

1. F has a least fixed point LFP(F) which is the intersection of all the fixed points
of F. Namely,

LFP(F) = ∩ {Y : Y = F(Y)}.

2. LFP(F) is also equal to the union of the stages of the sequence (F^i) defined in
(4.1). Namely,

LFP(F) = ∪_i F^i = F^∞.
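The “from below” construction in part 2 is directly executable on finite sets; here is a minimal sketch of our own, with a made-up monotone operator:

```python
def lfp_from_below(F, empty=frozenset()):
    """Iterate F on the empty set until the sequence F^i stabilizes (Theorem 4.5, part 2)."""
    stage = empty
    while True:
        nxt = F(stage)
        if nxt == stage:        # fixed point reached
            return stage
        stage = nxt

# Monotone example on subsets of {0,...,9}: add a seed element and close under n -> n+2.
def F(X):
    return frozenset({0} | {x + 2 for x in X if x + 2 < 10} | X)

print(sorted(lfp_from_below(F)))   # → [0, 2, 4, 6, 8]
```

Since F is monotone, the stages ∅ ⊆ F(∅) ⊆ F(F(∅)) ⊆ · · · are increasing, and on a finite universe the loop must terminate at LFP(F).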
However, not all operators are monotone; therefore we need a means of con-
structing fixed points for non-monotone operators.
Definition 4.6. For an inflationary operator F, the sequence F^i is inductive, and
hence eventually stabilizes to the fixed point F^∞. For an arbitrary operator G,
we associate the inflationary operator G_infl defined by G_infl(Y) = Y ∪ G(Y). The
set G_infl^∞ is called the inflationary fixed point of G, and denoted by IFP(G).
Definition 4.7. Consider the sequence (F^i) induced by an arbitrary operator F
on A. The sequence may or may not stabilize. In the first case, there is a positive
integer n such that F^{n+1} = F^n, and therefore for all m > n, F^m = F^n. In the
latter case, the sequence F^i does not stabilize, namely, for all n ≤ 2^|A|, F^n ≠
F^{n+1}. Now, we define the partial fixed point of F, denoted PFP(F), as F^n in the
first case, and as the empty set in the second case.
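A small sketch of our own contrasting the two cases of this definition: an inflationary operator that stabilizes, and the complement operator, which cycles forever so that its partial fixed point is the empty set.

```python
def pfp(F, universe, empty=frozenset()):
    """Partial fixed point: iterate F; if consecutive stages never agree within
    2^|universe| steps, the sequence cycles, and PFP is the empty set (Definition 4.7)."""
    stage = empty
    for _ in range(2 ** len(universe)):
        nxt = F(stage)
        if nxt == stage:
            return stage
        stage = nxt
    return empty

U = frozenset(range(3))
grow = lambda X: frozenset(X | {min(U - X)}) if X != U else X   # inflationary: stabilizes at U
flip = lambda X: U - X                                          # complement: never stabilizes
print(sorted(pfp(grow, U)))   # → [0, 1, 2]
print(sorted(pfp(flip, U)))   # → []
```

The bound 2^|universe| suffices because an operator on subsets of the universe has only that many possible stages before it must revisit one.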
4.2 Fixed Point Logics for P and PSPACE
We now specialize the theory of fixed points of operators to the case where the
operators are defined by means of first order formulae.
Definition 4.8. Let σ be a relational vocabulary, and R a relational symbol of
arity k that is not in σ. Let ϕ(R, x_1, . . . , x_k) = ϕ(R, x) be a formula of vocabulary
σ ∪ {R}. Now consider a structure A of vocabulary σ. The formula ϕ(R, x)
defines an operator F_ϕ : P(A^k) → P(A^k) on A^k which acts on a subset X ⊆ A^k
as

F_ϕ(X) = {a | A ⊨ ϕ(X/R, a)},    (4.3)

where ϕ(X/R, a) means that R is interpreted as X in ϕ.
We wish to extend FO by adding fixed points of operators of the form F_φ,
where φ is a formula in FO. This gives us fixed point logics, which play a central
role in descriptive complexity theory.
Definition 4.9. Let the notation be as above.

1. The logic FO(IFP) is obtained by extending FO with the following formation
rule: if ϕ(R, x) is a formula and t a k-tuple of terms, then [IFP_{R,x} ϕ(R, x)](t)
is a formula whose free variables are those of t. The semantics are given
by

A ⊨ [IFP_{R,x} ϕ(R, x)](a) iff a ∈ IFP(F_ϕ).

2. The logic FO(PFP) is obtained by extending FO with the following formation
rule: if ϕ(R, x) is a formula and t a k-tuple of terms, then [PFP_{R,x} ϕ(R, x)](t)
is a formula whose free variables are those of t. The semantics are given
by

A ⊨ [PFP_{R,x} ϕ(R, x)](a) iff a ∈ PFP(F_ϕ).
We cannot define the closure of FO under taking least fixed points in the
above manner without further restrictions since least fixed points are guaran-
teed to exist only for monotone operators, and testing for monotonicity is un-
decidable. If we were to form a logic by extending FO by least fixed points
without further restrictions, we would obtain a logic with an undecidable syn-
tax. Hence, we make some restrictions on the formulae which guarantee that
the operators obtained from them as described by (4.3) will be monotone, and
thus will have a least fixed point. We need a definition.
Definition 4.10. Let notation be as earlier. Let ϕ be a formula containing a rela-
tional symbol R. An occurrence of R is said to be positive if it is under the scope
of an even number of negations, and negative if it is under the scope of an odd
number of negations. A formula is said to be positive in R if all occurrences of R
in it are positive, or there are no occurrences of R at all. In particular, there are
no negative occurrences of R in the formula.
Lemma 4.11. Let notation be as earlier. If the formula ϕ(R, x) is positive in R, then
the operator obtained from ϕ by construction (4.3) is monotone.
Now we can define the closure of FO under least fixed points of operators
obtained from formulae that are positive in a relational variable.
Definition 4.12. The logic FO(LFP) is obtained by extending FO with the following
formation rule: if ϕ(R, x) is a formula that is positive in the k-ary relational
variable R, and t is a k-tuple of terms, then [LFP_{R,x} ϕ(R, x)](t) is a formula
whose free variables are those of t. The semantics are given by

A ⊨ [LFP_{R,x} ϕ(R, x)](a) iff a ∈ LFP(F_ϕ).
As earlier, the stage at which the tuple a enters the relation R is denoted by
|a|_ϕ^A, and inductive depths are denoted by |ϕ^A|. This is well defined for least
fixed points, since a tuple enters a relation only once and is never removed
from it afterwards. In fixed points (such as partial fixed points) where the underlying
formula is not necessarily positive, this is not true: a tuple may enter and leave
the relation being built multiple times.
Next, we informally state two well-known results on the expressive power
of fixed point logics. First, adding the ability to do simultaneous induction
over several formulae does not increase the expressive power of the logic; and
second, FO(IFP) = FO(LFP) over finite structures. See [Lib04, §10.3, p. 184] for
details.
We have introduced various fixed point constructions and extensions of first
order logic by these constructions. We end this section by relating these log-
ics to various complexity classes. These are the central results of descriptive
complexity theory.
Fagin [Fag74] obtained the first machine independent logical characterization
of an important complexity class. Here, ∃SO refers to the restriction of
second-order logic to formulae of the form

∃X_1 · · · ∃X_m ϕ,

where ϕ does not have any second-order quantification.

Theorem 4.13 (Fagin). ∃SO = NP.
Immerman [Imm82] and Vardi [Var82] obtained the following central result
that captures the class P on ordered structures.
Theorem 4.14 (Immerman-Vardi). Over finite, ordered structures, the queries ex-
pressible in the logic FO(LFP) are precisely those that can be computed in polynomial
time. Namely,
FO(LFP) = P.
A characterization of PSPACE in terms of PFP was obtained in [AV91,
Var82].
Theorem 4.15 (Abiteboul-Vianu, Vardi). Over finite, ordered structures, the queries
expressible in the logic FO(PFP) are precisely those that can be computed in polynomial
space. Namely,
FO(PFP) = PSPACE.
Note: We will often use the term LFP generically instead of FO(LFP) when we
wish to emphasize the fixed point construction being performed, rather than the
language.
5. The Link Between Polynomial
Time Computation and Conditional
Independence
In Chapter 2 we saw how certain joint distributions that encode interactions
between collections of variables “factor through” smaller, simpler interactions.
This necessarily affects the type of influence a variable may exert on other vari-
ables in the system. Thus, while a variable in such a system can exert its influ-
ence throughout the system, this influence must necessarily be bottlenecked by
the simpler interactions that it must factor through. In other words, the influ-
ence must propagate with bottlenecks at each stage. In the case where there are
conditional independencies, the influence can only be “transmitted through”
the values of the intermediate conditioning variables.
In this chapter, we will uncover a similar phenomenon underlying the log-
ical description of polynomial time computation on ordered structures. The
fundamental observation is the following:
Least fixed point computations “factor through” first order computations,
and so limitations of first order logic must be the source of the bottleneck at
each stage to the propagation of information in such computations.
The treatment of LFP versus FO in finite model theory centers around the fact
that FO can only express local properties, while LFP allows non-local properties
such as transitive closure to be expressed. We are taking as given the non-local
capability of LFP, and asking how this non-local nature factors at each step, and what
the effect of such a factorization is on the joint distribution of LFP acting upon ensembles.
Fixed point logics allow variables to be non-local in their influence, but this
non-local influence must factor through first order logic at each stage. This is
a very similar underlying idea to the statistical mechanical picture of random
fields over spaces of configurations that we saw in Chapter 2, but it comes cloaked
in a very different garb: that of logic and operators. The sequence (F_ϕ^i) of operators
that construct fixed points may be seen as the propagation of influence
in a structure by means of setting values of “intermediate variables”. In this
case, the variables are set by inducting them into a relation at various stages
of the induction. We want to understand the stage-wise bottleneck that a fixed
point computation faces at each step of its execution, and tie this back to notions
of conditional independence and factorization of distributions. In order
to accomplish this, we must understand the limitations of each stage of an LFP
computation and understand how this affects the propagation of long-range influence
in relations computed by LFP. Namely, we will bring to bear ideas from
statistical mechanics and message passing on the logical description of computations.
It will be beneficial to state this intuition with the example of transitive clo-
sure.
EXAMPLE 5.1. The transitive closure of the edge relation in a graph is the standard
example of a non-local property that cannot be expressed by first order logic. It can
be expressed in FO(LFP) as follows. Let E be a binary relation that expresses
the presence of an edge between its arguments. Then we can see that iterating
the positive first order formula ϕ(R, x, y) given by

ϕ(R, x, y) ≡ E(x, y) ∨ ∃z(E(x, z) ∧ R(z, y))

builds the transitive closure relation in stages.
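This staged iteration can be carried out directly; here is a minimal sketch of our own of the induced operator F_ϕ acting on a concrete edge set:

```python
# Iterate the operator F_phi induced by
#   phi(R, x, y) = E(x, y) or exists z (E(x, z) and R(z, y))
# starting from the empty relation, until a fixed point is reached.

def lfp_transitive_closure(edges):
    """Least fixed point of F_phi, i.e., the transitive closure of the edge relation E."""
    R = set()                      # stage 0: the empty relation
    while True:
        # One application of F_phi: stage-wise local, since each new pair (x, y)
        # is justified by a single edge (x, z) plus a pair (z, y) already in R.
        new_R = set(edges) | {(x, y) for (x, z) in edges for (w, y) in R if w == z}
        if new_R == R:             # closure ordinal reached
            return R
        R = new_R

edges = {(1, 2), (2, 3), (3, 4)}
tc = lfp_transitive_closure(edges)
print((1, 4) in tc)   # → True, although no single edge connects 1 and 4
```

Each stage adds only pairs reachable one edge further, yet the fixed point captures reachability at arbitrary distance, which is exactly the non-local property built from stage-wise local steps.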
Notice that the decision of whether a vertex enters the relation is based on
the immediate neighborhood of the vertex. In other words, the relation is built
stage by stage, and at each stage, vertices that have entered a relation make
other vertices that are adjacent to them eligible to enter the relation at the next
stage. Thus, though the resulting property is non-local, the information flow used to
compute it is stage-wise local. The computation factors through a local property
at each stage, but by chaining many such local factors together, we obtain the
non-local relation of transitive closure. This picture relates to a Markov random
field, where such local interactions are chained together in a way that variables
can exert their influence to arbitrary lengths, but the factorization of that influ-
ence (encoded in the joint distribution) reveals the stage-wise local nature of the
interaction. There are important differences however — the flow of LFP com-
putation is directed, whereas a Markov random field is undirected, for instance.
We have used this simple example just to provide some preliminary intuition.
We will now proceed to build the requisite framework.
5.1 The Limitations of LFP
Many of the techniques in model theory break down when restricted to finite
models. A notable exception is the Ehrenfeucht-Fraïssé game for first order
logic. This has led to much research attention on game theoretic characterizations
of various logics. The primary technique for demonstrating the limitations
of fixed point logics in expressing properties is to consider them as a fragment of
the logic L^k_∞ω, which extends first order logic with infinitary connectives, and
then use the characterization of expressibility in this logic in terms of k-pebble
games. This is however not useful for our purpose (namely, separating P from
NP) since NP ⊆ PSPACE and the latter class is captured by PFP, which is
also a fragment of L^k_∞ω.
One of the central contributions of our work is demonstrating a completely
different viewpoint of LFP computations in terms of the concepts of conditional
independence and factoring of distributions, both of which are fundamental to
statistics and probability theory. In order to arrive at this correspondence, we
will need to understand the limitations of first order logic. Least fixed point
is an iteration of first order formulas. The limitations of first order formulae
mentioned in the previous section therefore appear at each step of a least fixed
point computation.
Viewing LFP as “stage-wise first order” is central to our analysis. Let us
pause for a while and see how this fits into our global framework. We are in-
terested in factoring complex interactions between variables into their smallest
constituent irreducible factors. Viewed this way, LFP has a natural factorization
into its stages, which are all described by first order formulae.
Let us now analyze the limitations of the LFP computation through this
viewpoint.
5.1.1 Locality of First Order Logic
The local properties of first order logic have received considerable research at-
tention and expositions can be found in standard references such as [Lib04, Ch.
4], [EF06, Ch. 2], [Imm99, Ch. 6]. The basic idea is that first order formulae can
only “see” up to a certain distance away from their free variables. This distance
is determined by the quantifier rank of the formula.
The idea that first order formulae are local has been formalized in essen-
tially two different ways. This has led to two major notions of locality — Hanf
locality [Han65] and Gaifman locality [Gai82]. Informally, Hanf locality says
that whether or not a first order formula ϕ holds in a structure depends only on
its multiset of isomorphism types of spheres of radius r. Gaifman locality says
that whether or not ϕ holds in a structure depends on the number of elements
of that structure having pairwise disjoint r-neighborhoods that fulfill first order
formulae of quantifier depth d for some fixed d (which depends on ϕ). Clearly,
both notions express properties of combinations of neighborhoods of fixed size.
In the literature of finite model theory, these properties were developed to
deal with cases where the neighborhoods of the elements in the structure had
bounded diameters. In particular, some of the most striking applications of
such properties are in graphs with bounded degree, such as the linear time al-
gorithm to evaluate first order properties on bounded degree graphs [See96].
In contrast, we will use some of the normal forms developed in the context of
locality properties in finite model theory, but in the scenario where neighbor-
hoods of elements have unbounded diameter. Thus, it is not only the locality
that is of interest to us, but the exact specification of the finitary nature of the
first order computation. We will see that what we need is that first order logic
can only exploit a bounded number of local properties. We will need both these
properties in our analysis.
Recall the notation and definitions from the previous chapter. We need some
definitions in order to state the results.
Definition 5.2. The Gaifman graph of a σ-structure A is denoted by G_A and
defined as follows. The set of nodes of G_A is A. There is an edge between two
nodes a_1 and a_2 in G_A if there is a relation R in σ and a tuple t ∈ R^A such that
both a_1 and a_2 appear in t.
With the graph defined, we have a notion of distance between elements a_i, a_j
of A, denoted by d(a_i, a_j), as simply the length of the shortest path between a_i
and a_j in G_A. We extend this to a notion of distance between tuples from A as
follows. Let a = (a_1, . . . , a_n) and b = (b_1, . . . , b_m). Then

    d_A(a, b) = min{ d_A(a_i, b_j) : 1 ≤ i ≤ n, 1 ≤ j ≤ m }.
There is no restriction on n and m above. In particular, the definition above
also applies to the case where either of them is equal to one. Namely, we have
the notion of distance between a tuple and a singleton element. We are now
ready to define neighborhoods of tuples. Recall that σ_n is the expansion of σ by
n additional constants.
Definition 5.3. Let A be a σ-structure and let a be a tuple over A. The ball of
radius r around a is the set defined by

    B_r^A(a) = { b ∈ A : d_A(a, b) ≤ r }.

The r-neighborhood of a in A is the σ_n-structure N_r^A(a) whose universe is
B_r^A(a); each relation R is interpreted as R^A restricted to B_r^A(a); and the n
additional constants are interpreted as a_1, . . . , a_n.
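The two definitions above are simple to compute. The following Python sketch (our own, with hypothetical names; relation tuples of any arity are handled uniformly) builds the Gaifman graph of a structure and the ball B_r(a) around a tuple:

```python
from itertools import combinations

def gaifman_graph(universe, relations):
    """Adjacency of the Gaifman graph G_A: two distinct elements are
    adjacent iff they occur together in some tuple of some relation.
    `relations` is an iterable of sets of tuples over the universe."""
    adj = {a: set() for a in universe}
    for R in relations:
        for t in R:
            for a, b in combinations(set(t), 2):
                adj[a].add(b)
                adj[b].add(a)
    return adj

def ball(adj, tup, r):
    """B_r(a): all elements within Gaifman distance r of some entry of tup."""
    seen, frontier = set(tup), set(tup)
    for _ in range(r):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen
```

For a single binary relation E = {(1, 2), (2, 3), (3, 4)} over universe {1, . . . , 5}, the ball of radius 2 around the tuple (1,) is {1, 2, 3}, while the isolated element 5 has the singleton ball {5} at every radius.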
We recall the notion of a type. Informally, if L is a logic (or language), the L-
type of a tuple is the sum total of the information that can be expressed about it
in the language L. Thus, the first order type of an m-tuple in a structure is defined
as the set of all FO formulae having m free variables that are satisfied by the
tuple. Over finite structures, this notion is far too powerful since it characterizes
the structure (A, a) up to isomorphism. A more useful notion is the local type of a
tuple. In particular, a neighborhood is a σ_n-structure, and a type of a neighborhood
is an equivalence class of such structures up to isomorphism. Note that any
isomorphism between N_r^A(a_1, . . . , a_n) and N_r^B(b_1, . . . , b_n) must send a_i to
b_i for 1 ≤ i ≤ n.
Definition 5.4. Notation as above. The local r-type of a tuple a in A is the type of
a in the substructure induced by the r-neighborhood of a in A, namely in N_r^A(a).
In what follows, we may drop the superscript if the underlying structure is
clear. The following three notions of locality are used in stating the results.
Definition 5.5. 1. Formulas whose truth at a tuple a depends only on B_r(a)
are called r-local. In other words, quantification in such formulas is
restricted to the structure N_r(x).

2. Formulas that are r-local around their variables for some value of r are
said to be local.

3. Boolean combinations of formulas that are local around the various
coordinates x_i of x are said to be basic local.
As mentioned earlier, there are two broad flavors of locality results in the
literature: those that follow from Hanf's theorem, and those that follow from
Gaifman's theorem. The first relates two different structures. [Han65] proved
his result for infinite structures. We provide below the locality result due to
[FSV95] that is suitable for finite models. To proceed, we need a definition.
Definition 5.6. Let A, B be σ-structures and let m ∈ N. Suppose that for every
isomorphism type τ of an r-neighborhood of a point, either

1. both A and B have the same number of elements of type τ, or

2. both A and B have more than m elements of type τ.

Then we say that A and B are threshold (r, m)-equivalent.
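Threshold equivalence is a condition on multisets of types only, as the following small sketch illustrates (our own code, not from the text; types are represented by arbitrary hashable labels):

```python
from collections import Counter

def threshold_equivalent(types_A, types_B, m):
    """Check threshold (r, m)-equivalence, given the multisets of local
    r-neighborhood isomorphism types of two structures A and B.
    For every type tau occurring in either structure, the two counts
    must either agree exactly or both exceed m."""
    ca, cb = Counter(types_A), Counter(types_B)
    for tau in set(ca) | set(cb):
        if ca[tau] != cb[tau] and not (ca[tau] > m and cb[tau] > m):
            return False
    return True
```

For example, two structures with 5 and 7 elements of type 'p' respectively (and one element each of type 'q') are threshold-equivalent for m = 3 but not for m = 6.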
Theorem 5.7 ([FSV95]). For each k, l > 0, there exist r, m > 0 such that if A and B
are threshold (r, m)-equivalent and every element has degree at most l, then they satisfy
the same first order formulae up to quantifier rank k, written A ≡_k B. Furthermore, r
depends only on k.
We refer the reader to [FSV95] for a discussion comparing the Fagin-Stockmeyer-Vardi
theorem with Hanf's theorem in the context of applications to finite model
theory. In particular, neither theorem seems to imply the other.
The Hanf locality lemma for formulae having a single free variable has a
simple form and is an easy consequence of Thm. 5.7.
Lemma 5.8. Notation as above. Let ϕ(x) be a formula of quantifier depth q. Then there
is a radius r and threshold t such that if A and B have the same multiset of local types
up to threshold t, and the elements a ∈ A and b ∈ B have the same local type up to
radius r, then

    A |= ϕ(a) ↔ B |= ϕ(b).
See [Lin05] for an application to computing simple monadic fixed points on
structures of bounded degree in linear time.
Next we come to Gaifman’s version of locality.
Theorem 5.9 ([Gai82]). Every FO formula ϕ(x) over a relational vocabulary is
equivalent to a Boolean combination of

1. local formulae around x, and

2. sentences of the form

    ∃x_1, . . . , x_s ( ⋀_{i=1}^{s} φ(x_i) ∧ ⋀_{1≤i<j≤s} d(x_i, x_j) > 2r ),

where the φ are r-local.
In words, for every first order formula, there is an r such that the truth of the
formula on a structure depends only on the number of elements having disjoint
r-neighborhoods that satisfy certain local formulas. This again expresses the
“bounded number of local properties” feature that limits first order logic.
The following normal form for first order logic was developed in an
attempt to merge some of the ideas from Hanf and Gaifman locality.

Theorem 5.10 ([SB99]). Every first-order sentence is logically equivalent to one of the
form

    ∃x_1 · · · ∃x_l ∀y ϕ(x, y),

where ϕ is local around y.
5.2 Simple Monadic LFP and Conditional Independence
In this section, we exploit the limitations described in the previous section to
build conceptual bridges from least fixed point logic to the Markov-Gibbs picture
of the preceding section. At first, this may seem to be an unlikely union. But
we will establish that there are fundamental conceptual relationships between
the directed Markovian picture and least fixed point computations. The key is
to see the constructions underlying least fixed point computations through the
lens of influence propagation and conditional independence. In this section,
we will demonstrate this relationship for the case of simple monadic least fixed
points, namely, FO(LFP) formulae without any nesting or simultaneous induction,
where the LFP relation being constructed is monadic. In later sections,
we show how to deal with complex fixed points as well.
We wish to build a view of fixed point computation as an information
propagation algorithm. In order to do so, let us examine the geometry of information
flow during an LFP computation. At stage zero of the fixed point computation,
none of the elements of the structure are in the relation being computed. At the
first stage, some subset of elements enters the relation. This changes the local
neighborhoods of these elements, and the vertices that lie in these local neigh-
borhoods change their local type. Due to the global changes in the multiset of
local types, more elements in the structure become eligible for inclusion into the
relation at the next stage. This process continues, and the changes “propagate”
through the structure. Thus, the fundamental vehicle of this information propagation
is that a fixed point computation ϕ(R, x) changes local neighborhoods of elements at
each stage of the computation.
This propagation

1. is directed, and

2. relies on a bounded number of local neighborhoods at each stage.
In other words, we observe that
The influence of an element during LFP computation propagates in a simi-
lar manner to the influence of a random variable in a directed Markov field.
This correspondence is important to us. Let us try to uncover the under-
lying principles that cause it. The directed property comes from the positivity
of the first order formula that is being iterated. This ensures that once an ele-
ment is inserted into the relation that is being computed, it is never removed.
Thus, influence flows in the direction of the stages of the LFP computation. Fur-
thermore, this influence flow is local in the following sense: the influence of an
element can propagate throughout the structure, but only through its influence
on various local neighborhoods.
This correspondence is most striking in the case of bounded degree struc-
tures. In that case, we have only O(1) local types.
Lemma 5.11. On a graph of bounded degree, there is a fixed number of non-isomorphic
neighborhoods with radius r. Consequently, there are only a fixed number of local r-
types.
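A brute-force illustration of Lemma 5.11 (our own sketch, not an efficient algorithm): on a cycle, where every vertex has degree 2, canonicalizing each r-neighborhood by trying all relabelings that fix the center shows that there is exactly one local 2-type, independent of the cycle length:

```python
from itertools import permutations

def r_ball(adj, v, r):
    """Vertices within distance r of v (breadth-first search)."""
    seen, frontier = {v}, {v}
    for _ in range(r):
        frontier = {w for u in frontier for w in adj[u]} - seen
        seen |= frontier
    return seen

def local_type(adj, v, r):
    """A canonical certificate of the r-neighborhood of v: the minimum,
    over relabelings that fix the center v, of the induced edge set.
    Brute force, so only sensible for the small balls that occur in
    bounded-degree graphs."""
    nodes = r_ball(adj, v, r)
    others = sorted(nodes - {v})
    edges = [(a, b) for a in nodes for b in adj[a] if b in nodes and a < b]
    best = None
    for perm in permutations(range(1, len(others) + 1)):
        label = dict(zip(others, perm))
        label[v] = 0
        cand = tuple(sorted(tuple(sorted((label[a], label[b])))
                            for a, b in edges))
        if best is None or cand < best:
            best = cand
    return best

# On a cycle every 2-neighborhood is an isomorphic path, so the number
# of local 2-types is 1 no matter how long the cycle is.
n = 12
cycle = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
num_types = len({local_type(cycle, v, 2) for v in range(n)})
```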
In order to determine whether an element in a structure satisfies a first order
formula we need (a) the multiset of local r-types in the structure (also known
as its global type) for some value of r, and (b) the local type of the element.
Furthermore, by the threshold version of Hanf's theorem (Theorem 5.7), we only
need to know the multiset of local types up to a certain threshold.
For large enough structures, we will cross the Hanf threshold for the multi-
set of r-types. At this point, we will be making a decision of whether an element
enters the relation based solely on its local r-type. This type potentially changes
with each stage of the LFP. At the time when this change renders the element
eligible for entering the relation, it will do so. Once it enters the relation, it
changes the local r-type of all those elements which lie within an r-neighborhood
of it, and such changes render them eligible, and so on. This is how the compu-
tation proceeds, in a purely stage-wise local manner. This is a Markov property:
the influence of an element upon another must factor entirely through the local
neighborhood of the latter.
In the more general case where degrees are not bounded, we still have fac-
toring through local neighborhoods, except that we have to consider all the lo-
cal neighborhoods in the structure. However, here the bounded nature of FO
comes in. The FO formula that is being iterated can only express a property
about some bounded number of such local neighborhoods. For example, in
the Gaifman form, there are s distinguished disjoint neighborhoods that must
satisfy some local condition.
Remark 5.12. The same concept can be expressed in the language of sufficient
statistics. Namely, knowing some information about certain local neighborhoods
renders the rest of the information about variable values that have entered
the relation in previous stages of the computation superfluous. In particular,
Gaifman’s theorem says that for first order properties, there exists a sufficient
statistic that is gathered locally at a bounded number of elements. Knowing this statis-
tic gives us conditional independence from the values of other elements that
have already entered the relation previously, but not from elements that will
enter the relation subsequently. This is similar to the directed Markov picture
where there is conditional independence of any variable from non-descendants
given the value of the parents.
[Figure: interacting variables X_1, X_2, . . . , X_{n−1}, X_n, highly constrained by one
another, with a bounded number of local statistics Φ_1, Φ_2, . . . , Φ_{s−1}, Φ_s gathered
at each stage. LFP assumes conditional independence after the statistics are
obtained; conditional independence and factorization hold over a larger directed
model called the ENSP (developed in Chapter 7).]

Figure 5.1: Range limited LFP computation process viewed as conditional
independencies.
At this point, we have exhibited a correspondence between two apparently
very different formalisms. This correspondence is illustrated in Fig. 5.1.
5.3 Conditional Independence in Complex Fixed Points
In the previous sections, we showed that the natural “factorization” of LFP into
first order logic, coupled with the bounded local property of first order logic can
be used to exhibit conditional independencies in the relation being computed.
But the argument we provided was for simple fixed points having one free
variable, namely, for monadic least fixed points. How can we show that this
picture is the same for complex fixed points? We accomplish this in stages.
1. First, we use the transitivity theorem for fixed point logic to move nested
fixed points into simultaneous fixed points without nesting.
2. Next, we use the simultaneous induction lemma for fixed point logic to
encode the relation to be computed as a “section” of a single LFP relation
of higher arity.
Steps 1 and 2 involve standard constructions in finite model theory, which we
recall in Appendix A.
At this point, we are now working with k-tuples, for a k fixed for all problem
sizes, instead of single elements. This will change the distance properties of the
resulting structure of k-tuples. Let us examine the case of a 2-ary relation that
is being computed. In this case, we have the following situation. Every pair
of elements occurs in the set of 2-tuples. This means that the neighborhood of
every pair is of size O(n), since for any element a of the structure, every other
element b, c, d, . . . occurs in a pair along with a.
This means that when there is a change to a 2-tuple containing a, that change
affects the neighborhoods of O(n) other 2-tuples. At this point, we see that we
are in the situation of O(n) range interactions. The key point to note is that we
still have only poly(log n) parametrization. This is because even though the inter-
actions are of O(n) range, the computation terminates in poly(n) steps, giving
us an economical parametrization of the state space. Put another way, though the
interactions are indeed between O(n) elements at a time, they are severely value
limited, leading once again to poly(log n) parametrization. Recall the discussion
of the two kinds of poly(log n) parameterizations (range limited and value lim-
ited) from Chapter 3. We will actually build a graphical model to give us the
parameterization in Chapter 8.
We also need to ensure that our original structure has a relation that allows
an order to be established on k-tuples. In particular, this does not pose a prob-
lem for encoding instances of k-SAT. The basic nature of information gathering
and processing in LFP does not change when the arity of the computation rises.
It merely adds the ability to gather polynomially more information at each stage
taken from O(n) variates at a time. But since the LFP terminates in polynomially
many steps, the number of joint values taken by the system of n variables
is only 2^{poly(log n)}. Although each element sees O(n) variates at each stage of the
LFP, it has the capability to utilize only a poly(log n) amount of that information
in the following precise sense. A “true” joint distribution over n variables takes
c^n, c > 1, different values. It requires, therefore, O(c^n) independent parameters
to specify. This happens because the behavior of one variable is dependent on all
n − 1 others simultaneously. In the case of joint distributions of n covariates which
take only 2^{poly(log n)} values, this cannot be the case since the resulting
distribution can be parameterized far too economically.
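The parameter-counting contrast can be made concrete with a toy computation (our own arithmetic sketch; the exponent (log2 n)^2 stands in for a generic poly(log n) bound):

```python
import math

def params_full_joint(n):
    # A general joint distribution over n binary variables needs 2^n - 1
    # independent parameters (probabilities summing to one).
    return 2 ** n - 1

def params_limited_support(n, c=2):
    # If the distribution is supported on at most 2^{(log2 n)^c} outcomes,
    # it needs at most that many parameters, minus one.
    return 2 ** int(math.log2(n) ** c) - 1

# e.g. n = 1024: the full joint needs 2^1024 - 1 parameters, while a
# distribution with limited support needs only about 2^100.
```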
Remark 5.13. We could work over a product structure where LFP captures the
class of polynomial time computable queries. In other words, we have to work
in a structure whose elements are k-tuples of our original structure. In this way,
a k-ary LFP over the original structure would be a monadic LFP over this struc-
ture. The O(n) nature of interactions remains, but again the parametrization is
only poly(log n).
Note that there are elegant ways to work with the space of equivalence
classes of k-tuples with equivalence under first order logic with k-variables.
For instance, one can consider a construction known as the canonical structure
due originally to [DLW95] who used it to provide a model theoretic proof of
the important theorem in [AV95] that P = PSPACE if and only if LFP = PFP.
Note that this is for all structures, not just for ordered structures.
The issue one faces is that there is a linear order on the canonical structure,
which renders the Gaifman graph trivial (totally connected). See [Lib04, §11.5]
for more details on canonical structures. The simple scheme described above
suffices for our purposes.
Remark 5.14. Though the Immerman-Vardi theorem is usually stated for ordered
structures, it holds for structures equipped with a successor relation (and no linear
ordering). See [LR03, §11.2, p. 204] where the result is stated for successor
structures. The benefit of equipping our structures only with a successor relation
is that the Gaifman graph remains non-trivial.
5.4 Aggregate Properties of LFP over Ensembles
We have shown that any polynomial time computation will update its relation
according to a certain Markov type property on the space of k-types of the un-
derlying structure, after extracting a statistic from the local neighborhoods of
the underlying structure. Thus far, there is no probabilistic picture, or a distri-
bution that we can analyze. We are only describing a fully deterministic com-
putation.
The distribution we seek will arise when we examine the aggregate behav-
ior of LFP over ensembles of structures that come from ensembles of constraint
satisfaction problems (CSPs) such as random k-SAT. When we examine the
properties in the aggregate of LFP running over ensembles, we will find the
following.
The “bounded number of local” property of each stage of monadic LFP
computation manifests as conditional independencies in the distribution,
making the distribution of solutions poly(log n)-parametrizable. Likewise,
value limited interactions in higher arity LFP computations also lead to
distributions of solutions that are poly(log n)-parametrizable.
This gives us the setting where we can exploit the full machinery of graphical
models of Chapter 2.
Before we examine the distributions arising from LFP acting on ensembles
of structures, we will bring in ideas from statistical physics into the proof. We
begin this in the next chapter.
6. The 1RSB Ansatz of Statistical Physics
6.1 Ensembles and Phase Transitions
The study of random ensembles of various constraint satisfaction problems
(CSPs) is over two decades old, dating back at least to [CF86]. While a given
CSP — say, 3-SAT — might be NP-complete, many instances of the CSP might
be quite easy to solve, even using fairly simple algorithms. Furthermore, such
“easy” instances lay in certain well defined regimes of the CSP, while “harder”
instances lay in clearly separated regimes. Thus, researchers were motivated to
study randomly generated ensembles of CSPs having certain parameters that
would specify which regime the instances of the ensemble belonged to. We will
see this behavior in some detail for the specific case of the ensemble known as
random k-SAT.
An instance of k-SAT is a propositional formula in conjunctive normal form

    Φ = C_1 ∧ C_2 ∧ · · · ∧ C_m

having m clauses C_i, each of which is a disjunction of k literals taken from n
variables {x_1, . . . , x_n}. The decision problem of whether a satisfying assignment
to the variables exists is NP-complete for k ≥ 3. The ensemble known as random
k-SAT consists of instances of k-SAT generated randomly as follows. An
instance is generated by drawing each of the m clauses {C_1, . . . , C_m} uniformly
from the 2^k (n choose k) possible clauses having k variables. The entire ensemble
of random k-SAT having m clauses over n variables will be denoted by SAT_k(n, m),
and a single instance of this ensemble will be denoted by Φ_k(n, m). The clause
density, denoted by α and defined as α := m/n, is the single most important
parameter that controls the geometry of the solution space of random k-SAT.
Thus, we will mostly be interested in the case where every formula in the
ensemble has clause density α. We will denote this ensemble by SAT_k(n, α), and
an individual formula in it by Φ_k(n, α).
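For concreteness, here is one way to sample from SAT_k(n, α) (a sketch under our own encoding, not from the text: a literal is a signed integer, and we take m = round(αn)):

```python
import random

def random_ksat(n, alpha, k=3, seed=None):
    """Draw a formula from SAT_k(n, alpha): m = round(alpha * n) clauses,
    each chosen uniformly among the 2^k * C(n, k) clauses on k distinct
    variables.  A literal is encoded as +i or -i for variable i."""
    rng = random.Random(seed)
    m = int(round(alpha * n))
    formula = []
    for _ in range(m):
        vars_ = rng.sample(range(1, n + 1), k)   # k distinct variables
        clause = tuple(v if rng.random() < 0.5 else -v for v in vars_)
        formula.append(clause)
    return formula
```

A call such as random_ksat(20, 4.0, seed=1) yields 80 clauses, each on 3 distinct variables, i.e. a formula at clause density α = 4.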
Random CSPs such as k-SAT have attracted the attention of physicists be-
cause they model disordered systems such as spin glasses where the Ising spin of
each particle is a binary variable (“up” or “down”) and must satisfy some
constraints that are expressed in terms of the spins of other particles. The energy of
such a system can then be measured by the number of unsatisfied clauses of a
certain k-SAT instance, where the clauses of the formula model the constraints
upon the spins. The case of zero energy then corresponds to a solution to the
k-SAT instance. The following formulation is due to [MZ97]. First we translate
the Boolean variables x_i to Ising variables S_i in the standard way, namely
S_i = −(−1)^{x_i}. Then we introduce new variables C_{li} as follows. The variable
C_{li} is equal to 1 if the clause C_l contains x_i, it is −1 if the clause contains the
negation ¬x_i, and it is zero if neither appears in the clause. In this way, the sum
Σ_{i=1}^{n} C_{li} S_i measures the satisfiability of clause C_l. Specifically, if
Σ_{i=1}^{n} C_{li} S_i + k > 0, the clause is satisfied by the Ising variables. The energy
of the system is then measured by the Hamiltonian

    H = Σ_{l=1}^{m} δ( Σ_{i=1}^{n} C_{li} S_i , −k ).
Here δ(i, j) is the Kronecker delta. Thus, satisfaction of the k-SAT instance
translates to vanishing of this Hamiltonian. Statistical mechanics then offers
techniques, such as the replica method, to analyze the macroscopic properties of
this ensemble.
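The translation and the Hamiltonian above can be checked mechanically. The sketch below (our own encoding of clauses as tuples of signed integers, not from the text) counts unsatisfied clauses exactly as H does:

```python
def hamiltonian(formula, assignment, k):
    """H = sum over clauses l of delta( sum_i C_li * S_i , -k ), i.e. the
    number of unsatisfied clauses.  `formula` is a list of clauses, each a
    tuple of nonzero ints (+i for x_i, -i for its negation); `assignment`
    maps variable index i to a Boolean (S_i = +1 for True, -1 for False)."""
    H = 0
    for clause in formula:
        # C_li * S_i is +1 exactly when the literal is true under the
        # assignment, and -1 when it is false.
        s = sum(1 if (lit > 0) == assignment[abs(lit)] else -1
                for lit in clause)
        H += 1 if s == -k else 0   # delta(s, -k): all k literals false
    return H
```

H vanishes precisely on satisfying assignments, matching the statement that satisfaction translates to the vanishing of the Hamiltonian.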
Also very interesting from the physicist's point of view is the presence of a
sharp phase transition [CKT91, MSL92] (see also [KS94]) between the satisfiable and
unsatisfiable regimes of random k-SAT. Namely, empirical evidence suggested
that the properties of this ensemble undergo a clearly defined transition when
the clause density is varied. This transition is conjectured to be as follows. For
each value of k, there exists a transition threshold α_c(k) such that with probability
approaching 1 as n → ∞ (called the thermodynamic limit by physicists),

• if α < α_c(k), an instance of random k-SAT is satisfiable. Hence this region
is called the SAT phase.

• if α > α_c(k), an instance of random k-SAT is unsatisfiable. This region is
known as the unSAT phase.
There has been intense research attention on determining the numerical value
of the threshold between the SAT and unSAT phases as a function of k. [Fri99]
provides a sharp but non-uniform construction (namely, the value α_c is a function
of the problem size, and is conjectured to converge as n → ∞). Upper bounds on
the threshold have been obtained using the first moment method [MA02], and
lower bounds using the second moment method [AP04]; the latter improve as k
gets larger.
6.2 The d1RSB Phase
More recently, another thread on this crossroad has originated once again from
statistical physics and is most germane to our perspective. This is the work in
the progression [MZ97], [BMW00], [MZ02], and [MPZ02] that studies the evolution
of the solution space of random k-SAT as the constraint density increases
towards the transition threshold. In these papers, physicists have conjectured
that there is a second threshold that divides the SAT phase into two — an “easy”
SAT phase, and a “hard” SAT phase. In both phases, there is a solution with
high probability, but while in the easy phase one giant connected cluster of
solutions contains almost all the solutions, in the hard phase this giant clus-
ter shatters into exponentially many communities that are far apart from each
other in terms of least Hamming distance between solutions that lie in distinct
communities. Furthermore, these communities shrink and recede maximally
far apart as the constraint density is increased towards the SAT-unSAT thresh-
old. As this threshold is crossed, they vanish altogether.
As the clause density is increased, a picture known as the “1RSB hypothesis”
emerges that is illustrated in Fig. 6.1, and described below.
RS For α < α_d, a problem has many solutions, but they all form one giant
cluster within which going from one solution to another involves flipping
only a finite (bounded) set of variables. This is the replica symmetric phase.

d1RSB At some value of α = α_d which is below α_c, it has been observed that
the space of solutions splits up into “communities” of solutions such that
solutions within a community are close to one another, but are far away
from the solutions in any other community. This effect is known as shattering
[ACO08]. Within a community, flipping a bounded finite number
of variable assignments on one satisfying assignment takes one to another
satisfying assignment. But to go from one satisfying assignment in one community
to a satisfying assignment in another, one has to flip a fraction of the set
of variables and therefore encounters what physicists would consider an
“energy barrier” between states. This is the dynamical one step replica
symmetry breaking phase.

unSAT Above the SAT-unSAT threshold, the formulas of random k-SAT are
unsatisfiable with high probability.
Using statistical physics methods, [KMRT+07] obtained another phase that
lies between d1RSB and unSAT. In this phase, known as 1RSB (one step replica
symmetry breaking), there is a “condensation” of the solution space into a
sub-exponential number of clusters, and the sizes of these clusters go to zero as the
transition occurs, after which there are no more solutions. This phase has not
been proven rigorously thus far to our knowledge, and we will not revisit it in
this work.
The 1RSB hypothesis has been proven rigorously for high values of k. Specif-
ically, the existence of the d1RSB phase has been proven rigorously for the case
of k > 8, starting with [MMZ05] (see also [DMMZ08]) who showed the exis-
tence of clusters in a certain region of the SAT phase using first moment meth-
ods. Later, [ART06] rigorously proved that there exist exponentially many clus-
ters in the d1RSB phase and showed that within any cluster, the fraction of
variables that take the same value in the entire cluster (the so-called frozen
variables) goes to one as the SAT-unSAT threshold is approached. Further, [ACO08]
obtained analytical expressions for the threshold at which the solution space of
random k-SAT (as well as two other CSPs — random graph coloring and random
hypergraph 2-colorability) shatters, and also confirmed the O(n) Hamming
separation between clusters.
Figure 6.1: The clustering of solutions just before the SAT-unSAT threshold.
Below α_d, the space of solutions is largely connected. Between α_d and α_c, the
solutions break up into exponentially many communities. Above α_c, there are
no more solutions, which is indicated by the unfilled circle.
In summary, in the region of constraint density α ∈ [α_d, α_c], the solution
space comprises exponentially many communities of solutions which require
a fraction of the variable assignments to be flipped in order to move between
each other.
6.2.1 Cores and Frozen Variables
In this section, we reproduce results about the distribution of variable assign-
ments within each cluster of the d1RSB phase from [MMW07], [ART06], and
[ACO08].
We first need the notion of the core of a cluster. Given any solution in a
cluster, one may obtain the core of the cluster by “peeling away” variable
assignments that, loosely speaking, occur only in clauses that are satisfied by other
variable assignments. This process leads to the core of the cluster.
To get a formal definition, first we define a partial assignment of a set of
variables (x_1, . . . , x_n) as an assignment of each variable to a value in {0, 1, ∗}.
The ∗ assignment is akin to a “joker state” which can take whichever value is
most useful in order to satisfy the k-SAT formula.
Next, we say that a variable in a partial assignment is free when each clause
it occurs in has at least one other variable that satisfies the clause, or has an
assignment of ∗.
Finally, to obtain the core of a cluster, we repeat the following starting with
any solution in the cluster: if a variable is free, assign it a ∗.
This process will eventually lead to a fixed point, and that is the core of the
cluster. We may easily see that the core is not dependent upon the choice of the
initial solution.
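The peeling procedure above can be sketched in code. This is our own illustrative implementation, not taken from the source: the clause representation (nonzero integers, +i for x_i and −i for its negation), the function names, and the use of "*" as the joker value are all our assumptions.

```python
# Sketch of the peeling procedure: literals are nonzero integers (+i for
# x_i, -i for its negation); an assignment maps each variable index to
# 0 or 1; "*" is the joker value.

STAR = "*"

def _satisfies(lit, value):
    # A 0/1 value satisfies +i iff value == 1, and -i iff value == 0.
    return value == (1 if lit > 0 else 0)

def _is_free(v, clauses, partial):
    # v is free if every clause it occurs in has at least one OTHER
    # variable that satisfies the clause or is assigned the joker "*".
    for clause in clauses:
        if v not in (abs(l) for l in clause):
            continue
        if not any(
            abs(l) != v
            and (partial[abs(l)] == STAR or _satisfies(l, partial[abs(l)]))
            for l in clause
        ):
            return False
    return True

def core(clauses, assignment):
    """Peel a satisfying assignment down to the core of its cluster."""
    partial = dict(assignment)
    changed = True
    while changed:  # repeat until the fixed point is reached
        changed = False
        for v in partial:
            if partial[v] != STAR and _is_free(v, clauses, partial):
                partial[v] = STAR
                changed = True
    return partial
```

On tiny formulas the core is typically the all-∗ assignment, matching the later remark that small-k ensembles tend to have trivial cores; a unit clause, by contrast, freezes its variable.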
What does the core of a cluster look like? Recall that the core is itself a
partial assignment, with each variable being assigned a 0, 1 or a ∗. Of obvious
interest are those variables that are assigned 0 or 1. These variables are said to be
frozen. Note that since the core can be arrived at starting from any choice of an
initial solution in the cluster, it follows that frozen variables take the same value
throughout the cluster. For example, if the variable x_i takes value 1 in the core
of a cluster, then every solution lying in the cluster has x_i assigned the value
1. The non-frozen variables are those that are assigned the value ∗ in the core.
These take both values 0 and 1 in the cluster. Clearly the number of ∗ variables
is a measure of the internal entropy (and therefore the size) of a cluster since it
is only these variables whose values vary within the cluster.
A priori, we have no way to tell that the core will not be the all ∗ partial
assignment. Namely, we do not know whether there are any frozen variables at
all. However, [ART06] proved that for high enough values of k, with probability
going to 1 in the thermodynamic limit, almost every variable in a core is frozen
as we increase the clause density towards the SAT-unSAT threshold.
Theorem 6.1 ([ART06]). For every r ∈ [0, 1/2] there is a constant k_r such that for all
k ≥ k_r, there exists a clause density α(k, r) < α_c such that for all α ∈ [α(k, r), α_c],
asymptotically almost surely
1. every cluster of solutions of Φ_k(n, αn) has at least (1 − r)n frozen variables,
2. fewer than rn variables take the value ∗.
This gives us the corollary.
Corollary 6.2 ([ART06]). For every k ≥ 9, there exists α < α_c(k) such that with high
probability, every cluster of the solution space of Φ_k(n, αn) has frozen variables.
Note that this picture is known to hold only for k ≥ 9 and is an open question
for k < 9. See also the remark at the end of this section.
We end this section with a physical picture of what forms a core. If a formula
Φ has a core with C clauses, then these clauses must have literals that come
from a set of at most C variables. By bounding the probability of this event,
[MMW07] obtained a lower bound on the size of cores. The bound is linear,
which means that when non-trivial cores do exist ( [ART06] proved their exis-
tence for k ≥ 9), they must involve a fraction of all the variables in the formula.
In other words, a core may be thought of as the onset of a large single interaction
of degree O(n) among the variables. Furthermore, this core is instantiated am-
ply in the solution space (by that we mean it takes exponentially many values
in those many clusters of the d1RSB phase). As the reader may imagine after
reading the previous chapters, this sort of interaction cannot be dealt with by
LFP algorithms. We will need more work to make this precise, but informally
cores are too large to pass through the bottlenecks that the stage-wise first order
LFP algorithms create.
This may also be interpreted as follows. Algorithms based on LFP can tackle
long range interactions between variables, but only when they can be factored
into interactions of degree poly(log n) or are value limited. But the appearance
of cores is equivalent to the onset of O(n) degree interactions which cannot be
further factored into poly(log n) degree interactions, and are ample. Such am-
ple irreducible O(n) interactions, caused by increasing the clause density suffi-
ciently, cannot be dealt with using an LFP algorithm.
We have already noted that this is because LFP algorithms factor through
first order computations, and in a first order computation, the decision of whether
an element is to enter the relation being computed is based on information col-
lected from local neighborhoods and combined in a bounded fashion. This bot-
tleneck is too small for a core to factor through in range limited LFP. The am-
pleness precludes value limited interactions also as we shall see. The precise
statement of this intuitive picture will be provided in the next chapter when we
build our conditional independence hierarchies.
The freezing of variables in cores is known to happen only for k ≥ 9
[ART06]. It remains open for k < 9. Indeed, for low values of k such
as k = 2, 3, there is empirical evidence that this phenomenon does not
take place [MMW05]; see also the discussion in [ART06, §1]. Hence, our
separation of complexity classes needs the regime of k ≥ 9.
6.2.2 Performance of Known Algorithms in the d1RSB Phase
We end this chapter with a brief overview of the performance of known algo-
rithms as a function of the clause density, and pointers to more detailed surveys.
Beginning with [CKT91] and [MSL92], there has been an understanding that
hard instances of random k-SAT tend to occur when the constraint density α
is near the transition threshold, and that this behavior was similar to phase tran-
sitions in spin glasses [KS94]. Now that we have surveyed the known results
about the geometry of the space of solutions in this region, we turn to the ques-
tion of how the two are related.
It has been empirically observed that the onset of the d1RSB transition seems
to coincide with the constraint density where traditional solvers tend to exhibit
exponential slowdown; see [ACO08] and [CO09]. See also [CO09] for the best
current algorithm along with a comparison of various other algorithms to it.
Thus, while both regimes in SAT have solutions with high probability, the ease
of finding a solution differs quite dramatically on traditional SAT solvers due to
a clustering of the solution space into numerous communities that are far apart
from each other in terms of Hamming distance. In particular, for clause
densities above O(2^k/k), no algorithms are known to produce solutions in
polynomial time with probability Ω(1), neither on the basis of rigorous or
empirical analysis nor on any other evidence [CO09]. Compare this to the
SAT-unSAT threshold, which is asymptotically 2^k ln 2. Thus, well below the
SAT-unSAT threshold,
in regimes where we know solutions exist, we are currently unable to find them
in polynomial time. Our work will explain that indeed, this is fundamentally
a limitation of polynomial time algorithms. Specifically, in such phases (for
k ≥ 9), the solution space geometry is not expressible as a mixture of range
or value limited poly(log n)-parametrizable pieces. This is because in the d1RSB
phase, the distribution of solutions is both irreducibly correlated at ranges O(n),
and ample, precluding both range and value limited parametrizations.
Please see [CO09] for the best known algorithm that does solve SAT in-
stances with non-vanishing probability for densities up to 2^k ω(k)/k for any
sequence ω(k) → ∞. See [ACO08] for proofs that the clause density where
all known polynomial time algorithms fail on NP-complete problems such as
k-SAT and graph coloring coincides with the onset of the d1RSB phase in these
problems. This clause density threshold for the onset of the d1RSB phase is
(2^k/k) ln k [ACO08]. The earlier [ART06] had established the existence of shatter-
ing and freezing of variables within cores for α = Θ(2^k).
The significance of the value of k. By the results of [ART06] and [ACO08, §2.1,
Rem. 2], we are guaranteed the presence of the full d1RSB phenomena only for k ≥ 9
and clause density above (2^k/k) ln k.
Hence, for our separation of complexity classes, we will work with ran-
dom k-SAT in the k ≥ 9 regime, and the clause density sufficiently high so
that we are in the d1RSB phase. We will require all known properties of the
d1RSB phase — namely, the exponentially many clusters, the freezing of vari-
ables within clusters, and the O(n) variable changes required to move from one
cluster to another. These properties are not known to hold except for k ≥ 9 and
clause densities above (2^k/k) ln k.
It should be noted that there is empirical evidence that the d1RSB phase is
not present in random 3-SAT in the following sense. The cores in the clusters
of random 3-SAT are trivial. By that we mean that they tend to be the all ∗ core,
unlike k ≥ 9 where [ART06] show the existence of nontrivial cores for almost
all clusters after the d1RSB threshold.
We should also point out that the experimental behavior of algorithms for
k-SAT is largely characterized for lower values of k = 2, 3, 4, where the full
d1RSB picture is not known to hold. For instance, the experimental behavior
of algorithms reported in [MRTS07] is on random 4-SAT. See also [KMRT+06],
where experiments are reported on 4-SAT. We are not aware of experimental
work done that shows the efficacy (even under mild requirements) of any algo-
rithm on k ≥ 9 after the onset of the d1RSB phase with nontrivial cores.
Incomplete algorithms are a class of algorithms that do not always find a
solution when one exists, nor do they indicate the lack of a solution except
to the extent that they were unable to find one. Incomplete algorithms are
obviously very important for hard regimes of constraint satisfaction problems,
since we do not have complete algorithms in these regimes that have economical
running times.
More recently, a breakthrough for incomplete algorithms in this field came with
[MPZ02] who used the cavity method from spin glass theory to construct an
algorithm named survey propagation that does very well on instances of random
k-SAT with constraint density above the aforementioned clustering threshold,
and continues to perform well very close to the threshold α_c for low values of
k. Survey propagation seems to scale as n log n in this region. The algorithm
uses the 1RSB hypothesis about the clustering of the solution space into numer-
ous communities. The original work reported in [MPZ02] was on 3-SAT. The
behavior of survey propagation for higher values of k is still being researched.
7. Random Graph Ensembles
We will use factor graphs as a convenient means to encode various properties
of the random k-SAT ensemble. In this section we introduce the factor graph
ensembles that represent random k-SAT. Our treatment of this section follows
[MM09, Chapter 9].
Definition 7.1. The random k-factor graph ensemble, denoted by G_k(n, m), consists
of graphs having n variable nodes and m function nodes, constructed as follows.
A graph in the ensemble is constructed by picking, for each of the m function
nodes in the graph, a k-tuple of variables uniformly from the (n choose k)
possibilities for such a k-tuple chosen from n variables.
Graphs constructed in this manner may have two function nodes connected
to the same k-tuple of variables. In this ensemble, function nodes all have de-
gree k, while the degree of the variable nodes is a random variable with expec-
tation km/n.
Definition 7.2. The random (k, α)-factor graph ensemble, denoted by G_k(n, α), con-
sists of graphs constructed as follows. For each of the (n choose k) k-tuples of
variables, a function node that connects to only these k variables is added to the
graph with probability αn/(n choose k).
In this ensemble, the number of function nodes is a random variable with
expectation αn, and the degree of variable nodes is a random variable with
expectation αk.
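Both ensembles are straightforward to sample. The sketch below is our own code (the function names are ours) and follows Definitions 7.1 and 7.2 directly; the exhaustive enumeration of k-tuples in the second sampler is only meant for small n.

```python
import math
import random
from itertools import combinations

def sample_Gk_nm(n, m, k, seed=None):
    # G_k(n, m): for each of the m function nodes, pick one of the
    # C(n, k) k-tuples of distinct variables uniformly (the same
    # k-tuple may recur across function nodes).
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(m)]

def sample_Gk_alpha(n, alpha, k, seed=None):
    # G_k(n, alpha): each of the C(n, k) k-tuples receives a function
    # node independently with probability alpha*n / C(n, k), so the
    # expected number of function nodes is alpha*n.
    rng = random.Random(seed)
    p = alpha * n / math.comb(n, k)
    return [t for t in combinations(range(n), k) if rng.random() < p]
```

Note that in G_k(n, m) the total variable-node degree is exactly km, so the mean degree is exactly km/n even though individual degrees fluctuate.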
We will be interested in the case of the thermodynamic limit of n, m → ∞
with the ratio α := m/n being held constant. In this case, both the ensembles
converge in the properties that are important to us, and both can be seen as the
underlying factor graph ensembles to our random k-SAT ensemble SAT_k(n, α)
(see Chapter 6 for definitions and our notation for random k-SAT ensembles).
With the definitions in place, we are ready to describe two properties of
random graph ensembles that are pertinent to our problem.
7.1 Properties of Factor Graph Ensembles
The first property provides us with intuition on why algorithms find it so hard
to put together local information to form a global perspective in CSPs.
7.1.1 Locally Tree-Like Property
We have seen in Chapter 5 that the propagation of influence of variables during
an LFP computation is stagewise-local. This is really the fundamental limitation
of LFP that we seek to exploit. In order to understand why this is a limitation,
we need to examine what local neighborhoods of the factor graphs underly-
ing NP-complete problems like k-SAT look like in hard phases such as d1RSB.
In such phases, there are many extensive (meaning O(n)) correlations between
variables that arise due to loops of sizes O(log n) and above.
However, remarkably, such graphs are locally trivial. By that we mean that
there are no cycles in an O(1)-sized neighborhood of any vertex as the size of the
graph goes to infinity [MM09, §9.5]. One may demonstrate this for the Erdős–
Rényi random graph as follows. Here, there are n vertices, and there is an edge
between any two with probability p = c/n, where c is a constant that parametrizes
the density of the graph. Edges are “drawn” uniformly and independently of
each other. Consider the probability of a certain graph (V, E) occurring as a
subgraph of the Erdős–Rényi graph. Such a graph can occur in (n choose |V|)
positions.
At each position, the probability of the graph structure occurring is

p^|E| (1 − p)^((|V| choose 2) − |E|).
Applying Stirling’s approximation, we see that such a graph occurs asymptot-
ically Θ(n^(|V|−|E|)) times. If the graph is connected, then |V| ≤ |E| + 1, with
equality only for trees. Thus, in the limit of n → ∞, finite connected graphs
containing cycles have vanishing probability of occurring in finite neighborhoods
of any element.
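The vanishing of finite cycles in bounded neighborhoods is easy to observe empirically. The sketch below is our own illustration (the function names and the choice c = 2.0 in the usage are ours): it counts the fraction of vertices of an Erdős–Rényi graph G(n, c/n) that lie on a triangle, the smallest cycle visible in an O(1) neighborhood; this fraction tends to zero as n grows.

```python
import random
from itertools import combinations

def er_graph(n, c, seed=None):
    # Erdos-Renyi graph G(n, c/n) as an adjacency-set list.
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]
    p = c / n
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def fraction_on_triangle(adj):
    # Fraction of vertices lying on at least one triangle.
    on_tri = set()
    for v in range(len(adj)):
        for a, b in combinations(sorted(adj[v]), 2):
            if b in adj[a]:
                on_tri.update((v, a, b))
    return len(on_tri) / len(adj)
```

For fixed c the expected number of triangles stays O(1) as n → ∞, so the fraction of vertices touching one vanishes, in line with the calculation above.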
In short, if only local neighborhoods are examined, the two ensembles G_k(n, m)
and T_k(n, m) are indistinguishable from each other.
Theorem 7.3. Let G be a randomly chosen graph in the ensemble G_k(n, m), and let i
be a uniformly chosen node in G. Then the r-neighborhood of i in G converges in
distribution to T_k(n, m) as n → ∞.
Let us see what this means in terms of the information such graphs divulge
locally. The simplest local property is degrees of elements. These are, of course,
available through local inspection. The next would be small connected sub-
graphs (triangles, for instance). But even this next step is not available. In
other words, such random graphs do not provide any of their global proper-
ties through local inspection at each element.
Let us think about what this implies. We know from the onset of cores and
frozen variables in the d1RSB phase of k-SAT that there are strong correlations
between blocks of variables of size O(n) in that phase. However, these loops
are invisible when we inspect local neighborhoods of a fixed finite size, as the
problem size grows.
7.1.2 Degree Profiles in Random Graphs
The degree of a variable node in the ensemble G_k(n, m) is a random variable.
We wish to understand the distribution of this random variable. The expected
value of the fraction of variables in G_k(n, m) having degree d is the same as the
probability that a single variable node has degree d, both being equal to

P(deg v_i = d) = (m choose d) p^d (1 − p)^(m−d), where p = k/n.

In the large graph limit we get

lim_{n→∞} P(deg v_i = d) = e^(−kα) (kα)^d / d! .
In other words, the degree is asymptotically a Poisson random variable.
A corollary is that the maximum degree of a variable node is almost surely
less than O(log n) in the large graph case.
Lemma 7.4. The maximum variable node degree in G_k(n, m) is asymptotically almost
surely O(log n). In particular, it asymptotically almost surely satisfies

d_max/(kαe) = (z / log(z/ log z)) [1 + Θ(log log z / (log z)^2)], (7.1)

where z = (log n)/(kαe).
Proof. See [MM09, p. 184] for a discussion of this upper bound, as well as a
lower bound.
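The Poisson degree profile is easy to check numerically. The sketch below is our own code (function names ours): it samples the degrees of the variable nodes of G_k(n, m). The total degree is exactly km, so the mean is exactly kα = km/n, while the maximum degree stays far below n, consistent with the O(log n) bound of Lemma 7.4.

```python
import random
from math import exp, factorial

def degree_profile(n, m, k, seed=None):
    # Degrees of the n variable nodes in one sample of G_k(n, m).
    rng = random.Random(seed)
    deg = [0] * n
    for _ in range(m):
        for v in rng.sample(range(n), k):
            deg[v] += 1
    return deg

def poisson_pmf(lam, d):
    # Limiting probability that a variable node has degree d (lam = k*alpha).
    return exp(-lam) * lam**d / factorial(d)
```

For instance, with n = 2000, m = 4000, k = 3 the limiting degree law is Poisson with mean kα = 6.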
8. Separation of Complexity Classes
We have built a framework that connects ideas from graphical models, logic,
statistical mechanics, and random graphs. We are now ready to begin our final
constructions that will yield the separation of complexity classes.
We have described the fundamental similarity between range limited and
value limited distributions in Chapter 3. Both are hampered by the same un-
derlying property: in spite of being distributions on n covariates, they can be
specified with only 2^poly(log n) parameters. In our terminology, they are
both poly(log n)-parametrizable. Informally, this means that their joint distribu-
tion behaves like the joint distribution of only poly(log n) covariates instead of
n covariates.
In light of the above, we first consider the case of range limited poly(log n)-
parametrizations. We return to value limited poly(log n)-parametrizations just
before the final separation of complexity classes in Section 8.5.
8.1 Measuring Conditional Independence in Range Limited Models
Our central concern with respect to range limited models is to understand which
variable interactions in a system are irreducible — namely, those that cannot be
expressed in terms of interactions between smaller sets of variables with con-
ditional independencies between them. Such irreducible interactions can be
2-interactions (between pairs), 3-interactions (between triples), and so on, up to
n-interactions between all n variables simultaneously.
A joint distribution encodes the interaction of a system of n variables. What
would happen if all the direct interactions between variables in the system were
of less than a certain finite range k, with k < n? In such a case, the “jointness”
of the covariates really would lie at a lower “level” than n. We would like
to measure the “level” of conditional independence in a system of interacting
variables by inspecting their joint distribution. At level zero of this “hierarchy”,
the covariates should be independent of each other. At level n, they are coupled
together n at a time, without the possibility of being decoupled. In this way,
we can make statements about how deeply entrenched the conditional inde-
pendence between the covariates is, or dually, about how large the set of direct
interactions between variables is.
Remark 8.1. Similarly, if the variables did interact n at a time, but took only
2^poly(log n) joint values, the “jointness” of the distribution would lie at a lower
level than n. In both cases above, as stated in Chapter 3, the n covariates do not
display the behavior of a typical joint distribution of n variables. Instead, they
behave in ways similar to a set of poly(log n) covariates.
When the largest irreducible interactions are k-interactions, the distribution
can be parametrized with n2^k independent parameters. Thus, in families of
distributions where the irreducible interactions are of fixed size, the independent
parameter space grows polynomially with n, whereas in a general distribution
without any conditional independencies, it grows exponentially with n. The
case of monadic LFP lies in between — the interactions are not of fixed size, but
they grow relatively slowly. The case of complex LFP is also one of poly(log n)-
parametrization, except it is a value-limited O(n) interaction model.
There are some technical issues with constructing such a hierarchy to mea-
sure conditional independence. The first issue would be how to measure the
level of a distribution in this hierarchy. If, for instance, the distribution has a
directed P-map, then we could measure the size of the largest clique that ap-
pears in its moralized graph. However, as noted in Sec. 2.5, not all distributions
have such maps. We may, of course, upper and lower bound the level using
minimal I-maps and maximal D-maps for the distribution. In the case of or-
dered graphs, we should note that there may be different minimal I-maps for
the same distribution for different orderings of the variables. See [KF09, p. 80]
for an example.
The insight that allows us to resolve the issue is as follows. If we could
somehow embed the distribution of solutions generated by LFP into a larger dis-
tribution, such that
1. the larger distribution factorized recursively according to some directed
graphical model, and
2. the larger distribution had only polynomially many more variates than
the original one,
then we would have obtained a parametrization of our distribution that would
reflect the factorization of the larger distribution, and would cost us only poly-
nomially more, which does not affect us.
By pursuing the above course, we aim to demonstrate that distributions of
solutions generated by LFP lie at a lower level of conditional independence than
distributions that occur in the d1RSB phase of random k-SAT. Consequently,
they have more economical parametrizations than the space of solutions in the
d1RSB phase does.
We will return to the task of constructing such an embedding in Sec. 8.3.
First we describe how we use LFP to create a distribution of solutions.
8.2 Generating Distributions from LFP
Below, for monadic LFP, we will describe the method of generating distributions
and exhibiting economical parametrizations by embedding the covariates into a
larger directed graphical model. We will then indicate the differences for complex
LFP.
8.2.1 Encoding k-SAT into Structures
In order to use the framework from Chapters 4 and 5, we will encode k-SAT
formulae as structures over a fixed vocabulary.
Our vocabularies are relational, and so we need only specify the set of rela-
tions, and the set of constants. We will use three relations.
1. The first relation R_C will encode the clauses that a SAT formula comprises.
Since we are studying ensembles of random k-SAT, this relation will have
arity k.
2. We need a relation in order to make FO(LFP) capture polynomial time
queries on the class of k-SAT structures. We will not introduce a linear
ordering, since that would make the Gaifman graph a clique. Rather, we
will include a relation such that FO(LFP) can capture all the polynomial
time queries on the structure. This will be a binary relation R_E.
3. Lastly, we need a relation R_P to hold “partial assignments” to the SAT
formulae. We will describe these in Sec. 8.2.3.
4. We do not require constants.
This describes our vocabulary

σ = {R_C, R_E, R_P}.
Next, we come to the universe. A SAT formula is defined over n variables,
but they can appear either in positive or negative form. Thus, our universe will
have 2n elements corresponding to the literals x_1, . . . , x_n, x̄_1, . . . , x̄_n. In order
to avoid new notation, we will simply use the same notation to indicate the
corresponding element in the universe. We denote by lower case x_i the literals
of the formula, while the corresponding upper case X_i denotes the variable in
a model.
Finally, we need to interpret our relations in our universe. We dispense with
the superscripts since the underlying structure is clear. The relation R_C will
consist of k-tuples from the universe, interpreted as clauses consisting of dis-
junctions between the variables in the tuple. The relation R_E will be interpreted
as an “edge” between successive variables. The relation R_P will be a partial
assignment of values to the underlying variables.
Now we encode our k-SAT formulae into σ-structures in the natural way.
For example, for k = 3, the clause x_1 ∨ x_2 ∨ x_3 in the SAT formula will be
encoded by inserting the tuple (x_1, x_2, x_3) in the relation R_C. Similarly, the
pairs (x_i, x_{i+1}) and (x̄_i, x̄_{i+1}), both for 1 ≤ i < n, as well as the pair (x_n, x̄_1),
will be in the relation R_E. This chains together the elements of the structure.
The reason for the relation R_E that creates the chain is that on such struc-
tures, polynomial time queries are captured by FO(LFP) [EF06, §11.2]. This is
a technicality. Recall that an order on the structure enables the LFP computa-
tion (or the Turing machine that runs this computation) to represent tuples in
a lexicographical ordering. In our problem of k-SAT, it plays no further role.
Specifically, the assignments to the variables that are computed by the LFP have
nothing to do with their order. They depend only on the relation R_C, which en-
codes the clauses, and the relation R_P, which holds the initial partial assignment
that we are going to ask the LFP to extend. In other words, each stage of the
LFP is order-invariant. It is known that the class of order-invariant queries is also
Gaifman local [GS00]. However, to allow LFP to capture polynomial time on the
class of encodings, we need to give the LFP something it can use to create an
ordering. We could encode our structures with a linear order, but that would
make the Gaifman graph fully connected. What we want is something weaker
that still suffices. Thus, we encode our structures as successor-type structures
through the relation R_E. This seems most natural, since it imparts on the struc-
ture an ordering based on that of the variables. Note also that SAT problems
may be represented as matrices (rows for clauses, columns for variables
that appear in them), which have a well defined notion of order on them.
Ensembles of k-SAT. Let us now create ensembles of σ-structures using the
encoding described above. We will start with the ensemble SAT_k(n, α) and
encode each k-SAT instance as a σ-structure. The resulting ensemble will be
denoted by S_k(n, α). The encoding of the problem Φ_k(n, α) as a σ-structure will
be denoted by P_k(n, α).
8.2.2 The LFP Neighborhood System
In this section, we wish to describe the neighborhood system that underlies the
monadic LFP computations on structures of S
k
(n, α). We begin with the factor
graph, and build the neighborhood system through the Gaifman graph.
Let us recall the factor graph ensemble G_k(n, m). Each graph in this ensem-
ble encodes an instance of random k-SAT. We encode the k-SAT instance as
a structure as described in the previous section. Next, we build the Gaifman
graph of each such structure. The set of vertices of the Gaifman graph is sim-
ply the set of variable nodes in the factor graph and their negations, since we
are using both variables and their negations for convenience (this is simply an
implementation detail). For instance, the Gaifman graph for the factor graph of
Fig. 2.2 will have 12 vertices. Two vertices are joined by an edge in the Gaifman
graph either when the two corresponding variable nodes were joined to a single
function node (i.e., appeared in a single clause) of the factor graph, or when they
are adjacent to each other in the chain that the relation R_E has created on the
structure.
On this Gaifman graph, the simple monadic LFP computation induces a
neighborhood system described as follows. The sites of the neighborhood sys-
tem are the variable nodes. The neighborhood A_s of a site s is the set of all nodes
that lie in the r-neighborhood of the site, where r is the locality rank of the first
order formula ϕ whose fixed point is being constructed by the LFP computation.
Finally, we make the neighborhood system into a graph in the standard way.
Namely, the vertices of the graph will be the set of sites. Each site s will be con-
nected by an edge to every other site in A_s. This graph will be called the interac-
tion graph of the LFP computation. The ensemble of such graphs, parametrized
by the clause density α, will be denoted by I_k(n, α).
Note that this interaction graph has many more edges in general than the
Gaifman graph. In particular, every node that was within the locality rank
neighborhood of the Gaifman graph is now connected to it by a single edge.
The resulting graph is, therefore, far more dense than the Gaifman graph.
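The passage from the Gaifman graph to the interaction graph can be sketched directly: connect every site to all nodes within distance r. The code below is our own illustration (adjacency-list representation and function name are ours), not part of the formal development.

```python
from collections import deque

def interaction_graph(gaifman_adj, r):
    # For each site s, connect s to every node within distance r of s
    # in the Gaifman graph (r = locality rank), via breadth-first search.
    n = len(gaifman_adj)
    inter = [set() for _ in range(n)]
    for s in range(n):
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] == r:
                continue  # do not expand past radius r
            for w in gaifman_adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        inter[s] = set(dist) - {s}
    return inter
```

Even on a path this shows the densification: with r = 2, an interior node of a path becomes adjacent to everything within two hops, not just its immediate neighbors.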
What is the size of cliques in this interaction graph? This is not the same as
the size of cliques in the factor graph, or the Gaifman graph, because the density
of the graph is higher. The size of the largest clique is a random variable. What
we want is an asymptotically almost sure (by this we mean with probability
tending to 1 in the thermodynamic limit) upper bound on the size of the cliques
in the distribution of the ensemble I_k(n, α).
Note: From here on, all the statements we make about ensembles should be under-
stood to hold asymptotically almost surely in the respective random ensembles. By that
we mean that they hold with probability tending to 1 as n → ∞.
Lemma 8.2. The size of cliques that appear in graphs of the ensemble I_k(n, α) is upper
bounded by poly(log n) asymptotically almost surely.
Proof. Let d_max be as in (7.1), and let r be the locality rank of ϕ. The maximum
degree of a node in the Gaifman graph is asymptotically almost surely upper
bounded by d_max = O(log n). The locality rank is a fixed number (roughly equal
to 3^d, where d is the quantifier depth of the first order formula that is being
iterated). The node under consideration could have at most d_max others adjacent
to it, and the same for those, and so on. This gives us a coarse d_max^r upper bound
on the size of cliques.
Remark 8.3. While this bound is coarse, there is not much point trying to tighten
it, because any constant power factor (r in the case above) can always be in-
troduced by computing an r-ary LFP relation. This bound will be sufficient for
us.
Remark 8.4. High degree nodes in the Gaifman graph become significant features in the interaction graph, since they connect a large number of other nodes to each other and therefore allow the LFP computation to access a lot of information through a neighborhood system of given radius. It is these high degree nodes that hinder factorization of the joint distribution, since they represent direct interaction of a large number of variables with each other. Note that although the radii of the neighborhoods are O(1), the number of nodes in them is not O(1), owing to the Poisson distribution of the variable node degrees and the existence of high degree nodes.
Remark 8.5. The relation being constructed is monadic, and so it does not introduce new edges into the Gaifman graph at each stage of the LFP computation. When we compute a k-ary LFP, we can encode it into a monadic LFP over a polynomially (n^k) larger product space, as is done in the canonical structure, for instance, but with the linear order replaced by a weaker successor-type relation. Therefore, we can always choose to deal with monadic LFP. This is really a restatement of the transitivity principle for inductive definitions, which says that if one can write an inductive definition in terms of other inductively defined relations over a structure, then one can write it directly in terms of the original relations of the structure [Mos74, p. 16].
8.2.3 Generating Distributions
The standard scenario in finite model theory is to ask a query about a structure
and obtain a Yes/No answer. For example, given a graph structure, we may ask
the query “Is the graph connected?” and get an answer.
But what we want are distributions of solutions computed by a purported LFP algorithm for k-SAT; this is not the standard setting in finite model theory. Intuitively, we want to generate solutions lying in exponentially many clusters of the solution space of SAT in the d1RSB phase. How do we do this? To generate these distributions, we will start with partial assignments to the set of variables of the formula, and ask whether such a partial assignment can be extended to a satisfying assignment. We need the following definition.
Definition 8.6. A global relation associated to a decision problem on a class K is a relation R of a fixed arity k over A associated to each structure A ∈ K.

The following is a restatement of the Immerman-Vardi theorem, phrased in terms of computability of global relations. See [LR03, §11.2, p. 206] for a proof.

Theorem 8.7. A global relation R on a class of successor structures is computable in polynomial time if and only if R is inductive.
We wish to see that the global relation that associates to each structure a complete assignment coinciding with the partial assignment placed in the relation R_P is inductive. By the theorem above, this is equivalent to showing that it is computable in polynomial time. To see this, recall that NP-complete decision problems have a property called self-reducibility, which allows us to query a decision procedure for them a polynomial number of times and thereby build a solution to the search version of the problem. If P = NP, then all decision problems in NP have polynomial time solutions, and one can use self-reducibility to see that the search version will also be polynomial time solvable; namely, a solution will be constructible in polynomial time. Next we define our search problem so that a solution to it is a global relation: an instance of the problem is a structure with a partial assignment, and the question is whether the partial assignment can be extended to a complete assignment. The complete assignment can be represented by a global unary relation that stores all the literals assigned +1, and which must agree with the partial assignment on their overlap. This decision problem is clearly in NP; therefore, if P = NP, it would have a polynomial time search solution, making R computable in polynomial time. The theorem above then says that R must be inductive.
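The self-reducibility argument above can be made concrete with a sketch. This is purely illustrative: the brute-force decider below stands in for the hypothetical polynomial time decision procedure, and all names are ours.

```python
from itertools import product

def sat_decider(clauses, n_vars, fixed):
    """Stand-in for a hypothetical polynomial-time SAT decider: returns True
    iff the partial assignment `fixed` (dict var -> bool) extends to a
    satisfying assignment.  (Brute force here, purely for illustration.)"""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, bits))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return True
    return False

def search_via_self_reduction(clauses, n_vars, partial=None):
    """Build a full satisfying assignment with polynomially many calls to the
    decider, exactly the self-reducibility argument in the text."""
    fixed = dict(partial or {})
    if not sat_decider(clauses, n_vars, fixed):
        return None  # the partial assignment cannot be extended
    for v in range(1, n_vars + 1):
        if v in fixed:
            continue
        fixed[v] = True
        if not sat_decider(clauses, n_vars, fixed):
            fixed[v] = False  # True failed, so False must still be extendable
    return fixed
```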
Since we want to generate exponentially many such solutions, we will partially assign O(n) of the variables (a small fraction), and ask the LFP to extend this assignment, whenever possible, to a satisfying assignment of all the variables. Thus, we now see what the relation R_P in our vocabulary stands for: it holds the partial assignment to the variables. For example, if we want to ask whether the partial assignment x_1 = 1, x_2 = 0, x_3 = 1 can be extended to a satisfying assignment of the SAT formula, we store this partial assignment as the tuple (x_1, x_2, x_3) in the relation R_P in our structure.
As mentioned earlier, the output satisfying assignment will be computed as a unary relation which holds all the literals that are assigned the value 1. This means that x_i is in the relation if x_i has been assigned the value 1 by the LFP, and otherwise the complementary literal ¬x_i is in the relation, meaning that x_i has been assigned the value 0 by the LFP computation. This is the simplest case, where the FO(LFP) formula is simple monadic. For more complex formulas, the output will be some section of a relation of higher arity (please see Appendix A for details), and we will view it as monadic over a polynomially larger structure.
Now we “initialize” our structure with different partial assignments and ask the LFP to compute complete assignments where they exist. If a partial assignment cannot be extended, we simply abort that particular attempt and carry on with other partial assignments until we have generated enough solutions; by “enough” we mean a number rising exponentially with the underlying problem size. In this way we obtain an exponentially numerous distribution of solutions, which we now analyze and compare to the one that arises in the d1RSB phase of random k-SAT.
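The generation procedure just described can be sketched as follows. The brute-force `extend` below is only a stand-in for the purported polynomial time LFP extension algorithm, and the sampling parameters are illustrative assumptions of ours.

```python
import random
from itertools import product

def extend(clauses, n_vars, fixed):
    """Stand-in for the purported LFP extension step: return a full satisfying
    assignment agreeing with `fixed`, or None (brute force, for illustration)."""
    free = [v for v in range(1, n_vars + 1) if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        a = dict(fixed)
        a.update(zip(free, bits))
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            return a
    return None

def sample_solution_distribution(clauses, n_vars, runs, frac=0.25, seed=0):
    """Seed the structure with many random partial assignments (the relation
    R_P), keep the runs that extend, and collect the resulting solutions."""
    rng = random.Random(seed)
    solutions = []
    for _ in range(runs):
        vs = rng.sample(range(1, n_vars + 1), max(1, int(frac * n_vars)))
        fixed = {v: rng.random() < 0.5 for v in vs}
        sol = extend(clauses, n_vars, fixed)
        if sol is not None:  # abort unsuccessful attempts
            solutions.append(tuple(sol[v] for v in range(1, n_vars + 1)))
    return solutions
```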
8.3 Disentangling the Interactions: The ENSP Model
Now that we have a distribution of solutions computed by LFP, we would like
to examine its conditional independence characteristics. Does it factor through
any particular graphical model, for instance?
In Chapter 2, we considered various graphical models and their conditional
independence characteristics. Once again, our situation is not exactly like any
of these models. We will have to build our own, based on the principles we
have learnt. Let us first note two issues.
The first issue is that the graphical models considered in the literature are mostly static. By this we mean that
1. they are of fixed size, over a fixed set of variables, and
2. the relations between the variables encoded in the models are fixed.
In short, they model fixed interactions between a fixed set of variables. Since we wish to apply them in the setting of complexity theory, we are interested in families of such models, with a focus on how their structure changes with the problem size.
The second issue that faces us now is as follows. Even within a certain size n, we do not have a single fixed graph on n vertices that models all our interactions. The way an LFP computation proceeds through the structure will, in general, vary with the initial partial assignment. We would expect a different “trajectory” of the LFP computation for different clusters in the d1RSB phase. So, if one initial partial assignment lands us in cluster X, and another in cluster Y, the way the LFP goes about assigning values to the unassigned variables will, in general, differ. Even within a cluster, the trajectories of two different initial partial assignments will not be the same, although we would expect them to be similar. How do we deal with this situation?
In order to model this dynamic behavior, let us build some intuition first.
1. We know that there is a “directedness” to LFP, in that elements that are assigned values at a certain stage of the computation then go on to influence other elements that are as yet unassigned. Thus, there is a directed flow of influence as the LFP computation progresses. This is, for example, different from a Markov random field, whose distribution has no such direction.
2. There are two types of flows of information in an LFP computation. Consider simple monadic LFP. In the first type of flow, neighborhoods across the structure influence the value an unassigned node will take. In the second type of flow, once an element is assigned a value, it changes the neighborhoods (or, more precisely, the local types of various other elements) in its vicinity. Note that while the first type of flow happens during a stage of the LFP, the second type is implicit: there is no separate stage of the LFP where it happens. It happens implicitly once any element enters the relation being computed.
3. Because the flow of information is as described above, we will not be able to express it using a simple DAG on either the set of vertices or the set of neighborhoods. Thus, we have to consider building a graphical model on certain larger product spaces.
4. The stage-wise nature of LFP is central to our analysis, and the various
stages cannot be bundled into one without losing crucial information.
Thus, we do need a model which captures each stage separately.
5. In order to exploit the factorization properties of directed graphical models,
and the resulting parametrization by potentials, we would like to avoid
any closed directed paths.
Let us now incorporate this intuition into a model, which we will call the Element-Neighborhood-Stage Product model, or ENSP model for short. This model appears to be of independent interest. We now describe the ENSP model for a simple monadic least fixed point computation. The model is illustrated in Fig. 8.1. It has two types of vertices.

Element Vertices These vertices, which encode the variables of the k-SAT instance, are represented by the smaller circles in Fig. 8.1. They therefore correspond to elements in the structure (recall that elements of the structure represent the literals of the k-SAT formula). However, each variable of our original system X_1, . . . , X_n is represented by a different vertex at each stage of the computation. Thus, each variable of the original system gives rise to 2|ϕ^A| vertices in the ENSP model. Also recall that there are 2n elements in the k-SAT structure, where n is the number of variables in the SAT formula. However, in Fig. 8.1 we have shown only one vertex per variable, and allowed it to be colored one of two colors: green, indicating that the variable has been assigned the value +1, and red, indicating that the variable has been assigned the value −1. Since the underlying formula ϕ that is being iterated is positive, elements do not change their color once they have been assigned.
Neighborhood Vertices These vertices, denoted by the larger circles with blue shading in Fig. 8.1, represent the r-neighborhoods of the elements in the structure. Just like variables, each neighborhood is also represented by a different vertex at each stage of the LFP computation. Their possible values are the possible isomorphism types of the r-neighborhoods, namely, the local r-types of the corresponding elements. These vertices may be thought of as vectors of size poly(log n) corresponding to the cliques that occur in the neighborhood system we described in Sec. 8.2.2, or one may think of each as a single variable taking as its value one of the various local r-types.

[Figure 8.1: The Element-Neighborhood-Stage Product (ENSP) model for LFP_ϕ. See text for description.]
Now we describe the stages of the ENSP. There are 2|ϕ^A| stages, starting from the leftmost and terminating at the rightmost. Each stage of the LFP computation is represented by two stages in the ENSP. Initially, at the start of the LFP computation, we are in the leftmost stage. Here, notice that some variable vertices are colored green, and some red. In the figure, X_{4,1} is green and X_{i,1} is red. This indicates that the initial partial assignment that we provided to the LFP had variable X_4 assigned +1 and variable X_i assigned −1. In this way, a small fraction, O(n), of the variables are assigned values. The LFP is asked to extend this partial assignment to a complete satisfying assignment of all the variables (if one exists, and to abort if not).
Let us now look at the transition to the second stage of the ENSP. At this stage, based on the conditions expressed by the formula ϕ in terms of their own local neighborhoods, and on the existence of a bounded number of other local neighborhoods in the structure, some elements enter the relation. This means they get assigned +1 or −1. In the figure, the variable X_{3,2} takes the color green based on information gathered from its own neighborhood N(X_{3,1}) and two other neighborhoods, N(X_{2,1}) and N(X_{n−1,1}). This indicates that at the first stage, the LFP assigned the value +1 to the variable X_3. Similarly, it assigns the value −1 to the variable X_n (remember that the first two stages of the ENSP correspond to the first stage of the LFP computation). The vertices that do not change state simply transmit their existing state to the corresponding vertices in the next stage by a horizontal arrow, which we do not show in the figure in order to avoid clutter.
Once some variables have been assigned values in the first stage, their neighborhoods, and the neighborhoods in their vicinity (meaning the neighborhoods of other elements that lie in their vicinity), change. This is indicated by the dotted arrows between the second and third stages of the ENSP. Note that this happens implicitly during the LFP computation. That is why we have represented each stage of the actual LFP computation by two stages in the ENSP. The first stage is the explicit stage, where variables get assigned values. The second stage is the implicit stage, where variables “update” their own neighborhoods and those in their vicinity. For example, once X_3 has been assigned the value +1, it updates its own neighborhood and also the neighborhood of the variable X_2, which lies in its vicinity (in this example). In this way, influence propagates through the structure during an LFP computation. There are two stages of the ENSP for each stage of the LFP; thus, there are 2|ϕ^A| stages of the ENSP in all.
By the end of the computation, all variables have been assigned values, and we have a satisfying assignment. The variables at the last stage, X_{i,2|ϕ^A|}, are just the original X_i. Thus, we recover our original variables (X_1, . . . , X_n) by looking only at the last (rightmost in the figure) level of the ENSP.
By introducing extra variables to represent each stage of each variable and
each neighborhood in the SAT formula, we have accomplished our original
aim. We have embedded our original set of variates into a polynomially larger
product space, and obtained a directed graphical model on this larger space.
This product space has a nice factorization due to the directed graph structure.
This is what we will exploit.
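A toy unrolling in the spirit of the ENSP can make the construction concrete. The assignment rule below is an arbitrary stand-in of ours for the LFP formula ϕ (chosen only to exercise the explicit/implicit alternation), and all names are illustrative.

```python
def unroll_ensp(n, initial, stages):
    """Toy unrolling in the spirit of the ENSP: one element vertex X[t][i]
    and one neighborhood vertex N[t][i] per variable per stage.  The rule
    (a stand-in for the LFP formula) assigns a variable the value +1 as soon
    as its left neighbour on a cycle is assigned.  Even t is an explicit
    stage (values assigned); odd t is an implicit stage (neighbourhood
    types updated)."""
    X = [dict(initial)]  # stage-0 element vertices: var -> value
    N = [{i: (initial.get((i - 1) % n), initial.get(i)) for i in range(n)}]
    for t in range(stages):
        prev_x, prev_n = X[-1], N[-1]
        if t % 2 == 0:   # explicit stage: elements enter the relation
            cur = dict(prev_x)
            for i in range(n):
                if i not in cur and prev_n[i][0] is not None:
                    cur[i] = +1  # positivity: once assigned, never changed
            X.append(cur)
            N.append(prev_n)
        else:            # implicit stage: neighbourhood types update
            X.append(prev_x)
            N.append({i: (prev_x.get((i - 1) % n), prev_x.get(i))
                      for i in range(n)})
    return X, N
```

Reading off the last level of `X` recovers the final assignment, mirroring how the original variables are read off the rightmost level of the ENSP.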
Remark 8.8. The explicit stages of the ENSP also perform the task of propagating the local constraints placed by the various factors in the underlying factor graph outward into the larger graphical model. For example, in our case of factors encoding the clauses of a k-SAT formula, the local constraint placed by a clause is that the global assignment must evade exactly one restriction on a specified set of k coordinates. For instance, in the case k = 3, the clause x_1 ∨ x_2 ∨ ¬x_3 permits all global assignments except those whose first three coordinates are (−1, −1, +1). In contrast, if the factor were a XORSAT clause, the local restrictions would all be in the form of linear spaces, and the global solution space would be an intersection of such spaces. k-SAT asks whether certain spaces of the form

{ω : (ω_{i_1}, . . . , ω_{i_k}) ≠ (ν_1, . . . , ν_k)}

have a non-empty intersection, where 1 ≤ i_1 < i_2 < · · · < i_k ≤ n and the prohibited values ν_i are ±1. Note that these are O(1) local constraints per factor. In contrast, XORSAT asks whether certain linear spaces have a non-empty intersection; linearity is a global constraint. Of course, all messages are coded into the formula ϕ. Thus, the end result of multiple runs of the LFP will be a space of solutions conditioned upon these requirements. So, for instance, if we were to try to solve XORSAT formulae, we would obtain a space that would be linear.
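The contrast between the O(1) local constraint of a k-SAT clause and the linear constraint of a XORSAT clause can be checked directly. The sketch below is ours and works over {0,1} rather than ±1 for convenience.

```python
from itertools import product

def sat_clause_allows(assignment, coords, prohibited):
    """A k-SAT clause viewed as the set {w : (w_i1,...,w_ik) != (v_1,...,v_k)}:
    it excludes exactly one pattern on its k coordinates."""
    return tuple(assignment[i] for i in coords) != prohibited

def xor_clause_allows(assignment, coords, parity):
    """A k-XORSAT clause is a linear constraint over GF(2): it excludes half
    of the patterns on its k coordinates."""
    return sum(assignment[i] for i in coords) % 2 == parity

n, coords = 3, (0, 1, 2)
cube = list(product([0, 1], repeat=n))
sat_allowed = [w for w in cube if sat_clause_allows(w, coords, (0, 0, 1))]
xor_allowed = [w for w in cube if xor_clause_allows(w, coords, 1)]
# The SAT clause excludes exactly 1 of the 8 assignments (an O(1) local
# constraint); the XOR clause excludes 4 of 8 (a linear, hence global,
# constraint when clauses are intersected).
```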
Thus, we have a directed graph with 2n + n = 3n vertices at each stage, and 2|ϕ^A| stages. Since the LFP completes its computation in at most a fixed polynomial number of steps, this means that we have managed to represent the LFP computation on a structure as a directed model using a polynomial overhead in the number of parameters of our representation space. In other words, by embedding the covariates into a polynomially larger space, we have been able to put a common structure on the various computations done by LFP on them. Note that without embedding the covariates into a larger space, we would not be able to place the various computations done by LFP into a single graphical model. The insight that we can afford to incur a polynomial cost in order to obtain a common graphical model on a larger product space is key to this section.
8.4 Parametrization of the ENSP
Our goal is to demonstrate the following.
If LFP were able to compute solutions in the d1RSB phase of random k-SAT, then the distribution of the entire space of solutions would have a substantially simpler parametrization than we know it does.
In order to accomplish this, we need to measure the growth in the number of independent parameters required to parametrize the distribution of solutions that we have just computed using LFP.

To do this, we have embedded our variates into a polynomially larger space that factorizes according to a directed model, the ENSP. We have seen that the cliques in the ENSP are of size poly(log n). By employing the version of the Hammersley-Clifford theorem for directed models, Theorem 2.13, we also know that we can parametrize the distribution by specifying a system of potentials over its cliques, automatically ensuring conditional independence. The directed nature of the ENSP also means that we can factor the resulting distribution into conditional probability distributions (CPDs) of the form P(x | pa(x)) at each vertex of the model, and then normalize each CPD. Once again, each CPD will have scope only poly(log n). From our perspective, the major benefit of directed graphical models is that we can always do this, without any added positivity constraints. Recall that positivity is required in order to apply the Hammersley-Clifford theorem to obtain factorizations for undirected models.
How do we compute the CPDs or potentials? We assign various initial partial assignments to the variables, as described in Sec. 8.2.3, and let the LFP computations run. We consider only successful computations, namely those where the LFP was able to extend the partial assignment to a full satisfying assignment of the underlying k-SAT formula. We represent each stage of the LFP computation on the corresponding two stages of the ENSP and thus obtain one full instantiation of the representation space. We do this exponentially many times, and build up our local CPDs by simply recording local statistics over all these runs. This gives us the factorization (over the expanded representation space) of our distribution, assuming that P = NP.
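The statistics-gathering step just described admits a direct sketch. This is illustrative only: `runs` and `parents` are our hypothetical encodings of the instantiations and of the model's parent structure.

```python
from collections import Counter, defaultdict

def estimate_cpds(runs, parents):
    """Estimate P(x | pa(x)) for each vertex by recording local statistics
    over many instantiations, as in the text.  `runs` is a list of dicts
    vertex -> value; `parents` maps each vertex to its tuple of parents."""
    counts = defaultdict(Counter)
    for run in runs:
        for v, pa in parents.items():
            counts[v, tuple(run[p] for p in pa)][run[v]] += 1
    cpds = {}
    for key, ctr in counts.items():
        total = sum(ctr.values())
        cpds[key] = {val: c / total for val, c in ctr.items()}  # normalize
    return cpds
```

Each resulting table has scope only as large as a vertex and its parents, mirroring the poly(log n) scope of the CPDs in the text.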
The ENSP for different runs of the LFP will, in general, be different. This is because the flow of influences through the stages of the ENSP will, in general, depend on the initial partial assignment. What is important is that each such model will have certain properties in common, such as the largest clique size, which determines the order of the number of parameters. Let us inspect these properties that determine the parametrization of the ENSP model.
1. There are polynomially many more vertices in the ENSP model than elements in the underlying structure.
2. Lemma 8.2 gives us a poly(log n) upper bound on the size of the neighborhoods. The number of local r-types whose value each neighborhood vertex can take is 2^{poly(log n)}.
3. By Theorem 5.9, there is a fixed constant s such that there must exist s neighborhoods in the structure satisfying certain local conditions for the formula to hold. Remember, we are presently analyzing a single stage of the LFP. This again gives us poly(n) (O(n^s) in this case) different possibilities for each explicit stage of the ENSP. The same can also be arrived at by utilizing the normal form of Theorem 5.10. By the previous point, each of these possibilities can be parametrized by 2^{poly(log n)} parameters, giving us a total of 2^{poly(log n)} parameters required.
4. At each implicit stage of the ENSP, we have to update the types of the neighborhoods that were affected by the induction of elements at the previous explicit stage. There are only n neighborhoods, and each has at most poly(log n) elements.

The ENSP is an interaction model in which direct interactions are of size poly(log n) and are chained together through conditional independencies.

Proposition 8.9. A distribution that factorizes according to the ENSP can be parametrized with 2^{poly(log n)} independent parameters. The scope of the factors in the parametrization grows as poly(log n).
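The gap between the two parameter counts at stake here can be illustrated numerically. The arithmetic sketch below is ours; the base c = 2 is an illustrative choice.

```python
import math

def quasipoly_params(n, c=2, k=2):
    """c^((log n)^k): parameters available to a poly(log n)-parametrized
    model, as in Proposition 8.9 (illustrative exponent (log n)^k)."""
    return c ** (math.log2(n) ** k)

def full_joint_params(n, c=2):
    """c^n: independent parameters of an unrestricted joint distribution
    over n c-valued covariates."""
    return c ** n

# For n = 1024: 2^(10^2) = 2^100 versus 2^1024, the gap that Section 8.5
# turns into a contradiction.
```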
This also underscores the principle that the description of the parameter space is simpler because it involves direct interactions between only poly(log n) variates at a time, and then chains these together through conditional independencies. In the case of the LFP neighborhood system, the size of the largest cliques is poly(log n) for each single run of the LFP. This would not change if we were computing using complex fixed points, since the space of k-types is only polynomially larger than the underlying structure.
The crucial property of the distribution of the ENSP is that it admits a recursive factorization. This is what drastically reduces the parameter space required to specify the distribution. It also allows us to parametrize the ENSP by simply specifying potentials on its maximal cliques, which are of size poly(log n).
While the entire distribution obtained by LFP may not factor according to any one ENSP, it is a mixture of distributions each of which factorizes as per some ENSP. Next, we analyze the features of such a mixture when exponentially many instantiations of it are provided. As the reader may intuit, when such a mixture is asked to provide exponentially many samples, these will show features of scope poly(log n). This is simply a statement about the paucity of independent parameters in the component distributions of the mixture.
8.5 Separation
We continue our treatment of range limited poly(log n)-parametrizations; we will treat the value limited case shortly. The property of the ENSP for range limited models that allows us to analyze the behavior of mixtures is that it is specified by local Gibbs potentials on its cliques. In other words, a variable interacts with the rest of the model only through the cliques that it is part of. These cliques are parametrized by potentials. We may think of the cliques as the building blocks of each ENSP. The cliques are also upper bounded in size by poly(log n). Furthermore, a vertex may lie in at most O(log n) such cliques. Therefore, a vertex displays collective behavior only of range poly(log n). Thus, the mixture comprises distributions that can be parametrized by a subspace of R^{poly(log n)}, in contrast to requiring the larger space R^{O(n)}. This means that when exponentially many solutions are generated, the features in the mixture will be of size poly(log n), not of size O(n).
Next, let us examine the value limited case. Here, the differences are as follows.
1. The solutions are generated by complex LFP, as sections of inductive relations of higher arity.
2. There are O(n) interactions at each stage, but the graphical model is parametrizable with only 2^{poly(log n)} parameters.
3. Since the interactions are O(n), the Gibbs potentials are specified over cliques of size O(n).
4. However, the potentials are parametrized with only 2^{poly(log n)} parameters in spite of having O(n) size. If we think of a potential as a CPD, then the CPDs are wide (have O(n) columns) but not very long (have only 2^{poly(log n)} rows).
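The “wide but short” CPDs of the value limited case can be pictured as follows. The representation as a pattern table is an illustrative device of ours.

```python
def make_value_limited_potential(patterns):
    """A 'wide but short' potential: each row constrains O(n) variables at
    once (wide), but there are only a few rows (short), so the whole table
    needs only len(patterns) parameters rather than 2^n."""
    table = {tuple(p): w for p, w in patterns}

    def potential(assignment):
        return table.get(tuple(assignment), 0.0)

    return potential

# n = 6 variables, but only 2 rows: 2 parameters instead of 2^6 = 64.
phi = make_value_limited_potential([
    ([1, 1, 1, 0, 0, 0], 0.7),
    ([0, 0, 0, 1, 1, 1], 0.3),
])
```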
How do we analyze mixtures of such potentials? The idea is as follows. The potentials are already large in scope (O(n)). So we will create a single potential over the entire graphical model, which will have scope poly(n) (since the computation terminates in polynomial time). How do we merge the various O(n) potentials into a single poly(n)-sized potential? And what will be the resulting parametrization of this merged potential?
In order to merge the potentials, we observe that they have a certain sheaf-like property: since they are CPDs of the same LFP, they must agree on overlaps. This means that two CPDs cannot specify different behavior for the same priors. Remember, these CPDs are nothing but the rules by which the computation proceeds, and these rules are the same for different computations, since it is the same LFP that is being used. Thus, the final merged potential will be compatible with each smaller potential on overlaps. Using this property, we can see that if each of the potentials had a poly(log n)-parametrization, then so must the final merged potential. Once again, we see that we cannot instantiate exponentially many solutions from such a limited parametrization and obtain the d1RSB picture, which requires ample O(n) joint distributions.
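The merge step can be sketched directly. The representation of each potential fragment as a map from prior patterns to behaviors is an illustrative assumption of ours.

```python
def merge_potentials(fragments):
    """Merge CPD fragments that must agree on overlaps (the sheaf-like
    property in the text): the same LFP rule cannot map the same prior
    pattern to two different behaviors."""
    merged = {}
    for frag in fragments:
        for prior, behavior in frag.items():
            if prior in merged and merged[prior] != behavior:
                raise ValueError("fragments disagree on an overlap")
            merged[prior] = behavior
    return merged
```

Because agreeing entries are shared rather than multiplied, the merged table grows at most additively in the number of rows, so a collection of 2^{poly(log n)}-row fragments cannot merge into something essentially larger.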
This explains why polynomial time algorithms fail when the interactions between variables are ample-O(n), without the possibility of factoring into smaller pieces through conditional independencies. It also puts on rigorous ground the empirical observation that even NP-complete problems are easy in large regimes, and become hard only when the densities of constraints rise above a certain threshold. This threshold is precisely the value at which ample irreducible-O(n) interactions first appear in almost all randomly constructed instances.

In the case of random k-SAT in the d1RSB phase, these ample irreducible-O(n) interactions manifest through the appearance of cores, which comprise clauses whose variables are coupled so tightly that one has to assign them “simultaneously.” Cores arise when a set of C = O(n) clauses have all their variables also lying in a set of size C. Once the clause density is sufficiently high, cores cannot be assigned poly(log n) variables at a time, with successive such assignments chained together through conditional independencies. Nor are they value limited, since they instantiate in each of the exponentially many clusters of the d1RSB phase. Since cores neither factor through conditional independencies nor are value limited, polynomial time algorithms cannot assign their variables correctly. Intuitively, the variables in a core are so tightly coupled together that they can only vary jointly, without any conditional independencies between subsets. Furthermore, their variation is ample. In other words, they represent irreducible interactions of size O(n) which may not be factored any further, and which display the ample joint behavior of a system of n covariates, requiring O(c^n) independent parameters to specify. In such cases, a parametrization over cliques of size only poly(log n) is insufficient to specify the joint distribution. Likewise, a parametrization over cliques of size O(n), but with only 2^{poly(log n)} parameters, is insufficient.
We have shown that in the ENSP for range limited models, the size of the largest such irreducible interactions is poly(log n), not O(n). Furthermore, since the model is directed, it guarantees us conditional independencies at the level of its largest interactions. More precisely, it guarantees that there will exist conditional independencies within sets larger than the largest cliques in its moral graph, which are of size poly(log n). In other words, there would be independent variation within cores, when conditioned upon the values of intermediate variables that also lie within the core, should the core factorize as per the ENSP. This is illustrated in Fig. 8.2. This contradicts the known behaviour of cores for sufficiently high values of k and clause density in the d1RSB phase. In other words, while the space of solutions generated by LFP has features of size poly(log n), the features present in cores in the d1RSB phase have size O(n).
The framework we have constructed allows us to analyze the set of polynomial time algorithms simultaneously, since they can all be captured by some LFP, instead of dealing with each individual algorithm separately. It makes precise the notion that polynomial time algorithms can take into account only interactions between variables that grow as poly(log n), not as O(n).
[Figure 8.2: The factorization and conditional independencies within a core due to potentials of size poly(log n). Blocks of size poly(log n) are independent given intermediate values.]
At this point, we are ready to state our main theorem.
Theorem 8.10. P ≠ NP.
Proof. Consider the solution space of k-SAT in the d1RSB phase for k > 8, as recalled in Section 6.2.1. We know that for high enough values of the clause density α, we have O(n) frozen variables in almost all of the exponentially many clusters. The first observation we make is that, since the variables in cores are instantiated in exponentially many clusters, we can preclude a value limited poly(log n)-parametrization. Let us consider, then, the situation where these clusters were generated by a purported range limited LFP algorithm for k-SAT that can be parametrized by the ENSP model with clique sizes poly(log n). When exponentially many solutions have been generated from distributions having the parametrization of the ENSP model, we will see the effect of conditional independencies beyond range poly(log n). Let αβγ be a representation of the variables in cliques α, β and γ; then, given a value of β, we will see independent variation, over all their possible conditional values, in the variables of α and γ. If each set of such variables has scope at most poly(log n), then, once we have generated more than 2^{poly(log n)} distinct solutions, we will have non-trivial conditional distributions conditioned upon values of the β variables. At this point, the conditional independencies ensure that we will see cross terms of the form

α_1βγ_1, α_2βγ_2, α_1βγ_2, α_2βγ_1.

Note that since O(n) variables have to be changed when jumping from one cluster to another, we may even choose our poly(log n) blocks to lie in overlaps of these variables. This would mean that with a poly(log n) change in the frozen variables of one cluster, we would get a solution in another cluster. But we know that in the highly constrained phases of d1RSB, we need O(n) variable flips to get from one cluster to the next. This gives us the contradiction that we seek.
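The emergence of cross terms from conditional independence can be checked mechanically. The sketch below is ours: it closes a solution set under block-wise conditional independence for a fixed middle block β.

```python
from itertools import product

def conditional_closure(solutions, blocks):
    """Close a solution set under block-wise conditional independence: for a
    fixed value of the middle block beta, every observed alpha-value combines
    freely with every observed gamma-value, producing the cross terms."""
    a, b, g = blocks  # index lists for the alpha, beta, gamma blocks
    by_beta = {}
    for s in solutions:
        key = tuple(s[i] for i in b)
        by_beta.setdefault(key, (set(), set()))
        by_beta[key][0].add(tuple(s[i] for i in a))
        by_beta[key][1].add(tuple(s[i] for i in g))
    closed = set()
    for beta, (alphas, gammas) in by_beta.items():
        for al, ga in product(alphas, gammas):
            s = {}
            for i, v in zip(a, al): s[i] = v
            for i, v in zip(b, beta): s[i] = v
            for i, v in zip(g, ga): s[i] = v
            closed.add(tuple(s[i] for i in sorted(s)))
    return closed

# Two solutions a1·b·g1 and a2·b·g2 sharing beta force the cross terms
# a1·b·g2 and a2·b·g1 into the support, so solutions at intermediate
# Hamming distance must appear.
sols = {(0, 0, 1, 0, 0), (1, 1, 1, 1, 1)}
closed = conditional_closure(sols, ([0, 1], [2], [3, 4]))
```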
The basic question in analyzing such mixtures is: How many variables do
we need to condition upon in order to split the distribution into conditionally
independent pieces? The answer is given by (a) the size of the largest cliques
and (b) the number of such cliques that a single variable can occur in. In our
case, these two give us a poly(log n) quantity. When exponentially many solu-
tions have been generated, there will be conditional distributions that exhibit
conditional independence between blocks of variates of size poly(log n). Namely,
there will be no effect of the values of one upon those of the other. This is what
prevents the Hamming distance between solutions from being O(n). This is
shown pictorially in Fig. 8.2.
We may think of such mixtures as possessing only c^{poly(log n)} “channels” to
communicate directly with other variables. All long range correlations trans-
mitted in such a distribution must pass through only these many channels.
Therefore, exponentially many solutions cannot independently transmit O(n)
correlations (namely, the variables that have to be changed when jumping from
one cluster to another). Their correlations must factor through this bottleneck,
which gives us conditional independences after range poly(log n). This means
that blocks of size larger than this are now varying independently of each other
conditioned upon some intermediate variables. This gives us the cross-terms
described earlier, and prevents the Hamming distance from being O(n) on the
average over exponentially many solutions. Instead, it must be poly(log n).
We can see that due to the limited parameter space that determines each
variable, it can only display a limited joint behavior. This behavior is completely
determined by poly(log n) other variates, not by O(n) other variates. Thus, the
“jointness” in this distribution lies at a level poly(log n). This is why when
enough solutions have been generated by the LFP, the resulting distribution
will start showing features that are at most of size poly(log n). In other words,
there will be solutions that show cross-terms between features whose size is
poly(log n).
It is also useful to consider how many different parametrizations a block of
size poly(log n) may have. Each variable may choose poly(log n) partners out of
O(n) to form a potential. It may choose O(log n) such potentials. Even coarsely,
this means blocks of variables of size poly(log n) only “see” the rest of the dis-
tribution through equivalence classes that grow as O(n^{poly(log n)}). This quantity
would have to grow exponentially with n in order to display the behavior of
the d1RSB phase. Once again we return to the same point — that the jointness
of the distribution that a purported LFP algorithm would generate would lie
at the poly(log n) levels of conditional independence, whereas the jointness in
the distribution of the d1RSB solution space is truly O(n). Namely, there are
irreducible interactions of size O(n) that cannot be expressed as interactions be-
tween poly(log n) variates at a time, and chained together by conditional inde-
pendencies as would be done by a LFP. This is central to the separation of com-
plexity classes. Hard regimes of NP-complete problems allow O(n) variates to
irreducibly jointly vary, and accounting for such O(n) jointness that cannot be
factored any further is beyond the capability of polynomial time algorithms.
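The growth gap invoked here can be checked numerically; the following sketch (with an arbitrary choice of polylog exponent, purely for illustration) compares the logarithm of an n^{poly(log n)}-sized family of equivalence classes with the logarithm, n, of the 2^n-sized behavior required by the d1RSB phase.

```python
import math

def log2_quasipoly(n, d=2):
    """log2 of n**(log2(n)**d), i.e. of an n^{poly(log n)}-sized quantity."""
    return math.log2(n) ** (d + 1)

n = 2 ** 20
# log2 of n^{(log2 n)^2} is 20^3 = 8000, far below log2(2^n) = n = 1048576.
assert log2_quasipoly(n) < n
```

The gap only widens as n grows, since (log n)^{d+1} is eventually dominated by n for any fixed d.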
We collect some observations in the following.
Remark 8.11. The poly(log n) size of features and therefore Hamming distance
between solutions tells us that polynomial time algorithms correspond to the
RS phase of the 1RSB picture, not to the d1RSB phase.
Remark 8.12. We can see from the preceding discussion that the number of in-
dependent parameters required to specify the distribution of the entire solution
space in the d1RSB phase (for k > 8) rises as c^n, c > 1. This is because it takes
that many parameters to specify the exponentially many O(n) variable “jumps”
between the clusters. These jumps are independent, and cannot be factored
through poly(log n) sized factors since that would mean conditional indepen-
dence of pieces of size poly(log n) and would ensure that the Hamming distance
between solutions was of that order.
Remark 8.13. Note that the central notion is that of the number of independent
parameters, not frozen variables. For example, frozen variables would occur
even in low dimensional parametrizations in the presence of additional con-
straints placed by the problem. This is what happens in XORSAT, where the
linearity of the problem causes frozen variables to occur. The frozen variables
in XORSAT do not arise due to a high dimensional parametrization, but sim-
ply because the 2-core percolates [MM09, §18.3]. Each cluster is a linear space
tagged on to a solution for the 2-core, which is also why the clusters are all of
the same size. Linear spaces always admit a simple description as the linear
span of a basis, which takes the order of log of the size of the space.
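To illustrate the remark with made-up numbers (not data from actual XORSAT ensembles): a cluster that is an affine space over GF(2) is fully described by one solution plus a basis, i.e. by about log2 of the cluster size many vectors, rather than by one parameter per solution.

```python
import itertools

n_bits = 6
basis = [0b000011, 0b001100, 0b110000]  # three independent GF(2) vectors (hypothetical)
x0 = 0b101010                           # one particular solution (hypothetical)

# The cluster is the affine span x0 + <basis>; XOR is addition over GF(2).
cluster = set()
for coeffs in itertools.product((0, 1), repeat=len(basis)):
    v = x0
    for c, b in zip(coeffs, basis):
        if c:
            v ^= b
    cluster.add(v)

# 2^3 = 8 solutions, described by just 3 = log2(8) basis vectors.
assert len(cluster) == 2 ** len(basis)
```

This is the sense in which the frozen variables of XORSAT coexist with a low dimensional parametrization: the description length is the dimension of the span, not the number of solutions.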
Remark 8.14. It is tempting to think that there will be such a parametrization
whenever the algorithmic procedure used to generate the solutions is stage-
wise local. This is not so. We need the added requirement that “mistakes” are
not allowed. Namely, we cannot change a decision that has been made. Other-
wise, even PFP has the stage-wise bounded local property, but it can give rise to
distributions without any conditional independence factorizations whose fac-
tors are of size poly(log n). When placed in the ENSP, we see that there is fac-
torization, but over an exponentially larger space, where clique sizes are of ex-
ponential size. One might observe that it is the requirement that we not make any
trial and error at all that limits LFP computations in a fundamentally different
manner than the locality of information flows. See [Put65] for an interesting
related notion of “trial and error predicates” in computability theory.
8.6 Some Perspectives
The following perspectives are reinforced by this work.
1. The most natural object of study for constraint satisfaction problems is the
entire space of solutions. It is in this space that the dependencies and inde-
pendencies the CSP imposes upon the covariates satisfying it manifest themselves.
2. There is an intimate relation between the geometry of the space and its
parametrization. Studying the parametrization of the space of solutions is
a worthwhile pursuit.
3. The view that an algorithm is a means to generate one solution is limited
in the sense that it is oblivious to the geometry of the space of all solutions.
It may, of course, be the appropriate approach in many applications. But
there are applications where requiring algorithms to generate numerous
solutions and approximate with increasing accuracy the entire space of
solutions seems more natural.
4. Conditional independence over factors of small scope is at the heart of re-
solving CSPs by means of polynomial time algorithms. In other words,
polynomial time algorithms succeed by successively “breaking up” the
problem into smaller subproblems that are joined to each other through
conditional independence. Consequently, polynomial time algorithms can-
not solve problems in regimes where blocks whose order is the same as the
underlying problem instance require simultaneous resolution.
5. Polynomial time algorithms resolve the variables in CSPs in a certain or-
der, and with a certain structure. This structure is important in their study.
In order to bring this structure under study, we may have to embed the
space of covariates into a larger space (as done by the ENSP).
A. Reduction to a Single LFP
Operation
A.1 The Transitivity Theorem for LFP
We now gather a few results that will enable us to cast any LFP into one having
just one application of the LFP operator. Since we use this construction to deal
with complex fixed points, we reproduce it in this appendix. The presentation
here closely follows [EF06, Ch. 8].
The first result, known as the transitivity theorem, tells us that nested fixed
points can always be replaced by simultaneous fixed points. Let ϕ(x, X, Y ) and
ψ(y, X, Y ) be first order formulas positive in X and Y . Moreover, assume that
no individual variable free in [LFP_{y,Y} ψ(y, X, Y)] gets into the scope of a corre-
sponding quantifier or LFP operator in (A.1).

[LFP_{x,X} ϕ(x, X, [LFP_{y,Y} ψ(y, X, Y)])] t        (A.1)

Then (A.1) is equivalent to a formula of the form

∃(∀)u [LFP_{z,Z} χ(z, Z)] u,

where χ is first order.
A.2 Sections and the Simultaneous Induction Lemma
for LFP
Next we deal with simultaneous fixed points. Recall that simultaneous induc-
tions do not increase the expressive power of LFP. The proof utilizes a coding
procedure whereby each simultaneous induction is embedded as a section in a
single LFP operation of higher arity. First, we introduce the notion of a section.
Definition A.1. Let R be a relation of arity (k + l) on A and a ∈ A^l. Then the
a-section of R, denoted by R_a, is given by

R_a := {b ∈ A^k | R(ba)}.
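As an illustration of the definition (the relation below is an arbitrary hypothetical example), a section simply fixes the trailing coordinates and collects the leading ones:

```python
def section(R, a):
    """The a-section of a relation R of tuples: { b | b + a in R }."""
    la = len(a)
    return {t[:-la] for t in R if t[-la:] == a}

# A 3-ary relation with k = 2, l = 1.
R = {(1, 2, 9), (3, 4, 9), (5, 6, 7)}
assert section(R, (9,)) == {(1, 2), (3, 4)}
```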
Next we see how sections can be used to encode multiple simultaneous op-
erators producing relations of lower arity into a single operator producing a
relation of higher arity. Let m operators F_1, ..., F_m act as follows:

F_1 : Pow(A^{k_1}) × ··· × Pow(A^{k_m}) → Pow(A^{k_1})
F_2 : Pow(A^{k_1}) × ··· × Pow(A^{k_m}) → Pow(A^{k_2})
    ⋮
F_m : Pow(A^{k_1}) × ··· × Pow(A^{k_m}) → Pow(A^{k_m})        (A.2)
We wish to embed these operators as sections of a “larger” operator, which
is known as their simultaneous join.
We will denote a tuple consisting only of a’s by ã. The length of ã will be clear
from context.
Definition A.2. Let F_1, ..., F_m be operators acting as above. Set

k := max{k_1, ..., k_m} + m + 1.

The simultaneous join of F_1, ..., F_m, denoted by J(F_1, ..., F_m), is an operator
acting as

J(F_1, ..., F_m) : Pow(A^k) → Pow(A^k)

such that for any distinct a, b ∈ A, the ãb_i-section (where the length of ã here is
k − k_i − i + 1) of the n-th power of J is the n-th power of the operator F_i.
Concretely, the simultaneous join is given by

J(R) := ⋃_{a,b ∈ A, a ≠ b} ((F_1(R_{ãb_1}, ..., R_{ãb_m}) × {ãb_1}) ∪ ··· ∪
                            (F_m(R_{ãb_1}, ..., R_{ãb_m}) × {ãb_m})).        (A.3)
The simultaneous join operator defined above has properties we will need
to use. These are collected below.
Lemma A.3. The i-th power J^i of the simultaneous join operator satisfies

J^i = ⋃_{a,b ∈ A, a ≠ b} ((F_1^i × {ãb_1}) ∪ ··· ∪ (F_m^i × {ãb_m})).        (A.4)
The following corollaries are now immediate.
Corollary A.4. The fixed point J^∞ of the simultaneous join of operators (F_1, ..., F_m)
exists if and only if their simultaneous fixed point (F_1^∞, ..., F_m^∞) exists.

Corollary A.5. The simultaneous join of inductive operators is inductive.
Finally, we need to show that the simultaneous join can itself be expressed
as a LFP computation. We need formulas that will help us define sections of a
simultaneous induction. Since the sections are coded using tuples of the form
a^{k − k_i − i + 1} b^{i−1}, we will need formulas that can express this.

Definition A.6. For ℓ ≥ 1 and i = 1, ..., ℓ, the section formulas δ^ℓ_i(x_1, ..., x_ℓ, v, w)
are given by

δ^ℓ_i(x_1, ..., x_ℓ, v, w) :=
    (v ≠ w) ∧ (x_1 = ··· = x_ℓ = v)                                        if i = 1,
    (v ≠ w) ∧ (x_1 = ··· = x_{ℓ−i+1} = v) ∧ (x_{ℓ−i+2} = ··· = x_ℓ = w)    if i > 1.        (A.5)

For distinct a, b ∈ A, A ⊨ δ^ℓ_i[ãb_j ab] if and only if i = j.
Now we are ready to show that simultaneous fixed-point inductions of for-
mulas can be replaced by the fixed point induction of a single formula.
Definition A.7. Let

ϕ_1(R_1, ..., R_m, x_1), ..., ϕ_m(R_1, ..., R_m, x_m)

be formulas of LFP. As always, we let R_i be a k_i-ary relation and x_i be a k_i-tuple.
Furthermore, let ϕ_1, ..., ϕ_m be positive in R_1, ..., R_m. Set k := max{k_1, ..., k_m} +
m + 1. Define a new first order formula χ_J having k variables and computing a
single k-ary relation Z by

χ_J(Z, z_1, ..., z_k) := ∃v∃w(v ≠ w ∧
    ((ϕ_1(Z_{ṽw_1}, ..., Z_{ṽw_m}, z_1, ..., z_k) ∧ δ^k_1(z_1, ..., z_k, v, w))
    ∨ (ϕ_2(Z_{ṽw_1}, ..., Z_{ṽw_m}, z_1, ..., z_k) ∧ δ^k_2(z_1, ..., z_k, v, w))
        ⋮
    ∨ (ϕ_m(Z_{ṽw_1}, ..., Z_{ṽw_m}, z_1, ..., z_k) ∧ δ^k_m(z_1, ..., z_k, v, w))))        (A.6)

Then the relation computed by the least fixed point of χ_J contains all the
individual least fixed points computed by the simultaneous induction as its sec-
tions.
Bibliography
[ACO08] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from
phase transitions. arXiv:0803.2122v2 [math.CO], 2008.
[AM00] Srinivas M. Aji and Robert J. McEliece. The generalized distribu-
tive law. IEEE Trans. Inform. Theory, 46(2):325–343, 2000.
[AP04] Dimitris Achlioptas and Yuval Peres. The threshold for random k-
SAT is 2^k log 2 − O(k). J. Amer. Math. Soc., 17(4):947–973 (electronic),
2004.
[ART06] Dimitris Achlioptas and Federico Ricci-Tersenghi. On the solution-
space geometry of random constraint satisfaction problems. In
STOC’06: Proceedings of the 38th Annual ACM Symposium on The-
ory of Computing, pages 130–139. ACM, New York, 2006.
[AV91] Serge Abiteboul and Victor Vianu. Datalog extensions for database
queries and updates. J. Comput. Syst. Sci., 43(1):62–124, 1991.
[AV95] Serge Abiteboul and Victor Vianu. Computing with first-order
logic. Journal of Computer and System Sciences, 50:309–335, 1995.
[BDG95] José Luis Balcázar, Josep Díaz, and Joaquim Gabarró. Structural
complexity. I. Texts in Theoretical Computer Science. An EATCS
Series. Springer-Verlag, Berlin, second edition, 1995.
[Bes74] Julian Besag. Spatial interaction and the statistical analysis of lat-
tice systems. J. Roy. Statist. Soc. Ser. B, 36:192–236, 1974. With
discussion by D. R. Cox, A. G. Hawkes, P. Clifford, P. Whittle, K. Ord,
R. Mead, J. M. Hammersley, and M. S. Bartlett and with a reply by
the author.
[BGS75] Theodore Baker, John Gill, and Robert Solovay. Relativizations of
the P =? NP question. SIAM J. Comput., 4(4):431–442, 1975.
[Bis06] Christopher M. Bishop. Pattern recognition and machine learning. In-
formation Science and Statistics. Springer, New York, 2006.
[BMW00] G. Biroli, R. Monasson, and M. Weigt. A variational description
of the ground state structure in random satisfiability problems.
Eur. Phys. J. B, 14:551–568, 2000.
[CF86] Ming-Te Chao and John V. Franco. Probabilistic analysis of
two heuristics for the 3-satisfiability problem. SIAM J. Comput.,
15(4):1106–1118, 1986.
[CKT91] Peter Cheeseman, Bob Kanefsky, and William M. Taylor. Where
the really hard problems are. In IJCAI, pages 331–340, 1991.
[CO09] A. Coja-Oghlan. A better algorithm for random k-sat.
arXiv:0902.3583v1 [math.CO], 2009.
[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures.
In STOC ’71: Proceedings of the third annual ACM symposium on The-
ory of computing, pages 151–158, New York, NY, USA, 1971. ACM
Press.
[Coo06] Stephen Cook. The P versus NP problem. In The millennium prize
problems, pages 87–104. Clay Math. Inst., Cambridge, MA, 2006.
[Daw79] A. P. Dawid. Conditional independence in statistical theory. J. Roy.
Statist. Soc. Ser. B, 41(1):1–31, 1979.
[Daw80] A. Philip Dawid. Conditional independence for statistical opera-
tions. Ann. Statist., 8(3):598–617, 1980.
[Deo10] Vinay Deolalikar. A distribution centric approach to constraint sat-
isfaction problems. Under preparation, 2010.
[DLW95] Anuj Dawar, Steven Lindell, and Scott Weinstein. Infinitary logic
and inductive definability over finite structures. Inform. and Com-
put., 119(2):160–175, 1995.
[DMMZ08] Hervé Daudé, Marc Mézard, Thierry Mora, and Riccardo
Zecchina. Pairs of sat-assignments in random boolean formulæ.
Theor. Comput. Sci., 393(1-3):260–279, 2008.
[Dob68] R. L. Dobrushin. The description of a random field by means of
conditional probabilities and conditions on its regularity. Theory
Prob. Appl., 13:197–224, 1968.
[Edm65] Jack Edmonds. Minimum partition of a matroid into independents
subsets. Journal of Research of the National Bureau of Standards, 69:67–
72, 1965.
[EF06] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory.
Springer Monographs in Mathematics. Springer-Verlag, Berlin, en-
larged edition, 2006.
[Fag74] Ronald Fagin. Generalized first-order spectra and polynomial-
time recognizable sets. In Complexity of computation (Proc. SIAM-
AMS Sympos. Appl. Math., New York, 1973), pages 43–73. SIAM–
AMS Proc., Vol. VII. Amer. Math. Soc., Providence, R.I., 1974.
[Fri99] E. Friedgut. Necessary and sufficient conditions for sharp thresh-
olds and the k-sat problem. J. Amer. Math. Soc., 12(20):1017–1054,
1999.
[FSV95] Ronald Fagin, Larry J. Stockmeyer, and Moshe Y. Vardi. On
monadic np vs. monadic co-np. Inf. Comput., 120(1):78–92, 1995.
[Gai82] Haim Gaifman. On local and nonlocal properties. In Proceedings of
the Herbrand symposium (Marseilles, 1981), volume 107 of Stud. Logic
Found. Math., pages 105–135, Amsterdam, 1982. North-Holland.
[GG84] Stuart Geman and Donald Geman. Stochastic relaxation, gibbs
distributions and the bayesian restoration of images. IEEE Trans-
actions on Pattern Analysis and Machine Intelligence, 6(6):721–741,
November 1984.
[GJ79] Michael R. Garey and David S. Johnson. Computers and intractabil-
ity. W. H. Freeman and Co., San Francisco, Calif., 1979. A guide
to the theory of NP-completeness, A Series of Books in the Mathe-
matical Sciences.
[GS00] Martin Grohe and Thomas Schwentick. Locality of order-invariant
first-order formulas. ACM Trans. Comput. Log., 1(1):112–130, 2000.
[Han65] William Hanf. Model-theoretic methods in the study of elementary
logic. In Theory of Models (Proc. 1963 Internat. Sympos. Berkeley),
pages 132–145. North-Holland, Amsterdam, 1965.
[HC71] J. M. Hammersley and P. Clifford. Markov fields on finite graphs
and lattices. 1971.
[HH76] J. Hartmanis and J. E. Hopcroft. Independence results in computer
science. SIGACT News, 8(4):13–24, 1976.
[Hod93] Wilfrid Hodges. Model theory, volume 42 of Encyclopedia of Math-
ematics and its Applications. Cambridge University Press, Cam-
bridge, 1993.
[Imm82] Neil Immerman. Relational queries computable in polynomial
time (extended abstract). In STOC ’82: Proceedings of the fourteenth
annual ACM symposium on Theory of computing, pages 147–152, New
York, NY, USA, 1982. ACM.
[Imm86] Neil Immerman. Relational queries computable in polynomial
time. Inform. and Control, 68(1-3):86–104, 1986.
[Imm99] Neil Immerman. Descriptive complexity. Graduate Texts in Com-
puter Science. Springer-Verlag, New York, 1999.
[Kar72] R. M. Karp. Reducibility among combinatorial problems. In R. E.
Miller and J. W. Thatcher, editors, Complexity of Computer Computa-
tions, pages 85–103. Plenum Press, 1972.
[KF09] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles
and Techniques. MIT Press, 2009.
[KFaL98] Frank R. Kschischang, Brendan J. Frey, and Hans-Andrea Loeliger.
Factor graphs and the sum-product algorithm. IEEE Transactions
on Information Theory, 47:498–519, 1998.
[KMRT+06] Florent Krzakala, Andrea Montanari, Federico Ricci-Tersenghi,
Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the
set of solutions of random constraint satisfaction problems. CoRR,
abs/cond-mat/0612365, 2006.
[KMRT+07] Florent Krzakała, Andrea Montanari, Federico Ricci-Tersenghi,
Guilhem Semerjian, and Lenka Zdeborová. Gibbs states and the
set of solutions of random constraint satisfaction problems. Proc.
Natl. Acad. Sci. USA, 104(25):10318–10323 (electronic), 2007.
[KS80] R. Kinderman and J. L. Snell. Markov random fields and their ap-
plications. American Mathematical Society, 1:1–142, 1980.
[KS94] Scott Kirkpatrick and Bart Selman. Critical behavior in the satisfi-
ability of random boolean formulae. Science, 264:1297–1301, 1994.
[KSC84] Harri Kiiveri, T. P. Speed, and J. B. Carlin. Recursive causal models.
J. Austral. Math. Soc. Ser. A, 36(1):30–52, 1984.
[Lau96] Steffen L. Lauritzen. Graphical models, volume 17 of Oxford Statis-
tical Science Series. The Clarendon Press Oxford University Press,
New York, 1996. Oxford Science Publications.
[LDLL90] S. L. Lauritzen, A. P. Dawid, B. N. Larsen, and H.-G. Leimer.
Independence properties of directed Markov fields. Networks,
20(5):491–505, 1990. Special issue on influence diagrams.
[Lev73] Leonid A. Levin. Universal sequential search problems. Problems
of Information Transmission, 9(3), 1973.
[Li09] Stan Z. Li. Markov random field modeling in image analysis. Ad-
vances in Pattern Recognition. Springer-Verlag London Ltd., Lon-
don, third edition, 2009. With forewords by Anil K. Jain and Rama
Chellappa.
[Lib04] Leonid Libkin. Elements of finite model theory. Texts in Theoretical
Computer Science. An EATCS Series. Springer-Verlag, Berlin, 2004.
[Lin05] S. Lindell. Computing monadic fixed points in linear
time on doubly linked data structures. available online at
http://citeseerx.ist.psu.edu/doi=10.1.1.122.1447, 2005.
[LR03] Richard Lassaigne and Michel De Rougemont. Logic and Complex-
ity. Springer-Verlag, London, 2003.
[MA02] Cristopher Moore and Dimitris Achlioptas. Random k-sat: Two
moments suffice to cross a sharp threshold. FOCS, pages 779–788,
2002.
[MM09] Marc Mézard and Andrea Montanari. Information, physics, and com-
putation. Oxford Graduate Texts. Oxford University Press, Oxford,
2009.
[MMW05] Elitza N. Maneva, Elchanan Mossel, and Martin J. Wainwright. A
new look at survey propagation and its generalizations. In SODA,
pages 1089–1098, 2005.
[MMW07] Elitza Maneva, Elchanan Mossel, and Martin J. Wainwright. A
new look at survey propagation and its generalizations. J. ACM,
54(4):Art. 17, 41 pp. (electronic), 2007.
[MMZ05] M. Mézard, T. Mora, and R. Zecchina. Clustering of solutions in
the random satisfiability problem. Phys. Rev. Lett., 94(19):197205,
May 2005.
[Mos74] Yiannis N. Moschovakis. Elementary induction on abstract structures.
North-Holland Publishing Co., Amsterdam, 1974. Studies in Logic
and the Foundations of Mathematics, Vol. 77.
[Mou74] John Moussouris. Gibbs and Markov random systems with con-
straints. J. Statist. Phys., 10:11–33, 1974.
[MPV87] Marc Mézard, Giorgio Parisi, and Miguel Angel Virasoro. Spin
glass theory and beyond, volume 9 of World Scientific Lecture Notes in
Physics. World Scientific Publishing Co. Inc., Teaneck, NJ, 1987.
[MPZ02] M. Mézard, G. Parisi, and R. Zecchina. Analytic and algorithmic
solution of random satisfiability problems. Science, 297:812–815, 2002.
[MRTS07] Andrea Montanari, Federico Ricci-Tersenghi, and Guilhem Se-
merjian. Solving constraint satisfaction problems through belief
propagation-guided decimation, Sep 2007.
[MSL92] David Mitchell, Bart Selman, and Hector Levesque. Hard and easy
distributions of sat problems. In AAAI, pages 459–465, 1992.
[MZ97] Rémi Monasson and Riccardo Zecchina. Statistical mechanics of
the random k-satisfiability model. Phys. Rev. E, 56(2):1357–1370,
Aug 1997.
[MZ02] Marc Mézard and Riccardo Zecchina. Random k-satisfiability
problem: From an analytic solution to an efficient algorithm. Phys.
Rev. E, 66(5):056126, Nov 2002.
[Put65] Hilary Putnam. Trial and error predicates and the solution to a
problem of mostowski. J. Symb. Log., 30(1):49–57, 1965.
[RR97] Alexander A. Razborov and Steven Rudich. Natural proofs. J.
Comput. System Sci., 55(1, part 1):24–35, 1997. 26th Annual ACM
Symposium on the Theory of Computing (STOC ’94) (Montreal,
PQ, 1994).
[SB99] Thomas Schwentick and Klaus Barthelmann. Local normal forms
for first-order logic with applications to games and automata. In
Discrete Mathematics and Theoretical Computer Science, pages 444–
454. Springer Verlag, 1999.
[See96] Detlef Seese. Linear time computable problems and first-order de-
scriptions. Math. Structures Comput. Sci., 6(6):505–526, 1996. Joint
COMPUGRAPH/SEMAGRAPH Workshop on Graph Rewriting
and Computation (Volterra, 1995).
[Sip92] Michael Sipser. The history and status of the p versus np question.
In STOC, pages 603–618, 1992.
[Sip97] M. Sipser. Introduction to the Theory of Computation. PWS Publishing
Company, 1997.
[Var82] Moshe Y. Vardi. The complexity of relational query languages (ex-
tended abstract). In STOC ’82: Proceedings of the fourteenth annual
ACM symposium on Theory of computing, pages 137–146, New York,
NY, USA, 1982. ACM.
[Wig07] Avi Wigderson. P, NP, and Mathematics - a computational com-
plexity perspective. Proceedings of the ICM 2006, 1:665–712, 2007.
This work is dedicated to my late parents: my father Shri. Shrinivas Deolalikar, my mother Smt. Usha Deolalikar, and my maushi Kum. Manik Deogire, for all their hard work in raising me; and to my late grand parents: Shri. Rajaram Deolalikar and Smt. Vimal Deolalikar, for their struggle to educate my father inspite of extreme poverty. This work is part of my Matru-Pitru Rin¹.

I am forever indebted to my wife for her faith during these years.

¹ The debt to mother and father that a pious Hindu regards as his obligation to repay in this life.
Abstract

We demonstrate the separation of the complexity class NP from its subclass P. Throughout our proof, we observe that the ability to compute a property on structures in polynomial time is intimately related to an atypical property of the space of solutions — namely, the space is parametrizable with only c^{poly(log n)}, c > 1, parameters instead of the typical c^n parameters required for a joint distribution of n covariates. This type of exponentially smaller parametrization arises as a result of severe limitations placed on the interaction between the variates. In particular, it may arise from range limited interactions, where variates interact at short ranges and chain together such interactions to create long range interactions. Such long range interactions then would be characterized by the statistical notions of conditional independence and sufficient statistics. The presence of conditional independencies manifests in the form of economical parametrizations of the joint distribution of covariates. Likewise, such economical parametrizations can arise from interactions which take only c^{poly(log n)} many values. In both cases, the result on the joint distribution is the same — it is parametrizable with only c^{poly(log n)} independent parameters.

In order to apply this analysis to the space of solutions of random constraint satisfaction problems, we utilize and expand upon ideas from several fields spanning logic, statistics, graphical models, random ensembles, and statistical physics. We begin by introducing the requisite framework of graphical models for a set of interacting variables. We focus on the correspondence between Markov and Gibbs properties for directed and undirected models as reflected in the factorization of their joint distribution, and the number of independent parameters required to specify the distribution. Next, we build the central contribution of this work.
We show that there are fundamental conceptual relationships between polynomial time computation, which is completely captured by the logic FO(LFP) on classes of successor structures, conditional independence, and poly(log n)-parametrization. In order to demonstrate these relationships, we encode k-SAT formulae as structures on which FO(LFP) captures polynomial time. Specifically, we view the LFP computation as “factoring through” several stages of first order computations, and then utilize the limitations of first order logic. In particular, we exploit the limitation that first order logic can only express properties in terms of a bounded number of local neighborhoods of the underlying structure. Monadic LFP is a range limited interaction model that possesses certain directed Markov properties that may be stated in terms of conditional independence and sufficient statistics. Distributions computed by LFP must satisfy this model. Then we relate complex fixed points to value limited interactions, which again result in poly(log n)-parametrization.

Next, we introduce ideas from the 1RSB replica symmetry breaking ansatz of statistical physics. We recollect the description of the clustered phase for random k-SAT, known as the d1RSB phase, that arises when the clause density is sufficiently high and k ≥ 9. In this phase, as the clause density is increased towards the SAT-unSAT threshold, an arbitrarily large fraction of all variables in cores freeze within exponentially many clusters in the thermodynamic limit. The Hamming distance between a solution that lies in one cluster and that in another is O(n). Note that the onset of this phase is rigorously proven only for k ≥ 9, and it is here that we will demonstrate our separation.

We then construct a dynamic graphical model on a product space that captures all the information flows through the various stages of a LFP computation on ensembles of k-SAT structures. This model is directed, which allows us to compute factorizations locally and parameterize using Gibbs potentials on cliques. We then use results from ensembles of factor graphs of random k-SAT to bound the various information flows in this directed graphical model. By asking FO(LFP) to extend partial assignments on ensembles of random k-SAT, we build distributions of solutions. We parametrize the resulting distributions in a manner that demonstrates that irreducible interactions between covariates — namely, those that may not be factored any further through conditional independencies — cannot grow faster than poly(log n) in the range limited monadic LFP computed distributions. For value limited complex LFP, we show how to obtain a parametrization of the solution space by merging potentials with scope O(n).

Using the aforementioned limitations of LFP, we demonstrate that a purported polynomial time solution to k-SAT would result in a solution space that is a mixture of distributions, each having an exponentially smaller parametrization than is consistent with the highly constrained d1RSB phases of k-SAT. We show that this would contradict the behavior exhibited by the solution space in the d1RSB phase. This corresponds to the intuitive picture provided by physics about the emergence of extensive (meaning O(n)) long-range correlations between variables in this phase, and also explains the empirical observation that all known polynomial time algorithms break down in this phase. This allows us to analyze the behavior of the entire class of polynomial time algorithms on ensembles simultaneously. Our work shows that every polynomial time algorithm must fail to produce solutions to large enough problem instances of k-SAT in the d1RSB phase. This shows that polynomial time algorithms are not capable of solving NP-complete problems in their hard phases, and demonstrates the separation of P from NP.
Contents

1 Introduction
  1.1 Synopsis of Proof
2 Interaction Models and Conditional Independence
  2.1 Conditional Independence
  2.2 Conditional Independence in Undirected Graphical Models
    2.2.1 Gibbs Random Fields and the Hammersley-Clifford Theorem
  2.3 Factor Graphs
  2.4 The Markov-Gibbs Correspondence for Directed Models
  2.5 I-maps and D-maps
3 Distributions with poly(log n)-Parametrization
  3.1 Two Kinds of poly(log n)-parameterizations
    3.1.1 Range Limited Interactions
    3.1.2 Value Limited Interactions
    3.1.3 On the Atypical Nature of poly(log n)-parameterization
    3.1.4 Our Treatment of Range and Value Limited Distributions
4 Logical Descriptions of Computations
  4.1 Inductive Definitions and Fixed Points
  4.2 Fixed Point Logics for P and PSPACE
5 The Link Between Polynomial Time Computation and Conditional Independence
  5.1 The Limitations of LFP
    5.1.1 Locality of First Order Logic
  5.2 Simple Monadic LFP and Conditional Independence
  5.3 Conditional Independence in Complex Fixed Points
  5.4 Aggregate Properties of LFP over Ensembles
6 The 1RSB Ansatz of Statistical Physics
  6.1 Ensembles and Phase Transitions
  6.2 The d1RSB Phase
    6.2.1 Cores and Frozen Variables
    6.2.2 Performance of Known Algorithms in the d1RSB Phase
7 Random Graph Ensembles
  7.1 Properties of Factor Graph Ensembles
    7.1.1 Locally Tree-Like Property
    7.1.2 Degree Profiles in Random Graphs
8 Separation of Complexity Classes
  8.1 Measuring Conditional Independence in Range Limited Models
  8.2 Generating Distributions from LFP
    8.2.1 Encoding k-SAT into Structures
    8.2.2 The LFP Neighborhood System
    8.2.3 Generating Distributions
  8.3 Disentangling the Interactions: The ENSP Model
  8.4 Parametrization of the ENSP
  8.5 Separation
  8.6 Some Perspectives
A Reduction to a Single LFP Operation
  A.1 The Transitivity Theorem for LFP
  A.2 Sections and the Simultaneous Induction Lemma for LFP
Bibliography
1. Introduction

The P = NP question is generally considered one of the most important and far reaching questions in contemporary mathematics and computer science. The origin of the question seems to date back to a letter from Gödel to Von Neumann in 1956 [Sip92]. Formal definitions of the class NP awaited work by Edmonds [Edm65], Cook [Coo71], and Levin [Lev73]. The Cook-Levin theorem showed the existence of complete problems for this class, and demonstrated that SAT – the problem of determining whether a set of clauses of Boolean literals has a satisfying assignment – was one such problem. Later, Karp [Kar72] showed that twenty-one well known combinatorial problems, which include TRAVELLING SALESMAN, CLIQUE, and HAMILTONIAN CIRCUIT, were also NP-complete. In subsequent years, many problems central to diverse areas of application were shown to be NP-complete (see [GJ79] for a list). If P ≠ NP, we could never solve these problems efficiently. If, on the other hand, P = NP, the consequences would be even more stunning, since every one of these problems would have a polynomial time solution. The implications of this on applications such as cryptography, and on the general philosophical question of whether human creativity can be automated, would be profound.

The P = NP question is also singular in the number of approaches that researchers have brought to bear upon it over the years. From the initial question in logic, the focus moved to complexity theory, where early work used diagonalization and relativization techniques. However, [BGS75] showed that these methods were perhaps inadequate to resolve P = NP by demonstrating relativized worlds in which P = NP and others in which P ≠ NP (both relations for the appropriately relativized classes). This shifted the focus to methods using circuit complexity, and for a while this approach was deemed the one most likely to resolve the question. Once again, a negative result in [RR97] showed that a class of techniques known as "Natural Proofs" that subsumed the above could not separate the classes NP and P, provided one-way functions exist.

Owing to the difficulty of resolving the question, and also to the negative results mentioned above, there has been speculation that resolving the P = NP question might be outside the domain of mathematical techniques. More precisely, the question might be independent of standard axioms of set theory. The first such results in [HH76] show that some relativized versions of the P = NP question are independent of reasonable formalizations of set theory.

The influence of the P = NP question is felt in other areas of mathematics. We mention one of these, since it is central to our work. This is the area of descriptive complexity theory — the branch of finite model theory that studies the expressive power of various logics viewed through the lens of complexity theory. This field began with the result [Fag74] that showed that NP corresponds to queries that are expressible in second order existential logic over finite structures. Later, characterizations of the classes P [Imm86, Var82] and PSPACE over ordered structures were also obtained.

There are several introductions to the P = NP question and the enormous amount of research that it has produced. The reader is referred to [Coo06] for an introduction which also serves as the official problem description for the Clay Millennium Prize. An older excellent review is [Sip92]. See [Wig07] for a more recent introduction. Most books on theoretical computer science in general, and complexity theory in particular, also contain accounts of the problem and attempts made to resolve it. See the books [Sip97] and [BDG95] for standard references.

Preliminaries and Notation

Treatments of standard notions from complexity theory, such as definitions of the complexity classes P, NP, PSPACE, and notions of reductions and completeness for complexity classes, etc., may be found in [Sip97, BDG95].

Our work will span various developments in three broad areas. While we have endeavored to be relatively complete in our treatment, we feel it would be helpful to provide standard textual references for these areas, in the order in which they appear in the work. Preliminaries from logic, such as notions of structure, vocabulary, first order language, models, etc., may be obtained from any standard text on logic such as [Hod93]. In particular, we refer to [EF06, Lib04] for excellent treatments of finite model theory and [Imm99] for descriptive complexity. Standard references for graphical models include [Lau96] and the more recent [KF09]. For an engaging introduction, please see [Bis06, Ch. 8]. For an early treatment in statistical mechanics of Markov random fields and Gibbs distributions, see [KS80]. An earlier text is [MPV87]. For a treatment of the statistical physics approach to random CSPs, we recommend [MM09]. Additional references to results will be provided within the chapters.

1.1 Synopsis of Proof

This proof requires a convergence of ideas and an interplay of principles that span several areas within mathematics and physics. Given this, we felt that it would be beneficial to explain the various stages of the proof, and highlight their interplay. The technical details of each stage are described in subsequent chapters. This represents the majority of the effort that went into constructing the proof.

Consider a system of n interacting variables such as is ubiquitous in mathematical sciences. For example, these may be the variables in a k-SAT instance that interact with each other through the clauses present in the k-SAT formula, or n Ising spins that interact with each other in a ferromagnet. Through their interaction, variables exert an influence on each other, and affect the values each other may take. The proof centers on the study of logical and algorithmic constructs where such complex interactions have "simple" descriptions. For ease of presentation, we will assume our variables are binary.

What constitutes a simple description of the interaction of n variables? The number of independent parameters required to specify the joint distribution is a measure of the complexity of interactions between the covariates. There are two components to this. The first measures correlations, and the second measures "ampleness" under those correlations. A distribution over n covariates is defined to be ample when it is supported on c^n, c > 1, points. This is best explained with two examples.

Consider first the uniform distribution over all binary pairs {(0, 0), (0, 1), (1, 0), (1, 1)}. There is no correlation between the two variables in this distribution. They are independent. Both can be specified with just two parameters. In the first example, the two parameters are the probability of the first variate and the probability of the second variate taking the value 1. With this much information, we can specify the joint distribution, since the variates are independent. Consider next the distribution over 5 covariates which is uniformly supported only on (0, 0, 0, 0, 0) and (1, 1, 1, 1, 1). In the second example, the covariates are tightly correlated, and we again need two parameters to specify the distribution — namely, the two points on which it is supported — but the distribution is not "ample". Though initially these two distributions appear quite different, there is a commonality. Though both distributions have simple descriptions, the reasons are very different.

We will study distributions on n covariates that require only 2^{poly(log n)} parameters to specify. We will call such distributions poly(log n)-parametrizable. We will see that such distributions are at the heart of polynomial time computability. Conversely, in hard phases of constraint satisfaction problems such as k-SAT, the space of solutions is both correlated and ample. This causes all polynomial time algorithms to fail on them.
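The parameter counts in the two examples above can be made concrete in a few lines. The sketch below is our own illustration (the variable names and the choice p1 = p2 = 0.5 are ours, not the paper's): it builds both distributions and compares them against the 2^n - 1 parameters a statistically typical joint distribution on n binary covariates would need.

```python
from itertools import product

# Example 1: two independent bits. The joint distribution over {0,1}^2 is
# determined by just two parameters: p1 = P(X1 = 1) and p2 = P(X2 = 1).
p1, p2 = 0.5, 0.5
indep = {(x1, x2): (p1 if x1 else 1 - p1) * (p2 if x2 else 1 - p2)
         for x1, x2 in product([0, 1], repeat=2)}

# Example 2: five covariates supported uniformly on just two points.
# Again two "parameters" suffice -- the two support points themselves.
support = [(0, 0, 0, 0, 0), (1, 1, 1, 1, 1)]
corr = {x: (0.5 if x in support else 0.0) for x in product([0, 1], repeat=5)}

# A statistically typical joint distribution on n binary covariates needs
# 2^n - 1 independent parameters; both examples above need far fewer.
n = 5
typical_params = 2 ** n - 1   # 31 for n = 5

# "Ampleness" compares the support size against c^n growth: example 1 is
# supported on all 4 = 2^2 points, example 2 on only 2 of the 32 points.
ample_1 = len([x for x, p in indep.items() if p > 0])
ample_2 = len([x for x, p in corr.items() if p > 0])
```

Neither example is ample in an interesting way at this size, of course; the point is only that both joint distributions are pinned down by two numbers each, while a typical 5-variate binary distribution needs 31.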


A distribution is simple to describe if there is either independence between the variates (as was the case in our first example) or limited support of the distribution (as was the case in our second example). We call the first case a range limited interaction, because variates interact with a limited range of other variates. The second case is called value limited, since the number of joint values the variates can take is limited. The common feature underlying both cases is that the distribution has a very economical parametrization as compared to "true" joint distributions (more precisely, statistically typical joint distributions) on n covariates, which require O(2^n) parameters to specify. Thus, we wish to study such distributions, and will consider both the cases of range and value limited interactions.

At this point, we visit the topic of graphical interaction models and conditional independence, which is a manifestation of range limited interactions. While complete independence between variates in a complex system is rare, conditional independence between blocks of variables is fairly frequent. We will see that factorization into conditionally independent pieces manifests in terms of economical parametrizations of the joint distribution, and that graphical models offer us a way to measure the size of these interactions. The factorization of interactions can be represented by a corresponding factorization of the joint distribution of the variables over the space of configurations of the n variables, subject to the constraints of the problem. It has long been realized in the statistics and physics communities that certain multivariate distributions decompose into the product of a few types of factors, with each factor itself having only a few variables. Such a factorization of joint distributions into simpler factors can often be represented by graphical models whose vertices index the variables.
A factorization of the joint distribution according to the graph implies that the interactions between variables can be factored into a sequence of “local interactions” between vertices that lie within neighborhoods of each other. Consider the case of an undirected graphical model. The factoring of interactions may be stated in terms of either a Markov property, or a Gibbs property


with respect to the graph. Specifically, the local Markov property of such models states that the distribution of a variable is only dependent directly on that of its neighbors in an appropriate neighborhood system. Of course, two variables arbitrarily far apart can influence each other, but only through a sequence of successive local interactions. The global Markov property for such models states that when two sets of vertices are separated by a third, this induces a conditional independence on variables corresponding to these sets of vertices, given those corresponding to the third set. On the other hand, the Gibbs property of a distribution with respect to a graph asserts that the distribution factors into a product of potential functions over the maximal cliques of the graph. Each potential captures the interaction between the set of variables that form the clique. The Hammersley-Clifford theorem states that a positive distribution having the Markov property with respect to a graph must have the Gibbs property with respect to the same graph. The condition of positivity is essential in the Hammersley-Clifford theorem for undirected graphs. However, it is not required when the distribution satisfies certain directed models. In that case, the Markov property with respect to the directed graph implies that the distribution factorizes into local conditional probability distributions (CPDs). Furthermore, if the model is a directed acyclic graph (DAG), we can obtain the Gibbs property with respect to an undirected graph constructed from the DAG by a process known as moralization. We will return to the directed case shortly. Chapter 2 develops the principles underlying the framework of graphical models. We will not use any of these models in particular, but construct another directed model on a larger product space that utilizes these principles and tailors them to the case of least fixed point logic, which we turn to next. 
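The Markov-Gibbs correspondence sketched above can be checked by brute force on a toy example. Below, a distribution is defined in Gibbs form as a product of clique potentials on the path graph A - B - C (the potential values are arbitrary positive numbers of our own choosing; positivity is what Hammersley-Clifford requires), and the global Markov property A ⊥ C | B, with B the separator, is verified numerically. This is an illustrative sketch, not part of the paper's formal development.

```python
from itertools import product

# Clique potentials on the two maximal cliques {A,B} and {B,C} of the
# path graph A - B - C. The values are arbitrary positive numbers.
psi_ab = {(0, 0): 2.0, (0, 1): 1.0, (1, 0): 0.5, (1, 1): 3.0}
psi_bc = {(0, 0): 1.0, (0, 1): 4.0, (1, 0): 2.0, (1, 1): 1.0}

# Gibbs form: P(a, b, c) proportional to psi_ab(a, b) * psi_bc(b, c).
unnorm = {(a, b, c): psi_ab[(a, b)] * psi_bc[(b, c)]
          for a, b, c in product([0, 1], repeat=3)}
Z = sum(unnorm.values())
P = {x: v / Z for x, v in unnorm.items()}

def cond(p_joint, b):
    """P(a, c | B = b) as a dict, plus the marginals P(a | b) and P(c | b)."""
    pb = sum(v for (a, bb, c), v in p_joint.items() if bb == b)
    pac = {(a, c): sum(v for (aa, bb, cc), v in p_joint.items()
                       if aa == a and bb == b and cc == c) / pb
           for a, c in product([0, 1], repeat=2)}
    pa = {a: pac[(a, 0)] + pac[(a, 1)] for a in [0, 1]}
    pc = {c: pac[(0, c)] + pac[(1, c)] for c in [0, 1]}
    return pac, pa, pc

# B separates A from C, so the Gibbs factorization forces A and C to be
# conditionally independent given B: P(a, c | b) = P(a | b) P(c | b).
ok = all(abs(cond(P, b)[0][(a, c)] - cond(P, b)[1][a] * cond(P, b)[2][c]) < 1e-9
         for b in [0, 1] for a in [0, 1] for c in [0, 1])
```

The check succeeds for any positive choice of the two potentials, which is exactly the Gibbs-implies-Markov direction of the correspondence on this graph.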
At this point, we change to the setting of finite model theory. Finite model theory is a branch of mathematical logic that has provided machine independent characterizations of various important complexity classes including P, NP, and PSPACE. In particular, the class of polynomial time computable queries on successor structures has a precise description — it is the class of queries


expressible in the logic FO(LFP), which extends first order logic with the ability to compute least fixed points of positive first order formulae. Least fixed point constructions iterate an underlying positive first order formula, thereby building up a relation in stages. We take a geometric picture of a monadic LFP computation. Initially the relation to be built is empty. At the first stage, certain elements, whose types satisfy the first order formula, enter the relation. This changes the neighborhoods of these elements, and therefore in the next stage, other elements (whose neighborhoods have been thus changed in the previous stages) become eligible for entering the relation. The positivity of the formula implies that once an element is in the relation, it cannot be removed, and so the iterations reach a fixed point in a polynomial number of steps. Importantly from our point of view, the positivity and the stage-wise nature of LFP mean that the computation has a directed representation on a graphical model that we will construct. Recall at this stage that distributions over directed models enjoy factorization even when they are not defined over the entire space of configurations. We may interpret this as follows: monadic LFP relies on the assumption that variables that are highly entangled with each other due to constraints can be disentangled in a way that they now interact with each other through conditional independencies induced by a certain directed graphical model construction. Of course, an element does influence others arbitrarily far away, but only through a sequence of such successive local and bounded interactions. The reason LFP computations terminate in polynomial time is analogous to the notions of conditional independence that underlie efficient algorithms on graphical models having sufficient factorization into local interactions.
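The stage-wise picture of a monadic LFP computation can be illustrated with the standard example of reachability, where the positive first order formula says "x is the source, or some in-neighbor of x is already in the relation". The code below is our own illustrative sketch (the graph and the function name are ours): positivity means the stages grow monotonically, so the computation stabilizes within at most n iterations.

```python
# Stages of a monadic least fixed point: R is initially empty; at each stage
# an element enters R if its local neighborhood satisfies the (positive)
# first order condition "x is the source, or some in-neighbor is in R".
# Positivity means elements are never removed, so the stages increase
# monotonically and reach a fixed point after at most n iterations.

def lfp_reachable(n, edges, source):
    R = set()               # stage 0: the empty relation
    stages = [set(R)]
    while True:
        new = {x for x in range(n)
               if x == source or any((y, x) in edges and y in R
                                     for y in range(n))}
        if new == R:        # least fixed point reached
            return R, stages
        R = new
        stages.append(set(R))

edges = {(0, 1), (1, 2), (2, 3), (4, 0)}   # a small directed graph
fixed_point, stages = lfp_reachable(5, edges, source=0)
```

On this graph the relation grows by one element per stage along the path 0 -> 1 -> 2 -> 3, and vertex 4 (which has no path from the source) never enters.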
In order to apply this picture in full generality to all LFP computations, we use the simultaneous induction lemma to push all simultaneous inductions into nested ones, and then employ the transitivity theorem to encode nested fixed points as sections of a single relation of higher arity. We then see that this is the case of a value limited interaction between O(n) variates. Namely, although n variates interact with each other, they do not take c^n joint values. Building the machinery that can precisely map all these cases to the picture of either factorization into range limited or value limited interactions is the subject of Chapter 5.

The preceding insights now direct us to the setting necessary in order to separate P from NP. We need a regime of NP-complete problems where interactions between variables have the following two properties.

1. They are so "dense" that they cannot be factored through the bottleneck of the local and bounded properties of first order logic that limit each stage of LFP computation.
2. The distribution is ample. Namely, it takes c^n joint values.

In other words, we have ample, highly correlated distributions having no factorization into conditionally independent pieces (remember the value limited case is already ruled out since the distribution is ample). Intuitively, this should happen when each variable has to simultaneously satisfy constraints involving an extensive (O(n)) fraction of the variables in the problem.

In search of regimes where such situations arise, we turn to the study of ensemble random k-SAT, where the properties of the ensemble are studied as a function of the clause density parameter. We will now add ideas from this field, which lies on the intersection of statistical mechanics and computer science, to the set of ideas in the proof. In the past two decades, the phase changes in the solution geometry of random k-SAT ensembles as the clause density increases have gathered much research attention. The 1RSB ansatz of statistical mechanics says that the space of solutions of random k-SAT shatters into exponentially many clusters of solutions when the clause density is sufficiently high. This phase is called d1RSB (1-Step Dynamic Replica Symmetry Breaking) and was conjectured by physicists as part of the 1RSB ansatz. It has since been rigorously proved for high values of k. It demonstrates the properties of high correlation between large sets of variables that we will need. Namely, blocks of n variables are instantiated c^n distinct ways under these strong correlations.

As the clause density is increased, this phase exhibits the emergence of cores, which are sets of C clauses all of whose variables lie in a set of size C (this actually forces C to be O(n)). Furthermore, the variables in these cores "freeze." Namely, they take the same value throughout the cluster. Changing the value of a variable within a cluster necessitates changing O(n) other variables in order to arrive at another satisfying solution, which would be in a different cluster. Physicists think of this as an "energy gap" between the clusters. Furthermore, as the clause density is increased towards the SAT-unSAT threshold, each cluster collapses steadily towards a single solution that is maximally far apart from every other cluster. Finally, as the clause density increases above the SAT-unSAT threshold, the solution space vanishes, and the underlying instance of SAT is no longer satisfiable.

Such phases are precisely the ones that we need, since they possess the following two properties.

1. Due to strong O(n) correlations that cannot be factored through conditional independencies, they resist attack by local and bounded first order stages of a monadic LFP computation.
2. Due to their ampleness, which arises from their instantiations in exponentially many clusters, they resist attack by complex fixed points that produce value limited distributions.

We should stress that the picture described above is known to hold in the case of random k-SAT only for k ≥ 9. Namely, the "true" d1RSB phase arises in random k-SAT for k ≥ 9 as the clause density rises above (2^k/k) ln k. For lower values of k, such as k = 3, there is empirical evidence that this picture does not hold. Therefore, our proof does not say anything about the efficacy of various algorithms for 3-SAT. We specifically prove that the d1RSB phase is out of reach for polynomial time algorithms. Since we need all the known properties of the d1RSB phase, and this phase is only reached at k ≥ 9, we will work in this regime. We reproduce the rigorously proved picture of the 1RSB ansatz that we will need in Chapter 6. In Chapter 7, we make a brief excursion into the random graph theory of the factor graph ensembles underlying random k-SAT.
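The notions of clusters and frozen variables can be made concrete by brute force on a toy formula — far from the d1RSB regime, which concerns random k-SAT with k ≥ 9 and large n, but enough to illustrate the definitions. The formula, the helper names, and the convention of clustering solutions by Hamming-distance-1 connectivity below are our own illustrative choices.

```python
from itertools import product

def sat_solutions(n_vars, clauses):
    """Brute-force the satisfying assignments of a CNF. A clause is a list
    of signed literals: +i means variable i is true, -i means variable i is
    false (variables numbered from 1)."""
    sols = []
    for bits in product([0, 1], repeat=n_vars):
        if all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c)
               for c in clauses):
            sols.append(bits)
    return sols

def clusters(sols):
    """Connected components of the solution space under Hamming distance 1."""
    remaining, out = set(sols), []
    while remaining:
        comp, frontier = set(), {remaining.pop()}
        while frontier:
            s = frontier.pop()
            comp.add(s)
            near = {t for t in remaining
                    if sum(a != b for a, b in zip(s, t)) == 1}
            remaining -= near
            frontier |= near
        out.append(comp)
    return out

def frozen(cluster):
    """Indices of variables taking a single value throughout the cluster."""
    n = len(next(iter(cluster)))
    return [i for i in range(n) if len({s[i] for s in cluster}) == 1]

# (x1 or x2) and (not x1 or not x2) forces x1 != x2, with x3 left free:
# the solution space splits into two clusters, and within each cluster the
# variables x1 and x2 are frozen while x3 varies.
cls = [[1, 2], [-1, -2]]
sols = sat_solutions(3, cls)
comps = clusters(sols)
```

In the real d1RSB phase the analogous statements are that the number of clusters is exponential, the frozen sets are extensive, and inter-cluster Hamming distances are O(n); none of that is visible at this toy scale.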

Here, we obtain results that asymptotically almost surely upper bound the size of the largest cliques in the neighborhood systems on the Gaifman graphs that we study later when we build models for the range limited interactions that occur during monadic LFP.

Finally, in Chapter 8, we pull all the threads and machinery together. First, we encode k-SAT instances as queries on structures over a certain vocabulary in a way that LFP captures all polynomial time computable queries on them. We then set up the framework whereby we can generate distributions of solutions to each instance by asking a purported LFP algorithm for k-SAT to extend partial assignments on variables to full satisfying assignments. Next, we embed the space of covariates into a larger product space which allows us to "disentangle" the flow of information during a LFP computation. We call this the Element-Neighborhood-Stage Product, or ENSP model. This model is only polynomially larger than the structure itself. This allows us to study the computations performed by the LFP with various initial values under a directed graphical model. The distribution of solutions generated by LFP then is a mixture of distributions, each of which factors according to an ENSP. At this point, we wish to measure the growth of independent parameters of distributions of solutions whose embeddings into the larger product space factor over the ENSP. In order to do so, we utilize the following properties for range limited models.

1. The directed nature of the model that comes from properties of LFP.
2. The properties of neighborhoods that are obtained by studies on random graph ensembles, specifically that neighborhoods that occur during the LFP computation are of size poly(log n) asymptotically almost surely in the n → ∞ limit.
3. The locality and boundedness properties of FO that put constraints upon each individual stage of the LFP computation. These provide us with bounds on the largest irreducible interactions between variables during the various stages of a LFP computation.
4. Simple properties of LFP, such as the closure ordinal being a polynomial in the structure size.

The crucial property that allows us to analyze mixtures of range limited distributions that factor according to some ENSP is that we can parametrize the distribution using potentials on cliques of its moralized graph that are of size at most poly(log n), thereby giving us poly(log n)-parametrization. In particular, in exponentially numerous mixtures of range limited models, we would have conditionally independent variation between blocks of poly(log n) variables. This means that when the mixture is exponentially numerous, we will see features that reflect the poly(log n) factor size of the conditionally independent parametrization. In other words, solutions for k-SAT that are constructed using range limited LFP will display aggregate behavior that reflects that they are constructed out of "building blocks" of size poly(log n). This behavior will manifest when exponentially many solutions are generated by the LFP construction.

Next, we come to value limited models. Here, interactions are of size O(n), but they are limited to only c^{poly(log n)} values. We show how to deal with mixtures of value limited models. We build a technique that merges various O(n) potentials that are poly(log n)-parametrizable into a single potential that is also poly(log n)-parametrizable and covers the entire graphical model (that has poly(n) variables).

Now we close the loop and show that a distribution of solutions for k-SAT constructed by any purported LFP algorithm (monadic or complex) would not have enough parameters to describe the known picture of k-SAT in the d1RSB phase for k ≥ 9 — namely, the presence of extensive frozen variables in exponentially many clusters, with Hamming distance between the clusters being O(n), causing the Hamming distance between solutions to be of this order as well. The case of value limited LFP also leads to a contradiction, since it would be unable to explain the exponentially many cluster instantiations of cores that are present in the d1RSB phase. This shows that LFP cannot express the satisfiability query in the d1RSB phase for high enough k, and separates P from NP. This also explains the empirical observation that all known polynomial time algorithms fail in the d1RSB phase for high values of k. It also completes this picture, since it says that extensive O(n) correlations that (a) cannot factor through conditional independencies and (b) are amply instantiated are the source of failure of polynomial time algorithms, and it establishes on rigorous principles the physics intuition about the onset of extensive long range correlations in the d1RSB phase that causes all known polynomial time algorithms to fail.

2. Interaction Models and Conditional Independence

Systems involving a large number of variables interacting in complex ways are ubiquitous in the mathematical sciences. These interactions induce dependencies between the variables. Because of the presence of such dependencies in a complex system of interacting variables, it is not often that one encounters independence between variables. However, one frequently encounters conditional independence between sets of variables. Both independence and conditional independence among sets of variables have been standard objects of study in probability and statistics. Speaking in terms of algorithmic complexity, one often hopes that by exploiting the conditional independence between certain sets of variables, one may avoid the cost of enumeration of an exponential number of hypotheses in evaluating functions of the distribution that are of interest.

2.1 Conditional Independence

We first fix some notation. Random variables will be denoted by upper case letters such as X, Y, Z. The values a random variable takes will be denoted by the corresponding lower case letters, such as x, y, z. Throughout this work, we assume our random variables to be discrete unless stated otherwise. We may also assume that they take values in a common finite state space, which we usually denote by Λ following physics convention. We denote the probability mass functions of discrete random variables X, Y, Z by PX(x), PY(y), PZ(z) respectively. Similarly, PX,Y(x, y) will denote the joint mass of (X, Y), and so on.

We drop subscripts on the P when it causes no confusion. We freely use the term "distribution" for the probability mass function. Recall that X is independent of Y if P(x, y) = P(x)P(y). The intuitive definition of the conditional independence of X from Y given Z is that the conditional distribution of X given (Y, Z) is equal to the conditional distribution of X given Z alone. This means that once the value of Z is given, no further information about the value of X can be extracted from the value of Y. The asymmetric version, which says that the information contained in Y is superfluous to determining the value of X once the value of Z is known, may be represented as

P(x | y, z) = P(x | z).

This is an asymmetric definition, and can be replaced by the following symmetric definition.

Definition 2.1. Let notation be as above. X is conditionally independent of Y given Z, written X ⊥ Y | Z, if

P(x, y | z) = P(x | z)P(y | z).

The notion of conditional independence is central to our proof. It pervades statistical theory [Daw79, Daw80], and several notions from statistics may be recast in this language.

Example 2.2. The notion of sufficiency may be seen as the presence of a certain conditional independence [Daw79]. A sufficient statistic T in the problem of parameter estimation is one which renders the estimate of the parameter independent of any further information from the sample X. Thus, if Θ is the parameter to be estimated, then T is a sufficient statistic if

P(θ | x) = P(θ | t).

Thus, all there is to be gained from the sample in terms of information about Θ is already present in T alone. In particular, if Θ is a posterior that is being computed by Bayesian inference, then the above relation says that the posterior depends on the data X through the value of T alone. Clearly, such a statement would lead to a reduction in the complexity of inference.

2.2 Conditional Independence in Undirected Graphical Models

Graphical models offer a convenient framework and methodology to describe and exploit conditional independence between sets of variables in a system. There are, broadly, two kinds of graphical models: directed and undirected. We first consider the case of undirected models. In general, we will consider graphs G = (V, E) whose n vertices index a set of n random variables (X1, ..., Xn). The random variables all take their values in a common state space Λ. The random vector (X1, ..., Xn) then takes values in a configuration space Ωn = Λ^n. We will denote values of the random vector (X1, ..., Xn) simply by x = (x1, ..., xn). The notation X_{V\I} will denote the set of variables excluding those whose indices lie in the set I. Let P be a probability measure on the configuration space. We will study the interplay between conditional independence properties of P and its factorization properties. Fig. 2.1 illustrates an undirected graphical model with ten variables.

Random Fields and Markov Properties

Graphical models are very useful because they allow us to read off conditional independencies of the distributions that satisfy these models from the graph itself. One may think of the graphical model as representing the family of distributions whose law fulfills the conditional independence statements made by the graph. A member of this family may satisfy any number of additional conditional independence statements, but not less than those prescribed by the graph. Recall that we wish to study the relation between conditional independence of a distribution with respect to a graphical model, and its factorization.

the sites are vertices on a graph. The vertices in set A are separated from those in set B by set C. with respect to the graph. We will often be interested in homogeneous neighborhood systems of S on a graph in which. one may write increasingly stringent conditional independence properties that a set of random variables satisfying a graphical model may possess. Given a set of variables S known as sites. a neighborhood system NS on S is a collection of subsets {Ni : 1 ≤ i ≤ n} indexed by the sites in S that satisfy 1. In order to state these.3. the relationship of being a neighbor is mutual: si ∈ Nj ⇔ sj ∈ Ni . For random variables to satisfy the global Markov property relative to this graphical model. INTERACTION MODELS AND CONDITIONAL INDEPENDENCE 18 A C B Figure 2. and of separation. a site is not a neighbor to itself (this also means there are no self-loops in the induced graph): si ∈ Ni . ⊥B Towards that end. and / 2. A⊥ | C. the corresponding sets of random variables must be conditionally independent. for 18 .2. Definition 2. we first define two graph theoretic notions — those of a general neighborhood system. Namely.1: An undirected graphical model. and the neighborhood system Ni is the set of neighbors of vertex si on the graph. In many applications. Each vertex represents a random variable.

. Definition 2. the neighborhood of a site is simply the set of sites that lie in the radius r ball around that site. 19 Namely. . The set C is said to separate A and B if every path from a vertex in A to a vertex in B must pass through C. The distribution Xi (for every i) is conditionally independent of the rest of the graph given just the variables that lie in the neighborhood of the vertex. We will study the following two Markov properties. sj ) ≤ r}. ⊥B We are interested in distributions that do satisfy such properties. The local Markov property. The global Markov property. C be three disjoint subsets of the vertices V of a graph G. C of V such that C separates A from B in the graph. . We will need to use the general case. .5. For any disjoint subsets A. Definition 2. in such neighborhood systems. Xn ) and the vector (X1 . 2. and will examine what effect these Markov properties have on the factorization of the 19 . . 1. . it holds that A⊥ | C. Now we return to the case of the vertices indexing random variables (X1 . . . We will use the term “variable” freely in place of “site” when we move to logic. In other words. Let A. and their relation to factorization of the distribution. INTERACTION MODELS AND CONDITIONAL INDEPENDENCE each si ∈ S.2. B. Note that a nearest neighbor system that is often used in physics is just the case of r = 1. where r will be determined by considerations from logic that will be introduced in the next two chapters. B.4. A probability measure P on Ωn is said to satisfy certain Markov properties with respect to the graph when it satisfies the appropriate conditional independencies with respect to that graph. the influence that variables exert on any given variable is completely described by the influence that is exerted through the neighborhood variables alone. the neighborhood Ni is defined as Gi := {sj ∈ S : d(si . Xn ) taking values in a configuration space Ωn .
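Separation in the sense of Definition 2.4 is easy to test mechanically: remove C from the graph and check whether A can still reach B. A minimal sketch (the graph and sets are illustrative, not from the text):

```python
from collections import deque

def separates(adj, A, B, C):
    """Return True if every path from A to B in the undirected graph
    `adj` must pass through a vertex of C (Definition 2.4)."""
    blocked = set(C)
    seen = set(A) - blocked
    queue = deque(seen)
    while queue:                      # BFS that never enters C
        u = queue.popleft()
        if u in B:
            return False              # reached B while avoiding C
        for v in adj[u]:
            if v not in blocked and v not in seen:
                seen.add(v)
                queue.append(v)
    return True

# A 6-cycle 0-1-2-3-4-5-0: {0} and {3} are separated only by a set
# that cuts both arcs of the cycle.
adj = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
print(separates(adj, {0}, {3}, {1, 5}))   # True
print(separates(adj, {0}, {3}, {1}))      # False: the path 0-5-4-3 avoids 1
```

Under the global Markov property, each `True` answer of this check licenses one conditional independence statement A ⊥ B | C.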

In most applications, this is done in the context of Markov random fields. We motivate a Markov random field with the simple example of a Markov chain {X_n : n ≥ 0}. The Markov property of this chain is that any variable in the chain is conditionally independent of all other variables in the chain given just its immediate neighbors:

X_n ⊥ {X_k : k ∉ {n − 1, n, n + 1}} | X_{n−1}, X_{n+1}.

A Markov random field is the natural generalization of this picture to higher dimensions and more general neighborhood systems.

Definition 2.6. The collection of random variables X_1, ..., X_n is a Markov random field with respect to a neighborhood system on G if and only if the following two conditions are satisfied.

1. The distribution is positive on the space of configurations: P(x) > 0 for x ∈ Ω_n.
2. The distribution at each vertex is conditionally independent of all other vertices given just those in its neighborhood: P(X_i | X_{V\i}) = P(X_i | X_{N_i}).

These local conditional distributions are known as the local characteristics of the field. The second condition says that Markov random fields satisfy the local Markov property with respect to the neighborhood system. Thus, we can think of interactions between variables in Markov random fields as being characterized by "piecewise local" interactions. Namely, the influence of far away vertices must "factor through" local interactions. This may be interpreted as: the influence of far away variables is limited to that which is transmitted through the interspersed intermediate variables — there is no "direct" influence of far away vertices beyond that which is factored through such intermediate interactions.

However, Markov random fields satisfy the global Markov property as well.

Theorem 2.7. Markov random fields with respect to a neighborhood system satisfy the global Markov property with respect to the graph constructed from the neighborhood system.

Thus, through such local interactions, a vertex may influence any other arbitrarily far away. Notice though, that this is a considerably simpler picture than having to consult the joint distribution over all variables for all interactions, for here we need only know the local joint distributions and use these to infer the correlations of far away variables. We shall see in later chapters that this picture, with some additional caveats, is at the heart of polynomial time computations.

Note the positivity condition on Markov random fields. With this positivity condition, the complete set of conditionals given by the local characteristics of a field determine the joint distribution [Bes74].

Markov random fields originated in statistical mechanics [Dob68], where they model probability measures on configurations of interacting particles, such as Ising spins. See [KS80] for a treatment that focusses on this setting. Their local properties were later found to have applications to analysis of images and other systems that can be modelled through some form of spatial interaction. This field started with [Bes74] and came into its own with [GG84], which exploited the Markov-Gibbs correspondence that we will deal with shortly. See also [Li09].

Note that Markov random fields are characterized by a local condition — namely, their local conditional independence characteristics. We now describe another random field that has a global characterization — the Gibbs random field.

2.2.1 Gibbs Random Fields and the Hammersley-Clifford Theorem

We are interested in how the Markov properties of the previous section translate into factorization of the distribution.
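The Markov chain example above can be verified by brute-force conditioning. A small sketch (the transition probabilities below are arbitrary, chosen only for illustration):

```python
import itertools

# A 4-step binary Markov chain: p(x1) * p(x2|x1) * p(x3|x2) * p(x4|x3),
# with arbitrary illustrative transition matrices.
p1 = [0.3, 0.7]
T = [[[0.6, 0.4], [0.2, 0.8]],    # one p(x_{k+1} | x_k) matrix per step
     [[0.5, 0.5], [0.9, 0.1]],
     [[0.7, 0.3], [0.4, 0.6]]]

def joint(x):
    p = p1[x[0]]
    for k in range(3):
        p *= T[k][x[k]][x[k + 1]]
    return p

def cond(i, xi, given):
    """P(X_i = xi | X_j = v for (j, v) in `given`), by enumeration."""
    num = sum(joint(x) for x in itertools.product([0, 1], repeat=4)
              if x[i] == xi and all(x[j] == v for j, v in given.items()))
    den = sum(joint(x) for x in itertools.product([0, 1], repeat=4)
              if all(x[j] == v for j, v in given.items()))
    return num / den

# Local Markov property: conditioning X2 on its neighbours X1, X3 already
# determines its distribution; further adding the far variable X4 changes nothing.
a = cond(1, 1, {0: 0, 2: 1, 3: 0})
b = cond(1, 1, {0: 0, 2: 1})
print(abs(a - b) < 1e-9)   # True
```

The same enumeration idea (conditioning equals conditioning on neighbors only) is what the local characteristics of a Markov random field express in general.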

Definition 2.8. A Gibbs random field (or Gibbs distribution) with respect to a neighborhood system N_G on the graph G is a probability measure on the set of configurations Ω_n having a representation of the form

P(x_1, ..., x_n) = (1/Z) exp(−U(x)/T),

where

1. Z is the partition function and is a normalizing factor that ensures that the measure sums to unity:

   Z = Σ_{x ∈ Ω_n} exp(−U(x)/T).

   Evaluating Z explicitly is hard in general since it is a summation over each of the |Λ|^n configurations in the space.
2. U(x) is the "energy" of configuration x and takes the following form as a sum over the set of cliques C of G:

   U(x) = Σ_{c ∈ C} V_c(x).

3. The functions V_c, c ∈ C, are the clique potentials, such that the value of V_c(x) depends only on the coordinates of x that lie in the clique c. These capture the interactions between vertices in the clique.
4. T is a constant known as the "temperature" that has origins in statistical mechanics. It controls the sharpness of the distribution. At high temperatures, the distribution tends to be uniform over the configurations. At low temperatures, it tends towards a distribution that is supported only on the lowest energy states.

Thus, a Gibbs random field has a probability distribution that factorizes into its constituent "interaction potentials." This says that the probability of a configuration depends only on the interactions that occur between the variables, broken up into cliques. For example, let us say that in a system, each particle interacts with only 2 other particles at a time (if one prefers to think in terms of statistical mechanics); then the energy of each state would be expressible as a sum of potentials, each of which has just three variables in its support. Thus, the Gibbs factorization carries in it a faithful representation of the underlying interactions between the particles.

Definition 2.9. The support of the potential V_c is the cardinality of the clique c. The degree of the distribution P, denoted by deg(P), is the maximum of the supports of the potentials. In other words, the degree of the distribution is the size of the largest clique that occurs in its factorization.

One may immediately see that the degree of a distribution is a measure of the complexity of interactions in the system, since it is the size of the largest set of variables whose interaction cannot be split up in terms of smaller interactions between subsets. One would expect this to be the hurdle in efficient algorithmic applications. This type of factorization obviously yields a "simpler description" of the distribution. The precise notion is that of the number of independent parameters it takes to specify the distribution: factorization into conditionally independent interactions of scope k means that we can specify the distribution in O(γ^k) parameters rather than O(γ^n). We will return to this at the end of this chapter.

The Hammersley-Clifford theorem relates the two types of random fields.

Theorem 2.10 (Hammersley-Clifford). X is a Markov random field with respect to a neighborhood system N_G on the graph G if and only if it is a Gibbs random field with respect to the same neighborhood system.

The theorem appears in the unpublished manuscript [HC71] and uses a certain "blackening algebra" in the proof. The first published proofs appear in [Bes74] and [Mou74]. Note that the condition of positivity on the distribution (which is part of the definition of a Markov random field) is essential to state the theorem in full generality. The following example from [Mou74] shows that relaxing this condition allows us to build distributions having the Markov property, but not the Gibbs property.
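The Markov side of the correspondence can be checked numerically for a tiny Gibbs field. A sketch with made-up pairwise potentials on the 4-cycle (all numbers are illustrative): since {X1, X3} separates X0 from X2 on the cycle, the Gibbs form forces P(X0 | X1, X2, X3) not to depend on X2.

```python
import itertools, math, random

random.seed(1)
# Gibbs distribution on the 4-cycle 0-1-2-3-0 with random pairwise
# clique potentials V_c over binary spins (T = 1).
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
V = {e: [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)]
     for e in edges}

def weight(x):                      # exp(-U(x)) with U(x) = sum of V_c(x)
    U = sum(V[(i, j)][x[i]][x[j]] for i, j in edges)
    return math.exp(-U)

Z = sum(weight(x) for x in itertools.product([0, 1], repeat=4))
P = {x: weight(x) / Z for x in itertools.product([0, 1], repeat=4)}

def cond(i, xi, given):
    num = sum(p for x, p in P.items()
              if x[i] == xi and all(x[j] == v for j, v in given.items()))
    den = sum(p for x, p in P.items()
              if all(x[j] == v for j, v in given.items()))
    return num / den

# Global Markov property: the conditional of X0 given all others is the
# same whether the far vertex X2 is 0 or 1.
a = cond(0, 1, {1: 0, 3: 1, 2: 0})
b = cond(0, 1, {1: 0, 3: 1, 2: 1})
print(abs(a - b) < 1e-9)   # True
```

Note that positivity is automatic here: every configuration has weight exp(−U) > 0, which is exactly the condition the counterexample below violates.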

Example 2.11. Consider a system of four binary variables {X_1, X_2, X_3, X_4}. Each of the following combinations has probability 1/8, while the remaining combinations are disallowed:

(0, 0, 0, 0), (1, 0, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
(0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 1, 1), (1, 1, 1, 1).

We may check that this distribution has the global Markov property with respect to the 4 vertex cycle graph. Namely, we have X_1 ⊥ X_3 | X_2, X_4 and X_2 ⊥ X_4 | X_1, X_3. However, the distribution does not factorize into Gibbs potentials.

2.3 Factor Graphs

Factor graphs are bipartite graphs that express the decomposition of a "global" multivariate function into "local" functions of subsets of the set of variables. They are a class of undirected models. The two types of nodes in a factor graph correspond to variable nodes and factor nodes. See Fig. 2.2.

[Figure 2.2: A factor graph showing the three clause 3-SAT formula (X1 ∨ X4 ∨ ¬X6) ∧ (¬X1 ∨ X2 ∨ ¬X3) ∧ (X4 ∨ X5 ∨ X6). A dashed line indicates that the variable appears negated in the clause.]

The distribution modelled by this factor graph will show a factorization as follows:

p(x_1, ..., x_6) = (1/Z) φ_1(x_1, x_4, x_6) φ_2(x_1, x_2, x_3) φ_3(x_4, x_5, x_6),    (2.1)

where

Z = Σ_{x_1, ..., x_6} φ_1(x_1, x_4, x_6) φ_2(x_1, x_2, x_3) φ_3(x_4, x_5, x_6).    (2.2)

One should keep in mind that this factorization is (in general) far from being a factorization into conditionals and does not express conditional independence. Thus, the factorization above is not the one we are seeking — it does not imply a series of conditional independencies in the joint distribution. In general, these factors do not represent conditionally independent pieces of the joint distribution. The system must embed each of these factors in ways that are global and not obvious from the factors. This global information is contained in the partition function.

Factor graphs offer a finer grained view of factorization of a distribution than Bayesian networks or Markov networks. A clique in a factor graph is a set of variable nodes such that every pair in the set is connected by a function node. The completion of a factor graph is obtained by introducing a new function node for each clique and connecting it to all the variable nodes in the clique, and no others. A Hammersley-Clifford type theorem holds over the completion of a factor graph. In summary, a positive distribution that satisfies the global Markov property with respect to a factor graph satisfies the Gibbs property with respect to its completion.

Factor graphs have been very useful in various applications, most notably perhaps in coding theory, where they are used as graphical models that underlie various decoding algorithms based on forms of belief propagation (also known as the sum-product algorithm), which is an exact algorithm for computing marginals on tree graphs but performs remarkably well even in the presence of loops. See [KFaL98] and [AM00] for surveys of this field. As might be expected from the preceding comments, these do not focus on conditional independence, but rather on algorithmic applications of local features (such as being locally tree like) of factor graphs.
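The factorization (2.1) can be checked by brute force for the formula of Fig. 2.2, taking each clause factor to be the 0/1 indicator of its clause (a standard choice, assumed here); Z then counts the satisfying assignments, and the global information the text mentions is visible in the fact that marginals need the full sum over all 2^6 configurations:

```python
import itertools

# (x1 ∨ x4 ∨ ¬x6) ∧ (¬x1 ∨ x2 ∨ ¬x3) ∧ (x4 ∨ x5 ∨ x6)
# Each literal is (variable index, negated?).
clauses = [((0, False), (3, False), (5, True)),    # x1 ∨ x4 ∨ ¬x6
           ((0, True), (1, False), (2, True)),     # ¬x1 ∨ x2 ∨ ¬x3
           ((3, False), (4, False), (5, False))]   # x4 ∨ x5 ∨ x6

def phi(clause, x):
    """0/1 clause factor: 1 iff some literal of the clause is satisfied."""
    return 1 if any(x[i] != neg for i, neg in clause) else 0

def product(x):
    p = 1
    for c in clauses:
        p *= phi(c, x)
    return p

# Partition function = number of satisfying assignments.
Z = sum(product(x) for x in itertools.product([0, 1], repeat=6))
print(Z)

# Marginal of x1 = 1 under the uniform distribution on solutions:
# a "global" quantity, not readable off any single factor.
m1 = sum(product(x) for x in itertools.product([0, 1], repeat=6)
         if x[0] == 1) / Z
print(m1)
```

This mirrors the sum-product setting: on a tree-shaped factor graph the same marginals could be computed by message passing instead of exhaustive enumeration.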

2.4 The Markov-Gibbs Correspondence for Directed Models

Consider first a directed acyclic graph (DAG), which is simply a directed graph without any directed cycles in it. Note that DAGs are allowed to have loops (and loopy DAGs are central to the study of iterative decoding algorithms on graphical models). A set of random variables whose interdependencies may be represented using a DAG is known as a Bayesian network or a directed Markov field.

Some specific points of additional terminology for directed graphs are as follows. If there is a directed edge from x to y, we say that x is a parent of y, and y is the child of x. The set of parents of x is denoted by pa(x), while the set of children of x is denoted by ch(x). The set of vertices from whom directed paths lead to x is called the ancestor set of x and is denoted an(x). Similarly, the set of vertices to whom directed paths from x lead is called the descendant set of x and is denoted de(x). Finally, we often assume that the graph is equipped with a distance function d(·, ·) between vertices, which is just the length of the shortest path between them.

Given a directed graphical model, one may construct an undirected one by a process known as moralization. In moralization, we (a) replace a directed edge from one vertex to another by an undirected one between the same two vertices and (b) "marry" the parents of each vertex by introducing edges between each pair of parents of the vertex at the head of the former directed edge. The idea is best illustrated with a simple example. Consider the DAG of Fig. 2.3 (left). The corresponding factorization of the joint density that is induced by this DAG model is

p(x_1, ..., x_5) = p(x_1) p(x_2) p(x_3) p(x_4 | x_1) p(x_5 | x_2, x_3).

In general, if we denote the set of parents of the variable x_i by pa(x_i), then every joint distribution that satisfies this DAG factorizes as above, namely into conditionals of the form p(x_i | pa(x_i)). The moralization process is illustrated in the figure.
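A sketch of the moralization procedure, with the DAG encoded as a child-to-parents map (the example edges, x1 → x4 and x2, x3 → x5, are read off the factorization above and are assumed rather than quoted from the figure):

```python
def moralize(parents):
    """Moralize a DAG given as {child: set of parents}: drop edge
    directions and 'marry' every pair of parents of each vertex."""
    vertices = set(parents) | {p for ps in parents.values() for p in ps}
    und = {v: set() for v in vertices}
    for child, ps in parents.items():
        ps = sorted(ps)
        for p in ps:                         # undirected version of each edge
            und[child].add(p); und[p].add(child)
        for i in range(len(ps)):             # marry the parents pairwise
            for j in range(i + 1, len(ps)):
                und[ps[i]].add(ps[j]); und[ps[j]].add(ps[i])
    return und

# x4 has parent x1; x5 has parents x2 and x3.
dag = {'x4': {'x1'}, 'x5': {'x2', 'x3'}}
moral = moralize(dag)
print(sorted(moral['x2']))   # ['x3', 'x5']: x2 and x3 are now married
```

The new x2-x3 edge is exactly the clique over which a potential must be placed for the undirected factorization to hold.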

[Figure 2.3: The moralization of the DAG on the left to obtain the moralized undirected graph on the right.]

Recall that the joint distribution of (x_1, ..., x_n) satisfying a DAG factorizes as

p(x_1, ..., x_n) = Π_{i=1}^{n} p(x_i | pa(x_i)).

We have seen that relaxing the positivity condition on the distribution in the Hammersley-Clifford theorem (Thm. 2.10) cannot be done in general. In some cases, however, one may remove the positivity condition safely. What we want, however, is to obtain a Markov-Gibbs equivalence for such graphical models in the same manner that the Hammersley-Clifford theorem provided for positive Markov random fields. [LDLL90] extends the Hammersley-Clifford correspondence to the case of arbitrary distributions (namely, dropping the positivity requirement) for the case of directed Markov fields. In doing so, they simplify and strengthen an earlier criterion for directed graphs given by [KSC84]. We will use the result from [LDLL90], which we reproduce next.

Definition 2.12. A measure p admits a recursive factorization according to graph G if there exist non-negative functions, known as kernels, k^v(·, ·) for v ∈ V defined on Λ × Λ^{|pa(v)|} (where the first factor is the state space for X_v and the second for X_{pa(v)}), such that

∫ k^v(y_v, x_{pa(v)}) µ_v(dy_v) = 1

and

p = f · µ,  where  f(x) = Π_{v ∈ V} k^v(x_v, x_{pa(v)}).

In this case, the kernels k^v(x_v, x_{pa(v)}) are the conditional densities for the distribution of X_v conditioned on the value of its parents X_{pa(v)} = x_{pa(v)}. Now let G^m be the moral graph corresponding to G.

Theorem 2.13. If p admits a recursive factorization according to G, then it admits a factorization (into potentials) according to the moral graph G^m.

D-separation

We have considered the notion of separation on undirected models and its effect on the set of conditional independencies satisfied by the distributions that factor according to the model. For directed models, there is an analogous notion of separation known as D-separation. The notion is what one would expect intuitively if one views directed models as representing "flows" of probabilistic influence. We simply state the property and refer the reader to [KF09, §3.3.1] and [Bis06, §8.2.2] for discussion and examples.

Let A, B, and C be sets of vertices on a directed model. Consider the set of all paths (not necessarily directed) coming from a node in A and going to a node in B. Such a path is said to be blocked if one of the following two scenarios occurs.

1. Arrows on the path meet head-to-tail or tail-to-tail at a node in C.
2. Arrows meet head-to-head at a node, and neither the node nor any of its descendants is in C.

If all paths from A to B are blocked as above, then C is said to D-separate A from B. In this case, the joint distribution must satisfy A ⊥ B | C.
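The head-to-head case can be seen numerically in the smallest example: a collider x → z ← y with z = x XOR y (an illustrative, non-positive distribution). With C empty the single path is blocked, so x ⊥ y; conditioning on the collider z unblocks it:

```python
import itertools

# Collider x -> z <- y with x, y independent fair coins and z = x XOR y.
def joint(x, y, z):
    return 0.25 if z == (x ^ y) else 0.0

def P(event, given):
    """P(event | given), both dicts over names 'x', 'y', 'z'."""
    num = den = 0.0
    for x, y, z in itertools.product([0, 1], repeat=3):
        v = {'x': x, 'y': y, 'z': z}
        p = joint(x, y, z)
        if all(v[k] == w for k, w in given.items()):
            den += p
            if all(v[k] == w for k, w in event.items()):
                num += p
    return num / den

# C = {}: the path x-z-y meets head-to-head at z, which is not in C,
# so x and y are D-separated -- and indeed marginally independent.
print(P({'x': 1}, {'y': 0}), P({'x': 1}, {'y': 1}))       # 0.5 0.5
# C = {z}: the path is unblocked; knowing z makes x and y fully coupled.
print(P({'x': 1}, {'y': 0, 'z': 1}), P({'x': 1}, {'y': 1, 'z': 1}))   # 1.0 0.0
```

The example also illustrates the preceding point about positivity: this distribution vanishes on half the configurations, yet it still admits the recursive (directed) factorization p(x)p(y)p(z | x, y).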

2.5 I-maps and D-maps

We have seen that there are two broad classes of graphical models — undirected and directed — which may be used to represent the interaction of variables in a system. The conditional independence properties of these two classes are obtained differently.

Definition 2.14. A graph (directed or undirected) is said to be a D-map ('dependencies map') for a distribution if every conditional independence statement of the form A ⊥ B | C, for sets of variables A, B, and C, that is satisfied by the distribution is reflected in the graph.

A D-map may express more conditional independencies than the distribution possesses. Thus, a completely disconnected graph having no edges is trivially a D-map for any distribution.

Definition 2.15. A graph (directed or undirected) is said to be an I-map ('independencies map') for a distribution if every conditional independence statement of the form A ⊥ B | C, for sets of variables A, B, and C, that is expressed by the graph is also satisfied by the distribution.

An I-map may express fewer conditional independencies than the distribution possesses. Thus, a completely connected graph is trivially an I-map for any distribution.

Definition 2.16. A graph that is both an I-map and a D-map for a distribution is called its P-map ('perfect map').

In other words, a P-map expresses precisely the set of conditional independencies that are present in the distribution. Not all distributions have P-maps. Indeed, the class of distributions having directed P-maps is itself distinct from the class having undirected P-maps, and neither equals the class of all distributions (see [Bis06, §8.3.4] for examples).

3. Distributions with poly(log n)-Parametrization

We now come to a central theme in our work. Take the case of n covariates: consider a system of n binary covariates (X_1, ..., X_n). To specify their joint distribution p(x_1, ..., x_n) in the absence of any additional information, we would have to give the probability mass function at each of the 2^n configurations that these n variables can take jointly. The only constraint given on these probability masses is that they must sum up to 1. Thus, given the function value at 2^n − 1 configurations, we could find the value at the remaining configuration. This means that in the absence of any additional information, n covariates require 2^n − 1 parameters to specify their joint distribution. This statement can be made more precise — the typical joint distribution on n variates requires O(2^n) parameters for its specification. In other words, it takes exponentially many in n parameters to specify a "true" joint distribution of n covariates.

In light of the above, a joint distribution that requires only 2^poly(log n) parameters to specify would seem quite unusual. We shall refer to such distributions as having poly(log n)-parametrization. We would intuitively expect such a distribution to be "much simpler" in some way than the typical joint distribution on n variates. Indeed, because of the exponent of poly(log n), we would expect that it would be "somewhat like" a joint distribution on only poly(log n) covariates. In other words — distributions on n variates but requiring only 2^poly(log n) parameters for their specification are like the typical distribution on poly(log n) variates.

Let us take an extreme case of such a "simple" joint distribution, where we are provided with one critical piece of extra information — that the n variates are independent of each other.

In that case, we would need 1 parameter to specify each of their individual distributions — namely, the probability that each takes the value 1. These n parameters then specify the joint distribution simply because the distribution factorizes completely into factors whose scopes are single variables (namely, just the p(x_i)). Thus, as a result of the independence, we go from exponentially many independent parameters to linearly many.

Let us consider another extreme case of such a distribution. Consider the distribution on n variates that is non-zero only at (0, 0, ..., 0) and (1, 1, ..., 1). Here, the variates are highly correlated. But once again, we require only two parameters to specify the distribution. In this case, it is not because of independence; it is because the distribution is supported on only two out of a possible 2^n values. In other words, it is severely limited by the small number of joint values the covariates take.

In both cases above, the distribution on n covariates required far fewer parameters to specify than the typical n variate distribution does. In order to state the typical case of an n variate distribution, we make the following definitions.

Definition 3.1. A distribution on n variates will be called ample if it is supported on c^n joint values for c > 1. In other words, ample distributions take the typical number of joint values.

Definition 3.2. A distribution on n variates will be said to have irreducible O(n) correlations if there exist correlations between O(n) variates that do not permit factorization into smaller scopes through conditional independencies.

Note that distributions having both these properties require O(2^n) independent parameters to specify: there is neither factorization, nor limited support, that will permit more economical parametrization. It is distributions that possess both these properties that are problematic for polynomial time algorithms. We will see that distributions constructed by polynomial time algorithms can have one or the other property, but not both.
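The parameter counts discussed above can be tabulated in a few lines of bookkeeping (the k-scope count corresponds to factorization into conditionally independent factors of scope at most k, as in the earlier O(γ^k) remark; function names are ours):

```python
# Independent-parameter counts for a system of n binary covariates.

def params_full(n):
    # a "true" joint distribution: one mass per configuration,
    # minus one for the sum-to-1 constraint
    return 2 ** n - 1

def params_independent(n):
    # complete factorization into single-variable factors:
    # one Bernoulli parameter p(x_i = 1) per covariate
    return n

def params_two_point(n):
    # supported only on (0,...,0) and (1,...,1): just the two masses,
    # regardless of n (and only one is free after normalization)
    return 2

def params_k_scope(n, k):
    # factorization into conditionally independent factors of scope at
    # most k (e.g. each node of a directed model has <= k parents):
    # at most n conditional tables with 2**k rows each
    return n * 2 ** k

for n in (10, 20):
    print(n, params_full(n), params_independent(n),
          params_two_point(n), params_k_scope(n, 3))
```

For k = poly(log n), the last count is 2^poly(log n): exactly the economical regime the chapter is concerned with.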

This brings us to a key motivating question: What if n covariates had a joint distribution that required only exponential in poly(log n) many parameters to specify? When would such a distribution arise, and what would be its limitations? This question is really the heart of the P = NP question. Indeed, all the machinery we build and use in this work really takes us to the following insight: polynomial time computations build distributions of solutions that can be parameterized using only exponential in poly(log n) many parameters. In other words, they have poly(log n)-parametrization. In contrast, in the hard phases of NP-complete problems like k-SAT, the distribution of solutions requires exponentially many parameters to specify. Namely, the distribution of solutions in the hard phases of NP-complete problems displays two properties:

1. The distribution is ample.
2. The distribution has irreducible O(n) correlations. The variates are as far from being independent as possible — they interact with each other O(n) at a time, with no possibility for factorization into conditional independencies.

Note that both conditions are required. It is not only long range correlations, but (a) the non-factorizability of such correlations and (b) ampleness under such non-factorizable correlations, that characterizes the solution spaces in hard phases of NP-complete problems.

3.1 Two Kinds of poly(log n)-parameterizations

We have seen that distributions on n variates that are poly(log n)-parametrizable are very atypical. When do they arise? They can be studied in two categories, both of which will correspond to polynomial time algorithms.

3.1.1 Range Limited Interactions

As noted earlier, it is not often that complex systems of n interacting variables have complete independence between some subsets. What is far more frequent is that there are conditional independencies between certain subsets given some intermediate subset.

In this case, the joint will factorize into factors, each of whose scope is a subset of (X_1, ..., X_n). If the factorization is into conditionally independent factors, each of whose scope is of size at most k, we can parametrize the distribution using at most n·2^k independent parameters. We should emphasize that the factors must give us conditional independence for this to be true. Factor graphs give us a factorization, but it is not, in general, a factorization into conditional independents, and so we cannot conclude anything about the number of independent parameters by just examining the factor graph.

For example, a major feature of directed graphical models is that their factorizations are already globally normalized once they are locally normalized, meaning that there is a recursive factorization of the joint into conditionally independent pieces. The conditional independence in this case is from all non-descendants, given the parents. Therefore, if each node has at most k parents, then we can parametrize the joint distribution with at most n·2^k independent parameters. We may also moralize the graph and see this as a factorization over cliques in the moralized graph. Note that such a factorization (namely, starting from a directed model and moralizing) holds even if the distribution is not positive, in contrast with those distributions which do not factor over directed models and where we have to invoke the Hammersley-Clifford theorem to get a similar factorization. See [KF09] for further discussion on parameterizations for directed and undirected graphical models.

Our proof scheme requires us to distinguish distributions based on the size of the irreducible direct interactions between subsets of the covariates. Namely, we would like to distinguish distributions where there are O(n) covariates whose joint interaction cannot be factored through smaller interactions (having less than O(n) covariates) chained together by conditional independencies. We would like to contrast such distributions from others which can be so factored through factors having only poly(log n) variates in their scope. The measure that allows us to make this distinction is the number of independent parameters it takes to specify the distribution.

From our perspective, when the size of the smallest irreducible interactions is O(n), we need O(c^n) parameters, where c > 1. On the other hand, if at most k < n variables interact directly at a time, and such interactions are chained together through conditional independencies, then we have an upper bound on the number of parameters it takes to specify the distribution. Thus, if we were able to demonstrate that the distribution factors through interactions which always have scope poly(log n), then we would need only O(c^poly(log n)) parameters.

Let us consider the example of a Markov random field. By Hammersley-Clifford, it is also a Gibbs random field over the set of maximal cliques in the graph encoding the neighborhood system of the Markov random field. This Gibbs field comes with conditional independence assurance. Namely, if at most k < n variables interact directly at a time, then the largest clique size would be k, and this would give us a more economical parameterization than the one which requires 2^n − 1 parameters: namely, it is just Σ_{c∈C} 2^{|c|}.

In Chapter 5, we will build machinery that shows that if a problem lies in P as a result of a range limited algorithm (like monadic LFP), then the factorization of the distribution of solutions to that problem causes it to have economical parametrization. The resulting distribution is ample. Note that the case where all n variates are independent falls into the range limited category, with range being one. See Fig. 3.1.

[Figure 3.1: A range limited joint distribution on n covariates that has poly(log n)-parametrization. Although interactions between variables may be ample for their range, their range is limited to poly(log n).]

3.1.2 Value Limited Interactions

In the previous section we saw the first type of interaction between n covariates that can be parametrized by just poly(log n) independent parameters. This was the case where the variables do not interact all at once, but rather in smaller subsets, in a directed manner that gives us conditional independencies between sets that are of size poly(log n). In this section, we will see another such limited interaction, where the n variates do interact directly O(n) at a time, but they are restricted to taking only c^poly(log n) many distinct values (see Fig. 3.2). One sees immediately that the underlying limitation in both this case and the previous is common: the set of n covariates do not take 2^n different values with extensive O(n) correlations that do not factor through conditional independencies, like a "true" (or more precisely, typical) joint distribution of n variates.

Such a joint distribution does not take 2^n different values with extensive O(n) correlations that fail to factor through conditional independencies, the way a "true" (or, more precisely, typical) joint distribution of n variates does. How do we precisely state this property? Through the notion of independent parameters. We will measure the jointness of a distribution by the number of independent parameters required to specify it. A "true" joint distribution takes O(c^n), c > 1, independent parameters to specify. In contrast, both range and value limited interactions require only O(c^{poly(log n)}) independent parameters. In the case of range limited interactions, this is because the covariates only jointly vary with poly(log n) other variates at a time. In the case of value limited interactions, this is because, though O(n) variates vary jointly, they only take 2^{poly(log n)} joint values. In both cases, the system of n covariates behaves as though it were a system of only poly(log n) covariates: its "jointness" resembles that of a system of poly(log n) covariates.

We should also point out that once we have isolated the precise notion at the heart of polynomial time computation, namely, poly(log n)-parametrizability of the space of solutions, several apparent issues resolve themselves. Take the case of clustering in XORSAT. We only need note that the linear nature of XORSAT solution spaces means there is a poly(log n)-parametrization (the basis provides this for linear spaces). On the other hand, as we shall see, in the hard phases of problems such as k-SAT for k > 8, O(c^{poly(log n)}) independent parameters simply will not suffice to explain the behavior of the solution space of the problem. We should stress that this behavior has been rigorously shown to hold for some phases of k-SAT for high values of k; we will recall it in some detail in Chapter 6. It is in these phases, and not elsewhere, that our separation of complexity classes can be demonstrated. The core issue is the number of independent parameters it takes to specify the distribution of the entire space of solutions. This, as we shall see, is the crux of the P = NP question.
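To make the parameter counts above concrete, here is a small illustrative sketch of our own (not part of the original development) contrasting the O(2^n) parameters of a full joint distribution of binary covariates with the Σ_{c∈C} 2^{|c|} parameters of a Gibbs factorization whose cliques have size poly(log n); for concreteness we take poly(log n) to be (log2 n)^2:

```python
import math

def full_joint_params(n):
    # A "true" joint distribution over n binary covariates needs one
    # probability per joint outcome, minus one for normalization: O(2^n).
    return 2 ** n - 1

def gibbs_params(clique_sizes):
    # A Gibbs factorization pays only for local potential tables:
    # sum over cliques c of 2^{|c|} entries.
    return sum(2 ** c for c in clique_sizes)

n = 64
k = int(math.log2(n)) ** 2               # poly(log n) clique size: (log2 n)^2 = 36
range_limited = [k] * n                  # O(n) cliques, each of size poly(log n)

assert gibbs_params(range_limited) == n * 2 ** k
assert gibbs_params(range_limited) < full_joint_params(n)
```

The point of the sketch is only the asymptotic gap: the factored budget grows like n·2^{poly(log n)}, which is quasi-polynomial, while the unfactored budget is exponential.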

It is also useful to notice that neither type of limitation implies the other. The distribution of n independent variates is range limited, but not value limited: every one of the 2^n tuples may appear, but the distribution factors completely. Whereas the distribution supported on the all-1 and all-0 tuples is value limited, but not range limited: although interactions between variables are O(n) at a time, the covariates do not display ampleness in their joint distribution, which has a very economical parametrization using only 2^{poly(log n)} independent parameters. Regimes of problems where the distributions of solutions are neither value limited nor range limited cause the failure of polynomial time algorithms on the average. In later chapters, we will build machinery to see that polynomial time LFP algorithms can capture either range or value limited behaviors, but not the joint behavior of a "true" joint distribution of n covariates.

[Figure 3.2: A value limited joint distribution on n covariates that has poly(log n)-parametrization. The joint values are limited to 2^{poly(log n)}, while the range of interaction is O(n).]
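The two limitations can be checked mechanically on toy distributions. The following sketch is an illustration we add (the helper name `support` is ours), exhibiting one distribution of each kind:

```python
import itertools

def support(dist):
    # Tuples carrying positive probability mass.
    return {x for x, p in dist.items() if p > 0}

n = 10

# Value limited, not range limited: all n covariates vary jointly,
# but only two of the 2^n tuples ever occur.
value_limited = {(0,) * n: 0.5, (1,) * n: 0.5}

# Range limited, not value limited: n independent fair bits; the
# support is ample (all 2^n tuples), but the distribution factors
# completely into unary potentials.
range_limited = {t: 2.0 ** -n for t in itertools.product((0, 1), repeat=n)}

assert len(support(value_limited)) == 2          # far below 2^n
assert len(support(range_limited)) == 2 ** n     # ample support
```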

3.1.3 On the Atypical Nature of poly(log n)-parameterization

This short section owes its existence to Leonid Levin and Avi Wigderson, both of whom asked us whether our methods could be used to make statements about average case complexity. We briefly mentioned earlier that the typical member of the space of distributions on n covariates requires O(2^n) parameters. Namely, if we picked an n variable distribution at random, with high likelihood we would get a distribution that required O(2^n) parameters to specify. Note that this is a statistical statement. It is polynomial time solution spaces that are atypical for n variate distributions, in that they are either not ample (the value limited case) or not correlated solidly enough (the range limited case, where they admit Gibbs factorizations into smaller potentials). In many ways, the hard phases of random k-SAT are simply typical: the solution space shows the behavior of a typical joint distribution on n covariates in that it is ample and correlated. In such spaces, even though a covariate sees O(n) other covariates, a polynomial time computation only utilizes a poly(log n) amount of the information in them in order to make its decision. This observation may be used to state results about average case complexity in hard phases of random k-SAT. We will return to this issue in future versions of this paper, or in the manuscript [Deo10], which is under preparation.

3.1.4 Our Treatment of Range and Value Limited Distributions

The two types of distributions that we have mentioned above are only superficially dissimilar. We can even think of the value limited behavior as a type of range limited behavior, in which the limited number of joint values plays the role that the limited range of interaction plays in the other case. For purposes of pedagogy, we will disregard this superficial dissimilarity and provide a full treatment of the range limited case, nothing more. We end this chapter by tying poly(log n)-parameterizations to a Markov (or, equivalently for directed models, Gibbs) random field.

Consider the two kinds of poly(log n)-parameterizations, range limited and value limited; see Figs. 3.1 and 3.2. A range limited parametrization would correspond to a Gibbs field whose potentials are specified over maximum cliques of size poly(log n). A value limited parameterization could have maximum cliques of size O(n), but the number of parameters for such a clique would only be 2^{poly(log n)} instead of the possible 2^{O(n)}. In either case, the random field has a poly(log n)-parametrization.
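The two Gibbs budgets just described can be compared directly. The numbers below are an illustrative aside of ours (again fixing poly(log n) = (log2 n)^2 for concreteness), not a construction from the paper:

```python
import math

n = 2 ** 16
polylog = int(math.log2(n)) ** 2                 # (log2 n)^2 = 256

# Range limited: maximum cliques of size poly(log n), so each
# potential table has 2^{poly(log n)} entries.
range_limited_table = 2 ** polylog

# Value limited: a clique may have size O(n), but only 2^{poly(log n)}
# of its 2^n rows are permitted to carry distinct (nonzero) values.
value_limited_rows = 2 ** polylog

full_table = 2 ** n                              # the generic 2^{O(n)} cost

assert range_limited_table == value_limited_rows
assert range_limited_table < full_table
```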

4. Logical Descriptions of Computations

Work in finite model theory and descriptive complexity theory (a branch of finite model theory that studies the expressive power of various logics in terms of complexity classes) has resulted in machine independent characterizations of various complexity classes. In particular, over ordered structures, there is a precise and highly insightful characterization of the class of queries that are computable in polynomial time, and of those that are computable in polynomial space. In order to keep the treatment relatively complete, we begin with a brief précis of this theory. Readers from a finite model theory background may skip this chapter.

We quickly set notation. A vocabulary, denoted by σ, is a set consisting of finitely many relation and constant symbols,

σ = {R1, . . . , Rm, c1, . . . , cs}.

Each relation has a fixed arity. We consider only relational vocabularies, in that there are no function symbols; this poses no shortcomings, since functions may be encoded as relations. A σ-structure A consists of a set A, which is the universe of A, together with interpretations R_i^A for each of the relation symbols and interpretations c_i^A for each of the constant symbols in the vocabulary. An example is the vocabulary of graphs, which consists of a single relation symbol having arity two; a graph may be seen as a structure over this vocabulary, where the universe is the set of nodes and the relation symbol is interpreted as an edge. In addition, some applications may require us to work with a graph vocabulary having two constants, interpreted in the structure as source and sink nodes respectively. We denote by σn the extension of σ by n additional constants, and by (A, a) the structure where the tuple a has been identified with these additional constants.

4.1 Inductive Definitions and Fixed Points

The material in this section is standard, and stresses the facts we need. See [Imm99] for a text on descriptive complexity theory, [EF06, Lib04] for detailed treatments in the context of finite model theory, and [Mos74] for the first monograph on the subject; our treatment is taken mostly from these sources.

Inductive definitions are a fundamental primitive of mathematics. The idea is to build up a set in stages, where the defining relation for each stage can be written in the first order language of the underlying structure and uses elements added to the set in previous stages. In the most general case, there is an underlying structure A = {A, R1, . . . , Rm} and a formula ϕ(S, x1, . . . , xn) in the first-order language of A. The variable S is a second-order relation variable that will eventually hold the set we are trying to build up in stages. We will require that ϕ have only positive occurrences of the n-ary relation variable S, namely, that all occurrences of S be within the scope of an even number of negations. Such inductions are called positive elementary. At the ξth stage of the induction, we insert into the relation S the tuples according to

x ∈ I_ϕ^ξ ⇔ ϕ( ∪_{η<ξ} I_ϕ^η , x ).

Finally, we define the relation I_ϕ = ∪_ξ I_ϕ^ξ. In the most general case, a transfinite induction may result. The least ordinal κ at which I_ϕ^{κ+1} = I_ϕ^κ is called the closure ordinal of the induction, and is denoted by |ϕ^A|; note that the cardinality of the ordinal κ is at most |A|^n. When the underlying structures are finite, this is also known as the inductive depth. The decomposition into its various stages is a central characteristic of inductively defined relations; we will denote the stage at which a tuple enters the relation in the induction defined by ϕ by | · |_ϕ^A. Sets of the form I_ϕ are known as fixed points of the structure. Relations that may be defined by R(x) ⇔ I_ϕ(a, x) for some choice of tuple a over A are known as inductive relations. Thus, inductive relations are sections of fixed points. Note that the definition above is 1. constructive, and 2. elementary at each stage. We will use both these properties throughout our work. Note also that there are definitions of the set I_ϕ that are equivalent, but can be stated only in the second order language of A.

We now proceed more formally by introducing operators and their fixed points. We begin by defining two classes of operators on sets, and then consider the operators on structures that are induced by first order formulae. Let A be a finite set, and P(A) be its power set.

Definition 4.1. An operator F on A is a function F : P(A) → P(A).

Definition 4.2. Let F be an operator on A.
1. The operator F is monotone if it respects subset inclusion; namely, if X ⊆ Y, then F(X) ⊆ F(Y) for all subsets X, Y of A.
2. The operator F is inflationary if it maps sets to their supersets; namely, X ⊆ F(X).

Next, we define sequences induced by operators, characterize the sequences induced by monotone and inflationary operators, and then define fixed points of operators on sets.

Consider the sequence of sets F^0, F^1, . . . defined by

F^0 = ∅,  F^{i+1} = F(F^i).    (4.1)

This sequence (F^i) is called inductive if it is increasing, namely, if F^i ⊆ F^{i+1} for all i. In this case, we define

F^∞ := ∪_{i=0}^∞ F^i.    (4.2)

Definition 4.3. Let F be an operator on A. The set X ⊆ A is called a fixed point of F if F(X) = X. A fixed point X of F is called its least fixed point, denoted LFP(F), if it is contained in every other fixed point Y of F; namely, X ⊆ Y whenever Y is a fixed point of F.

Lemma 4.4. If F is either monotone or inflationary, the sequence (F^i) is inductive.

Not all operators have fixed points, let alone least fixed points. The Tarski-Knaster theorem guarantees that monotone operators do, and also provides two constructions of the least fixed point for such operators: one "from above" and the other "from below." The latter construction uses the sequence (4.1).

Theorem 4.5 (Tarski-Knaster). Let F be a monotone operator on a set A. Then
1. F has a least fixed point LFP(F), which is the intersection of all the fixed points of F; namely, LFP(F) = ∩{Y : Y = F(Y)}.
2. LFP(F) is also equal to the union of the stages of the sequence (F^i) defined in (4.1); namely, LFP(F) = ∪_i F^i = F^∞.

However, not all operators are monotone; therefore we need a means of constructing fixed points for non-monotone operators.
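Both constructions in Theorem 4.5 can be exercised on a tiny finite example. The following sketch is our own illustration (the operator F and all names are chosen for the example), computing the least fixed point "from below" by the Kleene iteration (4.1) and checking it against the "from above" characterization as the intersection of all fixed points:

```python
from itertools import combinations

def lfp_from_below(F):
    # Kleene iteration: F^0 = {}, F^{i+1} = F(F^i); on a finite
    # universe a monotone F stabilizes at the least fixed point.
    X = frozenset()
    while True:
        Y = F(X)
        if Y == X:
            return X
        X = Y

# A monotone operator on subsets of {0,...,4}: insert 0, and close
# under successor within the universe.
U = frozenset(range(5))
def F(X):
    return frozenset({0}) | frozenset(x + 1 for x in X if x + 1 in U)

lfp = lfp_from_below(F)
assert lfp == U          # {} -> {0} -> {0,1} -> ... -> {0,1,2,3,4}

# Tarski-Knaster "from above": LFP(F) is the intersection of all
# fixed points of F.
subsets = [frozenset(c) for r in range(len(U) + 1)
           for c in combinations(U, r)]
fixed_points = [X for X in subsets if F(X) == X]
intersection = frozenset(U)
for X in fixed_points:
    intersection &= X
assert lfp == intersection
```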

4.2 Fixed Point Logics for P and PSPACE

We now specialize the theory of fixed points of operators to the case where the operators are defined by means of first order formulae. This gives us fixed point logics, which play a central role in descriptive complexity theory.

Definition 4.6. Consider the sequence (F^i) induced by an arbitrary operator F on A. The sequence may or may not stabilize. In the first case, there is a positive integer n such that F^{n+1} = F^n, and therefore F^m = F^n for all m > n; since there are at most 2^{|A|} distinct subsets of A, stabilization, if it occurs, occurs for some n ≤ 2^{|A|}. In the latter case, the sequence (F^i) does not stabilize. We define the partial fixed point of F, denoted PFP(F), as F^n in the first case, and as the empty set in the second case.

Definition 4.7. For an arbitrary operator G, we associate the inflationary operator G_infl defined by G_infl(Y) = Y ∪ G(Y). For an inflationary operator F, the sequence (F^i) is inductive, and hence eventually stabilizes to the fixed point F^∞. The set G_infl^∞ is called the inflationary fixed point of G, and is denoted by IFP(G).

Definition 4.8. Let σ be a relational vocabulary, and R a relational symbol of arity k that is not in σ. Let ϕ(R, x) = ϕ(R, x1, . . . , xk) be a formula of vocabulary σ ∪ {R}. Now consider a structure A of vocabulary σ. The formula ϕ(R, x) defines an operator Fϕ : P(A^k) → P(A^k) on A^k, which acts on a subset X ⊆ A^k as

Fϕ(X) = {a | A |= ϕ(X/R, a)},    (4.3)

where ϕ(X/R, a) means that R is interpreted as X in ϕ.

We wish to extend FO by adding fixed points of operators of the form Fϕ.

Definition 4.9. The logic FO(IFP) is obtained by extending FO with the following formation rule: if ϕ(R, x) is a formula and t a k-tuple of terms, then [IFP_{R,x} ϕ(R, x)](t) is a formula whose free variables are those of t. The semantics are given by A |= [IFP_{R,x} ϕ(R, x)](a) iff a ∈ IFP(Fϕ).

Definition 4.10. The logic FO(PFP) is obtained by extending FO with the following formation rule: if ϕ(R, x) is a formula and t a k-tuple of terms, then [PFP_{R,x} ϕ(R, x)](t) is a formula whose free variables are those of t. The semantics are given by A |= [PFP_{R,x} ϕ(R, x)](a) iff a ∈ PFP(Fϕ).

We cannot define the closure of FO under taking least fixed points in the above manner without further restrictions, since least fixed points are guaranteed to exist only for monotone operators, and testing for monotonicity is undecidable. If we were to form a logic by extending FO by least fixed points without further restrictions, we would obtain a logic with an undecidable syntax. Hence, we make restrictions on the formulae which guarantee that the operators obtained from them as described by (4.3) will be monotone. We need a definition. Let ϕ be a formula containing a relational symbol R. An occurrence of R is said to be positive if it is under the scope of an even number of negations, and negative if it is under the scope of an odd number of negations. A formula is said to be positive in R if all occurrences of R in it are positive, or there are no occurrences of R at all; in particular, there are no negative occurrences of R in the formula.

Lemma 4.11. If the formula ϕ(R, x) is positive in R, then the operator obtained from ϕ by construction (4.3) is monotone, and thus has a least fixed point.

Now we can define the closure of FO under least fixed points of operators obtained from formulae that are positive in a relational variable.
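The contrast between the inflationary and partial fixed points of a non-monotone operator can be seen concretely. The sketch below is an illustration of ours (the "flip" operator is a deliberately non-monotone toy): the PFP iteration oscillates and is therefore defined to be empty, while forcing the operator to be inflationary always stabilizes.

```python
def ifp(G):
    # Inflationary fixed point: iterate Y = X ∪ G(X); the sequence is
    # increasing, so on a finite universe it always stabilizes.
    X = frozenset()
    while True:
        Y = X | G(X)
        if Y == X:
            return X
        X = Y

def pfp(G, universe):
    # Partial fixed point: iterate G itself.  If no stabilization occurs
    # within 2^|universe| steps, the result is defined to be empty.
    X = frozenset()
    for _ in range(2 ** len(universe) + 1):
        Y = G(X)
        if Y == X:
            return X
        X = Y
    return frozenset()

U = frozenset({0, 1})
flip = lambda X: U - X          # non-monotone: oscillates {} <-> U

assert pfp(flip, U) == frozenset()    # never stabilizes, so PFP = {}
assert ifp(flip) == U                 # {} -> {} ∪ U = U -> U ∪ {} = U
```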

Definition 4.12. The logic FO(LFP) is obtained by extending FO with the following formation rule: if ϕ(R, x) is a formula that is positive in the k-ary relational variable R, and t is a k-tuple of terms, then [LFP_{R,x} ϕ(R, x)](t) is a formula whose free variables are those of t. The semantics are given by A |= [LFP_{R,x} ϕ(R, x)](a) iff a ∈ LFP(Fϕ).

As earlier, the stage at which the tuple a enters the relation R is denoted by |a|_ϕ^A, and inductive depths are denoted by |ϕ^A|. This is well defined for least fixed points, since a tuple enters the relation only once and is never removed from it after. In fixed points (such as partial fixed points) where the underlying formula is not necessarily positive, this is not true: a tuple may enter and leave the relation being built multiple times. Adding the ability to do simultaneous induction over several formulae does not increase the expressive power of the logic; see [Lib04, §10.3, p. 184] for details.

We have introduced various fixed point constructions and extensions of first order logic by these constructions. We end this section by relating these logics to various complexity classes, informally stating well-known results on the expressive power of fixed point logics; these are the central results of descriptive complexity theory. First, Fagin [Fag74] obtained the first machine independent logical characterization of an important complexity class.

Theorem 4.13 (Fagin). ∃SO = NP.

Here, ∃SO refers to the restriction of second-order logic to formulae of the form ∃X1 · · · ∃Xm ϕ, where ϕ does not have any second-order quantification. Next, Immerman [Imm82] and Vardi [Var82] obtained the following central result that captures the class P on ordered structures.

Theorem 4.14 (Immerman-Vardi). Over finite, ordered structures, FO(LFP) = P. Namely, the queries expressible in the logic FO(LFP) are precisely those that can be computed in polynomial time.

A characterization of PSPACE in terms of PFP was obtained in [AV91, Var82].

Theorem 4.15 (Abiteboul-Vianu, Vardi). Over finite, ordered structures, FO(PFP) = PSPACE. Namely, the queries expressible in the logic FO(PFP) are precisely those that can be computed in polynomial space.

Note: We will often use the term LFP generically instead of FO(LFP) when we wish to emphasize the fixed point construction being performed, rather than the language.

5. The Link Between Polynomial Time Computation and Conditional Independence

In Chapter 2 we saw how certain joint distributions that encode interactions between collections of variables "factor through" smaller, simpler interactions. In this chapter, we will uncover a similar phenomenon underlying the logical description of polynomial time computation on ordered structures. The fundamental observation is the following: least fixed point computations "factor through" first order computations. The treatment of LFP versus FO in finite model theory centers around the fact that FO can only express local properties, while LFP allows non-local properties such as transitive closure to be expressed. We are taking as given the non-local capability of LFP, and asking how this non-local nature factors at each step; the limitations of first order logic must then be the source of the bottleneck, at each stage, to the propagation of information in such computations. Thus, while a variable in such a system can exert its influence throughout the system, this influence must necessarily be bottlenecked by the simpler interactions that it must factor through. This necessarily affects the type of influence a variable may exert on other variables in the system. In the case where there are conditional independencies, the influence can only be "transmitted through" the values of the intermediate conditioning variables; in other words, the influence must propagate with bottlenecks at each stage. We will ask what the effect of such a factorization is on the joint distribution of LFP acting upon ensembles.

Thus, we must understand the limitations of each stage of a LFP computation, and how these limitations affect the propagation of long-range influence in relations computed by LFP. We want to understand the stage-wise bottleneck that a fixed point computation faces at each step of its execution, and tie this back to notions of conditional independence and factorization of distributions. In order to accomplish this, we will bring to bear ideas from statistical mechanics and message passing on the logical description of computations.

Fixed point logics allow variables to be non-local in their influence, but this non-local influence must factor through first order logic at each stage. The sequence (Fϕ^i) of operators that construct fixed points may be seen as the propagation of influence in a structure by means of setting values of "intermediate variables". Namely, the variables are set by inducting them into a relation at various stages of the induction: the relation is built stage by stage, and at each stage, vertices that have entered the relation make other vertices adjacent to them eligible to enter the relation at the next stage. This is a very similar underlying idea to the statistical mechanical picture of random fields over spaces of configurations that we saw in Chapter 2, but it comes cloaked in a very different garb: that of logic and operators. It will be beneficial to state this intuition with the example of transitive closure.

EXAMPLE 5.1. The transitive closure of an edge in a graph is the standard example of a non-local property that cannot be expressed by first order logic. It can be expressed in FO(LFP) as follows. Let E be a binary relation that expresses the presence of an edge between its arguments. Then iterating the positive first order formula ϕ(R, x, y) given by

ϕ(R, x, y) ≡ E(x, y) ∨ ∃z(E(x, z) ∧ R(z, y))

builds the transitive closure relation in stages. Notice that the decision of whether a vertex enters the relation is based on the immediate neighborhood of the vertex. In this case, though the resulting property is non-local, the information flow used to compute it is stage-wise local.
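The stage-wise iteration of Example 5.1 can be sketched directly. The code below is our own illustration of the iteration of ϕ(R, x, y) on a small graph; each pass consults only the immediate E-neighborhood of x, yet the fixed point is the non-local reachability relation:

```python
def lfp_transitive_closure(E, V):
    # Iterate phi(R, x, y) = E(x, y) or exists z (E(x, z) and R(z, y))
    # until the relation R stops growing: the least fixed point.
    R = set()
    while True:
        R_next = {(x, y) for x in V for y in V
                  if (x, y) in E
                  or any((x, z) in E and (z, y) in R for z in V)}
        if R_next == R:
            return R
        R = R_next

V = {1, 2, 3, 4}
E = {(1, 2), (2, 3), (3, 4)}
R = lfp_transitive_closure(E, V)

assert (1, 4) in R          # reachable: 1 -> 2 -> 3 -> 4
assert (4, 1) not in R      # no path back
```

Each stage here is a first order (hence local) computation; the non-locality of reachability emerges only from chaining the stages.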

The limitations of first order formulae mentioned in the previous section therefore appear at each step of a least fixed point computation. The computation factors through a local property at each stage, but by chaining many such local factors together, we obtain the non-local relation of transitive closure. This picture relates to a Markov random field, where local interactions are chained together in a way that variables can exert their influence to arbitrary lengths, but the factorization of that influence (encoded in the joint distribution) reveals the stage-wise local nature of the interaction. There are important differences, however: the flow of LFP computation is directed, whereas a Markov random field is undirected. We have used this simple example just to provide some preliminary intuition.

5.1 The Limitations of LFP

Many of the techniques in model theory break down when restricted to finite models. A notable exception is the Ehrenfeucht-Fraïssé game for first order logic; this has led to much research attention to game theoretic characterizations of various logics. The primary technique for demonstrating the limitations of fixed point logics in expressing properties is to consider them a segment of the logic L^k_∞ω, which extends first order logic with infinitary connectives, and then use the characterization of expressibility in this logic in terms of k-pebble games. Least fixed point is an iteration of first order formulas, and is also a segment of L^k_∞ω. This is however not useful for our purpose (namely, separating P from NP), since NP ⊆ PSPACE, and the latter class is captured by FO(PFP), which is also a segment of L^k_∞ω.

One of the central contributions of our work is demonstrating a completely different viewpoint of LFP computations, in terms of the concepts of conditional independence and factoring of distributions, both of which are fundamental to statistics and probability theory. In order to arrive at this correspondence, we will need to understand the limitations of first order logic. We will now proceed to build the requisite framework.

5.1.1 Locality of First Order Logic

The local properties of first order logic have received considerable research attention, and expositions can be found in standard references such as [Lib04, Ch. 4], [Imm99, Ch. 6], and [EF06, Ch. 2]. The idea that first order formulae are local has been formalized in essentially two different ways, leading to two major notions of locality: Hanf locality [Han65] and Gaifman locality [Gai82]. The basic idea is that first order formulae can only "see" up to a certain distance away from their free variables; this distance is determined by the quantifier rank of the formula. Informally, Hanf locality says that whether or not a first order formula ϕ holds in a structure depends only on its multiset of isomorphism types of spheres of radius r. Gaifman locality says that whether or not ϕ holds in a structure depends on the number of elements of that structure having pairwise disjoint r-neighborhoods that fulfill first order formulae of quantifier depth d, for some fixed d (which depends on ϕ). In the literature of finite model theory, these properties were developed to deal with cases where the neighborhoods of the elements in the structure had bounded diameters. Clearly, some of the most striking applications of such properties are in graphs with bounded degree, such as the linear time algorithm to evaluate first order properties on bounded degree graphs [See96]. We will use some of the normal forms developed in the context of locality properties in finite model theory, but in the scenario where neighborhoods of elements have unbounded diameter.

Let us pause for a while and see how this fits into our global framework. We are interested in factoring complex interactions between variables into their smallest constituent irreducible factors. Viewing LFP as "stage-wise first order" is central to our analysis: LFP has a natural factorization into its stages, which are all described by first order formulae. Viewed this way, let us now analyze the limitations of the LFP computation. In particular, it is not only the locality of first order logic that is of interest to us, but also the exact specification of the finitary nature of the first order computation.

We will see that what we need is that first order logic can only exploit a bounded number of local properties. We will need both these properties in our analysis. We need some definitions in order to state the results; recall the notation and definitions from the previous chapter.

Definition 5.2. The Gaifman graph of a σ-structure A is denoted by G^A and defined as follows. The set of nodes of G^A is A. There is an edge between two nodes a1 and a2 in G^A if there is a relation R in σ and a tuple t ∈ R^A such that both a1 and a2 appear in t. With the graph defined, we have a notion of distance between elements ai, aj of A, denoted by d(ai, aj), as simply the length of the shortest path between ai and aj in G^A. We extend this to a notion of distance between tuples from A as follows. Let a = (a1, . . . , an) and b = (b1, . . . , bm). Then

d^A(a, b) = min{d^A(ai, bj) : 1 ≤ i ≤ n, 1 ≤ j ≤ m}.

There is no restriction on n and m above; in particular, the definition also applies to the case where either of them is equal to one. Namely, we have the notion of distance between a tuple and a singleton element.

Definition 5.3. Let A be a σ-structure and let a be a tuple over A. The ball of radius r around a is the set defined by

B_r^A(a) = {b ∈ A : d^A(a, b) ≤ r}.

The r-neighborhood of a in A is the σn-structure N_r^A(a) whose universe is B_r^A(a); each relation R is interpreted as R^A restricted to B_r^A(a), and the n additional constants are interpreted as a1, . . . , an. Recall that σn is the expansion of σ by n additional constants. We are now ready to define neighborhoods of tuples. We first recall the notion of a type. Informally, if L is a logic (or language), the L-type of a tuple is the sum total of the information that can be expressed about it in the language L.
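Definitions 5.2 and 5.3 are easy to compute on a toy structure. The sketch below is an illustration of ours (the function names are invented for the example), building the Gaifman graph of a structure given as a list of relations, and computing the ball B_r(a) by breadth-first search:

```python
from collections import deque

def gaifman_edges(relations):
    # Two distinct elements are adjacent in the Gaifman graph iff they
    # co-occur in some tuple of some relation of the structure.
    edges = set()
    for R in relations:
        for t in R:
            for a in t:
                for b in t:
                    if a != b:
                        edges.add((a, b))
    return edges

def ball(edges, a, r):
    # B_r(a): all elements within Gaifman distance r of a (BFS).
    dist, queue = {a: 0}, deque([a])
    while queue:
        u = queue.popleft()
        if dist[u] == r:
            continue
        for (x, y) in edges:
            if x == u and y not in dist:
                dist[y] = dist[u] + 1
                queue.append(y)
    return set(dist)

# A path graph 1 - 2 - 3 - 4 - 5, given by its edge relation.
E = {(1, 2), (2, 3), (3, 4), (4, 5)}
edges = gaifman_edges([E])

assert ball(edges, 1, 2) == {1, 2, 3}
assert ball(edges, 1, 4) == {1, 2, 3, 4, 5}
```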

Definition 5.4. The first order type of an m-tuple a in a structure A is the set of all FO formulae having m free variables that are satisfied by a. For our purposes, this notion is far too powerful, since it characterizes the structure (A, a) up to isomorphism. A more useful notion is the local type of a tuple.

Definition 5.5. The local r-type of a tuple a in A is the type of a in the substructure induced by the r-neighborhood of a in A, namely in N_r(a). In what follows, we may drop the superscript if the underlying structure is clear. Thus, a neighborhood is a σn-structure, and a type of a neighborhood is an equivalence class of such structures up to isomorphism; note that any isomorphism between N_r(a1, . . . , an) and N_r(b1, . . . , bn) must send ai to bi for 1 ≤ i ≤ n.

The following three notions of locality are used in stating the results.

Definition 5.6. Notation as above.
1. Formulas whose truth at a tuple a depends only on B_r(a) are called r-local. In other words, quantification in such formulas is restricted to the structure N_r(x).
2. Formulas that are r-local around their variables for some value of r are said to be local.
3. Boolean combinations of formulas that are local around the various coordinates xi of x are said to be basic local.

Over finite structures, there are two broad flavors of locality results in the literature: those that follow from Hanf's theorem, and those that follow from Gaifman's theorem. [Han65] proved his result for infinite structures; we provide below the locality result due to [FSV95] that is suitable for finite models. To proceed, we need a definition. Let A, B be σ-structures and let m ∈ N. Suppose that for every isomorphism type τ of an r-neighborhood of a point, either
1. both A and B have the same number of elements of type τ, or
2. both A and B have more than m elements of type τ.
Then we say that A and B are threshold (r, m)-equivalent.

Theorem 5.7 ([FSV95]). For each k, l > 0, there exist r, m > 0 such that if A and B are threshold (r, m)-equivalent and every element has degree at most l, then A and B satisfy the same first order formulae up to quantifier rank k, written A ≡k B. Furthermore, r depends only on k.

The Hanf locality lemma for formulae having a single free variable has a simple form and is an easy consequence of Thm. 5.7.

Lemma 5.8. Let ϕ(x) be a formula of quantifier depth q. Then there is a radius r and a threshold t such that if A and B have the same multiset of local types up to threshold t, and the elements a ∈ A and b ∈ B have the same local type up to radius r, then A |= ϕ(a) ↔ B |= ϕ(b).

We refer the reader to [FSV95] for a discussion comparing the Fagin-Stockmeyer-Vardi theorem with Hanf's theorem in the context of applications to finite model theory; in particular, neither theorem seems to imply the other. See [Lin05] for an application to computing simple monadic fixed points on structures of bounded degree in linear time. Next we come to Gaifman's version of locality.

Theorem 5.9 ([Gai82]). Every FO formula ϕ(x) over a relational vocabulary is equivalent to a Boolean combination of
1. sentences of the form

∃x1 . . . ∃xs ( ∧_{i=1}^s φ(xi) ∧ ∧_{1≤i<j≤s} d>2r(xi, xj) ),

where the φ are r-local, and
2. local formulas around x.

and where the LFP relation being constructed is monadic. At the first stage. where ϕ is local around y. We wish to build a view of fixed point computation as an information propagation algorithm. The following normal form for first order logic that was developed in an attempt to merge some of the ideas from Hanf and Gaifman locality.10 ([SB99]). for every first order formula. In order to do so. y).2 Simple Monadic LFP and Conditional Independence In this section. The key is to see the constructions underlying least fixed point computations through the lens of influence propagation and conditional independence. In this section. At stage zero of the fixed point computation. this may seem to be an unlikely union. we will demonstrate this relationship for the case of simple monadic least fixed points. In later sections. Theorem 5. let us examine the geometry of information flow during an LFP computation. This again expresses the bounded number of local properties feature that limits first order logic. we exploit the limitations described in the previous section to build conceptual bridges from least fixed point logic to the Markov-Gibbs picture of the preceding section. THE LINK BETWEEN POLYNOMIAL TIME COMPUTATION AND CONDITIONAL INDEPENDENCE 55 In words. At first. none of the elements of the structure are in the relation being computed. Every first-order sentence is logically equivalent to one of the form ∃x1 · · · ∃xl ∀yϕ(x. we show how to deal with complex fixed points as well. a FO(LFP) formula without any nesting or simultaneous induction. there is an r such that the truth of the formula on a structure depends only on the number of elements having disjoint r-neighborhoods that satisfy certain local formulas. 5.5. Namely. This changes the local 55 . But we will establish that there are fundamental conceptual relationships between the directed Markovian picture and least fixed point computations. some subset of elements enters the relation.

neighborhoods of these elements, and the vertices that lie in these local neighborhoods change their local type. Due to the changes in local neighborhoods, more elements in the structure become eligible for inclusion into the relation at the next stage. This process continues, and the changes "propagate" through the structure. In other words, the fundamental vehicle of this information propagation is that a fixed point computation ϕ(R, x) changes local neighborhoods of elements at each stage of the computation, and each stage relies on a bounded number of local neighborhoods.

Furthermore, we observe the following.

Lemma 5.11. The influence of an element during LFP computation propagates in a similar manner to the influence of a random variable in a directed Markov field.

This propagation is

1. directed: influence flows in the direction of the stages of the LFP computation. The directed property comes from the positivity of the first order formula that is being iterated, which ensures that once an element is inserted into the relation that is being computed, it is never removed; and

2. local in the following sense: the influence of an element can propagate throughout the structure, but only through its influence on various local neighborhoods.

This correspondence is important to us. Let us try to uncover the underlying principles that cause it. The correspondence is most striking in the case of bounded degree structures. On a graph of bounded degree, there is a fixed number of non-isomorphic neighborhoods with radius r. Consequently, there are only a fixed number of local r-types; we have only O(1) local types. In order to determine whether an element in a structure satisfies a first order formula we need (a) the multiset of local r-types in the structure (also known

as its global type) for some value of r, and (b) the local type of the element itself. This type potentially changes with each stage of the LFP. At the time when this change renders the element eligible for entering the relation, it will do so. Once it enters the relation, it changes the local r-type of all those elements which lie within an r-neighborhood of it, and such changes render them eligible, and so on. This is how the computation proceeds, in a purely stage-wise local manner.

Here the bounded nature of FO comes in. The FO formula that is being iterated can only express a property about some bounded number of such local neighborhoods; in the Gaifman form, for example, there are s distinguished disjoint neighborhoods that must satisfy some local condition. In particular, we will be making a decision of whether an element enters the relation based solely on its local r-type. This is a Markov property: the influence of an element upon another must factor entirely through the local neighborhood of the latter. This is similar to the directed Markov picture where there is conditional independence of any variable from non-descendants given the value of the parents.

The same concept can be expressed in the language of sufficient statistics. At each stage, there exists a sufficient statistic that is gathered locally at a bounded number of elements. Knowing this statistic gives us conditional independence from the values of other elements that have already entered the relation previously, but not from elements that will enter the relation subsequently. In other words, knowing some information about certain local neighborhoods renders the rest of the information about variable values that have entered the relation in previous stages superfluous.

Remark 5.12. In the more general case where degrees are not bounded, for large enough structures we will cross the Hanf threshold for the multiset of r-types. In that case, by threshold Hanf, we only need to know the multiset of local types up to a certain threshold. We still have factoring through local neighborhoods, except that we have to consider all the local neighborhoods in the structure.

Figure 5.1: Range limited LFP computation process viewed as conditional independencies. [Figure: interacting variables X1, X2, . . . , Xn−1, Xn, highly constrained by one another; a bounded number of local statistics Φ1, Φ2, . . . , Φs−1, Φs gathered at each stage; LFP assumes conditional independence after the statistics are obtained; conditional independence and factorization over a larger directed model called the ENSP (developed in Chapter 7).]

At this point, we have exhibited a correspondence between two apparently very different formalisms. This correspondence is illustrated in Fig. 5.1.
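To make the stage-wise picture concrete before moving to complex fixed points, here is a minimal sketch of a simple monadic least fixed point, computed stage by stage. The code is our own illustration (the graph, edge relation, and function names are invented for the example): it iterates the positive formula ϕ(R, x) = (x = source) ∨ ∃y (E(y, x) ∧ R(y)), whose least fixed point is reachability from the source.

```python
def lfp_reachable(n, edges, source):
    """Least fixed point of phi(R, x) = (x == source) or
    exists y: E(y, x) and R(y), iterated stage by stage.
    Whether x enters the monadic relation R at a stage depends only on
    the radius-1 neighborhood of x: the stage-wise local property."""
    R = set()
    while True:
        # One stage: evaluate the positive first order formula phi.
        stage = {x for x in range(n)
                 if x == source or any(y in R for (y, z) in edges if z == x)}
        new_R = R | stage          # positivity: elements are never removed
        if new_R == R:             # fixed point reached
            return R
        R = new_R

edges = [(0, 1), (1, 2), (3, 4)]
print(sorted(lfp_reachable(5, edges, 0)))  # [0, 1, 2]
```

Influence here is directed (it flows with the stages) and local: node 4 learns about the component of node 0 only through its own neighborhood, which is why it never enters R.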

5.3 Conditional Independence in Complex Fixed Points

In the previous sections, we showed that the natural "factorization" of LFP into first order logic, coupled with the bounded local property of first order logic, can be used to exhibit conditional independencies in the relation being computed. But the argument we provided was for simple fixed points having one free variable, namely, for monadic least fixed points. How can we show that this picture is the same for complex fixed points? We accomplish this in stages.

1. First, we use the transitivity theorem for fixed point logic to move nested fixed points into simultaneous fixed points without nesting, which we recall in Appendix A.

2. Next, we use the simultaneous induction lemma for fixed point logic to encode the relation to be computed as a "section" of a single LFP relation of higher arity.

Steps 1 and 2 involve standard constructions in finite model theory. At this point, we have the following situation: in order to compute a k-ary least fixed point, for a k fixed for all problem sizes, we have to work in a structure whose elements are k-tuples of our original structure. In other words, we are now working with k-tuples instead of single elements. This will change the distance properties of the resulting structure of k-tuples.

Let us examine the case of a 2-ary relation that is being computed. Every pair of elements occurs in the set of 2-tuples, since for any element a of the structure, every other element b, c, d, · · · occurs in a pair along with a. This means that when there is a change to a 2-tuple containing a, that change affects the neighborhoods of O(n) other 2-tuples. In other words, the neighborhood of every pair is O(n), and we are in the situation of O(n) range interactions.

The key point to note is that we still have only poly(log n) parametrization. This is because even though the interactions are of O(n) range, they are severely value limited: the computation terminates in poly(n) steps, so even though the interactions are indeed between O(n) elements at a time, we get an economical parametrization of the state space, leading once again to poly(log n) parametrization. Recall the discussion

of the two kinds of poly(log n) parameterizations (range limited and value limited) from Chapter 3. The O(n) nature of interactions remains, but again the parametrization is only poly(log n). The basic nature of information gathering and processing in LFP does not change when the arity of the computation rises. It merely adds the ability to gather polynomially more information at each stage, taken from O(n) variates at a time. Although each element sees O(n) variates at each stage of the LFP, it has the capability to utilize only a poly(log n) amount of that information, in the following precise sense.

Remark 5.13. A "true" joint distribution over n covariates, each taking c > 1 different values, requires O(c^n) independent parameters to specify. This happens because the behavior of one variable is dependent on all n − 1 others simultaneously. In cases of joint distributions of n covariates which jointly take only 2^poly(log n) values, this cannot be the case, since the resulting distribution can be parameterized far too economically: the number of joint values taken by the system of n variables is only 2^poly(log n). Note that this is for all structures. We will actually build a graphical model to give us the parameterization in Chapter 8.

To deal with higher arity fixed points we work over the space of k-tuples. Note that there are elegant ways to work with the space of equivalence classes of k-tuples, with equivalence under first order logic with k variables. For instance, one can consider a construction known as the canonical structure, due originally to [DLW95], who used it to provide a model theoretic proof of the important theorem in [AV95] that P = PSPACE if and only if LFP = PFP. In this way, a k-ary LFP over the original structure would be a monadic LFP over this structure. The issue one faces is that there is a linear order on the canonical structure. We also need to ensure that our original structure has a relation that allows an order to be established on k-tuples; we could work over a product structure where LFP captures the class of polynomial time computable queries. But since the LFP terminates in polynomially many steps, this does not pose a problem for encoding instances of k-SAT.
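As a back-of-the-envelope illustration of the parameter-counting point above (the concrete numbers are ours and purely illustrative):

```python
from math import ceil, log2

def true_param_count(n, c=2):
    # A "true" joint distribution of n covariates, each taking c values,
    # needs c**n - 1 independent parameters in general.
    return c**n - 1

def value_limited_param_count(n, exponent=2):
    # If the n variables jointly take only 2**poly(log n) values (here the
    # polynomial is log^2 n), a distribution over those joint values needs
    # only that many parameters, minus one.
    return 2 ** ceil(log2(n) ** exponent) - 1

print(true_param_count(64))           # 2**64 - 1
print(value_limited_param_count(64))  # 2**36 - 1, exponentially smaller
```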

The simple scheme described above suffices for our purposes. See [LR03, §11.5] for more details on canonical structures.

Remark 5.14. Though the Immerman-Vardi theorem is usually stated for ordered structures, it holds for structures equipped with a successor relation (and no linear ordering); see [Lib04, p. 204], where the result is stated for successor structures. The benefit of equipping our structures only with a successor relation is that the Gaifman graph remains non-trivial, whereas a linear order renders the Gaifman graph trivial (totally connected).

5.4 Aggregate Properties of LFP over Ensembles

We have shown that any polynomial time computation will update its relation according to a certain Markov type property on the space of k-types of the underlying structure, after extracting a statistic from the local neighborhoods of the underlying structure. Thus far, there is no probabilistic picture; we are only describing a fully deterministic computation. The distribution we seek will arise when we examine the aggregate behavior of LFP over ensembles of structures that come from ensembles of constraint satisfaction problems (CSPs) such as random k-SAT. When we examine the properties of LFP running over ensembles in the aggregate, we will find the following. The "bounded number of local" property of each stage of monadic LFP computation manifests as conditional independencies in the distribution, making the distribution of solutions poly(log n)-parametrizable. Likewise, value limited interactions in higher arity LFP computations also lead to distributions of solutions that are poly(log n)-parametrizable. This gives us the setting where we can exploit the full machinery of graphical models of Chapter 2.

Before we examine the distributions arising from LFP acting on ensembles of structures, we will bring in ideas from statistical physics into the proof. We begin this in the next chapter.

6. The 1RSB Ansatz of Statistical Physics

6.1 Ensembles and Phase Transitions

The study of random ensembles of various constraint satisfaction problems (CSPs) is over two decades old, dating back at least to [CF86]. While a given CSP — say, 3-SAT — might be NP-complete, many instances of the CSP might be quite easy to solve, even using fairly simple algorithms. Furthermore, such "easy" instances lay in certain well defined regimes of the CSP, while "harder" instances lay in clearly separated regimes. Thus, researchers were motivated to study randomly generated ensembles of CSPs having certain parameters that specify which regime the instances of the ensemble belong to. We will see this behavior in some detail for the specific case of the ensemble known as random k-SAT.

An instance of k-SAT is a propositional formula in conjunctive normal form Φ = C1 ∧ C2 ∧ · · · ∧ Cm having m clauses Ci, each of which is a disjunction of k literals taken from n variables {x1, . . . , xn}. The decision problem of whether a satisfying assignment to the variables exists is NP-complete for k ≥ 3. The ensemble known as random k-SAT consists of instances of k-SAT generated randomly as follows. An instance is generated by drawing each of the m clauses {C1, . . . , Cm} uniformly from the 2^k (n choose k) possible clauses having k variables. The entire ensemble of random k-SAT having m clauses over n literals will be denoted by SATk (n, m).
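A minimal sampler for SATk (n, m), in our own illustrative encoding (a clause is a tuple of signed variable indices; all helper names are invented):

```python
import random

def random_ksat(n, m, k=3, seed=0):
    """Draw a formula from SAT_k(n, m): each of the m clauses is chosen
    uniformly among the 2**k * C(n, k) clauses on k distinct variables.
    +i stands for x_i and -i for its negation."""
    rng = random.Random(seed)
    formula = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), k)   # k distinct variables
        formula.append(tuple(v if rng.random() < 0.5 else -v
                             for v in variables))
    return formula

def is_satisfied(formula, assignment):
    """assignment maps each variable index to True/False."""
    return all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula)

phi = random_ksat(n=20, m=40)   # clause density alpha = 40/20 = 2.0
print(len(phi))                 # 40
```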


and a single instance of this ensemble will be denoted by Φk (n, m). The clause density, denoted by α and defined as α := m/n, is the single most important parameter that controls the geometry of the solution space of random k-SAT. Thus, we will mostly be interested in the case where every formula in the ensemble has clause density α. We will denote this ensemble by SATk (n, α), and an individual formula in it by Φk (n, α).

Random CSPs such as k-SAT have attracted the attention of physicists because they model disordered systems such as spin glasses, where the Ising spin of each particle is a binary variable ("up" or "down") and must satisfy some constraints that are expressed in terms of the spins of other particles. The energy of such a system can then be measured by the number of unsatisfied clauses of a certain k-SAT instance, where the clauses of the formula model the constraints upon the spins. The case of zero energy then corresponds to a solution to the k-SAT instance.

The following formulation is due to [MZ97]. First we translate the Boolean variables xi to Ising variables Si in the standard way, namely Si = −(−1)^xi. Then we introduce new variables Cli as follows. The variable Cli is equal to 1 if the clause Cl contains xi, it is −1 if the clause contains ¬xi, and it is zero if neither appears in the clause. In this way, the sum Σ_{i=1}^{n} Cli Si measures the satisfiability of clause Cl. Specifically, the clause is satisfied by the Ising variables precisely when Σ_{i=1}^{n} Cli Si > −k. The energy of the system is then measured by the Hamiltonian

H = Σ_{l=1}^{m} δ( Σ_{i=1}^{n} Cli Si , −k ).

Here δ(i, j) is the Kronecker delta. Thus, satisfaction of the k-SAT instance translates to vanishing of this Hamiltonian. Statistical mechanics then offers techniques, such as the replica method, to analyze the macroscopic properties of this ensemble.

Also very interesting from the physicist's point of view is the presence of a sharp phase transition [CKT91, MSL92] (see also [KS94]) between satisfiable and unsatisfiable regimes of random k-SAT. Namely, empirical evidence suggested that the properties of this ensemble undergo a clearly defined transition when the clause density is varied. This transition is conjectured to be as follows. For


each value of k, there exists a transition threshold αc (k) such that with probability approaching 1 as n → ∞ (called the thermodynamic limit by physicists),

• if α < αc (k), an instance of random k-SAT is satisfiable; hence this region is called the SAT phase;

• if α > αc (k), an instance of random k-SAT is unsatisfiable; this region is known as the unSAT phase.

There has been intense research attention on determining the numerical value of the threshold between the SAT and unSAT phases as a function of k. [Fri99] provides a sharp but non-uniform construction (namely, the value αc is a function of the problem size, and is conjectured to converge as n → ∞). Functional upper bounds on the threshold have been obtained using the first moment method [MA02] and improved using the second moment method [AP04], whose bound improves as k gets larger.
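Returning to the spin glass formulation above, the [MZ97] Hamiltonian is easy to evaluate directly. The sketch below is our own rendering (encodings invented): a clause contributes energy 1 exactly when all of its literals are false.

```python
def mz_energy(formula, x):
    """Energy of a Boolean assignment x (dict: variable -> 0/1) under
    H = sum_l delta(sum_i C_li * S_i, -k), with S_i = -(-1)**x_i and
    C_li = +1 / -1 / 0 according to how x_i appears in clause l."""
    S = {i: -(-1) ** xi for i, xi in x.items()}
    H = 0
    for clause in formula:
        k = len(clause)
        s = sum((1 if lit > 0 else -1) * S[abs(lit)] for lit in clause)
        H += 1 if s == -k else 0   # sum hits -k iff every literal is false
    return H

phi = [(1, -2)]                      # the clause (x1 or not x2)
print(mz_energy(phi, {1: 0, 2: 1}))  # 1: the only falsifying assignment
print(mz_energy(phi, {1: 1, 2: 1}))  # 0: a satisfying assignment
```

The energy equals the number of unsatisfied clauses, so H = 0 is exactly satisfaction of the instance.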

6.2 The d1RSB Phase

More recently, another thread on this crossroad has originated once again from statistical physics and is most germane to our perspective. This is the work in the progression [MZ97], [BMW00], [MZ02], and [MPZ02] that studies the evolution of the solution space of random k-SAT as the constraint density increases towards the transition threshold. In these papers, physicists have conjectured that there is a second threshold that divides the SAT phase into two — an "easy" SAT phase, and a "hard" SAT phase. In both phases, there is a solution with high probability, but while in the easy phase one giant connected cluster of solutions contains almost all the solutions, in the hard phase this giant cluster shatters into exponentially many communities that are far apart from each other in terms of least Hamming distance between solutions that lie in distinct communities. Furthermore, these communities shrink and recede maximally far apart as the constraint density is increased towards the SAT-unSAT threshold. As this threshold is crossed, they vanish altogether.


As the clause density is increased, a picture known as the "1RSB hypothesis" emerges; it is illustrated in Fig. 6.1 and described below.

RS For α < αd, a problem has many solutions, but they all form one giant cluster within which going from one solution to another involves flipping only a finite (bounded) set of variables. This is the replica symmetric phase.

d1RSB At some value of α = αd which is below αc, it has been observed that the space of solutions splits up into "communities" of solutions such that solutions within a community are close to one another, but are far away from the solutions in any other community. This effect is known as shattering [ACO08]. Within a community, flipping a bounded finite number of variable assignments in one satisfying assignment takes one to another satisfying assignment. But to go from a satisfying assignment in one community to a satisfying assignment in another, one has to flip a fraction of the set of variables, and therefore encounters what physicists would consider an "energy barrier" between states. This is the dynamical one step replica symmetry breaking phase.

unSAT Above the SAT-unSAT threshold, the formulas of random k-SAT are unsatisfiable with high probability.

Using statistical physics methods, [KMRT+07] obtained another phase that lies between d1RSB and unSAT. In this phase, known as 1RSB (one step replica symmetry breaking), there is a "condensation" of the solution space into a subexponential number of clusters, and the sizes of these clusters go to zero as the transition occurs, after which there are no more solutions. This phase has not been proven rigorously thus far, to our knowledge, and we will not revisit it in this work.

The 1RSB hypothesis has been proven rigorously for high values of k.
Specifically, the existence of the d1RSB phase has been proven rigorously for the case of k > 8, starting with [MMZ05] (see also [DMMZ08]), who showed the existence of clusters in a certain region of the SAT phase using first moment methods. Later, [ART06] rigorously proved that there exist exponentially many clusters in the d1RSB phase, and showed that within any cluster, the fraction of variables that take the same value in the entire cluster (the so-called frozen variables) goes to one as the SAT-unSAT threshold is approached. Further, [ACO08] obtained analytical expressions for the threshold at which the solution space of random k-SAT (as also of two other CSPs — random graph coloring and random hypergraph 2-colorability) shatters, as well as confirmed the O(n) Hamming separation between clusters.

Figure 6.1: The clustering of solutions just before the SAT-unSAT threshold, which is indicated by the unfilled circle. [Figure: schematic of the solution space as α increases through αd towards αc.]

In summary, in the region of constraint density α ∈ [αd, αc], the solutions break up into exponentially many communities. Below αd, the space of solutions is largely connected. Between αd and αc, the solution space is comprised of exponentially many communities of solutions which require a fraction of the variable assignments to be flipped in order to move between each other. Above αc, there are no more solutions.

6.2.1 Cores and Frozen Variables

In this section, we reproduce results about the distribution of variable assignments within each cluster of the d1RSB phase from [MMW07], [ART06], and [ACO08]. We first need the notion of the core of a cluster. Given any solution in a cluster, one may obtain the core of the cluster by "peeling away" variable assignments that, loosely speaking, occur only in clauses that are satisfied by other

variable assignments.

To get a formal definition, first we define a partial assignment of a set of variables (x1, . . . , xn) as an assignment of each variable to a value in {0, 1, ∗}. The ∗ assignment is akin to a "joker state" which can take whichever value is most useful in order to satisfy the k-SAT formula. Next, we say that a variable in a partial assignment is free when each clause it occurs in has at least one other variable that satisfies the clause. Now, to obtain the core of a cluster, we repeat the following starting with any solution in the cluster: if a variable is free, assign it a ∗. This process will eventually lead to a fixed point, and that is the core of the cluster. We may easily see that the core is not dependent upon the choice of the initial solution.

What does the core of a cluster look like? Recall that the core is itself a partial assignment, with each variable being assigned a 0, 1 or a ∗. Of obvious interest are those variables that are assigned 0 or 1. These variables are said to be frozen. Note that since the core can be arrived at starting from any choice of an initial solution in the cluster, it follows that frozen variables take the same value throughout the cluster. For example, if the variable xi takes value 1 in the core of a cluster, then every solution lying in the cluster has xi assigned the value 1. The non-frozen variables are those that are assigned the value ∗ in the core; these take both values 0 and 1 in the cluster. Clearly, the number of ∗ variables is a measure of the internal entropy (and therefore the size) of a cluster, since it is only these variables whose values vary within the cluster.

Apriori, we do not know whether there are any frozen variables at all; we have no way to tell that the core will not be the all ∗ partial assignment. However, [ART06] proved that for high enough values of k, almost every variable in a core is frozen as we increase the clause density towards the SAT-unSAT threshold.

Theorem 6.1 ([ART06]). For every r ∈ [0, 1/2] there is a constant kr such that for all k ≥ kr,
there exists a clause density α(k, r) < αc such that for all α ∈ [α(k, r), αc], with probability going to 1 in the thermodynamic limit,

asymptotically almost surely,

1. every cluster of solutions of Φk (n, αn) has at least (1 − r)n frozen variables, and

2. fewer than rn variables take the value ∗.

(We will make this precise later, but informally, cores are too large to pass through the bottlenecks that the stage-wise first order LFP algorithms create.)

Corollary 6.2 ([ART06]). For every k ≥ 9, there exist α < αc (k) such that with high probability,
every cluster of the solution space of Φk (n, αn) has frozen variables.

Furthermore, [MMW07] obtained a lower bound on the size of cores. The bound is linear, which means that when non-trivial cores do exist ([ART06] proved their existence for k ≥ 9), they must involve a fraction of all the variables in the formula. This may also be interpreted as follows. If a formula Φ has a core with C clauses, then these clauses must have literals that come from a set of at most C variables; by bounding the probability of this event, one obtains the linear lower bound. This gives us the corollary.

We end this section with a physical picture of what forms a core. As the reader may imagine after reading the previous chapters, a core may be thought of as the onset of a large single interaction of degree O(n) among the variables, caused by increasing the clause density sufficiently. Furthermore, this core is instantiated amply in the solution space (by that we mean it takes exponentially many values in those many clusters of the d1RSB phase). Algorithms based on LFP can tackle long range interactions between variables, but only when they can be factored into interactions of degree poly(log n) or are value limited. But the appearance of cores is equivalent to the onset of O(n) degree interactions which cannot be further factored into poly(log n) degree interactions, and which are ample. Such ample irreducible O(n) interactions cannot be dealt with using an LFP algorithm. We will need more work to make this precise; see also the remark at the end of this section. Note that this picture is known to hold only for k ≥ 9 and is an open question for k < 9.
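The peeling procedure that defines the core can be sketched directly. The code below is our own illustration on toy formulas (not genuine d1RSB instances); the second example includes a unit clause to force a frozen variable.

```python
def core(formula, solution):
    """Whitening / peeling sketch: starting from a satisfying assignment
    (dict: variable -> 0 or 1), repeatedly assign '*' to every free
    variable, i.e. one whose every clause is satisfied by some other
    variable (a '*' counts as satisfying, acting as the joker state).
    The fixed point of this process is the core."""
    a = dict(solution)

    def lit_true(lit):
        v = a[abs(lit)]
        return v == '*' or v == (1 if lit > 0 else 0)

    changed = True
    while changed:
        changed = False
        for v in list(a):
            if a[v] == '*':
                continue
            occurrences = [cl for cl in formula if any(abs(l) == v for l in cl)]
            if all(any(lit_true(l) for l in cl if abs(l) != v)
                   for cl in occurrences):
                a[v] = '*'          # v is free: peel it away
                changed = True
    return a

print(core([(1, 2), (-1, 3)], {1: 1, 2: 1, 3: 1}))  # the trivial all-* core
print(core([(1,), (1, 2)], {1: 1, 2: 0}))           # x1 stays frozen at 1
```

Underconstrained formulas, as in the first example, peel down to the trivial all ∗ core; frozen variables appear only when some clause can be satisfied by no other variable.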

We have already noted that this is because LFP algorithms factor through first order computations, and in a first order computation, the decision of whether an element is to enter the relation being computed is based on information collected from local neighborhoods and combined in a bounded fashion. This bottleneck is too small for a core to factor through in range limited LFP. The ampleness precludes value limited interactions also, as we shall see. The precise statement of this intuitive picture will be provided in the next chapter when we build our conditional independence hierarchies.

The freezing of variables in cores is known to happen only for k ≥ 9 [ART06]; it remains open for k < 8. Indeed, for low values of k such as k = 2, 3, there is empirical evidence that this phenomenon does not take place [MMW05]; also see the discussion in [ART06, §1]. Hence, our separation of complexity classes needs the regime of k ≥ 9. For more detail and pointers to surveys, see [ACO08] and [CO09].

6.2.2 Performance of Known Algorithms in the d1RSB Phase

We end this chapter with a brief overview of the performance of known algorithms as a function of the clause density. Beginning with [CKT91] and [MSL92], there has been an understanding that hard instances of random k-SAT tend to occur when the constraint density α is near the transition threshold, and that this behavior was similar to phase transitions in spin glasses [KS94]. Now that we have surveyed the known results about the geometry of the space of solutions in this region, we turn to the question of how the two are related. While both regimes in SAT have solutions with high probability, the ease of finding a solution differs quite dramatically on traditional SAT solvers due to a clustering of the solution space into numerous communities that are far apart
It has been empirically observed that the onset of the d1RSB transition seems to coincide with the constraint density where traditional solvers tend to exhibit exponential slowdown.

from each other in terms of Hamming distance. Our work will explain that this is indeed fundamentally a limitation of polynomial time algorithms: in regimes where we know solutions exist, we are currently unable to find them in polynomial time. Specifically, for clause densities above O(2^k/k), no algorithms are known to produce solutions in polynomial time with probability Ω(1) — neither on the basis of rigorous or empirical analysis, nor any other evidence [CO09]. See [ACO08] for proofs that the clause density where all known polynomial time algorithms fail on NP-complete problems such as k-SAT and graph coloring coincides with the onset of the d1RSB phase in these problems.

Hence, for our separation of complexity classes, we will work with random k-SAT in the k ≥ 9 regime, with the clause density sufficiently high so that we are in the d1RSB phase. We will require all known properties of the d1RSB phase — namely, the exponentially many clusters, the freezing of variables within clusters, and the O(n) variable changes required to move from one cluster to another. These properties are not known to hold except for k ≥ 9 and clause densities above (2^k/k) ln k. The significance of this value of k and of clause density above (2^k/k) ln k is as follows. By the results of [ART06] and [ACO08], we are guaranteed the presence of the full d1RSB phenomena only for k ≥ 9 and clause densities above the threshold (2^k/k) ln k for the onset of the d1RSB phase [ACO08]. Compare this to the SAT-unSAT threshold, which is asymptotically 2^k ln 2. This is because in the d1RSB phase, the distribution of solutions is both irreducibly correlated at ranges O(n) and ample, precluding both range and value limited parametrizations. Thus, in such phases (for k ≥ 9), the solution space geometry is not expressible as a mixture of range or value limited poly(log n)-parametrizable pieces. The earlier [ART06] had established the existence of shattering and freezing of variables within cores for α = Θ(2^k),
well below the SAT-unSAT threshold. Please see [CO09] for the best known algorithm, which does solve SAT instances with non-vanishing probability for densities up to 2^k ω(k)/k for any sequence ω(k) → ∞.

More recently, a breakthrough for incomplete algorithms in this field came with [MPZ02], who used the cavity method from spin glass theory to construct an algorithm named survey propagation that does very well on instances of random k-SAT with constraint density above the aforementioned clustering threshold, and continues to perform well very close to the threshold αc for low values of k. The algorithm uses the 1RSB hypothesis about the clustering of the solution space into numerous communities. Survey propagation seems to scale as n log n in this region. The original work reported in [MPZ02] was on 3-SAT; the behavior of survey propagation for higher values of k is still being researched. Incomplete algorithms are obviously very important for hard regimes of constraint satisfaction problems, since we do not have complete algorithms with economical running times in these regimes; nor do incomplete algorithms indicate the lack of a solution, except to the extent that they were unable to find one.

It should be noted that there is empirical evidence that the d1RSB phase is not present in random 3-SAT, in the following sense. The cores in the clusters of random 3-SAT are trivial; by that we mean that they tend to be the all ∗ core, unlike k ≥ 9, where [ART06] show the existence of nontrivial cores for almost all clusters after the d1RSB threshold.

We should also point out that the experimental behavior of algorithms for k-SAT is largely characterized for lower values of k = 2, 3, 4, where the full d1RSB picture is not known to hold. For instance, the experimental behavior of algorithms reported in [MRTS07] is on random 4-SAT; see also [KMRT+06], where experiments are reported on 4-SAT. We are not aware of experimental work that shows the efficacy (even under mild requirements) of any algorithm on k ≥ 9 after the onset of the d1RSB phase with nontrivial cores.
Incomplete algorithms are a class of algorithms that do not always find a solution when one exists.

7. Random Graph Ensembles

We will use factor graphs as a convenient means to encode various properties of the random k-SAT ensemble. In this section we introduce the factor graph ensembles that represent random k-SAT. Our treatment of this section follows [MM09, Chapter 9].

Definition 7.1. The random k-factor graph ensemble, denoted by Gk (n, m), consists of graphs having n variable nodes and m function nodes constructed as follows. A graph in the ensemble is constructed by picking, for each of the m function nodes in the graph, a k-tuple of variables uniformly from the (n choose k) possibilities for such a k-tuple chosen from n variables. In this ensemble, function nodes all have degree k, while the degree of the variable nodes is a random variable with expectation km/n. Graphs constructed in this manner may have two function nodes connected to the same k-tuple of variables.

Definition 7.2. The random (k, α)-factor graph ensemble, denoted by Gk (n, α), consists of graphs constructed as follows. For each of the (n choose k) k-tuples of variables, a function node that connects to only these k variables is added to the graph with probability αn/(n choose k). In this ensemble, the number of function nodes is a random variable with expectation αn, and the degree of variable nodes is a random variable with expectation αk.

We will be interested in the case of the thermodynamic limit of n, m → ∞ with the ratio α := m/n being held constant. In this case, both the ensembles converge in the properties that are important to us.
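Definitions 7.1 and 7.2 translate directly into sampling procedures. The following sketch is ours (the representation, a factor graph as a list of k-tuples of variable indices, is an illustrative choice):

```python
import random
from itertools import combinations
from math import comb

def sample_Gk_nm(n, m, k=3, seed=0):
    """G_k(n, m): each of the m function nodes picks a k-tuple of variable
    nodes uniformly. Function nodes have degree exactly k; a variable's
    degree is random with expectation k*m/n."""
    rng = random.Random(seed)
    return [tuple(sorted(rng.sample(range(n), k))) for _ in range(m)]

def sample_Gk_alpha(n, alpha, k=3, seed=0):
    """G_k(n, alpha): every one of the C(n, k) k-tuples independently
    receives a function node with probability alpha*n / C(n, k), so the
    number of function nodes is random with expectation alpha*n."""
    rng = random.Random(seed)
    p = alpha * n / comb(n, k)
    return [t for t in combinations(range(n), k) if rng.random() < p]

print(len(sample_Gk_nm(n=10, m=25)))        # exactly 25 function nodes
print(len(sample_Gk_alpha(n=10, alpha=2)))  # random, expectation 20
```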

7.1 Properties of Factor Graph Ensembles

With the definitions in place, we are ready to describe two properties of random graph ensembles that are pertinent to our problem. The first property provides us with intuition on why algorithms find it so hard to put together local information to form a global perspective in CSPs.

7.1.1 Locally Tree-Like Property

We have seen in Chapter 5 that the propagation of influence of variables during a LFP computation is stagewise-local. This is really the fundamental limitation of LFP that we seek to exploit. In order to understand why this is a limitation, we need to examine what local neighborhoods of the factor graphs underlying NP-complete problems like k-SAT look like in hard phases such as d1RSB. In such phases, there are many extensive (meaning O(n)) correlations between variables that arise due to loops of sizes O(log n) and above. However, remarkably, such graphs are locally trivial. By that we mean that there are no cycles in a O(1) sized neighborhood of any vertex as the size of the graph goes to infinity [MM09, §9.5].

One may demonstrate this for the Erdős–Rényi random graph as follows. Here, there are n vertices, and there is an edge between any two with probability c/n, where c is a constant that parametrizes the density of the graph. Edges are "drawn" uniformly and independently of each other. Consider the probability of a certain graph (V, E) occurring as a subgraph of the Erdős–Rényi graph. At each position, the probability of the graph structure occurring is p^|E| (1 − p)^((|V| choose 2) − |E|). Such a graph can occur in (n choose |V|) positions. Applying Stirling's approximation, we see that such a graph occurs asymptotically O(n^(|V|−|E|)) times. If the graph is connected, then |V| ≤ |E| + 1, with equality only for trees.
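The counting argument above can be checked empirically. The following sketch is our own illustration (function names are assumptions): it samples a sparse Erdős–Rényi graph G(n, c/n) and verifies that the radius-r ball around a random vertex is almost always a tree, using the same characterization as in the text (a connected graph on the ball's nodes is a tree exactly when it has one fewer edge than nodes).

```python
import random
from collections import deque

def erdos_renyi(n, c, rng):
    """Sparse Erdos-Renyi graph G(n, p) with p = c / n, as adjacency lists."""
    p = c / n
    adj = [[] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < p:
                adj[i].append(j)
                adj[j].append(i)
    return adj

def ball_is_tree(adj, root, r):
    """Collect the radius-r ball around `root` by BFS; the induced subgraph
    is a tree exactly when its edge count equals its node count minus one."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] < r:
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
    nodes = set(dist)
    edges = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return edges == len(nodes) - 1
```

For fixed c and r, the fraction of vertices whose ball contains a cycle vanishes as n grows, which is the locally tree-like property.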

Thus, finite connected graphs that contain cycles have vanishing probability of occurring in finite neighborhoods of any element. This is captured by the following theorem.

Theorem 7.3. Let G be a randomly chosen graph in the ensemble Gk(n, m), and let i be a uniformly chosen node in G. Then the r-neighborhood of i in G converges in distribution to Tk(n, m) as n → ∞.

Let us think about what this implies. We know from the onset of cores and frozen variables in the d1RSB phase of k-SAT that there are strong correlations between blocks of variables of size O(n) in that phase. However, these loops are invisible when we inspect local neighborhoods of a fixed finite size. In short, as the problem size grows, such random graphs do not provide any of their global properties through local inspection at each element, if only local neighborhoods are examined. The simplest local property is the degrees of elements. These are, of course, available through local inspection. The next would be small connected subgraphs (triangles, for instance). But even this next step is not available: in the limit of n → ∞, the two ensembles Gk(n, m) and Tk(n, m) are indistinguishable from each other. Let us see what this means in terms of the information such graphs divulge locally.

7.1.2 Degree Profiles in Random Graphs

The degree of a variable node in the ensemble Gk(n, m) is a random variable. We wish to understand the distribution of this random variable. The expected value of the fraction of variables in Gk(n, m) having degree d is the same as the expected value that a single variable node has degree d, both being equal to

P(deg vi = d) = (m choose d) p^d (1 − p)^(m−d),  where p = k/n.

In the large graph limit we get

lim_{n→∞} P(deg vi = d) = e^(−kα) (kα)^d / d!.

In other words, the degree is asymptotically a Poisson random variable. A corollary is that the maximum degree of a variable node is almost surely less than O(log n) in the large graph case.

Lemma 7.4. The maximum variable node degree in Gk(n, m) is asymptotically almost surely O(log n). In particular, it asymptotically almost surely satisfies

d_max = (z / log(z / log z)) (1 + Θ(log log z / (log z)^2)),  where z = (log n)/(kαe).   (7.1)

Proof. See [MM09, p. 184] for a discussion of this upper bound, as well as a lower bound.
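The Poisson limit of the degree law can be illustrated numerically. The sketch below is our own (names are assumptions): it builds the empirical degree profile of Gk(n, m) with m = αn and compares it with the Poisson pmf with mean kα.

```python
import math
import random
from collections import Counter

def empirical_degree_profile(n, alpha, k, trials, rng):
    """Empirical distribution of variable-node degrees in G_k(n, m) with
    m = alpha * n: each of the m function nodes hits a uniform k-subset."""
    m = int(alpha * n)
    counts = Counter()
    for _ in range(trials):
        deg = [0] * n
        for _ in range(m):
            for v in rng.sample(range(n), k):
                deg[v] += 1
        counts.update(deg)
    total = n * trials
    return {d: c / total for d, c in counts.items()}

def poisson_pmf(lam, d):
    """Limit law for the degree: Poisson with mean k * alpha."""
    return math.exp(-lam) * lam ** d / math.factorial(d)
```

Already for n in the hundreds, the empirical profile sits within a couple of percent of the Poisson(kα) pmf, while the maximum observed degree grows only logarithmically, consistent with Lemma 7.4.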

8. Separation of Complexity Classes

We have built a framework that connects ideas from graphical models, logic, statistical mechanics, and random graphs. We are now ready to begin our final constructions that will yield the separation of complexity classes.

We have described the fundamental similarity between range limited and value limited distributions in Chapter 3. Both are hampered by the same underlying property: in spite of being distributions on n covariates, they can be specified with only 2^poly(log n) parameters. In our terminology, they are both poly(log n)-parametrizable. Informally, this means that their joint distribution behaves like the joint distribution of only poly(log n) covariates instead of n covariates. In light of the above, we first consider the case of range limited poly(log n)-parametrizations. We return to value limited poly(log n)-parametrizations just before the final separation of complexity classes in Section 8.5.

8.1 Measuring Conditional Independence in Range Limited Models

Our central concern with respect to range limited models is to understand which variable interactions in a system are irreducible, namely, those that cannot be expressed in terms of interactions between smaller sets of variables with conditional independencies between them. A joint distribution encodes the interaction of a system of n variables. Such irreducible interactions can be 2-interactions (between pairs), 3-interactions (between triples), and so on, up to n-interactions between all n variables simultaneously. What would happen if all the direct interactions between variables in the system were of less than a certain finite range k, with k < n?

In such a case, the "jointness" of the distribution would lie at a lower level than n. We would like to measure the "level" of conditional independence in a system of interacting variables by inspecting their joint distribution. At level zero of this "hierarchy", the covariates should be independent of each other. At level n, they are coupled together n at a time, without the possibility of being decoupled. Similarly, if the variables did interact n at a time, but took only 2^poly(log n) joint values, the "jointness" of the covariates really would lie at a lower "level" than n.

When the largest irreducible interactions are k-interactions, the distribution can be parametrized with n2^k independent parameters. Thus, in families of distributions where the irreducible interactions are of fixed size, the independent parameter space grows polynomially with n, whereas in a general distribution without any conditional independencies, it grows exponentially with n. The case of monadic LFP lies in between: the interactions are not of fixed size, but they grow relatively slowly. The case of complex LFP is also one of poly(log n)-parametrization, except it is a value-limited O(n) interaction model.

Remark 8.1. In both cases above, the n covariates do not display the behavior of a typical joint distribution of n variables. Instead, as stated in Chapter 3, they behave in ways similar to a set of poly(log n) covariates.

There are some technical issues with constructing such a hierarchy to measure conditional independence. The first issue would be how to measure the level of a distribution in this hierarchy. If, for instance, the distribution has a directed P-map, then we could measure the size of the largest clique that appears in its moralized graph. However, as noted in Sec. 2.5, not all distributions have such maps. Instead, we can make statements about how deeply entrenched the conditional independence between the covariates is, or dually, about how large the set of direct interactions between variables is. We may, for instance, upper and lower bound the level using minimal I-maps and maximal D-maps for the distribution. However, we should note that, in the case of ordered graphs, there may be different minimal I-maps for the same distribution for different orderings of the variables.
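The contrast in parameter counts can be made concrete. The two helper functions below are our own illustration (hypothetical names); they simply restate the counts given in the text for binary covariates.

```python
def params_unrestricted(n):
    """A general joint distribution on n binary covariates, with no
    conditional independencies, needs 2^n - 1 independent parameters."""
    return 2 ** n - 1

def params_k_range(n, k):
    """With irreducible interactions of size at most k, the text's count of
    n * 2^k independent parameters: polynomial in n for fixed k."""
    return n * 2 ** k
```

For fixed k the second count is dwarfed by the first even at modest n, which is the "lower level of jointness" the hierarchy is meant to capture.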

See [KF09, p. 80] for an example. This, however, does not affect us.

By pursuing the above course, we aim to demonstrate that distributions of solutions generated by LFP lie at a lower level of conditional independence than distributions that occur in the d1RSB phase of random k-SAT. Consequently, they have more economical parametrizations than the space of solutions in the d1RSB phase does.

The insight that allows us to resolve the issue is as follows. If we could somehow embed the distribution of solutions generated by LFP into a larger distribution, such that

1. the larger distribution factorized recursively according to some directed graphical model, and
2. the larger distribution had only polynomially many more variates than the original one,

then we would have obtained a parametrization of our distribution that would reflect the factorization of the larger distribution, and would cost us only polynomially more. We will return to the task of constructing such an embedding in Sec. 8.3.

8.2 Generating Distributions from LFP

We will describe the method of generating distributions and showing economical parametrizations, by embedding the covariates into a larger directed graphical model, below for monadic LFP. We will indicate the differences for complex LFP. First we describe how we use LFP to create a distribution of solutions.

8.2.1 Encoding k-SAT into Structures

In order to use the framework from Chapters 4 and 5, we will encode k-SAT formulae as structures over a fixed vocabulary.

A SAT formula is defined over n variables x1, ..., xn. We denote by lower case xi the literals of the formula, while the corresponding upper case Xi denotes the corresponding variable in a model. The variables can come either in positive or negative form. Thus, our universe will have 2n elements, corresponding to the literals x1, ..., xn, ¬x1, ..., ¬xn. In order to avoid new notation, we will simply use the same notation to indicate the corresponding element in the universe.

Our vocabularies are relational, and so we need only specify the set of relations and the set of constants. We do not require constants. We will use three relations.

1. The first relation RC will encode the clauses that a SAT formula comprises. Since we are studying ensembles of random k-SAT, this relation will have arity k. The relation RC will consist of k-tuples from the universe, interpreted as clauses consisting of disjunctions between the variables in the tuple.

2. Next, we need a relation in order to make FO(LFP) capture polynomial time queries on the class of k-SAT structures. We will not introduce a linear ordering, since that would make the Gaifman graph a clique. Rather, we will include a relation such that FO(LFP) can capture all the polynomial time queries on the structure. This will be a binary relation RE, which will be interpreted as an "edge" between successive variables.

3. Lastly, we need a relation RP to hold "partial assignments" to the SAT formulae. The relation RP will be a partial assignment of values to the underlying variables. We will describe these in Sec. 8.2.3.

This describes our vocabulary σ = {RC, RE, RP}. Finally, we need to interpret our relations in our universe. We dispense with the superscripts since the underlying structure is clear.

Now we encode our k-SAT formulae into σ-structures in the natural way. For example, for k = 3, the clause x1 ∨ ¬x2 ∨ ¬x3 in the SAT formula will be encoded by inserting the tuple (x1, ¬x2, ¬x3) into the relation RC. Similarly, the pairs (xi, xi+1) and (¬xi, ¬xi+1), both for 1 ≤ i < n, as well as the pair (xn, ¬x1), will be in the relation RE. This chains together the elements of the structure. This seems most natural, since it imparts on the structure an ordering based on that of the variables. Note also that SAT problems may also be represented as matrices (rows for clauses, columns for variables that appear in them).

The reason for the relation RE that creates the chain is as follows. Recall that an order on the structure enables the LFP computation (or the Turing machine that runs this computation) to represent tuples in a lexicographical ordering. We could encode our structures with a linear order, but that would make the Gaifman graph fully connected. However, to allow LFP to capture polynomial time on the class of encodings, we need to give the LFP something it can use to create an ordering. What we want is something weaker, that still suffices. Specifically, we encode our structures as successor-type structures through the relation RE. On such structures, which have a well defined notion of order on them, polynomial time queries are captured by FO(LFP) [EF06, §11.2].

In our problem of k-SAT, the assignments to the variables that are computed by the LFP have nothing to do with their order. They depend only on the relation RC, which encodes the clauses, and the relation RP, which holds the initial partial assignment that we are going to ask the LFP to extend. In other words, each stage of the LFP is order-invariant. It is known that the class of order invariant queries is also Gaifman local [GS00]. This is a technicality.

Ensembles of k-SAT. Let us now create ensembles of σ-structures using the encoding described above. We will start with the ensemble SATk(n, α) and encode each k-SAT instance as a σ-structure. The encoding of the problem Φk(n, α) as a σ-structure will be denoted by Pk(n, α). The resulting ensemble will be denoted by Sk(n, α).
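A minimal sketch of this encoding (our own illustration, not from the text; literals are represented as signed integers in the DIMACS style rather than as formal elements):

```python
def encode_sigma_structure(n, clauses):
    """Build the relations R_C and R_E over a universe of 2n literal elements.
    Literals are signed integers: v and -v stand for x_v and ¬x_v.
    `clauses` is a list of k-tuples such as (1, -2, -3) for x1 ∨ ¬x2 ∨ ¬x3.
    R_P (the partial assignment) starts out empty."""
    universe = [v for v in range(1, n + 1)] + [-v for v in range(1, n + 1)]
    R_C = set(clauses)
    # Chain (x_i, x_{i+1}) and (¬x_i, ¬x_{i+1}) for 1 <= i < n,
    # closed with the pair (x_n, ¬x_1), as described in the text.
    R_E = ({(i, i + 1) for i in range(1, n)}
           | {(-i, -(i + 1)) for i in range(1, n)})
    R_E.add((n, -1))
    R_P = set()
    return universe, R_C, R_E, R_P
```

The successor-type relation R_E produced here is exactly the weak ordering device the text asks for: a chain, not a linear order, so the Gaifman graph stays sparse.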

8.2.2 The LFP Neighborhood System

In this section, we wish to describe the neighborhood system that underlies the monadic LFP computations on structures of Sk(n, α). Let us recall the factor graph ensemble Gk(n, m), parametrized by the clause density α. Each graph in this ensemble encodes an instance of random k-SAT. We begin with the factor graph, encode the k-SAT instance as a structure as described in the previous section, and build the neighborhood system through the Gaifman graph.

Next, we build the Gaifman graph of each such structure. The set of vertices of the Gaifman graph is simply the set of variable nodes in the factor graph and their negations, since we are using both variables and their negations for convenience (this is simply an implementation detail). For instance, the Gaifman graph for the factor graph of Fig 2.2 will have 12 vertices. Two vertices are joined by an edge in the Gaifman graph either when the two corresponding variable nodes were joined to a single function node (i.e., appeared in a single clause) of the factor graph, or when they are adjacent to each other in the chain that the relation RE has created on the structure.

On this Gaifman graph, the simple monadic LFP computation induces a neighborhood system described as follows. The sites of the neighborhood system are the variable nodes. The neighborhood Ns of a site s is the set of all nodes that lie in the r-neighborhood of the site, where r is the locality rank of the first order formula ϕ whose fixed point is being constructed by the LFP computation. Finally, we make the neighborhood system into a graph in the standard way. Namely, the vertices of the graph will be the set of sites, and each site s will be connected by an edge to every other site in Ns. In other words, every node that was within the locality rank neighborhood of a site in the Gaifman graph is now connected to it by a single edge.

This graph will be called the interaction graph of the LFP computation. The ensemble of such graphs will be denoted by Ik(n, α). Note that this interaction graph has, in general, many more edges than the Gaifman graph; the resulting graph is therefore far more dense than the Gaifman graph. What is the size of cliques in this interaction graph? This is not the same as the size of cliques in the factor graph, because the density of the interaction graph is higher.
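The two-step construction, Gaifman graph first and then the interaction graph obtained by joining each site to everything within distance r, can be sketched as follows (our own illustration; signed integers stand for the literal elements, as in the earlier encoding sketch):

```python
from collections import deque
from itertools import combinations

def gaifman_graph(n, R_C, R_E):
    """Gaifman graph on the 2n literal elements: join literals co-occurring
    in a clause tuple of R_C, plus the chain edges of R_E."""
    adj = {v: set() for v in list(range(1, n + 1))
           + [-v for v in range(1, n + 1)]}
    for tup in R_C:
        for a, b in combinations(tup, 2):
            adj[a].add(b)
            adj[b].add(a)
    for a, b in R_E:
        adj[a].add(b)
        adj[b].add(a)
    return adj

def interaction_graph(adj, r):
    """Join each site to every node within Gaifman distance r, where r plays
    the role of the locality rank of the iterated formula."""
    out = {}
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            if dist[u] < r:
                for v in adj[u]:
                    if v not in dist:
                        dist[v] = dist[u] + 1
                        queue.append(v)
        out[s] = set(dist) - {s}
    return out
```

With r = 1 the interaction graph coincides with the Gaifman graph; for r > 1 it is visibly denser, which is the densification the text refers to.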


The size of the largest clique is a random variable. What we want is an asymptotic almost sure (by this we mean with probability tending to 1 in the thermodynamic limit) upper bound on the size of the cliques in the distribution of the ensemble Ik(n, α).

Note: From here on, all the statements we make about ensembles should be understood to hold asymptotically almost surely in the respective random ensembles. By that we mean that they hold with probability 1 as n → ∞.

Lemma 8.2. The sizes of cliques that appear in graphs of the ensemble Ik(n, α) are upper bounded by poly(log n) asymptotically almost surely.

Proof. Let dmax be as in (7.1), and let r be the locality rank of ϕ. The maximum degree of a node in the Gaifman graph is asymptotically almost surely upper bounded by dmax = O(log n). The locality rank is a fixed number (roughly equal to 3d, where d is the quantifier depth of the first order formula that is being iterated). The node under consideration could have at most dmax others adjacent to it, and the same holds for each of those, and so on. This gives us a coarse upper bound of dmax^r on the size of cliques.

Remark 8.3. While this bound is coarse, there is not much point trying to tighten it, because any constant power factor (r in the case above) can always be introduced by computing an r-ary LFP relation. This bound will be sufficient for us.

Remark 8.4. High degree nodes in the Gaifman graph become significant features in the interaction graph, since they connect a large number of other nodes to each other, and therefore allow the LFP computation to access a lot of information through a neighborhood system of given radius. It is these high degree nodes that reduce factorization of the joint distribution, since they represent direct interaction of a large number of variables with each other. Note that although the radii of neighborhoods are O(1), the number of nodes in them is not O(1), due to the Poisson distribution of the variable node degrees and the existence of high degree nodes.


Remark 8.5. The relation being constructed is monadic, and so it does not introduce new edges into the Gaifman graph at each stage of the LFP computation. When we compute a k-ary LFP, we can encode it into a monadic LFP over a polynomially (n^k) larger product space, as is done in the canonical structure, for instance, but with the linear order replaced by a weaker successor type relation. Therefore, we can always choose to deal with monadic LFP. This is really a restatement of the transitivity principle for inductive definitions, which says that if one can write an inductive definition in terms of other inductively defined relations over a structure, then one can write it directly in terms of the original relations that existed in the structure [Mos74, p. 16].

8.2.3 Generating Distributions

The standard scenario in finite model theory is to ask a query about a structure and obtain a Yes/No answer. For example, given a graph structure, we may ask the query "Is the graph connected?" and get an answer. But what we want are distributions of solutions that are computed by a purported LFP algorithm for k-SAT. This is not generally the case in finite model theory. Intuitively, we want to generate solutions lying in exponentially many clusters of the solution space of SAT in the d1RSB phase. How do we do this? To generate these distributions, we will start with partial assignments to the set of variables in the formula, and ask the question whether such a partial assignment can be extended to a satisfying assignment. We need the following definition.

Definition 8.6. A global relation associated to a decision problem on a class K is a relation R of a fixed arity k over A associated to each structure A ∈ K.

The following is a restatement of the Immerman-Vardi theorem, phrased in terms of computability of global relations. See [LR03, §11.2, p. 206] for a proof.

Theorem 8.7. A global relation R on a class of successor structures is computable in polynomial time if and only if R is inductive.


We wish to see that the global relation that associates to each structure a complete assignment coinciding with the partial assignment placed in the relation RP is inductive. By the theorem above, this is equivalent to showing that it is computable in polynomial time. In order to see this, we recall that decision problems that are NP-complete have a property called self-reducibility, which allows us to query a decision procedure for them a polynomial number of times and build a solution to the search version of the problem. If P = NP, then all decision problems in NP have polynomial time solutions, and one can use self-reducibility to see that the search version will also be polynomial time solvable, namely, a solution will be constructible in polynomial time.

Next we will define our search problem in a way that a solution to it will be a global relation: an instance of the problem will be a structure with partial assignments, and the question will be whether the partial assignment can be extended to a complete assignment. The complete assignment can be represented by a global unary relation that will store all the literals assigned +1, and which must concur with the partial assignment on its overlap. This decision problem is clearly in NP, and therefore if P = NP, it would have a polynomial time search solution, making R computable in polynomial time. The theorem then says R must be inductive.

Since we want to generate exponentially many such solutions, we will have to partially assign O(n) (a small fraction) of the variables, and ask the LFP to extend this assignment, whenever possible, to a satisfying assignment to all variables. Thus, we now see what the relation RP in our vocabulary stands for: it holds the partial assignment to the variables. For example, if we want to ask whether the partial assignment x1 = 1, x2 = 0, x3 = 1 can be extended to a satisfying assignment to the SAT formula, we would store this partial assignment in the tuple (x1, ¬x2, x3) in the relation RP in our structure. As mentioned earlier, the output satisfying assignment will be computed as a unary relation which holds all the literals that are assigned the value 1. This means that xi is in the relation if xi has been assigned the value 1 by the LFP, and otherwise ¬xi is in the relation, meaning that xi has been assigned the value 0 by the LFP computation.
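The self-reducibility argument can be sketched as follows. This is our own illustration: the decision oracle here is a brute-force stand-in for the hypothetical polynomial-time decision procedure, and the point is only the reduction, one oracle call per unset variable turning decision into search.

```python
from itertools import product

def decide_extendable(n, clauses, partial):
    """Stand-in decision oracle: is there a total assignment extending
    `partial` (a dict var -> bool) satisfying all clauses? Exponential here;
    a placeholder for the purported polynomial-time procedure."""
    free = [v for v in range(1, n + 1) if v not in partial]
    for bits in product((False, True), repeat=len(free)):
        a = dict(partial)
        a.update(zip(free, bits))
        if all(any(a[abs(l)] == (l > 0) for l in cl) for cl in clauses):
            return True
    return False

def extend_assignment(n, clauses, partial):
    """Self-reducibility: polynomially many oracle calls build a complete
    satisfying assignment extending `partial`, or report None."""
    if not decide_extendable(n, clauses, partial):
        return None
    a = dict(partial)
    for v in range(1, n + 1):
        if v in a:
            continue
        a[v] = True
        if not decide_extendable(n, clauses, a):
            a[v] = False   # True is not extendable, so False must be
    return a
```

Replacing the brute-force oracle by a polynomial-time one (were it to exist) makes the whole search run in polynomial time, which is exactly the step used in the text.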


For more complex formulas, the output will be some section of a relation of higher arity (please see Appendix A for details). This is the simplest case, where the FO(LFP) formula is simple monadic; in general we will view it as monadic over a polynomially larger structure.

Now we "initialize" our structure with different partial assignments and ask the LFP to compute complete assignments when they exist. If the partial assignment cannot be extended, we simply abort that particular attempt and carry on with other partial assignments until we generate enough solutions. By "enough" we mean rising exponentially with the underlying problem size. In this way we get a distribution of solutions that is exponentially numerous.

8.3 Disentangling the Interactions: The ENSP Model

Now that we have a distribution of solutions computed by LFP, we would like to examine its conditional independence characteristics. Does it factor through any particular graphical model, for instance? In Chapter 2, we considered various graphical models and their conditional independence characteristics. However, our situation is not exactly like any of these models. We will have to build our own, based on the principles we have learnt, and we now analyze it and compare it to the one that arises in the d1RSB phase of random k-SAT.

Let us first note two issues. The first issue is that graphical models considered in the literature are mostly static. By this we mean that 1. they are of fixed size, over a fixed set of variables, and 2. the relations between the variables encoded in the models are fixed. In short, they model fixed interactions between a fixed set of variables. Since we wish to apply them to the setting of complexity theory, we are interested in families of such models, with a focus on how their structure changes with the problem size.
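The sampling procedure described above can be sketched as follows (our own illustration; a brute-force extender stands in for the purported LFP algorithm, and the names are assumptions):

```python
import random
from itertools import product

def satisfies(clauses, a):
    """Does the total assignment `a` (dict var -> bool) satisfy all clauses?"""
    return all(any(a[abs(l)] == (l > 0) for l in cl) for cl in clauses)

def sample_solution_distribution(n, clauses, frac, attempts, rng):
    """Fix a random partial assignment on a small fraction of the variables,
    try to extend it to a full satisfying assignment, and abort the attempt
    on failure, collecting the distinct solutions found."""
    solutions = set()
    for _ in range(attempts):
        fixed = rng.sample(range(1, n + 1), max(1, int(frac * n)))
        partial = {v: rng.random() < 0.5 for v in fixed}
        free = [v for v in range(1, n + 1) if v not in partial]
        for bits in product((False, True), repeat=len(free)):
            a = dict(partial)
            a.update(zip(free, bits))
            if satisfies(clauses, a):
                solutions.add(tuple(a[v] for v in range(1, n + 1)))
                break   # one extension per initialization
    return solutions
```

Different random initializations land in different parts of the solution space, which is how the procedure is meant to probe exponentially many clusters in the d1RSB phase.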

The second issue that faces us now is as follows. Consider simple monadic LFP. We know that there is a "directed-ness" to LFP, in that elements that are assigned values at a certain stage of the computation then go on to influence other elements which are as yet unassigned. This is, for example, different from a Markov random field distribution, which has no such direction.

There are two types of flows of information in a LFP computation. In one type of flow, neighborhoods across the structure influence the value an unassigned node will take. In the other type of flow, once an element is assigned a value, it changes the neighborhoods (or, more precisely, the local types of various other elements) in its vicinity. Note that while the first type of flow happens during a stage of the LFP, the second type is implicit: there is no separate stage of the LFP where it happens. It implicitly happens once any element enters the relation being computed.

Thus, in general, we do not have a fixed graph on n vertices that will model all our interactions. Namely,

1. there is a directed flow of influence as the LFP computation progresses, and the way a LFP computation proceeds through the structure will, in general, vary with the initial partial assignment. We would expect a different "trajectory" of the LFP computation for different clusters in the d1RSB phase. That is, if one initial partial assignment landed us in cluster X, and another in cluster Y, the trajectories of the two different initial partial assignments will not be the same, although we would expect them to be similar.

2. Even within a cluster, the way the LFP would go about assigning values to the unassigned variables would be, in general, different.

3. Because the flow of information is as described above, we will not be able to express it using a simple DAG on either the set of vertices, or the set of neighborhoods.

How do we deal with this situation? In order to model this dynamic behavior, we have to consider building a graphical model on certain larger product spaces. Let us build some intuition first.

Let us now incorporate this intuition into a model, which we will call an Element-Neighborhood-Stage Product model, or ENSP model for short. This model appears to be of independent interest. We now describe the ENSP model for a simple monadic least fixed point computation. The model is illustrated in Fig. 8.1. It has two types of vertices.

Element Vertices These vertices, represented by the smaller circles in Fig. 8.1, encode the variables of the k-SAT instance. They therefore correspond to elements in the structure (recall that elements of the structure represent the literals in the k-SAT formula). Each variable in our original system X1, ..., Xn is represented by a different vertex at each stage of the computation, where n is the number of variables in the SAT formula. Thus, each variable in the original system gives rise to |ϕ^A| vertices in the ENSP model. Also recall that there are 2n elements in the k-SAT structure. However, in Fig 8.1, we have only shown one vertex per variable, and allowed it to be colored two colors: green indicating the variable has been assigned the value +1, and red indicating the variable has been assigned the value −1. Since the underlying formula ϕ that is being iterated is positive, elements do not change their color once they have been assigned.

Neighborhood Vertices These vertices, denoted by the larger circles with blue shading in Fig. 8.1, represent the r-neighborhoods of the elements in the structure. Each of their possible values are the possible isomorphism types of the r-neighborhoods, or one may think of them as a single variable taking the value of the various local r-types. Just like variables, each neighborhood is also represented by a different vertex at each stage of the LFP computation.

The stage-wise nature of LFP is central to our analysis, and the various stages cannot be bundled into one without losing crucial information; we do need a model which captures each stage separately. In order to exploit the factorization properties of directed graphical models, and the resulting parametrization by potentials, we would like to avoid any closed directed paths.
The stage-wise nature of LFP is central to our analysis. Thus. we would like to avoid any closed directed paths. Element Vertices These vertices.

[Figure 8.1 appears here: a grid of element vertices X(i, t) and neighborhood vertices N(xi, t), arranged column by column along the stages of the LFP computation.]

Figure 8.1: The Element-Neighborhood-Stage Product (ENSP) model for LFPϕ. See text for description.

Now we describe the stages of the ENSP, starting from the leftmost and terminating at the rightmost. There are 2|ϕ^A| stages; namely, each stage of the LFP computation is represented by two stages in the ENSP. Initially, at the start of the LFP computation, we are in the left-most stage. Here, a small fraction O(n) of the variables are assigned values, meaning they get assigned +1 or −1. In the figure, notice that some variable vertices are colored green, and some red: X4,1 is green, and Xi,1 is red. This indicates that the initial partial assignment that we provided the LFP had variable X4 assigned +1 and variable Xi assigned −1. The LFP is asked to extend this partial assignment to a complete satisfying assignment on all variables (if it exists, and abort if not). The neighborhood vertices take as values the local r-types of the corresponding element. These vertices may be thought of as vectors of size poly(log n), corresponding to the cliques that occur in the neighborhood system we described earlier.

At the first stage, some elements enter the relation, based on the conditions expressed by the formula ϕ in terms of their own local neighborhoods, and the existence of a bounded number of other local neighborhoods in the structure. In the figure, at the first stage, the LFP assigned the value +1 to the variable X3, and it assigned the value −1 to variable Xn (remember that the first two stages in the ENSP correspond to the first stage of the LFP computation). Once some variables have been assigned values in the first stage, their neighborhoods, and the neighborhoods in their vicinity (meaning, the neighborhoods of other elements that are in their vicinity) change. The vertices that do not change state simply transmit their existing state to the corresponding vertices in the next stage by a horizontal arrow, which we do not show in the figure in order to avoid clutter.

Let us now look at the transition to the second stage of the ENSP. Here, the variable X3,2 takes the color green based on information gathered from its own neighborhood N(X3,1) and two other neighborhoods N(X2,1) and N(Xn−1,1). In this way, influence propagates through the structure during a LFP computation. This is indicated by the dotted arrows between the second and third stages of the ENSP. For example, once X3 has been assigned the value +1, it updates its neighborhood and also the neighborhood of variable X2, which lies in its vicinity (in this example). Note that this happens implicitly during LFP computation.

There are two stages of the ENSP for each stage of the LFP. The first stage is the explicit stage, where variables get assigned values. The second stage is the implicit stage, where variables "update their neighborhoods" and those neighborhoods in their vicinity. That is why we have represented each stage of the actual LFP computation by two stages in the ENSP. Thus, there are 2|ϕ^A| stages of the ENSP in all. By the end of the computation, all variables have been assigned values, and we have a satisfying assignment. We recover our original variables (X1, ..., Xn) by simply looking only at the last (rightmost in the figure) level of the ENSP: the variables at the last stage Xi,|ϕ^A| are just the original Xi.

The explicit stages of the ENSP also perform the task of propagating the local constraints placed by the various factors in the underlying factor graph outward into the larger graphical model. In our case of the factors encoding clauses of a k-SAT formula, the local constraint placed by a clause is that the global assignment must evade exactly one restriction to a specified set of k coordinates. For example, in the case of k = 3, the clause x1 ∨ x2 ∨ ¬x3 permits all global assignments except those whose first three coordinates are (−1, −1, +1), and so the global solution space is an intersection of such permitted sets. Note that these are O(1) local constraints per factor. Thus, k-SAT asks a question about whether certain spaces of the form {ω : (ωi1, ..., ωik) ≠ (ν1, ..., νk)}, where 1 ≤ i1 < i2 < ··· < ik ≤ n and the prohibited νi are ±1, have non-empty intersections. In contrast, if the factor were a XORSAT clause, the local restrictions would all be in the form of linear spaces.

Remark 8.8. We have embedded our original set of variates into a polynomially larger product space, and obtained a directed graphical model on this larger space. This product space has a nice factorization due to the directed graph structure. This is what we will exploit. By introducing extra variables to represent each stage of each variable and each neighborhood in the SAT formula, we have accomplished our original aim.
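The contrast between a k-SAT clause, which forbids exactly one local pattern, and a XORSAT constraint, which carves out an affine subspace over GF(2), can be made concrete. This sketch is our own illustration (names are assumptions), using the ±1 convention of the text:

```python
from itertools import product

def sat_clause_allowed(signs):
    """A k-clause forbids exactly one pattern on its k coordinates: the one
    falsifying every literal. E.g. x1 ∨ x2 ∨ ¬x3 has signs (1, 1, -1) and
    forbids (-1, -1, +1), matching the text's example."""
    forbidden = tuple(-s for s in signs)
    return [p for p in product((-1, 1), repeat=len(signs)) if p != forbidden]

def xorsat_clause_allowed(k, parity):
    """A XORSAT constraint keeps the patterns whose number of -1 entries has
    a fixed parity: an affine subspace over GF(2), not a single excluded
    point. Half of the 2^k local patterns survive."""
    return [p for p in product((-1, 1), repeat=k)
            if p.count(-1) % 2 == parity]
```

A k-SAT factor permits 2^k − 1 of the 2^k local patterns, while a XORSAT factor permits exactly 2^(k−1) of them and does so linearly; it is this global linearity that makes the XORSAT solution space so much simpler.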
we have accomplished our original aim.8. influence propagates through the structure during a LFP computation. Xn ) by simply looking only at the last (rightmost in the figure) level of the ENSP. SEPARATION OF COMPLEXITY CLASSES 92 ted arrows between the second and third stages of the ENSP. For example. in our case of the factors encoding clauses of a k-SAT formula. . . Thus.
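The clause-level constraint described above can be made concrete in a short sketch. This is our own illustration, not part of the construction; the helper names are hypothetical, and the ±1 value convention follows the text:

```python
def clause_forbidden_pattern(clause):
    # A clause is a tuple of non-zero signed variable indices, e.g. (1, 2, -3)
    # for x1 v x2 v ~x3. The unique assignment it forbids makes every literal
    # false: variable |v| takes value -sign(v) (values are +/-1 as in the text).
    return {abs(v): (-1 if v > 0 else +1) for v in clause}

def satisfies(assignment, clauses):
    # A global assignment satisfies the formula iff it evades, for each clause,
    # the single forbidden restriction to that clause's k coordinates.
    for c in clauses:
        forbidden = clause_forbidden_pattern(c)
        if all(assignment[i] == val for i, val in forbidden.items()):
            return False
    return True
```

For the clause x1 ∨ x2 ∨ ¬x3, the single forbidden restriction is x1 = −1, x2 = −1, x3 = +1, matching the example in the text.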

Thus, k-SAT asks a question about whether the complements of certain spaces of the form

{ω : (ω_i1, . . . , ω_ik) = (ν1, . . . , νk)},

where 1 ≤ i1 < i2 < · · · < ik ≤ n and the prohibited νi are ±1, have a non-empty intersection. Note that these are O(1) local constraints per factor. In contrast, if we were to try to solve XORSAT formulae, we would obtain a solution space that is linear: XORSAT asks the question of whether certain linear spaces have a non-empty intersection. Linearity, however, is a global constraint.

Note that without embedding the covariates into a larger space, we would not be able to place the various computations done by LFP into a single graphical model. By embedding the covariates into a polynomially larger space, we have been able to put a common structure on the various computations done by LFP on them. Of course, all messages are coded into the formula ϕ, and the end result of multiple runs of the LFP will be a space of solutions conditioned upon the requirements.

Since the LFP completes its computation in under a fixed polynomial number of steps, there are 2|ϕA| stages of the ENSP in all. Thus, we have a directed graph with 2n + n = 3n vertices at each stage, and 2|ϕA| stages. In other words, we have managed to represent the LFP computation on a structure as a directed model using only a polynomial overhead in the number of parameters of our representation space. If LFP were able to compute solutions in the d1RSB phase of random k-SAT, then the distribution of the entire space of solutions would have a substantially simpler parametrization than we know it does. The insight that we can afford to incur a polynomial cost in order to obtain a common graphical model on a larger product space was key to this section.

8.4 Parametrization of the ENSP

Our goal in this section is to demonstrate the following.
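The contrast with XORSAT can be checked directly: the solution set of a XOR formula is an affine subspace over GF(2), so its size is a power of two and it is closed under the operation a ⊕ b ⊕ c on solutions. A minimal sketch of our own (brute force for clarity, not efficiency):

```python
from itertools import product

def xorsat_solutions(n, equations):
    # Each equation is (indices, parity): the XOR of the indexed bits must
    # equal parity. Enumerate all assignments and keep the satisfying ones.
    sols = []
    for bits in product((0, 1), repeat=n):
        if all(sum(bits[i] for i in idx) % 2 == b for idx, b in equations):
            sols.append(bits)
    return sols

def xor3(a, b, c):
    # The solution set is a coset of a linear space: the bitwise XOR of any
    # three solutions is again a solution.
    return tuple(x ^ y ^ z for x, y, z in zip(a, b, c))
```

With two independent equations over three variables, the solution count is 2^(3 − 2) = 2, and the closure property holds, in contrast to the O(1)-local but globally unstructured constraints of k-SAT.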

The directed nature of the ENSP also means that we can factor the resulting distribution into conditional probability distributions (CPDs) at each vertex of the model, of the form P(x | pa(x)), and then normalize each CPD, automatically ensuring conditional independence. This gives us the factorization (over the expanded representation space) of our distribution, without any added positivity constraints. Recall that positivity is required in order to apply the Hammersley-Clifford theorem to obtain factorizations for undirected models; the major benefit of directed graphical models is that we can do this always. By employing the version of Hammersley-Clifford for directed models (Theorem 2.13), we also know that we can parametrize the distribution by specifying a system of potentials over its cliques.

How do we compute the CPDs or potentials? We assign various initial partial assignments to the variables as described in Sec. 8.2.3 and let the LFP computations run (assuming that P = NP, so that such LFP computations exist). We only consider successful computations, namely those where the LFP was able to extend the partial assignment to a full satisfying assignment to the underlying k-SAT formula. We represent each stage of the LFP computation on the corresponding two stages of the ENSP, and thus obtain one full instantiation of the representation space. We do this exponentially numerous times, and build up our local CPDs by simply recording local statistics over all these runs.

The ENSP for different runs of the LFP will, in general, be different. This is because the flow of influences through the stages of the ENSP will, in general, depend on the initial partial assignment. What is important is that each such model will have some properties, such as largest clique size (which determines the order of the number of parameters), in common.

In order to accomplish our goal, we need to measure the growth in the dimension of independent parameters required to parametrize the distribution of solutions that we have just computed using LFP. From our perspective, we have embedded our variates into a polynomially larger space that has a factorization according to a directed model, the ENSP. We have seen that the cliques in the ENSP are of size poly(log n), and so each CPD will have scope only poly(log n).

Lemma 8.10. A distribution that factorizes according to the ENSP can be parametrized with 2^poly(log n) independent parameters.

Let us inspect those properties of the ENSP model that determine its parametrization.

1. There are polynomially many more vertices in the ENSP model than elements in the underlying structure. This will not change if we were computing using complex fixed points, since the space of k-types is only polynomially larger than the underlying structure.

2. The number of local r-types whose value each neighborhood vertex can take is 2^poly(log n). In the case of the LFP neighborhood system, Lemma 8.2 gives us a poly(log n) upper bound on the size of the neighborhoods, so each of these possibilities can be parametrized by 2^poly(log n) parameters.

3. At each explicit stage of the ENSP, by Theorem 5.9 there is a fixed constant s such that there must exist s neighborhoods in the structure satisfying certain local conditions for the formula to hold. This gives us poly(n) (O(n^s) in this case) different possibilities for each explicit stage of the ENSP. The same can also be arrived at by utilizing the normal form theorem.

4. At each implicit stage of the ENSP, we have to update the types of the neighborhoods that were affected by the induction of elements at the previous explicit stage. There are only n neighborhoods, and each has poly(log n) elements at most. Remember, we are presently analyzing a single stage of the LFP. By the previous point, this again gives us poly(n) different possibilities for each implicit stage of the ENSP.

In summary, the scope of the factors in the parametrization grows as poly(log n), and the factors are chained together through conditional independencies, giving us a total of 2^poly(log n) parameters required. The cliques are also upper bounded in size by poly(log n). Therefore, the ENSP is an interaction model whose direct interactions are of size poly(log n), and which chains these together through conditional independencies. This underscores the principle that the description of the parameter space is simpler because it directly involves interactions between only poly(log n) variates at a time.
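The procedure of building local CPDs by recording statistics over many successful runs can be sketched as follows. This is a toy estimator of our own over hypothetical vertex and parent names; the actual ENSP vertices and parent sets come from the construction above:

```python
from collections import defaultdict

def estimate_cpds(runs, parents):
    # runs: list of dicts mapping vertex -> value (one full instantiation per run)
    # parents: dict mapping vertex -> tuple of its parent vertices in the model
    counts = defaultdict(lambda: defaultdict(int))
    for run in runs:
        for v, pa in parents.items():
            pa_val = tuple(run[p] for p in pa)
            counts[(v, pa_val)][run[v]] += 1
    # Normalize each count table into a CPD P(v | pa(v)).
    cpds = {}
    for key, table in counts.items():
        total = sum(table.values())
        cpds[key] = {val: c / total for val, c in table.items()}
    return cpds
```

Each entry of `cpds` has scope bounded by the parent-set size, so in the ENSP setting every table would involve only poly(log n) variates.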

While the entire distribution obtained by LFP may not factor according to any one ENSP, it is a mixture of distributions each of which factorizes as per some ENSP. Next, we analyze the features of such a mixture when exponentially many instantiations of it are provided.

The crucial property of the distribution of an ENSP is that it admits a recursive factorization. This is what drastically reduces the parameter space required to specify the distribution. It also allows us to parametrize the ENSP by simply specifying potentials on its maximal cliques, which are of size poly(log n). We may think of the cliques as the building blocks of each ENSP. The property of the ENSP for range limited models that allows us to analyze the behavior of mixtures is that it is specified by local Gibbs potentials on its cliques. Namely, a variable interacts with the rest of the model only through the cliques that it is part of. Thus, a vertex displays collective behavior only of range poly(log n).

The sizes of the largest cliques are poly(log n) for each single run of the LFP. In other words, the mixture comprises distributions that can be parametrized by a subspace of R^poly(log n), in contrast to requiring the larger space R^O(n). As the reader may intuit, when such a mixture is asked to provide exponentially many samples, the features in the mixture will be of size poly(log n), not of size O(n). This means that when exponentially many solutions are generated, they will show features of scope poly(log n). This is simply a statement about the paucity of independent parameters in the component distributions of the mixture. We will treat the value limited case shortly.

8.5 Separation

We continue our treatment of range limited poly(log n)-parametrizations.
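The gap between the two parameter spaces is easy to quantify. As a sketch of our own, a table over a block of scope log²n binary variates (one representative choice of poly(log n)) needs about 2^(log²n) parameters, vastly fewer than the 2^n needed for a single unfactored joint distribution over all n variates:

```python
import math

def params_per_factor(scope):
    # A full table over `scope` binary variates has 2**scope entries.
    return 2 ** scope

def compare(n):
    # A representative poly(log n) scope versus the full O(n) scope.
    polylog_scope = int(math.log2(n) ** 2)
    return params_per_factor(polylog_scope), params_per_factor(n)

small, large = compare(1024)  # log2(1024) = 10, so the small scope is 100
```

Chaining poly(log n)-sized factors through conditional independencies keeps the total at 2^poly(log n), while the unfactored joint remains exponentially larger.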

Next, let us examine the value limited case. The solutions are now generated by complex LFP, namely, as sections of inductive relations of higher arity. In this case, the differences are as follows:

1. The potentials are already large in their scope (O(n)), but the graphical model is parametrizable with only 2^poly(log n) parameters.

2. The Gibbs potentials are specified over cliques of size O(n).

3. There are O(n) interactions at each stage.

4. However, the potentials are parametrized with only 2^poly(log n) parameters in spite of having O(n) size.

How do we analyze mixtures of such potentials? The idea is as follows. How do we merge various O(n) potentials into a single poly(n) sized potential, and what will be the resulting parametrization of this merged potential? In order to merge the potentials, we observe that they have a certain sheaf-like property: they must agree on overlaps. Remember, these CPDs are nothing but the rules by which the computation proceeds, and these rules are the same for different computations since it is the same LFP that is being used. Since they are CPDs of the same LFP, two CPDs cannot specify different behavior for the same priors. So we will create a single potential over the entire graphical model, which will have scope poly(n) (since the computation terminates in polynomial time). Thus, the final merged potential will be compatible with each smaller potential on overlaps.

If we think of a potential as a CPD, then the CPDs are wide (have O(n) columns), but are not very long (have only 2^poly(log n) rows). Using this property, we can see that if each of the potentials had a 2^poly(log n) parametrization, then so must the final merged potential. Once again, we see that we cannot instantiate exponentially many solutions from such a limited parametrization and obtain the d1RSB picture, which requires ample O(n) joint distributions.

This explains why polynomial time algorithms fail when interactions between variables are ample, namely O(n). In the case of random k-SAT in the d1RSB phase, these ample irreducible-O(n) interactions manifest through the appearance of cores, which comprise clauses whose variables are coupled so tightly that one has to assign them "simultaneously." Cores arise when a set of C = O(n) clauses have all their variables also lying in a set of size C. Intuitively, variables in a core are so tightly coupled together that they can only vary jointly, without the possibility of factoring into smaller pieces through conditional independencies. In other words, they represent irreducible interactions of size O(n) that may not be factored any further. Nor are they value limited, since they instantiate in each of the exponentially many clusters of the d1RSB phase; their variation is ample. Since cores do not factor through conditional independencies, and are not value limited either, this makes it impossible for polynomial time algorithms to assign their variables correctly: cores cannot be assigned poly(log n) variables at a time, with successive such assignments chained together through conditional independencies.

In such cases, parametrization over cliques of size only poly(log n), with only 2^poly(log n) parameters, is insufficient to specify the joint distribution. What is required is parametrization over cliques of size O(n), which requires O(c^n) independent parameters to specify.

This also puts on rigorous ground the empirical observation that even NP-complete problems are easy in large regimes, and become hard only when the densities of constraints increase above a certain threshold. This threshold is precisely the value where ample irreducible-O(n) interactions, which display the ample joint behavior of a system of n covariates, first appear in almost all randomly constructed instances.

We have shown that in the ENSP for range limited models, the size of the largest such irreducible interactions is poly(log n), not O(n). Furthermore, since the model is directed, it guarantees us conditional independencies at the level of its largest interactions. More precisely, it guarantees us that there will exist conditional independencies in sets of size larger than the largest cliques in its moral graph, which are O(poly(log n)). In other words, should the core factorize as per the ENSP, there would be independent variation within cores when conditioned upon values of intermediate variables that also lie within the core. This is illustrated in Fig. 8.2. This is contradictory to the known behaviour of cores for sufficiently high values of k and clause density in the d1RSB phase.

Figure 8.2: The factorization and conditional independencies within a core due to potentials of size poly(log n). (The figure shows two blocks of size poly(log n) within the core, each varying independently given intermediate values.)

The framework we have constructed allows us to analyze the set of polynomial time algorithms simultaneously, since they can all be captured by some LFP, instead of dealing with each individual algorithm separately. It makes precise the notion that polynomial time algorithms can take into account only interactions between variables that grow as poly(log n), while the features present in cores in the d1RSB phase have size O(n). At this point, we are ready to state our main theorem.

Theorem 8.1. P ≠ NP.

Proof. Consider the solution space of k-SAT in the d1RSB phase for k > 8, as recalled in Section 6.2. We know that for high enough values of the clause density α, we have O(n) frozen variables in almost all of the exponentially many clusters.
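The notion of a core, a set of C clauses whose variables all lie within a set of size C, can be exercised with the standard peeling procedure: repeatedly delete any clause containing a variable that occurs in no other remaining clause. This sketch is ours, under that standard leaf-removal definition; the text itself does not fix an algorithm:

```python
from collections import Counter

def core(clauses):
    # Clauses are given as tuples of signed variable indices; only the
    # variable identities matter for peeling, so drop the signs.
    clauses = [frozenset(abs(v) for v in c) for c in clauses]
    changed = True
    while changed:
        changed = False
        occ = Counter(v for c in clauses for v in c)
        keep = []
        for c in clauses:
            if any(occ[v] == 1 for v in c):
                changed = True  # c contains a leaf variable: peel it off
            else:
                keep.append(c)
        clauses = keep
    return clauses
```

A cycle of clauses, in which every variable appears in two clauses, survives peeling intact, while a pendant clause hanging off it is removed; a chain peels away entirely.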

The first observation we make is that since the variables in cores are instantiated in exponentially many clusters, we have generated more than exponential in poly(log n) distinct solutions. Namely, we can preclude value limited poly(log n)-parametrization.

Let us consider then the situation where these clusters were generated by a purported range limited LFP algorithm for k-SAT that can be parametrized by the ENSP model with clique sizes poly(log n). The basic question in analyzing such mixtures is: how many variables do we need to condition upon in order to split the distribution into conditionally independent pieces? The answer is given by (a) the size of the largest cliques, and (b) the number of such cliques that a single variable can occur in. In our case, these two give us a poly(log n) quantity. Thus, when exponentially many solutions have been generated, we will see the effect of conditional independencies beyond range poly(log n). Namely, when exponentially many solutions have been generated from distributions having the parametrization of the ENSP model, there will be conditional distributions that exhibit conditional independence between blocks of variates of size poly(log n).

Let αβγ be a representation of the variables in cliques α, β and γ. When exponentially many solutions have been generated, we will have non-trivial conditional distributions conditioned upon values of the β variables. If each set of such variables has scope at most poly(log n), then given a value of β, we will see independent variation in the variables of α and γ over all their possible conditional values. The conditional independencies ensure that we will see cross terms of the form α1βγ1, α2βγ2, α1βγ2, α2βγ1.

But we know that in the highly constrained phases of d1RSB, we have O(n) frozen variables in almost all of the exponentially many clusters; we may even choose our poly(log n) blocks to be in overlaps of these variables. Since O(n) variables have to be changed when jumping from one cluster to another, the cross terms above would mean that with a poly(log n) change in the frozen variables of one cluster, we would get a solution in another cluster. But in the d1RSB phase we need O(n) variable flips to get from one cluster to the next. This gives us the contradiction that we seek.
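The O(n) separation between clusters invoked in the proof is simply a statement about pairwise Hamming distances. A toy check of our own, over hypothetical miniature solution sets with ±1 values:

```python
def hamming(a, b):
    # Number of coordinates in which the two assignments differ.
    return sum(x != y for x, y in zip(a, b))

def min_intercluster_distance(cluster_a, cluster_b):
    # Minimum number of variable flips needed to move from any solution in
    # cluster_a to any solution in cluster_b.
    return min(hamming(a, b) for a in cluster_a for b in cluster_b)
```

In the d1RSB phase this quantity is O(n) between distinct clusters, which is exactly what cross terms produced by poly(log n)-sized features would violate.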

We can see that due to the limited parameter space that determines each variable, it can only display a limited joint behavior. This behavior is completely determined by poly(log n) other variates, not by O(n) other variates. In other words, the "jointness" in this distribution lies at a level poly(log n), not O(n). It is also useful to consider how many different parametrizations a block of size poly(log n) may have. Each variable may choose poly(log n) partners out of O(n) to form a potential, and it may choose O(log n) such potentials. Even coarsely, this means that blocks of variables of size poly(log n) only "see" the rest of the distribution through equivalence classes that grow as O(n^poly(log n)). This quantity would have to grow exponentially with n in order to display the behavior of the d1RSB phase; instead, it must be exponential in only poly(log n).

We may think of such mixtures as possessing only c^poly(log n) "channels" through which a variable communicates directly with other variables. All long range correlations transmitted in such a distribution must pass through only these many channels; their correlations must factor through this bottleneck. Therefore, exponentially many solutions cannot independently transmit O(n) correlations (namely, the variables that have to be changed when jumping from one cluster to another). This is what prevents the Hamming distance between solutions from being O(n) on the average over exponentially many solutions.

This is why, when enough solutions have been generated by the LFP, the resulting distribution will start showing features that are at most of size poly(log n): there will be solutions that show cross-terms between features whose size is poly(log n), which gives us conditional independencies after range poly(log n). This means that blocks of size larger than this are varying independently of each other conditioned upon some intermediate variables, giving us the cross-terms described earlier. This is shown pictorially in Fig. 8.2. Once again we return to the same point: the jointness of the distribution that a purported LFP algorithm would generate would lie at the poly(log n) levels of conditional independence, whereas the jointness in the distribution of the d1RSB solution space is truly O(n).

Hard regimes of NP-complete problems allow O(n) variates to vary jointly and irreducibly. Namely, there are irreducible interactions of size O(n) that cannot be expressed as interactions between poly(log n) variates at a time, chained together by conditional independencies as would be done by a LFP. Accounting for such O(n) jointness, which cannot be factored any further, is beyond the capability of polynomial time algorithms. This is central to the separation of complexity classes. We collect some observations in the following remarks.

Remark 8.11. We can see from the preceding discussion that the number of independent parameters required to specify the distribution of the entire solution space in the d1RSB phase (for k > 8) rises as c^n, c > 1. This is because it takes that many parameters to specify the exponentially many O(n) variable "jumps" between the clusters. These jumps are independent, and cannot be factored through poly(log n) sized factors, since that would mean conditional independence of pieces of size poly(log n) and would ensure that the Hamming distance between solutions was of that order. Note that the central notion is that of the number of independent parameters.

Remark 8.12. The poly(log n) size of features, and therefore of the Hamming distance between solutions, tells us that polynomial time algorithms correspond to the RS phase of the 1RSB picture, not to the d1RSB phase.

Remark 8.13. The frozen variables in XORSAT do not arise due to a high dimensional parametrization, but simply because the 2-core percolates [MM09, §18.3]. In XORSAT, the linearity of the problem causes frozen variables to occur: each cluster is a linear space tagged on to a solution for the 2-core. Linear spaces always admit a simple description as the linear span of a basis, which takes only the order of the log of the size of the space to specify; this is also why the clusters are all of the same size. Namely, frozen variables can occur even in low dimensional parametrizations in the presence of additional constraints placed by the problem.

Remark 8.14. It is tempting to think that there will be such a parametrization whenever the algorithmic procedure used to generate the solutions is stage-wise local. This is not so. For example, even PFP has the stage-wise bounded local property, but it can give rise to distributions without any conditional independence factorizations whose factors are of size poly(log n). When placed in the ENSP, we see that there is factorization, but over an exponentially larger space, where clique sizes are of exponential size. We need the added requirement that "mistakes" are not allowed; in other words, we cannot change a decision that has been made. One might observe that it is this requirement, that we not make any trial and error at all, that limits LFP computations in a fundamentally different manner than the locality of information flows. See [Put65] for an interesting related notion of "trial and error predicates" in computability theory.

8.6 Some Perspectives

The following perspectives are reinforced by this work.

1. The most natural object of study for constraint satisfaction problems is the entire space of solutions. It is in this space that the dependencies and independencies that the CSP imposes upon the covariates that satisfy it manifest themselves.

2. The view that an algorithm is a means to generate one solution is limited, in the sense that it is oblivious to the geometry of the space of all solutions. It may, of course, be the appropriate approach in many applications. But there are applications where requiring algorithms to generate numerous solutions, and to approximate with increasing accuracy the entire space of solutions, seems more natural.

3. Studying the parametrization of the space of solutions is a worthwhile pursuit. There is an intimate relation between the geometry of the space and its parametrization. In order to bring this structure under study, we may have to embed the space of covariates into a larger space (as done by the ENSP).

4. Conditional independence over factors of small scope is at the heart of resolving CSPs by means of polynomial time algorithms. In other words, polynomial time algorithms succeed by successively "breaking up" the problem into smaller subproblems that are joined to each other through conditional independence.

5. Polynomial time algorithms resolve the variables in CSPs in a certain order, and with a certain structure. This structure is important in their study. Consequently, polynomial time algorithms cannot solve problems in regimes where blocks whose order is the same as the underlying problem instance require simultaneous resolution.

Appendix A

Reduction to a Single LFP Operation

We now gather a few results that will enable us to cast any LFP into one having just one application of the LFP operator. Since we use this construction to deal with complex fixed points, we reproduce it in this appendix. The presentation here closely follows [EF06, Ch. 8].

A.1 The Transitivity Theorem for LFP

The first result, known as the transitivity theorem, tells us that nested fixed points can always be replaced by simultaneous fixed points. Let ϕ(x, X, Y) and ψ(y, X, Y) be first order formulas positive in X and Y. Moreover, assume that no individual variable free in LFP_y,Y ψ(y, X, Y) gets into the scope of a corresponding quantifier or LFP operator in ϕ. Then

[LFP_x,X ϕ(x, X, [LFP_y,Y ψ(y, X, Y)])] t     (A.1)

is equivalent to a formula of the form ∃(∀)u [LFP_z,Z χ(z, Z)] u, where χ is first order.
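For concreteness, the staged computation of a least fixed point, of the kind manipulated throughout this appendix, can be sketched in a few lines; transitive closure of a graph is the standard example of an LFP-definable relation. The helper names in this sketch are ours:

```python
def lfp(operator, bottom=frozenset()):
    # Iterate a monotone operator from the empty relation until it stabilizes.
    current = bottom
    while True:
        nxt = operator(current)
        if nxt == current:
            return current
        current = nxt

def transitive_closure(edges):
    edges = frozenset(edges)
    def step(R):
        # phi(x, w, R) := E(x, w) or exists y (R(x, y) and E(y, w))
        return edges | frozenset(
            (x, w) for (x, y) in R for (z, w) in edges if y == z)
    return lfp(step)
```

On a finite structure the iteration reaches its fixed point in polynomially many stages, which is what allows each stage to be represented explicitly in the ENSP of Section 8.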

A.2 Sections and the Simultaneous Induction Lemma for LFP

Next we deal with simultaneous fixed points. Recall that simultaneous inductions do not increase the expressive power of LFP. The proof utilizes a coding procedure whereby each simultaneous induction is embedded as a section in a single LFP operation of higher arity. First, we introduce the notion of a section. We will denote a tuple consisting only of a's by ã; the length of ã will be clear from context.

Definition A.1. Let R be a relation of arity (k + l) on A and a ∈ A^l. Then the a-section of R, denoted by R_a, is given by

R_a := {b ∈ A^k | R(ba)}.

Next we see how sections can be used to encode multiple simultaneous operators producing relations of lower arity into a single operator producing a relation of higher arity.

Definition A.2. Let m operators F1, . . . , Fm act as follows:

F1 : P(A^k1) × · · · × P(A^km) → P(A^k1)
F2 : P(A^k1) × · · · × P(A^km) → P(A^k2)
. . .
Fm : P(A^k1) × · · · × P(A^km) → P(A^km)

We wish to embed these operators as sections of a "larger" operator, known as their simultaneous join. Set k := max{k1, . . . , km} + m + 1. The simultaneous join of F1, . . . , Fm, denoted by J(F1, . . . , Fm), is an operator acting as

J(F1, . . . , Fm) : P(A^k) → P(A^k)     (A.2)

such that for any a, b ∈ A with a ≠ b, the simultaneous join is given by

J(R) := ⋃_{a,b∈A, a≠b} ((F1(R_ãb̃1, . . . , R_ãb̃m) × {ãb̃1}) ∪ · · · ∪ (Fm(R_ãb̃1, . . . , R_ãb̃m) × {ãb̃m})).     (A.3)

The simultaneous join operator defined above has properties we will need to use. These are collected below.

Lemma A.3. The ith power J^i of the simultaneous join operator satisfies

J^i = ⋃_{a,b∈A, a≠b} ((F1^i × {ãb̃1}) ∪ · · · ∪ (Fm^i × {ãb̃m})).     (A.4)

In other words, the ãb̃i-section of the nth power of J is the nth power of the operator Fi. The following corollaries are now immediate.

Corollary A.4. The simultaneous join of inductive operators is inductive.

Corollary A.5. The fixed point J^∞ of the simultaneous join of operators (F1, . . . , Fm) exists if and only if their simultaneous fixed point (F1^∞, . . . , Fm^∞) exists.

Finally, we need to show that the simultaneous join can itself be expressed as a LFP computation. Since the sections are coded using tuples of the form a · · · a b · · · b (with i occurrences of b, written b̃i, padded with a's up to the full arity), we will need formulas that can express this.

Definition A.6. For l ≥ 1 and i = 1, . . . , l, the section formulas δ_i^l(x1, . . . , xl, v, w) are given by

δ_1^l(x1, . . . , xl, v, w) := ¬(v = w) ∧ (x1 = · · · = xl = v),

and for i > 1,

δ_i^l(x1, . . . , xl, v, w) := ¬(v = w) ∧ (x1 = · · · = x_{l−i+1} = v) ∧ (x_{l−i+2} = · · · = xl = w).     (A.5)

For distinct a, b ∈ A, we have A ⊨ δ_i^l[ãb̃j, a, b] if and only if i = j. Now we are ready to show that simultaneous fixed-point inductions of formulas can be replaced by the fixed point induction of a single formula.
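The content of the simultaneous induction lemma can be exercised on a small example of our own: two mutually recursive inductive definitions (the even and odd numbers up to a bound) computed by iterating the pair of operators simultaneously, exactly as the join J does on its sections:

```python
def simultaneous_lfp(ops, m):
    # ops: list of m operators; each maps a tuple of m relations to one relation.
    # Iterate all operators in parallel from empty relations until stable.
    rels = tuple(frozenset() for _ in range(m))
    while True:
        nxt = tuple(op(rels) for op in ops)
        if nxt == rels:
            return rels
        rels = nxt

N = 10
# Even(x) := x = 0 or Odd(x - 1);  Odd(x) := Even(x - 1)   (restricted to 0..N)
F1 = lambda R: frozenset({0}) | frozenset(x + 1 for x in R[1] if x + 1 <= N)
F2 = lambda R: frozenset(x + 1 for x in R[0] if x + 1 <= N)

even, odd = simultaneous_lfp([F1, F2], 2)
```

The lemma above says that the same pair of fixed points can be recovered as sections of the single operator J of higher arity, so nothing is lost by insisting on one LFP operation.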

Definition A.7. Let ϕ1(R1, . . . , Rm, x1), . . . , ϕm(R1, . . . , Rm, xm) be formulas of LFP, with ϕ1, . . . , ϕm positive in R1, . . . , Rm. As always, we let Ri be a ki-ary relation and xi be a ki-tuple. Set k := max{k1, . . . , km} + m + 1. Define a new first order formula χJ having k variables and computing a single k-ary relation Z by

χJ(Z, z1, . . . , zk) := ∃v∃w(¬v = w ∧
((ϕ1(Z_ṽw̃1, . . . , Z_ṽw̃m, z1, . . . , z_k1) ∧ δ_1^k(z1, . . . , zk, v, w))
∨ (ϕ2(Z_ṽw̃1, . . . , Z_ṽw̃m, z1, . . . , z_k2) ∧ δ_2^k(z1, . . . , zk, v, w))
. . .
∨ (ϕm(Z_ṽw̃1, . . . , Z_ṽw̃m, z1, . . . , z_km) ∧ δ_m^k(z1, . . . , zk, v, w)))).     (A.6)

Then the relation computed by the least fixed point of χJ contains all the individual least fixed points computed by the simultaneous induction as its sections.

Bibliography

[ACO08] D. Achlioptas and A. Coja-Oghlan. Algorithmic barriers from phase transitions. 2008. arXiv:0803.2122v2 [math.CO].

[AM00] Srinivas M. Aji and Robert J. McEliece. The generalized distributive law. IEEE Trans. Inform. Theory, 46(2):325–343, 2000.

[AP04] Dimitris Achlioptas and Yuval Peres. The threshold for random k-SAT is 2^k log 2 − O(k). J. Amer. Math. Soc., 17(4):947–973 (electronic), 2004.

[ART06] Dimitris Achlioptas and Federico Ricci-Tersenghi. On the solution-space geometry of random constraint satisfaction problems. In STOC'06: Proceedings of the 38th Annual ACM Symposium on Theory of Computing, pages 130–139. ACM, New York, 2006.

[AV91] Serge Abiteboul and Victor Vianu. Datalog extensions for database queries and updates. J. Comput. Syst. Sci., 43(1):62–124, 1991.

[AV95] Serge Abiteboul and Victor Vianu. Computing with first-order logic. Journal of Computer and System Sciences, 50:309–335, 1995.

[BDG95] José Luis Balcázar, Josep Díaz, and Joaquim Gabarró. Structural complexity. I. Texts in Theoretical Computer Science. An EATCS Series. Springer-Verlag, Berlin, second edition, 1995.

[Bes74] Julian Besag. Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B, 36:192–236, 1974. With discussion by D. R. Cox, A. G. Hawkes, P. Clifford, P. Whittle, K. Ord, R. Mead, J. M. Hammersley, and M. S. Bartlett, and with a reply by the author.

[BGS75] Theodore Baker, John Gill, and Robert Solovay. Relativizations of the P =? NP question. SIAM J. Comput., 4(4):431–442, 1975.

[Bis06] Christopher M. Bishop. Pattern recognition and machine learning. Information Science and Statistics. Springer, New York, 2006.

[BMW00] G. Biroli, R. Monasson, and M. Weigt. A variational description of the ground state structure in random satisfiability problems. European Physical Journal B, pages 551–568, 2000.

[CF86] Ming-Te Chao and John V. Franco. Probabilistic analysis of two heuristics for the 3-satisfiability problem. SIAM J. Comput., 15(4):1106–1118, 1986.

[CKT91] Peter Cheeseman, Bob Kanefsky, and William M. Taylor. Where the really hard problems are. In IJCAI, pages 331–340, 1991.

[CO09] A. Coja-Oghlan. A better algorithm for random k-SAT. 2009. arXiv:0902.3583v1 [math.CO].

[Coo71] Stephen A. Cook. The complexity of theorem-proving procedures. In STOC '71: Proceedings of the Third Annual ACM Symposium on Theory of Computing, pages 151–158. ACM Press, New York, 1971.

[Coo06] Stephen Cook. The P versus NP problem. In The millennium prize problems, pages 87–104. Clay Math. Inst., Cambridge, MA, 2006.

[Daw79] A. P. Dawid. Conditional independence in statistical theory. J. Roy. Statist. Soc. Ser. B, 41(1):1–31, 1979.

[Daw80] A. P. Dawid. Conditional independence for statistical operations. Ann. Statist., 8(3):598–617, 1980.

[Deo10] Vinay Deolalikar. A distribution centric approach to constraint satisfaction problems. Under preparation, 2010.

[DLW95] Anuj Dawar, Steven Lindell, and Scott Weinstein. Infinitary logic and inductive definability over finite structures. Inform. and Comput., 119(2):160–175, 1995.

[DMMZ08] Hervé Daudé, Thierry Mora, Marc Mézard, and Riccardo Zecchina. Pairs of SAT-assignments in random boolean formulæ. Theor. Comput. Sci., 393(1-3):260–279, 2008.

[Dob68] R. L. Dobrushin. The description of a random field by means of conditional probabilities and conditions on its regularity. Theory Prob. Appl., 13:197–224, 1968.

[Edm65] Jack Edmonds. Minimum partition of a matroid into independent subsets. Journal of Research of the National Bureau of Standards, 69:67–72, 1965.

[EF06] Heinz-Dieter Ebbinghaus and Jörg Flum. Finite model theory. Springer Monographs in Mathematics. Springer-Verlag, Berlin, enlarged edition, 2006.

[Fag74] Ronald Fagin. Generalized first-order spectra and polynomial-time recognizable sets. In Complexity of computation (Proc. SIAM–AMS Sympos., 1973), SIAM–AMS Proc., Vol. VII, pages 43–73. Amer. Math. Soc., Providence, R.I., 1974.

[Fri99] E. Friedgut. Necessary and sufficient conditions for sharp thresholds and the k-SAT problem. J. Amer. Math. Soc., 12:1017–1054, 1999.

[FSV95] Ronald Fagin, Larry J. Stockmeyer, and Moshe Y. Vardi. On monadic NP vs. monadic co-NP. Inform. and Comput., 120(1):78–92, 1995.

In Theory of Models (Proc. Berkeley). 8(4):13–24. Cambridge. pages 147–152. 2000. NY. Calif. 1971. Clifford. 1965. H. [GJ79] Michael R. pages 105–135.BIBLIOGRAPHY [Gai82] 112 Haim Gaifman.. North-Holland. Log. E. Sympos. 6(6):721–741. [Hod93] Wilfrid Hodges. 1(1):112–130. Relational queries computable in polynomial time (extended abstract). volume 107 of Stud. North-Holland. Markov fields on finite graphs and lattices.. [Han65] William Hanf. Hammersley and P. 1982.. [Imm82] Neil Immerman. 1963 Internat. 1981). IEEE Transactions on Pattern Analysis and Machine Intelligence. November 1984. A Series of Books in the Mathematical Sciences. [HC71] J. pages 132–145. 112 . 1982. ACM Trans. San Francisco. USA. Garey and David S. Computers and intractability. [GS00] Martin Grohe and Thomas Schwentick. ACM. On local and nonlocal properties. Independence results in computer science.. Hopcroft. 1979. Math. Johnson. In Proceedings of the Herbrand symposium (Marseilles. [HH76] J. M. W. volume 42 of Encyclopedia of Mathematics and its Applications. Model-theoretic methods in the study of elementary logic. In STOC ’82: Proceedings of the fourteenth annual ACM symposium on Theory of computing. 1976. Locality of order-invariant first-order formulas. New York. Freeman and Co. Stochastic relaxation. 1993. Logic Found. SIGACT News. Cambridge University Press. Model theory. Amsterdam. [GG84] Stuart Geman and Donald Geman. Amsterdam. Comput. A guide to the theory of NP-completeness. Hartmanis and J. gibbs distributions and the bayesian restoration of images.

L. Plenum Press. A. 1999. [KFaL98] Frank R. 47:498–519. IEEE Transactions on Information Theory. Miller and J. 104(25):10318–10323 (electronic). [KMRT+ 07] Florent Krzakała. In R. and Control. New York. ¸ Guilhem Semerjian. 264:1297–1301. Carlin. Austral. 1998. Federico Ricci-Tersenghi. Complexity of Computer Computations. and Hans andrea Loeliger. Brendan J. Ser. Snell. Springer-Verlag. [KS80] R. [KMRT+ 06] Florent Krzakala. Reducibility among combinatorial problems. [Imm99] Neil Immerman. Andrea Montanari. Koller and N. B. Gibbs states and the a set of solutions of random constraint satisfaction problems. Probabilistic Graphical Models: Principles and Techniques. 2007. 113 . Science. and Lenka Zdeborov´ . and J. 1986. Critical behavior in the satisfiability of random boolean formulae. Relational queries computable in polynomial time. M. American Mathematical Society. Kschischang. Math. editors. E. Natl. 1980. Recursive causal models. [KS94] Scott Kirkpatrick and Bart Selman. [KSC84] Harri Kiiveri. J. 2006. pages 85–103. P. Proc. MIT Press. Karp. Gibbs states and the a set of solutions of random constraint satisfaction problems. Federico Ricci-Tersenghi. CoRR. 1984. and Lenka Zdeborov´ . 2009. Thatcher. Friedman. Markov random fields and their applications. [Kar72] R. T. 1994. Kinderman and J. [KF09] D. 36(1):30–52. 1:1–142. Inform. Frey. W. Guilhem Semerjian. Acad. Descriptive complexity. Andrea Montanari. Sci. Speed. 1972. Graduate Texts in Computer Science. Factor graphs and the sum-product algorithm. abs/cond-mat/0612365. USA. 68(1-3):86–104.BIBLIOGRAPHY [Imm86] 113 Neil Immerman. Soc.

. Lindell. Computing monadic fixed points in linear available online at time on doubly linked data structures. 2003. Springer-Verlag. With forewords by Anil K. Special issue on influence diagrams. pages 779–788. and come putation. 2009.1. Oxford University Press. Springer-Verlag. Networks. 2009. Levin. 2004. [Lin05] S. [LR03] Richard Lassaigne and Michel De Rougemont. Advances in Pattern Recognition. N. Graphical models. Lauritzen. 1996. 114 . B. Independence properties of directed Markov fields.psu. Oxford Graduate Texts. 20(5):491–505. Oxford. Logic and Complexity.122. Larsen. [MA02] Cristopher Moore and Dimitris Achlioptas. Berlin.1447. http://citeseerx. A. and H.edu/doi=10.-G.ist. Jain and Rama Chellappa.1. Problems of Information Transmission.BIBLIOGRAPHY [Lau96] 114 Steffen L. volume 17 of Oxford Statistical Science Series. Li. [Li09] Stan Z. Lauritzen. Information. Random k-sat: Two moments suffice to cross a sharp threshold. L. [Lib04] Leonid Libkin. Universal sequential search problems. Dawid. London. New York. Oxford Science Publications. 2005. London. 1990. The Clarendon Press Oxford University Press. Springer-Verlag London Ltd. physics. FOCS. 1973. Texts in Theoretical Computer Science. Elements of finite model theory. Markov random field modeling in image analysis. [Lev73] Leonid A. An EATCS Series. 9(3). P. 2002. third edition. [LDLL90] S. Leimer. [MM09] Marc M´ zard and Andrea Montanari.

56(2):1357–1370. Spin e glass theory and beyond. and R. Phys. Giorgio Parisi. Elchanan Mossel. Science. T. pages 459–465. Sep 2007. [MRTS07] Andrea Montanari. Rev. Elementary induction on abstract structures. volume 9 of World Scientific Lecture Notes in Physics. Zecchina. [MPZ02] M M` zard. [MPV87] Marc M´ zard. Amsterdam. Bart Selman. Analytic and Algorithmic e Satisfiability Problems. 77. and Martin J. 54(4):Art. M´ zard. [Mos74] Yiannis N. World Scientific Publishing Co. [Mou74] John Moussouris. Inc.. (electronic). Teaneck. Wainwright. Hard and easy distributions of sat problems. Statistical mechanics of e the random k-satisfiability model. and Miguel Angel Virasoro. Elchanan Mossel. 10:11–33. Moschovakis. [MSL92] David Mitchell. Gibbs and Markov random systems with constraints. A new look at survey propagation and its generalizations. [MMW07] Elitza Maneva.. J. J. 94(19):197–205. [MZ97] R´ mi Monasson and Riccardo Zecchina. Statist. Phys. 17. Studies in Logic and the Foundations of Mathematics. 1987.. and Guilhem Semerjian. Solving constraint satisfaction problems through belief propagation-guided decimation. and Hector Levesque. [MMZ05] M. Rev. 2005. Clustering of solutions in e the random satisfiability problem. Maneva.. G Parisi. Vol. May 2005. NJ. 115 . and R Zecchina. and Martin J. 2002. ACM. Aug 1997. 2007. In SODA. A new look at survey propagation and its generalizations. 41 pp. pages 1089–1098.BIBLIOGRAPHY [MMW05] 115 Elitza N. 1974. Mora. 1974. In AAAI. North-Holland Publishing Co. Phys. Wainwright. E. Federico Ricci-Tersenghi. Lett. 297(August):812–815. 1992.

Trial and error predicates and the solution to a problem of mostowski. 1994). In Discrete Mathematics and Theoretical Computer Science. Razborov and Steven Rudich. pages 603–618. Math. 26th Annual ACM Symposium on the Theory of Computing (STOC ’94) (Montreal.. [Sip92] Michael Sipser. Comput. New York. 6(6):505–526. Structures Comput. Introduction to the Theory of Computation. PWS Publishing Company. 66(5):056126. Nov 2002. [Wig07] Avi Wigderson. PQ. Proceedings of the ICM 2006. NY. [SB99] Thomas Schwentick and Klaus Barthelmann. [Var82] Moshe Y. 1992. Joint COMPUGRAPH/SEMAGRAPH Workshop on Graph Rewriting and Computation (Volterra. part 1):24–35. pages 444– 454. J. Log.BIBLIOGRAPHY [MZ02] Marc M´ zard and Riccardo Zecchina. e Rev. Sci. ACM. 55(1. Hilary Putnam. System Sci. 1997. Symb. 1965.a computational complexity perspective. Springer Verlag. [Put65] 116 Random k-satisfiability problem: From an analytic solution to an efficient algorithm. [RR97] Alexander A. Vardi. Phys. 1999. In STOC ’82: Proceedings of the fourteenth annual ACM symposium on Theory of computing.. The complexity of relational query languages (extended abstract). Sipser. 2007. [See96] Detlef Seese. Linear time computable problems and first-order descriptions. 116 . 1:665–712. NP. P. J. pages 137–146. USA. The history and status of the p versus np question. Local normal forms for first-order logic with applications to games and automata. and Mathematics . In STOC. 1996. [Sip97] M. 1997. 30(1):49–57.. Natural proofs. 1982. 1995). E.
