Advanced Complexity Theory
Markus Bläser & Bodo Manthey
Universität des Saarlandes
Draft, February 27, 2010 and forever
1 Complexity of optimization problems
1.1 Optimization problems
The study of the complexity of solving optimization problems is an important practical aspect of complexity theory. A good textbook on this topic is the one by Ausiello et al. [ACG+99]. The book by Vazirani [Vaz01] is also recommended, but its focus is on the algorithmic side.
Definition 1.1. An optimization problem $P$ is a 4-tuple $(I_P, S_P, m_P, \mathrm{goal}_P)$ where
1. $I_P \subseteq \{0,1\}^*$ is the set of valid instances of $P$,
2. $S_P$ is a function that assigns to each valid instance $x$ the set of feasible solutions $S_P(x)$ of $x$, which is a subset of $\{0,1\}^*$,¹
3. $m_P : \{(x, y) \mid x \in I_P \text{ and } y \in S_P(x)\} \to \mathbb{N}^+$ is the objective function or measure function; $m_P(x, y)$ is the objective value of the feasible solution $y$ (with respect to $x$),
4. $\mathrm{goal}_P \in \{\min, \max\}$ specifies the type of the optimization problem: either it is a minimization or a maximization problem.
When the context is clear, we will drop the subscript P. Formally,
an optimization problem is defined over the alphabet $\{0,1\}$. But as usual,
when we talk about concrete problems, we want to talk about graphs, nodes,
weights, etc. In this case, we tacitly assume that we can always ﬁnd suitable
encodings of the objects we talk about.
Given an instance $x$ of the optimization problem $P$, we denote by $S^*_P(x)$ the set of all optimal solutions, that is, the set of all $y \in S_P(x)$ such that
$$m_P(x, y) = \mathrm{goal}\{m_P(x, z) \mid z \in S_P(x)\}.$$
(Note that the set of optimal solutions could be empty, since the maximum need not exist. The minimum always exists, since $m_P$ only attains values in $\mathbb{N}^+$. In the following we will assume that there are always optimal solutions provided that $S_P(x) \neq \emptyset$.) The objective value of any $y \in S^*(x)$ is denoted by $\mathrm{OPT}_P(x)$.²

¹ Some authors also assume that $S_P(x) \neq \emptyset$ for all $x \in I_P$. In this case the class NPO defined in the next section would be equal to the class expAPX (see Definition 2.11).
Given an optimization problem $P$, there are (at least) three things one could do given a valid instance $x$:
1. compute an optimal solution $y \in S^*(x)$ (construction problem),
2. compute $\mathrm{OPT}(x)$ (evaluation problem),
3. given an additional bound $B$, decide whether $\mathrm{OPT}(x) \ge B$ (if goal = max) or whether $\mathrm{OPT}(x) \le B$ (if goal = min) (decision problem).
The first task seems to be the most natural one. Its precise formalization is, however, a subtle task. One could compute the function $F : I \to \mathcal{P}(\{0,1\}^*)$ mapping each $x$ to its set of optimal solutions $S^*(x)$. However, $S^*(x)$ could be very large (or even infinite). Moreover, one is almost always content with only one optimal solution. A cut of $F$ is any function $f : I \to \Sigma^*$ that maps every $x$ to some $y \in S^*(x)$. We say that we solve the construction problem associated with $P$ if there is a cut of $F$ that we can compute efficiently.³ It turns out to be very useful to call such a cut again $P$ and assume that positive statements containing $P$ are implicitly $\exists$-quantified and negative statements are $\forall$-quantified. (Do not worry too much now, everything will become clear.)
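The second task is easy to model. We want to compute the function $x \mapsto \mathrm{OPT}(x)$. We denote this function by $P_{\mathrm{eval}}$.
The third task can be modelled as a decision problem. Let
$$P_{\mathrm{dec}} = \begin{cases} \{\langle x, \mathrm{bin}(B)\rangle \mid \mathrm{OPT}(x) \ge B\} & \text{if goal} = \max, \\ \{\langle x, \mathrm{bin}(B)\rangle \mid \mathrm{OPT}(x) \le B\} & \text{if goal} = \min. \end{cases}$$
Our task is now to decide membership in $P_{\mathrm{dec}}$.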
² The name $m^*_P(x)$ would be more consistent, but $\mathrm{OPT}_P(x)$ is so intuitive and convenient.
³ Note that not every cut is computable, even for very simple optimization problems like computing minimum spanning trees and even on very simple instances. Consider a complete graph $K_n$, all edges with weight one. Then every spanning tree is optimal. But a cut that maps $K_n$ to a line if the $n$th Turing machine halts on the empty word and to a star otherwise is certainly not computable.

1.2 PO and NPO
We now define optimization analogs of P and NP.
Definition 1.2. NPO is the class of all optimization problems $P = (I, S, m, \mathrm{goal})$ such that
1. $I \in \mathrm{P}$, i.e., we can decide in polynomial time whether a given $x$ is a valid instance,
2. there is a polynomial $p$ such that for all $x \in I$ and $y \in S(x)$, $|y| \le p(|x|)$, and for all $y$ with $|y| \le p(|x|)$, we can decide $y \in S(x)$ in time polynomial in $|x|$,
3. $m$ is computable in polynomial time.
Deﬁnition 1.3. PO is the class of all optimization problems P ∈ NPO
such that the construction problem P is deterministically polynomial time
computable. (Recall that this means that there is a cut that is polynomial
time computable).
We will see the relation of PO and NPO to P and NP in Section 1.5.
Even though it is not explicit in Deﬁnition 1.2, NPO is a nondeterministic
complexity class.
Theorem 1.4. For each $P \in$ NPO, $P_{\mathrm{dec}} \in$ NP.
Proof. Let $p$ be the polynomial in the definition of NPO. The following nondeterministic Turing machine $M$ decides $P_{\mathrm{dec}}$ in polynomial time:
Input: instance $x \in I$ and bound $B$
1. $M$ guesses a string $y$ with $|y| \le p(|x|)$.
2. $M$ deterministically tests whether $y \in S(x)$. If not, $M$ rejects.
3. If $y$ is feasible, then $M$ computes $m(x, y)$ and tests whether $m(x, y) \le B$ (minimization problem) or $m(x, y) \ge B$ (maximization problem).
4. If the test is positive, then $M$ accepts, otherwise, $M$ rejects.
It is easy to see that $M$ indeed decides $P_{\mathrm{dec}}$ and that its running time is polynomial.
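The machine $M$ from the proof can be sketched in Python. This is only an illustration: the nondeterministic guess is simulated by exhaustive search (so it takes exponential time), and the toy problem used to exercise it (feasible solutions are sub-masks of the instance, the measure is one plus the number of ones) as well as all function names are assumptions made for this sketch, not part of the notes.

```python
from itertools import product


def verify(x, bound, y, is_feasible, measure, goal):
    """Steps 2-4 of M: deterministic check of one guessed string y."""
    if not is_feasible(x, y):
        return False
    value = measure(x, y)
    return value <= bound if goal == "min" else value >= bound


def decide_by_exhaustive_guessing(x, bound, p, is_feasible, measure, goal):
    """Simulate step 1 by trying every y with |y| <= p(|x|)."""
    for length in range(p(len(x)) + 1):
        for bits in product("01", repeat=length):
            if verify(x, bound, "".join(bits), is_feasible, measure, goal):
                return True
    return False


# Hypothetical toy maximization problem: y is feasible for x if it has the
# same length and only has ones where x has ones.
def toy_feasible(x, y):
    return len(y) == len(x) and all(a == "1" or b == "0" for a, b in zip(x, y))


def toy_measure(x, y):
    return 1 + y.count("1")  # always positive, as required for m
```

For the instance "1011" the optimum of the toy problem is 4, so the bound 4 is accepted and the bound 5 is rejected.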
1.3 Example: TSP
Problem 1.5 (TSP, ∆TSP). The Traveling Salesperson Problem (TSP) is defined as follows: Given a complete loopless (undirected) graph $G = (V, E)$ and a weight function $w : E \to \mathbb{N}^+$ assigning each edge a positive weight, find a Hamiltonian tour of minimum weight. If in addition $w$ fulfills the triangle inequality, i.e.,
$$w(\{u, v\}) \le w(\{u, x\}) + w(\{x, v\}) \quad \text{for all nodes } u, x, v,$$
then we speak of the Metric Traveling Salesperson Problem (∆TSP).
In the example of the Traveling Salesperson Problem TSP, we have the
following:
• The set of all valid instances is the set (of suitable encodings) of all edge-weighted complete loopless graphs $G$. In the special case ∆TSP, the edge weights should also fulfill the triangle inequality (which can be easily checked).
• Given an instance $x$, a feasible solution is any Hamiltonian tour of $G$, i.e., a permutation of the vertices of $G$. (Note that for TSP, the set of feasible solutions only depends on the number of nodes of $G$.)
• The objective value of a solution $y$ of an instance $x$ is the sum of the weights of the edges used in the tour specified by $y$. (This can be interpreted as the length of the tour.)
• Finally, TSP and ∆TSP are minimization problems.
It is easy to verify that TSP ∈ NPO. However, it is very unlikely that it is in PO. Even finding a very rough approximate solution seems to be very hard.
Exercise 1.1. Assume that there is a polynomial time algorithm that, given an instance $x$ of TSP, returns a Hamiltonian tour whose weight is at most $2^{p(n)} \cdot \mathrm{OPT}$ for some polynomial $p$, where $n$ is the number of nodes of the given graph. Show that then P = NP. (Hint: Show that under this assumption, one can decide whether a graph has a Hamiltonian circuit.)
1.4 Construction, evaluation, and decision
Let us investigate the relation between the construction, evaluation, and
decision problem associated with a problem P ∈ NPO.
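Theorem 1.6. Let $P \in$ NPO. Then
1. $P_{\mathrm{dec}} \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{eval}}$ and $P_{\mathrm{eval}} \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{dec}}$.
2. $P_{\mathrm{eval}} \le^{\mathrm{P}}_{\mathrm{T}} P$. (Since this is a negative statement about $P$, it means that $P_{\mathrm{eval}} \le^{\mathrm{P}}_{\mathrm{T}} P$ holds for all cuts $P$.)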
Proof. We start with the first statement: $P_{\mathrm{dec}} \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{eval}}$ is seen easily: On input $\langle x, \mathrm{bin}(B)\rangle$, we can compute $\mathrm{OPT}(x)$ using the oracle $P_{\mathrm{eval}}$, and compare it with $B$.
$P_{\mathrm{eval}} \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{dec}}$ is a little trickier: Since $m$ is polynomial time computable, $\mathrm{OPT}(x) \le 2^{q(|x|)}$ for some polynomial $q$. Using binary search, we can find $\mathrm{OPT}(x)$ with $q(n)$ oracle queries, where $n = |x|$.
For the second statement, note that when we have an optimum solution, then we can compute $\mathrm{OPT}(x)$.
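The binary search in the reduction from $P_{\mathrm{eval}}$ to $P_{\mathrm{dec}}$ can be sketched as follows. The sketch assumes that an optimal solution exists and that $1 \le \mathrm{OPT}(x) \le$ `upper`; the oracle is passed in as a Python callable `decide(x, B)`, and all names are chosen for illustration only.

```python
def evaluate_with_decision_oracle(x, decide, upper, goal):
    """Compute OPT(x) with about log2(upper) calls to the decision oracle.

    For goal == "max", decide(x, B) must answer whether OPT(x) >= B;
    for goal == "min", it must answer whether OPT(x) <= B.
    Assumes 1 <= OPT(x) <= upper.
    """
    lo, hi = 1, upper
    if goal == "max":
        # Invariant: OPT(x) lies in [lo, hi]; find the largest accepted B.
        while lo < hi:
            mid = (lo + hi + 1) // 2
            if decide(x, mid):
                lo = mid
            else:
                hi = mid - 1
    else:
        # Invariant: OPT(x) lies in [lo, hi]; find the smallest accepted B.
        while lo < hi:
            mid = (lo + hi) // 2
            if decide(x, mid):
                hi = mid
            else:
                lo = mid + 1
    return lo
```

With `upper` equal to $2^{q(n)}$ the loop halves the search interval in every round, which matches the $q(n)$ oracle queries claimed in the proof.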
If $P_{\mathrm{dec}}$ is NP-complete, then the optimization problem is not harder than the decision problem.
Theorem 1.7. Let $P \in$ NPO such that $P_{\mathrm{dec}}$ is NP-complete. Then $P \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{dec}}$.
Proof. Assume that $P$ is a maximization problem; the minimization case is symmetric. Let $q$ be a polynomial such that for every $x \in I$ and $y \in S(x)$, $|y| \le q(|x|)$ and $m(x, y)$ is bounded by $2^{q(|x|)}$. Throughout the proof, $n = |x|$.
For given $x$, fix some polynomial time computable total order on the set $\{0,1\}^{\le q(n)}$. For $y \in \{0,1\}^{\le q(n)}$, let $\lambda(y)$ be the rank that $y$ has with respect to this order.
We derive a new problem $\hat{P}$ from $P$ by defining a new objective function. The objective function of $\hat{P}$ is given by
$$\hat{m}(x, y) = 2^{q(n)+1} m(x, y) + \lambda(y).$$
Note that the first summand is always at least $2^{q(n)+1} > \lambda(y)$. This implies that $\hat{m}(x, y_1) \neq \hat{m}(x, y_2)$ for all $x$ and all $y_1, y_2 \in S(x)$ with $y_1 \neq y_2$. Furthermore, if $\hat{m}(x, y_1) \ge \hat{m}(x, y_2)$ then $m(x, y_1) \ge m(x, y_2)$. Thus if $y \in \hat{S}^*(x)$ then $y \in S^*(x)$, too. (Here $\hat{S}^*(x)$ is the set of optimum solutions of $x$ as an instance of $\hat{P}$.)
An optimal solution $y \in \hat{S}^*(x)$ can be easily derived from $\widehat{\mathrm{OPT}}(x)$: We compute the remainder of the division of $\widehat{\mathrm{OPT}}(x)$ by $2^{q(n)+1}$. This remainder is $\lambda(y)$, from which we can obtain $y$. Thus $P \le^{\mathrm{P}}_{\mathrm{T}} \hat{P} \le^{\mathrm{P}}_{\mathrm{T}} \hat{P}_{\mathrm{eval}}$.
By Theorem 1.6, $\hat{P}_{\mathrm{eval}} \le^{\mathrm{P}}_{\mathrm{T}} \hat{P}_{\mathrm{dec}}$. Since $\hat{P}_{\mathrm{dec}} \in$ NP and $P_{\mathrm{dec}}$ is NP-complete by assumption, $\hat{P}_{\mathrm{dec}} \le_{\mathrm{P}} P_{\mathrm{dec}}$. Using transitivity, we get $P \le^{\mathrm{P}}_{\mathrm{T}} P_{\mathrm{dec}}$.
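The encoding trick in this proof can be tried out on a small example. The sketch below assumes one particular total order (shorter strings first, strings of equal length in lexicographic order); the proof only needs some polynomial time computable order, so this choice and the function names are illustrative assumptions.

```python
def rank(y):
    """lambda(y): rank of the bit string y, shorter strings first."""
    return (2 ** len(y) - 1) + (int(y, 2) if y else 0)


def unrank(r):
    """Inverse of rank: recover the bit string with rank r."""
    n = (r + 1).bit_length() - 1        # length of the string
    if n == 0:
        return ""
    offset = r - (2 ** n - 1)           # position among strings of length n
    return format(offset, "b").zfill(n)


def m_hat(m_value, y, q):
    """New objective value 2^(q+1) * m(x, y) + lambda(y)."""
    return 2 ** (q + 1) * m_value + rank(y)


def recover_solution(opt_hat, q):
    """Read an optimal solution off the optimal value of the new problem."""
    return unrank(opt_hat % 2 ** (q + 1))
```

For example, with $q = 3$ and two feasible solutions "110" and "001" that both have objective value 5, the new values are 93 and 88. The maximum 93 leaves remainder 13 modulo 16, which is the rank of "110", and the quotient 5 is the original optimum.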
1.5 NP-hard optimization problems
Definition 1.8. An optimization problem $P$ is NP-hard if for all $L \in$ NP, $L \le^{\mathrm{P}}_{\mathrm{T}} P$.
Theorem 1.9. If $P$ is NP-hard and $P \in$ PO, then P = NP.
Exercise 1.2. Prove Theorem 1.9.
Theorem 1.10. Let $P \in$ NPO. If $P_{\mathrm{dec}}$ is NP-hard, then $P$ is NP-hard.
Proof. Since $P_{\mathrm{dec}}$ is NP-hard, $L \le_{\mathrm{P}} P_{\mathrm{dec}}$ for all $L \in$ NP. By Theorem 1.6, $P_{\mathrm{dec}} \le^{\mathrm{P}}_{\mathrm{T}} P$. Since many-one reducibility is a special case of Turing reducibility and $\le^{\mathrm{P}}_{\mathrm{T}}$ is transitive, we get $L \le^{\mathrm{P}}_{\mathrm{T}} P$.
Some authors prefer to call an optimization problem NP-hard if $P_{\mathrm{dec}}$ is NP-hard. Theorem 1.10 states that this definition is potentially more restrictive than our definition.
Corollary 1.11. If P ≠ NP, then PO ≠ NPO.
Proof. There is a problem $P$ in NPO such that $P_{\mathrm{dec}}$ is NP-hard, for instance ∆TSP. If $P$ belonged to PO, then also $P_{\mathrm{dec}} \in$ P by Theorem 1.6, a contradiction.
2 Approximation algorithms and approximation classes
In the most general sense, an approximation algorithm is an algorithm that
given a valid instance x is able to compute some feasible solution.
Definition 2.1. A deterministic Turing machine $A$ is an approximation algorithm for an optimization problem $P = (I, S, m, \mathrm{goal})$ if
1. the running time of $A$ is polynomial,
2. $A(x) \in S(x)$ for all $x \in I$.
Of course, there are good and not so good approximation algorithms, and we develop a framework to measure the quality or approximation performance of such an algorithm.
Definition 2.2. 1. Let $P$ be an optimization problem, $x \in I$, and $y \in S(x)$. The performance ratio of $y$ with respect to $x$ is defined as¹
$$\mathrm{PR}(x, y) = \max\left\{\frac{m(x, y)}{\mathrm{OPT}(x)}, \frac{\mathrm{OPT}(x)}{m(x, y)}\right\}.$$
2. Let $\alpha : \mathbb{N} \to \mathbb{Q}$. An approximation algorithm $A$ is an $\alpha$-approximation algorithm if for all $x \in I$,
$$\mathrm{PR}(x, A(x)) \le \alpha(|x|).$$
The definition of $\mathrm{PR}(x, y)$ basically means that in the case of a minimization problem, we measure how many times the objective value of the computed solution exceeds the objective value of an optimum solution. In the case of a maximization problem, we do the same but we take the reciprocal. This may seem strange at first glance, but it has the advantage that we can treat minimization and maximization problems in a uniform way. (Be aware though that some authors use $m(x, y)/\mathrm{OPT}(x)$ to measure the approximation performance in case of maximization problems. But this is merely a question of faith.)
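As a quick numerical illustration of the definition, the following sketch computes the performance ratio with exact rational arithmetic. The function name is an assumption for this example; the optimum value is simply passed in.

```python
from fractions import Fraction


def performance_ratio(value, opt):
    """PR(x, y) = max(m(x, y) / OPT(x), OPT(x) / m(x, y)) for positive values."""
    if value <= 0 or opt <= 0:
        raise ValueError("objective values are positive by definition")
    return max(Fraction(value, opt), Fraction(opt, value))
```

A minimization solution of value 15 against an optimum of 10 and a maximization solution of value 8 against an optimum of 12 both have performance ratio 3/2, which shows the uniform treatment of the two cases.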
¹ Note that $m$ only attains positive values. Thus, the quotient is always defined.

Definition 2.3. 1. Let $F$ be some set of functions $\mathbb{N} \to \mathbb{Q}$. An optimization problem $P \in$ NPO is contained in the class $F$-APX if there is an $f \in F$ such that there exists an $f$-approximation algorithm for $P$.
2. APX := $O(1)$-APX.
(I hope that the elegant definition above clarifies why PR was defined for maximization problems as it is.) There is a well-known 2-approximation algorithm for ∆TSP that is based on minimum spanning trees, thus ∆TSP ∈ APX.
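The notes do not spell out the 2-approximation algorithm. One standard formulation, sketched here under that assumption, computes a minimum spanning tree with Prim's algorithm and visits the nodes in preorder; by the triangle inequality the resulting tour weighs at most twice the tree, and the tree weighs at most as much as an optimal tour. The weight-matrix representation and all names are choices made for this sketch.

```python
from itertools import permutations


def mst_tour(w):
    """Return a tour (list of nodes) from a preorder walk of an MST.

    w is a symmetric matrix of positive weights obeying the triangle
    inequality; w[u][u] is ignored.
    """
    n = len(w)
    in_tree = [False] * n
    best = [float("inf")] * n      # cheapest known edge into the tree
    parent = [None] * n
    children = [[] for _ in range(n)]
    best[0] = 0
    for _ in range(n):             # Prim's algorithm, O(n^2)
        u = min((v for v in range(n) if not in_tree[v]), key=lambda v: best[v])
        in_tree[u] = True
        if parent[u] is not None:
            children[parent[u]].append(u)
        for v in range(n):
            if not in_tree[v] and w[u][v] < best[v]:
                best[v], parent[v] = w[u][v], u
    tour, stack = [], [0]          # iterative preorder traversal
    while stack:
        u = stack.pop()
        tour.append(u)
        stack.extend(reversed(children[u]))
    return tour


def tour_weight(w, tour):
    """Weight of the closed tour, including the edge back to the start."""
    return sum(w[tour[i]][tour[(i + 1) % len(tour)]] for i in range(len(tour)))


def brute_force_opt(w):
    """Optimal tour weight by exhaustive search (tiny instances only)."""
    rest = range(1, len(w))
    return min(tour_weight(w, (0,) + perm) for perm in permutations(rest))
```

On small metric instances, for example points in the plane with Manhattan distances, `tour_weight(w, mst_tour(w))` can be compared against `2 * brute_force_opt(w)` to observe the guarantee.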
Even stronger is the concept of a polynomial time approximation scheme.
Definition 2.4. A deterministic Turing machine $A$ is a polynomial time approximation scheme (PTAS) for an optimization problem $P = (I, S, m, \mathrm{goal})$ if on input $\langle x, \epsilon\rangle$, for all small enough $\epsilon > 0$,
1. the running time of $A$ is polynomial in the size of $x$ (but not necessarily in $1/\epsilon$), and
2. $A(x, \epsilon)$ is a feasible solution for $x$ with performance ratio $1 + \epsilon$.
We do not have to distinguish between minimization and maximization problems. If a solution $y$ has performance ratio $1 + \epsilon$ in the case of a maximization problem, then we know that $m(x, y) \ge \frac{1}{1+\epsilon} \mathrm{OPT}(x)$. We have
$$\frac{1}{1+\epsilon} = 1 - \frac{\epsilon}{1+\epsilon} \ge 1 - \epsilon,$$
which is exactly what we want.
Definition 2.5. PTAS is the class of all problems in NPO that have a PTAS.
We have
$$\mathrm{PO} \subseteq \mathrm{PTAS} \subseteq \mathrm{APX}.$$
If P ≠ NP, then both inclusions are strict. Under this assumption, a problem in APX \ PTAS is Maximum Satisfiability (see the next chapters for a proof), and a problem in PTAS \ PO is Knapsack (solve the next exercise for a proof).
Problem 2.6. Knapsack is the following problem:
Instances: rational numbers $w_1, \dots, w_n$ (weights), $p_1, \dots, p_n$ (profits), and $B$ (capacity bound) such that $w_\nu \le B$ for all $\nu$.
Solutions: $I \subseteq \{1, \dots, n\}$ such that $\sum_{i \in I} w_i \le B$
Measure: $\sum_{i \in I} p_i$, the total profit of the items packed
Goal: max
We may assume w.l.o.g. that all the $p_\nu$ are natural numbers. If this is not the case, assume that $p_\nu = x_\nu / y_\nu$ with $\gcd(x_\nu, y_\nu) = 1$. Let $Y = y_1 \cdots y_n$. We now replace $p_\nu$ by $p_\nu \cdot Y \in \mathbb{N}$. Any knapsack that maximizes the old objective function also maximizes the new one. The size of the instance is only polynomially larger. (Note that we encode all inputs in binary.)
Exercise 2.1. 1. Show that there is an algorithm for Knapsack with running time polynomial in $n$ and $P := \max_{1 \le \nu \le n} p_\nu$. (Compute by dynamic programming sets of indices $I(i, p)$ such that
• $\nu \le i$ for all $\nu \in I(i, p)$,
• the sum of the $p_\nu$ with $\nu \in I(i, p)$ is exactly $p$, and
• the sum of all $w_\nu$ with $\nu \in I(i, p)$ is minimum among all such sets of indices.)
2. Show that we get a PTAS out of this pseudopolynomial algorithm as follows:
• Let $S = \epsilon P / n$ and $\hat{p}_\nu = \lfloor p_\nu / S \rfloor$ for $1 \le \nu \le n$.
• Find an optimum solution for the instance $w_1, \dots, w_n$, $\hat{p}_1, \dots, \hat{p}_n$, and $B$.
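For readers who want to experiment before attempting the proofs, here is one possible sketch of the two steps. It assumes the scaling $S = \epsilon P / n$ with rounding down (the usual choice for this scheme), uses 0-based item indices, and requires integer profits as arranged above. The names and the brute-force checker are illustrative and not taken from the notes.

```python
from fractions import Fraction
from itertools import combinations
from math import floor


def knapsack_exact(weights, profits, capacity):
    """Dynamic program over profit values; profits are non-negative integers."""
    total = sum(profits)
    # min_weight[p]: least weight of a set of items seen so far with profit p
    min_weight = [0] + [None] * total
    chosen = [frozenset()] + [None] * total
    for i, (w, p) in enumerate(zip(weights, profits)):
        for value in range(total, p - 1, -1):   # downwards: use item i once
            base = min_weight[value - p]
            if base is None:
                continue
            if min_weight[value] is None or base + w < min_weight[value]:
                min_weight[value] = base + w
                chosen[value] = chosen[value - p] | {i}
    best = max(v for v in range(total + 1)
               if min_weight[v] is not None and min_weight[v] <= capacity)
    return chosen[best]


def knapsack_scaled(weights, profits, capacity, eps):
    """Solve the instance with profits rounded down to multiples of S."""
    n, largest = len(profits), max(profits)
    scale = Fraction(eps) * largest / n          # S = eps * P / n (assumed)
    scaled = [floor(Fraction(p) / scale) for p in profits]
    return knapsack_exact(weights, scaled, capacity)


def profit(profits, items):
    return sum(profits[i] for i in items)


def brute_force_opt(weights, profits, capacity):
    """Optimal profit by trying all subsets (tiny instances only)."""
    n = len(weights)
    return max(profit(profits, s)
               for k in range(n + 1) for s in combinations(range(n), k)
               if sum(weights[i] for i in s) <= capacity)
```

The scaled instance has profits bounded by $n/\epsilon$, so the dynamic program runs in time polynomial in $n$ and $1/\epsilon$; the returned set is feasible because the weights are unchanged.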
Remark 2.7. The running time of the PTAS constructed in the previous exercise is also polynomial in $1/\epsilon$. Such a scheme is called a fully polynomial time approximation scheme (FPTAS). The corresponding complexity class is denoted by FPTAS.
Exercise 2.2. A super fully polynomial time approximation scheme is a PTAS whose running time is polynomial in $\log(1/\epsilon)$. Show that if Knapsack has a super fully polynomial time approximation scheme, then P = NP.
2.1 Gap problems
A promise problem is a tuple of languages $Q = (L, U)$ with $L \subseteq U$. (Think of $U$ as the universe of admissible inputs.) A Turing machine $M$ decides a promise problem if for all $x \in U$, $M(x) = 1$ if $x \in L$ and $M(x) = 0$ if $x \in U \setminus L$. On inputs not in $U$, $M$ may output whatever it wants. Since we do not have to care about the behaviour of $M$ on inputs not in $U$, we can also think that we get an input with the additional promise that it is in $U$. The elements in $L$ are often called yes-instances, the elements in $U \setminus L$ are called no-instances, and the elements not in $U$ are called don't-care instances. "Ordinary" decision problems are a special case of promise problems: we just set $U = \{0,1\}^*$.
Many-one reductions can be extended to promise problems in a natural way. Let $Q = (L, U)$ and $Q' = (L', U')$ be two promise problems. $Q$ is polynomial time many-one reducible to $Q'$ if there is a polynomial time computable function $f$ such that
$$x \in L \implies f(x) \in L'$$
and
$$x \in U \setminus L \implies f(x) \in U' \setminus L'.$$
That means that yes-instances are mapped to yes-instances and no-instances are mapped to no-instances. A promise problem $Q$ is $C$-hard for some class $C$ of decision or promise problems if every problem in $C$ is polynomial time many-one reducible to $Q$.
Definition 2.8. Let $P = (I_P, S_P, m_P, \mathrm{goal})$ be an optimization problem and $a < b$. $\mathrm{gap}(a, b)$-$P$ is the promise problem $(L, U)$ where
$$U = \{x \mid \mathrm{OPT}(x) \le a \text{ or } \mathrm{OPT}(x) \ge b\}$$
and
$$L = \begin{cases} \{x \mid \mathrm{OPT}(x) \ge b\} & \text{if goal} = \max, \\ \{x \mid \mathrm{OPT}(x) \le a\} & \text{if goal} = \min. \end{cases}$$
That is, we get an instance $x$ and the promise that the objective value is at most $a$ or at least $b$, and we shall decide which of these two options is the case. There is a difference in the definition of $L$ for maximization and minimization problems because the yes-instances shall be the inputs with solutions that have a "good" objective value. We will also allow $a$ and $b$ to be functions $\mathbb{N} \to \mathbb{N}$ that depend on $|x|$.
Theorem 2.9. If $\mathrm{gap}(a, b)$-$P$ is NP-hard for polynomial time computable functions $a$ and $b$ with input given in unary and output given in binary, then there is no $\alpha$-approximation algorithm for $P$ with $\alpha < b/a$, unless P = NP.
Proof. Suppose on the contrary that such an algorithm $A$ exists. We only show the case $\mathrm{goal}_P = \min$; the other case is treated similarly. Since $\mathrm{gap}(a, b)$-$P$ is NP-hard, there is a polynomial time many-one reduction $f$ from SAT to $\mathrm{gap}(a, b)$-$P$. We design a polynomial time algorithm for SAT as follows:
Input: formula $\phi$ in CNF
1. Compute $x = f(\phi)$ and $y = A(x)$.
2. If $m_P(x, y) < b(|x|)$, then accept, else reject.
Let us see why this algorithm is correct. If $\phi \in$ SAT, then $\mathrm{OPT}_P(x) \le a(|x|)$ and
$$m_P(x, y) \le \alpha(|x|) \cdot \mathrm{OPT}_P(x) \le \alpha(|x|) \cdot a(|x|) < b(|x|).$$
If $\phi \notin$ SAT, then $\mathrm{OPT}_P(x) \ge b(|x|)$ and
$$m_P(x, y) \ge \mathrm{OPT}_P(x) \ge b(|x|).$$
Thus the algorithm works correctly. It is obviously polynomial time. Therefore, P = NP.
In Exercise 1.1, we have seen that, unless P = NP, there is no polynomial $p$ such that TSP can be approximated within $2^{p(n)}$, where $n$ is the number of nodes of the given graph. However, note that $n$ is not the size of the instance; the size is $O(p(n) n^2)$. Thus $\mathrm{gap}(n, 2^{n^{1-\epsilon}})$-TSP is NP-hard.
Since TSP ∈ NPO, we get the following result.
Theorem 2.10. If P ≠ NP, then APX ⊊ NPO.
We can always approximate TSP within $2^{O(|x|)}$, where $x$ is the given instance, since with $|x|$ symbols we can encode integers up to $2^{O(|x|)}$. Thus TSP is contained in the class expAPX, as defined below.
Definition 2.11. expAPX = $\{2^p \mid p \text{ is a polynomial}\}$-APX.
Thus the theorem above can be strengthened to the following statement.
Theorem 2.12. If P ≠ NP, then APX ⊊ expAPX.
Exercise 2.3. What is the difference between expAPX and NPO?
2.2 Approximation preserving reductions and hardness
Let $P$ and $P'$ be two optimization problems. If $P$ is reducible to $P'$ (in some sense to be defined), then we would like to turn approximate solutions of $P'$ back into approximate solutions of $P$. That is, we do not only need a function that maps instances of $P$ to instances of $P'$, we also need to transfer solutions of $P'$ back to solutions of $P$ like we did for #P functions. Many of the reductions between NP-complete problems also give this second function for free. But what they usually do not do is preserve approximation factors.
Problem 2.13. Maximum Clique (Clique) is the following problem:
Instances: graph $G = (V, E)$
Solutions: all cliques of $G$, i.e., all $C \subseteq V$ such that for all $u, v \in C$ with $u \neq v$, $\{u, v\} \in E$
Measure: $\#C$, the size of the clique
Goal: max
Problem 2.14. Vertex Cover (VC) is the following problem:
Instances: graph $G = (V, E)$
Solutions: all subsets $C$ of $V$ such that for each $\{u, v\} \in E$, $C \cap \{u, v\} \neq \emptyset$
Measure: $\#C$
Goal: min
Exercise 2.4. There is an easy reduction $\mathrm{Clique}_{\mathrm{dec}} \le_{\mathrm{P}} \mathrm{VC}_{\mathrm{dec}}$ that simply maps $G$ to its complement.
1. How does one get a clique of $G$ from a vertex cover of the complement?
2. Assume we have a vertex cover that is a 2-approximation. What approximation do we get for Clique from this?
Definition 2.15. Let $P, P' \in$ NPO. $P$ is reducible to $P'$ by an approximation preserving reduction (short: $P$ is AP-reducible to $P'$ or, even shorter, $P \le_{\mathrm{AP}} P'$) if there are two functions $f, g : \{0,1\}^* \times \mathbb{Q}^+ \to \{0,1\}^*$ and an $\alpha \ge 1$ such that
1. for all $x \in I_P$ and $\beta > 1$, $f(x, \beta) \in I_{P'}$,
2. for all $x \in I_P$ and $\beta > 1$, if $S_P(x) \neq \emptyset$ then $S_{P'}(f(x, \beta)) \neq \emptyset$,
3. for all $x \in I_P$, $y \in S_{P'}(f(x, \beta))$, and $\beta > 1$, $g(x, y, \beta) \in S_P(x)$,
4. $f$ and $g$ are deterministically polynomial time computable for fixed $\beta > 1$,
5. for all $x \in I_P$ and all $y \in S_{P'}(f(x, \beta))$, if $y$ is a $\beta$-approximate solution of $f(x, \beta)$, then $g(x, y, \beta)$ is a $(1 + \alpha(\beta - 1))$-approximate solution of $x$.
$(f, g, \alpha)$ is called an AP-reduction from $P$ to $P'$.²
Lemma 2.16. If $P \le_{\mathrm{AP}} P'$ and $P' \in$ APX, then $P \in$ APX.
Proof. Let $(f, g, \alpha)$ be an AP-reduction from $P$ to $P'$ and let $A'$ be a $\beta$-approximation algorithm for $P'$. Given $x \in I_P$, $A(x) := g(x, A'(f(x, \beta)), \beta)$ is a $(1 + \alpha(\beta - 1))$-approximate solution for $x$. This follows directly from the definition of AP-reduction. Furthermore, $A$ is polynomial time computable.
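The composition $A(x) = g(x, A'(f(x, \beta)), \beta)$ from this proof is easy to express as a higher-order function. The sketch below only illustrates the plumbing: the instance map (graph complement, turning cliques into independent sets) and the greedy routine standing in for $A'$ are assumptions for the example, and the greedy routine carries no approximation guarantee.

```python
from itertools import combinations


def compose(f, g, approx_alg, beta):
    """Build A(x) = g(x, A'(f(x, beta)), beta) from an AP-reduction (f, g)."""
    def algorithm(x):
        return g(x, approx_alg(f(x, beta)), beta)
    return algorithm


# Hypothetical instance map: a graph is (n, set of frozenset edges),
# and f maps it to its complement; beta is ignored here.
def complement(graph, beta):
    n, edges = graph
    all_pairs = {frozenset(pair) for pair in combinations(range(n), 2)}
    return n, all_pairs - edges


# Hypothetical solution map: an independent set of the complement is
# already a clique of the original graph.
def same_vertex_set(graph, solution, beta):
    return solution


# Placeholder for A': a greedy independent set heuristic (no guarantee).
def greedy_independent_set(graph):
    n, edges = graph
    chosen = []
    for v in range(n):
        if all(frozenset((u, v)) not in edges for u in chosen):
            chosen.append(v)
    return set(chosen)
```

Calling `compose(complement, same_vertex_set, greedy_independent_set, beta=2)` yields a function that returns a clique of its input graph; feasibility is preserved by the reduction regardless of how good the inner algorithm is.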
Exercise 2.5. Let $P \le_{\mathrm{AP}} P'$. Show that if $P' \in$ PTAS, so is $P$.
The reduction in Exercise 2.4 is not an AP-reduction. This has a deeper reason. While there is a 2-approximation algorithm for VC, Clique is much harder to approximate. Håstad [Hås99] shows that any approximation algorithm with performance ratio $n^{1-\epsilon_0}$ for some $\epsilon_0 > 0$ would imply ZPP = NP (which is almost as unlikely as P = NP).

² The functions $f, g$ depend on the quality $\beta$ of the solution $y$. I am only aware of one example where this dependence seems to be necessary, so usually, $f$ and $g$ will not depend on $\beta$.
Problem 2.17. Maximum Independent Set (IS) is the following problem:
Instances: graph $G = (V, E)$
Solutions: independent sets of $G$, i.e., all $S \subseteq V$ such that for all $u, v \in S$ with $u \neq v$, $\{u, v\} \notin E$
Measure: $\#S$
Goal: max
Exercise 2.6. Essentially the same idea as in Exercise 2.4 gives a reduction from Clique to IS. Show that this is an AP-reduction.
Definition 2.18. Let $C \subseteq$ NPO. A problem $P$ is $C$-hard (under AP-reductions) if for all $P' \in C$, $P' \le_{\mathrm{AP}} P$. $P$ is $C$-complete if it is in $C$ and $C$-hard.
Lemma 2.19. $\le_{\mathrm{AP}}$ is transitive.
Proof. Let $P \le_{\mathrm{AP}} P'$ and $P' \le_{\mathrm{AP}} P''$. Let $(f, g, \alpha)$ and $(f', g', \alpha')$ be the corresponding reductions. Let $\gamma = 1 + \alpha'(\beta - 1)$. We claim that $(F, G, \alpha\alpha')$ is an AP-reduction from $P$ to $P''$ where
$$F(x, \beta) = f'(f(x, \gamma), \beta),$$
$$G(x, y, \beta) = g(x, g'(f(x, \gamma), y, \beta), \gamma).$$
We verify that $(F, G, \alpha\alpha')$ is indeed an AP-reduction by checking the five conditions in Definition 2.15:
1. Obvious.
2. Obvious, too.
3. Almost obvious, thus we give a proof. Let $x \in I_P$ and $y \in S_{P''}(F(x, \beta))$. We know that $g'(f(x, \gamma), y, \beta) \in S_{P'}(f(x, \gamma))$, since $(f', g', \alpha')$ is an AP-reduction. But then also $g(x, g'(f(x, \gamma), y, \beta), \gamma) \in S_P(x)$, since $(f, g, \alpha)$ is an AP-reduction.
4. Obvious.
5. Finally, if $y$ is a $\beta$-approximation to $f'(f(x, \gamma), \beta)$, then $g'(f(x, \gamma), y, \beta)$ is a $(1 + \alpha'(\beta - 1))$-approximation, that is, a $\gamma$-approximation, to $f(x, \gamma)$. But then $g(x, g'(f(x, \gamma), y, \beta), \gamma)$ is a $(1 + \alpha\alpha'(\beta - 1))$-approximation to $x$, as
$$1 + \alpha(\gamma - 1) = 1 + \alpha(1 + \alpha'(\beta - 1) - 1) = 1 + \alpha\alpha'(\beta - 1).$$
Lemma 2.20. Let $C \subseteq$ NPO. If $P \le_{\mathrm{AP}} P'$ and $P$ is $C$-hard, then $P'$ is also $C$-hard.
Proof. Let $Q \in C$ be arbitrary. Since $P$ is $C$-hard, $Q \le_{\mathrm{AP}} P$. Since $\le_{\mathrm{AP}}$ is transitive, $Q \le_{\mathrm{AP}} P'$.
Thus once we have identified one APX-hard problem, we can prove the APX-hardness of other problems using AP-reductions. A canonical candidate is of course the following problem:
Problem 2.21 (MaxSAT). The Maximum Satisfiability problem (MaxSAT) is defined as follows:
Instances: formulas in CNF
Solutions: Boolean assignments to the variables
Measure: the number of clauses satisfied
Goal: max
Proposition 2.22. MaxSAT is APX-hard.
The proof of this proposition is very deep; we will spend the next few weeks on it.
Exercise 2.7. Give a simple 2-approximation algorithm for MaxSAT.
2.3 Further exercises
Here is an NPO-complete problem.
Problem 2.23. Maximum Weighted Satisfiability is the following problem:
Instances: Boolean formula $\phi$ with variables $x_1, \dots, x_n$ having nonnegative weights $w_1, \dots, w_n$
Solutions: Boolean assignments $\alpha : \{x_1, \dots, x_n\} \to \{0, 1\}$ that satisfy $\phi$
Measure: $\max\{1, \sum_{i=1}^{n} w_i \alpha(x_i)\}$
Goal: max
Exercise 2.8. 1. Show that every maximization problem in NPO is AP-reducible to Maximum Weighted Satisfiability. (Hint: Construct an NP-machine that guesses a solution $y$ to input $x$ and computes $m(x, y)$. Use a variant of the proof of the Cook-Karp-Levin Theorem to produce an appropriate formula in CNF. Assign only nonzero weights to variables that contain the bits of $m(x, y)$.)
2. Show that every minimization problem in NPO is AP-reducible to Minimum Weighted Satisfiability.
3. Show that Maximum Weighted Satisfiability is AP-reducible to Minimum Weighted Satisfiability and vice versa.
4. Conclude that Maximum (Minimum) Weighted Satisfiability is NPO-complete.
The world of optimization classes
PO ⊆ PTAS ⊆ APX ⊆ expAPX ⊆ NPO
All of these inclusions are strict, provided that P ≠ NP. Under this assumption, we have for instance
• Knapsack ∈ PTAS \ PO
• TSP ∈ expAPX \ APX
• Weighted Satisfiability ∈ NPO \ expAPX.
The goal of the next chapters is to prove that MaxSAT is in APX \ PTAS provided that P ≠ NP.
3 Probabilistically checkable proofs and inapproximability
3.1 Probabilistically checkable proofs (PCPs)
3.1.1 Probabilistic veriﬁers
A polynomial time probabilistic verifier is a polynomial time probabilistic Turing machine that has oracle access to a proof $\pi \in \{0,1\}^*$ in the following way: The proof $\pi$ induces a function $\{0,1\}^{\log(|\pi|)} \to \{0,1\}$ by mapping $b \in \{0,1\}^{\log(|\pi|)}$ to the bit of $\pi$ that stands in the position that is encoded by the binary representation $b$. By abuse of notation, we will call this function again $\pi$. If the verifier queries a bit outside the range of $\pi$, then the answer will be 0.
A veriﬁer described above may query π several times and each query
may depend on previous queries. Such a behavior is called adaptive. We
need a more restricted kind of veriﬁers, called nonadaptive: A nonadaptive
veriﬁer gets the proof π again as an oracle, but in a slightly diﬀerent form:
The veriﬁer can write down several positions of π at one time. If it enters
the query state, it gets the values of all the positions that it queries. But the
veriﬁer may enter the query state only once, i.e., the veriﬁer has to decide
in advance which bits it wants to query.
A nonadaptive probabilistic verifier is called $(r(n), q(n))$-restricted if it uses $r(n)$ bits of randomness and queries $q(n)$ bits of $\pi$ for all $n$ and all inputs $x$ of length $n$.
Definition 3.1. Let $r, q : \mathbb{N} \to \mathbb{N}$. A language $L$ belongs to the class PCP[$r, q$] if there exists an $(r, q)$-restricted nonadaptive polynomial time probabilistic verifier $V$ such that the following holds:
1. For any $x \in L$, there is a proof $\pi$ such that
$$\Pr_y[V^\pi(x, y) = 1] = 1.$$
2. For any $x \notin L$ and for all proofs $\pi$,
$$\Pr_y[V^\pi(x, y) = 0] \ge 1/2.$$
The probabilities are taken over the random strings $y$.
In other words, if $x$ is in $L$, then there is a proof $\pi$ that convinces the verifier regardless of the random string $y$. If $x$ is not in $L$, then the verifier will detect a "wrong" proof with probability at least 1/2, that is, for half of the random strings.
Since the verifier uses only $r(n)$ random bits, there are only $2^{r(n)}$ (relevant) random strings. For any fixed random string, the verifier queries at most $q(n)$ bits of the proof. Therefore, for an input $x$ of length $n$, we only have to consider proofs of length $q(n) 2^{r(n)}$, since the verifier cannot query more bits than that.
3.1.2 A diﬀerent characterization of NP
Once we have deﬁned the PCP classes, the obvious question is: What is this
good for and how is it related to other classes? While complexity theorists
also like to answer the second part of the question without knowing an
answer to the ﬁrst part, here the answer to the second part also gives the
answer to the ﬁrst part.
Let $R$ and $Q$ denote sets of functions $\mathbb{N} \to \mathbb{N}$. We generalize the notion of PCP[$r, q$] in the obvious way:
$$\mathrm{PCP}[R, Q] = \bigcup_{r \in R,\, q \in Q} \mathrm{PCP}[r, q].$$
The characterization of NP by polynomial time verifiers immediately yields the following result.
Proposition 3.2. NP = PCP[0, poly(n)].
In the proposition above, we do not use the randomness at all. The next result, the celebrated PCP theorem [?], shows that allowing a little bit of randomness reduces the number of queries dramatically.
Theorem 3.3 (PCP Theorem). NP = PCP[O(log n), O(1)].
What does this mean? By allowing a little randomness (note that $O(\log n)$ random bits are barely sufficient to choose $O(1)$ bits of the proof at random) and a bounded probability of failure, we can check the proof $\pi$ by just reading a constant number of bits of $\pi$! This is really astonishing.
Exercise 3.1. Show that PCP[O(log n), O(1)] ⊆ NP. (Hint: How many
random strings are there?)
The other direction is far more complicated; we will spend the next few lectures on its proof. We will not present the original proof by Arora et al. [ALM+98] but a recent and, at least compared to the first one, elegant proof by Irit Dinur [Din07].
3.2 PCPs and gap problems
The PCP theorem is usually used to prove hardness of approximation results. Dinur's proof goes the other way around: we show that the statement of the PCP theorem is equivalent to the NP-hardness of some gap problem.
Theorem 3.4. The following two statements are equivalent:
1. NP = PCP[O(log n), O(1)].
2. There is an $\epsilon > 0$ such that $\mathrm{gap}(1 - \epsilon, 1)$-Max-3-SAT is NP-hard.¹
Proof. "⟹": Let $L$ be any NP-complete language. By assumption, there is an $(r(n), q)$-restricted nonadaptive polynomial time probabilistic verifier $V$ for $L$ with $r(n) = O(\log n)$ and $q = O(1)$. We can assume that $V$ always queries exactly $q$ bits.
Let $x$ be an input for $L$ of length $n$. We will construct a formula $\phi$ in 3-CNF in polynomial time such that if $x \in L$, then $\phi$ is satisfiable, and if $x \notin L$, then every assignment can satisfy at most a fraction of $1 - \epsilon$ of the clauses for some fixed $\epsilon > 0$.
For each position $i$ in the proof, there will be one Boolean variable $v_i$. If $v_i$ is set to 1, this will mean that the corresponding $i$th bit is 1; if it is set to 0, then this bit is 0. Since we can restrict ourselves to proofs of length $\le q \cdot 2^{r(n)} = \mathrm{poly}(n)$, the number of these variables is polynomial.
For a random string $y$, let $i(y, 1), \dots, i(y, q)$ denote the positions of the bits that the verifier will query. (Note that the verifier is nonadaptive, hence these positions can only depend on $y$.) Let $A_y$ be the set of all $q$-tuples $(b_1, \dots, b_q) \in \{0,1\}^q$ such that if the $i(y, j)$th bit of the proof is $b_j$ for $1 \le j \le q$, then the verifier will reject (with random string $y$).
For each tuple $(b_1, \dots, b_q) \in A_y$, we construct a clause of $q$ literals that is true iff the variables $v_{i(y,1)}, \dots, v_{i(y,q)}$ do not take the values $b_1, \dots, b_q$, i.e.,
$$v_{i(y,1)}^{1-b_1} \lor \dots \lor v_{i(y,q)}^{1-b_q}.$$
(Here, for a Boolean variable $v$, $v^1 = v$ and $v^0 = \bar{v}$.)
The formula $\phi$ has at most $\sum_y |A_y| \le 2^{q + r(n)} = \mathrm{poly}(n)$ many clauses. These clauses have length $q$. Like in the reduction of SAT to 3-SAT, for each such clause $c$, there are $q - 2$ clauses $c_1, \dots, c_{q-2}$ of length three in the variables of $c$ and some additional variables such that any assignment that satisfies $c$ can be extended to an assignment that satisfies $c_1, \dots, c_{q-2}$ and conversely, the restriction of any assignment that satisfies $c_1, \dots, c_{q-2}$ satisfies $c$, too. This replacement can be computed in polynomial time.
The formula $\phi$ can be computed in polynomial time: We enumerate all (polynomially many) random strings. For each such string $y$, we simulate the verifier $V$ to find out which bits it will query. Then we can give it all the possible answers to the bits it queried to compute the set $A_y$.

¹ Instead of stating the absolute bounds $(1 - \epsilon)m$ and $m$, where $m$ is the number of clauses of the given instance, we just state the relative bounds $1 - \epsilon$ and 1. This is very convenient here, since there is an easy upper bound on the objective value, namely $m$.
If $x \in L$, then there is a proof $\pi$ such that $V^{\pi}(x, y) = 1$ for every random string y. Therefore, if we set the variables of $\phi$ as given by this proof $\pi$, then $\phi$ is satisfied.

If $x \notin L$, then for any proof $\pi$, there are at least $2^{r(n)}/2$ random strings y for which $V^{\pi}(x, y) = 0$. For each such y, one clause corresponding to a tuple in $A_y$ is not satisfied. In other words, for any assignment, at least $2^{r(n)}/2$ clauses are not satisfied. The total number of clauses is bounded by $(q - 2) 2^{q + r(n)}$. The fraction of unsatisfied clauses therefore is
\[ \ge \frac{2^{r(n)}/2}{(q-2)\, 2^{q+r(n)}} \ge \frac{2^{-q-1}}{q-2}, \]
which is a constant.
“⇐=”: By Exercise 3.1, it suﬃces to show that NP ⊆ PCP[O(log n), O(1)].
Let $L \in \mathrm{NP}$. By assumption, there is a polynomial time computable function f such that
$x \in L \implies f(x)$ is a satisfiable formula in 3-CNF,
$x \notin L \implies f(x)$ is a formula in 3-CNF such that every assignment satisfies at most a fraction of $1 - \epsilon$ of the clauses.
We construct a probabilistic veriﬁer as follows:
Input: input x, proof π
1. Compute f(x).
2. Randomly select a clause c from f(x).
3. Interpret π as an assignment to f(x) and read the bits that belong to
the variables in c.
4. Accept if the selected clause c is satisﬁed. Reject otherwise.
Let m be the number of clauses of f(x). To select a clause at random, the verifier reads $\lceil \log m \rceil$ random bits and interprets them as a number. If it "selects" a nonexisting clause, then it will accept. So we can think of m being a power of two at the expense of replacing $\epsilon$ by $\epsilon/2$.
Now assume $x \in L$. Then f(x) is satisfiable and therefore, there is a proof that will make the verifier always accept, namely a satisfying assignment of f(x). If $x \notin L$, then no assignment will satisfy more than a fraction of $1 - \epsilon$ of the clauses. In particular, the probability that the verifier selects a clause that is satisfied is at most $1 - \epsilon$. By repeating this process a constant number of times, we can bring the error probability down to 1/2.
Since f(x) is in 3-CNF, the verifier needs $O(\log m) = O(\log |x|)$ random bits, and it only queries O(1) bits of the proof.
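The verifier and the number of repetitions can be sketched as follows. This is only an illustration: the integer encoding of literals, the example formula, and the value $\epsilon = 0.1$ are assumptions made here, and the padding of m to a power of two is omitted.

```python
import math
import random

def clause_satisfied(clause, proof):
    """Literals are nonzero ints; proof maps variable index -> bool."""
    return any(proof[abs(l)] == (l > 0) for l in clause)

def verifier_round(clauses, proof, rng):
    """One round: pick a random clause and read only the bits of its variables."""
    return clause_satisfied(rng.choice(clauses), proof)

def acceptance_probability(clauses, proof):
    """Exact acceptance probability of one round: the fraction of satisfied clauses."""
    return sum(clause_satisfied(c, proof) for c in clauses) / len(clauses)

def rounds_needed(eps):
    """Smallest t with (1 - eps)**t <= 1/2."""
    return math.ceil(math.log(2) / -math.log(1 - eps))

formula = [[1, 2, 3], [-1, 2, 3], [1, -2, -3], [-1, -2, 3]]   # example formula
p_good = acceptance_probability(formula, {1: True, 2: True, 3: True})
p_bad = acceptance_probability(formula, {1: False, 2: False, 3: False})
one_round = verifier_round(formula, {1: True, 2: True, 3: True}, random.Random(0))
t = rounds_needed(0.1)
```

A satisfying assignment is accepted with probability 1, while the all-false assignment violates one of the four clauses and is accepted with probability 3/4 in a single round.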
Exercise 3.2. Let c be a clause of length q. Construct clauses $c_1, \dots, c_{q-2}$ of length three in the variables of c and some additional variables such that any assignment that satisfies c can be extended to an assignment that satisfies $c_1, \dots, c_{q-2}$ and conversely, the restriction of any assignment that satisfies $c_1, \dots, c_{q-2}$ satisfies c, too.
Note that we get an explicit value for $\epsilon$ in terms of q. Thus in order to get good nonapproximability results from the PCP theorem, we want q to be as small as possible.
3.3 Further exercises
Exercise 3.3. Show that PCP[O(log n), 2] = P.
It can be shown—tadah!—that three queries are enough to capture NP;
however, it is not possible to get error probability 1/2 and onesided error,
see [GLST98] for further discussions.
A Max-3-SAT is APX-hard
In this chapter, we will strengthen the result of the previous one by showing that Max-3-SAT is in fact APX-hard. We do this in several steps. First, we show that any maximization problem in APX is AP-reducible to Max-3-SAT. Second, we show that for every minimization problem P, there is a maximization problem $P'$ such that $P \le_{AP} P'$. This will conclude the proof.
Our proof of the PCP-Theorem will also yield the following variant, which we will use in the following.
Theorem A.1 (PCP-Theorem'). There are $\epsilon > 0$ and polynomial time computable functions $f_{PCP}$ and $g_{PCP}$ such that for every formula $\psi$ in 3-CNF:
1. $f_{PCP}(\psi)$ is a formula in 3-CNF,
2. if $\psi$ is satisfiable, so is $f_{PCP}(\psi)$,
3. if $\psi$ is not satisfiable, then any assignment can satisfy at most a fraction of $1 - \epsilon$ of the clauses in $f_{PCP}(\psi)$,
4. if a is an assignment for $f_{PCP}(\psi)$ that satisfies more than a fraction of $1 - \epsilon$ of the clauses, then $g_{PCP}(\psi, a)$ is an assignment that satisfies $\psi$.
Theorem A.2. Let $P = (I_P, S_P, m_P, \max)$ be a maximization problem in APX. Then $P \le_{AP}$ Max-3-SAT.
Proof. Our goal is to construct an AP reduction $(f, g, \alpha)$ from P to Max-3-SAT. Let $f_{PCP}$ and $g_{PCP}$ be the functions constructed in Theorem A.1 and let $\epsilon$ be the corresponding constant. Let A be a b-approximation algorithm for P. Let
\[ \alpha = 2(b \log b + b - 1)\,\frac{1 + \epsilon}{\epsilon}. \]
Our goal is to define the functions f and g given $\beta$. Let $r = 1 + \alpha(\beta - 1)$. If $r < b$, then
\[ \beta = \frac{r - 1}{\alpha} + 1 = \frac{\epsilon}{2(1 + \epsilon)} \cdot \frac{r - 1}{b \log b + b - 1} + 1 < \frac{\epsilon}{2k(1 + \epsilon)} + 1 \tag{A.1} \]
where $k = \lceil \log_r b \rceil$. The last inequality follows from
\[ k \le \frac{\log b}{\log r} + 1 \le \frac{r \log b}{r - 1} + 1 \le \frac{b \log b + b - 1}{r - 1}. \]
Let $\mu(x) = m_P(x, A(x))$. Since A is a b-approximation algorithm, $\mu(x) \le \mathrm{OPT}_P(x) \le b\mu(x)$.
The following Turing machine computes f:
Input: $x \in \{0,1\}^*$, $\beta \in \mathbb{Q}^+$
1. Construct formulas $\phi_{x,i}$ in 3-CNF that are satisfiable if and only if $\mathrm{OPT}_P(x) \ge i$. (These formulas $\phi_{x,i}$ can be uniformly constructed in polynomial time, cf. the proof of Cook's theorem.)
2. Let $\psi_{x,\kappa} = f_{PCP}(\phi_{x,\mu(x) r^{\kappa}})$, $1 \le \kappa \le k$. By padding with dummy clauses, we may assume that all the $\psi_{x,\kappa}$ have the same number of clauses c.
3. Return $\psi_x = \bigwedge_{\kappa=1}^{k} \psi_{x,\kappa}$.
The function g is computed as follows:
Input: $x \in \{0,1\}^*$, assignment a with performance ratio $\beta$
1. If $b \le 1 + \alpha(\beta - 1)$, then return A(x).¹
2. Else let $\kappa_0$ be the largest $\kappa$ such that $g_{PCP}(\phi_{x,\mu(x) r^{\kappa}}, a)$ satisfies $\phi_{x,\mu(x) r^{\kappa}}$. (We restrict a to the variables of $\phi_{x,\mu(x) r^{\kappa}}$.)
3. This satisfying assignment corresponds to a feasible solution y with $m_P(x, y) \ge \mu(x) r^{\kappa_0}$. Return y.
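If $b \le 1 + \alpha(\beta - 1)$, then we return A(x). This is a b-approximation by assumption. Since $b \le 1 + \alpha(\beta - 1)$, we are done.
Therefore, assume that $b > 1 + \alpha(\beta - 1)$. We have
\[ \mathrm{OPT}_{\text{Max-3-SAT}}(\psi_x) - m_{\text{Max-3-SAT}}(\psi_x, a) \le \mathrm{OPT}_{\text{Max-3-SAT}}(\psi_x)\,\frac{\beta - 1}{\beta} \le kc\,\frac{\beta - 1}{\beta}. \]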
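Let $\beta_\kappa$ denote the performance ratio of a with respect to $\psi_{x,\kappa}$, i.e., we view a as an assignment of $\psi_{x,\kappa}$. We have
\begin{align*}
\mathrm{OPT}_{\text{Max-3-SAT}}(\psi_x) - m_{\text{Max-3-SAT}}(\psi_x, a) &\ge \mathrm{OPT}_{\text{Max-3-SAT}}(\psi_{x,\kappa}) - m_{\text{Max-3-SAT}}(\psi_{x,\kappa}, a) \\
&= \mathrm{OPT}_{\text{Max-3-SAT}}(\psi_{x,\kappa})\,\frac{\beta_\kappa - 1}{\beta_\kappa} \ge \frac{c}{2} \cdot \frac{\beta_\kappa - 1}{\beta_\kappa}.
\end{align*}

¹ Here is the promised dependence on $\beta$.

The last inequality follows from the fact that any formula in CNF has an assignment that satisfies at least half of the clauses. This yields
\[ \frac{c}{2} \cdot \frac{\beta_\kappa - 1}{\beta_\kappa} \le kc\,\frac{\beta - 1}{\beta} \]
and finally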
\[ \beta_\kappa \le \frac{1}{1 - 2k(\beta - 1)/\beta}. \]
Exploiting (A.1), we get, after some routine calculations,
\[ \beta_\kappa \le 1 + \epsilon. \]
This means that a satisfies at least a fraction of $1/\beta_\kappa \ge 1 - \epsilon$ of the clauses of $\psi_{x,\kappa}$. Then $g_{PCP}(\phi_{x,\mu(x) r^{\kappa}}, a)$ satisfies $\phi_{x,\mu(x) r^{\kappa}}$ if and only if $\phi_{x,\mu(x) r^{\kappa}}$ is satisfiable. This is equivalent to the fact that $\mathrm{OPT}_P(x) \ge \mu(x) r^{\kappa}$. By the definition of $\kappa_0$,
\[ \mu(x) r^{\kappa_0 + 1} > \mathrm{OPT}_P(x) \ge \mu(x) r^{\kappa_0}. \]
This means that $m_P(x, y) \ge \mu(x) r^{\kappa_0}$. But then y is an r-approximate solution. Then we are done, since $r = 1 + \alpha(\beta - 1)$ by definition.
Theorem A.3. For every minimization problem $P \in \mathrm{APX}$, there is a maximization problem $P' \in \mathrm{APX}$ such that $P \le_{AP} P'$.
Proof. Let A be a b-approximation algorithm for P. Let $\mu(x) = m_P(x, A(x))$ for all $x \in I_P$. Then $\mu(x) \le b\,\mathrm{OPT}_P(x)$. $P'$ has the same instances and feasible solutions as P. The objective function is, however, different:
\[ m_{P'}(x, y) = \begin{cases} (k + 1)\mu(x) - k\, m_P(x, y) & \text{if } m_P(x, y) \le \mu(x), \\ \mu(x) & \text{otherwise,} \end{cases} \]
where $k = \lceil b \rceil$. We have $\mu(x) \le \mathrm{OPT}_{P'}(x) \le (k + 1)\mu(x)$. This means that A is a (k + 1)-approximation algorithm for $P'$. Hence, $P' \in \mathrm{APX}$.
The AP reduction $(f, g, \alpha)$ from P to $P'$ is defined as follows: $f(x, \beta) = x$ for all $x \in I_P$. (Note that we do not need any dependence on $\beta$ here.) Next, we set
\[ g(x, y, \beta) = \begin{cases} y & \text{if } m_P(x, y) \le \mu(x), \\ A(x) & \text{otherwise.} \end{cases} \]
And finally, $\alpha = k + 1$.
Let y be a $\beta$-approximate solution to x under $m_{P'}$, that is, $R_{P'}(x, y) = \mathrm{OPT}_{P'}(x) / m_{P'}(x, y) \le \beta$. We have to show that $R_P(x, g(x, y, \beta)) \le 1 + \alpha(\beta - 1)$.
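We distinguish two cases: The first one is $m_P(x, y) \le \mu(x)$. In this case,
\begin{align*}
m_P(x, y) &= \frac{(k + 1)\mu(x) - m_{P'}(x, y)}{k} \\
&\le \frac{(k + 1)\mu(x) - \mathrm{OPT}_{P'}(x)/\beta}{k} \\
&\le \frac{(k + 1)\mu(x) - (1 - (\beta - 1))\,\mathrm{OPT}_{P'}(x)}{k} \\
&\le \mathrm{OPT}_P(x) + \frac{\beta - 1}{k}\,\mathrm{OPT}_{P'}(x) \\
&\le \mathrm{OPT}_P(x) + \frac{\beta - 1}{k}\,(k + 1)\mu(x) \\
&\le \mathrm{OPT}_P(x) + (\beta - 1)(k + 1)\mu(x)/b \\
&\le (1 + \alpha(\beta - 1))\,\mathrm{OPT}_P(x).
\end{align*}
This completes the first case.
For the second case, note that
\[ m_P(x, g(x, y, \beta)) = m_P(x, A(x)) \le b\,\mathrm{OPT}_P(x) \le (1 + \alpha(\beta - 1))\,\mathrm{OPT}_P(x). \]
Thus, $P \le_{AP} P'$.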
Now Theorems A.2 and A.3 imply the following result.
Theorem A.4. Max-3-SAT is APX-hard.
A.1 Further exercises
Exercise A.1. Show that Max-3-SAT $\le_{AP}$ Clique. (In particular, Clique does not have a PTAS, unless P = NP.)
The kth Cartesian product of a graph G = (V, E) is the graph $G^k$ with nodes $V^k$ in which there is an edge between $(u_1, \dots, u_k)$ and $(v_1, \dots, v_k)$ if either $u_i = v_i$ or $\{u_i, v_i\} \in E$ for all $1 \le i \le k$.
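To get a feeling for this product, here is a small brute-force sketch. It assumes that only distinct tuples are joined by an edge (no self-loops), and the example graph, a path on three nodes, is chosen only for illustration. The clique search is exponential and only meant for tiny graphs.

```python
from itertools import combinations, product

def graph_power(nodes, edges, k):
    """k-th product: distinct tuples are adjacent if, in every coordinate,
    the entries are equal or adjacent in G."""
    adj = {frozenset(e) for e in edges}
    tuples = list(product(nodes, repeat=k))
    new_edges = set()
    for u, v in combinations(tuples, 2):
        if all(a == b or frozenset((a, b)) in adj for a, b in zip(u, v)):
            new_edges.add(frozenset((u, v)))
    return tuples, new_edges

def max_clique_size(nodes, edges):
    """Brute-force maximum clique size; `edges` is a set of frozensets."""
    for size in range(len(nodes), 0, -1):
        for cand in combinations(nodes, size):
            if all(frozenset(p) in edges for p in combinations(cand, 2)):
                return size
    return 0

nodes = [0, 1, 2]
edges = [(0, 1), (1, 2)]                       # path 0 - 1 - 2
s = max_clique_size(nodes, {frozenset(e) for e in edges})
nodes2, edges2 = graph_power(nodes, edges, 2)
s2 = max_clique_size(nodes2, edges2)
```

For the path, the largest clique has size 2, and the largest clique of the second product has size 4.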
Exercise A.2. 1. Prove that if G has a clique of size s, then $G^k$ has a clique of size $s^k$.
2. Use this to show that if Clique $\in$ APX, then Clique $\in$ PTAS. Now apply Exercise A.1.
Håstad [Hås99] shows that any approximation algorithm for Clique with performance ratio $n^{1 - \epsilon_0}$ for some $\epsilon_0 > 0$ would imply ZPP = NP. On the other hand, achieving a performance ratio of n is trivial.
4 The long code
The next chapters follow the ideas of Dinur quite closely. The Diplomarbeit
by Stefan Senitsch [Sen07] contains a polished proof with many additional
details which was very helpful in preparing the next chapters.
Let $B_n$ denote the set of all Boolean functions $\{0,1\}^n \to \{0,1\}$ and $B_n^-$ the set of all functions $\{-1,1\}^n \to \{-1,1\}$. These are essentially the same objects: $x \mapsto -2x + 1$ (or $x \mapsto (-1)^x$) is a bijection that maps 1 to $-1$ and 0 to 1. Note that we identify 1 (true) with $-1$ and 0 (false) with 1. For our purposes, it is more convenient to work with $B_n^-$ and this interpretation of true and false.
Definition 4.1. Let $x \in \{-1,1\}^n$. The long code of x is the function $LC_x : B_n^- \to \{-1,1\}$ given by $LC_x(f) = f(x)$ for all $f \in B_n^-$.
The long code was invented by Bellare, Goldreich, and Sudan [BGS98]. By ordering the functions in $B_n^-$, we can view $LC_x$ as a vector in $\{-1,1\}^{2^{2^n}}$ and we will tacitly switch between these two views.
The relative distance between two elements $A, B \in \{-1,1\}^{2^{2^n}}$ is
\[ \delta(A, B) = \Pr_{f \in B_n^-}[A(f) \ne B(f)], \]
i.e., it is the probability that the vectors A and B differ at a random position. Furthermore, we define a scalar product on $\{-1,1\}^{2^{2^n}}$ by
\[ \langle A, B \rangle = E_{f \in B_n^-}[A(f) B(f)] = 2^{-2^n} \sum_{f \in B_n^-} A(f) B(f). \]
Note that $\langle A, A \rangle = 1$ for all A.
For a set $S \subseteq \{-1,1\}^n$, let $\chi_S : B_n^- \to \{-1,1\}$ be defined by
\[ \chi_S(f) = \prod_{x \in S} f(x). \]
The set of all functions $A : B_n^- \to \mathbb{R}$ is a vector space of dimension $2^{2^n}$.
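Lemma 4.2. $\{\chi_S \mid S \subseteq \{-1,1\}^n\}$ is an orthonormal basis of the vector space of all functions $B_n^- \to \mathbb{R}$.
Proof. Let $S, T \subseteq \{-1,1\}^n$ with $S \ne T$. First,
\[ \langle \chi_S, \chi_S \rangle = 2^{-2^n} \sum_{f \in B_n^-} \chi_S(f)^2 = 1. \]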
Second,
\[ \langle \chi_S, \chi_T \rangle = 2^{-2^n} \sum_{f \in B_n^-} \chi_S(f) \chi_T(f) = 2^{-2^n} \sum_{f \in B_n^-} \prod_{x \in S} f(x) \prod_{x \in T} f(x) = 2^{-2^n} \sum_{f \in B_n^-} \prod_{x \in S \Delta T} f(x). \]
Choose an $x \in S \Delta T$. Note that such an x exists, since $S \ne T$. Consider the mapping on $B_n^-$ that maps a function f to the function g with $f(x) = -g(x)$ and $f(y) = g(y)$ for all $y \ne x$. This mapping is an involution (i.e., self-inverse) that does not have any fixed points. Such an involution separates the functions in $B_n^-$ into two sets of the same size such that for all functions f, the corresponding function g is in the other set.¹ Hence
\[ \sum_{f \in B_n^-} \prod_{x' \in S \Delta T} f(x') = \sum_{f \in B_n^-} f(x) \prod_{y \in S \Delta T \setminus \{x\}} f(y) = 0. \]
Thus, the $\chi_S$ form an orthonormal family. Since its size equals the dimension $2^{2^n}$ of the vector space, it is spanning, too.
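For very small n, the orthonormality can be checked exhaustively. The following sketch does this for n = 2, where $B_n^-$ has 16 elements and there are 16 sets S; the data representation (functions as dictionaries, sets as frozensets) is an assumption made for this illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

n = 2
points = list(product((-1, 1), repeat=n))                     # {-1,1}^n
functions = [dict(zip(points, vals))                          # all of B_n^-
             for vals in product((-1, 1), repeat=len(points))]
subsets = [frozenset(c) for r in range(len(points) + 1)
           for c in combinations(points, r)]

def chi(S, f):
    """chi_S(f) = product of f(x) over x in S (the empty product is 1)."""
    return prod(f[x] for x in S)

def inner(A, B):
    """Scalar product <A, B> = average of A(f) * B(f) over all f."""
    return Fraction(sum(A(f) * B(f) for f in functions), len(functions))

gram_ok = all(
    inner(lambda f: chi(S, f), lambda f: chi(T, f)) == (1 if S == T else 0)
    for S in subsets for T in subsets
)
```

The check is exact, since all sums are computed with rational arithmetic.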
Once we have an orthonormal family, we can look at Fourier expansions. The Fourier coefficients of a function $A : B_n^- \to \{-1,1\}$ are given by
\[ \hat A_S = \langle A, \chi_S \rangle = 2^{-2^n} \sum_{f \in B_n^-} A(f) \prod_{x \in S} f(x). \]
The Fourier expansion of A is
\[ A(f) = \sum_{S \subseteq \{-1,1\}^n} \hat A_S \chi_S(f) = \sum_{S \subseteq \{-1,1\}^n} \hat A_S \prod_{x \in S} f(x). \]
Furthermore, Parseval's identity holds, that is,
\[ \sum_{S \subseteq \{-1,1\}^n} \hat A_S^2 = \langle A, A \rangle = 1. \]
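The following sketch computes Fourier coefficients by brute force for n = 2. It looks at two strings: the long code of a hypothetical point a, and an arbitrary string drawn with a fixed random seed. The representation of strings as lists indexed like the enumeration of $B_n^-$ is an assumption of this illustration.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod
import random

n = 2
points = list(product((-1, 1), repeat=n))
functions = [dict(zip(points, vals)) for vals in product((-1, 1), repeat=len(points))]
subsets = [frozenset(c) for r in range(len(points) + 1) for c in combinations(points, r)]

def fourier(A):
    """A is a list of +-1 values indexed like `functions`; returns {S: coefficient}."""
    return {S: Fraction(sum(A[i] * prod(f[x] for x in S)
                            for i, f in enumerate(functions)), len(functions))
            for S in subsets}

a = (1, -1)                                    # hypothetical point of {-1,1}^2
long_code = [f[a] for f in functions]          # LC_a as a vector
coeffs_lc = fourier(long_code)

rng = random.Random(0)
arbitrary = [rng.choice((-1, 1)) for _ in functions]
coeffs = fourier(arbitrary)
parseval = sum(c * c for c in coeffs.values())
# Rebuild the string from its Fourier expansion.
reconstructed = [sum(coeffs[S] * prod(f[x] for x in S) for S in subsets)
                 for f in functions]
```

The long code has all its Fourier mass on the singleton $\{a\}$, and for the arbitrary string both Parseval's identity and the Fourier expansion can be confirmed exactly.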
In what follows, we will usually consider folded strings. $A \in \{-1,1\}^{2^{2^n}}$ is called folded over true if for all f, $A(-f) = -A(f)$. Let $\psi : \{-1,1\}^n \to \{-1,1\}$. A is called folded over $\psi$ if for all f, $A(f) = A(f \wedge \psi)$. A is simply called folded if it is folded over true and over $\psi$. (This assumes that $\psi$ is clear from the context.) If a string is folded, then we only need to specify it on a smaller set of positions $D_\psi$ defined as follows: Let D be a set of functions that contains exactly one function of every pair f and $-f$, and let $D_\psi = \{f \in D \mid f = f \wedge \psi\}$.
¹ To achieve this, pick a function, put it into the one set and its image under the involution into the other. This is possible, since the involution has no fixed points. Repeat until all functions are put into one of the two sets. I am sorry that you had to read this, but I was puzzled.
4.1 Properties of folded strings
Lemma 4.3. If $A = LC_a$ for some $a \in \{-1,1\}^n$ and $\psi(a) = -1$, then A is folded.
Proof. Let $f : \{-1,1\}^n \to \{-1,1\}$. We have
\[ A(f) = LC_a(f) = f(a) = (f \wedge \psi)(a) = LC_a(f \wedge \psi) = A(f \wedge \psi) \]
and
\[ A(-f) = LC_a(-f) = -f(a) = -LC_a(f) = -A(f). \]
Lemma 4.4. Let $\psi \in B_n^-$, let $A : B_n^- \to \mathbb{R}$ be folded, and let $S \subseteq \{-1,1\}^n$. We have:
1. $E_{f \in B_n^-}[A(f)] = 0$.
2. If $|S|$ is even, then $\hat A_S = 0$.
3. If there is a $y \in S$ with $\psi(y) = 1$, then $\hat A_S = 0$.
Proof. We start with 1: Let $f \in B_n^-$. Since A is folded, $A(f) + A(-f) = 0$. From this, it follows easily that the expected value is 0.
Next comes 2: We have
\[ \chi_S(-f) = \prod_{x \in S} (-f(x)) = (-1)^{|S|} \prod_{x \in S} f(x) = \chi_S(f). \]
Let $f \in B_n^-$. We have
\[ A(f)\chi_S(f) + A(-f)\chi_S(-f) = (A(f) + A(-f))\chi_S(f) = 0. \]
Thus $\hat A_S = E[A\chi_S] = 0$.
Finally, we show 3: Let $f \in B_n^-$ and let $g \in B_n^-$ be the function that differs from f only at y. We have
\[ \chi_S(g) = \prod_{x \in S} g(x) = g(y) \prod_{x \in S \setminus \{y\}} g(x) = -f(y) \prod_{x \in S \setminus \{y\}} f(x) = -\chi_S(f). \]
Since A is folded and $f \wedge \psi = g \wedge \psi$, $A(f) = A(g)$. Thus $A(f)\chi_S(f) + A(g)\chi_S(g) = 0$. Summing over all such pairs f, g yields $\hat A_S = 0$.
5 Long code tests
We will use the long code to encode assignments of a formula $\psi \in B_n^-$. We will design a test T that gets a string A and tests whether A is the long code of a satisfying assignment of $\psi$. The test will only query three bits of A! However, the long code is simply too long . . . but this will not matter in the end.
5.1 First test
Input: folded string $A : B_n^- \to \{-1,1\}$, $\psi : \{-1,1\}^n \to \{-1,1\}$.
1. Let $\tau = 1/100$.
2. Choose $f, g \in B_n^-$ uniformly at random.
3. Define $\mu \in B_n^-$ as follows: If $f(x) = 1$, then let $\mu(x) = -1$. If $f(x) = -1$, then let
\[ \mu(x) = \begin{cases} 1 & \text{with probability } 1 - \tau, \\ -1 & \text{with probability } \tau. \end{cases} \]
4. Let $h = \mu \cdot g$.
5. If $A(f) = A(g) = A(h) = 1$, then reject. Else accept.
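The steps of the test can be simulated directly for small n. The sketch below is an illustration under assumptions: the string A is passed as a Python callable on functions, the point a and the seed are arbitrary, and the folding over $\psi$ (which only restricts the positions that are actually stored) is not modelled.

```python
import random
from itertools import product

TAU = 1 / 100

def random_function(points, rng):
    """A uniformly random element of B_n^-."""
    return {x: rng.choice((-1, 1)) for x in points}

def run_test(A, points, rng):
    """One run of the test on the string A; returns True if it accepts."""
    f = random_function(points, rng)
    g = random_function(points, rng)
    mu = {}
    for x in points:
        if f[x] == 1:
            mu[x] = -1
        else:
            mu[x] = 1 if rng.random() < 1 - TAU else -1
    h = {x: mu[x] * g[x] for x in points}       # h = mu * g, pointwise
    return not (A(f) == 1 and A(g) == 1 and A(h) == 1)

n = 3
points = list(product((-1, 1), repeat=n))
a = (-1, 1, -1)                                 # hypothetical assignment
long_code = lambda f: f[a]                      # LC_a
rng = random.Random(1)
accepts = sum(run_test(long_code, points, rng) for _ in range(2000))
constant_accepted = run_test(lambda f: 1, points, rng)
```

A long code is never rejected in this simulation, in line with the first part of Theorem 5.3 below, whereas the constant string 1 (which is not folded) is always rejected.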
The following lemma shows that if the test T accepts with high probability, then A is close to the long code of a satisfying assignment of $\psi$ or to its negation.
Lemma 5.1. There exists a constant $K^*$ such that the following holds: If $\Pr[T \text{ rejects } (A, \psi)] \le \epsilon$ for small enough $\epsilon > 0$, then there is an $a \in \{-1,1\}^n$ with $\psi(a) = -1$ such that either $\delta(A, LC_a) < K^* \epsilon$ or $\delta(A, -LC_a) < K^* \epsilon$.
Proof. T accepts iff not all of A(f), A(g), and A(h) equal 1. This is equivalent to $1 - \frac{1}{8}(1 + A(f))(1 + A(g))(1 + A(h)) = 1$ (and not 0; the left-hand side is $\{0,1\}$-valued). Thus
\begin{align*}
\Pr[T \text{ accepts } (A, \psi)] &= \Pr\left[1 - \tfrac{1}{8}(1 + A(f))(1 + A(g))(1 + A(h)) = 1\right] \\
&= E\left[1 - \tfrac{1}{8}(1 + A(f))(1 + A(g))(1 + A(h))\right] \\
&= \frac{7}{8} - \frac{1}{8}\big(E[A(f)] + E[A(g)] + E[A(h)] \\
&\qquad + E[A(f)A(g)] + E[A(f)A(h)] + E[A(g)A(h)] \\
&\qquad + E[A(f)A(g)A(h)]\big). \tag{5.1}
\end{align*}
As A is folded, $E[A(f)] = 0$ by Lemma 4.4. Since f, g, and h (check the latter!) are drawn uniformly at random,
\[ E[A(f)] = E[A(g)] = E[A(h)] = 0. \]
The pairs (f, g) and (f, h) are independent ((g, h) is, however, not!), thus
\[ E[A(f)A(g)] = E[A(f)] \cdot E[A(g)] = 0, \]
and in the same way $E[A(f)A(h)] = 0$.
Therefore, it remains to estimate $E[A(g)A(h)]$ and $E[A(f)A(g)A(h)]$ in (5.1). We start with $E[A(g)A(h)]$ and will use Fourier analysis:
\begin{align*}
E[A(g)A(h)] &= E\Big[\sum_{S,T \subseteq \{-1,1\}^n} \hat A_S \chi_S(g) \hat A_T \chi_T(h)\Big] \\
&= \sum_{S,T} \hat A_S \hat A_T\, E[\chi_S(g)\chi_T(h)]. \tag{5.2}
\end{align*}
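So we should analyze the terms $E[\chi_S(g)\chi_T(h)]$. We start with the case $S \ne T$. Let $z \in S \setminus T$; the other case is symmetric. We have
\begin{align*}
E[\chi_S(g)\chi_T(h)] &= E\Big[\prod_{x \in S} g(x) \prod_{y \in T} h(y)\Big] \\
&= E\Big[g(z) \prod_{x \in S \setminus \{z\}} g(x) \prod_{y \in T} h(y)\Big] \\
&= E[g(z)] \cdot E\Big[\prod_{x \in S \setminus \{z\}} g(x) \prod_{y \in T} h(y)\Big] = 0,
\end{align*}
since g(z) and the remaining product are independent and $E[g(z)] = 0$, because g is random. If T = S, then
\begin{align*}
E[\chi_S(g)\chi_S(h)] &= E\Big[\prod_{x \in S} g(x)h(x)\Big] = E\Big[\prod_{x \in S} g^2(x)\mu(x)\Big] = E\Big[\prod_{x \in S} \mu(x)\Big] \\
&= \prod_{x \in S} E[\mu(x)] \\
&= \prod_{x \in S} \Big(\underbrace{\Pr[\mu(x) = 1]}_{= \frac{1}{2}(1 - \tau)} - \underbrace{\Pr[\mu(x) = -1]}_{= \frac{1}{2}(1 + \tau)}\Big) \\
&= \prod_{x \in S} (-\tau) \\
&= (-\tau)^{|S|}. \tag{5.3}
\end{align*}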
Above, we used $g^2(x) = 1$ for all x and the independence of $\mu(x)$ and $\mu(y)$ for $x \ne y$. If we plug everything into (5.2), we get
\[ E[A(g)A(h)] = \sum_{S \subseteq \{-1,1\}^n} \hat A_S^2 (-\tau)^{|S|}. \]
Because $\hat A_\emptyset = 0$ by Lemma 4.4,
\[ |E[A(g)A(h)]| \le \tau \sum_{S \subseteq \{-1,1\}^n} \hat A_S^2 = \tau, \tag{5.4} \]
where the last equality follows from Parseval's identity.
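Next comes $E[A(f)A(g)A(h)] =: W$. Like before, it can be shown that
\[ W = \sum_{R \subseteq S \subseteq \{-1,1\}^n} \hat A_S^2 \hat A_R\, E[\chi_S(\mu)\chi_R(f)], \]
see Exercise 5.1. Now,
\begin{align*}
E[\chi_S(\mu)\chi_R(f)] &= E\Big[\prod_{x \in R} f(x)\mu(x) \prod_{y \in S \setminus R} \mu(y)\Big] \\
&= \prod_{x \in R} \big(\Pr[f(x)\mu(x) = 1] - \Pr[f(x)\mu(x) = -1]\big) \cdot (-\tau)^{|S \setminus R|} \\
&= \prod_{x \in R} \Big(\tfrac{1}{2}\tau - \big(\tfrac{1}{2} + \tfrac{1}{2}(1 - \tau)\big)\Big) \cdot (-\tau)^{|S \setminus R|} \\
&= (\tau - 1)^{|R|} (-\tau)^{|S \setminus R|}.
\end{align*}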
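Note that $E\big[\prod_{y \in S \setminus R} \mu(y)\big] = (-\tau)^{|S \setminus R|}$ has already been shown, see (5.3). Thus,
\[ W = \sum_{R \subseteq S \subseteq \{-1,1\}^n} \hat A_S^2 \hat A_R (\tau - 1)^{|R|} (-\tau)^{|S \setminus R|} \]
and
\begin{align*}
|W| &\le \sum_{R \subseteq S \subseteq \{-1,1\}^n} |\hat A_S^2|\,|\hat A_R|\,(1 - \tau)^{|R|} \tau^{|S \setminus R|} \\
&\le \sum_{S} |\hat A_S^2| \sum_{R \subseteq S} |\hat A_R|\,(1 - \tau)^{|R|} \tau^{|S \setminus R|}. \tag{5.5}
\end{align*}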
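By the Cauchy–Schwarz inequality,
\begin{align*}
\sum_{R \subseteq S} |\hat A_R|\,(1 - \tau)^{|R|} \tau^{|S \setminus R|} &\le \sqrt{\sum_{R \subseteq S} |\hat A_R|^2} \cdot \sqrt{\sum_{R \subseteq S} \big((1 - \tau)^{|R|} \tau^{|S \setminus R|}\big)^2} \\
&\le \sqrt{\sum_{i=0}^{|S|} \binom{|S|}{i} \big((1 - \tau)^2\big)^i (\tau^2)^{|S| - i}} \\
&= \sqrt{\big(\tau^2 + (1 - \tau)^2\big)^{|S|}} \\
&\le (1 - \tau)^{|S|/2}.
\end{align*}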
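The last inequality follows by the choice of $\tau$, since $\tau^2 + (1 - \tau)^2 \le 1 - \tau$ whenever $\tau \le 1/2$. Thus
\[ |W| \le \sum_{S \subseteq \{-1,1\}^n} |\hat A_S^2|\,(1 - \tau)^{|S|/2} \le \sum_{|S| = 1} |\hat A_S^2|\,(1 - \tau) + \sum_{|S| \ge 3} |\hat A_S^2|\,(1 - \tau)^{|S|/2}, \]
since $\hat A_T = 0$ for even $|T|$. We get the better bound for the first sum by analyzing (5.5), since $|\hat A_R^2| \le 1$.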
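The two elementary facts used in the Cauchy–Schwarz step, the binomial identity and the inequality $\tau^2 + (1 - \tau)^2 \le 1 - \tau$ for the chosen $\tau$, can be checked with exact rational arithmetic. The range of set sizes tested below is an arbitrary choice for this sketch; squares are compared so that no square roots are needed.

```python
from fractions import Fraction
from math import comb

tau = Fraction(1, 100)

def binomial_sum(s):
    """Sum over i of C(s, i) * ((1 - tau)^2)^i * (tau^2)^(s - i)."""
    return sum(comb(s, i) * ((1 - tau) ** 2) ** i * (tau ** 2) ** (s - i)
               for i in range(s + 1))

base = tau ** 2 + (1 - tau) ** 2
checks = [binomial_sum(s) == base ** s and base ** s <= (1 - tau) ** s
          for s in range(0, 11)]
```

The quantity `base` equals $1 - 2\tau + 2\tau^2$, which is at most $1 - \tau$ exactly when $\tau \le 1/2$.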
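Set $\epsilon = \Pr[T \text{ rejects}]$. Then (5.1) yields $-1 + \tau + 8\epsilon \ge W$. For small enough $\epsilon$, this yields W < 0 and therefore we get
\[ 1 - \tau - 8\epsilon \le |W| \le (1 - \rho)(1 - \tau) + \rho(1 - \tau)^{3/2}, \]
where $\rho = \sum_{|S| \ge 3} |\hat A_S^2|$. From this, we get
\[ \rho \le \frac{8\epsilon}{(1 - \tau)(1 - \sqrt{1 - \tau})} \le K\epsilon \]
with $K = \frac{8}{(1 - \tau)(1 - \sqrt{1 - \tau})}$, which is constant.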
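Finally, we will apply Theorem 5.2. For small enough $\epsilon$, $1 - L\rho$ will be greater than 0. Since $\hat A_\emptyset = 0$, the first case in the theorem cannot happen. Hence $|\hat A_{\{a\}}^2| \ge 1 - L\rho$ for some $a \in \{-1,1\}^n$. Thus, either $\hat A_{\{a\}} \ge \sqrt{1 - L\rho} \ge 1 - L\rho$ or $-\hat A_{\{a\}} \ge 1 - L\rho$. In the first case, we get
\[ 1 - L\rho \le \hat A_{\{a\}} = \langle A, \chi_{\{a\}} \rangle = \langle A, LC_a \rangle, \]
because $\chi_{\{a\}}(f) = f(a) = LC_a(f)$. Since $\langle A, LC_a \rangle = 1 - 2\delta(A, LC_a)$, we get
\[ \delta(A, LC_a) \le \frac{L}{2}\rho \le \frac{KL}{2}\epsilon. \]
In the second case, we get $\delta(A, -LC_a) \le \frac{KL}{2}\epsilon$ in the same way. By Lemma 4.4, $\psi(a) = -1$, since $\hat A_{\{a\}} \ne 0$.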
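The last step of the proof translates a large scalar product into a small relative distance. For $\pm 1$ vectors, this rests on the identity $\langle A, B \rangle = 1 - 2\delta(A, B)$: positions where A and B agree contribute $+1$, the others $-1$. A minimal sketch on two arbitrary vectors (length and seed are assumptions of this illustration):

```python
from fractions import Fraction
import random

def distance(A, B):
    """Relative distance: fraction of positions where A and B differ."""
    return Fraction(sum(x != y for x, y in zip(A, B)), len(A))

def inner(A, B):
    """Scalar product: average of the entrywise products."""
    return Fraction(sum(x * y for x, y in zip(A, B)), len(A))

rng = random.Random(2)
A = [rng.choice((-1, 1)) for _ in range(16)]
B = [rng.choice((-1, 1)) for _ in range(16)]
identity_holds = inner(A, B) == 1 - 2 * distance(A, B)
negation_holds = distance(A, [-y for y in B]) == 1 - distance(A, B)
```

In particular, $\langle A, LC_a \rangle \ge 1 - L\rho$ is the same statement as $\delta(A, LC_a) \le L\rho/2$.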
Here is the theorem that we used in the proof above. It is essentially the only result that we will not prove. The theorem says that whenever a function A only has small Fourier coefficients corresponding to sets S with $|S| > 1$, then most of the mass is concentrated in one Fourier coefficient with $|S| \le 1$.
Theorem 5.2 (Friedgut, Kalai & Naor [FKN02]). There is a constant L > 0 such that for all $\rho > 0$ and $A : B_n^- \to \{-1,1\}$ with $\rho \ge \sum_{|S| > 1} |\hat A_S^2|$ the following holds: Either $|\hat A_\emptyset^2| \ge 1 - L\rho$ or $|\hat A_{\{a\}}^2| \ge 1 - L\rho$ for some $a \in \{-1,1\}^n$.
Exercise 5.1. Show that $W = \sum_{R \subseteq S \subseteq \{-1,1\}^n} \hat A_S^2 \hat A_R\, E[\chi_S(\mu)\chi_R(f)]$.
Theorem 5.3. The long code test T has the following properties:
1. If $a \in \{-1,1\}^n$ with $\psi(a) = -1$, then T accepts $LC_a$ and $\psi$ with probability 1.
2. There is a constant c > 0 such that for all $0 < \delta \le 1$, if A is folded and $\delta(A, LC_a) \ge \delta$ for all $a \in \{-1,1\}^n$ with $\psi(a) = -1$, then T rejects A and $\psi$ with probability $\ge c\delta$.
Proof. We first prove 1: Let $a \in \{-1,1\}^n$ with $\psi(a) = -1$ and let $A = LC_a$. By Lemma 4.3, A is folded. If $A(f) = f(a) = -1$, then T accepts. If $A(f) = f(a) = 1$, then $\mu(a) = -1$. Hence, $A(h) = h(a) = -g(a) = -A(g)$. Thus one of these two values equals $-1$ and T accepts, too.
Now we come to 2: Assume that the assertion does not hold. Then for all c > 0, there is a $\delta_c$ such that $\delta(A, LC_a) \ge \delta_c$ for all $a \in \{-1,1\}^n$ with $\psi(a) = -1$ and T rejects with probability $\epsilon < c\delta_c$.
We choose $c < 1/K^*$ small enough and apply Lemma 5.1. There is an $a \in \{-1,1\}^n$ such that $\psi(a) = -1$ and $\delta(A, LC_a) < K^*\epsilon \le cK^*\delta_c < \delta_c$ or $\delta(A, -LC_a) < \delta_c$. The first possibility is ruled out by the assumption about $\delta_c$. Hence $\delta(A, -LC_a) < \delta_c$.
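We have
\begin{align*}
\Pr[T \text{ rejects } (-LC_a, \psi)] &= \Pr[LC_a(f) = LC_a(g) = LC_a(h) = -1] \\
&= \Pr[f(a) = g(a) = h(a) = -1] \\
&= \Pr[f(a) = -1] \cdot \Pr[g(a) = -1] \cdot \Pr[h(a) = -1 \mid f(a) = g(a) = -1] \\
&= \frac{1}{4} \Pr[\mu(a) = 1 \mid f(a) = -1] \\
&= \frac{1}{4}(1 - \tau).
\end{align*}
This implies that
\begin{align*}
\epsilon &= \Pr[T \text{ rejects } (A, \psi)] \\
&= 1 - \Pr[T \text{ accepts } (A, \psi)] \\
&\ge 1 - \Pr[T \text{ accepts } (-LC_a, \psi) \text{ or } -LC_a(f) \ne A(f) \text{ or } -LC_a(g) \ne A(g) \text{ or } -LC_a(h) \ne A(h)] \\
&\ge 1 - \Pr[T \text{ accepts } (-LC_a, \psi)] - 3\delta(A, -LC_a) \\
&\ge 1 - \big(1 - \tfrac{1}{4}(1 - \tau)\big) - 3K^*\epsilon \\
&= \frac{1}{4}(1 - \tau) - 3K^*\epsilon.
\end{align*}
For $\epsilon$ small enough, the right-hand side is about 1/4 and therefore greater than $\epsilon$, a contradiction.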
5.2 Second test
Let $e_j : \{-1,1\}^n \to \{-1,1\}$ be the projection onto the jth component. We define a second test $T'$ that is based on T:
Input: $a \in \{-1,1\}^n$, folded $A : B_n^- \to \{-1,1\}$, $\psi \in B_n^-$
1. With probability 1/2, run T on $(A, \psi)$.
2. With probability 1/2, choose a $j \in \{1, \dots, n\}$ and $f \in B_n^-$ at random. Accept if $a_j = A(f) \cdot A(f \cdot e_j)$. Else reject.
Theorem 5.4. 1. If $\psi(a) = -1$, then there is a folded $A : B_n^- \to \{-1,1\}$ such that $\Pr[T' \text{ accepts } (a, A, \psi)] = 1$.
2. There is a c > 0 such that for all $0 < \delta \le 1$, the following holds: If $\delta(a, a') \ge \delta$ for all $a'$ with $\psi(a') = -1$, then for all folded $A : B_n^- \to \{-1,1\}$, $\Pr[T' \text{ rejects } (a, A, \psi)] \ge c \cdot \delta$.
Proof. We start with 1: Let $A = LC_a$. Then A is folded by Lemma 4.3. If T is executed, then $T'$ accepts by Theorem 5.3. Otherwise
\[ A(f) \cdot A(f \cdot e_j) = f(a) \cdot (f \cdot e_j)(a) = f(a) \cdot f(a) \cdot e_j(a) = a_j. \]
Thus $T'$ accepts with probability 1.
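The consistency check in step 2 can be verified exhaustively for small n. The sketch below does so for n = 2 and a hypothetical assignment a; coordinates are numbered from 0 in the code, and functions are stored as dictionaries, both being assumptions of this illustration.

```python
from itertools import product

n = 2
points = list(product((-1, 1), repeat=n))
functions = [dict(zip(points, vals)) for vals in product((-1, 1), repeat=len(points))]

def times_projection(f, j):
    """The function f * e_j, where e_j(x) = x_j."""
    return {x: f[x] * x[j] for x in points}

a = (-1, 1)                                     # hypothetical assignment
A = lambda f: f[a]                              # the long code LC_a
always_consistent = all(
    A(f) * A(times_projection(f, j)) == a[j]
    for f in functions for j in range(n)
)
# A different assignment is caught whenever j hits a coordinate where it differs.
other = (1, 1)
caught = [j for j in range(n)
          if all(A(f) * A(times_projection(f, j)) != other[j] for f in functions)]
```

For the long code of a, step 2 recovers $a_j$ for every f and j, so a string $a'$ that differs from a in coordinate j is rejected whenever that j is chosen.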
Next comes 2: Let $\delta(a, a') \ge \delta$ for all $a'$ with $\psi(a') = -1$ and let A be folded.
First case: $\delta(A, LC_{a'}) \ge \delta/4$ for all $a'$ with $\psi(a') = -1$. With probability 1/2, T is executed. T rejects with probability $\ge \frac{c}{4}\delta$.
Second case: There is an $a' \in \{-1,1\}^n$ with $\delta(A, LC_{a'}) < \delta/4$ and $\psi(a') = -1$. If $a'_j \ne a_j$ and $A(f)A(fe_j) = a'_j$, then $T'$ will reject. Thus,
\begin{align*}
\Pr[T' \text{ rejects}] &\ge \Pr[\text{2. is executed}] \cdot \Pr[\text{2. rejects}] \\
&\ge \frac{1}{2} \Pr_{j,f}[a'_j \ne a_j \wedge A(f)A(fe_j) = a'_j] \\
&\ge \frac{1}{2} \Pr_{j}[a'_j \ne a_j] \cdot \Pr_{j,f}[A(f)A(fe_j) = a'_j \mid a'_j \ne a_j] \\
&= \frac{1}{2} \Pr_{j}[a'_j \ne a_j] \cdot \Pr_{f}[A(f)A(fe_j) = a'_j] \\
&\ge \frac{1}{2}\,\delta\,\Big(1 - \frac{\delta}{2}\Big) \\
&\ge \frac{\delta}{4}.
\end{align*}
For the second-to-last inequality, we have to show that $\Pr_f[A(f)A(fe_j) = a'_j] \ge 1 - \frac{\delta}{2}$. $A(f)A(fe_j) = a'_j$ is implied by $A(f) = f(a')$ and $A(fe_j) = (fe_j)(a')$ (multiply the two equations). Therefore,
\begin{align*}
\Pr[A(f)A(fe_j) = a'_j] &\ge \Pr[A(f) = f(a') \wedge A(fe_j) = (fe_j)(a')] \\
&= 1 - \Pr[A(f) \ne f(a') \vee A(fe_j) \ne (fe_j)(a')] \\
&\ge 1 - \Pr[A(f) \ne f(a')] - \Pr[A(fe_j) \ne (fe_j)(a')] \\
&= 1 - \Pr[A(f) \ne LC_{a'}(f)] - \Pr[A(fe_j) \ne LC_{a'}(fe_j)] \\
&\ge 1 - \delta/2.
\end{align*}
The last inequality is true since we can bound both probabilities by $\delta/4$: the first one by assumption and also the second one, since $f \mapsto fe_j$ is a bijection of $B_n^-$.
6 Assignment Tester
6.1 Constraint graph satisﬁability
A constraint graph G over some alphabet $\Sigma$ is a directed graph (V, E) together with a mapping $c : E \to \mathcal{P}(\Sigma \times \Sigma)$. An assignment is a mapping $V \to \Sigma$. The assignment a satisfies the (constraint at) edge e = (u, v) if $(a(u), a(v)) \in c(e)$. The unsatisfiability value of a is the number of constraints not satisfied by a divided by the number of constraints (edges). This value is denoted by $\mathrm{UNSAT}_a(G)$. The unsatisfiability value of G is $\mathrm{UNSAT}(G) = \min_a \mathrm{UNSAT}_a(G)$.
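These definitions translate directly into a brute-force computation for tiny instances. In the sketch below, constraints are stored in a dictionary keyed by the edge, which assumes that there are no parallel edges; the example (inequality constraints on a directed triangle over a two-letter alphabet) is chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def unsat_of(assignment, edges, constraints):
    """UNSAT_a(G): fraction of edges whose constraint is violated by the assignment."""
    bad = sum((assignment[u], assignment[v]) not in constraints[(u, v)]
              for (u, v) in edges)
    return Fraction(bad, len(edges))

def unsat(vertices, alphabet, edges, constraints):
    """UNSAT(G): minimum of UNSAT_a(G) over all assignments (exponential time)."""
    return min(unsat_of(dict(zip(vertices, values)), edges, constraints)
               for values in product(alphabet, repeat=len(vertices)))

vertices = ["u", "v", "w"]
alphabet = [0, 1]
edges = [("u", "v"), ("v", "w"), ("w", "u")]
neq = {(0, 1), (1, 0)}                          # "endpoints get different values"
constraints = {e: neq for e in edges}

value = unsat(vertices, alphabet, edges, constraints)
all_equal = unsat_of({"u": 0, "v": 0, "w": 0}, edges, constraints)
```

Since a triangle cannot be properly coloured with two values, every assignment violates at least one of the three constraints, so the unsatisfiability value is 1/3.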
Problem 6.1. Maximum Constraint graph satisfiability Max-CGS is the following problem:
Instances: constraint graphs $((V, E), \Sigma, c)$
Solutions: assignments $a : V \to \Sigma$
Measure: $(1 - \mathrm{UNSAT}_a(G)) \cdot |E|$
Goal: max
Exercise 6.1. The following two statements are equivalent:
1. There is an $\epsilon > 0$ such that gap$(1 - \epsilon, 1)$-Max-3-SAT is NP-hard.
2. There is an $\epsilon > 0$ such that gap$(1 - \epsilon, 1)$-Max-CGS is NP-hard.
Thus, to prove the PCP-Theorem, we can show the NP-hardness of gap$(1 - \epsilon, 1)$-Max-CGS instead of gap$(1 - \epsilon, 1)$-Max-3-SAT. The former one has the advantage that it is easier to apply results from graph theory, in particular expander graphs, which we will introduce in the next chapter.
6.2 Assignment testers
Definition 6.2. An assignment tester over $\Sigma$ is a deterministic algorithm that, given a Boolean formula $\psi$ in variables $X = \{x_1, \dots, x_n\}$, outputs a constraint graph $G = ((V, E), \Sigma, c)$ with $X \subseteq V$ such that there is an $\epsilon > 0$ such that for all $a : X \to \{-1,1\}$:¹
1. If $\psi(a) = -1$, then there is a $b : V \setminus X \to \Sigma$ with $\mathrm{UNSAT}_{a \cup b}(G) = 0$.
2. If $\psi(a) = 1$, then for all $b : V \setminus X \to \Sigma$, we have $\mathrm{UNSAT}_{a \cup b}(G) \ge \epsilon \cdot \delta(a, a')$.
$\epsilon$ is called the rejection probability.

¹ We identify two values of $\Sigma$ with $-1$ and 1.
Above, $a \cup b : V \to \Sigma$ is the assignment that maps a $v \in X$ to a(v) and a $v \in V \setminus X$ to b(v). Again, we do not care for running times, since we will apply the assignment tester only to constant size instances. Given a Boolean formula, an assignment tester constructs a graph such that every satisfying assignment of the formula can be extended to a satisfying assignment of the graph. Every nonsatisfying assignment a, however, cannot be extended to an assignment of the graph that fulfills a fraction $> 1 - \delta$ of the constraints, where $\delta$ is the distance of a to any satisfying assignment.
Our construction takes the test $T'$ and models its behaviour in a graph. We set $\Sigma = \{-1,1\}^3$ and let Y be a set of $2^{2^n}$ Boolean variables. These variables correspond to the bits of the string A. The test $T'$ makes some random experiments. It first flips a coin and then, depending on its outcome, either chooses f, g, and h or chooses j and f at random. For each of the possible outcomes r, there will be one variable $z_r$. Let Z be the set of all these variables. We will interpret the three components of the value of $z_r$ as "guesses" of the bits queried. The variables in X and Y will get Boolean values, that is, whenever they get values from $\Sigma$ that do not represent $-1$ or 1, then all constraints containing them will not be satisfied.
If in the outcome r, $T'$ queries A(f), A(g), and A(h), then $z_r$ will be connected to the three nodes in Y that correspond to these positions. (More precisely, since we only consider folded strings A, we will consider only positions in $D_\psi$ and might replace f, g, and h by the corresponding elements of $D_\psi$.) The constraints on these three edges are satisfied if the three bits at $z_r$ correspond to the bits of A(f), A(g), and A(h) and $T'$ would accept when reading these three bits. If in the outcome r, $T'$ queries A(f), $A(fe_j)$, and $a_j$, then $z_r$ will be connected to two nodes of Y and one of X. The rest of the construction is essentially the same.
Theorem 6.3. The construction above is an assignment tester.
Proof. First we have to show that if a is a satisfying assignment of $\psi$, then we can extend it to a satisfying assignment of G. To the variables of Y, we will assign the values according to $LC_a$. By Theorem 5.4, $T'$ will accept with probability 1. This means that if we assign to Z values matching the values of $X \cup Y$, then all constraints will be satisfied.
If a does not satisfy $\psi$, then $T'$ will reject every A with probability $\ge c\delta$, where $\delta(a, a') \ge \delta$ for all satisfying assignments $a'$. This means that for a fraction of $c \cdot \delta$ of the $z_r$'s at least one constraint is not satisfied. (Either we choose the values consistently with the values of X and Y, then all three constraints are not satisfied, or we try to cheat but then at least one is not satisfied.) Thus $\mathrm{UNSAT}_{a \cup b}(G) \ge \frac{c}{3}\delta$.
7 Expander graphs
Throughout this chapter, we consider undirected multigraphs G = (V, E) with self-loops. The degree d(v) of a node v is the number of edges that v belongs to. In particular, a node with a self-loop and no other edges has degree 1 (and not 2, which would be a meaningful definition, too). This definition of degree will be very convenient in the following. A graph is called d-regular if d(v) = d for all $v \in V$.
It is a well-known fact that for graphs without self-loops, the sum of the degrees of the nodes is twice the number of edges (proof by double counting). With self-loops, the following bounds hold.
Fact 7.1. 1. We have $|E| \le \sum_{v \in V} d(v) \le 2|E|$.
2. If G is d-regular, then $|E| \le d|V| \le 2|E|$.
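The degree convention and the bounds of Fact 7.1 can be illustrated on a small multigraph. The edge-list representation (a self-loop is a pair with two equal entries) and the example graph are assumptions made for this sketch.

```python
def degrees(vertices, edges):
    """Degree = number of edges a node belongs to; a self-loop counts once."""
    deg = {v: 0 for v in vertices}
    for e in edges:
        for v in set(e):        # set() collapses the two ends of a self-loop
            deg[v] += 1
    return deg

vertices = [0, 1, 2]
# A triangle with one self-loop at every node: 3-regular under this convention.
edges = [(0, 1), (1, 2), (2, 0), (0, 0), (1, 1), (2, 2)]
deg = degrees(vertices, edges)
total = sum(deg.values())
```

Here the degree sum is 9, which lies strictly between $|E| = 6$ and $2|E| = 12$; without self-loops the upper bound would be attained.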
A walk in a graph G = (V, E) is a sequence $(v_0, e_1, v_1, e_2, \dots, e_\ell, v_\ell)$ such that $e_\lambda = \{v_{\lambda - 1}, v_\lambda\}$ for all $1 \le \lambda \le \ell$. $v_0$ is the start node, $v_\ell$ is the end node of the walk. Its length is $\ell$. A walk can visit the same node or edge several times, i.e., it is allowed that $v_i = v_j$ or $e_i = e_j$ for some $i \ne j$.
A graph is connected if for all pairs of nodes u and v, there is a walk from u to v. The neighbourhood N(v) of v is the set of all nodes u such that $\{v, u\} \in E$. In general, the t-neighbourhood is the set of all nodes u such that there is a walk from v to u of length t.
7.1 Algebraic graph theory
The adjacency matrix of G is the $|V| \times |V|$ matrix
\[ A = (a_{u,v})_{u,v \in V} \]
where $a_{u,v}$ is the number of edges between u and v. We will usually index the rows and columns by the nodes themselves and not by indices from $\{1, \dots, |V|\}$. But we will assume that the nodes have some ordering, so that when we need it, we can also index the rows by $1, \dots, |V|$.
We will now apply tools from linear algebra to A in order to study
properties of G. This is called algebraic graph theory. The book by Biggs
[Big93] is an excellent introduction to this ﬁeld. Everything you want to
know about expander graphs can be found in [HLW06].
Because G is undirected, A is symmetric. Therefore, A has n real eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$ and there is an orthonormal basis consisting of eigenvectors.
Lemma 7.2. Let G be a d-regular graph with adjacency matrix A and eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.
1. $\lambda_1 = d$ and $1_n = (1, \dots, 1)^T$ is a corresponding eigenvector.
2. G is connected if and only if $\lambda_2 < d$.
Proof. We start with 1: Since G is d-regular,
\[ d = d(v) = \sum_{u \in V} a_{v,u} \quad \text{for all } v \]
and
\[ A \cdot 1_n = d \cdot 1_n. \]
Thus d is an eigenvalue and $1_n$ is an associated eigenvector.
Let $\lambda$ be any eigenvalue and b be an associated eigenvector. We can scale b in such a way that the largest entry of b is 1. Let this entry be $b_v$. Then
\[ \lambda = \lambda \cdot b_v = \sum_{u \in V} a_{v,u} b_u \le \sum_{u \in V} a_{v,u} = d. \]
Therefore, d is also the largest eigenvalue.
Now comes 2. “=⇒”: Let b be an eigenvector associated with the eigenvalue d. As above, we scale b such that the largest entry is 1. Let $b_v$ be this entry. We next show that for every node $u \in N(v)$, $b_u = 1$, too. Since G is connected, $b = 1_n$ follows by induction. But this means that d has multiplicity 1 and $\lambda_2 < d$.
$A \cdot b = d \cdot b$ implies
\[ d = d b_v = \sum_{u \in V} a_{v,u} b_u = \sum_{u \in N(v)} a_{v,u} b_u. \]
Since $b_u \le 1$ for all u and since $d = \sum_{u \in N(v)} a_{v,u}$, the equation above can only be fulfilled if $b_u = 1$ for all $u \in N(v)$.
“⇐=”: If the graph G is not connected, then, after reordering the nodes,
\[ A = \begin{pmatrix} A_1 & 0 \\ 0 & A_2 \end{pmatrix}. \]
Therefore $(1, \dots, 1, 0, \dots, 0)$ and $(0, \dots, 0, 1, \dots, 1)$ (with the appropriate number of 1's and 0's) are linearly independent eigenvectors associated with d.
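Both parts of the lemma can be observed on a small disconnected example. The sketch below uses plain Python lists instead of a linear algebra library; the graph (two disjoint triangles, which is 2-regular) is a hypothetical example, and the adjacency convention adds 1 to a diagonal entry per self-loop.

```python
def adjacency(num_nodes, edges):
    """a[u][v] = number of edges between u and v; a self-loop adds 1 to a[v][v]."""
    a = [[0] * num_nodes for _ in range(num_nodes)]
    for u, v in edges:
        a[u][v] += 1
        if u != v:
            a[v][u] += 1
    return a

def apply(a, b):
    """Matrix-vector product a * b."""
    return [sum(a[i][j] * b[j] for j in range(len(b))) for i in range(len(b))]

# Two disjoint triangles: 2-regular and not connected.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
A = adjacency(6, edges)
d = 2
ones = [1] * 6
left = [1, 1, 1, 0, 0, 0]
right = [0, 0, 0, 1, 1, 1]
images = [apply(A, ones), apply(A, left), apply(A, right)]
```

All three vectors are mapped to d times themselves, so the eigenvalue d has multiplicity at least 2 here, which matches $\lambda_2 = d$ for a disconnected graph.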
Let $\|\cdot\|$ denote the Euclidean norm of $\mathbb{R}^{|V|}$, that is, $\|b\| = \sqrt{\sum_{v \in V} b_v^2}$.
Definition 7.3. Let G be a graph with adjacency matrix A. Then
\[ \lambda(G) = \max_{b \perp 1_n} \frac{\|Ab\|}{\|b\|}. \]
Theorem 7.4. Let G be a d-regular graph with adjacency matrix A and eigenvalues $\lambda_1 \ge \lambda_2 \ge \dots \ge \lambda_n$.
1. $\lambda(G) = |\lambda_j|$ for some j.
2. $\lambda(G) = \max_{b \perp 1_n} \frac{\|Ab\|}{\|b\|}$ is attained for any eigenvector b associated with $\lambda_j$.
3. $\lambda(G) = \max\{|\lambda_2|, |\lambda_n|\}$.
4. $\lambda(G) \le d$.
Proof. Let b be a vector for which the maximum is attained in the definition of $\lambda(G)$. W.l.o.g. let $\|b\| = 1$. Let $c_1, \dots, c_n$ be an orthonormal basis consisting of eigenvectors of A. W.l.o.g. let $c_1 = 1_n / \|1_n\|$. Since b is orthogonal to $1_n$, we have
\[ b = \beta_2 c_2 + \dots + \beta_n c_n. \]
Since $c_1, \dots, c_n$ is an orthonormal family,
\[ 1 = \|b\|^2 = \beta_2^2 + \dots + \beta_n^2. \]
Let $\lambda_j$ be the eigenvalue $c_j$ is associated with. We have
\begin{align*}
\lambda(G) = \|Ab\| &= \|\beta_2 A c_2 + \dots + \beta_n A c_n\| \\
&= \sqrt{(\beta_2 \lambda_2)^2 + \dots + (\beta_n \lambda_n)^2}.
\end{align*}
Since b is a vector for which the maximum is attained, $\beta_j$ can only be nonzero for a $\lambda_j$ whose absolute value is maximal among $\lambda_2, \dots, \lambda_n$.
It is an easy exercise to derive the statements 1–4 from this.
Exercise 7.1. Prove statements 1–4 of Theorem 7.4.
λ(G) is also called the second largest eigenvalue. (More correctly, it
should be called the second largest absolute value of the eigenvalues, but
this is even longer.)
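As a quick sanity check of Theorem 7.4, here is a small worked example that is not part of the original notes. It considers the complete graph $K_n$ without self-loops, which is $(n-1)$-regular; the symbol $J$ for the all-ones matrix is notation introduced only for this example.

```latex
\[
  A = J - I, \qquad A 1_n = (n-1)\, 1_n, \qquad
  A b = J b - b = -b \quad \text{for every } b \perp 1_n ,
\]
\[
  \lambda_1 = n - 1, \quad \lambda_2 = \dots = \lambda_n = -1,
  \qquad
  \lambda(K_n) = \max_{b \perp 1_n} \frac{\|Ab\|}{\|b\|}
               = \max\{|\lambda_2|, |\lambda_n|\} = 1 \le d .
\]
```

So for $K_n$ the gap between $d = n - 1$ and $\lambda(K_n) = 1$ is as large as one could hope for; the point of expanders is to get a constant gap with constant degree.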
7.2 Edge expansion
Definition 7.5. Let $G$ be a $d$-regular graph. The edge expansion $h(G)$ of $G$ is defined as
\[ h(G) = \min_{S \subseteq V :\, |S| \le |V|/2} \frac{|E(S, \bar{S})|}{|S|}. \]
$E(S, \bar{S})$ is the set of all edges with one endpoint in $S$ and one endpoint in $\bar{S}$. $G$ is called an $h$-expander if $h(G) \ge h$.
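For very small graphs, the definition can be evaluated directly by trying all nonempty sets $S$ with $|S| \le |V|/2$. The following is a minimal brute-force sketch, not an algorithm from the notes; the function name `edge_expansion` and the edge-list representation are assumptions made for illustration, and the running time is exponential in the number of nodes.

```python
from itertools import combinations

def edge_expansion(n, edges):
    """Brute-force h(G) for a graph on nodes 0..n-1.

    edges is a list of pairs (u, v); self-loops (v, v) never cross a cut.
    """
    best = None
    for size in range(1, n // 2 + 1):
        for subset in combinations(range(n), size):
            s = set(subset)
            # count edges with exactly one endpoint in S
            cut = sum(1 for u, v in edges if (u in s) != (v in s))
            ratio = cut / size
            if best is None or ratio < best:
                best = ratio
    return best

# two 4-node examples: the cycle C_4 (2-regular) and the complete graph K_4 (3-regular)
cycle4 = [(0, 1), (1, 2), (2, 3), (3, 0)]
complete4 = list(combinations(range(4), 2))
```

For the cycle, the minimum is attained by two adjacent nodes (two cut edges for two nodes); for the complete graph, by any pair of nodes (four cut edges for two nodes).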
Large edge expansion means that any set $S$ has many neighbours that are not in $S$. This will be a very useful property. Families of expanders can be constructed in polynomial time; one construction is given in [RVW02]. We will not prove this here.
Theorem 7.6. There are constants $d_0 \in \mathbb{N}$ and $h_0 > 0$ and a deterministic algorithm that, given $n$, constructs in time polynomial in $n$ a $d_0$-regular graph $G_n$ with $h(G_n) > h_0$.
Large edge expansion means small second largest eigenvalue and vice
versa. We will need the following bound.
Theorem 7.7. Let $G$ be a $d$-regular graph. If $\lambda(G) < d$ (see footnote 1), then
\[ \lambda(G) \le d - \frac{h(G)^2}{2d}. \]
To prove the theorem, it is sufficient to prove
\[ h(G)^2 \le 2d(d - \lambda) \qquad (7.1) \]
\[ h(G)^2 \le 2d(d + \lambda) \qquad (7.2) \]
by Theorem 7.4. The proofs of both inequalities are very similar, so we will only show the first one. Let
\[ B = dI - A, \qquad B' = dI + A, \]
where $I$ is the $n \times n$ identity matrix. Let $f \in \mathbb{R}^n$. Later, we will derive $f$ from an eigenvector of $A$. In the following, a summation over “$e = \{u, v\}$” is a sum over all edges in $E$ with two end nodes and a summation over “$e = \{v\}$” is a sum over all self-loops in $E$.
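Lemma 7.8.
\[ f^T B f = \sum_{e = \{u,v\}} (f_u - f_v)^2, \qquad f^T B' f \ge \sum_{e = \{u,v\}} (f_u + f_v)^2. \]
Proof. We have
\begin{align*}
f^T B f &= \sum_{v \in V} d f_v^2 - f^T A f \\
&= \Bigl( \sum_{e = \{u,v\}} (f_u^2 + f_v^2) + \sum_{e = \{v\}} f_v^2 \Bigr) - \Bigl( \sum_{e = \{u,v\}} 2 f_u f_v + \sum_{e = \{v\}} f_v^2 \Bigr) \\
&= \sum_{e = \{u,v\}} (f_u - f_v)^2.
\end{align*}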
The second inequality is proven in a similar manner.
Footnote 1: We have to exclude bipartite graphs, which have $\lambda_n = -d$ but can have edge expansion $> 0$. Our proof will break down if $\lambda_n = -d$, because $(d + \lambda)$ must not be zero when proving the counterpart of (7.3).
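The first identity of Lemma 7.8 is easy to check numerically on a small example. The sketch below is an illustration only; the helper names `quadratic_form_B` and `edge_sum` and the edge-list encoding are assumptions. A self-loop `(v, v)` contributes 1 to $a_{v,v}$ and 1 to the degree of $v$, which is the convention used in the proof above.

```python
def quadratic_form_B(n, d, edges, f):
    """Return f^T (dI - A) f for a d-regular multigraph on nodes 0..n-1."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        if u == v:
            A[u][u] += 1          # self-loop counts once
        else:
            A[u][v] += 1
            A[v][u] += 1
    # sanity check: the graph really is d-regular
    assert all(sum(row) == d for row in A)
    fAf = sum(f[u] * A[u][v] * f[v] for u in range(n) for v in range(n))
    return d * sum(x * x for x in f) - fAf

def edge_sum(edges, f):
    """Sum of (f_u - f_v)^2 over all edges that are not self-loops."""
    return sum((f[u] - f[v]) ** 2 for u, v in edges if u != v)

# triangle with one self-loop at every node: 3-regular
edges = [(0, 1), (1, 2), (2, 0), (0, 0), (1, 1), (2, 2)]
f = [3, -1, 4]
lhs = quadratic_form_B(3, 3, edges, f)
rhs = edge_sum(edges, f)
```

Integer entries are used so that both sides can be compared exactly.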
To a given $f$, let
\[ F = \sum_{e = \{u,v\}} |f_u^2 - f_v^2|. \]
Let $\beta_0 < \beta_1 < \dots < \beta_r$ be the different values that $f$ attains. Let
\[ U_j = \{u \in V \mid f_u \ge \beta_j\}, \qquad U'_j = \{u \in V \mid f_u \le \beta_j\} \]
be the sets of all nodes whose value $f_u$ is at least or at most $\beta_j$, respectively.
Lemma 7.9.
\[ F = \sum_{j=1}^{r} |E(U_j, \bar{U}_j)| \, (\beta_j^2 - \beta_{j-1}^2), \]
\[ F = \sum_{j=0}^{r-1} |E(U'_j, \bar{U}'_j)| \, (\beta_{j+1}^2 - \beta_j^2). \]
Proof. Let $e = \{u, v\} \in E$ be an edge that is no self-loop. Assume that $f_u = \beta_i \ge \beta_j = f_v$. The contribution of $e$ to $F$ is $\beta_i^2 - \beta_j^2$. On the other hand, $e$ crosses $U_k$ and $\bar{U}_k$ for $j + 1 \le k \le i$. Thus the contribution of $e$ to the right-hand side of the first equation in the statement of the lemma is
\[ (\beta_i^2 - \beta_{i-1}^2) + (\beta_{i-1}^2 - \beta_{i-2}^2) + \dots + (\beta_{j+1}^2 - \beta_j^2) = \beta_i^2 - \beta_j^2. \]
Thus both sides of the equation are equal.
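Lemma 7.10. We have
\[ F \le \sqrt{2d \, f^T B f} \; \|f\|. \]
If $f_v \le 0$ for all $v$, then
\[ F \le \sqrt{2d \, f^T B' f} \; \|f\|. \]
Proof. We have
\begin{align*}
F &= \sum_{e = \{u,v\}} |f_u^2 - f_v^2| \\
&= \sum_{e = \{u,v\}} |f_u - f_v| \cdot |f_u + f_v| \\
&\le \sqrt{\sum_{e = \{u,v\}} (f_u - f_v)^2} \; \sqrt{\sum_{e = \{u,v\}} (f_u + f_v)^2} \\
&= \sqrt{f^T B f} \; \sqrt{\sum_{e = \{u,v\}} (f_u + f_v)^2}
\end{align*}
by the Cauchy-Schwarz inequality and Lemma 7.8. We can bound the second factor by
\[ \sqrt{\sum_{e = \{u,v\}} (f_u + f_v)^2} \le \sqrt{2 \sum_{e = \{u,v\}} (f_u^2 + f_v^2)} \le \sqrt{2d \sum_{v \in V} f_v^2} = \sqrt{2d} \, \|f\|. \]
The second inequality is proven in a similar manner.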
Lemma 7.11. Let $f_v \ge 0$ for all $v \in V$ or $f_v \le 0$ for all $v \in V$. If $|\mathrm{supp}(f)| \le n/2$, then $F \ge h(G) \|f\|^2$.
Proof. We only show the statement for $f_v \ge 0$; the other case is completely similar. Since $|\mathrm{supp}(f)| \le n/2$, we have $\beta_0 = 0$ and $|U_j| \le n/2$ for $j > 0$. Hence $|E(U_j, \bar{U}_j)| \ge h(G) |U_j|$. By Lemma 7.9,
\begin{align*}
F &= \sum_{j=1}^{r} |E(U_j, \bar{U}_j)| \, (\beta_j^2 - \beta_{j-1}^2) \\
&\ge h(G) \sum_{j=1}^{r} |U_j| \, (\beta_j^2 - \beta_{j-1}^2) \\
&= h(G) \Bigl( \sum_{j=1}^{r-1} \beta_j^2 \underbrace{(|U_j| - |U_{j+1}|)}_{= |\{v \mid f_v = \beta_j\}|} + \beta_r^2 |U_r| \Bigr) \\
&= h(G) \|f\|^2.
\end{align*}
Finally, we will now prove (7.1) and (7.2). We only show (7.1); (7.2) is proven in the same manner. Let $\lambda < d$ be an eigenvalue of $A$. Then $d - \lambda$ is an eigenvalue of $B = dI - A$, and every eigenvector of $A$ associated with $\lambda$ is an eigenvector of $B$ associated with $d - \lambda$. Let $g$ be such an eigenvector. $g$ is orthogonal to $1_n$. We can assume that $g$ has at most $n/2$ entries that are $> 0$, otherwise we consider $-g$ instead. We define
\[ f_v = \begin{cases} g_v & \text{if } g_v \ge 0, \\ 0 & \text{otherwise} \end{cases} \]
and $W = \mathrm{supp}(f)$. By construction, $|W| \le n/2$. For every $v \in W$, we have
\begin{align*}
(Bf)_v &= d f_v - \sum_{u \in V} a_{v,u} f_u \\
&= d g_v - \sum_{u \in W} a_{v,u} g_u \\
&\le d g_v - \sum_{u \in V} a_{v,u} g_u = (d - \lambda) g_v.
\end{align*}
Since $f_v = 0$ for $v \notin W$, this implies
\[ f^T B f = \sum_{v \in V} f_v (Bf)_v \le (d - \lambda) \sum_{v \in V} f_v g_v \le (d - \lambda) \sum_{v \in V} f_v^2 = (d - \lambda) \|f\|^2. \qquad (7.3) \]
By Lemmas 7.10 and 7.11,
\[ h(G) \|f\|^2 \le \sqrt{2d \, f^T B f} \; \|f\|. \]
Squaring this and exploiting the inequality before, we get
\[ h(G)^2 \|f\|^4 \le 2d \, f^T B f \, \|f\|^2 \le 2d (d - \lambda) \|f\|^4. \]
Because $g$ is orthogonal to $1_n$ and nonzero, it has at least one entry $> 0$. Therefore, $\|f\| > 0$ and we get
\[ h(G)^2 \le 2d(d - \lambda). \]
The second inequality is proven in the same manner.
8 Random walks on expanders
Consider the following method RW to generate a walk in a $d$-regular graph $G = (V, E)$.
Input: $d$-regular graph $G$, $\ell \in \mathbb{N}$
1. Randomly choose a vertex $v_0$.
2. For $\lambda = 1, \dots, \ell$ choose $v_\lambda \in N(v_{\lambda-1})$ uniformly at random.
3. Return $(v_0, \dots, v_\ell)$.
Let $\mathcal{W}_\ell$ be the set of all walks of length $\ell$ in $G$. We have $|\mathcal{W}_\ell| = n d^\ell$, since a walk is uniquely specified by its start node ($n$ choices) and a vector in $\{1, \dots, d\}^\ell$ which tells us which of the $d$ edges of the current node is the next in the given walk. For this, we number the $d$ edges incident with a node arbitrarily from 1 to $d$. It is now clear that method RW generates the walks according to the uniform distribution on $\mathcal{W}_\ell$.
Lemma 8.1. Method RW returns each walk $W \in \mathcal{W}_\ell$ with a probability of $1/(n d^\ell)$.
Instead of choosing a start node, we can choose a node in the middle. This modified method RW' also generates the uniform distribution on all walks of length $\ell$.
Input: $d$-regular graph $G$, $\ell \in \mathbb{N}$, $0 \le j \le \ell$
1. Randomly choose a vertex $v_j$.
2. For $\lambda = j - 1, \dots, 0$ choose $v_\lambda \in N(v_{\lambda+1})$ uniformly at random.
3. For $\lambda = j + 1, \dots, \ell$ choose $v_\lambda \in N(v_{\lambda-1})$ uniformly at random.
4. Return $(v_0, \dots, v_\ell)$.
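The two methods translate almost line by line into code. The following is a minimal sketch under the assumption that the graph is given as adjacency lists `neighbors[v]` of length $d$, listing the endpoints of the $d$ edges at $v$ with multiplicity (a self-loop at $v$ lists $v$ itself); the names `rw` and `rw_prime` are chosen here for illustration.

```python
import random

def rw(neighbors, length, rng=random):
    """Method RW: uniform start node, then `length` uniform steps."""
    v = rng.randrange(len(neighbors))
    walk = [v]
    for _ in range(length):
        v = rng.choice(neighbors[v])
        walk.append(v)
    return walk

def rw_prime(neighbors, length, j, rng=random):
    """Method RW': choose v_j uniformly, then extend backwards and forwards."""
    walk = [None] * (length + 1)
    walk[j] = rng.randrange(len(neighbors))
    for i in range(j - 1, -1, -1):          # steps j-1, ..., 0
        walk[i] = rng.choice(neighbors[walk[i + 1]])
    for i in range(j + 1, length + 1):      # steps j+1, ..., length
        walk[i] = rng.choice(neighbors[walk[i - 1]])
    return walk

# triangle with one self-loop per node (3-regular)
triangle = [[0, 1, 2], [1, 0, 2], [2, 0, 1]]
```

Both functions return the node sequence $(v_0, \dots, v_\ell)$ only; if parallel edges need to be distinguished, one would return the chosen edge indices as well.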
In the following, let $G$ be a $d$-regular graph with adjacency matrix $A$. Let $\tilde{A} = \frac{1}{d} A$ be the normalized adjacency matrix. $\tilde{A}$ is doubly stochastic, i.e., all entries are nonnegative and all row sums and all column sums are 1. $\lambda$ is an eigenvalue of $\tilde{A}$ iff $d \cdot \lambda$ is an eigenvalue of $A$.
Let $x = (x_v)_{v \in V}$ be a probability distribution on the nodes $v$ and consider $x$ as an element of $\mathbb{R}^n$. If we now select a node $v$ according to $x$, then select an edge $\{v, u\}$ ($u = v$ is allowed) incident with $v$ uniformly at random, and then go to $u$, the probability distribution that we get is given by $\tilde{A} \cdot x$.
Applying induction we get the following result.
Lemma 8.2. Let $x$ be a probability distribution on $V$. If we run method RW for $\ell$ steps and draw the start node according to $x$, then this induces a probability distribution on $V$ given by $\tilde{A}^\ell x$.
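Lemma 8.2 can be tried out on a small graph by applying $\tilde{A}$ repeatedly to a start distribution. The sketch below uses exact fractions; the adjacency-list representation and the names `step` and `distribution_after` are assumptions made for this illustration.

```python
from fractions import Fraction

def step(neighbors, x):
    """One step of the walk: returns the distribution (1/d) A x."""
    d = len(neighbors[0])
    y = [Fraction(0)] * len(x)
    for v, prob in enumerate(x):
        for u in neighbors[v]:
            y[u] += prob / d      # each of the d edges at v is taken with probability 1/d
    return y

def distribution_after(neighbors, x, steps):
    """Distribution on the nodes after `steps` steps, starting from x."""
    for _ in range(steps):
        x = step(neighbors, x)
    return x

# 4-cycle with one self-loop per node (3-regular)
neighbors = [[0, 1, 3], [1, 0, 2], [2, 1, 3], [3, 0, 2]]
start = [Fraction(1), Fraction(0), Fraction(0), Fraction(0)]
after_two = distribution_after(neighbors, start, 2)
```

Starting from a point mass on node 0, two steps already spread the probability over all four nodes; more steps bring the distribution closer to uniform.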
Let $F \subseteq E$ be a set of edges. We want to estimate the probability that a random walk of length $j$ that starts in an edge of $F$ ends in an edge of $F$, too.
To this end, we first calculate the probability $x_v$ that a random walk that starts with an edge of $F$ starts in $v$. The distribution $x = (x_v)_{v \in V}$ is generated by the following process: First choose an edge $f \in F$ at random. Then choose one of its nodes uniformly at random as the start node (and the other node as the second node). Here it makes a difference whether $f$ is a self-loop or not. We have
\[ x_v = \frac{1}{|F|} \Bigl( \frac{1}{2} \, |\{e = \{u, v\} \mid e \in F, u \ne v\}| + |\{e = \{v\} \mid e \in F\}| \Bigr) \le \frac{d}{|F|}. \qquad (8.1) \]
By symmetry, this is also the probability that $v$ is the second node in a walk that starts with an edge in $F$.
Second, we estimate the probability $y_v$ that if we choose a random edge incident to $v$, we have chosen an edge in $F$. This is simply
\begin{align*}
y_v &= \frac{1}{d} \, |\{e = \{u, v\} \mid e \in F\}| \\
&= \frac{2|F|}{d} \cdot \frac{1}{2|F|} \, |\{e = \{u, v\} \mid e \in F\}| \\
&\le \frac{2|F|}{d} \, x_v. \qquad (8.2)
\end{align*}
Now the probability distribution on the nodes after performing a walk of length $j$ that starts in $F$ is given by $\tilde{A}^{j-1} x$. (Note that $x_v$ is also the probability that $v$ is the second node in the walk.) The probability that the $(j+1)$th edge is in $F$ is then given by
\[ \langle y, \tilde{A}^{j-1} x \rangle \qquad (8.3) \]
where $\langle \cdot, \cdot \rangle$ is the ordinary scalar product in $\mathbb{R}^n$.
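To estimate (8.3), we will exploit the Cauchy-Schwarz inequality. For this, we need to estimate $\|x\|$. $x_1 = \frac{1}{n} 1_n$ is an eigenvector of $\tilde{A}$ associated with 1. Let $x^\perp = x - x_1$. $x^\perp$ is orthogonal to $x_1$, because
\[ \langle 1_n, x^\perp \rangle = \sum_{v \in V} (x_v - 1/n) = 1 - 1 = 0. \]
$\tilde{A}^k x^\perp$ is also orthogonal to $x_1$ for every $k$, since
\[ \langle 1_n, \tilde{A}^k x^\perp \rangle = \langle (\tilde{A}^k)^T 1_n, x^\perp \rangle = \langle \tilde{A}^k 1_n, x^\perp \rangle = \langle 1_n, x^\perp \rangle = 0. \]
Let $\tilde{\lambda} = \lambda(G)/d$. We have
\[ \|\tilde{A}^{j-1} x^\perp\| \le |\tilde{\lambda}|^{j-1} \|x^\perp\| \]
and
\[ \|x^\perp\|^2 = \|x - x_1\|^2 = \|x\|^2 - 2 \langle x, x_1 \rangle + \|x_1\|^2 = \|x\|^2 - \frac{2}{n} \sum_{v \in V} x_v + \frac{1}{n} < \|x\|^2. \]
By (8.1),
\[ \|x\|^2 \le \max_{v \in V} x_v \cdot \sum_{v \in V} x_v = \max_{v \in V} x_v \le \frac{d}{|F|}. \]
Altogether, we have
\[ \langle y, \tilde{A}^{j-1} x^\perp \rangle \le \|y\| \cdot \|\tilde{A}^{j-1} x^\perp\| \le \frac{2|F|}{d} \|x\| \cdot |\tilde{\lambda}|^{j-1} \|x\| \le 2 |\tilde{\lambda}|^{j-1}. \]
Finally, the probability that $e_{j+1} \in F$ can be bounded by
\begin{align*}
\langle y, \tilde{A}^{j-1} x \rangle &= \langle y, \tilde{A}^{j-1} x_1 \rangle + \langle y, \tilde{A}^{j-1} x^\perp \rangle \\
&\le \langle y, x_1 \rangle + 2 |\tilde{\lambda}|^{j-1} \\
&= \frac{1}{n} \sum_{v \in V} y_v + 2 |\tilde{\lambda}|^{j-1} \\
&\le \frac{2|F|}{dn} + 2 |\tilde{\lambda}|^{j-1} \\
&\le 2 \Bigl( \frac{|F|}{|E|} + \Bigl( \frac{\lambda(G)}{d} \Bigr)^{j-1} \Bigr).
\end{align*}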
Lemma 8.3. Consider a random walk on a $d$-regular graph $G = (V, E)$ starting with an edge from a set $F \subseteq E$. Then the probability that the $(j+1)$th edge of the walk is again in $F$ is bounded by
\[ 2 \Bigl( \frac{|F|}{|E|} + \Bigl( \frac{\lambda(G)}{d} \Bigr)^{j-1} \Bigr). \]
If $F$ does not contain any self-loops, then (8.1) can be bounded by $\frac{d}{2|F|}$ and we can get rid of the 2 in the estimate. This bound then says that even after a logarithmic number of steps, the $(j+1)$th edge is almost drawn at random.
9 The ﬁnal proof
Finally, we can start with the proof of the PCP theorem. We begin with the observation that gap$(1 - \frac{1}{|E|}, 1)$-MaxCGS is NP-hard (over the alphabet $\Sigma = \{0, 1\}^3$). Let $G$ be a given constraint graph. We apply three procedures to $G$:
$G$
↓ Preprocessing ($G$ becomes an expander)
$G_{\mathrm{prep}}$
↓ Amplification (UNSAT value gets larger, but also $\Sigma$)
$G_{\mathrm{amp}}$
↓ Alphabet reduction ($\Sigma = \{0, 1\}^3$ again)
$G_{\mathrm{red}}$
If we do this $O(\log |E|)$ times, then we bring the (relative) size of the gap from $\frac{1}{|E|}$ to a constant and we are done.
9.1 Preprocessing
Throughout this chapter, $d_0$ and $h_0$ will be “global” constants that come out of the construction of a constant degree $d_0$ expander $X_n$ with constant edge expansion $h_0$, see Theorem 7.6.
Lemma 9.1. Let $G = ((V, E), \Sigma, c)$ be a constraint graph. There is a constant $\gamma_1 > 0$ such that we can construct in polynomial time a $(d_0 + 1)$-regular graph $G_1 = ((V_1, E_1), \Sigma, c_1)$ with $\mathrm{size}(G_1) = O(\mathrm{size}(G))$ and
\[ \gamma_1 \cdot \mathrm{UNSAT}(G) \le \mathrm{UNSAT}(G_1) \le \mathrm{UNSAT}(G). \]
Proof. Let $X_n$ be the expander from Theorem 7.6. $G_1$ is constructed as follows:
1. Replace each $v \in V$ by a copy $Y_v$ of $X_{d(v)}$.
2. For each edge $\{u, v\} \in E$ insert an edge from $Y_u$ to $Y_v$. Do this in such a way that every node of $Y_v$ is incident with exactly one such extra edge. In this way, the resulting graph will be $(d_0 + 1)$-regular.
3. Let $E_{\mathrm{int}}$ be the edges within the copies $Y_v$ and $E_{\mathrm{ext}}$ be the edges between two different copies. For all $e \in E_{\mathrm{int}}$, $c_1(e)$ is an equality constraint that is satisfied iff both nodes have the same value (“internal constraints”). For all $e \in E_{\mathrm{ext}}$, $c_1(e)$ is the same constraint as the original edge has (“external constraints”).
We have $|V_1| \le \sum_{v \in V} d(v) \le 2|E|$ and $|E_1| \le |V_1| (d_0 + 1) \le 2|E| (d_0 + 1)$. Thus $\mathrm{size}(G_1) = O(\mathrm{size}(G))$.
Next, we show that $\mathrm{UNSAT}(G_1) \le \mathrm{UNSAT}(G)$. Choose an assignment $\sigma : V \to \Sigma$ with $\mathrm{UNSAT}(G) = \mathrm{UNSAT}_\sigma(G)$ (i.e., an optimal assignment). We define $\sigma_1 : V_1 \to \Sigma$ by $\sigma_1(u) = \sigma(v)$ iff $u$ belongs to $V(Y_v)$, the vertex set of the copy $Y_v$ that replaces $v$. In this way, all internal constraints are fulfilled by construction. Every external constraint is fulfilled iff it was fulfilled under $\sigma$ in $G$. Therefore,
\[ \mathrm{UNSAT}(G_1) \le \mathrm{UNSAT}_{\sigma_1}(G_1) \le \mathrm{UNSAT}(G), \]
where the second inequality follows from the fact that $|E| \le |E_1|$.
The interesting case is $\gamma_1 \cdot \mathrm{UNSAT}(G) \le \mathrm{UNSAT}(G_1)$. Let $\sigma_1 : V_1 \to \Sigma$ be an optimum assignment. We define $\sigma : V \to \Sigma$ by a majority vote: $\sigma(v)$ is the value $a \in \Sigma$ that is the most common among all values $\sigma_1(u)$ with $u \in V(Y_v)$. Ties are broken arbitrarily. Let $F \subseteq E$ be the set of all unsatisfied constraints under $\sigma$ and $F_1 \subseteq E_1$ the set of all unsatisfied constraints under $\sigma_1$. Let $S = \{u \in V(Y_v) \mid v \in V, \sigma_1(u) \ne \sigma(v)\}$ and $S^v = S \cap V(Y_v)$, i.e., all the loser nodes that voted for a different value for $\sigma(v)$. Let $\alpha := |F|/|E| = \mathrm{UNSAT}_\sigma(G)$. We have
\[ \alpha |E| = |F| \le |F_1| + |S|, \]
since, if a constraint in $F$ is not satisfied, then either the corresponding external constraint in $F_1$ is not satisfied or one of the nodes is a loser node.
Case 1: $|F_1| \ge \frac{\alpha}{2} |E|$. We have
\[ \mathrm{UNSAT}(G_1) = \frac{|F_1|}{|E_1|} \ge \frac{\alpha}{2} \cdot \frac{|E|}{|E_1|} \ge \frac{\alpha}{4(d_0 + 1)} \ge \frac{\mathrm{UNSAT}(G)}{4(d_0 + 1)}. \]
Case 2: $|F_1| < \frac{\alpha}{2} |E|$. In this case, we have
\[ \frac{\alpha}{2} |E| + |S| > |F_1| + |S| \ge \alpha |E|. \]
Thus $|S| > \frac{\alpha}{2} |E|$. Consider some $v \in V$ and let $S^v_a = \{u \in S^v \mid \sigma_1(u) = a\}$. We have $S^v = \bigcup_{a \ne \sigma(v)} S^v_a$. Because we took a majority vote, $|S^v_a| \le \frac{1}{2} |V(Y_v)|$ for all $a \ne \sigma(v)$. As $Y_v$ is an expander,
\[ |E(S^v_a, \bar{S}^v_a)| \ge h_0 |S^v_a|, \]
where the complement is taken “locally”, i.e., $\bar{S}^v_a = V(Y_v) \setminus S^v_a$. Since we have equality constraints on all internal edges, all edges in $E(S^v_a, \bar{S}^v_a)$ are not satisfied. Thus,
\begin{align*}
|F_1| &\ge \frac{1}{2} \sum_{v \in V} \sum_{a \ne \sigma(v)} |E(S^v_a, \bar{S}^v_a)| \\
&\ge \sum_{v \in V} \frac{1}{2} h_0 \sum_{a \ne \sigma(v)} |S^v_a| \\
&\ge \frac{1}{2} h_0 \sum_{v \in V} |S^v| \\
&= \frac{1}{2} h_0 |S| \\
&> \frac{\alpha}{4} h_0 |E|.
\end{align*}
Thus
\[ \mathrm{UNSAT}(G_1) = \frac{|F_1|}{|E_1|} > \frac{\alpha h_0}{4} \cdot \frac{|E|}{|E_1|} \ge \frac{\alpha h_0}{8(d_0 + 1)} \ge \frac{h_0}{8(d_0 + 1)} \, \mathrm{UNSAT}(G). \]
We set $\gamma_1$ to be the minimum of the constants in the two cases.
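The majority vote used in the proof is simple to state in code. The sketch below is only an illustration of that decoding step; the dictionary `clouds` (mapping each original node $v$ to the node list of its copy $Y_v$) and the function name `majority_assignment` are assumptions, not notation from the notes.

```python
from collections import Counter

def majority_assignment(clouds, sigma1):
    """Decode an assignment sigma1 of G_1 into an assignment sigma of G.

    clouds[v] lists the nodes of Y_v; sigma1 maps those nodes to values.
    Returns sigma and the set S of 'loser' nodes whose value differs from
    the majority value in their cloud. Ties are broken by Counter.most_common,
    i.e., arbitrarily.
    """
    sigma, losers = {}, set()
    for v, nodes in clouds.items():
        winner, _ = Counter(sigma1[u] for u in nodes).most_common(1)[0]
        sigma[v] = winner
        losers.update(u for u in nodes if sigma1[u] != winner)
    return sigma, losers

# toy instance: node "a" is replaced by three nodes, node "b" by two
clouds = {"a": [0, 1, 2], "b": [3, 4]}
sigma1 = {0: "x", 1: "x", 2: "y", 3: "z", 4: "z"}
sigma, S = majority_assignment(clouds, sigma1)
```

In this toy instance node 2 is the only loser node, so every internal edge between node 2 and nodes 0 or 1 carries a violated equality constraint.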
Lemma 9.2. Let $G$ be a $d$-regular constraint graph. We can construct in polynomial time a constraint graph $G_2$ such that
• $G_2$ is $(d + d_0 + 1)$-regular,
• every node of $G_2$ has a self-loop,
• $\lambda(G_2) \le d + d_0 + 1 - \frac{h_0^2}{2(d + d_0 + 1)}$,
• $\mathrm{size}(G_2) = O(\mathrm{size}(G))$,
• $\frac{d}{d + 2(d_0 + 1)} \, \mathrm{UNSAT}(G) \le \mathrm{UNSAT}(G_2) \le \mathrm{UNSAT}(G)$.
Proof. Assume that $G$ has $n$ nodes. We take the union of $G$ and $X_n$ (both graphs have the same node set) and attach to each node a self-loop. The edges from $X_n$ and the self-loops get trivial constraints that are always fulfilled. $G_2 = ((V, E_2), \Sigma, c_2)$ is clearly $(d + d_0 + 1)$-regular.
We have $h(G_2) \ge h(X_n) \ge h_0$. Since $G_2$ is not bipartite (every node has a self-loop), Theorem 7.7 yields
\[ \lambda(G_2) \le d + d_0 + 1 - \frac{h_0^2}{2(d + d_0 + 1)}. \]
Finally,
\[ |E_2| = |E| + |E(X_n)| + n \le |E| + (d_0 + 1) |V| \le \frac{d + 2(d_0 + 1)}{d} \, |E|. \]
Thus the size increase is linear. Furthermore, the UNSAT value can at most shrink by this factor.
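By combining these two lemmas, we get the following result.
Theorem 9.3. There are constants $\beta_{\mathrm{prep}} > 0$ and $0 < \lambda < \delta$ such that for all constraint graphs $G$, we can construct in polynomial time a constraint graph $G_{\mathrm{prep}}$ over the same alphabet such that
• $G_{\mathrm{prep}}$ is $\delta$-regular,
• every node in $G_{\mathrm{prep}}$ has a self-loop,
• $\lambda(G_{\mathrm{prep}}) \le \lambda$,
• $\mathrm{size}(G_{\mathrm{prep}}) = O(\mathrm{size}(G))$,
• $\beta_{\mathrm{prep}} \, \mathrm{UNSAT}(G) \le \mathrm{UNSAT}(G_{\mathrm{prep}}) \le \mathrm{UNSAT}(G)$.
We set $\delta = d + d_0 + 1$, $\beta_{\mathrm{prep}} = \gamma_1 \cdot \frac{d}{d + 2(d_0 + 1)}$, and $\lambda = \delta - \frac{h_0^2}{2\delta}$, where $d = d_0 + 1$ is the degree of the graph $G_1$ from Lemma 9.1.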
9.2 Gap ampliﬁcation
Definition 9.4. Let $G = ((V, E), \Sigma, c)$ be a $d$-regular constraint graph such that every node has a self-loop. Let $t \in \mathbb{N}$ be even. The $t$-fold amplification product $G^t = ((V, E^t), \Sigma^{d^{t/2}}, c^t)$ is defined as follows:
• For every walk $W$ of length $t$ from $u$ to $v$, there is an edge $\{u, v\}$ in $E^t$. If there are several walks between $u$ and $v$, we introduce several edges between $u$ and $v$. But we disregard the directions of the walks, that is, for every walk $W$ and its reverse, we put only one edge into $E^t$.
• An assignment $\hat{\sigma}$ maps every node $v$ to a vector from $\Sigma^{d^{t/2}}$. We index the entries with walks of length $t/2$ starting in $v$. (There are exactly $d^{t/2}$ such walks.) Let $W$ be such a walk and let $u$ be the other end node. $\hat{\sigma}(v)_W$ is called “the opinion of $v$ about $u$ with respect to $W$”. Since there might be many walks from $v$ to $u$, $v$ can have many opinions about $u$. We will usually assume that nodes are not “schizophrenic”, i.e., that they always have the same opinion about $u$. In this case, we will also write $\hat{\sigma}(v)_u$ for the opinion of $v$ about $u$.
• It remains to define $c^t$. Let $e = \{u, v\} \in E^t$ and $\hat{\sigma}$ be an assignment. Let $G_e$ be the subgraph of $G$ induced by $N_{t/2}(u) \cup N_{t/2}(v)$. $c^t(e)$ is satisfied by $\hat{\sigma}$ iff all opinions (of $u$ and $v$) about every $x \in G_e$ are consistent and all constraints in $G_e$ are satisfied. (Since $G$ will be an expander, one constraint of $G$ “appears” in many constraints of $G^t$.)
If $t$ is a constant, $G^t$ is polynomial time computable from $G$ and we have $\mathrm{size}(G^t) = O(\mathrm{size}(G))$.
Theorem 9.5. Let $\lambda < d$ be two constants, $\Sigma$ an alphabet. There is a constant $\beta_{\mathrm{amp}}$ solely depending on $\lambda$, $d$, and $|\Sigma|$ such that for all $d$-regular constraint graphs $G$ with self-loops at every node and $\lambda(G) \le \lambda$:
1. $\mathrm{UNSAT}(G^t) \ge \beta_{\mathrm{amp}} \sqrt{t} \cdot \min\{\mathrm{UNSAT}(G), \frac{1}{2t}\}$,
2. $\mathrm{UNSAT}(G) = 0 \Rightarrow \mathrm{UNSAT}(G^t) = 0$.
Proof. We start with showing 2: Let $\sigma$ be a satisfying assignment for $G$. We define $\hat{\sigma} : V \to \Sigma^{d^{t/2}}$ by setting $\hat{\sigma}(v)_W = \sigma(u)$ where $W$ is a walk of length $t/2$ starting in $v$ and $u$ is the other end node of $W$. By construction, $\hat{\sigma}$ fulfills all constraints of $G^t$.
For 1, let $\hat{\sigma}$ be an optimum assignment for $G^t$. We can assume that there are not any schizophrenic nodes $v$ because otherwise all constraints involving $v$ are not satisfied and therefore, we cannot increase the UNSAT value by changing the assignment to $v$.
We will define an assignment $\sigma$ with
\[ \mathrm{UNSAT}_{\hat{\sigma}}(G^t) \ge \Omega(\sqrt{t}) \cdot \min\{\mathrm{UNSAT}_\sigma(G), \tfrac{1}{2t}\}. \]
$\sigma$ is again defined by a majority vote. $\sigma(v)$ is the majority of all opinions of the nodes $u$ that are reachable by a walk of length $t/2$ from $v$. (These are exactly the nodes that have an opinion about $v$.) If several paths go from $v$ to $u$, then each path contributes one opinion.
We choose an $F \subseteq E$ as large as possible such that all constraints in $F$ are not satisfied by $\sigma$ and $|F|/|E| \le 1/t$. Then
\[ \min\{\mathrm{UNSAT}_\sigma(G), \tfrac{1}{2t}\} \le \frac{|F|}{|E|} \le \frac{1}{t}. \]
Let $\mathcal{W}_t$ denote the set of all walks of length $t$.
Definition 9.6. $W = (v_0, e_1, v_1, \dots, v_t) \in \mathcal{W}_t$ is “hit at $j$” if $e_j \in F$ and the opinion of $v_0$ about $v_{j-1}$ and of $v_t$ about $v_j$ are equal to $\sigma(v_{j-1})$ and to $\sigma(v_j)$, respectively. (In particular, both nodes have an opinion about the corresponding node.)
If an edge is hit, then it is not satisfied, and this is because it is really not satisfied and not just because $\hat{\sigma}$ and $\sigma$ were inconsistent.
We set $I = \{j \in \mathbb{N} \mid t/2 - \sqrt{t} < j \le t/2 + \sqrt{t}\}$, the set of “middle indices”. For a walk $W$, we set
\[ N(W) = |\{j \in I \mid W \text{ is hit at } j\}|. \]
Let $e_W$ be the edge in $G^t$ corresponding to $W$. If $N(W) > 0$, then $e_W$ is not satisfied by $\hat{\sigma}$, since $e_j$ is not satisfied in $G$ under $\sigma$ and $\sigma$ is consistent with $\hat{\sigma}$ on $v_j$ and $v_{j-1}$. In formulas,
\begin{align*}
\Pr[N(W) > 0] &\le \Pr_{\hat{e} \in E^t}[\hat{\sigma} \text{ does not satisfy } \hat{e}] \\
&= \mathrm{UNSAT}_{\hat{\sigma}}(G^t) \\
&= \mathrm{UNSAT}(G^t).
\end{align*}
We will show that $\Omega(\sqrt{t}) \frac{|F|}{|E|} \le \Pr[N(W) > 0]$. This will finish the proof.
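Let
\[ N_j(W) = \begin{cases} 1 & \text{if } W \text{ is hit at } j, \\ 0 & \text{otherwise.} \end{cases} \]
Then $\sum_{j \in I} N_j(W) = N(W)$. Lemma 9.7 below shows $\Pr_{W \in \mathcal{W}_t}[N_j(W) = 1] = \Omega(\frac{|F|}{|E|})$. With this, we can bound $\Pr[N(W) > 0]$. We use the following conditional expectation inequality:
\[ \Pr[N(W) > 0] \ge \sum_{j \in I} \frac{\Pr[N_j(W) = 1]}{E[N(W) \mid N_j(W) = 1]}. \]
By linearity of expectation,
\[ E[N(W) \mid N_j(W) = 1] = \sum_{k \in I} E[N_k(W) \mid N_j(W) = 1]. \]
For every summand on the right-hand side, we have
\begin{align*}
E[N_k(W) \mid N_j(W) = 1] &\le \Pr[\text{a random walk of length } |k - j| + 1 \text{ ends in } F \mid \text{it started in } F] \\
&\le 2 \Bigl( \frac{|F|}{|E|} + \Bigl( \frac{\lambda}{d} \Bigr)^{|k-j|} \Bigr)
\end{align*}
by Lemma 8.3. Hence,
\[ E[N(W) \mid N_j(W) = 1] \le \sum_{k \in I} 2 \Bigl( \frac{|F|}{|E|} + \Bigl( \frac{\lambda}{d} \Bigr)^{|k-j|} \Bigr) \le 2 \Bigl( |I| \frac{|F|}{|E|} + \frac{2}{1 - \lambda/d} \Bigr) \le 2 \Bigl( \frac{2}{\sqrt{t}} + \frac{2}{1 - \lambda/d} \Bigr). \]
Thus
\[ \Pr[N(W) > 0] \ge \sum_{j \in I} \frac{\Omega(|F|/|E|)}{\frac{2}{\sqrt{t}} + \frac{2}{1 - \lambda/d}} \ge \Omega\Bigl( \sqrt{t} \, \frac{|F|}{|E|} \Bigr) \]
by the exercise below.
Exercise 9.1. Show that for every $c > 0$, there is a constant $a$ such that
\[ \frac{x}{2/x + c} \ge a \cdot x \]
for all $x \ge 1$.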
Lemma 9.7. For all $j \in I$, $\Pr_{W \in \mathcal{W}_t}[N_j(W) = 1] = \Omega(\frac{|F|}{|E|})$.
Proof. Fix $j \in I$. We generate a walk $W = (v_0, e_1, v_1, \dots, v_t)$ uniformly at random by using the method RW' with parameter $j$. Then, the edge $e_j$ is chosen uniformly at random. Furthermore, $v_0$ only depends on $v_{j-1}$ and $v_t$ only depends on $v_j$. Therefore,
\[ \Pr_{W \in \mathcal{W}_t}[N_j = 1] = \frac{|F|}{|E|} \cdot p \cdot q \]
where $p = \Pr_{W \in \mathcal{W}_t}[\hat{\sigma}(v_0)_{v_{j-1}} = \sigma(v_{j-1})]$ and $q = \Pr_{W \in \mathcal{W}_t}[\hat{\sigma}(v_t)_{v_j} = \sigma(v_j)]$. We are done if we can show that $p$ and $q$ are bounded from below by a constant. Since both cases are symmetric, we will only present a proof for $p$.
Let $X_{j-1}$ be the random variable generated by the following process. We start in $v_{j-1}$ and perform a random walk of length $j - 1$. Let $u$ be the node that we reach. We output the opinion $\hat{\sigma}(u)_{v_{j-1}}$ of $u$ about $v_{j-1}$. If $u$ has no opinion about $v_{j-1}$ (this can happen, since $j$ can be greater than $t/2$; but this will not happen too often) then we output some dummy value not in $\Sigma$. Obviously, $p = \Pr[X_{j-1} = \sigma(v_{j-1})]$.
If $j = t/2$, then we reach all the nodes that have an opinion about $v_{j-1}$.
We start with the case $j = t/2$. In this case, both nodes $v_0$ and $v_t$ have an opinion about $v_{j-1}$ and $v_j$. Since $\sigma(v_{j-1})$ is chosen by a majority vote, $p \ge \frac{1}{|\Sigma|}$ in this case.
We will now show that for all $j \in I$, the probability $\Pr[X_{j-1} = \sigma(v_{j-1})]$ cannot differ by too much from this; in particular, it is $\Omega(1/|\Sigma|)$. The self-loops will play a crucial role here, since they ensure that a random path with $\ell$ edges visits not more than $(1 - 1/d) \ell$ different nodes. We leave the rest of this proof as an exercise.
Exercise 9.2. Show that $\Pr[X_{j-1} = \sigma(v_{j-1})] \ge \Omega(1/|\Sigma|)$ for $j \in I$.
9.3 Alphabet reduction
In the last section, we increased the UNSAT value of the constraint graph but also enlarged the alphabet. To apply the construction iteratively, we need that in the end, the alphabet is again $\Sigma = \{0, 1\}^3$. This is achieved by the procedure in this section.
For this, we need a little coding theory. An encoding of $\{0, 1\}^k$ is an injective mapping $E : \{0, 1\}^k \to \{0, 1\}^\ell$. Its image $C$ is called a code, an element of $C$ a code word. The (relative) distance of a code is $\delta(C) = \min\{\delta(x, y) \mid x, y \in C, x \ne y\}$, where $\delta(x, y)$ denotes the fraction of positions in which $x$ and $y$ differ.
For our purposes, we need an encoding $E : \{0, 1\}^k \to \{0, 1\}^{O(k)}$ with relative distance $\ge \rho$ for some constant $\rho > 0$. For a construction of such a code, see e.g. [SS96]. If we have a relative distance of $\rho$, then this in particular means that whenever we take a code word and change an arbitrary fraction of less than $\rho/2$ of the bits, then we can recover the original code word, since there is only one code word that has relative distance less than $\rho/2$ to the changed string.
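The notions of relative distance and unique decoding can be illustrated with a toy code. The sketch below uses a 5-fold repetition code, which is chosen only because it is easy to inspect and is not the code from [SS96]; the function names `rel_dist`, `code_distance` and `decode` are assumptions.

```python
from itertools import combinations

def rel_dist(x, y):
    """Fraction of positions in which the equal-length strings x and y differ."""
    return sum(a != b for a, b in zip(x, y)) / len(x)

def code_distance(code):
    """Relative distance of a code given as a list of code words."""
    return min(rel_dist(x, y) for x, y in combinations(code, 2))

def decode(code, word):
    """Nearest code word; unique if word is within rho/2 of some code word."""
    return min(code, key=lambda c: rel_dist(c, word))

# toy code: 5-fold repetition of one bit (k = 1, l = 5), relative distance 1
code = ["00000", "11111"]
```

Here any word with fewer than half of its bits flipped is decoded back to the original code word, which is exactly the $\rho/2$ argument above with $\rho = 1$.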
Lemma 9.8. There is a constant $\beta_{\mathrm{red}}$ such that for all constraint graphs $G = ((V, E), \hat{\Sigma}, c)$ we can construct in polynomial time a constraint graph $G_{\mathrm{red}} = ((V', E'), \{0, 1\}^3, c')$ such that
1. $\mathrm{size}(G_{\mathrm{red}}) \le O(\mathrm{size}(G))$ where the constant only depends on $|\hat{\Sigma}|$,
2. $\beta_{\mathrm{red}} \, \mathrm{UNSAT}(G) \le \mathrm{UNSAT}(G_{\mathrm{red}}) \le \mathrm{UNSAT}(G)$.
Proof. We identify every element of $\hat{\Sigma}$ with a string in $\{0, 1\}^k$ with $k = \lceil \log |\hat{\Sigma}| \rceil$. Then we map each string to $\{0, 1\}^\ell$ with $\ell = O(k)$ using the code from the beginning of this section. We replace every node $v \in V$ by a sequence of nodes $v_1, \dots, v_\ell$. With every edge $e = (u, v) \in E$, we identify a function $\phi_e : \{0, 1\}^\ell \times \{0, 1\}^\ell \to \{0, 1\}$. $\phi_e(x, y)$ is true iff $x$ and $y$ are code words corresponding to values $a, b \in \hat{\Sigma}$ such that $(a, b) \in c(e)$. For each such $\phi_e$, we construct an assignment tester (see Theorem 6.3) $G_e = ((V_e, E_e), \{0, 1\}^3, c_e)$. The graph $G_{\mathrm{red}}$ is the union of all these assignment testers. The nodes $v_1, \dots, v_\ell$ representing $v$, $v \in V$, are shared by all assignment testers corresponding to an edge that contains $v$. The constraints of each edge in $G_{\mathrm{red}}$ are the constraints of the $G_e$. We can assume that each $G_e$ has the same number of edges, say, $r$. Thus $G_{\mathrm{red}}$ has $r |E|$ edges.
Each assignment tester is a constant size graph whose size only depends on $|\hat{\Sigma}|$. This immediately yields the upper bound on the size of $G_{\mathrm{red}}$.
For the second statement of the lemma, consider an optimal assignment $\sigma$ of $G$. We construct an assignment $\sigma'$ for $G_{\mathrm{red}}$ as follows: If $\sigma$ satisfies the constraint $c_e$, then, by the properties of assignment testers, we can extend $\sigma$ in such a way that all constraints of $G_e$ are satisfied. If $\sigma$ does not satisfy $c_e$, then we extend $\sigma$ in any way. In the worst case, no constraints of $G_e$ are satisfied. Thus for every constraint satisfied in $G$, at least $r$ constraints are satisfied in $G_{\mathrm{red}}$. Thus
\[ \mathrm{UNSAT}(G_{\mathrm{red}}) \le \frac{r \, |E| \, \mathrm{UNSAT}(G)}{|E'|} = \mathrm{UNSAT}_\sigma(G) = \mathrm{UNSAT}(G). \]
For the other inequality, let $\sigma'$ be an optimum assignment for $G_{\mathrm{red}}$. For each set of nodes $v_1, \dots, v_\ell$ representing $v$, $v \in V$, we interpret the assignment $\sigma'$ to these nodes as a string $x \in \{0, 1\}^\ell$. We set $\sigma(v)$ to be the element $a \in \hat{\Sigma}$ whose encoding $\hat{x}$ minimizes $\delta(x, \hat{x})$. We will now show that for every constraint $c_e$ that is not satisfied by $\sigma$, a constant fraction of the constraints of $G_e$ is not satisfied. This will complete the proof. Let $v$ and $w$ be the two nodes of $e$, let $x$ and $y$ be the strings given by $\sigma'$, and let $\hat{x}$ and $\hat{y}$ be the nearest code words. Since $e$ is not satisfied, either $x$ or $y$ differs from each satisfying assignment of $\phi_e$ in at least $\rho/2$ of the bits. If this were not the case, then $x$ and $y$ would have been decoded to an assignment satisfying $e$. Thus $x$ and $y$ in total differ from any satisfying assignment of $\phi_e$ in a fraction of $\rho/4$ of the bits. But then, by the properties of an assignment tester, also a fraction of $\rho/4$ of the constraints of $G_e$ are not satisfied.
9.4 Putting everything together
If we put together the constructions of the three previous sections, we get the following result.
Lemma 9.9. There are constants $C > 0$ and $1 > a > 0$ such that for every constraint graph $G$ over the alphabet $\Sigma = \{0, 1\}^3$, we can construct a constraint graph $G'$ over the same alphabet in polynomial time such that
1. $\mathrm{size}(G') \le C \cdot \mathrm{size}(G)$,
2. if $\mathrm{UNSAT}(G) = 0$, then $\mathrm{UNSAT}(G') = 0$,
3. $\mathrm{UNSAT}(G') \ge \min\{2 \cdot \mathrm{UNSAT}(G), a\}$.
Proof. We start with $G$, make it an expander, then amplify the gap (the value $t$ is yet to be chosen) and finally reduce the alphabet. It is clear that if we choose $t$ to be a constant, then the first two statements are fulfilled.
It remains to choose $t$ in such a way that the third statement is fulfilled. We have
\begin{align*}
\mathrm{UNSAT}(G') &\ge \beta_{\mathrm{red}} \beta_{\mathrm{amp}} \sqrt{t} \cdot \min\{\mathrm{UNSAT}(G_{\mathrm{prep}}), \tfrac{1}{2t}\} \\
&\ge \beta_{\mathrm{red}} \beta_{\mathrm{amp}} \sqrt{t} \cdot \min\{\beta_{\mathrm{prep}} \mathrm{UNSAT}(G), \tfrac{1}{2t}\}.
\end{align*}
If we now set $t = 4 \bigl( \frac{1}{\beta_{\mathrm{prep}} \beta_{\mathrm{amp}} \beta_{\mathrm{red}}} \bigr)^2$, we get
\[ \mathrm{UNSAT}(G') \ge \min\{2 \cdot \mathrm{UNSAT}(G), a\} \]
with $a = \frac{\beta_{\mathrm{prep}} \beta_{\mathrm{amp}}^2 \beta_{\mathrm{red}}^2}{4}$.
With this lemma, the proof of the PCP theorem follows easily. We start with the observation that the decision version of constraint graph satisfaction is NP-complete, i.e., gap$(1 - 1/|E|, 1)$-MaxCGS is NP-hard. Let $G$ be an input graph. If we now apply the above lemma $\log |E|$ times, we get a graph $G'$ that can be computed in time polynomial in $\mathrm{size}(G)$. If $G$ is satisfiable, then so is $G'$. Otherwise $\mathrm{UNSAT}(G) \ge 1/|E|$, and
\[ \mathrm{UNSAT}(G') \ge \min\{2^{\log |E|} \cdot \tfrac{1}{|E|}, a\} = a \]
is constant. Thus we have a reduction from gap$(1 - 1/|E|, 1)$-MaxCGS to gap$(1 - a, 1)$-MaxCGS. In particular, the latter problem is also NP-complete. But this is equivalent to the statement of the PCP theorem.
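The iteration can be mimicked numerically to see why $\log |E|$ rounds suffice. The sketch below only tracks the lower bound guaranteed by statement 3 of Lemma 9.9; the value `a = 0.25` in the usage line and the function names are hypothetical choices made for illustration, since the notes do not give a concrete value of $a$.

```python
def amplify(unsat, a):
    """Guaranteed lower bound on the UNSAT value after one round of Lemma 9.9."""
    return min(2 * unsat, a)

def rounds_until_constant(num_edges, a):
    """Number of rounds until the guaranteed bound reaches a, starting at 1/|E|."""
    unsat, rounds = 1 / num_edges, 0
    while unsat < a:
        unsat = amplify(unsat, a)
        rounds += 1
    return rounds

# hypothetical numbers: |E| = 1024 and a = 0.25
needed = rounds_until_constant(1024, 0.25)
```

Since every round multiplies the size by at most $C$, the size after $\log |E|$ rounds is at most $C^{\log |E|} \cdot \mathrm{size}(G) = |E|^{\log C} \cdot \mathrm{size}(G)$, which is polynomial.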
10 Average-case complexity
Being intractable, e.g., NP-complete, does not completely reflect the difficulty of a problem. Approximability is one way of refining the notion of intractability: We have seen some NP-hard optimization problems for which finding a close-to-optimal solution is easy, and others for which finding even a very weak approximation is as hard as solving an NP-complete problem.
Average-case complexity is another way of refining the intractability of problems: Unless P = NP, no efficient algorithm exists to solve an NP-complete problem on all instances. However, we may still hope that we can solve the problem efficiently on most instances or on typical instances, where “typical” here means something like “sampled from a distribution that reflects practical instances.”
Another motivation for studying average-case complexity is cryptography: A cryptographic system is secure only if any efficient attempt to break it succeeds only with a very small probability. Thus, it does not help if a cryptographic system is hard to break only in the worst case. In fact, most of cryptography assumes that NP problems exist that are intractable not only in the worst but even in the average case, i.e., on random inputs.
Bogdanov and Trevisan give a well-written survey about average-case complexity [BT06]. They also cover connections of average-case complexity to areas like cryptography.
10.1 Probability distributions and distributional problems
What probability distribution should we take? What probability distributions should we allow? How should we model that we have inputs of various sizes?
There are essentially two ways to deal with inputs of various sizes: First, for each $n \in \mathbb{N}$, we can have a distribution $D_n$, from which we draw the instances of “size” $n$ (e.g., $n$ can be the length of the strings). Combining $D_1, D_2, D_3, \dots$, we get an ensemble $\mathcal{D} = (D_n)_{n \ge 1}$ of probability distributions.
The second possibility is to have a single probability distribution $\mathcal{D}$ for strings of all lengths. This is convenient in some applications, and it leads to a simple notion of reducibility that preserves average-case tractability. But it is difficult, e.g., to define circuit complexity for this case, and it is also sometimes counterintuitive. For instance, since $\{0, 1\}^*$ is a countably infinite set, there is no “uniform” distribution on $\{0, 1\}^*$. Instead, the distribution that is commonly called the “uniform distribution” assigns probability $\frac{6}{\pi^2} n^{-2} 2^{-n}$ to every string of length $n \ge 1$ (or, to get rid of the factor $6/\pi^2$, we assign probability $\frac{1}{n(n+1)} 2^{-n}$ to strings of length $n$). We will use ensembles of distributions as it will be more convenient. However, most results hold independent of which possibility we choose. The uniform distribution $\mathcal{U} = (U_n)_{n \in \mathbb{N}}$ is then given by $U_n(x) = 2^{-n}$ for $x \in \{0, 1\}^n$.
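That the two “uniform” distributions on strings of all lengths are indeed probability distributions follows from a short calculation, which is added here as a worked check. It uses that there are $2^n$ strings of length $n$, the classical value $\sum_{n \ge 1} n^{-2} = \pi^2/6$, and a telescoping sum.

```latex
\[
  \sum_{n \ge 1} \sum_{x \in \{0,1\}^n} \frac{6}{\pi^2}\, n^{-2}\, 2^{-n}
  = \frac{6}{\pi^2} \sum_{n \ge 1} \frac{1}{n^2}
  = \frac{6}{\pi^2} \cdot \frac{\pi^2}{6} = 1,
\]
\[
  \sum_{n \ge 1} \sum_{x \in \{0,1\}^n} \frac{1}{n(n+1)}\, 2^{-n}
  = \sum_{n \ge 1} \Bigl( \frac{1}{n} - \frac{1}{n+1} \Bigr) = 1 .
\]
```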
In order to allow for different probability distributions, we do not fix one, but we will consider distributional problems in the following.
Definition 10.1. A distributional decision problem is a pair $\Pi = (L, \mathcal{D})$, where $L$ is a language and $\mathcal{D} = (D_n)_{n \ge 1}$ is an ensemble of probability distributions, where each $D_n$ has finite support.
By $\mathrm{supp}(D) = \{x \mid D(x) > 0\}$, we denote the support of a probability distribution. You can think of $\mathrm{supp}(D_n) \subseteq \{0, 1\}^n$, but this will not always be the case. However, we will have $\mathrm{supp}(D_n) \subseteq \{0, 1\}^{\le p(n)}$ for some polynomial $p$ for the distributions that we consider.
What we would now like to have is something like an “average-case NP-hard” distributional problem $\Pi = (L, \mathcal{D})$: If $\Pi$ is average-case tractable, then $(L', \mathcal{D}')$ is average-case tractable for each $L' \in \mathrm{NP}$ and each ensemble $\mathcal{D}'$. However, as we will show, a statement like “every $(L', \mathcal{D}')$ is average-case tractable” is the same as “P = NP”. (The previous sentence is non-trivial: For the same language $L'$, we can use different algorithms for different distributions $\mathcal{D}'$.) Thus, the average-case analog of NP-completeness cannot refer to arbitrary probability distributions. We will restrict ourselves to two possible sets of distributions, namely polynomial-time samplable and polynomial-time computable ensembles.
Definition 10.2. An ensemble $\mathcal{D} = (D_n)$ is polynomial-time samplable if there exists a randomized algorithm $A$ that takes an input $n \in \mathbb{N}$ and produces a string $A(n) \in \{0, 1\}^*$ with the following properties:
• There exists a polynomial $p$ such that $A$ on input $n$ is $p(n)$ time bounded, regardless of the random bits $A$ reads.
• For every $n \in \mathbb{N}$ and every $x \in \{0, 1\}^*$, we have $\Pr(A(n) = x) = D_n(x)$.
PSamp denotes the set of all polynomial-time samplable ensembles.
Several variants of definitions of polynomial-time samplable distributions exist. For instance, one can relax the strict bound on the running time and require only that $A$ runs in expected polynomial time. Such finer distinctions are important, e.g., in the study of zero-knowledge proofs, but we will not elaborate on this.
To define the second set of distributions, let $\le$ denote the lexicographic
ordering between bit strings. (We have $x \le y$ if $|x| < |y|$, or if $|x| = |y|$ and
$x$ equals $y$ or appears before $y$ in lexicographic order.) Then the cumulative
probability of $x$ with respect to a probability distribution $D$ is defined by
\[ f_D(x) = \sum_{y \le x} D(y) . \]
Definition 10.3. An ensemble $\mathcal{D} = (D_n)$ is polynomial-time computable if
there exists a deterministic algorithm $A$ that, on input $n \in \mathbb{N}$ and
$x \in \{0,1\}^*$, runs in time $\mathrm{poly}(n)$ and computes $f_{D_n}(x)$.
PComp denotes the set of all polynomial-time computable ensembles.
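If the distribution function $f_{D_n}(x)$ is computable in time $\mathrm{poly}(n)$, then
the density function $D_n(x)$ is also computable in time $\mathrm{poly}(n)$, since
$D_n(x) = f_{D_n}(x) - f_{D_n}(y)$, where $y$ is the string that directly precedes $x$.
The converse does not hold unless P = NP.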
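The relation between the cumulative function and the density can be made concrete with a small sketch. The helper names (`strings_up_to`, `cumulative`, `predecessor`, `density_from_cdf`) are illustrative assumptions, not part of the notes; the sketch restricts itself to nonempty strings, and the brute-force `cumulative` takes exponential time, so it only illustrates the definition, not polynomial-time computability.

```python
from fractions import Fraction
from itertools import product


def strings_up_to(max_len):
    """Nonempty bit strings of length <= max_len: shorter first, then lexicographic."""
    for length in range(1, max_len + 1):
        for bits in product("01", repeat=length):
            yield "".join(bits)


def cumulative(density, x):
    """f_D(x): total probability of all strings y <= x in the ordering above."""
    total = Fraction(0)
    for y in strings_up_to(len(x)):
        total += density.get(y, Fraction(0))
        if y == x:
            break
    return total


def predecessor(x):
    """String directly before x in the ordering, or None if x is the first string."""
    if set(x) == {"0"}:
        # 0...0 is the first string of its length; step back to 1...1 one bit shorter.
        return "1" * (len(x) - 1) or None
    return format(int(x, 2) - 1, "0{}b".format(len(x)))


def density_from_cdf(cdf, x):
    """Recover D(x) from the cumulative function with two evaluations."""
    prev = predecessor(x)
    return cdf(x) - (cdf(prev) if prev is not None else Fraction(0))
```

Two evaluations of the cumulative function suffice for one density value, which is why computability of $f_{D_n}$ carries over to $D_n$.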
Exercise 10.1. Show that there exists an ensemble $\mathcal{D} = (D_n)_{n \in \mathbb{N}}$
such that the density functions $D_n$ are polynomial-time computable but
$\mathcal{D} \notin \mathrm{PComp}$ unless P = NP. The latter means that if the functions
$f_{D_n}$ are computable in polynomial time, then P = NP.
In the following, we will focus on ensembles from PComp. Many results
will nevertheless carry over to the wider class PSamp.
Exercise 10.2. Show that PComp ⊆ PSamp.
The converse is unlikely to be true.
Exercise 10.3. Show that PComp = PSamp if and only if $\mathrm{P} = \mathrm{P}^{\#\mathrm{P}}$.
Hint: For $\Rightarrow$, define an appropriate distribution on $\langle \varphi, a \rangle$
(choose pairing function and encoding carefully), where $\varphi$ is a Boolean formula and
$a$ is an assignment for $\varphi$. Given $\varphi$, the probability of
$\langle \varphi, a \rangle$ should essentially depend on whether $a$ satisfies $\varphi$.
Show that the cumulative distribution function can be used to compute the number of
satisfying assignments.
One can argue that PSamp is the class of natural distributions: A distri
bution is natural if we can eﬃciently sample from it. This, however, does not
mean that their density or distribution function is eﬃciently computable.
10.2 Average polynomial time and heuristics
When can we say that a problem can be solved efficiently on average? A
first attempt might be to consider the expected running time with respect
to a given probability distribution. However, this definition, although very
natural, does not yield a robust class of average tractable problems.
Exercise 10.4. Show that the class
\[ \{(L, \mathcal{D}) \mid L \text{ can be solved in expected polynomial time with respect to } \mathcal{D}\} \]
is not invariant under changes of the machine model: Let $A$ be an algorithm
with running time $t$, and let $B$ be a simulation of $A$ that is, say, quadratically
slower. Give a function $t$ such that $A$ has expected polynomial running time
but $B$ has not.
Another way would be to consider the median of the running time. At
least, this would be a definition that is robust against changes of the machine
model. However, you probably would not call an algorithm efficient if it
requires linear time on 70% of the instances and exponential time on the
rest. So what about requiring that the algorithm runs in polynomial time
on 99% of the instances? The threshold 99% would be arbitrary: there is no
reason why 99% is preferable to 98% or 99.9%. And with any such threshold,
there would still be a constant fraction of the inputs for which the algorithm
performs poorly.
But what would be a natural way of defining “average tractability”?
Intuitively, one would probably say that an algorithm is efficient on average
if instances that require longer and longer running time show up with smaller
and smaller probability: An algorithm $A$ has polynomial average running
time if there exists a constant $c > 0$ such that the probability that $A$ requires
more than time $T$ is at most $\mathrm{poly}(n)/T^c$. In this way, we have a polynomial
trade-off between running time and fraction of inputs.
Definition 10.4. An algorithm $A$ with running time $t_A$ has average polynomial
running time with respect to the ensemble $\mathcal{D}$ if there exists an
$\varepsilon > 0$ and a polynomial $p$ such that, for all $n$ and $t$,
\[ \Pr_{x \sim D_n}\bigl(t_A(x, n) \ge t\bigr) \le \frac{p(n)}{t^{\varepsilon}} . \]
If an algorithm $A$ has average polynomial running time, then its median
running time is polynomial, it runs in polynomial time on all but a
$1/\mathrm{poly}(n)$ fraction of inputs, and in time $n^{\mathrm{polylog}(n)}$ on all
but an $n^{-\mathrm{polylog}(n)}$ fraction of the inputs, and so on.
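For a single finite distribution $D_n$, the tail bound of Definition 10.4 can be checked mechanically. The following is a minimal sketch under stated assumptions: the function name `has_tail_bound` and its parameters are hypothetical, the value `p_n` stands for $p(n)$ at one fixed $n$, and running times are assumed to be at least 1. It suffices to test thresholds that are actual running times, because the left-hand side is constant between two consecutive running times while the right-hand side decreases in $t$.

```python
from fractions import Fraction


def has_tail_bound(dist, running_time, p_n, eps):
    """Check Pr[t_A(x) >= t] <= p_n / t**eps for every threshold t.

    dist maps each instance x to its probability; running_time(x) >= 1.
    Only the distinct running times need to be tested as thresholds.
    """
    thresholds = sorted({running_time(x) for x in dist})
    for t in thresholds:
        tail = sum(prob for x, prob in dist.items() if running_time(x) >= t)
        if tail > p_n / t ** eps:
            return False
    return True
```

Note that the definition asks for one $\varepsilon$ and one polynomial $p$ that work for all $n$ simultaneously; the check above only covers one value of $n$ at a time.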
The whole field of average-case complexity theory was essentially founded
by Leonid Levin [Lev86]. (Leonid Levin is the guy who did not get a Turing
award for inventing the Russian analog of NP-completeness. His paper [Lev86]
is a good candidate for the shortest paper in theoretical computer science
that founded an area: The conference version of the paper is one page long,
the final journal paper has two pages.) Levin’s original definition is
different, but turns out to be equivalent: An algorithm has average
polynomial running time with respect to $\mathcal{D}$ if there exists an
$\varepsilon > 0$ such that
\[ \mathbb{E}_{x \sim D_n}\bigl(t_A(x, n)^{\varepsilon}\bigr) = O(n) . \]
Exercise 10.5. Prove that Deﬁnition 10.4 and Levin’s deﬁnition are indeed
equivalent.
Exercise 10.6. Actually, Levin’s original definition uses a single distribution
$D : \{0,1\}^+ \to [0, 1]$. Given an ensemble $\mathcal{D} = (D_n)_{n \ge 1}$ with
$\mathrm{supp}(D_n) \subseteq \{0,1\}^n$, we obtain a single distribution
$D : \{0,1\}^+ \to [0, 1]$ by setting
\[ D(x) = \frac{6}{\pi^2 |x|^2} \, D_{|x|}(x) . \]
A function $t : \{0,1\}^+ \to \mathbb{N}$ is called polynomial on $D$-average if there
exist constants $k$ and $c$ such that
\[ \sum_{x \in \{0,1\}^+} \frac{t(x)^{1/k}}{|x|} \, D(x) \le c . \]
Prove that $t$ is polynomial on $D$-average if and only if
\[ \Pr_{x \sim D_n}\bigl(t(x) \ge \tau\bigr) \le \frac{p(n)}{\tau^{\varepsilon}} \]
for all $n$ and all thresholds $\tau$, some polynomial $p$, and some $\varepsilon > 0$.
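As a short sanity check (not part of the exercise itself), the factor $6/(\pi^2 |x|^2)$ in Exercise 10.6 is exactly what makes the single distribution $D$ sum to 1. The derivation only uses that each $D_n$ is a probability distribution on $\{0,1\}^n$ and the well-known value $\sum_{n \ge 1} 1/n^2 = \pi^2/6$:

```latex
\[
  \sum_{x \in \{0,1\}^+} D(x)
  = \sum_{n \ge 1} \frac{6}{\pi^2 n^2} \sum_{x \in \{0,1\}^n} D_n(x)
  = \frac{6}{\pi^2} \sum_{n \ge 1} \frac{1}{n^2}
  = \frac{6}{\pi^2} \cdot \frac{\pi^2}{6}
  = 1 .
\]
```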
In addition to how to measure time, we have the choice to use deterministic
or randomized algorithms or even non-uniform families of circuits.
“In practice”, we would probably not run an algorithm forever, but only
for a polynomial number of steps. We can model this by an algorithm with
worst-case polynomial running time that “fails” on some inputs. This leads
to the following definition, where “failure” means that the algorithm says “I
don’t know.”
Definition 10.5. Let $\Pi = (L, \mathcal{D})$ be a distributional problem with
$\mathcal{D} = (D_n)$. An algorithm $A$ is a (fully polynomial-time) errorless
heuristic scheme for $\Pi$ if there is a polynomial $p$ such that the following holds:
• For every $n$, every $\delta > 0$, and every $x \in \mathrm{supp}(D_n)$,
$A(x, n, \delta)$ outputs either $L(x)$ or the failure symbol $\bot$.
• For every $n$, every $\delta > 0$, and every $x \in \mathrm{supp}(D_n)$,
$A(x, n, \delta)$ runs in time $p(n, 1/\delta)$.
• For every $n$ and every $\delta > 0$, we have
$\Pr_{x \sim D_n}(A(x, n, \delta) = \bot) \le \delta$.
This, however, is yet another way of defining average polynomial time.
Exercise 10.7. Show that a distributional problem $\Pi$ admits an errorless
heuristic scheme if and only if it admits an algorithm whose running time
is average polynomial according to Definition 10.4.
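One direction of this equivalence can be sketched in code: cut off an algorithm with average polynomial running time after a step budget and report failure if the budget is exhausted. This is only an illustration under assumptions: the algorithm is modeled as a Python generator that yields once per step (`toy_stepper` is a hypothetical example, not an algorithm from the notes), `FAIL` stands for $\bot$, and the budget $(p(n)/\delta)^{1/\varepsilon}$ comes from the tail bound of Definition 10.4.

```python
import math

FAIL = None  # stands for the failure symbol (an assumption of this sketch)


def make_scheme(stepper, p, eps):
    """Turn a step-wise algorithm into an errorless heuristic scheme.

    stepper(x, n) is a generator that yields once per computation step and
    finally returns the answer. If Pr[t_A >= t] <= p(n) / t**eps, then a
    budget of (p(n) / delta) ** (1 / eps) steps is exceeded with probability
    at most delta, and the budget is polynomial in n and 1/delta.
    """
    def scheme(x, n, delta):
        budget = math.ceil((p(n) / delta) ** (1.0 / eps))
        run = stepper(x, n)
        for _ in range(budget):
            try:
                next(run)
            except StopIteration as done:
                return done.value  # the algorithm finished within the budget
        return FAIL  # out of budget: say "I don't know"
    return scheme


def toy_stepper(x, n):
    """Hypothetical algorithm: n steps, except 2**n steps on the all-ones input."""
    steps = 2 ** n if x == "1" * n else n
    for _ in range(steps):
        yield
    return x.count("1") % 2 == 0
```

For example, `make_scheme(toy_stepper, lambda n: n, 1.0)` gives up on the all-ones string of length 4 for $\delta = 0.5$ but answers it for $\delta = 0.1$, since the budget grows with $1/\delta$. The scheme never returns a wrong answer, because it only ever passes on the answer of the underlying algorithm.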
Now that we have three equivalent deﬁnitions of “tractable on average”,
we will ﬁnally deﬁne a complexity class of distributional problems that are
tractable on average.
Deﬁnition 10.6. The class AvgP is the set of all distributional problems
that admit an errorless heuristic scheme.
Exercise 10.8. Let
\[ \mathrm{3COL} = \{G \mid G \text{ is 3-colorable}\} , \]
let $G_{n,1/2}$ be the uniform distribution on graphs with $n$ vertices, and let
$\mathcal{G}_{1/2} = (G_{n,1/2})_{n \in \mathbb{N}}$. Show that
$(\mathrm{3COL}, \mathcal{G}_{1/2}) \in \mathrm{AvgP}$.
Note: If we include each of the $\binom{n}{2}$ possible edges with a probability of $p$,
i.e., the probability of getting $G = (V, E)$ is
$p^{|E|} (1 - p)^{\binom{n}{2} - |E|}$, then this probability distribution is called
$G_{n,p}$.
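The distribution $G_{n,p}$ from the note is easy to sample from and to evaluate. The sketch below is illustrative only; the function names are assumptions, vertices are numbered $0, \dots, n-1$, edges are included independently, and a graph is represented by its edge set.

```python
import random
from fractions import Fraction
from itertools import combinations


def sample_gnp(n, p, rng):
    """Draw a graph on vertices 0..n-1: each possible edge is kept with probability p."""
    return {edge for edge in combinations(range(n), 2) if rng.random() < p}


def gnp_probability(n, p, edges):
    """Probability of one fixed graph: p^|E| * (1 - p)^(C(n,2) - |E|)."""
    possible = n * (n - 1) // 2
    return p ** len(edges) * (1 - p) ** (possible - len(edges))
```

For $p = 1/2$ the probability does not depend on the edge set, which is why $G_{n,1/2}$ is the uniform distribution on graphs with $n$ vertices.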
Instead of having $\delta$ as part of the input, we can also have a function
$\delta : \mathbb{N} \to (0, 1]$ of failure probabilities. This leads to the following
definition.
Definition 10.7. Let $\Pi = (L, \mathcal{D})$ be a distributional problem, and let
$\delta : \mathbb{N} \to (0, 1]$. An algorithm $A$ is an errorless heuristic algorithm
for $\Pi$ with failure probability at most $\delta$ if the following properties are
fulfilled:
• For every $n \in \mathbb{N}$ and every $x \in \mathrm{supp}(D_n)$, $A(x, n)$ outputs
either $L(x)$ or $\bot$.
• For every $n \in \mathbb{N}$, we have $\Pr_{x \sim D_n}(A(x, n) = \bot) \le \delta(n)$.
For a time bound $t$, we say that $\Pi \in \mathrm{Avg}_{\delta}\mathrm{DTime}(t)$ if
there exists an errorless heuristic deterministic algorithm $A$ that, for every $n$,
runs in time at most $t(n)$ for all $x \in \mathrm{supp}(D_n)$ and has failure
probability at most $\delta$.
Let
\[ \mathrm{Avg}_{\delta}\mathrm{P} = \bigcup_{p:\ \text{polynomial}} \mathrm{Avg}_{\delta}\mathrm{DTime}(p) . \]
So far, all algorithms that we considered never produced wrong answers.
They might return “don’t know”, but if they give an answer, it is the correct
answer. Weakening the requirement yields the following deﬁnition.
Definition 10.8. An algorithm $A$ is called a (fully polynomial-time) heuristic
scheme for $\Pi = (L, \mathcal{D})$ if there exists a polynomial $p$ such that the
following holds:
1. For every $n$, every $x \in \mathrm{supp}(D_n)$, and every $\delta > 0$,
$A(x, n, \delta)$ runs in time $p(n, 1/\delta)$.
2. For every $\delta > 0$, $A(\cdot, \cdot, \delta)$ is a heuristic algorithm for $\Pi$
with error probability at most $\delta$.
Let HeurP be the set of all distributional problems that admit a heuristic
scheme.
Definition 10.9. Let $\Pi = (L, \mathcal{D})$ be a distributional problem, and let
$\delta : \mathbb{N} \to (0, 1]$. An algorithm $A$ is a heuristic algorithm with error
probability at most $\delta$ for $\Pi$ if, for all $n$,
\[ \Pr_{x \sim D_n}\bigl(A(x, n) \ne L(x)\bigr) \le \delta(n) . \]
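For a time bound $t$ and $\delta : \mathbb{N} \to (0, 1]$, we have
$\Pi \in \mathrm{Heur}_{\delta}\mathrm{DTime}(t)$ if there exists a deterministic
heuristic algorithm $A$ with error probability at most $\delta$ such that, for every
$n$ and every $x \in \mathrm{supp}(D_n)$, $A(x, n)$ runs in time $t(n)$.
Let
\[ \mathrm{Heur}_{\delta}\mathrm{P} = \bigcup_{p:\ \text{polynomial}} \mathrm{Heur}_{\delta}\mathrm{DTime}(p) . \]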
Exercise 10.9. Prove the following: For every constant $c$,
$\mathrm{AvgP} \subseteq \mathrm{Avg}_{n^{-c}}\mathrm{P}$ and
$\mathrm{HeurP} \subseteq \mathrm{Heur}_{n^{-c}}\mathrm{P}$.
So far, we have defined classes of problems that are tractable (to various
degrees) on average. Now we define the average-case analog of NP, which is
called DistNP.
Definition 10.10. DistNP is the class of all distributional problems
$\Pi = (L, \mathcal{D})$ with $L \in \mathrm{NP}$ and $\mathcal{D} \in \mathrm{PComp}$.
Similar to P versus NP, the central question in average-case complexity
is whether DistNP ⊆ AvgP. Note that AvgP = DistNP does not hold: First,
there is no restriction on the probability distributions for the problems in
AvgP. Second, AvgP might contain problems that are not even in NP.
10.3 A distribution for which worst case equals average case
There exists an ensemble for which worst case and average case are equivalent.
Thus, the study of average-case complexity with respect to all (instead
of only samplable or computable) distributions reduces to worst-case
complexity. To get meaningful results in average-case complexity, we thus have
to restrict ourselves to sufficiently simple sets of ensembles like PComp or
PSamp.
Theorem 10.11. There exists an ensemble $\mathcal{K}$ such that if $L$ is a decidable
language and the distributional problem $(L, \mathcal{K})$ is in
$\mathrm{Heur}_{o(1)}\mathrm{P}$, then $L \in \mathrm{P}$.
For the proof of this theorem, we need Kolmogorov complexity (see also
the script of the lecture “Theoretical Computer Science”). To define
Kolmogorov complexity, let us fix a universal Turing machine $U$. The
Kolmogorov complexity of a string $x \in \{0,1\}^*$ is the length of the shortest
string $c$ such that $U$ on input $c = \langle g, y \rangle$ outputs $x$. Here, $g$ is
a Gödel number of a Turing machine and $y$ is an input for the Turing machine $M_g$.
We denote the length $|c|$ by $K(x)$.
The conditional Kolmogorov complexity $K(x|z)$ is the length of the shortest
string $c = \langle g, y \rangle$ such that $M_g$ on input $y$ and $z$ outputs $x$.
Example 10.12. We have $K(0^n) \le \log n + O(1)$, $K(0^n \mid \mathrm{bin}(n)) = O(1)$,
and $K(x) \le |x| + O(1)$.
By abusing notation, we write $K(x|n)$ instead of $K(x \mid \mathrm{bin}(n))$ for
$|x| = n$. This is also called the length-conditioned Kolmogorov complexity.
Proof of Theorem 10.11. The universal probability distribution
$\mathcal{K} = (K_n)_{n \in \mathbb{N}}$ assigns every string $x$ of length $n$ a
probability $K_n(x)$ proportional to $2^{-K(x|n)}$: We have
$\sum_{x \in \{0,1\}^n} 2^{-K(x|n)} \le 1$ since $\langle \cdot, \cdot \rangle$ is a
prefix-free code (this follows from Kraft’s inequality). By scaling with an
appropriate constant $\ge 1$, we make sure that $\sum_x K_n(x) = 1$.
Since $(L, \mathcal{K}) \in \mathrm{Heur}_{o(1)}\mathrm{P}$, there exists a heuristic
algorithm $A$ for $L$ that fails with probability at most $o(1)$. Consider a string $x$
of length $n$ such that $A(x) \ne L(x)$. Since the overall probability of such strings
is at most $o(1)$, we have $K_n(x) = o(1)$. Thus,
\[ K(x|n) = -\log K_n(x) + O(1) = \omega(1) . \]
This lower bound for the Kolmogorov complexity holds for all strings on
which $A$ fails.
Now let $x_0$ be the lexicographically first string of length $n$ on which
$A$ fails. Since $L$ is decidable, $x_0$ can be computed given $n$. This implies
$K(x_0|n) = O(1)$, which contradicts the lower bound $\omega(1)$ once $n$ is large
enough. To conclude, we observe that for sufficiently large $n$, no
string exists on which $A$ fails. Thus, $A$ fails only on finitely many strings,
which proves $L \in \mathrm{P}$.
Exercise 10.9 immediately yields the following result.
Corollary 10.13. If $(L, \mathcal{K}) \in \mathrm{HeurP}$, then $L \in \mathrm{P}$.
Exercise 10.10. An interesting feature of the universal distribution is the
following: Let $A$ be any algorithm with running time $t : \{0,1\}^* \to \mathbb{N}$.
Then the expected running time with respect to $\mathcal{K}$ is asymptotically equal to
the worst-case running time:
\[ \mathbb{E}_{x \sim K_n}\, t(x) = \Theta\Bigl( \max_{x \in \{0,1\}^n} t(x) \Bigr) , \]
where the constant hidden by $\Theta$ depends only on the algorithm $A$.
Prove this!
Exercise 10.11. Let $M = \{1, \dots, m\}$. An encoding of $M$ is an injective
mapping $c : M \to \{0, 1, \dots, \gamma - 1\}^*$. The code $c$ is called prefix-free if
there are no $i \ne j$ in $M$ such that $c(i)$ is a prefix of $c(j)$.
Prove the following: Assume that we are given lengths
$\ell_1, \dots, \ell_m \in \mathbb{N}$. Then there exists a prefix-free code $c$ for $M$
with $|c(i)| = \ell_i$ if and only if $\sum_{i=1}^{m} \gamma^{-\ell_i} \le 1$.
(This is called Kraft’s inequality.)
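To get a feeling for Kraft’s inequality, the following sketch evaluates the sum and, for the binary case $\gamma = 2$, builds a prefix-free code with prescribed lengths. It is not a proof of the exercise. The construction used here is the standard canonical one (assign codewords in order of increasing length by counting upwards); the function names are assumptions, and all lengths are assumed to be positive with $m \ge 1$.

```python
from fractions import Fraction


def kraft_sum(lengths, gamma=2):
    """The sum of gamma^(-l_i) over all requested codeword lengths."""
    return sum(Fraction(1, gamma ** length) for length in lengths)


def prefix_free_code(lengths):
    """Binary prefix-free code with |c(i)| = lengths[i], or None if Kraft fails."""
    if kraft_sum(lengths) > 1:
        return None
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    code = [None] * len(lengths)
    value, previous = 0, lengths[order[0]]
    for i in order:
        value <<= lengths[i] - previous  # pad the counter to the new length
        code[i] = format(value, "0{}b".format(lengths[i]))
        value += 1  # the next codeword is the next number of this length
        previous = lengths[i]
    return code


def is_prefix_free(words):
    """True if no word is a prefix of a word at a different position."""
    return not any(
        i != j and w.startswith(v)
        for i, v in enumerate(words)
        for j, w in enumerate(words)
    )
```

For the lengths 3, 1, 3, 2 the Kraft sum is exactly 1 and the construction uses up the whole code space; for the lengths 1, 1, 2 the sum is 5/4 and no prefix-free code exists.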
11 Average-case completeness
The goal of this section is to prove that DistNP = (NP, PComp) contains a
complete problem. There are three issues that we have to deal with: First,
we need an appropriate notion of reduction. Second, we have to take care
of the different probability distributions, which can differ vastly. Third, and
this is the easiest task, we need an appropriate problem with an appropriate
probability distribution.
11.1 Reductions
For reductions between distributional problems, the usual many-one reductions
do not suffice: A feature that a suitable reduction should have is that
if $(A, \mathcal{D})$ reduces to $(A', \mathcal{D}')$ and
$(A', \mathcal{D}') \in \mathrm{AvgP}$, then $(A, \mathcal{D}) \in \mathrm{AvgP}$. Let
us consider the following example: We use the identity mapping as reduction
between $(A, \mathcal{D})$ and $(A, \mathcal{D}')$. But $\mathcal{D}$ assigns high
probability to the hard instances, whereas $\mathcal{D}'$ assigns small probability to
hard instances. Then $(A, \mathcal{D}')$ is tractable on average, but
$(A, \mathcal{D})$ is not.
What we learn is that a meaningful notion of reduction must take into
account the probability distributions.
Definition 11.1. Let $\Pi = (L, \mathcal{D})$ and $\Pi' = (L', \mathcal{D}')$ be
distributional problems. Then $\Pi$ reduces to $\Pi'$, denoted by
$\Pi \le_{\mathrm{AvgP}} \Pi'$, if there is a function $f$ that, for every $n$ and every
$x$ in the support of $D_n$, can be computed in time polynomial in $n$ such that
1. (correctness) $x \in L$ if and only if $f(x, n) \in L'$ and
2. (domination) there exist polynomials $p$ and $m$ such that, for every $n$
and every $y$ in the support of $D'_{m(n)}$,
\[ \sum_{x : f(x,n) = y} D_n(x) \le p(n) \cdot D'_{m(n)}(y) . \]
The first condition is usual for many-one reductions. The second condition
forbids the scenario sketched above: Drawing strings according to $D_n$
and then using $f(\cdot, n)$ yields a probability distribution of instances for $L'$.
(Of course, we just get binary strings again. But we view them as instances
for $L'$.) Then the second property makes sure that no string $y$ is generated
with much larger a probability than if $y$ had been drawn according to
$D'_{m(n)}$.
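For finite distributions and a fixed $n$, the domination condition can be checked directly by pushing $D_n$ forward through $f$. The sketch below is only an illustration: the name `dominates` and its parameters are assumptions, `factor` plays the role of $p(n)$, and `target` plays the role of $D'_{m(n)}$. As in Definition 11.1, only strings $y$ in the support of the target distribution are tested.

```python
from collections import defaultdict
from fractions import Fraction


def dominates(source, target, f, factor):
    """Check sum_{x: f(x) = y} source[x] <= factor * target[y] for all y in supp(target)."""
    pushed = defaultdict(Fraction)  # distribution of f(x) for x drawn from source
    for x, prob in source.items():
        pushed[f(x)] += prob
    return all(
        pushed.get(y, 0) <= factor * prob
        for y, prob in target.items()
        if prob > 0
    )
```

With the identity mapping as reduction, a source distribution that puts half of its mass on one hard instance is not dominated with a small factor by a target distribution that gives this instance probability 1/1000, which is exactly the scenario that the definition rules out.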
Lemma 11.2. Let $C \in \{\mathrm{AvgP}, \mathrm{HeurP}\}$. If
$\Pi \le_{\mathrm{AvgP}} \Pi'$ and $\Pi' \in C$, then $\Pi \in C$.
Proof. We only consider the case $C = \mathrm{AvgP}$. The other case is similar.
Suppose $\Pi' \in \mathrm{AvgP}$, and let $A'$ be a fully polynomial-time errorless
heuristic scheme for $\Pi'$. Let $f$ be a reduction from $\Pi$ to $\Pi'$, and let $p$
and $m$ be the polynomials of Definition 11.1.
We claim that $A(x, n, \delta) = A'\bigl(f(x, n), m(n), \delta/p(n)\bigr)$ is a fully
polynomial-time errorless heuristic scheme for $\Pi$. To prove this, let
$B_n = \{y \in \mathrm{supp}(D'_{m(n)}) \mid A'(y, m(n), \delta/p(n)) = \bot\}$ be the
set of strings on which $A'$ fails.
Since $A'$ is a fully polynomial-time errorless heuristic scheme, we have
$D'_{m(n)}(B_n) \le \delta/p(n)$. With this, we get
\begin{align*}
\Pr_{x \sim D_n}\bigl(A(x, n, \delta) = \bot\bigr)
  &= \Pr_{x \sim D_n}\bigl(A'(f(x, n), m(n), \delta/p(n)) = \bot\bigr) \\
  &= \sum_{x : f(x,n) \in B_n} D_n(x) \\
  &\le \sum_{y \in B_n} p(n) \, D'_{m(n)}(y) \\
  &= p(n) \cdot D'_{m(n)}(B_n) \le \delta .
\end{align*}
The inequality holds because the reduction must fulfill the domination
condition. Thus, $\Pi \in \mathrm{AvgP}$.
It is also not hard to show that $\le_{\mathrm{AvgP}}$ is transitive.
Lemma 11.3. Let $\Pi = (L, \mathcal{D})$, $\Pi' = (L', \mathcal{D}')$, and
$\Pi'' = (L'', \mathcal{D}'')$ be distributional problems with
$\Pi \le_{\mathrm{AvgP}} \Pi'$ and $\Pi' \le_{\mathrm{AvgP}} \Pi''$. Then
$\Pi \le_{\mathrm{AvgP}} \Pi''$.
Proof. Let $f$ be a reduction from $\Pi$ to $\Pi'$, and let $g$ be a reduction from
$\Pi'$ to $\Pi''$. Let $p$ and $m$ be the polynomials of Definition 11.1 for $f$, and let
$p'$ and $m'$ be the polynomials for $g$. Obviously, $h$ given by
$h(x, n) = g(f(x, n), m(n))$ is polynomial-time computable and a many-one reduction
from $L$ to $L''$.
It remains to prove that $h$ fulfills domination. To this end, let
$q(n) = p(n) \cdot p'(m(n))$, and let $\ell(n) = m'(m(n))$. The functions $q$ and $\ell$
are obviously polynomials. Now let $n$ be arbitrary, and let
$z \in \mathrm{supp}(D''_{\ell(n)})$. Then
\begin{align*}
\sum_{x : h(x,n) = z} D_n(x)
  &= \sum_{y : g(y, m(n)) = z} \; \sum_{x : f(x,n) = y} D_n(x) \\
  &\le \sum_{y : g(y, m(n)) = z} p(n) \, D'_{m(n)}(y) \\
  &\le p(n) \, p'(m(n)) \, D''_{m'(m(n))}(z) = q(n) \, D''_{\ell(n)}(z) .
\end{align*}
11.2 Bounded halting
In this section, we present a problem that is complete for DistNP. It is the
bounded halting problem, which is the generic NP-complete problem:
\[ \mathrm{BH} = \bigl\{ \langle g, x, 1^t \rangle \mid M_g \text{ is a nondeterministic Turing machine that accepts } x \text{ in at most } t \text{ steps} \bigr\} . \]
We will show that $(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$ is DistNP-complete, where
$\mathcal{U}^{\mathrm{BH}} = (U^{\mathrm{BH}}_n)_{n \in \mathbb{N}}$ is some kind of
uniform distribution on the inputs for BH.
The main challenge in the proof that $(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$ is
DistNP-complete is that problems $\Pi = (L, \mathcal{D})$ from DistNP have very
different distributions. Thus, just using the many-one reduction from $L$ to BH is
unlikely to work.
The key idea is to find an injective mapping $C$ with the property that
$C(x)$ is almost uniformly distributed if $x$ is distributed according to
$\mathcal{D}$. The following lemma makes this more precise and proves that such a
function exists.
Lemma 11.4. Let $\mathcal{D} = (D_n)_{n \in \mathbb{N}} \in \mathrm{PComp}$ be an
ensemble. Then there exists an algorithm $C$ with the following properties:
1. $C(x, n)$ runs in time polynomial in $n$ for all $x \in \mathrm{supp}(D_n)$,
2. for every $n$ and $x, x' \in \mathrm{supp}(D_n)$, $C(x, n) = C(x', n)$ implies
$x = x'$ (so $C$ is somewhat “injective”), and
3. $|C(x, n)| \le 1 + \min\bigl\{ |x|, \log \frac{1}{D_n(x)} \bigr\}$.
Proof. Consider any $x \in \mathrm{supp}(D_n)$. If $D_n(x) \le 2^{-|x|}$, then let
$C(x, n) = 0x$. If $D_n(x) > 2^{-|x|}$, then let $y$ be the string that precedes $x$ in
lexicographic order, and let $p = f_{D_n}(y)$. Then we set $C(x, n) = 1z$, where $z$ is
the longest common prefix of the binary representations of $p$ and
$f_{D_n}(x) = p + D_n(x)$.
Since $\mathcal{D} \in \mathrm{PComp}$, the string $z$ can be computed in polynomial
time. Thus, $C$ can be computed in polynomial time. (This also shows that
$|C(x, n)|$ is bounded by a polynomial in $|x|$.)
It remains to prove that $C$ is injective and fulfills the length condition.
$C$ is injective because no three strings can have the same longest common
prefix: If $z$ is the longest common prefix of $x_1$ and $x_2$ and $z$ is also a prefix
of $x_3$, then either $z0$ or $z1$ is a prefix of $x_3$, and it is also a prefix of
either $x_1$ or $x_2$.
Finally, we observe that either $C(x, n) = 0x$ or $C(x, n) = 1z$ for the
$z$ described above. In the former case, we have $|C(x, n)| \le 1 + |x|$ and
$D_n(x) \le 2^{-|x|}$. Thus, also $1 + \log(1/D_n(x)) \ge |x| + 1 = |C(x, n)|$. In the
latter case, we have $D_n(x) \le 2^{-|z|}$. Thus,
$|C(x, n)| = 1 + |z| \le 1 + \log(1/D_n(x))$ and, since $D_n(x) \ge 2^{-|x|}$,
$|C(x, n)| \le 1 + |x|$.
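The construction in the proof can be tried out on small distributions. The following sketch is illustrative only: it computes the cumulative probabilities by brute force over an explicitly given finite density (instead of using a polynomial-time algorithm for $f_{D_n}$), uses exact fractions, and makes one assumption that the proof leaves implicit, namely that the value 1 is written as $0.111\ldots$ in binary so that every number in $[0, 1]$ has a purely fractional expansion. All function names are hypothetical.

```python
import math
from fractions import Fraction


def order_key(s):
    """Shorter strings first, then lexicographic order."""
    return (len(s), s)


def binary_digits(value):
    """Fractional binary digits of a value in [0, 1]; the value 1 is read as 0.111..."""
    if value == 1:
        while True:
            yield "1"
    while True:
        value *= 2
        bit = int(value >= 1)
        value -= bit
        yield str(bit)


def compress(density, x):
    """C(x, n) from the proof of Lemma 11.4 for an explicit finite density."""
    d = density[x]
    if d <= Fraction(1, 2 ** len(x)):
        return "0" + x  # unlikely strings are stored verbatim
    upper = sum(q for y, q in density.items() if order_key(y) <= order_key(x))
    lower = upper - d  # cumulative probability of the predecessor of x
    z = []
    for a, b in zip(binary_digits(lower), binary_digits(upper)):
        if a != b:  # lower < upper, so the expansions differ eventually
            break
        z.append(a)
    return "1" + "".join(z)


def length_bound(density, x):
    """Right-hand side of property 3: 1 + min(|x|, log(1 / D_n(x)))."""
    return 1 + min(len(x), math.log2(float(1 / density[x])))
```

On a density over 3-bit strings that gives probability 1/2 to 000 and 1/4 to 001, these two likely strings receive the short images 1 and 11, while a string of probability 1/16 is mapped to itself with a leading 0.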
Now let us focus on $\mathcal{U}^{\mathrm{BH}}$. The instances of BH are triples
$\langle g, x, 1^t \rangle$ of length
$2 \log |g| + 2 \log |x| + 2 \log t + |x| + |g| + t + \Theta(1)$. Note that this
representation is prefix-free. We draw such instances of length at most $N$
as follows: We flip random bits $b_1, b_2, \dots$ until either $i = N$ or
$b_1 \dots b_i$ has the form $\langle g, x \rangle$. In the former case, we output
$b_1 \dots b_N$, in the latter we output $\langle g, x, 1^{N-i} \rangle$. We denote this
distribution by $U^{\mathrm{BH}}_N$. The probability of an instance
$\langle g, x, 1^t \rangle$ under this distribution is
\[ U^{\mathrm{BH}}_N\bigl(\langle g, x, 1^t \rangle\bigr) = 2^{-(2 \log |g| + 2 \log |x| + |g| + |x| + \Theta(1))} , \]
where $t = N - |\langle g, x \rangle|$.
With this preparation, we can prove the main theorem of this section.
The key idea is to use $C$ to “compress” the inputs: While $D_n$ can be any
distribution, the images $C(x, n)$ with $x$ drawn according to $D_n$ are, more or
less, uniformly distributed in the sense of $\mathcal{U}^{\mathrm{BH}}$: If $x$ is
likely, then $C(x, n)$ is short. If $x$ is unlikely, then $C(x, n)$ is long. Thus, if
we draw random bits until we have seen an image $y = C(x, n)$, then the probability of
seeing $y$ is roughly $D_n(x)$.
Theorem 11.5. $(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$ is DistNP-complete with
respect to $\le_{\mathrm{AvgP}}$.
Proof. Let $\Pi = (L, \mathcal{D}) \in \mathrm{DistNP}$ be arbitrary, i.e.,
$L \in \mathrm{NP}$ and $\mathcal{D} \in \mathrm{PComp}$. Let $M$ be a nondeterministic
Turing machine that accepts an input string $y$ if and only if there exists a string
$x \in L$ with $C(x, n) = y$. Since $C$ is polynomial-time computable and
$L \in \mathrm{NP}$, we can assume that $M$ obeys a polynomial time bound $q$. Let $g$
be the Gödel number of $M$.
Let us describe the reduction from $(L, \mathcal{D})$ to
$(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$: On input $x$ and parameter $n$, the
reduction outputs an instance $\langle g, C(x, n), 1^{t(x)} \rangle$ of length
$N = N(n)$. We choose $N$ to be a sufficiently large polynomial to make sure
that $t(x) \ge q(n)$.
Obviously, we have $x \in L$ if and only if
$\langle g, C(x, n), 1^{t(x)} \rangle \in \mathrm{BH}$. To verify the domination
condition, we exploit that $C$ is injective. Thus, it suffices to check that, for every
$n$ and every $x \in \mathrm{supp}(D_n)$, we have
$D_n(x) \le \mathrm{poly}(n) \cdot U^{\mathrm{BH}}_N\bigl(\langle g, C(x, n), 1^{t(x)} \rangle\bigr)$.
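Let $\ell = |g|$ be the length of the encoding of $M$, which is fixed. Then
\[ U^{\mathrm{BH}}_N\bigl(\langle g, C(x, n), 1^{q(n)} \rangle\bigr) = 2^{-(2 \log \ell + 2 \log |C(x,n)| + \ell + |C(x,n)| + \Theta(1))} . \]
Now $\log |C(x, n)| \le \log(m(n)) + 1$ and
$|C(x, n)| \le \log \frac{1}{D_n(x)} + 1$ yields
\[ U^{\mathrm{BH}}_N\bigl(\langle g, C(x, n), 1^{q(n)} \rangle\bigr) \ge \underbrace{2^{-(2 \log \ell + \ell)}}_{= \Theta(1)} \cdot \frac{1}{(m(n) + 1)^2} \cdot D_n(x) \cdot \Omega(1) , \]
which proves that domination is fulfilled.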
Let us make a final remark on the function $C$. This function $C$ is sometimes
called a compression function for $\mathcal{D}$. It plays a crucial role in the
completeness proof, as it makes the reductions, which have to meet the
domination requirement, possible in the first place. Why is $C$ called a
compression function? Assume that we are given samples drawn according to
$\mathcal{D}$. If we compress them using $C$, we have a compression with close to
optimal compression rate.
11.3 Heuristic algorithms vs. heuristic schemes
In the previous chapter, we distinguished between algorithms and schemes:
An algorithm has a fixed failure probability (fixed means a fixed function,
not a fixed constant), whereas a scheme works for all failure probabilities
$\delta$, but the running time depends on $\delta$.
By Exercise 10.9, if a problem $\Pi$ admits a heuristic scheme, then it
admits heuristic algorithms with error probabilities $n^{-c}$ for every constant $c$.
The containment in the other direction does not hold. For instance,
$\mathrm{Avg}_{1/n}\mathrm{P}$ contains undecidable problems, whereas AvgP does not.
Exercise 11.1.
1. Show that there exists an undecidable problem $L$ with
$(L, \mathcal{U}) \in \mathrm{Avg}_{1/n}\mathrm{P}$.
2. Show that AvgP does not contain undecidable problems $(L, \mathcal{U})$.
But if we restrict ourselves to problems in DistNP, the other containment
can be proved: DistNP as a whole admits heuristic schemes if and only if it
admits heuristic algorithms.
Theorem 11.6. Let $c > 0$ be arbitrary. If
$(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}}) \in \mathrm{Avg}_{n^{-c}}\mathrm{P}$, then
DistNP ⊆ AvgP. The same holds for $\mathrm{Heur}_{n^{-c}}\mathrm{P}$ and HeurP.
Proof. For simplicity, we will only consider AvgP and $c = 1$. By the
completeness of $(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$, it suffices to show
$(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}}) \in \mathrm{AvgP}$.
Let $A$ be an errorless heuristic algorithm for
$(\mathrm{BH}, \mathcal{U}^{\mathrm{BH}})$ with failure probability $1/n$. We will use
$A$ to construct an errorless heuristic scheme $A'$.
The idea is to use padding to map short instances of BH to longer instances.
Then we exploit that the failure probability of $A$ decreases with growing input
length.
Let $N$ be the length of an instance $I = \langle g, x, 1^t \rangle$ of BH. Then we set
\[ A'(I, N, \delta) = A\bigl(\langle g, x, 1^{t + 1/\delta} \rangle, N + \tfrac{1}{\delta}\bigr) . \]
($A'$ immediately rejects inputs that are not syntactically correct.) Note that
\[ U^{\mathrm{BH}}_N(I) = U^{\mathrm{BH}}_{N + 1/\delta}\bigl(\langle g, x, 1^{t + 1/\delta} \rangle\bigr) \]
by the definition of $U^{\mathrm{BH}}_N$. On inputs from
$U^{\mathrm{BH}}_{N + 1/\delta}$, algorithm $A$ outputs $\bot$ with a probability of at
most $1/(N + 1/\delta) < \delta$. Thus, $A'$ outputs $\bot$ on at most a $\delta$
fraction of the instances obtained from $U^{\mathrm{BH}}_N$.
11.4 More DistNP-complete problems
Here, we list some more DistNP-complete problems without proving that
they are. For the proofs as well as some other problems, we refer to Wang’s
survey of DistNP-complete problems [Wan97].
There are not too many DistNP-complete distributional problems
$(L, \mathcal{D})$ where both $L$ and $\mathcal{D}$ are natural. The main issue is that
we lack a powerful tool to prove that a problem is hard on average, like the PCP
theorem for (in)approximability. So, in some sense, average-case complexity is in a
similar state as the complexity of optimization was before the PCP theorem.
Tiling
A tile is a square with a symbol on each of its four sides. Tiles must not be
rotated or turned over. If we have a set $T$ of tiles, we assume that we have
an infinite number of tiles of any kind in $T$. A tiling of an $n \times n$ square is
an arrangement of $n^2$ tiles that cover the square such that the symbols of
the adjacent sides of the tiles agree. The size of a tile is the length of the
binary representation of its four symbols.
Instance: A finite set $T$ of tiles, an integer $n > 0$, and a sequence
$s_1, \dots, s_k \in T$ of tiles that match each other (the right side of $s_i$ matches
the left side of $s_{i+1}$). The size of the instance is $n$ plus the sizes of the
tiles in $T$ plus the sizes of $s_1, \dots, s_k$.
Question: Can $s_1, \dots, s_k$ be extended to a tiling of the $n \times n$ square
using only tiles from the set $T$?
Distribution: Given $n$, select $T$ using your favorite probability distribution
(this really does not matter much; for the reduction, $T$ just represents
the Turing machine deciding a language, and this Turing machine has
constant size for any fixed language). Select $k$ uniformly at random
from $\{1, \dots, n\}$. Finally, select $s_1$ uniformly from $T$ and select $s_{i+1}$
randomly from $T$ such that it matches $s_i$.
Levin’s original DistNP-complete problem was a variant of tiling, where
the corners instead of the sides of the tiles had to match.
Post correspondence
The Post correspondence problem is one of the better known undecidable
problems. In a restricted variant, it becomes NP-complete. Together with
an appropriate probability distribution, it becomes DistNP-complete.
Instance: A positive integer $n$, and a list
$\langle x_1, y_1 \rangle, \dots, \langle x_m, y_m \rangle$ of pairs of strings. The
length $N$ of the instance is $n + \sum_{i=1}^{m} (|x_i| + |y_i|)$.
Question: Is there a sequence $i_1, \dots, i_n \in \{1, \dots, m\}$ of indices such that
$x_{i_1} x_{i_2} \dots x_{i_n} = y_{i_1} y_{i_2} \dots y_{i_n}$?
Distribution: Draw $m$ according to $\Pr(m = \mu) = \Theta(1/\mu^2)$. Then draw
$x_1, \dots, x_m$ and $y_1, \dots, y_m$ according to the uniform distribution on
$\{0,1\}^+$ defined in Section 10.1.
Arbitrary NP-complete problems
If some problem $(L, \mathcal{D})$ with $L \in \mathrm{NP}$ and
$\mathcal{D} \in \mathrm{PComp}$ is hard on average, then every NP-complete language
$A$ is hard on average with respect to some samplable ensemble $\mathcal{E}$. The
ensemble $\mathcal{E}$, however, might look a bit unnatural. In particular, for every
NP-complete language $A$, there exists an ensemble $\mathcal{E} \in \mathrm{PSamp}$ such
that $(A, \mathcal{E})$ is DistNP-hard.
12 Average case versus worst case
In this section, we will show some connections between average-case and
worst-case complexity. We will first provide a condition under which a
distributional problem is not DistNP-complete unless EXP = NEXP. Second, we
will show that DistNP is not contained in AvgP unless E = NE. (Recall that
$\mathrm{E} = \mathrm{DTime}(2^{O(n)})$ and $\mathrm{NE} = \mathrm{NTime}(2^{O(n)})$,
whereas $\mathrm{EXP} = \mathrm{DTime}(2^{\mathrm{poly}(n)})$ and
$\mathrm{NEXP} = \mathrm{NTime}(2^{\mathrm{poly}(n)})$.)
12.1 Flatness and DistNP-complete problems
So under which conditions is a distributional problem DistNP-complete?
Gurevich gave a partial answer: $\Pi = (L, \mathcal{D})$ cannot be DistNP-complete
if $\mathcal{D}$ assigns only very little weight to all strings in
$\mathrm{supp}(D_n)$. The intuition is the following: Assume that we have a
distributional problem $\Psi = (A, \mathcal{E})$ that reduces to $\Pi$. Assume further
that $\mathcal{E}$ assigns high weight to few strings. Then, in order to satisfy the
domination requirement, also $\mathcal{D}$ must assign somewhat high weight to some
strings. The following definition makes the notion of “assigns very little weight to
all strings” precise.
Definition 12.1. An ensemble $\mathcal{D} = (D_n)_{n \in \mathbb{N}}$ is flat if there
exists an $\varepsilon > 0$ such that, for all $n$ and $x$,
$D_n(x) \le 2^{-n^{\varepsilon}}$.
Exercise 12.1. Show that $\mathcal{G}_{1/2}$ (introduced in Exercise 10.8) is flat.
Exercise 12.2. Show that $\mathcal{U}^{\mathrm{BH}}$ (see Section 11.2) is not flat.
Theorem 12.2 (Gurevich [Gur91]). If there is a DistNP-complete problem
$(L, \mathcal{D})$ with a flat ensemble $\mathcal{D}$, then NEXP = EXP.
Proof overview: Assume that $\Pi = (L, \mathcal{D})$ is DistNP-complete,
$\mathcal{D}$ is flat, and there exists a reduction from $\Psi = (A, \mathcal{E})$ to
$\Pi$, where $\mathcal{E} = (E_n)_{n \in \mathbb{N}}$ is non-flat. Let $f$ be a reduction
from $\Psi$ to $\Pi$. In order to maintain domination, $f$ must map strings
$x \in \mathrm{supp}(E_n)$ to very short strings $f(x)$. Short strings,
however, mean (relatively) short running time.
Proof. Assume that there exists a DistNP-complete distributional problem
$\Pi = (L, \mathcal{D})$, where $\mathcal{D}$ is a flat ensemble. Obviously,
$L \in \mathrm{EXP}$. Now let $A \in \mathrm{NEXP}$ be arbitrary. Let $p$ be a polynomial
such that $A \in \mathrm{NTime}(2^{p(n)})$. For a string $x \in \{0,1\}^*$ with
$|x| = n$, let $x' = x 0 1^{2^{p(n)} - n - 1}$. Let $A' = \{x' \mid x \in A\}$. Since
$A \in \mathrm{NEXP}$, the language $A'$ is in NP. Let
$\mathcal{E} = (E_n)_{n \in \mathbb{N}}$ be the following ensemble:
\[ E_{2^{p(n)}}(z) = \begin{cases} 2^{-|x|} & \text{if } z = x' \text{ for some string } x \text{ and} \\ 0 & \text{otherwise.} \end{cases} \]
Since $\mathcal{E}$ is computable, we have $(A', \mathcal{E}) \in \mathrm{DistNP}$.
Thus, there exists a reduction $f$ from $(A', \mathcal{E})$ to $(L, \mathcal{D})$. Let
us make a few observations.
• Given $x$ of length $n$, $f(x', 2^{p(n)})$ can be computed in time $2^{q(n)}$ for
some polynomial $q$.
• The function $x \mapsto f(x', 2^{p(n)})$ is a many-one reduction from $A$ to $L$,
i.e., $x \in A$ if and only if $f(x', 2^{p(n)}) \in L$.
• There exist polynomials $m$ and $r$ such that
\[ \sum_{z : f(z, 2^{p(n)}) = f(x', 2^{p(n)})} E_{2^{p(n)}}(z) \le r(2^{p(n)}) \cdot D_{m(2^{p(n)})}\bigl(f(x', 2^{p(n)})\bigr) \]
for all $n$ and $x$. (This might look confusing at first glance since the
strings $x'$, $z$ and $f(x')$ are exponentially long, but it is just the
domination condition.)
Now we have $E_{2^{p(n)}}(x') = 2^{-n}$. Thus, domination implies that
\[ D_{m(2^{p(n)})}\bigl(f(x', 2^{p(n)})\bigr) \ge \frac{2^{-n}}{r(2^{p(n)})} = 2^{-s(n)} \qquad (12.1) \]
for some polynomial $s$. Since $\mathcal{D}$ is flat, there exists an
$\varepsilon > 0$ such that
\[ D_{m(2^{p(n)})}\bigl(f(x', 2^{p(n)})\bigr) \le 2^{-(m(2^{p(n)}))^{\varepsilon}} . \qquad (12.2) \]
From the two bounds (12.1) and (12.2) on $D_{m(2^{p(n)})}$, we get
\[ s(n) \ge m(2^{p(n)})^{\varepsilon} . \qquad (12.3) \]
Now we are almost done. Since the images $f(x', 2^{p(n)})$ are in
$\mathrm{supp}(D_{m(2^{p(n)})})$, they are polynomially bounded in $m(2^{p(n)})$. By
(12.3), this quantity is bounded from above by a polynomial $s(n)$. Hence,
$A \in \mathrm{EXP}$: (1) $x \in A$ if and only if $f(x', 2^{p(n)}) \in L$. (2) We can
compute $y = f(x', 2^{p(n)})$ in time $2^{q(n)}$. (3) We can decide whether $y \in L$
in time $2^{p(|y|)} \le 2^{p(m(2^{p(n)}))} \le 2^{p(s(n))} = 2^{\mathrm{poly}(n)}$,
where the second inequality holds because of (12.3).
Since the uniform distribution $\mathcal{U} = (U_n)_{n \in \mathbb{N}}$ is flat, we
immediately get the following result as a special case.
Corollary 12.3. There is no $L \in \mathrm{NP}$ such that $(L, \mathcal{U})$ is
DistNP-complete unless NEXP = EXP.
12.2 Collapse in exponential time
Our second result concerning connections between average-case and worst-case
complexity shows that it is unlikely that DistNP is a subset of AvgP. If
this is the case, then nondeterministic exponential time collapses to
deterministic exponential time.
To show this, we need the following two lemmas.
Lemma 12.4. $\mathrm{E} \ne \mathrm{NE}$ if and only if there exists a unary language
$L \in \mathrm{NP} \setminus \mathrm{P}$.
Proof. “$\Longrightarrow$”: Assume that $\mathrm{NE} \ne \mathrm{E}$, and let
$L' \in \mathrm{NE} \setminus \mathrm{E}$. Let
$L = \{1^{\mathrm{cod}(x)} \mid x \in L'\}$ be a unary language. Let us first show that
$L \in \mathrm{NP}$: Given $y = 1^{\mathrm{cod}(x)}$, $x$ can be computed in polynomial
time. We have $|x| = O(\log |y|)$. Since $L' \in \mathrm{NE}$, there exists a
$2^{O(|x|)}$ time bounded nondeterministic Turing machine that decides $L'$. On input
$x$, this machine needs time $2^{O(\log n)} = n^{O(1)}$, where $n = |y|$.
Now let us prove that L / ∈ P. Assume to the contrary that L ∈ P.
Then, given any string x, we can compute y = 1
cod(x)
in time 2
O(n)
. By
deﬁnition, y ∈ L if and only if x ∈ L
t
. Since L ∈ P, we can decide in time
[y[
O(1)
= 2
O([x[)
if y ∈ L. This would imply L
t
∈ E – a contradiction.
“$\Longleftarrow$”: Assume that there exists a unary language $L \subseteq \{1\}^*$ in $\mathrm{NP} \setminus \mathrm{P}$. Consider the language $L' = \{\mathrm{bin}(n) \mid 1^n \in L\}$. We will show that $L' \in \mathrm{NE} \setminus \mathrm{E}$. On input $y$, we can compute an $x = 1^n$ with $\mathrm{bin}(n) = y$ in time $2^{O(|y|)}$. Then we can use the nondeterministic polynomial-time Turing machine that witnesses $L \in \mathrm{NP}$ to decide $x \in L$ in time $n^{O(1)} = 2^{O(|y|)}$. Thus, $L' \in \mathrm{NE}$.
Lastly, we have to show that $L' \notin \mathrm{E}$. Assume to the contrary that $L' \in \mathrm{E}$. Then there is a deterministic $2^{O(n)}$-time-bounded Turing machine that decides $L'$. Now, on input $x = 1^n$, we can compute $y = \mathrm{bin}(n)$ in polynomial time. We have $|y| = O(\log n)$. Thus, $y \in L'$ can be decided in time $2^{O(\log n)} = n^{O(1)}$. Since $y \in L'$ if and only if $x \in L$, this would imply $L \in \mathrm{P}$, again a contradiction.
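The length bookkeeping in this proof can be made concrete. The following is a minimal sketch under an assumption that the notes do not fix at this point: cod(x) is taken to be the integer whose binary representation is 1x, and the inverse direction strips the leading 1 again. All function names are hypothetical.

```python
def cod(x: str) -> int:
    """Assumed encoding: read the bit string '1' + x as a binary number."""
    return int("1" + x, 2)

def decode(n: int) -> str:
    """Inverse of cod: binary representation of n without its leading 1."""
    return format(n, "b")[1:]

def to_unary(x: str) -> str:
    """Map x to y = 1^cod(x); the output has length 2^{O(|x|)}."""
    return "1" * cod(x)

def from_unary(y: str) -> str:
    """Recover x from y = 1^cod(x); the result has length O(log |y|)."""
    return decode(len(y))
```

The exponential blow-up from x to y is exactly what turns exponential time in |x| into polynomial time in |y|, and vice versa.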
Lemma 12.5. Let $\mathcal{Q} = (Q_n)_{n \in \mathbb{N}}$ be given by $Q_n(1^n) = 1$. Then, for every unary language $L \subseteq \{1\}^*$, we have $L \in \mathrm{P}$ if and only if $(L, \mathcal{Q}) \in \mathrm{AvgP}$.
Proof. Clearly, if $L \in \mathrm{P}$, then $(L, \mathcal{Q}) \in \mathrm{AvgP}$. To see that the converse also holds, consider any algorithm $A$ that witnesses $(L, \mathcal{Q}) \in \mathrm{AvgP}$. Let $t$ be the running time of $A$. Then we have, for some $\varepsilon > 0$, $E_{x \sim Q_n}(t^{\varepsilon}(x, n)) = O(n)$. Since $\operatorname{supp}(Q_n) = \{1^n\}$, this is equivalent to $t(1^n, n) = O(n^{1/\varepsilon})$. Thus, $A$ runs in worst-case polynomial time.
With these two lemmas, the main theorem of this section can be proved
easily.
Theorem 12.6 (Ben-David, Chor, Goldreich, Luby [BDCGL92]). If $\mathrm{E} \neq \mathrm{NE}$, then $\mathrm{DistNP} \not\subseteq \mathrm{AvgP}$.
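Proof. Let $\mathcal{Q}$ be the ensemble of Lemma 12.5. Obviously, $\mathcal{Q} \in \mathrm{PComp}$. Thus, $L \in \mathrm{NP}$ if and only if $(L, \mathcal{Q}) \in \mathrm{DistNP}$.
We have $\mathrm{E} \neq \mathrm{NE}$ if and only if there exists a unary language $L \in \mathrm{NP} \setminus \mathrm{P}$ by Lemma 12.4. This in turn holds if and only if $(L, \mathcal{Q}) \in \mathrm{DistNP} \setminus \mathrm{AvgP}$ by Lemma 12.5.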
13 Decision versus search
A search algorithm for an NP relation $V$ is an algorithm that, on input $x$, computes a witness $w$ of length $\mathrm{poly}(|x|)$ such that $\langle x, w \rangle \in V$. Recall that the corresponding language $L \in \mathrm{NP}$ is $L = \{x \mid \exists^P w : \langle x, w \rangle \in V\}$. By abusing notation, we will call such an algorithm also a search algorithm for the language $L \in \mathrm{NP}$. This is ambiguous since $L$ does not uniquely define a corresponding NP relation.
Obviously, if we have an efficient search algorithm for a language $L \in \mathrm{NP}$, then we can use it to get an efficient decision algorithm for $L$. What about the opposite? If $L$ is NP-complete, then we can use an efficient decision algorithm for $L$ to efficiently compute witnesses (see the script of the lecture “Computational Complexity Theory”). Thus, if P = NP, then every problem in NP admits efficient search algorithms. So for NP as a whole, decision and search are equally hard. Nevertheless, it is believed that in general, efficient decision does not imply efficient search. For instance, one-way permutations, if they exist, yield problems for which decision is easy but search is hard.
Exercise 13.1. A one-way permutation is a bijective function $f : \{0,1\}^* \to \{0,1\}^*$ such that
• $|f(x)| = |x|$ for every $x \in \{0,1\}^*$,
• given $x$, $f(x)$ can be computed in polynomial time, and
• the problem of finding an $x$ with $f(x) = y$ for a given $y$ cannot be solved in polynomial time. (Since $f$ is bijective, we can equivalently say that $f^{-1}$ cannot be computed in polynomial time.)
Show that the existence of one-way permutations implies that there are problems for which search is harder than decision.
In this section, we consider the question of decision versus search in the average-case setting: Assume that all DistNP problems admit efficient-on-average decision algorithms, i.e., $\mathrm{DistNP} \subseteq \mathrm{AvgP}$. Do then all problems in DistNP also have efficient-on-average search algorithms? We will give a partial answer to this question: If all languages in NP with the uniform distribution admit efficient-on-average randomized algorithms, then all languages in NP with the uniform distribution admit efficient-on-average randomized search algorithms.
We have not yet defined what an efficient-on-average randomized algorithm is. (Here, the instances are drawn at random and, in addition, also the algorithm itself is allowed to use randomness to solve the instance.) Furthermore, we also do not know yet what an efficient-on-average (randomized) search algorithm is. We will define all this in the next section and postpone the main theorem of this section to Section 13.3.
13.1 Randomized decision algorithms
We first generalize AvgP, $\mathrm{Avg}_{\delta}\mathrm{P}$ and so on to randomized algorithms.
Definition 13.1. Let $\Pi = (L, \mathcal{D})$ be a distributional problem. An algorithm $A$ is a randomized errorless heuristic scheme for $\Pi$ if $A$ runs in time polynomial in $n$ and $1/\delta$ for every $\delta > 0$ and $x \in \operatorname{supp}(D_n)$ and
\[ \Pr_A\bigl(A(x, n, \delta) \notin \{L(x), \bot\}\bigr) \le \frac{1}{4} \tag{13.1} \]
(the probability is taken over $A$'s coin tosses) and
\[ \Pr_{x \sim D_n}\Bigl(\Pr_A\bigl(A(x, n, \delta) = \bot\bigr) \ge \frac{1}{4}\Bigr) \le \delta \tag{13.2} \]
(the inner probability is again over $A$'s coin tosses, the outer probability over the random instances).
AvgBPP is the class of all distributional problems that admit a randomized errorless heuristic scheme.
We stress that “errorless” refers to the random input, not to the internal
coin tosses of the algorithm.
Definition 13.1 probably needs some explanation. Fix some input $x \in \operatorname{supp}(D_n)$ and consider running $A(x, n, \delta)$ $k$ times for some large $k$. If significantly more than $k/4$ of these runs return $\bot$, then we can interpret this as $A$ not knowing the answer for $x$. By the second condition of Definition 13.1, such inputs $x$ occur with probability at most $\delta$. On the other hand, if $\bot$ is returned fewer than $k/4$ times, then the first condition guarantees that we will see the right answer at least $k/2$ times (with high probability due to Chernoff's bound). The choice of the constant 1/4 in Definition 13.1 is arbitrary: Any constant strictly smaller than 1/3 serves well.
Excursus: Chernoﬀ bounds
Chernoff bounds are frequently used to bound large deviations from the expected value of random variables that are sums of independent indicator variables. The rough statement is: If we toss $n$ unbiased coins, we see $n/2 \pm O(\sqrt{n})$ heads with high probability.
More precisely: Let $X_1, \ldots, X_n$ be independent random variables that assume only values in $\{0,1\}$. Let $\Pr(X_i = 1) = p_i$, let $X = \sum_{i=1}^{n} X_i$, and let $E(X) = \sum_{i=1}^{n} p_i = \mu$. Then
\[ \Pr\bigl(X > E(X) + a\bigr) < \exp\Bigl(-\frac{2a^2}{n}\Bigr) \]
for all $a > 0$. By symmetry, we have the same bound for $\Pr(X < E(X) - a)$.
There are many variants of Chernoff bounds. Sometimes, they lead to slightly different bounds. For most applications, however, it does not matter which version we use.
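To get a feeling for how loose or tight the bound is, one can compare it with the exact tail of a binomial distribution. The following sketch covers only the special case in which all $p_i$ are equal to a common value p; the function names are hypothetical.

```python
from math import comb, exp

def upper_tail(n: int, p: float, a: float) -> float:
    """Exact Pr(X > E(X) + a) for X ~ Binomial(n, p), where E(X) = n * p."""
    mu = n * p
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(n + 1) if k > mu + a)

def chernoff_bound(n: int, a: float) -> float:
    """The bound exp(-2 a^2 / n) from the excursus."""
    return exp(-2 * a * a / n)
```

For instance, `upper_tail(100, 0.5, 10)` can be compared with `chernoff_bound(100, 10)`; the exact tail lies below the bound, as the statement above guarantees.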
Exercise 13.2. Let $A$ be a randomized errorless heuristic scheme. Let $A'$ be an algorithm that executes $A$ $k = k(n)$ times on inputs from $\operatorname{supp}(D_n)$ and outputs the majority vote. Prove that
\[ \Pr_A\bigl(A'(x, n, \delta) \notin \{L(x), \bot\}\bigr) \le 2^{-\Omega(k(n))} \]
and
\[ \Pr_{x \sim D_n}\Bigl(\Pr_A\bigl(A'(x, n, \delta) = \bot\bigr) \ge 2^{-\Omega(k(n))}\Bigr) \le \delta . \]
As in Definition 10.2, it is also possible to define randomized errorless heuristic algorithms or randomized heuristics that are allowed to make errors, but we will not do so here. We can also replace the constant 1/4 in (13.1) by 0. Then we obtain zero-error randomized errorless heuristic schemes.
Now what is the difference between errorless and zero-error? Note that we have two types of “errors” or “failure”: We can be unlucky to get a hard instance, and the algorithm, since randomized, may fail. Errorless means that there is no instance on which the algorithm $A$ errs. It is just allowed to produce $\bot$. However, if $A$ is randomized it may still have bad luck with its coin tosses, which may cause it to output a wrong answer. If this is not the case, then $A$ is called zero-error.
Exercise 13.3. We can also define a non-uniform variant of AvgP: A distributional problem $\Pi = (L, \mathcal{D})$ is in AvgP/poly if there exist an algorithm $A$, a polynomial $p$, and an advice function $a : \mathbb{N} \times (0, 1] \to \{0,1\}^*$ with $|a(n, \delta)| \le \mathrm{poly}(n, 1/\delta)$ such that the following holds:
1. For every $n$, every $\delta > 0$, and every $x \in \operatorname{supp}(D_n)$, $A(x, n, \delta, a(n, \delta))$ outputs either $L(x)$ or the failure symbol $\bot$.
2. For every $n$, every $\delta > 0$, and every $x \in \operatorname{supp}(D_n)$, $A(x, n, \delta, a(n, \delta))$ runs in time $p(n, 1/\delta)$.
3. For every $n$ and every $\delta > 0$, we have $\Pr_{x \sim D_n}\bigl(A(x, n, \delta, a(n, \delta)) = \bot\bigr) \le \delta$.
(One might prefer to define AvgP/poly in terms of circuits rather than Turing machines that take advice. But we want to have one circuit for each $n$ and $\delta$, and $\operatorname{supp}(D_n)$ can contain strings of different lengths. This technical problem can be solved, but why bother?)
Prove that $\mathrm{AvgBPP} \subseteq \mathrm{AvgP/poly}$.
13.2 Search algorithms
Now we turn to the definition of search algorithms. In order to avoid confusion, we will first define deterministic search algorithms that are efficient on average, although we will never use them. After that, we allow our search algorithms to use randomness.
Definition 13.2. Let $\Pi = (L, \mathcal{D})$ be a distributional problem with $L \in \mathrm{NP}$. An algorithm $A$ is a deterministic errorless search scheme for $\Pi$ if there is a polynomial $p$ such that the following holds:
1. For every $n$, $\delta > 0$, and every $x \in \operatorname{supp}(D_n)$, $A$ runs in time at most $p(n, 1/\delta)$.
2. For every $n$, $\delta > 0$, and every $x \in L \cap \operatorname{supp}(D_n)$, $A(x, n, \delta)$ outputs a witness for $x \in L$ or $\bot$.
3. For every $n$ and $\delta > 0$, we have $\Pr_{x \sim D_n}\bigl(A(x, n, \delta) = \bot\bigr) \le \delta$.
For $x \notin L$, the algorithm $A(x, n, \delta)$ can output anything. The above definition is not completely precise since the witness language is not unique. However, this really makes no difference here.
In the next deﬁnition, we allow our search algorithm to use randomness.
Definition 13.3. Let $\Pi = (L, \mathcal{D})$ be a distributional problem with $L \in \mathrm{NP}$. An algorithm $A$ is a randomized errorless search scheme for $\Pi$ if there is a polynomial $p$ such that the following is true:
1. For every $n$ and $\delta > 0$, $A$ runs in time $p(n, 1/\delta)$ and outputs either a string $w$ or $\bot$.
2. For every $n$, $\delta > 0$, and $x \in L \cap \operatorname{supp}(D_n)$,
\[ \Pr_A\bigl(A(x, n, \delta) \text{ outputs a witness for } x \text{ or } A(x, n, \delta) = \bot\bigr) > \frac{1}{2} . \]
3. For every $n$ and $\delta > 0$,
\[ \Pr_{x \sim D_n}\Bigl(\Pr_A\bigl(A(x, n, \delta) = \bot\bigr) > \frac{1}{4}\Bigr) \le \delta . \]
What does this definition mean? For any $x \in L$, $A$ may output a non-witness $w$. According to item (2), this happens with bounded probability. This is an internal failure of the algorithm $A$ and not due to $x$ being a hard instance. Item (3) bounds the probability that $A$ outputs $\bot$. Intuitively, $A$ outputs $\bot$ not because of internal failure, but because $x$ is a hard instance. This is called an external failure. However, there is at most a $\delta$ fraction of strings $x$ (measured with respect to $D_n$) on which $A$ outputs $\bot$ with significant probability.
The constants 1/2 and 1/4 in the definition are to some extent arbitrary. We can replace them by any constants $c$ and $c'$ with $1 > c > c' > 0$. Furthermore, these two failure probabilities can be decreased to $2^{-\Omega(k)}$ by executing the algorithm $k$ times: If we ever get a witness, we output this witness. If we see $\bot$ more than $c'k$ times, then we output $\bot$. Otherwise, we output an arbitrary string.
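The repetition rule just described can be sketched as follows. This is only an illustration under assumptions: `run_once` stands for one execution of the search scheme on x, `is_witness` for the polynomial-time check of the NP relation, and `None` plays the role of $\bot$; none of these names come from the notes.

```python
BOT = None  # stands for the failure symbol ⊥

def amplify_search(run_once, is_witness, x, k, c_prime):
    """Run the (hypothetical) search scheme k times and combine the outputs."""
    outputs = [run_once(x) for _ in range(k)]
    # If we ever get a witness, we output this witness.
    for w in outputs:
        if w is not BOT and is_witness(x, w):
            return w
    # If we see ⊥ more than c' * k times, we output ⊥.
    if sum(1 for w in outputs if w is BOT) > c_prime * k:
        return BOT
    # Otherwise, we output an arbitrary string.
    return "arbitrary"
```

Checking candidate witnesses is what makes the first rule safe: a verified witness is never wrong, so only the choice between $\bot$ and an arbitrary string depends on the threshold $c'$.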
Definition 13.3 allows the algorithm $A$ to output anything on input $x \notin L$ (but even then $\bot$ only with bounded probability). Thus, a randomized errorless search scheme can be used as a randomized decision algorithm: If we get a witness, then we know that $x \in L$. If we get neither a witness nor $\bot$, then we claim that $x \notin L$. If the answer is $\bot$, then we do not know. By amplifying probabilities, we can make sure that the probability of claiming $x \notin L$ although there exists a witness for $x$ is small.
13.3 Search-to-decision reduction
Recall that $\mathcal{U} = (U_n)_{n \in \mathbb{N}}$ is what we call the uniform distribution on $\{0,1\}^*$. Namely, $U_n(x) = 2^{-n}$ for $|x| = n$ and $U_n(x) = 0$ otherwise.
In the following, we will reduce search to decision in the average-case setting. Let us first see why the usual approach from worst-case complexity does not work. Let $L = \{x \mid \exists^P w : \langle x, w \rangle \in V\} \in \mathrm{NP}$ with $V$ being the corresponding witness language. Then, given $x$ and $y$, deciding if there exists a witness $w$ that is lexicographically at most $y$ is an NP language as well. Let
\[ W = \bigl\{ \langle x, y \rangle \mid \exists w : w \le y \wedge \langle x, w \rangle \in V \bigr\} , \]
where $w \le y$ means that $w$ is lexicographically smaller than or equal to $y$. Assuming that decision is easy, namely P = NP, we can use binary search to find a witness $w$ with $\langle x, w \rangle \in V$.
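The worst-case binary search can be sketched as follows. The sketch assumes that witnesses are bit strings of a fixed length p, represented as integers 0, ..., 2^p - 1 (so that the lexicographic order is the numerical order), and that `in_W(x, y)` is a hypothetical decision procedure for the language $W$.

```python
def find_witness(x, p, in_W):
    """Binary search for the smallest witness of x among 0 .. 2^p - 1.

    in_W(x, y) must decide whether some witness w <= y exists.
    Returns None if x has no witness at all.
    """
    hi = (1 << p) - 1
    if not in_W(x, hi):
        return None
    lo = 0
    # Invariant: the smallest witness lies in the interval [lo, hi].
    while lo < hi:
        mid = (lo + hi) // 2
        if in_W(x, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo
```

The procedure asks p + 1 queries, and every one of them must be answered correctly. This is what goes wrong on average: the late queries concern strings y close to the smallest witness, and an efficient-on-average algorithm for $W$ may fail on exactly those.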
What about the average case? Let $w_x$ be the lexicographically smallest witness for $x$. Suppose our efficient-on-average algorithm for $W$ works well on all instances $\langle x, y \rangle$ except for those $y$ that are close to $w_x$. Then our algorithm is able to find the most significant bits of $w_x$, but it fails to find a few least significant bits of $w_x$. Since most strings are not close to $w_x$, our algorithm can still be efficient on average.
Our goal in the remainder of this section is still to prove that search-to-decision reduction is possible in the average-case setting, despite what we sketched above.
To do this, let us first consider the scenario where every $x \in L$ has a unique witness $w_x$. Then we can ask NP questions like “is the $i$th bit of the witness for $x$ a 1?” Let $p$ be the (polynomial) length of witnesses. By querying the above for $i \in \{1, 2, \ldots, p(|x|)\}$, we can find the witness. In the following, let $|x| = n$ and $p = p(n)$.
Of course, we cannot assume in general that witnesses are unique. But we know tools to make witnesses unique (recall the Valiant–Vazirani theorem from the lecture “Computational Complexity Theory” [VV86]). We use a family $\mathcal{H}$ of pairwise independent hash functions $h : \{0,1\}^p \to \{0,1\}^p$: $\mathcal{H}$ consists of all functions $x \mapsto Ax + b$, where $A \in \{0,1\}^{p \times p}$ and $b \in \{0,1\}^p$. By restricting $h \in \mathcal{H}$ to the first $j$ bits, we obtain $h|_j$. Also $\{h|_j \mid h \in \mathcal{H}\}$ is a family of pairwise independent hash functions.
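For very small p, pairwise independence of this family can be checked exhaustively. The following sketch takes the arithmetic to be over GF(2) (addition modulo 2), which is the usual reading of $Ax + b$ here but is an assumption; the function names are hypothetical.

```python
from itertools import product

def affine_hash(A, b, x):
    """h(x) = A x + b over GF(2); A is a sequence of p rows, b and x are bit tuples."""
    return tuple((sum(a * xi for a, xi in zip(row, x)) + bi) % 2
                 for row, bi in zip(A, b))

def restrict(y, j):
    """h|_j: keep only the first j output bits."""
    return y[:j]

def joint_counts(p, j, x1, x2):
    """For every pair of values, count the h in the family with
    (h|_j(x1), h|_j(x2)) equal to that pair."""
    counts = {}
    for bits in product((0, 1), repeat=p * p + p):
        A = [bits[r * p:(r + 1) * p] for r in range(p)]
        b = bits[p * p:]
        key = (restrict(affine_hash(A, b, x1), j),
               restrict(affine_hash(A, b, x2), j))
        counts[key] = counts.get(key, 0) + 1
    return counts
```

For distinct inputs, every pair of hash values should be hit by the same number of functions. With p = 2 there are 64 functions and 16 pairs of values, so each pair should appear 4 times.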
Now we consider the language
\[ W' = \bigl\{ \langle x, h, i, j \rangle \mid \exists w : \langle x, w \rangle \in V \wedge w_i = 1 \wedge h|_j(w) = 0^j \bigr\} . \]
We build the quadruple $\langle x, h, i, j \rangle$ such that, for $|x| = n$, it always has length $q = q(n)$ for some polynomial $q$. Furthermore, we make sure that $x$, $h$, $i$ and $j$ are independent. This means that we can equivalently draw $x \in \{0,1\}^n$, $h \in \mathcal{H}$ as well as $i, j \in \{1, \ldots, p\}$ uniformly and independently at random and pair them to $\langle x, h, i, j \rangle$. In this way, we get the same distribution. This can be done since the lengths of $x$ and $h$ are fixed once $n$ is known. (We can, for instance, assume that $p$ is a power of 2. Then we can write $i$ and $j$ as binary strings of length $\log p$, possibly with leading 0s.)
It can be shown (see again the script of the lecture “Computational Complexity Theory”) that if $j$ is the logarithm of the number of witnesses for $x$, then, with at least constant, positive probability (taken over the choice of $h \in \mathcal{H}$), there is a unique witness $w$ for $x$ that also satisfies $h|_j(w) = 0$.
Now we proceed as follows:
1. Draw $h \in \mathcal{H}$ uniformly at random.
2. If, for some $j \in \{1, \ldots, p\}$, the sequence of answers to the queries $\langle x, h, 1, j \rangle, \ldots, \langle x, h, p, j \rangle$ yields a witness $w$ for $x \in L$, then we output this witness. (Note that $\langle x, w \rangle \in V$ can be checked in polynomial time.)
3. If an answer to some $\langle x, h, i, j \rangle$ is $\bot$, then we also output $\bot$. Otherwise, we output an arbitrary string.
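The three steps above can be sketched in code. This is a toy illustration under assumptions, not the construction as it is analyzed below: `ask(x, h, i, j)` is a hypothetical stand-in for the decision algorithm on the query $\langle x, h, i, j \rangle$ (returning `True`, `False`, or `None` for $\bot$), `verify` checks the NP relation, and a hash function is represented by a pair (A, b) of a bit matrix and a bit vector.

```python
import random

BOT = None  # stands for the failure symbol ⊥

def run_B(x, p, ask, verify, rng):
    """One run of the search procedure for witnesses of length p."""
    # Step 1: draw h(w) = A w + b uniformly at random.
    A = [[rng.randrange(2) for _ in range(p)] for _ in range(p)]
    b = [rng.randrange(2) for _ in range(p)]
    h = (A, b)
    saw_bot = False
    for j in range(1, p + 1):
        answers = [ask(x, h, i, j) for i in range(1, p + 1)]
        if any(a is BOT for a in answers):
            saw_bot = True
            continue
        # Step 2: read the answers as the bits of a candidate and check it.
        w = tuple(int(a) for a in answers)
        if verify(x, w):
            return w
    # Step 3: output ⊥ if some query was answered with ⊥, else anything.
    return BOT if saw_bot else "arbitrary"
```

If exactly one witness survives the hash constraint for some j, the p answers for that j spell out this witness bit by bit. If several survive, the answers describe a bitwise OR of witnesses, which the check in step 2 rejects.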
We call this algorithm $B$. Apart from technical details, which we will prove below, this proves the following theorem. The essential subtlety is that $h$ is part of the (random) input for $W'$, whereas $h$ is part of the internal randomness of $B$ for $L$. This means that $h$ appears in the outer probability in (13.2) of Definition 13.1 and in the inner probability of item (3) of Definition 13.3.
Theorem 13.4 (Ben-David et al. [BDCGL92]). If $(\mathrm{NP}, \mathcal{U}) \subseteq \mathrm{AvgBPP}$, then every problem in $(\mathrm{NP}, \mathcal{U})$ has an errorless randomized search algorithm.
Proof. We have already done the lion's share of the work. It remains to estimate the failure probabilities. Let $L \in \mathrm{NP}$ be arbitrary such that witnesses for $L$ have length $p$ for some polynomial $p$, and let $A$ be a randomized errorless heuristic scheme for $(W', \mathcal{U})$ with
\[ W' = \bigl\{ \langle x, h, i, j \rangle \mid \exists w : \langle x, w \rangle \in V \wedge w_i = 1 \wedge h|_j(w) = 0^j \bigr\} . \]
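Let $x \in L \cap \operatorname{supp}(U_n) = L \cap \{0,1\}^n$, and let $\delta > 0$ be arbitrary. Our search algorithm $B$ proceeds as described above. We call $A$ with a failure probability of $\alpha$ to achieve a failure probability of $\delta$ for $B$. Furthermore, we amplify condition (13.1) of Definition 13.1 to
\[ \Pr_A\bigl(A(y, q, \alpha) \notin \{W'(y), \bot\}\bigr) \le \beta \tag{13.3} \]
for every $y = \langle x, h, i, j \rangle$ and condition (13.2) to
\[ \Pr_{y = \langle x, h, i, j \rangle \sim U_q}\Bigl(\Pr_A\bigl(A(y, q, \alpha) = \bot\bigr) \ge \gamma\Bigr) \le \alpha . \]
We will specify $\alpha$, $\beta$, and $\gamma$ later on.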
The algorithm $B$ described above obviously runs in polynomial time. The failure probabilities remain to be analyzed: We have to find constants $c'$ and $c$ with $0 < c' < c < 1$ such that
\[ \Pr_B\bigl(B(x, n, \delta) \text{ yields a witness or } \bot\bigr) \ge c \tag{13.4} \]
for each $x \in L \cap \{0,1\}^n$ and
\[ \Pr_{x \sim U_n}\Bigl(\Pr_B\bigl(B(x, n, \delta) = \bot\bigr) > c'\Bigr) \le \delta \tag{13.5} \]
for every $n$ and $\delta > 0$.
To show (13.4), consider any $x \in L$. The probability that we draw a hash function $h$ with the property that there exists a $j$ such that $x$ possesses a unique witness $w$ with $h|_j(w) = 0$ is at least 1/8 (script of the lecture “Computational Complexity Theory”, Lemma 17.3). We call such an $h$ good for $x$. Fix an arbitrary good $h$. If $B$ draws this $h$, then a sufficient condition for $B$ to output a witness or $\bot$ is that $A$ never outputs a wrong answer. The probability that $A$ outputs a wrong answer (i.e., neither correct nor $\bot$) is at most $p^2 \beta$ by a union bound over all $i$ and $j$. Thus, the probability that
1. $B$ samples an $h$ that is good for $x$ and
2. $A$ never gives a wrong answer
is at least $c = \frac{1}{8}(1 - p^2 \beta)$. We choose $\beta = \frac{1}{5p^2}$, which yields $c = 1/10$.
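Before specifying the parameters $\alpha$ and $\gamma$, let us also analyze (13.5). To do this, let
\[ Z = \Bigl\{ x \in \{0,1\}^n \Bigm| \Pr_h\Bigl(\exists i, j : \Pr_A\bigl(A(\langle x, h, i, j \rangle, q, \alpha) = \bot\bigr) \ge \gamma\Bigr) \ge \varphi \Bigr\} \]
be the set of bad strings. Let us analyze the probability $\Pr_{x \sim U_n}(x \in Z)$ that a random $x$ is bad. We have
\[ \Pr_{x, h}\Bigl(\exists i, j : \Pr_A\bigl(A(\langle x, h, i, j \rangle, q, \alpha) = \bot\bigr) \ge \gamma\Bigr) \ge \varphi \cdot \Pr_{x \sim U_n}(x \in Z) . \]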
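Thus,
\[ \underbrace{\Pr_{y = \langle x, h, i, j \rangle}\Bigl(\Pr_A\bigl(A(y, q, \alpha) = \bot\bigr) \ge \gamma\Bigr)}_{\le \alpha \text{ by (13.3)}} \ge \frac{\varphi \cdot \Pr_{x \sim U_n}(x \in Z)}{p^2} . \]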
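From this, we learn $\Pr_{x \sim U_n}(x \in Z) \le \frac{\alpha p^2}{\varphi}$. We want
\[ \Pr_{x \sim U_n}(x \in Z) \le \delta , \tag{13.6} \]
thus we put the constraint $\frac{\alpha p^2}{\varphi} \le \delta$. For $x \notin Z$, we have
\[ \Pr_B\bigl(B(x, n, \delta) = \bot\bigr) \le \varphi + (1 - \varphi) p^2 \gamma . \]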
Now we choose $\varphi = 1/40$ and $\gamma = \frac{1}{40 p^2}$. This also specifies $\alpha$ to $\alpha = \delta \varphi / p^2$, which satisfies our constraint $\frac{\alpha p^2}{\varphi} \le \delta$. This specification of $\varphi$ and $\gamma$ yields
\[ \Pr_B\bigl(B(x, n, \delta) = \bot\bigr) \le \frac{1}{20} \tag{13.7} \]
for $x \notin Z$. We set $c' = 1/20 < c$. Then item (3) of Definition 13.3 follows from (13.6) and (13.7).
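The parameter choices of this proof can be double-checked with exact rational arithmetic. The following sketch merely re-evaluates the expressions from the proof for a concrete witness length p and failure bound delta; the function name is hypothetical.

```python
from fractions import Fraction

def parameters(p: int, delta: Fraction):
    """Evaluate the parameter choices of the proof for given p and delta."""
    beta = Fraction(1, 5 * p * p)
    phi = Fraction(1, 40)
    gamma = Fraction(1, 40 * p * p)
    alpha = delta * phi / (p * p)
    # Lower bound c = (1/8)(1 - p^2 beta) from the analysis of (13.4).
    c = Fraction(1, 8) * (1 - p * p * beta)
    # Upper bound phi + (1 - phi) p^2 gamma on Pr_B(B = ⊥) for x not in Z.
    bot_bound = phi + (1 - phi) * p * p * gamma
    return alpha, beta, gamma, phi, c, bot_bound
```

Independently of p, this gives c = 1/10 and a bound of 79/1600 on the probability of $\bot$, which is slightly below c' = 1/20.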
14 Hardness ampliﬁcation
Assume that we have a function $f$ such that $f$ is hard-on-average in a weak sense. This means that every algorithm has a non-negligible chance of making a mistake when evaluating $f$ on a random input. But there still might be algorithms that get a huge portion (for instance, a $1 - 1/\mathrm{poly}(n)$ fraction) of the inputs right. The goal of hardness amplification is the following: If there is such a function $f$, then we can get a related problem $g$ from $f$ such that $g$ is hard-on-average in the strongest possible sense: No algorithm can do significantly better than simply tossing a fair coin. To put it the other way round: If, for some class of functions, we can compute every function in this class with a non-trivial error probability (i.e., significantly less than 1/2), then we can amplify this to make the error probability very small (i.e., the success probability becomes $1 - 1/\mathrm{poly}(n)$). (Note that this does not work by simple Chernoff bounds: it is the hardness of the instance that causes the algorithm to fail, not bad luck with its random bits. In fact, our algorithms in this section will always be deterministic.)
Yao's XOR lemma is a powerful tool for hardness amplification. The idea is simple: If $f$ is slightly hard on average, then $g$, given by $g(x_1, \ldots, x_k) = f(x_1) \oplus \ldots \oplus f(x_k)$, is very hard on random $x_1, \ldots, x_k$. The intuitive reason is as follows: Although the probability that a specific $x$ is hard is small, the probability that at least one of $x_1, \ldots, x_k$ is hard is much higher. However, intuition says that we need all of $f(x_1), \ldots, f(x_k)$ to compute $g(x_1, \ldots, x_k)$ correctly.
Exercise 14.1. Let $X_1, \ldots, X_n \in \{0,1\}$ be independent random variables with $\Pr(X_i = 1) = p$. Prove that
\[ \Pr\Bigl(\sum_{i=1}^{n} X_i \text{ is even}\Bigr) = \frac{1 + (1 - 2p)^n}{2} . \]
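Before proving the identity, it can be checked numerically for small n by summing over all outcomes. The following sketch does this by brute force; the function names are hypothetical.

```python
from itertools import product

def prob_even_exact(n: int, p: float) -> float:
    """Pr(X_1 + ... + X_n is even), summed over all 2^n outcomes."""
    total = 0.0
    for bits in product((0, 1), repeat=n):
        ones = sum(bits)
        if ones % 2 == 0:
            total += p**ones * (1 - p)**(n - ones)
    return total

def prob_even_formula(n: int, p: float) -> float:
    """The closed form (1 + (1 - 2p)^n) / 2 from the exercise."""
    return (1 + (1 - 2 * p)**n) / 2
```

The term $(1 - 2p)^n$ tends to 0 exponentially fast for $0 < p < 1$, so the parity of many independent biased bits is almost an unbiased coin. This is the quantitative core of the intuition behind the XOR lemma.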
For simplicity, we restrict ourselves to (non-uniform) circuits in this section. In the next section, we will state hardness amplification results for NP languages.
Definition 14.1. We say that a Boolean function $f : \{0,1\}^n \to \{0,1\}$ is $(s, \delta)$-hard with respect to a distribution $D$ if, for every circuit $C$ of size at most $s$, we have
\[ \Pr_{x \sim D}\bigl(f(x) \neq C(x)\bigr) > \delta . \]
What does this mean? For every circuit $C$ of size at most $s$, there exists a set $H$ of size $2\delta 2^n$ such that using $C$ to compute $f(x)$ for $x \in H$ is about as good as tossing a fair coin.
We also need the notion of the advantage that a circuit has in computing a certain function.
Definition 14.2. Let $f$ be a function, $C$ be a circuit, and $D$ be a distribution on inputs for $f$ and $C$. If
\[ \Pr_{x \sim D}\bigl(f(x) = C(x)\bigr) = \frac{1}{2}(1 + \varepsilon) , \]
then we say that $C$ has an advantage of $\varepsilon$ with respect to $D$. (By definition, the advantage $\varepsilon$ is a number in the interval $[0, 1]$.)
We will prove Yao's XOR lemma in Section 14.2. There are several different proofs of this lemma. An elegant (and quite intuitive) proof is via Impagliazzo's hard-core set lemma (Section 14.1). This lemma is also interesting in its own right and a somewhat surprising result. There are at least two different proofs of this lemma. An elegant (and quite intuitive) proof is via von Neumann's min-max theorem. Impagliazzo attributes this proof to Nisan.
14.1 Impagliazzo's hard-core set lemma
Note the quantifiers in Definition 14.1: For all circuits $C$, a hard set $H$ exists. Impagliazzo's hard-core set lemma states that we can switch the quantifiers: There exists a set $H$ such that for all $C$ computing $f$ on $H$ is hard. We will prove the hard-core set lemma in two steps: First, we show that there is a probability distribution over $\{0,1\}^n$ such that $f$ is hard with respect to this probability distribution. Second, we show how to get a set from this distribution.
Lemma 14.3. Let $f : \{0,1\}^n \to \{0,1\}$ be an $(s, \delta)$-hard function with respect to the uniform distribution on $\{0,1\}^n$, and let $\varepsilon > 0$. Then there is a probability distribution $D$ on $\{0,1\}^n$ with the following properties:
1. $f$ is $\bigl(s \cdot \frac{-\varepsilon^2}{8 \log(\varepsilon\delta)}, \frac{1}{2} - \varepsilon\bigr)$-hard with respect to $D$.
2. $D(x) \le \frac{1}{\delta} \cdot 2^{-n}$ for all $x \in \{0,1\}^n$.
Proof overview: We have to switch quantifiers: We have “for all circuits, there exists a hard set”, and we want “there exists a hard set such that for all circuits”. We model this by a zero-sum game (see excursus below): One player's strategies are circuits $C$, the other player's strategies are sets $H$. The amount that the first player (who plays $C$) gets from the second player (who plays $H$) is proportional to the number of inputs of $H$ that $C$ gets right. Then we can use von Neumann's min-max theorem to switch quantifiers.
Proof. Consider the following two-player game: Player D picks a set $T$ of $\delta 2^n$ strings from $\{0,1\}^n$. Player C picks a circuit of size $s' = s \cdot \frac{-\varepsilon^2}{8 \log(\varepsilon\delta)}$. The payoff for C is $\Pr_{x \sim T}(f(x) = C(x))$. This is a zero-sum game, and we can apply the min-max theorem: Either player D has a mixed strategy against which no mixed strategy of player C achieves a payoff of at least $\frac{1}{2} + \varepsilon$, or there is a mixed strategy for player C with which C gets a payoff of at least $\frac{1}{2} + \varepsilon$ against any (mixed) strategy of player D.
Consider the first case. This means the following: There exists a distribution $\mathcal{D}'$ on sets of size $\delta 2^n$ such that every circuit $C$ of size at most $s'$, which corresponds to the pure strategies of player C, achieves only
\[ E_{T \sim \mathcal{D}'}\Bigl(\Pr_{x \sim T}\bigl(f(x) = C(x)\bigr)\Bigr) \le \frac{1}{2} + \varepsilon . \]
This is the same as the probability that $f(x) = C(x)$ if $x$ is drawn according to the following distribution $D$: First, draw $T \sim \mathcal{D}'$. Second, draw $x \in T$ uniformly at random. This probability distribution $D$ is as stated in the lemma: For each $x_0 \in \{0,1\}^n$, we have $\Pr_{x \sim D}(x = x_0) = \Pr_{T \sim \mathcal{D}'}(x_0 \in T) \cdot \frac{1}{\delta} 2^{-n} \le \frac{1}{\delta} 2^{-n}$. Thus, $f$ is $(s', \frac{1}{2} - \varepsilon)$-hard with respect to $D$, which is the statement of the lemma for this case.
Now consider the second case. There exists a probability distribution $\mathcal{C}$ on circuits of size $s'$ such that, for every subset $T \subseteq \{0,1\}^n$ of cardinality $\delta 2^n$, we have
\[ \Pr_{x \sim T, \, C \sim \mathcal{C}}\bigl(C(x) = f(x)\bigr) \ge \frac{1}{2} + \varepsilon , \]
which corresponds to an average advantage of $2\varepsilon$.
Let $U$ be the set of inputs $x$ for which the distribution $\mathcal{C}$ on circuits achieves an advantage of at most $\varepsilon$ in computing $f$. (“Advantage” is generalized to distributions over circuits in the obvious way.)
Claim 14.4. $|U| \le \delta(1 - \varepsilon) 2^n$.
Proof of Claim 14.4. Assume to the contrary that $|U| > \delta(1 - \varepsilon) 2^n$. If $|U| \ge \delta 2^n$, then $U$ would give rise to a strategy of player D to keep the payoff to at most $\frac{1}{2}(1 + \varepsilon)$, which contradicts the assumption.
Otherwise, consider any set $T' \supseteq U$ of cardinality $\delta 2^n$ for which $\mathcal{C}$ achieves the smallest advantage. Since $|U| > \delta(1 - \varepsilon) 2^n$, we have $|T' \setminus U| < \varepsilon\delta 2^n$. Then the advantage of $\mathcal{C}$ on $T'$ would be smaller than
\[ \frac{1}{\delta 2^n}\bigl(|T' \cap U| \cdot \varepsilon + |T' \setminus U|\bigr) < \frac{1}{\delta 2^n}\bigl(\varepsilon\delta 2^n + \varepsilon\delta 2^n\bigr) = 2\varepsilon . \]
This contradicts the average advantage of at least $2\varepsilon$ on this set $T'$ (which was the assumption of the second case).
Now we construct a circuit $\tilde{C}$ of size $s$ that gets $f(x)$ right for more than a $1 - \delta$ fraction of the inputs, which contradicts the assumption that $f$ is $(s, \delta)$-hard. The idea is as follows: On $U$, we might have only little chance to get the right answer. Thus, we ignore inputs from $U$. For inputs from $\{0,1\}^n \setminus U$, however, we have a non-trivial chance of computing the right answer if we sample circuits according to the distribution $\mathcal{C}$, which is an optimal strategy for player C. Then we amplify probabilities by sampling more than one circuit and taking the majority outcome.
More precisely, let us draw $t$ independent random circuits according to the distribution $\mathcal{C}$. Our new circuit $\tilde{C}$ outputs the majority output of these $t$ circuits. Fix any $x \notin U$. We can bound the probability that $\tilde{C}$ gets $f(x)$ wrong using Chernoff bounds: $\tilde{C}$ errs only if at most $t/2$ of its subcircuits give the right output. The expected number of correct outputs is at least $\frac{t}{2}(1 + \varepsilon)$. Thus, Chernoff bounds yield an upper bound of $\exp(-2(t\varepsilon/2)^2/t) = \exp(-t\varepsilon^2/2)$. We set $t = \frac{-4\log(\varepsilon\delta)}{\varepsilon^2} \ge \frac{-2\log(\varepsilon\delta/2)}{\varepsilon^2}$. This gives a probabilistic construction of $\tilde{C}$ that errs on only an $\varepsilon\delta/2$ fraction of the inputs not from $U$. (Note that we do not need to construct $\tilde{C}$ explicitly. Its existence suffices.) Since $|U| \le \delta(1 - \varepsilon) 2^n$, $\tilde{C}$ errs with a probability of at most $\delta(1 - \varepsilon) + \varepsilon\delta/2 \le \delta$ for random $x \in \{0,1\}^n$.
Since $\tilde{C}$ consists of $t$ circuits of size $s'$, its size is at most $2ts' = s$. This contradicts the assumption that $f$ is $(s, \delta)$-hard.
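The two computations used at the end of this proof can be written out as follows. This is a worked check under two assumptions that are not spelled out above: $\log$ denotes the natural logarithm, and the rounding of $t$ to an integer is ignored.

```latex
% Size of the majority circuit: plug in t and s'.
\[
  2 t s'
  = 2 \cdot \frac{-4 \log(\varepsilon\delta)}{\varepsilon^2}
      \cdot s \cdot \frac{-\varepsilon^2}{8 \log(\varepsilon\delta)}
  = s .
\]
% Error probability for a fixed x outside U: use t >= -2 log(eps*delta/2) / eps^2.
\[
  \exp\Bigl(-\frac{t \varepsilon^2}{2}\Bigr)
  \le \exp\bigl(\log(\varepsilon\delta/2)\bigr)
  = \frac{\varepsilon\delta}{2} .
\]
```

The first line shows why the factor $\frac{-\varepsilon^2}{8\log(\varepsilon\delta)}$ appears in the size bound of Lemma 14.3: it is chosen so that the majority of $t$ circuits of size $s'$ has size exactly $s$.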
Excursus: Min-max theorem
A zero-sum game is a game between two players such that the loss of one player is the gain of the other. A zero-sum game can be modeled by a matrix $A = (a_{i,j})_{1 \le i \le m, 1 \le j \le n} \in \mathbb{R}^{m \times n}$. The game consists of one player, called the maximizer, choosing an $i \in \{1, \ldots, m\}$ and the other player, called minimizer, choosing a $j \in \{1, \ldots, n\}$. Then the minimizer has to pay $a_{i,j}$ to the maximizer. (If $a_{i,j} < 0$, then the maximizer has to pay $-a_{i,j}$ to the minimizer.) The set $\{1, \ldots, m\}$ is the set of pure strategies of the maximizer. The set $\{1, \ldots, n\}$ is the set of pure strategies of the minimizer.
The order in which the players choose matters, as can be seen easily from the simple game with $m = n = 2$ and $a_{i,j} = (-1)^{i+j}$.
However, if we allow the players to use randomized strategies (so-called mixed strategies), then the order of play does not matter. This is what the min-max theorem by von Neumann [vN28] says.
More precisely: A mixed strategy is a probability distribution over the pure strategies of a player. In our case, it is simply a vector $p \in [0, 1]^n$ with $\sum_{j=1}^{n} p_j = 1$ for the minimizer and a vector $q \in [0, 1]^m$ with $\sum_{i=1}^{m} q_i = 1$ for the maximizer. The outcome of the game is then $q^T A p$. Then the min-max theorem says
\[ \min_{\substack{p \in [0,1]^n \\ \sum_{j=1}^{n} p_j = 1}} \; \max_{\substack{q \in [0,1]^m \\ \sum_{i=1}^{m} q_i = 1}} q^T A p \;=\; \max_{\substack{q \in [0,1]^m \\ \sum_{i=1}^{m} q_i = 1}} \; \min_{\substack{p \in [0,1]^n \\ \sum_{j=1}^{n} p_j = 1}} q^T A p . \]
The number $\min_p \max_q q^T A p = \max_q \min_p q^T A p$ is called the value of the game.
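The simple 2 × 2 game mentioned above can be evaluated directly. The following sketch computes what each player can guarantee with pure strategies, depending on who has to commit first, and the payoff under mixed strategies; the function names are hypothetical.

```python
# The game with m = n = 2 and a_{i,j} = (-1)^{i+j}.
A = [[(-1) ** (i + j) for j in (1, 2)] for i in (1, 2)]

def maximizer_first(A):
    """The maximizer commits to a row; the minimizer then picks the best column."""
    return max(min(row) for row in A)

def minimizer_first(A):
    """The minimizer commits to a column; the maximizer then picks the best row."""
    rows, cols = range(len(A)), range(len(A[0]))
    return min(max(A[i][j] for i in rows) for j in cols)

def payoff(q, A, p):
    """Expected payoff q^T A p for mixed strategies q (rows) and p (columns)."""
    return sum(q[i] * A[i][j] * p[j]
               for i in range(len(A)) for j in range(len(A[0])))
```

With pure strategies, whoever commits first loses: `maximizer_first(A)` is -1 and `minimizer_first(A)` is 1. If the minimizer plays the uniform mixed strategy, every strategy of the maximizer yields payoff 0, and by symmetry the same holds with the roles exchanged, so the value of this game is 0.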
The goal of the next lemma is to get a hard-core set from the hard-core distribution just constructed.
Lemma 14.5. Let $D : \{0,1\}^n \to [0, 1]$ be a probability distribution such that $D(x) \le \frac{1}{\delta} 2^{-n}$ for all $x \in \{0,1\}^n$. Let $f : \{0,1\}^n \to \{0,1\}$ be a function such that $f$ is $(s, \frac{1}{2} - \frac{\varepsilon}{2})$-hard with respect to $D$ for $2n < s < \frac{1}{16n} 2^n (\varepsilon\delta)^2$. Then there exists a set $H \subseteq \{0,1\}^n$ such that $f$ is $(s, \frac{1}{2} - \varepsilon)$-hard with respect to the uniform distribution on $H$.
Proof overview: We use the probabilistic method: We draw a set according to the hard-core distribution. Then we take a union bound over all possible circuits to bound the probability that there exists a circuit that achieves a significant advantage on this (random) set. Since this probability will be bounded away from 1, there exists a set such that no circuit achieves a significant advantage on this set. This will be our hard-core set.
Proof. The construction of our set $H$ will again be probabilistic. Let $H$ be the random set obtained by placing $x$ into $H$ with a probability of $\delta 2^n D(x)$. The expected number of elements in $H$ is $\delta 2^n$. With non-zero probability, this set $H$ will have the desired property.
The number of circuits of size $s$ is upper-bounded by
\[ 2(2n + s)^{2s} \le 2^{2ns} < \frac{1}{4} \exp\Bigl(\frac{2^n \varepsilon^2 \delta^2}{8}\Bigr) . \]
Let $C$ be any circuit of size $s$. Let
\[ A_C(H) = \bigl|\{x \in H \mid f(x) = C(x)\}\bigr| \]
be the number of strings $x \in H$ that $C$ gets right. We have
\[ E_H\bigl(A_C(H)\bigr) \le \Bigl(\frac{1}{2} + \frac{\varepsilon}{2}\Bigr) \delta 2^n \]
by the assumption that $\Pr_{x \sim D}(C(x) = f(x)) \le \frac{1}{2} + \frac{\varepsilon}{2}$ for every $C$. The random variable $A_C(H)$ is a sum of $2^n$ independent indicator random variables, one for each string $x$. This brings Chernoff bounds into play:
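\[ \Pr_H\Bigl(A_C(H) \ge \Bigl(\frac{1}{2} + \frac{3\varepsilon}{4}\Bigr) \delta 2^n\Bigr) \le \Pr_H\Bigl(A_C(H) \ge E\bigl(A_C(H)\bigr) + \frac{\varepsilon\delta}{4} 2^n\Bigr) < \exp\Bigl(-\frac{2 \bigl(\frac{\varepsilon\delta}{4} 2^n\bigr)^2}{2^n}\Bigr) = \exp\Bigl(-\frac{\varepsilon^2 \delta^2 2^n}{8}\Bigr) . \]
Furthermore, also by Chernoff bounds, $H$ is unlikely to be small:
\[ \Pr_H\Bigl(|H| < \delta 2^n \Bigl(1 - \frac{\varepsilon}{4}\Bigr)\Bigr) < \exp\Bigl(-\frac{\varepsilon^2 \delta^2 2^n}{8}\Bigr) . \]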
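Now we take a union bound over all circuits of size $s$: The probability that, for a random set $H$, there exists a circuit $C$ with $A_C(H) \ge (\frac{1}{2} + \frac{3\varepsilon}{4}) \delta 2^n$ or $|H| < \delta 2^n (1 - \frac{\varepsilon}{4})$ is bounded by
\[ \Pr_H\Bigl(\exists C : A_C(H) \ge \Bigl(\frac{1}{2} + \frac{3\varepsilon}{4}\Bigr) \delta 2^n \;\vee\; |H| < \delta 2^n \Bigl(1 - \frac{\varepsilon}{4}\Bigr)\Bigr) \le \frac{1}{4} \exp\Bigl(\frac{\varepsilon^2 \delta^2 2^n}{8}\Bigr) \cdot 2 \exp\Bigl(-\frac{\varepsilon^2 \delta^2 2^n}{8}\Bigr) = \frac{1}{2} . \]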
We can conclude that there exists a set $H$ with the following properties:
• $|H| \ge \delta 2^n (1 - \frac{\varepsilon}{4})$.
• There is no circuit $C$ of size $s$ that gets more than $(\frac{1}{2} + \frac{3\varepsilon}{4}) \delta 2^n$ strings from $H$ right.
If $|H| = \delta 2^n$, then we are done. If $|H| < \delta 2^n$, we add $\delta 2^n - |H|$ arbitrary elements to $H$ and call the new set again $H$. Since we add at most $\frac{\varepsilon}{4} \delta 2^n$ elements, no circuit gets more than $(\frac{1}{2} + \varepsilon) \delta 2^n$ strings of $H$ right. If $|H| > \delta 2^n$, then we remove $|H| - \delta 2^n$ elements from $H$ and again call the new set $H$. Since we only remove elements, no circuit gets more strings of the new set right than it got for the old set. Thus, this set also meets our requirements.
From the two lemmas above, the main result of this section follows easily.
Theorem 14.6 (Impagliazzo [Imp95]). Let $f : \{0,1\}^n \to \{0,1\}$ be a function that is $(s, \delta)$-hard with respect to the uniform distribution. Then, for every $\varepsilon > 0$, there exists a set $H \subseteq \{0,1\}^n$ (called hard-core set for $f$) of cardinality at least $\delta 2^n$ such that $f$ is $\bigl(s \cdot \frac{-\varepsilon^2}{64 \log(\varepsilon\delta)}, \frac{1}{2} - \varepsilon\bigr)$-hard with respect to the uniform distribution over $H$.
Proof. By Lemma 14.3, there exists a distribution $D$ such that $f$ is $\bigl(s \cdot \frac{-\varepsilon^2}{64 \log(\varepsilon\delta)}, \frac{1}{2} - \frac{\varepsilon}{2}\bigr)$-hard with respect to $D$ and $D$'s density is bounded by $\frac{1}{\delta} 2^{-n}$. (Note that we use $\varepsilon/2$ instead of $\varepsilon$, which yields the worse constant.) Now Lemma 14.5 shows the existence of a set $H$ of cardinality $\delta 2^n$ such that $f$ is $\bigl(s \cdot \frac{-\varepsilon^2}{64 \log(\varepsilon\delta)}, \frac{1}{2} - \varepsilon\bigr)$-hard with respect to the uniform distribution on $H$.
Exercise 14.2. It is, in fact, possible to show an even stronger statement: Assume that $f : \{0,1\}^n \to \{0,1\}$ is $(s, \delta)$-hard, and let $\eta > 0$ be an arbitrary constant. Then there exists a set $H \subseteq \{0,1\}^n$ of cardinality at least $(2 - \eta)\,\delta 2^n$ such that $f$ is $(s \cdot \mathrm{poly}(\varepsilon, \delta, \eta),\ \frac{1}{2} - \varepsilon)$-hard with respect to the uniform distribution.
Prove this!
Hint: Modify Lemma 14.3, then the rest follows.
This is close to optimal: Assume that there is a circuit $C$ that errs only with a probability of $\delta$, and consider any set $H$ of cardinality significantly larger than $2\delta 2^n$. Then the probability that $C$ errs on a random input is significantly smaller than 1/2.
14.2 Yao’s XOR lemma
The XOR lemma is attributed to Yao [Yao82]. The version that we present
here is due to Impagliazzo [Imp95].
Theorem 14.7 (Yao's XOR lemma). Let $f : \{0,1\}^n \to \{0,1\}$ be $(s, \delta)$-hard with respect to the uniform distribution. Let $k \ge 1$, and let $g : \{0,1\}^{kn} \to \{0,1\}$ be given by
$$g(x_1, \ldots, x_k) = f(x_1) \oplus f(x_2) \oplus \ldots \oplus f(x_k).$$
Then, for every $\varepsilon > 0$, the function $g$ is $\bigl(s \cdot \frac{-\varepsilon^2}{100 \log(\varepsilon\delta)},\ \frac{1}{2} - \varepsilon - (1 - \delta)^k\bigr)$-hard with respect to the uniform distribution.
Proof overview: Let $H$ be a hardcore set of $f$ as in Theorem 14.6. The probability that one specific $x_i$ is in $H$ is $\delta$. We ignore $\varepsilon$ for the moment. If $x_i \in H$, then the probability of computing $g(x_1, \ldots, x_k)$ correctly stays about the same if we replace $f(x_i)$ by a random bit $b$ and compute
$$f(x_1) \oplus \ldots \oplus f(x_{i-1}) \oplus b \oplus f(x_{i+1}) \oplus \ldots \oplus f(x_k).$$
A random bit xored with something is still a random bit. Thus, we get the right answer in this case only with probability 1/2.
Our only hope is that none of $x_1, \ldots, x_k$ is in $H$. This happens with a probability of $(1 - \delta)^k$. Thus, we compute $g$ correctly only with a probability of $\frac{1}{2} + (1 - \delta)^k$.
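To get a feeling for the numbers in the heuristic bound $\frac{1}{2} + (1 - \delta)^k$, the following small Python sketch evaluates it and searches for the smallest number $k$ of copies with $(1 - \delta)^k \le \varepsilon$. The function names `xor_success_bound` and `copies_needed` are made up for this illustration, and the sketch only mirrors the informal overview, not the formal statement of Theorem 14.7.

```python
import math


def xor_success_bound(delta: float, k: int) -> float:
    """Heuristic success bound 1/2 + (1 - delta)^k from the overview, capped at 1."""
    return min(1.0, 0.5 + (1.0 - delta) ** k)


def copies_needed(delta: float, eps: float) -> int:
    """Smallest k >= 1 with (1 - delta)^k <= eps, for 0 < delta < 1 and 0 < eps < 1."""
    k = max(1, math.ceil(math.log(eps) / math.log(1.0 - delta)))
    # The two loops only correct possible rounding errors of the logarithms.
    while (1.0 - delta) ** k > eps:
        k += 1
    while k > 1 and (1.0 - delta) ** (k - 1) <= eps:
        k -= 1
    return k


if __name__ == "__main__":
    delta, eps = 0.1, 0.01
    k = copies_needed(delta, eps)
    print(k, xor_success_bound(delta, k))
```

Since $(1 - \delta)^k \le e^{-\delta k}$, about $\ln(1/\varepsilon)/\delta$ copies always suffice in this heuristic picture.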
The problem with this proof idea is that a circuit for computing $g$ does not necessarily proceed by first computing $f(x_1), \ldots, f(x_k)$. It is allowed to do anything. Nevertheless, this idea can be turned into a proof of the XOR lemma.
Proof. Let $H$ be a hardcore set for $f$ of cardinality at least $\delta 2^n$ as in Theorem 14.6. Assume to the contrary that there exists a circuit $C$ of size $s' = s \cdot \frac{-\varepsilon^2}{100 \log(\varepsilon\delta)}$ such that
$$\Pr_{x_1, \ldots, x_k}\bigl[C(x_1, \ldots, x_k) = g(x_1, \ldots, x_k)\bigr] > \frac{1}{2} + (1 - \delta)^k + \varepsilon. \tag{14.1}$$
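Let $D$ be the uniform distribution over $(x_1, \ldots, x_k)$ with $x_i \in \{0,1\}^n$, conditioned on at least one $x_i$ being in the hardcore set $H$. Since the probability that no $x_i$ lies in $H$ is at most $(1 - \delta)^k$, (14.1) yields
$$\Pr_{(x_1, \ldots, x_k) \sim D}\bigl[C(x_1, \ldots, x_k) = g(x_1, \ldots, x_k)\bigr] \ge \Pr_{x_1, \ldots, x_k}\bigl[C(x_1, \ldots, x_k) = g(x_1, \ldots, x_k)\bigr] - \Pr_{x_1, \ldots, x_k}\bigl[\nexists i : x_i \in H\bigr] > \frac{1}{2} + \varepsilon. \tag{14.2}$$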
Let us take a different view on the distribution $D$: First, we pick a non-empty set $T \subseteq \{1, \ldots, k\}$ with an appropriate distribution. Then, we choose $x_i \in H$ for $i \in T$ uniformly at random and $x_i \in \{0,1\}^n \setminus H$ for $i \notin T$. Let the latter distribution be $D_T$. Thus, we can rewrite (14.2) as
$$\mathrm{E}_T\Bigl[\Pr_{(x_1, \ldots, x_k) \sim D_T}\bigl[C(x_1, \ldots, x_k) = g(x_1, \ldots, x_k)\bigr]\Bigr] > \frac{1}{2} + \varepsilon.$$
Fix a set $T$ that maximizes the inner probability. Without loss of generality, we assume that $1 \in T$. Then we can further rewrite the probability as
$$\mathrm{E}_{x_2, \ldots, x_k}\Bigl[\Pr_{x_1 \sim H}\bigl[C(x_1, \ldots, x_k) = g(x_1, \ldots, x_k)\bigr]\Bigr] > \frac{1}{2} + \varepsilon,$$
where, by abusing notation, $x_1 \sim H$ means that $x_1$ is drawn uniformly at random from $H$. Now let $a_j$ for $j > 1$ be the assignment for $x_j$ that maximizes the above probability. This yields
$$\Pr_{x_1 \sim H}\bigl[C(x_1, a_2, \ldots, a_k) \oplus f(a_2) \oplus \ldots \oplus f(a_k) = f(x_1)\bigr] > \frac{1}{2} + \varepsilon,$$
where we have rearranged terms to isolate $f(x_1)$. To get a circuit for $f$ from $C$, we replace $x_2, \ldots, x_k$ by the constants $a_2, \ldots, a_k$. Then we observe that $f(a_2) \oplus \ldots \oplus f(a_k)$ is a constant. Thus, we possibly have to negate the output of $C$. This increases the size by at most 1. Thus, we have a circuit $C'$ of size $s' + 1 \le s \cdot \frac{-\varepsilon^2}{64 \log(\varepsilon\delta)}$ for $f$ that has success probability greater than $\frac{1}{2} + \varepsilon$ on the hardcore set $H$ for $f$. This would contradict Theorem 14.6.
Using Exercise 14.2, we can even improve the hardness almost to $\frac{1}{2} - \varepsilon - (1 - 2\delta)^k$.
Analogously to Exercise 13.3, we can generalize $\mathrm{Heur}_\delta\mathrm{P}$ to nonuniform circuits of polynomial size in a straightforward way. In this way, we obtain $\mathrm{Heur}_\delta\mathrm{P/poly}$.
Corollary 14.8. Let $C$ be a class of Boolean functions with the following property: If $f = (f_n)_{n \in \mathbb{N}} \in C$, then also $g \in C$ with $g(x_1, \ldots, x_k) = \bigoplus_{i=1}^{k} f(x_i)$, where $k$ can be a function of $|x_i|$.
Suppose there exists a family of functions $f = (f_n)_{n \in \mathbb{N}} \in C$ such that $f \notin \mathrm{Heur}_{1/p(n)}\mathrm{P/poly}$. Then, for every constant $c > 0$, there exists a family $g$ of functions such that $g \in C$ and $g \notin \mathrm{Heur}_{\frac{1}{2} - n^{-c}}\mathrm{P/poly}$.
Exercise 14.3. Prove Corollary 14.8.
We can phrase the XOR lemma and Corollary 14.8 also the other way round: Assume that a class $C$ of functions is closed under $\oplus$, and suppose that every function in $C$ can be computed with a success probability of at least $\frac{1}{2} + \varepsilon$ for some not too small $\varepsilon > 0$. (Say, $\varepsilon = 1/\mathrm{poly}(n)$.) Then we can reduce the error probability to $1/\mathrm{poly}(n)$. (If this were not the case, then we would be able to amplify the hardness of $1/\mathrm{poly}(n)$ to $\frac{1}{2} - \varepsilon$, a contradiction.) So if we are able to compute a function with a nontrivial advantage, then we can bring the advantage close to 1. This is closely related to boosting, which is a concept in computational learning theory. Klivans and Servedio [KS03] explain the connections between boosting and hardcore sets.
15 Ampliﬁcation within NP
Our goal in this section is to show a statement of the form “if there is a language in NP that is mildly hard on average, then there is a language in NP that is very hard on average.” Unfortunately, the XOR lemma does not yield such a result: If $L \in$ NP, then it is unclear if computing $L(x) \oplus L(y)$ is also in NP. For instance, if $L$ is coNP-complete, then $L(x) \oplus L(y)$ can only be computed in NP if NP = coNP.
We circumvent this problem by replacing parity by a monotone function $g : \{0,1\}^k \to \{0,1\}$. If $L \in$ NP, then computing $g(L(x_1), \ldots, L(x_k))$ from $x_1, \ldots, x_k$ is also in NP.
Exercise 15.1. Prove the above statement. More precisely, prove the following stronger statement:
Assume that NP $\ne$ coNP. Prove that the following two statements are equivalent for any function $g : \{0,1\}^k \to \{0,1\}$:
1. For all $L \in$ NP, also $\{(x_1, \ldots, x_k) \mid g(L(x_1), \ldots, L(x_k)) = 1\} \in$ NP.
2. $g$ is monotonically increasing. This means that for any $y, z \in \{0,1\}^k$ with $y \le z$ (componentwise), we have $g(y) \le g(z)$.
15.1 The main idea
For the results of this chapter, which are mainly due to O'Donnell [O'D04], we need some preparation. Let $f : \{0,1\}^n \to \{0,1\}$, and let $g : \{0,1\}^k \to \{0,1\}$. Then $g \otimes f : (\{0,1\}^n)^k \to \{0,1\}$ denotes the function given by $(g \otimes f)(x_1, \ldots, x_k) = g(f(x_1), \ldots, f(x_k))$. Our goal is to analyze the hardness of $g \otimes f$ in terms of properties of $g$ and the hardness of $f$. The property of $g$ that we need is the bias of $g$ or, more precisely, the expected bias of $g$ subject to a random restriction.
Definition 15.1. The bias of a Boolean function $h$ is
$$\mathrm{bias}(h) = \max\bigl\{\Pr_x(h(x) = 0),\ \Pr_x(h(x) = 1)\bigr\} \in [1/2, 1].$$
The function $h$ is called balanced if $\mathrm{bias}(h) = 1/2$.
In fact, it is not the bias of $g$ itself that plays a role, but the expected bias of $g$ with respect to a random restriction.
Definition 15.2. A restriction $\rho$ of a function $h : \{0,1\}^m \to \{0,1\}$ is a mapping $\rho : \{1, 2, \ldots, m\} \to \{0, 1, \star\}$. Then $h_\rho$ denotes the subfunction of $h$ obtained by substituting each coordinate $i$ with $\rho(i) \in \{0,1\}$ by $\rho(i)$.
For a $\delta \in [0,1]$, we denote by $P^m_\delta$ the probability space over all restrictions, where a restriction $\rho$ is drawn according to the following rules:
• $\rho(1), \ldots, \rho(m)$ are independent.
• $\Pr(\rho(i) = \star) = \delta$.
• $\Pr(\rho(i) = 0) = \Pr(\rho(i) = 1) = \frac{1 - \delta}{2}$.
If $\rho \sim P^m_\delta$, then $\rho$ is called a random restriction with parameter $\delta$.
The expected bias of $h$ at $\delta$ is
$$\mathrm{EBias}_\delta(h) = \mathrm{E}_{\rho \sim P^m_\delta}\bigl[\mathrm{bias}(h_\rho)\bigr].$$
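For small $m$, bias and expected bias can be computed by brute force directly from Definitions 15.1 and 15.2. The following Python sketch does this by enumerating all $3^m$ restrictions. The names `bias`, `restrict`, `expected_bias` and the marker `STAR` for an unrestricted coordinate are choices made for this illustration, and Boolean functions are assumed to be given as Python callables on tuples of 0/1 values.

```python
from itertools import product

STAR = "*"  # marks a coordinate that the restriction leaves free


def bias(h, m):
    """bias(h) = max(Pr[h(x) = 0], Pr[h(x) = 1]) for uniform x in {0,1}^m."""
    ones = sum(h(x) for x in product((0, 1), repeat=m))
    p_one = ones / 2 ** m
    return max(p_one, 1.0 - p_one)


def restrict(h, rho):
    """Return the subfunction h_rho and its number of free coordinates."""
    free = [i for i, r in enumerate(rho) if r == STAR]

    def h_rho(a):
        x = list(rho)
        for i, bit in zip(free, a):
            x[i] = bit
        return h(tuple(x))

    return h_rho, len(free)


def expected_bias(h, m, delta):
    """EBias_delta(h) as an exact sum over all 3^m restrictions."""
    total = 0.0
    for rho in product((0, 1, STAR), repeat=m):
        stars = sum(1 for r in rho if r == STAR)
        weight = delta ** stars * ((1.0 - delta) / 2.0) ** (m - stars)
        h_rho, free = restrict(h, rho)
        total += weight * bias(h_rho, free)
    return total


if __name__ == "__main__":
    maj3 = lambda x: int(sum(x) >= 2)  # majority of three bits
    print(bias(maj3, 3), expected_bias(maj3, 3, 0.2))
```

The enumeration is exponential in $m$, so this is only meant for experiments with very small functions.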
Exercise 15.2. Let $\mathrm{parity}_m$ be the parity function of $m$ bits. Compute $\mathrm{EBias}_\delta(\mathrm{parity}_m)$.
Give estimates for $\mathrm{EBias}_\delta(\mathrm{and}_m)$, where $\mathrm{and}_m$ is the AND of $m$ bits.
The main result of this chapter is the following theorem, which is a
generalized version of the XOR lemma. It states the hardness of g ⊗ f
in terms of the hardness of f and the expected bias of g. The technical
restriction is that we require that the function f be balanced.
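Theorem 15.3. Let $f : \{0,1\}^n \to \{0,1\}$ be an $(s, \delta)$-hard balanced function, and let $g : \{0,1\}^k \to \{0,1\}$ be arbitrary. Then, for every $\eta > 0$, the function $g \otimes f$ is $(s',\ 1 - \mathrm{EBias}_{(2 - \eta)\delta}(g) - \varepsilon)$-hard, where $s' = \Omega\bigl(s \cdot \frac{\varepsilon^2}{\log(1/\delta)\, k}\bigr)$.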
We will not give a full proof of the result. Rather, we will give an
intuition why it should be true. In the next sections, we will discuss which
functions g are suitable to amplify hardness.
Proof overview: Suppose that $x_1, \ldots, x_k$ are drawn at random. Our task is to compute $(g \otimes f)(x_1, \ldots, x_k)$, where $f$ is both balanced and $\delta$-hard. We model the hardness of $f$ by computing $g(y_1, \ldots, y_k)$ with imperfect information about $y_1, \ldots, y_k$. This means that $y_i = f(x_i)$, but the hardness of $f$ obscures the true values and we see only corrupted values $z_1, \ldots, z_k$. Since $f$ is balanced, $\Pr(y_i = 1) = \Pr(y_i = 0) = 1/2$, i.e., the values $y_1, \ldots, y_k$ are drawn uniformly and independently at random. Since $f$ is $\delta$-hard, we have $\Pr(z_i = y_i) \le 1 - \delta$.
We abstract away $f$ and just use the $\delta$-hardness of $f$. We model this by drawing $z_i$ according to $\Pr(z_i = y_i) = 1 - \delta$ and $\Pr(z_i \ne y_i) = \delta$. Now we might simply output $g(z_1, \ldots, z_k)$. Then we would compute the correct value $g(y_1, \ldots, y_k)$ with a probability of $\mathrm{NStab}_\delta(g)$. (The noise stability, denoted by NStab, is defined below.) In a more sophisticated approach, we might compute $g(z')$ for different $z'$ close to $z = (z_1, \ldots, z_k)$ and output a maximum likelihood answer.
However, in the true setting involving $f$, we might not only have $z_i = y_i$, but we may also know that $z_i$ is correct. Taking this to its extreme, we get the following scenario: With a probability of $1 - 2\delta$, we have $z_i = y_i$, and we know for sure that $z_i = y_i$. With a probability of $2\delta$, $z_i$ is a random bit, and we know that $z_i$ is a corrupted bit. (But, of course, we do not know if $z_i = y_i$ or $z_i \ne y_i$. Question: Why $2\delta$, where $f$ is only $\delta$-hard?)
So what can we do now? We take the values $z_i$ for which we are certain that $z_i = y_i$ for granted, and we replace the corrupted values by $\star$. In this way, we obtain a restriction $\rho$. Then we compute $\Pr_a(g_\rho(a) = 0)$ and $\Pr_a(g_\rho(a) = 1)$. If the first is larger, then we output 0; otherwise, we output 1. The error probability is thus $1 - \mathrm{bias}(g_\rho)$.
To compute the overall probability that this strategy succeeds, we must take into account that $\rho$ is in fact a random restriction, drawn from $P^k_{2\delta}$. Thus, the probability of outputting the correct answer is nothing else but $\mathrm{EBias}_{2\delta}(g)$. The other way round, this looks as if $g \otimes f$ were $(1 - \mathrm{EBias}_{2\delta}(g))$-hard.
The idea sketched above can be turned into a proof. It proceeds as
follows: First, one can show that f does not only possess a hardcore set,
but a balanced hardcore set H. This is not surprising: If a hardcore set
H is not balanced, then either always outputting 1 or always outputting 0
gives a nontrivial advantage. Then we transfer the idea using arguments
similar to those of the proof of Theorem 14.7.
Exercise 15.3. Use Theorem 15.3 and Exercise 15.2 to derive a weaker
form of the XOR lemma (Theorem 14.7), which holds only for balanced
functions.
15.2 Noise stability and expected bias
The expected bias is closely related to another measure for Boolean functions, called noise stability. Lemma 15.5 states this connection precisely. We use noise stability since it is sometimes easier to compute, although the expected bias is the “right” parameter for Theorem 15.3.
Definition 15.4. The noise stability of a Boolean function $h : \{0,1\}^m \to \{0,1\}$ is defined as
$$\mathrm{NStab}_\delta(h) = \Pr_{x \sim \{0,1\}^m,\ y \sim N_\delta(x)}\bigl[h(x) = h(y)\bigr],$$
where $x$ is drawn uniformly at random and $y$ is obtained from $x$ by flipping each bit of $x$ independently with a probability of $\delta$.
The quantity
$$\mathrm{NSens}_\delta(h) = 1 - \mathrm{NStab}_\delta(h) = \Pr_{x \sim \{0,1\}^m,\ y \sim N_\delta(x)}\bigl[h(x) \ne h(y)\bigr]$$
is called the noise sensitivity of $h$.
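Noise stability and noise sensitivity can also be evaluated exactly for small $m$ by enumerating all inputs $x$ and all flip patterns. The sketch below follows Definition 15.4 literally. The function names are hypothetical, and Boolean functions are again assumed to be Python callables on tuples of bits.

```python
from itertools import product


def noise_stability(h, m, delta):
    """NStab_delta(h): Pr[h(x) = h(y)] for uniform x and y = x with each bit
    flipped independently with probability delta (exact enumeration)."""
    total = 0.0
    for x in product((0, 1), repeat=m):
        for flips in product((0, 1), repeat=m):
            k = sum(flips)
            weight = delta ** k * (1.0 - delta) ** (m - k)
            y = tuple(a ^ b for a, b in zip(x, flips))
            if h(x) == h(y):
                total += weight
    return total / 2 ** m


def noise_sensitivity(h, m, delta):
    """NSens_delta(h) = 1 - NStab_delta(h)."""
    return 1.0 - noise_stability(h, m, delta)


if __name__ == "__main__":
    or2 = lambda x: int(x[0] or x[1])
    for delta in (0.0, 0.1, 0.5):
        print(delta, noise_stability(or2, 2, delta), noise_sensitivity(or2, 2, delta))
```

The running time is proportional to $4^m$, which is fine for a handful of bits but not beyond.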
Exercise 15.4. Compute $\mathrm{NStab}_\delta(\mathrm{parity}_m)$ and $\mathrm{NStab}_\delta(\mathrm{and}_m)$.
Depending on the context, either noise stability or noise sensitivity will
be more convenient to use or to analyze.
In the following, let $x^\star = 2x - 1 \in [0, 1]$ for any quantity $x \in [1/2, 1]$.
Lemma 15.5. For any Boolean function $h : \{0,1\}^m \to \{0,1\}$, we have
$$\mathrm{NStab}^\star_\delta(h) \le \mathrm{EBias}^\star_{2\delta}(h) \le \sqrt{\mathrm{NStab}^\star_\delta(h)}.$$
Proof. We exploit the following fact.
Exercise 15.5. Prove the following: For any Boolean function $h$ and independently and uniformly drawn $x$ and $y$, we have $h(x) = h(y)$ with a probability of $\frac{1}{2} + \frac{1}{2}\,\mathrm{bias}^\star(h)^2$. In other words,
$$\frac{1}{2} + \frac{1}{2}\,\mathrm{bias}^\star(h)^2 = \mathrm{NStab}_{1/2}(h).$$
We take a different view on $\mathrm{NStab}_\delta(h)$: First, we draw $\rho \sim P^m_{2\delta}$. We set $x_i = y_i = \rho_i$ if $\rho_i \ne \star$. For $\rho_i = \star$, we draw $x_i, y_i \in \{0,1\}$ uniformly and independently at random. Then $x$ is drawn uniformly at random and $y_i$ differs from $x_i$ with a probability of $\delta$. Furthermore, $x$ and $y$ are identically distributed. And given $\rho$, they are drawn independently. Let $x'$ and $y'$ be the vectors obtained by removing all positions $i$ with $\rho_i \ne \star$.
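Together with Exercise 15.5, we get
$$\mathrm{NStab}_\delta(h) = \mathrm{E}_{\rho \sim P^m_{2\delta}}\Bigl[\Pr_{x', y'}\bigl[h_\rho(x') = h_\rho(y')\bigr]\Bigr] = \mathrm{E}_{\rho \sim P^m_{2\delta}}\Bigl[\frac{1}{2} + \frac{1}{2}\,\mathrm{bias}^\star(h_\rho)^2\Bigr].$$
Thus,
$$\mathrm{NStab}^\star_\delta(h) = \mathrm{E}_{\rho \sim P^m_{2\delta}}\bigl[\mathrm{bias}^\star(h_\rho)^2\bigr]$$
by linearity of expectation.
Since $\mathrm{bias}^\star(h_\rho) \in [0, 1]$, we have $\mathrm{bias}^\star(h_\rho)^2 \le \mathrm{bias}^\star(h_\rho)$, which yields the first inequality. The second inequality follows from Jensen's inequality since squaring is a convex function:
$$\sqrt{\mathrm{NStab}^\star_\delta(h)} = \sqrt{\mathrm{E}_{\rho \sim P^m_{2\delta}}\bigl[\mathrm{bias}^\star(h_\rho)^2\bigr]} \ge \sqrt{\Bigl(\mathrm{E}_{\rho \sim P^m_{2\delta}}\bigl[\mathrm{bias}^\star(h_\rho)\bigr]\Bigr)^2} = \mathrm{E}_{\rho \sim P^m_{2\delta}}\bigl[\mathrm{bias}^\star(h_\rho)\bigr] = \mathrm{EBias}^\star_{2\delta}(h).$$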
The following lemma will be very useful to compute the noise stability and noise sensitivity of the functions $g$ that we use to amplify hardness.
Lemma 15.6. Let $h : \{0,1\}^m \to \{0,1\}$ be a balanced Boolean function, and let $g : \{0,1\}^k \to \{0,1\}$ be an arbitrary Boolean function. Then
$$\mathrm{NSens}_\delta(g \otimes h) = \mathrm{NSens}_{\mathrm{NSens}_\delta(h)}(g).$$
Proof. Let us take a closer look at
$$\mathrm{NSens}_\delta(g \otimes h) = \Pr_{\substack{x_1, \ldots, x_k \in \{0,1\}^m \\ y_1 \sim N_\delta(x_1), \ldots, y_k \sim N_\delta(x_k)}}\Bigl[g\bigl(h(x_1), \ldots, h(x_k)\bigr) \ne g\bigl(h(y_1), \ldots, h(y_k)\bigr)\Bigr].$$
Let $z_i = h(x_i)$ and $z'_i = h(y_i)$. Then $\Pr(z_i = 1) = \Pr(z_i = 0) = 1/2$ since $h$ is balanced. Furthermore, the probability that $z'_i \ne z_i$ is just $\mathrm{NSens}_\delta(h) = 1 - \mathrm{NStab}_\delta(h)$. Thus,
$$\mathrm{NSens}_\delta(g \otimes h) = \Pr_{\substack{z \in \{0,1\}^k \\ z' \sim N_{\mathrm{NSens}_\delta(h)}(z)}}\bigl[g(z) \ne g(z')\bigr] = \mathrm{NSens}_{\mathrm{NSens}_\delta(h)}(g)$$
as claimed.
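Lemma 15.6 can be checked numerically on a tiny instance. The sketch below takes both $g$ and $h$ to be the majority of three bits (which is balanced), builds $g \otimes h$ on 9 bits, and compares both sides of the lemma. All function names are chosen for this illustration only, and the noise sensitivity is computed by exact enumeration over bit masks.

```python
def noise_sensitivity(h, m, delta):
    """NSens_delta(h) by exact enumeration; inputs are encoded as m-bit integers."""
    vals = [h(tuple((x >> i) & 1 for i in range(m))) for x in range(2 ** m)]
    total = 0.0
    for flips in range(2 ** m):
        k = bin(flips).count("1")  # number of flipped positions
        weight = delta ** k * (1.0 - delta) ** (m - k)
        differ = sum(1 for x in range(2 ** m) if vals[x] != vals[x ^ flips])
        total += weight * differ
    return total / 2 ** m


def compose(g, h, k, m):
    """(g (x) h)(x_1, ..., x_k) = g(h(x_1), ..., h(x_k)) on k blocks of m bits."""
    def gh(x):
        return g(tuple(h(x[i * m:(i + 1) * m]) for i in range(k)))
    return gh


def maj3(x):
    """Majority of three bits; this function is balanced."""
    return int(sum(x) >= 2)


if __name__ == "__main__":
    delta = 0.1
    lhs = noise_sensitivity(compose(maj3, maj3, 3, 3), 9, delta)
    rhs = noise_sensitivity(maj3, 3, noise_sensitivity(maj3, 3, delta))
    print(lhs, rhs)  # the two values should agree up to rounding
```

The balancedness of $h$ matters here: for an unbalanced inner function, the bits $z_i$ are no longer uniform and the identity can fail.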
15.3 Recursive majorityofthree and tribes
The function $g$ should be nearly balanced subject to a random restriction in order to keep $\mathrm{EBias}_\delta(g)$ close to 1/2.
Two functions turn out to be very useful. The first one is the “recursive majority of 3” function, which we define recursively: Let $M_k : \{0,1\}^{3^k} \to \{0,1\}$. Then $M_1(x, y, z) = 1$ if and only if at least two of $x$, $y$, and $z$ are set to 1. For $k \ge 1$, we have $M_{k+1} = M_1 \otimes M_k$. This means that
$$M_{k+1}(x_1, \ldots, x_{3^k}, y_1, \ldots, y_{3^k}, z_1, \ldots, z_{3^k}) = M_1\bigl(M_k(x_1, \ldots, x_{3^k}),\ M_k(y_1, \ldots, y_{3^k}),\ M_k(z_1, \ldots, z_{3^k})\bigr).$$
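The recursive definition translates directly into a short recursive program. The following Python sketch evaluates $M_k$ on a tuple of $3^k$ bits. The function names are illustrative, and the base case of a single bit (the identity, which one could call $M_0$) is a convenience added here and not part of the definition above.

```python
def maj3(x, y, z):
    """M_1: 1 if and only if at least two of the three bits are 1."""
    return int(x + y + z >= 2)


def recursive_majority(bits):
    """Evaluate M_k on a sequence of 3^k bits via M_{k+1} = M_1 (x) M_k."""
    n = len(bits)
    if n == 1:
        return bits[0]  # convenience base case, not in the text
    if n < 1 or n % 3 != 0:
        raise ValueError("input length must be a power of 3")
    third = n // 3
    return maj3(
        recursive_majority(bits[:third]),
        recursive_majority(bits[third:2 * third]),
        recursive_majority(bits[2 * third:]),
    )


if __name__ == "__main__":
    print(recursive_majority((1, 1, 0)))
    print(recursive_majority((1, 1, 0, 0, 0, 1, 1, 0, 1)))
```

The evaluation takes time linear in the input length, which is the reason why $M_k$ is in P and, being monotone, can be used inside NP.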
Lemma 15.7. For $\ell \ge \log_{1.1}(1/\delta)$, we have $\mathrm{NStab}^\star_\delta(M_\ell) \le \delta^{-1.1} (3^\ell)^{-0.15}$.
Exercise 15.6. Prove Lemma 15.7. You can also show a slightly weaker variant: Prove that there exist constants $a > 1$, $b \ge 1$, $c > 0$ such that for $\ell \ge \log_a(1/\delta)$, we have $\mathrm{NStab}^\star_\delta(M_\ell) \le \delta^{-b} (3^\ell)^{-c}$.
Hint: Calculate $\mathrm{NSens}_\delta(M_1)$ explicitly. Then use Lemma 15.6. Make a case distinction whether $\mathrm{NSens}_\delta(M_k)$ is large or small for $k \le \ell$.
Majority-of-three is helpful to amplify a $(1/\mathrm{poly}(n))$-hard language to become somewhat hard, namely to $(\frac{1}{2} - n^{-\alpha})$-hard for some small constant $\alpha > 0$.
To amplify further to $(\frac{1}{2} - n^{-1/2 + \eta})$-hardness, we use our second function, which is called “tribes”. Tribes does particularly well if the function whose hardness we want to amplify is already somewhat hard.
Let $w \in \mathbb{N}$, and let $n = n(w) \in \mathbb{N}$ be the smallest multiple of $w$ such that $(1 - 2^{-w})^{n/w} \le 1/2$. Then the tribes function $T_n$ of $n$ variables is defined as
$$T_n(x_1, \ldots, x_n) = \bigvee_{i=0}^{n/w - 1} \bigwedge_{j=1}^{w} x_{iw + j}.$$
If we write $w$ as a function of $n$, we get $w = \log n - \log \ln n + o(1)$.
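The definition of tribes has two parts, the choice of $n = n(w)$ and the OR of ANDs itself, and both are easy to compute. The following Python sketch is one way to do it. The names `tribes_size` and `tribes` are hypothetical, and the input is assumed to be a tuple of 0/1 values whose length is a multiple of $w$.

```python
def tribes_size(w):
    """Smallest multiple n of w with (1 - 2^(-w))^(n/w) <= 1/2."""
    blocks = 1
    while (1.0 - 2.0 ** -w) ** blocks > 0.5:
        blocks += 1
    return blocks * w


def tribes(x, w):
    """OR over consecutive blocks of width w of the AND of each block."""
    if w < 1 or len(x) % w != 0:
        raise ValueError("len(x) must be a multiple of w")
    return int(any(all(x[i:i + w]) for i in range(0, len(x), w)))


if __name__ == "__main__":
    for w in (1, 2, 3, 4):
        print(w, tribes_size(w))
    print(tribes((1, 1, 0, 0, 0, 0), 2), tribes((1, 0, 0, 1, 1, 0), 2))
```

The quantity $(1 - 2^{-w})^{n/w}$ is the probability that no block is all ones, so the choice of $n$ makes $T_n$ as close to balanced as the block structure allows.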
To estimate the expected bias of tribes is technically more challenging than it was for majority-of-three, and we will omit a proof here.
Lemma 15.8. For every constant $\eta > 0$, there is a constant $r > 0$ such that
$$\mathrm{EBias}_{1 - r}(T_n) \le \frac{1}{2} + n^{-1/2 + \eta}.$$
Exercise 15.7. Let $f : \{0,1\}^n \to \{0,1\}$ be a Boolean function. The influence of the $j$th variable is defined as
$$I_j = \Pr_{x \sim \{0,1\}^n}\bigl[f(x) \ne f(x^{(j)})\bigr],$$
where $x^{(j)}$ is obtained from $x$ by flipping the $j$th entry of $x$. The total influence of $f$ is $I = \sum_{j=1}^{n} I_j$.
Compute (or give as good as possible estimates for) $I(\mathrm{and}_m)$, $I(\mathrm{parity}_m)$, $I(\mathrm{majority}_m)$, $I(M_k)$, $I(T_k)$. ($\mathrm{majority}_m(x_1, \ldots, x_m)$ is 1 if and only if at least $m/2$ of the $x_i$ are 1.)
Exercise 15.8. Show that
$$\mathrm{NSens}_\delta(f) \le \delta\, I(f).$$
15.4 Hardness within NP
Recursive majority-of-three turns out to be very useful to amplify $\delta$-hardness for relatively small values of $\delta$. More precisely, if $f$ is $1/\mathrm{poly}(n)$-hard, then with recursive majority-of-three we can amplify its hardness to $\frac{1}{2} - n^{-\alpha}$ for some small constant $\alpha$. Then tribes comes into play, which can, if $\delta$ is not too small, bring the hardness to $\frac{1}{2} - n^{-1/2 + \varepsilon}$ for arbitrarily small $\varepsilon > 0$.
Using Lemma 15.7 and Theorem 15.3, we can show that majority-of-three can amplify hardness close to 1/2.
Lemma 15.9. Suppose that there is a family of functions in NP which is infinitely often balanced and $(\mathrm{poly}(n), 1/\mathrm{poly}(n))$-hard. (This means that this function is $1/\mathrm{poly}(n)$-hard for circuits of polynomial size.) Then there is a family of functions $(h_m)$ in NP that is infinitely often balanced and $(\frac{1}{2} - m^{-0.07})$-hard for circuits of polynomial size.
Proof. Suppose a family $f = (f_n)$ of functions is infinitely often balanced and $1/n^c$-hard for polynomial-size circuits. Choose $k = n^C$ for some sufficiently large constant $C$, and set $\ell = \log_3(k) = C \log_3 n$.
Let $h_m = M_\ell \otimes f_n$. The function $h_m$ has input length $m = nk = n^{C+1}$. The family $h = (h_m)_{m \in \mathbb{N}}$ is in NP since $f$ is in NP and $M_\ell$ is monotone and in P. Moreover, $h_m$ is balanced whenever $f_n$ is balanced.
We have to show that $h_m$ is $(\frac{1}{2} - m^{-0.07})$-hard whenever $f_n$ is hard and balanced. We apply Theorem 15.3 with $\eta = 1$, $\varepsilon = n^{-C}$, and $\delta = n^{-c}$.
Lemmas 15.5 (for converting noise stability to expected bias) and 15.7 (for the noise stability of $M_\ell$) yield
$$\mathrm{EBias}_\delta(M_\ell) \le \frac{1}{2} + \frac{1}{2}\Bigl(\frac{\delta}{2}\Bigr)^{-0.55} (3^\ell)^{-0.075}.$$
This assumes that $\ell$ is sufficiently large, which can be ensured by choosing $C$ large.
Now we observe that
$$\frac{1}{2}\Bigl(\frac{\delta}{2}\Bigr)^{-0.55} (3^\ell)^{-0.075} \le (2n^c)^{0.55} (n^C)^{-0.075} + n^{-C} \le n^{-0.074C}$$
for large enough (but still constant) $C$. Finally, $n^{-0.074C} \le m^{-0.07}$ for large enough $C$ since $m = n^{C+1}$.
Using the tribes function, we can further improve the hardness.
Theorem 15.10. Suppose that there is a family $f = (f_n)_{n \in \mathbb{N}}$ of functions in NP which is infinitely often balanced and $(\mathrm{poly}(n), 1/\mathrm{poly}(n))$-hard. (This means that $f_n$ is $1/\mathrm{poly}(n)$-hard for circuits of polynomial size.) Then there is a family of functions in NP which is infinitely often $(\mathrm{poly}(n), \frac{1}{2} - n^{-1/2 + \varepsilon})$-hard for any $\varepsilon > 0$.
Exercise 15.9. Prove Theorem 15.10 using the tools Lemma 15.8, Lemma 15.9, and Theorem 15.3.
At the expense of a small loss in the final hardness, we can even get rid of the requirement that the initial function is balanced.
Theorem 15.11. Suppose that there is a family of functions in NP which is infinitely often $(\mathrm{poly}(n), 1/\mathrm{poly}(n))$-hard. Then there is a family of functions in NP which is infinitely often $(\mathrm{poly}(n), \frac{1}{2} - n^{-1/3 + \varepsilon})$-hard.
We can rephrase Theorem 15.11 in “boosting form”, which sounds more positive.
Theorem 15.11'. Suppose that $(L, \mathcal{U}) \in \mathrm{Heur}_{\frac{1}{2} - n^{-0.33}}\mathrm{P/poly}$ for all $L \in$ NP. Then $(L, \mathcal{U}) \in \mathrm{Heur}_{1/p}\mathrm{P/poly}$ for every polynomial $p$ and every language $L \in$ NP.
Exercise 15.10. If we allow arbitrary Boolean functions (which are not required to be from, say, NP or PSPACE), we can even find a function that is exponentially close to 1/2-hard for circuits of exponential size.
Prove the following: There exists a universal constant $\gamma \ge 1/8$ such that, for all sufficiently large $n \in \mathbb{N}$, there exists a function $h : \{0,1\}^n \to \{0,1\}$ which is $(2^{\gamma n}, \frac{1}{2} - 2^{-\gamma n})$-hard.
This is almost as hard as possible: No function is harder than 1/2-hard even for very small circuits. And just by hardwiring the correct function value for one input and outputting either 0 or 1 on all other inputs (depending on whether more function values are 0 or 1), we can bring the hardness down to $\frac{1}{2} - 2^{-n}$.
16 RL and undirected connectivity
The problem
CONN = $\{(G, s, t) \mid G$ is a directed graph with a path from $s$ to $t\}$
is NL-complete. What about its undirected counterpart
UCONN = $\{(G, s, t) \mid G$ is an undirected graph with a path from $s$ to $t\}$?
It is of course in NL, but the NL-hardness proof for CONN does not work for undirected $G$, since the configuration graph of a nondeterministic Turing machine is a directed graph.
In this chapter, we will show that UCONN can be decided in randomized logarithmic space. We define RSpace($s(n)$) to be the class of all languages that can be decided by an $s(n)$-space bounded probabilistic Turing machine with one-sided error. The Turing machine has a separate input tape and the space used on the random tape is not counted. In the same way, we define BPSpace($s(n)$); the only difference is that we allow two-sided errors.
Definition 16.1.
1. RL = RSpace($O(\log n)$).
2. BPL = BPSpace($O(\log n)$).
Both RL and BPL allow probability amplification. Obviously RL $\subseteq$ NL.
For randomized computations with small space, it is important that the randomness is created “on the fly”, that is, that the random tape is one-way. For instance, one can show that BP·L = BPP (the BP-operator applied to the class L) but BPL is not likely to be BPP.
Theorem 16.2. UCONN ∈ RL
The algorithm showing that UCONN ∈ RL is very simple. We perform a
random walk starting in s. If we reach the node t, we accept. If we do not
reach t after a polynomial number of steps, we reject.
Input: undirected graph G = (V, E), nodes s, t ∈ V .
1. Let v := s.
2. For i := 1 to poly(n) do
(a) Replace v by a random neighbour of v.
(b) If v = t, then accept.
3. Reject.
The algorithm is obviously logarithmic space bounded, since we only
have to store one node and a counter that counts to a polynomially large
value. It is clear that if there is no path between s and t, then the algorithm
is always right. The hard part is to show that if there is a path, then it is
also right with constant probability. Along the proof, we will also give an
explicit bound for the poly(n) term.
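The random walk algorithm above can be sketched in a few lines of Python. The graph is assumed to be given as a dictionary `adj` mapping each node to the list of its neighbours, and the number of steps is a parameter `steps` standing in for the poly(n) term. Both conventions, as well as the function name, are choices made for this illustration.

```python
import random


def random_walk_connectivity(adj, s, t, steps, rng=random):
    """One-sided error test for undirected s-t connectivity.

    Returns True only if the walk actually reaches t; may return False
    although a path exists (with small probability if steps is large).
    """
    v = s
    if v == t:
        return True  # trivial case, not spelled out in the algorithm above
    for _ in range(steps):
        if not adj[v]:
            return False  # isolated node: the walk cannot move
        v = rng.choice(adj[v])  # replace v by a random neighbour of v
        if v == t:
            return True
    return False


if __name__ == "__main__":
    # A path 0 - 1 - 2 - 3 and a separate edge 4 - 5.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: [5], 5: [4]}
    print(random_walk_connectivity(adj, 0, 3, 10_000))
    print(random_walk_connectivity(adj, 0, 5, 10_000))
```

Note that this sketch stores the whole graph in memory and is only meant to show the control flow. The logarithmic space bound refers to a Turing machine that keeps just the current node and the step counter on its work tape.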
Let $G = (V, E)$ be a $d$-regular graph and $A$ be its adjacency matrix. Recall that $\tilde{A} = \frac{1}{d} A$ is the normalized adjacency matrix. It is a doubly stochastic matrix. If $p$ is a probability distribution on $V$, then $\tilde{A}^t p$ is the probability distribution that we get when drawing a starting vertex according to $p$ and then performing a random walk of length $t$. As a first step, we will show that $\tilde{A}^t p$ converges to the uniform distribution $\tilde{1}$ on $V$. We will need the following relation between 1-norm and 2-norm:
$$\|x\|_2 \le \|x\|_1 \le \sqrt{n}\, \|x\|_2 \quad \text{for all } x \in \mathbb{R}^n.$$
Whenever we write just $\|x\|$, we mean the 2-norm in the following.
Lemma 16.3. Let $G = (V, E)$ be a $d$-regular connected graph with adjacency matrix $A$ and let $p$ be a probability distribution on $V$. Then
$$\|\tilde{A}^t p - \tilde{1}\|_2 \le \Bigl(\frac{\lambda(G)}{d}\Bigr)^t$$
for all $t \in \mathbb{N}$.
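Proof. Let $\lambda = \lambda(G)/d$. By the definition of $\lambda$, we have $\|\tilde{A}x\| \le \lambda \|x\|$ if $x \perp \tilde{1}$. If $x \perp \tilde{1}$, then $\tilde{A}x \perp \tilde{1}$, since $x$ belongs to the direct sum of the eigenspaces of the eigenvalues $\gamma \ne 1$. Therefore, by induction we get:
1. $\|\tilde{A}^t x\| \le \lambda^t \|x\|$ if $x \perp \tilde{1}$ and
2. $\tilde{A}^t \tilde{1} = \tilde{1}$.
We decompose $p = \alpha \tilde{1} + q$ with $q \perp \tilde{1}$. $q \perp \tilde{1}$ means that $\langle q, \tilde{1} \rangle = 0$, that is, $\sum_{i=1}^{n} q_i = 0$. This means that $\alpha = 1$, since $p$ is a probability distribution. Thus
$$\tilde{A}^t p = \tilde{A}^t (\tilde{1} + q) = \tilde{1} + \tilde{A}^t q.$$
We have $\|p\|^2 = \|\tilde{1}\|^2 + \|q\|^2$, since $\tilde{1}$ and $q$ are orthogonal. Therefore, $\|q\| \le \|p\|$. Since $p$ is a probability distribution, $\|p\| \le \|p\|_1 \le 1$. Hence,
$$\|\tilde{A}^t p - \tilde{1}\| = \|\tilde{A}^t q\| \le \lambda^t \|q\| \le \lambda^t.$$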
Next we will show that many graphs are “slight” expanders, that is,
λ(G)/d is bounded away from 1 by 1/ poly(n).
Lemma 16.4. Let $G$ be a connected $d$-regular graph with self-loops at each node. Then
$$\frac{\lambda(G)}{d} \le 1 - \frac{1}{8dn^3}.$$
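Proof. Let $x \perp \tilde{1}$ with $\|x\| = 1$ and let $y = \tilde{A}x$. We have
$$1 - \|y\|^2 = \|x\|^2 - \|y\|^2 = \|x\|^2 - 2\|y\|^2 + \|y\|^2 = \|x\|^2 - 2\langle \tilde{A}x, y \rangle + \|y\|^2$$
$$= \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{A}_{i,j} x_j^2 - 2 \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{A}_{i,j} x_j y_i + \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{A}_{i,j} y_i^2 = \sum_{i=1}^{n} \sum_{j=1}^{n} \tilde{A}_{i,j} (x_j - y_i)^2.$$
Here we used that all row and column sums of $\tilde{A}$ are 1.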
We now claim that there are indices $i$ and $j$ such that
$$\tilde{A}_{i,j} (x_j - y_i)^2 \ge \frac{1}{4dn^3}.$$
Since the sum above only contains nonnegative terms, this will also be a lower bound for $1 - \|y\|^2$. We sort the nodes (indices) such that $x_1 \ge x_2 \ge \dots \ge x_n$. Since $\sum_{i=1}^{n} x_i = 0$, we have $x_1 \ge 0 \ge x_n$. Because $\|x\|^2 = 1$, $x_1 \ge 1/\sqrt{n}$ or $x_n \le -1/\sqrt{n}$. Thus
$$x_1 - x_n \ge \frac{1}{\sqrt{n}}.$$
Thus there is an $i_0$ such that $x_{i_0} - x_{i_0 + 1} \ge \frac{1}{n^{1.5}}$. Set $U = \{1, \ldots, i_0\}$ and $\bar{U} = \{i_0 + 1, \ldots, n\}$. Since $G$ is connected, there is an edge $\{j, i\}$ with $j \in U$ and $i \in \bar{U}$. Then
$$|x_j - y_i| \ge \underbrace{|x_j - x_i|}_{\ge\, x_{i_0} - x_{i_0 + 1} \,\ge\, 1/n^{1.5}} - |x_i - y_i|.$$
If $|x_i - y_i| \le \frac{1}{2n^{1.5}}$, then $|x_j - y_i| \ge \frac{1}{2n^{1.5}}$ and $\tilde{A}_{i,j} (x_j - y_i)^2 \ge \frac{1}{4dn^3}$, because $\tilde{A}_{i,j} \ge 1/d$ since there is an edge $\{j, i\}$. If $|x_i - y_i| \ge \frac{1}{2n^{1.5}}$, then $\tilde{A}_{i,i} (x_i - y_i)^2 \ge \frac{1}{4dn^3}$, because $\tilde{A}_{i,i} \ge 1/d$ since the graph has all self-loops.
Thus
$$\|y\|^2 \le 1 - \frac{1}{4dn^3}$$
and
$$\|y\| \le 1 - \frac{1}{8dn^3}.$$
Since this holds for all $y = \tilde{A}x$ with $\|x\| = 1$ and $x \perp \tilde{1}$, this is also an upper bound for $\lambda(G)/d$.
Assume $G = (V, E)$ is a connected $d$-regular graph such that every node has a self-loop. Let $p$ be any probability distribution on $V$. By Lemmas 16.3 and 16.4,
$$\|\tilde{A}^t p - \tilde{1}\| \le \Bigl(1 - \frac{1}{8dn^3}\Bigr)^t \le e^{-t/(8dn^3)} \le \frac{1}{2n^{1.5}}$$
for $t \ge 12dn^3 \ln n + 8dn^3 \ln 2$. Thus
$$\|\tilde{A}^t p - \tilde{1}\|_1 \le \frac{1}{2n}$$
and $(\tilde{A}^t p)_i \ge 1/n - 1/(2n) = 1/(2n)$. Thus the probability that we hit any particular node $i$ is at least $1/(2n)$. If we repeat this $2n$ times, the probability that we hit $i$ is at least $1 - 1/e \ge 1/2$.
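The convergence statement can be observed on a small example. The following Python sketch applies the walk matrix step by step to a start distribution on a 4-cycle with a self-loop at every node (so $d = 3$) and compares the distance to the uniform distribution with the bound $(1 - \frac{1}{8dn^3})^t$. It also evaluates the walk length $12dn^3 \ln n + 8dn^3 \ln 2$. The adjacency-list representation and all function names are assumptions of this illustration.

```python
import math


def step(adj, p):
    """One random walk step: returns the distribution A~ p for a d-regular multigraph
    given as a list of neighbour lists (self-loops appear as the node itself)."""
    d = len(adj[0])
    q = [0.0] * len(p)
    for v, neighbours in enumerate(adj):
        for u in neighbours:
            q[u] += p[v] / d
    return q


def distance_to_uniform(p):
    """Euclidean distance between p and the uniform distribution."""
    n = len(p)
    return math.sqrt(sum((pi - 1.0 / n) ** 2 for pi in p))


def walk_length(n, d):
    """Smallest integer t with t >= 12 d n^3 ln n + 8 d n^3 ln 2."""
    return math.ceil(12 * d * n ** 3 * math.log(n) + 8 * d * n ** 3 * math.log(2))


if __name__ == "__main__":
    n, d = 4, 3
    adj = [[v, (v + 1) % n, (v - 1) % n] for v in range(n)]  # cycle with self-loops
    p = [1.0, 0.0, 0.0, 0.0]  # start in node 0
    for t in range(1, 11):
        p = step(adj, p)
        print(t, distance_to_uniform(p), (1 - 1 / (8 * d * n ** 3)) ** t)
    print(walk_length(n, d))
```

On such a small graph the walk mixes far faster than the general bound suggests. The bound of Lemma 16.4 is a worst-case estimate over all connected regular graphs with self-loops.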
This proves the correctness of the algorithm in the beginning of our chapter. The input graph $G$ need not be regular or have self-loops at every node. But we can make it regular with self-loops by attaching an appropriate number of self-loops to each node. The degree of the resulting graph is at most $n$. This does not change the connectivity properties of the graph. (In fact, we do not even have to do the preprocessing, since if a node is hit in the new graph, it is only hit earlier in the old graph.) Then we apply the analysis above to the connected component that contains $s$. If $t$ is in this component, too, then we hit it with probability at least 1/2. Note that instead of restarting the random walk, we can perform one longer random walk, since the analysis does not make any assumption on the starting probability except that the mass should be in the component of $s$.
17 Explicit constructions of expanders
We call a family of (multi)graphs $(G_n)_{n \in \mathbb{N}}$ a family of $d$-regular $\lambda$-expanders if
1. $G_n$ has $n$ nodes,
2. $G_n$ is $d$-regular,
3. $\lambda(G_n) \le \lambda$
for all $n$. Here, $d$ and $\lambda$ are constants.
The family is called explicit if the function
$$1^n \mapsto G_n$$
is polynomial time computable. It is called strongly explicit if
$$(n, v, i) \mapsto \text{the $i$th neighbour of $v$ in $G_n$}$$
is polynomial time computable. Here the input and output size is only $O(\log n)$, so the algorithm runs in time only $\mathrm{poly}(\log n)$. In our case, it is also possible to return the whole neighbourhood, since $d$ is constant.
Let $G$ be a $d$-regular graph with adjacency matrix $A$. In this chapter, it will be very convenient to work with the normalized adjacency matrices $\tilde{A} = \frac{1}{d} A$. These matrices are also called random walk matrices, since they describe the transition probabilities of one step of a random walk. $\tilde{\lambda}(G)$ is the second largest (absolute value of an) eigenvalue of $\tilde{A}$. Obviously, $\tilde{\lambda}(G) = \lambda(G)/d$.
We will now describe three graph transformations. One of them increases the number of nodes. This will be used to construct larger expanders from smaller ones. The second one will reduce the degree. This is used to keep the degree of our family constant. And the last one reduces the second largest eigenvalue. This is needed to keep $\lambda(G)$ below $\lambda$.
17.1 Matrix products
Let $G$ be a $d$-regular graph with normalized adjacency matrix $\tilde{A}$. The $k$-fold matrix product $G^k$ of $G$ is the graph given by the normalized adjacency matrix $\tilde{A}^k$. This transformation is also called path product, since there is an edge between $u$ and $v$ in $G^k$ if there is a path of length $k$ in $G$ between $u$ and $v$.
It is obvious that the number of nodes stays the same and the degree becomes $d^k$.
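Lemma 17.1. $\tilde{\lambda}(G^k) = \tilde{\lambda}(G)^k$ for all $k \ge 1$.
Proof. Let $x$ be an eigenvector of $\tilde{A}$ associated with the eigenvalue $\lambda$ such that $|\lambda| = \tilde{\lambda}(G)$. Then $\tilde{A}^k x = \lambda^k x$ (induction on $k$). Thus $\tilde{\lambda}(G^k) \ge |\lambda|^k$. It cannot be larger, since otherwise $\tilde{\lambda}(G) > |\lambda|$.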
Matrix product
         nodes    degree    λ̃
G        n        d         λ
G^k      n        d^k       λ^k
Given oracle access to the neighbourhoods of $G$, that is, we may ask queries “Give me a list of all neighbours of $v$!”, we can compute the neighbourhood of a node $v$ in $G^k$ in time $O(d^k \log n)$ by doing a breadth-first search starting in $v$. From $v$, we can reach at most $d^k$ vertices and the description size of a node is $O(\log n)$.
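With a neighbourhood oracle at hand, the neighbour list of $v$ in $G^k$ is just the list of end points of all walks of length $k$ starting in $v$. The following Python sketch expands the walks level by level. The oracle is modelled as a Python function `neighbours(u)` returning a list, which is an assumption of this illustration.

```python
def power_neighbours(neighbours, v, k):
    """Neighbour list (with multiplicities) of v in G^k: the end points of all
    walks of length k that start in v."""
    frontier = [v]
    for _ in range(k):
        frontier = [w for u in frontier for w in neighbours(u)]
    return frontier


if __name__ == "__main__":
    n = 5
    cycle = lambda u: [(u + 1) % n, (u - 1) % n]  # 2-regular cycle on 5 nodes
    print(power_neighbours(cycle, 0, 2))
```

The list has exactly $d^k$ entries, and parallel edges of $G^k$ show up as repeated entries.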
17.2 Tensor products
Let $G$ be a $d$-regular graph with $n$ nodes and normalized adjacency matrix $\tilde{A}$ and let $G'$ be a $d'$-regular graph with $n'$ nodes and normalized adjacency matrix $\tilde{A}'$. The tensor product $G \otimes G'$ is the graph given by the normalized adjacency matrix $\tilde{A} \otimes \tilde{A}'$. Here $\tilde{A} \otimes \tilde{A}'$ denotes the Kronecker product of the two matrices, which is given by
$$\tilde{A} \otimes \tilde{A}' = \begin{pmatrix} a_{1,1}\tilde{A}' & \ldots & a_{1,n}\tilde{A}' \\ \vdots & \ddots & \vdots \\ a_{n,1}\tilde{A}' & \ldots & a_{n,n}\tilde{A}' \end{pmatrix},$$
where $\tilde{A} = (a_{i,j})$.
The new graph has $nn'$ nodes and its degree is $dd'$.
Lemma 17.2. Let $A$ be an $m \times m$-matrix and $B$ be an $n \times n$-matrix with eigenvalues $\lambda_1, \ldots, \lambda_m$ and $\mu_1, \ldots, \mu_n$. The eigenvalues of $A \otimes B$ are $\lambda_i \mu_j$, $1 \le i \le m$, $1 \le j \le n$.
Proof. Let $x$ be an eigenvector of $A$ associated with the eigenvalue $\lambda$ and $y$ be an eigenvector of $B$ associated with the eigenvalue $\mu$. Let $z := x \otimes y$ be the vector
$$\begin{pmatrix} x_1 y \\ \vdots \\ x_m y \end{pmatrix},$$
where $x = (x_i)$. $z$ is an eigenvector of $A \otimes B$ associated with $\lambda\mu$:
$$(A \otimes B)\, z = \begin{pmatrix} a_{1,1} x_1 By + \dots + a_{1,m} x_m By \\ \vdots \\ a_{m,1} x_1 By + \dots + a_{m,m} x_m By \end{pmatrix} = \mu \begin{pmatrix} (a_{1,1} x_1 + \dots + a_{1,m} x_m)\, y \\ \vdots \\ (a_{m,1} x_1 + \dots + a_{m,m} x_m)\, y \end{pmatrix} = \lambda\mu \begin{pmatrix} x_1 y \\ \vdots \\ x_m y \end{pmatrix} = \lambda\mu z.$$
These are all eigenvalues, since one can show that if $x_1, \ldots, x_m$ and $y_1, \ldots, y_n$ are bases, then $x_i \otimes y_j$, $1 \le i \le m$, $1 \le j \le n$, form a basis, too.
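From the lemma, it follows that $\tilde{\lambda}(G \otimes G') = \max\{\tilde{\lambda}(G), \tilde{\lambda}(G')\}$, since $1 \cdot \tilde{\lambda}(G')$ and $\tilde{\lambda}(G) \cdot 1$ are eigenvalues of $\tilde{A} \otimes \tilde{A}'$, but the eigenvalue $1 \cdot 1$ is excluded in the definition of $\tilde{\lambda}(G \otimes G')$.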
Tensor product
         nodes    degree    λ̃
G        n        d         λ
G'       n'       d'        λ'
G⊗G'     nn'      dd'       max{λ, λ'}
Given oracle access to the neighbourhoods of $G$ and $G'$, we can compute the neighbourhood of a node $v$ in $G \otimes G'$ in time $O(d^2 \log \max\{n, n'\})$. (This assumes that from the names of the nodes $v$ in $G$ and $v'$ in $G'$ we can compute in linear time a name of the node that corresponds to $v \otimes v'$.)
17.3 Replacement product
Let $G$ be a $D$-regular graph with $n$ nodes and adjacency matrix $A$ and $H$ be a $d$-regular graph with $D$ nodes and adjacency matrix $B$. The replacement product $G \circledR H$ is defined as follows:
• For every node $v$ of $G$, we have one copy $H_v$ of $H$.
• For every edge $\{u, v\}$ of $G$, there are $d$ parallel edges between node $i$ in $H_u$ and node $j$ in $H_v$, where $v$ is the $i$th neighbour of $u$ and $u$ is the $j$th neighbour of $v$.
We assume that the nodes of $H$ are the numbers from 1 to $D$ and that the neighbours of each node of $G$ are ordered. Such an ordering can for instance be induced by an ordering of the nodes of $G$.
We can think of $G \circledR H$ as having an inner and an outer structure. The inner structures are the copies of $H$ and the outer structure is given by $G$. For every edge of $G$, we put $d$ parallel edges into $G \circledR H$. This ensures that when we choose a random neighbour of some node $v$, the probability that we stay in $H_v$ is the same as the probability that we go to another $H_u$. In other words, with probability $1/2$, we perform an inner step and with probability $1/2$, we perform an outer step. The normalized adjacency matrix of $G \circledR H$ is given by
\[
\frac{1}{2} \hat A + \frac{1}{2} I \otimes \tilde B,
\]
where $I$ is the $n \times n$ identity matrix and $\tilde B$ is the normalized adjacency matrix of $H$. The $nD \times nD$ matrix $\hat A$ is defined as follows: Think of the rows and columns as labeled with pairs $(v, j)$, where $v$ is a node of $G$ and $j$ is a node of $H$. Then there is a 1 in the position $((u, i), (v, j))$ if $v$ is the $i$th neighbour of $u$ and $u$ is the $j$th neighbour of $v$. $\hat A$ is a permutation matrix.
Obviously, $G \circledR H$ has $nD$ nodes and it is $2d$-regular.
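The definition of the replacement product can be turned into a few lines of code. This is a minimal sketch under assumptions that are not in the notes: graphs are given as ordered neighbour lists, $G$ is simple, and nodes are numbered from 0 instead of 1. The function name `replacement_product` is hypothetical.

```python
def replacement_product(G, H):
    """Neighbour lists of the replacement product as a multigraph.

    G[u] is the ordered neighbour list of u (length D, G simple),
    H[i] is the neighbour list of node i of H (length d, H has nodes 0..D-1).
    The result maps each node (u, i) to its 2d neighbours, with repetition.
    """
    d = len(H[0])
    product = {}
    for u, nbrs in enumerate(G):
        for i, v in enumerate(nbrs):
            j = G[v].index(u)                 # u is the j-th neighbour of v
            inner = [(u, h) for h in H[i]]    # inner steps: stay in the copy H_u
            outer = [(v, j)] * d              # outer step: d parallel edges to H_v
            product[(u, i)] = inner + outer
    return product

# Example: G = K_4 (3-regular, so D = 3) and H = triangle (2-regular on 3 nodes).
G = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2]]
H = [[1, 2], [0, 2], [0, 1]]
P = replacement_product(G, H)
```

In this example `P` has $nD = 12$ nodes, each with $2d = 4$ neighbours, half of them inner and half of them outer, which matches the description of a random step above.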
Excursus: Induced matrix norms
For a norm $\|\cdot\|$ on $\mathbb{R}^n$, the induced matrix norm on $\mathbb{R}^{n \times n}$ is defined by
\[
\|A\| = \sup_{x \ne 0} \frac{\|Ax\|}{\|x\|} = \max_{\|x\| = 1} \|Ax\|.
\]
It is a norm that is subadditive and submultiplicative. By definition, it is compatible with the vector norm, that is,
\[
\|Ax\| \le \|A\| \cdot \|x\|.
\]
It is the "smallest" norm that is compatible with the given vector norm.
For the Euclidean norm $\|\cdot\|_2$ on $\mathbb{R}^n$, the induced norm is the so-called spectral norm, the square root of the largest of the absolute values of the eigenvalues of $A^H A$. If $A$ is symmetric, then this is just the largest of the absolute values of the eigenvalues of $A$. In particular,
\[
\lambda(G) \le \|A\|_2.
\]
If $A$ is symmetric and doubly stochastic, then $\|A\|_2 \le 1$.
Lemma 17.3. If $\tilde\lambda(G) \le 1 - \epsilon$ and $\tilde\lambda(H) \le 1 - \delta$, then $\tilde\lambda(G \circledR H) \le 1 - \epsilon\delta^2/24$.
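Proof. By Bernoulli's inequality, it is sufficient to show that $\tilde\lambda(G \circledR H)^3 \le 1 - \epsilon\delta^2/8$. Since $\tilde\lambda(G \circledR H)^3 = \tilde\lambda((G \circledR H)^3)$, we analyze the threefold matrix power of $G \circledR H$. Its normalized adjacency matrix is given by
\[
\Bigl( \frac{1}{2} \hat A + \frac{1}{2} I \otimes \tilde B \Bigr)^3. \qquad (17.1)
\]
$\hat A$ and $I \otimes \tilde B$ are doubly stochastic, so their spectral norm is bounded by 1. Since the spectral norm is submultiplicative, we can expand (17.1) into
\[
(17.1) = \frac{1}{8} \Bigl( \text{sum of seven matrices of spectral norm} \le 1 \; + \; (I \otimes \tilde B) \hat A (I \otimes \tilde B) \Bigr)
= \frac{7}{8} M + \frac{1}{8} \underbrace{(I \otimes \tilde B) \hat A (I \otimes \tilde B)}_{=: (*)}
\]
with $\|M\| \le 1$. By Exercise 17.1, we can write $\tilde B = (1 - \delta) C + \delta J$ with $\|C\| \le 1$. Thus
\[
(*) = \bigl(I \otimes (1 - \delta) C + I \otimes \delta J\bigr) \hat A \bigl(I \otimes (1 - \delta) C + I \otimes \delta J\bigr)
= (1 - \delta^2) M' + \delta^2 (I \otimes J) \hat A (I \otimes J)
\]
with $\|M'\| \le 1$. A direct calculation shows that
\[
(I \otimes J) \hat A (I \otimes J) = A \otimes J',
\]
where the entries of $J'$ are all equal to $1/D^2$. Thus, the second largest eigenvalue satisfies
\[
\lambda\bigl((I \otimes J) \hat A (I \otimes J)\bigr) = \lambda(A \otimes J') \le \lambda(\tilde A).
\]
Hence,
\[
\Bigl( \frac{1}{2} \hat A + \frac{1}{2} I \otimes \tilde B \Bigr)^3 = \Bigl(1 - \frac{\delta^2}{8}\Bigr) M'' + \frac{\delta^2}{8} (A \otimes J')
\]
with $\|M''\| \le 1$ and
\[
\lambda\Bigl( \Bigl( \frac{1}{2} \hat A + \frac{1}{2} I \otimes \tilde B \Bigr)^3 \Bigr)
\le 1 - \frac{\delta^2}{8} + \frac{\delta^2}{8} (1 - \epsilon)
= 1 - \frac{\epsilon\delta^2}{8},
\]
because $\lambda(M'') \le \|M''\|$.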
The only term in the analysis that we used was the $(I \otimes \tilde B) \hat A (I \otimes \tilde B)$ term. This corresponds to doing an "inner" step in $H$, then an "outer" step in $G$ and again an "inner" step in $H$. The so-called zig-zag product is a product similar to the replacement product that only allows such steps.
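The proof of Lemma 17.3 starts with an appeal to Bernoulli's inequality: $(1 - a/3)^3 \ge 1 - a$ for $a = \epsilon\delta^2/8$, so a bound of $1 - \epsilon\delta^2/8$ on the third power gives the bound $1 - \epsilon\delta^2/24$ on the eigenvalue itself. The following sketch checks this step with exact arithmetic on a grid of values; the function name and the grid are chosen here only for illustration.

```python
from fractions import Fraction

def bernoulli_step_ok(eps, delta):
    """Check (1 - eps*delta^2/24)^3 >= 1 - eps*delta^2/8 exactly."""
    a = eps * delta ** 2
    return (1 - a / 24) ** 3 >= 1 - a / 8

# Grid of rational values 1/10, 2/10, ..., 1 for epsilon and delta.
grid = [Fraction(k, 10) for k in range(1, 11)]
all_ok = all(bernoulli_step_ok(e, d) for e in grid for d in grid)
```

Since $x \mapsto x^3$ is monotone on nonnegative numbers, $\tilde\lambda^3 \le 1 - \epsilon\delta^2/8 \le (1 - \epsilon\delta^2/24)^3$ indeed implies $\tilde\lambda \le 1 - \epsilon\delta^2/24$.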
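Exercise 17.1. Let $A$ be the normalized adjacency matrix of a $d$-regular $\lambda$-expander. Let
\[
J = \begin{pmatrix}
\frac{1}{n} & \dots & \frac{1}{n} \\
\vdots & \ddots & \vdots \\
\frac{1}{n} & \dots & \frac{1}{n}
\end{pmatrix}.
\]
Then
\[
A = (1 - \lambda) J + \lambda C
\]
for some matrix $C$ with $\|C\| \le 1$.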
Replacement product

                   nodes    degree   $\tilde\lambda$
$G$                $n$      $D$      $1 - \epsilon$
$H$                $D$      $d$      $1 - \delta$
$G \circledR H$    $nD$     $2d$     $1 - \epsilon\delta^2/24$
Given oracle access to the neighbourhoods of $G$ and $H$, we can compute the neighbourhood of a node $v$ in $G \circledR H$ in time $O((D + d) \log n)$. (This assumes that the oracle gives us the neighbourhoods in the same order as the one used when building the replacement product.)
17.4 Explicit construction
We first construct a family of expanders $(G_m)$ such that $G_m$ has $c^m$ nodes. In a second step (Exercise!), we will show that we can get expanders from $G_m$ of all sizes between $c^{m-1} + 1$ and $c^m$. The constants occurring in the proof are fairly arbitrary; they are just chosen in such a way that the proof works. We have taken them from the book by Arora and Barak.
For the start, we need the following constant size expanders. Since they
have constant size, we do not need a constructive proof, since we can simply
enumerate all graphs of the particular size and check whether they have the
mentioned properties.
Exercise 17.2. For large enough $d$, there are
1. a $d$-regular $0.01$-expander with $(2d)^{100}$ nodes,
2. a $2d$-regular $(1 - \frac{1}{50})$-expander with $(2d)^{200}$ nodes.
We now construct the graphs $G_k$ inductively:
1. Let $H$ be a $d$-regular $0.01$-expander with $(2d)^{100}$ nodes.
2. Let $G_1$ be a $2d$-regular $(1 - \frac{1}{50})$-expander with $(2d)^{100}$ nodes and $G_2$ be a $2d$-regular $(1 - \frac{1}{50})$-expander with $(2d)^{200}$ nodes.
3. For $k \ge 3$, let
\[
G_k := \bigl( (G_{\lfloor (k-1)/2 \rfloor} \otimes G_{\lceil (k-1)/2 \rceil})^{50} \bigr) \circledR H.
\]
Theorem 17.4. Every $G_k$ is a $2d$-regular $(1 - \frac{1}{50})$-expander with $(2d)^{100k}$ nodes. Furthermore, the mapping
\[
(\mathrm{bin}\, k, \mathrm{bin}\, i, \mathrm{bin}\, j) \mapsto j\text{th neighbour of node } i \text{ in } G_k
\]
is computable in time polynomial in $k$. (Note that $k$ is logarithmic in the size of $G_k$!)
Proof. The proof of the first part is by induction on $k$. Let $n_k$ denote the number of nodes of $G_k$.
Induction base: Clear from construction.
Induction step: The number of nodes of $G_k$ is
\[
n_{\lfloor (k-1)/2 \rfloor} \cdot n_{\lceil (k-1)/2 \rceil} \cdot (2d)^{100} = (2d)^{100(k-1)} \cdot (2d)^{100} = (2d)^{100k}.
\]
The degree of $G_{\lfloor (k-1)/2 \rfloor}$ and $G_{\lceil (k-1)/2 \rceil}$ is $2d$ by the induction hypothesis. The degree of their tensor product is $(2d)^2$ and that of the 50th matrix power is $(2d)^{100}$. Then we take the replacement product with $H$ and get the graph $G_k$ of degree $2d$.
Finally, the second largest eigenvalue of $G_{\lfloor (k-1)/2 \rfloor} \otimes G_{\lceil (k-1)/2 \rceil}$ is $\le 1 - \frac{1}{50}$. Thus,
\[
\tilde\lambda\bigl( (G_{\lfloor (k-1)/2 \rfloor} \otimes G_{\lceil (k-1)/2 \rceil})^{50} \bigr) \le \Bigl(1 - \frac{1}{50}\Bigr)^{50} \le \frac{1}{e} \le \frac{1}{2}.
\]
Thus, by Lemma 17.3 with $\epsilon = \frac{1}{2}$ and $\delta = 0.99$,
\[
\tilde\lambda(G_k) \le 1 - \frac{1}{2} \cdot 0.99^2 / 24 \le 1 - \frac{1}{50}.
\]
For the second part note that the definition of $G_k$ gives a recursive scheme to compute the neighbourhood of a node. The recursion depth is $\log k$. We have shown how to compute the neighbourhoods of $G^{50}$, $G \otimes G'$, and $G \circledR H$ from the neighbourhoods of the given graphs. The total size of the neighbourhood of a node in $G_k$ is $D^{\log k} = \mathrm{poly}(k)$ for some constant $D$.
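The bookkeeping in the proof of Theorem 17.4 can be replayed numerically. The sketch below tracks only the exponent $e(k)$ with $n_k = (2d)^{100 e(k)}$, the recursion depth, and the two numeric inequalities; the function names `exponent` and `depth` are introduced here for illustration.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def exponent(k):
    """e(k) such that G_k has (2d)^(100 * e(k)) nodes."""
    if k in (1, 2):
        return k
    lo, hi = (k - 1) // 2, k // 2   # floor((k-1)/2) and ceil((k-1)/2)
    # Tensor product multiplies node counts; the 50th power keeps them;
    # the replacement product with H multiplies by (2d)^100.
    return exponent(lo) + exponent(hi) + 1

def depth(k):
    """Depth of the recursion that unfolds the definition of G_k."""
    return 0 if k <= 2 else 1 + depth(k // 2)

# The two numeric facts used for the eigenvalue bound.
power_bound_ok = (1 - 1 / 50) ** 50 <= 1 / math.e <= 1 / 2
replacement_bound_ok = 1 - 0.5 * 0.99 ** 2 / 24 <= 1 - 1 / 50
```

For every $k$ the recurrence returns `exponent(k) == k`, in line with the claimed node count $(2d)^{100k}$, and `depth(k)` stays below $\log_2 k$.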
18 UCONN ∈ L
We modify the transition relation of $k$-tape nondeterministic Turing machines as follows: A transition is a tuple $(p, p', t_1, \dots, t_k)$ where $p$ and $p'$ are states and the $t_\kappa$ are triples of the form $(\alpha\beta, d, \alpha'\beta')$. The interpretation is the following: if $d = 1$, the head of $M$ stands on $\alpha$, and $\beta$ is the symbol to the right of the head, then $M$ may go to the right and replace the two symbols by $\alpha'$ and $\beta'$. If $d = -1$, then the head has to be on $\beta$ and $M$ goes to the left. In both cases, the machine changes its state from $p$ to $p'$. An "ordinary" Turing machine can simulate such a Turing machine by always first looking at the symbols to the left and right of the current head position and storing them in its finite control.
By defining a transition as above, every transition $T$ has a reverse transition $T^{-1}$ that undoes what $T$ did. $M$ is now called symmetric if for every $T$ in the transition relation $\Delta$, $T^{-1} \in \Delta$.
Definition 18.1.
SL = $\{L \mid$ there is a logarithmic space bounded symmetric Turing machine $M$ such that $L = L(M)\}$
L is a subset of SL. We simply make the transition relation of a deterministic Turing machine $M$ symmetric by adding $T^{-1}$ to it for every $T$ in it. Note that the weakly connected components of the configuration graph of $M$ are directed trees that converge into a unique accepting or rejecting configuration. We cannot reach any other accepting or rejecting configuration by making edges in the configuration graph bidirectional, so the accepted language is the same.
In the same way, we can see that UCONN ∈ SL: Just always guess a neighbour of the current node until we reach the target $t$. The guessing step can be made reversible and the deterministic steps between the guessing steps can be made reversible, too. UCONN is also hard for SL under deterministic logarithmic space reductions. The NL-hardness proof for CONN works; we use the fact that the configuration graph of a symmetric Turing machine is undirected. Finally, if $A \in$ SL and $B \le_{\log} A$, then $B \in$ SL.
Less obvious are the facts that
• planarity testing is in SL,
• bipartiteness testing is in SL,
• a lot of other interesting problems are contained in SL, see the compendium by [AG00],
• SL is closed under complementation [NTS95].
In this chapter, we will show that UCONN ∈ L. This immediately also
yields space eﬃcient algorithms for planarity or bipartiteness testing.
18.1 Connectivity in expanders
Lemma 18.2. Let $c < 1$ and $d \in \mathbb{N}$. The following promise problem can be decided by a logarithmic space bounded deterministic Turing machine:
Input: a $d$-regular graph such that every connected component is a $\lambda$-expander with $\lambda/d \le c$, and nodes $s$ and $t$.
Output: accept if there is a path between $s$ and $t$, otherwise reject.
Proof. The Turing machine enumerates all paths of length $O(\log n)$ starting in $s$. If it sees the node $t$, it accepts; after enumerating all the paths without seeing $t$, it rejects.
Since $G$ has constant degree, we can enumerate all paths in space $O(\log n)$. Every path is described by a sequence in $\{1, \dots, d\}^{O(\log n)}$. Such a sequence $\delta_0, \delta_1, \dots$ is interpreted as "Take the $\delta_0$th neighbour of $s$, then the $\delta_1$th neighbour of this node, ...".
If the machine accepts, then there certainly is a path between $s$ and $t$. For the other direction note that, by Lemma 16.3, a random walk on $G$ that starts in $s$ converges to the uniform distribution on the connected component containing $s$. After $O(\log n)$ steps, every node in the same connected component as $s$ has a positive probability of being reached. In particular, there is some path of length $O(\log n)$ to it.
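The enumeration in the proof of Lemma 18.2 can be sketched as follows. The sketch assumes the graph is given as ordered neighbour lists and takes the walk length as a parameter, since the constant hidden in $O(\log n)$ depends on $c$ and $d$; the function name is hypothetical. A Python program is of course not a logarithmic space machine, but the loop only ever keeps the current choice sequence and the current node.

```python
from itertools import product

def connected_by_short_walk(neighbours, s, t, length):
    """Try every sequence in {0, ..., d-1}^length as a walk starting in s."""
    d = len(neighbours[s])
    for choices in product(range(d), repeat=length):
        v = s
        if v == t:
            return True
        for c in choices:
            v = neighbours[v][c]   # take the c-th neighbour of the current node
            if v == t:
                return True
    return False

# Example: two disjoint copies of K_4 (a 3-regular graph on 8 nodes).
k4_twice = [[1, 2, 3], [0, 2, 3], [0, 1, 3], [0, 1, 2],
            [5, 6, 7], [4, 6, 7], [4, 5, 7], [4, 5, 6]]
same_component = connected_by_short_walk(k4_twice, 0, 3, 2)
other_component = connected_by_short_walk(k4_twice, 0, 5, 4)
```

In the example, node 3 is found from node 0, while node 5 lies in the other component and is never seen.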
18.2 Converting graphs into expanders
Lemma 18.3. There is a logarithmic space computable transformation that transforms any graph $G = (V, E)$ into a cubic regular graph $G' = (V', E')$ such that $V \subseteq V'$ and for any pair of nodes $s, t \in V$, there is a path between $s$ and $t$ in $G$ iff there is one in $G'$.
Proof. If a node $v$ in $G$
1. has degree $d > 3$, then we replace $v$ by a cycle of length $d$ and connect every node of the cycle to one of the neighbours of $v$.
2. has degree $d \le 3$, then we add $3 - d$ self loops.
For every node $v$ with degree $> 3$, we identify one of the new nodes of the cycle with $v$. Let the resulting graph be $G'$. By construction, $G'$ is cubic and if there is a path between $s$ and $t$ in $G$ then there is one in $G'$ and vice versa.
With a little care, the construction can be done in logarithmic space. (Recall that the Turing machine has a separate output tape that is write-only and one-way, so once it has decided to output an edge, this decision is not reversible.) We process each node in the order given by the representation of $G$. For each node $v$, we count the number $m$ of its neighbours. If $m \le 3$, then we just copy the edges containing $v$ to the output tape and output the additional self loops. If $m > 3$, then we output the edges $\{(v, i), (v, i + 1)\}$, $1 \le i < m$, and $\{(v, m), (v, 1)\}$. Then we go through all neighbours of $v$. If $u$ is the $i$th neighbour of $v$, then we determine which neighbour of $u$ the node $v$ is, say the $j$th, and output the edge $\{(v, i), (u, j)\}$. (We only need to do this if $v$ is processed before $u$ because otherwise, the edge has already been output.)
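The transformation of Lemma 18.3 can be sketched in a few lines. The assumptions are made here for illustration and are not from the notes: the input is a simple graph given as ordered neighbour lists with nodes numbered from 0, every new node is written as a pair `(v, i)`, the original node `v` is identified with `(v, 0)`, and a self loop is counted once towards the degree. The function name `make_cubic` is hypothetical.

```python
def make_cubic(adj):
    """Edge list of a cubic multigraph with the same connectivity as adj."""
    def port(v, i):
        # Node of the new graph carrying the edge to the i-th neighbour of v.
        return (v, i) if len(adj[v]) > 3 else (v, 0)

    edges = []
    for v, nbrs in enumerate(adj):
        m = len(nbrs)
        if m > 3:
            # Replace v by a cycle (v, 0), ..., (v, m-1).
            edges += [((v, i), (v, (i + 1) % m)) for i in range(m)]
        else:
            # Pad with 3 - m self loops.
            edges += [((v, 0), (v, 0))] * (3 - m)
        for i, u in enumerate(nbrs):
            if v < u:                     # output each original edge once
                j = adj[u].index(v)       # v is the j-th neighbour of u
                edges.append((port(v, i), port(u, j)))
    return edges

# Example: a star with centre 0 and five leaves, plus a separate edge {6, 7}.
star_plus_edge = [[1, 2, 3, 4, 5], [0], [0], [0], [0], [0], [7], [6]]
cubic_edges = make_cubic(star_plus_edge)
```

In the example the centre becomes a cycle of length 5, every leaf receives two self loops, and the pair 6, 7 stays disconnected from the star.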
Let $d$ be large enough such that there is a $d/2$-regular $0.01$-expander $H$ with $d^{50}$ nodes. (Again, the constants are chosen in such a way that the proof works; they are fairly arbitrary and we have taken them from the book by Arora and Barak.) We can make our cubic graph $G$ $d^{50}$-regular by adding $d^{50} - 3$ self loops per node. Recursively define
\[
G_0 := G, \qquad G_k := (G_{k-1} \circledR H)^{50}.
\]
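Lemma 18.4. For all $k \ge 1$,
1. $G_k$ has $d^{50k} n$ nodes,
2. $G_k$ is $d^{50}$-regular,
3. $\tilde\lambda(G_k) \le 1 - \epsilon_k$, where $\epsilon_k = \min\{\frac{1}{20}, \frac{1.5^k}{8 d^{50} n^3}\}$.
Proof. The proof is by induction on $k$. Let $n_k$ be the number of nodes of $G_k$.
Induction base: $G_0$ has $n$ nodes and degree $d^{50}$. By Lemma 16.4,
\[
\tilde\lambda(G_0) \le 1 - \frac{1}{8 d^{50} n^3} \le 1 - \epsilon_0.
\]
Induction step: The replacement product $G_k \circledR H$ has $n_k d^{50} = n_{k+1}$ nodes. Its degree is $d$. $G_{k+1}$ has the same number of nodes and the degree becomes $d^{50}$. We have
\[
\tilde\lambda(G_k \circledR H) \le 1 - \frac{\epsilon_k}{24} \cdot 0.99^2 \le 1 - \frac{\epsilon_k}{25}
\]
and
\[
\tilde\lambda(G_{k+1}) \le \Bigl(1 - \frac{\epsilon_k}{25}\Bigr)^{50} \le e^{-2\epsilon_k} \le 1 - 2\epsilon_k + 2\epsilon_k^2 = 1 - 2\epsilon_k (1 - \epsilon_k).
\]
If $\epsilon_k = \frac{1}{20}$, then $\tilde\lambda(G_{k+1}) \le 1 - \frac{1}{20}$. If $\epsilon_k = \frac{1.5^k}{8 d^{50} n^3} < \frac{1}{20}$, then $\tilde\lambda(G_{k+1}) \le 1 - 1.5\,\epsilon_k \le 1 - \epsilon_{k+1}$.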
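Lemma 18.4 says that the gap $\epsilon_k$ grows by a factor of 1.5 per round until it is capped at $1/20$. The number of rounds needed to reach the cap can be computed exactly with integer arithmetic; the sketch below does this for illustrative values of $n$ and $d$ (the function name and the sample values are assumptions made here).

```python
def rounds_needed(n, d):
    """Smallest k with 1.5^k / (8 d^50 n^3) >= 1/20.

    The inequality is rewritten as 20 * 3^k >= 8 * d^50 * n^3 * 2^k so that
    only integers are involved.
    """
    k = 0
    while 20 * 3 ** k < 8 * d ** 50 * n ** 3 * 2 ** k:
        k += 1
    return k

# Doubling n multiplies n^3 by 8, which costs log_{1.5}(8) (about 5.13) extra rounds.
extra_rounds = [rounds_needed(2 * n, 4) - rounds_needed(n, 4) for n in (10, 100, 1000)]
```

For constant $d$ this makes the dependence $k = O(\log n)$ concrete: each doubling of $n$ adds 5 or 6 rounds.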
If we set $k = O(\log n)$, then $G_k$ is a constant degree expander with $\tilde\lambda(G_k) \le \frac{19}{20}$. For such graphs, connectivity can be decided in deterministic logarithmic space by Lemma 18.2. So we could first make our input graph cubic, then compute $G_k$ for $k = O(\log n)$ and finally use the connectivity algorithm for expander graphs. Since L is closed under logarithmic space computable reductions, this would show UCONN ∈ L.
But there is one problem: To compute $G_k$, we cannot compute $G_0$, then $G_1$, then $G_2$, and so on, since L is only closed under application of a constant number of many-one reductions. Thus we have to compute $G_k$ from $G_0$ in one step.
Lemma 18.5. The mapping $G_0 \mapsto G_k$ with $k = O(\log n)$ is deterministic logarithmic space computable.
Proof. Assume that $G_0$ has nodes $\{1, \dots, n\}$. Then the nodes of $G_k$ are from $\{1, \dots, n\} \times \{1, \dots, d^{50}\}^k$. The description length of a node of $G_k$ is $\log n + 50 \log d \cdot k = O(\log n)$. We will identify $\{1, \dots, d^{50}\}$ with $\{1, \dots, d\}^{50}$, since an edge in $G_k$ corresponds to a path of length 50 in $G_{k-1} \circledR H$.
Now given a node $v = (i, \delta_1, \dots, \delta_k)$ of $G_k$ and $j \in \{1, \dots, d^{50}\}$, we want to compute the $j$th neighbour of $v$ in $G_k$. We interpret $j$ as a sequence $(j_1, \dots, j_{50}) \in \{1, \dots, d\}^{50}$.
Input: node $v = (i, \delta_1, \dots, \delta_k)$ of $G_k$, index $j = (j_1, \dots, j_{50})$
Output: the $j$th neighbour of $v$ in $G_k$
1. For $h = 1, \dots, 50$ compute the $j_h$th neighbour of the current node in $G_{k-1} \circledR H$.
So it remains to compute the neighbours in $G_{k-1} \circledR H$.
Input: node $v = (i, \delta_1, \dots, \delta_k)$ of $G_k$, index $j$
Output: the $j$th neighbour of $v$ in $G_{k-1} \circledR H$
1. If $j \le d/2$, then return $(i, \delta_1, \dots, \delta_{k-1}, \delta')$ where $\delta'$ is the $j$th neighbour of $\delta_k$ in $H$. Since $H$ is constant, this can be hard-wired.
(We perform an internal step inside a copy of $H$.)
2. Otherwise, recursively compute the $\delta_k$th neighbour of $(i, \delta_1, \dots, \delta_{k-1})$ in $G_{k-1}$.
(We perform an external step between two copies of $H$.)
Note that we can view $(i, \delta_1, \dots, \delta_k)$ as a stack and all the recursive calls operate on the same stack. Thus we only have to store one node at a time.
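The recursion in the proof of Lemma 18.5 can be tried out with toy parameters. The sketch below is not the construction with the constants of the notes: it uses the power 2 instead of 50, takes $H$ to be a cycle instead of an expander, and takes $G_0$ to be a cycle padded with self loops. It also works with rotation maps, that is, every neighbour query additionally returns the index of the edge as seen from the other endpoint; this is an assumption made here to make the outer step well defined. All function names are hypothetical.

```python
T = 2                # power used in G_k = (G_{k-1} (R) H)^T; the notes use 50
DH = 2               # degree of H; the notes use d/2
D = (2 * DH) ** T    # degree of every G_k and number of nodes of H

def rot_H(a, j):
    """Rotation map of H, here a cycle on D nodes with labels 0 and 1."""
    return ((a + 1) % D, 1) if j == 0 else ((a - 1) % D, 0)

def rot_G0(n, v, i):
    """Rotation map of G_0: a cycle on n nodes plus D - 2 self loops per node."""
    if i == 0:
        return ((v + 1) % n, 1)
    if i == 1:
        return ((v - 1) % n, 0)
    return (v, i)

def rot_repl(n, k, node, j):
    """One step in G_{k-1} (R) H from node = (v, a_1, ..., a_k) along label j."""
    *rest, a = node
    if j < DH:                                   # inner step inside a copy of H
        a2, j2 = rot_H(a, j)
        return (tuple(rest) + (a2,), j2)
    w, b = rot_G(n, k - 1, tuple(rest), a)       # outer step: a-th edge in G_{k-1}
    return (w + (b,), j)

def rot_G(n, k, node, i):
    """Rotation map of G_k; the label i encodes T labels in base 2 * DH."""
    if k == 0:
        (v,) = node
        w, j = rot_G0(n, v, i)
        return ((w,), j)
    labels = [(i // (2 * DH) ** h) % (2 * DH) for h in range(T)]
    back = []
    for j in labels:                             # walk of length T
        node, jb = rot_repl(n, k, node, j)
        back.append(jb)
    back.reverse()                               # labels of the reverse walk
    return (node, sum(jb * (2 * DH) ** h for h, jb in enumerate(back)))
```

Calling `rot_G(5, 2, (0, 3, 7), 9)` returns a neighbour of the node `(0, 3, 7)` of $G_2$ together with the label that leads back. As in the proof, each call only manipulates the current node; the Python tuples stand in for the stack mentioned above.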
Theorem 18.6 (Reingold [Rei08]). UCONN ∈ L.
Corollary 18.7. L = SL.
19 Extractors
Extractors are a useful tool for randomness efficient error probability amplification. To define extractors, we first have to be able to measure the closeness of probability distributions.
Definition 19.1. Let $X$ and $Y$ be two random variables with range $S$. The statistical difference of $X$ and $Y$ is $\mathrm{Diff}(X, Y) = \max_{T \subseteq S} |\Pr[X \in T] - \Pr[Y \in T]|$. $X$ and $Y$ are called $\epsilon$-close if $\mathrm{Diff}(X, Y) \le \epsilon$.
In the same way, we can deﬁne the statistical diﬀerence of two probability
distributions.
We can think of $T$ as a statistical test which tries to distinguish the distributions of $X$ and $Y$. The $L_1$-distance of $X$ and $Y$ is defined as
\[
|X - Y|_1 = \sum_{s \in S} |\Pr[X = s] - \Pr[Y = s]|.
\]
$L_1$-distance and statistical difference are related as stated below.
Exercise 19.1. Prove the following: Two random variables $X$ and $Y$ are $\epsilon$-close if and only if $|X - Y|_1 \le 2\epsilon$.
Statistical closeness is preserved under application of functions.
Exercise 19.2. Prove the following statements:
1. Let $X$ and $Y$ be random variables with range $S$ that are $\epsilon$-close. Let $f$ be a function with domain $S$. Then $f(X)$ and $f(Y)$ are $\epsilon$-close.
2. If $Z$ is a random variable independent of $X$ and $Y$, then the random variables $(X, Z)$ and $(Y, Z)$ are $\epsilon$-close.
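The relation between statistical difference and $L_1$-distance from Exercise 19.1 can be observed on a small example. The sketch below computes the statistical difference by brute force over all tests $T$; the representation of distributions as dictionaries and the two example distributions are chosen here for illustration.

```python
from fractions import Fraction
from itertools import combinations

def statistical_difference(p, q):
    """max over all subsets T of |Pr[X in T] - Pr[Y in T]| (brute force)."""
    support = sorted(set(p) | set(q))
    best = Fraction(0)
    for size in range(len(support) + 1):
        for T in combinations(support, size):
            gap = abs(sum(p.get(s, 0) for s in T) - sum(q.get(s, 0) for s in T))
            best = max(best, gap)
    return best

def l1_distance(p, q):
    """Sum over all s of |Pr[X = s] - Pr[Y = s]|."""
    return sum(abs(p.get(s, 0) - q.get(s, 0)) for s in set(p) | set(q))

quarter = Fraction(1, 4)
p = {"a": 2 * quarter, "b": quarter, "c": quarter}
q = {"a": quarter, "b": quarter, "c": quarter, "d": quarter}
diff = statistical_difference(p, q)
dist = l1_distance(p, q)
```

For these two distributions the $L_1$-distance is exactly twice the statistical difference, as the exercise predicts.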
A classical measure for the amount of randomness contained in a random source $X$ is the Shannon entropy $H(X) = -\sum_{s \in S} \Pr[X = s] \log \Pr[X = s]$. This is however not a suitable measure in our context. Consider for instance the following source: With probability 0.99 it returns the all-zero string. With probability 0.01 it returns a string in $\{0, 1\}^N$ chosen uniformly at random. The Shannon entropy of this source is $\ge 0.01 N$, which is quite large, in particular unbounded. If we want to use this source for simulating randomized algorithms, we will take one sample from this source. But with probability 0.99, we see a string that contains no randomness at all, which is not very useful for derandomization. The Shannon entropy measures "randomness on the average" and particularly does not talk about variance. It is useful when one draws many samples from a source. For our purposes, the following definition is more useful.
Definition 19.2. Let $X$ be a random variable with range $S$.
1. The min-entropy of $X$ is $\min_{s \in S} -\log \Pr[X = s]$.
2. If $X$ has min-entropy at least $k$, then $X$ will be called a $k$-source. If in addition its range is contained in $\{0, 1\}^N$, then $X$ is an $(N, k)$-source.
Note that the min-entropy of the source above is at most $\log(1/0.99)$, which is constant. In some sense, the min-entropy measures "randomness in the worst case".
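The gap between the two entropy notions for the source described above can be computed directly. The following sketch evaluates both quantities in closed form for a given $N$ (logarithms to base 2); the function name `entropies` is introduced here for illustration.

```python
import math

def entropies(N):
    """Shannon entropy and min-entropy of the source that returns the all-zero
    string with probability 0.99 and a uniform string from {0,1}^N otherwise."""
    p_other = 0.01 * 2.0 ** -N            # probability of each nonzero string
    p_zero = 0.99 + p_other               # the all-zero string can also be drawn uniformly
    shannon = (-p_zero * math.log2(p_zero)
               - (2 ** N - 1) * p_other * math.log2(p_other))
    min_entropy = -math.log2(p_zero)
    return shannon, min_entropy

shannon_100, min_entropy_100 = entropies(100)
```

For $N = 100$ the Shannon entropy exceeds 1 bit and keeps growing linearly in $N$, while the min-entropy stays below $\log_2(1/0.99) \approx 0.0145$ for every $N$.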
Definition 19.3. Let $U_d$ be the uniform distribution on $\{0, 1\}^d$. A function $\mathrm{Ext} : \{0, 1\}^N \times \{0, 1\}^d \to \{0, 1\}^m$ is called a $(k, \epsilon)$-extractor if for any $(N, k)$-source $X$, $\mathrm{Ext}(X, U_d)$ is $\epsilon$-close to uniform.
Above, we call a source $\epsilon$-close to uniform if it and $U_m$ are $\epsilon$-close.
Our aim is to construct extractors with small d and large m. An extractor
extracts the randomness of the weak source in the sense that given a sample
of the weak random source and a short truly random string, it produces a
string that is nearly uniformly distributed.
Sometimes it is convenient to view an extractor Ext as a bipartite multigraph. The nodes are $\{0, 1\}^N$ on the one side and $\{0, 1\}^m$ on the other side. Each node $v \in \{0, 1\}^N$ has degree $2^d$. It is incident with the edges $(v, \mathrm{Ext}(v, i))$ for all $i \in \{0, 1\}^d$.
A family of extractors $\mathrm{Ext}_m : \{0, 1\}^{N(m)} \times \{0, 1\}^{d(m)} \to \{0, 1\}^m$ is called explicit if the mapping $(m, v, e) \mapsto \mathrm{Ext}_m(v, e)$ is computable in time $\mathrm{poly}(N(m), d(m), m)$. (Usually, $N \ge m$ for an extractor. Therefore, we parameterize the family by the size of the image.)
19.1 Extractors from expanders
Lemma 19.4. Let $\epsilon > 0$. Let $k(n) \le n$ for all $n$. There is an explicit family of $(k, \epsilon)$-extractors $\mathrm{Ext}_n : \{0, 1\}^n \times \{0, 1\}^t \to \{0, 1\}^n$ with $t = O(n - k + \log(1/\epsilon))$.
Proof. Let $X$ be an $(n, k)$-source, and let $v$ be a sample drawn from $X$. Let $G = (V, E)$ be a $d$-regular $\frac{1}{2}$-expander with $2^n$ nodes. (We do not construct this graph, since it is too large. We just perform a random walk on it. This is possible, since strongly explicit expanders exist.) Let $z$ be a truly random string of length
\[
t = \log d \cdot \Bigl( \frac{n}{2} - \frac{k}{2} + \log\frac{1}{\epsilon} + 2 \Bigr) = O\Bigl(n - k + \log\frac{1}{\epsilon}\Bigr).
\]
We interpret $z$ as a random walk in $G$ of length $\ell = \frac{n}{2} - \frac{k}{2} + \log\frac{1}{\epsilon} + 2$ and set
\[
\mathrm{Ext}(v, z) = \text{label of the node reached from } v \text{ by a walk as given by } z.
\]
Let $p$ be the probability distribution on $V$ induced by $X$ and $\tilde A$ be the normalized adjacency matrix of $G$. Let $p = \tilde 1 + p'$ with $\tilde 1 \perp p'$, where $\tilde 1$ denotes the uniform distribution on $V$. We have
\[
\|\tilde A^\ell p - \tilde 1\| = \|\tilde A^\ell (p - \tilde 1)\| = \|\tilde A^\ell p'\| \le 2^{-\ell} \|p - \tilde 1\|.
\]
Since $X$ is an $(n, k)$-source, we have $\Pr[X = s] \le 2^{-k}$ for every $s$ in the range of $X$. Thus $\|p\|^2 \le 2^{-k}$. Therefore,
\[
\|p - \tilde 1\| \le \|p\| + \|\tilde 1\| \le 2^{-k/2} + 2^{-n/2} \le 2^{-k/2 + 1}
\]
and
\[
\|\tilde A^\ell p - \tilde 1\| \le 2^{-n/2 + k/2 - \log(1/\epsilon) - 2} \cdot 2^{-k/2 + 1} = \epsilon \cdot 2^{-n/2 - 1}.
\]
Finally,
\[
\mathrm{Diff}(\tilde A^\ell p, U_n) = \frac{1}{2} \|\tilde A^\ell p - \tilde 1\|_1 \le \frac{1}{2} \|\tilde A^\ell p - \tilde 1\|_2 \cdot 2^{n/2} \le \epsilon.
\]
The extractor constructed above is only efficient if $k$ is large, at least $(1 - \epsilon) n$. For small $k$, better constructions are known.
19.2 Randomness eﬃcient probability ampliﬁcation
Lemma 19.5. If there is an explicit family of $(k(r), 1/8)$-extractors $\mathrm{Ext}_r : \{0, 1\}^{N(r)} \times \{0, 1\}^{d(r)} \to \{0, 1\}^r$, then for any BPP-Turing machine $M$ that runs in time $t$, uses $r$ random bits, and has error probability $1/3$, there is a BPP-machine $M'$ with $L(M) = L(M')$ that runs in time $\mathrm{poly}(N(r), 2^{d(r)}, t)$, uses $N(r)$ random bits, and has error probability bounded by $2^{k(r) - N(r)}$.
Proof. $M'$ uses its $N(r)$ random bits and interprets them as a string $x \in \{0, 1\}^{N(r)}$. Let $y_i = \mathrm{Ext}(x, i)$ for all $i \in \{0, 1\}^{d(r)}$. $M'$ now simulates $2^{d(r)}$ runs of $M$, each one with a different string $y_i$ as random string. $M'$ accepts if the majority of these runs lead to an accepting configuration and rejects otherwise.
The bound on the running time is clear from the construction. We have to estimate the error probability. Assume that a given input $u$ is in $L(M)$, i.e., $M$ accepts $u$ with probability at least $2/3$. The case $u \notin L(M)$ is symmetric. To show the bound on the error probability, it is sufficient to show that fewer than $2^{k(r)}$ of the random strings $x$ lead to a rejecting configuration. Suppose on the contrary that this is not the case. Let $S$ be the set of all such $x$. Then the uniform distribution $X$ on $S$ has min-entropy at least $k(r)$. Thus $\mathrm{Ext}(X, U_{d(r)})$ is $1/8$-close to uniform. Let $T \subseteq \{0, 1\}^r$ be the statistical test that consists of all random strings that make $M$ accept. The probability that a string drawn uniformly at random from $\{0, 1\}^r$ is in $T$ is at least $2/3$. By definition, the probability that the $y_i$ are in $T$ is
\[
\ge 2/3 - 1/8 > 1/2.
\]
This is a contradiction, since for each choice of $x$ that makes $M'$ reject, at least half of the strings $\mathrm{Ext}(x, i)$ lead to a rejecting configuration, i.e., are not in $T$.
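The machine $M'$ from the proof of Lemma 19.5 is a plain majority vote over all seeds. The sketch below shows only this control flow. The function `toy_ext` is not an extractor and `toy_machine` is not a real BPP machine; both are placeholders made up here so that the code can be run, and all names are hypothetical.

```python
from itertools import product

def amplified_accepts(machine_accepts, ext, u, x, d):
    """Run the machine once per seed i in {0,1}^d with random string ext(x, i)
    and accept iff a strict majority of the runs accept."""
    seeds = product((0, 1), repeat=d)
    accepting = sum(1 for seed in seeds if machine_accepts(u, ext(x, seed)))
    return 2 * accepting > 2 ** d

def toy_ext(x, seed):
    """Placeholder with N = d = r: bitwise XOR of sample and seed (NOT an extractor)."""
    return tuple(a ^ b for a, b in zip(x, seed))

def toy_machine(u, y):
    """Placeholder machine: accepts for 3 of the 4 random strings of length 2."""
    return y[0] == 1 or y[1] == 1

decisions = [amplified_accepts(toy_machine, toy_ext, None, x, 2)
             for x in product((0, 1), repeat=2)]
```

The running time of the vote is dominated by the $2^{d(r)}$ simulated runs, which is where the factor $2^{d(r)}$ in the lemma comes from.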
If we take the extractor from the previous section, we have $N(r) = r$. To achieve $d(r) = O(\log n)$ (and get polynomial running time), we have to set $k(r) = r - \log r$. To get a $k(r)$-source, we can use $k(r)$ random bits and fill the remaining $\log r$ bits with zeroes. The error probability is $2^{r - \log r - r} = 1/r$. So we get a polynomial error reduction with fewer random bits! (Note that one can always save $\log r$ random bits by trying all possibilities for them and then making a majority vote. But it is not clear that this reduces the error probability, since the trials are not independent.)
Extractors can also be used to run PTMs with a weak random source instead of a perfect random string. The proof of the following lemma is similar to the proof of the previous one and is left as an exercise.
Lemma 19.6. If there is an explicit family of $(k(r), 1/6)$-extractors $\mathrm{Ext}_r : \{0, 1\}^{N(r)} \times \{0, 1\}^{d(r)} \to \{0, 1\}^r$, then for any BPP-machine $M$ that runs in time $t$, uses $r$ random bits, and has error probability $1/3$, there is a Turing machine $M'$ with $L(M) = L(M')$ that runs in time $\mathrm{poly}(N(r), 2^{d(r)}, t)$, uses one sample of an $(N(r), k(r) + \ell(r))$-source, and has error probability bounded by $2^{-\ell(r)}$.
Exercise 19.3. Prove Lemma 19.6.
20 Circuits and ﬁrstorder logic
One can (quite easily) find AC$^0$ circuits for addition. Multiplication seems a little harder, but there are constant depth circuits with unbounded fan-in for multiplication, if we use not only and, or, and not gates, but also threshold gates. But for a long time, it was not even known how to divide in logspace, let alone with constant-depth circuits of polynomial size. This has changed. The goal of this and the following section is to develop threshold circuits for division.
It will sometimes be more convenient to do this in the framework of logic. Thus, we will show the equivalence of constant-depth circuits to first-order logic in this section, which has been proved by Barrington and Immerman [BI90]. In the next section, we will then show that division can be performed by constant-depth circuits.
This chapter is far from being a complete introduction to complexity theory in terms of logic, which is called descriptive complexity. We will only cover first-order logic with some extensions. For a more detailed introduction, we refer to Immerman [Imm99].
20.1 Firstorder logic
First-order logic is logic where the quantifiers range only over elements of the domain and not (as in second-order logic) over sets of elements.
Since we want to express properties of strings over $\{0, 1\}$, we introduce a unary predicate $X$. For an input string $x = x_0 \dots x_{n-1}$, we have $X(i) = 1$ if and only if $x_i = 1$. (We will use 1 and true as well as 0 and false interchangeably.) We will have constants 0, 1, and $|x| = n$ and binary predicates $=$ and $\le$ on numbers $\{0, \dots, n\}$. Finally, we include the binary predicate BIT, where $\mathrm{BIT}(i, m) = 1$ if and only if the $i$th bit of the binary expansion of $m$ is 1. (For instance, $\mathrm{BIT}(0, m) = 1$ if and only if $m$ is odd. The role of BIT will soon become clear.)
Our first-order language is the set of formulas that we can build using 0, 1, $n$, $\le$, $=$, BIT, $X()$ as well as $\wedge$, $\vee$, $\neg$, and variables $x, y, z, \dots$ and the quantifiers $\forall$ and $\exists$. The quantifiers always range over $\{0, 1, \dots, n - 1\}$, i.e., $\exists x$ means $\exists x \in \{0, 1, \dots, n - 1\}$. To make the notation less cumbersome, we add syntactic sugar like $\to$, where $a \to b$ means $\neg a \vee b$. Furthermore, we abbreviate $(a \wedge b) \vee (\neg a \wedge \neg b)$ by $a = b$, knowing well that this might cause confusion with the "official" binary predicate $=$. The exclusive-or is denoted by $\oplus$. We will also use the binary predicate $<$ on numbers. Not surprisingly, $a < b$ if and only if $a \le b$ and $\neg(a = b)$. Analogously, we will use $\ne$, $>$, and $\ge$. To increase confusion, we identify 0 and false as well as 1 and true; we will thus treat true and false also as numbers.
A sentence is a closed formula of the first-order language, i.e., one that contains no free variables. (A variable $x$ is free if there is no quantifier $\forall x$ or $\exists x$ to the left of it that binds this variable.) Sentences express properties of binary strings: A string $x$ specifies $X$ and $n = |x|$. Then the sentence is either true or false. In the former case, $x$ has the property (or is in the language specified by the sentence). In the latter, not.
We denote by FO the set of all languages that can be expressed by first-order sentences. In particular, the predicates $\le$ and BIT are allowed for FO. If we want to make clear that only built-in predicates $P_1, \dots, P_c$ are allowed, we call the corresponding class FO$[P_1, \dots, P_c]$. Thus, FO = FO$[\le, \mathrm{BIT}]$. Furthermore, FO$[]$, which forbids $\le$ and BIT, is a strict subclass of FO. In FO$[]$, only $=$, 0, 1, and $n$ are available.
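Example 20.1. The regular language $L((00 + 11)^*)$ can be expressed by
\[
\forall i : \neg\mathrm{BIT}(0, i) \to \bigl( \exists j : \mathrm{succ}(i, j) \wedge X(i) = X(j) \bigr).
\]
We have to specify $\mathrm{succ}(i, j)$, which should be 1 if and only if $i + 1 = j$:
\[
\mathrm{succ}(i, j) \equiv \forall k : (k < j) \leftrightarrow (k \le i).
\]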
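A first-order sentence can be evaluated by brute force, which makes it easy to test Example 20.1 on all short strings. The sketch assumes the following reading, which is an interpretation made here: positions are numbered from 0, the premise of the implication is "$i$ is even" (that is, the negation of $\mathrm{BIT}(0, i)$), and $\mathrm{succ}(i, j)$ holds exactly when $j = i + 1$. The function names are hypothetical.

```python
import re
from itertools import product

def bit(i, m):
    """BIT(i, m): the i-th bit of the binary expansion of m is 1."""
    return (m >> i) & 1 == 1

def succ(i, j):
    """succ(i, j): j is the successor of i."""
    return j == i + 1

def sentence_holds(x):
    """forall i: (i even) -> exists j: succ(i, j) and X(i) = X(j),
    with both quantifiers ranging over the positions 0, ..., n-1 of x."""
    positions = range(len(x))
    return all(bit(0, i) or any(succ(i, j) and x[i] == x[j] for j in positions)
               for i in positions)

# Compare with the regular expression on all binary strings of length <= 8.
mismatches = [s for n in range(9) for s in map("".join, product("01", repeat=n))
              if sentence_holds(s) != bool(re.fullmatch(r"(00|11)*", s))]
```

Strings of odd length are rejected because the last position is even and has no successor inside the string, so no separate length check is needed.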
The class FO suffices to perform (or describe) addition. We assume that three inputs $x$, $y$, and $z$ (each an $n$-digit number) are each given as separate unary predicates $X$, $Y$, and $Z$, respectively.
We have $x + y = z$ if and only if $X(0) \oplus Y(0) = Z(0)$ and $X(i) \oplus Y(i) \oplus C(i) = Z(i)$ for $i \in \{1, 2, 3, \dots, n - 1\}$ and $C(n) = 0$. Here, $C(i)$ denotes the $i$th carry bit. We can express this as
\[
\bigl( X(0) \oplus Y(0) = Z(0) \bigr) \wedge \forall i : \bigl( i = 0 \vee X(i) \oplus Y(i) \oplus C(i) = Z(i) \bigr) \wedge \neg C(n).
\]
This leaves us with the problem of how to compute $C$. As a first attempt, one might try
\[
C(i) \equiv \bigl( X(i - 1) \wedge Y(i - 1) \bigr) \vee \bigl( C(i - 1) \wedge (X(i - 1) \vee Y(i - 1)) \bigr).
\]
(We can compute $i - 1$ using succ.) But this does not work: $C$ is not a predicate that we are allowed to use. Instead, it is a placeholder for something, and we are only allowed to replace it by a first-order formula. Thus, we replace $C(i)$ by the first-order formula
\[
\exists j < i : X(j) \wedge Y(j) \wedge \forall k \in \{j, \dots, i - 1\} : X(k) \vee Y(k).
\]
In the following, we call something like $C$, which looks like a predicate and is used like a predicate, but is not a predicate, a pseudo predicate.
In the same way, we can add numbers from $\{0, 1, \dots, n - 1\}$ using BIT. We call the corresponding ternary predicate $+$, written in the usual way $x + y = z$.
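The carry formula can be compared against ordinary integer addition. The sketch below represents numbers as lists of bits with the least significant bit at position 0 and evaluates the first-order description of $x + y = z$ literally; the helper names `bits`, `carry` and `adds_up` are chosen here for illustration.

```python
def bits(value, n):
    """The n lowest bits of value, least significant bit first."""
    return [bool((value >> i) & 1) for i in range(n)]

def carry(X, Y, i):
    """C(i): some position j < i generates a carry (X(j) and Y(j)) and every
    position k in {j, ..., i-1} propagates it (X(k) or Y(k))."""
    return any(X[j] and Y[j] and all(X[k] or Y[k] for k in range(j, i))
               for j in range(i))

def adds_up(X, Y, Z):
    """The first-order description of x + y = z for n-bit numbers."""
    n = len(X)
    return ((X[0] ^ Y[0]) == Z[0]
            and all((X[i] ^ Y[i] ^ carry(X, Y, i)) == Z[i] for i in range(1, n))
            and not carry(X, Y, n))

n = 4
agrees = all(adds_up(bits(x, n), bits(y, n), bits(z, n)) == (x + y == z)
             for x in range(2 ** n) for y in range(2 ** n) for z in range(2 ** n))
```

Note that `carry` contains no recursion: the carry into position $i$ is determined by looking for a generating position followed by propagating positions, which is exactly what makes it expressible by a first-order formula.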
Just to get a better feeling, assume that our input consists of three parts, representing $n/3$-bit numbers $a$, $b$, and $c$. We want to test if $a + b = c$. First, we compute $m = n/3$ and $m' = 2n/3 = 2m$, which can be done by
\[
\exists m \exists m' : m + m = m' \wedge m + m' = n.
\]
Then we add three pseudo predicates $A$, $B$, $C$ as follows:
\[
\begin{aligned}
A(i) &\equiv (i < m) \wedge X(i), \\
B(i) &\equiv \exists j : i + m = j \wedge (i < m) \wedge X(j), \text{ and} \\
C(i) &\equiv \exists j : i + m' = j \wedge (i < m) \wedge X(j).
\end{aligned}
\]
Now we can use addition using the pseudo predicates.
Exercise 20.1. Give a first-order sentence for the language COPY $= \{ww \mid w \in \{0, 1\}^*\}$.
Exercise 20.2. Give a first-order sentence for the following variant of parity:
\[
\Bigl\{ x \in \{0, 1\}^* \;\Big|\; x = x_1 \dots x_n \wedge \bigoplus_{i=1}^{\lfloor \log_2 n \rfloor} x_i = 1 \Bigr\}.
\]
(We will soon prove that FO equals AC$^0$ (with appropriate uniformity). We already know that parity is not in AC$^0$ (not even in non-uniform AC$^0$). Indeed, every constant depth circuit for parity has to be of size $2^{n^{\varepsilon}}$ for some constant $\varepsilon > 0$. Why is this not a contradiction?)
Do the proof without the following Lemma 20.2.
Using BIT, we can also perform multiplication of numbers between 0 and $n - 1$. To do this, we need the following result, which is called the bit-sum lemma.
Lemma 20.2. The binary predicate BSUM, which is defined by $\mathrm{BSUM}(x, y) = 1$ if and only if $y$ is equal to the number of 1s in the binary representation of $x$, is in FO.
Proof overview: The idea is to keep a running sum $s_1, s_2, s_3, \dots$ of the first, second, third, ... block of $\log\log n$ bits of $x$. Then we only have to check whether $s_i$ equals $s_{i-1}$ plus the number of 1s in the $i$th block of $\log\log n$ bits of $x$. This reduces the problem to counting 1s in blocks of $\log\log n$ bits. We apply the same idea again.
Now we encode an array containing the prefix sums within each block into a single variable. Furthermore, we have a variable that represents an array containing $s_1, s_2, \dots$
Proof. It is fairly easy to express a predicate Pow2 that some number m
is a power of two:
Pow2(m) ≡ ∃i : BIT(i, m) ∧ ∀j : j = i → BIT(j, m) .
The number x consists of at most log
2
n bits. Let L be the smallest power
of two larger than log
2
n. The number L can be expressed as follows:
∃L : Pow2(L) ∧ BIT(L −1, n) = 1 .
(Strictly speaking, we have to translate something like BIT(L − 1, n) into
∃q : q +1 = L∧BIT(q, n). But will not do so for the sake of readability.) In
the following, we assume that a variable can hold up to L bits. This is not
precisely true, but we can use a ﬁxed number c of variables to store c log n
bits. Addressing them is not too complicated since c is a constant and log n
can be computed.
Given any power of two A = 2
a
, we can multiply with and divide by A.
We express x = Ay as
∀i : BIT(i, y) = BIT(i +A, x) ∧ (i < a → BIT(i, x) = 0
.
We have to add less than L bits. This is a number with at most log
2
L
bits. Let L
t
be the smallest power of two larger than log
2
L. The number
L
t
can be expressed in the same way as L.
The idea is to keep a runningsum: Using one existentially quantiﬁed
variable s (which represents L bits), we can guess (roughly) L/L
t
numbers
s
1
, s
2
, . . . , s
L/L
such that s
i
= s
i−1
+ t
i
, where t
i
is the number of 1s of
x
(i−1)L
+1
, . . . , x
iL
. Furthermore, s
1
= t
1
. Given i, the bits of s
i
are the
bits s
(i−1)L
, . . . , s
iL
−1
. We can address them since L
t
is a power of two.
Thus, for instance, we can add them or compare them. We assume for the
moment that the numbers t
i
are given. Then we can express this as
∃s : s
1
= t
1
∧ ∀i : (i > 1) → s
i
= t
i
+s
i−1
.
Thus, we can express BSUM:
BSUM(x, y) = ∃s : s
1
= t
1
∧ ∀i : (i > 1) → s
i
= t
i
+s
i−1
∧ s
L
= y .
The $t_1, t_2, \ldots$ remain to be computed. We can compute $t_i$ by a running sum, this time over single bits. The numbers have roughly at most $\log\log\log n < L'$ bits, and there are only $L'$ partial sums. We assume that $L' \cdot L' < L$, which is true for sufficiently large $n$. Then all partial sums fit into a single variable. We call the $k$th partial sum $t_j^k$. We know how to address it since $L'$ is a power of two:
$$\exists t_j : \mathrm{BIT}(0, t_j^1) = \mathrm{BIT}(jL', x) \wedge \forall k \in \{1, \ldots, L - 1\} : \mathrm{BIT}(k, t_j^1) = 0 \wedge \forall i \forall k : t_j^k + \mathrm{BIT}(jL' + k, x) = t_j^{k+1}.$$
(We have used + with a Boolean value BIT(), but it should be clear how to
interpret this.)
Noting that we can deal with values of $n$ that are not large enough by hardwiring the results completes the proof.
Lemma 20.3. The ternary multiplication predicate $\times$, which is true if and only if $x \cdot y = z$, is first-order definable.
Proof. Multiplication is equivalent to adding $\log n$ numbers $a_1, \ldots, a_{\log n}$ of $O(\log n)$ bits, where $a_i = 2^i x y_i$ and $y_i$ denotes the $i$th bit of $y$.
Let $L = \Theta(\log n)$ and $L' = \Theta(\log\log n)$ be as in the proof of Lemma 20.2. We add the numbers $a_1, a_2, \ldots$ as follows: First, we split $a_i = b_i + c_i$ such that the binary representations of $b_i$ and $c_i$ each consist of blocks of $L'$ bits of $a_i$ separated by $L'$ bits of 0s: $b_i$ contains the first, third, fifth, $\ldots$ block of $L'$ bits of $a_i$, and $c_i$ the second, fourth, sixth, $\ldots$ block. From now on, we treat $b_1, b_2, b_3, \ldots$ and $c_1, c_2, \ldots$ separately. By symmetry, we only have to describe how to add the $b_i$s. If we have the two sums of the $b_i$s and the $c_i$s, then we can add these sums using a final $+$.
We consider the $b_i$s as written below each other. Then we count the number of 1s in each column using BSUM. Due to the structure of the $b_i$s, no carry is propagated more than $L'$ bits. This reduces the problem to adding $\log n$ numbers ($L'$ for each of the $\log n / L'$ blocks), each of length $L'$ (because of possible carries). But in fact, we have only $L/L'$ blocks of $L'$ numbers of $L'$ bits to add. We can guess all sums in a single variable. This is then the sum of the $b_i$s. Then we can verify that we guessed correctly using BSUM and running sums, as we did for Lemma 20.2.
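To make the decomposition in this proof concrete, here is a small sketch that forms the partial products $a_i = 2^i x y_i$, splits each into the alternating-block parts $b_i$ and $c_i$, and adds the two groups separately. The function names and the parameters `width` and `block` (standing in for the number of bits and for $L'$) are assumptions of this example; the column counting with BSUM is replaced by Python's built-in `sum`.

```python
def partial_products(x: int, y: int, width: int):
    """Return a_i = 2**i * x * y_i, where y_i is the i-th bit of y."""
    return [(x << i) * ((y >> i) & 1) for i in range(width)]


def split_alternating(a: int, block: int, width: int):
    """Split a = b + c: b keeps blocks 0, 2, 4, ..., c keeps blocks 1, 3, 5, ..."""
    mask = (1 << block) - 1
    b = c = 0
    for k in range(0, width, block):
        chunk = a & (mask << k)
        if (k // block) % 2 == 0:
            b |= chunk
        else:
            c |= chunk
    return b, c


def multiply(x: int, y: int, width: int, block: int) -> int:
    """Multiply two numbers below 2**width by adding the b_i and c_i separately."""
    parts = [split_alternating(a, block, 2 * width)
             for a in partial_products(x, y, width)]
    sum_b = sum(b for b, _ in parts)
    sum_c = sum(c for _, c in parts)
    return sum_b + sum_c          # the final "+" of the proof
```

Because the blocks of the $b_i$ are separated by zero blocks, carries from one block cannot run past the next zero block as long as few numbers are added, which is the property the proof exploits.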
Remark 20.4. Instead of BIT and $\le$, we can directly use $+$ and $\times$. This is equivalent: We can implement BIT using $+$ and $\times$, and we have already seen that BIT suffices to implement $+$ and $\times$. Thus, $\mathrm{FO} = \mathrm{FO}[\le, \mathrm{BIT}] = \mathrm{FO}[+, \times]$.
Exercise 20.3. Show that FO[≤] ⊆ FO[+].
20.2 First-order logic with majority
We can extend first-order logic in different ways. First, we can add new predicates and constants, which would allow us to specify properties of more complicated structures. For instance, to specify properties of graphs, it is more convenient to specify a graph $G = (V, E)$ on $n$ vertices by a binary relation $E$ with $E(i, j) = 1$ if and only if $\{i, j\} \in E$.
Exercise 20.4. Show that using a binary input predicate $E$ instead of the unary predicate $X$ does not increase the expressive power of first-order logic.
Second, and more importantly for our purposes, we can introduce new quantifiers. We will make heavy use of the threshold quantifier $M$. It has the following interpretation: $Mx : P(x)$ is true if and only if $P(x) = 1$ for more than half of the possible $x$, i.e., for at least $(n+1)/2$ of the $n$ possible values for $x$.
We denote by FOM the set of all languages that can be expressed by first-order sentences with $\le$ and BIT as well as $M$.
Another quantifier, which is only of temporary use for us, is $H$: $Hx : P(x)$ is true if and only if $P(x)$ is true for $n/2$ of the $n$ possible values of $x$. “$Hx : P(x)$” can be expressed by saying “$P(x)$ is not true for the majority of $x$, but it becomes true if we add one more element $x'$ for which $P(x')$ is true”:
$$Hx : P(x) \equiv \exists x' : \bigl(Mx : P(x) \vee x = x'\bigr) \wedge \neg\bigl(Mx : P(x)\bigr).$$
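The definition of $H$ through $M$ can be checked by brute force on small universes. The sketch below assumes the universe is $\{0, \ldots, n-1\}$ and models predicates as Python functions; the names `majority` and `half` are chosen for this example only. On these small cases, the formula holds exactly when $\lfloor n/2 \rfloor$ values satisfy $P$.

```python
def majority(pred, n):
    """Mx : pred(x), true iff pred holds for more than half of 0..n-1."""
    return 2 * sum(1 for x in range(n) if pred(x)) > n


def half(pred, n):
    """Hx : pred(x), written with M: not a majority yet, but one extra element makes it one."""
    return (not majority(pred, n)) and any(
        majority(lambda x, xp=xp: pred(x) or x == xp, n) for xp in range(n)
    )
```

For example, on a universe of size 4, `half` is true for a predicate that holds on exactly two elements and false otherwise.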
The quantifier $H$ is useful to express the following predicates:
1. $F(x, y)P(x)$: “There are exactly $y$ values of $x$ with $x \le n/2$ and $P(x)$.”
2. $S(x, y)P(x)$: “There are exactly $y$ values of $x$ with $x > n/2$ and $P(x)$.”
3. $y = \#x : P(x)$: “There are exactly $y$ values of $x$ for which $P(x)$.”
We only show how to express the first expression:
$$F(x, y)P(x) \equiv Hx : \bigl(x \le n/2 \wedge P(x)\bigr) \vee \bigl(n/2 < x \le n - y\bigr).$$
The second expression follows by symmetry. The third is addition of variables, which we have seen in Section 20.1.
Using $\#$, we can add not only two numbers, but a sequence of numbers: Let $\mathrm{ITADD}(X_1, \ldots, X_n, Y)$ be the predicate that is true if the sum of the numbers $x_1, \ldots, x_n$, represented by the unary predicates $X_1, \ldots, X_n$, is equal to $y$, represented by $Y$. (ITADD stands for iterated addition. For convenience, we assume that we have a binary predicate $X(i, j) = X_i(j)$.) As we have seen, multiplication reduces to adding sequences of numbers. Let $\mathrm{MULT}(X, Y, Z)$ be true if and only if $x \cdot y = z$.
Lemma 20.5. ITADD, and hence also MULT, can be expressed in FOM.
Proof. The lemma can be proved in a similar way as Lemma 20.3. We
already have # because we are allowed to use M, which is in this setting the
equivalent of BSUM.
Exercise 20.5. Fill in the details of the proof of Lemma 20.5.
For ∀ and ∃, quantiﬁcation over pairs of variables can be replaced by
two quantiﬁers. For instance, ∃xy is equivalent to ∃x∃y. For M, it is not
immediately clear how to get rid of quantiﬁers over two or more variables.
However, we can do so by using BIT.
Lemma 20.6. Mxy can be expressed using FOM.
Exercise 20.6. Prove Lemma 20.6.
Hint: First express the predicate $\langle u, v \rangle = \#\langle x, y \rangle : P(x, y)$ with the meaning “there are exactly $n(u - 1) + v$ pairs $x, y$ for which $P(x, y)$ is true.”
Exercise 20.7. Show that PARITY ∈ FOM.
In the next chapter, we will show that division is in FOM.
20.3 Uniformity
The circuit complexity classes $\mathrm{NC}^i$ and $\mathrm{AC}^i$ all come in different flavors corresponding to different conditions on uniformity.
Recall: A family $C = (C_n)_{n \in \mathbb{N}}$ of circuits is called polynomial-time uniform if the mapping $1^n \mapsto C_n$ (where we identify $C_n$ and an appropriate encoding of $C_n$) can be computed in polynomial time. $C$ is logspace uniform if the mapping can be computed in logarithmic space.
From the two uniformity conditions, we get ptime-u $\mathrm{AC}^i$ and ptime-u $\mathrm{NC}^i$ as well as logspace-u $\mathrm{AC}^i$ and logspace-u $\mathrm{NC}^i$. However, both uniformity conditions have drawbacks if we want to analyze subclasses of L or $\mathrm{NC}^1$. The reason is that, for instance, for ptime-u $\mathrm{AC}^0$, constructing the circuit can be much harder than evaluating it. Thus, in order to study subclasses of $\mathrm{NC}^1$, we need a more restrictive variant of uniformity.
It turns out that a good choice is DLOGTIME uniformity:
1. It is restricted in the sense that constructing the circuit is very easy. In particular, it allows us to distinguish subclasses of $\mathrm{NC}^1$.
2. It yields circuit complexity classes equal to FO and FOM. So it is robust.
3. Many ptime-u or logspace-u circuits are actually DLOGTIME uniform.
Of course, in logarithmic time, we cannot construct a circuit of polynomial size. But we are able to answer questions about specific gates. To make this more precise, we need the following definition.
Definition 20.7. The connection language of a family $C = (C_n)_{n \in \mathbb{N}}$ of circuits is the set of all tuples $z = \langle t, a, b, y \rangle$ such that $a$ and $b$ are numbers (in binary) of gates of $C_n$, $b$ is an input for $a$, and gate $a$ has type $t$. The string $y$ is arbitrary padding such that the whole string $z$ has length $n$.
In time logarithmic in $n$ (which is by construction also the input length), we are able to read the relevant parts of the instance $\langle t, a, b, y \rangle$. (We will see below that the necessary steps can be performed in logarithmic time.)
Let us make more precise what we mean by deterministic logarithmic time: A log-time Turing machine has a read-only input tape, a constant number of work tapes, and a read-write address tape. The address tape is used to select bits from the input. At any given time step, the Turing machine has access to the input bit specified by the contents of its address tape. If the number on the address tape is too large, the Turing machine is informed that this is the case.
Deterministic logarithmic time Turing machines look quite limited at
ﬁrst glance, but they can perform some nontrivial basic tasks:
1. They can determine the length of their input.
2. They can add and subtract numbers of O(log n) bits.
3. They can compute the logarithm of numbers of O(log n) bits.
4. They can decode simple pairing functions.
This suﬃces to recognize the relevant parts of connection languages.
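As a hint at why the first task in the list is feasible, the following sketch determines the input length with $O(\log n)$ probes. It models the address tape by a hypothetical function `probe(a)` that returns `None` when address `a` is too large; this interface is an assumption of the example, and a real log-time machine would additionally have to update the address tape bit by bit.

```python
def input_length(probe) -> int:
    """Find n using O(log n) probes; probe(a) is None iff address a >= n."""
    if probe(0) is None:
        return 0
    hi = 1
    while probe(hi) is not None:   # doubling phase: find an out-of-range address
        hi *= 2
    lo = hi // 2                   # invariant: lo is in range, hi is not
    while hi - lo > 1:             # binary search for the boundary
        mid = (lo + hi) // 2
        if probe(mid) is None:
            hi = mid
        else:
            lo = mid
    return lo + 1
```

Both phases touch only addresses that are powers of two or midpoints between them, so the number of probes is logarithmic in the input length.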
Exercise 20.8. Prove that deterministic logtime Turing machines can in
deed do what we claimed above.
We denote by DLOGTIME the set of languages that can be decided in
deterministic logarithmic time.
Definition 20.8. A family $C = (C_n)_{n \in \mathbb{N}}$ of circuits is DLOGTIME uniform (DLOGTIME-u) if the connection language of $C$ can be decided in DLOGTIME.
In the following, we are mainly concerned with $\mathrm{AC}^0$ and $\mathrm{TC}^0$. If nothing else is said, we assume DLOGTIME uniformity. $\mathrm{AC}^0$ contains all languages that can be decided by DLOGTIME-u families of circuits with unbounded fan-in, constant depth, and polynomial size. $\mathrm{TC}^0$ is defined similarly, except that we also have threshold gates of unbounded fan-in. A threshold gate of $m$ inputs outputs a 1 if and only if at least $(m + 1)/2$ of its inputs are 1. (This means that more than half of its inputs must be 1.)
20.4 Circuits vs. logic
The main result of this section is that $\mathrm{AC}^0 = \mathrm{FO}$ and $\mathrm{TC}^0 = \mathrm{FOM}$. To show this, we first prove that our uniformity condition is restricted enough to allow for “construction in FO”.
Lemma 20.9. DLOGTIME ⊆ FO.
Proof. Let $M$ be a DLOGTIME Turing machine with $k$ work tapes. We will have to write down a first-order sentence $\varphi$ such that for all input strings $x$, $M$ accepts $x$ if and only if $\varphi$ is satisfied by $X$.
The main idea is simple: Since the machine runs in logarithmic time, we can encode its behavior into a constant number of variables. (Recall that a variable can hold values between $0$ and $n - 1$, thus $\log n$ bits. Using BIT, we can specify individual bits.) Each step $t$ is described by a constant number of bits: $M$'s state $q_t$, the $k$ symbols $w_1, \ldots, w_k$ that $M$ writes on its work tapes, the $k$ directions $d_1, \ldots, d_k$ in which $M$ moves its heads on tapes $1, \ldots, k$, respectively, as well as the position $I_t$ of the input head, which is controlled by the address tape.
The sentence $\varphi$ begins with existential quantifiers over the variables $z_1, \ldots, z_c$ ($c$ is a suitable constant) that describe the behavior of $M$. This means that $\varphi = \exists z_1 \exists z_2 \ldots \exists z_c \, \psi(z_1, \ldots, z_c)$ for some first-order formula $\psi$. The formula $\psi$ must assert that $z = (z_1, \ldots, z_c)$ forms a valid accepting computation of $M$. To do this, we define two first-order formulas: $C(p, t, a)$ is true if and only if the contents of cell $p$ at time $t$ is $a$. $P(p, t)$ is true if and only if the appropriate head is at position $p$ at time $t$. (The position $p$ also contains the information on which tape the position is.) Given $P$ and $C$, we can write $\psi$ as follows. Let us fix a time $t$.
1. We have to assert that $I_t$ is correct for all $t$. To do this, we have a variable $y$ with an existential quantifier, and we condition $y$ to be equal to the contents of the address tape at time $t$, which can be verified using $C$. Then we set $I_t = C(y)$.
2. The step from time $t$ to time $t + 1$ should be according to $M$'s finite control. The step depends on $I_t$, the current state $q_t$, and the $k$ current work tape symbols (a current tape symbol is an $a$ such that there exists a $p$ with $C(p, t, a) \wedge P(p, t)$).
Using $P$, we can write $C$: The contents of cell $p$ of tape $i$ at time $t$ is $w_{i,t'}$, where $t'$ is the most recent visit of head $i$ to position $p$. If $M$ has not yet visited $p$, then $w_{i,t'}$ is the blank symbol.
Finally, to get $P$, we have to sum up $O(\log n)$ values of $d_{t'}$ for $t' < t$. This can be done as we have already seen.
Theorem 20.10. $\mathrm{AC}^0 = \mathrm{FO}$ and $\mathrm{TC}^0 = \mathrm{FOM}$.
Proof. We only prove the first statement. The second follows with an almost identical proof, where we have to replace the quantifier $M$ by threshold gates and vice versa. (Since threshold gates can take more than $n$ inputs, we need Lemma 20.6 for $\mathrm{TC}^0 = \mathrm{FOM}$.)
Let us first prove $\mathrm{FO} \subseteq \mathrm{AC}^0$. Let $\varphi$ be any first-order formula of quantifier depth $d$. Without loss of generality, we can assume that $\varphi$ is in prenex normal form. This means that $\varphi = Q_1 y_1 Q_2 y_2 \ldots Q_d y_d : \psi(y_1, \ldots, y_d)$, where $\psi$ is quantifier-free and $Q_1, \ldots, Q_d$ are any quantifiers.
For such a $\varphi$, there is a canonical constant-depth circuit $C_n$ for every $n$. A tree of fan-out $n$ and depth $d$ corresponds to the quantifiers. At each leaf, there is a constant-size circuit corresponding to $\psi(y_1, \ldots, y_d)$. This circuit consists of Boolean operators, input nodes, and constants corresponding to the value of atomic formulas ($=$, $\le$, and BIT), where the constants depend on $y_1, \ldots, y_d$.
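The shape of the canonical circuit can be imitated by a short recursive evaluator: each quantifier becomes one layer of fan-in $n$ (AND for $\forall$, OR for $\exists$), and the leaves evaluate $\psi$. The function `eval_prenex`, the string tags `"forall"` and `"exists"`, and the representation of $\psi$ as a Python function are assumptions of this sketch.

```python
def eval_prenex(quantifiers, psi, n, prefix=()):
    """Evaluate Q_1 y_1 ... Q_d y_d : psi(y_1, ..., y_d) over the universe 0..n-1.

    Each quantifier becomes one gate of fan-in n: 'forall' an AND, 'exists' an OR.
    """
    if not quantifiers:
        return psi(*prefix)            # leaf: the constant-size circuit for psi
    gate = all if quantifiers[0] == "forall" else any
    return gate(
        eval_prenex(quantifiers[1:], psi, n, prefix + (y,)) for y in range(n)
    )
```

The recursion depth equals the quantifier depth $d$, and the tree has $n^d$ leaves, which matches the constant depth and polynomial size of the circuit family.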
What remains to be done is to show that this canonical circuit family is
indeed DLOGTIME uniform. The address of a node will consist of log n bits
for each quantiﬁer (this is needed to specify the respective value of y
i
) as well
as a constant number of bits specifying which node of the respective copy
of the constantsize circuit we are considering. In order to answer queries
for the connection language, our DLOGTIME machine has to be able to (i)
compare O(log n) bit numbers and do arithmetic with them (like dividing
them into their parts for the several quantiﬁers) and (ii) compute from the
numbers y
1
, . . . , y
d
to which input nodes the respective constantsize circuit
has to be connected. The latter is possible because the DLOGTIME machine
can perform BIT and all the other operations on O(log n) bit numbers that
a ﬁrstorder formula can do. It is not diﬃcult but tedious to work out the
details.
To prove $\mathrm{AC}^0 \subseteq \mathrm{FO}$, we first observe that the connection language is in FO by Lemma 20.9. Let $C = (C_n)_{n \in \mathbb{N}}$ be a DLOGTIME uniform family of constant-depth, polynomial-size circuits. Since $C_n$ is of polynomial size, we can refer to its nodes by tuples of (a constant number of) variables. In order to devise a first-order formula for $C_n$, we will express the predicate $\mathrm{AccGate}(a)$, which is true if and only if gate $a$ accepts, i.e., outputs 1. If we have AccGate, then we just have to evaluate it for the output gate.
To get AccGate, we define inductively predicates $\mathrm{AccGate}_d(a)$ with the meaning “gate $a$ on level $d$ outputs 1”. For level 0, $\mathrm{AccGate}_0(a)$ is true if and only if (i) $a$ is a number of a gate on level 0 and (ii) $a$ is connected to some input $x_i = 1$.
To express $\mathrm{AccGate}_d(a)$, we have to evaluate if $a$ is a gate on level $d$ in the first place. Since $d$ is constant, this can be expressed. If gate $a$ is indeed on level $d$, then we proceed as follows. If $a$ is a NOT gate, then $\mathrm{AccGate}_d(a) = \neg\mathrm{AccGate}_{d-1}(b)$ for some gate $b$ on level $d - 1$. (We can easily find out which gate $b$ we need using the connection language, which is in FO by assumption.) If $a$ is an AND gate, we use a $\forall b : \xi(b)$ to range over all other gates $b$. The expression $\xi(b)$ is true if and only if (i) $b$ is not a gate on level $d - 1$, or (ii) $b$ is not a predecessor of $a$, or (iii) $b$ is a predecessor of $a$ and $\mathrm{AccGate}_{d-1}(b) = 1$. An OR gate is handled analogously with an existential quantifier.
Exercise 20.9. Let LH be the class of languages that can be decided by an alternating log-time Turing machine. (Such machines work similarly to deterministic log-time Turing machines, except that they are alternating.) Show that $\mathrm{FO} = \mathrm{LH} = \mathrm{AC}^0$. (This requires only little extra work, given that we know $\mathrm{FO} = \mathrm{AC}^0$ and $\mathrm{DLOGTIME} \subseteq \mathrm{FO}$.)
Thus, DLOGTIME-u $\mathrm{AC}^0$ is indeed a very robust class: We can define it in terms of circuits, logic, or Turing machines.
Exercise 20.10. Construct a family of circuits of polynomial size and depth $O(\log n / \log\log n)$ for parity.
Note: This is asymptotically optimal (see Corollary 12.7 of the lecture notes of “Computational Complexity Theory”).
21 Threshold circuits for division
For addition and multiplication, and also for subtraction, it is not too hard to come up with $\mathrm{AC}^0$ circuits. But division seems to be much harder. It is fairly easy to see that division can be done in polynomial time. But for a long time, it was unknown if division is possible in logarithmic space. In the 1980s, it was shown that there are polynomial-time uniform $\mathrm{TC}^0$ circuits for division. (We will define below what $\mathrm{TC}^0$ means.) However, even this does not prove that division can be done in logarithmic space. This was shown by Chiu et al. [CDL01], who proved that division lies in logspace uniform $\mathrm{TC}^0$ (which is a subclass of L). Finally, it was shown that division is in DLOGTIME uniform $\mathrm{TC}^0$, which is optimal: the problem is complete for DLOGTIME uniform $\mathrm{TC}^0$. (We will also define below what DLOGTIME uniform means. Let us just remark that it is even more restrictive than logspace uniform.)
The goal of this chapter is to prove that division is in DLOGTIME uniform $\mathrm{TC}^0 = \mathrm{FOM}$.
We will do so in three steps: First, we reduce division to iterated multiplication. Second, we will introduce a new predicate POW (see below). Let us write $\mathrm{FOMP} = \mathrm{FOM}[\mathrm{BIT}, <, \mathrm{POW}]$ and $\mathrm{FOP} = \mathrm{FO}[\mathrm{BIT}, <, \mathrm{POW}]$ for short. We will show that division can be described in FOMP. This places division in L since $\mathrm{FOM} = \mathrm{TC}^0 \subseteq \mathrm{L}$ and POW can easily be seen to be in L. Third, we will show that in fact POW can also be expressed in FOM, which places division in $\mathrm{FOM} = \mathrm{TC}^0$.
In the remainder of this chapter, variables in capital letters, such as $X$ and $Y$, denote numbers of polynomial length. We also call them long numbers. Small letters represent short numbers, which are of length $O(\log n)$. If there are numbers of length $\mathrm{poly}(\log n)$, we will mention their lengths explicitly.
21.1 Division, iterated multiplication, and powering
Division is closely related to two other problems: iterated multiplication and powering:
Division: The predicate $\mathrm{Division}(X, Y, i)$ is 1 if and only if the $i$th bit of $\lfloor X/Y \rfloor$ is 1.
Powering: $\mathrm{Powering}(X, k, i)$ is 1 if and only if the $i$th bit of $X^k$ is 1. (Note that $X$ has $n$ bits and $k$ has length $O(\log n)$.)
Iterated multiplication: $\mathrm{ItMult}(X_1, X_2, \ldots, X_n, i) = 1$ if and only if the $i$th bit of $\prod_{j=1}^{n} X_j$ is a 1.
If we want to compute $X/Y$ and have $1/Y$ with sufficient precision, then division reduces to multiplication. And we already know how to multiply in $\mathrm{TC}^0$. Now observe that
$$\sum_{i=0}^{\infty} (1 - \alpha)^i = \frac{1}{\alpha}$$
for $\alpha \in (0, 1)$. If we assume further that $\alpha \in [1/2, 1)$, then we have
$$\frac{1}{\alpha} = \sum_{i=0}^{n} (1 - \alpha)^i + O(2^{-n}).$$
Now let $j = \lceil \log_2 Y \rceil$ be roughly the number of bits of $Y$. Then use $2^{-j} Y \in [\frac{1}{2}, 1)$ as $\alpha$ in the preceding equation. Since $1/\alpha = 2^j / Y$, this yields
$$2^{(n+1)j} \frac{X}{Y} = 2^{nj} X \sum_{i=0}^{n} (1 - 2^{-j} Y)^i + O(X 2^{nj-n}) = X \sum_{i=0}^{n} (2^j - Y)^i (2^j)^{n-i} + O(X 2^{nj-n}). \tag{21.1}$$
This is equivalent to
$$\frac{X}{Y} = X \sum_{i=0}^{n} (2^j - Y)^i \, 2^{-(i+1)j} + O(X 2^{-n}).$$
Thus, $X/Y$ is approximated within an additive error of $O(X 2^{-n})$, which is $O(1)$ since $X < 2^n$. If we can evaluate the sum in (21.1), then we can proceed as follows: We calculate $X/Y$ with a precision of $O(1)$, and then we compute the exact value of $\lfloor X/Y \rfloor$ by hand. (There is only a constant number of candidates and multiplication can be done in FOM.)
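The reduction can be tried out with exact integer arithmetic. The sketch below is an illustration under stated assumptions: it takes $j$ to be the bit length of $Y$ (so that $2^{-j} Y \in [\frac{1}{2}, 1)$), takes $n$ to be the bit length of $X$, and keeps track of the factor $2^j$ coming from $1/\alpha = 2^j / Y$ explicitly, so the sum is divided by $2^{(n+1)j}$. The function name `divide_by_series` is hypothetical.

```python
def divide_by_series(X: int, Y: int) -> int:
    """Compute floor(X / Y) from the truncated series for 1/alpha, alpha = Y / 2**j."""
    if X < 0 or Y <= 0:
        raise ValueError("expects X >= 0 and Y >= 1")
    n = max(X.bit_length(), 1)        # X < 2**n
    j = Y.bit_length()                # 2**(j-1) <= Y < 2**j
    base = 1 << j
    d = base - Y                      # 2**j - Y
    # S = sum_{i=0}^{n} (2**j - Y)**i * (2**j)**(n - i), the sum in (21.1)
    S = sum(d ** i * base ** (n - i) for i in range(n + 1))
    q0 = (X * S) >> ((n + 1) * j)     # approximation from below, off by at most 1
    for q in (q0, q0 + 1):            # constant number of candidates
        if q * Y <= X < (q + 1) * Y:
            return q
    raise AssertionError("approximation error larger than expected")
```

The final loop corresponds to computing the exact value from a constant number of candidates, each of which is verified with a multiplication.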
So far, we have reduced division to computing an iterated sum of powers.
Of course, powering reduces to iterated multiplication. Thus, we mainly
focus on iterated multiplication in the following.
21.2 Division in FOM + POW
The central tool for iterated multiplication, and thus also for division, is the Chinese remainder representation (CRR): An $n$-bit number is uniquely determined by its residues modulo polynomially many prime numbers, each of length $O(\log n)$. (There are enough such primes.)
Assume that we are given primes $m_1, \ldots, m_k$, each a short number, and let $M = \prod_{i=1}^{k} m_i$ be their product. Any number $X \in \{0, 1, \ldots, M - 1\}$ can be represented uniquely as $(x_1, \ldots, x_k)$ with $X \equiv x_i \pmod{m_i}$ for each $i$. Let $C_i = M/m_i$, and let $h_i$ be the inverse of $C_i$ modulo $m_i$, i.e., $C_i h_i \equiv 1 \pmod{m_i}$. Then we have
$$X \equiv \sum_{i=1}^{k} x_i h_i C_i \pmod{M}.$$
Even more,
$$X = \sum_{i=1}^{k} x_i h_i C_i - rM$$
for some number $r = \mathrm{rank}_M(X)$, called the rank of $X$ with respect to $M$. Note that $r$ is a short number, equal to the integer part of $\sum_{i} x_i h_i C_i / M = \sum_{i} x_i h_i / m_i$, where each summand is less than $m_i$.
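A small sketch of the Chinese remainder representation and of the rank follows. The function names `to_crr` and `from_crr` are chosen for this example; the modular inverse uses Python's built-in three-argument `pow` with exponent `-1` (available from Python 3.8 on).

```python
from math import prod


def to_crr(X, moduli):
    """Residues (x_1, ..., x_k) of X modulo the pairwise distinct primes in moduli."""
    return [X % m for m in moduli]


def from_crr(residues, moduli):
    """Return (X, r) with X = sum_i x_i*h_i*C_i - r*M and 0 <= X < M."""
    M = prod(moduli)
    total = 0
    for x, m in zip(residues, moduli):
        C = M // m
        h = pow(C, -1, m)      # inverse of C_i modulo m_i
        total += x * h * C
    r, X = divmod(total, M)    # r is the rank of X with respect to M
    return X, r
```

For the basis 3, 5, 7, 11, every number below 1155 is recovered from its four residues, and the rank stays below the sum of the moduli.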
How does CRR help? It reduces iterated multiplication of long numbers to iterated multiplication of short numbers, which is considerably easier.
The algorithm for iterated multiplication is now easy to describe:
1. Convert the input from binary to CRR.
2. Compute the iterated product in CRR.
3. Convert the answer from CRR back to binary.
As a tool, we assume that the following predicate is given:
$$\mathrm{POW}(a, i, b, p) \equiv \bigl(a^i \equiv b \pmod{p}\bigr).$$
(All four numbers here are short numbers.)
21.2.1 The second step
If $p$ is a prime, then the multiplicative group $\mathbb{Z}_p^*$ is cyclic and of order $p - 1$. This allows us to take discrete logarithms: First, we find $g$, the smallest generator of $\mathbb{Z}_p^*$: $g$ is the smallest number with $g^i \not\equiv 1 \pmod{p}$ for $0 < i < p - 1$. This yields a FOP formula $\mathrm{GEN}(g, p)$ that is true if and only if $g$ is the smallest generator of $\mathbb{Z}_p^*$. If $g$ is a generator, then $g^i \equiv a \pmod{p}$ has a unique solution $i \in \{0, \ldots, p - 2\}$ for every $a \not\equiv 0 \pmod{p}$. Using POW and GEN, we can take discrete logarithms:
$$\mathrm{GEN}(g, p) \wedge \mathrm{POW}(g, i, a, p)$$
is a FOP predicate that states that $i$ is the discrete logarithm of $a$.
Now, if the input is in CRR, then iterated multiplication simply reduces to iterated addition: We just have to add the discrete logarithms. Since iterated addition is in FOM, this puts iterated multiplication in FOMP. This gives us the second step of our algorithm. However, we still have to be able to perform the first and third step of the algorithm.
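The second step can be illustrated for a single odd prime modulus as follows. The sketch finds the smallest generator by brute force, tabulates discrete logarithms, and multiplies residues by adding logarithms modulo $p - 1$. The function names are hypothetical, and the special treatment of residue 0 (which has no discrete logarithm) is an assumption of this example.

```python
def smallest_generator(p):
    """Smallest g with g**i != 1 (mod p) for 0 < i < p - 1 (p an odd prime)."""
    for g in range(2, p):
        if all(pow(g, i, p) != 1 for i in range(1, p - 1)):
            return g
    raise ValueError("no generator found; is p an odd prime?")


def product_mod_p(values, p):
    """Multiply residues mod p by adding their discrete logarithms."""
    if any(v % p == 0 for v in values):
        return 0                      # zero has no discrete logarithm
    g = smallest_generator(p)
    dlog = {pow(g, i, p): i for i in range(p - 1)}
    exponent = sum(dlog[v % p] for v in values) % (p - 1)
    return pow(g, exponent, p)
```

Applying `product_mod_p` once per prime of the CRR basis gives the iterated product in CRR.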
21.2.2 The ﬁrst step
The ﬁrst step of our algorithm is easy to accomplish in FOMP, as we see
from the following lemma.
Lemma 21.1. If $X, m_1, \ldots, m_k$ are given in binary and $X < M = \prod_{i=1}^{k} m_i$, then we can compute $\mathrm{CRR}(X) = (x_1, \ldots, x_k)$ in FOMP.
Proof. For each $m_i$ and each $j < n$, we can compute $2^j \bmod m_i$ using POW. Multiplying by the $j$th bit of $X$, we obtain values $y_{i,j} \in \{0, 1, \ldots, m_i - 1\}$ for $1 \le i \le k$ and $0 \le j < n$. Then we add $y_{i,0} + \ldots + y_{i,n-1}$ using iterated addition (which is in FOM) and take the sum modulo $m_i$ to obtain $x_i$.
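The proof translates directly into a few lines of code. In this sketch, `pow(2, j, m)` plays the role of the POW predicate and Python's `sum` stands in for iterated addition; the function name `binary_to_crr` and the low-order-first bit list are assumptions of the example.

```python
def binary_to_crr(x_bits, moduli):
    """x_bits[j] is the j-th bit of X (low-order first); returns X mod m for each m."""
    residues = []
    for m in moduli:
        # y_j = (bit j of X) * (2**j mod m)
        y = [bit * pow(2, j, m) for j, bit in enumerate(x_bits)]
        residues.append(sum(y) % m)   # iterated addition, then one reduction
    return residues
```

Each intermediate value $y_j$ is a short number, so only the final sum needs to be reduced modulo $m$.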
In the lemma above, the prime numbers $m_1, \ldots, m_k$ are given. You might wonder how we actually get them since, of course, they are not part of the input. This is not very difficult, but we will nevertheless deal with it later on.
21.2.3 The third step
We will prove that we can perform the third step of our algorithm in FOMP
by a series of lemmas. First, we observe that we, in fact, already know how
to divide, albeit only by short primes.
Lemma 21.2. Let $p$ be a short prime. Then the binary representation of $1/p$ can be computed to $n^{O(1)}$ bits of accuracy in FOP.
Proof. We can assume that $p$ is odd. Let $s \in \mathbb{N}$ with $s \ge 1$ be arbitrary. We write $2^s = ap + b$ with $b = 2^s \bmod p$. Then the $s$th bit of the binary expansion of $1/p$ is equal to the low-order bit of $a$: We are interested in the low-order bit of $\lfloor 2^s / p \rfloor$. Now we have $\frac{2^s}{p} = a + \frac{b}{p}$. Since $b \in \{0, 1, \ldots, p - 1\}$, we have $\lfloor b/p \rfloor = 0$. Thus,
$$\left\lfloor \frac{2^s}{p} \right\rfloor = \left\lfloor a + \frac{b}{p} \right\rfloor = a.$$
We observe that $ap + b = 2^s$ is even. Thus, because $p$ is odd, $b \bmod 2 = a \bmod 2$. Therefore, the low-order bit of $b$ is the $s$th bit of $1/p$, and $b$ can be obtained using POW.
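The argument of this proof fits into one line of code: the $s$th bit of $1/p$ is the parity of $2^s \bmod p$. The function name `reciprocal_bit` is an assumption of this sketch, and it is only meant for odd $p \ge 3$ and $s \ge 1$.

```python
def reciprocal_bit(p: int, s: int) -> int:
    """s-th bit after the binary point of 1/p, for an odd p >= 3 and s >= 1."""
    return pow(2, s, p) & 1      # low-order bit of b = 2**s mod p
```

Comparing against the low-order bit of $\lfloor 2^s / p \rfloor$ computed with exact integers confirms the parity argument on small cases.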
In binary representation, it is very easy to test if a number is smaller
than another. In Chinese remainder representation, this is more diﬃcult,
although it can be done.
Lemma 21.3. Let $X, Y \in \{0, 1, \ldots, M - 1\}$ be given in $\mathrm{CRR}_M$ form. Testing whether $X < Y$ is in FOMP.
Proof. Of course, $X < Y$ if and only if $X/M < Y/M$. Thus, it suffices to show that we can compute $X/M$ to polynomially many bits of accuracy. Recall that $X = \sum_{i=1}^{k} x_i h_i C_i - \mathrm{rank}_M(X) M$ and that $C_i = M/m_i$. Thus,
$$\frac{X}{M} = \sum_{i=1}^{k} \frac{x_i h_i}{m_i} - \mathrm{rank}_M(X).$$
The numbers $x_1, \ldots, x_k$ are given to us since $\mathrm{CRR}_M(X)$ is part of the input. The numbers $C_1, \ldots, C_k$ can be computed in FOMP: For $C_i$, we add the discrete logarithms of $m_j$ for $j \neq i$. And $h_1, \ldots, h_k$ are the inverses of $C_1, \ldots, C_k$, respectively, which can also easily be computed in FOMP.
By Lemma 21.2, each summand can be computed to polynomially many bits of accuracy. We know that iterated addition is in FOM, thus we can compute polynomially many bits of the binary representation of
$$\sum_{i=1}^{k} \frac{x_i h_i}{m_i} = \frac{X}{M} + \mathrm{rank}_M(X).$$
Since $\mathrm{rank}_M(X)$ is an integer, $X/M$ is just the fractional part of this sum, of which we have sufficiently many bits.
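The comparison can be sketched with exact rational arithmetic in place of the polynomially many bits of accuracy used in the proof. The function names `crr_fraction` and `crr_less` are hypothetical; `fractions.Fraction` and `math.floor` are standard library tools used here only to keep the example exact.

```python
from fractions import Fraction
from math import floor, prod


def crr_fraction(residues, moduli):
    """Return X/M exactly as the fractional part of sum_i x_i*h_i/m_i."""
    M = prod(moduli)
    total = sum(
        (Fraction(x * pow(M // m, -1, m), m) for x, m in zip(residues, moduli)),
        Fraction(0),
    )
    return total - floor(total)       # subtracting the floor removes rank_M(X)


def crr_less(res_x, res_y, moduli):
    """Decide X < Y from the CRR representations of X and Y."""
    return crr_fraction(res_x, moduli) < crr_fraction(res_y, moduli)
```

Note that the comparison never reconstructs $X$ or $Y$ themselves; it only uses the short numbers $x_i$, $h_i$, and $m_i$.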
A useful consequence of being able to compare numbers in CRR is that it allows us to change the CRR basis: If we have primes $p_1, \ldots, p_\ell$ with $\prod_{i=1}^{\ell} p_i = P$, then we can get $\mathrm{CRR}_P(X)$ from $\mathrm{CRR}_M(X)$. The crucial ingredient for this is that, given $\mathrm{CRR}_M(X)$, we can compute $X \bmod p$ for a short prime $p$.
Lemma 21.4. Given $\mathrm{CRR}_M(X)$ and a short prime $p$, we can compute $X \bmod p$ in FOMP.
Proof. If $p = m_i$ for some $i$, then we know the answer from the input. Thus, we can assume that $p$ does not divide $M$. Let $P = Mp$. If we can compute $\mathrm{CRR}_P(X)$, then this gives us $X \bmod p$.
We turn to brute force: We try all $p \le \mathrm{poly}(n)$ possible values $q$ for $X \bmod p$. This gives us the $\mathrm{CRR}_P$ of numbers $X_0, \ldots, X_{p-1}$. One of these numbers is $X$. Moreover, $X$ is the only number among $X_0, \ldots, X_{p-1}$ that is smaller than $M$. (This follows from the fact that numbers smaller than $M$ differ in $\mathrm{CRR}_M$ and all of $X_0, \ldots, X_{p-1}$ have a unique representation in $\mathrm{CRR}_P$.)
We can compute $\mathrm{CRR}_P(M)$ by adding the discrete logarithms of the primes $m_1, \ldots, m_k$ modulo $p$. We carry out comparisons with $M$ in $\mathrm{CRR}_P$. Thus, we can compute $X \bmod p$ by finding the unique $X_i$ that is smaller than $M$. All of this can be done in FOMP.
The last lemma towards implementing the third step of our division algorithm is dividing by products of short primes.
Lemma 21.5. Let $b_1, \ldots, b_\ell$ be distinct short primes, let $B = \prod_{i=1}^{\ell} b_i$, and let $\mathrm{CRR}_M(X)$ be given. Then we can compute $\mathrm{CRR}_M(\lfloor X/B \rfloor)$ in FOMP.
Proof. We can assume that $B$ divides $M$. Otherwise we apply Lemma 21.4 and extend our CRR basis. Let $M = BP$. By dropping the primes of $P$ from our basis, we can compute $\mathrm{CRR}_B(X \bmod B)$ in FOMP. From this, we can compute $\mathrm{CRR}_M(X \bmod B)$ by extending the basis again according to Lemma 21.4. Finally, we compute $X - (X \bmod B) = B \lfloor X/B \rfloor$ in $\mathrm{CRR}_M$.
By assumption, $B$ and $P$ are relatively prime. Thus, there exists a $B^{-1}$ with $B B^{-1} \equiv 1 \pmod{P}$. We can find $\mathrm{CRR}_P(B^{-1})$ in FOMP: This amounts to finding the inverse of each component of the $\mathrm{CRR}_P$ representation with respect to each component of $P$ using discrete logarithms. Now we have
$$B^{-1} B \left\lfloor \frac{X}{B} \right\rfloor \equiv \left\lfloor \frac{X}{B} \right\rfloor \pmod{P} \tag{21.2}$$
in $\mathrm{CRR}_P$ representation. The final step is to observe that $X < M$ implies $\lfloor X/B \rfloor < P$. Thus, we can extend the basis to get the $\mathrm{CRR}_M$ representation of $\lfloor X/B \rfloor$.
Finally, we are able to prove that also the third step of our algorithm, converting CRR numbers into binary representation, can be expressed in FOMP.
Theorem 21.6. Given $\mathrm{CRR}_M(X)$, with $0 \le X < M$, we can compute the binary representation of $X$ in FOMP.
Proof. It suffices to compute $\lfloor X/2^s \rfloor$ for any $s$. Then the $s$th bit of $X$ is given as $\lfloor X/2^s \rfloor - 2 \lfloor X/2^{s+1} \rfloor$. We get this number in $\mathrm{CRR}_M$, but it is easy to distinguish 0 from 1, even in CRR.
First, we create numbers $A_1, \ldots, A_s$. Each $A_j$ is the product of polynomially many short distinct primes that do not divide $M$, and we want $A_j > M$. Recall that $M = \prod_{i=1}^{k} m_i$ for short primes. Let $m_1 < m_2 < \ldots < m_k$ be the first $k$ odd primes. (We can assume this without loss of generality.) Then we set $A_j = \prod_{i=1}^{k} p_{jk+i}$, where $p_\ell$ is the $\ell$th smallest prime number. The prime number theorem guarantees that there are enough short primes for our purposes, and these $A_j$ fulfill our conditions. Furthermore, a list of all (short) primes smaller than $\mathrm{poly}(n)$ can easily be computed by a $\mathrm{TC}^0$ circuit, thus also in FOM. Thus, we know how to get these primes.
Assume that the $A_j$ are very large. Then $\frac{1 + A_j}{2A_j} \approx \frac{1}{2}$. Thus, $\lfloor X/2^s \rfloor \approx X \prod_{j=1}^{s} \frac{1 + A_j}{2A_j}$. It might look as if we are complicating the problem, but it turns out that, on the one hand, this quantity involving the $A_j$s is easier to compute and, on the other hand, it is precise enough to give us $\lfloor X/2^s \rfloor$.
Let $P = M \prod_{j=1}^{s} A_j$. We extend the basis to get $\mathrm{CRR}_P(X)$. Since $M < A_j$ for all $j$, we have
$$\prod_{j=1}^{s} \frac{1 + A_j}{A_j} < \left(1 + \frac{1}{M}\right)^{s}.$$
Furthermore, for every $K \ge 1$,
$$\left(1 + \frac{1}{M}\right)^{s} < \exp\left(\frac{s}{M}\right) < \left(1 + \frac{1}{K}\right)^{\frac{s}{M}(K+1)}.$$
Setting $K = \frac{M}{s+1}$ and exploiting that $s < M$, this yields
$$\prod_{j=1}^{s} \frac{1 + A_j}{A_j} < 1 + \frac{s + 1}{M}. \tag{21.3}$$
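Using Lemma 21.5, we can compute the $\mathrm{CRR}_P$ of
$$Q = \left\lfloor \frac{X \prod_{j=1}^{s} \frac{1 + A_j}{2}}{\prod_{j=1}^{s} A_j} \right\rfloor \ge \left\lfloor \frac{X}{2^s} \right\rfloor.$$
By (21.3), we have
$$\frac{X \prod_{j=1}^{s} \frac{1 + A_j}{2}}{\prod_{j=1}^{s} A_j} < \frac{X}{2^s}\left(1 + \frac{s + 1}{M}\right) < \frac{X}{2^s}\left(1 + \frac{2^s}{X}\right) = \frac{X}{2^s} + 1.$$
Thus, $Q \in \{\lfloor X/2^s \rfloor, \lfloor X/2^s \rfloor + 1\}$. We determine which one of $Q$, $Q - 1$ is correct by checking whether $Q 2^s > X$ (using $\mathrm{CRR}_P$).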
Exercise 21.1. Using Lemma 21.1 and Theorem 21.6, one can convert in
FOMP numbers from any base to any other base.
Prove this!
Corollary 21.7. Division, iterated multiplication, and powering can be ex
pressed in FOMP.
21.3 POW is in FO
21.3.1 Two special cases in FO
The ﬁrst step towards proving POW ∈ FO will be to show POW as well as
division and iterated multiplication of very short numbers can be performed
in FO. We start by showing that this is true for POW.
Lemma 21.8. POW(a, r, b, p), where a, r, b, and p have O(log log n) bits
each, is in FO.
Proof. Let us assume that $a$, $b$, $p$, and $r$ have $k \log\log n$ bits each. We can compute $a^r \bmod p$ in FO by using repeated squaring. To do this, we consider the sequence $r_0, r_1, \ldots, r_{k \log\log n}$ of exponents with $r_i = \lfloor r/2^i \rfloor$. Thus, $r_0 = r$ and $r_{k \log\log n} = 0$. Moreover, $r_i = 2r_{i+1}$ or $r_i = 2r_{i+1} + 1$ depending on the corresponding bit of $r$.
Now we compute all values $a_i = a^{r_i} \bmod p$. We have to check that $a_{k \log\log n} = 1$ and $a_i = a_{i+1}^2 \bmod p$ or $a_i = a \cdot a_{i+1}^2 \bmod p$, depending on the corresponding bit of $r$. Each check can be performed easily in FO. Since each $a_i$ needs at most $k \log\log n$ bits and there are $k \log\log n$ such numbers, all $a_i$ easily fit into a single variable for sufficiently large $n$. Thus, we can perform all checks in parallel, which completes the proof of the lemma.
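The chain of exponents used in this proof can be written out as a short program. The sketch computes the values $a_i = a^{r_i} \bmod p$ from the last index down to $a_0 = a^r \bmod p$; in the first-order setting these values are guessed and all local conditions are checked in parallel instead. The function name `pow_by_chain` is an assumption of the example.

```python
def pow_by_chain(a: int, r: int, p: int) -> int:
    """Compute a**r mod p via the chain r_i = floor(r / 2**i), as in the proof."""
    length = max(r.bit_length(), 1)
    exps = [r >> i for i in range(length + 1)]      # r_0 = r, ..., r_length = 0
    vals = [0] * (length + 1)
    vals[length] = 1 % p                            # a**0 mod p
    for i in range(length - 1, -1, -1):
        square = vals[i + 1] * vals[i + 1] % p
        # r_i = 2*r_{i+1} (+ 1 if the low-order bit of r_i is set)
        vals[i] = square * a % p if exps[i] & 1 else square
    return vals[0]
```

Each step only relates two neighbouring values of the chain, which is why the conditions can be verified independently of each other.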
Now we use that POW for very short numbers can be expressed in FO to show that division and iterated multiplication of short numbers can be done in FO as well.
Theorem 21.9. ItMult and Division, where the inputs have $(\log n)^{O(1)}$ bits, are in FO.
Proof. We know from Corollary 21.7 that division and iterated multiplication with inputs of length $r$ can be done in FOMP over the universe $0, \ldots, r - 1$. We set $r = (\log n)^k$. Then Division and ItMult can be expressed in FOMP over the universe $0, \ldots, (\log n)^k - 1$. We will show that such FOMP formulas can be expressed in FO over the universe $0, 1, \ldots, n - 1$.
Note that all uses of POW in these formulas are called with inputs of $O(\log((\log n)^k)) = O(\log\log n)$ bits. Thus, we can replace POW by FO formulas according to Lemma 21.8. In the same way, the threshold quantifier can be replaced by an FO formula since the range of the quantified variables is $0, \ldots, (\log n)^k - 1$. This is because BSUM can be expressed in FO as long as there are at most $(\log n)^{O(1)}$ ones to count, which is the result of Exercise 21.2. This completes the proof.
Exercise 21.2. Prove the following: In FO over the universe $0, 1, \ldots, n - 1$, we can count the number of 1s in binary strings of length $(\log n)^{O(1)}$. Even more, we can count the number of 1s in a binary string of length $n$ if this number is at most $(\log n)^{O(1)}$.
Remark 21.10. Beyond being a tool for showing that division is in FOM,
this theorem is also interesting in its own right: It gives tight bounds for
the size of the numbers for which Division and ItMult are in FO. On the
one hand, the theorem shows that this is the case for numbers consisting of
$(\log n)^{O(1)}$ bits. On the other hand, we have $\mathrm{FO} = \mathrm{AC}^0$,
and any circuit of constant depth $d$ that computes parity of $m$ bits must be
of size $2^{\Omega(\sqrt[2d]{m})}$. For parity of $m$ bits being in FO, we need
$2^{\Omega(\sqrt[2d]{m})} \le \mathrm{poly}(n)$, which implies
$m \le \mathrm{poly}(\log n)$.
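The last implication can be spelled out as follows. This is only a sketch: the constants $c, c' > 0$ are introduced here for illustration and stand for the constants hidden in the $\Omega$ and in $\mathrm{poly}(n)$; logarithms are taken to base 2.

```latex
\begin{align*}
2^{c\, m^{1/(2d)}} \le n^{c'}
  &\iff c\, m^{1/(2d)} \le c' \log n \\
  &\iff m \le \left( \tfrac{c'}{c} \log n \right)^{2d} = (\log n)^{O(1)} ,
\end{align*}
```

where the final equality uses that the depth $d$ is a constant.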
21.3.2 POW is in FO
What remains to be done is to show that POW is in FO. In order to prove
this, we ﬁrst show something slightly more general: powering in groups of
order n is FO Turing reducible to ﬁnding the product of log n elements of
this group.
This needs some clarification: First, FO Turing reducible essentially
means that we are allowed to use a predicate for the product of $\log n$
elements of this group. Second, we restrict ourselves to groups that can be
represented in FO. This means that group elements can be labeled by
numbers $0, \dots, n-1$ such that the product operation is FO definable.
Exercise 21.3. Show that for any group that can be represented in FO, the
inverse and the neutral element can be deﬁned in FO.
Lemma 21.11. Finding small powers in any group of order $n$, i.e.,
computing $a^r$ for a group element $a$ and a small number $r$, is FO Turing
reducible to finding products of $\log n$ elements.
Proof. Our goal is to compute $a^r$. We do this by computing
group elements $a_1, \dots, a_k$ as well as numbers $u, u_1, \dots, u_k$ for
$k = o(\log n)$ such that
\[ a^r = a^u \prod_{i=1}^{k} a_i^{u_i} . \]
In addition, we want $u_i < 2 \log n$ and $u < 2(\log n)^2$.
Given these elements and numbers, we can easily compute $a^r$: computing
$a_i^{u_i}$ for $1 \le i \le k$ as well as $a^u$ amounts to computing products
of a small number of group elements. For $a_i^{u_i}$, this follows from
$u_i < 2 \log n$. For $a^u$, we use two rounds of multiplying at most
$2 \log n$ elements. The result $a^r$ is then also just a product of $k + 1$
group elements.
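One possible way to organize the two rounds is sketched below in Python for the multiplicative group modulo a prime $p$. The helper names `bounded_product` and `small_power` and the parameter `limit` (standing in for $2 \log n$) are assumptions made for this illustration and are not notation from the text. The sketch only mimics the counting argument; it is not an FO formula.

```python
def bounded_product(elems, p, limit):
    """Multiply at most `limit` group elements modulo p (the assumed oracle)."""
    assert len(elems) <= limit, "the oracle only multiplies short lists"
    result = 1
    for e in elems:
        result = result * e % p
    return result


def small_power(a, u, p, limit):
    """Compute a^u mod p for 0 <= u < limit**2 using only short products."""
    assert limit >= 2 and 0 <= u < limit * limit
    q, s = divmod(u, limit)                          # u = q*limit + s, q < limit, s < limit
    block = bounded_product([a] * limit, p, limit)   # round 1: a^limit
    rest = bounded_product([a] * s, p, limit)        # round 1: a^s
    high = bounded_product([block] * q, p, limit)    # round 2: (a^limit)^q
    return bounded_product([high, rest], p, limit)   # combine the two parts
```

Since $u < 2(\log n)^2 \le (2 \log n)^2$ for $\log n \ge 1$, every exponent $u$ allowed by the proof is covered when `limit` is about $2 \log n$.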
We will choose the group elements $a_i$ to be $d$th roots of unity for a small
prime $d$. The numbers $u_i$ can then be computed using Chinese remaindering.
Our first step consists of finding a CRR basis $D$ consisting of primes, each
of which is at most $O(\log n)$. More precisely, we choose a set of
$k = o(\log n)$ primes $d_1, \dots, d_k$ such that $d_i < 2 \log n$ for each $i$
and each $d_i$ is relatively prime to $n$. Furthermore, we want
$n < D = \prod_{i=1}^{k} d_i < n^2$. We can compute these $d_i$ by an FO formula
that finds the first $D > n$ that is squarefree and relatively prime to $n$ and
all of whose prime factors are smaller than $2 \log n$.
To compute the number $k$ and the relation between the $d_i$ and $i$, we count,
for each prime $p_0$, the number of primes dividing $D$ that are smaller than
$p_0$. We can do this using BSUM.
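The search for $D$ can be imitated by a brute-force loop. The following Python sketch is an illustration only: the function names are hypothetical, logarithms are assumed to be base 2, and for small $n$ a suitable $D$ need not exist, in which case the sketch returns `None`.

```python
import math


def small_primes_below(bound):
    """Return all primes p < bound by trial division (bound is tiny here)."""
    return [p for p in range(2, math.ceil(bound))
            if all(p % q for q in range(2, math.isqrt(p) + 1))]


def find_crr_basis(n):
    """Return the prime factors d_1 < ... < d_k of the first suitable D > n.

    Suitable means: squarefree, relatively prime to n, all prime factors
    smaller than 2*log2(n), and D < n^2. Returns None if no such D exists.
    """
    bound = 2 * math.log2(n)
    allowed = [p for p in small_primes_below(bound) if math.gcd(p, n) == 1]
    for D in range(n + 1, n * n):
        rest, factors = D, []
        for p in allowed:
            if rest % p == 0:        # divide by each allowed prime at most once,
                rest //= p           # so rest == 1 forces D to be squarefree
                factors.append(p)
        if rest == 1:
            return factors
    return None
```

For example, with $n = 1008$ (the order of the multiplicative group modulo the prime 1009), the allowed primes are 5, 11, 13, 17, 19, and the first suitable value is $D = 1045 = 5 \cdot 11 \cdot 19$.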
The second step consists of computing $a_i = a^{\lfloor n/d_i \rfloor}$. We do
this as follows: First, we compute $a^{-1}$ (see Exercise 21.1). Second, we
compute $n_i = n \bmod d_i$. Third, we compute $a^{-n_i}$ by multiplying $n_i$
copies of $a^{-1}$. We can do this by the assumption of this lemma because
$n_i < d_i < 2 \log n$. Now we come to computing $a^{\lfloor n/d_i \rfloor}$.
To do this, we observe that
\[ \left( a^{\lfloor n/d_i \rfloor} \right)^{d_i} = a^{d_i \lfloor n/d_i \rfloor} = a^{n - n_i} = a^{-n_i} . \]
The last equality holds because $a^n = 1$ in any group of order $n$.
Now let $d_i^{-1}$ be the multiplicative inverse of $d_i$ modulo $n$, i.e.,
there exists a number $m$ with $d_i d_i^{-1} = mn + 1$. There exists exactly
one group element $x$ with $x^{d_i} = a^{-n_i}$, and this group element $x$ is
the one we are looking for: We have
\[ x = x^{mn+1} = \left( x^{d_i} \right)^{d_i^{-1}} = \left( a^{-n_i} \right)^{d_i^{-1}} . \]
Thus, we can express $a_i$ as the unique group element $x$ with
\[ x^{d_i} = a^{-n_i} . \]
Note that we can compute $x^{d_i}$ using multiplication of $O(\log n)$ group
elements, but we cannot compute $a^{\lfloor n/d_i \rfloor}$ directly since it
might happen that $d_i^{-1}$ is not $O(\log n)$.
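The second step can be mirrored by a short Python sketch for the multiplicative group modulo a prime $p$, which has order $n = p - 1$. The function name `root_by_search` is hypothetical, and the exhaustive loop over all group elements merely plays the role of the existential quantifier; it is not meant as an efficient algorithm. The three-argument `pow` with exponent `-1` requires Python 3.8 or later.

```python
def root_by_search(a, d, p):
    """Find a_i = a^(floor(n/d)) in Z_p^* as the unique x with x^d = a^(-n_i).

    Assumes p is prime, n = p - 1, and gcd(d, n) == 1.
    """
    n = p - 1
    n_i = n % d
    a_inv = pow(a, -1, p)            # the inverse element a^(-1)
    target = pow(a_inv, n_i, p)      # a^(-n_i): product of n_i < d copies of a^(-1)
    # x -> x^d is a bijection on the group because gcd(d, n) = 1,
    # so exactly one x satisfies x^d = target.
    candidates = [x for x in range(1, p) if pow(x, d, p) == target]
    assert len(candidates) == 1
    return candidates[0]
```

Only the small exponents $n_i$ and $d_i$ occur here; the large exponent $\lfloor n/d_i \rfloor$ is never used.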
Our third step consists of finding the exponents $u, u_1, \dots, u_k$. By the
choice of the $a_i$ in the second step, we have
\[ a_1^{u_1} \cdots a_k^{u_k} = a^{\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor} . \]
Thus, we have to choose $u_1, \dots, u_k$ such that
\[ u \equiv r - \sum_{i=1}^{k} u_i \left\lfloor \frac{n}{d_i} \right\rfloor \pmod{n} . \tag{21.4} \]
In order to get a small value for $u$, we have to choose
$\sum_{i=1}^{k} u_i \lfloor n/d_i \rfloor$ close to $r$. To achieve this, we
approximate $r$ as a linear combination of the $n/d_i$:
Compute $f = \lfloor rD/n \rfloor$. (We can compute this since $r$ has only
$O(\log n)$ bits by Theorem 21.9.) Let $D_i = D/d_i$. Then we compute
$u_i = f D_i^{-1} \bmod d_i$, where $D_i^{-1}$ is the inverse of $D_i$ modulo
$d_i$. This gives us
\[ \sum_{i=1}^{k} u_i D_i \equiv f \pmod{D} . \]
Let $m$ be a number that satisfies $\sum_{i=1}^{k} u_i D_i = f + mD$. Now we
can calculate $u$ from $u_1, \dots, u_k$ according to (21.4). (This is a sum of
$k$ short numbers, which can be computed in FO since $k = o(\log n)$.)
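As a small worked illustration of the third step, consider the following numbers. They are chosen here only as an example and do not appear in the text: $n = 1008$ (the order of the multiplicative group modulo the prime 1009), the basis $d_1, d_2, d_3 = 5, 11, 19$, and $r = 500$.

```latex
\begin{align*}
D &= 5 \cdot 11 \cdot 19 = 1045, \qquad
f = \left\lfloor \frac{500 \cdot 1045}{1008} \right\rfloor = 518, \\
D_1 &= 209, \quad D_2 = 95, \quad D_3 = 55, \\
u_1 &= 518 \cdot 209^{-1} \bmod 5 = 2, \quad
u_2 = 518 \cdot 95^{-1} \bmod 11 = 8, \quad
u_3 = 518 \cdot 55^{-1} \bmod 19 = 7, \\
\sum_{i=1}^{3} u_i D_i &= 418 + 760 + 385 = 1563 = f + 1 \cdot D, \\
u &\equiv 500 - (2 \cdot 201 + 8 \cdot 91 + 7 \cdot 53) = 500 - 1501 \equiv 7 \pmod{1008} .
\end{align*}
```

Here $m = 1$, and the resulting $u = 7$ is indeed far below $2(\log n)^2$.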
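What remains to be done is to show that $u < 2(\log n)^2$. To show this,
we calculate the difference between $r$ and the sum of the
$u_i \lfloor n/d_i \rfloor$:
\begin{align*}
\sum_{i=1}^{k} u_i \left\lfloor \frac{n}{d_i} \right\rfloor
 &= \sum_{i=1}^{k} u_i \frac{n}{d_i} - \sum_{i=1}^{k} \left( u_i \frac{n}{d_i} - u_i \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
 &= \frac{n}{D} \sum_{i=1}^{k} u_i D_i - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
 &= \frac{n}{D} (f + mD) - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
 &= \frac{n}{D} \left\lfloor \frac{rD}{n} \right\rfloor + nm - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) \\
 &= r - \frac{n}{D} \left( \frac{rD}{n} - \left\lfloor \frac{rD}{n} \right\rfloor \right) + nm - \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) .
\end{align*}
This yields
\[ u = \frac{n}{D} \left( \frac{rD}{n} - \left\lfloor \frac{rD}{n} \right\rfloor \right) + \sum_{i=1}^{k} u_i \left( \frac{n}{d_i} - \left\lfloor \frac{n}{d_i} \right\rfloor \right) . \]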
For any number $x$, we have $x - \lfloor x \rfloor \in [0, 1)$. Furthermore,
$n/D < 1$ by our choice of $D$, $u_i < 2 \log n$ for each $i$, and
$k = o(\log n)$. Thus, we have $u < 1 + 2k \log n$ and hence, for sufficiently
large $n$, $u < 2(\log n)^2$, which finishes the proof.
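The whole decomposition can be checked numerically. The Python sketch below does this for the multiplicative group modulo a prime $p$; the function names and the concrete values used in the usage note are assumptions made for illustration. For brevity the $a_i$ are computed with Python's built-in `pow` rather than by the root search from the second step, and `math.prod` as well as modular inverses via `pow` require Python 3.8 or later.

```python
import math


def decompose_exponent(r, n, primes):
    """Return (u, [u_1, ..., u_k]) with r = u + sum(u_i * floor(n/d_i)) mod n."""
    D = math.prod(primes)
    f = (r * D) // n                                     # f = floor(r*D/n)
    us = [(f * pow(D // d, -1, d)) % d for d in primes]  # u_i = f * D_i^(-1) mod d_i
    u = (r - sum(u_i * (n // d) for u_i, d in zip(us, primes))) % n  # equation (21.4)
    return u, us


def power_via_decomposition(a, r, p, primes):
    """Compute a^r mod p as a^u * prod(a_i^(u_i)) with a_i = a^(floor(n/d_i))."""
    n = p - 1                                            # order of the group Z_p^*
    u, us = decompose_exponent(r, n, primes)
    result = pow(a, u, p)
    for u_i, d in zip(us, primes):
        a_i = pow(a, n // d, p)
        result = result * pow(a_i, u_i, p) % p
    return result
```

With $p = 1009$, $n = 1008$ and the basis 5, 11, 19 (so that $n < D = 1045 < n^2$), `power_via_decomposition(a, r, 1009, [5, 11, 19])` agrees with `pow(a, r, 1009)` for every exponent $0 \le r < n$, and all exponents $u$ stay below $2(\log_2 n)^2$.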
Now we note that, first, FO is closed under polynomial changes of the
input size and, second, the product of $\log(n^k) = k \log n$ group elements is
FO Turing reducible to the product of $\log n$ group elements. This yields
that finding powers in any group of order $n^k$ is FO Turing reducible to
finding the product of $\log n$ elements.
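The second observation amounts to splitting a long product into a constant number of short blocks. A minimal Python sketch of this idea for multiplication modulo $p$ follows; the name `product_of_blocks` and the parameter `block_len` (standing in for $\log n$) are hypothetical.

```python
def product_of_blocks(elems, p, block_len):
    """Multiply k*block_len elements modulo p using only short products."""

    def short_product(items):
        # Stands in for the predicate that multiplies at most block_len elements.
        assert len(items) <= block_len
        result = 1
        for e in items:
            result = result * e % p
        return result

    blocks = [elems[i:i + block_len] for i in range(0, len(elems), block_len)]
    partial = [short_product(b) for b in blocks]   # one short product per block
    result = 1
    for value in partial:                          # only k further multiplications
        result = result * value % p
    return result
```

Since $k$ is a constant, combining the $k$ block products needs only a constant number of additional multiplications.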
We now apply the above result that powering reduces to iterated
multiplication to the groups of integers modulo $p$ for a prime $p = O(n^k)$.
The multiplicative group $\mathbb{Z}_p$ contains the integers $1, \dots, p-1$.
Multiplication in $\mathbb{Z}_p$ is FO-definable since multiplication is
FO-definable.
For evaluating POW$(a, r, b, p)$, we proceed now as follows: If $a = 0$, then
we just have to check whether also $b = 0$. Otherwise, we can find $a^r$ in
$\mathbb{Z}_p$, provided that the product of $\log n$ group elements can be
computed with inputs of size $\log^2 n$. However, this can be done according to
Theorem 21.9.
This immediately yields the main results of this section and of this chapter.
Theorem 21.12. POW is in FO.
Theorem 21.13. Division is in FOM.
Bibliography
[ACG+99] G. Ausiello, P. Crescenzi, G. Gambosi, V. Kann, A.
Marchetti-Spaccamela, and M. Protasi. Complexity and Approximation.
Springer, 1999.
[AG00] Carme Alvarez and Raymond Greenlaw. A compendium of
problems complete for symmetric logarithmic space. Comput.
Complexity, 9:73–95, 2000.
[ALM+98] Sanjeev Arora, Carsten Lund, Rajeev Motwani, Madhu Sudan,
and Mario Szegedy. Proof verification and hardness of
approximation problems. J. ACM, 45(3):501–555, 1998.
[BDCGL92] Shai Ben-David, Benny Chor, Oded Goldreich, and Michael
Luby. On the theory of average case complexity. J. Comput.
Syst. Sci., 44(2):193–219, 1992.
[BGS98] Mihir Bellare, Oded Goldreich, and Madhu Sudan. Free bits,
PCPs, and nonapproximability—towards tight results. SIAM
J. Comput., 27(3):804–915, 1998.
[BI90] David A. Mix Barrington and Neil Immerman. On uniformity
within NC$^1$. J. Comput. Syst. Sci., 41:274–306, 1990.
[Big93] Norman Biggs. Algebraic graph theory. Cambridge University
Press, second edition, 1993.
[BT06] Andrej Bogdanov and Luca Trevisan. Average-case
complexity. Foundations and Trends in Theoretical Computer Science,
2(1):1–106, 2006.
[CDL01] Andrew Chiu, George I. Davida, and Bruce E. Litow. Division
in logspace-uniform NC$^1$. RAIRO Theoretical Informatics and
Applications, 35(3):259–275, 2001.
[Din07] Irit Dinur. The PCP theorem by gap ampliﬁcation. J. ACM,
54(3), 2007.
[FKN02] E. Friedgut, G. Kalai, and A. Naor. Boolean functions whose
Fourier transform is concentrated on the ﬁrst two levels. Adv.
in Applied Math., 29:427–437, 2002.
[GLST98] Venkatesan Guruswami, Daniel Lewin, Madhu Sudan, and
Luca Trevisan. A tight characterization of NP with 3-query
PCPs. In Proc. 39th Ann. IEEE Symp. on Foundations of
Comput. Sci. (FOCS), pages 8–17, 1998.
[Gur91] Yuri Gurevich. Average case completeness. J. Comput. Syst.
Sci., 42(3):346–398, 1991.
[Hås99] Johan Håstad. Clique is hard to approximate within
$n^{1-\epsilon}$. Acta Mathematica, 182:105–142, 1999.
[HLW06] S. Hoory, N. Linial, and A. Wigderson. Expander graphs and
their applications. Bull. Amer. Math. Soc., pages 439–561,
2006.
[Imm99] Neil Immerman. Descriptive Complexity. Springer, 1999.
[Imp95] Russell Impagliazzo. Hard-core distributions for somewhat
hard functions. In Proc. 36th Ann. IEEE Symp. on
Foundations of Comput. Sci. (FOCS), pages 538–545, 1995.
[KS03] Adam R. Klivans and Rocco A. Servedio. Boosting and
hard-core set construction. Machine Learning, 51(3):217–238, 2003.
[Lev86] Leonid A. Levin. Average case complete problems. SIAM J.
Comput., 15(1):285–286, 1986.
[NTS95] Noam Nisan and Amnon Ta-Shma. Symmetric logspace is
closed under complement. Chicago Journal of Theoretical
Computer Science, 1995.
[O’D04] Ryan O’Donnell. Hardness amplification within NP. J.
Comput. Syst. Sci., 69(1):68–94, 2004.
[Rei08] Omer Reingold. Undirected connectivity is in logspace. J.
ACM, 55(4), 2008.
[RVW02] Omer Reingold, Salil Vadhan, and Avi Wigderson. Entropy
waves, the zig-zag graph product and new constant-degree
expanders and extractors. Annals of Mathematics, 155(1):157–187,
2002.
[Sen07] Stefan Senitsch. Ein kombinatorischer Beweis für das
PCP-Theorem. Diplomarbeit, TU Ilmenau, 2007.
[SS96] Michael Sipser and Daniel Spielman. Expander codes. IEEE
Trans. Inform. Theory, 42:1710–1722, 1996.
[Vaz01] Vijay V. Vazirani. Approximation Algorithms. Springer, 2001.
[vN28] John von Neumann. Zur Theorie der Gesellschaftsspiele.
Mathematische Annalen, 100:295–320, 1928.
[VV86] Leslie G. Valiant and Vijay V. Vazirani. NP is as easy as
detecting unique solutions. Theoret. Comput. Sci., 47(1):85–93,
1986.
[Wan97] Jie Wang. Average-case intractable NP problems. In Ding-Zhu
Du and Ker-I Ko, editors, Advances in Languages, Algorithms,
and Complexity, pages 313–378. Kluwer, 1997.
[Yao82] A. C. Yao. Theory and applications of trapdoor functions. In
Proc. 23rd Ann. IEEE Symp. on Foundations of Comput. Sci.
(FOCS), pages 80–91, 1982.