Luca Trevisan¹

Notes written in Fall 2002, Revised May 2004

¹Computer Science Division, U.C. Berkeley. Work supported by NSF Grant No. 9984703. Email: luca@cs.berkeley.edu.
Foreword
These are scribed notes from a graduate course on Computational Complexity offered at
the University of California at Berkeley in the Fall of 2002, based on notes scribed by
students in Spring 2001 and on additional notes scribed in Fall 2002. I added notes and
references in May 2004.
The first 15 lectures cover fundamental material. The remaining lectures cover more
advanced material that is different each time I teach the class. In these Fall 2002 notes,
there are lectures on Håstad's optimal inapproximability results, lower bounds for parity in
bounded-depth circuits, lower bounds in proof complexity, and pseudorandom generators
and extractors.
The notes have been only minimally edited, and there may be several errors and
imprecisions.
I will be happy to receive comments, criticism and corrections about these notes.
The syllabus for the course was developed jointly with Sanjeev Arora. Sanjeev wrote
the notes on Yao’s XOR Lemma (Lecture 11). Many people, and especially Avi Wigderson,
were kind enough to answer my questions about various topics covered in these notes.
I wish to thank the scribes Scott Aaronson, Andrej Bogdanov, Allison Coates, Kamalika
Chaudhuri, Steve Chien, Jittat Fakcharoenphol, Chris Harrelson, Iordanis Kerenidis,
Lawrence Ip, Ranjit Jhala, François Labelle, John Leen, Michael Manapat, Manikandan
Narayanan, Mark Pilloff, Vinayak Prabhu, Samantha Riesenfeld, Stephen Sorkin, Kunal
Talwar, Jason Waddle, Dror Weitz, and Beini Zhou.
Berkeley, May 2004.
Luca Trevisan
luca@cs.berkeley.edu
Contents

1 Introduction, P and NP
  1.1 Computational Problems
  1.2 P and NP
  1.3 NP-completeness
    1.3.1 Reductions
    1.3.2 NP-completeness
    1.3.3 An NP-complete problem
    1.3.4 The Problem SAT
  1.4 Diagonalization
  1.5 References

2 Space-Bounded Complexity Classes
  2.1 Space-Bounded Complexity Classes
  2.2 Reductions in NL
  2.3 NL-Completeness
  2.4 Savitch's Theorem
  2.5 Undirected Connectivity
  2.6 Randomized Logspace
  2.7 NL = coNL
    2.7.1 A simpler problem first
    2.7.2 Finding r

3 The Polynomial Hierarchy
  3.1 Stacks of quantifiers
  3.2 The hierarchy
  3.3 An alternate characterization
  3.4 Additional Properties
  3.5 References

4 Circuits
  4.1 Circuits
  4.2 Relation to other complexity classes
  4.3 References

5 Probabilistic Complexity Classes
  5.1 Probabilistic complexity classes
  5.2 Relations between complexity classes
  5.3 BPP ⊆ Σ_2
  5.4 References

6 A Probabilistic Algorithm
  6.1 Branching Programs
  6.2 Testing Equivalence
  6.3 The Schwartz-Zippel Lemma
  6.4 References

7 Unique-SAT
  7.1 The Valiant-Vazirani Reduction
  7.2 References

8 Counting Problems
  8.1 Counting Classes
  8.2 Complexity of counting problems
  8.3 An approximate comparison procedure
  8.4 Constructing acomp
  8.5 The proof of the Leftover Hash Lemma
  8.6 Approximate Sampling
  8.7 References

9 Average-Case Complexity of the Permanent
  9.1 The Permanent Problem
  9.2 Worst-case to Average-case Equivalence
  9.3 References

10 Average-case Complexity of Problems in PSPACE and EXP
  10.1 Average-case Complexity in PSPACE and EXP for 7/8 Fraction of Inputs
  10.2 Coding-Theoretic Perspective
  10.3 Average-case Complexity in PSPACE and EXP for 1/2 Fraction of Inputs
  10.4 References

11 The XOR Lemma
  11.1 The XOR Lemma
  11.2 References

12 Levin's Theory of Average-case Complexity
  12.1 Distributional Problems
  12.2 DistNP
  12.3 Reductions
  12.4 Polynomial-Time on Average
  12.5 Existence of Complete Problems
  12.6 Polynomial-Time Samplability
  12.7 References

13 Interactive Proofs
  13.1 Interactive Proofs
    13.1.1 NP + Interaction = NP
    13.1.2 NP + Randomness
  13.2 IP
  13.3 An Example: Graph Non-Isomorphism
  13.4 Random Self-Reductions
  13.5 References

14 IP = PSPACE
  14.1 UNSAT ⊆ IP
  14.2 A Proof System for #SAT
  14.3 A Proof System for QBF
    14.3.1 PSPACE-Complete Language: TQBF
    14.3.2 Arithmetization of TQBF
    14.3.3 The Interactive Protocol
    14.3.4 Analysis

15 Introduction to PCP
  15.1 Probabilistically Checkable Proofs
  15.2 PCP and MAX-3SAT
    15.2.1 Approximability
    15.2.2 Inapproximability
    15.2.3 Tighter result
  15.3 Max Clique
    15.3.1 Approximability
    15.3.2 Inapproximability
  15.4 References

16 Håstad's Proof System
  16.1 Håstad's verifier
  16.2 Applications to hardness of approximation
  16.3 From the PCP theorem to Raz's verifier
  16.4 References

17 Linearity Test
  17.1 Fourier analysis of Boolean functions
  17.2 Using Fourier analysis to test linearity
  17.3 Testing code words for the Long Code
    17.3.1 Pre-Håstad verification
    17.3.2 The Long Code test
  17.4 References

18 Testing the Long Code
  18.1 Test for Long Code
  18.2 Håstad's verifier

19 Circuit Lower Bounds for Parity Using Polynomials
  19.1 Circuit Lower Bounds for PARITY
  19.2 Proof of Circuit Lower Bounds for PARITY using Polynomials
  19.3 Approximating OR
  19.4 References

20 Lower Bound for Parity Using Random Restrictions
  20.1 Random Restrictions
  20.2 Håstad's Result
  20.3 References

21 Proof Complexity
  21.1 Resolution Proof System
  21.2 Width of a Resolution Proof
  21.3 References

22 Lower Bounds on Resolution Proof Length
  22.1 Width and Proof Complexity
    22.1.1 The Main Theorem
    22.1.2 Proof of the technical lemmas
  22.2 Width Lower Bounds
    22.2.1 The Setting
    22.2.2 Matching and Satisfiability
    22.2.3 Putting them together
  22.3 References

23 Pseudorandomness and Derandomization
  23.1 Probabilistic Algorithms versus Deterministic Algorithms
    23.1.1 A trivial deterministic simulation
    23.1.2 Exponential gaps between randomized and deterministic procedures
  23.2 Derandomization Under Complexity Assumptions
    23.2.1 Formal Definitions of Complexity Measures and Complexity Classes
    23.2.2 Hardness versus Randomness
  23.3 Pseudorandom Generators
  23.4 The two main theorems
    23.4.1 The Nisan-Wigderson Theorem
    23.4.2 Worst-case to Average-case Reduction

24 The Nisan-Wigderson Generator
  24.1 The Nisan-Wigderson Construction
    24.1.1 Unpredictability versus Pseudorandomness
    24.1.2 Combinatorial Designs
    24.1.3 The Nisan-Wigderson Generator
  24.2 The Reduction from Distinguishing to Predicting

25 Extractors and Pseudorandom Generators
  25.1 Use of Weak Random Sources
  25.2 Extractors
  25.3 Applications
  25.4 An Extractor from Nisan-Wigderson
Lecture 1
Introduction, P and NP
This course assumes CS170, or equivalent, as a prerequisite. We will assume that the reader
is familiar with the notions of algorithm and running time, as well as with basic notions of
discrete math and probability.
A main objective of theoretical computer science is to understand the amount of
resources (time, memory, communication, randomness, ...) needed to solve computational
problems that we care about. While the design and analysis of algorithms puts upper
bounds on such amounts, computational complexity theory is mostly concerned with lower
bounds; that is, we look for negative results showing that certain problems require a lot of
time, memory, etc., to be solved. In particular, we are interested in infeasible problems,
that is, computational problems that require impossibly large resources to be solved, even
on instances of moderate size. It is very hard to show that a particular problem is infeasible,
and in fact for a lot of interesting problems the question of their feasibility is still open.
Another major line of work in complexity is in understanding the relations between different
computational problems and between different "modes" of computation. For example: what
is the relative power of algorithms using randomness and deterministic algorithms, what is
the relation between worst-case and average-case complexity, how much easier does an
optimization problem become if we only look for approximate solutions, and so on. It is in this
direction that we find the most beautiful, and often surprising, known results in complexity
theory.
Before going any further, let us be more precise in saying what a computational problem
is, and let us define some important classes of computational problems. Then we will see a
particular incarnation of the notion of "reduction," the main tool in complexity theory, and
we will introduce NP-completeness, one of the great success stories of complexity theory.
We conclude by demonstrating the use of diagonalization to show some separations between
complexity classes. It is unlikely that such techniques will help in solving the P versus NP
problem.
1.1 Computational Problems
In a computational problem, we are given an input that, without loss of generality, we assume
to be encoded over the alphabet {0, 1}, and we want to return in output a solution satisfying
some property: a computational problem is then described by the property that the output
has to satisfy given the input.
In this course we will deal with four types of computational problems: decision problems,
search problems, optimization problems, and counting problems.¹ For the moment, we will
discuss decision and search problems.
In a decision problem, given an input x ∈ {0, 1}*, we are required to give a YES/NO
answer. That is, in a decision problem we are only asked to verify whether the input
satisfies a certain property. An example of a decision problem is the 3-coloring problem:
given an undirected graph, determine whether there is a way to assign a "color" chosen
from {1, 2, 3} to each vertex in such a way that no two adjacent vertices have the same
color.
A convenient way to specify a decision problem is to give the set L ⊆ {0, 1}* of inputs
for which the answer is YES. A subset of {0, 1}* is also called a language, so, with the
previous convention, every decision problem can be specified using a language (and every
language specifies a decision problem). For example, if we call 3COL the subset of {0, 1}*
containing (descriptions of) 3-colorable graphs, then 3COL is the language that specifies
the 3-coloring problem. From now on, we will talk about decision problems and languages
interchangeably.
In a search problem, given an input x ∈ {0, 1}* we want to compute some answer
y ∈ {0, 1}* that is in some relation to x, if such a y exists. Thus, a search problem
is specified by a relation R ⊆ {0, 1}* × {0, 1}*, where (x, y) ∈ R if and only if y is an
admissible answer given x.
Consider for example the search version of the 3-coloring problem: here given an
undirected graph G = (V, E) we want to find, if it exists, a coloring c : V → {1, 2, 3} of the
vertices, such that for every (u, v) ∈ E we have c(u) ≠ c(v). This is different (and more
demanding) than the decision version, because beyond being asked to determine whether such
a c exists, we are also asked to construct it, if it exists. Formally, the 3-coloring problem is
specified by the relation R_3COL that contains all the pairs (G, c) where G is a 3-colorable
graph and c is a valid 3-coloring of G.
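The gap between the decision and search versions can be made concrete with a small checker: given a graph and a candidate coloring, deciding membership in R_3COL takes time linear in the size of the graph, even though finding a coloring is believed to be hard. The sketch below is illustrative; the edge-list and dictionary encodings are assumptions, not a canonical encoding over {0, 1}.

```python
def is_valid_3_coloring(edges, coloring):
    """Check whether `coloring` (a dict: vertex -> color in {1, 2, 3}) is a
    proper 3-coloring of the graph given by its edge list."""
    if any(c not in (1, 2, 3) for c in coloring.values()):
        return False
    # A coloring is proper when no edge joins two same-colored vertices.
    return all(coloring[u] != coloring[v] for (u, v) in edges)

# A triangle is 3-colorable; a proper coloring is a witness for membership.
triangle = [(0, 1), (1, 2), (0, 2)]
print(is_valid_3_coloring(triangle, {0: 1, 1: 2, 2: 3}))  # True
print(is_valid_3_coloring(triangle, {0: 1, 1: 1, 2: 3}))  # False
```

This checker is exactly the kind of efficient verification procedure that the definition of NP in the next section formalizes.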
1.2 P and NP
In most of this course, we will study the asymptotic complexity of problems. Instead of
considering, say, the time required to solve 3-coloring on graphs with 10,000 nodes on some
particular model of computation, we will ask what is the best asymptotic running time of
an algorithm that solves 3-coloring on all instances. In fact, we will be much less ambitious,
and we will just ask whether there is a "feasible" asymptotic algorithm for 3-coloring. Here
feasible refers more to the rate of growth than to the running time on specific instances of
reasonable size.
A standard convention is to call an algorithm “feasible” if it runs in polynomial time,
i.e. if there is some polynomial p such that the algorithm runs in time at most p(n) on
inputs of length n.
¹This distinction is useful and natural, but it is also arbitrary: in fact, every problem can be seen as a search problem.
We denote by P the class of decision problems that are solvable in polynomial time.
We say that a search problem defined by a relation R is an NP search problem if the
relation is efficiently computable and such that solutions, if they exist, are short. Formally,
R is an NP search problem if there is a polynomial-time algorithm that, given x and y,
decides whether (x, y) ∈ R, and if there is a polynomial p such that if (x, y) ∈ R then
|y| ≤ p(|x|).
We say that a decision problem L is an NP decision problem if there is some NP relation
R such that x ∈ L if and only if there is a y such that (x, y) ∈ R. Equivalently, a decision
problem L is an NP decision problem if there is a polynomial-time algorithm V(·, ·) and
a polynomial p such that x ∈ L if and only if there is a y, |y| ≤ p(|x|), such that V(x, y)
accepts.
We denote by NP the class of NP decision problems.
Equivalently, NP can be defined as the set of decision problems that are solvable in
polynomial time by a non-deterministic Turing machine. Suppose that L is solvable in
polynomial time by a non-deterministic Turing machine M: then we can define the relation
R such that (x, t) ∈ R if and only if t is a transcript of an accepting computation of M on
input x, and it is easy to prove that R is an NP relation and that L is in NP according to
our first definition. Conversely, suppose that L is in NP according to our first definition and
that R is the corresponding NP relation. Then, on input x, a non-deterministic Turing
machine can guess a string y of length at most p(|x|) and then accept if and only if
(x, y) ∈ R. Such a machine can be implemented to run in non-deterministic polynomial
time, and it decides L.
For a function t : N → N, we denote by DTIME(t(n)) the set of decision problems
that are solvable by a deterministic Turing machine within time t(n) on inputs of length n,
and by NTIME(t(n)) the set of decision problems that are solvable by a non-deterministic
Turing machine within time t(n) on inputs of length n. Therefore, P = ⋃_k DTIME(O(n^k))
and NP = ⋃_k NTIME(O(n^k)).
1.3 NP-completeness
1.3.1 Reductions
Let A and B be two decision problems. We say that A reduces to B, denoted A ≤ B, if
there is a polynomial-time computable function f such that x ∈ A if and only if f(x) ∈ B.
Two immediate observations: if A ≤ B and B is in P, then also A ∈ P (conversely, if
A ≤ B and A ∉ P, then also B ∉ P); if A ≤ B and B ≤ C, then also A ≤ C.
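The first observation can be seen in one line of code: a polynomial-time map f together with a polynomial-time decider for B immediately yields a decider for A. The toy problems below are hypothetical, chosen only so the example runs.

```python
def solve_via_reduction(x, f, decide_B):
    """If A <= B via the polynomial-time map f, then a decider for B
    gives a decider for A: x is in A if and only if f(x) is in B."""
    return decide_B(f(x))

# Toy example: A = "even integers", B = "binary strings ending in 0".
# The map n -> bin(n) is a (trivial) reduction from A to B.
decide_B = lambda s: s.endswith('0')
f = lambda n: bin(n)
print(solve_via_reduction(6, f, decide_B))   # True: 6 is even
print(solve_via_reduction(7, f, decide_B))   # False: 7 is odd
```

Transitivity (the second observation) follows the same way: composing two polynomial-time maps gives a polynomial-time map.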
1.3.2 NP-completeness
A decision problem A is NP-hard if for every problem L ∈ NP we have L ≤ A. A decision
problem A is NP-complete if it is NP-hard and it belongs to NP.
It is a simple observation that if A is NP-complete, then A is solvable in polynomial
time if and only if P = NP.
1.3.3 An NP-complete problem
Consider the following decision problem, which we call U: we are given as input (M, x, t, l),
where M is a Turing machine, x ∈ {0, 1}* is a possible input, and t and l are integers
encoded in unary², and the problem is to determine whether there is a y ∈ {0, 1}*, |y| ≤ l,
such that M(x, y) accepts within t steps.
It is immediate to see that U is in NP. One can define a procedure V_U that, on input
(M, x, t, l) and y, accepts if and only if |y| ≤ l and M(x, y) accepts in at most t steps.
Let L be an NP decision problem. Then there are an algorithm V_L and polynomials T_L
and p_L such that x ∈ L if and only if there is a y, |y| ≤ p_L(|x|), such that V_L(x, y) accepts;
furthermore, V_L runs in time at most T_L(|x| + |y|). We give a reduction from L to U.
The reduction maps x into the instance f(x) = (V_L, x, T_L(|x| + p_L(|x|)), p_L(|x|)). Just by
applying the definitions, we can see that x ∈ L if and only if f(x) ∈ U.
1.3.4 The Problem SAT
In SAT (which stands for CNF-satisfiability) we are given Boolean variables x_1, x_2, ..., x_n and
a Boolean formula φ involving such variables; the formula is given in a particular format
called conjunctive normal form, which we will explain in a moment. The question is whether
there is a way to assign Boolean (True/False) values to the variables so that the formula
is satisfied.
To complete the description of the problem we need to explain what a Boolean formula
in conjunctive normal form is. First of all, Boolean formulas are constructed starting from
variables and applying the operators ∨ (which stands for OR), ∧ (which stands for AND) and
¬ (which stands for NOT).
The operators work in the way that one expects: ¬x is True if and only if x is False;
x ∧ y is True if and only if both x and y are True; x ∨ y is True if and only if at least one
of x or y is True.
So, for example, the expression ¬x ∧ (x ∨ y) can be satisfied by setting x to False and
y to True, while the expression x ∧ (¬x ∨ ¬y) ∧ y is impossible to satisfy.
A literal is a variable or the negation of a variable; for example x_7 is a literal and
so is ¬x_3. A clause is formed by taking one or more literals and connecting them with an OR,
so for example (x_2 ∨ ¬x_4 ∨ x_5) is a clause, and so is (x_3). A formula in conjunctive normal
form is the AND of clauses. For example,

  (x_3 ∨ ¬x_4) ∧ (x_1) ∧ (¬x_3 ∨ x_2)

is a formula in conjunctive normal form (from now on, we will just say "CNF formula" or
"formula"). Note that the above formula is satisfiable and, for example, it is satisfied by
setting all the variables to True (there are also other possible assignments of values to the
variables that would satisfy the formula).
On the other hand, the formula

  x ∧ (¬x ∨ ¬y) ∧ y

is not satisfiable, as has already been observed.
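The CNF semantics just described translates directly into code. In the sketch below a literal is a nonzero integer (negative for a negated variable, a DIMACS-style convention used here only for illustration), a clause is a list of literals, and a formula is a list of clauses.

```python
from itertools import product

def satisfies(formula, assignment):
    """`assignment` maps variable index -> bool. A clause is satisfied when
    at least one of its literals is true; the formula when every clause is."""
    lit_true = lambda lit: assignment[abs(lit)] == (lit > 0)
    return all(any(lit_true(l) for l in clause) for clause in formula)

def brute_force_sat(formula, n):
    """Try all 2^n assignments -- exponential time, matching the fact that
    no polynomial-time algorithm for SAT is known."""
    for values in product([False, True], repeat=n):
        assignment = dict(enumerate(values, start=1))
        if satisfies(formula, assignment):
            return assignment
    return None

# A formula like (x3 v ~x4) ^ (x1) ^ (~x3 v x2) is satisfied by all-True:
print(satisfies([[3, -4], [1], [-3, 2]], {i: True for i in range(1, 5)}))  # True
# x ^ (~x v ~y) ^ y, with x encoded as 1 and y as 2, is unsatisfiable:
print(brute_force_sat([[1], [-1, -2], [2]], 2))  # None
```

Note the asymmetry that Cook's theorem below captures: `satisfies` (the verifier) runs in linear time, while the only obvious way to decide satisfiability tries exponentially many candidate assignments.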
²The "unary" encoding of an integer n is a sequence of n ones.
Theorem 1 (Cook) SAT is NP-complete.
1.4 Diagonalization
Diagonalization is essentially the only way we know of proving separations between
complexity classes. The basic principle is the same as in Cantor's proof that the set of real
numbers is not countable: suppose towards a contradiction that the set of real numbers in
the range [0, 1) were countable, and let r_1, ..., r_n, ... be an enumeration. Let r_i[j] be the
j-th digit in the binary expansion of r_i (that is, r_i = Σ_{j=1}^∞ 2^{-j} r_i[j]). Then define the real
number r whose j-th digit is 1 − r_j[j], that is, r = Σ_{j=1}^∞ 2^{-j} (1 − r_j[j]). This is a well-defined
real number, but there can be no i such that r = r_i, because r differs from r_i in the i-th digit.
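The diagonal construction can be illustrated on a finite prefix of a hypothetical enumeration: flipping the i-th digit of the i-th sequence yields a sequence that differs from every row of the list.

```python
def diagonal(rows):
    """Given n binary digit-sequences (each of length >= n), return a
    sequence whose i-th digit differs from the i-th digit of the i-th row."""
    return [1 - rows[i][i] for i in range(len(rows))]

rows = [
    [0, 1, 1, 0],
    [1, 1, 0, 1],
    [0, 0, 0, 0],
    [1, 0, 1, 1],
]
d = diagonal(rows)
print(d)  # [1, 0, 1, 0]
# d disagrees with rows[i] in position i, so it equals no row in the list:
print(all(d[i] != rows[i][i] for i in range(len(rows))))  # True
```

The time-hierarchy arguments below follow the same pattern, with "machine i on input i" playing the role of the i-th digit of the i-th number.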
Similarly, we can prove that the Halting problem is undecidable by considering the
following decision problem D: on input ⟨M⟩, the description of a Turing machine, answer
NO if M(⟨M⟩) halts and accepts, and YES otherwise. The above problem is decidable if the
Halting problem is decidable. However, suppose D were decidable, and let T be a Turing
machine that solves D; then T(⟨T⟩) halts and accepts if and only if T(⟨T⟩) does not halt
and accept, which is a contradiction.
It is easy to do something similar with timebounded computations, given the following
result (that we will not prove).
Theorem 2 (Efficient Universal Turing Machine) On input the description ⟨M⟩ of a
Turing machine M, a string x and an integer t > n, the problem of deciding whether M
accepts x within t steps is solvable in O(t log t) time.

The O(t log t) simulation is possible if, say, both the universal machine and the simulated
machine are two-tape Turing machines. A similar theorem holds for any reasonable model
of computation, with t^{O(1)} in place of O(t log t).
Theorem 3 DTIME(n(log n)^3) ⊈ DTIME(O(n)).
Proof: Consider the following decision problem D: on input x = (⟨M⟩, z), answer NO if
M(x) accepts within |x| log |x| steps, and YES otherwise.
The problem can be solved in O(n(log n)^2) time, which is less than n(log n)^3 for all but
finitely many inputs.
Suppose by contradiction that D is solvable in time ≤ an + b on inputs of length n, for
constants a and b, by a machine T. Let z be sufficiently long so that |(⟨T⟩, z)| log |(⟨T⟩, z)| >
a|(⟨T⟩, z)| + b; then T(⟨T⟩, z) accepts if and only if it rejects, which is a contradiction. □
See the homeworks for generalizations of the above proof.
We would like to do the same for nondeterministic time, but we run into the problem
that we cannot ask a nondeterministic machine to reject if and only if a nondeterministic
machine of comparable running time accepts. If we could do so, then we would be able to
prove NP = coNP. A considerably subtler argument must be used instead, which uses the
following simple fact.
Theorem 4 On input the description ⟨M⟩ of a non-deterministic Turing machine M, a
string x and an integer t > n, the problem of deciding whether M accepts x within t steps
is solvable in deterministic 2^{O(t)} time.
Finally, we can state and prove a special case of the nondeterministic hierarchy theorem.
Theorem 5 NTIME(n(log n)^3) ⊈ NTIME(O(n)).
Proof: Let f : N → N be defined inductively so that f(1) = 2 and f(k + 1) = 2^{k·f(k)}.
Consider the following decision problem D: on input x = (⟨M⟩, 1^t), where M is a
non-deterministic Turing machine,

1. if t = f(k) for some k, then the answer is YES if and only if the simulation of M(⟨M⟩, 1^{1+f(k−1)})
as in Theorem 4 returns NO within t steps;

2. otherwise, answer YES if and only if M(⟨M⟩, 1^{t+1}) accepts within t log t steps.

We first observe that D is solvable by a non-deterministic Turing machine running in
O(n(log n)^2) time, where n is the input length, which is less than n(log n)^3 except for
finitely many values of n.
Suppose that D were decided by a non-deterministic Turing machine T running in
time bounded by an + b, and consider inputs of the form (⟨T⟩, 1^t) (which are solved by T
in time a′t + b′ for constants a′ and b′). Pick a sufficiently large k, and consider the behaviour of T on inputs
(⟨T⟩, 1^t) for f(k − 1) < t < f(k); since all such inputs fall in case (2), we have that
T(⟨T⟩, 1^t) = T(⟨T⟩, 1^{t+1}) for all such t and, in particular,

  T(⟨T⟩, 1^{1+f(k−1)}) = T(⟨T⟩, 1^{f(k)})    (1.1)

On the other hand, the input (⟨T⟩, 1^{f(k)}) falls in case (1), and T(⟨T⟩, 1^{1+f(k−1)}) can
be simulated deterministically in time 2^{(k−1)f(k−1)} = f(k), if k is large enough, and so the correct
answer on input (⟨T⟩, 1^{f(k)}) is NO if and only if T(⟨T⟩, 1^{1+f(k−1)}) accepts, which is in
contradiction with Equation (1.1). □
1.5 References
The time hierarchy theorem is proved in [HS65], which is also the paper that introduced
the term “Computational Complexity.” The nondeterministic hierarchy theorem is due to
Cook [Coo73]. The notion of NPcompleteness is due to Cook [Coo71] and Levin [Lev73],
and the recognition of its generality is due to Karp [Kar72].
Exercises
1. Show that if P = NP for decision problems, then every NP search problem can be
solved in polynomial time.
2. Generalize Theorem 3. Say that a monotone non-decreasing function t : N → N is
time-constructible if, given n, we can compute t(n) in O(t(n)) time. Show that if
t(n) and t′(n) are two time-constructible functions such that t′(n) > t(n) > n and

   lim_{n→∞} t(n) log t(n) / t′(n) = 0,

then DTIME(t′(n)) ⊈ DTIME(t(n)).
3. The proof of Theorem 3 shows that there is a problem D solvable in O(n(log n)^2)
time such that every machine M running in time o(n log n) fails at solving D on a
constant fraction of all inputs of sufficiently large length. The constant, however, is
exponentially small in the representation of M.
Show that there is a problem D′ solvable in O(n(log n)^2) time and such that for every
machine M running in time o(n log n) there is a constant n_M such that, for every
n > n_M, M fails at solving D′ on at least 1/3 of all inputs of length n.
[Note: this may be an open question. If you can find an oracle relative to which the
above statement is false, that would also be interesting, because it would show that
the question cannot be solved via a standard diagonalization.]
Lecture 2
Space-Bounded Complexity Classes

2.1 Space-Bounded Complexity Classes
A machine solves a problem using space s(·) if, for every input x, the machine outputs
the correct answer and uses only the first s(|x|) cells of the tape. For a standard Turing
machine, we can't do better than linear space, since x itself must be on the tape. So we
will often consider a machine with multiple tapes: a read-only "input" tape, a read/write
"work" or "memory" tape, and possibly a write-once "output" tape. Then we can say the
machine uses space s if, for input x, it uses only the first s(|x|) cells of the work tape.
We denote by L the set of decision problems solvable in O(log n) space. We denote by
PSPACE the set of decision problems solvable in polynomial space. A first observation is
that a space-efficient machine is, to a certain extent, also a time-efficient one. In general,
we denote by SPACE(s(n)) the set of decision problems that can be solved using space at
most s(n) on inputs of length n.
Theorem 6 If a machine always halts, and uses s(·) space, with s(n) ≥ log n, then it runs in time 2^{O(s(n))}.
Proof: Call the "configuration" of a machine M on input x a description of the state of M, the position of the head on the input tape, and the contents of the work tape at a given time. Write down c_1, c_2, ..., c_t, where c_i is the configuration at time i and t is the running time of M(x). No two c_i can be equal, or else the machine would be in a loop, since c_i completely describes the present, and therefore the future, of the computation. Now, the number of possible configurations is simply the product of the number of states, the number of positions on the input tape, and the number of possible contents of the work tape (which itself depends on the number of allowable cells of the work tape). This is

   O(1) · n · |Σ|^{s(n)} = 2^{O(s(n)) + log n} = 2^{O(s(n))}.

Since we cannot visit a configuration twice during the computation, the computation must therefore finish in 2^{O(s(n))} steps. □
NL is the set of decision problems solvable by a nondeterministic machine using O(log n) space. NPSPACE is the set of decision problems solvable by a nondeterministic machine using polynomial space. In general, we denote by NSPACE(s(n)) the set of decision problems that can be solved by nondeterministic machines that use at most s(n) bits of space on inputs of length n.
By analogy with time-bounded complexity classes, we might think that NL is exactly the set of decision problems whose "solutions" can be verified in log space. If that were so, NL would be equal to NP, since there is a log-space algorithm V that verifies solutions to SAT. However, this is unlikely to be true, because NL is contained in P. An intuitive reason why not all problems with a log-space "verifier" can be simulated in NL is that an NL machine does not have enough memory to keep track of all the nondeterministic choices that it makes.
Theorem 7 NL ⊆ P.
Proof: Let L be a language in NL and let M be a nondeterministic log-space machine for L. Consider a computation of M(x). As before, there are 2^{O(s(n))} = n^{O(1)} possible configurations. Consider a directed graph in which the vertices are configurations and the edges indicate transitions from one configuration to another that the machine is allowed to make in a single step (as determined by its transition function δ). This graph has polynomially many vertices, so in polynomial time we can do a depth-first search to see whether there is a path from the initial configuration that eventually leads to acceptance. This describes a polynomial-time algorithm for deciding L, so we're done. □
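The search over the configuration graph can be sketched in a few lines. Here configurations are abstract labels and `delta` is a stand-in (not anything from the notes) for the machine's one-step transition relation:

```python
def reachable(start, accepting, delta):
    """Depth-first search over the configuration graph: returns True
    iff some accepting configuration is reachable from `start`."""
    stack, seen = [start], {start}
    while stack:
        c = stack.pop()
        if c in accepting:
            return True
        for d in delta(c):            # configurations reachable in one step
            if d not in seen:
                seen.add(d)
                stack.append(d)
    return False

# Tiny example: configurations 0 -> 1 -> 2, with 3 unreachable from 0.
succ = {0: [1], 1: [2], 2: [], 3: [2]}
print(reachable(0, {2}, lambda c: succ[c]))  # True
print(reachable(0, {3}, lambda c: succ[c]))  # False
```

The running time is linear in the size of the configuration graph, which is polynomial in n, as in the proof.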
2.2 Reductions in NL
We would like to introduce a notion of completeness in NL analogous to the notion of completeness that we know for the class NP. A first observation is that, in order to have a meaningful notion of completeness in NL, we cannot use polynomial-time reductions; otherwise, any NL problem having at least one YES instance and at least one NO instance would trivially be NL-complete. To get a more interesting notion of NL-completeness we need to turn to weaker reductions. In particular, we define log-space reductions as follows:
Definition 1 Let A and B be decision problems. We say A is log-space reducible to B, written A ≤_log B, if there is a function f computable in log space such that x ∈ A iff f(x) ∈ B.
Theorem 8 If B ∈ L and A ≤_log B, then A ∈ L.
Proof: We consider the composition of two machines: M_f to compute f, and M_B to solve B. If our resource bound were polynomial time, then we would use M_f(x) to compute f(x), and then run M_B on f(x). The composition of the two procedures would give an algorithm for A, and if both procedures run in polynomial time then their composition also runs in polynomial time. To prove the theorem, however, we have to show that if M_f and M_B are log-space machines, then their composition can also be computed in log space.
Recall the definition of a Turing machine M that has a log-space complexity bound: M has one read-only input tape, one write-only output tape, and uses a log-space work tape.
A naive implementation of the composition of M_f and M_B would be to compute f(x), and then run M_B on input f(x); however, f(x) would need to be stored on the work tape, and this implementation does not yield a log-space machine. Instead, we modify M_f so that on input x and i it returns the i-th bit of f(x) (this computation can still be carried out in logarithmic space). Then we run a simulation of the computation of M_B(f(x)) by using the modified M_f as an "oracle" that tells us the value of specified positions of f(x). In order to simulate M_B(f(x)) we only need to know the content of one position of f(x) at a time, so the simulation can be carried out with a total of O(log |x|) bits of work space. □
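The composition trick — never writing down f(x), but recomputing any requested bit on demand — can be sketched as follows. Here `f_bit`, `f_len` and `parity` are toy stand-ins for the modified M_f and for M_B, chosen only for illustration:

```python
def compose(x, f_bit, f_len, decide_B_streaming):
    """Decide B on f(x) without ever materializing f(x):
    decide_B_streaming reads its input one bit at a time via a callback,
    and each requested bit is recomputed from x on demand."""
    return decide_B_streaming(lambda i: f_bit(x, i), f_len(x))

# Toy reduction f: double each bit of x.  Toy B: accept iff odd parity.
f_bit = lambda x, i: x[i // 2]       # i-th bit of f(x), recomputed on demand
f_len = lambda x: 2 * len(x)

def parity(bit_at, n):
    p = 0
    for i in range(n):               # state is just the counter i and bit p
        p ^= bit_at(i)
    return p == 1

print(compose([1, 0, 1], f_bit, f_len, parity))  # doubled bits: even parity, False
```

The simulation keeps only a position counter and M_B's own small state, mirroring the O(log |x|) space bound of the proof.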
Using the same proof technique, we can show the following:
Theorem 9 If A ≤_log B and B ≤_log C, then A ≤_log C.
2.3 NL-Completeness
Armed with a definition of log-space reducibility, we can define NL-completeness.
Definition 2 A is NL-hard if for every B ∈ NL, B ≤_log A. A is NL-complete if A ∈ NL and A is NL-hard.
We now introduce a problem, STCONN (s,t-connectivity), that we will show is NL-complete. In STCONN, given in input a directed graph G(V, E) and two vertices s, t ∈ V, we want to determine whether there is a directed path from s to t.
Theorem 10 STCONN is NL-complete.
Proof:
1. STCONN ∈ NL.
On input G(V, E), s, t, set p to s. For i = 1 to |V|: nondeterministically choose a neighboring vertex v of p and set p = v; if p = t, accept and halt. Reject and halt if the end of the for loop is reached. The algorithm only requires O(log n) space.
2. STCONN is NL-hard.
Let A ∈ NL, and let M_A be a nondeterministic logarithmic-space Turing machine for A. On input x, construct a directed graph G with one vertex for each configuration of M_A(x), and an additional vertex t. Add an edge (c_i, c_j) if M_A(x) can move in one step from c_i to c_j. Add an edge (c, t) from every accepting configuration c, and let s be the start configuration. M_A accepts x iff some path from s to t exists in G. The above graph can be constructed from x in log space, because listing all nodes requires O(log n) space, and testing valid edges is also easy. □
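One way to read the membership algorithm is that the sequence of nondeterministic choices is a certificate that can be checked with O(log n) bits of state (the current vertex and a step counter). A small sketch of such a verifier, on a toy graph chosen for illustration:

```python
def verify_walk(adj, s, t, choices):
    """Check a guessed walk (the nondeterministic choices) from s to t.
    Only the current vertex p and the loop index are kept in memory."""
    p = s
    for c in choices:
        if p == t:
            return True
        nbrs = adj[p]
        if not nbrs:                  # dead end: this guess sequence fails
            return False
        p = nbrs[c % len(nbrs)]       # choice c selects one neighbor
    return p == t

adj = {0: [1], 1: [2], 2: [], 3: []}
print(verify_walk(adj, 0, 2, [0, 0]))  # walk 0 -> 1 -> 2: True
print(verify_walk(adj, 0, 3, [0, 0]))  # 3 is unreachable: False
```

The nondeterministic machine accepts iff some choice sequence of length at most |V| makes the verifier accept.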
2.4 Savitch's Theorem
What kinds of trade-offs are there between memory and time? STCONN can be solved deterministically in linear time and linear space, using depth-first search. Is there some sense in which this is optimal? Nondeterministically, we can search using less than linear space. Can searching be done deterministically in less than linear space?
We will use Savitch's Theorem [Sav70] to show that STCONN can be solved deterministically in O(log^2 n) space, and that every NL problem can be solved deterministically in O(log^2 n) space. In general, if A is a problem that can be solved nondeterministically with space s(n) ≥ log n, then it can be solved deterministically with O(s^2(n)) space.
Theorem 11 STCONN can be solved deterministically in O(log^2 n) space.
Proof: Consider a graph G(V, E), and vertices s, t. We define a recursive procedure REACH(u, v, k) that accepts iff v can be reached from u in at most k steps. If k = 1, then REACH accepts iff u = v or (u, v) is an edge. If k ≥ 2, then for every w ∈ V, compute REACH(u, w, ⌈k/2⌉) and REACH(w, v, ⌊k/2⌋); if, for some w, both calls accept, then accept. Else, reject.
Let S(k) be the worst-case space use of REACH(·, ·, k). The space required for the base case S(1) is that of a counter for locating the edge, so S(1) = O(log n). In general, S(k) = O(log n) + S(⌈k/2⌉): O(log n) bits to store w, plus the space for the recursive calls, which are made one after the other and can reuse the same space. So S(k) = O(log k · log n). Since k ≤ n, the worst-case space use of REACH is O(log^2 n). □
Essentially the same proof applies to arbitrary nondeterministic space-bounded computations.
Theorem 12 (Savitch's Theorem) For every function s(n) computable in space O(s(n)),

   NSPACE(s(n)) = SPACE(O(s^2(n))).

Proof: We begin with a nondeterministic machine M, which on input x uses s(|x|) space. We define REACH(c_i, c_j, k), as in the proof of Theorem 11, which accepts iff M(x) can go from configuration c_i to configuration c_j in at most k steps. We compute REACH(c_0, c_acc, 2^{O(s(|x|))}) for every accepting configuration c_acc, where c_0 is the initial configuration. If one of these calls of REACH accepts, then we accept; else, we reject. (If M(x) accepts at all, it does so within 2^{O(s(|x|))} steps, so this choice of k suffices.)
Let S_R(k) be the worst-case space used by REACH(·, ·, k): S_R(1) = O(s(n)), and S_R(k) = O(s(n)) + S_R(k/2). This solves to S_R(k) = O(s(n) · log k) and, since k = 2^{O(s(n))}, we have S_R = O(s^2(n)). □
Comparing Theorem 11 to depth-first search, we find that we are exponentially better in space requirements, but we are no longer polynomial in time.
Examining the time required, if we let t(k) be the worst-case time used by REACH(·, ·, k), we see that t(1) = O(n + m) and t(k) = n · 2 · t(k/2), which solves to t(k) = n^{O(log k)} = n^{O(log n)}, which is superpolynomial. Savitch's algorithm is still the one with the best known space bound. No known algorithm achieves polylogarithmic space and polynomial time simultaneously, although such an algorithm is known for undirected connectivity.
2.5 Undirected Connectivity
In the undirected s-t connectivity problem (abbreviated ST-UCONN) we are given an undirected graph G = (V, E) and two vertices s, t ∈ V, and the question is whether there is a path between s and t in G.
While this problem is not known to be complete for NL, and probably is not, ST-UCONN is complete for the class SL of decision problems that are solvable by symmetric nondeterministic machines that use O(log n) space. A nondeterministic machine is symmetric if, whenever it can make a transition from a global state s to a global state s', the transition from s' to s is also possible. The proof of SL-completeness of ST-UCONN is identical to the proof of NL-completeness of STCONN, except for the additional observation that the transition graph of a symmetric machine is undirected.
For ST-UCONN there exists an algorithm running in polynomial time and O(log^2 n) space (but the polynomial has very high degree), due to Nisan [Nis94]. There is also an algorithm that has O(log^{4/3} n) space complexity and superpolynomial time complexity, due to Armoni, Ta-Shma, Nisan and Wigderson [ATSWZ97], improving on a previous algorithm by Nisan, Szemerédi and Wigderson [NSW92].
2.6 Randomized Log-space
We now wish to introduce randomized space-bounded Turing machines. For simplicity, we will only introduce randomized machines for solving decision problems. In addition to a read-only input tape and a read/write work tape, such machines also have a read-only random tape to which they have one-way access, meaning that the head on that tape can only move, say, left-to-right. For every fixed input and fixed content of the random tape, the machine is completely deterministic, and either accepts or rejects. For a Turing machine M, an input x and a content r of the random tape, we denote by M(r, x) the outcome of the computation.
We say that a decision problem L belongs to the class RL (for randomized log-space) if there is a probabilistic Turing machine M that uses O(log n) space on inputs of length n and such that
• For every x ∈ L, Pr_r[M(r, x) accepts] ≥ 1/2;
• For every x ∉ L, Pr_r[M(r, x) accepts] = 0.
It is easy to observe that any constant bigger than 0 and smaller than 1 could equivalently be used instead of 1/2 in the definition above. It also follows from the definition that L ⊆ RL ⊆ NL.
The following result shows that, indeed, L ⊆ SL ⊆ RL ⊆ NL.
Theorem 13 The problem ST-UCONN is in RL.
We will not give a proof of the above theorem, but we describe the algorithm. Given an undirected graph G = (V, E) and two vertices s, t, the algorithm performs a random walk of length 100 · n^3 starting from s. If t is never reached, the algorithm rejects.
input: G = (V, E), s, t

v ← s
for i ← 1 to 100 · n^3 do
    pick at random a neighbor w of v
    if w = t then halt and accept
    v ← w
reject

The analysis of the algorithm is based on the fact that if we start a random walk from a vertex s of an undirected graph G, then all vertices in the connected component of s are likely to have been visited at least once after Θ(n^3) steps.
The best known deterministic simulation of RL uses O((log n)^{3/2}) space, and is due to Saks and Zhou [SZ95].
2.7 NL = coNL
In order to prove that these two classes are the same, we will show that there is an NL Turing machine which solves st-non-conn, the complement of st-connectivity: the problem of deciding, given a directed graph G together with special vertices s and t, whether t is not reachable from s. Note that st-non-conn is coNL-complete.
Once we have the machine, we know that coNL ⊆ NL, since any language A in coNL can be reduced to st-non-conn, and since st-non-conn has been shown to be in NL (by the existence of our machine), so is A. Also, NL ⊆ coNL: if st-non-conn ∈ NL then, by definition, its complement st-conn is in coNL, and since st-conn is NL-complete, any problem in NL can be reduced to it and so is also in coNL. Hence NL = coNL. This result was proved independently in [Imm88] and [Sze88].
2.7.1 A simpler problem first
Now all that remains to be shown is that this Turing machine exists. First we will solve a simpler problem than st-non-conn. We will assume that, in addition to the usual inputs G, s and t, we also have an input r, which we will assume is equal to the number of vertices reachable from s in G, including s itself.
Given these inputs, we will construct a nondeterministic Turing machine which decides whether t is not reachable from s by trying to certify, one vertex at a time, that r vertices other than t are reachable from s: it halts with yes if it succeeds (so that t cannot also be reachable), and halts with no otherwise. Here is the algorithm:
input: G = (V, E), s, t, r
output: yes if it discovers that t is not reachable from s, and no otherwise
assumption: there are exactly r distinct vertices reachable from s

c ← 0
for all v ∈ (V − {t}) do
    nondeterministically guess whether v is reachable from s
    if guess = YES then
        nondeterministically guess the distance k from s to v
        p ← s
        for i ← 1 to k do
            nondeterministically pick a neighbor q of p
            p ← q
        if p ≠ v, reject
        c ← c + 1
if c = r then return yes, otherwise return no

It is easy to verify that this algorithm is indeed in NL. The algorithm only needs to maintain the five variables c, k, p, q, v, and each of these variables can be represented with log |V| bits.
Regarding correctness, notice that, in the algorithm, c can only be incremented for a vertex v that is actually reachable from s. Since there are assumed to be exactly r such vertices, c can be at most r at the end of the algorithm; and if it is exactly r, that means that there are r vertices other than t which are reachable from s, so that t, by assumption, cannot be reachable from s. Hence the algorithm accepts if and only if it discovers that t is not reachable from s.
2.7.2 Finding r
Now we need to provide an NL algorithm that finds r. Let's first try this algorithm:

input: G = (V, E), s
output: the number of vertices reachable from s (including s in this count)

c ← 0
for all v ∈ V do
    nondeterministically guess whether v is reachable from s, and guess a distance k
    if guess = yes then
        p ← s
        for i ← 1 to k do
            nondeterministically pick a neighbor q of p (possibly not moving at all)
            p ← q
        if p ≠ v, reject
        c ← c + 1
return c
This algorithm has a problem: it will only return a number c which is at most r, but we need it to return exactly r. We need a way to force it to find all vertices which are reachable from s. Towards this goal, let's define r_k to be the number of vertices reachable from s in at most k steps. Then r = r_{n−1}, where n is the number of vertices in G. The idea is to compute r_k from r_{k−1}, and to repeat the procedure n − 1 times, starting from r_0 = 1.
Now here is another try at an algorithm:

input: G = (V, E), s, k, r_{k−1}
output: the number of vertices reachable from s in at most k steps (including s in this count)
assumption: r_{k−1} is the exact number of vertices reachable from s in at most k − 1 steps

c ← 0
for all v ∈ V do
    d ← 0
    flag ← FALSE
    for all w ∈ V do
        p ← s
        for i ← 1 to k − 1 do
            nondeterministically pick a neighbor q of p (possibly not moving at all)
            p ← q
        if p = w then
            d ← d + 1
            if v is a neighbor of w, or if v = w, then
                flag ← TRUE
    if d < r_{k−1}, reject
    if flag then c ← c + 1
return c
Here is the idea behind the algorithm: for each vertex v, we need to determine whether it is reachable from s in at most k steps. To do this, we can loop over all vertices w at distance at most k − 1 from s, checking whether v is either equal to one of these vertices or is a neighbor of one of them (in which case it is reachable in at most k steps). The algorithm is able to force all vertices at distance at most k − 1 to be considered because it is given r_{k−1} as an input.
Now, putting this algorithm together with the first one listed above, we have shown that st-non-conn ∈ NL, implying that NL = coNL. In fact, the proof can be generalized to show that if a decision problem A is solvable in nondeterministic space s(n) = Ω(log n), then the complement of A is solvable in nondeterministic space O(s(n)).
Exercises
1. It is possible to prove that there is an algorithm that, given a Turing machine M (of the type with a read-only input tape and a work tape), an input x and a space bound s, decides whether M(x) accepts while using at most s bits of space; the algorithm itself uses O(s) bits of space. Using the above result, argue that if s(n) > log n is a function computable in O(s(n)) space, then SPACE(s(n) · log n) ⊈ SPACE(s(n)).
2. Prove that P ≠ SPACE(O(n)).
[This is a "trick" question. It is not known how to prove P ⊆ SPACE(O(n)) or how to prove SPACE(O(n)) ⊆ P, so the proof has to start by assuming the two classes are equal and then reach a contradiction, without explicitly exhibiting a problem in one class that does not belong to the other. Hint: if A and B are decision problems, A ≤ B and B ∈ P, then A ∈ P; what would happen if the same closure property held for SPACE(O(n))?]
3. Define the class BPL (for bounded-error probabilistic log-space) as follows. A decision problem L is in BPL if there is a log-space probabilistic Turing machine M such that
• If x ∈ L then Pr_r[M(r, x) accepts] ≥ 2/3;
• If x ∉ L then Pr_r[M(r, x) accepts] ≤ 1/3.
Then
(a) Prove that RL ⊆ BPL.
(b) Prove that BPL ⊆ SPACE(O((log n)^2)).
(c) This might be somewhat harder: prove that BPL ⊆ P.
Lecture 3
The Polynomial Hierarchy
One way to look at the difference between NP and coNP is that a decision problem in NP asks a sort of "does there exist" question, where a positive answer can, by definition, be efficiently verified. On the other hand, coNP asks "is it true for all" questions, which do not seem to have short, efficiently verifiable proofs.
Formally, a decision problem A is in NP if and only if there is a polynomial-time procedure V(·,·) and a polynomial time bound p(·) such that

   x ∈ A if and only if ∃y. |y| ≤ p(|x|) ∧ V(x, y) = 1

and a problem A is in coNP if and only if there is a polynomial-time procedure V(·,·) and a polynomial bound p(·) such that

   x ∈ A if and only if ∀y : |y| ≤ p(|x|), V(x, y) = 1
3.1 Stacks of quantifiers
Now suppose you had a decision problem A which asked something of the following form:

   x ∈ A ⇔ ∃y_1 s.t. |y_1| ≤ p(|x|). ∀y_2 s.t. |y_2| ≤ p(|x|). V(x, y_1, y_2)

(where p(·) is a polynomial time bound and V(·,·,·) is a polynomial-time procedure).
In other words, an algorithm solving problem A should return yes on an input x if and only if there exists some string y_1 such that for all strings y_2 (both of polynomial length), the predicate V(x, y_1, y_2) holds. An example of such a problem is this: given a Boolean formula φ over variables x_1, ..., x_n, is there a formula φ' which is equivalent to φ and is of size at most k? In this case, y_1 is the formula φ', y_2 is an arbitrary assignment to the variables x_1, ..., x_n, and V(x, y_1, y_2) is the predicate which is true if and only if φ[y_2] and φ'[y_2] are both true or both false, meaning that under the variable assignment y_2, φ and φ' agree. Notice that φ' is equivalent to φ if and only if it agrees with φ under all assignments of Boolean values to the variables.
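The universally quantified y_2 of this example can be spelled out by brute force: φ' is equivalent to φ iff no assignment distinguishes them. A small sketch, with formulas represented as Python predicates over an assignment tuple:

```python
from itertools import product

def equivalent(phi, psi, n):
    """Check phi ≡ psi on n variables by trying all 2^n assignments
    (the role of the universally quantified y_2 in the text)."""
    return all(phi(a) == psi(a) for a in product([False, True], repeat=n))

# phi = x1 AND (x2 OR x1) simplifies to the smaller formula psi = x1.
phi = lambda a: a[0] and (a[1] or a[0])
psi = lambda a: a[0]
print(equivalent(phi, psi, 2))             # True
print(equivalent(phi, lambda a: a[1], 2))  # False
```

Of course this check takes exponential time; the point of the Σ_2 formulation is that a single assignment y_2 witnessing disagreement can be verified in polynomial time.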
As we will see, the problem A is a member of the class Σ_2, in the second level of the polynomial hierarchy.
3.2 The hierarchy
The polynomial hierarchy starts with familiar classes on level one: Σ_1 = NP and Π_1 = coNP. For all i ≥ 1, it includes two classes, Σ_i and Π_i, which are defined as follows:

   A ∈ Σ_i ⇔ ∃y_1. ∀y_2. ... Qy_i. V_A(x, y_1, ..., y_i)

and

   B ∈ Π_i ⇔ ∀y_1. ∃y_2. ... Q'y_i. V_B(x, y_1, ..., y_i)

where the predicates V_A and V_B depend on the problems A and B, and Q and Q' represent the appropriate quantifiers, which depend on whether i is even or odd (for example, if i = 10 then the quantifier Q for Σ_10 is ∀, and the quantifier Q' for Π_10 is ∃). For clarity, we have also omitted the p(·) side conditions, but they are still there.
One thing that is easy to see is that Π_k = coΣ_k. Also, note that, for all i ≤ k − 1, Π_i ⊆ Σ_k and Σ_i ⊆ Σ_k. These subset relations hold for Π_k as well. This can be seen by noticing that the predicates V do not need to "pay attention to" all of their arguments, and so can represent classes lower in the hierarchy, which use a smaller number of them.
3.3 An alternate characterization
The polynomial hierarchy can also be characterized in terms of "oracle machines." The idea here is that, instead of a standard Turing machine, we consider one which is augmented with an oracle of a certain power, which can be consulted as many times as desired, using only one computational step each time. Syntactically, this can be written as follows.
Let A be some decision problem and M be a class of Turing machines. Then M^A is defined to be the class of machines obtained from M by allowing instances of A to be solved in one step. Similarly, if M is a class of Turing machines and C is a complexity class, then M^C = ∪_{A ∈ C} M^A. If L is a complete problem for C, and the machines in M are powerful enough to carry out polynomial-time computations, then M^C = M^L.
Theorem 14 Σ_2 = NP^3SAT.
Proof: Let A ∈ Σ_2. Then, for some polynomial p(·) and polynomial-time computable V(·), we have

   x ∈ A if and only if ∃y_1 s.t. |y_1| ≤ p(|x|). ∀y_2 s.t. |y_2| ≤ p(|x|). V(x, y_1, y_2) = 1

Then we define a nondeterministic machine with an NP oracle as follows: on input x, the machine guesses a string y_1 of length at most p(|x|), and then asks the oracle whether ∃y_2. |y_2| ≤ p(|x|) ∧ V(x, y_1, y_2) = 0. The above question is an existential question about a polynomial-time computation, so, by Cook's theorem, it is possible to construct in polynomial time a 3SAT instance that is satisfiable if and only if the answer to the above question is YES. The machine accepts if and only if the answer from the oracle is NO. It is immediate that the machine has an accepting computation if and only if

   ∃y_1. |y_1| ≤ p(|x|). ¬(∃y_2. |y_2| ≤ p(|x|) ∧ V(x, y_1, y_2) = 0)

that is, the machine accepts if and only if x ∈ A.
Notice that, in the above computation, only one oracle query is made, even though the definition of NP^3SAT allows us to make an arbitrary number of oracle queries.
Let now A ∈ NP^3SAT, and let M be the oracle machine that solves A. We first show that there is a machine M' that also solves A, makes only one oracle query, and accepts if and only if the answer to that query is NO. On input x, M' guesses an accepting computation of M(x); that is, M' guesses all the nondeterministic choices of M(x), all the oracle questions, and all the answers. Then, for each question that was answered with a YES, M' guesses a satisfying assignment to verify that the guess was correct. Finally, M' is left with a certain set of oracle questions, say the formulae φ_1, ..., φ_k, for which it has guessed that the correct oracle answer is NO. Then M' asks its oracle whether (a formula equivalent to) φ_1 ∨ ··· ∨ φ_k is satisfiable, and it accepts if and only if the answer is NO.
Consider the computation of M'(x) when x ∈ A: there is a valid accepting computation of M(x), and M'(x) can guess that computation along with the valid oracle answers; it can also guess valid assignments for all the queries for which the answer is YES; finally, it is left with unsatisfiable formulae φ_1, ..., φ_k, the answer to the single oracle query of M' is NO, and M' accepts.
Conversely, if M'(x) has an accepting computation, then there must be a valid accepting computation of M(x), and so x ∈ A. □
In fact, a more general result is known, whose proof works along similar lines.
Theorem 15 For every i ≥ 2, Σ_i = NP^{Σ_{i−1}}.
3.4 Additional Properties
Here are some more facts about the polynomial hierarchy, which we will not prove:
1. Π_i and Σ_i have complete problems for all i.
2. A Σ_i-complete problem is not in Π_j, j ≤ i − 1, unless Π_j = Σ_i, and it is not in Σ_j unless Σ_j = Σ_i.
3. Suppose that Σ_i = Π_i for some i. Then Σ_j = Π_j = Σ_i = Π_i for all j ≥ i.
4. Suppose that Σ_i = Σ_{i+1} for some i. Then Σ_j = Π_j = Σ_i for all j ≥ i.
5. Suppose that Π_i = Π_{i+1} for some i. Then Σ_j = Π_j = Π_i for all j ≥ i.
We will just prove the following special case of part (3).
Theorem 16 Suppose NP = coNP. Then, for every i ≥ 2, Σ_i = NP.
Proof: Let us first prove that, under the assumption of the theorem, Σ_2 = NP. Let A ∈ Σ_2 and let M be the nondeterministic oracle machine that decides A using oracle access to 3SAT. Let also M' be the nondeterministic polynomial-time Turing machine that decides the complement of the 3SAT problem. We now describe a nondeterministic polynomial-time Turing machine M'' that decides A: on input x, M'' guesses an accepting computation of M(x), along with oracle queries and answers; for each oracle question φ for which a YES answer has been guessed, M'' guesses a satisfying assignment; for each oracle question ψ for which a NO answer has been guessed, M'' guesses an accepting computation of M'(ψ). It is easy to verify that M''(x) has an accepting computation if and only if M^3SAT(x) has an accepting computation.
We can prove by induction on i that Σ_i = NP. We have covered the base case. Let us now suppose that Σ_{i−1} = NP; then Σ_i = NP^{Σ_{i−1}} = NP^NP = Σ_2 = NP. □
While it seems like an artiﬁcial construction right now, in future lectures we will see
that the polynomial hierarchy helps us to understand other complexity classes.
3.5 References
The polynomial time hierarchy was deﬁned by Stockmeyer [Sto76]. Wrathall [Wra76] shows
that every class in the polynomial hierarchy has complete problems.
Exercises
1. Consider the following decision problem: given a directed graph G = (V, E), two
vertices s, t ∈ V , and an integer k, determine whether the shortest path from s to t is
of length k. Is this problem in NL?
2. In the MAX SAT problem we are given a formula φ in conjunctive normal form and
we want to ﬁnd the assignment of values to the variables that maximizes the number
of satisﬁed clauses. (For example, if φ is satisﬁable, the optimal solution satisﬁes all
the clauses and the MAX SAT problem reduces to ﬁnding a satisfying assignment.)
Consider the following decision problem: given a formula φ in conjunctive normal
form and an integer k, determine if k is the number of clauses of φ satisﬁed by an
optimal assignment.
• Prove that this problem is in NP if and only if NP = coNP.
[Hint: prove that it is both NPhard and coNPhard.]
• Prove that this problem is in Σ_2.
Lecture 4
Circuits
This lecture is on Boolean circuit complexity. We first define circuits and the functions they compute. Then we consider families of circuits and the languages they define. In Section 4.2, we see how circuits relate to other complexity classes, through a series of results culminating in the Karp-Lipton theorem, which states that if NP problems can be decided with polynomial-size circuits, then PH = Σ_2.
4.1 Circuits
A circuit C has n inputs, m outputs, and is constructed with AND gates, OR gates and NOT gates. Each gate has in-degree 2, except the NOT gate, which has in-degree 1. The out-degree can be any number. A circuit must have no cycle. See Figure 4.1.
Figure 4.1: A Boolean circuit.
A circuit C with n inputs and m outputs computes a function f_C : {0,1}^n → {0,1}^m.
Figure 4.2: A circuit computing the Boolean function f_C(x_1 x_2 x_3 x_4) = x_1 ⊕ x_2 ⊕ x_3 ⊕ x_4.
See Figure 4.2 for an example.
Define SIZE(C) = the number of AND and OR gates of C. By convention, we do not count the NOT gates.
To be compatible with other complexity classes, we need to extend the model to arbitrary input sizes:
Definition 3 A language L is solved by a family of circuits {C_1, C_2, ..., C_n, ...} if for every n ≥ 1 and for every x s.t. |x| = n,

   x ∈ L ⇔ f_{C_n}(x) = 1.

Definition 4 We say L ∈ SIZE(s(n)) if L is solved by a family {C_1, C_2, ..., C_n, ...} of circuits, where C_i has at most s(i) gates.
4.2 Relation to other complexity classes
Proposition 17 For every language L, L ∈ SIZE(O(2^n)). In other words, exponential-size circuits suffice for all languages.
Proof: We need to show that every one-output function f : {0,1}^n → {0,1} has circuit size O(2^n).
Use the identity f(x_1 x_2 ... x_n) = (x_1 ∧ f(1 x_2 ... x_n)) ∨ (¬x_1 ∧ f(0 x_2 ... x_n)) to recursively construct a circuit for f, as shown in Figure 4.3.
The recurrence relation for the size of the circuit is s(n) = 3 + 2s(n − 1), with base case s(1) = 1, which solves to s(n) = 2 · 2^n − 3 = O(2^n). □
Figure 4.3: A circuit computing an arbitrary function f(x_1 x_2 ... x_n) of n variables, given circuits for two functions of n − 1 variables.
Proposition 18 If L ∈ DTIME(t(n)), then L ∈ SIZE(O(t^2(n))).
Proof: Let L be a decision problem solved by a machine M in time t(n). Fix n and x s.t. |x| = n, and consider the t(n) × t(n) tableau of the computation of M(x). See Figure 4.4.
Assume that each entry (a, q) of the tableau is encoded using k bits. By Proposition 17, the transition function {0,1}^{3k} → {0,1}^k used by the machine can be implemented by a "next state circuit" of size k · O(2^{3k}), which is exponential in k but constant in n. This building block can be used to create a circuit of size O(t^2(n)) that computes the complete tableau, and thus also the answer to the decision problem. This is shown in Figure 4.5. □
Corollary 19 P ⊆ SIZE(n^{O(1)}).
On the other hand, it is easy to show that P ≠ SIZE(n^{O(1)}).
Proposition 20 There are languages L such that L ∉ SIZE(2^{o(n)}). In other words, for every n, there exists f : {0,1}^n → {0,1} that cannot be computed by a circuit of size 2^{o(n)}.
Proof: This is a counting argument. There are 2^{2^n} functions f : {0,1}^n → {0,1}, and we claim that the number of circuits of size s is at most 2^{O(s log s)}, assuming s ≥ n. To bound the number of circuits of size s we create a compact binary encoding of such circuits. Identify gates with numbers 1, ..., s. For each gate, specify where its two inputs come from, whether they are complemented, and the type of the gate. The total number of bits required to represent the circuit is

   s · (2 log(n + s) + 3) ≤ s · (2 log 2s + 3) = s · (2 log s + 5).
Figure 4.4: The t(n) × t(n) tableau of a computation. The left entry of each cell is the tape symbol at that position and time. The right entry is the machine state or a blank symbol, depending on the position of the machine head.
Figure 4.5: Circuit simulating a Turing machine computation by constructing the tableau.
So the number of circuits of size s is at most 2^{2s log s + 5s}, and this is not sufficient to compute all possible functions if

   2^{2s log s + 5s} < 2^{2^n}.

This is satisfied if s = 2^{o(n)}. □
Theorem 21 (Karp-Lipton-Sipser) If NP ⊆ SIZE(n^{O(1)}) then PH = Σ_2. In other words, the polynomial hierarchy would collapse to its second level.
Before proving the above theorem, we first show a result that contains some of the ideas in the proof of the Karp-Lipton-Sipser theorem.
Lemma 22 If NP ⊆ SIZE(n^{O(1)}) then there is a family of polynomial-size circuits that, on input a 3CNF formula φ, output a satisfying assignment for φ if one such assignment exists, and a sequence of zeroes otherwise.
Proof: We define the circuits C^1_n, . . . , C^n_n as follows:
• C^1_n, on input a formula φ over n variables, outputs 1 if and only if there is a satisfying assignment for φ where x_1 = 1;
• C^i_n, on input a formula φ over n variables and bits b_1, . . . , b_{i−1}, outputs 1 if and only if there is a satisfying assignment for φ where x_1 = b_1, . . . , x_{i−1} = b_{i−1}, x_i = 1;
• C^n_n, on input a formula φ over n variables and bits b_1, . . . , b_{n−1}, outputs 1 if and only if φ is satisfied by the assignment x_1 = b_1, . . . , x_{n−1} = b_{n−1}, x_n = 1.
Each circuit realizes an NP computation, and so, under the hypothesis of the lemma, it can be built with polynomial size. Consider now the sequence b_1 = C^1_n(φ), b_2 = C^2_n(b_1, φ), . . . , b_n = C^n_n(b_1, . . . , b_{n−1}, φ). The reader should be able to convince himself that this is a satisfying assignment for φ if φ is satisfiable, and a sequence of zeroes otherwise. □
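The bit-by-bit self-reduction used in the proof can be sketched in code. Below is a minimal Python sketch in which a brute-force decision procedure stands in for the polynomial-size circuits C^i_n; the names `satisfiable` and `find_assignment` are ours, not from the notes.

```python
from itertools import product

# A CNF formula is a list of clauses; each clause is a list of signed
# variable indices (positive = x_i, negative = not x_i), 1-based.
def satisfiable(cnf, n, fixed):
    """Decision oracle: is there a satisfying assignment extending `fixed`?
    (Brute force here; in the lemma this is a polynomial-size circuit.)"""
    for tail in product([0, 1], repeat=n - len(fixed)):
        a = list(fixed) + list(tail)
        if all(any((a[abs(l) - 1] == 1) == (l > 0) for l in c) for c in cnf):
            return True
    return False

def find_assignment(cnf, n):
    """Bit-by-bit self-reduction: b_i = C^i_n(b_1, ..., b_{i-1}, phi)."""
    bits = []
    for i in range(n):
        # "Is there a satisfying assignment extending bits with x_{i+1} = 1?"
        bits.append(1 if satisfiable(cnf, n, bits + [1]) else 0)
    return bits
```

For example, on (x_1 ∨ x_2) ∧ (¬x_1) the procedure returns the assignment x_1 = 0, x_2 = 1, and on an unsatisfiable formula it returns all zeroes, as in the lemma.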
We now prove the Karp-Lipton-Sipser theorem.
Proof: [Of Theorem 21] We will show that if NP ⊆ SIZE(n^{O(1)}) then Π_2 ⊆ Σ_2. By a result in a previous lecture, this implies that PH = Σ_2.
Let L ∈ Π_2; then there is a polynomial p() and a polynomial-time computable V() such that

x ∈ L iff ∀y_1. |y_1| ≤ p(|x|). ∃y_2. |y_2| ≤ p(|x|). V(x, y_1, y_2) = 1

By adapting the proof of Lemma 22 (see Figure 4.6), or by using the statement of the Lemma and Cook's theorem, we can show that, for every n, there is a circuit C_n of size polynomial in n such that for every x and every y_1, |y_1| ≤ p(|x|),
Figure 4.6: How to use decision problem solvers to find a witness to a search problem.
∃y_2. |y_2| ≤ p(|x|) ∧ V(x, y_1, y_2) = 1   iff   V(x, y_1, C_n(x, y_1)) = 1

Let q(n) be a polynomial upper bound to the size of C_n. So now we have that, for inputs x of length n,

x ∈ L iff ∃C_n. |C_n| ≤ q(n). ∀y_1. |y_1| ≤ p(n). V(x, y_1, C_n(x, y_1)) = 1

which shows that L is in Σ_2. □
4.3 References
The Karp-Lipton-Sipser theorem appears in [KL80].
Exercises
1. Show that SIZE(n^{O(1)}) ⊄ P.
2. Show that there is a language in SPACE(2^{n^{O(1)}}) that does not belong to SIZE(2^{o(n)}).
3. Define EXP = DTIME(2^{n^{O(1)}}). Prove that if EXP ⊆ SIZE(n^{O(1)}) then EXP = Σ_2.
Lecture 5
Probabilistic Complexity Classes
In this lecture we will define the probabilistic complexity classes BPP, RP and ZPP, and we will see how they are related to each other, as well as to other deterministic and circuit complexity classes.
5.1 Probabilistic complexity classes
First we are going to describe the probabilistic model of computation. In this model an algorithm A gets as input a sequence of random bits r and the "real" input x of the problem. The output of the algorithm is the correct answer for the input x with some probability.
Definition 5 An algorithm A is called a polynomial time probabilistic algorithm if the size of the random sequence |r| is polynomial in the input size |x| and A() runs in time polynomial in |x|.
If we want to talk about the correctness of the algorithm, then informally we could say that for every input x we need Pr[A(x, r) = correct answer for x] ≥ 2/3. That is, for every input, the probability of a correct answer, over all random sequences, must be some constant bounded away from 1/2. Let us now define the class BPP.
Definition 6 A decision problem L is in BPP if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 2/3
∀x ∉ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≤ 1/3

We can see that in this setting we have an algorithm with two inputs and some constraints on the probabilities of the outcome. In the same way we can also define the class P as:
Definition 7 A decision problem L is in P if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 1
∀x ∉ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0
Similarly, we define the classes RP and ZPP.
Definition 8 A decision problem L is in RP if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 1/2
∀x ∉ L:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0

Definition 9 A decision problem L is in ZPP if there is a polynomial time algorithm A, whose output can be 0, 1 or ?, and a polynomial p() such that:

∀x:  Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = ?] ≤ 1/2
∀x, ∀r:  if A(x, r) ≠ ? then A(x, r) = 1 if and only if x ∈ L
5.2 Relations between complexity classes
After deﬁning these probabilistic complexity classes, let’s see how they are related to other
complexity classes and with each other.
Theorem 23 RP ⊆ NP.
Proof: Let A be an RP algorithm for a language L. Then the random sequence acts as an NP witness: if x ∈ L, there is a random sequence r for which the algorithm answers 1, and a nondeterministic machine can guess it; on the other hand, if x ∉ L, then there is no such witness. □
Most natural probabilistic algorithms belong to the class RP. We can also show that
the class ZPP is no larger than RP.
Theorem 24 ZPP ⊆ RP.
Proof: We are going to convert a ZPP algorithm into an RP algorithm. The construction consists of running the ZPP algorithm and, any time it outputs ?, answering 0. In this way, if the right answer is 0, then the algorithm answers 0 with probability 1. On the other hand, when the right answer is 1, the algorithm gives the wrong answer with probability at most 1/2, since the probability of the ZPP algorithm giving the output ? is at most 1/2. □
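The conversion in this proof is literally a one-line wrapper. Here is a Python sketch, where `zpp_algorithm` is any function of (x, r) returning 0, 1 or '?'; the names are ours, for illustration only.

```python
def zpp_to_rp(zpp_algorithm):
    """Wrap a ZPP algorithm (outputs 0, 1 or '?') into an RP algorithm:
    whenever the ZPP algorithm is undecided, answer 0.  A no-instance is
    then never accepted; a yes-instance is accepted whenever the ZPP
    algorithm is decided, i.e. with probability at least 1/2."""
    def rp_algorithm(x, r):
        answer = zpp_algorithm(x, r)
        return 0 if answer == '?' else answer
    return rp_algorithm
```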
Another interesting property of the class ZPP is that it’s equivalent to the class of
languages for which there is an average polynomial time algorithm that always gives the
right answer. More formally,
Theorem 25 A language L is in the class ZPP if and only if L has an average polynomial time algorithm that always gives the right answer.
Proof: First let us clarify what we mean by average time. For each input x we take the average time of A(x, r) over all random sequences r; then, for size n, we take the worst such time over all possible inputs x of size |x| = n. In order to construct an algorithm that always gives the right answer, we run the ZPP algorithm and, if it outputs ?, we run it again. Suppose that the running time of the ZPP algorithm is T; then the average running time of the new algorithm is:

T_avg = (1/2)·T + (1/4)·2T + · · · + (1/2^k)·kT = O(T)

Now, we want to prove that if the language L has an algorithm that runs in polynomial average time t(|x|), then L is in ZPP. What we do is run the algorithm for time 2t(|x|) and output ? if the algorithm has not yet stopped. It is straightforward to see that this is a ZPP algorithm. First of all, the worst-case running time is polynomial, actually 2t(|x|). Moreover, the probability that our algorithm outputs ? is at most 1/2, since the original algorithm has an average running time of t(|x|) and so, by Markov's inequality, it must stop within time 2t(|x|) at least half of the time. □
Let us now prove that RP is contained in BPP.
Theorem 26 RP ⊆ BPP.
Proof: We will convert an RP algorithm into a BPP algorithm. In the case that the input x does not belong to the language, the RP algorithm always gives the right answer, so it meets the BPP requirement as well. In the case that the input x does belong to the language, we need to boost the probability of a correct answer from at least 1/2 to at least 2/3.
More formally, let A be an RP algorithm for a decision problem L. We fix some number k and define the following algorithm:

A^(k), on input x:
  pick r_1, r_2, . . . , r_k
  if A(x, r_1) = A(x, r_2) = · · · = A(x, r_k) = 0 then return 0
  else return 1

Let us now consider the correctness of the algorithm. In case the correct answer is 0, the output is always right; in the case where the right answer is 1, the output is right except when all A(x, r_i) = 0.
if x ∉ L:  Pr_{r_1,...,r_k}[A^(k)(x, r_1, . . . , r_k) = 1] = 0
if x ∈ L:  Pr_{r_1,...,r_k}[A^(k)(x, r_1, . . . , r_k) = 1] ≥ 1 − 1/2^k

It is easy to see that by choosing an appropriate k the second probability can be made arbitrarily close to 1, and in particular larger than 2/3, as required by the definition of BPP. In fact, by choosing k to be a polynomial in |x|, we can make the probability exponentially close to 1. This enables us to change the definition of RP: instead of the bound of 1/2 on the probability of a correct answer when the input is in the language L, we can have a bound of 1 − 2^{−q(|x|)}, for any fixed polynomial q. □
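The amplified algorithm A^(k) and its one-sided error can be checked exactly on a toy example by enumerating all k-tuples of random strings. A Python sketch follows; the function names and the toy algorithm `A` (which succeeds on exactly half the random strings of a yes-instance) are our illustrative assumptions.

```python
from itertools import product

def amplify_rp(A):
    """A^(k): run A on each of the k random strings in rs; accept iff some run accepts."""
    def A_k(x, rs):                      # rs = (r_1, ..., r_k)
        return 1 if any(A(x, r) for r in rs) else 0
    return A_k

def yes_instance_error(A_k, x, k, r_bits):
    """Exact Pr[A^(k)(x, r_1, ..., r_k) = 0], computed by enumerating
    all k-tuples of r_bits-bit random strings."""
    strings = list(product([0, 1], repeat=r_bits))
    bad = sum(1 for rs in product(strings, repeat=k) if A_k(x, rs) == 0)
    return bad / len(strings) ** k
```

With a per-run success probability of 1/2 the error of A^(k) is exactly 1/2^k, matching the bound in the proof.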
Now let A be a BPP algorithm for a decision problem L. Again, we fix k and define the following algorithm:

A^(k), on input x:
  pick r_1, r_2, . . . , r_k
  c = Σ_{i=1}^{k} A(x, r_i)
  if c ≥ k/2 then return 1
  else return 0

In a BPP algorithm we expect the right answer to come up with probability bounded away from 1/2. So, by running the algorithm many times, we make sure that this advantage actually shows up in the results. To make this precise, we state the Chernoff bounds.
Theorem 27 (Chernoff Bound) Suppose X_1, . . . , X_k are independent random variables with values in {0,1} and, for every i, Pr[X_i = 1] = p. Then, for every ε > 0:

Pr[(1/k)·Σ_{i=1}^{k} X_i − p > ε] < e^{−ε^2 k / (2p(1−p))}

Pr[(1/k)·Σ_{i=1}^{k} X_i − p < −ε] < e^{−ε^2 k / (2p(1−p))}
The Chernoff bounds enable us to bound the probability that our result is far from the expected value: this probability is exponentially small in k.
Let us now see how the Chernoff bounds apply to the algorithm we described previously. We fix the input x and let p = Pr_r[A(x, r) = 1] over all possible random sequences. We also define the independent random variables X_1, . . . , X_k, where X_i = A(x, r_i).
First, suppose x ∈ L. Then the algorithm A^(k)(x, r_1, . . . , r_k) outputs the right answer 1 when (1/k)·Σ_i X_i ≥ 1/2, so it makes a mistake when (1/k)·Σ_i X_i < 1/2. We now apply the Chernoff bounds to bound this probability.

Pr[A^(k) outputs the wrong answer on x] = Pr[(1/k)·Σ_i X_i < 1/2]
  ≤ Pr[(1/k)·Σ_i X_i − p ≤ −1/6]     (since p ≥ 2/3)
  ≤ e^{−k/(72·p(1−p))}
  = 2^{−Ω(k)}

The probability of error is exponentially small in k. The same reasoning applies for the case x ∉ L. Further, it is easy to see that by choosing k to be a polynomial in |x| instead of a constant, we can change the definition of a BPP algorithm: instead of the bound of 1/3 on the probability of a wrong answer, we can have a bound of 2^{−q(|x|)}, for any fixed polynomial q.
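The error of the majority vote analysed above can also be computed exactly as a binomial tail, without appealing to the Chernoff bound, which makes the exponential decay easy to check numerically. A small Python sketch (the function name is ours):

```python
from math import comb

def majority_error(p, k):
    """Exact probability that fewer than k/2 of k independent runs, each
    correct with probability p, give the right answer -- the event in
    which the majority vote A^(k) errs."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k + 1) if i < k / 2)
```

For p = 2/3 the error drops from 1/3 at k = 1 toward 0 as k grows, in line with the 2^{−Ω(k)} bound.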
Next, we are going to see how the probabilistic complexity classes relate to circuit complexity classes, and specifically prove that the class BPP has polynomial size circuits.
Theorem 28 (Adleman) BPP ⊆ SIZE(n^{O(1)})
Proof: Let L be in the class BPP. Then, by definition, there are a polynomial time algorithm A and a polynomial p such that for every input x of length n

Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = wrong answer for x] ≤ 2^{−(n+1)}

This follows from our previous conclusion that we can replace 1/3 with 2^{−q(|x|)}. We now fix n and construct a circuit C_n that solves L on inputs of length n.
Claim 29 There is a random sequence r ∈ {0,1}^{p(n)} such that for every x ∈ {0,1}^n, A(x, r) is correct.
Proof: Informally, we can see that for each input x the number of random sequences r that give the wrong answer is exponentially small. Therefore, even if we assume that these bad sequences are different for every input x, their total number is still less than the total number of random sequences. Formally, let us consider the probability, over all sequences, that the algorithm gives the right answer for every input. If this probability is greater than 0, then the claim is proved.

Pr_r[for every x, A(x, r) is correct] = 1 − Pr_r[∃x : A(x, r) is wrong]

The second probability is the union of 2^n possible events, one for each x, and it is bounded by the sum of their probabilities:

  ≥ 1 − Σ_{x∈{0,1}^n} Pr_r[A(x, r) is wrong]
  ≥ 1 − 2^n · 2^{−(n+1)}
  ≥ 1/2   □

So we have proved that at least half of the random sequences are correct for all possible inputs x. Therefore, it is straightforward to see that we can simulate the algorithm A(·, ·), where the first input has length n and the second p(n), by a circuit of size polynomial in n. All we have to do is find a random sequence which is correct for every input and hard-wire it into the circuit: the circuit takes as input only x and simulates A on inputs x and r, for this fixed r. Of course, this is only an existential proof, since we don't know how to find this sequence efficiently. □
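The existential argument can be made concrete on toy parameters: if the per-input error probability of A is at most 2^{−(n+1)}, an exhaustive (exponential-time) search must find a string r that is correct for all 2^n inputs. A Python sketch, with names of our choosing:

```python
from itertools import product

def find_universal_r(A, correct, n, r_bits):
    """Search for a random string r making A correct on every input of
    length n.  Claim 29 guarantees one exists when A's error probability
    is at most 2^{-(n+1)}; the search itself is exponential, matching the
    purely existential nature of the proof."""
    for r in product([0, 1], repeat=r_bits):
        if all(A(x, r) == correct(x) for x in product([0, 1], repeat=n)):
            return r
    return None
```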
In conclusion, let us briefly describe some other relations between complexity classes. Whether BPP ⊆ NP or not is still an open question. What we know is that it is unlikely that NP is contained in BPP: by the previous result, NP would then have polynomial size circuits, and hence, by the result of Karp and Lipton, the polynomial hierarchy would collapse.
5.3 BPP ⊆ Σ_2
This result was first shown by Sipser and Gács. Lautemann gave a much simpler proof, which we give below.
Lemma 30 If L is in BPP then there is an algorithm A such that for every x,

Pr_r[A(x, r) = right answer] ≥ 1 − 1/(3m),

where the number of random bits |r| = m = |x|^{O(1)} and A runs in time |x|^{O(1)}.
Proof: Let Â be a BPP algorithm for L. Then for every x, Pr_r[Â(x, r) = wrong answer] ≤ 1/3, and Â uses m̂(n) random bits, where n = |x|.
Do k(n) repetitions of Â and accept if and only if at least k(n)/2 executions of Â accept. Call the new algorithm A. Then A uses k(n)·m̂(n) random bits and Pr_r[A(x, r) = wrong answer] ≤ 2^{−ck(n)}. We can then find k(n), with k(n) = Θ(log m̂(n)), such that 2^{−ck(n)} ≤ 1/(3k(n)m̂(n)). □
Theorem 31 BPP ⊆ Σ_2.
Proof: Let L be in BPP and let A be as in the lemma. Then we want to show that

x ∈ L ⇔ ∃y_1, . . . , y_m ∈ {0,1}^m. ∀z ∈ {0,1}^m. ∨_{i=1}^{m} A(x, y_i ⊕ z) = 1

where m is the number of random bits used by A on input x.
Suppose x ∈ L. Then

Pr_{y_1,...,y_m}[∃z. A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0]
  ≤ Σ_{z∈{0,1}^m} Pr_{y_1,...,y_m}[A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0]
  ≤ 2^m · (1/(3m))^m
  < 1.
So

Pr_{y_1,...,y_m}[∀z. ∨_i A(x, y_i ⊕ z) = 1] = 1 − Pr_{y_1,...,y_m}[∃z. A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0] > 0.

So (y_1, . . . , y_m) exists.
Conversely, suppose x ∉ L. Then, for every fixed y_1, . . . , y_m,

Pr_z[∨_i A(x, y_i ⊕ z) = 1] ≤ Σ_i Pr_z[A(x, y_i ⊕ z) = 1] ≤ m · 1/(3m) = 1/3.

So

Pr_z[A(x, y_1 ⊕ z) = · · · = A(x, y_m ⊕ z) = 0] = 1 − Pr_z[∨_i A(x, y_i ⊕ z) = 1] ≥ 2/3 > 0.

So there is a z such that ∨_i A(x, y_i ⊕ z) = 0, for every choice of y_1, . . . , y_m ∈ {0,1}^m. □
5.4 References
Probabilistic complexity classes were defined in [Gil77]. Adleman's proof that BPP ⊆ SIZE(n^{O(1)}) appears in [Adl78]. Sipser's proof that BPP ⊆ Σ_2 appears in [Sip83], and Lautemann's proof is in [Lau83].
Exercises
1. Prove that ZPP = RP ∩ coRP.
2. Show that if NP ⊆ BPP then NP = RP.
3. Argue that SPACE(O(n^{log n})) ⊄ BPP.
Lecture 6
A Probabilistic Algorithm
6.1 Branching Programs
We begin by introducing another model of computation, that of branching programs. A branching program accepts or rejects input strings x_1, . . . , x_n, and can be described as a directed acyclic graph with a single root and two final states, 0 and 1. The computation path begins at the root. Each non-final vertex is labeled with some single variable x_i and has two outgoing edges, one labeled 0 and the other 1. The path follows the edge labeled b such that x_i = b, and the program accepts the input if and only if the path ends at the final state 1. An illustrative example is the branching program that reads input string x_1 x_2 x_3 and outputs 1 if x_1 ⊕ x_2 ⊕ x_3 = 1. This is shown in Figure 6.1.
Branching programs are a reasonably powerful model of computation; for example, one can show that L (logarithmic space) can be computed by branching programs of polynomial size.
Figure 6.1: A branching program.
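A branching program of this kind is easy to simulate. Here is a Python sketch of the evaluation rule, together with a read-once program for x_1 ⊕ x_2 ⊕ x_3 built in layers that track the parity read so far; the dict encoding and vertex names are our choices, not from the notes.

```python
# A branching program as a dict: vertex -> (variable index, 0-edge, 1-edge);
# the strings '0' and '1' are the two final states.
def run_bp(bp, root, x):
    """Follow the computation path from the root on input x."""
    v = root
    while v not in ('0', '1'):
        var, lo, hi = bp[v]
        v = hi if x[var] else lo
    return int(v)

# Read-once BP for x1 XOR x2 XOR x3: one layer per variable,
# with one vertex per parity value accumulated so far.
xor3 = {
    'a':  (0, 'b0', 'b1'),   # read x1
    'b0': (1, 'c0', 'c1'),   # parity so far = 0
    'b1': (1, 'c1', 'c0'),   # parity so far = 1
    'c0': (2, '0', '1'),     # read x3, parity 0
    'c1': (2, '1', '0'),     # read x3, parity 1
}
```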
6.2 Testing Equivalence
We would like to be able to consider two branching programs and determine if they compute
the same function. In general, testing equivalence for any model of computation is diﬃcult:
for Turing machines, the problem is undecidable, and for circuits, the problem is coNP
complete. A similar result is true for branching programs– here, testing equivalence is also
coNPcomplete, through a reduction from co3SAT.
Happily, the problem becomes easier once we restrict it to readonce branching programs,
in which every path from the root to a ﬁnal state tests each variable at most once. We will
show that in this case testing equivalence is in coRP. Our strategy will be as follows: Given
two readonce branching programs B
1
and B
2
, we will deﬁne two polynomials p
1
and p
2
of
degree n over x
1
, . . . , x
n
that are equivalent if and only if B
1
and B
2
are also equivalent.
We will then use a coRPalgorithm to test the equivalence of the two polynomials. The
polynomials will be represented in compact form, but will have an exponential number of
monomials and thus we cannot simply directly compare their coeﬃcients.
Our polynomials are determined as follows:
• We assign each vertex v_i in the branching program, from the root to the final states, a polynomial p_i(x_1, . . . , x_n), beginning with the root vertex, which gets the constant polynomial 1.
• Once a vertex has been assigned a polynomial p(x_1, . . . , x_n), its outgoing edges are given polynomials in the following manner: if the vertex performs a test on x_j, the outgoing 0-edge receives the polynomial (1 − x_j)·p(x_1, . . . , x_n) and the outgoing 1-edge receives x_j·p(x_1, . . . , x_n).
• Once all of a vertex's incoming edges have been appropriately labeled, that vertex's polynomial becomes the sum of the polynomials of its incoming edges.
• The polynomial associated with the final state 1 is the polynomial of the branching program.
These rules are illustrated in Figure 6.2. The idea is that the polynomial at each vertex is nonzero on an input only if that input leads the branching program to that vertex.
Figure 6.2: Constructing the polynomial of a branching program.
We can see that when the branching program is read-once, every variable appears in any monomial of the resulting polynomial with power at most 1, and hence the polynomial has degree at most n.
We now observe that, given read-once branching programs B_1 and B_2, their corresponding polynomials p_1 and p_2 are equivalent if and only if B_1 and B_2 are equivalent. To see this, if any monomial m in p_1 or p_2 does not contain a variable x_i, we replace m with m·x_i + m·(1 − x_i). We can then write p_1 and p_2 as sums of terms of the form Π_{i=1}^{n} y_i, where each y_i is either x_i or (1 − x_i). Each term corresponds to an input whose path in the branching program ends in final state 1. From this, some extra thought reveals that equivalence of the polynomials is equivalent to equivalence of the branching programs.
Notice also that if the branching programs have polynomial size, we can evaluate their polynomials at a given point efficiently, even though they may contain an exponential number of monomials. All we need now is a method of testing equivalence of polynomials efficiently.
6.3 The Schwartz-Zippel Lemma
The tool we will use for this is the Schwartz-Zippel lemma, which states:
Lemma 32 If p(x_1, . . . , x_n) is an n-variate nonzero polynomial of degree d over a finite field F, then p has at most d·F^{n−1} roots. Equivalently, for random a_1, . . . , a_n ∈ F, Pr[p(a_1, . . . , a_n) = 0] ≤ d/F.
Notice that we use F to denote both the field and its size.
Proof: The proof proceeds by induction on n. The base case n = 1 is simply that of a univariate polynomial, and follows immediately.
Now consider p(x_1, . . . , x_n) of degree at most d. We can write

p(x_1, . . . , x_n) = p_0(x_2, . . . , x_n) + x_1·p_1(x_2, . . . , x_n) + x_1^2·p_2(x_2, . . . , x_n) + · · · + x_1^k·p_k(x_2, . . . , x_n)

where k is the largest value for which p_k is not identically zero; note that p_k has degree at most d − k.
We now observe that

|{(a_1, . . . , a_n) : p(a_1, . . . , a_n) ≠ 0}| ≥ |{(a_1, . . . , a_n) : p(a_1, . . . , a_n) ≠ 0 and p_k(a_2, . . . , a_n) ≠ 0}|

By induction, p_k is nonzero on at least F^{n−1} − (d − k)·F^{n−2} points (a_2, . . . , a_n). Further, for each such point, p(x_1, a_2, . . . , a_n) is a nonzero univariate polynomial of degree k, and hence there are at least F − k values of a_1 such that p(a_1, . . . , a_n) ≠ 0.
Putting this together, we can bound the number of nonzero points from below by

F^{n−1}·(1 − (d − k)/F) · F·(1 − k/F) ≥ F^n·(1 − d/F) = F^n − d·F^{n−1}

This finishes the proof. □
An algorithm for testing equivalence of polynomials, and hence of read-once branching programs, is now immediate. We choose a field of size at least 3d (here the degree d is at most n), and then choose random a_1, . . . , a_n from F. We accept if p_1(a_1, . . . , a_n) = p_2(a_1, . . . , a_n).
If the two polynomials are equivalent, then we always accept; otherwise, by the Schwartz-Zippel lemma, we reject with probability at least 2/3, making this a coRP algorithm.
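The whole test fits in a few lines of Python: the polynomial of a branching program is evaluated at a point by propagating values along edges exactly as in the rules of Section 6.2, and equivalence is tested at random points of Z_q. The dict encoding and function names are our choices, and the sketch assumes the vertex dictionary lists vertices in topological order.

```python
import random

def bp_poly_value(bp, root, a, q):
    """Evaluate the polynomial of a branching program at point a over the
    field Z_q (q prime): the root gets 1, a 0-edge multiplies by (1 - x_j),
    a 1-edge by x_j, and each vertex sums its incoming edge values.
    Assumes the dict lists vertices in topological order."""
    val = {v: 0 for v in bp}
    val.update({'0': 0, '1': 0})
    val[root] = 1
    for v, (j, lo, hi) in bp.items():
        val[lo] = (val[lo] + (1 - a[j]) * val[v]) % q
        val[hi] = (val[hi] + a[j] * val[v]) % q
    return val['1']

def probably_equivalent(bp1, r1, bp2, r2, n, q, trials=10):
    """coRP-style test: accept iff the polynomials agree at random points."""
    for _ in range(trials):
        a = [random.randrange(q) for _ in range(n)]
        if bp_poly_value(bp1, r1, a, q) != bp_poly_value(bp2, r2, a, q):
            return False
    return True
```

At a Boolean point the polynomial value coincides with the output of the branching program, which gives a handy sanity check.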
6.4 References
The Schwartz-Zippel lemma is from [Sch80, Zip79]. The algorithm for testing equivalence of read-once branching programs is due to Blum, Chandra and Wegman [BCW80].
Exercises
1. Prove that testing equivalence of general branching programs is coNP-complete.
2. Show that every Boolean function f : {0,1}^n → {0,1} can be computed by a read-once branching program (possibly of exponential size).
Lecture 7
Unique-SAT
7.1 The Valiant-Vazirani Reduction
In this section we show the following: suppose there is an algorithm for the satisfiability problem that always finds a satisfying assignment for formulae that have exactly one satisfying assignment (and behaves arbitrarily on other instances); then we can get an RP algorithm for the general satisfiability problem, and so NP = RP.
We prove the result by presenting a randomized reduction that, given in input a CNF formula φ, produces in output a polynomial number of formulae ψ_0, . . . , ψ_n. If φ is satisfiable, then (with high probability) at least one of the ψ_i is satisfiable and has exactly one satisfying assignment; if φ is not satisfiable, then (with probability one) all the ψ_i are unsatisfiable.
The idea for the reduction is the following. Suppose φ is a satisfiable formula with n variables that has about 2^k satisfying assignments, and let h : {0,1}^n → {0,1}^k be a hash function picked from a family of pairwise independent hash functions: then the average number of assignments x such that φ(x) is true and h(x) = (0, . . . , 0) is about one. Indeed, we can prove formally that with constant probability there is exactly one such assignment,¹ and that there is a CNF formula ψ (easily constructed from φ and h) that is satisfied precisely by that assignment. By doing the above construction for values of k ranging from 0 to n, we obtain the desired reduction. Details follow.
Definition 10 Let H be a family of functions of the form h : {0,1}^n → {0,1}^m. We say that H is a family of pairwise independent hash functions if for every two different inputs x, y ∈ {0,1}^n and for every two possible outputs a, b ∈ {0,1}^m we have

Pr_{h∈H}[h(x) = a ∧ h(y) = b] = 1/2^{2m}

Another way to look at the definition is that, for every x ≠ y, when we pick h at random the random variables h(x) and h(y) are independent and uniformly distributed. In particular, for every x ≠ y and for every a, b we have Pr_h[h(x) = a | h(y) = b] = Pr_h[h(x) = a].
¹For technical reasons, it will be easier to prove that this is the case when picking a hash function h : {0,1}^n → {0,1}^{k+2}.
For m vectors a_1, . . . , a_m ∈ {0,1}^n and m bits b_1, . . . , b_m, define h_{a_1,...,a_m,b_1,...,b_m} : {0,1}^n → {0,1}^m as h_{a,b}(x) = (a_1 · x + b_1, . . . , a_m · x + b_m), where the inner products and sums are modulo 2, and let H_AFF be the family of functions defined this way. Then it is not hard to see that H_AFF is a family of pairwise independent hash functions.
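For toy parameters the family H_AFF is small enough that the condition of Definition 10 can be verified by exhaustive enumeration. A Python sketch (function names are ours):

```python
from itertools import product

def h_affine(a_rows, b_bits, x):
    """h_{a,b}(x) = (a_1.x + b_1, ..., a_m.x + b_m), all arithmetic mod 2."""
    return tuple((sum(ai * xi for ai, xi in zip(row, x)) + b) % 2
                 for row, b in zip(a_rows, b_bits))

def is_pairwise_independent(n, m):
    """Check Definition 10 for H_AFF by enumerating the whole family
    (feasible only for tiny n and m)."""
    inputs = list(product([0, 1], repeat=n))
    family = list(product(product(product([0, 1], repeat=n), repeat=m),
                          product([0, 1], repeat=m)))
    for x in inputs:
        for y in inputs:
            if x == y:
                continue
            for a_out, b_out in product(product([0, 1], repeat=m), repeat=2):
                hits = sum(1 for rows, bits in family
                           if h_affine(rows, bits, x) == a_out
                           and h_affine(rows, bits, y) == b_out)
                # Each output pair must occur for a 1/2^{2m} fraction of H.
                if hits * 2 ** (2 * m) != len(family):
                    return False
    return True
```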
Lemma 33 Let T ⊆ {0,1}^n be a set such that 2^k ≤ |T| < 2^{k+1}, and let H be a family of pairwise independent hash functions of the form h : {0,1}^n → {0,1}^{k+2}. Then, if we pick h at random from H, there is constant probability that there is a unique element x ∈ T such that h(x) = 0. Precisely,

Pr_{h∈H}[ |{x ∈ T : h(x) = 0}| = 1 ] ≥ 1/8
Proof: Let us fix an element x ∈ T. We want to compute the probability that x is the unique element of T mapped into 0 by h. Clearly,

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] = Pr_h[h(x) = 0] · Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0]

and we know that

Pr_h[h(x) = 0] = 1/2^{k+2}

The difficult part is to estimate the other probability. First, we write

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] = 1 − Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0]

and then observe that

Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0]
  ≤ Σ_{y∈T−{x}} Pr_h[h(y) = 0 | h(x) = 0]
  = Σ_{y∈T−{x}} Pr_h[h(y) = 0]
  = (|T| − 1)/2^{k+2}
  ≤ 1/2

Notice how we used the fact that the value of h(y) is independent of the value of h(x) when x ≠ y.
Putting everything together, we have

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] ≥ 1/2

and so

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] ≥ 1/2^{k+3}

To conclude the argument, we observe that the probability that there is a unique element of T mapped into 0 is the sum, over x ∈ T, of the probability that x is the unique element mapped into 0 (all these events are disjoint, so the probability of their union is the sum of the probabilities). The probability that there is a unique element mapped into 0 is then at least |T|/2^{k+3} ≥ 1/8. □
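For small parameters the probability in Lemma 33 can be computed exactly by enumerating the whole family H_AFF, and it comes out at least 1/8, as the lemma guarantees. A Python sketch (exponential-time, for illustration only; the function name is ours):

```python
from itertools import product

def isolation_probability(T, n, k):
    """Exact probability, over h drawn from H_AFF with output length
    m = k + 2, that exactly one element of T is mapped to the all-zero
    string, computed by brute-force enumeration of the family."""
    m = k + 2
    rows_choices = list(product(product([0, 1], repeat=n), repeat=m))
    bits_choices = list(product([0, 1], repeat=m))
    good = 0
    for rows in rows_choices:
        for bits in bits_choices:
            zeros = sum(1 for x in T
                        if all((sum(r * xi for r, xi in zip(row, x)) + b) % 2 == 0
                               for row, b in zip(rows, bits)))
            if zeros == 1:
                good += 1
    return good / (len(rows_choices) * len(bits_choices))
```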
Lemma 34 There is a probabilistic polynomial time algorithm that, on input a CNF formula φ and an integer k, outputs a formula ψ such that:
• If φ is unsatisfiable then ψ is unsatisfiable.
• If φ has at least 2^k and less than 2^{k+1} satisfying assignments, then with probability at least 1/8 the formula ψ has exactly one satisfying assignment.
Proof: Say that φ is a formula over n variables. The algorithm picks at random vectors a_1, . . . , a_{k+2} ∈ {0,1}^n and bits b_1, . . . , b_{k+2}, and produces a formula ψ that is equivalent to the expression φ(x) ∧ (a_1·x + b_1 = 0) ∧ · · · ∧ (a_{k+2}·x + b_{k+2} = 0). There is no compact CNF expression to compute a·x if a has a lot of ones, but we can proceed as follows: for each i we add auxiliary variables y^i_1, . . . , y^i_n and write a CNF condition equivalent to (y^i_1 = x_1 ∧ a_i[1]) ∧ · · · ∧ (y^i_n = y^i_{n−1} ⊕ ((x_n ∧ a_i[n]) ⊕ b_i)), so that y^i_n holds the value a_i·x + b_i, and we require y^i_n = 0. Then ψ is the AND of the clauses of φ plus all the above expressions for i = 1, 2, . . . , k + 2.
By construction, the number of satisfying assignments of ψ is equal to the number of satisfying assignments x of φ such that h_{a_1,...,a_{k+2},b_1,...,b_{k+2}}(x) = 0. If φ is unsatisfiable then, for every possible choice of the a_i and b_i, ψ is also unsatisfiable. If φ has at least 2^k and less than 2^{k+1} satisfying assignments, then Lemma 33 implies that with probability at least 1/8 there is exactly one satisfying assignment for ψ. □
Theorem 35 (Valiant-Vazirani) Suppose there is a polynomial time algorithm that, on input a CNF formula having exactly one satisfying assignment, finds that assignment. (We make no assumption on the behaviour of the algorithm on other inputs.) Then NP = RP.
Proof: It is enough to show that, under the assumption of the theorem, 3SAT has an RP algorithm.
On input a formula φ, we construct formulae ψ_0, . . . , ψ_n by using the algorithm of Lemma 34 with parameters k = 0, . . . , n. We submit all formulae ψ_0, . . . , ψ_n to the algorithm in the assumption of the theorem, and accept if the algorithm finds a satisfying assignment for at least one of the formulae. If φ is unsatisfiable, then all the formulae are always unsatisfiable, and so the algorithm accepts with probability zero. If φ is satisfiable, then for some k it has at least 2^k and less than 2^{k+1} satisfying assignments, and there is probability at least 1/8 that ψ_k has exactly one satisfying assignment and that the algorithm accepts. If we repeat the above procedure t times, and accept if at least one iteration accepts, then if φ is unsatisfiable we still accept with probability zero; otherwise we accept with probability at least 1 − (7/8)^t, which is more than 1/2 already for t = 6. □
7.2 References
The Valiant-Vazirani result is from [VV86].
Exercises
1. Prove that H_AFF is indeed a family of pairwise independent hash functions.
2. Change the assumption of Theorem 35 to having a probabilistic polynomial time algorithm that, on input a formula with exactly one satisfying assignment, finds that assignment with probability at least 1/2. Prove that it still follows that NP = RP.
Lecture 8
Counting Problems
8.1 Counting Classes
Definition 11 R is an NP-relation if there is a polynomial time algorithm A such that (x, y) ∈ R ⇔ A(x, y) = 1, and there is a polynomial p such that (x, y) ∈ R ⇒ |y| ≤ p(|x|). #R is the problem that, given x, asks how many y satisfy (x, y) ∈ R.
Definition 12 #P is the class of all problems of the form #R, where R is an NP-relation.
Observe that an NP-relation R naturally defines an NP language L_R, where L_R = {x : ∃y. (x, y) ∈ R}, and every NP language can be defined in this way. Therefore problems in #P can always be seen as the problem of counting the number of witnesses for a given instance of an NP problem.
Unlike for decision problems, there is no canonical way to define reductions for counting classes. There are two common definitions.
Definition 13 We say there is a parsimonious reduction from #A to #B (written #A ≤_par #B) if there is a polynomial time transformation f such that for all x, |{y : (x, y) ∈ A}| = |{z : (f(x), z) ∈ B}|.
Often this definition is a little too restrictive, and we use the following definition instead.
Definition 14 #A ≤ #B if there is a polynomial time algorithm for #A given an oracle that solves #B.
#CIRCUITSAT is the problem where, given a circuit, we want to count the number of inputs that make the circuit output 1.
Theorem 36 #CIRCUITSAT is #P-complete under parsimonious reductions.
Proof: Let #R be in #P, and let A and p be as in the definition. Given x, we want to construct a circuit C such that |{z : C(z) = 1}| = |{y : |y| ≤ p(|x|), A(x, y) = 1}|. We construct a circuit Ĉ that on input x, y simulates A(x, y). From earlier arguments we know that this can be done with a circuit of size about the square of the running time of A. Thus Ĉ has size polynomial in the running time of A, and so polynomial in |x|. Then let C(y) = Ĉ(x, y), with x hard-wired. □
Theorem 37 #3SAT is #P-complete.
Proof: We show that there is a parsimonious reduction from #CIRCUITSAT to #3SAT. That is, given a circuit C, we construct a Boolean formula φ such that the number of satisfying assignments for φ is equal to the number of inputs for which C outputs 1. Suppose C has inputs x_1, . . . , x_n and gates 1, . . . , m; then φ has variables x_1, . . . , x_n, g_1, . . . , g_m, where g_i represents the output of gate i. Each gate has two input variables and one output variable, so a gate can be completely described by mimicking its output for each of the 4 possible inputs; thus each gate can be simulated using at most 4 clauses. Finally, a clause forces the output gate to take the value 1. In this way we have reduced C to a formula φ with n + m variables and at most 4m + 1 clauses, so there is a parsimonious reduction from #CIRCUITSAT to #3SAT. □
Notice that if a counting problem #R is #P-complete under parsimonious reductions, then the associated language L_R is NP-complete, because #3SAT ≤_par #R implies 3SAT ≤ L_R. On the other hand, with the less restrictive definition of reducibility, even some counting problems whose decision version is in P are #P-complete. For example, the problem of counting the number of satisfying assignments of a given 2CNF formula and the problem of counting the number of perfect matchings in a given bipartite graph are both #P-complete.
8.2 Complexity of counting problems
We will prove the following theorem:

Theorem 38 For every counting problem #A in #P, there is an algorithm that on input x and ε computes, with high probability, a value v such that

(1 − ε) · #A(x) ≤ v ≤ (1 + ε) · #A(x)

in time polynomial in |x| and in 1/ε, using an oracle for NP.

The theorem says that #P can be approximated in BPP^NP. Note that approximating #3SAT is NP-hard, so to compute such a value we need at least the power of NP; the theorem states that the power of NP plus randomization is sufficient.
Another remark concerns the following result.

Theorem 39 (Toda) For every k, Σ_k ⊆ P^#P.

This implies that #3SAT is Σ_k-hard for every k, i.e., #3SAT lies outside PH unless the hierarchy collapses. Recall that BPP lies inside Σ_2, and hence approximating #3SAT can be done in Σ_3. Therefore, approximating #3SAT cannot be equivalent to computing #3SAT exactly, unless the polynomial hierarchy collapses.
We first make some observations that reduce the proof to an easier statement.
• It is enough to prove the theorem for #3SAT.
If we have an approximation algorithm for #3SAT, we can extend it to any #A in #P using the parsimonious reduction from #A to #3SAT.
• It is enough to give a polynomial time O(1)-approximation for #3SAT.
Suppose we have an algorithm C and a constant c such that

(1/c) · #3SAT(ϕ) ≤ C(ϕ) ≤ c · #3SAT(ϕ).

Given ϕ, we can construct ϕ^k = ϕ_1 ∧ ϕ_2 ∧ ··· ∧ ϕ_k, where the ϕ_i are copies of ϕ on disjoint sets of variables. If ϕ has t satisfying assignments, ϕ^k has t^k satisfying assignments. Then, giving ϕ^k to the algorithm we get

(1/c) · t^k ≤ C(ϕ^k) ≤ c · t^k
(1/c)^{1/k} · t ≤ C(ϕ^k)^{1/k} ≤ c^{1/k} · t.

If c is a constant and k = O(1/ε), then c^{1/k} ≤ 1 + ε.
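A quick numerical check of the amplification argument (an illustrative sketch; the algorithm C is replaced by its worst-case bounds):

```python
import math

def amplified_bounds(t, c, k):
    """Range of C(phi^k)^(1/k) when C is a c-approximation and #phi = t."""
    low = ((1 / c) * t ** k) ** (1 / k)    # most pessimistic answer for phi^k
    high = (c * t ** k) ** (1 / k)         # most optimistic answer for phi^k
    return low, high

t, c, eps = 1000, 2.0, 0.01
k = math.ceil(math.log(c) / math.log(1 + eps))   # makes c^(1/k) <= 1 + eps
low, high = amplified_bounds(t, c, k)
assert (1 - eps) * t <= low and high <= (1 + eps) * t
```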
• For a formula ϕ that has O(1) satisfying assignments, #3SAT(ϕ) can be found in P^NP.
This can be done by iteratively asking the oracle questions of the form: "Are there at least k assignments satisfying this formula?" Notice that these are NP questions, because the algorithm can guess the k assignments and check them.
8.3 An approximate comparison procedure
Consider the following approximate comparison procedure acomp, defined as:

acomp(ϕ, k) =
  YES       if #3SAT(ϕ) ≥ 2^{k+1}
  NO        if #3SAT(ϕ) < 2^k
  whatever  otherwise
Given acomp, we can construct an algorithm that 2-approximates #3SAT as follows:

Input: ϕ
compute:
  acomp(ϕ, 1)
  acomp(ϕ, 2)
  acomp(ϕ, 3)
  ...
  acomp(ϕ, n)
if acomp outputs NO the first time then
  // The value is either 0 or 1.
  // The answer can be checked by one more query to the NP oracle.
  query the oracle and output the exact value
else
  suppose that it outputs YES for t = 1, ..., i − 1 and NO for t = i
  output 2^i
We need to show that this algorithm approximates #3SAT within a factor of 2. If acomp answers NO the first time, the algorithm outputs the right answer because it checks the answer explicitly. Now suppose acomp says YES for all t = 1, 2, ..., i − 1 and says NO for t = i. Since acomp(ϕ, i − 1) outputs YES, #3SAT(ϕ) ≥ 2^{i−1}, and since acomp(ϕ, i) outputs NO, #3SAT(ϕ) < 2^{i+1}. The algorithm outputs a = 2^i. Hence

a/2 ≤ #3SAT(ϕ) < 2a,

and the algorithm outputs the correct answer within a factor of 2.
Thus, to establish the theorem, it is enough to give a BPP^NP implementation of acomp.
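The threshold scan can be sketched as follows (an illustrative stand-in: the comparator and the exact small-count query are simulated from the true count, which any behaviour consistent with the acomp specification would also satisfy):

```python
def approx_count(num_sat, n):
    """2-approximate a count in [0, 2^n] using only an approximate comparator."""
    def acomp(k):                    # stand-in for the BPP^NP comparator; any
        return num_sat >= 2 ** k     # answer consistent with its spec is fine
    for i in range(1, n + 1):
        if not acomp(i):
            if i == 1:               # count is 0 or 1: one exact "oracle" query
                return num_sat
            return 2 ** i            # YES for t = 1..i-1, NO for t = i
    return 2 ** n

for t in [0, 1, 2, 5, 37, 1000]:
    a = approx_count(t, 20)
    assert t == a or a / 2 <= t < 2 * a
```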
8.4 Constructing acomp
The procedure and its analysis are similar to the Valiant-Vazirani reduction: for a given formula ϕ we pick a hash function h from a pairwise independent family, and look at the number of assignments x that satisfy ϕ and such that h(x) = 0.
In the Valiant-Vazirani reduction, we proved that if S is a set of size approximately equal to the size of the range of h(), then, with constant probability, exactly one element of S is mapped by h() into 0. Now we use a different result, a simplified version of the Leftover Hash Lemma proved by Impagliazzo, Levin, and Luby in 1989, which says that if S is sufficiently larger than the range of h() then the number of elements of S mapped into 0 is concentrated around its expectation.
Lemma 40 (The Leftover Hash Lemma) Let H be a family of pairwise independent hash functions h : {0,1}^n → {0,1}^m. Let S ⊆ {0,1}^n, |S| ≥ 4 · 2^m/ε². Then

Pr_{h∈H} [ | |{a ∈ S : h(a) = 0}| − |S|/2^m | ≥ ε · |S|/2^m ] ≤ 1/4.
From this, acomp can be constructed as follows.

input: ϕ, k
if k ≤ 5 then check exactly whether #3SAT(ϕ) ≥ 2^k
if k ≥ 6:
  pick h from a family of pairwise independent hash functions h : {0,1}^n → {0,1}^m, where m = k − 5
  answer YES iff there are more than 48 assignments a such that a satisfies ϕ and h(a) = 0
Notice that the test in the last step can be done with one access to an NP oracle. We will show that the algorithm is in BPP^NP. Let S ⊆ {0,1}^n be the set of satisfying assignments of ϕ. There are 2 cases.
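One standard pairwise independent family (an illustrative choice; the notes do not fix one) is the affine family h(x) = Ax + b over GF(2). The sketch below uses it to run the "more than 48 preimages of 0" test, with brute-force enumeration of a given set of assignments standing in for the NP oracle:

```python
import random

def random_affine_hash(n, m, rng):
    """h(x) = Ax + b over GF(2): a pairwise independent family."""
    A = [[rng.randint(0, 1) for _ in range(n)] for _ in range(m)]
    b = [rng.randint(0, 1) for _ in range(m)]
    def h(x):                        # x: tuple of n bits
        return tuple((sum(A[i][j] * x[j] for j in range(n)) + b[i]) % 2
                     for i in range(m))
    return h

def acomp_core(sat_assignments, n, k, rng):
    """YES iff more than 48 satisfying assignments hash to 0^m, m = k - 5.
    Enumerating sat_assignments stands in for the query to the NP oracle."""
    m = k - 5
    h = random_affine_hash(n, m, rng)
    return sum(h(a) == (0,) * m for a in sat_assignments) > 48

rng = random.Random(0)
n, k = 12, 8                         # m = 3; 2^(k+1) = 512 "satisfying assignments"
S = rng.sample([tuple((v >> i) & 1 for i in range(n)) for v in range(2 ** n)], 512)
yes_runs = sum(acomp_core(S, n, k, rng) for _ in range(50))
assert yes_runs >= 30                # YES w.p. at least 3/4; comfortably above 30
```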
• If |S| ≥ 2^{k+1}, by the Leftover Hash Lemma we have:

Pr_{h∈H} [ | |{a ∈ S : h(a) = 0}| − |S|/2^m | ≤ (1/4) · |S|/2^m ] ≥ 3/4

(set ε = 1/4, and note |S| ≥ 4 · 2^m/ε² = 64 · 2^m, because |S| ≥ 2^{k+1} = 2^{m+6}). Hence

Pr_{h∈H} [ |{a ∈ S : h(a) = 0}| ≥ (3/4) · |S|/2^m ] ≥ 3/4,
Pr_{h∈H} [ |{a ∈ S : h(a) = 0}| ≥ 48 ] ≥ 3/4,

which is the success probability of the algorithm.
• If |S| < 2^k:
Let S′ be a superset of S of size 2^k. We have

Pr_{h∈H}[answer YES] = Pr_{h∈H} [ |{a ∈ S : h(a) = 0}| ≥ 48 ]
≤ Pr_{h∈H} [ |{a ∈ S′ : h(a) = 0}| ≥ 48 ]
≤ Pr_{h∈H} [ | |{a ∈ S′ : h(a) = 0}| − |S′|/2^m | ≥ |S′|/(2 · 2^m) ]
≤ 1/4

(by the Leftover Hash Lemma with ε = 1/2, since |S′| = 32 · 2^m.)
Therefore, the algorithm gives the correct answer with probability at least 3/4, which can then be amplified to, say, 1 − 1/(4n) (so that all n invocations of acomp are likely to be correct).
8.5 The proof of the Leftover Hash Lemma
We finish the lecture by proving the Leftover Hash Lemma.
Proof: We will use Chebyshev's Inequality to bound the failure probability. Let S = {a_1, ..., a_k}, and pick a random h ∈ H. We define random variables X_1, ..., X_k as

X_i = 1 if h(a_i) = 0, and X_i = 0 otherwise.
Clearly, |{a ∈ S : h(a) = 0}| = Σ_i X_i.
We now calculate the expectations. For each i, Pr[X_i = 1] = 1/2^m and E[X_i] = 1/2^m. Hence,

E[Σ_i X_i] = |S|/2^m.

We also calculate the variance:

Var[X_i] = E[X_i²] − E[X_i]² ≤ E[X_i²] = E[X_i] = 1/2^m.
Because X_1, ..., X_k are pairwise independent,

Var[Σ_i X_i] = Σ_i Var[X_i] ≤ |S|/2^m.
Using Chebyshev's Inequality, we get

Pr [ | |{a ∈ S : h(a) = 0}| − |S|/2^m | ≥ ε · |S|/2^m ]
= Pr [ |Σ_i X_i − E[Σ_i X_i]| ≥ ε · E[Σ_i X_i] ]
≤ Var[Σ_i X_i] / (ε² · E[Σ_i X_i]²)
≤ (|S|/2^m) / (ε² · |S|²/(2^m)²)
= 2^m / (ε² · |S|)
≤ 1/4.   □
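The pairwise independence used above can be verified exactly for a tiny affine family h(x) = Ax + b over GF(2) (an illustrative instantiation, not part of the notes), by enumerating all choices of (A, b):

```python
from itertools import product
from fractions import Fraction

n, m = 2, 1
points = list(product([0, 1], repeat=n))
zero = (0,) * m
family = []                               # every h(x) = Ax + b over GF(2)
for A in product([0, 1], repeat=m * n):
    for b in product([0, 1], repeat=m):
        family.append(lambda x, A=A, b=b: tuple(
            (sum(A[i * n + j] * x[j] for j in range(n)) + b[i]) % 2
            for i in range(m)))

total = Fraction(len(family))
for a in points:
    # marginal: Pr_h[h(a) = 0] is exactly 2^-m
    assert Fraction(sum(h(a) == zero for h in family)) / total == Fraction(1, 2 ** m)
    for c in points:
        if c != a:
            # pairwise independence: the joint probability factors
            joint = Fraction(sum(h(a) == zero and h(c) == zero for h in family))
            assert joint / total == Fraction(1, 2 ** (2 * m))
```

Pairwise independence is exactly what makes the variances add in the last step of the proof.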
8.6 Approximate Sampling
So far we have considered the following question: for an NP-relation R, given an input x, what is the size of the set R_x = {y : (x, y) ∈ R}? A related question is to sample from the uniform distribution over R_x.
Whenever the relation R is "downward self-reducible" (a technical condition that we won't define formally), it is possible to prove that there is a probabilistic algorithm running in time polynomial in |x| and 1/ε that approximates |R_x| within a factor 1 + ε if and only if there is a probabilistic algorithm running in time polynomial in |x| and 1/ε that samples a distribution ε-close to the uniform distribution over R_x.
We show how the above result applies to 3SAT (the general result uses the same proof idea). For a formula φ, a variable x and a bit b, let us define φ_{x←b} to be the formula obtained by substituting the value b in place of x.¹

¹ Specifically, φ_{x←1} is obtained by removing each occurrence of ¬x from the clauses where it occurs, and removing all the clauses that contain an occurrence of x; the formula φ_{x←0} is similarly obtained.
If φ is defined over variables x_1, ..., x_n, it is easy to see that

#φ = #φ_{x_1←0} + #φ_{x_1←1}.

Also, if S is the uniform distribution over satisfying assignments of φ, we note that

Pr_{(x_1,...,x_n)←S}[x_1 = b] = #φ_{x_1←b} / #φ.
Suppose then that we have an efficient sampling algorithm that, given φ and ε, generates a distribution ε-close to uniform over the satisfying assignments of φ.
Let us then run the sampling algorithm with approximation parameter ε/2n and use it to sample about Õ(n²/ε²) assignments. By computing the fraction of such assignments having x_1 = 0 and x_1 = 1, we get approximate values p_0, p_1, such that |p_b − Pr_{(x_1,...,x_n)←S}[x_1 = b]| ≤ ε/n. Let b be such that p_b ≥ 1/2; then #φ_{x_1←b}/p_b is a good approximation of #φ, to within a multiplicative factor (1 + 2ε/n), and we can recurse to compute #φ_{x_1←b} to within a (1 + 2ε/n)^{n−1} factor.
Conversely, suppose we have an approximate counting procedure. Then we can approximately compute p_b = #φ_{x_1←b}/#φ, generate a value b for x_1 with probability approximately p_b, and then recurse to generate a random assignment for φ_{x_1←b}.
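Both directions rest on the same self-reduction; here is a sketch of the counting-to-sampling direction, with a brute-force exact counter standing in for the approximate counting procedure (so the resulting sampler is exactly uniform):

```python
import random
from itertools import product

def count_sat(clauses, n):
    """Brute-force #SAT over n variables; clauses use signed 1-based indices."""
    def sat(bits):
        return all(any(bits[abs(l) - 1] ^ (l < 0) for l in c) for c in clauses)
    return sum(sat(bits) for bits in product([False, True], repeat=n))

def restrict(clauses, var, b):
    """phi_{x_var <- b}: drop satisfied clauses, delete falsified literals."""
    out = []
    for c in clauses:
        if (var if b else -var) in c:
            continue                       # clause already satisfied by x_var = b
        out.append([l for l in c if abs(l) != var])
    return out

def sample_sat(clauses, n, rng):
    """Uniform satisfying assignment via self-reducibility and counting."""
    assignment = []
    for v in range(1, n + 1):
        c0 = count_sat(restrict(clauses, v, 0), n)  # counting over all n vars
        c1 = count_sat(restrict(clauses, v, 1), n)  # overcounts both equally
        b = 1 if rng.random() < c1 / (c0 + c1) else 0
        assignment.append(b)
        clauses = restrict(clauses, v, b)
    return assignment

rng = random.Random(1)
phi = [[1, 2], [-1, 3]]                    # (x1 or x2) and (not x1 or x3)
print(sample_sat(phi, 3, rng))             # one uniform satisfying assignment
```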
The same equivalence holds, clearly, for 2SAT and, among other problems, for the problem of counting the number of perfect matchings in a bipartite graph. It is known that approximate counting for 2SAT is NP-hard, and this result, together with the above reduction, implies that approximate sampling is also hard for 2SAT. The problem of approximately sampling a perfect matching has a probabilistic polynomial-time solution, and the reduction implies that approximately counting the number of perfect matchings in a bipartite graph can also be done in probabilistic polynomial time.
The reduction and the results from the last section also imply that 3SAT (and any other NP relation) has an approximate sampling algorithm that runs in probabilistic polynomial time with an NP oracle. With a careful use of the techniques from last week it is indeed possible to get an exact sampling algorithm for 3SAT (and any other NP relation) running in probabilistic polynomial time with an NP oracle. This is essentially best possible, because approximate sampling requires randomness by its very definition, and generating satisfying assignments of a 3SAT formula requires at least an NP oracle.
8.7 References
The class #P was defined by Valiant [Val79]. An algorithm for approximate counting within the polynomial hierarchy was developed by Stockmeyer [Sto83]. The algorithm presented in these notes is taken from lecture notes by Oded Goldreich. The Leftover Hash Lemma is from [HILL99]. The problem of approximate sampling and its relation to approximate counting is studied in [JVV86].
Lecture 9
Average-Case Complexity of the
Permanent
9.1 The Permanent Problem
The permanent of an n × n matrix M over a ring R is defined by the formula

perm(M) = Σ_{π∈S_n} Π_{i=1}^n M_{i,π(i)}

where S_n is the group of permutations of n items. The definition of the permanent closely resembles that of the determinant; the only difference is that the determinant has a coefficient (−1)^{sgn(π)} in front of each product. However, this resemblance is deceptive: the determinant can be calculated in polynomial time, whereas the permanent is #P-complete.
Claim 41 Let R = Z and let M be a 0/1 matrix, which we think of as representing a bipartite graph G = (U, V, E) with |U| = |V| = n as follows: number both vertex sets U and V, separately, by 1, 2, ..., n, and let M_{i,j} be 1 if (i, j) ∈ E, and 0 otherwise. Then perm(M) is the number of perfect matchings in G.
Proof: Each π ∈ S_n is a potential perfect matching, in that it can be thought of as a bijection between U and V. This corresponds to an actual perfect matching iff all edges (i, π(i)) are in E, i.e. if M_{i,π(i)} = 1 for all i, which is equivalent to saying that Π_{i=1}^n M_{i,π(i)} = 1. Thus we get a summand of 1 in the formula for perm(M) for each perfect matching of G. □
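The defining formula translates directly into an (exponential-time) computation; the sketch below evaluates it and checks Claim 41 on a small bipartite graph:

```python
from itertools import permutations
from math import prod

def permanent(M):
    """Permanent via the defining sum over permutations (O(n! n) time)."""
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n)) for p in permutations(range(n)))

# bipartite graph with edges (1,1), (1,2), (2,2), (2,3), (3,1), (3,3):
M = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
assert permanent(M) == 2    # matchings {11, 22, 33} and {12, 23, 31}
```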
Suppose we allow the entries of M to be non-negative integers. Then M represents a bipartite graph where the edges have multiplicities given by the entries of M. It still makes sense to interpret perm(M) as the number of perfect matchings, because the product of the multiplicities of the edges of a matching is the number of ways of realizing that matching.
Note that in Z_2, perm(M) = det(M), and thus the permanent is computable in polynomial time modulo 2. But for sufficiently large moduli (or no modulus at all) it is #P-complete.
9.2 Worst-case to Average-case Equivalence
Theorem 42 Suppose that, for fixed n and a field T with |T| > n^{O(1)}, we can compute perm(M) on a fraction 1 − 1/(2n) of the inputs M in polynomial time. Then there is a (worst-case) polynomial time probabilistic algorithm for computing perm(M).
Rather than proving this theorem directly, we will just observe that perm(M) is a degree-n polynomial in the indeterminates M_{11}, M_{12}, ..., M_{1n}, ..., M_{nn}, and prove the following much more general result.
Theorem 43 Suppose we have oracle access to a function f : T^n → T that agrees with some degree-d polynomial p : T^n → T on a fraction 1 − 1/(3d + 3) of its inputs, where |T| ≥ d + 2. Then we can compute p(x) with high probability for all x ∈ T^n in poly(n, d) time, assuming field operations can be performed in poly(n, d) time.
Proof: Suppose that we are given an input x ∈ T^n; pick y ∈ T^n uniformly at random and consider the line l(t) = x + ty. Each point of l other than x = l(0) is uniformly distributed. Now,

Pr[p(l(i)) = f(l(i)) for i = 1, 2, ..., d + 1] = 1 − Pr[∃i ∈ {1, ..., d + 1} : p(l(i)) ≠ f(l(i))]
≥ 1 − (d + 1)/(3(d + 1)) = 2/3.

Now q(t) = p(l(t)) is a degree-d polynomial in t, and if we know d + 1 of its values we can interpolate the others. Thus we have the following algorithm:

input: x
  pick random y ∈ T^n
  find the univariate degree-d polynomial q() such that
    q(i) = f(x + iy) for i = 1, ..., d + 1
  return q(0)

This runs in poly(n, d) time, and with probability ≥ 2/3 it outputs p(x). □
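A concrete sketch over a small prime field (all parameters are illustrative): corrupt a degree-3 polynomial on a few percent of its points, then recover p(x) by interpolating along random lines through x and taking a majority vote over independent runs:

```python
import random
from collections import Counter

P, D = 101, 3                      # prime field F_P, degree bound D (illustrative)

def interpolate_at(pts, t):
    """Lagrange interpolation over F_P: value at t of the poly through pts."""
    total = 0
    for i, (xi, yi) in enumerate(pts):
        num, den = 1, 1
        for j, (xj, _) in enumerate(pts):
            if i != j:
                num = num * (t - xj) % P
                den = den * (xi - xj) % P
        total = (total + yi * num * pow(den, P - 2, P)) % P
    return total

def self_correct(f, x, rng):
    """One run of the corrector: interpolate f along a random line through x."""
    y = [rng.randrange(P) for _ in x]
    line = lambda t: [(xk + t * yk) % P for xk, yk in zip(x, y)]
    pts = [(t, f(line(t))) for t in range(1, D + 2)]   # never queries f at x itself
    return interpolate_at(pts, 0)

def p(z):                                   # a "secret" degree-3 polynomial
    return (z[0] ** 3 + 2 * z[0] * z[1] + 5) % P

def noisy_f(z):                             # p corrupted on about 3% of the points
    return (p(z) + 1) % P if (31 * z[0] + z[1]) % 50 == 0 else p(z)

rng = random.Random(7)
x = [4, 9]
votes = Counter(self_correct(noisy_f, x, rng) for _ in range(25))
assert votes.most_common(1)[0][0] == p(x)   # majority vote recovers p(x)
```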
We can strengthen our result to hold even if p(x) only agrees with f(x) on a 3/4 + ε fraction of its inputs. This stronger result depends on the following theorem, due to Berlekamp and Welch.
Theorem 44 Let q : T → T be a polynomial of degree d. Say we are given m pairs (a_1, b_1), ..., (a_m, b_m) such that b_i ≠ q(a_i) for fewer than (m − d)/2 pairs (a_i, b_i). Then q can be reconstructed with poly(m, d) field operations.
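A minimal Berlekamp-Welch sketch over a small prime field (an illustrative implementation via a dense linear solve, not an optimized decoder): find a monic error locator E of degree e and a polynomial Q of degree at most e + d with Q(a_i) = b_i E(a_i) for every i; then q = Q/E:

```python
P = 97                                         # prime modulus (illustrative)

def solve_mod(M, rhs):
    """Gaussian elimination mod P; returns one solution, free variables = 0."""
    rows, cols = len(M), len(M[0])
    aug = [row[:] + [r] for row, r in zip(M, rhs)]
    pivots, r = [], 0
    for c in range(cols):
        pr = next((i for i in range(r, rows) if aug[i][c] % P), None)
        if pr is None:
            continue
        aug[r], aug[pr] = aug[pr], aug[r]
        inv = pow(aug[r][c], P - 2, P)
        aug[r] = [v * inv % P for v in aug[r]]
        for i in range(rows):
            if i != r and aug[i][c] % P:
                f = aug[i][c]
                aug[i] = [(vi - f * vr) % P for vi, vr in zip(aug[i], aug[r])]
        pivots.append(c)
        r += 1
    x = [0] * cols
    for i, c in enumerate(pivots):
        x[c] = aug[i][-1]
    return x

def berlekamp_welch(points, d):
    """Degree-<=d q (low-to-high coefficients) from points, fewer than
    (len(points) - d) / 2 of which are wrong."""
    m = len(points)
    e = (m - d - 1) // 2                       # number of correctable errors
    rows, rhs = [], []
    for a, b in points:                        # Q(a) - b*E(a) = 0, E monic deg e
        rows.append([pow(a, j, P) for j in range(e + d + 1)] +
                    [(-b * pow(a, j, P)) % P for j in range(e)])
        rhs.append(b * pow(a, e, P) % P)       # monic term moved to the RHS
    sol = solve_mod(rows, rhs)
    Q, E = sol[: e + d + 1], sol[e + d + 1:] + [1]
    q, R = [0] * (d + 1), Q[:]                 # exact long division Q / E
    for i in range(d, -1, -1):
        q[i] = R[i + e] % P
        for j in range(e + 1):
            R[i + j] = (R[i + j] - q[i] * E[j]) % P
    return q

# q(x) = 7 + 3x + x^3; 11 points with 3 errors (< (11 - 3)/2 = 4)
pts = [(a, (7 + 3 * a + a ** 3) % P) for a in range(11)]
for a in (2, 5, 8):
    pts[a] = (a, (pts[a][1] + 1) % P)
assert berlekamp_welch(pts, 3) == [7, 3, 0, 1]
```

Any solution of the linear system works: two solutions (Q_1, E_1), (Q_2, E_2) satisfy Q_1 E_2 = Q_2 E_1 because both sides agree on m > 2e + d points, so the ratio Q/E is the same polynomial q.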
Now, as promised, we apply Berlekamp-Welch to strengthen our main theorem:

Theorem 45 Suppose we have oracle access to a function f : T^n → T that agrees with some degree-d polynomial p : T^n → T on a fraction 3/4 + ε of its inputs, where |T| ≥ 1 + d/ε. Then we can compute p(x) with high probability for all x ∈ T^n in poly(n, d) time, assuming field operations can be performed in poly(n, d) time.
Proof: Consider the following algorithm:

input: x
  pick random y ∈ T^n
  find a degree-d univariate polynomial q() such that
    |{t ∈ T − {0} : q(t) ≠ f(x + ty)}| < (|T| − 1 − d)/2
  halt and fail if no such polynomial exists
  return q(0)

If q(t) = p(l(t)) and f(l(t)) disagree on fewer than (|T| − d − 1)/2 values of t, Berlekamp-Welch will find q(t) ≡ p(l(t)) and the algorithm succeeds (in polynomial time). So, assuming only a 1/4 − ε fraction of our points disagree, we need to show that fewer than half of our runs fail.

E[ |{t ∈ T − {0} : p(l(t)) ≠ f(l(t))}| ] = Σ_{t≠0} Pr[p(l(t)) ≠ f(l(t))] ≤ (|T| − 1)(1/4 − ε).

To clean up the notation let us use m = |T| − 1. The previous expression and Markov's inequality tell us that

Pr [ |{t ∈ T − {0} : p(l(t)) ≠ f(l(t))}| ≥ (m − d)/2 ] ≤ m(1/4 − ε) / ((m − d)/2) ≤ 1/2 − ε/2   if m ≥ d/ε.

Thus, our algorithm succeeds more than half the time, so it can be repeated to yield any degree of accuracy we wish. □
9.3 References
The simple algorithm to reconstruct a polynomial from a faulty oracle is due to Beaver and Feigenbaum [BF90], and the more sophisticated version that uses error correction is due to [GLR+91]. The use of polynomial reconstruction algorithms in the average-case complexity of the permanent is due to Lipton [Lip90]. The Berlekamp-Welch algorithm [WB86] is a polynomial time algorithm to reconstruct a univariate polynomial from faulty data.
Lecture 10
Average-case Complexity of
Problems in PSPACE and EXP
10.1 Average-case Complexity in PSPACE and EXP for a 7/8 Fraction of Inputs
Lemma 46 (Multivariate Interpolation) Let T be a field, H ⊆ T a subset, and f : H^m → T an arbitrary function. Then there is an m-variate polynomial p of degree at most m|H| such that p(x) = f(x) for every x ∈ H^m. Furthermore, the value p(x) can be computed in time polynomial in |T|^m and space logarithmic in |T|^m given oracle access to f.
Let L be an EXP-complete problem (the same construction and reasoning can be repeated with a PSPACE-complete problem), and for every n let L_n : {0,1}^n → {0,1} be the characteristic function of L on inputs of length n. We will construct another EXP-complete problem L′ such that for every n there is an n′ = O(n) such that the truth-table of L′_{n′} is the encoding of the truth-table of L_n using a concatenation of Reed-Muller and Hadamard codes. Then we will be able to show that L′ is as hard in the worst case as it is hard on average.
Let us fix an input length n.
Let k = 3 log n, and let T be a field of size 2^k = O(n³). We will identify elements of T with vectors of {0,1}^k; in particular, the zero element of T is mapped into (0, ..., 0) and the one element of T is mapped into (1, 0, ..., 0). Let H ⊆ T be a subset of size 2^{k/3}. Let m = 3n/k; then H^m contains at least 2^n elements and there is some efficiently computable injective function h : {0,1}^n → H^m. From Lemma 46 it follows that there is a multivariate polynomial p of total degree at most O(n²/log n) such that p(h(x)) = L_n(x) for all x ∈ {0,1}^n. Let f : T^m × {0,1}^k → {0,1} be the function such that f(x_1, ..., x_m, a) = p(x_1, ..., x_m) · a, where we abused notation a little by viewing p(x_1, ..., x_m) as an element of {0,1}^k. With similar abuse, we can view f as a function f : {0,1}^{3n+k} → {0,1}; we define the language L′ such that f is the characteristic function of L′ on inputs of length 3n + 3 log n. Clearly, L′ is EXP-complete, because for every x ∈ {0,1}^n we have x ∈ L if and only if p(h(x)) = 1, which holds if and only if (h(x), 1) ∈ L′, where 1 = (1, 1, ..., 1).
Theorem 47 Let p : T^m → T be a degree-d polynomial, let us identify elements of T with elements of {0,1}^k, and let f : {0,1}^{km+k} → {0,1} be such that f(x, a) = p(x) · a. Suppose we are given oracle access to g : {0,1}^{km+k} → {0,1} such that f and g agree on at least a fraction 7/8 + ε of the inputs, where |T| ≥ 1 + max{4/ε², d/4}. Then we can compute p(x) for every x in time polynomial in m, |T| and 2^k.
Proof: For every x, define c(x) to be the value that maximizes Pr_a[c(x) · a = g(x, a)], breaking ties arbitrarily if the maximum is not unique. Notice that c(x) can be computed in time polynomial in 2^k. For random (x, a), f(x, a) ≠ g(x, a) with probability at most 1/8 − ε, so it follows that

Pr_x [ Pr_a[f(x, a) ≠ g(x, a)] ≥ 1/4 ] ≤ 1/2 − 4ε.

So for at least a 1/2 + 4ε fraction of the x's, we have g(x, a) = f(x, a) = p(x) · a for more than 3/4 of the a's. For each such x, then, p(x) = c(x). The function c() then has agreement 1/2 + 4ε with p(), which means that, by previous results, we can compute p() everywhere using c(). □
Note that the above theorem essentially gives a polylogarithmic time decoding procedure for the concatenation of the Reed-Muller code with the Hadamard code. The procedure first decodes the Hadamard code locally, and then simulates the procedure for decoding the Reed-Muller code using the results of the Hadamard decoding. The Reed-Muller decoding needs the fraction of errors to be less than 1/2, and the Hadamard decoding needs the fraction of errors to be less than 1/4, so the overall procedure needs the fraction of errors to be less than 1/8.
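The local Hadamard step, recovering c(x) as the codeword of maximum agreement with g(x, ·), can be sketched by brute force over the 2^k candidates (a toy with errors injected directly):

```python
from itertools import product
import random

def inner(u, v):
    return sum(ui * vi for ui, vi in zip(u, v)) % 2

def hadamard_decode(row, k):
    """The c in {0,1}^k whose Hadamard codeword a -> c.a best matches row."""
    points = list(product([0, 1], repeat=k))
    return max(points, key=lambda c: sum(row[a] == inner(c, a) for a in points))

k = 4
rng = random.Random(3)
c = (1, 0, 1, 1)
row = {a: inner(c, a) for a in product([0, 1], repeat=k)}
for a in rng.sample(list(row), 3):   # flip 3 of 16 bits: below the 1/4 error bound
    row[a] ^= 1
assert hadamard_decode(row, k) == c
```

Because two distinct Hadamard codewords differ in exactly half the positions, any error fraction below 1/4 leaves the true codeword with strictly maximal agreement.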
In order to state the application of the previous theorem to the average-case hardness of problems in PSPACE and EXP, let us define avgBPTIME_{p(n)}(t(n)) to be the class of decision problems L for which there is a probabilistic algorithm A running in time t(n) such that for every input length n

Pr_{x∈{0,1}^n} [ Pr_r[A(x, r) = L_n(x)] ≥ 3/4 ] ≥ p(n).

In other words, avgBPTIME_{p(n)}(t(n)) contains all problems that can be solved on at least a p(n) fraction of the inputs by BPP-type algorithms in time t(n). Notice that avgBPTIME_1(t(n)) = BPTIME(t(n)).
Corollary 48 Suppose that EXP ⊆ avgBPTIME_{7/8+1/n}(n^{O(1)}); then EXP = BPP. Suppose that PSPACE ⊆ avgBPTIME_{7/8+1/n}(n^{O(1)}); then PSPACE = BPP.
10.2 Coding-Theoretic Perspective
The construction of the previous section can be abstracted as follows: starting from a language L and an input length n, we encode the characteristic function L_n : {0,1}^n → {0,1} of L on inputs of length n as a function f_L : {0,1}^{O(n)} → {0,1}, and then we argue that given a procedure g that agrees with f_L on a 1 − δ fraction of inputs, with δ < 1/8, we can reconstruct L_n everywhere.
The construction can be carried out for every L_n, and we can see that, in order for the theorem to hold, it must be the case that if we have two different functions L_n, L′_n : {0,1}^n → {0,1}, then the functions f_L and f_{L′} must differ in at least a 2δ fraction of inputs; otherwise it would not be possible to reconstruct f_L given only a function g that has agreement 1 − δ with f_L. In fact, the encoding of L_n into f_L is a concatenation of a Reed-Muller code with a Hadamard code, and its minimum relative distance is almost 1/2.
Now suppose we are trying to prove that EXP contains languages that are hard to compute even on a 1/2 + ε fraction of inputs. By the above argument, we need an encoding that maps two different languages into two boolean functions that differ in at least a 1 − 2ε fraction of points. Unfortunately, this is impossible: there is no error-correcting code with relative minimum distance bigger than 1/2 (except for trivial codes containing a small constant number of codewords), which means that unique reconstruction is possible only if there is less than a 1/4 fraction of errors.¹
In order to prove stronger hardness results, we need to stop looking for unique decoding procedures, and start looking for list decoding procedures instead. In the list decoding problem for a code C : {0,1}^n → {0,1}^m, we are given a received string z ∈ {0,1}^m, we believe that less than a δ fraction of errors occurred, and we want to compute the list of all possible messages x ∈ {0,1}^n such that C(x) and z are at relative distance less than δ. If the list is small and it can be produced efficiently, it gives a lot of information about the original message.
We will state without proof a list-decoding result about the Hadamard code and one about the Reed-Muller code.

Lemma 49 Given g : {0,1}^k → {0,1} and ε > 0, there are at most 1/(4ε²) strings a ∈ {0,1}^k such that Pr_x[g(x) = a · x] ≥ 1/2 + ε.

For the Hadamard code, the list can be found by brute force in time polynomial in the length of the encoding. Goldreich and Levin showed that, given oracle access to g, the list can even be found in time polynomial in k and 1/ε (by a randomized algorithm, with high probability), although we will not need this stronger result in our applications.
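The brute-force list decoder simply collects every a whose linear function clears the agreement threshold; a toy check of the bound of Lemma 49:

```python
from itertools import product
import random

def hadamard_list_decode(g, k, eps):
    """All a with agreement >= 1/2 + eps between g and x -> a.x (brute force)."""
    xs = list(product([0, 1], repeat=k))
    thresh = (0.5 + eps) * len(xs)
    def agreement(a):
        return sum(g[x] == sum(ai * xi for ai, xi in zip(a, x)) % 2 for x in xs)
    return [a for a in xs if agreement(a) >= thresh]

k, eps = 4, 0.25
rng = random.Random(5)
a_star = (0, 1, 1, 0)
g = {x: sum(ai * xi for ai, xi in zip(a_star, x)) % 2
     for x in product([0, 1], repeat=k)}
for x in rng.sample(list(g), 3):     # corrupt 3 of the 16 positions
    g[x] ^= 1
decoded = hadamard_list_decode(g, k, eps)
assert a_star in decoded and len(decoded) <= 1 / (4 * eps ** 2)
```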
Theorem 50 (Sudan) Let T be a field, d a degree parameter, and 1 > α > √(2d/m) an accuracy parameter. Then there is a polynomial time algorithm that, on input pairs of field elements (a_1, b_1), ..., (a_m, b_m), finds a list of all univariate polynomials q() of degree ≤ d such that q(a_i) = b_i for at least αm pairs. Furthermore, the list contains at most 2/α polynomials.
In the next section we will see how to use Sudan's algorithm to achieve a form of list decoding of the Reed-Muller code even given a 1 − ε fraction of errors. Since we can do list decoding of the Hadamard code with up to 1/2 − ε errors, it should be intuitive that the combination of the two procedures will give a decoding for the concatenation of the two codes that works with a 1/2 − ε fraction of errors. Using non-uniformity, we will be able to pick the right decoding from the list of possible decodings by hardwiring the choice in the circuit performing the reconstruction. This will show that if every problem in EXP can be solved by small circuits on a 1/2 + ε fraction of inputs, then all problems in EXP can be solved by small circuits on all inputs.

¹ Note that there is still some margin of improvement, because our reconstruction procedure works only with less than a 1/8 fraction of errors.
10.3 Average-case Complexity in PSPACE and EXP for a 1/2 Fraction of Inputs
Let us recall a (weaker version of a) result from a past lecture. It was stated in terms of probabilistic procedures, but it holds, for a stronger reason, in terms of small circuits (because, as we have seen, probabilistic procedures can always be simulated by circuits).
Theorem 51 Let p : T^m → T be a polynomial of degree d, |T| > 16d, and let f : T^m → T be a function that agrees with p on a 7/8 fraction of inputs. Then there is an oracle circuit C of size polynomial in m and |T| such that C^f(x) = p(x) for every x.
Here is the multivariate list-decoding result. We state it as a non-uniform unique reconstruction procedure.

Theorem 52 Let p : T^m → T be a polynomial of degree d, |T| ≥ 1 + 8d/ε², and let f : T^m → T be a function that agrees with p on at least an ε fraction of inputs. Then there is an oracle circuit C of size polynomial in m and |T| such that C^f(x) = p(x) for every x.
Proof: We will just show the existence of a polynomial size circuit that computes p() on a 7/8 fraction of inputs, and then invoke Theorem 51.
Let us pick x and y at random in T^m; we want to argue that knowing p(y) will help us compute p(x). The following procedure does that.

input: x, y, p(y)
  consider the line l(t) = tx + (1 − t)y and find the set Q of
  all degree-d univariate polynomials q such that
    |{t ∈ T : q(t) = f(tx + (1 − t)y)}| ≥ (ε/2)|T|
  if there is a unique q* ∈ Q such that q*(0) = p(y), return q*(1)
  otherwise fail

By our assumptions on T, d, and ε, we have ε/2 > √(2d/|T|), so the above algorithm can be implemented (using Sudan's algorithm) by a circuit C of size polynomial in m and |T|. Now we want to argue that the algorithm is quite effective.
Claim 53 Pr_{x,y}[C^f(x, y, p(y)) = p(x)] ≥ 7/8.

Before proving the claim, let us see that, once the claim is proved, we are done. By an averaging argument, there is a y such that

Pr_x[C^f(x, y, p(y)) = p(x)] ≥ 7/8.
Then we just have to hardwire y and p(y) into C, and we have a circuit that computes p() on a 7/8 fraction of inputs, and we can invoke Theorem 51 and we are done. Now we prove the claim.
Proof: [Of Claim] First of all, note that when we pick x and y at random, the points on the line l(t) = tx + (1 − t)y are pairwise independent. Therefore, there is probability at least 15/16 that there are at least ε|T| − 4√(ε|T|) points on the line where f agrees with p, and this number is at least (ε/2)|T| by our assumptions on T, d and ε.
So, except with probability 1/16, the polynomial q* such that q*(t) = p(l(t)) is part of the list Q constructed by the algorithm. It remains to see that it is the unique one such that q(0) = p(y). Notice that each other polynomial q ∈ Q agrees with q* in at most d points of the line, so, at a random point of the line, q* has probability at least 1 − (|Q| − 1)d/|T| of having a value different from that of all other polynomials in Q.
Well, y was a random point, but it is not exactly a random point on l(): by definition, indeed, y = l(0). Can we still argue that y is as good as a random point of the line in distinguishing q*() from the other polynomials in Q?
To see that this is the case, let us pick the points x, y in a different (but equivalent) way: let us first pick two random points x′, y′ ∈ T^m, then consider the line l′(t) = tx′ + (1 − t)y′, and finally pick two random points a, b ∈ T and call x = l′(a) and y = l′(b). Surely, x and y picked in this way are uniformly distributed. What about the line l(t) = tx + (1 − t)y? It is the same line as l′(), just with a different parameterization: we have l(t) = l′(t(a − b) + b). So for each polynomial q ∈ Q such that q(t) has agreement ε/2 with f(l(t)) there is a corresponding polynomial q′(t) = q(t(a − b) + b) such that q′(t) has agreement ε/2 with f(l′(t)), and we can call Q′ the list of such polynomials. Let q′*(t) = q*(t(a − b) + b); then we have that q* is the unique polynomial in Q such that q(0) = p(y) if and only if q′* is the unique polynomial in Q′ such that q′*(b) = p(y). In the latter case, it is clear that we are looking at a random point, and so the probability that q′* is isolated is at least 1 − (|Q| − 1)d/|T|, which is at least 15/16.
Overall, the success probability of the algorithm is at least 7/8 and the claim is proved. □
□
We can now move to the concatenated version.

Theorem 54 Let p : T^m → T be a degree-d polynomial, let us identify elements of T with elements of {0,1}^k, and let f : {0,1}^{km+k} → {0,1} be such that f(x, a) = p(x) · a. Suppose we are given oracle access to g : {0,1}^{km+k} → {0,1} such that f and g agree on at least a fraction 1/2 + ε of the inputs, where |T| ≥ 8d/ε⁶. Then there is an oracle circuit C of size polynomial in m, |T| and 2^k such that C^g computes p() everywhere.
Proof: It will be enough to show how to compute p on a fraction ε³ of the inputs.
For every x, let S(x) ⊆ {0,1}^k be the list of all strings c such that g(x, a) and c · a agree on a fraction at least 1/2 + ε/2 of the values a ∈ {0,1}^k. Notice that S(x) is computable in time polynomial in 2^k and that it contains at most 1/ε² elements. A Markov argument shows that
Pr_x [ Pr_a[g(x, a) = p(x) · a] ≥ 1/2 + ε/2 ] ≥ ε.

Now, for each x such that Pr_a[g(x, a) = p(x) · a] ≥ 1/2 + ε/2 we have p(x) ∈ S(x) by definition. Let c_i(x), for i = 1, ..., 1/ε², be the i-th element of S(x) (say, in lexicographic order, and if S(x) has fewer than i elements let c_i(x) be 0). Again, we have that c_i(x) is computable in time polynomial in 2^k and, by an averaging argument, there must be an i* such that for at least an ε³ fraction of the x we have c_{i*}(x) = p(x). We then let C^g(x) = c_{i*}(x), and apply Theorem 52. □
At this point, we are ready to state our main result, except that we need some more notation. Let us define avgSIZE_{p(n)}(t(n)) to be the class of decision problems L such that for every input length n there is a circuit C of size t(n) such that

Pr_{x∈{0,1}^n}[C(x) = L_n(x)] ≥ p(n).
Corollary 55 Suppose that EXP ⊆ avgSIZE_{1/2+1/n}(n^{O(1)}); then EXP ⊆ SIZE(n^{O(1)}). Suppose that PSPACE ⊆ avgSIZE_{1/2+1/n}(n^{O(1)}); then PSPACE ⊆ SIZE(n^{O(1)}).
10.4 References
Polynomial encodings are used by Babai et al. [BFNW93] to prove average-case results for problems in EXP. The presentation in these notes is influenced by the more general perspective taken in [STV01]. Sudan's list-decoding algorithm for the Reed-Solomon code appears in [Sud97].
Exercises
• Let C : {0,1}^n → {0,1}^m be an error-correcting code of minimum relative distance 3/4. Show that n = 1.
• Show that avgBPTIME_{1−1/n}(n²) contains undecidable languages.
• This may be somewhat complicated: try to get a version of Theorem 52 that is stated more explicitly in terms of a list-decoding algorithm. The proof already shows that one can (without non-uniformity) define a list of |T|^{m+1} decoding algorithms (one for each choice of y and p(y)) such that one of them is correct.
  – Show that (with a small loss in the parameters) y can be chosen at random, so that we only need to enumerate over the |T| possible values of p(y).
  – Can you modify the algorithm so that the list contains only poly(1/ε) elements instead of |T|?
Lecture 11
The XOR Lemma
We now see an alternative way of proving strong average-case hardness results, assuming we have as a starting point a function that is already somewhat hard on average. Part of these notes were written by Sanjeev Arora.
11.1 The XOR Lemma
Definition 15 (δ-hardness) Let f be a boolean function on n-bit inputs and D a distribution on n-bit strings. If δ < 1/2 and g ∈ [n, 2^n/n], then we say that f is δ-hard on distribution D for circuits of size g if for any boolean circuit C with at most g gates:

Pr_{x∈D}[f(x) = C(x)] < 1 − δ.   (11.1)
We want to transform f into a function f′ that is very hard on average and that can be computed efficiently given f.
For a parameter k, consider the function f^k : {0,1}^{nk} → {0,1} defined as

f^k(x_1, ..., x_k) = f(x_1) ⊕ f(x_2) ⊕ ··· ⊕ f(x_k).
Yao's "XOR Lemma" states that the fraction of inputs on which f^k can be computed by an efficient circuit tends to 1/2 for large k. To see why this makes sense, let us first compute how successful we can be in computing f^k if we just try to compute f() on each input x_i and then xor the results. In each attempt we fail with probability at least δ (let us say exactly δ for simplicity). Then the probability of computing f^k correctly is the same as the probability that we make an even number of errors. This is the same as defining independent 0/1 random variables Z_1, ..., Z_k, where for each i Pr[Z_i = 1] = δ, and asking what Pr[Z_1 ⊕ ··· ⊕ Z_k = 0] is. We can think of each Z_i as being sampled in the following equivalent way: with probability 1 − 2δ we set Z_i = 0, and with probability 2δ we flip an unbiased coin and let Z_i be equal to the outcome of the coin. Then there is a probability (1 − 2δ)^k that when we sample Z_1, ..., Z_k we never flip a coin; then all Z_i are zero and their xor is also zero. With the remaining 1 − (1 − 2δ)^k probability we flip at least one coin, and the xor of the Z_i contains the value of that coin xored with other independent values, and so it is unbiased. Overall, we have

Pr[Z_1 ⊕ ··· ⊕ Z_k = 0] = (1 − 2δ)^k + (1/2)(1 − (1 − 2δ)^k) = 1/2 + (1/2)(1 − 2δ)^k,
68
that actually tends to 1/2 for large k.
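The closed form above is easy to check numerically. The following sketch (our own, not part of the notes) compares a Monte Carlo estimate of Pr[Z_1 ⊕ ··· ⊕ Z_k = 0] against 1/2 + (1/2)(1 − 2δ)^k:

```python
import random

def xor_zero_prob(delta, k, trials, rng):
    """Monte Carlo estimate of Pr[Z_1 xor ... xor Z_k = 0], where each
    Z_i is independently 1 (an 'error') with probability delta."""
    hits = 0
    for _ in range(trials):
        parity = 0
        for _ in range(k):
            parity ^= rng.random() < delta
        hits += parity == 0
    return hits / trials

rng = random.Random(0)
for delta, k in [(0.1, 5), (0.25, 10)]:
    exact = 0.5 + 0.5 * (1 - 2 * delta) ** k
    print(delta, k, exact, xor_zero_prob(delta, k, 200_000, rng))
```

With 200,000 trials the estimate agrees with the closed form to about two decimal places.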
What happens when we consider arbitrary circuits for f^k? The following result by Impagliazzo allows us to give a proof of the XOR Lemma that follows quite closely the outline of the above probabilistic reasoning.
Definition 16 A set H ⊆ {0,1}^n is a hard core set for a boolean function f with respect to size S and ε if for every boolean circuit C with at most S gates:

    Pr_{x∈H}[f(x) = C(x)] < 1/2 + ε.   (11.2)
Lemma 56 (Impagliazzo) Suppose a boolean function f is δ-hard for size S on n-bit strings. Then for every ε > 0 there is a set H of size ≥ δ2^n such that f is hard core on H, with respect to ε, for circuits of size Sε²δ²/4.
This is an extremely strong result: from the δ-hardness of f we only know that for every circuit C there is a subset H_C of δ2^n inputs such that C fails on H_C. Impagliazzo's result states that there is a single set H of δ2^n inputs such that every circuit C fails on about half the elements of H. (Notice the different order of quantifiers.)
From Impagliazzo’s result, it is easy to prove the XOR Lemma.
Theorem 57 (Yao's XOR Lemma) Let f be a boolean function on n-bit inputs that is δ-hard for circuits of size S. Then the function f^{⊕k} on nk-bit inputs defined as

    f^{⊕k}(x_1, x_2, . . . , x_k) = f(x_1) ⊕ f(x_2) ⊕ ··· ⊕ f(x_k)

is (1/2 − ε − (1 − δ)^k)-hard for circuits of size S′ = Sε²δ²/4.
Proof: The proof is by contradiction. Assume some circuit C of size at most S′ satisfies

    Pr[C(x_1, x_2, . . . , x_k) = f^{⊕k}(x_1, x_2, . . . , x_k)] ≥ 1/2 + ε + (1 − δ)^k.
By Impagliazzo's Lemma, there is a hard core set H of δ2^n inputs with respect to circuits of size S′. We construct a circuit C′ of size S′ that predicts f on at least a 1/2 + ε fraction of the inputs of H, and thus reach a contradiction.
Partition the set of nk-bit strings into S_0, S_1, S_2, . . . , S_k, where S_t consists of those (x_1, . . . , x_k) in which exactly t of the x_i's lie in H. At most a (1 − δ)^k fraction of inputs lie in S_0, so C must predict f^{⊕k} with probability at least 1/2 + ε on inputs from the other classes. Averaging shows there is a t ≥ 1 such that C predicts f^{⊕k} with probability at least 1/2 + ε on inputs randomly chosen from S_t. Using non-uniformity, assume we know such a t.
Let x be a random input from H; this is the input to C′. Pick a_1, a_2, . . . , a_{t−1} randomly from H and b_1, . . . , b_{k−t} randomly from the complement of H. Randomly permute the k inputs a_1, a_2, . . . , a_{t−1}, x, b_1, . . . , b_{k−t} to get an nk-bit string z; note that z is a random input from S_t. Apply C to z. With probability at least 1/2 + ε we have C(z) = f^{⊕k}(z).
By averaging, there is a way to fix a_1, a_2, . . . , a_{t−1}, b_1, . . . , b_{k−t} and their random permutation such that the circuit still outputs f^{⊕k}(z) with probability 1/2 + ε. Using non-uniformity again, we hardwire these choices into circuit C. Thus the circuit is left with n input wires, into which x is fed. Noting that

    f^{⊕k}(z) = f(x) ⊕ f(a_1) ⊕ ··· ⊕ f(a_{t−1}) ⊕ f(b_1) ⊕ ··· ⊕ f(b_{k−t}),

we again use non-uniformity and assume we know b = f(a_1) ⊕ ··· ⊕ f(a_{t−1}) ⊕ f(b_1) ⊕ ··· ⊕ f(b_{k−t}). We add a NOT gate at the top of our circuit iff b = 1. This gives a circuit C′ with S′ gates (remember that we do not count NOT gates in circuit complexity) which on input x produces C(z) ⊕ b as output. Furthermore, for at least a 1/2 + ε fraction of x ∈ H, the circuit outputs f^{⊕k}(z) ⊕ b, which is f(x). Thus C′ computes f on a 1/2 + ε fraction of inputs from H, a contradiction. □
11.2 References
Yao's XOR Lemma was stated in the conference presentation of [Yao82]. Proofs of the Lemma appeared in [Lev87, Imp95a, GNW95, IW97]. The proof given here is Impagliazzo's [Imp95a].
Exercises
1. Prove a version of Impagliazzo's Lemma for average-case algorithms that are errorless but may fail.

   We say that a circuit C is errorless for a function f : {0,1}^n → {0,1} if C(x) ∈ {0, 1, fail} for each x, and C(x) = f(x) whenever C(x) ≠ fail.

   We say that a function f : {0,1}^n → {0,1} is δ-hard for errorless circuits of size S if for every errorless circuit C of size ≤ S for f we have Pr_x[C(x) = fail] ≥ δ.

   Prove that if f : {0,1}^n → {0,1} is δ-hard for errorless circuits of size S, then for every ε > 0 there is a subset H ⊆ {0,1}^n, |H| ≥ δ2^n, such that for every errorless circuit C for f of size ≤ S′ we have Pr_{x∈H}[C(x) = fail] ≥ 1 − ε, where S′ = Ω(S/(log(1/δ))).
2. Prove a "concatenation lemma" instead of a XOR Lemma. Let f : {0,1}^n → {0,1} be δ-hard for circuits of size S. Then argue that the function f′ : {0,1}^{kn} → {0,1}^k defined as f′(x_1, . . . , x_k) = (f(x_1), . . . , f(x_k)) cannot be computed on more than an ε + (1 − δ)^k fraction of inputs by circuits of size ε²δ²S/4.
Lecture 12
Levin's Theory of Average-Case
Complexity
Today we introduce Levin's theory of average-case complexity.
This theory is still in its infancy: in this chapter we will introduce the notion of "distributional problems," discuss various formalizations of the notion of "algorithms that are efficient on average," introduce a reducibility that preserves efficient average-case solvability, and finally prove that there is a problem that is complete for the distributional version of NP under such reductions. It is still an open question how to apply this theory to the study of natural distributional problems that are believed to be hard on average.
12.1 Distributional Problems
Definition 17 (Distributional Problem) A distributional problem is a pair ⟨L, µ⟩, where L is a decision problem and µ is a distribution over the set {0,1}* of possible inputs.
In other settings where average-case hardness is considered (e.g. in the study of one-way functions) we normally describe the distribution of inputs as a collection of distributions µ_1, . . . , µ_n, . . ., where µ_n is a distribution over the set {0,1}^n of inputs of a given input length.¹ There are various reasons why this single-distribution approach is convenient for the purposes of this chapter. We will discuss it again later, but for now the basic intuition is that we will discuss reductions where the length of the output is not a function of the length of the input, so that sampling inputs from a fixed-length distribution and passing them to a reduction does not produce a fixed-length distribution (unlike the case of cryptographic reductions).
We will restrict ourselves to the study of distributional problems where µ is "polynomial-time computable." What do we mean by that? For all x ∈ {0,1}*, let

    µ(x) = Σ_{y ≤ x} Pr[y],   (12.1)
¹One can always reduce the approach of distinct distributions to the approach of this chapter by assuming that µ first picks at random a certain input length n, and then samples from µ_n.
where '≤' denotes lexicographic ordering. Then µ must be computable in poly(|x|) time. Clearly this notion is at least as strong as the requirement that Pr[x] be computable in polynomial time, because

    Pr[x] = µ′(x) = µ(x) − µ(x − 1),   (12.2)

x − 1 being the lexicographic predecessor of x. Indeed one can show that, under reasonable assumptions, there exist distributions that are efficiently computable in the second sense but not polynomial-time computable in our sense.
We can define the "uniform distribution" to be

    Pr[x] = 1/(|x|(|x| + 1)) · 2^{−|x|};   (12.3)

that is, first choose an input size at random under some polynomially-decreasing distribution, then choose an input of that size uniformly at random. It is easy to see that the uniform distribution is polynomial-time computable.
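As a quick sanity check (our own sketch, not part of the notes), the distribution in (12.3) really is a probability distribution: each length n carries total mass 1/(n(n+1)), and these masses telescope to 1 as the length grows.

```python
from fractions import Fraction
from itertools import product

def p_uniform(x: str) -> Fraction:
    """Pr[x] under the 'uniform distribution' of Eq. (12.3)."""
    n = len(x)
    return Fraction(1, n * (n + 1) * 2 ** n)

# The 2^n strings of length n carry total mass 1/(n(n+1)) = 1/n - 1/(n+1),
# so the mass of all strings of length <= N telescopes to 1 - 1/(N+1).
for n in (1, 2, 3):
    mass = sum(p_uniform("".join(b)) for b in product("01", repeat=n))
    assert mass == Fraction(1, n * (n + 1))
assert sum(Fraction(1, n * (n + 1)) for n in range(1, 100)) == 1 - Fraction(1, 100)
```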
12.2 DistNP
We define the complexity class

    DistNP = {⟨L, µ⟩ : L ∈ NP, µ polynomial-time computable}.   (12.4)
There are at least two good reasons for looking only at polynomial-time computable distributions.

1. One can show that there exists a distribution µ such that every problem is as hard on average under µ as it is in the worst case. Therefore, unless we place some computational restriction on µ, the average-case theory is identical to the worst-case one.

2. Someone, somewhere, had to generate the instances we're trying to solve. If we place computational restrictions on ourselves, then it seems reasonable also to place restrictions on whoever generated the instances.

It should be clear that we need a whole class of distributions to do reductions; that is, we can't just parameterize a complexity class by a single distribution. This is because a problem can have more than one natural distribution; it's not always obvious what to take as the 'uniform distribution.'
12.3 Reductions
Definition 18 (Reduction) We say that a distributional problem ⟨L_1, µ_1⟩ reduces to a distributional problem ⟨L_2, µ_2⟩ (in symbols, ⟨L_1, µ_1⟩ ≤ ⟨L_2, µ_2⟩) if there exists a polynomial-time computable function f such that:

1. x ∈ L_1 iff f(x) ∈ L_2.

2. There is an ε > 0 such that, for every x, |f(x)| = Ω(|x|^ε).

3. For all y ∈ {0,1}*,

    Σ_{x : f(x) = y} µ′_1(x) ≤ poly(|y|) · µ′_2(y).   (12.5)
The first condition is the standard condition of many-to-one reductions in complexity theory: it ensures that an algorithm that is always correct for L_2 can be converted into an algorithm that is always correct for L_1. The second condition is a technical one that is necessary to show that the reduction preserves efficient-on-average algorithms. All known reductions satisfy this condition naturally.

The third condition is called domination. To motivate this condition, consider that we want reductions to preserve the existence of algorithms that are efficient on average. Suppose that we have an algorithm A_2 for problem L_2 such that, when we pick y according to distribution µ_2, A_2(y) is efficient on average; if we want to solve L_1 under distribution µ_1, then, starting from an input x distributed according to µ_1, we compute f(x) and then apply algorithm A_2 to f(x). This will certainly be correct, but what about the running time? Intuitively, it could be the case that A_2 is very slow on some inputs, but such inputs are unlikely to be sampled according to distribution µ_2; the domination condition ensures that such inputs are also unlikely to be sampled when we sample x according to µ_1 and then consider f(x).
12.4 Polynomial-Time on Average
Given a problem ⟨L, µ⟩ and an algorithm A that runs in time t(x) on input x, what does it mean to say that A solves ⟨L, µ⟩ in polynomial time on average? We will consider some flawed definitions before settling on the best one and on an alternate one.

A first difficulty comes from the fact that we are dealing with a single distribution on all possible inputs. The most intuitive choice, saying that A is efficient if

    E[t(x)] is small,

is problematic because the expectation could be infinite even if A runs in worst-case polynomial time.

One can work around this difficulty by defining A to be polynomial provided that, for some constant c and for every sufficiently large n,

    E[t(x) | |x| = n] ≤ n^c.

However, we chose to define distributional problems and reducibility without separating problems by input size, and we would run into several difficulties in separating them now. Besides, it seems reasonable that there could be input lengths on which A takes a long time, but that are generated with very low probability under µ; in such cases A may still be regarded as efficient, but this is not taken into account in the above definition.
Our next attempt folds the polynomial running time into the single distribution by defining A to be polynomial on average if there is a polynomial p such that

    E[t(x)/p(|x|)] = O(1).

This definition is quite appealing, but it is still subject to the fatal flaw of not being robust, in that: (1) reductions do not preserve this definition of polynomial solvability on average, and (2) the definition is sensitive to trivial representation changes, such as replacing a matrix representation of a graph by an adjacency list.
To see why these problems arise, let µ be the uniform distribution, and let

    t(x) = 2^n if x = 0^n,   t(x) = n^2 otherwise.   (12.6)

The average running time (over inputs of length n) is about n^2. But suppose now that n is replaced by 2n (because of a change in representation, or because of the application of a reduction); then

    t(x) = 2^{2n} if x = 0^n,   t(x) = 4n^2 otherwise,   (12.7)

and the average running time is now about 2^n, which is exponential. Similarly, if t(x) is replaced by t^2(x), the average running time becomes exponential.
We now come to a satisfying definition.

Definition 19 (Polynomial on average) Suppose A is an algorithm for a distributional problem ⟨L, µ⟩ that runs in time t(x) on input x. We say that A has polynomial running time on average if there is a constant c such that

    E[t(x)^{1/c} / |x|] = O(1).

Notice, first, that this definition is satisfied by any algorithm that runs in worst-case polynomial time: if t(x) = O(|x|^c), then t(x)^{1/c} = O(|x|) and the sum converges. More interestingly, suppose t(·) is a time bound for which the above definition is satisfied; then an algorithm whose running time is t′(x) = t(x)^2 also satisfies the definition, unlike the case of the previous definition. In fact we have the following result, whose non-trivial proof we omit.
Theorem 58 If ⟨L_1, µ_1⟩ ≤ ⟨L_2, µ_2⟩ and ⟨L_2, µ_2⟩ admits an algorithm that is polynomial on average, then ⟨L_1, µ_1⟩ also admits an algorithm that is polynomial on average.
There is an additional interesting property of the definition of polynomial on average: there is a high probability that the algorithm runs in polynomial time. Suppose that

    E[t(x)^{1/c} / |x|] = O(1),   (12.8)

and that we wish to compute Pr[t(x) ≥ k|x|^c]. Such a probability is clearly the same as

    Pr[t(x)^{1/c} ≥ k^{1/c}|x|],

and by Markov's inequality this is at most O(1/k^{1/c}), which can be made fairly small by picking k large enough. Since the algorithm runs in time at most kn^c on a subset of inputs having probability 1 − O(k^{−1/c}), we see that our definition gives a smooth quantitative trade-off for how much time we need to solve an increasing fraction of inputs.
In the setting of one-way functions and in the study of the average-case complexity of the permanent and of problems in EXP (with applications to pseudorandomness), we normally interpret "average-case hardness" in the following way: an algorithm of limited running time will fail to solve the problem on a noticeable fraction of the inputs. Conversely, we would interpret average-case tractability as the existence of an algorithm that solves the problem in polynomial time, except on a negligible fraction of inputs. This leads to the following formal definition.
Definition 20 (Heuristic polynomial time) We say that an algorithm A is a heuristic polynomial time algorithm for a distributional problem ⟨L, µ⟩ if A always runs in polynomial time and for every polynomial p

    Σ_{x : A(x) ≠ χ_L(x)} µ′(x) p(|x|) = O(1).
In other words, a polynomial time algorithm for a distributional problem is a heuristic if the algorithm fails on a negligible fraction of inputs, that is, on a subset of inputs whose probability mass is bounded even when multiplied by a polynomial in the input length. It might also make sense to consider a definition in which A is always correct, although it does not necessarily work in polynomial time, and say that A is heuristic polynomial time if there is a polynomial q such that for every polynomial p, Σ_{x∈S_q} µ′(x) p(|x|) = O(1), where S_q is the set of inputs x on which A(x) takes more than q(|x|) time. Our definition is only more general, because from an algorithm A as above one can obtain an algorithm A′ satisfying Definition 20 by adding a clock that stops the computation after q(|x|) steps.
The definition of heuristic polynomial time is incomparable with the definition of average polynomial time. For example, an algorithm could take time 2^n on a 1/n^{log n} fraction of the inputs of length n, and time n^2 on the remaining inputs; it is then a heuristic polynomial time algorithm with respect to the uniform distribution, while not being average polynomial time with respect to the uniform distribution. On the other hand, consider an algorithm such that for every input length n, and for 1 ≤ k ≤ 2^{n/2}, there is a fraction of about 1/k^2 of the inputs of length n on which the algorithm takes time Θ(kn). Then this algorithm satisfies the definition of average polynomial time under the uniform distribution, but if we impose a polynomial clock there will be an inverse polynomial fraction of inputs of each length on which the algorithm fails, and so the definition of heuristic polynomial time cannot be met.
It is easy to see that heuristic polynomial time is preserved under reductions.
Theorem 59 If ⟨L_1, µ_1⟩ ≤ ⟨L_2, µ_2⟩ and ⟨L_2, µ_2⟩ admits a heuristic polynomial time algorithm, then ⟨L_1, µ_1⟩ also admits a heuristic polynomial time algorithm.
Proof: Let A_2 be the algorithm for ⟨L_2, µ_2⟩, let f be the function realizing the reduction, and let p be the polynomial witnessing the domination property of the reduction. Let c and ε be such that for every x we have |x| ≤ c|f(x)|^{1/ε}.
Then we define the algorithm A_1 that on input x outputs A_2(f(x)). Clearly this is a polynomial time algorithm, and whenever A_2 is correct on f(x), A_1 is correct on x. We need to show that for every polynomial q

    Σ_{x : A_2(f(x)) ≠ χ_{L_2}(f(x))} µ′_1(x) q(|x|) = O(1),
and the left-hand side can be rewritten as

    Σ_{y : A_2(y) ≠ χ_{L_2}(y)}  Σ_{x : f(x) = y}  µ′_1(x) q(|x|)
      ≤  Σ_{y : A_2(y) ≠ χ_{L_2}(y)}  Σ_{x : f(x) = y}  µ′_1(x) q(c|y|^{1/ε})
      ≤  Σ_{y : A_2(y) ≠ χ_{L_2}(y)}  µ′_2(y) p(|y|) q′(|y|)
      =  O(1),

where the last step uses the fact that A_2 is a polynomial heuristic for ⟨L_2, µ_2⟩, and in the second-to-last step we introduce the polynomial q′(n) defined as q(c n^{1/ε}) and apply the domination condition. □
12.5 Existence of Complete Problems
We now show that there exists a problem (albeit an artificial one) complete for DistNP. Let the inputs have the form ⟨M, x, 1^t, 1^l⟩, where M is an encoding of a Turing machine and 1^t is a sequence of t ones. Then we define the following "universal" problem U.

• Decide whether there exists a string y such that |y| ≤ l and M(x, y) accepts in at most t steps.

That U is NP-complete follows directly from the definition. Recall the definition of NP: we say that L ∈ NP if there exists a machine M running in t = poly(|x|) steps such that x ∈ L iff there exists a y with |y| = poly(|x|) such that M(x, y) accepts. Thus, to reduce L to U we need only map x onto R(x) = ⟨M, x, 1^t, 1^l⟩, where t and l are sufficiently large bounds.
To give a reduction that satisfies the domination condition is indeed harder. Let ⟨L, µ⟩ ∈ DistNP. Define a "uniform" distribution µ_U over the ⟨M, x, 1^t, 1^l⟩ as follows:

    µ′_U(⟨M, x, 1^t, 1^l⟩) = 1/(|M|(|M|+1) 2^{|M|}) · 1/(|x|(|x|+1) 2^{|x|}) · 1/((t+l)(t+l+1)).   (12.9)
The trouble is that, because of the domination condition, we can't map x onto R(x) if µ′(x) > poly(|x|) · 2^{−|x|}. We work around this problem by compressing x to a shorter string if µ′(x) is large. Intuitively, by mapping high-probability strings onto shorter lengths, we make their high probability less conspicuous. The following lemma shows how to do this.
Lemma 60 Suppose µ is a polynomial-time computable distribution. Then there exists a polynomial-time algorithm C such that

1. C is injective: C(x) = C(y) iff x = y.

2. |C(x)| ≤ 1 + min{|x|, log(1/µ′(x))}.
Proof: If µ′(x) ≤ 2^{−|x|} then simply let C(x) = 0x, that is, 0 concatenated with x. If, on the other hand, µ′(x) > 2^{−|x|}, then let C(x) = 1z. Here z is the longest common prefix of µ(x) and µ(x − 1) when both are written out in binary. Since µ is computable in polynomial time, so is z. C is injective because only two binary strings s_1 and s_2 can have z as their longest common prefix; a third string s_3 sharing z as a prefix must have a longer common prefix with either s_1 or s_2. Finally, since µ′(x) ≤ 2^{−|z|}, we have |C(x)| ≤ 1 + log(1/µ′(x)). □
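The encoding of Lemma 60 can be implemented directly, assuming we can query the cumulative function µ exactly as a rational number. The following is our own sketch (the helper names `bin_digits` and `compress` are ours, not from the notes):

```python
from fractions import Fraction
import math

def bin_digits(a: Fraction, k: int) -> str:
    """First k binary digits of a rational a in [0, 1]."""
    out = []
    for _ in range(k):
        a *= 2
        d = 1 if a >= 1 else 0
        out.append(str(d))
        a -= d
    return "".join(out)

def compress(x: str, cum, cum_prev) -> str:
    """Lemma 60's encoding C. cum(x) = mu(x) is the cumulative probability
    up to x; cum_prev(x) = mu(x-1), the cumulative up to x's predecessor."""
    p = cum(x) - cum_prev(x)                 # the point probability mu'(x)
    if p <= Fraction(1, 2 ** len(x)):
        return "0" + x                       # low-probability case: C(x) = 0x
    # High-probability case: C(x) = 1z with z the longest common prefix of
    # mu(x-1) and mu(x) in binary. They differ by p, so they must disagree
    # within ceil(log2(1/p)) + 2 digits, and |z| <= log2(1/p).
    k = math.ceil(math.log2(1 / p)) + 2
    a, b = bin_digits(cum_prev(x), k), bin_digits(cum(x), k)
    z = ""
    for c1, c2 in zip(a, b):
        if c1 != c2:
            break
        z += c1
    return "1" + z
```

For instance, for a distribution on {0,1}^3 that puts mass 1/2 on 000 and 1/14 on each other string, the string 000 compresses to the one-bit codeword "1", the low-probability strings keep their "0x" encodings, and all eight codewords are distinct.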
Now the reduction is to map x onto R_2(x) = ⟨M′, C(x), 1^{t′}, 1^{l+|x|}⟩. Here M′ is a machine that on input (z, ⟨x, y⟩) checks that C(x) = z (i.e., that x is a valid decoding of z) and that M(x, y) accepts; its running time is t′. Clearly x ∈ L iff R_2(x) ∈ U. To show that domination holds, let µ′_2(x) = Pr[R_2(x)] under the distribution (12.9). Then, since the map is one-to-one, we need only show that µ′(x) ≤ poly(|x|) µ′_2(x). Since t′ = poly(t),

    µ′_2(x) = 1/O(|M′|^2 2^{|M′|}) · 1/O(|C(x)|^2 2^{|C(x)|}) · 1/O((t′ + l + |x|)^2)
            ≥ (1/poly(|x|)) · max{2^{−|x|}, µ′(x)}
            ≥ (1/poly(|x|)) · µ′(x),

and we are done.
Note that, since we mapped longer inputs onto shorter ones, we could not have carried out this reduction length by length.
12.6 Polynomial-Time Samplability
Definition 21 (Samplable distributions) We say that a distribution µ is polynomial-time samplable if there exists a probabilistic algorithm A, taking no input, that outputs x with probability µ′(x) and runs in poly(|x|) time.

Any polynomial-time computable distribution is also polynomial-time samplable, provided that for all x,

    µ′(x) ≥ 2^{−poly(|x|)} or µ′(x) = 0.   (12.10)

For a polynomial-time computable µ satisfying the above property, we can indeed construct a sampler A that first chooses a real number r uniformly at random from [0, 1], to poly(|x|) bits of precision, and then uses binary search to find the first x such that µ(x) ≥ r. On the other hand, under reasonable assumptions, there are efficiently samplable distributions µ that are not efficiently computable.
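The binary-search sampler just described can be sketched as follows for distributions over a fixed length n (our own simplification of the variable-length case; the function names are ours):

```python
import random
from fractions import Fraction
from collections import Counter

def sample(n, cum, bits=64):
    """Draw r in [0,1) to `bits` bits of precision, then binary-search for
    the lexicographically first x in {0,1}^n with cum(x) >= r."""
    r = Fraction(random.getrandbits(bits), 2 ** bits)
    lo, hi = 0, 2 ** n - 1                  # length-n strings as integers
    while lo < hi:
        mid = (lo + hi) // 2
        if cum(format(mid, f"0{n}b")) >= r:
            hi = mid
        else:
            lo = mid + 1
    return format(lo, f"0{n}b")

random.seed(0)
cum = lambda x: Fraction(int(x, 2) + 1, 8)  # uniform cumulative on {0,1}^3
print(Counter(sample(3, cum) for _ in range(800)))  # about 100 of each string
```

Each call makes only n queries to the cumulative function, which is why polynomial-time computability of µ suffices.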
In addition to DistNP, we can look at the class

    ⟨NP, P-samplable⟩ = {⟨L, µ⟩ : L ∈ NP, µ polynomial-time samplable}.   (12.11)

A result due to Impagliazzo and Levin [IL90] states that if ⟨L, µ⟩ is DistNP-complete, then ⟨L, µ⟩ is also complete for the class ⟨NP, P-samplable⟩. This means that the completeness result established in the previous section extends to the class of NP problems with samplable distributions.
12.7 References
Levin's theory of average-case complexity was introduced in [Lev86]. Ben-David et al. [BDCGL92] prove several basic results about the theory. Impagliazzo and Levin [IL90] show that the theory can be generalized to samplable distributions. Impagliazzo [Imp95b] wrote a very clear survey on the subject. Another good reference is a survey paper by Goldreich [Gol97].
Exercises
1. For a parameter c, consider the distribution D_{n,cn} over instances of 3SAT with n variables generated by picking, cn times independently, a random clause out of the 8·C(n,3) possible clauses that can be constructed from n variables. (Note that the same clause could be picked more than once.) Let D_c be the distribution obtained by first picking a number n with probability 1/n(n+1) and then sampling from D_{n,cn}.

   (a) Show that an instance from D_{n,cn} is satisfiable with probability at least (7/8)^{cn} and at most 2^n (7/8)^{cn}.

   (b) Argue that, using the definition given in this lecture, D_15 cannot be reduced to D_30.

   [Hint: take a sufficiently large n, and then look at the probability of satisfiable instances of length n under D_15 and the probability that their image is generated by D_30.]
Lecture 13
Interactive Proofs
In this lecture we will look at the notion of interactive proofs and the induced class IP. Intuitively, this class is obtained by adding the ingredients of Interaction and Randomness to the class NP. As we shall see, adding either one alone does not make much difference, but the two put together make IP a very powerful class.
13.1 Interactive Proofs
We begin by recalling the definition of NP. Given a language L, we say that L ∈ NP iff there is an algorithm V(·, ·) running in polynomial time and a polynomial p(·) such that

    x ∈ L ⇔ ∃y. |y| ≤ p(|x|) and V(x, y) accepts.

We can rewrite the above as two separate conditions, using elementary ideas from logic:

    x ∈ L ⇒ ∃y, |y| ≤ p(|x|), such that V(x, y) accepts   (Completeness)
    x ∉ L ⇒ ∀y with |y| ≤ p(|x|), V(x, y) rejects   (Soundness)
We may generalise this as in Figure 13.1 below, where the process of recognizing whether x ∈ L is carried out by a verifier V, who must run in polynomial time, and a prover P, who has unbounded time and space. The task of the prover is to convince the verifier that x ∈ L, and the verifier is a skeptic whose task is to be convinced if and only if the string actually belongs to the language.

Now NP is the set of languages L for which there is a prover P and a polynomial time verifier V such that:

    x ∈ L ⇒ P has a strategy to convince V   (Completeness)
    x ∉ L ⇒ P has no strategy to convince V   (Soundness)

By a strategy we mean in this case the polynomially small certificate or proof (corresponding to y in Figure 13.1) that P supplies to V; later we will generalise it to interaction, i.e., a sequence of messages exchanged between the two, where a strategy for P is a function from the sequence of messages seen so far to the next message that P must send to V.
Figure 13.1: NP with a prover-verifier mechanism
Thus, we generalise the above definition to get the class IP. The class IP differs in the following two ways:

1. There is randomness: the verifier may be a randomized machine.

2. There is interaction: unlike the above, where a single static proof is sent across, there are rounds of communication, in which the verifier may "ask" further questions of the prover based on the messages that have been sent to it.

First, we shall see that both of the above ingredients are required.
13.1.1 NP + Interaction = NP
Let the class NP′ be the class corresponding to NP + interaction. That is, we allow the verifier to ask questions, and thus a strategy is a function from sequences of messages (the past communication) to messages (the next message to send). We say that NP′ is the set of languages L such that there is a prover P and a polynomial time verifier V such that:

    x ∈ L ⇒ P has a strategy to convince V
    x ∉ L ⇒ P has no strategy to convince V

It is easy to check that NP′ = NP. We only need to show NP′ ⊆ NP, as the other inclusion is obvious. Say L ∈ NP′, and let P, V be the prover and verifier for L. Define P′, V′, the one-round prover and verifier, as follows: P′ sends to V′ the entire transcript of the interaction between P and V as the certificate. Suppose that interaction was the sequence of messages y_1, . . . , y_n, as in Figure 13.2 below. V′ is essentially like V, only it has been given all the answers to its questions at once. Thus it takes the first message of P, y_1; its response (behaving as V) is exactly y_2; y_3 is the reply to that, and so on: it "asks" the questions and is supplied all the answers on the same transcript, and it accepts iff after this exchange V accepted. Thus the language accepted by P′, V′ is the same as L, and so L ∈ NP. Note that the entire set of messages is polynomially long, as there are polynomially many messages and each is polynomially long.

Note that for the above to work, the prover must at the very beginning know what responses V may make, which is possible only if V is deterministic. If V were randomized, then to cover all possible questions in a polynomially long interaction P′ would have to send an exponentially large transcript.
Figure 13.2: Simulating several rounds of interaction in one round
13.1.2 NP + Randomness
By adding randomness to the class NP we get the class MA.

Definition 22 (MA) L ∈ MA iff there exists a probabilistic polynomial time machine V such that:

    x ∈ L ⇒ ∃y. Pr[V(x, y) accepts] ≥ 2/3
    x ∉ L ⇒ ∀y. Pr[V(x, y) accepts] ≤ 1/3

As in similar classes, the success probabilities may be boosted arbitrarily high. It is conjectured that MA = NP. It is known that if coNP ⊆ MA then the polynomial hierarchy collapses.
We are now in a position to formally define the class IP.
13.2 IP
Definition 23 (IP) A language L is in IP(r(·)) iff there exists a probabilistic polynomial time verifier V such that:

    x ∈ L ⇒ ∃P. Pr[V interacting with P accepts] ≥ 2/3
    x ∉ L ⇒ ∀P. Pr[V interacting with P accepts] ≤ 1/3

where V uses at most r(|x|) rounds of interaction.

Note that by round we really mean a single "message," and not a question-answer sequence. That is, by two rounds we mean that the verifier asks a question of the prover and the prover replies, which is exactly two messages.

A related class is AM, which is defined as follows:

Definition 24 (AM) A language L is in AM(r(·)) iff L ∈ IP(r(·)) and at each round the verifier sends a random message, that is, a message that is completely random and independent of all previous communication.
83
Proof systems of the above kind are also known as public-coin proof systems.
There are the following surprising theorems about the classes AM and IP.

Theorem 61 IP(r(n)) ⊆ AM(r(n) + 2)

Theorem 62 For all r(n) ≥ 2, AM(2r(n)) ⊆ AM(r(n)).

This yields the following corollaries:

Corollary 63 AM(O(1)) ⊆ AM(2)

That is, all constant-round AM proofs can be done in just two rounds. And also:

Corollary 64 IP(O(1)) = AM(O(1)) = AM(2)

Finally, we have the famous theorem of Shamir:

Theorem 65 IP(poly(n)) = PSPACE

We know that AM(2) is good enough for systems with a constant number of rounds, but it would be surprising indeed if polynomially many rounds could be simulated in a constant number of rounds. Because of the above results, when people say AM they mean AM(2), but when people say IP they mean IP(poly).
There is also the following theorem, which gives an upper bound on IP with a constant number of rounds.

Theorem 66 If coNP ⊆ IP(O(1)) then the polynomial hierarchy collapses.

Thus, for a long time it was believed that IP could not capture coNP, as it was thought that IP with a constant number of rounds was roughly as good as IP with polynomially many rounds. However, it was shown that in fact coNP is contained in IP(poly(n)), the proof of which we shall shortly see; a few weeks after that result, Shamir proved that in fact IP is the same as PSPACE.
13.3 An Example: Graph Non-Isomorphism
To get a feel of the power of interaction and randomness, let us look at an example: Graph Non-Isomorphism. The problem is the following:

• Input: graphs G_1 = (V_1, E_1) and G_2 = (V_2, E_2).

• Output: decide whether there exists NO bijection π : V_1 → V_2 such that (u, v) ∈ E_1 iff (π(u), π(v)) ∈ E_2.

This problem is not known to be in NP. The difficulty lies in constructing a succinct proof that no permutation of the vertices is good, given that there are exponentially many permutations. Let us see how this problem is in AM(2).
Suppose that the skeptic Arthur (the verifier) has the two graphs G_1, G_2 and wishes to know whether or not all the permutations are bad. His only resources are randomness and his wizard Merlin, who is all-knowing (the prover).

He does the following:

1. Picks a random b ∈ {1, 2}.

2. Creates G as a random permutation of G_b.

3. Sends G to Merlin and demands to know which of the two graphs G is a permutation of.

4. If Merlin is correct, Arthur believes that the graphs are non-isomorphic (accepts); otherwise Merlin was incorrect, and Arthur believes the graphs are isomorphic (rejects).
Notice that if the graphs are really non-isomorphic, then Merlin will always be able to tell which graph was sent to him, as only one of the graphs will be isomorphic to the random permutation that Merlin is given; with his infinite resources Merlin will figure out which one, reply correctly, and so make Arthur accept with probability 1. If the graphs are isomorphic, then Merlin cannot tell, as G is a permutation of both G_1 and G_2. Merlin's answer depends only on G_1, G_2, G; say it is i. Now i = b with probability 1/2, as b is picked independently at the start. Thus, in the case that the graphs are isomorphic, Arthur accepts only with probability 1/2. By repeating this k times we can shrink this probability to 2^{-k}.
Thus we have a protocol, a prover and a verifier for Graph Non-Isomorphism, and so it is in AM(2).
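The protocol can be simulated end to end on small graphs. A minimal sketch (the graph encoding and helper names are our own, not from the notes), in which Merlin's unbounded power is played by brute force over all vertex permutations:

```python
import itertools
import random

def permute(graph, n, pi):
    # Apply the vertex permutation pi to an edge set over vertices 0..n-1.
    return frozenset(frozenset((pi[u], pi[v])) for u, v in graph)

def isomorphic(g1, g2, n):
    # Merlin's brute-force check: try all n! permutations.
    return any(permute(g1, n, pi) == g2
               for pi in itertools.permutations(range(n)))

def arthur_round(g1, g2, n):
    # One round: Arthur sends a random permutation of a random G_b;
    # Merlin answers with the index he believes was chosen.
    b = random.choice([1, 2])
    pi = list(range(n))
    random.shuffle(pi)
    g = permute(g1 if b == 1 else g2, n, pi)
    merlin_answer = 1 if isomorphic(g1, g, n) else 2  # honest, all-powerful Merlin
    return merlin_answer == b                         # Arthur accepts iff Merlin names b

# G1 is a triangle, G2 a path: non-isomorphic, so Merlin is always right.
g1 = frozenset(frozenset(e) for e in [(0, 1), (1, 2), (0, 2)])
g2 = frozenset(frozenset(e) for e in [(0, 1), (1, 2)])
assert all(arthur_round(g1, g2, 3) for _ in range(20))
```

If instead g1 and g2 were isomorphic, Merlin's answer would match b only half the time, exactly as in the analysis below.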
Note: the verifier above has learned nothing at all about the graphs that he did not know before. He does not have a proof or certificate with which he may convince a third party that the graphs are non-isomorphic. The entire power was in the interaction between him and the prover, and the one bit that the prover gave him gives him significant confidence but no knowledge. This was therefore a Zero Knowledge Proof.
Remark 1 Graph Non-Isomorphism ∈ IP(2) = AM(2). Thus, Graph Isomorphism is not NP-complete unless the polynomial hierarchy collapses. This is because, if Graph Isomorphism were NP-complete, then Graph Non-Isomorphism would be coNP-complete; from the above result we would have coNP ⊆ AM(2), and from Theorem 66 the polynomial hierarchy would collapse.
13.4 Random Self-Reductions

It is not known how to prove worst-case to average-case equivalence for NP-complete problems. In this section we show that an approach similar to the one taken for #P, PSPACE and EXP does not work.
The worst-case to average-case equivalence results that we have seen so far rely on procedures to compute polynomials exactly given a procedure that computes the polynomial well on average. In all cases, given a point x on which we want to evaluate the polynomial, we pick a random line passing through x and then evaluate the good-on-average procedure on points along the line. Since each such point is uniformly distributed, we are able to get the correct value of the polynomial on all points (in the simplest case) or on a majority of points (in the procedures based on Berlekamp-Welch). The following definition captures the way such reductions work.
Definition 25 (Random Self-Reduction) We say that a problem L has a non-adaptive random self-reduction if there is a probabilistic polynomial time algorithm R such that, given an input x ∈ {0,1}^n, R(x) produces a sequence of inputs y_1, ..., y_k ∈ {0,1}^{n'}, where k, n' = n^{O(1)}, and then, given L(y_1), ..., L(y_k), R decides correctly whether x ∈ L with high probability. Furthermore, for every x ∈ {0,1}^n and every i, the string y_i is uniformly distributed in {0,1}^{n'}.
The intuition for the use of a random self-reduction is that if we have an algorithm A that solves L on, say, a fraction 1 − 1/4k of the elements of {0,1}^{n'}, then we can run R, generate y_1, ..., y_k, and then continue using the answers A(y_1), ..., A(y_k), which are going to be all correct with probability at least 3/4. So A can be used to solve L everywhere on inputs of length n.
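As a concrete illustration (not from the notes), the function f(x) = a·x mod 2 has a random self-reduction with two queries per run: f(x) = f(x ⊕ r) ⊕ f(r), where both x ⊕ r and r are uniformly distributed. A sketch against a hypothetical faulty oracle (the oracle's error set is an arbitrary choice for the example):

```python
import random

n = 32
a = random.getrandbits(n)

def f(x):
    # The true function: inner product mod 2 of the bit vectors a and x.
    return bin(a & x).count("1") % 2

def faulty_oracle(x):
    # Hypothetical good-on-average oracle: wrong on a ~1/997 fraction of inputs.
    return 1 - f(x) if x % 997 == 0 else f(x)

def self_correct(x):
    # f(x) = f(x XOR r) XOR f(r); each query is uniformly distributed, so each
    # hits a bad input only with the oracle's small error probability.
    r = random.getrandbits(n)
    return faulty_oracle(x ^ r) ^ faulty_oracle(r)

# Even on an input where the oracle itself is wrong, a majority vote over
# independent random shifts recovers the correct value with high probability.
x = 997 * 5
votes = [self_correct(x) for _ in range(101)]
assert max(set(votes), key=votes.count) == f(x)
```

Each call to self_correct is wrong with probability at most about 2/997, so the majority of 101 runs is correct except with negligible probability.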
We note that the above definition is restrictive in two ways. First, R generates the points y_1, ..., y_k non-adaptively, that is, it chooses y_2 before knowing L(y_1), and so on. Secondly, it is not really necessary that all points y_1, ..., y_k be uniformly distributed. Suppose that each point y_i is just generated according to a "smooth" distribution (that is, say, no element of {0,1}^{n'} is generated with probability larger than c/2^{n'}), and that we have an algorithm A that succeeds on at least a fraction 1 − 1/4ck of {0,1}^{n'}. Then, using the randomized reduction together with A, we still have probability at least 3/4 that all queries are correctly answered.
Nothing is known about the existence of adaptive and/or smooth random self-reductions, but the following theorem rules out non-adaptive uniform random self-reductions.
Theorem 67 Suppose an NP-complete language L has a random self-reduction as in Definition 25. Then the polynomial hierarchy collapses to Σ_3.
Proof: We show that if L ∈ NP and L has a random self-reduction, then the complement of L has an AM[2] protocol (that uses some non-uniformity), and if L were NP-complete this would imply the collapse of the hierarchy.

Let R be the random self-reduction for L. Let us define a protocol for the complement of L on inputs of length n; let n' be the length of the queries generated by R(x), let k be the number of queries, and let

    p = Pr_{y ∈ {0,1}^{n'}} [y ∈ L].

The value p will be the non-uniformity used in the protocol.
We fix a parameter m = Θ((k log k)^2). Given x ∈ {0,1}^n:

• The verifier runs R(x) m times independently, with fresh randomness each time, and generates queries (y^1_1, ..., y^1_k), ..., (y^m_1, ..., y^m_k). The verifier sends all such queries to the prover. (To make this an AM protocol, the verifier may instead just send the randomness it used to generate the m runs of R(x), and let the prover figure out the corresponding queries y^i_j.)

• The prover responds by telling whether each y^i_j is in L or not, and by certifying all the YES answers with an NP witness (remember that L is in NP).

• The verifier accepts if and only if the following conditions hold:

1. R(x) rejects in all the m iterations, assuming the answers from the prover were correct,

2. all certificates sent by the prover are valid, and

3. for every j = 1, ..., k, at least pm − m/2k of the queries y^1_j, ..., y^m_j are answered YES.
We need to argue that if x ∉ L then the verifier accepts with high probability when interacting with a valid prover, and that if x ∈ L then the verifier accepts with low probability when interacting with an arbitrary cheating prover.

If x ∉ L and the prover follows the protocol, then R(x) rejects in all the m iterations, and the verifier accepts, provided that the third condition is satisfied. Note that for each fixed j, the strings y^1_j, ..., y^m_j are independent and uniformly distributed elements of {0,1}^{n'}, and each one has probability p of being a YES instance. Using Chernoff bounds, for each fixed j there is probability at least 1 − 1/4k that at least pm − O(√(m log k)) of them are YES instances, and this number is larger than pm − m/2k if m ≥ c(k log k)^2 for a sufficiently large c. Then there is probability at least 3/4 that the condition is true for all j.
If x ∈ L, then the verifier rejects if the prover sends valid answers for any one of the m runs of R(x); so, if the verifier accepts, it must be because the prover sent a set of answers containing at least m errors. All the erroneous answers of the prover must be YES instances for which it answers NO (if the prover tried to cheat in the other direction, it would be caught by its inability to provide a valid certificate). This means that there is some j such that, among the queries y^1_j, ..., y^m_j, at least m/k are answered NO even though they were YES instances. By the calculation in the above paragraph, very likely there were no more than pm + O(√(m log k)) YES instances to start with among y^1_j, ..., y^m_j, and so the prover has to pretend that there are only pm + O(√(m log k)) − m/k, which is less than pm − m/2k by our choice of m; so the verifier, with high probability, rejects when checking condition (3). □
13.5 References

The classes IP and AM were defined independently. The class IP was defined by Shafi Goldwasser, Silvio Micali and Charles Rackoff [GMR89], who proposed it as an intermediate step towards the definition of Zero Knowledge Proofs. The paper [GMR89] was written in 1983 but first appeared in FOCS 85. The class AM was defined by László Babai [Bab85] (the journal version of the paper is by Babai and Moran [BM88]), in order to characterize the complexity of a group-theoretic problem. The paper also appeared in FOCS 85, and contained Theorem 62. The interactive proof for graph non-isomorphism is from [GMW91]. Theorem 61 is due to Goldwasser and Sipser [GS86]. Credit for the proof that IP = PSPACE is typically shared between the work of Lund et al. [LFKN92] and of Shamir [Sha92].

The results of Section 13.4 are due to Feigenbaum and Fortnow [FF93].
Exercises

The following exercises give a proof of Theorem 66.

1. A non-deterministic circuit C has two inputs x = x_1, ..., x_m and y = y_1, ..., y_n; C accepts the string x iff ∃y. C(x, y) = 1. Show that every problem in MA has non-deterministic circuits of polynomial size. Slightly harder: show that every problem in AM[2] has non-deterministic circuits of polynomial size.

[Hint: the proof is similar to the proof that BPP has polynomial-size circuits.]

2. Prove that if coNP has polynomial-size non-deterministic circuits, then the polynomial hierarchy collapses.

[Hint: this is not too different from Karp-Lipton.]
Lecture 14

IP = PSPACE

In this lecture we will see that IP = PSPACE. We start by giving an interactive proof system for UNSAT, and therefore show that coNP ⊆ IP. In fact, as we will see, the same proof system with minor modifications can be used for #SAT. Since #SAT is #P-complete, and since PH ⊆ P^{#P}, it follows that PH ⊆ IP. Finally, we generalize this protocol in order to solve QSAT, and thus show that PSPACE ⊆ IP.
14.1 UNSAT ⊆ IP

Let φ be a formula with m clauses over the variables x_1, ..., x_n. Let N > 2^n · 3^m be a prime number. We translate φ into a polynomial p over the field of integers mod N in the following way. Literals are translated as follows: x_i → x_i, ¬x_i → (1 − x_i). A clause is translated to the sum of the (at most 3) expressions corresponding to the literals of the clause. Finally, p is taken as the product of the m expressions corresponding to the m clauses. Notice that each literal is of degree 1, and therefore p is of degree at most m. Furthermore, for a 0-1 assignment, p evaluates to 0 if the assignment does not satisfy φ, and to a nonzero number, which can be at most 3^m, if the assignment does satisfy φ. Hence, φ is unsatisfiable iff

    Σ_{x_1 ∈ {0,1}} Σ_{x_2 ∈ {0,1}} ... Σ_{x_n ∈ {0,1}} p(x_1, ..., x_n) ≡ 0 (mod N).

Also, the translation to p can be done in polynomial time, and if we take N = 2^{O(n+m)}, then elements of the field can be represented by O(n + m) bits, and thus one evaluation of p at any point of {0, ..., N−1}^n can be computed in polynomial time.
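The translation can be sketched concretely. In this minimal implementation (the clause encoding as (variable index, negated?) pairs is our own), the unsatisfiability criterion is checked by brute-force summation over the cube:

```python
from itertools import product

def p_unsat(clauses, assignment, N):
    # The polynomial p: product over clauses of the sum of literal
    # polynomials, where literal x_i -> x_i and ~x_i -> (1 - x_i).
    val = 1
    for clause in clauses:
        s = sum((1 - assignment[i]) if neg else assignment[i]
                for i, neg in clause)
        val = (val * s) % N
    return val

def full_sum(clauses, n, N):
    # phi is unsatisfiable iff the sum over all 0-1 assignments is 0 mod N.
    return sum(p_unsat(clauses, a, N) for a in product([0, 1], repeat=n)) % N

# (x0 or x1) and (~x0 or x1) and (~x1) is unsatisfiable; (x0 or x1) is not.
phi_unsat = [[(0, False), (1, False)], [(0, True), (1, False)], [(1, True)]]
phi_sat = [[(0, False), (1, False)]]
N = 127  # prime, and larger than 2^n * 3^m for these tiny formulas
assert full_sum(phi_unsat, 2, N) == 0
assert full_sum(phi_sat, 2, N) != 0
```

The brute-force sum is of course exponential; the point of the protocol below is that the verifier never has to compute it.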
We now describe a protocol that enables the prover to convince the verifier that φ is unsatisfiable (see also Figure 14.1 for a sketch of the protocol). In the first round, the prover sends N, a proof that it is prime, and a degree-m univariate polynomial

    q_1(x) = Σ_{x_2,...,x_n ∈ {0,1}} p(x, x_2, ..., x_n).

At this point, the verifier verifies the primality proof and checks that q_1(0) + q_1(1) = 0. If any of the checks fail, then the verifier rejects. Otherwise, it continues by choosing a random r_1 ∈ {0, ..., N−1} and sending it to the prover. The prover responds by sending a degree-m univariate polynomial

    q_2(x) = Σ_{x_3,...,x_n ∈ {0,1}} p(r_1, x, x_3, ..., x_n).

Now the verifier checks whether q_2(0) + q_2(1) = q_1(r_1). If not, it rejects; otherwise it continues by choosing a random r_2 ∈ {0, ..., N−1} and
Figure 14.1: The protocol for proving unsatisfiability. The prover P and verifier V exchange the following messages:

    P → V: N, with a proof that N is prime, and q_1(x) = Σ_{x_2,...,x_n ∈ {0,1}} p(x, x_2, ..., x_n)
    V: checks the primality proof and that q_1(0) + q_1(1) = 0; picks a random r_1 ∈ {0, ..., N−1}
    V → P: r_1
    P → V: q_2(x) = Σ_{x_3,...,x_n ∈ {0,1}} p(r_1, x, x_3, ..., x_n)
    V: checks that q_2(0) + q_2(1) = q_1(r_1); picks a random r_2 ∈ {0, ..., N−1}
    V → P: r_2
    ...
    P → V: q_i(x) = Σ_{x_{i+1},...,x_n ∈ {0,1}} p(r_1, ..., r_{i−1}, x, x_{i+1}, ..., x_n)
    V: checks that q_i(0) + q_i(1) = q_{i−1}(r_{i−1}); picks a random r_i ∈ {0, ..., N−1}
    V → P: r_i
    ...
    P → V: q_n(x) = p(r_1, ..., r_{n−1}, x)
    V: checks that q_n(0) + q_n(1) = q_{n−1}(r_{n−1}); picks a random r_n ∈ {0, ..., N−1}; checks that q_n(r_n) = p(r_1, ..., r_n)
sending it to the prover. The protocol continues in this way, where the i-th message that the prover sends is the polynomial

    q_i(x) = Σ_{x_{i+1},...,x_n ∈ {0,1}} p(r_1, ..., r_{i−1}, x, x_{i+1}, ..., x_n).

The verifier responds by checking that q_i(0) + q_i(1) = q_{i−1}(r_{i−1}), rejecting if not, and otherwise choosing a random r_i ∈ {0, ..., N−1} and sending it to the prover. In the last round, the prover sends q_n(x) = p(r_1, ..., r_{n−1}, x). The verifier checks that q_n(0) + q_n(1) = q_{n−1}(r_{n−1}), chooses a random r_n ∈ {0, ..., N−1}, and checks that q_n(r_n) = p(r_1, ..., r_n). The verifier rejects if either of the checks fails, and accepts otherwise. The verifier can indeed perform the last check since, as mentioned before, it can translate φ into p and evaluate it at the point (r_1, ..., r_n) in polynomial time. Also, the messages are of polynomial length, because elements of the field can be written with O(m + n) bits, and each polynomial has at most m + 1 coefficients.
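The rounds above can be simulated with an honest prover. In this sketch (our own simplification, not from the notes) the prover's polynomial q_i is passed to the verifier as an evaluation oracle rather than as a list of coefficients; everything else follows the protocol:

```python
import random
from itertools import product

def run_protocol(p_eval, n, N):
    # Simulation of the unsatisfiability protocol with an honest prover.
    # p_eval(xs) evaluates the arithmetized formula p at a point of Z_N^n.
    rs = []  # random field elements chosen by the verifier so far

    def q(i, t):
        # Honest prover's round-i polynomial, evaluated at t:
        # sum of p(r_1, ..., r_{i-1}, t, x_{i+1}, ..., x_n) over boolean tails.
        return sum(p_eval(rs[:i - 1] + [t] + list(tail))
                   for tail in product([0, 1], repeat=n - i)) % N

    prev = 0  # claimed value of the full sum: 0 iff phi is unsatisfiable
    for i in range(1, n + 1):
        if (q(i, 0) + q(i, 1)) % N != prev:
            return False            # verifier's consistency check fails
        rs.append(random.randrange(N))
        prev = q(i, rs[-1])
    return prev == p_eval(rs) % N   # final check: direct evaluation of p

# p for phi = (x0 or x1) and (~x0 or x1) and (~x1), which is unsatisfiable.
N = 127
def p_eval(xs):
    x0, x1 = xs
    return ((x0 + x1) * ((1 - x0) + x1) * (1 - x1)) % N

assert run_protocol(p_eval, 2, N)   # honest prover always convinces V
```

A cheating prover would have to replace q with polynomials inconsistent with p, which the final check catches with probability ≥ 1 − nm/N, as shown next.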
We now show the correctness of this proof system. If φ is unsatisfiable, then the prover can make the verifier accept with probability 1 by just following the protocol, since q_1(0) + q_1(1) = Σ_{x_1,...,x_n ∈ {0,1}} p(x_1, ..., x_n) ≡ 0 (mod N), and it is clear that the rest of the checks will succeed if the prover sends the q_i's accordingly. We have to show that if φ is satisfiable, then no matter what polynomials the prover sends, the verifier will reject with high probability. We write p_i(x) for the polynomial that a prover who follows the protocol sends in round i. Since φ is satisfiable, Σ_{x_1,...,x_n ∈ {0,1}} p(x_1, ..., x_n) ≢ 0 (mod N). Therefore p_1(0) + p_1(1) ≠ 0, and hence, if the prover sends q_1 = p_1, the verifier will reject. If the prover sends q_1 ≠ p_1, then, since both are degree-m polynomials, they agree on at most m places. Thus, with probability ≥ 1 − m/N we have p_1(r_1) ≠ q_1(r_1). Suppose that indeed p_1(r_1) ≠ q_1(r_1). If the prover then sends q_2 = p_2, the verifier will reject, because q_2(0) + q_2(1) = p_1(r_1) ≠ q_1(r_1). Thus, the prover must send q_2 ≠ p_2. Again, in that case, q_2 and p_2 will differ on a fraction ≥ 1 − m/N of the elements of {0, ..., N−1}, so p_2(r_2) ≠ q_2(r_2) with probability ≥ 1 − m/N.

We continue the argument in a similar way. If q_i ≠ p_i, then with probability ≥ 1 − m/N, r_i is such that p_i(r_i) ≠ q_i(r_i). If so, then the prover must send q_{i+1} ≠ p_{i+1} in order for the verifier not to reject. At the end, if the verifier has not rejected before the last check, then with probability ≥ 1 − (n − 1)·m/N we have q_n ≠ p_n. If so, then with probability ≥ 1 − m/N the verifier will reject, since, again, q_n(x) and p(r_1, ..., r_{n−1}, x) differ on at least that fraction of the points. Thus, the total probability that the verifier will accept is at most nm/N.
14.2 A Proof System for #SAT

We now show that with a few minor modifications, the previous proof system can be used to prove that a formula has a given number of satisfying assignments. Suppose φ has k satisfying assignments. We wish to give a protocol such that if the prover gives k as an answer, he is able to continue with the protocol so that the verifier accepts with probability 1; but if he gives another number as an answer, then no matter what messages the prover sends afterwards, the verifier rejects with high probability.

We first change the way we translate φ into the polynomial p. The change is only in the way we translate clauses. Instead of translating a clause to the sum of the literals, we translate the clause (z_1 ∨ z_2 ∨ z_3) to 1 − (1 − z_1)(1 − z_2)(1 − z_3). Notice that if a 0-1 assignment to the variables satisfies the clause, then the value of this expression will be 1, and if the assignment does not satisfy the clause, the value will be 0. Hence, 0-1 assignments that satisfy φ will be evaluated to 1 by p, while, as before, the 0-1 assignments that do not satisfy φ will be evaluated to 0. Notice that now the degree of p is 3m instead of m, but the sum of p over all 0-1 assignments is equal to the number of satisfying assignments of φ. For the same reason, it is now enough to take N > 2^n.

In the current protocol, the prover starts by sending k, and then continues as in the previous protocol (with the updated definition of p). After receiving the first message, the verifier checks whether q_1(0) + q_1(1) = k instead of checking q_1(0) + q_1(1) = 0, and follows the rest of the previous protocol. If k is indeed the correct number of satisfying assignments, then the prover can continue by sending q_i = p_i at each round, and the verifier will accept with probability 1. If k is not the correct number of satisfying assignments, then, by the same analysis as in the previous section, the verifier will accept with probability at most 3mn/N.
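The modified clause translation can be checked directly: on 0-1 inputs each clause expression equals the clause's truth value, so summing p over the cube counts satisfying assignments. A minimal sketch (clause encoding as (index, negated?) pairs is our own):

```python
from itertools import product

def p_count(clauses, assignment):
    # Clause (z1 or z2 or z3) -> 1 - (1 - z1)(1 - z2)(1 - z3),
    # with literal ~x_i -> 1 - x_i.
    val = 1
    for clause in clauses:
        miss = 1
        for i, neg in clause:
            z = (1 - assignment[i]) if neg else assignment[i]
            miss *= (1 - z)
        val *= 1 - miss
    return val

# phi = (x0 or x1) and (~x0 or x2): 4 of the 8 assignments satisfy it.
phi = [[(0, False), (1, False)], [(0, True), (2, False)]]
count = sum(p_count(phi, a) for a in product([0, 1], repeat=3))
assert count == 4
```

Running the previous protocol with this p and the first check changed to q_1(0) + q_1(1) = k then proves the count k.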
14.3 A Proof System for QBF

14.3.1 A PSPACE-Complete Language: TQBF

For a 3CNF boolean formula φ(x_1, x_2, ..., x_n), we may think of its satisfiability problem as determining the truth value of the statement

    ∃x_1 ∃x_2 ... ∃x_n. φ(x_1, x_2, ..., x_n).

We can generalize this idea by allowing universal quantifiers in addition to existential quantifiers, obtaining what are called quantified boolean formulas. For example, ∀x_1 ∃x_2. (x_1 ∨ x_2) ∧ (¬x_1 ∨ ¬x_2) is a quantified boolean formula; in fact, it is easy to see that it is a true quantified boolean formula. Now consider the language of all true quantified boolean formulas:

    TQBF = {Φ : Φ is a true quantified boolean formula}.

It is known that TQBF is PSPACE-complete. Thus, if we have an interactive-proof protocol for recognizing TQBF, then we have a protocol for recognizing any language in PSPACE. In the rest of the lecture, we will provide a protocol for recognizing TQBF.
14.3.2 Arithmetization of TQBF

We will consider the following quantified boolean formula:

    Φ = ∀x_1 ∃x_2 ∀x_3 ... ∀x_n. φ(x_1, x_2, ..., x_n),

where φ(x_1, x_2, ..., x_n) is a 3CNF boolean formula. Without loss of generality, we can assume that all the quantified boolean formulas given to us have this form. The main idea in the interactive protocol for recognizing TQBF is the same as in proving coNP ⊆ IP in the last lecture: we will first arithmetize the quantified boolean formula, and then the prover will convince the verifier that the arithmetized expression evaluates to 1. In the following, all the random elements are drawn from a field {0, 1, ..., p − 1}, where p is sufficiently large.
Naive Solution

So how do we arithmetize the quantified formula? We will first show a naive solution and point out its problems. To begin with, we know from the last lecture how to arithmetize φ(x_1, x_2, ..., x_n) to obtain the polynomial F(x_1, x_2, ..., x_n). Recall that F has degree 3m in each variable (m is the number of clauses in φ) and agrees with φ on all 0-1 assignments to the n variables. Now we read the quantifiers from right to left and consider the expression ∀x_n. φ(x_1, x_2, ..., x_n). This expression has n − 1 free variables, and for each substitution of values for these variables the expression itself evaluates to either true or false. We are looking for a polynomial with the same behavior. Using F we can now write a new polynomial P_{∀x_n} F(x_1, x_2, ..., x_n), which is equal to

    F(x_1, x_2, ..., x_{n−1}, 0) · F(x_1, x_2, ..., x_{n−1}, 1).

Next consider the previous quantifier ∃x_{n−1}. We want to find a polynomial representation for ∃x_{n−1} ∀x_n. φ(x_1, x_2, ..., x_n), given the polynomial representation G(x_1, x_2, ..., x_{n−1}) for ∀x_n. φ(x_1, x_2, ..., x_n). The following polynomial, which we denote by P_{∃x_{n−1}} G(x_1, x_2, ..., x_{n−1}), satisfies the condition:

    1 − (1 − G(x_1, x_2, ..., x_{n−2}, 0)) · (1 − G(x_1, x_2, ..., x_{n−2}, 1)).

In the notation introduced above, the polynomial for ∃x_{n−1} ∀x_n. φ(x_1, x_2, ..., x_n) is

    P_{∃x_{n−1}} P_{∀x_n} F(x_1, x_2, ..., x_n).
In general, we transform the quantified boolean formula Φ into an arithmetic expression using the operators P_{∀u} and P_{∃v} as follows:

• Turn the 3CNF formula φ(x_1, x_2, ..., x_n) into the polynomial F(x_1, x_2, ..., x_n) as done in the last lecture.

• Replace each quantifier ∃x_i by the operator P_{∃x_i}.

• Replace each quantifier ∀x_i by the operator P_{∀x_i}.

The final expression after the transformation always evaluates to 0 or 1, and it evaluates to 1 if and only if the quantified boolean formula Φ is true.
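The operators P_∀ and P_∃ can be evaluated recursively on small instances. A minimal sketch (the "A"/"E" quantifier encoding is our own), using the product arithmetization of the clauses of a small example formula:

```python
def qbf_value(quantifiers, F, partial=()):
    # Evaluate P_{Q1 x1} ... P_{Qn xn} F by the recursive definitions:
    #   P_forall G = G(...,0) * G(...,1)
    #   P_exists G = 1 - (1 - G(...,0)) * (1 - G(...,1))
    if not quantifiers:
        return F(partial)
    a = qbf_value(quantifiers[1:], F, partial + (0,))
    b = qbf_value(quantifiers[1:], F, partial + (1,))
    return a * b if quantifiers[0] == "A" else 1 - (1 - a) * (1 - b)

# Arithmetization of (x1 or x2) and (~x1 or ~x2):
#   (x1 + x2 - x1*x2) * (1 - x1*x2).
F = lambda xs: (xs[0] + xs[1] - xs[0] * xs[1]) * (1 - xs[0] * xs[1])

# forall x1 exists x2 . (x1 or x2) and (~x1 or ~x2) is true:
assert qbf_value("AE", F) == 1
# forall x1 forall x2 of the same matrix is false:
assert qbf_value("AA", F) == 0
```

This direct evaluation takes exponential time, of course; the protocol below lets a polynomial-time verifier check the value instead.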
We have thus arrived at a way of representing quantified boolean formulas using arithmetic operations. Now consider the arithmetic expression after the transformation,

    P_{∀x_1} P_{∃x_2} ... P_{∀x_n} F(x_1, x_2, ..., x_n),

which evaluates to either 0 or 1. Let us apply the idea from the last lecture to derive a protocol and see what happens. In the first step, the prover eliminates the first operator P_{∀x_1} and sends the verifier the polynomial G_1(x_1) corresponding to the rest of the arithmetic expression:

    G_1(x_1) = P_{∃x_2} P_{∀x_3} ... P_{∀x_n} F(x_1, x_2, ..., x_n).
Upon receiving the polynomial Ĝ_1(x_1) (which the prover claims to be G_1(x_1)), the verifier checks that Ĝ_1(0) · Ĝ_1(1) = 1. The verifier then picks a random value r_1, computes β_1 = Ĝ_1(r_1), and sends r_1 to the prover. The prover then needs to convince the verifier that

    β_1 = P_{∃x_2} P_{∀x_3} ... P_{∀x_n} F(r_1, x_2, ..., x_n).

The prover then proceeds by eliminating the operators one after another, and in the end the verifier checks that β_n = F(r_1, r_2, ..., r_n).

However, since each operator potentially doubles the degree of each variable in the expression, the degree of the polynomial G_1(x_1) in the end can very well be exponential. This means that the prover would have to send the verifier exponentially many coefficients, which the polynomially bounded verifier will not be able to read.
Revised Solution

We will ensure that the degree of any variable at any intermediate stage of the transformation never goes above two. The arithmetization given above clearly does not satisfy this property, yet there is a simple way to fix it. At any stage of the transformation, we have a polynomial J(x_1, x_2, ..., x_n) in which some variables' degrees might exceed two. Normally, we cannot expect to transform J into a new polynomial J'(x_1, x_2, ..., x_n) in which the degree of each variable is at most two while still evaluating to the same value at all points. Notice, however, that we only need J' to agree with J on all 0-1 assignments; what J' does at any other point is of no interest to us. The key observation then is that for x = 0 or x = 1 we have x^k = x for all positive integers k. The desired polynomial J' can thus be obtained from J by erasing all exponents, that is, replacing them with 1. For example, if J(x_1, x_2, x_3) = x_1^3 x_2^4 + 5 x_1 x_2^3 + x_2^6 x_3^2, the transformed polynomial J' is 6 x_1 x_2 + x_2 x_3. In general, we define a new operator, Rx_i, which when applied to a polynomial reduces the exponent of x_i to 1 at all occurrences. More formally, we have

    Rx_i J(x_1, ..., x_n) = x_i · J(x_1, ..., x_{i−1}, 1, x_{i+1}, ..., x_n) + (1 − x_i) · J(x_1, ..., x_{i−1}, 0, x_{i+1}, ..., x_n).

In this notation, we have

    J'(x_1, ..., x_n) = Rx_1 Rx_2 ... Rx_n J(x_1, ..., x_n).
In the revised arithmetization process, the only change we make is to apply the reduce operators whenever the degree of some variable could potentially exceed 1 in the polynomial obtained from the previous step. In particular, we apply the reduce operators after the 3CNF formula φ is arithmetized into the polynomial F(x_1, ..., x_n). Also, since the other two operators, P_{∀u} and P_{∃v}, might double the degree of some variables in the transformation, we apply the reduce operators after each application of P_{∀u} or P_{∃v}. This revision ensures that the degree of any variable at any intermediate stage never goes above 2: the degree of some variable might reach 2 after a P_{∀u} or P_{∃v} operator, but it will be reduced back to 1 after the reduce operators are applied.
14.3.3 The Interactive Protocol

Using the revised arithmetization, we arithmetize the quantified boolean formula

    Φ = ∀x_1 ∃x_2 ∀x_3 ... ∀x_n. φ(x_1, ..., x_n)

into

    E = P_{∀x_1} Rx_1 P_{∃x_2} Rx_1 Rx_2 P_{∀x_3} Rx_1 Rx_2 Rx_3 P_{∃x_4} ... P_{∀x_n} Rx_1 ... Rx_n F(x_1, ..., x_n).
Now the idea of the interactive protocol from the last lecture can be put to use without any trouble. The prover and the verifier communicate to eliminate one operator at each stage; in the end, no operators are left and the verifier checks the required condition.

Initially, the prover eliminates the first operator P_{∀x_1} and sends the verifier the polynomial G_1(x_1) corresponding to the rest of the arithmetic expression:

    G_1(x_1) = Rx_1 P_{∃x_2} ... Rx_n F(x_1, ..., x_n).

Notice that this is a polynomial of degree 1, instead of the exponential degree of the original naive arithmetization. Upon receiving the polynomial Ĝ_1(x_1) (which the prover claims to be G_1(x_1)), the verifier checks that P_{∀x_1} Ĝ_1(x_1) = Ĝ_1(0) · Ĝ_1(1) = 1. The verifier then picks a random value r_1, computes β_1 = Ĝ_1(r_1), and sends r_1 to the prover. (In this stage the verifier does not actually have to send r_1 to the prover, but for consistency with later rounds we let it do so; the soundness of the protocol is unaffected.) The prover then needs to convince the verifier that

    β_1 = Rx_1 P_{∃x_2} ... Rx_n F(r_1, x_2, ..., x_n).

The prover then eliminates the second operator Rx_1 and sends the verifier the polynomial G_2(x_1) corresponding to the rest of the arithmetic expression:

    G_2(x_1) = P_{∃x_2} ... Rx_n F(x_1, ..., x_n).

Notice that this is a polynomial of degree at most 2. Upon receiving the polynomial Ĝ_2(x_1) (which the prover claims to be G_2(x_1)), the verifier checks that (Rx_1 Ĝ_2(x_1))[r_1] = Ĝ_1(r_1), where [r_1] means that the polynomial in the parentheses, with free variable x_1, is evaluated at the point r_1. The verifier then picks a random value r_2 for x_1, computes β_2 = Ĝ_2(r_2), and sends r_2 to the prover.
In general, when the verifier needs to be convinced that

    β = (Rx_i P(α_1, α_2, ..., α_{i−1}, x_i))[α_i],

he asks for the polynomial G(x_i) = P(α_1, α_2, ..., α_{i−1}, x_i) and gets Ĝ(x_i). (Here α_j, for 1 ≤ j ≤ i, is the most recent random value the verifier assigned to variable x_j.) He verifies that (Rx_i Ĝ(x_i))[α_i] = β and computes β' = Ĝ(α_i) by choosing a new random value α_i for x_i. Now the verification is reduced to whether

    β' = P(α_1, α_2, ..., α_{i−1}, α_i).
The case of a P_{∀u} or P_{∃v} operator is handled as in the last lecture, where we proved coNP ⊆ IP.

The prover proceeds by eliminating the operators one after another, and in the end the verifier checks that q(α_n) = F(α_1, ..., α_n), where q is the polynomial the prover sends to the verifier in the last round and α_i is the most recent random value the verifier assigned to variable x_i.
14.3.4 Analysis

Theorem 68 (a) If Φ ∈ TQBF, then there exists a prover such that Pr[Verifier accepts] = 1.
(b) If Φ ∉ TQBF, then for all provers, Pr[Verifier accepts] ≤ 1/2.

Proof: Part (a) is obvious. For part (b), the analysis is the same as in the proof that coNP ⊆ IP from the last lecture. An error can be introduced only when the verifier happens to pick a root of some low-degree polynomial at some round. While dealing with the last n reduce operators, the polynomials may have degree up to 3m. For the remaining n(n − 1)/2 reduce operators, the degree is at most 2. For every other operator (the n operators of the form P_{∀u} or P_{∃v}) the degree of the polynomial is 1. Thus, the sum of the degrees of all the polynomials encountered in the interaction is at most 3mn + n^2. Hence, if we choose the size p of our field large enough, the probability of error is at most (3mn + n^2)/p ≤ 1/2. □
Exercises

1. Show that IP ⊆ PSPACE.

[Hint: it is easier to show the following stronger claim: given a verifier V and an input x, we can compute the maximum, over all prover strategies P*, of the probability that V accepts x while interacting with P*.]
Lecture 15

Introduction to PCP

In this lecture we see the connection between PCP and hardness of approximation. In particular, we show that the PCP Theorem implies that MAX3SAT cannot be approximated within (1 − ε) for some ε > 0, unless P = NP. Also, we show that Max Clique cannot be approximated within any constant factor.

15.1 Probabilistically Checkable Proofs

Definition 26 Given functions r(n), q(n), we say that L ∈ PCP(r(n), q(n)) if there is a polynomial time probabilistic verifier V which is given x and oracle access to a proof π, uses r(|x|) random bits, reads q(|x|) bits of π, and is such that

    x ∈ L ⇒ ∃π. Pr[V^π(x) = 1] = 1
    x ∉ L ⇒ ∀π. Pr[V^π(x) = 1] ≤ 1/2.
Theorem 69 (PCP Theorem) NP = PCP(O(log n), O(1)).
15.2 PCP and MAX3SAT

15.2.1 Approximability

Definition 27 (MAX3SAT) Given a 3CNF formula ϕ (i.e. with at most 3 variables per clause), find an assignment that satisfies the largest number of clauses.

Note that MAX3SAT generalizes 3SAT, and so cannot be solved in polynomial time unless P = NP. However, it is easy to approximate it within a factor of 2:

Algorithm. Output the better of the two assignments all-true and all-false.

Analysis. Every clause is satisfied by at least one of the two assignments; therefore one of them satisfies at least half of the clauses.
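The 2-approximation above fits in a few lines (the clause encoding as (index, negated?) pairs is our own):

```python
def half_approx(clauses, n):
    # Try the all-true and all-false assignments; return the larger number
    # of satisfied clauses.  A literal (i, neg) is satisfied under the
    # constant assignment `value` iff neg != value.
    def satisfied(value):
        return sum(any(neg != value for _, neg in c) for c in clauses)
    return max(satisfied(True), satisfied(False))

# (x0) and (~x0) and (x1 or x2): all-true satisfies clauses 1 and 3.
phi = [[(0, False)], [(0, True)], [(1, False), (2, False)]]
assert half_approx(phi, 3) == 2   # at least half of the 3 clauses
```

Since every clause is satisfied by one of the two constant assignments, the returned count is always at least half the number of clauses, hence at least half the optimum.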
It is possible to do better: the Karloff-Zwick algorithm runs in polynomial time and satisfies at least 7/8 of the optimal number of clauses.

Some problems admit polynomial time algorithms with approximation ratio ≥ (1 − ε) for every ε > 0. The PCP Theorem implies this is not the case for MAX3SAT, as we see next.
15.2.2 Inapproximability

Theorem 70 The PCP Theorem implies that there is an ε > 0 such that (1 − ε)-approximation of MAX3SAT is NP-hard.

Proof: Fix any NP-complete problem L. By the PCP Theorem, L ∈ PCP(O(log n), O(1)). Let V be the verifier for L.

Given an instance x of L, our plan is to construct a 3CNF formula ϕ_x with m clauses such that (for some ε > 0 to be determined)

    x ∈ L ⇒ ϕ_x is satisfiable
    x ∉ L ⇒ no more than (1 − ε)m clauses of ϕ_x are satisfiable.    (15.1)

Without loss of generality, assume that V makes non-adaptive queries (by reading all the bits that could possibly be needed at once). This assumption is valid because the number of queries was a constant and remains a constant. Let q be the number of queries.

Enumerate all random strings R for V. The length of each string is r(|x|) = O(log |x|), so the number of such strings is polynomial in |x|. For each R, V chooses q positions i_1, ..., i_q and a Boolean function f_R : {0,1}^q → {0,1}, and accepts iff f_R(π(i_1), ..., π(i_q)) = 1.
We want a Boolean formula to simulate this. Introduce Boolean variables x_1, ..., x_l, where l is the length of the proof. Now we need a correspondence between the number of satisfiable clauses and the probability that the verifier accepts.

For every R we add clauses that represent f_R(x_{i_1}, ..., x_{i_q}). This can be done with 2^q SAT clauses. We then need to convert clauses of length q into clauses of length 3, using additional variables; e.g. x_2 ∨ x_10 ∨ x_11 ∨ x_12 becomes (x_2 ∨ x_10 ∨ y_R) ∧ (¬y_R ∨ x_11 ∨ x_12). This requires at most q·2^q 3SAT clauses. In general there is a series of auxiliary variables for each R.
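The clause-splitting step can be sketched generically: a clause of length q becomes a chain of q − 2 3-clauses linked by fresh auxiliary variables (a hypothetical helper, with literals as (name, negated?) pairs):

```python
import itertools

def to_3cnf(clause, fresh):
    # Split (l1 or ... or lq) into (l1 or l2 or y1), (~y1 or l3 or y2), ...
    # `fresh` yields new auxiliary variable names.  The chain is satisfiable
    # exactly when the original clause is, for suitable values of the y's.
    if len(clause) <= 3:
        return [clause]
    y = next(fresh)
    head = clause[:2] + [(y, False)]
    return [head] + to_3cnf([(y, True)] + clause[2:], fresh)

fresh = (f"y{i}" for i in itertools.count())
cl = [("x2", False), ("x10", False), ("x11", False), ("x12", False)]
out = to_3cnf(cl, fresh)
# Reproduces the example: (x2 or x10 or y0) and (~y0 or x11 or x12).
assert len(out) == 2
assert out[0][2] == ("y0", False) and out[1][0] == ("y0", True)
```

Applying this to each of the 2^q clauses representing f_R gives the q·2^q bound stated above.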
If z ∈ L, then there is a proof π such that V^π(z) accepts for every R. Setting x_i = π(i) and the auxiliary variables in the right way, all clauses are satisfied.

If z ∉ L, then for every assignment to x_1, ..., x_l (and to the y_R's), the corresponding proof π(i) = x_i makes the verifier reject for half of the R ∈ {0,1}^{r(|z|)}. For each such R, one of the clauses representing f_R fails. Therefore a fraction ε = (1/2)·(1/(q·2^q)) of the clauses fails. □
It turns out that if this holds for some NP-complete problem, then the PCP Theorem must be true.

Theorem 71 If there is a reduction as in (15.1) for some problem L in NP, then L ∈ PCP(O(log n), O(1)), i.e. the PCP Theorem holds for that problem.

Proof: We describe how to construct a verifier for L. On input z, V expects π to be a satisfying assignment for ϕ_z. V picks O(1/ε) clauses of ϕ_z at random and checks that π satisfies all of them. The randomness used is O((1/ε) log m) = O(log |z|), and O(1/ε) = O(1) bits of the proof are read.

    z ∈ L ⇒ ϕ_z is satisfiable ⇒ ∃π such that V^π(z) always accepts.
    z ∉ L ⇒ ∀π, a fraction ε of the clauses is unsatisfied ⇒ ∀π, V^π(z) rejects with probability ≥ 1/2 (details omitted). □
15.2.3 Tighter result
What is the best known value of ε in Theorem 70? In the next lectures we will prove the following result:
Theorem 72 (Håstad) There is a PCP verifier for NP that uses O(log n) bits of randomness which, based on R, chooses 3 positions i_1, i_2, i_3 and a bit b, and accepts iff π(i_1) ⊕ π(i_2) ⊕ π(i_3) = b. The verifier satisfies

z ∈ L ⇒ ∃π. Pr[V^π(z) = 1] ≥ 1 − ε   (15.2)
z ∉ L ⇒ ∀π. Pr[V^π(z) = 1] ≤ 1/2 + ε   (15.3)
Note the slightly non-standard bounds for the probabilities above. If we had "= 1" in Equation (15.2), we could find a proof π by solving a system of linear equations (which would imply P = NP). For every R, one can encode x_{i_1} ⊕ x_{i_2} ⊕ x_{i_3} = b with 4 clauses in 3CNF.
If z ∈ L, then a fraction ≥ (1 − ε) of the clauses is satisfied.
If z ∉ L, then for a (1/2 − ε) fraction of the R's, 1/4 of the corresponding clauses are contradicted.
This is not producing a formula directly, but it is enough to prove the hardness of approximation ratio:

(1 − (1/4)(1/2 − ε)) / (1 − ε) = 7/8 + ε′

where ε′ → 0 as ε → 0.
15.3 Max Clique
15.3.1 Approximability
Deﬁnition 28 (Max Clique) Given an undirected graph G = (V, E), ﬁnd the largest C ⊆
V such that ∀u, v ∈ C, (u, v) ∈ E. When convenient, we use the complementary graph
equivalent “Max Independent Set” which is the largest I ⊆ V such that ∀u, v ∈ I, (u, v) / ∈ E.
Finding a largest clique is NP-hard, so we ask about approximations. There is a simple (log n)/n approximation:
• arbitrarily partition the graph into subsets of size log n
• solve each subset exactly
• report the largest clique found
Suppose there is a clique of size αn; then one of the subsets contains a clique of size α log n, hence the approximation. The best known is a ((log n)²)/n approximation algorithm.
[Figure: one triangle of three labeled vertices per clause; edges join the vertices within each triangle and every pair of contradictory literals.]
Figure 15.1: Graph construction corresponding to the 3CNF formula ϕ = (x_1 ∨ x_2 ∨ x_3) ∧ (x_2 ∨ x_3 ∨ x_5) ∧ (x_1 ∨ x_4 ∨ x_5) ∧ (x_1 ∨ x_3 ∨ x_5).
15.3.2 Inapproximability
We apply the PCP Theorem to Max Independent Set. Given a 3CNF formula ϕ = c_1 ∧ c_2 ∧ · · · ∧ c_m over variables x_1, . . . , x_n arising from Theorem 70, we build a graph using the same construction as in the NP-completeness proof of Independent Set in Lecture 3 (see Figure 15.1).
The graph has 3m vertices. Note that G has an independent set of size m iff ϕ is satisfiable, and G has an independent set with k vertices iff ϕ has an assignment that satisfies k clauses. So the hardness results for Max 3SAT translate directly to Max Independent Set. Below we show how this can be further improved.
Given G = (V, E), consider G′ = (V′, E′) where V′ = V × V and ((u_1, u_2), (v_1, v_2)) ∈ E′ iff (u_1, v_1) ∈ E or (u_2, v_2) ∈ E. It can be shown that the size of the maximum independent set in G′ is equal to (size of max ind. set in G)². As a side note, this result is not true for cliques (a clique in G′ could possibly be larger than expected), unless we use a different definition of G′.
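On small graphs, the squaring of the independence number under this product can be checked by brute force; the Python sketch below is illustrative (not from the notes), with edges given as a set of pairs.

```python
from itertools import combinations

def max_is(vertices, edges):
    """Size of a maximum independent set, by brute force."""
    vs = list(vertices)
    for size in range(len(vs), 0, -1):
        for cand in combinations(vs, size):
            if not any((u, v) in edges or (v, u) in edges
                       for u, v in combinations(cand, 2)):
                return size
    return 0

def graph_product(vertices, edges):
    """G' = (V x V, E'), with ((u1,u2),(v1,v2)) in E' iff
    (u1,v1) in E or (u2,v2) in E."""
    V2 = [(u, v) for u in vertices for v in vertices]
    E2 = set()
    for (u1, u2), (v1, v2) in combinations(V2, 2):
        if (u1, v1) in edges or (v1, u1) in edges or \
           (u2, v2) in edges or (v2, u2) in edges:
            E2.add(((u1, u2), (v1, v2)))
    return V2, E2
```

For the 3-vertex path, for instance, the maximum independent set has size 2, and in the product graph it has size 4 = 2².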
By a constant number of applications of the graph product operation, we get:

graph:                                            G          G′           G′′
size of ind. set if ϕ is satisfiable:             m          m²           m⁴
size of ind. set if only (1−ε)m clauses
of ϕ are satisfiable:                             (1−ε)m     (1−ε)²m²     (1−ε)⁴m⁴
The ratio decreases without bound; therefore there can be no constant-factor approximation algorithm for Max Clique. In fact:
Theorem 73 (Håstad) If NP ≠ ZPP, then ∀ε > 0, Max Independent Set cannot be approximated within 1/n^{1−ε} (n is the number of vertices).
15.4 References
The PCP Theorem is from [AS98, ALM+98], and the connection between proof-checking and approximability is due to [FGL+96]. Karloff and Zwick [KZ97] describe a 7/8 approximation algorithm for Max 3SAT. Håstad's result on three-query PCP is from [Hås01]. Håstad's result on the approximability of the Max Clique problem is from [Hås99].
Lecture 16
Håstad's Proof System
In this lecture we introduce Håstad's PCP verifier, which yields stronger hardness results than the raw PCP theorem. In particular, we derive asymptotically optimal hardness of approximation results for Max E3LIN2 and Max 3SAT. We also show that Max 2SAT and Min Vertex Cover are hard to approximate within 21/22 + o(1) and 7/6 − o(1) of their optimum, respectively. Finally, we outline the connection between the PCP theorem and Håstad's verifier.
16.1 Håstad's verifier
To describe H˚astad’s veriﬁer, we will need a more detailed parametrization of PCP. Say
a language L is in PCP
c,s
[r(n), q(n)] if there is a probabilistic polynomial time veriﬁer V
which, given input x and oracle access to a proof π, uses r([x[) random bits, reads q([x[)
bits of π and such that
x ∈ L ⇒ ∃π : Pr[V
π
(x) = 1] ≥ c
x ∈ L ⇒ ∀π : Pr[V
π
(x) = 1] < s.
The values c and s are called the completeness and the soundness of the veriﬁer, respectively.
If, in addition to these conditions, the veriﬁer V makes its queries nonadaptively, we say
L ∈ naPCP
c,s
[r(n), q(n)].
Theorem 74 (Håstad) For every constant ε > 0, NP = naPCP_{1−ε,1/2+ε}[O(log n), 3]. Moreover, the verifier for any L ∈ NP is simple: V^π generates three indices i, j, k and one bit b, and accepts iff π[i] ⊕ π[j] ⊕ π[k] = b.
We will sketch how to derive Håstad's theorem from the PCP theorem in the next few lectures. We remark that the theorem achieves the best possible values for its parameters in several ways. In particular, we have the following lower bounds:
1. Number of queries. (Folklore) For every ε > 0, P = PCP_{1,1−ε}[O(log n), 2] and (Zwick) P = PCP_{1−ε,1−ε^{1/3}}[O(log n), 2].
2. Completeness vs. soundness. (Zwick) If c/(1 − s) ≥ 2, then P = PCP_{c,1−s}[O(log n), 3].
3. Completeness. (Zwick) P = naPCP_{1,5/8}[O(log n), 3].
However, perfect completeness can be bought by giving away soundness or non-adaptivity:
1. (Håstad) For all ε > 0, NP = naPCP_{1,3/4+ε}[O(log n), 3]. It is not known whether this soundness constant is optimal for perfect completeness.
2. (Guruswami, Lewin, Sudan and Trevisan) For all ε > 0, NP = PCP_{1,1/2+ε}[O(log n), 3].
16.2 Applications to hardness of approximation
We show that Håstad's theorem implies the following hardness of approximation factors: 1/2 + o(1) for Max E3LIN2, 7/8 + o(1) for Max 3SAT, 21/22 + o(1) for Max 2SAT and 7/6 − o(1) for Min Vertex Cover. The proofs use approximation-preserving reductions similar to the ones we saw last time.
Linear equations modulo 2. Max E3LIN2 is the problem of finding a maximum assignment to a linear system of equations modulo 2, with three distinct variables per equation. The value of an assignment is the number of equations satisfied by it. The problem of deciding whether a satisfying assignment of an E3LIN2 instance exists is in P: it can be solved by Gaussian elimination. Obtaining a 1/2-approximation for Max E3LIN2 is trivial: consider the two assignments x_i = 0 for all i and x_i = 1 for all i, and choose the one that satisfies more equations. This assignment will satisfy at least half of the equations.
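A minimal Python sketch of this algorithm follows (the instance representation is hypothetical): each equation is a triple of variable indices together with its right-hand side bit. The all-zeros assignment satisfies every equation with b = 0, and the all-ones assignment satisfies every equation with b = 1 (since 1 ⊕ 1 ⊕ 1 = 1), so the better of the two satisfies at least half.

```python
def count_satisfied(equations, assignment):
    """equations: list of ((i, j, k), b) meaning x_i ^ x_j ^ x_k = b."""
    return sum(1 for (i, j, k), b in equations
               if assignment[i] ^ assignment[j] ^ assignment[k] == b)

def half_approx(equations, n):
    """Return the better of the all-zeros and all-ones assignments;
    it satisfies at least half of the equations."""
    zeros = [0] * n
    ones = [1] * n
    if count_satisfied(equations, zeros) >= count_satisfied(equations, ones):
        return zeros
    return ones
```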
Surprisingly, a 1/2 approximation is asymptotically optimal, unless P = NP. To show this, we construct a reduction ψ from an arbitrary NP language L to E3LIN2. Consider Håstad's verifier V for L: on input z, V^π generates a random string r of length R = O(log |z|), computes three indices i(r), j(r), k(r) and a bit b(r), queries π at positions i, j, k, and accepts iff π[i(r)] ⊕ π[j(r)] ⊕ π[k(r)] = b(r). The reduction ψ maps z to the system of equations

{ x_{i(r)} ⊕ x_{j(r)} ⊕ x_{k(r)} = b(r) : r ∈ {0,1}^R }.

This is a polynomial-time reduction, as the number of equations is m = |{0,1}^R| = |z|^{O(1)}.
If z ∈ L, there exists a proof π such that V^π(z) accepts with probability 1 − ε over the choice of r. In this case, the assignment x_i = π[i] satisfies (1−ε)m of the equations of ψ(z). If, on the other hand, some assignment x_i = a_i satisfies (1/2+ε)m of the equations of ψ(z), then the proof π such that π[i] = a_i makes V^π(z) accept with probability 1/2 + ε, implying z ∈ L. It follows that E3LIN2 is inapproximable within a factor of (1/2+ε)m / ((1−ε)m) ≤ 1/2 + 2ε, unless P = NP.
Max 3SAT. The 7/8 + ε inapproximability of Max 3SAT follows by reduction from Max E3LIN2. The reduction ψ from E3LIN2 to 3SAT maps each equation x ⊕ y ⊕ z = b into a conjunction of four clauses:

x ⊕ y ⊕ z = 0 ⇔ (¬x ∨ ¬y ∨ ¬z) ∧ (¬x ∨ y ∨ z) ∧ (x ∨ ¬y ∨ z) ∧ (x ∨ y ∨ ¬z)
x ⊕ y ⊕ z = 1 ⇔ (x ∨ y ∨ z) ∧ (x ∨ ¬y ∨ ¬z) ∧ (¬x ∨ y ∨ ¬z) ∧ (¬x ∨ ¬y ∨ z).

Consider an instance I of E3LIN2 and an assignment a for I. If an equation of I is satisfied by a, all four corresponding clauses in ψ(I) are satisfied by a. Otherwise, exactly three of these four clauses are satisfied by a. Therefore, if (1−ε)m equations of I are satisfiable, then 4(1−ε)m + 3εm clauses of ψ(I) are satisfiable (by the same assignment). On the other hand, if at most (1/2+ε)m equations of I are satisfiable, then at most 4(1/2+ε)m + 3(1/2−ε)m clauses of ψ(I) are satisfiable. The ratio between these two values is less than 7/8 + ε′, where ε′ → 0 as ε → 0.
The factor 7/8 is optimal: all instances ψ(I) have exactly three literals per clause, so a random assignment is expected to satisfy a 7/8 fraction of the clauses. Even for arbitrary 3SAT instances, which may contain two-literal and one-literal clauses, there is a 7/8 approximation algorithm based on semidefinite programming.
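The case analysis above ("four clauses when the equation is satisfied, exactly three otherwise") can be verified exhaustively; the Python sketch below is illustrative, with each clause encoded as a triple of literal signs.

```python
from itertools import product

def gadget_clauses(b):
    """The four 3CNF clauses equivalent to x ^ y ^ z = b, as sign patterns:
    +1 means a positive literal, -1 a negated one."""
    if b == 0:
        return [(-1, -1, -1), (-1, 1, 1), (1, -1, 1), (1, 1, -1)]
    return [(1, 1, 1), (1, -1, -1), (-1, 1, -1), (-1, -1, 1)]

def clause_satisfied(signs, assignment):
    # A literal of sign +1 is true when its variable is 1; sign -1 when it is 0.
    return any((s == 1) == (v == 1) for s, v in zip(signs, assignment))

# Satisfied equations yield 4 true clauses; violated ones yield exactly 3.
for x, y, z in product([0, 1], repeat=3):
    for b in (0, 1):
        sat = sum(clause_satisfied(c, (x, y, z)) for c in gadget_clauses(b))
        assert sat == (4 if x ^ y ^ z == b else 3)
```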
Max 2SAT. Unlike for Max 3SAT, there is no representation of an E3LIN2 equation by a short 2CNF formula. The reduction from E3LIN2 to 2SAT uses auxiliary variables: given an equation x_1 ⊕ x_2 ⊕ x_3 = b, there exists a 12-clause 2CNF φ over variables x_1, x_2, x_3, y_1, y_2, y_3, y_4 such that:
1. Every assignment of φ satisfies at most 11 clauses.
2. If a_1 ⊕ a_2 ⊕ a_3 = b, there is an assignment that satisfies 11 clauses of φ and such that x_1 = a_1, x_2 = a_2 and x_3 = a_3.
3. If a_1 ⊕ a_2 ⊕ a_3 ≠ b, every assignment of φ such that x_1 = a_1, x_2 = a_2 and x_3 = a_3 satisfies at most 10 clauses of φ.
We reduce an E3LIN2 instance I to a 2CNF instance ψ(I) by applying this construction to every equation of I, instantiating distinct copies of y_1, y_2, y_3 and y_4 for every equation. If (1−ε)m equations of I are satisfiable, then 11(1−ε)m of their corresponding clauses in ψ(I) are satisfiable (by property 2). If fewer than (1/2+ε)m equations of I are satisfiable, then at most 11(1/2+ε)m + 10(1/2−ε)m clauses of ψ(I) are satisfiable (by properties 1 and 3). The ratio between these two values is 21/22 + O(ε) ≈ 0.954 + O(ε). Using the same approach, one can show that Max Cut cannot be approximated within 16/17 + ε, unless P = NP.
The best known approximation algorithm for Max 2SAT is based on semidefinite programming and guarantees a ≈ 0.93 approximation.
Min Vertex Cover. A vertex cover for a graph G is a set C ⊆ V(G) such that every edge of G has at least one endpoint in C. Equivalently, C is a vertex cover of G if V(G) − C is an independent set. We show that the minimum vertex cover of G cannot be approximated within 7/6 − ε, unless P = NP. Let I be an instance of E3LIN2. For every equation x_1 ⊕ x_2 ⊕ x_3 = b in I, we produce four vertices of ψ(I). Each of these corresponds to an assignment of x_1, x_2 and x_3 that is consistent with the equation. We draw edges between all pairs of vertices that originated from the same equation, as well as between all pairs of vertices that correspond to inconsistent assignments. If (1−ε)m equations of I are satisfiable, then ψ(I) contains an independent set of size (1−ε)m: the independent set is made of the vertices corresponding to satisfied equations, consistent with the optimal assignment of I. The complement of this independent set is a vertex cover of size at most (3+ε)m. If at most (1/2+ε)m equations of I are satisfiable, then every set of (1/2+ε)m + 1 vertices of ψ(I) contains an edge. Therefore the largest independent set of ψ(I) has at most (1/2+ε)m vertices, so its smallest vertex cover has at least (7/2 − ε)m vertices. The ratio between these two values is 7/6 − O(ε). The best known inapproximability ratio for this problem is 10√5 − 21 ≈ 1.36. A trivial algorithm that picks all endpoints of an arbitrary maximal matching achieves a factor-2 approximation. No algorithm is known that improves this factor asymptotically.
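The matching-based factor-2 algorithm mentioned above takes only a few lines; this Python sketch is illustrative.

```python
def matching_vertex_cover(edges):
    """Greedily build a maximal matching and return all matched endpoints.
    Every edge is covered (otherwise it could have been matched), and any
    cover must contain at least one endpoint of each matched edge, so the
    result is at most twice the minimum cover."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.add(u)
            cover.add(v)
    return cover
```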
16.3 From the PCP theorem to Raz's verifier
The PCP theorem says that for some constant δ > 0, NP = PCP_{1,δ}[O(log n), O(1)]. Assuming this theorem, we sketch the proof of Theorem 74. The main ingredient is a proof system for NP due to Raz that makes only two queries, but over proofs represented in a large alphabet. To prove Theorem 74, the first step is to show that the proof system of Raz is equivalent to PCP_{1,δ}[O(log n), O(1)]. The second step is to construct a simulation of the two queries performed by Raz's verifier by three queries over an appropriate binary encoding of Raz's proof. Here we cover the first step of the argument.
Raz's proof system. A two-proof system for L with soundness γ is a probabilistic polynomial-time verifier V with oracle access to two proofs, π_1 and π_2, over an alphabet of size (1/γ)^{O(1)}, such that on input x, V performs exactly one query into each of π_1 and π_2, and

x ∈ L ⇒ ∃π_1, π_2 : Pr[V^{π_1,π_2}(x) = 1] = 1
x ∉ L ⇒ ∀π_1, π_2 : Pr[V^{π_1,π_2}(x) = 1] < γ.
We show that the PCP theorem implies the existence of a two-proof system with some constant soundness γ < 1 for any language in NP. First, using a canonical reduction, we can restate the PCP theorem in the following form: for every L ∈ NP, there is a polynomial-time reduction ψ that maps instances of L into 3CNF formulas such that

x ∈ L ⇒ ψ(x) is satisfiable
x ∉ L ⇒ at most (1 − δ)m clauses of ψ(x) are satisfiable.
We now construct a two-proof system for 3SAT. The proofs π_1 and π_2 are supposed to represent different encodings of an assignment for the input φ. For each variable x_i, the value π_1[i] ∈ {0,1} represents an assignment to x_i. For each clause C_j, the value π_2[j] ∈ {1, . . . , 7} represents one of the seven satisfying assignments of C_j. The verifier V^{π_1,π_2} chooses a random clause C_j and a random variable x_i that appears in C_j, and accepts iff the assignments π_1[i] and π_2[j] are consistent.
If φ is satisfiable, the proofs π_1 and π_2 encoding a satisfying assignment of φ make V^{π_1,π_2} accept with probability 1. Now suppose that at most (1−δ)m clauses of φ are satisfiable. Then at least a δ fraction of the entries of π_2 are inconsistent with π_1: for every clause not satisfied by the assignment π_1, the satisfying assignment recorded in π_2 must differ from π_1 somewhere. If one of these entries of π_2 is selected by the verifier, at least one of its three literals is assigned inconsistently with π_1. Therefore an inconsistency is detected with probability at least δ/3.
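The acceptance probability of this clause/variable verifier can be computed exactly by enumerating its random choices; the Python sketch below is illustrative, with a hypothetical instance representation.

```python
from fractions import Fraction

def accept_probability(clauses, pi1, pi2):
    """clauses[j] lists the three variable indices of clause C_j.
    pi1[i] in {0, 1} assigns variable x_i; pi2[j] is a tuple assigning the
    three variables of C_j (assumed to be a satisfying assignment of C_j).
    The verifier picks a random clause and a random variable in it, and
    accepts iff pi2[j] agrees with pi1 there; we enumerate all choices."""
    agree, total = 0, 0
    for j, clause in enumerate(clauses):
        for pos, var in enumerate(clause):
            total += 1
            agree += pi2[j][pos] == pi1[var]
    return Fraction(agree, total)
```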
We will need a two-proof system for NP with much better soundness. We expect to improve soundness by "repeating" the two-proof protocol for 3SAT several times. A k-repetition two-proof system for 3SAT is specified as follows: the proof entries π_1[i_1, . . . , i_k] ∈ {0,1}^k and π_2[j_1, . . . , j_k] ∈ {1, . . . , 7}^k represent partial assignments to the k-tuples of variables x_{i_1}, . . . , x_{i_k} and k-tuples of clauses C_{j_1}, . . . , C_{j_k}, respectively. The verifier V^{π_1,π_2} chooses a random set of k clauses and a random variable in each clause, and accepts iff the corresponding assignments in π_1 and π_2 are consistent. We might expect that the k-repetition of a two-proof system with soundness 1−γ has soundness (1−γ)^k. Surprisingly, this is not the case; in the next lecture we will see a two-proof system with soundness 1/2 such that its 2-repetition has soundness only 5/8. Fortunately, repetition does amplify soundness at an exponential rate:
Theorem 75 (Raz) If V is a two-proof verifier with soundness γ, the k-repetition of V has soundness 2^{−Ω(k)}, where the constant in the Ω(·) notation depends only on γ and on the alphabet size of the proof system.
Applying this theorem (which we won't prove), we conclude that for every γ > 0 there exists a two-proof system for 3SAT with soundness γ, with proofs over alphabets of size (1/γ)^{O(1)}. In the next three lectures we will show that for suitably small γ = γ(ε), this two-proof system can be simulated by naPCP_{1−ε,1/2+ε}[O(log n), 3].
16.4 References
Håstad's PCP result is from [Hås01]. Raz's result (Theorem 75) is from [Raz98].
Lecture 17
Linearity Test
As part of our preparation for the analysis of Håstad's proof system (next lecture), we need to consider the problem of testing whether a given string is a codeword of some particular code. First, we will see a procedure that, given a string which may or may not be a codeword of the Hadamard code, will decide if it is a codeword or if it is far from being a codeword. This is equivalent to a test that decides whether a Boolean function is linear or far from linear. That is, given f : {0,1}^k → {0,1},
• if f is linear, i.e. f(x_1, . . . , x_k) = ⊕_{i∈S} x_i for some S ⊆ {1, . . . , k}, the test accepts with probability 1;
• if the test accepts with probability at least 1/2 + δ, then f has agreement at least 1/2 + δ with a linear function.
Here is a possible test:
1. pick x, y ∈ {0,1}^k uniformly at random
2. accept if and only if f(x) ⊕ f(y) = f(x ⊕ y).
We need to show that this test meets the requirements above. In order to analyze the test, we will introduce some concepts from Fourier analysis of Boolean functions.
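For small k, the acceptance probability of this test can be computed exactly by enumerating all pairs (x, y); the Python sketch below is illustrative, with functions represented on integer bitmasks.

```python
def blr_pass_rate(f, k):
    """Exact fraction of pairs (x, y) with f(x) ^ f(y) == f(x ^ y);
    x and y range over all k-bit masks."""
    pairs = [(x, y) for x in range(2 ** k) for y in range(2 ** k)]
    return sum(f(x) ^ f(y) == f(x ^ y) for x, y in pairs) / len(pairs)

# A linear function (parity over a subset S of the bits) always passes...
linear = lambda x: bin(x & 0b101).count("1") % 2   # l_S for S = {1, 3}
assert blr_pass_rate(linear, 3) == 1.0
# ...while a non-linear function, such as majority of 3 bits, does not.
maj = lambda x: int(bin(x).count("1") >= 2)
assert blr_pass_rate(maj, 3) < 1.0
```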
17.1 Fourier analysis of Boolean functions
We are interested in linear functions, that is, Boolean functions l : {0,1}^k → {0,1} such that there is some a ∈ {0,1}^k for which l(x) = a · x for every x. There is a linear function l_S for every set S ⊆ {1, . . . , k}, defined as l_S(x_1, . . . , x_k) = ⊕_{i∈S} x_i.
If f : {0,1}^k → {0,1} is a Boolean function, we denote by F : {−1,1}^k → {−1,1} the equivalent Boolean function in the representation where we use 1 instead of 0, −1 instead of 1, and multiplication (·) instead of addition (⊕). That is,

f(x_1, . . . , x_k) = 1/2 − (1/2) F(1 − 2x_1, . . . , 1 − 2x_k)
and

F(x_1, . . . , x_k) = 1 − 2f(1/2 − x_1/2, . . . , 1/2 − x_k/2).

Note that if l_S is a linear function, then L_S(x_1, . . . , x_k) = Π_{i∈S} x_i is the equivalent function in this correspondence.
Consider now the set of functions F : {−1,1}^k → R: this is a vector space of dimension 2^k over the reals, and it contains all Boolean functions. Define the following scaled dot product over such functions:

F · G = (1/2^k) Σ_{x∈{−1,1}^k} F(x)G(x)

Notice that if F : {−1,1}^k → {−1,1}, i.e. if F is a Boolean function in this space, then F · F = 1. Also notice that if F, G : {−1,1}^k → {−1,1} then

Pr[f(x) = g(x)] = 1/2 + (1/2) F · G
How do the linear Boolean functions behave with respect to this dot product? Consider two linear functions L_S and L_T with S ≠ T:

L_S · L_T = (1/2^k) Σ_{x∈{−1,1}^k} (Π_{i∈S} x_i)(Π_{j∈T} x_j)
          = (1/2^k) Σ_{x∈{−1,1}^k} Π_{i∈S∆T} x_i
          = E_{x∈{−1,1}^k} [ Π_{i∈S∆T} x_i ]
          = Π_{i∈S∆T} E_x[x_i] = 0,

where S∆T = (S − T) ∪ (T − S) is the symmetric difference of S and T. To get the third line, we think of the sum as an expectation over x ∈ {−1,1}^k, where x is chosen randomly by setting each entry independently to ±1; the expectation of a product of independent variables is then the product of the expectations.
So all functions L_S are orthogonal to each other, that is, L_S · L_T = 0 if S ≠ T. Then all such functions must be linearly independent, and since there are 2^k of them in a space of dimension 2^k, they form an orthonormal basis. So we can write each function F : {−1,1}^k → R as

F(x) = Σ_{S⊆{1,...,k}} F̂_S L_S(x).

This is known as the Fourier expansion of F. The Fourier coefficient F̂_S can be computed as F̂_S = F · L_S. Why would we write F like this? We'll see that the Fourier coefficients have a useful interpretation.
If F is Boolean, then

F̂_S = F · L_S = E_x[F(x)L_S(x)]
    = Pr_x[F(x) = L_S(x)] − Pr_x[F(x) ≠ L_S(x)]
    = 2 Pr_x[F(x) = L_S(x)] − 1
    = 2 Pr_x[f(x) = l_S(x)] − 1.

Equivalently, for every S, we have Pr_x[f(x) = l_S(x)] = 1/2 + (1/2) F̂_S. The Fourier coefficients range between −1 and 1, and in a sense they represent the bias that a Boolean function f has with respect to each of the linear functions. The coefficient F̂_S is 1 when f is exactly the corresponding linear function l_S, and it is −1 when f is exactly the complement of the linear function.
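The identities F̂_S = F · L_S and Pr_x[f(x) = l_S(x)] = 1/2 + (1/2)F̂_S can be verified numerically; the Python sketch below is illustrative (sets S and inputs x are represented as k-bit masks), checked here on the OR of two variables.

```python
def fourier_coefficients(f, k):
    """F_hat[S] = (1/2^k) * sum_x F(x) L_S(x), with F = 1 - 2f and
    L_S(x) = (-1)^(parity of x restricted to S)."""
    coeffs = {}
    for S in range(2 ** k):
        total = 0
        for x in range(2 ** k):
            F = 1 - 2 * f(x)
            L = (-1) ** bin(x & S).count("1")
            total += F * L
        coeffs[S] = total / 2 ** k
    return coeffs

def agreement(f, S, k):
    """Pr_x[f(x) = l_S(x)], with l_S(x) the parity of x on S."""
    return sum(f(x) == bin(x & S).count("1") % 2
               for x in range(2 ** k)) / 2 ** k
```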
We will see that if the linearity test for a Boolean function f given at the beginning of the lecture accepts with high probability, then there is a large Fourier coefficient F̂_S, which implies that there is a linear function l_S with nontrivial agreement with f. First, we need a couple of results about Fourier coefficients.
Lemma 76 (Parseval's Equality) For every F : {−1,1}^k → R, E_x[F²(x)] = Σ_S F̂_S².
One way to interpret this is that the set of values of F and the Fourier coefficients in the expansion of F have the same ℓ_2 norm, up to a constant.
Corollary 77 For every F : {−1,1}^k → {−1,1}, Σ_S F̂_S² = 1.
Lemma 78 Let f : {0,1}^k → {0,1}. There are at most 1/(4ε²) linear functions l_S such that f and l_S have agreement at least 1/2 + ε.
We have used the result in Lemma 78 before, when we did worst-case to average-case reductions. We can easily prove it now using what we know about Fourier coefficients.
Proof: [Lemma 78] Suppose l_S and f have agreement at least 1/2 + ε. Then F̂_S ≥ 2ε, which implies that F̂_S² ≥ 4ε², but from Corollary 77 we know that Σ_S F̂_S² = 1. So there can be at most 1/(4ε²) functions l_S with that much agreement with f. □
Proof: [Lemma 76 (Parseval's Equality)]

E_x[F²(x)] = E_x[ (Σ_S F̂_S L_S(x)) (Σ_T F̂_T L_T(x)) ]
           = Σ_S Σ_T F̂_S F̂_T E_x[L_S(x)L_T(x)]
           = Σ_S Σ_T F̂_S F̂_T (L_S · L_T)
           = Σ_S F̂_S².

□
17.2 Using Fourier analysis to test linearity
Now consider the test to check if f : {0,1}^k → {0,1} is linear: pick at random x, y ∈ {0,1}^k and accept iff f(x) ⊕ f(y) = f(x ⊕ y). If f is linear then the test accepts with probability one.
Lemma 79 If the test accepts f with probability p, then there is a linear function l_S that agrees with f on at least a p fraction of inputs.
Proof: First observe that

Pr[test accepts f] = 1/2 + (1/2) E_{x,y}[F(x)F(y)F(xy)].

This follows from the same type of argument we have made before, based on the fact that F(x)F(y)F(xy) is 1 when the test accepts and −1 when it rejects:

E_{x,y}[F(x)F(y)F(xy)] = Pr_{x,y}[test accepts f] − Pr_{x,y}[test rejects f]
                       = 2 Pr_{x,y}[test accepts f] − 1.
Now we need to prove that when E_{x,y}[F(x)F(y)F(xy)] is bounded away from zero, there is a large Fourier coefficient. To do this we just use the Fourier expansion of each function and simplify, using L_U(xy) = L_U(x)L_U(y):

E_{x,y}[F(x)F(y)F(xy)]
  = E_{x,y}[ (Σ_S F̂_S L_S(x)) (Σ_T F̂_T L_T(y)) (Σ_U F̂_U L_U(x)L_U(y)) ]
  = Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U E_{x,y}[L_S(x)L_T(y)L_U(x)L_U(y)]
  = Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U (E_x[L_S(x)L_U(x)]) (E_y[L_T(y)L_U(y)])
  = Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U (L_S · L_U)(L_T · L_U)
  = Σ_S F̂_S³
  ≤ (max_S F̂_S) Σ_S F̂_S²
  = max_S F̂_S.
So the largest Fourier coefficient of F is an upper bound on the expectation E_{x,y}[F(x)F(y)F(xy)], and the expectation tells us how likely the test is to accept f. We have shown so far that for every S, Pr_x[f(x) = l_S(x)] = 1/2 + (1/2) F̂_S, and that Pr[test accepts f] ≤ 1/2 + (1/2) max_S F̂_S. Therefore if Pr[test accepts f] ≥ 1/2 + ε, then there exists an S such that F̂_S ≥ 2ε, i.e. there is an S such that f and l_S have agreement on at least a 1/2 + ε fraction of inputs. □
Since at most 1/(4ε²) functions l_S can have that much agreement with f, if we do list decoding for f then we come up with a small list.
17.3 Testing code words for the Long Code
So far, we have been thinking of the encoding of a message m ∈ {0,1}^k in the Hadamard code as a function l_S, where S = {i : m_i = 1}. An equivalent way to think of the encoding is as the concatenation of a_1 · m, a_2 · m, . . . , a_{2^k} · m, where a_1, . . . , a_{2^k} is an enumeration of {0,1}^k. This is how we thought of the code in worst-case to average-case reductions. It is also equivalent to think of the encoding as the concatenation of l_{S_1}(m), l_{S_2}(m), . . . , l_{S_{2^k}}(m), where S_1, . . . , S_{2^k} is an enumeration of the subsets of {1, . . . , k}; that is, the encoding is the concatenation of the evaluations of all possible linear functions at m. In fact, for any code, it is possible to think of the encoding as the concatenation of the evaluations of a series of functions at the input value.
Let us now consider the error-correcting code, called the Long Code, used in Håstad's proof system. One way to think of the encoding of m ∈ {1, . . . , k} in the Long Code is as a functional M : ({1, . . . , k} → {0,1}) → {0,1} that, given a function f : {1, . . . , k} → {0,1}, returns M(f) = f(m). Equivalently, the encoding of m can be thought of as the concatenation of f_1(m), f_2(m), . . . , f_{2^k}(m), where f_1, . . . , f_{2^k} is an enumeration of all possible functions f : {1, . . . , k} → {0,1}. Every error-correcting code can be seen as a subset of this code, and no encoding can be longer than the Long Code unless two of the functions being evaluated are exactly the same.
Notice that the Long Code is doubly exponentially long: in order to transmit log k bits of information (since m ∈ {1, . . . , k}), we need to send an encoding that takes 2^k bits. In Håstad's proof system, the code is only used on inputs of constant size, so this blowup does not cause a problem.
Another equivalent way to think of the encoding of m is as the concatenation of a_1[m], a_2[m], . . . , a_{2^k}[m], where a_1, . . . , a_{2^k} is an enumeration of the strings a ∈ {0,1}^k; that is, it is the concatenation of the m-th entry of each string of length k. Equivalently, the encoding is a function that maps strings to one of their bits: the encoding of m is a function M : {0,1}^k → {0,1} that, given an x ∈ {0,1}^k, returns M(x) = x[m]. From this point of view, the Long Code encodes a message m ∈ {1, . . . , k} as the special linear function l_{{m}}. There is a loss of expressive power in this representation (note that it requires only log k bits of information), but it is convenient for our purposes.
17.3.1 Pre-Håstad verification
To get some understanding of how the Long Code test works in verification, we'll start by considering the verification model in vogue before Håstad's verifier. Given a function A : {0,1}^k → {0,1}, we want the verifier to check that A is actually a codeword of the Long Code, as follows: for some predicate P, the verifier should check that there is some a ∈ {1, . . . , k} satisfying the predicate, P(a) = 1, such that given any string x, A(x) = x[a]; in other words, A corresponds to some linear function l_{{a}}.
So suppose that after performing some test, we find that A has agreement 1 − δ with some l_{{a}}. Then define the vector p ∈ {0,1}^k such that p[i] = P(i) for every i, and do the following test: choose a random x ∈ {0,1}^k and accept if and only if A(p ⊕ x) ⊕ A(x) = 1.
For a random x, p ⊕ x is also random, and so there is at least a 1 − 2δ probability that both A(x) = x[a] and A(p ⊕ x) = p[a] ⊕ x[a]. Thus, if P(a) = 1, the test accepts with probability at least 1 − 2δ, and if P(a) = 0 the test accepts with probability at most 2δ.
17.3.2 The Long Code test
Given A : ({1, . . . , k} → {0,1}) → {0,1},
• pick f, g : {1, . . . , k} → {0,1} uniformly at random
• pick h : {1, . . . , k} → {0,1} such that, independently for every x, h(x) = 0 with probability 1 − ε and h(x) = 1 with probability ε
• accept iff A(f) ⊕ A(g) = A(f ⊕ g ⊕ h)
Suppose that A = l_{{a}}. Then with high probability the test accepts, since with high probability f(a) ⊕ g(a) = f(a) ⊕ g(a) ⊕ h(a). If A has no correlation with any linear function, then this test is close enough to a test for linearity that the test will fail. If A is linear but not a codeword of the Long Code, it cannot be defined over a very large set. Suppose A = l_S for some set S. Then the test accepts iff

⊕_{x∈S} f(x) ⊕ ⊕_{x∈S} g(x) = ⊕_{x∈S} f(x) ⊕ ⊕_{x∈S} g(x) ⊕ ⊕_{x∈S} h(x),

i.e. iff ⊕_{x∈S} h(x) = 0, and the advantage of this probability over 1/2 gets exponentially close to 0 with the cardinality of S.
17.4 References
The linearity test studied in these notes is due to Blum, Luby and Rubinfeld [BLR93]. The analysis that we presented is due to Bellare et al. [BCH+96]. The Long Code was introduced by Bellare, Goldreich and Sudan [BGS98]. The efficient test is due to Håstad [Hås01].
Exercises
1. Show that if f : {0,1}^n → {0,1} is an arbitrary function, then max_S |F̂_S| ≥ 2^{−n/2}.
2. Compute the Fourier coefficients of the function f(x_1, x_2) = x_1 ∨ x_2.
3. Let f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1} be arbitrary functions, and let h : {0,1}^{n+m} → {0,1} be defined as

   h(x_1, . . . , x_{n+m}) = f(x_1, . . . , x_n) ⊕ g(x_{n+1}, . . . , x_{n+m})

   Compute the Fourier coefficients of h in terms of the Fourier coefficients of f and g.
4. Let n be even and consider the function

   f(x_1, . . . , x_n) = (x_1 ∨ x_2) ⊕ (x_3 ∨ x_4) ⊕ · · · ⊕ (x_{n−1} ∨ x_n)

   Prove that max_S |F̂_S| = 2^{−n/2}.
5. Suppose that f : {0,1}^n → {0,1} is a function that depends only on its first k variables; that is, suppose that there is a function g : {0,1}^k → {0,1} such that f(x_1, . . . , x_n) = g(x_1, . . . , x_k). Compute the Fourier coefficients of f in terms of the Fourier coefficients of g.
6. Analyze the following linearity test: given oracle access to a function f : {0,1}^n → {0,1}, pick uniformly at random four points x_1, x_2, x_3, x_4 ∈ {0,1}^n and accept if and only if

   f(x_1) ⊕ f(x_2) ⊕ f(x_3) ⊕ f(x_4) = f(x_1 ⊕ x_2 ⊕ x_3 ⊕ x_4)
Lecture 18
Testing the Long Code
18.1 Test for Long Code
We now consider a test to distinguish codewords of the long code. Given A : {0,1}^k → {0,1}, pick x, y ∈ {0,1}^k uniformly, and pick z ∈ {0,1}^k where the bits of z are i.i.d. with Pr[z[i] = 1] = ε. Accept iff A(x) ⊕ A(y) = A(x ⊕ y ⊕ z).
If A is a codeword of the long code, then A = l_{{a}}, and the test checks whether x[a] ⊕ y[a] = x[a] ⊕ y[a] ⊕ z[a]; it succeeds iff z[a] = 0. Hence it succeeds with probability 1 − ε.
We now show that if the test accepts with probability 1/2 + δ, then list decoding of the codeword produces a list with size dependent on δ rather than k. Intuitively this is similar to the linearity test of the last lecture.
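For small k, the acceptance probability of this test can be computed exactly by weighting each noise string z by its probability; the Python sketch below is illustrative and confirms the 1 − ε acceptance rate for a long-code word.

```python
def longcode_test_accept_prob(A, k, eps):
    """Exact Pr[A(x) ^ A(y) == A(x ^ y ^ z)], with x, y uniform over
    k-bit masks and each bit of z set independently with probability eps."""
    total = 0.0
    for x in range(2 ** k):
        for y in range(2 ** k):
            for z in range(2 ** k):
                ones = bin(z).count("1")
                weight = eps ** ones * (1 - eps) ** (k - ones)
                total += weight * (A(x) ^ A(y) == A(x ^ y ^ z))
    return total / 4 ** k
```

For a long-code word (here the function returning bit 1 of its input) the test accepts exactly when that bit of z is 0, i.e. with probability 1 − ε; for the parity of all k bits it accepts with probability 1/2 + (1/2)(1 − 2ε)^k.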
Consider the case where A is a linear function with large support: A = l_S with S large. Then the test checks whether l_S(x) ⊕ l_S(y) = l_S(x) ⊕ l_S(y) ⊕ l_S(z), and it succeeds iff l_S(z) = 0. The probability of this occurring is 1/2 + (1/2)(1 − 2ε)^{|S|}. So the test cannot accept with high probability if A is not linear, or if A is linear but has large support.
We now consider the equivalent setting where we replace 0 with 1, 1 with −1, and ⊕ with multiplication, so that A : {−1,1}^k → {−1,1} and x, y, z ∈ {−1,1}^k. Here the test accepts iff A(x)A(y)A(xyz) = 1. Specifically, Pr[test accepts A] = 1/2 + (1/2) E_{x,y,z}[A(x)A(y)A(xyz)]
where:

E_{x,y,z}[A(x)A(y)A(xyz)]
  = E_{x,y,z}[ (Σ_S Â_S L_S(x)) (Σ_T Â_T L_T(y)) (Σ_U Â_U L_U(x)L_U(y)L_U(z)) ]
  = Σ_{S,T,U} Â_S Â_T Â_U (E_x[L_S(x)L_U(x)]) (E_y[L_T(y)L_U(y)]) (E_z[L_U(z)])
  = Σ_S Â_S³ E_z[L_S(z)]
  = Σ_S Â_S³ Π_{i∈S} E[z_i]
  = Σ_S Â_S³ (1 − 2ε)^{|S|}
  ≤ (max_S Â_S (1 − 2ε)^{|S|}) Σ_S Â_S²
  = max_S Â_S (1 − 2ε)^{|S|},

using E_x[L_S(x)L_U(x)] = L_S · L_U, E_y[L_T(y)L_U(y)] = L_T · L_U, and Σ_S Â_S² = 1.
Now suppose that the probability that the test accepts is greater than 1/2 + δ. Then there is some S where Â_S (1 − 2ε)^{|S|} ≥ 2δ. Hence:

Â_S ≥ 2δ
(1 − 2ε)^{|S|} ≥ 2δ
|S| log (1/(1 − 2ε)) ≤ log (1/(2δ))
|S| ≤ log (1/(2δ)) / log (1/(1 − 2ε))
This is close to the analysis of Håstad's proof system. The primary difference is that Håstad's system looks at two strings rather than one.
18.2 Håstad's verifier
First recall Raz's verifier:

[Figure: the verifier V queries position i of the proof P and position j of the proof Q.]

The proof P is over the alphabet Σ, |Σ| = k, and the proof Q is over the alphabet Γ, |Γ| = t. The verifier takes the character at position i from P and the character at position j from Q, where j is selected at random and i is selected at random from a few positions of P. The verifier accepts iff P[i] = π(Q[j]).
Now consider Håstad's verifier:

[Figure: the verifier V queries a binary string at position i of P and a binary string at position j of Q.]

In the proofs for Håstad's verifier, the binary strings indexed by i, j are the long code encodings of the characters at those positions in Raz's verifier. Håstad's verifier selects i, j the same way as Raz's, so the problem can be modeled as two functions A : {0,1}^k → {0,1} and B : {0,1}^t → {0,1} to which the verifier V has oracle access. Here the function π : {1, . . . , t} → {1, . . . , k} is as before. The verifier will accept with probability greater than 1 − ε if A is the long code of P[i], B is the long code of Q[j] and π(Q[j]) = P[i]. We will show that if the test accepts with probability 1/2 + ε then Pr[D(A) = π(D(B))] ≥ poly(ε), where D denotes decoding.
We digress for a moment to consider a subtle point about the long code test. What if S = ∅? This is the case where the function is 0 everywhere. Then the test passes with probability 1. This would not be a problem if the function were guaranteed to be balanced. A function A is self-dual if:

  A(1 − x) = 1 − A(x)    when x, A(x) take values in {0, 1}
  A(−x) = −A(x)          when x, A(x) take values in {−1, 1}

For example, ⊕ of an even number of inputs is not self-dual, while ⊕ of an odd number of inputs is self-dual. If A is self-dual then Â_S = 0 whenever |S| is even. Consider the following procedure, called “folding,” which converts any function A : {0, 1}^k → {0, 1} into a self-dual function A′ : {0, 1}^k → {0, 1}:

  A′(x) = A(x)            if x[1] = 0
  A′(x) = 1 − A(1 − x)    if x[1] = 1

This is self-dual since:

  A′(1 − x) = 1 − A(x) = 1 − A′(x)    if x[1] = 0
  A′(1 − x) = A(1 − x) = 1 − A′(x)    if x[1] = 1
In our case of interest, the long code, we have A′ = A. In the following analysis, all testing and decoding is performed on A′ and B′.
Given A : {0, 1}^k → {0, 1}, B : {0, 1}^t → {0, 1}, and π : {1, . . . , t} → {1, . . . , k}, we pick x ∈ {0, 1}^k and y ∈ {0, 1}^t uniformly. We also pick z ∈ {0, 1}^t whose bits are i.i.d. with Pr[z[i] = 1] = ε. The test accepts iff:

  A′(x) ⊕ B′(y) = B′((x ∘ π) ⊕ y ⊕ z)

(From now on we will stop distinguishing between A and A′ and between B and B′.) Treating vectors as functions, x ∘ π ∈ {0, 1}^t is defined by (x ∘ π)[i] = x[π(i)]. In the case where the proof is based on a valid proof for Raz’s verifier, we have a = P[i], b = Q[j], A = l_{a}, B = l_{b}, π(b) = a, and the test checks whether:

  x[a] ⊕ y[b] = x[π(b)] ⊕ y[b] ⊕ z[b] = x[a] ⊕ y[b] ⊕ z[b]
And hence the test accepts with probability 1 − ε. Consider the equivalent setting of A : {−1, 1}^k → {−1, 1}, B : {−1, 1}^t → {−1, 1}. Using the Fourier analysis machinery, we know that:

  Pr[test accepts] = 1/2 + 1/2 · E_{x,y,z}[A(x) B(y) B((x ∘ π) y z)].
Define π_2(β) = {i ∈ {1, . . . , k} : an odd number of j ∈ β satisfy π(j) = i}, so that ∏_{j∈γ} x[π(j)] = ∏_{i∈π_2(γ)} x_i. We can then calculate:
  E_{x,y,z}[A(x) B(y) B((x ∘ π) y z)]
    = E_{x,y,z}[ (∑_α Â_α L_α(x)) · (∑_β B̂_β L_β(y)) · (∑_γ B̂_γ L_γ(x ∘ π) L_γ(y) L_γ(z)) ]
    = ∑_{α,β,γ} Â_α B̂_β B̂_γ · E_x[L_α(x) L_γ(x ∘ π)] · E_y[L_β(y) L_γ(y)] · E_z[L_γ(z)]
    = ∑_β Â_{π_2(β)} B̂_β^2 (1 − 2ε)^{|β|},

since E_x[L_α(x) L_γ(x ∘ π)] is nonzero only when α = π_2(γ), E_y[L_β(y) L_γ(y)] is nonzero only when β = γ, and E_z[L_γ(z)] = (1 − 2ε)^{|γ|}.
Now suppose that Pr[test accepts] ≥ 1/2 + δ, so that:

  2δ ≤ ∑_β Â_{π_2(β)} B̂_β^2 (1 − 2ε)^{|β|}
     ≤ ∑_β |Â_{π_2(β)}| B̂_β^2 (1 − 2ε)^{|β|}
     = ∑_{β : |Â_{π_2(β)}| ≥ δ} |Â_{π_2(β)}| B̂_β^2 (1 − 2ε)^{|β|} + ∑_{β : |Â_{π_2(β)}| < δ} |Â_{π_2(β)}| B̂_β^2 (1 − 2ε)^{|β|},

where the second sum is less than δ.
Multiplying both sides by δ, and using |Â_{π_2(β)}| ≥ δ in the first sum, yields:

  δ^2 ≤ ∑_{β : |Â_{π_2(β)}| ≥ δ} Â_{π_2(β)}^2 B̂_β^2 (1 − 2ε)^{|β|} ≤ ∑_β Â_{π_2(β)}^2 B̂_β^2 (1 − 2ε)^{|β|}
Now consider the following decoding algorithm: D(A) picks α ⊆ {1, . . . , k} with probability Â_α^2 and then a ∈ α at random. Similarly, D(B) picks β ⊆ {1, . . . , t} with probability B̂_β^2 and then b ∈ β at random. Using this strategy, and noting that there is a c = O(1/ε) such that (1 − 2ε)^k ≤ c/k for every k, we find:

  Pr[D(A) = π(D(B))] ≥ ∑_β Â_{π_2(β)}^2 B̂_β^2 · (1/|β|)
    ≥ (1/c) ∑_β Â_{π_2(β)}^2 B̂_β^2 (1 − 2ε)^{|β|}
    ≥ δ^2/c = Ω(δ^2)
This probability does not depend on t or k, so we can adjust the soundness parameter of Raz’s verifier without creating a circular dependency among the parameters.
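The Fourier-sampling decoder D is simple enough to sketch. In this illustrative Python fragment (our own naming, with the Fourier coefficients supplied as a dictionary whose squared entries sum to 1), a set is drawn with probability equal to its squared coefficient and a uniformly random element of it is returned, exactly as in the strategy above; folding guarantees zero weight on the empty set, which the code simply skips.

```python
import random

def fourier_decode(coeffs, rng):
    """Pick a set alpha with probability equal to its squared Fourier
    coefficient, then return a uniformly random element of alpha.
    `coeffs` maps frozenset -> coefficient; the empty set is skipped,
    since folding gives it zero weight."""
    sets, weights = zip(*((s, c * c) for s, c in coeffs.items() if s))
    alpha = rng.choices(sets, weights=weights)[0]
    return rng.choice(sorted(alpha))

# A toy spectrum concentrated on the set {2}: the decoder must output 2.
a = fourier_decode({frozenset({2}): 1.0}, random.Random(0))
```

With the whole weight on one singleton the output is forced; for a spread-out spectrum the output is an element of one of the supported sets.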
Lecture 19
Circuit Lower Bounds for Parity
Using Polynomials
In this lecture, we will talk about circuit lower bounds, and take our first step towards proving a lower bound on the size of a constant-depth circuit which computes the XOR of n bits.

Before we talk about bounds on the size of a circuit, let us first clarify what we mean by circuit depth and circuit size. The depth of a circuit is defined as the length of the longest path from an input to the output. The size of a circuit is the number of AND and OR gates in the circuit. Note that for our purposes, we assume all the gates have unlimited fan-in and fan-out. We define AC^0 to be the class of decision problems solvable by circuits of polynomial size and constant depth. We want to prove the result that PARITY is not in AC^0.
We will try to prove this result using two diﬀerent techniques. In this class, we will talk
about a proof which uses polynomials; in the next class we will look at a diﬀerent proof
which uses random restrictions.
19.1 Circuit Lower Bounds for PARITY
Before we go into our proof, let us first look at a circuit of constant depth d that computes PARITY.

Theorem 80 If a circuit of size S and depth d computes PARITY, then S ≥ 2^{Ω(n^{1/(d−1)})}.

This bound of 2^{Ω(n^{1/(d−1)})} is actually tight, as the following example shows.
Consider the circuit C shown in Figure 19.1, which computes the PARITY of n variables. C comprises a tree of XOR gates, each of which has fan-in n^{1/(d−1)}; the tree has a depth of d − 1.

Now, since each XOR gate is a function of n^{1/(d−1)} variables, it can be implemented by a CNF or a DNF of size 2^{n^{1/(d−1)}}. Let us replace alternating layers of XOR gates in the tree by CNFs and DNFs: for example, we replace gates in the first layer by their CNF implementation, gates in the second layer by their DNF implementation, and so on. This gives us a circuit of depth 2(d − 1). Now we can use the associativity of OR to collapse consecutive layers of OR gates into a single layer. The same thing can be done for AND, to get a circuit of depth d.

Figure 19.1: Circuit for computing the XOR of n variables; each small circuit in the tree computes the XOR of k = n^{1/(d−1)} variables.

This gives us a circuit of depth d and size O(n · 2^{n^{1/(d−1)}}) which computes PARITY.
19.2 Proof of Circuit Lower Bounds for PARITY using Polynomials

For our proof, we will use a property which is common to all circuits of small size and constant depth, and which PARITY does not have. The property is that circuits of small size and constant depth can be represented by low-degree polynomials, with high probability. More formally, we show that if a function f : {0, 1}^n → {0, 1} is computed by a circuit of size S and depth d, then there exists a function g : {0, 1}^n → ℝ such that Pr_x[f(x) = g(x)] ≥ 3/4 and ĝ_α ≠ 0 only for |α| ≤ O((log S)^{2d}).

Then we will show that if a function g : {0, 1}^n → ℝ agrees with PARITY on more than a fraction 3/4 of its inputs, then ĝ_α ≠ 0 for some α with |α| = Ω(√n). That is, a function which agrees with PARITY on a large fraction of its inputs has to have high degree. From these two results, it is easy to see that PARITY cannot be computed by circuits of constant depth and small size.
Lemma 81 Suppose we can describe a distribution G_OR over functions g : {0, 1}^n → {0, 1} such that

• for every g in G_OR, ĝ_α ≠ 0 only for |α| ≤ O((log 1/ε)(log K));
• for every x ∈ {0, 1}^n, Pr_{g∼G_OR}[g(x) = x_1 ∨ x_2 ∨ · · · ∨ x_k] ≥ 1 − ε.

Here K is the maximum fan-in of each gate. Also suppose we can describe such a distribution G_AND for AND gates. Then, given a circuit C of size S and depth d, we can find a polynomial h of degree at most O((log S)^{2d}) such that Pr_x[h(x) = C(x)] ≥ 3/4.
Proof: The way we do this is as follows. For every gate in a circuit C, we pick independently an approximating function g_i with parameter ε = 1/(4S), and replace the gate by g_i. Then, for a given input, the probability that the new circuit computes C(x) correctly, that is, the probability that the results of all the gates are correctly computed, is at least 3/4, by a union bound over the S gates. Let G_i : {1, −1}^n → {1, −1} be the Fourier-transformed version of the function g_i, so that

  g_i(x_1, . . . , x_n) = 1/2 + 1/2 · G_i(1 − 2x_1, 1 − 2x_2, . . . , 1 − 2x_n).    (19.1)

Then G_i can be written as

  G_i = ∑_α ĝ_{i,α} ∏_{j∈α} X_j,    (19.2)

where X_j = 1 − 2x_j. Then G_i has degree at most O((log S)^2), as K, the fan-in of any gate, is at most S. Therefore, if we expand out the expression for C(x), we will get a multilinear polynomial of degree at most O((log S)^{2d}), as C has depth d. □
Lemma 82 Suppose there is a function g : {0, 1}^n → ℝ such that

• Pr_x[g(x) = x_1 ⊕ x_2 ⊕ · · · ⊕ x_n] ≥ 3/4, and
• ĝ_α = 0 for all sets α such that |α| > d.

Then d = Ω(√n).
Proof: Let G : {1, −1}^n → {1, −1} be the Fourier-transformed version of the function g. That is,

  G(X) = 1 − 2g(1/2 − 1/2 · X_1, . . . , 1/2 − 1/2 · X_n).    (19.3)

Then Pr_X[G(X) = ∏_{i=1}^n X_i] ≥ 3/4, where X_i = 1 − 2x_i. Define A to be the set of inputs on which g agrees with PARITY. In other words,

  A = {X : G(X) = ∏_{i=1}^n X_i}.    (19.4)

Then |A| ≥ (3/4) · 2^n, by our initial assumption. Now consider the set of all functions S = {f : A → ℝ}. These form a vector space of dimension |A|. Any function f in this set can be written as

  f(X) = ∑_α f̂_α ∏_{i∈α} X_i.    (19.5)

Over A, G(X) = ∏_{i=1}^n X_i. Therefore, since X_i^2 = 1, for X ∈ A,

  ∏_{i∈α} X_i = G(X) ∏_{i∉α} X_i.    (19.6)

By our initial assumption, G(X) is a polynomial of degree at most d. Therefore, for every α such that |α| ≥ n/2, we can replace ∏_{i∈α} X_i by a polynomial of degree at most d + n/2. Every function f which belongs to S can then be written as a polynomial of degree at most d + n/2. Hence the set {∏_{i∈α} X_i}_{|α| ≤ d + n/2} spans the set S. As there must be at least |A| such monomials, this means that

  ∑_{k=0}^{d+n/2} (n choose k) ≥ (3/4) · 2^n.    (19.7)

This can hold only when d = Ω(√n). □
Now we need to prove that we can indeed describe such distributions G_OR and G_AND over functions g_i for OR and AND gates.
19.3 Approximating OR
The following lemma says that we can approximately represent OR with a polynomial of degree exponentially smaller than the fan-in of the OR gate. We will use the notation that x is a vector of k bits, x_i is the i-th bit of x, and 0 is the vector of zeros (of the appropriate size based on context).

Lemma 83 For all k and ε, there exists a distribution G over functions g : {0, 1}^k → ℝ such that

1. g is a degree-O((log 1/ε)(log k)) polynomial, and
2. for all x ∈ {0, 1}^k,

  Pr_{g∼G}[g(x) = x_1 ∨ . . . ∨ x_k] ≥ 1 − ε.    (19.8)
Proof Idea: We want a random polynomial p : {0, 1}^k → ℝ that computes OR. An obvious choice is

  p_bad(x_1, . . . , x_k) = 1 − ∏_{i∈{1,...,k}} (1 − x_i),    (19.9)

which computes OR with no error. But it has degree k, whereas we would like it to have logarithmic degree. To accomplish this amazing feat, we will replace the tests of all k variables with just a few tests of random batches of variables. This gives us a random polynomial which computes OR with one-sided error: when x = 0, we will have p(x) = 0; and when some x_i = 1, we will almost always (over our choice of p) have p(x) = 1.
Proof: We pick a random collection C of subsets of the bits of x. (That is, for each S ∈ C we have S ⊆ {1, . . . , k}.) We will soon see how to pick C, but once the choice has been made, we define our polynomial as

  p(x_1, . . . , x_k) = 1 − ∏_{S∈C} (1 − ∑_{i∈S} x_i).    (19.10)

Why does p successfully approximate OR? First, suppose x_1 ∨ . . . ∨ x_k = 0. Then we have x = 0, and:

  p(0, . . . , 0) = 1 − ∏_{S∈C} (1 − ∑_{i∈S} 0) = 0.    (19.11)

So, regardless of the distribution from which we pick C, we have

  Pr_C[p(0) = 0] = 1.    (19.12)
Next, suppose x_1 ∨ . . . ∨ x_k = 1. We have p(x) = 1 iff the product term is zero. The product term is zero iff the sum in some factor is 1. And that, in turn, happens iff there is some S ∈ C which includes exactly one x_i which is 1. Formally, for any x ∈ {0, 1}^k, we want the following to be true with high probability:

  ∃S ∈ C. |{i ∈ S : x_i = 1}| = 1.    (19.13)

Given that we do not want C to be very large (so that the degree of the polynomial is small), we have to pick C very carefully. In order to accomplish this, we turn to the Valiant-Vazirani reduction, which you may recall from Lecture 7:
Lemma 84 (Valiant-Vazirani) Let A ⊆ {1, . . . , k}, let a be such that 2^a ≤ |A| ≤ 2^{a+1}, and let H be a family of pairwise independent hash functions of the form h : {1, . . . , k} → {0, 1}^{a+2}. Then if we pick h at random from H, there is a constant probability that there is a unique element i ∈ A such that h(i) = 0. Precisely,

  Pr_{h∼H}[|{i ∈ A : h(i) = 0}| = 1] ≥ 1/8.    (19.14)
With this as a guide, we define our collection C in terms of pairwise independent hash functions. Let t > 0 be a value that we will set later in terms of the approximation parameter ε. Then we let C = {S_{a,j}}_{a∈{0,...,log k}, j∈{1,...,t}}, where the sets S_{a,j} are defined as follows:

• For a ∈ {0, . . . , log k}:
  – For j ∈ {1, . . . , t}:
    ∗ Pick a random pairwise independent hash function h_{a,j} : {1, . . . , k} → {0, 1}^{a+2}.
    ∗ Define S_{a,j} = h_{a,j}^{−1}(0). That is, S_{a,j} = {i : h_{a,j}(i) = 0}.
Now consider any x ≠ 0 which we feed to our OR-polynomial p. Let A be the set of bits of x which are 1, i.e., A = {i : x_i = 1}, and let a be such that 2^a ≤ |A| ≤ 2^{a+1}. Then we have a ∈ {0, . . . , log k}, so C includes t sets S_{a,1}, . . . , S_{a,t}. Consider any one such S_{a,j}. By Valiant-Vazirani, we have

  Pr_{h_{a,j}∼H}[|{i ∈ A : h_{a,j}(i) = 0}| = 1] ≥ 1/8,    (19.15)

which implies that

  Pr_{h_{a,j}∼H}[|{i ∈ A : i ∈ S_{a,j}}| = 1] ≥ 1/8,    (19.16)

so the probability that there is some j for which |S_{a,j} ∩ A| = 1 is at least 1 − (7/8)^t, which by the reasoning above tells us that

  Pr_p[p(x) = x_1 ∨ . . . ∨ x_k] ≥ 1 − (7/8)^t.    (19.17)

Now, to get a success probability of 1 − ε as required by the lemma, we just pick t = O(log 1/ε). The degree of p will then be |C| = t(log k + 1) = O((log 1/ε)(log k)), which satisfies the degree requirement of the lemma. □
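The construction of p can be sketched concretely. In the fragment below (an illustration under our own naming), `sample_C` stands in for the pairwise independent hash family by putting each index into S_{a,j} independently with probability 2^{−(a+2)}; this is a simplification of taking S_{a,j} = h^{−1}(0), not the construction the proof requires. The one-sided behavior p(0) = 0, and the fact that a set isolating a single 1-bit forces p(x) = 1, are structural and hold for any choice of C.

```python
import random

def make_p(C):
    """The polynomial p(x) = 1 - prod_{S in C} (1 - sum_{i in S} x_i)."""
    def p(x):
        prod = 1
        for S in C:
            prod *= 1 - sum(x[i] for i in S)
        return 1 - prod
    return p

def sample_C(k, t, rng):
    # For each scale a and repetition j, keep each index independently with
    # probability 2^-(a+2).  (A simplification of drawing a pairwise
    # independent hash h_{a,j} and taking S_{a,j} = h_{a,j}^{-1}(0).)
    C = []
    a = 0
    while 2 ** a <= k:
        for _ in range(t):
            q = 2.0 ** -(a + 2)
            C.append({i for i in range(k) if rng.random() < q})
        a += 1
    return C

C = sample_C(8, 5, random.Random(0))
p = make_p(C)
# One-sided error: on the all-zero input, p is 0 no matter how C was drawn.
```

The degree of any sampled p is |C|, matching the count t(log k + 1) in the proof.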
Note that given this lemma, we can also approximate AND with an exponentially lower degree polynomial. Suppose we have some G which approximates OR within ε as above. Then we can construct G′ which approximates AND by drawing g from G and returning g′ such that

  g′(x_1, . . . , x_k) = 1 − g(1 − x_1, . . . , 1 − x_k).    (19.18)

Any such g′ has the same degree as g. Also, for a particular x ∈ {0, 1}^k, g′ clearly computes AND if g computes OR, which happens with probability at least 1 − ε over our choice of g.
19.4 References
The proof of the Parity lower bound using polynomials is due to Razborov [Raz87] and
Smolensky [Smo87]. The original proof was diﬀerent, and we will present it next.
Lecture 20
Lower Bound for Parity Using
Random Restrictions
In this lecture we give an alternate proof that parity ∉ AC^0, using the technique of random restrictions. This is the original method that was used to prove parity ∉ AC^0.
20.1 Random Restrictions
A restriction fixes some inputs of a circuit to constant values, and leaves other inputs free. More formally, a restriction is a function ρ : {1, . . . , n} → {0, 1, ∗}. We apply a restriction ρ to a circuit C with n inputs to obtain a restricted circuit C_ρ as follows:

• For each i ∈ {1, . . . , n}:
  – If ρ(i) = 0, set the i-th input of C to 0.
  – If ρ(i) = 1, set the i-th input of C to 1.
  – If ρ(i) = ∗, leave the i-th input of C as a free variable.
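Applying a restriction is straightforward to code. The sketch below (illustrative naming, not from the notes) restricts a Boolean function given as a Python callable and returns a function of the surviving free variables, taken in order; note that restricting parity yields parity (or its complement) on the free variables.

```python
def apply_restriction(f, rho):
    """Restrict f : {0,1}^n -> {0,1} by rho, a tuple over {0, 1, '*'}.
    Returns a function of the remaining free variables, in order."""
    free = [i for i, v in enumerate(rho) if v == '*']
    def f_rho(y):
        x = list(rho)
        for i, b in zip(free, y):
            x[i] = b
        return f(tuple(x))
    return f_rho

parity = lambda x: sum(x) % 2
g = apply_restriction(parity, (1, '*', 0, '*'))
# g computes the parity of its two free inputs, complemented by the fixed 1.
```

This is exactly the behavior the squashing argument below relies on: C_ρ still computes parity (or its complement) on the free variables.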
Now let us see how restrictions are used to prove our main theorem:

Theorem 85 There does not exist a constant-depth, unbounded-fan-in, polynomial-size circuit which computes the parity function of its inputs.

Proof Idea: We will consider only circuits which at the first level consist only of ORs, at the second level consist only of ANDs, and continue alternating in this way. This is depicted in Figure 20.1. (Any circuit can be converted to this form with at most a factor of 2 increase in the depth.)

The proof will be by contradiction. First, we show that parity circuits of depth 2 must have exponential size (Lemma 86). Next, we suppose by way of contradiction that we have an AC^0 circuit C of some constant depth d which computes parity. We give a way of squashing C down to depth d − 1 while still computing parity on many variables. We can repeat the method d − 2 times to obtain a parity circuit of depth 2 and subexponential size, which contradicts Lemma 86.
Figure 20.1: A circuit with alternating levels of ANDs and ORs.
Restrictions enter in our method of circuit squashing. We apply a random restriction ρ_1 to C’s inputs, obtaining C_{ρ_1}. We prove that the top-level OR gates of C_{ρ_1} are likely to depend on only a constant number of inputs, while a good number of variables still remain free (Corollary 88). C_{ρ_1} will compute parity (or the complement of parity, which is just as good) on those free variables. Then we show that after applying a second random restriction ρ_2 and obtaining C_{ρ_1 ρ_2}, the second-level AND gates are also likely to depend on only a constant number of inputs (Lemma 89). Now we flip the first two layers, which we can do with at most an exponential size increase of those layers (Lemma 90); this is fine because the fan-in of both layers was made constant by the two rounds of restrictions. After the flip, the second and third layers are both ORs, so we can collapse them both into a single layer, obtaining a circuit of depth d − 1 that computes parity on many variables.
Statements and proofs of the lemmas mentioned above follow.
Lemma 86 If a DNF or a CNF computes parity of n variables, then:

1. Each term includes all n variables, and
2. There are at least 2^{n−1} terms.
Proof: We will prove the lemma for CNFs, which have OR gates at their top level and a
single AND of all the ORs at the second level. The proof for DNFs is quite similar.
For any CNF circuit C:
1. Each term includes all n variables: Suppose by way of contradiction that C has some term t which does not depend on some variable x_i. Then when all inputs to t are 0, t outputs 0, and the single AND gate on the next level outputs 0, which is the output of the whole circuit. Now flip the value of x_i. The output of t is still 0, and thus the output of C has not changed. But since we have only changed one variable, the parity has flipped. Alas, we have a contradiction! So every term must depend on all variables.
2. There are at least 2^{n−1} terms: To compute parity, C must output 0 on 2^{n−1} different settings of the input variables. C outputs 0 only when one of the terms (OR gates) outputs 0. But each OR gate outputs 0 on exactly one setting of the input variables. Thus, C must have at least 2^{n−1} terms.
□
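Both parts of the lemma can be confirmed by brute force on small n. The sketch below (illustrative, with our own encoding: a clause is a tuple of (variable, required-value) literals) builds the canonical CNF for parity, whose clauses each contain all n variables and number exactly 2^{n−1}.

```python
from itertools import product

def parity_cnf(n):
    """Canonical CNF for parity: one full-width clause rejecting each
    even-parity assignment.  A literal (i, v) is true when x[i] == v."""
    clauses = []
    for a in product((0, 1), repeat=n):
        if sum(a) % 2 == 0:
            # The unique clause falsified exactly by the assignment a.
            clauses.append(tuple((i, 1 - a[i]) for i in range(n)))
    return clauses

def eval_cnf(clauses, x):
    # A CNF is true iff every clause contains a satisfied literal.
    return all(any(x[i] == v for i, v in cl) for cl in clauses)

cnf = parity_cnf(3)
```

For n = 3 this gives 4 = 2^{3−1} clauses of width 3, and evaluating the CNF reproduces parity on all 8 inputs.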
Lemma 87 Let ρ be a random restriction defined by

  ρ(i) = 0 with probability 1/2 − 1/√n
  ρ(i) = 1 with probability 1/2 − 1/√n
  ρ(i) = ∗ with probability 2/√n    (20.1)

Then an OR or AND gate with n inputs restricted with ρ depends on at most c inputs with probability at least 1 − 1/n^{c/3}.
Proof Idea: We’ll break the proof into two cases. (1) Large OR gates will have some input
ﬁxed at 1 w.h.p., and thus the entire gate can be replaced with a constant 1. Similarly, large
AND gates can be replaced with 0 w.h.p. (2) Small gates will w.h.p. be reduced enough to
depend on only c inputs.
Proof:
Case 1: Suppose the gate has at least c log n inputs before applying the restriction. Then:

  Pr_ρ[gate depends on ≤ c inputs]
    ≥ Pr_ρ[gate becomes constant]
    ≥ Pr_ρ[some input is set to 1 (OR gate) or to 0 (AND gate)]
    ≥ 1 − (1/2 + 1/√n)^{c log n}
    ≥ 1 − 2^{−(c/2) log n}    (for large enough n)
    = 1 − 1/n^{c/2}
    ≥ 1 − 1/n^{c/3}.

Case 2: Suppose the gate has at most c log n inputs. Then:

  Pr_ρ[gate depends on ≤ c inputs]
    ≥ 1 − ∑_{t=c+1}^{c log n} (c log n choose t) (2/√n)^t
    ≥ 1 − (c log n)^c (1/√n)^c
    ≥ 1 − 1/n^{c/3}    (for large enough n).
□
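Sampling from this distribution is direct. The following sketch (our own naming) draws a restriction with the probabilities of Lemma 87; on average about 2√n variables stay free.

```python
import math
import random

def random_restriction(n, rng):
    """Sample rho with Pr[0] = Pr[1] = 1/2 - 1/sqrt(n) and Pr[*] = 2/sqrt(n)."""
    p_star = 2 / math.sqrt(n)
    rho = []
    for _ in range(n):
        u = rng.random()
        if u < p_star:
            rho.append('*')
        elif u < p_star + (1 - p_star) / 2:
            rho.append(0)
        else:
            rho.append(1)
    return rho

rho = random_restriction(10000, random.Random(0))
# Roughly 2 * sqrt(10000) = 200 positions should be '*'.
```

Feeding such a ρ to a circuit (for instance via the `apply_restriction` sketch earlier in the lecture, if that is available) realizes the restricted circuit C_ρ that the lemma analyzes.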
Corollary 88 If a circuit has n^k gates, then there is a restriction with √n/2 free variables such that all top-level gates have fan-in at most 4k.

Proof: We apply Lemma 87 with c = 4k, obtaining a circuit C_ρ such that

  Pr[some gate of C_ρ depends on > 4k variables] < n^k · (1/n^{4k/3}) = n^{−k/3} < 1    (20.2)

and

  Pr[C_ρ depends on < √n/2 variables] ≈ 2^{−O(√n)} < 1,    (20.3)

which follows from a Chernoff bound (the expected number of free variables is 2√n). This guarantees the existence of a restriction which satisfies the corollary. □
Lemma 89 Given any c-CNF C (in which each OR depends on at most c variables) and any k > 0, there is a constant b(c, k) such that after applying the random restriction ρ of Lemma 87 to C, with probability more than 1 − 1/n^k, C_ρ depends on at most b(c, k) variables.
Proof: The proof is by induction on c.

Base case: c = 1. The OR gates each have only 1 input, so we can replace the entire CNF with a single AND gate. Then by Lemma 87, Lemma 89 is satisfied with b(1, k) = 3k.

Inductive case: Suppose the lemma holds for c − 1; we will prove that it is true for c. Let M be a maximal set of disjoint clauses. That is, the clauses in M share no variables (disjointness), and every clause not in M shares a variable with some clause in M (maximality). (You can think of this as picking a maximal matching in the hypergraph whose nodes are the circuit’s variables and whose hyperedges are the sets of variables used by each clause.) One of two cases holds:

Case 1: |M| ≥ 2^c k log n. Then:

  Pr_ρ[C depends on ≤ b(c, k) inputs]
    ≥ Pr_ρ[C becomes constant]
    ≥ Pr_ρ[some OR gate is unsatisfied]
    ≥ 1 − (1 − (1/2 − 1/√n)^c)^{|M|}
    ≈ 1 − e^{−(1/2^c)|M|}
    > 1 − 1/n^k.

Note that in the third inequality we use the fact that the clauses of M are independent: the probability that a particular OR gate in M is unsatisfied is independent of the probability that any other OR gate in M is unsatisfied.

Case 2: |M| ≤ 2^c k log n. Let V be the set of variables in the clauses of M, so |V| ≤ c|M| = c 2^c k log n. We first show that the restriction reduces V to constant size:

  Pr_ρ[≤ i variables of V stay free] ≥ 1 − (|V| choose i) (2/√n)^i ≥ 1 − 1/n^{i/3}    (for large enough n).

So to get a probability greater than 1 − 1/n^k as required by the lemma, we choose i = 4k. But we are not yet done: we have shown that the gates of M shrink to constant size, but we have said nothing about the gates not in M. To handle those, recall that every clause not in M shares a variable with a clause in M, so if we further restrict the remaining 4k free inputs of M, obtaining C_{ρ′}, then every clause in C_{ρ′} will have some input fixed. C_{ρ′} is thus a (c − 1)-CNF and we can apply the inductive hypothesis: C_{ρ′} depends on at most b(c − 1, k + 1) variables. Now, there are 2^{4k} ways to set the 4k variables, so C_ρ depends on at most

  b(c, k) = 4k + 2^{4k} b(c − 1, k + 1)    (20.4)

variables. □
Lemma 90 (The Switching Lemma) A CNF or DNF of c variables can be represented as a DNF or a CNF, respectively, of size at most 2^c.

We omit the proof of this lemma, but note that we can actually prove a stronger fact: any function of c variables can be written as a DNF or as a CNF of size at most 2^c.
20.2 H˚astad’s Result
This section is devoted to a sketch of H˚astad’s tight result for the complexity of parity with bounded-depth circuits. We begin by stating the key theorem:

Theorem 91 (Switching Lemma) Suppose f is a k-CNF¹ over the variables x_1, . . . , x_n. Pick at random a restriction that leaves a fraction p of the variables unfixed. For each of the n(1 − p) variables that are fixed, independently hardwire 0 or 1 as that variable’s value. Then for every t,

  Pr[after the restriction f can be expressed as a t-CNF and a t-DNF] ≥ 1 − (7pk)^t.

The theorem states that, after making the specified sort of restriction, we get a formula that can be expressed as a simpler DNF (if the original f was a CNF) or a simpler CNF (if the original f was a DNF) with exponentially high probability.

We state one more useful result before getting on to the analysis:

¹We can take f to be a k-DNF as well by taking complements.
Figure 20.2: The first two levels of the circuit C. By the Switching Lemma, we can transform the AND gates of the top level to OR gates with high probability by making a random restriction.
Lemma 92 If f is a k-CNF (or a k-DNF) for x_1 ⊕ x_2 ⊕ · · · ⊕ x_m, then every term has size m.

Now let C be a depth-d circuit for parity of size S. Each level of C will consist entirely of AND gates or of OR gates, starting with AND gates at the top level. Each of the AND gates at the top level is a 1-CNF, each clause being a single variable. We can apply the Switching Lemma with t = log S and p = 1/14, so that each of these top-level gates becomes a (log S)-DNF with probability at least 1 − 1/S (see Figure 20.2).

Since the AND gates of the top level have been switched to ORs, we can collapse the first two levels into one level of ORs to get a circuit that computes the parity of n/14 inputs such that the top gates have fan-in at most log S (see Figure 20.3).

Next we apply the Switching Lemma to the (new) top level of gates, each of which is a (log S)-DNF (so k = log S), with p = 1/(14 log S) and t = log S (see Figure 20.4). Thus each of these OR gates can be computed by a (log S)-CNF with probability at least

  1 − (7 · (log S)/(14 log S))^{log S} = 1 − (1/2)^{log S} = 1 − 1/S,

and after collapsing the two consecutive levels of ANDs, we get a depth-(d − 1) circuit that computes the parity of n/(14 · (14 log S)) inputs such that the top gates have fan-in at most log S.

Now we continue applying this procedure recursively until we get a depth-2 circuit that computes the parity of n/(14 · (14 log S)^{d−2}) inputs such that the top gates have fan-in at most log S. By the lemma, the fan-in at the top must be at least the number of variables,
Figure 20.3: Applying the Switching Lemma to the AND gates transforms them to OR gates, so we can collapse the first two levels to form one level of ORs.
Figure 20.4: We use the Switching Lemma to transform the (new) first level of ORs to ANDs, and then we collapse the two consecutive levels of ANDs to one new level.
so

  log S ≥ n / (14 · (14 log S)^{d−2}),

or

  S ≥ 2^{(1/14) n^{1/(d−1)}}.
20.3 References
The idea of using random restrictions, as well as the first proof that Parity is not in AC^0, is due to Furst, Saxe and Sipser [FSS84]. The lower bound was improved to exponential by Yao [Yao85], and the optimal lower bound is due to H˚astad [H˚as86].
Lecture 21
Proof Complexity
In this lecture and in the next one we are going to discuss proof complexity, which is related
to the question of NP vs. coNP in much the same way that circuit complexity is related
to the question of P vs. NP.
Recall that NP = coNP if and only if there is some coNP-complete problem (the complement of an NP-complete problem) that is in NP. In particular, equality will hold if UNSAT is in NP, where UNSAT is the problem of determining whether a Boolean formula
ϕ is a contradiction, that is, whether ϕ(x_1, . . . , x_n) = 0 for all truth assignments to the variables x_1, . . . , x_n of ϕ. If UNSAT is in NP, then we can make the following precise statements:
1. There is a polynomial-time verifier V and a polynomial bound p such that for all unsatisfiable formulas ϕ, there exists a proof π, with |π| ≤ p(|ϕ|), such that V(ϕ, π) accepts, and
2. For all satisfiable formulas ϕ and all π, V(ϕ, π) rejects (even if the length of π is not bounded by some polynomial in |ϕ|).
These statements follow immediately from the definition of NP.
For the purposes of studying proof complexity, we will consider a situation in which the
requirement of a polynomial bound p in (1) is discarded. Thus, we say that V is a sound and complete proof system for UNSAT if
1. (Soundness) If ϕ is a satisﬁable formula, then for all π, V (ϕ, π) rejects,
2. (Completeness) If ϕ is an unsatisﬁable formula, then there is a π such that V (ϕ, π)
accepts, and
3. (Eﬃciency) V runs in polynomial time.
We can now state a necessary and sufficient condition for NP to be different from coNP in terms of these sound and complete proof systems: NP ≠ coNP if and only if for every sound and complete proof system for UNSAT, there is an infinite sequence of unsatisfiable formulas whose shortest proof sizes grow superpolynomially.

What we are going to do now is to fix a natural proof system for UNSAT and then attempt to show unconditionally that there are formulas requiring superpolynomial proofs. This general approach can be likened to the approach physicists take when investigating a
physical theory: a physical theory oﬀers some predictions about phenomena that can be
observed in nature, and we check to see that those predictions are actually correct. If they
are, we take that to be evidence in support of the theory. In this case, the “theory” is that NP ≠ coNP, and this theory predicts that for every proof system for UNSAT there will be formulas requiring proofs of superpolynomial length.
Notice that any algorithm A for SAT, whether it runs in polynomial time or not, induces
a proof system for UNSAT. The proof that ϕ is unsatisﬁable is a description of a rejecting
computation of A on input ϕ. Hence a lower bound on the proof length of a proof in this
system automatically gives a lower bound on the running time of A.
21.1 Resolution Proof System
The proof system for UNSAT that we will investigate is Resolution. Given a CNF formula ϕ = C_1 ∧ C_2 ∧ . . . ∧ C_m, where each C_i is a clause, a proof of the unsatisfiability of ϕ (also called a “refutation” of ϕ) is a sequence of clauses (D_1, D_2, . . . , D_t) such that
1. Each D_i is either
  (a) a clause of the original formula ϕ (i.e., C_j for some j between 1 and m), or
  (b) a clause obtained from D_j and D_h, j < i and h < i, using the Resolution rule: from x ∨ D and ¬x ∨ D′, derive D ∨ D′; and
2. D_t is 0 (the false empty clause).
As an example of how Resolution works, consider the formula

  ϕ = x_1 ∧ (¬x_1 ∨ x_2) ∧ (¬x_2 ∨ x_3) ∧ ¬x_3;

we can construct a proof as follows (see Figure 21.1):
1. Let D_1 = x_1, D_2 = ¬x_1 ∨ x_2, D_3 = ¬x_2 ∨ x_3, and D_4 = ¬x_3, so the first four clauses in the refutation are just the original clauses of ϕ.
2. From D_1 = x_1 and D_2 = ¬x_1 ∨ x_2, using the Resolution rule we can deduce D_5 = x_2.
3. From D_3 = ¬x_2 ∨ x_3 and D_5 = x_2, using the Resolution rule we can deduce D_6 = x_3.
4. From D_4 = ¬x_3 and D_6 = x_3, using the Resolution rule we deduce the empty clause D_7 = 0.

The proof that ϕ is unsatisfiable is then just the sequence of clauses (D_1, D_2, . . . , D_7). The
reason that this is a proof is that the Resolution rule is sound: any assignment to the
variables of ϕ that satisﬁes ϕ will also satisfy any clauses derived using the Resolution rule.
Thus, if we derive a clause that cannot be satisﬁed by any assignment, it follows that ϕ can
likewise not be satisﬁed by any assignment to its variables. This reasoning shows that the
proof system is sound.
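The refutation above can be checked mechanically. This illustrative Python fragment (our own encoding: a clause is a frozenset of (variable, sign) literals) implements the Resolution rule and replays the derivation of the example formula ϕ = x1 ∧ (¬x1 ∨ x2) ∧ (¬x2 ∨ x3) ∧ ¬x3.

```python
def resolve(c1, c2, var):
    """Resolution rule: from (x ∨ D) and (¬x ∨ D') derive D ∨ D'.
    c1 must contain var positively and c2 must contain it negatively."""
    assert (var, True) in c1 and (var, False) in c2
    return (c1 - {(var, True)}) | (c2 - {(var, False)})

# The clauses of phi.
D1 = frozenset({(1, True)})                 # x1
D2 = frozenset({(1, False), (2, True)})     # ¬x1 ∨ x2
D3 = frozenset({(2, False), (3, True)})     # ¬x2 ∨ x3
D4 = frozenset({(3, False)})                # ¬x3

D5 = resolve(D1, D2, 1)  # x2
D6 = resolve(D5, D3, 2)  # x3
D7 = resolve(D6, D4, 3)  # the empty clause: refutation complete
```

Deriving the empty clause certifies unsatisfiability precisely because the rule is sound, as argued above.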
Figure 21.1: A tree-like Resolution proof.
In this particular example, the proof can be pictured as a tree (again see Figure 21.1), so the Resolution proof for ϕ is called tree-like. In general, a proof can be represented as a directed acyclic graph, with the sequence (D_1, D_2, . . . , D_t) being a topological sort of that graph.

Now we want to show that Resolution is complete (i.e., every unsatisfiable formula has a refutation). In doing so, it will be useful to allow the use of the weakening rule: from D, derive D ∨ x. We lose no generality in adding the weakening rule because, as it turns out, any proof that uses the rule can be converted to a proof that does not, without adding to the proof’s length:

Lemma 93 If there is a proof for ϕ of length¹ k using the Resolution and weakening rules, then there is a proof for ϕ of length at most k using the Resolution rule only.

We will not prove this result here.
Now let ϕ = C
1
∧ C
2
∧ . . . ∧ C
m
be an unsatisﬁable formula over the n variables
x
1
, x
2
, . . . , x
n
. Using the weakening rule, construct all clauses of length n that can be
obtained from some C
i
, i = 1, 2, . . . , m, by adding literals. We claim that we can derive all
length n clauses this way.
To see this, suppose that the clause

(x_1 = a_1) ∨ (x_2 = a_2) ∨ ... ∨ (x_n = a_n)    (∗)

cannot be derived using weakening. Clearly the assignment

x_1 = 1 − a_1, x_2 = 1 − a_2, ..., x_n = 1 − a_n    (∗∗)

contradicts (∗). Now take a clause C_i ∈ ϕ over the variables x_{i_1}, x_{i_2}, ..., x_{i_l}, where each x_{i_j} is an x_k for some k. A clause is contradicted by exactly one assignment, so if C_i is contradicted by the restriction of (∗∗) to x_{i_1}, x_{i_2}, ..., x_{i_l}, it follows that C_i is a subset of (∗), but this is impossible since we assumed that (∗) cannot be derived from any C_j by adding literals. Hence (∗∗) satisfies every clause of ϕ, but this is a contradiction since we assumed that ϕ is unsatisfiable. (In other words, a length-n clause that is not derivable using weakening describes a satisfying assignment for ϕ.)
Having derived all length-n clauses, we can use the Resolution rule to derive all length-(n−1) clauses. Continuing this process, we eventually derive the clauses x and x̄ for a variable x ∈ {x_1, x_2, ..., x_n}, and one more application of the Resolution rule gives 0, showing that ϕ has a refutation and that the proof system is indeed complete.
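As a concrete illustration, the Resolution rule and a brute-force refutation search can be sketched in a few lines of Python (the representation of clauses as frozensets of signed literals, and the helper names, are illustrative choices, not part of the notes):

```python
# Sketch of the Resolution rule and a brute-force refutation search
# (clauses as frozensets of (variable, sign) literals; names are illustrative).

def resolve(c1, c2, var):
    """Resolution rule: from C1 ∨ x and C2 ∨ x̄ derive C1 ∨ C2."""
    assert (var, True) in c1 and (var, False) in c2
    return (c1 - {(var, True)}) | (c2 - {(var, False)})

def refute(clauses):
    """Saturate under Resolution; return True iff the empty clause 0 is derived."""
    derived = set(clauses)
    while True:
        new = set()
        for c1 in derived:
            for c2 in derived:
                for (v, s) in c1:
                    if s and (v, False) in c2:
                        r = resolve(c1, c2, v)
                        if r not in derived:
                            new.add(r)
        if frozenset() in new:
            return True            # derived 0: refutation found
        if not new:
            return False           # saturated without deriving 0: satisfiable
        derived |= new

# (x) ∧ (x̄ ∨ y) ∧ (ȳ) is unsatisfiable, so Resolution refutes it:
phi = [frozenset({("x", True)}),
       frozenset({("x", False), ("y", True)}),
       frozenset({("y", False)})]
print(refute(phi))  # True
```

Since the set of clauses over a finite set of variables is finite, the saturation loop always terminates, which is a crude algorithmic reading of the completeness argument above.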
21.2 Width of a Resolution Proof
If (D_1, D_2, ..., D_t) is a Resolution proof for ϕ, then t is the length of the proof (we will also refer to the length of a proof as its size, since the length is the number of nodes in the proof DAG), and the size of the largest clause in the derivation is the width of the proof (we only count clauses that are derived, not clauses that are in ϕ). It turns out to be easier to prove width lower bounds than to prove length lower bounds directly. That this approach can still lead to strong lower bounds for length is the content of the following theorem:
Theorem 94 Fix an unsatisfiable k-CNF formula ϕ over n variables. Let w be the minimal width of a proof for ϕ, let L_T be the length of the shortest tree-like Resolution proof, and let L be the length of the shortest general Resolution proof. Then

w ≤ log L_T + k,
w ≤ O(√(n log L)) + k.
We ﬁrst state a lemma that will be useful in the proof of the theorem.
Lemma 95 Let ϕ be as above, x a variable in ϕ, and a ∈ {0, 1}. Suppose ϕ_{x=a} is a k-CNF, the width of ϕ_{x=1−a} is at most d − 1, and the width of ϕ_{x=a} is at most d. Then ϕ has width no greater than max(d, k).
We will prove the lemma and the second part of this theorem in the next lecture. For
now, we focus on the ﬁrst part of the theorem.
Proof: The strategy of the proof is to show that if ϕ has a short proof, then it has a proof
of small width. We will show this by induction on n, the number of variables, and b, where
Figure 21.2: The tree-like Resolution proof of ϕ consists of two subtrees which are derivations of x and x̄. Since the size of the proof is at most 2^b, one of these subtrees must have size at most 2^{b−1}.
b is an integer such that ϕ has a tree-like proof of size bounded above by 2^b. The claim is that if ϕ has a tree-like proof of size at most 2^b, then it has a proof of width at most b + k.

If b = 0, then the proof that ϕ is unsatisfiable is (0), in which case the claim is obvious. If n = 1, then the only possible literals are x and x̄, so any proof will have width 0 (or 1, depending on the conventions one chooses). Again the claim is immediate.
Now suppose ϕ has a tree-like proof of size 2^b. The last step of the proof must be a derivation of 0 from x and x̄, where x is a variable in ϕ (see Figure 21.2). Since the total size of the proof is 2^b, the tree-like derivation of x or the tree-like derivation of x̄ must have size no greater than 2^{b−1}; without loss of generality, assume that the derivation of x is the small one.
Suppose that we substitute x = 0 into all the clauses of ϕ and all the clauses we derived from ϕ. The derivation is still correct since the Resolution rule still applies after we make this substitution, so the tree-like derivation of x becomes a derivation of 0, i.e. a proof for ϕ_{x=0}. Hence ϕ_{x=0} has a tree-like proof of size at most 2^{b−1}, and we can apply the inductive hypothesis to conclude that there is also a proof of ϕ_{x=0} of width at most b + k − 1.
Next make the substitution x = 1 into all the clauses of ϕ and all the clauses derived from ϕ. Then, as above, we get a proof of ϕ_{x=1} of size at most 2^b. But since there is one less variable in ϕ_{x=1}, by induction on n there is a proof for ϕ_{x=1} of width at most b + k. The desired conclusion now follows from the lemma. □
21.3 References
The first super-polynomial lower bounds for Resolution proof length were proved by Tseitin [Tse70]. The first exponential lower bounds are due to Haken [Hak85]. The notion of width is due to Ben-Sasson and Wigderson [BSW01].
Lecture 22
Lower Bounds on Resolution Proof
Length
22.1 Width and Proof Complexity
22.1.1 The Main Theorem
In general, a derivation (ϕ → D) can have different proofs, and by its width w(ϕ → D) we mean the width of its minimum-width proof. The following theorem is about w(ϕ → 0) (sometimes written simply w(ϕ)), where 0 is the empty false clause and ϕ is unsatisfiable. A proof that derives (ϕ → 0) is called a refutation of ϕ.
Theorem 96 (Main Width-Length Theorem) Let ϕ be an unsatisfiable k-CNF formula on n variables. Let L_T and L be the lengths of the shortest tree-like and the shortest general refutation of ϕ, respectively. Then the width of the minimum-width refutation of ϕ, w(ϕ), is related to L_T and L as:

w(ϕ) ≤ log L_T + k    (22.1)
w(ϕ) ≤ O(√(n log L)) + k    (22.2)
This theorem is an attempt at simplifying the study of proof complexity [BSW01]. Instead of working directly with L_T and L, we work on proving linear lower bounds on w(ϕ) to get exponential lower bounds on L_T and L. That is, if w(ϕ) = Ω(n) and k = o(n), then by Relations (22.1) and (22.2), L_T and L are 2^{Ω(n)}. If we replace n with n^ε, we can't say anything about L using the sublinear lower bound on w(ϕ), but we can still get a subexponential lower bound on L_T.
We proved Relation (22.1) last time, using a simple induction argument on the structure of the tree-like refutation. We can't mimic that proof for Relation (22.2), though, since a single clause in a general refutation can be used later in more than one derivation, leading to a DAG structure. We prove Relation (22.2) using the following technical lemma.
Lemma 97 Let π_d be the set of clauses of size strictly greater than d in a general refutation π of an unsatisfiable k-CNF formula ϕ on n variables. If |π_d| < (1 − d/2n)^{−b} for some parameter b, then the width of π is at most d + b + k.
The clauses in π_d are in a sense the problematic clauses in π, but if there are not too many of them, we can safely bound the width of π. This lemma is true for any value of d, and the proof of Relation (22.2) uses the value of d that optimizes this bound.
Proof: (of Relation (22.2) of the Width-Length Theorem) Let π be the shortest general refutation of ϕ, with length |π| = L. We obtain the result by using Lemma 97 with the trivial bound |π_d| < L:

w(ϕ) ≤ w(π)
     ≤ d + b + k    (where b is defined by L = (1 − d/2n)^{−b})
     = d + log_{1/(1 − d/2n)} L + k
     ≤ d + log_{1 + d/2n} L + k    (using 1/(1−ε) ≈ 1+ε)
     ≤ d + O((2n/d) log L) + k    (using log_{1+ε} x ≈ (1/ε) log x)
     = d + O((n/d) log L) + k

Therefore, setting d = (n/d) log L (i.e., d = √(n log L)) in the relation above, we get the desired upper bound on w(ϕ). □
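The balancing step at the end of the proof can be checked numerically. The sketch below (with arbitrary illustrative values of n and log L, and constants dropped) confirms that d = √(n log L) equalizes the two terms of the bound d + (n/d) log L, which by the AM-GM inequality is where the bound is minimized:

```python
import math

# Numeric sanity check of the balancing step (illustrative values of n, log L).
def bound(d, n, logL):
    """The quantity d + (n/d)*log L from the proof, constants dropped."""
    return d + (n / d) * logL

n, logL = 10**6, 50.0
d_star = math.sqrt(n * logL)       # the choice d = sqrt(n log L) from the proof
best = min(bound(d, n, logL) for d in range(1, 20000))
# By AM-GM, bound(d) >= 2*sqrt(n*logL) for all d > 0, with equality at d = d_star.
print(bound(d_star, n, logL), best)
```

The minimum over integer values of d agrees with the analytic optimum up to rounding, which is all the proof needs since the bound only holds up to constants anyway.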
22.1.2 Proof of the technical lemmas
In this section, we prove two technical lemmas used in the proof of the Width-Length theorem. Lemma 98 was used in proving Relation (22.1) last time, and it will also be used in the proof of Lemma 97 this time. We need more notation before proceeding. For a variable x in ϕ, let x^1 and x^0 be aliases for the literals x and x̄ respectively. Also, let ϕ_{x=a} be the restriction of ϕ to x = a, for a ∈ {0, 1}. That is, ϕ_{x=a} is the set of clauses obtained from ϕ after removing all clauses that have x^a in them and removing x^{1−a} from all clauses that have this literal. The restriction of a clause D to x = a, denoted by D_{x=a}, is obtained similarly.
Lemma 98 If, for a ∈ {0, 1}, ϕ_{x=a} is a k-CNF formula,

w(ϕ_{x=1−a}) ≤ d − 1 and w(ϕ_{x=a}) ≤ d,

then w(ϕ) ≤ max(d, k).
Proof: We will first prove the claim: if w(ϕ_{x=1−a}) ≤ d − 1, then w(ϕ → x^a) ≤ d. We assume a = 1 for notational ease (the proof for general a extends easily). Let π be a refutation of ϕ_{x=0} with width at most d − 1, and let φ denote the set of clauses in ϕ that have the literal x in them. Then φ′ = φ_{x=0} ∩ π denotes the restricted clauses of φ that were used in the proof π. Obtain a new proof π′ by copying all clauses in π, and adding x back to the clauses in φ′ and to their subsequent derivations (to preserve validity). The new proof π′ derives x directly if φ′ was not an empty set, or indirectly via a weakening rule from the final 0 clause otherwise. Thus, π′ is a valid proof of (ϕ → x) with width at most w(π) + 1 = d.
Figure 22.1: Composing (ϕ → x) with (ϕ_{x=1} → 0), to obtain a refutation for ϕ.
Now we turn to the proof of the lemma, again assuming a = 1. We use the claim above to derive (ϕ → x). We then resolve this derived x with all clauses of ϕ that don't contain x, to obtain ϕ_{x=1}. This is finally fed into the refutation of ϕ_{x=1}. See Figure 22.1 for a depiction of this composition. Since ϕ_{x=1} is a k-CNF and the width of both refutations used in the composition is at most d, we get a refutation of ϕ with width at most max(d, k). □
We now give the proof of Lemma 97, which was used in proving Relation (22.2) of the Width-Length theorem.
Proof: (of Lemma 97) The proof is by induction on b and n. When b = 0, |π_d| < 1 and the lemma is trivially true for any value of n. Similarly, for n = 1 the lemma is trivially true for any value of b, as there are only two literals in the formula. For the induction step, we need a literal that hits many clauses in π_d, and we will indeed have such a literal by the pigeonhole principle (viewing the 2n literals as holes and the sum total of the sizes of all clauses in π_d as the number of pigeons). Specifically, there must be a literal, say x wlog, that occurs in at least (d/2n)·|π_d| clauses in π_d (another way to look at this result is to pick one of the 2n literals at random and consider the expected number of clauses in π_d that this literal hits).

Using this x, we obtain valid refutations of ϕ_{x=0} and ϕ_{x=1} by restricting π to x = 0 and x = 1. The induction step then follows from applying Lemma 98 to these new refutations. By induction on n, we already have w(ϕ_{x=0}) ≤ b + d + k. So we just need to prove that w(ϕ_{x=1}) ≤ b + d + k − 1. Here we are rescued by the fact that x appears in many clauses in π_d, and hence the set of clauses that are still wider than d in the restricted refutation ϕ_{x=1}
Figure 22.2: The bipartite graph G_ϕ associated with a formula ϕ; clauses C_p and C_q are both adjacent to the variables x_i, x_j, x_k (each contains those variables, possibly negated).
(call this set π′_d) is small enough:

|π′_d| ≤ (1 − d/2n)·|π_d|
      < (1 − d/2n)·(1 − d/2n)^{−b} = (1 − d/2n)^{−(b−1)}    (substituting for |π_d|)

Since the parameter is now (b − 1), w(ϕ_{x=1}) ≤ b − 1 + d + k by induction on b, and we are done with the proof. □
22.2 Width Lower Bounds

22.2.1 The Setting

We establish lower bounds on width by relating it to a natural combinatorial property. In our case, we will use the expansion property of the bipartite graph associated with a CNF formula. We first define this expansion property and state the main theorem relating width and expansion. Next we use this theorem to show that random k-CNFs require exponential-length refutations (whp, due to their expansion properties).
Definition 29 (Graph associated with a formula) Let ϕ be a CNF formula on m clauses C_1, C_2, ..., C_m and n variables x_1, x_2, ..., x_n. The graph associated with ϕ (see Figure 22.2) is the undirected bipartite graph G_ϕ = (L, R, E), defined by

L = {x_1, x_2, ..., x_n},
R = {C_1, C_2, ..., C_m} and
E = {(x_i, C_j) | clause C_j contains one of the literals x_i or x̄_i}.
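A minimal sketch of this construction in Python (the clause representation and helper names are illustrative choices):

```python
# Sketch: building the bipartite graph G_phi of Definition 29.
# A clause is a set of (variable, sign) pairs; names are illustrative.

def clause_variable_graph(clauses):
    """E = {(x_i, C_j) : clause C_j contains x_i or its negation}."""
    edges = set()
    for j, clause in enumerate(clauses):
        for (var, _sign) in clause:
            edges.add((var, j))        # the sign is irrelevant for G_phi
    return edges

def neighbourhood(S, edges):
    """N(S) for a set S of clause indices."""
    return {var for (var, j) in edges if j in S}

phi = [{("x1", True), ("x2", False)},
       {("x2", True), ("x3", True)},
       {("x1", False), ("x3", False)}]
E = clause_variable_graph(phi)
print(sorted(neighbourhood({0, 1}, E)))  # ['x1', 'x2', 'x3']
```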
Definition 30 (Bipartite Expansion) A CNF formula ϕ on m clauses C_1, C_2, ..., C_m is (r, c)-expanding if and only if

for all S ⊆ {C_1, C_2, ..., C_m} with |S| ≤ r, |N(S)| ≥ (1 + c)·|S|,

where N(S) refers to the neighbourhood of S in G_ϕ.
Theorem 99 (Main Width-LB (Lower-Bound) Theorem) For positive r and c, if ϕ is (r, c)-expanding and unsatisfiable, then

w(ϕ) ≥ rc/2.

The proof of this theorem is quite interesting and is given in the next two sections. The application of this theorem lies in the context of the Width-Length theorem (Theorem 96), since a linear lower bound on rc is now enough to show an exponential lower bound on the length of any refutation of ϕ. In fact, the next theorem gives such a lower bound for random k-CNFs.
Theorem 100 Let ϕ be a random k-CNF formula on n variables and ∆n clauses, constructed by picking each clause with k variables independently at random, with repetition, from all (n choose k)·2^k possible clauses. Then, whp, ϕ is (n/∆^{1/(k−2+ε)}, ε)-expanding, where the clause density ∆ and the parameter ε with 0 ≤ ε ≤ 1/2 are constants.
The proof of this theorem follows from counting arguments and is left as an exercise. For a specific value of ε, say 1/2, we see that random k-CNFs are (Ω(n), 1/2)-expanding, and hence, by the Width-LB and Width-Length theorems, they require exponential-length refutations whp (note that they are also unsatisfiable whp for a proper choice of ∆ and k).
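The expansion property can be checked empirically on toy instances by brute force. The sketch below uses illustrative parameters far too small to carry any asymptotic meaning, and one arbitrary sampling convention (k literals drawn with replacement):

```python
import itertools, random

# Empirical check on toy parameters (no asymptotic meaning): sample a random
# 3-CNF with Delta*n clauses and test |N(S)| >= (1+c)|S| for all |S| <= r.
def random_kcnf(n, m, k, rng):
    """m clauses, each of k literals drawn with replacement (one sampling convention)."""
    return [frozenset((rng.randrange(n), rng.random() < 0.5) for _ in range(k))
            for _ in range(m)]

def is_expanding(clauses, r, c):
    for size in range(1, r + 1):
        for S in itertools.combinations(range(len(clauses)), size):
            N = {v for j in S for (v, _s) in clauses[j]}   # neighbourhood in G_phi
            if len(N) < (1 + c) * size:
                return False
    return True

rng = random.Random(0)
phi = random_kcnf(n=60, m=30, k=3, rng=rng)   # clause density Delta = 1/2
print(is_expanding(phi, r=4, c=0.5))
```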
22.2.2 Matching and Satisfiability

This section prepares us for the proof of the Width-LB theorem. Consider a subformula S of ϕ. If we have a matching of S in G_ϕ, then we can satisfy S by assigning a proper truth value to the unique non-conflicting variable that each of its clauses is matched to. This simple observation, along with Hall's Matching theorem, leads to an interesting result called Tarsi's theorem. We look at Hall's theorem first.
Theorem 101 (Hall's Matching Theorem) For a bipartite graph G = (L, R, E), there exists a matching of a set of vertices R′ ⊆ R if and only if

for all R″ ⊆ R′, |N(R″)| ≥ |R″|,

where N(R″) refers to the neighbourhood of R″ in G.
If there is a matching of the vertices in R′, then the condition in the theorem is clearly true. But the theorem's strength lies in the sufficiency of the condition, in spite of possible overlaps between the neighbourhoods of different R″'s. Sufficiency of the condition leads to the following two corollaries of Hall's theorem. The first corollary follows from the absence of a matching (in G_ϕ) of the entire formula ϕ. The second one follows from the presence of a matching (in G_ϕ) of any subformula of size at most r (by virtue of ϕ's bipartite expansion property).
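For tiny graphs, both sides of Hall's theorem can be verified by brute force (an illustrative sketch; adj[j] plays the role of N({j}) for the j-th right vertex, say clause C_j):

```python
import itertools

# Brute-force illustration of Hall's theorem on a tiny bipartite graph;
# adj[j] is the neighbourhood N({j}) of the j-th right vertex.
def halls_condition(adj):
    """Check |N(R')| >= |R'| for every nonempty subset R' of right vertices."""
    for size in range(1, len(adj) + 1):
        for Rp in itertools.combinations(range(len(adj)), size):
            if len(set().union(*(adj[j] for j in Rp))) < size:
                return False
    return True

def has_matching(adj):
    """Try every way of assigning each right vertex a distinct neighbour."""
    return any(len(set(choice)) == len(adj)
               for choice in itertools.product(*adj))

adj = [{"x1", "x2"}, {"x2"}, {"x1", "x3"}]
print(halls_condition(adj), has_matching(adj))  # True True
```

On every small example the two brute-force answers coincide, which is exactly the content of the theorem (the exponential enumeration is of course only for illustration).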
Figure 22.3: S′ is an arbitrary subset of {C_1, C_2, ..., C_m} − S in the proof of Tarsi's theorem. |S′| < |N(S′) − N(S)|, because otherwise we could add S′ to the already maximal S, resulting in a contradiction.
Corollary 102 If ϕ is unsatisfiable, then there exists a subset S of the clauses of ϕ for which |S| > |N(S)|.

Corollary 103 For positive r and c, if ϕ is (r, c)-expanding, then every subformula of ϕ having at most r clauses is satisfiable.
We now present Tarsi's theorem and its corollary, which will be used in the proof of the Width-LB theorem in the next section.

Theorem 104 (Tarsi's theorem) For a minimally unsatisfiable CNF formula ϕ on m clauses and n variables, m > n. (Note that minimal unsatisfiability refers to minimality w.r.t. inclusion of clauses. That is, if we take away any clause from ϕ, it becomes satisfiable.)
Proof: Since ϕ is an unsatisfiable formula (on clauses C_1, C_2, ..., C_m and variables x_1, ..., x_n), by Corollary 102 of Hall's theorem we have a subset of clauses S for which |S| > |N(S)|. Let's make sure that S is the maximal such set. If S is the entire formula, we are done with the proof, as |S| = m and |N(S)| = n. So let's assume otherwise (i.e., |S| < m) and let S′ be an arbitrary subset of {C_1, C_2, ..., C_m} − S. By maximality of S, we have |S′| < |N(S′) − N(S)| (see Figure 22.3). By Hall's theorem, this result basically translates to a matching in G_ϕ (and hence satisfiability) of all the clauses in {C_1, C_2, ..., C_m} − S, using only the variables in {x_1, ..., x_n} − N(S). Since ϕ is minimally unsatisfiable, S can also be satisfied (of course using only the variables in N(S)), leading to the satisfiability of the entire formula ϕ. The only way to resolve this contradiction requires S to be the entire formula, the case that we have already proved. □
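Tarsi's theorem can be sanity-checked by brute force on small formulas (illustrative sketch; the helper names are ours). The example below is the complete set of 2-clauses on two variables, which is minimally unsatisfiable with m = 4 > n = 2:

```python
import itertools

# Brute-force sanity check of Tarsi's theorem (illustrative helper names).
# A clause is a frozenset of (variable index, sign) literals.
def satisfiable(clauses, n):
    return any(all(any(a[v] == s for (v, s) in c) for c in clauses)
               for a in itertools.product([False, True], repeat=n))

def minimally_unsat(clauses, n):
    """Unsatisfiable, but removing any single clause makes it satisfiable."""
    if satisfiable(clauses, n):
        return False
    return all(satisfiable(clauses[:i] + clauses[i+1:], n)
               for i in range(len(clauses)))

# All four 2-clauses on two variables: minimally unsatisfiable, and m = 4 > n = 2.
phi = [frozenset({(0, True), (1, True)}),
       frozenset({(0, True), (1, False)}),
       frozenset({(0, False), (1, True)}),
       frozenset({(0, False), (1, False)})]
print(minimally_unsat(phi, n=2), len(phi) > 2)  # True True
```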
Corollary 105 If a CNF formula ϕ on m clauses and n variables minimally implies a clause D of size k, then m > n − k, or equivalently, k > n − m. (Note that "minimally implies" again refers to minimality w.r.t. inclusion of clauses. That is, D can be derived from ϕ but not from any proper subformula of ϕ.)
Proof: Recall that x^1 and x^0 are aliases for x and x̄ respectively. If D is (x_1^{a_1} ∨ ... ∨ x_k^{a_k}), the unique assignment that contradicts D is then (x_1 = 1 − a_1, ..., x_k = 1 − a_k). Restricting ϕ to this assignment yields an unsatisfiable formula ϕ_{(x_1 = 1−a_1, ..., x_k = 1−a_k)} on (n − k) variables and at most m clauses. It is also minimally unsatisfiable, because ϕ minimally implied D in the first place. We obtain the result by direct application of Tarsi's theorem. □
22.2.3 Putting them together

Equipped with the tools of the last section, we are ready for the proof of the Width-LB theorem (Theorem 99). Intuitively, ϕ can't be refuted using only r of its clauses, and we should be able to find a clause that can be derived from a "fairly big" but not "too big" subformula of ϕ. The proof follows from arguing about the size of such a clause in the minimum-width refutation.

Proof: (of the Width-LB Theorem) Let π be the minimum-width refutation of ϕ, an (r, c)-expanding formula on m clauses. Define a measure µ (a number between 1 and m) for all clauses D ∈ π as:

µ(D) = |smallest subformula of ϕ that minimally implies D|.
This measure has the following easily verifiable properties.

1. For any clause C in π that is also in ϕ, µ(C) = 1.
2. For the final 0 clause in π, µ(0) ≥ r (from Corollary 103 of Hall's theorem).
3. For a clause D derived from D′ and D″ in π, µ(D) ≤ µ(D′) + µ(D″). In other words, µ is a subadditive measure.

The above properties imply that there exists a clause in π (say D*) whose measure is such that r/2 ≤ µ(D*) ≤ r. To see this, start from the final 0 clause and pick the higher-measure clause of the two clauses from which it is derived. The measure of the picked clause is at least r/2, and if it is also at most r, stop; otherwise continue this process recursively. Since µ(C) = 1 for a clause C in ϕ, we will ultimately succeed, due to the subadditivity of the measure.
Now, let S be the smallest subformula of ϕ that minimally implies D*. Then, using the fact that r/2 ≤ |S| ≤ r and the corollary of Tarsi's theorem, we get (note how we use the fact that S is "fairly big" but not "too big"):

w(ϕ) ≥ size of D*
     ≥ |N(S)| − |S|    (by Corollary 105 of Tarsi's theorem)
     ≥ (1 + c)·|S| − |S|    (by the expansion property of ϕ, since |S| ≤ r)
     = c·|S|
     ≥ rc/2. □
Exercises

1. Prove Theorem 100 on the expansion property of random k-CNFs.

2. Show that Relation (22.1) of the Width-Length theorem (proved last time) and Lemma 97 can be proven by induction on n alone. (The current proofs use induction on two variables.)

3. Provide a family of examples for which the upper bound on w(ϕ) in Lemma 98 becomes tight. (Note that a tighter bound is min(d + 1, max(d, k)), because there is a composition slightly different from the one in Figure 22.1 that yields a bound of d + 1 on w(ϕ).)
22.3 References

All the results in this lecture are due to Ben-Sasson and Wigderson [BSW01].
Lecture 23
Pseudorandomness and
Derandomization
These notes follow quite literally the notes that I wrote for the PCMI school on computational
complexity in Summer 2000. There is some redundancy between these notes and the content
of past lectures.
23.1 Probabilistic Algorithms versus Deterministic Algorithms

A probabilistic algorithm A(·, ·) is an algorithm that takes two inputs x and r, where x is an instance of some problem that we want to solve, and r is the output of a random source. A random source is an idealized device that outputs a sequence of bits that are uniformly and independently distributed. For example, the random source could be a device that tosses coins, observes the outcome, and outputs it. A probabilistic algorithm A is good if it is efficient and if, say, for every x,

Pr_r[A(x, r) = right answer for x] ≥ 3/4.
We will typically restrict to the case where A solves a decision problem (e.g. it tests whether two read-once branching programs are equivalent). In this case we say that a language L is in BPP if there is a polynomial time algorithm A(·, ·) (polynomial in the length of the first input) such that for every x

Pr_r[A(x, r) = χ_L(x)] ≥ 3/4

or, said another way,

x ∈ L ⇒ Pr_r[A(x, r) = 1] ≥ 3/4 and
x ∉ L ⇒ Pr_r[A(x, r) = 1] ≤ 1/4.
The choice of the constant 3/4 is clearly quite arbitrary. For any constant 1/2 < p < 1, if we had defined BPP by requiring the probabilistic algorithm to be correct with probability at least p, we would have given an equivalent definition. In fact, for any polynomial p, it would have been equivalent to define BPP by asking the algorithm to be correct with probability at least 1/2 + 1/p(n), where n is the size of the input, and it would have also been equivalent if we had asked the algorithm to be correct with probability at least 1 − 1/2^{p(n)}. That is, for any two polynomials p and q, if for a decision problem L we have a probabilistic polynomial time algorithm A that solves L on every input of length n with probability at least 1/2 + 1/p(n), then there is another probabilistic algorithm A′, still running in polynomial time, that solves L on every input of length n with probability at least 1 − 2^{−q(n)}.
For quite a few interesting problems, the only known polynomial time algorithms are probabilistic. A well-known example is the problem of testing whether two multivariate low-degree polynomials given in an implicit representation are equivalent. Another example is the problem of extracting "square roots" modulo a prime, i.e. to find solutions, if they exist, to equations of the form x^2 = a (mod p), where p and a are given and p is prime. More generally, there are probabilistic polynomial time algorithms to find roots of polynomials modulo a prime. There is no known deterministic polynomial time algorithm for any of the above problems.
It is not clear whether the existence of such probabilistic algorithms suggests that probabilistic algorithms are inherently more powerful than deterministic ones, or that we have not yet been able to find the best possible deterministic algorithms for these problems. In general, it is quite an interesting question to determine the relative power of probabilistic and deterministic computations. This question is the main motivation for the results described in this lecture and the next ones.
23.1.1 A trivial deterministic simulation
Let A be a probabilistic algorithm that solves a decision problem L. On input x of length n, say that A uses a random string r of length m = m(n) and runs in time T = T(n) (note that m ≤ T).

It is easy to come up with a deterministic algorithm that solves L in time 2^{m(n)}·T(n). On input x, compute A(x, r) for every r. The correct answer is the one that comes up the majority of the times, so, in order to solve our problem, we just have to keep track, during the computation of A(x, r) for every r, of the number of strings r for which A(x, r) = 1 and the number of strings r for which A(x, r) = 0.
Notice that the running time of the simulation depends exponentially on the number of
random bits used by A, but only polynomially on the running time of A. In particular, if
A uses a logarithmic number of random bits, then the simulation is polynomial. However,
typically, a probabilistic algorithm uses a linear, or more, number of random bits, and so
this trivial simulation is exponential. As we will see in the next section, it is not easy to
obtain more eﬃcient simulations.
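The trivial simulation above can be sketched directly. The toy procedure A below is an illustrative stand-in for a probabilistic algorithm: it decides the parity of x, but errs on one of the 2^m random strings:

```python
from itertools import product

# The trivial 2^m · T(n) simulation: run A(x, r) on every random string r
# and output the majority answer (A is any 0/1-valued procedure).
def derandomize(A, x, m):
    ones = sum(A(x, r) for r in product((0, 1), repeat=m))
    return 1 if 2 * ones > 2 ** m else 0

# Toy stand-in for a BPP-style algorithm: decides the parity of x, but errs
# on one of the 2^m random strings (so it is correct with probability 15/16).
def A(x, r):
    err = all(b == 0 for b in r)
    return (x % 2) ^ (1 if err else 0)

print([derandomize(A, x, m=4) for x in range(4)])  # [0, 1, 0, 1]
```

The loop runs A exactly 2^m times, which is the exponential dependence on the number of random bits noted above.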
23.1.2 Exponential gaps between randomized and deterministic procedures
For some computational problems (e.g. approximating the size of a convex body) there are
probabilistic algorithms that work even if the object on which they operate is exponentially
big and given as a black box; in some cases one can prove that deterministic algorithms
cannot solve the same problem in the same setting, unless they use exponential time. Let
us see a particularly clean (but more artiﬁcial) example of this situation.
Suppose that there is some function f : {0,1}^n × {0,1}^n → {0,1} that is given as an oracle; we want to devise an algorithm that, on input x, finds an approximation (say, to within an additive factor 1/10) to the value Pr_y[f(x, y) = 1]. A probabilistic algorithm would pick O(1) points y_1, ..., y_t at random, evaluate f(x, y_i), and then output the fraction of i such that f(x, y_i) = 1. This will be an approximation to within 1/10 with good probability.
However a deterministic subexponential algorithm, given x, can only look at a negligible
fraction of the values f(x, y). Suppose that f is zero everywhere. Now consider the function
g(x, y) that is equal to f on all the points that our algorithm queries, and is 1 elsewhere
(note that, by this deﬁnition, the queries of the algorithm on input x will be the same for
f and g). If the algorithm takes subexponential time, g is almost everywhere one, yet the
algorithm will give the same answer as when accessing f, which is everywhere zero. If our algorithm makes fewer than 2^{n−1} oracle queries, it cannot solve the problem with the required accuracy.
23.2 Derandomization Under Complexity Assumptions
It is still not known how to improve, in the general case, the deterministic simulation of
Section 23.1.1, and the observation of Section 23.1.2 shows one of the diﬃculties in achieving
an improvement. If we want to come up with a general way of transforming probabilistic
procedures into deterministic subexponential procedures, the transformation cannot be
described and analyzed by modeling in a “black box” way the probabilistic procedure.¹ If
we want to deterministically and subexponentially simulate BPP algorithms, we have to
exploit the fact that a BPP algorithm A(·, ·) is not an arbitrary function, but an efficiently
computable one, and this is diﬃcult because we still have a very poor understanding of the
nature of eﬃcient computations.
The results described in these notes show that it is indeed possible to deterministically
simulate probabilistic algorithms in subexponential (or even polynomial) time, provided
that certain complexitytheoretic assumptions are true. It is quite usual in complexity
theory that, using reductions, one can show that the answer to some open question is
implied by (or even equivalent to) the answer to some other question, however the nature of
the results of these notes is somewhat unusual. Typically a reduction from a computational
problem A to a problem B shows that if B has an eﬃcient algorithm then A has also an
efficient algorithm, and, by contraposition, if A is intractable then B is also intractable.
In general, using reductions one shows that algorithmic assumptions imply algorithmic
consequences, and intractability assumptions imply intractability consequences. In these
notes we will see instead that the existence of eﬃcient derandomized algorithms is implied by
the intractability of some other problem, so that a hardness condition implies an algorithmic consequence.

¹More precisely, it is impossible to have a subexponential time deterministic “universal derandomization procedure” that, given x and oracle access to an arbitrary function A(·, ·), outputs 1 when Pr_r[A(x, r) = 1] ≥ 3/4 and outputs 0 when Pr_r[A(x, r) = 1] ≤ 1/4. In fact, more generally, it is impossible to give subexponential time algorithms for all BPP problems by using “relativizing” techniques. It is beyond the scope of these notes to explain what this means, and why it is more general. “Relativizations” are discussed in [Pap94], where it is possible to find pointers to the relevant literature.
In the next section we will introduce some notation about computational problems and
complexity measures, and then we will state some results about conditional derandomization.
23.2.1 Formal Deﬁnitions of Complexity Measures and Complexity Classes
For a decision problem L and an integer n we denote by L_n the restriction of L to inputs of length n. It will be convenient to think of L_n as a Boolean function L_n : {0,1}^n → {0,1} (with the convention that x ∈ L_n if and only if L_n(x) = 1).
For a function f : {0,1}^n → {0,1}, consider the size of the smallest circuit that solves f; denote this number by CC(f). By definition, we have that if C is a circuit with n inputs of size less than CC(f), then there exists an x ∈ {0,1}^n such that C(x) ≠ f(x).

Consider now, for every n, the largest s such that for every circuit C of size ≤ s we have Pr_{x ∈ {0,1}^n}[C(x) = f(x)] ≤ 1/2 + 1/s; denote this number by H(f).
Recall that DTIME(T(n)) is the class of decision problems that can be solved by deterministic algorithms running in time at most T(n) on inputs of length n. We have the classes E = DTIME(2^{O(n)}) and EXP = DTIME(2^{n^{O(1)}}).
23.2.2 Hardness versus Randomness
From our previous arguments, we have BPP ⊆ EXP. Since there are settings where probabilistic procedures require exponential time to be simulated, one would conjecture that BPP ⊄ DTIME(2^{n^{o(1)}}); on the other hand, BPP seems to still represent a class of feasible computations, and it would be very surprising if BPP = EXP. As we will see in a moment, something is wrong with the above intuition. Either BPP = EXP, which sounds really impossible, or else it must be the case that BPP has subexponential time deterministic algorithms (that will work well only on average, but that would be quite remarkable enough).

Theorem 106 ([IW98]) Suppose BPP ≠ EXP; then for every BPP language L and every ε > 0 there is a deterministic algorithm A that works in time 2^{n^ε} and, for infinitely many n, solves L on a fraction 1 − 1/n of the inputs of length n.
This gives a non-trivial simulation of BPP under an uncontroversial assumption. We can also get an optimal simulation of BPP under an assumption that is much stronger, but quite believable.

Theorem 107 ([IW97]) Suppose there is a problem L in E and a fixed δ > 0 such that, for all sufficiently large n, CC(L_n) ≥ 2^{δn}; then P = BPP.
We will call the statement “there is a problem L in E and a fixed δ > 0 such that, for all sufficiently large n, CC(L_n) ≥ 2^{δn}” the “IW assumption.” Note that if the IW assumption is true, then it is true in the case where

L = {(M, x, 1^k) : machine M halts within 2^k steps on input x}.

Notice also that L cannot be solved by algorithms running in time 2^{o(n)}, and so it would be a little bit surprising if it were solvable by circuits of size 2^{o(n)}, because it would mean that, for general exponential time computations, non-uniformity buys more than a polynomial speedup. In fact it would be very surprising if circuits of size 2^{.99n} existed for L.
The two theorems that we just stated are the extremes of a continuum of results showing that by making assumptions on the hardness of problems in E and EXP it is possible to devise efficient deterministic algorithms for all BPP problems. The stronger the assumption, the more efficient the simulation.

Notice that the assumption in Theorem 107 is stronger than the assumption in Theorem 106 in two ways, and that, similarly, the conclusion of Theorem 107 is stronger than the conclusion of Theorem 106 in two ways. On the one hand, the assumption in Theorem 107 refers to circuit size, that is, to a non-uniform measure of complexity, whereas the assumption in Theorem 106 uses a uniform measure of complexity (running time of probabilistic algorithms). This difference accounts for the fact that the conclusion of Theorem 107 gives an algorithm that works for all inputs, while the conclusion of Theorem 106 gives an algorithm that works only for most inputs. The other difference is that Theorem 107 assumes exponential hardness, while Theorem 106 assumes only super-polynomial hardness. This is reflected in the running time of the consequent deterministic simulations (respectively, polynomial and subexponential).
When one makes the non-uniform assumption that there is a problem in E that requires circuits of size s(n), the consequence is a deterministic simulation of BPP in time roughly 2^{s^{−1}(n^{O(1)})} [ISW99, SU01, Uma02]. So if one assumes that E requires super-polynomial circuits, BPP can be simulated in time 2^{n^{o(1)}}; if one assumes that E requires circuits of size 2^{Ω(n)}, then the simulation runs in time n^{O(1)}; if one assumes that E requires circuits of size n^{log n}, then the simulation runs in time 2^{2^{O(√log n)}}; and so on. The result of [IW98] does not scale up so well when one is willing to make stronger uniform assumptions. In particular, the following is an open question.
Conjecture 1 Suppose E ⊆ ∩_{δ>0} BPTIME(2^{δn}); then for every BPP language L there is a deterministic polynomial time algorithm A that, for infinitely many n, solves L on a fraction 1 − 1/n of the inputs of length n.
23.3 Pseudorandom Generators

We say that a function G : {0,1}^t → {0,1}^m is an (s, ε)-pseudorandom generator if for every circuit D of size ≤ s we have

|Pr_r[D(r) = 1] − Pr_z[D(G(z)) = 1]| ≤ ε
Suppose that we have a probabilistic algorithm A such that for inputs x of length n the computation A(x, ·) can be performed by a circuit of size s(n); suppose that for every x we have Pr_r[A(x, r) = right answer] ≥ 3/4; and suppose that we have an (s, 1/8)-pseudorandom generator G : {0,1}^{t(n)} → {0,1}^{m(n)}. Then we can define a new probabilistic algorithm A' such that A'(x, z) = A(x, G(z)). It is easy to observe that for every x we have

Pr_z[A'(x, z) = right answer] ≥ 5/8

and that, using the trivial derandomization, we can get a deterministic algorithm A'' that always works correctly and whose running time is 2^t times the sum of the running time of A plus the running time of G.
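As a concrete illustration, the trivial derandomization can be sketched as follows. This is a toy sketch: the particular A and G below are arbitrary stand-ins chosen only so that the code runs, not the objects discussed in the text.

```python
import itertools

# Trivial derandomization: run A'(x, z) = A(x, G(z)) on every seed z in
# {0,1}^t and output the majority answer over all 2^t runs.

def derandomize(A, G, x, t):
    votes = [A(x, G(z)) for z in itertools.product((0, 1), repeat=t)]
    return max(set(votes), key=votes.count)   # majority answer

t = 3
G = lambda z: z                                    # identity "generator", for illustration
right = lambda x: sum(x) % 2                       # the answer A is trying to compute
A = lambda x, r: right(x) ^ (1 if all(r) else 0)   # errs on exactly one seed

assert derandomize(A, G, (1, 0, 1), t) == right((1, 0, 1))
```

Since the toy A errs on only one of the 2^t = 8 seeds, the majority vote is always correct, mirroring the argument from the 5/8 bound above.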
If t is logarithmic in m and s, and if G is computable in poly(m, s) time, then the whole simulation runs in deterministic polynomial time. Notice also that if we have an (s, ε)-pseudorandom generator G : {0,1}^t → {0,1}^m, then for every m' ≤ m we also have, for a stronger reason, an (s, ε)-pseudorandom generator G' : {0,1}^t → {0,1}^{m'} (G' just computes G and omits the last m − m' bits of the output). So there will be no loss of generality if we consider only generators for the special case where, say, s = 2m. (This is not really necessary, but it will help reduce the number of parameters in the statements of theorems.)
We have the following easy theorem.

Theorem 108 Suppose there is a family of generators G_m : {0,1}^{O(log m)} → {0,1}^m that are computable in poly(m) time and that are (2m, 1/8)-pseudorandom; then P = BPP.
Of course this is only a sufficient condition. There could be other approaches to proving (conditionally) P = BPP, without passing through the construction of such strong generators. Unfortunately we hardly know of any other approach, and anyway the (arguably) most interesting results are proved using pseudorandom generators.^2
23.4 The two main theorems

23.4.1 The Nisan-Wigderson Theorem

Theorem 109 (Special case of [NW94]) Suppose there is L ∈ E and δ > 0 such that, for all sufficiently large n, H(L_n) ≥ 2^{δn}; then there is a family of generators G_m : {0,1}^{O(log m)} → {0,1}^m that are computable in poly(m) time and that are (2m, 1/8)-pseudorandom (in particular, P = BPP).
Notice the strength of the assumption. For almost every input length n, our problem has to be so hard that even circuits of size 2^{δn} are unable to solve the problem correctly on more than a fraction 1/2 + 2^{−δn} of the inputs. A circuit of size 1 can certainly solve the problem on a fraction at least 1/2 of the inputs (either by always outputting 0 or by always outputting 1). Furthermore, a circuit of size 2^n always exists that solves the problem on every input. A circuit of size 2^{δn} can contain, for example, the right solution to our problem for every input whose first (1 − δ)n bits are 0; the circuit can give the right answer on these 2^{δn} inputs, and answer always 0 or always 1 (whichever is better) on the other inputs. This way the circuit is good on about a fraction 1/2 + 2^{−(1−δ)n} of the inputs.
^2 Some exceptions are discussed below. Andreev et al. [ACR98] show that in order to deterministically simulate probabilistic algorithms it is enough to construct hitting set generators, a seemingly weaker primitive than pseudorandom generators. The complicated proof of [ACR98] was simplified in subsequent work [ACRT99, BF99, GW99]. Andreev et al. [ACR99] also show how to construct hitting set generators, but only under very strong complexity assumptions. Miltersen and Vinodchandran [MV99] give a very elegant construction of hitting set generators, but it also requires a stronger hardness assumption than in [IW97]. On the other hand, [MV99] also gets a stronger conclusion, and, in particular, it is not known how to prove the main result of [MV99] (about the "derandomization" of two-round interactive proofs) using pseudorandom generators.
So, in particular, for every problem there is a circuit of size 2^{n/2} that solves the problem on a fraction 1/2 + 2^{−n/2} of the inputs. It is somewhat more tricky to show that there is in fact even a circuit of size 2^{(1/3+o(1))n} that solves the problem on a fraction 1/2 + 2^{−(1/3+o(1))n} of the inputs, and this is about the best possible for general problems [ACR97].
23.4.2 Worst-case to Average-case Reduction

Theorem 110 ([BFNW93, Imp95a, IW97]) Suppose there is L ∈ E and δ > 0 such that, for all sufficiently large n, CC(L_n) ≥ 2^{δn}; then there is L' ∈ E and δ' > 0 such that, for all sufficiently large n, H(L'_n) ≥ 2^{δ'n}.
This is quite encouraging: the (believable) IW assumption implies the (a priori less believable) NW assumption. Notice how Theorem 107 follows from Theorems 109 and 110.

The results on polynomial reconstruction from Lecture 10 imply Theorem 110, although this is not the way it was originally proved. Our proof followed [STV01].
Lecture 24

The Nisan-Wigderson Generator

These notes follow quite literally the notes that I wrote for the PCMI school on computational complexity in Summer 2000. There is some redundancy between these notes and the content of past lectures.

In this lecture we prove Theorem 109.

24.1 The Nisan-Wigderson Construction

The Nisan-Wigderson generator is based on the existence of a decision problem L in E such that for almost every input length l we have H(L_l) ≥ 2^{δl}, yet there is a uniform algorithm that solves L_l in 2^{O(l)} time. Our goal is to use these assumptions on L_l to build a generator whose input seed is of length O(l), whose output is of length 2^{Θ(l)} and indistinguishable from uniform by adversaries of size 2^{Θ(l)}, and the generator should be computable in time 2^{O(l)}.

As we will see in a moment, it is not too hard to construct a generator that maps l bits into l + 1 bits, and whose running time and pseudorandomness are as required. We will then present the Nisan-Wigderson construction and its analysis.
24.1.1 Unpredictability versus Pseudorandomness

Let f : {0,1}^l → {0,1} be a function such that H(f) ≥ s, and consider the pseudorandom generator G : {0,1}^l → {0,1}^{l+1} defined as G(x) = x · f(x), where '·' denotes concatenation. We want to argue that G is an (s − 3, 1/s)-pseudorandom generator.

The argument works by contradiction, and consists in the proof of the following result.
Lemma 111 Let f : {0,1}^l → {0,1}. Suppose that there is a circuit D of size s such that

|Pr_x[D(x · f(x)) = 1] − Pr_{x,b}[D(x · b) = 1]| > ε;

then there is a circuit A of size s + 3 such that

Pr_x[A(x) = f(x)] > 1/2 + ε
Proof: First of all, we observe that there is a circuit D' of size at most s + 1 such that

Pr_x[D'(x · f(x)) = 1] − Pr_{x,b}[D'(x · b) = 1] > ε    (24.1)

This is because Expression (24.1) is satisfied either by taking D' = D or by taking D' = ¬D. A way to interpret Expression (24.1) is to observe that when the first l bits of the input of D'(·) are a random string x, D' is more likely to accept if the last bit is f(x) than if the last bit is random (and, for a stronger reason, than if the last bit is 1 − f(x)). This observation suggests the following strategy in order to use D' to predict f: given an input x, for which we want to compute f(x), we guess a value b, and we compute D'(x, b). If D'(x, b) = 1, we take it as evidence that b was a good guess for f(x), and we output b. If D'(x, b) = 0, we take it as evidence that b was the wrong guess, and we output 1 − b. Let A_b be the procedure that we just described. We claim that

Pr_{x,b}[A_b(x) = f(x)] > 1/2 + ε    (24.2)
The claim is proved by the following derivation:

Pr_{x,b}[A_b(x) = f(x)]
  = Pr_{x,b}[A_b(x) = f(x) | b = f(x)] · Pr_{x,b}[b = f(x)]
    + Pr_{x,b}[A_b(x) = f(x) | b ≠ f(x)] · Pr_{x,b}[b ≠ f(x)]
  = (1/2) Pr_{x,b}[A_b(x) = f(x) | b = f(x)] + (1/2) Pr_{x,b}[A_b(x) = f(x) | b ≠ f(x)]
  = (1/2) Pr_{x,b}[D'(x, b) = 1 | b = f(x)] + (1/2) Pr_{x,b}[D'(x, b) = 0 | b ≠ f(x)]
  = 1/2 + (1/2) Pr_{x,b}[D'(x, b) = 1 | b = f(x)] − (1/2) Pr_{x,b}[D'(x, b) = 1 | b ≠ f(x)]
  = 1/2 + Pr_{x,b}[D'(x, b) = 1 | b = f(x)]
    − (1/2) (Pr_{x,b}[D'(x, b) = 1 | b = f(x)] + Pr_{x,b}[D'(x, b) = 1 | b ≠ f(x)])
  = 1/2 + Pr_x[D'(x · f(x)) = 1] − Pr_{x,b}[D'(x · b) = 1]
  > 1/2 + ε
From Expression (24.2) we can observe that there must be a b_0 ∈ {0,1} such that

Pr_x[A_{b_0}(x) = f(x)] > 1/2 + ε

And A_{b_0} is computed by a circuit of size at most s + 3, because A_{b_0}(x) = b_0 ⊕ ¬(D'(x, b_0)), which can be implemented with two more gates given a circuit for D'. □
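The distinguisher-to-predictor step can be checked numerically on toy examples. The sketch below picks an arbitrary small f and an arbitrary test D (neither is from the text), measures the distinguishing advantage ε, builds the predictors A_0 and A_1, and verifies that the better of the two succeeds on at least a 1/2 + ε fraction of inputs.

```python
import itertools
import random

# Numerical check of Lemma 111 for toy f and D (both arbitrary choices).
l = 4
xs = list(itertools.product((0, 1), repeat=l))
random.seed(0)
f = {x: random.randint(0, 1) for x in xs}

def D(x, b):
    # A toy test: accept iff the appended bit matches the parity of x.
    return 1 if b == sum(x) % 2 else 0

# Signed advantage of D at telling x.f(x) from x.b for a random bit b.
p_real = sum(D(x, f[x]) for x in xs) / len(xs)
p_rand = sum(D(x, b) for x in xs for b in (0, 1)) / (2 * len(xs))
adv = p_real - p_rand

# Take D' = D or its complement so that the advantage eps is nonnegative.
Dp = D if adv >= 0 else (lambda x, b: 1 - D(x, b))
eps = abs(adv)

def A(b, x):
    # Guess b; keep it if D' accepts, flip it otherwise.
    return b if Dp(x, b) == 1 else 1 - b

# Averaged over b, the success probability is exactly 1/2 + eps, so the
# better fixed guess b0 achieves at least that.
best = max(sum(A(b0, x) == f[x] for x in xs) / len(xs) for b0 in (0, 1))
assert best >= 0.5 + eps
```

All probabilities here are dyadic fractions with small denominators, so the floating-point comparison is exact.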
24.1.2 Combinatorial Designs

Consider a family (S_1, ..., S_m) of subsets of a universe U. We say the family is an (l, α)-design if, for every i, |S_i| = l, and, for every i ≠ j, |S_i ∩ S_j| ≤ α.
Theorem 112 For every integer l and fraction γ > 0, there is an (l, log m)-design (S_1, ..., S_m) over the universe [t], where t = O(l/γ) and m = 2^{γl}; such a design can be constructed in O(2^t · t · m^2) time.

We will use the following notation: if z is a string in {0,1}^t and S ⊆ [t], then we denote by z_S the string of length |S| obtained from z by selecting the bits of z indexed by S. For example, if z = (0, 0, 1, 0, 1, 0) and S = {1, 2, 3, 5} then z_S = (0, 0, 1, 1).
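A brute-force construction in the spirit of Theorem 112 can be sketched as follows. This is only a sketch: the lexicographic greedy scan is one simple way to realize the exhaustive search behind the O(2^t · t · m^2) bound, with no attempt to match the exact parameters of the theorem, and the parameters in the usage line are arbitrary.

```python
import itertools

# Greedy construction of an (l, alpha)-design over [t]: scan all size-l
# subsets of [t] in lexicographic order and keep each one that intersects
# every previously chosen set in at most alpha elements.

def greedy_design(t, l, alpha, m):
    sets = []
    for S in itertools.combinations(range(t), l):
        if all(len(set(S) & set(T)) <= alpha for T in sets):
            sets.append(S)
            if len(sets) == m:
                return sets
    raise ValueError("no (l, alpha)-design with m sets found for these parameters")

design = greedy_design(t=12, l=4, alpha=2, m=6)
assert all(len(S) == 4 for S in design)
assert all(len(set(S) & set(T)) <= 2
           for S, T in itertools.combinations(design, 2))
```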
24.1.3 The Nisan-Wigderson Generator

For a Boolean function f : {0,1}^l → {0,1} and a design S = (S_1, ..., S_m) over [t], the Nisan-Wigderson generator is a function NW_{f,S} : {0,1}^t → {0,1}^m defined as follows:

NW_{f,S}(z) = f(z_{S_1}) · f(z_{S_2}) ··· f(z_{S_m})
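In code, the construction is just a few lines. The sketch below uses toy parameters: parity stands in for the hard function f, and the tiny (3, 1)-design over [6] is hard-coded.

```python
# NW_{f,S}(z): one output bit per design set, namely f evaluated on the
# seed bits of z indexed by that set. Toy parameters: t = 6, l = 3, m = 4.

def nw(f, design, z):
    return [f([z[j] for j in S]) for S in design]

design = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]   # |S_i| = 3, |S_i ∩ S_j| <= 1
parity = lambda bits: sum(bits) % 2                     # stand-in for the hard f

out = nw(parity, design, [1, 0, 0, 1, 0, 1])            # m = 4 output bits from a 6-bit seed
```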
24.2 The Reduction from Distinguishing to Predicting

The following lemma implies Theorem 109.

Lemma 113 Let f : {0,1}^l → {0,1} be a Boolean function and S = (S_1, ..., S_m) be an (l, log m)-design over [t]. Suppose D : {0,1}^m → {0,1} is such that

|Pr_r[D(r) = 1] − Pr_z[D(NW_{f,S}(z)) = 1]| > ε.

Then there exists a circuit C of size O(m^2) such that

|Pr_x[D(C(x)) = f(x)] − 1/2| ≥ ε/m
Proof: The main idea is that if D distinguishes NW_{f,S}(·) from the uniform distribution, then we can find a bit of the output of the generator where this distinction is noticeable. On such a bit, D is distinguishing f(x) from a random bit, and such a distinguisher can be turned into a predictor for f. In order to find the "right bit," we will use the hybrid argument. At this level of abstraction, the analysis is the same as the analysis of the Blum-Micali-Yao generator; however, as the analysis unfolds, we will see major differences.

Let us start with the hybrid argument. We define m + 1 distributions H_0, ..., H_m; H_i is defined as follows: sample a string v = NW_{f,S}(z) for a random z, then sample a string r ∈ {0,1}^m according to the uniform distribution, and then concatenate the first i bits of v with the last m − i bits of r. By definition, H_m is distributed as NW_{f,S}(z) and H_0 is the uniform distribution over {0,1}^m.
Using the hypothesis of the Lemma, we know that there is a bit b_0 ∈ {0,1} such that

Pr_z[D'(NW_{f,S}(z)) = 1] − Pr_r[D'(r) = 1] > ε

where D'(x) = b_0 ⊕ D(x).
We then observe that

ε < Pr_z[D'(NW_{f,S}(z)) = 1] − Pr_r[D'(r) = 1]
  = Pr[D'(H_m) = 1] − Pr[D'(H_0) = 1]
  = Σ_{i=1}^{m} (Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1])

In particular, there exists an index i such that

Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1] ≥ ε/m    (24.3)
Now, recall that

H_{i−1} = f(z_{S_1}) ··· f(z_{S_{i−1}}) · r_i · r_{i+1} ··· r_m

and

H_i = f(z_{S_1}) ··· f(z_{S_{i−1}}) · f(z_{S_i}) · r_{i+1} ··· r_m.
We can assume without loss of generality (up to a renaming of the indices) that S_i = {1, ..., l}. Then we can see z ∈ {0,1}^t as a pair (x, y) where x = z_{S_i} ∈ {0,1}^l and y = z_{[t]−S_i} ∈ {0,1}^{t−l}. For every j < i and z = (x, y), let us define f_j(x, y) = f(z_{S_j}); note that f_j(x, y) depends on |S_i ∩ S_j| ≤ log m bits of x and on l − |S_i ∩ S_j| ≥ l − log m bits of y. With this notation we have

Pr_{x,y,r_{i+1},...,r_m}[D'(f_1(x, y), ..., f_{i−1}(x, y), f(x), r_{i+1}, ..., r_m) = 1]
  − Pr_{x,y,r_i,...,r_m}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1] > ε/m
That is, when D' is given a string that contains f_j(x, y) for j < i in the first i − 1 entries, and random bits in the last m − i entries, then D' is more likely to accept the string if it contains f(x) in the i-th entry than if it contains a random bit in the i-th entry. This is good enough to (almost) get a predictor for f. Consider the following algorithm:
Algorithm A
Input: x ∈ {0,1}^l
  Pick random r_i, ..., r_m ∈ {0,1}
  Pick random y ∈ {0,1}^{t−l}
  Compute f_1(x, y), ..., f_{i−1}(x, y)
  If D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1, output r_i
  Else output 1 − r_i
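Algorithm A can be rendered directly in code. This is a sketch only: D_prime, the list of functions f_1, ..., f_{i−1}, and all parameters are assumed inputs, and computing the f_j efficiently is exactly the issue discussed below.

```python
import random

def predictor(D_prime, f_list, m, i, t, l, x):
    # Algorithm A: guess r_i, fill positions i+1..m with random bits, fill
    # positions 1..i-1 with f_1(x,y),...,f_{i-1}(x,y) for a random y, and
    # keep or flip the guess according to D'.
    r = [random.randint(0, 1) for _ in range(m - i + 1)]   # r_i, ..., r_m
    y = [random.randint(0, 1) for _ in range(t - l)]
    prefix = [f(x, y) for f in f_list]                     # f_1(x,y), ..., f_{i-1}(x,y)
    if D_prime(prefix + r) == 1:
        return r[0]          # output r_i
    return 1 - r[0]          # output 1 - r_i
```

For instance, with i = 1 (so f_list is empty) and a D_prime that simply outputs its first input bit, the procedure always outputs 1, regardless of the internal randomness.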
Let us forget for a moment about the fact that the step of computing f_1(x, y), ..., f_{i−1}(x, y) looks very hard, and let us check that A is a good predictor. Let us denote by A(x, y, r_i, ..., r_m) the output of A on input x and random choices y, r_i, ..., r_m.
Pr_{x,y,r}[A(x, y, r) = f(x)]
  = Pr_{x,y,r}[A(x, y, r) = f(x) | r_i = f(x)] · Pr_{x,r_i}[r_i = f(x)]
    + Pr_{x,y,r}[A(x, y, r) = f(x) | r_i ≠ f(x)] · Pr_{x,r_i}[r_i ≠ f(x)]
  = (1/2) Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) = r_i]
    + (1/2) Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 0 | f(x) ≠ r_i]
  = 1/2 + (1/2) Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) = r_i]
    − (1/2) Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) ≠ r_i]
  = 1/2 + Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) = r_i]
    − (1/2) (Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) = r_i]
      + Pr_{x,y,r}[D'(f_1(x, y), ..., f_{i−1}(x, y), r_i, ..., r_m) = 1 | f(x) ≠ r_i])
  = 1/2 + Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1]
  ≥ 1/2 + ε/m
So A is a good predictor, and it is worthwhile to see whether we can get an efficient implementation. We said we have

Pr_{x,y,r_i,...,r_m}[A(x, y, r) = f(x)] ≥ 1/2 + ε/m

so there surely exist fixed values c_i, ..., c_m to give to r_i, ..., r_m, and a fixed value w to give to y, such that

Pr_x[A(x, w, c_i, c_{i+1}, ..., c_m) = f(x)] ≥ 1/2 + ε/m
At this point we are essentially done. Since w is fixed, in order to implement A we only have to compute f_j(x, w) given x. However, for each j, f_j(x, w) is a function that depends only on ≤ log m bits of x, and so it is computable by a circuit of size O(m). Even composing i − 1 < m such circuits, we still have that the sequence f_1(x, w), ..., f_{i−1}(x, w), c_i, c_{i+1}, ..., c_m can be computed, given x, by a circuit C of size O(m^2). Finally, we notice that at this point A(x, w, c) is doing the following: output the xor between c_i and the complement of D'(C(x)). Since c_i is fixed, either A(x, w, c) always equals D(C(x)), or one is the complement of the other. In either case the Lemma follows. □

At this point it should not be too hard to derive Theorem 109.
Lecture 25

Extractors and Pseudorandom Generators
25.1 Use of Weak Random Sources

Suppose that we have a probabilistic algorithm A(·, ·) that on inputs of length n runs in time T(n) and uses m(n) random bits. Instead of a perfect source of randomness, we assume that we have a source that produces an output containing some "unpredictability," but that can still be very far from uniform. A very general way of modeling this source is to assume that on input 1^N it outputs a string in {0,1}^N, and that the output string has "some randomness" (a notion that we will formalize and quantify in a moment). Typically, a good way to quantify the amount of randomness, or unpredictability, in a distribution is to compute its (Shannon) entropy. For a random variable X whose range is {0,1}^N, its entropy is defined as

H(X) = Σ_{a∈{0,1}^N} Pr[X = a] · log(1/Pr[X = a]).

Shannon entropy is a very good measure in cases where one is allowed to take multiple samples from the distribution, but in our setting it is not the best measure. Consider for example a distribution X such that X = (0, 0, ..., 0) with probability 1 − 1/√N, and X is uniform with probability 1/√N. Then its Shannon entropy is about √N, which is quite high, yet the output is almost always a useless string of zeroes. It is a good intuition to think that the amount of randomness contained in the outcome a of a random variable X is log 1/Pr[X = a]. If X has Shannon entropy k, then on average, when we sample from X, we get a value of "randomness" k; however, it can be the case that with very high probability we get almost zero randomness, and with low probability we get high randomness. We would rather have a measure of randomness that guarantees almost always, or, even better, always, high randomness. This motivates the definition of min-entropy: a random variable X has min-entropy at least k if for every a in the range of X we have Pr[X = a] ≤ 1/2^k. That is, the min-entropy of X is min_a {log 1/Pr[X = a]}.

Definition 31 A random variable with range {0,1}^N having min-entropy at least k will be called an (N, k)-source.
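The gap between the two measures on the example above can be checked numerically. This is a sketch; N = 64 is an arbitrary choice, and the all-zero string is given its share of the uniform part of the mixture.

```python
import math

# Shannon entropy vs min-entropy for the example distribution in the text:
# X = 0^N with probability 1 - 1/sqrt(N), uniform on {0,1}^N with prob 1/sqrt(N).

N = 64
q = 1 / math.sqrt(N)
p_zero = (1 - q) + q * 2**-N      # probability of the all-zero string
p_other = q * 2**-N               # probability of each other string

shannon = (p_zero * math.log2(1 / p_zero)
           + (2**N - 1) * p_other * math.log2(1 / p_other))
min_entropy = -math.log2(p_zero)  # attained at the heaviest outcome
```

Here shannon comes out a bit above √N = 8, while min_entropy is below 1 bit: almost all the Shannon entropy is contributed by rare outcomes.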
Given one access to an (N, k)-source, we would like to be able to simulate any probabilistic algorithm that uses m random bits, where m is close to k. If the simulation is "black box" and takes time T, one can argue that m ≤ k + O(log T). We will not define formally what a black-box simulation is, but we will develop simulations that are black box, so it will come as no surprise that our simulations will work only for m smaller than k, in fact only for m smaller than k^{1/3}. (This is partly due to oversimplifications in the analysis; one could get k^{.99} with almost the same proof.)
25.2 Extractors

An extractor is a function that transforms an (N, k)-source into an almost uniform distribution. The transformation is done by using a (typically very small) number of additional random bits. Formally, we have the following definition.

Definition 32 For two random variables X and Y with range {0,1}^m, their variational distance is defined as ||X − Y|| = max_{S⊆{0,1}^m} {|Pr[X ∈ S] − Pr[Y ∈ S]|}. We say that two random variables are ε-close if their variational distance is at most ε.
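For distributions given explicitly, the variational distance can be computed via the standard identity max_S |Pr[X ∈ S] − Pr[Y ∈ S]| = (1/2) Σ_a |Pr[X = a] − Pr[Y = a]| (a small sketch; the dictionary representation is just one convenient choice):

```python
def variational_distance(p, q):
    # p, q: dicts mapping outcomes to probabilities. The maximizing set S is
    # the set of outcomes where p exceeds q, which gives half the L1 distance.
    return sum(abs(p.get(a, 0.0) - q.get(a, 0.0)) for a in set(p) | set(q)) / 2

# A point mass versus a uniform coin: distance 1/2.
d = variational_distance({"0": 1.0}, {"0": 0.5, "1": 0.5})
```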
Definition 33 A function Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, ε)-extractor if for every (N, k)-source X we have that Ext(X, U_t) is ε-close to uniform, where U_t is the uniform distribution over {0,1}^t.

Equivalently, if Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, ε)-extractor, then for every distribution X ranging over {0,1}^N of min-entropy k, and for every S ⊆ {0,1}^m, we have

|Pr_{a∈X, z∈{0,1}^t}[Ext(a, z) ∈ S] − Pr_{r∈{0,1}^m}[r ∈ S]| ≤ ε
25.3 Applications

See [Nis96] for an extensive survey. Here we present only one application. Another notable application is the construction of expanders.

Suppose that A(·, ·) is a probabilistic algorithm that on an input of length n uses m(n) random bits, and suppose that for every x we have Pr_r[A(x, r) = right answer] ≥ 3/4. Suppose Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, 1/4)-extractor.

Consider the following algorithm A': on input x ∈ {0,1}^n and weakly random a ∈ {0,1}^N, A' computes A(x, Ext(a, z)) for every z ∈ {0,1}^t, and then it outputs the answer that appears in the majority of the 2^t runs. We want to argue that A' is correct with high probability if a is sampled from a weak random source of min-entropy slightly higher than k.

Let us fix the input x. Consider the set B of strings a ∈ {0,1}^N for which the algorithm A' makes a mistake:

B = {a : Pr_{z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] < 1/2}

Consider the random variable X that is uniformly distributed over B (clearly, X has min-entropy log |B|). Then we have

Pr_{a∈X, z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] < 1/2
and so

|Pr_{a∈X, z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] − Pr_r[A(x, r) = right answer]| > 1/4

and then it follows from the property of Ext that X must have min-entropy less than k, that is, |B| ≤ 2^k.

Let now X be an (N, k + 2)-source, and let us execute algorithm A' using X. Since A' gives the correct answer whenever a ∉ B, we have

Pr_{a∈X}[A'(x, a) = right answer] ≥ 1 − Pr_{a∈X}[a ∈ B] ≥ 1 − 2^k · 2^{−(k+2)} = 3/4

More generally:

Theorem 114 Suppose A is a probabilistic algorithm running in time T_A(n) and using m(n) random bits on inputs of length n. Suppose we have for every m(n) a construction of a (k(n), 1/4)-extractor Ext_n : {0,1}^N × {0,1}^{t(n)} → {0,1}^{m(n)} running in time T_E(n). Then A can be simulated in time 2^t (T_A + T_E) using one sample from an (N, k + 2)-source.
25.4 An Extractor from Nisan-Wigderson

This is a simplified presentation of results in [Tre01] (see also [RRV99, ISW00]).

Let C : {0,1}^N → {0,1}^{N̄} be a polynomial time computable error-correcting code such that any ball of radius at most 1/2 − δ contains at most 1/δ^2 codewords. Such a code exists with N̄ = poly(N, 1/δ).

For a string x ∈ {0,1}^{N̄}, let <x> : {0,1}^{log N̄} → {0,1} be the function whose truth table is x. Let l = log N̄, and let S = (S_1, ..., S_m) be an (l, log m)-design over [t]. Then consider the procedure ExtNW : {0,1}^N × {0,1}^t → {0,1}^m defined as

ExtNW_{C,S}(x, z) = NW_{<C(x)>,S}(z).

That is, ExtNW first encodes its first input using an error-correcting code, then views it as a function, and finally applies the Nisan-Wigderson construction to such a function, using the second input as a seed.
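The composition can be sketched for toy parameters. The Hadamard-style parity code below is a hypothetical stand-in for the list-decodable code C of the text, chosen only because its truth-table view is easy to write down; the design and inputs are the same toy choices as before.

```python
import itertools

def encode(x_bits):
    # Toy code: the truth table of y |-> <x, y> mod 2 over all y in {0,1}^n.
    # Stand-in for the code C of the text, not the code actually required.
    n = len(x_bits)
    return [sum(xi * yi for xi, yi in zip(x_bits, y)) % 2
            for y in itertools.product((0, 1), repeat=n)]

def ext_nw(x_bits, design, z):
    table = encode(x_bits)                            # codeword, viewed as a truth table
    f = lambda bits: table[int("".join(map(str, bits)), 2)]
    return [f([z[j] for j in S]) for S in design]     # = NW_{<C(x)>,S}(z)

design = [(0, 1, 2), (0, 3, 4), (1, 3, 5), (2, 4, 5)]   # a (3, 1)-design over [6]
out = ext_nw([1, 1, 0], design, [1, 0, 0, 1, 0, 1])
```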
Lemma 115 For sufficiently large m and for ε > 2^{−m^2}, ExtNW_{C,S} is an (m^3, 2ε)-extractor.

Proof: Fix a random variable X of min-entropy m^3 and a function D : {0,1}^m → {0,1}; we will argue that

|Pr[D(r) = 1] − Pr_{a∈X, z∈{0,1}^t}[D(ExtNW(a, z)) = 1]| ≤ 2ε

Let us call a value a bad if it happens that

|Pr[D(r) = 1] − Pr_{z∈{0,1}^t}[D(ExtNW(a, z)) = 1]| > ε

and let us call B the set of bad a. When a is bad, it follows from Lemma 113 that there is a circuit C' of size O(m^2) such that either D(C'(·)) or its complement agrees with the codeword C(a) on a fraction 1/2 + ε/m of its entries. Therefore, a is totally specified by D, C', and 2 log(m/ε) additional bits (once we have D and C', we know that the encoding of a sits in a given ball of radius 1/2 − ε/m, together with at most (m/ε)^2 other codewords). Therefore, for a fixed D, the size of B is upper bounded by the number of circuits of size O(m^2), that is 2^{O(m^2 log m)}, times (m/ε)^2, times 2. The total is 2^{O(m^2 log m)}. The probability that an element a taken from X belongs to B is therefore at most 2^{−m^3} · 2^{O(m^2 log m)} < ε for sufficiently large m. We then have

|Pr[D(r) = 1] − Pr_{a∈X, z∈{0,1}^t}[D(ExtNW(a, z)) = 1]|
  ≤ Σ_a Pr[X = a] · |Pr[D(r) = 1] − Pr_{z∈{0,1}^t}[D(ExtNW(a, z)) = 1]|
  ≤ Pr[X ∈ B] + ε ≤ 2ε    □
Theorem 116 Fix a constant ε; for every N and k = N^{Ω(1)} there is a polynomial-time computable (k, ε)-extractor Ext : {0,1}^N × {0,1}^t → {0,1}^m where m = k^{1/3} and t = O(log N).
Bibliography

[ACR97] A.E. Andreev, A.E.F. Clementi, and J.D.P. Rolim. Optimal bounds for the approximation of boolean functions and some applications. Theoretical Computer Science, 180:243–268, 1997.

[ACR98] A.E. Andreev, A.E.F. Clementi, and J.D.P. Rolim. A new general derandomization method. Journal of the ACM, 45(1):179–213, 1998.

[ACR99] A.E. Andreev, A.E.F. Clementi, and J.D.P. Rolim. Worst-case hardness suffices for derandomization: A new method for hardness vs randomness trade-offs. Theoretical Computer Science, 221:3–18, 1999.

[ACRT99] A.E. Andreev, A.E.F. Clementi, J.D.P. Rolim, and L. Trevisan. Weak random sources, hitting sets, and BPP simulations. SIAM Journal on Computing, 28(6):2103–2116, 1999. Preliminary version in Proc. of FOCS'97.

[Adl78] Leonard Adleman. Two theorems on random polynomial time. In Proceedings of the 19th IEEE Symposium on Foundations of Computer Science, pages 75–83, 1978.

[ALM+98] S. Arora, C. Lund, R. Motwani, M. Sudan, and M. Szegedy. Proof verification and hardness of approximation problems. Journal of the ACM, 45(3):501–555, 1998. Preliminary version in Proc. of FOCS'92.

[AS98] S. Arora and S. Safra. Probabilistic checking of proofs: A new characterization of NP. Journal of the ACM, 45(1):70–122, 1998. Preliminary version in Proc. of FOCS'92.

[ATSWZ97] R. Armoni, A. Ta-Shma, A. Wigderson, and S. Zhou. SL ⊆ L^{4/3}. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 230–239, 1997.

[Bab85] L. Babai. Trading group theory for randomness. In Proceedings of the 17th ACM Symposium on Theory of Computing, pages 421–429, 1985. See also [BM88].

[BCH+96] M. Bellare, D. Coppersmith, J. Håstad, M. Kiwi, and M. Sudan. Linearity testing over characteristic two. IEEE Transactions on Information Theory, 42(6):1781–1795, 1996.
[BCW80] Manuel Blum, Ashok K. Chandra, and Mark N. Wegman. Equivalence of free boolean graphs can be decided probabilistically in polynomial time. Information Processing Letters, 10(2):80–82, 1980.

[BDCGL92] Shai Ben-David, Benny Chor, Oded Goldreich, and Michael Luby. On the theory of average case complexity. Journal of Computer and System Sciences, 44(2):193–219, 1992.

[BF90] Donald Beaver and Joan Feigenbaum. Hiding instances in multioracle queries. In Proceedings of STACS'90, pages 37–48, 1990.

[BF99] H. Buhrman and L. Fortnow. One-sided versus two-sided error in probabilistic computation. In STACS'99, pages 100–109, 1999.

[BFNW93] L. Babai, L. Fortnow, N. Nisan, and A. Wigderson. BPP has subexponential time simulations unless EXPTIME has publishable proofs. Computational Complexity, 3(4):307–318, 1993.

[BGS98] M. Bellare, O. Goldreich, and M. Sudan. Free bits, PCPs and non-approximability – towards tight results. SIAM Journal on Computing, 27(3):804–915, 1998. Preliminary version in Proc. of FOCS'95.

[BLR93] M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. Journal of Computer and System Sciences, 47(3):549–595, 1993. Preliminary version in Proc. of STOC'90.

[BM88] L. Babai and S. Moran. Arthur-Merlin games: a randomized proof system, and a hierarchy of complexity classes. Journal of Computer and System Sciences, 36:254–276, 1988.

[BSW01] Eli Ben-Sasson and Avi Wigderson. Short proofs are narrow: Resolution made simple. Journal of the ACM, 48(2), 2001.

[Coo71] S.A. Cook. The complexity of theorem proving procedures. In Proceedings of the 3rd ACM Symposium on Theory of Computing, pages 151–158, 1971.

[Coo73] Stephen A. Cook. A hierarchy for nondeterministic time complexity. Journal of Computer and System Sciences, 7(4):343–353, 1973.

[FF93] Joan Feigenbaum and Lance Fortnow. On the random-self-reducibility of complete sets. SIAM Journal on Computing, 22:994–1005, 1993.

[FGL+96] U. Feige, S. Goldwasser, L. Lovász, S. Safra, and M. Szegedy. Interactive proofs and the hardness of approximating cliques. Journal of the ACM, 43(2):268–292, 1996. Preliminary version in Proc. of FOCS'91.

[FSS84] Merrick L. Furst, James B. Saxe, and Michael Sipser. Parity, circuits, and the polynomial-time hierarchy. Mathematical Systems Theory, 17(1):13–27, 1984.

[Gil77] J. Gill. Computational complexity of probabilistic Turing machines. SIAM Journal on Computing, 6:675–695, 1977.
[GLR+91] P. Gemmell, R. Lipton, R. Rubinfeld, M. Sudan, and A. Wigderson. Self-testing/correcting for polynomials and for approximate functions. In Proceedings of the 23rd ACM Symposium on Theory of Computing, pages 32–42, 1991.

[GMR89] S. Goldwasser, S. Micali, and C. Rackoff. The knowledge complexity of interactive proof systems. SIAM Journal on Computing, 18(1):186–208, 1989. Preliminary version in Proc. of STOC'85.

[GMW91] O. Goldreich, S. Micali, and A. Wigderson. Proofs that yield nothing but their validity. Journal of the ACM, 38(3), 1991.

[GNW95] O. Goldreich, N. Nisan, and A. Wigderson. On Yao's XOR lemma. Technical Report TR95-050, Electronic Colloquium on Computational Complexity, 1995.

[Gol97] Oded Goldreich. Notes on Levin's theory of average-case complexity. Technical Report TR97-058, Electronic Colloquium on Computational Complexity, 1997.

[GS86] S. Goldwasser and M. Sipser. Private coins versus public coins in interactive proof systems. In Proceedings of the 18th ACM Symposium on Theory of Computing, pages 59–68, 1986.

[GW99] O. Goldreich and A. Wigderson. Improved derandomization of BPP using a hitting set generator. In RANDOM'99, pages 131–137, 1999.

[Hak85] A. Haken. The intractability of resolution. Theoretical Computer Science, 39:297–308, 1985.

[Hås86] Johan Håstad. Almost optimal lower bounds for small depth circuits. In Proceedings of the 18th ACM Symposium on Theory of Computing, pages 6–20, 1986.

[Hås99] J. Håstad. Clique is hard to approximate within n^{1−ε}. Acta Mathematica, 182:105–142, 1999.

[Hås01] Johan Håstad. Some optimal inapproximability results. Journal of the ACM, 48(4):798–859, 2001.

[HILL99] J. Håstad, R. Impagliazzo, L. Levin, and M. Luby. A pseudorandom generator from any one-way function. SIAM Journal on Computing, 28(4):1364–1396, 1999.

[HS65] J. Hartmanis and R.E. Stearns. On the computational complexity of algorithms. Transactions of the AMS, 117:285–306, 1965.

[IL90] Russell Impagliazzo and Leonid Levin. No better ways to generate hard NP instances than picking uniformly at random. In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science, pages 812–821, 1990.

[Imm88] N. Immerman. Nondeterministic space is closed under complementation. SIAM Journal on Computing, 17:935–938, 1988.
[Imp95a] Russell Impagliazzo. Hard-core distributions for somewhat hard problems. In Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pages 538–545, 1995.

[Imp95b] Russell Impagliazzo. A personal view of average-case complexity. In Proceedings of the 10th IEEE Conference on Structure in Complexity Theory, pages 134–147, 1995.

[ISW99] R. Impagliazzo, R. Shaltiel, and A. Wigderson. Near-optimal conversion of hardness into pseudorandomness. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, 1999.

[ISW00] R. Impagliazzo, R. Shaltiel, and A. Wigderson. Extractors and pseudorandom generators with optimal seed length. In Proceedings of the 32nd ACM Symposium on Theory of Computing, pages 1–10, 2000.

[IW97] R. Impagliazzo and A. Wigderson. P = BPP unless E has subexponential circuits. In Proceedings of the 29th ACM Symposium on Theory of Computing, pages 220–229, 1997.

[IW98] R. Impagliazzo and A. Wigderson. Randomness versus time: Derandomization under a uniform assumption. In Proceedings of the 39th IEEE Symposium on Foundations of Computer Science, pages 734–743, 1998.

[JVV86] Mark Jerrum, Leslie G. Valiant, and Vijay V. Vazirani. Random generation of combinatorial structures from a uniform distribution. Theoretical Computer Science, 43:169–188, 1986.

[Kar72] R.M. Karp. Reducibility among combinatorial problems. In R.E. Miller and J.W. Thatcher, editors, Complexity of Computer Computations, pages 85–103. Plenum Press, 1972.

[KL80] R.M. Karp and R.J. Lipton. Some connections between nonuniform and uniform complexity classes. In Proceedings of the 12th ACM Symposium on Theory of Computing, pages 302–309, 1980.

[KZ97] H. Karloff and U. Zwick. A (7/8 − ε)-approximation algorithm for MAX 3SAT? In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science, pages 406–415, 1997.

[Lau83] C. Lautemann. BPP and the Polynomial Hierarchy. Information Processing Letters, 17:215–217, 1983.

[Lev73] L. A. Levin. Universal search problems. Problemy Peredachi Informatsii, 9:265–266, 1973.

[Lev86] Leonid Levin. Average case complete problems. SIAM Journal on Computing, 15(1):285–286, 1986.
[Lev87] Leonid Levin. Oneway functions and pseudorandom generators. Combinator
ica, 7(4):357–363, 1987.
[LFKN92] C. Lund, L. Fortnow, H. Karloﬀ, and N. Nisan. Algebraic methods for inter
active proof systems. Journal of the ACM, 39(4):859–868, 1992. Preliminary
version in Proc of FOCS’90.
[Lip90] Richard J. Lipton. Eﬃcient checking of computations. In Proceedings of
STACS’90, pages 207–215, 1990.
[MV99] P.B. Miltersen and N.V. Vinodchandran. Derandomizing Arthur-Merlin games using hitting sets. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science, pages 71–80, 1999.
[Nis94] N. Nisan. RL ⊆ SC. Computational Complexity, 4(1), 1994.
[Nis96] N. Nisan. Extracting randomness: How and why. In Proceedings of the 11th
IEEE Conference on Computational Complexity, pages 44–58, 1996.
[NSW92] N. Nisan, E. Szemerédi, and A. Wigderson. Undirected connectivity in O(log^1.5 n) space. In Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science, pages 24–29, 1992.
[NW94] N. Nisan and A. Wigderson. Hardness vs randomness. Journal of Computer and System Sciences, 49:149–167, 1994. Preliminary version in Proc. of FOCS’88.
[Pap94] C.H. Papadimitriou. Computational Complexity. AddisonWesley, 1994.
[Raz87] A.A. Razborov. Lower bounds on the size of bounded depth networks over a
complete basis with logical addition. Matematicheskie Zametki, 41:598–607,
1987.
[Raz98] R. Raz. A parallel repetition theorem. SIAM Journal on Computing,
27(3):763–803, 1998. Preliminary version in Proc. of STOC’95.
[RRV99] R. Raz, O. Reingold, and S. Vadhan. Extracting all the randomness and
reducing the error in Trevisan’s extractors. In Proceedings of the 31st ACM
Symposium on Theory of Computing, pages 149–158, 1999.
[Sav70] Walter J. Savitch. Relationships between nondeterministic and deterministic
tape complexities. Journal of Computer and System Sciences, 4(2):177–192,
1970.
[Sch80] J. T. Schwartz. Fast probabilistic algorithms for veriﬁcation of polynomial
identities. Journal of the ACM, 27:701–717, 1980.
[Sha92] A. Shamir. IP = PSPACE. Journal of the ACM, 39(4):869–877, 1992. Preliminary version in Proc. of FOCS’90.
[Sip83] M. Sipser. A complexity theoretic approach to randomness. In Proceedings of the 15th ACM Symposium on Theory of Computing, pages 330–335, 1983.
[Smo87] Roman Smolensky. Algebraic methods in the theory of lower bounds for
boolean circuit complexity. In Proceedings of the 19th ACM Symposium on
Theory of Computing, pages 77–82, 1987.
[Sto76] L.J. Stockmeyer. The polynomial-time hierarchy. Theoretical Computer Science, 3:1–22, 1976.
[Sto83] L.J. Stockmeyer. The complexity of approximate counting. In Proceedings of
the 15th ACM Symposium on Theory of Computing, pages 118–126, 1983.
[STV01] M. Sudan, L. Trevisan, and S. Vadhan. Pseudorandom generators without the
XOR lemma. Journal of Computer and System Sciences, 62(2):236–266, 2001.
[SU01] R. Shaltiel and C. Umans. Simple extractors for all min-entropies and a new pseudorandom generator. In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science, pages 648–657, 2001.
[Sud97] M. Sudan. Decoding of Reed-Solomon codes beyond the error-correction bound. Journal of Complexity, 13(1):180–193, 1997. Preliminary version in Proc. of FOCS’96.
[SZ95] M. Saks and S. Zhou. RSPACE(S) ⊆ DSPACE(S^{3/2}). In Proceedings of the 36th IEEE Symposium on Foundations of Computer Science, pages 344–353, 1995.
[Sze88] R. Szelepcsényi. The method of forced enumeration for nondeterministic automata. Acta Informatica, 26:279–284, 1988.
[Tre01] L. Trevisan. Extractors and pseudorandom generators. Journal of the ACM,
48(4):860–879, 2001.
[Tse70] G. S. Tseitin. On the complexity of derivation in propositional calculus. In
A. O. Slisenko, editor, Studies in Constructive Mathematics and Mathematical
Logic, pages 115–125, 1970.
[Uma02] C. Umans. Pseudorandom generators for all hardnesses. In Proceedings of the
34th ACM Symposium on Theory of Computing, pages 627–634, 2002.
[Val79] L.G. Valiant. The complexity of computing the permanent. Theoretical Computer Science, 8:189–201, 1979.
[VV86] L.G. Valiant and V.V. Vazirani. NP is as easy as detecting unique solutions.
Theoretical Computer Science, 47:85–93, 1986.
[WB86] L.R. Welch and E.R. Berlekamp. Error correction for algebraic block codes. US Patent No. 4,633,470, 1986.
[Wra76] C. Wrathall. Complete sets for the polynomial hierarchy. Theoretical Computer
Science, 3:23–34, 1976.
[Yao82] A.C. Yao. Theory and applications of trapdoor functions. In Proceedings of the 23rd IEEE Symposium on Foundations of Computer Science, pages 80–91, 1982.
[Yao85] Andrew C. Yao. Separating the polynomial-time hierarchy by oracles. In Proceedings of the 26th IEEE Symposium on Foundations of Computer Science, pages 1–10, 1985.
[Zip79] Richard Zippel. Probabilistic algorithms for sparse polynomials. In Proceedings of the International Symposium on Symbolic and Algebraic Computation, pages 216–226. Springer-Verlag, 1979.
Contents

1 Introduction, P and NP
1.1 Computational Problems
1.2 P and NP
1.3 NP-completeness
1.3.1 Reductions
1.3.2 NP-completeness
1.3.3 An NP-complete problem
1.3.4 The Problem SAT
1.4 Diagonalization
1.5 References

2 Space-Bounded Complexity Classes
2.1 Space-Bounded Complexity Classes
2.2 Reductions in NL
2.3 NL Completeness
2.4 Savitch’s Theorem
2.5 Undirected Connectivity
2.6 Randomized Log-space
2.7 NL = coNL
2.7.1 A simpler problem first
2.7.2 Finding r

3 The Polynomial Hierarchy
3.1 Stacks of quantifiers
3.2 The hierarchy
3.3 An alternate characterization
3.4 Additional Properties
3.5 References

4 Circuits
4.1 Circuits
4.2 Relation to other complexity classes
4.3 References

5 Probabilistic Complexity Classes
5.1 Probabilistic complexity classes
5.2 Relations between complexity classes
5.3 BPP ⊆ Σ2
5.4 References

6 A Probabilistic Algorithm
6.1 Branching Programs
6.2 Testing Equivalence
6.3 The Schwartz-Zippel Lemma
6.4 References

7 Unique-SAT
7.1 The Valiant-Vazirani Reduction
7.2 References

8 Counting Problems
8.1 Counting Classes
8.2 Complexity of counting problems
8.3 An approximate comparison procedure
8.4 Constructing a-comp
8.5 The proof of the Leftover Hash Lemma
8.6 Approximate Sampling
8.7 References

9 Average-Case Complexity of the Permanent
9.1 The Permanent Problem
9.2 Worst-case to Average-case Equivalence
9.3 References

10 Average-case Complexity of Problems in PSPACE and EXP
10.1 Average-case Complexity in PSPACE and EXP for 7/8 Fraction of Inputs
10.2 Coding-Theoretic Perspective
10.3 Average-case Complexity in PSPACE and EXP for 1/2 Fraction of Inputs
10.4 References

11 The XOR Lemma
11.1 The XOR Lemma
11.2 References

12 Levin’s Theory of Average-case Complexity
12.1 Distributional Problems
12.2 DistNP
12.3 Reductions
12.4 Polynomial-Time on Average
12.5 Existence of Complete Problems
12.6 Polynomial-Time Samplability
12.7 References

13 Interactive Proofs
13.1 Interactive Proofs
13.1.1 NP + Interaction = NP
13.1.2 NP + Randomness
13.2 IP
13.3 An Example: Graph Non-Isomorphism
13.4 Random Self-Reductions
13.5 References

14 IP = PSPACE
14.1 UNSAT ⊆ IP
14.2 A Proof System for #SAT
14.3 A Proof System for QBF
14.3.1 PSPACE-Complete Language: TQBF
14.3.2 Arithmetization of TQBF
14.3.3 The Interactive Protocol
14.4 References

15 Introduction to PCP
15.1 Probabilistically Checkable Proofs
15.2 PCP and MAX-3SAT
15.2.1 Approximability
15.2.2 Inapproximability
15.2.3 Tighter result
15.3 Max Clique
15.3.1 Approximability
15.3.2 Inapproximability
15.4 References

16 Håstad’s Proof System
16.1 Håstad’s verifier
16.2 Applications to hardness of approximation
16.3 From the PCP theorem to Raz’s verifier
16.4 References

17 Linearity Test
17.1 Fourier analysis of Boolean functions
17.2 Using Fourier analysis to test linearity
17.3 Testing code words for the Long Code
17.3.1 Pre-Håstad verification
17.3.2 The Long Code test
17.4 References

18 Testing the Long Code
18.1 Test for Long Code
18.2 Håstad’s verifier
18.3 Putting them together
18.4 References

19 Circuit Lower Bounds for Parity Using Polynomials
19.1 Circuit Lower Bounds for PARITY
19.2 Proof of Circuit Lower Bounds for PARITY using Polynomials
19.3 Approximating OR
19.4 References

20 Lower Bound for Parity Using Random Restrictions
20.1 Random Restrictions
20.2 Proof of the technical lemmas
20.3 References

21 Proof Complexity
21.1 Resolution Proof System
21.2 Matching and Satisfiability
21.3 References

22 Lower Bounds on Resolution Proof Length
22.1 Width and Proof Complexity
22.1.1 The Main Theorem
22.1.2 Width of a Resolution Proof
22.2 Width Lower Bounds
22.3 References

23 Pseudorandomness and Derandomization
23.1 The Setting
23.1.1 Probabilistic Algorithms versus Deterministic Algorithms
23.1.2 A trivial deterministic simulation
23.1.3 Exponential gaps between randomized and deterministic procedures
23.2 Derandomization Under Complexity Assumptions
23.2.1 Formal Definitions of Complexity Measures and Complexity Classes
23.2.2 Hardness versus Randomness
23.2.3 Worst-case to Average-case Reduction
23.2.4 Pseudorandom Generators
23.2.5 The two main theorems
23.3 References

24 The Nisan-Wigderson Generator
24.1 The Nisan-Wigderson Construction
24.1.1 Impredictability versus Pseudorandomness
24.1.2 Combinatorial Designs
24.1.3 The Nisan-Wigderson Theorem
24.2 The Reduction from Distinguishing to Predicting

25 Extractors and Pseudorandom Generators
25.1 Use of Weak Random Sources
25.2 Extractors
25.3 The Nisan-Wigderson Generator
25.4 An Extractor from Nisan-Wigderson
25.5 Applications
Lecture 1

Introduction, P and NP

This course assumes CS170, or equivalent, as a prerequisite. We will assume that the reader is familiar with the notions of algorithm and running time, as well as with basic notions of discrete math and probability.

A main objective of theoretical computer science is to understand the amount of resources (time, memory, communication, randomness, ...) needed to solve the computational problems that we care about. While the design and analysis of algorithms puts upper bounds on such amounts, computational complexity theory is mostly concerned with lower bounds; that is, we look for negative results showing that certain problems require a lot of time, memory, etc., to be solved. In particular, we are interested in infeasible problems, that is, computational problems that require impossibly large resources to be solved, even on instances of moderate size. It is very hard to show that a particular problem is infeasible, and in fact for a lot of interesting problems the question of their feasibility is still open. Another major line of work in complexity is in understanding the relations between different computational problems and between different “modes” of computation: for example, what is the relative power of algorithms using randomness and deterministic algorithms, what is the relation between worst-case and average-case complexity, how much easier can we make an optimization problem if we only look for approximate solutions, and so on. It is in this direction that we find the most beautiful, and often surprising, known results in complexity theory.

Before going any further, let us be more precise in saying what a computational problem is, and let us define some important classes of computational problems. Then we will see a particular incarnation of the notion of “reduction,” the main tool in complexity theory, and we will introduce NP-completeness, one of the great success stories of complexity theory. We conclude by demonstrating the use of diagonalization to show some separations between complexity classes. It is unlikely that such techniques will help in solving the P versus NP problem.

1.1 Computational Problems

In a computational problem, we are given an input that, without loss of generality, we assume to be encoded over the alphabet {0, 1}, and we want to return in output a solution satisfying some property: a computational problem is then described by the property that the output has to satisfy given the input.
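To make the definition concrete, here is a small Python sketch (the notes themselves contain no code; the example problem is mine): a computational problem can be presented as a checkable property that a proposed output must satisfy given the input. Take the sorting problem: on input a list x, a valid output is a sorted rearrangement of x.

```python
# Sketch (not from the notes): the sorting problem described by the property
# that the output must satisfy given the input.
from collections import Counter

def is_valid_output(x, y):
    # y must contain exactly the elements of x (same multiset),
    # arranged in nondecreasing order.
    return Counter(x) == Counter(y) and all(
        y[i] <= y[i + 1] for i in range(len(y) - 1)
    )

assert is_valid_output([3, 1, 2], [1, 2, 3])
assert not is_valid_output([3, 1, 2], [1, 2])      # wrong multiset
assert not is_valid_output([3, 1, 2], [2, 1, 3])   # not sorted
```

Note that the property only specifies what counts as a correct answer; it says nothing about how hard it is to find one.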
That is. 2. 1}∗ . we will talk about decision problems and languages interchangeably. In this course we will deal with four types of computational problems: decision problems. An example of decision problem is the 3coloring problem: given an undirected graph. where (x. E) we want to ﬁnd. if such a y exists. In fact. and counting problems. c) where G is a 3colorable graph and c is a valid 3coloring of G. determine whether there is a way to assign a “color” chosen from {1. a search problem is speciﬁed by a relation R ⊆ {0. i. we will study the asymptotic complexity of problems. v) ∈ V we have c(u) = c(v). A standard convention is to call an algorithm “feasible” if it runs in polynomial time. search problems. because beyond being asked to determine whether such a c exists. In a decision problem. optimization problems. 1}∗ . given an input x ∈ {0. then 3COL is the language that speciﬁes the 3coloring problem. 1} ∗ of inputs for which the answer is YES. such that for every (u. This is diﬀerent (and more demanding) than the decision version. 3} of the vertices. and we will just ask whether there is a “feasible” asymptotic algorithm for 3coloring.e. with the previous convention. if it exists. Instead of considering. every decision problem can be speciﬁed using a language (and every language speciﬁes a decision problem). From now on. in a decision problem we are only asked to verify whether the input satisﬁes a certain property. Consider for example the search version of the 3coloring problem: here given an undirected graph G = (V. 1. say. we will discuss decision and search problem. if it exists. 1}∗ we want to compute some answer y ∈ {0. but it is also arbitrary: in fact every problem can be seen as a search problem 1 8 .some property: a computational problem is then described by the property that the output has to satisfy given the input. we will ask what is the best asymptotic running time of an algorithm that solves 3coloring on all instances. 
1} ∗ containing (descriptions of) 3colorable graphs. 000 nodes on some particular model of computation. given an input x ∈ {0. For example. In a search problem. 2. A convenient way to specify a decision problem is to give the set L ⊆ {0. the 3coloring problem is speciﬁed by the relation R3COL that contains all the pairs (G. 1}∗ is also called a language.1 For the moment. we will be much less ambitious. y) ∈ R if and only if y is an admissible answer given x. if there is some polynomial p such that the algorithm runs in time at most p(n) on inputs of length n. a coloring c : V → {1.2 P and NP In most of this course. the time required to solve 3coloring on graphs with 10. 1}∗ × {0. Thus. we are required to give a YES/NO answer. Formally. so. Here feasible refers more to the rate of growth than to the running time of speciﬁc instances of reasonable size. 3} to each vertex in such a way that no two adjacent vertices have the same color. if we call 3COL the subset of {0. we are also asked to construct it. 1}∗ that is in some relation to x. A subset of {0. This distinction is useful and natural.
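As a concrete illustration (a Python sketch of mine, not from the notes), membership in the relation R3COL is easy to check in time linear in the size of the graph, even though finding a valid coloring may be hard; this gap between verifying and finding is what the definitions of P and NP capture.

```python
# Sketch (not from the notes): checking membership in the relation R_3COL.
def is_valid_3coloring(edges, coloring):
    """Return True iff coloring assigns each vertex a color in {1, 2, 3}
    and no edge (u, v) has coloring[u] == coloring[v]."""
    if any(color not in (1, 2, 3) for color in coloring.values()):
        return False
    return all(coloring[u] != coloring[v] for (u, v) in edges)

# A 4-cycle is 2-colorable, hence 3-colorable:
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(is_valid_3coloring(edges, {0: 1, 1: 2, 2: 1, 3: 2}))  # True
print(is_valid_3coloring(edges, {0: 1, 1: 1, 2: 2, 3: 2}))  # False: edge (0,1)
```

The checker runs in time O(|V| + |E|); no polynomial-time algorithm is known for producing the coloring itself.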
We denote by P the class of decision problems that are solvable in polynomial time.

We say that a search problem defined by a relation R is an NP search problem if the relation is efficiently computable and such that solutions, if they exist, are short. Formally, R is an NP search problem if there is a polynomial time algorithm that, given x and y, decides whether (x, y) ∈ R, and if there is a polynomial p such that if (x, y) ∈ R then |y| ≤ p(|x|).

We say that a decision problem L is an NP decision problem if there is some NP relation R such that x ∈ L if and only if there is a y such that (x, y) ∈ R. Equivalently, a decision problem L is an NP decision problem if there is a polynomial time algorithm V (·, ·) and a polynomial p such that x ∈ L if and only if there is a y, |y| ≤ p(|x|), such that V (x, y) accepts. We denote by NP the class of NP decision problems.

Equivalently, NP can be defined as the set of decision problems that are solvable in polynomial time by a non-deterministic Turing machine. Suppose that L is in NP according to our first definition, and that R is the corresponding NP relation. Then, on input x, a non-deterministic Turing machine can guess a string y of length at most p(|x|) and then accept if and only if (x, y) ∈ R. Such a machine can be implemented to run in non-deterministic polynomial time, and it decides L. Conversely, suppose that L is solvable in polynomial time by a non-deterministic Turing machine M: then we can define the relation R such that (x, t) ∈ R if and only if t is a transcript of an accepting computation of M on input x, and it is easy to prove that R is an NP relation and that L is in NP according to our first definition.

For a function t : N → N, we define by DTIME(t(n)) the set of decision problems that are solvable by a deterministic Turing machine within time t(n) on inputs of length n, and by NTIME(t(n)) the set of decision problems that are solvable by a non-deterministic Turing machine within time t(n) on inputs of length n. Then, P = ∪_k DTIME(O(n^k)) and NP = ∪_k NTIME(O(n^k)).

1.3 NP-completeness

1.3.1 Reductions

Let A and B be two decision problems. We say that A reduces to B, denoted A ≤ B, if there is a polynomial time computable function f such that x ∈ A if and only if f(x) ∈ B. Two immediate observations: if A ≤ B and B is in P, then also A ∈ P (conversely, if A ≤ B and A ∉ P, then also B ∉ P); and if A ≤ B and B ≤ C, then also A ≤ C.

1.3.2 NP-completeness

A decision problem A is NP-hard if for every problem L ∈ NP we have L ≤ A. A decision problem A is NP-complete if it is NP-hard and it belongs to NP. It is a simple observation that if A is NP-complete, then A is solvable in polynomial time if and only if P = NP.
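The first observation about reductions can be seen in code (a toy Python sketch of mine, not from the notes): if A ≤ B via a reduction f, then a decider for A is just the composition of f with a decider for B, so if both run in polynomial time, so does the composition. As a toy instance, take A = multiples of 6 and B = multiples of 3; both are trivially in P, but the composition pattern is the general one.

```python
# Toy sketch (not from the notes) of a reduction A <= B.
def f(n):
    # x in A iff f(x) in B: n is a multiple of 6 iff n is even and n/2 is a
    # multiple of 3; odd n are mapped to 1, which is not a multiple of 3.
    return n // 2 if n % 2 == 0 else 1

def decide_B(n):
    return n % 3 == 0

def decide_A(n):
    # decide_A = decide_B composed with f: polynomial-time if both are.
    return decide_B(f(n))

assert all(decide_A(n) == (n % 6 == 0) for n in range(100))
```

The same composition argument, applied with an NP-hard problem in the role of A, is what makes the observation at the end of this section work.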
1.3.3 An NP-complete problem

Consider the following decision problem, that we call U: we are given in input (⟨M⟩, x, t, l), where ⟨M⟩ is the description of a Turing machine M, x ∈ {0, 1}∗ is a possible input, and t and l are integers encoded in unary,2 and the problem is to determine whether there is a y ∈ {0, 1}∗, |y| ≤ l, such that M(x, y) accepts in at most t steps.

It is immediate to see that U is in NP: one can define a procedure V_U that on input ((⟨M⟩, x, t, l), y) accepts if and only if |y| ≤ l and M(x, y) accepts in ≤ t steps.

2The “unary” encoding of an integer n is a sequence of n ones.

Let L be an NP decision problem. Then there are an algorithm V_L and polynomials T_L and p_L such that x ∈ L if and only if there is a y, |y| ≤ p_L(|x|), such that V_L(x, y) accepts; furthermore, V_L runs in time at most T_L(|x| + |y|). We give a reduction from L to U. The reduction maps x into the instance f(x) = (⟨V_L⟩, x, T_L(|x| + p_L(|x|)), p_L(|x|)). Just by applying the definitions, we can see that x ∈ L if and only if f(x) ∈ U. This shows that U is NP-hard and, as it has already been observed, U is in NP; therefore U is NP-complete.

1.3.4 The Problem SAT

In SAT (that stands for CNF-satisfiability) we are given Boolean variables x1, x2, ..., xn and a Boolean formula φ involving such variables; the formula is given in a particular format called conjunctive normal form, that we will explain in a moment. The question is whether there is a way to assign Boolean (True/False) values to the variables so that the formula is satisfied.

Boolean formulas are constructed starting from variables and applying the operators ∨ (that stands for OR), ∧ (that stands for AND) and ¬ (that stands for NOT). The operators work in the way that one expects: ¬x is True if and only if x is False, x ∨ y is True if and only if at least one of x or y is True, and x ∧ y is True if and only if both x and y are True. For example, the expression ¬x ∧ (x ∨ y) can be satisfied by setting x to False and y to True, while the expression x ∧ (¬x ∨ y) ∧ ¬y is impossible to satisfy.

A literal is a variable or the negation of a variable, so for example ¬x7 is a literal and so is x3. A clause is formed by taking one or more literals and connecting them with an OR, so for example (x2 ∨ ¬x4 ∨ x5) is a clause, and so is (x3). A formula in conjunctive normal form is the AND of clauses. For example, (x3 ∨ ¬x4) ∧ (x1) ∧ (¬x3 ∨ x2) is a formula in conjunctive normal form (from now on, we will just say “CNF formula” or “formula”). Note that the above formula is satisfiable; for example, it is satisfied by setting all the variables to True (there are also other possible assignments of values to the variables that would satisfy the formula).
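To fix the conventions above, here is a small Python sketch (mine, not from the notes): a formula is a list of clauses, a clause a list of literals, and a literal a signed integer +i or −i standing for x_i or ¬x_i. Evaluating a formula under an assignment is easy; the only trivial upper bound for deciding satisfiability is exhaustive search over all 2^n assignments.

```python
# Sketch (not from the notes): CNF formulas as lists of signed-integer literals.
from itertools import product

def evaluate(formula, assignment):
    """assignment maps variable index i to True/False; a formula is satisfied
    iff every clause contains at least one true literal."""
    return all(
        any(assignment[abs(lit)] == (lit > 0) for lit in clause)
        for clause in formula
    )

def brute_force_sat(formula, n):
    """Exhaustive search over all 2^n assignments: the trivial exponential-time
    upper bound for SAT. Returns a satisfying assignment, or None."""
    for values in product([False, True], repeat=n):
        assignment = dict(enumerate(values, start=1))
        if evaluate(formula, assignment):
            return assignment
    return None

# (x3 v -x4) ^ (x1) ^ (-x3 v x2), the example formula from the text:
formula = [[3, -4], [1], [-3, 2]]
print(brute_force_sat(formula, 4) is not None)  # True
```

In particular, setting every variable to True satisfies the example formula, matching the observation in the text.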
Theorem 1 (Cook) SAT is NP-complete.

1.4 Diagonalization

Diagonalization is essentially the only way we know of proving separations between complexity classes. The basic principle is the same as in Cantor’s proof that the set of real numbers is not countable. Suppose towards a contradiction that the set of real numbers in the range [0, 1) were countable, and let r1, r2, ... be an enumeration. Let ri[j] be the j-th digit in the binary expansion of ri (that is, ri = Σ_j 2^(-j) ri[j]). Then define the real number r whose j-th digit is 1 − rj[j], that is, r = Σ_j 2^(-j) (1 − rj[j]). This is a well-defined real number, but there can be no i such that r = ri, because r differs from ri in the i-th digit, which is a contradiction.

Similarly, we can prove that the Halting problem is undecidable by considering the following decision problem D: on input ⟨M⟩, the description of a Turing machine, answer NO if M(⟨M⟩) halts and accepts, and YES otherwise. The above problem is decidable if the Halting problem is decidable. However, suppose D were decidable, and let T be a Turing machine that solves D: then T(⟨T⟩) halts and accepts if and only if T(⟨T⟩) does not halt and accept, which is a contradiction.

It is easy to do something similar with time-bounded computations, given the following result (that we will not prove).

Theorem 2 (Efficient Universal Turing Machine) On input the description ⟨M⟩ of a Turing machine M, a string x and an integer t > n, the problem of deciding whether M accepts x within t steps is solvable in O(t log t) time.

The O(t log t) simulation is possible if, say, both the universal machine and the simulated machine are two-tape Turing machines. A similar theorem holds for any reasonable model of computation, with t^O(1) in place of O(t log t).

Theorem 3 DTIME(n(log n)^3) ⊈ DTIME(O(n)).

Proof: Consider the following decision problem D: on input x = (⟨M⟩, z), answer NO if M(x) rejects within |x| · log |x| steps, and YES otherwise. The problem can be solved in O(n(log n)^2) time, which is less than n(log n)^3 for all but finitely many inputs.

Suppose by contradiction that D is solvable in time ≤ an + b, for constants a and b, by a machine T. Let z be sufficiently long so that |(⟨T⟩, z)| · log |(⟨T⟩, z)| > a|(⟨T⟩, z)| + b. Then T(⟨T⟩, z) accepts if and only if it rejects, which is a contradiction. □

See the homeworks for generalizations of the above proof.

We would like to do the same for non-deterministic time, but we run into the problem that we cannot ask a non-deterministic machine to reject if and only if a non-deterministic machine of comparable running time accepts. If we could do so, then we would be able to prove NP = coNP. A considerably subtler argument must be used instead, which uses the following simple fact.
We ﬁrst observe that D is solvable by a nondeterministic Turing machine running in O(n(log n)2 ) time. Consider the following decision problem D: on input x = ( M . and since T ( T . the input ( T . Proof: Let f : N → N be deﬁned inductively so that f (1) = 2 and f (k + 1) = 2kf (k) . 2. 12 .5 References The time hierarchy theorem is proved in [HS65]. if k is large enough.” The nondeterministic hierarchy theorem is due to Cook [Coo73]. Theorem 5 N T IM E(n(log n)3 ) ⊆ N T IM E(O(n)).Theorem 4 On input the description M of a nondeterministic Turing machine M .1. 1f (k) ) (1. which is also the paper that introduced the term “Computational Complexity. the problem of deciding whether M accepts x within t steps is solvable in deterministic 2O(t) time. T ( T . where n is the input length. 1t+1 ) accepts within t log t steps. The notion of NPcompleteness is due to Cook [Coo71] and Levin [Lev73]. 2 1. 1t (which are solved by T in time a t + b ). otherwise answer YES if and only if M ( M . Finally. 11+f (k−1) ) can be simulated deterministically in time 2(k−1)f (k−1) . 11+f (k−1)) = T ( T . in particular. 1t ). and the recognition of its generality is due to Karp [Kar72]. Pick a suﬃciently large k. 11+f (k−1) ) as in Theorem 4 returns NO within t steps. a string x and an integer t > n.1) On the other hand. 11+f (k−1) ) accepts. 1t ) = T ( T . 1f (k) ) falls in case (2). we can state and prove a special case of the nondeterministic hierarchy theorem. 1f (k) ) is NO if and only if T ( T . and consider inputs of the form T . which is less than n(log n)3 except for ﬁnitely many values of n. since all such inputs fall in case (2). where M is nondeterministic Turing machine. we have that T ( T . 1t ) for f (k − 1) < t < f (k). if t = f (k) then the answer is YES if and only if the simulation of M ( M . and consider the behaviour of T on inputs ( T . 1t+1 ) for all such t and. Suppose that D were decided by a nondeterministic Turing machine T running in time bounded by an + b. 
which is in contradiction to Equation 1. 1. and so the correct answer on input ( T .
Exercises

1. Show that if P = NP for decision problems, then every NP search problem can be solved in polynomial time.

2. Generalize Theorem 3. Say that a monotone non-decreasing function t : N → N is time-constructible if, given n, we can compute t(n) in O(t(n)) time. Show that if t(n) and t'(n) are two time-constructible functions such that t'(n) > t(n) > n and lim_{n→∞} t(n) log t(n) / t'(n) = 0, then DTIME(t'(n)) ⊈ DTIME(t(n)).

3. The proof of Theorem 3 shows that there is a problem D solvable in O(n (log n)^2) time such that every machine M running in time o(n log n) fails at solving D on a constant fraction of all inputs of sufficiently large length. The constant, however, is exponentially small in the length of the representation of M. Show that there is a problem D solvable in O(n (log n)^2) time and such that for every machine M running in time o(n log n) there is a constant nM such that, for every n > nM, M fails at solving D on at least 1/3 of all inputs of length n. [Note: this may be an open question. If you can find an oracle relative to which the above statement is false, that would also be interesting, because it would show that the question cannot be solved via a standard diagonalization.]
Lecture 2
Space-Bounded Complexity Classes

2.1 Space-Bounded Complexity Classes

A machine solves a problem using space s(·) if, for every input x, the machine outputs the correct answer and uses only the first s(|x|) cells of the tape. For a standard Turing machine, we can't do better than linear space, since x itself must be on the tape. So we will often consider a machine with multiple tapes: a read-only "input" tape, a read/write "work" or "memory" tape, and possibly a write-once "output" tape. Then we can say the machine uses space s if, for every input x, it uses only the first s(|x|) cells of the work tape.

In general we denote by SPACE(s(n)) the set of decision problems that can be solved using space at most s(n) on inputs of length n. We denote by L the set of decision problems solvable in O(log n) space, and by PSPACE the set of decision problems solvable in polynomial space.

A first observation is that a space-efficient machine is, to a certain extent, also a time-efficient one.

Theorem 6: If a machine always halts, and uses s(·) space with s(n) ≥ log n, then it runs in time 2^{O(s(n))}.

Proof: Call the "configuration" of a machine M on input x a description of the state of M, the position of the input tape head, and the contents of the work tape at a given time. The number of possible configurations is simply the product of the number of states, the number of positions on the input tape, and the number of possible contents of the work tape (which itself depends on the number of allowable positions on the work tape). This is O(1) · n · |Σ|^{s(n)} = 2^{O(s(n)) + log n} = 2^{O(s(n))}.

Now write down c1, c2, . . . , ct, where ci is the configuration at time i and t is the running time of M(x). No two ci can be equal, or else the machine would be in a loop, since ci completely describes the present, and therefore the future, of the computation. Since we cannot visit a configuration twice during the computation, the computation must therefore finish in 2^{O(s(n))} steps. □

NL is the set of decision problems solvable by a nondeterministic machine using O(log n) space. In general we denote by NSPACE(s(n)) the set of decision problems that can be solved by nondeterministic machines that use at most s(n) bits of space on inputs of length n. NPSPACE is the set of decision problems solvable by a nondeterministic machine using polynomial space.

Recall the definition of a Turing machine M that has a log space complexity bound: M has one read-only input tape, one write-only output tape, and uses a log space work tape. Analogously with time-bounded complexity classes, we could think that NL is exactly the set of decision problems that have "solutions" that can be verified in log space. If so, NL would be equal to NP, since there is a log-space algorithm V that verifies solutions to SAT. However, this is unlikely to be true. An intuitive reason why not all problems with a log-space "verifier" can be simulated in NL is that an NL machine does not have enough memory to keep track of all the nondeterministic choices that it makes.

2.2 Reductions in NL

We would like to introduce a notion of completeness in NL analogous to the notion of completeness that we know for the class NP. A first observation is that, in order to have a meaningful notion of completeness in NL, we cannot use polynomial-time reductions, because NL is contained in P.

Theorem 7: NL ⊆ P.

Proof: Let L be a language in NL and let M be a nondeterministic log-space machine for L. Consider a computation of M(x). As before, there are 2^{O(s(n))} = n^{O(1)} possible configurations. Consider a directed graph in which vertices are configurations and edges indicate transitions from one configuration to another which the machine is allowed to make in a single step (as determined by its δ). This graph has polynomially many vertices, so in polynomial time we can do a depth-first search to see whether there is a path from the initial configuration that eventually leads to acceptance. This describes a polynomial-time algorithm for deciding L, so we're done. □

With polynomial-time reductions, any NL problem having at least a YES instance and at least a NO instance would be trivially NL-complete. To get a more interesting notion of NL-completeness we need to turn to weaker reductions. In particular, we define log space reductions as follows:

Definition 1: Let A and B be decision problems. We say A is log space reducible to B, written A ≤log B, if there exists a function f computable in log space such that x ∈ A iff f(x) ∈ B.

Theorem 8: If B ∈ L and A ≤log B, then A ∈ L.

Proof: We consider the composition of two machines: Mf to compute f, and MB to solve B. If our resource bound were polynomial time, then we would use Mf to compute f(x), and then run MB on f(x); the composition of the two procedures would give an algorithm for A, and if both procedures run in polynomial time then their composition is also polynomial time. To prove the theorem, we have to show that if Mf and MB are log space machines, then their composition can also be computed in log space.
A naive implementation of the composition of Mf and MB would be to compute f(x), and then run MB on input f(x); however, f(x) needs to be stored on the work tape, and this implementation does not produce a log space machine. Instead we modify Mf so that on input x and i it returns the i-th bit of f(x) (this computation can still be carried out in logarithmic space). Then we run a simulation of the computation of MB(f(x)) by using the modified Mf as an "oracle" to tell us the value of specified positions of f(x). In order to simulate MB(f(x)) we only need to know the content of one position of f(x) at a time, so the simulation can be carried out with a total of O(log |x|) bits of work space. □

Using the same proof technique, we can show the following:

Theorem 9: If A ≤log B and B ≤log C, then A ≤log C.

2.3 NL-Completeness

Armed with a definition of log space reducibility, we can define NL-completeness.

Definition 2: A is NL-hard if for all B ∈ NL, B ≤log A. A is NL-complete if A ∈ NL and A is NL-hard.

We now introduce a problem STCONN (s,t-connectivity) that we will show is NL-complete. In STCONN, given in input a directed graph G(V, E) and two vertices s, t ∈ V, we want to determine if there is a directed path from s to t.

Theorem 10: STCONN is NL-complete.

Proof:

1. STCONN ∈ NL. On input G(V, E), s, t, set p to s. For i = 1 to |V|: nondeterministically choose a neighboring vertex v of p, set p = v, and if p = t, accept and halt. Reject and halt if the end of the for loop is reached. The algorithm only requires O(log n) space.

2. STCONN is NL-hard. Let A ∈ NL, and let MA be a nondeterministic logarithmic space Turing machine for A. On input x, construct a directed graph G with one vertex for each configuration of MA(x), and an additional vertex t. Add edges (ci, cj) if MA(x) can move in one step from ci to cj. Add edges (c, t) from every configuration c that is accepting, and let s be the start configuration. MA accepts x iff some path from s to t exists in G. The above graph can be constructed from x in log space, because listing all nodes requires O(log n) space, and testing valid edges is also easy. □
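For contrast with the nondeterministic guess-a-path algorithm above, here is the deterministic baseline that the next section improves on: reachability by depth-first search. This is a minimal sketch assuming an adjacency-list representation; it runs in linear time but uses linear space for the stack and the visited set, which is exactly the space cost Savitch's theorem attacks.

```python
def stconn_dfs(adj, s, t):
    # Deterministic STCONN: linear time, but Omega(n) space for the
    # stack and visited set.
    visited = set()
    stack = [s]
    while stack:
        u = stack.pop()
        if u == t:
            return True
        if u in visited:
            continue
        visited.add(u)
        stack.extend(adj[u])
    return False
```

For example, on the graph 0 → 1 → 2 with an extra edge 3 → 0, vertex 2 is reachable from 0 but 0 is not reachable from 2.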
2.4 Savitch's Theorem

What kinds of trade-offs are there between memory and time? STCONN can be solved deterministically in linear time and linear space, using depth-first search. Is there some sense in which this is optimal? Nondeterministically, we can search using less than linear space. Can searching be done deterministically in less than linear space? We will use Savitch's Theorem [Sav70] to show that STCONN can be solved deterministically in O(log^2 n) space, and that every NL problem can be solved deterministically in O(log^2 n) space.

Theorem 11: STCONN can be solved deterministically in O(log^2 n) space.

Proof: Consider a graph G(V, E) and vertices s, t. We define a recursive function REACH(u, v, k) that accepts and halts iff v can be reached from u in ≤ k steps. If k = 1, then REACH accepts iff (u, v) is an edge. If k ≥ 2, then for all w ∈ V − {u, v}, compute REACH(u, w, ⌈k/2⌉) and REACH(w, v, ⌊k/2⌋); if, for some w, both accept and halt, accept. Else, reject.

Let S(k) be the worst-case space used by REACH(·, ·, k). The space required for the base case S(1) is a counter for tracking the edge, so S(1) = O(log n). In general, S(k) = O(log n) + S(k/2), for the recursive calls to REACH and for tracking w. This solves to S(k) = O(log k · log n). Since k ≤ n, the worst-case space use of REACH is O(log^2 n). □

Comparing Theorem 11 to depth-first search, we find that we are exponentially better in space requirements, but we are no longer polynomial in time. Examining the time required, if we let t(k) be the worst-case time used by REACH(·, ·, k), we see t(1) = O(n + m) and t(k) = n · (2 · t(k/2)), which solves to t(k) = n^{O(log k)} = n^{O(log n)}, which is super-polynomial. No known algorithm achieves poly-logarithmic space and polynomial time simultaneously, although such an algorithm is known for undirected connectivity.

Essentially the same proof applies to arbitrary nondeterministic space-bounded computations.

Theorem 12 (Savitch's Theorem): For every function s(n) computable in space O(s(n)), NSPACE(s) = SPACE(O(s^2)).

Proof: We begin with a nondeterministic machine M, which on input x uses s(|x|) space. If M(x) accepts, it will do so in ≤ 2^{O(s(|x|))} steps. We define REACH(ci, cj, k), as in the proof of Theorem 11, which accepts and halts iff M(x) can go from configuration ci to configuration cj in ≤ k steps. We compute REACH(c0, cacc, 2^{O(s(|x|))}) for all accepting configurations cacc. If there is a call of REACH which accepts and halts, then M accepts; otherwise, M rejects.

Let SR(k) be the worst-case space used by REACH(·, ·, k): SR(1) = O(s(n)) and SR(k) = O(s(n)) + SR(k/2). This solves to SR(k) = O(s(n) · log k), and, since k = 2^{O(s(n))}, we have SR = O(s^2(n)). □

In general, if A is a problem that can be solved nondeterministically with space s(n) ≥ log n, then it can be solved deterministically with O(s^2(n)) space. Savitch's algorithm is still the one with the best known space bound.
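The recursion in the proof can be rendered directly in code. A sketch (the adjacency-list representation and helper names are illustration choices, not from the notes): each recursive frame stores only u, v, k and the loop variable w, so the O(log k) recursion depth times O(log n) bits per frame mirrors the O(log^2 n) space bound, at the cost of the super-polynomial running time noted above.

```python
def reach(adj, u, v, k):
    # REACH(u, v, k): can v be reached from u in at most k steps?
    if k == 0:
        return u == v
    if k == 1:
        return u == v or v in adj[u]
    # Split the path at a midpoint w; ceil(k/2) + floor(k/2) = k.
    return any(reach(adj, u, w, k - k // 2) and reach(adj, w, v, k // 2)
               for w in range(len(adj)))

def stconn_savitch(adj, s, t):
    # A path, if one exists, has at most n steps.
    return reach(adj, s, t, len(adj))
```

Allowing w to range over all of V (including u and v) is harmless and slightly simpler than the V − {u, v} loop in the proof.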
2.5 Undirected Connectivity

In the undirected s-t connectivity problem (abbreviated STUCONN) we are given an undirected graph G = (V, E) and two vertices s, t ∈ V, and the question is whether there is a path between s and t in G.

While this problem is not known to be complete for NL, it is complete for the class SL of decision problems that are solvable by symmetric nondeterministic machines that use O(log n) space. A nondeterministic machine is symmetric if whenever it can make a transition from a global state s to a global state s', then the transition from s' to s is also possible. The proof of SL-completeness of STUCONN is identical to the proof of NL-completeness of STCONN, except for the additional observation that the transition graph of a symmetric machine is undirected. It follows that L ⊆ SL ⊆ NL.

For STUCONN there exists an algorithm running in polynomial time and O(log^2 n) space (but the polynomial has very high degree), due to Nisan [Nis94]. There is also an algorithm that has O(log^{4/3} n) space complexity and super-polynomial time complexity, due to Armoni, Ta-Shma, Wigderson and Zhou [ATSWZ97], improving on a previous algorithm by Nisan, Szemerédi and Wigderson [NSW92].

2.6 Randomized Log-space

We now wish to introduce randomized space-bounded Turing machines. For simplicity, we will only introduce randomized machines for solving decision problems. In addition to a read-only input tape and a read/write work tape, such machines also have a read-only random tape to which they have one-way access, meaning that the head on that tape can only move, say, left-to-right. For a Turing machine M, an input x and a content r of the random tape, we denote by M(r, x) the outcome of the computation. For every fixed input and fixed content of the random tape, the machine is completely deterministic, and either accepts or rejects.

We say that a decision problem L belongs to the class RL (for randomized log-space) if there is a probabilistic Turing machine M that uses O(log n) space on inputs of length n and such that

• For every x ∈ L, Pr_r[M(r, x) accepts] ≥ 1/2;
• For every x ∉ L, Pr_r[M(r, x) accepts] = 0.

It is easy to observe that any constant bigger than 0 and smaller than 1 could be equivalently used instead of 1/2 in the definition above. It also follows from the definition that L ⊆ RL ⊆ NL; indeed, L ⊆ SL ⊆ RL ⊆ NL.

The following result shows that undirected connectivity can be solved in randomized logarithmic space.

Theorem 13: The problem STUCONN is in RL.

We will not give a proof of the above theorem, but just describe the algorithm. Given an undirected graph G = (V, E) and two vertices s, t, the algorithm performs a random walk of length 100 · n^3 starting from s. If t is never reached, the algorithm rejects.
input: G = (V, E), s, t

v ← s
for i ← 1 to 100 · n^3:
    pick at random a neighbor w of v
    if w = t then halt and accept
    v ← w
reject

The analysis of the algorithm is based on the fact that if we start a random walk from a vertex s of an undirected graph G, then all vertices in the connected component of s are likely to be visited at least once after Θ(n^3) steps.

The best known deterministic simulation of RL uses O((log n)^{3/2}) space, and is due to Saks and Zhou [SZ95].
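The walk above translates almost line for line into code. A sketch (the `seed` parameter and the adjacency-list representation are illustration choices, not part of the notes); note how the one-sided error matches the RL definition: on NO instances the walk can never reach t, so the algorithm never accepts.

```python
import random

def stuconn_walk(adj, s, t, seed=0):
    # Random walk of length 100 * n^3 starting from s; accept iff t is hit.
    rng = random.Random(seed)
    n = len(adj)
    v = s
    if v == t:
        return True
    for _ in range(100 * n ** 3):
        if not adj[v]:          # isolated vertex: the walk is stuck
            return False
        v = rng.choice(adj[v])
        if v == t:
            return True
    return False
```

On a 3-vertex path 0 - 1 - 2, the walk from 0 reaches 2 after a handful of steps with overwhelming probability; if s and t lie in different components, the algorithm always rejects.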
2.7 NL = coNL

Let non-stconn be the complement of stconn: the problem of deciding, given a directed graph G, together with special vertices s and t, whether t is NOT reachable from s. In order to prove that NL and coNL are the same, we will show that there is an NL Turing machine which solves non-stconn. Note that non-stconn is coNL-complete, since stconn is NL-complete.

Once we have the machine, we know that coNL ⊆ NL, since any language A in coNL can be reduced to non-stconn, and since non-stconn has been shown to be in NL (by the existence of our machine), so is A. Also, since non-stconn ∈ NL, by definition stconn ∈ coNL; since stconn is NL-complete, this means that any problem in NL can be reduced to it and so is also in coNL, so NL ⊆ coNL. Hence NL = coNL. This result was proved independently in [Imm88] and [Sze88].

2.7.1 A simpler problem first

First we will solve a simpler problem than non-stconn. We will assume that in addition to the usual inputs G, s and t, we also have an input r, which we will assume is equal to the number of vertices reachable from s in G, including s. Given these inputs, we will construct a nondeterministic Turing machine which decides whether t is not reachable from s, halting with yes if it sees r vertices which are all reachable from s but do not include t, and halting with no otherwise.

Here is the algorithm:

input: G = (V, E), s, t, r
output: yes if it discovers that t is not reachable from s, and no otherwise
assumption: there are exactly r distinct vertices reachable from s

c ← 0
for all v ∈ (V − {t}) do
    nondeterministically guess if v is reachable from s
    if guess = YES then
        nondeterministically guess the distance k from s to v
        p ← s
        for i ← 1 to k do
            nondeterministically pick a neighbor q of p
            p ← q
        if p ≠ v, reject
        c ← c + 1
if c = r then return yes, otherwise return no

It is easy to verify that this algorithm is indeed in NL: it only needs to maintain the five variables c, k, v, p, q, and each of these variables can be represented with log |V| bits.
Regarding correctness, notice that, in the algorithm, c can only be incremented for a vertex v that is actually reachable from s. Since there are assumed to be exactly r such vertices, c can be at most r at the end of the algorithm, and if it is exactly r, that means that there are r vertices other than t which are reachable from s, meaning that t by assumption cannot be reachable from s. Hence the algorithm accepts if and only if it discovers that t is not reachable from s.

2.7.2 Finding r

Now we need to provide an NL algorithm that finds r. Towards this goal, let's define rk to be the number of vertices reachable from s in at most k steps. Then r = r_{n−1}, where n is the number of vertices in G. The idea is to compute rk from r_{k−1}, starting from r0 = 1, and repeat the procedure n − 1 times.

Let's first try this algorithm:

input: G = (V, E), s, k
output: the number of vertices reachable from s in at most k steps (including s in this count)

c ← 0
for all v ∈ V do
    nondeterministically guess if v is reachable from s in k steps
    if guess = yes then
        p ← s
        for i ← 1 to k do
            nondeterministically guess a neighbor q of p (possibly not moving at all)
            p ← q
        if p ≠ v, reject
        c ← c + 1
return c

This algorithm has a problem: it will only return a number c which is at most rk, but we need it to return exactly rk. We need a way to force it to find all vertices which are reachable from s. Now here is another try at an algorithm:

input: G = (V, E), s, k, r_{k−1}
output: the number of vertices reachable from s in at most k steps (including s in this count)
assumption: r_{k−1} is the exact number of vertices reachable from s in at most k − 1 steps

c ← 0
for all v ∈ V do
    d ← 0
    flag ← FALSE
    for all w ∈ V do
        p ← s
        for i ← 1 to k − 1 do
            nondeterministically pick a neighbor q of p (possibly not moving at all)
            p ← q
        if p = w then
            d ← d + 1
            if v is a neighbor of w, or if v = w, then flag ← TRUE
    if d < r_{k−1}, reject
    if flag then c ← c + 1
return c

Here is the idea behind the algorithm: for each vertex v, we need to determine if it is reachable from s in at most k steps. To do this, we can loop over all vertices which are at distance at most k − 1 from s, checking to see if v is either equal to one of these vertices or is a neighbor of one of them (in which case it would be reachable in exactly k steps). The algorithm is able to force all vertices of distance at most k − 1 to be considered because it is given r_{k−1} as an input.

Now, putting this algorithm together with the first one listed above, we have shown that the complement of stconn is in NL, implying that NL = coNL. In fact, the proof can be generalized to show that if a decision problem A is solvable in nondeterministic space s(n) = Ω(log n), then the complement of A is solvable in nondeterministic space O(s(n)).

Exercises

1. Prove that P ≠ SPACE(O(n)). [This is a "trick" question. It is not known how to prove P ⊆ SPACE(O(n)) or how to prove SPACE(O(n)) ⊆ P, so the proof has to start by assuming the classes are equal and then reach a contradiction, without explicitly showing a problem in one class that cannot belong to the other. Hint: if A and B are decision problems, A ≤ B and B ∈ P, then A ∈ P; what would happen if the same were true for SPACE(O(n))?]

2. It is possible to prove that there is an algorithm that, given a Turing machine M (of the type with a read-only input tape and a work tape), an input x and a space bound s, decides whether M(x) accepts using ≤ s bits of space; furthermore, the algorithm itself uses O(s) bits of space. Using the above result, argue that if s(n) > log n is a function computable in O(s(n)) space, then SPACE(s(n) · log n) ⊈ SPACE(s(n)).

3. Define the class BPL (for bounded-error probabilistic log-space) as follows. A decision problem L is in BPL if there is a log-space probabilistic Turing machine M such that

• If x ∈ L then Pr_r[M(r, x) accepts] ≥ 2/3;
• If x ∉ L then Pr_r[M(r, x) accepts] ≤ 1/3.

(a) Prove that RL ⊆ BPL.
(b) Prove that BPL ⊆ SPACE(O((log n)^2)).
(c) This might be somewhat harder: prove that BPL ⊆ P.
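Returning to the inductive counting of Section 2.7.2: a deterministic sketch makes the induction concrete. It computes r_k from the radius-(k−1) ball exactly as the nondeterministic algorithm certifies it: a vertex is within distance k iff it equals, or is an out-neighbor of, a vertex within distance k−1. (Materializing the ball as a set is for illustration only; the NL machine stores just its size r_{k−1} and nondeterministically re-enumerates the ball's members.)

```python
def count_reachable(adj, s):
    # r_0 = 1 (just s); update the distance-(k-1) ball n-1 times.
    n = len(adj)
    ball = {s}
    for _ in range(1, n):
        ball = {v for v in range(n)
                if any(v == w or v in adj[w] for w in ball)}
    return len(ball)      # this is r = r_{n-1}
```

With r in hand, the algorithm of Section 2.7.1 decides non-reachability of t.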
Lecture 3
The Polynomial Hierarchy

One way to look at the difference between NP and coNP is that a decision problem in NP is asking a sort of "does there exist" question, where the existence of the answer can by definition be efficiently represented. On the other hand, coNP asks "is it true for all" questions, which do not seem to have simple, efficient proofs. Formally, a decision problem A is in NP if and only if there is a polynomial time procedure V(·, ·) and a polynomial time bound p() such that x ∈ A if and only if ∃y, |y| ≤ p(|x|), V(x, y) = 1; and a problem A is in coNP if and only if there is a polynomial time procedure V(·, ·) and a polynomial bound p() such that x ∈ A if and only if ∀y, |y| ≤ p(|x|), V(x, y) = 1.

3.1 Stacks of quantifiers

Now suppose you had a decision problem A which asked something of the following form:

    x ∈ A ⇔ ∃y1, |y1| ≤ p(|x|), s.t. ∀y2, |y2| ≤ p(|x|), V(x, y1, y2)

(where p() is a polynomial time bound and V(·, ·, ·) is a polynomial time procedure). In other words, an algorithm solving problem A should return yes on an input x if and only if there exists some string y1 such that for all strings y2 (both of polynomial length), the predicate V(x, y1, y2) holds.

An example of such a problem is this: given a Boolean formula φ over variables x1, . . . , xn and an integer k, is there a formula φ' which is equivalent to φ and is of size at most k? In this case, y1 is the formula φ', y2 is an arbitrary assignment of Boolean values to the variables x1, . . . , xn, and V(x, y1, y2) is the predicate which is true if and only if x[y2] and y1[y2] are both true or both false, meaning that under the variable assignment y2, φ and φ' agree. Notice that φ' is equivalent to φ if and only if it agrees with φ under all assignments of Boolean values to the variables. As we will see, the problem A is a member of the class Σ2 in the second level of the polynomial hierarchy.

3.2 The hierarchy

The polynomial hierarchy starts with familiar classes on level one: Σ1 = NP and Π1 = coNP. For all i ≥ 1, it includes two classes, Σi and Πi, which are defined as follows:

    A ∈ Σi ⇔ ∃y1. ∀y2. · · · Q yi. VA(x, y1, . . . , yi)

and

    B ∈ Πi ⇔ ∀y1. ∃y2. · · · Q' yi. VB(x, y1, . . . , yi)

where the predicates VA and VB depend on the problems A and B, and Q and Q' represent the appropriate quantifiers, which depend on whether i is even or odd (for example, if i = 10 then the quantifier Q for Σ10 is ∀, and the quantifier Q' for Π10 is ∃). For clarity, we have omitted the p(·) side conditions, but they are still there.

One thing that is easy to see is that Πk = coΣk. Also, note that, for all i ≤ k − 1, Πi ⊆ Σk and Σi ⊆ Σk; these subset relations hold for Πk as well. This can be seen by noticing that the predicates V do not need to "pay attention to" all of their arguments, and so can represent classes lower on the hierarchy which have a smaller number of quantifiers.

3.3 An alternate characterization

The polynomial hierarchy can also be characterized in terms of "oracle machines." The idea here is that, instead of a standard Turing machine, we consider one which is augmented with an oracle of a certain power which can be consulted as many times as desired, and using only one computational step each time. Let A be some decision problem and M be a class of Turing machines. Then M^A is defined to be the class of machines obtained from M by allowing instances of A to be solved in one step. Similarly, if M is a class of Turing machines and C is a complexity class, then M^C = ∪_{A ∈ C} M^A. If L is a complete problem for C, and the machines in M are powerful enough to compute polynomial-time computations, then M^C = M^L.

Theorem 14: Σ2 = NP^3SAT.

Proof: Let A ∈ Σ2; then for some polynomial p() and polynomial-time computable V() we have x ∈ A if and only if ∃y1, |y1| ≤ p(|x|), such that ∀y2, |y2| ≤ p(|x|), V(x, y1, y2) = 1.

Then we define a nondeterministic machine with an NP oracle as follows: on input x, the machine guesses a string y1 of length at most p(|x|), and then asks the oracle whether ∃y2, |y2| ≤ p(|x|), V(x, y1, y2) = 0. The above question is an existential question about a polynomial-time computation, so, by Cook's theorem, it is possible to construct in polynomial time a 3SAT instance that is satisfiable if and only if the answer to the above question is YES. The machine accepts if and only if the answer from the oracle is NO. It is immediate that the machine has an accepting computation if and only if ∃y1. (¬∃y2, |y2| ≤ p(|x|), s.t. V(x, y1, y2) = 0),
that is, the machine accepts if and only if x ∈ A. Notice that, in the above computation, only one oracle query is made, even though the definition of NP^3SAT allows us to make an arbitrary number of oracle queries.

Let now A ∈ NP^3SAT, and let M be the oracle machine that solves A. We first show that there is a machine M' that also solves A, only makes one oracle query, and accepts if and only if the answer to the oracle query is NO. On input x, M' guesses an accepting computation of M(x), that is, M' guesses all the nondeterministic choices of M(x), all the oracle questions, and all the answers. Then, for each question that was answered with a YES, M' guesses a satisfying assignment to verify that the guess was correct. Finally, M' is left with a certain set of oracle questions, say, the formulae φ1, . . . , φk, for which it has guessed that the correct oracle answer is NO. Then M' asks its oracle whether (a formula equivalent to) φ1 ∨ · · · ∨ φk is satisfiable, and it accepts if and only if the answer is NO.

Consider the computation of M'(x) when x ∈ A: there is a valid accepting computation of M(x), and M'(x) can guess that computation along with the valid oracle answers; it can also guess valid assignments for all the queries for which the answer is YES; finally, it is left with unsatisfiable formulae φ1, . . . , φk, the answer to the single oracle query of M' is NO, and M' accepts. Conversely, if M'(x) has an accepting computation, then there must be a valid accepting computation of M(x), and so x ∈ A. □

In fact, a more general result is known, whose proof works along similar lines.

Theorem 15: For every i ≥ 2, Σi = NP^{Σ_{i−1}}.
3.4 Additional Properties
Here are some more facts about the polynomial hierarchy, which we will not prove:

1. Πi and Σi have complete problems for all i.
2. A Σi-complete problem is not in Πj, j ≤ i − 1, unless Πj = Σi, and it is not in Σj unless Σj = Σi.
3. Suppose that Σi = Πi for some i. Then Σj = Πj = Σi = Πi for all j ≥ i.
4. Suppose that Σi = Σi+1 for some i. Then Σj = Πj = Σi for all j ≥ i.
5. Suppose that Πi = Πi+1 for some i. Then Σj = Πj = Πi for all j ≥ i.

We will just prove the following special case of part (3).

Theorem 16: Suppose NP = coNP. Then, for every i ≥ 2, Σi = NP.

Proof: Let us first prove that, under the assumption of the theorem, Σ2 = NP. Let A ∈ Σ2 and let M be the nondeterministic oracle machine that decides A using oracle access to 3SAT. Let also M' be the nondeterministic polynomial time Turing machine that decides the complement of the 3SAT problem. We now describe a nondeterministic polynomial time Turing machine M'' to decide A: on input x, M'' guesses an accepting computation of M(x), along with oracle queries and answers; for each oracle question φ for which a YES answer has been guessed, M'' guesses a satisfying assignment; for each oracle question ψ for which a NO answer has been guessed, M'' guesses an accepting computation of M'(ψ). It is easy to verify that M''(x) has an accepting computation if and only if M^3SAT(x) has an accepting computation.

We can prove by induction on i that Σi = NP. We have covered the base case. Let us now suppose that Σ_{i−1} = NP; then Σi = NP^{Σ_{i−1}} = NP^NP = Σ2 = NP. □

While it seems like an artificial construction right now, in future lectures we will see that the polynomial hierarchy helps us to understand other complexity classes.
3.5 References
The polynomial time hierarchy was deﬁned by Stockmeyer [Sto76]. Wrathall [Wra76] shows that every class in the polynomial hierarchy has complete problems.
Exercises
1. Consider the following decision problem: given a directed graph G = (V, E), two vertices s, t ∈ V , and an integer k, determine whether the shortest path from s to t is of length k. Is this problem in NL? 2. In the MAX SAT problem we are given a formula φ in conjunctive normal form and we want to ﬁnd the assignment of values to the variables that maximizes the number of satisﬁed clauses. (For example, if φ is satisﬁable, the optimal solution satisﬁes all the clauses and the MAX SAT problem reduces to ﬁnding a satisfying assignment.) Consider the following decision problem: given a formula φ in conjunctive normal form and an integer k, determine if k is the number of clauses of φ satisﬁed by an optimal assignment. • Prove that this problem is in NP if and only if NP = coNP. [Hint: prove that it is both NPhard and coNPhard.] • Prove that this problem is in Σ2 .
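For intuition on the last part of the MAX SAT exercise, here is a brute-force rendering of the Σ2 structure: x ∈ A iff there EXISTS an assignment satisfying exactly k clauses AND for ALL assignments at most k clauses are satisfied. The ±i literal encoding is an illustration choice, and this exponential-time check only exhibits the two quantifiers; it is not itself a proof of membership in Σ2.

```python
from itertools import product

def exactly_k_optimal(clauses, n_vars, k):
    # clauses: list of clauses; a clause is a list of literals,
    # where literal +i / -i stands for x_i / NOT x_i (1-based).
    def num_sat(assignment):
        return sum(any((lit > 0) == assignment[abs(lit) - 1] for lit in clause)
                   for clause in clauses)
    assignments = list(product([False, True], repeat=n_vars))
    exists_y1 = any(num_sat(y1) == k for y1 in assignments)   # EXISTS witness
    forall_y2 = all(num_sat(y2) <= k for y2 in assignments)   # FORALL bound
    return exists_y1 and forall_y2
```

For the clauses (x1), (¬x1), (x1 ∨ x2), the optimum satisfies 2 clauses, so the check succeeds only for k = 2.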
Lecture 4
Circuits

This lecture is on boolean circuit complexity. We first define circuits and the function they compute. Then we consider families of circuits and the language they define. In Section 4.2, we see how circuits relate to other complexity classes by a series of results, culminating with the Karp-Lipton theorem, which states that if NP problems can be decided with polynomial-size circuits, then PH = Σ2.

4.1 Circuits

A circuit C has n inputs, m outputs, and is constructed with AND gates, OR gates and NOT gates. Each gate has in-degree 2, except the NOT gate, which has in-degree 1. The out-degree can be any number. A circuit must have no cycle. See Figure 4.1.

[Figure 4.1: A Boolean circuit, with inputs x1, . . . , xn and outputs z1, . . . , zm.]

A circuit C with n inputs and m outputs computes a function fC : {0, 1}^n → {0, 1}^m. See Figure 4.2 for an example.
See Figure 4.2 for an example.

Figure 4.2: A circuit computing the boolean function f_C(x_1 x_2 x_3 x_4) = x_1 ⊕ x_2 ⊕ x_3 ⊕ x_4.

Define SIZE(C) = number of AND and OR gates of C. By convention, we do not count the NOT gates. To be compatible with other complexity classes, we need to extend the model to arbitrary input sizes:

Definition 3 A language L is solved by a family of circuits {C_1, C_2, ..., C_n, ...} if for every n ≥ 1 and for every x s.t. |x| = n, x ∈ L ⇔ f_{C_n}(x) = 1.

Definition 4 Say L ∈ SIZE(s(n)) if L is solved by a family {C_1, C_2, ..., C_n, ...} of circuits, where C_i has at most s(i) gates.

4.2 Relation to other complexity classes

Proposition 17 For every language L, L ∈ SIZE(O(2^n)). In other words, exponential size circuits contain all languages.

Proof: We need to show that for every 1-output function f : {0,1}^n → {0,1}, f has circuit size O(2^n). Use the identity f(x_1 x_2 ... x_n) = (x_1 ∧ f(1 x_2 ... x_n)) ∨ (¬x_1 ∧ f(0 x_2 ... x_n)) to recursively construct a circuit for f, as shown in Figure 4.3. The recurrence relation for the size of the circuit is s(n) = 3 + 2s(n − 1), with base case s(1) = 1, which solves to s(n) = 2 · 2^n − 3 = O(2^n). □
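The recurrence in the proof of Proposition 17 is easy to check numerically. The following throwaway Python snippet (not from the notes) confirms the closed form s(n) = 2 · 2^n − 3.

```python
def s(n):
    # Recurrence from the proof of Proposition 17: the construction uses
    # two AND gates and one OR gate (the NOT gate is not counted, by
    # convention), plus two recursively built subcircuits on n - 1 inputs.
    return 1 if n == 1 else 3 + 2 * s(n - 1)

for n in range(1, 16):
    assert s(n) == 2 * 2**n - 3  # closed form, hence s(n) = O(2^n)
```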
Figure 4.3: A circuit computing any function f(x_1 x_2 ... x_n) of n variables, assuming circuits for the two functions f(1 x_2 ... x_n) and f(0 x_2 ... x_n) of n − 1 variables.

Proposition 18 If L ∈ DTIME(t(n)), then L ∈ SIZE(O(t^2(n))).

Proof: Let L be a decision problem solved by a machine M in time t(n). Fix n and x s.t. |x| = n, and consider the t(n) × t(n) tableau of the computation of M(x). See Figure 4.4. Assume that each entry (a, q) of the tableau is encoded using k bits. By Proposition 17, the transition function {0,1}^{3k} → {0,1}^k used by the machine can be implemented by a "next state circuit" of size k · O(2^{3k}), which is exponential in k but constant in n. This building block can be used to create a circuit of size O(t^2(n)) that computes the complete tableau, and thus also computes the answer to the decision problem. This is shown in Figure 4.5. □

Corollary 19 P ⊆ SIZE(n^{O(1)}).

On the other hand, it is easy to show that P ≠ SIZE(n^{O(1)}): for example, every unary language, including undecidable ones, has a constant size circuit for each input length.

Proposition 20 There are languages L such that L ∉ SIZE(2^{o(n)}). In other words, for every n there exists f : {0,1}^n → {0,1} that cannot be computed by a circuit of size 2^{o(n)}.

Proof: This is a counting argument. There are 2^{2^n} functions f : {0,1}^n → {0,1}, and we claim that the number of circuits of size s is at most 2^{O(s log s)}, assuming s ≥ n. To bound the number of circuits of size s we create a compact binary encoding of such circuits. Identify gates with numbers 1, ..., s. For each gate, specify where the two inputs are coming from, whether they are complemented, and the type of gate. The total number of bits required to represent the circuit is

s(2 log(n + s) + 3) ≤ s(2 log 2s + 3) = s(2 log s + 5).
Figure 4.4: The t(n) × t(n) tableau of the computation of M(x). The left entry of each cell is the tape symbol at that position and time; the right entry is the machine state or a blank symbol, depending on the position of the machine head.

Figure 4.5: Circuit to simulate a Turing machine computation by constructing the tableau, one row of "next state" circuits per step, with a final check for an accepting state.
So the number of circuits of size s is at most 2^{2s log s + 5s}, and this is not sufficient to compute all possible functions if 2^{2s log s + 5s} < 2^{2^n}. This is satisfied if s = 2^{o(n)}. □

Theorem 21 (Karp-Lipton-Sipser) If NP ⊆ SIZE(n^{O(1)}) then PH = Σ_2.

Before proving the above theorem, we first show a result that contains some of the ideas in its proof.

Lemma 22 If NP ⊆ SIZE(n^{O(1)}) then there is a family of polynomial-size circuits that, on input a 3CNF formula φ, outputs a satisfying assignment for φ if one such assignment exists, and a sequence of zeroes otherwise.

Proof: We define the circuits C_n^1, ..., C_n^n as follows:

• C_n^1, on input a formula φ over n variables, outputs 1 if and only if there is a satisfying assignment for φ where x_1 = 1.
• ···
• C_n^i, on input a formula φ over n variables and bits b_1, ..., b_{i−1}, outputs 1 if and only if there is a satisfying assignment for φ where x_1 = b_1, ..., x_{i−1} = b_{i−1}, x_i = 1.
• ···
• C_n^n, on input a formula φ over n variables and bits b_1, ..., b_{n−1}, outputs 1 if and only if φ is satisfied by the assignment x_1 = b_1, ..., x_{n−1} = b_{n−1}, x_n = 1.

Each circuit realizes an NP computation, and so, under the assumption of the lemma, it can be built of polynomial size. Consider now the sequence b_1 = C_n^1(φ), b_2 = C_n^2(b_1, φ), ..., b_n = C_n^n(b_1, ..., b_{n−1}, φ). The reader should be able to convince himself that b_1, ..., b_n is a satisfying assignment for φ if φ is satisfiable, and a sequence of zeroes otherwise. □

We now prove the Karp-Lipton-Sipser theorem.

Proof: [Of Theorem 21] We will show that if NP ⊆ SIZE(n^{O(1)}) then Π_2 ⊆ Σ_2. By a result in a previous lecture, this implies that PH = Σ_2.

Let L ∈ Π_2. Then there is a polynomial p() and a polynomial-time computable V() such that

x ∈ L iff ∀y_1. |y_1| ≤ p(|x|) ∃y_2. |y_2| ≤ p(|x|). V(x, y_1, y_2) = 1.

By adapting the proof of Lemma 22 (see Figure 4.6), we can show that, for every n, there is a circuit C_n of size polynomial in n such that for every x with |x| = n and every y_1 with |y_1| ≤ p(n),

∃y_2. |y_2| ≤ p(|x|) ∧ V(x, y_1, y_2) = 1 iff V(x, y_1, C_n(x, y_1)) = 1.

Let q(n) be a polynomial upper bound to the size of C_n. So now we have that for inputs x of length n,

x ∈ L iff ∃C_n. |C_n| ≤ q(n). ∀y_1. |y_1| ≤ p(n). V(x, y_1, C_n(x, y_1)) = 1,

which shows that L is in Σ_2. □

Figure 4.6: How to use decision problem solvers to find a witness to a search problem.

4.3 References

The Karp-Lipton-Sipser theorem appears in [KL80].
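The bit-by-bit extraction in Lemma 22 is the standard reduction of search to decision, and it can be sketched in Python. Here a brute-force SAT decider stands in for the hypothetical polynomial-size circuits C_n^i; the CNF encoding and function names are illustrative, not from the notes.

```python
from itertools import product

def satisfies(clauses, bits):
    """Literal v means variable v is true, -v means variable v is false."""
    return all(any((bits[abs(l) - 1] == 1) == (l > 0) for l in c)
               for c in clauses)

def sat_decide(clauses, n, prefix):
    """Brute-force stand-in for the circuit C_n^i: is there a satisfying
    assignment whose first len(prefix) variables equal prefix?"""
    rest_len = n - len(prefix)
    return any(satisfies(clauses, list(prefix) + list(rest))
               for rest in product([0, 1], repeat=rest_len))

def find_assignment(clauses, n):
    """The Lemma 22 strategy: n adaptive calls to the decision procedure
    recover an assignment bit by bit (all zeroes if phi is unsatisfiable)."""
    prefix = []
    for _ in range(n):
        prefix.append(1 if sat_decide(clauses, n, prefix + [1]) else 0)
    return prefix if satisfies(clauses, prefix) else [0] * n

# phi = (x1 OR NOT x2) AND (x2 OR x3) AND (NOT x1 OR NOT x3)
phi = [[1, -2], [2, 3], [-1, -3]]
a = find_assignment(phi, 3)
assert satisfies(phi, a)
```

The decider here takes exponential time, but the point of the lemma is only the shape of the reduction: a polynomial-size decision device yields a polynomial-size search device.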
Exercises

1. Show that there is a language in SPACE(2^{n^{O(1)}}) that does not belong to SIZE(2^{o(n)}).

2. Define EXP = DTIME(2^{n^{O(1)}}). Prove that if EXP ⊆ SIZE(n^{O(1)}) then EXP = Σ_2.

3. Show that SIZE(n^{O(1)}) ⊈ P.
Lecture 5

Probabilistic Complexity Classes

In this lecture we will define the probabilistic complexity classes BPP, RP, ZPP, and we will see how they are related to each other, as well as to other deterministic or circuit complexity classes.

5.1 Probabilistic complexity classes

First we are going to describe the probabilistic model of computation. In this model an algorithm A gets as input a sequence of random bits r and the "real" input x of the problem. The output of the algorithm is the correct answer for the input x with some probability.

Definition 5 An algorithm A is called a polynomial time probabilistic algorithm if the size of the random sequence |r| is polynomial in the input |x| and A() runs in time polynomial in |x|.

If we want to talk about the correctness of the algorithm, then informally we could say that for every input x we need Pr[A(x, r) = correct answer for x] ≥ 2/3. That is, for every input, the probability of a correct answer over the random sequences must be some constant bounded away from 1/2. Let us now define the class BPP.

Definition 6 A decision problem L is in BPP if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 2/3
∀x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≤ 1/3

We can see that in this setting we have an algorithm with two inputs and some constraints on the probabilities of the outcome. In the same way we can also define the class P as:

Definition 7 A decision problem L is in P if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 1
∀x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0
Similarly, we define the classes RP and ZPP.

Definition 8 A decision problem L is in RP if there is a polynomial time algorithm A and a polynomial p() such that:

∀x ∈ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] ≥ 1/2
∀x ∉ L: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = 1] = 0

Definition 9 A decision problem L is in ZPP if there is a polynomial time algorithm A whose output can be 0, 1, ?, and a polynomial p() such that:

∀x: Pr_{r∈{0,1}^{p(|x|)}}[A(x, r) = ?] ≤ 1/2
∀x, ∀r such that A(x, r) ≠ ?: A(x, r) = 1 if and only if x ∈ L

5.2 Relations between complexity classes

After defining these probabilistic complexity classes, let us see how they are related to other complexity classes and with each other.

Theorem 23 RP ⊆ NP.

Proof: Suppose we have an RP algorithm for a language L. Then this algorithm also witnesses that L is in NP: if x ∈ L, then there is a random sequence r for which the algorithm answers 1, and such a sequence is a witness; on the other hand, if x ∉ L, then there is no witness. Most natural probabilistic algorithms belong to the class RP. □

Theorem 24 ZPP ⊆ RP.

Proof: We are going to convert a ZPP algorithm into an RP algorithm. The construction consists of running the ZPP algorithm and, any time it outputs ?, answering 0. In this way, if the right answer is 0, then the new algorithm will answer 0 with probability 1. On the other hand, when the right answer is 1, the new algorithm will give the wrong answer with probability less than 1/2, since the probability of the ZPP algorithm giving the output ? is less than 1/2. □

Another interesting property of the class ZPP is that it is equivalent to the class of languages for which there is an average polynomial time algorithm that always gives the right answer. More formally:

Theorem 25 A language L is in the class ZPP if and only if L has an average polynomial time algorithm that always gives the right answer.

Proof: First let us clarify what we mean by average time. For each input x we take the average time of A(x, r) over all random sequences r; then for size n we take the worst such time over all possible inputs x of size |x| = n.

Suppose that the running time of the ZPP algorithm is T. In order to construct an algorithm that always gives the right answer, we run the ZPP algorithm and, if it outputs a ?, we run it again. The average running time of the new algorithm is then

T_avg = (1/2) · T + (1/4) · 2T + ··· + (1/2^k) · kT ≈ O(T).

Conversely, we want to prove that if the language L has an algorithm that always gives the right answer and runs in polynomial average time t(|x|), then L is in ZPP. What we do is run the algorithm for time 2t(|x|) and output a ? if the algorithm has not yet stopped. Since the original algorithm has average running time t(|x|), it must stop before time 2t(|x|) at least half of the times, so the probability that our algorithm outputs a ? is less than 1/2; moreover, the worst-case running time is polynomial. It is straightforward to see that this is a ZPP algorithm. □

Let us now prove the fact that RP is contained in BPP.

Theorem 26 RP ⊆ BPP.

Proof: We will convert an RP algorithm into a BPP algorithm. Let A be an RP algorithm for a decision problem L. In the case that the input x does not belong to the language, the RP algorithm always gives the right answer, so this case definitely meets the BPP requirement. In the case that the input x does belong to the language, we need to boost the probability of a correct answer from at least 1/2 to at least 2/3. We fix some number k and define the following algorithm:

A^(k)
input: x
pick r_1, r_2, ..., r_k
if A(x, r_1) = A(x, r_2) = ··· = A(x, r_k) = 0 then return 0
else return 1

Let us now consider the correctness of the algorithm:

if x ∉ L: Pr_{r_1,...,r_k}[A^(k)(x, r_1, ..., r_k) = 1] = 0
if x ∈ L: Pr_{r_1,...,r_k}[A^(k)(x, r_1, ..., r_k) = 1] ≥ 1 − (1/2)^k

It is easy to see that by choosing an appropriate k the second probability can go arbitrarily close to 1, and therefore become larger than 2/3, which is what is required by the definition of BPP. In fact, by choosing k to be a polynomial in |x|, we can make the probability exponentially close to 1. This enables us to change the definition of RP: instead of the bound of 1/2 for the probability of a correct answer when the input is in the language L, we can have, for a fixed polynomial q, a bound of 1 − 2^{−q(|x|)}. □
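The A^(k) construction above can be simulated concretely. In the sketch below (not from the notes), `toy_rp` is a purely illustrative stand-in for an RP algorithm: on a yes-instance each run accepts independently with probability 1/2, and on a no-instance it never accepts.

```python
import random

def amplify_rp(A, k):
    """The A^(k) construction from the proof: run a one-sided-error
    algorithm k times and accept iff at least one run accepts."""
    def A_k(x):
        return any(A(x) for _ in range(k))
    return A_k

# Abstract stand-in for an RP algorithm (hypothetical, for illustration):
def toy_rp(x):
    return x == "yes" and random.random() < 0.5

random.seed(0)
A10 = amplify_rp(toy_rp, 10)
assert A10("no") is False                 # one-sided error is preserved
accepts = sum(A10("yes") for _ in range(1000))
assert accepts >= 990                     # per-call error is (1/2)^10
```

Note that amplification preserves the one-sided nature of the error: a no-instance can never be accepted, no matter how many repetitions are made.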
In a BPP algorithm we expect the right answer to come up with probability bounded away from 1/2; by running the algorithm many times, we make sure that this advantage over 1/2 will actually show up in the results. More formally, let us define the Chernoff bounds.

Theorem 27 (Chernoff Bound) Suppose X_1, ..., X_k are independent random variables with values in {0, 1} and for every i, Pr[X_i = 1] = p. Then, for every ε > 0:

Pr[(1/k) Σ_{i=1}^k X_i − p > ε] < e^{−ε^2 k / (2p(1−p))}
Pr[(1/k) Σ_{i=1}^k X_i − p < −ε] < e^{−ε^2 k / (2p(1−p))}

The Chernoff bounds will enable us to bound the probability that our result is far from the expected one. Indeed, these bounds say that this probability is exponentially small with respect to k.

Let us now consider how the Chernoff bounds apply to the algorithm we described previously. Let A be a BPP algorithm for a decision problem L. We fix the input x and call p = Pr_r[A(x, r) = 1], over all possible random sequences. We also define the independent random variables X_1, ..., X_k such that X_i = A(x, r_i). We fix k and define the following algorithm:

A^(k)
input: x
pick r_1, r_2, ..., r_k
c = Σ_{i=1}^k A(x, r_i)
if c ≥ k/2 then return 1
else return 0

So, suppose x ∈ L. The algorithm makes a mistake when (1/k) Σ_i X_i < 1/2. Since p ≥ 2/3, this event implies (1/k) Σ_i X_i − p ≤ −1/6, so we can apply the Chernoff bounds to bound its probability:

Pr[A^(k) outputs the wrong answer on x] = Pr[(1/k) Σ_i X_i < 1/2]
≤ Pr[(1/k) Σ_i X_i − p ≤ −1/6]
≤ e^{−k/(72 p(1−p))}
= 2^{−Ω(k)}
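The majority-vote algorithm A^(k) and the Chernoff bound with ε = p − 1/2 = 1/6 can be compared empirically. This is a small simulation sketch (not from the notes); the function names and the choice of trial count are illustrative.

```python
import math
import random

def majority_error(p, k, trials=2000, seed=1):
    """Empirical error rate of the majority-vote algorithm A^(k), when each
    of the k runs is independently correct with probability p > 1/2."""
    rng = random.Random(seed)
    errors = 0
    for _ in range(trials):
        correct_runs = sum(rng.random() < p for _ in range(k))
        errors += (correct_runs < k / 2)
    return errors / trials

p, k, eps = 2 / 3, 99, 1 / 6
# e^{-eps^2 k / (2 p (1-p))}: the Chernoff bound used above.
chernoff = math.exp(-eps**2 * k / (2 * p * (1 - p)))
assert chernoff < 0.01
assert majority_error(p, k) <= 0.01  # empirically 2^{-Omega(k)} as well
```

With k = 99 and per-run correctness 2/3, both the bound and the simulated error are already below one percent, illustrating the exponential decay in k.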
The same reasoning applies also for the case where x ∉ L. Further, it is easy to see that, by choosing k to be a polynomial in |x| instead of a constant, we can change the definition of a BPP algorithm: instead of the bound of 1/3 for the probability of a wrong answer, we can have a bound of 2^{−q(|x|)}, for a fixed polynomial q.

Next, we are going to see how the probabilistic complexity classes relate to circuit complexity classes, and specifically prove that the class BPP has polynomial size circuits.

Theorem 28 (Adleman) BPP ⊆ SIZE(n^{O(1)}).

Proof: Let L be in the class BPP. Then, by definition and by the previous conclusion that we can replace 1/3 with 2^{−q(|x|)}, there is a polynomial time algorithm A and a polynomial p such that for every input x of length n

Pr_{r∈{0,1}^{p(n)}}[A(x, r) = wrong answer for x] ≤ 2^{−(n+1)}.

Informally, for each input x the number of random sequences r that give the wrong answer is exponentially small; hence, even if we assume that these sequences are different for every input x, their sum is still less than the total number of random sequences. We now fix n and try to construct a family of circuits C_n that solves L on inputs of length n.

Claim 29 There is a random sequence r ∈ {0,1}^{p(n)} such that for every x ∈ {0,1}^n, A(x, r) is correct.

Proof: Let us consider the probability, over all sequences r, that the algorithm gives the right answer for every input:

Pr_r[for every x, A(x, r) is correct] = 1 − Pr_r[∃x. A(x, r) is wrong].

The second event is the union of 2^n possible events, one for each x, and its probability is bounded by the sum of the individual probabilities:

Pr_r[for every x, A(x, r) is correct] ≥ 1 − Σ_{x∈{0,1}^n} Pr_r[A(x, r) is wrong] ≥ 1 − 2^n · 2^{−(n+1)} ≥ 1/2.

Since this probability is greater than 0, the claim is proved. □

Given the claim, it is straightforward to see that we can simulate the algorithm A(·, ·), where the first input has length n and the second p(n), by a circuit of size polynomial in n. Our circuit will take as input only the input x and simulate A with input x and r, for this fixed correct sequence r, which is built inside the circuit. Of course, this is only an existential proof, since we do not know how to find this sequence efficiently. □

In conclusion, let us briefly describe some other relations between complexity classes. Whether BPP ⊆ NP or not is still an open question. What we know is that it is unlikely that NP is contained in BPP, since then, by the previous result, NP would have polynomial size circuits, and hence, by the result of Karp and Lipton, the polynomial hierarchy would collapse.

5.3 BPP ⊆ Σ_2

This result was first shown by Sipser and Gacs. Lautemann gave a much simpler proof, which we give below.

Lemma 30 If L is in BPP then there is an algorithm A such that for every x,

Pr_r(A(x, r) = wrong answer) ≤ 1/(3m),

where the number of random bits |r| = m = |x|^{O(1)} and A runs in time |x|^{O(1)}.

Proof: Let A′ be a BPP algorithm for L. Then for every x, Pr_r(A′(x, r) = wrong answer) ≤ 1/3, and A′ uses m′(n) random bits, where n = |x|. Do k(n) repetitions of A′ and accept if and only if at least k(n)/2 executions of A′ accept. Call the new algorithm A. Then A uses k(n) · m′(n) random bits and Pr_r(A(x, r) = wrong answer) ≤ 2^{−ck(n)} for some constant c. We can then find k(n) with k(n) = Θ(log m′(n)) such that 1/2^{ck(n)} ≤ 1/(3 k(n) m′(n)). □

Theorem 31 BPP ⊆ Σ_2.

Proof: Let L be in BPP and let A be as in the lemma. Then we want to show that

x ∈ L ⇔ ∃y_1, ..., y_m ∈ {0,1}^m. ∀z ∈ {0,1}^m. ∨_{i=1}^m A(x, y_i ⊕ z) = 1,

where m is the number of random bits used by A on input x.

Suppose x ∈ L. Then

Pr_{y_1,...,y_m}(∃z. A(x, y_1 ⊕ z) = ··· = A(x, y_m ⊕ z) = 0)
≤ Σ_{z∈{0,1}^m} Pr_{y_1,...,y_m}(A(x, y_1 ⊕ z) = ··· = A(x, y_m ⊕ z) = 0)
≤ 2^m · (1/(3m))^m < 1.

So

Pr_{y_1,...,y_m}(∀z. ∨_i A(x, y_i ⊕ z)) = 1 − Pr_{y_1,...,y_m}(∃z. A(x, y_1 ⊕ z) = ··· = A(x, y_m ⊕ z) = 0) > 0.

So a choice of (y_1, ..., y_m) as required exists.

Conversely, suppose x ∉ L. Then, for every fixed y_1, ..., y_m ∈ {0,1}^m,

Pr_z(∨_i A(x, y_i ⊕ z)) ≤ Σ_i Pr_z(A(x, y_i ⊕ z) = 1) ≤ m · 1/(3m) = 1/3.

So

Pr_z(A(x, y_1 ⊕ z) = ··· = A(x, y_m ⊕ z) = 0) ≥ 2/3 > 0,

and so for all y_1, ..., y_m ∈ {0,1}^m there is a z such that ∨_i A(x, y_i ⊕ z) = 0. □

5.4 References

Probabilistic complexity classes were defined in [Gil77]. Adleman's proof that BPP ⊆ SIZE(n^{O(1)}) appears in [Adl78]. Sipser's proof that BPP ⊆ Σ_2 appears in [Sip83], and Lautemann's proof is in [Lau83].
Exercises

1. Prove that ZPP = RP ∩ coRP.

2. Argue that BPP ⊆ SPACE(O(n^{log n})).

3. Show that if NP ⊆ BPP then NP = RP.
Lecture 6

A Probabilistic Algorithm

6.1 Branching Programs

We begin by introducing another model of computation, that of branching programs. A branching program accepts or rejects input strings x_1, ..., x_n, and can be described as a directed acyclic graph with a single root and two final states, one labeled 0 and the other 1. Each non-final vertex is labeled with some single variable x_i and has two outgoing edges, 0 and 1. The computational path begins at the root. The path follows edge b such that x_i = b, and the program accepts the input if and only if the path ends at the final state 1. This is shown in Figure 6.1.

Figure 6.1: A branching program.

An illustrative example is the branching program that considers input string x_1 x_2 x_3 and outputs 1 if x_1 ⊕ x_2 ⊕ x_3 = 1. Branching programs are a reasonably powerful model of computation: for example, we can show that L can be computed by branching programs of polynomial size.
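The parity example just mentioned can be written out concretely. Below is a minimal Python sketch (not from the notes); the tuple encoding of vertices and the vertex names are illustrative choices.

```python
from itertools import product

def run_bp(bp, root, x):
    """Follow the computational path: at a vertex testing variable i, take
    the edge labeled x[i]; accept iff the path ends at final state "1"."""
    v = root
    while v not in ("0", "1"):
        var, next0, next1 = bp[v]
        v = next1 if x[var] else next0
    return v == "1"

# A read-once branching program for x1 XOR x2 XOR x3: each level tests one
# variable and tracks the parity seen so far.
parity = {
    "a": (0, "even1", "odd1"),
    "even1": (1, "even2", "odd2"), "odd1": (1, "odd2", "even2"),
    "even2": (2, "0", "1"), "odd2": (2, "1", "0"),
}
for x in product([0, 1], repeat=3):
    assert run_bp(parity, "a", x) == ((x[0] ^ x[1] ^ x[2]) == 1)
```

Note that the program uses 2 vertices per level, whereas n-variable parity needs 2^Ω(n) gates in some restricted circuit models, hinting at why branching programs are a reasonably powerful model.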
6.2 Testing Equivalence

We would like to be able to consider two branching programs and determine if they compute the same function. In general, testing equivalence for any model of computation is difficult: for Turing machines, the problem is undecidable, and for circuits, the problem is coNP-complete, through a reduction from co-3SAT. A similar result is true for branching programs: here, testing equivalence is also coNP-complete. Happily, the problem becomes easier once we restrict it to read-once branching programs, in which every path from the root to a final state tests each variable at most once. We will show that in this case testing equivalence is in coRP.

Our strategy will be as follows: Given two read-once branching programs B_1 and B_2, we will define two polynomials p_1 and p_2 of degree n over x_1, ..., x_n that are equivalent if and only if B_1 and B_2 are also equivalent. The polynomials will be represented in compact form, but will have an exponential number of monomials, and thus we cannot simply directly compare their coefficients. We will then use a coRP algorithm to test the equivalence of the two polynomials.

Our polynomials will be determined as follows:

• We assign each vertex v_i in the branching program, from the root to the final states, a polynomial p_i(x_1, ..., x_n), beginning with the root vertex, which gets the constant polynomial 1.
• Once a vertex has been assigned a polynomial p(x_1, ..., x_n), its outgoing edges are then given polynomials in the following manner: if the vertex performed a test on x_j, the outgoing 0 edge receives the polynomial (1 − x_j)p(x_1, ..., x_n) and the outgoing 1 edge receives (x_j)p(x_1, ..., x_n).
• Once all of a vertex's incoming edges have been appropriately labeled, that vertex's polynomial becomes the sum of the polynomials of the incoming edges.
• The polynomial associated with the final state 1 is the polynomial of the branching program.

These rules are illustrated in Figure 6.2.

Figure 6.2: Constructing the polynomial of a branching program.

The idea is that the polynomial at each vertex is nonzero on a set of inputs only if that set of inputs will lead the branching program to that vertex. We can see that when the branching programs are read-once, the degree of a variable in any monomial in the resulting polynomial will be no larger than 1, and hence the polynomial will be of degree at most n.
We now observe that given read-once branching programs B_1 and B_2, their corresponding polynomials p_1 and p_2 are equivalent if and only if B_1 and B_2 are equivalent. To see this, if any monomial m in p_1 or p_2 does not contain a variable x_i, we replace m with m x_i + m(1 − x_i). We can then write p_1 and p_2 as sums of terms of the form Π_{i=1}^n y_i, where y_i is either x_i or (1 − x_i). Each term corresponds to a path in the branching program that ends in final state 1. From this, some extra thought reveals that equivalence of the polynomials is equivalent to equivalence of the branching programs. Notice also that if the branching programs are polynomial size, we can evaluate the polynomials efficiently, even though they may contain an exponential number of coefficients.

All we need now is a method of testing equivalence of polynomials efficiently.

6.3 The Schwartz-Zippel Lemma

The tool we will use for this is the Schwartz-Zippel lemma, which states:

Lemma 32 If p(x_1, ..., x_n) is an n-variate nonzero polynomial of degree d over a finite field F, then p has at most dF^{n−1} roots. Equivalently,

Pr[p(a_1, ..., a_n) = 0] ≤ d/F

for a_1, ..., a_n chosen at random from F^n. (Notice that we use F to denote both the field and its size.)

Proof: The proof proceeds by induction on n. The base case of n = 1 is simply that of a univariate polynomial and follows immediately. Now consider

p(x_1, ..., x_n) = p_0(x_2, ..., x_n) + x_1 p_1(x_2, ..., x_n) + x_1^2 p_2(x_2, ..., x_n) + ··· + x_1^k p_k(x_2, ..., x_n),

where k is the largest value for which p_k is not identically zero. By induction, since p_k has degree at most d − k, p_k is nonzero on at least F^{n−1} − (d − k)F^{n−2} points (a_2, ..., a_n). Further, for each such point a_2, ..., a_n, p(x_1, a_2, ..., a_n) is a nonzero univariate polynomial of degree k, and hence there are at least F − k values of a_1 such that p(a_1, ..., a_n) ≠ 0. Putting this together, we can bound the number of nonzero points from below:

|{a_1, ..., a_n : p(a_1, ..., a_n) ≠ 0}| ≥ F^{n−1} (1 − (d − k)/F) · F (1 − k/F) ≥ F^n (1 − d/F) = F^n − dF^{n−1}.

This finishes the proof. □

An algorithm for testing equivalence of polynomials, and hence of read-once branching programs, is now immediate. We choose a field of size at least 3d, and then choose random a_1, ..., a_n from F^n. We accept if p_1(a_1, ..., a_n) = p_2(a_1, ..., a_n), and reject otherwise. If the two polynomials are equivalent, then we always accept; otherwise we reject with probability at least 2/3, making this a coRP algorithm.

6.4 References

The Schwartz-Zippel theorem is from [Sch80, Zip79]. The algorithm for testing equivalence of read-once branching programs is due to Blum, Chandra and Wegman [BCW80].
Exercises

1. Prove that testing equivalence of general branching programs is coNP-complete.

2. Show that every Boolean function f : {0,1}^n → {0,1} can be computed by a read-once branching program (possibly of exponential size).
Lecture 7

Unique-SAT

7.1 The Valiant-Vazirani Reduction

In this section we show the following: suppose there is an algorithm for the satisfiability problem that always finds a satisfying assignment for formulae that have exactly one satisfying assignment (and behaves arbitrarily on other instances); then we can get an RP algorithm for the general satisfiability problem, and so NP = RP.

We prove the result by presenting a randomized reduction that, given in input a CNF formula φ, produces in output a polynomial number of formulae ψ_0, ..., ψ_n. If φ is satisfiable, then (with high probability) at least one of the ψ_i is satisfiable and has exactly one satisfying assignment. If φ is not satisfiable, then (with probability one) all ψ_i are unsatisfiable.

The idea for the reduction is the following. Suppose φ is a satisfiable formula with n variables that has about 2^k satisfying assignments, and let h : {0,1}^n → {0,1}^k be a hash function picked from a family of pairwise independent hash functions: then the average number of assignments x such that φ(x) is true and h(x) = (0, ..., 0) is about one, and we can prove formally that with constant probability there is exactly one such assignment; moreover, there is a CNF formula ψ (easily constructed from φ and h) that is satisfied precisely by that assignment. For technical reasons, it will be easier to prove that this is the case when picking a hash function of the form h : {0,1}^n → {0,1}^{k+2}. By doing the above construction for values of k ranging from 0 to n, we obtain the desired reduction. Details follow.

Definition 10 Let H be a family of functions of the form h : {0,1}^n → {0,1}^m. We say that H is a family of pairwise independent hash functions if for every two different inputs x, y ∈ {0,1}^n and for every two possible outputs a, b ∈ {0,1}^m we have

Pr_{h∈H}[h(x) = a ∧ h(y) = b] = 1/2^{2m}.

Another way to look at the definition is that for every x ≠ y, when we pick h at random, the random variables h(x) and h(y) are independent and uniformly distributed. In particular, for every x ≠ y and for every a, b, we have Pr_h[h(x) = a | h(y) = b] = Pr_h[h(x) = a].
First, for m vectors a_1, ..., a_m ∈ {0,1}^n and m bits b_1, ..., b_m, define h_{a_1,...,a_m,b_1,...,b_m} : {0,1}^n → {0,1}^m as h_{a,b}(x) = (a_1 · x + b_1, ..., a_m · x + b_m), and let H_AFF be the family of functions defined this way. Then it is not hard to see that H_AFF is a family of pairwise independent hash functions.

Lemma 33 Let T ⊆ {0,1}^n be a set such that 2^k ≤ |T| < 2^{k+1}, and let H be a family of pairwise independent hash functions of the form h : {0,1}^n → {0,1}^{k+2}. Then, if we pick h at random from H, there is a constant probability that there is a unique element x ∈ T such that h(x) = 0. Precisely,

Pr_{h∈H}[|{x ∈ T : h(x) = 0}| = 1] ≥ 1/8.

Proof: Let us fix an element x ∈ T. We want to compute the probability that x is the unique element of T mapped into 0 by h. Clearly,

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] = Pr_h[h(x) = 0] · Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0],

and we know that

Pr_h[h(x) = 0] = 1/2^{k+2}.

The difficult part is to estimate the other probability. First, we write

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] = 1 − Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0],

and then observe that

Pr_h[∃y ∈ T − {x}. h(y) = 0 | h(x) = 0]
≤ Σ_{y∈T−{x}} Pr_h[h(y) = 0 | h(x) = 0]
= Σ_{y∈T−{x}} Pr_h[h(y) = 0]
= (|T| − 1)/2^{k+2}
≤ 1/2.

Notice how we used the fact that the value of h(y) is independent of the value of h(x) when x ≠ y. Putting everything together, we have

Pr_h[∀y ∈ T − {x}. h(y) ≠ 0 | h(x) = 0] ≥ 1/2,

and so

Pr_h[h(x) = 0 ∧ ∀y ∈ T − {x}. h(y) ≠ 0] ≥ 1/2^{k+3}.

To conclude the argument, we observe that the probability that there is a unique element of T mapped into 0 is given by the sum over x ∈ T of the probability that x is the unique element mapped into 0 (all these events are disjoint, so the probability of their union is the sum of the probabilities). The probability of a unique element mapped into 0 is then at least |T|/2^{k+3} ≥ 1/8. □

Lemma 34 There is a probabilistic polynomial time algorithm that on input a CNF formula φ and an integer k outputs a formula ψ such that

• If φ is unsatisfiable then ψ is unsatisfiable.
• If φ has at least 2^k and less than 2^{k+1} satisfying assignments, then there is a probability at least 1/8 that the formula ψ has exactly one satisfying assignment.

Proof: Say that φ is a formula over n variables. The algorithm picks at random vectors a_1, ..., a_{k+2} ∈ {0,1}^n and bits b_1, ..., b_{k+2}, and produces a formula ψ that is equivalent to the expression φ(x) ∧ (a_1 · x + b_1 = 0) ∧ ··· ∧ (a_{k+2} · x + b_{k+2} = 0). Indeed, there is no compact CNF expression to compute a · x if a has a lot of ones, but we can proceed as follows: for each i we add auxiliary variables y_1^i, ..., y_n^i and then write a CNF condition equivalent to (y_1^i = x_1 ∧ a_i[1]) ∧ ··· ∧ (y_n^i = y_{n−1}^i ⊕ (x_n ∧ a_i[n]) ⊕ b_i), so that y_n^i = a_i · x + b_i. Then ψ is the AND of the clauses in φ, all the above expressions for i = 1, ..., k + 2, and the conditions y_n^i = 0.

By construction, the number of satisfying assignments of ψ is equal to the number of satisfying assignments x of φ such that h_{a_1,...,a_{k+2},b_1,...,b_{k+2}}(x) = 0. If φ is unsatisfiable, ψ is also unsatisfiable. If φ has between 2^k and 2^{k+1} satisfying assignments, then Lemma 33 implies that with probability at least 1/8 there is exactly one satisfying assignment for ψ. □

Theorem 35 (Valiant-Vazirani) Suppose there is a polynomial time algorithm that on input a CNF formula having exactly one satisfying assignment finds that assignment. (We make no assumption on the behaviour of the algorithm on other inputs.) Then NP = RP.

Proof: It is enough to show that, under the assumption of the Theorem, 3SAT has an RP algorithm. On input a formula φ, we construct formulae ψ_0, ..., ψ_n by using the algorithm of Lemma 34 with parameters k = 0, ..., n. We submit all formulae ψ_0, ..., ψ_n to the algorithm in the assumption of the Theorem, and accept if the algorithm can find a satisfying assignment for at least one of the formulae. If φ is unsatisfiable, then all the formulae are always unsatisfiable, and so the algorithm has a probability zero of accepting. If φ is satisfiable, then for some k it has between 2^k and 2^{k+1} satisfying assignments, and there is a probability at least 1/8 that ψ_k has exactly one satisfying assignment and that the algorithm accepts. If we repeat the above procedure t times, then if φ is unsatisfiable we still have probability zero of accepting; otherwise we have probability at least 1 − (7/8)^t of accepting, which is more than 1/2 already for t = 6. □

7.2 References

The Valiant-Vazirani result is from [VV86].
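The 1/8 isolation bound of Lemma 33 can be observed experimentally. In the sketch below (not from the notes), a small set T of bit strings stands in for the satisfying assignments of some formula, and random members of H_AFF are sampled; all names and parameter choices are illustrative.

```python
import random

def unique_zero_preimage_freq(T, n, k, trials=1500, seed=7):
    """Empirical frequency, over random h in H_AFF mapping to k+2 bits, of
    the event |{x in T : h(x) = 0}| = 1 from Lemma 33."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        avecs = [[rng.randrange(2) for _ in range(n)] for _ in range(k + 2)]
        bbits = [rng.randrange(2) for _ in range(k + 2)]
        zeros = sum(
            all((sum(a[j] * x[j] for j in range(n)) + b) % 2 == 0
                for a, b in zip(avecs, bbits))
            for x in T)
        hits += (zeros == 1)
    return hits / trials

# A set T of exactly 2^k = 8 points in {0,1}^6, standing in for the set of
# satisfying assignments:
rng = random.Random(0)
n, k = 6, 3
T = set()
while len(T) < 2 ** k:
    T.add(tuple(rng.randrange(2) for _ in range(n)))
freq = unique_zero_preimage_freq(sorted(T), n, k)
assert freq >= 1 / 8   # the bound guaranteed by Lemma 33
```

In practice the observed frequency is noticeably above 1/8 (around one in five), since the proof of the lemma gives away constant factors.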
Exercises

1. Prove that H_AFF is indeed a family of pairwise independent hash functions.

2. Change the assumption of Theorem 35 to having a probabilistic polynomial time algorithm that, on input a formula with exactly one satisfying assignment, finds that assignment with probability at least 1/2. Prove that it still follows that NP = RP.
Lecture 8
Counting Problems
8.1 Counting Classes
Definition 11 R is an NP-relation if there is a polynomial time algorithm A such that (x, y) ∈ R ⇔ A(x, y) = 1, and there is a polynomial p such that (x, y) ∈ R ⇒ |y| ≤ p(|x|). #R is the problem that, given x, asks how many y satisfy (x, y) ∈ R.

Definition 12 #P is the class of all problems of the form #R, where R is an NP-relation.

Observe that an NP-relation R naturally defines an NP language L_R, where L_R = {x : ∃y. (x, y) ∈ R}, and every NP language can be defined in this way. Therefore problems in #P can always be seen as the problem of counting the number of witnesses for a given instance of an NP problem.

Unlike for decision problems, there is no canonical way to define reductions for counting classes. There are two common definitions.

Definition 13 We say there is a parsimonious reduction from #A to #B (written #A ≤_par #B) if there is a polynomial time transformation f such that for all x, |{y : (x, y) ∈ A}| = |{z : (f(x), z) ∈ B}|.

Often this definition is a little too restrictive, and we use the following definition instead.

Definition 14 #A ≤ #B if there is a polynomial time algorithm for #A given an oracle that solves #B.

#CIRCUITSAT is the problem where, given a circuit, we want to count the number of inputs that make the circuit output 1.

Theorem 36 #CIRCUITSAT is #P-complete under parsimonious reductions.

Proof: Let #R be in #P, and let A and p be as in the definition. Given x, we want to construct a circuit Ĉ such that |{z : Ĉ(z) = 1}| = |{y : |y| ≤ p(|x|), A(x, y) = 1}|. We first construct a circuit C that on input x, y simulates A(x, y). From earlier arguments we know that this can be done with a circuit of size about the square of the running time of A. Thus C will have size polynomial in the running time of A, and so polynomial in |x|. Then let Ĉ(y) = C(x, y), with x hardwired. □
Theorem 37 #3SAT is #P-complete.

Proof: We show that there is a parsimonious reduction from #CIRCUITSAT to #3SAT. That is, given a circuit C, we construct a Boolean formula φ such that the number of satisfying assignments for φ is equal to the number of inputs for which C outputs 1. Suppose C has inputs x_1, . . . , x_n and gates 1, . . . , m; then φ has variables x_1, . . . , x_n, g_1, . . . , g_m, where the g_i represent the outputs of the gates. Each gate has two input variables and one output variable, so a gate can be completely described by mimicking its output for each of the 4 possible inputs, and each gate can thus be simulated using at most 4 clauses. In this way we have reduced C to a formula φ with n + m variables and at most 4m clauses, and the reduction is parsimonious. □

Notice that if a counting problem #R is #P-complete under parsimonious reductions, then the associated language L_R is NP-complete, because #3SAT ≤_par #R implies that 3SAT reduces to L_R. On the other hand, with the less restrictive definition of reducibility, even some counting problems whose decision version is in P are #P-complete. For example, the problem of counting the number of satisfying assignments of a given 2CNF formula and the problem of counting the number of perfect matchings in a given bipartite graph are both #P-complete.
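The gate-by-gate translation in the proof can be sketched directly: each gate variable is tied to its inputs by a handful of clauses, so every accepted circuit input extends to exactly one satisfying assignment of the formula. A minimal sketch (the clause patterns below are the standard ones for AND/OR/NOT, not copied from the notes):

```python
from itertools import product

def gate_clauses(g, op, u, v=None):
    """CNF clauses forcing variable g to equal the gate's output
    (literals are signed integers, DIMACS-style)."""
    if op == "AND":   # g <-> (u and v)
        return [[-g, u], [-g, v], [g, -u, -v]]
    if op == "OR":    # g <-> (u or v)
        return [[g, -u], [g, -v], [-g, u, v]]
    if op == "NOT":   # g <-> not u
        return [[g, u], [-g, -u]]
    raise ValueError(op)

def count_sat(clauses, nvars):
    """Brute-force #SAT, for checking the reduction on tiny instances."""
    count = 0
    for bits in product([False, True], repeat=nvars):
        def val(lit):
            return bits[abs(lit) - 1] ^ (lit < 0)
        if all(any(val(l) for l in c) for c in clauses):
            count += 1
    return count

# Circuit: out = (x1 AND x2) OR (NOT x3); variables 1..3 inputs, 4..6 gates.
clauses = (gate_clauses(4, "AND", 1, 2)
           + gate_clauses(5, "NOT", 3)
           + gate_clauses(6, "OR", 4, 5)
           + [[6]])           # assert the output gate is true

# Parsimony: #CNF equals the number of accepted circuit inputs.
accepted = sum((a and b) or (not c)
               for a, b, c in product([False, True], repeat=3))
```

Since the gate variables are forced by the inputs, the count over n + m variables equals the count over the n circuit inputs, which is exactly the parsimony property used in the proof.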
8.2
Complexity of counting problems
We will prove the following theorem:

Theorem 38 For every counting problem #A in #P, there is an algorithm C that, on input x and ε, computes with high probability a value v such that

(1 − ε) · #A(x) ≤ v ≤ (1 + ε) · #A(x)

in time polynomial in |x| and in 1/ε, using an oracle for NP.

The theorem says that #P can be approximated in BPP^NP. Note that approximating #3SAT is NP-hard, so to compute the approximation we need at least the power of NP; the theorem states that the power of NP together with randomization is sufficient.

Another remark concerns the following result.

Theorem 39 (Toda) For every k, Σ_k ⊆ P^#P.

This implies that #3SAT is Σ_k-hard for every k, i.e., #3SAT lies outside PH, unless the hierarchy collapses. Recall that BPP lies inside Σ_2, and hence approximating #3SAT can be done in Σ_3. Therefore, approximating #3SAT cannot be equivalent to computing #3SAT exactly, unless the polynomial hierarchy collapses.

We first make some observations so that we can reduce the proof to an easier one.

• It is enough to prove the theorem for #3SAT. If we have an approximation algorithm for #3SAT, we can extend it to any #A in #P using the parsimonious reduction from #A to #3SAT.
• It is enough to give a polynomial time O(1)-approximation for #3SAT. Suppose we have an algorithm C and a constant c such that

(1/c) · #3SAT(ϕ) ≤ C(ϕ) ≤ c · #3SAT(ϕ).

Given ϕ, we can construct ϕ^k = ϕ_1 ∧ ϕ_2 ∧ · · · ∧ ϕ_k (where ϕ_1, . . . , ϕ_k are copies of ϕ over disjoint sets of variables). If ϕ has t satisfying assignments, ϕ^k has t^k satisfying assignments. Then, giving ϕ^k to the algorithm we get

(1/c) · t^k ≤ C(ϕ^k) ≤ c · t^k

(1/c)^{1/k} · t ≤ C(ϕ^k)^{1/k} ≤ c^{1/k} · t.

If c is a constant and k = O(1/ε), then c^{1/k} = 1 + ε.

• For a formula ϕ that has O(1) satisfying assignments, #3SAT(ϕ) can be found in P^NP. This can be done by iteratively asking the oracle questions of the form: "Are there at least k assignments satisfying this formula?" Notice that these are NP questions, because the algorithm can guess the k assignments and check them.
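The k-th-root amplification in the first bullet is easy to check numerically; a minimal sketch of the bound k = O(1/ε):

```python
from math import ceil, log

def copies_needed(c, eps):
    """Number of disjoint copies k so that taking the k-th root of a
    c-approximation of t^k yields a (1 + eps)-approximation of t,
    i.e., the least k with c^(1/k) <= 1 + eps."""
    return ceil(log(c) / log(1 + eps))
```

For example, a 2-approximation becomes a 1.1-approximation using k = 8 copies, and no fewer.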
8.3 An approximate comparison procedure

Consider the following approximate comparison procedure acomp, defined as:

acomp(ϕ, k) = YES if #3SAT(ϕ) ≥ 2^{k+1}
acomp(ϕ, k) = NO if #3SAT(ϕ) < 2^k
acomp(ϕ, k) = whatever otherwise

Given acomp, we can construct an algorithm that 2-approximates #3SAT as follows:

Input: ϕ
compute acomp(ϕ, 1), acomp(ϕ, 2), . . . , acomp(ϕ, n)
if acomp outputs NO the first time then
    // The value is either 0 or 1.
    // The answer can be checked by one more query to the NP oracle.
    Query the oracle and output the exact value.
else
    Suppose that acomp outputs YES for t = 1, . . . , i − 1 and NO for t = i
    Output 2^i

We need to show that this algorithm approximates #3SAT within a factor of 2. If acomp answers NO the first time, the algorithm outputs the right answer, because it checks for the answer explicitly. Now suppose acomp says YES for all t = 1, . . . , i − 1 and says NO for t = i. Since acomp(ϕ, i − 1) outputs YES, #3SAT(ϕ) ≥ 2^{i−1}, and since acomp(ϕ, i) outputs NO, #3SAT(ϕ) < 2^{i+1}. The algorithm outputs a = 2^i, hence

(1/2) · a ≤ #3SAT(ϕ) < 2 · a,

and the algorithm outputs the correct answer within a factor of 2. Thus, to establish the theorem, it is enough to give a BPP^NP implementation of acomp.

8.4 Constructing acomp

The procedure and its analysis are similar to the Valiant-Vazirani reduction: for a given formula ϕ we pick a hash function h from a pairwise independent family, and look at the number of assignments x that satisfy ϕ and are such that h(x) = 0. In the Valiant-Vazirani reduction, we proved that if S is a set of size approximately equal to the size of the range of h(), then, with constant probability, exactly one element of S is mapped by h() into 0. Now we use a different result, a simplified version of the Leftover Hash Lemma proved by Impagliazzo, Levin, and Luby in 1989, which says that if S is sufficiently larger than the range of h(), then the number of elements of S mapped into 0 is concentrated around its expectation.

Lemma 40 (The Leftover Hash Lemma) Let H be a family of pairwise independent hash functions h : {0,1}^n → {0,1}^m. Let S ⊆ {0,1}^n with |S| ≥ 4 · 2^m / ε². Then

Pr_{h∈H}[ | |{a ∈ S : h(a) = 0}| − |S|/2^m | ≥ ε · |S|/2^m ] ≤ 1/4.

From this, acomp can be constructed as follows.

input: ϕ, k
if k ≤ 5 then check exactly whether #3SAT(ϕ) ≥ 2^k
if k ≥ 6 then
    pick h from a set of pairwise independent hash functions h : {0,1}^n → {0,1}^m, where m = k − 5
    answer YES iff there are at least 48 assignments a to ϕ such that a satisfies ϕ and h(a) = 0
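The outer loop of Section 8.3 can be sketched with an exact comparator standing in for the BPP^NP implementation of acomp (a toy illustration, not the notes' construction):

```python
def two_approx(count, n):
    """2-approximation of a count in [0, 2^n], driven by an
    acomp-style comparator; here acomp is simulated exactly."""
    def acomp(k):                  # YES iff the count is at least 2^k
        return count >= 2 ** k
    if not acomp(1):               # NO the first time: value is 0 or 1,
        return count               # resolved by one more exact query
    i = 1
    while i < n and acomp(i + 1):  # YES for t = 1..i, NO for t = i+1
        i += 1
    return 2 ** i                  # a <= count < 2a
```

With the exact stub the output a always satisfies a ≤ count < 2a (or is exact for counts 0 and 1), matching the factor-2 guarantee argued above.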
Notice that the test in the last step can be done with one query to an NP oracle. We will show that the algorithm gives the correct answer with probability at least 3/4, which can then be amplified to, say, 1 − 1/(4n) (so that all n invocations of acomp are likely to be correct); this gives a BPP^NP implementation of acomp. Let S ⊆ {0,1}^n be the set of satisfying assignments for ϕ, and pick a random h ∈ H. There are 2 cases.

• If |S| ≥ 2^{k+1}: then |S| ≥ 2^{k+1} = 2^{m+6} = 64 · 2^m, so the Leftover Hash Lemma applies with ε = 1/4, and we have

Pr_{h∈H}[ |{a ∈ S : h(a) = 0}| ≥ (3/4) · |S|/2^m ] ≥ 3/4.

Since (3/4) · |S|/2^m ≥ (3/4) · 64 = 48, it follows that

Pr_{h∈H}[ |{a ∈ S : h(a) = 0}| ≥ 48 ] ≥ 3/4,

which is the success probability of the algorithm in this case.

• If |S| < 2^k: let S′ be a superset of S of size 2^k = 32 · 2^m. We have

Pr_{h∈H}[answer YES] = Pr_{h∈H}[ |{a ∈ S : h(a) = 0}| ≥ 48 ]
  ≤ Pr_{h∈H}[ |{a ∈ S′ : h(a) = 0}| ≥ 48 ]
  = Pr_{h∈H}[ |{a ∈ S′ : h(a) = 0}| ≥ (3/2) · |S′|/2^m ]
  ≤ Pr_{h∈H}[ | |{a ∈ S′ : h(a) = 0}| − |S′|/2^m | ≥ (1/2) · |S′|/2^m ]
  ≤ 1/4

by the Leftover Hash Lemma with ε = 1/2 (which applies because |S′| = 32 · 2^m ≥ 4 · 2^m/ε²). So the algorithm answers NO with probability at least 3/4.

8.5 The proof of the Leftover Hash Lemma

We finish the lecture by proving the Leftover Hash Lemma.

Proof: We will use Chebyshev's Inequality to bound the failure probability. Let S = {a_1, . . . , a_k}. We define random variables X_1, . . . , X_k as

X_i = 1 if h(a_i) = 0, and X_i = 0 otherwise.
Clearly, |{a ∈ S : h(a) = 0}| = Σ_i X_i. We now calculate the expectations. For each i, Pr[X_i = 1] = 1/2^m, so E[X_i] = 1/2^m and

E[ Σ_i X_i ] = |S|/2^m.

We also calculate the variance:

Var[X_i] = E[X_i²] − E[X_i]² ≤ E[X_i²] = E[X_i] = 1/2^m.

Because X_1, . . . , X_k are pairwise independent,

Var[ Σ_i X_i ] = Σ_i Var[X_i] ≤ |S|/2^m.

Using Chebyshev's Inequality, we get

Pr[ | |{a ∈ S : h(a) = 0}| − |S|/2^m | ≥ ε · |S|/2^m ]
  = Pr[ | Σ_i X_i − E[Σ_i X_i] | ≥ ε · E[Σ_i X_i] ]
  ≤ Var[Σ_i X_i] / (ε² · E[Σ_i X_i]²)
  ≤ (|S|/2^m) / (ε² · |S|²/2^{2m})
  = 2^m / (ε² · |S|)
  ≤ 1/4. □
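The concentration bound of Lemma 40 is easy to check empirically for the affine family h(x) = Ax + b over GF(2). A minimal simulation (not from the notes; the Chebyshev bound of 1/4 is quite loose in practice):

```python
import random

def lhl_failure_rate(n, m, eps, trials):
    """Empirical probability that the number of elements of S hashed to 0
    deviates from |S|/2^m by at least eps*|S|/2^m, for the affine family
    h(x) = Ax + b over GF(2) and |S| = 4 * 2^m / eps^2."""
    size = int(4 * 2 ** m / eps ** 2)
    S = random.sample(range(2 ** n), size)
    expect = size / 2 ** m
    fails = 0
    for _ in range(trials):
        A = [random.getrandbits(n) for _ in range(m)]
        b = random.getrandbits(m)
        def h(x):
            # bit i of h(x) is <A[i], x> mod 2; then XOR the offset b
            return sum(((bin(A[i] & x).count("1") & 1) << i)
                       for i in range(m)) ^ b
        hits = sum(1 for x in S if h(x) == 0)
        fails += abs(hits - expect) >= eps * expect
    return fails / trials
```

The lemma promises a failure probability of at most 1/4; the simulation should come in well under that.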
8.6 Approximate Sampling

So far we have considered the following question: for an NP-relation R, given an input x, what is the size of the set R_x = {y : (x, y) ∈ R}? A related question is to be able to sample from the uniform distribution over R_x. Whenever the relation R is "downward self reducible" (a technical condition that we won't define formally), it is possible to prove that there is a probabilistic algorithm running in time polynomial in |x| and 1/ε that approximates |R_x| within a factor 1 + ε if and only if there is a probabilistic algorithm running in time polynomial in |x| and 1/ε that samples a distribution close to the uniform distribution over R_x.

We show how the above result applies to 3SAT (the general result uses the same proof idea). For a formula ϕ, a variable x and a bit b, let us define by ϕ_{x←b} the formula obtained by substituting the value b in place of x. E.g., ϕ_{x←1} is obtained by removing each occurrence of ¬x from the clauses where it occurs, and removing all the clauses that contain an occurrence of x; the formula ϕ_{x←0} is similarly obtained. If ϕ is defined over the variables x_1, . . . , x_n, it is easy to see that

#ϕ = #ϕ_{x_1←0} + #ϕ_{x_1←1}.

Also, if S is the uniform distribution over satisfying assignments for ϕ, we note that

Pr_{(x_1,...,x_n)←S}[x_1 = b] = #ϕ_{x_1←b} / #ϕ.

Suppose then that we have an efficient sampling algorithm that, given ϕ and ε, generates a distribution ε-close to uniform over the satisfying assignments of ϕ. Let us then run the sampling algorithm with approximation parameter ε/2n and use it to sample about O(n²/ε²) assignments. By computing the fraction of such assignments having x_1 = 0 and x_1 = 1, we get approximate values p_0, p_1 such that |p_b − Pr_{(x_1,...,x_n)←S}[x_1 = b]| ≤ ε/n. Let b be such that p_b ≥ 1/2; then #ϕ_{x_1←b}/p_b is a good approximation, to within a multiplicative factor (1 + 2ε/n), of #ϕ, and we can recurse to compute #ϕ_{x_1←b} to within a (1 + 2ε/n)^{n−1} factor.

Conversely, suppose we have an approximate counting procedure. Then we can approximately compute p_b = #ϕ_{x_1←b}/#ϕ, generate a value b for x_1 with probability approximately p_b, and then recurse to generate a random assignment for ϕ_{x_1←b}.

The reduction and the results from the last section imply that 3SAT (and any other NP relation) has an approximate sampling algorithm that runs in probabilistic polynomial time with an NP oracle. This is essentially best possible, because approximate sampling requires randomness by its very definition, and generating satisfying assignments for a 3SAT formula requires at least the power of an NP oracle. With a careful use of the techniques from last week it is indeed possible to get an exact sampling algorithm for 3SAT (and any other NP relation) running in probabilistic polynomial time with an NP oracle.

The same equivalence holds, among other problems, for 2SAT and for the problem of counting the number of perfect matchings in a bipartite graph. It is known that it is NP-hard to perform approximate counting for 2SAT, and this result, with the above reduction, implies that approximate sampling is also hard for 2SAT. The problem of approximately sampling a perfect matching has a probabilistic polynomial time solution, and the reduction implies that approximately counting the number of perfect matchings in a graph can also be done in probabilistic polynomial time.

8.7 References

The class #P was defined by Valiant [Val79]. The problem of approximate sampling and its relation to approximate counting is studied in [JVV86]. An algorithm for approximate counting within the polynomial hierarchy was developed by Stockmeyer [Sto83]. The algorithm presented in these notes is taken from lecture notes by Oded Goldreich. The leftover hash lemma is from [HILL99].
Lecture 9

Average-Case Complexity of the Permanent

9.1 The Permanent Problem

The permanent is a number assigned to a matrix M ∈ R^{n×n}, for a ring R, according to the formula

perm(M) = Σ_{π∈S_n} Π_{i=1}^n M_{i,π(i)}

where S_n is the group of permutations on n items. The definition of the permanent closely resembles that of the determinant; the only difference is that the determinant has a coefficient (−1)^{sgn(π)} in front of the product. However, this resemblance is deceptive: the determinant can be calculated in polynomial time, whereas the permanent is #P-complete. Note that in Z_2, perm(M) = det(M), and thus the permanent is computable in polynomial time; but for sufficiently large moduli (or no modulus at all) it's #P-complete.

A 0/1 matrix M can be thought of as representing a bipartite graph G = (U, V, E) with |U| = |V| = n as follows: number both vertex sets U and V by 1, . . . , n, and let M_{i,j} be 1 if edge (i, j) ∈ E, and 0 otherwise.

Claim 41 Let R = Z and let M be a 0/1 matrix representing a bipartite graph G. Then perm(M) is the number of perfect matchings in G.

Proof: Each π ∈ S_n is a potential perfect matching, in that it can be thought of as a bijection between U and V. This corresponds to an actual perfect matching iff all edges (i, π(i)) are in E, which is equivalent to saying that M_{i,π(i)} = 1 for all i, i.e., Π_{i=1}^n M_{i,π(i)} = 1. Thus we get a summand of 1 in the formula for perm(M) for each perfect matching of G. □

Suppose we allow the entries of M to be nonnegative integers. Then M represents a bipartite graph where the edges have multiplicities given by the entries of M. It still makes sense to interpret perm(M) as the number of perfect matchings, because the product of the multiplicities of the edges of a matching is the number of ways of realizing that matching.
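Claim 41 is easy to check directly on small matrices; a brute-force sketch (exponential time, for illustration only):

```python
from itertools import permutations
from math import prod

def permanent(M):
    """perm(M) by the permutation expansion (fine for small n)."""
    n = len(M)
    return sum(prod(M[i][p[i]] for i in range(n))
               for p in permutations(range(n)))

def count_perfect_matchings(M):
    """Perfect matchings of the bipartite graph with 0/1 adjacency
    matrix M, counted by brute force over all bijections."""
    n = len(M)
    return sum(all(M[i][p[i]] == 1 for i in range(n))
               for p in permutations(range(n)))
```

For 0/1 matrices the two agree; with integer multiplicities the permanent counts weighted matchings, as remarked above.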
9.2 Worst-case to Average-case Equivalence

Theorem 42 Suppose that, for fixed n and F with |F| > n^{O(1)}, we can compute perm(M) on a fraction 1 − 1/(2n) of the inputs M in polynomial time. Then there is a (worst-case) polynomial time probabilistic algorithm for computing perm(M).

Rather than proving this theorem directly, we will just observe that perm(M) is a degree n polynomial in the indeterminates M_{11}, M_{12}, . . . , M_{1n}, . . . , M_{nn}, and prove the following much more general result.

Theorem 43 Suppose we have oracle access to a function f : F^n → F, and f agrees with some degree d polynomial p : F^n → F on a fraction 1 − 1/(3d+3) of its inputs, where |F| ≥ d + 2. Then we can compute p(x) with high probability for all x ∈ F^n in poly(n, d) time, assuming field operations can be performed in poly(n, d) time.

Proof: Suppose we input x ∈ F^n and pick y ∈ F^n uniformly at random. Consider the line l(t) = x + ty. Each point of l is uniformly distributed except x = l(0). So

Pr[p(l(i)) = f(l(i)) for i = 1, . . . , d + 1] = 1 − Pr[∃i ∈ 1, . . . , d + 1 : p(l(i)) ≠ f(l(i))] ≥ 1 − (d + 1)/(3(d + 1)) = 2/3.

Now q(t) = p(l(t)) is a degree d polynomial in t, and we know d + 1 of its values, so we can interpolate the others. Thus we have the following algorithm:

input: x
pick random y ∈ F^n
find the univariate degree-d polynomial q() such that q(i) = f(x + iy) for i = 1, . . . , d + 1
return q(0)

This runs in poly(n, d) time, and with probability ≥ 2/3 it outputs p(x), as promised. □

We can strengthen our result to hold even if p(x) only agrees with f(x) on a 3/4 + ε fraction of its inputs. This stronger proof depends on the following theorem, due to Berlekamp and Welch.

Theorem 44 Let q : F → F be a polynomial of degree d. Say we are given m pairs (a_1, b_1), . . . , (a_m, b_m) such that b_i ≠ q(a_i) for fewer than (m − d)/2 pairs (a_i, b_i). Then q can be reconstructed with poly(m, d) field operations.

Now we apply Berlekamp-Welch to strengthen our main theorem:

Theorem 45 Suppose we have oracle access to a function f : F^n → F, and f agrees with some degree d polynomial p : F^n → F on a fraction 3/4 + ε of its inputs, where |F| ≥ 1 + d/ε. Then we can compute p(x) with high probability for all x ∈ F^n in poly(n, d) time, assuming field operations can be performed in poly(n, d) time.

Proof: Consider the following algorithm:

input: x
pick random y ∈ F^n
find a degree-d univariate polynomial q() such that |{t ∈ F − {0} : q(t) ≠ f(x + ty)}| < (|F| − 1 − d)/2
halt and fail if no such polynomial exists
return q(0)

If q(t) = p(l(t)) and f(l(t)) disagree on fewer than (|F| − d − 1)/2 values of t, Berlekamp-Welch will find q(t) ≡ p(l(t)) and the algorithm succeeds (and in polynomial time). So we need to show that less than half of our runs fail, assuming only a 1/4 − ε fraction of our points disagree:

E[ #{values of t ∈ F − {0} with p(l(t)) ≠ f(l(t))} ] ≤ (|F| − 1)(1/4 − ε).

To clean up the notation let us use m = |F| − 1. The previous expression and Markov's inequality tell us that

Pr[ |{t ∈ F − {0} : p(l(t)) ≠ f(l(t))}| > (m − d)/2 ] ≤ m(1/4 − ε) / ((m − d)/2) ≤ 1/2 − ε

if m ≥ d/ε. Thus our algorithm succeeds more than half the time, so it can be repeated to yield any degree of accuracy we wish. □

9.3 References

The simple algorithm to reconstruct a polynomial from a faulty oracle is due to Beaver and Feigenbaum [BF90]. The use of polynomial reconstruction algorithms in the average-case complexity of the permanent is due to Lipton [Lip90]. The Berlekamp-Welch algorithm [WB86] is a polynomial time algorithm to reconstruct a univariate polynomial from faulty data, and the more sophisticated version that uses error-correction is due to [GLR+91].
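The line-interpolation algorithm of Theorem 43 can be sketched over a small prime field (GF(97) here is an arbitrary choice, and the corrupted oracle below is a made-up example); repeating the procedure and taking a majority vote boosts the per-run 2/3 success probability:

```python
import random

P = 97  # assumption: the field is GF(97)

def interpolate_at_zero(pts):
    """Lagrange interpolation at t = 0 from (t_i, v_i) pairs, mod P."""
    total = 0
    for i, (ti, vi) in enumerate(pts):
        num, den = 1, 1
        for j, (tj, _) in enumerate(pts):
            if i != j:
                num = num * (-tj) % P
                den = den * (ti - tj) % P
        total = (total + vi * num * pow(den, P - 2, P)) % P
    return total

def reconstruct(f, x, d, m):
    """One run of Theorem 43's algorithm: query the faulty oracle f on
    d+1 points of a random line through x and interpolate back to t = 0.
    May err with small probability; amplify by majority voting."""
    y = [random.randrange(P) for _ in range(m)]
    pts = []
    for i in range(1, d + 2):
        pt = [(xj + i * yj) % P for xj, yj in zip(x, y)]
        pts.append((i, f(pt)))
    return interpolate_at_zero(pts)
```

Each run is correct whenever all d + 1 sampled line points are uncorrupted, which happens with probability at least 2/3 when the error rate is below 1/(3d + 3).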
Lecture 10

Average-case Complexity of Problems in PSPACE and EXP

10.1 Average-case Complexity in PSPACE and EXP for 7/8 Fraction of Inputs

Lemma 46 (Multivariate Interpolation) Let F be a field, H ⊆ F a subset, and f : H^m → F an arbitrary function. Then there is an m-variate polynomial p of degree at most m|H| such that p(x) = f(x) for every x ∈ H^m. Furthermore, the value p(x) can be computed using time polynomial in |F|^m and space logarithmic in |F|^m given oracle access to f.

Let L be an EXP-complete problem (the same construction and reasoning can be repeated with a PSPACE-complete problem), and for every n let L_n : {0,1}^n → {0,1} be the characteristic function of L on inputs of length n. We will construct another EXP-complete problem L′ such that for every n there is an n′ = O(n) such that the truth-table of L′_{n′} is the encoding of the truth-table of L_n using a concatenation of Reed-Muller and Hadamard codes. Then we will be able to show that L′ is as hard in the worst case as it is hard on average.

Let us fix an input length n. Let k = 3⌈log n⌉, and let F be a field of size 2^k = O(n³). We will identify elements of F with vectors of {0,1}^k; in particular, the zero element of F is mapped into (0, . . . , 0) and the one element of F is mapped into (1, 0, . . . , 0). Let H ⊆ F be a subset of size 2^{k/3}, and let m = ⌈3n/k⌉. Then H^m contains at least 2^n elements, and there is some efficiently computable injective function h : {0,1}^n → H^m. From Lemma 46 it follows that there is a multivariate polynomial p of total degree at most O(n²/log n) such that p(h(x)) = L_n(x) for all x ∈ {0,1}^n.

Let f : F^m × {0,1}^k → {0,1} be the function such that f(x_1, . . . , x_m, a) = p(x_1, . . . , x_m) · a, where we abused notation a little by viewing p(x_1, . . . , x_m) as an element of {0,1}^k. With similar abuse, we can view f as a function f : {0,1}^{3n+k} → {0,1}, and we define the language L′ such that f is the characteristic function of L′ on inputs of length 3n + 3⌈log n⌉. Clearly, L′ is EXP-complete, because for every x ∈ {0,1}^n we have x ∈ L if and only if p(h(x)) · 1 = 1, where 1 = (1, 0, . . . , 0), which holds if and only if (h(x), 1) ∈ L′.
Theorem 47 Let p : F^m → F be a degree d polynomial, where |F| ≥ 1 + max{4/ε², d/(4ε)}. Let us identify elements of F with elements of {0,1}^k, and let f : {0,1}^{km+k} → {0,1} be such that f(x, a) = p(x) · a. Suppose we are given oracle access to g : {0,1}^{km+k} → {0,1} such that f and g agree on at least a fraction 7/8 + ε of the inputs. Then we can compute p(x) for every x in time polynomial in m, |F| and 2^k.

Proof: For every x, define c(x) to be the value that maximizes Pr_a[c(x) · a = g(x, a)], breaking ties arbitrarily if the maximum is not unique. Notice that c(x) can be computed in time polynomial in 2^k. For random (x, a), f(x, a) ≠ g(x, a) with probability at most 1/8 − ε, so it follows that

Pr_x[ Pr_a[f(x, a) ≠ g(x, a)] ≥ 1/4 ] ≤ 1/2 − 4ε.

So for at least a 1/2 + 4ε fraction of the x's we have g(x, a) = p(x) · a for more than 3/4 of the a's, which means that, for such x, c(x) = p(x). The function c() then has agreement 1/2 + 4ε with p() and, by previous results, we can compute p() everywhere using c(). □

Note that the above theorem is essentially giving a polylogarithmic time decoding procedure for the concatenation of the Reed-Muller code with the Hadamard code. The procedure first decodes the Hadamard code locally, and then simulates the procedure for decoding the Reed-Muller code using the results of the Hadamard decoding. The Reed-Muller decoding needs the number of errors to be less than 1/2, and the Hadamard decoding needs the number of errors to be less than 1/4, so the overall procedure needs the number of errors to be less than 1/8.

In order to state the application of the previous theorem to the average-case hardness of problems in PSPACE and EXP, let us define avgBPTIME_{p(n)}(t(n)) to be the class of decision problems L for which there is a probabilistic algorithm A running in time t(n) such that for every input length n

Pr_{x∈{0,1}^n}[ Pr_r[A(x, r) = L_n(x)] ≥ 3/4 ] ≥ p(n).

In other words, avgBPP_{p(n)}(t(n)) contains all problems that can be solved on at least a p(n) fraction of the inputs by BPP-type algorithms in time t(n). Notice that avgBPP_1(t(n)) = BPP(t(n)).

Corollary 48 Suppose that EXP ⊆ avgBPP_{7/8+1/n}(n^{O(1)}); then EXP = BPP. Suppose that PSPACE ⊆ avgBPP_{7/8+1/n}(n^{O(1)}); then PSPACE = BPP.

10.2 Coding-Theoretic Perspective

The construction of the previous section can be abstracted as follows: starting from a language L and an input length n, we encode the characteristic function L_n : {0,1}^n → {0,1} of L on inputs of length n as a function f_L : {0,1}^{O(n)} → {0,1}.
The encoding of L_n into f_L is a concatenation of a Reed-Muller code with a Hadamard code, and its minimum relative distance is almost 1/2, which means that unique reconstruction is possible only if there are less than a 1/4 fraction of errors. The construction can be carried over for every L_n and, in order for the theorem to hold, it must be the case that if we have two different functions L_n, L′_n : {0,1}^n → {0,1}, then the functions f_L and f_{L′} differ in at least a 2δ fraction of inputs, where 1 − δ is the agreement required by the reconstruction procedure: otherwise it would not be possible to reconstruct f_L given only a function g that has agreement 1 − δ with f_L. (Note that there is still some margin of improvement, because our reconstruction procedure works only with less than a 1/8 fraction of errors. Since we can do list-decoding of the Hadamard code with up to a 1/2 − ε fraction of errors, it should be intuitive that the combination of the two procedures will give a decoding for the concatenation of the two codes that works with a 1/2 − ε fraction of errors, although we will not need such a stronger result in our applications.)

Now suppose we are trying to prove that EXP contains languages that are hard to compute even on a 1/2 + ε fraction of inputs. By the above argument, we need an encoding that maps two different languages into two boolean functions that differ in at least a 1 − 2ε fraction of points. Unfortunately, this is impossible: there is no error-correcting code with relative minimum distance bigger than 1/2 (except for trivial codes containing a small constant number of codewords). In order to prove stronger hardness results, we need to stop looking for unique decoding procedures, and start looking for list decoding procedures instead.

In the list decoding problem for a code C : {0,1}^n → {0,1}^m, we are given a received string z ∈ {0,1}^m, and we believe that less than a δ fraction of errors occurred; we want to compute the list of all possible messages x ∈ {0,1}^n such that C(x) and z are at distance less than δ. If the list is small and it can be produced efficiently, it gives a lot of information about the original message. We will state without proof a list-decoding result about the Hadamard code and one about the Reed-Muller code.

Lemma 49 Given g : {0,1}^k → {0,1} and ε > 0, there are at most 1/(4ε²) strings a ∈ {0,1}^k such that Pr_x[g(x) = a · x] ≥ 1/2 + ε.

By the above argument, the list can be found by brute force in time polynomial in the length of the encoding. Goldreich and Levin show that the list can even be found in time polynomial in k and 1/ε (by a randomized algorithm, with high probability).

Theorem 50 (Sudan) Let F be a field, d a degree parameter, and 1 > α > √(2d/m) an accuracy parameter. Then there is a polynomial time algorithm that on input pairs of field elements (a_1, b_1), . . . , (a_m, b_m) finds a list of all univariate polynomials q() of degree ≤ d such that q(a_i) = b_i for at least αm pairs. Furthermore, the list contains at most 2/α polynomials.

In the next section we will see how to use Sudan's algorithm to achieve a form of list-decoding of the Reed-Muller code even given a 1 − ε fraction of errors.
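The brute-force list decoding promised after Lemma 49 can be sketched directly (2^{2k} time, illustration only):

```python
from itertools import product

def hadamard_list_decode(g, k, eps):
    """All a in {0,1}^k whose linear function a.x agrees with g on at
    least a 1/2 + eps fraction of inputs (brute force over all a and x)."""
    xs = list(product([0, 1], repeat=k))
    out = []
    for a in xs:
        agree = sum(g(x) == (sum(ai * xi for ai, xi in zip(a, x)) % 2)
                    for x in xs)
        if agree >= (0.5 + eps) * len(xs):
            out.append(a)
    return out
```

Lemma 49 guarantees the returned list has at most 1/(4ε²) entries; the Goldreich-Levin algorithm finds the same list in time polynomial in k and 1/ε.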
10.3 Average-case Complexity in PSPACE and EXP for 1/2 Fraction of Inputs

Let us recall a (weaker version of a) result from a past lecture.

Theorem 51 Let p : F^m → F be a polynomial of degree d, with |F| > 16d, and let f : F^m → F be a function that agrees with p on a 7/8 fraction of inputs. Then there is an oracle circuit C of size polynomial in m and |F| such that C^f(x) = p(x) for every x.

The result was stated in terms of probabilistic procedures, but it holds in terms of small circuits for a stronger reason: we have seen that probabilistic procedures can always be simulated by circuits.

Here is the multivariate list-decoding result. We state it as a nonuniform unique reconstruction procedure.

Theorem 52 Let p : F^m → F be a polynomial of degree d, with |F| ≥ 1 + 8d/ε², and let f : F^m → F be a function that agrees with p on at least an ε fraction of inputs. Then there is an oracle circuit C of size polynomial in m and |F| such that C^f(x) = p(x) for every x.

Proof: We will just show the existence of a polynomial size circuit that computes p() on a 7/8 fraction of inputs, and then invoke Theorem 51. The following procedure does that.

input: x, y, p(y)
consider the line l(t) = tx + (1 − t)y and find the set Q of all degree-d univariate polynomials q such that |{t ∈ F : q(t) = f(tx + (1 − t)y)}| > (ε/2)|F|
if there is a unique q* ∈ Q such that q*(0) = p(y), return q*(1); otherwise fail

By our assumption on F, we have ε/2 > 2d/|F|, so the above algorithm can be implemented by a circuit C of size polynomial in m and |F|. Now we want to argue that the algorithm is quite effective.

Claim 53 Pr_{x,y}[C^f(x, y, p(y)) = p(x)] ≥ 7/8.

Before proving the claim, let us see that, once the claim is proved, we are done. By an averaging argument, there is a y such that Pr_x[C^f(x, y, p(y)) = p(x)] ≥ 7/8. Then we just have to hardwire y and p(y) into C, and we have a circuit that computes p() on a 7/8 fraction of inputs, and we can invoke Theorem 51 and we are done.

Proof: [Of Claim] First of all, note that when we pick x and y at random, the points on the line l(t) = tx + (1 − t)y are pairwise independent. Therefore, except with probability 1/16, there are at least ε|F| − 4√(ε|F|) points on the line where f agrees with p, and this number is at least (ε/2)|F| by our assumptions on F. So, with probability at least 15/16, the polynomial q* such that q*(t) = p(l(t)) is part of the list Q constructed by the algorithm, and q*(0) = p(y) and q*(1) = p(x). It remains to see that q* is the unique polynomial in Q such that q(0) = p(y).

Notice that each other polynomial q ∈ Q agrees with q* in at most d points of the line, so, on a random point of the line, q* has a probability at least 1 − (|Q| − 1)d/|F| of having a value different from the values of all the other polynomials in Q, and this probability is at least 15/16. But y is not exactly a random point of the line: by definition, y = l(0). Can we still argue that y is as good as a random point of the line in distinguishing q*() from the other polynomials in Q? To see that this is the case, let us pick the points x, y in a different (but equivalent) way: let us first pick two random points x′, y′ ∈ F^m, then consider the line l′(t) = tx′ + (1 − t)y′, and finally pick two random points a, b ∈ F on the line and call x = l′(a) and y = l′(b). Surely, x and y picked in this way are uniformly distributed. What about the line l(t) = tx + (1 − t)y? It is the same line as l′(), just with a different parameterization: we have l(t) = l′(t(a − b) + b). So for each polynomial q ∈ Q such that q(t) has agreement ε/2 with p(l(t)) there is a corresponding polynomial q′(t) = q(t(a − b) + b) such that q′(t) has agreement ε/2 with p(l′(t)), and we can call Q′ the list of such polynomials. Let q*′(t) = q*(t(a − b) + b); then q* is the unique polynomial in Q such that q(0) = p(y) if and only if q*′ is the unique polynomial in Q′ such that q*′(b) = p(y). In the latter formulation, b is a random point of the line, so it is clear that we are looking at a random point, and, except with probability 1/16, q*′ is isolated at that point. Overall, the success probability of the algorithm is at least 1 − 1/16 − 1/16 ≥ 7/8, and the claim is proved. □ □

We can now move to the concatenated version. Again, let us identify elements of F with elements of {0,1}^k.

Theorem 54 Let p : F^m → F be a degree d polynomial, where |F| ≥ 8d/ε⁶, and let f : {0,1}^{km+k} → {0,1} be such that f(x, a) = p(x) · a. Suppose we are given oracle access to g : {0,1}^{km+k} → {0,1} such that f and g agree on at least a fraction 1/2 + ε of the inputs. Then there is an oracle circuit C of size polynomial in m, |F| and 2^k such that C^g computes p() everywhere.

Proof: It will be enough to show how to compute p on a fraction ε³ of the inputs, and then apply Theorem 52. For every x, let S(x) ⊆ {0,1}^k be the list of all strings c such that g(x, a) and c · a agree on at least a 1/2 + ε/2 fraction of the a ∈ {0,1}^k. Notice that S(x) is computable in time polynomial in 2^k, and that it contains at most 1/ε² elements. By an averaging argument,

Pr_x[ Pr_a[g(x, a) = p(x) · a] ≥ 1/2 + ε/2 ] ≥ ε,

and for each x such that Pr_a[g(x, a) = p(x) · a] ≥ 1/2 + ε/2 we have p(x) ∈ S(x) by definition. Let c_i(x), for i = 1, . . . , 1/ε², be the i-th element of S(x) (say, in lexicographic order, and if S(x) has fewer than i elements let c_i(x) be 0). Each c_i(x) is computable in time polynomial in 2^k and, again by averaging, there must be an i* such that for at least an ε³ fraction of the x's we have c_{i*}(x) = p(x). We then let C^g(x) = c_{i*}(x), and apply Theorem 52. □

At this point, we are ready to state our main result, except that we need some more notation. Let us define avgSIZE_{p(n)}(t(n)) to be the class of decision problems L such that for every input length n there is a circuit C of size t(n) such that

Pr_{x∈{0,1}^n}[C(x) = L_n(x)] ≥ p(n).

Corollary 55 Suppose that EXP ⊆ avgSIZE_{1/2+1/n}(n^{O(1)}); then EXP ⊆ SIZE(n^{O(1)}). Suppose that PSPACE ⊆ avgSIZE_{1/2+1/n}(n^{O(1)}); then PSPACE ⊆ SIZE(n^{O(1)}).

10.4 References

Polynomial encodings are used by Babai et al. [BFNW93] to prove average-case results for problems in EXP. Sudan's list-decoding algorithm for the Reed-Solomon code appears in [Sud97]. The presentation in these notes is influenced by the more general perspective taken in [STV01].
Show that n = 1. 1}m be an errorcorrecting code of minimum relative distance 3/4. The proof already shows that one can (withouth nonuniformity) deﬁne a list of Fm+1 decoding algorithms (one for each choice of y and p(y)) such that one of them is correct. – Show that (with a small loss in the parameters) y can be chosen at random. so that we only need to enumerate over the F possible values for p(y). 67 . 1}n → {0. • This may be somewhat complicated: try to get a version of Theorem 52 that is more explicitely in terms of a listdecoding algorithm. • Show that avgBPP1−1/n (n2 ) contains undecidable languages.Exercises • Let C : {0. – Can you modify the algorithm so that the list contains only poly(1/ ) elements instead of F?.
Lecture 11

The XOR Lemma

We now see an alternative way of proving strong average-case hardness results, assuming we have as a starting point a function that is already somewhat hard on average. Part of these notes were written by Sanjeev Arora.

11.1 The XOR Lemma

Definition 15 (δ-hardness) Let f be a boolean function on n-bit inputs, and let D be a distribution on n-bit strings. If δ < 1/2 and g ∈ [n, 2^n], then we say that f is δ-hard on distribution D for circuits of size g if for every boolean circuit C with at most g gates:

Pr_{x∼D}[f(x) = C(x)] < 1 − δ.

We want to transform f into a function f' that is very hard on average and that can be computed efficiently given f. For a parameter k, consider the function f^k : {0,1}^{nk} → {0,1} defined as

f^k(x_1, ..., x_k) = f(x_1) ⊕ f(x_2) ⊕ ··· ⊕ f(x_k).   (11.1)

Yao's "XOR Lemma" states that the fraction of inputs on which f^k can be computed by an efficient circuit tends to 1/2 for large k. To see why this makes sense, let us first compute how successful we can be in computing f^k if we just try to compute f() on each input x_i and then we xor the results. In each attempt we will be successful with probability at most 1 − δ (let us say exactly 1 − δ for simplicity). Then the probability of computing f^k correctly is the same as the probability that we make an even number of errors. This is the same as defining independent 0/1 random variables Z_1, ..., Z_k, where for each i Pr[Z_i = 1] = δ, and asking what is Pr[Z_1 ⊕ ··· ⊕ Z_k = 0]. We can think of each Z_i as being sampled in the following equivalent way: with probability 1 − 2δ we set Z_i = 0, and with probability 2δ we flip an unbiased coin and let Z_i be equal to the outcome of the coin. Then there is a probability (1 − 2δ)^k that when we sample Z_1, ..., Z_k we never flip a coin; in this case all the Z_i are zero and their xor is also zero. With the remaining 1 − (1 − 2δ)^k probability we flip at least one coin, and then the xor of the Z_i contains the value of that coin xored with other independent values, and so it is unbiased. Overall, we have

Pr[Z_1 ⊕ ··· ⊕ Z_k = 0] = (1 − 2δ)^k + (1/2)·(1 − (1 − 2δ)^k) = 1/2 + (1/2)·(1 − 2δ)^k
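The closing formula can be checked mechanically. Below is a small sketch (mine, not from the notes): a dynamic program tracks the probability that the number of errors made so far is even, and we compare it against the closed form 1/2 + (1/2)(1 − 2δ)^k.

```python
def xor_zero_prob(delta, k):
    # Pr[Z_1 ^ ... ^ Z_k = 0], computed step by step: after each new
    # independent error (probability delta), the parity stays even if it
    # was even and no error occurred, or was odd and an error occurred.
    p_even = 1.0
    for _ in range(k):
        p_even = p_even * (1 - delta) + (1 - p_even) * delta
    return p_even

def closed_form(delta, k):
    return 0.5 + 0.5 * (1 - 2 * delta) ** k
```

The recurrence shows why the closed form holds: p_even − 1/2 is multiplied by exactly (1 − 2δ) at each step, starting from 1/2.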
at−1 . . By averaging. b1 . where St consist of those (x1 . a2 . x. we hardwire these choices into circuit C. Assume some circuit C of size at most S satisﬁes Pr[C(x1 . . bk−t−1 randomly from the complement of H.that actually tends to 1/2 for large k. . . . . Using nonuniformity again. . . b1 . . a2 . . at−1 randomly from H and b1 . bk−t−1 to get an nkbit string z. there is a way to ﬁx a1 . xk ) = f (x1 ) ⊕ f (x2 ) ⊕ · · · ⊕ f (xk ) 1 is ( 2 − − (1 − δ)k )hard for circuits of size S = S 2 δ 2 /4. . Proof: The proof is by contradiction.2) Lemma 56 (Impagliazzo) Suppose boolean function f is δhard for size S on nbit strings. . Deﬁnition 16 A set H ⊆ {0. Partition the set of nkbit strings into S0 . What happens when we consider arbitrary circuits for f k ? The following result by Impagliazzo allows us to get a proof of the XOR Lemma that follows quite closely the outline of the above probabilistic reasoning. Sk . Pick a1 . . . . . into which x is being fed. S1 . . there is an hard core set H of δ2n inputs with respect to circuits of size S . . it is easy to prove the XOR Lemma. . With probability at least (1 + )/2 we have C(z) = f ⊕k (z). At most (1 − δ)k fraction of inputs lie in S0 . xk )] ≥ 1/2 + + (1 − δ)k . . assume we know such a t. . x2 . x2 . . this is the input to C . Randomly permute the k inputs a1 . . We construct a circuit C of size S that predicts f on at least 1/2 + fraction of inputs of H and thus reach a contradiction. . so C must predict f ⊕k with probability at least 1/2 + on inputs from the other classes. note that z is a random input from St . bk−t−1 and their random permutation such that the circuit still outputs f ⊕k (z) with probability 1/2 + . . . Using nonuniformity. . x2 . . Impagliazzo’s result states that there is a single set H of δ2n inputs such that every circuit C fails on about half the elements of H. Noting that f ⊕k (z) = f (x) ⊕ f (a1 ) ⊕ · · · ⊕ f (at−1 ) ⊕ f (b1 ) ⊕ · · · ⊕ f (bk−t−1 ). . . . Let x be a random input from H. 
This is an extremely strong result: from the δhardness of f we only know that for every circuit C there is a subset HC of δ2n inputs such that C fails on HC . . S2 . . Theorem 57 (Yao’s XOR Lemma) Let f be a boolean function on nbit inputs that is δhard for circuits of size S. a2 . . at−1 . By Impagliazzo’s Lemma. (Notice the diﬀerent order of quantiﬁers. 69 . . . . Then the function f ⊕k on nkbit inputs deﬁned as f ⊕k (x1 . . . Thus the circuit is left with n input wires. . . 1}n is a hard core set for a boolean function f with respect to size S if for any boolean circuit C with at most S gates: x∈S Pr [f (x) = C(x)] < 1 + 2 (11.) From Impagliazzo’s result. . Averaging shows there is a t ≥ 1 such that C predicts f ⊕k with probability at least 1/2 + on inputs randomly chosen from St . . xk ) = f ⊕k (x1 . xk ) in which exactly t of the xi ’s lie in H. . Then for every > 0 there is a set S of size ≥ δ2n such that f is hard core on S for circuits of size S 2 δ 2 /4. Apply C on z.
This gives a circuit C with S gates (remember we do not count NOT gates in circuit complexity) which on input x produces C(z) ⊕ b as output. GNW95. which is f (x). The proof given here is Impagliazzo’s [Imp95a]. for at least 1/2 + fraction of x ∈ H. Proofs of the Lemma appeared in [Lev87. a contradiction. IW97].2 References Yao’s XOR Lemma was stated in the conference presentation of [Yao82]. the circuit outputs f ⊕k (z) ⊕ b. We add a NOT gate at the top of our circuit iﬀ b = 1. Furthermore. 70 . Imp95a. 2 11.we again use nonuniformity and assume we know b = f (a1 ) ⊕ · · · ⊕ f (at−1 ) ⊕ f (b1 ) ⊕ · · · ⊕ f (bk−t−1 ). Thus C computes f on 1/2 + of inputs from H.
1}kn → {0. 1} be δhard for circuits of size S. We say that a function f : {0. 1} n → {0. 1. f ail} for each x. f (xk )) cannot be computed on more than a + (1 − δ)k fraction of inputs by circuits of size 2 δ 2 S/4. then there is a subset H ⊆ {0. . and C(x) = f (x) whenever C(x) = f ail. Prove that if f : {0. . where S = Ω( S/(log(1/δ))). Then argue that the function f {0. · · · .Exercises 1. 1}n → {0. 1} is δhard for errorles circuits of size S if for every errorless circuit C of size ≤ S for f we have Prx [C(x) = f ail] ≥ δ. 1}k deﬁned as f (x1 . 1} is δhard for errorless circuits of size S. 1} if C(x) ∈ {0. Prove a version of Impagliazzo’s Lemma for averagecase algorithms that are errorless but may fail. H ≥ δ2n such that for every errorless circuit C for f of size ≤ S we have Prx∈H [C(x) = f ail] ≥ 1 − . 1}n → {0. 1}n → {0. 71 . . 1}n . 2. . Let f : {0. xk ) = (f (x1 ). We say that a circuit C is errorless for a function f : {0. Prove a “concatenation lemma” instead of a XOR Lemma.
Lecture 12

Levin's Theory of Average-case Complexity

Today we introduce Levin's theory of average-case complexity. This theory is still in its infancy: in this chapter we will introduce the notion of "distributional problems," discuss various formalizations of the notion of "algorithms that are efficient on average," introduce a reducibility that preserves efficient average-case solvability, and finally prove that there is a problem that is complete for the distributional version of NP under such reductions. It is still an open question how to apply this theory to the study of natural distributional problems that are believed to be hard on average.

12.1 Distributional Problems

Definition 17 (Distributional Problem) A distributional problem is a pair ⟨L, µ⟩, where L is a decision problem and µ is a distribution over the set {0,1}* of possible inputs.¹

¹In other settings where average-case hardness is considered (e.g. in the study of one-way functions) we normally describe the distribution of inputs as a collection of distributions µ_1, ..., µ_n, ..., where µ_n is a distribution over the set {0,1}^n of inputs of a given input length. One can always reduce the approach of distinct distributions to the approach of this chapter by assuming that µ first picks at random a certain input length n, and then it samples from µ_n. There are various reasons why this single-distribution approach is convenient for the purposes of this chapter. We will discuss it again later, but for now the basic intuition is that we will discuss reductions where the length of the output is not a function of the length of the input, so that sampling inputs from a fixed-length distribution and passing them to a reduction does not produce a fixed-length distribution (unlike the case of cryptographic reductions).

We will restrict to the study of distributional problems where µ is "polynomial-time computable." What do we mean by that? For all x ∈ {0,1}*, let

µ'(x) = Σ_{y≤x} Pr_µ[y],   (12.1)
ﬁrst choose an input size at random under some polynomiallydecreasing distribution. then it seems reasonable also to place restrictions on whoever generated the instances.2) x − 1 being the lexicographic predecessor of x.2 DistNP We deﬁne the complexity class DistNP = { L.where ‘≤’ denotes lexicographic ordering. It is easy to see that the uniform distribution is polynomialtime computable. 12.3) that is. µ : L ∈ N P. then choose an input of that size uniformly at random.’ 12. Therefore. under reasonable assumptions. because Pr [x] = µ (x) = µ (x) − µ (x − 1) . it’s not always obvious what to take as the ‘uniform distribution. that is. somewhere. Indeed one can show that. Someone. unless we place some computational restriction on µ. 2. (12. x ∈ L1 iﬀ f (x) ∈ L2 . (12. there exist distributions that are eﬃciently computable in the second sense but not polynomialtime computable in our sense. It should be clear that we need a whole class of distributions to do reductions. 73 . µ1 reduces to a distributional problem L2 . Then µ must be computable in poly (x) time. µ polynomialtime computable} . We can deﬁne the “uniform distribution” to be Pr [x] = 1 2−x . This is because a problem can have more than one natural distribution. we can’t just parameterize a complexity class by a single distribution. µ2 ) if there exists a polynomialtime computable function f such that: 1. If we place computational restrictions on ourselves. µ2 (in symbols. had to generate the instances we’re trying to solve. x (x + 1) (12. the averagecase theory is identical to the worstcase one. Clearly this notion is at least as strong as the requirement that Pr [x] be computable in polynomial time. One can show that there exists a distribution µ such that every problem is as hard on average under µ as it is in the worst case. L1 . 
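As a concrete sketch of polynomial-time computability (the code is mine, not the notes'): for the "uniform distribution" above, µ'(x) has a simple closed form, since the total mass of all strings shorter than x telescopes to 1 − 1/|x|. The ordering assumed here, shorter strings first and then lexicographic within each length, is my reading of the "≤" convention.

```python
from fractions import Fraction

def mu_prime(x):
    """Cumulative probability mu'(x) for the 'uniform distribution'
    Pr[y] = (1/(|y|(|y|+1))) * 2^{-|y|}, with strings ordered by length
    first and lexicographically within each length (assumed convention)."""
    n = len(x)
    shorter = Fraction(1) - Fraction(1, n)  # sum_{m<n} 1/(m(m+1)) telescopes
    rank = int(x, 2)                        # length-n strings preceding x
    return shorter + Fraction(rank + 1, n * (n + 1) * 2 ** n)
```

One can check that Pr[x] = µ'(x) − µ'(x − 1) comes out to exactly 2^{-|x|}/(|x|(|x|+1)), as required by equation (12.2).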
µ1 ≤ L2 .3 Reductions Deﬁnition 18 (Reduction) We say that a distributional problem L1 .4) There are at least two good reasons for looking only at polynomialtime computable distributions. 1.
then. but what about the running time? Intuitively. To motivate this condition. in such cases A may still be regarded as eﬃcient.2. but such inputs are unlikely to be sampled according to distribution µ2 . it seems reasonable that there could be input lengths on which A takes a long time. what does it mean to say that A solves L. µ 1 (x) ≤ poly (y) µ 2 (y) . it could be the case that A2 is very slow on some inputs. but that are generated with very low probability under µ. for every x. starting from an input x distributed according to µ1 (x). when we pick y according to distribution µ2 . The most intuitive choice of saying that A is eﬃcient if E[t(x)] is small is problematic because the expectation could be inﬁnite even if A runs in worstcase polynomial time. we compute f (x) and then apply algorithm A2 to f (x). A ﬁrst diﬃculty comes with the fact that we are dealing with a single distribution on all possible inputs. (12. Suppose that we have an algorithm A2 for problem L2 such that. µ in polynomial time on average? We will consider some ﬂawed deﬁnitions before settling on the best one and on an alternate one. A(y) is eﬃcient on average. There is a > 0 such that. but this is not taken into account in the above deﬁnition. consider that we want reductions to preserve the existence of algorithms that are eﬃcient on average. E[t(x) x = n] ≤ nc However we chose to deﬁne distributional problems and reducibility without separating problems by input size. All known reductions satisﬁes this condition naturally. x:f (x)=y The ﬁrst condition is the standard condition of manytoone reductions in complexity theory: it ensures that an algorithm that is always correct for L2 can be converted into an algorithm that is always correct for L1 .5) 3. For all y ∈ {0. and we would run into several diﬃculties in separating them now. 
Our next attempt folds the polynomial running time into the single distribution by deﬁning A to be polynomial on average if there is a polynomial p such that 74 . One can work around this diﬃculty by deﬁning A to be polynomial provided that for some constant c and for every suﬃciently large n. Besides. This will certainly be correct. The second condition is a technical one. the domination condition ensures us that such inputs are also unlikely to be sampled when we sample x according to µ 1 and then consider f (x). 12. f (x) = Ω(x ). if we want to solve L1 under distribution µ1 . that is necessary to show that the reduction preserves eﬃcientonaverage algorithms. 1}∗ .4 PolynomialTime on Average Given a problem L. The third condition is called domination. µ and an algorithm A that runs in time t (x) on input x.
E[t(x)/p(|x|)] = O(1)

This definition is quite appealing, but is still subject to the fatal flaw of not being robust, in that: (1) reductions do not preserve this definition of polynomial solvability on average and (2) the definition is sensitive to trivial representation changes such as replacing a matrix representation of a graph by an adjacency list. To see why these problems arise, let µ be the uniform distribution, and let

t(x) = 2^n if x = 0^n;  t(x) = n² otherwise.   (12.6)

The average running time is about n². But suppose now that n is replaced by 2n (because of a change in representation, or because of the application of a reduction); then

t'(x) = 2^{2n} if x = 0^n;  t'(x) = 4·n² otherwise,   (12.7)

and the average running time becomes exponential. Similarly, if t(x) is replaced by t²(x), the average running time becomes exponential. We now come to a satisfying definition.

Definition 19 (Polynomial on average) Suppose A is an algorithm for a distributional problem ⟨L, µ⟩ that runs in time t(x) on input x. We say that A has polynomial running time on average if there is a constant c such that

E[t(x)^{1/c}/|x|] = O(1)

Notice, first, that this definition is satisfied by any algorithm that runs in worst-case polynomial time: if t(x) = O(|x|^c), then t(x)^{1/c} = O(|x|) and the sum converges. More interestingly, and unlike the case of the previous definition, suppose t() is a time bound for which the above definition is satisfied; then an algorithm whose running time is t'(x) = t(x)² also satisfies the definition. In fact we have the following result, whose non-trivial proof we omit.

Theorem 58 If ⟨L1, µ1⟩ ≤ ⟨L2, µ2⟩ and ⟨L2, µ2⟩ admits an algorithm that is polynomial on average, then ⟨L1, µ1⟩ also admits an algorithm that is polynomial on average.

There is an additional interesting property of the definition of polynomial on average: there is a high probability that the algorithm runs in polynomial time. Suppose that

E[t(x)^{1/c}/|x|] = O(1),   (12.8)

and that we wish to compute Pr[t(x) ≥ k·|x|^c]. Such a probability is clearly the same as

Pr[t(x)^{1/c} ≥ k^{1/c}·|x|]
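The robustness difference can be seen numerically (this sketch is mine, not from the notes). Using the uniform distribution and the time bound t above, we apply each definition to the squared running time t²: the naive expectation E[t(x)²/p(|x|)] diverges for any fixed polynomial p, while Definition 19 absorbs the squaring into the constant c, since E[(t(x)²)^{1/4}/|x|] stays bounded.

```python
def expectation(g, N):
    # E[g(t(x), |x|)] under the uniform distribution, truncated at input
    # length N, for t(x) = 2^n on the all-zeros input of length n and
    # t(x) = n^2 on the other length-n inputs.
    total = 0.0
    for n in range(1, N + 1):
        mass = 1.0 / (n * (n + 1) * 2.0 ** n)   # mass of one length-n string
        total += mass * g(2.0 ** n, n)          # the single slow input
        total += (2.0 ** n - 1) * mass * g(float(n) ** 2, n)
    return total

def naive_on_t_squared(t, n):
    return (t * t) / n ** 6       # flawed definition applied to t^2, p(n) = n^6

def def19_on_t_squared(t, n):
    return (t * t) ** 0.25 / n    # Definition 19 applied to t^2 with c = 4
```

The partial sums for the naive definition grow roughly like 2^N, while the Definition 19 sums converge to a small constant.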
and by Markov's inequality this is at most O(1/k^{1/c}), which can be made fairly small by picking k large enough. Since the algorithm runs in time at most k·n^c on a subset of inputs having probability 1 − k^{−1/c}, we see that our definition gives a smooth quantitative tradeoff for how much time we need to solve an increasing fraction of inputs.

In the setting of one-way functions, and in the study of the average-case complexity of the permanent and of problems in EXP (with applications to pseudorandomness), we normally interpret "average-case hardness" in the following way: an algorithm of limited running time will fail to solve the problem on a noticeable fraction of the inputs. Conversely, we would interpret average-case tractability as the existence of an algorithm that solves the problem in polynomial time, except on a negligible fraction of inputs. This leads to the following formal definition.

Definition 20 (Heuristic polynomial time) We say that an algorithm A is a heuristic polynomial time algorithm for a distributional problem ⟨L, µ⟩ if A always runs in polynomial time and for every polynomial p

Σ_{x : A(x) ≠ χ_L(x)} µ'(x)·p(|x|) = O(1)

In other words, a polynomial time algorithm for a distributional problem is a heuristic if the algorithm fails on a negligible fraction of inputs, that is, a subset of inputs whose probability mass is bounded even if multiplied by a polynomial in the input length.

The definition of heuristic polynomial time is incomparable with the definition of average polynomial time. For example, an algorithm could take time 2^n on a fraction 1/n^{log n} of the inputs of length n, and time n² on the remaining inputs, and thus be a heuristic polynomial time algorithm with respect to the uniform distribution, while not being average polynomial time with respect to the uniform distribution. On the other hand, consider an algorithm such that for every input length n, and for 1 ≤ k ≤ 2^{n/2}, there is a fraction about 1/k² of the inputs of length n on which the algorithm takes time Θ(kn). Then this algorithm satisfies the definition of average polynomial time under the uniform distribution, but if we impose a polynomial clock there will be an inverse polynomial fraction of inputs of each length on which the algorithm fails, and so the definition of heuristic polynomial time cannot be met.

It might also make sense to consider a definition in which A is always correct, although it does not necessarily work in polynomial time, and to say that A is heuristic polynomial time if there is a polynomial q such that for every polynomial p, Σ_{x∈S_q} µ'(x)·p(|x|) = O(1), where S_q is the set of inputs x such that A(x) takes more than q(|x|) time. Our definition is only more general, because from an algorithm A as before one can obtain an algorithm A' satisfying Definition 20 by adding a clock that stops the computation after q(|x|) steps.

It is easy to see that heuristic polynomial time is preserved under reductions.

Theorem 59 If ⟨L1, µ1⟩ ≤ ⟨L2, µ2⟩ and ⟨L2, µ2⟩ admits a heuristic polynomial time algorithm, then ⟨L1, µ1⟩ also admits a heuristic polynomial time algorithm.

Proof: Let A2 be the algorithm for ⟨L2, µ2⟩, let f be the function realizing the reduction, and let p be the polynomial witnessing the domination property of the reduction. Let c and ε be such that for every x we have |x| ≤ c·|f(x)|^{1/ε}.
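The clock construction just mentioned is easy to sketch (a toy model of my own: the algorithm is represented as a Python generator that yields once per step and returns its answer when done).

```python
def with_clock(algorithm, q):
    """Wrap an always-correct but sometimes-slow algorithm with a step
    budget q(|x|); on timeout, output 'fail' instead of the answer."""
    def clocked(x):
        gen = algorithm(x)
        for _ in range(q(len(x))):
            try:
                next(gen)                 # run one more step
            except StopIteration as done:
                return done.value         # finished within the budget
        return "fail"                     # budget exhausted
    return clocked

def slow_on_zeros(x):
    # toy algorithm: exponential time on all-zero inputs, linear otherwise;
    # it decides the parity of the number of ones in x
    steps = 2 ** len(x) if set(x) == {"0"} else len(x)
    for _ in range(steps):
        yield
    return x.count("1") % 2
```

With the clock q(n) = n², the wrapped algorithm answers quickly on typical inputs and outputs "fail" on the rare slow ones, which is exactly the behaviour Definition 20 tolerates.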
Let the inputs have the form M. 1l 1 1 1 .9) The trouble is that. Clearly this is a polynomial time algorithm. We work around this problem by compressing x to a shorter string if µ (x) is large. Thus. The following lemma shows how to do this. 1t . x. 1l . • Decide whether there exists a string y such that y ≤ l and M (x. 77 . µ2 and in the secondtolast step we introduce the polynomial q (n) deﬁned as q(c · n1/ ) 2 12. then A1 is correct on x. To give a reduction that satisﬁes the domination condition is indeed harded. y) accepts. y) accepts in at most t steps.5 Existence of Complete Problems We now show that there exists a problem (albeit an artiﬁcial one) complete for DistNP. by mapping highprobability strings onto shorter lengths. Deﬁne a uniform distribution over the M. 1l as follows: M. because of the domination condition. Then we deﬁne the following “universal” problem U . 1l where t and l are suﬃciently large bounds. where M is an encoding of a Turing machine and 1t is a sequence of t ones. x. Recall the deﬁnition of NP: we say that L ∈ NP if there exists a machine M running in t = poly (x) steps such that x ∈ L iﬀ there exists a y with y = poly (x) such that M (x. We need to show that for every polynomial q µ1 (x)q(x) = O(1) x:A2 (f (x))=χL2 (f (x)) and the lefthand side can be rewritten as µ1 (x)q(x) y:A2 (y)=χL2 (y) x:f (x)=y ≤ ≤ y:A2 (y)=χL2 (y) x:f (x)=y µ1 (x)q(c · y1/ )) µ2 (y)p(y)q (y) y:A2 (y)=χL2 (y) = O(1) where the last step uses the fact that A2 is a polynomial heuristic for L2 . µ ∈ DistNP. x. to reduce L to U we need only map x onto R (x) = M. Intuitively. Let L. · · M  x (x + 1) 2x (t + l) (t + l + 1) M  (M  + 1) 2 µ = (12. 1t . we can’t map x onto R (x) if µ (x) > poly (x) 2−x . x. 1t .Then we deﬁne the algorithm A1 than on input x outputs A2 (f (x)). That U is NPcomplete follows directly from the deﬁnition. and whenever A2 is correct on f (x). we make their high probability less conspicuous. 1t .
we need only show that µ (x) ≤ poly (x) µ 2 (x). C is injective because only two binary strings s1 and s2 can have the longest common preﬁx z. Since t = O (poly (t)). that outputs x with probability µ (x) and runs in poly (x) time. x. provided that for all x. then let C (x) = 1z. µ 2 (x) = 1 O M 2 2M  O C (x)2 2C(x) · 1 · 1 O (t + l + x)2 ≥ poly (x) max 2−x . since we mapped longer inputs to shorter ones.6 PolynomialTime Samplability Deﬁnition 21 (Samplable distributions) We say that a distribution µ is polynomialtime samplable if there exists a probabilistic algorithm A. µ (x) ≥ poly (x) µ (x) and we are done. C (x) . 1t . that x is a valid decoding of z) and that M (x.10) For a polynomialtime computable µ satisfying the above property. 78 . y checks that C (x) = z (i. taking no input.e. µ (x) ≥ 2− poly(x) or µ (x) = 0. 1l+x . since µ (x) ≤ 2−z . Finally. that is. C (x) ≤ 1 + min x . Since µ is computable in polynomial time. y) accepts. On the other hand. let µ2 (x) = Pr [R2 (x)] . (12. (x) Proof: If µ (x) ≤ 2−x then simply let C (x) = 0x. and then uses binary search to ﬁnd the ﬁrst x such that µ (x) ≥ r. Clearly x ∈ L iﬀ M accepts.. 1]. Here M is a machine that on input z. The running time of M is t. C (x) ≤ 1 + log µ 1 . a third string s3 sharing z as a preﬁx must have a longer preﬁx with either s1 or s2 . there are eﬃciently samplable distributios µ that are not eﬃciently computable. C is injective: C (x) = C (y) iﬀ x = y. we can indeed construct a sampler A that ﬁrst chooses a real number r uniformly at random from [0. µ (x) > 2−x . Here z is the longest common preﬁx of µ (x) and µ (x − 1) when both are written out in binary. 12. Note that. we could not have done this reduction inputlengthwise. Then there exists a polynomialtime algorithm C such that 1. Then. 2 (x) Now the reduction is to map x onto R2 (x) = M .Lemma 60 Suppose µ is a polynomialtime computable distribution over x. Any polynomialtime computable distribution is also polynomialtime samplable. 
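The compression function C of Lemma 60 can be sketched directly (my rendering; here lo and hi stand for µ'(x − 1) and µ'(x), which are computable in polynomial time by assumption).

```python
from fractions import Fraction

def compress(x, lo, hi):
    """Sketch of C(x) from Lemma 60, with lo = mu'(x-1) and hi = mu'(x).
    If Pr[x] = hi - lo <= 2^{-|x|}, output 0x. Otherwise output 1z, where
    z is the longest common prefix of the binary expansions of lo and hi;
    since hi - lo > 2^{-|x|}, the prefix has length < |x|, so
    |C(x)| <= 1 + min(|x|, log(1/Pr[x])) and C is injective."""
    n = len(x)
    if hi - lo <= Fraction(1, 2 ** n):
        return "0" + x
    z = ""
    a, b = lo, hi
    while True:
        bit_a = int(a >= Fraction(1, 2))
        bit_b = int(b >= Fraction(1, 2))
        if bit_a != bit_b:
            return "1" + z
        z += str(bit_a)
        a = 2 * a - bit_a
        b = 2 * b - bit_b
```

The loop must terminate within |x| iterations: if lo and hi agreed on their first |x| binary digits, they would lie in a dyadic interval of size 2^{-|x|}, contradicting hi − lo > 2^{-|x|}.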
since the map is onetoone. 0 concatenated with x. to poly (x) bits of precision. If. log µ 1 . To show that domination holds. 2. so is z. on the other hand. under reasonable assumptions.
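The sampler described above, which draws a real r and then binary-searches for the first x with µ'(x) ≥ r, can be sketched as follows. The toy cumulative distribution and the length-first ordering are my assumptions; r is passed explicitly to keep the example deterministic, whereas a real sampler would draw r uniformly to poly(|x|) bits of precision.

```python
from fractions import Fraction
from itertools import product

def all_strings(max_len):
    # binary strings ordered by length, then lexicographically within a length
    out = []
    for n in range(1, max_len + 1):
        out.extend("".join(bits) for bits in product("01", repeat=n))
    return out

def sample(mu_cum, strings, r):
    """Return the first x (in enumeration order) with mu'(x) >= r,
    found by binary search over the monotone cumulative distribution."""
    lo, hi = 0, len(strings) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if mu_cum(strings[mid]) >= r:
            hi = mid
        else:
            lo = mid + 1
    return strings[lo]
```

For instance, with the 14 strings of length at most 3 and the toy cumulative distribution µ'(x) = (rank(x)+1)/14, the value r = 3/14 is mapped to the third string, "00".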
(b) Argue that. consider the distribution Dn.Exercises 1.] 80 . For a parameter c.cn is satisﬁable with probability at least (7/8)cn and at most 2n · (7/8)cn . [Hint: take a suﬃciently large n. (a) Show that an instance from Dn.) Let Dc be the distribution obtained by ﬁrst picking a number n with probability 1/n(n + 1) and then sampling from Dn. (Note that the same 3 clause could be picked more than once. using the deﬁnition given in this lecture. and then look at the probability of satisﬁable instances of length n under D1 5 and the probability that their image is generated by D3 0. D15 cannot be reduced to D30 .cn .cn over instances of 3SAT with n variables generated by picking cn times independently a random a clause out of the 8 n possible clauses that can be constructed from n variables.
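A small sketch of the distribution D_{n,cn} from the exercise (the helper names are mine): each of the cn clauses independently picks 3 distinct variables and a sign for each literal. A fixed assignment satisfies a random clause with probability 7/8, which is where the (7/8)^{cn} lower bound (fix one assignment) and the 2^n·(7/8)^{cn} upper bound (union bound over assignments) come from.

```python
import random
from itertools import product

def sample_instance(n, c, rng):
    """Draw from D_{n,cn}: c*n independent random 3-clauses over n
    variables; literals are signed 1-based variable indices."""
    clauses = []
    for _ in range(c * n):
        variables = rng.sample(range(1, n + 1), 3)   # 3 distinct variables
        clauses.append(tuple(v * rng.choice((1, -1)) for v in variables))
    return clauses

def satisfiable(clauses, n):
    # brute force over all 2^n assignments (only sensible for small n)
    for bits in product((False, True), repeat=n):
        if all(any((lit > 0) == bits[abs(lit) - 1] for lit in cl)
               for cl in clauses):
            return True
    return False
```

Estimating the satisfiability probability empirically for c = 5 versus c = 30 (with n large enough) is one way to get intuition for part (b) of the exercise.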
Lecture 13

Interactive Proofs

In this lecture we will look at the notion of Interactive Proofs and the induced class IP. As we shall see, this class is obtained by adding the ingredients Interaction and Randomness to the class NP. Intuitively, adding either one alone does not make much difference, but the two put together make IP a very powerful class.

13.1 Interactive Proofs

We begin by recalling the definition of NP. Given a language L, we say that L ∈ NP iff there is an algorithm V(·,·) running in polynomial time T(·) and a polynomial p(·) such that

x ∈ L ⇔ ∃y. |y| ≤ p(|x|) and V(x, y) accepts

We can rewrite the above as two separate conditions, using elementary ideas from logic:

x ∈ L ⇒ ∃y. |y| ≤ p(|x|) and V(x, y) accepts   (Completeness)
x ∉ L ⇒ ∀y with |y| ≤ p(|x|), V(x, y) rejects   (Soundness)

We may generalise this as in Figure 13.1 below, where the process of recognizing whether x ∈ L is carried out by a Verifier V, which must run in polynomial time, and a Prover P, who has unbounded time and space. The task of the prover is to convince the verifier that x ∈ L, and the verifier is a skeptic whose task is to be convinced if and only if the string actually belongs to the language. Now the class NP is the set of languages L such that there is a Prover P and a polynomial time verifier V such that:

x ∈ L ⇒ P has a strategy to convince V   (Completeness)
x ∉ L ⇒ P has no strategy to convince V   (Soundness)

By strategy we mean in this case the polynomially small certificate or proof (corresponding to y in Figure 13.1) that P supplies to V; later we will generalise it to interaction, i.e. a setting where a sequence of messages is exchanged between the two, and a strategy for P is a function from the sequence of messages seen to the next message that P must send to V.
1. the veriﬁer may be a randomized machine. which is possible only if V is deterministic. V be the prover and veriﬁer for L. 2. and it accepts iﬀ after this exchange V accepted. Note that the entire set of messages is polynomially long as there are polynomially many messages and each is polynomially long. . . and so on. If V was randomized then to cover all possible questions in a polynomially long interaction P would have to send an exponentially large transcript. 82 x ∈ L ⇒ P has a strategy to convince V .2 below. we generalise the above deﬁnition to get the class IP. Note that for the above to work. V the oneround prover and veriﬁer as follows: P sends to V the entire transcript of the interaction between P. Say L ∈ NP. V as the certiﬁcate. . it “asks” the questions and is supplied all the answers on the same transcript. where the veriﬁer may “ask” further questions of the prover based on the messages that have been sent to it. V is the same as L and so L ∈ NP. V is essentially like V only it has been given all the answers to its questions at once.1: NPwith a ProverVeriﬁer mechanism Thus. First. We only need to show NP ⊆ NP. we shall see that both of the above are required. Thus the language accepted by P . The class IPdiﬀers in the following two ways: 1. its response (behaving as V ) is exactly y2 . We say that the class NP is the set of languages L such that L ∈ NP iﬀ there is a Prover P and a polynomial time veriﬁer V such that: x ∈ L ⇒ P has no strategy to convince V It is easy to check that NP = NP. there are rounds of communication. Suppose that interaction was the set of messages y1 . There is randomness. . Thus it takes the ﬁrst message of P . There is interaction: unlike the above where a single static proof is sent across.y PROVER VERIFIER x x Figure 13. y 1 . That is we allow the veriﬁer to ask questions and thus the strategy is a function from sequences of messages (the past communication) to messages (the next message that the veriﬁer must send). 
as the other inclusion is obvious. let P. Deﬁne P . the prover must at the very beginning know what responses V may make. 13. but y3 is the reply to that. yn as in the ﬁgure 13.1 NP+ Interaction = NP Let the class NP be the class corresponding to NP + interaction.
13.2 NP + Randomness By adding randomness to the class NP we get the class MA. by two rounds we mean that the Veriﬁer asks a question of the Prover and the Prover replies. Note that by round we really mean a single “message” and not a questionanswer sequence.Pr[V interacting with P accepts ] ≥ Where V also uses at most r(x) rounds of interaction. which is exactly two messages.2 IP Deﬁnition 23 (IP) A language L is in IP(r(·)) iﬀ there exists a probabilistic polynomial time veriﬁer V such that: 2 3 1 x ∈ L ⇒ ∀P. it is known that if coNP ⊆ MA then the polynomial heirarchy collapses. 83 . that is a message that is completely random and independent of all previous communication.Pr[V interacting with P accepts ] ≤ 3 x ∈ L ⇒ ∃P. y) accepts ] ≤ 3 As in similar classes. y) accepts ] ≥ 2 3 1 x ∈ L ⇒ ∀y. It is conjectured that MA = NP. the success probabilities may be boosted arbitrarily high. That is.Pr[V (x.1. Deﬁnition 22 (MA) L ∈ MA iﬀ there exists a probabilistic polynomial time machine V such that: x ∈ L ⇒ ∃y. We are now in a position to deﬁne formally the class IP.2: Simulating several rounds of interaction in one round 13. A related class is AM which is deﬁned as follows: Deﬁnition 24 (AM) A language L is in AM(r(·)) iﬀ L ∈ IP(r(·)) and at each round the veriﬁer sends a random message.Pr[V (x.y1 y2 y3 P yn V P’ y1 y2 y3 yn V’ Figure 13.
The above is also known as Public Coin Proof Systems. There are the following surprising theorems about the classes AM and IP.

Theorem 61 IP(r(n)) ⊆ AM(r(n) + 2)

Theorem 62 ∀r(n) ≥ 2, AM(2r(n)) ⊆ AM(r(n)).

This yields the following corollaries:

Corollary 63 AM(O(1)) ⊆ AM(2)

That is, all constant round AM proofs can be done in just two rounds. And also:

Corollary 64 IP(O(1)) = AM(O(1)) = AM(2)

Finally we have the famous theorem of Shamir:

Theorem 65 IP(poly(n)) = PSPACE

We know that AM(2) is good enough for systems with a constant number of rounds, but it would be surprising indeed if polynomially many rounds could be simulated in a constant number of rounds. There is also the following theorem that gives an upper bound on IP with constant rounds.

Theorem 66 If coNP ⊆ IP(O(1)) then the polynomial hierarchy collapses.

Due to the above result, for a long time it was believed that IP could not capture coNP, as it was thought that IP with a constant number of rounds was roughly as good as IP with polynomially many rounds. However, it was then shown that coNP is in fact contained in IP(poly(n)), the proof of which we shall shortly see, and a few weeks after that result, Shamir proved that IP is in fact the same as PSPACE. (Note that when people say AM they mean AM(2), but when people say IP they mean IP(poly).)

13.3 An Example: Graph Non-Isomorphism

To get a feel for the power of interaction and randomness let us look at an example: Graph Non-Isomorphism. The problem is the following:

• Input: Graphs G1 = (V1, E1), G2 = (V2, E2)
• Output: Decide whether there exists NO π : V1 → V2 such that (u, v) ∈ E1 iff (π(u), π(v)) ∈ E2.

This problem is not known to be in NP: the difficulty lies in constructing a succinct proof that no permutation of the vertices is good, given that there are exponentially many permutations. However, let us see how this is in AM(2). Suppose that the skeptic Arthur (the verifier) has the two graphs G1,
G2 and wishes to know whether or not all the permutations are bad. when people say AMthey mean AM(2).
85 . In all cases.4 Random SelfReductions It is not known how to prove worstcase to averagecase equivalence for NPcomplete problems. Thus we have a protocol. By repeating this k times we can shrink this to 2−k . In this section we show that an approach similar to the one taken for #P. If Merlin was correct Arthur believes that the graphs are nonisomorphic (accepts). PSPACE and EXP does not work. G2 . Sends G to Merlin and demands to know of Merlin which graph he sent him a permutation of. Arthur accepts only with probability 2 . 3. G. The entire power was in the interaction between it and the prover. Thus in the case the graphs are 1 isomorphic. coNP ⊆ AM(2) and from the theorem 66. Thus is because. Now i = b with 1 probability 2 as b is picked independantly at the start. Remark 1 Graph NonIsomorphism ∈ IP(2) = AM(2). the polynomial heirarchy would collapse. The following deﬁnition captures the way such reductions work. Graph Isomorphism is not NPcomplete unless the polynomial heirarchy collapses.1. 2}. He does not have a proof or certiﬁcate with which he may convince a third party that the Graphs are nonisomorphic. Creates G to be a random permutation of Gb . and with his inﬁnite resources Merlin will ﬁgure out which one and thus will reply correctly and so Arthur will accept with probability 1. and the one bit that the prover gave it gives it signiﬁcant conﬁdence but no knowledge. Picks a random b ∈ {1. say it is i. Arthur believes the graphs are isomorphic (rejects). if Graph Isomorphism is NPcomplete then Graph NonIsomorphism is coNPcomplete. G2 . Thus. we are able to get the correct value of the polynomial on all points (in the simplest case) or on a majority of points (in the procedures based on BerlekampWelch). 4. we pick a random line passing through x and then evaluate the goodonaverage procedure on points along the line. 13. 2. and from the above result. Merlin’s output depends only on G1 . otherwise Merlin was incorrect. 
Notice that if the graphs are really nonisomorphic. Since each such point is uniformly distributed. then Merlin will always be able to tell which graph was sent to him as only one of the graphs will be isomorphic to the random permutation that Merlin is given. Note: The veriﬁer of the above has learnt nothing at all about the Graphs that he did not know before. This was therefore a Zero Knowledge Proof. given a point x on which we want to evaluate the polynomial. If the graphs are isomorphic then Merlin cannot tell as the permutation is a permutation of both G1 . The worstcase to averagecase equivalence results that we have seen so far rely on procedures to compute polynomials exactly given a procedure that computes the polynomial well on average. a prover and a veriﬁer for Graph NonIsomorphism and so it is in AM(2).
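The protocol is easy to simulate on toy instances. The following Python sketch is illustrative only (none of these names come from the notes): Merlin's "infinite resources" are implemented by a brute-force search over all vertex permutations. On non-isomorphic inputs Merlin always identifies the graph and Arthur always accepts; on isomorphic inputs Merlin is right only about half the time.

```python
import itertools
import random

def relabel(edges, pi):
    """Apply the vertex permutation pi (a sequence) to an edge set."""
    return frozenset(frozenset(pi[v] for v in e) for e in edges)

def merlin(g, g1):
    """All-powerful Merlin: answers 1 iff g is isomorphic to g1 (brute force)."""
    n = max(max(e) for e in g1) + 1
    return 1 if any(relabel(g1, pi) == g
                    for pi in itertools.permutations(range(n))) else 2

def arthur_round(g1, g2, n, rng):
    """One round: Arthur accepts iff Merlin identifies which graph was permuted."""
    b = rng.choice((1, 2))
    pi = list(range(n))
    rng.shuffle(pi)
    g = relabel(g1 if b == 1 else g2, pi)
    return merlin(g, g1) == b

# Two non-isomorphic graphs on 4 vertices with the same number of edges:
path = frozenset(frozenset(e) for e in [(0, 1), (1, 2), (2, 3)])  # P4
star = frozenset(frozenset(e) for e in [(0, 1), (0, 2), (0, 3)])  # K_{1,3}
```

Brute-force Merlin is exponential in the number of vertices, which is exactly why the problem is interesting: the honest prover is not efficient, only the verifier is.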
Definition 25 (Random Self-Reduction) We say that a problem L has a non-adaptive random self-reduction if there is a probabilistic polynomial time algorithm R such that, given an input x ∈ {0,1}^n, R(x) produces a sequence of inputs y1, ..., yk ∈ {0,1}^n', where k, n' = n^O(1), and then, given L(y1), ..., L(yk), R decides correctly whether x ∈ L with high probability. Furthermore, for every x ∈ {0,1}^n and every i, the string yi is uniformly distributed in {0,1}^n'.

The intuition for the use of a random self-reduction is the following: if we have an algorithm A that succeeds on, say, a fraction 1 − 1/4k of the elements of {0,1}^n', then we can run R and answer its queries using A. Since each query is uniformly distributed, with probability at least 3/4 all k queries are answered correctly, and then R decides x correctly with high probability. So A can be used to solve L everywhere on inputs of length n.

We note that the above definition is restrictive in two ways. First, R generates the points y1, ..., yk non-adaptively: it chooses y2 before knowing L(y1), and so on. Secondly, it is not really necessary that all the points y1, ..., yk be uniformly distributed: it would be enough for each point yi to be generated according to a "smooth" distribution, that is, one in which no element of {0,1}^n' is generated with probability larger than c/2^n'. Nothing is known about the existence of adaptive and/or smooth random self-reductions, but the following theorem rules out non-adaptive uniform random self-reductions.

Theorem 67 Suppose an NP-complete language L has a random self-reduction as in Definition 25. Then the polynomial hierarchy collapses to Σ3.

Proof: We show that if L ∈ NP and L has a random self-reduction, then the complement of L has an AM[2] protocol (that uses some non-uniformity); if L were NP-complete, this would imply the collapse of the hierarchy.

Let R be the random self-reduction for L. Given x ∈ {0,1}^n, let n' be the length of the queries generated by R(x), and let p = Pr_{y ∈ {0,1}^n'}[y ∈ L]. The value p will be the non-uniformity used in the protocol. We fix a parameter m = Θ((k log k)^2). The protocol for the complement of L, on inputs of length n, is as follows.

• The verifier runs R(x) m times independently, with fresh randomness each time, generating queries (y_1^1, ..., y_k^1), ..., (y_1^m, ..., y_k^m), and sends all these queries to the prover. (To make it an AM protocol, the verifier may just send the randomness it used to generate the m runs of R(x), and let the prover figure out the corresponding queries.)
• The prover responds by telling whether each y_j^i is in L or not, and by certifying all the YES answers with an NP witness (remember that L is in NP).
• The verifier accepts if and only if the following conditions hold:
  1. all certificates sent by the prover are valid;
  2. R(x) rejects in all the m iterations, assuming the answers from the prover are correct;
  3. for every j = 1, ..., k, at least pm − m/2k of the queries y_j^1, ..., y_j^m are answered YES.

We need to argue that if x ∉ L then the verifier accepts with high probability when interacting with a prover that follows the protocol, and that if x ∈ L then the verifier accepts with low probability when interacting with an arbitrary cheating prover.

Note that, for each fixed j, the strings y_j^1, ..., y_j^m are independent and uniformly distributed elements of {0,1}^n', and each one has probability p of being a YES instance. So, using Chernoff bounds, there is probability at least 1 − 1/4k that at least pm − O(√(m log k)) of them are YES instances, and this number is larger than pm − m/2k if m ≥ c(k log k)^2 for a sufficiently large c. Then there is probability at least 3/4 that condition (3) is satisfied for all j.

If x ∉ L and the prover follows the protocol, then all certificates are valid, R(x) rejects in all the m iterations, and, by the calculation in the above paragraph, condition (3) also holds with probability at least 3/4, so the verifier accepts.

If x ∈ L, then whenever the prover sends correct answers for one of the m runs of R(x), the verifier rejects when checking condition (2). So, if the verifier accepts, it must be because the prover sent a set of answers containing at least m errors. All the erroneous answers of the prover must be YES instances for which it answers NO (if the prover tried to cheat in the other direction, it would be caught by its inability to provide a valid certificate). This means that there is some j such that, among the queries y_j^1, ..., y_j^m, at least m/k are answered NO even though they are YES instances. But, with high probability, there were no more than pm + O(√(m log k)) YES instances to start with among y_j^1, ..., y_j^m, and so the prover has to pretend that there are only pm + O(√(m log k)) − m/k, which is less than pm − m/2k by our choice of m; and so the verifier rejects when checking condition (3). □

13.5 References

The classes IP and AM were defined independently. The class IP was defined by Shafi Goldwasser, Silvio Micali and Charles Rackoff [GMR89], who proposed it as an intermediate step towards the definition of zero knowledge proofs; the paper was written in 1983 but first appeared in FOCS 85. The class AM was defined by Laszlo Babai [Bab85] (the journal version of the paper is by Babai and Moran [BM88]) in order to characterize the complexity of a group-theoretic problem; this paper also appeared in FOCS 85, and contained Theorem 62. Theorem 61 is due to Goldwasser and Sipser [GS86]. The interactive proof for graph non-isomorphism is from [GMW91]. Credit for the proof that IP = PSPACE is typically shared between the works of Lund et al. [LFKN92] and of Shamir [Sha92]. The results of Section 13.4 are due to Feigenbaum and Fortnow [FF93].
Exercises

The following exercises give a proof of Theorem 66.

1. A nondeterministic circuit C has two inputs x = x1 ... xl and y = y1 ... ym; C accepts the string x iff ∃y. C(x, y) = 1.

   Show that every problem in MA has nondeterministic circuits of polynomial size. [Hint: the proof is similar to the proof that BPP has polynomial size circuits.]

   Slightly harder: show that every problem in AM[2] has nondeterministic circuits of polynomial size.

2. Prove that if coNP has polynomial-size nondeterministic circuits, then the polynomial hierarchy collapses. [Hint: this is not too different from Karp-Lipton.]
Lecture 14

IP = PSPACE

In this lecture we will see that IP = PSPACE. We start by giving an interactive proof system for UNSAT, thereby showing that coNP ⊆ IP. In fact, as we will see, the same proof system with minor modifications can be used for #SAT; since #SAT is #P-complete, and since PH ⊆ P^#P, this shows that PH ⊆ IP. Finally, we generalize the protocol to solve QSAT, and thus show that PSPACE ⊆ IP.

14.1 UNSAT ⊆ IP

Let φ be a formula with m clauses over the variables x1, x2, ..., xn, and let N > 2^n · 3^m be a prime number. We translate φ to a polynomial p over the field of the integers mod N in the following way. Literals are translated as follows: xi → xi, ¬xi → (1 − xi). A clause is translated to the sum of the (at most 3) expressions corresponding to its literals, and p is taken as the product of the m expressions corresponding to the m clauses. Notice that each literal is of degree 1, and therefore p is of degree at most m. For a 0/1 assignment, p evaluates to 0 if the assignment does not satisfy φ, and to a nonzero number, which can be at most 3^m, if the assignment does satisfy φ. Hence, φ is unsatisfiable iff

Σ_{x1 ∈ {0,1}} Σ_{x2 ∈ {0,1}} ... Σ_{xn ∈ {0,1}} p(x1, ..., xn) ≡ 0 (mod N).

(The sum over all 2^n assignments is at most 2^n · 3^m < N, so it is zero mod N iff it is zero.)

The translation from φ to p can be done in polynomial time, and one evaluation of p at any point in {0, ..., N−1}^n can be computed in polynomial time. Furthermore, if we take N = 2^O(n+m), then elements of the field can be represented by O(n + m) bits.

We now describe a protocol that enables the prover to convince the verifier that φ is unsatisfiable (see also Figure 14.1 for a sketch of the protocol). At the first round, the prover sends N, a proof that N is prime, and a degree-m univariate polynomial

q1(x) = Σ_{x2,...,xn ∈ {0,1}} p(x, x2, ..., xn).

The verifier verifies the primality proof and checks that q1(0) + q1(1) = 0; if any of the checks fail, the verifier rejects, and otherwise it continues by choosing a random r1 ∈ {0, ..., N−1} and sending it to the prover. The prover responds with a degree-m univariate polynomial

q2(x) = Σ_{x3,...,xn ∈ {0,1}} p(r1, x, x3, ..., xn).

The verifier checks that q2(0) + q2(1) = q1(r1); if not, it rejects, and otherwise it continues by choosing a random r2 ∈ {0, ..., N−1} and sending it to the prover.
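The translation from φ to p is mechanical. The following Python sketch (an illustration, not code from the notes) evaluates p at a 0/1 point and checks the unsatisfiability criterion by brute-force summation; clauses are written DIMACS-style, with +i for x_i and −i for ¬x_i.

```python
from itertools import product

def eval_p(clauses, point, N):
    """Evaluate the arithmetization p (product over clauses of the
    sum of the literal polynomials) at a point, mod N."""
    result = 1
    for clause in clauses:
        s = 0
        for lit in clause:
            v = point[abs(lit) - 1]
            s += v if lit > 0 else 1 - v  # x_i -> x_i, NOT x_i -> 1 - x_i
        result = result * s % N
    return result

def boolean_sum(clauses, n, N):
    """Sum of p over all 0/1 assignments, mod N: zero iff phi is unsatisfiable."""
    return sum(eval_p(clauses, a, N) for a in product((0, 1), repeat=n)) % N
```

For example, φ = x1 ∧ ¬x1 is unsatisfiable, so with a prime N = 19 > 2^1 · 3^2 its boolean sum is 0, while a satisfiable formula gives a nonzero sum.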
Figure 14.1: The protocol for proving unsatisfiability. P sends N, a proof that N is prime, and q1(x) = Σ_{x2,...,xn ∈ {0,1}} p(x, x2, ..., xn); V checks the primality proof, checks q1(0) + q1(1) = 0, picks a random r1 ∈ {0, ..., N−1} and sends it to P. At round i, P sends qi(x) = Σ_{xi+1,...,xn ∈ {0,1}} p(r1, ..., ri−1, x, xi+1, ..., xn); V checks qi(0) + qi(1) = qi−1(ri−1) and sends a random ri ∈ {0, ..., N−1}. Finally, P sends qn(x) = p(r1, ..., rn−1, x); V checks qn(0) + qn(1) = qn−1(rn−1), picks a random rn ∈ {0, ..., N−1}, and checks qn(rn) = p(r1, ..., rn).
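The interaction in Figure 14.1 can be simulated end to end. The sketch below is illustrative only (an honest prover, invented function names, tiny parameters): the prover sends each q_i as deg + 1 point values, and the verifier evaluates q_i at its random challenge by Lagrange interpolation mod the prime N. The routine verifies any claimed value of the boolean sum, so it also covers the #SAT variant of the next section; for UNSAT the claimed value is 0.

```python
import random
from itertools import product

def lagrange_eval(ys, x, N):
    """Evaluate the unique polynomial through (0, ys[0]), ..., (d, ys[d]) at x, mod prime N."""
    total = 0
    for i, yi in enumerate(ys):
        num, den = 1, 1
        for j in range(len(ys)):
            if j != i:
                num = num * (x - j) % N
                den = den * (i - j) % N
        total = (total + yi * num * pow(den, N - 2, N)) % N
    return total

def sumcheck(p_eval, n, deg, N, claimed, rng):
    """Sum-check: verify that `claimed` equals the sum of p over {0,1}^n.
    `deg` must bound the degree of p in each single variable."""
    prefix, target = [], claimed % N
    for i in range(n):
        # Honest prover: q_i(x) = sum of p(r_1..r_{i-1}, x, suffix) over boolean
        # suffixes, sent as its values at the points x = 0, 1, ..., deg.
        ys = [sum(p_eval(prefix + [x] + list(tail), N)
                  for tail in product((0, 1), repeat=n - i - 1)) % N
              for x in range(deg + 1)]
        if (ys[0] + ys[1]) % N != target:  # the check q_i(0) + q_i(1) = previous value
            return False
        r = rng.randrange(N)               # random challenge r_i
        target = lagrange_eval(ys, r, N)   # q_i(r_i) becomes the next target
        prefix.append(r)
    return target == p_eval(prefix, N)     # final check against p itself

def p_unsat(pt, N):
    """Sum-of-literals arithmetization of phi = (x1 or x2) and (not x1) and (not x2)."""
    x1, x2 = pt
    return (x1 + x2) * (1 - x1) * (1 - x2) % N
```

Here φ is unsatisfiable, so an honest prover convinces the verifier that the sum is 0, while falsely claiming a sum of 0 for a satisfiable formula fails the very first check.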
The protocol continues in this way: the i-th message that the prover sends is the polynomial

qi(x) = Σ_{xi+1,...,xn ∈ {0,1}} p(r1, ..., ri−1, x, xi+1, ..., xn),

and the verifier responds by checking that qi(0) + qi(1) = qi−1(ri−1), rejecting if the check fails, and otherwise choosing a random ri ∈ {0, ..., N−1} and sending it to the prover. At the last round, the prover sends qn(x) = p(r1, ..., rn−1, x); the verifier checks that qn(0) + qn(1) = qn−1(rn−1), chooses a random rn ∈ {0, ..., N−1}, and checks that qn(rn) = p(r1, ..., rn−1, rn). The verifier rejects if either of these checks fails, and accepts otherwise. The verifier can indeed perform the last check since, as mentioned before, it can translate φ to p and evaluate p at the point (r1, ..., rn) in polynomial time. Also, the messages are of polynomial length, because each polynomial has m + 1 coefficients and elements of the field can be written with O(m + n) bits.

We now show the correctness of this proof system. We write pi(x) for the polynomial that a prover who follows the protocol sends in round i. If φ is unsatisfiable, then the prover can make the verifier accept with probability 1 just by following the protocol, since q1(0) + q1(1) = Σ_{x1,...,xn ∈ {0,1}} p(x1, ..., xn) ≡ 0 (mod N), and it is clear that the rest of the checks succeed if the prover sends the qi's accordingly.

We have to show that if φ is satisfiable then, no matter what polynomials the prover sends, the verifier rejects with high probability. Since φ is satisfiable, p1(0) + p1(1) = Σ_{x1,...,xn ∈ {0,1}} p(x1, ..., xn) ≢ 0 (mod N), so if the prover sends q1 = p1 then the verifier rejects immediately. If the prover sends q1 ≠ p1 then, since both are degree-m polynomials, they agree in at most m places, so there is probability ≥ 1 − m/N that p1(r1) ≠ q1(r1). Suppose that indeed p1(r1) ≠ q1(r1). If the prover then sends q2 = p2, the verifier rejects, because q2(0) + q2(1) = p1(r1) ≠ q1(r1). Thus, in that case, the prover must send q2 ≠ p2; then q2 and p2 agree on at most a fraction m/N of the elements of {0, ..., N−1}, so p2(r2) ≠ q2(r2) with probability ≥ 1 − m/N. We continue the argument in the same way: if qi ≠ pi then, with probability ≥ 1 − m/N, the random ri is such that pi(ri) ≠ qi(ri), and then the prover must send qi+1 ≠ pi+1 in order for the verifier not to reject. Hence, with probability ≥ 1 − (n−1)m/N, if the verifier has not rejected before the last check, then qn(x) and p(r1, ..., rn−1, x) are distinct degree-m polynomials, and the last check fails with probability ≥ 1 − m/N. Overall, the total probability that the verifier accepts is at most nm/N.

14.2 A Proof System for #SAT

We now show that, with a few minor modifications, the previous proof system can be used to prove that a formula has a given number of satisfying assignments. We first change the way we translate φ to the polynomial p; the change is only in the way we translate clauses. Instead of translating a clause to the sum of its literals, we translate the clause (z1 ∨ z2 ∨ z3) to 1 − (1 − z1)(1 − z2)(1 − z3). Notice that if a 0/1 assignment to the variables satisfies the clause, then the value of this expression is 1, and if the assignment does not satisfy the clause, then the value is 0. Thus, 0/1 assignments that satisfy φ are evaluated to 1 by p, while 0/1 assignments that do not satisfy φ are evaluated to 0, and the sum of the evaluations of p over all 0/1 assignments is equal to the number of satisfying assignments of φ. For the same reason, it is now enough to take N > 2^n. Notice also that the degree of p is now 3m instead of m.

Suppose φ has k satisfying assignments. We wish to give a protocol such that, if the prover gives k as an answer, then he is able to continue with the protocol so that the verifier accepts with high probability, but if he gives another number as an answer, then, no matter what messages he sends afterwards, the verifier rejects with high probability. In the new protocol, the prover starts by sending k, and the verifier checks that q1(0) + q1(1) = k instead of checking that q1(0) + q1(1) = 0; the rest is as in the previous protocol (with the updated definition of p). If k is indeed the correct number of satisfying assignments, then the prover can continue by sending qi = pi at each round, and the verifier accepts with probability 1. If k is not the correct number of satisfying assignments, then, by the same analysis as in the previous section, the verifier accepts with probability at most 3mn/N.

14.3 A Proof System for QBF

14.3.1 A PSPACE-Complete Language: TQBF

For a 3CNF boolean formula φ(x1, x2, ..., xn), we may think of its satisfiability problem as determining the truth value of the statement

∃x1 ∃x2 ... ∃xn. φ(x1, x2, ..., xn).

We can generalize this idea, allowing universal quantifiers in addition to existential quantifiers, to get what are called quantified boolean formulas. For example, ∀x1 ∃x2. (x1 ∨ x2) ∧ (¬x1 ∨ ¬x2) is a quantified boolean formula, and it is easy to see that it is a true one (take x2 = ¬x1). Now consider the language of all true quantified boolean formulas:

TQBF = {Φ : Φ is a true quantified boolean formula}.

It is known that TQBF is PSPACE-complete. Thus, if we have an interactive proof protocol for recognizing TQBF, then we have a protocol for recognizing any language in PSPACE. In the rest of the lecture we provide such a protocol. In the following, all random elements are drawn from a field {0, 1, ..., p − 1}, where p is a sufficiently large prime.

14.3.2 Arithmetization of TQBF

We will consider quantified boolean formulas of the form

Φ = ∀x1 ∃x2 ∀x3 ... ∀xn. φ(x1, x2, ..., xn),

where φ(x1, x2, ..., xn) is a 3CNF boolean formula; without loss of generality, we can assume that all the quantified boolean formulas given to us have this form. The main idea of the interactive protocol for recognizing TQBF is the same as in the protocols above: we first arithmetize the quantified boolean formula, and then the prover convinces the verifier that the arithmetized expression evaluates to 1.
Naive Solution

So how do we arithmetize the quantified formula? We first show a naive solution and point out its problems.

To begin with, we know from the #SAT protocol how to arithmetize φ(x1, x2, ..., xn): we turn it into a polynomial F(x1, x2, ..., xn). Recall that F has degree at most 3m (where m is the number of clauses of φ) and agrees with φ on all 0/1 assignments to the n variables.

Now we read the quantifiers from right to left. Consider first the innermost quantifier, that is, the expression ∀xn. φ(x1, x2, ..., xn). This expression has n − 1 free variables, and for each substitution of 0/1 values for these variables it evaluates to either true or false; we are looking for a polynomial that has the same behavior. Using F, we can write such a polynomial, which we denote by P∀xn F(x1, ..., xn):

P∀xn F(x1, ..., xn) = F(x1, ..., xn−1, 0) · F(x1, ..., xn−1, 1).

Next consider the previous quantifier, ∃xn−1: we want a polynomial representation of ∃xn−1 ∀xn. φ(x1, ..., xn). In general, given a polynomial representation G(x1, ..., xn−1) of a subformula, the following polynomial, which we denote by P∃xn−1 G(x1, ..., xn−1), satisfies the condition:

1 − (1 − G(x1, ..., xn−2, 0)) · (1 − G(x1, ..., xn−2, 1)).

So the polynomial for ∃xn−1 ∀xn. φ(x1, ..., xn) is P∃xn−1 P∀xn F(x1, ..., xn).

We have thus arrived at a way of representing quantified boolean formulas using arithmetic operations. We transform the quantified boolean formula Φ into an arithmetic expression using the operators P∀u and P∃v as follows:

• turn the 3CNF formula φ(x1, ..., xn) into the polynomial F(x1, ..., xn), as in the #SAT protocol;
• replace each quantifier ∀xi by the operator P∀xi;
• replace each quantifier ∃xi by the operator P∃xi.

At any stage of the transformation, the current expression has some free variables, and for each substitution of 0/1 values for these variables it evaluates to 0 or 1 according to the truth value of the corresponding subformula. The final expression has no free variables, always evaluates to 0 or 1, and evaluates to 1 if and only if the quantified boolean formula Φ is true.

Now consider the arithmetic expression after the transformation, P∀x1 P∃x2 ... P∀xn F(x1, x2, ..., xn), and let us apply the idea of the sum-check protocol to derive a protocol and see what happens. In the first step, the prover eliminates the first operator P∀x1 and sends the verifier the polynomial G1(x1) corresponding to the rest of the arithmetic expression:

G1(x1) = P∃x2 P∀x3 ... P∀xn F(x1, x2, ..., xn).
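Evaluating the transformed expression itself is straightforward if we only care about its value at a point rather than its coefficient representation. A recursive Python sketch, purely illustrative (names invented): P∀ combines the two instantiations with a · b, and P∃ with 1 − (1 − a)(1 − b).

```python
def eval_quantified(quants, F, point=()):
    """Evaluate P_{Q1 x1} ... P_{Qn xn} F, where quants is a string over 'A'/'E'."""
    if not quants:
        return F(*point)
    a = eval_quantified(quants[1:], F, point + (0,))
    b = eval_quantified(quants[1:], F, point + (1,))
    return a * b if quants[0] == 'A' else 1 - (1 - a) * (1 - b)

def F(x1, x2):
    """Arithmetization of phi = (x1 or x2) and (not x1 or not x2)."""
    return (1 - (1 - x1) * (1 - x2)) * (1 - x1 * x2)
```

On the example formula from above, ∀x1 ∃x2. φ evaluates to 1 (true) while ∀x1 ∀x2. φ evaluates to 0 (false). This recursion takes exponential time, which is fine here: the point of the protocol is precisely to avoid it.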
Upon receiving the polynomial Ĝ1(x1) (which the prover claims to be G1(x1)), the verifier checks that Ĝ1(0) · Ĝ1(1) = 1, picks a random value r1, computes β1 = Ĝ1(r1), and sends r1 to the prover; the prover then needs to convince the verifier that β1 = P∃x2 P∀x3 ... P∀xn F(r1, x2, ..., xn), and so on.

The problem is the degree of G1(x1): each P∀u or P∃v operator potentially doubles the degree of every variable in the expression, so the degree of G1(x1) can very well be exponential. This means that the prover would have to send exponentially many coefficients, which the polynomially bounded verifier cannot even read.

Revised Solution

We will ensure that the degree of any variable at any intermediate stage of the transformation never goes above two. The arithmetization given above clearly does not satisfy this property, yet there is a simple way to fix it. The key observation is that for x = 0 or x = 1 we have x^k = x for all positive integers k, and we only need the intermediate polynomials to agree with the intended values on 0/1 assignments; what a polynomial does at any other point is of no interest to us. So suppose that at some stage of the transformation we have a polynomial J(x1, ..., xn) in which some variables have degree above one. Normally we cannot expect to transform J into a polynomial J'(x1, ..., xn) in which the degree of every variable is at most one and which evaluates to the same values at all points; but a polynomial J' that agrees with J on all 0/1 assignments can be obtained from J simply by erasing all exponents, replacing them with 1. For example, if J(x1, x2, x3) = x1^3 x2^4 + 5 x1^2 x2 + x2^6 x3, then J'(x1, x2, x3) = 6 x1 x2 + x2 x3.

More formally, we define a new operator Rxi which, when applied to a polynomial, reduces the exponent of xi to 1 in all occurrences:

Rxi J(x1, ..., xn) = xi · J(x1, ..., xi−1, 1, xi+1, ..., xn) + (1 − xi) · J(x1, ..., xi−1, 0, xi+1, ..., xn).

In this notation, J'(x1, ..., xn) = Rx1 Rx2 ... Rxn J(x1, ..., xn).

In the revised arithmetization, the only change we make is to apply reduce operators whenever the degree of some variable can potentially exceed 1 in the polynomial obtained from the previous step. In particular, we apply the reduce operators Rx1, ..., Rxn once after the 3CNF formula φ is arithmetized into the polynomial F(x1, ..., xn), and again after each application of P∀u or P∃v, since these operators can double the degree of some variables. (The degree of a variable might reach 2 after a P∀u or P∃v operator, but it is reduced back to 1 once the reduce operators are applied.) This revision ensures that the degree of any variable at any intermediate stage never goes above 2.
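The reduce step is literally "erase the exponents". A quick illustrative sketch (not from the notes), representing a polynomial as a dict from exponent tuples to coefficients:

```python
from itertools import product

def reduce_all(poly):
    """Apply R_{x1} ... R_{xn}: cap every exponent at 1, merging like terms."""
    out = {}
    for exps, coeff in poly.items():
        key = tuple(min(e, 1) for e in exps)
        out[key] = out.get(key, 0) + coeff
    return {k: c for k, c in out.items() if c != 0}

def evaluate(poly, point):
    """Evaluate a dict-represented polynomial at a point."""
    total = 0
    for exps, coeff in poly.items():
        term = coeff
        for x, e in zip(point, exps):
            term *= x ** e
        total += term
    return total

# The example from the text: J(x1, x2, x3) = x1^3 x2^4 + 5 x1^2 x2 + x2^6 x3
J = {(3, 4, 0): 1, (2, 1, 0): 5, (0, 6, 1): 1}
Jp = reduce_all(J)

# Sanity check: J' agrees with J on every 0/1 point, the only points we care about.
assert all(evaluate(J, pt) == evaluate(Jp, pt) for pt in product((0, 1), repeat=3))
```

Here Jp comes out as 6 x1 x2 + x2 x3, matching the example in the text.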
14.3.3 The Interactive Protocol

Using the revised arithmetization, we transform the quantified boolean formula Φ = ∀x1 ∃x2 ∀x3 ... ∀xn. φ(x1, ..., xn) into the arithmetic expression

E = P∀x1 Rx1 P∃x2 Rx1 Rx2 P∀x3 Rx1 Rx2 Rx3 P∃x4 ... P∀xn Rx1 ... Rxn F(x1, ..., xn).

Now the idea of the sum-check protocol can be put to use without any trouble: the prover and the verifier communicate to eliminate one operator at each stage.

Initially, the prover eliminates the first operator P∀x1 and sends the verifier the polynomial Ĝ1(x1), which it claims to be the polynomial G1(x1) corresponding to the rest of the arithmetic expression:

G1(x1) = Rx1 P∃x2 ... Rxn F(x1, ..., xn).

Notice that this is a polynomial of degree 1, instead of the exponential degree arising in the naive arithmetization. Upon receiving Ĝ1(x1), the verifier checks that P∀x1 Ĝ1(x1) = Ĝ1(0) · Ĝ1(1) = 1, picks a random value r1, computes β1 = Ĝ1(r1), and sends r1 to the prover. (At this stage the verifier does not actually have to send r1 to the prover, but we let it do so for consistency with the later rounds.) The prover now needs to convince the verifier that β1 = (Rx1 P∃x2 ... Rxn F(x1, ..., xn))[r1].

The prover then eliminates the second operator Rx1 and sends the verifier the polynomial Ĝ2(x1), claimed to be

G2(x1) = P∃x2 ... Rxn F(x1, ..., xn).

Notice that this is a polynomial of degree at most 2. Upon receiving Ĝ2(x1), the verifier checks that (Rx1 Ĝ2(x1))[r1] = β1, where [r1] means that the polynomial in the parentheses, with free variable x1, is evaluated at the point r1. The verifier then picks a new random value r2 for x1, computes β2 = Ĝ2(r2), and sends r2 to the prover.

In general, when the verifier needs to be convinced that β = (Rxi P(α1, ..., αi−1, xi))[αi], where αj (1 ≤ j ≤ i) is the most recent random value the verifier has assigned to the variable xj, it asks for the polynomial G(xi) = P(α1, ..., αi−1, xi) and gets Ĝ(xi). It verifies that (Rxi Ĝ(xi))[αi] = β, chooses a new random value αi for xi, and computes β = Ĝ(αi); the verification is now reduced to whether β = P(α1, α2, ..., αi). The case of a P∀u or P∃v operator is handled exactly as in the sum-check protocol above. The prover proceeds by eliminating the operators one after another, and in the end no operators are left: the verifier checks the required condition directly, namely that q(αn) = F(α1, ..., αn), where q is the polynomial the prover sent in the last round and αi is the most recent random value assigned to the variable xi. The soundness argument of the sum-check protocol carries over.

14.3.4 Analysis

Theorem 68
(a) If Φ ∈ TQBF, then there exists a prover such that Pr[verifier accepts] = 1.
(b) If Φ ∉ TQBF, then for all provers, Pr[verifier accepts] ≤ 1/2.

Proof: Part (a) is obvious. For part (b), the analysis is the same as in the sum-check protocol: an error can be introduced only when the verifier happens to pick a root of the difference of two low-degree polynomials at some round. While dealing with the last n reduce operators, the polynomials sent may have degree up to 3m. For the remaining n(n − 1)/2 reduce operators the degree is at most 2, and for every other operator (the n operators of the form P∀u or P∃v) the degree of the polynomial is 1. Thus the sum of the degrees of all the polynomials encountered in the interaction is at most 3mn + n^2, and the probability of error is at most (3mn + n^2)/p, which is at most 1/2 if we choose the size p of our field large enough. □
Exercises

1. Show that IP ⊆ PSPACE. [Hint: it is easier to show the following stronger claim: given a verifier V and an input x, we can compute in polynomial space the maximum, over all prover strategies P*, of the probability that V accepts x while interacting with P*.]
Lecture 15

Introduction to PCP

In this lecture we see the connection between PCP and hardness of approximation. In particular, we show that the PCP Theorem implies that MAX3SAT cannot be approximated within (1 − ε) for some ε > 0, unless P = NP. We also show that Max Clique cannot be approximated within any constant factor, unless P = NP.

15.1 Probabilistically Checkable Proofs

Definition 26 Given functions r(n), q(n), we say that L ∈ PCP(r(n), q(n)) if there is a polynomial time probabilistic verifier V which is given x and oracle access to a proof π, uses r(|x|) random bits, reads q(|x|) bits of π, and is such that

x ∈ L ⇒ ∃π. Pr[V^π(x) = 1] = 1
x ∉ L ⇒ ∀π. Pr[V^π(x) = 1] ≤ 1/2.

Theorem 69 (PCP Theorem) NP = PCP(O(log n), O(1)).

15.2 PCP and MAX3SAT

15.2.1 Approximability

Definition 27 (MAX3SAT) Given a 3CNF formula φ (i.e., with at most 3 variables per clause), find an assignment that satisfies the largest number of clauses.

Note that MAX3SAT generalizes 3SAT, and so cannot be solved in polynomial time unless P = NP. However, it is easy to approximate it within a factor of 2.

Algorithm. Output the better of the all-true and the all-false assignments.

Analysis. Every clause is satisfied by at least one of the two assignments; therefore one of the two assignments satisfies at least half of the clauses.

It is possible to do better: the Karloff-Zwick algorithm runs in polynomial time and satisfies at least 7/8 times the optimal number of clauses. Some problems admit polynomial time algorithms with approximation ratio ≥ (1 − ε) for all ε > 0; the PCP Theorem implies that, unless P = NP, this is not the case for MAX3SAT, as we see next.
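The factor-2 algorithm above takes only a few lines. An illustrative Python sketch (clauses in DIMACS style: +i for x_i, −i for ¬x_i; names invented):

```python
def count_satisfied(clauses, assignment):
    """Count clauses with at least one true literal; assignment maps 1-based vars to 0/1."""
    return sum(
        any((assignment[abs(lit)] == 1) == (lit > 0) for lit in clause)
        for clause in clauses
    )

def half_approx(clauses, n):
    """Return (score, assignment) for the better of all-true / all-false."""
    candidates = ({i: 1 for i in range(1, n + 1)}, {i: 0 for i in range(1, n + 1)})
    best = max(candidates, key=lambda a: count_satisfied(clauses, a))
    return count_satisfied(clauses, best), best
```

Every clause has at least one literal, and that literal is true either under all-true (if positive) or under all-false (if negative), which is exactly the argument in the Analysis above.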
. xl . one of 1 1 the clauses representing fR fails. . . our plan is to construct a 3CNF formula ϕx on m variables such that (for some > 0 to be determined) x ∈ L ⇒ ϕx is satisﬁable x ∈ L ⇒ no more than (1 − )m clauses of ϕx are satisﬁable. Proof: Fix any NPcomplete problem L.2. Given an instance x of L. i. π(iq )). 1}r(z) . all clauses are satisﬁed. assume that V makes nonadaptive queries (by reading all the bits that could possibly be needed at once). . . For each Ri .1) Without loss of generality. This assumption is valid because the number of queries was a constant and remains a constant. z ∈ L ⇒ ϕz is satisﬁable ⇒ ∃π such that Vπ (z) always accept. then for every assignment to x1 . then L ∈ PCP(O(log n). e. Introduce Boolean variables x 1 . then the PCP Theorem must be true. the PCP Theorem holds for that problem. V on input z expects π to be a satisfying assignment for ϕz .e. . Therefore a fraction = 2 q2q of clauses fails. z ∈ L ⇒ ∀π a fraction 1 of clauses are unsatisﬁed / ⇒ ∀π V π (z) will reject with probability ≥ 99 1 2 (details omitted). xiq ). O( 1 ) = O(1) bits are read in the proof. . iq and a Boolean function fR : {0. If z ∈ L. By the PCP Theorem. Let V be the veriﬁer for L. V chooses q positions i1 . Theorem 71 If there is a reduction as in (15. . the corresponding proof / π(i) = xi makes the veriﬁer reject for half of the R ∈ {0. and checks that π satisﬁes all of them. 1} and accepts iﬀ fR (π(i1 ). . 2 It turns out if this holds for every NPcomplete problem. . . O(1)). For each such R. .1) for some problem L in NP. . If z ∈ L. Proof: We describe how to construct a veriﬁer for L. 1}q → {0. where l is the length of the proof. . so the number of such strings is polynomial in x.g. x2 ∨ x10 ∨ x11 ∨ x12 becomes (x2 ∨ x10 ∨ yR ) ∧ (y R ∨ x11 ∨ x12 ). The length of each string is r(x) = O(log x). .15. . Let q be the number of queries. . This requires at most q2q 3SATclauses. We want a Boolean formula to simulate this. . O(1)). / (15. 
then there is a proof π such that V π (z) accepts for every R. V picks O( 1 ) clauses of ϕz at random. We need to convert clauses of length q to length 3 with additional variables. Now we need a correspondence between the number of satisﬁable clauses and the probability that the veriﬁer accepts. For every R we add clauses that represent fR (xi1 . Enumerate all random strings R for V . Set xi = π(i) and auxiliary variables in the right way. L ∈ PCP(O(log n). . . xl (and to yR ’s). In general there is a series of auxiliary variables for each R. .2 Inapproximability Theorem 70 The PCP Theorem implies that there is an > 0 such that (1− )approximation of MAX3SAT is NPhard. This can be done with 2q SATclauses. Randomness is O( 1 log m) = O(log z).
15.2.2 Tighter result in Theorem 70?

What is the best known value of ε in Theorem 70? In the next lectures we will prove the following result:

Theorem 72 (Håstad) There is a PCP verifier for NP that uses O(log n) bits of randomness and which, based on its random string R, chooses 3 positions i_1, i_2, i_3 and a bit b, and accepts iff π(i_1) ⊕ π(i_2) ⊕ π(i_3) = b. The verifier satisfies

z ∈ L ⇒ ∃π . Pr[V^π(z) = 1] ≥ 1 − ε   (15.2)
z ∉ L ⇒ ∀π . Pr[V^π(z) = 1] ≤ 1/2 + ε.   (15.3)

Note the slightly non-standard bounds for the probabilities above. If we had "= 1" in Equation (15.2), we could find a proof π by solving a system of linear equations (which would imply P = NP).

For every R, one can encode x_{i_1} ⊕ x_{i_2} ⊕ x_{i_3} = b with 4 clauses in 3CNF. If z ∈ L, then a fraction ≥ (1 − ε) of the clauses is satisfied. If z ∉ L, then for at least a (1/2 − ε) fraction of the R's, 1 of the corresponding 4 clauses is contradicted. This is not producing a formula directly, but it is enough to prove hardness of approximation within the ratio

1 − (1/4)(1/2 − ε) = 7/8 + ε/4.

15.3 Max Clique

Definition 28 (Max Clique) Given an undirected graph G = (V, E), find the largest C ⊆ V such that ∀u, v ∈ C, (u, v) ∈ E.

Finding a largest clique is NP-hard, so we ask about approximations. When convenient, we use the complementary-graph equivalent Max Independent Set, which asks for the largest I ⊆ V such that ∀u, v ∈ I, (u, v) ∉ E.

15.3.1 Approximability

There is a simple n/log n approximation:

• arbitrarily partition the graph into subsets of size log n
• solve exactly each subset
• report the largest clique found

Suppose there is a clique of size αn: then one of the subsets contains a clique of size at least α log n, hence the approximation. The best known approximation algorithm achieves a ratio of O(n/(log n)^2).
15.3.2 Inapproximability

We apply the PCP Theorem to Max Independent Set. Given a 3CNF formula φ = c_1 ∧ c_2 ∧ ... ∧ c_m over variables x_1, ..., x_n arising from Theorem 70, we build a graph G using the same construction as in the NP-completeness proof of Independent Set in Lecture 3 (see Figure 15.1).

[Figure 15.1: Graph construction corresponding to the 3CNF formula φ = (x_1 ∨ x_2 ∨ x_3) ∧ (x_2 ∨ x_3 ∨ x_5) ∧ (x_1 ∨ x_4 ∨ x_5) ∧ (x_1 ∨ x_3 ∨ x_5), with one vertex per literal occurrence.]

The graph has 3m vertices, and G has an independent set with k vertices iff φ has an assignment that satisfies k clauses; in particular, G has an independent set of size m iff φ is satisfiable. So the hardness results for MAX3SAT translate directly to Max Independent Set. Below we show how this can be further improved.

Given G = (V, E), consider the product graph G' = (V', E'), where V' = V × V and ((u_1, u_2), (v_1, v_2)) ∈ E' iff (u_1, v_1) ∈ E or (u_2, v_2) ∈ E. It can be shown that the size of the maximum independent set in G' is equal to the square of the size of the maximum independent set in G. (As a side note, this result is not true for cliques — a clique in G' could possibly be larger than expected — unless we use a different definition of G'.)

By a constant number of applications of the graph product operation, we get:

graph   size of ind. set if φ is satisfiable   size of ind. set if only (1−ε)m clauses of φ are satisfiable
G       m                                      (1−ε)m
G'      m²                                     (1−ε)²m²
G''     m⁴                                     (1−ε)⁴m⁴

The ratio between the two columns decreases without bound, and therefore there can be no constant-factor approximation algorithm for Max Clique. In fact:

Theorem 73 (Håstad) If NP ≠ ZPP, then ∀ε > 0, Max Independent Set cannot be approximated within 1/n^{1−ε} (n is the number of vertices).

15.4 References

The PCP Theorem is from [AS98, ALM+98], and the connection between proof-checking and approximability is due to [FGL+96]. Håstad's result on three-query PCPs is from [Hås01], and his result on the approximability of the Max Clique problem is from [Hås99]. Karloff and Zwick [KZ97] describe a 7/8 approximation algorithm for Max 3SAT.
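As a quick sanity check of the graph product operation used above (a brute-force sketch, not part of the original notes), the following code verifies on a tiny example that the maximum independent set squares under the product:

```python
from itertools import combinations, product

def max_ind_set(vertices, edges):
    """Brute-force maximum independent set size (fine for tiny graphs)."""
    edge_set = {frozenset(e) for e in edges}
    for k in range(len(vertices), 0, -1):
        for subset in combinations(vertices, k):
            if all(frozenset(p) not in edge_set for p in combinations(subset, 2)):
                return k
    return 0

def graph_product(vertices, edges):
    """((u1,u2),(v1,v2)) is an edge of G' iff (u1,v1) in E or (u2,v2) in E."""
    edge_set = {frozenset(e) for e in edges}
    vs = list(product(vertices, repeat=2))
    es = [(a, b) for a, b in combinations(vs, 2)
          if frozenset((a[0], b[0])) in edge_set or frozenset((a[1], b[1])) in edge_set]
    return vs, es

# A 4-cycle: maximum independent set has size 2, so G' should have one of size 4.
V = [0, 1, 2, 3]
E = [(0, 1), (1, 2), (2, 3), (3, 0)]
V2, E2 = graph_product(V, E)
print(max_ind_set(V, E), max_ind_set(V2, E2))  # 2 4
```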
Lecture 16

Håstad's Proof System

In this lecture we introduce Håstad's PCP verifier, which yields stronger hardness results than the raw PCP theorem. In particular, we derive asymptotically optimal hardness of approximation results for Max E3LIN2 and Max 3SAT. We also show that Max 2SAT and Min Vertex Cover are hard to approximate within 21/22 + o(1) and 7/6 − o(1) of their optimum, respectively. Finally, we outline the connection between the PCP theorem and Håstad's verifier.

16.1 Håstad's verifier

To describe Håstad's verifier, we will need a more detailed parametrization of PCP. Say a language L is in PCP_{c,s}[r(n), q(n)] if there is a probabilistic polynomial-time verifier V which, given input x and oracle access to a proof π, uses r(|x|) random bits, reads q(|x|) bits of π, and is such that

x ∈ L ⇒ ∃π : Pr[V^π(x) = 1] ≥ c
x ∉ L ⇒ ∀π : Pr[V^π(x) = 1] < s.

The values c and s are called the completeness and the soundness of the verifier, respectively. If, in addition to these conditions, the verifier V makes its queries non-adaptively, we say L ∈ naPCP_{c,s}[r(n), q(n)].

Theorem 74 (Håstad) For every constant ε > 0, NP = naPCP_{1−ε,1/2+ε}[O(log n), 3].

Moreover, the verifier for any L ∈ NP is simple: V^π generates three indices i, j, k and one bit b, and accepts iff π[i] ⊕ π[j] ⊕ π[k] = b. We will sketch how to derive Håstad's theorem from the PCP theorem in the next few lectures.

We remark that the theorem achieves the best possible values for its parameters in several ways. In particular, we have the following lower bounds:

1. Number of queries. (Folklore) For every ε > 0, P = PCP_{1,1/2+ε}[O(log n), 2], and (Zwick) P = PCP_{1−ε,1/3}[O(log n), 2].

2. Completeness vs. soundness. (Zwick) If c/s ≥ 2, then P = PCP_{c,s}[O(log n), 3].
3. Perfect completeness. However, perfect completeness can be bought by giving away soundness or non-adaptivity:

(Håstad) For all ε > 0, NP = naPCP_{1,3/4+ε}[O(log n), 3].
(Guruswami, Lewin, Sudan and Trevisan) For all ε > 0, NP = PCP_{1,1/2+ε}[O(log n), 3].

It is not known whether this soundness constant is optimal for perfect completeness; in the non-adaptive case, however, (Zwick) P = naPCP_{1,5/8}[O(log n), 3].

16.2 Applications to hardness of approximation

We show that Håstad's theorem implies the following hardness of approximation factors: 1/2 + o(1) for Max E3LIN2, 7/8 + o(1) for Max 3SAT, 21/22 + o(1) for Max 2SAT and 7/6 − o(1) for Min Vertex Cover, unless P = NP. The proofs use approximation-preserving reductions similar to the ones we saw last time.

Linear equations modulo 2. Max E3LIN2 is the problem of finding an assignment of maximum value for a linear system of equations modulo 2, with three distinct variables per equation. The value of an assignment is the number of equations satisfied by it.

The problem of deciding whether a satisfying assignment of an E3LIN2 instance exists is in P — it can be solved by Gaussian elimination. Obtaining a 1/2-approximation for Max E3LIN2 is also trivial: consider the two assignments x_i = 0 for all i and x_i = 1 for all i, and choose the one that satisfies more equations. Since each equation has three variables, exactly one of the two assignments satisfies it, so the better of the two satisfies at least half of the equations. Surprisingly, unless P = NP, a 1/2 approximation is asymptotically optimal.

To show this, we construct a reduction ψ from an arbitrary NP language L to E3LIN2. Consider Håstad's verifier V for L: on input z, V^π generates a random string r of length R = O(log |z|), computes three indices i(r), j(r), k(r) and a bit b(r), queries π at those positions and accepts iff π[i(r)] ⊕ π[j(r)] ⊕ π[k(r)] = b(r). The reduction ψ maps z to the system of equations {x_{i(r)} ⊕ x_{j(r)} ⊕ x_{k(r)} = b(r) : r ∈ {0,1}^R}. This is a polynomial-time reduction, as the number of equations is m = |{0,1}^R| = |z|^{O(1)}.

Completeness. If z ∈ L, there exists a proof π such that V^π(z) accepts with probability 1 − ε over the choice of r. Then the assignment x_i = π[i] satisfies (1 − ε)m of the equations of ψ(z).

Soundness. If, on the other hand, some assignment x_i = a_i satisfies (1/2 + ε)m of the equations of ψ(z), then the proof π with π[i] = a_i makes V^π(z) accept with probability (1/2 + ε), implying z ∈ L. It follows that E3LIN2 is inapproximable within a factor of (1/2 + ε)m / ((1 − ε)m) ≤ 1/2 + 2ε, unless P = NP.

Max 3SAT. The 7/8 + ε inapproximability of Max 3SAT follows by reduction from Max E3LIN2. The reduction ψ from E3LIN2 to 3SAT maps each equation x ⊕ y ⊕ z = b into the conjunction of the four clauses ruling out the four falsifying assignments:

x ⊕ y ⊕ z = 0 ⇔ (¬x ∨ y ∨ z) ∧ (x ∨ ¬y ∨ z) ∧ (x ∨ y ∨ ¬z) ∧ (¬x ∨ ¬y ∨ ¬z)
x ⊕ y ⊕ z = 1 ⇔ (x ∨ y ∨ z) ∧ (¬x ∨ ¬y ∨ z) ∧ (¬x ∨ y ∨ ¬z) ∧ (x ∨ ¬y ∨ ¬z).

We reduce an E3LIN2 instance I to a 3SAT instance ψ(I) by applying this construction to every equation of I.
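The four-clause encoding can be checked exhaustively. The sketch below (an illustration; the clause sets are derived generically by forbidding each falsifying assignment) verifies the two properties used in the counting that follows: all four clauses hold when the equation holds, and exactly three hold otherwise.

```python
from itertools import product

def clauses(b):
    """Four 3CNF clauses forbidding exactly the assignments with
    x ^ y ^ z != b. A clause is a list of (var_index, negated) literals."""
    bad = [t for t in product([0, 1], repeat=3) if t[0] ^ t[1] ^ t[2] != b]
    # The clause ruling out assignment t uses, for each variable, the
    # literal that disagrees with t.
    return [[(i, t[i] == 1) for i in range(3)] for t in bad]

def satisfied(clause, a):
    return any(a[i] != neg for i, neg in clause)

for b in (0, 1):
    cs = clauses(b)
    for a in product([0, 1], repeat=3):
        sat = sum(satisfied(c, a) for c in cs)
        # all 4 clauses hold iff the equation holds; otherwise exactly 3 do
        assert sat == (4 if a[0] ^ a[1] ^ a[2] == b else 3)
print("ok")
```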
Consider an instance I of E3LIN2 and an assignment a for I. If an equation of I is satisfied by a, all four of its corresponding clauses in ψ(I) are satisfied by a. Otherwise, exactly three of the four clauses are satisfied by a. Therefore, if (1 − ε)m equations of I are satisfiable, then 4(1 − ε)m + 3εm clauses of ψ(I) are satisfiable (by the same assignment); and if at most (1/2 + ε)m equations of I are satisfiable, then at most 4(1/2 + ε)m + 3(1/2 − ε)m clauses of ψ(I) are satisfiable. The ratio between these two values is less than 7/8 + ε.

The factor 7/8 is optimal: all instances ψ(I) have exactly three distinct literals per clause, so a random assignment is expected to satisfy 7/8 of the clauses. Even for arbitrary 3SAT instances, there is a 7/8 approximation algorithm based on semidefinite programming. Using the same approach, one can show that Max Cut cannot be approximated within 16/17 + ε, unless P = NP.

Max 2SAT. Unlike for Max 3SAT, there is no representation of an E3LIN2 equation by a short 2CNF formula, so the reduction from E3LIN2 to 2SAT uses auxiliary variables. Given an equation x_1 ⊕ x_2 ⊕ x_3 = b, there exists a 12-clause 2CNF formula φ, which may contain two-literal and one-literal clauses, over variables x_1, x_2, x_3, y_1, y_2, y_3, y_4, such that:

1. If a_1 ⊕ a_2 ⊕ a_3 = b, there is an assignment that satisfies 11 clauses of φ such that x_1 = a_1, x_2 = a_2 and x_3 = a_3.
2. Every assignment of φ satisfies at most 11 clauses.
3. If a_1 ⊕ a_2 ⊕ a_3 ≠ b, every assignment of φ such that x_1 = a_1, x_2 = a_2 and x_3 = a_3 satisfies at most 10 clauses of φ.

We reduce an E3LIN2 instance I to a 2CNF instance ψ(I) by applying this construction to every equation of I, instantiating distinct copies of y_1, y_2, y_3 and y_4 for every equation. If (1 − ε)m equations of I are satisfiable, then 11(1 − ε)m of their corresponding clauses in ψ(I) are satisfiable (by property 1). If at most (1/2 + ε)m equations of I are satisfiable, then at most 11(1/2 + ε)m + 10(1/2 − ε)m clauses of ψ(I) are satisfiable (by properties 2 and 3). The ratio between these two values is 21/22 + O(ε) ≈ 0.954 + O(ε). The best known approximation algorithm for Max 2SAT is based on semidefinite programming and guarantees a ≈ 0.93 approximation.

Min Vertex Cover. A vertex cover of a graph G is a set C ⊆ V(G) such that every edge of G has at least one endpoint in C. Equivalently, C is a vertex cover of G if V(G) − C is an independent set. We show that the minimum vertex cover cannot be approximated within 7/6.

Let I be an instance of E3LIN2. For every equation x_1 ⊕ x_2 ⊕ x_3 = b in I, we produce four vertices of ψ(I), each corresponding to one of the four assignments of x_1, x_2 and x_3 that are consistent with the equation. We draw edges between all pairs of vertices that originated from the same equation, as well as between all pairs of vertices that correspond to inconsistent assignments. If (1 − ε)m equations of I are satisfiable, then ψ(I) contains an independent set of size (1 − ε)m: the independent set is made of the vertices corresponding to satisfied equations that are consistent with the optimal assignment for I. The complement of this independent set is a vertex cover of size at most (3 + ε)m. On the other hand, if at most (1/2 + ε)m equations of I are satisfiable, then every set of (1/2 + ε)m + 1 vertices of ψ(I) contains an edge.
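For contrast with the hardness bound, the trivial factor-2 algorithm for Vertex Cover mentioned in these notes (take both endpoints of every edge in an arbitrary maximal matching) can be sketched as follows; this is a minimal illustration, not the notes' own code:

```python
def vertex_cover_2approx(n, edges):
    """Greedy maximal matching: take both endpoints of each matched edge.
    The matched edges are vertex-disjoint and each must be covered, so the
    optimum uses at least |M| vertices while we use 2|M|."""
    cover = set()
    for u, v in edges:
        if u not in cover and v not in cover:
            cover.update((u, v))
    return cover

# Path 0-1-2-3: the optimum cover {1, 2} has size 2; we return at most 4 vertices.
edges = [(0, 1), (1, 2), (2, 3)]
c = vertex_cover_2approx(4, edges)
assert all(u in c or v in c for u, v in edges)
print(sorted(c))
```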
First. using a canonical reduction. The veriﬁer V π1 . a random variable xi that appears in Cj and accepts iﬀ the assignments π1 [i] and π2 [j] are consistent. A twoproof system for L with soundness γ is a probabilistic polynomialtime veriﬁer V with oracle access to two proofs. No algorithm is known that improves this factor asymptotically. 7} represents one of the seven satisfying assignments for Cj . The main ingredient is a proof system for NP of Raz that makes only two queries.36. The proofs π1 and π2 are supposed to represent diﬀerent encodings of an assignment for the input φ. but over proofs represented in a large alphabet.3 From the PCP theorem to Raz’s veriﬁer The PCP theorem says that for some constant δ > 0. 105 . Now suppose that at most (1 − δ)m clauses of φ are satisﬁable. If one of these entries in π2 is selected by the veriﬁer. . the ﬁrst step is to show that the proof system of Raz is equivalent to PCP1. NP = PCP1. Assuming this theorem. . π2 : Pr[V π1 . 16.contains an edge. π2 : Pr[V π1 . Here we cover the ﬁrst step of the argument.δ [O(log n). Therefore the largest independent set of V (G) has at most (1/2 + )m 1 vertices. Therefore an inconsistency is detected with probability δ/3. O(1)]. A trivial algorithm that picks all endpoints of an arbitrary maximal matching achieves a factor 2 approximation. The best known inapproximability ratio for this problem is 10 5 − 21 ≈ 1.π2 chooses a random clause Cj . .δ [O(log n). O(1)]. If φ is satisﬁable. . V performs exactly one query into each of π1 and π2 . For each clause Cj .π2 (x) = 1] = 1 x ∈ L ⇒ ∀π1 . at least one of its three literals is assigned inconsistently in π1 . the value π2 [j] ∈ {1. The ratio between these√ two values is 7/6 − O( ). π1 and π2 over an alphabet of size 1/γ O(1) such that on input x.π2 accept with probability 1.π2 (x) = 1] < γ. the proofs π1 and π2 encoding a satisfying assingment of φ make V π1 . 
there is a polynomialtime reduction ψ that maps instances of L into 3CN F formulas such that x ∈ L ⇒ ψ(x) is satisﬁable x ∈ L ⇒ At most (1 − δ)m clauses of ψ(x) are satisﬁable. we can restate the PCP theorem in the following form: For every L ∈ NP. Raz’s proof system. We now construct a twoproof system for 3SAT. the value π1 [i] ∈ {0. so its smallest vertex cover has at least (3 2 − )m vertices. We show that the PCP theorem implies the existence of a twoproof system with soundness δ for some δ > 0 for any language in NP. For each variable x i . and x ∈ L ⇒ ∃π1 . To prove Theorem 74. The second step is to construct a simulation of the two queries performed by Raz’s veriﬁer by three queries over an appropriate binary encoding of Raz’s proof. we sketch the proof of Theorem 74. Then at least a δ fraction of entries in π2 are inconsistent with π1 . 1} represents an assignment of xi .
We will need a two-proof system for NP with much better soundness. We expect to increase soundness by "repeating" the two-proof protocol for 3SAT several times. A k-repetition two-proof system for 3SAT is specified as follows: the proof entries π_1[i_1, ..., i_k] ∈ {0,1}^k and π_2[j_1, ..., j_k] ∈ {1,...,7}^k represent partial assignments to the k-tuples of variables x_{i_1}, ..., x_{i_k} and k-tuples of clauses C_{j_1}, ..., C_{j_k}, respectively. The verifier V^{π_1,π_2} chooses a random set of k clauses, a random variable in each clause, and accepts iff the corresponding assignments in π_1 and π_2 are consistent.

We might expect that the k-repetition of a two-proof system with soundness 1 − γ has soundness (1 − γ)^k. Surprisingly, this is not the case: in the next lecture we will see a two-proof system with soundness 1/2 such that its 2-repetition has soundness only 5/8. Fortunately, repetition does amplify soundness at an exponential rate:

Theorem 75 (Raz) If V is a two-proof verifier with soundness γ, the k-repetition of V has soundness 2^{−Ω(k)}, where the constant in the Ω() notation depends only on γ and on the alphabet size of the proof system.

Applying this theorem (which we won't prove), we conclude that for every γ > 0 there exists a two-proof system for 3SAT with soundness γ, with proofs over alphabets of size 1/γ^{O(1)}. In the next three lectures we will show that for suitably small γ = γ(ε), this two-proof system can be simulated by naPCP_{1−ε,1/2+ε}[O(log n), 3].

16.4 References

Håstad's PCP result is from [Hås01]. Raz's result (Theorem 75) is from [Raz98].
Lecture 17

Linearity Test

As part of our preparation for an analysis of Håstad's proof system (next lecture), we need to consider the problem of testing whether a given string is a code word for some particular code. First, we will see a procedure that, given a string which may or may not be a code word for the Hadamard code, will decide if it is a code word or if it is far from being a code word. This is equivalent to a test that decides whether a Boolean function is linear or far from linear. In order to analyze the test, we will introduce some concepts from Fourier analysis of Boolean functions.

Let f : {0,1}^k → {0,1} be a Boolean function. Here is a possible test:

1. pick x, y ∈ {0,1}^k
2. accept if and only if f(x) ⊕ f(y) = f(x ⊕ y).

We need to show that this test meets the following requirements:

• if f is linear, the test accepts with probability 1;
• if the test accepts with probability at least 1/2 + ε, then f has agreement at least 1/2 + ε with a linear function.

17.1 Fourier analysis of Boolean functions

We are interested in linear functions, that is, Boolean functions l : {0,1}^k → {0,1} such that there is some a ∈ {0,1}^k for which l(x) = a · x for every x. There is a linear function l_S for every set S ⊆ {1,...,k}, defined as l_S(x_1, ..., x_k) = ⊕_{i∈S} x_i.

If f : {0,1}^k → {0,1} is a Boolean function, we denote by F : {−1,1}^k → {−1,1} the equivalent function in the representation where we use 1 instead of 0, −1 instead of 1, and multiplication (·) instead of addition (⊕). That is,

F(x_1, ..., x_k) = 1 − 2 f((1 − x_1)/2, ..., (1 − x_k)/2)

and

f(x_1, ..., x_k) = 1/2 − (1/2) F(1 − 2x_1, ..., 1 − 2x_k).

Note that if l_S is a linear function, then L_S(x_1, ..., x_k) = Π_{i∈S} x_i is the equivalent function in this correspondence.

Consider now the set of functions F : {−1,1}^k → R: this is a vector space of dimension 2^k over the reals, and it contains all Boolean functions. Define the following scaled dot product over such functions:

F · G = (1/2^k) Σ_{x∈{−1,1}^k} F(x) G(x).

Equivalently, we think of the sum as an expectation over x ∈ {−1,1}^k, where x is chosen randomly by setting the i-th entry independently to ±1. Notice that if F : {−1,1}^k → {−1,1}, then F · F = 1. Also notice that if F, G : {−1,1}^k → {−1,1} correspond to Boolean functions f, g, then

Pr[f(x) = g(x)] = 1/2 + (1/2) F · G.

How do the linear Boolean functions behave with respect to this dot product? Consider two linear functions L_S and L_T with S ≠ T:

L_S · L_T = (1/2^k) Σ_{x∈{−1,1}^k} Π_{i∈S} x_i Π_{j∈T} x_j = (1/2^k) Σ_{x∈{−1,1}^k} Π_{i∈S∆T} x_i = E_x[ Π_{i∈S∆T} x_i ] = Π_{i∈S∆T} E_x[x_i] = 0,

where S∆T = (S \ T) ∪ (T \ S) is the symmetric difference of S and T; the second equality holds because x_i² = 1 for i ∈ S ∩ T, and the expectation factors into a product of expectations of independent variables, each equal to 0.

So all the functions L_S are orthogonal to each other, and therefore they are linearly independent; since there are 2^k of them in a space of dimension 2^k, they form an orthonormal basis. So we can write each function F : {−1,1}^k → R as

F(x) = Σ_{S⊆{1,...,k}} F̂_S L_S(x).

This is known as the Fourier expansion of F, and the F̂_S are the Fourier coefficients of F. The Fourier coefficient F̂_S can be computed as F̂_S = F · L_S. Why would we write F like this? We'll see that the Fourier coefficients have a useful interpretation:
if F is a Boolean function, then

F̂_S = F · L_S = E_x[F(x) L_S(x)] = Pr_x[F(x) = L_S(x)] − Pr_x[F(x) ≠ L_S(x)] = 2 Pr_x[f(x) = l_S(x)] − 1.

Equivalently, for every S we have Pr_x[f(x) = l_S(x)] = 1/2 + (1/2) F̂_S. The Fourier coefficients range between −1 and 1: the coefficient F̂_S is 1 when f is exactly the corresponding linear function l_S, and it is −1 when f is exactly the complement of the linear function. In a sense, the coefficients represent the bias that a Boolean function f has with respect to each of the linear functions.

We will see that if the linearity test for a Boolean function f given at the beginning of the lecture accepts with high probability, then there is a large Fourier coefficient F̂_S, which implies that there is a linear function l_S with non-trivial agreement with f. First, we need a couple of results about Fourier coefficients.

Lemma 76 (Parseval's Equality) For every F : {−1,1}^k → R, E_x[F²(x)] = Σ_S F̂_S².

One way to interpret this is that the set of values of F and the Fourier coefficients in the expansion of F have the same l_2 norm, up to a constant.

Proof: [Lemma 76]

E_x[F²(x)] = E_x[ Σ_S F̂_S L_S(x) · Σ_T F̂_T L_T(x) ] = Σ_S Σ_T F̂_S F̂_T E_x[L_S(x) L_T(x)] = Σ_S Σ_T F̂_S F̂_T (L_S · L_T) = Σ_S F̂_S². □

Corollary 77 For every F : {−1,1}^k → {−1,1}, Σ_S F̂_S² = 1.

Lemma 78 Let f : {−1,1}^k → {−1,1}. There are at most 1/(4ε²) linear functions l_S such that f and l_S have agreement at least 1/2 + ε.

Proof: [Lemma 78] Suppose l_S and f have agreement at least 1/2 + ε. Then F̂_S ≥ 2ε, which implies F̂_S² ≥ 4ε². From Corollary 77 we know that Σ_S F̂_S² = 1, so there can be at most 1/(4ε²) functions l_S with that much agreement with f. □

We have used the result in Lemma 78 before, when we did worst-case to average-case reductions.
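The quantities above are easy to compute by brute force for small k. The sketch below (an illustration, not part of the original notes) computes all Fourier coefficients of the 3-bit majority function and checks Parseval's equality and the agreement interpretation:

```python
from itertools import product

def fourier_coefficients(F, k):
    """Brute-force Fourier coefficients of F : {-1,1}^k -> {-1,1}:
    F_hat(S) = E_x[F(x) * prod_{i in S} x_i]."""
    points = list(product([-1, 1], repeat=k))
    coeffs = {}
    for mask in product([0, 1], repeat=k):
        S = [i for i in range(k) if mask[i]]
        total = 0
        for x in points:
            l = 1
            for i in S:
                l *= x[i]
            total += F(x) * l
        coeffs[frozenset(S)] = total / len(points)
    return coeffs

def maj3(x):  # majority of three +/-1 bits
    return 1 if sum(x) > 0 else -1

c = fourier_coefficients(maj3, 3)
assert abs(sum(v * v for v in c.values()) - 1) < 1e-9  # Parseval
# MAJ agrees with the dictator x_0 on 3/4 of inputs, so F_hat({0}) = 1/2.
assert abs(c[frozenset([0])] - 0.5) < 1e-9
print("ok")
```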
17.2 Using Fourier analysis to test linearity

Now consider the test to check if f : {0,1}^k → {0,1} is linear: pick at random x, y ∈ {0,1}^k and accept iff f(x) ⊕ f(y) = f(x ⊕ y). If f is linear, then the test accepts with probability one. For the converse, we need to prove that when E_{x,y}[F(x)F(y)F(xy)] is bounded away from zero, there is a large Fourier coefficient.

Lemma 79 If the test accepts f with probability p, then there is a linear function l_S that agrees with f on at least a p fraction of inputs.

Proof: First observe that

Pr[test accepts f] = 1/2 + (1/2) E_{x,y}[F(x)F(y)F(xy)].

This follows from the fact that F(x)F(y)F(xy) is 1 when the test accepts and −1 when it rejects:

E_{x,y}[F(x)F(y)F(xy)] = Pr_{x,y}[test accepts f] − Pr_{x,y}[test rejects f] = 2 Pr_{x,y}[test accepts f] − 1.

Now we compute the expectation, which tells us how likely the test is to accept f. To do this we just use the Fourier expansion of each function and simplify:

E_{x,y}[F(x)F(y)F(xy)]
= E_{x,y}[ Σ_S F̂_S L_S(x) · Σ_T F̂_T L_T(y) · Σ_U F̂_U L_U(xy) ]
= Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U E_{x,y}[L_S(x) L_T(y) L_U(x) L_U(y)]
= Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U (E_x[L_S(x) L_U(x)]) (E_y[L_T(y) L_U(y)])
= Σ_S Σ_T Σ_U F̂_S F̂_T F̂_U (L_S · L_U)(L_T · L_U)
= Σ_S F̂_S³
≤ max_S F̂_S · Σ_S F̂_S² = max_S F̂_S,

where we used L_U(xy) = L_U(x)L_U(y) and the orthonormality of the L_S. So the largest Fourier coefficient of F is an upper bound on the expectation E_{x,y}[F(x)F(y)F(xy)]. Therefore, if Pr[test accepts f] = p, there is an S such that F̂_S ≥ 2p − 1, and since Pr_x[f(x) = l_S(x)] = 1/2 + (1/2) F̂_S, the function f agrees with l_S on at least a p fraction of inputs. □

In particular, if Pr[test accepts f] ≥ 1/2 + ε, then there exists S such that F̂_S ≥ 2ε, so there is a linear function with agreement at least 1/2 + ε with f. Since not more than 1/(4ε²) functions can have that much agreement with f, if we do list decoding for f then we come up with a small list.
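The behavior of the test is easy to observe empirically. The sketch below (an illustration, not part of the original notes) estimates the acceptance probability of the test for a linear function and for a random (far-from-linear) function:

```python
import random

def blr_test(f, k, trials=2000, seed=0):
    """Estimate the acceptance probability of the linearity test
    f(x) XOR f(y) == f(x XOR y) for f on k-bit inputs (as integers)."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        x = rng.randrange(1 << k)
        y = rng.randrange(1 << k)
        if f(x) ^ f(y) == f(x ^ y):
            ok += 1
    return ok / trials

k = 8
linear = lambda x: bin(x & 0b10110001).count("1") % 2  # l_S for a fixed S
random.seed(1)
table = [random.randrange(2) for _ in range(1 << k)]
noisy = lambda x: table[x]  # a random function, far from every linear function

print(blr_test(linear, k))  # 1.0: a linear function always passes
print(blr_test(noisy, k))   # close to 1/2 for a random function
```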
17.3 Testing code words for the Long Code

So far, we have been thinking of the encoding of a message m ∈ {0,1}^k in the Hadamard code as a linear function l_S, where S = {i : m_i = 1}; this is how we thought of the code in worst-case to average-case reductions. Equivalently, the encoding of m is the concatenation of a_1 · m, a_2 · m, ..., a_{2^k} · m, where a_1, ..., a_{2^k} is an enumeration of {0,1}^k: the encoding is the concatenation of the evaluations of all possible linear functions at m. It is also equivalent to think of the encoding as the concatenation of l_{S_1}(m), l_{S_2}(m), ..., l_{S_{2^k}}(m), where S_1, ..., S_{2^k} is an enumeration of the subsets of {1,...,k}.

In fact, for any code, it is possible to think of the encoding as the concatenation of the evaluation of a series of functions at the input value, and no encoding can be longer than the one in which all possible functions are evaluated, unless two of the functions being evaluated are exactly the same. This longest code is the Long Code, used in Håstad's proof system. The Long Code encodes a message m ∈ {1,...,k} as the concatenation of f_1(m), f_2(m), ..., f_{2^k}(m), where f_1, ..., f_{2^k} is an enumeration of all possible functions f : {1,...,k} → {0,1}. One way to think of the encoding of m ∈ {1,...,k} in the Long Code is as a functional M : ({1,...,k} → {0,1}) → {0,1} that, given a function f : {1,...,k} → {0,1}, returns M(f) = f(m). Every error-correcting code can be seen as a subset of this code.

Another equivalent way to think of the encoding of m is as the concatenation of a_1[m], a_2[m], ..., a_{2^k}[m], where a_1, ..., a_{2^k} is an enumeration of {0,1}^k; that is, it is the concatenation of the m-th entry of each string of length k. Equivalently, the encoding of m is a function M : {0,1}^k → {0,1} that, given a string x ∈ {0,1}^k, returns M(x) = x[m]. From this point of view, the encoding of m corresponds to the special linear function l_{{m}}. There is a loss of expressive power in this representation (note that m requires only log k bits of information), but it is convenient for our purposes.

Notice that the Long Code is doubly exponentially long: in order to transmit log k bits of information (since m ∈ {1,...,k}), we need to send an encoding that takes 2^k bits. In Håstad's proof system, the code is only used on inputs of a constant size, so this blowup does not cause a problem.

17.3.1 Pre-Håstad verification

To get some understanding of how the Long Code test works in verification, we'll start by considering the verification model in vogue before Håstad's verifier. In this model, we want the verifier to check that A is actually a code word for the Long Code, and moreover that it encodes some a ∈ {1,...,k} satisfying a given predicate P; that is, for some a with P(a) = 1, A(x) = x[a].

So suppose that after performing some test, we find that A : {0,1}^k → {0,1} has agreement 1 − δ with some l_{{a}}. Then define the vector p ∈ {0,1}^k such that p[i] = P(i) for every i, and do the following test: choose a random x ∈ {0,1}^k and accept if and only if A(p ⊕ x) ⊕ A(x) = 1. For a random x, x ⊕ p is also random, and so there is at least a 1 − 2δ probability that both A(x) = x[a] and A(p ⊕ x) = p[a] ⊕ x[a], in which case A(p ⊕ x) ⊕ A(x) = p[a] = P(a). Thus, if P(a) = 1, the test accepts with probability at least 1 − 2δ, and if P(a) = 0, the test rejects with probability at least 1 − 2δ.

17.3.2 The Long Code test

It remains to handle the case in which A is linear but not a code word for the Long Code, that is, A = l_S for a set S of size different from one; the following test is designed to reject l_S when S is large.

Given A : ({1,...,k} → {0,1}) → {0,1}:

• pick f, g : {1,...,k} → {0,1} uniformly
• pick h : {1,...,k} → {0,1} such that, independently for every x,
  h(x) = 0 with probability 1 − ε and h(x) = 1 with probability ε
• accept iff A(f) ⊕ A(g) = A(f ⊕ g ⊕ h).

Suppose that A = l_{{a}}, a code word of the Long Code. Then with high probability the test will accept, since f(a) ⊕ g(a) = f(a) ⊕ g(a) ⊕ h(a) whenever h(a) = 0, which happens with probability 1 − ε.

Suppose instead that A = l_S for some large set S. Then the test accepts iff

⊕_{x∈S} f(x) ⊕ ⊕_{x∈S} g(x) = ⊕_{x∈S} f(x) ⊕ ⊕_{x∈S} g(x) ⊕ ⊕_{x∈S} h(x),

that is, iff ⊕_{x∈S} h(x) = 0, and the advantage of this event over probability 1/2 gets exponentially close to 0 with the cardinality of S. Finally, if A has no correlation with any linear function, then this test is close enough to a test for linearity that the test will fail with noticeable probability.

17.4 References

The linearity test studied in these notes is due to Blum, Luby and Rubinfeld [BLR93]. The Long Code was introduced by Bellare, Goldreich and Sudan [BGS98]. The analysis that we presented is due to Bellare et al. [BCH+96]. The efficient test is due to Håstad [Hås01].
Exercises
1. Show that if f : {0,1}^n → {0,1} is an arbitrary function, then max_S |F̂_S| ≥ 2^{−n/2}.

2. Compute the Fourier coefficients of the function f(x_1, x_2) = x_1 ∨ x_2.

3. Let f : {0,1}^n → {0,1} and g : {0,1}^m → {0,1} be arbitrary functions, and let h : {0,1}^{n+m} → {0,1} be defined as h(x_1, ..., x_{n+m}) = f(x_1, ..., x_n) ⊕ g(x_{n+1}, ..., x_{n+m}). Compute the Fourier coefficients of h in terms of the Fourier coefficients of f and g.

4. Let n be even and consider the function f(x_1, ..., x_n) = (x_1 ∨ x_2) ⊕ (x_3 ∨ x_4) ⊕ ··· ⊕ (x_{n−1} ∨ x_n). Prove that max_S |F̂_S| = 2^{−n/2}.

5. Suppose that f : {0,1}^n → {0,1} is a function that depends only on its first k variables, that is, suppose that there is a function g : {0,1}^k → {0,1} such that f(x_1, ..., x_n) = g(x_1, ..., x_k). Compute the Fourier coefficients of f in terms of the Fourier coefficients of g.

6. Analyse the following linearity test: given oracle access to a function f : {0,1}^n → {0,1}, pick uniformly at random four points x_1, x_2, x_3, x_4 ∈ {0,1}^n and accept if and only if f(x_1) ⊕ f(x_2) ⊕ f(x_3) ⊕ f(x_4) = f(x_1 ⊕ x_2 ⊕ x_3 ⊕ x_4).
Lecture 18
Testing the Long Code
18.1 Test for Long Code
We now consider a test to distinguish code words of the long code. Given A : {0,1}^k → {0,1}, pick x, y ∈ {0,1}^k uniformly and z ∈ {0,1}^k where the bits of z are i.i.d. with Pr[z[i] = 1] = ε. Accept iff A(x) ⊕ A(y) = A(x ⊕ y ⊕ z).

If A is a code word of the long code, then A = l_{{a}} and the test checks whether x[a] ⊕ y[a] = x[a] ⊕ y[a] ⊕ z[a], succeeding iff z[a] = 0. Hence it succeeds with probability 1 − ε.

We now show that if the test accepts with probability 1/2 + δ, then list decoding of the code word produces a list with size dependent on δ rather than k. Intuitively, this is similar to the linearity test of the last lecture. Consider the case where A is a linear function with large support: A = l_S with |S| large. Then the test checks whether l_S(x) ⊕ l_S(y) = l_S(x) ⊕ l_S(y) ⊕ l_S(z), and succeeds iff l_S(z) = 0; the probability of this occurring is 1/2 + (1/2)(1 − 2ε)^{|S|}. So the test cannot accept with high probability if A is not linear, or if A is linear but has large support.

We now consider the equivalent setting where we replace 0 with 1, 1 with −1, and ⊕ with ×, so that A : {−1,1}^k → {−1,1} and x, y, z ∈ {−1,1}^k. Here the test accepts iff A(x)A(y)A(xyz) = 1. Specifically,

Pr[test accepts A] = 1/2 + (1/2) E_{x,y,z}[A(x)A(y)A(xyz)]

where:

E_{x,y,z}[A(x)A(y)A(xyz)]
= E_{x,y,z}[ Σ_S Â_S L_S(x) · Σ_T Â_T L_T(y) · Σ_U Â_U L_U(x) L_U(y) L_U(z) ]
= Σ_{S,T,U} Â_S Â_T Â_U (E_x[L_S(x) L_U(x)]) (E_y[L_T(y) L_U(y)]) (E_z[L_U(z)])
= Σ_S Â_S³ E_z[L_S(z)]
= Σ_S Â_S³ Π_{i∈S} E[z_i]
= Σ_S Â_S³ (1 − 2ε)^{|S|}
≤ (max_S Â_S (1 − 2ε)^{|S|}) · Σ_S Â_S²
= max_S Â_S (1 − 2ε)^{|S|},

where we used L_U(xyz) = L_U(x) L_U(y) L_U(z), the orthonormality of the L_S (the terms with S ≠ U or T ≠ U vanish), E[z_i] = 1 − 2ε, and Σ_S Â_S² = 1.

Now suppose that the probability that the test accepts is at least 1/2 + δ. Then there is some S with Â_S (1 − 2ε)^{|S|} ≥ 2δ. Since Â_S ≤ 1, this gives:

(1 − 2ε)^{|S|} ≥ 2δ
|S| log(1/(1 − 2ε)) ≤ log(1/(2δ))
|S| ≤ log(1/(2δ)) / log(1/(1 − 2ε)).
This is close to the analysis of Håstad's proof system. The primary difference is that Håstad's system looks at two strings rather than one.
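The gap between a long code word and a large-support parity is easy to see experimentally. The sketch below (an illustration, not part of the original notes) estimates the acceptance probability of the noisy test for a dictator function and for the full parity on k bits:

```python
import random

def noisy_test(A, k, eps, trials=20000, seed=0):
    """Estimate Pr[A(x) ^ A(y) == A(x ^ y ^ z)] where z[i] = 1 w.p. eps."""
    rng = random.Random(seed)
    ok = 0
    for _ in range(trials):
        x = rng.randrange(1 << k)
        y = rng.randrange(1 << k)
        z = sum((1 << i) for i in range(k) if rng.random() < eps)
        if A(x) ^ A(y) == A(x ^ y ^ z):
            ok += 1
    return ok / trials

k, eps = 10, 0.1
dictator = lambda x: (x >> 3) & 1             # l_{a}: a long code word
big_parity = lambda x: bin(x).count("1") % 2  # l_S with |S| = k

p1 = noisy_test(dictator, k, eps)    # about 1 - eps = 0.9
p2 = noisy_test(big_parity, k, eps)  # about 1/2 + (1 - 2*eps)^k / 2 ~ 0.55
print(round(p1, 2), round(p2, 2))
```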
18.2 Håstad's verifier
First recall Raz's verifier:

[Figure: the verifier V reads position i of proof P and position j of proof Q.]

The proof P is over the alphabet Σ, |Σ| = k, and the proof Q is over the alphabet Γ, |Γ| = t. The verifier takes the character at position i from P and the character at position j from Q,
1}k → {0. We digress for a moment to consider a subtle point about the long code test. 1}k → {0. ⊕ of an even number of inputs is not selfdual while ⊕ of an odd number ˆ of inputs is selfdual. Now consider H˚ astad’s veriﬁer: P Q i V j In the proofs for H˚ astad’s veriﬁer. . j are the long code encodings of the characters at those positions in Raz’s Veriﬁer. . B is the long code of Q[j] 1 and π(Q[j]) = P [i]. A(x) ∈ {−1. 1}t → {0. 1}k → {0. What if S = ∅? This is the case where the function is 0 everywhere. 1} into a self dual function A : {0. t} → {1. the binary strings indexed by i. 1} B : {0.where j is selected at random and i is selected at random from a few positions of P . . If A is selfdual then AS = 0 if S is even. A(x) ∈ {0. 1}: A (x) = This is selfdual since: A (1 − x) = 1 − A(x) = 1 − A (x) if x[1] = 0 A(1 − x) = 1 − A (x) if x[1] = 1 116 A(x) if x[1] = 0 1 − A(1 − x) if x[1] = 1 V . k} is as before. We will show that if the test accepts with probability 2 + then Pr[D(A) = π(D(B))] ≥ poly( ). 1} For example. The veriﬁer will accept with probability greater than 1 − if A is the long code of P [i]. . Consider the following procedure called “folding” to convert any function A : {0. . . H˚ astad’s verifer selects i. A function A is selfdual if: A(1 − x) = 1 − A(x) when x. . 1} Here the function π : {1. j the same way as Raz’s so that the problem can be modeled as: A : {0. . 1} A(−x) = −A(x) when x. Then the test passes with probability 1. This would not be a problem if the function is balanced. The veriﬁer accepts iﬀ P [i] = π(Q[i]).
In our case of interest, we have a = P[i], b = Q[j], and π(b) = a; in a valid proof, A = ℓ_{a} and B = ℓ_{b} are long codes. Treating vectors as functions, define (x ◦ π)[i] = x[π(i)], so that x ◦ π ∈ {0,1}^t for x ∈ {0,1}^k.

The test is as follows. Given A : {0,1}^k → {0,1}, B : {0,1}^t → {0,1} and π : {1, . . . , t} → {1, . . . , k}, we pick x ∈ {0,1}^k and y ∈ {0,1}^t uniformly, and we pick z ∈ {0,1}^t where the bits are IID with Pr[z[i] = 1] = ε. The test accepts iff:

A′(x) ⊕ B′(y) = B′((x ◦ π) ⊕ y ⊕ z)

(From now on we will stop distinguishing between A and A′ and between B and B′.) If A = ℓ_{a}, B = ℓ_{b} and π(b) = a, the test checks whether

x[a] ⊕ y[b] = (x ◦ π)[b] ⊕ y[b] ⊕ z[b] = x[a] ⊕ y[b] ⊕ z[b],

and hence the test accepts with probability 1 − ε.

Consider the equivalent setting of A : {−1,1}^k → {−1,1} and B : {−1,1}^t → {−1,1}. Using the Fourier analysis machinery, we know that

Pr[test accepts] = 1/2 + (1/2) E_{x,y,z}[A(x) B(y) B((x ◦ π)yz)].

Define π₂(γ) = {i ∈ {1, . . . , k} : an odd number of j ∈ γ satisfy π(j) = i}, so that Π_{j∈γ} x[π(j)] = Π_{i∈π₂(γ)} x_i, i.e., L_γ(x ◦ π) = L_{π₂(γ)}(x). We can then calculate:

E_{x,y,z}[A(x)B(y)B((x ◦ π)yz)]
  = E_{x,y,z}[ (Σ_α Â_α L_α(x)) · (Σ_β B̂_β L_β(y)) · (Σ_γ B̂_γ L_γ(x ◦ π) L_γ(y) L_γ(z)) ]
  = Σ_{α,β,γ} Â_α B̂_β B̂_γ · E_x[L_α(x) L_γ(x ◦ π)] · E_y[L_β(y) L_γ(y)] · E_z[L_γ(z)]
  = Σ_β Â_{π₂(β)} B̂_β² (1 − 2ε)^{|β|},

where we used the facts that E_x[L_α(x) L_{π₂(γ)}(x)] = 1 if α = π₂(γ) and 0 otherwise, that E_y[L_β(y) L_γ(y)] = 1 if β = γ and 0 otherwise, and that E_z[L_γ(z)] = (1 − 2ε)^{|γ|}.

Now suppose that Pr[test accepts] ≥ 1/2 + δ, so that

2δ ≤ Σ_β Â_{π₂(β)} B̂_β² (1 − 2ε)^{|β|}
   = Σ_{β : Â_{π₂(β)} ≥ δ} Â_{π₂(β)} B̂_β² (1 − 2ε)^{|β|} + Σ_{β : Â_{π₂(β)} < δ} Â_{π₂(β)} B̂_β² (1 − 2ε)^{|β|},

and the second sum is less than δ, since each Â_{π₂(β)} there is less than δ and Σ_β B̂_β² = 1. Hence

δ ≤ Σ_{β : Â_{π₂(β)} ≥ δ} Â_{π₂(β)} B̂_β² (1 − 2ε)^{|β|}.
Multiplying both sides by δ, and using δ ≤ Â_{π₂(β)} inside the sum, yields

δ² ≤ Σ_{β : Â_{π₂(β)} ≥ δ} Â²_{π₂(β)} B̂_β² (1 − 2ε)^{|β|} ≤ Σ_β Â²_{π₂(β)} B̂_β² (1 − 2ε)^{|β|}.

Now consider the following decoding algorithm: D(A) picks α ⊆ {1, . . . , k} with probability Â²_α and outputs an element a ∈ α chosen at random. Similarly, D(B) picks β ⊆ {1, . . . , t} with probability B̂_β² and outputs b ∈ β chosen at random. Using this strategy, and noting that there is a c = O(1/ε) such that (1 − 2ε)^{|β|} ≤ c/|β| for all β, we find:

Pr[D(A) = π(D(B))] ≥ Σ_β Â²_{π₂(β)} B̂_β² (1/|β|)
  ≥ (1/c) Σ_β Â²_{π₂(β)} B̂_β² (1 − 2ε)^{|β|}
  ≥ δ²/c = Θ(εδ²).

This probability does not depend on t or k, so that we can adjust the soundness parameter of Raz's verifier and not have a circular relationship among the variables.
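The decoding procedure D can be illustrated by brute force for small k (a toy sketch; the helper names are my own, not from the notes). By Parseval's identity, the squared Fourier coefficients of a ±1-valued function sum to 1, so they form a probability distribution over sets α:

```python
import itertools, random

def fourier_coefficients(A, k):
    """Â_α = E_x[A(x)·χ_α(x)] over x in {−1,1}^k; α is a bitmask."""
    points = list(itertools.product([-1, 1], repeat=k))
    coeffs = {}
    for alpha in range(2 ** k):
        total = 0.0
        for x in points:
            chi = 1
            for i in range(k):
                if (alpha >> i) & 1:
                    chi *= x[i]
            total += A(x) * chi
        coeffs[alpha] = total / len(points)
    return coeffs

def decode(A, k):
    """D(A): pick α with probability Â_α², then output a random a in α."""
    coeffs = fourier_coefficients(A, k)
    masks = list(coeffs)
    weights = [coeffs[m] ** 2 for m in masks]   # sums to 1 by Parseval
    alpha = random.choices(masks, weights=weights)[0]
    members = [i for i in range(k) if (alpha >> i) & 1]
    return random.choice(members) if members else None

# For the long code of a, i.e. A(x) = x[a-1], all Fourier weight sits
# on the singleton {a}, so decoding recovers a-1 with certainty.
long_code_of_2 = lambda x: x[1]
assert decode(long_code_of_2, 3) == 1
```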
Lecture 19

Circuit Lower Bounds for Parity Using Polynomials

In this lecture, we will talk about circuit lower bounds and take our first step towards proving a lower bound on the size of a constant-depth circuit which computes the XOR of n bits. We will try to prove this result using two different techniques: in this class, we will talk about a proof which uses polynomials, and in the next class we will look at a different proof which uses random restrictions.

19.1 Circuit Lower Bounds for PARITY

Before we go into our proof, let us first clarify what we mean by circuit depth and circuit size. The depth of a circuit is defined as the length of the longest path from an input to the output. The size of a circuit is the number of AND and OR gates in the circuit. Note that, for our purposes, we assume all the gates have unlimited fanin and fanout. We define AC0 to be the class of decision problems solvable by circuits of polynomial size and constant depth. We want to prove the result that PARITY is not in AC0.

Theorem 80 If a circuit of size S and depth d computes PARITY then S ≥ 2^{Ω(n^{1/(d−1)})}.

This bound of 2^{Ω(n^{1/(d−1)})} is actually tight, as the following example shows. Consider the circuit C shown in Figure 19.1, which computes the PARITY of n variables. C comprises a tree of XOR gates of depth d − 1, each gate having fanin n^{1/(d−1)}. Since each XOR gate is a function of n^{1/(d−1)} variables, it can be implemented by a CNF or a DNF of size 2^{n^{1/(d−1)}}. Let us replace alternating layers of XOR gates in the
Figure 19.1: Circuit for computing XOR of n variables.

tree by CNFs and DNFs: for example, we replace gates in the first layer by their CNF implementation, gates in the second layer by their DNF implementation, and so on. This gives us a circuit of depth 2(d − 1) in which each small subcircuit computes the XOR of k = n^{1/(d−1)} variables. Now we can use the associativity of OR to collapse consecutive layers of OR gates into a single layer; the same thing can be done for AND. This gives us a circuit of depth d and size O(n 2^{n^{1/(d−1)}}) which computes PARITY.

19.2 Proof of Circuit Lower Bounds for PARITY using Polynomials

For our proof, we will utilise a property which is common to all circuits of small size and constant depth, and which PARITY does not have: circuits of small size and constant depth can be approximated, with high probability, by low-degree polynomials, whereas a function which agrees with PARITY on a large fraction of its inputs has to have high degree. More formally, we first show that if a function f : {0,1}^n → {0,1} is computed by a circuit of size S and depth d, then there exists a function g : {0,1}^n → ℝ such that Pr_x[f(x) = g(x)] ≥ 3/4 and ĝ_α ≠ 0 only for |α| ≤ O((log S)^{2d}). Then we will show that if a function g : {0,1}^n → ℝ agrees with PARITY on more than a fraction 3/4 of its inputs, then ĝ_α ≠ 0 for some |α| = Ω(√n). From these two results, it is easy to see that PARITY cannot be computed by circuits of constant depth and small size.
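The XOR-to-CNF conversion used in the upper-bound construction above can be checked concretely: the CNF for a k-bit XOR has exactly 2^{k−1} clauses, one ruling out each even-parity assignment (a small self-contained sketch, not from the notes):

```python
import itertools

def xor_cnf(k):
    """CNF for x_1 xor ... xor x_k: one clause per even-parity
    assignment a, falsified exactly on a. Literal +i means x_i,
    -i means its negation."""
    clauses = []
    for a in itertools.product([0, 1], repeat=k):
        if sum(a) % 2 == 0:   # an assignment with parity 0 must be ruled out
            clauses.append([(i + 1) if a[i] == 0 else -(i + 1)
                            for i in range(k)])
    return clauses

def eval_cnf(clauses, x):
    return all(any((x[abs(l) - 1] == 1) == (l > 0) for l in c)
               for c in clauses)

k = 4
cnf = xor_cnf(k)
assert len(cnf) == 2 ** (k - 1)
for x in itertools.product([0, 1], repeat=k):
    assert eval_cnf(cnf, x) == (sum(x) % 2 == 1)
```

With k = n^{1/(d−1)}, this is exactly the 2^{n^{1/(d−1)}}-size gate replacement used in the tight example.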
Lemma 81 Suppose we can describe a distribution G_OR over functions g : {0,1}^k → ℝ such that

• for every g in G_OR, ĝ_α ≠ 0 only for |α| ≤ O((log 1/ε)(log K)), and

• for every x ∈ {0,1}^k, Pr_{g∼G_OR}[g(x) = x_1 ∨ x_2 ∨ · · · ∨ x_k] ≥ 1 − ε,

where K is the maximum fanin of a gate. Also suppose we can describe such a distribution G_AND for AND gates. Then, given a circuit C of size S and depth d, we can find a polynomial h of degree at most O((log S)^{2d}) such that Pr_x[h(x) = C(x)] ≥ 3/4.

Proof: The way we do this is as follows. For every gate in the circuit C, we pick independently an approximating function g_i with parameter ε = 1/(4S), and replace the gate by g_i. Then, for a given input, the probability that the results of all the gates are correctly computed—and hence that the new circuit computes C(x) correctly—is at least 1 − S · (1/(4S)) = 3/4, by a union bound over the gates. Since K, the fanin of any gate, is at most S, each g_i has degree at most O((log S)(log S)) = O((log S)²). Writing

g_i(x_1, . . . , x_n) = 1/2 + (1/2) G_i(1 − 2x_1, 1 − 2x_2, . . . , 1 − 2x_n),   (19.1)

G_i can be written as

G_i = Σ_α ĝ_{i,α} Π_{j∈α} X_j,   (19.2)

where X_i = 1 − 2x_i, so G_i also has degree at most O((log S)²). Now, if we expand out the expression for C(x), as C has depth d, we will get a multilinear polynomial of degree at most O((log S)^{2d}). □

Lemma 82 Suppose there is a function g : {0,1}^n → ℝ such that

• Pr_x[g(x) = x_1 ⊕ x_2 ⊕ · · · ⊕ x_n] ≥ 3/4, and

• ĝ_α = 0 for all sets α such that |α| > d.

Then d = Ω(√n).

Proof: Let G : {1,−1}^n → ℝ be the Fourier transformed version of the function g, that is,

G(X) = 1 − 2g(1/2 − X_1/2, 1/2 − X_2/2, . . . , 1/2 − X_n/2),   (19.3)

where X_i = 1 − 2x_i. Then, by our initial assumption, Pr_X[G(X) = Π_{i=1}^n X_i] ≥ 3/4. Define A to be the set of values over which G agrees with PARITY:

A = {X : G(X) = Π_{i=1}^n X_i}.   (19.4)

Then |A| ≥ (3/4) 2^n. Now consider the set of all functions S = {f : A → ℝ}. These form a vector space of dimension |A|. Any function f in this set can be written as a multilinear polynomial

f(X) = Σ_α f̂_α Π_{i∈α} X_i.   (19.5)
Over A, that is for X ∈ A, we have G(X) = Π_{i=1}^n X_i. Therefore, for every α such that |α| ≥ n/2,

Π_{i∈α} X_i = G(X) Π_{i∉α} X_i.   (19.6)

G(X) is a polynomial of degree at most d, and Π_{i∉α} X_i has degree at most n/2. Therefore, we can replace Π_{i∈α} X_i by a polynomial of degree at most d + n/2. Hence the set of monomials {Π_{i∈α} X_i : |α| ≤ d + n/2} spans the set S, and every function f in S can be written as a polynomial of degree at most d + n/2. As there must be at least |A| such monomials, this means that

Σ_{k=0}^{d+n/2} (n choose k) ≥ (3/4) 2^n.   (19.7)

This can hold only when d = Ω(√n). □

Now we need to prove that we can indeed describe such distributions G_OR and G_AND over functions g_i for OR and AND.

19.3 Approximating OR

The following lemma says that we can approximately represent OR with a polynomial of degree exponentially smaller than the fanin of the OR gate.

Lemma 83 For all k and ε, there exists a distribution G over functions g : {0,1}^k → ℝ such that

1. for every x ∈ {0,1}^k, Pr_{g∼G}[g(x) = x_1 ∨ . . . ∨ x_k] ≥ 1 − ε, and   (19.8)

2. g is a degree O((log 1/ε)(log k)) polynomial.

Proof Idea: We want a random polynomial p : {0,1}^k → ℝ that computes OR. We'll use the notation that x is a vector of k bits, x_i is the ith bit of x, and 0 is the vector of zeros (of the appropriate size based on context). An obvious choice is

p_bad(x_1, . . . , x_k) = 1 − Π_{i∈{1,...,k}} (1 − x_i),   (19.9)

which computes OR with no error. But it has degree k, whereas we'd like it to have logarithmic degree. To accomplish this amazing feat, we'll replace the tests of all k variables with just a few tests of random batches of variables. This gives us a random polynomial which computes OR with one-sided error: when x = 0, we'll have p(x) = 0, and when some x_i = 1, we'll almost always (over our choice of p) have p(x) = 1.
Proof: We pick a random collection C of subsets of the bits of x. (That is, for each S ∈ C we have S ⊆ {1, . . . , k}.) We'll soon see how to pick C, but once the choice has been made, we define our polynomial as

p(x_1, . . . , x_k) = 1 − Π_{S∈C} (1 − Σ_{i∈S} x_i).   (19.10)

Why does p successfully approximate OR? First, regardless of the distribution from which we pick C, we have

p(0, . . . , 0) = 1 − Π_{S∈C} (1 − Σ_{i∈S} 0) = 0,   (19.11)

so Pr[p(0) = 0] = 1. Next, suppose x_1 ∨ . . . ∨ x_k = 1. We have p(x) = 1 iff the product term is zero. The product term is zero iff the sum in some factor is 1. And that, in turn, happens iff there's some S ∈ C which includes exactly one x_i which is 1. So we want the following to be true with high probability:

∃S ∈ C, |{i ∈ S : x_i = 1}| = 1.   (19.12)

Given that we don't want C to be very large (so that the degree of the polynomial is small), we'll have to pick C very carefully. In order to accomplish this, we turn to the Valiant–Vazirani reduction, which you may recall from Lecture 7:

Lemma 84 (Valiant–Vazirani) Let A ⊆ {1, . . . , k}, let a be such that 2^a ≤ |A| ≤ 2^{a+1}, and let H be a family of pairwise independent hash functions of the form h : {1, . . . , k} → {0,1}^{a+2}. Then, if we pick h at random from H, there is a constant probability that there is a unique element i ∈ A such that h(i) = 0:

Pr_{h∼H} [|{i ∈ A : h(i) = 0}| = 1] ≥ 1/8.   (19.13)

With this as a guide, we'll define our collection C in terms of pairwise independent hash functions. Let t > 0 be a value that we will set later in terms of the approximation parameter ε.

• For a ∈ {0, . . . , log k}:
  – For j ∈ {1, . . . , t}:
    ∗ Pick a random pairwise independent hash function h_{a,j} : {1, . . . , k} → {0,1}^{a+2}.
    ∗ Define S_{a,j} = h_{a,j}^{−1}(0) = {i : h_{a,j}(i) = 0}.

Then we let C = {S_{a,j}}_{a∈{0,...,log k}, j∈{1,...,t}}.
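The construction of C just described can be simulated directly. Here is a toy sketch (not from the notes); the pairwise independent family is implemented as random affine maps over GF(2), an assumption consistent with standard constructions but not spelled out in this lecture:

```python
import itertools, math, random

def random_pairwise_hash(k, m):
    """h(i) = M·bits(i) + b over GF(2): a random affine map from
    {0,...,k-1} to {0,1}^m. Such maps form a pairwise independent family."""
    nbits = max(1, (k - 1).bit_length())
    M = [[random.randrange(2) for _ in range(nbits)] for _ in range(m)]
    b = [random.randrange(2) for _ in range(m)]
    def h(i):
        bits = [(i >> j) & 1 for j in range(nbits)]
        return tuple((sum(M[r][j] & bits[j] for j in range(nbits)) + b[r]) % 2
                     for r in range(m))
    return h

def sample_C(k, t):
    """For each scale a and each of t trials, take the preimage of 0
    under a fresh pairwise independent hash."""
    C = []
    for a in range(int(math.log2(k)) + 1):
        for _ in range(t):
            h = random_pairwise_hash(k, a + 2)
            zero = (0,) * (a + 2)
            C.append([i for i in range(k) if h(i) == zero])
    return C

def p(C, x):
    """Evaluate p(x) = 1 - prod_{S in C} (1 - sum_{i in S} x_i)."""
    prod = 1
    for S in C:
        prod *= 1 - sum(x[i] for i in S)
    return 1 - prod

k, t = 64, 10
C = sample_C(k, t)
assert len(C) == t * (int(math.log2(k)) + 1)   # degree |C| = O(t log k)
assert p(C, [0] * k) == 0                      # one-sided error: p(0) = 0
# On x != 0, p(x) = 1 with probability at least 1 - (7/8)^t over C.
```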
Now consider any x ≠ 0 which we're feeding to our OR-polynomial p. Let A be the set of bits of x which are 1, i.e., A = {i : x_i = 1}, and let a be such that 2^a ≤ |A| ≤ 2^{a+1}. Then we have a ∈ {0, . . . , log k}, so C includes t sets S_{a,1}, . . . , S_{a,t}. Consider any one such S_{a,j}. By Valiant–Vazirani, we have

Pr_{h_{a,j}∼H} [|{i ∈ A : h_{a,j}(i) = 0}| = 1] ≥ 1/8,   (19.14)

which implies

Pr [|A ∩ S_{a,j}| = 1] ≥ 1/8.   (19.15)

So the probability that there is some j for which |S_{a,j} ∩ A| = 1 is at least 1 − (7/8)^t, which by the reasoning above tells us that

Pr_p [p(x) = x_1 ∨ . . . ∨ x_k] ≥ 1 − (7/8)^t.   (19.16)

Now, to get a success probability of 1 − ε as required by the lemma, we just pick t = O(log 1/ε). The degree of p will then be |C| = t(log k) = O((log 1/ε)(log k)), which satisfies the degree requirement of the lemma. □

Note that, given this lemma, we can also approximate AND with an exponentially low degree polynomial. Suppose we have some G which approximates OR within ε as above. Then we can construct G′ which approximates AND by drawing g from G and returning g′ such that

g′(x_1, . . . , x_k) = 1 − g(1 − x_1, . . . , 1 − x_k).   (19.17)

Any such g′ has the same degree as g. Also, g′ clearly computes AND on x whenever g computes OR on the complemented input, which happens with probability at least 1 − ε over our choice of g.

19.4 References

The proof of the Parity lower bound using polynomials is due to Razborov [Raz87] and Smolensky [Smo87]. The original proof of the lower bound was different, and we will present it next.
Lecture 20

Lower Bound for Parity Using Random Restrictions

In this lecture we give an alternate proof that parity ∉ AC0, using the technique of random restrictions. This is the original method that was used to prove parity ∉ AC0.

20.1 Random Restrictions

A restriction fixes some inputs of a circuit to constant values and leaves other inputs free. More formally, a restriction is a function ρ : {1, . . . , n} → {0, 1, ∗}. We apply a restriction ρ to a circuit C with n inputs to obtain a restricted circuit Cρ as follows:

• For each i ∈ {1, . . . , n}:
  – If ρ(i) = 0, set the ith input of C to 0.
  – If ρ(i) = 1, set the ith input of C to 1.
  – If ρ(i) = ∗, leave the ith input of C as a free variable.

Now let's see how restrictions are used to prove our main theorem:

Theorem 85 There does not exist a constant-depth, unbounded-fanin, polynomial-size circuit which computes the parity function of its inputs.

Proof Idea: We'll consider only circuits which at the first level consist only of ORs, at the second level consist only of ANDs, and continue alternating in this way; this is depicted in Figure 20.1. (Any circuit can be converted to this form with at most a factor of 2 increase in the depth.) The proof will be by contradiction. First, we show that parity circuits of depth 2 must have exponential width (Lemma 86). Next, we suppose by way of contradiction that we have an AC0 circuit C of some constant depth d which computes parity. We give a way of squashing C down to depth d − 1 while still computing parity on many variables. We can repeat the method d − 2 times to obtain a parity circuit of depth 2 and subexponential size, which contradicts Lemma 86.

Figure 20.1: A circuit with alternating levels of ANDs and ORs.

Restrictions enter in our method of circuit squashing. We apply a random restriction ρ1 to C's inputs, obtaining Cρ1. We prove that the top-level OR gates of Cρ1 are likely to depend on only a constant number of inputs, while a good number of variables still remain free (Corollary 88). Then we show that, after applying a second random restriction ρ2 and obtaining Cρ1ρ2, the second-level AND gates are also likely to depend on only a constant number of inputs (Lemma 89). Cρ1ρ2 will compute parity (or the inverse of parity, which is just as good) on those free variables. Now we flip the first two layers, exchanging the ANDs and ORs, which we can do with at most an exponential size increase of those layers (Lemma 90); this is fine because the size of both layers was made constant by the two rounds of restrictions. After the flip, the second and third layers are both ORs, so we can collapse them into a single layer, obtaining a circuit of depth d − 1 that computes parity on many variables.

Statements and proofs of the lemmas mentioned above follow.

Lemma 86 If a DNF or a CNF computes parity of n variables, then:

1. Each term includes all n variables, and

2. There are at least 2^{n−1} terms.

Proof: We will prove the lemma for CNFs, which have OR gates (terms) at their top level and a single AND of all the ORs at the second level; the proof for DNFs is quite similar.

1. Each term includes all n variables: Suppose by way of contradiction that C has some term t which does not depend on some variable x_i. Then, when all inputs to t are 0, t outputs 0 and the single AND gate on the next level, which is the output of the whole circuit, outputs 0. Now flip the value of x_i. The output of t is still 0, and thus the output of C has not changed. But since we've only changed one variable, the parity has flipped, so we have a contradiction! So every term must depend on all variables.

2. There are at least 2^{n−1} terms: To compute parity, C must output 0 on 2^{n−1} different settings of the input variables. C outputs 0 only when one of the terms (OR gates)
outputs 0. But each OR gate outputs 0 on exactly one setting of the input variables, since by part 1 each term includes all n variables. Thus, C must have at least 2^{n−1} terms. □

Lemma 87 Let ρ be a random restriction defined as

ρ(i) = 0  with probability 1/2 − 1/√n
ρ(i) = 1  with probability 1/2 − 1/√n      (20.1)
ρ(i) = ∗  with probability 2/√n

Then an OR or AND gate with n inputs restricted with ρ depends on at most c inputs with probability at least 1 − 1/n^{c/3}.

Proof Idea: We'll break the proof into two cases. (1) Large OR gates will have some input fixed at 1 w.h.p., and thus the entire gate can be replaced with the constant 1; similarly, large AND gates can be replaced with 0 w.h.p. (2) Small gates will w.h.p. be reduced enough to depend on only c inputs.

Proof: Case 1: Suppose the gate has at least c log n inputs before applying the restriction. Then:

Pr_ρ[gate depends on ≤ c inputs] ≥ Pr_ρ[gate becomes constant]
  ≥ Pr_ρ[some input set to 1 (OR gate) or 0 (AND gate)]
  ≥ 1 − (1/2 + 1/√n)^{c log n}
  ≥ 1 − 1/n^{c/2}
  ≥ 1 − 1/n^{c/3}    (for large enough n).

Case 2: Suppose the gate has at most c log n inputs. Then:

Pr_ρ[gate depends on ≤ c inputs] ≥ 1 − Σ_{t=c+1}^{c log n} (c log n choose t) (2/√n)^t
  ≥ 1 − (c log n)^{c+1} (2/√n)^{c+1}
  ≥ 1 − 1/n^{c/3}    (for large enough n). □

Corollary 88 If a circuit has n^k gates, then there is a restriction such that all top-level gates have fanin at most 4k and at least √n/2 variables remain free.

Proof: We apply Lemma 87 with c = 4k, obtaining a circuit Cρ such that

Pr_ρ[some gate of Cρ depends on > 4k variables] < n^k · (1/n^{4k/3}) = n^{−k/3} ≤ n^{−1/3}   (20.2)

and

Pr_ρ[fewer than √n/2 variables stay free] ≈ 2^{−Ω(√n)},   (20.3)

which follows from a Chernoff bound, since the expected number of free variables is n · (2/√n) = 2√n. This guarantees the existence of a restriction which satisfies the corollary. □

Lemma 89 Given any c-CNF C (in which each OR depends on at most c variables) and any k > 0, there is a constant b(c, k) such that, after applying the random restriction ρ of Lemma 87 to C, with probability more than 1 − 1/n^k, Cρ depends on at most b(c, k) variables.

Proof: The proof is by induction on c.

Base case: c = 1. The OR gates each only have 1 input, so we can replace the entire CNF with a single AND gate. Then, by Lemma 87, Lemma 89 is satisfied when b(1, k) = 3k.

Inductive case: Suppose the lemma holds for c − 1; we'll prove that it's true for c. Let M be a maximal set of disjoint clauses. That is, the clauses in M share no variables (disjointness), and every clause not in M shares a variable with some clause in M (maximality). (You can think of this as picking a maximal matching in the hypergraph whose nodes are the circuit's variables and whose hyperedges are the sets of variables used by each clause.) One of two cases is true:

Case 1: |M| ≥ 2^c k log n. Then:

Pr_ρ[C depends on ≤ b(c, k) inputs] ≥ Pr_ρ[C becomes constant]
  ≥ Pr_ρ[some OR gate unsatisfied]
  ≥ 1 − (1 − (1/2 − 1/√n)^c)^{|M|}
  ≈ 1 − e^{−|M|/2^c}
  > 1 − 1/n^k.

Note that in the third inequality we use the fact that the clauses of M are independent: the probability that a particular OR gate in M is unsatisfied is independent of the probability that any other OR gate in M is unsatisfied.

Case 2: |M| ≤ 2^c k log n. Let V be the set of variables in the clauses of M, so |V| ≤ c|M| = c 2^c k log n. We'll first show that the restriction reduces V to constant size:

Pr_ρ[≤ i variables of V stay free] ≥ 1 − (c 2^c k log n choose i) (2/√n)^i
  ≥ 1 − 1/n^{i/3}    (for large enough n).

So, to get the probability greater than 1 − 1/n^k as required by the lemma, we choose i = 4k. But we're not yet done: we've shown that the gates of M shrink to constant size, but we have said nothing about the gates not in M. To do that, recall that every clause not in M shares a variable with a clause in M. There are 2^{4k} ways to set the at most 4k variables of M that remain free, and under each such setting every clause of Cρ has some input fixed, so the result is a (c − 1)-CNF; we can apply the inductive hypothesis (with parameter k + 1) to conclude that it depends on at most b(c − 1, k + 1) variables. Thus, Cρ depends on at most

b(c, k) = 4k + 2^{4k} b(c − 1, k + 1)   (20.4)

variables. □

Lemma 90 (The Switching Lemma) A CNF or DNF of c variables can be represented as a DNF or a CNF, respectively, of size at most 2^c.

We omit the proof of this lemma, but note that we can actually prove a stronger fact: any function of c variables can be written as a DNF or as a CNF of size at most 2^c.

20.2 Håstad's Result

This section is devoted to a sketch of Håstad's tight result for the complexity of parity with bounded-depth circuits. We begin by stating the key theorem:

Theorem 91 (Switching Lemma) Suppose f is a kCNF¹ over the variables x1, . . . , xn. Pick at random a restriction that leaves a fraction p of the variables unfixed; for each of the n(1 − p) variables that are fixed, independently hardwire 0 or 1 as that variable's value. Then, for every t,

Pr[after the restriction f can be expressed as a tCNF and a tDNF] ≥ 1 − (7pk)^t.

The theorem states that, after making the specified sort of restriction, we get a formula that can be expressed as a simpler DNF (if the original f was a CNF) or a simpler CNF (if the original f was a DNF) with exponentially high probability. We state one more useful result before getting on to the analysis:

¹We can take f to be a kDNF as well by taking complements.
Lemma 92 If f is a kCNF (or a kDNF) for x1 ⊕ x2 ⊕ · · · ⊕ xm, then every term has size m; in particular, the fanin at the top must be at least the number of variables.

Now let C be a depth d circuit for parity of size S. Each level of C will consist entirely of AND gates or of OR gates, starting with AND gates at the top level. Each of the AND gates at the top level is a 1CNF, each clause being a single variable. By the Switching Lemma, we can transform the AND gates of the top level to OR gates with high probability by making a random restriction: we apply the Switching Lemma with t = log S and p = 1/14, so that each of these top-level gates becomes a log S-DNF with probability at least

1 − (7 · (1/14) · 1)^{log S} = 1 − (1/2)^{log S} = 1 − 1/S

(see Figure 20.2). Since the AND gates of the top level have been switched to ORs, we can collapse the first two levels into one level of ORs to get a circuit that computes the parity of n/14 inputs such that the top gates have fanin at most log S (see Figure 20.3).

Next we apply the Switching Lemma to the (new) top level of gates, each of which is a log S-DNF (so k = log S), with p = 1/(14 log S) and t = log S (see Figure 20.4). Thus, each of these OR gates can be computed by a log S-CNF with probability at least

1 − (7 · (1/(14 log S)) · log S)^{log S} = 1 − (1/2)^{log S} = 1 − 1/S,

and, after collapsing the two consecutive levels of ANDs, we get a depth d − 1 circuit that computes the parity of n/(14(14 log S)) inputs such that the top gates have fanin at most log S. Now we continue applying this procedure recursively until we get a depth 2 circuit that computes the parity of n/(14(14 log S)^{d−2}) inputs such that the top gates have fanin at most log S.
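For intuition, the random restriction of Lemma 87, which drives all of these arguments, can be simulated directly. The following toy sketch (helper names are my own, not from the notes) shows a large OR gate collapsing to a constant under ρ:

```python
import random

def sample_rho(n):
    """The restriction of Lemma 87: rho(i) = '*' w.p. 2/sqrt(n),
    else 0 or 1 each w.p. 1/2 - 1/sqrt(n)."""
    p_star = 2 / n ** 0.5
    rho = {}
    for i in range(n):
        u = random.random()
        if u < p_star:
            rho[i] = '*'
        else:
            rho[i] = 0 if u < p_star + (1 - p_star) / 2 else 1
    return rho

def restrict_or_gate(inputs, rho):
    """Apply rho to an OR gate over positive literals `inputs`.
    Returns 1 if some input was fixed to 1 (the gate is constant),
    otherwise the list of inputs that remain free."""
    free = []
    for i in inputs:
        if rho[i] == 1:
            return 1
        if rho[i] == '*':
            free.append(i)
    return free

# An OR gate with 200 inputs: the chance that no input is fixed to 1 is
# about (0.51)^200, i.e. negligible, so the gate becomes the constant 1.
n = 10000
rho = sample_rho(n)
assert restrict_or_gate(list(range(200)), rho) == 1
```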
Figure 20.3: Applying the Switching Lemma to the AND gates transforms them to OR gates, so we can collapse the first two levels to form one level of ORs.

Figure 20.4: We use the Switching Lemma to transform the (new) first level of ORs to ANDs, and then we collapse the two consecutive levels of ANDs to one new level.

By Lemma 92, the fanin at the top of this depth 2 circuit must be at least the number of surviving variables, so

log S ≥ n / (14 (14 log S)^{d−2}),

or S ≥ 2^{n^{1/(d−1)}/14}.

20.3 References

The idea of using random restrictions, as well as the first proof that Parity is not in AC0, is due to Furst, Saxe and Sipser [FSS84]. The lower bound was improved to exponential by Yao [Yao85], and the optimal lower bound is due to Håstad [Hås86].
. xn ) = 0 for all truth assignments to the variables x1 . coNP in much the same way that circuit complexity is related to the question of P vs. . we will consider a situation in which the requirement of a polynomial bound p in (1) is discarded. (Completeness) If ϕ is an unsatisﬁable formula. What we are going to do now is to ﬁx a natural proof system for UNSAT and then attempt to show unconditionally that there are formulas requiring superpolynomial proofs. which is related to the question of NP vs. These statements follow immediately from the deﬁnition of NP. π) rejects. For the purposes of studying proof complexity. This general approach can be likened to the approach physicists take when investigating 133 . π) accepts. . V (ϕ. (Eﬃciency) V runs in polynomial time. .Lecture 21 Proof Complexity In this lecture and in the next one we are going to discuss proof complexity. Thus. and 2. . then there is a π such that V (ϕ. if ϕ(x1 . such that V (ϕ. (Soundness) If ϕ is a satisﬁable formula. with π ≤ p(ϕ). then for all π. we say that V is a sound and complete proof system for UNSAT if 1. For all satisﬁable formulas ϕ and all π. . there exists a proof π. π) rejects (even if the length of π is not bounded by some polynomial in ϕ). There is a polynomialtime veriﬁer V and a polynomial bound p such that for all unsatisﬁable formulas ϕ. If UNSAT is in NP. there is an inﬁnite sequence of unsatisﬁable formulas whose shortest proof sizes grow superpolynomially. . We can now state a necessary and suﬃcient condition for NP to be diﬀerent from coNP in terms of these sound and complete proof systems: NP = coNP if and only if for every sound and complete proof system for UNSAT . where UNSAT is the problem of determining if a Boolean formula ϕ is a contradiction—that is. Recall that NP = coNP if and only if there is some coNPcomplete problem—the complement of an NPcomplete problem—that is in NP. NP. xn of ϕ. equality will hold if UNSAT is in NP. 
then we can make the following precise statements: 1. V (ϕ. . In particular. and 3. 2. π) accepts.
D2 = x1 ∨ x2 . Notice that any algorithm A for SAT. The reason that this is a proof is that the Resolution rule is sound: any assignment to the variables of ϕ that satisﬁes ϕ will also satisfy any clauses derived using the Resolution rule. consider the formula ϕ = x1 ∧ (x1 ∨ x2 ) ∧ (x2 ∨ x3 ) ∧ x3 . Dt is 0 (the false empty clause). using the resolution rule we can deduce D6 = x3 . As an example of how Resolution works. D2 . Hence a lower bound on the proof length of a proof in this system automatically gives a lower bound on the running time of A. . . if we derive a clause that cannot be satisﬁed by any assignment. Thus. 2.physical theory: a physical theory oﬀers some predictions about phenomena that can be observed in nature. j < i and h < i. Dt ) such that 1. 134 and . 3. we can construct a proof as follows (see Figure 4): 1. From D4 = x3 and D6 = x3 . induces a proof system for UNSAT . using the resolution rule we can deduce D5 = x2 . Let D1 = x1 . where each Ci is a clause. x ∨ D . . ∧ Cm . using the Resolution rule x ∨ D. whether it runs in polynomial time or not. . The proof that ϕ is unsatisﬁable is then just the sequence of clauses (D1 . If they are. In this case. . Each Di is either (a) A clause of the original formula ϕ (i. the “theory” is that NP = coNP. 21.1 Resolution Proof System The proof system for UNSAT that we will investigate is Resolution. Cj for some j between 1 and m) or (b) A clause obtained from Dj and Dh . D7 ). . and this theory predicts that for every proof system for UNSAT . . . using the resolution rule we deduce the empty clause D7 = 0. This reasoning shows that the proof system is sound.e. The proof that ϕ is unsatisﬁable is a description of a rejecting computation of A on input ϕ. so the ﬁrst four clauses in the refutation are just the original clauses of ϕ. it follows that ϕ can likewise not be satisﬁed by any assignment to its variables. there will be formulas requiring proofs of superpolynomial length. . 
Given a CNF formula ϕ = C1 ∧ C2 . a proof of the unsatisﬁability of ϕ (also called a “refutation” of ϕ) is a sequence of clauses (D1 . From D3 = x2 ∨ x3 and D5 = x2 . D2 . From D1 = x1 and D2 = x1 ∨ x2 . and we check to see that those predictions are actually correct. 4. we take that to be evidence in support of the theory. and D4 = x3 . D3 = x2 ∨ x3 . . D∨D 2..
with the sequence (D1 .e. . ∨ (xn = an ) The number of clauses in the proof (∗) 1 135 . . . 2. . . ∧ Cm be an unsatisﬁable formula over the n variables x1 . any proof that uses the rule can be converted to a proof that doesn’t without adding to the proof’s length: Lemma 93 If there is a proof for ϕ of length1 k using the Resolution and weakening rules.1: A treelike Resolution proof. as it turns out. every unsatisﬁable formula has a refutation). . the proof can be pictured as a tree (again see Figure 4). .X 1 X 1 X 2 X 2 X 3 X 3 X 2 X 3 0 Figure 21. D2 . Using the weakening rule. To see this.. . x2 . In doing so. xn . Now we want to show that Resolution is complete (i. . . by adding literals. . . D∨x We lose no generality in adding the weakening rule because. so the Resolution proof for ϕ is called treelike. m. In general. In this particular example. Dt ) being a topological sort of that graph. We will not prove this result here. . We claim that we can derive all length n clauses this way. then there is a proof for ϕ of length at most k using the Resolution rule only. . i = 1. a proof can be represented as a directed acyclic graph. . . it will be useful to allow the use of the weakening rule D . suppose that the clause (x1 = a1 ) ∨ (x2 = a2 ) ∨ . Now let ϕ = C1 ∧ C2 ∧ . construct all clauses of length n that can be obtained from some Ci .
and one more application of the Resolution rule gives 0.contradicts (∗). . . Then w ≤ log LT + k n log L) + k. w ≤ O( We ﬁrst state a lemma that will be useful in the proof of the theorem. we can use the Resolution rule to derive all length n−1 clauses. Lemma 95 Let ϕ be as above. but this is impossible since we assumed that (∗) cannot be derived from any Cj by adding literals.. We will prove the lemma and the second part of this theorem in the next lecture. . and the width of ϕx=a is at most d. xi2 . the width of ϕx=1−a is at most d − 1. . we eventually derive the clauses x and x for a variable x ∈ {x1 . The ϕ has width no greater than max d. We will show this by induction on n. where each xij is an xk for some k. then t is the length2 of the proof and the size of the shortest clause in the derivation is the width of ϕ (we only count clauses that are derived—not clauses that are in ϕ). . 1}. so if Ci is contradicted by the restriction of (∗∗) to xi1 . xn }. where We will also refer to the length of a proof as its size since the length is the number of nodes in the proof DAG. . x a variable in ϕ. Continuing this process. xi2 . let LT be the length of the shortest treelike Resolution proof. . the number of variables. xn = 1 − a n 21. . . . Dt ) is a Resolution proof for ϕ. 2 136 . Now take a clause Ci ∈ ϕ over the variables xi1 . and a ∈ {0. but this is a contradiction since we assumed that ϕ is unsatisﬁable. then it has a proof of small width. That this approach can still lead to strong lower bounds for length is the content of the following theorem: Theorem 94 Fix an unsatisﬁable kCNF formula ϕ over n variables. Let w be the minimal width of a proof for ϕ. and let L be the length of the shortest general Resolution proof. . a length n clause that is not derivable using weakening describes a satisfying assignment for ϕ). cannot be derived using weakening.2 Width of a Resolution Proof If (D1 . xil . x2 . 
Proof: The strategy of the proof is to show that if ϕ has a short proof. . . k. . Hence (∗∗) satisﬁes every clause of ϕ. Having derived all length n clauses. showing that ϕ has a refutation and that the proof system is indeed complete. Suppose ϕx=a is a kCNF. A clause is contradicted by exactly one assignment. . It turns out to be easier to prove width lower bounds than length lower bounds directly. (In other words. it follows that Ci is a subset of (∗). we focus on the ﬁrst part of the theorem. xil .. . Clearly the assignment x1 = 1 − a 1 x2 = 1 − a 2 (∗∗) . For now. and b. D2 .
The claim is that if ϕ has a treelike proof of size at most 2^b, then it has a proof of width at most b + k, where b is an integer such that ϕ has a treelike proof of size bounded above by 2^b.

We begin with the base cases. If b = 0, then the proof that ϕ is unsatisfiable is (0), in which case the claim is obvious. If n = 1, then the only possible literals are x and x̄, so any proof will have width 0 (or 1, depending on the conventions one chooses). Again the claim is immediate.

Now suppose ϕ has a treelike proof of size 2^b. The last step of the proof must be a derivation of 0 from x and x̄, where x is a variable in ϕ (see Figure 21.2). Since the total size of the proof is 2^b, the treelike derivation of x or the treelike derivation of x̄ must have size no greater than 2^{b−1}. Then, without loss of generality, assume that the derivation of x is the small one.

[Figure 21.2: The treelike Resolution proof of ϕ consists of two subtrees, which are derivations of x and x̄; a final Resolution step derives 0 from them.]

Suppose that we substitute x = 0 into all the clauses of ϕ and all the clauses we derived from ϕ. The derivation is still correct, since the Resolution rule still applies after we make this substitution, so the treelike derivation of x becomes a derivation of 0, i.e., a proof for ϕx=0. Hence ϕx=0 has a treelike proof of size at most 2^{b−1}, and we can apply the inductive hypothesis to conclude that there is also a proof of ϕx=0 of width at most b + k − 1.

Next make the substitution x = 1 into all the clauses of ϕ and all the clauses derived from ϕ. As above, we get a proof of ϕx=1 of size at most 2^b. But since there is one less variable in ϕx=1, by induction on n there is a proof for ϕx=1 of width at most b + k. The desired conclusion now follows from the lemma. □

21.3 References

The first superpolynomial lower bounds for resolution proof length were proved by Tseitin [Tse70]. The first exponential lower bounds are due to Haken [Hak85]. The notion of width is due to BenSasson and Wigderson [BSW01].
Lecture 22

Lower Bounds on Resolution Proof Length

22.1 Width and Proof Complexity

22.1.1 The Main Theorem

In general, a derivation (ϕ → D) can have different proofs; by its width w(ϕ → D) we mean the width of its minimum-width proof. A proof that derives (ϕ → 0), where 0 is the empty false clause and ϕ is unsatisfiable, is called a refutation of ϕ. The following theorem is about w(ϕ → 0) (written sometimes as simply w(ϕ)), the width of the minimum-width refutation of ϕ.

Theorem 96 (Main Width-Length Theorem) Let ϕ be an unsatisfiable kCNF formula on n variables. Let LT and L be the length of the shortest treelike and shortest general refutation of ϕ respectively. Then w(ϕ) is related to LT and L as:

w(ϕ) ≤ log LT + k    (22.1)
w(ϕ) ≤ O(√(n log L)) + k    (22.2)

This theorem is an attempt at simplifying the study of proof complexity [BSW01]. Instead of working directly with LT and L, we work on proving linear lower bounds on w(ϕ) to get exponential lower bounds on LT and L. That is, if w(ϕ) = Ω(n) and k = o(n), then by Relations (22.1) and (22.2), LT and L = 2^{Ω(n)}. If we replace n with n^ε, though, we can't say anything about L using the sublinear lower bound on w(ϕ), but we can still get a subexponential lower bound on LT.

We proved Relation (22.1) last time, using a simple induction argument on the structure of the treelike refutation. We can't mimic that proof for Relation (22.2), since a single clause in a general refutation can be used later in more than one derivation, leading to a dag structure. We prove Relation (22.2) using the following technical lemma.
Lemma 97 Let πd be the set of clauses of size strictly greater than d in a general refutation π of an unsatisfiable kCNF formula ϕ on n variables. The clauses in πd are in a sense the problematic clauses in π, but if there are not too many of them, we can safely bound the width. If |πd| < (1 − d/2n)^{−b} for some parameter b, then w(ϕ) ≤ d + b + k.

Proof: (of Relation (22.2) of the Width-Length Theorem) Let π be the shortest general refutation of ϕ, with length |π| = L. We obtain the result by using Lemma 97 on the trivial bound |πd| < L:

w(ϕ) ≤ d + b + k
     = d + log_{1/(1 − d/2n)} L + k    (where b is from L = (1 − d/2n)^{−b})
     ≤ d + O((n/d) log L) + k    (using log_{1+ε} x ≈ (1/ε) log x)

The lemma is true for any value of d, and the proof of Relation (22.2) uses the value of d that optimizes this bound: when d = √(n log L) in the relation above, we get the desired upper bound on w(ϕ), namely w(ϕ) ≤ O(√(n log L)) + k. □

22.1.2 Proof of the technical lemmas

In this section, we prove the two technical lemmas used in the proof of the Width-Length theorem. Lemma 98 was used in proving Relation (22.1) last time, and it will also be used in the proof of Lemma 97 this time.

We need more notation before proceeding. For a variable x in ϕ, let x¹ and x⁰ be aliases for the literals x and x̄ respectively. For a ∈ {0, 1}, let ϕx=a be the restriction of ϕ to x = a. That is, ϕx=a is the set of clauses obtained from ϕ after removing all clauses that have x^a in them and removing x^{1−a} from all clauses that have this literal. Note that ϕx=a is a kCNF formula. The restriction of a clause D to x = a, denoted by Dx=a, is obtained similarly.

Lemma 98 If for a ∈ {0, 1}, w(ϕx=1−a) ≤ d − 1 and w(ϕx=a) ≤ d, then w(ϕ) ≤ max(d, k).

Proof: We will first prove the claim: if w(ϕx=1−a) ≤ d − 1, then w(ϕ → x^a) ≤ d. We assume a = 1 for notational ease (the proof for general a extends easily). Let π be a refutation of ϕx=0 with width at most d − 1, and let φ denote the set of clauses in ϕ that have the literal x in them. Then φ' = φx=0 ∩ π denotes the restricted clauses of φ that were used in the proof π. Obtain a new proof π' by copying all clauses in π, and adding x back to the clauses in φ' and their subsequent derivations (to preserve validity). The new proof π' derives x directly if φ' was not an empty set, or indirectly via a weakening rule from the final 0 clause otherwise. Thus, π' is a valid proof of (ϕ → x) with width at most w(π) + 1 = d.

Now we turn to the proof of the lemma, again assuming a = 1. We use the claim above to derive (ϕ → x). Using this x, we then resolve it with all the clauses of ϕ that contain x̄ (the clauses of ϕ with no x in them are kept as they are), to obtain ϕx=1. This is finally fed into the refutation of ϕx=1, to obtain a refutation for ϕ. Since ϕx=1 is a kCNF and the width of both refutations used in the composition is at most d, we get a refutation of ϕ with width at most max(d, k). See Figure 22.1 for a depiction of this composition. □

[Figure 22.1: Composing (ϕ → x) with (ϕx=1 → 0): a width-d derivation of x copied from the refutation of ϕx=0 (with x added back to some clauses), resolved against the width-k clauses of ϕ, feeding the width-d refutation of ϕx=1 that ends in 0.]

We now give the proof of Lemma 97, which was used in proving Relation (22.2) of the Width-Length theorem.

Proof: (of Lemma 97) The proof is by induction on b and n. When b = 0, |πd| < 1 and the lemma is trivially true for any value of n. Similarly, for n = 1 the lemma is trivially true for any value of b, as there are only two literals in the formula.

For the induction step, we need a literal that hits many clauses in πd, and we will indeed have such a literal by the Pigeon Hole Principle (on viewing the 2n literals as holes and the sum total of the sizes of all clauses in πd as the number of pigeons). Specifically, there must be a literal, say x wlog, that occurs in at least (d/2n)|πd| clauses of πd (another way to look at this result is to pick one of the 2n literals at random and consider the expected number of clauses in πd that this literal hits).

We obtain valid refutations of ϕx=0 and ϕx=1 by restricting π to x = 0 and x = 1; the induction step then follows from applying Lemma 98 to these new refutations. By induction on n, we already have w(ϕx=0) ≤ b + d + k. So, we just need to prove that w(ϕx=1) ≤ b + d + k − 1. Here, we are rescued by the fact that x appears in many clauses of πd, and hence the set of clauses that are still wider than d in the restricted refutation πx=1 is small enough, as we show next.
Call this set of still-wide clauses π'd. It is small enough:

|π'd| ≤ (1 − d/2n)|πd| < (1 − d/2n)^{−(b−1)}    (substituting for |πd|)

Since the parameter is now b − 1, w(ϕx=1) ≤ (b − 1) + d + k by induction on b, and we are done with the proof. □

22.2 Width Lower Bounds

22.2.1 The Setting

We establish lower bounds on width by relating them to some natural combinatorial property. In our case, we will use the expansion property of the bipartite graph associated with a CNF formula. We first define this expansion property and state the main theorem relating width and expansion. Next we use this theorem to show that random kCNFs require exponential-length refutations (whp, due to their expansion properties).

Definition 29 (Graph associated with a formula) Let ϕ be a CNF formula on m clauses C1, C2, . . . , Cm and n variables x1, x2, . . . , xn. The graph associated with ϕ (see Figure 22.2) is the undirected bipartite graph Gϕ = (L, R, E), defined by L = {x1, x2, . . . , xn}, R = {C1, C2, . . . , Cm} and

E = {(xi, Cj) | clause Cj contains one of the literals xi or x̄i}.

[Figure 22.2: Graph Gϕ associated with a formula ϕ: variables x1, . . . , xn on one side, clauses C1, . . . , Cm on the other. For example, clause Cp could be xi ∨ xj ∨ xk and Cq could be x̄i ∨ xj ∨ x̄k.]

Definition 30 (Bipartite Expansion) A CNF formula ϕ on m clauses C1, C2, . . . , Cm is (r, c)-expanding if and only if for every S ⊆ {C1, C2, . . . , Cm} with |S| ≤ r,

|N(S)| ≥ (1 + c)|S|,

where N(S) refers to the neighbourhood of S in Gϕ.

Theorem 99 (Main Width-LB (Lower-Bound) Theorem) For positive r and c, if ϕ is (r, c)-expanding and unsatisfiable, then

w(ϕ) ≥ rc/2.

The proof of this theorem is quite interesting and is given in the next two sections. The application of this theorem lies in the context of the Width-Length theorem (Theorem 96), since a linear lower bound on rc is now enough to show an exponential lower bound on the length of any refutation of ϕ. In fact, the next theorem gives such a lower bound for random kCNFs.

Theorem 100 Let ϕ be a random kCNF formula on n variables and ∆n clauses, constructed by picking each clause with k variables independently at random from all possible 2^k·C(n, k) clauses with repetition, where the clause density ∆ and the parameter ε with 0 ≤ ε ≤ 1/2 are constants. Then, whp, ϕ is (n/∆^{1/(k−2)+ε}, ε)-expanding.

For a specific value of ε, say 1/2, we see that random kCNFs are (Ω(n), 1/2)-expanding, and hence, by the Width-LB and Width-Length theorems, they require exponential-length refutations whp (note that they are also unsatisfiable whp for a proper choice of ∆ and k).
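Definition 30 can be checked directly (in time exponential in the number of clauses) by enumerating all small sets of clauses. The following Python sketch is our own illustration; clauses are represented simply as sets of variable names, since Gϕ ignores the signs of literals:

```python
from itertools import combinations

# Brute-force check of Definition 30: phi (a list of clauses, each a set of
# variable names) is (r, c)-expanding iff every nonempty set S of at most r
# clauses satisfies |N(S)| >= (1 + c)|S|, where N(S) is the set of variables
# appearing in S. Exponential in the number of clauses; illustration only.

def is_expanding(clauses, r, c):
    for size in range(1, r + 1):
        for S in combinations(range(len(clauses)), size):
            neighbourhood = set().union(*(clauses[i] for i in S))
            if len(neighbourhood) < (1 + c) * len(S):
                return False
    return True

# Three clauses over disjoint variable triples: any S of size s touches 3s
# variables, so the formula is (3, 2)-expanding but not (3, 2.5)-expanding.
phi = [{"x1", "x2", "x3"}, {"x4", "x5", "x6"}, {"x7", "x8", "x9"}]
print(is_expanding(phi, 3, 2))    # True
print(is_expanding(phi, 3, 2.5))  # False: 3 = |N(S)| < 3.5 already for |S| = 1
```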
22.2.2 Matching and Satisfiability

This section prepares us for the proof of the Width-LB theorem. We look at Hall's theorem first.

Theorem 101 (Hall's Matching Theorem) For a bipartite graph G = (L, R, E), there exists a matching of the vertices in R if and only if for every R' ⊆ R,

|N(R')| ≥ |R'|,

where N(R') refers to the neighbourhood of R' in G.

If there is a matching of the vertices in R, then the condition in the theorem is clearly true. But the theorem's strength lies in the sufficiency of the condition, in spite of possible overlaps between the neighbourhoods of different R's. The proof of this theorem follows from counting arguments and is left as an exercise.

The application of this theorem lies in the following observation. Consider a subformula S of ϕ. If we have a matching of S in Gϕ, then we can satisfy S by assigning a proper truth value to the unique, non-conflicting variable that each of its clauses is matched to. This simple observation, along with Hall's Matching theorem, leads to an interesting result called Tarsi's theorem.

The sufficiency of the condition also leads to the following two corollaries of Hall's theorem. The first corollary follows from the absence of a matching (in Gϕ) of the entire formula ϕ; the second one follows from the presence of a matching (in Gϕ) of every subformula of size at most r (by virtue of ϕ's bipartite expansion property).

Corollary 102 If ϕ is unsatisfiable, then there exists a subset S of the clauses of ϕ for which |S| > |N(S)|.

Corollary 103 For positive r and c, if ϕ is (r, c)-expanding, then every subformula of ϕ having at most r clauses is satisfiable.

We now present Tarsi's theorem and its corollary, which will be used in the proof of the Width-LB theorem in the next section.

Theorem 104 (Tarsi's theorem) For a minimally unsatisfiable CNF formula ϕ on m clauses and n variables, m > n. (Note that minimal unsatisfiability refers to minimality w.r.t. inclusion of clauses; i.e., if we take away any clause from ϕ, it becomes satisfiable.)

Proof: Since ϕ (on clauses C1, C2, . . . , Cm and variables x1, . . . , xn) is unsatisfiable, by Corollary 102 of Hall's theorem we have a subset of clauses S for which |S| > |N(S)|. Let's make sure that S is the maximal such set. If S is the entire formula, we are done with the proof, as |S| = m and |N(S)| = n.

So, let's assume otherwise (i.e., |S| < m) and let S' be an arbitrary subset of {C1, C2, . . . , Cm} − S. By maximality of S, we have |S'| < |N(S') − N(S)| (see Figure 22.3), because otherwise we could add S' to the already maximal S. By Hall's theorem, this result translates to a matching (and hence satisfiability): we can satisfy all the clauses in {C1, C2, . . . , Cm} − S using only the variables in {x1, . . . , xn} − N(S). Since ϕ is minimally unsatisfiable, S can also be satisfied (of course, using only the variables in N(S)), leading to the satisfiability of the entire formula ϕ and resulting in a contradiction. The only way to resolve this contradiction requires S to be the entire formula, the case that we have already proved. □

[Figure 22.3: S is a subset of the clauses with |S| > |N(S)|; S' is an arbitrary subset of {C1, C2, . . . , Cm} − S, whose neighbourhood outside N(S) is N(S') − N(S).]
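The observation that a matched subformula is satisfiable can be made concrete. The sketch below, our own illustration and not from the notes, finds a clause-to-variable matching by augmenting paths and reads off the satisfying assignment (each matched variable is set so that its clause's literal on it becomes true):

```python
# Clauses are sets of (variable, sign) literals. If a subformula S has a
# matching in G_phi (each clause matched to a distinct variable occurring in
# it), then S is satisfiable: set each matched variable to make its clause's
# literal on that variable true.

def match_clauses(clauses):
    match = {}  # variable -> index of the clause matched to it

    def try_assign(i, seen):
        for var, _ in clauses[i]:
            if var in seen:
                continue
            seen.add(var)
            if var not in match or try_assign(match[var], seen):
                match[var] = i
                return True
        return False

    for i in range(len(clauses)):
        if not try_assign(i, set()):
            return None  # Hall's condition fails for some subset of clauses
    return match

def satisfy(clauses):
    match = match_clauses(clauses)
    if match is None:
        return None
    # For each matched (variable, clause), pick the sign making the clause true.
    return {var: sign for var, i in match.items()
            for v, sign in clauses[i] if v == var}

S = [{("x", True), ("y", False)}, {("x", False)}]
assignment = satisfy(S)
print(assignment)  # an assignment satisfying both clauses
```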
Corollary 105 If a CNF formula ϕ on m clauses and n variables minimally implies a clause D of size k, then

m > n − k,

or equivalently, k > n − m. (Note that "minimally implies" again refers to minimality w.r.t. inclusion of clauses; i.e., D can be derived from ϕ but not from any subformula of ϕ.)

Proof: Recall that x¹ and x⁰ are aliases for x and x̄ respectively. If D is (x1^{a1} ∨ . . . ∨ xk^{ak}), the unique assignment that contradicts D is (x1 = 1 − a1, . . . , xk = 1 − ak). Restricting ϕ to this assignment should yield an unsatisfiable formula ϕ(x1=1−a1,...,xk=1−ak) on (n − k) variables and at most m clauses. It is also minimally unsatisfiable, because ϕ minimally implied D in the first place. We obtain the result by direct application of Tarsi's theorem. □

22.2.3 Putting them together

Equipped with the tools of the last section, we are ready for the proof of the Width-LB theorem (Theorem 99). Intuitively, ϕ can't be refuted using only r of its clauses, and we should be able to find a clause that can be derived from a "fairly big" but not "too big" subformula of ϕ. The proof follows from arguing about the size of such a clause in the minimum-width refutation.

Proof: (of Width-LB Theorem) Let π be the minimum-width refutation of ϕ. Define a measure µ (a number between 1 and m) for all clauses D ∈ π as:

µ(D) = |smallest subformula of ϕ that minimally implies D|.

This measure has the following easily verifiable properties.

1. For any clause C in π that is also in ϕ, µ(C) = 1.
2. For the final 0 clause in π, µ(0) ≥ r (from Corollary 103 of Hall's theorem).
3. For a clause D derived from D' and D'' in π, µ(D) ≤ µ(D') + µ(D''). That is, µ is a subadditive measure.

The above properties suggest that there exists a clause in π (say D*) whose measure is such that r/2 ≤ µ(D*) ≤ r. To see this, start from the final 0 clause and pick the higher-measure clause of the two clauses from which it is derived. The measure of the picked clause is at least r/2; if it is also at most r, stop, and otherwise continue this process recursively. Since µ(C) = 1 for a clause C in ϕ, we will ultimately succeed, due to the subadditivity of the measure.

Now, let S be the smallest subformula of ϕ that minimally implies D*. Then, using the fact that r/2 ≤ |S| ≤ r and the corollary of Tarsi's theorem, we get (note how we use the fact that S is "fairly big" but not "too big"):

w(ϕ) ≥ size of D*
     ≥ |N(S)| − |S|        (by Corollary 105 of Tarsi's theorem)
     ≥ (1 + c)|S| − |S|    (by the expansion property of ϕ, since |S| ≤ r)
     = c|S|
     ≥ rc/2.    □
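The recursive search for D* in the proof above can be sketched directly. The toy proof DAG below is our own illustration (nodes carry a measure and their two parents; the measures respect subadditivity):

```python
# Walk down a refutation DAG to find a clause D* with r/2 <= mu(D*) <= r:
# starting from the final 0 clause (mu >= r), repeatedly move to the
# higher-measure parent. Subadditivity guarantees that parent has
# measure >= mu/2 > r/2, so the walk stops inside the window [r/2, r].
# A node is a pair (mu, parents); leaves (clauses of phi) have mu = 1.

def find_midweight(node, r):
    mu, parents = node
    if r / 2 <= mu <= r:
        return node
    assert mu > r and parents  # the walk only continues from wide-measure nodes
    best = max(parents, key=lambda p: p[0])  # higher-measure parent
    return find_midweight(best, r)

leaf = (1, ())
d1 = (5, (leaf, leaf))
d2 = (6, (d1, leaf))      # 6 <= 5 + 1: subadditive
root = (11, (d2, d1))     # the "0" clause; 11 <= 6 + 5, and mu(0) >= r for r = 8
print(find_midweight(root, 8)[0])  # 6, which lies in [4, 8]
```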
Exercises

1. Show that Relation (22.1) of the Width-Length theorem (proved last time) and Lemma 97 can be proven by induction on n alone. (The current proofs use induction on two variables.)

2. Provide a family of examples for which the upper bound on w(ϕ) in Lemma 98 becomes tight. (Note that a tighter bound is min(d + 1, max(d, k)), because there is a composition slightly different from the one in Figure 22.1 that yields a bound of d + 1 on w(ϕ).)

3. Prove Theorem 100 on the expansion property of random kCNFs.

22.3 References

All the results in this lecture are due to BenSasson and Wigderson [BSW01].
Lecture 23

Pseudorandomness and Derandomization

These notes follow quite literally the notes that I wrote for the PCMI school on computational complexity in Summer 2000. There is some redundancy between these notes and the content of past lectures.

23.1 Probabilistic Algorithms versus Deterministic Algorithms

A probabilistic algorithm A(·, ·) is an algorithm that takes two inputs x and r, where x is an instance of some problem that we want to solve, and r is the output of a random source. A random source is an idealized device that outputs a sequence of bits that are uniformly and independently distributed. For example, the random source could be a device that tosses coins, observes the outcome, and outputs it.

A probabilistic algorithm A is good if it is efficient and if, for every x,

Prr[A(x, r) = right answer for x] ≥ 3/4.

We will typically restrict to the case where A solves a decision problem (e.g. it tests whether two read-once branching programs are equivalent). In this case we say that a language L is in BPP if there is a polynomial time algorithm A(·, ·) (polynomial in the length of the first input) such that for every x

x ∈ L ⇒ Prr[A(x, r) = 1] ≥ 3/4
x ∉ L ⇒ Prr[A(x, r) = 1] ≤ 1/4

or, said another way, Prr[A(x, r) = χL(x)] ≥ 3/4.

The choice of the constant 3/4 is clearly quite arbitrary. For any constant 1/2 < p < 1, if we had defined BPP by requiring the probabilistic algorithm to be correct with probability at least p, we would have given an equivalent definition. In fact, for any polynomial p, it would have been equivalent to define BPP by asking the algorithm to be correct with probability at least 1/2 + 1/p(n), and it would have also been equivalent if we had asked the algorithm to be correct with probability at least 1 − 2^{−p(n)}. More generally, for any two polynomials p and q, if for a decision problem L we have a probabilistic polynomial time algorithm A that solves L on every input of length n with probability at least 1/2 + 1/p(n), then there is another probabilistic algorithm A', still running in polynomial time, that solves L on every input of length n with probability at least 1 − 2^{−q(n)}.

For quite a few interesting problems, the only known polynomial time algorithms are probabilistic. A well-known example is the problem of testing whether two multivariate low-degree polynomials, given in an implicit representation, are equivalent. Another example is the problem of extracting "square roots" modulo a prime: there are probabilistic polynomial time algorithms to find roots of polynomials modulo a prime, and so, in particular, to find solutions, if they exist, to equations of the form x² = a (mod p), where p and a are given and p is prime. There is no known deterministic polynomial time algorithm for any of the above problems. It is not clear whether the existence of such probabilistic algorithms suggests that probabilistic algorithms are inherently more powerful than deterministic ones, or that we have not yet been able to find the best possible deterministic algorithms for these problems. In general, it is quite an interesting question to determine the relative power of probabilistic and deterministic computations. This question is the main motivation for the results described in this lecture and the next ones.

23.1.1 A trivial deterministic simulation

Let A be a probabilistic algorithm that solves a decision problem L. On input x of length n, say that A uses a random string r of length m = m(n) and runs in time T = T(n) (note that m ≤ T). It is easy to come up with a deterministic algorithm that solves L in time 2^{m(n)}·T(n): on input x, compute A(x, r) for every r. The correct answer is the one that comes up the majority of the times, so, in order to solve our problem, we just have to keep track, during the computation of A(x, r) for every r, of the number of strings r for which A(x, r) = 1 and the number of strings r for which A(x, r) = 0.
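The enumeration just described can be sketched as follows. The probabilistic "algorithm" A below is a toy stand-in of our own (it deliberately errs on a quarter of the random strings), used only to exercise the simulation:

```python
from itertools import product

# The trivial 2^m * T(n) simulation from Section 23.1.1: run A(x, r) on every
# random string r of length m and output the majority answer.

def derandomize(A, x, m):
    ones = sum(A(x, r) for r in product((0, 1), repeat=m))
    return 1 if ones > 2 ** (m - 1) else 0

# Toy BPP-style predicate deciding "x is odd": it is correct on exactly 3/4 of
# the random strings (it deliberately errs whenever the first two bits are 1).
def A(x, r):
    answer = x % 2
    return answer ^ 1 if r[0] == r[1] == 1 else answer

print(derandomize(A, 7, 4))  # 1: among the 16 runs, 12 answer correctly
print(derandomize(A, 6, 4))  # 0
```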
Notice that the running time of the simulation depends exponentially on the number of random bits used by A, but only polynomially on the running time of A. In particular, if A uses a logarithmic number of random bits, then the simulation is polynomial. Typically, however, a probabilistic algorithm uses a linear, or larger, number of random bits, and so this trivial simulation is exponential. As we will see in the next section, it is not easy to obtain more efficient simulations.

23.1.2 Exponential gaps between randomized and deterministic procedures

For some computational problems (e.g. approximating the size of a convex body), there are probabilistic algorithms that work even if the object on which they operate is exponentially big and given as a black box. In fact, in some cases one can prove that deterministic algorithms cannot solve the same problem in the same setting, unless they use exponential time.
Let us see a particularly clean (but more artificial) example of this situation. Suppose that there is some function f : {0,1}^n × {0,1}^n → {0,1} that is given as an oracle, and that we want to devise an algorithm that, on input x, finds an approximation (say, to within an additive factor 1/10) to the value Pry[f(x, y) = 1]. A probabilistic algorithm would pick points y1, . . . , yt at random, for t = O(1), evaluate f(x, yi), and then output the fraction of i such that f(x, yi) = 1. This will be an approximation to within 1/10 with good probability.

However, a deterministic subexponential algorithm can only look at a negligible fraction of the values f(x, y). Suppose that f is zero everywhere. If our algorithm makes less than 2^{n−1} oracle queries, consider the function g(x, y) that is equal to f on all the points that our algorithm queries, and is 1 elsewhere (note that, by this definition, the queries of the algorithm on input x will be the same for f and g). Then g is almost everywhere one, yet the algorithm will give the same answer as when accessing f, which is everywhere zero. So if the algorithm takes subexponential time, it cannot solve the problem with the required accuracy.

If we want to deterministically and subexponentially simulate BPP algorithms, we have to exploit the fact that a BPP algorithm A(·, ·) is not an arbitrary function, but an efficiently computable one. If we want to come up with a general way of transforming probabilistic procedures into deterministic subexponential procedures, the transformation cannot be described and analyzed by modeling the probabilistic procedure in a "black box" way. More generally, it is impossible to give subexponential time algorithms for all BPP problems by using "relativizing" techniques.¹ It is beyond the scope of these notes to explain what this means, and why it is more general. "Relativizations" are discussed in [Pap94], where it is possible to find pointers to the relevant literature.

¹More precisely, it is impossible to have a subexponential time deterministic "universal derandomization procedure" that, given x and oracle access to an arbitrary function A(·, ·), outputs 1 when Prr[A(x, r) = 1] ≥ 3/4 and outputs 0 when Prr[A(x, r) = 1] ≤ 1/4.

23.2 Derandomization Under Complexity Assumptions

It is still not known how to improve, in the general case, the deterministic simulation of Section 23.1.1, and this is difficult because we still have a very poor understanding of the nature of efficient computations; the observation of Section 23.1.2 shows one of the difficulties in achieving an improvement. In these notes we will see instead that the existence of efficient derandomized algorithms is implied by the intractability of some other problem. That is, the results described in these notes show that it is indeed possible to deterministically simulate probabilistic algorithms in subexponential (or even polynomial) time, provided that certain complexity-theoretic assumptions are true.

It is quite usual in complexity theory that, using reductions, one can show that the answer to some open question is implied by (or even equivalent to) the answer to some other question. Typically, a reduction from a computational problem A to a problem B shows that if B has an efficient algorithm then A also has an efficient algorithm, and, by counterpositive, that if A is intractable then B is also intractable. In general, using reductions one shows that algorithmic assumptions imply algorithmic consequences, and intractability assumptions imply intractability consequences. The nature of the results of these notes is somewhat unusual, however: a hardness condition implies an algorithmic consequence.
From our previous arguments, since there are settings where probabilistic procedures require exponential time to be simulated, one would not conjecture that BPP ⊆ DTIME(2^{n^{o(1)}}). As we will see in a moment, something is wrong with the above intuition: either BPP = EXP, which sounds really impossible (BPP seems to still represent a class of feasible computations, and it would be very surprising if BPP = EXP), or else it must be the case that BPP has subexponential time deterministic algorithms (that will work well only on average, but that would be quite remarkable enough). In the next section we will introduce some notation about computational problems and complexity measures, and then we will state some results about conditional derandomization.

23.2.1 Formal Definitions of Complexity Measures and Complexity Classes

For a decision problem L and an integer n, we denote by Ln the restriction of L to inputs of length n. It will be convenient to think of Ln as a Boolean function Ln : {0,1}^n → {0,1} (with the convention that x ∈ Ln if and only if Ln(x) = 1).

Recall that DTIME(T(n)) is the class of decision problems that can be solved by deterministic algorithms running in time at most T(n) on inputs of length n. We have the classes E = DTIME(2^{O(n)}) and EXP = DTIME(2^{n^{O(1)}}). By definition, we have BPP ⊆ EXP.

For a function f : {0,1}^n → {0,1}, consider the size of the smallest circuit that solves f; denote this number by CC(f). By definition, we have that if C is a circuit with n inputs of size less than CC(f), then there exists an x ∈ {0,1}^n such that C(x) ≠ f(x). Consider now, for every n, the largest s such that for every circuit C of size ≤ s we have Pr_{x∈{0,1}^n}[C(x) = f(x)] ≤ 1/2 + 1/s; denote this number by H(f).

23.2.2 Hardness versus Randomness

Theorem 106 ([IW98]) Suppose BPP ≠ EXP. Then for every BPP language L and every ε > 0 there is a deterministic algorithm A that works in time 2^{n^ε} and, for infinitely many n, solves L on a fraction 1 − 1/n of the inputs of length n.

That is, either BPP = EXP, which sounds really impossible, or else BPP has subexponential time deterministic algorithms that work well on most inputs. This gives a nontrivial simulation of BPP under an uncontroversial assumption.

We can also get an optimal simulation of BPP under an assumption that is much stronger, but quite believable.

Theorem 107 ([IW97]) Suppose there is a problem L in E and a fixed δ > 0 such that, for all sufficiently large n, CC(Ln) ≥ 2^{δn}. Then P = BPP.

We will call the statement "there is a problem L in E and a fixed δ > 0 such that, for all sufficiently large n, CC(Ln) ≥ 2^{δn}" the "IW assumption." Note that if the IW assumption is true, then it is true in the case where

L = {(M, x, 1^k) : machine M halts within 2^k steps on input x}.

Notice also that this L cannot be solved by algorithms running in time 2^{o(n)}, and so it would be a little bit surprising if it could be solvable by circuits of size 2^{o(n)}. In fact, it would be very surprising if circuits of size 2^{.99n} existed for L.
Notice that the assumption in Theorem 107 is stronger than the assumption in Theorem 106 in two ways. On the one hand, the assumption in Theorem 107 refers to circuit size, that is, to a nonuniform measure of complexity, whereas the assumption in Theorem 106 uses a uniform measure of complexity (the running time of probabilistic algorithms). As we will see, nonuniformity buys more than a polynomial speedup. The other difference is that Theorem 107 assumes exponential hardness, while Theorem 106 assumes only superpolynomial hardness.

Similarly, the conclusion of Theorem 107 is stronger than the conclusion of Theorem 106 in two ways. This is reflected in the running time of the consequent deterministic simulations (respectively, polynomial and subexponential). The other difference accounts for the fact that the conclusion of Theorem 107 gives an algorithm that works for all inputs, while the conclusion of Theorem 106 gives an algorithm that works only for most inputs.

The two theorems that we just stated are the extremes of a continuum of results showing that, by making assumptions on the hardness of problems in E and EXP, it is possible to devise efficient deterministic algorithms for all BPP problems. The stronger the assumption, the more efficient the simulation. When one makes the nonuniform assumption that there is a problem in E that requires circuits of size s(n), then the consequence is a deterministic simulation of BPP in time roughly 2^{s^{−1}(n^{O(1)})} [ISW99, SU01, Uma02]. In particular, if one assumes that E requires circuits of size 2^{Ω(n)}, then the simulation runs in time n^{O(1)}; if one assumes that E requires circuits of size n^{log n}, then the simulation runs in time 2^{2^{O(√log n)}}; and so on. So if one assumes that E requires superpolynomial circuits, then BPP can be simulated in time 2^{n^{o(1)}}.

The result of [IW98] does not scale up so well when one is willing to make stronger uniform assumptions. In particular, the following is an open question.

Conjecture 1 Suppose E ⊈ ∩_{δ>0} BPTIME(2^{δn}). Then for every BPP language L there is a deterministic polynomial time algorithm A that, for infinitely many n, solves L on a fraction 1 − 1/n of the inputs of length n.

23.3 Pseudorandom Generators

We say that a function G : {0,1}^t → {0,1}^m is a (s, ε)-pseudorandom generator if for every circuit D of size ≤ s we have

|Prr[D(r) = 1] − Prz[D(G(z)) = 1]| ≤ ε.

Suppose that we have a probabilistic algorithm A such that for inputs x of length n the computation A(x, ·) can be performed by a circuit of size s(n), and that for every x we have Prr[A(x, r) = right answer] ≥ 3/4. Suppose also that we have a (s, 1/8)-pseudorandom generator G : {0,1}^{t(n)} → {0,1}^{m(n)}. Then we can define a new probabilistic algorithm A' such that A'(x, z) = A(x, G(z)). It is easy to observe that for every x we have

Prz[A'(x, z) = right answer] ≥ 5/8.
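The construction A'(x, z) = A(x, G(z)), together with enumeration over all seeds z, can be sketched as follows. The "generator" here is only a placeholder that repeats the seed, not a pseudorandom generator (real constructions such as Nisan-Wigderson are far more involved), and A is a toy algorithm of our own used to exercise the pattern:

```python
from itertools import product

# Derandomization via a generator G: {0,1}^t -> {0,1}^m: define
# A'(x, z) = A(x, G(z)) and enumerate all 2^t seeds, taking the majority.
# With t = O(log m) the enumeration is polynomial in m.

def G(z, m):
    # Placeholder stretch (repeats the seed); for shape only, NOT pseudorandom.
    return tuple(z[i % len(z)] for i in range(m))

def derandomize_with_prg(A, x, t, m):
    ones = sum(A(x, G(z, m)) for z in product((0, 1), repeat=t))
    return 1 if 2 * ones > 2 ** t else 0

# Toy predicate deciding "x is odd"; it errs exactly when the first two bits
# of its random string are both 1, which happens for 1 of the 4 seeds here.
def A(x, r):
    answer = x % 2
    return answer ^ 1 if r[0] == r[1] == 1 else answer

print(derandomize_with_prg(A, 7, 2, 4))  # 1: 3 of the 4 seeds answer correctly
print(derandomize_with_prg(A, 6, 2, 4))  # 0
```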
Using the trivial derandomization, we can then get a deterministic algorithm A'' that always works correctly and whose running time is 2^t times the sum of the running time of A plus the running time of G. If t is logarithmic in m and s, and if G is computable in poly(m, s) time, then the whole simulation runs in deterministic polynomial time.

Notice also that if we have a (s, ε)-pseudorandom generator G : {0,1}^t → {0,1}^m, then for every m' ≤ m we also have, for a stronger reason, a (s, ε)-pseudorandom generator G' : {0,1}^t → {0,1}^{m'} (G' just computes G and omits the last m − m' bits of the output). So there will be no loss in generality if we consider only generators for the special case where, say, s = 2m. (This is not really necessary, but it will help reduce the number of parameters in the statements of theorems.) We have the following easy theorem.

Theorem 108 Suppose there is a family of generators Gm : {0,1}^{O(log m)} → {0,1}^m that are computable in poly(m) time and that are (2m, 1/8)-pseudorandom. Then P = BPP.

Of course this is only a sufficient condition. There could be other approaches to proving (conditionally) P = BPP, without passing through the construction of such strong generators. Unfortunately, we hardly know of any other approach.² Andreev et al. [ACR98] show that in order to deterministically simulate probabilistic algorithms it is enough to construct hitting set generators, a seemingly weaker primitive than a pseudorandom generator. The complicated proof of [ACR98] was simplified in subsequent work [ACRT99, BF99, GW99]. Andreev et al. [ACR99] also show how to construct hitting set generators, but only under very strong complexity assumptions. Miltersen and Vinodchandran [MV99] give a very elegant construction of hitting set generators; it requires a stronger hardness assumption than in [IW97], but [MV99] also gets a stronger conclusion, and it is not known how to prove the main result of [MV99] (about the "derandomization" of two-round interactive proofs) using pseudorandom generators. In any case, the (arguably) most interesting results are proved using pseudorandom generators.

²Some exceptions are discussed below.

23.4 The two main theorems

23.4.1 The Nisan-Wigderson Theorem

Theorem 109 (Special case of [NW94]) Suppose there is L ∈ E and δ > 0 such that, for all sufficiently large n, H(Ln) ≥ 2^{δn}. Then there is a family of generators Gm : {0,1}^{O(log m)} → {0,1}^m that are computable in poly(m) time and that are (2m, 1/8)-pseudorandom (in particular, P = BPP).

Notice the strength of the assumption: for almost every input length n, our problem has to be so hard that even circuits of size 2^{δn} have to be unable to solve the problem correctly on more than a fraction 1/2 + 2^{−δn} of the inputs. For every problem, a circuit of size 1 can certainly solve the problem on a fraction at least 1/2 of the inputs (either by always outputting 0 or by always outputting 1), and a circuit of size 2^n always exists that solves the problem on every input. A circuit of size 2^{δn} can contain, say, the right solution to our problem for every input whose first (1 − δ)n bits are 0; the circuit can give the right answer on these 2^{δn} inputs, and answer always 0 or always 1 (whichever is better) on the other inputs. This way the circuit is good on about a fraction 1/2 + 2^{−(1−δ)n} of the inputs.
and this is about best possible for general problems [ACR97]. although this is not the way it was originally proved. for all suﬃciently large n. It is somewhat more tricky to show that there is in fact even a circuit of size 2(1/3+o(1))n that solves the problem on a fraction 1/2 + 2−(1/3+o(1))n of the inputs. there is a circuit of size 2n/2 that solves the problem on a fraction 1/2 + 2−n/2 of the inputs. H(Ln ) ≥ 2δ n . Then there is L ∈ E and δ > 0 such that. This is quite encouraging: the (believable) IW assumption implies the (a priori less believable) NW assumption. CC(Ln ) ≥ 2δn . IW97]) Suppose there is L ∈ E and δ > 0 such that. Notice how Theorem 107 follows from Theorems 109 and 110. for every problem. 23.2 Worstcase to Averagecase Reduction Theorem 110 ([BFNW93.4. The results on polynomial reconstruction from Lecture 10 imply Theorem 110. in particular. Our proof followed [STV01].So. 154 . for all suﬃciently large n. Imp95a.
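The "trivial derandomization" behind Theorem 108 is mechanical enough to spell out. The sketch below is a toy instance, not the theorem's construction: `A` is a made-up probabilistic decision procedure that is correct on at least 3/4 of the random strings, and the "generator" is just the identity on seeds, a placeholder for a real pseudorandom generator.

```python
from itertools import product

def derandomize(A, G, t):
    """Trivial derandomization: run A on G(z) for all 2^t seeds z and
    output the majority answer, which is correct whenever A answers
    correctly on more than half of the pseudorandom strings."""
    def A_det(x):
        votes = sum(A(x, G(z)) for z in product([0, 1], repeat=t))
        return votes * 2 > 2 ** t
    return A_det

# Toy example: A decides "is the parity of x even?" and errs exactly when
# the first two random bits are both 1, so it is correct w.p. 3/4.
def A(x, r):
    correct = sum(x) % 2 == 0
    return (not correct) if (r[0] == 1 and r[1] == 1) else correct

A_det = derandomize(A, G=lambda z: z, t=4)
```

Running `A_det` enumerates all 16 seeds and takes the majority vote; with a generator of logarithmic seed length this enumeration is polynomial, which is exactly the content of Theorem 108.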
Lecture 24

The Nisan-Wigderson Generator

These notes follow quite literally the notes that I wrote for the PCMI school on computational complexity in Summer 2000. There is some redundancy between these notes and the content of past lectures. In this lecture we prove Theorem 109.

24.1 The Nisan-Wigderson Construction

The Nisan-Wigderson generator is based on the existence of a decision problem L in E such that for almost every input length l we have H(L_l) ≥ 2^{δl}, yet there is a uniform algorithm that solves L_l in 2^{O(l)} time. Our goal is to use these assumptions on L_l to build a generator whose input seed is of length O(l) and whose output is of length 2^{Θ(l)} and indistinguishable from uniform by adversaries of size 2^{Θ(l)}; furthermore, the generator should be computable in time 2^{O(l)}. As we will see in a moment, it is not too hard to construct a generator that maps l bits into l + 1 bits. We will then present the Nisan-Wigderson construction, whose running time and pseudorandomness are as required, and present its analysis.

24.1.1 Unpredictability versus Pseudorandomness

Let f : {0,1}^l → {0,1} be a function such that H(f) ≥ s, and consider the pseudorandom generator G : {0,1}^l → {0,1}^{l+1} defined as G(x) = x · f(x), where '·' is used to denote concatenation. We want to argue that G is a (s − 3, 1/s)-pseudorandom generator. The argument works by contradiction, and consists in the proof of the following result.

Lemma 111 Let f : {0,1}^l → {0,1} be a function, and suppose that there is a circuit D of size s such that

|Pr_x[D(x · f(x)) = 1] − Pr_{x,b}[D(x · b) = 1]| > ε.

Then there is a circuit A of size s + 3 such that

Pr_x[A(x) = f(x)] > 1/2 + ε.

Proof: First of all, we observe that there is a circuit D' of size at most s + 1 such that

Pr_x[D'(x · f(x)) = 1] − Pr_{x,b}[D'(x · b) = 1] > ε    (24.1)

This is because Expression (24.1) is satisfied either by taking D' = D or by taking D' = ¬D.

A way to interpret Expression (24.1) is to observe that when the first l bits of the input of D'(·) are a random string x, D' is more likely to accept if the last bit is f(x) than if the last bit is random (and, for a stronger reason, than if the last bit is 1 − f(x)). This observation suggests the following strategy in order to use D' to predict f: given an input x for which we want to compute f(x), we guess a value b, and we compute D'(x, b). If D'(x, b) = 1, we take it as evidence that b was a good guess for f(x), and we output b. If D'(x, b) = 0, we take it as evidence that b was the wrong guess, and we output 1 − b.

Let A_b be the procedure that we just described. We claim that

Pr_{x,b}[A_b(x) = f(x)] > 1/2 + ε.

The claim is proved by the following derivation:

Pr_{x,b}[A_b(x) = f(x)]
  = Pr_{x,b}[A_b(x) = f(x) | b = f(x)] · Pr_{x,b}[b = f(x)] + Pr_{x,b}[A_b(x) = f(x) | b ≠ f(x)] · Pr_{x,b}[b ≠ f(x)]
  = (1/2) Pr_{x,b}[D'(x, b) = 1 | b = f(x)] + (1/2) Pr_{x,b}[D'(x, b) = 0 | b ≠ f(x)]
  = (1/2) Pr_{x,b}[D'(x, b) = 1 | b = f(x)] + (1/2)(1 − Pr_{x,b}[D'(x, b) = 1 | b ≠ f(x)])
  = 1/2 + (1/2) Pr_{x,b}[D'(x, b) = 1 | b = f(x)] − (1/2) Pr_{x,b}[D'(x, b) = 1 | b ≠ f(x)]
  = 1/2 + Pr_x[D'(x · f(x)) = 1] − Pr_{x,b}[D'(x · b) = 1]
  > 1/2 + ε    (24.2)

From Expression (24.2) we can observe that there must be a b_0 ∈ {0,1} such that

Pr_x[A_{b_0}(x) = f(x)] > 1/2 + ε.

And A_{b_0} is computed by a circuit of size at most s + 3, because A_{b_0}(x) = b_0 ⊕ (¬D'(x, b_0)), which can be implemented with two more gates given a circuit for D'. □

24.1.2 Combinatorial Designs

Consider a family S = (S_1, ..., S_m) of subsets of a universe U. We say the family is a (l, α)-design if, for every i, |S_i| = l, and, for every i ≠ j, |S_i ∩ S_j| ≤ α.
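The predictor built in the proof of Lemma 111 can be exercised numerically. In this sketch `f` and the distinguisher `D` are toy stand-ins (a perfect distinguisher, so the gap is as large as possible); the point is only that the measured advantage of `A_b` matches the lemma's bound.

```python
from itertools import product

l = 4
def f(x):            # toy "hard" function: parity (illustration only)
    return sum(x) % 2

def D(x, b):         # toy distinguisher: accepts exactly when the last bit is f(x)
    return 1 if b == f(x) else 0

def A(x, b):         # Lemma 111's predictor: keep the guess b iff D accepts it
    return b if D(x, b) == 1 else 1 - b

xs = list(product([0, 1], repeat=l))
# distinguishing gap eps = Pr_x[D(x, f(x)) = 1] - Pr_{x,b}[D(x, b) = 1]
eps = (sum(D(x, f(x)) for x in xs) / len(xs)
       - sum(D(x, b) for x in xs for b in (0, 1)) / (2 * len(xs)))
# predictor advantage Pr_{x,b}[A_b(x) = f(x)]
adv = sum(A(x, b) == f(x) for x in xs for b in (0, 1)) / (2 * len(xs))
```

Here `eps` comes out to 1/2 and `adv` to 1, matching the guarantee adv ≥ 1/2 + eps, tight at this extreme.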
Theorem 112 For every integer l and every fraction γ > 0, there is a (l, log m)-design (S_1, ..., S_m) over [t], where t = O(l/γ) and m = 2^{γl}. Moreover, such a design can be constructed in O(2^t t m^2) time.

24.1.3 The Nisan-Wigderson Generator

We will use the following notation: if z is a string in {0,1}^t and S ⊆ [t], then we denote by z_S the string of length |S| obtained from z by selecting the bits indexed by S. For example if z = (0, 1, 0, 0, 1, 1, 0) and S = {1, 3, 5} then z_S = (0, 0, 1).

For a Boolean function f : {0,1}^l → {0,1} and a design S = (S_1, ..., S_m) over [t], the Nisan-Wigderson generator is a function NW_{f,S} : {0,1}^t → {0,1}^m defined as follows:

NW_{f,S}(z) = f(z_{S_1}) · f(z_{S_2}) · · · f(z_{S_m})

24.2 The Reduction from Distinguishing to Predicting

The following lemma implies Theorem 109.

Lemma 113 Let f : {0,1}^l → {0,1} be a Boolean function and S = (S_1, ..., S_m) be a (l, log m)-design over [t]. Suppose D : {0,1}^m → {0,1} is such that

|Pr_r[D(r) = 1] − Pr_z[D(NW_{f,S}(z)) = 1]| > ε.

Then there exists a circuit C of size O(m^2) such that

|Pr_x[D(C(x)) = f(x)] − 1/2| ≥ ε/m.

Proof: The main idea is that if D distinguishes NW_{f,S}(·) from the uniform distribution, then we can find a bit of the output of the generator where this distinction is noticeable. On such a bit, D is distinguishing f(x) from a random bit, and such a distinguisher can be turned into a predictor for f. In order to find the "right bit", we will use the hybrid argument. At this level of abstraction, the analysis is the same as the analysis of the Blum-Micali-Yao generator; however, as the analysis unfolds, we will see major differences.

Let us start with the hybrid argument. We define m + 1 distributions H_0, ..., H_m. H_i is defined as follows: sample a string v = NW_{f,S}(y) for a random y, sample a string r ∈ {0,1}^m according to the uniform distribution, and then concatenate the first i bits of v with the last m − i bits of r. By definition, H_m is distributed as NW_{f,S}(z) for a random z, and H_0 is the uniform distribution over {0,1}^m.

Using the hypothesis of the Lemma, we know that there is a bit b_0 ∈ {0,1} such that

Pr_y[D'(NW_{f,S}(y)) = 1] − Pr_r[D'(r) = 1] > ε,

where D'(x) = b_0 ⊕ D(x).
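For tiny parameters, both objects above can be built directly. The greedy search below is only a brute-force stand-in, not the O(2^t t m^2) procedure behind Theorem 112, and `nw` evaluates NW_{f,S}(z) exactly as in the definition; the parameters and the toy `f` (parity of two bits) are chosen just for illustration.

```python
from itertools import combinations

def greedy_design(t, l, alpha, m):
    """Greedily collect m size-l subsets of {0,...,t-1} with pairwise
    intersections of size at most alpha (feasible only for toy parameters)."""
    design = []
    for cand in combinations(range(t), l):
        if all(len(set(cand) & set(S)) <= alpha for S in design):
            design.append(cand)
            if len(design) == m:
                return design
    raise ValueError("greedy search found no such design")

def nw(f, design, z):
    """NW_{f,S}(z): apply f to the restriction of the seed z to each set S_i."""
    return tuple(f(tuple(z[i] for i in S)) for S in design)

# A (2, 1)-design over [4], and the generator for f = parity of 2 bits:
S = greedy_design(t=4, l=2, alpha=1, m=6)
out = nw(lambda x: (x[0] + x[1]) % 2, S, z=(1, 0, 1, 1))
```

The seed has t = 4 bits and the output m = 6 bits; the point of the design is that this stretch is achieved while any two output bits share at most alpha seed positions.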
We then observe that

ε < Pr_z[D'(NW_{f,S}(z)) = 1] − Pr_r[D'(r) = 1]
  = Pr[D'(H_m) = 1] − Pr[D'(H_0) = 1]
  = Σ_{i=1}^{m} ( Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1] )

In particular, there exists an index i such that

Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1] ≥ ε/m.

Now, recall that H_{i−1} = f(z_{S_1}) · · · f(z_{S_{i−1}}) · r_i · r_{i+1} · · · r_m and H_i = f(z_{S_1}) · · · f(z_{S_{i−1}}) · f(z_{S_i}) · r_{i+1} · · · r_m. We can assume without loss of generality (up to a renaming of the indices) that S_i = {1, ..., l}. Then we can see z ∈ {0,1}^t as a pair (x, y), where x = z_{S_i} ∈ {0,1}^l and y = z_{[t]−S_i} ∈ {0,1}^{t−l}. For every j < i and z = (x, y), let us define f_j(x, y) = f(z_{S_j}); note that f_j(x, y) depends on |S_i ∩ S_j| ≤ log m bits of x and on l − |S_i ∩ S_j| ≥ l − log m bits of y. With this notation we have

Pr_{x,y,r_{i+1},...,r_m}[D'(f_1(x,y), ..., f_{i−1}(x,y), f(x), r_{i+1}, ..., r_m) = 1]
  − Pr_{x,y,r_i,...,r_m}[D'(f_1(x,y), ..., f_{i−1}(x,y), r_i, ..., r_m) = 1] ≥ ε/m

That is, when D' is given a string that contains f_j(x, y) for j < i in the first i − 1 entries, and random bits in the last m − i entries, then D' is more likely to accept the string if it contains f(x) in the i-th entry than if it contains a random bit in the i-th entry. This is good enough to (almost) get a predictor for f. Consider the following algorithm:

Algorithm A
Input: x ∈ {0,1}^l
  Pick random y ∈ {0,1}^{t−l}
  Pick random r_i, ..., r_m ∈ {0,1}
  Compute f_1(x,y), ..., f_{i−1}(x,y)
  If D'(f_1(x,y), ..., f_{i−1}(x,y), r_i, ..., r_m) = 1 output r_i
  Else output 1 − r_i    (24.3)

Let us forget for a moment about the fact that the step of computing f_1(x,y), ..., f_{i−1}(x,y) looks very hard, and let us check that A is a good predictor. Let us denote by A(x, y, r_i, ..., r_m) the output of A on input x and random choices y, r_i, ..., r_m. We have

Pr_{x,y,r}[A(x,y,r) = f(x)]
  = Pr_{x,y,r}[A(x,y,r) = f(x) | r_i = f(x)] · Pr[r_i = f(x)] + Pr_{x,y,r}[A(x,y,r) = f(x) | r_i ≠ f(x)] · Pr[r_i ≠ f(x)]
  = (1/2) Pr_{x,y,r}[D'(f_1(x,y), ..., f_{i−1}(x,y), r_i, ..., r_m) = 1 | r_i = f(x)]
    + (1/2) Pr_{x,y,r}[D'(f_1(x,y), ..., f_{i−1}(x,y), r_i, ..., r_m) = 0 | r_i ≠ f(x)]
  = 1/2 + (1/2) Pr_{x,y,r}[D'(...) = 1 | r_i = f(x)] − (1/2) Pr_{x,y,r}[D'(...) = 1 | r_i ≠ f(x)]
  = 1/2 + Pr_{x,y,r}[D'(f_1(x,y), ..., f_{i−1}(x,y), f(x), r_{i+1}, ..., r_m) = 1] − Pr_{x,y,r}[D'(f_1(x,y), ..., f_{i−1}(x,y), r_i, ..., r_m) = 1]
  = 1/2 + Pr[D'(H_i) = 1] − Pr[D'(H_{i−1}) = 1]
  ≥ 1/2 + ε/m

So A is a good predictor, and it is worthwhile to see whether we can get an efficient implementation. We said we have

Pr_{x,y,r}[A(x, y, r) = f(x)] ≥ 1/2 + ε/m

so there surely exist fixed values c_i, ..., c_m to give to r_i, ..., r_m, and a fixed value w to give to y, such that

Pr_x[A(x, w, c_i, ..., c_m) = f(x)] ≥ 1/2 + ε/m

At this point we are essentially done: in order to implement A, given x, we only have to compute f_j(x, w) for each j < i. Since w is fixed, f_j(x, w) is a function that depends only on ≤ log m bits of x, and so is computable by a circuit of size O(m). Even composing i − 1 < m such circuits, we still have that the sequence f_1(x, w), ..., f_{i−1}(x, w) can be computed, given x, by a circuit of size O(m^2); let C be the circuit of size O(m^2) that, on input x, outputs the string f_1(x,w) · · · f_{i−1}(x,w) · c_i · · · c_m. Finally, we notice that at this point A(x, w, c) is doing the following: output the XOR between c_i and the complement of D'(C(x)). Since c_i is fixed, either A(x, w, c) always equals D(C(x)), or one is the complement of the other. In either case the Lemma follows. □

At this point it should not be too hard to derive Theorem 109.
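The first step of the proof — the telescoping sum over the hybrids H_0, ..., H_m — is a generic averaging argument: if the endpoint acceptance probabilities differ by ε, some adjacent pair must differ by at least ε/m. A toy check with made-up acceptance probabilities:

```python
def best_hybrid_index(ps):
    """Given hypothetical acceptance probabilities ps[i] = Pr[D'(H_i) = 1],
    return the index i >= 1 maximizing the adjacent gap ps[i] - ps[i-1]."""
    return max(range(1, len(ps)), key=lambda i: ps[i] - ps[i - 1])

ps = [0.50, 0.52, 0.51, 0.60, 0.62]   # made-up values for H_0 .. H_4
m = len(ps) - 1
eps = ps[-1] - ps[0]                  # total distinguishing advantage
i = best_hybrid_index(ps)             # here the best gap is at i = 3
```

The averaging argument guarantees `ps[i] - ps[i-1] >= eps / m`, which is the inequality the proof continues from.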
Lecture 25
Extractors and Pseudorandom Generators
25.1 Use of Weak Random Sources
Suppose that we have a probabilistic algorithm A(·, ·) that on inputs of length n runs in time T(n) and uses m(n) random bits. Instead of a perfect source of randomness, we assume that we have a source that produces an output containing some "unpredictability," but that can still be very far from uniform. A very general way of modeling this source is to assume that on input 1^N it outputs a string in {0,1}^N, and that the output string has "some randomness" (a notion that we will formalize and quantify in a moment).

Typically, a good way to quantify the amount of randomness, or unpredictability, in a distribution is to compute its (Shannon) entropy. For a random variable X whose range is {0,1}^N, its entropy is defined as

H(X) = Σ_{a∈{0,1}^N} Pr[X = a] log(1/Pr[X = a]).

Shannon entropy is a very good measure in cases where one is allowed to take multiple samples from the distribution, but in our setting this is not the best measure. Consider for example a distribution X such that X = (0, 0, ..., 0) with probability 1 − 1/√N, and X is uniform with probability 1/√N. Then its Shannon entropy is about √N, which is quite high, yet a sample from X is almost always a useless string of zeroes.

It is a good intuition to think that the amount of randomness contained in the outcome a of a random variable X is log(1/Pr[X = a]). If X has Shannon entropy k, then on average, when we sample from X, we get a value of "randomness" k; however, it can be the case that with very high probability we get almost zero randomness, and with low probability we get high randomness. We would rather have a measure of randomness that is guaranteed to be high almost always or, even better, always. This motivates the definition of min-entropy: a random variable X has min-entropy at least k if for every a in the range of X we have Pr[X = a] ≤ 1/2^k. That is, the min-entropy of X is min_a {log(1/Pr[X = a])}.
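The gap between Shannon entropy and min-entropy in the example above is easy to verify numerically. Here N = 16 keeps the support small enough to enumerate; the distribution is the one from the text, with the uniform branch folded into explicit point masses.

```python
from math import log2, sqrt, isclose

def shannon_entropy(probs):
    return sum(p * log2(1 / p) for p in probs if p > 0)

def min_entropy(probs):
    return min(log2(1 / p) for p in probs if p > 0)

N = 16
q = 1 / sqrt(N)                       # probability of the "uniform" branch
# the all-zeroes string gets mass (1 - q) plus its share of the uniform branch;
# every other string gets q / 2^N
probs = [(1 - q) + q / 2**N] + [q / 2**N] * (2**N - 1)
```

The Shannon entropy comes out a bit above √N bits, while the min-entropy is below half a bit: a single sample is almost always the all-zeroes string despite the "high" entropy.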
Definition 31 A random variable with range {0,1}^N having min-entropy at least k will be called a (N, k)-source.

Given one sample from a (N, k)-source, we would like to be able to simulate any probabilistic algorithm that uses m random bits, where m is close to k. If the simulation is "black box" and takes time T, one can argue that m ≤ k + O(log T). We will not define formally what a black-box simulation is, but we will develop simulations that are black box, so it will come as no surprise that our simulations will work only for m smaller than k; in fact, only for m smaller than k^{1/3}. (This is partly due to oversimplifications in the analysis; one could get k^{.99} with almost the same proof.)
25.2
Extractors
An extractor is a function that transforms a (N, k)-source into an almost uniform distribution. The transformation is done by using a (typically very small) number of additional random bits. Formally, we have the following definitions.

Definition 32 For two random variables X and Y with range {0,1}^m, their variational distance is defined as ||X − Y|| = max_{S⊆{0,1}^m} |Pr[X ∈ S] − Pr[Y ∈ S]|. We say that two random variables are ε-close if their variational distance is at most ε.

Definition 33 A function Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, ε)-extractor if for every (N, k)-source X we have that Ext(X, U_t) is ε-close to uniform, where U_t is the uniform distribution over {0,1}^t.

Equivalently, if Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, ε)-extractor, then for every distribution X ranging over {0,1}^N of min-entropy k, and for every S ⊆ {0,1}^m, we have

|Pr_{a∈X, z∈{0,1}^t}[Ext(a, z) ∈ S] − Pr_{r∈{0,1}^m}[r ∈ S]| ≤ ε.
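At toy sizes, Definitions 32 and 33 can be checked exhaustively. The sketch below computes variational distance via the equivalent half-L1 formula, and checks that a toy map Ext(a, z) = ⟨a, z⟩ mod 2 — used here purely as an illustration, not a construction from the text — sends one particular flat 3-bit source of min-entropy 2 to an exactly uniform output bit.

```python
from itertools import product
from collections import Counter

def variational_distance(p, q, universe):
    """max_S |p(S) - q(S)|, which equals half the L1 distance."""
    return sum(abs(p.get(a, 0) - q.get(a, 0)) for a in universe) / 2

def output_dist(ext, source, t):
    """Distribution of Ext(X, U_t), with the source given as {string: prob}."""
    dist = Counter()
    seeds = list(product([0, 1], repeat=t))
    for a, pa in source.items():
        for z in seeds:
            dist[ext(a, z)] += pa / len(seeds)
    return dist

inner = lambda a, z: sum(ai * zi for ai, zi in zip(a, z)) % 2
# a flat source on 4 strings: min-entropy exactly 2
flat = {a: 0.25 for a in [(0, 0, 1), (0, 1, 0), (1, 0, 0), (1, 1, 1)]}
dist = output_dist(inner, flat, t=3)
```

Since every string in the support is nonzero, the inner product with a uniform seed is unbiased, and the output distribution is exactly uniform for this particular source.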
25.3
Applications
See [Nis96] for an extensive survey. Here we present only one application; another notable application is the construction of expanders.

Suppose that A(·, ·) is a probabilistic algorithm that on an input of length n uses m(n) random bits, and suppose that for every x we have Pr_r[A(x, r) = right answer] ≥ 3/4. Suppose Ext : {0,1}^N × {0,1}^t → {0,1}^m is a (k, 1/4)-extractor. Consider the following algorithm A': on input x ∈ {0,1}^n and weakly random a ∈ {0,1}^N, A' computes A(x, Ext(a, z)) for every z ∈ {0,1}^t, and then it outputs the answer that appears the majority of such 2^t times. We want to argue that A' is correct with high probability if a is sampled from a weak random source of min-entropy slightly higher than k.

Let us fix the input x. Consider the set B of strings a ∈ {0,1}^N for which the algorithm A' makes a mistake:

B = {a : Pr_{z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] < 1/2}

Consider the random variable X that is uniformly distributed over B (clearly, X has min-entropy log |B|). Then we have

Pr_{a∈X, z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] < 1/2

and so

Pr_r[A(x, r) = right answer] − Pr_{a∈X, z∈{0,1}^t}[A(x, Ext(a, z)) = right answer] > 1/4

and then it follows from the property of Ext that X must have min-entropy less than k, that is, |B| ≤ 2^k.

Let now X be a (N, k + 2)-source, and let us execute algorithm A' using X. Each a ∈ B has probability at most 2^{−(k+2)} under X, so Pr_{a∈X}[a ∈ B] ≤ |B| · 2^{−(k+2)} ≤ 1/4, and therefore

Pr_{a∈X}[A'(x, a) = right answer] ≥ 1 − Pr_{a∈X}[a ∈ B] ≥ 3/4.

More generally:

Theorem 114 Suppose A is a probabilistic algorithm running in time T_A(n) and using m(n) random bits on inputs of length n. Suppose we have for every m(n) a construction of a (k(n), 1/4)-extractor Ext_n : {0,1}^N × {0,1}^{t(n)} → {0,1}^{m(n)} running in T_E(n) time. Then A can be simulated in time 2^t · (T_A + T_E) using one sample from a (N, k + 2)-source.
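The simulation pattern of Theorem 114 is a one-liner once Ext is given. In this sketch both `toy_A` and `toy_ext` are placeholders (`toy_ext` even ignores its first argument, which a real extractor of course must not do); the point is only the control flow: enumerate all 2^t seeds and take a majority vote over the answers.

```python
from itertools import product
from collections import Counter

def simulate(A, ext, t, x, a):
    """A'(x, a): run A(x, Ext(a, z)) for every seed z in {0,1}^t and
    output the answer appearing in the majority of the 2^t runs."""
    answers = Counter(A(x, ext(a, z)) for z in product([0, 1], repeat=t))
    return answers.most_common(1)[0][0]

# Toy stand-ins: A errs only on the all-ones random string (1 seed out of 8).
toy_A = lambda x, r: "yes" if r != (1, 1, 1) else "no"
toy_ext = lambda a, z: z               # placeholder, NOT a real extractor
```

With these stand-ins, seven of the eight seeds produce "yes", so the majority vote returns the right answer.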
25.4
An Extractor from Nisan-Wigderson
This is a simplified presentation of results in [Tre01] (see also [RRV99, ISW00]).

Let C : {0,1}^N → {0,1}^N̄ be a polynomial time computable error-correcting code such that any ball of radius at most 1/2 − δ contains at most 1/δ^2 codewords. Such a code exists with N̄ = poly(N, 1/δ). For a string x ∈ {0,1}^N̄, let ⟨x⟩ : {0,1}^{log N̄} → {0,1} be the function whose truth table is x. Let l = log N̄, and let S = (S_1, ..., S_m) be a (l, log m)-design over [t]. Then consider the procedure ExtNW : {0,1}^N × {0,1}^t → {0,1}^m defined as

ExtNW_{C,S}(x, z) = NW_{⟨C(x)⟩,S}(z).

That is, ExtNW first encodes its first input using an error-correcting code, then views it as a function, and finally applies the Nisan-Wigderson construction to such a function, using the second input as a seed.

Lemma 115 For sufficiently large m and for ε > 2^{−m}, ExtNW_{C,S} is a (m^3, 2ε)-extractor.

Proof: Fix a random variable X of min-entropy m^3 and a function D : {0,1}^m → {0,1}; we will argue that

|Pr[D(r) = 1] − Pr_{a∈X, z∈{0,1}^t}[D(ExtNW(a, z)) = 1]| ≤ 2ε.

Let us call a value a bad if it happens that

|Pr[D(r) = 1] − Pr_{z∈{0,1}^t}[D(ExtNW(a, z)) = 1]| > ε

and let us call B the set of bad a. When a is bad, it follows that there is a circuit C' of size O(m^2) such that either D(C'(·)) or its complement agrees with the encoding of a on a fraction 1/2 + ε/m of its entries. Therefore, a is totally specified by D, C', and 2 log(m/ε) additional bits (once we have D and C', we know that the encoding of a sits in a given ball of radius 1/2 − ε/m, together with at most (m/ε)^2 other codewords). Therefore, for a fixed D, the size of B is upper bounded by the number of circuits of size O(m^2), that is 2^{O(m^2 log m)}, times (m/ε)^2, times 2. The total is 2^{O(m^2 log m)}. The probability that an element a taken from X belongs to B is therefore at most 2^{−m^3} · 2^{O(m^2 log m)} < ε for sufficiently large m. We then have

|Pr[D(r) = 1] − Pr_{a∈X, z∈{0,1}^t}[D(ExtNW(a, z)) = 1]|
  ≤ Σ_a Pr[X = a] · |Pr[D(r) = 1] − Pr_{z∈{0,1}^t}[D(ExtNW(a, z)) = 1]|
  ≤ Pr[X ∈ B] + ε ≤ 2ε. □

Theorem 116 Fix a constant ε. For every N and k = N^{Ω(1)} there is a polynomial-time computable (k, ε)-extractor Ext : {0,1}^N × {0,1}^t → {0,1}^m, where m = k^{1/3} and t = O(log N).
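The three-stage structure of ExtNW — encode, view the codeword as a truth table, run Nisan-Wigderson — is easy to phrase as code. Everything below is schematic: `encode` is a stand-in for the list-decodable code C (the identity here, which lacks the decoding property the proof needs), and the design is assumed given, as in Theorem 112.

```python
def ext_nw(encode, design, x, z):
    """ExtNW_{C,S}(x, z) = NW_{<C(x)>, S}(z): encode x, index the codeword
    as a truth table over l-bit inputs, and output its values on the
    seed's restrictions z_{S_i}."""
    codeword = encode(x)
    def f(bits):                           # <C(x)>: truth-table lookup
        return codeword[int("".join(map(str, bits)), 2)]
    return tuple(f(tuple(z[i] for i in S)) for S in design)

# Toy run: identity "code", a (2, 1)-design over [3], a 3-bit seed.
out = ext_nw(lambda x: x, [(0, 1), (0, 2)], x=(0, 1, 1, 0), z=(1, 0, 1))
```

Here the 4-bit "codeword" is read as a truth table on 2-bit inputs, and each output bit is its value at one restriction of the seed.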
Linearity testing over characteristic two.P. SL ⊆ L 3 . A. Sudan.P.E. In Proceedings of the 29th ACM Symposium on Theory of Computing. and J. Andreev. H˚ astad. Optimal bounds for the approximation of boolean functions and some applications. Preliminary version in Proc. pages 230–239.P.F. Andreev. Two theorems on random polynomial time. Clementi. A. Clementi. Armoni. In Proceedings of the 17th ACM Symposium on Theory of Computing. S. A. pages 75– 83. Trevisan. Proof veriﬁcation and hardness of approximation problems. of FOCS’92.E. 4 [ACR98] [ACR99] [ACRT99] [Adl78] [ALM+ 98] [AS98] [ATSWZ97] R.D. pages 421–429. See also [BM88]. Trading group theory for randomness. S. Weak random sources.E.E. Safra. 221:3–18. 1998. 45(1):70–122. A.D. D. 1998. Journal of the ACM. 1985. of FOCS’92. M. 180:243–268. J.D. Clementi.E.P. 1978. 1998. Andreev. Bellare. and J. Arora.F. Theoretical Computer Science. Rolim. [Bab85] L. Journal of the ACM. 42(6):1781–1795. and L.E. and S. 1999. SIAM Journal on Computing. Theoretical Computer Science.D. M. Motwani. A. 1996.Bibliography [ACR97] A. 45(3):501–555.F. Rolim. Zhou. Coppersmith. M. A. In Proceedings of the 19th IEEE Symposium on Foundations of Computer Science. Szegedy. 45(1):179–213. Arora and S. J. A. 1997. 28(6):2103–2116.E. Clementi. and M. Rolim. Lund. hitting sets. TaShma. and BPP simulations. and J. Wigderson. A. Preliminary version in Proc of FOCS’97. A new general derandomization method. C. Sudan. IEEE Transactions on Information Theory. Journal of the ACM. 1997. Kiwi.E. and M. Babai. Worstcase hardness suﬃces for derandomization: A new method for hardness vs randomness tradeoﬀs. A. Preliminary version in Proc. [BCH+ 96] 164 . 1999.F. R. Rolim. Andreev. Probabilistic checking of proofs: A new characterization of NP. Leonard Adleman.
L. 1984. 1990. Interactive proofs a and the hardness of approximating cliques. Safra. 3(4):307–318. and R. Hiding instances in multioracle queries. Free bits. Goldreich. Wigderson. Blum. M. 22:994–1005. Journal of Computer and System Sciences. L. and Mark N. and A. 1999. Equivalence of free boolean graphs can be decided probabilistically in polynomial time. Sudan. 1993. Wegman. 6:675–695. pages 151–158. and M.A. 27(3):804–915. 7(4):343–353. L. Eli BenSasson and Avi Wigderson. 1998. Benny Chor. Fortnow. Computational complexity of probabilistic Turing machines. S. 43(2):268– 292. Nisan. SIAM Journal on Computing. S. Journal of Computer and System Sciences. 17(1):13–27. Szegedy. 48(2). Buhrman and L. Computational Complexity. and Michael Luby. Onesided versus twosided error in probabilistic computation. 1988. 1973. Preliminary version in Proc. circuits. [BF90] [BF99] Donald Beaver and Joan Feigenbaum. BPP has subexponential time simulations unless EXPTIME has publishable proofs. 1993. 1993. O. of STOC’90. The complexity of theorem proving procedures. SIAM Journal on Computing. SIAM Journal on Computing. Oded Goldreich. Joan Feigenbaum and Lance Fortnow. of FOCS’95. Lov´sz. Preliminary version in Proc. Saxe. 1996. Information Processing Letters. Gill. Babai. and Michael Sipser. Preliminary version in Proc. pages 100–109. of FOCS91. In Proceedings of STACS’90. 1980. Chandra. Journal of the ACM. Journal of Computer and System Sciences. Goldwasser. [BDCGL92] Shai BenDavid. Stephen A Cook. 2001. PCP’s and nonapproximability – towards tight results. Babai and S. pages 37–48. 1992. M. Cook. In Proceedings of the 3rd ACM Symposium on Theory of Computing. 44(2):193–219. Selftesting/correcting with applications to numerical problems. Moran. Feige.[BCW80] Manuel Blum. and a hierarchy of complexity classes. Bellare. 1977. [BGS98] M. [BFNW93] L. On the randomselfreducibility of complete sets. 47(3):549– 595. A hierarchy for nondeterministic time complexity. Journal of the ACM. U. 
H. J. Mathematical Systems Theory. ArthurMerlin games: a randomized proof system. Luby. and the polynomialtime hierarchy. 10(2):80–82. Ashok K. Rubinfeld. Parity. S. On the theory of average case complexity. 165 [BLR93] [BM88] [BSW01] [Coo71] [Coo73] [FF93] [FGL+ 96] [FSS84] [Gil77] . 1971. In STACS’99. Merrick L. N. Fortnow. Journal of Computer and System Sciences. Short proofs are narrow: Resolution made simple. Furst. 36:254–276. and M. James B.
1991. Haken. and C. Oded Goldreich. Proofs that yield nothing but their validity. Clique is hard to approximate within n1− . Wigderson. pages 6– 20. Preliminary version in Proc of STOC’85. Lipton. Technical Report TR9550. Journal of the ACM. pages 59–68. and M. N. On the computational complexity of algorithms. O. 1995. M.E. 48(4):798–859. and A. SIAM Journal on Computing. 1991. Improved derandomization of BPP using a hitting set generator. Some optimal inapproximability results. 39:297–308. The knowledge complexity of interactive proof systems. Wigderson. Theoretical Computer Science. 1999. pages 32–42. No better ways to generate hard NP instances than picking uniformly at random. 18(1):186–208. pages 131–137. S. Wigderson. SIAM Journal on Computing. Goldwasser and M. Goldreich. Stearns. In Proceedings of the 18th ACM Symposium on Theory of Computing. Micali. 182:105–142. 38(3). S. R. 1986.[GLR+ 91] P. and A. Sipser. 117:285–306. 1990. N. S. Transactions of the AMS. On Yao’s XOR lemma. O. pages 812–821. Electronic Colloquium on Computational Complexity. 28(4):1364–1396. Nondeterministic space is closed under complementation. Rubinfeld. A pseudorandom generator from any oneway function. In Proceedings of the 18th ACM Symposium on Theory of Computing. Notes on Levin’s theory of averagecase complexity. H˚ astad. H˚ astad. Almost optimal lower bounds for small depth circuits. S. Goldreich and A. Selftesting/correcting for polynomials and for approximate functions. Gemmell. The intractibility of resolution. Russell Impagliazzo and Leonid Levin. 1999. Immerman. J. J. Goldwasser. Technical Report TR97058. Journal of the ACM. 1997. Micali. 1965. J. Luby. R. Nisan. 166 [GMR89] [GMW91] [GNW95] [Gol97] [GS86] [GW99] [Hak85] [H˚ as86] [H˚ as99] [H˚ as01] [HILL99] [HS65] [IL90] [Imm88] . Acta Mathematica. In Proceedings of the 23rd ACM Symposium on Theory of Computing. R. In Proceedings of the 31st IEEE Symposium on Foundations of Computer Science. Impagliazzo. Sudan. 1985. O. 
Private coins versus public coins in interactive proof systems. Wigderson. Rackoﬀ. 1999. 1988. 2001. In RANDOM’99. 1989. 1986. SIAM Journal on Computing. A. Hartmanis and R. and A. L. Goldreich. Levin. 17:935–938. Johan H˚ astad. Johan H˚ astad. Electronic Colloquium on Computational Complexity.
43:169–188. SIAM Journal on Computing. R. 1986. R. A (7/8− )approximation algorithm for MAX 3SAT? In Proceedings of the 38th IEEE Symposium on Foundations of Computer Science. editors. Shaltiel. In Proceedings of the 36th IEEE Symposium on Foundations of Computer Science. pages 302–309. Plenum Press. A. B. In Proceedings of the 12th ACM Symposium on Theory of Computing. Wigderson. R. In Proceedings of the 10th IEEE Conference on Structure in Complexity Theory. 1998. 1973.M. A personal view of averagecase complexity. Problemi Peredachi Informatsii. Levin. 9:265– 266. Lautemann.W. Complexity of Computer Computations. Universal search problems.M. C. 1972. 2000. 1986. and Vijay V. and A. Impagliazzo and A. Mark Jerrum. In Proceedings of the 29th ACM Symposium on Theory of Computing. Impagliazzo and A. R. R. 1995. pages 220–229. Wigderson. pages 1–10. Lipton. Hardcore distributions for somewhat hard problems. Wigderson. Average case complete problems. R. Impagliazzo. Valiant. BPP and the Polynomial Hierarchy. and A.J. R. Russell Impagliazzo. Randomness versus time: Derandomization under a uniform assumption. Karp and R. Information Processing Letters. pages 734–743. Reducibility among combinatorial problems. pages 538–545. Extractors and pseudorandom generators with optimal seed length. pages 134–147. 1999. Leonid Levin.[Imp95a] Russell Impagliazzo. In Proceedings of the 39th IEEE Symposium on Foundations of Computer Science. 1983.E. 1995. R. L. Thatcher. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science. Karloﬀ and U. Nearoptimal conversion of hardness into pseudorandomness. Some connections between nonuniform and uniform complexity classes. Theoretical Computer Science. Miller and J. pages 406–415. Zwick. pages 85–103. 1997. Impagliazzo. Shaltiel. Wigderson. [Imp95b] [ISW99] [ISW00] [IW97] [IW98] [JVV86] [Kar72] [KL80] [KZ97] [Lau83] [Lev73] [Lev86] 167 . P = BP P unless E has subexponential circuits. 1997. In R. 
In Proceedings of the 32nd ACM Symposium on Theory of Computing. 15(1):285–286. Vazirani. Karp. Leslie G. 1980. 17:215–217. Random generation of combinatorial structures from a uniform distribution.
Preliminary version in Proc of FOCS’90. Nisan. 1996. 39(4):859–868. 7(4):357–363. Schwartz. 1980. Algebraic methods for interactive proof systems. Nisan. AddisonWesley. Raz. A. Lower bounds on the size of bounded depth networks over a complete basis with logical addition. 1994. J. N. 27:701–717. Derandomizing ArthurMerlin games using hitting sets. Eﬃcient checking of computations. Preliminary version in Proc. SIAM Journal on Computing. P. pages 44–58. 1970. A parallel repetition theorem. 1992. Papadimitriou. Oneway functions and pseudorandom generators. 1998. and S. 41:598–607.V. 27(3):763–803.[Lev87] [LFKN92] Leonid Levin. Walter J. C. IP = PSPACE. 4(1). In Proceedings of the 33rd IEEE Symposium on Foundations of Computer Science. Fortnow. Relationships between nondeterministic and deterministic tape complexities. Karloﬀ. Fast probabilistic algorithms for veriﬁcation of polynomial identities. pages 149–158. R. L. O. 1999. T. Combinatorica. In Proceedings of the 31st ACM Symposium on Theory of Computing. H. Hardness vs randomness. Journal of Computer and System Sciences.B. N. E. In Proceedings of the 11th IEEE Conference on Computational Complexity. Shamir. 1987. Wigderson. 1992. Journal of the ACM. Matematicheskie Zametki. Miltersen and N. Extracting randomness: How and why. Preliminary version in Proc. C. In Proceedings of the 40th IEEE Symposium on Foundations of Computer Science. Szemeredi. of FOCS’88. 1992. 1987. Nisan and A. Undirected connectivity in O(log1. of STOC’95. A. Vadhan. Computational Complexity. Wigderson. 39(4):869–877. Journal of the ACM. In Proceedings of STACS’90. 1994.H. Journal of Computer and System Sciences. Reingold. R. Richard J. Lund. and A. Vinodchandran.A. RL ⊆ SC. 4(2):177–192. 1994. Computational Complexity. Savitch. N. Nisan. 49:149–167. pages 71–80. 1990. pages 24–29. Extracting all the randomness and reducing the error in Trevisan’s extractors. Nisan. pages 207–215. Raz. Preliminary version in Proc of FOCS’90. 
[Lip90] [MV99] [Nis94] [Nis96] [NSW92] [NW94] [Pap94] [Raz87] [Raz98] [RRV99] [Sav70] [Sch80] [Sha92] 168 . Journal of the ACM. and N.5 n) space. N. Lipton. Razborov. 1999.
8:189–201. Extractors and pseudorandom generators. 47:85–93. Pseudorandom generators without the XOR lemma. Sudan. Roman Smolensky. Errorcorrection for algebraic block codes. 1983. L. 2001. The complexity of computing the permanent.J. Stockmeyer. 2001. In A. pages 648–657. 3:1–22. Valiant and V.G.470. Studies in Constructive Mathematics and Mathematical Logic. L.G. In Proceedings of the 36th IEEE Symposium on Foundations of Computer Science. pages 77–82. Simple extractors for all minentropies and a new pseudorandom generator. Welch and E. Sudan. 62(2):236–266. L. G. In Proceedings of the 19th ACM Symposium on Theory of Computing. M. 1986.633. 1997. Saks and S. 26:279–284.[Sip83] [Smo87] M. Acta Informatica. of FOCS’96. M. Journal of the ACM. Berlekamp. The complexity of approximate counting. The polynomialtime hierarchy. Journal of Complexity. 48(4):860–879. Slisenko. editor. pages 344–353. R. 1979. Preliminary version in Proc. Stockmeyer. 2002. In Proceedings of the 15th ACM Symposium on Theory of Computing. and S. Decoding of ReedSolomon codes beyond the errorcorrection bound. Trevisan. 13(1):180–193. L. A complexity theoretic apprach to randomness. pages 118–126. O. In Proceedings of the 34th ACM Symposium on Theory of Computing. 1976. 4.R. Vazirani. Theoretical Computer Science. pages 330–335. In Proceedings of the 15th ACM Symposium on Theory of Computing. Zhou. On the complexity of derivation in propositional calculus. 1988. Trevisan. 1995. The method of forced enumeration for nondeterministic aue tomata. pages 627–634. 1983. RSPACE(S) ⊆ DSPACE(S 3/2 ). In Proceedings of the 42nd IEEE Symposium on Foundations of Computer Science. M. Umans. Pseudorandom generators for all hardnesses. Valiant. NP is as easy as detecting unique solutions. Theoretical Computer Science. S.J. 2001. US Patent No. [Sto76] [Sto83] [STV01] [SU01] [Sud97] [SZ95] [Sze88] [Tre01] [Tse70] [Uma02] [Val79] [VV86] [WB86] 169 . Shaltiel and C. L. Tseitin. L. Vadhan. 1970. R. L. pages 115–125. 
C. Theoretical Computer Science. Algebraic methods in the theory of lower bounds for boolean circuit complexity. Sipser. Szelepcs´nyi. Umans.V. 1987. Journal of Computer and System Sciences.R. 1986.
pages 80–91. In Proceedings of the 26th IEEE Symposium on Foundations of Computer Science. A. In Proceedings of the International Symposiumon on Symbolic and Algebraic Computation. In Proceedings of the 23th IEEE Symposium on Foundations of Computer Science. pages 1–10. [Yao85] [Zip79] 170 . Yao. Richard Zippel. 1979. 3:23–34. pages 216 – 226.[Wra76] [Yao82] C. 1985. Wrathall. Andrew C Yao. Separating the polynomialtime hierarchy by oracles. Complete sets for the polynomial hierarchy.C. Theoretical Computer Science. Theory and applications of trapdoor functions. SpringerVerlag. 1976. Probabilistic algorithms for sparse polynomials. 1982.