
Some Problems Are More Complete than Others:

A CSC 590 Challenge

Stefan D. Bruda
stefan@bruda.ca

An overview of the domain is given in Section 1. The particular problem that you need to
present is outlined in Section 2. The more specific requirements of this challenge are revealed in
Section 3. For further preliminaries such as the big-oh notation and algorithm analysis refer to a
good textbook on algorithms [3] and/or a good textbook on theoretical computer science [5].

1 Preliminaries
A decision problem is a problem for which the possible answers are true and false. We henceforth
drop the “decision” qualifier with the understanding that all the problems are decision problems
unless stated otherwise. Given an input w for a (decision) problem π we say that w is a positive
instance of π whenever the answer on input w according to π is true; otherwise w is a negative
instance of π .

1.1 Complexity Classes


The running time of an algorithm is t : N → N whenever the algorithm performs t(n) steps before
terminating on any input of size n. An algorithm runs in s : N → N space if for any input of size
n the algorithm uses no more than s(n) additional memory space at any time. The input itself
does not count toward the s(n) memory allowance, but it is read only; the output does not count
toward the s(n) allowance either, but it is write only (we can produce symbols as output but we
can never read them back).
Why do we exclude the input from the space allowance, you ask? That’s because this is the
only way we can define classes of problems requiring sublinear space (for example, logarithmic
space). How about the output? I am glad you asked. Indeed, we said at the beginning that our
problems are all decision problems, so the size of the output is always one (encoding a true or
false answer) so the space taken by the output is immaterial. However, we also want to define
transformations from input to output (see the concept of a reduction below), where the output
does not necessarily fit into that space.
The complexity classes that will be discussed later include the class NL of problems com-
putable in logarithmic space (s(n) = O(log n)) and the class P of problems computable in poly-
nomial time (t(n) = n^O(1) ). In the same line of thought we will also consider the class POLY-
LOGSPACE (s(n) = (log n)^O(1) ). A slightly more complicated class is NP, which includes those

problems for which it can be determined in polynomial time whether a solution candidate is in-
deed a solution for that problem (note however that an actual solution cannot necessarily be com-
puted in polynomial time).
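To make the verification view of NP concrete, here is a minimal sketch in Python; subset sum is used purely as an illustration, and the function name and its interface are my own choices rather than anything standard:

```python
def verify_subset_sum(numbers, target, certificate):
    # A certificate is a set of indices into `numbers`. Checking it takes
    # time polynomial (in fact linear) in the size of the input, which is
    # exactly what membership in NP requires; *finding* a good certificate,
    # by contrast, may be much harder.
    return sum(numbers[i] for i in certificate) == target

print(verify_subset_sum([3, 7, 1, 8], 10, {0, 1}))   # True:  3 + 7 == 10
print(verify_subset_sum([3, 7, 1, 8], 10, {2, 3}))   # False: 1 + 8 == 9
```

Note that the checker never searches for a certificate; it merely validates one that is handed to it, which is the whole point of the definition.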

1.2 The Hardest Problems of Them All


Suppose that you have a class of problems Y and your task is to find the hardest problems in this
class. At first sight the question seems simple, but in reality it is not even well defined. For one thing,
given two problems it may not be that clear whether one is harder than another. Often but not
always the notion of “hardness” is built into the definition of Y . Consider for example the class
NP. In this particular case it is tempting to define hardness based on the running time of the best
known algorithm solving the problem. Problem is, hardness becomes a moving target: a hard
problem can conceivably become easy once a new, better algorithm is developed for that problem.
Even worse, if we consider such a “natural” notion of hardness then each class of problems
will have its particular “natural” definition. For example, classes defined in terms of memory
requirements rather than running time (such as POLYLOGSPACE) will obviously have to have
a different notion of a problem being harder than another. Note that this definition must be less
intuitive than the one for NP: we eventually need to compare problems with fast running time but
big space requirements vs. problems that can be solved using very little space but in a long time.
Because of these numerous pitfalls the hardness of a problem is determined in theoretical com-
puter science on a more functional basis: a problem is “hard” whenever, once we solve it, it is “easy”
to modify that solution so that it becomes a solution for any other problem in Y . This concept
is still intuitive and has the potential of being a universal concept. Except that we also need a
definition for “easy”. The same pitfalls apply to defining how easy a problem is, so we are back to
square one. We can however settle all of this by considering not one but two classes of problems.
In all, the pursuit of finding the “hardest” problems for a given class involves the following
scenario: We consider two classes of decision problems X and Y such that X ⊆ Y . We then
define a reduction from a problem π1 to a problem π2 as the function τπ1 →π2 that transforms any
instance of π1 into an equivalent instance of π2 , meaning that w is a positive instance of π1 if and
only if τπ1 →π2 (w) is a positive instance of π2 . We insist that the reduction, viewed as a problem
in its own right, be in the “inner” class X , that is, “easy”. Note in passing that reductions must be
closed under functional composition (meaning that if τπ1 →π2 is a reduction from π1 to π2 and τπ2 →π3
is a reduction from π2 to π3 then τπ2 →π3 · τπ1 →π2 is a reduction from π1 to π3 , where the operator ·
is defined as ( f · g)( x) = f ( g( x))).
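As a quick sanity check of the composition operator, here is a toy sketch in Python; the two “reductions” below are made up solely to exercise the definition and do not correspond to any real problems:

```python
def compose(f, g):
    # The operator defined above: (f . g)(x) = f(g(x)), i.e. g runs first.
    return lambda x: f(g(x))

# Purely hypothetical "reductions" between imaginary problems:
tau_1_2 = lambda w: w + "#"    # transforms pi1-instances into pi2-instances
tau_2_3 = lambda w: w + w      # transforms pi2-instances into pi3-instances

# Composing them (inner reduction applied first) gives a pi1-to-pi3 reduction:
tau_1_3 = compose(tau_2_3, tau_1_2)
print(tau_1_3("ab"))  # "ab" -> "ab#" -> "ab#ab#"
```

The essential point is the ordering: the reduction out of π1 must be applied before the reduction out of π2 for the composite to map π1 -instances to π3 -instances.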
In this setup the hardest problems for Y (called complete problems) can be defined in a natural
way as follows: A problem π is complete for Y (or Y -complete) whenever π ∈ Y and for any other
problem π ′ ∈ Y there exists a reduction τπ ′ →π from π ′ to π .
Note how nicely this reflects the functional concept of hardness explained above: once the
problem π is found to be complete for Y then any other problem in Y becomes no harder to solve
than π , within an overhead no more complex than X (and recall that those problems in X are the
easy problems).

1.3 Two Simple Examples


Two particular instances of this kind of question have wide consequences in the theory and prac-
tice of computing.

The first instance comes from computability theory. In this case X is the class of recursive
problems (those that admit an algorithmic solution) and Y is the class of recursively enumerable
problems (that can be formulated as computational problems but cannot necessarily be solved
algorithmically). The reductions used in this circumstance are recursive, meaning that they are
functions that can be computed algorithmically.
The second example is the famous P vs. NP question, which was also discussed in CS 567.
Unsurprisingly enough, X is P (problems that feature algorithms running in polynomial time) and
Y is NP (problems whose solutions can be verified in polynomial time, see above). Obviously the
reductions used in this context are all polynomial (functions that are computable algorithmically
in polynomial time).

1.4 The Million Dollar Question


At this point you may wonder why it is interesting to find the hardest problems in a class. It
turns out that complete problems are a great basis for answering the question whether the inclu-
sion X ⊆ Y is strict or not, as follows: We pick some Y -complete problem π and suppose that we
show that π ∈ X . If this is the case, then all the other problems in Y will also belong to X and
so X = Y . Indeed, we can solve any of them by first applying the reduction to π and then solving
the corresponding instance using the algorithm we developed when we showed that π ∈ X ; both
of these processes come from X . Conversely, once we show that π ∉ X then we have effectively
shown that X ⊊ Y . The latter applies to any problem from Y ; however, it is still advantageous to
start from a complete problem, since they are the hardest and so the most likely to be outside X .
The inclusion question is almost always very interesting in practice. It is useful for example to
know whether all recursively enumerable problems are recursive. If this is the case, then it follows
that once you can formulate a problem you have a guarantee that you can solve it algorithmically.
This question was answered (negatively: the two classes are not equal) long ago, at the
very dawn of the computing era [7, 8]. The complete problem first considered in this respect was
the halting problem (determining whether a given program terminates or not), but in the mean-
time many other problems have been shown to be complete for recursively enumerable languages
and thus unsolvable (or, technically, non-recursive). This whole theory turns out to have far-reaching
consequences in all aspects of computing, but research-wise it is a closed problem (we know
all there is to know about it).
Another interesting question is whether P and NP are equal to each other. It may appear at
a first sight that NP is not a very interesting family of problems, but actually many problems
(and I really mean many [4]) that are very useful in our daily computing needs turn out to be
NP-complete. As a simple example, the timetable problem (creating an optimal timetable with
no conflicts) is NP-complete. Accordingly, it would be great if problems of this kind were easy to
solve (that is, P = NP). On the other hand, most of the current cryptographic schemes rely on the
fact that NP-complete problems are hard to solve (that is, P ⊊ NP). Accordingly, if we ever show
that the two classes are one and the same then virtually all encrypted communications will break
down.
So are P and NP equal? That is a million dollar question. Literally. It is one of the Millennium
Prize problems so if you manage to solve it you get one cool million dollars, and just possibly
change Internet security forever. The first NP-complete problem was found a couple of genera-
tions ago [2] and people have tried to come up with a proof (one way or another) ever since. It is
widely believed that the two classes are different (and so our Internet applications are all safe).

The quest for showing that P is different from NP is part of complexity theory. Any good
textbook in theoretical computer science covers complexity theory as well as computability theory
(which studies the relation between recursively enumerable and recursive problems), so refer to
such a textbook [5] for further details.

2 Complete Problems and Parallel Computation


This challenge asks you to present a different class of complete problems, namely those problems
that are complete for P. The usefulness of this pursuit comes from a slightly different perspective.
While the two examples discussed above deal with problems that cannot be solved algorithmi-
cally (either at all or in a realistic amount of time), all the problems in the class P are considered
“feasible” or even “easy”, being solvable by polynomial-time algorithms. The question now becomes:
fine, we can solve these problems, but how much faster can we go if we want to solve them in
parallel, on a machine with many processors such as a supercomputer or a GPU?
The first question is what the definition of “faster” should be. We already have polynomial-time
solutions, so we need to go faster than this. The most straightforward answer is to go one level
down from polynomial time (n^O(1) ) to polylogarithmic time ((log n)^O(1) ), so we will go with that.

2.1 The Parallel Computation Thesis


The second problem we encounter is that we do not have a good, universal definition for what
a parallel machine is (like we have for sequential computation). Parallel machines come in lots
of shapes and sizes. We can solve this by not relying on any definition of parallel computation
at all, but considering instead the following parallel computation thesis [1]: The time used by any
(reasonable) parallel machine is polynomially related to the space used by a sequential machine.
What a “reasonable” parallel machine is and why this is a thesis rather than a hard result are
beyond the scope of this project, so we will not elaborate on this. Suffice it to say that the parallel
computation thesis has withstood the test of time just like the equivalent Church-Turing thesis
for sequential computation [5] and so we will take it to be true. As a consequence, we do not
need to go into any further definition of parallel computation. All we need to do is to compare
the classes P (of problems that can be solved sequentially) with POLYLOGSPACE (of problems
that can be solved using polylogarithmic space and thus can be solved fast, or in polylogarithmic
time in parallel). If we find that POLYLOGSPACE includes the whole P then we will know that
all the problems in P can be solved fast in parallel. On the other hand, if we succeed in showing
that the hard problems in P (the P-complete problems) are outside POLYLOGSPACE then the P-
complete problems do not benefit from any parallel implementation; such problems are
called inherently sequential.

2.2 Parallel Complexity Theory


This challenge asks you to describe the main part of the parallel computation complexity theory by
considering the relation between P and POLYLOGSPACE. To do so, you must define the notion of
P-complete problems. A further wrinkle is that the two examples of complete problems presented
earlier feature two complexity classes arranged in a neat hierarchy (one being included in the
other). Here, there is no known relation between P and POLYLOGSPACE, in fact it is widely

believed that the two are not comparable. The concept of a reduction must therefore rely on a
third class that is a subset of both. The first task of this challenge will establish such a class:

Task 1 Provide a formal proof that NL ⊆ P and NL ⊆ POLYLOGSPACE.

Hint. One of the two inclusions is trivial. For the other one, let π ∈ NL be a decision problem and
let w be some input for π . Consider all the possible contents of the storage plus the current output
(which can be true, false, or “not yet determined”) for an algorithm solving π and working on w.
How many such possible “configurations” do we have? Now arrange these configurations in a
graph so that there exists an edge between configuration ci and configuration c j if and only if c j
can be reached from ci in a single computation step according to the algorithm that solves π . Will
traversing this graph allow us to obtain the answer to our problem? If so, how much time will the
traversal take?
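The traversal suggested by the hint can be sketched as follows; the integer configurations and the successor function below are toy stand-ins of my own invention, since the real ones depend on the algorithm solving π :

```python
from collections import deque

def reachable(start, accepting, successors):
    """Breadth-first search over the configuration graph: returns True iff
    some accepting configuration is reachable from `start`.

    `successors(c)` yields the configurations reachable from c in one step,
    standing in for the transition relation of the (nondeterministic)
    log-space algorithm, which is not modeled here. BFS touches each node
    and edge at most once, so the traversal runs in time polynomial in the
    number of configurations."""
    seen = {start}
    queue = deque([start])
    while queue:
        c = queue.popleft()
        if c in accepting:
            return True
        for nxt in successors(c):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False

# Toy example: configurations are the integers 0..7; one step either
# doubles the configuration or adds 1 (mod 8); configuration 5 accepts.
succ = lambda c: [(2 * c) % 8, (c + 1) % 8]
print(reachable(0, {5}, succ))  # True
```

With logarithmic space there are only polynomially many configurations, which is why this polynomial-time traversal settles one of the two inclusions.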
Now that we have an “inner” class we can proceed with defining a suitable reduction:

Task 2 Formally define reductions for the purpose of defining complete problems for the class P.
Explain carefully how your reductions are closed under functional composition.

Hint. Could it be that your reductions feature space restrictions? If so, it is quite possible that
the allowed space is not enough for storing intermediate results. That is, you cannot just compute
τπ1 →π2 (w) and then run τπ2 →π3 on the result simply because there is not enough space to store that
result. If this is the case then you need to be creative and compute τπ1 →π2 (w) piecemeal instead
of all at once. Note that during this process it is likely that you have no restriction on the running
time (just on space).
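One way to picture the piecemeal recomputation is the following Python sketch; the interfaces are made up for illustration (a real space-bounded construction works on machines, not Python objects). Instead of materializing τπ1 →π2 (w), hand the second reduction an object that recomputes any requested symbol from scratch:

```python
def compose_spacefrugal(tau2, tau1_symbol, tau1_len, w):
    """Run tau2 on tau1(w) without ever storing tau1(w) in full.

    `tau1_symbol(w, i)` recomputes the i-th symbol of tau1(w) from scratch
    every time it is asked for, so only a constant number of intermediate
    symbols exist at any moment -- trading running time for space.
    `tau1_len(w)` gives the length of tau1(w). Both are hypothetical
    interfaces invented for this sketch."""
    class Virtual:
        # Behaves like the string tau1(w) but recomputes symbols on demand.
        def __len__(self):
            return tau1_len(w)
        def __getitem__(self, i):
            return tau1_symbol(w, i)
    return tau2(Virtual())

# Toy example: tau1 reverses w; tau2 counts the a's in its input.
tau1_symbol = lambda w, i: w[len(w) - 1 - i]   # i-th symbol of reversed w
tau1_len = lambda w: len(w)
tau2 = lambda v: sum(1 for i in range(len(v)) if v[i] == "a")
print(compose_spacefrugal(tau2, tau1_symbol, tau1_len, "abca"))  # 2
```

Every symbol request restarts the first reduction, which is exactly the time-for-space trade the hint alludes to.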
Finally, we get to define P-complete problems:

Task 3 Define the class of P-complete problems. Explain carefully how showing that a P-complete
problem is in POLYLOGSPACE results in the whole class P being included in POLY-
LOGSPACE. Explain the practical consequences of finding that P-complete problems are
or are not in POLYLOGSPACE.

Now that all the ingredients are in place it is time to review the P-complete problems:

Task 4 Provide an overview of P-complete problems. Start with a detailed overview of how the
first P-complete problem was shown to exist. Continue with describing how a second prob-
lem is shown to be P-complete using a suitable reduction. Finally, describe a few other
interesting P-complete problems (without going into too many details regarding the proof
of their P-completeness).

3 Your Mission
Imagine that you are a graduate student who just proved that P and POLYLOGSPACE are not
comparable, and that you have approached this effort on the basis of the theory of P-complete
problems. You are ready to write your thesis. True, that’s not going to get you that cool million
dollar prize, but still the fame is yours and that usually comes packed with a good, well paid
academic job. In fact, I am pretty sure that you will be able to pick your own university!

Before all that can happen, though, you need to write your thesis. As in most theses, the
first three chapters do not have much of a relation with your findings. They consist instead of an
introduction (what is the problem and why is it worth solving), the preliminaries (summarizing
the concepts and notations that will be used throughout the rest of the paper), and previous work
(what other people have done in the domain already).
Your mission, should you choose to accept it, is to write a report that contains these three chap-
ters; the rest of the thesis described above is left as an exercise for the interested reader. The first
two are straightforward, while the third should address the tasks from the previous section and
also any additional concepts and results that are related to the topic at hand; it is your responsi-
bility to find those, including but not necessarily limited to examples of P-complete problems and
their practical significance.
All of this has been discovered already so there is nothing for you to discover. All you need
to do is to gather the information in a coherent, correct, and complete report. At the same time,
you need to provide references for each concept included in your report, so that the interested
reader can dig further and be able with minimal effort to come up with the original papers that
introduce the results you are talking about (these are called primary references, as opposed to sec-
ondary references which, like your report, collect and describe multiple results). As an example
of primary vs. secondary references note that the reference list at the end of this handout contains
five primary and three secondary references (I will leave you the pleasure to figure out which is
which).
Finding the references to the results that you are going to include in the report is your respon-
sibility. As a starting point I am going to give you a good primary reference [6] (incidentally this is
a rare occurrence of a book being in fact a primary reference) and also mention the fact that there
exists at least one other very good secondary reference (a book that is available electronically on
the Internet, which you will have to find all by yourself if needed).

3.1 What to Submit and How It Will Be Marked


Submit (through Moodle) a typed report typeset to PDF and addressing all the tasks mentioned
in the previous section. Provide in the preamble the names and email addresses of all the authors.
Use a reasonable font (anywhere between 10 and 12 points) and reasonable margins (1-inch
all around being ideal). I expect a length of no more than 10 pages, but note that the length of
the report is not part of the marking scheme. Do not forget to include a list of references that are
actually cited in the text (more on that below).
In writing this report I expect you to demonstrate a good understanding of an unfamiliar
topic, your ability to summarize your understanding in a coherent manner, but also your ability
of writing a technical document. The following are thus all part of the marking scheme:

Literature review and references (out of 30 marks) The first step in writing a report like this is
to find relevant papers to summarize. This process is called literature review. Potential
candidates include journal and conference papers as well as books (mostly as secondary
references). You need to find papers that are relevant to the subject and you need to pay
special attention to primary references. The literature search is demonstrated by the list of
references that you will be including at the end of your report. An example is this handout.
I am expecting at least two primary references and two secondary ones. Note that all your

references should be technical (research papers or textbooks). Lecture notes, YouTube videos,
or Wikipedia articles are not acceptable.
Having a list of references is not enough, you also need to cite them in the text in all the
appropriate places. This handout is a decent though not necessarily outstanding example.
Any new concept should be supported by an appropriate citation unless it is a concept that
you are defining in your report for the first time. All your references need to be cited in the
text; I will disregard completely any reference that is not cited.

Presentation (out of 40 marks) You are presenting a report that includes motivation, definitions,
and a list of results, and I expect all three of these to be present. Note however that the report
presents known results and so there is no need to provide proofs; the only exception is the
fact that I have explicitly requested a proof in Task 1 and so you are expected to provide a
complete proof for this result. All the presentation (other than the motivation) should be
formal; you are not writing an introductory textbook, you are writing a technical report.
A note that applies to the presentation but also to some degree to the language and consis-
tency part below is that whatever I do not understand will simply not be marked.

Language and consistency (out of 30 marks) All the text of the report should be yours; copying
and pasting from your references is not acceptable. A collection of definitions and results
will not do. The style should be consistent and the report should read fluently, with a clear
direction of where things are going. The report should also read like a novel and so bullet
points are to be avoided; do not be lazy, use complete sentences and paragraphs.
I am not going to be pedantic, but I nonetheless expect a good language quality. Make sure
you proofread your manuscript carefully before submitting. Too many language issues and
typos will result in loss of marks.

Your overall grade for this challenge will be the sum of all of the above, and so will be out of
100 marks.

References
[1] A. K. Chandra and L. J. Stockmeyer, Alternation, in Proceedings of the 17th Annual Symposium
on Foundations of Computer Science, 1976, pp. 98–108.

[2] S. Cook, The complexity of theorem proving procedures, in Proceedings of the Third Annual ACM
Symposium on Theory of Computing, 1971, pp. 151–158.

[3] T. H. Cormen, C. E. Leiserson, R. L. Rivest, and C. Stein, Introduction to Algorithms, MIT
Press, Cambridge, MA, 3rd ed., 2009.

[4] M. R. Garey and D. S. Johnson, Computers and Intractability: A Guide to the Theory of NP-
Completeness, Macmillan Higher Education, 1979.

[5] H. R. Lewis and C. H. Papadimitriou, Elements of the Theory of Computation, Prentice-Hall,
2nd ed., 1998.

[6] I. Parberry, Parallel Complexity Theory, John Wiley & Sons, New York, NY, 1987.

[7] E. L. Post, Finite combinatory processes – formulation I, Journal of Symbolic Logic, 1 (1936),
pp. 103–105.

[8] A. M. Turing, On computable numbers, with an application to the Entscheidungsproblem, Proceed-
ings of the London Mathematical Society, 42 (1937), pp. 230–265.
