Sets
Importance: languages are sets
A set is a collection of "things," called the elements or members of the set. It is
essential to have a criterion for determining, for any given thing, whether it is
or is not a member of the given set. This criterion is called the membership
criterion of the set.
There are two common ways of indicating the members of a set:
o List all the elements, e.g. {a, e, i, o, u}
o Provide some sort of an algorithm or rule, such as a grammar
Notation:
o To indicate that x is a member of set S, we write x ∈ S
o We denote the empty set (the set with no members) as {} or ∅
o If every element of set A is also an element of set B, we say that A is
a subset of B, and write A ⊆ B
o If every element of set A is also an element of set B, but B also has some
elements not contained in A, we say that A is a proper subset of B, and
write A ⊂ B
Operations on Sets
The union of sets A and B, written A ∪ B, is a set that contains everything that
is in A, or in B, or in both.
The intersection of sets A and B, written A ∩ B, is a set that contains exactly
those elements that are in both A and B.
The set difference of set A and set B, written A - B, is a set that contains
everything that is in A but not in B.
The complement of a set A, written as -A or (better) A with a bar drawn over it,
is the set containing everything that is not in A. This is almost always used in
the context of some universal set U that contains "everything" (meaning
"everything we are interested in at the moment"). Then -A is shorthand for U -
A.
Additional terminology
The cardinality of a set A, written |A|, is the number of elements in a set A.
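These operations map directly onto Python's built-in set type; a minimal sketch (the example sets are ours, not from the notes):

```python
# Set operations using Python's built-in set type.
A = {"a", "e", "i"}
B = {"i", "o", "u"}
U = {"a", "e", "i", "o", "u"}        # a universal set for this example

print(A | B)       # union of A and B
print(A & B)       # intersection of A and B: {'i'}
print(A - B)       # set difference A - B
print(U - A)       # complement of A, i.e. U - A
print(len(A))      # cardinality |A|: 3
```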
Graphs
Importance: Automata are graphs.
A graph consists of two sets
o A set V of vertices (or nodes), and
o A set E of edges (or arcs).
An edge consists of a pair of vertices in V. If the edges are ordered, the graph is
a digraph (a contraction of "directed graph").
A walk is a sequence of edges, where the finish vertex of each edge is the start
vertex of the next edge. Example: (a, e), (e, i), (i, o), (o, u).
A path is a walk with no repeated edges.
A simple path is a path with no repeated vertices.
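The three definitions nest, and that nesting can be checked mechanically. A sketch in Python, using the walk from the example above (edges as ordered pairs):

```python
# Classify a sequence of edges as a walk, path, or simple path.
def is_walk(edges):
    """Each edge's finish vertex must be the next edge's start vertex."""
    return all(e[1] == f[0] for e, f in zip(edges, edges[1:]))

def is_path(edges):
    """A path is a walk with no repeated edges."""
    return is_walk(edges) and len(set(edges)) == len(edges)

def is_simple_path(edges):
    """A simple path is a path with no repeated vertices."""
    vertices = [edges[0][0]] + [e[1] for e in edges]
    return is_path(edges) and len(set(vertices)) == len(vertices)

walk = [("a", "e"), ("e", "i"), ("i", "o"), ("o", "u")]
print(is_walk(walk), is_path(walk), is_simple_path(walk))   # True True True
```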
Trees
Importance: Trees are used in some algorithms.
A tree is a kind of digraph:
o It has one distinguished vertex called the root;
o There is exactly one path from the root to each vertex; and
o The level of a vertex is the length of the path to it from the root.
Terminology:
o if there is an edge from A to B, then A is the parent of B, and B is
the child of A.
o A leaf is a node with no children.
o The height of a tree is the largest level number of any vertex.
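Levels and height can be computed directly from the definitions. A sketch, with a small made-up tree stored as a parent-to-children map:

```python
# Compute levels, leaves, and height of a tree given as parent -> children.
tree = {"root": ["a", "b"], "a": ["c", "d"], "b": [], "c": [], "d": []}

def levels(tree, node="root", depth=0):
    """Map each vertex to its level (length of the path from the root)."""
    result = {node: depth}
    for child in tree[node]:
        result.update(levels(tree, child, depth + 1))
    return result

leaves = [n for n, children in tree.items() if not children]   # nodes with no children
height = max(levels(tree).values())                            # largest level number
print(leaves, height)   # ['b', 'c', 'd'] 2
```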
Proof techniques
Importance
Because this is a formal subject, the textbook is full of proofs
Proofs are encapsulated understanding
Proof by induction
Prove something about P1 (the basis)
Prove that if it is true for Pn, then it is true for Pn+1 (the inductive step, which uses the inductive assumption that it holds for Pn)
BNF Notation
Importance
o standard way to define programming language syntax
BNF examples
<loop statement> ::= <while loop> | <for loop>
<while loop> ::= while ( <condition> ) <statement>
<for loop> ::= for ( <expression> ; <expression>;
<expression> ) <statement>
<assignment statement> ::=
<variable> = <expression>
Recursion is used frequently:
<digit> ::= 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
<integer> ::= <digit> | <integer> <digit>
<letter> ::= <lowercase letter> | <uppercase letter>
<lowercase letter> ::=
a|b|c|d|e|f|g|h|i|j|k|l|m|n|o|p|q|r|s|t|u|v|w|x|y|z
<name> ::= <letter> | <name> <letter>
| <name> <digit>
It's hard to set limits:
<small number> ::= <digit> | <digit> <digit>
| <digit> <digit> <digit>
| <digit> <digit> <digit> <digit>
| <digit> <digit> <digit> <digit> <digit>
More BNF Examples
Use recursion to build lists:
<statement list> ::= <statement>
| <statement list> <statement>
Expressions in BNF
You can use the structure of BNF to show the order of operations:
<expression> ::=
<expression> + <term>
| <expression> - <term>
| <term>
<term> ::=
<term> * <factor>
| <term> / <factor>
| <factor>
<factor> ::=
<primary> ** <factor>
| <primary>
<primary> ::=
- <primary>
| <element>
<element> ::=
( <expression> )
| <variable>
| <number>
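The grammar above can be turned directly into a small recursive-descent evaluator. This sketch is ours, not part of the notes; it assumes <number> is an unsigned integer literal, takes <primary> to be unary minus or an <element>, and omits <variable>:

```python
# A hand-written recursive-descent evaluator mirroring the expression BNF.
import re

def evaluate(text):
    tokens = re.findall(r"\d+|\*\*|[-+*/()]", text)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def expression():                  # <expression> ::= <expression> + <term> | ...
        value = term()
        while peek() in ("+", "-"):
            value = value + term() if take() == "+" else value - term()
        return value

    def term():                        # <term> ::= <term> * <factor> | ...
        value = factor()
        while peek() in ("*", "/"):
            value = value * factor() if take() == "*" else value / factor()
        return value

    def factor():                      # <factor> ::= <primary> ** <factor> | <primary>
        value = primary()
        if peek() == "**":
            take()
            value = value ** factor()  # right-recursive, hence right-associative
        return value

    def primary():                     # unary minus or an <element>
        if peek() == "-":
            take()
            return -primary()
        return element()

    def element():                     # <element> ::= ( <expression> ) | <number>
        if peek() == "(":
            take()
            value = expression()
            take()                     # consume the closing ")"
            return value
        return int(take())

    return expression()

print(evaluate("2+3*4"))      # 14: * binds tighter than +
print(evaluate("2**3**2"))    # 512: ** is right-associative
```

Note how the structure of the grammar dictates the structure of the code: the left-recursive rules for <expression> and <term> become loops, while the right-recursive <factor> rule gives ** its right associativity.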
Since then, almost every author of books on new programming languages used it to
specify the syntax rules of the language. See [Jensen 74] and [Wirth 82] for examples.
::=
meaning "is defined as"
|
meaning "or"
<>
The angle brackets distinguish syntax rule names (also called non-terminal symbols)
from terminal symbols which are written exactly as they are to be represented. A
BNF rule defining a nonterminal has the form:
nonterminal ::= sequence_of_alternatives consisting of strings of
terminals or nonterminals separated by the meta-symbol |
For example, the BNF production for a mini-language is:
<program> ::= program
<declaration_sequence>
begin
<statements_sequence>
end ;
This shows that a mini-language program consists of the keyword "program"
followed by the declaration sequence, then the keyword "begin" and the statements
sequence, finally the keyword "end" and a semicolon.
(end of quotation)
In fact, many authors have introduced some slight extensions of BNF for the ease of
use:
Fundamental Concepts
There are three fundamental concepts that we will be working with in this course:
Languages
o A language is a subset of the set of all possible strings formed from a
given set of symbols.
o There must be a membership criterion for determining whether a
particular string is in the set.
Grammars
o A grammar is a formal system for accepting or rejecting strings.
o A grammar may be used as the membership criterion for a language.
Automata
o An automaton is a simplified, formalized model of a computer.
o An automaton may be used to compute the membership function for a
language.
o Automata can also compute other kinds of things.
Languages
Definitions 1
An alphabet is a finite, nonempty set of symbols. We use Σ to denote this
alphabet. Note: Symbols may be more than one English letter long, e.g. while is a
single symbol in Pascal.
Σ* denotes the set of all strings that are composed of zero or
more symbols of Σ.
Languages
More Definitions
The concatenation of two strings is formed by joining the sequence of symbols in the
first string with the sequence of symbols in the second string.
Operations on Languages
Languages are sets. Therefore, any operation that can be performed on sets can be
performed on languages.
In addition,
L1L2, the catenation of L1 and L2, is a language.
(The strings of L1L2 are strings that have a word of L1 as a prefix and a word
of L2 as a suffix.)
L^n, the catenation of L with itself n times, is a language.
L* = {λ} ∪ L ∪ LL ∪ LLL ∪ ..., the star closure of L, is a
language.
L+ = L ∪ LL ∪ LLL ∪ ..., the positive closure of L, is a
language.
Definition of a Grammar
A grammar G is a quadruple G = (V, T, S, P)
where
V is a finite set of (meta)symbols, or variables.
T is a finite set of terminal symbols.
S ∈ V is a distinguished element of V called the start symbol.
P is a finite set of productions (or rules).
We'll put this in words, but -- learn the symbols. The words are just "training wheels".
X ∈ (V ∪ T)+ : X is a member of the set of strings composed of any mixture of
variables and terminal symbols, but X is not the empty string.
Y ∈ (V ∪ T)* : Y is a member of the set of strings composed of any mixture of
variables and terminal symbols; Y is allowed to be the empty string.
Derivations
Productions are rules that can be used to define the strings belonging to a language.
Suppose language L is defined by a grammar G = (V, T, S, P). You can find a string
belonging to this language as follows:
Notation:
Automata
An automaton is a simple model of a computer.
Generally, an automaton
One designated state is the start state.
Some states (possibly including the start state) can be designated as final states.
Arcs between states represent state transitions -- each such arc is labeled with the
symbol that triggers the transition.
Example DFA
Example input string: 1 0 0 1 1 1 0 0
Operation
Start with the "current state" set to the start state and a "read head" at the
beginning of the input string;
while there are still characters in the string:
Implementing a DFA
If you don't object to the go to statement, there is an easy way to implement a DFA:
q0 : read char;
if eof then accept string;
if char = 0 then go to q2;
if char = 1 then go to q1;
q1 : read char;
if eof then reject string;
if char = 0 then go to q3;
if char = 1 then go to q0;
q2 : read char;
if eof then reject string;
if char = 0 then go to q0;
if char = 1 then go to q3;
q3 : read char;
if eof then reject string;
if char = 0 then go to q1;
if char = 1 then go to q2;
Alternatively, replace the go tos with a state variable and a loop:

state := q0;
loop
    case state of
        q0 : read char;
             if eof then accept string;
             if char = 0 then state := q2;
             if char = 1 then state := q1;
        q1 : read char;
             if eof then reject string;
             if char = 0 then state := q3;
             if char = 1 then state := q0;
        q2 : read char;
             if eof then reject string;
             if char = 0 then state := q0;
             if char = 1 then state := q3;
        q3 : read char;
             if eof then reject string;
             if char = 0 then state := q1;
             if char = 1 then state := q2;
    end case;
end loop;
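A third option is a table-driven loop. This Python sketch uses the four-state machine from the go to code above; since only q0 says "accept string" at end-of-file, q0 is taken as the lone accepting state:

```python
# The go-to DFA code rewritten as a table-driven loop.
# delta maps (state, symbol) to the next state.
delta = {
    ("q0", "0"): "q2", ("q0", "1"): "q1",
    ("q1", "0"): "q3", ("q1", "1"): "q0",
    ("q2", "0"): "q0", ("q2", "1"): "q3",
    ("q3", "0"): "q1", ("q3", "1"): "q2",
}

def accepts(string):
    state = "q0"                       # start state
    for char in string:
        state = delta[(state, char)]   # one transition per input symbol
    return state == "q0"               # accept at eof only in q0

print(accepts("10011100"))   # the example input string: accepted
```

Keeping the transitions in a data structure rather than in control flow makes it trivial to swap in a different machine.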
Note: The fact that δ is a function implies that every vertex has an outgoing arc for
each member of Σ:
δ : Q × Σ → Q.
The difference is that, in this automaton, δ does not appear to be a function. It looks
like a partial function, that is, it is not defined for all values of Q × Σ.
We can complete the definition of δ by assuming the existence of an "invisible" state
and some "invisible" arcs. Specifically,
The automaton represented above is really exactly the same as the automaton on the
previous page; we just haven't bothered to draw one state and a whole bunch of arcs
that we know must be there.
I don't think you'll find abbreviated automata in the textbook. They aren't usually
allowed in a formal course. However, if you ever use an automaton to design a lexical
scanner, putting in an explicit error state just clutters up the diagram.
Due to nondeterminism, the same string may cause an nfa to end up in one of several
different states, some of which may be final while others are not. The string is
accepted if any possible ending state is a final state.
Example NFAs
Implementing an NFA
If you think of an automaton as a computer, how does it handle nondeterminism?
There are two ways that this could, in theory, be done:
There are three ways, two feasible and one not yet feasible, to simulate the second
alternative:
while there is a next symbol do
{   read next symbol (x);
    B := ∅;
    for each a in A do
    {   for each λ transition from a to some state b do
            add b to B;
        for each x transition from a to some state b do
            add b to B;
    }
    for each λ transition from
            some state b in B to some state c not in B do
        add c to B;
    A := B;
}
if any element of A is a final state then
    return True;
else
    return False;
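The same set-of-states idea can be sketched in Python: A is the set of states the nfa might currently be in, and we take the λ-closure before and after consuming each symbol. The toy nfa at the bottom is a made-up example, not one from the notes:

```python
# Simulate an nfa by tracking the set of states it might be in.
def lambda_closure(states, lam):
    """lam maps a state to the set of states reachable by one λ arc."""
    closure = set(states)
    frontier = list(states)
    while frontier:
        b = frontier.pop()
        for c in lam.get(b, set()) - closure:
            closure.add(c)
            frontier.append(c)
    return closure

def nfa_accepts(string, delta, lam, start, finals):
    """delta maps (state, symbol) to a set of states."""
    A = lambda_closure({start}, lam)
    for x in string:
        B = set()
        for a in A:                       # follow x transitions from every state in A
            B |= delta.get((a, x), set())
        A = lambda_closure(B, lam)        # then follow any λ arcs
    return bool(A & finals)               # accept if any possible state is final

# Toy nfa for (a + ab)*: start and final state "s".
delta = {("s", "a"): {"s", "t"}, ("t", "b"): {"s"}}
print(nfa_accepts("aab", delta, {}, "s", {"s"}))   # True
```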
M = (Q, Σ, δ, q0, F)
where
Q is a finite set of states,
Σ is a finite set of symbols, the input alphabet,
δ : Q × (Σ ∪ {λ}) → 2^Q is a transition function,
q0 ∈ Q is the initial state,
F ⊆ Q is a set of final states.
These are all the same as for a dfa except for the definition of δ:
Transitions on λ are allowed in addition to transitions on elements of Σ, and
The range of δ is 2^Q rather than Q. This means that the values of δ are not
elements of Q, but rather are sets of elements of Q.
DFA = NFA
Two acceptors are equivalent if they accept the same language.
A DFA is just a special case of an NFA that happens not to have any null transitions
or multiple transitions on the same symbol. So DFAs are not more powerful than
NFAs.
For any NFA, we can construct an equivalent DFA (see below). So NFAs are not
more powerful than DFAs. DFAs and NFAs define the same class of languages --
the regular languages.
To translate an NFA into a DFA, the trick is to label each state in the DFA with a set
of states from the NFA. Each state in the DFA summarizes all the states that the NFA
might be in. If the NFA contains |Q| states, the resultant DFA could contain as many
as 2^|Q| states. (Usually far fewer states will be needed.)
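The subset construction can be sketched directly in Python, labeling each dfa state with a frozenset of nfa states. The toy nfa used here is a made-up example (λ arcs are omitted for brevity):

```python
# Subset construction: each dfa state is a frozenset of nfa states.
def nfa_to_dfa(delta, alphabet, start, finals):
    """delta maps (state, symbol) to a set of nfa states (no λ arcs here)."""
    start_set = frozenset({start})
    dfa_delta, todo, seen = {}, [start_set], {start_set}
    while todo:
        A = todo.pop()
        for x in alphabet:
            # B summarizes every nfa state reachable from A on symbol x
            B = frozenset(s for a in A for s in delta.get((a, x), set()))
            dfa_delta[(A, x)] = B
            if B not in seen:
                seen.add(B)
                todo.append(B)
    dfa_finals = {A for A in seen if A & finals}   # final if any member is final
    return dfa_delta, start_set, dfa_finals

delta = {("s", "a"): {"s", "t"}, ("t", "b"): {"s"}}
dfa_delta, q0, F = nfa_to_dfa(delta, "ab", "s", {"s"})
print(len({A for (A, x) in dfa_delta}))   # 3 dfa states reached, out of 2**2 possible
```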
Primitive Regular Expressions
A regular expression can be used to define a language. A regular expression
represents a "pattern;" strings that match the pattern are in the language, strings that
do not match the pattern are not in the language.
x, for each x ∈ Σ,
λ, the empty string, and
∅, indicating no strings at all.
Thus, if |Σ| = n, then there are n+2 primitive regular expressions defined over Σ.
For each x ∈ Σ, the primitive regular expression x denotes the language {x}.
That is, the only string in the language is the string "x".
The primitive regular expression λ denotes the language {λ}. The only string
in this language is the empty string.
The primitive regular expression ∅ denotes the language {}. There
are no strings in this language.
Regular Expressions
Every primitive regular expression is a regular expression. If r1 and r2 are regular
expressions, then so are (r1), r1*, r1 r2, and r1 + r2. The language denoted by each
form:
λ           {λ}
∅           {}
(r1)        L(r1)
r1*         (L(r1))*
r1 r2       L(r1) L(r2)
r1 + r2     L(r1) ∪ L(r2)
Give regular expressions for the following languages on Σ = {a, b, c}.
All strings containing exactly one a.
(b+c)*a(b+c)*
All strings containing no more than three a's.
We can describe the strings containing zero, one, two, or three a's (and nothing
else) as
(λ+a)(λ+a)(λ+a)
Now we want to allow arbitrary strings not containing a's at the places marked
by X's:
X(λ+a)X(λ+a)X(λ+a)X
Replacing each X with (b+c)* gives the answer:
(b+c)*(λ+a)(b+c)*(λ+a)(b+c)*(λ+a)(b+c)*
All strings containing at least one occurrence of each symbol in Σ.
To make it easier to see what's happening, let's put an X in every place we want
to allow an arbitrary string, and write out the six possible orders of a, b, and c:
XaXbXcX + XaXcXbX + XbXaXcX + XbXcXaX + XcXaXbX + XcXbXaX
Finally, replacing the X's with (a+b+c)* gives the final (unwieldy) answer:
(a+b+c)*a(a+b+c)*b(a+b+c)*c(a+b+c)* +
(a+b+c)*a(a+b+c)*c(a+b+c)*b(a+b+c)* +
(a+b+c)*b(a+b+c)*a(a+b+c)*c(a+b+c)* +
(a+b+c)*b(a+b+c)*c(a+b+c)*a(a+b+c)* +
(a+b+c)*c(a+b+c)*a(a+b+c)*b(a+b+c)* +
(a+b+c)*c(a+b+c)*b(a+b+c)*a(a+b+c)*
All strings which contain no runs of a's of length greater than two.
We can fairly easily build an expression for strings containing no a, a single a, or a
single aa:
(b+c)*(λ+a+aa)(b+c)*
but if we want to repeat this, we need to be sure to have at least one non-a
between repetitions:
(b+c)*(λ+a+aa)(b+c)*((b+c)(b+c)*(λ+a+aa)(b+c)*)*
All strings in which all runs of a's have lengths that are multiples of three.
(aaa+b+c)*
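Answers like these can be spot-checked with Python's re module, writing the textbook's "+" as "|" and λ as an empty alternative; this check is our addition, not part of the notes:

```python
# Spot-check two of the answers above with Python's regex engine.
import re

one_a = re.compile(r"(b|c)*a(b|c)*")           # exactly one a
runs_of_three = re.compile(r"(aaa|b|c)*")      # runs of a's have length 0 mod 3

assert one_a.fullmatch("bcacb")
assert not one_a.fullmatch("bcb")              # no a at all
assert not one_a.fullmatch("aa")               # two a's
assert runs_of_three.fullmatch("baaacaaaaaa")  # runs of length 3 and 6
assert not runs_of_three.fullmatch("baac")     # a run of length 2
print("all checks passed")
```

fullmatch is used deliberately: a formal-language regular expression must describe the whole string, whereas re.search would match a substring.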
For any nondeterministic finite acceptor (nfa) we can find an equivalent dfa. Thus
nfas also describe regular languages.
Regular expressions also describe regular languages. We will show that regular
expressions are equivalent to nfas by doing two things:
1. For any given regular expression, we will show how to build an nfa that accepts
the same language. (This is the easy part.)
2. For any given nfa, we will show how to construct a regular expression that
describes the same language. (This is the hard part.)
For grouping (parentheses), we don't really need to do anything. The nfa that
represents the regular expression (r1) is the same as the nfa that represents r1.
The star denotes zero or more applications of the regular expression, so we need to
set up a loop in the nfa. We can do this with a backward-pointing λ arc. Since we
might want to traverse the regular expression zero times (thus matching the null
string), we also need a forward-pointing λ arc to bypass the nfa entirely.
The regular expression derived in the final step accepts the same language as the
original nfa.
Since we can convert an nfa to a regular expression, and we can convert a regular
expression to an nfa, the two are equivalent formalisms--that is, they both describe the
same class of languages, the regular languages.
Here's how to delete a state (this is taken with minor modifications from Figure 3.9 on
page 85 of your textbook):
To delete state Q, where Q is neither the initial state nor the final state,
replace the arcs through Q with equivalent arcs that bypass it: if the arc from Qi to Q
is labeled a, the loop on Q (if any) is labeled e, and the arc from Q to Qj is labeled b,
then add an arc from Qi to Qj labeled ae*b (and similarly for the other pairs of
neighbors of Q).
You should convince yourself that this transformation is "correct", in the sense that
paths which leave you in Qi in the original will leave you in Qi in the replacement, and
similarly for Qj.
What if state Q has connections to more than two other states, say, Qi, Qj, and
Qk? Then you have to consider these states pairwise: Qi with Qj, Qj with Qk, and
Qi with Qk.
What if some of the arcs in the original state are missing? There are too many
cases to work this out in detail, but you should be able to figure it out for any
specific case, using the above as a model.
You will end up with an nfa that looks like this, where r1, r2, r3,
and r4 are (probably very complex) regular expressions. The
resultant nfa represents the regular expression
r1*r2(r4 + r3r1*r2)*
(you should verify that this is indeed the correct regular
expression). All you have to do is plug in the correct values for r1, r2, r3, and r4.
Definition by grammar
Define the grammar G = (V, T, S, P) where
These should be pretty obvious except for the set V, which we generally make up as
we construct P.
Since the empty string belongs to the language, we need the production
S → λ
Some strings belonging to the language begin with the symbol a. The a can be
followed by any other string in the language, so long as this other string does not
begin with a. So we make up a variable, call it NOTA, to produce these other strings,
and add the production
S → a NOTA
Similar reasoning for strings beginning with b or c gives
S → b NOTB
S → c NOTC
Now, NOTA is either the empty string, or some string that begins with b, or some string
that begins with c. If it begins with b, then it must be followed by a (possibly empty)
string that does not begin with b--and we already have a variable for that case, NOTB.
Similarly, if NOTA is some string beginning with c, the c must be followed by NOTC.
This gives the productions
NOTA → λ
NOTA → b NOTB
NOTA → c NOTC
Similar logic gives the following productions for NOTB and NOTC:
NOTB → λ
NOTB → a NOTA
NOTB → c NOTC
NOTC → λ
NOTC → a NOTA
NOTC → b NOTB
Example derivation:
S ⇒ a NOTA ⇒ ab NOTB ⇒ aba NOTA ⇒ abac NOTC ⇒ abac.
Definition by nfa
Defining the language by an nfa follows almost exactly the same logic as defining the
language by a grammar. Whenever an input symbol is read, go to a state that will
accept any symbol other than the one read. To emphasize the similarity with the
preceding grammar, we will name our states to correspond to variables in the
grammar.
The key insight is that strings of the language can be viewed as consisting of zero or
more repetitions of the symbol a, and between them must be strings of the
form bcbcbc... or cbcbcb.... So we can start with
X a Y a Y a Y a ... Y a Z
where we have to find suitable expressions for X, Y, and Z. But first, let's get the
above expression in a proper form, by getting rid of the "...". This gives
X a (Y a)* Z
X = (λ + b + c + (bc)* + (cb)* + (bc)*b + (cb)*c)
This is now correct, but could be simplified. The last four terms include the
λ+b+c cases, so we can drop those three terms. Then we can combine the last four
terms into
X = (bc)*(b + λ) + (cb)*(c + λ)
Now, what about Z? As it happens, there isn't any difference between what we need
for Z and what we need for X, so we can also use the above expression for Z.
Finally, what about Y? This is just like the others, except that Y cannot be empty.
Luckily, it's easy to adjust the above expression for X and Z so that it can't be empty:
Y = ((bc)*b + (cb)*c)
Regular Grammars
So dfas, nfas, and regular expressions are all "equivalent," in the sense that any
language you define with one of these could be defined by the others as well.
We also know that languages can be defined by grammars. Now we will begin to
classify grammars; and the first kinds of grammars we will look at are the regular
grammars. As you might expect, regular grammars will turn out to be equivalent to
dfas, nfas, and regular expressions.
Classifying Grammars
Recall that a grammar G is a quadruple G = (V, T, S, P)
where
V is a finite set of (meta)symbols, or variables.
T is a finite set of terminal symbols.
S V is a distinguished element of V called the start symbol.
P is a finite set of productions.
Right-Linear Grammars
In general, productions have the form:
(V ∪ T)+ → (V ∪ T)*
In a right-linear grammar, all productions have one of the two forms:
V → T*V
or
V → T*
That is, the left-hand side must consist of a single variable, and the right-hand side
consists of any number of terminals (members of T) optionally followed by a single
variable. (The "right" in "right-linear grammar" refers to the fact that, following the
arrow, a variable can occur only as the rightmost symbol of the production.)
Examples of right-linear productions:
A → xyzB
A → B
A → x
As an example of the correspondence between an nfa and a right-linear grammar, the following
automaton and grammar both recognize the set of strings consisting of an even number of 0's and an
even number of 1's.
S → λ
S → 0B
S → 1A
A → 0C
A → 1S
B → 0S
B → 1C
C → 0A
C → 1B
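The correspondence can be checked mechanically: read each variable as a state, each production X → s Y as an arc from X to Y on symbol s, and a production X → λ as making X final. A Python sketch (the exhaustive check over short strings is our addition):

```python
# The right-linear grammar above, read as an automaton.
from itertools import product

arcs = {("S", "0"): "B", ("S", "1"): "A",
        ("A", "0"): "C", ("A", "1"): "S",
        ("B", "0"): "S", ("B", "1"): "C",
        ("C", "0"): "A", ("C", "1"): "B"}

def derivable(w):
    state = "S"
    for s in w:
        state = arcs[(state, s)]
    return state == "S"            # only S has the production S → λ

# Check against the stated language on every string of length up to 6.
for n in range(7):
    for w in ("".join(p) for p in product("01", repeat=n)):
        assert derivable(w) == (w.count("0") % 2 == 0 and w.count("1") % 2 == 0)
print("grammar matches even-0s-and-even-1s on all strings up to length 6")
```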
Left-Linear Grammars
In a left-linear grammar, all productions have one of the two forms:
V VT*
or
V T*
That is, the left-hand side must consist of a single variable, and the right-hand side
consists of an optional single variable followed by any number of terminals. This is
just like a right-linear grammar except that, following the arrow, a variable can occur
only on the left of the terminals, rather than only on the right.
We won't pay much attention to left-linear grammars, because they turn out to be
equivalent to right-linear grammars. Given a left-linear grammar for language L, we
can construct a right-linear grammar for the same language, as follows:
Step: Construct a right-linear grammar for the (different) language L^R, the reverse
of L.
Method: Replace each production A → x of the left-linear grammar with a production
A → x^R, and replace each production A → Bx with a production A → x^R B.
Step: Construct an nfa for L^R from the right-linear grammar. This nfa should have
just one final state.
Method: We talked about deriving an nfa from a right-linear grammar on an earlier
page. If the nfa has more than one final state, we can make those states nonfinal, add
a new final state, and put λ transitions from each previously final state to the new
final state.
Step: Reverse the nfa for L^R to get an nfa for L, by reversing the direction of every
arc and exchanging the roles of the initial and final states.
Step: Construct a right-linear grammar for L from the nfa for L.
Method: This is the technique we just talked about on an earlier page.
Regular Grammars
A regular grammar is either a right-linear grammar or a left-linear grammar.
You do not get to mix the two. For example, consider a grammar with the following
productions:
S → λ
S → a X
X → S b
This grammar is neither right-linear nor left-linear, hence it is not a regular grammar.
We have no reason to suppose that the language it generates is a regular language (one
that is generated by a dfa).
In fact, the grammar generates a language whose strings are of the form a^n b^n. This
language cannot be recognized by a dfa. (Why not?)
Properties of Regular Languages
Closure I
A set is closed under an operation if, whenever the operation is applied to members of
the set, the result is also a member of the set.
For example, the set of integers is closed under addition, because x+y is an integer
whenever x and y are integers. However, integers are not closed under division: if x
and y are integers, x/y may or may not be an integer.
We will show that the set of regular languages is closed under each of these
operations. We will also define the operations of "homomorphism" and "right
quotient" and show that the set of regular languages is also closed under these
operations.
Union of L1 and L2
Create a new start state.
Make a λ transition from the new start state to each of the original start
states.
Concatenation of L1 and L2
Put a λ transition from each final state of L1 to the initial state of L2
Make the original final states of L1 nonfinal
Negation of L1
Kleene Star of L1
Make a new start state; connect it to the original start state with a λ transition.
Make a new final state; connect the original final states (which become nonfinal) to it with λ transitions.
Connect the new start state and new final state with a pair of λ transitions.
Reverse of L1
In these constructions you form a completely new machine, whose states are each
labeled with an ordered pair of state names: the first element of each pair is a state
from L1, and the second element of each pair is a state from L2. (Usually you won't
need a state for every such pair, just some of them.)
1. Begin by creating a start state whose label is (start state of L1, start state of L2).
2. Repeat the following until no new arcs can be added:
1. Find a state (A, B) that lacks a transition for some x in Σ.
2. Add a transition on x from state (A, B) to state (δ(A, x), δ(B, x)). (If this
state doesn't already exist, create it.)
The same construction is used for both intersection and set difference. The distinction
is in how the final states are selected.
Intersection: Mark a state (A, B) as final if both (i) A is a final state in L1, and (ii) B
is a final state in L2.
Set difference: Mark a state (A, B) as final if A is a final state in L1, but B is not a
final state in L2.
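The pairwise construction is short enough to sketch in code. Here it is in Python with two toy one-symbol machines of our own (machine 1 accepts an even number of a's, machine 2 at least one a), with the final states chosen for intersection:

```python
# Product construction: states of the new machine are pairs of states.
def product_dfa(d1, d2, q1, q2, alphabet):
    """d1, d2 map (state, symbol) to a state; build the reachable pair machine."""
    delta, todo, seen = {}, [(q1, q2)], {(q1, q2)}
    while todo:
        (A, B) = todo.pop()
        for x in alphabet:
            nxt = (d1[(A, x)], d2[(B, x)])     # (δ1(A, x), δ2(B, x))
            delta[((A, B), x)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return delta, (q1, q2), seen

d1 = {("e", "a"): "o", ("o", "a"): "e"}        # even ("e") / odd ("o") number of a's
d2 = {("n", "a"): "y", ("y", "a"): "y"}        # no a yet ("n") / some a ("y")
delta, start, states = product_dfa(d1, d2, "e", "n", "a")

# Intersection: (A, B) is final iff A is final in machine 1 AND B in machine 2.
finals = {(A, B) for (A, B) in states if A == "e" and B == "y"}

def accepts(w):
    s = start
    for x in w:
        s = delta[(s, x)]
    return s in finals

print(accepts("aa"), accepts("a"), accepts(""))   # True False False
```

Changing only the `finals` set (A final and B not final) turns the same machine into one for set difference, exactly as the text says.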
If w is a string in Σ*, then we define h(w) to be the string obtained by replacing each
symbol x ∈ Σ by the corresponding string h(x) ∈ Γ*.
Proof.
That is, the strings in L1/L2 are strings from L1 "with their tails cut off." If some string
of L1 can be broken into two parts, w and x, where x is in language L 2, then w is in
language L1/L2.
Theorem. If L1 and L2 are both regular languages, then L1/L2 is a regular language.
Proof: Again, the proof is by construction. We start with a dfa M(L1) for L1; the dfa
we construct is exactly like the dfa for L1, except that (in general) different states will
be marked as final.
For each state Qi in M(L1), determine if it should be final in M(L1/L2) as follows:
Starting in state Qi as if it were the initial state, determine if any of the strings
in language L2 are accepted by M(L1). If there are any, then state Qi should be
marked as final in M(L1/L2). (Why?)
That's the basic algorithm. However, one of the steps in it is problematical: since
language L2 may have an infinite number of strings, how do we determine whether
some unknown string in the language is accepted by M(L1) when starting at Qi?
We cannot try all the strings, because we insist on a finite algorithm.
The trick is to construct a new dfa that recognizes the intersection of two languages:
(1) L2, and (2) the language that would be accepted by dfa M(L1) if Qi were its initial
state. We already know we can build this machine. Now, if this machine
recognizes any string whatever (we can check this easily), then the two machines have
a nonempty intersection, and Qi should be a final state. (Why?)
We have to go through this same process for every state Qi in M(L1), so the algorithm
is too lengthy to step through by hand. However, it is enough for our purposes that the
algorithm exists.
Finally, since we can construct a dfa that recognizes L1/L2, this language is therefore
regular, and we have shown that the regular languages are closed under right quotient.
Standard Representations
A regular language is given in a standard representation if it is specified by one of:
A finite automaton (dfa or nfa).
A regular expression.
A regular grammar.
(The importance of these particular representations is simply that they are precise and
unambiguous; thus, we can prove things about languages when they are expressed in a
standard representation.)
If there is no path from the initial state to a final state, then the language is
empty (and finite).
If there is a path containing a cycle from the initial state to some final state,
then the language is infinite.
If no path from the initial state to a final state contains a cycle, then the
language is finite.
The pigeonhole principle can be used to prove that certain infinite languages are not regular.
(Remember, any finite language is regular.)
As we have informally observed, dfas "can't count." This can be shown formally by
using the pigeonhole principle. As an example, we show that L = {a^n b^n : n > 0} is not
regular. The proof is by contradiction.
Suppose L is regular. There are an infinite number of values of n but M(L) has only a
finite number of states. By the pigeonhole principle, there must be distinct values
of i and j such that a^i and a^j end in the same state. From this state,
Since the language is infinite, some strings of the language must have length
> n.
For a string of length > n accepted by the dfa, the walk through the dfa must
contain a cycle.
Repeating the cycle an arbitrary number of times must yield another string
accepted by the dfa.
We can view this as a game wherein our opponent makes moves 1 and 3
(choosing m and choosing xyz) and we make moves 2 and 4 (choosing w and
choosing i). Our goal is to show that we can always beat our opponent. If we
can show this, we have proved that L is not regular.
Pumping Lemma Example 1
Prove that L = {a^n b^n : n ≥ 0} is not regular.
1. We don't know m, but assume there is one.
2. Choose a string w = a^n b^n where n > m, so that any prefix of length m consists
entirely of a's.
3. We don't know the decomposition of w into xyz, but since |xy| ≤ m, xy must
consist entirely of a's. Moreover, y cannot be empty.
4. Choose i = 0. This has the effect of dropping |y| a's out of the string, without
affecting the number of b's. The resultant string has fewer a's than b's, hence
does not belong to L. Therefore L is not regular.
Q.E.D.
Context-Free Grammars
Definition of CFGs
A grammar G = (V, T, S, P) is a context-free grammar (cfg) if all
productions in P have the form
A → x
where
A ∈ V, and
x ∈ (V ∪ T)*.
Since V ⊆ (V ∪ T)+, the productions for a context-free grammar are a restricted
form of the productions allowed for a general grammar. Thus, a context-free grammar
is a grammar.
Since T* ⊆ (V ∪ T)* and T*V ⊆ (V ∪ T)*, it follows that every right-linear
grammar is also a context-free grammar.
Similarly, left-linear grammars and linear grammars are also context-free grammars.
A context-free language (cfl) is a language that can be defined by a context-
free grammar.
Notes on Terminology
Every regular grammar is a context-free grammar, in the same way that every dog is
an animal.
If grammar G is context free but not regular, we know the language L(G) is context
free. We do not know that L(G) is not regular. It might be possible to find a regular
grammar G2 that also defines L.
Example
Consider the following grammar:
Is G a context-free grammar?
Yes.
Is G a regular grammar?
No.
Is L(G) a context-free language?
Yes.
Example CFGs
Example 1
We have shown that L = {a^n b^n : n ≥ 0} is not regular. Here is a context-free grammar
for this language.
Example 2
We have shown that L = {a^n b^k : k > n ≥ 0} is not regular. Here is a context-free
grammar for this language.
Example 3
The language L = {ww^R : w ∈ {a, b}*}, where each string in L is a palindrome, is not
regular. Here is a context-free grammar for this language.
1. Does every string recognized by this grammar have an equal number of a's
and b's?
2. Is every string consisting of an equal number of a's and b's recognized by this
grammar?
Example 5
The language L, consisting of balanced strings of parentheses, is context-free but not
regular. The grammar is simple, but we have to be careful to keep the terminal
symbols ( and ) separate from any parentheses we use as metasymbols.
Sentential Forms
A sentential form is the start symbol S of a grammar or any string in (V ∪ T)* that
can be derived from S.
Because this grammar is linear, each sentential form has at most one variable. Hence
there is never any choice about which variable to expand next.
Derivation Trees
Since the order in which we expand the variables in a sentential form doesn't
seem to make any difference (the textbook contains a proof of this), it would be
nice to show a derivation in some way that is independent of the order.
A derivation tree is a way of presenting a derivation in an order-independent
fashion.
For example, for the following derivation:
S ⇒ ABC ⇒ aABC ⇒ aABcC ⇒ aBcC ⇒ abBcC ⇒ abBc ⇒ abbBc ⇒ abbc
we would have the derivation tree:
(derivation tree figure omitted)
This tree represents not just the given derivation, but all the different orders in
which the same productions could be applied to produce the string abbc.
A partial derivation tree is any subtree of a derivation tree such that, for any
node of the subtree, either all of its children are also in the subtree, or none of
them are.
The yield of the tree is the final string obtained by reading the leaves of the tree
from left to right, ignoring the λs (unless all the leaves are λ, in which case the
yield is λ). The yield of the above tree is the string abbc, as expected.
The yield of a partial derivation tree that contains the root is a sentential form.
A language is a set of strings, and any well-defined set must have a membership
criterion. A context-free grammar can be used as a membership criterion -- if we can
find a general algorithm for using the grammar to recognize strings.
Parsing a string is finding a derivation (or a derivation tree) for that string.
Systematic approaches are easy to find. Almost any exhaustive search technique will
do.
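One such exhaustive search can be sketched as a breadth-first search over sentential forms. The grammar below (S → aSb | ab, generating {aⁿbⁿ : n ≥ 1}) is an illustrative choice, not one taken from the notes:

```python
from collections import deque

# Breadth-first exhaustive search for a derivation of w.  Because the
# illustrative grammar has no lambda- or unit productions, any
# sentential form longer than w can never derive w and is pruned.
def parse(w, productions, start="S"):
    seen, queue = {start}, deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:             # expand each variable occurrence
                new = form[:i] + rhs + form[i + 1:]
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

grammar = [("S", "aSb"), ("S", "ab")]
```

The pruning makes the search finite, since only finitely many strings over V ∪ T have length at most |w|.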
We can (almost) make the search finite by terminating every search path at the point
that it generates a sentential form containing more than |w| terminals.
Note: for the time being, we will ignore the possibility that λ is in the language.
Suppose we make the following restrictions on the grammar: no λ-productions and no
unit productions (productions of the form A → B). Then every derivation step either
adds a terminal or lengthens the sentential form, so the bounded search above is
guaranteed to terminate.
Pushdown Automata
We start with the formal definition of an nfa, which is a 5-tuple, and add two things to
it:
o a stack alphabet Γ, and
o a start stack symbol z ∈ Γ.
We also need to modify δ, the transition function, so that it manipulates the stack.
M = (Q, Σ, Γ, δ, q0, z, F)
where
Q is a finite set of states,
Σ is the input alphabet,
Γ is the stack alphabet,
δ is the transition function,
q0 ∈ Q is the start state,
z ∈ Γ is the start stack symbol, and
F ⊆ Q is the set of final states.
Let the symbol "⊢" indicate a move of the npda, and suppose that δ(q1, a, x) = {(q2,
y), ...}. Then the following move is possible:
(q1, aW, xZ) ⊢ (q2, W, yZ)
where W indicates the rest of the string following the a, and Z indicates the rest
of the stack contents underneath the x. This notation says that in moving from
state q1 to state q2, an a is consumed from the input string aW, and the x at the
top (left) of the stack xZ is replaced with y, leaving yZ on the stack.
Processing begins with the instantaneous description
(q0, w, z)
where
q0 is the start state,
w is the entire string to be processed, and
z is the start stack symbol.
Starting with this instantaneous description, make zero or more moves, just as you
would with an nfa. There are two kinds of moves that you can make:
λ-transitions. If you are in state q1, x is the top (leftmost) symbol in the stack,
and δ(q1, λ, x) = {(q2, w2), ...}, then you can replace the symbol x with the
string w2 and move to state q2.
Nonempty transitions. If you are in state q1, a is the next unconsumed input
symbol, x is the top (leftmost) symbol in the stack, and δ(q1, a, x) = {(q2,
w2), ...}, then you can remove the a from the input string, replace the symbol x
with the string w2, and move to state q2.
If you are in a final state when you reach the end of the string (and maybe make
some transitions after reaching the end), then the string is accepted by the npda. It
doesn't matter what is on the stack.
As usual with nondeterministic machines, the string is accepted if there is any way it
could be accepted. If we take the "oracle" viewpoint, then every time we have to make
a choice, we magically always make the right choice, so we will end in a final state if
at all possible.
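This "accept if any path accepts" rule can be sketched as a breadth-first search over instantaneous descriptions. The npda below, for {aⁿbⁿ : n ≥ 1}, is a hypothetical example, not one taken from the notes:

```python
def npda_accepts(delta, start, stack_start, finals, w, max_steps=1000):
    """Accept if ANY sequence of moves consumes all of w and ends in a
    final state.  delta maps (state, input symbol or "" for a
    lambda-move, stack top) to a set of (new state, string pushed)."""
    frontier = [(start, w, stack_start)]
    for _ in range(max_steps):
        if not frontier:
            return False
        if any(inp == "" and q in finals for q, inp, _ in frontier):
            return True
        nxt = []
        for q, inp, stack in frontier:
            if not stack:
                continue
            top, rest = stack[0], stack[1:]
            for q2, push in delta.get((q, "", top), ()):   # lambda-moves
                nxt.append((q2, inp, push + rest))
            if inp:                                        # consume one symbol
                for q2, push in delta.get((q, inp[0], top), ()):
                    nxt.append((q2, inp[1:], push + rest))
        frontier = nxt
    return False

# A hypothetical npda for {a^n b^n : n >= 1}: the stack counts the a's.
delta = {("q0", "a", "z"): {("q0", "az")},
         ("q0", "a", "a"): {("q0", "aa")},
         ("q0", "b", "a"): {("q1", "")},
         ("q1", "b", "a"): {("q1", "")},
         ("q1", "", "z"): {("qf", "z")}}
```

Tracking a whole frontier of instantaneous descriptions at once plays the role of the "oracle": if any path leads to acceptance, the search finds it.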
A variable A is left-recursive if it occurs in a production of the form
A → Ax
for any x ∈ (V ∪ T)*. A grammar is left-recursive if it contains at least one left-
recursive variable.
A grammar is in Chomsky Normal Form if every production has the form
A → BC
or
A → a
where A, B, and C are variables and a is a terminal. Any context-free grammar that
does not contain λ can be put into Chomsky Normal Form.
(Most textbook authors also allow the production S → λ, so long as S does not appear
on the right-hand side of any production.)
Chomsky Normal Form is particularly useful for programs that have to manipulate
grammars.
A grammar is in Greibach Normal Form if every production has the form
A → ax
where a is a terminal and x is a (possibly empty) string of variables.
Grammars in Greibach Normal Form are typically ugly and much longer than the cfg
from which they were derived. Greibach Normal Form is useful for proving the
equivalence of cfgs and npdas. When we discuss converting a cfg to an npda, or vice
versa, we will use Greibach Normal Form.
In the npda we will construct, the states are hardly important at all. All the real work
is done on the stack. In fact, we will use only the following three states, regardless of
the complexity of the grammar:
Start state q0 just gets things initialized. We use the transition from q0 to q1 to
put the grammar's start symbol on the stack.
State q1 does the bulk of the work. We represent every derivation step as a
move from q1 to q1.
We use the transition from q1 to qf to accept the string.
Example
Consider the grammar G = ({S, A, B}, {a, b}, S, P), where
P = {S → a, S → aAB, A → aA, A → a, B → bB, B → b}.
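A sketch of this three-state construction in Python: the grammar is already in Greibach Normal Form, so each production A → ax becomes a stack move in state q1. The depth-limited search used to run the resulting npda is a simplification for short strings:

```python
# Build the npda's transitions from the productions of G: a production
# A -> a x becomes "in q1, reading a with A on top, replace A with x".
productions = [("S", "a", ""), ("S", "a", "AB"), ("A", "a", "A"),
               ("A", "a", ""), ("B", "b", "B"), ("B", "b", "")]

delta = {("q0", "", "z"): {("q1", "Sz")},  # q0: push the start symbol
         ("q1", "", "z"): {("qf", "z")}}   # accept once only z remains
for var, term, rest in productions:
    delta.setdefault(("q1", term, var), set()).add(("q1", rest))

def accepts(w):
    """Depth-limited nondeterministic search over (state, input, stack)."""
    frontier = {("q0", w, "z")}
    for _ in range(4 * len(w) + 8):
        if not frontier:
            return False
        if any(q == "qf" and inp == "" for q, inp, _ in frontier):
            return True
        nxt = set()
        for q, inp, stack in frontier:
            if not stack:
                continue
            for a in {"", inp[:1]}:    # lambda-move or consume one symbol
                for q2, push in delta.get((q, a, stack[0]), ()):
                    nxt.add((q2, inp[len(a):], push + stack[1:]))
        frontier = nxt
    return False
```

The stack holds the unexpanded part of the sentential form, which is exactly how the construction represents every derivation step as a move from q1 to q1.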
We assert without proof that any npda can be transformed into an equivalent npda that
has the following form:
The npda has only one final state, which it enters if and only if the stack is
empty;
All transitions have the form
o δ(qi, a, A) = {(qj, λ)}, or
o δ(qi, a, A) = {(qj, BC)}
Each transition of the form δ(qi, a, A) = (qj, BC) results in a multitude of grammar
rules, one for each pair of states qx and qy in the npda. This algorithm results in a lot
of useless (unreachable) productions, but the useful productions define the
context-free grammar recognized by the npda.
Both npdas and dpdas may have λ-transitions; but a dpda may have a λ-
transition only if no other transition is possible.
We will show that, if L is a context free language, then strings of L that are at least m
symbols long can be "pumped" to produce additional strings in L. (The value of m
depends on the particular language.)
Let L be an infinite context-free language. Then there is some positive integer m such
that, if S is a string of L of length at least m, then S = uvwxy, where |vwx| ≤ m,
|vx| ≥ 1, and uvⁱwxⁱy ∈ L for every i ≥ 0.
Preliminary Definitions
A variable is useful if it occurs in the derivation of some string. This requires that
the variable occurs in some sentential form (you can get to the variable if you
start from S), and
a string of terminals can be derived from the sentential form (the variable isn't a
"dead end").
A variable A is recursive if it is either
directly recursive, that is, there is a production A → x1Ax2 for some strings x1,
x2 ∈ (T ∪ V)*, or
indirectly recursive, that is, there are variables Xi and productions
A → ...X1...
X1 → ...X2...
X2 → ...X3...
...
XN → ...A...
There are only a finite number of variables in a grammar, and the productions for each
variable have finite lengths. The only way that a grammar can generate arbitrarily
long strings is if one or more variables is both useful and recursive.
If a variable is not useful, it does not occur in the derivation of any string of the
language. Useless variables can always be eliminated from a grammar.
Suppose that no variable in the grammar is recursive. Since the start symbol is
nonrecursive, it must be defined only in terms of terminals and other variables. Then
since those variables are nonrecursive, they have to be defined in terms of terminals
and still other variables, and so on. After a while we run out of "other variables" while
the generated string is still finite. Hence there is an upper bound on the length of the
string that can be generated from the start symbol. This contradicts our premise that
the language is infinite. Therefore, our assumption that no variable is recursive must be
incorrect.
Since A was used in the derivation, the derivation must have started as
S ⇒* uAy
for some values of u and y. Since A was used recursively, the derivation must have
continued as
S ⇒* uAy ⇒* uvAxy
Finally, the derivation must have eliminated all variables to reach a string X in the
language:
S ⇒* uAy ⇒* uvAxy ⇒* uvwxy = X
This shows that the derivations
A ⇒* vAx
and
A ⇒* w
are possible. Hence the derivation
A ⇒* vwx
must also be possible.
(Notice, by the way, that the above does not imply that A was used recursively only
once. The "*" of "⇒*" could cover many uses of A, as well as other recursive
variables.)
There has to be some "last" recursive step. Consider the longest strings that can be
derived for v, w, and x without the use of recursion. Then there is a number m such
that |vwx| < m.
Since the grammar (by hypothesis) does not contain any λ-productions or unit
productions, every derivation step either introduces a terminal or increases the length
of the sentential form. Since A ⇒* vAx, it follows that |vx| > 0.
Finally, since uvAxy occurs in the derivation, and A ⇒* vAx and A ⇒* w are both
possible, it follows that uvⁱwxⁱy also belongs to L.
Suppose L is context-free. If string X ∈ L, where |X| > m, it follows that X = uvwxy,
where |vwx| ≤ m. Choose a value for i that is greater than m. Then, wherever vwx
occurs in the string aⁱbⁱcⁱ, it cannot contain more than two distinct letters--it can be all
a's, all b's, all c's, or it can be a's and b's, or it can be b's and c's. Thus the string vx
cannot contain more than two distinct letters; but by the pumping lemma, it cannot be
empty, either, so it must contain at least one letter.
Now we are ready to "pump." Since uvwxy is in L, uv²wx²y must also be in L. Since v
and x can't both be empty, |uv²wx²y| > |uvwxy|, so we have added letters. But since vx
does not contain all three distinct letters, we cannot have added the same number of
each letter. Thus uv²wx²y cannot be in L.
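This case analysis can be checked mechanically for a small illustrative value of m (m = 3 here is an assumption for illustration; the real m comes from the grammar). The sketch tries every legal decomposition of a³b³c³ and confirms that pumping with i = 2 always leaves the language:

```python
def in_L(s):
    """Membership in {a^n b^n c^n : n >= 0}."""
    n = len(s) // 3
    return s == "a" * n + "b" * n + "c" * n

def pumped_results(s, m):
    """For every decomposition s = uvwxy with |vwx| <= m and |vx| >= 1,
    record whether the pumped string u v^2 w x^2 y is still in the language."""
    out = []
    for start in range(len(s)):
        for vwx in range(1, min(m, len(s) - start) + 1):
            for v in range(vwx + 1):
                for w in range(vwx - v + 1):
                    if v + (vwx - v - w) == 0:   # |vx| must be >= 1
                        continue
                    u_, v_ = s[:start], s[start:start + v]
                    w_ = s[start + v:start + v + w]
                    x_ = s[start + v + w:start + vwx]
                    y_ = s[start + vwx:]
                    out.append(in_L(u_ + v_ * 2 + w_ + x_ * 2 + y_))
    return out

m = 3   # an illustrative value; the real m comes from the grammar
checks = pumped_results("a" * m + "b" * m + "c" * m, m)
```

Every entry of `checks` is False: no decomposition survives pumping, matching the argument above.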
Turing Machines
Informal Definition of Turing Machines
A Turing machine is a lot like a pushdown automaton. Both have a finite-state machine as a central component;
both have additional storage. But where a pushdown automaton uses a stack for storage, a Turing machine uses a
tape, which is considered to be infinite in both directions. The tape consists of a series of squares, each of which
can hold a single symbol. The tape head, or read-write head, can read a symbol from the tape, write a symbol to
the tape, and move one square in either direction.
Turing machines can be deterministic or nondeterministic. We will consider only deterministic machines.
Unlike the other automata we have discussed, a Turing machine does not read "input." Instead, there may be (and
usually are) symbols on the tape before the Turing machine begins; the Turing machine might read some, all, or
none of these symbols. The initial tape may, if desired, be thought of as "input."
We have defined acceptors, which produce only a binary (accept/reject) output, and transducers, which can produce
more complicated results. However, all our work so far has been with acceptors. A Turing machine also accepts or
rejects its input. More importantly, the results left on the tape when the Turing machine finishes can be regarded as
the "output" of the computation; thus, a Turing machine is a transducer.
A Turing machine is a 7-tuple
(Q, Σ, Γ, δ, q0, #, F)
where
Q is a finite set of states,
Σ is a finite set of symbols, the input alphabet,
Γ is a finite set of symbols, the tape alphabet, which contains Σ and the blank symbol #,
δ is the transition function,
q0 ∈ Q is the start state,
# ∈ Γ is the blank symbol, and
F ⊆ Q is the set of final states.
Because the Turing machine has to be able to find its input, and to know when it has processed all of that input, we
require:
The tape is initially blank (every symbol is #) except possibly for a finite, contiguous sequence of symbols.
If there are initially nonblank symbols on the tape, the tape head is initially positioned on one of them.
Most other textbooks make no distinction between Σ (the input alphabet) and Γ (the tape alphabet). Our textbook
does this to emphasize that the "input" (the nonblank symbols on the tape) does not contain #. Also, there may
be more symbols in Γ than are present in the input.
δ: Q × Γ → Q × Γ × {L, R}
This means:
When the machine is in a given state (∈ Q) and reads a given symbol (∈ Γ) from the tape, it replaces the symbol on the
tape with some other symbol (∈ Γ), goes to some other state (∈ Q), and moves the tape head one square left (L) or right
(R).
An instantaneous description or configuration of a Turing machine requires (1) the state the Turing machine is in,
(2) the contents of the tape, and (3) the position of the tape head on the tape. This can be summarized in a string of
the form
xi...xjqmxk...xl
where the x's are the symbols on the tape, qm is the current state, and the tape head is on the square containing xk (the
symbol immediately following qm).
A move of a Turing machine can therefore be represented as a pair of instantaneous descriptions, separated by the
symbol "⊢". For example, if
δ(q5, b) = (q8, c, R)
then one possible move is
abbabq5babb ⊢ abbabcq8abb
A Turing machine is often defined to start with the read head positioned over the first (leftmost) input symbol. This
isn't really necessary, because if the Turing machine starts anywhere on the nonblank portion of the tape, it's simple
to get to the first input symbol. For the input alphabet Σ = {a, b}, the following program fragment does the trick,
then goes to state q1.
(q0, a, a, L, q0)
(q0, b, b, L, q0)
(q0, #, #, R, q1)
(Notice that this definition assumes that the Turing machine starts with its tape head positioned on the leftmost
symbol.)
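A minimal Turing machine simulator makes the fragment concrete. The dict-based tape and rule table are implementation choices for this sketch, not notation from the notes:

```python
def run_tm(rules, state, tape, pos, final, max_steps=1000):
    """Deterministic TM simulator.  rules maps (state, symbol) to
    (symbol to write, direction, new state); tape is a dict from square
    index to symbol, with '#' (blank) everywhere else."""
    for _ in range(max_steps):
        if state == final:
            return state, tape, pos
        sym = tape.get(pos, "#")
        if (state, sym) not in rules:
            break                      # no applicable rule: halt
        write, move, state = rules[(state, sym)]
        tape[pos] = write
        pos += 1 if move == "R" else -1
    return state, tape, pos

# The fragment from the notes: scan left until the blank, step right.
rules = {("q0", "a"): ("a", "L", "q0"),
         ("q0", "b"): ("b", "L", "q0"),
         ("q0", "#"): ("#", "R", "q1")}
state, tape, pos = run_tm(rules, "q0", dict(enumerate("abba")), 2, "q1")
```

Started anywhere on the nonblank portion, the fragment ends in q1 with the head on the leftmost input symbol.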
We said a Turing machine accepts its input if it halts in a final state. There are two ways this could fail to happen: the machine could halt in a nonfinal state, or it could never halt at all.
If a Turing machine halts, the sequence of configurations leading to the halt state is called a computation.
Recognizing a Language
This machine will match strings of the form {aⁿbⁿ : n ≥ 0}.
q1 is the only final state.
A Turing machine computes a function f if, when started with input x on its tape, it halts with f(x) on the tape:
q0x ⊢* qfy
where y = f(x) and qf is a final state. (Actually, this definition is a bit stronger than necessary; can you see how?)
A function f is Turing computable if there exists a Turing machine that can perform the above task.
Sorting
Given a string consisting of a's and b's, this machine will rearrange the string so that all the a's come before all the
b's.
A class of automata (e.g. standard Turing machines) is equivalent to another class of automata (e.g. nondeterministic
Turing machines) if, for each transducer in one class, an equivalent transducer can be found in the other class.
At each move of a Turing machine, the tape head may move either left or right. We may augment this with a stay
option: we will add "don't move" to the set {L, R}.
Theorem. Turing machines with a stay option are equivalent to standard Turing machines.
An n-track Turing machine is one in which each square of the tape holds an ordered n-tuple of symbols from the
tape alphabet. You can think of this as a Turing machine with multiple tape heads, all of which move in lock-step
mode.
Theorem. N-track Turing machines are equivalent to standard Turing machines.
Theorem. Turing machines with semi-infinite tape are equivalent to standard Turing machines.
An off-line Turing machine has two tapes. One tape is read-only and contains the input; the other is read-write and is
initially blank.
A multitape Turing machine has a finite number of tapes, each with its own independently controlled tape head.
A nondeterministic Turing machine is one in which the dfa controlling the tape is replaced with an nfa.
A binary Turing machine is one whose tape alphabet consists of exactly two symbols.
A two-state Turing machine is one that has only two states. (It makes up for this by having a large, albeit finite,
alphabet.)
Standard Turing machines do not have subroutines, but it's easy to fake them. We do not need to add any new
features. The basic idea is to use one state, or a small group of states, to perform a single task. For example, the
following state can be used to find the left end of the input:
-------------------------------------------------------
q0 | a | a | L | q0
q0 | b | b | L | q0
q0 | # | # | R | q1
-------------------------------------------------------
This "subroutine" can be entered by going to state q0 (which isn't a problem) and it "exits" by going to state q1
(which is a problem). We would like to have the subroutine exit to different states, depending on where it was
called from. The easy way to do this is to make multiple copies of the subroutine, using the same structure but
different state names, e.g.,
-------------------------------------------------------
q41 | a | a | L | q41
q41 | b | b | L | q41
q41 | # | # | R | q87
-------------------------------------------------------
This approach is cumbersome but theoretically adequate, so long as only a finite number of copies are required.
We will store the program as a sequence of 5-tuples: (old state, symbol read, symbol to write, direction, new state).
We want our universal Turing machine to emulate any other Turing machine, even one with a larger tape alphabet.
Hence, we will have to map the emulated alphabet into our own alphabet. The simplest way to do this is with a
unary encoding, e.g. a1=1, a2=11, a3=111, and so on. We can use a second tape symbol, say "0", to separate the
symbols of the emulated alphabet.
Alternatively, we could use a binary notation, e.g. a1=1, a2=10, a3=11, and so on. We would use some other symbol,
say "x", to separate the characters. Many other encoding schemes are possible.
Similarly, we can use strings of digits to encode states, and to encode directions (L and R). For example, the
5-tuple (q1, a2, a5, L, q3) could be encoded in unary as
...010110111110101110...
This same 5-tuple could be encoded in binary as
...x1x10x101x1x11x...
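Both encodings can be sketched directly. The strings in the notes decode to the component numbers (1, 2, 5, 1, 3); the mapping of those numbers back to particular states, symbols, and directions (q1 = 1, L = 1, and so on) is an assumption:

```python
# Unary and binary encodings of a 5-tuple (old state, symbol read,
# symbol written, direction, new state), each component given as a
# positive number.  The numbering of states/symbols/directions is assumed.
def encode_unary(t):
    return "0".join("1" * n for n in t)        # "0" separates components

def encode_binary(t):
    return "x" + "x".join(format(n, "b") for n in t) + "x"

code_u = encode_unary((1, 2, 5, 1, 3))
code_b = encode_binary((1, 2, 5, 1, 3))
```

On the tape itself the tuple would be flanked by additional separator symbols, which is why the strings in the notes begin and end with extra 0's (or x's).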
One tape holds the "program." This consists of a set of 5-tuples (old state, symbol read, symbol to write,
direction, new state).
A second tape holds the current "state" of the Turing machine that we are emulating.
A third tape holds the "input" and will, upon completion, hold the "output." Again, since the tape
alphabet of the emulated Turing machine may be larger than the tape alphabet of our universal Turing
machine, we may need to encode the alphabet. (This is exactly analogous to using eight bits to represent
an ASCII byte.)
We have already noted that a multitape Turing machine is equivalent to a standard Turing machine. In fact, once
you get into the details, it's probably easier to do on a single-tape machine.
We will not complete the development of a universal Turing machine, but it isn't as difficult as you might think. One
of my textbooks has a complete universal Turing machine in only 23 states (and a reasonably small alphabet).
I have heard of a Turing machine that implements a FORTRAN compiler. Some people have too much time on their
hands.
The languages recognizable by linear-bounded automata (lbas) are called the context-sensitive languages. We will
briefly touch on context-sensitive languages later in this course.
Turing's Thesis
Alan Turing defined Turing machines in an attempt to formalize the notion of an effective procedure (essentially the
same as what we would call an "algorithm").
At approximately the same time, other mathematicians were independently working on the same problem.
All of these formalisms were proved equivalent to one another. This led to
Turing's Thesis (weak form): A Turing machine can compute anything that can be computed by a general-purpose
digital computer.
Turing's Thesis (strong form): A Turing machine can compute anything that can be computed.
The strong form of Turing's thesis cannot be "proved," because it states a relationship between mathematical
concepts and the "real world."
You probably learned to count by putting things into a one-to-one correspondence with your fingers. Now you count
by putting things into a one-to-one correspondence with a subset of the natural numbers (the numbers 1, 2, 3, ...).
Like so:
In calculus you probably learned that "infinity" is not a number. They lied. Infinity, as a number, is represented by ℵ₀ ("aleph-null").
A set is denumerable if its elements can be put into a one-to-one correspondence with the natural numbers.
{ 1, 2, 3, 4, 5, 6, 7, 8, 9, ...}
| | | | | | | | |
{ 0, 1, -1, 2, -2, 3, -3, 4, -4, ...}
Notice that, given any natural number N, you can figure out what integer I it corresponds
to (I = N/2 when N is even, I = -(N-1)/2 when N is odd), and vice versa.
Since we can put these two sets into a one-to-one correspondence, they must have the same
number of elements, namely, ℵ₀.
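The correspondence of Example 1 can be written as a pair of mutually inverse functions between the naturals {1, 2, 3, ...} and the integers:

```python
# Bijection between the naturals {1, 2, 3, ...} and the integers,
# matching the table above: even naturals map to the positive integers,
# odd naturals to zero and the negatives.
def nat_to_int(n):
    return n // 2 if n % 2 == 0 else -(n - 1) // 2

def int_to_nat(i):
    return 2 * i if i > 0 else 1 - 2 * i

pairs = [(n, nat_to_int(n)) for n in range(1, 6)]
```

Because each function inverts the other, the correspondence really is one-to-one.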
Example 2. There are as many odd natural numbers as there are even natural numbers. To prove
this, we note the following correspondence:
{ 1, 3, 5, 7, 9, ...}
| | | | |
{ 2, 4, 6, 8, 10, ...}
Example 3. There are as many even natural numbers as there are natural numbers.
{ 1, 2, 3, 4, 5, ...}
| | | | |
{ 2, 4, 6, 8, 10, ...}
Diagonalization
In mathematics (not in computer science!), real numbers are defined to have an infinite number
of digits to the right of the decimal point. Thus, transcendental numbers such as pi and e, as well
as rational numbers such as 2.000000... and 0.171717... are real numbers.
Theorem. The real numbers are not denumerable; there are more real numbers than there are
natural numbers.
We will consider only the real numbers between 0 and 1, or more exactly, the set {x : 0 ≤ x < 1}.
To show that these real numbers are not denumerable, we can't just demonstrate an attempted
correspondence that doesn't work; we need to show that no possible correspondence can work.
We do this by a technique called diagonalization.
Proof. Suppose that there exists a one-to-one correspondence between the natural numbers and
the real numbers. Then list the natural numbers and their corresponding real numbers as shown:
1 .141592...
2 .171717...
3 .718284...
4 .250000...
...
(The actual real numbers shown are for illustrative purposes only; to be more formal we should
represent these numbers in some more abstract way, such as .d1,1d1,2d1,3d1,4...)
We claim this correspondence is complete, so every real number must be found somewhere in
the right column. Now consider some number whose first digit (after the decimal point) is
different from the first digit of the first real number (the 1 shown in boldface above); whose
second digit differs from the second digit of the second real number (7); whose third digit differs
from the third digit of the third real number (8); and so on. For example, the real number we are
constructing might start out .2855... (since 2 ≠ 1, 8 ≠ 7, 5 ≠ 8, 5 ≠ 0, and so on).
We constructed this real number so that it differs from every real number in the right-hand
column by at least one digit. Thus, the number does not appear in the right-hand column. Since
this argument applies to any arbitrary correspondence between the natural numbers and the reals,
no one-to-one correspondence is possible, and the real numbers are not denumerable. Q.E.D.
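The diagonal construction is mechanical enough to run. This sketch uses the illustrative digit rows from the proof and one concrete choice of "different digit" (add 1 mod 10), rather than the .2855... chosen in the text:

```python
# Cantor's diagonal construction: build a digit sequence that differs
# from the n-th listed sequence in its n-th digit.
def diagonal(rows):
    return [(row[i] + 1) % 10 for i, row in enumerate(rows)]

rows = [[1, 4, 1, 5, 9, 2],     # the illustrative reals from the proof
        [1, 7, 1, 7, 1, 7],
        [7, 1, 8, 2, 8, 4],
        [2, 5, 0, 0, 0, 0]]
d = diagonal(rows)
```

Whatever list of rows is supplied, the result disagrees with row n in position n, so it can appear nowhere in the list.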
Nondenumerable Powersets
Theorem. The powerset of an infinite (denumerable) set is not denumerable.
Put the elements of the infinite set into a one-to-one correspondence with the natural numbers.
(By hypothesis, the set is denumerable, so this step must be possible.) Then we can refer to the
elements of the set as E1, E2, E3, and so on.
The elements of the powerset will be subsets of {E1, E2, E3, ...}. Assume we can put these subsets
into a one-to-one correspondence with the natural numbers, as follows:
E1 E2 E3 E4 E5
______________________________
1 | 0 1 1 0 0 ...
2 | 1 1 0 0 1 ...
3 | 0 0 0 0 1 ...
4 | 0 0 1 1 1 ...
5 | 1 0 1 0 1 ...
... ... ... ... ... ... ...
(In the table, a 1 indicates that the element is present in the subset, and a 0 indicates that it is
absent.) Now construct a new subset, as follows: For each natural number i, element Ei belongs
to this new subset if and only if it doesn't belong to subset i.
Since this new subset differs from every subset in the correspondence by the presence or absence
of at least one element, the correspondence is faulty. But since the only assumption we made
about the correspondence is that it exists, no such correspondence can exist, and the powerset is
not denumerable. Q.E.D.
can be represented as
11111011011101101111111
can be represented as
11111011011101101111111011011011011011.
Not every binary number represents a Turing machine. For example, a Turing machine can only
move in two directions (L and R), so a sequence ...01110... for direction would not be
meaningful.
Suppose we count in binary, checking each number in turn to see whether it represents a valid
Turing machine. We assign 1 to the first valid Turing machine we encounter (that is, the one
represented by the smallest binary number), 2 to the second such machine, and so on. Since any
Turing machine can be represented in binary, it should be clear that this establishes a one-to-one
correspondence between Turing machines and the natural numbers. Hence, Turing machines are
denumerable.
Never halt.
Clearly, every recursive language is also recursively enumerable. It is not obvious whether every
recursively enumerable language is also recursive.
Note on terminology: Turing machines aren't "recursive." The terminology is borrowed from
recursive function theory (Turing machines are equivalent to general recursive functions). The
terms really don't make sense in this context, so don't worry about trying to make them make
sense.
If a language is recursive, then there exists a Turing machine for it that is guaranteed to halt. We
can generate the strings of Σ* in a shortest-first order (to guarantee that every finite string will
be generated), test the string with the Turing machine, and if the Turing machine accepts the
string, assign that string the next available natural number.
We can also enumerate the recursively enumerable languages. We have a Turing machine that
will halt and accept any string that belongs to the language; the trick is to avoid getting hung up
on strings that cause the Turing machine to go into an infinite loop. We do this by "time
sharing." Here's how:
W := ∅; N := 0;
for i := 1 to ∞ do {
add the next string in Σ* to set W;
initialize a Turing machine for this new string;
for each string in set W do {
let the Turing machine for it make one move;
if the Turing machine halts {
accept or reject the string as appropriate;
if the string is accepted {
N := N + 1;
let this be string N of the language;
}
remove the string from set W;
}
}
}
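The pseudocode above can be made runnable. In this sketch a "Turing machine" for string s is faked by a step count (it halts and accepts after |s|+1 steps when s is in the language, and runs forever otherwise); that stand-in is an assumption for illustration, but the time-sharing structure is the same:

```python
def strings(alphabet="ab"):
    """All strings over the alphabet, shortest first."""
    frontier = [""]
    while True:
        yield from frontier
        frontier = [s + c for s in frontier for c in alphabet]

def enumerate_language(accepts, how_many):
    """Time-shared ("dovetailed") enumeration: every active machine gets
    one move per round, so a non-halting machine cannot block the rest."""
    found, active, gen = [], {}, strings()
    while len(found) < how_many:
        active[next(gen)] = 0          # start a machine for the next string
        for s in list(active):
            active[s] += 1             # one move of s's machine
            if accepts(s) and active[s] > len(s):
                found.append(s)        # its machine halted and accepted
                del active[s]
    return found

equal_ab = lambda s: s.count("a") == s.count("b")
first_four = enumerate_language(equal_ab, 4)
```

Strings whose fake machines never halt (here, those with unequal a's and b's) simply stay in the working set forever without delaying anyone else.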
We have shown that Turing machines are enumerable. Since recursively enumerable languages
are those whose strings are accepted by a Turing machine, the set of recursively enumerable
languages is also enumerable.
We have shown that the powerset of an infinite set is not enumerable -- that it has more than ℵ₀
subsets. Each of these subsets represents a language. Therefore, there must be languages that are
not computable by a Turing machine.
According to Turing's thesis, a Turing machine can compute any effective procedure. Therefore,
there are languages that cannot be defined by any effective procedure.
Problem. I've just defined a procedure for defining a non-recursively enumerable language. Isn't
this a contradiction?
When Recursively Enumerable Implies
Recursive
Suppose a language L is recursively enumerable. That means there exists a Turing machine T1
that, given any string of the language, halts and accepts that string. (We don't know what it will
do for strings not in the language -- it could reject them, or it could simply never halt.)
Now let's also suppose that the complement of L, -L = {w : w ∉ L}, is recursively enumerable.
That means there is some other Turing machine T2 that, given any string of -L, halts and accepts
that string.
Clearly, any string (over the appropriate alphabet Σ) belongs to either L or -L. Hence, any string
will cause either T1 or T2 (or both) to halt. We construct a new Turing machine that emulates
both T1 and T2, alternating moves between them. When either one stops, we can tell (by whether
it accepted or rejected the string) to which language the string belongs. Thus, we have
constructed a Turing machine that, for each input, halts with an answer whether or not the string
belongs to L. Therefore L and -L are recursive languages.
We have just proved the following theorem: If a language and its complement are both
recursively enumerable, then both are recursive.
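The alternating construction can be sketched with generators standing in for the two Turing machines (the length-based semi-deciders below are hypothetical examples, not machines from the notes):

```python
def semi(pred):
    """Model a semi-decider as a generator: it keeps yielding (running)
    and returns (halts, accepting) only if pred(w) holds."""
    def machine(w):
        for _ in range(len(w) + 1):
            yield                      # some finite amount of work
        while not pred(w):
            yield                      # non-members loop forever
    return machine

def decide(w, t1_factory, t2_factory, max_rounds=10_000):
    """Alternate one move of T1 (recognizes L) with one move of T2
    (recognizes -L); whichever halts first settles membership."""
    t1, t2 = t1_factory(w), t2_factory(w)
    for _ in range(max_rounds):
        try:
            next(t1)
        except StopIteration:
            return True                # T1 halted: w is in L
        try:
            next(t2)
        except StopIteration:
            return False               # T2 halted: w is in -L
    raise RuntimeError("neither machine halted")

is_even = lambda w: len(w) % 2 == 0
in_L = lambda w: decide(w, semi(is_even), semi(lambda v: not is_even(v)))
```

Because every string is in L or in -L, one of the two generators always halts, so `decide` always returns an answer: exactly the theorem's construction.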
We have shown how to enumerate strings for a given alphabet, w1, w2, w3, .... We have also
shown how to enumerate Turing machines, T1, T2, T3, .... (Recall that each Turing machine
defines a recursively enumerable language.) Consider the language
L = {wi : wi ∈ L(Ti)}
A little thought will show that L is itself recursively enumerable. But now consider its
complement:
-L = {wi : wi ∉ L(Ti)}
If -L is recursively enumerable, then there must exist a Turing machine that recognizes it. This
Turing machine must be in the enumeration somewhere -- call it Tk.
Does wk belong to L?
If wk belongs to L then (by the way we have defined L) Tk accepts this string. But Tk accepts only
strings that do not belong to L, so we have a contradiction.
If wk does not belong to L, then it belongs to -L and is accepted by Tk. But since Tk accepts wk, wk
must belong to L. Again, a contradiction.
We have now defined a recursively enumerable language L and shown by contradiction that -L is
not recursively enumerable.
We mentioned earlier that if a language is recursive, its complement must also be recursive. If
language L above were recursive, then -L would also be recursive, hence recursively
enumerable. But -L is not recursively enumerable; therefore L must not be recursive.
We have therefore shown that L is recursively enumerable but not recursive, therefore the set of
recursive languages is a proper subset of the set of recursively enumerable languages.
Unrestricted Grammars
Definition of Unrestricted Grammars
The productions of a grammar have the form
(V ∪ T)+ → (V ∪ T)*
The other grammar types we have considered (left linear, right linear, linear, context free) restrict
the form of productions in one way or another. An unrestricted grammar does not.
In what follows, we will attempt to show that unrestricted grammars are equivalent to Turing
machines. Bear in mind that
A language is recursively enumerable if there exists a Turing machine that accepts every string of
the language, and does not accept strings that are not in the language.
"Does not accept" is not the same as "reject" -- the Turing machine could go into an infinite loop
instead, and never get around to either accepting or rejecting the string.
Our plan of attack is to show that the languages generated by unrestricted grammars are precisely
the recursively enumerable languages.
1. If a procedure exists for enumerating the strings of a language, then the language is recursively
enumerable. (We proved this earlier.)
2. There exists a procedure for enumerating all the strings in any language generated by an
unrestricted grammar. (We will demonstrate the procedure shortly.)
Here's a review of the argument for (1) above. We prove the language is recursively enumerable
by constructing a Turing machine to accept any string w of the language.
Build one Turing machine that generates the strings of the language in some systematic order.
Build a second Turing machine that compares its input to w and accepts its input if the two
strings are identical.
Build a composite Turing machine that incorporates the two machines above, using the output
of the first as input to the second.
Now to systematically generate all the strings of the language. For other types of grammars it
worked to generate shortest strings first; we don't know how to do that with an unrestricted
grammar, because some productions could make the sentential form shorter. It might take a
million steps to derive λ.
Instead, we order the strings shortest derivation first. First we consider all the strings that can be
generated from S in one derivation step, and see if any of them are composed entirely of
terminals. (We can do this because there are only a finite number of productions.) Then we
consider all the strings that can be derived in two steps, and so on.
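This level-by-level search can be sketched directly: level k holds every sentential form reachable from S in exactly k steps. The λ-producing grammar used here (S → aSb | λ) is an illustrative choice, not one from the notes:

```python
# Shortest-derivation-first enumeration: any form made entirely of
# terminals is emitted as a string of the language.  For unrestricted
# grammars, sentential forms may shrink, which is why we order by
# derivation length rather than string length.
def derive_levels(productions, start, terminals, depth):
    level, language = {start}, []
    for _ in range(depth):
        nxt = set()
        for form in level:
            for lhs, rhs in productions:
                i = form.find(lhs)
                while i != -1:         # rewrite every occurrence of lhs
                    nxt.add(form[:i] + rhs + form[i + len(lhs):])
                    i = form.find(lhs, i + 1)
        level = nxt
        language += [f for f in level if all(c in terminals for c in f)]
    return language

levels = derive_levels([("S", "aSb"), ("S", "")], "S", "ab", 3)
```

Because only finitely many productions apply to each form, every level is finite, so every derivable terminal string is eventually reached.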
xi...xjqmxk...xl
where the x's are the symbols on the tape, qm is the current state, and the tape head is on the
square containing xk (the symbol immediately following qm). It makes sense that a grammar,
which is a system for rewriting strings, can be used to manipulate configurations, which can
easily be written as strings.
A Turing machine accepts a string w if
q0w ⊢* xqfy
for some strings x and y and some final state qf, whereas a grammar produces a string w if
S ⇒* w.
Because the Turing machine starts with w while the grammatical derivation ends with w, the
grammar we build will run "in reverse" as compared to the Turing machine.
Recall that the Turing machine computes
q0w ⊢* xqfy
and that our grammar will run "backwards" compared to the Turing machine.
The productions of the grammar we will construct can be logically grouped into three sets:
1. Initialization. These productions construct the string ...#$xqfy#... where # indicates a blank and $
is a special variable used for termination.
2. Execution. For each transition rule of δ we need a corresponding production.
3. Cleanup. Our derivation will leave some excess symbols q0, #, and $ in the string (along with the
desired w), so we need a few more productions to clean these up.
For the terminals T of the grammar we will use the input alphabet of the Turing machine.
Execution. For each transition rule of δ we need a corresponding production. For each rule of
the form
δ(qi, a) = (qj, b, R)
we use a production
bqj → qia
and for each rule of the form
δ(qi, a) = (qj, b, L)
we use a production
qjcb → cqia
for every c (the asymmetry is because the symbol to the right of q is the one under the Turing
machine's tape head).
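As an illustration (the function and its encoding are my own, not from the notes), these two production schemes can be generated mechanically from a transition table, treating states and tape symbols as single characters so that a configuration is just a string:

```python
def productions_from_transitions(delta, tape_symbols):
    """Build grammar productions that undo Turing-machine moves.

    delta maps (state, symbol) -> (state, symbol, direction).
    States and symbols are single-character strings, so a configuration
    like 'xqay' is an ordinary string.
    """
    prods = []
    for (qi, a), (qj, b, d) in delta.items():
        if d == "R":
            # TM move:  qi a  |-  b qj      reversed production:  b qj -> qi a
            prods.append((b + qj, qi + a))
        else:
            # TM move:  c qi a  |-  qj c b  reversed production:  qj c b -> c qi a
            for c in tape_symbols:
                prods.append((qj + c + b, c + qi + a))
    return prods
```

Each production rewrites the "after" configuration back into the "before" configuration, which is exactly the backwards-running behavior described above.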
Cleanup. We end up with a string that looks like #...#$q0w#...#, so we need productions to get
rid of everything but the w:
# → λ
$q0 → λ
There are two different definitions for "context-sensitive grammar," yielding grammars whose
productions look quite different. However, the grammars are equivalent, in that they describe
(almost) the same languages.
(Original definition) A context-sensitive grammar is one whose productions are all of the form
xAy → xvy
The name "context-sensitive" comes from the fact that the actual string modification is given by
A → v, while the x and y provide the context in which the rule may be applied.
(Extra crispy definition) A context-sensitive grammar is one whose productions are all of the
form
x → y, where |x| ≤ |y|
Such a grammar is called noncontracting because derivation steps never decrease the length of
the sentential form.
Most modern textbooks use the second definition given above. It can be shown that the two
kinds of grammars are almost equivalent (generate the same languages) with one exception: one
kind of grammar permits languages to contain the empty string, while the other doesn't. (Easy
question: which one permits λ?)
Notice how this definition carefully sidesteps the question of which kind of context-sensitive
grammar is meant.
Linear-Bounded Automata
A Turing machine has an infinite supply of blank tape. A linear-bounded automaton (lba) is a
Turing machine whose tape is only kn squares long, where n is the length of the input (initial)
string and k is a constant associated with the particular linear-bounded automaton.
Some textbooks define an lba to use only the portion of the tape that is occupied by the input;
that is, k=1 in all cases. The definitions lead to equivalent classes of machines, because we can
compensate for the shorter tape by having a larger tape alphabet.
Theorem. For every context-sensitive language L there exists an lba M such that L=L(M), that
is, M accepts exactly the strings of L.
Theorem. For every language L accepted by an lba there exists a context-sensitive grammar that
produces exactly L (or L - {λ}, depending on your definition of context-sensitive grammar).
Theorem. Every context-free language is context-sensitive.
Proof. The productions of a context-free language have the form A → v. The productions of a
context-sensitive language have the form xAy → xvy, where x and y are permitted to be λ.
Q.E.D.
Proof. The language {aⁿbⁿcⁿ : n > 0} is not context-free (we used a pumping lemma to show
this). We can show that it is context-sensitive by providing an appropriate context-sensitive
grammar. Here are the productions for one such grammar:
X → aXBC    X → aBC    CB → BC    aB → ab
bB → bb     bC → bc    cC → cc
Theorem. There is an algorithm to determine whether a given string w belongs to a given
context-sensitive language.
Proof. A context-sensitive grammar is noncontracting; moreover, for any integer n there are only
a finite number of sentential forms of length n. Therefore, for any string w we can set a bound on
the number of derivation steps required to generate w, hence a bound on the number of possible
derivations. The string w is in the language if and only if one of these derivations produces w.
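This decision procedure can be sketched in Python for the aⁿbⁿcⁿ grammar above. Because the grammar is noncontracting, any sentential form longer than w can be pruned, so the search always terminates. The representation (productions as string pairs, uppercase variables) is an assumption of this sketch:

```python
from collections import deque

def csg_member(w, productions, start="X"):
    """Decide membership for a noncontracting (context-sensitive) grammar.

    Breadth-first search over sentential forms; forms longer than w are
    pruned, which is safe because derivation steps never shrink a form.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        form = queue.popleft()
        if form == w:
            return True
        for lhs, rhs in productions:
            i = form.find(lhs)
            while i != -1:
                new = form[:i] + rhs + form[i + len(lhs):]
                if len(new) <= len(w) and new not in seen:
                    seen.add(new)
                    queue.append(new)
                i = form.find(lhs, i + 1)
    return False

prods = [("X", "aXBC"), ("X", "aBC"), ("CB", "BC"),
         ("aB", "ab"), ("bB", "bb"), ("bC", "bc"), ("cC", "cc")]
```

With these productions, csg_member accepts abc and aabbcc but rejects strings such as aabbc.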
Language                         Grammar                                Machine                               Example
Regular language                 Regular (right-linear or left-linear)  Finite automaton
                                 grammar
Context-free language            Context-free grammar                   Nondeterministic pushdown automaton   aⁿbⁿ
Context-sensitive language       Context-sensitive grammar              Linear-bounded automaton              aⁿbⁿcⁿ
Recursively enumerable language  Unrestricted grammar                   Turing machine
These languages form a strict hierarchy; that is, regular languages ⊂ context-free languages ⊂
context-sensitive languages ⊂ recursively enumerable languages.
Not all language classes fit neatly into a hierarchy. For example, we have discussed the linear
languages, which (like deterministic context-free languages) fit neatly between the regular
languages and the context-free languages; however, there are languages that are linear but not
deterministic context-free, and there are languages that are deterministic context-free but not
linear.
In fact, mathematicians have defined dozens, maybe hundreds, of different classes of languages,
and write papers on how these relate to one another. You should know at least the four "classic"
categories that are taught in almost every textbook on the subject.
A Random-Access Machine
We will define a random-access machine as follows:
Data types. The only data type we will support is the natural numbers 0, 1, 2, 3, .... (However,
numbers may be arbitrarily large.)
Variables. We will allow an arbitrary number of variables, each capable of holding a single
natural number. All variables will be initialized to 0.
This begins to look like a "real" programming language, albeit a very weak one. Here's the point:
this language is equivalent in power to a Turing machine. (You can prove this by using the
language to implement a Turing machine, then using a Turing machine to emulate the language.)
In other words: this language is powerful enough to compute anything that can be computed in
any programming language.
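The notes do not fix an exact instruction set at this point, so here is a minimal assumed one (increment, decrement, jump-if-zero, unconditional jump) with a small interpreter, to give the flavor of such a language:

```python
from collections import defaultdict

def run(program, init=None):
    """Interpret a minimal random-access-machine program.

    Instructions (an assumed minimal set; the notes' exact syntax may
    differ):
      ('inc', v)        -- v := v + 1
      ('dec', v)        -- v := v - 1, but 0 - 1 = 0 (natural numbers)
      ('jz', v, target) -- jump to instruction `target` if v == 0
      ('jmp', target)   -- unconditional jump
    All variables are initialized to 0.
    """
    env = defaultdict(int)
    if init:
        env.update(init)
    pc = 0
    while pc < len(program):
        op = program[pc]
        if op[0] == "inc":
            env[op[1]] += 1
        elif op[0] == "dec":
            env[op[1]] = max(0, env[op[1]] - 1)
        elif op[0] == "jz" and env[op[1]] == 0:
            pc = op[2]
            continue
        elif op[0] == "jmp":
            pc = op[1]
            continue
        pc += 1
    return dict(env)

# x := x + y, by repeatedly moving one unit from y to x
add = [("jz", "y", 4), ("inc", "x"), ("dec", "y"), ("jmp", 0)]
```

Even this tiny repertoire suffices to build addition, and from there, step by step, anything a Turing machine can compute.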
1. A place where anything is possible but nothing of interest is practical. Alan Turing helped lay
the foundations of computer science by showing that all machines and languages capable of
expressing a certain very primitive set of operations are logically equivalent in the kinds of
computations they can carry out, and in principle have capabilities that differ only in speed from
those of the most powerful and elegantly designed computers. However, no machine or language
exactly matching Turing's primitive set has ever been built (other than possibly as a classroom
exercise), because it would be horribly slow and far too painful to use. A 'Turing tar-pit' is any
computer language or other tool that shares this property. That is, it's theoretically universal --
but in practice, the harder you struggle to get any real work done, the deeper its inadequacies
suck you in. Compare bondage-and-discipline language.
2. The perennial holy wars over whether language A or B is the "most powerful".
A compiler reads a program in one language and produces the equivalent program in another
language.
A preprocessor reads a program with embedded processor commands and produces an
equivalent program without those commands.
A prettyprinter reads a program and writes a "cleaned up" version of the same program.
A prettyprinter, written in and for the same language, can prettyprint itself.
A compiler can be written in language X for a new, improved version of language X. This process
is called bootstrapping.
The input to a Turing machine is a string. Turing machines themselves can be written as strings,
and these strings can be used as input to other Turing machines.
In particular, we have already discussed the notion of a universal Turing machine: its input
consists of a description M of some arbitrary Turing machine, and some input w to which machine
M is to be applied (we will write this combined input as M+w), and it produces the same output that
would be produced by M. We could write this as UTM(M+w) = M(w).
Since a Turing machine can be represented as a string, it is entirely possible to supply a Turing
machine as input to itself, e.g. UTM(UTM).
Suppose we have a Turing machine WillHalt which, given an input string M+w, will halt and
accept the string if Turing machine M halts on input w, and will halt and reject the string if Turing
machine M does not halt on input w. Viewed as a Boolean function, WillHalt(M,w) halts and
returns true in the first case, and halts and returns false in the second.
Recall that
A language is recursively enumerable if there exists a Turing machine that accepts every string in
the language and does not accept any string not in the language.
A language is recursive if there exists a Turing machine that accepts every string in the language
and rejects every string not in the language.
Proof. If a Turing machine WillHalt could be built, then we could readily build the machine
This Turing machine will always halt and always give the correct answer. If we could build
WillHalt then we could build Accepts. If we could build Accepts, then we would have shown
that every recursively enumerable language is recursive.
However, we showed earlier that there do exist recursively enumerable languages that are not
recursive. (You might wish to review this argument.) Therefore, it must be impossible to build
WillHalt.
Q.E.D.
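The notes obtain this result by a reduction through Accepts; the classic diagonal construction can also be sketched directly. Everything below is illustrative, not from the notes: given any claimed halting decider, we build an input on which it must answer wrong.

```python
def make_paradox(will_halt):
    """Given any claimed decider will_halt(program, input), build a
    program that does the opposite of whatever the decider predicts
    about running it on its own description."""
    def paradox(description):
        if will_halt(description, description):
            while True:          # decider said "halts" -- so loop forever
                pass
        return "halted"          # decider said "loops" -- so halt
    return paradox

# Any concrete decider is wrong somewhere.  This one claims everything
# loops; paradox immediately refutes it by halting:
claims_all_loop = lambda prog, inp: False
paradox = make_paradox(claims_all_loop)
```

Whatever WillHalt we propose, feeding the resulting paradox program its own description forces a wrong answer, so no correct WillHalt can exist.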
First, every existing programming language, and every foreseeable programming language, has no
more power than a Turing machine. Hence, this result applies directly to programming
languages.
The theorem does not say that we can never determine whether or not a given program halts on a
given input. Most of the time, for most practical programs, we can and do eliminate infinite
loops from our programs. We can even write a meta-program to check another program for
potential infinite loops, and get this meta-program to work most of the time.
The theorem does say that we cannot ever write such a meta-program and have it work all of the
time. Moreover, the result can be used to demonstrate that certain other programs are also
impossible. Here's the basic outline:
Sometimes you will be able to solve a useful, practical subset of problem X. However, unless
you have a particularly understanding customer, you are generally better off avoiding problem X
altogether.
Some philosophers have tried to use the Halting Problem as an argument against the possibility
of intelligent computers. Stripped to its basics, the argument goes like this:
The second premise is generally supported by displaying a program that solves some subset of
the Halting Problem, then describing a clever trick (not incorporated into the program) that
solves a slightly larger subset.
There may well be valid arguments against the possibility of artificial intelligence. This is not
one of them.
Assume that you have an effective procedure (Turing machine or any other kind of algorithm) to
solve problem X.
Show how to use the program for X to solve the Halting Problem.
The State-Entry Problem
(I prefer to think of this as the dead code problem.) The problem is to determine whether Turing
machine M, when given input w, ever enters state q.
The only way a Turing machine M halts is if it enters a state q for which the transition function
δ(qi, ai) is undefined. Add a new final state Z to the Turing machine, and add all these missing
transitions to lead to state Z.
Now use the (assumed) state-entry procedure to test whether state Z is ever entered when M is
given input w. This will reveal whether the original machine M halts. We conclude that it must
not be possible to build the assumed state-entry procedure.
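The transformation in this reduction can be sketched concretely (representing δ as a Python dict is an assumption of this sketch):

```python
def route_halts_to_Z(delta, states, tape_symbols):
    """Complete a Turing machine's partial transition function so that
    every undefined (state, symbol) pair -- i.e. every way the machine
    can halt -- instead leads to a single new state 'Z'."""
    completed = dict(delta)
    for q in states:
        for a in tape_symbols:
            if (q, a) not in completed:
                # halt here in the original machine; go to Z instead
                completed[(q, a)] = ("Z", a, "R")
    return completed
```

Asking the (assumed) state-entry procedure whether the completed machine ever enters Z is then exactly asking whether the original machine halts.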
Given two sequences of k strings each,
A = (w1, w2, w3, ..., wk) and B = (x1, x2, x3, ..., xk)
Post's Correspondence Problem is the following. Does there exist a sequence of integers i1, i2,
i3, ..., im such that m ≥ 1 and
wi1wi2wi3...wim = xi1xi2xi3...xim ?
Example. Suppose A = (a, abaaa, ab) and B = (aaa, ab, b). Then the required sequence of
integers is 2,1,1,3, giving
w2w1w1w3 = abaaa·a·a·ab = abaaaaaab = ab·aaa·aaa·b = x2x1x1x3.
This example had a solution. It will turn out that Post's Correspondence Problem is insoluble in
general.
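Although PCP is insoluble in general, a bounded brute-force search is easy to write and finds the solution for this example (the function is illustrative, not from the notes):

```python
from itertools import product

def pcp_search(A, B, max_len):
    """Brute-force search for a PCP solution of length up to max_len.
    Returns a 1-based index sequence, or None.  Exponential -- and since
    PCP is undecidable in general, no bound on max_len can ever be
    guaranteed sufficient."""
    k = len(A)
    for m in range(1, max_len + 1):
        for seq in product(range(k), repeat=m):
            if "".join(A[i] for i in seq) == "".join(B[i] for i in seq):
                return [i + 1 for i in seq]
    return None

A = ("a", "abaaa", "ab")
B = ("aaa", "ab", "b")
```

On the example above it recovers the sequence 2,1,1,3.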
Church's Thesis
Formal Systems
Two thousand years ago, Euclid set a standard for rigor in geometrical proofs. The rest of
mathematics has never succeeded in reaching this standard.
The required properties of a satisfactory formal system are that it be
complete -- it must be possible either to prove or to disprove any proposition that can be
expressed in the system.
consistent -- it must not be possible to both prove and disprove a proposition in the system.
Here is the now-famous problem that demolished the Principia Mathematica. Consider the set of
all sets that do not have themselves as a member. Is this set a member of itself?
Kurt Gödel explored the very notions of completeness and consistency. He invented a numbering
scheme (Gödel numbers) that allowed him to express proofs as numbers (much as we might
consider a computer program to be a very large binary number). He was able to prove the
following result:
If it is possible to prove, within a formal system, that the system is consistent, then the formal
system is not, in fact, consistent.
Or, equivalently,
If a formal system is consistent, then it is impossible to prove (within the system) that it is
consistent.
This result sets very definite limits on the kinds of things that we can know. In particular, it
shows that any attempt to prove mathematics consistent is foredoomed to failure.
Gödel left open the possibility that we could somehow distinguish between the provable
propositions and the unprovable ones. Ideally, we would like to have a mechanical (algorithmic)
theorem-proving procedure. Alan Turing invented Turing machines in an attempt to solve this
problem. With the Halting Problem, he showed that we cannot, in all cases, distinguish between
soluble and insoluble problems.
Other mathematicians, working with very different models of computation, ended up with very
similar results. One of these was Alonzo Church, who invented recursive function theory.
I have sometimes referred to this course, Formal Languages and Automata Theory, as "compiler
construction made difficult." A fairer statement is that this course presents a mathematician's
view of the subject, while a course in Compiler Construction presents a programmer's view.
In the same way, recursive function theory is "Lisp made difficult." If, like me, you understand
programming more readily than mathematics, learn Lisp before you take a course in recursive
function theory. You would not believe the difference it will make.
The textbook describes these functions as being over the natural numbers I={0,1,2,3,...}. A better
way to look at recursive functions, though, is as pure symbol systems. Numbers are not used in
the system; rather, we use the system to construct both numbers and arithmetical functions on
numbers. In other words, it's a different numbering system, in the same way that Roman
numerals are different. The correspondence goes like this:
0 = z(x)
1 = s(z(x))
2 = s(s(z(x)))
3 = s(s(s(z(x))))
...
To "translate" to decimal, just count the number of s's surrounding the central z(x).
Now let's get formal. In this system there are only a few basic functions:
The zero function: z(x) = z(y) for all x, y ∈ I. (This is our "zero"; it is written as a function so we don't
have to introduce constants into the system.)
The successor function: s(x). Informally, this means "x+1". Formally, it doesn't "return a value", it
just sits there: the result of s(x) is s(x).
For convenience, we make the following "abbreviations":
o p1(x) = x.
o p1(x, y) = x.
o p2(x, y) = y.
o ...and so on.
The projector (or "pick") functions are just a way of extracting one of the parameters and
discarding the rest. The book defines only p1 and p2 because it uses functions of no more
than two arguments.
Composition and Recursion
If g1, g2, g3, and h are previously defined functions, we can combine them to form new functions.
In a very careful, formal development, they can be combined only in precisely defined ways.
Here is an example of the kind of form required.
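One standard way of writing the two forms (the textbook's exact notation may differ, as the note below observes) is:

Composition:
    f(x, y) = h(g1(x, y), g2(x, y))

Primitive recursion:
    f(x, z(y)) = g(x)
    f(x, s(y)) = h(x, y, f(x, y))

Here g, g1, g2, and h are previously defined functions; the recursive case is allowed to use x, the "counter" y, and the single recursive value f(x, y).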
Note: The form of primitive recursion given in the textbook is not consistent with the
author's examples. If you want to step through his (or my) examples, use this form
instead.
A primitive recursive function is a function formed from the functions z, s, p1, and p2 by using
only composition and primitive recursion.
Composition
The important thing here is that each function be defined only in terms of previously defined
functions. This restriction prevents unwanted kinds of recursion from creeping in, such as
indirect recursion (A calls B, B calls C, C calls A).
Many programming languages have a restriction that you cannot reference a variable until after it
is defined. This is the same kind of restriction.
Primitive recursion
Primitive recursive functions may only use a simple form of recursion. In particular,
The recursion must be guaranteed to terminate. To ensure this, the function must carry along an
extra parameter that is "decremented" each time the function is called (s(x) is replaced by x),
and halts the recursion when it reaches "zero" (z(x)). That is,
The recursive function must appear only once in the definiens (right hand side of the definition).
This restriction prevents various forms of "fancy" recursion.
General recursive functions, which don't have these restrictions, are more powerful than
primitive recursive functions.
Examples I
The following examples show how these can be used to define more complicated functions. My
examples are taken from those in the textbook, but I prefer the notation s(x) to the abbreviation
x+1.
Examples II
Example. Multiplication of two numbers.
The key new feature here is the use of a previously defined function, add, in the definition of a
new function. We skip the step of playing around with the pi functions to pick out the right parts,
and go right to the simplified form.
multiply(x, s(z(x))) = x
multiply(x, s(y)) = add(x, multiply(x, y))
Example. Predecessor.
The trick here is that we can't drop below zero, so effectively 0-1=0. To show this, we write a dot
above the minus sign and call it a monus (and no, you don't need to remember this!). In any case,
the function is easy to define:
pred(z(x)) = z(x)
pred(s(x)) = x
(We have taken some liberties with the notation, because the form allowed by the textbook
doesn't allow us to define a function of one variable; we would have to define a function of two
variables, and just ignore one of them.)
Example. Subtraction.
subtract(x, z(x)) = x
subtract(x, s(y)) = pred(subtract(x, y))
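These definitions translate almost directly into Python, using ordinary non-negative integers as the numerals (a convenience of this sketch; the formal system itself has only z, s, and the projectors):

```python
def z(x):            # the zero function
    return 0

def s(x):            # the successor function, "x + 1"
    return x + 1

def pred(x):         # pred(z(x)) = z(x),  pred(s(x)) = x
    return x - 1 if x > 0 else 0

def add(x, y):       # add(x, 0) = x,  add(x, s(y)) = s(add(x, y))
    return x if y == 0 else s(add(x, y - 1))

def multiply(x, y):  # multiply(x, s(y)) = add(x, multiply(x, y))
    return 0 if y == 0 else add(x, multiply(x, y - 1))

def subtract(x, y):  # monus: subtract(x, 0) = x,
    return x if y == 0 else pred(subtract(x, y - 1))
```

Note how subtract(2, 5) bottoms out at 0, just as the monus definition requires.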
Ackermann's Function
Ackermann's function is an example of a function that is mu-recursive but not primitive
recursive. (Mu-recursive functions have the power of a Turing machine.) Here is the definition:
A(0, y) = y + 1
A(x, 0) = A(x - 1, 1)
A(x, y) = A(x - 1, A(x, y - 1))
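A direct transcription into Python (the memoization is my addition; it makes small arguments comfortably computable):

```python
import sys
from functools import lru_cache

sys.setrecursionlimit(100000)   # the recursion gets deep quickly

@lru_cache(maxsize=None)
def A(x, y):
    if x == 0:
        return y + 1
    if y == 0:
        return A(x - 1, 1)
    return A(x - 1, A(x, y - 1))
```

Even with memoization, A(4, 2) already has 19,729 decimal digits; try A(3, y) for growing y to see the explosion begin.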
Ackermann's function is one of the few things I actually remember from the recursive function
theory course I took many long years ago. It's just a really neat function. Play with it a bit and
you'll see what I mean.
Stress-test your computer. See just how many values of Ackermann's function you can
compute.
Liven up a boring meeting. Instead of sitting there doodling, bring in a copy of Ackermann's
function and see how far you can get with it. If you have ever played with numbers, I guarantee
it will be a lot more interesting than drawing random designs.
Test your programming skills. There are a lot of short cuts you can find to help compute
Ackermann's function much faster. How many of them can you find?
Make money fast. Bet a hotshot programmer that s/he can't write a program in less than an hour
to compute A(5,5). Or be generous -- give them a week.
Turing's thesis. Anything that is computable can be computed by a Turing machine. There does
not and cannot exist a machine that can compute things a Turing machine cannot compute.
Church's thesis. All the models of computation yet developed, and all those that may be
developed in the future, are equivalent in power. We will not ever find a more powerful model.
P and NP
Complexity Theory
Complexity theory concerns itself with two kinds of measures: time and space.
Time complexity is a measure of how long a computation takes to execute. For a Turing machine,
this could be measured as the number of moves required to perform a computation. For a digital
computer, it could be measured as the number of machine cycles required for the computation.
Space complexity is a measure of how much storage is required for a computation. For a Turing
machine, the obvious measure is the number of tape squares used; for a digital computer, the
number of bytes used.
Both of these measures are functions of a single input parameter, the size of the input. Again, this
can be defined in terms of squares or bytes.
For any given input size, different inputs typically require different amounts of space and time.
Hence we can discuss complexity for either the average case or the worst case. Usually we are
interested in worst-case complexity because
It may be difficult or impossible to define an "average" case. For many problems, the notion of
"average case" doesn't even make sense.
It is usually much easier to compute worst-case complexity.
In complexity theory we generally subject our equations to some extreme simplifications. For
example, if a given algorithm takes exactly 5n^3 + 2n^2 - n + 1003 machine cycles, where n is the size
of the input, we will simplify this to O(n^3) (read: Order n-cubed). This is called an order
statistic. Specifically, we:
o keep only the highest-order term, and
o discard its coefficient.
For very large values of n, the effect of the highest-order term completely swamps the
contribution of the lower-order terms. We are interested in large values of n because, from a
strictly practical point of view, it is the large problems that give us trouble. Small problems are
almost always feasible to compute.
Tweaking the code can improve the coefficients, but the order statistic is a function of the
algorithm itself.
Polynomial-Time Algorithms
A polynomial-time algorithm is an algorithm whose execution time is either given by a
polynomial on the size of the input, or can be bounded by such a polynomial. Problems that can
be solved by a polynomial-time algorithm are called tractable problems.
For example, most algorithms on arrays can use the array size, n, as the input size. To find the
largest element in an array requires a single pass through the array, so the algorithm for doing
this is O(n), or linear time.
Sorting algorithms usually require either O(n log n) or O(n^2) time. Bubble sort takes linear time
in the best case, but O(n^2) time in the average and worst cases. Heapsort takes O(n log n) time in
all cases. Quicksort takes O(n log n) time on average, but O(n^2) time in the worst case.
The base of the logarithms is irrelevant, since the difference is a constant factor, which we
ignore; and
Although n log n is not, strictly speaking, a polynomial, the size of n log n is bounded by n^2,
which is a polynomial.
Probably all the programming tasks you are familiar with have polynomial-time solutions. This
is not because all practical problems have polynomial-time solutions. Rather, it is because your
courses and your day-to-day work have avoided problems for which there is no known practical
solution.
Nondeterministic Polynomial-Time
Algorithms
Recall that a nondeterministic computation can be viewed in either of two ways:
When a choice point is reached, an infallible oracle can be consulted to determine the correct
choice.
When a choice point is reached, all choices can be made and computation can proceed
simultaneously.
Example (the integer bin packing problem). Consider the following list of integers:
(19, 23, 32, 42, 50, 62, 77, 88, 89, 105, 114, 123, 176)
These numbers sum to 1000. Can they be divided into two bins, bin A and bin B, such that the
sum of the integers in each bin is 500?
There is an obvious nondeterministic algorithm: for each number, put it in the correct bin. This
requires linear time.
There is also a fairly easy deterministic algorithm. There are 13 numbers (n=13), so form the 13-
bit binary number 0000000000000. For i ranging from 1 to 13: if bit i is zero, put integer i into
bin A; if bit i is one, put integer i into bin B. Test the resultant arrangement. If you don't have a
solution yet, add 1 to the binary number and try again. If you reach 1111111111111, stop and
conclude that there is no solution.
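The binary-counter algorithm just described can be written directly (the function name and representation are mine):

```python
def find_bin_split(numbers, target):
    """Try every subset, mirroring the binary-counter algorithm: bit i
    of the counter decides which bin integer i goes into.  Returns the
    contents of bin A if a valid split exists, else None."""
    n = len(numbers)
    for bits in range(2 ** n):        # counter from 000...0 to 111...1
        bin_a = [numbers[i] for i in range(n) if (bits >> i) & 1 == 0]
        if sum(bin_a) == target:
            return bin_a
    return None

nums = (19, 23, 32, 42, 50, 62, 77, 88, 89, 105, 114, 123, 176)
```

For n = 13 this loops at most 8192 times and does find a valid split of the example numbers.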
This is a fairly simple algorithm; the only problem is that it takes O(2^n) time, that is, exponential
time. In the above example, we may need to try as many as 2^13 arrangements. This is fine for
small values of n (such as 13), but becomes unreasonable for large values of n.
You can find many shortcuts for problems such as this, but the best you can do is improve the
coefficients. The time complexity remains O(2^n). Problems that require exponential time are
referred to as intractable problems.
There are harder variations of this packing problem:
You can pack objects with multiple dimensions (volume and weight, for example).
Objects have not only a size but also a value, and the objective is to pack as much value as possible.
Boolean Satisfiability
Suppose you have n Boolean variables, named A, B, C, ..., and you have an expression in the
propositional calculus (that is, you can use and, or, and not to form the expression). Is there an
assignment of truth values to the variables (e.g. A=true, B=true, C=false, ....) that will make the
expression true?
Here is a nondeterministic algorithm to solve the problem: For each Boolean variable, assign it
the proper truth value. This is a linear algorithm.
We can find a deterministic algorithm for this problem in much the same way as we did for the
integer bin problem. Effectively, the idea is to set up a systematic procedure to try every possible
assignment of truth values to variables. The algorithm terminates when a satisfactory solution is
found, or when all 2^n possible assignments have been tried. Again, the deterministic solution
requires exponential time.
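The deterministic exhaustive search can be sketched as follows, representing a formula as a Python predicate over an assignment (an encoding chosen for this sketch, not from the notes):

```python
from itertools import product

def satisfiable(variables, formula):
    """Try all 2**n truth assignments.  `formula` is a function that
    takes a dict of truth values and returns True or False.  Returns a
    satisfying assignment, or None if the formula is unsatisfiable."""
    for values in product([True, False], repeat=len(variables)):
        assignment = dict(zip(variables, values))
        if formula(assignment):
            return assignment
    return None

# (A or B) and (not A or C) and (not C)
f = lambda v: (v["A"] or v["B"]) and (not v["A"] or v["C"]) and not v["C"]
```

For this formula the search finds A = false, B = true, C = false; an unsatisfiable formula forces all 2^n iterations.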
Additional NP Problems
The following problems all have a polynomial-time solution on a nondeterministic machine, but
an exponential-time solution on a deterministic machine. There are literally hundreds of
additional examples.
The travelling salesman problem
A salesman, starting in Harrisburg, wants to visit every capital city in the 48 continental United
States, returning to Harrisburg as his last stop. In what order should he visit the capital cities so
as to minimize the total distance travelled?
The Hamiltonian circuit problem
Every capital city has direct air flights to at least some other capital cities. Our intrepid salesman
wants to visit all 48 capitals, and return to his starting point, taking only direct air flights. Can he
find a path that lets him do this?
The finite automata intersection problem
Given a set of finite automata M1, M2, M3, ..., Mn, all over the same alphabet A, is there some
string in A* that is accepted by all of these automata?
Linear programming
You have on hand X amount of butter, Y amount of flour, Z eggs, etc. You have cookie recipes
that use varying amounts of these ingredients. Different kinds of cookies bring different prices.
What mix of cookies should you make in order to maximize profits?
This type of problem is sufficiently important that entire college courses are devoted to it,
usually in the College of Business.
NP-Complete Problems
All of the problems above share a remarkable characteristic: they are all reducible to one
another. What this means is that, given any two such NP problems X and Y, an instance of X can be
transformed into an instance of Y in polynomial time, in such a way that a solution to the
transformed instance yields a solution to the original.
This is what the "complete" refers to when we talk about NP-complete problems.
What this means is that, if anyone ever discovers a polynomial-time algorithm for any of these
problems, then there is an easily-derived polynomial-time algorithm for all of them. This leads to
the famous question:
Does P = NP?
No one has ever found a deterministic polynomial-time algorithm for any of these problems (or
the hundreds of others like them). However, no one has ever succeeded in proving that no
deterministic polynomial-time algorithm exists, either. The status for some years now is this:
most computer scientists don't think a polynomial-time algorithm can exist, but no one knows for
sure. This was a hot research topic for a while, but interest has died down on the problem, for the
simple reason that no one has made any progress (in either direction).
REVIEWS
Languages
A language is a set of strings over some finite alphabet. To be well-defined, a set
requires a membership criterion. Two kinds of membership criteria often used for
languages are grammars and automata. Other kinds of criteria are possible, such as
regular expressions and recursive functions.
If there exists a grammar of a given type for a language L, then L is no more
complex than the corresponding language type. It is possible that a simpler (less
powerful) grammar exists for the same language.
Grammars
A grammar G is a quadruple G = (V, T, S, P)
where V is a finite set of variables, T is a finite set of terminals (with V ∩ T = ∅), S ∈ V is the
start symbol, and P is a set of productions.
A right-linear grammar has productions of the form V → T*V | T*;
a left-linear grammar has productions of the form V → VT* | T*.
An nfa has transition function
δ: Q × (Σ ∪ {λ}) → 2^Q
An npda is a septuple M = (Q, Σ, Γ, δ, q0, z, F)
with transition function
δ: Q × (Σ ∪ {λ}) × Γ → finite subsets of Q × Γ*
A Turing machine is a septuple (Q, Σ, Γ, δ, q0, #, F)
with partial transition function
δ: Q × Γ → Q × Γ × {L, R}
Accepting Strings
The (regular) language accepted by a dfa M is
L(M) = {w ∈ Σ* : δ*(q0, w) ∈ F}
An lba is a Turing machine with a limited amount of tape (a linear function of the size
of the input). Lbas accept context-sensitive languages.
Pumping Lemmas
Regular languages
If L is an infinite regular language, then there exists some positive integer m such that
any string w ∈ L whose length is m or greater can be decomposed into three parts,
w = xyz, where
o |xy| ≤ m,
o |y| ≥ 1, and
o xyⁱz ∈ L for all i ≥ 0.
Context-free languages
Let L be an infinite context-free language. Then there is some positive integer m such
that, if w is a string of L of length at least m, then w can be decomposed as w = uvxyz, where
o |vxy| ≤ m,
o |vy| ≥ 1, and
o uvⁱxyⁱz ∈ L for all i ≥ 0.
Usage
Pumping lemmas are used to show that a given language is not of type
(regular/context-free). The argument goes: